Steven Weinberg - Foundations of Modern Physics-Cambridge University Press (2021)

Foundations of Modern Physics
In addition to his ground-breaking research, Nobel Laureate Steven Weinberg
is known for a series of highly praised texts on various aspects of physics,
combining exceptional physical insight with a gift for clear exposition. Describing
the foundations of modern physics in their historical context and with some
new derivations, Weinberg introduces topics ranging from early applications
of atomic theory through thermodynamics, statistical mechanics, transport
theory, special relativity, quantum mechanics, nuclear physics, and quantum
field theory. This volume provides the basis for advanced undergraduate and
graduate physics courses as well as being a handy introduction to aspects of
modern physics for working scientists.
Steven Weinberg is a member of the Physics and Astronomy Departments
at the University of Texas at Austin. He has been honored with numerous
awards, including the Nobel Prize in Physics, the National Medal of Science,
the Heinemann Prize in Mathematical Physics, and most recently a Special
Breakthrough Prize in Fundamental Physics. He is a member of the US National
Academy of Sciences, the UK’s Royal Society, and other academies in the US
and internationally. The American Philosophical Society awarded him the
Benjamin Franklin medal, with a citation that said he is “considered by many
to be the preeminent theoretical physicist alive in the world today.” He has
written several highly regarded books, including Gravitation and Cosmology,
the three-volume work The Quantum Theory of Fields, Cosmology, Lectures on
Quantum Mechanics, and Lectures on Astrophysics.
Foundations of Modern Physics
Steven Weinberg
University of Texas, Austin
University Printing House, Cambridge CB2 8BS, United Kingdom
One Liberty Plaza, 20th Floor, New York, NY 10006, USA
477 Williamstown Road, Port Melbourne, VIC 3207, Australia
314–321, 3rd Floor, Plot 3, Splendor Forum, Jasola District Centre, New Delhi – 110025, India
79 Anson Road, #06–04/06, Singapore 079906
Cambridge University Press is part of the University of Cambridge.

It furthers the University’s mission by disseminating knowledge in the pursuit of
education, learning, and research at the highest international levels of excellence.
www.cambridge.org
Information on this title: www.cambridge.org/9781108841764
DOI: 10.1017/9781108894845
© Steven Weinberg 2021
This publication is in copyright. Subject to statutory exception
and to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without the written
permission of Cambridge University Press.
First published 2021
Printed in the United Kingdom by TJ Books Limited, Padstow Cornwall
A catalogue record for this publication is available from the British Library.
Library of Congress Cataloging-in-Publication Data
Names: Weinberg, Steven, 1933– author.
Title: Foundations of modern physics / Steven Weinberg, The University of Texas at Austin.
Description: New York : Cambridge University Press, 2021. | Includes
bibliographical references and indexes.
Identifiers: LCCN 2020055431 (print) | LCCN 2020055432 (ebook) |
ISBN 9781108841764 (hardback) | ISBN 9781108894845 (epub)
Subjects: LCSH: Physics.
Classification: LCC QC21.3 .W345 2021 (print) | LCC QC21.3 (ebook) |
DDC 530–dc23
LC record available at https://lccn.loc.gov/2020055431
LC ebook record available at https://lccn.loc.gov/2020055432
ISBN 978-1-108-84176-4 Hardback
Cambridge University Press has no responsibility for the persistence or accuracy
of URLs for external or third-party internet websites referred to in this publication
and does not guarantee that any content on such websites is, or will remain,
accurate or appropriate.
For Louise, Elizabeth, and Gabrielle
Contents
PREFACE
1 EARLY ATOMIC THEORY
1.1 Gas Properties
Air pressure; Boyle’s law; Temperature scales; Charles’ law;
Explanation of gas laws; Ideal gas law
1.2 Chemistry
Elements; Law of combining weights; Dalton’s atomic weights; Law of
combining volumes; Avogadro’s principle; The gas constant; Avogadro’s
number
1.3 Electrolysis
Early electricity; Early magnetism; Electromagnetism; Discovery of
electrolysis; Faraday’s theory; The faraday
1.4 The Electron
Cathode rays; Thomson’s experiments; Electrons as atomic constituents
2 THERMODYNAMICS AND KINETIC THEORY
2.1 Heat and Energy
Caloric; Heat as energy; Kinetic energy; Specific heat; Energy density
and pressure; Adiabatic changes
2.2 Absolute Temperature
Carnot cycles; Theorems on efficiency; Absolute temperature defined;
Relation to gas thermometers
2.3 Entropy
Definition of entropy; Independence of path; Increase of entropy;
Thermodynamic relations; Entropy of ideal gases; Neutral matter;
Radiation energy; Laws of thermodynamics
2.4 Kinetic Theory and Statistical Mechanics
Maxwell–Boltzmann distribution; General H-theorem; Time reversal;
Canonical and grand-canonical distributions; Connection with thermodynamics;
Compound systems; Probability distribution in gases; Equipartition of
energy; Entropy as disorder
2.5 Transport Phenomena
Conservation laws; Galilean relativity; Navier–Stokes equation; Viscosity;
Mean free path; Diffusion
2.6 The Atomic Scale
Nineteenth century estimates; Electronic charge; Brownian motion;
Consistency of constants; Appendix: Einstein’s diffusion constant rederived
3 EARLY QUANTUM THEORY
3.1 Black Body Radiation
Absorption and energy density; Degrees of freedom of electromagnetic fields;
Rayleigh–Jeans distribution; Planck distribution; Measurement of
Boltzmann constant; Radiation energy constant
3.2 Photons
Quantization of radiation energy; Derivation of Planck distribution;
Photoelectric effect; Particles of light
3.3 The Nuclear Atom
Radioactivity; Alpha and beta rays; Discovery of the nucleus; Nuclear mass;
Nuclear size; Scattering pattern; Nuclear charge
3.4 Atomic Energy Levels
Spectral lines; Electron orbits; Combination principle; Bohr’s quantization
condition; Correspondence principle; Comparison with observed one-electron
atomic spectra; Reduced mass; Atomic number; Outstanding questions
3.5 Emission and Absorption of Radiation
Einstein A and B coefficients; Equilibrium with black body radiation; Relations
among coefficients; Lasers; Suppressed absorption
4 RELATIVITY
4.1 Early Relativity
Motion of the Earth; Relativity of motion; Speed of light;
Michelson–Morley experiment; Lorentz–Fitzgerald contraction
4.2 Einsteinian Relativity
Postulate of invariance of electrodynamics; Lorentz transformations; Space
inversion, time reversal; The Galilean limit; Maximum speed; Boosts in
general directions; Special and general relativity
4.3 Clocks, Rulers, Light Waves
Clocks and time dilation; Rulers and length contraction; Transformation of
frequency and wave number
4.4 Mass, Energy, Momentum, Force
Einstein’s thought experiment; Formulas for energy and momentum; E = mc²;
Force in relativistic dynamics
4.5 Photons as Particles
Photon momentum; Compton scattering; Other massless particles
4.6 Maxwell’s Equations
The inhomogeneous and homogeneous equations; Density and current of electric
charge; Relativistic formulation of inhomogeneous Maxwell equations; Indices
upstairs and downstairs; Relativistic formulation of homogeneous Maxwell
equations; Electric and magnetic forces
4.7 Causality
Causes precede effects; Invariance of temporal order; Maximum signal speed;
Light cone
5 QUANTUM MECHANICS
5.1 De Broglie Waves
Free-particle wave functions; Group velocity; Application to hydrogen;
Davisson–Germer experiment; Electron microscopes; Appendix: Derivation
of the Bragg formula
5.2 The Schrödinger Equation
Wave equation for particle in potential; Boundary conditions; Spherical
symmetry; Radial and angular wave functions; Angular multiplicity;
Spherical harmonics; Hydrogenic energy levels; Degeneracy
5.3 General Principles of Quantum Mechanics
States and wave functions; Observables and operators; Hamiltonian;
Adjoints; Expectation values; Probabilities; Continuum limit;
Momentum space; Commutation relations; Uncertainty principle; Time
dependent wave functions; Conservation laws; Heisenberg and
Schrödinger pictures
5.4 Spin and Orbital Angular Momentum
Doubling of sodium D-line; The idea of spin; General action of rotations on
wave functions; Total angular momentum operator; Commutation relations;
Spin and orbital angular momentum; Multiplets; Adding angular momenta;
Atomic fine structure and space inversion; Hyperfine structure; Appendix:
Clebsch–Gordan Coefficients
5.5 Bosons and Fermions
Identical particles; Symmetric and antisymmetric wave functions; Bosons and
fermions in statistical mechanics; Hartree approximation; Slater determinant;
Pauli exclusion principle; Periodic table of elements; Diatomic molecules:
para and ortho; Astrophysical cooling
5.6 Scattering
Scattering wave function; Representations of the delta function; Calculation of
the Green’s function; Scattering amplitude; Probabilistic interpretation;
Cross section; Born approximation; Scattering by shielded Coulomb
potential; Appendix: General transition rates
5.7 Canonical Formalism
Hamiltonian formalism; Canonical commutation relations; Lagrangian
formalism; Action principle; Connection of formalisms; Noether’s theorem:
symmetries and conservation laws; Space translation and momentum
5.8 Charged Particles in Electromagnetic Fields
Vector and scalar potential; Charged particle Hamiltonian; Equations of motion;
Gauge transformations; Magnetic interactions; Spin coupling
5.9 Perturbation Theory
Perturbative expansion; First-order perturbation theory; Dealing with
degeneracy; The Zeeman effect; Second-order perturbation theory
5.10 Beyond Wave Mechanics
State vectors; Linear operators; First postulate: values of observables;
Second postulate: expectation values; Probabilities; Continuum limit;
Wave functions as vector components
6 NUCLEAR PHYSICS
6.1 Protons and Neutrons
Discovery of the proton; Integer atomic weights; Nuclei as protons and
electrons?; Trouble with diatomic nitrogen; Discovery of the neutron;
Nuclear radius and binding energy; Liquid drop model; Stable valley and
decay modes
6.2 Isotopic Spin Symmetry
Neutron–proton and proton–proton forces; Isotopic spin rotations; Isotopic spin
multiplets; Quark model; Pions; Appendix: The three–three resonance
6.3 Shell Structure
Harmonic oscillator approximation; Raising and lowering operators;
Degenerate multiplets; Magic numbers; Spin–orbit coupling
6.4 Alpha Decay
Coulomb barrier; Barrier suppression factors; Semi-classical estimate of alpha
decay rate; Level splitting; Geiger–Nuttall law; Radium alpha decay;
Appendix: Quantum theory of barrier penetration rates
6.5 Beta Decay
Electron energy distribution; Neutrinos proposed; Fermi theory;
Gamow–Teller modification; Selection rules; Strength of weak interactions;
Neutrinos discovered; Violation of left–right and matter–antimatter symmetries;
Neutrino helicities; Varieties of neutrino
7 QUANTUM FIELD THEORY
7.1 Canonical Formalism for Fields
Action, Lagrangian, Lagrangian density; Functional derivatives;
Euler–Lagrange field equations; Commutation relations; Energy and
momentum of fields
7.2 Free Real Scalar Field
Lagrangian density; Field equation; Creation and annihilation operators;
Energy and momentum; Vacuum state; Multiparticle states
7.3 Interactions
Time-ordered perturbation theory; Requirements for Lorentz invariance;
Example: Scattering of neutral spinless particles; Feynman diagram;
Calculation of the propagator; Yukawa potential
7.4 Antiparticles, Spin, Statistics
Antiparticles needed; Complex scalar field; General fields; Lorentz
transformation; Spin–statistics connection; Appendix: Dirac fields
7.5 Quantum Theory of Electromagnetism
Lagrangian density for electrodynamics; Four-vector potential; Gauge
transformations; Coulomb gauge; Commutation relations; Free fields;
Photon momentum and helicity; Radiative decay rates; Selection rules;
Gauge invariance and charge conservation; Local phase invariance; Standard
model
ASSORTED PROBLEMS
BIBLIOGRAPHY
AUTHOR INDEX
SUBJECT INDEX
Preface
This book grew out of the notes for a course I gave for undergraduate physics
students at the University of Texas. In this book I think I go farther forward
than is usual in undergraduate courses, giving readers a taste of nuclear physics
and quantum field theory. I also go farther back than is usual, starting with the
struggle in the nineteenth century to establish the existence and properties of
atoms, including the development of thermodynamics that both aided in this
struggle and offered an alternative program.
I fear that some readers may want to skim through this early part and hurry
on to what they regard as the good stuff, quantum mechanics and relativity. That
would be a pity. In my experience physics students who aim at a career in atomic
or nuclear or elementary particle physics often manage to get through their
formal education without ever becoming familiar with entropy, or equipartition,
or viscosity, or diffusion. That was true in my own case. This book, or a course
based on it, may provide some students with their last chance to learn about
these and other matters needed to understand the macroscopic world.
Readers may find this book unusual also in its strong emphasis on history.
I make a point of saying a little about the welter of theoretical guesswork and ill-
understood experiments out of which modern physics emerged in the twentieth
century. This, it seems to me, is a help in understanding what otherwise may
seem an arbitrary set of postulates for relativity and quantum mechanics. It is
also a matter of personal taste. Research in physics seems to me to lose some of
its excitement if we do not see it as part of a great historical progression. Some
valuable historical works are listed in a bibliography, along with collections of
original articles that I have found most helpful.
But this is not a work of history. Historians aim at uncovering how the scien-
tists of the past thought about their own problems – for instance, how Einstein
in 1905 thought about the measurement of space and time separations in de-
veloping the special theory of relativity. For this aim of historical writing it is
necessary to go deeply into personal accounts, institutional development, and
false starts, and to put aside our knowledge of subsequent progress. I try to
be accurate in describing the state of physics in past times, but the aim of this
book in discussing the problems of the past is different: it is to make clear how
physicists think about these things today.
This book is intended chiefly for physics students who are well into their time
as undergraduates, and for working scientists who want a brief introduction to
some area of modern physics. I have therefore not hesitated to use calculus and
matrix algebra, though not in advanced versions. As required by the subject
matter, the mathematical level here slopes upwards through the book. Where
possible I have chosen concrete rather than abstract formulations of physical
theories. For instance, in Chapter 5, on quantum mechanics, I mostly represent
physical states as wave functions, only coming at the end of the chapter to their
representation as vectors in Hilbert space. In some sections detailed material
that can be skipped without losing the thread of the theory is put into appendices.
Two of these appendices present what in my unbiased opinion are improved
derivations of important results: the appendix to Section 2.6 gives a revised
version of Einstein’s derivation of his formula for the diffusion constant in
Brownian motion, and the appendix to Section 6.4 presents a revision of Gamow’s
calculation of the rate of alpha decay.
In my experience, with some judicious pruning, the material of the book up
to about the middle of Chapter 5 can be covered in a one-term undergraduate
course. But I think that to go over the whole book would take a full two-term
academic year.
This book treats such a broad range of topics that it is impossible to go very
far into any of them. Certainly its treatment of quantum mechanics, statistical
mechanics, transport theory, nuclear physics, and quantum field theory is no
substitute for graduate-level courses on these topics, any one of which would
occupy at least a whole year. This book presents what I think, in an ideal
world, the ambitious physics student would already know when he or she enters
graduate school. At least, it is what I wish that I had known when I entered
graduate school.
In any case, I hope that the student or reader may be sufficiently interested in
what I do discuss that they will want to go into these topics in greater detail in
more specialized books or courses, and that they will find in this book a good
preparation for such further studies.
I am grateful to many students and colleagues for pointing out errors in
the lecture notes on which this book is based and for the expert and friendly
assistance I have received from Simon Capelin and Vince Higgs, the editors at
Cambridge University Press who guided the publication of this book.
STEVEN WEINBERG
1
Early Atomic Theory
It is an old idea that matter consists of atoms, tiny indivisible particles moving
in empty space. This theory can be traced to Democritus, working in the Greek
city of Abdera, on the north shore of the Aegean sea. In the late 400s BC
Democritus proclaimed that “atoms and void alone exist in reality.” He offered
neither evidence for this hypothesis nor calculations on which to base predic-
tions that could confirm it. Nevertheless, this idea was tremendously influential,
if only as an example of how it might be possible to account for natural phe-
nomena without invoking the gods. Atoms were brought into the materialistic
philosophy of Epicurus of Samos, who a little after 300 BC founded one of
the four great schools of Athens, the Garden. In turn, the idea of atoms and
the philosophy of Epicurus were invoked in the poem On the Nature of Things
by the Roman Lucretius. After this poem was rediscovered in 1417 it influ-
enced Machiavelli, More, Shakespeare, Montaigne, and Newton, among others.
Newton in his Opticks speculated that the properties of matter arise from the
clustering of atoms into larger particles, which themselves cluster into larger
particles, and so on. As we will see, Newton made a stab at an atomic theory of
air pressure, but without significant success.
The serious scientific application of the atomic theory began in the eighteenth
century, with calculations of the properties of gases, which had been studied
experimentally since the century before. This is the topic with which we begin
this chapter. Applications to chemistry and electrolysis followed in the nine-
teenth century and will be considered in subsequent sections. The final section
of this chapter describes how the nature of atoms began to be clarified with the
discovery of the electron. In the following chapter we will see how it became
possible to estimate the atoms’ masses and sizes.¹

¹ Further historical details about some of these matters can be found in Weinberg,
The Discovery of Subatomic Particles, listed in the bibliography.
1.1 Gas Properties
Experimental Relations
The upsurge of enthusiasm for experiment in the seventeenth century was
largely concentrated on the properties of air. The execution and reports of these
experiments did not depend on hypotheses regarding atoms, but we need to
recall them here because their results provided the background for later theories
of gas properties that did rely on assumptions about atoms.
It had been thought by Aristotle and his followers that the suction observed
in pumps and bellows arises from nature’s abhorrence of a vacuum. This notion
was challenged in the 1640s by the invention of the barometer by the Florentine
polymath Evangelista Torricelli (1608–1647). If nature abhors a vacuum, then
when a long glass tube with one end closed is filled with mercury and set
upright with the closed end on top, why does the mercury flow out of the bottom
until the column is only 760 mm high, with empty space appearing above the
mercury? Is there a limit to how much nature abhors a vacuum? Torricelli
argued that the mercury is held up instead by the pressure of the air acting
on the open end of the glass tube (or on the surface of a bath of mercury in
which the open end of the tube is immersed), which is just sufficient to support
a column of mercury 760 mm high. If so, then it should be possible to measure
variations in air pressure using a column of mercury in a vertical glass tube, a
device that we know as a barometer. Such measurements were made from 1648
to 1651 by Blaise Pascal (1623–1662), who found that the height of mercury
in a barometer is decreased by moving to the top of a mountain, where less air
extends above the barometer.
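
Torricelli's argument can be put into numbers with the hydrostatic relation p = ρgh, which is not written out in the text. The sketch below assumes standard reference values for the density of mercury and for the acceleration of gravity; the function names are illustrative only.

```python
# Hydrostatic pressure of a liquid column: p = rho * g * h.
# The density and gravity values are standard reference values,
# assumed here for illustration; they are not taken from the text.
RHO_MERCURY = 13595.1   # kg/m^3, mercury at 0 °C (assumed reference value)
G_STANDARD = 9.80665    # m/s^2, standard gravity (assumed reference value)


def column_pressure(height_m, density=RHO_MERCURY, g=G_STANDARD):
    """Pressure in pascals needed to support a liquid column of this height."""
    return density * g * height_m


def column_height(pressure_pa, density=RHO_MERCURY, g=G_STANDARD):
    """Height in metres of the liquid column that a given pressure supports."""
    return pressure_pa / (density * g)


p_sea_level = column_pressure(0.760)
print(f"760 mm of mercury corresponds to about {p_sea_level:.0f} Pa")

# The same pressure supports a much taller column of a lighter liquid
# (water density taken as 1000 kg/m^3, an assumed round value).
water_height = column_height(p_sea_level, density=1000.0)
print(f"equivalent water column: about {water_height:.1f} m")
```

A lower air pressure gives a proportionally shorter column, which is the effect Pascal observed on the mountain.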
The quantitative properties of air pressure soon began to be studied
experimentally, before there was any correct theoretical understanding of gas
properties. In 1662, in the second edition of his book New Experiments Physico-
Mechanical Concerning the Spring of the Air and its Effects, the Anglo-Irish
aristocrat Robert Boyle (1627–1691) described experiments relating the pres-
sure (the “spring of the air”) and volume of a fixed mass of air. He studied a
sample of air enclosed at the end of a glass tube by a column of mercury in
the tube. The air was compressed at constant temperature by pushing on the
mercury’s surface, revealing what came to be known as Boyle’s law, that for
constant temperature the volume of a gas of fixed mass and composition is
inversely proportional to the pressure, now defined by Boyle as the force per
area exerted on the gas.
Temperature Scales
A word must be said about the phrase “at constant temperature.” Boyle lived
before the establishment of our modern Fahrenheit and Celsius scales, whose
forerunners go back respectively to 1724 and 1742. But, although in Boyle’s
time no meaningful numerical value could be given to the temperature of any
given body, it was nevertheless possible to speak with precision of two bodies
being at the same temperature: they are at the same temperature if when put in
contact neither body is felt to grow appreciably hotter or colder. Boyle’s glass
tube could be kept at constant temperature by immersing it in a large bath, say of
water from melting ice. Later the Fahrenheit temperature scale was established
by defining the temperature of melting ice as 32 °F and the temperature of
boiling water at mean atmospheric pressure as 212 °F, and defining a 1 °F
increase of temperature by etching 212 − 32 equal divisions between 32 and
212 on the glass tube of a mercury thermometer. Likewise, in the Celsius scale,
the temperatures of melting ice and boiling water are 0 °C and 100 °C, and
1 °C is the temperature difference required to increase the volume of mercury
in a thermometer by 1% of the volume change in heating from melting ice
to boiling water. As we will see in the next chapter, there is a more sophis-
ticated universal definition of temperature, to which scales based on mercury
thermometers provide only a good approximation.
After the temperature scale was established it became possible to carry out a
quantitative study of the relation between volume and temperature, with pres-
sure and mass kept fixed by enclosing the air in a vessel with flexible walls,
which expand or contract to keep the pressure inside equal to the air pressure
outside. This relation was announced in an 1802 lecture by Joseph Louis
Gay-Lussac (1778–1850), who attributed it to unpublished work in the 1780s by
Jacques Charles (1746–1823). The relation, subsequently known as Charles’
law, is that at constant pressure and mass the volume of gas is proportional
to $T - T_0$, where $T$ is the temperature measured for instance with a mercury
thermometer and $T_0$ is a constant whose numerical value naturally depends
on the units used for temperature: $T_0$ = −459.67 °F = −273.15 °C. Thus
$T_0$ is absolute zero, the minimum possible temperature, at which the gas
volume vanishes. Using Celsius units for temperature differences, the absolute
temperature $T - T_0$ is known today as the temperature in degrees Kelvin,
denoted K.
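
As a small numerical check on the temperature scales and on Charles' law, the sketch below uses only the fixed points and the value of absolute zero quoted above. The function names are illustrative and are not part of the text.

```python
T0_CELSIUS = -273.15      # absolute zero in °C, as quoted in the text
T0_FAHRENHEIT = -459.67   # absolute zero in °F, as quoted in the text


def fahrenheit_to_celsius(t_f):
    """Convert using the fixed points 32 °F = 0 °C and 212 °F = 100 °C."""
    return (t_f - 32.0) * 100.0 / (212.0 - 32.0)


def absolute_temperature(t_c):
    """Absolute temperature T - T0 for a Celsius temperature T."""
    return t_c - T0_CELSIUS


def charles_volume(v_ref, t_ref_c, t_c):
    """Volume at t_c, at fixed pressure and mass, given v_ref at t_ref_c."""
    return v_ref * absolute_temperature(t_c) / absolute_temperature(t_ref_c)


# The two quoted values of T0 describe the same temperature.
print(fahrenheit_to_celsius(T0_FAHRENHEIT))

# Expansion of a gas heated from melting ice to boiling water
# at constant pressure, as a ratio of volumes.
print(charles_volume(1.0, 0.0, 100.0))
```

The volume ratio depends only on the two absolute temperatures, so the same function works for any reference volume.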
Theoretical Explanations
In Proposition 23 of his great book, the Principia, Isaac Newton (1643–1727)
made an attempt to account for Boyle’s law by considering air to consist of
particles repelling each other at a distance. Using little more than dimensional
analysis, he showed that the pressure p of a fixed mass of air is inversely
proportional to the volume V if the repulsive force between particles separated
by a distance r falls off as 1/r. But as he pointed out, if the repulsive force goes
as $1/r^2$, then $p \propto V^{-4/3}$. He did not claim to offer any reason why the repulsive
force should go as 1/r and, as we shall see, it is not forces that go as 1/r but
rather forces of very short range that act only in collisions that mostly account
for the properties of gases.
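
One way to see where these two exponents come from is the scaling sketch below. It is not a reconstruction of Newton's own proof: it assumes, for illustration, that each particle pushes only on its nearest neighbors, and the symbols $N$ (number of particles), $d$ (typical spacing), $f$ (force between neighbors) and $s$ (exponent of the force law) are introduced here.

```latex
\begin{align*}
d &\propto (V/N)^{1/3}, \qquad f(d) \propto d^{-s}, \\
p &\propto \frac{f(d)}{d^{2}} \propto d^{-(s+2)} \propto V^{-(s+2)/3}
   \quad (\text{fixed } N), \\
s = 1 &:\; p \propto V^{-1}, \qquad s = 2 :\; p \propto V^{-4/3}.
\end{align*}
```

The factor $1/d^2$ counts the neighboring pairs that press across a unit area, so the pressure is the force per pair times the number of pairs per area.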
It was the Swiss mathematical physicist Daniel Bernoulli (1700–1782) who
made the first attempt to understand the properties of gases theoretically, on the
assumption that a gas consists of many tiny particles moving freely except in
very brief collisions. In 1738, in the chapter, “On the Properties and Motions of
Elastic Fluids, Especially Air” of his book Hydrodynamics, he argued that in a
gas (then called an “elastic fluid”) with $n$ particles per unit volume moving with
a velocity $v$ that is the same (because of collisions) in all directions, the pressure
is proportional to $n$ and to $v^2$, because the pressure is proportional to the number
of particles in any given volume, to the rate at which they hit any given area of
the wall, which is proportional to $v$, and to the force that each particle exerts on
the wall, which is also proportional to $v$. For a fixed mass of gas $n$ is inversely
proportional to the volume $V$, so $pV$ is proportional to $v^2$. If (as Bernoulli
thought) $v^2$ depends only on the temperature, this explains Boyle’s law. If $v^2$ is
proportional to the absolute temperature, it also gives Charles’ law.
Bernoulli did not give much in the way of mathematical details, and did not
try to say to what else the pressure might be proportional besides $nv^2$, a matter
crucial for the history of chemistry. These details were provided by Rudolf
Clausius (1822–1888) in 1857, in an article entitled “The Nature of the Motion
which We Call Heat.” Below is a more-or-less faithful description of Clausius’
derivation, in a somewhat different notation.
Suppose a particle hits the wall of a vessel and remains in contact with it for a
small time $t$, during which it exerts a force with component $F$ along the inward
normal to the wall. Its momentum in the direction of the inward normal to the
wall will decrease by an amount $Ft$, so if the component of the velocity of the
particle before it strikes the wall is $v_\perp > 0$, and it bounces back elastically with
normal velocity component $-v_\perp$, the change in the inward normal component
of momentum is $-2mv_\perp$, where $m$ is the particle mass, so
$$F = 2mv_\perp/t\,.$$
Now, suppose that this goes on with many particles hitting the wall over a time
interval $T \gg t$, all particles with the same velocity vector $\mathbf{v}$. The number $N$ of
particles that will hit an area $A$ of the wall in this time is the number of particles
in a cylinder with base $A$ and height $v_\perp T$, or
$$N = nAv_\perp T\,,$$
where $n$ is the number density, the number of particles per volume. Each of
these particles is in contact with the wall for a fraction $t/T$ of the time $T$, so the
total force exerted on the wall is
$$FN(t/T) = (2mv_\perp/t) \times (nAv_\perp T) \times (t/T) = 2nmv_\perp^2 A\,.$$
We see that all dependence on the times $t$ and $T$ cancels. The pressure $p$ is
defined as the force per area, so this gives the relation
$$p = 2nmv_\perp^2\,. \tag{1.1.1}$$
This is for the unphysical case in which every particle has the same value of $v_\perp$,
positive in the sense that the particles are assumed to be going toward the wall.
In the real world, different particles will be moving with different speeds in
different directions, and Eq. (1.1.1) should be replaced with
$$p = 2nm \times \tfrac{1}{2}\langle v_\perp^2 \rangle = nm\langle v_\perp^2 \rangle\,, \tag{1.1.2}$$
the brackets indicating an average over all gas particles, with the factor 1/2
inserted in the first expression because only 50% of these particles will be going
toward any given wall area.
To express $\langle v_\perp^2 \rangle$ in terms of the root mean square velocity, Clausius assumed
without proof that “on the average each direction [of the particle velocities]
is equally represented.” In this case, the average square of each component of
velocity equals $\langle v_\perp^2 \rangle$, and the average of the squared velocity vector is then
$$\langle \mathbf{v}^2 \rangle = \langle v_1^2 \rangle + \langle v_2^2 \rangle + \langle v_3^2 \rangle = 3\langle v_\perp^2 \rangle\,,$$
and therefore Eq. (1.1.2) reads
$$p = nm\langle \mathbf{v}^2 \rangle/3\,. \tag{1.1.3}$$
This is essentially the result $p \propto n\langle \mathbf{v}^2 \rangle$ of Bernoulli, except that, with the
factor $m/3$, Eq. (1.1.3) is now an equality, not just a statement of
proportionality. For a fixed mass $M$ of gas occupying a volume $V$, the number density
is $n = M/(mV)$, so Clausius could use Boyle’s law (which he called Mariotte’s
law), which states that $pV$ is constant for fixed temperature, to conclude that
for a given gas $\langle \mathbf{v}^2 \rangle$ depends only on the temperature. Further, as Clausius
remarked, Eq. (1.1.3) together with Charles’ law (which Clausius called the
law of Gay-Lussac) indicates that $\langle \mathbf{v}^2 \rangle$ is proportional to the absolute
temperature $T$. If we like, we can adopt a modern notation and write the constant of
proportionality as $3k/m$, so that
$$m\langle \mathbf{v}^2 \rangle/3 = kT\,, \tag{1.1.4}$$
and therefore Eq. (1.1.3) reads
$$p = nkT\,, \tag{1.1.5}$$
where k is a constant, in the sense of being independent of p, n, and T . But
the choice of notation does not tell us whether k varies from one type of gas
to another or whether it depends on the molecular mass m. Clausius could not
answer this question, and did not offer any theoretical justification for Boyle’s
law or Charles’ law. Clausius deserves to be called the founder of
thermodynamics, discussed in Sections 2.2 and 2.3, but these are not questions that
can be answered by thermodynamics alone. As we will see in the following
section, experiments in the chemistry of gases indicated that k is the same for
all gases, a universal constant now known as Boltzmann’s constant, but the
theoretical explanation for this and for Boyle’s law and Charles’ law had to
wait for the development of kinetic theory and statistical mechanics, the subject
of Section 2.4.
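
The isotropy step and Eqs. (1.1.3) to (1.1.5) can be checked numerically. The sketch below is only an illustration under stated assumptions: all particles are given the same speed and uniformly random directions, the wall normal is taken along the z axis, and the numerical values of Boltzmann's constant, the atomic mass unit, the rounded mass of a nitrogen molecule and the temperature of 300 K are reference or illustrative values that do not appear in the text.

```python
import math
import random

K_BOLTZMANN = 1.380649e-23         # J/K, modern SI value (assumed, not from the text)
ATOMIC_MASS_UNIT = 1.66053907e-27  # kg (assumed reference value)


def random_direction(rng):
    """Unit vector drawn uniformly over the sphere."""
    z = rng.uniform(-1.0, 1.0)     # cos(theta) is uniform for isotropic directions
    phi = rng.uniform(0.0, 2.0 * math.pi)
    s = math.sqrt(1.0 - z * z)
    return (s * math.cos(phi), s * math.sin(phi), z)


def mean_square_normal_fraction(samples, seed=0):
    """Estimate <v_perp^2>/<v^2> for equal speeds and random directions."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        _, _, nz = random_direction(rng)   # wall normal taken along z
        total += nz * nz
    return total / samples


def rms_speed(mass_kg, temperature_k):
    """Root mean square speed from m<v^2>/3 = kT, Eq. (1.1.4)."""
    return math.sqrt(3.0 * K_BOLTZMANN * temperature_k / mass_kg)


def pressure(number_density, mass_kg, mean_square_speed):
    """Eq. (1.1.3): p = n m <v^2> / 3."""
    return number_density * mass_kg * mean_square_speed / 3.0


fraction = mean_square_normal_fraction(200_000)
print(f"<v_perp^2>/<v^2> is about {fraction:.4f}; isotropy predicts 1/3")

m_n2 = 28.0 * ATOMIC_MASS_UNIT       # nitrogen molecule, rounded mass
T = 300.0                            # K, illustrative temperature
v_rms = rms_speed(m_n2, T)
n = 101325.0 / (K_BOLTZMANN * T)     # number density at 101325 Pa from p = nkT
print(f"v_rms is about {v_rms:.0f} m/s, n is about {n:.2e} per cubic metre")
print(f"p from Eq. (1.1.3): {pressure(n, m_n2, v_rms ** 2):.0f} Pa")
```

The last line recovers the pressure that was used to fix n, which only confirms that Eqs. (1.1.3), (1.1.4) and (1.1.5) are mutually consistent; it says nothing about whether k is the same for all gases.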
As indicated by the title of his article, “The Nature of the Motion which We
Call Heat,” Clausius was concerned to show that, at least in gases, the phe-
nomenon of heat is explained by the motion of the particles of which gases are
composed. He defended this view by using his theory to calculate the specific
heat of gases, a topic to be considered in the next chapter.
1.2 Chemistry
Elements
The idea that all matter is composed of a limited number of elements goes back
to the earliest speculations about the nature of matter. At first, in the century
before Socrates, it was supposed that there is just one element: water (Thales) or
air (Anaximenes) or fire (Heraclitus) or earth (perhaps Xenophanes). The idea of
four elements was proposed around 450 BC by Empedocles of Acragas (modern
Agrigento). In On Nature he identified the elements as “fire and water and earth
and the endless height of air.” Classical Chinese sources list five elements: water,
fire, earth, wood, and metal.
Like the theory of atoms, these early proposals of elements did not come
accompanied with any evidence that these really are elements, or any suggestion
how such evidence might be gained. Plato in Timaeus even doubled down and
stated that the difference between one element and another arises from the
shapes of the atoms of which the elements are composed: earth atoms are tiny
cubes, while the atoms of fire, air, and water are other regular polyhedra –
solids bounded respectively by 4, 8, or 20 identical regular polygons, with every
edge and every vertex of each solid the same as every other edge or vertex of
that solid.
By the end of the middle ages this list of elements had come to seem implau-
sible. It is difficult to identify any particular sample of dirt as the element earth,
and fire seems more like a process than a substance. Alchemists narrowed the
list of elements to just three: mercury, sulfur, and salt.
Modern chemistry began around the end of the eighteenth century, with
careful experiments by Joseph Priestley (1733–1804), Henry Cavendish (1743–
1810), Antoine Lavoisier (1743–1794), and others. By 1787 Lavoisier had
worked out a list of 55 elements. In place of air there were several gases:
hydrogen, oxygen, and nitrogen; air was identified as a mixture of nitrogen and
oxygen. There were other non-metals on the list of elements: sulfur, carbon,
and phosphorus, and a number of common metals: iron, copper, tin, lead, silver,
gold, mercury. Lavoisier also listed as elements some chemicals that we now
know are tightly bound compounds: lime, soda, and potash. And the list also
included heat and light, which of course are not substances at all.
Law of Combining Weights

Chemistry was first used to provide quantitative information about atoms by
John Dalton (1766–1844), the son of a poor weaver. His laboratory notebooks
from 1802 to 1804 describe careful measurements of the weights of elements
combining in compounds. He discovered that these weights are always in fixed
ratios. For instance, he found that when hydrogen burns in oxygen, 1 gram of
hydrogen combines with 5.5 grams of oxygen, giving 6.5 grams of water, with
nothing left over. Under the assumption that one particle of water consists of
one atom of hydrogen and one atom of oxygen, one oxygen atom must weigh
5.5 times as much as one hydrogen atom.
As we will see, water was soon discovered to be H$_2$O: two atoms of
hydrogen to each atom of oxygen. If Dalton had known this, he would have
concluded that an oxygen atom weighs 5.5 times as much as two hydrogen
atoms, i.e., 11 times the weight of one hydrogen atom. Of course, more accurate
measurements later revealed that 1 gram of hydrogen combines with about 8
grams of oxygen, so one oxygen atom weighs eight times the weight of two
hydrogen atoms, or 16 times as much as one hydrogen atom. Atomic weights
soon became defined as the weights of atoms relative to the weight of one
hydrogen atom, so the atomic weight of oxygen is 16. (This is only approximate.
Today the atomic weight of the atoms of the most common isotope of carbon
is defined to be precisely 12; with this definition, the atomic weights of the
most common isotopes of hydrogen and oxygen are measured to be 1.007825
and 15.99491.)
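The arithmetic of the last two paragraphs can be sketched in a few lines of Python. This is only an illustration: the function name `relative_atomic_weight` and its arguments are our own choices, and the sketch assumes a compound of hydrogen with one other element whose formula is known (or guessed).

```python
def relative_atomic_weight(mass_ratio, atoms_h, atoms_x):
    """Weight of one atom of X relative to one hydrogen atom.

    mass_ratio: grams of X that combine with 1 gram of hydrogen.
    atoms_h, atoms_x: numbers of H and X atoms in one particle of the
    compound (the assumed chemical formula).
    """
    # mass_ratio = (atoms_x * w_X) / (atoms_h * w_H), with w_H = 1
    return mass_ratio * atoms_h / atoms_x


dalton_ho = relative_atomic_weight(5.5, 1, 1)    # Dalton's data, formula HO
dalton_h2o = relative_atomic_weight(5.5, 2, 1)   # Dalton's data, formula H2O
modern_h2o = relative_atomic_weight(8.0, 2, 1)   # later data, formula H2O
print(dalton_ho, dalton_h2o, modern_h2o)
```

The same measured ratio gives different atomic weights for different assumed formulas, which is why the formulas had to be settled first.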
The following table compares Dalton’s assumed formulas for a few common
compounds with the correct formulas:
Compound          Dalton formula    True formula
Water             HO                H$_2$O
Carbon dioxide    CO$_2$            CO$_2$
Ammonia           NH                NH$_3$
Sulfuric acid     SO$_2$            H$_2$SO$_4$
Here is a list of the approximate true atomic weights for a few elements,
the weights deduced by Dalton, and (in the column marked with an asterisk) the
weights Dalton would have calculated if he had known the true chemical
formulas.
Element    True    Dalton    Dalton*
H          1       1         1
C          12      4.3       8.6
N          14      4.2       12.6
O          16      5.5       11
S          32      14.4      57.6
To make progress in measuring atomic weights, it was evidently necessary
to find some way of working out the correct formulas for various chemical
compounds. This was provided by the study of chemical reactions in gases.
Law of Combining Volumes

On December 31, 1808, Gay-Lussac read a paper to the Société Philomathique
in Paris, in which he announced his observation that gases at the same tem-
perature and pressure always combine in definite proportions of volumes. For
instance, two liters of hydrogen combine with one liter of oxygen to give water
vapor, with no hydrogen or oxygen left over. Likewise, one liter of nitrogen
combines with three liters of hydrogen to give ammonia gas, with nothing left
over. And so on.
The correct interpretation of this experimental result was given in 1811 by
Count Amedeo Avogadro (1776–1856) in Turin. Avogadro’s principle states
that equal volumes of gases at the same temperature and pressure always con-
tain equal numbers of the gas particles, which Avogadro called “molecules,”
particles that may consist of single atoms or of several atoms of the same or
different elements joined together. The observation that water vapor is formed
from a volume of oxygen combined with a volume of hydrogen twice as large
shows, according to Avogadro’s principle, that molecules of water are formed
from twice as many molecules of hydrogen as molecules of oxygen, which is
not what Dalton had assumed.
There was a further surprise in the data. Two liters of hydrogen combined
with one liter of oxygen give not one but two liters of water vapor. This is not
what one would expect if oxygen and hydrogen molecules consist of single
atoms and water molecules consist of two atoms of hydrogen and one atom
of oxygen. In that case two liters of hydrogen plus one liter of oxygen would
produce one liter of water vapor. Avogadro could conclude that if, as seemed
plausible, molecules of water contain two atoms of hydrogen and one atom of
oxygen, the molecules of oxygen and hydrogen must each contain two atoms.
That is, taking water molecules as H$_2$O, the reaction for producing molecules
of water is
$$ 2\,\mathrm{H_2} + \mathrm{O_2} \to 2\,\mathrm{H_2O} . $$
The use of Avogadro’s principle rapidly provided the correct formulas for gases
such as CO$_2$, NH$_3$, NO, and so on. Knowing these formulas and measuring
the weights of gases participating in various reactions, it was possible to cor-
rect Dalton’s atomic weights and calculate more reliable values for the atomic
weights of the atoms in gas molecules, relative to any one of them. Taking the
atomic weight of hydrogen as unity, this gave atomic weights close to 12 for
carbon, 14 for nitrogen, 16 for oxygen, 32 for sulfur, and so on. Then, knowing
these atomic weights, it became possible to find atomic weights for many other
elements, not just those commonly found in gases, by measuring the weights of
elements combining in various chemical reactions.
The Gas Constant

As we saw in the previous section, in 1857 Clausius had shown that in a gas
consisting of $n$ particles of mass $m$ per volume with mean square velocity $\langle v^2\rangle$,
the pressure is $p = nm\langle v^2\rangle/3$. Using Charles’ law, he concluded that $\langle v^2\rangle$ is
proportional to absolute temperature. Writing this relation as $m\langle v^2\rangle/3 = kT$
with $k$ some constant gives Eq. (1.1.5), $p = nkT$. But this in itself does not
tell us how $k$ varies from one gas to another. This is answered by Avogadro’s
principle. With $N$ particles in a volume $V$, the number density is $n = N/V$, so
Eq. (1.1.5) can be written
$$ pV = NkT . \qquad (1.2.1) $$
If as stated by Avogadro the number of molecules in a gas with a given pressure,
volume, and temperature is the same for any gas, then $k = pV/NT$ must be the
same for any gas. Clausius did not draw this conclusion, perhaps because there
was then no known theoretical basis for Avogadro’s principle. The universality
of the constant k, and hence Avogadro’s principle, were explained later by
kinetic theory, to be covered in the next chapter. The constant k came to be
called Boltzmann’s constant, after Ludwig Boltzmann, who as we shall see was
one of the chief founders of kinetic theory.
The molecular weight $\mu$ of any compound is defined as the sum of the atomic
weights of the atoms in a single molecule. The actual mass $m$ of a molecule
is its molecular weight times the mass $m_1$ of a hypothetical atom with atomic
weight unity:
$$ m = \mu m_1 . \qquad (1.2.2) $$
In the modern system of atomic weights, with the atomic weight of the most
common isotope of carbon defined as precisely 12, $m_1 = 1.660539 \times 10^{-24}$ g,
which of course was not known in Avogadro’s time. A mass $M$ contains
$N = M/m = M/m_1\mu$ molecules, so the ideal gas law (1.2.1) can be written
$$ pV = MkT/m_1\mu = (M/\mu)RT \qquad (1.2.3) $$
where $R$ is the gas constant
$$ R = k/m_1 . \qquad (1.2.4) $$
Physicists in the early nineteenth century could use Eq. (1.2.3) to measure R,
and they found a value close to the modern value R = 8.314 J/K. This would
have allowed a determination of m1 and hence of the masses of all atoms of
known atomic weight if k were known, but k did not become known until the
developments described in Section 2.6.
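As a numerical check on Eqs. (1.2.3) and (1.2.4), here is a minimal Python sketch. The modern value of Boltzmann’s constant is supplied as an input that is not quoted in the text, the function names are ours, and the sample of hydrogen gas is a hypothetical example.

```python
K_BOLTZMANN = 1.380649e-23   # J/K, modern value (an input, not from the text)
M1_GRAMS = 1.660539e-24      # g, mass of unit atomic weight quoted above


def gas_constant(k, m1_grams):
    """Eq. (1.2.4): R = k / m1. With m1 in grams this is in J/K per mole."""
    return k / m1_grams


def pressure(mass_g, mu, volume_m3, temperature_k, r):
    """Eq. (1.2.3) solved for p: p = (M / mu) R T / V, in pascals."""
    return (mass_g / mu) * r * temperature_k / volume_m3


R = gas_constant(K_BOLTZMANN, M1_GRAMS)
# Hypothetical sample: 2 grams of H2 (mu = 2) in 22.4 liters at 273.15 K
p = pressure(2.0, 2.0, 22.4e-3, 273.15, R)
print(R, p)
```

The pressure comes out close to one standard atmosphere, as expected for one mole of gas in 22.4 liters at the freezing point of water.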
Avogadro’s Number
Incidentally, a mole of any element or compound of molecular weight $\mu$ is
defined as $\mu$ grams, so in Eq. (1.2.3) the ratio $M/\mu$, with $M$ expressed in grams, equals
the number of moles of gas. Since $N = M/m_1\mu$, one mole contains a number of
molecules equal to $1/m_1$ with $m_1$ given in grams. This is known as Avogadro’s
number. But of course Avogadro did not know Avogadro’s number. It is now
known to be $6.02214 \times 10^{23}$ molecules per mole, corresponding to unit molecular
weight $m_1 = 1.66054 \times 10^{-24}$ grams. The measurement of Avogadro’s
number was widely recognized in the late nineteenth century as one of the great
challenges facing physics.
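The relation between Avogadro’s number and $m_1$ is simple enough to check directly. The following Python lines are a sketch only; the helper names are ours, and the 18 grams of water (molecular weight taken as 18) is a hypothetical example.

```python
M1_GRAMS = 1.66054e-24  # g, mass of unit molecular weight quoted above


def avogadro_number(m1_grams):
    """Molecules per mole: 1 / m1 with m1 in grams."""
    return 1.0 / m1_grams


def molecules_in(mass_g, mu, m1_grams=M1_GRAMS):
    """N = M / (m1 * mu) for a mass M of molecular weight mu."""
    return mass_g / (m1_grams * mu)


N_A = avogadro_number(M1_GRAMS)
n_water = molecules_in(18.0, 18.0)   # one mole of water, taking mu = 18
print(N_A, n_water)
```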
1.3 Electrolysis
Early Electricity
Electricity was known in the ancient world, as what we now call static
electricity. Amber rubbed with fur was seen to attract or repel small bits of light
material. Plato in Timaeus mentions “marvels concerning the attraction of
amber.” (This is where the word electricity comes from; the Greek word for
amber is “elektron.”)
Electricity began to be studied scientifically in the eighteenth century. Two
kinds of electricity were distinguished: resinous electricity is left on an amber
rod when rubbed with fur, while vitreous electricity is left on a glass rod when
rubbed with silk. Unlike charges were found to attract each other, while like
charges repel each other. Benjamin Franklin (1706–1790) gave our modern
terms positive and negative to vitreous and resinous electricity, respectively.
In 1785 Charles-Augustin de Coulomb (1736–1806) reported that the force
$F$ between two bodies carrying charges $q_1$ and $q_2$ separated by a distance $r$ is
$$ F = \frac{k_e q_1 q_2}{r^2} \qquad (1.3.1) $$
where $k_e$ is a universal constant. For like and unlike charges the product $q_1 q_2$
is positive or negative, respectively, indicating a repulsive or attractive force.
Coulomb had no way of actually measuring these charges, but he could reduce
the charge on a body by a factor 2 by touching it to an uncharged body of the
same material and size, and observe that this reduces the force between it and
any other charged body by the same factor 2. The introduction of our modern
units of electric charge had to wait until the quantitative study of magnetism.
Early Magnetism
Magnetism too was known in the ancient world, as what we now call per-
manent magnetism. The Greeks knew of naturally occurring lodestones that
could attract or repel small bits of iron. Plato’s Timaeus refers to lodestones as
“Heraclean stones.” (Our word magnet comes from the city Magnesia in Asia
Minor, near where lodestones were commonly found.)
Very early the Chinese also discovered the lodestone and used it as a magnetic
compass (a “south-seeking stone”) for purposes of geomancy and navigation.
Each lodestone has a south-seeking pole at one end, attracted to a point near the
South Pole of the Earth, and a north-seeking pole at the other end, attracted to a
point near the Earth’s North Pole. Magnetism was first studied scientifically by
William Gilbert (1544–1603), court physician to Elizabeth I. It was observed
that the south-seeking poles of different lodestones repel each other, and like-
wise for the north-seeking poles, while the south-seeking pole of one lodestone
attracts the north-seeking pole of another lodestone. Gilbert concluded that one
pole of a lodestone is pulled toward the north and the other toward the south
because the Earth itself is a magnet, with what in a lodestone would be its
south-seeking and north-seeking poles respectively near the Earth’s North Pole
and South Pole.
Electromagnetism
It began to be possible to explore the relations between electricity and magnetism
quantitatively with the invention in 1800 of electric batteries by Count
Alessandro Volta (1745–1827). These were stacks of disks of two different
metals separated by cardboard disks soaked in salt water. Such batteries drive
steady currents of electricity through wires attached to the ends of the stacks,
with positive and negative terminals identified respectively as the ends of the
stacks from which and towards which electric current flows.
In July 1820 Hans Christian Oersted (1777–1851) in Copenhagen noticed that
turning on an electric current deflected a nearby compass needle, and concluded
that electric currents exert force on magnets. Conversely, he found also that
magnets exert force on wires carrying electric currents.
These discoveries were carried further in Paris a few months later by André-Marie
Ampère (1775–1836), who found that wires carrying electric current
exert force on each other. For two parallel wires of length $L$ carrying electric
currents (charge per second) $I_1$ and $I_2$, and separated by a distance $r \ll L$, the
force is
$$ F = \frac{k_m I_1 I_2 L}{r} , \qquad (1.3.2) $$
where $k_m$ is another universal constant. The force is attractive if the currents
are in the same direction; repulsive if in opposite directions. One ampere is
defined so that $F = 2 \times 10^{-7} \times L/r$ newtons if $I_1 = I_2 = 1$ ampere. (That
is, $k_m \equiv 2 \times 10^{-7}$ N/ampere$^2$.) The electromagnetic unit of electric charge, the
coulomb, is defined as the electric charge carried in one second by a current
of one ampere. A modern ammeter measures electric currents by observing the
magnetic force produced by current flowing through a wire loop.
The connection between electricity and magnetism was strengthened in 1831
by Michael Faraday (1791–1867), at the Royal Institution in London. He dis-
covered that changing magnetic fields generate electric forces that can drive
currents in conducting wires. This is the principle underlying the generation of
electric currents today. Electricity began soon after to have important practical
applications, with the invention in 1837 of the electric telegraph by the American
painter Samuel F. B. Morse (1791–1872).
Finally, in the 1870s, the great Scottish physicist James Clerk Maxwell
(1831–1879) showed that the consistency of the equation for the generation
of magnetic fields by electric currents required that magnetic fields are also
generated by changing electric fields. In particular, while oscillating magnetic
fields produce oscillating electric fields, also oscillating electric fields produce
oscillating magnetic fields, so a self-sustaining oscillation in both electric and
magnetic fields can propagate in apparently empty space. Maxwell calculated
the speed of its propagation and found it to equal $\sqrt{2k_e/k_m}$,$^2$ numerically about
equal to the measured speed of light, suggesting strongly that light is such a
self-sustaining oscillation in electric and magnetic fields. We will see more of
Maxwell’s equations in subsequent chapters, especially in Chapters 4 and 5.
2 This quantity is independent of the units used for electric charge as long as the currents appearing in
Eq. (1.3.2) are defined as the rates of flow of charge in the same units as used in Eq. (1.3.1). It is obviously
also independent of the units used for force, as long as the same force units are used in Eqs. (1.3.1) and
(1.3.2).
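Maxwell’s result can be checked numerically. The Python sketch below takes the SI value of the Coulomb constant and, as an assumption stated loudly here, $k_m = 2 \times 10^{-7}$ N/ampere$^2$, the value that corresponds to Eq. (1.3.2) in SI units; the function name and the rescaling factor are ours. The last lines illustrate the point of footnote 2: changing the unit of charge rescales $k_e$ and $k_m$ by the same factor and leaves the speed unchanged.

```python
import math

K_E = 8.9875517923e9   # N m^2 / C^2, constant in Eq. (1.3.1), SI value
K_M = 2.0e-7           # N / A^2, constant in Eq. (1.3.2), SI value (assumed here)


def wave_speed(k_e, k_m):
    """Speed of the self-sustaining oscillation, sqrt(2 k_e / k_m)."""
    return math.sqrt(2.0 * k_e / k_m)


c = wave_speed(K_E, K_M)

# Measure charge in a unit s times larger: numerical charges shrink by s,
# so both k_e and k_m grow by s**2 and the ratio is unchanged.
s = 3.0e9              # arbitrary rescaling factor, for illustration only
c_rescaled = wave_speed(K_E * s**2, K_M * s**2)
print(c, c_rescaled)
```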
Discovery of Electrolysis
Electrolysis was discovered in 1800 by the chemist William Nicholson (1753–
1815) and the surgeon Anthony Carlisle (1768–1840). They found that bubbles
of hydrogen and oxygen would be produced where wires attached respectively
to the negative and positive terminals of a Volta-style battery were inserted in
water. Sir Humphry Davy (1778–1829), Faraday’s boss at the Royal Institution,
carried out extensive experiments on the electrolysis of molten salts, finding
for instance that, in the electrolysis of molten table salt, sodium, a previously
unknown metal, was produced at the wire attached to the negative terminal of
the battery and a greenish gas, chlorine, was produced at the wire attached
to the other, positive, terminal. Davy’s electrolysis experiments added several
metals aside from sodium to Lavoisier’s list of elements, including aluminum,
potassium, calcium, and magnesium.
A theory of electrolysis was worked out by Faraday. In modern terms, a
small fraction ($1.8 \times 10^{-9}$ at room temperature) of water molecules are normally
dissociated into positive hydrogen ions (H$^+$), which are attracted to the
wire attached to the negative terminal of a battery, and negative hydroxyl ions
(OH$^-$), which are attracted to the wire attached to the positive terminal. At the
wire attached to the negative terminal, two H$^+$ ions combine with two units of
negative charge from the battery to form a neutral H$_2$ molecule. At the wire
attached to the positive terminal, four OH$^-$ ions give one O$_2$ molecule plus
two H$_2$O molecules plus four units of negative charge, which flow through the
battery to the negative terminal.$^3$
Likewise, a small fraction of molten table salt (NaCl) molecules are normally
dissociated into Na$^+$ ions and Cl$^-$ ions. At the wire attached to the negative
terminal of a battery, one Na$^+$ ion plus one unit of negative charge gives one
atom of metallic sodium (Na); at the wire attached to the positive terminal, two
Cl$^-$ ions give one chlorine (Cl$_2$) molecule and two units of negative charge,
which flow through the battery to the negative terminal.
In Faraday’s theory, it takes one unit of electric charge to convert a singly
charged ion such as H$^+$ or Cl$^-$ to a neutral atom or molecule, so since molecules
of molecular weight $\mu$ have mass $\mu m_1$, it takes $M/m_1\mu$ units of electric charge
to convert a mass $M$ of singly charged ions to a mass $M$ of neutral atoms
or molecules of molecular weight $\mu$. Experiment showed that it takes about
96 500 coulombs (e.g., one ampere for about 96 500 seconds) to convert $\mu$
grams (that is, one mole) of singly charged ions to neutral atoms or molecules.
(This is called a faraday; the modern value is 96 485.3 coulombs/mole.) Hence
Faraday knew that $e/m_1 \simeq 96\,500$ coulombs/gram, where $e$ is the unit of
electric charge, which was called an “electrine” in 1874 by the Irish physicist
George Johnstone Stoney (1826–1911). Having measured the faraday, if
physicists knew the value of $e$ then they would know $m_1$, but they didn’t have
this information until later. Also, no one then knew that $e$ is the charge of an
actual particle.
3 We now know that it is negative charge, i.e., electrons, that flows through a battery. As far as Faraday knew,
it was equally possible that positive charges flow through a battery, in which case at the wire attached to the
negative terminal two H$^+$ ions would give an H$_2$ molecule plus two units of positive charge, which would
flow through the battery to the wire attached to the positive terminal, where four OH$^-$ ions plus four units
of positive charge would give an O$_2$ molecule and two H$_2$O molecules.
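To see how a value of $e$ would have fixed $m_1$, here is a small Python sketch. It uses the modern values of the faraday and of $e$, neither of which was available to Faraday; the function names and the sodium example are ours.

```python
FARADAY = 96485.3          # coulombs per mole, modern value (an input)
E_CHARGE = 1.602177e-19    # coulombs, modern value of e (not known to Faraday)


def unit_mass_grams(e, faraday):
    """From e / m1 = faraday (coulombs per gram): m1 = e / faraday."""
    return e / faraday


def charge_to_neutralize(mass_g, mu, faraday=FARADAY):
    """Coulombs needed to neutralize a mass M of singly charged ions."""
    return (mass_g / mu) * faraday


m1 = unit_mass_grams(E_CHARGE, FARADAY)
q = charge_to_neutralize(23.0, 23.0)   # one mole of Na+ ions, taking mu = 23
print(m1, q)
```

The result for `m1` agrees with the value $1.66054 \times 10^{-24}$ grams quoted in Section 1.2.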
1.4 The Electron
As sometimes happens, in 1858 a new path in fundamental physics was opened
with the invention of a practical device, in this case an improved air pump. In his
pump the Bonn craftsman Heinrich Johann Geissler (1814–1879) used a column
of mercury as a piston, in this way greatly reducing the leakage of air through
the piston that had troubled all previous air pumps. With his pump Geissler was
able to reduce the pressure in a closed glass tube to about a ten-thousandth of
the typical air pressure on the Earth’s surface.
With such a near vacuum in a glass tube, electric currents could travel without
wires through the tube. It was discovered that an electric current would flow
from a cathode, a metal plate attached to the negative terminal of a powerful
electric battery, fly through a hole in an anode, another metal plate attached to
the positive pole of the battery, and light up a spot on the far wall of the tube.
Adding small amounts of various gases to the interior of the tube caused these
cathode rays to light up, with orange or pink or blue-green light emitted along
the path of the ray, when neon, helium, or mercury vapor was added. Using
Geissler’s pumps, Julius Plücker (1801–1868) in 1858–1859 found that cathode
rays could be deflected by magnetic fields, thus moving the spot of light where
the ray hits the glass at the tube end.
In 1897 Joseph John Thomson (1856–1940), the successor to Maxwell as
Cavendish Professor at Cambridge, began a series of measurements of the
deflection of cathode rays. In his experiments, after the ray particles pass
through the anode they feel an electric or magnetic force F exerted at a right
angle to their direction of motion for a distance d along the ray. They then drift
in a force-free region for a distance $D \gg d$ until they hit the end of the tube. If
a ray particle has velocity v along the direction of the ray, it feels the electric
or magnetic force for a time d/v and then drifts for a longer time D/v. A force
F normal to the ray gives ray particles of mass m a component of velocity
perpendicular to the ray that is equal to the acceleration F /m times the time
d/v, so by the time they hit the end of the tube they have been displaced by an
amount
$$ \text{displacement} = (F/m) \times (d/v) \times (D/v) = \frac{F d D}{m v^2} . $$
The forces exerted on a charge $e$ by an electric field $E$ or a magnetic field $B$ at
right angles to the ray are
$$ F_{\rm elec} = eE , \qquad F_{\rm mag} = evB $$
so
$$ \text{electric displacement} = \frac{eEdD}{mv^2} , $$
$$ \text{magnetic displacement} = \frac{eBdD}{mv} . $$
Thomson wanted to measure e/m. He knew D, d, E, and B, but not v. He
could eliminate v from these equations if he could measure both the electric and
magnetic displacements, but the electric displacement was difficult to measure.
A strong electric field tends to ionize any residual air in the tube, with positive
and negative ions pulled to the negatively and positively charged plates that
produce the electric fields, neutralizing their charges. Finally Thomson suc-
ceeded in measuring the electric as well as the magnetic deflection by using
a cathode ray tube with very low air pressure. (Both the electric and magnetic
displacements were only a few inches.) This gave results for the ratio of charge
to mass ranging from $6 \times 10^7$ to $10^8$ coulombs per gram.
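The elimination of $v$ can be made explicit. Dividing the magnetic displacement by the electric displacement gives $Bv/E$, which fixes $v$; inserting this in the formula for the magnetic displacement then gives $e/m$. The Python sketch below does this as a round-trip check. All numbers are invented for illustration (they are not Thomson’s data), SI units are assumed, and the function names are ours.

```python
def ray_velocity(e_field, b_field, x_elec, x_mag):
    """v from the ratio of displacements: x_mag / x_elec = B v / E."""
    return e_field * x_mag / (b_field * x_elec)


def charge_to_mass(e_field, b_field, d, big_d, x_elec, x_mag):
    """e/m from the magnetic displacement x_mag = (e/m) B d D / v."""
    v = ray_velocity(e_field, b_field, x_elec, x_mag)
    return x_mag * v / (b_field * d * big_d)


# Hypothetical apparatus and particle (SI units), chosen only for the check
E_FIELD, B_FIELD = 1.0e4, 5.0e-4        # volts/meter, tesla
D_SHORT, D_LONG = 0.05, 1.0             # meters
TRUE_E_OVER_M, TRUE_V = 1.0e11, 3.0e7   # coulombs/kilogram, meters/second

# Displacements predicted by the two formulas above
x_elec = TRUE_E_OVER_M * E_FIELD * D_SHORT * D_LONG / TRUE_V**2
x_mag = TRUE_E_OVER_M * B_FIELD * D_SHORT * D_LONG / TRUE_V

recovered = charge_to_mass(E_FIELD, B_FIELD, D_SHORT, D_LONG, x_elec, x_mag)
print(recovered / 1000.0)   # converted to coulombs per gram
```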
Thomson compared this with the result that Faraday had found in measurements
of electrolysis, that $e/m_1 \approx 10^5$ coulombs per gram, where $e$ is the
electric charge of a singly ionized atom or molecule (such as a sodium ion in
the electrolysis of NaCl) and $m_1$ is the mass of a hypothetical atom of atomic
weight unity, close to the mass of the hydrogen atom. He reasoned that if the
particles in his cathode rays are the same as those transferred in electrolysis,
then their charge must be the same as $e$, so their mass must be about $10^{-3} m_1$.
Thomson concluded that since the cathode ray particles are so much lighter than
ions or atoms, they must be the basic constituents of ions and atoms.
Thomson had still not measured e or m. He had not even shown that cathode
rays are streams of particles; they might be streams of electrically charged fluid,
with any volume of fluid having a ratio of charge to mass equal to his measured
e/m. Nevertheless, in the following decade it became widely accepted that
Thomson had indeed discovered a particle present in atoms, and the particle
came to be called the electron.
2 Thermodynamics and Kinetic Theory
The successful uses of atomic theory described in the previous chapter did not
settle the existence of atoms in all scientists’ minds. This was in part because
of the appearance in the first half of the nineteenth century of an attractive
competitor, the physical theory of thermodynamics. As we shall see in the first
three sections of this chapter, with thermodynamics one may derive powerful
results of great generality without ever committing oneself to the existence of
atoms or molecules. But thermodynamics could not do everything. Section 2.4
will describe the advent of kinetic theory, which is based on the assumption that
matter consists of very large numbers of particles, and its generalization to sta-
tistical mechanics. From these thermodynamics could be derived, and together
with the atomic hypothesis it yielded results far more powerful than could be
obtained from thermodynamics alone. Even so, it was not until the appearance
of direct evidence for the graininess of matter, described in Section 2.6, that
the existence of atoms became almost universally accepted.
2.1 Heat and Energy
The first step in the development of thermodynamics was the recognition that
heat is a form of energy. Though so familiar to us today, this was far from
obvious to the physicists and chemists of the early nineteenth century. Until the
1840s heat was widely regarded as a fluid, named caloric by Lavoisier. Caloric
theory was used to calculate the speed of sound by Pierre-Simon Laplace
(1749–1827) in 1816, the conduction of heat by Joseph Fourier (1768–1830) in
1807 and 1822, and the efficiency of steam engines by Sadi Carnot (1796–1832)
in 1824, whose work as we will see in the next section became a foundation
of thermodynamics. Adding to the confusion, other scientists considered heat
as some sort of wave. This reflected uncertainty regarding the nature of what
is now called infrared radiation, discovered by William Herschel (1738–1822)
in 1800.
Heat as Energy
In 1798 Benjamin Thompson (1753–1814), an American expatriate in Eng-
land, offered evidence against the idea that heat is a fluid. (Thompson is also
known as Count Rumford, a title he was given when he later served as military
adviser in Austria.) It was well known that boring a cannon produces heat,
which might be supposed to be due to the liberation of caloric from the iron,
but Rumford observed that if the heat is carried away by immersing the cannon
in running water while it is being bored there is no limit to the heat that can be
produced.
The first measurement of the energy in heat was provided in the mid-1840s
by James Prescott Joule (1818–1889). In his apparatus a falling weight turned
paddles in a tank of water, heating the water. The gravitational force on a mass $m$
kilograms is $m$ times the acceleration of gravity, 9.8 meters/sec$^2$ or 9.8 newtons
per kilogram. Work is force times distance, so dropping one kilogram a distance
of one meter gave it an energy equal to 9.8 newton meters, now also known as
9.8 joules. Joule found that the paddles driven by this dropping weight would
raise the temperature of 100 grams of water by 0.023 °C, so the paddles produced
heat equal to 0.023 × 100 calories, the calorie being defined as the heat
required to raise the temperature of one gram of water by one degree Celsius.
Hence Joule could conclude that 9.8 joules is equivalent to 2.3 calories, so one
calorie is equivalent to 9.8/2.3 = 4.3 joules. The modern value is 4.184 joules.
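Joule’s arithmetic is short enough to restate as a Python sketch. The function name and argument names are ours; the numbers are those quoted in the paragraph above.

```python
G = 9.8  # acceleration of gravity, meters/sec^2


def joules_per_calorie(mass_kg, drop_m, water_g, delta_t_c, g=G):
    """Mechanical equivalent of heat from a falling-weight experiment."""
    work_j = mass_kg * g * drop_m      # work = force x distance
    heat_cal = water_g * delta_t_c     # 1 calorie heats 1 g of water by 1 °C
    return work_j / heat_cal


ratio = joules_per_calorie(1.0, 1.0, 100.0, 0.023)
print(round(ratio, 1))
```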
In 1847 the Prussian physician and physicist Hermann von Helmholtz
(1821–1894) put forward the idea of the universal conservation of energy,
whether in the form of kinetic or potential or chemical energy or heat. But
what sort of energy is heat? For some nineteenth century physicists the question
was irrelevant. They developed the science of heat known as thermodynamics,
which did not depend on any detailed model of heat energy. But there was one
context in which the nature of heat energy seemed evident. In his great 1857
paper, The Nature of the Motion which We Call Heat, Clausius found that at
least part of the heat energy of gases is the kinetic energy of their molecules.
Kinetic Energy
The concept of kinetic energy was long familiar. If a steady force $F$ is exerted
on a particle of mass $m$, it produces an acceleration $F/m$, so after a time $t$ the
velocity of the body is $v = Ft/m$. The distance traveled in this time is $t$ times
the average velocity $v/2$, and the work done on the particle is the force times
this distance:
$$ F \times t \times Ft/2m = F^2 t^2/2m = mv^2/2 . $$
Instead of this work going into heating a tub of water, as in the experiment of
Joule, it goes into giving the particle an energy $mv^2/2$.
This energy has the special property of being conserved when bodies come
into contact in collisions. Consider a collision between two rigid balls A and B
with initial vector velocities $\mathbf{v}_A$ and $\mathbf{v}_B$. For the moment suppose that the time
interval $t$ during which the balls are in contact is sufficiently brief that the forces acting on
the balls do not change appreciably during this time. The force that A exerts on
B is equal and opposite to the force $\mathbf{F}$ that B exerts on A, so Newton’s second
law tells us that the final velocities of A and B are $\mathbf{v}'_A = \mathbf{v}_A + \mathbf{F}t/m_A$ and
$\mathbf{v}'_B = \mathbf{v}_B - \mathbf{F}t/m_B$. Hence, as Newton showed, momentum is conserved:
$$ m_A \mathbf{v}'_A + m_B \mathbf{v}'_B = m_A \mathbf{v}_A + m_B \mathbf{v}_B . \qquad (2.1.1) $$
Neglecting changes in acceleration during the brief time $t$, the vector displacements
traveled by A and B equal $t$ times the average velocities, $[\mathbf{v}_A + \mathbf{v}'_A]/2$
and $[\mathbf{v}_B + \mathbf{v}'_B]/2$, respectively. If the balls remain in contact during this time
interval, then these displacements must be the same, so
$$ \mathbf{v}_A + \mathbf{v}'_A = \mathbf{v}_B + \mathbf{v}'_B . \qquad (2.1.2) $$
To derive a second conservation law, rewrite Eq. (2.1.2) as $\mathbf{v}'_B - \mathbf{v}'_A = \mathbf{v}_A - \mathbf{v}_B$
and square this, giving
$$ \mathbf{v}_B'^{\,2} - 2\mathbf{v}'_B \cdot \mathbf{v}'_A + \mathbf{v}_A'^{\,2} = \mathbf{v}_B^2 - 2\mathbf{v}_B \cdot \mathbf{v}_A + \mathbf{v}_A^2 . $$
Multiply this with $m_A m_B$ and add the square of Eq. (2.1.1), so that the scalar
products cancel. Dividing by $2(m_A + m_B)$, the result is another conservation law,
$$ \frac{m_A}{2}\mathbf{v}_A'^{\,2} + \frac{m_B}{2}\mathbf{v}_B'^{\,2} = \frac{m_A}{2}\mathbf{v}_A^2 + \frac{m_B}{2}\mathbf{v}_B^2 . \qquad (2.1.3) $$
Equations (2.1.1) and (2.1.3) have been derived here only for the case in
which the particles are in contact only for a brief time interval during which
the force acting between the bodies is constant, but this is not an essential
requirement, for we can break up any time interval into a large number of brief
intervals in each of which the change in the force is negligible. Then, since
$m_A\mathbf{v}_A + m_B\mathbf{v}_B$ and $m_A\mathbf{v}_A^2/2 + m_B\mathbf{v}_B^2/2$ do not change in each interval, they do
not change at all, as long as the bodies exert forces on each other only when they
are in contact.
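The two conservation laws can be checked numerically. The Python sketch below solves Eqs. (2.1.1) and (2.1.2), taken exactly as written above, for the final velocities, and then compares momentum and kinetic energy before and after. The masses and velocities are arbitrary illustrative values, and the function names are ours.

```python
def collide(m_a, v_a, m_b, v_b):
    """Final velocities obtained by solving Eqs. (2.1.1) and (2.1.2).

    Velocities are tuples of components; each component is treated alike.
    """
    total = m_a + m_b
    v_a_final = tuple(((m_a - m_b) * a + 2.0 * m_b * b) / total
                      for a, b in zip(v_a, v_b))
    v_b_final = tuple(((m_b - m_a) * b + 2.0 * m_a * a) / total
                      for a, b in zip(v_a, v_b))
    return v_a_final, v_b_final


def momentum(m_a, v_a, m_b, v_b):
    """Total momentum m_A v_A + m_B v_B, component by component."""
    return tuple(m_a * a + m_b * b for a, b in zip(v_a, v_b))


def kinetic_energy(m_a, v_a, m_b, v_b):
    """Total of m v^2 / 2 for the two balls."""
    return (0.5 * m_a * sum(a * a for a in v_a)
            + 0.5 * m_b * sum(b * b for b in v_b))


# Arbitrary illustrative values
M_A, V_A = 2.0, (3.0, 0.0, 1.0)
M_B, V_B = 1.0, (-1.0, 2.0, 0.0)

V_A_FINAL, V_B_FINAL = collide(M_A, V_A, M_B, V_B)
p_before = momentum(M_A, V_A, M_B, V_B)
p_after = momentum(M_A, V_A_FINAL, M_B, V_B_FINAL)
ke_before = kinetic_energy(M_A, V_A, M_B, V_B)
ke_after = kinetic_energy(M_A, V_A_FINAL, M_B, V_B_FINAL)
print(p_before, p_after, ke_before, ke_after)
```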
In 1669 Christiaan Huygens (1629–1695) reported in Journal des Sçavans
that he had confirmed the conservation of the total of $m\mathbf{v}^2/2$, probably by
observing collisions of pendulum bobs, for which initial and final velocities
could be precisely determined. Newton in the Principia called the conserved
quantity $m\mathbf{v}$ the quantity of motion, while Huygens gave the name vis viva
(“living force”) to the conserved quantity $m\mathbf{v}^2/2$. These two quantities have
since become known as momentum and kinetic energy.
On the other hand, it was essential in deriving the conservation of kinetic
energy that we assumed that particles interact only when in contact. This is
generally a good approximation in gases, but it is not valid in the presence of
long range forces, such as electromagnetic or gravitational forces. In such cases
kinetic energy is not conserved – it is only the sum of kinetic energy plus some
sort of potential energy that does not change.
Specific Heat
The total kinetic energy of $N$ molecules of gas of mass $m$ and mean square
velocity $\langle v^2\rangle$ is $Nm\langle v^2\rangle/2$. Clausius had found the relation (1.1.4) between
mean square velocity and absolute temperature, according to which $m\langle v^2\rangle/2 = 3kT/2$,
where $k$ is some constant (later identified as a universal constant of
nature), so the total kinetic energy is $3NkT/2$. A mass $M$ of gas of molecular
weight $\mu$ contains $N = M/\mu m_1$ molecules, so the total kinetic energy is
$3MRT/2\mu$, where $R = k/m_1$ is the gas constant (1.2.4). Clausius concluded
that to raise the temperature of a mass $M$ of gas of molecular weight $\mu$ by
an amount $dT$ at constant volume, so that the gas does no work on its container,
requires an energy $dE = 3MR\,dT/2\mu$. The ratio $dE/M\,dT$ is known as
the specific heat, so Clausius found that the specific heat of a gas at constant
volume is
$$ C_v = 3R/2\mu . \qquad (2.1.4) $$
This result must be distinguished from the value for a different sort of specific
heat, measured at constant pressure, such as when the gas is in a container with
an expandable wall, for which the volume V can change to keep the pressure p
equal to the pressure of the surrounding air or other medium. When pressure
pushes a surface of area A a small distance dL, the work done is the force pA
times dL, which equals pdV where dV = AdL is the change in volume.
According to the ideal gas law (1.2.3), pV = RT M/μ, so if the temperature
is increased by an amount dT , then at constant pressure the gas does work
pdV = RMdT /μ, and this temperature increase therefore requires an energy
3MRdT /2μ + MRdT /μ = 5MRdT /2μ. In other words, the specific heat at
constant pressure is
Cp = 5R/2μ . (2.1.5)
This result is often expressed in terms of the ratio of specific heats,
γ ≡ Cp /Cv . (2.1.6)
So Clausius found that if all the heat of a gas is contained in the kinetic energy
of its molecules then γ = 5/3.
This did not agree with measurements of the specific heats of common
diatomic gases, such as oxygen or hydrogen, which Clausius cited as giving
γ = 1.421. Later, it was found that γ does indeed equal 5/3 for a monatomic
gas like mercury vapor, but this left the question, in what form is the energy in
ordinary gases that are not monatomic?
To deal with this issue, Clausius suggested that the internal energy of a gas is
larger than the kinetic energy of the molecules, say by a factor 1 + f , with f
some positive number. Then instead of Eq. (2.1.4) we have
Cv = (1 + f ) × 3R/2μ , (2.1.7)
and in place of Eq. (2.1.5),
Cp = (1 + f )3R/2μ + R/μ . (2.1.8)
The specific heat ratio is then
$$\gamma = 1 + \frac{2}{3(1+f)} \,. \tag{2.1.9}$$
This is often expressed (especially in astrophysics) as a formula for the internal
energy density $\mathcal{E}$ in terms of the pressure and γ:
$$\mathcal{E} = \frac{3RTM(1+f)}{2\mu V} = \frac{3(1+f)\,p}{2} = \frac{p}{\gamma - 1} \,. \tag{2.1.10}$$
The observation that γ ≃ 1.4 for diatomic gases like O2 and H2 indicated
that the internal energy of these gases is larger than the kinetic energy of their
molecules by a factor 1 + f ≃ 5/3. Measurements gave values of γ for more
complicated molecules like H2 O or CO2 even closer to unity, indicating that
f is even larger for these molecules. The reason for these values for f and γ
did not become clear until the formulation of the equipartition of energy, to be
discussed in Section 2.4.
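The relation between γ and f in Eq. (2.1.9) is easy to tabulate. The following is a minimal sketch using exact rational arithmetic; the function names `gamma_from_f` and `f_from_gamma` are illustrative choices made here, not notation from the text.

```python
from fractions import Fraction


def gamma_from_f(f):
    """Specific heat ratio from Eq. (2.1.9): gamma = 1 + 2 / (3 (1 + f))."""
    return 1 + Fraction(2) / (3 * (1 + Fraction(f)))


def f_from_gamma(gamma):
    """Invert Eq. (2.1.9): 1 + f = 2 / (3 (gamma - 1))."""
    return Fraction(2) / (3 * (Fraction(gamma) - 1)) - 1


# All internal energy is translational kinetic energy (f = 0): gamma = 5/3.
print(gamma_from_f(0))
# gamma = 7/5 = 1.4 corresponds to 1 + f = 5/3, that is f = 2/3.
print(f_from_gamma(Fraction(7, 5)))
# The value gamma = 1.421 cited by Clausius gives a slightly smaller f.
print(float(f_from_gamma(Fraction("1.421"))))
```

Values of γ closer to unity give larger f, as the text notes for molecules such as H2O or CO2.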
Adiabatic Changes
It often happens that work is done adiabatically, that is, without the transfer of
heat. In this case the conservation of energy tells us that the work done by an
expanding fluid must be balanced by a decrease in its internal energy $\mathcal{E}V$:
$$0 = p\,dV + d(\mathcal{E}V) = (p + \mathcal{E})\,dV + V\,d\mathcal{E} \,. \tag{2.1.11}$$
For an ideal gas, the internal energy per unit volume $\mathcal{E}$ is given by Eq. (2.1.10),
so this tells us that
$$0 = \gamma p\,dV + V\,dp$$
and so, in an adiabatic process,
$$p \propto V^{-\gamma} \propto \rho^{\gamma} \,, \tag{2.1.12}$$
or, since for a fixed mass $p \propto T/V$,
$$T \propto V^{1-\gamma} \propto \rho^{\gamma-1} \,. \tag{2.1.13}$$
This is in contrast with an isothermal process, for which T is constant and
$p \propto V^{-1}$.
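As a numerical illustration of Eqs. (2.1.12) and (2.1.13), the sketch below compares an adiabatic and an isothermal compression of the same ideal gas. The function names and the sample numbers (a monatomic gas at 300 K and 10^5 Pa, compressed to half its volume) are assumptions chosen only for illustration.

```python
def adiabatic_final_state(p0, T0, volume_ratio, gamma):
    """Return (p1, T1) after an adiabatic change V0 -> V1 = volume_ratio * V0.

    Uses p ∝ V**(-gamma) and T ∝ V**(1 - gamma), Eqs. (2.1.12) and (2.1.13).
    """
    p1 = p0 * volume_ratio ** (-gamma)
    T1 = T0 * volume_ratio ** (1.0 - gamma)
    return p1, T1


def isothermal_final_pressure(p0, volume_ratio):
    """Return p1 after an isothermal change, for which p ∝ 1/V."""
    return p0 / volume_ratio


p_ad, T_ad = adiabatic_final_state(p0=1.0e5, T0=300.0, volume_ratio=0.5, gamma=5.0 / 3.0)
p_iso = isothermal_final_pressure(p0=1.0e5, volume_ratio=0.5)
print(f"adiabatic:  p = {p_ad:.4g} Pa, T = {T_ad:.1f} K")
print(f"isothermal: p = {p_iso:.4g} Pa, T = 300.0 K")
```

The adiabatic compression ends at a higher pressure than the isothermal one because the gas also heats up, while p V / T is unchanged in both cases, as the ideal gas law requires for a fixed mass.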
Equation (2.1.12) has an immediate consequence for the speed of sound. At

audible frequencies the conduction of heat is typically too slow to be effective,
so the expansion and compression of a fluid carrying a sound wave is adiabatic.
It is a standard result of hydrodynamics, proved by Newton, that the speed of
sound is
$$c_s = \sqrt{\frac{\partial p}{\partial \rho}} \,.$$
Newton thought that p would be proportional to ρ in a sound wave, which would
give $c_s = \sqrt{p/\rho}$, but in fact at audible frequencies the pressure is given by the
adiabatic relation (2.1.12), and $c_s$ is larger than Newton’s value by a factor $\sqrt{\gamma}$.
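To get a feeling for the size of the correction, here is a rough numerical sketch for air. It assumes an ideal gas with p/ρ = RT/μ, written in molar units (molar gas constant and molar mass in kg/mol) rather than in the per-molecule convention of the text, and takes γ = 1.4, T = 293.15 K, and a mean molar mass of 0.02897 kg/mol as illustrative inputs.

```python
import math

R_MOLAR = 8.314462618  # molar gas constant in J/(mol K)


def sound_speeds(T, molar_mass, gamma):
    """Return (Newton's isothermal value, adiabatic value) of the sound speed in m/s."""
    newton = math.sqrt(R_MOLAR * T / molar_mass)  # assumes p ∝ rho, so c_s^2 = p/rho
    adiabatic = math.sqrt(gamma) * newton         # p ∝ rho**gamma, so c_s^2 = gamma p/rho
    return newton, adiabatic


newton, adiabatic = sound_speeds(T=293.15, molar_mass=0.02897, gamma=1.4)
print(f"Newton (isothermal): {newton:.1f} m/s")
print(f"adiabatic:           {adiabatic:.1f} m/s")
```

With these inputs the isothermal estimate comes out near 290 m/s and the adiabatic one near 343 m/s, a difference of about 18%.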
2.2 Absolute Temperature
We have been casually discussing temperature, but what precisely do we mean

by this? It is not hard to give a precise meaning to a statement that one body has a
higher temperature than another, by a generalization of common experience that
is sometimes known as the second law of thermodynamics (the first law being
the conservation of energy). Observation of heat flow shows that if heat can flow
spontaneously from a body A to a body B, then it cannot flow spontaneously
from B to A. We can say then that the temperature tA of A is higher than
the temperature tB of B. Likewise, we say that two bodies are at the same
temperature if heat cannot flow from either one to the other spontaneously,
without work being done on these bodies. Temperature defined in this way is
observed to be transitive: If heat can flow spontaneously from a body A to a
body B, and from B to a body C, then it can flow spontaneously from A to C.
This is a property shared with real numbers – if a number a is larger than number
b, and if b is larger than c, then a is larger than c – and is a necessary condition
for temperatures to be represented by real numbers.
But this does not give a precise meaning to any particular numerical value of
temperature, or even to numerical ratios of temperatures. If, for some definition
of temperature t, a comparison of values of t tells us the direction of heat flow
then the same would be true of any monotonic function T (t). Conventionally
temperatures are defined by thermometers. With a column of some liquid such
as mercury or alcohol in a glass tube, we mark off the heights of the column
when the tube is placed in freezing or boiling water, and for a Celsius tem-
perature scale etch on the tube marks that divide the distance from freezing to
boiling to a hundred equal parts. The trouble is that different liquids expand
differently with increasing temperature, and the temperatures measured in this
way with a mercury or alcohol thermometer will not be precisely equal. We can
try instead to give significance to numerical values of temperature by using a
gas thermometer, relying on the ideal gas law pV = MRT /μ, but this law
is approximate, holding precisely only for molecules of negligible size that
interact only in contact in collisions. How can we give precise meaning to
numerical values of temperature without relying on approximate relations?
Surprisingly, as shown by Rudolf Clausius in his 1850 paper1 “On the Moving
Force of Heat,” it is possible to find a definition of temperature T with
absolute significance by the study of thermodynamic engines known as Carnot
cycles.
Sadi Carnot (1796–1832) was a French military engineer, the son of Lazare
Carnot, organizer of military victory in the French Revolution, and uncle of
a later president of the Third Republic. In 1824 Carnot in Reflections on the
Motive Power of Fire set out to study the efficiency of steam engines, explaining
that “Already the steam engine works our mines, impels our ships, excavates
our ports and our rivers, forges iron, fashions wood, grinds grains, spins and
weaves our clothes, transports our heaviest burdens, etc.” (A few years later he
might also have mentioned the beginning of steam-propelled locomotives, with
the opening of the Liverpool–Manchester railroad in 1830.) Carnot invented an
idealized engine, known as a Carnot cycle, which as we shall see is maximally
efficient and provides a natural definition of absolute temperature.
In the Carnot cycle, a working fluid (such as steam in a cylinder fitted with a
piston) goes through four frictionless steps:
1. Isothermal: The working fluid does work on its environment, for instance
by pushing a piston against external pressure, but keeping a constant tem-
perature by absorbing heat Q2 from a hot reservoir at temperature t2 . (We
will continue to use lower case t to indicate temperature defined in any
way that indicates the direction of heat flow, without specifying any physical
significance to its particular numerical values.)
2. Adiabatic: The working fluid, perfectly insulated from its environment and
with no internal friction, does more work, with its temperature dropping to
the temperature t1 of a cold reservoir but with no heat flowing in or out.
3. Isothermal: Work is done on the fluid, for instance by pushing in the piston,
with its temperature kept constant by its giving up heat Q1 to the cold
reservoir.
4. Adiabatic: With the working fluid again completely insulated from its envi-
ronment, work is done on it, bringing its volume back down to its original
value and its temperature back up to the temperature t2 of the hot reservoir.
1 This paper is reprinted in Brush, The Kinetic Theory of Gases – An Anthology of Classic Papers with
Historical Commentary, listed in the bibliography.
[Figure 2.1 A Carnot cycle (not drawn to scale). The vertical axis is the pressure p; the curve passes through the states A, B, C, and D.]
A graph of the pressure versus the volume of the working fluid in this cycle is
a closed curve, with the net work W done on the environment equal to $\oint p\,dV$ –
that is, to the area enclosed by the curve. (See Figure 2.1.) As long as steps 2
and 4 are truly adiabatic, the conservation of energy tells us that this work is
W = Q2 − Q1 (2.2.1)
and the efficiency of this cycle is
$$\frac{W}{Q_2} = \frac{Q_2 - Q_1}{Q_2} < 1 \,. \tag{2.2.2}$$
(We call this the efficiency, having in mind that, as for a steam engine, we have
to pay for the heat Q2 taken up at the higher temperature t2 , while the heat Q1
given up at the lower temperature t1 is wasted.)
Any Carnot cycle is reversible, because any frictionless adiabatic or isother-
mal process follows the same track, depending only on its endpoints, whichever
direction the process takes. But not all thermodynamic cycles, which take a
working fluid through a series of steps back to the original temperature and
volume, are reversible even though of course they all conserve energy. For
reversibility it is not enough that all steps be either isothermal or adiabatic –
there also should be no friction, which if present would provide an internal
source of heat that is not available to do work.
The importance of Carnot cycles in thermodynamics rests on the following

theorem:
I. The efficiency of the Carnot cycle $C$ described above is at least as great as
that for any general thermodynamical cycle $C'$, not necessarily reversible, which
begins with the working fluid absorbing a heat $Q_2'$ from a reservoir at the same
high temperature $t_2$, then emitting heat at the same lower temperature $t_1$, and
then returning to its original temperature and volume, in the process doing net
work $W'$. That is,
$$\frac{W}{Q_2} \ge \frac{W'}{Q_2'} \,. \tag{2.2.3}$$
II. All Carnot cycles that take heat from a reservoir at the same temperature t2 ,
using it to do work, and giving up waste heat to a reservoir at the same lower
temperature t1 , have the same efficiency, which depends only on t2 and t1 .
Proof:2 Like any positive real number, the ratio of the work done in the Carnot
cycle $C$ and in a general cycle $C'$ can be approximated to an arbitrary accuracy
by a ratio of positive integers $N'$ and $N$:
$$\frac{W}{W'} = \frac{N'}{N} \,. \tag{2.2.4}$$
Since any Carnot cycle by definition is reversible, the cycle $C$ has an inverse
$C^{-1}$. This is a refrigeration cycle, following the same steps as for $C$ but in the
opposite order, so that by doing work $W$ on the fluid an amount of heat $Q_1$ is
taken from the reservoir at temperature $t_1$ and heat $Q_2 > Q_1$ is delivered to
the reservoir at temperature $t_2 > t_1$. Suppose we perform a compound cycle
$C^*$, consisting of $N$ repetitions of $C^{-1}$ and $N'$ repetitions of $C'$. According to
Eq. (2.2.4), the net work done by the working fluid is
$$W^* = N'W' - NW = 0 \,.$$
Also, the net heat taken from the hot reservoir at temperature $t_2$ is
$$Q_2^* = N'Q_2' - NQ_2 \,.$$
Now, since no work is done in the compound cycle, according to the fundamental
property of temperature t, it is not possible for positive-definite net heat to
be transferred to a reservoir at temperature $t_2$ from a lower temperature $t_1$, so the
net heat $Q_2^*$ taken from the hot reservoir in the cycle $C^*$ must be positive-definite
or zero. Hence, using Eq. (2.2.4),
$$0 \le \frac{Q_2^*}{NW} = \frac{N'Q_2' - NQ_2}{NW} = \frac{Q_2'}{W'} - \frac{Q_2}{W} \,,$$
2 This treatment and that of the following section is based on that given by Fermi, Thermodynamics, listed
in the bibliography.
and therefore
$$\frac{W}{Q_2} \ge \frac{W'}{Q_2'} \tag{2.2.5}$$
as was to be proved in the first part of the theorem.
As to the second part of the theorem, note that if $C'$ is also a Carnot cycle
then, by the same reasoning,
$$\frac{W'}{Q_2'} \ge \frac{W}{Q_2} \,,$$
so the efficiencies are equal:
$$\frac{W}{Q_2} = \frac{W'}{Q_2'} \,. \tag{2.2.6}$$
This has now been proved for any pair of Carnot cycles, operating between the
same temperatures t2 and t1 , whatever the values of the heat taken from the
reservoir at temperature t2 and given up to the reservoir at temperature t1 , so
the common efficiency can only depend on t2 and t1 , as was to be proved.
We shall write this relation in terms of the inefficiency:
$$1 - \frac{W}{Q_2} = \frac{Q_1}{Q_2} \equiv F(t_1, t_2) \tag{2.2.7}$$
with F the same function for all Carnot cycles. We next prove that the function
F (t1 , t2 ) takes the form
F (t1 , t2 ) = T (t1 )/T (t2 ) (2.2.8)
for some function T (t). For this purpose we consider a compound cycle consist-
ing of a Carnot cycle operating between the temperatures t2 and t0 ≤ t2 followed
by a Carnot cycle operating between the temperatures t0 and t1 ≤ t0 , with all the
waste heat that is given to the reservoir at temperature t0 in the first cycle taken
up from this reservoir in the second cycle. Since (Q0 /Q2 )(Q1 /Q0 ) = Q1 /Q2 ,
the inefficiency (2.2.7) of the compound cycle is the product of the inefficiencies
of the individual cycles, so
F (t1 , t2 ) = F (t1 , t0 )F (t0 , t2 ) . (2.2.9)
From Eq. (2.2.7) it is evident that F (t2 , t0 )F (t0 , t2 ) = 1, so Eq. (2.2.9) may be
written
$$F(t_1, t_2) = \frac{F(t_1, t_0)}{F(t_2, t_0)} \,. \tag{2.2.10}$$
This holds for any t0 with t2 ≥ t0 ≥ t1 , so we can define T (t) ≡ F (t, t0 ) with
an arbitrary choice of t0 in this range, and then Eq. (2.2.10) is the desired result
(2.2.8).
Now, efficiencies are never greater than 100%, so the ratio F (t1 , t2 ) =
T (t1 )/T (t2 ) in Eq. (2.2.7) must be positive, and so T (t) has the same sign
for all temperatures. Since only the ratios of the T s appear in the efficiency,
we are free to choose this sign to be positive, so that T (t) ≥ 0 for all t.
Also, inefficiencies are never greater than 100%, so Eq. (2.2.8) shows that
T (t1 ) ≤ T (t2 ) for any t1 and t2 with t1 ≤ t2 . That is, T (t) is a monotonically
increasing function of t and can therefore be used to judge the direction of
spontaneous heat flow as well as t itself.
We can therefore define the absolute temperature T by just using T (t) as
the temperature in place of t. That is, using Eqs. (2.2.7) and (2.2.8), we define
absolute temperature T by the statement that a Carnot cycle running between
any two temperatures T2 and T1 has
$$\frac{Q_1}{Q_2} = \frac{T_1}{T_2} \,. \tag{2.2.11}$$
A Carnot cycle running between an upper temperature T2 and a lower tempera-
ture T1 has an efficiency
$$\frac{W}{Q_2} = \frac{Q_2 - Q_1}{Q_2} = \frac{T_2 - T_1}{T_2} \,. \tag{2.2.12}$$
Of course, this only defines T up to a constant factor, leaving us free to use
what units we like for temperature. But we are not free to shift T (t) by adding
a constant term. Indeed, since in this Carnot cycle heat flows from a reservoir at
temperature T2 to one at temperature T1 , we must have T2 > T1 , and therefore in
order for the efficiency (2.2.11) to be a positive quantity, the lower temperature
must have T1 > 0. Because any heat reservoir must have T positive-definite, we
see that T is the absolute temperature, in the same sense as was found for gases
by Charles.
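The following sketch evaluates Eqs. (2.2.11) and (2.2.12) numerically and shows why T may be rescaled but not shifted. The function names and the sample temperatures are illustrative assumptions, not part of the text.

```python
def carnot_efficiency(T_hot, T_cold):
    """Efficiency W/Q2 = (T2 - T1)/T2 of a Carnot cycle, Eq. (2.2.12)."""
    if not (T_hot > T_cold > 0):
        raise ValueError("need T_hot > T_cold > 0 on an absolute scale")
    return (T_hot - T_cold) / T_hot


def carnot_heat_balance(Q_hot, T_hot, T_cold):
    """Return (work done, waste heat) using Q1/Q2 = T1/T2, Eq. (2.2.11)."""
    Q_cold = Q_hot * T_cold / T_hot
    return Q_hot - Q_cold, Q_cold


print(carnot_efficiency(500.0, 300.0))            # reservoirs at 500 and 300 units
print(carnot_efficiency(5.0, 3.0))                # same ratio in other units: same efficiency
print(carnot_efficiency(500.0 + 273.15, 300.0 + 273.15))  # shifted scale: different efficiency
print(carnot_heat_balance(1000.0, 500.0, 300.0))  # work and waste heat for Q2 = 1000
```

Multiplying both temperatures by a common factor leaves the result unchanged, while adding a constant to both does not, which is the sense in which the Carnot definition fixes the zero of temperature.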
The temperature defined by Carnot cycles is identical (up to a choice of
units) to the temperature given by a gas thermometer, which for the moment
we will call $T^g$, in the approximation that the gas is ideal. To see this, let us
label the states of the gas as A at the start of the isothermal expansion 1 (and
at the end of the adiabatic compression 4); as B at the start of the adiabatic
expansion 2 (and the end of the isothermal expansion 1); as C at the start of
the isothermal compression 3 (and the end of the adiabatic expansion 2); and
as D at the start of the adiabatic compression 4 (and the end of the isothermal
compression 3). Since the expansion from A to B is isothermal, during this
phase the internal energy of the gas, which is given by Eqs. (1.2.3) and (2.1.10)
as $\mathcal{E}V = RTM/(\gamma - 1)\mu$, does not change, and so the heat drawn from the hot
reservoir is the work done:
$$Q_2 = \int_A^B p\,dV = \frac{MRT_2^g}{\mu} \int_A^B \frac{dV}{V} = \frac{MRT_2^g}{\mu} \ln\left(\frac{V_B}{V_A}\right) .$$
Likewise, the heat given up to the cold reservoir in the isothermal compression
from C to D is
$$Q_1 = \frac{MRT_1^g}{\mu} \ln\left(\frac{V_C}{V_D}\right) .$$
μ VD
Further, since the expansion from B to C and the contraction from D to A are
adiabatic, Eq. (2.1.13) gives $V \propto (T^g)^{-1/(\gamma-1)}$, and so during these parts of the
cycle
$$\frac{V_C}{V_B} = \frac{V_D}{V_A} = \left(\frac{T_2^g}{T_1^g}\right)^{1/(\gamma-1)} ,$$
T1
and therefore VB /VA = VC /VD , and the logarithmic factors in Q2 and Q1 are
equal. The efficiency is then
$$\frac{Q_2 - Q_1}{Q_2} = \frac{T_2^g - T_1^g}{T_2^g}$$
in agreement with Eq. (2.2.12) if $T = T^g$, up to a possible constant factor.
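The ideal-gas Carnot cycle can also be checked numerically. The sketch below integrates p dV around the four steps with a simple midpoint rule and compares the net work with Q2 − Q1. It is only an illustration under stated assumptions: molar units (n moles and the molar gas constant) replace the M/μ convention of the text, and the function names, step count, and sample state A → B are choices made here.

```python
import math

R = 8.314462618  # molar gas constant in J/(mol K); assumed unit convention


def integrate_p_dV(pressure, V_start, V_end, steps=20000):
    """Midpoint-rule estimate of the work integral of p dV from V_start to V_end."""
    dV = (V_end - V_start) / steps
    return sum(pressure(V_start + (i + 0.5) * dV) for i in range(steps)) * dV


def carnot_cycle(n_mol, T_hot, T_cold, V_A, V_B, gamma):
    """Return (Q_hot, Q_cold, net_work) for the cycle A -> B -> C -> D -> A."""
    # Adiabats: V ∝ T**(-1/(gamma - 1)), so V_C/V_B = V_D/V_A.
    stretch = (T_hot / T_cold) ** (1.0 / (gamma - 1.0))
    V_C, V_D = V_B * stretch, V_A * stretch
    nR = n_mol * R
    p_B = nR * T_hot / V_B
    p_D = nR * T_cold / V_D
    Q_hot = nR * T_hot * math.log(V_B / V_A)    # heat absorbed on the isotherm A -> B
    Q_cold = nR * T_cold * math.log(V_C / V_D)  # heat given up on the isotherm C -> D
    net_work = (
        integrate_p_dV(lambda V: nR * T_hot / V, V_A, V_B)                # step 1
        + integrate_p_dV(lambda V: p_B * (V_B / V) ** gamma, V_B, V_C)    # step 2
        + integrate_p_dV(lambda V: nR * T_cold / V, V_C, V_D)             # step 3
        + integrate_p_dV(lambda V: p_D * (V_D / V) ** gamma, V_D, V_A)    # step 4
    )
    return Q_hot, Q_cold, net_work


Q_hot, Q_cold, W = carnot_cycle(1.0, T_hot=500.0, T_cold=300.0,
                                V_A=1.0e-3, V_B=2.0e-3, gamma=5.0 / 3.0)
print(f"Q2 - Q1 = {Q_hot - Q_cold:.3f} J, enclosed area = {W:.3f} J")
print(f"W/Q2 = {W / Q_hot:.6f}, 1 - T1/T2 = {1 - 300.0 / 500.0:.6f}")
```

The work done on the two adiabats cancels, so the enclosed area reproduces Q2 − Q1, and the efficiency agrees with the Carnot value for the gas-thermometer temperatures.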
2.3 Entropy
In macroscopic classical thermodynamics we characterize the state of a system

by a set of variables that can be specified independently. For instance, for a fluid
of fixed mass and chemical composition, in a vessel with adjustable volume
(say, with a movable piston) the state is specified by giving the values of any
two of the thermodynamic variables – pressure, volume, temperature, energy,
etc. – the remaining variables being determined in equilibrium or in adiabatic
variations by these two values and some equation of state, such as the ideal
gas law (1.2.3). Many of the consequences of macroscopic classical thermody-
namics can be deduced from the existence of another thermodynamic variable,
known as the entropy, introduced in 1854 by Rudolf Clausius, that like other
thermodynamic variables depends only on the state of the system, although its
definition seems to indicate that it also depends on the way that the system is
prepared.
Suppose a system is prepared in a given state 1 by starting with it in a standard
state (labeled 0 below) and then taken to 1 on a path P through the space
of independent variables used to define thermodynamic states, in which by a
series of small reversible changes at varying absolute temperatures T it picks up
small net amounts dQ of heat energy from the environment. (The heat energy
increment dQ is taken as positive if the system takes heat from the environment
and negative if it gives up heat.) Then the entropy of this state is defined by

$$S_1 = S_0 + \int_P \frac{dQ}{T} \,, \tag{2.3.1}$$
where the integral is taken over any reversible path from state 0 to state 1, and S0
is whatever entropy we choose to ascribe to the standard state 0. The remarkable
thing is that the integral here is independent of the particular reversible path
chosen, so that this really defines the entropy S1 up to a common constant
term S0 as a function of the state of the system, not of how it is prepared,
provided that T is the absolute temperature defined, as in the previous section,
by the efficiency of Carnot cycles. Furthermore, with the entropy defined in this
way, for any path $P'$ from state 0 to state 1 that may or may not be reversible,
we have
$$\int_{P'} \frac{dQ}{T} \le S_1 - S_0 \,. \tag{2.3.2}$$

Proof: The first step in proving these results is to prove the following lemma:
for an arbitrary cycle, reversible or irreversible, that takes a system from any
state back to the same state, taking in and giving up heat at various temperatures,
we have

$$\oint \frac{dQ}{T} \le 0 \,. \tag{2.3.3}$$
After establishing this lemma, the rest of the proof will be straightforward.
To prove this lemma we can approximate the cycle by a sequence of brief
isothermal steps, in each of which the system takes in heat (if dQ is positive)
or gives heat up (if dQ is negative) at a momentary temperature T . We can
imagine that, at each step, the heat taken in or given up is given up or taken
in by another system, which undergoes a Carnot cycle between the momentary
temperature T and a fixed temperature $T_0$. In this Carnot cycle, the ratio of the
heat dQ given up by the Carnot cycle to the system and the heat $dQ_0$ taken by the
Carnot cycle from the reservoir at temperature $T_0$ is given by Eq. (2.2.11):
$$\frac{dQ}{dQ_0} = \frac{T}{T_0} \,,$$
or in other words
$$dQ_0 = T_0 \frac{dQ}{T} \,.$$
Hence in the complete cycle the Carnot cycles take in a total net heat $T_0 \oint dQ/T$
from the reservoir at temperature T0 . Since the system and each of the Carnot
cycles return to their original states, if this heat taken in at temperature T0 were
positive-definite then it would have to go into work, which is impossible since
work cannot be done by taking heat from a reservoir at a fixed temperature with
no changes elsewhere. (If it could, then this work by producing friction could
be used to transfer some heat to any body, even one at a temperature higher
than $T_0$.) So we conclude that the integral $\oint dQ/T$ is at most zero, as was to be
shown.
The rest is easy. Note that if two paths $P$ and $P'$ are both reversible paths
that go from state 0 to state 1, then $P{P'}^{-1}$ is a closed cycle, where ${P'}^{-1}$ is path
$P'$ taken in reverse, from state 1 to state 0. It follows then from the inequality
(2.3.3) that
$$0 \ge \oint_{P{P'}^{-1}} \frac{dQ}{T} = \int_P \frac{dQ}{T} - \int_{P'} \frac{dQ}{T} \,.$$
But $P'P^{-1}$ is also a closed cycle, so
$$0 \ge \oint_{P'P^{-1}} \frac{dQ}{T} = \int_{P'} \frac{dQ}{T} - \int_P \frac{dQ}{T} \,.$$
These two results are consistent only if for reversible paths both cyclic integrals
vanish, in which case
$$\int_P \frac{dQ}{T} = \int_{P'} \frac{dQ}{T} \,. \tag{2.3.4}$$
We can therefore define the entropy up to an additive constant as in Eq. (2.3.1),

where P is any reversible path.
Finally, if $P'$ is a general path from state 0 to state 1, reversible or irreversible,
while $P$ is a reversible path from state 0 to state 1, then $P'P^{-1}$ (but not necessarily
$P{P'}^{-1}$!) is a closed cycle, so the inequality (2.3.3) gives
$$0 \ge \oint_{P'P^{-1}} \frac{dQ}{T} = \int_{P'} \frac{dQ}{T} - \int_P \frac{dQ}{T} \,,$$
and therefore, using Eq. (2.3.1),
$$\int_{P'} \frac{dQ}{T} \le S_1 - S_0$$
as was to be shown.
In the special case of a completely isolated system, no heat can be taken
into or given up by the system, so the integrand in the integral on the left-hand side of
Eq. (2.3.2) must vanish and therefore S1 ≥ S0 . In isolated systems the entropy
can only increase. On the other hand, if an isolated system is undergoing only
reversible changes, then according to Eq. (2.3.1) the entropy is constant.
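The path independence of the integral in Eq. (2.3.1) can be illustrated numerically for an ideal gas, for which a reversible step absorbs heat dQ = n c_v dT + p dV with p = nRT/V. The sketch below is an illustration under assumptions made here: molar units, a molar specific heat c_v = R/(γ − 1), piecewise-linear paths in the (T, V) plane, and hypothetical names such as `heat_and_entropy`.

```python
import math

R = 8.314462618  # molar gas constant in J/(mol K); assumed unit convention


def heat_and_entropy(path_points, n_mol=1.0, gamma=5.0 / 3.0, steps=20000):
    """Integrate dQ and dQ/T along straight segments joining (T, V) points."""
    c_v = R / (gamma - 1.0)  # molar specific heat at constant volume
    Q = 0.0
    S = 0.0
    for (T_a, V_a), (T_b, V_b) in zip(path_points, path_points[1:]):
        dT = (T_b - T_a) / steps
        dV = (V_b - V_a) / steps
        for i in range(steps):
            T = T_a + (i + 0.5) * dT  # midpoint of the small step
            V = V_a + (i + 0.5) * dV
            dQ = n_mol * c_v * dT + n_mol * R * T / V * dV
            Q += dQ
            S += dQ / T
    return Q, S


start, end = (300.0, 1.0e-3), (600.0, 3.0e-3)
paths = {
    "heat, then expand": [start, (600.0, 1.0e-3), end],
    "expand, then heat": [start, (300.0, 3.0e-3), end],
    "straight line":     [start, end],
}
for name, points in paths.items():
    Q, S = heat_and_entropy(points)
    print(f"{name:18s} Q = {Q:9.2f} J   integral of dQ/T = {S:.5f} J/K")
```

The heat absorbed differs from path to path, but the integral of dQ/T is the same for all three, equal to n c_v ln(T1/T0) + nR ln(V1/V0).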
There is another definition of entropy, used in information theory as well as
in physics. If a system can be in any one of a number of states characterized
by a continuous (generally multidimensional) parameter α, with a probability
P (α) dα of being in states with this parameter in a narrow range dα around α,
then the entropy is

$$S = -k \int P(\alpha) \ln P(\alpha)\, d\alpha \,, \tag{2.3.5}$$
where k is the universal constant, known as Boltzmann’s constant, appearing

in Eq. (1.2.1). As we shall see in the next section, according to kinetic the-
ory, with a suitable choice of S0 the thermodynamic entropy (2.3.1) equals the
information-theoretic entropy (2.3.5).
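As a small illustration of Eq. (2.3.5), the sketch below evaluates the integral numerically for a one-dimensional parameter α. It sets k = 1 by default and uses two test distributions chosen here for convenience, a uniform and a Gaussian density, whose entropies are known in closed form; none of the names come from the text.

```python
import math


def information_entropy(prob_density, lower, upper, steps=20000, k=1.0):
    """Midpoint-rule estimate of S = -k * integral of P ln P over [lower, upper]."""
    d_alpha = (upper - lower) / steps
    total = 0.0
    for i in range(steps):
        P = prob_density(lower + (i + 0.5) * d_alpha)
        if P > 0.0:  # P ln P -> 0 as P -> 0, so empty regions contribute nothing
            total += P * math.log(P)
    return -k * total * d_alpha


def gaussian(sigma):
    """Normalized Gaussian density with zero mean and standard deviation sigma."""
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return lambda x: norm * math.exp(-x * x / (2.0 * sigma * sigma))


sigma = 2.0
S_gauss = information_entropy(gaussian(sigma), -12.0 * sigma, 12.0 * sigma)
S_uniform = information_entropy(lambda x: 1.0 / 5.0, 0.0, 5.0)
print(S_gauss, 0.5 * math.log(2.0 * math.pi * math.e * sigma**2))
print(S_uniform, math.log(5.0))
```

A broader distribution has a larger entropy: doubling the width of either distribution raises S by k ln 2.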
The mere fact that the entropy S defined by (2.3.1) depends only on the ther-
modynamic state has far-reaching consequences. Consider a fixed mass of fluid
in a vessel with variable volume. The independent thermodynamic variables
here can be taken as the volume V and the temperature T , with pressure p,
internal energy E, and entropy S all functions of V and T . The work done by
the fluid pressure p in increasing the fluid volume by a small amount dV is
pdV , so the heat required to change the temperature by an infinitesimal amount
dT and the volume by an infinitesimal amount dV is
dQ = dE + pdV ,
so according to Eq. (2.3.1), the change in the entropy is given by
T dS = dE + pdV . (2.3.6)
In other words
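$$\frac{\partial S(V,T)}{\partial T} = \frac{1}{T} \frac{\partial E(V,T)}{\partial T} \tag{2.3.7}$$
$$\frac{\partial S(V,T)}{\partial V} = \frac{1}{T} \frac{\partial E(V,T)}{\partial V} + \frac{p(V,T)}{T} \,. \tag{2.3.8}$$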
To squeeze information about pressure and internal energy from these formulas,
we use the fact that partial derivatives commute. From Eq. (2.3.7) we have

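$$\frac{\partial}{\partial V} \frac{\partial S(V,T)}{\partial T} = \frac{1}{T} \frac{\partial^2 E(V,T)}{\partial T\, \partial V}$$
while, from Eq. (2.3.8),
$$\frac{\partial}{\partial T} \frac{\partial S(V,T)}{\partial V} = \frac{1}{T} \frac{\partial^2 E(V,T)}{\partial T\, \partial V} - \frac{1}{T^2} \frac{\partial E(V,T)}{\partial V} + \frac{\partial \big(p(V,T)/T\big)}{\partial T} \,.$$
Setting these equal gives a relation between the derivatives of E and p:
$$0 = -\frac{1}{T^2} \frac{\partial E(V,T)}{\partial V} + \frac{\partial \big(p(V,T)/T\big)}{\partial T} \,. \tag{2.3.9}$$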
This is for a fixed mass. Since E(V , T ) is an extensive variable, it must be
proportional to this mass but does not otherwise have to depend on volume.
In fact, it is frequently a good approximation to suppose that, apart from its
proportionality to mass, E(V , T ) is independent of volume. This is the case if
the fluid consists of infinitesimal particles that interact only in contact in colli-
sions; since there is nothing with the dimensions of length that can enter in the
calculation of the energy, E(V , T ) cannot here depend on volume. In this case
Eq. (2.3.9) yields Charles’ law, that for fixed volume V the pressure p(V , T ) is
proportional to T . This shows again that the absolute temperature T in the ideal
gas law (1.2.3) is the same up to a constant factor as the temperature T defined
by the efficiency (2.2.11) of Carnot cycles.
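The step from Eq. (2.3.9) to Charles' law can be written out explicitly. In the short sketch below, g(V) is a name introduced here for an arbitrary function of the volume alone.

```latex
\begin{align*}
\frac{\partial E(V,T)}{\partial V} = 0
  \quad&\Longrightarrow\quad
  \frac{\partial}{\partial T}\left(\frac{p(V,T)}{T}\right) = 0
  \qquad \text{by Eq. (2.3.9)} \\
  &\Longrightarrow\quad \frac{p(V,T)}{T} = g(V)
  \quad\Longrightarrow\quad p(V,T) = T\, g(V) \,.
\end{align*}
```

For the ideal gas law (1.2.3), $pV = RTM/\mu$, this function is $g(V) = MR/\mu V$.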
Although this result was obtained without having a formula for the entropy,
for some purposes it is useful actually to know what the entropy is. In a homo-
geneous medium, the entropy S of any mass M of matter may conveniently be
written as S = Ms, where s is the entropy per unit mass, a function of temper-
ature and various densities known as the specific entropy. Dividing Eq. (2.3.6)
by M, we have then
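$$T\,ds = d(\mathcal{E}/\rho) + p\,d(1/\rho) \,, \tag{2.3.10}$$
where as before $\mathcal{E} \equiv E/V$ is the internal energy density and $\rho \equiv M/V$ is the
mass density. We consider an ideal gas, for which $T = p\mu/R\rho$ while $\mathcal{E}$ and p
are related by Eq. (2.1.10): $\mathcal{E} = p/(\gamma - 1)$. Then Eq. (2.3.10) gives
$$\frac{p\mu}{R\rho}\, ds = \frac{1}{\gamma - 1} \left[ \frac{dp}{\rho} + \gamma p\, d\left(\frac{1}{\rho}\right) \right] = \frac{\rho^{\gamma-1}}{\gamma - 1}\, d\left(\frac{p}{\rho^{\gamma}}\right) ,$$
so
$$ds = \frac{R/\mu}{\gamma - 1}\, \frac{\rho^{\gamma}}{p}\, d\left(\frac{p}{\rho^{\gamma}}\right) .$$
The solution is
$$s = \frac{R/\mu}{\gamma - 1} \ln\left(\frac{p}{\rho^{\gamma}}\right) + \text{constant} \,. \tag{2.3.11}$$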
We see that the result of Section 2.1 that $p \propto \rho^{\gamma}$ for adiabatic processes is just
the statement that s is constant in these processes, which of course it must be
since in an adiabatic process the heat input dQ vanishes.
In many stars there are regions in which convection effectively mixes matter
from various depths. Since heat conduction is usually ineffective in stars, little
heat flows into or out of a bit of matter as it rises or falls, and so it keeps the
same specific entropy. These regions therefore have a uniform specific entropy,
and therefore a uniform value for the ratio $p/\rho^{\gamma}$. For instance, this is the case in
the Sun for distances from the center greater than about 65% of the Sun’s radius
out to a thin surface layer.
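Equation (2.3.11) is simple to evaluate. The sketch below checks that the specific entropy is unchanged along an adiabat and decreases under isothermal compression. It assumes molar units (molar gas constant and a molar mass in kg/mol in place of R/μ in the convention of the text), and the sample values for air are illustrative only. Since only differences of s are meaningful, the units inside the logarithm are absorbed into the additive constant.

```python
import math

R_MOLAR = 8.314462618  # molar gas constant in J/(mol K); assumed unit convention


def specific_entropy(p, rho, gamma, molar_mass, constant=0.0):
    """Specific entropy of an ideal gas, Eq. (2.3.11), up to an additive constant."""
    return (R_MOLAR / molar_mass) / (gamma - 1.0) * math.log(p / rho**gamma) + constant


gamma, molar_mass = 1.4, 0.02897
p0, rho0 = 1.0e5, 1.2
s0 = specific_entropy(p0, rho0, gamma, molar_mass)

# Adiabatic compression to twice the density: p ∝ rho**gamma.
s_adiabatic = specific_entropy(p0 * 2.0**gamma, 2.0 * rho0, gamma, molar_mass)
# Isothermal compression to twice the density: p ∝ rho.
s_isothermal = specific_entropy(2.0 * p0, 2.0 * rho0, gamma, molar_mass)

print(s_adiabatic - s0)   # zero up to rounding
print(s_isothermal - s0)  # equals -(R/mu) ln 2: heat has been given up
```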
Neutral Matter
We have been mostly concerned with matter in which in each mass there is a
non-vanishing conserved quantity, the number of particles. There is a different
context, with no similar conserved numbers, in which thermodynamics yields

more detailed information about pressure and energy. In the early universe, at
temperatures above about $10^{10}$ K, there is so much energy in radiation and
electron–antielectron pairs that the contribution to the energy of the excess of
matter over antimatter may be neglected. Here there is no number density on
which the pressure and energy density $\mathcal{E} \equiv E/V$ can significantly depend,
so here $E(V,T) = V\mathcal{E}(T)$ and $p(V,T) = p(T)$; thus here Eq. (2.3.9) is an
ordinary differential equation for p(T):
$$0 = -\frac{\mathcal{E}(T)}{T^2} + \frac{d}{dT}\left(\frac{p(T)}{T}\right)$$
or, in other words,
$$p'(T) = \frac{\mathcal{E}(T) + p(T)}{T} \,. \tag{2.3.12}$$
Thermodynamics alone does not fix any relation between $\mathcal{E}(T)$ and p(T), but
given such a relation this result gives both as functions of temperature. For
instance, as an example of the power of thermodynamics, it was known in the
nineteenth century as a consequence of Maxwell’s theory of electromagnetism
that the pressure of electromagnetic radiation is one-third of its energy density.
Setting $p = \mathcal{E}/3$ in Eq. (2.3.12) gives $\mathcal{E}'(T) = 4\mathcal{E}(T)/T$, so
$$\mathcal{E}(T) = 3p(T) = aT^4 \,, \tag{2.3.13}$$
where a is a constant, known as the radiation energy constant. But, as we shall
see in Section 3.1, it was not possible to understand the value of a until the
advent of quantum mechanics in the early twentieth century.
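For readers who want the intermediate steps, the integration leading from Eq. (2.3.12) to the $T^4$ law can be sketched as follows, with the constant of integration written as $\ln a$:

```latex
\begin{align*}
p = \tfrac{1}{3}\mathcal{E}
  \quad&\Longrightarrow\quad
  \tfrac{1}{3}\,\mathcal{E}'(T)
  = \frac{\mathcal{E}(T) + \tfrac{1}{3}\mathcal{E}(T)}{T}
  = \frac{4\,\mathcal{E}(T)}{3T} \\
  &\Longrightarrow\quad \frac{d\mathcal{E}}{\mathcal{E}} = 4\,\frac{dT}{T}
  \quad\Longrightarrow\quad \ln \mathcal{E} = 4 \ln T + \ln a
  \quad\Longrightarrow\quad \mathcal{E}(T) = a\,T^4 \,.
\end{align*}
```

Thermodynamics fixes the power of T but leaves the constant a undetermined.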
The Laws of Thermodynamics

It is common to summarize the content of classical thermodynamics in three
laws. As already mentioned, the first law is just the conservation of energy,
discussed in the context of heat energy in Section 2.1, and the second, usually
attributed to Clausius, on which the discussion of thermodynamic efficiency in
Section 2.2 is based, can be stated as the principle that without doing work it is
not possible to transfer heat from a cold reservoir to one at higher temperature.
We have seen that this leads to the existence of a quantity, the entropy, which
depends only on the thermodynamic state and satisfies Eq. (2.3.1) when
reversible changes are made in this state. This can instead be taken as the
second law of thermodynamics.
There are several formulations of the third law, some given by Walther Nernst
(1864–1941) in 1906–1912. The most fruitful, it seems, is that it is possible
to assign a common value to the entropy (conventionally taken as zero) for
all systems at absolute zero temperature, so that at absolute zero the integral
in Eq. (2.3.1) must converge. This has the consequence, in particular, that the
specific heat dQ/dT must vanish for T → 0. This seems to contradict the
results of Section 2.1 for ideal gases, which give a temperature-independent
specific heat whether for fixed volume or fixed pressure. The contradiction is
avoided in practice because no substance remains close to an ideal gas as the
temperature approaches absolute zero. We will see when we come to quantum
mechanics that if an otherwise free particle is confined in any fixed volume, then
it cannot have precisely zero momentum, as required for a classical ideal gas at
absolute zero temperature. On the other hand, solids can exist at absolute zero
temperature, and in that limit their specific heats do approach zero.
2.4 Kinetic Theory and Statistical Mechanics
We saw in the previous chapter how by the mid nineteenth century the ideal gas
law had been established through the work especially of Bernoulli and Clausius.
But, though derived by considering the motions of individual gas molecules, in
its conclusions it dealt only with bulk gas properties such as pressure, tem-
perature, mass density, and energy density. For many purposes, including the
calculation of chemical or transport processes, it was necessary to go further
and work out the detailed probability distribution of the motion of individual
gas particles. This was done in the kinetic theory of James Clerk Maxwell and
Ludwig Boltzmann (1844–1906). Kinetic theory was later generalized to the
formalism known as statistical mechanics, especially by the American theorist
Josiah Willard Gibbs (1839–1903). As it turned out, these methods went a
long way toward not only establishing a correspondence with thermodynamics
but also explaining the principles of thermodynamics on the assumption that
macroscopic matter is composed of very many particles, and thereby helping to
establish the reality of atoms.
The Maxwell–Boltzmann Distribution

Maxwell in 1860 considered the form of the probability distribution function
P (vx , vy , vz ) for the x, y, and z components of the velocity of any molecule in a
gas in equilibrium.3 The probability distribution function is defined so that the
probability that these components are respectively between vx and vx + dvx ,
between vy and vy + dvy , and between vz and vz + dvz , is of the form
P (vx , vy , vz )dvx dvy dvz .
3 J. C. Maxwell, Phil. Mag. 19, 19; 20, 21 (1860). This article is included in Brush, The Kinetic Theory of
Gases – An Anthology of Classic Papers with Historical Commentary, listed in the bibliography.
He assumed (without offering a real justification) that the probability that any
component of velocity of a particle is in a particular range is not correlated with
the other components of the velocity. Then P (vx , vy , vz ) must be proportional
to a function of vx alone, with a coefficient that depends only on vy and vz , and
likewise for vy and vz , so P (vx , vy , vz ) must take the form of a product:
P (vx , vy , vz ) = f (vx )g(vy )h(vz ) .
Rotational symmetry requires further that P can depend only on the magnitude
of the velocity, not on its direction, and hence only on $v_x^2 + v_y^2 + v_z^2$. The only
function of $v_x^2 + v_y^2 + v_z^2$ that takes the form $f(v_x)g(v_y)h(v_z)$ is proportional to
an exponential:
$$P(v_x, v_y, v_z) \propto \exp\left(-C(v_x^2 + v_y^2 + v_z^2)\right) .$$
The constant $C$ must be positive in order that $P$ should not blow up for large
velocity, which would make it impossible to set the total probability equal to
unity, as it must be. Taking $C$ to be positive, and setting the total probability
(the integral of $P$ over all velocities) for each particle equal to one, gives the
factor of proportionality:
\[ P(v_x, v_y, v_z) = \left(\frac{C}{\pi}\right)^{3/2} \exp\left[-C\left(v_x^2 + v_y^2 + v_z^2\right)\right] . \]
We can use this to calculate the mean square velocity components:
\[ \overline{v_x^2} = \overline{v_y^2} = \overline{v_z^2} = \frac{1}{2C} . \]
Clausius had introduced an absolute temperature $T$ by setting $m\overline{v_\perp^2} = kT$,
where $k$ is a constant to be determined experimentally and $v_\perp$ is the component
of the velocity in a direction normal to the container wall, which for an isotropic
velocity distribution can be taken as any direction, so the constant $C$ must be
given by $C = m/2kT$ and the Maxwell distribution takes the form
\[ P(v_x, v_y, v_z) = \left(\frac{m}{2\pi kT}\right)^{3/2} \exp\left[-m\left(v_x^2 + v_y^2 + v_z^2\right)/2kT\right] . \tag{2.4.1} \]
As we saw at the end of Section 2.2, the requirement that in an ideal gas
$m\overline{v_\perp^2} = kT$, which led here to $C = m/2kT$, also ensures that, up to an arbitrary
constant factor, $T$ is the absolute temperature defined by the efficiency of Carnot cycles.
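As a rough numerical check of Eq. (2.4.1), the following Python sketch integrates the one-dimensional factor of the Maxwell distribution and tests that it is normalized and that $m\overline{v_x^2} = kT$. The molecular mass (roughly that of an N2 molecule), the temperature, the midpoint-rule integrator, and all function names are illustrative assumptions, not part of the text.

```python
import math

def maxwell_1d(v, C):
    """One-dimensional factor sqrt(C/pi) * exp(-C v^2) of the Maxwell distribution."""
    return math.sqrt(C / math.pi) * math.exp(-C * v * v)

def integrate(f, lo, hi, n=20000):
    """Midpoint-rule estimate of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

# Illustrative inputs (assumed), in SI units.
k = 1.380649e-23   # Boltzmann constant, J/K
m = 4.65e-26       # approximate mass of an N2 molecule, kg
T = 300.0          # temperature, K
C = m / (2.0 * k * T)

# The integrand is negligible beyond about ten times 1/sqrt(C).
v_max = 10.0 / math.sqrt(C)
norm = integrate(lambda v: maxwell_1d(v, C), -v_max, v_max)
mean_sq = integrate(lambda v: v * v * maxwell_1d(v, C), -v_max, v_max)
```

With these inputs `norm` and the ratio `m * mean_sq / (k * T)` should both equal one to within the accuracy of the quadrature.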
The formula for the probability distribution $P$ was derived in 1868 in a more
convincing way by Boltzmann.4 He defined a quantity
\[ H \equiv \overline{\ln P} = \int_{-\infty}^{+\infty} dv_x \int_{-\infty}^{+\infty} dv_y \int_{-\infty}^{+\infty} dv_z \, P(\mathbf{v}) \ln P(\mathbf{v}) , \]
and showed that collisions of gas particles always lead to a decrease in $H$ until
a minimum is reached, at which $P(\mathbf{v})$ is the Maxwell–Boltzmann distribution
function. A generalization of this H-theorem was given in 1901 by Gibbs.5 The
generalization and proof are given below, along with the application to gases.
4 L. Boltzmann, Sitz. Ber. Akad. Wiss. (Vienna), part II, 66, 875 (1872). A translation into English of this
article is included in Brush, The Kinetic Theory of Gases – An Anthology of Classic Papers with Historical
Commentary, listed in the bibliography.
The General H -Theorem

Consider a large system with many degrees of freedom, such as a gas with many
molecules (but not necessarily a gas). The states of the system are parameterized
by many variables, which we summarize with a symbol $\alpha$. (For instance, for a
monatomic gas $\alpha$ stands for the set of positions $\mathbf{x}_1$, $\mathbf{x}_2$, etc. and momenta $\mathbf{p}_1$, $\mathbf{p}_2$,
etc. of atoms 1, 2, . . . For a gas of multi-atom molecules, $\alpha$ would also include
the orientations and their rates of change for each molecule.) We denote an
infinitesimal range of these parameters by $d\alpha$. (For instance, for a monatomic
gas $d\alpha$ stands for the product $d^3x_1\,d^3p_1\,d^3x_2\,d^3p_2 \cdots$, known as the phase space
volume.) We define $P(\alpha)$ so that the probability that the parameters of the
system are in an infinitesimal range $d\alpha$ around $\alpha$ is $P(\alpha)\,d\alpha$, with $P$ normalized
so that $\int d\alpha\, P(\alpha) = 1$. Define
\[ H \equiv \overline{\ln P} = \int P(\alpha)\, d\alpha \, \ln P(\alpha) . \tag{2.4.2} \]
Gibbs showed that $H$ always decreases until it reaches a minimum value, at
which $P(\alpha)$ is proportional to the exponential of a linear combination of
conserved quantities, such as the total energy.
5 J. W. Gibbs, Elementary Principles of Statistical Mechanics, Developed with Especial Reference to The
Rational Foundation of Thermodynamics (Scribner, New York, 1902).
Proof: Define a differential rate $\Gamma(\alpha \to \beta)$ such that the rate at which a system
in state $\alpha$ makes a transition to a state within a range $d\beta$ around state $\beta$ is
$\Gamma(\alpha \to \beta)\, d\beta$. The probability $P(\alpha)\,d\alpha$ can either increase because the system
in a range $d\beta$ of states around $\beta$ makes a transition to the range $d\alpha$ of states
around $\alpha$, or decrease because the system in the range of states $d\alpha$ around $\alpha$
makes a transition to some other state in a range $d\beta$ around $\beta$, so
\[ \frac{d\,[P(\alpha)\,d\alpha]}{dt} = \int d\beta \left[ P(\beta)\,\Gamma(\beta \to \alpha)\,d\alpha - P(\alpha)\,d\alpha\,\Gamma(\alpha \to \beta) \right] , \]
or, cancelling the differentials $d\alpha$,
\[ \frac{d P(\alpha)}{dt} = \int d\beta \left[ P(\beta)\,\Gamma(\beta \to \alpha) - P(\alpha)\,\Gamma(\alpha \to \beta) \right] . \tag{2.4.3} \]
(Note that this makes $\int d\alpha\, P(\alpha)$ time-independent, as it must be. Cancelling $d\alpha$
is justified because phase space volumes such as $d\alpha$ do not change with time.)
Now use $(d/dt)(y \ln y) = (\ln y + 1)(dy/dt)$, which gives here
\[ \frac{dH}{dt} = \int d\alpha \int d\beta \, \left[\ln P(\alpha) + 1\right] \left[ P(\beta)\,\Gamma(\beta \to \alpha) - P(\alpha)\,\Gamma(\alpha \to \beta) \right] . \]
Interchange $\alpha$ and $\beta$ in the second double integral arising from the second term
in square brackets:
\[ \frac{dH}{dt} = \int d\alpha \int d\beta \, P(\beta) \ln\left(\frac{P(\alpha)}{P(\beta)}\right) \Gamma(\beta \to \alpha) . \tag{2.4.4} \]
Now use the inequality that $y \ln(x/y) \le x - y$ for any positive numbers $x$
and $y$. (To prove this, note that $y \ln(x/y) - x + y$ vanishes for $x = y$, while its
derivative with respect to $x$ is $-(x - y)/x$, so it monotonically approaches zero
from below for $x < y$ and then decreases monotonically for $x > y$.) From this
inequality, with $x = P(\alpha)$ and $y = P(\beta)$, we have
\[ \frac{dH}{dt} \le \int d\alpha \int d\beta \, \left[ P(\alpha) - P(\beta) \right] \Gamma(\beta \to \alpha) . \tag{2.4.5} \]
Again interchange $\alpha$ and $\beta$, now in the first double integral:
\[ \frac{dH}{dt} \le \int d\alpha \int d\beta \, P(\beta) \left[ \Gamma(\alpha \to \beta) - \Gamma(\beta \to \alpha) \right] . \tag{2.4.6} \]
In the original proof it was assumed that the laws of physics are invariant under
reversal of the direction of time’s flow, and therefore $\Gamma(\beta \to \alpha) = \Gamma(\alpha \to \beta)$,
so that Eq. (2.4.6) says that $H$ decreases with time, in accord with the
H-theorem. In studies of the decay of neutral K-mesons in 1964–1970 it was
found that time-reversal invariance is not exact.6 Fortunately, the H-theorem
survives, because on very general grounds in quantum mechanics it can be
shown without using time-reversal invariance that7
\[ \int d\alpha \left[ \Gamma(\alpha \to \beta) - \Gamma(\beta \to \alpha) \right] = 0 . \tag{2.4.7} \]
With Eq. (2.4.6), this is enough to require that $dH/dt \le 0$, as was to be shown.
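The content of the H-theorem can be illustrated with a small discrete-state analogue of Eq. (2.4.3). The sketch below is an illustration under stated assumptions, not the proof in the text: it replaces the continuum of states by three states, takes a hypothetical symmetric rate table (the time-reversal-invariant case), and advances the probabilities with simple Euler steps. The rate values, step size, and function names are all invented for the example.

```python
import math

def h_function(p):
    """Discrete analogue of H: the sum of P ln P over states."""
    return sum(x * math.log(x) for x in p if x > 0.0)

def euler_step(p, rate, dt):
    """One Euler step of dP_a/dt = sum_b [P_b rate(b->a) - P_a rate(a->b)]."""
    n = len(p)
    new_p = []
    for a in range(n):
        gain = sum(p[b] * rate[b][a] for b in range(n) if b != a)
        loss = p[a] * sum(rate[a][b] for b in range(n) if b != a)
        new_p.append(p[a] + dt * (gain - loss))
    return new_p

# Hypothetical symmetric rates: rate[a][b] == rate[b][a].
rate = [[0.0, 1.0, 0.5],
        [1.0, 0.0, 2.0],
        [0.5, 2.0, 0.0]]

p = [0.90, 0.08, 0.02]        # initial probabilities, far from equilibrium
h_values = [h_function(p)]
for _ in range(500):
    p = euler_step(p, rate, dt=0.01)
    h_values.append(h_function(p))
```

In this toy model the only conserved quantity is the total probability, so the minimum of H is reached at the uniform distribution, and the recorded values in `h_values` never increase from one step to the next.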
Let us pause for a moment to reflect how remarkable is this result. The
decrease of $H$ with time indicates a fundamental difference between past
and future, even though this result would hold even if the underlying
microscopic laws of physics were entirely symmetric under the direction of time’s
flow, and indeed as we have seen it was first derived under the assumption of
time-reversal invariance. This distinction between past and future is obvious in
everyday life: A glass tumbler that falls on the floor will shatter, giving up its
kinetic energy to heat in the floor, but glass fragments lying on the floor will
not draw energy from the floor and leap up to reassemble as a tumbler. But
from where does this distinction come? It is the introduction of the concept
of probability into physics that creates an asymmetry between past and future.
We can try rewriting the fundamental equation (2.4.3) for the rate of change of
probability by replacing $t$ with $-t$,
\[ \frac{d P(\alpha)}{-dt} = \int d\beta \, \Big[ P(\beta) \left[-\Gamma(\beta \to \alpha)\right] - P(\alpha) \left[-\Gamma(\alpha \to \beta)\right] \Big] , \]
but then we would have to replace the rates $\Gamma$ with $-\Gamma$, which makes no sense
because these rates have to be positive. It is the condition that $\Gamma \ge 0$ together
with Eq. (2.4.3) that fixes the direction of time’s flow. We see this also in the
derivation of Eq. (2.4.5), which follows from Eq. (2.4.4) only if we assume that
$\Gamma(\beta \to \alpha)$ is positive.
6 K. R. Schubert et al., Phys. Lett. 13, 138 (1964). This had been strongly suggested by an earlier experiment
of J. H. Christenson, J. W. Cronin, V. L. Fitch, and R. Turlay, Phys. Rev. Lett. 13, 138 (1964).
7 For a very general proof, with references to earlier work by others, see S. Weinberg, The Quantum Theory
of Fields, Vol. I, pp. 150–151 (Cambridge University Press, Cambridge, UK, 1995, 2005).
Canonical and Grand Canonical Ensembles

Let’s now return to the H-theorem. The decrease in $H$ will stop when $H$ reaches
a minimum value, at which it is stationary for any physically possible
infinitesimal change in $P(\alpha)$. For an arbitrary infinitesimal change $\delta P(\alpha)$, we have
\[ \delta H = \int \delta P(\alpha)\, d\alpha \, \left[\ln P(\alpha) + 1\right] . \]
Now, $\delta P(\alpha)$ is not entirely arbitrary but is constrained by the condition that
variations in $P$ cannot change either the total probability $\int P(\alpha)\,d\alpha = 1$ or
the mean value of any conserved quantity such as the total energy $E(\alpha)$. In
order that $\delta H$ should vanish for any variation in $P(\alpha)$ that preserves $\int d\alpha\, P(\alpha)$
and the mean values of all conserved quantities, it is necessary and sufficient
that $\ln P(\alpha)$ should be a linear combination of a constant and any conserved
quantities. For instance, if the total energy $E(\alpha)$ is the only conserved quantity
(as it is for radiation) then if we denote the coefficient of $E(\alpha)$ in this linear
combination as $-1/\Theta$ we have
\[ P(\alpha) = \exp\left( C - \frac{E(\alpha)}{\Theta} \right) , \]
with the constant factor $e^C$ fixed by the requirement that $\int P(\alpha)\, d\alpha = 1$. We
will show below that, with this probability distribution, the quantity $-H$ has
the defining property (2.3.1) of entropy provided that $\Theta$ is proportional to the
absolute temperature $T$, $\Theta = kT$, so the canonical ensemble is usually written
\[ P(\alpha) = \exp\left( C - \frac{E(\alpha)}{kT} \right) . \tag{2.4.8} \]
The value of k expresses how we convert units of temperature into units of

energy, and since it is just a matter of our system of units it cannot depend on
what sort of system is described by this distribution.
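The stationarity argument leading to Eq. (2.4.8) can be tried out on a toy example. The sketch below assumes a hypothetical system with three discrete energy levels and checks that moving away from the canonical distribution, in a direction that preserves both the total probability and the mean energy, raises H. The energy values, the choice kT = 1, and all names are illustrative assumptions.

```python
import math

def h_function(p):
    """Discrete analogue of H: the sum of P ln P over states."""
    return sum(x * math.log(x) for x in p)

energies = [0.0, 1.0, 2.0]   # hypothetical energy levels, arbitrary units
kT = 1.0                     # assumed temperature in the same units

weights = [math.exp(-e / kT) for e in energies]
norm = sum(weights)
p_canonical = [w / norm for w in weights]

# This direction changes neither sum(P) nor sum(P * E) for the levels above:
# 1 - 2 + 1 = 0 and 0*1 - 1*2 + 2*1 = 0.
direction = [1.0, -2.0, 1.0]

def perturbed(eps):
    """Canonical probabilities shifted by eps along the constrained direction."""
    return [p + eps * d for p, d in zip(p_canonical, direction)]

def mean_energy(p):
    return sum(x * e for x, e in zip(p, energies))

h_canonical = h_function(p_canonical)
h_plus = h_function(perturbed(+0.01))
h_minus = h_function(perturbed(-0.01))
```

Both `h_plus` and `h_minus` come out larger than `h_canonical`, as expected if the canonical form is the constrained minimum of H.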
More generally, there may be some other conserved quantities $N_i$. For
instance, in a gas consisting of molecules of different types, even if these
molecules are undergoing chemical reactions, under ordinary conditions the
numbers $N_i$ of atoms of type $i$ do not change. In such cases, $\ln P(\alpha)$ will in
general contain a term proportional to each conserved quantity $N_i(\alpha)$, with a
coefficient that we will denote as $\mu_i/kT$, where $\mu_i$ (or sometimes $\mu_i/kT$) is a
quantity known as the chemical potential. The probability density is then
\[ P(\alpha) = \exp\left( C - \frac{E(\alpha) - \sum_i \mu_i N_i(\alpha)}{kT} \right) . \tag{2.4.9} \]
A multi-particle system with probabilities distributed in this way is said to
form a grand canonical ensemble. For instance, in a gas of H$_2$, O$_2$, and H$_2$O
molecules there are two chemical potentials, for hydrogen and oxygen atoms,
so in equilibrium at a given temperature we can derive one set of ratios among
the three molecular densities without knowing anything about the values of
the chemical potentials, but we need to know these potentials to derive all the
densities.
Connection with Thermodynamics

Aside from a constant factor, $H$ is the entropy, as defined by Clausius in
thermodynamics. To see this, suppose we slowly add heat $\delta Q$ to our system, preserving
the equilibrium form (2.4.8) or (2.4.9) of the distribution $P$ and the values of all
conserved quantities other than the energy, but shifting the average total energy
$\overline{E}$ by $\delta Q$. Then
\[ \delta H = \int d\alpha \, \delta P(\alpha) \left[\ln P(\alpha) + 1\right] = -\frac{1}{kT} \int d\alpha \, \delta P(\alpha)\, E(\alpha) = -\frac{1}{kT}\, \delta \overline{E} = -\frac{\delta Q}{kT} , \]
which is the defining equation $dS = dQ/T$ of the entropy if (apart from an
arbitrary constant term) we define
\[ S = -kH = -k \int d\alpha \, P(\alpha) \ln P(\alpha) , \tag{2.4.10} \]
thus justifying Eq. (2.3.4). The decrease in H implies the increase in entropy,
thus justifying one consequence of the second law of thermodynamics. This was
shocking to some physicists of the nineteenth century, who regarded thermody-
namics as an independent theory, just as fundamental as Newtonian mechanics.
Compound Systems
Equation (2.4.10) makes it easy to justify a fundamental property of the entropy,
that it is extensive. Suppose a system can be regarded as composed of two parts,
whose states are described by parameters $\alpha_1$ and $\alpha_2$, and that the probabilities in
these two parts are uncorrelated, so that the probability $P(\alpha_1, \alpha_2)\, d\alpha_1\, d\alpha_2$ that
the system is in a state with parameters in the infinitesimal ranges $d\alpha_1$ and $d\alpha_2$
around $\alpha_1$ and $\alpha_2$ is a product of probabilities for the separate parts:
\[ P(\alpha_1, \alpha_2)\, d\alpha_1\, d\alpha_2 = P_1(\alpha_1)\, d\alpha_1 \times P_2(\alpha_2)\, d\alpha_2 , \tag{2.4.11} \]
with
\[ \int d\alpha_1 \, P_1(\alpha_1) = \int d\alpha_2 \, P_2(\alpha_2) = 1 . \]
Then Eq. (2.4.10) gives
\[ S = -k \int d\alpha_1 \int d\alpha_2 \, P_1(\alpha_1)\, P_2(\alpha_2) \left[ \ln P_1(\alpha_1) + \ln P_2(\alpha_2) \right] = S_1 + S_2 , \tag{2.4.12} \]
where $S_1$ and $S_2$ are the entropies of the two parts of the system:
\[ S_1 = -k \int d\alpha_1 \, P_1(\alpha_1) \ln P_1(\alpha_1) , \qquad S_2 = -k \int d\alpha_2 \, P_2(\alpha_2) \ln P_2(\alpha_2) . \]
More generally, the difference $S - S_1 - S_2$ is a measure of the degree to which
probabilities in the two subsystems are correlated, and is known as the
entanglement entropy.
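A discrete version of Eqs. (2.4.11) and (2.4.12) is easy to check numerically. In the sketch below the joint probabilities are given as a small table, with sums replacing the integrals over $\alpha_1$ and $\alpha_2$; the particular probability tables, the choice k = 1, and the function names are assumptions made for illustration.

```python
import math

def entropy(probs, k=1.0):
    """S = -k * sum of P ln P, skipping states of zero probability."""
    return -k * sum(p * math.log(p) for p in probs if p > 0.0)

def marginals(joint):
    """Probabilities of each part separately: row sums and column sums."""
    part1 = [sum(row) for row in joint]
    part2 = [sum(col) for col in zip(*joint)]
    return part1, part2

def entropy_difference(joint):
    """S - S1 - S2 for a table joint[i][j] of joint probabilities."""
    part1, part2 = marginals(joint)
    s_total = entropy([p for row in joint for p in row])
    return s_total - entropy(part1) - entropy(part2)

# Uncorrelated case: the joint table is a product of two distributions.
p1 = [0.7, 0.3]
p2 = [0.2, 0.5, 0.3]
product_table = [[a * b for b in p2] for a in p1]

# Fully correlated case: the state of part 1 fixes the state of part 2.
correlated_table = [[0.5, 0.0],
                    [0.0, 0.5]]

diff_product = entropy_difference(product_table)
diff_correlated = entropy_difference(correlated_table)
```

For the product table the difference vanishes, as in Eq. (2.4.12), while for the fully correlated table it equals $-k \ln 2$.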
Gases
In a gas $E(\alpha)$ is the sum of the energies $E_a$ of the individual particles. The
probability distribution (2.4.9) is then equal to a product of probability distributions
for the individual particles:
\[ P(\alpha) = \prod_a p(E_a, N_{ia}) , \tag{2.4.13} \]
where $p(E_a, N_{ia})$ are the probability distribution functions for the individual
particle properties:
\[ p(E_a, N_{ia}) \propto \exp\left[ \left( -E_a + \sum_i N_{ia}\, \mu_i \right) \Big/ kT \right] , \tag{2.4.14} \]
in which $N_{ia}$ is the number of atoms of type $i$ in the $a$th molecule. The
constant of proportionality must be chosen to make the individual total
probabilities equal to unity. If all the molecules have the same chemical formula, so
that $N_{ia} = N_i$ is the same for all molecules, then we can absorb the factor
$\exp\left(\sum_i N_i \mu_i / kT\right)$ into the constant of proportionality, and simply write
\[ P(\alpha) = \prod_a p(E_a) \quad \text{where} \quad p(E_a) \propto \exp(-E_a/kT) . \tag{2.4.15} \]
In particular, the distribution of the momentum $\mathbf{p}_a$ arises from the kinetic energy
term $\mathbf{p}_a^2/2m_a$ in $E_a$, and Eq. (2.4.15) yields the Maxwell distribution (2.4.1) but,
as we have now seen, derived by Boltzmann in a more convincing way.
Equipartition
One of the most useful results of statistical mechanics is the equipartition of
energy in cases where the total energy E(α) can be written as the sum of
individual energies proportional to squares of independent quantities $\xi_n$:
\[ E(\xi_1, \xi_2, \dots) = \sum_n c_n \xi_n^2 . \tag{2.4.16} \]
For instance, for a monatomic gas of $N$ atoms of mass $m$, the index $n$ runs
over $3N$ values; the $\xi_n$ are the three components of each atom’s momentum
and $c_n = 1/2m$. Molecules that are not monatomic can rotate as well as move.
Here $n$ runs over $6N$ values, with the $\xi_n$ including the three components of each
molecule’s momentum and the three components of its angular momentum. For an
angular momentum $\mathbf{J}$, the rotational energy is
\[ \frac{J_1^2}{2I_1} + \frac{J_2^2}{2I_2} + \frac{J_3^2}{2I_3} , \]
where the $I_i$ characterize the moments of inertia of the molecule. Here the
extra $\xi_n$ variables are the components of angular momentum, with the $c_n$ the
corresponding values of $1/2I$. But for a gas of diatomic molecules there is
essentially no energy in rotations around the line joining the atoms, so here
the $\xi_n$ include only the components of each molecule’s angular momentum $\mathbf{J}_\perp$
in the two directions normal to this line, and $n$ runs over $5N$ values. For an
ensemble of simple harmonic oscillators the $\xi_n$ include both the displacement
from rest of each oscillator and the displacement’s rate of change. As we shall
see in Section 3.1, the energy of a radiation field can also be expressed as
in Eq. (2.4.16), with the $\xi_n$ the Fourier transforms of each component of the
electric and magnetic fields.
Whatever the nature of the $\xi_n$, because of the factorization of the
exponential the probability of finding any one $\xi_n$ in a range $d\xi_n$ takes the form
$A_n \exp(-E_n/kT)\, d\xi_n$, with $E_n = c_n \xi_n^2$ and with proportionality constant $A_n$
fixed by the condition that the total probability for each $\xi_n$ is unity, so that
$\int A_n \exp(-E_n/kT)\, d\xi_n = 1$. Thus the mean value of $E_n$ is
\[
\begin{aligned}
\overline{E_n} &\equiv \frac{\int_{-\infty}^{\infty} d\xi_n \, c_n \xi_n^2 \exp(-c_n \xi_n^2/kT)}{\int_{-\infty}^{\infty} d\xi_n \exp(-c_n \xi_n^2/kT)}
= \frac{\int_0^{\infty} d\sqrt{E_n} \; E_n \exp(-E_n/kT)}{\int_0^{\infty} d\sqrt{E_n} \; \exp(-E_n/kT)} \\
&= \frac{\int_0^{\infty} dE_n \, E_n^{1/2} \exp(-E_n/kT)}{\int_0^{\infty} dE_n \, E_n^{-1/2} \exp(-E_n/kT)}
= \frac{(kT)^{3/2}\, \Gamma(3/2)}{(kT)^{1/2}\, \Gamma(1/2)} = kT/2 .
\end{aligned}
\tag{2.4.17}
\]
It is a fortunate aspect of kinetic theory that these mean energies do not depend
on the coefficients $c_n$, or indeed on much else about the physical system aside
from the distribution of the total energy among individual quadratic degrees of
freedom.
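The independence of the mean energy from the coefficient $c_n$ in Eq. (2.4.17) can be checked by direct quadrature. The sketch below is a numerical illustration only: the values of c, the choice kT = 1, the midpoint rule, and the function name are assumptions.

```python
import math

def mean_quadratic_energy(c, kT, n=20000):
    """Mean of c*xi^2 weighted by exp(-c*xi^2/kT), by the midpoint rule."""
    xi_max = 10.0 * math.sqrt(kT / c)   # the weight is negligible beyond this
    h = 2.0 * xi_max / n
    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        xi = -xi_max + (i + 0.5) * h
        weight = math.exp(-c * xi * xi / kT)
        numerator += c * xi * xi * weight
        denominator += weight
    return numerator / denominator

kT = 1.0
# Three very different (hypothetical) coefficients c_n.
mean_energies = {c: mean_quadratic_energy(c, kT) for c in (0.01, 1.0, 250.0)}

# The ratio of gamma functions appearing in the last step of Eq. (2.4.17).
gamma_ratio = math.gamma(1.5) / math.gamma(0.5)
```

Each entry of `mean_energies` should agree with kT/2 regardless of c, and `gamma_ratio` reproduces the factor 1/2.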
In any gas the kinetic energy of the $n$th particle is $\mathbf{p}_n^2/2m_n$. The average of
each of the three terms in this kinetic energy is $kT/2$, so the average kinetic
energy of each particle is $3kT/2$. Equation (1.1.4) gives $m\overline{\mathbf{v}^2}/3 = kT$, where
this $k$ is the constant $k$ in the gas constant (1.2.4), so we see that this $k$ is the
same as the constant $k$ in the general probability distribution (2.4.8) or (2.4.9)
of statistical mechanics.
For a generic polyatomic molecule the mean rotational energy associated
with the three degrees of freedom $J_i$ is $3kT/2$, but, as already mentioned,
for a diatomic molecule meaningful rotation is only possible around the two
axes perpendicular to the linear molecule, so the mean rotational energy is only
$2kT/2$. That is, if we write the mean translational plus rotational energy per
molecule as $(3kT/2) \times (1 + f)$, as in Section 2.1, then $f = 0$ for monatomic
molecules, $f = 2/3$ for diatomic molecules, while $f = 1$ for other molecules.
Equation (2.1.9) gives the specific heat ratio as $\gamma = 1 + 2/[3(1 + f)]$, so $\gamma = 5/3$
for monatomic gases, $\gamma = 7/5$ for diatomic molecules (which explains why
experiments on gases like O$_2$ and H$_2$ gave results near $\gamma = 1.4$ in Clausius’
time), and $\gamma = 4/3$ for other molecules.
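The arithmetic behind these three values of γ can be reproduced exactly with rational numbers. The short sketch below simply evaluates the formula quoted from Eq. (2.1.9); the function and dictionary names are illustrative.

```python
from fractions import Fraction

def specific_heat_ratio(f):
    """gamma = 1 + 2 / [3 (1 + f)], evaluated with exact fractions."""
    return 1 + Fraction(2) / (3 * (1 + Fraction(f)))

gammas = {
    "monatomic": specific_heat_ratio(0),               # no rotational energy
    "diatomic": specific_heat_ratio(Fraction(2, 3)),   # two rotational axes
    "other": specific_heat_ratio(1),                   # three rotational axes
}
```

The dictionary then holds the fractions 5/3, 7/5, and 4/3 for the three classes of molecule.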
Of course, molecules can also vibrate as well as rotate and move, and energy
can also go into exciting the clouds of electrons that hold them together. For
reasons that only became clear with the advent of quantum mechanics, these
degrees of freedom can only be excited at temperatures much higher than is
common in our environment.
Entropy as Disorder
The entropy can be regarded as a measure of the disorder of a system. To
see this, it is easiest to approximate the parameters of a system as taking a
discrete set of values $\alpha_\nu$ instead of a continuum of values $\alpha$. We can connect
the continuum and discrete descriptions by dividing the continuum into tiny
ranges $\alpha_\nu \le \alpha \le \alpha_\nu + \delta\alpha$ (for simplicity treating $\alpha$ here as if it were
one-dimensional) and approximating $P(\alpha)$ as a constant $P_\nu/\delta\alpha$ in each interval, so
that the probability that $\alpha$ is in this interval is
\[ \int_{\alpha_\nu}^{\alpha_\nu + \delta\alpha} d\alpha \, P(\alpha) = \frac{P_\nu}{\delta\alpha}\, \delta\alpha = P_\nu . \tag{2.4.18} \]
Then the entropy (2.4.10) is
\[ S = -k \sum_\nu \delta\alpha \, \frac{P_\nu}{\delta\alpha} \ln \frac{P_\nu}{\delta\alpha} = -k \sum_\nu P_\nu \ln P_\nu + \mathcal{C} , \tag{2.4.19} \]
where $\mathcal{C}$ is a constant,
\[ \mathcal{C} = k \ln \delta\alpha . \tag{2.4.20} \]
Since there was an arbitrary constant in Eq. (2.4.10), we can absorb $\mathcal{C}$ into the
definition of that constant, and define the entropy simply as
\[ S = -k \sum_\nu P_\nu \ln P_\nu . \tag{2.4.21} \]
Since $0 \le P_\nu \le 1$, each $\ln P_\nu$ is negative or zero and $S$ is never negative. The entropy reaches
its minimum value, zero, in the completely ordered state in which just a single
$P_\nu$ equals one and all others vanish. In disordered systems with non-vanishing
probabilities for different states $S$ is positive-definite. In the completely
disordered state with all $P_\nu$ equal, the entropy reaches its maximum possible value,
equal to $k$ times the logarithm of the number $1/P_\nu$ of intervals.
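The limiting cases just described can be seen in a few lines of Python. The sketch evaluates Eq. (2.4.21) with k = 1 for an ordered, a uniform, and an intermediate distribution over eight states; the number of states and the intermediate probabilities are arbitrary illustrative choices.

```python
import math

def discrete_entropy(probs, k=1.0):
    """S = -k * sum of P ln P, with the convention that 0 ln 0 = 0."""
    return -k * sum(p * math.log(p) for p in probs if p > 0.0)

n_states = 8
ordered = [1.0] + [0.0] * (n_states - 1)            # one state is certain
uniform = [1.0 / n_states] * n_states               # all states equally likely
skewed = [0.5, 0.2, 0.1, 0.1, 0.05, 0.03, 0.01, 0.01]

s_ordered = discrete_entropy(ordered)
s_uniform = discrete_entropy(uniform)
s_skewed = discrete_entropy(skewed)
```

Here `s_ordered` vanishes, `s_uniform` equals ln 8, and `s_skewed` lies between the two.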
2.5 Transport Phenomena
So far, we have been concerned with systems in which thermodynamic variables

such as temperature, pressure, density, etc. are constant in time and space, or
vary very slowly. But many of the most interesting physical phenomena are
associated with the transport of such quantities over time from one place to
another in inhomogeneous media. As we shall see in the following section, the
study of such transport phenomena gave physicists their first reliable values for
the masses of individual atoms and molecules.
Conservation Laws
In many cases we have to deal with conserved quantities, such as the number of
molecules or the total electric charge. By a quantity being conserved is meant
that the net rate of increase of the quantity (negative if a decrease) in any volume
plus the net rate at which this quantity flows out of the volume (negative if
flowing in) vanishes. The current $\mathbf{J}$ of this quantity is defined so that the net rate
of outward flow is $\int_A d\mathbf{A} \cdot \mathbf{J}$ where $A$ is the surface surrounding the volume $V$,
and $d\mathbf{A}$ is an element of area of this surface, taken as a vector pointing outward
from the surface. Hence if this quantity has a density $N$ and a current $\mathbf{J}$, then
the conservation condition is
\[ \frac{\partial}{\partial t} \int_V d^3x \, N + \int_A d\mathbf{A} \cdot \mathbf{J} = 0 . \tag{2.5.1} \]
Using Gauss’s theorem, we can write the second term in Eq. (2.5.1) as an
integral over the volume of the divergence of the current, so Eq. (2.5.1) is
equivalent to
\[ \int_V d^3x \left[ \frac{\partial N}{\partial t} + \nabla \cdot \mathbf{J} \right] = 0 , \]
and, since this must be true for any volume, the integrand must vanish:
\[ \frac{\partial N}{\partial t} + \nabla \cdot \mathbf{J} = 0 . \tag{2.5.2} \]
For instance, if matter is carried from one place to another only by a bulk motion
with velocity $\mathbf{v}$, then the mass density $\rho$ satisfies an equation of the form (2.5.2),
with the mass current given by $\rho\mathbf{v}$:
\[ \frac{\partial \rho}{\partial t} + \nabla \cdot (\rho \mathbf{v}) = 0 . \tag{2.5.3} \]
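To see the integral form (2.5.1) at work, one can discretize Eq. (2.5.3) in one dimension so that whatever leaves one cell enters its neighbor. The sketch below uses a simple upwind finite-volume update on a periodic grid with a constant velocity; the scheme, the grid size, the initial density profile, and all names are assumptions chosen for illustration and are not taken from the text.

```python
import math

def upwind_step(rho, v, dx, dt):
    """Advance d(rho)/dt + d(rho*v)/dx = 0 by one step on a periodic grid.

    The mass current J = rho*v is evaluated at the right face of each cell
    from the cell's own density (an upwind choice, valid for constant v > 0).
    """
    flux = [r * v for r in rho]
    # Negative indexing makes flux[-1] the periodic left neighbour of cell 0.
    return [rho[i] - (dt / dx) * (flux[i] - flux[i - 1]) for i in range(len(rho))]

n_cells = 100
dx = 1.0 / n_cells
v = 1.0                  # constant bulk velocity (assumed)
dt = 0.5 * dx / v        # time step small enough for stability

# Hypothetical initial density: a smooth bump on a uniform background.
rho = [1.0 + math.exp(-((i + 0.5) * dx - 0.5) ** 2 / 0.01) for i in range(n_cells)]

mass_initial = sum(rho) * dx
for _ in range(400):
    rho = upwind_step(rho, v, dx, dt)
mass_final = sum(rho) * dx
```

Because each face flux is subtracted from one cell and added to the next, the total mass is unchanged apart from rounding error, even though the density profile itself moves and spreads.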
Momentum Flow
Such conservation laws are ubiquitous in physics. We will be concerned now
with a particular set of conserved quantities in fluids, the components of
momentum. The density of the $i$th component (with $i = 1, 2, 3$) of momentum is
$\rho v_i$, where $\rho$ is the mass density and $v_i$ is the $i$th component of the bulk velocity.
Their conservation provides the fundamental dynamical equation for fluids. The
conservation equation here takes the general form
\[ \frac{\partial}{\partial t}(\rho v_i) + \sum_j \frac{\partial}{\partial x_j} T_{ji} = 0 , \tag{2.5.4} \]
where $T_{ji}$ is the $j$th component of the current of the $i$th component of
momentum, and the sums here and below run over the directions 1, 2, 3.8
8 $T_{ji}$ is the purely spatial part of a larger array, a tensor with time as well as space components that serves in
the general theory of relativity as the source of the gravitational field.
By analogy with the case of the mass current $\rho\mathbf{v}$, we might think that the $j$th
component $T_{ji}$ of the current of the $i$th component of momentum is $\rho v_i \times v_j$.
This would be the case if momentum like mass were carried from place to place
only by the bulk motion of the fluid. But of course fluid elements exert forces
on one another, both pressure and viscous forces, with a consequent transfer of
momentum. So, to keep an open mind, let us write the $j$th component of the
current of the $i$th component of momentum as
\[ T_{ji} = \rho v_j v_i + \tau_{ji} , \tag{2.5.5} \]
with $\tau_{ji}$ a correction term arising from forces acting within the fluid.
According to Eq. (2.5.4), the $i$-component of the internal force per unit volume is
$-\sum_j \partial \tau_{ji}/\partial x_j$.
So what is τj i ? An answer was first given in 1822 by Claude-Louis Navier
(1785–1836), of the Corps des Ponts et Chaussées, and later in his own formu-
lation by Sir George Stokes (1819–1903). Rather than trying to reproduce their
reasoning, we give a treatment below that has a more modern flavor, relying
largely on principles of invariance.
First, we can learn a little about the momentum current $T_{ji}$ by imposing
the condition that angular momentum should satisfy a conservation condition.
The density of the $i$th component of angular momentum is $\rho(\mathbf{x} \times \mathbf{v})_i$, so for
instance the rate of change of its $i = 3$ component is
\[
\begin{aligned}
\frac{\partial}{\partial t}\, \rho(\mathbf{x} \times \mathbf{v})_3 &= \frac{\partial}{\partial t}\, \rho(x_1 v_2 - x_2 v_1) = -x_1 \sum_j \frac{\partial T_{j2}}{\partial x_j} + x_2 \sum_j \frac{\partial T_{j1}}{\partial x_j} \\
&= -\sum_j \frac{\partial}{\partial x_j} \left[ x_1 T_{j2} - x_2 T_{j1} \right] + T_{12} - T_{21} .
\end{aligned}
\]
In order for this to take the form of a conservation law we must have $T_{12} = T_{21}$,
and, since there is nothing special about the 1- and 2-directions, $T_{ji}$ must be
entirely symmetric,
\[ T_{ij} = T_{ji} , \tag{2.5.6} \]
and then of course the same is also true of the term $\tau_{ji}$ in Eq. (2.5.5):
\[ \tau_{ij} = \tau_{ji} . \tag{2.5.7} \]
Next, we assume that there are no preferred directions in the environment of
the fluid, so that $\tau_{ji}$ is a spatial tensor; that is, it transforms under rotations
in such a way that $\sum_{ij} a_i a_j \tau_{ji}$ is invariant under rotations for any vector $\mathbf{a}$,
such as $\mathbf{v}$. In the absence of external fields, the tensor $\tau_{ji}$ must be constructed
from rotationally invariant quantities like $\rho$ and $T$ and vectors like $\mathbf{v}$, together
with their space and time derivatives, but no other vectors that would reflect a
preferred direction in the environment.
One obvious such tensor is $\delta_{ji}$ times any function $f$ of rotationally invariant
quantities, where $\delta_{ji}$ is the diagonal matrix with all ones on the main diagonal.
Here $\sum_{ij} a_i a_j \delta_{ji} f$ is the rotational invariant $\mathbf{a}^2 f$. We can separate a term of
this form in $\tau_{ji}$ by defining a quantity
\[ p \equiv \frac{1}{3} \sum_i \tau_{ii} , \tag{2.5.8} \]
and writing $\tau_{ji}$ as
\[ \tau_{ji} = p\, \delta_{ji} + \tau'_{ji} , \tag{2.5.9} \]
where $\tau'_{ji}$ is both symmetric and traceless:
\[ \tau'_{ij} = \tau'_{ji} , \qquad \sum_i \tau'_{ii} = 0 . \tag{2.5.10} \]
The term $p\,\delta_{ji}$ in $T_{ji}$ gives a force per unit volume $-\nabla p$ in Eq. (2.5.4), so $p$
can be identified as the fluid’s pressure. Of course, there is an infinite number of
ways of constructing the symmetric traceless tensor $\tau'_{ji}$ from the velocity and
rotational invariants and their derivatives. One simple example is
$[v_i v_j + v_j v_i - 2\delta_{ij} \mathbf{v}^2/3] f$, where $f$ is any function of the rotational invariants. Fortunately,
we can eliminate many of these possibilities (including this one) by using the
principle of Galilean relativity.
Galilean Relativity
The principle of Galilean relativity9 requires that the laws governing fluids
should be the same for an observer $O$ who uses space coordinates $\mathbf{x}$ and for
an observer $O'$ moving at any constant velocity $-\mathbf{u}$ with respect to $O$, and who
therefore uses coordinates related to those of $O$ by
\[ \mathbf{x}' = \mathbf{x} + \mathbf{u} t . \tag{2.5.11} \]
Aside from this change of coordinates, the moving observer sees a mass density
$\rho'$ that is the same as $\rho$:
\[ \rho'(\mathbf{x}', t) = \rho(\mathbf{x}, t) . \tag{2.5.12} \]
But, for the observer $O'$, his own velocity $-\mathbf{u}$ is subtracted from the velocity
seen by observer $O$:
\[ \mathbf{v}'(\mathbf{x}', t) = \mathbf{v}(\mathbf{x}, t) + \mathbf{u} . \tag{2.5.13} \]
To check whether the equation (2.5.3) of mass conservation is left invariant by
Galilean transformations, take the partial derivative of Eq. (2.5.12) with respect
to time, holding $\mathbf{x}$ (but not $\mathbf{x}'$!) fixed:
\[ \frac{\partial \rho'(\mathbf{x}', t)}{\partial t} + \sum_i u_i \frac{\partial \rho'(\mathbf{x}', t)}{\partial x'_i} = \frac{\partial \rho(\mathbf{x}, t)}{\partial t} . \]
9 In the twentieth century this came to be called Galilean relativity to distinguish it from the Einstein special
principle of relativity. Both principles state that the laws of nature are unaffected by the uniform motion of
an observer; as we will see in Chapter 4, it is only the details of the transformation to a moving frame of
reference that distinguishes Einsteinian from Galilean relativity.
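Therefore
\[
\begin{aligned}
\frac{\partial \rho'(\mathbf{x}', t)}{\partial t} + \nabla' \cdot \left( \mathbf{v}'(\mathbf{x}', t)\, \rho'(\mathbf{x}', t) \right)
&= \frac{\partial \rho(\mathbf{x}, t)}{\partial t} + \nabla' \cdot \left( \left( \mathbf{v}'(\mathbf{x}', t) - \mathbf{u} \right) \rho'(\mathbf{x}', t) \right) \\
&= \frac{\partial \rho(\mathbf{x}, t)}{\partial t} + \nabla \cdot \left( \mathbf{v}(\mathbf{x}, t)\, \rho(\mathbf{x}, t) \right) = 0 ,
\end{aligned}
\tag{2.5.14}
\]
and so the equation (2.5.3) of mass conservation does satisfy the principle of
Galilean relativity.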
By following the same reasoning, we can see that the momentum
conservation law (2.5.4) would be invariant under Galilean transformations if $T_{ji}$ were
simply given by the term $\rho v_j v_i$ in Eq. (2.5.5). Hence the principle of Galilean
relativity requires that the term $\tau_{ji}$ in Eq. (2.5.5) be separately invariant under
Galilean transformations, and according to Eqs. (2.5.8) and (2.5.9) the same
must be true of $p$ and $\tau'_{ji}$. Because of the term $\mathbf{u}$ in the Galilean transformation
(2.5.13), Galilean relativity rules out terms in $\tau'_{ji}$ such as in the example
$v_i v_j + v_j v_i - 2\delta_{ij} \mathbf{v}^2/3$ mentioned above, which involves $\mathbf{v}$ itself rather than
its gradient.
Navier–Stokes Equation
There are still an infinite variety of terms that might appear in $\tau'_{ji}$, containing
any number of factors of gradients of any order of density and/or velocity. But
in order to keep the units consistent, the more gradients are contained as factors
in any term in $\tau'_{ji}$, the more powers of some length that is characteristic of the
microscopic properties of the fluid must appear in the coefficient of that term.
If these lengths characterizing the fluid, such as the distance between molecules
and the mean free path, are all much less than the scale of distances over which
fluid properties such as density and velocity vary, then $\tau'_{ji}$ is dominated by a
term proportional to the minimum number of gradients.10 So we should look
for a possible term in $\tau'_{ij}$ proportional to a single gradient.
It is not possible to construct a symmetric traceless tensor proportional to a
single gradient of the density, so a tensor proportional to a single gradient must
be linear in the gradient of the velocity. There is a unique symmetric traceless
tensor of this sort:
\[ \tau'_{ji} = -\eta \left[ \frac{\partial v_j}{\partial x_i} + \frac{\partial v_i}{\partial x_j} - \frac{2}{3}\, \delta_{ij} (\nabla \cdot \mathbf{v}) \right] , \tag{2.5.15} \]
10 This sort of reasoning has become common in the quantum theory of fields, leading to what are known as
effective-field theories.
where $\eta$ is a coefficient (Galilean-invariant, like $\rho$ and $p$) known as the viscosity
of the fluid.11 A minus sign is inserted in Eq. (2.5.15) in order that, with $\eta > 0$,
the heat produced by viscous fluid flow should be positive.12 Using Eqs. (2.5.5),
(2.5.9), and (2.5.15), we see that the momentum conservation equation (2.5.4)
takes the form
\[ \frac{\partial}{\partial t}(\rho v_i) = -\frac{\partial p}{\partial x_i} + \sum_j \frac{\partial}{\partial x_j} \left[ -\rho v_i v_j + \eta \left( \frac{\partial v_j}{\partial x_i} + \frac{\partial v_i}{\partial x_j} - \frac{2}{3}\, \delta_{ij} (\nabla \cdot \mathbf{v}) \right) \right] . \tag{2.5.16} \]
This is the Navier–Stokes equation.
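The properties claimed for the viscous term, symmetry and vanishing trace, can be verified on explicit numbers. The sketch below builds the tensor of Eq. (2.5.15) from a table of velocity gradients, once for an arbitrary gradient and once for a simple shear flow in which only $\partial v_1/\partial x_3$ is non-zero. The numerical values, the viscosity, and the function name are assumptions made for the example; indices 0, 1, 2 in the code stand for the directions 1, 2, 3.

```python
def viscous_stress(grad_v, eta):
    """Return tau[j][i] of Eq. (2.5.15), given grad_v[i][j] = d v_i / d x_j."""
    div_v = sum(grad_v[i][i] for i in range(3))
    tau = [[0.0] * 3 for _ in range(3)]
    for j in range(3):
        for i in range(3):
            delta = 1.0 if i == j else 0.0
            tau[j][i] = -eta * (grad_v[j][i] + grad_v[i][j]
                                - (2.0 / 3.0) * delta * div_v)
    return tau

eta = 2.0   # hypothetical viscosity, arbitrary units

# An arbitrary (hypothetical) velocity-gradient table.
generic_gradient = [[1.0, 2.0, 3.0],
                    [4.0, 5.0, 6.0],
                    [7.0, 8.0, 10.0]]
tau_generic = viscous_stress(generic_gradient, eta)
trace = sum(tau_generic[i][i] for i in range(3))

# Shear flow: v has only a 1-component, depending only on x_3.
shear_rate = 0.3
shear_gradient = [[0.0, 0.0, shear_rate],
                  [0.0, 0.0, 0.0],
                  [0.0, 0.0, 0.0]]
tau_shear = viscous_stress(shear_gradient, eta)
```

For the shear flow the only non-zero components are `tau_shear[2][0]` and `tau_shear[0][2]`, both equal to minus the viscosity times the shear rate.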
Viscosity
The measurement of viscosity was well within the capabilities of nineteenth
century physicists. In a classic calculation using the Navier–Stokes equation,
Stokes found that a uniform fluid with viscosity $\eta$ exerts a drag force $F$ on a
spherical ball of radius $a$ moving with velocity $v$ through the fluid, given by
\[ F = 6\pi \eta a v . \tag{2.5.17} \]
For instance, if a ball of mass $m$ falls through a fluid, it accelerates until the
viscous force balances the force $mg$ of gravity (neglecting buoyancy), when it
has the terminal velocity
\[ v_{\rm terminal} = \frac{mg}{6\pi \eta a} . \]
The viscosity of gases could also be measured by observing the effect of a
surrounding gas on the motion of a pendulum.
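As a worked illustration of Eq. (2.5.17), the sketch below evaluates the terminal velocity of a small ball and checks that the drag at that speed balances gravity. The input numbers (a steel ball of radius 1 mm falling in a fluid with a viscosity of roughly that of glycerin) are rough assumed values chosen only to make the example concrete, and buoyancy is neglected as in the text.

```python
import math

def stokes_drag(eta, a, v):
    """Drag force F = 6 pi eta a v on a sphere of radius a moving at speed v."""
    return 6.0 * math.pi * eta * a * v

def terminal_velocity(m, g, eta, a):
    """Speed at which the Stokes drag balances the weight m g."""
    return m * g / (6.0 * math.pi * eta * a)

# Assumed illustrative inputs, SI units.
a = 1.0e-3               # ball radius, m
density = 7800.0         # approximate density of steel, kg/m^3
eta = 1.4                # rough viscosity of glycerin near room temperature, Pa s
g = 9.81                 # gravitational acceleration, m/s^2

m = density * (4.0 / 3.0) * math.pi * a ** 3
v_term = terminal_velocity(m, g, eta, a)
drag_at_v_term = stokes_drag(eta, a, v_term)
```

Run in reverse, the same relation gives the viscosity from a measured terminal velocity, which is the sense in which such measurements were accessible in the nineteenth century.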
It was harder to calculate η on the basis of a theory of molecules than to
measure it. For some time the best that could be done theoretically was a rough
estimate of this viscosity.
11 Often $\eta$ is called the shear viscosity. The reason is that, if we were to insist on using whatever formula for
the pressure $p$ holds in the absence of fluid gradients, then $p$ would not be precisely given by Eq. (2.5.8),
and $\tau'_{ji}$ as defined by Eq. (2.5.9) would not be precisely traceless, so it would have a term proportional
to $\delta_{ij}(\nabla \cdot \mathbf{v})$, with a coefficient known as the bulk viscosity. For complicated reasons the bulk viscosity is
generally much less than the shear viscosity (for instance, see S. Weinberg, Astrophys. J. 168, 175 (1971))
and in any case would have no effect in our present calculation.
12 For the details of this argument, see Sections 16 and 49 of Landau and Lifshitz, Fluid Mechanics, listed
in the bibliography.
To make this estimate, consider a uniform fluid experiencing a shear flow.
For instance, suppose $\mathbf{v}$ has only one component, $v_1$, which depends only on $x_3$.
(The fluid could be enclosed between two flat plates, each in the 1–2 plane,
with their separation in the 3-direction, and with one of the plates moving in the
1-direction and the other at rest.) In this case, Eqs. (2.5.5), (2.5.9), and (2.5.15)
give the 3-component of the current of the 1-component of momentum:
\[ T_{31} = \tau'_{31} = -\eta\, \frac{\partial v_1}{\partial x_3} . \tag{2.5.18} \]
To find η, let us use molecular theory to calculate the rate per unit area at which
the 1-component of momentum crosses a plane normal to the 3-axis, which we
will take as the plane x3 = 0. This current arises because, in addition to being
carried along in the 1-direction by the bulk velocity v, each molecule has a fluc-
tuating “peculiar velocity” v. We make the far-reaching approximation that,
because of rapid collisions, all directions of this peculiar velocity are equally
likely. Then the number per unit volume whose peculiar velocity vector v
makes an angle with the +3-axis between θ and θ + dθ is the ratio of the solid
angle 2π sin θ dθ to 4π, times the total number density n, or n sin θ dθ/2. As
we saw in our calculation of gas pressure in Section 1.1, the number of these
molecules striking an area dA in this plane in a time dt is the number in a
cylinder with base dA and height v⊥ dt = cos θ |v| dt, where v⊥ is the
component of the peculiar velocity normal to the plane x3 = 0 and |v| is the
magnitude of the peculiar velocity. This number of molecules is equal to
dA × cos θ |v| dt × n sin θ dθ/2 .
Since v1 is assumed to be a function v1 (x3 ) only of x3 , a molecule that reaches
the plane x3 = 0 having traveled a distance r will have a 1-component of
momentum mv1 (−r cos θ), where m is the mass of the molecule. (In addition to
the momentum carried by this bulk velocity, the peculiar momenta of molecules
will also have 1-components, but under the assumption that all directions of
peculiar velocity are equally likely, these 1-components cancel when we
integrate over the azimuthal angle around the 3-direction.) A minus sign
appears in the argument of v1 because a molecule with a positive (or negative)
3-component v⊥ of peculiar velocity, for which cos θ > 0 (or cos θ < 0),
arrives at the plane x3 = 0 from negative (or positive) values of x3 . The rate
per unit area and per unit time at which the 1-component of momentum flows
through the plane x3 = 0 is then
\[
T_{31} = \int_0^{\pi} \frac{n\,\overline{|\mathbf{v}|}}{2}\,\cos\theta\,\sin\theta\,d\theta
\int_0^{\infty} m\,v_1(-r\cos\theta)\,P(r)\,dr ,
\]
where P (r)dr is the probability that a molecule that reaches the plane x3 = 0
has traveled a distance between r and r + dr since its last collision with another
molecule, and the bar again denotes an average over molecules. As long as the
mean distance between collisions is small compared with the scale of distances
over which the fluid properties vary, all directions are equivalent, and P (r)dr
is also the probability that from a random starting position a molecule will
travel a distance between r and r + dr before its first collision. (Note that this
formula applies for molecules with negative as well as positive values of cos θ ,
because molecules with negative values of cos θ have a negative 3-component
of peculiar velocity and therefore cross the plane x3 = 0 traveling from positive
to negative values of x3 , and thus contribute a negative amount to the flow of
the 1-component of momentum through this plane.)
We again make the crucial assumption, which led to the Navier–Stokes equa-
tion, that the typical distances traveled by molecules are much smaller than the
scale of distances over which the bulk properties of the fluid vary. Here this
implies that v1 (−r cos θ) changes little over the range of r for which P (r) is
not negligible. This allows us to use a Taylor expansion

\[
v_1(-r\cos\theta) = v_1(0) - r\cos\theta \left(\frac{\partial v_1}{\partial x_3}\right)_{x_3=0} + \cdots .
\]
The first term makes no contribution to the current, because the integral over θ
of cos θ sin θ vanishes. This leaves us with the contribution of the next term,
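\[
T_{31} = -\frac{n m\,\overline{|\mathbf{v}|}\,\ell}{2}
\left(\frac{\partial v_1}{\partial x_3}\right)_{x_3=0} \int_0^{\pi} \cos^2\theta\,\sin\theta\,d\theta
= -\frac{n m\,\overline{|\mathbf{v}|}\,\ell}{3}
\left(\frac{\partial v_1}{\partial x_3}\right)_{x_3=0} ,
\]
where $\ell$ is the mean free path$^{13}$
\[
\ell \equiv \int_0^{\infty} r\,P(r)\,dr .
\]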
Comparing this with our formula (2.5.18) for T31 , taken from Eq. (2.5.15),
we find for η the positive value
\[
\eta = \frac{1}{3}\, m\, n\, \overline{|\mathbf{v}|}\, \ell . \tag{2.5.19}
\]
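As a numerical sanity check on the two angular integrals used in this derivation, here is a minimal Python sketch. The helper names `integrate` and `viscosity` are illustrative and not part of the text; the midpoint rule is just one convenient way to evaluate the integrals.

```python
import math

def integrate(f, a, b, steps=20000):
    """Midpoint-rule estimate of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

# Zeroth-order Taylor term: integral of cos(theta) sin(theta) over [0, pi].
# It should vanish, so v1(0) makes no contribution to the current.
first_term = integrate(lambda t: math.cos(t) * math.sin(t), 0.0, math.pi)

# First-order Taylor term: integral of cos^2(theta) sin(theta) over [0, pi].
# It should equal 2/3, which turns the prefactor 1/2 into 1/3.
second_term = integrate(lambda t: math.cos(t) ** 2 * math.sin(t), 0.0, math.pi)

def viscosity(m, n, mean_speed, mean_free_path):
    """Eq. (2.5.19): eta = m n <|v|> l / 3 (any consistent unit system)."""
    return m * n * mean_speed * mean_free_path / 3.0

print(first_term, second_term)
```

The factor 1/3 in Eq. (2.5.19) is simply (1/2) × (2/3) from the second integral.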
Mean Free Path

Now we need to estimate the mean free path $\ell$. Suppose we make the crude
approximation that a molecule will collide with another molecule if its center
passes within an effective cross-sectional area σ around the center of the other
molecule. (For instance, if molecules were balls of radii a, then they would
collide if their centers approached within a distance 2a, so here $\sigma = \pi(2a)^2$.)
The probability that a molecule that has already traveled a distance r without
colliding will collide before it travels a further distance dr is the ratio of the
total effective area $4\pi r^2\,dr\,n\sigma$ of all the molecules in the shell between r and
r + dr to the area $4\pi r^2$ of this shell, and is therefore nσ dr. The probability
that the collision occurs in the distance between r and r + dr is then nσ dr
times the probability p(r) that it had not collided before it had traveled the
distance r. To calculate p(r), we note that p(r + dr) equals p(r) times the
probability 1 − nσ dr that the molecule will not collide before it travels to r + dr, so
$p'(r) = -p(r)\,n\sigma$ and, since p(0) = 1, the probability of traveling a distance
r without colliding is p(r) = exp(−nσ r). The probability of a collision in a
distance from r to r + dr is then
\[
P(r)\,dr = n\sigma\,dr \times p(r) = n\sigma\,dr\,\exp(-n\sigma r) .
\]
The average distance traveled between collisions is then
\[
\ell \equiv \int_0^{\infty} r\,P(r)\,dr = \int_0^{\infty} r\,n\sigma\,dr\,\exp(-n\sigma r) = \frac{1}{n\sigma} . \tag{2.5.20}
\]
This formula for $\ell$ is often used for media more complicated than a gas of hard
balls, by taking σ as some sort of effective cross section.
13 The notion of a mean free path was introduced by Rudolf Clausius in “On the Mean Lengths of the Paths
Described by the Separate Molecules of Gaseous Bodies,” Ann. Phys. 105, 239 (1858).
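The exponential distribution of free paths and the result (2.5.20) can be illustrated with a short Monte Carlo sketch in Python. The radius and number density below are assumed, illustrative values (roughly the order of magnitude for a gas at ordinary conditions), not data from the text, and the function names are hypothetical.

```python
import math
import random

def mean_free_path(n, sigma):
    """Eq. (2.5.20): l = 1 / (n sigma)."""
    return 1.0 / (n * sigma)

def sample_free_paths(n, sigma, count, seed=0):
    """Draw free paths from P(r) dr = n sigma exp(-n sigma r) dr."""
    rng = random.Random(seed)
    # expovariate(lambd) samples an exponential distribution with rate lambd.
    return [rng.expovariate(n * sigma) for _ in range(count)]

# Assumed illustrative values in SI units (hard spheres of radius a).
a = 1.5e-10                      # m, assumed molecular radius
sigma = math.pi * (2 * a) ** 2   # m^2, cross section pi (2a)^2
n = 2.5e25                       # m^-3, assumed number density

paths = sample_free_paths(n, sigma, count=200_000)
sample_mean = sum(paths) / len(paths)
print(mean_free_path(n, sigma), sample_mean)
```

The sample mean of the simulated free paths should agree with 1/(nσ) to within the statistical scatter of the sample.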
Using the result (2.5.20) in Eq. (2.5.19) gives an estimate of the viscosity:
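\[
\eta \simeq \frac{m\,\overline{|\mathbf{v}|}}{3\sigma} .
\]
The Maxwell–Boltzmann distribution (2.4.1) gives the mean value of |v| as
\[
\overline{|\mathbf{v}|} = \sqrt{\frac{8kT}{\pi m}} = \sqrt{\frac{8RT}{\pi\mu}} ,
\]
where $R = k/m_1$ is the gas constant and $\mu = m/m_1$ is the molecular weight.
The viscosity is therefore
\[
\eta \simeq \frac{m}{3\sigma}\sqrt{\frac{8RT}{\pi\mu}} . \tag{2.5.21}
\]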
Quantitatively this result gives only the correct order of magnitude of η, but
it has an important qualitative consequence, that the viscosity is independent of
the gas density. This result was first found by Maxwell.14 In a letter to Stokes,15
he commented that “This is certainly very unexpected, that the friction should
be as great in a rare gas as in a dense gas. The reason is that for the rare gas the
mean path is greater, so that the frictional action extends to greater distance.”
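Maxwell's observation can be made concrete with a few lines of Python. This is only a sketch: the molecular mass, cross section, and mean speed are assumed illustrative numbers, and `viscosity_estimate` is a hypothetical helper that combines Eq. (2.5.19) with Eq. (2.5.20).

```python
def viscosity_estimate(n, sigma, m, mean_speed):
    """Combine l = 1/(n sigma) with eta = m n <|v|> l / 3."""
    mean_free_path = 1.0 / (n * sigma)          # grows as the gas gets rarer
    return m * n * mean_speed * mean_free_path / 3.0

# Assumed illustrative values (SI units), not measurements.
m = 4.8e-26          # kg, mass of one molecule
sigma = 2.8e-19      # m^2, effective cross section
mean_speed = 450.0   # m/s, mean magnitude of the peculiar velocity

for n in (1e23, 1e24, 1e25):   # number densities spanning a factor 100
    print(n, viscosity_estimate(n, sigma, m, mean_speed))
```

The density n cancels between the number of momentum carriers and the distance each one travels, so every line prints the same viscosity.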
One reason for finding this result surprising is that it raises the question
of whether a gas that is so rare that it is practically a vacuum can have any
viscosity. It was this point that had led Aristotle in his book Physics to argue
that a vacuum is impossible. He concluded from his experience with motion
under the influence of friction that the velocity imparted to a body by a given
force is inversely proportional to the resistance, thus anticipating Stokes’ law
(2.5.17), and so he reasoned that in a vacuum, where there can be no resistance,
all velocities would be infinite.
14 J. C. Maxwell, “Illustrations of the Dynamical Theory of Gases,” Phil. Mag. 19, 19; 20, 21 (1860).
15 Quoted on p. 27 of Brush, The Kinetic Theory of Gases, listed in the bibliography.
But, as we have seen, the derivation of the Navier–Stokes equation and
Stokes’ law rests on the assumption that the mean free path is much smaller
than the scale of distances over which the fluid velocity varies. For a gas
of sufficiently low density this will no longer be the case, and the concept
of viscosity loses any meaning. For instance, when a spacecraft or a missile
re-enters the Earth’s atmosphere at very great altitude, where the mean free
path is much larger than the dimensions of the re-entering body, the drag
force F on the body is at first not proportional to its velocity as would be
required by Stokes’ law for spheres, but rather is $F = C_D\,\rho A v^2$. Here ρ is the
air density; A is the vehicle’s cross-sectional area; v is its velocity; and $C_D$ is
a dimensionless “drag coefficient” that depends on the shape of the body. This
is the Knudsen regime. Only when the body reaches lower altitudes with much
smaller mean free paths does the drag force become proportional to velocity
and the body approach terminal velocity.
The fact that viscous drag is independent of gas density was regarded as
a confirmation of the molecular theory of gas dynamics. Maxwell himself
checked the validity of this result by measuring the viscosity of a gas at fixed
temperature, with pressure and hence density varying by a factor 60. The
observed constancy of viscosity over this large range of gas density tended to
confirm the molecular theory of gases, but in itself it revealed nothing about the
nature of molecules.
Diffusion
The general formulation above of the transport of momentum in a gas can be
extended to the transport of other physical quantities in general fluids. One such
quantity is the number density ν of particles suspended in a fluid. These can be
large molecules, such as molecules of sugar dissolved in water, or the tiny bits
of organic matter expelled from pollen grains noticed in 1827 by the botanist
Robert Brown (1773–1858), or artificial little balls used in studies of diffusion
to be discussed in the next section. The conservation of these particles requires
that their number density ν(x, t) satisfies an equation of the general form (2.5.2):
\[
\frac{\partial}{\partial t}\nu + \nabla\cdot\left(\nu\mathbf{v} + \mathbf{j}\right) = 0 , \tag{2.5.22}
\]
where v is the fluid bulk velocity. As in Eq. (2.5.5) we again separate the
convective term νv in the current from the diffusion term j. Since by itself
(∂/∂t)ν + ∇ · (νv) is Galilean-invariant, the diffusion term j must be a Galilean-
invariant vector, and if the scale over which the density ν varies is much larger
than relevant mean free paths then it is dominated by a term with a single
gradient, which can only be of the form
j = −D ∇ν , (2.5.23)
where D is a coefficient known as the diffusion constant.
For instance, if the fluid is at rest and D is independent of time and position
then Eq. (2.5.22) takes the form
\[
\frac{\partial}{\partial t}\nu = D\,\nabla^2\nu . \tag{2.5.24}
\]
Here is one solution:

\[
\nu(\mathbf{x},t) = \frac{N}{(4\pi Dt)^{3/2}} \exp\left(-\frac{\mathbf{x}^2}{4Dt}\right) , \tag{2.5.25}
\]
where N is a constant equal to the number $\int \nu\,d^3x$ of particles suspended in the
fluid. (This is one way of seeing that the coefficient D defined by Eq. (2.5.23)
must be positive.) This distribution is spherically symmetric and localized
within a radius of order $\sqrt{4Dt}$, which spreads with time owing to the diffusion
of the suspended particles through the fluid.
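That the Gaussian (2.5.25) really solves the diffusion equation (2.5.24) can be checked numerically. The following Python sketch evaluates ∂ν/∂t − D∇²ν by central finite differences at an arbitrarily chosen point; the function names, the test point, and the step size h are assumptions made for illustration only.

```python
import math

def nu(x, y, z, t, D=1.0, N=1.0):
    """The particular solution (2.5.25) of the diffusion equation."""
    r2 = x * x + y * y + z * z
    return N / (4.0 * math.pi * D * t) ** 1.5 * math.exp(-r2 / (4.0 * D * t))

def residual(x, y, z, t, D=1.0, h=1e-3):
    """Finite-difference estimate of d(nu)/dt - D * laplacian(nu)."""
    dnu_dt = (nu(x, y, z, t + h, D) - nu(x, y, z, t - h, D)) / (2.0 * h)
    laplacian = (nu(x + h, y, z, t, D) + nu(x - h, y, z, t, D)
                 + nu(x, y + h, z, t, D) + nu(x, y - h, z, t, D)
                 + nu(x, y, z + h, t, D) + nu(x, y, z - h, t, D)
                 - 6.0 * nu(x, y, z, t, D)) / h ** 2
    return dnu_dt - D * laplacian

# The residual should be zero up to discretization error of order h^2.
print(residual(0.3, -0.2, 0.5, 1.0))
print(residual(0.3, -0.2, 0.5, 1.0, D=0.5))
```

Both printed residuals should be far smaller than either term of the equation taken separately.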
A vivid description of how diffusion arises from the microscopic motion
of suspended particles was given in 1905 by Albert Einstein16 (1879–1955).
Consider a time interval τ that is short compared with the times over which
the distribution function changes appreciably but long enough that typical sus-
pended particles collide many times with the molecules of the fluid. In this
time the position of each suspended particle jumps by some random vector
amount $\boldsymbol{\Delta}$. These amounts differ from one suspended particle to another, in a
way that is governed by some sort of statistical distribution. Then for vanishing
bulk velocity v, the number density ν(x, t) is changed in this time interval to
\[
\nu(\mathbf{x}, t+\tau) = \overline{\nu(\mathbf{x} + \boldsymbol{\Delta}, t)} ,
\]
the bar indicating an average over the suspended particles. Assuming that ν(x, t)
is slowly varying over times of order τ and distances of order $|\boldsymbol{\Delta}|$, we can
expand both sides as Taylor series in τ and $\boldsymbol{\Delta}$:
\[
\tau\,\frac{\partial}{\partial t}\nu(\mathbf{x},t) + \cdots
= \sum_i \overline{\Delta_i}\,\frac{\partial\nu(\mathbf{x},t)}{\partial x_i}
+ \frac{1}{2}\sum_{ij} \overline{\Delta_i\Delta_j}\,\frac{\partial^2\nu(\mathbf{x},t)}{\partial x_i\,\partial x_j} + \cdots .
\]
Under the assumption that all directions of $\boldsymbol{\Delta}$ are equally likely, we have
\[
\overline{\Delta_i} = 0 , \qquad \overline{\Delta_i\Delta_j} = \frac{\delta_{ij}}{3}\,\overline{|\boldsymbol{\Delta}|^2} ;
\]
so, to leading order,
\[
\tau\,\frac{\partial}{\partial t}\nu(\mathbf{x},t) = \frac{1}{6}\,\overline{|\boldsymbol{\Delta}|^2}\,\nabla^2\nu(\mathbf{x},t) .
\]
Comparing this with the diffusion equation (2.5.24) for zero bulk velocity, we
see that the mean square displacement increases as
\[
\overline{|\boldsymbol{\Delta}|^2} = 6\tau D . \tag{2.5.26}
\]
This is in accord with the particular solution (2.5.25) of the diffusion equation;
calculating the integral $\overline{\mathbf{x}^2} = N^{-1}\int \nu(\mathbf{x},t)\,\mathbf{x}^2\,d^3x$ gives $\overline{\mathbf{x}^2} = 6Dt$.
The diffusion constant can be measured by observing this spreading out of
the suspended particles with time. The calculation by Einstein of the diffusion
constant D in terms of fundamental constants and its use to measure these
constants are discussed in the next section.
16 A. Einstein, Ann. Phys. 17, 549 (1905).
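The linear growth of the mean square displacement can be illustrated by simulating the random jumps directly. The sketch below is a toy model under stated assumptions: every jump has the same length and a uniformly random direction, the jumps in successive intervals are independent, and the function names are hypothetical. With these assumptions Eq. (2.5.26) gives D = (jump length)²/6τ.

```python
import math
import random

def random_unit_vector(rng):
    """A direction drawn uniformly over the sphere (all directions equally likely)."""
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    s = math.sqrt(1.0 - z * z)
    return (s * math.cos(phi), s * math.sin(phi), z)

def mean_square_displacement(steps, step_length, walkers, seed=0):
    """Average of |x|^2 over many walkers after the given number of jumps."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(walkers):
        x = y = z = 0.0
        for _ in range(steps):
            ux, uy, uz = random_unit_vector(rng)
            x += step_length * ux
            y += step_length * uy
            z += step_length * uz
        total += x * x + y * y + z * z
    return total / walkers

tau = 1.0                           # duration of one jump interval (arbitrary units)
step_length = 1.0                   # assumed fixed jump length
D = step_length ** 2 / (6.0 * tau)  # Eq. (2.5.26) applied to a single interval
steps = 50
msd = mean_square_displacement(steps, step_length, walkers=4000)
print(msd, 6.0 * D * steps * tau)   # simulated value versus 6 D t
```

The simulated mean square displacement should agree with 6Dt to within the statistical scatter of the finite sample of walkers.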
2.6 The Atomic Scale
We have seen various ways in which observations in the nineteenth century pro-
vided physicists with the values of only the ratios of quantities that characterize
the scale of individual atoms or molecules. The study of gases allowed measure-
ment of the gas constant R ≡ k/m1 (where k is Boltzmann’s constant and m1 is
the mass an atom would have if it had atomic weight unity, related to Avogadro’s
number by NA = 1/m1 ); the study of electrolysis allowed measurement of the
faraday, $F \equiv e/m_1$ (where e is the minimum electric charge that is transferred
in electrolysis); and, under the assumption that the charge of the electron is
the same as the unit e of electric charge transferred in electrolysis, the study
of the bending of cathode rays allowed measurement of $e/m_e$. Furthermore,
under the assumption that molecules are tightly packed in liquids and solids,
knowledge of the mass density of a liquid or solid gave an approximate value
for the ratio of the mass to the volume of individual molecules. A measurement
of m1 or k or e or me or the size of any molecule would yield results for all these
quantities. No accurate measurements of any of these individual quantities were
possible before the twentieth century, which is not to say that nineteenth century
chemists and physicists did not try.
Nineteenth Century Estimates

According to Eq. (2.5.21), the viscosity of a gas of known temperature and
molecular weight is given by known quantities times m/σ , where m is the
mass of the gas molecules and σ is their effective cross-sectional area.
Defining an effective radius a by $\sigma \equiv \pi a^2$ and setting the density in liquid form
equal to $m/(4\pi a^3/3)$ gave a rough estimate of both a and m. In this way, in
1865 Josef Loschmidt (1821–1895) estimated that air molecules have a
diameter of about $10^{-7}$ cm, and in effect that $m_1 \approx 2 \times 10^{-23}$ g, about ten times
too large.
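The algebra behind this kind of estimate is short: with σ = πa², the liquid density is ρ = m/(4πa³/3) = 3(m/σ)/(4a), so a = 3(m/σ)/(4ρ) and then m = (m/σ)πa². A minimal Python sketch follows; the input numbers are hypothetical placeholders chosen only to show the procedure, and are not Loschmidt's data.

```python
import math

def radius_and_mass(m_over_sigma, liquid_density):
    """Solve sigma = pi a^2 and rho = m / (4 pi a^3 / 3) for a and m.

    m_over_sigma comes from the gas viscosity via Eq. (2.5.21);
    liquid_density is the mass density of the same substance as a liquid.
    """
    a = 3.0 * m_over_sigma / (4.0 * liquid_density)
    m = m_over_sigma * math.pi * a ** 2
    return a, m

# Hypothetical inputs in CGS units, for illustration only.
m_over_sigma = 1.5e-8     # g / cm^2
liquid_density = 1.0      # g / cm^3
a, m = radius_and_mass(m_over_sigma, liquid_density)
print(a, m)               # effective radius in cm, molecular mass in g
```

Because a is proportional to m/σ and m to (m/σ)³, any error in the viscosity-based input is strongly amplified in the mass, which is one reason such estimates were rough.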
Using this and similar studies of gas properties, G. J. Stoney (1826–1911)

in 1874 estimated in effect that $m_1 \approx 10^{-25}$ g. Then, using the value of $e/m_1$
measured in electrolysis, he estimated that $e \approx 10^{-20}$ coulombs. He called this
the electrine.
Soon after the discovery of the electron, efforts were made to measure its
charge directly. At Thomson’s Cavendish Laboratory, J. S. E. Townsend (1868–
1957) studied falling clouds of water droplets that formed around electrically
charged ions in gases produced in electrolysis. If the droplets have radius a
and mass m then, as discussed in Section 2.5, they reach a terminal velocity
mg/6πηa at which viscous drag balances gravity, where η is the air viscosity.
Measuring the terminal velocity and air viscosity then gave a value for m/a.
A second relation between m and a was provided by the known density ρ of
liquid water, which gives $m = 4\pi a^3\rho/3$, so both m and a were known. The
droplets were collected, and their total mass and charge were measured.
The ratio of the total mass to the known mass m of each droplet gave the
number of droplets, and the ratio of the total electric charge to the number
of droplets then gave the charge per droplet. This charge was reported to be
always close to integer multiples of the same unit of charge, which Townsend
estimated to be $1.1 \times 10^{-19}$ coulombs, about 10 times the value found by Stoney.
Similar results, none very accurate, were obtained by Thomson himself and by
H. A. Wilson (1874–1964).
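The chain of reasoning in Townsend's method can be written out as a short Python sketch. Combining the terminal velocity mg/6πηa with m = 4πa³ρ/3 gives v = 2a²ρg/9η, which can be inverted for a. The function names and the default value of g are assumptions for illustration, and buoyancy in air is neglected, as in the description above.

```python
import math

def droplet_radius_and_mass(v_terminal, eta, rho, g=9.81):
    """Invert v = m g / (6 pi eta a) with m = 4 pi a^3 rho / 3 (SI units)."""
    a = math.sqrt(9.0 * eta * v_terminal / (2.0 * rho * g))
    m = 4.0 * math.pi * a ** 3 * rho / 3.0
    return a, m

def charge_per_droplet(total_charge, total_mass, droplet_mass):
    """Total collected charge divided by the inferred number of droplets."""
    number_of_droplets = total_mass / droplet_mass
    return total_charge / number_of_droplets
```

For example, `droplet_radius_and_mass(v, eta, rho)` with a measured fall speed v, air viscosity eta, and water density rho returns the radius and mass of a single droplet, which `charge_per_droplet` then uses to convert the total collected charge and mass into a charge per droplet.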
The early years of the twentieth century saw a great improvement in sci-
entists’ knowledge of atomic magnitudes. This improvement came from three
chief sources:
• accurate direct measurement of the electric charges carried by oil droplets
gave a value for e;
• measurements of effects due to the diffusion of small spheres suspended in a
fluid gave a value for Avogadro’s number17 NA = 1/m1 ;
• the study of black body radiation gave a value for k.
17 This is the way in which these experimental results were quoted by physicists at the time and have
generally been described by historians since then, but it is misleading. The formulas used to analyze
these experiments actually involved $R/N_A$, where R is the gas constant appearing in the ideal gas law
(1.2.3). Since R had already been measured, the measured value of $R/N_A$ could be used to find $N_A$. But
since $R = kN_A$, they were really measuring k, not $N_A$. I suppose that the results were cited in terms of
$N_A$ rather than k because Avogadro’s number was much more familiar to physicists of the time than the
Boltzmann constant of statistical mechanics.
Electronic Charge
One of the problems with the water droplets studied by Townsend et al. in their
estimates of the charge of the electron was that the masses of the droplets did
not remain fixed during the experiment, because water evaporates. To avoid
this, Robert Andrews Millikan (1868–1953) in 1906 studied individual oil
drops that had picked up electric charge from air ionized by X-rays. Unlike the
water droplets in the experiments of Townsend et al., these oil drops were
large enough that Millikan could study the motion of individual drops. As in
the earlier experiments, Millikan could measure the mass m and radius a of
individual drops from their terminal velocity in the absence of any external
electric fields, using the known density of oil and viscosity of air. Then, when
he turned on a strong vertical electric field E, a drop carrying electric charge q
would feel an electric force qE in addition to the gravitational force mg, so the
terminal velocity would be altered by an amount qE/6πηa. Measuring changes
in the terminal velocity, and knowing m and a, it was then possible to calculate
the changes in the drops’ electric charges. For instance, in one run the changes
in the electric charge q (in units of $10^{-19}$ coulombs) were
9.91, −11.61, 1.66, 5.00, 1.68, −8.31, 6.67, 5.02, etc.,
all close to integer multiples of $1.66 \times 10^{-19}$ coulombs. After repeated runs,
Millikan concluded that the fundamental unit of electric charge is $e = (1.592 \pm
0.003) \times 10^{-19}$ coulombs. (The modern value is $e = 1.6021765 \times 10^{-19}$
coulombs.) This immediately allowed the calculation of m1 (from the faraday
e/m1 ), and then k (from the ideal gas constant k/m1 ), and so on. Even more
importantly, the observation that droplet charges come close to integer multiples
of a unit charge gave direct evidence for the discreteness of electric charge.
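The statement that the charge changes in the run quoted above are close to integer multiples of a common unit can be checked directly. In this Python sketch the trial unit 1.66 is taken from the text; the least-squares refinement of the unit is an illustrative choice and is not claimed to be Millikan's own procedure.

```python
# Charge changes from the run quoted in the text, in units of 1e-19 coulombs.
charge_changes = [9.91, -11.61, 1.66, 5.00, 1.68, -8.31, 6.67, 5.02]
trial_unit = 1.66

# Nearest integer multiple of the trial unit for each observed change.
multiples = [round(q / trial_unit) for q in charge_changes]

# Least-squares estimate of the unit e: minimize sum (q_i - n_i e)^2 over e.
unit = (sum(n * q for n, q in zip(multiples, charge_changes))
        / sum(n * n for n in multiples))

# Deviations of each observation from an exact integer multiple.
residuals = [q - n * unit for n, q in zip(multiples, charge_changes)]

print(multiples)
print(unit, max(abs(r) for r in residuals))
```

The deviations from exact integer multiples are a small fraction of the unit itself, which is what makes the discreteness of the charge visible in the data.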
Brownian Motion
The diffusion of particles suspended in a fluid depends on the size and shape
of the particles, as well as on the fluid properties and fundamental constants.
Where the particles are molecules, such as sugar molecules dissolved in water,
it is not possible to deduce relevant information about their size and shape
with any precision from the properties of solids or liquids composed of these
molecules. In the first decade of the twentieth century Einstein had the idea
of learning about fundamental constants from observation of the diffusion of
artificial particles, like little spherical balls, whose shape, size, and mass were
accurately known. (This diffusion is a special case of what is termed “Brownian
motion,” after the botanist Robert Brown mentioned in the previous section.)
Einstein took notice of the common observation that it is possible to have
a time-independent inhomogeneous equilibrium distribution of particles such
as little balls suspended in a fluid, in which the effect of diffusion is can-
celled by a steady external force F acting on each ball. For example, this force
could be the combined force of gravity and buoyancy, so that it has magnitude
$F = g(m - m_{\rm disp})$ (where g is the gravitational acceleration, m is the ball’s
mass, and $m_{\rm disp}$ is the mass of the fluid displaced by the ball), and it acts
in the −z direction, where z is altitude. In equilibrium this is balanced by a
kind of pressure, known as the osmotic pressure. With the balls in thermal
equilibrium with the fluid at a uniform temperature T , and therefore with a

kinetic energy given by the equipartition of energy as 3kT /2, their random
motion, which is responsible for diffusion, exerts a pressure that, according
to the same arguments used to derive the ideal gas law (1.1.5), has the value
p(z) = ν(z)kT , where ν(z) is the number of these balls per unit volume at
vertical coordinate z. In equilibrium at uniform absolute temperature T the
balance of forces acting on such suspended particles in a slab of area A between
altitudes z and z + dz then requires that
[ν(z) − ν(z + dz)]AkT = F × ν(z)Adz ,
and therefore the force creates a decrease in the density of little balls with
increasing altitude
\[
\nu'(z)\,kT = -F\,\nu(z) . \tag{2.6.1}
\]
Einstein pointed out that in addition to a balance of forces, in equilibrium
there has to be a balance of currents. According to Stokes’ law, the external force
F acting on each little ball gives it a downward velocity v = F /6πηa, where a
is the ball radius and η is the fluid viscosity. If not compensated by diffusion, this
would give these balls a current −vν = −F ν/6πηa, the minus sign indicating
that this current is in the downward direction. But because ν decreases with
increasing altitude, diffusion produces an upward current given by Eq. (2.5.23)
as $-D\nu'(z)$. The cancellation of these two currents in equilibrium requires that
\[
\frac{F\,\nu(z)}{6\pi\eta a} = -D\,\nu'(z) . \tag{2.6.2}
\]
Einstein used Eq. (2.6.1) to eliminate the quantity $\nu'/F\nu$ in Eq. (2.6.2), and
concluded that$^{18}$
\[
D = \frac{kT}{6\pi\eta a} . \tag{2.6.3}
\]
In the appendix to this section a more direct derivation of this formula is given
(taking account of a possible correction) for the case where there is no external
force, and diffusion is actually taking place.
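Written out, the elimination step that leads to Eq. (2.6.3) is a one-line substitution. The following is only a restatement of Eqs. (2.6.1) and (2.6.2), with the prime denoting d/dz:

```latex
\[
\frac{\nu'(z)}{\nu(z)} = -\frac{F}{kT}
\quad\Longrightarrow\quad
\frac{F\,\nu(z)}{6\pi\eta a} = -D\,\nu'(z) = \frac{D F\,\nu(z)}{kT}
\quad\Longrightarrow\quad
D = \frac{kT}{6\pi\eta a} .
\]
```

Both F and ν(z) cancel in the last step, which is why the result does not depend on the strength of the force or on the density profile.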
Unlike sugar molecules or the grains observed in Brownian motion, the arti-
ficial little balls used for this purpose could be chosen to have a known uniform
radius a, so by measuring D at a given temperature T in a fluid of known
viscosity η, it was possible to find the Boltzmann constant k.
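A minimal Python sketch of Eq. (2.6.3) and its inversion follows. The numerical inputs (a ball of radius half a micron in a water-like fluid at room temperature) are assumed for illustration and are not taken from any experiment described here; the function names are hypothetical.

```python
import math

def diffusion_constant(k, T, eta, a):
    """Eq. (2.6.3): D = k T / (6 pi eta a)."""
    return k * T / (6.0 * math.pi * eta * a)

def boltzmann_from_diffusion(D, T, eta, a):
    """Invert Eq. (2.6.3) to obtain k from a measured diffusion constant D."""
    return 6.0 * math.pi * eta * a * D / T

# Assumed illustrative values in SI units.
k = 1.380649e-23   # J/K, Boltzmann constant
T = 293.0          # K
eta = 1.0e-3       # Pa s, roughly the viscosity of water
a = 0.5e-6         # m, ball radius

D = diffusion_constant(k, T, eta, a)
print(D)                                        # m^2 / s
print(boltzmann_from_diffusion(D, T, eta, a))   # recovers k
```

In an experiment the logic runs in the second direction: D, T, η, and a are measured, and k is the unknown.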
18 A. Einstein, “On the Motion of Small Particles Suspended in Liquids at Rest Required by the Molecular
Theory of Heat,” Ann. Phys. 17, 549 (1905). Because F has dropped out of the final formula for D,
Einstein’s result is independent of the nature of the force acting on the suspended particles, though for
simplicity we have assumed that this force is independent of position.
Within a few years after Einstein’s 1905 paper, an experimental study of the
diffusion of small bodies was carried out in Paris by Jean Perrin (1870–1942).
Perrin measured k (or as he said, Avogadro’s number) by observing the decrease
with altitude of the density of little balls suspended in a vertical column of fluid.
Equation (2.6.1) has the elementary solution
ν(z) ∝ exp(−F z/kT ) . (2.6.4)
(Perrin gave this solution in the form ν(z) ∝ exp [−NA F z/RT ].) Using the
known value of the combined gravitational and buoyancy force F on the little
balls gave a value for NA /R = 1/k, which by using the known value of the gas
constant R, Perrin reported$^{19}$ as a value for Avogadro’s number $N_A = R/k =
7.05 \times 10^{23}$/mole, corresponding to $m_1 = 1.42 \times 10^{-24}$ g. As was usual at the
time, no figure was given for the uncertainty of the measurement.
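The way a vertical density profile yields k can be sketched in Python. From Eq. (2.6.4), the ratio of densities at two altitudes is exp[−F(z₂ − z₁)/kT], so k = F(z₂ − z₁)/(T ln(ν₁/ν₂)). The numbers below are synthetic: the force, altitudes, and counts are assumed values generated from a chosen k, not Perrin's data.

```python
import math

def boltzmann_from_column(F, T, z1, count1, z2, count2):
    """Use nu(z) proportional to exp(-F z / kT) at two altitudes to solve for k."""
    return F * (z2 - z1) / (T * math.log(count1 / count2))

# Synthetic illustration (SI units): generate counts from an assumed k, then recover it.
k_assumed = 1.38e-23     # J/K
T = 293.0                # K
F = 6.6e-17              # N, assumed net force of gravity minus buoyancy on one ball
z1, z2 = 0.0, 3.0e-5     # m, two observation altitudes
count1 = 100.0
count2 = count1 * math.exp(-F * (z2 - z1) / (k_assumed * T))

print(boltzmann_from_column(F, T, z1, count1, z2, count2))
```

With real counts the recovered value carries the statistical uncertainty of the counting, which this sketch does not model.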
Perrin also used microscopic measurements over several minutes of the root
mean square diffusion of suspended balls in the horizontal direction, in which
no force is acting. He found that as expected the mean square displacement
is proportional to the elapsed time. Using Eq. (2.5.26) gave a value for the
diffusion constant D and, using Einstein’s formula (2.6.3) (which Perrin, like
Einstein, wrote as $D = RT/6\pi\eta a N_A$), he found$^{20}$ that $N_A = 7.15 \times 10^{23}$/mole,
corresponding to $m_1 = 1.40 \times 10^{-24}$ g. The fair agreement of this result,
which was obtained by direct observation of diffusing particles, with Perrin’s
earlier measurement based on equilibrium in a vertical column gave support
to the view that diffusion is due to the motion of the balls in equilibrium with
randomly moving molecules. Perrin was not hesitant in concluding that his work
confirmed the reality of molecules – his results were summarized in a long
article21 titled “Brownian Movement and Molecular Reality.” His measure-
ments were not far off – with modern definitions of molecular weight, the value
of Avogadro’s number is $6.022142 \times 10^{23}$/mole.
Black Body Radiation

As we will see in the next chapter, in 1900 the early ideas of Max Planck22
(1858–1947) about quantum theory led to a formula for the distribution with
frequency of the radiation energy emitted by a totally absorbing body, which
depended on the value of kT . Comparison of Planck’s formula with observation
gave $k \simeq 1.34 \times 10^{-16}$ erg/K.
19 J. Perrin, Comptes rendus cxlvi, 167 (1908) and cxlvii, 530 (1908).
20 J. Perrin, Comptes rendus cxlvii, 1044 (1908).
21 See Perrin, Brownian Movement and Molecular Reality, listed in the bibliography.
22 M. Planck, Verh. d. deutsche phys. Ges. 2, 202, 237 (1900).
Consistency
The atomic theory underlying these measurements of microscopic parameters
and the values found gained much credit from the consistency of the results
obtained. For instance, in 1901 Planck used his measurement of k together with
the known value $R = 8.27 \times 10^{7}$ erg/mole K of the gas constant to calculate a
value for Avogadro’s number $N_A = R/k = 6.17 \times 10^{23}$/mole, in fair
agreement with Perrin’s later result $N_A \simeq 7 \times 10^{23}$/mole. Planck also used this
result together with the known value of the faraday, $F = eN_A = 9.63 \times 10^{4}$
coulombs/mole, to calculate the unit of charge, $e = 1.56 \times 10^{-19}$ coulombs, in
very good agreement with Millikan’s result.
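The arithmetic behind this consistency check is easy to reproduce. The following Python lines use only the values quoted in the text; the variable names are illustrative.

```python
# Values quoted in the text.
R = 8.27e7          # erg / (mole K), gas constant
k = 1.34e-16        # erg / K, from Planck's black body formula
faraday = 9.63e4    # coulombs / mole

avogadro = R / k                    # N_A = R / k
unit_charge = faraday / avogadro    # e = F / N_A
m1 = 1.0 / avogadro                 # grams, using N_A = 1 / m1

print(avogadro, unit_charge, m1)
```

The first two results reproduce the quoted values of Avogadro's number and the unit of charge to the stated number of digits.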
This happy agreement of fundamental constants led to a widespread accep-
tance of the atomic theory of matter. For instance, the chemist F. W. Ostwald
(1853–1932) had been a determined opponent of the atomic theory, but in 1908
he finally admitted that “I am now convinced that we have recently become
possessed of experimental evidence of the discrete or grained nature of matter,
which the atomic hypothesis sought in vain for hundreds and thousands of
years.”
An adverse voice remained. The physicist–philosopher Ernst Mach (1838–
1916), who spoke of “the artificial hypothetical atoms of chemistry and
physics,” never accepted their existence. As late as 1916, shortly before his
death, he declared that “I can accept the theory of relativity as little as I can
accept the existence of atoms and other such dogmas.” This goes to show that
a scientist can maintain his own principles, bravely holding out against a wide
consensus of the scientific establishment, and still be wrong.
Appendix: Einstein’s Diffusion Constant Rederived
Einstein’s derivation of Eq. (2.6.3) for the diffusion constant D relied on the
introduction of an external force F acting on suspended particles, which
prevents their diffusion from disturbing a time-independent equilibrium particle
distribution. The presence of such an external force such as gravity is not
uncommon, but it ought to be possible to obtain the same result where there is
no external force, and where diffusion is actually taking place. Below is such
a derivation, which indicates the presence of a correction for particles whose
mass is not negligible.
The mean velocity v(x, t) of diffusing suspended particles at position x and
time t is given by setting the current (2.5.23) equal to νv:
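\[
\mathbf{v}(\mathbf{x},t) = -\frac{D\,\nabla\nu(\mathbf{x},t)}{\nu(\mathbf{x},t)} . \tag{2.6.5}
\]
According to Stokes’ law, spherical balls of radius a with this mean velocity
experience a mean viscous drag force:
\[
\mathbf{F}_{\rm vis} = -6\pi\eta a\,\mathbf{v} = \frac{6\pi\eta a D\,\nabla\nu}{\nu} , \tag{2.6.6}
\]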
with the signs indicating that the viscous force is in a direction opposite to that
of v, and hence in the direction of the gradient of the particle number density.
Diffusion occurs because this drag is overcome by osmotic pressure. Following
the same reasoning that led to Eq. (2.6.1), if the gradient of ν is along the
x-direction, then the force due to an environment at uniform temperature T
on the particles in a small disk of area dA and thickness dx transverse to
the x-direction is the osmotic pressure force $dA\,kT\,[\nu(x,t) - \nu(x+dx,t)] =
-dA\,kT\,dx\,d\nu(x,t)/dx$ on the disk. Dividing this by the number $dA\,dx\,\nu(x,t)$
of suspended particles in the disk gives the osmotic pressure force on each
particle:
\[
F_{\rm osm} = \frac{-dA\,kT\,dx\,d\nu(x,t)/dx}{dA\,dx\,\nu(x,t)} = -\frac{kT\,d\nu(x,t)/dx}{\nu(x,t)} .
\]
Since in the absence of external forces there is nothing special about the x-
direction, for a gradient in a general direction we have
\[
\mathbf{F}_{\rm osm} = -\frac{kT\,\nabla\nu}{\nu} . \tag{2.6.7}
\]
Assuming that the viscous drag is cancelled by the osmotic pressure, we have
\[
0 = \mathbf{F}_{\rm vis} + \mathbf{F}_{\rm osm} , \tag{2.6.8}
\]
which gives Einstein’s formula (2.6.3):
\[
D = \frac{kT}{6\pi\eta a} . \tag{2.6.9}
\]
More generally, we should take into account the possibility that the viscous
drag is not precisely cancelled by the osmotic pressure. In this case, Newton’s
law gives
\[
m\,\frac{d\mathbf{v}}{dt} = \mathbf{F}_{\rm vis} + \mathbf{F}_{\rm osm} , \tag{2.6.10}
\]
where m is the mass of the balls and the acceleration dv/dt is the total time
derivative of the mean velocity, due both to the change in mean velocity at a
fixed position and to the change in mean velocity of the particles carried from
one point to another at the mean velocity:
\[
\frac{d\mathbf{v}}{dt} = \frac{\partial\mathbf{v}}{\partial t} + \mathbf{v}\cdot\nabla\mathbf{v} . \tag{2.6.11}
\]
Inspection of Eqs. (2.6.5), (2.6.11), (2.5.22), and (2.5.23) shows that the magni-
tude of the acceleration can depend only on D and on L, the scale of distances
over which ν varies appreciably. Dimensional analysis then tells us that it must
be of order

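\[
\left|\frac{d\mathbf{v}}{dt}\right| \approx \frac{D^2}{L^3} . \tag{2.6.12}
\]
This shifts the value of $|\mathbf{F}_{\rm vis}|$ for a given $\mathbf{F}_{\rm osm}$ by an amount of order $mD^2/L^3$,
and hence shifts the value of the diffusion constant derived from Eq. (2.6.6) by
a fractional amount of order
\[
\frac{\Delta D}{D} \approx (mD/L^3) \times (L/6\pi\eta a) \approx \frac{m\,kT}{L^2\,(6\pi\eta a)^2} . \tag{2.6.13}
\]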
Einstein’s formula for D is valid only if this is much less than one.
Einstein did not see this correction, because he was assuming that an
external force was preventing any mean motion, so that there were no inertial
forces. But the correction would affect the horizontal diffusion of suspended
balls in Perrin’s measurement of the diffusion constant. I do not know the
parameters in Perrin’s experiment, but the fact that he obtained close values for
Avogadro’s number from the measurement of horizontal diffusion and from the
measurement of the vertical distribution of suspended balls indicates that in his
experiment the correction (2.6.13) was not very large.
This is reassuring regarding the derivation of the Navier–Stokes equation
(2.5.16), in which it was assumed that terms of second order in the inverse
of the scale L over which properties of the fluid vary can be neglected. Using
$mv^2 \approx kT$, where v is a typical particle velocity, we see that the fractional
correction (2.6.13) is of order $(\mathcal{L}/L)^2$, where $\mathcal{L} \equiv kT/6\pi\eta a v$ is approximately
the distance in which viscous forces will bring a particle with radius a and
velocity v to rest. It is the ratios of just such microscopic lengths as $\mathcal{L}$ to the
scale L of macroscopic variation whose second and higher powers are dropped
in the derivation of the Navier–Stokes equation.
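To get a feeling for the size of the correction (2.6.13), and to confirm that it equals the squared ratio of the stopping length to the macroscopic scale when mv² = kT, here is a small Python sketch. The ball radius, ball density, fluid viscosity, and scale L are assumed illustrative values, not parameters of any experiment discussed above, and the function names are hypothetical.

```python
import math

def fractional_correction(m, kT, L, eta, a):
    """Eq. (2.6.13): m kT / (L^2 (6 pi eta a)^2)."""
    return m * kT / (L ** 2 * (6.0 * math.pi * eta * a) ** 2)

def stopping_length(kT, eta, a, v):
    """The microscopic length kT / (6 pi eta a v)."""
    return kT / (6.0 * math.pi * eta * a * v)

# Assumed illustrative values in SI units.
a = 0.5e-6                                   # m, ball radius
m = 4.0 * math.pi * a ** 3 * 1200.0 / 3.0    # kg, ball of assumed density 1200 kg/m^3
kT = 1.380649e-23 * 293.0                    # J
eta = 1.0e-3                                 # Pa s, water-like fluid
L = 1.0e-5                                   # m, scale over which the density varies

v = math.sqrt(kT / m)                        # typical velocity from m v^2 = kT
correction = fractional_correction(m, kT, L, eta, a)
ratio_squared = (stopping_length(kT, eta, a, v) / L) ** 2
print(correction, ratio_squared)
```

The two printed numbers coincide, and for parameters of this kind both are far below one, so the Einstein formula is unaffected in practice.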
3
Early Quantum Theory
The early years of quantum theory were a time of guesswork, inspired by prob-
lems presented by the properties of atoms and radiation and their interaction.
This is the subject of the present chapter. Later, in the 1920s, this struggle led to
the systematic theory known as quantum mechanics, the subject of Chapter 5.
3.1 Black Body Radiation
Quantum mechanics started with the problem of understanding radiation in

thermal equilibrium at a non-zero temperature. We define E (ν, T ) dν as the
energy per volume of radiation with frequency between ν and ν + dν in an
enclosure with walls at uniform temperature T . As noted in 1859–1862 by
Gustav Robert Kirchhoff (1824–1887), this distribution is independent of any
property of the enclosure except for its temperature, because to change E (ν, T )
by changing the material or the shape of the enclosure would require taking
energy from one frequency to another, while keeping the same temperature,
which is impossible.
Radiation Absorption, Emission, and Energy Density

Kirchhoff called this “black body radiation.” This term refers to a relation
between the energy density and the rates at which radiation is emitted and
absorbed from any black heated surface. Consider radiation in an enclosure
whose walls are at a uniform temperature T , and think how to calculate the
energy received by a small patch of area dA on the inner walls of the enclosure.
At a point within the enclosure at a distance r from this patch, the patch
subtends a solid angle $dA\cos\theta/r^2$, where θ is the angle between the line of
sight from the point to the patch and the normal to the patch. Hence a fraction
$dA\cos\theta/4\pi r^2$ of the radiation at this point is aimed at the patch. In a time t all
the radiation at a distance r < ct that is aimed at the patch will hit it (where c
is the speed of light), so the total rate per unit time, per unit area, and
per unit frequency interval at which radiation energy at a frequency near ν hits
the patch will be
\[
\frac{1}{t\,dA\,d\nu} \int_0^{ct} 2\pi r^2\,dr \int_0^{\pi/2} \sin\theta\,d\theta\;
\frac{dA\cos\theta}{4\pi r^2}\; E(\nu,T)\,d\nu = \frac{c}{4}\,E(\nu,T) . \tag{3.1.1}
\]
Equilibrium requires that the rate per area of emission of radiation energy in
a frequency interval dν must equal the rate per area of absorption of radiation
energy in that frequency interval, which is $(c/4)\,f(\nu,T)\,E(\nu,T)\,d\nu$, where
f (ν, T ) ≤ 1 is the fraction of energy of radiation of frequency ν that is absorbed
when it hits the wall of the enclosure. The emission is evidently greatest for
“black” walls, which absorb all the radiation that falls on them, so that f (ν, T )
takes its maximum value, f (ν, T ) = 1.
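The factor 1/4 in Eq. (3.1.1) comes entirely from the angular integral: the radial integral gives ct, which cancels the 1/t, and what remains is one half of the integral of sin θ cos θ over the hemisphere of directions. A brief Python check follows; the midpoint rule and the function name are illustrative choices.

```python
import math

def flux_factor(steps=20000):
    """Midpoint-rule value of (1/2) * integral over [0, pi/2] of sin(theta) cos(theta)."""
    h = (math.pi / 2.0) / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h
        total += 0.5 * math.sin(theta) * math.cos(theta)
    return total * h

print(flux_factor())   # multiplies c E(nu, T) in Eq. (3.1.1)
```

The same geometric factor appears whenever an isotropic flux of particles or radiation moving at a common speed strikes a flat surface.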
In the 1890s Eq. (3.1.1) was used at the Physikalisch-Technische Reich-
sanstalt in Berlin to accurately measure E (ν, T ). This presented a challenge to
theorists, to understand the measured distribution E (ν, T ).
Electromagnetic Degrees of Freedom

To use the equipartition of energy to calculate the radiation energy density
E (ν, T ) from first principles it is necessary to identify the degrees of freedom of
radiation among which energy is shared. The deepest understanding of radiation
at the beginning of the twentieth century was based on Maxwell’s equations. In
unrationalized electrostatic units these are
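\[
\begin{aligned}
\nabla\times\mathbf{B} - \frac{1}{c}\frac{\partial\mathbf{E}}{\partial t} &= \frac{4\pi}{c}\,\mathbf{J}\,, &\qquad \nabla\cdot\mathbf{E} &= 4\pi\rho\,,\\
\nabla\times\mathbf{E} + \frac{1}{c}\frac{\partial\mathbf{B}}{\partial t} &= 0\,, &\qquad \nabla\cdot\mathbf{B} &= 0\,,
\end{aligned}
\tag{3.1.2}
\]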
where E(x, t) and B(x, t) are the electric and magnetic fields, while J(x, t) and
ρ(x, t) are the electric current density and charge density. For empty space,
ρ = J = 0, and Maxwell’s equations have solutions of the form
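\[
\begin{aligned}
\mathbf{E}(\mathbf{x},t) &= \mathbf{e}\,\exp(i\mathbf{k}\cdot\mathbf{x} - i\omega t) + \text{c.c.}\\
\mathbf{B}(\mathbf{x},t) &= \mathbf{b}\,\exp(i\mathbf{k}\cdot\mathbf{x} - i\omega t) + \text{c.c.}
\end{aligned}
\tag{3.1.3}
\]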
where k and ω are real constants; e and b are complex constant three-
vectors; and c.c. denotes the complex conjugate of the preceding term. Since
in Eq. (3.1.3) we are including terms proportional to both exp(−iωt) and
exp(iωt), without loss of generality we can take ω > 0. Inserting (3.1.3) into
(3.1.2), we see that this is a solution for ρ = J = 0 if and only if
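\[
\begin{aligned}
\mathbf{k}\times\mathbf{b} + \frac{\omega}{c}\,\mathbf{e} &= 0\,, &\qquad \mathbf{k}\cdot\mathbf{e} &= 0\,,\\
\mathbf{k}\times\mathbf{e} - \frac{\omega}{c}\,\mathbf{b} &= 0\,, &\qquad \mathbf{k}\cdot\mathbf{b} &= 0\,.
\end{aligned}
\tag{3.1.4}
\]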
Combining these, we have
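\[
\frac{\omega^2}{c^2}\,\mathbf{e} = -\frac{\omega}{c}\,\mathbf{k}\times\mathbf{b} = -\mathbf{k}\times[\mathbf{k}\times\mathbf{e}] = \mathbf{k}^2\,\mathbf{e}\;; \tag{3.1.5}
\]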
so ω = |k|c, and electromagnetic radiation therefore propagates at the speed c.
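As an illustrative check of the conditions (3.1.4), the sketch below picks an arbitrary wave vector, builds an amplitude e orthogonal to it, sets b = (c/ω) k × e with ω = |k|c, and verifies that all four conditions hold. The particular vector k and the use of a real (rather than complex) e are simplifying assumptions made only for this example; the helper names `cross` and `dot` are not from the text.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Dot product of two 3-vectors given as tuples."""
    return sum(x * y for x, y in zip(a, b))

c = 2.99792458e10                 # speed of light, cm/s
k = (1.0, 2.0, 2.0)               # arbitrary wave vector, cm^-1 (illustrative)
omega = c * math.sqrt(dot(k, k))  # dispersion relation omega = |k| c

e = cross(k, (0.0, 0.0, 1.0))     # any vector orthogonal to k will do
b = tuple((c / omega) * x for x in cross(k, e))

# Left-hand sides of the two vector conditions in Eq. (3.1.4); both should vanish.
residual_1 = tuple(kb + (omega / c) * ei for kb, ei in zip(cross(k, b), e))
residual_2 = tuple(ke - (omega / c) * bi for ke, bi in zip(cross(k, e), b))

print(residual_1, residual_2, dot(k, e), dot(k, b))
```

Changing `omega` to anything other than |k|c leaves a nonzero `residual_1`, which is the content of Eq. (3.1.5).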
Now, we want to calculate the electromagnetic energy in a finite volume V .
Since ℰ is universal, we can take our enclosure to be a cube, with edges
L = V^{1/3} that lie along the 1-, 2-, and 3-directions. Whatever boundary conditions
the material of the enclosure imposes on the phases of the waves, they must
be the same on opposite sides of the cube, so the phase k · x can only change by
an integer multiple of 2π when x1, x2, or x3 is shifted by L. That is, the wave
number k and frequency ω must take the form
\[
\mathbf{k}_n = \frac{2\pi}{L}\,\mathbf{n}\,, \qquad \omega_n = c\,|\mathbf{k}_n|\,, \tag{3.1.6}
\]
where n is a vector with integer components n1 , n2 , and n3 . Hence the general
electric and magnetic fields in the enclosure are

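\[
\mathbf{E}(\mathbf{x},t) = \sum_{\mathbf{n}} \mathbf{e}(\mathbf{n})\,\exp(i\mathbf{k}_n\cdot\mathbf{x} - i\omega_n t) + \text{c.c.} \tag{3.1.7}
\]
\[
\mathbf{B}(\mathbf{x},t) = \sum_{\mathbf{n}} \frac{c}{\omega_n}\,[\mathbf{k}_n\times\mathbf{e}(\mathbf{n})]\,\exp(i\mathbf{k}_n\cdot\mathbf{x} - i\omega_n t) + \text{c.c.}\,, \tag{3.1.8}
\]
where e(n) is, for each n, a three-vector orthogonal to n, and c.c. denotes the
complex conjugate of the previous term.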
It is a well-known result of classical electrodynamics that the energy density
in radiation is (E2 + B2 )/8π. To integrate this over the volume of the enclosure,
we use the orthogonality relations

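\[
\begin{aligned}
\int_V d^3x\; e^{i(\mathbf{k}_n - \mathbf{k}_m)\cdot\mathbf{x}} &=
\begin{cases} V & \mathbf{n} = \mathbf{m} \\ 0 & \mathbf{n} \neq \mathbf{m}\,, \end{cases}\\
\int_V d^3x\; e^{i(\mathbf{k}_n + \mathbf{k}_m)\cdot\mathbf{x}} &=
\begin{cases} V & \mathbf{n} = -\mathbf{m} \\ 0 & \mathbf{n} \neq -\mathbf{m}\,. \end{cases}
\end{aligned}
\tag{3.1.9}
\]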
(For instance, in one dimension for n ≠ m,
\[
\int_0^L dx\; e^{(2\pi i/L)(n-m)x} = \frac{L}{2\pi i\,(n-m)}\left[e^{2\pi i (n-m)} - 1\right] = 0\,,
\]
while for n = m it is just L. In three dimensions, the integral is a product of
similar factors.) It follows then that

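\[
\begin{aligned}
\frac{1}{V}\int_V d^3x\;\mathbf{E}^2(\mathbf{x},t) ={}& \sum_{\mathbf{n}} \mathbf{e}(\mathbf{n})\cdot\mathbf{e}(-\mathbf{n})\,e^{-2i\omega_n t}
+ \sum_{\mathbf{n}} \mathbf{e}^*(\mathbf{n})\cdot\mathbf{e}^*(-\mathbf{n})\,e^{+2i\omega_n t}\\
&+ 2\sum_{\mathbf{n}} \mathbf{e}(\mathbf{n})\cdot\mathbf{e}^*(\mathbf{n})\,,\\
\frac{1}{V}\int_V d^3x\;\mathbf{B}^2(\mathbf{x},t) ={}& \sum_{\mathbf{n}} \left(\frac{c}{\omega_n}\right)^2 (\mathbf{k}_n\times\mathbf{e}(\mathbf{n}))\cdot(-\mathbf{k}_n\times\mathbf{e}(-\mathbf{n}))\,e^{-2i\omega_n t}\\
&+ \sum_{\mathbf{n}} \left(\frac{c}{\omega_n}\right)^2 (\mathbf{k}_n\times\mathbf{e}^*(\mathbf{n}))\cdot(-\mathbf{k}_n\times\mathbf{e}^*(-\mathbf{n}))\,e^{+2i\omega_n t}\\
&+ 2\sum_{\mathbf{n}} \left(\frac{c}{\omega_n}\right)^2 (\mathbf{k}_n\times\mathbf{e}(\mathbf{n}))\cdot(\mathbf{k}_n\times\mathbf{e}^*(\mathbf{n}))\,.
\end{aligned}
\]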
Noting that for k · e = k · e′ = 0 we have (k × e) · (k × e′) = k² e · e′, and
noting also that ωn² = c² kn², we see that the terms proportional to exp(−2iωn t)
in the electric and magnetic energy cancel, as do the terms proportional to
exp(+2iωn t), leaving us with the total energy:

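\[
E = \frac{1}{8\pi}\int_V d^3x\,\left[\mathbf{E}^2(\mathbf{x}) + \mathbf{B}^2(\mathbf{x})\right] = \frac{V}{2\pi}\sum_{\mathbf{n}} \mathbf{e}(\mathbf{n})\cdot\mathbf{e}^*(\mathbf{n})\,. \tag{3.1.10}
\]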
There are two independent components of each e(n) orthogonal to n, each with
independent real and imaginary terms, all four quantities for each n contributing
independently to E, so there are four degrees of freedom for each n.
We will assume that L = V^{1/3} is much larger than the wavelengths c/ν under
consideration, so that the frequencies νn ≡ ωn /2π are very close together
and we can replace sums over n with integrals over ν. To count the num-
ber of integer-component vectors n in a given range of frequencies, note that,
according to Eq. (3.1.6),
\[
|\mathbf{n}| = |\mathbf{k}_n|\,L/2\pi = \omega_n L/2\pi c = \nu_n L/c\,.
\]
The number of allowed frequencies between ν and ν + dν therefore equals the
number of integer-component vectors n in a shell with |n| between νL/c and
(ν + dν)L/c. These vectors form a cubic lattice with lattice site width unity,
so the number dN of these vectors in this shell just equals the volume of the
shell:
\[
dN = 4\pi|\mathbf{n}|^2\,d|\mathbf{n}| = 4\pi (L/c)^3\,\nu^2\,d\nu = 4\pi V \nu^2\,d\nu/c^3\,. \tag{3.1.11}
\]
With two polarizations for each n, the total energy density per frequency interval
is then
\[
\mathcal{E}(\nu,T) = \frac{8\pi\nu^2}{c^3}\,\overline{E}(\nu,T)\,, \tag{3.1.12}
\]
where Ē(ν, T ) is the mean energy for each of the two complex polarization
vectors orthogonal to a wave vector k with a given value of ν = |k|c/2π.
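The counting argument behind Eq. (3.1.11) can be tested by brute force: count the integer vectors n in a shell and compare with the shell's volume. The sketch below is only an illustration; the shell radii 30 and 32 are arbitrary choices, and the function names are not from the text.

```python
import math

def count_lattice_vectors(r_min: float, r_max: float) -> int:
    """Count integer vectors n = (n1, n2, n3) with r_min <= |n| < r_max."""
    limit = int(math.ceil(r_max))
    lo2, hi2 = r_min ** 2, r_max ** 2
    count = 0
    for n1 in range(-limit, limit + 1):
        for n2 in range(-limit, limit + 1):
            for n3 in range(-limit, limit + 1):
                s = n1 * n1 + n2 * n2 + n3 * n3
                if lo2 <= s < hi2:
                    count += 1
    return count

def shell_volume(r_min: float, r_max: float) -> float:
    """Exact volume of the spherical shell between r_min and r_max."""
    return 4.0 * math.pi / 3.0 * (r_max ** 3 - r_min ** 3)

r_min, r_max = 30.0, 32.0
exact_count = count_lattice_vectors(r_min, r_max)
volume = shell_volume(r_min, r_max)
# Thin-shell form 4*pi*|n|^2 d|n| used in Eq. (3.1.11), evaluated at the mid-radius.
thin_shell = 4.0 * math.pi * ((r_min + r_max) / 2) ** 2 * (r_max - r_min)

print(exact_count, round(volume), round(thin_shell))
```

The agreement improves as |n| = νL/c grows, which is why the text assumes that L is much larger than the wavelengths of interest.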
The Rayleigh–Jeans Distribution

In 1900 a calculation along these lines was presented by John William Strutt
(1842–1919), better known as Lord Rayleigh.1 He used the result of classical
thermodynamics, described here in Section 2.4, that for systems whose total
energy can be expressed as a sum over degrees of freedom of squared ampli-
tudes, as in Eq. (3.1.10), each degree of freedom such as Re(e) and Im(e) for
a given polarization and wave vector contributes an energy kT /2, so the mean
total energy for a given polarization and wave vector is Ē = kT . Using this in
Eq. (3.1.12) gives an energy density
\[
\mathcal{E}(\nu,T) = \frac{8\pi kT\,\nu^2}{c^3}\,. \tag{3.1.13}
\]
A more detailed derivation is given in the next section, to serve as a basis for
the modification introduced by Einstein.
Rayleigh had made a mistake of a factor 8, which was corrected in 1905
by James Jeans2 (1877–1946); the result (3.1.13) is therefore known as the
Rayleigh–Jeans formula. Unfortunately a mere factor 8 was the least of
Rayleigh’s problems. If Eq. (3.1.13) held at all frequencies, however high,
then the total energy E in a volume V at any temperature T ≠ 0 would be given
by a divergent integral:
\[
E = \frac{8\pi kT\,V}{c^3}\int_0^\infty \nu^2\,d\nu\,,
\]
a result that became known as the ultraviolet catastrophe.
The Planck Distribution

Meanwhile, back in Berlin, a different approach was being followed by Max
Planck (1858–1947). Measurements indicated that ℰ(ν, T ) increases as ν²
for small ν, reaches a maximum at a frequency proportional to temperature,
and decreases more or less exponentially for large ν. To fit this behavior, it
would be natural to guess that ℰ(ν, T ) = CT ν² exp(−C′ν/T ), with C and
C′ some temperature-independent constants, which would give a total energy
∫ dν ℰ(ν, T ) proportional to T⁴, which as we saw in Section 2.3 is required by
1 Lord Rayleigh, Phil. Mag. 49, 539 (1900); Nature 72, 54 (1905).
2 J. Jeans, Phil. Mag. 10, 91 (1905).
thermodynamics. But this formula would not agree with a more detailed result
of classical thermodynamics, known as the Wien displacement law.3 Planck in
1900 guessed the formula4
\[
\mathcal{E}(\nu,T)\,d\nu = \frac{8\pi h}{c^3}\,\frac{\nu^3\,d\nu}{\exp(h\nu/kT) - 1}\,, \tag{3.1.14}
\]
where h and k again are constants.
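To see how Eq. (3.1.14) relates to the Rayleigh–Jeans formula (3.1.13), the two can be compared numerically at low and high frequency. This is a sketch only: the constants are present-day CGS values (not the values quoted in the text), and the temperature and frequencies are arbitrary illustrative choices.

```python
import math

H = 6.62607015e-27   # Planck constant, erg s (present-day value, an assumption here)
K = 1.380649e-16     # Boltzmann constant, erg/K
C = 2.99792458e10    # speed of light, cm/s

def planck_density(nu: float, temp: float) -> float:
    """Energy density per unit frequency from Eq. (3.1.14), erg s / cm^3."""
    x = H * nu / (K * temp)
    return (8.0 * math.pi * H / C ** 3) * nu ** 3 / math.expm1(x)

def rayleigh_jeans_density(nu: float, temp: float) -> float:
    """Energy density per unit frequency from Eq. (3.1.13), erg s / cm^3."""
    return 8.0 * math.pi * K * temp * nu ** 2 / C ** 3

temp = 300.0  # kelvin, illustrative
low_ratio = planck_density(1e9, temp) / rayleigh_jeans_density(1e9, temp)
high_ratio = planck_density(1e15, temp) / rayleigh_jeans_density(1e15, temp)
print(low_ratio, high_ratio)
```

At the low frequency the ratio is close to one, while at the high frequency the Planck density is exponentially suppressed relative to the Rayleigh–Jeans value.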
A little later in the same year, Planck published an attempted derivation5
of Eq. (3.1.14), which indicated that k is Boltzmann’s constant, while h is a
new constant, known ever since as Planck’s constant. To derive this formula, he
adopted a model of the wall of the enclosure whereby it consists of electrically
charged harmonic oscillators with a wide range of frequencies, with the oscilla-
tors of frequency ν coming into equilibrium with the electromagnetic radiation
of frequency ν. Planck assumed that the energies of oscillators of frequency ν
can only take the form E = nhν, with n a positive integer. Planck calculated
the radiation emitted by these oscillators when they are in thermal equilibrium at
temperature T , and found that in order for them to absorb just as much radiation
as they emit, the radiation in the enclosure must have the energy density
distribution given by Eq. (3.1.14). We will not go into Planck’s derivation because
it was superseded a few years later by the modern derivation, due to Albert
Einstein, described in the next section.
Finding the Boltzmann Constant

By comparing Eq. (3.1.14) with the Reichsanstalt data, Planck was able to infer
values for the Boltzmann and Planck constants. One set of early results was
\[
k = 1.4\times 10^{-16}\ \text{erg/K}\,, \qquad h = 6.6\times 10^{-27}\ \text{erg sec}\,,
\]
which compare well with the modern values,
\[
k = 1.38062\times 10^{-16}\ \text{erg/K}\,, \qquad h = 6.62620\times 10^{-27}\ \text{erg sec}\,.
\]
As described in Section 2.6, from his value of k and the known gas constant
R = kNA , Planck calculated a value for the Avogadro number NA (or equiva-
lently, for the mass m1 = 1/NA of unit atomic weight) and from NA and the
known value of the faraday, F = eNA , he calculated the electric charge e carried
by singly charged ions in electrolysis.
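The arithmetic in this step is short enough to sketch. The values of the gas constant and the faraday below are present-day CGS figures supplied for illustration (they are not quoted in the text), and the function names are hypothetical.

```python
R_GAS = 8.314e7                        # gas constant, erg / (mol K) (assumed modern value)
FARADAY_ESU = 96485.0 * 2.99792458e9   # faraday in esu per mole (coulombs converted to esu)

def avogadro_from_k(k_boltzmann: float) -> float:
    """Avogadro's number from R = k * N_A."""
    return R_GAS / k_boltzmann

def charge_from_k(k_boltzmann: float) -> float:
    """Elementary charge in esu from F = e * N_A."""
    return FARADAY_ESU / avogadro_from_k(k_boltzmann)

k_early = 1.4e-16  # erg/K, the early value quoted above
n_a = avogadro_from_k(k_early)
e_charge = charge_from_k(k_early)
print(f"N_A = {n_a:.3e} per mole, e = {e_charge:.3e} esu")
```

With the early value of k, both results land within a few percent of the presently accepted N_A ≈ 6.02 × 10²³ per mole and e ≈ 4.80 × 10⁻¹⁰ esu.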
3 This result was derived by Wilhelm Wien (1864–1928) in 1893. It requires that the energy density
distribution must take the form ℰ(ν, T ) = ν³F (ν/T ) where F is some function, of only the ratio ν/T ,
that is not dictated by thermodynamics alone. For a proof, see Appendix XXXIII of Born, Atomic Physics,
listed in the bibliography. We will not be relying here on this result.
4 M. Planck, Verhand. deutsch. phys. Ges. 2, 202 (1900).
5 M. Planck, Verhand. deutsch. phys. Ges. 2, 237 (1900).
The Rayleigh–Jeans formula (3.1.13) agrees with Planck’s for hν ≪ kT , and
in fact gives the correct low-frequency limit of the energy density distribution.
It is an irony of history that in principle Rayleigh could have used the com-
parison of his formula with the data for low frequency to find the value of k,
and then like Planck calculated the values of m1 and e. For this, the quantum
hypothesis is unnecessary. This would have been difficult, for it is not easy to
fit experimental data for E (ν, T ) at low frequencies with a formula that is only
supposed to be valid at these frequencies, when the form of the distribution at
higher frequencies is not known. Anyway, it is just as well that Rayleigh did not
do this, as his factor of 8 mistake in E (ν, T ) would have led to the wrong results
for Avogadro’s number and the fundamental electric charge.
Radiation Energy Constant

Unlike the Rayleigh–Jeans distribution, the Planck distribution gives a finite
total energy density:
\[
\mathcal{E}_\gamma(T) = \int_0^\infty \mathcal{E}(\nu,T)\,d\nu = aT^4\,. \tag{3.1.15}
\]
This was in agreement with the known temperature dependence Eq. (2.3.13),
which as we saw had been derived thermodynamically using the result of clas-
sical electrodynamics that radiation pressure is one-third of the energy density.
But now there was a value for the radiation energy constant:
\[
a = \frac{8\pi^5 k^4}{15\,h^3 c^3}\,.
\]
(Using modern values for h, c, and k, the constant a has the value 7.56577(5) ×
10⁻¹⁵ erg/cm³ K⁴.) According to Eq. (3.1.1), this also tells us that the total rate,
Γ ≡ ∫ Γ(ν, T ) dν, per unit area and per unit time at which a black surface at
temperature T emits radiation energy is Γ = σ T⁴, where σ = ca/4 is another
constant, known as the Stefan–Boltzmann constant.
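The radiation energy constant can be checked in two ways: from the closed form a = 8π⁵k⁴/15h³c³, which reproduces the numerical value quoted above, and by integrating the Planck distribution (3.1.14) numerically. The sketch below assumes present-day CGS constants; the function names and integration settings are illustrative choices.

```python
import math

H = 6.62607015e-27   # Planck constant, erg s (assumed present-day value)
K = 1.380649e-16     # Boltzmann constant, erg/K
C = 2.99792458e10    # speed of light, cm/s

def radiation_constant() -> float:
    """Closed form a = 8 pi^5 k^4 / (15 h^3 c^3), in erg / (cm^3 K^4)."""
    return 8.0 * math.pi ** 5 * K ** 4 / (15.0 * H ** 3 * C ** 3)

def stefan_boltzmann() -> float:
    """sigma = c a / 4, in erg / (cm^2 s K^4)."""
    return C * radiation_constant() / 4.0

def planck_integral(n_steps: int = 200000, x_max: float = 60.0) -> float:
    """Midpoint estimate of the integral of x^3 / (e^x - 1) from 0 to x_max."""
    dx = x_max / n_steps
    total = 0.0
    for i in range(n_steps):
        x = (i + 0.5) * dx
        total += x ** 3 / math.expm1(x) * dx
    return total

def radiation_constant_numeric() -> float:
    """Same constant from integrating Eq. (3.1.14) with x = h nu / kT."""
    return 8.0 * math.pi * K ** 4 / (H ** 3 * C ** 3) * planck_integral()

print(radiation_constant(), radiation_constant_numeric(), stefan_boltzmann())
```

The substitution x = hν/kT turns the frequency integral into a pure number, π⁴/15, which is where the T⁴ dependence and the factor π⁵ come from.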
3.2 Photons
Quantization of Radiation Energy

The modern interpretation of the Planck distribution (3.1.14) emerged from a
heuristic conjecture of Albert Einstein6 in 1905. Planck had assumed a quan-
tization of the energies of the charged harmonic oscillators that he supposed
made up the walls of an enclosure. Einstein instead imposed the quantization
on the radiation itself.

Confusingly, Einstein was not actually dealing with the Planck distribution,
but with an attempted fit to the data given earlier by Wilhelm Wien:
\[
\mathcal{E}(\nu,T) \propto \nu^3 \exp(-\beta\nu/T)\,,
\]
where β is a constant. Einstein used thermodynamic arguments to show that
this distribution would require that the energy of radiation at frequency ν must
be a whole number multiple of βRν/NA . Physicists soon learned that E (ν, T )
is really given by the Planck distribution (3.1.14) and could interpret the Wien
distribution as the high-frequency limit of the Planck distribution, which for
large ν is proportional to ν³ exp(−hν/kT ). Thus β in Einstein’s quantization
condition could be identified as β = h/k. With the gas constant R equal to
kNA , this means that the energy of the radiation at frequency ν must be a
whole number multiple of (h/k)(Rν/NA ) = hν, the same rule as for Planck’s
mythical oscillating charges.
Derivation of Planck Distribution

To see how Einstein’s assumption leads to the Planck distribution (not the Wien
distribution), it is helpful to follow the reasoning a few years later of Hendrik
Lorentz7 (1853–1928). For this, we will go back in more detail to the use by
Rayleigh and Jeans of the principle of equipartition of energy, which will make
it easy to see the difference made by Einstein’s assumption of the quantization
of radiation energy.
Recall that by counting degrees of freedom, we found that the energy density
per frequency interval is given by Eq. (3.1.12):
\[
\mathcal{E}(\nu,T) = \frac{8\pi\nu^2}{c^3}\,\overline{E}(\nu,T)\,, \tag{3.2.1}
\]
where Ē(ν, T ) is the mean energy of each polarization state of electromagnetic
waves of frequency ν. According to Eq. (3.1.10) the energy En,e for a
given wave vector k = 2πn/L and polarization vector e orthogonal to n in a
cubical box of volume L³ is a sum of squares:
\[
E_{\mathbf{n},\mathbf{e}} = \frac{L^3}{2\pi}\left[(\mathrm{Re}\,e(\mathbf{n}))^2 + (\mathrm{Im}\,e(\mathbf{n}))^2\right]\,.
\]
Therefore, in the same way as in Eq. (2.4.17), in classical statistical mechanics
we would have a mean energy for each frequency and polarization, given by
7 H. A. Lorentz, Phys. Z. 11, 1234 (1910).

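\[
\overline{E}(\nu,T)_{\rm class} = \frac{\displaystyle\int_{-\infty}^{\infty} dX \int_{-\infty}^{\infty} dY\,(X^2+Y^2)\,\exp\left(-(X^2+Y^2)/kT\right)}{\displaystyle\int_{-\infty}^{\infty} dX \int_{-\infty}^{\infty} dY\,\exp\left(-(X^2+Y^2)/kT\right)}\,,
\]
where
\[
X \equiv \sqrt{\frac{L^3}{2\pi}}\;\mathrm{Re}\,e(\mathbf{n})\,, \qquad Y \equiv \sqrt{\frac{L^3}{2\pi}}\;\mathrm{Im}\,e(\mathbf{n})\,.
\]
(The factor L³/2π in dX dY is irrelevant, as it cancels between numerator and
denominator.) Defining θ and E by X = √E cos θ and Y = √E sin θ and
integrating over θ gives
\[
\overline{E}(\nu,T)_{\rm class} = \frac{\displaystyle\int_0^\infty 2\pi\sqrt{E}\,d\sqrt{E}\;E\,\exp(-E/kT)}{\displaystyle\int_0^\infty 2\pi\sqrt{E}\,d\sqrt{E}\;\exp(-E/kT)}
= \frac{\displaystyle\int_0^\infty dE\,E\,\exp(-E/kT)}{\displaystyle\int_0^\infty dE\,\exp(-E/kT)} = kT\,. \tag{3.2.2}
\]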
This is the classical equipartition result used by Rayleigh and Jeans, leading to
the Rayleigh–Jeans energy density distribution (3.1.13).
According to Einstein’s conjecture, the energy E (not X² or Y²) of each
polarization state can only take the values nhν, with n = 0, 1, 2, . . . , so the
integrals in Eq. (3.2.2) must be replaced with sums. That is, according to
Einstein,
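\[
\begin{aligned}
\overline{E}(\nu,T) &= \frac{\sum_{n=0}^{\infty} nh\nu\,\exp(-nh\nu/kT)}{\sum_{n=0}^{\infty} \exp(-nh\nu/kT)}
= -\frac{d}{d(1/kT)}\,\ln\sum_{n=0}^{\infty}\exp(-nh\nu/kT)\\
&= -\frac{d}{d(1/kT)}\,\ln\left[1 - \exp(-h\nu/kT)\right]^{-1}
= \frac{h\nu\,\exp(-h\nu/kT)}{1 - \exp(-h\nu/kT)}\\
&= \frac{h\nu}{\exp(h\nu/kT) - 1}\,.
\end{aligned}
\tag{3.2.3}
\]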
Using this in Eq. (3.2.1) gives an energy density distribution
\[
\mathcal{E}(\nu,T) = \frac{8\pi h}{c^3}\,\frac{\nu^3}{\exp(h\nu/kT) - 1}\,. \tag{3.2.4}
\]
This is the same as the Planck distribution (3.1.14), but derived from quite
different assumptions.
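The step from the discrete sums to the closed form in Eq. (3.2.3) can be verified numerically, along with the classical limit Ē → kT for hν ≪ kT. This is a sketch under stated assumptions: present-day CGS constants, an arbitrary temperature and frequencies, and a truncation `n_max` of the infinite sums that is adequate only when hν/kT is not too small.

```python
import math

H = 6.62607015e-27   # Planck constant, erg s (assumed present-day value)
K = 1.380649e-16     # Boltzmann constant, erg/K

def mean_energy_sum(nu: float, temp: float, n_max: int = 2000) -> float:
    """Ratio of the truncated sums in Eq. (3.2.3), energies E = n h nu."""
    x = H * nu / (K * temp)
    weights = [math.exp(-n * x) for n in range(n_max)]
    numerator = sum(n * H * nu * w for n, w in enumerate(weights))
    return numerator / sum(weights)

def mean_energy_closed(nu: float, temp: float) -> float:
    """Closed form h nu / (exp(h nu / kT) - 1) from Eq. (3.2.3)."""
    return H * nu / math.expm1(H * nu / (K * temp))

temp = 300.0                      # kelvin, illustrative
nu_quantum = 1e13                 # h nu / kT is of order one here
nu_classical = 1e10               # h nu / kT is much less than one here

summed = mean_energy_sum(nu_quantum, temp)
closed = mean_energy_closed(nu_quantum, temp)
classical_ratio = mean_energy_closed(nu_classical, temp) / (K * temp)
print(summed, closed, classical_ratio)
```

The first two numbers agree, and the third is close to one, recovering the equipartition value (3.2.2) at low frequency.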
Einstein’s interpretation of his quantization assumption E = nhν was that
the energy in radiation of frequency ν comes in individual bundles, or “quanta,”
each with energy hν. A state of this radiation with energy nhν is simply one
containing n such quanta. This interpretation was soon confirmed by data on
the photoelectric effect.
Photoelectric Effect
Several physicists had observed in the late nineteenth century that electric
charge is expelled from metal surfaces when the surfaces are exposed to
ultraviolet light. After Thomson’s discovery of the electron in 1897, it was
generally assumed that this charge was carried by electrons. A metal is a
lattice of positively charged ions that have each lost one or more electrons,
which circulate freely through the metal, accounting for the good electrical and
thermal conductivity of metals. The positively charged metal ions produce an
electrostatic potential, so that in normal circumstances it takes a definite energy
(called the “work function”) φ to pull the negatively charged electrons out of
the metal. One might think that the more intense the radiation, the more energy
is given to these electrons. In 1902 experiments by Philipp Lenard (1862–1947)
showed that this is not the case. Instead, no matter how intense the radiation,
no electrons are ejected from the metal unless the frequency exceeds a certain
minimum (which is why photoelectricity was discovered using ultraviolet rather
than visible light), and when that condition is met, the energy of each expelled
electron increases with the frequency. Only the number of photoelectrons
depends on the intensity of the radiation, not their individual energies.
Einstein in his 1905 paper seized on these phenomena as evidence for his
quantization assumption. Any electron expelled from the metal was assumed to
have been struck by one of Einstein’s quanta. In order to get out of the metal,
the energy hν of the radiation quantum must at least equal the work function
φ, so no electrons can be emitted unless ν ≥ φ/h. If this condition is satisfied,
then the kinetic energy Ee of the emitted electron will be given by the excess
\[
E_e = h\nu - \phi\,. \tag{3.2.5}
\]
These energies could be measured by observing how strong an electric field is
needed to stop the electron emission by exerting a force toward the surface. In
this way, Millikan at Chicago in 1914–1916 (while Europeans had other things
on their minds) confirmed the form of the Einstein relation (3.2.5), and found
a value for h, which turned out to agree with the value measured in studies of
black body radiation.
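The content of Eq. (3.2.5) can be illustrated with a few lines of code. The work function of 2.3 eV used below is a hypothetical round number chosen for the example, not a value from the text, and the function names are likewise illustrative.

```python
from typing import Optional

H_EV = 4.135667696e-15  # Planck constant in eV s (assumed present-day value)

def threshold_frequency(work_function_ev: float) -> float:
    """Minimum frequency nu = phi / h for any electron emission, in Hz."""
    return work_function_ev / H_EV

def electron_energy(nu: float, work_function_ev: float) -> Optional[float]:
    """Kinetic energy E_e = h nu - phi in eV, or None below threshold."""
    excess = H_EV * nu - work_function_ev
    return excess if excess >= 0.0 else None

phi = 2.3  # eV, hypothetical work function
print(threshold_frequency(phi))    # threshold frequency in Hz
print(electron_energy(1e15, phi))  # ultraviolet light: electrons are emitted
print(electron_energy(4e14, phi))  # red light: None, however intense the beam
```

The intensity of the light does not appear anywhere in these functions; in Einstein's picture it affects only the number of electrons emitted. The kinetic energy in eV is numerically equal to the stopping voltage in volts.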
Particles of Light
As we shall see in Chapter 4, with the advent of special relativity it became
clear that since Einstein’s quanta would have to travel at the speed of light,
as particles they would have to have momenta equal to the energy divided by
c, or hν/c. As discussed in Section 4.5, this was confirmed in experiments of
Arthur Holly Compton (1892–1962) in 1922–1923 on the scattering of X-rays
by electrons in atoms; Compton’s measurements removed the last doubt about
the existence of Einstein’s radiation quanta. A few years later, they were given
their present name, photons.
3.3 The Nuclear Atom
It was not possible to make progress in applying quantum ideas to atoms without
some understanding of what atoms are. The growth of this understanding began
with the discovery of radioactivity.
Radioactivity
In 1896 Antoine Henri Becquerel (1852–1908) was trying to find whether var-
ious crystals that had been exposed to sunlight would emit energetic radiation,
like the X-rays that had been discovered a few months earlier. He put these
crystals next to photographic plates wrapped in dark paper that would block
sunlight but might not block rays emitted by the crystal that had earlier been
exposed to the Sun. A wire mesh was inserted between the crystal and the
paper, so that any exposure of the plate by these rays would show an image
of the mesh. One of the crystals Becquerel intended to study was uranium
potassium bisulphate, because it exhibits the phenomenon of phosphorescence,
the delayed emission of light by substances such as the luminous paint on clock
dials that have been exposed to bright light. At first, in February 1896, the
skies in Paris were too cloudy to provide the needed sunlight, so Becquerel
left his crystals and photographic plates in a drawer. When he took them out
in early March, he found that the plates that had been left near the crystals
containing uranium were exposed, showing clear images of the wire mesh,
even though they had never been put in sunlight. In the following months he
found that some sort of ray from various compounds of uranium would expose
photographic plates, even when the crystals and plates were put together in
lead-lined boxes.
It was soon realized that this phenomenon was not limited to uranium com-
pounds. In 1898 Marie Curie (1867–1934) showed that similar phenomena are
produced by compounds of thorium, and she and Pierre Curie (1859–1906) were
able to isolate a previously unknown element, radium, that was millions of times
more active than uranium or thorium. The Curies gave this phenomenon the
name radioactivity.
Two different kinds of radioactivity were distinguished in the next few years
by Ernest Rutherford (1871–1937). There are beta rays, which are about as pen-
etrating as X-rays, and alpha rays, which cannot penetrate even very thin sheets
of foil. (Gamma rays, which are energetic photons, were discovered later.) In
1898 Becquerel discovered that beta rays could be deflected by magnetic and
electric fields. From the amount of the deflection he concluded that these rays
are composed of particles with the same ratio of charge to mass as the cathode
ray particles whose deflection had been measured by J. J. Thomson shortly
before. Beta rays are in fact what later became known as electrons, but moving
much faster than the electrons in cathode rays. It was harder to deflect alpha
rays, but this was eventually accomplished by Rutherford. From the amount
and direction of deflection, Rutherford concluded that these rays consist of
positively charged particles, with a ratio of charge eα to mass mα equal to
half the ratio of charge to mass for hydrogen ions, as had been measured in
electrolysis. The lightest element heavier than hydrogen is helium, with atomic
weight about four times greater than hydrogen, so Rutherford guessed that alpha
rays are helium ions, with charge twice that of hydrogen ions. That is,
\[
\frac{e_\alpha}{m_\alpha} = \frac{2e}{4m_1} = \frac{1}{2}\,\frac{e}{m_1}\,.
\]
This was finally confirmed in 1907 when Rutherford together with T. D. Royds
was able to collect enough alpha particles from radioactive decay to show that
the atoms that they form absorb light at the same spectral frequencies as helium.
Once the particles in alpha and beta rays were identified respectively as
helium ions and electrons, it became possible to use measurements of their
deflection to find the particles’ energies. These energies were enormous,
typically about a million times larger than the energies of photons emitted
in ordinary chemical reactions such as burning. Studies of radioactivity by
Rutherford with the chemist Frederick Soddy (1877–1956) at McGill Uni-
versity showed that this energy is released when elements like radium and
thorium spontaneously change to other elements, such as radon. But, for the
understanding of atoms, the most important consequence of the discovery of
radioactivity was that it provided highly energetic charged particles that could
be used as probes of atomic structure.
Discovery of the Atomic Nucleus

After Thomson’s discovery of the electron, it was widely supposed that atoms
are like puddings, in which negatively charged electrons swim like raisins in
a smooth background of positive charge. This seemed at first to be verified by
experiments at the laboratory of Ernest Rutherford at the University of Manch-
ester, to which Rutherford had moved in 1907. Rutherford’s assistant Hans
Geiger (1882–1945) used a beam of alpha particles from what he called “radium
emanation” (radon 222, a product of the alpha decay of radium 226), collimated
by letting the alpha particles pass through a small slit in a metal sheet through
which the alpha particles emerged in a narrow beam. The beam was directed
at a gold foil, thin enough that the alpha particles could penetrate the foil.
The beam then struck a screen covered with zinc sulfide, which emits a flash
of light when hit by an energetic charged particle such as an alpha particle. If
the gold atom really consisted only of the very light electrons in a continuum of
positive charge, it would scatter the alpha particles only weakly. Geiger at first
found flashes of light from an area only slightly larger than the geometric image
of the slit, where the unscattered beam would have struck the screen, indicating
the expected slight scattering.8
A better model was suggested in 1911 by Rutherford,9 on the basis of further
experiments in 1910 in his laboratory. Geiger and Ernst Marsden (1889–1970)
again used alpha rays emitted from a glass tube containing radon 222 gas.10
Rutherford for some reason asked Geiger and Marsden to see whether any alpha
particles could be deflected at large angles, more than 90°, so that the particles
would be reflected backwards from surfaces of gold or other metals, producing
flashes of light in a zinc sulfide screen on the same side of the metal surface
as the alpha particle source. To his surprise Rutherford learned that some alpha
particles were scattered almost straight back from various metal surfaces: gold,
lead, platinum, etc.11
Nuclear Mass
The observation of backward scattering immediately indicated that the alpha
particles were repelled by something much heavier than an electron, heavier
indeed than an alpha particle. Suppose two particles with masses mA and mB
and initial velocities vA and vB along some line collide head on, emerging with
velocities v′A and v′B along the same line. (These vs can be positive or negative;
when two vs have the same sign the particles are going in the same direction;
if opposite signs, they are going in opposite directions.) The conservation of
momentum requires that
\[
m_A v_A + m_B v_B = m_A v'_A + m_B v'_B\,,
\]
while (as long as velocities are measured when the particles are sufficiently
far apart that they exert no force on each other) the conservation of energy
requires that
8 H. Geiger, Proc. Roy. Soc. A 81, 141 (1908). This reference gives citations to earlier work of Rutherford
and others along the same lines.
9 E. Rutherford, Phil. Mag. 21, 669 (1911); 27, 488 (1914). The first article is reprinted in Beyer,
Foundations of Nuclear Physics, listed in the bibliography.
10 It is not clear whether these alpha particles were produced by the direct alpha decay of radon or in
the alpha decay of radon’s decay products. Without explanation Rutherford’s 1911 paper cited an alpha
particle velocity of 2.09 × 10⁹ cm/sec. If this is accurate, then these alpha particles could not have been
those that are emitted in the decay of radon 222 to polonium 218, which have a velocity of 1.6 × 10⁹
cm/sec. Polonium 218 decays into lead 214 with a half life of 3.1 minutes, producing an alpha particle
with velocity 1.7 × 10⁹ cm/sec, and lead 214 then undergoes further decays. Rutherford’s estimate of an
alpha particle velocity of 2.09 × 10⁹ cm/sec may have been just a guess.
11 H. Geiger and E. Marsden, Proc. Roy. Soc. A 82, 495 (1910).

\[
m_A v_A^2 + m_B v_B^2 = m_A v_A'^{\,2} + m_B v_B'^{\,2}\,.
\]
We can use the first equation to express v′B in terms of the other velocities.
Using this in the second equation, we then have a quadratic equation for v′A,
with coefficients depending on vA and vB. Like any quadratic equation, this has
two solutions. If nothing changes in the collision then the conservation laws are
automatically satisfied, so one solution is obvious; without even writing down
the equation, we know that v′B = vB, v′A = vA is a solution. Since there are
only two solutions, the other solution, for which something does happen in the
collision, is unique. Here it is:
\[
v'_A = \frac{(m_A - m_B)\,v_A + 2m_B v_B}{m_A + m_B} \tag{3.3.1}
\]
and, just interchanging the As and Bs,
\[
v'_B = \frac{(m_B - m_A)\,v_B + 2m_A v_A}{m_A + m_B}\,.
\]
It is easy to check that these do satisfy the conservation laws.
In particular, suppose that particle B is initially at rest, so vB = 0. Then
\[
v'_A = \frac{m_A - m_B}{m_A + m_B}\,v_A\,.
\]
So it is only possible for particle A to be reflected backward (that is, with the
sign of v′A opposite to that of vA) in the collision with a particle B at rest if
mA − mB is negative. Taking A to be the alpha particle, B to be whatever it is
in the atom encountered by the alpha particle, we see that whatever the forces
between them may be, the alpha particle must be repelled by something in the
atom heavier than itself.
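The solution (3.3.1) is easy to check numerically. The sketch below applies it to an alpha particle striking a heavy target at rest and a very light one; the masses are in units of the unit atomic weight and the speeds are in arbitrary units, both chosen only for illustration.

```python
def elastic_collision(m_a: float, v_a: float, m_b: float, v_b: float):
    """Final velocities for a head-on elastic collision, Eq. (3.3.1)."""
    total = m_a + m_b
    v_a_out = ((m_a - m_b) * v_a + 2.0 * m_b * v_b) / total
    v_b_out = ((m_b - m_a) * v_b + 2.0 * m_a * v_a) / total
    return v_a_out, v_b_out

# Alpha particle (mass 4) on a gold nucleus (mass about 197) at rest.
heavy_a, heavy_b = elastic_collision(4.0, 1.0, 197.0, 0.0)
# Alpha particle on an electron (mass about 1/1836 of unit atomic weight) at rest.
light_a, light_b = elastic_collision(4.0, 1.0, 1.0 / 1836.0, 0.0)

print(heavy_a, heavy_b)  # the alpha particle's velocity changes sign
print(light_a, light_b)  # the alpha particle barely slows down
```

Only the heavy target reverses the sign of the alpha particle's velocity, in line with the condition that mA − mB be negative.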
Nuclear Size
The observation of alpha particles reflected backward also shows that they
are repelled by something small. Here it is necessary to assume that at the
separations reached in these collisions, the force between the alpha particle
and whatever it is encountering is purely electrostatic. If we assume that the
alpha particle has charge eα and mass mα , and is repelled by something heavy
with charge Ze, then the potential energy of the alpha particle at separation r
is Zeeα/r. In order for the alpha particle to be brought momentarily to rest
before reversing direction, its initial kinetic energy mα vα²/2 must be entirely
converted to potential energy, so it must at that moment reach a separation r
satisfying
\[
\frac{Z e\,e_\alpha}{r} = \frac{m_\alpha v_\alpha^2}{2}
\]
or, in other words,
\[
r = \frac{2Ze\,(e_\alpha/m_\alpha)}{v_\alpha^2}\,. \tag{3.3.2}
\]
Both vα and eα /mα could be measured, as Thomson had done for electrons, by
measuring the electric and magnetic deflection of the beam of alpha particles.
Rutherford cited a velocity vα ≈ 2.09 × 10⁹ cm/sec, and, as already mentioned,
eα/mα ≈ e/2m1 = 1/2 faraday. Using these values in Eq. (3.3.2) gives the
separation when the alpha particle comes to rest as r = 3Z × 10⁻¹⁴ cm. Even
for Z as large as 100, this is much smaller than the diameter >10⁻⁸ cm of
heavy atoms, estimated from the density of the metals and the mass Am1 of
their atoms.
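The estimate of the separation at closest approach follows directly from Eq. (3.3.2), with the separation inversely proportional to vα². The sketch below uses present-day CGS values for e and for the mass m1 of unit atomic weight, which are assumptions supplied for illustration, together with the velocity cited by Rutherford.

```python
E_ESU = 4.803e-10      # elementary charge, esu (assumed present-day value)
M1_GRAMS = 1.6605e-24  # mass of unit atomic weight, grams (assumed present-day value)

def closest_approach(z_nucleus: int, v_alpha: float = 2.09e9) -> float:
    """Separation r = 2 Z e (e_alpha/m_alpha) / v_alpha^2 in cm, Eq. (3.3.2)."""
    charge_to_mass = E_ESU / (2.0 * M1_GRAMS)  # e_alpha/m_alpha = e / 2 m_1
    return 2.0 * z_nucleus * E_ESU * charge_to_mass / v_alpha ** 2

r_per_unit_charge = closest_approach(1)
r_gold = closest_approach(79)
print(r_per_unit_charge, r_gold)
```

For Z = 1 this gives roughly 3 × 10⁻¹⁴ cm, and for gold (Z = 79) a few times 10⁻¹² cm, far smaller than atomic dimensions.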
Rutherford jumped to the conclusion that the positive charge of any atom and
most of its mass is concentrated in a small heavy nucleus, which was repelling
the alpha particles in his experiment. Whether or not by chance, Rutherford
announced this discovery at a session of the Manchester Literary and Philosoph-
ical Society, the same organization at whose meeting Dalton had announced the
law of combining weights a little more than a century earlier.
Scattering Pattern
Further experiments in Rutherford’s laboratory measured the rate dΓ at which
alpha particles in a beam with flux Φ (in particles per unit time and per unit
area transverse to the beam) are scattered into any solid angle dΩ (that is,
into ranges dθ of angles to the initial direction and dφ of angles around the
initial direction, with dΩ = sin θ dθ dφ). Rutherford compared the result with
a calculation using Newtonian dynamics to follow the hyperbolic orbits of alpha
particles in the electric field of a single charged nucleus and find into what area
dσ transverse to the beam the alpha particles must be directed in order to be
scattered into the solid angle dΩ. This gave the ratio of dσ to dΩ, known as the
differential cross section:
\[
\frac{d\sigma}{d\Omega} = \frac{Z_\alpha^2 Z^2 e^4}{16\,E_\alpha^2\,\sin^4(\theta/2)}\,, \tag{3.3.3}
\]
where Zα e and Ze are the electric charges respectively of the alpha particle and
the nucleus, and Eα is the initial alpha particle kinetic energy. Since any given
alpha particle can be anywhere in the beam, for a beam of transverse area A
the probability that a particular alpha particle will be aimed at the area dσ for
scattering into dΩ by a single nucleus is dσ/A. If there are N atomic nuclei in
the part of a metal surface within the area of the beam of alpha particles, then
the probability that a given alpha particle will be scattered into the solid angle
dΩ will be N dσ/A. With a flux Φ, the number of alpha particles per second
hitting the metal surface is ΦA, so the rate at which alpha particles are scattered
into the solid angle dΩ is
\[
\Phi A \times N\,d\sigma/A = N\,\Phi\,\frac{d\sigma}{d\Omega}\,d\Omega\,.
\]
The observed pattern of scattering at angles greater than 90° agreed with the
proportionality to 1/sin⁴(θ/2) indicated by Eq. (3.3.3), confirming to Rutherford
that this was indeed Coulomb scattering by a heavy point charge.
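The angular dependence in Eq. (3.3.3) can be explored with a short function. This is a sketch only: the constants are present-day CGS values, the alpha particle energy is computed from the velocity cited by Rutherford, and the function name is hypothetical.

```python
import math

E_ESU = 4.803e-10      # elementary charge, esu (assumed present-day value)
M1_GRAMS = 1.6605e-24  # mass of unit atomic weight, grams (assumed present-day value)

def rutherford_cross_section(theta: float, z_nucleus: int,
                             e_alpha: float, z_alpha: int = 2) -> float:
    """Differential cross section of Eq. (3.3.3) in cm^2 per steradian.

    theta is the scattering angle in radians; e_alpha is the kinetic
    energy of the alpha particle in erg.
    """
    numerator = z_alpha ** 2 * z_nucleus ** 2 * E_ESU ** 4
    return numerator / (16.0 * e_alpha ** 2 * math.sin(theta / 2.0) ** 4)

# Kinetic energy m v^2 / 2 for an alpha particle of mass 4 m_1 at 2.09e9 cm/sec.
energy = 0.5 * 4.0 * M1_GRAMS * (2.09e9) ** 2

back = rutherford_cross_section(math.pi, 79, energy)        # straight back
side = rutherford_cross_section(math.pi / 2.0, 79, energy)  # 90 degrees
head_on_distance = 2 * 79 * E_ESU ** 2 / energy             # Z_alpha Z e^2 / E_alpha

print(side / back, back, head_on_distance ** 2 / 16.0)
```

Scattering at 90° is four times as likely per unit solid angle as scattering straight back, and the cross section at 180° equals one-sixteenth of the squared distance of closest approach in a head-on collision.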
Rutherford was lucky. He was calculating these probabilities using classical
mechanics and got the right answer, even though at these velocities and
separations quantum mechanics would normally be needed. Scattering by
inverse square law forces is special; it allows the use of classical mechanics
in some circumstances where for any other force it would be necessary to
use quantum mechanics. Equation (3.3.3) will be derived using quantum
mechanics in Section 5.6, so we will not trouble to repeat Rutherford’s classical
calculation here.
Of course, Rutherford’s discovery was made before the development of quan-
tum mechanics. The agreement of his experimental results with theory generally
convinced physicists of a new picture of the atom, that it consists of a small
heavy positively charged nucleus, around which electrons revolve like plan-
ets around the Sun, held in orbit by electrostatic attraction, which in part had
already been guessed at in 1904 by Hantaro Nagaoka (1865–1950).
Nuclear Charge
In order for atomic theory to make contact with chemistry, it was essential to
know the precise number of electrons in the atoms of various elements. For
instance, as we shall see in Chapter 5, the dramatic difference in the chemical
properties of chlorine and argon is almost entirely due to the fact that chlorine
atoms contain 17 electrons while argon atoms contain 18. Because atoms are
electrically neutral, knowing the electric charge of the nucleus tells us the num-
ber of electrons: if the nuclear charge is Ze, the atom must contain Z electrons.
Almost immediately after Rutherford in 1911 announced his conclusion
about the existence of the nucleus, Antonius van den Broek (1870–1926)
argued in a brief note12 (apparently on the basis of the steady progression of
chemical properties with increasing atomic weight) that the nuclear charge in
units of e equals the atomic number, defined as the position in the catalog of
elements when they are listed in order of increasing atomic weight – that is,
hydrogen, helium, lithium, and so on – but he had no experimental evidence for
this hypothesis.
Rutherford offered no opinion about this. In his 1911 article cited in
footnote 9 he had used Eq. (3.3.3) (with $Z_\alpha = 2$, known from previous measurements
of the deflection of alpha particles by electric and magnetic fields)
together with several measurements by Geiger of the scattering by small angles
of alpha particles in thin gold foil to derive a value of 97e or 114e for the charge
of the gold nucleus. The atomic number of gold is 79, so if Rutherford’s value
for the charge of the gold nucleus had been correct it would have ruled out the
equality of atomic number and the nuclear charge in units of e.¹³ As we shall
see in the next section, this equality was established in 1913 by measurements
of the wavelengths of X-rays from various elements.
12 A. van den Broek, Nature 87, 78 (1911). He later published a longer paper, Phys. Zeit. 14, 32 (1913).
3.4 Atomic Energy Levels
Spectral Lines
In Munich in 1814–1815 the optician Joseph Fraunhofer (1787–1826) observed
that when light from the Sun is passed through a slit, focussed by a telescope,
and then dispersed by a prism into a spectrum of colors, the spectrum is crossed
with hundreds of dark lines, each an image of the slit. These lines were always
found in the same places in the spectrum, each corresponding to a definite
wavelength of light. It was realized that these dark lines must be caused by
selective absorption of light as it passes from the hot solar surface through
the cooler part of the Sun’s atmosphere. The same dark lines were seen in
the spectrum of the Moon and bright stars. Similar observations of the light
from flames and other terrestrial sources showed lines in the same places, some-
times dark and sometimes bright, so it became possible to identify the elements
producing these lines: sodium, iron, magnesium, calcium, etc. Some elements,
such as helium, were discovered in this way on the Sun before they were found
on Earth.
By the end of the nineteenth century large books had been published for
physicists and chemists, giving vast numbers of wavelengths for the spectral
lines of various elements. The observation of spectra became a standard tool of
astronomy and chemical analysis. But what could cause the atoms of a given
element preferentially to emit and absorb light at only certain definite wave-
lengths? Answering this question had to wait for a realistic model of atoms.
Electron Orbits
In classical electrodynamics the simple harmonic oscillation of a charged body
produces electromagnetic radiation with the same frequency as the oscillating
charge, and the charged body is also effective at absorbing radiation at that
frequency. After the discovery of the electron in 1897, as mentioned earlier it
was widely supposed that atoms consist of electrons trapped in a smooth background
of positive charge, and it was natural to assume that the characteristic
frequencies observed in atomic spectra are the frequencies with which these
electrons can oscillate back and forth around their normal positions.
13 Rutherford’s over-estimate of the charge of the gold nucleus may have arisen because he was using a
wrong value for the velocity of the alpha particle in these experiments. As mentioned in footnote 10, in
the same paper Rutherford had given a value $2.09 \times 10^9$ cm/sec for the alpha particle velocity, while the
alpha particles from the decay of radon 222 actually have a velocity of $1.6 \times 10^9$ cm/sec. According to
Eq. (3.3.3) the scattering cross section depends on $Z/E_\alpha$, so by over-estimating the velocity of the alpha
particles he would be over-estimating the electric charge of the gold nucleus.
Then, with the discovery of the nucleus discussed in the previous section,
this picture was replaced with a planetary model of the atom, in which electrons
circulate in orbits around the nucleus, like planets around the Sun only held
in orbit by electrostatic rather than gravitational attraction. In classical elec-
trodynamics the periodic motion of the electrically charged electrons would
produce electromagnetic radiation, with a frequency for circular orbits equal
to the frequency with which the electron goes around its orbit.
For elliptical orbits matters are more complicated. While the Cartesian coor-
dinates of an electron traveling at constant speed in a circular orbit are simple
harmonic functions of time, and in classical electrodynamics the electron radi-
ates at the corresponding frequency, for elliptical orbits the motion is periodic
though not simple harmonic. The Cartesian coordinates for an orbit of period
1/ν can still be expressed as Fourier series of simple harmonic terms propor-
tional to sin 2πnνt and cos 2πnνt with n an integer, so the electron classically
radiates at all frequencies equal to whole number multiples of the frequency ν
of revolution. No such pattern is seen in actual spectra.
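The contrast between circular and elliptical orbits can be checked numerically. The sketch below is a minimal illustration, not part of the historical argument: it samples the x-coordinate of a Kepler orbit over one period (solving Kepler’s equation $M = E - e \sin E$ by Newton’s method, a standard result assumed here) and extracts the amplitudes of the harmonics at whole-number multiples of the revolution frequency. The function names and the sample eccentricity 0.5 are hypothetical choices.

```python
import math

def kepler_x(ecc, n_samples=512):
    """x(t)/a at equally spaced times over one period of a Kepler orbit."""
    xs = []
    for k in range(n_samples):
        mean_anomaly = 2.0 * math.pi * k / n_samples   # equals 2*pi*nu*t
        e_anom = mean_anomaly
        for _ in range(50):  # Newton's method for M = E - e sin E
            e_anom -= (e_anom - ecc * math.sin(e_anom) - mean_anomaly) / (
                1.0 - ecc * math.cos(e_anom))
        xs.append(math.cos(e_anom) - ecc)
    return xs

def harmonic_amplitude(xs, n):
    """Amplitude of the term at n times the revolution frequency (discrete Fourier sum)."""
    size = len(xs)
    c = sum(x * math.cos(2.0 * math.pi * n * k / size) for k, x in enumerate(xs))
    s = sum(x * math.sin(2.0 * math.pi * n * k / size) for k, x in enumerate(xs))
    return 2.0 * math.hypot(c, s) / size

circular = [harmonic_amplitude(kepler_x(0.0), n) for n in (1, 2, 3)]
elliptical = [harmonic_amplitude(kepler_x(0.5), n) for n in (1, 2, 3)]
```

For zero eccentricity only the n = 1 term survives, while for the elliptical orbit the n = 2 and n = 3 terms are appreciable, in line with the statement that such an electron would classically radiate at all multiples of ν.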
Even if the orbits were all circular, this view of atomic spectra would have
problems. One trouble with this picture is that classically the electrons would
continually lose energy to radiation, bringing them closer to the nucleus and
thereby speeding up their revolution, hence replacing the discrete spectral line
with a continuum of frequencies. Even worse, classically there would be nothing
to prevent electrons from spiraling onto the nucleus, so that there would be no
stable atoms. Of course, one could simply assume that only certain orbits are
possible, and that these are all stable. The frequencies of these allowed orbits
would then correspond to the observed spectral lines. But there was another
trouble even with this picture: it offered no explanation of a systematic property
of observed spectral frequencies, known as the Ritz combination principle.
The Combination Principle

In 1908 the spectroscopist Walther Ritz (1878–1909) noticed a peculiar property
of the observed wavelengths of spectral lines:¹⁴ in any one atom, the frequencies
corresponding to the observed wavelengths of spectral lines are differences of a
smaller number of quantities, which he called terms. That is, if we label the nth
term as $\nu_n$, then the observed spectral frequencies are all of the form
$$\nu_{nm} = \nu_n - \nu_m , \tag{3.4.1}$$
with n and m equal to 1, 2, 3, . . . (This was traditionally expressed in terms of
inverse wavelengths instead of frequencies, but the frequency of any wave is just
the speed of light times the inverse wavelength, so this makes no difference.)
Ritz could offer no explanation of this principle.
14 W. Ritz, Phys. Z. 9, 521 (1908).
The explanation of the Ritz principle and much else was provided in 1913, after a visit
to Rutherford’s Manchester laboratory, by a young Danish theorist, Niels
Bohr¹⁵ (1885–1962). He assumed that the states of an atom have energies in a
discrete set, labeled $E_n$ with n running from one to infinity. These states are
stable, except for radiative transitions among them, whose rates are typically
much slower than the frequencies of spectral lines. When an atom makes a
transition from an energy $E_n$ to a smaller energy $E_m$, it emits a photon with
energy $E_n - E_m$ and hence with frequency
$$\nu_{nm} = (E_n - E_m)/h .$$
Similarly, for an atom to make a transition from an energy Em to a higher
energy En , it must absorb a photon with the same energy and frequency. These
are the transitions that produce the bright and dark lines observed in spectro-
graphs. Their frequencies match the results (3.4.1) given by the Ritz principle,
if we identify the “terms” νn as simply the energies En of the various states,
divided by h.
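The identification of Ritz’s terms with $E_n/h$ can be illustrated with a short Python sketch. It borrows the hydrogen energies $E_n = -13.6\,Z^2/n^2$ eV obtained later in this section, and the function names are illustrative assumptions.

```python
H_EV_S = 4.135667696e-15  # Planck constant in eV s

def term(n, z=1):
    """Ritz 'term' E_n / h in Hz, using E_n = -13.6 Z^2 / n^2 eV (Bohr's result)."""
    return -13.6 * z**2 / n**2 / H_EV_S

def line_frequency(n, m):
    """Frequency of the photon emitted in the transition n -> m (n > m)."""
    return term(n) - term(m)

nu_31 = line_frequency(3, 1)
nu_32 = line_frequency(3, 2)
nu_21 = line_frequency(2, 1)
```

Because every line is a difference of two terms, the frequency of the 3 → 1 line is automatically the sum of the 3 → 2 and 2 → 1 frequencies, which is the kind of regularity the combination principle expresses.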
Bohr’s Quantization Condition

But what determines the energies En ? Casting about for something to quantize,
Bohr noted that h has the units of energy per frequency, which is the same as
the units of angular momentum, so Bohr guessed that the angular momenta of
atomic states are integer multiples of some quantity h̄, similar in magnitude
to h. (Some readers may already know what h̄ turned out to be. At first, Bohr
had no idea what it was, so until we see how Bohr figured this out, please forget
whatever you know about h̄.)
The applications of Bohr’s quantization principle are simplest for one-electron
atoms, such as neutral hydrogen, singly ionized helium, etc. An electron with
velocity vn in a circular orbit of radius rn about a nucleus of charge Ze has
angular momentum me vn rn , so Bohr’s quantization condition was
me vn rn = nh̄ , (3.4.2)
with n an integer running from one to infinity. A second relation between $v_n$
and $r_n$ is given by equating the electrostatic attraction $Ze^2/r_n^2$ to $m_e$ times the
centripetal acceleration $v_n^2/r_n$:
$$Ze^2/r_n^2 = m_e v_n^2/r_n , \tag{3.4.3}$$
just as for planets in the solar system, but of course with different constant
factors on each side of the equation. We can solve these two equations for radius
and velocity. Multiplying Eq. (3.4.3) by $r_n^3/Ze^2$ gives $r_n = m_e v_n^2 r_n^2/Ze^2$, so,
since Eq. (3.4.2) gives $m_e^2 v_n^2 r_n^2 = n^2\hbar^2$,
$$r_n = \frac{n^2\hbar^2}{Ze^2 m_e} . \tag{3.4.4}$$
Using this back in Eq. (3.4.2) then gives
$$v_n = \frac{Ze^2}{n\hbar} . \tag{3.4.5}$$
The electron has total energy
$$E_n = \frac{m_e v_n^2}{2} - \frac{Ze^2}{r_n} = -\frac{Z^2 e^4 m_e}{2n^2\hbar^2} . \tag{3.4.6}$$
15 N. Bohr, Phil. Mag. 26, 1, 476, 857 (1913); Nature 92, 231 (1913).
(By the way, it immediately follows from Eq. (3.4.3) that the kinetic energy is
−1/2 times the potential energy. One consequence, already mentioned in the
previous section, is that classically when an electron in orbit loses energy the
potential energy decreases, becoming more negative, so that the kinetic energy
increases.)
The Correspondence Principle

Now, what is h̄? To answer this, Bohr invoked what he called the correspon-
dence principle, that the larger a system is, the more closely it obeys classical
mechanics. From Eq. (3.4.4) we see that the large orbits are those with large n.
(Atoms with n of order 100 have actually been studied experimentally.) For
$n \gg 1$, the energy emitted when a single-electron atom goes from state n to
state n − 1 is
$$E_n - E_{n-1} = \frac{Z^2 e^4 m_e}{2\hbar^2}\left[\frac{1}{(n-1)^2} - \frac{1}{n^2}\right] \simeq \frac{Z^2 e^4 m_e}{2\hbar^2} \times \frac{2}{n^3} ,$$
so the frequency of the photon emitted in this transition must be
$$\nu_{n\to n-1} \simeq \frac{Z^2 e^4 m_e}{n^3\hbar^2 h} .$$
On the other hand, classically the frequency with which the electron goes around
its orbit is
$$\nu_n = \frac{v_n}{2\pi r_n} = \frac{Ze^2/n\hbar}{2\pi n^2\hbar^2/Ze^2 m_e} = \frac{Z^2 e^4 m_e}{2\pi n^3\hbar^3} .$$
In order for these two frequencies to be equal, as required by the correspondence
principle, we must have $\hbar^2 h = 2\pi\hbar^3$, and therefore
$$\hbar = h/2\pi \simeq 1.054 \times 10^{-27}\ \text{erg sec} \simeq 6.582 \times 10^{-16}\ \text{eV sec} . \tag{3.4.7}$$
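How quickly the two frequencies come together can be seen in a few lines of Python. This is only a sketch under the assumption $\hbar = h/2\pi$, in which case the factors of Z, e, and $m_e$ cancel in the ratio; the function name is hypothetical.

```python
def photon_to_orbital_ratio(n):
    """Ratio of the exact n -> n-1 photon frequency, (E_n - E_{n-1})/h, to the
    classical orbital frequency nu_n, assuming hbar = h / (2 pi)."""
    return 0.5 * n**3 * (1.0 / (n - 1) ** 2 - 1.0 / n**2)

ratios = {n: photon_to_orbital_ratio(n) for n in (2, 10, 100, 1000)}
```

The ratio is 3 for n = 2 but within about 1.5% of unity for n = 100, so the agreement demanded by the correspondence principle is indeed a statement about large orbits.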
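Bohr could then give numerical values of parameters for one-electron atoms:
$$v_n \simeq \frac{Z}{137n} \times c , \qquad r_n \simeq \frac{n^2}{Z} \times 0.5292 \times 10^{-8}\ \text{cm} , \qquad E_n = -\frac{Z^2}{n^2} \times 13.6\ \text{eV} . \tag{3.4.8}$$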
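These numbers can be reproduced directly from Eqs. (3.4.4)–(3.4.6). The sketch below uses rounded modern values of the constants in Gaussian (cgs) units; the constant and function names are illustrative assumptions.

```python
E_ESU = 4.803204e-10        # electron charge, esu
M_E_G = 9.109384e-28        # electron mass, g
HBAR_ERG_S = 1.054572e-27   # reduced Planck constant, erg s
C_CM_S = 2.997925e10        # speed of light, cm / s
ERG_PER_EV = 1.602177e-12   # erg per electron volt

def bohr_orbit(n, z=1):
    """Radius (cm), speed (cm/s), and energy (eV) of the nth Bohr orbit."""
    radius = n**2 * HBAR_ERG_S**2 / (z * E_ESU**2 * M_E_G)                 # Eq. (3.4.4)
    speed = z * E_ESU**2 / (n * HBAR_ERG_S)                                # Eq. (3.4.5)
    energy_erg = -(z**2) * E_ESU**4 * M_E_G / (2 * n**2 * HBAR_ERG_S**2)   # Eq. (3.4.6)
    return radius, speed, energy_erg / ERG_PER_EV

r1, v1, e1 = bohr_orbit(1)
c_over_v1 = C_CM_S / v1
```

For hydrogen in its lowest state this gives a radius close to $0.529 \times 10^{-8}$ cm, a speed close to c/137, and an energy close to −13.6 eV.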
Comparison with Observed Spectra

Bohr’s result for En was in good agreement with measurements of spectral
wavelengths. In 1885 the Swiss mathematician Johann Balmer (1825–1898)
had noticed that the wavelengths of many of the lines in the visible spectrum
of hydrogen are well fit by the formula

$$\lambda^{-1}_{\text{Balmer},n} \propto \frac{1}{4} - \frac{1}{n^2} , \qquad n = 3, 4, \ldots$$
This was generalized in 1888 by the Swedish physicist Johannes Rydberg
(1854–1919), to a general formula for the wavelengths of lines in the spectrum
of neutral hydrogen:

$$\lambda^{-1}_{m,n} = R_H \left(\frac{1}{m^2} - \frac{1}{n^2}\right) , \qquad m = 1, 2, 3, \ldots , \quad n = m + 1, m + 2, \ldots ,$$
where $R_H \simeq 1.1 \times 10^5\ \text{cm}^{-1}$ is a constant, later named the Rydberg constant
for hydrogen. The visible Balmer series is the case m = 2, while the infrared
series m = 3, m = 4, m = 5, etc. became named for Paschen, Brackett, Pfund,
etc. The lines of the m = 1 series were predicted by Rydberg’s formula to
be in the ultraviolet, with wavelengths from 121.6 to 91.2 nm. It was not until
1903 that they were measured, studying hydrogen excited by electric currents,
by Theodore Lyman (1874–1954) at Harvard. These results for hydrogen may
have provided Ritz with inspiration to formulate his combination principle for
all elements.
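Rydberg’s formula is easy to evaluate. The following sketch assumes the modern value $R_H \simeq 1.0968 \times 10^5\ \text{cm}^{-1}$ and uses hypothetical function names; it computes one Balmer line and the two ends of the m = 1 series.

```python
R_H_PER_CM = 1.0968e5  # Rydberg constant for hydrogen in cm^-1 (modern value, rounded)

def rydberg_wavelength_nm(m, n):
    """Wavelength of the line with lower label m and upper label n > m."""
    inverse_wavelength_per_cm = R_H_PER_CM * (1.0 / m**2 - 1.0 / n**2)
    return 1.0e7 / inverse_wavelength_per_cm   # 1 cm = 1e7 nm

def series_limit_nm(m):
    """Shortest wavelength of series m, reached as n goes to infinity."""
    return 1.0e7 * m**2 / R_H_PER_CM

balmer_alpha = rydberg_wavelength_nm(2, 3)
lyman_alpha = rydberg_wavelength_nm(1, 2)
lyman_limit = series_limit_nm(1)
```

The first Balmer line comes out in the red part of the visible spectrum, near 656 nm, while the whole m = 1 series lies in the ultraviolet.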
For comparison with Rydberg’s formula, Bohr’s formula (3.4.6) for En gave
the inverse wavelength of the photon emitted in a transition from energy level n
to energy level m: in an atom whose nucleus has charge Ze,

$$\lambda^{-1}_{n\to m} = \frac{\nu_{n\to m}}{c} = \frac{1}{2\pi\hbar c}(E_n - E_m) = \frac{Z^2 e^4 m_e}{4\pi\hbar^3 c}\left(\frac{1}{m^2} - \frac{1}{n^2}\right) , \tag{3.4.9}$$
which is the same for Z = 1 as Rydberg’s formula if we identify the Rydberg
constant for hydrogen as
$$R_H = \frac{e^4 m_e}{4\pi\hbar^3 c} . \tag{3.4.10}$$
Using the best values then available for the fundamental constants, Bohr
obtained a value for $R_H$ in agreement with the results from contemporary
spectroscopic measurements. (Using modern values for fundamental constants
gives $R_H = 13.605693009(84)\ \text{eV}/hc = 1.0968 \times 10^5\ \text{cm}^{-1}$.)
Reduced Mass
But the agreement was not perfect. According to Eq. (3.4.6), all energies and
hence all frequencies in the spectrum of once-ionized helium should be $Z_{\rm He}^2 = 4$
times larger than for neutral hydrogen, but experiment showed that the ratio
was actually larger than 4 by about 0.04%. Bohr realized that the source of this
discrepancy was that in order to take account of the motion of the nucleus the
formulas for energy and angular momentum of an electron in orbit around a
nucleus of mass M should contain the reduced mass $\mu = m_e/(1 + m_e/M)$
in place of the electron mass itself. It is therefore the reduced mass that should
appear in Bohr’s formulas for energies and frequencies in place of $m_e$. All energies
and frequencies are thus larger for singly ionized helium than for hydrogen
by a factor
$$Z_{\rm He}^2/Z_{\rm H}^2 \times \mu_{\rm He}/\mu_{\rm H} = 4 \times \frac{1 + m_e/m_{\rm H}}{1 + m_e/m_{\rm He}} = 4 \times 1.00041 ,$$
in agreement with observation. Bohr’s success in getting this factor right was a
key factor in convincing physicists of the correctness of his assumptions.
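The size of the reduced-mass correction follows from the nuclear masses alone. In the sketch below the masses of the proton and of the helium-4 nucleus are given in units of the electron mass (rounded modern values); the names are illustrative assumptions.

```python
M_PROTON = 1836.15267        # proton mass / electron mass
M_HE4_NUCLEUS = 7294.29954   # helium-4 nucleus mass / electron mass

def reduced_mass(nucleus_mass):
    """Reduced mass mu = m_e / (1 + m_e / M), in units of the electron mass."""
    return 1.0 / (1.0 + 1.0 / nucleus_mass)

# Ratio of He+ to H frequencies: (Z_He / Z_H)^2 times the ratio of reduced masses.
frequency_ratio = (2**2 / 1**2) * reduced_mass(M_HE4_NUCLEUS) / reduced_mass(M_PROTON)
excess_percent = (frequency_ratio / 4.0 - 1.0) * 100.0
```

The excess over the naive factor of 4 comes out at about 0.04%, the size of the discrepancy that Bohr had to explain.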
Incidentally, although Bohr’s formula (3.4.6) for hydrogen energy levels
(with the reduced mass in place of me ) worked very well, the n in this formula
is not quite equal to the angular momentum in units of h̄, as Bohr had assumed.
We will see in Section 5.2 that in general there are several hydrogen states with
energies given by this formula with the same n, in which the electron has orbital
angular momenta (n − 1)h̄, (n − 2)h̄, . . . , 0, but not nh̄. The electrostatic attrac-
tion exerted on electrons by the nucleus is not balanced solely by the centrifugal
force of motion in closed orbits, but by motions implicit in the wave nature of
the electron. Although Bohr’s calculation of the energy levels in hydrogen has
not survived as a correct derivation of the formula for these energies, Bohr made
a contribution of permanent importance in using a hypothesis of discrete energy
levels for electrons in all atoms to explain the existence of bright and dark lines
in atomic spectra.
Atomic Number
The alpha particle scattering experiments in Rutherford’s laboratory had not
settled the crucial question of the electric charge Ze of the atomic nucleus and
its possible relation to the atomic number, which gives the order of an element
in the list of elements in order of increasing atomic weight. One of the great
achievements of the Bohr theory is that it made possible precise measurements
of nuclear charge.
Of course, Bohr’s formula (3.4.8) was strictly applicable only to one-electron

atoms, but under the approximation of spherical symmetry the electric field felt
by the innermost electrons in any atom arises entirely from the nucleus, not
from electrons farther out. Hence the energy of the photon emitted when an
electron falls from a state in which it is more or less at rest far from the atom
and has essentially zero energy to the innermost n = 1 orbit of any atom is
given by Bohr’s formula (3.4.8) as $-E_1 = 13.6\,Z^2$ eV. For Z > 10 this is an
X-ray energy.
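To get a feeling for the energies involved, the following sketch evaluates $13.6\,Z^2$ eV and the corresponding photon wavelength, using $hc \simeq 12398.4$ eV Å. The function names are hypothetical, and the formula is only the rough one-electron estimate described above.

```python
HC_EV_ANGSTROM = 12398.4  # h times c in eV angstrom (rounded)

def innermost_orbit_photon_energy_ev(z):
    """Photon energy -E_1 = 13.6 Z^2 eV for capture into the n = 1 orbit."""
    return 13.6 * z**2

def photon_wavelength_angstrom(energy_ev):
    """Wavelength of a photon of the given energy."""
    return HC_EV_ANGSTROM / energy_ev

calcium_energy = innermost_orbit_photon_energy_ev(20)
calcium_wavelength = photon_wavelength_angstrom(calcium_energy)
```

For calcium this estimate gives an energy of several keV and a wavelength of a few ångströms, comparable to the spacing of atoms in a crystal.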
After publication of Bohr’s work in 1913, a young physicist at Manchester,
Henry G. J. Moseley (1887–1915), set out to measure these energies. Instead of
a prism he used a crystal that (as described in Section 5.1) preferentially reflects
X-rays at certain angles that depend on the wavelength. His results16 for the
nuclear charge Ze are shown in the following table, along with the values then
known for atomic weight A, which are close to the values accepted now:
Element      Z       A
calcium      20.00   40.09
scandium     —       44.1
titanium     21.99   48.1
vanadium     22.96   51.06
chromium     23.98   52.0
manganese    24.99   54.93
iron         25.99   55.85
cobalt       27.00   58.97
nickel       28.04   58.68
copper       29.01   63.57
zinc         30.01   65.37
Two aspects of this table stand out dramatically. The first is that Z always
turns out to be very close to an integer; the small discrepancies can be easily
blamed on experimental uncertainties. That of course is what one expects, if Z
is the number of electrons in the atom, but it reassured everyone that Moseley’s
measurements were reliable. The second remarkable feature is that Z goes up
by one unit as you go up one step in the list of elements according to atomic
weight; there are no elements with atomic weights between 40 and 65 other
than those listed here. (Nickel is an exception to the steady increase of A with
Z, understood today as due to forces in the nucleus of nickel that make it
unusually strongly bound, for a reason discussed in Section 6.3.) This tight
correspondence between atomic number and atomic weight goes beyond the
elements in the table. For instance, there are just 19 elements with atomic
weights less than calcium, which has Z = 20. Thus with a few exceptions,
one can find Z for any element just by making a list of all elements in order
of increasing atomic weight; the atomic number, defined as the place of the
element in that list, gives the number Z of electrons in the atom and the positive
charge Ze of the nucleus.
16 H. G. J. Moseley, Phil. Mag. 26, 1024 (1913).
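Both features of Moseley’s table can be checked mechanically. The sketch below simply transcribes the table (with no entry for the Z of scandium) and uses hypothetical function names.

```python
# (element, measured Z, atomic weight A); None marks the missing scandium measurement.
MOSELEY_TABLE = [
    ("calcium", 20.00, 40.09), ("scandium", None, 44.1),
    ("titanium", 21.99, 48.1), ("vanadium", 22.96, 51.06),
    ("chromium", 23.98, 52.0), ("manganese", 24.99, 54.93),
    ("iron", 25.99, 55.85), ("cobalt", 27.00, 58.97),
    ("nickel", 28.04, 58.68), ("copper", 29.01, 63.57),
    ("zinc", 30.01, 65.37),
]

def largest_deviation_from_integer(table):
    """Largest distance of any measured Z from the nearest whole number."""
    return max(abs(z - round(z)) for _, z, _ in table if z is not None)

def weight_inversions(table):
    """Adjacent pairs in which A decreases from one row of the table to the next."""
    return [(a[0], b[0]) for a, b in zip(table, table[1:]) if b[2] < a[2]]

max_deviation = largest_deviation_from_integer(MOSELEY_TABLE)
inversions = weight_inversions(MOSELEY_TABLE)
```

No measured Z differs from an integer by more than about 0.04, and the only place where the atomic weight fails to increase down the table is the step from cobalt to nickel.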
Incidentally, the Bohr theory also provides a rough idea of the sizes of all
atoms. The electric field felt by the outermost electron in any atom is largely
shielded by the Z − 1 electrons closer to the nucleus, so the radius of its orbit
is very crudely given by the Bohr result (3.4.8), only with $Z \simeq 1$. This is why
the sizes of the atoms of heavy elements are not very much larger than that
of the hydrogen atom, of order $10^{-8}$ cm. They are in fact somewhat larger,
because the radius rn increases with n, and for reasons we will learn in Chapter 5
the outermost electrons in heavy atoms have n greater than 1.
Outstanding Questions
Successful as it was, the Bohr theory raised a number of new questions.
1. Why should angular momentum (or anything else) be quantized?
2. How many atomic states are there for each energy? (It was already known
that spectral lines could be split by exposing atoms to external electric and
magnetic fields.)
3. Above all, how should quantum theory be applied to states that cannot be
approximated as consisting of electrons moving in a fixed Coulomb potential?
This includes all molecules.
The solution of these problems had to wait until the advent of modern quantum
mechanics in the 1920s. This is the subject of Chapter 5.
3.5 Emission and Absorption of Radiation
A and B Coefficients
In 1917 Einstein returned to the theory of black body radiation,¹⁷ this time
combining it with the Bohr idea of quantized atomic energy states. Einstein
defined a quantity $A_m^n$ as the rate at which an atom will spontaneously make a
transition from a state m of energy $E_m$ to a state n of lower energy $E_n$, emitting
a photon of energy $E_m - E_n$. He also considered the absorption of photons
from radiation (not necessarily black body radiation) with an energy density
$\mathcal{E}(\nu)\,d\nu$ at frequencies between ν and ν + dν. The rate at which an individual
atom in such a field makes a transition from a state n to a state m of higher
energy is written as $B_n^m\,\mathcal{E}(\nu_{nm})$, where $\nu_{nm} \equiv (E_m - E_n)/h$ is the frequency
of the absorbed photon. As we will see, Einstein also found it necessary to take
into account the possibility that the radiation would stimulate the emission of
photons of frequency $\nu_{nm}$ by the atom in transitions from a state m to a state
n of lower energy, at a rate written as $B_m^n\,\mathcal{E}(\nu_{nm})$. The coefficients $B_n^m$ and $B_m^n$,
like $A_m^n$, were assumed to depend only on the properties of individual atoms,
not on their temperature or any properties of the radiation.
17 A. Einstein, Phys. Z. 18, 121 (1917), reprinted in English translation in Van der Waerden, Sources of
Quantum Mechanics, listed in the bibliography.
Now, suppose the radiation is black body radiation, at a temperature T , with
which the atoms are in equilibrium. The energy density per frequency interval
of the radiation will be the function $\mathcal{E}(\nu, T)$ given by Eq. (3.2.4):
$$\mathcal{E}(\nu, T) = \frac{8\pi h}{c^3}\,\frac{\nu^3}{\exp(h\nu/kT) - 1} .$$
In equilibrium the rate at which atoms make a transition m → n from higher
to lower energy must equal the rate at which atoms make the reverse transition
n → m:
$$N_m\left[A_m^n + B_m^n\,\mathcal{E}(\nu_{nm}, T)\right] = N_n B_n^m\,\mathcal{E}(\nu_{nm}, T) , \tag{3.5.1}$$
where Nn and Nm are the numbers of atoms in states n and m. According to the
Boltzmann rule of classical statistical mechanics, at temperature T the number
of atoms in a given state of energy E is proportional to exp(−E/kT ), so
Nm /Nn = exp (−(Em − En )/kT ) = exp (−hνnm /kT ) . (3.5.2)
(It is important here to take the various Nn as the numbers of atoms in the
individual states n, some of which may have precisely the same energy, rather
than the numbers of atoms in all states with energies En .) Putting this together,
we have
$$A_m^n = \frac{8\pi h}{c^3}\,\frac{\nu_{nm}^3}{\exp(h\nu_{nm}/kT) - 1}\left[\exp(h\nu_{nm}/kT)\,B_n^m - B_m^n\right] . \tag{3.5.3}$$
For this to be possible at all temperatures for temperature-independent A and B
coefficients, these coefficients must evidently be related by
$$B_m^n = B_n^m , \qquad A_m^n = \frac{8\pi h\nu_{nm}^3}{c^3}\,B_m^n . \tag{3.5.4}$$
Hence, knowing the rate at which a classical light wave of a given energy density
is absorbed or stimulates emission by an atom, we can calculate the rate at which
it spontaneously emits photons, an explicitly quantum process.
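The consistency of the Einstein relations with the black body formula can be verified numerically. The sketch below is an illustration under stated assumptions: the frequency, the temperatures, and the unit value of the B coefficient are hypothetical, and the function names are not from the text. It solves the balance condition (3.5.1) with Boltzmann populations for the energy density and compares the result with the Planck distribution.

```python
import math

H = 6.62607015e-27   # Planck constant, erg s
K = 1.380649e-16     # Boltzmann constant, erg / K
C = 2.99792458e10    # speed of light, cm / s

def planck_energy_density(nu, temperature):
    """Black body energy density per unit frequency interval."""
    return 8.0 * math.pi * H * nu**3 / C**3 / math.expm1(H * nu / (K * temperature))

def balancing_energy_density(nu, temperature, a_down, b_up, b_down):
    """Energy density for which Eq. (3.5.1) holds with populations from Eq. (3.5.2)."""
    population_ratio = math.exp(H * nu / (K * temperature))   # N_n / N_m
    return a_down / (population_ratio * b_up - b_down)

NU = 5.0e14                                    # hypothetical optical frequency, Hz
B = 1.0                                        # hypothetical B coefficient, arbitrary units
A = 8.0 * math.pi * H * NU**3 / C**3 * B       # spontaneous rate from Eq. (3.5.4)

relative_errors = [
    abs(balancing_energy_density(NU, t, A, B, B) / planck_energy_density(NU, t) - 1.0)
    for t in (300.0, 6000.0)
]
```

With A and B related as in Eq. (3.5.4), the balancing energy density coincides with the Planck formula at both temperatures, to within rounding error.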
Lasers
The phenomenon of stimulated emission makes possible the amplification of
beams of light in a laser. (This is an acronym for “light amplification by stim-
ulated emission of radiation.” Before lasers there were masers, in which it was
microwave radiation rather than visible light that was amplified by stimulated
emission.) Suppose a beam of light with energy density distribution $\mathcal{E}(\nu)$ passes
through a medium consisting of $N_n$ atoms at energy level $E_n$. Stimulated emission
from the first excited state n = 2 to the ground state n = 1 adds photons
of frequency $\nu_{12} \equiv (E_2 - E_1)/h$ to the beam at a rate $N_2\,\mathcal{E}(\nu_{12})B_2^1$, but absorption
from the ground state removes photons at a rate $N_1\,\mathcal{E}(\nu_{12})B_1^2$, and since
$B_2^1 = B_1^2$ there will be a net addition of photons only in the case $N_2 > N_1$.
Unfortunately, such a population inversion never occurs in thermal equilibrium,
and cannot even be produced by exposing the atoms in their ground state to light
at the resonant frequency $\nu_{12}$. The net rate of change in the population of the
first excited state, labeled n = 2, due to spontaneous and stimulated emission
from the excited state and absorption from the ground state will be
$$\frac{dN_2}{dt} = -N_2\,\mathcal{E}(\nu_{12})B_2^1 - N_2 A_2^1 + N_1\,\mathcal{E}(\nu_{12})B_1^2 ,$$
or, using the Einstein relations (3.5.4),
$$\frac{dN_2}{dt} = B_2^1\left[-N_2\left(\mathcal{E}(\nu_{12}) + 8\pi\nu_{12}^3 h/c^3\right) + N_1\,\mathcal{E}(\nu_{12})\right] . \tag{3.5.5}$$
If we start with $N_2 = 0$, then $N_2$ increases until it approaches a value $N_1/(1+\xi)$,
where $\xi \equiv 8\pi\nu_{12}^3 h/\mathcal{E}(\nu_{12})c^3$, when $N_2$ becomes constant. Not only can this
process not produce a population inversion; because of spontaneous emission it
cannot even make $N_2$ as large as $N_1$.
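The approach to this limiting population can be followed by integrating Eq. (3.5.5) step by step. The sketch below is a minimal illustration, assuming that $N_1$ is held fixed and working in arbitrary units; all parameter values and names are hypothetical, and `spontaneous_term` stands for the combination $8\pi\nu_{12}^3 h/c^3$.

```python
def integrate_upper_population(n1, energy_density, b, spontaneous_term, dt, steps):
    """Euler integration of dN2/dt = b * (-N2 * (E + s) + N1 * E), starting from N2 = 0."""
    n2 = 0.0
    for _ in range(steps):
        dn2_dt = b * (-n2 * (energy_density + spontaneous_term) + n1 * energy_density)
        n2 += dn2_dt * dt
    return n2

N1 = 1.0                 # ground-state population, held fixed (assumption)
ENERGY_DENSITY = 2.0     # energy density at the resonant frequency, arbitrary units
B_COEFF = 1.0            # B coefficient, arbitrary units
SPONTANEOUS_TERM = 1.0   # stands for 8 pi nu^3 h / c^3, arbitrary units

xi = SPONTANEOUS_TERM / ENERGY_DENSITY
n2_final = integrate_upper_population(N1, ENERGY_DENSITY, B_COEFF, SPONTANEOUS_TERM,
                                      dt=0.001, steps=20000)
n2_limit = N1 / (1.0 + xi)
```

However intense the light, ξ remains positive, so the computed population of the excited state levels off below $N_1$.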
A population inversion can be produced in other ways, for instance by optical
pumping, in which atoms are excited to some state, say n = 3, by absorption
of light with frequency ν31 = (E3 − E1 )/ h, and then spontaneously decay to
the state n = 2. This can also happen naturally. Masers have been observed in
the accretion disks surrounding the centers of several galaxies, including NGC
4258 and M33.
Suppressed Absorption
Stimulated emission can not only intensify emission lines, such as those from
masers – it can also suppress absorption lines. Consider a steady beam with
area A of radiation moving in the +x-direction, with local energy density per
unit frequency interval $\mathcal{E}(\nu, x)$ at x. In the steady state, the rate of change of
energy per unit frequency interval $\mathcal{E}A\,dx$ in the slab between x and x + dx due
to atomic transitions n → m and m → n with $E_m - E_n = h\nu > 0$ must
be balanced by the difference in the rates at which radiation energy enters and
leaves the slab:
$$c\left[\mathcal{E}(\nu, x) - \mathcal{E}(\nu, x + dx)\right]A = h\nu\,\mathcal{E}(\nu, x)\left[n_n B_n^m - n_m B_m^n\right]A\,dx ,$$
where $n_m$ and $n_n$ are the number densities of atoms in states m and n, respectively.
The two terms in square brackets on the right arise respectively from
absorption and stimulated emission; we do not include a term for spontaneous
emission because the photons it produces leave the beam. If the medium is in
thermal equilibrium at temperature T then $n_m/n_n = \exp(-h\nu/kT)$; so, since
$B_m^n = B_n^m$, the energy density per unit frequency interval along the beam must
satisfy
$$\frac{d}{dx}\mathcal{E}(\nu, x) = -\frac{h\nu}{c}\,\mathcal{E}(\nu, x)\,n_n B_n^m\left[1 - \exp(-h\nu/kT)\right] . \tag{3.5.6}$$
Thus, if $h\nu \ll kT$, stimulated emission suppresses the intensity of the absorption
line by a factor $h\nu/kT$. This is important for radio and microwave frequency
lines, like the famous “21-cm” line in hydrogen discussed in Section 5.4.
It has hν/k = 0.068 K, which is less even than the temperature of the cosmic
microwave background, so this absorption line is strongly suppressed by stim-
ulated emission everywhere. Nevertheless, the absorption line is observed. Its
intensity and Doppler shifts provide valuable information about the temperature
and motion of hydrogen gas in galactic disks.
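The size of this suppression for the 21-cm line can be estimated in a few lines. The sketch below assumes the line frequency 1420.405751 MHz and uses hypothetical function names and sample temperatures.

```python
import math

H = 6.62607015e-27         # Planck constant, erg s
K = 1.380649e-16           # Boltzmann constant, erg / K
NU_21CM_HZ = 1.420405751e9 # frequency of the hydrogen 21-cm line, Hz

def line_temperature(nu):
    """The temperature h nu / k associated with a line of frequency nu."""
    return H * nu / K

def suppression_factor(nu, temperature):
    """Factor 1 - exp(-h nu / kT) multiplying the absorption in Eq. (3.5.6)."""
    return -math.expm1(-H * nu / (K * temperature))

t_line = line_temperature(NU_21CM_HZ)
factor_cmb = suppression_factor(NU_21CM_HZ, 2.725)   # cosmic microwave background
factor_100k = suppression_factor(NU_21CM_HZ, 100.0)  # a sample gas temperature
```

At a gas temperature of 100 K the factor is below one part in a thousand, and it is well approximated by $h\nu/kT$ at both temperatures.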
4 Relativity
We now turn to the special theory of relativity, introduced by Einstein in a pair

of papers in 1905, the same year in which he postulated the quantization of
radiation energy and showed how to use observations of diffusion to measure
constants of microscopic physics. Special relativity revolutionized our ideas of
space, time, and mass, and it gave the physicists of the twentieth century a
paradigm for the incorporation of conditions of invariance into the fundamental
principles of physics.
4.1 Early Relativity
Motion of the Earth

The idea of the relativity of motion first appeared in medieval arguments over
whether or not the Earth can be in motion. For no good reason, it had been
proposed by the followers of the cult of Pythagoras in the fifth century BC that
the Earth along with the Sun and planets was in orbit about some sort of central
fire. A more sober proposal was made in the third century BC by the Hellenistic
astronomer Aristarchus of Samos (ca. 310–230 BC).1 From observations of the
Sun and Moon, he calculated that the Sun is much larger than the Earth. Accord-
ing to a later book of Archimedes, Aristarchus concluded from the difference in
their sizes that instead of the Sun going around the Earth it was more plausible
to suppose that the Earth goes around the Sun.
Better motivated was the idea that the Earth is rotating. It was not hard to
see that the apparent rotation once a day from east to west of the Sun, Moon,
planets, and stars could be neatly explained if instead the Earth were rotating on
an axis once a day from west to east. At least one astronomer suggested this as
early as the fourth century BC; it was Heraclides of Pontus (ca. 388–310 BC),
a student at Plato’s Academy at Athens.
1 Aristarchus, “On the Sizes and Distances of the Sun and Moon,” translated by T. L. Heath, in Aristarchus
of Samos (Clarendon Press, Oxford, 1923). The calculations of Aristarchus are described in S. Weinberg,
To Explain the World (HarperCollins Publishers, New York, 2015).
There is a classic argument against both the rotation and motion of the Earth,
given originally by Aristotle, and picked up around 150 AD by the astronomer
Claudius Ptolemy of Alexandria (ca. 100–170 AD). Ptolemy argued that if the
surface of the Earth were in motion then an arrow shot straight up would not
fall back to the same spot from which it was shot, as is observed, because
while the arrow was in flight that spot would have moved some distance under
the arrow. This argument was first countered in the mid-1300s AD by Nicole
Oresme (1321–1382), bishop of Lisieux. Relying on the concept of impetus
introduced by his teacher at the University of Paris Jean Buridan (1300–1358),
Oresme argued that an arrow on the surface of the Earth would pick up an
impetus from the Earth’s motion, which would keep it moving with the same
horizontal component of velocity while going up and down in the air, so it
would fall back to the same spot on Earth, despite the Earth’s motion. Sadly,
whether from respect for the teachings of the Church or fear of its discipline,
Oresme never publicly adopted the notion that the Earth really is in motion. But
he had established that purely terrestrial observations cannot detect a possible
motion of the Earth.
It was not so obvious that the peculiar motion of the planets around the
constellations of the zodiac, sometimes even seeming to reverse their motion,
could be explained if the Earth were in orbit about the Sun, sometimes passing
Mars or some other outer planet, and sometimes being passed by Venus or Mer-
cury. As everyone knows, this was finally made clear in the 1540s by Nicolaus
Copernicus (1473–1543).
Relativity of Motion
I don’t know if it was the writings of Oresme or similar ideas of their own, but
Johannes Kepler (1571–1630) and Galileo Galilei (1564–1642) in their defense
of Copernicanism were comfortable with the conclusion that there is no way
that a uniformly moving observer without observing the surroundings can tell
that he or she is in motion. It was generally understood that (in modern notation)
if a first observer describes any event as having Cartesian space coordinates $x^i$
(with i = 1, 2, 3 or x, y, z) and time coordinate t, then a second observer who
moves with velocity −u with respect to the first will see the same event with
coordinates
$$x'^i = x^i + u^i t , \qquad t' = t , \tag{4.1.1}$$
because an object seen by the first observer with any time-independent coordinates
$x^i = a^i$ will seem to the second observer to be moving with velocity +u,
with coordinates $x'^i = a^i + u^i t$.
Invariance under these transformations was built into Newton’s theory of
motion and gravitation. In a system of bodies acted on by their mutual gravitational
attraction, the equations of motion obeyed by the Cartesian coordinates
$x_N^i$ of the Nth body are
$$\frac{d^2}{dt^2}\,x_N^i = G \sum_{M \neq N} m_M\,\frac{x_M^i - x_N^i}{|\mathbf{x}_M - \mathbf{x}_N|^3} , \tag{4.1.2}$$
where G is Newton’s gravitational constant, and $|\mathbf{x}_M - \mathbf{x}_N|^2 \equiv \sum_i (x_M^i - x_N^i)^2$.
These equations are invariant under the transformation (4.1.1), which here is
$$t \to t' = t , \qquad x_N^i \to x_N'^i = x_N^i + u^i t , \tag{4.1.3}$$
because the term $u^i t$ drops out in the second time derivative on the left-hand side
of Eq. (4.1.2) and does not appear in the differences of spatial coordinates on
the right-hand side. The principle that the laws of nature are invariant under
the transformations (4.1.1) is known as the principle of Galilean relativity.
It is a good approximation for bodies moving at speeds much less than that
of light. For instance, we saw in Section 2.5 how invariance under Galilean
transformations is used to infer the equations of motion for imperfect fluids.
The equations of motion (4.1.2) are of course also invariant under constant
rotations of space coordinates and constant translations of space and time coor-
dinates. The set of all these transformations and all their combinations is known
as the Galileo group.
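The invariance of Eq. (4.1.2) under the transformation (4.1.3) can be checked numerically. The following Python sketch is only an illustration: the masses, positions, boost velocity, and function names are hypothetical choices, not taken from the text. It evaluates the right-hand side of Eq. (4.1.2) before and after every body is shifted by the same amount $u^i t$, and records the largest change in any acceleration component. The left-hand side needs no check, since the second time derivative of $u^i t$ vanishes.

```python
import math

G = 6.674e-11  # Newton's constant in SI units (m^3 kg^-1 s^-2)

def accelerations(positions, masses):
    """Right-hand side of Eq. (4.1.2): acceleration of each body N."""
    result = []
    for n, xn in enumerate(positions):
        acc = [0.0, 0.0, 0.0]
        for m, xm in enumerate(positions):
            if m == n:
                continue  # the sum runs over M != N
            diff = [xm[i] - xn[i] for i in range(3)]
            dist = math.sqrt(sum(d * d for d in diff))
            for i in range(3):
                acc[i] += G * masses[m] * diff[i] / dist**3
        result.append(acc)
    return result

def galilean_boost(positions, u, t):
    """Eq. (4.1.3): shift every body by u*t at the common time t."""
    return [[x[i] + u[i] * t for i in range(3)] for x in positions]

# Hypothetical three-body configuration, chosen only for illustration.
masses = [5.0e24, 7.0e22, 1.0e3]
positions = [[0.0, 0.0, 0.0], [3.8e8, 0.0, 0.0], [1.0e7, 2.0e7, -5.0e6]]
u, t = [1.0e4, -2.0e3, 5.0e2], 3600.0

a_original = accelerations(positions, masses)
a_boosted = accelerations(galilean_boost(positions, u, t), masses)

# Largest change in any acceleration component; only rounding error remains.
max_change = max(abs(a - b)
                 for row_a, row_b in zip(a_original, a_boosted)
                 for a, b in zip(row_a, row_b))
print(max_change)
```

The shift cancels in every coordinate difference, so `max_change` is at the level of floating-point rounding rather than exactly zero.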
Speed of Light
It is obvious that Maxwell’s equations are not invariant under the Galilean
transformations (4.1.1). Maxwell’s equations tell us that light always travels
at the same speed, which we call c. If a light wave moves along the 1-direction,
the 1-coordinate of the wave front must have the time dependence
$$x^1(t) = x^1(0) + ct\,. \tag{4.1.4}$$
But then if a second observer who moves in the $-1$-direction with speed $u$ uses
the coordinates (4.1.1), she will see the 1-coordinate of the wave front as
$$x'^1(t) = x^1(0) + (c + u)t\,, \tag{4.1.5}$$
so the wave would seem to travel faster or slower than the speed of light according
to whether $u$ is positive or negative. Observers can use any coordinate
systems they like, but Eq. (4.1.5) shows that if Maxwell’s equations in the form
(3.1.2) are found to hold when an observer uses coordinates $x^i$, $t$ then they
cannot hold in that form when she uses coordinates $x'^i$, $t'$.
Einstein worried about this as a young man. He was particularly concerned
with what a light wave would look like to an observer with u = −c in our
example – that is, an observer moving with the light wave. He concluded that
the electric and magnetic fields would appear frozen in time, though varying
with position along the ray. Needless to say, this is not a solution of the Maxwell
equations.
This problem did not worry Maxwell. In formulating his equations, he
regarded the electric and magnetic fields as vibrations in an elastic medium, the
aether. In this case one would not expect the equations to hold for observers
moving with respect to the aether, any more than the equations for a sound wave
traveling up and down in an organ pipe would seem the same to an observer
flying up the pipe as to an observer at rest with respect to the pipe. Maxwell
thought that his equations would apply only for observers at rest in the aether.
Michelson–Morley Experiment
So, if electromagnetic waves are vibrations in the aether, can we measure the
velocity of the Earth through the aether? The Earth’s orbital motion gives it
a speed of 30 km/sec relative to the Sun, and the rotation of our galaxy gives
the solar system a speed of about 200 km/sec relative to the galaxy’s center.
These speeds are much less than the speed of light, 300 000 km/sec, but not
too small to be measured with a device known as a Michelson interferometer,
invented by the American physicist Albert Michelson (1852–1931). (Michelson
interferometers have been used for many purposes since then, most recently in
the detection of gravitational waves from distant coalescing black holes and/or
neutron stars.)
In 1886 Michelson and Edward Morley (1838–1923) set out to measure the
speed of the Earth through the aether in observations at the US Naval Academy,
where Michelson had been a midshipman. As a base for their interferometer,
they used a large stone disk floating on mercury, to allow an easy change in
its orientation and also to give it some insulation from vibrations in the Earth.
On this disk they placed a strong source of light, which sent a beam of light
toward a half-silvered mirror set at 45° to the beam. (See Figure 4.1.) Half
the beam went straight ahead to an ordinary mirror A at distance $L_A$ from the
half-silvered mirror, and half went at a right angle to another ordinary mirror B
at a distance $L_B$. From both these two mirrors the beam was reflected back to
the half-silvered mirror. Some of the two reflected beams went together in the
direction opposite to the direction to mirror B, to a detector which measured
the intensity of the recombined beam. If it takes times $t_A$ and $t_B$ for the light to
travel from the half-silvered mirror M along the paths to mirrors A and B and
back again, then the intensity observed at the detector is proportional to
$$\left| A_A e^{-2\pi i \nu t_A} + A_B e^{-2\pi i \nu t_B} \right|^2 = |A_A|^2 + |A_B|^2 + 2|A_A||A_B| \cos\big(2\pi\nu(t_A - t_B) + \alpha\big)\,, \tag{4.1.6}$$
where $A_A$ and $A_B$ are the amplitudes that would be received from mirrors A and
B if $t_A$ and $t_B$ were negligible, $\alpha$ is the relative phase of these amplitudes, and $\nu$
is the light frequency. It is easy to arrange that $|A_A|$ and $|A_B|$ are approximately
equal, in which case the intensity (4.1.6) is quite sensitive to the argument of
the cosine. So we need to calculate the times $t_A$ and $t_B$ for various orientations
of the interferometer.

Figure 4.1 The interferometer used in the Michelson–Morley experiment, seen from above.

Adopting the idea of an aether for the sake of argument, let us assume that
the Earth is traveling through the aether with a speed $v$, at an angle $\phi$ to the
direction of the interferometer’s incident light beam. To calculate $t_A$ and $t_B$ it is
easiest to work in the frame of reference at rest in the aether, in which the speed
of light according to Maxwell is $c$ in all directions. If the light takes a time $t_A^+$ to
travel from the half-silvered mirror M to mirror A and a time $t_A^-$ to travel back
from A to M, then in the time intervals $t_A^\pm$ it travels a distance $L_A \pm t_A^\pm v\cos\phi$
along its original direction (because during time $t_A^+$ the mirror A moves in
a direction away from M by a distance $t_A^+ v\cos\phi$ while in the time $t_A^-$ the
half-silvered mirror M moves in a direction toward A by a distance $t_A^- v\cos\phi$).
In both time intervals the light beam also moves at right angles to its original
direction by a distance $t_A^\pm v\sin\phi$. The total distance traveled in these time
intervals is then the hypotenuse of a right triangle with sides $L_A \pm t_A^\pm v\cos\phi$ and
$t_A^\pm v\sin\phi$, so
$$c\,t_A^\pm = \sqrt{(L_A \pm t_A^\pm v\cos\phi)^2 + (t_A^\pm v\sin\phi)^2} = \sqrt{L_A^2 \pm 2L_A t_A^\pm v\cos\phi + (t_A^\pm v)^2}\,.$$
Because $v$ is presumably much less than $c$, it will be enough to keep only terms
up to second order in $v$. We can then use the familiar expansion
$$\sqrt{1 + x} = 1 + \frac{x}{2} - \frac{x^2}{8} + \cdots\,,$$
so that
$$c\,t_A^\pm \simeq L_A \pm t_A^\pm v\cos\phi + \frac{1}{2L_A}(t_A^\pm v)^2(1 - \cos^2\phi) \simeq L_A \pm t_A^\pm v\cos\phi + \frac{1}{2}L_A (v^2/c^2)(1 - \cos^2\phi)$$
and therefore
$$t_A^\pm \simeq \frac{L_A}{c \mp v\cos\phi}\left[1 + \frac{v^2}{2c^2}(1 - \cos^2\phi)\right].$$
Adding these results for $t_A^+$ and $t_A^-$, we see that the terms of first order in $v/c$
cancel, leaving us with the second-order correction
$$t_A = t_A^+ + t_A^- \simeq \frac{2L_A c}{c^2 - v^2\cos^2\phi}\left[1 + \frac{v^2}{2c^2}(1 - \cos^2\phi)\right] \simeq \frac{2L_A}{c}\left[1 + \frac{v^2}{2c^2}(1 + \cos^2\phi)\right].$$
Since we assumed that the line from the half-silvered mirror M to mirror A is at
an angle $\phi$ to the Earth’s velocity through the aether, the line from M to mirror
B is at an angle $90^\circ - \phi$ to the Earth’s velocity. We can therefore find $t_B$ by
simply replacing $\phi$ with $90^\circ - \phi$ and of course replacing $L_A$ with $L_B$:
$$t_B \simeq \frac{2L_B}{c}\left[1 + \frac{v^2}{2c^2}(1 + \sin^2\phi)\right].$$
The difference, which appears in Eq. (4.1.6), is then
$$t_A - t_B \simeq \frac{2(L_A - L_B)}{c}\left[1 + \frac{3v^2}{4c^2}\right] + \frac{(L_A + L_B)}{2c}\,\frac{v^2}{c^2}\cos 2\phi\,. \tag{4.1.7}$$
There is no way that Michelson and Morley could know $L_A - L_B$ and $\alpha$ accurately
enough to allow them to detect the presence of corrections proportional to
$v^2/c^2$ by measuring the intensity with a fixed orientation of their interferometer,
even if they knew the value of $\phi$ for that orientation, which of course they did
not since no one knew the direction of the Earth’s motion through the aether.
But if they rotated the interferometer through 180°, then $\cos 2\phi$ would vary
through the whole range from $-1$ to $+1$, so $t_A - t_B$ in Eq. (4.1.7) would vary by
an amount $(L_A + L_B)v^2/c^3$ and the argument of the cosine in Eq. (4.1.6) would
change by an amount $2\pi\nu(L_A + L_B)v^2/c^3$. This predicts an observable change
in the intensity (4.1.6) as the interferometer is rotated through 180°, provided
that $2\pi\nu(L_A + L_B)v^2/c^3$ is not much less than $2\pi$, or in other words provided
that $v^2/c^2$ is not too small compared with $c/[\nu(L_A + L_B)] = \lambda/(L_A + L_B)$,
where $\lambda = c/\nu$ is the light wavelength. In the Michelson–Morley experiment
(taking account of repeated reflections between the half-silvered mirror and the
other mirrors) $L_A + L_B$ was of order $10^3$ cm, while the wavelength $\lambda$ was a few
times $10^{-5}$ cm, so $\lambda/(L_A + L_B)$ was of order $10^{-8}$, and velocities roughly of
order $10^{-4}c = 30$ km/sec could be easily detected.
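To put numbers to this estimate, the short Python sketch below evaluates the change in the argument of the cosine in Eq. (4.1.6), in units of $2\pi$, as $(L_A + L_B)v^2/(\lambda c^2)$. The function name is hypothetical, and the wavelength of $5 \times 10^{-5}$ cm is an assumed representative value for "a few times $10^{-5}$ cm."

```python
def fringe_shift(total_arm_length, wavelength, v, c):
    """Change of 2*pi*nu*(tA - tB) on rotation, in units of 2*pi.

    Equals (LA + LB) * v^2 / (lambda * c^2); the two lengths must share one
    unit and the two speeds must share one unit.
    """
    return (total_arm_length / wavelength) * (v / c) ** 2

# Lengths in cm, speeds in km/sec. The wavelength is an assumed value.
shift = fringe_shift(1.0e3, 5.0e-5, 30.0, 3.0e5)
print(shift)
```

With these inputs the shift is a sizable fraction of a full cycle, which is why an aether speed comparable to the Earth’s orbital speed would have been detectable.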
Finding no change in the intensity (4.1.6) as the interferometer was rotated,
Michelson and Morley concluded in 1887 that the velocity of light as observed
from the moving Earth is the same in all directions to within 5 km/sec.2 That
is, within the aether theory of that time, the speed v of the Earth relative to the
aether would have to be less than 5 km/sec, as compared with the undoubted
orbital velocity of the Earth relative to the Sun of 30 km/sec. By 1964, with
the use of a laser instead of an incoherent light source, the upper limit on this
velocity had been reduced to about 1 km/sec.3 Even if one imagined that on
a particular day the Earth happened to be more or less at rest in the aether,
six months later the Earth would be moving in the opposite direction, with the
same speed relative to the Sun, and hence with a speed of 60 km/sec relative to
the aether.
This surprising result evoked various explanations. H. A. Lorentz4 in 1892
and George Francis Fitzgerald (1851–1901) at about the same time proposed
that motion through the aether causes a contraction of the dimension of the
interferometer along the direction of motion, just such as to hide the effect of
motion on the speed of light. Lorentz, acting on the assumption that all mat-
ter consists of electrons, tried to explain this “Lorentz–Fitzgerald contraction”
within a theory of the electron. Similar ideas were elaborated by the polymath
Henri Poincaré5 (1854–1912). But it was Albert Einstein in 1905 who put his
finger on the solution.
2 A. Michelson and E. W. Morley, Am. J. Sci. 34, 333 (1887).
3 T. S. Jaseja, A. Javan, J. Murray, and C. H. Townes, Phys. Rev. 133, A1221 (1964).
4 H. A. Lorentz, Versl. Kon. Akad. Wetensch. Amsterdam I, 74 (1892).
5 H. Poincaré, Rendiconti del Circolo Matematico di Palermo 21, 129 (1906).

4.2 Einsteinian Relativity
Physicists in the first years of the twentieth century were in a strange bind.
Newton’s equations (4.1.2) of matter and gravitation are invariant under the
Galilean transformation (4.1.1), while Maxwell’s equations are not. That in
itself was not so bad – it was possible to believe, as Maxwell did believe, that his
equations only describe electromagnetism in one frame of reference, supposed
to be the one at rest in the aether. But as we saw in the previous section, it
was not possible to detect any effect of motion relative to the aether on the
speed of light.
Postulate of Invariance
Einstein’s solution to this conundrum was presented in 1905 in an article6 “On
the Electrodynamics of Moving Bodies.” As suggested by the title, part of his
motivation was a peculiar feature of electrodynamics. Consider a magnet mov-
ing past a conducting wire. To an observer at rest with respect to the wire,
the changing magnetic field produces an electric field, which, as in an electric
generator, drives a current in the wire. On the other hand, to an observer at
rest with respect to the magnet, there is no electric field; instead the motion
of the wire with velocity v through the magnetic field B produces a force per
charge v × B/c that drives a current in the wire. Somehow the current is the
same, although the two observers use different language to describe what is
happening. So at least some electromagnetic phenomena are unaffected by the
motion of the observer.
Einstein also mentioned in passing “unsuccessful attempts to detect a motion
of the Earth” relative to what he called “the light medium,” but did not give a
reference to the Michelson–Morley experiment. In his 1905 paper he rejected
the idea of Lorentz and Fitzgerald that the change in the speed of light due to the
transformation (4.1.1) is somehow hidden from us by changes in the measuring
apparatus due to motion. Instead, he insisted that Maxwell’s equations are un-
affected by uniform motion – only the change in coordinates due to uniform
motion is not (4.1.1), but something else.
What was truly new and remarkable in Einstein’s paper was that in working
out this change of coordinates he supposed that the time coordinate, as well as
the space coordinates, is affected by the motion of an observer. In writing the
Galilean transformation in Eq. (4.1.1) I was careful to include the specification
$t' = t$. That was an anachronism – no one before Einstein would have bothered
to specify that the time coordinate is unaffected by the motion of an observer. It
was then universally supposed that the flow of time is unaffected by motion or
anything else. Now Einstein was contemplating the possibility that time as well
as distance is affected by an observer’s motion.
Einstein calculated the effect of motion on space and time coordinates by a
variety of thought experiments, under the assumption that times and distances
would be measured using light rays. Though he did not put it in this way, he
was in effect working out what coordinate transformations leave Maxwell’s
equations, and in particular the speed of light, unchanged. Of course, we can
redefine spacetime coordinates any way we like. There is no physics content
to a prescription of how to transform coordinates. As we shall see, what was
new about the physics introduced by Einstein was not in his change of the
coordinate transformation, to keep the speed of light constant, but his hypothesis
that these new transformations leave the equations of mechanics as well as
electrodynamics invariant. This was not true of Newton’s equations, so Einstein
had to change these equations, with profound consequences for physics.
This work of Einstein started what became one of the continuing preoccu-
pations of modern physics: the study of hypothetical principles of invariance
and their physical implications. Instead of working through Einstein’s thought
experiments, the discussion below adopts a more modern spirit. In this section
we learn what transformations of space and time coordinates, known as Lorentz
transformations, leave the speed of light invariant; in the following section we
work out the consequences of the assumption that the laws that govern the
rigidity of rulers and the ticking of clocks – whatever they are – are invariant
under Lorentz transformations; in Section 4.4 we calculate the implications of
the assumption that all the laws of mechanics are invariant under these transfor-
mations; in Section 4.5 we find the consequences of Lorentz invariance for the
properties of photons; and in Section 4.6 we check that not only the speed of
light but Maxwell’s whole theory of electrodynamics is invariant under Lorentz
transformations. In this work we shall make use of a compact spacetime notation
introduced in 1907 by Hermann Minkowski7 (1864–1909).
Lorentz Transformations
Let us first consider what sort of spacetime transformation preserves the speed
of light. If a light wave front shifts its position by a vector $\Delta\mathbf{x}$ in a time interval
$\Delta t$, then if light travels at a speed $c$ we have $|\Delta\mathbf{x}| = c\,\Delta t$, or in other words
$$0 = (\Delta\mathbf{x})^2 - c^2(\Delta t)^2\,. \tag{4.2.1}$$
So, what sort of transformation leaves invariant the quantity $(\Delta\mathbf{x})^2 - c^2(\Delta t)^2$?
Before answering this question, it may be mentioned that there is a larger
group of transformations that leave $(\Delta\mathbf{x})^2 - c^2(\Delta t)^2$ invariant only when it
vanishes. These are known as conformal transformations. One simple example is a
rescaling $\mathbf{x} \to \lambda\mathbf{x}$, $t \to \lambda t$, with $\lambda$ an arbitrary constant. Invariance of the laws
of nature under conformal transformations would be enough to keep the speed
of light the same for all observers, but it would apparently make it impossible to
deal with non-zero masses. Nevertheless, conformal symmetry has been revived
again and again up to the present as a possible property of physical law at the
most fundamental level, hidden from us through dynamical effects of one sort
or another. Here we shall content ourselves with asking about the more limited
class of transformations that leave $(\Delta\mathbf{x})^2 - c^2(\Delta t)^2$ invariant, whether or not it
vanishes.

7 H. Minkowski, lecture delivered to the Math. Ges. Göttingen, November 5, 1907, published in Ann. Phys. 47, 927 (1915).

As mentioned above, it will be very convenient to adopt the spacetime notation
due to Minkowski, with a fourth coordinate $x^0 \equiv ct$. We use letters from
the middle of the Greek alphabet to label the coordinates of events in spacetime,
as $x^\mu$, $x^\nu$, etc. Then the right-hand side of Eq. (4.2.1) may be written
$$(\Delta\mathbf{x})^2 - c^2(\Delta t)^2 = \eta_{\mu\nu}\,\Delta x^\mu \Delta x^\nu\,,$$
it being understood that repeated indices are summed over the values 1, 2, 3, 0.
Here $\eta_{\mu\nu}$ is the matrix
$$\eta_{\mu\nu} = \begin{cases} 1 & \mu = \nu = 1, 2, 3 \\ -1 & \mu = \nu = 0 \\ 0 & \mu \neq \nu\,. \end{cases} \tag{4.2.2}$$
In this notation, the condition we impose on coordinate transformations
$x^\mu \to x'^\mu$ may be written
$$\eta_{\mu\nu}\,\Delta x'^\mu \Delta x'^\nu = \eta_{\mu\nu}\,\Delta x^\mu \Delta x^\nu\,. \tag{4.2.3}$$
It can be shown8 that the most general transformation of the spacetime coordinates
that satisfies this condition is linear:
$$\Delta x'^\mu = \Lambda^\mu{}_\rho\,\Delta x^\rho\,, \tag{4.2.4}$$
with $\Lambda^\mu{}_\rho$ some set of constants. (We are excluding translations here, under
which $x'^\mu$ would change by a constant term $a^\mu$, because $\Delta x^\mu$ is a difference
of spacetime coordinates and hence unaffected by translations.) Recall that the
repetition here of the index $\rho$ indicates that this index is to be summed over the
values 1, 2, 3, 0. Condition (4.2.3) now reads
$$\eta_{\mu\nu}\,\Lambda^\mu{}_\rho \Lambda^\nu{}_\sigma\,\Delta x^\rho \Delta x^\sigma = \eta_{\rho\sigma}\,\Delta x^\rho \Delta x^\sigma\,.$$
In order for this to be valid for any $\Delta x^\rho$, the coefficients of $\Delta x^\rho \Delta x^\sigma$ on both
sides must be equal:
$$\eta_{\mu\nu}\,\Lambda^\mu{}_\rho \Lambda^\nu{}_\sigma = \eta_{\rho\sigma}\,, \tag{4.2.5}$$
for all values of the spacetime coordinate indices $\rho$ and $\sigma$. Transformations
(4.2.4) with $\Lambda^\mu{}_\nu$ satisfying (4.2.5) are known as Lorentz transformations.
8 For a proof, see S. Weinberg, Gravitation and Cosmology (Wiley, New York, 1972), Section 2.1.
It will be instructive to consider the special class of coordinate transformations
that act only on the $\mu = 3$ and $\mu = 0$ components of $x^\mu$, which as we
shall see is the case for transformations to a frame of reference moving along
the 3-axis. For any such linear transformation, the only non-zero components of
$\Lambda^\mu{}_\nu$ are
$$\Lambda^1{}_1 = \Lambda^2{}_2 = 1\,, \quad \Lambda^3{}_3 = A\,, \quad \Lambda^0{}_0 = B\,, \quad \Lambda^3{}_0 = C\,, \quad \Lambda^0{}_3 = D\,,$$
with real constants $A$, $B$, $C$, and $D$ that are constrained by the condition that this
is a Lorentz transformation. In matrix notation, with $\Lambda^\mu{}_\nu$ given by the element
of the matrix in row $\mu$ and column $\nu$:
$$\Lambda^\mu{}_\nu = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & A & C \\ 0 & 0 & D & B \end{pmatrix},$$
the rows and columns being labeled in the order 1, 2, 3, 0.
Inserting the formulas for the components of $\Lambda^\mu{}_\nu$ into Eq. (4.2.5) gives
nothing new if $\rho$ or $\sigma$ equals 1 or 2, while for $\rho = \sigma = 3$, $\rho = \sigma = 0$,
and $\rho = 3$, $\sigma = 0$ (or $\rho = 0$, $\sigma = 3$), we get respectively
$$A^2 - D^2 = 1\,, \quad C^2 - B^2 = -1\,, \quad AC - DB = 0\,.$$
With three conditions on four parameters, there will be one free parameter left
when all conditions are satisfied. We will take this parameter as
$$\beta \equiv C/B = D/A\,.$$
From $A^2 - D^2 = 1$ we then have
$$A^2 = \frac{1}{1 - \beta^2}\,, \quad D^2 = \frac{\beta^2}{1 - \beta^2}\,,$$
while from $C^2 - B^2 = -1$ we have
$$B^2 = \frac{1}{1 - \beta^2}\,, \quad C^2 = \frac{\beta^2}{1 - \beta^2}\,.$$
To find the signs of $A$ and $B$ we impose an additional limitation on the transformations
we are considering, that they can be obtained by a smooth change of
parameters such as velocities and angles from a Lorentz transformation that
does nothing. In our case, of transformations only of $x^3$ and $x^0$, a Lorentz
transformation that does nothing has $A = B = 1$ and $C = D = 0$. Neither
$A$ nor $B$ can vanish for any $\beta$, so if signs do not suddenly change as we change
the parameters of the Lorentz transformation, we must have $A$ and $B$ positive
for all Lorentz transformations of this form. Since $AC = DB$, this tells us also
that $C$ and $D$ have the same sign, which by definition is the sign of $\beta$. So our
conclusion is that the non-vanishing components of $\Lambda^\mu{}_\nu$ are
$$\Lambda^3{}_3 = \Lambda^0{}_0 = \gamma\,, \quad \Lambda^3{}_0 = \Lambda^0{}_3 = \beta\gamma\,, \quad \Lambda^1{}_1 = \Lambda^2{}_2 = 1\,, \tag{4.2.6}$$
where $\gamma$ is the positive quantity
$$\gamma = +\frac{1}{\sqrt{1 - \beta^2}}\,. \tag{4.2.7}$$
That is, in matrix notation,
$$\Lambda^\mu{}_\nu = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \gamma & \beta\gamma \\ 0 & 0 & \beta\gamma & \gamma \end{pmatrix}. \tag{4.2.8}$$
The free parameter β can have any sign, but |β| < 1. We will see in Section 4.6
that not only the speed of light but also the complete set of Maxwell’s equations
are invariant under these transformations.
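Condition (4.2.5) is easy to test numerically for the matrix (4.2.8). The Python sketch below is a minimal illustration, with hypothetical function names; matrices are stored as nested lists with rows and columns in the order 1, 2, 3, 0 used in the text. It returns the largest deviation of $\eta_{\mu\nu}\Lambda^\mu{}_\rho\Lambda^\nu{}_\sigma$ from $\eta_{\rho\sigma}$, which should vanish up to rounding for a boost and not for the Galilean transformation (4.1.1) written in the same matrix form.

```python
import math

# eta_{mu nu} of Eq. (4.2.2), rows and columns in the order 1, 2, 3, 0.
ETA = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, -1]]

def boost_3(beta):
    """Matrix (4.2.8) for a boost along the 3-direction with |beta| < 1."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    return [[1, 0, 0, 0],
            [0, 1, 0, 0],
            [0, 0, gamma, beta * gamma],
            [0, 0, beta * gamma, gamma]]

def lorentz_defect(lam):
    """Largest |eta_{mu nu} Lam^mu_rho Lam^nu_sigma - eta_{rho sigma}|."""
    worst = 0.0
    for r in range(4):
        for s in range(4):
            total = sum(ETA[m][n] * lam[m][r] * lam[n][s]
                        for m in range(4) for n in range(4))
            worst = max(worst, abs(total - ETA[r][s]))
    return worst

# Galilean transformation (4.1.1) along the 3-direction, with x^0 = ct:
# x'^3 = x^3 + beta * x^0 and x'^0 = x^0.
galilean = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0.6], [0, 0, 0, 1]]

print(lorentz_defect(boost_3(0.6)))   # rounding error only
print(lorentz_defect(galilean))       # clearly non-zero
```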
I’ll pause to mention that there are other Lorentz transformations that cannot
be obtained by a gradual variation of parameters from a Lorentz transformation
that does nothing. These include the space inversion $x^3 \to -x^3$ with $x^\mu$
unchanged for $\mu \neq 3$, and the time reversal $x^0 \to -x^0$ with $x^\mu$ unchanged
for $\mu \neq 0$. (Space inversion is often described as a change of sign of all three
Cartesian coordinates, but this transformation can be produced by the reversal
of any one coordinate, followed by an ordinary rotation of 180° around that
coordinate direction.) As will be discussed in Section 6.5, experiments in the
1950s showed that invariance under space inversion is only a good approximation,
being violated by the very weak forces that lead to the decay of some
radioactive nuclei and elementary particles, and in Section 2.4 we have already
mentioned that the same is true of invariance under time reversal. We will be
concerned here only with transformations that can be obtained by a gradual
variation of parameters from a Lorentz transformation that does nothing. (These
are known as proper orthochronous Lorentz transformations – proper, meaning
that the determinant of the matrix $\Lambda^\mu{}_\nu$ is unity, and orthochronous, meaning
that $\Lambda^0{}_0 > 0$. In this book, I will refer to proper orthochronous Lorentz
transformations simply as “proper.”)
Now let us consider the physical meaning of $\beta$. Consider a tiny body at rest
in the frame of reference with coordinates $x^\mu$. At two different times, separated
by a time difference $\Delta t$, the body is at the same position, so the separation of
positions is $\Delta x^i = 0$ with $i = 1, 2, 3$. Now suppose we look at the same body in
the frame of reference with coordinates $x'^\mu$, given by the Lorentz transformation
(4.2.6). The 1- and 2-coordinates will be unaffected, but the 3-coordinates and
the times in the new frame of reference will be separated by
$$\Delta x'^3 = \Lambda^3{}_0\,\Delta x^0 = \beta\gamma\,\Delta x^0\,, \qquad \Delta t' = \Delta x'^0/c = \Lambda^0{}_0\,\Delta x^0/c = \gamma\,\Delta x^0/c\,. \tag{4.2.9}$$
So in this frame the body has velocity
$$v = \Delta x'^3/\Delta t' = c\beta\,. \tag{4.2.10}$$
Therefore cβ is the velocity in the 3-direction given to a body at rest by the
Lorentz transformation (4.2.6).
The Galilean Limit

For velocities that are much less than the speed of light, $|\beta| = |v|/c$ is much
less than one, and Eq. (4.2.7) then gives $\gamma$ very close to one. In this case, setting
$\beta = v/c$, $\gamma = 1$, and $\Delta x^0 = c\,\Delta t$, the transformation (4.2.9) becomes
$$\Delta x'^3 = v\,\Delta t\,, \qquad \Delta t' = \Delta t\,.$$
This is the same as the Galilean transformation (4.1.1) used for instance in
working out the form of the Navier–Stokes equation in Section 2.5.
Maximum Speed
From Eqs. (4.2.7) and (4.2.10), we see that it is not possible for a finite Lorentz
transformation to take a body from rest to a velocity greater than or even as
large as c. (In Section 4.7 we will see that causality, the principle that effects
cannot precede causes, rules out any signal traveling faster than light.) This
may be surprising, because we can perform a pair of Lorentz transformations,
each of which gives a body at rest a velocity in the 3-direction greater than c/2,
which if these were Galilean transformations of the form (4.1.1) when combined
would give a Galilean transformation from rest to a velocity greater than c. But
velocities add differently in Einsteinian relativity.
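Suppose we perform a Lorentz transformation $x^\mu \to x'^\mu = \Lambda_1{}^\mu{}_\nu\, x^\nu$ that
gives a particle initially at rest a velocity $c\beta_1$ in the 3-direction and then perform
a Lorentz transformation $x'^\mu \to x''^\mu = \Lambda_2{}^\mu{}_\nu\, x'^\nu$ that gives the particle that was
initially at rest a velocity $c\beta_2$ in the same direction. The combined effect is a
linear transformation
$$x^\mu \to x''^\mu = \Lambda_2{}^\mu{}_\rho\, \Lambda_1{}^\rho{}_\nu\, x^\nu = \Lambda_{21}{}^\mu{}_\nu\, x^\nu\,,$$
where
$$\Lambda_{21}{}^\mu{}_\nu \equiv \Lambda_2{}^\mu{}_\rho\, \Lambda_1{}^\rho{}_\nu\,.$$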
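In matrix notation, this means that
$$\Lambda_{21}{}^\mu{}_\nu = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \gamma_2 & \beta_2\gamma_2 \\ 0 & 0 & \beta_2\gamma_2 & \gamma_2 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \gamma_1 & \beta_1\gamma_1 \\ 0 & 0 & \beta_1\gamma_1 & \gamma_1 \end{pmatrix}$$
where, according to the general rules of matrix multiplication, the element in
row $\mu$ and column $\nu$ of the product is the sum over $\rho$ of the products of the
terms in row $\mu$ and column $\rho$ of the first matrix times the terms in row $\rho$ and
column $\nu$ in the second matrix. It is straightforward to calculate that
$$\Lambda_{21}{}^\mu{}_\nu = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \gamma_{21} & \beta_{21}\gamma_{21} \\ 0 & 0 & \beta_{21}\gamma_{21} & \gamma_{21} \end{pmatrix}$$
where
$$\gamma_{21} = \gamma_1\gamma_2(1 + \beta_1\beta_2)\,, \quad \beta_{21}\gamma_{21} = \gamma_1\gamma_2(\beta_1 + \beta_2)\,,$$
and therefore
$$\beta_{21} = \frac{\beta_1 + \beta_2}{1 + \beta_1\beta_2}\,, \quad \gamma_{21} = \frac{1}{\sqrt{1 - \beta_{21}^2}}\,.$$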
Thus the relativistic rule for combining velocities is that a Lorentz transformation
with velocity $v_1 = c\beta_1$ followed by a Lorentz transformation with velocity
$v_2 = c\beta_2$ in the same direction gives a Lorentz transformation with velocity
$$v_{21} = c\beta_{21} = \frac{v_1 + v_2}{1 + v_1 v_2/c^2}\,. \tag{4.2.11}$$
Even if $v_1$ and $v_2$ both approach $c$, the combined velocity $v_{21}$ approaches $c$,
not $2c$.
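The composition rule (4.2.11) can be confirmed by multiplying two boost matrices directly. In the Python sketch below (hypothetical function names, matrices ordered 1, 2, 3, 0 as in the text, and sample values $\beta_1 = 0.6$, $\beta_2 = 0.7$ chosen only for illustration), the combined $\beta_{21}$ is read off as the ratio of the 3-0 element to the 0-0 element of the product and compared with the formula.

```python
import math

def boost_3(beta):
    """Boost along the 3-direction; rows and columns ordered 1, 2, 3, 0."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    return [[1, 0, 0, 0],
            [0, 1, 0, 0],
            [0, 0, gamma, beta * gamma],
            [0, 0, beta * gamma, gamma]]

def matmul(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def combine_velocities(v1, v2, c):
    """Eq. (4.2.11) for two velocities in the same direction."""
    return (v1 + v2) / (1 + v1 * v2 / c**2)

beta1, beta2 = 0.6, 0.7
product = matmul(boost_3(beta2), boost_3(beta1))   # first beta1, then beta2

# With the ordering 1, 2, 3, 0, index 2 is the 3-direction and index 3 is time.
beta21_from_matrix = product[2][3] / product[3][3]
beta21_from_formula = combine_velocities(beta1, beta2, 1.0)
print(beta21_from_matrix, beta21_from_formula)
```

Galilean addition would give 1.3 for these inputs, whereas both results here stay below one.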
General Directions
Of course there is nothing special about the 3-direction. Whatever velocity
vector $\mathbf{v}$ is given to a body at rest by a given Lorentz transformation, we can
always rotate our coordinate axes so that the 3-direction is in the direction of $\mathbf{v}$.
The Lorentz transformation consequently will have the form (4.2.6), but with
$\beta = |\mathbf{v}|/c$. If we rotate our coordinate axes back to their original direction,
we find
$$\Lambda^i{}_j = \delta_{ij} + (\gamma - 1)\hat{v}_i \hat{v}_j\,, \qquad \Lambda^i{}_0 = \Lambda^0{}_i = \gamma v_i/c\,, \qquad \Lambda^0{}_0 = \gamma\,, \tag{4.2.12}$$
where $i$ and $j$ run over the spatial coordinate indices 1, 2, 3; $\delta_{ij}$ is the unit
matrix,
$$\delta_{ij} = \begin{cases} 1 & i = j \\ 0 & i \neq j\,, \end{cases}$$
and
$$\gamma = 1/\sqrt{1 - \mathbf{v}^2/c^2}\,. \tag{4.2.13}$$
Here $\hat{v}$ is the unit vector $\mathbf{v}/|\mathbf{v}|$. (To check, note that for $\mathbf{v}$ in the 3-direction,
Eq. (4.2.12) gives $\Lambda^1{}_1 = \Lambda^2{}_2 = 1$, $\Lambda^3{}_3 = 1 + (\gamma - 1) = \gamma$, and so on for
the other components.) Performing a Lorentz transformation with velocity $\mathbf{v}_1$
followed by a Lorentz transformation with velocity $\mathbf{v}_2$ in general does not give
a Lorentz transformation of the form (4.2.12), unless $\mathbf{v}_1$ and $\mathbf{v}_2$ happen to be
in the same direction. In general we get a rotation, followed by a Lorentz
transformation of the form (4.2.12). This is not a contradiction, because
rotations satisfy the condition (4.2.5) and therefore can be considered as
belonging to a subgroup of the group of Lorentz transformations. Lorentz
transformations of the special form (4.2.12) are often distinguished from more
general Lorentz transformations by calling them boosts.
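A quick numerical experiment illustrates why two boosts in different directions do not combine into a boost. Every matrix of the form (4.2.12) is symmetric, so an asymmetric product cannot be of that form, even though it still satisfies condition (4.2.5). The Python sketch below uses hypothetical function names, units with $c = 1$, and two sample boosts of speed $0.6c$ along the 3- and 1-directions; matrices are ordered 1, 2, 3, 0.

```python
import math

ETA = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, -1]]  # order 1, 2, 3, 0

def boost(v, c=1.0):
    """Matrix of Eq. (4.2.12) for a non-zero velocity vector v."""
    speed_sq = sum(vi * vi for vi in v)
    gamma = 1.0 / math.sqrt(1.0 - speed_sq / c**2)
    lam = [[0.0] * 4 for _ in range(4)]
    for i in range(3):
        for j in range(3):
            delta = 1.0 if i == j else 0.0
            # v[i] * v[j] / speed_sq is the product of unit-vector components.
            lam[i][j] = delta + (gamma - 1.0) * v[i] * v[j] / speed_sq
        lam[i][3] = lam[3][i] = gamma * v[i] / c
    lam[3][3] = gamma
    return lam

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def lorentz_defect(lam):
    """Largest violation of condition (4.2.5)."""
    return max(abs(sum(ETA[m][n] * lam[m][r] * lam[n][s]
                       for m in range(4) for n in range(4)) - ETA[r][s])
               for r in range(4) for s in range(4))

def asymmetry(lam):
    """Largest |lam[i][j] - lam[j][i]|; zero for a pure boost."""
    return max(abs(lam[i][j] - lam[j][i]) for i in range(4) for j in range(4))

first = boost([0.0, 0.0, 0.6])     # boost along the 3-direction
second = boost([0.6, 0.0, 0.0])    # boost along the 1-direction
combined = matmul(second, first)

print(lorentz_defect(combined))    # still a Lorentz transformation
print(asymmetry(combined))         # but not of the symmetric form (4.2.12)
```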
Special and General Relativity

A decade after presenting the special theory of relativity, Einstein gave us the
general theory of relativity.9 As its name implies, this theory is based on a
more general principle of invariance than for special relativity: the laws of
nature preserve their form under any possible change of spacetime coordinates,
not just under Lorentz transformations.
But it should not be thought that special relativity is in any way superseded by
general relativity. In general relativity it is still true that in certain inertial frames
of reference, in free fall around any local matter and otherwise more or less at
rest or in a state of uniform motion with respect to the average matter of the universe,
the laws of physics are those of special relativity. For example, in inertial
frames the separation $\Delta x^\mu$ of spacetime coordinates along a wave front of light
satisfies $\eta_{\mu\nu}\Delta x^\mu \Delta x^\nu = 0$; we will see in the next section that the separation
$\Delta x^\mu$ of the spacetime coordinates of two ticks of a moving clock whose ticks are
$T$ seconds apart at rest satisfies $\eta_{\mu\nu}\Delta x^\mu \Delta x^\nu = -T^2 c^2$; and so on. If we make
a coordinate transformation other than a Lorentz transformation, for instance to
a frame of reference that accelerates or spins relative to the inertial frames,
then the laws take a more general form, in which $\eta_{\mu\nu}$ is replaced with a field
$g_{\mu\nu}(x)$. This field describes gravitation and satisfies differential equations that
generalize and correct Newton’s formula for gravitational attraction. In contrast,
there is no field or any other physical quantity in special relativity that keeps
track of the velocity of the coordinate system. So invariance plays a different
role in general and special relativity. General relativity is a theory of the grav-
itational field, a quantity that keeps track of departures from inertial frames.
Special relativity is a theory of invariance under Lorentz transformations from
one inertial frame to another.

4.3 Clocks, Rulers, Light Waves
As a first application of Einsteinian relativity we now apply the assumption
that the laws that govern the rigidity of rulers and the operation of clocks are
invariant under the Lorentz transformations described in the previous section,
and we use this assumption to calculate the effects of motion on observed
distances and times. We shall also anticipate the demonstration in Section 4.6
that Maxwell’s equations are Lorentz invariant, using this invariance to work
out the effect of motion on the frequency and wave vectors of electromagnetic
waves.
Clocks
Consider two ticks separated by a time interval $T$ of a small clock at rest in
the frame of reference with coordinates $x^\mu$. The spacetime coordinates of these
ticks are separated by $\Delta x^i = 0$, $\Delta x^0 = cT$, as usual with $i = 1, 2, 3$. Now
perform a Lorentz transformation (4.2.6) that gives the clock a velocity $v = \beta c$
in the 3-direction. The 1- and 2-coordinates of the clock will be unaffected,
while in the new reference frame the 3- and 0-coordinates of the clock at these
two ticks will be separated by
$$\Delta x'^3 = \Lambda^3{}_0\,\Delta x^0 = \gamma\beta cT = \gamma vT\,, \tag{4.3.1}$$
$$\Delta x'^0 = \Lambda^0{}_0\,\Delta x^0 = \gamma cT\,, \tag{4.3.2}$$
where as before $\gamma \equiv (1 - v^2/c^2)^{-1/2}$. From Eq. (4.3.2) we see that the time
interval between ticks of the moving clock is lengthened to $T' = \Delta x'^0/c = \gamma T$.
This is what is seen by the observer who sees the clock moving with velocity $v$;
an observer who travels with the clock sees its ticks separated by $T$, just as if it
were at rest.
There is another way of getting this result without ever looking at a specific
Lorentz transformation. If the time interval between ticks of a clock at rest is $T$,
then the spacetime separation between ticks at rest has components $\Delta x^i = 0$,
$\Delta x^0 = cT$, which satisfy $\eta_{\mu\nu}\Delta x^\mu \Delta x^\nu = -c^2 T^2$, where $\eta_{\mu\nu}$ is again the diagonal
matrix (4.2.2) with elements 1, 1, 1, −1 on the diagonal, and the summation
convention is again in force. If an observer sees the clock moving with velocity
$\mathbf{v}$ in any direction and measures a time $T'$ between ticks, then in the coordinates
$x'^\mu$ used by this observer the spacetime separation between ticks has components
$\Delta\mathbf{x}' = \mathbf{v}T'$, $\Delta x'^0 = cT'$, which satisfy $\eta_{\mu\nu}\Delta x'^\mu \Delta x'^\nu = (\mathbf{v}^2 - c^2)T'^2$.
But, as discussed in the previous section, Lorentz transformations are designed
to keep this quantity invariant, in the sense that $\eta_{\mu\nu}\Delta x'^\mu \Delta x'^\nu = \eta_{\mu\nu}\Delta x^\mu \Delta x^\nu$.
Therefore $(\mathbf{v}^2 - c^2)T'^2 = -c^2 T^2$, so as before $T' = T/\sqrt{1 - \mathbf{v}^2/c^2}$.
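The invariant-interval argument can be checked with a few lines of Python. This is only a sketch: the function name, the choice of units with $c = 1$, and the sample velocity are assumptions made for illustration. It computes $\eta_{\mu\nu}\Delta x^\mu \Delta x^\nu$ for the two ticks in the clock’s rest frame and again in a frame where the clock moves with velocity $\mathbf{v}$ and the ticks are separated by the dilated time.

```python
import math

def interval(dx, dx0):
    """eta_{mu nu} dx^mu dx^nu with eta = diag(1, 1, 1, -1); dx0 = c * dt."""
    return sum(d * d for d in dx) - dx0 * dx0

c = 1.0                      # units with c = 1 (a convenience, not from the text)
T = 1.0                      # tick interval of the clock at rest
v = (0.3, 0.4, 0.0)          # hypothetical velocity, |v| = 0.5 c
speed = math.sqrt(sum(vi * vi for vi in v))
T_moving = T / math.sqrt(1.0 - speed**2 / c**2)   # dilated tick interval

rest_interval = interval((0.0, 0.0, 0.0), c * T)
moving_interval = interval(tuple(vi * T_moving for vi in v), c * T_moving)
print(rest_interval, moving_interval)
```

The two intervals agree (both equal $-c^2T^2$) only because the dilated time is used in the moving frame; inserting the undilated $T$ there would break the equality.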
This lengthening of course applies to any kind of time interval, not just
ticks of a clock. It is vividly displayed in the decay of unstable particles
in cosmic rays. The collision of atomic nuclei in primary cosmic rays with
atoms in the upper atmosphere produces particles known as muons, resembling
electrons but about 210 times heavier. At rest, muons are observed to
decay with a mean lifetime of 2.2 microseconds, but although they are typically
produced at an altitude of about 15 km, a good fraction of these muons reach
the ground before decaying, so even traveling near the speed of light they
must have survived for a time (as measured on the Earth’s surface) at least
(15 km)/(300 000 km/sec) = 50 microseconds, and more if they reach the ground
at a slant. If there were no relativistic time dilation, then the probability
of a particle with a mean lifetime of 2.2 microseconds surviving as long as
50 microseconds would be $\exp(-50/2.2) \approx 1.3 \times 10^{-10}$. Evidently the life of
these muons is extended by their motion by a factor $\gamma$ at least of order 10, which
requires their velocity to be within a fraction of a percent of the speed of light.
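The muon estimate is simple enough to reproduce directly. The Python sketch below (hypothetical function name; the value $\gamma = 10$ is just the order-of-magnitude figure quoted in the text) compares the survival probability over the 50 microsecond flight with and without time dilation, and finds the speed that corresponds to $\gamma = 10$.

```python
import math

def survival_probability(flight_time, mean_lifetime, gamma=1.0):
    """Probability of surviving flight_time when the lifetime is dilated by gamma."""
    return math.exp(-flight_time / (gamma * mean_lifetime))

# 15 km at essentially the speed of light, converted to microseconds.
flight_time_us = 15.0 / 300000.0 * 1.0e6

p_no_dilation = survival_probability(flight_time_us, 2.2)
p_gamma_10 = survival_probability(flight_time_us, 2.2, gamma=10.0)

# Speed, as a fraction of c, at which gamma = 10.
beta_for_gamma_10 = math.sqrt(1.0 - 1.0 / 10.0**2)

print(p_no_dilation, p_gamma_10, beta_for_gamma_10)
```

Without dilation the probability is of order $10^{-10}$, while with $\gamma = 10$ roughly a tenth of the muons survive, at a speed about half a percent below $c$.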
Rulers
Next consider a ruler of length $L$ at rest, lying along the 3-direction in a frame
of reference with coordinates $x^\mu$. At any fixed time its ends are separated in this
frame by $\Delta x^3 = L$, $\Delta x^0 = 0$, and $\Delta x^1 = \Delta x^2 = 0$. Now perform a Lorentz
transformation (4.2.6) that gives the ruler a velocity $v$ (positive or negative) in
the 3-direction. The spacetime coordinates $x'^\mu$ in the new reference frame will
be separated by
$$\Delta x'^3 = \Lambda^3{}_3 L = \gamma L , \qquad \Delta x'^1 = \Delta x'^2 = 0 \tag{4.3.3}$$
$$\Delta x'^0 = \Lambda^0{}_3 L = \gamma v L/c . \tag{4.3.4}$$
But Eq. (4.3.4) shows that in this frame the two ends of the ruler have been
traveling for times that differ by an amount $\Delta t' = \gamma vL/c^2$, so to find the
difference in the space coordinates at the same time $t'$, we have to subtract $v\Delta t'$
from $\Delta x'^3$. The spatial separation of the ends of the ruler at the same time $t'$ is
then
$$\Delta x'^3 - v\Delta t' = \gamma L - \gamma L v^2/c^2 = L/\gamma . \tag{4.3.5}$$
This contraction of lengths in the direction of motion is similar to what Fitzger-
ald and Lorentz had proposed as the cause of the failure to measure the velocity
of the Earth through the aether.
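As a numerical illustration of Eqs. (4.3.3)–(4.3.5), the following Python sketch carries out the same two steps: boost the separation between the ends of the ruler, then correct for the time offset. The function name is hypothetical, and units with c = 1 are assumed, so `beta` stands for v/c.

```python
import math


def contracted_length(rest_length: float, beta: float) -> float:
    """Same-time separation of the ruler's ends in the frame where it moves
    with velocity beta (units with c = 1)."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    dx3 = gamma * rest_length          # Eq. (4.3.3): Delta x'^3 = gamma L
    dt = gamma * beta * rest_length    # Eq. (4.3.4): Delta t' = gamma v L / c^2
    return dx3 - beta * dt             # Eq. (4.3.5): subtract v Delta t'


for beta in (0.1, 0.6, -0.6, 0.99):
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    print(beta, contracted_length(1.0, beta), 1.0 / gamma)
```

The two printed columns agree for either sign of the velocity, as expected from the fact that the result depends only on γ.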
Light Waves
We saw in Eq. (3.1.3) that each component of the electromagnetic fields in a
light wave in empty space can be written as a sum of terms proportional to
$e^{\pm i\phi}$, where $\phi$ is the phase:
$$\phi = \mathbf{k}\cdot\mathbf{x} - \omega t . \tag{4.3.6}$$
We could always add a spacetime-independent term to $\phi$ by adjusting the phase
of the coefficients $\mathbf{e}$ and $\mathbf{b}$ of $e^{i\phi}$ in the fields (3.1.3), so it is only the difference
$\Delta\phi$ in $\phi$ between spacetime points that has physical significance. We expect
such phase differences to be Lorentz invariant, because, as we shall see in
Section 4.6, Lorentz transformations subject electromagnetic fields to real linear
transformations. So we need to give $\mathbf{k}$ and $\omega$ Lorentz transformation properties
that ensure the Lorentz invariance of the phase differences:
$$\Delta\phi = \mathbf{k}\cdot\Delta\mathbf{x} - \omega\,\Delta t . \tag{4.3.7}$$
To see how to manage this, we once again introduce a four-dimensional
notation, taking $k^0 = \omega/c$, so that Eq. (4.3.7) reads
$$\Delta\phi = \mathbf{k}\cdot\Delta\mathbf{x} - k^0\Delta x^0 = \eta_{\mu\nu}k^\mu\Delta x^\nu , \tag{4.3.8}$$
where $\eta_{\mu\nu}$ is again the diagonal matrix (4.2.2) with elements 1, 1, 1, −1
on the diagonal, and the summation convention is again in force. It is
obvious then that $\Delta\phi$ will be Lorentz invariant if we ascribe to $k^\mu$ the same
Lorentz transformation as $x^\mu$ – that is, if under a Lorentz transformation
$x^\mu \to \Lambda^\mu{}_\nu x^\nu$ we have
$$k^\mu \to \Lambda^\mu{}_\nu k^\nu , \tag{4.3.9}$$
where again $\Lambda^\mu{}_\nu$ satisfies the condition (4.2.5) for a Lorentz transformation,
$\eta_{\mu\nu}\Lambda^\mu{}_\rho\Lambda^\nu{}_\sigma = \eta_{\rho\sigma}$. Note that the transformation (4.3.9) also preserves the
condition
$$0 = c^2|\mathbf{k}|^2 - \omega^2 = c^2\eta_{\mu\nu}k^\mu k^\nu , \tag{4.3.10}$$
which says that the wave $\propto e^{i\phi}$ travels at the speed of light.
For example, consider a light wave traveling in the +3-direction, which has
wave vector with $k^1 = k^2 = 0$ and $k^3 = \omega/c$ and frequency $\nu = \omega/2\pi$
in a reference frame with spacetime coordinates $x^\mu$. Suppose we perform a
Lorentz transformation $x^\mu \to x'^\mu = \Lambda^\mu{}_\nu x^\nu$ that gives bodies at rest in the
first reference frame a velocity $v$ (positive or negative) in the 3-direction in the
new reference frame. With $\Lambda^\mu{}_\nu$ given by Eq. (4.2.6), the frequency in the new
reference frame is given by
$$\nu' \equiv \omega'/2\pi = ck'^0/2\pi = c(\Lambda^0{}_3 k^3 + \Lambda^0{}_0 k^0)/2\pi = (\Lambda^0{}_3 + \Lambda^0{}_0)\,\nu = \gamma(1 + v/c)\,\nu . \tag{4.3.11}$$
Using $\gamma^2 = 1/(1 - v^2/c^2)$, we can rewrite this in a more revealing form:
$$\nu' = \gamma^{-1}(1 - v/c)^{-1}\,\nu . \tag{4.3.12}$$
The factor $1/\gamma$ is the relativistic time dilation discussed above for moving
clocks: if time intervals are lengthened by a factor $\gamma$, then frequencies are
decreased by a factor $1/\gamma$.
The factor $(1 - v/c)^{-1}$ is the usual Doppler shift, which applies in both
non-relativistic and relativistic contexts, and indeed was first observed in sound
waves. If the source of the light wave is at rest in the reference frame with
coordinates $x^\mu$, then the Lorentz transformation $\Lambda^\mu{}_\nu$ gives the source a velocity
$v$ in the 3-direction, which for $v$ positive is along the direction of the light wave
and hence toward whoever is observing the wave. If the time interval between
wave crests emitted by the source at rest is $1/\nu$, then, apart from relativistic
effects, the observer will see these crests arrive at a time interval less by a factor
$1 - v/c$, since the distance that each crest has to travel is less than that for the
previous crest by a factor $1 - v/c$, and hence the observed frequency, the rate
at which wave crests arrive at the observer, is increased (apart from relativistic
time dilation) by a factor $1/(1 - v/c)$. For negative $v$ the source is moving away
from the observer, and the factor $(1 - v/c)^{-1}$ gives a decrease in frequency, as
seen in the redshift of light from receding galaxies at great distances.
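The equivalence of the two forms (4.3.11) and (4.3.12) is easy to confirm numerically. The sketch below is only an illustration; the function names are hypothetical, and `beta` stands for v/c, positive for a source approaching the observer.

```python
import math


def doppler_ratio_boost(beta: float) -> float:
    """nu'/nu from Eq. (4.3.11): gamma * (1 + v/c)."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    return gamma * (1.0 + beta)


def doppler_ratio_factored(beta: float) -> float:
    """nu'/nu from Eq. (4.3.12): time dilation 1/gamma times the
    non-relativistic Doppler factor 1/(1 - v/c)."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    return 1.0 / (gamma * (1.0 - beta))


for beta in (-0.9, -0.6, 0.0, 0.6, 0.9):
    print(beta, doppler_ratio_boost(beta), doppler_ratio_factored(beta))
```

For a receding source (negative `beta`) both expressions give a ratio below 1, a redshift; for an approaching source both give a ratio above 1.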
4.4 Mass, Energy, Momentum, Force
Einstein published two papers on relativity theory in 1905. Shortly after the first
paper, which is cited in Section 4.2, he published in the same journal another
paper¹⁰ with the title “Does the inertia of a body depend on its energy content?”
This is often referred to as “the $E = mc^2$ paper,” but as can be gathered from
the title, it would be better called “the $m = E/c^2$ paper.” In this paper, Einstein
showed that the mass of a body decreases by an amount $E/c^2$ when the body
emits radiation with energy $E$. Here “mass” was defined as inertial mass, by the
prescription that, as in Newtonian mechanics, the kinetic energy of a particle of
mass $m$ with velocity $v \ll c$ is $mv^2/2$.
Einstein’s Thought Experiment

Here is the proof of Einstein’s result. Consider a particle such as an atomic
nucleus, at rest in a reference frame with coordinates $x^\mu$, in an excited state $A$.
Suppose that it decays into a state $B$ of lower energy, emitting two “back-to-back”
photons of equal energy traveling in opposite directions along the 3-axis.
The symmetry of the problem rules out any recoil of the particle in its final
state $B$, so there is no kinetic energy in the initial or final states and hence
each photon must carry energy $(E_A - E_B)/2$ and therefore have frequency
$\nu = (E_A - E_B)/2h$.
Now consider the same process as observed in a reference frame with coordinates
$x'^\mu = \Lambda^\mu{}_\nu x^\nu$, with $\Lambda^\mu{}_\nu$ the Lorentz transformation (4.2.6). In this frame
the decaying particle is traveling with velocity $v$ in the +3-direction both before
and after the decay. Suppose that $v \ll c$. Before it decays, the total energy of
the particle is its internal energy $E_A$ plus its kinetic energy:
$$E_{\rm before} = E_A + \frac{1}{2}m_A v^2 .$$
According to Eq. (4.3.11), in this reference frame the frequencies of the photons
that travel in the +3- and −3-directions are respectively
$$\nu_\pm = \frac{(1 \pm v/c)\,\nu}{\sqrt{1 - v^2/c^2}} = \frac{(1 \pm v/c)}{\sqrt{1 - v^2/c^2}}\,(E_A - E_B)/2h$$
so the total energy of the final state is
$$E_{\rm after} = E_B + \frac{1}{2}m_B v^2 + h\nu_+ + h\nu_- = E_B + \frac{1}{2}m_B v^2 + (E_A - E_B)\Big/\sqrt{1 - v^2/c^2} ,$$
or, since we are assuming that $v \ll c$,
$$E_{\rm after} = E_B + \frac{1}{2}m_B v^2 + (E_A - E_B)(1 + v^2/2c^2) .$$
The conservation of energy requires that $0 = E_{\rm before} - E_{\rm after}$, so
$$0 = E_A - E_B + \frac{1}{2}(m_A - m_B)v^2 - (E_A - E_B)(1 + v^2/2c^2) .$$
In order for this to be possible with velocity-independent internal energies and
masses, we must have
$$m_A - m_B = (E_A - E_B)/c^2 \tag{4.4.1}$$
as was to be proved.
Despite our use of the approximation $v \ll c$, Eq. (4.4.1) is not an approximate
result. No one can stop us from making a Lorentz transformation with an
arbitrarily small velocity, so we can reduce any error we have made along the
way in deriving Eq. (4.4.1) to be as small as we like, simply by making $v/c$
sufficiently small.
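The limiting argument can be illustrated numerically. In the Python sketch below, which is a hypothetical check rather than part of the derivation, the exact photon energy $(E_A - E_B)\gamma$ is kept while the kinetic energies are still treated as Newtonian, and the energy balance is solved for the mass difference at a finite velocity. Units with c = 1 are assumed, and the function name is invented for illustration.

```python
import math


def implied_mass_difference(delta_e: float, beta: float) -> float:
    """Value of m_A - m_B (units with c = 1) that balances
    0 = delta_e + (1/2)(m_A - m_B) beta^2 - delta_e * gamma,
    where delta_e = E_A - E_B and beta = v/c."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    return 2.0 * delta_e * (gamma - 1.0) / (beta * beta)


# The discrepancy from delta_e / c^2 shrinks as the velocity is reduced.
for beta in (0.5, 0.1, 0.01, 0.001):
    print(beta, implied_mass_difference(1.0, beta))
```

As `beta` decreases the printed value approaches `delta_e`, that is, $(E_A - E_B)/c^2$, in line with the statement that the error can be made as small as we like.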
Equation (4.4.1) is not yet the famous $E = mc^2$. As long as we are dealing
only with a single body changing its state, as in the above Einstein thought
experiment, it is only changes in its energy that matter for the conservation of
energy, not the energy itself, and we might as well define the energy of any one
state, say the lowest state, as $mc^2$. But $E = mc^2$ goes beyond Einstein’s result
(4.4.1) when we consider a reaction involving a number of bodies, coming into
and going out of existence, and exchanging energy with each other.
General Formulas for Energy and Momentum

The question of the energy of a massive particle at rest is part of a larger
question: what are the energy and momentum of a particle moving with arbitrary
velocity? It is largely up to us what we want to call energy and momentum.
Historically, as we saw in Section 2.1, physicists gave these names to certain
quantities that they had found to be conserved. A three-vector at first called
“quantity of motion” by Newton was found to be conserved as a consequence
of the equality of action and reaction, and later became known as momentum.
A rotationally invariant quantity at first called vis viva was found by Huygens
to be conserved when bodies come into contact, and was later called kinetic
energy. The concept of energy then had to be broadened to preserve the con-
servation of energy in more general processes, as for instance by including
potential energy. It is the conservation of energy and momentum that makes
these concepts useful, whether we want to calculate how much fuel to use to
boil a given mass of water or how fast an alpha particle must be traveling to
give a certain velocity to a gold nucleus that it strikes.
We are not in a position in this chapter to prove the conservation of whatever
we call energy and momentum. As we shall see in Section 5.7 of the chapter on
quantum mechanics, these conservation laws follow from the invariance of the
laws of nature under translations in time and space. But we can here learn a lot
from the requirement that the conservation of the total energy and momentum
of a number of colliding particles must be Lorentz invariant.
Hence, in order to express the momentum and energy of a body as functions
of its velocity, we impose two conditions on these functions:
• The conservation of energy and momentum is Lorentz invariant. That is, if
one observer sees these quantities conserved, then so must any other observer
related to the first by a Lorentz transformation.
• For velocities much less than c, the momentum and (up to a constant term)
the energy must be given by the same formulas as in Newtonian mechanics.
To accomplish this, we shall assume that the momentum $\mathbf{p}$ and energy $E$ of
a particle can be assembled into a four-component quantity $p^\mu$ with $p^0$
proportional to the energy $E$, which transforms just like the components of $x^\mu$.
That is, in changing our spacetime coordinates from $x^\mu$ to $x'^\mu = \Lambda^\mu{}_\nu x^\nu$, the
energy–momentum four-vector $p^\mu$ of any particle is changed to
$$p'^\mu = \Lambda^\mu{}_\nu p^\nu . \tag{4.4.2}$$
If the observer who uses the coordinates $x^\mu$ sees that in a collision the momentum
four-vectors $p_n^\mu$ of the various colliding particles satisfy the condition of
total energy and momentum conservation,
$$\sum_{n,\,{\rm before}} p_n^\mu - \sum_{n,\,{\rm after}} p_n^\mu = 0 ,$$
then an observer who uses coordinates $x'^\mu = \Lambda^\mu{}_\nu x^\nu$ will see that
$$\sum_{n,\,{\rm before}} p_n'^\mu - \sum_{n,\,{\rm after}} p_n'^\mu = \Lambda^\mu{}_\nu\left[\sum_{n,\,{\rm before}} p_n^\nu - \sum_{n,\,{\rm after}} p_n^\nu\right] = 0$$
and so will see energy and momentum again conserved.

The transformation property (4.4.2) allows us to calculate the energy and
momentum of a particle with an arbitrary velocity if we know its energy
and momentum when it is at rest. At rest the spatial components $p^i$ must
vanish (which way would this vector point?) and we can take $p^0$ to be some
number that we shall temporarily call $N$, characterizing the type of particle. It
follows that the momentum four-vector of a particle with velocity $\mathbf{v}$ is given by
$$p^\mu(\mathbf{v}) = \Lambda^\mu{}_0(\mathbf{v})\,N$$
where $\Lambda(\mathbf{v})$ is the Lorentz transformation (the “boost”) that takes the particle
at rest to velocity $\mathbf{v}$. In particular, for $\mathbf{v}$ in the 3-direction, $\Lambda(\mathbf{v})$ is the Lorentz
transformation (4.2.6), so $\mathbf{p}$ is in the 3-direction, with value
$$p^3(v) = \Lambda^3{}_0(v)\,N = \gamma(v/c)N , \tag{4.4.3}$$
and
$$p^0(v) = \Lambda^0{}_0(v)\,N = \gamma N \tag{4.4.4}$$
where again $\gamma \equiv 1/\sqrt{1 - v^2/c^2}$.
To implement the second condition above, we next consider the limit $v \ll c$.
Here Eq. (4.4.3) gives
$$p^3(v) = N[v/c + O(v^3/c^3)] .$$
In order for this to give the Newtonian result $p^3(v) = mv$ for $v \ll c$, we must
take $N = mc$, so that
$$\mathbf{p}(\mathbf{v}) = m\gamma\mathbf{v} = m\mathbf{v}[1 + \mathbf{v}^2/2c^2 + \cdots] . \tag{4.4.5}$$
Also, for $v \ll c$, Eq. (4.4.4) now gives
$$p^0 = mc[1 + v^2/2c^2 + O(v^4/c^4)] .$$
In order for this to give the Newtonian result $mv^2/2$ for the kinetic energy, we
must choose the constant of proportionality between $p^0$ and $E$ so that $E = cp^0$,
and hence
$$E(\mathbf{v}) = mc^2\gamma = mc^2 + m\mathbf{v}^2/2 + 3m\mathbf{v}^4/8c^2 + \cdots . \tag{4.4.6}$$
Note that we cannot leave out the term $mc^2$ in the energy (4.4.6), or change it
to any other constant term. If we did, then $p^\mu$ would not satisfy the condition
(4.4.2) for a four-vector, and the conservation of energy and momentum would
not be Lorentz invariant.
We can eliminate the velocity $\mathbf{v}$ from Eqs. (4.4.5) and (4.4.6) to derive a
relation between energy and momentum. Since $\gamma^2(1 - \mathbf{v}^2/c^2) = 1$, we have
$E^2 - \mathbf{p}^2c^2 = m^2c^4$, or in other words,
$$E = \sqrt{\mathbf{p}^2c^2 + m^2c^4} . \tag{4.4.7}$$
This can also be derived directly by noting that $E^2 - c^2\mathbf{p}^2 = -c^2\eta_{\mu\nu}p^\mu p^\nu$
takes the value $m^2c^4$ in the reference frame in which the body is at rest, and is
Lorentz invariant, so it takes the same value $m^2c^4$ in all reference frames.
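Both the invariant relation (4.4.7) and the low-velocity expansion of the energy can be checked with a few lines of Python. This is a minimal sketch for motion along a single axis; the function name and the sample mass and velocities are illustrative assumptions, and the coefficient 3/8 quoted in the comments is the next term in the binomial expansion of γ.

```python
import math


def energy_momentum(m: float, v: float, c: float = 1.0) -> tuple[float, float]:
    """Return (E, p) from Eqs. (4.4.6) and (4.4.5) for motion along one axis."""
    gamma = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return m * c**2 * gamma, m * gamma * v


m, c = 2.0, 1.0
for v in (0.0, 0.3, 0.9):
    energy, p = energy_momentum(m, v, c)
    # Eq. (4.4.7): E^2 - p^2 c^2 should equal m^2 c^4 at every velocity.
    print(v, energy**2 - (p * c) ** 2, (m * c**2) ** 2)

# What remains after removing the rest energy and the Newtonian kinetic
# energy should approach (3/8) m v^4 / c^2 when v is much less than c.
v = 0.01
energy, _ = energy_momentum(m, v, c)
remainder = energy - m * c**2 - 0.5 * m * v**2
print(remainder / (m * v**4 / c**2))
```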
E = mc²
Einstein suggested in his 1905 paper that the reduction of mass accompanying
the emission of energy might be detected by the study of radioactive salts.
This proved difficult, because it is not easy to measure accurately the atomic
weights of different states of a radioactive isotope. In the early 1930s it became
possible to verify Einstein’s relation between energy and mass by studying
reactions among stable isotopes, such as ${}^1{\rm H} + {}^7{\rm Li} \to 2\,{}^4{\rm He}$. The masses of
the atoms of ${}^1$H, ${}^7$Li, and ${}^4$He are respectively $1.007825\,m_1$, $7.016003\,m_1$, and
$4.002603\,m_1$, where $m_1$ is the mass of unit atomic weight, defined today as
1/12 the mass of the carbon isotope ${}^{12}$C. The mass lost in this reaction is thus
$\Delta m = 0.018622\,m_1 = 3.09 \times 10^{-26}\ {\rm g} = 17.3\ {\rm MeV}/c^2$. Thus it is expected
that the kinetic energies of the two ${}^4$He nuclei in the final state should
exceed the kinetic energies of the ${}^1$H and ${}^7$Li nuclei in the initial state by
$\Delta m\,c^2 = 17.3$ MeV, and this is observed, verifying $E = mc^2$ and not just
Eq. (4.4.1).
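The mass bookkeeping for this reaction can be reproduced directly. In the Python sketch below the three atomic masses are those quoted in the text; the two conversion factors for $m_1$ are standard values assumed for the illustration, not figures taken from the text.

```python
# Atomic masses in units of m_1, as quoted in the text.
M_H1, M_LI7, M_HE4 = 1.007825, 7.016003, 4.002603

# Conversion factors for m_1 (assumed standard values, not from the text).
M1_IN_GRAMS = 1.66054e-24
M1_C2_IN_MEV = 931.494

delta_m = M_H1 + M_LI7 - 2 * M_HE4            # mass lost, in units of m_1
delta_m_grams = delta_m * M1_IN_GRAMS
energy_released_mev = delta_m * M1_C2_IN_MEV  # Delta m c^2

print(f"{delta_m:.6f} m_1 = {delta_m_grams:.2e} g -> {energy_released_mev:.1f} MeV")
```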
Force
Because of the presence of the factor γ in Eqs. (4.4.5) and (4.4.6), the quantity
mγ is sometimes called the relativistic mass. I will not use this terminology,
because it suggests that we can calculate the acceleration produced by any force
just by replacing m in Newton’s F = ma with mγ , which is not the case. To
find how bodies respond to forces in special relativity, we need to formulate a
general Lorentz-invariant version of Newton’s second law.
Though the time coordinate is Galilean invariant it is not Lorentz invariant,
so neither is the time derivative $d/dt$. To replace the time derivative in Newton’s
second law, we note that $d\tau$ is Lorentz invariant, where
$$d\tau \equiv \sqrt{-\eta_{\mu\nu}\,dx^\mu dx^\nu}\,/c = \sqrt{dt^2 - d\mathbf{x}^2/c^2} = \sqrt{dt^2 - dt^2\,\mathbf{v}^2/c^2} = dt/\gamma . \tag{4.4.8}$$
So, in place of the Newtonian formula $d\mathbf{p}/dt = \mathbf{F}$, the requirement of Lorentz
invariance suggests that
$$\frac{dp^\mu}{d\tau} = F^\mu , \tag{4.4.9}$$
where $F^\mu$ is a four-vector with the same Lorentz-transformation properties as
$x^\mu$ or $k^\mu$ or $p^\mu$. The space components of Eq. (4.4.9) give
$$\gamma\frac{d\mathbf{p}}{dt} = \mathbf{F} \tag{4.4.10}$$
but $\mathbf{p}$ is not just $m\mathbf{v}$, and the factor $\gamma$ in Eq. (4.4.10) is outside the time derivative.
Incidentally, we do not need a special determination of the time component
$F^0$. We have already noted that $\eta_{\mu\nu}p^\mu p^\nu = -m^2c^2$, so
$$0 = \frac{d}{d\tau}\left(\eta_{\mu\nu}p^\mu p^\nu\right) = 2\eta_{\mu\nu}\frac{dp^\mu}{d\tau}p^\nu .$$
Hence
$$0 = \eta_{\mu\nu}F^\mu p^\nu = \mathbf{F}\cdot\mathbf{p} - F^0E/c \tag{4.4.11}$$
and therefore
$$F^0 = c\,\mathbf{F}\cdot\mathbf{p}/E = \mathbf{F}\cdot\mathbf{v}/c . \tag{4.4.12}$$
We will see in Section 4.6 how to construct the four-vector $F^\mu$ for the forces
exerted by electric and magnetic fields on a moving charged particle.
4.5 Photons as Particles
As we saw in Section 3.2, Einstein in 1905 proposed that the energy of radiation
of a given frequency ν is always an integer multiple of hν. This led to the further
conjecture that the radiation consists of particles, later called photons, each with
energy hν. A state with energy nhν would then be interpreted as consisting of
n photons.
Photon Momentum
If we suppose that photons are real particles, then we need to work out the
relation between their energy and the magnitude of their momentum. In order
for the conservation of energy and momentum to be Lorentz invariant when
photons interact with other particles, the photon energy $E$ and momentum $\mathbf{p}$
must form a four-vector $p^\mu$, with $p^0 = E/c$, just as for other particles. That
is, in changing coordinates from $x^\mu$ to $x'^\mu = \Lambda^\mu{}_\nu x^\nu$, the photon momentum
four-vector is changed to $p'^\mu = \Lambda^\mu{}_\nu p^\nu$. But we cannot work out formulas for
the components of $p^\mu$ in the way we did for other particles, by expressing $p^\mu$
as a Lorentz transformation acting on the four-momentum of a particle at rest,
because photons never can be at rest.
Instead, we return to the starting point, that the energy of a quantum of
radiation is proportional to the frequency. This implies that the time component
of $p^\mu$ is proportional to the time component of another four-vector, the wave
vector $k^\mu$ discussed in Section 4.3. Specifically, using the result $k^0 = \omega/c$ given
there, we have
$$p^0 \equiv E/c = h\nu/c = \hbar\omega/c = \hbar k^0 .$$
Then in all Lorentz frames
$$0 = p^0 - \hbar k^0 . \tag{4.5.1}$$
It is a general rule that if the time component $a^0$ of a four-vector $a^\mu$ vanishes in
all coordinate systems, then the whole four-vector $a^\mu$ vanishes. For if for any
arbitrary Lorentz transformation we have $a'^0 = a^0 = 0$ where $a'^\mu = \Lambda^\mu{}_\nu a^\nu$,
then
$$0 = a'^0 = \Lambda^0{}_i\,a^i ,$$
which implies that $a^i$ vanishes. (If $\mathbf{a} \neq 0$ we can rotate our coordinate axes so
that the 3-axis is in the direction of $\mathbf{a}$, and take $\Lambda^\mu{}_\nu$ to be a Lorentz transformation
(4.2.6) along this direction, in which case $0 = \beta\gamma|\mathbf{a}|$, so $\mathbf{a} = 0$.) The whole
four-vector $a^\mu$ thus vanishes, as was to be proved. Taking $a^\mu = p^\mu - \hbar k^\mu$, we
conclude then from Eq. (4.5.1) that the photon four-momentum is
$$p^\mu = \hbar k^\mu \tag{4.5.2}$$
and in particular
$$|\mathbf{p}| = \hbar|\mathbf{k}| = \hbar\omega/c = E/c . \tag{4.5.3}$$
This is just the relation between energy and momentum that we would expect
from Eq. (4.4.7) if we treat the photon as a particle of zero mass.
Compton Scattering
If photons carry momentum, then when a photon is scattered by an electron
at rest the electron should recoil. Suppose the incoming and outgoing photons
have wave vectors $\mathbf{k}$ and $\mathbf{k}'$, respectively. According to Eq. (4.4.7), the energy
of an electron of momentum $\mathbf{p}_e$ is given by
$$E_e = \sqrt{\mathbf{p}_e^2c^2 + m_e^2c^4} . \tag{4.5.4}$$
The conservation of energy in the scattering of a photon by an electron at rest
requires that
$$c\hbar|\mathbf{k}| + m_ec^2 = c\hbar|\mathbf{k}'| + \sqrt{\mathbf{p}_e^2c^2 + m_e^2c^4} ,$$
where $\mathbf{p}_e$ is the momentum of the recoiling electron. According to Eq. (4.5.2),
the conservation of momentum gives
$$\mathbf{p}_e = \hbar\mathbf{k} - \hbar\mathbf{k}' ,$$
so the conservation of energy becomes
$$c\hbar|\mathbf{k}| + m_ec^2 = c\hbar|\mathbf{k}'| + \sqrt{c^2\hbar^2\left(|\mathbf{k}|^2 + |\mathbf{k}'|^2 - 2\cos\theta\,|\mathbf{k}||\mathbf{k}'|\right) + m_e^2c^4} ,$$
where $\theta$ is the angle between the initial and final photon wave vectors. Subtracting
$c\hbar|\mathbf{k}'|$ from both sides and squaring, we have
$$c^2\hbar^2\left(\mathbf{k}^2 - 2|\mathbf{k}||\mathbf{k}'| + \mathbf{k}'^2\right) + 2c^3\hbar m_e\left(|\mathbf{k}| - |\mathbf{k}'|\right) + m_e^2c^4 = c^2\hbar^2\left(\mathbf{k}^2 + \mathbf{k}'^2 - 2\cos\theta\,|\mathbf{k}||\mathbf{k}'|\right) + m_e^2c^4 .$$
Cancelling the terms $c^2\hbar^2\mathbf{k}^2$, $c^2\hbar^2\mathbf{k}'^2$, and $m_e^2c^4$ on both sides leaves us with
$$|\mathbf{k}| - |\mathbf{k}'| = |\mathbf{k}||\mathbf{k}'|(1 - \cos\theta)\,\hbar/m_ec .$$
It is conventional to write this in terms of the wavelengths $\lambda = 2\pi/|\mathbf{k}|$ and
$\lambda' = 2\pi/|\mathbf{k}'|$, and $h = 2\pi\hbar$:
$$\lambda' - \lambda = (1 - \cos\theta)\,h/m_ec . \tag{4.5.5}$$
The quantity $h/m_ec$ equals $2.425 \times 10^{-10}$ cm, and gives the increase in wavelength
for a photon scattered at right angles to its original direction. This is
known as the Compton wavelength of the electron, in honor of Arthur Holly
Compton (1892–1962).
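Equation (4.5.5) lends itself to a quick numerical illustration. The Python sketch below evaluates the wavelength shift and the energy of the scattered photon; the function names are hypothetical, and the constants are standard SI values (with $hc$ in keV·Å) assumed for the example rather than taken from the text.

```python
import math

H_PLANCK = 6.62607015e-34      # Planck constant, J s
M_ELECTRON = 9.1093837015e-31  # electron mass, kg
C_LIGHT = 2.99792458e8         # speed of light, m/s
HC_KEV_ANGSTROM = 12.39842     # h c in keV * angstrom


def compton_shift_m(theta: float) -> float:
    """Wavelength increase lambda' - lambda in meters, Eq. (4.5.5)."""
    return (1.0 - math.cos(theta)) * H_PLANCK / (M_ELECTRON * C_LIGHT)


def scattered_energy_kev(energy_kev: float, theta: float) -> float:
    """Photon energy after scattering through angle theta off an electron at rest."""
    wavelength = HC_KEV_ANGSTROM / energy_kev             # angstroms
    shifted = wavelength + compton_shift_m(theta) * 1e10  # meters -> angstroms
    return HC_KEV_ANGSTROM / shifted


right_angle = math.pi / 2
print(f"shift at 90 degrees: {compton_shift_m(right_angle) * 100:.3e} cm")
print(f"17 keV photon after 90-degree scattering: "
      f"{scattered_energy_kev(17.0, right_angle):.2f} keV")
```

For a photon of 17 keV the shift at right angles is a few percent of the wavelength, which is why the effect is measurable with X-rays but negligible for visible light.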
Compton at Washington University studied the scattering of monochromatic
X-ray photons, with energy 17 keV. These photons were created by X-ray flu-
orescence: atoms of high atomic number, such as platinum, were exposed to a
beam of high-energy electrons in a tube something like the cathode ray tubes
used by Thomson (with whom Compton had worked at Cambridge). The beam
of high-energy electrons knocked electrons out of these atoms, some from inner
orbits. Then other electrons of nearly zero energy fell into these orbits, emitting
monochromatic radiation, which, as we saw in our discussion of atomic number
in Section 3.4, is at X-ray wavelengths for atoms with $Z \gg 1$. In Compton’s
experiment these photons were directed at a graphite target, where they were
scattered by an outer electron of the carbon atom. These outer electrons have
energies of the order of an eV, or at most tens of eV, negligible compared with
the 17 keV energy of the incoming X-ray photon, so they scattered the X-ray
photons just as if they were at rest. The wavelength of the scattered photon was
measured by diffraction scattering, using a single crystal as a diffraction grating.
Compton’s experiment verified Eq. (4.5.5) in 1923, giving a significant boost to
the acceptance of the quantum of light as a particle of zero mass. It was the
chemist G. N. Lewis (1875–1946) who a few years later gave this particle the
name “photon.”
There are other types of particle with zero mass. One is the graviton,
the quantum of gravitational radiation. This radiation has been observed, but
there is unfortunately no prospect of observing its quantum nature in the
foreseeable future. There are also eight types of gluons, massless particles that
in our present Standard Model are supposed to mediate strong nuclear forces.
They interact so strongly when pulled away from other strongly interacting
particles that they cannot even in principle be observed in isolation, but there is
plenty of indirect evidence of their existence.
4.6 Electromagnetic Fields and Forces
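Recall that Maxwell’s equations take the form
$$\nabla\times\mathbf{B} - \frac{1}{c}\frac{\partial\mathbf{E}}{\partial t} = \frac{4\pi}{c}\mathbf{J} , \qquad \nabla\cdot\mathbf{E} = 4\pi\rho , \tag{4.6.1}$$
$$\nabla\times\mathbf{E} + \frac{1}{c}\frac{\partial\mathbf{B}}{\partial t} = 0 , \qquad \nabla\cdot\mathbf{B} = 0 , \tag{4.6.2}$$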
where E and B are the electric and magnetic fields, while ρ and J are the
densities of electric charge and electric current. Are these equations Lorentz
invariant, as required by Einsteinian relativity?
That is not quite the right question. We have no a priori knowledge of the
Lorentz-transformation properties of the electric and magnetic fields. The real
question that confronts us here is: what Lorentz-transformation properties can
be supposed for the fields and densities in these equations that will make the
equations Lorentz invariant? In the course of answering this question, we will
encounter some algebraic devices that are useful in judging the Lorentz invari-
ance of all sorts of field theories.
Density and Current

Let’s start by considering the charge density $\rho(\mathbf{x}, t)$ and current density $\mathbf{J}(\mathbf{x}, t)$
appearing on the right-hand sides of the Maxwell equations (4.6.1). Following
the same arguments as in Section 2.5, because electric charge is conserved these
satisfy a continuity equation like Eq. (2.5.2):
$$\frac{\partial}{\partial t}\rho(\mathbf{x}, t) + \nabla\cdot\mathbf{J}(\mathbf{x}, t) = 0 . \tag{4.6.3}$$
This can be derived directly from the inhomogeneous Maxwell equations
(4.6.1); just add $c$ times the divergence of the first equation to the time derivative
of the second equation. So how should $\rho(\mathbf{x}, t)$ and $\mathbf{J}(\mathbf{x}, t)$ behave under Lorentz
transformations in order for Eq. (4.6.3) to be Lorentz invariant?
It helps to put the continuity equation in a revealing four-dimensional form.
Define a four-component quantity $J^\mu(x)$ with $J^0(x) = c\rho(x)$. Then, recalling
that $x^0 \equiv ct$, Eq. (4.6.3) reads
$$\frac{\partial}{\partial x^\mu}J^\mu(x) = 0 , \tag{4.6.4}$$
with repeated indices summed as usual over the values 1, 2, 3, 0. Now, how does
the partial derivative $\partial/\partial x^\mu$ transform if we perform a Lorentz transformation
$x^\mu \to x'^\mu = \Lambda^\mu{}_\nu x^\nu$? The chain rule of partial differentiation tells us that
$$\frac{\partial}{\partial x^\nu} = \frac{\partial x'^\mu}{\partial x^\nu}\frac{\partial}{\partial x'^\mu}$$
so in our case
$$\Lambda^\mu{}_\nu\frac{\partial}{\partial x'^\mu} = \frac{\partial}{\partial x^\nu} . \tag{4.6.5}$$
Therefore, if we suppose that $J^\mu(x)$ transforms as a four-vector under the
Lorentz transformation $x^\mu \to x'^\mu = \Lambda^\mu{}_\nu x^\nu$, in the sense that the current
$J'^\mu(x')$ measured by an observer who uses spacetime coordinates $x'^\mu$ is
$$J'^\mu(x') = \Lambda^\mu{}_\nu J^\nu(x) , \tag{4.6.6}$$
then
$$\frac{\partial}{\partial x'^\mu}J'^\mu(x') = \frac{\partial}{\partial x'^\mu}\Lambda^\mu{}_\nu J^\nu(x) = \frac{\partial}{\partial x^\nu}J^\nu(x) .$$
This is the Lorentz transformation of what is called a scalar. The quantity
$\partial J^\mu/\partial x^\mu$ is seen by different observers to have the same value at the same
point in spacetime, although these observers use different spacetime coordinate
systems to label that point.
So, if an observer who uses spacetime coordinates $x^\mu$ sees $\partial J^\mu/\partial x^\mu$ to
vanish at some particular value $x_I^\mu$ of these coordinates, then an observer who
uses spacetime coordinates $x'^\mu = \Lambda^\mu{}_\nu x^\nu$ will see $\partial J'^\mu/\partial x'^\mu$ vanish at the
corresponding coordinates $x_I'^\mu = \Lambda^\mu{}_\nu x_I^\nu$. In particular, if the first observer sees
$\partial J^\mu/\partial x^\mu$ vanish everywhere, then so will any other observer whose coordinates
are related to those of the first observer by a Lorentz transformation.
So the Lorentz transformation (4.6.6) does make the conservation condition
(4.6.4) Lorentz invariant.
The Inhomogeneous Maxwell Equations

We next consider how to rewrite the inhomogeneous Maxwell equations (4.6.1).
The Lorentz invariance of these equations requires that we give $\mathbf{E}$ and $\mathbf{B}$
Lorentz-transformation properties such that their first derivatives with respect
to space and time coordinates can be assembled into a four-component field
that transforms as a four-vector, in the same sense (4.6.6) as $J^\mu$. We cannot
assemble a four-vector from the six components of $\mathbf{E}$ and $\mathbf{B}$ themselves, but
we can assemble them into a different sort of quantity, an antisymmetric array
$F^{\mu\nu}(x) = -F^{\nu\mu}(x)$ with two vector indices. We take
$$E_1 = F^{01} = -F^{10} , \quad E_2 = F^{02} = -F^{20} , \quad E_3 = F^{03} = -F^{30} , \tag{4.6.7}$$
$$B_1 = F^{23} = -F^{32} , \quad B_2 = F^{31} = -F^{13} , \quad B_3 = F^{12} = -F^{21} , \tag{4.6.8}$$
and $F^{\mu\nu} = 0$ if $\mu = \nu$. In this notation the 3-component of the first of the
inhomogeneous equations (4.6.1) reads
$$\frac{4\pi}{c}J^3 = \frac{\partial B_2}{\partial x^1} - \frac{\partial B_1}{\partial x^2} - \frac{\partial E_3}{\partial x^0} = \frac{\partial F^{3\nu}}{\partial x^\nu}$$
with the understanding that in accordance with the summation convention the
repeated index $\nu$ is summed over the values 1, 2, 3, 0, with the $\nu = 3$ term
here vanishing because $F^{33} = 0$. The same applies to the 1-component and
2-component of the first of equations (4.6.1). Further, in this notation the second
of equations (4.6.1) reads
$$\frac{4\pi}{c}J^0 = \nabla\cdot\mathbf{E} = \frac{\partial}{\partial x^1}F^{01} + \frac{\partial}{\partial x^2}F^{02} + \frac{\partial}{\partial x^3}F^{03} = \frac{\partial F^{0\nu}}{\partial x^\nu} .$$
So, in this notation all of the inhomogeneous Maxwell equations (4.6.1) can be
summarized in the single four-component equation
$$\frac{\partial}{\partial x^\nu}F^{\mu\nu}(x) = \frac{4\pi}{c}J^\mu(x) . \tag{4.6.9}$$
It is now almost obvious how to make the inhomogeneous Maxwell equations
Lorentz invariant. We suppose that under a Lorentz transformation
$x^\mu \to x'^\mu = \Lambda^\mu{}_\nu x^\nu$ the field $F^{\mu\nu}(x)$ transforms like $J^\mu(x)$, but with a pair of
four-valued indices. That is, the observer who uses coordinates $x'^\mu$ measures electric and
magnetic fields with
$$F'^{\mu\nu}(x') = \Lambda^\mu{}_\rho\Lambda^\nu{}_\sigma F^{\rho\sigma}(x) . \tag{4.6.10}$$
Fields with this sort of transformation property are known as tensors.

To see that this makes Eq. (4.6.9) Lorentz invariant, consider a general
Lorentz transformation $x^\mu \to x'^\mu = \Lambda^\mu{}_\nu x^\nu$. Multiplying Eq. (4.6.9) with $\Lambda^\rho{}_\mu$
and using Eq. (4.6.5) again to set $\partial/\partial x^\nu = \Lambda^\sigma{}_\nu\,\partial/\partial x'^\sigma$ gives
$$\Lambda^\rho{}_\mu\Lambda^\sigma{}_\nu\frac{\partial}{\partial x'^\sigma}F^{\mu\nu}(x) = \Lambda^\rho{}_\mu\frac{\partial}{\partial x^\nu}F^{\mu\nu}(x) = \frac{4\pi}{c}\Lambda^\rho{}_\mu J^\mu(x) .$$
Using the transformation properties (4.6.6) and (4.6.10), this becomes
$$\frac{\partial}{\partial x'^\sigma}F'^{\rho\sigma}(x') = \frac{4\pi}{c}J'^\rho(x') .$$
Thus Eq. (4.6.9) holds in the frame of reference with coordinates $x'^\mu$ if it holds
in the frame of reference with coordinates $x^\mu$, which is what we mean when
we say it is Lorentz invariant. This then is a partial answer to our question: the
inhomogeneous Maxwell equations (4.6.1) are Lorentz invariant if the electric
and magnetic fields transform as components (4.6.7) and (4.6.8) of an antisymmetric
tensor field.
This represents a unification of electricity and magnetism beyond anything
of which Oersted, Ampère, Faraday, or even Maxwell could have dreamed. Not
only are electric and magnetic fields coupled in the field equations – putting an
observer into motion can change electric or magnetic fields into combinations of
both electric and magnetic fields. For example, suppose an observer using
coordinates $x^\mu$ finds a uniform electric field $E_1$ in the 1-direction, and no magnetic
field, so that the only non-vanishing component of $F^{\mu\nu}$ is $F^{01} = -F^{10} = E_1$.
Suppose a second observer uses coordinates $x'^\mu = \Lambda^\mu{}_\nu(v)\,x^\nu$, where $\Lambda^\mu{}_\nu(v)$
is the Lorentz transformation (4.2.6) that gives a body at rest a velocity $v$ in the
3-direction, whose non-vanishing components are:
$$\Lambda^3{}_3 = \Lambda^0{}_0 = \gamma , \qquad \Lambda^3{}_0 = \Lambda^0{}_3 = \beta\gamma , \qquad \Lambda^1{}_1 = \Lambda^2{}_2 = 1 ,$$
where $\beta = v/c$, and $\gamma$ is again the positive quantity $\gamma = +1/\sqrt{1 - \beta^2}$. The
second observer sees an electromagnetic field
$$F'^{\mu\nu} = \Lambda^\mu{}_\rho(v)\,\Lambda^\nu{}_\sigma(v)\,F^{\rho\sigma} = \left[\Lambda^\mu{}_0(v)\,\Lambda^\nu{}_1(v) - \Lambda^\mu{}_1(v)\,\Lambda^\nu{}_0(v)\right]E_1 .$$
Its only non-vanishing components in this case are
$$E'_1 = F'^{01} = -F'^{10} = \Lambda^0{}_0 E_1 = \gamma E_1 ,$$
$$B'_2 = F'^{31} = -F'^{13} = \Lambda^3{}_0 E_1 = \beta\gamma E_1 .$$
Not only is the electric field increased; a magnetic field appears where before
there was none. This is the sort of thing that had led Einstein to his 1905
paper.
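The example can be reproduced by applying the transformation rule (4.6.10) to a small array. The Python sketch below is a hedged illustration using plain nested lists; the helper names are hypothetical, and the index order (0, 1, 2, 3) with 0 as the time index is a choice made for the code, whereas the text lists the values as 1, 2, 3, 0.

```python
import math


def boost_3(beta: float) -> list[list[float]]:
    """Lambda^mu_nu for a boost with velocity beta = v/c along the 3-direction."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    lam = [[0.0] * 4 for _ in range(4)]
    lam[0][0] = lam[3][3] = gamma
    lam[0][3] = lam[3][0] = beta * gamma
    lam[1][1] = lam[2][2] = 1.0
    return lam


def transform_tensor(lam, f):
    """F'^{mu nu} = Lambda^mu_rho Lambda^nu_sigma F^{rho sigma}, Eq. (4.6.10)."""
    return [[sum(lam[mu][rho] * lam[nu][sigma] * f[rho][sigma]
                 for rho in range(4) for sigma in range(4))
             for nu in range(4)]
            for mu in range(4)]


# Pure electric field E_1 = 1 in the first frame: F^{01} = -F^{10} = E_1.
f = [[0.0] * 4 for _ in range(4)]
f[0][1], f[1][0] = 1.0, -1.0

f_prime = transform_tensor(boost_3(0.6), f)
print("E'_1 =", f_prime[0][1])   # expected to equal gamma * E_1
print("B'_2 =", f_prime[3][1])   # expected to equal beta * gamma * E_1
```

For β = 0.6 the factor γ is 1.25, so the second observer should find the electric field increased by a quarter and a magnetic field of three quarters of the original electric field strength.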
Upstairs, Downstairs
We still have to verify that, with electromagnetic fields obeying the trans-
formation rule (4.6.10) that makes the inhomogeneous Maxwell equations
(4.6.1) Lorentz invariant, the homogeneous Maxwell equations (4.6.2) are also
Lorentz invariant. To check this, we need to widen our ideas about vectors and
tensors.
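In general, we define a four-vector field $V^\mu(x)$ as a quantity that has the same
Lorentz-transformation property as $x^\mu$ or $p^\mu$ or $J^\mu$:
$$V^\mu(x) \to V'^\mu(x') = \Lambda^\mu{}_\nu V^\nu(x) .$$
There is a different kind of four-vector field, conventionally written with a lower
index, that transforms according to
$$U_\mu(x) \to U'_\mu(x') = \Lambda_\mu{}^\nu U_\nu(x) , \tag{4.6.11}$$
where $\Lambda_\mu{}^\nu$ is the transposed inverse of the matrix $\Lambda^\mu{}_\nu$ in the sense that
$$\Lambda_\mu{}^\rho\Lambda^\mu{}_\sigma = \Lambda^\rho{}_\mu\Lambda_\sigma{}^\mu = \delta^\rho_\sigma \equiv \begin{cases} 1 & \rho = \sigma \\ 0 & \rho \neq \sigma . \end{cases} \tag{4.6.12}$$
The classic example of a vector that is naturally defined with a lower index is
the partial derivative. If we multiply Eq. (4.6.5) with $\Lambda_\rho{}^\nu$, sum over the repeated
index $\nu$, and use Eq. (4.6.12), we find
$$\frac{\partial}{\partial x'^\rho} = \Lambda_\rho{}^\nu\frac{\partial}{\partial x^\nu} . \tag{4.6.13}$$
It is trivial to calculate the transposed inverse $\Lambda_\mu{}^\nu$ of any given Lorentz
transformation $\Lambda^\mu{}_\nu$. To see this, recall the defining characteristic (4.2.5) of
Lorentz transformations:
$$\eta_{\mu\nu}\Lambda^\mu{}_\rho\Lambda^\nu{}_\sigma = \eta_{\rho\sigma} .$$
Multiplying with $\Lambda_\kappa{}^\rho$, summing over $\rho$, and using Eq. (4.6.12) gives
$$\eta_{\kappa\nu}\Lambda^\nu{}_\sigma = \eta_{\rho\sigma}\Lambda_\kappa{}^\rho . \tag{4.6.14}$$
That is, for $i$ and $j$ each running over 1, 2, 3:
$$\Lambda_i{}^j = \Lambda^i{}_j , \quad \Lambda_0{}^j = -\Lambda^0{}_j , \quad \Lambda_i{}^0 = -\Lambda^i{}_0 , \quad \Lambda_0{}^0 = \Lambda^0{}_0 .$$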
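The sign rule just stated can be tested against the defining property (4.6.12). The Python sketch below builds the transposed inverse of a boost by flipping the sign of the mixed time–space components and then checks that the contraction gives the Kronecker delta. The helper names are hypothetical, and the code uses the index order (0, 1, 2, 3) with 0 as the time index.

```python
import math

ETA_DIAGONAL = (-1.0, 1.0, 1.0, 1.0)   # eta_{00}, eta_{11}, eta_{22}, eta_{33}


def boost_3(beta: float) -> list[list[float]]:
    """Lambda^mu_nu for a boost with velocity beta = v/c along the 3-direction."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    lam = [[0.0] * 4 for _ in range(4)]
    lam[0][0] = lam[3][3] = gamma
    lam[0][3] = lam[3][0] = beta * gamma
    lam[1][1] = lam[2][2] = 1.0
    return lam


def transposed_inverse(lam):
    """Lambda_mu^nu: same entries as Lambda^mu_nu, with a sign change when
    exactly one of the two indices is the time index."""
    return [[ETA_DIAGONAL[mu] * lam[mu][nu] * ETA_DIAGONAL[nu] for nu in range(4)]
            for mu in range(4)]


def contract(lam_low, lam):
    """Sum over mu of Lambda_mu^rho Lambda^mu_sigma, the left side of Eq. (4.6.12)."""
    return [[sum(lam_low[mu][rho] * lam[mu][sigma] for mu in range(4))
             for sigma in range(4)]
            for rho in range(4)]


lam = boost_3(0.6)
delta = contract(transposed_inverse(lam), lam)
for row in delta:
    print([round(entry, 12) for entry in row])
```

The printed array should be the 4 × 4 identity up to rounding, confirming for this boost that the sign-flipped matrix is indeed the transposed inverse.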
In general, a tensor can have both upper and lower indices, and transforms
with a $\Lambda$ or its transposed inverse for each. For instance, a tensor $t^{\mu\nu}{}_\rho$ has the
transformation property
$$t^{\mu\nu}{}_\rho \to \Lambda^\mu{}_\lambda\Lambda^\nu{}_\kappa\Lambda_\rho{}^\sigma\,t^{\lambda\kappa}{}_\sigma .$$
If we set an upper index equal to a lower index and (following the summation
convention) sum over this index, we get another tensor with one less upper index
and one less lower index. For instance, in the above example, if we set $\nu = \rho$
and sum, we obtain a quantity $v^\mu \equiv t^{\mu\nu}{}_\nu$, with the transformation property of a
tensor with one index – that is, a vector:
$$v^\mu \to \Lambda^\mu{}_\lambda\Lambda^\nu{}_\kappa\Lambda_\nu{}^\sigma\,t^{\lambda\kappa}{}_\sigma = \Lambda^\mu{}_\lambda\,\delta^\sigma_\kappa\,t^{\lambda\kappa}{}_\sigma = \Lambda^\mu{}_\lambda\,t^{\lambda\sigma}{}_\sigma = \Lambda^\mu{}_\lambda v^\lambda ,$$
as required for a vector. One case has been already encountered: if we define a
tensor $t^\mu{}_\nu \equiv \partial J^\mu/\partial x^\nu$ and set the upper and lower indices equal and sum, we
obtain a quantity that we already know is a scalar: $t^\mu{}_\mu \equiv \partial J^\mu/\partial x^\mu$.
Although it is important not to confuse upper and lower indices, the difference
between them is just a matter of the sign of the time components. We
can use the matrix $\eta_{\mu\nu}$ to lower an index on any tensor, giving a new tensor. For
instance, returning to our earlier example, if $t^{\mu\nu}{}_\rho$ is a tensor with transformation
property
$$t^{\mu\nu}{}_\rho \to \Lambda^\mu{}_\lambda\Lambda^\nu{}_\kappa\Lambda_\rho{}^\xi\,t^{\lambda\kappa}{}_\xi ,$$
we can lower the index $\nu$, defining a new tensor:
$$u^\mu{}_{\sigma\rho} \equiv \eta_{\nu\sigma}\,t^{\mu\nu}{}_\rho .$$
Using Eq. (4.6.14), we see that this has the transformation property
$$u^\mu{}_{\sigma\rho} \to \eta_{\nu\sigma}\Lambda^\mu{}_\lambda\Lambda^\nu{}_\kappa\Lambda_\rho{}^\xi\,t^{\lambda\kappa}{}_\xi = \Lambda^\mu{}_\lambda\Lambda_\sigma{}^\tau\eta_{\tau\kappa}\Lambda_\rho{}^\xi\,t^{\lambda\kappa}{}_\xi = \Lambda^\mu{}_\lambda\Lambda_\sigma{}^\tau\Lambda_\rho{}^\xi\,u^\lambda{}_{\tau\xi}$$
as is appropriate for a tensor with one upper index and two lower indices. (It is
also possible to raise any lower indices on tensors, but we won’t need to do this
here.)
The Homogeneous Maxwell Equations

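With our new-found power to lower indices, let us introduce a new tensor:
$$H_{\mu\nu\lambda} \equiv \frac{\partial F_{\mu\nu}}{\partial x^\lambda} + \frac{\partial F_{\nu\lambda}}{\partial x^\mu} + \frac{\partial F_{\lambda\mu}}{\partial x^\nu}\,. \tag{4.6.15}$$
It is easy to see that $H_{\mu\nu\lambda}$ is totally antisymmetric. For instance, interchanging $\mu$ and $\lambda$ gives
$$H_{\lambda\nu\mu} = \frac{\partial F_{\lambda\nu}}{\partial x^\mu} + \frac{\partial F_{\nu\mu}}{\partial x^\lambda} + \frac{\partial F_{\mu\lambda}}{\partial x^\nu} = -\frac{\partial F_{\nu\lambda}}{\partial x^\mu} - \frac{\partial F_{\mu\nu}}{\partial x^\lambda} - \frac{\partial F_{\lambda\mu}}{\partial x^\nu} = -H_{\mu\nu\lambda}\,.$$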
Therefore Hμνλ vanishes unless all three indices are unequal. In four space-
time dimensions, this means that Hμνλ has only four independent components.
Lowering the indices in Eqs. (4.6.7) and (4.6.8), we have
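$$E_1 = -F_{01}\,,\quad E_2 = -F_{02}\,,\quad E_3 = -F_{03}\,,\qquad B_1 = F_{23}\,,\quad B_2 = F_{31}\,,\quad B_3 = F_{12}\,, \tag{4.6.16}$$
so
$$H_{123} \equiv \frac{\partial F_{12}}{\partial x^3} + \frac{\partial F_{23}}{\partial x^1} + \frac{\partial F_{31}}{\partial x^2} = \frac{\partial B_3}{\partial x^3} + \frac{\partial B_1}{\partial x^1} + \frac{\partial B_2}{\partial x^2} = \nabla\cdot\mathbf{B}\,,$$
$$H_{120} \equiv \frac{\partial F_{12}}{\partial x^0} + \frac{\partial F_{20}}{\partial x^1} + \frac{\partial F_{01}}{\partial x^2} = \frac{1}{c}\frac{\partial B_3}{\partial t} + \frac{\partial E_2}{\partial x^1} - \frac{\partial E_1}{\partial x^2} = \left[\frac{1}{c}\frac{\partial\mathbf{B}}{\partial t} + \nabla\times\mathbf{E}\right]_3\,.$$
Likewise
$$H_{230} = \left[\frac{1}{c}\frac{\partial\mathbf{B}}{\partial t} + \nabla\times\mathbf{E}\right]_1\,,\qquad H_{310} = \left[\frac{1}{c}\frac{\partial\mathbf{B}}{\partial t} + \nabla\times\mathbf{E}\right]_2\,.$$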
Hence the homogeneous Maxwell equations are the same as the requirement
that, for all μ, ν, and λ,
Hμνλ = 0 . (4.6.17)
This is a manifestly Lorentz-invariant condition; if $H_{\mu\nu\lambda}(x)$ vanishes, then so does $H'_{\rho\sigma\kappa}(x') = \Lambda_\rho{}^\mu\,\Lambda_\sigma{}^\nu\,\Lambda_\kappa{}^\lambda\,H_{\mu\nu\lambda}(x)$.
(We will not here use the formalism of differential forms, but for anyone
interested in this subject, I mention in passing that a completely antisymmetric
tensor with p lower indices is known as a p-form. Thus Fμν is a 2-form, and
Hμνλ is a 3-form. Given a p-form, we can form a p + 1-form, known as the
exterior derivative, by taking the spacetime derivative and antisymmetrizing.
Thus, Hμνλ is the exterior derivative of Fμν . A p-form whose exterior deriva-
tive vanishes is said to be closed; a p-form that can be written as the exterior
derivative of a p − 1-form is said to be exact. Thus Hμνλ is exact, and the
homogeneous Maxwell equations (4.6.2) tell us that Fμν is closed. It is easy to
see that, because partial derivatives commute, any exact p-form is closed, and
a profound theorem due to Poincaré tells us that in simply connected spaces
any closed p-form is exact but that this is not necessarily true in spaces with
more complicated topology.11 In electrodynamics, since Fμν is closed, we can
conclude that in ordinary spacetime it is exact, so it can be written as the
exterior derivative of a 1-form Aμ known as the four-vector potential; that
is, Fμν = ∂Aμ /∂x ν − ∂Aν /∂x μ . Maxwell originally wrote his equations as
differential equations for A and A0 , not E and B.)
Electric and Magnetic Forces

We saw in Section 4.4 that in special relativity Newton’s F = ma is replaced with the Lorentz-invariant formula (4.4.9)
$$\frac{dp^\mu}{d\tau} = F^\mu \tag{4.6.18}$$
where $p^\mu$ is the four-vector of energy and momentum, $c\,d\tau \equiv [-\eta_{\mu\nu}\,dx^\mu dx^\nu]^{1/2}$, and $F^\mu$ is a four-vector subject to the constraint
$$\eta_{\mu\nu}\,p^\mu F^\nu = 0\,. \tag{4.6.19}$$
11 For a more thorough treatment, see e.g. H. Flanders, Differential Forms (Academic Press, New York,
1963).
So, what should we take for F μ in the case of a particle of charge q in a space
that is empty except for being pervaded by electric and magnetic fields? Just
as for the momentum four-vector of massive particles, the force four-vector is
uniquely determined by the condition that it takes the known form for a particle
at rest, and is a four-vector, so that it is given by a Lorentz transformation for a
particle of any velocity.
There is an obvious four-vector that is linear in the electric and magnetic fields and (because $F^{\mu\rho} = -F^{\rho\mu}$) satisfies Eq. (4.6.19):
$$f^\mu \equiv \eta_{\rho\sigma}\,F^{\mu\rho}\,p^\sigma$$
so we can guess that $F^\mu \propto f^\mu$. To check that this gives the right answer for a particle at rest and to find the coefficient of proportionality, let us evaluate $f^\mu$ for a particle at rest, for which $\mathbf{p} = 0$ and $p^0 = mc$. In this limit
$$f^i \to -mc\,F^{i0} = mc\,E_i\,,\qquad f^0 \to -mc\,F^{00} = 0$$
with i = 1, 2, 3. Therefore to have agreement with the familiar formula $d\mathbf{p}/dt = q\mathbf{E}$ for the acceleration of a particle of charge q and zero velocity by an electric field, we take
$$F^\mu = \frac{q}{mc}\,f^\mu = \frac{q}{mc}\,\eta_{\rho\sigma}\,F^{\mu\rho}\,p^\sigma\,. \tag{4.6.20}$$
That is, for a general velocity,
$$F^i = \frac{q}{mc}\left[F^{ij}p^j - F^{i0}p^0\right]\,, \tag{4.6.21}$$
and in three-vector notation, recalling that $p^0 = mc\gamma$, $\mathbf{p} = m\gamma\mathbf{v}$,
$$\mathbf{F} = \frac{q}{mc}\left[p^0\mathbf{E} + \mathbf{p}\times\mathbf{B}\right] = q\gamma\left[\mathbf{E} + \mathbf{v}\times\mathbf{B}/c\right]\,. \tag{4.6.22}$$
Since $d\tau = dt/\gamma$, this gives
$$m\frac{d}{dt}[\gamma\mathbf{v}] = q\left[\mathbf{E} + \mathbf{v}\times\mathbf{B}/c\right]\,. \tag{4.6.23}$$
Given the existence of the force exerted by electric fields, the force exerted
by magnetic fields is an inevitable consequence of Lorentz invariance. It is a
special feature of electromagnetic forces that the only change in the equation
of motion introduced by special relativity is the replacement of the mass m in
the momentum with mγ , which in this one case allows us to treat mγ as a
relativistic mass.
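As a numerical illustration of Eqs. (4.6.19)–(4.6.22), the sketch below builds an antisymmetric field tensor from sample fields and evaluates the four-force of Eq. (4.6.20). The component assignments used here (upper-index components with the time-space entries equal to the electric field and the space-space entries equal to the magnetic field in cyclic order) are an assumption inferred from Eqs. (4.6.16) and (4.6.21)–(4.6.22), and the function names and sample values are purely illustrative.

```python
import math

def cross(a, b):
    """Ordinary three-vector cross product."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def four_force(q, m, c, E, B, v):
    """Evaluate F^mu = (q/mc) eta_{rho sigma} F^{mu rho} p^sigma.

    Returns (force, p, gamma), with index order (0, 1, 2, 3).
    """
    gamma = 1.0 / math.sqrt(1.0 - sum(vi * vi for vi in v) / c**2)
    p = [m * c * gamma] + [m * gamma * vi for vi in v]   # p^mu
    # Upper-index field tensor (assumed convention, see lead-in).
    F = [[0.0] * 4 for _ in range(4)]
    for i in range(3):
        F[0][i + 1] = E[i]
        F[i + 1][0] = -E[i]
    F[1][2], F[2][1] = B[2], -B[2]
    F[2][3], F[3][2] = B[0], -B[0]
    F[3][1], F[1][3] = B[1], -B[1]
    eta = [-1.0, 1.0, 1.0, 1.0]                          # diagonal metric
    force = [q / (m * c) * sum(F[mu][rho] * eta[rho] * p[rho] for rho in range(4))
             for mu in range(4)]
    return force, p, gamma

# Sample values in arbitrary units (hypothetical, for illustration only).
q, m, c = 1.0, 2.0, 3.0
E, B, v = [0.3, -0.2, 0.5], [0.1, 0.4, -0.7], [0.5, 1.0, -1.5]
force, p, gamma = four_force(q, m, c, E, B, v)
vxB = cross(v, B)
expected = [q * gamma * (E[i] + vxB[i] / c) for i in range(3)]
constraint = -p[0] * force[0] + sum(p[i] * force[i] for i in range(1, 4))
print(force[1:], expected, constraint)
```

The spatial components should reproduce the right-hand side of Eq. (4.6.22), and `constraint` should vanish up to rounding, in line with Eq. (4.6.19).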
4.7 Causality
We saw in Section 4.2 that no Lorentz transformation acting on a body at rest

could give it a speed greater than c, the speed of light. We can derive a stronger
result, that no influence whatever can travel faster than light. This is not just a
confession of technological inadequacy, but a consequence of an assumption of
causality, that effects always come after causes.
Invariance of Temporal Order

Suppose that in some coordinate frame the difference between the spacetime coordinates of an event and the event that causes it is $\Delta x^\mu$:
$$x^\mu_{\rm effect} - x^\mu_{\rm cause} \equiv \Delta x^\mu\,.$$
According to the principle of causality, we must have $\Delta t = \Delta x^0/c > 0$. Now suppose we perform a Lorentz transformation $\Lambda^\mu{}_\nu$ that would give a body at rest a velocity $\mathbf{v}$ in the direction opposite to the spatial separation $\Delta\mathbf{x}$. Without loss of generality, we can rotate our coordinate system so that $\Delta\mathbf{x}$ and thus $-\mathbf{v}$ are in the 3-direction. Then $\Lambda^\mu{}_\nu$ takes the form (4.2.6) with $\beta = -|\mathbf{v}|/c$, and in the new coordinate frame the difference between the times of effect and cause is
$$\Delta t' = \Lambda^0{}_\mu\,\Delta x^\mu/c = \gamma\left[\Delta t - v|\Delta\mathbf{x}|/c^2\right] \tag{4.7.1}$$
where $v = |\mathbf{v}|$ and $\gamma = 1/\sqrt{1 - v^2/c^2}$. Now, v can be anything, except that it must be less than c, so if $|\Delta\mathbf{x}|/c$ is greater than $\Delta t$ we could make $\Delta t'$ negative by taking v in the range $1 > v/c > c\Delta t/|\Delta\mathbf{x}|$. So the observer using coordinates $x'^\mu$ would see the effect precede the cause.
To rule this out, we must assume that the difference $\Delta x^\mu$ between the spacetime coordinates of an event and the event that causes it satisfies the inequality
$$|\Delta\mathbf{x}|/c \le \Delta t\,. \tag{4.7.2}$$
Whatever physical influence is exerted by the cause to produce the effect travels at a speed $|\Delta\mathbf{x}|/\Delta t$; the inequality (4.7.2) says that the speed of this influence must be no greater than c.
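The role of the inequality (4.7.2) can be seen by evaluating Eq. (4.7.1) directly. This is a minimal sketch in units with c = 1; the function name `transformed_dt` and the sample separations are illustrative choices, not taken from the text.

```python
import math

def transformed_dt(dt, dx, v, c=1.0):
    """Time difference in the new frame, Eq. (4.7.1): gamma * (dt - v*dx/c^2).

    dt is the time separation, dx the magnitude of the spatial separation,
    and v the speed of the boost (0 <= v < c).
    """
    gamma = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return gamma * (dt - v * dx / c**2)

# Separation violating (4.7.2): dx/c = 2 > dt = 1, so any v/c > 0.5 flips the sign.
outside = transformed_dt(1.0, 2.0, 0.8)

# Separation obeying (4.7.2): dx/c = 1 < dt = 2, so no allowed v flips the sign.
inside = [transformed_dt(2.0, 1.0, k / 1000.0) for k in range(1000)]

print(outside, min(inside))
```

For the first separation the transformed time difference is negative, while for the second it stays positive over the whole sampled range of boost speeds.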
Fortunately, if the bound (4.7.2) is seen to be satisfied by one observer then it is satisfied for any other observer related to the first by the sort of proper Lorentz transformation discussed in this chapter. The inequality (4.7.2) is equivalent to the inequality
$$-\eta_{\mu\nu}\,\Delta x^\mu\,\Delta x^\nu = c^2(\Delta t)^2 - |\Delta\mathbf{x}|^2 \ge 0\,. \tag{4.7.3}$$
This quantity is Lorentz invariant, so if Eq. (4.7.3) is satisfied for one observer using coordinates $x^\mu$, the corresponding inequality must be satisfied for coordinates $x'^\mu = \Lambda^\mu{}_\nu\,x^\nu$, so we must also have
$$c^2(\Delta t')^2 \ge |\Delta\mathbf{x}'|^2\,. \tag{4.7.4}$$
Since this gives a non-zero lower bound on $|\Delta t'|$, in order for $\Delta t'$ to have an opposite sign from $\Delta t$ the Lorentz transformation would have to produce a discontinuous jump in the coordinates. This is not possible for the sort of “proper” Lorentz transformation that concerns us in this chapter, which as discussed in Section 4.2 can be produced from the identity transformation x → x by a smooth change of parameters. So if one observer sees $\Delta t > 0$ and $|\Delta\mathbf{x}|/c \le \Delta t$, then any observer related to the first by a proper Lorentz transformation will see $\Delta t' > 0$ and $|\Delta\mathbf{x}'|/c \le \Delta t'$.
Light Cone
These conclusions are well illustrated by introducing the light cone, the spacetime surface with $\eta_{\mu\nu}\,\Delta x^\mu\,\Delta x^\nu = 0$. Points outside the light cone fall on hyperboloids with $\eta_{\mu\nu}\,\Delta x^\mu\,\Delta x^\nu = a > 0$. Any point on one of these hyperboloids can be taken to any other point on the same hyperboloid (that is, with the same value of a) by a proper Lorentz transformation, even if this entails a change of sign of $\Delta x^0$. On these hyperboloids $|\Delta\mathbf{x}| > c|\Delta t|$, so it is not possible for any influence traveling at less than the speed of light to traverse a spacetime interval $\Delta x^\mu$ outside the light cone. Thus as long as we assume that physical influences never travel faster than light, the circumstance that proper Lorentz transformations can change the sign of $\Delta t$ outside the light cone presents no challenge to causality.
Points inside the light cone fall on hyperboloids with $\eta_{\mu\nu}\,\Delta x^\mu\,\Delta x^\nu = b < 0$. For each value of b there are two disconnected hyperboloids, one inside the future light cone, with $\Delta x^0 > 0$, and one inside the past light cone, with $\Delta x^0 < 0$. Any point on one of these connected hyperboloids can be taken to any other point on the same hyperboloid by a proper Lorentz transformation, but proper Lorentz transformations cannot take us from inside the future light cone to inside the past light cone. Causality requires that the difference $\Delta x^\mu$ in the coordinates of an effect and its cause be on or within the future light cone, and if one observer sees this to be the case then so will all other observers related to the first by a proper Lorentz transformation.
5
Quantum Mechanics
Our modern understanding of atoms, molecules, solids, atomic nuclei, and ele-
mentary particles is largely based on quantum mechanics. Quantum mechanics
grew in the mid-1920s out of two independent developments: the 1925 matrix
mechanics of Werner Heisenberg1 (1901–1976), and the 1926 wave mechanics
of Erwin Schrödinger2 (1887–1961). For the most part in this chapter we will
follow the path of wave mechanics, which is far more convenient for all but the
simplest calculations. After a look at the historical inspiration for wave mechan-
ics in Section 5.1 the Schrödinger equation will be introduced in Section 5.2 and
used to derive not only the hydrogen energy levels found by Bohr but also their
degeneracy. The general principles of the wave mechanical formulation of quan-
tum mechanics are laid out in Section 5.3 and provide a basis for the discussion
of spin in Section 5.4, identical particles in Section 5.5, and scattering processes
in Section 5.6. In Section 5.7 the general principles are supplemented with the
canonical formalism, which is used in Section 5.8 to work out the Schrödinger
equation for charged particles in a general electromagnetic field. This will pro-
vide us with examples of the application of a widely useful approximation
scheme, perturbation theory, which is outlined in general terms in Section 5.9.
The two approaches of wave and matrix mechanics were unified by Paul
Dirac (1902–1984) in a more abstract formalism, which he called transforma-
tion theory.3 This has evolved into a modern approach in which physical states
are represented by vectors in an abstract space known as Hilbert space, with
wave functions arising as components of these vectors in a suitable basis. The
Hilbert space approach is briefly described in Section 5.10.
1 W. Heisenberg, Zeit. Phys. 33, 879 (1925). This article is reprinted in English in Van der Waerden, Sources
of Quantum Mechanics, listed in the bibliography.
2 E. Schrödinger, Ann. Physik 79, 361, 409 (1926). These articles are reprinted in English in Shearer,
Collected Papers on Wave Mechanics, listed in the bibliography.
3 This approach is described in Dirac, The Principles of Quantum Mechanics, listed in the bibliography.
5.1 De Broglie Waves
Free-Particle Wave Functions

Wave mechanics can be traced to the 1923 Paris Ph.D. thesis of Louis de
Broglie4 (1892–1987). De Broglie was inspired by the quantum interpretation
of electromagnetic radiation. If an electromagnetic wave can somehow be
interpreted as a stream of particles, photons, then might not electrons, which
are undoubtedly particles, be described somehow as waves? As we saw in
Section 4.5, the momentum p and energy E of the photons making up an
electromagnetic wave that is proportional to exp(ik · x − iωt) are given by
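$\mathbf{p} = \hbar\mathbf{k}$ and $E = \hbar\omega$, so the wave has the spacetime dependence
$$\mathbf{E}\ \text{and}\ \mathbf{B} \propto \exp\left(i\mathbf{p}\cdot\mathbf{x}/\hbar - iEt/\hbar\right)\,, \tag{5.1.1}$$
plus the complex conjugates. De Broglie in his thesis suggested that an electron of momentum $\mathbf{p}$ is associated with a complex wave function of similar form
$$\psi_{\mathbf{p}}(\mathbf{x},t) \propto \exp\left(i\mathbf{p}\cdot\mathbf{x}/\hbar - iE(\mathbf{p})t/\hbar\right)\,, \tag{5.1.2}$$
where now the energy is not $c|\mathbf{p}|$, as for a photon, but rather is given by the formula (4.5.4):
$$E(\mathbf{p}) = \sqrt{m_e^2c^4 + \mathbf{p}^2c^2}\,,$$
with $m_e$ the electron mass.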
Group Velocity
The association of the wave (5.1.2) with a moving electron gained plausibility
from the remark that a localized packet of these waves travels with the velocity
of the electron. Consider a packet of these waves:

$$\psi(\mathbf{x},t) = \int d^3p\;g(\mathbf{p})\,\exp\left(i\mathbf{p}\cdot\mathbf{x}/\hbar - iE(\mathbf{p})t/\hbar\right) \tag{5.1.3}$$
where $g(\mathbf{p})$ is a smooth function of momentum that is peaked at some value $\mathbf{P}$. Suppose also that $g(\mathbf{p})$ is chosen so that at t = 0 the integral is peaked at $\mathbf{x} = 0$. (This will be the case if $g(\mathbf{p})$ varies little over some range around $\mathbf{P}$ that is large enough that if $\mathbf{x}$ is not near zero then the factor $\exp(i\mathbf{p}\cdot\mathbf{x}/\hbar)$ in Eq. (5.1.3) at t = 0 will undergo many oscillations over the range of the integral, which makes the integral exponentially small except near $\mathbf{x} = 0$.) Then, by expanding the argument of the exponential around $\mathbf{P}$, we have
$$E(\mathbf{p}) \simeq E(\mathbf{P}) + \mathbf{V}\cdot(\mathbf{p} - \mathbf{P}) + \cdots$$
4 L. de Broglie, Comptes Rendus Acad. Sci. 177, 507, 548, 630 (1923).
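where
$$V_i = \left[\frac{\partial E(\mathbf{p})}{\partial p_i}\right]_{\mathbf{p}=\mathbf{P}}\,. \tag{5.1.4}$$
This gives the wave function for $t \neq 0$:
$$\psi(\mathbf{x},t) \simeq \exp\left(-i\left[E(\mathbf{P}) - \mathbf{V}\cdot\mathbf{P}\right]t/\hbar\right)\int d^3p\;g(\mathbf{p})\,\exp\left(i\mathbf{p}\cdot[\mathbf{x} - \mathbf{V}t]/\hbar\right)\,. \tag{5.1.5}$$
Because of the way we have constructed the packet function $g(\mathbf{p})$, the magnitude of (5.1.5) is peaked at $\mathbf{x} = \mathbf{V}t$, which shows that the packet moves at velocity $\mathbf{V}$, known as its group velocity. But $V_i = \partial E(\mathbf{p})/\partial p_i = c^2p_i/E(\mathbf{p})$,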
which as shown in Eqs. (4.4.5) and (4.4.6) is indeed the velocity of a particle of
momentum p.
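The identification of the group velocity with the particle velocity can be checked numerically. The sketch below, in units with c = 1 and with a unit mass (both illustrative assumptions), compares a centered finite-difference derivative of E(p) with the closed form c²p/E(p) for motion along one axis; the function names are hypothetical.

```python
import math

def energy(p, m=1.0, c=1.0):
    """Relativistic energy E(p) = sqrt(m^2 c^4 + p^2 c^2)."""
    return math.sqrt(m**2 * c**4 + p**2 * c**2)

def group_velocity(p, m=1.0, c=1.0, h=1e-6):
    """Centered finite-difference estimate of dE/dp."""
    return (energy(p + h, m, c) - energy(p - h, m, c)) / (2.0 * h)

def particle_velocity(p, m=1.0, c=1.0):
    """Velocity of a particle of momentum p: c^2 p / E(p)."""
    return c**2 * p / energy(p, m, c)

for p in (0.1, 1.0, 10.0):
    print(p, group_velocity(p), particle_velocity(p))
```

The two columns agree to within the accuracy of the finite difference, and both stay below c however large p becomes.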
Application to Hydrogen
De Broglie’s hypothesis met with just one initial success. The electron in a hy-
drogen atom is not free, but moves under the influence of the proton’s attraction.
Nevertheless, de Broglie supposed that the electron is described by the free-
particle wave function (5.1.2), but with the waves traveling in a circle around
the proton like sound waves in a toroidal organ pipe. To avoid a discontinuity in
ψ, it is necessary that a whole number n of wavelengths λ should fit around the
circle, so the radius of the circle is constrained by the condition that 2πr = nλ,
with n = 1, 2, . . . According to Eq. (5.1.2), λ = 2π h̄/p, where p ≡ |p|, so de
Broglie’s condition was
pr = nh̄ (5.1.6)
which for non-relativistic electrons with $p = m_ev$ is the same as Bohr’s condition (3.4.2), but now with no need of the correspondence principle to infer that $\hbar = h/2\pi$. De Broglie could then repeat Bohr’s calculation, using the non-relativistic formula $E = m_ev^2/2 - e^2/r$ for energy and the formula $m_ev^2/r = e^2/r^2$ for centripetal acceleration, and thereby obtain Bohr’s formula $E = -e^4m_e/2\hbar^2n^2$ for the hydrogen energy levels. Nothing new had been
learned about hydrogen, but de Broglie’s derivation at least gave a hint at an
explanation of Bohr’s quantization condition.
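The calculation just described can be retraced in a few lines. The sketch below first solves the quantization condition and the force balance in units with ħ = mₑ = e = 1 (an illustrative choice), then converts Bohr's formula to electron volts by writing e⁴mₑ/ħ² as α²mₑc². The numerical values of the fine-structure constant α and of mₑc² are assumed reference values not quoted in the text, and the function names are hypothetical.

```python
ALPHA = 1.0 / 137.035999   # fine-structure constant e^2/(hbar c); assumed value
ME_C2_EV = 510998.95       # electron rest energy in eV; assumed value

def bohr_orbit(n):
    """Solve p r = n and v^2 / r = 1 / r^2 in units hbar = m_e = e = 1.

    Combining the two conditions gives v = 1/n and r = n^2.
    Returns (v, r, energy) with energy = v^2/2 - 1/r.
    """
    v = 1.0 / n
    r = float(n * n)
    return v, r, 0.5 * v * v - 1.0 / r

def energy_level_ev(n):
    """Bohr energy -e^4 m_e / (2 hbar^2 n^2), rewritten as -alpha^2 m_e c^2 / (2 n^2)."""
    return -ALPHA**2 * ME_C2_EV / (2.0 * n * n)

for n in (1, 2, 3):
    v, r, e = bohr_orbit(n)
    print(n, e, energy_level_ev(n))
```

In these units the orbit energy comes out as −1/(2n²), and the n = 1 level corresponds to a binding energy of about 13.6 eV.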
Davisson–Germer Experiment
There is a story that in his oral Ph.D. examination, de Broglie was asked if there
was some direct way of observing the wave nature of electrons, and he answered
that it might be possible to observe the diffraction of electron waves by a crystal
lattice, like the well-known diffraction of X-rays used for instance in measuring
the increase of wavelength in Compton scattering. Whether or not this story is
true, the idea was a good one. According to Eq. (5.1.2), the wavelength of a non-relativistic electron with kinetic energy $E_e \ll m_ec^2$ is given by
$$\lambda = 2\pi\hbar/p_e = 2\pi\hbar/\sqrt{2m_eE_e} = 12.26\times10^{-8}\ \text{cm}\;[E_e(\text{eV})]^{-1/2}\,. \tag{5.1.7}$$
Hence we only need electrons with energy a bit larger than 10 eV to get wavelengths nearly as small as a typical lattice spacing, about $10^{-8}$ cm. This is no
coincidence. In de Broglie’s interpretation of the Bohr quantization assumption,
the wavelength of an electron with an energy of a few eV, which is typical
of atomic binding energies, must fit a few times around an atomic orbit, and
therefore must be similar to the size of the atom, which is similar to the spacing
of atoms in crystals.
Several physicists tried and failed to observe the diffraction of electron
waves, until it was finally measured in 1927 by Clinton Davisson (1881–1958)
and Lester Germer (1896–1971) at the old Bell Telephone Laboratories building
on West Street in Manhattan.5 (It was also measured at about the same time
at the University of Aberdeen by George Paget Thomson (1892–1975), a
son of J. J. Thomson.) They used a beam of electrons with kinetic energy
54 eV, incident on a single crystal of nickel with a spacing of lattice planes $d = 0.91\times10^{-8}$ cm (already known from measurements using X-ray diffraction). Electrons are reflected not only from the surface of the crystal, but from
numerous planes within the nickel. At certain angles θ between the incident
and reflected waves all these reflected waves go off with the same phase and
therefore add constructively, leading to enhanced reflection at these angles.
According to a 1913 formula (derived in the appendix to this section) of
William Henry Bragg (1862–1942) and his son Lawrence Bragg (1890–1971),
for any sort of wave the angles $\theta_n$ between incident and reflected waves at which reflection is enhanced in this way satisfy the Bragg formula:
$$n\lambda = 2d\cos(\theta_n/2)\,, \tag{5.1.8}$$
where n = 1, 2, 3, . . . Davisson and Germer found an enhanced n = 1 reflection at $\theta_1 = 50^\circ$, giving a wavelength
$$\lambda = 2\times0.91\times10^{-8}\ \text{cm}\times\cos(25^\circ) = 1.65\times10^{-8}\ \text{cm}\,,$$
in satisfactory agreement with the wavelength $1.67\times10^{-8}$ cm expected from Eq. (5.1.7) for a kinetic energy of 54 eV.
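The two wavelengths being compared can be evaluated directly. The following sketch works in ångströms (1 Å = 10⁻⁸ cm) and uses assumed reference values for hc = 2πħc and mₑc² that are not quoted in the text; the function names are illustrative.

```python
import math

HC_EV_ANGSTROM = 12398.42   # h c = 2 pi hbar c in eV * angstrom; assumed value
ME_C2_EV = 510998.95        # electron rest energy in eV; assumed value

def de_broglie_wavelength(kinetic_ev):
    """Eq. (5.1.7): lambda = 2 pi hbar / sqrt(2 m_e E_e), in angstroms."""
    return HC_EV_ANGSTROM / math.sqrt(2.0 * ME_C2_EV * kinetic_ev)

def bragg_wavelength(d_angstrom, theta_deg, n=1):
    """Eq. (5.1.8) solved for lambda: 2 d cos(theta_n / 2) / n, in angstroms."""
    return 2.0 * d_angstrom * math.cos(math.radians(theta_deg / 2.0)) / n

coefficient = de_broglie_wavelength(1.0)       # numerical coefficient in Eq. (5.1.7)
lam_expected = de_broglie_wavelength(54.0)     # electron of kinetic energy 54 eV
lam_measured = bragg_wavelength(0.91, 50.0)    # d = 0.91 angstrom, theta_1 = 50 degrees
print(round(coefficient, 2), round(lam_expected, 2), round(lam_measured, 2))
```

The coefficient reproduces the 12.26 of Eq. (5.1.7), and the two wavelengths differ by roughly one percent.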
The wave nature of the electron allowed the development of a new instrument, the electron microscope. Recall that a photon of energy E has wavelength $\lambda_\gamma = 2\pi\hbar c/E$, so the ratio of the wavelength (5.1.7) of an electron of energy E to the wavelength of a photon of the same energy is
$$\frac{\lambda_e}{\lambda_\gamma} = \sqrt{\frac{E}{2m_ec^2}}\,.$$
For energies in the range of 10 eV to 10 keV this is very much less than one, giving electron microscopes much better resolution than microscopes using photons of the same energy.
5 C. Davisson and L. Germer, Phys. Rev. 30, 707 (1927).
Figure 5.1 Derivation of the Bragg formula. The bold lines represent the planes of the crystal lattice, seen edge on. Arrows indicate the direction of the light rays.
Appendix: Derivation of the Bragg Formula
Suppose that a wave of some sort is incident on a crystal lattice, with a ray
striking one plane of the lattice at point A, where it makes an angle φ between
the ray and the plane. (See Figure 5.1.) Part of the wave is reflected, with the
reflected ray making the same angle φ with the plane. Another part of the wave
continues in its original direction to the next plane, with the ray striking this
plane at point B, again at angle φ. Part of this ray is reflected at B, again at
angle φ, while another part continues to deeper planes. Draw a line from B in
a direction normal to the first reflected ray, intersecting this ray at a point C.
The purpose of this construction is that the two parallel reflected rays travel the
same distance from B and C to any distant detector, so the difference in the total
distance that each ray travels to the detector is AB − AC. The two rays interfere
constructively if this difference is a whole number n of wavelengths λ:
AB − AC = nλ , n = 1, 2, 3, . . .
If this is satisfied, then the difference in the distance traveled by rays reflected
from the second and third planes will also be nλ, and so on down into deeper
and deeper planes, so all these reflected waves will interfere constructively. The
same is true of any rays that strike the crystal along parallel directions, whether
they are reflected from the first, second, or any other crystal plane. We then have
a very strong enhancement of the reflection. So we have to ask, how do AB and
AC depend on the lattice spacing and on the angle φ between the rays and the
crystal planes?
Draw a line from A to the second plane, which intersects it at a right angle at a point D. The length of the line AD is the spacing d of lattice planes. Looking at the right triangle ADB (whose hypotenuse is AB) we see that
$$AB = d/\sin\phi\,.$$
To calculate AC, note that the angle at B between BA and BC is $180^\circ - 2\phi - 90^\circ = 90^\circ - 2\phi$, so looking at the right triangle BAC (whose hypotenuse is AB), we see that
$$AC = AB\sin(90^\circ - 2\phi) = AB\cos(2\phi) = AB\left[1 - 2\sin^2\phi\right]$$
so
$$AB - AC = 2AB\sin^2\phi = 2d\sin\phi$$
and the condition for constructive interference is therefore
$$n\lambda = 2d\sin\phi\,.$$
It is common to describe the reflection in terms of the angle $\theta$ between the incident and reflected rays, $\theta = 180^\circ - 2\phi$, so $\phi = 90^\circ - \theta/2$, and the condition for constructive interference is then
$$n\lambda = 2d\cos(\theta/2)\,,$$
as was to be shown.
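The geometric construction can also be verified with explicit coordinates. In this sketch the first lattice plane is taken as the line y = 0, the second as y = −d, and the point A as the origin; these coordinate choices and the function name `path_difference` are assumptions made for illustration only.

```python
import math

def path_difference(d, phi_deg):
    """Return AB - AC for lattice spacing d and glancing angle phi (degrees).

    A is the origin on the plane y = 0; B lies on the plane y = -d along the
    incident direction; C is the foot of the perpendicular from B onto the
    ray reflected at A, so AC is the projection of AB on that ray.
    """
    phi = math.radians(phi_deg)
    incident = (math.cos(phi), -math.sin(phi))    # heading down toward the second plane
    reflected = (math.cos(phi), math.sin(phi))    # ray reflected at A
    ab = d / math.sin(phi)                        # distance from A to the second plane
    b = (ab * incident[0], ab * incident[1])
    ac = b[0] * reflected[0] + b[1] * reflected[1]
    return ab - ac

d = 0.91
for phi_deg in (20.0, 40.0, 65.0):
    theta_deg = 180.0 - 2.0 * phi_deg
    print(phi_deg,
          path_difference(d, phi_deg),
          2.0 * d * math.sin(math.radians(phi_deg)),
          2.0 * d * math.cos(math.radians(theta_deg / 2.0)))
```

The three printed values in each row agree, illustrating both forms of the constructive-interference condition.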
5.2 The Schrödinger Equation
Wave Equation in a Potential

De Broglie in 1923 had described the wave function associated with a free
electron and had scored some success in applying this to the electron in a
hydrogen atom, imagining a free electron wave running around the electron
orbit. But of course electrons in atoms are not free. Starting in 1925, Erwin
Schrödinger struggled to extend the idea of the wave function to an electron
moving in a potential.6
Schrödinger’s starting point was de Broglie’s wave theory. Equation (5.1.2)
gives the wave function of a free electron of momentum p as

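$$\psi_{\mathbf{p}}(\mathbf{x},t) \propto \exp\left(i\mathbf{p}\cdot\mathbf{x}/\hbar - iE(\mathbf{p})t/\hbar\right)\,.$$
The content of this explicit formula can be expressed as a pair of differential equations
$$-i\hbar\nabla\psi_{\mathbf{p}}(\mathbf{x},t) = \mathbf{p}\,\psi_{\mathbf{p}}(\mathbf{x},t)\,, \tag{5.2.1}$$
$$i\hbar\frac{\partial}{\partial t}\psi_{\mathbf{p}}(\mathbf{x},t) = E(\mathbf{p})\,\psi_{\mathbf{p}}(\mathbf{x},t)\,, \tag{5.2.2}$$
where, now in a non-relativistic approximation,
$$E(\mathbf{p}) \simeq m_ec^2 + \frac{1}{2m_e}\mathbf{p}^2\,. \tag{5.2.3}$$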
An electron that is bound in an atom cannot have a definite momentum –
classically, it goes round and round its orbit – so we would not expect the bound
electron wave function to satisfy an equation like (5.2.1). On the other hand, we
can try to use something like Eq. (5.2.2) to find the wave function of a bound
electron, with Eq. (5.2.1) used only to interpret p as −i h̄∇ in E(p). Schrödinger
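thus took the equation for a bound electron as
$$i\hbar\frac{\partial}{\partial t}\psi(\mathbf{x},t) = E(-i\hbar\nabla,\mathbf{x})\,\psi(\mathbf{x},t)\,, \tag{5.2.4}$$
where now E is given a dependence on $\mathbf{x}$ to account for the presence of potential energy. For a non-relativistic electron in a potential $V(\mathbf{x})$ the energy is $E(\mathbf{p},\mathbf{x}) = m_ec^2 + \mathbf{p}^2/2m_e + V(\mathbf{x})$, and Eq. (5.2.4) reads
$$i\hbar\frac{\partial}{\partial t}\psi(\mathbf{x},t) = \left[m_ec^2 - \frac{\hbar^2}{2m_e}\nabla^2 + V(\mathbf{x})\right]\psi(\mathbf{x},t)\,. \tag{5.2.5}$$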
This is known as the time-dependent Schrödinger equation.
Because the potential is assumed time-independent, Eq. (5.2.5) has solutions of the form
$$\psi(\mathbf{x},t) = \exp\left(-i(m_ec^2 + E)t/\hbar\right)\psi(\mathbf{x})\,, \tag{5.2.6}$$
where
$$\left[-\frac{\hbar^2}{2m_e}\nabla^2 + V(\mathbf{x})\right]\psi(\mathbf{x}) = E\,\psi(\mathbf{x})\,. \tag{5.2.7}$$
This is known as the time-independent Schrödinger equation. It is interpreted as the condition for $\psi(\mathbf{x})$ to represent a state with definite energy E, relative to the rest-mass energy $m_ec^2$.
6 E. Schrödinger, Ann. Phys. 79, 361, 409 (1926).
Boundary Conditions
This all began as just guesswork. Schrödinger and other physicists at first imag-
ined that ψ(x, t) gives an indication of how much of the electron is near x at
time t. As we will see when we come to scattering in Section 5.6, it was only
a few years later that Max Born correctly interpreted $|\psi(\mathbf{x},t)|^2$ as a probability density – that is, $|\psi(\mathbf{x},t)|^2\,d^3x$ is the probability that the electron is in a small volume $d^3x$ at position $\mathbf{x}$ and time t. For the present, all we need to know is that the relevant solutions of Eq. (5.2.7) are those with
$$\int|\psi(\mathbf{x})|^2\,d^3x < \infty \tag{5.2.8}$$
so that by dividing $\psi(\mathbf{x})$ by the square root of this integral we obtain a normalized wave function, corresponding to a probability density for which the total probability of the electron being somewhere is 100%.
If we assume (as is generally the case in practice) that $V(\mathbf{x})$ vanishes at large distances $|\mathbf{x}|$, then at large $|\mathbf{x}|$ Eq. (5.2.7) becomes
$$E\,\psi(\mathbf{x}) \to -\frac{\hbar^2}{2m_e}\nabla^2\psi(\mathbf{x})\,. \tag{5.2.9}$$
A bound electron must have E < 0 (since otherwise it would be energetically possible for the electron to escape to infinite distance) so Eq. (5.2.9) has solutions that at large $|\mathbf{x}|$ behave as
$$\psi(\mathbf{x}) \to P(\mathbf{x})\exp(\pm\kappa|\mathbf{x}|) \tag{5.2.10}$$
where $\kappa$ is the positive square root,
$$\kappa = +\sqrt{2m_e|E|/\hbar^2}\,, \tag{5.2.11}$$
and $P(\mathbf{x})$ is some function such as a polynomial that varies much more slowly than an exponential for large $|\mathbf{x}|$. (A derivative $\partial/\partial x_i$ acting on the exponential in Eq. (5.2.10) yields a constant factor $\pm\kappa$, while a gradient acting on a function $P(\mathbf{x})$ in Eq. (5.2.10) that grows as a power of $|\mathbf{x}|$ gives a factor for $|\mathbf{x}| \to \infty$ proportional to $1/|\mathbf{x}|$.) Solutions of the time-independent Schrödinger equation thus come in pairs, one of which (the one with a minus sign in the exponential in Eq. (5.2.10)) satisfies the condition (5.2.8) at least as far as convergence at large $|\mathbf{x}|$ is concerned, while the other does not.
We shall see that there is also a smoothness condition on ψ(x) at x → 0
that must be imposed on the wave function. We can always find solutions of
the Schrödinger equation (5.2.7) that satisfy either this condition at x → 0 or
the condition that ψ(x) ∝ exp(−κ|x|) at x → ∞, but we cannot impose both

conditions except for certain discrete values of E. These are the allowed energy
levels of the bound electron. Schrödinger was justly proud that the existence
of discrete energy levels, and hence the existence of atomic spectra discovered
over a century earlier, were now explained as a mathematical consequence of
boundary conditions imposed on a wave equation, rather than Bohr’s ad hoc
assumption of angular momentum quantization.
Spherical Symmetry
We now specialize to the case of spherical symmetry, which applies in one-
electron atoms (and approximately for each electron in atoms with many elec-
trons), for which the potential is only a function of r = |x|. There is a
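mathematical identity that is useful for a wide variety of problems with spherical symmetry in various branches of mathematical physics:
$$\nabla^2f(\mathbf{x}) = \frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial f(\mathbf{x})}{\partial r}\right) + \frac{1}{r^2}(\mathbf{x}\times\nabla)^2f(\mathbf{x}) \tag{5.2.12}$$
where $f(\mathbf{x})$ is an arbitrary differentiable function of position. (This can be derived in the same way that in ordinary vector algebra we derive the familiar identity $(\mathbf{a}\times\mathbf{b})^2 = \mathbf{a}^2\mathbf{b}^2 - (\mathbf{a}\cdot\mathbf{b})^2$, but here keeping track of the order of the position variable and derivatives that act on it.) As already mentioned, Schrödinger assumed that $-i\hbar\nabla$ should be interpreted as the operator representing the momentum, so the operator representing the orbital angular momentum is
$$\mathbf{L} \equiv -i\hbar\,\mathbf{x}\times\nabla\,, \tag{5.2.13}$$
and we can write Eq. (5.2.12) as the identity
$$\nabla^2f(\mathbf{x}) = \frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial f(\mathbf{x})}{\partial r}\right) - \frac{1}{\hbar^2r^2}\mathbf{L}^2f(\mathbf{x})\,. \tag{5.2.14}$$
The time-independent Schrödinger equation (5.2.7) thus takes the form
$$-\frac{\hbar^2}{2m_e}\frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial\psi(\mathbf{x})}{\partial r}\right) + \frac{1}{2m_er^2}\mathbf{L}^2\psi(\mathbf{x}) + V(r)\psi(\mathbf{x}) = E\,\psi(\mathbf{x})\,. \tag{5.2.15}$$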
Radial and Angular Wave Functions

In order for the gradient operator in the Schrödinger equation to be well-defined,
we need the wave function to be analytic in x, by which is meant that it can be
expanded in a power series about any point, and in particular about the origin
x = 0. As we shall see, this condition can be imposed on the wave function
unless the potential is very singular.
Suppose that for some particular wave function, the smallest power of $\mathbf{x}$ in the expansion of the wave function around the origin is some integer $\ell = 0, 1, 2, \ldots$ Then for $\mathbf{x} \to 0$ the wave function is dominated by a homogeneous polynomial of order $\ell$ in the coordinates $x_i$ – that is, a sum of terms each of which has $\ell$ factors of the coordinates $x_i$. For instance, a homogeneous polynomial in $\mathbf{x}$ of order zero is a constant, a homogeneous polynomial in $\mathbf{x}$ of order one is a linear combination of $x_1$, $x_2$, and $x_3$, and a homogeneous polynomial in $\mathbf{x}$ of order two is a linear combination of
$$x_1^2\,,\ x_2^2\,,\ x_3^2\,,\ x_1x_2\,,\ x_2x_3\,,\ x_3x_1\,.$$
Defining a radial coordinate $r \equiv |\mathbf{x}|$ and a unit vector $\hat{\mathbf{x}} \equiv \mathbf{x}/r$, the wave function for $r \to 0$ can now be written
$$\psi(\mathbf{x}) \to r^\ell\,Y(\hat{\mathbf{x}})\,, \tag{5.2.16}$$
where Y is some homogeneous polynomial of order $\ell$ in the unit vector $\hat{\mathbf{x}}$. (As we shall see, for $\ell \ge 1$ there is more than one such polynomial, which will later have to be distinguished by attaching an additional label to Y.)
Just knowing the value of $\ell$ is enough to tell us how $\mathbf{L}^2$ acts on the wave function. Note that $\mathbf{L}$ does not act on functions of r, because $\mathbf{L}f(r) = -i\hbar\,(\mathbf{x}\times\hat{\mathbf{x}})f'(r) = 0$, so $\mathbf{L}^2$ acts only on the direction $\hat{\mathbf{x}}$. For a wave function that goes as (5.2.16) for $r \to 0$, the first two terms on the left of Eq. (5.2.15) go as
$$-\frac{\hbar^2}{2m_e}\frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial\psi(\mathbf{x})}{\partial r}\right) \to -\frac{\hbar^2\,\ell(\ell+1)}{2m_e}\,r^{\ell-2}\,Y(\hat{\mathbf{x}})\,,$$
$$\frac{1}{2m_er^2}\mathbf{L}^2\psi(\mathbf{x}) \to \frac{1}{2m_e}\,r^{\ell-2}\,\mathbf{L}^2Y(\hat{\mathbf{x}})\,,$$
while $E\psi$ and (as long as the potential does not blow up as fast as $1/r^2$ for $r \to 0$) also $V(r)\psi$ are negligible for $r \to 0$ compared with $r^{\ell-2}$. Hence the time-independent Schrödinger equation (5.2.15) requires that
$$\mathbf{L}^2Y(\hat{\mathbf{x}}) = \hbar^2\,\ell(\ell+1)\,Y(\hat{\mathbf{x}})\,. \tag{5.2.17}$$
We can therefore find solutions of the Schrödinger equation (5.2.15) of the form
$$\psi(\mathbf{x}) = R(r)\,Y(\hat{\mathbf{x}})\,, \tag{5.2.18}$$
where R(r) satisfies the radial wave equation
$$-\frac{\hbar^2}{2m_e}\frac{1}{r^2}\frac{d}{dr}\left(r^2\frac{dR(r)}{dr}\right) + \frac{\hbar^2\,\ell(\ell+1)}{2m_er^2}R(r) + V(r)R(r) = E\,R(r)\,, \tag{5.2.19}$$
with boundary conditions
$$R(r) \propto r^\ell \ \text{for}\ r \to 0\,,\qquad R(r) \propto P(r)\exp(-\kappa r) \ \text{for}\ r \to \infty\,.$$
Here the term proportional to $\ell(\ell+1)$ acts as a positive and hence repulsive potential arising from the centrifugal force acting on an electron with non-zero angular momentum. The function R(r) will turn out to depend on an index in addition to $\ell$, on which the energy also depends.
Angular Multiplicity
As we will see when we come to the periodic table of elements, it is important to know the number of independent solutions of Eq. (5.2.17) for a given $\ell$. For this purpose, it is convenient first to recast Eq. (5.2.17) as a condition on $r^\ell\,Y(\hat{\mathbf{x}})$, a homogeneous polynomial of order $\ell$ in the three-vector $\mathbf{x}$. From Eq. (5.2.14), we see that Eq. (5.2.17) is equivalent to the condition
$$\nabla^2\left[r^\ell\,Y(\hat{\mathbf{x}})\right] = 0\,. \tag{5.2.20}$$
To distinguish among the solutions of Eq. (5.2.20), it is convenient to consider the action of the operator $L_3 \equiv -i\hbar\,(x_1\,\partial/\partial x_2 - x_2\,\partial/\partial x_1)$. Note that
$$L_3(x_1 \pm ix_2) = -i\hbar\,(-x_2 \pm ix_1) = \pm\hbar\,(x_1 \pm ix_2)\,,\qquad L_3x_3 = 0\,.$$
We can take a complete set of independent homogeneous polynomials in $\mathbf{x}$ of order $\ell$ as the products of $\nu_\pm$ factors of $x_1 \pm ix_2$ and $\ell - \nu_+ - \nu_-$ factors of $x_3$ for various non-negative integers $\nu_\pm$. The action of $L_3$ on these products is
$$L_3\left[(x_1 + ix_2)^{\nu_+}(x_1 - ix_2)^{\nu_-}x_3^{\ell-\nu_+-\nu_-}\right] = \hbar\,(\nu_+ - \nu_-)\,(x_1 + ix_2)^{\nu_+}(x_1 - ix_2)^{\nu_-}x_3^{\ell-\nu_+-\nu_-}\,.$$
Now, for an arbitrary function $f(\mathbf{x})$,
$$\frac{\partial^2}{\partial x_1^2}(L_3f) = -2i\hbar\frac{\partial^2f}{\partial x_1\partial x_2} + L_3\frac{\partial^2f}{\partial x_1^2}\,,$$
$$\frac{\partial^2}{\partial x_2^2}(L_3f) = +2i\hbar\frac{\partial^2f}{\partial x_2\partial x_1} + L_3\frac{\partial^2f}{\partial x_2^2}\,,$$
$$\frac{\partial^2}{\partial x_3^2}(L_3f) = L_3\frac{\partial^2f}{\partial x_3^2}\,,$$
so
$$\nabla^2(L_3f) = L_3\nabla^2f\,.$$
(We shall see in Section 5.4 that this is just a consequence of the rotational invariance of the Laplacian $\nabla^2$.) It follows that when $\nabla^2$ acts on a sum of terms of the form $(x_1 + ix_2)^{\nu_+}(x_1 - ix_2)^{\nu_-}x_3^{\ell-\nu_+-\nu_-}$, all with the same value of $\nu_+ - \nu_-$, it gives a sum of terms that again all have that value of $\nu_+ - \nu_-$. We can therefore find solutions of Eq. (5.2.20) that are sums of products of coordinates all with the same value of $m \equiv \nu_+ - \nu_-$, and label the solutions as $r^\ell\,Y_\ell^m(\hat{\mathbf{x}})$, where
$$L_3Y_\ell^m(\hat{\mathbf{x}}) = \hbar m\,Y_\ell^m(\hat{\mathbf{x}})\,.$$
How many solutions of Eq. (5.2.20) are there for a given ? First let’s ask
how many independent homogeneous polynomials in x of order of the form
−ν −ν
(x1 + ix2 )ν+ (x1 − ix2 )ν− x3 + − there are for a given . The exponent ν+
can be any integer from 0 to and, for a given ν+ , the exponent ν− can be any
integer from 0 to − ν+ , so the number of these independent homogeneous
polynomials of order in x is therefore
+
−ν

( + 1)
N = 1= ( − ν+ + 1) = ( + 1)2 −
2
ν+ =0 ν− =0 ν+ =0
( + 1)( + 2)
= .
2
We also have to impose the condition (5.2.20). The function $\nabla^2\big(r^\ell Y_\ell(\hat{x})\big)$ is
itself a homogeneous polynomial of order ℓ − 2 in the three-vector x, so setting
this function equal to zero imposes $N_{\ell-2}$ conditions on $r^\ell Y_\ell(\hat{x})$, and the number
of independent $Y_\ell$ subject to these conditions is thus
$$N_\ell - N_{\ell-2} = \frac{(\ell+1)(\ell+2)}{2} - \frac{\ell(\ell-1)}{2} = 2\ell + 1\,.$$
But this is the same as the number of possible values of $m \equiv \nu_+ - \nu_-$, ranging
from m = −ℓ to m = +ℓ, so there must be just one function $Y_\ell^m(\hat{x})$ for each
ℓ and m.
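The counting argument can be checked by brute force. The following is a minimal sketch, not part of the original derivation; the function names are hypothetical. It enumerates the exponent pairs (ν+, ν−) directly and compares the result with the closed forms above.

```python
def count_monomials(ell):
    """Number of products (x1+i x2)^a (x1-i x2)^b x3^(ell-a-b) with a, b >= 0, a + b <= ell."""
    if ell < 0:
        return 0  # there are no polynomials of negative order
    return sum(1 for a in range(ell + 1) for b in range(ell - a + 1))


def count_harmonics(ell):
    """Monomials of order ell minus the conditions imposed by the Laplace equation."""
    return count_monomials(ell) - count_monomials(ell - 2)


def m_values(ell):
    """Distinct values of m = nu_plus - nu_minus that occur at order ell."""
    return sorted({a - b for a in range(ell + 1) for b in range(ell - a + 1)})


for ell in range(5):
    print(ell, count_monomials(ell), count_harmonics(ell), m_values(ell))
```

For each ℓ the second count printed equals the number of distinct m values, which is the statement that there is one harmonic per value of m.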
The index m does not appear in Eq. (5.2.19), so the 2ℓ + 1 states that differ
only in the value of m all have the same energy as long as the spherical
symmetry of the atom is maintained. The degeneracy of these states can be lifted
by exposing the atom to an external perturbation which marks out a preferred
direction, in which case the energies of the different states will be split from one
another. Where the external perturbation is a magnetic field this is known as the
Zeeman effect, after Pieter Zeeman (1865–1943), who first reported it in 1897.
It was not possible to understand the details of the splitting of energy levels
in the Zeeman effect until the discovery of electron spin, to be discussed in
Section 5.4. We will calculate the Zeeman effect in Section 5.9, using the meth-
ods of perturbation theory, as an application of the quantum theory of the inter-
action of electrons with electromagnetic fields, to be described in Section 5.8.
Spherical Harmonics
Explicit formulas for spherical harmonics are needed in some applications of
quantum mechanics but they are not needed in the calculations of energy levels
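in one-electron atoms. Nevertheless, to make this discussion of angular
dependence concrete, we give here all the spherical harmonics for ℓ = 0, ℓ = 1, and
ℓ = 2:
$$\begin{aligned}
Y_0^0 &= \sqrt{\frac{1}{4\pi}}\,,\\
Y_1^1 &= -\sqrt{\frac{3}{8\pi}}\,(\hat{x}_1 + i\hat{x}_2) = -\sqrt{\frac{3}{8\pi}}\,\sin\theta\,e^{i\phi}\,,\\
Y_1^0 &= \sqrt{\frac{3}{4\pi}}\,\hat{x}_3 = \sqrt{\frac{3}{4\pi}}\,\cos\theta\,,\\
Y_1^{-1} &= \sqrt{\frac{3}{8\pi}}\,(\hat{x}_1 - i\hat{x}_2) = \sqrt{\frac{3}{8\pi}}\,\sin\theta\,e^{-i\phi}\,,\\
Y_2^2 &= \sqrt{\frac{15}{32\pi}}\,(\hat{x}_1 + i\hat{x}_2)^2 = \sqrt{\frac{15}{32\pi}}\,(\sin\theta)^2\,e^{2i\phi}\,,\\
Y_2^1 &= -\sqrt{\frac{15}{8\pi}}\,(\hat{x}_1 + i\hat{x}_2)\,\hat{x}_3 = -\sqrt{\frac{15}{8\pi}}\,\sin\theta\cos\theta\,e^{i\phi}\,,\\
Y_2^0 &= \sqrt{\frac{5}{16\pi}}\,(2\hat{x}_3^2 - \hat{x}_1^2 - \hat{x}_2^2) = \sqrt{\frac{5}{16\pi}}\,\big(3(\cos\theta)^2 - 1\big)\,,\\
Y_2^{-1} &= \sqrt{\frac{15}{8\pi}}\,(\hat{x}_1 - i\hat{x}_2)\,\hat{x}_3 = \sqrt{\frac{15}{8\pi}}\,\sin\theta\cos\theta\,e^{-i\phi}\,,\\
Y_2^{-2} &= \sqrt{\frac{15}{32\pi}}\,(\hat{x}_1 - i\hat{x}_2)^2 = \sqrt{\frac{15}{32\pi}}\,(\sin\theta)^2\,e^{-2i\phi}\,.
\end{aligned}$$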
They are written in terms of the angles appearing in spherical polar coordinates:
x1 = r sin θ cos φ , x2 = r sin θ sin φ , x3 = r cos θ . (5.2.21)
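The numerical factors have been chosen to make the spherical harmonics
orthonormal, in the sense that
$$\int_0^{2\pi} d\phi \int_0^{\pi} \sin\theta\, d\theta\; Y_{\ell'}^{m'*}(\theta,\phi)\, Y_\ell^m(\theta,\phi) = \delta_{\ell\ell'}\,\delta_{mm'}\,. \tag{5.2.22}$$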
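The orthonormality condition (5.2.22) can be checked numerically for the nine harmonics with ℓ ≤ 2. The sketch below is illustrative only: the function names, grid sizes, and the choice of a midpoint rule in θ with a uniform rule in φ are assumptions made for this check, not anything prescribed by the text.

```python
import cmath
import math


def Y(l, m, theta, phi):
    """Spherical harmonics for l = 0, 1, 2, using the standard normalization factors."""
    s, c = math.sin(theta), math.cos(theta)
    pi = math.pi
    polar = {
        (0, 0): math.sqrt(1 / (4 * pi)),
        (1, 1): -math.sqrt(3 / (8 * pi)) * s,
        (1, 0): math.sqrt(3 / (4 * pi)) * c,
        (1, -1): math.sqrt(3 / (8 * pi)) * s,
        (2, 2): math.sqrt(15 / (32 * pi)) * s * s,
        (2, 1): -math.sqrt(15 / (8 * pi)) * s * c,
        (2, 0): math.sqrt(5 / (16 * pi)) * (3 * c * c - 1),
        (2, -1): math.sqrt(15 / (8 * pi)) * s * c,
        (2, -2): math.sqrt(15 / (32 * pi)) * s * s,
    }
    return polar[(l, m)] * cmath.exp(1j * m * phi)


def overlap(l1, m1, l2, m2, n_theta=200, n_phi=8):
    """Approximate the double integral in Eq. (5.2.22) on a simple grid."""
    d_theta = math.pi / n_theta
    d_phi = 2 * math.pi / n_phi
    total = 0j
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta          # midpoint rule in theta
        for j in range(n_phi):
            phi = j * d_phi                  # uniform rule in phi
            total += (Y(l1, m1, theta, phi).conjugate()
                      * Y(l2, m2, theta, phi) * math.sin(theta))
    return total * d_theta * d_phi


labels = [(l, m) for l in range(3) for m in range(-l, l + 1)]
max_deviation = max(
    abs(overlap(*a, *b) - (1.0 if a == b else 0.0))
    for a in labels for b in labels
)
print(f"largest deviation from the Kronecker deltas: {max_deviation:.2e}")
```

The deviation that remains comes from the finite θ grid and shrinks as `n_theta` is increased.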
Hydrogenic Energy Levels

Let’s now specialize further to the case of a one-electron atom, such as neutral
hydrogen, singly ionized helium, etc., with nuclear charge Ze. The electrostatic
potential felt by the electron is then the Coulomb potential V (r) = −Ze2 /r.
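We seek a solution of the form (5.2.18), $\psi(\mathbf{x}) = R(r)\,Y_\ell^m(\hat{x})$, where R(r) is
some function only of r. Then the radial wave equation (5.2.19) reads
$$-\frac{\hbar^2}{2m_e}\,\frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial R(r)}{\partial r}\right) + \frac{\ell(\ell+1)\hbar^2}{2m_e r^2}\,R(r) - \frac{Ze^2 R(r)}{r} = E\,R(r)\,. \tag{5.2.23}$$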
We can easily find a family of exact solutions of this equation. Recall that,
according to the definition of ℓ, for r → 0 we have $R(r) \propto r^\ell$, while as shown
above, for r → ∞ the function R(r) goes as exp(−κr) times some function
of r that grows more slowly than exp(κr). So let us try for a solution of the
form
$$R(r) = r^\ell \exp(-\kappa r)\,. \tag{5.2.24}$$
The first term in Eq. (5.2.23) then contains a contribution in which both deriva-
tives act on the exponential, which takes the form −(h̄2 κ 2 /2me )R(r), which
according to Eq. (5.2.11) just matches ER(r) on the right-hand side. It also
contains a contribution in which both derivatives act on powers of r; this gives
a contribution

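$$-\frac{\hbar^2 \exp(-\kappa r)}{2m_e}\,\frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial r^\ell}{\partial r}\right) = -\frac{\hbar^2\,\ell(\ell+1)\,R(r)}{2m_e r^2}\,,$$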
which cancels the second term on the left-hand side of Eq. (5.2.23). The first
term in Eq. (5.2.23) also contains contributions in which one derivative acts on
exp(−κr), giving a factor −κ, while the other derivative acts on a power of r,
giving a factor [(ℓ + 2) + ℓ]/r = 2(ℓ + 1)/r, so these contributions add up to
$$+\frac{2(\ell+1)\hbar^2\kappa}{2m_e r}\,R(r)\,.$$
This remaining contribution must cancel the Coulomb term −Ze2 R(r)/r, so
the necessary and sufficient condition for a solution of the form (5.2.24) is
$$\frac{(\ell+1)\hbar^2\kappa}{m_e} = Ze^2\,.$$
We conclude that these solutions have
$$E = -\frac{\hbar^2\kappa^2}{2m_e} = -\frac{Z^2 e^4 m_e}{2\hbar^2(\ell+1)^2}\,, \tag{5.2.25}$$
which agrees with the Bohr formula (3.4.6) if we identify n with ℓ + 1.
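As a numerical cross-check, one can insert the trial function $R(r) = r^\ell \exp(-\kappa r)$ with $\kappa = Ze^2 m_e/(\ell+1)\hbar^2$ into the radial equation (5.2.23) and compare both sides. The sketch below assumes units with ħ = m_e = e = 1 and uses central finite differences; the function name, the sample radius, and the step size are arbitrary illustrative choices.

```python
import math


def check_radial_solution(ell, Z=1.0, r=1.7, h=1e-3):
    """Return (lhs, rhs) of Eq. (5.2.23) for R = r^ell exp(-kappa r), with hbar = m_e = e = 1."""
    kappa = Z / (ell + 1)          # from (ell + 1) hbar^2 kappa / m_e = Z e^2
    energy = -0.5 * kappa ** 2     # E = -hbar^2 kappa^2 / (2 m_e)

    def R(x):
        return x ** ell * math.exp(-kappa * x)

    # Central differences for (1/r^2) d/dr (r^2 dR/dr).
    flux_plus = (r + h / 2) ** 2 * (R(r + h) - R(r)) / h
    flux_minus = (r - h / 2) ** 2 * (R(r) - R(r - h)) / h
    radial_laplacian = (flux_plus - flux_minus) / (h * r ** 2)

    lhs = (-0.5 * radial_laplacian
           + ell * (ell + 1) / (2 * r ** 2) * R(r)
           - Z * R(r) / r)
    return lhs, energy * R(r)


for ell in range(4):
    lhs, rhs = check_radial_solution(ell)
    print(f"ell={ell}: lhs={lhs:.8f}  E*R={rhs:.8f}")
```

The two columns agree up to the discretization error of the finite differences, for each ℓ tried.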
This is not the only class of solutions. It is straightforward though tedious to
show that, in addition to solutions of the form (5.2.24), there are more general
solutions of the form
$$R(r) = r^\ell\, P_{\ell,\nu}(r)\, \exp(-\kappa r)\,, \tag{5.2.26}$$
where $P_{\ell,\nu}(r)$ is a polynomial of order ν. Without actually constructing these
polynomials, we can relate the energy to ℓ and ν by considering Eq. (5.2.23) in
the limit r → ∞. In this limit, Eq. (5.2.26) gives
$$R(r) \propto r^{\ell+\nu} \exp(-\kappa r)\,.$$

By repeating the arguments previously applied to the solution (5.2.24), but now
only for r → ∞, we see that the first term in Eq. (5.2.23) contains a contribution
in which both derivatives act on the exponential, which matches the term ER
on the right-hand side, and another contribution in which both derivatives act
on powers of r, which is negligible for r → ∞, as is the centrifugal potential
and terms in which one or two derivatives act on sub-leading terms in $P_{\ell,\nu}(r)$.
This leaves the potential term and the part of the first term in Eq. (5.2.23) in
which one derivative acts on the exponential and the other on the leading power
of r in $P_{\ell,\nu}(r)$, which gives a contribution that cancels the potential term if
$(\ell + \nu + 1)\hbar^2\kappa/m_e = Ze^2$, or in other words if E is given by the Bohr formula
$$E = -\frac{\hbar^2\kappa^2}{2m_e} = -\frac{Z^2 e^4 m_e}{2\hbar^2 n^2}\,, \tag{5.2.27}$$
where now
$$n = \nu + \ell + 1\,, \tag{5.2.28}$$
with ν the order of the polynomial in Eq. (5.2.26), and hence a non-negative
integer.
The positive-definite integer n ≥ ℓ + 1 defined by Eq. (5.2.28), on which
the energy solely depends, is known as the principal quantum number.
Spectroscopists have developed a terminology, in which the letters s, p, d, f and so
on stand for ℓ = 0, 1, 2, 3, etc. A state is labeled first with n, and then with
the letter indicating ℓ, so in hydrogen the states are 1s, 2s, 2p, 3s, 3p, 3d, and
so on.
We can now work out the degeneracy of these energy levels. Since the energy
depends only on n, according to Eq. (5.2.28) for each energy we can have ℓ
equal to anything between ℓ = 0 and ℓ = n − 1 (for which respectively ν = n − 1
and ν = 0). We have seen that for each ℓ there are 2ℓ + 1 states distinguished by
different values of m. So, according to this reasoning, the total number of states
for a given energy and hence a given value of n is
$$\#_n = \sum_{\ell=0}^{n-1}(2\ell+1) = 2\,\frac{(n-1)n}{2} + n = n^2\,. \tag{5.2.29}$$
As we will see in Section 5.4, because electron spin has been left out in this
calculation the degeneracy (5.2.29) is too small by a factor 2.
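The count (5.2.29) and the spectroscopic labels can be reproduced by listing the states explicitly. This is a small illustrative sketch with hypothetical names; it ignores spin, as the calculation above does, and assumes the conventional continuation g, h of the letter sequence beyond f.

```python
SPECTROSCOPIC_LETTERS = "spdfgh"  # letters for ell = 0, 1, 2, 3, 4, 5


def states(n):
    """All (label, ell, m) combinations for principal quantum number n, without spin."""
    result = []
    for ell in range(n):                      # ell runs from 0 to n - 1
        label = f"{n}{SPECTROSCOPIC_LETTERS[ell]}"
        for m in range(-ell, ell + 1):        # 2 ell + 1 values of m
            result.append((label, ell, m))
    return result


for n in range(1, 5):
    levels = states(n)
    labels = sorted({label for label, _, _ in levels})
    print(n, labels, len(levels))
```

The last number on each printed line is the degeneracy for that n, which can be compared with n².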
5.3 General Principles of Quantum Mechanics
As we have seen in the story so far, quantum mechanics began with guesswork:
Einstein's guess that the energy and momentum of light waves come in
particles; Bohr's guess that if the energy and momentum of radiation are
quantized then so are other things, such as the angular momentum of electrons
in atomic orbits; de Broglie’s guess that if electromagnetic waves consist of
particles then particles such as electrons behave like waves; and Schrödinger’s
guess that the differential equations for de Broglie’s waves could be modified for
atomic electrons by inserting a potential. It is time that we move on from this
guesswork and describe the general principles of quantum mechanics as they
emerged in the formalism of wave mechanics soon after 1925. Then in following
sections we shall go on to applications of quantum mechanics in contexts more
general than those considered so far.
States and Wave Functions

The first general principle of wave mechanics is that physical states are repre-
sented by wave functions, functions ψ(x1 , x2 , . . . ), with one coordinate argu-
ment for each particle in the system (and, as we shall see in the next section,
with the wave function depending also on the 3-component of each particle’s
spin angular momentum). As anticipated in Section 5.2 for the case of single-
particle wave functions, the probability in a state represented by wave function
ψ that one particle is in a small volume d 3 x1 around x1 , another particle is in a
small volume d 3 x2 around x2 , and so on, is
dP = |ψ(x1 , x2 , . . . )|2 d 3 x1 d 3 x2 · · · . (5.3.1)
Since with 100% probability the particles have to be somewhere, this requires
the wave function to satisfy the normalization condition

$$\int d^3x_1\, d^3x_2 \cdots\, |\psi(x_1, x_2, \dots)|^2 = 1\,. \tag{5.3.2}$$
Two wave functions that differ only by a constant phase factor of absolute value
unity represent the same state. In solving differential equations for the wave
function, the important thing is that the integral (5.3.2) should be finite – in
that case we can always find a ψ that satisfies Eq. (5.3.2) by dividing the wave
function by the square root of this integral.
Observables and Operators

The second general principle of wave mechanics is that observable physical
quantities are represented by linear operators on these wave functions, here
generally distinguished by upper case letters. By an operator A being “linear”
is meant that, for any pair of wave functions ψ1 and ψ2 and numbers a1 and a2 ,
we have
A[a1 ψ1 + a2 ψ2 ] = a1 Aψ1 + a2 Aψ2 . (5.3.3)
As part of this principle, a state represented by a wave function ψ has a definite

value α for the observable represented by an operator A if and only if
Aψ = αψ . (5.3.4)
In this case we say that ψ is an eigenfunction of A with eigenvalue α.
For instance, the operator Pnj that represents the j th component (with
j = 1, 2, 3) of the momentum of the nth particle acts as −i h̄ times the partial
derivative ∂/∂xnj with respect to the j th component of the coordinate of the nth
particle, acting on whatever function is to the right. This is clearly linear in the
sense of Eq. (5.3.3). In order for a one-particle state to have a definite value p
for the momentum of the particle, it is necessary that the wave function should
satisfy an equation of the form (5.3.4), which in this case reads
−i h̄∇ψ(x) = pψ(x) ,
which has as a solution the de Broglie wave function
ψ(x) ∝ exp(ip · x/h̄) .
This raises a problem, which is endemic to values of observables that like
momentum lie in a continuous spectrum of possible values: the integral (5.3.2)
is infinite for a wave function of this form and therefore cannot be normalized.
But we can find a wave function that is arbitrarily close to this form for an
arbitrarily large range of position:
$$\psi(\mathbf{x}) = (\sqrt{\pi}\,L)^{-3/2} \exp(-\mathbf{x}^2/2L^2)\, \exp(i\mathbf{p}\cdot\mathbf{x}/\hbar)\,,$$
in which the constant factor $(\sqrt{\pi}\,L)^{-3/2}$ is chosen so that this ψ satisfies the
normalization condition (5.3.2). The constant L can be chosen as some very
large length, in which case the particle is almost certainly in a very large volume
L3 , where it almost certainly has the momentum p.
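The normalization of this Gaussian wave packet follows from the one-dimensional Gaussian integral $\int_{-\infty}^{\infty} e^{-x^2/L^2}\,dx = \sqrt{\pi}\,L$. A short worked check, writing N for the constant prefactor (a symbol introduced only for this calculation):

```latex
\int d^3x\, |\psi(\mathbf{x})|^2
  = |N|^2 \left( \int_{-\infty}^{\infty} dx\; e^{-x^2/L^2} \right)^{3}
  = |N|^2 \left( \sqrt{\pi}\, L \right)^{3} = 1
\quad\Longrightarrow\quad
|N| = \left( \sqrt{\pi}\, L \right)^{-3/2} = \pi^{-3/4} L^{-3/2} .
```

The phase factor exp(ip · x/ħ) has unit modulus and so drops out of $|\psi|^2$; only the Gaussian envelope contributes to the integral.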
The operator Xnj that represents the j th component of the position vector of
the nth particle, acting on any function of position to its right such as a wave
function or the derivative of a wave function, simply multiplies that function by
the argument xnj and is obviously linear. Here again we have the problem that its
eigenfunctions cannot be normalized. In the one-particle case, a wave function
ψ(x) that represents a state with a definite position a would have Xψ(x) ≡
xψ(x) equal to aψ(x) for all x, so that it would have to vanish for all x ≠ a, and
the integral (5.3.2) would vanish. But we can find a normalized wave function
that represents a state in which the particle is almost certainly very close to
position a:
$$\psi(\mathbf{x}) = (\sqrt{\pi}\,d)^{-3/2} \exp\!\big(-(\mathbf{x} - \mathbf{a})^2/2d^2\big)\,,$$
where d is here some very small length.
From operators we can construct other operators, which may or may not
represent physical quantities. Linear combinations provide a trivial example:
if A and B are operators while a and b are ordinary complex numbers, then
aA + bB is an operator for which
[aA + bB]ψ = aAψ + bBψ .
The product AB of any two operators A and B is defined by associativity: it is
an operator that, acting on any function f to its right, gives the same result as
acting first with B and then acting to the right on Bf with A:
(AB)f ≡ A(Bf ) . (5.3.5)
The Hamiltonian
One linear operator formed in this way is the Hamiltonian, which represents the
energy. For instance, for a single non-relativistic particle moving in a potential
V the Hamiltonian is
$$H = \frac{1}{2m}\mathbf{P}^2 + V(\mathbf{X})\,. \tag{5.3.6}$$
The time-independent Schrödinger equation (5.2.7) is just the statement
H ψ = Eψ, which tells us that ψ represents a state with energy E. The eigen-
functions of this Hamiltonian with negative eigenvalues are normalizable, a con-
dition we imposed in finding the bound state energy values in Section 5.2, but
there are also eigenfunctions with positive eigenvalues, representing unbound
states, which can only be normalized in the same approximate sense as the
eigenfunctions of position and momentum.
Adjoints
There is another process for producing new operators from other operators,
analogous to taking the complex conjugate of a number. For any operator A,
we define the adjoint A† as the operator for which

$$\int [A\psi_1]^*\, \psi_2 = \int \psi_1^*\, [A^\dagger \psi_2] \tag{5.3.7}$$
where ψ1 and ψ2 are any two wave functions. Here and below we use the
abbreviation
$$\int \psi_1^* \psi_2 \equiv \int d^3x_1\, d^3x_2 \cdots\, \psi_1^*(x_1, x_2, \dots)\, \psi_2(x_1, x_2, \dots)\,. \tag{5.3.8}$$
It is easy to see that the adjoint of a product is the product of the adjoints in the
opposite order:
[AB]† = B † A† (5.3.9)
because

$$\int [AB\psi_1]^*\, \psi_2 = \int [B\psi_1]^*\, [A^\dagger \psi_2] = \int \psi_1^*\, [B^\dagger A^\dagger \psi_2]\,.$$
It is also obvious that the adjoint of a linear combination of operators is the same
linear combination of the adjoints, but with complex conjugate coefficients
[aA + bB]† = a ∗ A† + b∗ B † , (5.3.10)
and the adjoint of an adjoint gives back the original operator:
[A† ]† = A . (5.3.11)
There is an important class of linear operators that are their own adjoints
A† = A . (5.3.12)
A physical quantity represented by such an operator can only have real values
in any state, for if Aψ = αψ for some wave function ψ, then

$$\alpha \int \psi^*\psi = \int \psi^*[A\psi] = \int [A\psi]^*\psi = \alpha^* \int \psi^*\psi$$
so α = α*. Such operators are called self-adjoint, or Hermitian. The coordinate
operator $X_n$ is obviously self-adjoint, and the momentum operator $P_{ni}$ is too,
because the minus sign produced by taking the complex conjugate of −i is
cancelled by the minus sign produced by integration by parts:
$$\int [P_{ni}\psi_1]^*\psi_2 = \int [-i\hbar\nabla_{ni}\psi_1]^*\psi_2 = +i\hbar\int [\nabla_{ni}\psi_1]^*\psi_2
= -i\hbar\int \psi_1^*[\nabla_{ni}\psi_2] = \int \psi_1^*[P_{ni}\psi_2]\,.$$
Assuming the potential to be real, the Hamiltonian (5.3.6) is also self-adjoint,
so the allowed energy values are all real. Also, all components of the angular
momentum operator L = X × P for a single particle are self-adjoint. For
instance
$$L_3^\dagger = (X_1 P_2 - X_2 P_1)^\dagger = P_2 X_1 - P_1 X_2 = X_1 P_2 - X_2 P_1\,,$$
the last step being valid because P2 does not act on x1 and P1 does not act on
x2 ; both only act on whatever function of x that L3 is acting on. Likewise of
course for L1 and L2 .
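The self-adjointness of the momentum operator can be illustrated numerically in one dimension. The sketch below assumes ħ = 1 and two arbitrarily chosen wave functions that fall off rapidly at large |x| (so that the boundary term in the integration by parts vanishes); the function names and the finite-difference derivative are illustrative choices, not part of the text.

```python
import cmath
import math

HBAR = 1.0  # assumption: units with hbar = 1


def psi1(x):
    """A Gaussian with a position-dependent phase (arbitrary test function)."""
    return math.exp(-x * x / 2) * cmath.exp(0.7j * x)


def psi2(x):
    """A real, rapidly decaying test function (arbitrary choice)."""
    return x * math.exp(-(x - 0.5) ** 2)


def momentum(f, x, h=1e-4):
    """Apply P = -i hbar d/dx to f at x, using a central difference."""
    return -1j * HBAR * (f(x + h) - f(x - h)) / (2 * h)


def integrate(g, a=-12.0, b=12.0, n=4000):
    """Midpoint rule on [a, b]; the integrands here are negligible outside it."""
    step = (b - a) / n
    return sum(g(a + (k + 0.5) * step) for k in range(n)) * step


left = integrate(lambda x: momentum(psi1, x).conjugate() * psi2(x))
right = integrate(lambda x: psi1(x).conjugate() * momentum(psi2, x))
print(left, right)
```

The two printed complex numbers agree to within the finite-difference error, as the integration by parts above requires.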
Self-adjoint operators have another important property. If A is self-adjoint
and Aψ1 = α1 ψ1 and Aψ2 = α2 ψ2 , then

$$\alpha_1 \int \psi_2^*\psi_1 = \int \psi_2^*[A\psi_1] = \int [A\psi_2]^*\psi_1 = \alpha_2 \int \psi_2^*\psi_1\,,$$
so if $\alpha_1 \neq \alpha_2$ then $\int \psi_2^*\psi_1 = 0$. Such wave functions are said to be
orthogonal. For instance any two different spherical harmonics such as those listed in
Section 5.2 are orthogonal, because they are eigenfunctions of the self-adjoint
operators $\mathbf{L}^2$ and $L_3$ with different eigenvalues $\hbar^2\ell(\ell+1)$ and/or $\hbar m$.
Expectation Values
The interpretation given by Eq. (5.3.1) of |ψ|2 as a probability density tells us
that if we measure any function f (x1 , x2 , . . . ) of positions many times in the
state represented by wave function ψ, the mean value of the measured values
will be

$$\langle f \rangle_\psi = \int f(x_1, x_2, \dots)\, |\psi(x_1, x_2, \dots)|^2\, d^3x_1\, d^3x_2 \cdots\,,$$
provided that ψ is normalized so that $\int |\psi(x_1, x_2, \dots)|^2\, d^3x_1\, d^3x_2 \cdots = 1$.
Since the operator $f(X_1, X_2, \dots)$ acts on ψ simply by multiplying it by
$f(x_1, x_2, \dots)$, this can be written
$$\langle f \rangle_\psi = \int d^3x_1\, d^3x_2 \cdots\, \psi^*(x_1, x_2, \dots)\, \big[f(X_1, X_2, \dots)\psi\big](x_1, x_2, \dots)\,,$$
or, in our abbreviated notation,
$$\langle f \rangle_\psi = \int \psi^* [F\psi]\,,$$
where F is the operator f (X1 , X2 , . . . ). It is only a short step from this to

a third postulate of quantum mechanics, which states that when any physical
quantity represented by an operator A is measured many times, each time in the
state represented by normalized wave function ψ, then the average value found
for this quantity is
$$\langle A \rangle_\psi = \int \psi^* [A\psi] \tag{5.3.13}$$
or, if the wave function is not normalized,
$$\langle A \rangle_\psi = \frac{\int \psi^* [A\psi]}{\int \psi^*\psi}\,.$$
This is called the expectation value of A for the wave function ψ. For a
self-adjoint operator
$$\left(\int \psi^* [A\psi]\right)^{*} = \int [A\psi]^*\psi = \int \psi^* [A\psi]\,,$$
so the expectation value of a self-adjoint operator is real for any wave function.
It is obvious that if Aψ = αψ then the expectation value $\langle A \rangle_\psi$ of A for
the wave function ψ is just α, but expectation values give useful information
even for wave functions that do not represent states with a definite value for
the observable. For instance, the mean square spread of values of an observable
represented by A around its mean value is
$$(\Delta A)^2 \equiv \big\langle (A - \langle A \rangle)^2 \big\rangle\,. \tag{5.3.14}$$
Probabilities
Suppose a physical system is in a state represented by a normalized wave func-
tion ψ, and we measure an observable represented by a Hermitian operator A
that (to start with the simplest case) has only discrete non-degenerate eigenval-
ues αn with eigenfunctions ϕn . Even though ψ will not in general be one of these
eigenfunctions, it can generally be expanded as a series of terms proportional to
the eigenfunctions

$$\psi = \sum_n c_n \varphi_n\,,$$
where cn are some numerical coefficients. (The proof of the possibility of such
an expansion depends on detailed properties of the operator A.) As we have
seen, such eigenfunctions are orthogonal and, if properly normalized, can be
taken as orthonormal in the sense that

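$$\int \varphi_m^* \varphi_n = \delta_{nm} \equiv \begin{cases} 1 & n = m \\ 0 & n \neq m\,. \end{cases}$$
We can find the coefficients $c_m$ by multiplying the expansion with $\varphi_m^*$ and
integrating over all values of the arguments, which gives
$$\int \varphi_m^* \psi = \sum_n c_n \int \varphi_m^* \varphi_n = c_m\,.$$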
The expectation value of the observable A is then

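$$\langle A \rangle_\psi = \int \psi^* A\psi = \sum_{nm} c_n^* c_m \int \varphi_n^* A\varphi_m$$
or, since $A\varphi_m = \alpha_m \varphi_m$,
$$\langle A \rangle_\psi = \sum_{nm} \alpha_m c_n^* c_m \int \varphi_n^* \varphi_m = \sum_m \alpha_m |c_m|^2 = \sum_m \alpha_m \left| \int \varphi_m^* \psi \right|^2 .$$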
Since a corresponding result is true for any function of A, the inevitable inter-
pretation is that when the observable represented by A is measured in a state
represented by the normalized wave function ψ, the probability of finding the
result αm is
$$P_m(\psi) = \left| \int \varphi_m^* \psi \right|^2 . \tag{5.3.15}$$
This is known as the Born rule and can be taken instead of Eq. (5.3.13) as the
third postulate of quantum mechanics.
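To see the Born rule at work, consider an illustrative system that is not taken from the text: a particle confined to a one-dimensional interval 0 < x < L with wave functions vanishing at the ends, in units with ħ = m = L = 1. The energy eigenfunctions are then $\varphi_n(x) = \sqrt{2}\sin(n\pi x)$ with eigenvalues $n^2\pi^2/2$, and we take as an assumed example state the normalized parabola $\psi(x) = \sqrt{30}\,x(1-x)$. The sketch computes $c_n = \int \varphi_n^*\psi$, the probabilities $|c_n|^2$, and the mean energy $\sum_n \alpha_n |c_n|^2$.

```python
import math

L_BOX = 1.0  # assumption: units with hbar = m = L = 1


def phi(n, x):
    """Normalized energy eigenfunction of the 1D interval with vanishing end values."""
    return math.sqrt(2 / L_BOX) * math.sin(n * math.pi * x / L_BOX)


def psi(x):
    """Normalized trial state, chosen only for illustration."""
    return math.sqrt(30 / L_BOX ** 5) * x * (L_BOX - x)


def coefficient(n, samples=2000):
    """c_n = integral of phi_n * psi over the interval (midpoint rule; both are real)."""
    step = L_BOX / samples
    return sum(phi(n, (k + 0.5) * step) * psi((k + 0.5) * step)
               for k in range(samples)) * step


probabilities = {n: coefficient(n) ** 2 for n in range(1, 16)}
total_probability = sum(probabilities.values())
mean_energy = sum(p * (n * math.pi) ** 2 / 2 for n, p in probabilities.items())

print(f"P_1 = {probabilities[1]:.6f}, sum of P_n = {total_probability:.6f}")
print(f"mean energy = {mean_energy:.4f}")
```

For this state the lowest eigenfunction carries almost all of the probability, with the exact value $P_1 = 960/\pi^6$, and the even-n probabilities vanish by symmetry. The mean energy obtained from the probabilities approaches the value 5 given directly by $\int \psi^* H \psi$ for this ψ as more terms are included.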
Continuum Limit
We can also calculate probability densities for an observable that takes a con-
tinuum of values, by taking the limit of the case in which the observable takes a
very large number of very close discrete values. If the number of values of the
index n for which the eigenvalue αn is in a range from α to α + dα is N (α)dα,
then in the state represented by normalized wave function ψ the probability of
finding the observable in this range is
$$dP(\alpha) = N(\alpha)\,d\alpha \times \left| \int \varphi_\alpha^* \psi \right|^2 , \tag{5.3.16}$$
where $\varphi_\alpha$ is any normalized eigenfunction of A with eigenvalue in this narrow
range. For any such observable, instead of working with the conventional wave
functions $\psi(x_1, x_2, \dots)$ we can use wave functions
$$\psi(\alpha) \equiv \sqrt{N(\alpha)} \int \varphi_\alpha^* \psi \tag{5.3.17}$$
for which Eq. (5.3.16) gives the probability of finding the observable in the
range from α to α + dα:
$$dP(\alpha) = |\psi(\alpha)|^2\, d\alpha\,. \tag{5.3.18}$$
The classic example of such continuum operators and alternative wave functions
is provided by momentum.
Momentum Space
Consider for instance a particle in a cubical box of edge L. The normalized
wave function representing a state with definite momentum p is
$$\varphi_{\mathbf{p}}(\mathbf{x}) = L^{-3/2} \exp(i\mathbf{p}\cdot\mathbf{x}/\hbar)\,.$$
Pretty much as we saw for photons in Section 3.2, the allowed momenta take
the form p = 2πnh̄/L, where n is a vector with integer components, so that
this wave function should have the same values on opposite sides of the box.
In a state represented by a normalized wave function ψ(x), the probability of
finding the momentum to have value p is
$$P_{\mathbf{p}} = \left| \int_{L^3} \varphi_{\mathbf{p}}^*(\mathbf{x})\, \psi(\mathbf{x})\, d^3x \right|^2 ,$$
the integral here taken over the interior of the box. We can pass to the continuum
limit by taking the box to be very large, so that the allowed momentum values
are very close together. Since the allowed vectors n form a lattice of cubes, each
of volume unity, the number dN of these allowed momenta in a small volume of
momentum space d 3 p around p equals the corresponding volume in the space
of vectors n:

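$$dN = d^3n = \left( \frac{L}{2\pi\hbar} \right)^{3} d^3p$$
so the probability of finding the momentum in this range is
$$\left( \frac{L}{2\pi\hbar} \right)^{3} d^3p \times \left| \int_{L^3} \varphi_{\mathbf{p}}^*(\mathbf{x})\, \psi(\mathbf{x})\, d^3x \right|^2 = |\tilde{\psi}(\mathbf{p})|^2\, d^3p\,, \tag{5.3.19}$$
where
$$\tilde{\psi}(\mathbf{p}) \equiv \left( \frac{L}{2\pi\hbar} \right)^{3/2} \int_{L^3} \varphi_{\mathbf{p}}^*(\mathbf{x})\, \psi(\mathbf{x})\, d^3x
\;\to\; (2\pi\hbar)^{-3/2} \int \exp(-i\mathbf{p}\cdot\mathbf{x}/\hbar)\, \psi(\mathbf{x})\, d^3x\,, \tag{5.3.20}$$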
with the last integral taken over all space. We can just as well say that the state
of the system is represented by the momentum-space wave function $\tilde{\psi}(\mathbf{p})$ as by
the coordinate-space wave function ψ(x). Indeed, as we will see in Section 5.10,
both the coordinate-space wave function ψ and the momentum-space wave
function $\tilde{\psi}$ are nothing but the components in different bases of a vector in
an abstract space, known as Hilbert space.
Commutation Relations
The commutator of two operators A and B, written [A, B], is defined by
[A, B] ≡ AB − BA . (5.3.21)
In order for the physical quantities represented by operators A and B to have
definite numerical values α and β in a state represented by wave function ψ it
is necessary that the commutator [A, B] acting on ψ should vanish, because
[A, B]ψ = βAψ − αBψ = βαψ − αβψ = 0 .
In particular, it is never possible for any state to have definite values for both of
two quantities represented by operators whose commutator is simply a non-zero
number, because such a commutator can never give zero when acting on any ψ.
It is helpful in evaluating commutators to note that commutation acts like
differentiation. For instance,
[A, BC] = ABC − BCA = ABC − BAC + BAC − BCA
= [A, B]C + B[A, C] .
Thus, since all components of momentum commute with one another, they also
commute with any function only of momenta, such as the total kinetic energy
operator $\sum_n \mathbf{P}_n^2/2m_n$.
Uncertainty Principle
Note that the commutator of Xni and Pmj acting on any multi-particle wave
function ψ is
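$$\begin{aligned}
[X_{ni}, P_{mj}]\psi &= -i\hbar\, x_{ni} \frac{\partial\psi}{\partial x_{mj}} + i\hbar \frac{\partial}{\partial x_{mj}}(x_{ni}\psi) \\
&= -i\hbar\, x_{ni} \frac{\partial\psi}{\partial x_{mj}} + i\hbar\, \delta_{ij}\delta_{nm}\psi + i\hbar\, x_{ni} \frac{\partial\psi}{\partial x_{mj}} = +i\hbar\, \delta_{ij}\delta_{nm}\psi\,,
\end{aligned}$$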
a result we write as a commutation relation
[Xni , Pmj ] = i h̄δij δnm . (5.3.22)
This shows in particular that there can be no state in which a component of some
particle’s position and the same component of the same particle’s momentum
both have definite values.
Indeed, using this commutation relation, it is possible to set a lower bound
on the product of the root mean square spread of values of position and
momentum:
$$\Delta X_{ni}\, \Delta P_{ni} \geq \hbar/2 \tag{5.3.23}$$
a result known as the Heisenberg uncertainty principle.7
We can see this in a simple example. The normalized wave function for a
particle confined to a distance d around some position a can be written as a
superposition of wave functions with definite momentum:

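$$(\sqrt{\pi}\,d)^{-3/2} \exp\!\big(-(\mathbf{x} - \mathbf{a})^2/2d^2\big)
= \left( \frac{d}{2\pi^{3/2}\hbar^2} \right)^{3/2} \int d^3p\; \exp\!\big(i\mathbf{p}\cdot[\mathbf{x} - \mathbf{a}]/\hbar\big)\, \exp(-d^2\mathbf{p}^2/2\hbar^2)\,. \tag{5.3.24}$$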
We see that if the spread in values of x is of order d, then the spread in values
of p is of order h̄/d, and the product of the spreads is of order h̄, in accordance
with the uncertainty principle.
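The Gaussian case can be made quantitative in a one-dimensional analogue. The sketch below assumes ħ = 1 and the normalized wave function $\psi(x) = (\sqrt{\pi}\,d)^{-1/2}\exp(-x^2/2d^2)$, for which the mean position and mean momentum both vanish. It computes $\Delta X$ from $\int x^2|\psi|^2\,dx$ and $\Delta P$ from $\hbar^2\int |d\psi/dx|^2\,dx$ (the form of the mean value of P² obtained after one integration by parts). The function names and grid parameters are illustrative.

```python
import math

HBAR = 1.0  # assumption: units with hbar = 1


def gaussian(x, d):
    """Normalized 1D Gaussian wave function of width d, centered at the origin."""
    return (math.sqrt(math.pi) * d) ** -0.5 * math.exp(-x * x / (2 * d * d))


def spreads(d, half_width=12.0, samples=6000):
    """Return (norm, delta_x, delta_p) for the Gaussian of width d."""
    a = -half_width * d
    step = 2 * half_width * d / samples
    h = 1e-4 * d                       # step for the numerical derivative
    norm = mean_x2 = mean_p2 = 0.0
    for k in range(samples):
        x = a + (k + 0.5) * step       # midpoint rule
        psi = gaussian(x, d)
        dpsi = (gaussian(x + h, d) - gaussian(x - h, d)) / (2 * h)
        norm += psi * psi * step
        mean_x2 += x * x * psi * psi * step
        mean_p2 += HBAR ** 2 * dpsi * dpsi * step
    return norm, math.sqrt(mean_x2), math.sqrt(mean_p2)


for d in (0.5, 1.0, 2.0):
    norm, dx, dp = spreads(d)
    print(f"d={d}: norm={norm:.6f}  dx={dx:.6f}  dp={dp:.6f}  dx*dp={dx * dp:.6f}")
```

Changing d trades spread in position against spread in momentum, while the product stays at ħ/2, the smallest value that (5.3.23) allows.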
7 W. Heisenberg, Zeit. Phys. 43, 172 (1927). For a textbook proof, see Weinberg, Lectures on Quantum
Mechanics, listed in the bibliography.
Time Dependence
In the earliest formulation of quantum mechanics, wave functions were given a
time dependence governed by the time-dependent Schrödinger equation:
$$i\hbar \frac{\partial}{\partial t} \psi(x_1, x_2, \dots; t) = H \psi(x_1, x_2, \dots; t) \tag{5.3.25}$$
where H is the Hamiltonian operator, representing the energy of the system.
The wave function of a state with a definite energy E thus has a trivial time-
dependence, contained in a phase factor exp(−iEt/h̄). The expectation value
(5.3.13) of any operator for such a wave function is independent of time. More
generally, assuming that the Hamiltonian is self-adjoint, the time dependence
of the expectation value of an observable represented by an operator A in a
state represented by a normalized wave function ψ that satisfies Eq. (5.3.25) is
governed by the differential equation

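$$\begin{aligned}
\frac{d}{dt}\langle A \rangle &= \int \psi^* \big[(-i/\hbar) A H \psi\big] + \int \big[(-i/\hbar) H \psi\big]^* [A\psi] \\
&= (-i/\hbar) \left( \int \psi^* [A H \psi] - \int \psi^* [H A \psi] \right)
\end{aligned}$$
and therefore
$$i\hbar \frac{d}{dt}\langle A \rangle = \langle [A, H] \rangle\,. \tag{5.3.26}$$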

In particular, the normalization integral $\int \psi^*\psi$ is the expectation value (5.3.13)
of the unit operator, which acting on any wave function just gives the same wave
function. Since this operator commutes with the Hamiltonian (or anything else),
the normalization integral is constant in time; once normalized, wave functions
remain normalized.
For instance, in the case of a single particle moving in an external potential,
with Hamiltonian (5.3.6),
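$$H = \frac{1}{2m}\mathbf{P}^2 + V(\mathbf{X})\,,$$
we have
$$[\mathbf{X}, H] = \frac{1}{2m}[\mathbf{X}, \mathbf{P}^2] = \frac{i\hbar}{m}\mathbf{P} \tag{5.3.27}$$
and
$$[\mathbf{P}, H] = [\mathbf{P}, V(\mathbf{X})] = -i\hbar \nabla V(\mathbf{X})\,. \tag{5.3.28}$$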
The equations of motion of the expectation values are then
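$$\frac{d}{dt}\langle \mathbf{X} \rangle = \langle \mathbf{P} \rangle / m\,, \qquad \frac{d}{dt}\langle \mathbf{P} \rangle = -\langle \nabla V(\mathbf{X}) \rangle\,. \tag{5.3.29}$$
This is much the same as in classical physics, but note that $\langle \nabla V(\mathbf{X}) \rangle$ is not the
same as $\nabla V(\langle \mathbf{X} \rangle)$, so this is not a closed set of equations.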
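The two commutators (5.3.27) and (5.3.28) follow in a few lines from the product rule $[A, BC] = [A, B]C + B[A, C]$ and the single-particle form of the commutation relation (5.3.22), $[X_i, P_j] = i\hbar\delta_{ij}$. A worked sketch, with ψ an arbitrary wave function:

```latex
[X_i, \mathbf{P}^2] = \sum_j [X_i, P_j P_j]
  = \sum_j \big( [X_i, P_j]\, P_j + P_j\, [X_i, P_j] \big)
  = 2 i \hbar\, P_i
\quad\Longrightarrow\quad
[X_i, H] = \frac{1}{2m}\,[X_i, \mathbf{P}^2] = \frac{i\hbar}{m}\, P_i ,
\\[6pt]
[P_i, V(\mathbf{X})]\,\psi
  = -i\hbar\, \frac{\partial}{\partial x_i}\big( V \psi \big) + i\hbar\, V \frac{\partial \psi}{\partial x_i}
  = -i\hbar\, \frac{\partial V}{\partial x_i}\, \psi .
```

Inserting these into Eq. (5.3.26) and dividing by iħ gives the equations of motion (5.3.29).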
Conservation Laws
Operators that commute with the Hamiltonian deserve special attention, in part
because their expectation values (and the expectation values of any functions
of them) are time-independent for any wave function. These represent what are
called conserved quantities. Among these operators is of course H itself, so the
mean energy $\langle H \rangle$ of any state is constant in time. The momentum of a particle
moving in an external potential is not conserved, but the total momentum of a
number of particles is conserved if the potential depends only on the differences
of their coordinates. For instance, for such a two-particle system,
$$\big[\mathbf{P}_1 + \mathbf{P}_2\,,\, V(\mathbf{X}_1 - \mathbf{X}_2)\big] = -i\hbar \nabla_1 V(\mathbf{X}_1 - \mathbf{X}_2) - i\hbar \nabla_2 V(\mathbf{X}_1 - \mathbf{X}_2) = 0\,.$$
What about angular momentum? For simplicity, consider just a single particle,
whose orbital angular momentum is L = X × P, in a potential that depends
only on $R = \sqrt{\mathbf{X}^2}$. It is straightforward to work out that the commutators of a
general linear combination e · L of the components of L with the position and
momentum operators are
[e · L , X] = −i h̄ e × X , (5.3.30)
[e · L , P] = −i h̄ e × P . (5.3.31)
For instance,
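$$\begin{aligned}
[\mathbf{e}\cdot\mathbf{L}, X_1]
&= \big[ e_1 (X_2 P_3 - X_3 P_2) + e_2 (X_3 P_1 - X_1 P_3) + e_3 (X_1 P_2 - X_2 P_1)\,,\, X_1 \big] \\
&= -i\hbar (e_2 X_3 - e_3 X_2) = -i\hbar (\mathbf{e} \times \mathbf{X})_1\,.
\end{aligned}$$
It follows that each component of L commutes with $\mathbf{P}^2$ and $\mathbf{X}^2$, and hence also
with $V(\sqrt{\mathbf{X}^2})$, and so with the Hamiltonian.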
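The step from Eqs. (5.3.30) and (5.3.31) to the vanishing of the commutators with $\mathbf{X}^2$ and $\mathbf{P}^2$ can be spelled out using the product rule for commutators, together with the fact that a cross product e × v is orthogonal to v (and that the components of X commute with each other, as do those of P):

```latex
[\mathbf{e}\cdot\mathbf{L},\, \mathbf{X}^2]
  = \sum_i \big( [\mathbf{e}\cdot\mathbf{L}, X_i]\, X_i + X_i\, [\mathbf{e}\cdot\mathbf{L}, X_i] \big)
  = -2 i \hbar\, (\mathbf{e} \times \mathbf{X}) \cdot \mathbf{X} = 0 ,
\\[6pt]
[\mathbf{e}\cdot\mathbf{L},\, \mathbf{P}^2]
  = \sum_i \big( [\mathbf{e}\cdot\mathbf{L}, P_i]\, P_i + P_i\, [\mathbf{e}\cdot\mathbf{L}, P_i] \big)
  = -2 i \hbar\, (\mathbf{e} \times \mathbf{P}) \cdot \mathbf{P} = 0 .
```

Since e · L commutes with $\mathbf{X}^2$, it commutes with any function of $\mathbf{X}^2$ alone, which is why a potential depending only on R leaves all components of L conserved.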
Another reason for us to give special attention to operators that commute with
the Hamiltonian is that states with a given energy can be classified according
to the eigenvalues of these conserved quantities. For instance, for a Coulomb
potential the states with a given principal quantum number n and hence with a
given energy can be classified according to the eigenvalues of $\mathbf{L}^2$ and $L_3$, both
of which commute with the Hamiltonian as well as with each other. Of course,
$L_1$ and $L_2$ also commute with the Hamiltonian and with $\mathbf{L}^2$, but as we shall
see in the next section they do not commute with each other, or with $L_3$, so the
best we can do is to classify states according to the eigenvalues of $\mathbf{L}^2$ and $L_3$ as
well as H.
Heisenberg and Schrödinger Pictures

The formalism described here, in which the wave function depends on time but
operators are time-independent, is known as the Schrödinger picture. There is
another formalism, known as the Heisenberg picture, in which wave functions
are time-independent and operators depend on time. To make clear the relation
between these pictures, just for the present we will use subscripts H and S to
distinguish wave functions and operators in the Heisenberg and Schrödinger
pictures.
The time-dependent Schrödinger equation (5.3.25) has a formal solution
$$\psi_S(t) = e^{-iHt/\hbar}\, \psi_S(0)\,;$$
so, if we define the Heisenberg-picture wave function as the Schrödinger-picture
wave function at zero time,
$$\psi_H \equiv \psi_S(0) \tag{5.3.32}$$
then wave functions in the two pictures are related by
$$\psi_S(t) = e^{-iHt/\hbar}\, \psi_H\,. \tag{5.3.33}$$
In the Heisenberg picture, in order to preserve Eq. (5.3.26) we must give oper-
ators a time dependence with
$$i\hbar \frac{d}{dt} A_H(t) = [A_H(t), H]\,. \tag{5.3.34}$$
The commutators of position and momentum with the Hamiltonian given in
Eqs. (5.3.27) and (5.3.28) show that Eq. (5.3.34) gives these operators in the
Heisenberg picture the same time dependence as the corresponding quantities
in classical mechanics. To satisfy Eq. (5.3.34), we define the Heisenberg-picture
operator AH (t) in terms of the Schrödinger-picture operator AS representing the
same observable by
$$A_H(t) = e^{iHt/\hbar}\, A_S\, e^{-iHt/\hbar}\,. \tag{5.3.35}$$
We can go back and forth between the two pictures. For instance, for an arbitrary
operator A and wave function ψ, the Schrödinger-picture wave function corre-
sponding to the Heisenberg-picture operator AH (t) acting on the Heisenberg-
picture wave function ψH is
$$e^{-iHt/\hbar}\, A_H(t)\, \psi_H = A_S\, e^{-iHt/\hbar}\, \psi_H = A_S\, \psi_S(t)\,,$$
just as if we had worked from the beginning in the Schrödinger picture.
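That the definition (5.3.35) satisfies the equation of motion (5.3.34) can be verified by differentiating directly. The sketch assumes that $A_S$ and H carry no explicit time dependence, and uses the fact that H commutes with $e^{\pm iHt/\hbar}$:

```latex
i\hbar \frac{d}{dt} A_H(t)
  = i\hbar \left[ \frac{iH}{\hbar}\, e^{iHt/\hbar} A_S\, e^{-iHt/\hbar}
      + e^{iHt/\hbar} A_S\, e^{-iHt/\hbar} \left( -\frac{iH}{\hbar} \right) \right]
  = -H A_H(t) + A_H(t)\, H = [A_H(t), H] ,
\\[6pt]
\int \psi_S^*(t)\, \big[ A_S\, \psi_S(t) \big]
  = \int \big[ e^{-iHt/\hbar} \psi_H \big]^* \big[ A_S\, e^{-iHt/\hbar} \psi_H \big]
  = \int \psi_H^*\, \big[ A_H(t)\, \psi_H \big] .
```

The second line uses the self-adjointness of H, so that the adjoint of $e^{-iHt/\hbar}$ is $e^{+iHt/\hbar}$; it shows that expectation values are the same in the two pictures.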
The Heisenberg picture and Schrödinger picture are physically equivalent,
but useful in different contexts. The Schrödinger picture is more naturally used
in calculating bound state energies, and, as we shall see in Section 5.6, it can
also be used for scattering processes. The Heisenberg picture is invaluable when
we want to use known equations of motion for observables to motivate a choice
of Hamiltonian, as we will do in Section 5.8. Also, in field theories where

observables depend on position, in order to preserve the appearance of Lorentz
invariance it is necessary to work in the Heisenberg picture, so that these
observables will depend on time as well as on space coordinates.
5.4 Spin and Orbital Angular Momentum
Spin Discovered
The counting of states described in Section 5.2 was already known in 1925 to
be in conflict with spectroscopic data. The problem emerged most clearly in the
study of alkali metals. These are elements such as lithium, sodium, potassium,
etc. that were known to readily lose a single electron.8 In the
atomic models of the time, this meant that an alkali metal atom has one loosely
bound electron outside inner shells of more tightly bound electrons. The
potential felt by this outer electron is spherically symmetric but it is not a Coulomb
potential, which would be proportional to 1/r. Because $L_3$ and $\mathbf{L}^2$ commute
with each other and with H, it was still expected that states of definite energy
would also have definite values $\hbar^2\ell(\ell+1)$ for $\mathbf{L}^2$, with 2ℓ + 1 states of equal
energy for any given ℓ, distinguished by different eigenvalues $\hbar m$ of $L_3$, but no
further degeneracy was expected. States could still be labeled with a principal
quantum number n, defined so that the number of nodes of the wave function
(values of r where the wave function vanishes) is n − ℓ − 1, as it is in hydrogen,
but, unlike the case of hydrogen, here the energies depend on ℓ as well as on n.
There is a very well studied “D-line” in the spectrum of sodium vapor (which
gives sodium vapor lamps their orange color) with wavelength about 5890
angstroms, interpreted as a 3p → 3s transition between states of the outermost
electron with n = 3. But even with moderate resolution, spectroscopists were
able to see that this line was doubled, having two components with wavelengths
5896 angstroms and 5890 angstroms. Wolfgang Pauli (1900–1958) was led
to suggest that, on the basis of this and other data, there is a fourth quantum
number, besides n, ℓ, and m, which takes just two values in all states with ℓ ≥ 1.
But the physical significance of this quantum number was at first mysterious.
Then in 1925 two young Dutch physicists, Samuel Goudsmit (1902–1978)
and George Uhlenbeck (1900–1988), suggested9 that the extra quantum number
was associated with an internal angular momentum, or spin, of the electron. At
first this idea seemed absurd. If the spin S is anything like L, then any one
component of S should take 2s + 1 values, running by unit steps from $-\hbar s$ up
to $+\hbar s$, where s is given by $\mathbf{S}^2 = \hbar^2 s(s+1)$. But to have 2s + 1 = 2 we need
s = 1/2, while ℓ is always an integer.
8 With the charge of the electron and the atomic weights of these elements known, it could be concluded
from the ratio of the metal mass produced in electrolysis to the electric charge used that one electron is
needed to convert one ion in a solution of the metal salt to an alkali metal atom, so the atom in becoming
an ion had to lose just one electron. This was in contrast with metals like beryllium, magnesium, calcium,
etc., which require two electrons to convert an ion to an atom.
9 S. Goudsmit and G. Uhlenbeck, Naturwiss. 13, 953 (1925).
The notion of spin s = 1/2 was not understood until physicists adopted a
more mature view of the nature of angular momentum, that it is an operator
whose existence and properties are dictated by the invariance of the laws of
nature under rotations, rather than by experience with classical spinning bodies.
This takes some explanation.
Rotations
In general, an infinitesimal rotation changes any vector $\mathbf{v}$ by an amount
$\delta\mathbf{v} = \mathbf{e}\times\mathbf{v}$ (5.4.1)
where $\mathbf{e}$ is an infinitesimal 3-vector characterizing the rotation. This
is a rotation, because it leaves all scalar products unchanged:
$\delta(\mathbf{v}\cdot\mathbf{v}') = \mathbf{v}\cdot(\mathbf{e}\times\mathbf{v}') + (\mathbf{e}\times\mathbf{v})\cdot\mathbf{v}' = 0$ . (5.4.2)
It is in fact (though we don’t need to know this here) a rotation by an
infinitesimal angle of $|\mathbf{e}|$ radians counterclockwise around the
direction of $\mathbf{e}$. For instance, if $\mathbf{e}$ is in the 3-direction
then (5.4.1) gives
$\delta v_1 = -|\mathbf{e}|\,v_2$ , $\delta v_2 = +|\mathbf{e}|\,v_1$ , $\delta v_3 = 0$ .
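As a quick numerical illustration of Eqs. (5.4.1) and (5.4.2), the following
Python sketch applies the infinitesimal rotation with e along the 3-direction
and checks that a scalar product changes only at second order in |e|. The two
vectors and the value of |e| are arbitrary choices made here for illustration.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Scalar product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

def rotate_infinitesimal(e, v):
    """Return v + delta v with delta v = e x v, as in Eq. (5.4.1)."""
    return tuple(vi + dvi for vi, dvi in zip(v, cross(e, v)))

eps = 1e-6                      # |e|, an arbitrary small angle in radians
e = (0.0, 0.0, eps)             # rotation about the 3-direction
v = (1.0, 2.0, 3.0)             # arbitrary test vectors
w = (-0.5, 4.0, 1.5)

delta_v = cross(e, v)           # equals (-eps*v2, +eps*v1, 0)
change_in_dot = dot(rotate_infinitesimal(e, v),
                    rotate_infinitesimal(e, w)) - dot(v, w)
# change_in_dot is of order eps**2, i.e. it vanishes to first order in e
```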
Now, suppose that one observer sees that a physical system is in a state
represented by a wave function ψ, and suppose a second observer views the
same state using coordinate axes that have been subjected to a slight rotation,
which changes any vector v by an infinitesimal amount e × v. What does she
see? For e infinitesimal the change in the wave function must be linear in e, and
can therefore be written
δψ = (i/h̄) e · J ψ (5.4.3)
where J is some triplet of operators and the factor (i/h̄) is inserted for future
convenience.
We would not want the rotation to change the total probability
$\int\psi^*\psi = 1$ that the particles in the system are somewhere, so we
require that
$0 = \delta\int\psi^*\psi = \int[(i/\hbar)\,\mathbf{e}\cdot\mathbf{J}\psi]^*\psi + \int\psi^*(i/\hbar)\,\mathbf{e}\cdot\mathbf{J}\psi = (i/\hbar)\int\psi^*\,\mathbf{e}\cdot(-\mathbf{J}^\dagger + \mathbf{J})\psi$
and therefore $\mathbf{J}$ must be self-adjoint:
$\mathbf{J}^\dagger = \mathbf{J}$ . (5.4.4)
(We are here using the abbreviation (5.3.8):
$\int\psi_a^*\psi_b \equiv \int d^3x_1\,d^3x_2\cdots\,\psi_a^*(\mathbf{x}_1,\mathbf{x}_2,\dots)\,\psi_b(\mathbf{x}_1,\mathbf{x}_2,\dots)$
and the definition (5.3.7) of the adjoint $A^\dagger$ of any operator $A$, except
that as we shall see we must include discrete variables along with coordinates.)
In order for the transformation of the wave function to correspond to a rota-
tion, it is necessary that it should produce a rotation of expectation values. That
is, if V is an operator representing an observable that transforms as a vector
under the general infinitesimal rotation (5.4.1), we must have
δV = e × V . (5.4.5)
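From Eqs. (5.4.3) and (5.4.4) we see that
$\delta\int\psi^*\mathbf{V}\psi = \int[(-i/\hbar)\,\mathbf{e}\cdot\mathbf{J}\psi]^*\,\mathbf{V}\psi + \int\psi^*\,\mathbf{V}(-i/\hbar)\,\mathbf{e}\cdot\mathbf{J}\psi = (i/\hbar)\int\psi^*\,[\mathbf{e}\cdot\mathbf{J},\mathbf{V}]\,\psi$
and therefore we require
$[\mathbf{e}\cdot\mathbf{J},\mathbf{V}] = -i\hbar\,\mathbf{e}\times\mathbf{V}$ . (5.4.6)
The same reasoning shows that $\mathbf{J}$ commutes with any rotationally
invariant operator. For instance, for any pair of vector operators $\mathbf{V}$
and $\mathbf{V}'$, it is a consequence of Eq. (5.4.6) that
$[\mathbf{e}\cdot\mathbf{J}, \mathbf{V}\cdot\mathbf{V}'] = -i\hbar\left([\mathbf{e}\times\mathbf{V}]\cdot\mathbf{V}' + \mathbf{V}\cdot[\mathbf{e}\times\mathbf{V}']\right) = 0$ ,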
just as in Eq. (5.4.2). In particular, as long as the Hamiltonian is rotationally
invariant it commutes with J,
[J, H ] = 0 , (5.4.7)
so, according to Eq. (5.3.26) angular momentum is conserved, in the sense that
any expectation value of J is time-independent.
The requirement that the product e · J in Eq. (5.4.3) should not depend on the
orientation of the coordinate axes implies that the operator J is itself a vector
and hence satisfies Eq. (5.4.6);
[e · J, J] = −i h̄ e × J (5.4.8)
for any e. From the coefficients of the different components of e in this equation,
we easily find the equivalent commutation relations:
[J1 , J2 ] = i h̄J3 , [J2 , J3 ] = i h̄J1 , [J3 , J1 ] = i h̄J2 . (5.4.9)
(For instance, the 2-component of Eq. (5.4.8) is
$[\mathbf{e}\cdot\mathbf{J}, J_2] = -i\hbar\,(\mathbf{e}\times\mathbf{J})_2 = -i\hbar\,[e_3J_1 - e_1J_3]$,
in which the coefficient of $e_1$ is $[J_1, J_2] = i\hbar J_3$.) Also,
$\mathbf{J}^2$ like any other scalar commutes with $\mathbf{J}$:
$[J_i, \mathbf{J}^2] = 0$ . (5.4.10)
As we shall see, it is the commutation relations (5.4.9) that determine the
possible values of $\mathbf{J}^2$ and the possible values of $J_3$ for a given
$\mathbf{J}^2$.
Spin and Orbital Angular Momenta

The discussion so far in this section may produce some sense of déjà vu. We
saw in Eqs. (5.3.30) and (5.3.31) that the orbital angular momentum L has
commutators just like (5.4.6) with coordinates and momenta and hence with
any vector V formed from coordinates and momenta:
[e · L, V] = −i h̄ e × V . (5.4.11)
Since L is itself a vector formed from coordinates and momenta, this also
applies with V = L, and hence
[L1 , L2 ] = i h̄L3 , [L2 , L3 ] = i h̄L1 , [L3 , L1 ] = i h̄L2 , (5.4.12)
just like the commutators of the components of J. Of course, we can also
calculate these commutators directly from the commutators of momentum and
position operators. For instance,
[L1 , L2 ] = [X2 P3 − X3 P2 , X3 P1 − X1 P3 ]
= X2 P1 [P3 , X3 ] + P2 X1 [X3 , P3 ]
= i h̄(−X2 P1 + P2 X1 ) = i h̄L3 .
But this does not mean that J = L. Instead, we can consider the possibility
that
J=L+S, (5.4.13)
where S, known as the spin, is some operator whose properties we will now
work out.
First, because J satisfies Eq. (5.4.6) for any vector operator V, and L satisfies
Eq. (5.4.11) for any vector operator V formed from positions and momenta, the
difference of these equations tells us that
[Si , Vj ] = 0 (5.4.14)
for any vector operator V formed from positions and momenta. The spin opera-
tor has nothing to do with positions and momenta.
In particular, since L is a vector formed from positions and momenta,
[Si , Lj ] = 0 . (5.4.15)
It follows that
[Ji , Jj ] = [Li , Lj ] + [Si , Sj ]
so the Si satisfy the same commutation relations with each other as in Eq. (5.4.9)
for J and Eq. (5.4.12) for L:
[S1 , S2 ] = i h̄S3 , [S2 , S3 ] = i h̄S1 , [S3 , S1 ] = i h̄S2 . (5.4.16)
Multiplets
We next show how to use the commutation relations (5.4.9) to find the allowed
values of J2 and the range of allowed values of J3 for a given J2 . Though
presented here for the total angular momentum J, precisely the same reason-
ing and corresponding results apply to any angular momentum operators with
corresponding commutation relations, such as the orbital angular momentum
vector L that satisfies Eq. (5.4.12) and the spin angular momentum vector S that
satisfies Eq. (5.4.16).
First, we note that
[J3 , (J1 ± iJ2 )] = i h̄J2 ± i (−i h̄J1 ) = ± h̄ (J1 ± iJ2 ) . (5.4.17)
Therefore $J_1 \pm iJ_2$ act as raising and lowering operators: for a wave
function $\psi^m$ that satisfies the eigenvalue condition
$J_3\psi^m = \hbar m\psi^m$ (with any $m$), we have
$J_3(J_1 \pm iJ_2)\psi^m = (m \pm 1)\hbar\,(J_1 \pm iJ_2)\psi^m$ ,
so if $(J_1 \pm iJ_2)\psi^m$ does not vanish then it is an eigenfunction of
$J_3$ with eigenvalue $\hbar(m \pm 1)$. Since $\mathbf{J}^2$ commutes with
$J_3$, we can choose $\psi^m$ to be an eigenfunction of $\mathbf{J}^2$ as well
as $J_3$, and, since $\mathbf{J}^2$ commutes with $(J_1 \pm iJ_2)$, all the wave
functions that are connected with each other by lowering and/or raising
operators will have the same eigenvalue for $\mathbf{J}^2$. We say that such
wave functions form an angular momentum multiplet.
Now, there must be a maximum and a minimum to the eigenvalues of $J_3$ that
can be reached in this way, because the square of any eigenvalue of $J_3$ is
necessarily not more than the eigenvalue of $\mathbf{J}^2$. The reason is that
for any wave function $\psi$ that has an eigenvalue $a$ for $J_3$ and an
eigenvalue $b$ for $\mathbf{J}^2$, we have
$b - a^2 = \langle \mathbf{J}^2 - J_3^2 \rangle = \langle J_1^2 + J_2^2 \rangle \ge 0$ .
It is conventional to define a quantity $j$ as the maximum value of the
eigenvalues of $J_3/\hbar$ for a particular multiplet of wave functions that are
related by raising and lowering operators. We will also temporarily define $j'$
as the minimum eigenvalue of $J_3/\hbar$ for these wave functions. The wave
function $\psi^j$ for which $J_3$ takes its maximum eigenvalue $\hbar j$ must
satisfy
$(J_1 + iJ_2)\psi^j = 0$ , (5.4.18)
since otherwise $(J_1 + iJ_2)\psi^j$ would be a wave function with a larger
eigenvalue of $J_3$. Likewise, acting on the wave function $\psi^j$ with
$(J_1 - iJ_2)$ gives an eigenfunction of $J_3$ with eigenvalue $\hbar(j-1)$,
unless of course this wave function vanishes. Continuing in this way, we must
eventually get to a wave function $\psi^{j'}$ with the minimum eigenvalue
$\hbar j'$ of $J_3$, which satisfies
$(J_1 - iJ_2)\psi^{j'} = 0$ , (5.4.19)
since otherwise $(J_1 - iJ_2)\psi^{j'}$ would be a wave function with an even
smaller eigenvalue of $J_3$. We get to $\psi^{j'}$ from $\psi^j$ by applying the
lowering operator $(J_1 - iJ_2)$ a whole number of times, so $j - j'$ must be a
whole number.
To go further, we use the commutation relations of $J_1$ and $J_2$ to show that
$(J_1 - iJ_2)(J_1 + iJ_2) = J_1^2 + J_2^2 + i[J_1, J_2] = \mathbf{J}^2 - J_3^2 - \hbar J_3$ , (5.4.20)
$(J_1 + iJ_2)(J_1 - iJ_2) = J_1^2 + J_2^2 - i[J_1, J_2] = \mathbf{J}^2 - J_3^2 + \hbar J_3$ . (5.4.21)
According to Eq. (5.4.18), the operator (5.4.20) gives zero when acting on
$\psi^j$, so
$\mathbf{J}^2\psi^j = \hbar^2 j(j+1)\,\psi^j$ . (5.4.22)
On the other hand, according to Eq. (5.4.19) the operator (5.4.21) gives zero
when acting on $\psi^{j'}$, so
$\mathbf{J}^2\psi^{j'} = \hbar^2 j'(j'-1)\,\psi^{j'}$ . (5.4.23)
But all these wave functions are eigenfunctions of $\mathbf{J}^2$ with the same
eigenvalue, so $j'(j'-1) = j(j+1)$. This quadratic equation for $j'$ has two
solutions, $j' = j+1$ and $j' = -j$. The first solution is impossible, because
$j'$ is the minimum eigenvalue of $J_3/\hbar$ and therefore cannot be greater
than the maximum eigenvalue $j$. This leaves us with the other solution,
$j' = -j$ . (5.4.24)
But we saw that $j - j' = 2j$ must be a non-negative integer, so $j$ must be a
non-negative integer or half integer. The eigenvalues of $J_3$ range over the
$2j+1$ values of $\hbar m$ with $m$ running by unit steps from $-j$ to $+j$.
The corresponding eigenfunctions will be denoted $\psi_j^m$, so that
$J_3\,\psi_j^m = \hbar m\,\psi_j^m$ , $m = -j, -j+1, \dots, +j$ (5.4.25)
$\mathbf{J}^2\psi_j^m = \hbar^2 j(j+1)\,\psi_j^m$ . (5.4.26)
These are the same eigenvalues as those we found in the previous section in
the case of orbital angular momentum, with the one big difference that j and m
may be half-integers rather than integers. This justifies the guess of Goudsmit
and Uhlenbeck that electrons could have an intrinsic angular momentum with
j = 1/2, but that is the end of the surprises – we see that it is not possible to have
physical systems with weird angular momenta such as j = 1/3, j = 1/4, etc.
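The counting in Eqs. (5.4.24) to (5.4.26) can be sketched in a few lines of
Python. The function name below is an illustrative choice, not notation from
the text; it lists the allowed values of m for a given j and rejects any j for
which 2j is not a non-negative integer, such as j = 1/3.

```python
from fractions import Fraction

def multiplet_m_values(j):
    """Return the 2j+1 values m = -j, -j+1, ..., +j of Eq. (5.4.25).

    Raises ValueError unless 2j is a non-negative integer, which is the
    condition derived from j - j' = 2j being a whole number.
    """
    j = Fraction(j)
    two_j = 2 * j
    if j < 0 or two_j.denominator != 1:
        raise ValueError(f"j = {j} is not an integer or half-integer >= 0")
    return [-j + k for k in range(int(two_j) + 1)]

spin_half = multiplet_m_values(Fraction(1, 2))   # two states: m = -1/2, +1/2
j_two = multiplet_m_values(2)                    # five states: m = -2, ..., +2
```

Calling `multiplet_m_values(Fraction(1, 3))` raises an error, reflecting the
statement that values such as j = 1/3 cannot occur.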
Using these results, we can work out the action of any component of
$\mathbf{J}$ on these multiplets. Because $J_1 - iJ_2$ is a lowering operator,
we must have
$(J_1 - iJ_2)\psi_j^m = \alpha_{jm}\,\psi_j^{m-1}$ ,
where the $\alpha_{jm}$ are various constants that depend on how the wave
functions are normalized. If we assume that $\psi_j^m$ and $\psi_j^{m-1}$ both
have unit norm, then, using Eq. (5.4.21),
$|\alpha_{jm}|^2 = \int\psi_j^{m*}(J_1 + iJ_2)(J_1 - iJ_2)\psi_j^m = \int\psi_j^{m*}(\mathbf{J}^2 - J_3^2 + \hbar J_3)\psi_j^m = \hbar^2[\,j(j+1) - m(m-1)]$ .
We can adjust the phases of these states so that all $\alpha_{jm}$ are real and
positive and so that $\alpha_{jm} = \hbar\sqrt{j(j+1) - m(m-1)}$; hence
$(J_1 - iJ_2)\psi_j^m = \hbar\sqrt{j(j+1) - m(m-1)}\;\psi_j^{m-1}$ . (5.4.27)
The same analysis shows that, with this choice of phases,
$(J_1 + iJ_2)\psi_j^m = \hbar\sqrt{j(j+1) - m(m+1)}\;\psi_j^{m+1}$ . (5.4.28)
So now we know how J1 and J2 as well as J3 act on angular momentum
multiplets.
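To make Eqs. (5.4.25) to (5.4.28) concrete, here is a minimal Python sketch
that builds the matrices of J1, J2, and J3 in a multiplet of given j and checks
the commutation relation (5.4.9) and the eigenvalue (5.4.26) numerically. It
assumes units with ħ = 1 and a basis ordered as m = j, j−1, ..., −j; both are
choices made here for illustration, as are the function names.

```python
import math

def angular_momentum_matrices(j):
    """Matrices of J1, J2, J3 for a multiplet of given j, with hbar = 1."""
    dim = int(round(2 * j)) + 1
    ms = [j - k for k in range(dim)]            # m = j, j-1, ..., -j
    plus = [[0.0] * dim for _ in range(dim)]    # J1 + i J2
    minus = [[0.0] * dim for _ in range(dim)]   # J1 - i J2
    for col, m in enumerate(ms):
        if col + 1 < dim:   # Eq. (5.4.27): lowers m to m - 1
            minus[col + 1][col] = math.sqrt(j * (j + 1) - m * (m - 1))
        if col > 0:         # Eq. (5.4.28): raises m to m + 1
            plus[col - 1][col] = math.sqrt(j * (j + 1) - m * (m + 1))
    rng = range(dim)
    j1 = [[0.5 * (plus[r][c] + minus[r][c]) for c in rng] for r in rng]
    # J2 = (J+ - J-) / (2i) = -(i/2) (J+ - J-)
    j2 = [[complex(0, -0.5) * (plus[r][c] - minus[r][c]) for c in rng] for r in rng]
    j3 = [[ms[r] if r == c else 0.0 for c in rng] for r in rng]
    return j1, j2, j3

def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def max_commutator_error(j):
    """Largest entry of [J1, J2] - i J3, which Eq. (5.4.9) says is zero."""
    j1, j2, j3 = angular_momentum_matrices(j)
    ab, ba = matmul(j1, j2), matmul(j2, j1)
    n = len(j3)
    return max(abs(ab[r][c] - ba[r][c] - 1j * j3[r][c])
               for r in range(n) for c in range(n))

def max_j_squared_error(j):
    """Largest entry of J1^2 + J2^2 + J3^2 - j(j+1), zero by Eq. (5.4.26)."""
    mats = angular_momentum_matrices(j)
    squares = [matmul(m, m) for m in mats]
    n = len(mats[2])
    return max(abs(sum(s[r][c] for s in squares) - (j * (j + 1) if r == c else 0.0))
               for r in range(n) for c in range(n))
```

For j = 1/2 the matrices produced this way are one-half of the familiar Pauli
matrices, and both error functions return values at the level of rounding
error for integer and half-integer j alike.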
A particle of species $n$ with eigenvalue $\hbar^2 s_n(s_n+1)$ for
$\mathbf{S}_n^2$ is said to have spin $s_n$. Electrons, muons, neutrinos, and
quarks have spin 1/2; W and Z particles
have spin 1; and Higgs particles have spin 0. The concept of spin is not limited
to so-called elementary particles. Protons and neutrons are each composites of
three quarks, and some of their intrinsic angular momentum comes from the
orbital motion of these quarks, but the energies in an atomic nucleus are not
high enough for us to probe the internal structure of the proton and neutron, and
so we refer to their total angular momentum as spin 1/2. Likewise, the energies
in an atom are not high enough for us to probe the internal structure of their
nuclei, and so we refer to the intrinsic angular momentum of these nuclei as a
spin. The deuteron has spin 1; the $^3$He and $^4$He nuclei have spins 1/2 and 0,
respectively; and so on.
The wave function for a multi-particle state depends on the 3-components σn
of the individual spin vectors Sn /h̄, so the wave function must be labeled by
these spin 3-components as well as by coordinates and will be written
ψ(x1 , σ1 ; x2 , σ2 ; . . . ) .
The values of $\sigma_n$ run over the $2s_n+1$ values from $-s_n$ to $+s_n$.
In place of Eq. (5.3.8), the scalar product of two wave functions for systems
of particles with spin includes a sum over all these $\sigma_n$:
$\int\psi_a^*\psi_b \equiv \sum_{\sigma_1\sigma_2\cdots}\int d^3x_1\,d^3x_2\cdots\,\psi_a^*(\mathbf{x}_1,\sigma_1;\mathbf{x}_2,\sigma_2;\dots)\,\psi_b(\mathbf{x}_1,\sigma_1;\mathbf{x}_2,\sigma_2;\dots)$ . (5.4.29)
The spin operator S does not act on the coordinate arguments, but produces
linear combinations of wave functions with various values of the σn . (Instead of
the 3-component of angular momentum, we can label states with the helicity,
the component of angular momentum in the direction of motion in units of h̄.
Photon states have only helicity ±1, corresponding to the two states of circular
polarization.)
Adding Angular Momenta

Physical systems typically involve angular momenta of various sorts. Even in
hydrogen there are the orbital and spin angular momenta of the electron, and
also a proton spin very weakly coupled to the electron. In more complicated
atoms there is more than one orbital and spin electron angular momentum, as
well as a nuclear spin. But rotational invariance only ensures that the states of
definite energy can also be chosen to have definite values for J2 and J3 (where J
is the total angular momentum), which are required by rotational invariance to
commute with the Hamiltonian. This is one reason why it is important to know
what total angular momenta arise when we combine different angular momenta
in the same system.
Suppose a system involves two different angular momenta $\mathbf{J}_a$ and
$\mathbf{J}_b$. These may be the spin and/or orbital angular momenta of a
single particle, or sums of various spins and/or angular momenta of a number of
particles. Suppose the wave function is an eigenfunction of $\mathbf{J}_a^2$
and $\mathbf{J}_b^2$ with eigenvalues $\hbar^2 j_a(j_a+1)$ and
$\hbar^2 j_b(j_b+1)$, respectively. We can define wave functions
$\psi^{m_a,m_b}_{j_a,j_b}$ as eigenfunctions also of $J_{a3}$ and $J_{b3}$ with
eigenvalues $\hbar m_a$ and $\hbar m_b$, respectively. We then have the
following problems: what linear combinations of these wave functions are
eigenfunctions of $\mathbf{J}^2$ and $J_3$ (where
$\mathbf{J} \equiv \mathbf{J}_a + \mathbf{J}_b$) and what are the corresponding
eigenvalues $\hbar^2 j(j+1)$ and $\hbar m$ of $\mathbf{J}^2$ and $J_3$?
The “stretched” wave function $\psi^{j_a,j_b}_{j_a,j_b}$ with the maximum
possible eigenvalues for $J_{a3}$ and $J_{b3}$ is an eigenfunction of $J_3$
with $m = j_a + j_b$, so it must also be an eigenfunction of $\mathbf{J}^2$
with $j = j_a + j_b$. It could not have a lower value of $j$ with this value of
$m$, and there are no wave functions with larger values of $j$ since there are
none with larger values of $m$.
Next, consider the wave functions $\psi^{j_a-1,\,j_b}_{j_a,j_b}$ and
$\psi^{j_a,\,j_b-1}_{j_a,j_b}$. Both are eigenfunctions of $J_3$ with
$m = j_a + j_b - 1$. One linear combination of these must be the member of the
$j = j_a + j_b$ multiplet with $m = j_a + j_b - 1$; the other then has to be a
member of some other multiplet, which by the same reasoning as before must have
$j = j_a + j_b - 1$.
We can continue in this way, with one multiplet for each j = ja + jb , j =
ja +jb −1, j = ja +jb −2, and so on. After ν steps, with m = ja +jb −ν, there
are ν + 1 choices of ma running up from ja − ν to ja , and mb = m − ma running
down from jb to jb − ν, with one new multiplet having j = ja + jb − ν for each
increase in ν. But this ends with ν = 2jb (taking ja ≥ jb ), for which mb runs
from jb to −jb . At the next step, with ν = 2jb + 1, we would only get a new
multiplet if mb could run from jb down to −jb −1, which is impossible since we
can only have |mb | ≤ jb . So when ja ≥ jb , the lowest value of j that is found
in the addition of angular momenta ja and jb is j = ja + jb − 2jb = ja − jb .
Of course, in the same way, if jb ≥ ja , the lowest value of j is jb − ja . So in
this way, we construct one multiplet for each j in the range
$j = j_a + j_b$ , $j = j_a + j_b - 1$ , $j = j_a + j_b - 2$ , $\dots$ , $j = |j_a - j_b|$ . (5.4.30)
This is the general rule for adding angular momenta.
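The rule (5.4.30) is easy to tabulate. The Python sketch below (function names
are illustrative choices) lists the allowed values of j for given ja and jb,
and checks the state count: the multiplets together should contain
(2ja + 1)(2jb + 1) states, the number of product wave functions we started
with.

```python
from fractions import Fraction

def allowed_total_j(ja, jb):
    """Values of j from ja + jb down to |ja - jb| in unit steps, Eq. (5.4.30)."""
    ja, jb = Fraction(ja), Fraction(jb)
    j_max, j_min = ja + jb, abs(ja - jb)
    # j_max - j_min = 2 * min(ja, jb) is always a whole number
    return [j_max - k for k in range(int(j_max - j_min) + 1)]

def count_states(j_values):
    """Total number of states, 2j + 1 for each multiplet."""
    return sum(int(2 * j + 1) for j in j_values)

# Orbital angular momentum 1 combined with spin 1/2 gives j = 3/2 and j = 1/2.
p_wave_electron = allowed_total_j(1, Fraction(1, 2))

# A larger example: ja = 3/2 and jb = 1 give j = 5/2, 3/2, 1/2.
example = allowed_total_j(Fraction(3, 2), 1)
total = count_states(example)        # 6 + 4 + 2 = 12 = (2*3/2 + 1)(2*1 + 1)
```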
The linear combination of wave functions $\psi^{m_a,m_b}_{j_a,j_b}$ with a
definite value for $j$ and $m$ is conventionally written as
$\psi^{m}_{j_a,j_b,j} = \sum_{m_a=-j_a}^{+j_a}\ \sum_{m_b=-j_b}^{+j_b} C_{j_a,j_b}(j, m\,;\,m_a, m_b)\,\psi^{m_a,m_b}_{j_a,j_b}$ , (5.4.31)
where the $C_{j_a,j_b}(j, m\,;\,m_a, m_b)$ are real numerical coefficients known
as Clebsch–Gordan coefficients. Because $J_3 = J_{a3} + J_{b3}$, the
Clebsch–Gordan coefficients are non-zero only for $m = m_a + m_b$.
In the appendix to this section we shall work this out in a simple case, the
combination in hydrogen of the electron’s integer orbital angular momentum
and its spin angular momentum with s = 1/2. We then provide a table of
Clebsch–Gordan coefficients for various low values of ja and jb .
Fine Structure and Space Inversion

Let us apply what we have learned to alkali metals and hydrogen. In both
cases the observed spectrum arises from transitions involving a single “valence”
electron moving in an essentially spherically symmetric potential – the potential
of the nucleus for hydrogen or the potential of the nucleus and tightly bound
inner electrons for alkali metals. The total angular momentum J is then the
sum of the valence electron’s orbital angular momentum L and that electron’s
spin $\mathbf{S}$. Since $\mathbf{J}$ commutes with the Hamiltonian, we can take
the wave functions of definite energy also to have a definite value
$\hbar^2 j(j+1)$ of $\mathbf{J}^2$ and a value $\hbar m$ of $J_3$, with $m$
running from $-j$ to $+j$. Now, as we have just seen, these wave functions
would in general be linear combinations of wave functions with
$j = \ell + 1/2$ and $j = \ell - 1/2$ (where $\ell$ is defined by
$\mathbf{L}^2 = \hbar^2\ell(\ell+1)$), so a wave function of definite energy
and $j$ would in general be a linear combination of wave functions with both
$\ell = j - 1/2$ and $\ell = j + 1/2$.
But in fact these states can be chosen to have definite values of $\ell$,
because there is another conserved quantity. We can define a space reflection
operator $\Pi$ by the condition that, for any wave function $\psi$,
$[\Pi\psi](\mathbf{x}_1,\sigma_1;\mathbf{x}_2,\sigma_2;\dots) = \psi(-\mathbf{x}_1,\sigma_1;-\mathbf{x}_2,\sigma_2;\dots)$ . (5.4.32)
This is not a rotation. By a rotation of 180° around the z-axis we can change
the signs of x and y but there is no rotation that changes the signs of all
three components of a 3-vector. It is easy to see that the operator $\Pi$
defined in this way has the properties
$\Pi^2 = 1$ , $\Pi^\dagger = \Pi$ (5.4.33)
and (now considering just a single particle)
$\Pi\mathbf{X}\Pi = -\mathbf{X}$ , $\Pi\mathbf{P}\Pi = -\mathbf{P}$ . (5.4.34)
So, we also have
$\Pi\mathbf{L}\Pi = +\mathbf{L}$ . (5.4.35)
The defining condition (5.4.6) for the total angular momentum operator
$\mathbf{J}$,
$[\mathbf{e}\cdot\mathbf{J},\mathbf{V}] = -i\hbar\,\mathbf{e}\times\mathbf{V}$ ,
is also satisfied by $\Pi\mathbf{J}\Pi$ as long as
$\Pi\mathbf{V}\Pi = \pm\mathbf{V}$, so
$\Pi\mathbf{J}\Pi = +\mathbf{J}$ (5.4.36)
and then also
$\Pi\mathbf{S}\Pi = +\mathbf{S}$ . (5.4.37)
The operator $\Pi$ commutes with the Hamiltonian:
$\Pi H\Pi = H$ (5.4.38)
at least for Hamiltonians of the form encountered in atomic and molecular
physics, even if we include spin–orbit coupling terms, proportional to
$\mathbf{S}\cdot\mathbf{L}$. It follows that we can choose the states of
definite energy so that their wave functions are also eigenfunctions of $\Pi$:
$\Pi\psi = \pi\psi$ .
Because $\Pi^2 = 1$ the eigenvalue $\pi$, known as the parity of the state, can
only be $+1$ or $-1$. Indeed, given a wave function $\psi$ for which
$H\psi = E\psi$ that is not an eigenfunction of $\Pi$, we can always write it
as a superposition $\psi = \psi_+ + \psi_-$ where
$\psi_\pm \equiv (1 \pm \Pi)\psi/2$. Since $\Pi$ commutes with $H$ these satisfy
$H\psi_\pm = E\psi_\pm$, and since $\Pi^2 = 1$ they satisfy
$\Pi\psi_\pm = \pm\psi_\pm$.
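The decomposition of a wave function into parts of definite parity can be
illustrated with a minimal Python sketch. For simplicity it uses a
one-dimensional, spinless wave function, so that space reflection just sends x
to −x; the particular function psi below is a hypothetical example, not one
taken from the text.

```python
import math

def psi(x):
    """Hypothetical one-dimensional wave function with no definite parity."""
    return (1.0 + x) * math.exp(-x * x)

def apply_parity(f):
    """Space reflection acting on a function: [Pi f](x) = f(-x)."""
    return lambda x: f(-x)

def parity_component(f, sign):
    """The projection (1 + sign * Pi) f / 2, with sign = +1 or -1."""
    reflected = apply_parity(f)
    return lambda x: 0.5 * (f(x) + sign * reflected(x))

psi_plus = parity_component(psi, +1)    # even part, parity +1
psi_minus = parity_component(psi, -1)   # odd part, parity -1
```

By construction psi_plus(x) + psi_minus(x) reproduces psi(x), while reflecting
psi_plus leaves it unchanged and reflecting psi_minus flips its sign.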
As we saw in Section 5.2 for general spherically symmetric potentials, a
one-particle state with $\mathbf{L}^2 = \hbar^2\ell(\ell+1)$ has wave function
$\psi(\mathbf{x})$ proportional to a homogeneous polynomial of order $\ell$ in
the coordinate $\mathbf{x}$, on which the operator $\Pi$ gives a factor
$(-1)^\ell$, so the states with definite energy can be taken to have $\ell$
either even or odd. For $j - 1/2$ even or odd we have $j + 1/2$ respectively
odd or even, and hence the states with definite energy and $j$ can be taken to
have a definite $\ell$, either $j - 1/2$ or $j + 1/2$. These states are
therefore labeled
$1s_{1/2}$, $2s_{1/2}$, $2p_{1/2}$, $2p_{3/2}$, $3s_{1/2}$, $3p_{1/2}$, $3p_{3/2}$, $3d_{3/2}$, $3d_{5/2}$, $\dots$
where again the letters s, p, d, etc., stand for $\ell = 0, 1, 2$, etc.; the
integer in front of the letter is the principal quantum number, defined so that
the number of nodes of the wave function is $n - \ell - 1$ (and therefore
$\ell \le n - 1$); and now the subscript gives the value of $j$. The energy
depends on $j$ as well as on $n$ and (except for hydrogen) on $\ell$, with the
$j$ dependence arising both from relativistic corrections and the magnetic
coupling of the electron’s spin with the orbital motion, but this dependence is
rather weak and just gives rise to the fine structure of the energy levels. The
difference in the energies of the $3p_{1/2}$ and $3p_{3/2}$ states of sodium
splits the wavelengths of the $3p_{1/2} \to 3s_{1/2}$ and
$3p_{3/2} \to 3s_{1/2}$ transitions by just 1.02 parts per thousand, while the
energies of the $2p_{1/2}$ and $2p_{3/2}$ states of hydrogen differ by only
4.44 parts per million.
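The sodium figure can be checked against the two D-line wavelengths quoted
earlier in this section. The sketch below uses those rounded wavelengths, so it
reproduces the fractional splitting only approximately; the value of hc in
eV·Å is a standard constant supplied here, not a number from the text.

```python
# Rounded wavelengths of the two components of the sodium D line, in angstroms.
lambda_long = 5896.0
lambda_short = 5890.0

# Fractional splitting of the wavelengths, relative to their mean.
mean_wavelength = 0.5 * (lambda_long + lambda_short)
fractional_split = (lambda_long - lambda_short) / mean_wavelength

# Corresponding energy difference, using E = hc / wavelength.
HC_EV_ANGSTROM = 12398.42            # hc in eV * angstrom (standard value)
energy_split_ev = HC_EV_ANGSTROM * (1.0 / lambda_short - 1.0 / lambda_long)

# fractional_split is about 1.02e-3, i.e. roughly one part per thousand;
# energy_split_ev is about 2.1e-3 eV.
```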
The hydrogen fine structure was first calculated in 1928 by Dirac in a rel-
ativistic version of wave mechanics.10 The relativistic and spin effects that he
calculated left the 2p1/2 state with the same energy as the 2s1/2 state. Physi-
cists including Hans Kramers (1894–1952) and Victor Weisskopf (1908–2002)
realized in the 1930s that quantum electrodynamic effects such as the emission
and reabsorption of photons by the orbiting electron would split the energies
of the 2p1/2 and 2s1/2 states, but the calculation proved difficult. This splitting
was first measured after the war by Willis Lamb (1913–2008) and R. C.
Retherford,11 and is known as the Lamb shift. It is very small,
$4.3515 \times 10^{-6}$ eV,
about a tenth of the small fine-structure splitting between the 2p1/2 and 2p3/2
states. The successful calculation of the Lamb shift in 1949 by Norman Kroll
(1922–2004) and Lamb12 and by J. B. French and Weisskopf13 marked the
beginning of the modern understanding of quantum electrodynamics.
Any particle at rest, whether elementary or not, will have what is called
an intrinsic parity $\pi_n$ that depends only on the type $n$ of the particle.
If the particle is in a state with orbital angular momentum $\ell$, the parity
of its state is $(-1)^\ell\pi_n$. In our discussion above we have implicitly
taken the electron to have positive intrinsic parity. This is a matter of
definition; if the electron had negative intrinsic parity we could redefine the
parity operator as $\Pi' = \Pi\exp(i\pi Q/e)$, where $Q$ is the operator for
total electric charge. The one-electron state is an eigenstate of $Q$ with
eigenvalue $-e$, so it is an eigenstate of $\exp(i\pi Q/e)$ with eigenvalue
$-1$; if it were an eigenstate of $\Pi$ with eigenvalue $-1$ it would be an
eigenstate of $\Pi'$ with eigenvalue $+1$. Since $Q$ as well as $\Pi$ commutes
with the Hamiltonian, so does $\Pi'$ and it can be called the operator of space
inversion just as well as $\Pi$. In the same way, because of the conservation
of another quantity known as baryon number (described in Section 6.2) we can
define the parity of the proton as $+1$. But the intrinsic parities of most
particles have to be determined experimentally.
10 P. A. M. Dirac, Proc. Roy. Soc. A 117, 619 (1928).
11 W. E. Lamb, Jr. and R. C. Retherford, Phys. Rev. 72, 241 (1947).
12 N. M. Kroll and W. E. Lamb, Phys. Rev. 75, 388 (1949).
13 J. B. French and V. F. Weisskopf, Phys. Rev. 75, 1240 (1949).
Hyperfine Structure
We must not forget the atomic nucleus, for if it has spin this produces a
magnetic field felt by orbiting electrons. This effect is most important for the
s-wave electrons that are not prevented from getting close to the nucleus by
the centrifugal barrier that is present for $\ell \neq 0$. In hydrogen the spin
1/2 of the nucleus combines with the spin 1/2 of the electron in its
$\ell = 0$ ground state to split the energy of the ground state into components
with total spin $s = 0$ and $s = 1$, separated in energy by
$5.9 \times 10^{-6}$ eV. The transition between these states produces the
famous 21-cm absorption and emission spectral lines, discussed in Section 3.5.
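The connection between the quoted energy splitting and the 21-cm wavelength is
a one-line application of λ = hc/E. The sketch below uses standard values of
hc and c supplied here for illustration; with the rounded splitting of
5.9 × 10⁻⁶ eV it gives a wavelength close to 21 cm.

```python
HC_EV_M = 1.239841984e-6      # hc in eV * m (standard value)
C_M_PER_S = 299_792_458.0     # speed of light in m/s

hyperfine_split_ev = 5.9e-6   # ground-state splitting quoted in the text

wavelength_m = HC_EV_M / hyperfine_split_ev     # about 0.21 m
frequency_hz = C_M_PER_S / wavelength_m         # about 1.4e9 Hz
```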
Appendix: Clebsch–Gordan Coefficients
First, as an example of some intrinsic importance, let us work out how to form
hydrogen wave functions with definite total angular momentum from wave
functions with definite 3-components of spin and orbital angular momentum.
Consider the “stretched” hydrogen wave function in which $L_3$ and $S_3$ are
both as large as possible, having eigenvalues $+\hbar\ell$ and $+\hbar/2$,
respectively. In general we shall label hydrogen wave functions with orbital
angular momentum $\ell$ and spin 1/2 and definite values $\hbar m$ and
$\hbar\sigma$ for $L_3$ and $S_3$ as $\psi^{m,\sigma}_{\ell,1/2}$, so this
stretched wave function is denoted $\psi^{\ell,+1/2}_{\ell,1/2}$. For this wave
function, $J_3 = \hbar(\ell + 1/2)$ is also as large as possible. This is
therefore a wave function with $j = \ell + 1/2$ (where as usual $j$ is defined
so that the eigenvalue of $\mathbf{J}^2$ is $\hbar^2 j(j+1)$). This wave
function could not have a larger $j$ because then there would be states with
$J_3 > \hbar(\ell + 1/2)$, and it could not have a smaller $j$ because then
$J_3$ could not be as large as $\hbar(\ell + 1/2)$. In general we shall label
hydrogen wave functions with orbital angular momentum $\ell$ and spin 1/2 and
definite values $\hbar^2 j(j+1)$ and $\hbar M$ for $\mathbf{J}^2$ and $J_3$ as
$\psi^{M}_{\ell,1/2,j}$. So we have
$\psi^{\ell,+1/2}_{\ell,1/2} = \psi^{\ell+1/2}_{\ell,1/2,\ell+1/2}$ . (5.4.39)
So far, this is pretty trivial, apparently not worth the elaborate notation. But
now consider the wave functions with $J_3 = \hbar(\ell - 1/2)$. There are two of
these, one of them, $\psi^{\ell-1,+1/2}_{\ell,1/2}$, with
$L_3 = \hbar(\ell - 1)$ and $S_3 = +\hbar/2$ and the other,
$\psi^{\ell,-1/2}_{\ell,1/2}$, with $L_3 = \hbar\ell$ and $S_3 = -\hbar/2$. One
linear combination of these two can be obtained by letting the lowering
operator $J_1 - iJ_2$ act on the stretched wave function. This is part of the
same angular momentum multiplet as the stretched wave function
$\psi^{\ell+1/2}_{\ell,1/2,\ell+1/2}$, with the same eigenvalue for
$\mathbf{J}^2$, so is labeled $\psi^{\ell-1/2}_{\ell,1/2,\ell+1/2}$. According
to Eq. (5.4.27), with $j = m = \ell + 1/2$ so that
$j(j+1) - m(m-1) = 2\ell + 1$, if properly normalized this wave function is
given by
$\hbar\sqrt{2\ell+1}\;\psi^{\ell-1/2}_{\ell,1/2,\ell+1/2} = (J_1 - iJ_2)\,\psi^{\ell+1/2}_{\ell,1/2,\ell+1/2} = (L_1 - iL_2 + S_1 - iS_2)\,\psi^{\ell,+1/2}_{\ell,1/2}$ .
Orbital and spin angular momenta obey the same commutation relations as total
angular momentum, so their lowering operators act the same way as given in
Eq. (5.4.27) for $J_1 - iJ_2$:
$(L_1 - iL_2)\,\psi^{\ell,+1/2}_{\ell,1/2} = \hbar\sqrt{2\ell}\;\psi^{\ell-1,+1/2}_{\ell,1/2}$ , $(S_1 - iS_2)\,\psi^{\ell,+1/2}_{\ell,1/2} = \hbar\,\psi^{\ell,-1/2}_{\ell,1/2}$ ,
and therefore
$\sqrt{2\ell+1}\;\psi^{\ell-1/2}_{\ell,1/2,\ell+1/2} = \sqrt{2\ell}\;\psi^{\ell-1,+1/2}_{\ell,1/2} + \psi^{\ell,-1/2}_{\ell,1/2}$ . (5.4.40)
Since there are two independent wave functions with
$J_3 = \hbar(\ell - 1/2)$, there must be another linear combination that is
part of an angular momentum multiplet with no higher value of $J_3$ than
$\hbar(\ell - 1/2)$, so this multiplet has $j = \ell - 1/2$, and in our
notation this linear combination is $\psi^{\ell-1/2}_{\ell,1/2,\ell-1/2}$.
Since it has a different value of $j$, this linear combination can be
calculated by requiring it to be normalized and orthogonal to the one we found
by acting with $J_1 - iJ_2$ on the stretched wave function with
$L_3 = \hbar\ell$ and $S_3 = +\hbar/2$. That is (with a conventional choice of
overall phase):
$\sqrt{2\ell+1}\;\psi^{\ell-1/2}_{\ell,1/2,\ell-1/2} = -\psi^{\ell-1,+1/2}_{\ell,1/2} + \sqrt{2\ell}\;\psi^{\ell,-1/2}_{\ell,1/2}$ . (5.4.41)
By continued operation of the lowering operator on the wave functions (5.4.40)
and (5.4.41), we fill out two complete multiplets, one with $j = \ell + 1/2$
and one with $j = \ell - 1/2$. These results can be summarized as values for
the Clebsch–Gordan coefficients in Eq. (5.4.31):
$C_{\ell,1/2}(\ell + 1/2,\ \ell + 1/2\,;\ \ell,\ +1/2) = 1$ ,
$C_{\ell,1/2}(\ell + 1/2,\ \ell - 1/2\,;\ \ell - 1,\ +1/2) = \sqrt{\dfrac{2\ell}{2\ell+1}}$ ,
$C_{\ell,1/2}(\ell + 1/2,\ \ell - 1/2\,;\ \ell,\ -1/2) = \sqrt{\dfrac{1}{2\ell+1}}$ ,
$C_{\ell,1/2}(\ell - 1/2,\ \ell - 1/2\,;\ \ell - 1,\ +1/2) = -\sqrt{\dfrac{1}{2\ell+1}}$ ,
$C_{\ell,1/2}(\ell - 1/2,\ \ell - 1/2\,;\ \ell,\ -1/2) = \sqrt{\dfrac{2\ell}{2\ell+1}}$ .
All the Clebsch–Gordan coefficients can be calculated in this way, but life is too
short. The best way to find Clebsch–Gordan coefficients is to look them up in
a table. At the end of this section there is a table of these coefficients for small
angular momenta.
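As a check on Eqs. (5.4.40) and (5.4.41), the following Python sketch evaluates
the four coefficients with J3 = ħ(ℓ − 1/2) for a given ℓ and confirms that the
two combinations are normalized and mutually orthogonal. The function names and
the dictionary layout, keyed by the L3 and S3 eigenvalues in units of ħ, are
illustrative choices.

```python
import math

def cg_orbital_plus_half(l):
    """Coefficients of Eqs. (5.4.40) and (5.4.41) for J3 = hbar * (l - 1/2).

    Returns two dicts keyed by (m, sigma): the first for j = l + 1/2, the
    second for j = l - 1/2.
    """
    norm = math.sqrt(2 * l + 1)
    j_upper = {(l - 1, +0.5): math.sqrt(2 * l) / norm,
               (l, -0.5): 1.0 / norm}
    j_lower = {(l - 1, +0.5): -1.0 / norm,
               (l, -0.5): math.sqrt(2 * l) / norm}
    return j_upper, j_lower

def inner(u, v):
    """Scalar product of two combinations over the same basis states."""
    return sum(u[key] * v[key] for key in u)

# For l = 1 these reproduce the j = 3/2 and j = 1/2 entries with M = +1/2
# in Table 5.1: sqrt(2/3), sqrt(1/3) and -sqrt(1/3), sqrt(2/3).
upper, lower = cg_orbital_plus_half(1)
```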
There is a symmetry property of the Clebsch–Gordan coefficients in the case
of adding equal angular momenta that will be important for us when we come
to diatomic molecules in the next section and to nuclear forces in Section 6.2.
For $j_a = j_b$,
$C_{j_a,j_a}(j\,M\,;\,m_a\,m_b) = (-1)^{\,j-2j_a}\,C_{j_a,j_a}(j\,M\,;\,m_b\,m_a)$ . (5.4.42)
This is trivial for the stretched configuration, where $m_a = m_b = j_a$ and
$j = 2j_a$. It is then also valid for all the Clebsch–Gordan coefficients with
the same value of $j$, because the corresponding states are obtained by acting
on the stretched configuration state with the symmetric lowering operator
$J_1 - iJ_2 = J_{a1} + J_{b1} - iJ_{a2} - iJ_{b2}$. The state with
$j = 2j_a - 1$ and $m_a + m_b = 2j_a - 1$ is a superposition of terms with
$m_a = j_a - 1$, $m_b = j_a$ and $m_a = j_a$, $m_b = j_a - 1$, and, since it is
orthogonal to the state with $j = 2j_a$ and $m_a + m_b = 2j_a - 1$, it must be
antisymmetric in $m_a$ and $m_b$. All the other states with $j = 2j_a - 1$ are
obtained by acting on this state with the symmetric lowering operator
$J_1 - iJ_2$, and so are also antisymmetric in $m_a$ and $m_b$, in agreement
with Eq. (5.4.42). Continuing, from the states with $m_a + m_b = 2j_a - 2$ we
can form one antisymmetric combination, which is needed in the multiplet with
$j = 2j_a - 1$, and two symmetric combinations, which can then only be in the
multiplets with $j = 2j_a$ and $j = 2j_a - 2$. And so on.
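The symmetry property (5.4.42) can be checked directly against tabulated
values. The sketch below hard-codes a few of the coefficients for
ja = jb = 1 from Table 5.1 (those whose two 3-components differ, so that the
exchange is non-trivial) and tests the sign rule; the dictionary layout and
function name are illustrative choices.

```python
import math

JA = 1  # both angular momenta equal to 1

# Keys are (j, M, ma, mb); values are taken from Table 5.1 for ja = jb = 1.
TABLE_1x1 = {
    (2, 1, 1, 0): 1 / math.sqrt(2), (2, 1, 0, 1): 1 / math.sqrt(2),
    (2, 0, 1, -1): 1 / math.sqrt(6), (2, 0, -1, 1): 1 / math.sqrt(6),
    (1, 1, 1, 0): 1 / math.sqrt(2), (1, 1, 0, 1): -1 / math.sqrt(2),
    (0, 0, 1, -1): 1 / math.sqrt(3), (0, 0, -1, 1): 1 / math.sqrt(3),
}

def obeys_exchange_symmetry(table, ja):
    """True if C(j M; ma mb) = (-1)**(j - 2 ja) * C(j M; mb ma) for all entries."""
    for (j, big_m, ma, mb), value in table.items():
        swapped = table[(j, big_m, mb, ma)]
        if abs(value - (-1) ** (j - 2 * ja) * swapped) > 1e-12:
            return False
    return True

symmetric_ok = obeys_exchange_symmetry(TABLE_1x1, JA)
```

For ja = 1 the rule makes the j = 2 and j = 0 coefficients symmetric and the
j = 1 coefficients antisymmetric under exchange of ma and mb, as the listed
entries show.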
Table 5.1 The non-vanishing Clebsch–Gordan coefficients for the addition of
angular momenta ja and jb with 3-components ma and mb to give angular
momentum j with 3-component M, for several low values of ja and jb.
Where signs ± or ∓ appear in a row, the upper signs go together and the lower
signs go together.

ja    jb    j     M      ma     mb     C_{ja,jb}(j M ; ma mb)
1/2   1/2   1     +1     +1/2   +1/2   1
1/2   1/2   1     0      ±1/2   ∓1/2   1/√2
1/2   1/2   1     −1     −1/2   −1/2   1
1/2   1/2   0     0      ±1/2   ∓1/2   ±1/√2
1     1/2   3/2   ±3/2   ±1     ±1/2   1
1     1/2   3/2   ±1/2   ±1     ∓1/2   √(1/3)
1     1/2   3/2   ±1/2   0      ±1/2   √(2/3)
1     1/2   1/2   ±1/2   ±1     ∓1/2   ±√(2/3)
1     1/2   1/2   ±1/2   0      ±1/2   ∓√(1/3)
1     1     2     ±2     ±1     ±1     1
1     1     2     ±1     ±1     0      1/√2
1     1     2     ±1     0      ±1     1/√2
1     1     2     0      ±1     ∓1     1/√6
1     1     2     0      0      0      √(2/3)
1     1     1     ±1     ±1     0      ±1/√2
1     1     1     ±1     0      ±1     ∓1/√2
1     1     0     0      ±1     ∓1     1/√3
1     1     0     0      0      0      −1/√3
5.5 Bosons and Fermions
Identical Particles
Aside from their momenta and helicities, every photon in the universe is
identical to every other photon. The reason is that all photons are quanta of the
same field, the electromagnetic field. In the same way, aside from their momenta
(or positions) and spin components, according to the modern understandings
outlined in Chapter 7, every electron in the universe is identical to every other
electron because they are all quanta of a single field, known as the electron
field. The same is true of every other species of elementary particle – quarks,
neutrinos, and so on – each is the quantum of a particular field. Indeed, our best
current definition of an elementary particle is that it is the quantum of one of the
fields of which the world is composed. But the same indistinguishability is true
of composite systems in any one specific state. Two protons are indistinguish-
able because they are each composed of three quarks of the same two different
types in the same bound state, and two hydrogen atoms in the same atomic
state are indistinguishable because they are each composed of an electron
and a proton.
In writing a wave function for identical particles as ψ(x1 , σ1 ; x2 , σ2 ; . . . ),
it is incorrect to say that for this wave function the first particle has position
x1 and spin 3-component σ1 while the second particle has position x2 and spin
3-component σ2 , and so on. Instead we should say that there is a particle
with position x1 and spin 3-component σ1 and another particle with posi-
tion x2 and spin 3-component σ2 , and so on. Thus for identical particles,
ψ(x1 , σ1 ; x2 , σ2 ; . . . ) and ψ(x2 , σ2 ; x1 , σ1 ; . . . ) represent the same state.
Two wave functions that represent the same state can only differ by a constant
factor, so
ψ(x2 , σ2 ; x1 , σ1 ; . . . ) = λψ(x1 , σ1 ; x2 , σ2 ; . . . )
for some constant λ. Integrals don’t depend on how the variables of integration
are labeled, so it follows that ∫|ψ|² = |λ|² ∫|ψ|², and therefore λ can only
be a phase factor, with |λ| = 1. Further, the constant λ cannot depend on
position or spin 3-components without violating various symmetry principles,
such as Galilean or Einsteinian relativity, rotational invariance, and translation
invariance. We can therefore repeat the same relation with identical particles
1 and 2 interchanged on both sides, but with the same λ, and write
ψ(x1 , σ1 ; x2 , σ2 ; . . . ) = λψ(x2 , σ2 ; x1 , σ1 ; . . . ) = λ² ψ(x1 , σ1 ; x2 , σ2 ; . . . )
and therefore λ² = 1. We have only two possibilities, λ = ±1. Our usual
assumptions regarding locality would not allow the choice of signs to depend
on whatever other particles are described by the wave function if these particles
were very far away from particles 1 and 2, and the continuity of wave functions
would not allow this sign to jump between +1 and −1 as these other particles
come close. We conclude that the value of λ encountered when we exchange
a pair of indistinguishable particles can depend only on the species of these
particles.
Particles for which λ = 1, so that the wave function is symmetric in the
labels of these particles, are known as bosons. They are named after Satyendra
Nath Bose (1894–1974), who first described multi-photon states, imposing this
symmetry condition.14 Einstein had Bose’s paper translated into German and
published, and then applied these ideas to material particles.15
Particles for which λ = −1, so that the wave function is antisymmetric in
the labels of these particles, are known as fermions, named after Enrico Fermi
(1901–1954). Fermi16 and Dirac17 at about the same time described multi-
electron states, imposing this antisymmetry condition.
It is another consequence of the relativistic quantum theory of fields that
elementary particles (the quanta of fields) are bosons or fermions according
to whether their spin is an integer or half an odd integer.18 The reason for this is
outlined in Section 7.4, but a complete proof is beyond the scope of this book.
It is easy, though, to see that if this correlation with spin is valid for some set
of elementary particles then it is valid for any composites of these particles.
If we interchange two identical composite particles then we are interchanging
all their constituents, so the interchange gives a minus sign multiplying the wave
function if each of the composites contains an odd number of fermions and a
plus sign otherwise, no matter how many bosons it contains. But, according
to the rules for adding angular momenta described in the previous section, a
composite has a half odd integer spin if it contains an odd number of half odd
integer spin particles, and integer spin otherwise, no matter how many integer
spin particles it contains. So a composite with half odd integer spin contains
an odd number of fermions, and is therefore a fermion, while if it has integer
spin it contains an even number of fermions (perhaps zero) and is therefore a
boson. No other correlation of boson/fermion character with spin would have
this consistency.
So electrons, quarks, protons, and neutrons, which have spin 1/2, are
fermions. The spin of massless particles like photons requires special consider-
ation, but as noted in Section 7.5 the components of their angular momentum in
the direction of travel can only be ±h̄, corresponding to left and right circular
polarization, and they are bosons. Indeed, as already mentioned, Bose’s original
introduction of symmetric states had to do with photons. Hydrogen and helium
atoms are bosons, while 6 Li atoms (with three protons, three neutrons, and three
electrons) are fermions.
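The counting argument above is easy to mechanize. The following is a minimal sketch, assuming only that protons, neutrons, and electrons are treated as spin-1/2 fermions; the function names `statistics_of_composite` and `neutral_atom` are hypothetical and chosen for illustration.

```python
def statistics_of_composite(n_fermions: int, n_bosons: int = 0) -> str:
    """Return 'fermion' or 'boson' for a composite particle.

    Interchanging two identical composites interchanges all constituents,
    giving a factor -1 per fermionic constituent and +1 per bosonic one.
    """
    sign = (-1) ** n_fermions  # bosonic constituents never change the sign
    return "fermion" if sign == -1 else "boson"


def neutral_atom(protons: int, neutrons: int) -> str:
    """Classify a neutral atom; the number of electrons equals the number of protons."""
    electrons = protons
    return statistics_of_composite(protons + neutrons + electrons)


for name, (z, n) in {"hydrogen-1": (1, 0), "helium-4": (2, 2), "lithium-6": (3, 3)}.items():
    print(name, neutral_atom(z, n))
```

The result depends only on whether the number of fermionic constituents is odd or even, which is why the number of bosonic constituents is accepted but never used.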
14 S. N. Bose, Z. Phys. 26, 178 (1924).
15 A. Einstein, Sitz. Preuss. Akad. Wiss. 1, 3 (1926).
16 E. Fermi, Rend. Lincei 3, 145 (1926).
18 This was first stated as a general rule by M. Fierz, Helv. Phys. Acta 12, 3 (1939) and W. Pauli, Phys. Rev.
58, 716 (1940).
Statistics
The distinction between bosons and fermions has a profound impact on the
properties of gases in thermal equilibrium. As we did for photons in Section 3.2,
to calculate the densities of particles with various momenta we can imagine a
gas of any identical particles in a cube of volume L³. Since the particles in a gas
are essentially free particles, their momenta are quantized like photon momenta,
with p = 2πnh̄/L, where n is a 3-vector with integer components. As we saw in
our discussion of momentum space in Section 5.3, the number of these allowed
momentum values in a momentum-space volume d 3 p is
$$dN = d^3n = \left(\frac{L}{2\pi\hbar}\right)^3 d^3p\,. \tag{5.5.1}$$
The number of particles per volume with momentum in a volume d³p of
momentum space around one of these allowed momentum values is then
$$d\mathcal{N}(\mathbf{p}) = g\,dN\,\bar{N}_{\mathbf{p}}/L^3 = g\,(2\pi\hbar)^{-3}\,d^3p\,\bar{N}_{\mathbf{p}}\,, \tag{5.5.2}$$
where N̄p is the mean number of particles in these states with momentum p
and any given spin 3-component or helicity. (In Eq. (5.5.2), we include a factor
g equal to the number of spin or helicity states for each allowed momentum
value. For massive particles of spin s, we have g = 2s + 1 states, which are
characterized by different values of S3 , while for photons g = 2.) The mean
number N̄p in a gas with temperature T and chemical potential μ is given by
the grand canonical ensemble discussed in Section 2.4:

$$\bar{N}_{\mathbf{p}} = \frac{\sum_N N \exp(-N[E(\mathbf{p})-\mu]/kT)}{\sum_N \exp(-N[E(\mathbf{p})-\mu]/kT)}\,, \tag{5.5.3}$$
the sums running over the allowed numbers of particles with momentum p. It is
in these sums that there appears a distinction between bosons and fermions.
For bosons N runs over all integers from zero to infinity, and we have
$$\bar{N}_{\mathbf{p}} = \frac{1}{\exp([E(\mathbf{p})-\mu]/kT) - 1}\,. \tag{5.5.4}$$
This is known as the case of Bose–Einstein statistics. The chemical potential
μ can be non-zero only if the total number of particles is conserved, so for
photons μ = 0 and the result of using Eq. (5.5.4) in Eq. (5.5.2) (with the
number of polarization states g = 2) is equivalent to the Planck distribution
(3.1.14). For material particles such as atoms whose number is conserved under
ordinary conditions we can have μ > 0, and then at very low temperature N̄p
is very sharply peaked at momenta for which the energy E(p) is close to μ. It
is even possible to have a macroscopic number of particles with energy μ, a
phenomenon known as Bose–Einstein condensation, first seen by Eric Cornell
and Carl Wieman and their collaborators in a gas of rubidium atoms in 1995.19
(There is also a sort of Bose–Einstein condensation in liquid helium, but it is not
a good approximation to treat liquid helium as a gas.)
19 M. H. Anderson et al., Science 269, 198 (1995).

For fermions it is not possible to have more than one particle with a given
momentum p (and a given spin 3-component), because the wave function for
two such particles would be proportional to
exp(ip · x1 /h̄) exp(ip · x2 /h̄),
which is symmetric rather than antisymmetric in the two particles. Hence the
sums in Eq. (5.5.3) run only over the values N = 0 and N = 1:
$$\bar{N}_{\mathbf{p}} = \frac{1}{\exp([E(\mathbf{p})-\mu]/kT) + 1}\,. \tag{5.5.5}$$
This is known as the case of Fermi–Dirac statistics. For very low temperatures
this takes the form

$$\bar{N}_{\mathbf{p}} \to \begin{cases} 1 & E(\mathbf{p}) < \mu \\ 0 & E(\mathbf{p}) > \mu\,. \end{cases} \tag{5.5.6}$$
This is used to derive a relation between the number densities and energy densi-
ties in white dwarf stars, whose high density requires electrons to have energies
much larger than chemical binding energies, though they are essentially at zero
temperature. For white dwarfs of relatively low mass, μ is much less than
me c2 but much larger than chemical binding energies, so the number density
of electrons is given by
$$n_e = \int_{|\mathbf{p}|<p_F} d\mathcal{N}(\mathbf{p}) = \frac{2}{(2\pi\hbar)^3}\int_0^{p_F} 4\pi p^2\,dp = \frac{8\pi p_F^3}{3(2\pi\hbar)^3}\,,$$
where pF is the Fermi momentum defined by E(pF ) = μ. (In practice, we
use a known or assumed value of n to calculate pF .) The corresponding kinetic
energy density is
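$$\mathcal{E} = \int_{|\mathbf{p}|<p_F} E(\mathbf{p})\,d\mathcal{N}(\mathbf{p}) = \frac{2}{(2\pi\hbar)^3}\int_0^{p_F} 4\pi p^2\,dp \times \frac{p^2}{2m_e} = \frac{8\pi p_F^5}{10\,m_e(2\pi\hbar)^3} = (8\pi)^{-2/3}(2\pi\hbar)^2(3n_e)^{5/3}/10m_e\,.$$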
As shown in Eq. (1.1.3), the pressure of any non-relativistic monatomic gas is
p = 2ℰ/3, so this gives an equation of state for low-mass white dwarfs:
$$p = K\rho^{5/3}\,, \qquad K = \frac{2}{3}\,(8\pi)^{-2/3}(2\pi\hbar)^2\left(\frac{3Z}{Am_1}\right)^{5/3}\frac{1}{10\,m_e}\,,$$
where ρ = ne Am1 /Z is the mass density.
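The algebra leading from the Fermi momentum to the energy density can be checked numerically. The following sketch is an illustration under stated assumptions: it uses units in which ħ = me = 1 (so no physical constants are needed), a simple midpoint rule for the integral, and hypothetical function names.

```python
import math

HBAR = 1.0  # assumption: units with hbar = 1
M_E = 1.0   # assumption: units with electron mass = 1


def fermi_momentum(n_e: float) -> float:
    """Invert n_e = 8 pi p_F^3 / [3 (2 pi hbar)^3] for p_F."""
    return (3.0 * n_e * (2 * math.pi * HBAR) ** 3 / (8 * math.pi)) ** (1.0 / 3.0)


def energy_density_closed_form(n_e: float) -> float:
    """(8 pi)^(-2/3) (2 pi hbar)^2 (3 n_e)^(5/3) / (10 m_e)."""
    return (8 * math.pi) ** (-2.0 / 3.0) * (2 * math.pi * HBAR) ** 2 * (3 * n_e) ** (5.0 / 3.0) / (10 * M_E)


def energy_density_numeric(n_e: float, steps: int = 2000) -> float:
    """Midpoint-rule value of [2/(2 pi hbar)^3] * int_0^pF 4 pi p^2 (p^2 / 2 m_e) dp."""
    p_f = fermi_momentum(n_e)
    dp = p_f / steps
    total = 0.0
    for i in range(steps):
        p = (i + 0.5) * dp
        total += 4 * math.pi * p**2 * (p**2 / (2 * M_E)) * dp
    return 2.0 / (2 * math.pi * HBAR) ** 3 * total


def pressure(n_e: float) -> float:
    """Non-relativistic pressure, two-thirds of the kinetic energy density."""
    return 2.0 * energy_density_closed_form(n_e) / 3.0


n_e = 0.37
print(energy_density_numeric(n_e), energy_density_closed_form(n_e))
print(pressure(8 * n_e) / pressure(n_e))  # scaling with density to the power 5/3
```

Because the mass density is proportional to ne, the last line illustrates the 5/3 power law of the equation of state: increasing the density eightfold multiplies the pressure by 8^(5/3) = 32.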
The Hartree Approximation

In multi-electron atoms it is often a good approximation to treat each electron
as moving in a spherically symmetric (but not Coulomb!) effective potential
arising from the atomic nucleus and from all the other electrons. This is known
as the Hartree approximation, introduced by Douglas Hartree (1897–1958) in
1928.20 Each electron occupies some one-particle state of definite energy in this
effective potential with corresponding wave functions ψ1 (x, σ ), ψ2 (x, σ ), etc.
If electrons were distinguishable, the wave function for an atomic state with N
electrons – electron 1 in state 1, electron 2 in state 2, etc. – would be the product
ψ1 (x1 , σ1 )ψ2 (x2 , σ2 ) · · · ψN (xN , σN ) .
But, because electrons are indistinguishable fermions, this must be antisym-
metrized. The true wave function (up to a normalization constant) is

$$\psi = \sum_P \delta_P\,\psi_1(\mathbf{x}_{P1},\sigma_{P1})\,\psi_2(\mathbf{x}_{P2},\sigma_{P2})\cdots\psi_N(\mathbf{x}_{PN},\sigma_{PN})\,, \tag{5.5.7}$$
the sum running over all permutations P of 1, 2, . . . , N into P 1, P 2, . . . , P N,
with δP = +1 or δP = −1 for P an even or odd permutation, respectively.
For instance, for a two-electron state there are two permutations P , the identity
1 → 1, 2 → 2 with δP = +1, and the interchange 1 ↔ 2, with δP = −1, so
ψ = ψ1 (x1 , σ1 )ψ2 (x2 , σ2 ) − ψ1 (x2 , σ2 )ψ2 (x1 , σ1 ) .
In general, the wave function (5.5.7) can be written as a determinant, known as
a Slater determinant:21

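$$\psi = \begin{vmatrix}
\psi_1(\mathbf{x}_1,\sigma_1) & \psi_1(\mathbf{x}_2,\sigma_2) & \cdots & \psi_1(\mathbf{x}_N,\sigma_N) \\
\psi_2(\mathbf{x}_1,\sigma_1) & \psi_2(\mathbf{x}_2,\sigma_2) & \cdots & \psi_2(\mathbf{x}_N,\sigma_N) \\
\psi_3(\mathbf{x}_1,\sigma_1) & \psi_3(\mathbf{x}_2,\sigma_2) & \cdots & \psi_3(\mathbf{x}_N,\sigma_N) \\
\cdots & \cdots & \cdots & \cdots
\end{vmatrix} \tag{5.5.8}$$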
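The permutation sum that defines the antisymmetrized wave function can be evaluated directly for small N. The sketch below is only an illustration: the three one-dimensional "orbitals" are hypothetical functions chosen for convenience, a single real number stands in for each pair (x, σ), and the names `permutation_sign` and `antisymmetrized` are not from the text.

```python
import itertools
import math


def permutation_sign(perm) -> int:
    """+1 for an even permutation, -1 for an odd one, found by counting inversions."""
    inversions = sum(
        1 for i in range(len(perm)) for j in range(i + 1, len(perm)) if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1


def antisymmetrized(orbitals, coords) -> float:
    """Sum over permutations P of delta_P * prod_i orbitals[i](coords[P i])."""
    total = 0.0
    for perm in itertools.permutations(range(len(orbitals))):
        term = float(permutation_sign(perm))
        for i, orbital in enumerate(orbitals):
            term *= orbital(coords[perm[i]])
        total += term
    return total


# Hypothetical one-particle states, for illustration only.
orbitals = [
    lambda x: math.exp(-x * x),
    lambda x: x * math.exp(-x * x),
    lambda x: (2 * x * x - 1) * math.exp(-x * x),
]
coords = [0.3, -0.8, 1.1]

psi = antisymmetrized(orbitals, coords)
psi_swapped = antisymmetrized(orbitals, [coords[1], coords[0], coords[2]])
psi_repeated = antisymmetrized([orbitals[0], orbitals[0], orbitals[2]], coords)
print(psi, psi_swapped, psi_repeated)
```

Interchanging two particle labels reverses the sign of the result, and using the same one-particle state twice makes it vanish up to rounding error, as expected for a determinant with two equal rows. The cost grows as N!, so this direct sum is only practical for a handful of particles.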
The Pauli Exclusion Principle

None of the one-particle states occupied by electrons in the Hartree approxima-
tion can be the same, for if they were then two rows of the Slater determinant
would be identical, and the wave function would vanish. This principle was first
stated by Pauli,22 on the basis of efforts to understand the periodic table of the
elements, before it became understood that multi-electron wave functions have
to be antisymmetric. The number of values of L3 for a given ℓ is 2ℓ + 1, so
Pauli at first thought that not more than 2ℓ + 1 electrons can have the same n
and ℓ, but as we shall see, to get the chemistry right it is necessary to assume
that the maximum number of electrons with a given n and ℓ is 2(2ℓ + 1). For
this reason Pauli introduced a new quantum number that takes just two values,
which as discussed in the previous section were identified by Goudsmit and
Uhlenbeck as the two values S3 = ±h̄/2 of the 3-component of electron spin.
20 D. H. Hartree, Proc. Camb. Phil. Soc. 24, 111 (1928).

21 J. C. Slater, Phys. Rev. 34, 1293 (1929).
22 W. Pauli, Z. Physik 31, 763 (1925).
Pauli reasoned that as we increase the number Z of electrons in atoms each
added electron must occupy the one-particle state of next largest energy. This is
why electrons in atoms do not all fall into the two 1s states of lowest energy, so
that atoms with Z > 2 do not all behave chemically just like helium.
The Periodic Table

The Pauli exclusion principle provides an explanation for the periodic table
of elements, first described in purely chemical terms by Dmitri Ivanovich
Mendeleev (1834–1907) in 1869, long before atomic structure was understood.
Of course, Mendeleev knew nothing about electrons but he knew the values
of atomic weights and could list the elements in order of increasing atomic
weight. As we saw in Section 5.2, in the twentieth century it became clear that
the atomic number, defined as the place of an atom in this list, is the same
(with a few exceptions) as the charge Z of the atomic nucleus in units of e and
is hence equal to the number of electrons in the atom, on which the chemical
properties of elements chiefly depend.
Detailed calculations show that the one-electron states are filled (with spo-
radic exceptions) in the order
1s,
2s, 2p,
3s, 3p,
4s, 3d, 4p,
5s, 4d, 5p,
6s, 4f , 5d, 6p,
7s, 5f , 7p, . . . (5.5.9)
(We are here ignoring the small fine-structure splitting in the energies of these
states, and so are leaving out subscripts giving the values of j .) For a given ℓ,
increasing n increases the number of nodes of the wave function, so that the
wave function oscillates more with r, which increases the kinetic energy. This
is the main reason why electron energies increase going down the list. But, for
a given n, the increase in centrifugal force with increasing ℓ decreases the wave
function at small r where the charge interior to r is largest, which decreases the
effective absolute value of the negative potential energy, increasing the state’s
total energy. Hence, although the one-electron states listed above on the same
line have approximately equal energy, the energies increase somewhat from
left to right. In the case of 3d, 4d, 4f , 5d, and 5f states and many states
with n ≥ 6, the dependence of the energy on ℓ turns out to overcome its
dependence on n.
Taking spin into account, the total numbers of states for the energy levels listed
on each line of Eq. (5.5.9) are 2, 2 + 6 = 8, 2 + 6 = 8, 2 + 10 + 6 = 18,
2 + 10 + 6 = 18, 2 + 14 + 10 + 6 = 32, and so on. These are substantially
the same periodicities that had been discovered chemically by Mendeleev. For
instance, electrons that fill up any one of the lines of the table are said to form
a closed shell. It is energetically unfavorable for atoms whose electrons just fill
closed shells to gain or lose electrons, so these atoms are chemically inert. They
are the noble gases: there is helium with Z = 2, neon with Z = 2 + 8 = 10,
argon with Z = 10 + 8 = 18, krypton with Z = 18 + 18 = 36, xenon with
Z = 36 + 18 = 54, and radon with Z = 54 +32 = 86. Elements with one elec-
tron outside closed shells find it easy to lose that electron, which can move freely
through the crystal lattice carrying currents of electricity or of heat. These are
the alkali metals: lithium with Z = 2 + 1 = 3, sodium with Z = 10 + 1 = 11,
potassium with Z = 18+1 = 19, and so on. Elements with one electron missing
from the highest energy closed shell react strongly in chemical reactions in
which they can gain an electron. These are the halogens: there is fluorine with
Z = 10−1 = 9, chlorine with Z = 18−1 = 17, bromine with Z = 36−1 = 35,
and so on.
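The shell arithmetic in this paragraph follows mechanically from the filling order and the capacity 2(2ℓ + 1) of each one-electron level. A short sketch, using only the first six lines of Eq. (5.5.9) and hypothetical variable names:

```python
import itertools

# Orbital angular momentum for each spectroscopic letter.
L_VALUES = {"s": 0, "p": 1, "d": 2, "f": 3}

# The first six lines of the filling order, one list per line.
FILLING_ORDER = [
    ["1s"],
    ["2s", "2p"],
    ["3s", "3p"],
    ["4s", "3d", "4p"],
    ["5s", "4d", "5p"],
    ["6s", "4f", "5d", "6p"],
]


def capacity(level: str) -> int:
    """Number of electrons a level such as '3d' can hold: 2(2l + 1)."""
    return 2 * (2 * L_VALUES[level[-1]] + 1)


shell_sizes = [sum(capacity(level) for level in line) for line in FILLING_ORDER]
noble_gases = list(itertools.accumulate(shell_sizes))
alkali_metals = [z + 1 for z in noble_gases[:3]]
halogens = [z - 1 for z in noble_gases[1:4]]

print(shell_sizes)    # [2, 8, 8, 18, 18, 32]
print(noble_gases)    # [2, 10, 18, 36, 54, 86]
print(alkali_metals)  # [3, 11, 19]
print(halogens)       # [9, 17, 35]
```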
More generally, if an atom has a few electrons outside closed shells, it has
what chemists call a positive valence, equal to that number of extra electrons;
if it has a few electrons less than needed to fill closed shells, then it has neg-
ative valence, equal to that number of missing electrons. Thus alkali metals
have valence +1; the so-called alkaline earths beryllium, magnesium, calcium,
etc. have valence +2; the halogens have valence −1; oxygen, sulfur, etc. have
valence −2; and so on. The molecules of many simple chemical compounds
(not all!) are held together by electrostatic attraction between ions of elements
with positive and negative valence that have traded electrons. Since electrons
are neither created nor destroyed in chemistry, in such molecules if electri-
cally neutral the total valence must be zero. These include such compounds
as salts composed of metal and halogen atoms, like sodium chloride, oxides
like calcium oxide, etc. Hydrogen can act as if it has valence +1, as in water or
ammonia, or valence −1, as in metal hydrides.
Diatomic Molecules
The rotational energy spectrum of molecules like H2 , N2 , O2 , etc. that are com-
posed of two identical atoms is profoundly affected by the bosonic or fermionic
nature of the atomic nuclei. The energy required to excite rotational states of
molecules is less than the energy required to excite vibrational states by factors
of order (me /Am1 )1/2 , and less than the energy required to excite electronic
states by even smaller factors, of order me /Am1 , so the lowest energy states
of molecules are rotational states in which the separations of atomic nuclei and
the state of atomic electrons can be regarded as fixed. In such states the wave
function of a molecule consisting of two identical atoms is proportional to
$$c_s(\sigma_1,\sigma_2)\,Y_\ell^m(\hat{\mathbf{n}}) \pm c_s(\sigma_2,\sigma_1)\,Y_\ell^m(-\hat{\mathbf{n}})\,, \tag{5.5.10}$$
where n̂ is a unit vector in the direction from nucleus 1 to nucleus 2; $Y_\ell^m$ is
the usual spherical harmonic, of the sort discussed in Section 5.2; cs (σ1 , σ2 )
is a spin wave function that depends on the total spin s of the two nuclei and
their individual spins s1 = s2 , as well as on the spin 3-components σ1 and
σ2 , about which more later; and the sign is +1 or −1 if the nuclei are bosons or
fermions, respectively. The energy of the rotational states with a given ℓ is given
in quantum mechanics by replacing L² in the classical formula E = L²/2I with
ħ²ℓ(ℓ + 1), so that
$$E_\ell = \frac{\hbar^2\,\ell(\ell+1)}{2I} \tag{5.5.11}$$
with almost no dependence on total spin. Here I is the moment of inertia of
the molecule around a line perpendicular to n̂ through the center of mass of the
molecule. Now, $Y_\ell^m(-\hat{\mathbf{n}}) = (-1)^\ell\,Y_\ell^m(\hat{\mathbf{n}})$. Also, the spin wave functions have
the important symmetry property
$$c_s(\sigma_2,\sigma_1) = \pm(-1)^s\,c_s(\sigma_1,\sigma_2)\,, \tag{5.5.12}$$
where the sign ± is $(-1)^{2s_1}$; that is, +1 for adding two equal integer spins,
and −1 for adding two equal half odd integer spins. (In terms of the Clebsch–
Gordan coefficients described in the previous section,
$$c_s(\sigma_1,\sigma_2) = C_{s_1 s_1}(s\,\sigma;\sigma_1\,\sigma_2)\,,$$
where σ = σ1 + σ2 . Equation (5.4.42) with ja = s1 , J = s, ma = σ1 , mb = σ2 ,
M = σ gives
$$C_{s_1 s_1}(s\,\sigma;\sigma_2\,\sigma_1) = (-1)^{s-2s_1}\,C_{s_1 s_1}(s\,\sigma;\sigma_1\,\sigma_2)\,,$$
which is the same as Eq. (5.5.12).) Because of the spin–statistics connection,
the ± sign in Eq. (5.5.12) is the same as in Eq. (5.5.10). We see then that these
± signs cancel, and the only states in which the wave function does not vanish
are therefore those in which
$$(-1)^\ell = (-1)^s\,. \tag{5.5.13}$$
Either s and ℓ are both even, in which case the molecule is distinguished by the
prefix para, or both are odd, and the prefix is ortho. For instance, in H2 we have
parahydrogen, with s = 0 and ℓ even, and orthohydrogen, with s = 1 and ℓ odd.
The degeneracy of the states is then (2ℓ + 1) for parahydrogen and 3(2ℓ + 1)
for orthohydrogen.
The forces acting on spins are so weak that radiative transitions do not
change s and therefore can only change ℓ by an even number. The dominant
transitions are those in which ℓ changes by two units, giving a radiated energy
$$E_{\ell+2} - E_\ell = \frac{\hbar^2}{2I}\big[(\ell+2)(\ell+3) - \ell(\ell+1)\big] = \frac{\hbar^2}{2I}\,[4\ell+6]\,. \tag{5.5.14}$$
For para molecules the energies (5.5.14) are 3h̄2 /I , 7h̄2 /I , 11h̄2 /I , etc., while
for ortho molecules they are 5h̄2 /I , 9h̄2 /I , 13h̄2 /I , etc. Observing this pattern
of energies, with proportions 3 : 7 : 11 : · · · or 5 : 9 : 13 : · · · , it is possible
to judge which transitions are in para and which in ortho molecules, even if one
does not know the moment of inertia I .
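The two ladders of transition energies follow directly from the rotational energy levels of Eq. (5.5.11). A minimal sketch, working in units of ħ²/I and using exact rational arithmetic; the function names are hypothetical:

```python
from fractions import Fraction


def rotational_energy(ell: int) -> Fraction:
    """E_l in units of hbar^2 / I, that is l(l + 1)/2."""
    return Fraction(ell * (ell + 1), 2)


def transition_energy(ell: int) -> Fraction:
    """Energy radiated in the transition l + 2 -> l, in units of hbar^2 / I."""
    return rotational_energy(ell + 2) - rotational_energy(ell)


para = [transition_energy(ell) for ell in (0, 2, 4)]   # even l
ortho = [transition_energy(ell) for ell in (1, 3, 5)]  # odd l
print([str(e) for e in para])   # ['3', '7', '11']
print([str(e) for e in ortho])  # ['5', '9', '13']
```

Since only the proportions matter, an observed set of lines can be assigned to the para or ortho sequence without knowing I.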
The energy ħ²/2I is typically much less than kT , so the abundance of
diatomic molecules in a state with given s and ℓ is simply proportional to the
degeneracy (2s + 1)(2ℓ + 1). (For instance, for hydrogen ħ²/2I = k × 45 K.)
The observed transitions are typically between states with ℓ ≫ 1, and the
intensity of the radiation emitted is mostly a matter of the number of spin states
for ℓ and s even or odd, as follows. If the spin s1 of each nucleus is an integer, so
that they are bosons, then the allowed even values of s are 2s1 , 2s1 − 2, . . . , 0,
and the allowed odd values of s are 2s1 − 1, 2s1 − 3, . . . , 1. Hence in this case
the total number of spin states for para and ortho molecules is

$$\#\text{para} = \sum_{n=0}^{s_1}\big[2(2n)+1\big] = (s_1+1)(2s_1+1)\,,$$
$$\#\text{ortho} = \sum_{n=1}^{s_1}\big[2(2n-1)+1\big] = s_1(2s_1+1)\,,$$
and the ratio of the intensities of para and ortho transitions is

$$\frac{\text{para}}{\text{ortho}} = \frac{s_1+1}{s_1} \quad \text{(bosons)}\,. \tag{5.5.15}$$
On the other hand, if s1 is half an odd integer, so that the nuclei are fermions,
then the allowed even values of s are 2s1 − 1, 2s1 − 3, . . . , 0 and the allowed
odd values of s are 2s1 , 2s1 − 2, . . . , 1. Hence for s1 a half odd integer the total
number of spin states for ortho and para molecules is the same as the number of
spin states for para and ortho molecules in the case where s1 is an integer and
the ratio of the intensities of para and ortho molecules is the reciprocal of the
ratio (5.5.15):
$$\frac{\text{para}}{\text{ortho}} = \frac{s_1}{s_1+1} \quad \text{(fermions)}\,. \tag{5.5.16}$$
(For example, for hydrogen s1 = 1/2, so the abundance of parahydrogen is
about one-third that of orthohydrogen, and the total intensity of radiation emitted
or absorbed in transitions in parahydrogen is about one-third that for
orthohydrogen.) Evidently one can tell whether nuclei are bosons or fermions just
by observing whether radiation from the para or ortho transitions is stronger. In
the next chapter we will see that observations of the diatomic nitrogen molecule
presented a puzzle regarding the nature of the nitrogen nucleus that was only
resolved with the discovery of the neutron.
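The counting behind Eqs. (5.5.15) and (5.5.16) can be reproduced by summing the 2s + 1 spin states over even and odd total spin s. The sketch below is illustrative; `spin_state_counts` is a hypothetical helper that takes the nuclear spin s1 as a fraction, and "para" is taken to mean even s, as in the text.

```python
from fractions import Fraction


def spin_state_counts(s1):
    """Return (para, ortho) numbers of spin states for two nuclei of spin s1.

    The total spin s runs over 0, 1, ..., 2*s1; para collects even s and
    ortho collects odd s, each value of s contributing 2s + 1 states.
    """
    two_s1 = int(2 * Fraction(s1))
    para = sum(2 * s + 1 for s in range(two_s1 + 1) if s % 2 == 0)
    ortho = sum(2 * s + 1 for s in range(two_s1 + 1) if s % 2 == 1)
    return para, ortho


for s1 in (Fraction(1, 2), Fraction(1), Fraction(3, 2), Fraction(2)):
    para, ortho = spin_state_counts(s1)
    print(s1, para, ortho, Fraction(para, ortho))
```

For integer s1 the printed ratio equals (s1 + 1)/s1 and for half odd integer s1 it equals s1/(s1 + 1); in both cases the two counts add up to (2s1 + 1)², the total number of spin states of the two nuclei.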
Clouds of interstellar diatomic molecules can cool to quite low temperatures
by collisional excitation of rotational energy levels, after which the excitation
energy is emitted as radiation that leaves the cloud. This is an important feature
in the formation of stars by gravitational condensation of interstellar matter,
which requires low temperatures to mitigate pressure forces that can prevent
condensation. But for cooling, it is necessary that radiation should often be
emitted before the molecule gives its excitation energy back to the cloud in
another collision.
This is an obstacle to cooling by diatomic molecules with identical atoms. As
discussed in Section 7.5, the fastest radiative transitions in atoms and molecules
generally are electric dipole transitions, in which there is a non-zero value for
$\int \psi^*_{\rm final}\,\mathbf{P}\,\psi_{\rm initial}$ (where P is the momentum of the radiating particle). Since P is a
three-vector that changes sign under reflection of coordinates, this integral van-
ishes unless certain selection rules are obeyed: when spin effects are neglected,
the initial and final states must have opposite signs for the parity $(-1)^\ell$ and must
have values of ℓ that differ by no more than one unit. Neither selection rule
is satisfied by the transitions in diatomic molecules with identical atoms, in
which ℓ changes by two units. These are what in Section 7.5 are called electric
quadrupole transitions, which are much slower than electric dipole transitions.
Thus, although H2 is by far the most common molecule in interstellar space, it
contributes little to the cooling of molecular clouds.
On the other hand, in diatomic molecules with distinguishable atoms radiative
transitions can occur rapidly as electric dipole transitions in which ℓ changes by
one unit, and these molecules when excited by collisions often lose energy by
radiation rather than in further collisions. Of the more abundant molecules of
this sort, the most effective at cooling interstellar clouds is CO. This molecule
has a large moment of inertia, with ħ²/Ik ≃ 5.5 K, so it can cool clouds to
very low temperatures. The hydroxyl molecule OH is more abundant but has a
smaller moment of inertia and hence larger excitation energies, so it cannot cool
clouds to temperatures as low as can CO.
5.6 Scattering
Much of atomic, nuclear, and elementary particle physics is based on data
gained from the scattering of particles in collisions with other particles. In the
main body of this section we will consider scattering processes only in the case
that is simplest kinematically: the scattering of a particle by a much heavier
particle, such as the scattering of alpha particles by nuclei of various metals in
the 1911 experiment that led Rutherford to the discovery of the atomic nucleus.
In this case we can approximate the effect of the heavy target particle by taking
it to be at rest at the origin of coordinates, and representing its interaction with
the scattered particle as a fixed external potential V (x) that depends only on
the coordinate of the scattered particle. Not only is this a good approximation
for some scattering processes of historical importance – as we shall see, it was
the study of scattering using this approximation that led to the probabilistic
interpretation of quantum mechanics. An appendix to this section considers the
calculation of more general scattering and decay processes with any number of
particles of any type in the initial and final states.
Scattering Wave Function

We again use the time-independent Schrödinger equation
$$-\frac{\hbar^2}{2m}\nabla^2\psi(\mathbf{x}) + V(\mathbf{x})\,\psi(\mathbf{x}) = E\,\psi(\mathbf{x})\,, \tag{5.6.1}$$
where V (x) → 0 for |x| → ∞. But now instead of treating bound states with
E < 0, we here consider a particle with E > 0 that comes into the range of the
potential from an infinite distance and then recedes to infinity. We define a wave
number k > 0 by
$$E = \frac{\hbar^2k^2}{2m}\,, \tag{5.6.2}$$
and we rewrite the Schrödinger equation as
$$(\nabla^2 + k^2)\,\psi(\mathbf{x}) = \frac{2m}{\hbar^2}\,V(\mathbf{x})\,\psi(\mathbf{x})\,. \tag{5.6.3}$$
When x is far outside the range of the potential there is an asymptotic solution of
Eq. (5.6.3) that approaches a plane wave exp(ikx3 )/(2π h̄)3/2 (conveniently nor-
malized), which represents a particle coming in from infinity along the 3-axis.
We seek a solution of Eq. (5.6.3) with this asymptotic form:
$$\psi(\mathbf{x}) \to \frac{e^{ikx_3}}{(2\pi\hbar)^{3/2}} \tag{5.6.4}$$
for |x| → ∞. To find such a solution, we replace Eq. (5.6.3) with an integral
equation that incorporates the boundary condition (5.6.4):

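$$\psi(\mathbf{x}) = \frac{e^{ikx_3}}{(2\pi\hbar)^{3/2}} + \frac{2m}{\hbar^2}\int d^3x'\,G_k(\mathbf{x}-\mathbf{x}')\,V(\mathbf{x}')\,\psi(\mathbf{x}')\,, \tag{5.6.5}$$
where Gk (x − x′) is a Green’s function (named after the nineteenth-century
mathematician George Green (1793–1841)) satisfying the conditions
$$(\nabla^2 + k^2)\,G_k(\mathbf{x}-\mathbf{x}') = \delta^3(\mathbf{x}-\mathbf{x}')\,, \tag{5.6.6}$$
$$G_k(\mathbf{x}-\mathbf{x}') \to 0 \quad\text{for}\quad |\mathbf{x}-\mathbf{x}'| \to \infty\,. \tag{5.6.7}$$
Here δ³(x − x′) is the Dirac delta function, defined by the condition
$$\int d^3x'\,\delta^3(\mathbf{x}-\mathbf{x}')\,f(\mathbf{x}') = f(\mathbf{x}) \tag{5.6.8}$$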
for any sufficiently smooth function f (x).
Representations of the Delta Function

Of course, there is no function for which Eq. (5.6.8) is literally satisfied but,
by taking δ³(x − x′) to be very large when x′ is very close to x and very small
otherwise, we can come arbitrarily near to satisfying Eq. (5.6.8). For example,
we can take
$$\delta^3(\mathbf{x}-\mathbf{x}') = \frac{1}{(\sqrt{\pi}\,d)^3}\exp\!\left(-(\mathbf{x}-\mathbf{x}')^2/d^2\right)\,,$$
where d is some very small length. It is more convenient here to use another
well-known representation of the delta function:
$$\delta^3(\mathbf{x}-\mathbf{x}') = \frac{1}{(2\pi)^3}\int d^3q\,\exp\!\big(i\mathbf{q}\cdot(\mathbf{x}-\mathbf{x}')\big)\,. \tag{5.6.9}$$
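With this representation, Eq. (5.6.8) is the fundamental theorem of Fourier
analysis: if
$$g(\mathbf{q}) = \int d^3x'\,e^{-i\mathbf{q}\cdot\mathbf{x}'}f(\mathbf{x}')$$
then
$$f(\mathbf{x}) = \frac{1}{(2\pi)^3}\int d^3q\,e^{i\mathbf{q}\cdot\mathbf{x}}\,g(\mathbf{q}) = \frac{1}{(2\pi)^3}\int d^3q\,e^{i\mathbf{q}\cdot\mathbf{x}}\int d^3x'\,e^{-i\mathbf{q}\cdot\mathbf{x}'}f(\mathbf{x}')\,,$$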
which with an interchange of the order of integration (discarding mathematical
rigor) is the same as Eq. (5.6.8). If the wave function ϕp (x) for a free particle of
momentum p is defined so that
$$\varphi_{\mathbf{p}}(\mathbf{x}) = (2\pi\hbar)^{-3/2}\exp(i\mathbf{p}\cdot\mathbf{x}/\hbar)$$
then Eq. (5.6.9) gives these wave functions a simple delta-function normalization
$$\int d^3x\,\varphi^*_{\mathbf{p}'}(\mathbf{x})\,\varphi_{\mathbf{p}}(\mathbf{x}) = \delta^3(\mathbf{p}'-\mathbf{p})\,,$$
which is why we inserted a denominator $(2\pi\hbar)^{3/2}$ in Eq. (5.6.4). As we shall
see, we can derive valid results by manipulating δ³(x − x′) as if it were a well-
defined function.
Calculation of the Green’s Function

Using Eq. (5.6.9), we can easily write a solution of the differential equation
(5.6.6):

$$G_k(\mathbf{x}-\mathbf{x}') = \frac{1}{(2\pi)^3}\int d^3q\,\frac{\exp\!\big(i\mathbf{q}\cdot(\mathbf{x}-\mathbf{x}')\big)}{k^2-\mathbf{q}^2+i\epsilon}\,, \tag{5.6.10}$$
where ε is a positive infinitesimal that makes the integral well-defined despite
the singularity at |q| = k. (The reason for taking ε positive will be made clear
below.)
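The integral over the directions of q in Eq. (5.6.10) gives
$$\begin{aligned}
G_k(\mathbf{x}-\mathbf{x}') &= \frac{4\pi}{(2\pi)^3}\int_0^\infty q^2\,dq\,\frac{\sin q|\mathbf{x}-\mathbf{x}'|}{q|\mathbf{x}-\mathbf{x}'|}\,\frac{1}{k^2-q^2+i\epsilon} \\
&= \frac{4\pi}{(2\pi)^3}\,\frac{1}{2i|\mathbf{x}-\mathbf{x}'|}\int_{-\infty}^{\infty} q\,dq\,\frac{\exp\!\big(iq|\mathbf{x}-\mathbf{x}'|\big)}{k^2-q^2+i\epsilon}\,.
\end{aligned} \tag{5.6.11}$$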
We can evaluate the integral over q by closing the contour with a very large
semicircle in the upper half of the complex plane, on which the integrand is
exponentially small. Since ε is infinitesimal, we can write
$$\frac{1}{k^2-q^2+i\epsilon} = \frac{1}{k+i\epsilon/2k-q}\;\frac{1}{k+i\epsilon/2k+q}$$
and evaluate the integral as 2iπ times the residue of the pole inside the contour,
at q = k + iε/2k, and then take ε → 0:
$$G_k(\mathbf{x}-\mathbf{x}') = -\frac{1}{4\pi}\,\frac{1}{|\mathbf{x}-\mathbf{x}'|}\exp\!\big(ik|\mathbf{x}-\mathbf{x}'|\big)\,, \tag{5.6.12}$$
so that Gk (x − x′) satisfies the boundary condition (5.6.7).
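The outgoing-wave Green's function can be checked numerically against its defining equation. The sketch below is an illustration only: it uses the radial form of the Laplacian for a spherically symmetric function, central finite differences with hypothetical step sizes, and tests two things, that (∇² + k²)G vanishes away from the origin and that the flux of ∇G through a small sphere is close to 1, as required by the delta function on the right-hand side of the defining equation.

```python
import cmath
import math


def green(r: float, k: float) -> complex:
    """G_k as a function of r = |x - x'|: -exp(ikr) / (4 pi r)."""
    return -cmath.exp(1j * k * r) / (4 * math.pi * r)


def helmholtz_residual(r: float, k: float, h: float = 1e-3) -> complex:
    """(laplacian + k^2) G at r > 0, using laplacian G = (1/r) d^2(r G)/dr^2."""
    def u(s: float) -> complex:
        return s * green(s, k)

    second_derivative = (u(r + h) - 2 * u(r) + u(r - h)) / (h * h)
    return second_derivative / r + k * k * green(r, k)


def surface_flux(r: float, k: float, h: float = 1e-6) -> complex:
    """4 pi r^2 dG/dr, the integral of grad G over a sphere of radius r."""
    derivative = (green(r + h, k) - green(r - h, k)) / (2 * h)
    return 4 * math.pi * r * r * derivative


k = 2.0
print(abs(helmholtz_residual(1.5, k)))  # small: G solves the homogeneous equation for r > 0
print(surface_flux(1e-3, k))            # close to 1: unit-strength source at the origin
```

Replacing exp(ikr) by exp(−ikr) in `green` would pass both checks equally well; the choice between them is fixed by the sign of ε, not by the differential equation.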
The Scattering Amplitude

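Using Eq. (5.6.12) in Eq. (5.6.5) gives
$$\psi(\mathbf{x}) = \frac{e^{ikx_3}}{(2\pi\hbar)^{3/2}} - \frac{m}{2\pi\hbar^2}\int d^3x'\,\frac{1}{|\mathbf{x}-\mathbf{x}'|}\exp\!\big(ik|\mathbf{x}-\mathbf{x}'|\big)\,V(\mathbf{x}')\,\psi(\mathbf{x}')\,. \tag{5.6.13}$$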
In the limit when |x| is much larger than the values of x′ at which V (x′) is
appreciable, we can use the approximation
$$|\mathbf{x}-\mathbf{x}'| \to r\sqrt{1-2\,\mathbf{x}\cdot\mathbf{x}'/r^2} \to r - \hat{\mathbf{x}}\cdot\mathbf{x}'$$

1 eikr
ψ(x) → e ikx3
+ f (x̂) (5.6.14)
(2π h̄)3/2 r
where f (x̂) is the scattering amplitude

(2π h̄)3/2 m
f (x̂) = − d 3 x exp(−ik x̂ · x )V (x )ψ(x ) . (5.6.15)
2π h̄2
Probabilistic Interpretation
At a distance r from the scattering center that is not only large compared with
the range of the potential but also much greater than the wavelength 2π/k, the
second term in Eq. (5.6.14) at any given direction x̂ behaves like a plane wave
moving outward with wave vector k x̂. This is a familiar behavior for all sorts
of waves. A plane ocean wave encountering an obstacle in the water will break
up and spread out in all directions, just as in Eq. (5.6.14). But a particle like
an alpha particle in Rutherford’s laboratory encountering a target like a gold
nucleus does not break up. It hangs together, and is scattered in some definite
direction, though not a direction that can be predicted in advance. This showed
that ψ(x) or |ψ(x)|2 cannot represent how much of the scattered particle is at x.
It was this remark about scattering that led Max Born (1882–1970) in 1926 to
propose23 that if ψ is suitably normalized then |ψ(x)|2 is the probability density
at x – that is, |ψ(x)|2 d 3 x is the probability that the particle is in a small volume
d 3 x around x.
For a proper treatment of what happens in scattering it is necessary to con-
sider a wave function that at early times is a packet of free-particle waves, as
in Eq. (5.1.3), and use the time-dependent Schrödinger equation to follow the
subsequent scattering. This is the approach followed in the appendix to this
section. But, with a moderate amount of hand-waving, we can derive the most
important results more simply, just using Eq. (5.6.14).
23 M. Born, Z. Phys. 38, 803 (1926).
Suppose that at some early time before the scattering the incoming particle is
in a thin disk of area A and thickness L at right angles to the path of the particle.
In order for |ψ|² to serve as a probability density, we have to arrange that $e^{ikx_3}$
comes with a factor $1/\sqrt{AL}$ instead of $1/(2\pi\hbar)^{3/2}$, so that the integral of |ψ|²
over the disk at early times is unity. The scattering wave function (5.6.14) will
then also be multiplied by $(2\pi\hbar)^{3/2}/\sqrt{AL}$. At a late time t after the collisions
a scattered particle will be in a thin disk of the same thickness L at a distance
r = vt from the scattering center (where $v = \sqrt{2E/m}$). The probability density
at position r x̂ will be
$$|\psi(r\hat{\mathbf{x}})|^2 = \frac{|f(\hat{\mathbf{x}})|^2/r^2}{AL}$$
and the probability dP that the particle will be in a small solid angle dΩ around
x̂ is this probability density times the volume of a disk of thickness L and area
r² dΩ:
$$dP = \frac{|f(\hat{\mathbf{x}})|^2/r^2}{AL}\times L\,r^2\,d\Omega = |f(\hat{\mathbf{x}})|^2\,d\Omega/A\,. \tag{5.6.16}$$
AL
This is the same probability as if the particle by chance had to hit a tiny target
area dσ = |f (x̂)|² dΩ somewhere within the larger area A in order to be
scattered into a solid angle dΩ around x̂. The ratio of the target area dσ to
the solid angle dΩ is then
$$\frac{d\sigma}{d\Omega} = |f(\hat{\mathbf{x}})|^2 = \frac{2\pi m^2}{\hbar}\left|\int d^3x'\,\exp(-ik\hat{\mathbf{x}}\cdot\mathbf{x}')\,V(\mathbf{x}')\,\psi(\mathbf{x}')\right|^2 \tag{5.6.17}$$
and is known as the differential cross section. Much of modern theoretical and
experimental physics consists of the calculation and measurement of differential
cross sections.
Now we can see why it was necessary to take ε positive-definite in Eq. (5.6.10)
for the Green’s function. With ε negative-definite the integral (5.6.11) over q
would still be well-defined and we could still evaluate it by closing the contour
of integration with a large semicircle in the upper half complex plane of q,
on which the factor exp(iq|x − x′|) is exponentially small. Only now, with ε
negative, the pole in the integrand in the upper half of the complex plane would
be at q = −k − iε/2k, and in the asymptotic form of the wave function the
factor exp(ikr) would be replaced with exp(−ikr). Instead of a wave going
out in all directions to large distances, as in Eq. (5.6.14), this would represent a
wave coming into the potential along all directions from a great distance, which
is not what happens in any scattering process.
The Born Approximation

Equation (5.6.5) is of course not in itself a solution of the differential equation
(5.6.3), because ψ appears on the right-hand side of the equation, as well as on
the left. But it does suggest a solution that is a good approximation if $2m|V|/\hbar^2$
is everywhere much less than $k^2$. In this case we can approximate ψ on the
right-hand side of Eq. (5.6.5) with the term of zeroth order in V, that is, with
$e^{ikx_3}/(2\pi\hbar)^{3/2}$:
$$\psi(\mathbf{x}) \simeq \frac{1}{(2\pi\hbar)^{3/2}} \left[ e^{ikx_3} + \frac{2m}{\hbar^2} \int d^3x'\, G_k(\mathbf{x} - \mathbf{x}')\, V(\mathbf{x}')\, e^{ikx'_3} \right] . \tag{5.6.18}$$
This is known as the Born approximation. Repeating our earlier calculation of
the scattering amplitude, or just jumping back to Eq. (5.6.15) and replacing
$\psi(\mathbf{x}')$ with $e^{ikx'_3}/(2\pi\hbar)^{3/2}$, gives the corresponding approximation for the
scattering amplitude:
$$f(\hat{x}) \simeq -\frac{m}{2\pi\hbar^2} \int d^3x'\, \exp(-ik\hat{x}\cdot\mathbf{x}')\, V(\mathbf{x}')\, e^{ikx'_3} . \tag{5.6.19}$$
This formula becomes particularly simple in the frequently encountered case
in which the potential is spherically symmetric. We can write Eq. (5.6.19) in
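this case as
$$f(\hat{x}) \simeq -\frac{m}{2\pi\hbar^2} \int d^3x'\, \exp(i\mathbf{K}\cdot\mathbf{x}')\, V(|\mathbf{x}'|) ,$$
where
$$\mathbf{K} = k(\hat{z} - \hat{x}) .$$
Here ẑ is a unit vector in the 3-direction, the direction of the original particle
velocity. The integral over the direction of $\mathbf{x}'$ is then easy:
$$f(\hat{x}) \simeq -\frac{m}{2\pi\hbar^2} \int_0^\infty 4\pi r'^2\, V(r')\, \frac{\sin Kr'}{Kr'}\, dr' = -\frac{2m}{K\hbar^2} \int_0^\infty V(r')\, \sin(Kr')\, r'\, dr' , \tag{5.6.20}$$
where
$$K \equiv |\mathbf{K}| = k\sqrt{2 - 2\cos\theta} = k\sqrt{2 - 2[1 - 2\sin^2(\theta/2)]} = 2k\sin(\theta/2) , \tag{5.6.21}$$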
and θ is the angle between the incident direction ẑ and the scattered direc-
tion x̂. It is a special feature of the Born approximation for spherically sym-
metric potentials that the scattering amplitude depends on k and θ only in the
combination K.
Coulomb Scattering
For an important example of the Born approximation, consider a shielded
Coulomb potential
$$V(r) = \frac{Z_1 Z_2 e^2}{r} \exp(-\kappa r) . \tag{5.6.22}$$
This is a rough approximation to the Coulomb energy of a scattered particle of
charge Z2 e in the electric field of an atom whose nucleus has charge Z1 e. The
full electrostatic potential of the nucleus is felt by the scattered particle when the
particle is closer to the nucleus than the electronic orbits, taken to have typical
radii of order 1/κ, but the potential vanishes when the scattered particle is far
enough from the atom for the orbiting electrons to completely shield the charge
of the nucleus. (This potential is also known as a Yukawa potential, because,
as we will see in Section 7.3, in 1935 Hideki Yukawa (1907–1981) showed that
the exchange of a meson of mass $\hbar\kappa/c$ between two nuclear particles would
produce such a potential, though of course with some other constant factor in
place of $Z_1 Z_2 e^2$.) Using this in Eq. (5.6.20) gives a scattering amplitude
$$f(\hat{x}) \simeq -\frac{2m Z_1 Z_2 e^2}{\hbar^2}\, \frac{1}{K^2 + \kappa^2} , \tag{5.6.23}$$
with K given by Eq. (5.6.21). We can find the scattering amplitude for a pure
Coulomb potential by just taking κ = 0 in Eq. (5.6.23). This result is only valid
to first order in $Z_1 Z_2 e^2$, but a calculation of higher-order corrections shows that
for κ = 0 these higher-order corrections change the scattering amplitude only
by a phase factor, which has no effect on the differential cross section (5.6.17),
so in this case
$$\frac{d\sigma}{d\Omega} = \frac{4m^2 Z_1^2 Z_2^2 e^4}{\hbar^4 K^4} , \tag{5.6.24}$$
which holds even beyond the Born approximation. This is the same as the
formula calculated classically in 1911 by Rutherford, following the hyperbolic
trajectory of the alpha particle to find the area $d\sigma$ that it must hit to reach a
given direction within a solid angle $d\Omega$. Rutherford’s calculation would not
have given the correct scattering probability for a general potential, except at
very short wavelength. It was just good luck that for Coulomb scattering the
classical calculation gives the right answer for general wavelengths.
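The step from Eq. (5.6.20) to Eq. (5.6.23) can be checked numerically. The sketch below is only an illustration, not part of the original derivation: the function names, the cutoff `r_max`, the use of Simpson's rule, and the sample values (in units with m = ħ = 1) are all assumptions chosen for the example. It integrates Eq. (5.6.20) for the shielded Coulomb potential and compares the result with the closed form (5.6.23); it also evaluates Eq. (5.6.24) and compares it with the familiar $1/\sin^4(\theta/2)$ form of the Rutherford formula.

```python
import math


def born_amplitude(V, K, m=1.0, hbar=1.0, r_max=60.0, n=60000):
    """Born amplitude of Eq. (5.6.20) for a spherically symmetric V(r).

    Evaluates -(2m / (K hbar^2)) * int_0^r_max V(r) sin(K r) r dr with
    composite Simpson's rule. The cutoff r_max is an assumption that is
    adequate only if V(r) falls off fast enough.
    """
    if n % 2:
        raise ValueError("n must be even for Simpson's rule")
    h = r_max / n

    def integrand(r):
        # r V(r) sin(K r) -> 0 as r -> 0 for potentials no more singular than 1/r
        return 0.0 if r == 0.0 else V(r) * math.sin(K * r) * r

    total = integrand(0.0) + integrand(r_max)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * integrand(i * h)
    return -(2.0 * m / (K * hbar**2)) * total * h / 3.0


def yukawa_amplitude(K, strength, kappa, m=1.0, hbar=1.0):
    """Closed form of Eq. (5.6.23); `strength` stands for Z1 * Z2 * e^2."""
    return -(2.0 * m * strength / hbar**2) / (K**2 + kappa**2)


def rutherford_cross_section(theta, k, strength, m=1.0, hbar=1.0):
    """Eq. (5.6.24) with K = 2 k sin(theta/2) from Eq. (5.6.21)."""
    K = 2.0 * k * math.sin(theta / 2.0)
    return 4.0 * m**2 * strength**2 / (hbar**4 * K**4)


# Illustrative (hypothetical) values in units with m = hbar = 1
k, theta, strength, kappa = 1.2, 0.7, 0.5, 1.0
K = 2.0 * k * math.sin(theta / 2.0)
numeric = born_amplitude(lambda r: strength * math.exp(-kappa * r) / r, K)
exact = yukawa_amplitude(K, strength, kappa)

# Same cross section written in terms of the energy E = hbar^2 k^2 / 2m
E = k**2 / 2.0
classical_form = (strength / (4.0 * E)) ** 2 / math.sin(theta / 2.0) ** 4
quantum_form = rutherford_cross_section(theta, k, strength)
```

For a pure Coulomb potential (κ = 0) the radial integral no longer converges absolutely, which is one reason to keep κ finite in the numerical integration and take κ → 0 only in the closed form.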
Appendix: General Transition Rates
So far we have considered only the scattering of a single non-relativistic particle
by a fixed scattering center. Nature presents us with a much wider variety of
processes, in which any number of particles coming together from large sep-
arations in an initial state interact, producing some number of particles (not
necessarily the same number) that then go out to large separations in a final
state. These processes range from the decay of a single particle to the collision
of any number of relativistic or non-relativistic particles, producing any other
particles. This appendix describes a very general formalism for the calculation
of the rates of all such processes.
We consider a Hamiltonian of the general form
H = H0 + V (5.6.25)
in which the two terms are distinguished by the condition that the eigenfunctions
of H0 represent states of free particles, such as those that are present long before
or long after a collision, while V is an interaction that becomes negligible when
these particles are very far apart. For instance, for non-relativistic processes H0
is the operator representing the total kinetic energy. The eigenfunctions ϕα of
H0 satisfy
H0 ϕα = Eα ϕα . (5.6.26)
Here α labels the species, three-momenta, and spin z-components (or helicities)
of all the particles in the state α represented by ϕα , and Eα is the sum of
the kinetic plus mass energies of these particles. These wave functions can be
normalized so that
$$\int \varphi_\beta^* \varphi_\alpha = \delta(\beta - \alpha) , \tag{5.6.27}$$
with the understanding that δ(β − α) vanishes unless the numbers of particles
in the states α and β and the species and spin components of the corresponding
particles in these states are all equal, and where they are equal it is given by
a product of Dirac delta functions for the three-momentum of each particle.
(In Eq. (5.6.27) we continue to use the abbreviation that in $\int \varphi_\beta^* \varphi_\alpha$ we integrate
over all coordinates and sum over all spin 3-components on which both
wave functions depend.) To be explicit, for wave functions representing free-particle
states containing respectively N and N′ particles, we have
$$\int \varphi^*_{n'_1,\sigma'_1,\mathbf{p}'_1;\ldots;n'_{N'},\sigma'_{N'},\mathbf{p}'_{N'}}\, \varphi_{n_1,\sigma_1,\mathbf{p}_1;\ldots;n_N,\sigma_N,\mathbf{p}_N} = \delta_{N'N}\, \delta_{\sigma'_1\sigma_1} \cdots \delta_{\sigma'_N\sigma_N} \times \delta_{n'_1 n_1} \cdots \delta_{n'_N n_N} \times \delta^3(\mathbf{p}'_1 - \mathbf{p}_1) \cdots \delta^3(\mathbf{p}'_N - \mathbf{p}_N) ,$$
with the ns labeling species and the σs labeling spin z-components or helicities.
(For identical bosons or fermions it is necessary to respectively symmetrize or
antisymmetrize the products on the right-hand side.) We seek to calculate the
probability that the interaction V will cause a state that looks at very early times
like the free-particle state α to look at very late times like some other free-
particle state β.
To pursue this calculation, we consider an eigenfunction ψα of the full Hamil-
tonian (5.6.25) with energy Eα :
H ψα = Eα ψα . (5.6.28)
We can incorporate this condition along with our initial condition in what is
known as the Lippmann–Schwinger equation24:
$$\psi_\alpha = \varphi_\alpha + (E_\alpha - H_0 + i\epsilon)^{-1} V \psi_\alpha , \tag{5.6.29}$$
with ε a positive-definite infinitesimal quantity that makes $(E_\alpha - H_0 + i\epsilon)^{-1}$
well-defined even though $E_\alpha$ is within the spectrum of eigenvalues of $H_0$.
(The general reason for taking ε positive will be revealed shortly.) Multiplying
Eq. (5.6.29) with the operator $E_\alpha - H_0$ and using Eq. (5.6.26), we see that in the
limit ε → 0 we have $(E_\alpha - H_0)\psi_\alpha = V\psi_\alpha$, so any $\psi_\alpha$ that satisfies Eq. (5.6.29)
also satisfies Eq. (5.6.28).
24 B. Lippmann and J. Schwinger, Phys. Rev. 79, 469 (1950).
To check the initial condition, we need to consider the time dependence of a
packet of the wave functions $\psi_\alpha$. If we expand $V\psi_\alpha$ as an integral over free-particle
wave functions $\varphi_\beta$ and use Eqs. (5.6.26) and (5.6.27), Eq. (5.6.29)
becomes
$$\psi_\alpha = \varphi_\alpha + \int d\beta\, \frac{\varphi_\beta \int \varphi_\beta^* V \psi_\alpha}{E_\alpha - E_\beta + i\epsilon} , \tag{5.6.30}$$
where the integral over β includes an integration over all three-momenta in the
state represented by ϕβ and a sum over all species and spin labels on the particles
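in this state. The time dependence of a packet of these wave functions is given
in the Schrödinger picture by
$$\begin{aligned} \psi^{(g)}(t) &= e^{-iHt/\hbar} \int g(\alpha)\, \psi_\alpha\, d\alpha = \int g(\alpha)\, e^{-iE_\alpha t/\hbar}\, \psi_\alpha\, d\alpha \\ &= \int g(\alpha)\, e^{-iE_\alpha t/\hbar}\, \varphi_\alpha\, d\alpha + \int d\beta \int g(\alpha)\, d\alpha\, e^{-iE_\alpha t/\hbar}\, \frac{\varphi_\beta \int \varphi_\beta^* V \psi_\alpha}{E_\alpha - E_\beta + i\epsilon} \end{aligned} \tag{5.6.31}$$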
where g(α) is some smooth function of the momenta that may also depend on
the spin and species labels. It will be convenient to separate an integral over
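energy from the second integral over α, writing Eq. (5.6.31) as
$$\psi^{(g)}(t) = \int g(\alpha)\, e^{-iE_\alpha t/\hbar}\, \varphi_\alpha\, d\alpha + \int d\beta\, \varphi_\beta \int_{-\infty}^{+\infty} dE\, e^{-iEt/\hbar}\, \frac{G_\beta(E)}{E - E_\beta + i\epsilon} , \tag{5.6.32}$$
where
$$G_\beta(E) = \int d\alpha\, g(\alpha)\, \delta(E_\alpha - E) \int \varphi_\beta^* V \psi_\alpha . \tag{5.6.33}$$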
Now let us take t → −∞. For t < 0 we can close the contour of integration
over E in Eq. (5.6.32) with a very large semicircle in the upper half of the
complex plane, on which the factor e−iEt/h̄ makes the integrand negligible.
The integral over E is then given by a sum of the residues of any singularities
of the integrand in the upper half of the complex plane. There may well be
such singularities, but for t → −∞ their residues are exponentially suppressed
by the same factor $e^{-iEt/\hbar}$. A singularity infinitesimally above the real axis
would not be suppressed in this way, but the energy at which the denominator
$E - E_\beta + i\epsilon$ vanishes is just below the real axis, and so does not contribute
to this contour integral. (This reveals why we took ε to be positive.) Hence the
integral over E vanishes for t → −∞, so for very early times only the first term
in Eq. (5.6.32) survives:
$$\psi^{(g)}(t) \to \int g(\alpha)\, e^{-iE_\alpha t/\hbar}\, \varphi_\alpha\, d\alpha . \tag{5.6.34}$$
This is what we mean when we say that at very early times the state repre-
sented by ψα looks like the free-particle state represented by ϕα , as was to be
shown.
What does this state look like at very late times? For t > 0 we can only
close the contour of integration over E with a very large contour in the lower
half of the complex plane, on which the factor $e^{-iEt/\hbar}$ is now negligible.
The residues of any singularities of $G_\beta(E)$ at a finite distance below the real
axis are exponentially suppressed for t → +∞ by the same factor. But now
the singularity at $E = E_\beta - i\epsilon$ does contribute to the integral. The contour
of integration goes clockwise around this singularity, so this integral equals
$-2\pi i\, G_\beta(E_\beta - i\epsilon) \exp([-iE_\beta - \epsilon]t/\hbar)$. As long as we take ε → 0 before we
take t → +∞, we can drop the ε here, so the integral over E in Eq. (5.6.32)
equals $-2\pi i\, G_\beta(E_\beta) \exp(-iE_\beta t/\hbar)$, and Eq. (5.6.32) then gives
$$\psi^{(g)}(t) \to \int g(\alpha)\, e^{-iE_\alpha t/\hbar}\, \varphi_\alpha\, d\alpha - 2\pi i \int d\beta\, G_\beta(E_\beta)\, \varphi_\beta \exp(-iE_\beta t/\hbar)$$
for t → +∞. Using Eq. (5.6.33), this is
$$\psi^{(g)}(t) \to \int g(\alpha)\, d\alpha \int d\beta\, S_{\beta\alpha} \exp(-iE_\beta t/\hbar)\, \varphi_\beta , \tag{5.6.35}$$
where
$$S_{\beta\alpha} = \delta(\beta - \alpha) - 2\pi i\, \delta(E_\beta - E_\alpha) \int \varphi_\beta^* V \psi_\alpha . \tag{5.6.36}$$
So, in the same sense as in the case t → −∞, Eq. (5.6.35) shows that the
state represented by $\psi_\alpha$ looks at t → +∞ as a superposition $\int d\beta\, S_{\beta\alpha}\varphi_\beta$. The
coefficient (5.6.36) is known as the S-matrix and is the central object of study
in modern scattering theory.
But experiments do not measure probability amplitudes. They measure probabilities,
or the rates at which probabilities change. However, we cannot just set
the probability for the transition α → β equal to $|S_{\beta\alpha}|^2$. Even if we consider
a process for which α ≠ β, so that we can drop the term δ(β − α) in $S_{\beta\alpha}$,
the S-matrix element will still be proportional to the energy-conservation delta
function $\delta(E_\beta - E_\alpha)$, whose square is not well-defined. Also, in the most common
case, where no external fields affect the transition α → β, momentum is
conserved, so
$$\int \varphi_\beta^* V \psi_\alpha = \delta^3(\mathbf{P}_\beta - \mathbf{P}_\alpha)\, M_{\beta\alpha} , \tag{5.6.37}$$
where P here denotes the total momentum of the state and $M_{\beta\alpha}$ is some amplitude
that is not singular when $\mathbf{P}_\beta = \mathbf{P}_\alpha$. So we have to worry about the square
of $\delta^3(\mathbf{P}_\beta - \mathbf{P}_\alpha)$ as well as the square of $\delta(E_\beta - E_\alpha)$.
For a completely convincing way of dealing with these problems, we would
need to take superpositions of states with a range of energies and momenta, and
follow the evolution of these wave packets from very early to very late times.
We will adopt a much simpler approach that gives the right answers with a
minimum of trouble.
First, to deal with the inevitable energy-conservation delta function, we adopt
the fiction that the interaction V acts only for a long but finite time interval of
duration T . This should not introduce significant errors if this interval extends
back in time to long before the particles in state α become close to one another,
and extends forward in time to long after the particles in state β have been close
to one another.25 In this case, the one-dimensional version of the representation
(5.6.9) of the delta function becomes instead
$$\delta_T(E_\beta - E_\alpha) = \frac{1}{2\pi\hbar} \int_T dt\, \exp(-it(E_\beta - E_\alpha)/\hbar) , \tag{5.6.38}$$
the integral extending over the time interval of duration T. The square of the
delta function is then
$$[\delta_T(E_\beta - E_\alpha)]^2 = \delta_T(0)\, \delta_T(E_\beta - E_\alpha) = \frac{T}{2\pi\hbar}\, \delta_T(E_\beta - E_\alpha) .$$
As long as we do not attempt to measure energies to an uncertainty less than the
tiny amount ħ/T, we can drop the subscript T on the final delta function, and
write this as
$$[\delta_T(E_\beta - E_\alpha)]^2 = \frac{T}{2\pi\hbar}\, \delta(E_\beta - E_\alpha) . \tag{5.6.39}$$
25 For a decay process with a single-particle initial state we must take the duration T of the time interval
sufficiently large that the interval extends back in time close enough to the time when the particle was
produced, so that it had not yet had time to decay, and far enough forward in time that if the particle
has decayed by then its decay products will have had time to separate far enough that they are no longer
interacting.
Likewise, in the absence of external fields momentum is conserved; to
deal with the momentum-conservation delta function we imagine that the
system is enclosed in a box of large but finite volume V. The representation
(5.6.9) of the momentum-conservation delta function in Eq. (5.6.37) (now with
momentum and position taking the place of position and wave vector) is then
replaced with
$$\delta_V^3(\mathbf{P}_\beta - \mathbf{P}_\alpha) = \frac{1}{(2\pi\hbar)^3} \int_V d^3x\, \exp\left( i\mathbf{x}\cdot(\mathbf{P}_\beta - \mathbf{P}_\alpha)/\hbar \right) , \tag{5.6.40}$$
the integral running over the interior of the box. The square of this delta function
is
$$\left[\delta_V^3(\mathbf{P}_\beta - \mathbf{P}_\alpha)\right]^2 = \delta_V^3(0)\, \delta_V^3(\mathbf{P}_\beta - \mathbf{P}_\alpha) = \frac{V}{(2\pi\hbar)^3}\, \delta^3(\mathbf{P}_\beta - \mathbf{P}_\alpha) , \tag{5.6.41}$$
in which we drop the subscript V in the final expression because the uncertainty
in measurements of momenta is generally larger than the tiny amount
$\hbar V^{-1/3}$. Hence, putting together Eqs. (5.6.36), (5.6.37), (5.6.39), and (5.6.41),
the probability of a transition α → β with α ≠ β occurring in a time T in the
volume V is
$$P(\alpha \to \beta) = \left| S^{\rm box}_{\beta\alpha} \right|^2 = \frac{T}{2\pi\hbar}\, \frac{V}{(2\pi\hbar)^3}\, \delta(E_\beta - E_\alpha)\, \delta^3(\mathbf{P}_\beta - \mathbf{P}_\alpha) \left| 2\pi M^{\rm box}_{\beta\alpha} \right|^2 . \tag{5.6.42}$$
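A superscript “box” has been attached to the matrix elements $S_{\beta\alpha}$ and $M_{\beta\alpha}$
because putting the system in a box changes the way that we must normalize
the wave functions $\varphi_\alpha$ and $\varphi_\beta$. Without a box, the wave function for a particle
of momentum p far from any interaction is taken as
$\varphi_{\mathbf{p}}(\mathbf{x}) = \exp(i\mathbf{p}\cdot\mathbf{x}/\hbar)/(2\pi\hbar)^{3/2}$, so that
$\int d^3x\, \varphi^*_{\mathbf{p}}(\mathbf{x})\, \varphi_{\mathbf{p}'}(\mathbf{x}) = \delta^3(\mathbf{p} - \mathbf{p}')$, but in a box of volume V
we must instead take $\varphi_{\mathbf{p}}(\mathbf{x}) = \exp(i\mathbf{p}\cdot\mathbf{x}/\hbar)/\sqrt{V}$, so that the integral of $|\varphi_{\mathbf{p}}(\mathbf{x})|^2$
over the volume of the box is unity. Thus the matrix element for the transition
α → β in a box is related to the usual matrix element by
$$M^{\rm box}_{\beta\alpha} = \left[ \frac{(2\pi\hbar)^3}{V} \right]^{(N_\alpha + N_\beta)/2} M_{\beta\alpha} , \tag{5.6.43}$$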
where $N_\alpha$ and $N_\beta$ are the numbers of particles in the initial and final states.
There is a further complication, that in a large box the final states are very close
together. According to Eq. (5.5.1), the number of allowed momentum values for
a single particle in a range $d^3p$ of momenta is $(V/(2\pi\hbar)^3)\, d^3p$, so the number
of momentum states in the range of final states is
$$dN(\beta) = \left( V/(2\pi\hbar)^3 \right)^{N_\beta} d\beta \tag{5.6.44}$$
where dβ denotes a product of momentum-space volume elements $d^3p$ for each
particle in the final state. Using Eqs. (5.6.43) and (5.6.44) in Eq. (5.6.42), the
differential rate for transitions from an initial state α into a range dβ of final
states is
$$d\Gamma(\alpha \to \beta) = \frac{\left| S^{\rm box}_{\beta\alpha} \right|^2 dN(\beta)}{T} = \frac{2\pi}{\hbar} \left[ \frac{V}{(2\pi\hbar)^3} \right]^{1 - N_\alpha} \left| M_{\beta\alpha} \right|^2 \delta(E_\beta - E_\alpha)\, \delta^3(\mathbf{P}_\beta - \mathbf{P}_\alpha)\, d\beta . \tag{5.6.45}$$
This is the master formula for calculating the rates for all sorts of transitions
between free-particle states.
The factor $[V/(2\pi\hbar)^3]^{1 - N_\alpha}$ in Eq. (5.6.45) may look peculiar, but it is in
fact just what is needed to account for what is measured. For a decay process
with $N_\alpha = 1$ this factor is of course absent, corresponding to the obvious fact
that the decay rate of a particle does not depend on the size of the box in
which it is contained. For a two-particle initial state α, the differential rate
of the scattering α → β into an arbitrary final state β is proportional to the
flux, the product of the relative velocity $u_\alpha$ and the number density 1/V of
either particle as seen from the other, and is therefore written as the flux times
a differential cross section dσ(α → β). (For a pair of non-relativistic particles
$u_\alpha = |\mathbf{p}_1/m_1 - \mathbf{p}_2/m_2|$, while if one of the particles is a photon then $u_\alpha = c$.)
Hence Eq. (5.6.45) gives
$$d\sigma(\alpha \to \beta) \equiv \frac{d\Gamma(\alpha \to \beta)}{u_\alpha / V} = \frac{(2\pi)^4 \hbar^2}{u_\alpha} \left| M_{\beta\alpha} \right|^2 \delta(E_\beta - E_\alpha)\, \delta^3(\mathbf{P}_\beta - \mathbf{P}_\alpha)\, d\beta . \tag{5.6.46}$$
To clarify the meaning of the closing factor $\delta(E_\beta - E_\alpha)\, \delta^3(\mathbf{P}_\beta - \mathbf{P}_\alpha)\, d\beta$ in
Eqs. (5.6.45) and (5.6.46), consider a process α → β in the center-of-mass
system, with $\mathbf{P}_\alpha = 0$, where β is a state of two particles with momenta $\mathbf{p}_A$ and
$\mathbf{p}_B$ and masses $m_A$ and $m_B$. The closing factor in Eq. (5.6.46) is here
$$\delta(E_\beta - E_\alpha)\, \delta^3(\mathbf{P}_\beta - \mathbf{P}_\alpha)\, d\beta = \delta(E_A + E_B - E_\alpha)\, \delta^3(\mathbf{p}_A + \mathbf{p}_B)\, d^3p_A\, d^3p_B .$$
When we integrate over the final momenta the momentum-conservation delta
function directs us to set $\mathbf{p}_A = -\mathbf{p}_B \equiv \mathbf{p}$, so
$$\delta(E_\beta - E_\alpha)\, \delta^3(\mathbf{P}_\beta - \mathbf{P}_\alpha)\, d\beta \to p^2\, dp\, d\Omega\; \delta\left( (p^2c^2 + m_A^2c^4)^{1/2} + (p^2c^2 + m_B^2c^4)^{1/2} - E_\alpha \right) ,$$
where p is in the solid angle $d\Omega$. There is a general rule that since for an
arbitrary increasing function f(p) which takes a value $f_0$ at a single point $p_0$
we have $1 = \int \delta(f(p) - f_0)\, df(p)$, it follows that
$$\delta(f(p) - f_0) = \delta(p - p_0)/f'(p_0) . \tag{5.6.47}$$
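In our case, this means that when we integrate over p, we are directed to set
$p = p_\beta$, where
$$(p_\beta^2c^2 + m_A^2c^4)^{1/2} + (p_\beta^2c^2 + m_B^2c^4)^{1/2} = E_\alpha , \tag{5.6.48}$$
and Eq. (5.6.46) becomes
$$d\sigma(\alpha \to \beta) = \frac{(2\pi)^4 \hbar^2}{u_\alpha}\, \frac{p_\beta^2}{u_\beta} \left| M_{\beta\alpha} \right|^2 d\Omega , \tag{5.6.49}$$
in which it is understood that, in the center-of-mass system, $M_{\beta\alpha}$ is to be
evaluated by placing $\mathbf{p}_A = -\mathbf{p}_B$ in the infinitesimal solid angle $d\Omega$, with
$|\mathbf{p}_A| = |\mathbf{p}_B| = p_\beta$, and
$$u_\beta \equiv \frac{p_\beta c^2}{(p_\beta^2c^2 + m_A^2c^4)^{1/2}} + \frac{p_\beta c^2}{(p_\beta^2c^2 + m_B^2c^4)^{1/2}} . \tag{5.6.50}$$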
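Equations (5.6.48) and (5.6.50) are easy to evaluate in practice. The sketch below is a hedged illustration, not taken from the text: the function names and the sample numbers (in units with c = 1) are assumptions. It solves Eq. (5.6.48) for $p_\beta$ in closed form, evaluates $u_\beta$ from Eq. (5.6.50), and checks by finite differences that $u_\beta$ is the derivative $f'(p_\beta)$ of the total energy, which is how Eq. (5.6.47) produces the factor $p_\beta^2/u_\beta$ in Eq. (5.6.49).

```python
import math


def total_energy(p, m_A, m_B, c=1.0):
    """Left-hand side of Eq. (5.6.48) as a function of the momentum p."""
    return math.hypot(p * c, m_A * c**2) + math.hypot(p * c, m_B * c**2)


def final_momentum(E_alpha, m_A, m_B, c=1.0):
    """Closed-form solution of Eq. (5.6.48) for p_beta."""
    rest_A, rest_B = m_A * c**2, m_B * c**2
    if E_alpha < rest_A + rest_B:
        raise ValueError("total energy is below the two-particle threshold")
    s = E_alpha**2
    disc = (s - rest_A**2 - rest_B**2) ** 2 - 4.0 * rest_A**2 * rest_B**2
    return math.sqrt(max(disc, 0.0)) / (2.0 * E_alpha * c)


def relative_velocity(p, m_A, m_B, c=1.0):
    """Eq. (5.6.50): sum of the two particle speeds for back-to-back momenta."""
    return (p * c**2 / math.hypot(p * c, m_A * c**2)
            + p * c**2 / math.hypot(p * c, m_B * c**2))


# Illustrative (hypothetical) values with c = 1
E_alpha, m_A, m_B = 10.0, 1.0, 2.0
p_beta = final_momentum(E_alpha, m_A, m_B)
u_beta = relative_velocity(p_beta, m_A, m_B)

# Finite-difference estimate of f'(p_beta), with f(p) the total energy
h = 1e-6
f_prime = (total_energy(p_beta + h, m_A, m_B)
           - total_energy(p_beta - h, m_A, m_B)) / (2.0 * h)
```

For two massless particles the same functions give $p_\beta = E_\alpha/2c$ and $u_\beta = 2c$.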
Of course in the center-of-mass system the initial relative velocity $u_\alpha$ in
Eq. (5.6.49) is given by similar formulas but with β replaced with α and
the final masses $m_A$ and $m_B$ replaced with initial masses $m_{A'}$ and $m_{B'}$:
$$u_\alpha \equiv \frac{p_\alpha c^2}{(p_\alpha^2c^2 + m_{A'}^2c^4)^{1/2}} + \frac{p_\alpha c^2}{(p_\alpha^2c^2 + m_{B'}^2c^4)^{1/2}} , \tag{5.6.51}$$
where
$$(p_\alpha^2c^2 + m_{A'}^2c^4)^{1/2} + (p_\alpha^2c^2 + m_{B'}^2c^4)^{1/2} = E_\alpha . \tag{5.6.52}$$
We can now see how our earlier results for scattering by a fixed potential
emerge from this general formalism. Consider an elastic non-relativistic scattering
process, in which $m_A = m_{A'} \equiv m$ and $m_B = m_{B'} \gg m$. In this
case $p_\alpha = p_\beta$, $u_\alpha = u_\beta = p_\alpha/m$, and $E_\alpha - m_Ac^2 - m_Bc^2 = p_\alpha^2/2m$.
Equation (5.6.49) then gives the differential cross section
$$\frac{d\sigma(\alpha \to \beta)}{d\Omega} = (2\pi)^4 \hbar^2 m^2 \left| M_{\beta\alpha} \right|^2 . \tag{5.6.53}$$
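To calculate the matrix element $M_{\beta\alpha}$, we note that the final free-particle wave
function is
$$\varphi_\beta(\mathbf{x}_A, \mathbf{x}_B) = \frac{e^{i\mathbf{p}_A\cdot\mathbf{x}_A/\hbar}}{(2\pi\hbar)^{3/2}}\, \frac{e^{i\mathbf{p}_B\cdot\mathbf{x}_B/\hbar}}{(2\pi\hbar)^{3/2}}$$
and in the center-of-mass system the initial interacting wave function takes the
form
$$\psi_\alpha(\mathbf{x}_A, \mathbf{x}_B) = \psi(\mathbf{x}_A - \mathbf{x}_B) \times \frac{1}{(2\pi\hbar)^{3/2}} ,$$
where ψ is the wave function discussed in the main body of this section (which
already includes a normalization factor $(2\pi\hbar)^{-3/2}$), and the second factor takes
care of the normalization of the heavy particle wave function. Then, setting
$\mathbf{x}_A = \mathbf{x} + \mathbf{x}_B$ and integrating over $\mathbf{x}_B$,
$$\int \varphi_\beta^* V \psi_\alpha \equiv \int \varphi_\beta^*(\mathbf{x}_A, \mathbf{x}_B)\, V(\mathbf{x}_A - \mathbf{x}_B)\, \psi_\alpha(\mathbf{x}_A, \mathbf{x}_B)\, d^3x_A\, d^3x_B = \delta^3(\mathbf{p}_A + \mathbf{p}_B) \int d^3x\, \frac{e^{-i\mathbf{p}_A\cdot\mathbf{x}/\hbar}}{(2\pi\hbar)^{3/2}}\, V(\mathbf{x})\, \psi(\mathbf{x})$$
so
$$M_{\beta\alpha} = \int d^3x\, \frac{e^{-i\mathbf{p}_A\cdot\mathbf{x}/\hbar}}{(2\pi\hbar)^{3/2}}\, V(\mathbf{x})\, \psi(\mathbf{x}) . \tag{5.6.54}$$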
Using Eq. (5.6.54) in Eq. (5.6.53) gives the same differential cross section
(5.6.17) as found earlier.
It is frequently observed that the cross section for some reaction is a function
of energy with a sharp peak. This is a sign of a resonance, the formation of
a slowly decaying intermediate state in the scattering process. Suppose the
integral $\int \varphi_\beta^* V \psi_\alpha$ in Eq. (5.6.36) for the S-matrix has a term with an energy
dependence proportional to $(E_\alpha - E_R + i\hbar\Gamma/2)^{-1}$, with $E_R$ and Γ real and
Γ > 0. This yields a term in the function $G_\beta(E)$ defined by Eq. (5.6.33) with
energy dependence proportional to $(E - E_R + i\hbar\Gamma/2)^{-1}$, which has a pole in
the lower half of the complex E plane. Although, as noted in the derivation of
Eq. (5.6.35), the contribution of any singularity in $G_\beta(E)$ at an energy E at a
finite distance below the real axis vanishes for t → +∞, if the singularity is
close to the real axis then this contribution lasts a long time. So if Γ is relatively
small then the integral over E in Eq. (5.6.32) contains a term that decays
slowly, with a time dependence proportional to $\exp(-iE_Rt/\hbar)\exp(-\Gamma t/2)$,
giving a term in $|\psi^{(g)}(t)|^2$ that decays as $\exp(-\Gamma t)$, indicating the presence
of an intermediate state whose probability decays at a rate Γ. The singular term
in $\int \varphi_\beta^* V \psi_\alpha$ gives a term in the cross section with energy dependence
$$\sigma \propto \left| \frac{1}{E - E_R + i\hbar\Gamma/2} \right|^2 = \frac{1}{(E - E_R)^2 + \hbar^2\Gamma^2/4} . \tag{5.6.55}$$
So, this is the general rule for resonances: the decay rate of the intermediate
state is the full width in energy of the resonant peak in the cross section at half
maximum, divided by h̄.
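The rule relating width and decay rate can be confirmed directly from Eq. (5.6.55). The following sketch is an illustration only: the function names, the bisection search, and the sample values of $E_R$, Γ and ħ are assumptions made for the example. It locates the half-maximum point of the line shape numerically and compares the full width with ħΓ.

```python
def lorentzian(E, E_R, Gamma, hbar=1.0):
    """Energy dependence of Eq. (5.6.55), up to an overall constant."""
    return 1.0 / ((E - E_R) ** 2 + (hbar * Gamma) ** 2 / 4.0)


def full_width_at_half_maximum(E_R, Gamma, hbar=1.0, iterations=200):
    """Find the FWHM of the peak by bisection on the high-energy side."""
    half_peak = 0.5 * lorentzian(E_R, E_R, Gamma, hbar)
    lo, hi = E_R, E_R + 10.0 * hbar * Gamma  # the half-maximum point lies in between
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if lorentzian(mid, E_R, Gamma, hbar) > half_peak:
            lo = mid
        else:
            hi = mid
    # The line shape is symmetric about E_R, so the full width is twice the offset
    return 2.0 * (0.5 * (lo + hi) - E_R)


# Illustrative (hypothetical) values in arbitrary units
E_R, Gamma, hbar = 2.0, 0.3, 0.5
width = full_width_at_half_maximum(E_R, Gamma, hbar)
decay_rate_from_width = width / hbar
```

In this example the width found numerically divided by ħ reproduces the input Γ, as the rule states.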
5.7 Canonical Formalism
Until now we have followed de Broglie in representing the momentum of
a particle as −iħ times the gradient with respect to the particle’s position,
so that the wave function representing a state with definite momentum p is
$\propto \exp(i\mathbf{p}\cdot\mathbf{x}/\hbar)$. From this, we obtained the commutation relation among the
operators X and P that represent position and momentum, for instance, for a
single particle,
$$[X_i, P_j] = i\hbar\delta_{ij} , \qquad [X_i, X_j] = [P_i, P_j] = 0 , \tag{5.7.1}$$
where in the Heisenberg picture P = mẊ. This has been adequate in deal-
ing with charged particles moving in an electrostatic potential but not in more
complicated contexts, such as the case of charged particles moving in general
classical electromagnetic fields, discussed in the next section, much less for a
quantum theory of fields. Also, in using commutation relations like Eq. (5.7.1),
we must wonder (or at least we should wonder) why these relations are valid.
Hamiltonian Formalism
There is a more general approach, known as the canonical formalism, according
to which the continuous degrees of freedom (excluding spin) of any system are
represented by a set of canonical variables Qa (such as all the components of
the positions of all the particles in a system) and an equal number of “canonical
conjugates” Pa . Like any operators, in the Heisenberg picture these operators
satisfy the equations of motion (5.3.34):
$$i\hbar \frac{d}{dt} Q_a(t) = [Q_a(t), H] , \qquad i\hbar \frac{d}{dt} P_a(t) = [P_a(t), H] , \tag{5.7.2}$$
where $H = H(Q(t), P(t))$ is the Hamiltonian of the system. On the basis of
previous experience with classical phenomena, we commonly need to require
that these equations of motion take the same form as the Hamiltonian equations
of motion in classical mechanics:
$$\frac{d}{dt} Q_a(t) = \frac{\partial}{\partial P_a(t)} H(Q(t), P(t)) , \tag{5.7.3}$$
$$\frac{d}{dt} P_a(t) = -\frac{\partial}{\partial Q_a(t)} H(Q(t), P(t)) . \tag{5.7.4}$$
For instance, for a particle of mass m in a potential V (X), the variables Qa are
the components of the position vector X, the Hamiltonian is
$$H(\mathbf{X}, \mathbf{P}) = \frac{\mathbf{P}^2}{2m} + V(\mathbf{X}) ,$$
and the equations of motion (5.7.3) and (5.7.4) are
$$\frac{d}{dt}\mathbf{X} = \frac{\mathbf{P}}{m} , \qquad \frac{d}{dt}\mathbf{P} = -\nabla V(\mathbf{X})$$
as in Newtonian mechanics. In order to guarantee that the equations of motion
(5.7.3) and (5.7.4) follow from the equations (5.7.2) of the Heisenberg picture,
we impose the canonical commutation relations
[Qa (t), Pb (t)] = i h̄δab , [Qa (t), Qb (t)] = [Pa (t), Pb (t)] = 0 . (5.7.5)
To see that this works, recall that as remarked in Section 5.3 commutation
is algebraically like differentiation. It follows from the commutation relations
(5.7.5) that for any function F(Q, P) of the Qs and Ps,
$$\left[ Q_a(t), F(Q(t), P(t)) \right] = i\hbar \frac{\partial}{\partial P_a(t)} F(Q(t), P(t)) , \tag{5.7.6}$$
$$\left[ P_a(t), F(Q(t), P(t)) \right] = -i\hbar \frac{\partial}{\partial Q_a(t)} F(Q(t), P(t)) . \tag{5.7.7}$$
So, by taking F = H it follows trivially from the Heisenberg picture equations
(5.7.2) and the commutation relations (5.7.5) that the Qs and P s satisfy the
Hamiltonian equations of motion (5.7.3) and (5.7.4). This is why we impose
these commutation relations.
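As a quick check of Eqs. (5.7.6) and (5.7.7), consider a single degree of freedom and the illustrative choices $F = P^2$ and $F = Q^2$ (these examples are not from the text). Using only $[Q, P] = i\hbar$ and the product rule $[A, BC] = [A, B]C + B[A, C]$:

```latex
\begin{aligned}
[Q, P^2] &= [Q, P]\,P + P\,[Q, P] = 2i\hbar P = i\hbar\,\frac{\partial P^2}{\partial P} , \\
[P, Q^2] &= [P, Q]\,Q + Q\,[P, Q] = -2i\hbar Q = -i\hbar\,\frac{\partial Q^2}{\partial Q} .
\end{aligned}
```

Repeating the product rule extends the same result to any power of P or Q, and hence to any function that can be written as a power series.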
Of course, since operators in the Heisenberg and Schrödinger pictures are
related by Eq. (5.3.35), the commutation relations for the Schrödinger-picture
operators Qa and Pa are the same as for the Heisenberg-picture operators Qa (t)
and Pa (t).
It is in order to satisfy the canonical commutation relations (5.7.5) that in
wave mechanics we represent the momentum vector by the operator −i h̄∇.
What for de Broglie and Schrödinger was just a guess is a necessary conse-
quence of the canonical formalism. But there are cases where the canonical
conjugates Pa are not simply masses times velocities but take a different form,
as dictated by the Hamiltonian equation (5.7.3). In such cases, it is the quantities
Pa and not masses times velocities that must be represented as gradients.
For instance, consider a particle that experiences a momentum-dependent
interaction, with Hamiltonian
$$H = \frac{\mathbf{P}^2}{2m} + \frac{1}{2}\mathbf{P}\cdot\mathbf{V}(\mathbf{X}) + \frac{1}{2}\mathbf{V}(\mathbf{X})\cdot\mathbf{P} , \tag{5.7.8}$$
where V is some vector function of position. (Since $P_i$ does not commute
with $X_i$, we need to average over orderings of P and V(X) in order for the
Hamiltonian to be self-adjoint.) Here Eq. (5.7.3) tells us that the momentum is
not just the mass times the velocity, but instead
$$\mathbf{P}(t) = m\left[ \frac{d}{dt}\mathbf{X}(t) - \mathbf{V}(\mathbf{X}(t)) \right] . \tag{5.7.9}$$
Nevertheless, it is P and not m dX/dt that must be represented in wave
mechanics by −iħ∇, in order to satisfy the first commutation relation (5.7.5).
In particular, the time-dependent Schrödinger equation here reads
$$i\hbar \frac{\partial}{\partial t}\psi(\mathbf{x}, t) = -\frac{\hbar^2}{2m}\nabla^2\psi(\mathbf{x}, t) - \frac{i\hbar}{2}\nabla\cdot[\mathbf{V}(\mathbf{x})\psi(\mathbf{x}, t)] - \frac{i\hbar}{2}\mathbf{V}(\mathbf{x})\cdot\nabla\psi(\mathbf{x}, t) . \tag{5.7.10}$$
Lagrangian Formalism
There is another version of the canonical formalism, in quantum mechanics as
well as classical mechanics, based on a Lagrangian L(Q, Q̇) taken as a function
of canonical variables Qa (t) and their time derivatives Q̇a (t) rather than a
Hamiltonian function of canonical variables and their canonical conjugates. The
fundamental assumption of the Lagrangian formalism is that a quantity known
as the action
$$I \equiv \int_{-\infty}^{+\infty} L(Q(t), \dot{Q}(t))\, dt \tag{5.7.11}$$
is unaffected by infinitesimal shifts in the functions $Q_a(t)$ that vanish at
t → ±∞. To use this assumption, note that when $Q_a(t)$ is changed to
$Q_a(t) + \delta Q_a(t)$ with $\delta Q_a(t)$ infinitesimal, the change in the action is
$$\delta I = \sum_a \int_{-\infty}^{+\infty} \left[ \frac{\partial L(Q(t), \dot{Q}(t))}{\partial Q_a(t)}\, \delta Q_a(t) + \frac{\partial L(Q(t), \dot{Q}(t))}{\partial \dot{Q}_a(t)}\, \frac{d}{dt}\delta Q_a(t) \right] dt .$$
In the case where $\delta Q_a(t)$ vanishes at t → ±∞, integrating the second term in
the integrand by parts gives
$$\delta I = \sum_a \int_{-\infty}^{+\infty} \left[ \frac{\partial L(Q(t), \dot{Q}(t))}{\partial Q_a(t)} - \frac{d}{dt}\frac{\partial L(Q(t), \dot{Q}(t))}{\partial \dot{Q}_a(t)} \right] \delta Q_a(t)\, dt ,$$
and since this is assumed to vanish for arbitrary variations $\delta Q_a(t)$ that vanish at
t → ±∞, we must have
$$\frac{d}{dt}\frac{\partial L(Q(t), \dot{Q}(t))}{\partial \dot{Q}_a(t)} = \frac{\partial L(Q(t), \dot{Q}(t))}{\partial Q_a(t)} . \tag{5.7.12}$$
These are the equations of motion in the Lagrangian formalism.
From this, we can go over to the classical Hamiltonian formalism, defining
$$P_a(t) = \frac{\partial L(Q(t), \dot{Q}(t))}{\partial \dot{Q}_a(t)} \tag{5.7.13}$$
with Hamiltonian
$$H(Q, P) = \sum_a \dot{Q}_a P_a - L(Q, \dot{Q}) . \tag{5.7.14}$$
(Taken literally, this may not put the Qs and P s in the right order for H to
be self-adjoint, in which case we must average over their ordering to make H
self-adjoint as we did in Eq. (5.7.8).) In Eq. (5.7.14) we should regard Q̇ as a
function of the Qs and P s, given by solving Eq. (5.7.13) for Q̇. We can then
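check that the Qs and Ps satisfy the Hamiltonian equations of motion
$$\frac{\partial H(Q, P)}{\partial Q_a} = \sum_b \frac{\partial \dot{Q}_b}{\partial Q_a} P_b - \frac{\partial L(Q, \dot{Q})}{\partial Q_a} - \sum_b \frac{\partial L(Q, \dot{Q})}{\partial \dot{Q}_b}\, \frac{\partial \dot{Q}_b}{\partial Q_a} = -\frac{\partial L(Q, \dot{Q})}{\partial Q_a} = -\dot{P}_a$$
and
$$\frac{\partial H(Q, P)}{\partial P_a} = \sum_b \frac{\partial \dot{Q}_b}{\partial P_a} P_b + \dot{Q}_a - \sum_b \frac{\partial L(Q, \dot{Q})}{\partial \dot{Q}_b}\, \frac{\partial \dot{Q}_b}{\partial P_a} = \dot{Q}_a ,$$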
as was to be shown.
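The passage from a Lagrangian to a Hamiltonian in Eqs. (5.7.13) and (5.7.14) can be illustrated with a classical one-dimensional analogue of the Hamiltonian (5.7.8). The sketch below is an assumption-laden example rather than anything from the text: the Lagrangian $L = \tfrac{1}{2}m[\dot{q} - V(q)]^2$, the choice $V(q) = 0.3\sin q$, the mass value and the finite-difference step are all hypothetical. It computes $P = \partial L/\partial\dot{q}$ numerically, forms $H = \dot{q}P - L$, and compares the result with $P^2/2m + PV(q)$.

```python
import math

M = 1.5  # hypothetical mass


def V(q):
    """Hypothetical velocity-like function V(q) appearing in the Lagrangian."""
    return 0.3 * math.sin(q)


def lagrangian(q, qdot):
    """Assumed classical Lagrangian L = (M/2) (qdot - V(q))^2."""
    return 0.5 * M * (qdot - V(q)) ** 2


def canonical_momentum(q, qdot, h=1e-6):
    """Eq. (5.7.13): P = dL/d(qdot), by central finite difference."""
    return (lagrangian(q, qdot + h) - lagrangian(q, qdot - h)) / (2.0 * h)


def hamiltonian_from_legendre(q, qdot):
    """Eq. (5.7.14): H = qdot * P - L; returns the pair (P, H)."""
    p = canonical_momentum(q, qdot)
    return p, qdot * p - lagrangian(q, qdot)


def hamiltonian_direct(q, p):
    """Classical counterpart of Eq. (5.7.8) in one dimension."""
    return p * p / (2.0 * M) + p * V(q)


# Compare the two Hamiltonians at a few sample phase-space points
samples = [(0.2, 1.0), (1.3, -0.7), (-2.0, 0.4)]
results = []
for q, qdot in samples:
    p, H = hamiltonian_from_legendre(q, qdot)
    results.append((p, M * (qdot - V(q)), H, hamiltonian_direct(q, p)))
```

The second entry of each tuple is the momentum expected from Eq. (5.7.9), which shows that here the canonical momentum is not simply the mass times the velocity.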
Noether’s Theorem
The chief reason for using the Lagrangian formalism to construct a Hamiltonian
is that there is a deep relation between conservation laws and symmetries of
the Lagrangian, first stated in classical physics26 by Amalie Emmy Noether
(1882–1935). Let us consider a symmetry of the Lagrangian under an infinitesimal
transformation that for simplicity takes the Qs into functions of Qs:
$$Q_a \to Q_a + \epsilon f_a(Q) , \qquad \dot{Q}_a \to \dot{Q}_a + \epsilon \sum_b \frac{\partial f_a(Q)}{\partial Q_b}\, \dot{Q}_b , \tag{5.7.15}$$
where the $f_a(Q)$ are some functions only of the Qs that are dictated, up to a
constant factor, by the nature of the symmetry principle, and ε is an infinitesimal
parameter. (Time-independent rotations and translations of coordinates are of
this general form.) The invariance of L under this transformation tells us that
$$0 = \sum_a \frac{\partial L}{\partial Q_a}\, f_a(Q) + \sum_a \frac{\partial L}{\partial \dot{Q}_a}\, \frac{d}{dt} f_a(Q) .$$
Using Eqs. (5.7.12) and (5.7.13), we see that this is a conservation law:
$$\frac{dF(Q, P)}{dt} = 0 \quad \text{where} \quad F(Q, P) \equiv \sum_a P_a f_a(Q) . \tag{5.7.16}$$
26 E. Noether, Nachr. König. Gesell. Wiss. zu Göttingen, Math.-Phys. Klasse 235 (1918).
Not only is F conserved – in quantum mechanics it generates the symmetry
with which we began, in the sense that
$$[F, Q_a] = -i\hbar f_a(Q) \tag{5.7.17}$$
or, equivalently, for infinitesimal ε,
$$\exp[i\epsilon F/\hbar]\, Q_a \exp[-i\epsilon F/\hbar] = Q_a + \epsilon f_a(Q) , \tag{5.7.18}$$
which is just the transformation (5.7.15).
For instance, if we take the canonical variables Q as the ith components $X_{ni}$
of the coordinate vectors $\mathbf{X}_n$ of particles distinguished by a label n, and if as
usual the Lagrangian for a multi-particle system depends only on velocities and
differences of coordinate vectors, then L is invariant under the transformation
$X_{ni} \to X_{ni} + \epsilon_i$, with the same infinitesimal vector $\boldsymbol{\epsilon}$ for each particle label n,
and Eq. (5.7.16) gives a conserved quantity,
$$\mathbf{P} = \sum_n \mathbf{P}_n .$$
This of course is the total momentum, and generates the translation symmetry,
in the sense that
$$[\boldsymbol{\epsilon}\cdot\mathbf{P}, X_{ni}] = -i\hbar\epsilon_i .$$
A similar analysis uses the assumed rotational invariance of the Lagrangian to
give the usual formula for the total angular momentum of any system that does
not involve spin. But note that invariance under the Galilean transformation
X → X + ut does not lead to a conservation law because, unlike translation or
rotation, this transformation involves the time.
5.8 Charged Particles in Electromagnetic Fields
We now turn to the quantum theory of a charged particle moving in classical
electric and magnetic fields. This theory will provide us in this section with
a good example of the use of the canonical formalism, and as we will see in
the following section this theory played an important part in understanding the
effect of external magnetic fields on atomic spectra.
Scalar and Vector Potentials
It is frequently convenient in classical electrodynamics to write the electric and
magnetic fields as linear combinations of derivatives of a vector potential A(x, t)
and a scalar potential φ(x, t):
$$\mathbf{E} = -\frac{1}{c}\dot{\mathbf{A}} - \nabla\phi , \qquad \mathbf{B} = \nabla\times\mathbf{A} . \tag{5.8.1}$$
This ensures that the fields satisfy the homogeneous Maxwell equations
∇ × E + Ḃ/c = 0 , ∇·B=0, (5.8.2)
and leads to simplifications in the other Maxwell equations.
What in classical physics is merely a convenience, in quantum mechanics is
a necessity. It is not possible to write a simple local Hamiltonian for a charged
particle in general electric and magnetic fields using just the fields E and B. But
we can write such Hamiltonians in terms of A and φ. For a single non-relativistic
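particle of mass m and charge e, the Hamiltonian is
$$H(\mathbf{X}, \mathbf{P}) = \frac{1}{2m}\left[ \mathbf{P} - \frac{e}{c}\mathbf{A}(\mathbf{X}, t) \right]^2 + e\phi(\mathbf{X}, t) . \tag{5.8.3}$$
Whether or not we derive this Hamiltonian from a Lagrangian, its real justification
is that it leads to the correct equations of motion. The Hamiltonian
equations of motion (5.7.3) and (5.7.4) here take the form
$$\dot{X}_i(t) = \frac{\partial H}{\partial P_i(t)} = \frac{1}{m}\left[ P_i(t) - \frac{e}{c}A_i(\mathbf{X}, t) \right] ,$$
$$\dot{P}_i(t) = -\frac{\partial H}{\partial X_i(t)} = \frac{e}{mc}\left[ P_j(t) - \frac{e}{c}A_j(\mathbf{X}, t) \right] \frac{\partial A_j(\mathbf{X}, t)}{\partial X_i} - e\frac{\partial\phi(\mathbf{X}, t)}{\partial X_i} ,$$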
where the indices i, j , etc. run over the values 1, 2, 3, and repeated indices are
summed. Eliminating the momentum from these two equations (and dropping
arguments), we have an equation of motion for the position:

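$$m\ddot{X}_i = \frac{e}{c}\dot{X}_j\frac{\partial A_j}{\partial X_i} - e\frac{\partial\phi}{\partial X_i} - \frac{e}{c}\left[\frac{\partial A_i}{\partial t} + \dot{X}_j\frac{\partial A_i}{\partial X_j}\right]$$
$$\phantom{m\ddot{X}_i} = \frac{e}{c}\dot{X}_j\left[\frac{\partial A_j}{\partial X_i} - \frac{\partial A_i}{\partial X_j}\right] - e\frac{\partial\phi}{\partial X_i} - \frac{e}{c}\frac{\partial A_i}{\partial t}\,.$$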
To put this in a more familiar form, note that

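$$\dot{X}_j\left[\frac{\partial A_j}{\partial X_i} - \frac{\partial A_i}{\partial X_j}\right] = \left[\dot{\mathbf{X}}\times(\nabla\times\mathbf{A})\right]_i\,.$$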
(For instance, for i = 3 the left-hand side is

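$$\dot{X}_1\left[\frac{\partial A_1}{\partial X_3} - \frac{\partial A_3}{\partial X_1}\right] + \dot{X}_2\left[\frac{\partial A_2}{\partial X_3} - \frac{\partial A_3}{\partial X_2}\right] = \dot{X}_1(\nabla\times\mathbf{A})_2 - \dot{X}_2(\nabla\times\mathbf{A})_1 = \left[\dot{\mathbf{X}}\times(\nabla\times\mathbf{A})\right]_3$$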
and likewise for i = 1 and i = 2.) Using the formulas (5.8.1) for E and B, the
equation of motion takes the form
$$m\ddot{\mathbf{X}} = e\mathbf{E} + \frac{e}{c}\,[\dot{\mathbf{X}}\times\mathbf{B}]\,, \tag{5.8.4}$$
which we recognize as the equation of motion (4.6.23) dictated by Lorentz
invariance, to first order in $|\dot{\mathbf{X}}|/c$.
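As a numerical illustration of Eq. (5.8.4), the following minimal sketch integrates the equation of motion for a uniform magnetic field along the z-axis with no electric field. It relies on the standard result (not derived in the text) that the orbit is then a circle traversed with angular frequency eB/mc. The units e = m = c = 1, the field strength, the step count, and all function names are illustrative assumptions.

```python
import math

def lorentz_accel(v, E, B, e, m, c):
    """Acceleration from m dv/dt = e E + (e/c) v x B, Eq. (5.8.4), Gaussian units."""
    v_cross_B = (v[1]*B[2] - v[2]*B[1],
                 v[2]*B[0] - v[0]*B[2],
                 v[0]*B[1] - v[1]*B[0])
    return tuple((e/m)*(E[i] + v_cross_B[i]/c) for i in range(3))

def rk4_step(x, v, dt, E, B, e, m, c):
    """One fourth-order Runge-Kutta step. The fields are uniform here,
    so the acceleration depends on the velocity only."""
    def shifted(a, b, s):
        return tuple(ai + s*bi for ai, bi in zip(a, b))
    k1 = lorentz_accel(v, E, B, e, m, c)
    v2 = shifted(v, k1, dt/2)
    k2 = lorentz_accel(v2, E, B, e, m, c)
    v3 = shifted(v, k2, dt/2)
    k3 = lorentz_accel(v3, E, B, e, m, c)
    v4 = shifted(v, k3, dt)
    k4 = lorentz_accel(v4, E, B, e, m, c)
    x_new = tuple(x[i] + dt/6*(v[i] + 2*v2[i] + 2*v3[i] + v4[i]) for i in range(3))
    v_new = tuple(v[i] + dt/6*(k1[i] + 2*k2[i] + 2*k3[i] + k4[i]) for i in range(3))
    return x_new, v_new

def orbit_after_one_period(e=1.0, m=1.0, c=1.0, Bz=2.0, steps=2000):
    """Integrate over one expected period 2*pi/omega, with omega = e*Bz/(m*c)."""
    omega = e*Bz/(m*c)
    dt = (2*math.pi/abs(omega))/steps
    x, v = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
    E, B = (0.0, 0.0, 0.0), (0.0, 0.0, Bz)
    for _ in range(steps):
        x, v = rk4_step(x, v, dt, E, B, e, m, c)
    return x, v, omega

x_end, v_end, omega = orbit_after_one_period()
print("position after one period:", x_end)
print("velocity after one period:", v_end)
```

If the equation of motion is coded correctly, the particle returns to its starting point with its initial velocity after one period, and its speed never changes because the magnetic force does no work.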
Gauge Transformations
There is more than one set of potentials A and φ that give the same fields E and
B. Given a set of potentials A and φ that yield a set of fields E and B, we can
always find other potentials
$$\mathbf{A}' = \mathbf{A} + \nabla\xi\,,\qquad \phi' = \phi - \frac{1}{c}\frac{\partial\xi}{\partial t}\,, \tag{5.8.5}$$
which give the same fields for an arbitrary function ξ(x, t). A given choice of
potentials is called a choice of gauge, and Eq. (5.8.5) is known as a gauge
transformation. Even though the equation of motion (5.8.4) derived from the
Hamiltonian (5.8.3) involves only the fields E and B, the Hamiltonian depends
on A and φ and is not gauge invariant. So it is important to observe that no
physical implications of this Hamiltonian depend on the choice of gauge.
Let us check this for the simple case of a time-independent gauge trans-
formation function ξ(X), which has no effect on φ. The gauge-transformed
Hamiltonian is
$$H' = \frac{1}{2m}\left[\mathbf{P} - \frac{e}{c}\mathbf{A} - \frac{e}{c}\nabla\xi\right]^2 + e\phi\,. \tag{5.8.6}$$
Define an operator

$$U(\mathbf{X}) \equiv \exp\left[-\frac{ie}{\hbar c}\,\xi(\mathbf{X})\right].$$
According to Eq. (5.7.7),
$$[\mathbf{P}, U(\mathbf{X})] = -\frac{e}{c}\,\nabla\xi(\mathbf{X})\,U(\mathbf{X})$$
and therefore
$$U^{-1}(\mathbf{X})\,\mathbf{P}\,U(\mathbf{X}) = \mathbf{P} - \frac{e}{c}\,\nabla\xi(\mathbf{X})\,.$$
It follows that
$$H'(\mathbf{X},\mathbf{P}) = U^{-1}(\mathbf{X})\,H(\mathbf{X},\mathbf{P})\,U(\mathbf{X})\,. \tag{5.8.7}$$
So if $\psi(\mathbf{x})$ satisfies the time-independent Schrödinger equation $H\psi = E\psi$ for
energy $E$, then the gauge-transformed Schrödinger equation $H'\psi' = E\psi'$ is
satisfied for the same energy, with gauge-transformed wave function $\psi' = U^{-1}\psi$:
$$\psi'(\mathbf{x}) = \exp\left[\frac{ie}{\hbar c}\,\xi(\mathbf{x})\right]\psi(\mathbf{x})\,. \tag{5.8.8}$$
Not only the energy but also the probability density $|\psi|^2$ is unchanged by this
transformation.
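The gauge covariance behind Eq. (5.8.7) can be checked numerically in one dimension. The sketch below verifies that the combination P − (e/c)A, acting on a wave function, only picks up the phase factor exp[ieξ/ħc] when A is replaced with A + dξ/dx and ψ with exp[ieξ/ħc]ψ. Since H is built from this combination, the energy is unchanged. The sample functions ψ, A, and ξ, the units ħ = e = c = 1, and the finite-difference step are all illustrative assumptions.

```python
import cmath
import math

HBAR = E_CHARGE = C = 1.0  # illustrative units (assumption)

def psi(x):
    """Sample wave function (hypothetical)."""
    return cmath.exp(-x*x)*(1 + 0.3j*x)

def A(x):
    """Sample vector potential component (hypothetical)."""
    return math.sin(x)

def xi(x):
    """Sample time-independent gauge function (hypothetical)."""
    return x**3/3 + math.cos(x)

def dxi(x):
    """Exact derivative of xi."""
    return x*x - math.sin(x)

def A_prime(x):
    """Gauge-transformed potential, A' = A + d(xi)/dx."""
    return A(x) + dxi(x)

def psi_prime(x):
    """Gauge-transformed wave function, psi' = exp(+i e xi / hbar c) psi."""
    return cmath.exp(1j*E_CHARGE*xi(x)/(HBAR*C))*psi(x)

def kinetic_momentum(f, a, x, h=1e-5):
    """[P - (e/c) a] f at x, with P = -i hbar d/dx by central difference."""
    df = (f(x + h) - f(x - h))/(2*h)
    return -1j*HBAR*df - (E_CHARGE/C)*a(x)*f(x)

def gauge_residual(x):
    """|(P - eA'/c) psi' - exp(i e xi / hbar c) (P - eA/c) psi| at x."""
    lhs = kinetic_momentum(psi_prime, A_prime, x)
    rhs = cmath.exp(1j*E_CHARGE*xi(x)/(HBAR*C))*kinetic_momentum(psi, A, x)
    return abs(lhs - rhs)

print(gauge_residual(0.7))                              # small, limited by the finite difference
print(abs(psi_prime(0.7))**2 - abs(psi(0.7))**2)        # probability density is unchanged
```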
Magnetic Interactions
Now let us take the simplest example of magnetic interactions, a one-electron
atom in a uniform time-independent magnetic field B. We can take the vector
potential here as
$$\mathbf{A} = -\frac{1}{2}\,\mathbf{X}\times\mathbf{B}\,,$$
for which ∇ × A = B. Of course this is not unique, but as we have seen this
makes no difference.
The factor 1/c multiplying the vector potential in Eq. (5.8.3) makes the mag-
netic term in the Hamiltonian generally very small. To first order in this term, it
shifts the Hamiltonian (5.8.3) by
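$$H' = \frac{e}{m_e c}\,\mathbf{A}(\mathbf{X})\cdot\mathbf{P} = -\frac{e}{2m_e c}\,[\mathbf{X}\times\mathbf{B}]\cdot\mathbf{P} = \frac{e}{2m_e c}\,\mathbf{B}\cdot\mathbf{L}\,, \tag{5.8.9}$$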
where L = X × P is the orbital angular momentum operator. (Here e has been
changed to −e, because in the usual notation this is the charge of the electron.
Also, we have not had to worry about the order of the operators A(X) and P,
because in this choice of gauge, ∇ · A = 0.)
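The two properties of this vector potential used above, ∇ × A = B and ∇ · A = 0, can be confirmed with a short finite-difference check. This is only a sketch: the field B, the sample point, and the function names are arbitrary illustrative choices.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

def vector_potential(x, B):
    """A = -(1/2) X x B for a uniform field B."""
    return tuple(-0.5*comp for comp in cross(x, B))

def jacobian(field, x, h=1e-3):
    """J[i][j] = d field_i / d x_j by central differences."""
    J = [[0.0]*3 for _ in range(3)]
    for j in range(3):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        fp, fm = field(xp), field(xm)
        for i in range(3):
            J[i][j] = (fp[i] - fm[i])/(2*h)
    return J

def curl_and_divergence(B, x=(0.3, -1.2, 0.7)):
    """Return (curl A, div A) at the point x."""
    J = jacobian(lambda y: vector_potential(y, B), x)
    curl = (J[2][1] - J[1][2], J[0][2] - J[2][0], J[1][0] - J[0][1])
    div = J[0][0] + J[1][1] + J[2][2]
    return curl, div

B = (0.5, -2.0, 1.5)
curl, div = curl_and_divergence(B)
print("curl A =", curl, " (should reproduce B =", B, ")")
print("div A  =", div)
```

Because A is linear in X, the central differences are exact up to rounding error, so the curl reproduces B to machine precision.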
Spin Coupling
What about spin? The form of the interaction (5.8.9) suggests that there should
also be a similar term in the magnetic interaction Hamiltonian with the spin
operator S in place of L, and not necessarily with the same coefficient. The
magnetic interaction is therefore taken to be in the form
$$H' = \frac{e}{2m_e c}\,\mathbf{B}\cdot[\mathbf{L} + g_e\mathbf{S}]\,, \tag{5.8.10}$$
where ge is a dimensionless coefficient known as the gyromagnetic ratio of
the electron. It was first calculated in 1928 on the basis of a relativistic theory
of the electron by Dirac,27 who found the value ge = 2. The development of
quantum electrodynamics after World War II led to a calculation28 of a radiative
correction due to the emission and reabsorption of a photon by the electron
while it is interacting with the magnetic field. This gave $g_e = 2 \times 1.00116$, in
good agreement with experiment.
The effect of the interaction (5.8.10) on atomic energy levels in a magnetic
field is described in the next section.

28 J. Schwinger, Phys. Rev. 73, 416 (1948).
5.9 Perturbation Theory
There are few problems in quantum mechanics that can be solved exactly. For-
tunately it is often possible to find useful approximate solutions by a technique
known as perturbation theory. Sometimes it happens that the results obtained
in this way are more revealing than would be provided by a more complicated
exact solution, even where one is available.
The basis of perturbation theory is the assumption that the Hamiltonian can
be divided into two parts:
$$H = H_0 + H'\,, \tag{5.9.1}$$
where $H_0$ is simple enough to allow exact solutions of the Schrödinger equation,
and $H'$ is in some sense small. We have already used a Hamiltonian of this type
to derive the Born approximation for scattering amplitudes in Section 5.6. In
this section we shall concentrate on deriving approximations for energy levels
and the corresponding wave functions, assuming that $H'$ is small enough to
allow the eigenfunctions and eigenvalues of $H$ to be usefully expressed as power
series in $H'$. That is, in the Schrödinger equation $H\psi = E\psi$ we write
$$\psi = \psi_0 + \psi_1 + \psi_2 + \cdots\,,\qquad E = E_0 + E_1 + E_2 + \cdots\,, \tag{5.9.2}$$
where $\psi_N$ and $E_N$ are of $N$th order in $H'$. The Schrödinger equation then takes
the form
$$(H_0 + H')(\psi_0 + \psi_1 + \psi_2 + \cdots) = (E_0 + E_1 + E_2 + \cdots)(\psi_0 + \psi_1 + \psi_2 + \cdots)\,. \tag{5.9.3}$$
In the $N$th order of perturbation theory we keep all terms in Eq. (5.9.3) up to
$N$th order in $H'$. To zeroth order in $H'$, this is the unperturbed Schrödinger
equation
$$H_0\psi_0 = E_0\psi_0\,, \tag{5.9.4}$$
whose solutions we assume are known.
First-Order Perturbation Theory

Keeping only terms in Eq. (5.9.3) of first order in $H'$ and taking $\psi_0$ to satisfy
the zeroth-order equation (5.9.4), the Schrödinger equation becomes
$$H_0\psi_1 + H'\psi_0 = E_0\psi_1 + E_1\psi_0\,. \tag{5.9.5}$$
To find the first-order term $E_1$ in the energy, multiply Eq. (5.9.5) with $\psi_0^*$ and
integrate and sum over all coordinates and spin 3-components. Because $H_0$ is a
Hermitian operator, we have
$$\int\psi_0^* H_0\psi_1 = \int(H_0\psi_0)^*\psi_1 = E_0\int\psi_0^*\psi_1\,,$$
so the terms in this integral involving $\psi_1$ cancel, and we have
$$E_1\int\psi_0^*\psi_0 = \int\psi_0^* H'\psi_0\,,$$
or, if $\psi_0$ is normalized,
$$E_1 = \int\psi_0^* H'\psi_0\,. \tag{5.9.6}$$
Very nice, but this does not necessarily work in the case where $E_0$ is a
degenerate energy eigenvalue, with several independent eigenfunctions $\psi^{(n)}$:
$$H_0\psi^{(n)} = E_0\psi^{(n)}\,. \tag{5.9.7}$$
It is convenient to choose these eigenfunctions to be orthonormal:
$$\int\psi^{(n)*}\psi^{(m)} = \delta_{nm}\,. \tag{5.9.8}$$
Multiply Eq. (5.9.5) with any of the $\psi^{(n)*}$, integrate, and sum over all
coordinates and spin 3-components, and again use the fact that $H_0$ is Hermitian,
so that $\int\psi^{(n)*}H_0\psi_1 = E_0\int\psi^{(n)*}\psi_1$. The terms in this integral involving $\psi_1$
again cancel, and we have
$$\int\psi^{(n)*}H'\psi_0 = E_1\int\psi^{(n)*}\psi_0\,. \tag{5.9.9}$$
The difficulty is that with more than one independent solution $\psi^{(n)}$ of Eq. (5.9.7),
whatever we choose for our unperturbed wave function $\psi_0$, we can always
choose some linear combination $\sum_n c_n\psi^{(n)}$ of these eigenfunctions to be
orthogonal to $\psi_0$, in the sense that $\sum_n c_n^*\int\psi^{(n)*}\psi_0 = 0$, so that the same
linear combination of Eq. (5.9.9) gives a condition on $H'$:
$$\sum_n c_n^*\int\psi^{(n)*}H'\psi_0 = 0\,, \tag{5.9.10}$$
which in general need not be the case.

To avoid this contradiction, we must make an appropriate choice of the
zeroth-order eigenfunction $\psi_0$. What we need is to choose $\psi_0$ so that any linear
combination of the degenerate wave functions $\psi^{(n)}$ that is orthogonal to $\psi_0$ will
also be orthogonal to $H'\psi_0$. Because $H'$ is a Hermitian operator, the integrals
$$H'_{nm} \equiv \int\psi^{(n)*}H'\psi^{(m)}$$
form a Hermitian matrix, in the sense that $H'^{\,*}_{mn} = H'_{nm}$.
According to a general theorem of matrix algebra, it is always possible
to replace the $\psi^{(n)}$ with linear combinations for which the orthonormality
condition (5.9.8) is still satisfied, and now $H'_{nm}$ is diagonal:
$$\int\psi^{(n)*}H'\psi^{(m)} = \begin{cases} E_n & n = m \\ 0 & n \neq m\,, \end{cases} \tag{5.9.11}$$
for some real $E_n$. We must take the zeroth-order solution to be one of these
redefined eigenfunctions, say $\psi^{(m)}$, so that if we multiply Eq. (5.9.5) with
the complex conjugate of any linear combination $\sum_{n\neq m} c_n\psi^{(n)}$ of the other
degenerate eigenfunctions that is orthogonal to $\psi^{(m)}$, Eq. (5.9.11) implies that
Eq. (5.9.10) is also necessarily satisfied, and there is no contradiction. (We will
see an example of this procedure in our treatment below of the Zeeman effect.)
With the zeroth-order wave function $\psi_0 = \psi^{(m)}$, Eq. (5.9.6) gives $E_1 = E_m$.
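The simplest instance of this procedure is a doubly degenerate level, where the perturbation restricted to the degenerate subspace is a 2 × 2 matrix. The sketch below assumes a real symmetric matrix with hypothetical entries a, b, d and finds the rotated pair of zeroth-order states for which the off-diagonal element vanishes, as Eq. (5.9.11) requires. The diagonal elements in the rotated basis are then the first-order energy shifts. The function name and the numerical values are illustrative assumptions.

```python
import math

def good_combinations(a, b, d):
    """Diagonalize the real symmetric matrix [[a, b], [b, d]] of H' within a
    two-fold degenerate subspace by a single rotation.
    Returns ([(shift, coefficients), ...], remaining off-diagonal element)."""
    theta = 0.5*math.atan2(2*b, a - d)   # rotation angle that removes the off-diagonal term
    c, s = math.cos(theta), math.sin(theta)
    v1, v2 = (c, s), (-s, c)             # orthonormal combinations of the two states

    def element(u, w):
        """Matrix element u^T H' w for real coefficient vectors."""
        return u[0]*(a*w[0] + b*w[1]) + u[1]*(b*w[0] + d*w[1])

    levels = [(element(v1, v1), v1), (element(v2, v2), v2)]
    return levels, element(v1, v2)

levels, off_diagonal = good_combinations(a=0.3, b=0.5, d=0.3)
for shift, coeffs in levels:
    print("first-order shift", shift, "for combination", coeffs)
print("off-diagonal element in the new basis:", off_diagonal)
```

With equal diagonal entries the correct zeroth-order states are the symmetric and antisymmetric combinations, with shifts a + b and a − b. Using either original state as ψ0 would violate Eq. (5.9.10) whenever b is non-zero.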
We can get a further insight into the necessity of a suitable choice of the
zeroth-order wave function by considering a problem of some importance in its
own right, the calculation of the first-order contribution to the wave function.
Let us introduce a complete orthonormal set of solutions $\varphi_a$ of the zeroth-order
Schrödinger equation
$$H_0\varphi_a = E_a\varphi_a\,,\qquad \int\varphi_a^*\varphi_b = \delta_{ab}\,. \tag{5.9.12}$$
Multiply Eq. (5.9.5) by $\varphi_a^*$ and integrate and sum over all coordinates
and spins. Since $H_0$ is Hermitian the first term gives $\int\varphi_a^* H_0\psi_1 = E_a\int\varphi_a^*\psi_1$, and so
$$(E_0 - E_a)\int\varphi_a^*\psi_1 = \int\varphi_a^* H'\psi_0 - E_1\int\varphi_a^*\psi_0\,. \tag{5.9.13}$$
For $E_a = E_0$, Eq. (5.9.13) makes no sense unless $\int\varphi_a^* H'\psi_0$ vanishes for every
such wave function orthogonal to $\psi_0$, which is accomplished by taking $\psi_0$ to be
one of the wave functions $\psi^{(m)}$ for which Eq. (5.9.11) is satisfied. On the other
hand, for $E_a \neq E_0$, $\varphi_a$ is orthogonal to $\psi_0$ so Eq. (5.9.13) gives a formula that
is valid for any $\varphi_a$ for which $E_a \neq E_0$:
$$\int\varphi_a^*\psi_1 = \frac{\int\varphi_a^* H'\psi_0}{E_0 - E_a}\qquad\text{for } E_a \neq E_0\,. \tag{5.9.14}$$
In the case where the eigenvalue $E_0$ of $H_0$ is not degenerate, $\psi_0$ and the
functions $\varphi_a$ with $E_a \neq E_0$ form a complete set, so we can expand $\psi_1$ as
$$\psi_1 = \alpha\psi_0 + \sum_{a:E_a\neq E_0}\varphi_a\int\varphi_a^*\psi_1 = \alpha\psi_0 + \sum_{a:E_a\neq E_0}\varphi_a\,\frac{\int\varphi_a^* H'\psi_0}{E_0 - E_a}\,,$$
with the complex number $\alpha$ the only component of $\psi_1$ that is still unknown.
We can always take $\alpha$ to be real, because any change in the imaginary part of
$\alpha$ needed to make $\alpha$ real has no effect on $\psi_0 + \psi_1$ if it is compensated by a
first-order change in the phase of $\psi_0$, which we are free to choose as we like.
With $\alpha$ real, to first order the norm of $\psi_0 + \psi_1$ is
$$\int|\psi_0 + \psi_1|^2 = (1 + 2\alpha)\int|\psi_0|^2\,.$$
So, if we normalize $\psi_0$ and require the wave function to remain normalized in
first order, then we must have $\alpha = 0$. The first-order shift in the wave function
is then finally
$$\psi_1 = \sum_{a:E_a\neq E_0}\varphi_a\,\frac{\int\varphi_a^* H'\psi_0}{E_0 - E_a}\,. \tag{5.9.15}$$
Note that if the parameters of the theory are changed so that one of the $E_a$
approaches $E_0$, then the corresponding component of the wave function
becomes very large, invalidating perturbation theory, unless in this limit
$\int\varphi_a^* H'\psi_0$ becomes very small. So even approximate degeneracy can be a
problem.
In the case of degeneracy Eq. (5.9.13) tells us nothing about the components
of ψ1 along the ϕa with Ea = E0 , and the normalization condition on ψ0 +
ψ1 does not determine these components either. For this, it is necessary to
invoke the condition that the changes of the wave function in higher orders of
perturbation theory are small. We will not pursue this aspect here.
Zeeman Effect
For an example of the use of perturbation theory, let us return to the Zeeman
effect, mentioned at the end of the previous section. Here H0 is the Hamilto-
nian of an alkali metal atom, considering the outermost electron to move in an
effective potential arising from the charges of the nucleus and all other electrons,
with no external fields. To calculate the effect of a weak external magnetic field
$\mathbf{B}$, we consider a first-order perturbation given by Eq. (5.8.10):
$$H' = \frac{e}{2m_e c}\,\mathbf{B}\cdot[\mathbf{L} + g_e\mathbf{S}]\,, \tag{5.9.16}$$
where $g_e \simeq 2$ is the gyromagnetic ratio of the electron. The eigenfunctions of
$H_0$ may be labeled $\psi_{nj\ell M}$. Here
$$\mathbf{J}^2\psi_{nj\ell M} = \hbar^2 j(j+1)\,\psi_{nj\ell M}\,,\qquad \mathbf{L}^2\psi_{nj\ell M} = \hbar^2\ell(\ell+1)\,\psi_{nj\ell M}\,,\qquad J_z\psi_{nj\ell M} = \hbar M\,\psi_{nj\ell M}\,, \tag{5.9.17}$$
where $M$ runs by unit steps from $-j$ to $+j$, and $n - \ell - 1$ is the number of
nodes of the wave function. The states with a given $n$, $j$, and $\ell$ but varying $M$
all have the same energy, so the eigenstates of $H_0$ are all degenerate, except for
those with $j = 0$. For a magnetic field in an arbitrary direction the operator $H'$
in general includes terms proportional to $L_z$ and $S_z$, which do commute with
$J_z$, but also $L_x$, $L_y$, $S_x$, and $S_y$, which do not commute with $J_z$, so there will be
non-vanishing components of $\int\psi^*_{nj\ell M'}H'\psi_{nj\ell M}$ with $M' \neq M$, and first-order
perturbation theory will not work if we take the zeroth-order wave function to
be one of the $\psi_{nj\ell M}$.29
The cure is obvious. Take the zeroth-order wave function to be an eigenstate
of the component of $\mathbf{J}$ in the direction of $\mathbf{B}$. Or, to save writing, just continue
to use the $\psi_{nj\ell M}$ as zeroth-order wave functions but from the beginning choose
the coordinate system so that the z-axis is in the direction of $\mathbf{B}$. In this case, the
first-order shift in the energy is given by Eq. (5.9.6) as
$$E_1(nj\ell M) = \frac{eB}{2m_e c}\int\psi^*_{nj\ell M}\,(L_z + g_e S_z)\,\psi_{nj\ell M}\,. \tag{5.9.18}$$
It is easiest to evaluate $E_1$ for s-wave states with $\ell = 0$, for which $j = 1/2$
and $M = \pm 1/2$. In this case Eq. (5.9.18) gives immediately
$$E_1(n,\ j = 1/2,\ \ell = 0,\ M = \pm 1/2) = \pm\frac{e g_e B\hbar}{4m_e c}\,. \tag{5.9.19}$$
To deal with the general case with $\ell \neq 0$, we use a general property of angular
momentum multiplets. Let $\varphi_{jM}$ be any multiplet of $2j + 1$ wave functions,
with $\mathbf{J}^2\varphi_{jM} = \hbar^2 j(j+1)\varphi_{jM}$ and $J_z\varphi_{jM} = \hbar M\varphi_{jM}$, formed as described
in Section 5.4 by letting lowering operators $J_x - iJ_y$ act on a state with $M = j$.
For any vector operator $\mathbf{V}$, the integrals $\int\varphi^*_{jM'}V_i\varphi_{jM}$ can all be calculated
from any one of them by using the commutation relations of the raising and
lowering operators $J_x \pm iJ_y$ with the $V_i$ and the effect of these operators on the
multiplet $\varphi_{jM}$, none of which depends on the choice of the operator $\mathbf{V}$ or the
wave functions $\varphi_{jM}$, so in general the integrals $\int\varphi^*_{jM'}V_i\varphi_{jM}$ can depend
on the specific choice of the operator $\mathbf{V}$ or the wave functions $\varphi_{jM}$ only through an
overall factor. In particular, we have
$$\int\varphi^*_{jM'}V_i\varphi_{jM} = \alpha_V\int\varphi^*_{jM'}J_i\varphi_{jM}\,, \tag{5.9.20}$$
29 If it were not for the fine structure produced by spin–orbit coupling there would be an additional
degeneracy: the energies for states with the same $n$ and $\ell$ but different $j$ would be equal. The discussion
here of the Zeeman effect assumes that the magnetic field is sufficiently weak that the energy shift it
produces is small compared with the fine-structure splitting, in which case states with the same $n$ and
$\ell$ but different $j$ are not effectively degenerate. But we are ignoring the even smaller hyperfine energy shifts
due to the interaction of the electron with the magnetic field of the nucleus.
In hydrogen there is a further degeneracy of states with the same $n$ and $j$ but different $\ell$, such as the
$2s_{1/2}$ and $2p_{1/2}$ states, which are separated only by the very small Lamb shift described in Section 5.4.
The treatment here applies to hydrogen only when the energy shift due to the interaction of the electron
with the external magnetic field is less than the Lamb shift but greater than the hyperfine splitting.
where the factor $\alpha_V$ will in general depend on the nature of the operator $\mathbf{V}$
and the wave functions $\varphi_{jM}$, but not on the vector index $i$ nor on the angular
momentum z-components $M$ and $M'$. This is an example of a general quantum-
mechanical result known as the Wigner–Eckart theorem.30
In its application to the Zeeman effect, Eq. (5.9.20) gives
$$\int\psi^*_{nj\ell M'}L_i\,\psi_{nj\ell M} = \alpha_L(nj\ell)\int\psi^*_{nj\ell M'}J_i\,\psi_{nj\ell M}\,,$$
$$\int\psi^*_{nj\ell M'}S_i\,\psi_{nj\ell M} = \alpha_S(nj\ell)\int\psi^*_{nj\ell M'}J_i\,\psi_{nj\ell M}\,. \tag{5.9.21}$$
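To calculate the coefficients $\alpha_L$ and $\alpha_S$, we use a trick. The wave functions
$J_k\psi_{nj\ell M}$ are linear combinations of the wave functions $\psi_{nj\ell M}$ in the same
multiplet, so we can apply Eq. (5.9.21) also to these functions:
$$\int\psi^*_{nj\ell M'}L_i J_k\,\psi_{nj\ell M} = \alpha_L(nj\ell)\int\psi^*_{nj\ell M'}J_i J_k\,\psi_{nj\ell M}\,,$$
$$\int\psi^*_{nj\ell M'}S_i J_k\,\psi_{nj\ell M} = \alpha_S(nj\ell)\int\psi^*_{nj\ell M'}J_i J_k\,\psi_{nj\ell M}\,. \tag{5.9.22}$$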
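Taking the wave functions $\psi_{nj\ell M}$ to be orthonormal, we have
$$\int\psi^*_{nj\ell M'}\,\mathbf{J}^2\,\psi_{nj\ell M} = \hbar^2 j(j+1)\,\delta_{M'M}\,.$$
Hence, setting $i = k$ and summing over $i$ in Eq. (5.9.22), we have
$$\hbar^2 j(j+1)\,\alpha_L(nj\ell) = \int\psi^*_{nj\ell M}\,\mathbf{L}\cdot\mathbf{J}\,\psi_{nj\ell M}\,,$$
$$\hbar^2 j(j+1)\,\alpha_S(nj\ell) = \int\psi^*_{nj\ell M}\,\mathbf{S}\cdot\mathbf{J}\,\psi_{nj\ell M}\,.$$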
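Note that
$$\mathbf{L}\cdot\mathbf{J} = \frac{1}{2}\left[-(\mathbf{J} - \mathbf{L})^2 + \mathbf{J}^2 + \mathbf{L}^2\right] = \frac{1}{2}\left[-\mathbf{S}^2 + \mathbf{J}^2 + \mathbf{L}^2\right]$$
and likewise
$$\mathbf{S}\cdot\mathbf{J} = \frac{1}{2}\left[-\mathbf{L}^2 + \mathbf{J}^2 + \mathbf{S}^2\right],$$
so
$$\alpha_L(nj\ell) = \frac{-3/4 + j(j+1) + \ell(\ell+1)}{2j(j+1)}\,,\qquad \alpha_S(nj\ell) = \frac{-\ell(\ell+1) + j(j+1) + 3/4}{2j(j+1)}\,. \tag{5.9.23}$$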
30 For a statement of this theorem and a detailed proof, see Section 4.1 of Weinberg, Lectures on Quantum
Using Eqs. (5.9.23) and (5.9.21) in Eq. (5.9.18) then gives the first-order Zeeman
energy shift:
$$E_1(nj\ell M) = \frac{eB\hbar M}{4m_e c\,j(j+1)}\Big\{-3/4 + j(j+1) + \ell(\ell+1) + g_e\left[-\ell(\ell+1) + j(j+1) + 3/4\right]\Big\}\,. \tag{5.9.24}$$
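Equations (5.9.23) and (5.9.24) are easy to tabulate. The following sketch evaluates the coefficients with exact rational arithmetic and returns the energy shift in units of eBħ/2m_e c. The function names are illustrative, and g_e is set to exactly 2 by assumption.

```python
from fractions import Fraction as F

def alpha_L(j, l):
    """Coefficient alpha_L of Eq. (5.9.23), for electron spin 1/2 (s(s+1) = 3/4)."""
    return (-F(3, 4) + j*(j + 1) + l*(l + 1))/(2*j*(j + 1))

def alpha_S(j, l):
    """Coefficient alpha_S of Eq. (5.9.23)."""
    return (-l*(l + 1) + j*(j + 1) + F(3, 4))/(2*j*(j + 1))

def zeeman_shift(j, l, M, g_e=2):
    """First-order shift of Eq. (5.9.24) in units of e B hbar / (2 m_e c)."""
    return M*(alpha_L(j, l) + g_e*alpha_S(j, l))

# s-wave state: reproduces Eq. (5.9.19), a shift of +/- g_e/2 in these units
print(zeeman_shift(F(1, 2), 0, F(1, 2)))    # 1
# p-wave states with j = 1/2 and j = 3/2, both with M = 1/2
print(zeeman_shift(F(1, 2), 1, F(1, 2)))    # 1/3
print(zeeman_shift(F(3, 2), 1, F(1, 2)))    # 2/3
```

Since L + S = J, the two coefficients always add up to one, which gives a quick consistency check on Eq. (5.9.23).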
Second-Order Perturbation Theory

In some cases the interesting effects of a perturbation $H'$ arise only in second or
even higher order. The terms in the Schrödinger equation (5.9.3) of second order
in $H'$ give
$$H_0\psi_2 + H'\psi_1 = E_0\psi_2 + E_1\psi_1 + E_2\psi_0\,. \tag{5.9.25}$$
To find $E_2$, multiply with $\psi_0^*$ and integrate and sum over all coordinates
and spins. Again using the fact that $H_0$ is Hermitian, we have $\int\psi_0^* H_0\psi_2 = E_0\int\psi_0^*\psi_2$,
so the terms involving $\psi_2$ cancel. Also, as we have seen, the normalization
condition for $\psi$ requires that $\int\psi_0^*\psi_1 = 0$, so the term proportional
to $E_1$ vanishes. This leaves
$$E_2 = \int\psi_0^* H'\psi_1\,. \tag{5.9.26}$$
In the case where the eigenfunction of $H_0$ with energy $E_0$ is not degenerate, we
can use Eq. (5.9.15), so that $E_2$ is given by a sum over all the other eigenfunctions
of $H_0$:
$$E_2 = \sum_{a:E_a\neq E_0}\frac{\left|\int\varphi_a^* H'\psi_0\right|^2}{E_0 - E_a}\,. \tag{5.9.27}$$
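A two-level system provides a quick check of Eqs. (5.9.6) and (5.9.27), because its exact eigenvalues are known in closed form. The sketch below assumes an unperturbed Hamiltonian with energies E0 and Ea and a real symmetric perturbation λ[[h00, h01], [h01, h11]]. All numerical values and names are illustrative assumptions.

```python
import math

def perturbative_energy(E0, Ea, h00, h01, h11, lam):
    """E0 + E1 + E2 for the level that starts at E0.
    (h11 first enters at third order, so it is unused here.)"""
    E1 = lam*h00                      # Eq. (5.9.6): diagonal element of H'
    E2 = (lam*h01)**2/(E0 - Ea)       # Eq. (5.9.27): a single intermediate state
    return E0 + E1 + E2

def exact_energy(E0, Ea, h00, h01, h11, lam):
    """Exact eigenvalue of the 2x2 matrix that reduces to E0 as lam -> 0."""
    d0, da = E0 + lam*h00, Ea + lam*h11
    mean, half_gap = 0.5*(d0 + da), 0.5*(d0 - da)
    root = math.sqrt(half_gap**2 + (lam*h01)**2)
    # the level connected to E0 lies below the other one when E0 < Ea
    return mean - root if E0 < Ea else mean + root

params = dict(E0=0.0, Ea=1.0, h00=0.5, h01=1.0, h11=-0.3, lam=0.01)
print("through second order:", perturbative_energy(**params))
print("exact:               ", exact_energy(**params))
```

For these values the first-order estimate misses the exact energy by about 1e-4, while the second-order estimate is off by less than 1e-6, the expected size of the third-order term. The shift E2 is negative here because the only other level lies above E0.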
When field theorists say that the Lamb shift is due to the emission and re-
absorption of a photon by the electron in hydrogen they mean that this is a
second-order effect, in which the wave functions ϕa in Eq. (5.9.27) represent
states containing an electron and a photon. Since these states form a continuum,
the sum over states involves an integral over the photon momentum, which
introduces infinities into the calculation. This calculation was completed only
in 1949, when it was recognized that the same second-order processes require
a redefinition of the mass and charge of the electron and of the photon and
electron fields, which leads to a cancellation of infinities.31
31 N. M. Kroll and W. E. Lamb, Phys. Rev. 75, 388 (1949); J. B. French and V. F. Weisskopf, Phys. Rev. 75,
1240 (1949).
5.10 Beyond Wave Mechanics
Our discussion of quantum mechanics in this chapter has so far been based
on wave mechanics, in which physical states are represented by functions of
particle positions and spins. This is too parochial a formalism. Why position,
among all observable physical quantities? Indeed, we have already seen in
Section 5.3 that a physical state can just as well be represented by a wave
function depending on momenta (such as (5.3.20) for a one-particle system) as
by a wave function depending on position.
The study of other physical systems forces us much farther away from wave
mechanics than merely substituting momenta for position as the argument of
wave functions. The state of a field, such as the electromagnetic field, cannot
be described in terms of the positions or the momenta of any fixed number of
particles. It is partly as a preparation for our account of quantum field theory in
Chapter 7 that we need to consider a formulation of quantum mechanics, due
chiefly to Dirac,32 that is general enough to apply to any physical system.
In this general formulation, physical states are represented by state vectors
in an infinite-dimensional space, known as Hilbert space. Like ordinary vectors
in three dimensions, a linear combination $a_1\Psi_1 + a_2\Psi_2$ of two state vectors $\Psi_1$
and $\Psi_2$ is also a state vector, only here the numerical coefficients $a_1$ and $a_2$ can
be complex. Addition here has the same properties as the addition of complex
numbers, including associativity and commutativity and the existence of a zero
for which $0 + \Psi = \Psi + 0 = \Psi$. Also, as in Euclidean space, for any two
state vectors $\Phi$ and $\Psi$ there is a scalar product denoted $(\Phi, \Psi)$, here a complex
number, with the properties
$$(\Phi, \Psi) = (\Psi, \Phi)^*\,, \tag{5.10.1}$$
$$(\Phi, a_1\Psi_1 + a_2\Psi_2) = a_1(\Phi, \Psi_1) + a_2(\Phi, \Psi_2)\,, \tag{5.10.2}$$
$$(\Psi, \Psi) \geq 0 \tag{5.10.3}$$
and $(\Psi, \Psi) = 0$ if and only if $\Psi = 0$. As we shall see, wave functions are
the components of these state vectors in one basis or another, and the integrals
(5.3.8) of products of these wave functions, abbreviated as $\int\psi^*\varphi$, are the scalar
products $(\Psi, \Phi)$ of the state vectors of which they are the components.
Observable quantities are represented in this formulation by linear operators
that act on state vectors rather than on wave functions. Here an operator $A$ being
“linear” means that for any state vectors $\Psi_1$ and $\Psi_2$ and complex numbers $a_1$
and $a_2$, we have
$$A(a_1\Psi_1 + a_2\Psi_2) = a_1 A\Psi_1 + a_2 A\Psi_2\,. \tag{5.10.4}$$
32 This approach is the basis of Dirac’s 1930 treatise, The Principles of Quantum Mechanics, listed in the
bibliography.
The adjoint of an operator $A$ is defined as an operator $A^\dagger$ for which
$$(\Phi, A^\dagger\Psi) = (A\Phi, \Psi)\,. \tag{5.10.5}$$
Real observables are represented by operators that are self-adjoint, in the sense
that $A^\dagger = A$.
The first interpretive postulate of quantum mechanics is that a state represented
by a non-zero state vector $\Psi$ has a definite value $\alpha$ for an observable
represented by an operator $A$ if and only if $\Psi$ is an eigenvector of $A$ with
eigenvalue $\alpha$ – that is,
$$A\Psi = \alpha\Psi\,. \tag{5.10.6}$$
If $\Psi_1$ and $\Psi_2$ are non-zero eigenvectors of a self-adjoint operator $A$ with eigenvalues
$\alpha_1$ and $\alpha_2$ then
$$\alpha_1(\Psi_2, \Psi_1) = (\Psi_2, A\Psi_1) = (A\Psi_2, \Psi_1) = \alpha_2^*(\Psi_2, \Psi_1)\,. \tag{5.10.7}$$
Taking $\Psi_1 = \Psi_2$ and then of course $\alpha_1 = \alpha_2$, we see that eigenvalues of self-adjoint
operators are real, while taking $\alpha_1 \neq \alpha_2$ and then of course $\Psi_1 \neq \Psi_2$,
we see that eigenvectors of a self-adjoint operator with different eigenvalues are
orthogonal, in the sense that $(\Psi_2, \Psi_1) = 0$.
The second interpretive postulate of quantum mechanics is that in a state
represented by a state vector $\Psi$, the observable quantity represented by an
operator $A$ has the expectation value
$$\langle A\rangle_\Psi = \frac{(\Psi, A\Psi)}{(\Psi, \Psi)}\,. \tag{5.10.8}$$
Obviously it follows that if $\Psi$ is normalized so that $(\Psi, \Psi) = 1$, then the
expectation value is $(\Psi, A\Psi)$.
Suppose an observable is represented by an operator $A$ with discrete eigenvalues
$\alpha_n$ and eigenvectors $\Psi_n$,
$$A\Psi_n = \alpha_n\Psi_n \tag{5.10.9}$$
and we normalize these eigenvectors so that
$$(\Psi_n, \Psi_m) = \delta_{nm}\,. \tag{5.10.10}$$
(If there is only one eigenvector for each eigenvalue it follows from Eq. (5.10.7)
and the reality of eigenvalues that the different eigenvectors are orthogonal,
and we can always multiply them by numerical factors so that they satisfy
Eq. (5.10.10). Even in the case of degeneracy, with several eigenvectors for
the same eigenvalue, we can always define linear combinations of these eigenvectors
to satisfy Eq. (5.10.10).) If we expand an arbitrary state vector $\Psi$ in a
series of these eigenvectors $\Psi = \sum_n c_n\Psi_n$, by taking the scalar product with
any of the $\Psi_m$ and using Eq. (5.10.10) we find that $c_n = (\Psi_n, \Psi)$, so that
$$\Psi = \sum_n (\Psi_n, \Psi)\,\Psi_n\,. \tag{5.10.11}$$
Inserting this into Eq. (5.10.8) gives the expectation value of the observable
represented by $A$:
$$\langle A\rangle_\Psi = \frac{\sum_n \alpha_n\,|(\Psi_n, \Psi)|^2}{\sum_n |(\Psi_n, \Psi)|^2}\,. \tag{5.10.12}$$
Since a corresponding result applies for any function of this observable, it follows
from Eq. (5.10.12) that the probability of finding a value $\alpha_m$ when we
measure the observable represented by $A$ is
$$P_m(\Psi) = \frac{|(\Psi_m, \Psi)|^2}{\sum_n |(\Psi_n, \Psi)|^2}\,. \tag{5.10.13}$$
Note in particular that the sum of these probabilities is one.
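These formulas can be tried out in a two-dimensional state space. The sketch below takes as its observable the matrix with rows (0, 1) and (1, 0), an illustrative choice whose normalized eigenvectors are (1, ±1)/√2 with eigenvalues ±1, together with an arbitrary unnormalized state vector. It compares the expectation value computed from the probabilities of Eq. (5.10.13) with the direct evaluation of Eq. (5.10.8). All names and numbers are assumptions made for the example.

```python
import math

def inner(phi, psi):
    """Scalar product (phi, psi): linear in psi, antilinear in phi, as in
    Eqs. (5.10.1) and (5.10.2)."""
    return sum(a.conjugate()*b for a, b in zip(phi, psi))

def apply(A, psi):
    """Action of a matrix operator A on a state vector psi."""
    return [sum(A[i][j]*psi[j] for j in range(len(psi))) for i in range(len(A))]

def probabilities(eigen, Psi):
    """Eq. (5.10.13): P_m = |(Psi_m, Psi)|^2 / sum_n |(Psi_n, Psi)|^2."""
    weights = [abs(inner(vec, Psi))**2 for _, vec in eigen]
    total = sum(weights)
    return [w/total for w in weights]

def expectation_from_spectrum(eigen, Psi):
    """Eq. (5.10.12): sum of eigenvalues weighted by their probabilities."""
    return sum(alpha*p for (alpha, _), p in zip(eigen, probabilities(eigen, Psi)))

def expectation_direct(A, Psi):
    """Eq. (5.10.8): (Psi, A Psi) / (Psi, Psi)."""
    return (inner(Psi, apply(A, Psi))/inner(Psi, Psi)).real

A = [[0, 1], [1, 0]]                          # illustrative self-adjoint operator
s = 1/math.sqrt(2)
eigen = [(+1.0, [s, s]), (-1.0, [s, -s])]     # (eigenvalue, orthonormal eigenvector)
Psi = [1 + 1j, 2 + 0j]                        # arbitrary unnormalized state

print(probabilities(eigen, Psi))              # approximately [5/6, 1/6]
print(expectation_from_spectrum(eigen, Psi))  # approximately 2/3
print(expectation_direct(A, Psi))             # approximately 2/3
```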

As in Section 5.3, we can pass over to the case of an operator $A$ with a
continuum of eigenvalues by supposing that it has a very large number of very
close discrete eigenvalues. If there are $N(\alpha)\,d\alpha$ eigenvalues between $\alpha$ and
$\alpha + d\alpha$ then, in the limit of close packing, we can evaluate sums over $n$ by
replacing them with integrals over $\alpha$:
$$\sum_n \cdots \;\to\; \int d\alpha\, N(\alpha) \cdots\,. \tag{5.10.14}$$
Making this replacement, and defining renormalized eigenvectors
$$\Upsilon_\alpha \equiv \sqrt{N(\alpha)}\;\Psi_n \quad\text{for } \alpha = \alpha_n\,, \tag{5.10.15}$$
Eqs. (5.10.11) and (5.10.14) become
$$\Psi = \int d\alpha\; \Upsilon_\alpha\,(\Upsilon_\alpha, \Psi) \tag{5.10.16}$$
and Eq. (5.10.12) gives
$$\langle A\rangle_\Psi = \frac{\int \alpha\,|(\Upsilon_\alpha, \Psi)|^2\, d\alpha}{\int |(\Upsilon_\alpha, \Psi)|^2\, d\alpha}\,. \tag{5.10.17}$$
We conclude that the probability that a measurement of the observable represented
by $A$ will give a value in the range $\alpha$ to $\alpha + d\alpha$ is $P(\alpha)\,d\alpha$, where $P(\alpha)$
is the probability density:
$$P(\alpha) = \frac{|(\Upsilon_\alpha, \Psi)|^2}{\int |(\Upsilon_{\alpha'}, \Psi)|^2\, d\alpha'} \tag{5.10.18}$$
with the normalization of the state vectors $\Upsilon_\alpha$ fixed by the condition (5.10.15).
In particular, if in Eq. (5.10.16) we take $\Psi = \Upsilon_{\alpha'}$, we find
$$\Upsilon_{\alpha'} = \int d\alpha\; \Upsilon_\alpha\,(\Upsilon_\alpha, \Upsilon_{\alpha'})\,,$$
so with this normalization the scalar product of these eigenvectors is the Dirac
delta function discussed in Section 5.6,
$$(\Upsilon_\alpha, \Upsilon_{\alpha'}) = \delta(\alpha - \alpha')\,. \tag{5.10.19}$$
Of course, if we also normalize the state vector $\Psi$ so that
$$\int |(\Upsilon_\alpha, \Psi)|^2\, d\alpha = 1$$
then the probability density is
$$P(\alpha) = |(\Upsilon_\alpha, \Psi)|^2\,. \tag{5.10.20}$$
It should by now be clear that the wave function $\psi(x)$ (for instance, for a
single particle in one dimension) is nothing but the scalar product
$$\psi(x) = (\Upsilon_x, \Psi) \tag{5.10.21}$$
where $\Psi$ is the state vector representing the physical state and $\Upsilon_x$ is a state
vector, normalized to satisfy Eq. (5.10.15) or equivalently Eq. (5.10.19),
representing a state in which the particle is at $x$. We can use suitably normalized
eigenvectors of operators representing any other observables to define
corresponding wave functions $(\Upsilon_\alpha, \Psi)$, such as the momentum-space wave
function introduced in Section 5.3.
In general, eigenvalues and probabilities are to be calculated using relations
among operators that represent physical observables, including commutation
relations and formulas giving the operators that represent conserved quantities
such as the Hamiltonian and angular momentum in terms of other operators.
These relations embody the physical content of any particular quantum-
mechanical theory.
6
Nuclear Physics
Atoms were at the center of physicists’ interest in the 1920s. It was largely
from the effort to understand atomic properties that modern quantum mechan-
ics emerged in this decade. In this work physicists did not have to concern
themselves much with the nature of the atomic nucleus. It had been known
since Rutherford’s interpretation in 1911 of the scattering experiments in his
laboratory that almost all the mass of atoms is contained in a tiny positively
charged nucleus, but all that the atomic physicist needed to know about this
nucleus was its electric charge, mass, and (to account for hyperfine splitting) its
spin and magnetic moment.
In the 1930s physicists’ concerns expanded to include the nature of atomic
nuclei. The constituents of the nucleus were identified, and a start was made in
learning what held them together. And, as everyone knows, world history was
changed in subsequent decades by the military application of nuclear physics.
6.1 Protons and Neutrons
Discovery of the Proton

The first known constituent of the atomic nucleus was the proton. In a series of
experiments in 1919 on the passage of alpha particles from radioactive nuclei
through various gases, Rutherford found that collisions of alpha particles with
nitrogen atoms produced penetrating rays of particles whose range and deflec-
tion by electric and magnetic fields seemed identical to what would be expected
for hydrogen nuclei.1 The reaction is now known to be $^{14}$N + $^{4}$He → $^{17}$O + $^{1}$H,
and is shown on a seven cent postage stamp of New Zealand, the country of
Rutherford’s birth. Rutherford at first called these “H particles,” and he specu-
lated that they were constituents of all atomic nuclei. In the following year he
gave them their modern name, protons.
1 E. Rutherford, Phil. Mag. Series 6 37, 381 (1919); reproduced in Beyer, Foundations of Nuclear Physics,
listed in the bibliography.
It was clear from the beginning that protons could not be the only constituents
of atomic nuclei. This would have been close to a realization of a hypothesis in
1815 of the chemist William Prout (1785–1860). Observing that known atomic
weights were generally close to whole number multiples of the atomic weight
of hydrogen, Prout proposed that all atoms are composites of hydrogen atoms.
Applying Prout’s hypothesis to nuclei rather than to atoms would have done
well in accounting for nuclear masses (which provide almost all of the masses
of atoms). It would even work when applied to isotopes, sets of atoms that have
an equal number of electrons and hence display the same chemical behavior
but differ in their atomic weights. Measurements at the Cavendish Laboratory
by Francis William Aston (1877–1945) had shown by 1919 that the atomic
weights of various isotopes of hydrogen, carbon, oxygen, chlorine, etc. were
all close to whole number multiples of the atomic weight of the lightest isotope
of hydrogen. But to suppose that nuclei are made up only of protons would have
entirely failed in dealing with nuclear electric charges. If nuclei were composed
only of protons their atomic weights in units of the atomic weight of hydrogen
would all be close to their atomic numbers, which as we saw in Section 3.4
were by 1919 already known to equal their electric charges in units of the proton
charge. But light nuclei such as helium, carbon, nitrogen, oxygen, etc. typically
have atomic weights close to twice their atomic numbers.
Electrons in the Nucleus?

In his celebrated Bakerian lecture to the Royal Society of London in 1920,2
Rutherford proposed that nuclei consist of two kinds of particle: protons and
electrons. He was undecided about how these particles might be grouped within
nuclei, though he tentatively proposed that nuclei consist of alpha particles
(known to be $^{4}$He nuclei), supposed to consist of four protons and two electrons,
and nuclei of the isotope $^{3}$He, which Rutherford had discovered in the
collisions of alpha particles with nuclei of nitrogen and oxygen, supposed to
consist of three protons and an electron. In his lecture he also proposed the
existence of neutral particles later called neutrons, with a mass similar to the
proton’s, and with no electric charge. But for Rutherford the neutron was not a
new particle – it was a composite of a proton and one strongly bound electron.
The theory that nuclei consist of protons and electrons had some plausibility.
Because electrons have so much less mass than protons, this theory implied
that all atomic weights would be close to whole number multiples of the atomic
weight of a single proton, the nucleus of hydrogen, as had been noticed by Prout.
Also, some nuclei were known to emit electrons in beta radioactivity. But it was
hard to see how this could work dynamically. In particular, if there are states
of an electron and a proton that are much more deeply bound than a hydrogen
2 E. Rutherford, Proc. Roy. Soc. A 97, 374 (1920).

atom, then why do the electrons in ordinary atoms including hydrogen atoms
not all fall into these states, emitting the released energy as radiation?
There was an even stronger argument coming from molecular physics against
supposing nuclei to consist only of protons and electrons. As we saw in
Section 5.5, we can tell whether the identical nuclei in a diatomic molecule
are bosons or fermions from the ratio of intensities of transitions in the para
and ortho states, which have orbital angular momentum respectively even and
odd. At temperatures T for which the energies of these transitions are much less
than kT , the total intensity of the para lines is greater than for the ortho lines by
a factor (s1 + 1)/s1 if the spin s1 of each nucleus is an integer and the nuclei are
bosons, while the total intensity of the para lines is less than for the ortho lines
by a factor s1 /(s1 + 1) if the spin s1 of each nucleus is a half odd integer and the
nuclei are fermions. In 1929 Walter Heitler (1904–1981) and Gerhard Herzberg
(1904–1999) observed that the total intensity of the para lines in the diatomic
nitrogen molecule is greater than the intensity of the ortho lines, indicating that
the nucleus of the most common nitrogen isotope, $^{14}$N, is a boson.3 (In fact,
we now know that it has spin 1.) But if nuclei consist of protons and electrons,
then the $^{14}$N nucleus would consist of 14 protons to give atomic weight 14, and
seven electrons, to give atomic number 14 − 7 = 7, adding up to 14 + 7 = 21
fermions, and the $^{14}$N nucleus would be a fermion.
Discovery of the Neutron

This puzzle began to be resolved in 1932 with the discovery of the neutron4
by James Chadwick (1891–1974), Rutherford’s second in command at the
Cavendish Laboratory at Cambridge. Chadwick had learned about observations
in Paris5 that showed that collisions of energetic alpha particles with beryllium
atoms produce highly penetrating electrically neutral rays, which when directed
into a hydrogen-rich substance like paraffin produce protons that recoil with
very high energy. Experiments at the Cavendish Laboratory showed that these
neutral rays would also cause heavier nuclei to recoil, though with smaller
recoil velocities, and from the ratios of the recoil velocities he was able to
calculate the mass of the particles making up the neutral rays. It follows from
Eq. (3.3.1) that if a particle B moving with velocity vB strikes a particle A at
rest, and A recoils in the same direction as the initial direction of motion of B,
then its recoil velocity will be
$$ v_A = \frac{2 m_B}{m_A + m_B}\, v_B . $$
3 W. Heitler and G. Herzberg, Naturwiss. 17, 673 (1929).

4 J. Chadwick, Proc. Roy. Soc. A 136, 692 (1932), reproduced in Beyer, Foundations of Nuclear Physics,
listed in the bibliography.
5 I. Curie and F. Joliot, Compt. Rend. Acad. Sci. Paris 194, 273 (1932).
Chadwick did not know the initial velocity vB , but he could eliminate it by
taking the ratio of recoil velocities for different target nuclei of known atomic
weights, and from this ratio he could calculate the atomic weight An of the
particle comprising the neutral ray. For instance, measurements showed that
the same neutral ray from beryllium that causes hydrogen nuclei to recoil
straight back with speed $3.3 \times 10^7$ m/sec would cause nitrogen nuclei to
recoil straight back with speed $4.7 \times 10^6$ m/sec, so
$$ \frac{3.3 \times 10^7}{4.7 \times 10^6} = \frac{A_n/(1 + A_n)}{A_n/(14 + A_n)} = \frac{14 + A_n}{1 + A_n} , $$
from which it follows that $A_n \simeq 1.16$. Chadwick concluded that these neutral
rays consist of particles he called neutrons, with mass close to that of hydrogen.
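As a check on the arithmetic, the ratio of recoil speeds can be solved for the atomic weight of the neutral particle. This is a minimal Python sketch, assuming head-on elastic collisions as in the recoil formula above; the function name `neutral_particle_weight` and its argument names are chosen here for illustration.

```python
def neutral_particle_weight(v1, A1, v2, A2):
    """Solve v1/v2 = (A2 + An)/(A1 + An) for the unknown atomic weight An.

    v1, v2 are recoil speeds of target nuclei with atomic weights A1, A2,
    struck head-on by the same neutral particle. The unknown incident
    speed cancels in the ratio.
    """
    r = v1 / v2
    return (A2 - r * A1) / (r - 1.0)


# Recoil speeds quoted in the text (m/sec) for hydrogen and nitrogen targets.
A_n = neutral_particle_weight(3.3e7, 1.0, 4.7e6, 14.0)
print(round(A_n, 2))
```

With the two quoted speeds this reproduces a value close to 1.16.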
Chadwick assumed that this was the neutron that Rutherford had anticipated
in his 1920 Bakerian lecture, and he followed Rutherford in supposing that the
neutron is a proton–electron bound state. He knew about the problem that study
of the diatomic nitrogen molecule indicated that the 14 N nucleus is a boson,
which is not possible if it consists of 14 protons and seven electrons (whether or
not combined into nuclei of 4 He or 3 He or proton–electron composites), but at
first he decided to ignore the problem. This may have been due to a widespread
reluctance at the time to contemplate any new fundamental particles besides
the proton, electron, and photon, or perhaps it was just the influence of the
formidable Lord Rutherford. The status of the neutron as a fermion that is every
bit as elementary as the proton only became clear with studies of the forces
between these particles, to be discussed in the next section. As a result of these
studies, neutrons and protons became regarded as two members of a family of
particles known as nucleons.
Nuclear Radius and Binding Energy

Like the states of electrons in atoms, the states of nucleons in all but the lightest
nuclei can be described approximately by the Hartree approximation: each
nucleon can be supposed to move in a potential due to all the other nucleons.
Because nuclear forces have short range, each nucleon is chiefly affected by
nucleons with the same one-nucleon orbital wave function. And, because
nucleons are spin 1/2 fermions satisfying the Pauli exclusion principle, there
are just three of these: for a proton (or neutron) state there is another proton (or
neutron) state with opposite spin 3-component, and two neutron or proton states
with each value for the spin 3-component. Thus, whatever the total number A
of nucleons, as a first approximation the binding energy per nucleon and the
volume per nucleon tend to be similar for all nuclei. This is known as the
saturation of nuclear forces.
With a constant volume per nucleon, the volume of a nucleus is proportional
to the number A of nucleons, so the nuclear radius R is proportional to $A^{1/3}$.
These radii can be calculated from measurements of the effect of the nuclear
electric quadrupole moment on atomic spectra; from measurements of the
scattering of electrons in the Coulomb field of the nucleus; and from the measured
rates of alpha decays, to be discussed in Section 6.4. A consensus of these
measurements gives a nuclear radius
$$ R \simeq 1.3 \times 10^{-13}\ \text{cm} \times A^{1/3} . \qquad (6.1.1) $$
The binding energy of a nucleus is the energy required to take all of its nucleons
to rest at a great distance. It can easily be calculated from measurements of
atomic weights: it is the sum of the atomic weights of all the nucleons in the
nucleus minus the atomic weight of the nucleus, times the mass energy $m_1 c^2 =
931.494$ MeV of unit atomic weight.
Liquid Drop Model

According to the idea of the saturation of nuclear force, the dominant term in
the binding energy per nucleon is a constant, estimated to be about 15.8 MeV.6
There are several corrections to this simple rule, which taken together provide
the liquid drop model of the nucleus.
Surface Tension
With a nuclear radius proportional to $A^{1/3}$ the surface area of the nucleus is
proportional to $A^{2/3}$, so a fraction proportional to $A^{-1/3}$ of the A nucleons
is closer to the surface than the range of the nuclear force and therefore feels
less attraction to other nucleons. This decreases the nuclear binding energy
per nucleon by a term proportional to $A^{-1/3}$, estimated from measured atomic
weights as $-18.3\, A^{-1/3}$ MeV.
Coulomb Repulsion
The electrostatic repulsion of Z protons introduces a negative term in the total
binding energy proportional to $Z^2$ and to the inverse nuclear radius, which is
proportional to $A^{-1/3}$. The Coulomb contribution to the binding energy per
nucleon is therefore proportional to $Z^2 A^{-4/3}$. It is approximately $-0.71\, Z^2 A^{-4/3}$
MeV. (The energy coefficient here is smaller than for the other terms in the
binding energy because electric forces are intrinsically weaker than nuclear
forces. For instance, the Coulomb energy of a uniformly charged sphere with
charge Ze and radius (6.1.1) is $3Z^2 e^2/5R = 0.66\, Z^2 A^{-1/3}$ MeV.)
6 The numerical values of coefficients of various terms in the nuclear binding energy are rounded off here
from values derived from a fit to measured binding energies by A. H. Wapstra and N. B. Gove, Nuclear
Data Tables 9, 267 (1971).
Neutron–Proton Inequality
The Pauli exclusion principle leads to a decrease in the binding energy for nuclei
with unequal numbers of protons and neutrons. Given a nucleus with equal
numbers of protons and neutrons, if we imagine a proton changed into a neutron
the new neutron would be forced by the exclusion principle to occupy a state
of energy higher than any of the originally occupied neutron states. Because
of the symmetry between protons and neutrons (discussed in the next section),
with equal numbers of protons and neutrons the highest energy of the originally
occupied neutron states equals the highest energy of the originally occupied
proton states, so changing this proton into a neutron necessarily increases its
energy. The same is true if we change a neutron into a proton. This decrease in
the total binding energy is approximately proportional to $(N - Z)^2/A$, where
N = A − Z is the number of neutrons. It is taken as proportional to 1/A to take
account of the decrease in the spacing of nuclear energy levels with increasing
A. Observed binding energies indicate a term in the binding energy per nucleon
of $-23.2\ \text{MeV} \times (A - 2Z)^2/A^2$.
Putting this together, the binding energy per nucleon is approximately
$$ \frac{\text{binding energy}}{A} \simeq \left[ 15.8 - 18.3\, A^{-1/3} - 0.71\, Z^2 A^{-4/3} - 23.2\, (A - 2Z)^2 A^{-2} \right] \text{MeV} . \qquad (6.1.2) $$
There are also sporadic bumps in the binding energy. Nuclei with even or odd
numbers both of protons and of neutrons have an additional term in the binding
energy that is about $12/\sqrt{A}$ MeV or $-12/\sqrt{A}$ MeV, respectively. Also, the
binding energy is increased for certain “magic” numbers of protons or neutrons,
to be discussed in Section 6.3.
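The four smooth terms of Eq. (6.1.2) are easy to evaluate numerically. The Python sketch below is an illustration only: the function name is chosen here, the coefficients are the rounded values quoted in the text, and the pairing and magic-number corrections are deliberately left out.

```python
def binding_energy_per_nucleon(A: int, Z: int) -> float:
    """Liquid drop estimate of the binding energy per nucleon, in MeV.

    Uses the rounded coefficients of Eq. (6.1.2); pairing and shell
    ("magic number") corrections are not included.
    """
    volume = 15.8
    surface = -18.3 * A ** (-1.0 / 3.0)
    coulomb = -0.71 * Z ** 2 * A ** (-4.0 / 3.0)
    asymmetry = -23.2 * (A - 2 * Z) ** 2 / A ** 2
    return volume + surface + coulomb + asymmetry


b_fe56 = binding_energy_per_nucleon(56, 26)   # iron 56
b_u238 = binding_energy_per_nucleon(238, 92)  # uranium 238
print(round(b_fe56, 2), round(b_u238, 2))
```

With these rounded coefficients the formula gives roughly 8.7 MeV for iron 56 and roughly 7.6 MeV for uranium 238, so the decline toward heavy nuclei comes out of the Coulomb term as described.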
Stable Valley and Decay Modes

For a given value of A, the most deeply bound nucleus has a value of Z given
by the stationary point of the binding energy per nucleon (6.1.2):
$$ Z \simeq \frac{A}{2 + 0.015\, A^{2/3}} . \qquad (6.1.3) $$
Nuclei with smaller or larger values of Z for a given A tend to decay into the
nucleus whose Z is given approximately by Eq. (6.1.3), with the emission of an
electron or its antiparticle, the process known as beta decay, to be discussed in
Section 6.5. In a contour map of nuclear masses plotted against A and Z, the
nuclei satisfying Eq. (6.1.3) form a valley of relatively high binding energy and
hence low mass, known as the stable valley.
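For readers who want the intermediate step, Eq. (6.1.3) follows from setting the Z-derivative of Eq. (6.1.2) to zero at fixed A. The lines below are a sketch of that algebra using only the rounded coefficients 0.71 and 23.2 quoted above, with Z treated as a continuous variable.

```latex
\begin{align*}
0 &= \frac{\partial}{\partial Z}\left[ -0.71\, Z^2 A^{-4/3} - 23.2\,(A - 2Z)^2 A^{-2} \right] \\
  &= -1.42\, Z A^{-4/3} + 92.8\,(A - 2Z) A^{-2} , \\
\intertext{so that, multiplying by $A^2$ and collecting terms in $Z$,}
92.8\, A &= Z \left( 185.6 + 1.42\, A^{2/3} \right) , \\
Z &= \frac{A}{2 + (1.42/92.8)\, A^{2/3}} \simeq \frac{A}{2 + 0.015\, A^{2/3}} .
\end{align*}
```

The volume and surface terms do not depend on Z, which is why only the Coulomb and neutron–proton inequality terms appear.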
For A < 50 Eq. (6.1.3) gives Z close to A/2, as was noticed with the earliest
measurements of the atomic numbers of nuclei such as 4 He, 12 C, 14 N, 16 O,
etc. As we consider nuclei with increasing values of A the Coulomb repulsion
among the protons becomes more and more important, and the nuclei with
the lowest ground state energy tend to have an increasing ratio of neutrons to
protons. For instance, for A = 56 the nucleus with the lowest ground state
energy is 56 Fe, with 26 protons and 30 neutrons. The atomic numbers of the
stable valley fall increasingly below the line Z = A/2 for larger values of A, to
a value Z = 92 for A = 238.
In the stable valley, Eqs. (6.1.2) and (6.1.3) give a binding energy per nucleon
that increases with increasing A for lighter nuclei, owing to the decreasing effect
of surface tension, reaches a maximum of about 9 MeV for iron and nickel, and
then, because of the Coulomb term, decreases slowly for larger A, taking a
value of about 7.5 MeV for 238 U. The decrease with A of the binding energy
per nucleon for heavy nuclei makes it energetically favorable for these nuclei to
decay by splitting into fragments, either by spontaneous fission into two nuclei
of much lower A, or more often by emitting an alpha particle. After emitting
one or a few alpha particles a nucleus becomes excessively neutron-rich for the
new, lower, value of A, and it becomes energetically favorable for the nucleus to
lower the neutron–proton ratio by one or more beta decays, moving back toward
the stable valley. These alpha and beta decay processes sometimes yield nuclei
in excited states, which then undergo gamma decay to the ground state, emitting
an energetic photon. A succession of alpha, beta, and gamma decays continues
until the nucleus transforms into a non-radioactive nucleus, such as one of the
stable isotopes of lead.
For instance, in the decay chain that is most important in the history of
physics, uranium 238 alpha-decays to thorium 234 with a half life of 4.47 × 109
years, and then, with much shorter half lives, thorium 234 beta-decays to
protactinium 234, which beta-decays to uranium 234, which alpha-decays
to thorium 230, which alpha-decays to radium 226, which alpha-decays to
radon 222 (an example of alpha decay considered in detail in Section 6.4),
which alpha-decays to polonium 218, which alpha-decays to lead 214, which
beta-decays to bismuth 214, which beta-decays to polonium 214, which alpha-
decays to lead 210, which beta-decays to bismuth 210, which beta-decays to
polonium 210, which alpha-decays to the stable isotope lead 206, which makes
up 24% of natural lead.
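The bookkeeping of this chain can be verified step by step: an alpha decay lowers Z by 2 and A by 4, while a beta decay with electron emission raises Z by 1 at fixed A. The following Python sketch is illustrative; the names `ATOMIC_NUMBER`, `CHAIN`, and `daughter` are introduced here, and the atomic numbers in the table are standard values supplied for the check rather than taken from the text.

```python
# Atomic numbers of the elements named in the chain (standard values).
ATOMIC_NUMBER = {"U": 92, "Pa": 91, "Th": 90, "Ra": 88,
                 "Rn": 86, "Po": 84, "Bi": 83, "Pb": 82}

# (element, mass number, decay mode) in the order given in the text.
CHAIN = [
    ("U", 238, "alpha"), ("Th", 234, "beta"), ("Pa", 234, "beta"),
    ("U", 234, "alpha"), ("Th", 230, "alpha"), ("Ra", 226, "alpha"),
    ("Rn", 222, "alpha"), ("Po", 218, "alpha"), ("Pb", 214, "beta"),
    ("Bi", 214, "beta"), ("Po", 214, "alpha"), ("Pb", 210, "beta"),
    ("Bi", 210, "beta"), ("Po", 210, "alpha"), ("Pb", 206, None),
]


def daughter(Z, A, mode):
    """Return (Z, A) of the nucleus produced by the given decay mode."""
    if mode == "alpha":
        return Z - 2, A - 4
    if mode == "beta":
        return Z + 1, A
    raise ValueError(f"unknown decay mode: {mode}")


def chain_is_consistent(chain):
    """Check that each listed nucleus is the daughter of the previous one."""
    for (el, A, mode), (next_el, next_A, _) in zip(chain, chain[1:]):
        if daughter(ATOMIC_NUMBER[el], A, mode) != (ATOMIC_NUMBER[next_el], next_A):
            return False
    return True


n_alpha = sum(1 for _, _, mode in CHAIN if mode == "alpha")
n_beta = sum(1 for _, _, mode in CHAIN if mode == "beta")
print(chain_is_consistent(CHAIN), n_alpha, n_beta)
```

The chain as listed involves eight alpha decays and six beta decays, which together take (Z, A) from (92, 238) to (82, 206).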
6.2 Isotopic Spin Symmetry
There is a deep symmetry between protons and neutrons, which made it evident
that neutrons are fermions and just as elementary as protons. Knowledge of this
symmetry emerged in the late 1930s from a study of the forces among protons
and neutrons.
Nuclear Forces
The first of the nuclear forces to be studied was that between a proton and
a neutron, which could be measured by observing the scattering of neutrons
on the protons in a hydrogen-rich substance such as paraffin. As in all
scattering processes, the scattering amplitude $f(\hat{x})$ introduced in Section 5.6 may
be expanded as a sum over terms with angular dependence proportional to the
spherical harmonic functions $Y_\ell^m(\hat{x})$ defined in Section 5.2. The terms with
$\ell > 0$ are suppressed at low energy by a centrifugal barrier, which makes
the wave function vanish for vanishing separation r as $r^\ell$, so at the energies
available in the 1930s the scattering was dominated by the term with $\ell = 0$, for
which the scattering amplitude f is independent of direction. But it is important
here to keep track of the dependence of the scattering amplitude on spin, which
we ignored in Section 5.6. With the neutron taken like the proton to have spin
1/2, there are now two terms in the amplitude for neutron–proton scattering,
with total spin s = 0 or s = 1. In the absence of orbital angular momentum
the total spin is conserved in the scattering process, so the total scattering cross
section takes the form $\sigma_0 + \sigma_1$, where $\sigma_s$ is the cross section in the $\ell = 0$
proton–neutron state with total spin s. It is possible to separate the contributions
of spin zero and spin one by using data on the deuteron, a proton–neutron bound
state with $\ell = 0$ (and a small admixture of $\ell = 2$) and with total angular
momentum j = 1 and hence total spin s = 1. There is a classic relation7 that to
a good approximation gives $\sigma_1 = 2\pi\hbar^2/\mu B$, where μ is the reduced mass of a
proton and a neutron and B is the deuteron binding energy, so using scattering
data and the deuteron binding energy one can separately find σ0 and σ1 .
This is important because protons and neutrons are fermions, so the $\ell = 0$
state of two protons or two neutrons must be antisymmetric in the particles’
spin 3-components. As can be seen from either Eq. (5.4.42) or Table 5.1, this
requires the state to have total spin zero. It is therefore of interest to compare the
value of σ0 deduced for proton–neutron scattering for s = 0 with the observed
total low-energy proton–proton scattering cross section.
Unfortunately there is no way to make a target out of the electrically neutral
(and, as we shall see, unstable) neutron, so it was not possible to make a direct
measurement of neutron–neutron scattering. There is no similar obstacle to the
measurement of proton–proton scattering for, as in Rutherford’s 1919 experi-
ments, one can make a target of a hydrogen gas or a proton-rich substance like
paraffin. Here the problem is that at low energy the scattering is almost entirely
due to the Coulomb potential, and reveals nothing about the nuclear forces.
7 For a textbook derivation, see Section 8.8 of Weinberg, Lectures on Quantum Mechanics, listed in the
bibliography.
The measurements in Rutherford’s laboratory of the scattering of alpha particles
by various nuclei had indicated that the range of nuclear forces is no larger than
about $R \approx 10^{-13}$ cm. In order for two protons to approach to a distance less than
this, it is necessary for their kinetic energy to be greater than $e^2/R \approx 1.4$ MeV.
High-energy proton beams became available in the 1930s with the invention of
accelerators with potential differences produced electrostatically, which were
used8 to make accurate measurements of proton–proton scattering. It turned out
that when the scattering amplitude due to Coulomb forces was subtracted, the
$\ell = 0$ part of the purely nuclear proton–proton scattering amplitude was equal
to the previously measured $\ell = 0$ proton–neutron scattering amplitude in the
state with total spin zero.
Isotopic Spin Rotations

The equality of forces soon led two pairs of theorists9 to propose that the laws
governing nuclear forces (whatever they are) respect a symmetry among neu-
trons and protons. It is not just that these laws do not change if everywhere in
the equations we change neutrons into protons and protons into neutrons. That
would imply that the proton–proton nuclear force is the same as the neutron–
neutron force but would say nothing about their relation to the proton–neutron
force. Rather, according to the proposed symmetry principle, the laws governing
nuclear (but not electromagnetic) forces are invariant under what is called an
isotopic spin rotation, which acts not on momenta or ordinary spin but on the
labels of the nuclear particles. The neutron and proton are supposed to form a
doublet, called the nucleon:

$$ \begin{pmatrix} p \\ n \end{pmatrix} , $$
on which isotopic spin rotations act in the same way mathematically that ordi-
nary rotations act on the two ordinary spin states of any particle with s = 1/2.
(Specifically, isotopic spin rotations act on the nucleon doublet as a 2 × 2 matrix
U having the property U † = U −1 , known as unitarity, and having determi-
nant unity. But we won’t need to use this information here.) Just as we saw
in Section 5.4 that the effect of infinitesimal ordinary rotations on physical
states is given by an angular momentum operator J, whose components
satisfy the commutation relations $[J_i, J_j] = i\hbar\,\epsilon_{ijk} J_k$ (where $\epsilon_{ijk}$ is the totally
antisymmetric quantity with $\epsilon_{123} = +1$, and repeated indices are summed),
in the same way infinitesimal isotopic spin rotations are generated by a three-
component operator T, whose components satisfy the commutation relations
8 M. A. Tuve, N. Heydenberg, and L. Hafstad, Phys. Rev. 50, 850 (1936).

9 B. Cassen and E. U. Condon, Phys. Rev. 50, 846 (1936); G. Breit and E. Feenberg, Phys. Rev. 50, 850
(1936).
$[T_a, T_b] = i\,\epsilon_{abc} T_c$. (The a, b, c indices can be taken like i, j, k to run over the
values 1, 2, 3, but of course for the isotopic spin these values have nothing to
do with directions in ordinary space. Repeated indices are again summed, and
$\epsilon_{abc}$ like $\epsilon_{ijk}$ is a totally antisymmetric quantity with $\epsilon_{123} = 1$.) The proton and
neutron are taken as the states with T3 = +1/2 and T3 = −1/2, respectively.
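For the nucleon doublet the generators can be represented concretely. The sketch below assumes the usual choice of one half of each Pauli matrix for the three components of T (an assumption made here for illustration; the text does not write the matrices out) and checks the commutation relations numerically in plain Python. The helper names `matmul`, `commutator`, `epsilon`, and `relations_hold` are introduced for this example.

```python
def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def commutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(2)] for i in range(2)]


def epsilon(a, b, c):
    """Totally antisymmetric symbol for indices 0, 1, 2 (epsilon(0,1,2) = 1)."""
    return (a - b) * (b - c) * (c - a) / 2


# Assumed representation: T_a = (Pauli matrix a) / 2, acting on (p, n).
T = [
    [[0, 0.5], [0.5, 0]],
    [[0, -0.5j], [0.5j, 0]],
    [[0.5, 0], [0, -0.5]],
]


def relations_hold(tol=1e-12):
    """Check [T_a, T_b] = i * epsilon_abc * T_c for all a, b."""
    for a in range(3):
        for b in range(3):
            lhs = commutator(T[a], T[b])
            for i in range(2):
                for j in range(2):
                    rhs = sum(1j * epsilon(a, b, c) * T[c][i][j]
                              for c in range(3))
                    if abs(lhs[i][j] - rhs) > tol:
                        return False
    return True


print(relations_hold())
# Diagonal entries of the third component: +1/2 for p, -1/2 for n.
print(T[2][0][0], T[2][1][1])
```

In this representation the upper component of the doublet has third component +1/2 and the lower component −1/2, matching the assignment of the proton and neutron.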
Just as two particles with ordinary spin 1/2 can combine to form a compound
state with total spin s equal to 0 or 1, two nucleons can combine to form a com-
pound state with total isotopic spin 0 or 1, which transforms under isotopic spin
rotations in the same way that states with ordinary total spin 0 or 1 transform
under ordinary rotations. The invariance of nuclear forces under isotopic spin
rotations tells us that total isotopic spin is conserved, so the cross section for the
scattering of two nucleons is the sum of a cross section for isotopic spin 1 and
a cross section for isotopic spin zero. The states with total isotopic spin 1 form
a triplet, just like orbital angular momentum states with $\ell = 1$, whose
components are a proton+proton state with $T_3 = +1$, a proton+neutron state with
$T_3 = 0$, and a neutron+neutron state with $T_3 = -1$. The proton+proton and
neutron+neutron $\ell = 0$ states must be antisymmetric in the nucleon spin
3-components, and therefore have total ordinary spin 0. Since spin and isotopic
spin commute, the proton+neutron component of this triplet must then also
have spin zero. Since these three s-wave nucleon–nucleon states with total
ordinary spin 0 form a triplet, the scattering cross sections are the same for each.
On the other hand, an s-wave state of two nucleons with ordinary spin 1
is symmetric in the spin 3-components, so it cannot be a proton+proton or
neutron+neutron state, and can therefore only be a proton+neutron T3 = 0 state
of a singlet with total isotopic spin zero. This is the deuteron, with total angular
momentum and total ordinary spin both equal to one.
Multiplets
The implications of isotopic spin symmetry go far beyond the equality of s-
wave nucleon–nucleon cross sections for total spin zero. Before we go into this,
it is necessary to say something about the relation of isotopic spin quantum
numbers and electric charge. For the proton–neutron doublet, it is obvious that
the electric charge of a nucleon is
$$ Q = e\left[ T_3 + 1/2 \right] \qquad (6.2.1) $$
so that protons and neutrons will have charges respectively e and 0. In a nucleus
with B nucleons, the charge is the sum of (6.2.1) for all the nucleons, so
$$ Q = e\left[ T_3 + B/2 \right] , \qquad (6.2.2) $$
where now T is the isotopic spin operator of the whole nucleus and B is the
number of nucleons. As we have seen, B is very close to the atomic weight A
of the element, but we use the symbol B instead of A because they are not
precisely equal, and in order that Eq. (6.2.2) should apply for some of the
particles discovered after World War II that are not composed of protons and
neutrons. In this more general context, B is known as the baryon number. (For
some unstable particles a quantity S known as strangeness that is conserved in
strong and electromagnetic interactions must be added to B in Eq. (6.2.2).)
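The charge formula is simple enough to tabulate. The Python sketch below is illustrative only: the function name `charge_in_units_of_e` is chosen here, the optional argument `S` follows the parenthetical remark about strangeness, and the T3 values for the light nuclei are obtained by summing +1/2 per proton and −1/2 per neutron.

```python
from fractions import Fraction


def charge_in_units_of_e(T3, B, S=0):
    """Charge Q/e from Eq. (6.2.2), with strangeness S added to B if nonzero."""
    return Fraction(T3) + Fraction(B + S, 2)


half = Fraction(1, 2)
examples = {
    "proton": charge_in_units_of_e(half, 1),
    "neutron": charge_in_units_of_e(-half, 1),
    "deuteron": charge_in_units_of_e(0, 2),   # one proton, one neutron
    "3He": charge_in_units_of_e(half, 3),     # two protons, one neutron
    "3H": charge_in_units_of_e(-half, 3),     # one proton, two neutrons
}
for name, q in examples.items():
    print(name, q)
```

Exact fractions are used so that half-integer values of T3 are handled without rounding.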
Of course, electromagnetism does not respect isotopic spin symmetry: pro-
tons are charged while neutrons are not. Equation (6.2.2) shows that in elec-
tromagnetic phenomena involving the charge operator the 3-component of the
isotopic spin operator plays a different role from the 1- and 2- components.
There is also a nucleon mass difference, mn − mp = 1.293 MeV/c2 , which
contributes a term in the total rest mass proportional to T3 . For relatively light
nuclei, with atomic numbers less than about 20 to 30, Coulomb forces are
less important than nuclear forces and isotopic spin symmetry is fairly well
respected, but this is not true for heavy nuclei, where the Coulomb repulsion
of protons in the nucleus comes close to tearing the nucleus apart. It makes no
sense to talk about isotopic spin symmetry when we are dealing with uranium.
Relatively light nuclei must form isotopic spin multiplets. We characterize
any multiplet by a total isotopic spin quantum number t, defined so that (just as
for ordinary spin multiplets) the multiplet consists of 2t + 1 nuclei with T3
equal to t, t − 1, . . . , −t, all with the same ordinary spin (that is, total angular
momentum) and with close to the same energy. Acting on the multiplet the
isotopic spin operator T satisfies $\mathbf{T}^2 = t(t + 1)$. The proton and neutron form a
t = 1/2 doublet, and the deuteron is a t = 0 singlet. There are many t = 1/2
doublets of complex nuclei; the lightest consists of the light isotope 3 He of
helium, whose discovery was announced by Rutherford in his 1920 Bakerian
lecture, and tritium, the radioactive isotope 3 H of hydrogen discovered at the
Cavendish Laboratory10 in 1934. The $^3$He nucleus consists of two protons and
one neutron and has atomic weight 3.01603, while the $^3$H nucleus is composed
of one proton and two neutrons and has atomic weight 3.01605. Both nuclei
have spin 1/2.
There are also triplets of nuclear states with t = 1, which show again that
this is a symmetry under transformations that go beyond the mere interchange
of protons and neutrons. A famous example includes the ground states of the
nuclei of 12 B and 12 N, which have B = 12 and charges 5e and 7e, and hence
according to Eq. (6.2.2) have T3 = −1 and T3 = +1. The T3 = 0 member
of the triplet would then be ordinary carbon, 12 C, with nuclear charge 6e. But
it is not the ground state of 12 C, which has total angular momentum j = 0,
while the ground states of 12 B and 12 N both have j = 1. Also, although the 12 B
and 12 N ground states have nearly equal atomic weights, 12.0144 and 12.0186,
respectively, the 12 C ground state by definition has atomic weight 12.0000. (The
greater binding energy of 12 C is due to two effects mentioned in the previous
10 M. Oliphant, E. Harteck, and E. Rutherford, Nature 133, 413 (1934); Proc. Roy. Soc. A 144, 692 (1934).
section: the numbers of protons and neutrons in 12 C are equal, and both numbers
are even.) The small difference in atomic weights of 12 B and 12 N is due to the
greater Coulomb repulsion among the seven protons of 12 N than among the five
protons of 12 B, but this cannot account for the large difference from the atomic
weight of the ground state of carbon. In order to provide the T3 = 0 member
of a triplet with 12 B and 12 N, there would have to be a spin 1 state of 12 C
with an excitation energy well above the ground state. Since the number of
protons in 12 C is the average of the numbers in 12 B and 12 N, we would expect
its excitation energy to be about $0.0165\, m_1 c^2$ (the average of $0.0144\, m_1 c^2$ and
$0.0186\, m_1 c^2$), or, taking $m_1 c^2 = 931.5$ MeV, about 15.4 MeV. In fact there is
such a state, a spin 1 state of 12 C that is 15.11 MeV above the 12 C ground state,
which decays into the ground state by emission of a photon. This is the T3 = 0
member of the triplet.
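The estimate of the excitation energy is a short piece of arithmetic on the atomic weights quoted above. In this Python sketch the variable names are chosen for illustration, and the only inputs are the numbers given in the text.

```python
M1_C2_MEV = 931.5          # mass energy of unit atomic weight, in MeV

weight_b12 = 12.0144       # atomic weight of the 12 B ground state
weight_n12 = 12.0186       # atomic weight of the 12 N ground state
weight_c12 = 12.0000       # atomic weight of the 12 C ground state

# Average excess of the T3 = -1 and T3 = +1 members over the 12 C ground state.
average_excess = 0.5 * ((weight_b12 - weight_c12) + (weight_n12 - weight_c12))

estimate_mev = average_excess * M1_C2_MEV
observed_mev = 15.11       # excitation energy of the spin 1 state of 12 C

print(round(estimate_mev, 2), round(estimate_mev - observed_mev, 2))
```

The estimate lands within a few percent of the observed 15.11 MeV, which is about as good as could be expected from such a simple averaging of the Coulomb effects.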
Why Isotopic Spin Symmetry?

One may wonder why nuclear forces should obey a symmetry principle that
is not obeyed by other forces, such as those of electromagnetism. Indeed, one
should wonder. An invariance principle that applies only to some phenomena
and not others can hardly be regarded as a fundamental physical principle.
This puzzle became resolved in the modern theory of strong nuclear forces
known as quantum chromodynamics.11 Briefly, in this theory the neutron and
proton are composed of two kinds of elementary spin 1/2 particles, the up quark
with charge 2e/3 and the down quark with charge −e/3. In close analogy
with how 3 He and 3 H are composed of protons and neutrons, the proton is
composed of two up quarks and a down quark, while the neutron consists of
one up quark and two down quarks. Nuclear forces in quantum chromodynam-
ics are carried by eight fields like the electromagnetic field, only interacting
with a quantum number known whimsically as color instead of charge. At the
energies characteristic of nuclear phenomena these forces are much stronger
than electromagnetic forces, which is why the composite nature of protons and
neutrons is not apparent in most nuclear phenomena and why electromagnetism
can be treated as a small perturbation in studying light nuclei. The quarks all
carry the same set of colors, so strong nuclear forces do not distinguish up
from down quarks, but isotopic spin symmetry is not imposed on the theory. In
fact, unlike protons and neutrons, the up and down quarks have quite different
masses: according to one estimate, the down quark mass is almost twice the
11 Quantum chromodynamics is part of our present theory of elementary particles and their interactions,
the Standard Model. Formulating and testing this model has been the work of many physicists. For an
informal history see Weinberg, “Half a Century of the Standard Model,” listed in the bibliography. A more
detailed account with references to much of this work can be found in Weinberg, The Quantum Theory of
Fields, Vol. II: Modern Applications (Cambridge University Press, Cambridge, UK, 1996).
up quark mass. The reason for the isotopic spin symmetry of strong forces
is just that there is no room in the theory for any violation of the symmetry
other than the quark masses, and the quark masses although unequal are very
small. Almost all of the mass of the proton and of the neutron comes from the strong
nuclear forces acting among the quarks within a single proton or neutron, not
from the quark masses.
The small mass difference between the proton and the neutron comes both
from differences in the quark masses and from electromagnetic forces among
the quarks, but the quark mass difference is somewhat more important. This is why
the neutron is heavier than the proton, even though the electric charges of the
quarks in the proton are larger than those in the neutron. It is both the smallness
of the quark masses and the relative weakness of electromagnetic effects that
makes the neutron–proton mass difference, 1.293 MeV/c2 , so tiny compared
with the proton mass, 938 MeV/c2 .
Pions
Isotopic spin symmetry had important implications for the new strongly inter-
acting particles discovered after World War II. The first of these particles was
the pi meson, or pion as it is frequently called. In 1947 a group at the University
of Bristol,12 studying photographic plates that had been exposed to cosmic
rays at high altitudes in the Pyrenees and Andes, found evidence of a strongly
interacting particle with a mass intermediate (hence the name “meson”) between
the electron and the nucleon. It is today known that these charged pions come
with charges +e and −e, both with masses 139.570 MeV/c2 . These particles
are produced singly in reactions such as p + p → p + n + π + , and so if
baryon number is conserved these particles must be supposed to have B = 0.
Equation (6.2.2) then indicates that the π + and π − have T3 = +1 and T3 = −1,
respectively. No doubly charged particles with similar mass have ever been
found, so the pions cannot be part of an isotopic spin multiplet with t ≥ 2,
and therefore must be part of a triplet, with t = 1. The neutral T3 = 0 member
of the triplet, the $\pi^0$, was discovered at the Berkeley cyclotron in 1950 – the
first particle to be found at an accelerator before being discovered in cosmic
rays. The mass of the $\pi^0$ is now known to be 134.977 MeV/$c^2$.
In quantum chromodynamics, the $\pi^+$ and $\pi^-$ are respectively $u\bar{d}$ and
$d\bar{u}$, where u and d stand for up and down quarks, and the bar denotes
antiquarks. The $\pi^0$ is a 50–50 superposition of $u\bar{u}$ and $d\bar{d}$. The quark
masses contribute equally to all three pions, so the 4.6 MeV/c2 mass
difference between charged and neutral pions is entirely due to electromagnetic
forces. In fact, this is the one mass difference in an isotopic spin multiplet
of elementary particles that has been successfully calculated as a purely
electromagnetic effect.
12 C. M. G. Lattes, H. Muirhead, G. P. S. Occhialini, and C. F. Powell, Nature 159, 694 (1947).
Although the charged and neutral pions are joined in an isotopic spin triplet,
their decays occur through interactions that do not respect isotopic spin symme-
try and hence they have very different decay rates and decay modes. The neutral
pion decays into two photons through purely electromagnetic interactions, with
mean lifetime $(8.52 \pm 0.18) \times 10^{-17}$ seconds. The charged pion decays much
more slowly through the weak interactions discussed in Section 6.5, with mean
lifetime $(2.6033 \pm 0.0005) \times 10^{-8}$ seconds, primarily into a neutrino and a muon,
a particle similar to an electron but 210 times heavier, discovered in cosmic rays
in 1937.
Appendix: The Three–Three Resonance
There are no clearly identified multiplets of nuclear states larger than triplets,
but there is a conspicuous quartet of unstable particles that decay into a nucleon
and a pion, with masses all close to 1210 MeV/$c^2$. This is the “three–three
resonance” $\Delta$, where “three–three” means that it has t = 3/2 and j = 3/2,
and “resonance” indicates that these are seen as sharp peaks in pion–nucleon
scattering, interpreted as the formation of an unstable intermediate state that
decays back into a nucleon and a pion. As discussed at the end of the appendix
to Section 5.6, the total decay rate of each of these four states is measured as
the width of the peak of the cross section as a function of energy, divided by h̄;
the rate of decay into any particular pion–nucleon state equals the total decay
rate times the branching ratio, the fraction of scattering events at the resonant
energy that produce that pion–nucleon state.
Since the formation and decay of the $\Delta$ both indicate that it has the same
baryon number B = 1 as the nucleon, Eq. (6.2.2) indicates that the four states of
the quartet with charges 2e, e, 0, and −e have $T_3 = 3/2$, $T_3 = 1/2$, $T_3 = -1/2$,
and $T_3 = -3/2$. Like the proton and neutron the $\Delta$ states are interpreted as
composites of three quarks: respectively uuu, uud, udd, and ddd.
The three–three resonance provides a good example of the power of sym-
metry principles such as isotopic spin symmetry to do more than dictate how
energy eigenstates are grouped into multiplets. The conservation of isotopic
spin tells us that the nucleon and pion produced when a $\Delta$ decays must be in a
state of total isotopic spin 3/2 rather than a mixture of isotopic spins 3/2 and
1/2. For a three–three resonance with a given value of T3 , the nucleon–pion
state has wave function

$$ \sum_{\pm} C_{1,1/2}(3/2, T_3;\, T_3 \mp 1/2, \pm 1/2)\; \psi^{\pi N}_{T_3 \mp 1/2,\, \pm 1/2} , $$
where $\psi^{\pi N}_{T_3 \mp 1/2,\, \pm 1/2}$ is the wave function for a pion and a nucleon with their
third components of isotopic spin equal respectively to $T_3 \mp 1/2$ and $\pm 1/2$, and
$C_{1,1/2}(3/2, T_3; t, t')$ is the Clebsch–Gordan coefficient discussed in Section 5.4.

The rates of decay of a three–three resonance with a given T3 into various
pion–nucleon states are then given by
$$\Gamma\big(\Delta(T_3) \to \pi(T_3 \mp 1/2) + N(\pm 1/2)\big) = \Gamma\, \big|C_{1,1/2}(3/2, T_3;\, T_3 \mp 1/2, \pm 1/2)\big|^2 ,$$
where Γ is the total decay rate of a three–three particle of any charge; it
is another consequence of isotopic spin symmetry that these total decay rates
are the same for all four charges of the Δ. Looking up the Clebsch–Gordan
coefficients in Table 5.1 for combining states of spin 1 and 1/2 to form a state
of spin 3/2, we see that for the T3 = 1/2 state Δ+ we have
$$\Gamma(\Delta^+ \to \pi^+ + n) = \Gamma/3 , \qquad \Gamma(\Delta^+ \to \pi^0 + p) = 2\Gamma/3 ,$$
while, for the T3 = −1/2 state Δ0,
$$\Gamma(\Delta^0 \to \pi^- + p) = \Gamma/3 , \qquad \Gamma(\Delta^0 \to \pi^0 + n) = 2\Gamma/3 .$$
For the Δ++ and Δ− there is only one available decay channel, so without
looking up Clebsch–Gordan coefficients we know that
$$\Gamma(\Delta^{++} \to \pi^+ + p) = \Gamma(\Delta^- \to \pi^- + n) = \Gamma .$$
These predictions were verified in experiments on pion–nucleon scattering car-
ried out by Fermi’s group at Chicago in the early 1950s.
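The branching fractions quoted above can be reproduced without a table. The following is a minimal sketch, not taken from the text: it evaluates the squares of the Clebsch–Gordan coefficients exactly, using Racah's closed-form sum (a standard formula that this chapter does not use). The function names `cg_squared` and `delta_branching` and the channel labels are illustrative choices.

```python
from fractions import Fraction
from math import factorial


def cg_squared(j1, m1, j2, m2, J, M):
    """Exact square of the Clebsch-Gordan coefficient <j1 m1; j2 m2 | J M>."""
    j1, m1, j2, m2, J, M = (Fraction(x) for x in (j1, m1, j2, m2, J, M))
    # Selection rules: third components add, and J obeys the triangle rule.
    if m1 + m2 != M or not abs(j1 - j2) <= J <= j1 + j2:
        return Fraction(0)
    if abs(m1) > j1 or abs(m2) > j2 or abs(M) > J:
        return Fraction(0)

    def f(x):
        # Every argument passed here is a non-negative integer.
        return factorial(int(x))

    prefactor = Fraction(
        (2 * J + 1) * f(J + j1 - j2) * f(J - j1 + j2) * f(j1 + j2 - J),
        f(j1 + j2 + J + 1),
    )
    prefactor *= (f(J + M) * f(J - M) * f(j1 - m1) * f(j1 + m1)
                  * f(j2 - m2) * f(j2 + m2))

    total = Fraction(0)
    for k in range(int(j1 + j2 - J) + 1):
        args = (j1 + j2 - J - k, j1 - m1 - k, j2 + m2 - k,
                J - j2 + m1 + k, J - j1 - m2 + k)
        if min(args) < 0:
            continue  # this k does not contribute
        denom = f(k)
        for a in args:
            denom *= f(a)
        total += Fraction((-1) ** k, denom)
    return prefactor * total ** 2


def delta_branching(T3):
    """Branching fractions of the three-three resonance with given T3."""
    pion = {1: "pi+", 0: "pi0", -1: "pi-"}
    nucleon = {Fraction(1, 2): "p", Fraction(-1, 2): "n"}
    fractions = {}
    for t_pi, pi_name in pion.items():
        for t_n, n_name in nucleon.items():
            w = cg_squared(1, t_pi, Fraction(1, 2), t_n, Fraction(3, 2), T3)
            if w:
                fractions[pi_name + " " + n_name] = w
    return fractions


for T3 in (Fraction(3, 2), Fraction(1, 2), Fraction(-1, 2), Fraction(-3, 2)):
    print(T3, delta_branching(T3))
```

Working with `Fraction` keeps the results exact, so the values 1/3 and 2/3 appear as rational numbers rather than rounded floats.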
6.3 Shell Structure
In nuclei as in atoms it is a fair approximation to adopt a Hartree approximation,

in which each nucleon feels an effective potential due to all the other nucleons.
Neutrons and protons are fermions, so their states in nuclei are governed by
the Pauli exclusion principle, like the states of electrons in atoms. In particular,
there are nuclei in which protons or neutrons or both form closed shells like the
electrons in noble gases, and therefore are more tightly bound than other nuclei
of similar weight.
The great difference between the closed shells in atoms and nuclei arises
from the difference in the form of their effective potentials. Both potentials have
approximate spherical symmetry, but in nuclei, unlike atoms, there is nothing
special at the center of symmetry that would make the nuclear potential singular
there. Since the nuclear potential is a function only of the radial coordinate r,
and is expected to be analytic in the Cartesian components of the coordinate
vector x, it must be a power series in $r^2 = \mathbf{x}^2$. Within some neighborhood of
the origin, it is therefore approximately linear in $\mathbf{x}^2$, a relation we shall write as
$$V(\mathbf{x}) \simeq V_0 + \frac{1}{2} m_N \omega^2 \mathbf{x}^2 , \tag{6.3.1}$$
where mN can be taken as the mean nucleon mass, and ω is a constant with the
dimensions of frequency. The total Hamiltonian is then a sum of one-nucleon
Hamiltonians, each of the form
$$H = V_0 + \frac{\mathbf{P}^2}{2m_N} + \frac{m_N \omega^2}{2}\,\mathbf{X}^2 , \tag{6.3.2}$$
with X the operator that multiplies the wave function with the coordinate
argument x, and P the operator that acts on the wave function as the differential
operator −i h̄∇. This is the Hamiltonian for an harmonic oscillator with circular
frequency ω, the first problem solved using Heisenberg’s matrix mechanics at
the beginning of quantum mechanics.13
To find the spectrum of eigenvalues of this Hamiltonian, we introduce a
vector operator

$$\mathbf{a} \equiv \frac{1}{\sqrt{2 m_N \omega \hbar}}\,\mathbf{P} - i\sqrt{\frac{m_N \omega}{2\hbar}}\,\mathbf{X} . \tag{6.3.3}$$
Recalling the commutation relations (5.3.22),
[Xi , Pj ] = i h̄δij , [Xi , Xj ] = [Pi , Pj ] = 0 , (6.3.4)
it is straightforward to calculate that
[ai , aj† ] = δij , [ai , aj ] = [ai† , aj† ] = 0 . (6.3.5)
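The step from the definition (6.3.3) and the commutators (6.3.4) to Eq. (6.3.5) can be spelled out as follows. This is only a sketch of the intermediate algebra, using nothing beyond those two equations:

```latex
\begin{aligned}
[a_i, a_j^\dagger]
&= \left[\frac{P_i}{\sqrt{2m_N\omega\hbar}} - i\sqrt{\frac{m_N\omega}{2\hbar}}\,X_i ,\;
         \frac{P_j}{\sqrt{2m_N\omega\hbar}} + i\sqrt{\frac{m_N\omega}{2\hbar}}\,X_j\right] \\
&= \frac{i}{2\hbar}\Big([P_i, X_j] - [X_i, P_j]\Big)
 = \frac{i}{2\hbar}\big(-i\hbar\,\delta_{ij} - i\hbar\,\delta_{ij}\big) = \delta_{ij} , \\
[a_i, a_j]
&= -\frac{i}{2\hbar}\Big([P_i, X_j] + [X_i, P_j]\Big) = 0 .
\end{aligned}
```

The terms with two momenta or two coordinates drop out because those operators commute among themselves.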
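The Hamiltonian (6.3.2) can be expressed as
$$H = V_0 + \frac{\hbar\omega}{2}\left[\mathbf{a}\cdot\mathbf{a}^\dagger + \mathbf{a}^\dagger\cdot\mathbf{a}\right] .$$
Using the commutators (6.3.5), this is
$$H = V_0 + \frac{3\hbar\omega}{2} + \hbar\omega\,\mathbf{a}^\dagger\cdot\mathbf{a} . \tag{6.3.6}$$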
The operators a† and a play the role of raising and lowering operators for
the energy. Using Eq. (6.3.6) and the commutation relations (6.3.5), we easily
see that
[a, H ] = h̄ωa , [a† , H ] = −h̄ωa† . (6.3.7)
It follows that if Hψ = Eψ, then
$$H(\mathbf{a}^\dagger\psi) = (E + \hbar\omega)(\mathbf{a}^\dagger\psi) , \tag{6.3.8}$$
$$H(\mathbf{a}\psi) = (E - \hbar\omega)(\mathbf{a}\psi) . \tag{6.3.9}$$
13 W. Heisenberg, Zeit. Phys. 33, 879 (1925). This article is reprinted in English in Van der Waerden, Sources
of Quantum Mechanics, listed in the bibliography.

We assume that there is a one-nucleon state with some minimum energy E0 . In
this case the wave function ψ0 for this state must satisfy
aψ0 = 0 , (6.3.10)
since otherwise according to Eq. (6.3.9) the wave function aψ0 would be an
energy eigenfunction with an even smaller energy, E0 − h̄ω. Using aψ0 = 0,
Eq. (6.3.6) then gives
H ψ0 = E0 ψ0 , (6.3.11)
where E0 is the minimum energy:
$$E_0 = V_0 + \frac{3\hbar\omega}{2} .$$
The energy h̄ω/2 associated with each of the three coordinate components
is known as the zero-point energy. The appearance of a zero-point energy for
harmonic oscillators is an inevitable feature of quantum mechanics. Inspection
of the Hamiltonian (6.3.2) shows that for a state to have energy as low as V0
its wave function would have to be an eigenfunction of both P and X with
eigenvalues zero, which is impossible since the commutator [Xi , Pi ] = i h̄
cannot vanish acting on any wave function.
Equation (6.3.8) shows that acting on any wave function with any component
$a_i^\dagger$ raises the energy of the state by ħω, so we can find a complete set of energy
eigenfunctions
$$\psi_{n_1 n_2 n_3} \equiv (a_1^\dagger)^{n_1} (a_2^\dagger)^{n_2} (a_3^\dagger)^{n_3}\, \psi_0 \tag{6.3.12}$$
for which
$$H\psi_{n_1 n_2 n_3} = (E_0 + n\hbar\omega)\,\psi_{n_1 n_2 n_3} , \tag{6.3.13}$$
where n = n1 + n2 + n3.
We could just as well construct an eigenfunction of H with eigenvalue
E0 + nħω if in place of $(a_1^\dagger)^{n_1} (a_2^\dagger)^{n_2} (a_3^\dagger)^{n_3}$ we operated on ψ0 with any
homogeneous polynomial of order n in the components of $\mathbf{a}^\dagger$ – that is, any
sum of terms, each proportional to a product like that in Eq. (6.3.12) of a
total of n factors of components of $\mathbf{a}^\dagger$. In order to make clear the angular
momentum content of these states, it is much more convenient to use the set of
homogeneous polynomials encountered in Eq. (5.2.16). Expressed as a function
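of any vector v, these are
$$\mathcal{Y}_\ell^m(\mathbf{v}) \equiv |\mathbf{v}|^\ell\, Y_\ell^m(\hat{\mathbf{v}}) , \tag{6.3.14}$$
where $Y_\ell^m$ is the spherical harmonic function described in Section 5.2, with ℓ a
non-negative integer and m an integer running over the 2ℓ+1 values from −ℓ to
+ℓ. For instance, $\mathcal{Y}_0^0(\mathbf{v})$ is a constant, and
$$\mathcal{Y}_1^{\pm 1}(\mathbf{v}) = \mp\sqrt{\frac{3}{8\pi}}\,(v_1 \pm i v_2) , \qquad \mathcal{Y}_1^0(\mathbf{v}) = \sqrt{\frac{3}{4\pi}}\, v_3 .$$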
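We can find a complete set of states with energy E0 + nħω and angular momentum
quantum number ℓ for which n − ℓ is an even non-negative integer:
$$\psi^m_{n,\ell} = (\mathbf{a}^\dagger\cdot\mathbf{a}^\dagger)^{(n-\ell)/2}\, \mathcal{Y}_\ell^m(\mathbf{a}^\dagger)\,\psi_0 . \tag{6.3.15}$$
For instance, for n = 0 we have only ℓ = 0, and $\psi^0_{0,0}$ is proportional to the
minimum-energy wave function ψ0. For n = 1 we have only ℓ = 1, and
$\psi^m_{1,1} = \mathcal{Y}_1^m(\mathbf{a}^\dagger)\psi_0$. For n = 2 we have both ℓ = 2, with
$\psi^m_{2,2} = \mathcal{Y}_2^m(\mathbf{a}^\dagger)\psi_0$, and also ℓ = 0, with
$\psi^0_{2,0} \propto (\mathbf{a}^\dagger\cdot\mathbf{a}^\dagger)\psi_0$.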
All but the lowest energy states are evidently degenerate. As we have seen, for
energy levels with n = 1 and n = 2 there are respectively three and 5 + 1 = six
states with energies respectively E0 + h̄ω and E0 + 2h̄ω. In general, the number
#n of states with energy E0 + nħω is the sum of 2ℓ + 1 for all non-negative
integers ℓ with n − ℓ an even non-negative integer 2ν. That is,
$$\#_n = \begin{cases}
\displaystyle\sum_{\nu=0}^{n/2} (2n - 4\nu + 1) = (2n+1)(n/2+1) - 2(n/2)(n/2+1) & \text{for } n \text{ even} \\[2ex]
\displaystyle\sum_{\nu=0}^{(n-1)/2} (2n - 4\nu + 1) = (2n+1)\big((n-1)/2+1\big) - 2\big((n-1)/2\big)\big((n-1)/2+1\big) & \text{for } n \text{ odd}
\end{cases}$$
and so, whether n is even or odd, the degeneracy (apart from spin) is
#n = (n + 1)(n + 2)/2 . (6.3.16)
This can be recognized as the number of ways an integer n can be written as
a sum of three non-negative integers, so this is also the number of independent
wave functions ψn1 n2 n3 with n = n1 + n2 + n3 defined by Eq. (6.3.12). Thus
the wave functions (6.3.15) form a complete set of eigenfunctions of H with
eigenvalue E0 + nh̄ω.
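The counting argument above is easy to check by brute force. The short sketch below (function names are illustrative) counts the Cartesian states of Eq. (6.3.12) and the angular-momentum states of Eq. (6.3.15) separately and compares both with the closed form (6.3.16):

```python
from itertools import product


def count_cartesian(n):
    """Number of triples (n1, n2, n3) of non-negative integers summing to n."""
    # Choosing n1 and n2 with n1 + n2 <= n fixes n3 = n - n1 - n2.
    return sum(1 for n1, n2 in product(range(n + 1), repeat=2) if n1 + n2 <= n)


def count_spherical(n):
    """Sum of 2l + 1 over l = n, n - 2, ..., down to 0 or 1."""
    return sum(2 * l + 1 for l in range(n, -1, -2))


for n in range(8):
    closed_form = (n + 1) * (n + 2) // 2
    assert count_cartesian(n) == count_spherical(n) == closed_form
    print(n, closed_form)
```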
It has been possible to work out the energy eigenvalues and their degeneracies
here (as Heisenberg did in 1925) without examining the form of these wave
functions as functions of the nucleon coordinates, but it will help to make our
discussion more concrete if we take a moment to look at these wave functions.
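By using Eq. (6.3.3), the defining equation (6.3.10) for the wave function of the state of
minimum energy can be written explicitly as a first-order differential equation
$$\left[\sqrt{\frac{\hbar}{2m_N\omega}}\,\nabla + \sqrt{\frac{m_N\omega}{2\hbar}}\,\mathbf{x}\right]\psi_0(\mathbf{x}) = 0 . \tag{6.3.17}$$
The solution (with arbitrary normalization) is
$$\psi_0(\mathbf{x}) = \exp\left(-\frac{m_N\omega}{2\hbar}\,\mathbf{x}^2\right) . \tag{6.3.18}$$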
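The wave functions $\psi^m_{n,\ell}(\mathbf{x})$ can be found using Eq. (6.3.15), with ψ0(x) given
by Eq. (6.3.18), and with $\mathbf{a}^\dagger$ replaced with the differential operator
$$\mathbf{a}^\dagger = \frac{-i\hbar}{\sqrt{2m_N\omega\hbar}}\,\nabla + i\sqrt{\frac{m_N\omega}{2\hbar}}\,\mathbf{x} .$$
For instance,
$$\psi^m_{1,1}(\mathbf{x}) \propto |\mathbf{x}|\, Y_1^m(\hat{\mathbf{x}})\, \exp\left(-\frac{m_N\omega}{2\hbar}\,\mathbf{x}^2\right) .$$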
Taking into account the two spin states of a nucleon, the actual degeneracy of
the energy level E = E0 + nh̄ω is twice the quantity (6.3.16), or
(n + 1)(n + 2) = 2, 6, 12, 20, 30, 42, . . .
This leads to the expectation that the protons or neutrons in a nucleus would all
form closed shells if the number of protons or of neutrons were equal to
2, 2 + 6 = 8, 8 + 12 = 20, etc.
These are the so-called magic numbers of nuclear physics,14 analogous to the
atomic numbers 2, 10, 18, etc., of the noble gases in atomic physics. We expect
nuclei with a magic number of protons or neutrons to be more deeply bound
and hence more abundant than other nuclei with similar numbers of neutrons
and protons. A nucleus is likely to be particularly deeply bound if it is doubly
magic, with a magic number of both protons and neutrons. Indeed, the lightest
doubly magic nuclei are $^{4}$He, $^{16}$O, and $^{40}$Ca, which are more tightly bound and
abundant than other nuclei of similar weight.
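The shell closures of the pure oscillator model are just the running totals of the level degeneracies (n + 1)(n + 2). A minimal sketch (the function name is an illustrative choice):

```python
from itertools import accumulate


def oscillator_shell_closures(n_levels):
    """Cumulative nucleon numbers that fill the first n_levels oscillator levels."""
    sizes = [(n + 1) * (n + 2) for n in range(n_levels)]  # spin included
    return list(accumulate(sizes))


print(oscillator_shell_closures(6))  # [2, 8, 20, 40, 70, 112]
```

Only the first three totals coincide with observed magic numbers; the later ones are what the unmodified harmonic oscillator approximation would predict.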
One might expect the magic number following 20 to be 20+20 = 40, but this
is not the case. The degenerate multiplets we found for the harmonic oscillator
begin, for heavier nuclei, to be split in energy, both by the interaction of the
spin and orbital angular momenta of the nucleons and from the breakdown of
the harmonic oscillator approximation (6.3.1) as nucleons in high energy levels
spend increasing time away from the nuclear center. In particular, there is a
term in the Hamiltonian for each nucleon proportional to S · L with a large
negative coefficient, which for each n lowers the energy of the single-nucleon
state with the largest orbital angular momentum ℓ = n and largest total angular
momentum, j = ℓ + 1/2 = n + 1/2, below the energies of other single-nucleon
states with the same n.
14 M. Goeppert-Mayer and J. H. D. Jensen, Elementary Theory of Nuclear Shell Structure (Wiley, New York,
1955).
Without these corrections, the n = 3 energy level would have 20 degenerate
states with ℓ = 3 and ℓ = 1, but these corrections lower the eight f7/2 states
below the other 12 states, so the magic number following 20 is not 40, but
20 + 8 = 28. The element with 28 protons is nickel, which is known to be pro-
duced abundantly by nuclear reactions occurring in core-collapse supernovae.
The most abundant isotope of nickel is not the doubly magic $^{56}$Ni; this isotope
is less abundant than either $^{58}$Ni or $^{60}$Ni, which have a magic number
only of protons. This is because the negative nuclear potential energy of the
additional neutrons is needed to compensate for the Coulomb repulsion of the
28 protons. Even so, as noted in Section 3.4, the deep binding of nickel isotopes
makes nickel an exception to the rule that atomic weight steadily increases with
atomic number.
The same pattern repeats for larger nucleon numbers. The next shell has
nucleons in the 20 − 8 = 12 states with n = 3 and j < 7/2, and in the 10
n = 4 states with ℓ = 4 and j = ℓ + 1/2 = 9/2, giving a magic number
28 + 12 + 10 = 50. The next shell has nucleons in the 30 − 10 = 20 states
with n = 4 and j < 9/2, and in the 12 n = 5 states with ℓ = 5 and
j = ℓ + 1/2 = 11/2, giving a magic number 50 + 20 + 12 = 82. Finally,
the next shell has nucleons in the 42 − 12 = 30 states with n = 5 and j < 11/2,
and in the 14 n = 6 states with ℓ = 6 and j = ℓ + 1/2 = 13/2, giving a magic
number 82 + 30 + 14 = 126. Thus the complete list of magic numbers is
2, 8, 20, 28, 50, 82, 126 .
The only stable doubly magic nucleus heavier than calcium 40 is lead 208.
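The bookkeeping in the last few paragraphs can be condensed into a few lines. The sketch below encodes only the counting rule described in the text: levels n = 0, 1, 2 stay intact, and for n ≥ 3 the 2n + 2 states with ℓ = n and j = n + 1/2 drop into the shell below. The function names are illustrative, and no attempt is made to model the actual energies.

```python
from itertools import accumulate


def level_size(n):
    """States in oscillator level n, counting both spin states: (n+1)(n+2)."""
    return (n + 1) * (n + 2)


def intruder_size(n):
    """States with l = n and j = n + 1/2, of which there are 2j + 1 = 2n + 2."""
    return 2 * n + 2


def shell_sizes(n_max=6):
    # Levels n = 0, 1, 2 are taken to remain intact shells.
    sizes = [level_size(n) for n in range(3)]
    # The j = 7/2 states of n = 3 form a shell of their own.
    sizes.append(intruder_size(3))
    # Each later shell: the remainder of level n plus the lowered states of n + 1.
    for n in range(3, n_max):
        sizes.append(level_size(n) - intruder_size(n) + intruder_size(n + 1))
    return sizes


magic = list(accumulate(shell_sizes()))
print(magic)  # [2, 8, 20, 28, 50, 82, 126]
```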
6.4 Alpha Decay
As we saw in Section 3.3, in the first decade of the twentieth century Rutherford
and his collaborators were able to distinguish two kinds of radioactivity. One
was beta decay, the subject of Section 6.5. The other was alpha decay, the
emission of a charged alpha particle, soon identified as a helium 4 nucleus.
These alpha particles furnished Rutherford with a probe of atomic structure,
with which he discovered the nucleus of the atom.
Alpha decay has the remarkable feature that to get out of the nucleus the
alpha particle must pass through a potential barrier that according to classical
physics it cannot inhabit, because the potential energy there is greater than the
total energy of the alpha particle. Only because of the wave nature of particles
in quantum mechanics is it possible for the alpha particle to leak through the
barrier. The presence of this barrier gives the rate of alpha decay an extreme
sensitivity to the energy of the emitted alpha particle and the radius of the nu-
cleus. Similar Coulomb barriers govern the rate of spontaneous nuclear fission
and of nuclear reactions in stars.
We will assume spherical symmetry, and to avoid mathematical complications
consider only s-wave (ℓ = 0) decays, which are the most common. The
Schrödinger equation (5.2.19) for the radial wave function RE(r) with alpha particle
energy E and ℓ = 0 takes the form
$$-\frac{\hbar^2}{2m_\alpha}\frac{1}{r^2}\frac{d}{dr}\left(r^2\frac{dR_E(r)}{dr}\right) + V(r)R_E(r) = E R_E(r) , \tag{6.4.1}$$
where V (r) is taken to include both the Coulomb repulsion and the nuclear
attraction between the alpha particle and the rest of the nucleus. We take E > 0,
so that it is energetically possible for the alpha particle to exist far from the
nucleus. It proves very convenient to write this instead as a differential equation
for the reduced wave function $u_E(r) \equiv r R_E(r)$:
$$-\frac{\hbar^2}{2m_\alpha}\frac{d^2}{dr^2}u_E(r) + V(r)u_E(r) = E u_E(r) . \tag{6.4.2}$$
As we saw in Section 5.2, the boundary condition for general orbital angular
momentum ℓ is that, for r → 0, RE(r) is proportional to $r^\ell$ and hence
uE(r) is proportional to $r^{\ell+1}$, so for ℓ = 0 the condition is that uE(r) ∝ r for
r → 0.
It is assumed that for r less than the nuclear radius R the potential V (r) is
dominated by the nuclear attraction, which gives it negative values. For r greater
than R the nuclear attraction is presumed to be ineffective, so V (r) becomes
positive:
$$V(r) = \frac{2Ze^2}{r} \quad \text{for } r > R , \tag{6.4.3}$$
where Ze is the electric charge of the final nucleus. We assume that for some
range of r greater than R, this potential is greater than E. This is the region that
classically cannot be inhabited by the alpha particle. (See Fig. 6.1.)
To see how the wave function behaves in this region, it is convenient to rewrite
Eq. (6.4.2) for r > R as
$$\frac{d^2}{dr^2}u_E(r) = \kappa_E^2(r)\,u_E(r) , \tag{6.4.4}$$
where κE(r) can be taken as the positive square root,
$$\kappa_E(r) = +\sqrt{\frac{2m_\alpha}{\hbar^2}\big(V(r) - E\big)} . \tag{6.4.5}$$
Figure 6.1 An example of the potential V(r) felt by an alpha particle at a
distance r from the nuclear center. [Plot of V(r) against r, with the energy E and the radii R and bE marked.]
We note that if V(r) and hence κE(r) were independent of r, then Eq. (6.4.4)
would have solutions proportional to exp(±κE r). It therefore may be guessed
that if κE(r) varies sufficiently slowly with r then the wave function within the
barrier takes the approximate form
$$u_E(r) = C_+(E)A_{E+}(r)\exp\left(+\int_R^r \kappa_E(r')\,dr'\right) + C_-(E)A_{E-}(r)\exp\left(-\int_R^r \kappa_E(r')\,dr'\right) , \tag{6.4.6}$$
where the amplitudes AE± (r) vary more slowly than the exponentials, and
the C± (E) are r-independent factors determined by the conditions that the
values and first derivatives of uE(r) are continuous at the nuclear radius R.
The appendix to this section shows that $A_{E+}(r) = A_{E-}(r) = 1/\sqrt{\kappa_E(r)}$,
and describes the conditions on κE (r) under which Eq. (6.4.6) is a good
approximation.
Now, if the barrier extended to infinity with V (r) > E then the only
allowed values of energy would be those for which the growing exponential
term in Eq. (6.4.6) was absent, which would require that E takes a value where
C+ (E) = 0. These would be the energies of the true bound states of the alpha
particle in the nucleus. In fact, V (r) falls to the value E at a radial coordinate
r = bE :
$$b_E = 2Ze^2/E \tag{6.4.7}$$
and V (r) < E for r > bE . The condition C+ (E) = 0 picks out the energies
of unstable states, for which the wave function becomes exponentially small
outside the barrier, though not zero.
For instance, if V(r) in the nucleus were a negative constant −V0, then the
general solution of Eq. (6.4.2) for r < R would be a linear combination of
sin qr and cos qr, where
$$q \equiv +\frac{1}{\hbar}\sqrt{2m_\alpha(E + V_0)} . \tag{6.4.8}$$
The boundary condition that uE (r) ∝ r for r → 0 tells us that (with an arbitrary
normalization) the physical solution for r < R is
uE (r) = sin qr . (6.4.9)
In this case the continuity at R of the values and first derivatives of the wave
functions (6.4.6) and (6.4.9) (with $A_{E\pm}(r) = 1/\sqrt{\kappa_E(r)}$ assumed to vary much
more slowly than the exponentials) gives
$$\frac{1}{\sqrt{\kappa_E(R)}}\,[C_+ + C_-] = \sin qR , \qquad \sqrt{\kappa_E(R)}\,[C_+ - C_-] = q\cos qR ,$$
and therefore, for a constant potential in the nucleus,
$$C_\pm(E) \simeq \frac{\sqrt{\kappa_E(R)}}{2}\left[\sin qR \pm \frac{q}{\kappa_E(R)}\cos qR\right] . \tag{6.4.10}$$
The condition C+(E) = 0 requires that tan qR = −q/κE(R). For a very deep
potential well, with κE(R) much less than $\sqrt{2m_\alpha V_0}/\hbar$ and hence much less
than q, the unstable state with lowest energy has q slightly greater than π/2R.
At a value of E where C+(E) = 0, the wave function outside the barrier is
suppressed by a factor exp(−G(E)), where
$$G(E) = \int_R^{b_E}\kappa_E(r)\,dr = \sqrt{\frac{4m_\alpha Ze^2}{\hbar^2}}\int_R^{b_E}\sqrt{\frac{1}{r} - \frac{1}{b_E}}\;dr = \sqrt{\frac{4m_\alpha Ze^2 b_E}{\hbar^2}}\; f(R/b_E) = \frac{4Ze^2}{\hbar v_\alpha}\, f(R/b_E) , \tag{6.4.11}$$
where
$$f(x) \equiv \int_x^1 \sqrt{\frac{1}{z} - 1}\;dz = \frac{\pi}{2} - \sqrt{x(1-x)} - \arcsin\sqrt{x} , \tag{6.4.12}$$
and $v_\alpha = \sqrt{2E/m_\alpha}$ is the velocity of the alpha particle when it escapes far
from the nucleus. At the energy of an unstable state, where C+(E) = 0, the
probability density $|R_E(b_E)|^2$ at the outer radius of the barrier is suppressed by
a factor of order exp(−2G(E)).
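The closed form in Eq. (6.4.12) can be checked against a direct numerical evaluation of the integral. The sketch below uses a plain midpoint rule; the function names and the number of steps are arbitrary choices made for illustration.

```python
import math


def f_closed(x):
    """Closed form of Eq. (6.4.12)."""
    return math.pi / 2 - math.sqrt(x * (1 - x)) - math.asin(math.sqrt(x))


def f_numeric(x, steps=200_000):
    """Midpoint-rule estimate of the integral of sqrt(1/z - 1) from x to 1."""
    h = (1.0 - x) / steps
    total = 0.0
    for i in range(steps):
        z = x + (i + 0.5) * h  # midpoints stay strictly below z = 1
        total += math.sqrt(1.0 / z - 1.0)
    return h * total


for x in (0.05, 0.19, 0.5):
    print(x, f_closed(x), f_numeric(x))
```

The two columns should agree to several decimal places, and f(x) falls from π/2 at x = 0 to zero at x = 1, so a thicker barrier (smaller R/bE) gives a larger exponent G(E).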
In the earliest successful theory of alpha decay,15 this factor was interpreted
as the probability that an alpha particle coming out of the nucleus would penetrate
the Coulomb barrier. That is, the rate Γα of alpha decay was presumed to
take the form
$$\Gamma_\alpha = \nu \exp(-2G(E)) , \tag{6.4.13}$$
where ν is some sort of rate factor that reflects conditions within the nucleus.
The factor ν is commonly estimated as the rate ν ≈ V/R at which alpha
particles inside the nucleus classically would strike the nuclear surface, where V
is a typical alpha particle velocity inside the nucleus and R is the nuclear radius.
As we have seen, for a very deep potential well the alpha particle wave number
inside the nucleus is close to π/2R, so $V/R \approx \hbar\pi/2m_\alpha R^2$, which for a large
nucleus with $R \approx 9 \times 10^{-13}$ cm is $3 \times 10^{20}$ sec$^{-1}$. The rate factor ν is usually
quoted as $10^{21}$ sec$^{-1}$.
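The estimate of V/R quoted above is a one-line calculation. In this sketch the CGS values of ħ and the alpha particle mass are assumed inputs, not taken from the text, and the function name is illustrative:

```python
import math

HBAR = 1.0546e-27     # erg sec (assumed value)
M_ALPHA = 6.6447e-24  # g, alpha particle mass (assumed value)


def attempt_rate(R_cm):
    """V/R = hbar*pi / (2 m_alpha R^2) for wave number q = pi/(2R) inside the well."""
    return HBAR * math.pi / (2.0 * M_ALPHA * R_cm ** 2)


print(f"{attempt_rate(9e-13):.2e} per sec")
```

For R = 9 × 10^−13 cm this comes out close to 3 × 10^20 sec^−1, in line with the figure in the text.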
This is sometimes expressed in terms of the spacing of energy levels. For a
flat deep nuclear potential with q ≫ κE(R), the energy levels of unstable states
where C+(E) vanishes are at qR ≃ (n + 1/2)π with n = 0, 1, 2, . . . , so that
their wave numbers are spaced by Δq = π/R. The spacing D in energy is then
D ≃ (dE/dq)Δq = πħV/R, so V/R ≃ D/πħ.16
The appendix to this section gives a thoroughly quantum-mechanical deriva-
tion of the decay rate that dispenses with the semi-classical picture of an alpha
particle in the nucleus striking the nuclear surface and occasionally leaking
through. The rate of decay of an unstable state with energy E1 is found to be
given by Eq. (6.4.54):
$$\Gamma_\alpha = \left|\frac{C_-(E_1)}{\hbar\, C_+'(E_1)}\right| e^{-2G(E_1)} . \tag{6.4.14}$$
The factor multiplying $e^{-2G(E_1)}$ in Eq. (6.4.14) is of the same order of magnitude
as the rate factor V/R. For instance, for a flat nuclear potential, Eq. (6.4.10)
suggests that the derivative of C+(E) with respect to wave number is of order
$\kappa_E^{1/2}(R)R$, while C−(E1) is of order $\kappa_E^{1/2}(R)$, so $C_+'(E)/C_-(E)$ is of order
(dq/dE)R = R/ħV and the factor $C_-(E_1)/\hbar C_+'(E_1)$ in Eq. (6.4.14) is therefore
of order V/R.
It must be admitted that taking the rate factor ν as $|C_-(E_1)/\hbar C_+'(E_1)|$ instead
of V/R or D/ħπ is not very important, because none of these estimates take into
account the probability that an alpha particle will somehow become detached
inside the nucleus from the rest of the nucleus. But at least Eq. (6.4.14) is a
precise statement (for thick barriers with a slowly varying potential) of the rate
at which an alpha particle that has become detached inside the nucleus will
escape, and it does not depend on semi-classical hand-waving.
15 G. Gamow, Zeit. f. Physik 52, 510 (1929); E. U. Condon and R. W. Gurney, Phys. Rev. 33, 127 (1929).
16 The rate factor ν multiplying exp(−2G) is sometimes instead estimated as ν ≃ D/2πħ; for instance,
see J. M. Blatt and V. F. Weisskopf, Theoretical Nuclear Physics (John Wiley & Sons, New York, 1952),
Section XI.2.
This theory does correctly describe the extreme sensitivity of alpha particle
decay rates to the energy and the nuclear radius, due almost entirely to the
barrier penetration factor exp(−2G(E)). In particular, without needing to worry
about the rate factor ν we can use the above results for the barrier penetration
exponent G(E) to understand the trend of the dependence of the logarithm of
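the mean lifetime τα = 1/Γα on energy. Note that for a thick barrier with
bE ≫ R, the leading and next-to-leading terms in the expansion of Eq. (6.4.11)
in powers of R/bE give
$$G(E) = \frac{4Ze^2}{\hbar v_\alpha}\left[\frac{\pi}{2} - 2\sqrt{\frac{R}{b_E}} + O\!\left(\left(\frac{R}{b_E}\right)^{3/2}\right)\right] . \tag{6.4.15}$$
Since $v_\alpha \propto \sqrt{E}$ and $b_E \propto 1/E$, we have
$$\ln\tau_\alpha \propto G(E) = \frac{\alpha}{\sqrt{E}} + \beta + O(E) , \tag{6.4.16}$$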
with α and β constant in energy. This dependence of ln τα on energy was
originally noticed in 1911 as a dependence of the alpha particle range in air
on energy, and in that form is known as the Geiger–Nuttall law.17
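The expansion (6.4.15) follows from the small-x behavior of the function f(x) of Eq. (6.4.12). A brief sketch, using only the standard Taylor series of the square root and the arcsine:

```latex
\begin{aligned}
\sqrt{x(1-x)} &= \sqrt{x}\,\Big(1 - \tfrac{1}{2}x + O(x^2)\Big) , \qquad
\arcsin\sqrt{x} = \sqrt{x}\,\Big(1 + \tfrac{1}{6}x + O(x^2)\Big) , \\
f(x) &= \frac{\pi}{2} - \sqrt{x(1-x)} - \arcsin\sqrt{x}
      = \frac{\pi}{2} - 2\sqrt{x} + \frac{1}{3}\,x^{3/2} + O(x^{5/2}) .
\end{aligned}
```

With x = R/b_E proportional to E and the prefactor 4Ze²/ħv_α proportional to E^{−1/2}, the three terms give respectively the α/√E term, the constant β, and the O(E) correction of Eq. (6.4.16).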
For a numerical example let us consider the historically important decay
process $^{226}\mathrm{Ra} \to {}^{222}\mathrm{Rn} + {}^{4}\mathrm{He}$. The nuclei $^{226}$Ra, $^{222}$Rn, and $^{4}$He all have
spin zero and even parity, so the alpha particle in this decay has ℓ = 0, as we
assumed in our calculation of G(E). The alpha particles from this decay have a
velocity $v_\alpha = 1.519 \times 10^{9}$ cm/sec, and radon has Z = 86, so here the first factor
in Eq. (6.4.11) for G(E) is $4Ze^2/\hbar v_\alpha = 49.55$. Also, $b_E = 5.18 \times 10^{-12}$ cm.
According to Eq. (6.1.1) the radius of $^{222}$Rn is approximately $7.9 \times 10^{-13}$ cm,
to which we should add the radius $2 \times 10^{-13}$ cm of $^{4}$He, and so the effective
nuclear radius here is $R \simeq 9.9 \times 10^{-13}$ cm, and $R/b_E \simeq 0.19$. The function
(6.4.12) is then f(R/bE) = 0.72. Equation (6.4.11) then gives G(E) =
35.7, and the barrier penetration probability is $\exp(-2G) \approx 10^{-31}$. If we take
$\nu \simeq 10^{21}$ sec$^{-1}$ then Eq. (6.4.13) gives a radium mean life 1/Γα of order $10^{10}$
sec. It is the smallness of the factor exp(−2G) that is responsible for the radium
226 nuclei produced in a chain of radioactive decays from uranium 238 living
long enough to be discovered in uranium ores in 1898 by Marie and Pierre
Curie. The predicted mean lifetime, of order $10^{10}$ sec, may be compared with
the measured mean life of 2300 years = $7 \times 10^{10}$ sec. The agreement, such
as it is, is somewhat accidental, because the decay rate is so sensitive to the
nuclear radius R. For instance, if we had taken the effective nuclear radius as
$R = 9.3 \times 10^{-13}$ cm instead of $R = 9.9 \times 10^{-13}$ cm, then with everything else
the same we would have found a predicted mean life of 5600 years. Indeed,
17 H. Geiger and J. M. Nuttall, Phil. Mag. 22, 613 (1911); 23, 439 (1912).
rather than using known values of R to calculate alpha decay rates of various
nuclei, the observed decay rates were historically used to estimate R. For this
purpose, it is not important to be precise about the value of the factor ν multi-
plying exp(−2G) in Eq. (6.4.13). But it is worth trying to be precise about this
in order to make sure that we understand the decay process.
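The radium estimate can be retraced numerically from Eqs. (6.4.7) and (6.4.11)–(6.4.13). This is only a sketch: the CGS constants below (ħ, e, the alpha particle mass, the length of a year) are assumed values, the function names are illustrative, and small differences from the rounded figures in the text are to be expected because the result depends exponentially on G.

```python
import math

HBAR = 1.0546e-27         # erg sec (assumed value)
E2 = (4.8032e-10) ** 2    # e^2 in erg cm, Gaussian units (assumed value)
M_ALPHA = 6.6447e-24      # g (assumed value)
YEAR = 3.156e7            # sec (assumed value)


def f(x):
    """Eq. (6.4.12)."""
    return math.pi / 2 - math.sqrt(x * (1 - x)) - math.asin(math.sqrt(x))


def gamow_exponent(Z, v_alpha, R):
    """G(E) of Eq. (6.4.11) for daughter charge Z, escape velocity v_alpha, radius R."""
    energy = 0.5 * M_ALPHA * v_alpha ** 2
    b_E = 2 * Z * E2 / energy                      # Eq. (6.4.7)
    return (4 * Z * E2 / (HBAR * v_alpha)) * f(R / b_E)


def mean_life(Z, v_alpha, R, nu=1e21):
    """Mean life 1/Gamma from Eq. (6.4.13), with rate factor nu in 1/sec."""
    return math.exp(2 * gamow_exponent(Z, v_alpha, R)) / nu


for R in (9.9e-13, 9.3e-13):
    G = gamow_exponent(86, 1.519e9, R)
    tau = mean_life(86, 1.519e9, R)
    print(f"R = {R:.1e} cm: G = {G:.1f}, mean life = {tau:.2e} sec = {tau / YEAR:.0f} yr")
```

Changing R by about 6 percent shifts the predicted mean life by roughly an order of magnitude, which illustrates why observed decay rates were used to estimate R rather than the other way around.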
Appendix: Quantum Theory of Barrier Penetration Rates
This appendix presents a thoroughly quantum-mechanical solution of a some-

what artificial problem. We consider a particle in a negative nuclear potential
well surrounded by a positive potential barrier, whose wave function is initially
confined to the nuclear potential well, and we calculate the rate at which the par-
ticle escapes to infinity, without relying on the semi-classical picture of particles
in the nucleus continually banging into the potential barrier and occasionally
leaking through. The calculations in this appendix do not depend on the detailed
form of the potential in the barrier, and so apply also to the case where ℓ ≠ 0,
where, as in Eq. (5.2.19), we include a centrifugal term $\hbar^2\ell(\ell+1)/2mr^2$ in the
potential.
Our strategy will be to assume some initial wave function for the particle,
entirely confined in the nuclear potential well, expand it in orthonormalized
solutions $u_E^{(N)}(r)$ of the Schrödinger equation for various energies E, give each
such solution a time dependence exp(−iEt/h̄), and see what happens as the
time increases.18 In the course of this calculation we will be able to give an idea
of the conditions under which the approximation (6.4.6) is valid, and find the
amplitudes AE± (r).
Our first task is to calculate the not-yet-normalized reduced wave function
inside the barrier, where it satisfies Eq. (6.4.4). This differential equation has
no general analytic solution. We again guess that if the potential varies slowly
(in a sense to be determined) then Eq. (6.4.4) has approximate solutions
$$u_E(r) = A_{E\pm}(r)\exp\left(\pm\int \kappa_E(r)\,dr\right) , \tag{6.4.17}$$
where the amplitudes $A_{E\pm}(r)$ vary more slowly than the exponentials. Before
making any approximations, Eq. (6.4.4) can be written as a differential equation
for $A_{E\pm}(r)$:
$$2\kappa_E A'_{E\pm} + \kappa'_E A_{E\pm} \pm A''_{E\pm} = 0 . \tag{6.4.18}$$
We can implement the approximation that $A_{E\pm}(r)$ varies slowly by dropping the
second derivative $A''_{E\pm}(r)$, solving Eq. (6.4.18), using the solution to calculate
$A''_{E\pm}(r)$, and checking under what conditions it may indeed be neglected. With
the term $\pm A''_{E\pm}(r)$ dropped, Eq. (6.4.18) becomes $A'_{E\pm}/A_{E\pm} = -\kappa'_E/2\kappa_E$,
which has the easy solution $A_{E\pm}(r) \propto 1/\sqrt{\kappa_E(r)}$. Then
$$\frac{A''_{E\pm}}{\kappa'_E A_{E\pm}} = -\frac{\kappa''_E}{2\kappa_E\kappa'_E} + \frac{3}{4}\frac{\kappa'_E}{\kappa_E^2} .$$
Thus $A''_{E\pm}$ is indeed negligible compared with the term $\kappa'_E A_{E\pm}$ in Eq. (6.4.18) if
$$\left|\frac{1}{\kappa_E}\frac{\kappa''_E}{\kappa'_E}\right| \ll 1 \quad \text{and} \quad \left|\frac{1}{\kappa_E}\frac{\kappa'_E}{\kappa_E}\right| \ll 1 , \tag{6.4.19}$$
which is to say that both κE(r) and κ′E(r) undergo only small fractional changes
in a distance of order 1/κE(r). Under these conditions, Eq. (6.4.4) has the two
independent approximate solutions
$$\frac{1}{\sqrt{\kappa_E(r)}}\exp\left(\pm\int \kappa_E(r)\,dr\right) .$$
This is known as the WKB approximation.19 We can write the general solution
of Eq. (6.4.4) inside the barrier as a linear combination of these solutions:
$$u_E(r) = \frac{C_+(E)}{\sqrt{\kappa_E(r)}}\exp\left(+\int_R^r \kappa_E(r')\,dr'\right) + \frac{C_-(E)}{\sqrt{\kappa_E(r)}}\exp\left(-\int_R^r \kappa_E(r')\,dr'\right) . \tag{6.4.20}$$
18 This follows the approach of E. Fermi, Nuclear Physics, lecture notes compiled by J. Orear, A. H.
Rosenfeld, and R. A. Schluter, revised ed. (University of Chicago Press, Chicago, 1950), Chapter III.
The treatment in this appendix is somewhat simplified by working throughout with continuum wave
functions, and supplies some justifications skipped over by Fermi.
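The passage from the ansatz (6.4.17) to the amplitude equation (6.4.18) takes only two differentiations. As a sketch of that intermediate step, with primes denoting derivatives with respect to r:

```latex
\begin{aligned}
u_E   &= A_{E\pm}\, e^{\pm\int\kappa_E\,dr} , \\
u_E'  &= \big(A'_{E\pm} \pm \kappa_E A_{E\pm}\big)\, e^{\pm\int\kappa_E\,dr} , \\
u_E'' &= \big(A''_{E\pm} \pm 2\kappa_E A'_{E\pm} \pm \kappa'_E A_{E\pm}
          + \kappa_E^2 A_{E\pm}\big)\, e^{\pm\int\kappa_E\,dr} .
\end{aligned}
```

Setting u_E'' equal to κ_E² u_E as required by Eq. (6.4.4) cancels the κ_E² A term, and multiplying what remains by ±1 gives Eq. (6.4.18).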
Beyond the barrier, where r > bE (with V(bE) ≡ E), it is convenient to
write the Schrödinger equation (6.4.2) for the reduced wave function as
$$\frac{d^2}{dr^2}u_E(r) = -k_E^2(r)\,u_E(r) , \tag{6.4.21}$$
where
$$k_E(r) = +\frac{1}{\hbar}\sqrt{2m_\alpha\big(E - V(r)\big)} . \tag{6.4.22}$$
Following the same arguments as before, provided that
$$\left|\frac{1}{k_E}\frac{k''_E}{k'_E}\right| \ll 1 \quad \text{and} \quad \left|\frac{1}{k_E}\frac{k'_E}{k_E}\right| \ll 1 , \tag{6.4.23}$$
we can use the WKB approximation to find solutions
$$u_E(r) \propto \frac{1}{\sqrt{k_E(r)}}\cos\left(\int k_E(r)\,dr + \vartheta\right) , \tag{6.4.24}$$
where ϑ is any angle. We have two independent solutions, given by using
Eq. (6.4.24) with ϑ taken as two different angles.
19 G. Wentzel, Zeit. f. Phys. 38, 518 (1926); H. A. Kramers, Zeit. f. Phys. 39, 828 (1926); L. Brillouin,
Compt. Rendus Acad. Sci. 183, 24 (1926).
We need to work out how each of the two independent solutions of the
Schrödinger equation inside the Coulomb barrier, for r < bE , merges with
linear combinations of the two independent solutions beyond the barrier, where
r > bE . Unfortunately we cannot do this by equating the value and derivative
of the WKB solutions for r just below and just above bE , because κE (r) and
kE (r) both vanish at r = bE , and so the conditions (6.4.19) and (6.4.23) for the
validity of the WKB approximation break down near bE . This is a well-known
problem in the use of the WKB approximation to calculate bound state energies,
but here we will encounter an additional difficulty.
We will make the reasonable assumption that V(r) − E approaches a function
proportional to bE − r for r near bE. In this case, for r > bE,
$$k_E \to \beta_E\sqrt{r - b_E} \quad \text{for } r \to b_E , \tag{6.4.25}$$
with βE a positive function of E. It is convenient to define a new independent
variable
$$\phi \equiv \int_{b_E}^r k_E(r')\,dr' \to \frac{2\beta_E}{3}(r - b_E)^{3/2} . \tag{6.4.26}$$
The Schrödinger equation (6.4.21) then takes the form
$$\frac{d^2u}{d\phi^2} + \frac{1}{3\phi}\frac{du}{d\phi} + u = 0 , \tag{6.4.27}$$
with two independent solutions
$$u \propto \phi^{1/3} J_{\pm 1/3}(\phi) , \tag{6.4.28}$$
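where $J_\nu$ is the usual Bessel function of order ν. Likewise, for r < bE we have
$$\kappa_E \to \beta_E\sqrt{b_E - r} \quad \text{for } r \to b_E , \tag{6.4.29}$$
with βE the same positive function of E as in Eq. (6.4.25). Here it is convenient
to define
$$\tilde\phi \equiv \int_r^{b_E} \kappa_E(r')\,dr' \to \frac{2\beta_E}{3}(b_E - r)^{3/2} . \tag{6.4.30}$$
The Schrödinger equation (6.4.4) then takes the form
$$\frac{d^2u}{d\tilde\phi^2} + \frac{1}{3\tilde\phi}\frac{du}{d\tilde\phi} - u = 0 , \tag{6.4.31}$$
with two independent solutions
$$u \propto \tilde\phi^{1/3} I_{\pm 1/3}(\tilde\phi) , \tag{6.4.32}$$
where here $I_\nu(\tilde\phi)$ is the Bessel function of order ν with imaginary argument:
$$I_\nu(\tilde\phi) \equiv e^{-i\pi\nu/2} J_\nu(e^{+i\pi/2}\tilde\phi) . \tag{6.4.33}$$
To see how the solutions for r > bE and r < bE merge with each other at
r = bE, we note that, for φ → 0,
$$\phi^{1/3}J_{1/3}(\phi) \to \frac{\phi^{2/3}}{2^{1/3}\,\Gamma(4/3)} , \qquad \phi^{1/3}J_{-1/3}(\phi) \to \frac{2^{1/3}}{\Gamma(2/3)} ,$$
while, for $\tilde\phi \to 0$,
$$\tilde\phi^{1/3}I_{1/3}(\tilde\phi) \to \frac{\tilde\phi^{2/3}}{2^{1/3}\,\Gamma(4/3)} , \qquad \tilde\phi^{1/3}I_{-1/3}(\tilde\phi) \to \frac{2^{1/3}}{\Gamma(2/3)} .$$
But $\phi^{2/3} \to (2\beta_E/3)^{2/3}(r - b_E)$ and $\tilde\phi^{2/3} \to (2\beta_E/3)^{2/3}(b_E - r)$, so
$$\phi^{1/3}J_{\pm 1/3}(\phi) \Longleftrightarrow \mp\tilde\phi^{1/3}I_{\pm 1/3}(\tilde\phi) , \tag{6.4.34}$$
where “⇐⇒” means “connects smoothly at r = bE.”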
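To learn from these results about the WKB solutions, we note that the conditions
(6.4.19) and (6.4.23) are satisfied if $\tilde\phi \gg 1$ and $\phi \gg 1$. As long as
the approximations (6.4.25) and (6.4.29) are still valid for these large values
of φ and $\tilde\phi$, we can take wave functions in the WKB approximations as the
asymptotic limits of the solutions (6.4.28) and (6.4.32):
$$\phi^{1/3}J_{\pm 1/3}(\phi) \to \sqrt{\frac{2}{\pi}}\,\phi^{-1/6}\cos\left(\phi \mp \frac{\pi}{6} - \frac{\pi}{4}\right) = \sqrt{\frac{2}{\pi}}\left(\frac{3\beta_E^2}{2}\right)^{1/6} k_E^{-1/2}(r)\cos\left(\int_{b_E}^r k_E(r')\,dr' \mp \frac{\pi}{6} - \frac{\pi}{4}\right) , \tag{6.4.35}$$
and
$$\tilde\phi^{1/3}I_{\pm 1/3}(\tilde\phi) \to \sqrt{\frac{1}{2\pi}}\,\tilde\phi^{-1/6}\exp\big(\tilde\phi\big) = \sqrt{\frac{1}{2\pi}}\left(\frac{3\beta_E^2}{2}\right)^{1/6} \kappa_E^{-1/2}(r)\exp\left(\int_r^{b_E}\kappa_E(r')\,dr'\right) , \tag{6.4.36}$$
but
$$\tilde\phi^{1/3}\Big[I_{+1/3}(\tilde\phi) - I_{-1/3}(\tilde\phi)\Big] \to -\sqrt{\frac{3}{2\pi}}\,\tilde\phi^{-1/6}\exp\big(-\tilde\phi\big) = -\sqrt{\frac{3}{2\pi}}\left(\frac{3\beta_E^2}{2}\right)^{1/6} \kappa_E^{-1/2}(r)\exp\left(-\int_r^{b_E}\kappa_E(r')\,dr'\right) . \tag{6.4.37}$$
From Eqs. (6.4.37), (6.4.35), and (6.4.34), we see that
$$\begin{aligned}
\kappa_E^{-1/2}(r)\exp\left(-\int_r^{b_E}\kappa_E(r')\,dr'\right)
&\Longleftrightarrow \frac{2}{\sqrt{3}}\,k_E^{-1/2}(r)\left[\cos\left(\int_{b_E}^r k_E(r')\,dr' - \frac{\pi}{6} - \frac{\pi}{4}\right) + \cos\left(\int_{b_E}^r k_E(r')\,dr' + \frac{\pi}{6} - \frac{\pi}{4}\right)\right] \\
&= 2k_E^{-1/2}(r)\cos\left(\int_{b_E}^r k_E(r')\,dr' - \frac{\pi}{4}\right) .
\end{aligned} \tag{6.4.38}$$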
To find the other connection formula we need, we now have a problem.
Inspecting Eqs. (6.4.36) and (6.4.35), how do we decide in using Eq. (6.4.34)
whether $\tilde\phi^{-1/6}\exp(\tilde\phi)$ connects smoothly with $-2\phi^{-1/6}\cos(\phi - \pi/6 - \pi/4)$
or with $+2\phi^{-1/6}\cos(\phi + \pi/6 - \pi/4)$? This puzzle arises because, lurking
under the term proportional to $\tilde\phi^{-1/6}\exp(\tilde\phi)$ in the asymptotic expansion of
$\tilde\phi^{1/3}I_{\pm 1/3}(\tilde\phi)$, there are terms with unknown coefficients that are proportional
to $\tilde\phi^{-1/6}\exp(-\tilde\phi)$ and are therefore negligible for $\tilde\phi \gg 1$ but that nevertheless,
as shown in Eq. (6.4.38), connect smoothly with a term proportional to
$\phi^{-1/6}\cos(\phi - \pi/4)$. (This is known as the Stokes phenomenon.) As we saw in
deriving Eq. (6.4.38), the difference between $-2\phi^{-1/6}\cos(\phi - \pi/6 - \pi/4)$ and
$+2\phi^{-1/6}\cos(\phi + \pi/6 - \pi/4)$ is proportional to $\phi^{-1/6}\cos(\phi - \pi/4)$, so we can
take $\tilde\phi^{-1/6}\exp(\tilde\phi)$ to connect smoothly with $-2\phi^{-1/6}\cos(\phi - \pi/6 - \pi/4)$ or
with $+2\phi^{-1/6}\cos(\phi + \pi/6 - \pi/4)$ or with their average, plus a term proportional
to $\phi^{-1/6}\cos(\phi - \pi/4)$ with a coefficient that cannot be calculated within our
present approximations. Using the average, which equals $\phi^{-1/6}\cos(\phi + \pi/4)$, we have then from Eqs. (6.4.36),
(6.4.35), and (6.4.34):
$$\kappa_E^{-1/2}(r)\exp\left(\int_r^{b_E}\kappa_E(r')\,dr'\right) \Longleftrightarrow k_E^{-1/2}(r)\left[\cos\left(\int_{b_E}^r k_E(r')\,dr' + \frac{\pi}{4}\right) + \xi(E)\cos\left(\int_{b_E}^r k_E(r')\,dr' - \frac{\pi}{4}\right)\right] , \tag{6.4.39}$$
with ξ(E) an unknown coefficient.20
We can write the exponentials in Eq. (6.4.20) as
$$\exp\left(\pm\int_R^r \kappa_E(r')\,dr'\right) = e^{\pm G(E)}\exp\left(\mp\int_r^{b_E}\kappa_E(r')\,dr'\right) ,$$
where G(E) is the barrier penetration exponent:
$$G(E) \equiv \int_R^{b_E}\kappa_E(r)\,dr , \tag{6.4.40}$$
given by Eq. (6.4.11) for a Coulomb potential with ℓ = 0. Then the wave
function uE(r) that takes the form (6.4.20) for R < r < bE takes the following
form for r > bE:
$$u_E(r) = \frac{1}{\sqrt{k_E(r)}}\left[\Big(2C_+(E)e^{G(E)} + \xi(E)C_-(E)e^{-G(E)}\Big)\cos\left(\int_{b_E}^r k_E(r')\,dr' - \frac{\pi}{4}\right) + C_-(E)e^{-G(E)}\cos\left(\int_{b_E}^r k_E(r')\,dr' + \frac{\pi}{4}\right)\right] . \tag{6.4.41}$$
Now we need to consider the normalization of the wave functions. Since the
Hamiltonian is Hermitian, and allowed values of energy E form a continuum,
we know that wave functions with different energy are orthogonal, in the sense
that
$$
\int_0^\infty u_E(r)\,u_{E'}(r)\,dr=\int_0^\infty R_E(r)\,R_{E'}(r)\,r^2\,dr=N^2(E)\,\delta(E-E'). \tag{6.4.42}
$$
The only question is, what is the coefficient $N^2(E)$? Once we know this, we can define orthonormalized wave functions
$$
u_E^{(N)}(r)\equiv N^{-1}(E)\,u_E(r), \tag{6.4.43}
$$
for which
$$
\int_0^\infty u_E^{(N)}(r)\,u_{E'}^{(N)}(r)\,dr=\delta(E-E'), \tag{6.4.44}
$$
and use these in an expansion of the time-dependent wave function.
20 Without explanation, Fermi in the reference in footnote 18 took ξ = 0. This is not justified, but as we shall see it makes no difference in the decay rate.

To find the coefficient of the delta function in Eq. (6.4.42), we can discard any term in the integral that remains finite as $E'\to E$. The singularity as $E'\to E$ in this integral comes entirely from the infinite range where r is much larger than $b_E$, so in the integral we use the asymptotic form of Eq. (6.4.41):
$$
u_E(r)\to-\frac{1}{\sqrt{k}}\Big[\Big(2C_+(E)\,e^{+G(E)}+\xi(E)\,C_-(E)\,e^{-G(E)}\Big)\cos(kr-\pi/4)
+C_-(E)\,e^{-G(E)}\cos(kr+\pi/4)\Big], \tag{6.4.45}
$$
where now k is the wave number of the free particle,
$$
k\equiv\frac{1}{\hbar}\sqrt{2m_\alpha E}. \tag{6.4.46}
$$
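To calculate the singular part of the integral (6.4.42), we insert a convergence factor $\exp(-\epsilon r)$ and consider the limit as $\epsilon\to0^+$ and $E'\to E$. A straightforward calculation gives
$$
\int_0^\infty dr\,e^{-\epsilon r}\cos\left(kr\pm\frac{\pi}{4}\right)\cos\left(k'r\pm\frac{\pi}{4}\right)
=\frac{1}{2}\left[\mp\frac{k+k'}{\epsilon^2+(k+k')^2}+\frac{\epsilon}{\epsilon^2+(k-k')^2}\right],
$$
$$
\int_0^\infty dr\,e^{-\epsilon r}\cos\left(kr\pm\frac{\pi}{4}\right)\cos\left(k'r\mp\frac{\pi}{4}\right)
=\frac{1}{2}\left[\frac{\epsilon}{\epsilon^2+(k+k')^2}\mp\frac{k-k'}{\epsilon^2+(k-k')^2}\right].
$$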
Using a well-known representation of the delta function,
$$
\frac{\epsilon}{\epsilon^2+(k-k')^2}=(\pi/2)\,\delta(k-k'),
$$
and discarding any terms that are not singular when we set $k=k'$ and then let $\epsilon$ go to zero, we have
$$
\int_0^\infty dr\,\cos(kr\pm\pi/4)\cos(k'r\pm\pi/4)=(\pi/4)\,\delta(k-k')=\frac{\pi\hbar^2k}{4m_\alpha}\,\delta(E-E'),
$$
$$
\int_0^\infty dr\,\cos(kr\pm\pi/4)\cos(k'r\mp\pi/4)=0. \tag{6.4.47}
$$
Equation (6.4.45) then gives Eq. (6.4.42), with
$$
N^2(E)=\frac{\pi\hbar^2}{4m_\alpha}\left[\Big(2C_+(E)\,e^{G(E)}+\xi(E)\,C_-(E)\,e^{-G(E)}\Big)^2+C_-^2(E)\,e^{-2G(E)}\right]. \tag{6.4.48}
$$
We have been ruthless here in discarding non-singular terms, but this is a pre-
cise result, apart from the WKB approximation, which was used in deriving
Eq. (6.4.41).
We now turn to calculating the time-dependent reduced wave function $u(r,t)$, assuming that at $t=0$ it takes the form
$$
u(r,0)=\begin{cases}u_{E_1}(r) & r<R\\ 0 & r>R,\end{cases} \tag{6.4.49}
$$
where $E_1$ is the energy of an unstable state for which $C_+(E_1)=0$. We can expand this in orthonormalized solutions of the Schrödinger equation
$$
\begin{aligned}
u(r,0)&=\int_0^\infty dE\,u_E^{(N)}(r)\int_0^\infty dr'\,u_E^{(N)}(r')\,u(r',0)\\
&=\int_0^\infty dE\,u_E^{(N)}(r)\int_0^R dr'\,u_E^{(N)}(r')\,u_{E_1}(r').
\end{aligned}
\tag{6.4.50}
$$
The time-dependent Schrödinger equation tells us that to find the wave function at any later time we must insert a factor $e^{-iEt/\hbar}$ in the integrand:
$$
\begin{aligned}
u(r,t)&=\int_0^\infty dE\,e^{-iEt/\hbar}\,u_E^{(N)}(r)\int_0^R dr'\,u_E^{(N)}(r')\,u_{E_1}(r')\\
&=\frac{4m_\alpha}{\pi\hbar^2}\int_0^\infty dE\,e^{-iEt/\hbar}\,u_E(r)\,
\frac{\int_0^R dr'\,u_E(r')\,u_{E_1}(r')}{\big[2C_+(E)\,e^{G(E)}+\xi(E)\,C_-(E)\,e^{-G(E)}\big]^2+C_-^2(E)\,e^{-2G(E)}}.
\end{aligned}
\tag{6.4.51}
$$
For $r'<R$ the wave function $u_E(r')$ is unaffected by the potential barrier, and therefore (as shown for example in Eq. (6.4.9)) varies smoothly with E. On the other hand, the term in the denominator proportional to $e^{2G(E)}$ makes the integrand very small except very near the energies of unstable states, where $C_+(E)$ vanishes. The integral is therefore dominated by values of E very near the energies $E_n$ at which $C_+(E)$ vanishes. These are the energies of nearly stable states, so the wave functions $u_{E_n}(r)$ are approximate eigenfunctions of the Hamiltonian, and therefore are approximately orthogonal, so in Eq. (6.4.51) the integral $\int_0^R dr'\,u_{E_n}(r')\,u_{E_1}(r')$ is very small for $n\neq1$. For E very near $E_1$, we can approximate $C_+(E)\to C_+'(E_1)(E-E_1)$. Since the contribution of other energies is exponentially suppressed, we can set $E=E_1$ everywhere except in the factor $E-E_1$ and the exponential, and extend the range of energy integration to run over the whole real axis, and for $r<R$ write
$$
u(r,t)\simeq\frac{4m_\alpha}{\pi\hbar^2}\,u_{E_1}(r)\int_0^R dr'\,u_{E_1}^2(r')\int_{-\infty}^{\infty}dE\,
\frac{e^{-iEt/\hbar}}{\big[2C_+'(E_1)\,e^{G(E_1)}(E-E_1)+\xi(E_1)\,C_-(E_1)\,e^{-G(E_1)}\big]^2+C_-^2(E_1)\,e^{-2G(E_1)}}.
\tag{6.4.52}
$$
For $t>0$ the contour of integration over E can be closed with a large semicircle in the lower half of the complex plane, on which $e^{-iEt/\hbar}$ is exponentially small. Since this contour is now closed clockwise, the integral is given by $-2i\pi$ times the residue of the pole at $E=\tilde{E}_1-i\,|C_-(E_1)/2C_+'(E_1)|\,e^{-2G(E_1)}$, where
$$
\tilde{E}_1=E_1-\xi(E_1)\,e^{-2G(E_1)}\,C_-(E_1)/2C_+'(E_1).
$$

The wave function in the potential well therefore goes as
$$
u(r,t)\propto e^{-i\tilde{E}_1t/\hbar}\,e^{-\Gamma_\alpha t/2}\,u_{E_1}(r), \tag{6.4.53}
$$
where
$$
\Gamma_\alpha=\left|\frac{C_-(E_1)}{\hbar\,C_+'(E_1)}\right|e^{-2G(E_1)}. \tag{6.4.54}
$$
From the square of Eq. (6.4.53), we see that $\Gamma_\alpha$ is the rate of decay of the probability density $|u(r,t)|^2$ of the alpha particle within the nucleus. The Stokes phenomenon has led to an incalculable but exponentially small shift in the oscillation frequency of the wave function, but has no effect on the decay rate. Equation (6.4.54) justifies the appearance of the suppression factor $e^{-2G(E_1)}$ in Eq. (6.4.14).
6.5 Beta Decay
The earliest studies of radioactivity revealed the existence of a distinct class of

radioactive processes, beta decay, in which an electron is emitted in a transition
between nuclear states. For instance, $^{234}$Th, which is itself a product of the
alpha decay of $^{238}$U, undergoes beta decay to $^{234}$Pa. As we have seen, at first
beta decay was taken as evidence for the view that nuclei consist of protons
and electrons, but this interpretation was abandoned with the realization in the
1930s that nuclei are composed of protons and neutrons. An electron is created
at the moment of beta decay when a neutron turns into a proton – the electron
is no more in the nucleus before it is emitted than a photon is in an atom before
it is radiated.
But there was a peculiar difference between the observed energies of the
photons emitted in atomic transitions and the electrons emitted in beta decay.
As we saw in Section 3.4, Bohr had realized in 1913 that a photon emitted
in any given atomic transition has a unique energy, given by the difference in
energies of the initial and final atomic states. Chadwick discovered in 1914
that the energies of electrons emitted in a beta transition between any specific
nuclear states do not have any one value, but occupy a range up to some def-
inite maximum. This might be explained if a photon is emitted along with the
electron, with the energy of the nuclear transition shared between the electron
and the photon in a proportion that varies from one decay event to another. The
electron energy would come close to a maximum value, equal to the energy
released in the nuclear transition, only when the photon happens to have very
low energy. If beta decay produced a photon along with the electron, then
when these decay products are caught in a surrounding medium the heat energy
given to the medium would be the same in each decay event, equal to the
energy difference of the initial and final nuclear states, and hence equal to the
maximum value observed for the electron energy in this decay. But experiments
in 1927 by C. D. Ellis (1895–1980) and W. A. Wooster (1903–1984) showed
that the average energy deposited in the medium surrounding the decaying
nucleus was not equal to the maximum energy of the electron, but instead to its
average, as if whatever energy was not carried by the electron was simply lost.
Bohr was even led by this to speculate that energy might not be conserved in
beta decays.
A different explanation was offered in 1930 by Pauli. He proposed that the
electron in beta decay is indeed accompanied by another particle that because
electrically neutral had escaped detection, but this neutral particle is not a pho-
ton. Rather, it is an extremely penetrating particle that is not captured in the
surrounding medium. The particle soon became known as a neutrino, symbolized $\nu$. The underlying reaction is $n\to p+e^-+\nu$ (where n and p stand for the neutron and proton, and the electron is denoted $e^-$, for a reason we will come to presently). Among many other examples, this is responsible for the decay of the ground state of boron 12 to carbon in the reaction $^{12}\mathrm{B}\to{}^{12}\mathrm{C}+e^-+\nu$, as well as for the decay of the free neutron, $n\to p+e^-+\nu$. Since neutrons,
protons, and electrons have spin 1/2, angular momentum conservation requires
the neutrino to have a half-integer spin. It is in fact known to have spin 1/2.
There are also radioactive decays in which instead of an electron there is emit-
ted a positron, e+ , the electron’s antiparticle, with the same mass but opposite
electric charge.21 The conservation of energy forbids the process $p\to n+e^++\nu$ for free protons, but in nuclei this process can produce decays such as the beta decay of the ground state of nitrogen 12, $^{12}\mathrm{N}\to{}^{12}\mathrm{C}+e^++\nu$.
21 The existence of the positron was anticipated in 1930 by P. A. M. Dirac, Proc. Roy. Soc. A126, 360 (1930). He had developed a relativistic version of the Schrödinger equation, which turned out to have solutions corresponding to states of negatively charged electrons with negative energy as well as states with positive energy. Dirac’s interpretation was that these negative-energy states are normally filled, one electron to each negative-energy state in accordance with the Pauli exclusion principle, but that occasionally there appears a vacancy which we observe as a particle of positive charge and positive energy. At first Dirac identified these holes as protons, but then in 1932 a positively charged particle was unexpectedly found in cosmic rays by C. D. Anderson, Phys. Rev. 43, 491 (1932). (This article is included in Beyer, Foundations of Nuclear Physics, listed in the bibliography.) The cloud chamber tracks of these particles were observed to have the same curvature in a magnetic field as electron tracks, but in the opposite direction, consistent with a particle having the same mass as an electron and a charge of equal magnitude but opposite sign. It was widely supposed that these were Dirac’s holes.
The interpretation of positrons as vacancies in a sea of negative-energy electrons has largely been abandoned. Dirac’s relativistic wave equation works only for particles of spin 1/2. This at first seemed like a triumph because protons and electrons were known to have spin 1/2, but by now we know of several particles of spin 0 and spin 1 (the $H^0$, $W^+$, $W^-$, and $Z^0$) that seem every bit as elementary as the electron. Furthermore, the $W^+$ and $W^-$ are each other’s antiparticles, in the same sense as the $e^+$ and $e^-$. But these are bosons, which do not obey the exclusion principle and so could not form a stable sea of negative-energy particles. As described in the appendix to Section 7.4, Dirac’s equation survives as the field equation satisfied by the quantum field of particles of spin 1/2 but not, as Dirac thought, as a relativistic version of a Schrödinger equation for a probability amplitude.
As explained in Section 7.4, we now understand as a consequence of Lorentz invariance and quantum mechanics that for every species of particle, elementary or not, fermion or boson, there is a corresponding species of antiparticle, with the same mass and spin but opposite electric charge. The only qualification is that a few types of electrically neutral particles like the photon and the $Z^0$ are their own antiparticles.
In 1934 Fermi proposed a detailed theory of beta decay.22 In Fermi’s theory the interaction Hamiltonian takes the form
$$
H_\beta=(\hbar c)^3\,G_F\,\eta_{\mu\nu}\int d^3x\,V^\mu V^\nu+\text{c.c.}, \tag{6.5.1}
$$
where $G_F$ is a constant; $V^\mu$ and $V^\nu$ are operators with the same Lorentz and space inversion transformation properties as the electric current $J^\mu$ and with the dimensionality of densities (that is, inverse volumes); $V^\mu$ acts to change neutrons to protons; $V^\nu$ acts to create electrons and neutrinos; and as usual c.c. indicates the adjoint of the foregoing term. The factor $(\hbar c)^3$ is extracted from $G_F$ for later convenience. As we will see in the appendix to Section 7.4 these currents are bilinear functions of Dirac fields, but we will not need that information for our limited purposes here.
22 E. Fermi, Zeit. Phys. 88, 161 (1934). This article is reprinted in Beyer, Foundations of Nuclear Physics, listed in the bibliography. In his article Fermi cited an unpublished suggestion by Pauli that a neutral weakly interacting particle was emitted along with electrons in beta decay.
Fermi’s theory almost immediately needed modification. The three-vector part of the current $V^\mu$ is odd under space inversion, so when acting on nuclear states it gives a contribution proportional to nucleon velocities $\mathbf{v}$, and so is suppressed by a factor of order $|\mathbf{v}|/c$, which is small in nuclei, as in atoms. This leaves the time component $V^0$, which is even under space inversion and is a rotational scalar. For decays that are not suppressed by a centrifugal barrier there is no orbital angular momentum, so in these decays neither the parity nor spin of the nuclear states can change. But many beta decays were observed in which the spin of the nuclear state did change by one unit, and which yet seemed to be “allowed” in the sense of having rates comparable to typical other beta decays with similar energy. For instance, the ground states of the nuclei $^{12}$B and $^{12}$N mentioned in Section 6.2 have spin one and even parity, and yet have allowed beta decays into the ground state of $^{12}$C, which has spin zero and even parity.
In order to allow such decays, Fermi’s theory was modified by adding a further term to the interaction Hamiltonian:23
$$
H_\beta=(\hbar c)^3\,G_F\,\eta_{\mu\nu}\int d^3x\,\big[V^\mu V^\nu+A^\mu A^\nu\big]+\text{c.c.}, \tag{6.5.2}
$$
where $A^\mu$ like $V^\mu$ turns neutrons into protons; $A^\nu$ like $V^\nu$ creates electrons and neutrinos; and $A^\mu$ and $A^\nu$ are axial vectors. That is, like $V^\mu$ and $V^\nu$ they transform as four-vectors under proper Lorentz transformations, but they have opposite properties under space inversion: $\mathbf{A}$ is even and $A^0$ is odd, and likewise for $A^\nu$. In consequence $\mathbf{A}$ acting on nuclear states can make contributions proportional to nucleon spin vectors, allowing beta decays in which spin changes by one unit, such as the beta decays of $^{12}$B and $^{12}$N, without suppression of the rate by factors of order $|\mathbf{v}|/c$ or centrifugal barriers. With the interaction Hamiltonian (6.5.2), the selection rules for “allowed” beta decays are that the nuclear parity does not change and that the nuclear spin can change by at most one unit.
To estimate these “allowed” rates, we note that in order for $H_\beta$ to have the dimensionality of energy, the constant $(\hbar c)^3G_F$ must have the dimensions of energy times volume. Hence $G_F$ has the convenient (and conventional) dimensionality of energy$^{-2}$, which is why the factor $(\hbar c)^3$ was inserted in Eqs. (6.5.1) and (6.5.2). The rate $\Gamma_\beta$ of any beta decay process is proportional to $G_F^2$, and if the energy E released in the decay is much larger than $m_ec^2=0.511$ MeV then apart from the factor $G_F^2$ it can only depend on E, so in order for it to have the dimensionality of a rate it must take the form
$$
\Gamma_\beta\approx\frac{1}{\hbar}\,G_F^2\,E^5. \tag{6.5.3}
$$
This $E^5$ dependence is observed for high-energy beta decays that satisfy the selection rules for “allowed” beta decays. The energy $G_F^{-1/2}$ turns out to be very large. For instance, as we saw in Section 6.2, the energy released in the beta decay of $^{12}$B to the ground state of $^{12}$C is $0.0144\,m_pc^2=13.4$ MeV, much larger than $m_ec^2$. The rate is 48.5 sec$^{-1}$. Using these numbers in Eq. (6.5.3) gives $G_F^{-1/2}\approx2\times10^3$ GeV. (This energy is large in part because weak interactions are transmitted by a heavy particle, the $W^\pm$ particle with mass 80.4 GeV/$c^2$, and in part because the interactions that emit and absorb $W^\pm$ particles are characterized by a small constant, of the same order as the fine-structure constant $e^2/\hbar c\simeq1/137$ of electrodynamics, which gives $G_F^{-1/2}\approx m_Wc^2\times\sqrt{137}\approx10^3$ GeV. A more accurate value along with a more precise definition for $G_F$ will be given in the appendix to Section 7.4.)
23 G. Gamow and E. Teller, Phys. Rev. 49, 895 (1936). There were other possibilities involving scalar and tensor operators that were not finally excluded by experimental data until the 1950s.
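The arithmetic behind the estimate from Eq. (6.5.3) can be checked with a short script. This is only a sketch of the dimensional estimate: the function name is hypothetical, the value of ħ in MeV·s is supplied here as an assumption rather than taken from the text, and the result is good only to the order-of-magnitude accuracy of Eq. (6.5.3) itself.

```python
import math

# Reduced Planck constant in MeV*s (standard value, supplied here as an assumption).
HBAR_MEV_S = 6.582e-22


def fermi_energy_scale_gev(rate_per_s, energy_mev):
    """Invert Gamma ~ G_F^2 E^5 / hbar and return G_F^(-1/2) in GeV.

    Hypothetical helper for illustration; it has only the accuracy of the
    dimensional estimate it inverts.
    """
    g_f_squared = HBAR_MEV_S * rate_per_s / energy_mev**5  # units: MeV^-4
    g_f = math.sqrt(g_f_squared)                           # units: MeV^-2
    return g_f ** -0.5 / 1000.0                            # MeV -> GeV


# Numbers quoted in the text for the decay of boron 12 to carbon 12.
scale = fermi_energy_scale_gev(rate_per_s=48.5, energy_mev=13.4)
print(f"G_F^(-1/2) is roughly {scale:.0f} GeV")

# Rough comparison with the W-mass argument: m_W c^2 * sqrt(137).
w_estimate = 80.4 * math.sqrt(137.0)
print(f"m_W c^2 * sqrt(137) is roughly {w_estimate:.0f} GeV")
```

Both numbers come out at the level of one to two thousand GeV, which is all the agreement a dimensional argument of this kind can be expected to give.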
The extremely low rate at which neutrinos were absorbed by the medium
surrounding the radioactive nuclei in the 1927 experiments of Ellis and Wooster
is due to the extreme weakness of interactions such as beta decay that involve
neutrinos. In general the rates of neutrino interaction processes are characterized
by the presence in the rate of the factor $G_F^2$, which is what makes them so weak. For instance, the cross section for the neutrino reactions $\nu+p\to e^++n$ and $\nu+n\to e^-+p$ (whether for a free proton or a proton or neutron inside a nucleus) is proportional to $G_F^2$, and so, since it has the units of area, dimensional analysis requires that at a neutrino energy E considerably above $m_ec^2$ the cross section takes the form
$$
\sigma\approx(\hbar cE)^2\,G_F^2.
$$
Recalling that $\hbar c=197$ MeV $\times\,10^{-13}$ cm, we see that for a relatively high-energy beta decay neutrino with energy E = 10 MeV the cross section σ is of order $10^{-44}$ cm$^2$. In ordinary matter, with a number density n of nucleons of order $10^{24}$ cm$^{-3}$, this gives a mean free path $1/n\sigma\approx10^{20}$ cm, or about
100 light years. It is no wonder that Ellis and Wooster did not detect energy
deposited by neutrinos in their experiment. There never was any hope of
detecting neutrinos from ordinary laboratory samples of radioactive material,
but nuclear reactors emit such enormous floods of neutrinos from the beta
decay of fission products that at last in 1956 Clyde Cowan, Jr. (1919–1974)
and Frederick Reines (1918–1998) were able to detect neutrinos produced at
the Savannah River reactor by detecting gamma rays from the annihilation of
positrons produced in the reaction ν + p → n + e+ .
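The cross-section and mean-free-path estimates above can be reproduced numerically. The sketch below assumes the rounded value $G_F^{-1/2}\approx10^3$ GeV; the function names and the light-year conversion factor are assumptions introduced for illustration, and the output should be read only as an order of magnitude.

```python
HBARC_MEV_CM = 197e-13       # hbar*c in MeV*cm, as quoted in the text
CM_PER_LIGHT_YEAR = 9.46e17  # conversion factor (assumption supplied here)


def neutrino_cross_section_cm2(energy_mev, gf_scale_gev=1.0e3):
    """Dimensional estimate sigma ~ (hbar c E)^2 G_F^2, in cm^2.

    gf_scale_gev is G_F^(-1/2) in GeV; the default is the rounded estimate.
    """
    g_f = (gf_scale_gev * 1000.0) ** -2  # G_F in MeV^-2
    return (HBARC_MEV_CM * energy_mev) ** 2 * g_f**2


def mean_free_path_cm(sigma_cm2, number_density_cm3=1.0e24):
    """Mean free path 1/(n sigma) for nucleon number density n, in cm."""
    return 1.0 / (number_density_cm3 * sigma_cm2)


sigma = neutrino_cross_section_cm2(10.0)
path = mean_free_path_cm(sigma)
print(f"sigma ~ {sigma:.1e} cm^2")
print(f"mean free path ~ {path:.1e} cm ~ {path / CM_PER_LIGHT_YEAR:.0f} light years")
```

With these inputs the cross section comes out within a factor of a few of $10^{-44}$ cm$^2$ and the mean free path at tens of light years, consistent with the rounded figures in the text at the order-of-magnitude level.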
All rates for processes involving neutrinos are suppressed by the factor $G_F^2$,
and there are also reactions due to other weak interactions that do not involve
neutrinos but are similarly suppressed. Among these are the decays of a particle
called the K meson, with a mass of 495 MeV/$c^2$, into two-pion and three-pion
states, decays that are very slow compared with processes such as the decay
of the three–three resonance into a pion and a nucleon that occur through the
action of strong interactions.
There is another common feature of weak interaction processes, beyond their
weakness. They violate some of the symmetry principles obeyed by strong and
electromagnetic interactions. It appeared that the charged K meson decayed
both into two-pion states that are invariant under the space inversion transfor-
mation x → −x, and also into three-pion states that change sign under space
inversion, which would not be possible if the space inversion operator commutes with the Hamiltonian. It was this that led Tsung-Dao Lee (1926– ) and Chen-Ning Yang (1922– ) in 1956$^{24}$ to suggest that weak interactions in general do not respect invariance under space inversion, a suggestion that was soon verified in the beta decay$^{25}$ of $^{60}$Co and in the decays of charged pions.$^{26}$
Weak interactions also violate a symmetry between particles and antiparticles,
and they violate the conservation of several quantities (collectively known as
flavors), that are conserved by the strong and electromagnetic interactions.
It used to be thought that neutrinos have zero mass. For massless particles
the helicity, the component of angular momentum in the direction of motion,
is Lorentz invariant. In 1957 Lee and Yang proposed27 that the neutrinos emit-
ted with electrons or positrons always have helicities h̄/2 and −h̄/2, respec-
tively. This was only possible if weak interactions violate invariance under space
inversion, because space inversion transformations reverse the direction of the
neutrino’s motion while leaving its spin unchanged, and so reverse the helicity.
This proposal was incorporated in another change in the beta decay interaction that in our present schematic notation takes the form
$$
H_\beta=\frac{(\hbar c)^3\,G_F}{\sqrt{2}}\,\eta_{\mu\nu}\int d^3x\,\big[V^\mu+A^\mu\big]\big[V^\nu+A^\nu\big]+\text{c.c.}, \tag{6.5.4}
$$
in which the terms $V^\mu A_\mu$ and $A^\mu V_\mu$ evidently violate space inversion symmetry.
But Lee’s and Yang’s proposal regarding neutrino helicity could not be uni-
versally and literally true unless neutrinos were massless, because the observed
direction of motion of a massive particle is reversed if the observer travels with
higher speed in the same direction, which does not affect its spin. It is now
known that neutrinos have very small but non-zero mass, much less than the
electron mass, and so like any massive particle of spin 1/2 neutrinos exist both
in states with angular momentum components h̄/2 and −h̄/2 in any direction.
Experiment shows that the emission of an electron or positron in the beta decay
of a nucleus at rest is accompanied by the emission of a neutrino that is over-
whelmingly likely to be in a state with angular momentum component in the
direction of motion, respectively +h̄/2 or −h̄/2, as proposed by Lee and Yang,
but this would not be the case if the neutrino were viewed by a more rapidly
moving observer.
24 T. D. Lee and C. N. Yang, Phys. Rev. 104, 254 (1956).
25 C. S. Wu et al., Phys. Rev. 105, 1413 (1957).
26 R. Garwin, L. Lederman, and M. Weinrich, Phys. Rev. 105, 1415 (1957); J. Friedman and V. Telegdi, Phys. Rev. 105, 1681 (1957).
27 T. D. Lee and C. N. Yang, Phys. Rev. 105, 1671 (1957).
There is a complication regarding neutrinos that I have so far not mentioned. There are two other charged leptons, particles like the electron that have only electromagnetic and weak but not strong interactions. These are the muon, with mass 105.658 MeV/$c^2$, mentioned briefly in Section 4.3, and the more recently discovered tauon, with mass 1776.82 MeV/$c^2$. Both, like electrons, are emitted by strongly interacting particles along with neutrinos, but these neutrinos are not the same as the neutrinos emitted along with electrons or positrons in beta decay. Rather, the neutrinos emitted in beta decay and along with the production of muons and tauons are of three different types. For instance, the neutrino emitted along with a muon in the decay of a charged pion can create another muon in a reaction $\nu+n\to p+\mu^-$, but it cannot create an electron, and the neutrino emitted along with an electron in beta decay can create another electron in a reaction $\nu+n\to p+e^-$, but even if its energy were high enough it could not create a muon.
Except that, in a sense, it can. For years there was a mysterious deficiency in
the number of neutrinos observed to be coming from the Sun.28 These would
be electron-type neutrinos, created in reactions such as p + p → d + e+ + ν.
Bruno Pontecorvo (1919–1993) suggested29 that this is because neutrinos have
mass but the states with definite mass are not electron-type or muon-type or
tauon-type neutrinos. Rather, each of these is a superposition of neutrino states
of definite mass. According to this idea, the electron-type neutrinos emitted by
the Sun are superpositions of states of definite mass, which oscillate at different
rates on their way to the Earth, arriving as incoherent mixtures of neutrinos
of all three types. In the search for solar neutrinos the detectors were looking
for the reaction $\nu+{}^{37}\mathrm{Cl}\to e^-+{}^{37}\mathrm{Ar}$, and were therefore sensitive only to
electron-type neutrinos, which according to Pontecorvo is why fewer neutrinos
were detected than would have been the case if neutrinos were massless, the
undetected neutrinos arriving as muon-type or tauon-type. This hypothesis was
confirmed when it became possible to detect solar neutrinos in the reaction ν +
d → ν + p + n, which is equally sensitive to neutrinos of all three types, and
the number seen was just what was expected. The existence of neutrino masses
has by now been convincingly confirmed in numerous terrestrial experiments,
which, although they have not yielded values for individual neutrino masses,
indicate that they are in the range of 0.01 to 0.1 eV/c2 .
When neutrinos were thought to have zero mass it was common to call the particle emitted along with an electron an antineutrino, reserving the term neutrino for the particle emitted along with a positron. This was to preserve a widely accepted conservation law, of a quantity known as lepton number, analogous to baryon number. Electrons and neutrinos were supposed to have lepton number +1; positrons and antineutrinos would have lepton number −1, while protons and neutrons would have lepton number zero, so that lepton number would be conserved in both kinds of beta decay. But it is not possible to attribute different values for lepton number or any other conserved quantity to the neutral particles emitted with electrons or positrons in beta decay if they are just different spin states of the same particle.
28 J. N. Bahcall, Phys. Rev. Lett. 12, 300 (1964); Phys. Rev. 135, B137 (1964); R. Davis, Jr., Phys. Rev. Lett. 12, 303 (1964); R. Davis, Jr., D. S. Harmer, and K. C. Hoffman, Phys. Rev. Lett. 20, 1205 (1968).
29 B. Pontecorvo, JETP 53, 1717 (1967).
Now that we know that neutrinos have mass, there are two widely considered
points of view regarding the nature of neutrinos and of lepton number and the
origin of neutrino masses.
First, in order to preserve the exact conservation of lepton number it would be
necessary to suppose that the neutrino fields of electron-type, muon-type, and
tauon-type with lepton number −1 each have distinct adjoints with lepton num-
ber +1. In this view it is the states of helicity +h̄/2 of the field with lepton
number −1 that have been observed to be emitted with electrons in beta decay,
and it is the states of helicity −h̄/2 of the adjoint field with lepton number +1
that have been observed to be emitted with positrons, while the other helicity
states of the two fields of each type exist but are so far unobserved. Neutrinos
of this description are often called Dirac neutrinos, because their fields are
described in the same way as in the description of electrons by Dirac, discussed
in the appendix to Section 7.4.
The other possibility, often associated with the name of Ettore Majorana
(1906–1938), is that lepton number is not conserved, and the three types of
neutral particles emitted with negative and positive leptons are states of the same
three spin 1/2 particles, which as a consequence of Eq. (6.5.4) are overwhelm-
ingly likely to be emitted with helicity h̄/2 when emitted with e− , μ− , τ − and
with helicity −h̄/2 when emitted with e+ , μ+ , τ + . We can then regard these
three neutral particles as their own antiparticles, like the photon or the π 0 .
For what it is worth, the Majorana alternative seems to me a more econom-
ical and plausible view, which is why in this section I have not distinguished
neutrinos and antineutrinos. In the Dirac case neutrinos get masses in much the
same way as the other leptons and the quarks of the Standard Model, so it is
mysterious why they are so light compared with other elementary particles. On
the other hand, the masses of Majorana neutrinos can only arise from effects at
very high energy, and are naturally in the observed range.
Fermi’s theory correctly described the probability distribution for the
energy of the electron or positron emitted in beta decay, a distribution that
was unaffected by the subsequent modifications in the interaction Hamiltonian
described above. With these modifications Fermi’s theory has survived as a cor-
rect approximate theory for nuclear beta decay. It was in fact the first successful
application of quantum field theory outside the context of electrodynamics.
7
Quantum Field Theory
Chapter 5 described quantum mechanics in the context of particles moving in

a potential. This application of quantum mechanics led to great advances in
the 1920s and 1930s in our understanding of atoms, molecules, and much else.
But starting around 1930, and increasingly since then, theoretical physicists
have become aware of a deeper description of matter, in terms of fields.
Just as Einstein and others had much earlier recognized that the energy and
momentum of the electromagnetic field is packaged in bundles, the particles
later called photons, so also there is an electron field whose energy and
momentum are packaged in particles, observed as electrons, and likewise for
every other sort of elementary particle. Indeed, in practice this is what we now
mean by an elementary particle: it is the quantum of some field, which appears
as an ingredient in whatever seem to be the fundamental equations of physics
at any stage in our progress.
This is a good place to warn of an old misunderstanding. It used to be thought
by some theorists (perhaps de Broglie) that the wave function of a particle
is a field, something like the electromagnetic field. Just as the creation and
annihilation of photons was seen as a consequence of the application of quantum
mechanics to the electromagnetic field, some theorists came to think that the
creation and annihilation of electrons and other particles could be understood
through the application of quantum mechanics to the wave function itself, a pro-
cess known as second quantization. This does not work. The electromagnetic
field cannot be interpreted like a wave function as a probability amplitude,
and the Schrödinger wave function does not have the Lorentz transformation
property of a scalar field. The wave function is not a field – it is a representation
of a physical state. As discussed in Section 5.10, it is the component of the
state vector in some basis, such as one labeled by the possible positions of a
particle. Even though it is not generally useful to do so, we can also introduce
wave functions for fields – they are functionals of the field, quantities that
depend on the value taken by the field at every point in space, equal to the
component of the state vector in a basis labeled by these field values. One still
sometimes hears talk of second quantization, but this idea is an obsolete
historical relic.
7.1 Canonical Formalism for Fields
We begin by restating the canonical formalism described in Section 5.7, now in

the context of fields. Here the qN (t) are fields ϕn (x, t), with sums over the label
N now comprising both sums over the discrete label n which distinguishes one
type of field from another and integrals over the spatial argument x. In order
to have any chance of a Lorentz-invariant theory, the action here must take the
form of an integral over spacetime of a function of space derivatives as well as
time derivatives of the fields
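$$
I[\varphi]=\int d^3x\int_{-\infty}^{+\infty}dt\;\mathcal{L}\big(\varphi_n(\mathbf{x},t),\nabla\varphi_n(\mathbf{x},t),\dot\varphi_n(\mathbf{x},t)\big). \tag{7.1.1}
$$
The function $\mathcal{L}$ is known as the Lagrangian density. Comparing this with Eq. (5.7.11), we see that the Lagrangian here is
$$
L[\varphi(t),\dot\varphi(t)]=\int d^3x\;\mathcal{L}\big(\varphi_n(\mathbf{x},t),\nabla\varphi_n(\mathbf{x},t),\dot\varphi_n(\mathbf{x},t)\big). \tag{7.1.2}
$$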
This Lagrangian is a functional rather than a function of ϕn (x, t) and ϕ̇n (x, t);
that is, it depends on the values of ϕn (x, t) and ϕ̇n (x, t) for all n and x at a given
time t. Therefore, where derivatives of L appear in the canonical formalism as
described in Section 5.7, they should now be interpreted as functional deriva-
tives. In general, the functional derivatives δF /δϕ and δF /δ ϕ̇ of any functional
F of ϕn (x, t) and ϕ̇n (x, t) at a fixed time t are defined by the prescription that the
effect of independent infinitesimal variations in the arguments of the functional
is given by
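$$
F[\varphi(t)+\delta\varphi(t),\dot\varphi(t)+\delta\dot\varphi(t)]
\equiv F[\varphi(t),\dot\varphi(t)]+\sum_n\int d^3x\left[\frac{\delta F[\varphi(t),\dot\varphi(t)]}{\delta\varphi_n(\mathbf{x},t)}\,\delta\varphi_n(\mathbf{x},t)
+\frac{\delta F[\varphi(t),\dot\varphi(t)]}{\delta\dot\varphi_n(\mathbf{x},t)}\,\delta\dot\varphi_n(\mathbf{x},t)\right]. \tag{7.1.3}
$$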
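For the particular functional (7.1.2), we have
$$
\begin{aligned}
L[\varphi(t)+\delta\varphi(t),\dot\varphi(t)+\delta\dot\varphi(t)]
&=\int d^3x\;\mathcal{L}\big(\varphi_n(\mathbf{x},t)+\delta\varphi_n(\mathbf{x},t),\ \nabla\varphi_n(\mathbf{x},t)+\nabla\delta\varphi_n(\mathbf{x},t),\ \dot\varphi_n(\mathbf{x},t)+\delta\dot\varphi_n(\mathbf{x},t)\big)\\
&=L[\varphi(t),\dot\varphi(t)]+\sum_n\int d^3x\,\Bigg[\frac{\partial\mathcal{L}(\varphi_n(\mathbf{x},t),\nabla\varphi_n(\mathbf{x},t),\dot\varphi_n(\mathbf{x},t))}{\partial\varphi_n(\mathbf{x},t)}\,\delta\varphi_n(\mathbf{x},t)\\
&\qquad+\frac{\partial\mathcal{L}(\varphi_n(\mathbf{x},t),\nabla\varphi_n(\mathbf{x},t),\dot\varphi_n(\mathbf{x},t))}{\partial(\partial\varphi_n(\mathbf{x},t)/\partial x_i)}\,\delta\big(\partial\varphi_n(\mathbf{x},t)/\partial x_i\big)
+\frac{\partial\mathcal{L}(\varphi_n(\mathbf{x},t),\nabla\varphi_n(\mathbf{x},t),\dot\varphi_n(\mathbf{x},t))}{\partial\dot\varphi_n(\mathbf{x},t)}\,\delta\dot\varphi_n(\mathbf{x},t)\Bigg]\\
&=L[\varphi(t),\dot\varphi(t)]+\sum_n\int d^3x\,\Bigg[\frac{\partial\mathcal{L}(\varphi_n(\mathbf{x},t),\nabla\varphi_n(\mathbf{x},t),\dot\varphi_n(\mathbf{x},t))}{\partial\varphi_n(\mathbf{x},t)}\,\delta\varphi_n(\mathbf{x},t)\\
&\qquad-\left(\frac{\partial}{\partial x_i}\frac{\partial\mathcal{L}(\varphi_n(\mathbf{x},t),\nabla\varphi_n(\mathbf{x},t),\dot\varphi_n(\mathbf{x},t))}{\partial(\partial\varphi_n(\mathbf{x},t)/\partial x_i)}\right)\delta\varphi_n(\mathbf{x},t)
+\frac{\partial\mathcal{L}(\varphi_n(\mathbf{x},t),\nabla\varphi_n(\mathbf{x},t),\dot\varphi_n(\mathbf{x},t))}{\partial\dot\varphi_n(\mathbf{x},t)}\,\delta\dot\varphi_n(\mathbf{x},t)\Bigg],
\end{aligned}
$$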
where as usual a repeated index i is summed over the values 1, 2, 3. Comparing
this with the definition (7.1.3), we have

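$$ \frac{\delta L}{\delta\phi_n(\mathbf{x},t)} = \frac{\partial\mathcal{L}}{\partial\phi_n(\mathbf{x},t)} - \frac{\partial}{\partial x_i} \frac{\partial\mathcal{L}}{\partial(\partial\phi_n(\mathbf{x},t)/\partial x_i)} , \tag{7.1.4} $$
$$ \frac{\delta L}{\delta\dot\phi_n(\mathbf{x},t)} = \frac{\partial\mathcal{L}}{\partial\dot\phi_n(\mathbf{x},t)} , \tag{7.1.5} $$
in which to save writing we have dropped the arguments of $L$ and $\mathcal{L}$.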
Field Equations
We take the derivatives of the Lagrangian in the equations of motion (5.7.12) to
be functional derivatives:

$$ \frac{\partial}{\partial t} \frac{\delta L}{\delta\dot\phi_n(\mathbf{x},t)} = \frac{\delta L}{\delta\phi_n(\mathbf{x},t)} . \tag{7.1.6} $$
Equations (7.1.4) and (7.1.5) then give the field equations

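$$ \frac{\partial}{\partial t} \frac{\partial\mathcal{L}}{\partial\dot\phi_n(\mathbf{x},t)} = \frac{\partial\mathcal{L}}{\partial\phi_n(\mathbf{x},t)} - \frac{\partial}{\partial x_i} \frac{\partial\mathcal{L}}{\partial(\partial\phi_n(\mathbf{x},t)/\partial x_i)} . \tag{7.1.7} $$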
These are known as the Euler–Lagrange equations. We can put Eq. (7.1.7) into
a form that appears more consistent with Lorentz invariance:

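$$ \frac{\partial\mathcal{L}}{\partial\phi_n(\mathbf{x},t)} = \frac{\partial}{\partial x^\mu} \frac{\partial\mathcal{L}}{\partial(\partial\phi_n(\mathbf{x},t)/\partial x^\mu)} , \tag{7.1.8} $$
in which as usual the repeated index μ is summed over the values μ = 1, 2, 3, 0, again with $x^0 = ct$.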
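The step from Eq. (7.1.7) to Eq. (7.1.8) can be made explicit with a short rewriting of the time-derivative term. This is only a sketch, using nothing beyond the definition $x^0 = ct$:

```latex
\begin{align*}
% With x^0 = ct, the time derivative of the field is c times its x^0 derivative
\dot\phi_n &= c\,\frac{\partial\phi_n}{\partial x^0} ,
\qquad
\frac{\partial}{\partial t} = c\,\frac{\partial}{\partial x^0} ,
\qquad
\frac{\partial\mathcal{L}}{\partial\dot\phi_n}
  = \frac{1}{c}\,\frac{\partial\mathcal{L}}{\partial(\partial\phi_n/\partial x^0)} , \\
% so the left-hand side of (7.1.7) is the mu = 0 term of the sum in (7.1.8)
\frac{\partial}{\partial t}\,\frac{\partial\mathcal{L}}{\partial\dot\phi_n}
  &= \frac{\partial}{\partial x^0}\,
     \frac{\partial\mathcal{L}}{\partial(\partial\phi_n/\partial x^0)} .
\end{align*}
```

Moving the sum over i = 1, 2, 3 in Eq. (7.1.7) to the same side as this term gives the four-dimensional sum over μ in Eq. (7.1.8).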
Commutation Relations
The field equations (7.1.8) could have been derived more easily by directly
requiring that the action (7.1.1) must be stationary with respect to arbitrary
infinitesimal variations of the ϕn (x, t) that vanish when |x| → ∞ or when
|t| → ∞. The calculation of functional derivatives is however important in
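finding the commutation relations of the fields. The canonical conjugate $\pi_n(\mathbf{x},t)$ to $\phi_n(\mathbf{x},t)$ is defined as in Eq. (5.7.13) but with a functional derivative of $L$ in place of an ordinary derivative:
$$ \pi_n(\mathbf{x},t) = \frac{\delta L}{\delta\dot\phi_n(\mathbf{x},t)} = \frac{\partial\mathcal{L}}{\partial\dot\phi_n(\mathbf{x},t)} . \tag{7.1.9} $$
The canonical commutation relations (5.7.5) here read
$$
\begin{aligned}
{[\phi_n(\mathbf{x},t), \pi_m(\mathbf{y},t)]} &= i\hbar\,\delta_{nm}\,\delta^3(\mathbf{x}-\mathbf{y}) , \\
[\phi_n(\mathbf{x},t), \phi_m(\mathbf{y},t)] &= [\pi_n(\mathbf{x},t), \pi_m(\mathbf{y},t)] = 0 .
\end{aligned}
\tag{7.1.10}
$$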
We will explore the consequences of these relations in the next section.
Energy and Momentum

In order to calculate the energies of the various states in a quantum field theory,
we need to know the Hamiltonian. Returning to Eq. (5.7.14) and again replacing
derivatives with functional derivatives and sums with sums and integrals, we
have

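$$ H = \sum_n \int d^3x\, \frac{\delta L}{\delta\dot\phi_n(\mathbf{x},t)}\, \dot\phi_n(\mathbf{x},t) - L = \int d^3x \left[ \sum_n \pi_n(\mathbf{x},t)\, \dot\phi_n(\mathbf{x},t) - \mathcal{L} \right] , \tag{7.1.11} $$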
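evaluated at any time t. As explained in Section 5.7, the momentum operator of any system is the generator of space translations. Under an infinitesimal space translation $\mathbf{x} \to \mathbf{x} + \boldsymbol{\epsilon}$, the fields are changed by
$$ \phi_n(\mathbf{x},t) \to \phi_n(\mathbf{x}+\boldsymbol{\epsilon},t) = \phi_n(\mathbf{x},t) + \boldsymbol{\epsilon}\cdot\nabla\phi_n(\mathbf{x},t) ; $$
so, according to the general rule Eq. (5.7.16), the momentum operator is
$$ \mathbf{P} = -\sum_n \int d^3x\, \frac{\delta L}{\delta\dot\phi_n(\mathbf{x},t)}\, \nabla\phi_n(\mathbf{x},t) = -\sum_n \int d^3x\, \pi_n(\mathbf{x},t)\, \nabla\phi_n(\mathbf{x},t) . \tag{7.1.12} $$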
7.2 Free Real Scalar Field
We next consider the simplest example of a quantum field theory, with a single
real scalar field ϕ(x), “free” in the sense that the field equations are linear.
Of course, we are really interested in what happens when fields interact, but,
as we will see in the next section, the first step in dealing with interacting fields
is to understand the content of the free-field theory.
We will take the Lagrangian density to have the form
$$ \mathcal{L}_0(x) = -\frac{1}{2}\, \eta^{\mu\nu}\, \frac{\partial\phi(x)}{\partial x^\mu} \frac{\partial\phi(x)}{\partial x^\nu} - \frac{m^2c^2}{2\hbar^2}\, \phi^2 , \tag{7.2.1} $$
the justification being that, as we shall see, this gives a sensible theory of free spinless particles of mass m. (We are using the conventions described in Chapter 4, with $x^0 = ct$; repeated indices are summed, with $\eta^{\mu\nu} = +1$ for μ = ν = 1, 2, 3, $\eta^{\mu\nu} = -1$ for μ = ν = 0, and $\eta^{\mu\nu} = 0$ otherwise. This makes the action $\int d^4x\, \mathcal{L}_0$ Lorentz invariant.) The subscript 0 on $\mathcal{L}$ is to remind us that

this is just the part of the Lagrangian density that would describe free fields if
nothing else were added. We will have to add additional terms in the following
section to include interactions.
The field equations (7.1.8) here are

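$$ -(m^2c^2/\hbar^2)\,\phi = -\frac{\partial}{\partial x^\mu} \left( \eta^{\mu\nu}\, \frac{\partial\phi(x)}{\partial x^\nu} \right) , $$
or more simply
$$ \left( \Box - m^2c^2/\hbar^2 \right) \phi = 0 , \tag{7.2.2} $$
where $\Box$ is the d'Alembertian operator:
$$ \Box \equiv \eta^{\mu\nu}\, \frac{\partial}{\partial x^\mu} \frac{\partial}{\partial x^\nu} = \nabla^2 - \frac{1}{c^2} \frac{\partial^2}{\partial t^2} . \tag{7.2.3} $$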
The general real solutions of Eq. (7.2.2) are of the form

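$$ \phi(\mathbf{x},t) = \int d^3p \left[ A(\mathbf{p}) \exp\!\left( \frac{i}{\hbar} \big(\mathbf{p}\cdot\mathbf{x} - E(\mathbf{p})t\big) \right) + A^\dagger(\mathbf{p}) \exp\!\left( \frac{-i}{\hbar} \big(\mathbf{p}\cdot\mathbf{x} - E(\mathbf{p})t\big) \right) \right] \tag{7.2.4} $$
where $E(\mathbf{p}) = \sqrt{c^2\mathbf{p}^2 + m^2c^4}$, and the coefficients $A(\mathbf{p})$ and $A^\dagger(\mathbf{p})$ are spacetime-independent operators whose properties are to be determined from the canonical commutation relations.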
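As a quick check that each exponential in Eq. (7.2.4) solves Eq. (7.2.2), one can apply the d'Alembertian (7.2.3) to a single plane wave. In this sketch $E$ is left as a free parameter, so that the relation between $E$ and $\mathbf{p}$ comes out of the field equation rather than being assumed:

```latex
\begin{align*}
% Apply the d'Alembertian (7.2.3) to one plane wave with unspecified E
\Box\, e^{i(\mathbf{p}\cdot\mathbf{x} - Et)/\hbar}
  &= \left( \nabla^2 - \frac{1}{c^2}\frac{\partial^2}{\partial t^2} \right)
     e^{i(\mathbf{p}\cdot\mathbf{x} - Et)/\hbar}
   = \left( -\frac{\mathbf{p}^2}{\hbar^2} + \frac{E^2}{c^2\hbar^2} \right)
     e^{i(\mathbf{p}\cdot\mathbf{x} - Et)/\hbar} , \\
% The field equation (7.2.2) then fixes E in terms of p
\left( \Box - \frac{m^2c^2}{\hbar^2} \right)
  e^{i(\mathbf{p}\cdot\mathbf{x} - Et)/\hbar} = 0
  &\iff E^2 = c^2\mathbf{p}^2 + m^2c^4 .
\end{align*}
```

Taking the positive root gives $E(\mathbf{p})$; the complex-conjugate exponential multiplying $A^\dagger(\mathbf{p})$ satisfies the same condition, and including it is what makes the solution real (Hermitian).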
The canonical conjugate to ϕ is here
$$ \pi(\mathbf{x},t) = \frac{\partial\mathcal{L}_0}{\partial\dot\phi(\mathbf{x},t)} = \frac{1}{c^2}\, \dot\phi(\mathbf{x},t) . \tag{7.2.5} $$
Then

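$$
\begin{aligned}
{[\phi(\mathbf{x},t), \pi(\mathbf{y},t)]} = \int d^3p \int d^3p'\, \big( -iE(\mathbf{p}')/c^2\hbar \big)
\times \bigg[ & A(\mathbf{p}) \exp\!\left( \frac{i}{\hbar} \big(\mathbf{p}\cdot\mathbf{x} - E(\mathbf{p})t\big) \right) + A^\dagger(\mathbf{p}) \exp\!\left( \frac{-i}{\hbar} \big(\mathbf{p}\cdot\mathbf{x} - E(\mathbf{p})t\big) \right) , \\
& A(\mathbf{p}') \exp\!\left( \frac{i}{\hbar} \big(\mathbf{p}'\cdot\mathbf{y} - E(\mathbf{p}')t\big) \right) - A^\dagger(\mathbf{p}') \exp\!\left( \frac{-i}{\hbar} \big(\mathbf{p}'\cdot\mathbf{y} - E(\mathbf{p}')t\big) \right) \bigg] .
\end{aligned}
$$
Terms in the integrand that are proportional to the product $\exp(-iE(\mathbf{p})t/\hbar) \exp(-iE(\mathbf{p}')t/\hbar)$ or to the product $\exp(iE(\mathbf{p})t/\hbar) \exp(iE(\mathbf{p}')t/\hbar)$ would make different time-dependent contributions to the integral for any values of $\mathbf{p}$ and $\mathbf{p}'$, so since the canonical commutation rules give a time-independent commutator, we must have
$$ [A(\mathbf{p}), A(\mathbf{p}')] = [A^\dagger(\mathbf{p}), A^\dagger(\mathbf{p}')] = 0 . \tag{7.2.6} $$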
The commutator is then

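$$
\begin{aligned}
{[\phi(\mathbf{x},t), \pi(\mathbf{y},t)]} = \int d^3p \int d^3p'\, \big( -iE(\mathbf{p}')/c^2\hbar \big)
\times \bigg\{ & -[A(\mathbf{p}), A^\dagger(\mathbf{p}')] \exp\!\left( \frac{i}{\hbar} \big(\mathbf{p}\cdot\mathbf{x} - E(\mathbf{p})t\big) \right) \exp\!\left( \frac{-i}{\hbar} \big(\mathbf{p}'\cdot\mathbf{y} - E(\mathbf{p}')t\big) \right) \\
& - [A(\mathbf{p}'), A^\dagger(\mathbf{p})] \exp\!\left( \frac{i}{\hbar} \big(\mathbf{p}'\cdot\mathbf{y} - E(\mathbf{p}')t\big) \right) \exp\!\left( \frac{-i}{\hbar} \big(\mathbf{p}\cdot\mathbf{x} - E(\mathbf{p})t\big) \right) \bigg\} .
\end{aligned}
$$
The commutator must be proportional to $\delta^3(\mathbf{x}-\mathbf{y})$, and in particular must be a function only of $\mathbf{x}-\mathbf{y}$, so the commutator of $A(\mathbf{p})$ with $A^\dagger(\mathbf{p}')$ must be proportional to $\delta^3(\mathbf{p}-\mathbf{p}')$,
$$ [A(\mathbf{p}), A^\dagger(\mathbf{p}')] = f(\mathbf{p})\, \delta^3(\mathbf{p}-\mathbf{p}') , $$
which also ensures the cancellation of time-dependent factors. The commutator is then
$$ [\phi(\mathbf{x},t), \pi(\mathbf{y},t)] = \int d^3p\, \big( iE(\mathbf{p})/c^2\hbar \big) \big[ f(\mathbf{p}) + f(-\mathbf{p}) \big]\, e^{i\mathbf{p}\cdot(\mathbf{x}-\mathbf{y})/\hbar} . $$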
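The canonical commutation relations require that
$$ [\phi(\mathbf{x},t), \pi(\mathbf{y},t)] = i\hbar\, \delta^3(\mathbf{x}-\mathbf{y}) = \frac{i\hbar}{(2\pi\hbar)^3} \int d^3p\; e^{i\mathbf{p}\cdot(\mathbf{x}-\mathbf{y})/\hbar} $$
and therefore
$$ f(\mathbf{p}) + f(-\mathbf{p}) = \frac{c^2\hbar^2}{(2\pi\hbar)^3 E(\mathbf{p})} . $$
At this point, we have to look at the commutator of the field with itself. Using what we have already learned about $[A(\mathbf{p}), A(\mathbf{p}')]$ and $[A(\mathbf{p}), A^\dagger(\mathbf{p}')]$, we see that
$$ [\phi(\mathbf{x},t), \phi(\mathbf{y},t)] = \int d^3p\, \big[ f(\mathbf{p}) - f(-\mathbf{p}) \big]\, e^{i\mathbf{p}\cdot(\mathbf{x}-\mathbf{y})/\hbar} . $$
Since this commutator has to vanish, we must have $f(\mathbf{p}) = f(-\mathbf{p})$, so we must take
$$ f(\mathbf{p}) = f(-\mathbf{p}) = \frac{c^2\hbar^2}{(2\pi\hbar)^3\, 2E(\mathbf{p})} . $$
It is therefore convenient to define $a(\mathbf{p}) \equiv A(\mathbf{p})/\sqrt{f(\mathbf{p})}$, so that
$$ [a(\mathbf{p}), a^\dagger(\mathbf{p}')] = \delta^3(\mathbf{p}-\mathbf{p}') \tag{7.2.7} $$
and
$$ \phi(\mathbf{x},t) = \hbar c \int \frac{d^3p}{\sqrt{2E(\mathbf{p})}\,(2\pi\hbar)^{3/2}} \left[ a(\mathbf{p}) \exp\!\left( \frac{i}{\hbar} \big(\mathbf{p}\cdot\mathbf{x} - E(\mathbf{p})t\big) \right) + a^\dagger(\mathbf{p}) \exp\!\left( \frac{-i}{\hbar} \big(\mathbf{p}\cdot\mathbf{x} - E(\mathbf{p})t\big) \right) \right] . \tag{7.2.8} $$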
The operators $a(\mathbf{p})$ and $a^\dagger(\mathbf{p})$ are analogous to the operators $a_i$ and $a_i^\dagger$ introduced in our discussion of the harmonic oscillator Hamiltonian in Section 6.3 but with a continuum momentum argument in place of the three-valued index i and a delta function instead of a Kronecker delta.
The Hamiltonian for the free scalar field is given by

$$ H_0 = \int d^3x\, [\pi\dot\phi - \mathcal{L}_0] = \frac{1}{2} \int d^3x \left[ \frac{1}{c^2}\, \dot\phi^2 + (\nabla\phi)^2 + \frac{m^2c^2}{\hbar^2}\, \phi^2 \right] . $$
Since this is quadratic in the field ϕ, when we insert the expression (7.2.8) for ϕ in $H_0$, we encounter a double integral over momentum. The integral over $\mathbf{x}$ yields factors of $(2\pi\hbar)^3$ times momentum delta functions that reduce this to a single momentum integral. The time-dependent terms in the integrand proportional to $\exp(-2iE(\mathbf{p})t/\hbar)$ or $\exp(2iE(\mathbf{p})t/\hbar)$ are also proportional to $-E^2(\mathbf{p})/c^2 + \mathbf{p}^2 + m^2c^2$, which vanishes. This leaves the time-independent terms
$$ H_0 = \frac{1}{2} \int d^3p\; E(\mathbf{p}) \left[ a^\dagger(\mathbf{p})\, a(\mathbf{p}) + a(\mathbf{p})\, a^\dagger(\mathbf{p}) \right] . \tag{7.2.9} $$
In the same way, we find the momentum operator
$$ \mathbf{P} = \frac{1}{2} \int d^3p\; \mathbf{p} \left[ a^\dagger(\mathbf{p})\, a(\mathbf{p}) + a(\mathbf{p})\, a^\dagger(\mathbf{p}) \right] . \tag{7.2.10} $$
We can check that Eq. (7.2.9) is consistent with what we know to be the
time dependence of the field. The canonical commutation relations have been
constructed so that

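$$ i\hbar\,\dot\phi(x) = [\phi(x), H_0] = \hbar c \int \frac{d^3p}{\sqrt{2E(\mathbf{p})}\,(2\pi\hbar)^{3/2}} \left\{ [a(\mathbf{p}), H_0] \exp\!\left( \frac{i}{\hbar} \big(\mathbf{p}\cdot\mathbf{x} - E(\mathbf{p})t\big) \right) + [a^\dagger(\mathbf{p}), H_0] \exp\!\left( \frac{-i}{\hbar} \big(\mathbf{p}\cdot\mathbf{x} - E(\mathbf{p})t\big) \right) \right\} . $$
From Eq. (7.2.9) we have
$$ [a(\mathbf{p}), H_0] = E(\mathbf{p})\, a(\mathbf{p}) , \qquad [a^\dagger(\mathbf{p}), H_0] = -E(\mathbf{p})\, a^\dagger(\mathbf{p}) , \tag{7.2.11} $$
so
$$ i\dot\phi(\mathbf{x},t) = c \int \frac{d^3p}{\sqrt{2E(\mathbf{p})}\,(2\pi\hbar)^{3/2}}\; E(\mathbf{p}) \left[ a(\mathbf{p}) \exp\!\left( \frac{i}{\hbar} \big(\mathbf{p}\cdot\mathbf{x} - E(\mathbf{p})t\big) \right) - a^\dagger(\mathbf{p}) \exp\!\left( \frac{-i}{\hbar} \big(\mathbf{p}\cdot\mathbf{x} - E(\mathbf{p})t\big) \right) \right] $$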
which is the same as would be given directly by taking the time derivative of
Eq. (7.2.8).
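The commutators in Eq. (7.2.11) follow from Eq. (7.2.9) together with the commutation relations (7.2.6) and (7.2.7). A sketch of the intermediate step, written out here for convenience:

```latex
\begin{align*}
% Both orderings in (7.2.9) give the same commutator, since a commutes with a
[a(\mathbf{p}),\, a^\dagger(\mathbf{p}')\,a(\mathbf{p}')]
  &= [a(\mathbf{p}), a^\dagger(\mathbf{p}')]\,a(\mathbf{p}')
   = \delta^3(\mathbf{p}-\mathbf{p}')\,a(\mathbf{p}') , \\
[a(\mathbf{p}),\, a(\mathbf{p}')\,a^\dagger(\mathbf{p}')]
  &= a(\mathbf{p}')\,[a(\mathbf{p}), a^\dagger(\mathbf{p}')]
   = \delta^3(\mathbf{p}-\mathbf{p}')\,a(\mathbf{p}') , \\
% Insert into (7.2.9) and use the delta function to do the p' integral
[a(\mathbf{p}), H_0]
  &= \frac{1}{2}\int d^3p'\,E(\mathbf{p}')\;
     2\,\delta^3(\mathbf{p}-\mathbf{p}')\,a(\mathbf{p}')
   = E(\mathbf{p})\,a(\mathbf{p}) .
\end{align*}
```

Taking the adjoint of the last line, and using the fact that $H_0$ is Hermitian, gives $[a^\dagger(\mathbf{p}), H_0] = -E(\mathbf{p})\,a^\dagger(\mathbf{p})$.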
Likewise, from Eq. (7.2.10), we have
$$ [a(\mathbf{p}), \mathbf{P}] = \mathbf{p}\, a(\mathbf{p}) , \qquad [a^\dagger(\mathbf{p}), \mathbf{P}] = -\mathbf{p}\, a^\dagger(\mathbf{p}) , \tag{7.2.12} $$
from which we can see that, as expected, $[\phi(x), \mathbf{P}] = -i\hbar\nabla\phi(x)$.
Equations (7.2.11) and (7.2.12) show that $a(\mathbf{p})$ and $a^\dagger(\mathbf{p})$ act as annihilation and creation operators, analogous to the lowering and raising operators for energy $a_i$ and $a_i^\dagger$ in Section 6.3 and to the operators $J_1 - iJ_2$ and $J_1 + iJ_2$ that we used in working out the content of angular momentum multiplets in Section 5.4.
Suppose a state represented by a wave function ψ has definite values $E_\psi$ and $\mathbf{p}_\psi$ for the total energy and total momentum. That is,
$$ H_0\psi = E_\psi\psi , \qquad \mathbf{P}\psi = \mathbf{p}_\psi\psi . $$
Then
$$ H_0\, a(\mathbf{p})\psi = [H_0, a(\mathbf{p})]\psi + a(\mathbf{p})H_0\psi = \big(E_\psi - E(\mathbf{p})\big)\, a(\mathbf{p})\psi ; $$
so if $a(\mathbf{p})\psi$ does not vanish, it is the wave function for a state with energy $E_\psi - E(\mathbf{p})$. Likewise, this is a state with momentum $\mathbf{p}_\psi - \mathbf{p}$, while $a^\dagger(\mathbf{p})\psi$ is the wave function for a state with energy $E_\psi + E(\mathbf{p})$ and total momentum $\mathbf{p}_\psi + \mathbf{p}$. In other words, $a(\mathbf{p})$ and $a^\dagger(\mathbf{p})$ respectively annihilate and create a particle of momentum $\mathbf{p}$. This is what we mean when we refer to elementary particles being bundles of the energy and momentum in some field.
At this point, and for the rest of this chapter, we will abandon the language of
wave mechanics, and instead employ the more abstract language of state vectors
and scalar products that was outlined in Section 5.10. In quantum field theory
the wave function of any state such as the vacuum is a complicated functional of
the fields, and the action of operators like a(p) on these wave functions involves
functional derivatives with respect to these fields. None of these complications
plays a role in most calculations. What we use instead are the properties of
operators, such as the field equations and the canonical commutation relations,
and limited assumptions about physical states.
In particular, it is a plausible physical assumption that there should exist a physical state, the vacuum $\Psi_{\rm vac}$, with the lowest possible energy. Then $a(\mathbf{p})\Psi_{\rm vac}$ must vanish,
$$ a(\mathbf{p})\,\Psi_{\rm vac} = 0 , \tag{7.2.13} $$
since otherwise it would be a state with energy less by an amount $E(\mathbf{p})$.
To calculate the energy and momentum of the vacuum, it is convenient to use the commutator of $a$ with $a^\dagger$ to rewrite Eqs. (7.2.9) and (7.2.10) as
$$ H_0 = \int d^3p\; E(\mathbf{p})\, a^\dagger(\mathbf{p})\, a(\mathbf{p}) + E_{\rm vac} , \tag{7.2.14} $$
$$ \mathbf{P} = \int d^3p\; \mathbf{p}\, a^\dagger(\mathbf{p})\, a(\mathbf{p}) , \tag{7.2.15} $$
where $E_{\rm vac}$ is an infinite constant:
$$ E_{\rm vac} = \frac{1}{2} \int d^3p\; E(\mathbf{p})\, \delta^3(0) = \frac{1}{2(2\pi\hbar)^3} \int d^3p\; E(\mathbf{p}) \int d^3x . \tag{7.2.16} $$
From Eqs. (7.2.14) and (7.2.13), we see that this is the energy of the vacuum:
$$ H_0\Psi_{\rm vac} = E_{\rm vac}\Psi_{\rm vac} , \tag{7.2.17} $$
while Eqs. (7.2.15) and (7.2.13) show that the momentum of the vacuum is zero.
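The rewriting that leads to Eqs. (7.2.14) through (7.2.16) can be sketched as follows. The only inputs are the commutator (7.2.7), evaluated formally at coincident momenta, and the Fourier representation of the delta function:

```latex
\begin{align*}
% Commutator (7.2.7) at p' = p: reorder a a^dagger into a^dagger a
a(\mathbf{p})\,a^\dagger(\mathbf{p})
  &= a^\dagger(\mathbf{p})\,a(\mathbf{p}) + \delta^3(0) , \\
% Insert into (7.2.9)
\frac{1}{2}\int d^3p\;E(\mathbf{p})
  \left[ a^\dagger(\mathbf{p})a(\mathbf{p}) + a(\mathbf{p})a^\dagger(\mathbf{p}) \right]
  &= \int d^3p\;E(\mathbf{p})\,a^\dagger(\mathbf{p})a(\mathbf{p})
   + \frac{1}{2}\,\delta^3(0)\int d^3p\;E(\mathbf{p}) , \\
% Fourier representation of the delta function, evaluated at zero argument
\delta^3(\mathbf{p}-\mathbf{p}')
  = \frac{1}{(2\pi\hbar)^3}\int d^3x\;
    e^{i(\mathbf{p}-\mathbf{p}')\cdot\mathbf{x}/\hbar}
  &\;\Longrightarrow\;
  \delta^3(0) = \frac{1}{(2\pi\hbar)^3}\int d^3x .
\end{align*}
```

The same reordering applied to Eq. (7.2.10) produces a term proportional to $\int d^3p\;\mathbf{p}$, which vanishes because the integrand is odd in $\mathbf{p}$; this is why no constant appears in Eq. (7.2.15).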
For most purposes a constant term in the energy such as Evac makes no
difference, because the same constant appears in the energies of all states and
therefore has no effect in applications of the conservation of energy. The one
phenomenon that is affected by such a constant is gravitation, which is coupled

to all forms of energy. In a finite volume, Eq. (7.2.16) corresponds to an infinite vacuum energy density
$$ \rho_{\rm vac} = \frac{1}{2(2\pi\hbar)^3} \int d^3p\; E(\mathbf{p}) . $$
But Einstein’s general theory of relativity allows a term in the field equations of
gravitation, known as the cosmological constant, that has just the same effects
as ρvac . There is no reason why the cosmological constant should not include
an infinite negative term that simply cancels ρvac , possibly leaving over a finite
remaining energy density. Observations of an accelerated expansion of the uni-
verse have shown that this remaining energy density is not zero, though it is tiny
compared with the energy densities encountered in atomic and nuclear physics.1
The quantum states of this free field can be constructed by acting on the
vacuum with any number of creation operators. If we define
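$$ \Psi_{\mathbf{p}_1,\mathbf{p}_2,\mathbf{p}_3,\ldots} \equiv a^\dagger(\mathbf{p}_1)\, a^\dagger(\mathbf{p}_2)\, a^\dagger(\mathbf{p}_3) \cdots \Psi_{\rm vac} \tag{7.2.18} $$
then from Eqs. (7.2.11), (7.2.12), and (7.2.17) we see that
$$ H_0\, \Psi_{\mathbf{p}_1,\mathbf{p}_2,\mathbf{p}_3,\ldots} = \big[ E_{\rm vac} + E(\mathbf{p}_1) + E(\mathbf{p}_2) + E(\mathbf{p}_3) + \cdots \big]\, \Psi_{\mathbf{p}_1,\mathbf{p}_2,\mathbf{p}_3,\ldots} \tag{7.2.19} $$
and
$$ \mathbf{P}\, \Psi_{\mathbf{p}_1,\mathbf{p}_2,\mathbf{p}_3,\ldots} = \big[ \mathbf{p}_1 + \mathbf{p}_2 + \mathbf{p}_3 + \cdots \big]\, \Psi_{\mathbf{p}_1,\mathbf{p}_2,\mathbf{p}_3,\ldots} . \tag{7.2.20} $$
These are states with any number of particles. The superpositions of all such states make up what is called Fock space, named after Vladimir Fock (1898–1974). Because the operators $a^\dagger(\mathbf{p}_1)$, $a^\dagger(\mathbf{p}_2)$, etc. all commute with one another, the states (7.2.18) are symmetric in the momenta of the particles, and hence these spinless particles are bosons.
The states $\Psi_{\mathbf{p}_1,\mathbf{p}_2,\mathbf{p}_3,\ldots}$ are no longer eigenstates of the Hamiltonian if we add higher-order terms such as $\phi^3$, $\phi^4$, etc. to the Lagrangian density. Such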
terms drive transitions between these states, corresponding to the creation and
annihilation of particles. We will discuss this further in the next section. As we
will see there, knowledge of the free-field theory is an essential ingredient in
these calculations, which is why we have gone into it here.
1 For a textbook discussion and references to the original literature, see S. Weinberg, Cosmology (Oxford
University Press, Oxford, 2008), Sections 1.4 and 1.5.
7.3 Interactions
We shall now consider how to calculate transition rates in theories of interacting

fields. Most (though not all) useful calculations in quantum field theory rely on
perturbation theory. We write the Hamiltonian as $H = H_0 + H'$, where $H_0$ is the Hamiltonian of a free-field theory, like the Hamiltonian discussed in the previous section, and $H'$ is an interaction term that is considered to be small enough to allow physical quantities to be calculated as power series in $H'$.
In Section 5.9 we saw how perturbation theory is used to calculate shifts in
energy levels in quantum mechanics. In second and higher orders in a perturba-
tion, we encounter energy denominators, such as that shown in Eq. (5.9.27).
Similar energy denominators occur in perturbative calculations of scattering
amplitudes using the Lippmann–Schwinger equation (5.6.29) and its iterations.
The fact that denominators involve energy but not momentum differences makes
it obvious that they are not Lorentz invariant, so they make it difficult to keep
track of Lorentz invariance in relativistic theories. This sort of perturbation
theory, which is now known as “old-fashioned perturbation theory,” was all that
was available for calculations in quantum field theory in the 1930s, making
progress difficult. In particular, it was not clear how to deal with the divergent
integrals occurring in these calculations without losing the Lorentz invariance
of the underlying theory.
In the late 1940s, independently, Richard Feynman2 (1918–1988), Julian
Schwinger3 (1918–1994), and Sin-Itiro Tomonaga (1906–1979) and his collab-
orators4 were able to carry out manifestly relativistic perturbative calculations
in quantum electrodynamics. The equivalence of their methods was shown by
Freeman Dyson5 (1923–2020), who gave a systematic account of a method
of calculation that would maintain manifest Lorentz invariance to all orders
of perturbation theory. We shall now describe this method. Here and for
the balance of this chapter, as is usual in work on quantum field theory, we
shall use what are called “natural units,” in which h̄ = c = 1, and we shall
continue to represent physical states as vectors in Hilbert space, as described in
Section 5.10.
2 R. P. Feynman, Rev. Mod. Phys. 20, 367 (1948); Phys. Rev. 74, 939, 1430 (1948); ibid., 76, 749, 769
(1949); ibid 80, 440 (1950).
3 J. Schwinger, Phys. Rev. 74, 1439 (1948); ibid., 75, 651 (1949); ibid., 76, 790 (1949); ibid., 82, 664, 914
(1951); ibid., 91, 713 (1953); Proc. Nat. Acad. Sci. 37, 452 (1951).
4 S. Tomonaga, Prog. Theor. Phys. 1, 27 (1946); Z. Koba, T. Tati, and S. Tomonaga, ibid.,
2, 101 (1947); S. Kanesawa and S. Tomonaga, ibid., 3, 1, 101 (1948); S. Tomonaga, Phys. Rev. 74, 224
(1948); D. Ito, Z. Koba, and S. Tomonaga, Prog. Theor. Phys. 3, 276 (1948); Z. Koba and S. Tomonaga,
ibid., 3, 290 (1948).
5 F. J. Dyson, Phys. Rev. 75, 486, 1736 (1949).
Time-Ordered Perturbation Theory

We saw in the appendix to Section 5.6 how the rate for any transition α → β
between free-particle states α and β can be calculated from a knowledge of the
S-matrix, Sβα . Our first task now is to see how to express Sβα in a form that
allows its calculation in modern perturbation theory.
In the appendix to Section 5.6 we showed how to construct an eigenstate $\Psi_\alpha$ of the full Hamiltonian, with $H\Psi_\alpha = E_\alpha\Psi_\alpha$, with the special property that at very early times it looks like the eigenstate $\Phi_\alpha$ of the free-particle Hamiltonian, with $H_0\Phi_\alpha = E_\alpha\Phi_\alpha$, in the sense of Eq. (5.6.34): for t → −∞,
$$ \int g(\alpha)\, e^{-iHt}\, \Psi_\alpha\, d\alpha \to \int g(\alpha)\, e^{-iH_0t}\, \Phi_\alpha\, d\alpha , $$
where g(α) is a smooth function of the momenta of all the particles in state α, introduced to give meaning to the limit for t → −∞. (Recall that the label α is intended to include the momenta, spin 3-components or helicities, and species labels of all the particles in the state α, and an integral over α is intended to include integrals over all momenta and sums over all spin 3-components or helicities and species labels.) We also showed that at very late times the same state $\Psi_\alpha$ looks like the superposition $\int d\beta\, S_{\beta\alpha}\Phi_\beta$, in the sense of Eq. (5.6.35): for t → +∞,
$$ \int g(\alpha)\, e^{-iHt}\, \Psi_\alpha\, d\alpha \to \int g(\alpha)\, e^{-iH_0t}\, d\alpha \int d\beta\, S_{\beta\alpha}\, \Phi_\beta . $$
With considerable loss of mathematical rigor, we multiply both sides by the operator $e^{iHt}$ and equate coefficients of g(α) on both sides, so these formulas yield
$$ \Psi_\alpha = \Omega(-\infty)\, \Phi_\alpha = \Omega(+\infty) \int d\beta\, S_{\beta\alpha}\, \Phi_\beta , $$
where
$$ \Omega(t) \equiv e^{iHt}\, e^{-iH_0t} . $$
From the two equalities we can conclude that
$$ S_{\beta\alpha} = \big( \Phi_\beta ,\, \Omega^{-1}(+\infty)\, \Omega(-\infty)\, \Phi_\alpha \big) = \big( \Phi_\beta ,\, U(+\infty, -\infty)\, \Phi_\alpha \big) , \tag{7.3.1} $$
where
$$ U(t, t_0) \equiv \Omega^{-1}(t)\, \Omega(t_0) = e^{iH_0t}\, e^{-iH(t-t_0)}\, e^{-iH_0t_0} . \tag{7.3.2} $$
The justification (such as it is) for treating $\Omega(t)$ as if it had well-defined limits
for t → ±∞ is that at very early and very late times the incoming and outgoing
particles are so far apart that the interaction H − H0 is ineffective. As we shall
see, at least in perturbation theory the limits t → ±∞ do lead to well-defined
probability amplitudes.
To construct a perturbation series for U in powers of the interaction H − H0 ,

we first take the derivative of Eq. (7.3.2) with respect to t:
$$ \frac{d}{dt}\, U(t, t_0) = -i \exp(iH_0t)\, [H - H_0]\, \exp(-iH(t - t_0))\, \exp(-iH_0t_0) = -iH'(t)\, U(t, t_0) , \tag{7.3.3} $$
where $H'(t)$ is the interaction in what is called the interaction picture, in which the time dependence is governed by the free-particle Hamiltonian $H_0$:
$$ H'(t) \equiv \exp(iH_0t)\, [H - H_0]\, \exp(-iH_0t) . \tag{7.3.4} $$
The time dependence of any operator in the interaction picture is given by its
commutator with H0 , which is one reason why we need to understand the free-
field theory before taking interactions into account.
The differential equation (7.3.3) together with the initial condition $U(t_0, t_0) = 1$ is incorporated in the integral equation
$$ U(t, t_0) = 1 - i \int_{t_0}^{t} H'(t_1)\, U(t_1, t_0)\, dt_1 . \tag{7.3.5} $$
We can solve this at least formally by iteration:
$$ U(t, t_0) = 1 - i \int_{t_0}^{t} dt_1\, H'(t_1) + (-i)^2 \int_{t_0}^{t} dt_1 \int_{t_0}^{t_1} dt_2\, H'(t_1)\, H'(t_2) + \cdots . \tag{7.3.6} $$
Instead of using limits on the integrals to impose an ordering of the integration
variables t1 , t2 , etc., we can integrate all these variables over the whole range
from t0 to t, which with n integrals includes n! permutations of the order of
the integration variables; we then correct for this multiplicity of permutations
by dividing by n! and reimpose the ordering of time variables by changing the product of $H'$ operators to a time-ordered product denoted $T\{\cdots\}$, in which operator factors appear in order of decreasing time arguments. For instance
$$ \int_{t_0}^{t} dt_1 \int_{t_0}^{t_1} dt_2\, H'(t_1)\, H'(t_2) = \frac{1}{2} \int_{t_0}^{t} dt_1 \int_{t_0}^{t} dt_2\, T\{H'(t_1)\, H'(t_2)\} $$
where
$$ T\{H'(t_1)\, H'(t_2)\} = \begin{cases} H'(t_1)\, H'(t_2) & t_1 > t_2 \\ H'(t_2)\, H'(t_1) & t_2 > t_1 . \end{cases} $$
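The factor 1/2 in this example can be checked by splitting the square integration region into its two triangular halves. A short sketch of that step:

```latex
\begin{align*}
% Split the square t_0 < t_1, t_2 < t into the regions t_1 > t_2 and t_2 > t_1
\int_{t_0}^{t} dt_1 \int_{t_0}^{t} dt_2\; T\{H'(t_1)\,H'(t_2)\}
  &= \int_{t_0}^{t} dt_1 \int_{t_0}^{t_1} dt_2\; H'(t_1)\,H'(t_2)
   + \int_{t_0}^{t} dt_2 \int_{t_0}^{t_2} dt_1\; H'(t_2)\,H'(t_1) \\
% Relabel the dummy variables t_1 <-> t_2 in the second term
  &= 2 \int_{t_0}^{t} dt_1 \int_{t_0}^{t_1} dt_2\; H'(t_1)\,H'(t_2) .
\end{align*}
```

With n integration variables the same argument gives n! equal regions, one for each ordering of the times, which is the origin of the 1/n! in the general term.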
The complete sum is then
$$ U(t, t_0) = 1 + \sum_{n=1}^{\infty} \frac{(-i)^n}{n!} \int_{t_0}^{t} dt_1 \cdots \int_{t_0}^{t} dt_n\; T\{H'(t_1) \cdots H'(t_n)\} . \tag{7.3.7} $$
This begins to look Lorentz invariant if we take the limits t → +∞ and $t_0$ → −∞ and suppose that $H'(t)$ is the integral over all space of a scalar density $\mathcal{H}(x)$, such as a polynomial function of the field ϕ(x) discussed in the previous section:
$$ H'(t) = \int d^3x\; \mathcal{H}(\mathbf{x}, t) \tag{7.3.8} $$
in which case Eq. (7.3.7) becomes
$$ U(\infty, -\infty) = 1 + \sum_{n=1}^{\infty} \frac{(-i)^n}{n!} \int d^4x_1 \cdots d^4x_n\; T\{\mathcal{H}(x_1) \cdots \mathcal{H}(x_n)\} , \tag{7.3.9} $$
which can be used along with Eq. (7.3.1) to calculate the S-matrix.
Lorentz Invariance
The remaining problem with Lorentz invariance is that the integrand is still
time-ordered. As we saw in Section 4.7, the ordering in time of two events at
spacetime positions x1 and x2 is Lorentz invariant if the separation x1 − x2 is
time-like or light-like – that is (using units with c = 1), if
$$ \eta_{\mu\nu}\, (x_1 - x_2)^\mu (x_1 - x_2)^\nu = (\mathbf{x}_1 - \mathbf{x}_2)^2 - (t_1 - t_2)^2 \le 0 . $$
Thus to make the scattering operator Lorentz invariant we need the densities to commute at space-like separations:
$$ \big[ \mathcal{H}(x_1), \mathcal{H}(x_2) \big] = 0 \quad \text{for} \quad \eta_{\mu\nu}\, (x_1 - x_2)^\mu (x_1 - x_2)^\nu > 0 . \tag{7.3.10} $$
The vanishing of this commutator tells us that there is no obstacle to finding states that are eigenstates of both $\mathcal{H}(x_1)$ and $\mathcal{H}(x_2)$, which can also be justified on grounds of causality since for $x_1 - x_2$ space-like no signal could travel from a measurement of $\mathcal{H}$ at $x_1$ to interfere with a measurement of $\mathcal{H}$ at $x_2$.
Any space-like separation x1 − x2 can be obtained from a purely spatial
separation with $t_1 = t_2$ by a Lorentz transformation, so as long as $\mathcal{H}(x)$ is a scalar, the necessary and sufficient condition for (7.3.10) is that the commutator should vanish at equal times:
$$ \big[ \mathcal{H}(\mathbf{x}_1, t), \mathcal{H}(\mathbf{x}_2, t) \big] = 0 . \tag{7.3.11} $$
The scalar field $\phi(\mathbf{x}, t)$ introduced in the previous section satisfies the commutation relation $[\phi(\mathbf{x}_1, t), \phi(\mathbf{x}_2, t)] = 0$ for any positions $\mathbf{x}_1$ and $\mathbf{x}_2$, so an interaction Hamiltonian density $\mathcal{H}$ constructed as any polynomial function of ϕ will satisfy Eq. (7.3.11). As we shall see in the next section, the condition (7.3.11) is not so easy to satisfy in more general theories, and this leads to the necessity of antiparticles.
Example: Scattering
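To make all this concrete, let us calculate the lowest-order amplitude for scattering of a pair of particles in the theory of real scalar fields, with $H_0$ the free-particle Hamiltonian described in the previous section, and with the simple interaction Hamiltonian density
$$ \mathcal{H} = \frac{g}{6}\, \phi^3 , \tag{7.3.12} $$
with g a constant taken small enough to justify the use of perturbation theory.6 (The factor 1/6 is inserted for later convenience.) To lowest order in g, the S-matrix element for particles with momenta $\mathbf{p}_1$ and $\mathbf{p}_2$ to scatter, with momenta changing to $\mathbf{p}_1'$ and $\mathbf{p}_2'$, is
$$ S_{\mathbf{p}_1'\mathbf{p}_2',\,\mathbf{p}_1\mathbf{p}_2} = -\frac{1}{2} \left( \frac{g}{6} \right)^2 \int d^4x \int d^4y\; \big( \Phi_{\mathbf{p}_1'\mathbf{p}_2'} ,\, T\{\phi^3(x), \phi^3(y)\}\, \Phi_{\mathbf{p}_1\mathbf{p}_2} \big) , \tag{7.3.13} $$
where
$$ \Phi_{\mathbf{p}_1\mathbf{p}_2} \equiv \frac{1}{\sqrt{2}}\, a^\dagger(\mathbf{p}_1)\, a^\dagger(\mathbf{p}_2)\, \Phi_0 , \tag{7.3.14} $$
with $\Phi_0$ the free-particle vacuum state. The factor $1/\sqrt{2}$ is included to compensate for the sum of two delta functions in the scalar product; using Eqs. (7.2.7) and (7.2.13),
$$ \big( \Phi_{\mathbf{p}_1'\mathbf{p}_2'} ,\, \Phi_{\mathbf{p}_1\mathbf{p}_2} \big) = \frac{1}{2} \left[ \delta^3(\mathbf{p}_1' - \mathbf{p}_1)\, \delta^3(\mathbf{p}_2' - \mathbf{p}_2) + \delta^3(\mathbf{p}_1' - \mathbf{p}_2)\, \delta^3(\mathbf{p}_2' - \mathbf{p}_1) \right] . \tag{7.3.15} $$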
There is no term in this S-matrix element that is of first order in g, because there are not enough creation and annihilation operators in a single $\phi^3$ operator to destroy the two initial particles and create the two final particles.
Our strategy in calculating the scattering amplitude (7.3.13) will be to move a pair of the annihilation operators in $\phi^3(x)$ and/or $\phi^3(y)$ past the creation operators in $\Phi_{\mathbf{p}_1\mathbf{p}_2}$, which gives a pair of commutators of annihilation with creation operators, and use the fact that annihilation operators give zero when acting on $\Phi_0$; also, to move a pair of the creation operators in $\phi^3(x)$ and/or $\phi^3(y)$ to the left side of the scalar product, so that their adjoints act as annihilation operators on $\Phi_{\mathbf{p}_1'\mathbf{p}_2'}$, and then move them past the creation operators in $\Phi_{\mathbf{p}_1'\mathbf{p}_2'}$, giving another pair of commutators of annihilation with creation operators and again using the fact that annihilation operators give zero when acting on $\Phi_0$.7
6 This theory is actually unphysical, because $\mathcal{H} \to -\infty$ if g > 0 and ϕ → −∞ or if g < 0 and ϕ → +∞. This problem does not emerge in perturbation theory, and in any case can be dealt with by adding higher even powers of ϕ with positive coefficients in $\mathcal{H}$.
7 In moving these annihilation operators out of the time-ordered product to the right or their adjoints to the left, we are ignoring their commutators with the other fields in $\phi^3(x)$ and $\phi^3(y)$, because these terms involve momentum delta functions that vanish if we assume that neither $\mathbf{p}_1'$ nor $\mathbf{p}_2'$ equals either $\mathbf{p}_1$ or $\mathbf{p}_2$.
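If we separate the annihilation and creation parts of ϕ(x), so that
$$ \phi(x) = \phi_{\rm an}(x) + \phi_{\rm an}^\dagger(x) , \tag{7.3.16} $$
$$ \phi_{\rm an}(\mathbf{x}, t) = \int \frac{d^3p}{\sqrt{2E(\mathbf{p})}\,(2\pi)^{3/2}}\; a(\mathbf{p}) \exp\!\big[ i(\mathbf{p}\cdot\mathbf{x} - E(\mathbf{p})t) \big] , \tag{7.3.17} $$
then
$$ [\phi_{\rm an}(x), a^\dagger(\mathbf{p})] = \frac{1}{\sqrt{2E(\mathbf{p})}\,(2\pi)^{3/2}}\; e^{ip\cdot x} , \tag{7.3.18} $$
where
$$ p \cdot x \equiv \eta_{\mu\nu}\, p^\mu x^\nu = \mathbf{p}\cdot\mathbf{x} - E(\mathbf{p})t . $$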
Following this strategy, we encounter three terms in the S-matrix element:
a. The annihilation operator that destroys the particle with momentum $\mathbf{p}_1$ and the creation operator that creates the particle with momentum $\mathbf{p}_1'$ come from the same $\phi^3$ operator in Eq. (7.3.13), while the annihilation operator that destroys the particle with momentum $\mathbf{p}_2$ and the creation operator that creates the particle with momentum $\mathbf{p}_2'$ come from the other $\phi^3$ operator.
b. The annihilation operator that destroys the particle with momentum $\mathbf{p}_1$ and the creation operator that creates the particle with momentum $\mathbf{p}_2'$ come from the same $\phi^3$ operator in Eq. (7.3.13), while the annihilation operator that destroys the particle with momentum $\mathbf{p}_2$ and the creation operator that creates the particle with momentum $\mathbf{p}_1'$ come from the other $\phi^3$ operator.
c. The annihilation operators that destroy both initial particles come from the same $\phi^3$ operator, while the creation operators that create both final particles come from the other $\phi^3$ operator.
In each case, one of the ϕ(x) and one of the ϕ(y) fields is left over in the time-ordered product. Also, since the time-ordered product in Eq. (7.3.13) is symmetric in the spacetime arguments x and y, each of the above contributions is a sum of two equal terms with x and y interchanged, so we can make an arbitrary choice of which of the $\phi^3$ operators in the three cases above is $\phi^3(x)$ and which is $\phi^3(y)$, and drop the factor 1/2 in (7.3.13). Instead, a factor 1/2 appears owing to the factors $1/\sqrt{2}$ given by Eq. (7.3.14) in the initial and final states. The factor 1/6 in Eq. (7.3.12) is cancelled by the 3! ways of choosing each of the fields in $\phi^3$.
Here in turn are these three contributions:

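$$
\begin{aligned}
S^{(a)}_{\mathbf{p}_1'\mathbf{p}_2',\,\mathbf{p}_1\mathbf{p}_2}
&= -\frac{g^2}{2} \int d^4x \int d^4y\; \Big( [\phi_{\rm an}(x), a^\dagger(\mathbf{p}_1')]\, [\phi_{\rm an}(y), a^\dagger(\mathbf{p}_2')]\, \Phi_0 ,\;
T\{\phi(x), \phi(y)\}\, [\phi_{\rm an}(x), a^\dagger(\mathbf{p}_1)]\, [\phi_{\rm an}(y), a^\dagger(\mathbf{p}_2)]\, \Phi_0 \Big) \\
&= \frac{-g^2}{2(2\pi)^6 \sqrt{2E(\mathbf{p}_1') \cdot 2E(\mathbf{p}_2') \cdot 2E(\mathbf{p}_1) \cdot 2E(\mathbf{p}_2)}}
\int d^4x \int d^4y\; e^{-ip_1'\cdot x}\, e^{-ip_2'\cdot y}\, e^{ip_1\cdot x}\, e^{ip_2\cdot y}\, \big( \Phi_0 ,\, T\{\phi(x), \phi(y)\}\, \Phi_0 \big) .
\end{aligned}
\tag{7.3.19}
$$
$$
\begin{aligned}
S^{(b)}_{\mathbf{p}_1'\mathbf{p}_2',\,\mathbf{p}_1\mathbf{p}_2}
&= -\frac{g^2}{2} \int d^4x \int d^4y\; \Big( [\phi_{\rm an}(y), a^\dagger(\mathbf{p}_1')]\, [\phi_{\rm an}(x), a^\dagger(\mathbf{p}_2')]\, \Phi_0 ,\;
T\{\phi(x), \phi(y)\}\, [\phi_{\rm an}(x), a^\dagger(\mathbf{p}_1)]\, [\phi_{\rm an}(y), a^\dagger(\mathbf{p}_2)]\, \Phi_0 \Big) \\
&= \frac{-g^2}{2(2\pi)^6 \sqrt{2E(\mathbf{p}_1') \cdot 2E(\mathbf{p}_2') \cdot 2E(\mathbf{p}_1) \cdot 2E(\mathbf{p}_2)}}
\int d^4x \int d^4y\; e^{-ip_1'\cdot y}\, e^{-ip_2'\cdot x}\, e^{ip_1\cdot x}\, e^{ip_2\cdot y}\, \big( \Phi_0 ,\, T\{\phi(x), \phi(y)\}\, \Phi_0 \big) .
\end{aligned}
\tag{7.3.20}
$$
$$
\begin{aligned}
S^{(c)}_{\mathbf{p}_1'\mathbf{p}_2',\,\mathbf{p}_1\mathbf{p}_2}
&= -\frac{g^2}{2} \int d^4x \int d^4y\; \Big( [\phi_{\rm an}(y), a^\dagger(\mathbf{p}_1')]\, [\phi_{\rm an}(y), a^\dagger(\mathbf{p}_2')]\, \Phi_0 ,\;
T\{\phi(x), \phi(y)\}\, [\phi_{\rm an}(x), a^\dagger(\mathbf{p}_1)]\, [\phi_{\rm an}(x), a^\dagger(\mathbf{p}_2)]\, \Phi_0 \Big) \\
&= \frac{-g^2}{2(2\pi)^6 \sqrt{2E(\mathbf{p}_1') \cdot 2E(\mathbf{p}_2') \cdot 2E(\mathbf{p}_1) \cdot 2E(\mathbf{p}_2)}}
\int d^4x \int d^4y\; e^{-ip_1'\cdot y}\, e^{-ip_2'\cdot y}\, e^{ip_1\cdot x}\, e^{ip_2\cdot x}\, \big( \Phi_0 ,\, T\{\phi(x), \phi(y)\}\, \Phi_0 \big) .
\end{aligned}
\tag{7.3.21}
$$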
These three contributions are symbolized in three of what are known as Feyn-
man diagrams, shown here in Figure 7.1.
Calculation of the Propagator

Evidently, we need to calculate the vacuum expectation value
$$ \big( \Phi_0 ,\, T\{\phi(x), \phi(y)\}\, \Phi_0 \big) , $$
which is known as the propagator of the field ϕ. For this purpose, we again write the scalar field as in Eq. (7.3.16) and use the fact that $\phi_{\rm an}$ acting to the right on $\Phi_0$ and $\phi_{\rm an}^\dagger$ acting to the left, where its adjoint acts on $\Phi_0$ as $\phi_{\rm an}$, both vanish. This gives
$$ \big( \Phi_0 ,\, \phi(x)\phi(y)\, \Phi_0 \big) = \big( \Phi_0 ,\, \phi_{\rm an}(x)\, \phi_{\rm an}^\dagger(y)\, \Phi_0 \big) = \big( \Phi_0 ,\, [\phi_{\rm an}(x), \phi_{\rm an}^\dagger(y)]\, \Phi_0 \big) = \Delta_+(x - y) \tag{7.3.22} $$
where $\Delta_+$ is the function
$$ \Delta_+(z) \equiv \int \frac{d^3p}{(2\pi)^3\, 2E(\mathbf{p})}\; \exp\!\big( i\mathbf{p}\cdot\mathbf{z} - iE(\mathbf{p})z^0 \big) . \tag{7.3.23} $$

[Figure 7.1: three diagrams, labeled (a), (b), and (c), each with incoming lines 1 and 2 and outgoing lines 1′ and 2′.]
Figure 7.1 Feynman diagrams for the scattering of neutral scalar particles. Here the lines coming into the diagrams from below or going out from the diagrams above represent particles in initial and final states, respectively; the vertices represent an interaction and are proportional to $\phi^3$; the line connecting vertices represents the propagator.

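The propagator is then
$$ \big( \Phi_0 ,\, T\{\phi(x), \phi(y)\}\, \Phi_0 \big) = \theta(x - y)\, \Delta_+(x - y) + \theta(y - x)\, \Delta_+(y - x) , \tag{7.3.24} $$
where θ is the step function
$$ \theta(z) \equiv \begin{cases} 1 & z^0 > 0 \\ 0 & z^0 < 0 . \end{cases} \tag{7.3.25} $$
What we need in Eqs. (7.3.19)–(7.3.21) is the Fourier transform of the propagator:
$$
\begin{aligned}
\Delta(q) &\equiv \int e^{-iq\cdot z} \big[ \theta(z)\, \Delta_+(z) + \theta(-z)\, \Delta_+(-z) \big]\, d^4z \\
&= \frac{1}{2E(\mathbf{q})} \left[ \int_0^{\infty} dz^0\, \exp\!\big[ (iq^0 - iE(\mathbf{q}))z^0 \big] + \int_{-\infty}^{0} dz^0\, \exp\!\big[ (iq^0 + iE(\mathbf{q}))z^0 \big] \right] .
\end{aligned}
$$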
We can give meaning to these integrals by inserting convergence factors $\exp(-\epsilon z^0)$ in the first integral and $\exp(+\epsilon z^0)$ in the second integral, where ε is a positive infinitesimal. The integrals are then elementary:
$$ \Delta(q) = \frac{1}{2E(\mathbf{q})} \left[ \frac{1}{\epsilon - iq^0 + iE(\mathbf{q})} + \frac{1}{\epsilon + iq^0 + iE(\mathbf{q})} \right] ; $$
so, for ε → 0,
$$ \Delta(q) \to \frac{-i}{q^2 + m^2 - 2i\epsilon E(\mathbf{q})} , \tag{7.3.26} $$
where $q^2 \equiv \eta_{\mu\nu}\, q^\mu q^\nu = \mathbf{q}^2 - (q^0)^2$. The term $-2i\epsilon E(\mathbf{q})$ in the denominator, though infinitesimal, is important in more complicated calculations where we have to integrate $\Delta(q)$ over a range of its argument in which $q^2 + m^2$ can vanish. (For this purpose, it is only important that it is a negative imaginary infinitesimal, and so is usually written simply as $-i\epsilon$.) In our calculation the integrals over x and y fix the argument of Δ, and we can drop the term $-2i\epsilon E(\mathbf{q})$ in the denominator.8
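The algebra that combines the two elementary integrals into Eq. (7.3.26) can be sketched as follows. Here $E$ stands for $E(\mathbf{q}) = \sqrt{\mathbf{q}^2 + m^2}$, ε is the positive infinitesimal of the convergence factors, and terms of order $\epsilon^2$, as well as ε in the numerator, are dropped:

```latex
\begin{align*}
% Put the two fractions over a common denominator
\frac{1}{\epsilon - iq^0 + iE} + \frac{1}{\epsilon + iq^0 + iE}
  &= \frac{2(\epsilon + iE)}{(\epsilon + iE)^2 + (q^0)^2}
   = \frac{2(\epsilon + iE)}{(q^0)^2 - E^2 + 2i\epsilon E + \epsilon^2} , \\
% Multiply by 1/2E and keep only the leading epsilon dependence
\frac{1}{2E}\cdot\frac{2iE}{(q^0)^2 - E^2 + 2i\epsilon E}
  &= \frac{-i}{E^2 - (q^0)^2 - 2i\epsilon E}
   = \frac{-i}{\mathbf{q}^2 + m^2 - (q^0)^2 - 2i\epsilon E}
   = \frac{-i}{q^2 + m^2 - 2i\epsilon E} .
\end{align*}
```

The last step uses $q^2 = \mathbf{q}^2 - (q^0)^2$, so the energy entering the infinitesimal term is the one evaluated at the same momentum $\mathbf{q}$ as the argument of Δ.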
To do the integrals over x and y in Eqs. (7.3.19)–(7.3.21) we set x = (x − y) + y and integrate separately over x − y and y. In each term the integral over y then simply gives a factor $(2\pi)^4\, \delta^4(p_1' + p_2' - p_1 - p_2)$, which guarantees the conservation of energy and momentum. The integrals over x − y are given by Eq. (7.3.26), and the sum of (7.3.19), (7.3.20), and (7.3.21) then gives the total second-order scattering amplitude:
$$
S_{\mathbf{p}_1'\mathbf{p}_2',\,\mathbf{p}_1\mathbf{p}_2} = \frac{i g^2 (2\pi)^4\, \delta^4(p_1' + p_2' - p_1 - p_2)}{2(2\pi)^6 \sqrt{2E(\mathbf{p}_1') \cdot 2E(\mathbf{p}_2') \cdot 2E(\mathbf{p}_1) \cdot 2E(\mathbf{p}_2)}}
\times \left[ \frac{1}{(p_1 - p_1')^2 + m^2} + \frac{1}{(p_1 - p_2')^2 + m^2} + \frac{1}{(p_1 + p_2)^2 + m^2} \right] .
\tag{7.3.27}
$$
8 The circumstance that all four-momenta are fixed by the delta functions generated by integrals over
spacetime coordinates is true of all tree diagrams – that is, diagrams like Figure 7.1 that can be
disconnected by cutting any single internal line. The contributions to the S-matrix of diagrams with L loops,
whose disconnection requires the cutting of a minimum of L + 1 internal lines, involve integrals over L
four-momenta.
The appearance here of the term $1/[(p_1 - p_1')^2 + m^2]$ may evoke the recollection of an earlier result. In Eq. (5.6.23) we found in the Born approximation that a potential proportional to exp(−κr)/r gives a scattering amplitude proportional to $1/[(\mathbf{k} - \mathbf{k}')^2 + \kappa^2]$, where $\mathbf{k}$ and $\mathbf{k}'$ are the initial and final wave numbers of the scattered particle, or, in units with ħ = 1, the initial and final
momenta. There is no energy term in the denominator in Eq. (5.6.23) because
the scattering was supposed there to be due to an external potential that can
transfer momentum but not energy. Aside from that, the comparison shows
that the exchange of a scalar particle of mass m creates effects, like those of
a Yukawa potential, proportional to exp(−κr)/r, with κ = m in natural units
or, in cgs units, with κ = mc/h̄. This was the point made by Yukawa9 in 1935,
which led him to the prediction of a “meson,” with mass intermediate between
the electron and the proton, to carry the nuclear force.
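The connection between the momentum-space factor and the Yukawa potential can be made explicit with a standard Fourier integral. The block below is a sketch of that relation; the numerical range used at the end is an illustrative assumption and is not taken from the text:

```latex
\begin{align*}
% Standard result: the Fourier transform of 1/(q^2 + kappa^2) is the Yukawa form
\int \frac{d^3q}{(2\pi)^3}\;
  \frac{e^{i\mathbf{q}\cdot\mathbf{r}}}{\mathbf{q}^2 + \kappa^2}
  &= \frac{e^{-\kappa r}}{4\pi r} ,
  \qquad r = |\mathbf{r}| , \\
% The range of the force is 1/kappa; restoring hbar and c
\frac{1}{\kappa} &= \frac{\hbar}{mc}
  \quad\Longrightarrow\quad
  mc^2 = \frac{\hbar c}{1/\kappa} , \\
% Illustration with an ASSUMED range of 1.4 fm and hbar c = 197.3 MeV fm
mc^2 &\approx \frac{197.3\ \mathrm{MeV\,fm}}{1.4\ \mathrm{fm}}
  \approx 141\ \mathrm{MeV} .
\end{align*}
```

A range of order a femtometer thus corresponds to a rest energy of order a hundred MeV, which lies between the rest energies of the electron (about 0.5 MeV) and the proton (about 938 MeV).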
7.4 Antiparticles, Spin, Statistics
The real scalar field discussed in the previous two sections could not describe a particle that carries any conserved quantity, such as electric charge. If the annihilation part $\phi_{\rm an}(x)$ of the field given by Eq. (7.3.17) destroys a certain amount of charge then its adjoint $\phi_{\rm an}^\dagger(x)$ would create the same quantity of charge, and no interaction such as $\phi^3$ constructed from the real field $\phi = \phi_{\rm an} + \phi_{\rm an}^\dagger$ could possibly conserve this quantity. We could construct interactions that conserve charge by separating $\phi_{\rm an}$ and $\phi_{\rm an}^\dagger$ and taking the interaction to include equal numbers of factors of each, such as $\phi_{\rm an}^2\, \phi_{\rm an}^{\dagger 2}$, but then we would not be able to preserve Lorentz invariance. The commutator of $\phi_{\rm an}(x)$ and $\phi_{\rm an}^\dagger(y)$ is the function $\Delta_+(x - y)$ given by Eq. (7.3.23), which does not vanish for x − y space-like, so an interaction such as $\phi_{\rm an}^2\, \phi_{\rm an}^{\dagger 2}$ that treats $\phi_{\rm an}$ and $\phi_{\rm an}^\dagger$ separately would not satisfy the condition (7.3.10), which we have seen is necessary for Lorentz invariance.
So, what to do? The only known way of restoring Lorentz invariance for
charged particles while preserving charge conservation is to take the free field
to be complex, the sum of a term that annihilates a particle and another term
that creates its antiparticle, a particle with the opposite value of electric charge
(and of all other conserved quantities) but the same spin and mass. For spinless
particles this field takes the form
9 H. Yukawa, Proc. Phys.-Math. Soc. Japan 17, 48 (1935). This article is reprinted in Beyer, Foundations of
Nuclear Physics, listed in the bibliography.

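$$
\phi(\mathbf{x}, t) = \int \frac{d^3p}{\sqrt{2E(\mathbf{p})}\,(2\pi)^{3/2}}
\Big[ a(\mathbf{p}) \exp\big(i\mathbf{p}\cdot\mathbf{x} - iE(\mathbf{p})t\big)
+ b^\dagger(\mathbf{p}) \exp\big(-i\mathbf{p}\cdot\mathbf{x} + iE(\mathbf{p})t\big) \Big]
\tag{7.4.1}
$$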
where
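$$[a(\mathbf{p}), a^\dagger(\mathbf{p}')] = [b(\mathbf{p}), b^\dagger(\mathbf{p}')] = \delta^3(\mathbf{p} - \mathbf{p}')\,, \tag{7.4.2}$$
$$[a(\mathbf{p}), a(\mathbf{p}')] = [b(\mathbf{p}), b(\mathbf{p}')] = [a(\mathbf{p}), b(\mathbf{p}')] = 0\,, \tag{7.4.3}$$
$$[a^\dagger(\mathbf{p}), b(\mathbf{p}')] = [b^\dagger(\mathbf{p}), a(\mathbf{p}')] = 0\,. \tag{7.4.4}$$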
In particular, the commutator of $\phi(\mathbf{x}, t)$ with $\phi^\dagger(\mathbf{y}, t)$ vanishes for space-like
x − y because of the same sort of cancellation that we encountered for real scalar
fields. Both terms in ϕ change the electric charge (or any other conserved quantity)
by the same amount, so interactions conserve this charge if they contain
an equal number of factors of $\phi$ and $\phi^\dagger$. This theory was presented in 1934 by
Pauli and Weisskopf,10 in order to contradict Dirac’s view that antiparticles arise
as holes in a sea of negative-energy particles. Antiparticles are indispensable
for Lorentz invariance in any quantum field theory of particles that carry a
conserved quantity, such as electric charge, even if the particles are bosons,
which do not satisfy the Pauli exclusion principle and so could not form a stable
sea of negative-energy particles. Where particles carry no conserved quantity, as
in the previous sections, these particles can be said to be their own antiparticles.
This is the case for the neutral pion and for the $Z^0$ particle.
This sort of free complex field theory can be derived as a consequence of a
free-field Lagrangian density, of the form
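$$\mathcal{L} = -\eta^{\mu\nu}\frac{\partial\phi^\dagger}{\partial x^\mu}\frac{\partial\phi}{\partial x^\nu} - m^2\phi^\dagger\phi\,. \tag{7.4.5}$$
The Euler–Lagrange field equations are again
$$(\Box - m^2)\phi = 0 \tag{7.4.6}$$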
whose general complex solution is Eq. (7.4.1), with spacetime-independent op-
erator coefficients a and b† . But now the canonical conjugate to ϕ is the time
derivative of an independent canonical variable, the adjoint:
$$\pi(\mathbf{x}, t) = \frac{\partial}{\partial t}\phi^\dagger(\mathbf{x}, t)\,, \tag{7.4.7}$$
while ∂ϕ/∂t is the canonical conjugate to $\phi^\dagger$. The canonical commutation relations
(7.1.10) then yield the commutation relations (7.4.2)–(7.4.4).
10 W. Pauli and V. F. Weisskopf, Helv. Phys. Acta 7, 709 (1934).

The particles described so far are bosons. The multi-particle state vectors
here are
$$a^\dagger(\mathbf{p}_1)a^\dagger(\mathbf{p}_2)a^\dagger(\mathbf{p}_3)\cdots b^\dagger(\mathbf{p}_1')b^\dagger(\mathbf{p}_2')b^\dagger(\mathbf{p}_3')\cdots\Psi_0\,, \tag{7.4.8}$$
where $\Psi_0$ is the vacuum, satisfying $a(\mathbf{p})\Psi_0 = b(\mathbf{p})\Psi_0 = 0$. By taking the
adjoints of the commutation relations (7.4.3), we see that these particles are
bosons; the states (7.4.8) are completely symmetric under interchanges of the
labels $\mathbf{p}_1$, $\mathbf{p}_2$, etc. of the particles and under interchanges of the labels $\mathbf{p}_1'$, $\mathbf{p}_2'$,
etc. of the antiparticles.
Suppose we wanted to construct a theory of spinless neutral fermions. We
could suppose that all the commutators [A, B] ≡ [A, B]− ≡ AB − BA
that we previously derived from the canonical commutation relations are now
replaced with anticommutators, [A, B]+ ≡ AB + BA. For instance, we can try
introducing a real scalar field like (7.3.16):
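$$\phi(\mathbf{x}, t) = \phi_{\rm an}(\mathbf{x}, t) + \phi_{\rm an}^\dagger(\mathbf{x}, t)\,,\qquad
\phi_{\rm an}(\mathbf{x}, t) = \int\frac{d^3p}{(2\pi)^{3/2}\sqrt{2E(\mathbf{p})}}\exp\big(i\mathbf{p}\cdot\mathbf{x} - iE(\mathbf{p})t\big)\,a(\mathbf{p})\,,$$
but we now suppose that the annihilation and creation operators satisfy the
anticommutation rules:
$$[a(\mathbf{p}), a^\dagger(\mathbf{p}')]_+ = \delta^3(\mathbf{p} - \mathbf{p}')\,, \tag{7.4.9}$$
$$[a(\mathbf{p}), a(\mathbf{p}')]_+ = [a^\dagger(\mathbf{p}), a^\dagger(\mathbf{p}')]_+ = 0\,. \tag{7.4.10}$$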
The anticommutation relations (7.4.10) imply the complete antisymmetry of the
multi-particle state vector
$$a^\dagger(\mathbf{p}_1)a^\dagger(\mathbf{p}_2)a^\dagger(\mathbf{p}_3)\cdots\Psi_0 \tag{7.4.11}$$
under interchange of the labels p1 , p2 , etc. of the particles, as required for
fermions.
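The way anticommutation rules force antisymmetry can be checked in a small toy model. The sketch below is only an illustration under stated assumptions: the continuous momentum label is replaced by three discrete mode labels (so the delta function becomes a Kronecker delta), states are stored as dictionaries from occupation tuples to amplitudes, and the sign convention is the usual Jordan–Wigner one. None of these choices comes from the text.

```python
from itertools import product

N_MODES = 3  # hypothetical discrete stand-ins for the momentum labels

def create(k, state):
    """Apply a_k^dagger to a state stored as {occupation tuple: amplitude}."""
    out = {}
    for occ, amp in state.items():
        if occ[k] == 1:
            continue  # (a_k^dagger)^2 = 0: the mode is already occupied
        sign = (-1) ** sum(occ[:k])  # sign from the occupied modes ordered before k
        new = occ[:k] + (1,) + occ[k + 1:]
        out[new] = out.get(new, 0) + sign * amp
    return out

def annihilate(k, state):
    """Apply a_k to a state."""
    out = {}
    for occ, amp in state.items():
        if occ[k] == 0:
            continue  # nothing to remove
        sign = (-1) ** sum(occ[:k])
        new = occ[:k] + (0,) + occ[k + 1:]
        out[new] = out.get(new, 0) + sign * amp
    return out

def add(s1, s2):
    """Sum of two states, dropping components with zero amplitude."""
    out = dict(s1)
    for occ, amp in s2.items():
        out[occ] = out.get(occ, 0) + amp
    return {occ: amp for occ, amp in out.items() if amp != 0}

vacuum = {(0,) * N_MODES: 1}

# Interchanging two labels flips the sign of the two-particle state.
state_01 = create(0, create(1, vacuum))
state_10 = create(1, create(0, vacuum))
print(state_01, state_10)

# Discrete analogue of Eq. (7.4.9): a_j a_k^dagger + a_k^dagger a_j = delta_jk on every basis state.
for occ in product((0, 1), repeat=N_MODES):
    basis = {occ: 1}
    for j in range(N_MODES):
        for k in range(N_MODES):
            lhs = add(annihilate(j, create(k, basis)), create(k, annihilate(j, basis)))
            assert lhs == (basis if j == k else {})
```

The two printed states differ only by an overall sign, and acting twice with the same creation operator gives the empty (zero) state, which is the exclusion principle in this toy setting.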
In place of the vanishing of the equal-time commutator [ϕ(x, t), ϕ(y, t)] we
now have
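$$[\phi(\mathbf{x}, t), \phi(\mathbf{y}, t)]_+ = [\phi_{\rm an}(\mathbf{x}, t), \phi_{\rm an}^\dagger(\mathbf{y}, t)]_+ + [\phi_{\rm an}^\dagger(\mathbf{x}, t), \phi_{\rm an}(\mathbf{y}, t)]_+
= \Delta_+(\mathbf{x} - \mathbf{y}, 0) + \Delta_+(\mathbf{y} - \mathbf{x}, 0)$$
where $\Delta_+(\mathbf{x} - \mathbf{y}, x^0 - y^0)$ is the function (7.3.23), which at equal times is
non-zero and even:
$$\Delta_+(\mathbf{x} - \mathbf{y}, 0) = \int\frac{d^3p}{2E(\mathbf{p})(2\pi)^3}\,e^{i\mathbf{p}\cdot(\mathbf{x}-\mathbf{y})} = \frac{m\,K_1(m|\mathbf{x} - \mathbf{y}|)}{4\pi^2|\mathbf{x} - \mathbf{y}|}\,.$$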
We see that here the two terms in [ϕ(x, t), ϕ(y, t)]+ do not cancel but add,
unlike the bosonic case. It is in fact impossible to construct scalar fields that
anticommute at equal times for spinless fermions.
Though it is not possible here to go into so much detail, the same sort of
analysis leads to the general conclusion cited in Section 5.5, that integer-spin
particles (including spinless particles) must be bosons, while particles with half-
odd-integer spin must be fermions. The free fields for spinning particles take the
general form
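$$\phi_n(\mathbf{x}, t) = \phi_{n,{\rm an}}(\mathbf{x}, t) + \phi_{n,{\rm cr}}(\mathbf{x}, t)\,, \tag{7.4.12}$$
$$\phi_{n,{\rm an}}(\mathbf{x}, t) = \sum_\sigma\int\frac{d^3p}{(2\pi)^{3/2}\sqrt{2E(\mathbf{p})}}\,u_n(\mathbf{p}, \sigma)\exp\big(i\mathbf{p}\cdot\mathbf{x} - iE(\mathbf{p})t\big)\,a(\mathbf{p}, \sigma)\,, \tag{7.4.13}$$
$$\phi_{n,{\rm cr}}(\mathbf{x}, t) = \sum_\sigma\int\frac{d^3p}{(2\pi)^{3/2}\sqrt{2E(\mathbf{p})}}\,v_n(\mathbf{p}, \sigma)\exp\big(-i\mathbf{p}\cdot\mathbf{x} + iE(\mathbf{p})t\big)\,b^\dagger(\mathbf{p}, \sigma)\,. \tag{7.4.14}$$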
Here a(p, σ ) is the operator that annihilates a particle of momentum p and spin
3-component σ ; b† (p, σ ) is the operator that creates its antiparticle (so that
both terms in ϕ have the same effect on the charge and all other conserved
quantities); and un (p, σ ) and vn (p, σ ) are functions about which more later. For
neutral particles that are their own antiparticles, a(p, σ ) = b(p, σ ). For bosons
or fermions, the operators a and b satisfy the commutation or anticommutation
relations
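$$[a(\mathbf{p}, \sigma), a^\dagger(\mathbf{p}', \sigma')]_\mp = [b(\mathbf{p}, \sigma), b^\dagger(\mathbf{p}', \sigma')]_\mp = \delta_{\sigma\sigma'}\,\delta^3(\mathbf{p} - \mathbf{p}')\,, \tag{7.4.15}$$
$$[a(\mathbf{p}, \sigma), a(\mathbf{p}', \sigma')]_\mp = [b(\mathbf{p}, \sigma), b(\mathbf{p}', \sigma')]_\mp = [a(\mathbf{p}, \sigma), b(\mathbf{p}', \sigma')]_\mp = 0\,, \tag{7.4.16}$$
$$[a^\dagger(\mathbf{p}, \sigma), b(\mathbf{p}', \sigma')]_\mp = [b^\dagger(\mathbf{p}, \sigma), a(\mathbf{p}', \sigma')]_\mp = 0\,, \tag{7.4.17}$$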
the ∓ signs being minus, denoting commutators, for bosons or plus, denoting
anticommutators, for fermions.
The functions un (p, σ ) and vn (p, σ ) are governed by what is assumed for the
Lorentz-transformation property of the fields. Under a Lorentz transformation
$x^\mu \to x'^\mu = \Lambda^\mu{}_\nu x^\nu$, the various fields $\phi_n(x)$ undergo various matrix transformations11
$$\phi_n(x) \to \phi'_n(x') = \sum_m D_{nm}(\Lambda)\,\phi_m(x)\,, \tag{7.4.18}$$
11 These transformations are often written as actions of a quantum-mechanical operator $U(\Lambda)$, as
$U(\Lambda)\phi_n(x)U^{-1}(\Lambda) = \sum_m D^{-1}_{nm}(\Lambda)\,\phi_m(\Lambda x)$. This is the same as Eq. (7.4.18) if we identify
$\phi'_n(x) = U^{-1}(\Lambda)\phi_n(x)U(\Lambda)$.
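so that for an observer who uses coordinates $x'^\mu = \Lambda^\mu{}_\nu x^\nu$ the field is related
by a matrix D to the field for an observer who uses coordinates $x^\mu$ at the same
spacetime point, a point that is thus given different coordinates by the two
observers. When we perform two successive Lorentz transformations $\Lambda_1$ and
then $\Lambda_2$, the effect on the fields is
$$\phi_n(x) \to \phi_n^{\Lambda_1}(\Lambda_1 x) = \sum_l D_{nl}(\Lambda_1)\,\phi_l(x)
\to \sum_l D_{nl}(\Lambda_1)\,\phi_l^{\Lambda_2}(\Lambda_2 x) = \sum_{m,l} D_{nl}(\Lambda_1)D_{lm}(\Lambda_2)\,\phi_m(x)\,,$$
while the effect of the compound Lorentz transformation $(\Lambda_1\Lambda_2)^\mu{}_\nu = \Lambda_1{}^\mu{}_\rho\,\Lambda_2{}^\rho{}_\nu$
is
$$\phi_n(x) \to \phi_n^{\Lambda_1\Lambda_2}(\Lambda_1\Lambda_2 x) = \sum_m D_{nm}(\Lambda_1\Lambda_2)\,\phi_m(x)\,.$$
These transformations must be the same, so
$$D(\Lambda_1)D(\Lambda_2) = D(\Lambda_1\Lambda_2)\,, \tag{7.4.19}$$
where $[D(\Lambda_1)D(\Lambda_2)]_{nm}$ is the usual matrix product:
$$[D(\Lambda_1)D(\Lambda_2)]_{nm} = \sum_l D_{nl}(\Lambda_1)D_{lm}(\Lambda_2)\,.$$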
Such matrices are said to form a representation of the group of Lorentz transfor-
mations. We classify the various kinds of field according to the representation
they furnish of the Lorentz group.
It is always possible to write the Lagrangian density in terms of fields that
are irreducible, in the sense that their components cannot be divided into sets
that, under Lorentz (and perhaps space inversion) transformations, transform
only into linear combinations of the field components in the same set. Among
these irreducible fields are a single scalar field for spin zero, for which $D(\Lambda)$
is the unit matrix, or a single four-vector field for spin one, for which $D(\Lambda)$ is
$\Lambda$ itself. For spin 1/2 there is the four-component Dirac field, briefly described
in the appendix to this section. For our present purposes, the important thing
about irreducible fields is that the coefficient functions un (p, σ ) and vn (p, σ )
are uniquely determined up to constant factors by what is assumed for the
Lorentz-transformation properties of the fields and the spin of the particles.
As discussed in the previous section, for the Lorentz invariance of the theory
it is not enough that the interaction Hamiltonian density $\mathcal{H}$ should be a scalar; it
also has to satisfy the condition (7.3.11), that $\mathcal{H}(x)$ should commute with $\mathcal{H}(y)$
at equal times $x^0 = y^0$. For this, it is necessary that $\mathcal{H}$ should be formed from
bosonic fields that all commute with each other at equal times, plus some even
number (perhaps zero) of fermionic fields that anticommute with each other at
equal times (and commute with the bosonic fields at equal times).
For particles that are not their own antiparticles, the commutators or anticom-
mutators of the ϕn with each other and of the ϕn† with each other trivially vanish.
On the other hand, the equal-time commutator or anticommutator of any field
with its adjoint is
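$$[\phi_n(\mathbf{x}, t), \phi_m^\dagger(\mathbf{y}, t)]_\mp = \Delta_{nm}(\mathbf{x} - \mathbf{y}) \mp \tilde{\Delta}_{nm}(\mathbf{y} - \mathbf{x}) \tag{7.4.20}$$
where
$$\Delta_{nm}(\mathbf{x} - \mathbf{y}) \equiv \sum_\sigma\int\frac{d^3p}{2E(\mathbf{p})(2\pi)^3}\,u_n(\mathbf{p}, \sigma)\,u_m^*(\mathbf{p}, \sigma)\,e^{i\mathbf{p}\cdot(\mathbf{x}-\mathbf{y})}\,, \tag{7.4.21}$$
$$\tilde{\Delta}_{nm}(\mathbf{x} - \mathbf{y}) \equiv \sum_\sigma\int\frac{d^3p}{2E(\mathbf{p})(2\pi)^3}\,v_n(\mathbf{p}, \sigma)\,v_m^*(\mathbf{p}, \sigma)\,e^{i\mathbf{p}\cdot(\mathbf{x}-\mathbf{y})}\,. \tag{7.4.22}$$
The first and second terms on the right of Eq. (7.4.20) come respectively
from the commutator or anticommutator of the annihilation part of $\phi_n(\mathbf{x}, t)$
with the creation part of $\phi_m^\dagger(\mathbf{y}, t)$ and from the commutator or anticommutator
of the creation part of $\phi_n(\mathbf{x}, t)$ with the annihilation part of $\phi_m^\dagger(\mathbf{y}, t)$. (The
crucial ∓ sign that distinguishes bosons from fermions appears in the second
term of Eq. (7.4.20) because this term comes from the part of the commutator
or anticommutator of $\phi_n$ with $\phi_m^\dagger$ in which $b^\dagger$ appears to the left of $b$.) Detailed
calculations beyond the scope of this book show that12
$$\tilde{\Delta}_{nm}(\mathbf{y} - \mathbf{x}) = (-1)^{2j}|\lambda|^2\,\Delta_{nm}(\mathbf{x} - \mathbf{y}) \tag{7.4.23}$$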
where j is the particle spin and λ depends on how the un and vn are normalized.
(If we multiply un and vn by factors α and β then λ is changed by a factor β/α.)
For equal-time commutators or anticommutators of fields and their adjoints to
vanish, the two terms in Eq. (7.4.20) must cancel. For this we need
$$|\lambda|^2(-1)^{2j} = \pm 1\,,$$
with the top sign for bosons and the bottom sign for fermions. This requires that
|λ| = 1, which can always be arranged by adjusting the relative normalization of
un and vn , and thereby imposes a relation between the strengths of interactions
of particles and antiparticles. But with |λ| = 1, we also need
$$(-1)^{2j} = \pm 1\,. \tag{7.4.24}$$
This is the famous connection between spin and statistics:13 particles with j an
integer are bosons, and particles with j a half odd integer are fermions.
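The bookkeeping behind Eq. (7.4.24) is simple enough to spell out in a few lines of Python. This is only a sketch with illustrative function names: it takes $|\lambda| = 1$, evaluates the sign $(-1)^{2j}$, and checks that the factor $1 \mp (-1)^{2j}$ left over from the two terms of Eq. (7.4.20) vanishes only for the correct choice of commutator (upper sign) or anticommutator (lower sign).

```python
from fractions import Fraction

def statistics_from_spin(j):
    """Return 'boson' or 'fermion' for spin j, using the sign of (-1)^(2j)."""
    two_j = 2 * Fraction(j)
    if two_j < 0 or two_j.denominator != 1:
        raise ValueError("spin must be a non-negative integer or half-odd-integer")
    return "boson" if (-1) ** int(two_j) == 1 else "fermion"

def leftover_factor(j, statistics):
    """Factor 1 -/+ (-1)^(2j) multiplying the first term of Eq. (7.4.20) when |lambda| = 1.

    The upper sign (commutator) is used for 'boson', the lower sign
    (anticommutator) for 'fermion'.
    """
    sign = (-1) ** int(2 * Fraction(j))
    return 1 - sign if statistics == "boson" else 1 + sign

for spin in (0, Fraction(1, 2), 1, Fraction(3, 2), 2):
    kind = statistics_from_spin(spin)
    assert leftover_factor(spin, kind) == 0  # the consistent assignment cancels
    print(f"j = {spin}: {kind}")
```

With the wrong assignment, for example treating a spin-1/2 particle as a boson, the leftover factor is 2 rather than 0, so the field would fail to commute with its adjoint at space-like separation.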
12 For a textbook treatment, see e.g. S. Weinberg, The Quantum Theory of Fields, Vol. I (Cambridge
University Press, Cambridge, UK, 1995), Section 5.7.
13 M. Fierz, Helv. Phys. Acta 12, 3 (1939); W. Pauli, Phys. Rev. 58, 716 (1940).
Appendix: Dirac Fields
In 1928 Dirac introduced a relativistic wave equation14 that he thought would

provide the basis for a formulation of quantum mechanics consistent with spe-
cial relativity. About this program he was wrong; the successful relativistic
formulation of quantum mechanics turned out to take the form of quantum field
theory. But his equation survives as the field equation of the quantum fields for
particles of spin 1/2, and their antiparticles, and leads to some of the same
consequences, such as formulas for the fine structure of atomic spectra. This
appendix provides just a sketch of Dirac’s formalism, skipping most proofs.
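The Dirac field is a set of four operators $\psi_n(x)$, characterized by their Lorentz
transformations: for $x \to \Lambda x$,
$$\psi_n(x) \to \psi'_n(\Lambda x) = \sum_m D_{nm}(\Lambda)\,\psi_m(x)\,, \tag{7.4.25}$$
with the matrix $D(\Lambda)$ furnishing a representation of the Lorentz group with the
special property that
$$D^{-1}(\Lambda)\,\gamma^\mu\,D(\Lambda) = \Lambda^\mu{}_\nu\,\gamma^\nu \tag{7.4.26}$$
where the $\gamma^\mu$ are a set of four 4 × 4 matrices satisfying the anticommutation
relations
$$\gamma^\mu\gamma^\nu + \gamma^\nu\gamma^\mu = 2\eta^{\mu\nu} \equiv 2\times
\begin{cases} +1 & \mu = \nu = 1, 2, 3 \\ -1 & \mu = \nu = 0 \\ \;\;\,0 & \mu \neq \nu\,. \end{cases} \tag{7.4.27}$$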
This allows a Lorentz-invariant first-order free-field equation for mass m:

$$\left(\gamma^\mu\frac{\partial}{\partial x^\mu} + m\right)\psi(x) = 0\,. \tag{7.4.28}$$
Using the commutativity of partial derivatives and the anticommutation rules
(7.4.27), we see that Eq. (7.4.28) has the consequence
$$0 = \left(\gamma^\mu\frac{\partial}{\partial x^\mu} - m\right)\left(\gamma^\nu\frac{\partial}{\partial x^\nu} + m\right)\psi = (\Box - m^2)\psi\,.$$
For this reason, Dirac thought of his equation as a sort of square root of the relativistic
Schrödinger (or Klein–Gordon) free-particle equation $(\Box - m^2)\psi = 0$.
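The anticommutation relations (7.4.27) and the "square root" property can be checked numerically with any explicit set of gamma matrices. The sketch below uses one possible representation, chosen here as an assumption and not taken from the text: $\gamma^0 = -i\begin{pmatrix}0&1\\1&0\end{pmatrix}$ and $\gamma^k = -i\begin{pmatrix}0&\sigma_k\\-\sigma_k&0\end{pmatrix}$ in 2 × 2 blocks, with $\sigma_k$ the Pauli matrices. In momentum space, where $\partial/\partial x^\mu \to ip_\mu$, the operator identity above becomes $(i\gamma\cdot p - m)(i\gamma\cdot p + m) = -(p^2 + m^2)$, with $p^2 = \eta_{\mu\nu}p^\mu p^\nu$.

```python
def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def scale(s, m):
    return [[s * x for x in row] for row in m]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def block(a, b, c, d):
    """Assemble a 4x4 matrix from the 2x2 blocks [[a, b], [c, d]]."""
    return [a[0] + b[0], a[1] + b[1], c[0] + d[0], c[1] + d[1]]

def close(a, b, tol=1e-9):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

I2 = [[1, 0], [0, 1]]
Z2 = [[0, 0], [0, 0]]
SIGMA = [[[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]]
I4 = [[1 if i == j else 0 for j in range(4)] for i in range(4)]

# Assumed representation; list index 0 is the time component, 1..3 are spatial.
GAMMA = [scale(-1j, block(Z2, I2, I2, Z2))]
GAMMA += [scale(-1j, block(Z2, s, scale(-1, s), Z2)) for s in SIGMA]

def eta(mu, nu):
    """Metric: +1 for mu = nu = 1, 2, 3; -1 for mu = nu = 0; 0 otherwise."""
    if mu != nu:
        return 0
    return -1 if mu == 0 else 1

# Eq. (7.4.27): gamma^mu gamma^nu + gamma^nu gamma^mu = 2 eta^{mu nu}.
clifford_ok = all(
    close(add(matmul(GAMMA[mu], GAMMA[nu]), matmul(GAMMA[nu], GAMMA[mu])),
          scale(2 * eta(mu, nu), I4))
    for mu in range(4) for nu in range(4))

def slash(p):
    """eta_{mu nu} gamma^mu p^nu for p = (p^0, p^1, p^2, p^3)."""
    total = scale(-p[0], GAMMA[0])
    for k in (1, 2, 3):
        total = add(total, scale(p[k], GAMMA[k]))
    return total

m = 1.0
p = (2.0, 0.3, -0.4, 1.2)  # arbitrary four-momentum, deliberately off the mass shell
p_squared = -p[0] ** 2 + sum(c * c for c in p[1:])
i_slash = scale(1j, slash(p))
product = matmul(add(i_slash, scale(-m, I4)), add(i_slash, scale(m, I4)))
square_root_ok = close(product, scale(-(p_squared + m * m), I4))
print(clifford_ok, square_root_ok)
```

Any other representation related to this one by a similarity transformation would pass the same checks, since both tests depend only on the anticommutation relations.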
The general solution of Eq. (7.4.28) is

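$$\psi_n(x) = \sum_\sigma\int\frac{d^3p}{(2\pi)^{3/2}\sqrt{2E(\mathbf{p})}}
\Big[e^{ip\cdot x}\,u_n(\mathbf{p}, \sigma)\,a(\mathbf{p}, \sigma) + e^{-ip\cdot x}\,v_n(\mathbf{p}, \sigma)\,b^\dagger(\mathbf{p}, \sigma)\Big] \tag{7.4.29}$$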
14 P. A. M. Dirac, Proc. Roy. Soc. (London) A117, 610 (1928).

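where $p\cdot x \equiv \eta_{\mu\nu}p^\mu x^\nu$; $p^0 = E(\mathbf{p}) = +(\mathbf{p}^2 + m^2)^{1/2}$; the $u_n(\mathbf{p}, \sigma)$ and
$v_n(\mathbf{p}, \sigma)$ are independent solutions of the equations
$$\big(i\eta_{\mu\nu}\gamma^\mu p^\nu + m\big)u = 0\,, \tag{7.4.30}$$
$$\big(-i\eta_{\mu\nu}\gamma^\mu p^\nu + m\big)v = 0\,; \tag{7.4.31}$$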
and a(p, σ ) and b(p, σ ) are operator coefficients, with σ labeling the indepen-
dent solutions of Eqs. (7.4.30) and (7.4.31).
We can count the number of independent solutions, noting that any column
$w_n$ can be decomposed as
$$w = w_+ + w_-\,,\qquad \big(i\eta_{\mu\nu}\gamma^\mu p^\nu \pm m\big)w_\pm = 0\,,$$
by taking
$$w_\pm = \frac{\mp i\eta_{\mu\nu}\gamma^\mu p^\nu + m}{2m}\,w\,.$$
Thus, with a total of four components, there must be just two independent
un (p, σ ) satisfying Eq. (7.4.30) and two independent vn (p, σ ) satisfying
Eq. (7.4.31). The index σ therefore takes just two values, corresponding to
the two values of the third component of spin for a particle of spin 1/2. Dirac
thought that the solutions $e^{-ip\cdot x}v_n(\mathbf{p}, \sigma)$ were the wave functions for a free
negatively charged electron with negative energy; instead, just as we saw for
scalar fields, they are the coefficients of the creation operator b† for a positively
charged antielectron, or positron, of positive energy.
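The counting of solutions can be made concrete by building the two matrices $(\mp i\eta_{\mu\nu}\gamma^\mu p^\nu + m)/2m$ for an on-shell momentum and checking that they are complementary projectors of trace 2. The sketch below is an illustration under assumptions: it reuses one possible gamma-matrix representation ($\gamma^0 = -i\begin{pmatrix}0&1\\1&0\end{pmatrix}$, $\gamma^k = -i\begin{pmatrix}0&\sigma_k\\-\sigma_k&0\end{pmatrix}$ in 2 × 2 blocks), and the momentum and mass values are arbitrary.

```python
import math

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def scale(s, m):
    return [[s * x for x in row] for row in m]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def block(a, b, c, d):
    """Assemble a 4x4 matrix from the 2x2 blocks [[a, b], [c, d]]."""
    return [a[0] + b[0], a[1] + b[1], c[0] + d[0], c[1] + d[1]]

def close(a, b, tol=1e-9):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

I2, Z2 = [[1, 0], [0, 1]], [[0, 0], [0, 0]]
SIGMA = [[[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]]
I4 = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
ZERO4 = [[0] * 4 for _ in range(4)]

# Assumed representation; list index 0 is the time component.
GAMMA = [scale(-1j, block(Z2, I2, I2, Z2))]
GAMMA += [scale(-1j, block(Z2, s, scale(-1, s), Z2)) for s in SIGMA]

mass = 1.0
p_vec = (0.3, -0.4, 1.2)                                # arbitrary three-momentum
energy = math.sqrt(sum(c * c for c in p_vec) + mass ** 2)  # on-shell p^0 = E(p)

# i * eta_{mu nu} gamma^mu p^nu = i * (gamma^k p^k - gamma^0 E)
slash = scale(-energy, GAMMA[0])
for k in (1, 2, 3):
    slash = add(slash, scale(p_vec[k - 1], GAMMA[k]))
i_slash = scale(1j, slash)

def projector(sign):
    """(-/+ i slash + m)/(2m); sign = +1 selects w_+, sign = -1 selects w_-."""
    return scale(1 / (2 * mass), add(scale(-sign, i_slash), scale(mass, I4)))

P_plus, P_minus = projector(+1), projector(-1)

complete = close(add(P_plus, P_minus), I4)
idempotent = (close(matmul(P_plus, P_plus), P_plus)
              and close(matmul(P_minus, P_minus), P_minus))
u_equation = close(matmul(add(i_slash, scale(mass, I4)), P_plus), ZERO4)            # Eq. (7.4.30)
v_equation = close(matmul(add(scale(-1, i_slash), scale(mass, I4)), P_minus), ZERO4)  # Eq. (7.4.31)

# For a projector the trace equals the number of independent vectors it keeps.
rank_plus = round(sum(P_plus[i][i] for i in range(4)).real)
rank_minus = round(sum(P_minus[i][i] for i in range(4)).real)
print(complete, idempotent, u_equation, v_equation, rank_plus, rank_minus)
```

Both traces come out as 2, matching the statement that the four components split into two independent u solutions and two independent v solutions.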
In forming a Lagrangian density $\mathcal{L}(x)$, we need to include both fields and
their adjoints in such a way that $\mathcal{L}(x)$ is a scalar. Here $\sum_n\psi_n^\dagger\psi_n$ is not a
scalar, but here and more generally there is always a matrix $\beta_{nm}$ for which
$\sum_{n,m}\psi_n^\dagger\beta_{nm}\psi_m$ does transform as a scalar. This is because for any matrices
A and B, we have $(AB)^{\dagger-1} = (B^\dagger A^\dagger)^{-1} = A^{\dagger-1}B^{\dagger-1}$, so the inverse of the
adjoint $D^\dagger_{nm} \equiv D^*_{mn}$ satisfies the same multiplication rule (7.4.19) as D itself:
$$D^{\dagger-1}(\Lambda_1)\,D^{\dagger-1}(\Lambda_2) = D^{\dagger-1}(\Lambda_1\Lambda_2)\,. \tag{7.4.32}$$
This does not mean that $D^{\dagger-1}(\Lambda) = D(\Lambda)$, but for irreducible representations
they are equal up to a similarity transformation; there is a matrix β for which
$$D^{\dagger-1}(\Lambda) = \beta D(\Lambda)\beta^{-1}\,,$$
or, multiplying on the left with $D^\dagger$ and on the right with β,
$$D^\dagger(\Lambda)\,\beta\,D(\Lambda) = \beta\,. \tag{7.4.33}$$
It follows then that we can define a covariant adjoint
$$\bar{\psi}_m(x) \equiv \sum_n\psi_n^\dagger(x)\,\beta_{nm}\,, \tag{7.4.34}$$
such that the effect of a Lorentz transformation $x \to \Lambda x$ is
$$\bar{\psi}_m(x) \to \bar{\psi}'_m(\Lambda x) = \sum_n\psi_n^\dagger(x)\,[D^\dagger(\Lambda)\beta]_{nm} = \sum_n\bar{\psi}_n(x)\,[D^{-1}(\Lambda)]_{nm}\,. \tag{7.4.35}$$
Thus not only is $\bar{\psi}(x)\psi(x) \equiv \sum_m\bar{\psi}_m(x)\psi_m(x)$ a scalar, but also (in the same
abbreviated notation) $\bar{\psi}(x)\gamma^\mu\psi(x)$ is a four-vector.
It is now easy to construct a Lorentz-invariant free-field Lagrangian density
from which follows the field equation (7.4.28):

$$\mathcal{L}_0 = -\bar{\psi}\left(\gamma^\mu\frac{\partial}{\partial x^\mu} + m\right)\psi\,. \tag{7.4.36}$$
Without going into details, the canonical anticommutation relations here give
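$$[a(\mathbf{p}, \sigma), a^\dagger(\mathbf{p}', \sigma')]_+ = [b(\mathbf{p}, \sigma), b^\dagger(\mathbf{p}', \sigma')]_+ = \delta_{\sigma\sigma'}\,\delta^3(\mathbf{p} - \mathbf{p}') \tag{7.4.37}$$
$$[a(\mathbf{p}, \sigma), a(\mathbf{p}', \sigma')]_+ = [b(\mathbf{p}, \sigma), b(\mathbf{p}', \sigma')]_+
= [a(\mathbf{p}, \sigma), b(\mathbf{p}', \sigma')]_+ = [a(\mathbf{p}, \sigma), b^\dagger(\mathbf{p}', \sigma')]_+ = 0\,, \tag{7.4.38}$$
provided the solutions of Eqs. (7.4.30) and (7.4.31) are normalized so that
$$\sum_\sigma u_n(\mathbf{p}, \sigma)\,\bar{u}_m(\mathbf{p}, \sigma) = [-ip^\mu\gamma_\mu + m]_{nm}\,,\qquad
\sum_\sigma v_n(\mathbf{p}, \sigma)\,\bar{v}_m(\mathbf{p}, \sigma) = [-ip^\mu\gamma_\mu - m]_{nm}\,. \tag{7.4.39}$$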
(As usual, [A, B]+ is defined as AB + BA.) The anticommutator of the Dirac
field with its adjoint is given by

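$$[\psi_n(x), \bar{\psi}_m(y)]_+ = \left[-\gamma^\mu\frac{\partial}{\partial x^\mu} + m\right]_{nm}\int\frac{d^3p}{2p^0(2\pi)^3}\Big[e^{ip\cdot(x-y)} - e^{-ip\cdot(x-y)}\Big]\,, \tag{7.4.40}$$
which obviously vanishes for $x^0 = y^0$ and hence for all space-like x − y, as
required by Lorentz invariance.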
We can include the interaction of the Dirac field of the electron with the
electromagnetic vector potential using the prescription given in the following
section. Replacing $\partial\psi/\partial x^\mu$ in the free-field Lagrangian density with
$\partial\psi/\partial x^\mu + ieA_\mu\psi$ gives the Lagrangian for electrons and positrons and their
interaction with electromagnetism:
$$\mathcal{L}_{\rm Dirac} = -\bar{\psi}\left[\gamma^\mu\left(\frac{\partial}{\partial x^\mu} + ieA_\mu\right) + m_e\right]\psi\,. \tag{7.4.41}$$
This yields the Euler–Lagrange field equation for a Dirac field interacting with
any electromagnetic field:
$$\left[\gamma^\mu\left(\frac{\partial}{\partial x^\mu} + ieA_\mu\right) + m_e\right]\psi = 0\,. \tag{7.4.42}$$
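The Dirac wave function used in Dirac’s calculations was not the quantum
field ψ, but its matrix elements:
$$\psi_{An}(x) \equiv \big(\Psi_{\rm vac}, \psi_n(x)\Psi_A\big)\,,\qquad \psi_{Bn}(x) \equiv \big(\Psi_B, \psi_n(x)\Psi_{\rm vac}\big) \tag{7.4.43}$$
where $\Psi_A$ and $\Psi_B$ are states of charge −e and +e, respectively, such as states of
an electron and a positron in the electromagnetic field of an atom. These wave
functions satisfy the same equation as the field:
$$\left[\gamma^\mu\left(\frac{\partial}{\partial x^\mu} + ieA_\mu\right) + m_e\right]\psi_A(x) = \left[\gamma^\mu\left(\frac{\partial}{\partial x^\mu} + ieA_\mu\right) + m_e\right]\psi_B(x) = 0\,. \tag{7.4.44}$$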
For a time-independent electromagnetic field, the time dependence of the Dirac
field is governed by a time-independent Hamiltonian H in the Heisenberg picture,
so, for states $\Psi_A$ and $\Psi_B$ with energy $E_A$ and $E_B$, the wave functions have
the time dependences
$$\psi_{An}(\mathbf{x}, t) \propto e^{-iE_At}\,,\qquad \psi_{Bn}(\mathbf{x}, t) \propto e^{+iE_Bt}\,. \tag{7.4.45}$$
The different sign of the argument of $e^{+iE_Bt}$ does not arise because the state
$\Psi_B$ has negative energy, but because it appears to the left of the Dirac field in
the definition (7.4.43) of the wave function $\psi_{Bn}(\mathbf{x}, t)$. From solutions of the
wave equation (7.4.44) for $\psi_{An}(\mathbf{x}, t)$ with time dependence given by (7.4.45) and a pure
Coulomb field $A^0 = Ze/r$, $\mathbf{A} = 0$, Dirac was able to calculate the energies of
the states of hydrogenic atoms, including their fine structure:
$$E(nj) = m_e\left[1 - \frac{Z^2e^4}{2n^2} + \frac{Z^4e^8}{n^4}\left(\frac{3}{8} - \frac{n}{2j + 1}\right) + \cdots\right] \tag{7.4.46}$$
with no dependence on the orbital angular momentum $\ell$.
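To get a feeling for the sizes involved, the expansion can be evaluated numerically in the form $E(nj) = m_e[1 - (Ze^2)^2/2n^2 + ((Ze^2)^4/n^4)(3/8 - n/(2j+1))]$, with $e^2$ equal to the fine-structure constant in units with ħ = c = 1. The Python sketch below does this for hydrogen. The numerical values of the fine-structure constant and the electron rest energy are standard reference values assumed here, not quoted in the text, and terms of higher order in $e^2$ are simply dropped.

```python
ALPHA = 1 / 137.035999  # e^2 in units with hbar = c = 1 (assumed standard value)
M_E_EV = 510998.95      # electron rest energy in eV (assumed standard value)

def dirac_energy_ev(n, j, Z=1):
    """Energy in eV, rest energy included, keeping terms through (Z e^2)^4."""
    za2 = (Z * ALPHA) ** 2
    return M_E_EV * (1 - za2 / (2 * n ** 2)
                     + za2 ** 2 / n ** 4 * (3 / 8 - n / (2 * j + 1)))

# Binding energy of the hydrogen ground state (n = 1, j = 1/2).
binding_1s = M_E_EV - dirac_energy_ev(1, 0.5)

# Fine-structure splitting between the n = 2 levels with j = 3/2 and j = 1/2.
splitting_n2 = dirac_energy_ev(2, 1.5) - dirac_energy_ev(2, 0.5)

print(f"1s binding energy:     {binding_1s:.4f} eV")
print(f"n = 2 fine structure:  {splitting_n2:.3e} eV")
```

The splitting is smaller than the binding energy by roughly a factor $e^4 \sim 10^{-5}$, which is why it appears only as fine structure in atomic spectra.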
As discussed in Section 6.5, Fermi in his 1934 theory of beta decay proposed
an interaction Hamiltonian of the form (6.5.1), proportional to the scalar product
of two vector currents. This then had to be modified, first by the introduction
of axial vector currents and then by including terms that violate invariance
under space inversion, resulting in an interaction of the form (6.5.4). Expressed
explicitly in terms of Dirac fields for the proton, neutron, electron, and neutrino,
Fermi’s original proposed interaction (in units with h̄ = c = 1) was
$$H_\beta = G_F\,(\bar{\psi}_e\gamma^\mu\psi_\nu)(\bar{\psi}_p\gamma_\mu\psi_n) + G_F\,(\bar{\psi}_\nu\gamma^\mu\psi_e)(\bar{\psi}_n\gamma_\mu\psi_p)\,, \tag{7.4.47}$$
and after 30 years of experiments on nuclear beta decay and other weak interaction
processes, this was finally modified to
$$H_\beta = \frac{G_F}{\sqrt{2}}\,\big(\bar{\psi}_e\gamma^\mu(1 + \gamma_5)\psi_\nu\big)\big(\bar{\psi}_p\gamma_\mu(1 + \gamma_5)\psi_n\big)
+ \frac{G_F}{\sqrt{2}}\,\big(\bar{\psi}_\nu\gamma^\mu(1 + \gamma_5)\psi_e\big)\big(\bar{\psi}_n\gamma_\mu(1 + \gamma_5)\psi_p\big)\,, \tag{7.4.48}$$
where $G_F = 1.16\times10^{-5}\ {\rm GeV}^{-2}$ and $\gamma_5 \equiv i\gamma^1\gamma^2\gamma^3\gamma^0$. It can be shown from
the anticommutation relations that $\gamma_5$ is Lorentz invariant, in the sense of commuting
with $D(\Lambda)$, so (7.4.48) like (7.4.47) transforms as a scalar under any
proper Lorentz transformation. It is the presence of the matrix $1 + \gamma_5$ in $H_\beta$
that produces the violations of invariance under space inversion discussed in
Section 6.5, including the fact that if neutrinos were massless, the neutrinos created
along with electrons by the first term in Eq. (7.4.48) or along with positrons
in the second term in Eq. (7.4.48) would have a component of angular momentum
in the direction of motion respectively equal to ħ/2 or −ħ/2. For the very
small known masses of neutrinos, these helicities are overwhelmingly likely.
7.5 Quantum Theory of Electromagnetism
We end our treatment of quantum mechanics where we began, with the quantum
theory of radiation. We will first present the Lagrangian densities both for the
free electromagnetic field and for the fields’ interactions with matter. We then work
out in detail the theory of the free field, which as shown in Section 7.3 is needed
to provide the interaction in the interaction picture in perturbation theory, and
apply what we have learned to a classic problem, the calculation of the
rate of emission of photons in transitions between atomic or molecular states.
We close with an account of the interaction of electromagnetism with general
matter fields.
Lagrangian Density
It is easy to think of a possible Lagrangian density for the electromagnetic
field that is quadratic in the fields, like all free-field Lagrangians, and is Lorentz
invariant:
$$\mathcal{L}_0 = -\frac{1}{16\pi}\,\eta_{\mu\rho}\,\eta_{\nu\sigma}\,F^{\mu\nu}F^{\rho\sigma}\,, \tag{7.5.1}$$
where $F^{\mu\nu}$ is the field strength tensor, given by Eqs. (4.6.7) and (4.6.8):
$$E_1 = F^{01} = -F^{10}\,,\quad E_2 = F^{02} = -F^{20}\,,\quad E_3 = F^{03} = -F^{30}\,, \tag{7.5.2}$$
$$B_1 = F^{23} = -F^{32}\,,\quad B_2 = F^{31} = -F^{13}\,,\quad B_3 = F^{12} = -F^{21}\,. \tag{7.5.3}$$
(The factor −1/16π is irrelevant now but will be convenient later, when we consider
the coupling of these fields to matter.) This is manifestly Lorentz invariant,
but otherwise appears absurd. If we assume that $\int d^4x\,\mathcal{L}_0$ is stationary under
arbitrary infinitesimal variations of the fields $F^{\mu\nu}$, we find Euler–Lagrange
equations of the form $F^{\mu\nu} = 0$, which certainly do not describe actual free
electromagnetic fields. The error in deriving this wrong result lies in demanding
stationarity under arbitrary variations of $F^{\mu\nu}$, which is not allowed because the
field-strength tensor is constrained by the homogeneous Maxwell equations (4.6.15),
(4.6.17):
$$0 = \frac{\partial F_{\mu\nu}}{\partial x^\lambda} + \frac{\partial F_{\lambda\mu}}{\partial x^\nu} + \frac{\partial F_{\nu\lambda}}{\partial x^\mu} \tag{7.5.4}$$
where
$$F_{\mu\nu} \equiv \eta_{\mu\rho}\,\eta_{\nu\sigma}\,F^{\rho\sigma}\,. \tag{7.5.5}$$
We should only demand that the action is stationary for variations in the fields
that preserve the constraint (7.5.4).
It is easy to see that this requirement leads to the remaining free-field
Maxwell equations $\partial F^{\mu\nu}/\partial x^\nu = 0$, but in deriving the canonical commutation
relations it is awkward to work with functional derivatives with respect to
constrained fields like $F^{\mu\nu}$. In electrodynamics it is much easier to express the
field-strength tensor in terms of an unconstrained vector potential $A_\mu$, in such a
way that the constraint (7.5.4) is automatically respected,
$$F_{\mu\nu} = \frac{\partial A_\nu}{\partial x^\mu} - \frac{\partial A_\mu}{\partial x^\nu} \tag{7.5.6}$$
and take all functional derivatives with respect to the Aμ . As shown in
Section 5.8, the introduction of a vector potential is essential anyway in
formulating the quantum theory of charged particles in an electromagnetic
field.
For the present we will introduce a general Lagrangian density $\mathcal{L}_{\rm mat}$ for
matter and its interaction with the electromagnetic field, and define the electric
current four-vector $J^\mu$ as the functional derivative with respect to $A_\mu(x)$ of the
corresponding term in the action:
$$J^\mu(x) \equiv \frac{\delta}{\delta A_\mu(x)}\int d^4y\,\mathcal{L}_{\rm mat}(y)\,. \tag{7.5.7}$$
Under an infinitesimal shift in $A_\mu$, the change in the total action is now
$$\delta\int d^4x\,(\mathcal{L}_0 + \mathcal{L}_{\rm mat}) = \int d^4x\left[-\frac{1}{4\pi}\frac{\partial F^{\mu\nu}(x)}{\partial x^\nu} + J^\mu(x)\right]\delta A_\mu(x)\,,$$
and the Euler–Lagrange equations here are
$$\frac{\partial F^{\mu\nu}(x)}{\partial x^\nu} = 4\pi J^\mu(x)\,, \tag{7.5.8}$$
which we recognize as the inhomogeneous Maxwell equations (4.6.9) (except
that there is no factor 1/c, because we are using natural units with ħ = c = 1).
Gauge Transformations
Now we have a problem. We cannot satisfy the canonical commutation relations
for the field $A^0$, because, since $F_{00} = 0$, the Lagrangian density does not contain
a time derivative of $A^0$. To deal with this, we note that the action is invariant
under a gauge transformation
$$A_\mu(x) \to A_\mu(x) + \frac{\partial\xi(x)}{\partial x^\mu} \tag{7.5.9}$$
with ξ(x) an arbitrary function of the spacetime coordinate. This has no effect
on the field-strength tensor (7.5.6), and the consistency of the Maxwell equations
requires that the current $J^\mu$ is conserved in the sense that $\partial J^\mu(x)/\partial x^\mu = 0$,
so that according to Eq. (7.5.7) the change produced in the matter action by
the gauge transformation (7.5.9) is
$$\delta\int d^4x\,\mathcal{L}_{\rm mat} = \int d^4x\,J^\mu(x)\,\frac{\partial\xi(x)}{\partial x^\mu} = -\int d^4x\,\xi(x)\,\frac{\partial J^\mu(x)}{\partial x^\mu} = 0\,. \tag{7.5.10}$$
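Two statements made above lend themselves to a quick numerical check: that a field strength built from a potential as in Eq. (7.5.6) automatically satisfies the constraint (7.5.4), and that it is unchanged by the gauge transformation (7.5.9). The Python sketch below tests both with central finite differences. The test potential, the gauge function, the step size, and the convention of listing the time coordinate first are arbitrary illustrative choices, not taken from the text. Because only lower-index quantities and partial derivatives enter, no metric is needed.

```python
import math

H = 1e-3  # finite-difference step (illustrative choice)

def partial(f, mu, x):
    """Central-difference estimate of df/dx^mu at the point x (a 4-tuple)."""
    xp, xm = list(x), list(x)
    xp[mu] += H
    xm[mu] -= H
    return (f(tuple(xp)) - f(tuple(xm))) / (2 * H)

def A(mu, x):
    """Hypothetical smooth test potential A_mu(x), with x = (t, x1, x2, x3)."""
    t, x1, x2, x3 = x
    return [math.sin(x1 + 2 * t), x2 * x3, math.cos(x1 * x2 - t), x1 * x3 + t * t][mu]

def xi(x):
    """Hypothetical gauge function xi(x)."""
    t, x1, x2, x3 = x
    return math.sin(t * x1) + x2 ** 2 * x3

def A_gauged(mu, x):
    """Gauge-transformed potential A_mu + d xi / d x^mu, as in Eq. (7.5.9)."""
    return A(mu, x) + partial(xi, mu, x)

def field_strength(potential, mu, nu, x):
    """F_{mu nu} = d_mu A_nu - d_nu A_mu, as in Eq. (7.5.6)."""
    return (partial(lambda y: potential(nu, y), mu, x)
            - partial(lambda y: potential(mu, y), nu, x))

def cyclic_sum(lam, mu, nu, x):
    """Right-hand side of the constraint (7.5.4), evaluated for the test potential."""
    def F(a, b):
        return lambda y: field_strength(A, a, b, y)
    return (partial(F(mu, nu), lam, x) + partial(F(lam, mu), nu, x)
            + partial(F(nu, lam), mu, x))

point = (0.3, -0.7, 1.1, 0.4)
indices = range(4)

max_gauge_shift = max(
    abs(field_strength(A, m, n, point) - field_strength(A_gauged, m, n, point))
    for m in indices for n in indices)
max_cyclic = max(abs(cyclic_sum(l, m, n, point))
                 for l in indices for m in indices for n in indices)

print(f"largest change of F under the gauge transformation: {max_gauge_shift:.2e}")
print(f"largest violation of the constraint (7.5.4):        {max_cyclic:.2e}")
```

Both numbers are at the level of rounding error, because the discrete difference operators commute with one another just as partial derivatives do.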
Coulomb Gauge
We can always choose ξ(x) so as to adopt what is known as the Coulomb gauge,
for which
$$\nabla\cdot\mathbf{A} = 0 \tag{7.5.11}$$
because if $\nabla\cdot\mathbf{A} \neq 0$, we can make it vanish by performing a gauge transformation
with $\nabla^2\xi = -\nabla\cdot\mathbf{A}$. This is called the Coulomb gauge because the μ = 0
component of the inhomogeneous Maxwell equations (7.5.8) is here
$$4\pi J^0 = \frac{\partial F^{0i}}{\partial x^i} = -\nabla^2A^0$$
with solution given by the familiar Coulomb field
$$A^0(\mathbf{x}, t) = \int d^3y\,\frac{J^0(\mathbf{y}, t)}{|\mathbf{x} - \mathbf{y}|}\,. \tag{7.5.12}$$
Since $A^0$ is a functional of the matter fields in $J^0$ at the same time, it is not to
be regarded as an independent canonical variable. The canonical variables of
electrodynamics in Coulomb gauge are the spatial components $A^i$, but subject
to the constraint (7.5.11).
The condition (7.5.11) for Coulomb gauge is obviously not Lorentz invariant.
Given a vector potential $A^\mu(x)$ that satisfies this condition, the Lorentz-transformed
vector potential $\Lambda^\mu{}_\nu A^\nu$ will in general not satisfy Eq. (7.5.11) if
$\Lambda$ is anything but a pure rotation. However, we can always combine any Lorentz
transformation with a gauge transformation that takes the vector potential back
to Coulomb gauge. Since the action is presumed to be gauge invariant, the
physical consequences of the theory calculated in Coulomb gauge turn out to
be Lorentz invariant.
The virtue of Coulomb gauge, which here makes up for its lack of mani-
fest Lorentz invariance, is that it displays the physical degrees of freedom of
electrodynamics. Even though $A^\mu$ has four components, as we have seen in
Coulomb gauge $A^0$ is a functional of matter fields, and $\nabla\cdot\mathbf{A}$ vanishes. We shall
see that the two remaining degrees of freedom are the two independent states of
photon polarization. It must be admitted, however, that, as a practical matter, in
carrying out calculations in quantum electrodynamics more complicated than
those essayed here, it is necessary to use techniques that preserve manifest
Lorentz invariance, such as the path integral approach of Feynman.15
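Now we have to consider what is the canonical conjugate to $\mathbf{A}$. According
to the usual definition of a functional derivative, if we make an infinitesimal
variation $\dot{\mathbf{A}} \to \dot{\mathbf{A}} + \delta\dot{\mathbf{A}}$, then
$$\delta L(t) = \int d^3x\,\frac{\delta L(t)}{\delta\dot{A}_i(\mathbf{x}, t)}\,\delta\dot{A}_i(\mathbf{x}, t)\,,$$
but, since $\dot{A}_i$ is constrained by (7.5.11), we are only allowed to consider variations
satisfying $\partial\,\delta\dot{A}_i/\partial x^i = 0$, so $\delta L(t)/\delta\dot{A}_i(\mathbf{x}, t)$ is only defined up to gradient
terms, of the form $\partial f/\partial x^i$. A direct calculation gives
$$\frac{\partial\mathcal{L}}{\partial\dot{A}_i} = \frac{1}{4\pi}\left[\dot{A}_i + \frac{\partial A^0}{\partial x^i}\right]$$
but we need to take advantage of our freedom to shift this functional derivative
by the gradient $-(\partial A^0/\partial x^i)/4\pi$, and take the canonical conjugate to $A_i$ as
$$\pi_i = \dot{A}_i/4\pi\,, \tag{7.5.13}$$
so that $\pi_i$ satisfies the same constraint as $A_i$:
$$\nabla\cdot\boldsymbol{\pi} = 0\,. \tag{7.5.14}$$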
15 R. P. Feynman, Ph.D. thesis, The Principle of Least Action in Quantum Mechanics (Princeton University,
1942; University Microfilms Publication No. 2948, Ann Arbor).
The usual canonical commutation relations must here be modified to take account
of the conditions (7.5.11) and (7.5.14). We use the formula
$$\nabla^2\frac{1}{|\mathbf{x} - \mathbf{y}|} = -4\pi\,\delta^3(\mathbf{x} - \mathbf{y})\,.$$
(This can be derived by showing directly that the left-hand side vanishes for
$\mathbf{x} \neq \mathbf{y}$, and using Gauss’s theorem to show that its integral over all space is
−4π.) Then we have consistency with conditions (7.5.11) and (7.5.14) if we
take
$$[A_i(\mathbf{x}, t), \pi_j(\mathbf{y}, t)] = i\delta_{ij}\,\delta^3(\mathbf{x} - \mathbf{y}) + i\,\frac{\partial^2}{\partial x^i\,\partial x^j}\,\frac{1}{4\pi|\mathbf{x} - \mathbf{y}|} \tag{7.5.15}$$
and also
$$[A_i(\mathbf{x}, t), A_j(\mathbf{y}, t)] = [\pi_i(\mathbf{x}, t), \pi_j(\mathbf{y}, t)] = 0\,. \tag{7.5.16}$$
Free Fields
As emphasized in Section 7.3, the first step in using time-ordered perturbation
theory to calculate processes involving interacting particles is to write explicit
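formulas for the free fields. With zero current and charge densities, and hence
$A^0 = 0$, the field equations (7.5.8) for $A^i$ in Coulomb gauge are
$$0 = \frac{\partial F^{\mu i}}{\partial x^\mu} = \Box A^i - \frac{\partial}{\partial x_i}\frac{\partial A^\mu}{\partial x^\mu} = \Box A^i\,. \tag{7.5.17}$$
The general real solution of Eqs. (7.5.11) and (7.5.17) is conveniently written
$$\mathbf{A}(\mathbf{x}, t) = \sqrt{4\pi}\,\sum_\lambda\int\frac{d^3q}{(2\pi)^{3/2}\sqrt{2|\mathbf{q}|}}
\Big[\mathbf{e}(\mathbf{q}, \lambda)\,a(\mathbf{q}, \lambda)\,e^{i\mathbf{q}\cdot\mathbf{x} - i|\mathbf{q}|t}
+ \mathbf{e}^*(\mathbf{q}, \lambda)\,a^\dagger(\mathbf{q}, \lambda)\,e^{-i\mathbf{q}\cdot\mathbf{x} + i|\mathbf{q}|t}\Big] \tag{7.5.18}$$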
where $a(\mathbf{q}, \lambda)$ is an operator coefficient whose properties will be found from the
canonical commutation relations, and $\mathbf{e}(\mathbf{q}, \lambda)$ are any two independent three-vectors
normal to $\mathbf{q}$,
$$\mathbf{q}\cdot\mathbf{e}(\mathbf{q}, \lambda) = 0 \tag{7.5.19}$$
with λ a two-valued index distinguishing the two solutions of (7.5.19). By a
suitable normalization of $a(\mathbf{q}, \lambda)$, we can always normalize these vectors so that
$$\sum_\lambda e_i(\mathbf{q}, \lambda)\,e_j^*(\mathbf{q}, \lambda) = \delta_{ij} - q_iq_j/|\mathbf{q}|^2\,. \tag{7.5.20}$$
For instance, for $\mathbf{q}$ in the 3-direction, we can take $\mathbf{e} = (1, i, 0)/\sqrt{2}$ for λ = 1,
and $\mathbf{e} = (1, -i, 0)/\sqrt{2}$ for λ = −1, and, for $\mathbf{q}$ in a direction defined by
some choice of rotation from the 3-direction, apply the same rotation to $\mathbf{e}$.
These are the same as the polarization vectors for left- and right-handed circular
polarization that appear in the Fourier expansion of an electromagnetic wave.
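The transversality condition (7.5.19) and the polarization sum (7.5.20) are easy to verify for the explicit vectors just given. The Python sketch below does so first for a momentum along the 3-direction and then for a rotated momentum, applying the same rotation to the polarization vectors. The magnitude of the momentum, the rotation axis, and the rotation angle are arbitrary illustrative choices.

```python
import math

def circular_polarization(lam):
    """e(q, lam) for q along the 3-direction, with lam = +1 or -1."""
    return [1 / math.sqrt(2), lam * 1j / math.sqrt(2), 0j]

def rotate(R, v):
    """Apply the 3x3 rotation matrix R to the vector v."""
    return [sum(R[i][k] * v[k] for k in range(3)) for i in range(3)]

def transversality_error(q, vectors):
    """Largest |q . e| over the given polarization vectors, cf. Eq. (7.5.19)."""
    return max(abs(sum(qc * ec for qc, ec in zip(q, e))) for e in vectors)

def completeness_error(q, vectors):
    """Largest deviation from sum_lam e_i e_j^* = delta_ij - q_i q_j / |q|^2."""
    q2 = sum(c * c for c in q)
    worst = 0.0
    for i in range(3):
        for j in range(3):
            lhs = sum(e[i] * e[j].conjugate() for e in vectors)
            rhs = (1.0 if i == j else 0.0) - q[i] * q[j] / q2
            worst = max(worst, abs(lhs - rhs))
    return worst

q_along_3 = [0.0, 0.0, 2.5]
e_along_3 = [circular_polarization(+1), circular_polarization(-1)]

beta = 0.8  # arbitrary rotation angle about the 2-axis
R = [[math.cos(beta), 0.0, math.sin(beta)],
     [0.0, 1.0, 0.0],
     [-math.sin(beta), 0.0, math.cos(beta)]]
q_rot = rotate(R, q_along_3)
e_rot = [rotate(R, e) for e in e_along_3]

for label, q, vectors in (("3-direction", q_along_3, e_along_3),
                          ("rotated", q_rot, e_rot)):
    print(label, transversality_error(q, vectors), completeness_error(q, vectors))
```

The rotated case works because a real rotation preserves both the scalar product $\mathbf{q}\cdot\mathbf{e}$ and the form of the right-hand side of Eq. (7.5.20).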
With this normalization of the polarization vectors, the field (7.5.18) satisfies
the canonical commutation relations (7.5.15)–(7.5.16) if we take
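$$[a(\mathbf{q}, \lambda), a^\dagger(\mathbf{q}', \lambda')] = \delta_{\lambda\lambda'}\,\delta^3(\mathbf{q} - \mathbf{q}')\,, \tag{7.5.21}$$
and
$$[a(\mathbf{q}, \lambda), a(\mathbf{q}', \lambda')] = 0\,. \tag{7.5.22}$$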
Then, just as we saw for a real scalar field in Section 7.2, the operator a † (q, λ)
creates a photon of momentum q and polarization vector e(q, λ) in any state
vector on which it acts, while if there already is such a photon in the state, the
operator a(q, λ) removes it.
To see the physical significance of λ, note that for q in the 3-direction, if we
perform a rotation by angle θ around the 3-axis,
$$e_1 \to e_1\cos\theta + e_2\sin\theta\,,\qquad e_2 \to -e_1\sin\theta + e_2\cos\theta\,,$$
then the polarization vectors change by phases as follows:
$$\mathbf{e}(\mathbf{q}, \pm 1) \to e^{\mp i\theta}\,\mathbf{e}(\mathbf{q}, \pm 1)\,.$$
Since there is nothing special about the 3-direction, this is the effect of rotation
by angle θ around the direction of motion for a photon moving in any direction.
In accordance with the general discussion of angular momentum in Section 5.4,
this means that a photon created by a † (q, λ) has a component of angular
momentum around the direction of motion, that is a helicity, equal to h̄λ in cgs
units.
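To calculate the free-field Hamiltonian, we first note that, since $A^0 = 0$ for
free fields, the free-field Hamiltonian density is
$$\mathcal{H}_0 = \pi_j\dot{A}_j - \mathcal{L}_0 = \frac{1}{8\pi}\dot{A}_j\dot{A}_j + \frac{1}{16\pi}(\partial_iA_j - \partial_jA_i)(\partial_iA_j - \partial_jA_i)\,,$$
where as usual i and j run over the values 1, 2, 3, and repeated indices are
summed. Using integration by parts and the Coulomb gauge condition (7.5.11)
we find the free-field Hamiltonian
$$H_0 = \int d^3x\,\mathcal{H}_0 = \frac{1}{8\pi}\int d^3x\,\big[\dot{A}_i\dot{A}_i + \partial_iA_j\,\partial_iA_j\big]\,.$$
Inserting the field (7.5.18) and following just the same steps as in calculating
the free-field Hamiltonian for a scalar field in Section 7.2, we find the free-field
Hamiltonian for electromagnetism
$$H_0 = \frac{1}{2}\sum_\lambda\int d^3q\,|\mathbf{q}|\,\big(a^\dagger(\mathbf{q}, \lambda)a(\mathbf{q}, \lambda) + a(\mathbf{q}, \lambda)a^\dagger(\mathbf{q}, \lambda)\big)
= \sum_\lambda\int d^3q\,|\mathbf{q}|\,a^\dagger(\mathbf{q}, \lambda)a(\mathbf{q}, \lambda) + E_{\rm vac}\,, \tag{7.5.23}$$
where
$$E_{\rm vac} = \delta^3(0)\int d^3q\,|\mathbf{q}| = (2\pi)^{-3}\int d^3x\int d^3q\,|\mathbf{q}|\,. \tag{7.5.24}$$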
As in the case of the real scalar field treated in Section 7.2, the vacuum $\Psi_{\rm vac}$,
defined as the state of lowest energy, must satisfy the condition
$$a(\mathbf{q}, \lambda)\Psi_{\rm vac} = 0\,, \tag{7.5.25}$$
since otherwise there would be a state $a(\mathbf{q}, \lambda)\Psi_{\rm vac}$ with a lower energy than
$\Psi_{\rm vac}$. Thus
$$H_0\Psi_{\rm vac} = E_{\rm vac}\Psi_{\rm vac}\,. \tag{7.5.26}$$
The energy (7.5.24) is a contribution to the total vacuum energy that must
be added to the contributions of all other fields, such as (7.2.16). The state
consisting of a photon with momentum $\mathbf{q}_1$ and helicity $\lambda_1$, another photon with
momentum $\mathbf{q}_2$ and helicity $\lambda_2$, and so on, may be expressed as
$$\Psi_{\mathbf{q}_1,\lambda_1;\,\mathbf{q}_2,\lambda_2;\ldots} \propto a^\dagger(\mathbf{q}_1, \lambda_1)\,a^\dagger(\mathbf{q}_2, \lambda_2)\cdots\Psi_{\rm vac}\,, \tag{7.5.27}$$
and has energy $E_{\rm vac} + |\mathbf{q}_1| + |\mathbf{q}_2| + \cdots$. The term $E_{\rm vac}$ appears in the energy
of all states, and so aside from gravitational phenomena may be ignored, as we
shall do here.
Radiative Decay
We now consider the rate at which an excited atom¹⁶ will drop into a state
of lower energy, emitting a photon. We shall neglect relativistic effects and
the interaction of the electromagnetic field with the electron spin, so that the
Hamiltonian for the atom interacting with the electromagnetic field is given by
a sum over the particles in the atom of terms of form (5.8.3). Since we are
interested in the emission only of a single photon, the relevant interaction term
is the part of this sum linear in $\mathbf{A}$:
$$ V = -\sum_n \frac{e_n}{2m_n}\left[ \mathbf{A}(\mathbf{X}_n)\cdot\mathbf{P}_n + \mathbf{P}_n\cdot\mathbf{A}(\mathbf{X}_n) \right] , $$
where $e_n$ and $m_n$ are the charge and mass of the $n$th particle (electron or nucleus)
while $\mathbf{X}_n$ and $\mathbf{P}_n$ are the position and momentum operators of the $n$th
particle and $\mathbf{A}(\mathbf{X})$ is the quantum vector potential in the Schrödinger picture.
Because we are using Coulomb gauge, in which $\mathbf{A}$ satisfies Eq. (7.5.11), it
makes no difference in what order we write the operators in $V$, and we can
just as well write
$$ V = -\sum_n \frac{e_n}{m_n}\mathbf{A}(\mathbf{X}_n)\cdot\mathbf{P}_n . \tag{7.5.28} $$
¹⁶ The calculations here of radiative decay rates apply to molecules as well as to atoms, but to avoid repeating
“or molecules” again and again, I will just refer below to transitions in atoms.
We take the initial and final states of the atom to be eigenstates $\Psi_{i,\mathbf{p}_i}$ and $\Psi_{f,\mathbf{p}_f}$
of the Hamiltonian of the atom, with energies $E_i$ and $E_f$, respectively, and
with total momenta $\mathbf{p}_i$ and $\mathbf{p}_f$, respectively. (Because atomic nuclei are heavy,
the kinetic energies of the states of the whole atom are always much less than
$E_i - E_f$, and so will be neglected.) The atomic state vectors are assumed to be
normalized so that
$$ (\Psi_{a',\mathbf{p}'}, \Psi_{a,\mathbf{p}}) = \delta_{a'a}\, \delta^3(\mathbf{p}' - \mathbf{p}) . \tag{7.5.29} $$
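The neglect of the atom's recoil kinetic energy can be checked with a quick estimate. The sketch below is only illustrative: it assumes a hydrogen atom initially at rest emitting a photon of about 10.2 eV (roughly the Lyman alpha energy), and the constants and function name are choices made here, not taken from the text.

```python
# Rough estimate of the recoil kinetic energy |q|^2 / (2M) of an atom
# that emits a photon. All numerical inputs are illustrative assumptions.

PHOTON_ENERGY_EV = 10.2          # assumed photon energy |q| c, about Lyman alpha
ATOM_REST_ENERGY_EV = 938.8e6    # assumed M c^2 of a hydrogen atom


def recoil_kinetic_energy_ev(photon_energy_ev, rest_energy_ev):
    """Non-relativistic recoil energy (|q| c)^2 / (2 M c^2) for an atom initially at rest."""
    return photon_energy_ev ** 2 / (2.0 * rest_energy_ev)


recoil_ev = recoil_kinetic_energy_ev(PHOTON_ENERGY_EV, ATOM_REST_ENERGY_EV)
ratio = recoil_ev / PHOTON_ENERGY_EV

print(f"recoil kinetic energy: {recoil_ev:.3e} eV")
print(f"fraction of the transition energy: {ratio:.3e}")
```

With these inputs the recoil energy is many orders of magnitude below the transition energy, which is the sense in which the kinetic energies are dropped from the energy-conservation condition.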
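Each of these states is a vacuum as far as photons are concerned, so, for any
photon momentum $\mathbf{q}$ and helicity $\lambda$,
$$ a(\mathbf{q},\lambda)\Psi_{i,\mathbf{p}_i} = a(\mathbf{q},\lambda)\Psi_{f,\mathbf{p}_f} = 0 . \tag{7.5.30} $$
The initial state of the radiative decay process is then $\Psi_{i,\mathbf{p}_i}$, and the final state is
$a^\dagger(\mathbf{q},\lambda)\Psi_{f,\mathbf{p}_f}$, with $\mathbf{q}$ and $\lambda$ the momentum and helicity of the emitted photon.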
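To first order in $V$ we can treat $\mathbf{A}$ in Eq. (7.5.28) as a free field, so to this order
the S-matrix element (5.6.36) for the decay process is
$$
\begin{aligned}
S[i(\mathbf{p}_i) \to f(\mathbf{p}_f) + \gamma(\mathbf{q},\lambda)]
&= -2\pi i\, \delta(E_f + |\mathbf{q}| - E_i)\, \big( a^\dagger(\mathbf{q},\lambda)\Psi_{f,\mathbf{p}_f},\, V\Psi_{i,\mathbf{p}_i} \big) \\
&= -2\pi i\, \delta(E_f + |\mathbf{q}| - E_i)\, \big( \Psi_{f,\mathbf{p}_f},\, a(\mathbf{q},\lambda) V \Psi_{i,\mathbf{p}_i} \big) \\
&= 2\pi i\, \delta(E_f + |\mathbf{q}| - E_i)\, \sqrt{4\pi} \sum_{\lambda'} \int \frac{d^3q'}{(2\pi)^{3/2}\sqrt{2|\mathbf{q}'|}} \\
&\quad \times \sum_n \frac{e_n}{m_n} \big( \Psi_{f,\mathbf{p}_f},\, a(\mathbf{q},\lambda)\, \mathbf{e}^*(\mathbf{q}',\lambda')\cdot\mathbf{P}_n\, e^{-i\mathbf{q}'\cdot\mathbf{X}_n}\, a^\dagger(\mathbf{q}',\lambda') \Psi_{i,\mathbf{p}_i} \big) .
\end{aligned}
\tag{7.5.31}
$$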
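Using the photon vacuum condition (7.5.30) and the commutation relation
(7.5.21), we can replace the product $a(\mathbf{q},\lambda)\, a^\dagger(\mathbf{q}',\lambda')$ with $\delta^3(\mathbf{q} - \mathbf{q}')\,\delta_{\lambda\lambda'}$, and
do the integral over $\mathbf{q}'$ and the sum over $\lambda'$ by just setting $\mathbf{q}' = \mathbf{q}$ and $\lambda' = \lambda$, so
$$ S[i(\mathbf{p}_i) \to f(\mathbf{p}_f) + \gamma(\mathbf{q},\lambda)] = \frac{2\pi i \sqrt{4\pi}\, \delta(E_f + |\mathbf{q}| - E_i)}{(2\pi)^{3/2}\sqrt{2|\mathbf{q}|}} \sum_n \frac{e_n}{m_n} \big( \Psi_{f,\mathbf{p}_f},\, \mathbf{e}^*(\mathbf{q},\lambda)\cdot\mathbf{P}_n\, e^{-i\mathbf{q}\cdot\mathbf{X}_n} \Psi_{i,\mathbf{p}_i} \big) . \tag{7.5.32} $$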
At this point we make a further approximation, known as the electric dipole
approximation. The wavelength $2\pi/|\mathbf{q}|$ of the emitted photon is typically at least
hundreds or thousands of angstroms, while the mean separations of electrons
from the center of mass of the atom are typically a few angstroms. It is therefore
usually a good approximation (as long as selection rules to be discussed below
do not require the result to vanish) to replace each particle position $\mathbf{X}_n$ in the
exponent in Eq. (7.5.32) with the center-of-mass coordinate vector
$$ \mathbf{X} \equiv \frac{1}{M}\sum_n m_n \mathbf{X}_n , \qquad M \equiv \sum_n m_n . \tag{7.5.33} $$
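The size of the quantity being neglected can be estimated directly. The following sketch evaluates the dimensionless product of photon wave number and atomic size, 2πr/λ, for a few wavelengths; the atomic size of 1 angstrom and the sample wavelengths are assumptions chosen for illustration only.

```python
import math


def dipole_expansion_parameter(wavelength_angstrom, atomic_size_angstrom):
    """Return 2*pi*r/lambda, the phase variation of exp(-i q.x) across the atom."""
    return 2.0 * math.pi * atomic_size_angstrom / wavelength_angstrom


ATOMIC_SIZE_ANGSTROM = 1.0  # assumed typical electron-to-center-of-mass distance

# Sample wavelengths in angstroms (assumed values: Lyman alpha and visible light).
for wavelength in (1216.0, 5000.0):
    parameter = dipole_expansion_parameter(wavelength, ATOMIC_SIZE_ANGSTROM)
    print(f"wavelength {wavelength:7.1f} A -> 2*pi*r/lambda = {parameter:.2e}")

lyman_alpha_parameter = dipole_expansion_parameter(1216.0, ATOMIC_SIZE_ANGSTROM)
```

For these inputs the parameter is of order a few times 10^-3, so the phase of the exponential barely changes across the atom.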
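Now, using the commutators of the momentum and position operators,
$$ \Big[ \sum_n \mathbf{P}_n ,\, \exp(i\mathbf{q}\cdot\mathbf{X}) \Big] = \mathbf{q}\, \exp(i\mathbf{q}\cdot\mathbf{X}) , \tag{7.5.34} $$
so¹⁷
$$ \exp(i\mathbf{q}\cdot\mathbf{X})\, \Psi_{f,\mathbf{p}_f} = \Psi_{f,\mathbf{p}_f+\mathbf{q}} . \tag{7.5.35} $$
Hence, replacing all $\mathbf{X}_n$ in the exponent in Eq. (7.5.32) with $\mathbf{X}$, and letting
the adjoint of this exponential act on the final state, we have
$$ S[i(\mathbf{p}_i) \to f(\mathbf{p}_f) + \gamma(\mathbf{q},\lambda)] = \frac{-2\pi i \sqrt{4\pi}\, \delta(E_f + |\mathbf{q}| - E_i)}{(2\pi)^{3/2}\sqrt{2|\mathbf{q}|}} \sum_n \frac{e_n}{m_n} \big( \Psi_{f,\mathbf{p}_f+\mathbf{q}},\, \mathbf{e}^*(\mathbf{q},\lambda)\cdot\mathbf{P}_n \Psi_{i,\mathbf{p}_i} \big) . \tag{7.5.36} $$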
The operators $\mathbf{P}_n$ all commute with the total momentum, so we can write their
matrix elements as
$$ \big( \Psi_{f,\mathbf{p}_f+\mathbf{q}},\, \mathbf{P}_n \Psi_{i,\mathbf{p}_i} \big) = \delta^3(\mathbf{p}_f - \mathbf{p}_i + \mathbf{q})\, (\mathbf{P}_n)_{fi} \tag{7.5.37} $$
and so
$$ S[i(\mathbf{p}_i) \to f(\mathbf{p}_f) + \gamma(\mathbf{q},\lambda)] = -2\pi i\, \delta(E_f - E_i + |\mathbf{q}|)\, \delta^3(\mathbf{p}_f - \mathbf{p}_i + \mathbf{q})\, M[i(\mathbf{p}_i) \to f(\mathbf{p}_f) + \gamma(\mathbf{q},\lambda)] , \tag{7.5.38} $$
where
$$ M[i(\mathbf{p}_i) \to f(\mathbf{p}_f) + \gamma(\mathbf{q},\lambda)] = \frac{\sqrt{4\pi}}{(2\pi)^{3/2}\sqrt{2|\mathbf{q}|}} \sum_n \frac{e_n}{m_n} (\mathbf{P}_n)_{fi}\cdot\mathbf{e}^*(\mathbf{q},\lambda) . \tag{7.5.39} $$
¹⁷ This argument does not rule out the possible presence of a numerical factor multiplying the right-hand
side of Eq. (7.5.35). Any such factor of proportionality would have to have absolute magnitude unity,
because $[\exp(i\mathbf{q}\cdot\mathbf{X})]^\dagger \exp(i\mathbf{q}\cdot\mathbf{X}) = 1$, and we define both $\Psi_{f,\mathbf{p}_f}$ and $\Psi_{f,\mathbf{p}_f+\mathbf{q}}$ to be normalized in
accordance with Eq. (7.5.29). Such a phase factor would depend on our arbitrary choice of the phase of
the state $\Psi_{f,\mathbf{p}_f}$ as a function of $\mathbf{p}_f$ and can be defined to be unity, but in any case it cannot affect the
radiative transition rate, which is proportional to the absolute value squared of the matrix element for the
transition. So this possible phase factor will be ignored here.
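To see how this is calculated in wave mechanics, note for example that in
hydrogen the initial and final atomic wave functions take the form
$$ \psi_{i,\mathbf{p}_i}(\mathbf{x},\mathbf{X}) = \psi_i(\mathbf{x})\, \frac{\exp(i\mathbf{p}_i\cdot\mathbf{X})}{(2\pi)^{3/2}} , \qquad \psi_{f,\mathbf{p}_f+\mathbf{q}}(\mathbf{x},\mathbf{X}) = \psi_f(\mathbf{x})\, \frac{\exp(i[\mathbf{p}_f+\mathbf{q}]\cdot\mathbf{X})}{(2\pi)^{3/2}} , $$
where $\mathbf{x}$ is the vector separation of the electron and proton and $\mathbf{X}$ is the coordinate
vector of the center of mass. With $m_e \ll m_p$, the matrix element of the
electron momentum operator $\mathbf{P}_e$ is
$$ \big( \Psi_{f,\mathbf{p}_f+\mathbf{q}},\, \mathbf{P}_e \Psi_{i,\mathbf{p}_i} \big) = \int d^3x \int d^3X\, \psi^*_{f,\mathbf{p}_f+\mathbf{q}}(\mathbf{x},\mathbf{X})\, \mathbf{P}_e\, \psi_{i,\mathbf{p}_i}(\mathbf{x},\mathbf{X}) , $$
which has the same form as Eq. (7.5.37), with
$$ (\mathbf{P}_e)_{fi} = -i \int d^3x\, \psi_f^*(\mathbf{x})\, \nabla\psi_i(\mathbf{x}) . $$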
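Using Eq. (7.5.39) in Eq. (5.6.45) (with the number $N_\alpha$ of particles in the
initial state equal to one), the differential decay rate is
$$
\begin{aligned}
d\Gamma(i \to f + \mathbf{q},\lambda)
&= 2\pi\, \big| M[i(\mathbf{p}_i) \to f(\mathbf{p}_f) + \gamma(\mathbf{q},\lambda)] \big|^2\, \delta(E_f + |\mathbf{q}| - E_i)\, \delta^3(\mathbf{p}_f + \mathbf{q} - \mathbf{p}_i)\, d^3q\, d^3p_f \\
&= \frac{1}{2\pi|\mathbf{q}|} \Big| \sum_n \frac{e_n}{m_n} (\mathbf{P}_n)_{fi}\cdot\mathbf{e}^*(\mathbf{q},\lambda) \Big|^2\, \delta(E_f + |\mathbf{q}| - E_i)\, \delta^3(\mathbf{p}_f + \mathbf{q} - \mathbf{p}_i)\, d^3q\, d^3p_f .
\end{aligned}
\tag{7.5.40}
$$
The momentum-conservation delta function just goes to fix the recoil momentum
$\mathbf{p}_f = \mathbf{p}_i - \mathbf{q}$, and the energy-conservation delta function fixes the photon
energy $|\mathbf{q}| = E_i - E_f$. Writing $d^3q = |\mathbf{q}|^2\, d|\mathbf{q}|\, d\Omega$, we are left with the rate
for emission of a photon with helicity $\lambda$ into a small solid angle $d\Omega$:
$$ d\Gamma(i \to f + \mathbf{q},\lambda) = \frac{|\mathbf{q}|}{2\pi} \Big| \sum_n \frac{e_n}{m_n} (\mathbf{P}_n)_{fi}\cdot\mathbf{e}^*(\mathbf{q},\lambda) \Big|^2 d\Omega . \tag{7.5.41} $$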
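In the common case where the photon helicity is not measured, the observed
rate is given by a sum over helicities. This sum can be calculated using
Eq. (7.5.20):
$$ \sum_\lambda e_j(\mathbf{q},\lambda)\, e_k^*(\mathbf{q},\lambda) = \delta_{jk} - \hat{q}_j \hat{q}_k , $$
where $\hat{\mathbf{q}}$ is the unit vector $\mathbf{q}/|\mathbf{q}|$. The observed differential decay rate for emission
of a photon with momentum $\mathbf{q}$ into a small solid angle $d\Omega$ is
$$ \sum_\lambda d\Gamma(i \to f + \mathbf{q},\lambda) = \frac{|\mathbf{q}|}{2\pi} \Big( \sum_n \frac{e_n}{m_n} (P_{nj})_{fi} \Big) \Big( \sum_{n'} \frac{e_{n'}}{m_{n'}} (P_{n'k})_{fi} \Big)^{*} (\delta_{jk} - \hat{q}_j \hat{q}_k)\, d\Omega . \tag{7.5.42} $$
We can now easily integrate over the photon direction, using
$$ \int d\Omega\, \big( \delta_{jk} - \hat{q}_j \hat{q}_k \big) = 4\pi\, \delta_{jk} \Big( 1 - \frac{1}{3} \Big) = \frac{8\pi}{3}\, \delta_{jk} . $$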
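Both the helicity sum and the angular integral used above can be checked numerically. The sketch below is a minimal illustration in plain Python: the explicit form of the two polarization vectors (built from two real unit vectors orthogonal to the photon direction) is an assumed phase convention, since Eqs. (7.5.19) and (7.5.20) are not reproduced here, and all function names are hypothetical.

```python
import math


def direction(theta, phi):
    """Unit vector q_hat for polar angle theta and azimuthal angle phi."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))


def polarization_vectors(theta, phi):
    """Two complex unit vectors transverse to q_hat (assumed phase convention)."""
    u = (math.cos(theta) * math.cos(phi), math.cos(theta) * math.sin(phi), -math.sin(theta))
    v = (-math.sin(phi), math.cos(phi), 0.0)
    root2 = math.sqrt(2.0)
    e_plus = tuple((a + 1j * b) / root2 for a, b in zip(u, v))
    e_minus = tuple((a - 1j * b) / root2 for a, b in zip(u, v))
    return e_plus, e_minus


def helicity_sum(theta, phi):
    """3x3 matrix of sum over helicities of e_j e_k^*."""
    vectors = polarization_vectors(theta, phi)
    return [[sum(e[j] * e[k].conjugate() for e in vectors) for k in range(3)]
            for j in range(3)]


def transverse_projector(theta, phi):
    """3x3 matrix delta_jk - q_hat_j q_hat_k."""
    n = direction(theta, phi)
    return [[(1.0 if j == k else 0.0) - n[j] * n[k] for k in range(3)]
            for j in range(3)]


def integrate_projector(n_theta=100, n_phi=100):
    """Midpoint-rule integral of the projector over all photon directions."""
    total = [[0.0] * 3 for _ in range(3)]
    d_theta = math.pi / n_theta
    d_phi = 2.0 * math.pi / n_phi
    for a in range(n_theta):
        theta = (a + 0.5) * d_theta
        weight = math.sin(theta) * d_theta * d_phi  # solid-angle element
        for b in range(n_phi):
            proj = transverse_projector(theta, (b + 0.5) * d_phi)
            for j in range(3):
                for k in range(3):
                    total[j][k] += weight * proj[j][k]
    return total


theta0, phi0 = 0.7, 2.1  # an arbitrary photon direction
lhs = helicity_sum(theta0, phi0)
rhs = transverse_projector(theta0, phi0)
max_mismatch = max(abs(lhs[j][k] - rhs[j][k]) for j in range(3) for k in range(3))
integral = integrate_projector()

print("largest mismatch in helicity sum:", max_mismatch)
print("diagonal of angular integral:", [integral[j][j] for j in range(3)])
print("expected 8*pi/3 =", 8.0 * math.pi / 3.0)
```

The overall phases of the polarization vectors drop out of the helicity sum, so the check does not depend on the convention assumed here.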
The total decay rate for emission of a photon in any direction with any helicity
is then, in Einstein’s notation,
$$ A_i^f = \sum_\lambda \int d\Gamma(i \to f + \mathbf{q},\lambda) = \frac{4|\mathbf{q}|}{3} \Big| \sum_n \frac{e_n}{m_n} (\mathbf{P}_n)_{fi} \Big|^2 . \tag{7.5.43} $$
Section 3.5 shows how to use this also to calculate the rates of absorption and
stimulated emission of radiation.
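Calculations are made easier if we replace matrix elements of momentum
vectors with matrix elements of position vectors. For this, we use the
commutator
$$ [\mathbf{X}_n, H] = \Big[ \mathbf{X}_n ,\, \sum_{n'} \frac{1}{2m_{n'}} \mathbf{P}_{n'}^2 \Big] = \frac{i}{m_n} \mathbf{P}_n , $$
so
$$ \big( \Psi_{f,\mathbf{p}_f+\mathbf{q}},\, \mathbf{P}_n \Psi_{i,\mathbf{p}_i} \big) = -i(E_i - E_f)\, m_n \big( \Psi_{f,\mathbf{p}_f+\mathbf{q}},\, \mathbf{X}_n \Psi_{i,\mathbf{p}_i} \big) = -i|\mathbf{q}|\, m_n \big( \Psi_{f,\mathbf{p}_f+\mathbf{q}},\, \mathbf{X}_n \Psi_{i,\mathbf{p}_i} \big) . $$
Therefore the decay rate (7.5.43) may be written
$$ A_i^f = \frac{4|\mathbf{q}|^3}{3} \Big| \sum_n e_n (\mathbf{X}_n)_{fi} \Big|^2 , \tag{7.5.44} $$
where
$$ \big( \Psi_{f,\mathbf{p}_f+\mathbf{q}},\, \mathbf{X}_n \Psi_{i,\mathbf{p}_i} \big) = \delta^3(\mathbf{p}_f - \mathbf{p}_i + \mathbf{q})\, (\mathbf{X}_n)_{fi} . \tag{7.5.45} $$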
In cgs units, Eq. (7.5.44) takes the form
$$ A_i^f = \frac{4|\omega|^3}{3\hbar c^3} \Big| \sum_n e_n (\mathbf{X}_n)_{fi} \Big|^2 , $$
where $\omega = |\mathbf{q}|c/\hbar$ is the photon circular frequency. This formula was guessed
by Heisenberg¹⁸ in 1925 by setting the radiation power emitted in the transition
$i \to f$ that had been calculated in classical electrodynamics¹⁹ equal to $\hbar\omega A_i^f$.
He used this formula as a starting point in his matrix mechanics approach to
quantum mechanics. The quantum-mechanical derivation was first given by
Dirac²⁰ in 1927.
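As an illustration of how the cgs formula is used, the sketch below estimates the rate for the 2p → 1s transition in hydrogen. It is a rough calculation under stated assumptions: the proton is treated as infinitely heavy, electron spin is ignored, the standard hydrogen radial wave functions R_10 and R_21 are used, and the physical constants are rounded values supplied here rather than taken from the text.

```python
import math

# Rounded physical constants (assumed values for illustration).
ALPHA = 1.0 / 137.036           # fine-structure constant e^2 / (hbar c) in cgs units
HBAR_EV_S = 6.582120e-16        # hbar in eV s
C_CM_PER_S = 2.997925e10        # speed of light in cm/s
BOHR_RADIUS_CM = 5.291772e-9    # Bohr radius a_0 in cm
RYDBERG_EV = 13.6057            # hydrogen ground-state binding energy in eV


def radial_dipole_integral_1s_2p():
    """Integral of R_10(r) r R_21(r) r^2 dr in units of a_0.

    With rho = r/a_0, R_10 = 2 exp(-rho) and R_21 = rho exp(-rho/2) / (2 sqrt(6)),
    so the integral is (1/sqrt(6)) * int rho^4 exp(-3 rho/2) d rho
    = (1/sqrt(6)) * 4! / (3/2)^5.
    """
    return math.factorial(4) / (1.5 ** 5) / math.sqrt(6.0)


def lyman_alpha_decay_rate():
    """Decay rate in 1/s from A = 4 w^3 e^2 |X_fi|^2 / (3 hbar c^3)."""
    omega = 0.75 * RYDBERG_EV / HBAR_EV_S  # (E_2 - E_1)/hbar in rad/s
    radial = radial_dipole_integral_1s_2p() * BOHR_RADIUS_CM
    # For any one 2p magnetic substate the angular integrals give |X_fi|^2 = radial^2 / 3.
    x_fi_squared = radial ** 2 / 3.0
    # Use e^2 = alpha * hbar * c to eliminate e^2 and hbar from the prefactor.
    return (4.0 / 3.0) * ALPHA * omega ** 3 * x_fi_squared / C_CM_PER_S ** 2


rate = lyman_alpha_decay_rate()
print(f"2p -> 1s decay rate: {rate:.3e} per second")
print(f"mean lifetime of the 2p state: {1.0 / rate:.3e} s")
```

The result is of order 10^9 per second, corresponding to a lifetime of order a nanosecond for the 2p state.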
Selection Rules
As we have already warned, the electric dipole approximation is not useful if
selection rules give zero for the decay matrix element. We can derive the selection
rules from either Eq. (7.5.43) or Eq. (7.5.44). First, as shown in Section 5.2,
the components of the operator $\mathbf{X}$ can be assembled into the spherical harmonics
for $\ell = 1$,
$$ \mp\sqrt{\frac{3}{8\pi}}\, (X_1 \pm iX_2) = |\mathbf{X}|\, Y_1^{\pm 1}(\mathbf{X}/|\mathbf{X}|) , \qquad \sqrt{\frac{3}{4\pi}}\, X_3 = |\mathbf{X}|\, Y_1^{0}(\mathbf{X}/|\mathbf{X}|) . $$
According to the rules for addition of angular momenta set out in Section 5.4,
if the initial atom at rest has total angular momentum quantum number $j_i$, then
the states $X_k \Psi_{i,\mathbf{p}_i}$ for $\mathbf{p}_i = 0$ can only have total angular momentum quantum
number $j_f$ equal to $j_i + 1$, $j_i$, or $j_i - 1$ and, furthermore, if $j_i = 0$ then only
$j_f = 1$ is possible; $j_f = 0$ is only possible if $j_i = 1$. Hence radiative decay
does not occur in the electric dipole approximation unless the initial and final
atomic states satisfy the selection rule
$$ |j_f - j_i| \le 1 \le j_i + j_f . \tag{7.5.46} $$
There is a further selection rule that follows from space inversion symmetry.
As we saw in Section 5.4, if we change the sign of each of the three Cartesian
coordinates, any state vector $\Psi$ is changed to $\mathsf{P}\Psi$, where the operator $\mathsf{P}$ is
unitary in the sense that $\mathsf{P}^\dagger \mathsf{P} = 1$, and, since making two space inversions in
succession changes nothing, also $\mathsf{P}^2 = 1$. Physical states therefore can be chosen
as eigenstates of $\mathsf{P}$ with eigenvalue, known as the parity of the state, equal
to +1 or −1. The coordinate vector is obviously odd under space inversion, so
$\mathsf{P}\mathbf{X}_n\mathsf{P} = -\mathbf{X}_n$. Hence if the initial and final atomic states have parity $\pi_i$ and
$\pi_f$, the transition rate will vanish in the electric dipole approximation unless the
initial and final parities satisfy the selection rule
$$ \pi_f = -\pi_i . \tag{7.5.47} $$
¹⁸ W. Heisenberg, Z. Physik 33, 879 (1925); reprinted in English in Van der Waerden, Sources of Quantum
¹⁹ J. Larmor, Phil. Mag. S.5 44, 503 (1897).
For instance, in hydrogen the transition 2p → 1s (ignoring spin) has ji = 1,
πi = −, jf = 0, πf = +, so it satisfies the selection rules (7.5.46) and (7.5.47)
and is therefore predominantly an electric dipole transition. This is the Lyman
alpha ultraviolet transition. On the other hand, in the electric dipole approxima-
tion the 2s, 3s, and 3d states are forbidden by both selection rules from decaying
into the 1s ground state.
Of course the electric dipole approximation is just an approximation. Instead
of simply replacing the coordinates $\mathbf{X}_n$ in the exponent in Eq. (7.5.32) with the
center-of-mass coordinate vector $\mathbf{X}$, we can expand the exponential in powers of
the small quantity $\mathbf{q}\cdot[\mathbf{X}_n - \mathbf{X}]$. With one factor of this quantity, the operator in the
matrix element involves two factors of coordinates, which can be assembled into
the spherical harmonics $Y_2^m$ and $Y_1^m$, which are respectively known as electric
quadrupole and magnetic dipole terms. With two factors of coordinates, these
operators are even under space inversion, so these contributions to the matrix
element vanish unless the initial and final states satisfy the selection rules
$$ |j_i - j_f| \le 2 \le j_i + j_f , \qquad \pi_i = \pi_f \qquad \text{electric quadrupole} \tag{7.5.48} $$
$$ |j_i - j_f| \le 1 \le j_i + j_f , \qquad \pi_i = \pi_f \qquad \text{magnetic dipole} . \tag{7.5.49} $$
For instance, in hydrogen the transition 3d → 1s occurs as an electric
quadrupole transition. The rates of both electric quadrupole and magnetic dipole
transitions are suppressed relative to electric dipole transitions by factors of
order $(qr/\hbar)^2$, where $r$ is a characteristic atomic radius; for optical transitions,
this is of order $10^{-7}$.
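The selection rules (7.5.46) through (7.5.49) are simple enough to tabulate mechanically. The sketch below is a small hypothetical helper, not part of the text: it lists which of the three lowest multipoles pass the angular momentum and parity conditions for a given pair of states. These are necessary conditions only; passing them does not guarantee a non-zero matrix element. In the hydrogen examples spin is ignored, so j is the orbital angular momentum and the parity is (−1)^ℓ.

```python
def triangle_ok(j_i, j_f, rank):
    """Check |j_i - j_f| <= rank <= j_i + j_f."""
    return abs(j_i - j_f) <= rank <= j_i + j_f


def allowed_multipoles(j_i, parity_i, j_f, parity_f):
    """Return the multipoles among E1, E2, M1 not excluded by Eqs. (7.5.46)-(7.5.49)."""
    allowed = []
    if triangle_ok(j_i, j_f, 1) and parity_f == -parity_i:
        allowed.append("E1")  # electric dipole: Eqs. (7.5.46) and (7.5.47)
    if triangle_ok(j_i, j_f, 2) and parity_f == parity_i:
        allowed.append("E2")  # electric quadrupole: Eq. (7.5.48)
    if triangle_ok(j_i, j_f, 1) and parity_f == parity_i:
        allowed.append("M1")  # magnetic dipole: Eq. (7.5.49)
    return allowed


# Hydrogen examples, ignoring spin: (j, parity) = (l, (-1)**l).
transitions = {
    "2p -> 1s": (1, -1, 0, +1),
    "2s -> 1s": (0, +1, 0, +1),
    "3d -> 1s": (2, +1, 0, +1),
}
for name, quantum_numbers in transitions.items():
    print(name, allowed_multipoles(*quantum_numbers))
```

The 2s → 1s case returns an empty list, in line with the remark below that single-photon emission is forbidden between two states of zero total angular momentum.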
We can go on, including higher and higher powers of Xn −X in the expansion
of the exponential in Eq. (7.5.32), and also including effects of electron spin.
But whatever effects we include, there is one kind of transition in which single-
photon emission is completely forbidden: transitions between states that both
have total angular momentum zero. This is a simple consequence of angular
momentum conservation. As we have seen, a photon of helicity ±1 has an
angular momentum component in the direction of motion ±1, and therefore
cannot be emitted in a transition between states that have zero total angular
momentum. For instance, none of the excited states of $^{12}$C or $^{16}$O with $j = 0$
can emit a single photon in gamma decay to the j = 0 ground state. Such
transitions require the emission of pairs of photons, or if enough energy is
available, of electron–positron pairs.
Gauge Invariance and Charge Conservation

It is essential both for the consistency of the Maxwell equations and for gauge
invariance that the current four-vector $J^\mu(x)$ defined by Eq. (7.5.7) should be
conserved. In modern theories the matter with which electromagnetic fields
interact is described by a field theory, so we need to ask, in what sort of field
theory for matter is this current conserved? There is a simple answer. If charge
is to be conserved, then the net charge destroyed by the product of fields in any
term in the Lagrangian density must vanish, so if each field $\varphi_n$ destroys a charge
$e_n$ and creates a charge $-e_n$, then
$$ \sum_n e_n \left[ \frac{\partial\mathcal{L}_{\rm mat}}{\partial\varphi_n}\, \varphi_n + \frac{\partial\mathcal{L}_{\rm mat}}{\partial(\partial\varphi_n/\partial x^\mu)}\, \frac{\partial\varphi_n}{\partial x^\mu} \right] = 0 , \tag{7.5.50} $$
with summation of course understood over the repeated spacetime index $\mu$. This
is the same as saying that the Lagrangian density is invariant under the phase
transformation
$$ \varphi_n \to [1 + i\epsilon e_n]\, \varphi_n , \tag{7.5.51} $$
with $\epsilon$ an arbitrary infinitesimal. Using the Euler–Lagrange equation (7.1.8)
allows us to write Eq. (7.5.50) as a conservation equation
$$ \frac{\partial J^\mu}{\partial x^\mu} = 0 , \tag{7.5.52} $$
where
$$ J^\mu = -i \sum_n \frac{\partial\mathcal{L}_{\rm mat}}{\partial(\partial\varphi_n/\partial x^\mu)}\, e_n \varphi_n . \tag{7.5.53} $$
(This is an example of the relation between symmetry principles and conservation
laws first expressed in the Noether theorem discussed in Section 5.7.) The
factor $-i$ is inserted here so that
$$ J^0 = -i \sum_n e_n \pi_n \varphi_n , \tag{7.5.54} $$
and therefore
$$ \Big[ \int J^0\, d^3x ,\, \varphi_n \Big] = -e_n \varphi_n , \tag{7.5.55} $$
which tells us that $e_n$ is indeed the value of the charge $\int J^0\, d^3x$ that is destroyed
by the field $\varphi_n$.
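For the vector potential to interact with this conserved current, in the sense of
Eq. (7.5.7), it is sufficient to arrange that $\partial\varphi_n/\partial x^\mu$ and $A_\mu$ always occur in the
matter Lagrangian density in the combination
$$ D_\mu \varphi_n \equiv \left[ \frac{\partial}{\partial x^\mu} - i e_n A_\mu \right] \varphi_n \tag{7.5.56} $$
so that
$$ \frac{\partial\mathcal{L}_{\rm mat}}{\partial A_\mu} = -i \sum_n e_n\, \frac{\partial\mathcal{L}_{\rm mat}}{\partial(\partial\varphi_n/\partial x^\mu)}\, \varphi_n = J^\mu . \tag{7.5.57} $$
From this, Eq. (7.5.7) follows immediately.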
For instance, we can use this prescription to include electromagnetic interactions
in the Lagrangian density (7.4.5) for a complex scalar field $\varphi$ that destroys
charge $e$:
$$ \mathcal{L}_{\rm mat} = -\eta^{\mu\nu} \left[ \frac{\partial\varphi}{\partial x^\mu} - i e A_\mu \varphi \right]^\dagger \left[ \frac{\partial\varphi}{\partial x^\nu} - i e A_\nu \varphi \right] . $$
Also, we used this prescription in the previous section to include electromagnetic
interactions in the Lagrangian density for Dirac fields.
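As a check on this scalar example, one can evaluate the current (7.5.53) and compare it with the derivative of the matter Lagrangian with respect to the vector potential. The sketch below makes two assumptions that are not spelled out in the text: the fields φ and φ† are treated as the two fields in the sum over n, with charges e and −e respectively, and operator ordering is ignored, as for classical fields. It uses the shorthand D^μ ≡ η^{μν}D_ν, with D_μφ the combination (7.5.56).

```latex
\begin{align*}
\frac{\partial\mathcal{L}_{\rm mat}}{\partial(\partial\varphi/\partial x^\mu)}
  &= -(D^\mu\varphi)^\dagger , \qquad
\frac{\partial\mathcal{L}_{\rm mat}}{\partial(\partial\varphi^\dagger/\partial x^\mu)}
  = -D^\mu\varphi , \\
J^\mu
  &= -i\left[ -(D^\mu\varphi)^\dagger\, e\,\varphi
     + \left(-D^\mu\varphi\right)(-e)\,\varphi^\dagger \right]
   = i e \left[ (D^\mu\varphi)^\dagger \varphi - \varphi^\dagger D^\mu\varphi \right] , \\
\frac{\partial\mathcal{L}_{\rm mat}}{\partial A_\mu}
  &= -\left[ (i e\,\varphi^\dagger)\, D^\mu\varphi
     + (D^\mu\varphi)^\dagger (-i e\,\varphi) \right]
   = i e \left[ (D^\mu\varphi)^\dagger \varphi - \varphi^\dagger D^\mu\varphi \right]
   = J^\mu .
\end{align*}
```

Under these assumptions the two expressions agree, as they should if the scalar Lagrangian depends on the derivative of φ and on the vector potential only through the combination (7.5.56).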
There is a more general possibility, that in addition to depending on the
$\varphi_n$ and $D_\mu\varphi_n$, the matter Lagrangian density may also depend on the
gauge-invariant field-strength tensor $F_{\mu\nu}$. In this case, there is an additional term in
the current defined by Eq. (7.5.7):
$$ J^\mu = -i \sum_n \frac{\partial\mathcal{L}_{\rm mat}}{\partial(\partial\varphi_n/\partial x^\mu)}\, e_n \varphi_n + \frac{\partial}{\partial x^\nu} \frac{\partial\mathcal{L}_{\rm mat}}{\partial F_{\mu\nu}} . \tag{7.5.58} $$
Because of the antisymmetry of $F_{\mu\nu}$, the new term in $J^\mu$ is separately conserved.
The possibility of this new term alerts us that the general principles
of electrodynamics do not in themselves fully dictate the parameters in the
Lagrangian that characterize the interaction of matter and radiation, including
the magnetic moments of various particles.
Local Phase and Matrix Transformations

These prescriptions can be framed as consequences of an extended version of
gauge invariance. We have already noted that the condition (7.5.50) of charge
conservation is equivalent to the invariance of the Lagrangian density under the
phase transformation (7.5.51). But if $\partial\varphi_n/\partial x^\mu$ only appears in the Lagrangian
in the form (7.5.56), then the Lagrangian is invariant under a local phase
transformation, with $\epsilon(x)$ an arbitrary infinitesimal function of spacetime coordinates,
$$ \varphi_n(x) \to [1 + i\epsilon(x) e_n]\, \varphi_n(x) , \tag{7.5.59} $$
provided that the vector potential at the same time undergoes the gauge
transformation
$$ A_\mu(x) \to A_\mu(x) + \frac{\partial\epsilon(x)}{\partial x^\mu} . \tag{7.5.60} $$
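The key step behind this statement is that, to first order in the infinitesimal function ε(x), the combination (7.5.56) transforms in the same way as the field itself, so that the derivative of ε cancels. A short check, keeping only terms of first order in ε:

```latex
\begin{align*}
D_\mu\varphi_n
  &\to \left[ \frac{\partial}{\partial x^\mu}
       - i e_n \left( A_\mu + \frac{\partial\epsilon}{\partial x^\mu} \right) \right]
       \left[ 1 + i\epsilon e_n \right] \varphi_n \\
  &= D_\mu\varphi_n
     + i e_n \frac{\partial\epsilon}{\partial x^\mu}\,\varphi_n
     + i\epsilon e_n \frac{\partial\varphi_n}{\partial x^\mu}
     - i e_n \frac{\partial\epsilon}{\partial x^\mu}\,\varphi_n
     + \epsilon e_n^2 A_\mu \varphi_n
     + O(\epsilon^2) \\
  &= \left[ 1 + i\epsilon(x) e_n \right] D_\mu\varphi_n + O(\epsilon^2) .
\end{align*}
```

A Lagrangian density that is invariant under the constant phase transformation (7.5.51), and in which derivatives of the fields enter only through this combination, is therefore also unchanged when ε depends on x.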
Today this reasoning is often run in reverse. It is assumed that the Lagrangian
density is invariant under the local phase transformation (7.5.59), with $\epsilon(x)$
an arbitrary infinitesimal function of spacetime coordinates, and from this the
existence is deduced of a vector field $A_\mu(x)$ whose properties are governed by
invariance under the gauge transformation (7.5.60).
Indeed, our Standard Model of elementary particles and forces is based on
an assumed invariance under a larger group of local transformations, not just
by x-dependent phases, as in Eq. (7.5.59), but transformations by x-dependent
matrices similar to those for isotopic spin rotations. From this, one deduces
the existence of a number of photon-like particles: some, the gluons with zero
mass, whose strong interactions prevent them from being observed in isolation,
and others that are observed, the W± and Z0 , that become massive as a result
of a spontaneous breakdown of the local gauge symmetry. But these matters are
beyond the scope of this book.
Assorted Problems
1. Suppose that in a diatomic gas such as H₂, the vibrational degrees of freedom
are fully excited, along with the rotational and translational degrees of
freedom. What is the ratio of the energy density of the gas to its pressure?
What does this tell you about the speed of sound in the gas?
2. Suppose that Einstein in 1905 had assumed that, in the radiation at tem-
perature T in a cubical enclosure, the number n of photons for each wave
number and polarization is not any positive integer but can only be n = 0,
n = 1, or n = 2. What would he have found for the energy density E (ν, T )
per unit frequency interval at frequency ν and temperature T ?
3. Suppose that in the 1910 experiment that revealed the existence of the
nucleus of the atom, the nucleus had been moving toward the radon alpha
ray source with speed v0 . What could one conclude about the mass of the
nucleus from the observation that alpha particles are sometimes scattered
straight backwards from the atom?
4. Suppose that the potential energy of an electron in the field of a nucleus is
not $-Ze^2/r$ but rather $V(r) = -g r^{-\eta}$, where $g$ and $\eta$ are positive-definite
constants, but that Bohr’s quantization condition $m_e v_n r_n = n\hbar$ is still valid,
with $\hbar$ some constant and $n$ running over all positive-definite integers.
• What would Bohr in 1913 have found for the radii $r_n$, velocities $v_n$, and
energies $E_n$?
• For what values of $\eta$ do circular orbits exist that have $E_n < 0$?
• What would be found for the relation between $\hbar$ and $h$ if one imposed
Bohr’s correspondence principle on the orbits with $n \gg 1$?
5. How does the pressure in a non-relativistic ideal gas vary when the mass
density varies adiabatically, assuming that the internal energy density is
either
• much bigger than the pressure, or

• equal to the pressure, or
• one percent of the pressure?
6. It is summer in Texas. The temperature outside is 104 °F (40 °C). In order
to keep the inside of your house at a comfortable 68 °F (20 °C) you need
to take $10^4$ joules per second of heat energy from inside to outside. For this
purpose you use an inverse Carnot cycle. How much power will you need
to run this?
7. In the electrolysis of water, how long does it take a 1 ampere current to
produce 1 gram of oxygen gas?
8. Fifteen grams of element X combine with three grams of hydrogen to pro-
duce 18 grams of a compound Y of element X and hydrogen, with nothing
left over. Also, as gases all at the same temperature and pressure, 2 liters
of element X combine with 3 liters of hydrogen (H₂) to give 2 liters of
compound Y. What is the chemical formula for compound Y, and what
is the atomic weight of element X? (Take the atomic weight of hydrogen
atoms as 1, and assume the validity of Avogadro’s principle.)
9. Consider a particle of mass $m$ and velocity $\mathbf{v}$, with $|\mathbf{v}| \ll c$. Find the term in
the energy of this particle of order $mv^4/c^2$, and the term in the momentum
of order $mv^3/c^2$.
10. Suppose an observer who uses coordinates $x^\mu$ sees a uniform magnetic
field of magnitude $B_1$, pointing in the 1-direction, and zero electric field.
A second observer uses coordinates $x'^\mu = \Lambda^\mu{}_\nu x^\nu$, where $\Lambda^\mu{}_\nu$ is the
Lorentz transformation (4.2.6) that gives a body at rest a velocity with
magnitude $v$ in the 3-direction. What are the values of the components of
the electric and magnetic fields seen by the second observer?
11. In a spacetime with two space dimensions and one time dimension, the
electromagnetic field consists of a two-component electric field $\mathbf{E}$ and a
one-component magnetic field $B$. They satisfy differential equations
$$ 4\pi J_1 = -\frac{\partial E_1}{\partial t} + c\, \frac{\partial B}{\partial x_2} , \qquad 4\pi J_2 = -\frac{\partial E_2}{\partial t} - c\, \frac{\partial B}{\partial x_1} , $$
$$ 4\pi\rho = \frac{\partial E_1}{\partial x_1} + \frac{\partial E_2}{\partial x_2} , \qquad 0 = \frac{\partial E_2}{\partial x_1} - \frac{\partial E_1}{\partial x_2} + \frac{1}{c}\, \frac{\partial B}{\partial t} . $$
Find what kind of transformation properties, under (2 + 1)-dimensional
Lorentz transformations, we can give the field components $E_1$, $E_2$, and
$B$ and the densities $J_1$, $J_2$, and $\rho$ so that the above equations are Lorentz
invariant, in the sense that they are invariant under linear transformations
on spacetime intervals that leave $(x^1)^2 + (x^2)^2 - c^2 (t)^2$ invariant.
Show that with the fields and densities transforming in this way, these
equations really are Lorentz invariant.
12. A particle known as a K meson, with mass 494 MeV/$c^2$, decays at rest into
a muon, with mass 106 MeV/$c^2$, and a neutrino, with negligible mass. Use
the conservation of energy and momentum to find the velocity of the muon.
13. What second-order partial differential equation (second-order in both time
and space derivatives) is satisfied by the de Broglie wave function for a free
particle when we do not assume that its velocity is much less than c?
14. When a beam of electrons of some definite energy is directed at a perfect
crystal, it is found that the largest angle θ between the incident and reflected
waves at which reflection is enhanced by constructive interference is 150°.
At what other value or values of θ is reflection enhanced by constructive
interference?
15. Suppose we measure the position of the electron in the lowest-energy state
of a hydrogen atom. What is the probability of finding that the electron is
farther than $10^{-8}$ cm from the nucleus?
16. Consider an electron in a $d_{3/2}$ state with orbital angular momentum quantum
number $\ell = 2$, total angular momentum quantum number $j = 3/2$,
and total angular momentum 3-component $J_3 = \hbar/2$. Suppose we measure
the 3-component $S_3$ of the spin. What are the probabilities of getting the
results $S_3 = \hbar/2$ and $S_3 = -\hbar/2$? (Calculate whatever Clebsch–Gordan
coefficients you need – do not just look them up in a table.)
17. Suppose the electron has spin 3/2 rather than 1/2, but that all other prop-
erties of electrons and nuclei are as they are in the real world. What would
you expect would be the atomic numbers Z of the two lightest halogen
elements, that behave like fluorine and chlorine in our world?
18. When a free electron is placed in a uniform magnetic field $\mathbf{B}$ pointing in the
1-direction, the Hamiltonian becomes
$$ H = \frac{\mathbf{p}^2}{2m_e} + \mu |\mathbf{B}| S_1 $$
where $\mathbf{S}$ is the operator representing the electron spin vector and $\mu$ is a
constant, related to the electron magnetic moment. Suppose that at $t = 0$
the expectation value of the spin vector has components
$$ \langle S_1 \rangle = \langle S_2 \rangle = 0 , \qquad \langle S_3 \rangle = \hbar/2 . $$
What are the expectation values of the spin vector components at any later
time?
19. Suppose that the interaction of the electron in a hydrogen atom with some
sort of external field produces a term in the potential
V (r) = gr ,
where g is a small constant. Calculate the terms in the resulting shift in the
energy of the 1s state that are of first and second order in g.
20. Suppose that the spin–orbit coupling of the electron in hydrogen produces
a term in the Hamiltonian
H = ξ L · S
where ξ is a constant, and L and S are the orbital angular momentum and
spin angular momentum of the electron. What does this term contribute to
the fine-structure splitting between the 2p1/2 and 2p3/2 states of hydrogen?
21. Consider the scattering of a spinless particle of mass $m$ and momentum $p$
by a central potential
$$ V(r) = V_0 \exp(-r^3/R^3) $$
where $V_0$ and $R$ are constants. Use the Born approximation to give a formula
for the scattering amplitude in the limit $pR \ll \hbar$.
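22. Consider a one-particle system with a Lagrangian
$$ L = \frac{m}{2} \left( \frac{d\mathbf{X}}{dt} \right)^2 + \frac{d\mathbf{X}}{dt}\cdot\mathbf{V}(\mathbf{X}) , $$
where $\mathbf{V}$ is some vector function of position $\mathbf{X}$.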
• What equation of motion is satisfied by X?
• Find the Hamiltonian of this theory.
• What is the differential equation satisfied by the wave function ψ(x) of
a state with a definite energy E?
23. Consider a particle of charge $e$ and mass $m$ in classical electromagnetic
potentials that depend on time as well as position, with Hamiltonian
$$ H(\mathbf{X},\mathbf{P}) = \frac{1}{2m} \left[ \mathbf{P} - \frac{e}{c}\mathbf{A}(\mathbf{X},t) \right]^2 - e\phi(\mathbf{X},t) . $$
Suppose you perform a time-dependent and position-dependent gauge
transformation, to new potentials
$$ \mathbf{A}' = \mathbf{A} + \nabla\xi , \qquad \phi' = \phi - \frac{1}{c}\, \frac{\partial\xi}{\partial t} , $$
where $\xi$ is an arbitrary real function of position and time. What is the
relation between the wave function $\psi'(\mathbf{x},t)$ that satisfies the time-dependent
Schrödinger equation for the new potentials and the wave function $\psi(\mathbf{x},t)$
that satisfies the time-dependent Schrödinger equation for the original
potentials?
24. Find the coordinate-space wave function of the one-particle state with angular
momentum $\ell = 0$ and energy $V_0 + 2\hbar\omega$ in the harmonic oscillator
potential (6.3.1).
25. Suppose that the potential felt by an alpha particle for radius $r$ outside the
nuclear radius $R$ is not the Coulomb potential, but instead $V(r) = g/r^2$,
where $g$ is some positive constant. Calculate the exponential suppression
factor in the rate of decay of an unstable alpha particle state with energy
$E \ll g/R^2$.
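26. Consider the theory of a neutral spinless particle A and a non-neutral spinless
particle B, with Lagrangian density
$$ \mathcal{L} = -\frac{1}{2}\eta^{\mu\nu} \frac{\partial\varphi_A}{\partial x^\mu} \frac{\partial\varphi_A}{\partial x^\nu} - \frac{m_A^2}{2}\varphi_A^2 - \eta^{\mu\nu} \frac{\partial\varphi_B^\dagger}{\partial x^\mu} \frac{\partial\varphi_B}{\partial x^\nu} - m_B^2 \varphi_B^\dagger \varphi_B - g \varphi_A \varphi_B^\dagger \varphi_B . $$
Calculate the S-matrix elements for the processes $A + B \to A + B$ and
$B + \bar{B} \to A + A$ to lowest order in $g$, where $\bar{B}$ is the antiparticle of B.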
27. Calculate the rate for emission of a photon in the transition 2p → 1s in
hydrogen. Derive formulas and use them to find numerical values. You can
use the facts that the proton is much heavier than the electron, and that the
wavelength of the photon emitted in this process is much larger than the
atomic size, and you can neglect electron spin.
28. What powers of the photon wave number appear in the rates for single-
photon emission in the decays of the 4f state of hydrogen into the 2s and
2p states?
Bibliography
R. T. Beyer, Foundations of Nuclear Physics (Dover Publications, New York, 1949)
[Chapters 3, 6]. Facsimiles of 19 early papers on nuclear physics, including papers
of Rutherford from 1911 and 1919.
Max Born, Atomic Physics (Blackie & Sons, London, 1937; 6th edn. Hafner Publishing,
New York, 1956) [Chapters 2, 3, 5]. An excellent survey of quantum theory and its
applications, from a founder of the theory, with 39 useful mathematical appendices.
Stephen G. Brush, The Kinetic Theory of Gases – An Anthology of Classic Papers with
Historical Commentary (Imperial College Press, London, 2003) [Chapters 1 and 2].
This is an invaluable collection of original papers, including work by Boyle, Newton,
Bernoulli, Joule, Clausius, Maxwell, and Boltzmann cited in the text, along with
reprints of interesting historical discussions by Brush.
J. Chadwick, ed., The Collected Papers of Lord Rutherford of Nelson (Interscience,
1963) [Chapters 3, 6].
P. A. M. Dirac, Principles of Quantum Mechanics (Clarendon Press, Oxford, 1930; 4th
edn. 1958) [Chapter 5]. Long the leading treatise on quantum mechanics.
Enrico Fermi, Thermodynamics (Prentice Hall Co., New York, 1937; reprinted by Dover
Press in 1956) [Chapter 2]. This is a masterpiece of scientific exposition, based on
lectures that Fermi gave at Columbia University in 1936.
G. Holton, Am. J. Phys. 28, 627 (1960) [Chapter 4]. Insightful assessment of contribu-
tions of Einstein, Lorentz, and Poincaré to special relativity theory.
A. J. Ihde, The Development of Modern Chemistry (Harper & Row, 1964) [Chapter 1].
Martin J. Klein, ed., Letters on Wave Mechanics (Philosophical Library, New
York, 1967) [Chapter 5]. Correspondence among Einstein, Lorentz, Planck, and
Schrödinger, translated into English with useful commentary by Klein.
Thomas S. Kuhn, Black-Body Theory and the Quantum Discontinuity, 1894–1912
(Oxford University Press, New York, 1978) [Chapter 3]. This is a detailed analysis
of the work of Planck and Einstein on black-body radiation.
L. D. Landau and E. M. Lifshitz, Fluid Mechanics (Pergamon Press, London, 1959)
[Chapter 2]. This is the classic text on many aspects of fluid mechanics, including the
hydrodynamics of viscous fluids. It is translated from the Russian by J. B. Sykes and
W. H. Reid.
Arthur I. Miller, Albert Einstein’s Special Theory of Relativity: Emergence (1905)
and Early Interpretation (1905–1911) (Addison-Wesley, Reading, MA, 1981)
[Chapter 4]. Detailed analysis of the founding of special relativity.
M. J. Nye, The Question of the Atom – From the First Karlsruhe Conference to the
First Solvay Conference (Tomash Publishers, Los Angeles, 1984) [Chapters 2 and 3].
English language version of research reports from 1860 to 1911, including papers by
Boltzmann, Einstein, Mendeleev, Perrin, Rutherford, J. J. Thomson.
Jean Perrin, Brownian Motion and Molecular Reality (Taylor and Francis, London,
1910) [Chapter 2]. This is the translation from the French by F. Soddy of Perrin’s
review of his experiments on diffusion, published in September 1909 in the Annales
de Chimie et de Physique, 8th Series.
Wayne Saslaw, “A History of Thermodynamics: The Missing Manual,” Entropy 22, 27
(2020).
J. F. Shearer and W. M. Deans, Collected Papers on Wave Mechanics (Blackie and Son,
London, 1928) [Chapter 5]. Papers of Schrödinger and others, translated into English.
John Stachel, ed., Einstein’s Miraculous Year (Princeton University Press, Princeton,
NJ, 1998) [Chapters 2, 3, 4]. This is an invaluable collection of Einstein’s papers
from about 1905 on Brownian motion, special relativity, and the photon.
B. L. Van der Waerden, Sources of Quantum Mechanics (North-Holland Publishing,
Amsterdam, 1967; reprinted by Dover Publications, New York, 1968) [Chapters 3, 5].
Papers on early quantum theory and matrix mechanics by Bohr, Dirac, Einstein,
Heisenberg, Pauli, and others, all in English or English translation.
Steven Weinberg, The Discovery of Subatomic Particles (Scientific American Library,
1983; revised edn. Cambridge University Press, Cambridge, UK, 2003) [Chapters 1,
3, 6]. This is a non-mathematical historical account of the discoveries of the electron,
proton, neutron, photon, etc., going back to the beginnings of chemistry, with algebra-
based technical appendices.
Steven Weinberg, Lectures on Quantum Mechanics (Cambridge University Press,
Cambridge, UK, 2012; 2nd edn. 2015) [Chapters 3, 5, 6, and 7]. This is a graduate-
level introduction to quantum mechanics, with some historical discussion of the early
quantum theory.
Steven Weinberg, “Half a Century of the Standard Model,” Phys. Rev. Lett. 121, 220001
(2018) [Chapters 6, 7]. A historical review of the Standard Model.
L. Pearce Williams, Relativity Theory: Its Origins and Impact on Modern Thought (John
Wiley and Sons, New York, 1968) [Chapter 4]. Contains an 1887 article by Michelson
and Morley and text of a 1904 talk by Poincaré.
Author Index
Ampère, A.-M., 12, 117
Anaximenes, 6
Anderson, C. D., 245
Anderson, M. H., 168
Archimedes, 88
Aristarchus, 88
Aristotle, 2, 50, 89
Aston, F. W., 211
Avogadro, A., 8–10

Bahcall, J. N., 249
Balmer, J., 81
Becquerel, A. H., 71, 72
Bernoulli, D., 4, 5, 33
Beyer, R. T., 73, 210, 212, 245
Blatt, J. M., 233
Bohr, N., 79–83, 139, 244
Boltzmann, L., 9, 33, 34, 40
Born, M., 66, 131, 132, 179
Bose, S. N., 167
Boyle, R., 2, 3, 5
Brackett, F. S., 81
Bragg, L., 127
Bragg, W. H., 127
Breit, G., 218
Brillouin, L., 236
Brown, R., 51, 55
Brush, S. G., 22, 33, 34, 50
Buridan, J., 89

Carlisle, A., 13
Carnot, S., 16, 22, 25
Cassen, B., 218
Cavendish, H., 6
Chadwick, J., 212, 213, 244
Charles, J., 3, 26
Christensen, J. H., 36
Clausius, R., 4–6, 9, 17, 19, 20, 22, 27, 32–34, 38, 49
Cohan, C., Jr., 247
Compton, A. H., 71, 113
Condon, E. U., 218, 233
Copernicus, N., 89
Cornell, E., 168
Coulomb, C. A., 11, 76
Cronin, J. W., 36
Curie, M., 71, 212, 235

Dalton, J., 7–9, 75
Davis, R., Jr., 249
Davisson, C., 126, 127
Davy, H., 13
Democritus, 1
Dirac, P. A. M., 124, 161, 167, 198, 206, 244, 245, 250, 271, 276, 291
Dyson, F., 261

Einstein, A., xiii, xiv, 52, 53, 55–60, 65–71, 84–86, 88, 90, 94–96, 106, 110, 111, 117, 138, 167, 251, 290, 296
Ellis, C. D., 244, 247
Empedocles, 6
Epicurus, 1

Faraday, M., 12, 13, 117
Feenberg, E., 218
Fermi, E., xiv, 24, 167, 224, 235, 240, 245, 250, 279
Feynman, R., 261, 283
Fierz, M., 167, 275
Fitch, V. L., 36
Fitzgerald, G. F., 94, 95, 104
Flanders, H., 120
Fock, V., 260
Fourier, J., 16
Franklin, B., 10
Fraunhofer, J., 77
French, J. B., 161, 205
Friedman, J., 248

Galilei, G., 89
Gamow, G., 233, 246
Garwin, R., 248
Gay-Lussac, J. L., 3, 5, 8
Geiger, H., 72, 73, 234
Geissler, H. J., 14
Germer, L., 126, 127
Gibbs, J. W., 33, 35
Gilbert, W., 11
Goeppert-Mayer, M., 228
Goudsmit, S., 151, 156, 170
Gove, N. B., 214
Green, G., 177
Gurney, R. W., 233

Hafstad, L., 218
Harmer, D. S., 249
Harteck, E., 220
Hartree, D., 170
Heath, T. L., 88
Heisenberg, W., 124, 147, 225, 291
Heitler, W., 212
Helmholtz, H., 17
Heraclides, 89
Heraclitus, 6
Herschel, W., 16
Herzberg, G., 212
Heydenberg, N., 218
Hoffman, K. C., 249
Huygens, C., 18, 108

Ito, D., 261

Jaseja, T. S., 94
Javan, A., 94
Jeans, J., 65, 68, 69
Jensen, J. H. D., 228
Joule, J. P., 17

Kanesawa, S., 261
Kepler, J., 89
Kirchhoff, G. R., 61
Koba, Z., 261
Kramers, H. A., 161, 236
Kroll, N., 161, 205

Lamb, W. E., Jr., 161, 205
Landau, L. D., 47
Laplace, P.-S., 16
Larmor, J., 291
Lattes, C. M. G., 222
Lavoisier, A., 6, 7, 16
Lederman, L., 248
Lee, T.-D., 247, 248
Lenard, P., 70
Lewis, G. N., 114
Lifshitz, E. M., 47
Lippmann, B., 183
Lorentz, H. A., 68, 94, 95, 104
Loschmidt, J., 53
Lucretius, 1
Lyman, T., 81

Mach, E., 58
Machiavelli, N., 1
Majorana, E., 250
Mariotte, E., 5
Marsden, E., 73
Maxwell, J. C., 12, 14, 33, 50, 91, 95, 117, 120
Mendeleev, D. I., 171
Michelson, A., 91, 93, 94
Millikan, R. A., 54, 55, 70
Minkowski, H., 96, 97
Montaigne, M., 1
More, 1
Morley, E., 91, 93, 94
Morse, S. F. B., 12
Moseley, H. G. J., 83
Muirhead, H., 222
Murray, J., 94

Nagaoka, H., 76
Navier, C.-L., 44, 46
Nernst, W., 32
Newton, I., 1, 3, 18, 21, 59, 108
Nicholson, W., 13
Noether, A. E., 194
Nuttall, J. M., 234

Occhialini, G. P. S., 222
Oersted, H. C., 12, 117
Oliphant, M., 220
Orear, J., 235
Oresme, N., 89
Ostwald, F. W., 58

Pascal, B., 2
Paschen, F., 81
Pauli, W., 151, 167, 170, 244, 245, 271, 275
Perrin, J., 57, 58, 60
Pfund, H., 81
Planck, M., 57, 58, 65–68
Plato, 6, 10, 11, 89
Plücker, J., 14
Poincaré, H., 94, 120
Pontecorvo, B., 249
Powell, C. F., 222
Priestley, J., 6
Prout, W., 211
Ptolemy, C., 89
Pythagoras, 88

Rayleigh, Lord [see Strutt, J. W.], 65, 67–69
Reines, F., 247
Retherford, R. C., 161
Ritz, W., 78, 79, 81
Rosenfeld, A. H., 235
Royds, T. D., 72
Rumford, Count [see Thompson, B.], 17
Rutherford, E., 71–73, 75–77, 176, 182, 210, 211, 213, 217, 218, 220, 229
Rydberg, J., 81

Schluter, R. A., 235
Schwinger, J., 183, 198, 261
Shakespeare, W., 1
Shubert, K. R., 36
Slater, J. C., 170
Socrates, 6
Soddy, F., 72
Stefan, J., 67
Stokes, G., 44, 46, 47, 50
Stoney, G. J., 14, 54
Strutt, J. W. [Lord Rayleigh], 65

Tati, T., 261
Telegdi, V., 248
Thales, 6
Thompson, B. [Count Rumford], 17
Thomson, G. P., 127
Thomson, J. J., 14, 15, 54, 70, 72, 75, 113
Tomonaga, S.-I., 261
Torricelli, E., 2
Townes, C. H., 94
Townsend, J. S. E., 54, 55
Turlay, R., 36
Tuve, M. A., 218

Uhlenbeck, G., 151, 157

van den Broek, A., 76
Van der Waerden, B. L., 84, 124, 225, 291
Volta, A., 11

Wapstra, A. H., 214
Weinberg, S., 1, 36, 47, 88, 97, 147, 204, 217, 221, 260, 275
Weinrich, M., 248
Weisskopf, V. F., 161, 205, 233, 271
Wentzel, G., 236
Wiemann, C., 168
Wien, W., 66, 68
Wilson, H. A., 54
Wooster, W. A., 244, 247
Wu, C. S., 248

Xenophanes, 6

Yang, C.-N., 248
Yukawa, H., 182, 270

Zeeman, P., 135
Subject Index
A and B coefficients, 84–85
action, 193, 252
adiabatic change, 20–21, 31
adjoints of operators, 141–143
alkali metals, 151, 172
alpha decay and alpha particles, 71–73, 216, 229–243
ampere, unit of electric current, 12
angular momentum, see also molecules, rotations, spherical harmonics, spin
  addition, 158–159, 162–165
  commutation relations, 153–154
  conservation, 44, 149, 153
  multiplets, 134–135, 155–157
  operator in wave mechanics, 132–136
  quantization, 79, 82, 133
antiparticle, 245, 271–272
atomic number, 82–84
atomic weight, 7–8, 83
atom, 1, 6, 58, see also atomic number, atomic weight, combining weights law, element, nuclei of atoms
Avogadro’s number N_A, 10, 54, 57, 58
Avogadro’s principle, 8–9

barrier penetration, 229–243
baryon number, 162, 219–220, 222
Bessel functions, 237–239
beta decay, 71–72, 216, 243–250, 279–280
black-body radiation, 57, 61–62
Boltzmann constant k, 6, 9, 30, 54, 57, 66–67
Born approximation, 180–181
Born rule, see probabilities in quantum mechanics
Bose–Einstein statistics, 167–168
boson, 165–168, 173–175, 273–275
Boyle’s law, 2, 4, 5
Bragg formula, 127–129
Brownian motion, 51, 55–57

caloric, 16
calorie, unit of heat, 17
canonical ensemble, 37–38
canonical formalism, 190–195, 252–254
Carnot cycle, 22–26, 29
cathode rays, 14–15
causality, 121–123, 264
Charles’ law, 3, 4, 5, 9, 31
chemical potential, 38, 168
Clebsch–Gordan coefficients, 159, 162–165, 173, 223–224
combination principle, 78–79
combining volumes law, 8–9
combining weights law, 7–8
commutators, 146, 192
Compton wavelength h/m_e c, 113, see also scattering
conservation
  angular momentum, 153
  differential equation for transport phenomena, 42–43, 114–115
  electric charge, 293
  in quantum mechanics, 149
continuum limit in quantum mechanics, 145–146
convection, 31, 51
cooling of interstellar gas, 175
correspondence principle, 80
coulomb, unit of electric charge, 12
Coulomb force, 11, 79, 136, 214, 230
Coulomb gauge, 282–284
Coulomb scattering, 75, 181–182
creation and annihilation operators, 257, 270
cross section σ, 49–50, 75, 180, 188–189

D line of sodium, 151
Davisson–Germer experiment, 126–128
De Broglie waves, 125–128, 140
decay rates, 188, 190
degeneracy
  atomic energy levels, 138, 151, 172
  harmonic oscillator energy levels, 227
  in perturbation theory, 200–202
  molecular energy levels, 174
delta function, 177–178, 241
deuteron, 157, 217, 219
diffusion, 51–53, 58–60, see also Brownian motion
diffusion constant D, 51, 59
Dirac equation and field, 276–280
Doppler effect, 106

eigenfunctions and eigenvalues, 140
eigenstates and eigenvalues, 207
Einsteinian relativity, 94–123
electric and magnetic forces, 12, 15, 120–121, 196–197
electrostatics, 10–11
electric dipole transitions, see radiative decay
electrine, 14
electrolysis, 13–14
electromagnetism, 11–12, see also light, Maxwell’s equations
electron, see also beta decay, gyromagnetic ratio
  charge, 54–55, 58
  discovery, 14–15
  in nucleus?, 211–212
  orbits in atoms, 78
  spin, 151–157
element, 6–8, see also periodic table
energy, see also equipartition of energy, Hamiltonian, heat, kinetic energy
  conservation, 17–19, 23, 43
  fields, 254
  relation to mass, 106–110
entropy, 27–31, 38–39, 41–42
equipartition of energy, 40–41
Euler–Lagrange equations, 253
exclusion principle, see Pauli exclusion principle
expectation values, 143–144, 207

faraday, charge per mole, 13–14, 15, 53, 58
Fermi–Dirac statistics, 169
fermion, 165–171, 173–175, 273–280
Feynman diagrams, 267–268
fine structure, 159–161, 203, 279
Fock space, 260
force, see also electromagnetism, Stokes’ law
  in relativity theory, 110–111

Galilean relativity, 45–46, 51, 88–90, 100
gamma decay, 216, 221
gauge transformations, 197, 282, 293–295
gas constant R, 10, 53
gases, 2–6, 8–10, 26–27, 31, 39–40
Geiger–Nuttall law, 234
general relativity, 102, 260
Green’s function, 178
gyromagnetic ratio, 198

H theorem, 34–37
halogen, 172
Hamiltonian, 141, 190–193
  for beta decay interaction, 245–246
  for charged particle in electromagnetic field, 196
  for fields, 254
  for electromagnetic field, 286
  for harmonic oscillator, 225
harmonic oscillator, 225–228
Hartree approximation, 169–170, 213, 224
heat, 16–17
Heisenberg picture, see time dependence in quantum mechanics
helicity, 158, 284–285
helium nucleus, see alpha decay and alpha particles
Hermitian operator, 142–143
Hilbert space, 124, 146
Hydrodynamics of Bernoulli, 4
hydrogenic energy levels, 79–82, 126, 136–138
hyperfine splitting, see twenty-one cm spectral line

ideal gas, 9–10, 26–27, 31
infrared radiation, 16, 81
interaction picture, 263
isotope, 7, 211
isotopic spin symmetry, 218–224

joule, unit of energy, 17

K meson, 36, 247
kinetic energy, 17–18
kinetic theory, 33–34
Knudsen regime, 51

Lagrangian, 193–194, 252–255
Lamb shift, 161, 203, 205
lasers, 86
lepton number, 249–250
light
  light cone, 123
  polarization, 62–65, 284–285
  speed, 12, 90–91, 100–101
  transformation of wave number and frequency, 104–106
linear operators, 139
Lippmann–Schwinger equation, 183, 261
liquid drop model, 214–215
lodestone, 11
Lorentz–Fitzgerald contraction, 94, 104
Lorentz invariance and transformation, 96–102, 264, 270, 273–274
Lyman alpha transition, 292

magnetism, 11, 198, see also electromagnetism
magic numbers, 228–229
mass in relativity theory, 106–110
mass of unit atomic weight m_1, 9–10, 214
matrix mechanics, 124, 225, 227, 291
Maxwell–Boltzmann distribution, 33–34, 39–40
Maxwell’s equations, 12, 62, 114–120, 196, 281–282
mean free path, 49–51
Michelson–Morley experiment, 91–95
Minkowski spacetime notation, 96–97, 117–119
mole, defined, 10
molecular weight μ, 9–10
molecule, 8–9, 41
  angular momentum, 40–41
  diatomic molecules, 172–175, 212
  size and mass, 53
momentum
  conservation, 18, 43–44, 149
  fields, 254
  momentum-space wave functions, 145–146
  of photons, 70, 111–112
  of relativistic particles, 109
  operator, 130, 140
muons, 106, 223, 248

Navier–Stokes equation, 44–47
neutral matter, 31–32
neutrinos, 157, 244, 247–250
neutron
  decay, 244
  discovery, 212–213
  mass, 213, 222
noble gases, 172, 224
Noether’s theorem, 194–195, 293
normalization of wave functions, 131, 139, 148, 240
nuclear force, 213–214, 217–218, 224–225
nuclei of atoms
  binding energy, 214–216
  charge, 76–77
  discovery, 72–73
  mass, 73–74
  radius, 74–75, 213–214, 235
nucleons, 213, 217

On the Electrodynamics of Moving Bodies of Einstein, 95
On the Nature of Things of Lucretius, 1
ortho and para molecules, 173–175, 212
orthogonal state vectors, 207
orthogonal wave functions, 142–144
operators representing observables, 139–141
osmotic pressure, 55–56

para molecules, see ortho and para molecules
parity, see space inversion symmetry
Pauli exclusion principle, 170–171, 213, 215, 271, 274
  also see magic numbers, periodic table
periodic table of elements, 171–172
perturbation theory, 199–205, 262–264
photoelectric effect, 70
photons, 67–71, 284–286
pions, 222–223, 248
Planck distribution, 65–66, 68–69, 168
Planck’s constant h, 66, 69, 70
Planck’s constant ħ, 80
position operator, 140
positron, 244–245, 247
pressure, 2, 5, 23, 45
  also see gases, osmotic pressure, radiation
principal quantum number n, 138, 151, 161
Principia of Newton, 3, 18
probabilities in quantum mechanics, 131, 139, 144–145, 179–180, 187, 208
propagator, 267–269
proton, discovered, 210
Prout’s hypothesis, 211

quantum chromodynamics, 221
quantum electrodynamics, 198, 205, 280–295
quantum field theory, 245, 251–295
quantum mechanics, 138–151, 206–209
quarks, 157, 166, 221–222

radiation, 32, 62–65
  radiation energy constant, 67
  also see A and B coefficients, black-body radiation, light, Planck distribution, Rayleigh–Jeans distribution, Stefan–Boltzmann constant, stimulated emission
radiative decay, 84–85, 174–175, 286–292
radioactivity, 71
  also see alpha decay, beta decay, gamma decay
radium & radon, 71, 72, 216, 234–235
Rayleigh–Jeans distribution, 65, 67
Reflections on the Motive Power of Heat of Carnot, 22
reduced mass, 52
relativity, see Einsteinian relativity, Galilean relativity, general relativity
representations of Lorentz group, 274, 276–277
resonance, 190, 223
rotations, 152–153
Rydberg constant, 81

S-matrix, 185, 262, 264
saturation of nuclear force, 213
scalar and vector potentials, 195–198, 281–286
scalar fields, 255–270
scalar product of state vectors, 206
scalar product of wave functions, 141, 158, 183
scattering
  by Coulomb potential, 75–76, 181–182
  Compton scattering, 112–114
  in quantum field theory, 265–270
  in quantum mechanics, 175–190
scattering amplitude f, 178–179
Schrödinger equation, 129–138, 176, 199
Schrödinger picture, see time dependence in quantum mechanics
second quantization, 251–252
self-adjoint operators, see Hermitian operator
Slater determinant, 170
sound speed, 21
space inversion symmetry, 160, 245–246, 247–248, 291–292
specific entropy, 31
specific heat, 19–20
spectral lines, 77, see also combination principle, hydrogenic energy levels, Lyman alpha transition, Rydberg constant
spin, 151–152, 154–155, 157
spin–orbit coupling, 160, 228–229, see also fine structure
spin–statistics connection, 167, 273, 275
spherical harmonics, 135–136
stable valley, 215–216
Standard Model, 295
state vectors, 206
Stefan–Boltzmann constant σ, 67
stimulated emission, 84–87
Stokes’ law, 47, 51, 56, 59
Stokes phenomenon, 239
strangeness, 220
Sun, 31, 88, 249
symmetry, see gauge transformations, isotopic spin symmetry, Lorentz invariance, Noether’s theorem, rotations, space inversion symmetry, time reversal invariance

tauon, 249
temperature, 2–3, see also canonical ensemble, entropy, gases
  absolute, 3, 26
  defined by Carnot cycle efficiency, 21–27
The Nature of the Motion which We Call Heat of Clausius, 17
thermodynamics, laws, 32–33, 38
three-three resonance, 223–224
Timaeus of Plato, 6, 10, 11
time dependence in quantum mechanics, 148–151, 191–193
time dilation, 103–104
time-ordering, 263
time reversal invariance, 36–37
transformation theory of Dirac, 124, 206
transport theory, 42–53
twenty-one cm spectral line, 87, 162

ultraviolet radiation, 70, 81, 292
uncertainty principle, 147
unitarity, 218
uranium, 71, 216

vacuum state and vacuum energy, 259–260, 286
valence, 172
vector potential, see scalar and vector potentials
viscosity η, 47–49

wave function, 129–132, 139, 209
wave mechanics, 124–129
weak interactions, 223, 247–248, see also beta decay
white dwarf stars, 169
Wien displacement law, 66
Wien distribution, 68
Wigner–Eckart theorem, 204
WKB method, 236–237

X-rays, 71, 83–84, 113, 127

Yukawa potential, 182, 270

Zeeman effect, 135, 202–205
zero-point energy, 226
