Steven Weinberg
University of Texas, Austin
University Printing House, Cambridge CB2 8BS, United Kingdom
One Liberty Plaza, 20th Floor, New York, NY 10006, USA
477 Williamstown Road, Port Melbourne, VIC 3207, Australia
314–321, 3rd Floor, Plot 3, Splendor Forum, Jasola District Centre, New Delhi – 110025, India
79 Anson Road, #06–04/06, Singapore 079906
www.cambridge.org
Information on this title: www.cambridge.org/9781108841764
DOI: 10.1017/9781108894845
© Steven Weinberg 2021
This publication is in copyright. Subject to statutory exception
and to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without the written
permission of Cambridge University Press.
First published 2021
Printed in the United Kingdom by TJ Books Limited, Padstow Cornwall
A catalogue record for this publication is available from the British Library.
Library of Congress Cataloging-in-Publication Data
Names: Weinberg, Steven, 1933– author.
Title: Foundations of modern physics / Steven Weinberg, The University of Texas at Austin.
Description: New York : Cambridge University Press, 2021. | Includes
bibliographical references and indexes.
Identifiers: LCCN 2020055431 (print) | LCCN 2020055432 (ebook) |
ISBN 9781108841764 (hardback) | ISBN 9781108894845 (epub)
Subjects: LCSH: Physics.
Classification: LCC QC21.3 .W345 2021 (print) | LCC QC21.3 (ebook) |
DDC 530–dc23
LC record available at https://lccn.loc.gov/2020055431
LC ebook record available at https://lccn.loc.gov/2020055432
ISBN 978-1-108-84176-4 Hardback
Cambridge University Press has no responsibility for the persistence or accuracy
of URLs for external or third-party internet websites referred to in this publication
and does not guarantee that any content on such websites is, or will remain,
accurate or appropriate.
For Louise, Elizabeth, and Gabrielle
Contents
1.2 Chemistry
Elements • Law of combining weights • Dalton’s atomic weights • Law of combining volumes • Avogadro’s principle • The gas constant • Avogadro’s number
1.3 Electrolysis
Early electricity • Early magnetism • Electromagnetism • Discovery of electrolysis • Faraday’s theory • The faraday
2.3 Entropy
Definition of entropy • Independence of path • Increase of entropy • Thermodynamic relations • Entropy of ideal gases • Neutral matter • Radiation energy • Laws of thermodynamics
3.2 Photons
Quantization of radiation energy • Derivation of Planck distribution • Photoelectric effect • Particles of light
4 RELATIVITY

Preface
This book grew out of the notes for a course I gave for undergraduate physics
students at the University of Texas. In this book I think I go farther forward
than is usual in undergraduate courses, giving readers a taste of nuclear physics
and quantum field theory. I also go farther back than is usual, starting with the
struggle in the nineteenth century to establish the existence and properties of
atoms, including the development of thermodynamics that both aided in this
struggle and offered an alternative program.
I fear that some readers may want to skim through this early part and hurry
on to what they regard as the good stuff, quantum mechanics and relativity. That
would be a pity. In my experience physics students who aim at a career in atomic
or nuclear or elementary particle physics often manage to get through their
formal education without ever becoming familiar with entropy, or equipartition,
or viscosity, or diffusion. That was true in my own case. This book, or a course
based on it, may provide some students with their last chance to learn about
these and other matters needed to understand the macroscopic world.
Readers may find this book unusual also in its strong emphasis on history.
I make a point of saying a little about the welter of theoretical guesswork and
ill-understood experiments out of which modern physics emerged in the twentieth
century. This, it seems to me, is a help in understanding what otherwise may
seem an arbitrary set of postulates for relativity and quantum mechanics. It is
also a matter of personal taste. Research in physics seems to me to lose some of
its excitement if we do not see it as part of a great historical progression. Some
valuable historical works are listed in a bibliography, along with collections of
original articles that I have found most helpful.
But this is not a work of history. Historians aim at uncovering how the
scientists of the past thought about their own problems – for instance, how Einstein
in 1905 thought about the measurement of space and time separations in
developing the special theory of relativity. For this aim of historical writing it is
necessary to go deeply into personal accounts, institutional development, and
false starts, and to put aside our knowledge of subsequent progress. I try to
be accurate in describing the state of physics in past times, but the aim of this
book in discussing the problems of the past is different: it is to make clear how
physicists think about these things today.
This book is intended chiefly for physics students who are well into their time
as undergraduates, and for working scientists who want a brief introduction to
some area of modern physics. I have therefore not hesitated to use calculus and
matrix algebra, though not in advanced versions. As required by the subject
matter, the mathematical level here slopes upwards through the book. Where
possible I have chosen concrete rather than abstract formulations of physical
theories. For instance, in Chapter 5, on quantum mechanics, I mostly represent
physical states as wave functions, only coming at the end of the chapter to their
representation as vectors in Hilbert space. In some sections detailed material
that can be skipped without losing the thread of the theory is put into appendices.
Two of these appendices present what in my unbiased opinion are improved
derivations of important results: the appendix to Section 2.6 gives a revised
version of Einstein’s derivation of his formula for the diffusion constant in
Brownian motion, and the appendix to Section 6.4 presents a revision of Fermi’s
calculation of the rate of alpha decay.
In my experience, with some judicious pruning, the material of the book up
to about the middle of Chapter 5 can be covered in a one-term undergraduate
course. But I think that to go over the whole book would take a full two-term
academic year.
This book treats such a broad range of topics that it is impossible to go very
far into any of them. Certainly its treatment of quantum mechanics, statistical
mechanics, transport theory, nuclear physics, and quantum field theory is no
substitute for graduate-level courses on these topics, any one of which would
occupy at least a whole year. This book presents what I think, in an ideal
world, the ambitious physics student would already know when he or she enters
graduate school. At least, it is what I wish that I had known when I entered
graduate school.
In any case, I hope that the student or reader may be sufficiently interested in
what I do discuss that they will want to go into these topics in greater detail in
more specialized books or courses, and that they will find in this book a good
preparation for such further studies.
I am grateful to many students and colleagues for pointing out errors in
the lecture notes on which this book is based and for the expert and friendly
assistance I have received from Simon Capelin and Vince Higgs, the editors at
Cambridge University Press who guided the publication of this book.
STEVEN WEINBERG
1
Early Atomic Theory
It is an old idea that matter consists of atoms, tiny indivisible particles moving
in empty space. This theory can be traced to Democritus, working in the Greek
city of Abdera, on the north shore of the Aegean sea. In the late 400s BC
Democritus proclaimed that “atoms and void alone exist in reality.” He offered
neither evidence for this hypothesis nor calculations on which to base
predictions that could confirm it. Nevertheless, this idea was tremendously influential,
if only as an example of how it might be possible to account for natural
phenomena without invoking the gods. Atoms were brought into the materialistic
philosophy of Epicurus of Samos, who a little after 300 BC founded one of
the four great schools of Athens, the Garden. In turn, the idea of atoms and
the philosophy of Epicurus were invoked in the poem On the Nature of Things
by the Roman Lucretius. After this poem was rediscovered in 1417 it
influenced Machiavelli, More, Shakespeare, Montaigne, and Newton, among others.
Newton in his Opticks speculated that the properties of matter arise from the
clustering of atoms into larger particles, which themselves cluster into larger
particles, and so on. As we will see, Newton made a stab at an atomic theory of
air pressure, but without significant success.
The serious scientific application of the atomic theory began in the eighteenth
century, with calculations of the properties of gases, which had been studied
experimentally since the century before. This is the topic with which we begin
this chapter. Applications to chemistry and electrolysis followed in the
nineteenth century and will be considered in subsequent sections. The final section
of this chapter describes how the nature of atoms began to be clarified with the
discovery of the electron. In the following chapter we will see how it became
possible to estimate the atoms’ masses and sizes.¹
¹ Further historical details about some of these matters can be found in Weinberg, The Discovery of
Subatomic Particles, listed in the bibliography.
1.1 Gas Properties
Experimental Relations
The upsurge of enthusiasm for experiment in the seventeenth century was
largely concentrated on the properties of air. The execution and reports of these
experiments did not depend on hypotheses regarding atoms, but we need to
recall them here because their results provided the background for later theories
of gas properties that did rely on assumptions about atoms.
It had been thought by Aristotle and his followers that the suction observed
in pumps and bellows arises from nature’s abhorrence of a vacuum. This notion
was challenged in the 1640s by the invention of the barometer by the Florentine
polymath Evangelista Torricelli (1608–1647). If nature abhors a vacuum, then
when a long glass tube with one end closed is filled with mercury and set
upright with the closed end on top, why does the mercury flow out of the bottom
until the column is only 760 mm high, with empty space appearing above the
mercury? Is there a limit to how much nature abhors a vacuum? Torricelli
argued that the mercury is held up instead by the pressure of the air acting
on the open end of the glass tube (or on the surface of a bath of mercury in
which the open end of the tube is immersed), which is just sufficient to support
a column of mercury 760 mm high. If so, then it should be possible to measure
variations in air pressure using a column of mercury in a vertical glass tube, a
device that we know as a barometer. Such measurements were made from 1648
to 1651 by Blaise Pascal (1623–1662), who found that the height of mercury
in a barometer is decreased by moving to the top of a mountain, where less air
extends above the barometer.
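As a rough numerical check on Torricelli's argument, the pressure that supports a 760 mm column is the weight of the column per unit area, that is, density times g times height. The sketch below assumes a mercury density of about 13 595 kg/m³ and g = 9.80665 m/s²; neither value is given in the text, and the function name is only illustrative.

```python
# Hydrostatic pressure at the base of a liquid column: p = rho * g * h.
# Assumed values (not from the text): density of mercury near 0 °C and
# standard gravity.
RHO_MERCURY = 13595.1   # kg/m^3 (assumption)
G_STANDARD = 9.80665    # m/s^2 (assumption)


def column_pressure(height_m, density=RHO_MERCURY, g=G_STANDARD):
    """Pressure in pascals needed to support a liquid column of this height."""
    return density * g * height_m


p_sea_level = column_pressure(0.760)   # the 760 mm column described above
print(f"760 mm of mercury corresponds to about {p_sea_level:.0f} Pa")
```

A shorter column, such as Pascal found at the top of a mountain, corresponds to a proportionally lower air pressure.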
The quantitative properties of air pressure soon began to be studied
experimentally, before there was any correct theoretical understanding of gas
properties. In 1662, in the second edition of his book New Experiments
Physico-Mechanical Concerning the Spring of the Air and its Effects, the Anglo-Irish
aristocrat Robert Boyle (1627–1691) described experiments relating the
pressure (the “spring of the air”) and volume of a fixed mass of air. He studied a
sample of air enclosed at the end of a glass tube by a column of mercury in
the tube. The air was compressed at constant temperature by pushing on the
mercury’s surface, revealing what came to be known as Boyle’s law, that for
constant temperature the volume of a gas of fixed mass and composition is
inversely proportional to the pressure, now defined by Boyle as the force per
area exerted on the gas.
Temperature Scales
A word must be said about the phrase “at constant temperature.” Boyle lived
before the establishment of our modern Fahrenheit and Celsius scales, whose
Theoretical Explanations
In Proposition 23 of his great book, the Principia, Isaac Newton (1643–1727)
made an attempt to account for Boyle’s law by considering air to consist of
particles repelling each other at a distance. Using little more than dimensional
analysis, he showed that the pressure p of a fixed mass of air is inversely
proportional to the volume V if the repulsive force between particles separated
by a distance r falls off as $1/r$. But as he pointed out, if the repulsive force goes
as $1/r^2$, then $p \propto V^{-4/3}$. He did not claim to offer any reason why the repulsive
force should go as 1/r and, as we shall see, it is not forces that go as 1/r but
rather forces of very short range that act only in collisions that mostly account
for the properties of gases.
It was the Swiss mathematical physicist Daniel Bernoulli (1700–1782) who
made the first attempt to understand the properties of gases theoretically, on the
assumption that a gas consists of many tiny particles moving freely except in
very brief collisions. In 1738, in the chapter, “On the Properties and Motions of
Elastic Fluids, Especially Air” of his book Hydrodynamics, he argued that in a
gas (then called an “elastic fluid”) with n particles per unit volume moving with
a velocity v that is the same (because of collisions) in all directions, the pressure
is proportional to n and to $v^2$, because the number of particles that hit any given
area of the wall in a given time is proportional to the number in any given
volume, to the rate at which they hit the wall, which is proportional to v, and to
the force that each particle exerts on the wall, which is also proportional to v.
For a fixed mass of gas n is inversely proportional to the volume V, so pV is
proportional to $v^2$. If (as Bernoulli thought) $v^2$ depends only on the temperature,
this explains Boyle’s law. If $v^2$ is proportional to the absolute temperature, it
also gives Charles’ law.
Bernoulli did not give much in the way of mathematical details, and did not
try to say to what else the pressure might be proportional besides $nv^2$, a matter
crucial for the history of chemistry. These details were provided by Rudolf
Clausius (1822–1888) in 1857, in an article entitled “The Nature of the Motion
which We Call Heat.” Below is a more-or-less faithful description of Clausius’
derivation, in a somewhat different notation.
Suppose a particle hits the wall of a vessel and remains in contact with it for a
small time t, during which it exerts a force with component F along the inward
normal to the wall. Its momentum in the direction of the inward normal to the
wall will decrease by an amount Ft, so if the component of the velocity of the
particle before it strikes the wall is v⊥ > 0, and it bounces back elastically with
normal velocity component −v⊥, the change in the inward normal component
of momentum is −2mv⊥, where m is the particle mass, so
$$F = 2mv_\perp/t .$$
Now, suppose that this goes on with many particles hitting the wall over a time
interval T ≫ t, all particles with the same velocity vector v. The number N of
particles that will hit an area A of the wall in this time is the number of particles
in a cylinder with base A and height v⊥T, or
$$N = nAv_\perp T ,$$
where n is the number density, the number of particles per volume. Each of
these particles is in contact with the wall for a fraction t/T of the time T, so the
total force exerted on the wall is
$$F N (t/T) = (2mv_\perp/t) \times (nAv_\perp T) \times (t/T) = 2nmv_\perp^2 A .$$
We see that all dependence on the times t and T cancels. The pressure p is
defined as the force per area, so this gives the relation
$$p = 2nmv_\perp^2 . \tag{1.1.1}$$
This is for the unphysical case in which every particle has the same value of v⊥ ,
positive in the sense that the particles are assumed to be going toward the wall.
In the real world, different particles will be moving with different speeds in
different directions, and Eq. (1.1.1) should be replaced with
$$p = 2nm \times \tfrac{1}{2}\langle v_\perp^2\rangle = nm\langle v_\perp^2\rangle , \tag{1.1.2}$$
the brackets indicating an average over all gas particles, with the factor 1/2
inserted in the first expression because only 50% of these particles will be going
toward any given wall area.
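The factor 1/2 in Eq. (1.1.2) can be checked numerically. The sketch below is only an illustration under stated assumptions: the normal velocity components are drawn from a symmetric (Gaussian) distribution, which the text does not specify, and the function name and parameters are hypothetical. It compares the first form, in which only particles heading toward the wall contribute, with the second form, which averages over all particles.

```python
import random


def pressure_estimates(n, m, num_samples=200_000, seed=0):
    """Return the two forms of Eq. (1.1.2) for randomly sampled v_perp.

    Assumption: v_perp is drawn from a zero-mean Gaussian of unit width,
    chosen only because it is symmetric about zero.
    """
    rng = random.Random(seed)
    sum_toward = 0.0   # sum of v_perp^2 over particles moving toward the wall
    sum_all = 0.0      # sum of v_perp^2 over all particles
    for _ in range(num_samples):
        v_perp = rng.gauss(0.0, 1.0)
        square = v_perp * v_perp
        sum_all += square
        if v_perp > 0:
            sum_toward += square
    # First form: 2 n m times v_perp^2 averaged over all particles, with
    # particles moving away from the wall contributing nothing.
    p_first = 2.0 * n * m * (sum_toward / num_samples)
    # Second form: n m <v_perp^2>, averaged over all particles.
    p_second = n * m * (sum_all / num_samples)
    return p_first, p_second


p_first, p_second = pressure_estimates(n=1.0, m=1.0)
print(f"half of the particles: {p_first:.4f}   full average: {p_second:.4f}")
```

The two numbers agree to within sampling noise, which is all the factor 1/2 asserts: for a distribution symmetric in v⊥, half of the particles carry half of the average of v⊥².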
To express $\langle v_\perp^2\rangle$ in terms of the root mean square velocity, Clausius assumed
without proof that “on the average each direction [of the particle velocities]
is equally represented.” In this case, the average square of each component of
velocity equals $\langle v_\perp^2\rangle$, and the average of the squared velocity vector is then
$$\langle \mathbf{v}^2\rangle = 3\langle v_\perp^2\rangle ,$$
so Eq. (1.1.2) reads
$$p = nm\langle \mathbf{v}^2\rangle/3 . \tag{1.1.3}$$
This is essentially the result $p \propto nv^2$ of Bernoulli, except that, with the
factor m/3, Eq. (1.1.3) is now an equality, not just a statement of proportionality.
For a fixed mass M of gas occupying a volume V, the number density
is n = M/(mV), so Clausius could use Boyle’s law (which he called Mariotte’s
law), which states that pV is constant for fixed temperature, to conclude that
for a given gas $\langle \mathbf{v}^2\rangle$ depends only on the temperature. Further, as Clausius
remarked, Eq. (1.1.3) together with Charles’ law (which Clausius called the
law of Gay-Lussac) indicates that $\langle \mathbf{v}^2\rangle$ is proportional to the absolute
temperature T. If we like, we can adopt a modern notation and write the constant of
proportionality as 3k/m, so that
$$m\langle \mathbf{v}^2\rangle/3 = kT , \tag{1.1.4}$$
and therefore Eq. (1.1.3) reads
$$p = nkT , \tag{1.1.5}$$
where k is a constant, in the sense of being independent of p, n, and T . But
the choice of notation does not tell us whether k varies from one type of gas
to another or whether it depends on the molecular mass m. Clausius could not
answer this question, and did not offer any theoretical justification for Boyle’s
1.2 Chemistry
Elements
The idea that all matter is composed of a limited number of elements goes back
to the earliest speculations about the nature of matter. At first, in the century
before Socrates, it was supposed that there is just one element: water (Thales) or
air (Anaximenes) or fire (Heraclitus) or earth (perhaps Xenophanes). The idea of
four elements was proposed around 450 BC by Empedocles of Acragas (modern
Agrigento). In On Nature he identified the elements as “fire and water and earth
and the endless height of air.” Classical Chinese sources list five elements: water,
fire, earth, wood, and metal.
Like the theory of atoms, these early proposals of elements did not come
accompanied with any evidence that these really are elements, or any suggestion
how such evidence might be gained. Plato in Timaeus even doubled down and
stated that the difference between one element and another arises from the
shapes of the atoms of which the elements are composed: earth atoms are tiny
cubes, while the atoms of fire, air, and water are other regular polyhedra –
solids bounded respectively by 4, 8, or 20 identical regular polygons, with every
edge and every vertex of each solid the same as every other edge or vertex of
that solid.
By the end of the Middle Ages this list of elements had come to seem
implausible. It is difficult to identify any particular sample of dirt as the element earth,
and fire seems more like a process than a substance. Alchemists narrowed the
list of elements to just three: mercury, sulfur, and salt.
Modern chemistry began around the end of the eighteenth century, with
careful experiments by Joseph Priestley (1733–1804), Henry Cavendish (1731–1810),
Antoine Lavoisier (1743–1794), and others. By 1787 Lavoisier had
worked out a list of 55 elements. In place of air there were several gases:
hydrogen, oxygen, and nitrogen; air was identified as a mixture of nitrogen and
oxygen. There were other non-metals on the list of elements: sulfur, carbon,
and phosphorus, and a number of common metals: iron, copper, tin, lead, silver,
gold, mercury. Lavoisier also listed as elements some chemicals that we now
know are tightly bound compounds: lime, soda, and potash. And the list also
included heat and light, which of course are not substances at all.
Compound          Dalton’s formula    True formula
Water             HO                  H₂O
Carbon dioxide    CO₂                 CO₂
Ammonia           NH                  NH₃
Sulfuric acid     SO₂                 H₂SO₄
Here is a list of the approximate true atomic weights for a few elements,
the weights deduced by Dalton, and (in the column marked with an asterisk) the
weights Dalton would have calculated if he had known the true chemical
formulas.
Element    True    Dalton    Dalton*
H          1       1         1
C          12      4.3       8.6
N          14      4.2       12.6
O          16      5.5       11
S          32      14.4      57.6
plausible, molecules of water contain two atoms of hydrogen and one atom of
oxygen, the molecules of oxygen and hydrogen must each contain two atoms.
That is, taking water molecules as H₂O, the reaction for producing molecules
of water is
2H₂ + O₂ → 2H₂O .
The use of Avogadro’s principle rapidly provided the correct formulas for gases
such as CO₂, NH₃, NO, and so on. Knowing these formulas and measuring
the weights of gases participating in various reactions, it was possible to
correct Dalton’s atomic weights and calculate more reliable values for the atomic
weights of the atoms in gas molecules, relative to any one of them. Taking the
atomic weight of hydrogen as unity, this gave atomic weights close to 12 for
carbon, 14 for nitrogen, 16 for oxygen, 32 for sulfur, and so on. Then, knowing
these atomic weights, it became possible to find atomic weights for many other
elements, not just those commonly found in gases, by measuring the weights of
elements combining in various chemical reactions.
$$pV = NkT . \tag{1.2.1}$$
$$m = \mu m_1 . \tag{1.2.2}$$
In the modern system of atomic weights, with the atomic weight of the most
common isotope of carbon defined as precisely 12, $m_1 = 1.660539 \times 10^{-24}$ g,
which of course was not known in Avogadro’s time. A mass M contains
$N = M/m = M/(m_1\mu)$ molecules, so the ideal gas law (1.2.1) can be written
$$pV = MkT/(m_1\mu) = (M/\mu)RT \tag{1.2.3}$$
where R is the gas constant
$$R = k/m_1 . \tag{1.2.4}$$
Physicists in the early nineteenth century could use Eq. (1.2.3) to measure R,
and they found a value close to the modern value R = 8.314 J/K per mole. This would
have allowed a determination of $m_1$ and hence of the masses of all atoms of
known atomic weight if k were known, but k did not become known until the
developments described in Section 2.6.
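Equations (1.2.3) and (1.2.4) can be tried out numerically. The following is a minimal sketch, assuming the modern value of k (which, as noted above, was not available in the early nineteenth century); the function name and the sample gas are hypothetical.

```python
# Modern values, used here only for illustration.
K_BOLTZMANN = 1.380649e-23   # J/K (assumed modern value, not from the text)
M1_GRAMS = 1.660539e-24      # g, mass for unit atomic weight (from the text)

# Eq. (1.2.4): R = k / m1. With m1 in grams, R is in J/K per mole.
R_GAS = K_BOLTZMANN / M1_GRAMS


def pressure(mass_g, mol_weight, volume_m3, temperature_k):
    """Eq. (1.2.3) solved for p: p = (M/mu) R T / V, in pascals."""
    return (mass_g / mol_weight) * R_GAS * temperature_k / volume_m3


print(f"R = {R_GAS:.4f} J/K per mole")
# Hypothetical example: 32 g of a gas of molecular weight 32,
# held in 22.4 litres at 273.15 K.
print(f"p = {pressure(32.0, 32.0, 0.0224, 273.15):.0f} Pa")
```

Only the ratio M/μ enters, so the same pressure results for any gas with the same number of moles at the same volume and temperature.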
Avogadro’s Number
Incidentally, a mole of any element or compound of molecular weight μ is
defined as μ grams, so in Eq. (1.2.3) the ratio M/μ, with M expressed in grams, equals
the number of moles of gas. Since $N = M/(m_1\mu)$, one mole contains a number of
molecules equal to $1/m_1$ with $m_1$ given in grams. This is known as Avogadro’s
number. But of course Avogadro did not know Avogadro’s number. It is now
known to be $6.02214 \times 10^{23}$ molecules per mole, corresponding to unit
molecular weight $m_1 = 1.66054 \times 10^{-24}$ grams. The measurement of Avogadro’s
number was widely recognized in the late nineteenth century as one of the great
challenges facing physics.
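The relation between Avogadro's number and the mass for unit molecular weight is simple enough to check directly. This is a small sketch using the value of the unit mass quoted above; the function name is illustrative only.

```python
M1_GRAMS = 1.66054e-24   # g, mass for unit molecular weight (from the text)

# One mole is mu grams, and each molecule has mass mu * m1,
# so one mole contains 1/m1 molecules (m1 in grams).
AVOGADRO = 1.0 / M1_GRAMS


def molecules_in(mass_g, mol_weight):
    """Number of molecules in a mass M of molecular weight mu: N = M/(m1*mu)."""
    return mass_g / (M1_GRAMS * mol_weight)


print(f"Avogadro's number is about {AVOGADRO:.5e} per mole")
# Hypothetical example: 18 g of a compound of molecular weight 18 is one mole.
print(f"{molecules_in(18.0, 18.0):.5e} molecules in 18 g at molecular weight 18")
```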
1.3 Electrolysis
Early Electricity
Electricity was known in the ancient world, as what we now call static
electricity. Amber rubbed with fur was seen to attract or repel small bits of light
material. Plato in Timaeus mentions “marvels concerning the attraction of
amber.” (This is where the word electricity comes from; the Greek word for
amber is “elektron.”)
Electricity began to be studied scientifically in the eighteenth century. Two
kinds of electricity were distinguished: resinous electricity is left on an amber
rod when rubbed with fur, while vitreous electricity is left on a glass rod when
rubbed with silk. Unlike charges were found to attract each other, while like
charges repel each other. Benjamin Franklin (1706–1790) gave our modern
terms positive and negative to vitreous and resinous electricity, respectively.
Early Magnetism
Magnetism too was known in the ancient world, as what we now call
permanent magnetism. The Greeks knew of naturally occurring lodestones that
could attract or repel small bits of iron. Plato’s Timaeus refers to lodestones as
“Heraclean stones.” (Our word magnet comes from the city Magnesia in Asia
Minor, near where lodestones were commonly found.)
Very early the Chinese also discovered the lodestone and used it as a magnetic
compass (a “south-seeking stone”) for purposes of geomancy and navigation.
Each lodestone has a south-seeking pole at one end, attracted to a point near the
South Pole of the Earth, and a north-seeking pole at the other end, attracted to a
point near the Earth’s North Pole. Magnetism was first studied scientifically by
William Gilbert (1544–1603), court physician to Elizabeth I. It was observed
that the south-seeking poles of different lodestones repel each other, and
likewise for the north-seeking poles, while the south-seeking pole of one lodestone
attracts the north-seeking pole of another lodestone. Gilbert concluded that one
pole of a lodestone is pulled toward the north and the other toward the south
because the Earth itself is a magnet, with what in a lodestone would be its
south-seeking and north-seeking poles respectively near the Earth’s North Pole
and South Pole.
Electromagnetism
It began to be possible to explore the relations between electricity and
magnetism quantitatively with the invention in 1800 of electric batteries by Count
Alessandro Volta (1745–1827). These were stacks of disks of two different
metals separated by cardboard disks soaked in salt water. Such batteries drive
steady currents of electricity through wires attached to the ends of the stacks,
with positive and negative terminals identified respectively as the ends of the
stacks from which and towards which electric current flows.
² This quantity is independent of the units used for electric charge as long as the currents appearing in
Eq. (1.3.2) are defined as the rates of flow of charge in the same units as used in Eq. (1.3.1). It is obviously
also independent of the units used for force, as long as the same force units are used in Eqs. (1.3.1) and
(1.3.2).
Discovery of Electrolysis
Electrolysis was discovered in 1800 by the chemist William Nicholson (1753–
1815) and the surgeon Anthony Carlisle (1768–1840). They found that bubbles
of hydrogen and oxygen would be produced where wires attached respectively
to the negative and positive terminals of a Volta-style battery were inserted in
water. Sir Humphry Davy (1778–1829), Faraday’s boss at the Royal Institution,
carried out extensive experiments on the electrolysis of molten salts, finding
for instance that, in the electrolysis of molten table salt, sodium, a previously
unknown metal, was produced at the wire attached to the negative terminal of
the battery and a greenish gas, chlorine, was produced at the wire attached
to the other, positive, terminal. Davy’s electrolysis experiments added several
metals aside from sodium to Lavoisier’s list of elements, including aluminum,
potassium, calcium, and magnesium.
A theory of electrolysis was worked out by Faraday. In modern terms, a
small fraction ($1.8 \times 10^{-9}$ at room temperature) of water molecules are
normally dissociated into positive hydrogen ions (H⁺), which are attracted to the
wire attached to the negative terminal of a battery, and negative hydroxyl ions
(OH⁻), which are attracted to the wire attached to the positive terminal. At the
wire attached to the negative terminal, two H⁺ ions combine with two units of
negative charge from the battery to form a neutral H₂ molecule. At the wire
attached to the positive terminal, four OH⁻ ions give one O₂ molecule plus
two H₂O molecules plus four units of negative charge, which flow through the
battery to the negative terminal.³
Likewise, a small fraction of molten table salt (NaCl) molecules are normally
dissociated into Na⁺ ions and Cl⁻ ions. At the wire attached to the negative
terminal of a battery, one Na⁺ ion plus one unit of negative charge gives one
atom of metallic sodium (Na); at the wire attached to the positive terminal, two
Cl⁻ ions give one chlorine (Cl₂) molecule and two units of negative charge,
which flow through the battery to the negative terminal.
In Faraday’s theory, it takes one unit of electric charge to convert a singly
charged ion such as H⁺ or Cl⁻ to a neutral atom or molecule, so since molecules
of molecular weight μ have mass $\mu m_1$, it takes $M/(m_1\mu)$ units of electric charge
to convert a mass M of singly charged ions to a mass M of neutral atoms
or molecules of molecular weight μ. Experiment showed that it takes about
96 500 coulombs (e.g., one ampere for about 96 500 seconds) to convert μ
grams (that is, one mole) of singly charged ions to neutral atoms or molecules.
(This is called a faraday; the modern value is 96 485.3 coulombs/mole.) Hence
³ We now know that it is negative charge, i.e., electrons, that flows through a battery. As far as Faraday knew,
it was equally possible that positive charges flow through a battery, in which case at the wire attached to the
negative terminal two H⁺ ions would give an H₂ molecule plus two units of positive charge, which would
flow through the battery to the wire attached to the positive terminal, where four OH⁻ ions plus four units
of positive charge would give an O₂ molecule and two H₂O molecules.
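The counting in Faraday's theory can be made concrete. One mole of singly charged ions contains 1/m₁ ions (with m₁ in grams), each needing one unit of charge, so one faraday divided by that number gives the size of the unit of charge. The sketch below assumes modern values for the faraday and for m₁, which were not available to Faraday; the function and variable names are hypothetical.

```python
FARADAY = 96485.3        # coulombs per mole (assumed modern value)
M1_GRAMS = 1.66054e-24   # g, mass for unit molecular weight (modern value)


def charge_to_neutralize(mass_g, mol_weight, faraday=FARADAY):
    """Coulombs needed to convert mass_g grams of singly charged ions of
    molecular weight mol_weight to neutral atoms or molecules."""
    return (mass_g / mol_weight) * faraday


# One mole holds 1/m1 ions, so the unit of charge is the faraday times m1.
unit_charge = FARADAY * M1_GRAMS   # coulombs

print(f"unit of charge is about {unit_charge:.4e} C")
# Hypothetical example: 23 g of singly charged ions of molecular weight 23.
print(f"{charge_to_neutralize(23.0, 23.0):.0f} C, or that many seconds at one ampere")
```

The result for the unit of charge matches the magnitude of the electron charge as it is known today.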
The successful uses of atomic theory described in the previous chapter did not
settle the existence of atoms in all scientists’ minds. This was in part because
of the appearance in the first half of the nineteenth century of an attractive
competitor, the physical theory of thermodynamics. As we shall see in the first
three sections of this chapter, with thermodynamics one may derive powerful
results of great generality without ever committing oneself to the existence of
atoms or molecules. But thermodynamics could not do everything. Section 2.4
will describe the advent of kinetic theory, which is based on the assumption that
matter consists of very large numbers of particles, and its generalization to
statistical mechanics. From these thermodynamics could be derived, and together
with the atomic hypothesis it yielded results far more powerful than could be
obtained from thermodynamics alone. Even so, it was not until the appearance
of direct evidence for the graininess of matter, described in Section 2.5, that
the existence of atoms became almost universally accepted.
2.1 Heat and Energy
The first step in the development of thermodynamics was the recognition that
heat is a form of energy. Though so familiar to us today, this was far from
obvious to the physicists and chemists of the early nineteenth century. Until the
1840s heat was widely regarded as a fluid, named caloric by Lavoisier. Caloric
theory was used to calculate the speed of sound by Pierre-Simon Laplace
(1749–1827) in 1816, the conduction of heat by Joseph Fourier (1768–1830) in
1807 and 1822, and the efficiency of steam engines by Sadi Carnot (1796–1832)
in 1824, whose work as we will see in the next section became a foundation
of thermodynamics. Adding to the confusion, other scientists considered heat
as some sort of wave. This reflected uncertainty regarding the nature of what
is now called infrared radiation, discovered by William Herschel (1738–1822)
in 1800.
Heat as Energy
In 1798 Benjamin Thompson (1753–1814), an American expatriate in
England, offered evidence against the idea that heat is a fluid. (Thompson is also
known as Count Rumford, a title he was given when he served as military
adviser in Bavaria.) It was well known that boring a cannon produces heat,
which might be supposed to be due to the liberation of caloric from the iron,
but Rumford observed that if the heat is carried away by immersing the cannon
in running water while it is being bored there is no limit to the heat that can be
produced.
The first measurement of the energy in heat was provided in the mid-1840s
by James Prescott Joule (1818–1889). In his apparatus a falling weight turned
paddles in a tank of water, heating the water. The gravitational force on a mass m
kilograms is m times the acceleration of gravity, 9.8 meters/sec^2 or 9.8 newtons
per kilogram. Work is force times distance, so dropping one kilogram a distance
of one meter gave it an energy equal to 9.8 newton meters, now also known as
9.8 joules. Joule found that the paddles driven by this dropping weight would
raise the temperature of 100 grams of water by 0.023 °C, so the paddles produced
heat equal to 0.023 × 100 calories, the calorie being defined as the heat
required to raise the temperature of one gram of water by one degree Celsius.
Hence Joule could conclude that 9.8 joules is equivalent to 2.3 calories, so one
calorie is equivalent to 9.8/2.3 = 4.3 joules. The modern value is 4.184 joules.
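The arithmetic of this estimate can be retraced in a few lines. What follows is only a minimal sketch that uses the numbers quoted above; the variable names are illustrative choices and are not part of Joule's account.

```python
# Retrace the estimate of the mechanical equivalent of heat quoted in the text.
g = 9.8            # acceleration of gravity, m/s^2
mass_kg = 1.0      # falling weight, kg
drop_m = 1.0       # distance fallen, m
water_g = 100.0    # mass of water being heated, grams
delta_t = 0.023    # temperature rise of the water, degrees Celsius

work_joules = mass_kg * g * drop_m     # work = force x distance
heat_calories = water_g * delta_t      # 1 calorie heats 1 g of water by 1 degree C
joules_per_calorie = work_joules / heat_calories

print(f"work done     = {work_joules:.1f} J")
print(f"heat produced = {heat_calories:.1f} cal")
print(f"joules per calorie = {joules_per_calorie:.2f}")
```

The ratio comes out close to 4.3 joules per calorie, within a few percent of the modern value of 4.184.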
In 1847 the Prussian physician and physicist Hermann von Helmholtz
(1821–1894) put forward the idea of the universal conservation of energy,
whether in the form of kinetic or potential or chemical energy or heat. But
what sort of energy is heat? For some nineteenth century physicists the question
was irrelevant. They developed the science of heat known as thermodynamics,
which did not depend on any detailed model of heat energy. But there was one
context in which the nature of heat energy seemed evident. In his great 1857
paper, The Nature of the Motion which We Call Heat, Clausius found that at
least part of the heat energy of gases is the kinetic energy of their molecules.
Kinetic Energy
The concept of kinetic energy was long familiar. If a steady force F is exerted
on a particle of mass m, it produces an acceleration F /m, so after a time t the
velocity of the body is v = F t/m. The distance traveled in this time is t times
the average velocity v/2, and the work done on the particle is the force times
this distance:
\[ F \times t \times \frac{Ft}{2m} = \frac{F^2 t^2}{2m} = \frac{mv^2}{2} . \]
Instead of this work going into heating a tub of water, as in the experiment of
Joule, it goes into giving the particle an energy $mv^2/2$.
This energy has the special property of being conserved when bodies come
into contact in collisions. Consider a collision between two rigid balls A and B
with initial vector velocities vA and vB . For the moment suppose that the time
interval t over which this force acts is sufficiently brief that the forces acting on
the balls do not change appreciably during this time. The force that A exerts on
B is equal and opposite to the force F that B exerts on A, so Newton’s second
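law tells us that the final velocities of A and B are $\mathbf{v}'_A = \mathbf{v}_A + \mathbf{F}t/m_A$ and
$\mathbf{v}'_B = \mathbf{v}_B - \mathbf{F}t/m_B$. Hence, as Newton showed, momentum is conserved:
\[ m_A \mathbf{v}'_A + m_B \mathbf{v}'_B = m_A \mathbf{v}_A + m_B \mathbf{v}_B . \tag{2.1.1} \]
Neglecting changes in acceleration during the brief time t, the vector displacements
traveled by A and B equal t times the average velocities, $[\mathbf{v}_A + \mathbf{v}'_A]/2$
and $[\mathbf{v}_B + \mathbf{v}'_B]/2$, respectively. If the balls remain in contact during this time
interval, then these displacements must be the same, so
\[ \mathbf{v}_A + \mathbf{v}'_A = \mathbf{v}_B + \mathbf{v}'_B . \tag{2.1.2} \]
To derive a second conservation law, rewrite Eq. (2.1.2) as $\mathbf{v}'_B - \mathbf{v}'_A = \mathbf{v}_A - \mathbf{v}_B$
and square this, giving
\[ \mathbf{v}'^{2}_B - 2 \mathbf{v}'_B \cdot \mathbf{v}'_A + \mathbf{v}'^{2}_A = \mathbf{v}_B^2 - 2 \mathbf{v}_B \cdot \mathbf{v}_A + \mathbf{v}_A^2 . \]
Multiply this by $m_A m_B$ and add the square of Eq. (2.1.1), so that the scalar
products cancel. Dividing by $2(m_A + m_B)$, the result is another conservation law,
\[ \frac{m_A}{2} \mathbf{v}'^{2}_A + \frac{m_B}{2} \mathbf{v}'^{2}_B = \frac{m_A}{2} \mathbf{v}_A^2 + \frac{m_B}{2} \mathbf{v}_B^2 . \tag{2.1.3} \]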
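The two conservation laws can be checked numerically for a single brief contact. The sketch below assumes the same idealization as the text: a constant force acting for a time t, with the impulse Ft fixed by the contact condition (2.1.2). The function names, masses, and velocities are arbitrary illustrative choices.

```python
# Check Eqs. (2.1.1)-(2.1.3) for one brief contact with a constant force.
# Masses and initial velocities are arbitrary illustrative values.

def collide(m_a, m_b, v_a, v_b):
    """Return the final velocities (v_a', v_b') of balls A and B.

    With v_a' = v_a + J/m_a and v_b' = v_b - J/m_b, the contact condition
    v_a + v_a' = v_b + v_b' fixes the impulse J = F t on ball A:
        J = 2 (v_b - v_a) / (1/m_a + 1/m_b).
    """
    factor = 2.0 / (1.0 / m_a + 1.0 / m_b)
    impulse = [factor * (vb - va) for va, vb in zip(v_a, v_b)]
    v_a_final = [va + j / m_a for va, j in zip(v_a, impulse)]
    v_b_final = [vb - j / m_b for vb, j in zip(v_b, impulse)]
    return v_a_final, v_b_final

def momentum(m_a, m_b, v_a, v_b):
    """Total vector momentum m_a v_a + m_b v_b."""
    return [m_a * va + m_b * vb for va, vb in zip(v_a, v_b)]

def kinetic_energy(m_a, m_b, v_a, v_b):
    """Total of m v^2 / 2 for the two balls."""
    return (0.5 * m_a * sum(x * x for x in v_a)
            + 0.5 * m_b * sum(x * x for x in v_b))

m_a, m_b = 2.0, 3.0
v_a, v_b = [1.0, 0.5, 0.0], [-0.5, 0.0, 2.0]
v_a_final, v_b_final = collide(m_a, m_b, v_a, v_b)

print("momentum before:", momentum(m_a, m_b, v_a, v_b))
print("momentum after: ", momentum(m_a, m_b, v_a_final, v_b_final))
print("kinetic energy before:", kinetic_energy(m_a, m_b, v_a, v_b))
print("kinetic energy after: ", kinetic_energy(m_a, m_b, v_a_final, v_b_final))
```

Both totals agree before and after the contact up to rounding error, whatever masses and velocities are supplied.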
Equations (2.1.1) and (2.1.3) have been derived here only for the case in
which the particles are in contact only for a brief time interval during which
the force acting between the bodies is constant, but this is not an essential
requirement for we can break up any time interval into a large number of brief
intervals in each of which the change in the force is negligible. Then, since
$m_A \mathbf{v}_A + m_B \mathbf{v}_B$ and $m_A \mathbf{v}_A^2/2 + m_B \mathbf{v}_B^2/2$ do not change in each interval, they do
not change at all, as long as the bodies exert forces on each other only when they
are in contact.
In 1669 Christiaan Huygens (1629–1695) reported in Journal des Sçavans
that he had confirmed the conservation of the total of $mv^2/2$, probably by
observing collisions of pendulum bobs, for which initial and final velocities
could be precisely determined. Newton in the Principia called the conserved
quantity $m\mathbf{v}$ the quantity of motion, while Huygens gave the name vis viva
(“living force”) to the conserved quantity $mv^2/2$. These two quantities have
since become known as momentum and kinetic energy.
On the other hand, it was essential in deriving the conservation of kinetic
energy that we assumed that particles interact only when in contact. This is
generally a good approximation in gases, but it is not valid in the presence of
Specific Heat
The total kinetic energy of N molecules of gas of mass m and mean square
velocity $\langle v^2 \rangle$ is $Nm\langle v^2 \rangle/2$. Clausius had found the relation (1.1.4) between
mean square velocity and absolute temperature, according to which $m\langle v^2 \rangle/2 = 3kT/2$,
where k is some constant (later identified as a universal constant of
nature), so the total kinetic energy is $3NkT/2$. A mass M of gas of molecular
weight μ contains $N = M/\mu m_1$ molecules, so the total kinetic energy is
$3MRT/2\mu$, where $R = k/m_1$ is the gas constant (1.2.4). Clausius concluded
that to raise the temperature of a mass M of gas of molecular weight μ by
an amount dT at constant volume, so that the gas does no work on its con-
tainer, requires an energy dE = 3MRdT /2μ. The ratio dE/MdT is known as
the specific heat, so Clausius found that the specific heat of a gas at constant
volume is
\[ C_v = 3R/2\mu . \tag{2.1.4} \]
This result must be distinguished from the value for a different sort of specific
heat, measured at constant pressure, such as when the gas is in a container with
an expandable wall, for which the volume V can change to keep the pressure p
equal to the pressure of the surrounding air or other medium. When pressure
pushes a surface of area A a small distance dL, the work done is the force pA
times dL, which equals pdV where dV = AdL is the change in volume.
According to the ideal gas law (1.2.3), $pV = RTM/\mu$, so if the temperature
is increased by an amount dT, then at constant pressure the gas does work
$p\,dV = RM\,dT/\mu$, and this temperature increase therefore requires an energy
$3MR\,dT/2\mu + MR\,dT/\mu = 5MR\,dT/2\mu$. In other words, the specific heat at
constant pressure is
\[ C_p = 5R/2\mu . \tag{2.1.5} \]
This result is often expressed in terms of the ratio of specific heats,
\[ \gamma \equiv C_p/C_v . \tag{2.1.6} \]
So Clausius found that if all the heat of a gas is contained in the kinetic energy
of its molecules then γ = 5/3.
This did not agree with measurements of the specific heats of common
diatomic gases, such as oxygen or hydrogen, which Clausius cited as giving
γ = 1.421. Later, it was found that γ does indeed equal 5/3 for a monatomic
gas like mercury vapor, but this left the question, in what form is the energy in
ordinary gases that are not monatomic?
To deal with this issue, Clausius suggested that the internal energy of a gas is
larger than the kinetic energy of the molecules, say by a factor 1 + f , with f
some positive number. Then instead of Eq. (2.1.4) we have
\[ C_v = (1 + f) \times 3R/2\mu , \tag{2.1.7} \]
and in place of Eq. (2.1.5),
\[ C_p = (1 + f)\,3R/2\mu + R/\mu . \tag{2.1.8} \]
The specific heat ratio is then
\[ \gamma = 1 + \frac{2}{3(1 + f)} . \tag{2.1.9} \]
This is often expressed (especially in astrophysics) as a formula for the internal
energy density $\mathcal{E}$ in terms of the pressure and γ:
\[ \mathcal{E} = 3RTM(1 + f)/2\mu V = 3(1 + f)p/2 = p/(\gamma - 1) . \tag{2.1.10} \]
The observation that γ ≃ 1.4 for diatomic gases like O2 and H2 indicated
that the internal energy of these gases is larger than the kinetic energy of their
molecules by a factor 1 + f ≃ 5/3. Measurements gave values of γ for more
complicated molecules like H2O or CO2 even closer to unity, indicating that
f is even larger for these molecules. The reason for these values for f and γ
did not become clear until the formulation of the equipartition of energy, to be
discussed in Section 2.4.
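The relation (2.1.9) between γ and f is easy to invert, which is how a measured specific heat ratio translates into a value of f. The following is a small sketch under the assumptions of the text; the function names are illustrative, and the sample values of γ are the ones quoted above.

```python
# Relation (2.1.9) between the specific heat ratio gamma and the factor 1 + f,
# and its inverse. Function names are illustrative.

def gamma_from_f(f):
    """Eq. (2.1.9): gamma = 1 + 2 / (3 (1 + f))."""
    return 1.0 + 2.0 / (3.0 * (1.0 + f))

def f_from_gamma(gamma):
    """Solve Eq. (2.1.9) for f, given a measured gamma > 1."""
    return 2.0 / (3.0 * (gamma - 1.0)) - 1.0

samples = [
    ("monatomic gas, gamma = 5/3", 5.0 / 3.0),
    ("value cited by Clausius, gamma = 1.421", 1.421),
    ("diatomic gas, gamma = 1.4", 1.4),
]
for label, gamma in samples:
    f = f_from_gamma(gamma)
    print(f"{label}: f = {f:.3f}, 1 + f = {1.0 + f:.3f}")
```

For γ = 1.4 this gives 1 + f = 5/3, and for γ = 5/3 it gives f = 0, in line with the statements above.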
Adiabatic Changes
It often happens that work is done adiabatically, that is, without the transfer of
heat. In this case the conservation of energy tells us that the work done by an
expanding fluid must be balanced by a decrease in its internal energy $\mathcal{E}V$:
\[ 0 = p\,dV + d(\mathcal{E}V) = (p + \mathcal{E})\,dV + V\,d\mathcal{E} . \tag{2.1.11} \]
For an ideal gas, the internal energy per unit volume $\mathcal{E}$ is given by Eq. (2.1.10),
so this tells us that
\[ 0 = \gamma p\,dV + V\,dp \]
and so, in an adiabatic process,
\[ p \propto V^{-\gamma} \propto \rho^{\gamma} , \tag{2.1.12} \]
or, since for a fixed mass $p \propto T/V$,
\[ T \propto V^{1-\gamma} \propto \rho^{\gamma-1} . \tag{2.1.13} \]
This is in contrast with an isothermal process, for which T is constant and
$p \propto V^{-1}$.
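As a check on the step from the differential relation to the power law, one can integrate 0 = γp dV + V dp numerically and compare with (2.1.12). The sketch below uses a simple midpoint rule; the function name, step count, and starting values are illustrative assumptions, with units chosen so that the initial pressure and volume are 1.

```python
# Integrate the adiabatic condition 0 = gamma p dV + V dp numerically and
# compare with the power law (2.1.12). All numbers are illustrative.

def integrate_adiabat(p0, v0, v1, gamma, steps=10000):
    """Integrate dp/dV = -gamma p / V from v0 to v1 with the midpoint rule."""
    p, v = p0, v0
    dv = (v1 - v0) / steps
    for _ in range(steps):
        p_mid = p + 0.5 * dv * (-gamma * p / v)   # half step to the midpoint
        v_mid = v + 0.5 * dv
        p += dv * (-gamma * p_mid / v_mid)        # full step with midpoint slope
        v += dv
    return p

gamma = 5.0 / 3.0
p0, v0, v1 = 1.0, 1.0, 2.0

p_numeric = integrate_adiabat(p0, v0, v1, gamma)
p_power_law = p0 * (v0 / v1) ** gamma

# For a fixed mass T is proportional to pV, so the temperature ratio follows.
t_ratio_numeric = (p_numeric * v1) / (p0 * v0)
t_ratio_power_law = (v1 / v0) ** (1.0 - gamma)

print("p after doubling V (numeric):  ", p_numeric)
print("p after doubling V (power law):", p_power_law)
print("T ratio (numeric):  ", t_ratio_numeric)
print("T ratio (power law):", t_ratio_power_law)
```

An isothermal doubling of the volume would only halve the pressure; the adiabatic drop is larger because the gas also cools.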
2.2 Absolute Temperature
Newton thought that p would be proportional to ρ in a sound wave, which would
give $c_s = \sqrt{p/\rho}$, but in fact at audible frequencies the pressure is given by the
adiabatic relation (2.1.12), and $c_s$ is larger than Newton’s value by a factor $\sqrt{\gamma}$.
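To see the size of this correction, the two formulas can be evaluated for air. The pressure and density below are rough values for air at about 20 °C and one atmosphere; they are assumptions introduced for this illustration and do not come from the text.

```python
import math

# Newton's isothermal estimate of the speed of sound versus the adiabatic one.
# Pressure and density are rough assumed values for air near 20 C at sea level.
pressure = 101325.0   # pascals (assumed)
density = 1.204       # kg/m^3 (assumed)
gamma = 1.4           # specific heat ratio for diatomic gases

newton_speed = math.sqrt(pressure / density)
adiabatic_speed = math.sqrt(gamma * pressure / density)

print(f"Newton's value:  {newton_speed:.0f} m/s")
print(f"adiabatic value: {adiabatic_speed:.0f} m/s")
print(f"ratio:           {adiabatic_speed / newton_speed:.3f}")
```

With these inputs the adiabatic value is larger by a factor of about 1.18, an increase of roughly 18 percent over Newton's estimate.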
gas thermometer, relying on the ideal gas law pV = MRT /μ, but this law
is approximate, holding precisely only for molecules of negligible size that
interact only in contact in collisions. How can we give precise meaning to
numerical values of temperature without relying on approximate relations?
Surprisingly, as shown by Rudolf Clausius in his 1850 paper¹ “On the Moving
Force of Heat,” it is possible to find a definition of temperature T with
absolute significance by the study of thermodynamic engines known as Carnot
cycles.
Sadi Carnot (1796–1832) was a French military engineer, the son of Lazare
Carnot, organizer of military victory in the French Revolution, and uncle of
a later president of the Third Republic. In 1824 Carnot in Reflections on the
Motive Power of Fire set out to study the efficiency of steam engines, explaining
that “Already the steam engine works our mines, impels our ships, excavates
our ports and our rivers, forges iron, fashions wood, grinds grains, spins and
weaves our clothes, transports our heaviest burdens, etc.” (A few years later he
might also have mentioned the beginning of steam-propelled locomotives, with
the opening of the Liverpool–Manchester railroad in 1830.) Carnot invented an
idealized engine, known as a Carnot cycle, which as we shall see is maximally
efficient and provides a natural definition of absolute temperature.
In the Carnot cycle, a working fluid (such as steam in a cylinder fitted with a
piston) goes through four frictionless steps:
1. Isothermal: The working fluid does work on its environment, for instance
by pushing a piston against external pressure, but keeping a constant tem-
perature by absorbing heat Q2 from a hot reservoir at temperature t2 . (We
will continue to use lower case t to indicate temperature defined in any
way that indicates the direction of heat flow, without specifying any physical
significance to its particular numerical values.)
2. Adiabatic: The working fluid, perfectly insulated from its environment and
with no internal friction, does more work, with its temperature dropping to
the temperature t1 of a cold reservoir but with no heat flowing in or out.
3. Isothermal: Work is done on the fluid, for instance by pushing in the piston,
with its temperature kept constant by its giving up heat Q1 to the cold
reservoir.
4. Adiabatic: With the working fluid again completely insulated from its envi-
ronment, work is done on it, bringing its volume back down to its original
value and its temperature back up to the temperature t2 of the hot reservoir.
1 This paper is reprinted in Brush, The Kinetic Theory of Gases – An Anthology of Classic Papers with
Historical Commentary, listed in the bibliography.
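[Figure 2.1: graph of the pressure p against the volume of the working fluid in the Carnot cycle, a closed curve through the states A, B, C, and D.]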
A graph of the pressure versus the volume of the working fluid in this cycle is
a closed curve, with the net work W done on the environment equal to $\oint p\,dV$,
that is, to the area enclosed by the curve. (See Figure 2.1.) As long as steps 2
and 4 are truly adiabatic, the conservation of energy tells us that this work is
\[ W = Q_2 - Q_1 \tag{2.2.1} \]
and the efficiency of this cycle is
\[ W/Q_2 = \frac{Q_2 - Q_1}{Q_2} < 1 . \tag{2.2.2} \]
(We call this the efficiency, having in mind that, as for a steam engine, we have
to pay for the heat Q2 taken up at the higher temperature t2 , while the heat Q1
given up at the lower temperature t1 is wasted.)
Any Carnot cycle is reversible, because any frictionless adiabatic or isother-
mal process follows the same track, depending only on its endpoints, whichever
direction the process takes. But not all thermodynamic cycles, which take a
working fluid through a series of steps back to the original temperature and
volume, are reversible even though of course they all conserve energy. For
reversibility it is not enough that all steps be either isothermal or adiabatic –
there also should be no friction, which if present would provide an internal
source of heat that is not available to do work.
2 This treatment and that of the following section is based on that given by Fermi, Thermodynamics, listed
in the bibliography.
and therefore
\[ \frac{W}{Q_2} \ge \frac{W'}{Q'_2} \tag{2.2.5} \]
as was to be proved in the first part of the theorem.
As to the second part of the theorem, note that if C′ is also a Carnot cycle
then, by the same reasoning,
\[ \frac{W'}{Q'_2} \ge \frac{W}{Q_2} , \]
so the efficiencies are equal:
\[ \frac{W}{Q_2} = \frac{W'}{Q'_2} . \tag{2.2.6} \]
This has now been proved for any pair of Carnot cycles, operating between the
same temperatures t2 and t1 , whatever the values of the heat taken from the
reservoir at temperature t2 and given up to the reservoir at temperature t1 , so
the common efficiency can only depend on t2 and t1 , as was to be proved.
We shall write this relation in terms of the inefficiency:
\[ 1 - \frac{W}{Q_2} = \frac{Q_1}{Q_2} \equiv F(t_1, t_2) \tag{2.2.7} \]
with F the same function for all Carnot cycles. We next prove that the function
F (t1 , t2 ) takes the form
F (t1 , t2 ) = T (t1 )/T (t2 ) (2.2.8)
for some function T (t). For this purpose we consider a compound cycle consist-
ing of a Carnot cycle operating between the temperatures t2 and t0 ≤ t2 followed
by a Carnot cycle operating between the temperatures t0 and t1 ≤ t0 , with all the
waste heat that is given to the reservoir at temperature t0 in the first cycle taken
up from this reservoir in the second cycle. Since (Q0 /Q2 )(Q1 /Q0 ) = Q1 /Q2 ,
the inefficiency (2.2.7) of the compound cycle is the product of the inefficiencies
of the individual cycles, so
F (t1 , t2 ) = F (t1 , t0 )F (t0 , t2 ) . (2.2.9)
From Eq. (2.2.7) it is evident that F (t2 , t0 )F (t0 , t2 ) = 1, so Eq. (2.2.9) may be
written
\[ F(t_1, t_2) = \frac{F(t_1, t_0)}{F(t_2, t_0)} . \tag{2.2.10} \]
This holds for any t0 with t2 ≥ t0 ≥ t1 , so we can define T (t) ≡ F (t, t0 ) with
an arbitrary choice of t0 in this range, and then Eq. (2.2.10) is the desired result
(2.2.8).
Now, efficiencies are never greater than 100%, so the ratio F (t1 , t2 ) =
T (t1 )/T (t2 ) in Eq. (2.2.7) must be positive, and so T (t) has the same sign
for all temperatures. Since only the ratios of the T s appear in the efficiency,
we are free to choose this sign to be positive, so that T (t) ≥ 0 for all t.
Also, inefficiencies are never greater than 100%, so Eq. (2.2.8) shows that
T (t1 ) ≤ T (t2 ) for any t1 and t2 with t1 ≤ t2 . That is, T (t) is a monotonically
increasing function of t and can therefore be used to judge the direction of
spontaneous heat flow as well as t itself.
We can therefore define the absolute temperature T by just using T (t) as
the temperature in place of t. That is, using Eqs. (2.2.7) and (2.2.8), we define
absolute temperature T by the statement that a Carnot cycle running between
any two temperatures T2 and T1 has
\[ \frac{Q_1}{Q_2} = \frac{T_1}{T_2} . \tag{2.2.11} \]
A Carnot cycle running between an upper temperature T2 and a lower temperature
T1 has an efficiency
\[ \frac{W}{Q_2} = \frac{Q_2 - Q_1}{Q_2} = \frac{T_2 - T_1}{T_2} . \tag{2.2.12} \]
Of course, this only defines T up to a constant factor, leaving us free to use
what units we like for temperature. But we are not free to shift T (t) by adding
a constant term. Indeed, since in this Carnot cycle heat flows from a reservoir at
temperature T2 to one at temperature T1 , we must have T2 > T1 , and therefore in
order for the efficiency (2.2.11) to be a positive quantity, the lower temperature
must have T1 > 0. Because any heat reservoir must have T positive-definite, we
see that T is the absolute temperature, in the same sense as was found for gases
by Charles.
The temperature defined by Carnot cycles is identical (up to a choice of
units) to the temperature given by a gas thermometer, which for the moment
we will call $T^g$, in the approximation that the gas is ideal. To see this, let us
label the states of the gas as A at the start of the isothermal expansion 1 (and
at the end of the adiabatic compression 4); as B at the start of the adiabatic
expansion 2 (and the end of the isothermal expansion 1); as C at the start of
the isothermal compression 3 (and the end of the adiabatic expansion 2); and
as D at the start of the adiabatic compression 4 (and the end of the isothermal
compression 3). Since the expansion from A to B is isothermal, during this
phase the internal energy of the gas, which is given by Eqs. (1.2.3) and (2.1.10)
as $\mathcal{E}V = RTM/(\gamma - 1)\mu$, does not change, and so the heat drawn from the hot
reservoir is the work done:
\[ Q_2 = \int_A^B p\,dV = \frac{MRT_2^g}{\mu} \int_A^B \frac{dV}{V} = \frac{MRT_2^g}{\mu} \ln\frac{V_B}{V_A} . \]
Likewise, the heat given up to the cold reservoir in the isothermal compression
from C to D is
\[ Q_1 = \frac{MRT_1^g}{\mu} \ln\frac{V_C}{V_D} . \]
Further, since the expansion from B to C and the contraction from D to A are
adiabatic, Eq. (2.1.13) gives $V \propto (T^g)^{-1/(\gamma-1)}$, and so during these parts of the
cycle
\[ V_C/V_B = V_D/V_A = \left( \frac{T_2^g}{T_1^g} \right)^{1/(\gamma-1)} , \]
and therefore $V_B/V_A = V_C/V_D$, and the logarithmic factors in $Q_2$ and $Q_1$ are
equal. The efficiency is then
\[ \frac{Q_2 - Q_1}{Q_2} = \frac{T_2^g - T_1^g}{T_2^g} \]
in agreement with Eq. (2.2.12) if $T = T^g$, up to a possible constant factor.
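The ideal-gas calculation above can be followed numerically. The sketch below computes the heats for the four states A, B, C, D and compares the resulting efficiency with the temperature ratio; the function name and the sample temperatures and volumes are illustrative assumptions, and the parameter `mr_over_mu` stands for the combination MR/μ in arbitrary units.

```python
import math

# Ideal-gas Carnot cycle A -> B -> C -> D -> A, following the steps in the text.
# Sample temperatures and volumes are illustrative.

def carnot_ideal_gas(t_hot, t_cold, v_a, v_b, gamma, mr_over_mu=1.0):
    """Return (Q2, Q1, W) for an ideal gas taken around a Carnot cycle."""
    # On the adiabats V is proportional to T^(-1/(gamma - 1)), Eq. (2.1.13).
    stretch = (t_hot / t_cold) ** (1.0 / (gamma - 1.0))
    v_c = v_b * stretch
    v_d = v_a * stretch
    q_hot = mr_over_mu * t_hot * math.log(v_b / v_a)     # heat taken in, A to B
    q_cold = mr_over_mu * t_cold * math.log(v_c / v_d)   # heat given up, C to D
    return q_hot, q_cold, q_hot - q_cold

t_hot, t_cold = 400.0, 300.0
q_hot, q_cold, work = carnot_ideal_gas(t_hot, t_cold, v_a=1.0, v_b=2.0, gamma=1.4)
efficiency = work / q_hot

print("efficiency from heats:       ", efficiency)
print("efficiency from temperatures:", (t_hot - t_cold) / t_hot)
```

The two numbers agree for any choice of γ and of the volumes, since the logarithmic factors cancel in the ratio.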
2.3 Entropy
Proof: The first step in proving these results is to prove the following lemma:
for an arbitrary cycle, reversible or irreversible, that takes a system from any
state back to the same state, taking in and giving up heat at various temperatures,
we have
\[ \oint \frac{dQ}{T} \le 0 . \tag{2.3.3} \]
After establishing this lemma, the rest of the proof will be straightforward.
To prove this lemma we can approximate the cycle by a sequence of brief
isothermal steps, in each of which the system takes in heat (if dQ is positive)
or gives heat up (if dQ is negative) at a momentary temperature T . We can
imagine that, at each step, the heat taken in or given up is given up or taken
in by another system, which undergoes a Carnot cycle between the momentary
temperature T and a fixed temperature T0. In this Carnot cycle, the ratio of the
heat dQ given up by the Carnot cycle to the heat dQ0 taken by the Carnot
cycle from the reservoir at temperature T0 is given by Eq. (2.2.11):
\[ \frac{dQ}{dQ_0} = \frac{T}{T_0} , \]
or in other words
\[ dQ_0 = T_0 \frac{dQ}{T} . \]
Hence in the complete cycle the Carnot cycles take in a total net heat $T_0 \oint dQ/T$
from the reservoir at temperature T0 . Since the system and each of the Carnot
cycles return to their original states, if this heat taken in at temperature T0 were
positive-definite then it would have to go into work, which is impossible since
work cannot be done by taking heat from a reservoir at a fixed temperature with
no changes elsewhere. (If it could, then this work by producing friction could
be used to transfer some heat to any body, even one at a temperature higher
than T0.) So we conclude that the integral $\oint dQ/T$ is at most zero, as was to be
shown.
The rest is easy. Note that if two paths P and P′ are both reversible paths
that go from state 0 to state 1, then $P P'^{-1}$ is a closed cycle, where $P'^{-1}$ is path
P′ taken in reverse, from state 1 to state 0. It follows then from the inequality
(2.3.3) that
\[ 0 \ge \oint_{P P'^{-1}} \frac{dQ}{T} = \int_P \frac{dQ}{T} - \int_{P'} \frac{dQ}{T} . \]
The same reasoning applied to the closed cycle $P' P^{-1}$ gives the opposite inequality.
These two results are consistent only if for reversible paths both cyclic integrals
vanish, in which case
\[ \int_P \frac{dQ}{T} = \int_{P'} \frac{dQ}{T} . \tag{2.3.4} \]
the fluid consists of infinitesimal particles that interact only in contact in colli-
sions; since there is nothing with the dimensions of length that can enter in the
calculation of the energy, E(V , T ) cannot here depend on volume. In this case
Eq. (2.3.9) yields Charles’ law, that for fixed volume V the pressure p(V , T ) is
proportional to T . This shows again that the absolute temperature T in the ideal
gas law (1.2.3) is the same up to a constant factor as the temperature T defined
by the efficiency (2.2.11) of Carnot cycles.
Although this result was obtained without having a formula for the entropy,
for some purposes it is useful actually to know what the entropy is. In a homo-
geneous medium, the entropy S of any mass M of matter may conveniently be
written as S = Ms, where s is the entropy per unit mass, a function of temper-
ature and various densities known as the specific entropy. Dividing Eq. (2.3.6)
by M, we have then
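\[ T\,ds = d(\mathcal{E}/\rho) + p\,d(1/\rho) , \tag{2.3.10} \]
where as before $\mathcal{E} \equiv E/V$ is the internal energy density and $\rho \equiv M/V$ is the
mass density. We consider an ideal gas, for which $T = p\mu/R\rho$ while $\mathcal{E}$ and p
are related by Eq. (2.1.10): $\mathcal{E} = p/(\gamma - 1)$. Then Eq. (2.3.10) gives
\[ \frac{p\mu}{R\rho}\,ds = \frac{1}{\gamma-1}\left[ \frac{dp}{\rho} + \gamma p\,d\!\left(\frac{1}{\rho}\right) \right] = \frac{\rho^{\gamma-1}}{\gamma-1}\,d\!\left(\frac{p}{\rho^{\gamma}}\right) , \]
so
\[ ds = \frac{R/\mu}{\gamma-1}\,\frac{\rho^{\gamma}}{p}\,d\!\left(\frac{p}{\rho^{\gamma}}\right) . \]
The solution is
\[ s = \frac{R/\mu}{\gamma-1} \ln\!\left(\frac{p}{\rho^{\gamma}}\right) + \text{constant} . \tag{2.3.11} \]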
We see that the result of Section 2.1 that $p \propto \rho^{\gamma}$ for adiabatic processes is just
the statement that s is constant in these processes, which of course it must be
since in an adiabatic process the heat input dQ vanishes.
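Formula (2.3.11) can be exercised directly. The sketch below evaluates the specific entropy along an adiabat, where it should stay fixed, and across an isothermal compression, where it should not. The function name is illustrative, the additive constant is set to zero, and R/μ is set to 1 in arbitrary units; these are assumptions made for the example.

```python
import math

# Specific entropy of an ideal gas, Eq. (2.3.11), in units where R/mu = 1
# and with the additive constant set to zero (both are assumptions).

def specific_entropy(p, rho, gamma, r_over_mu=1.0):
    """s = (R/mu) / (gamma - 1) * ln(p / rho^gamma)."""
    return r_over_mu / (gamma - 1.0) * math.log(p / rho ** gamma)

gamma = 5.0 / 3.0
p0, rho0 = 1.0, 1.0

# Adiabatic compression: p follows rho^gamma, so s should not change.
rho1 = 2.0
p_adiabatic = p0 * (rho1 / rho0) ** gamma
ds_adiabatic = specific_entropy(p_adiabatic, rho1, gamma) - specific_entropy(p0, rho0, gamma)

# Isothermal compression: p follows rho, and heat must flow out of the gas.
p_isothermal = p0 * (rho1 / rho0)
ds_isothermal = specific_entropy(p_isothermal, rho1, gamma) - specific_entropy(p0, rho0, gamma)

print("entropy change, adiabatic: ", ds_adiabatic)
print("entropy change, isothermal:", ds_isothermal)
```

For the isothermal doubling of the density the change is −(R/μ) ln 2, independent of γ, which matches the heat MRT ln 2/μ given up in the isothermal compression step of the Carnot cycle, divided by MT.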
In many stars there are regions in which convection effectively mixes matter
from various depths. Since heat conduction is usually ineffective in stars, little
heat flows into or out of a bit of matter as it rises or falls, and so it keeps the
same specific entropy. These regions therefore have a uniform specific entropy,
and therefore a uniform value for the ratio $p/\rho^{\gamma}$. For instance, this is the case in
the Sun for distances from the center greater than about 65% of the Sun’s radius
out to a thin surface layer.
Neutral Matter
We have been mostly concerned with matter in which in each mass there is a
non-vanishing conserved quantity, the number of particles. There is a different
in Eq. (2.3.1) must converge. This has the consequence, in particular, that the
specific heat dQ/dT must vanish for T → 0. This seems to contradict the
results of Section 2.1 for ideal gases, which give a temperature-independent
specific heat whether for fixed volume or fixed pressure. The contradiction is
avoided in practice because no substance remains close to an ideal gas as the
temperature approaches absolute zero. We will see when we come to quantum
mechanics that if an otherwise free particle is confined in any fixed volume, then
it cannot have precisely zero momentum, as required for a classical ideal gas at
absolute zero temperature. On the other hand, solids can exist at absolute zero
temperature, and in that limit their specific heats do approach zero.
We saw in the previous chapter how by the mid nineteenth century the ideal gas
law had been established through the work especially of Bernoulli and Clausius.
But, though derived by considering the motions of individual gas molecules, in
its conclusions it dealt only with bulk gas properties such as pressure, tem-
perature, mass density, and energy density. For many purposes, including the
calculation of chemical or transport processes, it was necessary to go further
and work out the detailed probability distribution of the motion of individual
gas particles. This was done in the kinetic theory of James Clerk Maxwell and
Ludwig Boltzmann (1844–1906). Kinetic theory was later generalized to the
formalism known as statistical mechanics, especially by the American theorist
Josiah Willard Gibbs (1839–1906). As it turned out, these methods went a
long way toward not only establishing a correspondence with thermodynamics
but also explaining the principles of thermodynamics on the assumption that
macroscopic matter is composed of very many particles, and thereby helping to
establish the reality of atoms.
3 J. C. Maxwell, Phil. Mag. 19, 19; 20, 21 (1860). This article is included in Brush, The Kinetic Theory of
Gases – An Anthology of Classic Papers with Historical Commentary, listed in the bibliography.
He assumed (without offering a real justification) that the probability that any
component of velocity of a particle is in a particular range is not correlated with
the other components of the velocity. Then P (vx , vy , vz ) must be proportional
to a function of vx alone, with a coefficient that depends only on vy and vz , and
likewise for vy and vz , so P (vx , vy , vz ) must take the form of a product:
P (vx , vy , vz ) = f (vx )g(vy )h(vz ) .
Rotational symmetry requires further that P can depend only on the magnitude
of the velocity, not on its direction, and hence only on $v_x^2 + v_y^2 + v_z^2$. The only
function of $v_x^2 + v_y^2 + v_z^2$ that takes the form $f(v_x)g(v_y)h(v_z)$ is proportional to
an exponential:
\[ P(v_x, v_y, v_z) \propto \exp\left[ -C(v_x^2 + v_y^2 + v_z^2) \right] . \]
The constant C must be positive in order that P should not blow up for large
velocity, which would make it impossible to set the total probability equal to
unity, as it must be. Taking C to be positive, and setting the total probability
(the integral of P over all velocities) for each particle equal to one, gives the
factor of proportionality:
\[ P(v_x, v_y, v_z) = \left( \frac{C}{\pi} \right)^{3/2} \exp\left[ -C(v_x^2 + v_y^2 + v_z^2) \right] . \]
We can use this to calculate the mean square velocity components:
\[ \langle v_x^2 \rangle = \langle v_y^2 \rangle = \langle v_z^2 \rangle = \frac{1}{2C} . \]
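Both the normalization factor and the mean square value can be confirmed by direct numerical integration. Because the distribution is a product, it is enough to check one velocity component. This is only a sketch: the value of C, the cutoff, and the helper names are arbitrary illustrative choices.

```python
import math

# Numerical check of the normalization and mean square of one velocity
# component of the Maxwell distribution. The value of C is arbitrary.

def maxwell_1d(v, c):
    """One-component factor of the distribution: sqrt(C/pi) exp(-C v^2)."""
    return math.sqrt(c / math.pi) * math.exp(-c * v * v)

def trapezoid(func, lo, hi, n=20000):
    """Composite trapezoid rule on [lo, hi] with n intervals."""
    h = (hi - lo) / n
    total = 0.5 * (func(lo) + func(hi))
    for i in range(1, n):
        total += func(lo + i * h)
    return total * h

c = 0.7
cutoff = 12.0 / math.sqrt(c)   # the integrand is negligible beyond this point

norm_1d = trapezoid(lambda v: maxwell_1d(v, c), -cutoff, cutoff)
mean_square = trapezoid(lambda v: v * v * maxwell_1d(v, c), -cutoff, cutoff)

print("total probability (three components):", norm_1d ** 3)
print("mean square of one component:        ", mean_square)
print("1 / (2C):                            ", 1.0 / (2.0 * c))
```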
Clausius had introduced an absolute temperature T by setting $m\langle v_\perp^2 \rangle = kT$,
4 L. Boltzmann, Sitz. Ber. Akad. Wiss. (Vienna), part II, 66, 875 (1872). A translation into English of this
article is included in Brush, The Kinetic Theory of Gases – An Anthology of Classic Papers with Historical
Commentary, listed in the bibliography.
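\[ H \equiv \langle \ln P \rangle = \int_{-\infty}^{+\infty} dv_x \int_{-\infty}^{+\infty} dv_y \int_{-\infty}^{+\infty} dv_z \, P(\mathbf{v}) \ln P(\mathbf{v}) , \]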
and showed that collisions of gas particles always lead to a decrease in H until
a minimum is reached, at which P (v) is the Maxwell–Boltzmann distribution
function. A generalization of this H -theorem was given in 1901 by Gibbs.5 The
generalization and proof are given below, along with the application to gases.
5 J. W. Gibbs, Elementary Principles of Statistical Mechanics, Developed with Especial Reference to The
Rational Foundation of Thermodynamics (Scribner, New York, 1902).
(Note that this makes $\int d\alpha\, P(\alpha)$ time-independent, as it must be. Cancelling dα
is justified because phase space volumes such as dα do not change with time.)
Now use $(d/dt)\,y \ln y = (\ln y + 1)(dy/dt)$, which gives here
\[ \frac{dH}{dt} = \int d\alpha \int d\beta \, \left[ \ln P(\alpha) + 1 \right] \left[ P(\beta)\,\Gamma(\beta \to \alpha) - P(\alpha)\,\Gamma(\alpha \to \beta) \right] . \]
Interchange α and β in the second double integral arising from the second term
in square brackets:
\[ \frac{dH}{dt} = \int d\alpha \int d\beta \, P(\beta) \ln\!\left( \frac{P(\alpha)}{P(\beta)} \right) \Gamma(\beta \to \alpha) . \tag{2.4.4} \]
Now use the inequality that y ln(x/y) ≤ x − y for any positive numbers x
and y. (To prove this, note that y ln(x/y) − x + y vanishes for x = y, while its
derivative with respect to x is −(x − y)/x, so it monotonically approaches zero
from below for x < y and then decreases monotonically for x > y.) From this
inequality, we have
\[ \frac{dH}{dt} \le \int d\alpha \int d\beta \, \left[ P(\alpha) - P(\beta) \right] \Gamma(\beta \to \alpha) . \tag{2.4.5} \]
Again interchange α and β, now in the first double integral:
\[ \frac{dH}{dt} \le \int d\alpha \int d\beta \, P(\beta) \left[ \Gamma(\alpha \to \beta) - \Gamma(\beta \to \alpha) \right] . \tag{2.4.6} \]
In the original proof it was assumed that the laws of physics are invariant under
reversal of the direction of time’s flow, and therefore $\Gamma(\beta \to \alpha) = \Gamma(\alpha \to \beta)$,
so that Eq. (2.4.6) says that H decreases with time, in accord with the
H-theorem. In studies of the decay of neutral K-mesons in 1964–1970 it was
found that time-reversal invariance is not exact.⁶ Fortunately, the H-theorem
survives, because on very general grounds in quantum mechanics it can be
shown without using time-reversal invariance that⁷
\[ \int d\alpha \, \left[ \Gamma(\alpha \to \beta) - \Gamma(\beta \to \alpha) \right] = 0 . \tag{2.4.7} \]
With Eq. (2.4.6), this is enough to require that dH/dt ≤ 0, as was to be shown.
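The monotonic decrease of H can be watched in a toy model. The sketch below replaces the continuous labels α and β by three discrete states and the integrals by sums, and it assumes symmetric transition rates, which is the time-reversal-invariant special case and automatically satisfies Eq. (2.4.7). The rate values, starting probabilities, time step, and function names are all illustrative assumptions.

```python
import math

# Toy illustration of the H-theorem with three discrete states standing in for
# the continuous labels. rates[a][b] is the transition rate from a to b; the
# matrix is symmetric, so the discrete analog of Eq. (2.4.7) holds.

def h_function(probs):
    """Discrete analog of H: the sum of P ln P over states."""
    return sum(p * math.log(p) for p in probs if p > 0.0)

def euler_step(probs, rates, dt):
    """One Euler step of dP_a/dt = sum_b [P_b rate(b->a) - P_a rate(a->b)]."""
    n = len(probs)
    updated = []
    for a in range(n):
        gain = sum(probs[b] * rates[b][a] for b in range(n) if b != a)
        loss = probs[a] * sum(rates[a][b] for b in range(n) if b != a)
        updated.append(probs[a] + dt * (gain - loss))
    return updated

rates = [[0.0, 1.0, 0.5],
         [1.0, 0.0, 2.0],
         [0.5, 2.0, 0.0]]
probs = [0.90, 0.08, 0.02]
dt = 0.01

history = [h_function(probs)]
for _ in range(2000):
    probs = euler_step(probs, rates, dt)
    history.append(h_function(probs))

print("initial H:", history[0])
print("final H:  ", history[-1])
print("-ln 3:    ", -math.log(3.0))
```

With no conserved quantity other than total probability, H settles at its minimum value of −ln 3, where all three states are equally likely.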
Let us pause for a moment to reflect how remarkable is this result. The
decrease of H with time indicates a fundamental difference between past
and future, even though this result would hold even if the underlying micro-
scopic laws of physics were entirely symmetric under the direction of time’s
flow, and indeed as we have seen it was first derived under the assumption of
6 K. R. Shubert et al., Phys. Lett. 13, 138 (1964). This had been strongly suggested by an earlier experiment
of J. H. Christensen, J. W. Cronin, V. L. Fitch, and R. Turlay, Phys. Rev. Lett. 13, 138 (1964).
7 For a very general proof, with references to earlier work by others, see S. Weinberg, The Quantum Theory
of Fields, Vol. I, pp. 150–151 (Cambridge University Press, Cambridge, UK, 1995, 2005).
Now, δP(α) is not entirely arbitrary but is constrained by the condition that
variations in P cannot change either the total probability $\int P(\alpha)\,d\alpha = 1$ or
the mean value of any conserved quantity such as the total energy E(α). In
order that δH should vanish for any variation in P(α) that preserves $\int d\alpha\, P(\alpha)$
and the mean values of all conserved quantities, it is necessary and sufficient
that ln P(α) should be a linear combination of a constant and any conserved
quantities. For instance, if the total energy E(α) is the only conserved quantity
(as it is for radiation) then if we denote the coefficient of E(α) in this linear
combination as −1/Θ we have
\[ P(\alpha) = \exp\left( C - \frac{E(\alpha)}{\Theta} \right) , \]
with the constant factor $e^C$ fixed by the requirement that $\int P(\alpha)\,d\alpha = 1$. We
will show below that, with this probability distribution, the quantity −H has
the defining property (2.3.1) of entropy provided that Θ is proportional to the
absolute temperature T, Θ = kT, so the canonical ensemble is usually written
\[ P(\alpha) = \exp\left( C - \frac{E(\alpha)}{kT} \right) . \tag{2.4.8} \]
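For a system with a handful of discrete states the canonical distribution can be written out explicitly, with the constant C fixed by normalization. The following sketch uses an invented set of equally spaced energy levels and measures energies in units where kT is a pure number; the level values and function name are assumptions made for the illustration.

```python
import math

# Canonical distribution P = exp(C - E/kT) for a few discrete states.
# The energy levels are invented for this illustration.

def canonical(energies, kt):
    """Return (C, probabilities), with C chosen so the probabilities sum to 1."""
    weights = [math.exp(-e / kt) for e in energies]
    c = -math.log(sum(weights))
    return c, [math.exp(c - e / kt) for e in energies]

energies = [0.0, 1.0, 2.0, 3.0]
mean_energies = []
for kt in (0.5, 1.0, 5.0):
    c, probs = canonical(energies, kt)
    mean_energy = sum(p * e for p, e in zip(probs, energies))
    mean_energies.append(mean_energy)
    print(f"kT = {kt}: C = {c:.4f}, mean energy = {mean_energy:.4f}")
```

As kT grows the probabilities become more nearly equal and the mean energy rises toward the average of the levels.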
thus justifying Eq. (2.3.4). The decrease in H implies the increase in entropy,
thus justifying one consequence of the second law of thermodynamics. This was
shocking to some physicists of the nineteenth century, who regarded thermody-
namics as an independent theory, just as fundamental as Newtonian mechanics.
Compound Systems
Equation (2.4.10) makes it easy to justify a fundamental property of the entropy,
that it is extensive. Suppose a system can be regarded as composed of two parts,
whose states are described by parameters α1 and α2 , and that the probabilities in
these two parts are uncorrelated, so that the probability P (α1 , α2 ) dα1 dα2 that
the system is in a state with parameters in the infinitesimal ranges dα1 and dα2
around α1 and α2 is a product of probabilities for the separate parts:
P (α1 , α2 ) dα1 dα2 = P1 (α1 ) dα1 × P2 (α2 ) dα2 , (2.4.11)
with
\[ \int d\alpha_1\, P_1(\alpha_1) = \int d\alpha_2\, P_2(\alpha_2) = 1 . \]
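The additivity that follows from the product form (2.4.11) can be seen in a small discrete example. The sketch below assumes, in line with the earlier identification of −H with the entropy, that the entropy is proportional to the sum of −P ln P over states; the two probability lists are invented for the illustration.

```python
import math

# For uncorrelated parts the quantity sum(P ln P) is additive.
# The probabilities below are invented illustrative values.

def h_function(probs):
    """Discrete analog of H: the sum of P ln P over states."""
    return sum(p * math.log(p) for p in probs if p > 0.0)

part_1 = [0.5, 0.3, 0.2]
part_2 = [0.6, 0.4]

# Joint probabilities of the compound system, as in Eq. (2.4.11).
joint = [p1 * p2 for p1 in part_1 for p2 in part_2]

h_joint = h_function(joint)
h_sum = h_function(part_1) + h_function(part_2)

print("H of compound system:", h_joint)
print("sum of H for parts:  ", h_sum)
```

The additivity comes from ln(P1 P2) = ln P1 + ln P2 together with the separate normalization of P1 and P2.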
Gases
In a gas E(α) is the sum of the energies Ea of the individual particles. The prob-
ability distribution (2.4.9) is then equal to a product of probability distributions
for the individual particles:
\[ P(\alpha) = \prod_a p(E_a, N_{ia}) \tag{2.4.13} \]
where $p(E_a, N_{ia})$ are the probability distribution functions for the individual
particle properties:
\[ p(E_a, N_{ia}) \propto \exp\left[ -\left( E_a + \sum_i N_{ia}\mu_i \right) \Big/ kT \right] , \tag{2.4.14} \]
in which $N_{ia}$ is the number of atoms of type i in the ath molecule. The constant
of proportionality must be chosen to make the individual total probabilities
equal to unity. If all the molecules have the same chemical formula, so
that $N_{ia} = N_i$ is the same for all molecules, then we can absorb the factor
$\exp(-\sum_i N_i \mu_i/kT)$ into the constant of proportionality, and simply write
\[ P(\alpha) = \prod_a p(E_a) \quad \text{where} \quad p(E_a) \propto \exp(-E_a/kT) . \tag{2.4.15} \]
In particular, the distribution of the momentum pa arises from the kinetic energy
term p2a /2ma in Ea , and Eq. (2.4.15) yields the Maxwell distribution (2.4.1) but,
as we have now seen, derived by Boltzmann in a more convincing way.
Equipartition
One of the most useful results of statistical mechanics is the equipartition of
energy in cases where the total energy E(α) can be written as the sum of
individual energies proportional to squares of independent quantities ξn :
$$E(\xi_1, \xi_2, \dots) = \sum_n c_n\,\xi_n^2. \tag{2.4.16}$$
For instance, for a gas of N monatomic atoms of mass m, the index n runs
over 3N values; the ξn are the three components of each atom’s momentum
and $c_n = 1/2m$. Molecules that are not monatomic can rotate as well as move. Here n runs over 6N values, with the ξn including the three components of each molecule’s momentum and the three components of its angular momentum. For an angular momentum J, the rotational energy is
$$\frac{J_1^2}{2I_1} + \frac{J_2^2}{2I_2} + \frac{J_3^2}{2I_3},$$
where the Ii characterize the moments of inertia of the molecule. Here the
extra ξn variables are the components of angular momentum, with the cn the
corresponding values of 1/2I . But for a gas of diatomic molecules there is
essentially no energy in rotations around the line separating the atoms, so here
the ξn include only the components of each molecule’s angular momentum J⊥
in the two directions normal to this line, and n runs over 5N values. For an
ensemble of simple harmonic oscillators the ξn include both the displacement
from rest of each oscillator and the displacement’s rate of change. As we shall
see in Section 3.1, the energy of a radiation field can also be expressed as
in Eq. (2.4.16), with the ξn the Fourier transforms of each component of the
electric and magnetic fields.
Whatever the nature of the $\xi_n$, because of the factorization of the exponential the probability of finding any one $\xi_n$ in a range $d\xi_n$ takes the form $A_n \exp(-E_n/kT)\,d\xi_n$, with $E_n = c_n\xi_n^2$ and with proportionality constant $A_n$ fixed by the condition that the total probability for each $\xi_n$ is unity, so that $A_n\int_{-\infty}^{\infty} \exp(-E_n/kT)\,d\xi_n = 1$. Thus the mean value of $E_n$ is
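$$\begin{aligned}
\overline{E_n} &\equiv \frac{\int_{-\infty}^{\infty} d\xi_n\, c_n\xi_n^2 \exp(-c_n\xi_n^2/kT)}{\int_{-\infty}^{\infty} d\xi_n\, \exp(-c_n\xi_n^2/kT)} = \frac{\int_0^\infty d\sqrt{E_n}\; E_n \exp(-E_n/kT)}{\int_0^\infty d\sqrt{E_n}\; \exp(-E_n/kT)} \\
&= \frac{\int_0^\infty dE_n\, E_n^{1/2}\exp(-E_n/kT)}{\int_0^\infty dE_n\, E_n^{-1/2}\exp(-E_n/kT)} = \frac{(kT)^{3/2}\,\Gamma(3/2)}{(kT)^{1/2}\,\Gamma(1/2)} = kT/2.
\end{aligned} \tag{2.4.17}$$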
It is a fortunate aspect of kinetic theory that these mean energies do not depend
on the coefficients cn , or indeed on much else about the physical system aside
from the distribution of the total energy among individual quadratic degrees of
freedom.
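The independence of the mean energy from the coefficient $c_n$ can be checked numerically. The following is only a minimal sketch: the function name, the midpoint-rule integration, and the choice of units with kT = 1 are assumptions made for illustration, not part of the derivation above.

```python
import math

def mean_quadratic_energy(c, kT, n_steps=20_000, width=12.0):
    """Thermal average of E = c*xi**2 with Boltzmann weight exp(-E/kT)."""
    # Integrate over |xi| <= width * sqrt(kT/c); the weight beyond that is negligible.
    xi_max = width * math.sqrt(kT / c)
    h = 2.0 * xi_max / n_steps
    numerator = 0.0
    denominator = 0.0
    for i in range(n_steps):
        xi = -xi_max + (i + 0.5) * h      # midpoint rule
        energy = c * xi * xi
        weight = math.exp(-energy / kT)
        numerator += energy * weight
        denominator += weight
    return numerator / denominator

kT = 1.0                                   # assumed units in which kT = 1
means = {c: mean_quadratic_energy(c, kT) for c in (0.01, 1.0, 250.0)}
for c, mean in means.items():
    print(f"c = {c:g}: mean energy / kT = {mean / kT:.6f}")
```

Each of the three very different coefficients should give a ratio close to one half, as Eq. (2.4.17) requires.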
In any gas the kinetic energy of the $n$th particle is $\mathbf{p}_n^2/2m_n$. The average of each of the three terms in this kinetic energy is $kT/2$, so the average kinetic energy of each particle is $3kT/2$. Equation (1.1.4) gives $m\overline{\mathbf{v}^2}/3 = kT$, where this $k$ is the constant $k$ in the gas constant (1.2.4), so we see that this $k$ is the same as the constant $k$ in the general probability distribution (2.4.8) or (2.4.9) of statistical mechanics.
For a generic polyatomic molecule the mean rotational energy associated
with the three degrees of freedom Ji is 3kT /2, but, as already mentioned,
for a diatomic molecule meaningful rotation is only possible around the two
axes perpendicular to the linear molecule, so the mean rotational energy is only
$2kT/2$. That is, if we write the mean translational plus rotational energy per molecule as $(3kT/2) \times (1 + f)$, as in Section 2.1, then $f = 0$ for monatomic molecules, $f = 2/3$ for diatomic molecules, while $f = 1$ for other molecules. Equation (2.1.9) gives the specific heat ratio as $\gamma = 1 + 2/[3(1 + f)]$, so γ = 5/3 for monatomic gases, γ = 7/5 for diatomic molecules (which explains why experiments on gases like O2 and H2 gave results near γ = 1.4 in Clausius’ time), and γ = 4/3 for other molecules.
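The three values of γ quoted above follow from the formula for the specific heat ratio by exact arithmetic. A small sketch, in which the function and dictionary names are illustrative choices:

```python
from fractions import Fraction

def gamma_from_f(f):
    """Specific heat ratio gamma = 1 + 2 / [3 (1 + f)], evaluated exactly."""
    return 1 + Fraction(2) / (3 * (1 + Fraction(f)))

# f measures the rotational energy relative to the translational energy 3kT/2.
cases = {"monatomic": Fraction(0), "diatomic": Fraction(2, 3), "other": Fraction(1)}
ratios = {name: gamma_from_f(f) for name, f in cases.items()}
for name, ratio in ratios.items():
    print(f"{name}: gamma = {ratio} = {float(ratio):.4f}")
```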
Of course, molecules can also vibrate as well as rotate and move, and energy
can also go into exciting the clouds of electrons that hold them together. For
reasons that only became clear with the advent of quantum mechanics, these
degrees of freedom can only be excited at temperatures much higher than is
common in our environment.
Entropy as Disorder
The entropy can be regarded as a measure of the disorder of a system. To
see this, it is easiest to approximate the parameters of a system as taking a
discrete set of values αν instead of a continuum of values α. We can connect
the continuum and discrete descriptions by dividing the continuum into tiny
ranges αν ≤ α ≤ αν + δα (for simplicity treating α here as if it were one-
dimensional) and approximating P (α) as a constant Pν /δα in each interval, so
that the probability that α is in this interval is
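$$\int_{\alpha_\nu}^{\alpha_\nu+\delta\alpha} d\alpha\, P(\alpha) = \frac{P_\nu}{\delta\alpha}\,\delta\alpha = P_\nu. \tag{2.4.18}$$
Then the entropy (2.4.10) is
$$S = -k\sum_\nu \delta\alpha\,\frac{P_\nu}{\delta\alpha}\ln\frac{P_\nu}{\delta\alpha} = -k\sum_\nu P_\nu \ln P_\nu + \mathcal{C}, \tag{2.4.19}$$
where $\mathcal{C}$ is a constant,
$$\mathcal{C} = k\ln\delta\alpha. \tag{2.4.20}$$
Since there was an arbitrary constant in Eq. (2.4.10), we can absorb $\mathcal{C}$ into the definition of that constant, and define the entropy simply as
$$S = -k\sum_\nu P_\nu\ln P_\nu. \tag{2.4.21}$$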
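To see how the discrete formula (2.4.21) tracks disorder, one can evaluate it for a few probability assignments over the same set of states. The distributions below are invented for illustration, and the sketch returns S/k rather than S.

```python
import math

def entropy_over_k(probs):
    """Return S/k = -sum P ln P, treating 0*ln(0) as 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

certain = [1.0, 0.0, 0.0, 0.0]     # the state is known exactly
skewed = [0.7, 0.1, 0.1, 0.1]      # one state strongly favored
uniform = [0.25] * 4               # nothing is known about the state

for name, probs in (("certain", certain), ("skewed", skewed), ("uniform", uniform)):
    print(f"{name}: S/k = {entropy_over_k(probs):.4f}")
```

The entropy vanishes when the state is certain and is largest, ln 4 for four states, when all states are equally likely.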
Conservation Laws
In many cases we have to deal with conserved quantities, such as the number of
molecules or the total electric charge. By a quantity being conserved is meant
that the net rate of increase of the quantity (negative if a decrease) in any volume
plus the net rate at which this quantity flows out of the volume (negative if
flowing in) vanishes. The current J of this quantity is defined so that the net rate
of outward flow is $\oint_A d\mathbf{A}\cdot\mathbf{J}$, where A is the surface surrounding the volume V,
and dA is an element of area of this surface, taken as a vector pointing outward
2.5 Transport Phenomena 43
from the surface. Hence if this quantity has a density N and a current J , then
the conservation condition is
$$\frac{\partial}{\partial t}\int_V d^3x\, N + \oint_A d\mathbf{A}\cdot\mathbf{J} = 0. \tag{2.5.1}$$
Using Gauss’s theorem, we can write the second term in Eq. (2.5.1) as an
integral over the volume of the divergence of the current, so Eq. (2.5.1) is
equivalent to
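$$\int_V d^3x\left[\frac{\partial}{\partial t}N + \nabla\cdot\mathbf{J}\right] = 0,$$
and, since this must be true for any volume, the integrand must vanish:
$$\frac{\partial}{\partial t}N + \nabla\cdot\mathbf{J} = 0. \tag{2.5.2}$$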
For instance, if matter is carried from one place to another only by a bulk motion
with velocity v, then the mass density ρ satisfies an equation of the form (2.5.2),
with the mass current given by ρv:
$$\frac{\partial}{\partial t}\rho + \nabla\cdot(\rho\mathbf{v}) = 0. \tag{2.5.3}$$
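A one-dimensional discretization shows how an equation of the form (2.5.3) keeps the total mass fixed: whatever leaves one cell enters its neighbor. This is a sketch under stated assumptions (constant positive velocity, periodic boundaries, a first-order upwind update); the grid sizes and initial profile are arbitrary.

```python
import math

def step_density(rho, v, dx, dt):
    """One conservative upwind step of d(rho)/dt + d(rho*v)/dx = 0 for constant v > 0."""
    flux = [r * v for r in rho]            # current rho*v leaving each cell on its right face
    # flux[i - 1] with i = 0 wraps to the last cell: periodic boundary.
    return [rho[i] - (dt / dx) * (flux[i] - flux[i - 1]) for i in range(len(rho))]

n_cells, dx, dt, v = 100, 0.1, 0.05, 1.0   # v*dt/dx = 0.5 keeps the scheme stable
rho = [1.0 + math.exp(-((i - 20) / 5.0) ** 2) for i in range(n_cells)]
mass_before = sum(rho) * dx
for _ in range(200):
    rho = step_density(rho, v, dx, dt)
mass_after = sum(rho) * dx
print(mass_before, mass_after)
```

The two printed totals should agree to rounding error, because the flux differences cancel in pairs when summed over all cells.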
Momentum Flow
Such conservation laws are ubiquitous in physics. We will be concerned now
with a particular set of conserved quantities in fluids, the components of mo-
mentum. The density of the ith component (with i = 1, 2, 3) of momentum is
ρvi , where ρ is the mass density and vi is the ith component of the bulk velocity.
Their conservation provides the fundamental dynamical equation for fluids. The
conservation equation here takes the general form
$$\frac{\partial}{\partial t}(\rho v_i) + \sum_j \frac{\partial}{\partial x_j}T_{ji} = 0, \tag{2.5.4}$$
8 $T_{ji}$ is the purely spatial part of a larger array, a tensor with time as well as space components that serves in the general theory of relativity as the source of the gravitational field.
momentum. So, to keep an open mind, let us write the $j$th component of the current of the $i$th component of momentum as
$$T_{ji} = \rho v_j v_i + \tau_{ji}, \tag{2.5.5}$$
with $\tau_{ji}$ a correction term arising from forces acting within the fluid. According to Eq. (2.5.4), the $i$-component of the internal force per unit volume is $-\sum_j \partial\tau_{ji}/\partial x_j$.
So what is τj i ? An answer was first given in 1822 by Claude-Louis Navier
(1785–1836), of the Corps des Ponts et Chaussées, and later in his own formu-
lation by Sir George Stokes (1819–1903). Rather than trying to reproduce their
reasoning, we give a treatment below that has a more modern flavor, relying
largely on principles of invariance.
First, we can learn a little about the momentum current Tj k by imposing
the condition that angular momentum should satisfy a conservation condition.
The density of the ith component of angular momentum is ρ(x × v)i , so for
instance the rate of change of its i = 3 component is
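$$\begin{aligned}
\frac{\partial}{\partial t}\,\rho(\mathbf{x}\times\mathbf{v})_3 &= \frac{\partial}{\partial t}\,\rho(x_1 v_2 - x_2 v_1) = -x_1\sum_j\frac{\partial T_{j2}}{\partial x_j} + x_2\sum_j\frac{\partial T_{j1}}{\partial x_j} \\
&= -\sum_j\frac{\partial}{\partial x_j}\left[x_1 T_{j2} - x_2 T_{j1}\right] + T_{12} - T_{21}.
\end{aligned}$$
In order for this to take the form of a conservation law we must have $T_{12} = T_{21}$, and, since there is nothing special about the 1- and 2- directions, $T_{ji}$ must be entirely symmetric,
$$T_{ij} = T_{ji}, \tag{2.5.6}$$
and then of course the same is also true of the term $\tau_{ji}$ in Eq. (2.5.5):
$$\tau_{ij} = \tau_{ji}. \tag{2.5.7}$$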
$$p \equiv \frac{1}{3}\sum_i \tau_{ii}, \tag{2.5.8}$$
and writing $\tau_{ji}$ as
$$\tau_{ji} = p\,\delta_{ji} + \Delta\tau_{ji}, \tag{2.5.9}$$
where $\Delta\tau_{ji}$ is both symmetric and traceless:
$$\Delta\tau_{ij} = \Delta\tau_{ji}, \qquad \sum_i \Delta\tau_{ii} = 0. \tag{2.5.10}$$
The term $p\,\delta_{ji}$ in $T_{ji}$ gives a force per unit volume −∇p in Eq. (2.5.4), so p can be identified as the fluid’s pressure. Of course, there is an infinite number of ways of constructing the symmetric traceless tensor $\Delta\tau_{ji}$ from the velocity and rotational invariants and their derivatives. One simple example is $[v_i v_j + v_j v_i - 2\delta_{ij}\mathbf{v}^2/3]f$, where f is any function of the rotational invariants. Fortunately, we can eliminate many of these possibilities (including this one) by using the principle of Galilean relativity.
Galilean Relativity
The principle of Galilean relativity9 requires that the laws governing fluids should be the same for an observer O who uses space coordinates x and for an observer O′ moving at any constant velocity −u with respect to O, and who therefore uses coordinates related to those of O by
$$\mathbf{x}' = \mathbf{x} + \mathbf{u}t. \tag{2.5.11}$$
Aside from this change of coordinates, the moving observer sees a mass density ρ′ that is the same as ρ:
$$\rho'(\mathbf{x}',t) = \rho(\mathbf{x},t). \tag{2.5.12}$$
But, for the observer O′, his own velocity −u is subtracted from the velocity seen by observer O:
$$\mathbf{v}'(\mathbf{x}',t) = \mathbf{v}(\mathbf{x},t) + \mathbf{u}. \tag{2.5.13}$$
To check whether the equation (2.5.3) of mass conservation is left invariant by Galilean transformations, take the partial derivative of Eq. (2.5.12) with respect to time, holding x (but not x′!) fixed:
$$\frac{\partial\rho'(\mathbf{x}',t)}{\partial t} + \sum_i u_i\frac{\partial\rho'(\mathbf{x}',t)}{\partial x_i'} = \frac{\partial\rho(\mathbf{x},t)}{\partial t}.$$
9 In the twentieth century this came to be called Galilean relativity to distinguish it from the Einstein special
principle of relativity. Both principles state that the laws of nature are unaffected by the uniform motion of
an observer; as we will see in Chapter 4, it is only the details of the transformation to a moving frame of
reference that distinguishes Einsteinian from Galilean relativity.
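Therefore
$$\begin{aligned}
\frac{\partial\rho'(\mathbf{x}',t)}{\partial t} + \nabla'\cdot\big(\mathbf{v}'(\mathbf{x}',t)\,\rho'(\mathbf{x}',t)\big)
&= \frac{\partial\rho(\mathbf{x},t)}{\partial t} + \nabla'\cdot\big((\mathbf{v}'(\mathbf{x}',t)-\mathbf{u})\,\rho'(\mathbf{x}',t)\big) \\
&= \frac{\partial\rho(\mathbf{x},t)}{\partial t} + \nabla\cdot\big(\mathbf{v}(\mathbf{x},t)\,\rho(\mathbf{x},t)\big) = 0
\end{aligned} \tag{2.5.14}$$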
and so the equation (2.5.3) of mass conservation does satisfy the principle of
Galilean relativity.
By following the same reasoning, we can see that the momentum conservation law (2.5.4) would be invariant under Galilean transformations if $T_{ji}$ were simply given by the term $\rho v_j v_i$ in Eq. (2.5.5). Hence the principle of Galilean relativity requires that the term $\tau_{ji}$ in Eq. (2.5.5) be separately invariant under Galilean transformations, and according to Eqs. (2.5.8) and (2.5.9) the same must be true of $p$ and of the traceless part $\Delta\tau_{ji}$. Because of the term u in the Galilean transformation (2.5.13), Galilean relativity rules out terms in $\Delta\tau_{ji}$ such as in the example $v_i v_j + v_j v_i - 2\delta_{ij}\mathbf{v}^2/3$ mentioned above, which involves v itself rather than its gradient.
Navier–Stokes Equation
There is still an infinite variety of terms that might appear in the traceless part $\Delta\tau_{ji}$ of $\tau_{ji}$, containing any number of factors of gradients of any order of density and/or velocity. But in order to keep the units consistent, the more gradients are contained as factors in any term in $\Delta\tau_{ji}$, the more powers of some length that is characteristic of the microscopic properties of the fluid must appear in the coefficient of that term. If these lengths characterizing the fluid, such as the distance between molecules and the mean free path, are all much less than the scale of distances over which fluid properties such as density and velocity vary, then $\Delta\tau_{ji}$ is dominated by a term proportional to the minimum number of gradients.10 So we should look for a possible term in $\Delta\tau_{ij}$ proportional to a single gradient.
It is not possible to construct a symmetric traceless tensor proportional to a single gradient of the density, so a tensor proportional to a single gradient must be linear in the gradient of the velocity. There is a unique symmetric traceless tensor of this sort:
$$\Delta\tau_{ji} = -\eta\left[\frac{\partial v_j}{\partial x_i} + \frac{\partial v_i}{\partial x_j} - \frac{2}{3}\,\delta_{ij}\,(\nabla\cdot\mathbf{v})\right], \tag{2.5.15}$$
10 This sort of reasoning has become common in the quantum theory of fields, leading to what are known as
effective-field theories.
Viscosity
The measurement of viscosity was well within the capabilities of nineteenth
century physicists. In a classic calculation using the Navier–Stokes equation,
Stokes found that a uniform fluid with viscosity η exerts a drag force F on a
spherical ball of radius a moving with velocity v through the fluid, given by
F = 6πηav . (2.5.17)
For instance, if a ball of mass m falls through a fluid, it accelerates until the
viscous force balances the force mg of gravity (neglecting buoyancy), when it
has the terminal velocity
$$v_{\rm terminal} = \frac{mg}{6\pi\eta a}.$$
The viscosity of gases could also be measured by observing the effect of a
surrounding gas on the motion of a pendulum.
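As a rough numerical illustration of the terminal velocity formula, consider a small oil droplet falling in air. The radius, oil density, and air viscosity below are assumed round numbers chosen only for illustration; they do not come from the text.

```python
import math

def terminal_velocity(mass, radius, viscosity, g=9.81):
    """Speed at which Stokes drag 6*pi*eta*a*v balances gravity m*g (buoyancy neglected)."""
    return mass * g / (6.0 * math.pi * viscosity * radius)

# Assumed illustrative values, not taken from the text:
radius = 1.0e-6          # m, a micron-sized droplet
density = 900.0          # kg/m^3, a typical oil
eta_air = 1.8e-5         # Pa s, roughly air at room temperature
mass = density * (4.0 / 3.0) * math.pi * radius ** 3
v_term = terminal_velocity(mass, radius, eta_air)
print(f"terminal velocity = {v_term:.3e} m/s")
```

With these inputs the droplet settles at roughly a tenth of a millimeter per second, slow enough to be followed by eye through a microscope.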
It was harder to calculate η on the basis of a theory of molecules than to
measure it. For some time the best that could be done theoretically was a rough
estimate of this viscosity.
To make this estimate, consider a uniform fluid experiencing a shear flow.
For instance, suppose v has only one component, v1 , which depends only on x3 .
(The fluid could be enclosed between two flat plates, each in the 1−2 plane,
with their separation in the 3-direction, and with one of the plates moving in the
11 Often η is called the shear viscosity. The reason is that, if we were to insist on using whatever formula for the pressure p holds in the absence of fluid gradients, then p would not be precisely given by Eq. (2.5.8), and the tensor $\Delta\tau_{ji}$ defined by Eq. (2.5.9) would not be precisely traceless, so it would have a term proportional to $\delta_{ij}(\nabla\cdot\mathbf{v})$, with a coefficient known as the bulk viscosity. For complicated reasons the bulk viscosity is generally much less than the shear viscosity (for instance, see S. Weinberg, Astrophys. J. 168, 175 (1971)) and in any case would have no effect in our present calculation.
12 For the details of this argument, see Sections 16 and 49 of Landau and Lifshitz, Fluid Mechanics, listed
in the bibliography.
1-direction and the other at rest.) In this case, Eqs. (2.5.5), (2.5.9), and (2.5.15) give the 3-component of the current of the 1-component of momentum:
$$T_{31} = \Delta\tau_{31} = -\eta\,\frac{\partial v_1}{\partial x_3}. \tag{2.5.18}$$
To find η, let us use molecular theory to calculate the rate per unit area at which the 1-component of momentum crosses a plane normal to the 3-axis, which we will take as the plane $x_3 = 0$. This current arises because, in addition to being carried along in the 1-direction by the bulk velocity v, each molecule has a fluctuating “peculiar velocity” $\Delta\mathbf{v}$. We make the far-reaching approximation that, because of rapid collisions, all directions of this peculiar velocity are equally likely. Then the number per unit volume whose peculiar velocity vector $\Delta\mathbf{v}$ makes an angle with the +3-axis between θ and θ + dθ is the ratio of the solid angle 2π sin θ dθ to 4π, times the total number density n, or n sin θ dθ/2. As we saw in our calculation of gas pressure in Section 1.1, the number of these molecules striking an area dA in this plane in a time dt is the number in a cylinder with base dA and height $\Delta v_\perp\,dt = \cos\theta\,|\Delta\mathbf{v}|\,dt$, where $\Delta v_\perp$ is the component of the peculiar velocity normal to the plane $x_3 = 0$ and $|\Delta\mathbf{v}|$ is the magnitude of the peculiar velocity. This number of molecules is equal to
$$dA \times \cos\theta\,|\Delta\mathbf{v}|\,dt \times n\sin\theta\,d\theta/2.$$
Since v1 is assumed to be a function v1 (x3 ) only of x3 , a molecule that reaches
the plane x3 = 0 having traveled a distance r will have a 1-component of
momentum mv1 (−r cos θ), where m is the mass of the molecule. (In addition to
the momentum carried by this bulk velocity, the peculiar momenta of molecules
will also have 1-components, but under the assumption that all directions of
peculiar velocity are equally likely, these 1-components cancel when we
integrate over the azimuthal angle around the 3-direction.) A minus sign
appears in the argument of $v_1$ because a molecule with a positive (or negative) 3-component $\Delta v_\perp$ of peculiar velocity, for which cos θ > 0 (or cos θ < 0), arrives at the plane $x_3 = 0$ from negative (or positive) values of $x_3$. The rate per unit area and per unit time at which the 1-component of momentum flows through the plane $x_3 = 0$ is then
$$T_{31} = \int_0^\pi \frac{n\,\overline{|\Delta\mathbf{v}|}}{2}\,\cos\theta\,\sin\theta\,d\theta \int_0^\infty m\,v_1(-r\cos\theta)\,P(r)\,dr,$$
where P (r)dr is the probability that a molecule that reaches the plane x3 = 0
has traveled a distance between r and r + dr since its last collision with another
molecule, and the bar again denotes an average over molecules. As long as the
mean distance between collisions is small compared with the scale of distances
over which the fluid properties vary, all directions are equivalent, and P (r)dr
is also the probability that from a random starting position a molecule will
travel a distance between r and r + dr before its first collision. (Note that this
formula applies for molecules with negative as well as positive values of cos θ ,
because molecules with negative values of cos θ have a negative 3-component
of peculiar velocity and therefore cross the plane x3 = 0 traveling from positive
to negative values of x3 , and thus contribute a negative amount to the flow of
the 1-component of momentum through this plane.)
We again make the crucial assumption, which led to the Navier–Stokes equa-
tion, that the typical distances traveled by molecules are much smaller than the
scale of distances over which the bulk properties of the fluid vary. Here this
implies that v1 (−r cos θ) changes little over the range of r for which P (r) is
not negligible. This allows us to use a Taylor expansion
$$v_1(-r\cos\theta) = v_1(0) - r\cos\theta\left.\frac{\partial v_1}{\partial x_3}\right|_{x_3=0} + \cdots.$$
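The first term makes no contribution to the current, because the integral over θ of cos θ sin θ vanishes. This leaves us with the contribution of the next term,
$$T_{31} = -\frac{n m\,\overline{|\Delta\mathbf{v}|}\,\ell}{2}\left.\frac{\partial v_1}{\partial x_3}\right|_{x_3=0}\int_0^\pi \cos^2\theta\,\sin\theta\,d\theta = -\frac{n m\,\overline{|\Delta\mathbf{v}|}\,\ell}{3}\left.\frac{\partial v_1}{\partial x_3}\right|_{x_3=0},$$
where $\ell$ is the mean free path13
$$\ell \equiv \int_0^\infty r\,P(r)\,dr.$$
Comparing this with our formula (2.5.18) for $T_{31}$, taken from Eq. (2.5.15), we find for η the positive value
$$\eta = \frac{1}{3}\,m\,n\,\ell\,\overline{|\Delta\mathbf{v}|}. \tag{2.5.19}$$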
13 The notion of a mean free path was introduced by Rudolf Clausius in “On the Mean Lengths of the Paths
Described by the Separate Molecules of Gaseous Bodies,” Ann. Phys. 105, 239 (1858).
times the probability $p(r)$ that it had not collided before it had traveled the distance $r$. To calculate $p(r)$, we note that $p(r + dr)$ equals $p(r)$ times the probability $1 - n\sigma\,dr$ that the molecule will not collide before it travels to $r + dr$, so $p'(r) = -p(r)\,n\sigma$ and, since $p(0) = 1$, the probability of traveling a distance $r$ without colliding is $p(r) = \exp(-n\sigma r)$. The probability of a collision in a distance from $r$ to $r + dr$ is then
$$P(r)\,dr = n\sigma\,dr \times p(r) = n\sigma\,dr\,\exp(-n\sigma r).$$
The average distance traveled between collisions is then
$$\ell \equiv \int_0^\infty r\,P(r)\,dr = \int_0^\infty r\,n\sigma\,dr\,\exp(-n\sigma r) = \frac{1}{n\sigma}. \tag{2.5.20}$$
This formula for $\ell$ is often used for media more complicated than a gas of hard balls, by taking σ as some sort of effective cross section.
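The result (2.5.20) can be checked by drawing free paths at random from the exponential distribution and averaging them. In this sketch the number density and cross section are assumed values of a plausible size for a gas at ordinary conditions, and the sampling uses the standard inverse-transform method; none of this is taken from the text.

```python
import math
import random

def sample_free_path(n, sigma, rng):
    """Draw r from P(r) = n*sigma*exp(-n*sigma*r) by inverting the survival probability."""
    u = rng.random()                       # uniform in [0, 1)
    return -math.log(1.0 - u) / (n * sigma)

# Assumed illustrative values, not taken from the text:
n = 2.5e25        # molecules per m^3
sigma = 4.0e-19   # effective cross section in m^2
rng = random.Random(0)
paths = [sample_free_path(n, sigma, rng) for _ in range(200_000)]
mean_path = sum(paths) / len(paths)
expected = 1.0 / (n * sigma)
print(f"sampled mean = {mean_path:.3e} m, 1/(n sigma) = {expected:.3e} m")
```

The sampled mean should agree with 1/(nσ) to within the statistical scatter of a fraction of a percent.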
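Using the result (2.5.20) in Eq. (2.5.19) gives an estimate of the viscosity:
$$\eta \simeq \frac{m\,\overline{|\Delta\mathbf{v}|}}{3\sigma}.$$
The Maxwell–Boltzmann distribution (2.4.1) gives the mean value of $|\Delta\mathbf{v}|$ as
$$\overline{|\Delta\mathbf{v}|} = \sqrt{\frac{kT}{2\pi m}} = \sqrt{\frac{RT}{2\pi\mu}},$$
where $R = k/m_1$ is the gas constant and $\mu = m/m_1$ is the molecular weight. The viscosity is therefore
$$\eta \simeq \frac{m}{3\sigma}\sqrt{\frac{RT}{2\pi\mu}}. \tag{2.5.21}$$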
Quantitatively this result gives only the correct order of magnitude of η, but
it has an important qualitative consequence, that the viscosity is independent of
the gas density. This result was first found by Maxwell.14 In a letter to Stokes,15
he commented that “This is certainly very unexpected, that the friction should
be as great in a rare gas as in a dense gas. The reason is that for the rare gas the
mean path is greater, so that the frictional action extends to greater distance.”
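Maxwell's remark can be made concrete: in the estimate of η the density enters once directly and once inversely through the mean free path, so it cancels. The sketch below evaluates the estimate at three densities; the molecular mass, cross section, and mean speed are assumed placeholder values, not data from the text.

```python
def viscosity_estimate(n, m, sigma, mean_speed):
    """Estimate eta = (1/3) m n l <|dv|>, with mean free path l = 1/(n sigma)."""
    mean_free_path = 1.0 / (n * sigma)
    return m * n * mean_free_path * mean_speed / 3.0

# Assumed illustrative values, not taken from the text:
m = 4.7e-26          # kg, molecular mass
sigma = 4.0e-19      # m^2, effective cross section
mean_speed = 100.0   # m/s, stand-in for the mean peculiar speed
densities = (1e23, 1e25, 1e27)
etas = [viscosity_estimate(n, m, sigma, mean_speed) for n in densities]
for n, eta in zip(densities, etas):
    print(f"n = {n:.0e} per m^3: eta = {eta:.4e} Pa s")
```

A ten-thousandfold change in density leaves the printed viscosity unchanged apart from rounding.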
One reason for finding this result surprising is that it raises the question
of whether a gas that is so rare that it is practically a vacuum can have any
viscosity. It was this point that had led Aristotle in his book Physics to argue
that a vacuum is impossible. He concluded from his experience with motion
under the influence of friction that the velocity imparted to a body by a given
force is inversely proportional to the resistance, thus anticipating Stokes’ law
14 J. C. Maxwell, “Illustrations of the Dynamical Theory of Gases,” Phil. Mag. 19, 19; 20, 21 (1860).
15 Quoted on p. 27 of Brush, The Kinetic Theory of Gases, listed in the bibliography.
Diffusion
The general formulation above of the transport of momentum in a gas can be
extended to the transport of other physical quantities in general fluids. One such
quantity is the number density ν of particles suspended in a fluid. These can be
large molecules, such as molecules of sugar dissolved in water, or the tiny bits
of organic matter expelled from pollen grains noticed in 1827 by the botanist
Robert Brown (1773–1858), or artificial little balls used in studies of diffusion
to be discussed in the next section. The conservation of these particles requires
that their number density ν(x, t) satisfies an equation of the general form (2.5.2):
$$\frac{\partial}{\partial t}\nu + \nabla\cdot\left(\nu\mathbf{v} + \mathbf{j}\right) = 0, \tag{2.5.22}$$
where v is the fluid bulk velocity. As in Eq. (2.5.5) we again separate the
convective term νv in the current from the diffusion term j. Since by itself
(∂/∂t)ν + ∇ · (νv) is Galilean-invariant, the diffusion term j must be a Galilean-
invariant vector, and if the scale over which the density ν varies is much larger
than relevant mean free paths then it is dominated by a term with a single
gradient, which can only be of the form
j = −D ∇ν , (2.5.23)
where D is a coefficient known as the diffusion constant.
For instance, if the fluid is at rest and D is independent of time and position
then Eq. (2.5.22) takes the form
$$\frac{\partial}{\partial t}\nu = D\nabla^2\nu. \tag{2.5.24}$$
Here is one solution:
$$\nu(\mathbf{x},t) = \frac{N}{(4\pi Dt)^{3/2}}\exp\left(-\frac{\mathbf{x}^2}{4Dt}\right), \tag{2.5.25}$$
where N is a constant equal to the number $\int \nu\,d^3x$ of particles suspended in the fluid. (This is one way of seeing that the coefficient D defined by Eq. (2.5.23) must be positive.) This distribution is spherically symmetric and localized within a radius of order $\sqrt{4Dt}$, which spreads with time owing to the diffusion of the suspended particles through the fluid.
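That the Gaussian (2.5.25) really solves the diffusion equation (2.5.24) can be confirmed by finite differences at a sample point. This is a sketch only: the evaluation point, step size, and value of D are arbitrary choices, and the Laplacian is written in its radial form for a spherically symmetric function.

```python
import math

def nu(r, t, D, N=1.0):
    """The spreading Gaussian solution, as a function of radius r and time t."""
    return N / (4.0 * math.pi * D * t) ** 1.5 * math.exp(-r * r / (4.0 * D * t))

def residual(r, t, D, h=1e-4):
    """Relative mismatch between d(nu)/dt and D * Laplacian(nu) by central differences."""
    dnu_dt = (nu(r, t + h, D) - nu(r, t - h, D)) / (2.0 * h)
    d1 = (nu(r + h, t, D) - nu(r - h, t, D)) / (2.0 * h)
    d2 = (nu(r + h, t, D) - 2.0 * nu(r, t, D) + nu(r - h, t, D)) / (h * h)
    laplacian = d2 + 2.0 * d1 / r          # radial Laplacian of a spherically symmetric function
    return abs(dnu_dt - D * laplacian) / abs(dnu_dt)

mismatch = residual(r=1.0, t=1.0, D=0.5)
print(f"relative mismatch = {mismatch:.2e}")
```

The mismatch is limited only by the finite-difference error, which is tiny for this step size.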
A vivid description of how diffusion arises from the microscopic motion of suspended particles was given in 1905 by Albert Einstein16 (1879–1955). Consider a time interval τ that is short compared with the times over which the distribution function changes appreciably but long enough that typical suspended particles collide many times with the molecules of the fluid. In this time the position of each suspended particle jumps by some random vector amount $\boldsymbol{\Delta}$. These amounts differ from one suspended particle to another, in a way that is governed by some sort of statistical distribution. Then for vanishing bulk velocity v, the number density ν(x, t) is changed in this time interval to
$$\nu(\mathbf{x}, t+\tau) = \overline{\nu(\mathbf{x}+\boldsymbol{\Delta}, t)},$$
the bar indicating an average over the suspended particles. Assuming that ν(x, t) is slowly varying over times of order τ and distances of order $|\boldsymbol{\Delta}|$, we can expand both sides as Taylor series in τ and $\boldsymbol{\Delta}$:
$$\tau\frac{\partial}{\partial t}\nu(\mathbf{x},t) + \cdots = \sum_i \overline{\Delta_i}\,\frac{\partial\nu(\mathbf{x},t)}{\partial x_i} + \frac{1}{2}\sum_{ij}\overline{\Delta_i\Delta_j}\,\frac{\partial^2\nu(\mathbf{x},t)}{\partial x_i\,\partial x_j} + \cdots.$$
Under the assumption that all directions of $\boldsymbol{\Delta}$ are equally likely, we have
$$\overline{\Delta_i} = 0, \qquad \overline{\Delta_i\Delta_j} = \frac{\delta_{ij}}{3}\,\overline{|\boldsymbol{\Delta}|^2};$$
so, to leading order,
$$\tau\frac{\partial}{\partial t}\nu(\mathbf{x},t) = \frac{1}{6}\,\overline{|\boldsymbol{\Delta}|^2}\,\nabla^2\nu(\mathbf{x},t).$$
Comparing this with the diffusion equation (2.5.24) for zero bulk velocity, we see that the mean square displacement increases as
$$\overline{|\boldsymbol{\Delta}|^2} = 6\tau D. \tag{2.5.26}$$
This is in accord with the particular solution (2.5.25) of the diffusion equation; calculating the integral $\overline{\mathbf{x}^2} = N^{-1}\int \nu(\mathbf{x},t)\,\mathbf{x}^2\,d^3x$ gives $\overline{\mathbf{x}^2} = 6Dt$.
The diffusion constant can be measured by observing this spreading out of
the suspended particles with time. The calculation by Einstein of the diffusion
constant D in terms of fundamental constants and its use to measure these
constants are discussed in the next section.
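The linear growth of the mean square displacement can be illustrated with a simulated random walk. The sketch below assumes Gaussian jumps with independent components, which is one convenient way of making all directions equally likely; the jump size, time step, and particle count are arbitrary.

```python
import random

def mean_square_displacement(n_particles, n_steps, step_sigma, rng):
    """Average |x|^2 after n_steps isotropic random jumps per particle."""
    total = 0.0
    for _ in range(n_particles):
        x = y = z = 0.0
        for _ in range(n_steps):
            # Gaussian components of equal width make every direction equally likely.
            x += rng.gauss(0.0, step_sigma)
            y += rng.gauss(0.0, step_sigma)
            z += rng.gauss(0.0, step_sigma)
        total += x * x + y * y + z * z
    return total / n_particles

tau = 1.0                      # duration of one jump (arbitrary units)
step_sigma = 0.1               # assumed rms jump per Cartesian component
D = 3.0 * step_sigma ** 2 / (6.0 * tau)    # from mean |Delta|^2 = 6 tau D
n_steps = 100
rng = random.Random(1)
msd = mean_square_displacement(2000, n_steps, step_sigma, rng)
predicted = 6.0 * D * n_steps * tau
print(f"simulated = {msd:.3f}, 6 D t = {predicted:.3f}")
```

The simulated value should scatter around 6Dt by a few percent for this number of particles.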
We have seen various ways in which observations in the nineteenth century pro-
vided physicists with the values of only the ratios of quantities that characterize
the scale of individual atoms or molecules. The study of gases allowed measurement of the gas constant $R \equiv k/m_1$ (where k is Boltzmann’s constant and $m_1$ is the mass an atom would have if it had atomic weight unity, related to Avogadro’s number by $N_A = 1/m_1$); the study of electrolysis allowed measurement of the faraday, $F \equiv e/m_1$ (where e is the minimum electric charge that is transferred in electrolysis); and, under the assumption that the charge of the electron is the same as the unit e of electric charge transferred in electrolysis, the study of the bending of cathode rays allowed measurement of $e/m_e$. Furthermore,
under the assumption that molecules are tightly packed in liquids and solids,
knowledge of the mass density of a liquid or solid gave an approximate value
for the ratio of the mass to the volume of individual molecules. A measurement
of m1 or k or e or me or the size of any molecule would yield results for all these
quantities. No accurate measurements of any of these individual quantities were
possible before the twentieth century, which is not to say that nineteenth century
chemists and physicists did not try.
Electronic Charge
One of the problems with the water droplets studied by Townsend et al. in their
estimates of the charge of the electron was that the masses of the droplets did
not remain fixed during the experiment, because water evaporates. To avoid
this, Robert Andrews Millikan (1868–1953) in 1906 studied individual oil
17 This is the way in which these experimental results were quoted by physicists at the time and have
generally been described by historians since then, but it is misleading. The formulas used to analyze
these experiments actually involved RNA , where R is the gas constant appearing in the ideal gas law
(1.2.3). Since R had already been measured, the measured value of RNA could be used to find NA . But
since R = k/NA , they were really measuring k, not NA . I suppose that the results were cited in terms of
NA rather than k because Avogadro’s number was much more familiar to physicists of the time than the
Boltzmann constant of statistical mechanics.
2.6 The Atomic Scale 55
drops that had picked up electric charge from air ionized by X-rays. Unlike the
water droplets in the experiments of Townsend et al., these oil drops were
large enough that Millikan could study the motion of individual drops. As in
the earlier experiments, Millikan could measure the mass m and radius a of
individual drops from their terminal velocity in the absence of any external
electric fields, using the known density of oil and viscosity of air. Then, when
he turned on a strong vertical electric field E, a drop carrying electric charge q
would feel an electric force qE in addition to the gravitational force mg, so the
terminal velocity would be altered by an amount qE/6πηa. Measuring changes
in the terminal velocity, and knowing m and a, it was then possible to calculate
the changes in the drops’ electric charges. For instance, in one run the changes
in the electric charge q (in units of $10^{-19}$ coulombs) were
9.91, −11.61, 1.66, 5.00, 1.68, −8.31, 6.67, 5.02, etc.,
all close to integer multiples of $1.66 \times 10^{-19}$ coulombs. After repeated runs, Millikan concluded that the fundamental unit of electric charge is $e = (1.592 \pm 0.003) \times 10^{-19}$ coulombs. (The modern value is $e = 1.6021765 \times 10^{-19}$ coulombs.) This immediately allowed the calculation of $m_1$ (from the faraday $e/m_1$), and then k (from the ideal gas constant $k/m_1$), and so on. Even more
importantly, the observation that droplet charges come close to integer multiples
of a unit charge gave direct evidence for the discreteness of electric charge.
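The near-integer pattern in the quoted run can be made explicit by dividing each change in charge by the rough unit and rounding. This is only an illustration using the eight values listed above; the simple averaging used to refine the unit is an assumed procedure, not Millikan's actual analysis.

```python
# Charge changes quoted in the text, in units of 1e-19 coulombs.
changes = [9.91, -11.61, 1.66, 5.00, 1.68, -8.31, 6.67, 5.02]
trial_unit = 1.66                          # the rough unit quoted in the text

multiples = [round(q / trial_unit) for q in changes]
# Refine the unit: total |charge| divided by total number of units.
unit_estimate = sum(abs(q) for q in changes) / sum(abs(k) for k in multiples)
residuals = [q - k * unit_estimate for q, k in zip(changes, multiples)]

print("multiples:", multiples)
print(f"refined unit = {unit_estimate:.3f} x 1e-19 coulombs")
print("largest residual:", max(abs(r) for r in residuals))
```

Every change is within a few percent of a whole number of units, which is the discreteness the text describes.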
Brownian Motion
The diffusion of particles suspended in a fluid depends on the size and shape
of the particles, as well as on the fluid properties and fundamental constants.
Where the particles are molecules, such as sugar molecules dissolved in water,
it is not possible to deduce relevant information about their size and shape
with any precision from the properties of solids or liquids composed of these
molecules. In the first decade of the twentieth century Einstein had the idea
of learning about fundamental constants from observation of the diffusion of
artificial particles, like little spherical balls, whose shape, size, and mass were
accurately known. (This diffusion is a special case of what is termed “Brownian
motion,” after the botanist Robert Brown mentioned in the previous section.)
Einstein took notice of the common observation that it is possible to have
a time-independent inhomogeneous equilibrium distribution of particles such
as little balls suspended in a fluid, in which the effect of diffusion is can-
celled by a steady external force F acting on each ball. For example, this force
could be the combined force of gravity and buoyancy, so that it has magnitude $F = g(m - m_{\rm disp})$ (where g is the gravitational acceleration, m is the ball’s mass, and $m_{\rm disp}$ is the mass of the fluid displaced by the ball), and it acts in the −z direction, where z is altitude. In equilibrium this is balanced by a
kind of pressure, known as the osmotic pressure. With the balls in thermal
18 A. Einstein, “On the Motion of Small Particles Suspended in Liquids at Rest Required by the Molecular
Theory of Heat,” Ann. Phys. 17, 549 (1905). Because F has dropped out of the final formula for D,
Einstein’s result is independent of the nature of the force acting on the suspended particles, though for
simplicity we have assumed that this force is independent of position.
Within a few years after Einstein’s 1905 paper, an experimental study of the
diffusion of small bodies was carried out in Paris by Jean Perrin (1870–1942).
Perrin measured k (or as he said, Avogadro’s number) by observing the decrease
with altitude of the density of little balls suspended in a vertical column of fluid.
Equation (2.6.1) has the elementary solution
ν(z) ∝ exp(−F z/kT ) . (2.6.4)
(Perrin gave this solution in the form $\nu(z) \propto \exp[-N_A F z/RT]$.) Using the known value of the combined gravitational and buoyancy force F on the little balls gave a value for $N_A/R = 1/k$, which by using the known value of the gas constant R, Perrin reported19 as a value for Avogadro’s number $N_A = R/k = 7.05 \times 10^{23}$/mole, corresponding to $m_1 = 1.42 \times 10^{-24}$ g. As was usual at the time, no figure was given for the uncertainty of the measurement.
Perrin also used microscopic measurements over several minutes of the root
mean square diffusion of suspended balls in the horizontal direction, in which
no force is acting. He found that as expected the mean square displacement
is proportional to the elapsed time. Using Eq. (2.5.26) gave a value for the diffusion constant D and, using Einstein’s formula (2.6.3) (which Perrin like Einstein wrote as $D = RT/6\pi\eta a N_A$) he found20 that $N_A = 7.15 \times 10^{23}$/mole, corresponding to $m_1 = 1.40 \times 10^{-24}$ g. The fair agreement of this result,
which was obtained by direct observation of diffusing particles, with Perrin’s
earlier measurement based on equilibrium in a vertical column gave support
to the view that diffusion is due to the motion of the balls in equilibrium with
randomly moving molecules. Perrin was not hesitant in concluding that his work
confirmed the reality of molecules – his results were summarized in a long
article²¹ titled “Brownian Movement and Molecular Reality.” His measurements were not far off: with modern definitions of molecular weight, the value
of Avogadro’s number is $6.022142 \times 10^{23}$/mole.
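As a quick arithmetic check on the figures quoted above, the mass $m_1$ of an atom of unit atomic weight follows from $m_1 = (1\ \text{g/mole})/N_A$. The short Python sketch below is only an illustration; the function name is a hypothetical choice, not something taken from the text.

```python
def unit_atomic_mass_g(avogadro_per_mole: float) -> float:
    """Return m1 = (1 g/mole) / N_A, in grams (hypothetical helper)."""
    return 1.0 / avogadro_per_mole

# Values of N_A quoted in the text
m1_vertical = unit_atomic_mass_g(7.05e23)    # Perrin, vertical column
m1_horizontal = unit_atomic_mass_g(7.15e23)  # Perrin, horizontal diffusion
m1_modern = unit_atomic_mass_g(6.022142e23)  # modern value of N_A

for label, m1 in [("vertical", m1_vertical),
                  ("horizontal", m1_horizontal),
                  ("modern", m1_modern)]:
    print(f"{label:>10}: m1 = {m1:.3e} g")
```

Both of Perrin's values of $m_1$ come out roughly 15 percent below the modern figure, which is the sense in which his measurements were "not far off."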
19 J. Perrin, Comptes rendus cxlvi, 167 (1908) and cxlvii, 530 (1908).
20 J. Perrin, Comptes rendus cxlvii, 1044 (1908).
21 See Perrin, Brownian Movement and Molecular Reality, listed in the bibliography.
22 M. Planck, Verh. d. deutsche phys. Ges. 2, 202, 237 (1900).
Consistency
The atomic theory underlying these measurements of microscopic parameters
and the values found gained much credit from the consistency of the results
obtained. For instance, in 1901 Planck used his measurement of k together with
the known value $R = 8.27 \times 10^7$ erg/mole K of the gas constant to calculate a
value for Avogadro’s number $N_A = R/k = 6.17 \times 10^{23}$/mole, in fair agreement with Perrin’s later result $N_A \simeq 7 \times 10^{23}$/mole. Planck also used this
result together with the known value of the faraday, $F = eN_A = 9.63 \times 10^4$
coulombs/mole, to calculate the unit of charge, $e = 1.56 \times 10^{-19}$ coulombs, in
very good agreement with Millikan’s result.
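The chain of arithmetic in this paragraph is easy to reproduce. The sketch below, offered only as an illustration, starts from the quoted values of R, $N_A$, and the faraday and recovers the unit of charge; the value of k it prints is implied by $k = R/N_A$ and is not a number quoted in the text. Variable names are illustrative assumptions.

```python
# Values quoted in the text
R_ERG_PER_MOLE_K = 8.27e7   # gas constant as known at the time, erg/(mole K)
N_A_PLANCK = 6.17e23        # Planck's Avogadro number, per mole
FARADAY_C_PER_MOLE = 9.63e4 # faraday, coulombs/mole

# Boltzmann's constant implied by N_A = R/k (derived here, not quoted)
k_implied = R_ERG_PER_MOLE_K / N_A_PLANCK

# Unit of charge from F = e * N_A
e_coulombs = FARADAY_C_PER_MOLE / N_A_PLANCK

print(f"k implied by R/N_A: {k_implied:.3e} erg/K")
print(f"e = F/N_A:          {e_coulombs:.3e} C")
```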
This happy agreement of fundamental constants led to a widespread accep-
tance of the atomic theory of matter. For instance, the chemist F. W. Ostwald
(1853–1932) had been a determined opponent of the atomic theory, but in 1908
he finally admitted that “I am now convinced that we have recently become
possessed of experimental evidence of the discrete or grained nature of matter,
which the atomic hypothesis sought in vain for hundreds and thousands of
years.”
An adverse voice remained. The physicist–philosopher Ernst Mach (1838–
1916), who spoke of “the artificial hypothetical atoms of chemistry and
physics,” never accepted their existence. As late as 1916, shortly before his
death, he declared that “I can accept the theory of relativity as little as I can
accept the existence of atoms and other such dogmas.” This goes to show that
a scientist can maintain his own principles, bravely holding out against a wide
consensus of the scientific establishment, and still be wrong.
Einstein’s derivation of Eq. (2.6.3) for the diffusion constant D relied on the
introduction of an external force F acting on suspended particles, which
prevents their diffusion from disturbing a time-independent equilibrium particle
distribution. The presence of such an external force such as gravity is not
uncommon, but it ought to be possible to obtain the same result where there is
no external force, and where diffusion is actually taking place. Below is such
a derivation, which indicates the presence of a correction for particles whose
mass is not negligible.
The mean velocity v(x, t) of diffusing suspended particles at position x and
time t is given by setting the current (2.5.23) equal to νv:
$$\mathbf{v}(\mathbf{x}, t) = -\frac{D\,\nabla\nu(\mathbf{x}, t)}{\nu(\mathbf{x}, t)} . \tag{2.6.5}$$
According to Stokes’ law, spherical balls of radius a with this mean velocity experience a mean viscous drag force:
$$\mathbf{F}_{\rm vis} = -6\pi\eta a\,\mathbf{v} = \frac{6\pi\eta a D\,\nabla\nu}{\nu} , \tag{2.6.6}$$
with the signs indicating that the viscous force is in a direction opposite to that
of v, and hence in the direction of the gradient of the particle number density.
Diffusion occurs because this drag is overcome by osmotic pressure. Following
the same reasoning that led to Eq. (2.6.1), if the gradient of ν is along the
x-direction, then the force due to an environment at uniform temperature T
on the particles in a small disk of area dA and thickness dx transverse to
the x-direction is the osmotic pressure force dAkT [ν(x, t) − ν(x + dx, t)] =
−dA kT dx dν(x, t)/dx on the disk. Dividing this by the number dA dx ν(x, t)
of suspended particles in the disk gives the osmotic pressure force on each
particle:
$$F_{\rm osm} = \frac{-dA\,kT\,dx\,d\nu(x, t)/dx}{dA\,dx\,\nu(x, t)} = -\frac{kT\,d\nu(x, t)/dx}{\nu(x, t)} .$$
Since in the absence of external forces there is nothing special about the x-
direction, for a gradient in a general direction we have
$$\mathbf{F}_{\rm osm} = -\frac{kT\,\nabla\nu}{\nu} . \tag{2.6.7}$$
Assuming that the viscous drag is cancelled by the osmotic pressure, we have
0 = Fvis + Fosm , (2.6.8)
which gives Einstein’s formula (2.6.3):
$$D = \frac{kT}{6\pi\eta a} . \tag{2.6.9}$$
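To get a feeling for the size of D, the formula can be evaluated numerically. The following is a minimal sketch in cgs units; the temperature, viscosity, and ball radius are illustrative assumptions (roughly water at room temperature and a ball of radius one micron), not the parameters of any experiment described in the text.

```python
import math

K_BOLTZMANN = 1.380649e-16  # erg/K (modern value)

def einstein_diffusion_constant(kT: float, eta: float, a: float) -> float:
    """D = kT / (6 pi eta a), all quantities in cgs units."""
    return kT / (6.0 * math.pi * eta * a)

# Illustrative assumptions, not taken from the text:
T = 293.0      # temperature in K
eta = 0.01     # viscosity in poise, roughly that of water
a = 1.0e-4     # ball radius in cm (one micron)

D = einstein_diffusion_constant(K_BOLTZMANN * T, eta, a)
print(f"D = {D:.3e} cm^2/s")
```

Because D is inversely proportional to both the viscosity and the radius, doubling either one halves the diffusion constant.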
More generally, we should take into account the possibility that the viscous
drag is not precisely cancelled by the osmotic pressure. In this case, Newton’s
law gives
$$m\frac{d\mathbf{v}}{dt} = \mathbf{F}_{\rm vis} + \mathbf{F}_{\rm osm} , \tag{2.6.10}$$
where m is the mass of the balls and the acceleration dv/dt is the total time
derivative of the mean velocity, due both to the change in mean velocity at a
fixed position and to the change in mean velocity of the particles carried from
one point to another at the mean velocity:
$$\frac{d\mathbf{v}}{dt} = \frac{\partial\mathbf{v}}{\partial t} + \mathbf{v}\cdot\nabla\mathbf{v} . \tag{2.6.11}$$
Inspection of Eqs. (2.6.5), (2.6.11), (2.5.22), and (2.5.23) shows that the magni-
tude of the acceleration can depend only on D and on L, the scale of distances
over which ν varies appreciably. Dimensional analysis then tells us that it must
be of order
$$\frac{dv}{dt} \approx \frac{D^2}{L^3} . \tag{2.6.12}$$
This shifts the value of $|\mathbf{F}_{\rm vis}|$ for a given $\mathbf{F}_{\rm osm}$ by an amount of order $mD^2/L^3$,
and hence shifts the value of the diffusion constant derived from Eq. (2.6.6) by
a fractional amount of order
$$\frac{\Delta D}{D} \approx (mD/L^3) \times (L/6\pi\eta a) \approx \frac{mkT}{L^2 (6\pi\eta a)^2} . \tag{2.6.13}$$
Einstein’s formula for D is valid only if this is much less than one.
Einstein did not see this correction, because he was assuming that an
external force was preventing any mean motion, so that there were no inertial
forces. But the correction would affect the horizontal diffusion of suspended
balls in Perrin’s measurement of the diffusion constant. I do not know the
parameters in Perrin’s experiment, but the fact that he obtained close values for
Avogadro’s number from the measurement of horizontal diffusion and from the
measurement of the vertical distribution of suspended balls indicates that in his
experiment the correction (2.6.13) was not very large.
This is reassuring regarding the derivation of the Navier–Stokes equation
(2.5.16), in which it was assumed that terms of second order in the inverse
of the scale L over which properties of the fluid vary can be neglected. Using
$mv^2 \approx kT$, where v is a typical particle velocity, we see that the fractional
correction (2.6.13) is of order $(\ell/L)^2$, where $\ell \equiv kT/6\pi\eta a v$ is approximately
the distance in which viscous forces will bring a particle with radius a and
velocity v to rest. It is the ratios of just such microscopic lengths as $\ell$ to the
scale L of macroscopic variation whose second and higher powers are dropped
in the derivation of the Navier–Stokes equation.
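The equivalence of the two forms of the correction, $mkT/L^2(6\pi\eta a)^2$ and $(\ell/L)^2$, can be checked numerically. The sketch below is a hedged illustration in cgs units: the ball radius, density, viscosity, temperature, and scale L are all assumed values chosen for the example, and the function names are hypothetical.

```python
import math

def fractional_correction(m, kT, eta, a, L):
    """Order-of-magnitude correction m kT / (L^2 (6 pi eta a)^2), as in Eq. (2.6.13)."""
    return m * kT / (L**2 * (6.0 * math.pi * eta * a) ** 2)

def stopping_length(kT, eta, a, v):
    """Microscopic length kT / (6 pi eta a v)."""
    return kT / (6.0 * math.pi * eta * a * v)

# Illustrative assumptions, not taken from the text:
kT = 1.380649e-16 * 293.0            # erg, room temperature
eta = 0.01                            # poise, roughly water
a = 1.0e-4                            # cm, one-micron ball
m = (4.0 / 3.0) * math.pi * a**3      # g, assuming density 1 g/cm^3
L = 0.01                              # cm, assumed scale of variation of the density

v = math.sqrt(kT / m)                 # typical velocity from m v^2 ~ kT
ell = stopping_length(kT, eta, a, v)
correction = fractional_correction(m, kT, eta, a, L)

print(f"stopping length = {ell:.3e} cm")
print(f"correction      = {correction:.3e}")
print(f"(ell/L)^2       = {(ell / L) ** 2:.3e}")
```

For these assumed parameters the correction is many orders of magnitude below one, so Einstein's formula would be safely valid.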
3
Early Quantum Theory
The early years of quantum theory were a time of guesswork, inspired by prob-
lems presented by the properties of atoms and radiation and their interaction.
This is the subject of the present chapter. Later, in the 1920s, this struggle led to
the systematic theory known as quantum mechanics, the subject of Chapter 5.
per unit frequency interval at which radiation energy at a frequency near ν hits
the patch will be
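$$(\nu, T) = \frac{1}{t\,dA\,d\nu} \int_0^{ct} 2\pi r^2\,dr \int_0^{\pi/2} \sin\theta\,d\theta\,\frac{dA\cos\theta}{4\pi r^2}\,\mathcal{E}(\nu, T)\,d\nu = \frac{c}{4}\,\mathcal{E}(\nu, T) . \tag{3.1.1}$$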
Equilibrium requires that the rate per area of emission of radiation energy in
a frequency interval dν must equal the rate per area of absorption of radiation energy in that frequency interval, which is $(c/4)\,f(\nu, T)\,\mathcal{E}(\nu, T)\,d\nu$, where
$f(\nu, T) \le 1$ is the fraction of energy of radiation of frequency ν that is absorbed
when it hits the wall of the enclosure. The emission is evidently greatest for
“black” walls, which absorb all the radiation that falls on them, so that $f(\nu, T)$
takes its maximum value, $f(\nu, T) = 1$.
In the 1890s Eq. (3.1.1) was used at the Physikalisch-Technische Reich-
sanstalt in Berlin to accurately measure E (ν, T ). This presented a challenge to
theorists, to understand the measured distribution E (ν, T ).
where k and ω are real constants; e and b are complex constant three-
vectors; and c.c. denotes the complex conjugate of the preceding term. Since
in Eq. (3.1.3) we are including terms proportional to both exp(−iωt) and
exp(iωt), without loss of generality we can take ω > 0. Inserting (3.1.3) into
(3.1.2), we see that this is a solution for ρ = J = 0 if and only if
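$$\begin{aligned}
\mathbf{k} \times \mathbf{b} + \frac{\omega}{c}\,\mathbf{e} &= 0 , & \mathbf{k}\cdot\mathbf{e} &= 0 , \\
\mathbf{k} \times \mathbf{e} - \frac{\omega}{c}\,\mathbf{b} &= 0 , & \mathbf{k}\cdot\mathbf{b} &= 0 .
\end{aligned} \tag{3.1.4}$$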
Combining these, we have
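$$\frac{\omega^2}{c^2}\,\mathbf{e} = -\frac{\omega}{c}\,\mathbf{k} \times \mathbf{b} = -\mathbf{k} \times [\mathbf{k} \times \mathbf{e}] = \mathbf{k}^2\,\mathbf{e} ; \tag{3.1.5}$$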
so ω = |k|c, and electromagnetic radiation therefore propagates at the speed c.
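The conditions (3.1.4) can be verified for a concrete case. The sketch below picks an arbitrary wave vector and a real polarization vector orthogonal to it (both illustrative assumptions; in general e and b are complex), sets ω = |k|c and b = (c/ω) k × e, and checks that all four conditions hold. The helper functions are hypothetical and written out so that only the standard library is needed.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Dot product of two 3-vectors given as tuples."""
    return sum(x * y for x, y in zip(a, b))

C = 2.99792458e10          # speed of light, cm/s

# Illustrative assumptions: any k and any e orthogonal to k would do
k = (3.0, 4.0, 0.0)        # wave vector, cm^-1
e = (0.0, 0.0, 1.0)        # polarization vector, orthogonal to k

omega = C * math.sqrt(dot(k, k))                  # omega = |k| c
b = tuple((C / omega) * x for x in cross(k, e))   # from k x e - (omega/c) b = 0

# Residuals of the two vector conditions in Eq. (3.1.4)
residual_1 = tuple(x + (omega / C) * y for x, y in zip(cross(k, b), e))
residual_2 = tuple(x - (omega / C) * y for x, y in zip(cross(k, e), b))

print("k x b + (omega/c) e =", residual_1)
print("k x e - (omega/c) b =", residual_2)
print("k.e =", dot(k, e), " k.b =", dot(k, b))
```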
Now, we want to calculate the electromagnetic energy in a finite volume V .
Since $\mathcal{E}$ is universal, we can take our enclosure to be a cube, with edges
$L = V^{1/3}$ that lie along the 1-, 2-, and 3-directions. Whatever boundary conditions the material of the enclosure imposes on the phases of the waves, they must
be the same on opposite sides of the cube, so the phase $\mathbf{k}\cdot\mathbf{x}$ can only change by
an integer multiple of 2π when $x_1$, $x_2$, or $x_3$ is shifted by L. That is, the wave
number $\mathbf{k}$ and frequency ω must take the form
kn = (2π/L)n, ωn = c |kn | , (3.1.6)
where n is a vector with integer components n1 , n2 , and n3 . Hence the general
electric and magnetic fields in the enclosure are
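$$\mathbf{E}(\mathbf{x}, t) = \sum_{\mathbf{n}} \mathbf{e}(\mathbf{n}) \exp\left(i\mathbf{k}_n\cdot\mathbf{x} - i\omega_n t\right) + \text{c.c.} \tag{3.1.7}$$
$$\mathbf{B}(\mathbf{x}, t) = \sum_{\mathbf{n}} \frac{c}{\omega_n}\,[\mathbf{k}_n \times \mathbf{e}(\mathbf{n})] \exp\left(i\mathbf{k}_n\cdot\mathbf{x} - i\omega_n t\right) + \text{c.c.} , \tag{3.1.8}$$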
where e(n) is, for each n a three-vector orthogonal to n, and c.c. denotes the
complex conjugate of the previous term.
It is a well-known result of classical electrodynamics that the energy density
in radiation is (E2 + B2 )/8π. To integrate this over the volume of the enclosure,
we use the orthogonality relations
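$$\begin{aligned}
\int_V d^3x\; e^{i(\mathbf{k}_n - \mathbf{k}_m)\cdot\mathbf{x}} &= \begin{cases} V & \mathbf{n} = \mathbf{m} \\ 0 & \mathbf{n} \neq \mathbf{m} , \end{cases} \\
\int_V d^3x\; e^{i(\mathbf{k}_n + \mathbf{k}_m)\cdot\mathbf{x}} &= \begin{cases} V & \mathbf{n} = -\mathbf{m} \\ 0 & \mathbf{n} \neq -\mathbf{m} . \end{cases}
\end{aligned} \tag{3.1.9}$$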
(For instance, in one dimension for $n \neq m$,
$$\int_0^L dx\; e^{(2\pi i/L)(n-m)x} = \frac{L}{2\pi i (n-m)}\left[e^{2\pi i (n-m)} - 1\right] = 0 ,$$
while for n = m it is just L. In three dimensions, the integral is a product of
similar factors.) It follows then that
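$$\begin{aligned}
\frac{1}{V}\int_V d^3x\; \mathbf{E}^2(\mathbf{x}, t) ={}& \sum_{\mathbf{n}} \mathbf{e}(\mathbf{n})\cdot\mathbf{e}(-\mathbf{n})\,e^{-2i\omega_n t} + \sum_{\mathbf{n}} \mathbf{e}^*(\mathbf{n})\cdot\mathbf{e}^*(-\mathbf{n})\,e^{+2i\omega_n t} \\
&+ 2\sum_{\mathbf{n}} \mathbf{e}(\mathbf{n})\cdot\mathbf{e}^*(\mathbf{n}) , \\
\frac{1}{V}\int_V d^3x\; \mathbf{B}^2(\mathbf{x}, t) ={}& \sum_{\mathbf{n}} \left(\frac{c}{\omega_n}\right)^2 (\mathbf{k}_n \times \mathbf{e}(\mathbf{n}))\cdot(-\mathbf{k}_n \times \mathbf{e}(-\mathbf{n}))\,e^{-2i\omega_n t} \\
&+ \sum_{\mathbf{n}} \left(\frac{c}{\omega_n}\right)^2 (\mathbf{k}_n \times \mathbf{e}^*(\mathbf{n}))\cdot(-\mathbf{k}_n \times \mathbf{e}^*(-\mathbf{n}))\,e^{+2i\omega_n t} \\
&+ 2\sum_{\mathbf{n}} \left(\frac{c}{\omega_n}\right)^2 (\mathbf{k}_n \times \mathbf{e}(\mathbf{n}))\cdot(\mathbf{k}_n \times \mathbf{e}^*(\mathbf{n})) .
\end{aligned}$$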
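The one-dimensional orthogonality integral used above is simple to check numerically. The following is a minimal sketch using a midpoint-rule sum; the function name, the step count, and the choice L = 1 are illustrative assumptions.

```python
import cmath

def mode_overlap(n: int, m: int, L: float = 1.0, steps: int = 1000) -> complex:
    """Midpoint-rule estimate of the integral over 0..L of exp[(2 pi i / L)(n - m) x]."""
    dx = L / steps
    total = sum(cmath.exp(2j * cmath.pi * (n - m) * (j + 0.5) * dx / L)
                for j in range(steps))
    return total * dx

same = mode_overlap(3, 3)       # n = m: the integral is just L
different = mode_overlap(3, 1)  # n != m: the integral vanishes

print("n = m :", same)
print("n != m:", different)
```

In three dimensions the integral over the cube is a product of three such factors, so it vanishes unless all three components of n and m agree.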
$$\mathcal{E}(\nu, T) = \frac{8\pi\nu^2}{c^3}\,E(\nu, T) , \tag{3.1.12}$$
where E(ν, T ) is the mean energy for each of the two complex polarization
vectors orthogonal to a wave vector k with a given value of ν = |k|c.
1 Lord Rayleigh, Phil. Mag. 49, 539 (1900); Nature 72, 54 (1905).
2 J. Jeans, Phil. Mag. 10, 91 (1905).
thermodynamics. But this formula would not agree with a more detailed result
of classical thermodynamics, known as the Wien displacement law.³ Planck in
1900 guessed the formula⁴
$$\mathcal{E}(\nu, T)\,d\nu = \frac{8\pi h}{c^3}\,\frac{\nu^3\,d\nu}{\exp(h\nu/kT) - 1} , \tag{3.1.14}$$
where h and k again are constants.
A little later in the same year, Planck published an attempted derivation⁵
of Eq. (3.1.14), which indicated that k is Boltzmann’s constant, while h is a
new constant, known ever since as Planck’s constant. To derive this formula, he
adopted a model of the wall of the enclosure whereby it consists of electrically
charged harmonic oscillators with a wide range of frequencies, with the oscilla-
tors of frequency ν coming into equilibrium with the electromagnetic radiation
of frequency ν. Planck assumed that the energies of oscillators of frequency ν
can only take the form E = nhν, with n a positive integer. Planck calculated
the radiation emitted by these oscillators when they are in thermal equilibrium at
temperature T , and found that in order for them to absorb just as much radiation
as they emit, the radiation in the enclosure must have the energy density distri-
bution given by Eq. (3.1.14). We will not go into Planck’s derivation because
it was superseded a few years later with the modern derivation, due to Albert
Einstein, described in the next section.
3 This result was derived by Wilhelm Wien (1864–1926) in 1893. It requires that the energy density
distribution must take the form $\mathcal{E}(\nu, T) = \nu^3 F(\nu/T)$ where F is some function, of only the ratio ν/T,
that is not dictated by thermodynamics alone. For a proof, see Appendix XXXIII of Born, Atomic Physics,
listed in the bibliography. We will not be relying here on this result.
4 M. Planck, Verhand. deutsch. phys. Ges. 2, 202 (1900).
5 M. Planck, Verhand. deutsch. phys. Ges. 2, 237 (1900).
3.2 Photons
Confusingly, Einstein was not actually dealing with the Planck distribution,
but with an attempted fit to the data given earlier by Wilhelm Wien:
$$\mathcal{E}(\nu, T) \propto \nu^3 \exp(-\beta\nu/T)$$
where β is a constant. Einstein used thermodynamic arguments to show that
this distribution would require that the energy of radiation at frequency ν must
be a whole number multiple of βRν/NA . Physicists soon learned that E (ν, T )
is really given by the Planck distribution (3.1.14) and could interpret the Wien
distribution as the high-frequency limit of the Planck distribution, which for
large ν is proportional to $\nu^3 \exp(-h\nu/kT)$. Thus β in Einstein’s quantization
condition could be identified as β = h/k. With the gas constant R equal to
kNA , this means that the energy of the radiation at frequency ν must be a
whole number multiple of (h/k)(Rν/NA ) = hν, the same rule as for Planck’s
mythical oscillating charges.
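The statement that the Wien distribution is the high-frequency limit of the Planck distribution can be illustrated numerically. The sketch below is only an illustration: it normalizes the Wien form with the same prefactor $8\pi h/c^3$ and with β = h/k, uses modern cgs values of the constants, and picks an assumed temperature of 1000 K; the function names are hypothetical.

```python
import math

H = 6.62607015e-27    # Planck's constant, erg s
K = 1.380649e-16      # Boltzmann's constant, erg/K
C = 2.99792458e10     # speed of light, cm/s

def planck_energy_density(nu: float, T: float) -> float:
    """Planck distribution (3.1.14): energy per volume per unit frequency interval."""
    return (8.0 * math.pi * H / C**3) * nu**3 / math.expm1(H * nu / (K * T))

def wien_energy_density(nu: float, T: float) -> float:
    """Wien form with beta = h/k, normalized (by assumption) like the Planck formula."""
    return (8.0 * math.pi * H / C**3) * nu**3 * math.exp(-H * nu / (K * T))

T = 1000.0                     # assumed temperature in K
ratio_low = planck_energy_density(1e13, T) / wien_energy_density(1e13, T)
ratio_high = planck_energy_density(1e15, T) / wien_energy_density(1e15, T)

print(f"Planck/Wien at nu = 1e13 Hz: {ratio_low:.4f}")
print(f"Planck/Wien at nu = 1e15 Hz: {ratio_high:.12f}")
```

The ratio equals $1/[1 - \exp(-h\nu/kT)]$, so it approaches one when hν is much larger than kT and departs strongly from one when hν is comparable to or smaller than kT.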
where
$$X \equiv \sqrt{\frac{L^3}{2\pi}}\;{\rm Re}\,e(\mathbf{n}) , \qquad Y \equiv \sqrt{\frac{L^3}{2\pi}}\;{\rm Im}\,e(\mathbf{n}) .$$
(The factor $L^3/2\pi$ in dX dY is irrelevant, as it cancels between numerator and
denominator.) Defining θ and E by $X = \sqrt{E}\cos\theta$ and $Y = \sqrt{E}\sin\theta$ and
integrating over θ gives
$$E(\nu, T)_{\rm class} = \frac{\int_0^\infty 2\pi\sqrt{E}\,d\sqrt{E}\;E\exp(-E/kT)}{\int_0^\infty 2\pi\sqrt{E}\,d\sqrt{E}\;\exp(-E/kT)} = \frac{\int_0^\infty dE\;E\exp(-E/kT)}{\int_0^\infty dE\;\exp(-E/kT)} = kT . \tag{3.2.2}$$
This is the classical equipartition result used by Rayleigh and Jeans, leading to
the Rayleigh–Jeans energy density distribution (3.1.13).
According to Einstein’s conjecture, the energy E (not $X^2$ or $Y^2$) of each
polarization state can only take the values nhν, with n = 0, 1, 2, . . . , so the
integrals in Eq. (3.2.2) must be replaced with sums. That is, according to
Einstein,
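$$\begin{aligned}
E(\nu, T) &= \frac{\sum_{n=0}^\infty nh\nu \exp(-nh\nu/kT)}{\sum_{n=0}^\infty \exp(-nh\nu/kT)} = -\frac{d}{d(1/kT)} \ln \sum_{n=0}^\infty \exp(-nh\nu/kT) \\
&= -\frac{d}{d(1/kT)} \ln\left[1 - \exp(-h\nu/kT)\right]^{-1} = \frac{h\nu \exp(-h\nu/kT)}{1 - \exp(-h\nu/kT)} \\
&= \frac{h\nu}{\exp(h\nu/kT) - 1} .
\end{aligned} \tag{3.2.3}$$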
Using this in Eq. (3.2.1) gives an energy density distribution
$$\mathcal{E}(\nu, T) = \frac{8\pi h}{c^3}\,\frac{\nu^3}{\exp(h\nu/kT) - 1} . \tag{3.2.4}$$
This is the same as the Planck distribution (3.1.14), but derived from quite
different assumptions.
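The sum in Eq. (3.2.3) can be evaluated directly and compared with the closed form. The following is a minimal sketch in which energies are measured in units where kT = 1; the truncation point of the sum and the function names are illustrative assumptions.

```python
import math

def mean_energy_sum(h_nu: float, kT: float, n_max: int = 5000) -> float:
    """Mean energy from the truncated sums over n = 0, 1, ..., n_max in Eq. (3.2.3)."""
    weights = [math.exp(-n * h_nu / kT) for n in range(n_max + 1)]
    numerator = sum(n * h_nu * w for n, w in enumerate(weights))
    return numerator / sum(weights)

def mean_energy_closed(h_nu: float, kT: float) -> float:
    """Closed form h nu / (exp(h nu / kT) - 1)."""
    return h_nu / math.expm1(h_nu / kT)

kT = 1.0
for h_nu in (0.01, 1.0, 10.0):
    direct = mean_energy_sum(h_nu, kT)
    closed = mean_energy_closed(h_nu, kT)
    print(f"h nu / kT = {h_nu:5.2f}: sum = {direct:.6e}, closed form = {closed:.6e}")
```

For hν much smaller than kT the mean energy approaches the classical value kT of Eq. (3.2.2), while for hν much larger than kT it is exponentially suppressed.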
Einstein’s interpretation of his quantization assumption E = nhν was that
the energy in radiation of frequency ν comes in individual bundles, or “quanta,”
each with energy hν. A state of this radiation with energy nhν is simply one
Photoelectric Effect
Several physicists had observed in the late nineteenth century that electric
charge is expelled from metal surfaces when the surfaces are exposed to
ultraviolet light. After Thomson’s discovery of the electron in 1897, it was
generally assumed that this charge was carried by electrons. A metal is a
lattice of positively charged ions that have each lost one or more electrons,
which circulate freely through the metal, accounting for the good electrical and
thermal conductivity of metals. The positively charged metal ions produce an
electrostatic potential, so that in normal circumstances it takes a definite energy
(called the “work function”) φ to pull the negatively charged electrons out of
the metal. One might think that the more intense the radiation, the more energy
is given to these electrons. In 1902 experiments by Philipp Lenard (1862–1947)
showed that this is not the case. Instead, no matter how intense the radiation,
no electrons are ejected from the metal unless the frequency exceeds a certain
minimum (which is why photoelectricity was discovered using ultraviolet rather
than visible light), and when that condition is met, the energy of each expelled
electron increases with the frequency. Only the number of photoelectrons
depends on the intensity of the radiation, not their individual energies.
Einstein in his 1905 paper seized on these phenomena as evidence for his
quantization assumption. Any electron expelled from the metal was assumed to
have been struck by one of Einstein’s quanta. In order to get out of the metal,
the energy hν of the radiation quantum must at least equal the work function
φ, so no electrons can be emitted unless $\nu \ge \phi/h$. If this condition is satisfied,
then the kinetic energy $E_e$ of the emitted electron will be given by the excess
$$E_e = h\nu - \phi . \tag{3.2.5}$$
These energies could be measured by observing how strong an electric field is
needed to stop the electron emission by exerting a force toward the surface. In
this way, Millikan at Chicago in 1914–1916 (while Europeans had other things
on their minds) confirmed the form of the Einstein relation (3.2.5), and found
a value for h, which turned out to agree with the value measured in studies of
black body radiation.
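The Einstein relation (3.2.5) and the threshold condition can be put into a few lines of code. In the sketch below the work function of 4.3 eV is an assumed, illustrative value rather than a figure from the text, and the function names are hypothetical; energies are in electron volts.

```python
from typing import Optional

H_EV_S = 4.135667696e-15   # Planck's constant in eV s

def threshold_frequency(work_function_ev: float) -> float:
    """Minimum frequency phi/h (in Hz) at which electrons can be emitted."""
    return work_function_ev / H_EV_S

def photoelectron_energy(nu_hz: float, work_function_ev: float) -> Optional[float]:
    """Kinetic energy E_e = h nu - phi in eV, or None below threshold."""
    excess = H_EV_S * nu_hz - work_function_ev
    return excess if excess >= 0.0 else None

PHI = 4.3                  # assumed work function in eV (illustrative only)

visible = photoelectron_energy(5.0e14, PHI)      # visible light
ultraviolet = photoelectron_energy(1.5e15, PHI)  # ultraviolet light

print(f"threshold frequency: {threshold_frequency(PHI):.3e} Hz")
print("visible light:    ", visible)
print("ultraviolet light:", ultraviolet)
```

Note that the intensity of the light appears nowhere in these functions: in Einstein's picture it affects only the number of photoelectrons, not their individual energies.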
Particles of Light
As we shall see in Chapter 4, with the advent of special relativity it became
clear that since Einstein’s quanta would have to travel at the speed of light,
as particles they would have to have momenta equal to the energy divided by
c, or hν/c. As discussed in Section 4.5, this was confirmed in experiments of
3.3 The Nuclear Atom
It was not possible to make progress in applying quantum ideas to atoms without
some understanding of what atoms are. The growth of this understanding began
with the discovery of radioactivity.
Radioactivity
In 1896 Antoine Henri Becquerel (1852–1908) was trying to find whether var-
ious crystals that had been exposed to sunlight would emit energetic radiation,
like the X-rays that had been discovered a few months earlier. He put these
crystals next to photographic plates wrapped in dark paper that would block
sunlight but might not block rays emitted by the crystal that had earlier been
exposed to the Sun. A wire mesh was inserted between the crystal and the
paper, so that any exposure of the plate by these rays would show an image
of the mesh. One of the crystals Becquerel intended to study was uranium
potassium bisulphate, because it exhibits the phenomenon of phosphorescence,
the delayed emission of light by substances such as the luminous paint on clock
dials that have been exposed to bright light. At first, in February 1896, the
skies in Paris were too cloudy to provide the needed sunlight, so Becquerel
left his crystals and photographic plates in a drawer. When he took them out
in early March, he found that the plates that had been left near the crystals
containing uranium were exposed, showing clear images of the wire mesh,
even though they had never been put in sunlight. In the following months he
found that some sort of ray from various compounds of uranium would expose
photographic plates, even when the crystals and plates were put together in
lead-lined boxes.
It was soon realized that this phenomenon was not limited to uranium com-
pounds. In 1898 Marie Curie (1867–1934) showed that similar phenomena are
produced by compounds of thorium, and she and Pierre Curie (1859–1906) were
able to isolate a previously unknown element, radium, that was millions of times
more active than uranium or thorium. The Curies gave this phenomenon the
name radioactivity.
Two different kinds of radioactivity were distinguished in the next few years
by Ernest Rutherford (1871–1937). There are beta rays, which are about as pen-
etrating as X-rays, and alpha rays, which cannot penetrate even very thin sheets
of foil. (Gamma rays, which are energetic photons, were discovered later.) In
1898 Becquerel discovered that beta rays could be deflected by magnetic and
electric fields. From the amount of the deflection he concluded that these rays
are composed of particles with the same ratio of charge to mass as the cathode
ray particles whose deflection had been measured by J. J. Thomson shortly
before. Beta rays are in fact what later became known as electrons, but moving
much faster than the electrons in cathode rays. It was harder to deflect alpha
rays, but this was eventually accomplished by Rutherford. From the amount
and direction of deflection, Rutherford concluded that these rays consist of
positively charged particles, with a ratio of charge eα to mass mα equal to
half the ratio of charge to mass for hydrogen ions, as had been measured in
electrolysis. The lightest element heavier than hydrogen is helium, with atomic
weight about four times greater than hydrogen, so Rutherford guessed that alpha
rays are helium ions, with charge twice that of hydrogen ions. That is,
$$\frac{e_\alpha}{m_\alpha} = \frac{2e}{4m_1} = \frac{1}{2}\,\frac{e}{m_1} .$$
This was finally confirmed in 1907 when Rutherford together with T. D. Royds
was able to collect enough alpha particles from radioactive decay to show that
the atoms that they form absorb light at the same spectral frequencies as helium.
Once the particles in alpha and beta rays were identified respectively as
helium ions and electrons, it became possible to use measurements of their
deflection to find the particles’ energies. These energies were enormous,
typically about a million times larger than the energies of photons emitted
in ordinary chemical reactions such as burning. Studies of radioactivity by
Rutherford with the chemist Frederick Soddy (1877–1956) at McGill Uni-
versity showed that this energy is released when elements like radium and
thorium spontaneously change to other elements, such as radon. But, for the
understanding of atoms, the most important consequence of the discovery of
radioactivity was that it provided highly energetic charged particles that could
be used as probes of atomic structure.
The beam then struck a screen covered with zinc sulfide, which emits a flash
of light when hit by an energetic charged particle such as an alpha particle. If
the gold atom really consisted only of the very light electrons in a continuum of
positive charge, it would scatter the alpha particles only weakly. Geiger at first
found flashes of light from an area only slightly larger than the geometric image
of the slit, where the unscattered beam would have struck the screen, indicating
the expected slight scattering.⁸
A better model was suggested in 1911 by Rutherford,⁹ on the basis of further
experiments in 1910 in his laboratory. Geiger and Ernst Marsden (1889–1970)
again used alpha rays emitted from a glass tube containing radon 222 gas.¹⁰
Rutherford for some reason asked Geiger and Marsden to see whether any alpha
particles could be deflected at large angles, more than 90°, so that the particles
would be reflected backwards from surfaces of gold or other metals, producing
flashes of light in a zinc sulfide screen on the same side of the metal surface
as the alpha particle source. To his surprise Rutherford learned that some alpha
particles were scattered almost straight back from various metal surfaces: gold,
lead, platinum, etc.¹¹
Nuclear Mass
The observation of backward scattering immediately indicated that the alpha
particles were repelled by something much heavier than an electron, heavier
indeed than an alpha particle. Suppose two particles with masses $m_A$ and $m_B$
and initial velocities $v_A$ and $v_B$ along some line collide head on, emerging with
velocities $v'_A$ and $v'_B$ along the same line. (These vs can be positive or negative;
when two vs have the same sign the particles are going in the same direction;
if opposite signs, they are going in opposite directions.) The conservation of
momentum requires that
$$m_A v_A + m_B v_B = m_A v'_A + m_B v'_B$$
while (as long as velocities are measured when the particles are sufficiently
far apart that they exert no force on each other) the conservation of energy
requires that
8 H. Geiger, Proc. Roy. Soc. A 81, 141 (1908). This reference gives citations to earlier work of Rutherford
and others along the same lines.
9 E. Rutherford, Phil. Mag. 21, 669 (1911); 27, 488 (1914). The first article is reprinted in Beyer,
Foundations of Nuclear Physics, listed in the bibliography.
10 It is not clear whether these alpha particles were produced by the direct alpha decay of radon or in
the alpha decay of radon’s decay products. Without explanation Rutherford’s 1911 paper cited an alpha
particle velocity of $2.09 \times 10^9$ cm/sec. If this is accurate, then these alpha particles could not have been
those that are emitted in the decay of radon 222 to polonium 218, which have a velocity of $1.6 \times 10^9$
cm/sec. Polonium 218 decays into lead 214 with a half life of 3.1 minutes, producing an alpha particle
with velocity $1.7 \times 10^9$ cm/sec, and lead 214 then undergoes further decays. Rutherford’s estimate of an
alpha particle velocity of $2.09 \times 10^9$ cm/sec may have been just a guess.
11 H. Geiger and E. Marsden, Proc. Roy. Soc. A 82, 495 (1910).
$$m_A v_A^2 + m_B v_B^2 = m_A v_A^{\prime 2} + m_B v_B^{\prime 2} .$$
We can use the first equation to express $v'_B$ in terms of the other velocities.
Using this in the second equation, we then have a quadratic equation for $v'_A$,
with coefficients depending on $v_A$ and $v_B$. Like any quadratic equation, this has
two solutions. If nothing changes in the collision then the conservation laws are
automatically satisfied, so one solution is obvious; without even writing down
the equation, we know that $v'_B = v_B$, $v'_A = v_A$ is a solution. Since there are
only two solutions, the other solution, for which something does happen in the
collision, is unique. Here it is:
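A minimal Python sketch, using the standard result of elementary mechanics for a one-dimensional elastic collision, $v'_A = [(m_A - m_B)v_A + 2m_B v_B]/(m_A + m_B)$ and $v'_B = [(m_B - m_A)v_B + 2m_A v_A]/(m_A + m_B)$, checks that both conservation laws are satisfied and shows when the incoming particle is turned back. The masses below are rough illustrative values in atomic weight units, and the function name is hypothetical.

```python
def elastic_collision(m_a: float, m_b: float, v_a: float, v_b: float):
    """Final velocities (v_a_out, v_b_out) for a head-on elastic collision in one dimension."""
    total = m_a + m_b
    v_a_out = ((m_a - m_b) * v_a + 2.0 * m_b * v_b) / total
    v_b_out = ((m_b - m_a) * v_b + 2.0 * m_a * v_a) / total
    return v_a_out, v_b_out

M_ALPHA = 4.0        # approximate atomic weight of an alpha particle
M_GOLD = 197.0       # approximate atomic weight of gold
M_ELECTRON = 5.49e-4 # approximate electron mass in the same units

# Alpha particle with unit velocity striking a target initially at rest
alpha_on_gold = elastic_collision(M_ALPHA, M_GOLD, 1.0, 0.0)
alpha_on_electron = elastic_collision(M_ALPHA, M_ELECTRON, 1.0, 0.0)

print("alpha on heavy target: v'_A = %.4f, v'_B = %.4f" % alpha_on_gold)
print("alpha on electron:     v'_A = %.6f, v'_B = %.4f" % alpha_on_electron)
```

With the target at rest, $v'_A$ has the sign of $m_A - m_B$, so the alpha particle reverses direction only if the target is heavier than the alpha particle itself.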
Nuclear Size
The observations of alpha particles reflected backward also show that they
are repelled by something small. Here it is necessary to assume that at the
separations reached in these collisions, the force between the alpha particle
and whatever it is encountering is purely electrostatic. If we assume that the
alpha particle has charge eα and mass mα , and is repelled by something heavy
with charge Ze, then the potential energy of the alpha particle at separation r
is $Ze\,e_\alpha/r$. In order for the alpha particle to be brought momentarily to rest
before reversing direction, its initial kinetic energy $m_\alpha v_\alpha^2/2$ must be entirely
converted to potential energy, so it must at that moment reach a separation r
satisfying
$$Ze\,e_\alpha/r = m_\alpha v_\alpha^2/2$$
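Solving this condition for r gives $r = 2Ze\,e_\alpha/m_\alpha v_\alpha^2$, which is easy to evaluate. The sketch below is only an illustration in Gaussian (cgs) units: it assumes $e_\alpha = 2e$, takes Z = 79 for gold, uses modern values of e and the alpha particle mass, and tries both alpha particle velocities mentioned in footnote 10. The function name is hypothetical.

```python
E_CHARGE_ESU = 4.803e-10   # unit of charge in esu (modern value)
M_ALPHA_G = 6.645e-24      # alpha particle mass in grams (modern value)

def closest_approach_cm(Z: int, v_alpha: float,
                        e_alpha: float = 2.0 * E_CHARGE_ESU,
                        m_alpha: float = M_ALPHA_G) -> float:
    """Separation r at which Z e e_alpha / r equals m_alpha v_alpha^2 / 2."""
    return 2.0 * Z * E_CHARGE_ESU * e_alpha / (m_alpha * v_alpha**2)

Z_GOLD = 79                # assumed nuclear charge of gold, in units of e

r_fast = closest_approach_cm(Z_GOLD, 2.09e9)  # velocity cited in Rutherford's 1911 paper
r_slow = closest_approach_cm(Z_GOLD, 1.6e9)   # velocity of alphas from radon 222 decay

print(f"r at 2.09e9 cm/sec: {r_fast:.2e} cm")
print(f"r at 1.6e9 cm/sec:  {r_slow:.2e} cm")
```

Under these assumptions the separation comes out at a few times $10^{-12}$ cm, and since r scales as $1/v_\alpha^2$, the slower alpha particles turn back at a somewhat larger distance.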
Scattering Pattern
Further experiments in Rutherford’s laboratory measured the rate at which
alpha particles in a beam with flux $\Phi$ (in particles per unit time and per unit
area transverse to the beam) are scattered into any solid angle $d\Omega$ (that is,
into ranges dθ of angles to the initial direction and dφ of angles around the
initial direction, with $d\Omega = \sin\theta\,d\theta\,d\phi$). Rutherford compared the result with
a calculation using Newtonian dynamics to follow the hyperbolic orbits of alpha
particles in the electric field of a single charged nucleus and find into what area
dσ transverse to the beam the alpha particles must be directed in order to be
scattered into the solid angle $d\Omega$. This gave the ratio of dσ to $d\Omega$, known as the
differential cross section:
$$\frac{d\sigma}{d\Omega} = \frac{Z_\alpha^2 Z^2 e^4}{16 E_\alpha^2 \sin^4(\theta/2)} , \tag{3.3.3}$$
where $Z_\alpha e$ and $Ze$ are the electric charges respectively of the alpha particle and
the nucleus, and $E_\alpha$ is the initial alpha particle kinetic energy. Since any given
alpha particle can be anywhere in the beam, for a beam of transverse area A
the probability that a particular alpha particle will be aimed at the area dσ for
scattering into $d\Omega$ by a single nucleus is dσ/A. If there are N atomic nuclei in
the part of a metal surface within the area of the beam of alpha particles, then
the probability that a given alpha particle will be scattered into the solid angle
$d\Omega$ will be N dσ/A. With a flux $\Phi$, the number of alpha particles per second
hitting the metal surface is $\Phi A$, so the rate at which alpha particles are scattered
into the solid angle $d\Omega$ is
$$\Phi A \times N\,d\sigma/A = N\Phi\,\frac{d\sigma}{d\Omega}\,d\Omega .$$
The observed pattern of scattering at angles greater than 90° agreed with the
proportionality to $1/\sin^4(\theta/2)$ indicated by Eq. (3.3.3), confirming to Rutherford that this was indeed Coulomb scattering by a heavy point charge.
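The angular dependence in Eq. (3.3.3) can be explored with a short function. The following is a hedged sketch in Gaussian (cgs) units; the alpha particle energy is computed from an assumed velocity of $1.6 \times 10^9$ cm/sec and a modern value of the alpha particle mass, Z = 79 is assumed for gold, and the function name is hypothetical.

```python
import math

E_CHARGE_ESU = 4.803e-10   # unit of charge in esu (modern value)

def rutherford_cross_section(theta: float, Z: int, E_alpha: float,
                             Z_alpha: int = 2) -> float:
    """Differential cross section of Eq. (3.3.3), in cm^2 per steradian.

    theta is the scattering angle in radians and E_alpha the kinetic energy in erg.
    """
    numerator = Z_alpha**2 * Z**2 * E_CHARGE_ESU**4
    return numerator / (16.0 * E_alpha**2 * math.sin(theta / 2.0) ** 4)

# Illustrative assumptions, not fitted to any data in the text:
E_ALPHA = 0.5 * 6.645e-24 * (1.6e9) ** 2   # erg
Z_GOLD = 79

at_90 = rutherford_cross_section(math.pi / 2.0, Z_GOLD, E_ALPHA)
at_180 = rutherford_cross_section(math.pi, Z_GOLD, E_ALPHA)

print(f"dsigma/dOmega at 90 degrees:  {at_90:.3e} cm^2/sr")
print(f"dsigma/dOmega at 180 degrees: {at_180:.3e} cm^2/sr")
print(f"ratio: {at_90 / at_180:.3f}")
```

Because $\sin^4(45°) = 1/4$, scattering at 90° is four times as probable per unit solid angle as scattering straight back, whatever the values of Z and $E_\alpha$. It is this kind of angular ratio, independent of the unknown nuclear charge, that the observed pattern could test.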
Rutherford was lucky. He was calculating these probabilities using classical
mechanics and got the right answer, even though at these velocities and
separations quantum mechanics would normally be needed. Scattering by
inverse square law forces is special; it allows the use of classical mechanics
in some circumstances where for any other force it would be necessary to
use quantum mechanics. Equation (3.3.3) will be derived using quantum
mechanics in Section 5.6, so we will not trouble to repeat Rutherford’s classical
calculation here.
Of course, Rutherford’s discovery was made before the development of quan-
tum mechanics. The agreement of his experimental results with theory generally
convinced physicists of a new picture of the atom, that it consists of a small
heavy positively charged nucleus, around which electrons revolve like plan-
ets around the Sun, held in orbit by electrostatic attraction, which in part had
already been guessed at in 1904 by Hantaro Nagaoka (1854–1950).
Nuclear Charge
In order for atomic theory to make contact with chemistry, it was essential to
know the precise number of electrons in the atoms of various elements. For
instance, as we shall see in Chapter 5, the dramatic difference in the chemical
properties of chlorine and argon is almost entirely due to the fact that chlorine
atoms contain 17 electrons while argon atoms contain 18. Because atoms are
electrically neutral, knowing the electric charge of the nucleus tells us the num-
ber of electrons: if the nuclear charge is Ze, the atom must contain Z electrons.
Almost immediately after Rutherford in 1911 announced his conclusion
about the existence of the nucleus, Antonius van den Broek (1870–1926)
argued in a brief note¹² (apparently on the basis of the steady progression of
chemical properties with increasing atomic weight) that the nuclear charge in
units of e equals the atomic number, defined as the position in the catalog of
elements when they are listed in order of increasing atomic weight – that is,
hydrogen, helium, lithium, and so on – but he had no experimental evidence for
this hypothesis.
Rutherford offered no opinion about this. In his 1911 article cited in
footnote 9 he had used Eq. (3.3.3) (with Zα = 2, known from previous mea-
surements of the deflection of alpha particles by electric and magnetic fields)
together with several measurements by Geiger of the scattering by small angles
12 A. van den Broek, Nature 87, 78 (1911). He later published a longer paper, Phys. Zeit. 14, 32 (1913).
of alpha particles in thin gold foil to derive a value of 97e or 114e for the charge
of the gold nucleus. The atomic number of gold is 79, so if Rutherford’s value
for the charge of the gold nucleus had been correct it would have ruled out the
equality of atomic number and the nuclear charge in units of e.¹³ As we shall
see in the next section, this equality was established in 1913 by measurements
of the wavelengths of X-rays from various elements.
Spectral Lines
In Munich in 1814–1815 the optician Joseph Fraunhofer (1787–1826) observed
that when light from the Sun is passed through a slit, focussed by a telescope,
and then dispersed by a prism into a spectrum of colors, the spectrum is crossed
with hundreds of dark lines, each an image of the slit. These lines were always
found in the same places in the spectrum, each corresponding to a definite
wavelength of light. It was realized that these dark lines must be caused by
selective absorption of light as it passes from the hot solar surface through
the cooler part of the Sun’s atmosphere. The same dark lines were seen in
the spectrum of the Moon and bright stars. Similar observations of the light
from flames and other terrestrial sources showed lines in the same places, some-
times dark and sometimes bright, so it became possible to identify the elements
producing these lines: sodium, iron, magnesium, calcium, etc. Some elements,
such as helium, were discovered in this way on the Sun before they were found
on Earth.
By the end of the nineteenth century large books had been published for
physicists and chemists, giving vast numbers of wavelengths for the spectral
lines of various elements. The observation of spectra became a standard tool of
astronomy and chemical analysis. But what could cause the atoms of a given
element preferentially to emit and absorb light at only certain definite wave-
lengths? Answering this question had to wait for a realistic model of atoms.
Electron Orbits
In classical electrodynamics the simple harmonic oscillation of a charged body
produces electromagnetic radiation with the same frequency as the oscillating
13 Rutherford’s over-estimate of the charge of the gold nucleus may have arisen because he was using a
wrong value for the velocity of the alpha particle in these experiments. As mentioned in footnote 10, in
the same paper Rutherford had given a value $2.09 \times 10^{9}$ cm/sec for the alpha particle velocity, while the
alpha particles from the decay of radon 222 actually have a velocity of $1.6 \times 10^{9}$ cm/sec. According to
Eq. (3.3.3) the scattering cross section depends on $Z/E_\alpha$, so by over-estimating the velocity of the alpha
particles he would be over-estimating the electric charge of the gold nucleus.
78 3 Early Quantum Theory
charge, and the charged body is also effective at absorbing radiation at that
frequency. After the discovery of the electron in 1897, as mentioned earlier it
was widely supposed that atoms consist of electrons trapped in a smooth back-
ground of positive charge, and it was natural to assume that the characteristic
frequencies observed in atomic spectra are the frequencies with which these
electrons can oscillate back and forth around their normal positions.
Then, with the discovery of the nucleus discussed in the previous section,
this picture was replaced with a planetary model of the atom, in which electrons
circulate in orbits around the nucleus, like planets around the Sun only held
in orbit by electrostatic rather than gravitational attraction. In classical elec-
trodynamics the periodic motion of the electrically charged electrons would
produce electromagnetic radiation, with a frequency for circular orbits equal
to the frequency with which the electron goes around its orbit.
For elliptical orbits matters are more complicated. While the Cartesian coor-
dinates of an electron traveling at constant speed in a circular orbit are simple
harmonic functions of time, and in classical electrodynamics the electron radi-
ates at the corresponding frequency, for elliptical orbits the motion is periodic
though not simple harmonic. The Cartesian coordinates for an orbit of period
1/ν can still be expressed as Fourier series of simple harmonic terms propor-
tional to sin 2πnνt and cos 2πnνt with n an integer, so the electron classically
radiates at all frequencies equal to whole number multiples of the frequency ν
of revolution. No such pattern is seen in actual spectra.
Even if the orbits were all circular, this view of atomic spectra would have
problems. One trouble with this picture is that classically the electrons would
continually lose energy to radiation, bringing them closer to the nucleus and
thereby speeding up their revolution, hence replacing each discrete spectral line
with a continuum of frequencies. Even worse, classically there would be nothing
to prevent electrons from spiraling onto the nucleus, so that there would be no
stable atoms. Of course, one could simply assume that only certain orbits are
possible, and that these are all stable. The frequencies of these allowed orbits
would then correspond to the observed spectral lines. But there was another
trouble even with this picture: it offered no explanation of a systematic property
of observed spectral frequencies, known as the Ritz combination principle.
\[ \nu_{nm} = \nu_n - \nu_m , \tag{3.4.1} \]
with n and m equal to 1, 2, 3, . . . (This was traditionally expressed in terms of
inverse wavelengths instead of frequencies, but the frequency of any wave is just
the speed of light times the inverse wavelength, so this makes no difference.)
Ritz could offer no explanation of this principle.
The explanation of the Ritz principle and much else was provided in the visit
in 1913 to Rutherford’s Manchester laboratory of a young Danish theorist, Niels
Bohr15 (1885–1962). He assumed that the states of an atom have energies in a
discrete set, labeled En with n running from one to infinity. These states are
stable, except for radiative transitions among them, whose rates are typically
much slower than the frequencies of spectral lines. When an atom makes a
transition from an energy En to a smaller energy Em , it emits a photon with
energy En − Em and hence with frequency
\[ \nu_{nm} = (E_n - E_m)/h . \]
Similarly, for an atom to make a transition from an energy Em to a higher
energy En , it must absorb a photon with the same energy and frequency. These
are the transitions that produce the bright and dark lines observed in spectro-
graphs. Their frequencies match the results (3.4.1) given by the Ritz principle,
if we identify the “terms” νn as simply the energies En of the various states,
divided by h.
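As a small numerical illustration of why Bohr's rule reproduces the Ritz principle, the sketch below takes three made-up energy levels and checks that the frequency of the 3 → 1 line is the sum of the 3 → 2 and 2 → 1 frequencies. The values in `levels` are hypothetical, chosen only for the arithmetic, and the function name `line_frequency` is an assumption of this example.

```python
# Planck constant in eV*s (CODATA 2018 value, typed in by hand).
H_EV_S = 4.135667696e-15


def line_frequency(levels, n, m):
    """Frequency in Hz of the photon emitted in a transition n -> m (E_n > E_m)."""
    return (levels[n] - levels[m]) / H_EV_S


# Hypothetical energy levels in eV; not data for any real atom.
levels = {1: -10.0, 2: -4.0, 3: -1.5}

nu_31 = line_frequency(levels, 3, 1)
nu_32 = line_frequency(levels, 3, 2)
nu_21 = line_frequency(levels, 2, 1)

# Ritz combination: each frequency is a difference of "terms" E_n / h,
# so frequencies of successive transitions add.
print(nu_31, nu_32 + nu_21)
```

The same check works for any set of levels, which is the point: the combination rule follows from the existence of discrete energies, not from their particular values.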
15 N. Bohr, Phil. Mag. 26, 1, 476, 857 (1913); Nature 92, 231 (1913).
just as for planets in the solar system, but of course with different constant
factors on each side of the equation. We can solve these two equations for radius
and velocity. Multiplying Eq. (3.4.3) by $r_n^3/Ze^2$ gives $r_n = m_e v_n^2 r_n^2/Ze^2$, so
\[ r_n = \frac{n^2 \hbar^2}{Z e^2 m_e} . \tag{3.4.4} \]
Using this back in Eq. (3.4.2) then gives
\[ v_n = \frac{Z e^2}{n \hbar} . \tag{3.4.5} \]
The electron has total energy
\[ E_n = \frac{m_e v_n^2}{2} - \frac{Z e^2}{r_n} = -\frac{Z^2 e^4 m_e}{2 n^2 \hbar^2} . \tag{3.4.6} \]
(By the way, it immediately follows from Eq. (3.4.3) that the kinetic energy is
−1/2 times the potential energy. One consequence, already mentioned in the
previous section, is that classically when an electron in orbit loses energy the
potential energy decreases, becoming more negative, so that the kinetic energy
increases.)
Bohr could then give numerical values of parameters for one-electron atoms:
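\[ v_n \simeq \frac{Z}{137 n} \times c , \qquad r_n \simeq \frac{n^2}{Z} \times 0.5292 \times 10^{-8}\ \text{cm} , \qquad E_n = -\frac{Z^2}{n^2} \times 13.6\ \text{eV} . \tag{3.4.8} \]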
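The numbers in (3.4.8) can be checked by evaluating Eqs. (3.4.4) to (3.4.6) directly. The following is a minimal sketch, assuming Gaussian units and CODATA 2018 constants entered by hand; the function names are illustrative and do not come from the text.

```python
# Gaussian (CGS) constants, CODATA 2018 values entered by hand.
HBAR = 1.054571817e-27      # erg s
M_E = 9.1093837015e-28      # g
E_CHARGE = 4.80320471e-10   # esu
C = 2.99792458e10           # cm/s
ERG_PER_EV = 1.602176634e-12


def bohr_radius_cm(Z, n):
    """Orbit radius r_n = n^2 hbar^2 / (Z e^2 m_e), Eq. (3.4.4)."""
    return n**2 * HBAR**2 / (Z * E_CHARGE**2 * M_E)


def bohr_speed_over_c(Z, n):
    """Orbital speed v_n / c = Z e^2 / (n hbar c), from Eq. (3.4.5)."""
    return Z * E_CHARGE**2 / (n * HBAR * C)


def bohr_energy_ev(Z, n):
    """Energy E_n = -Z^2 e^4 m_e / (2 n^2 hbar^2) in eV, Eq. (3.4.6)."""
    return -(Z**2) * E_CHARGE**4 * M_E / (2 * n**2 * HBAR**2) / ERG_PER_EV


for n in (1, 2, 3):
    print(n, bohr_radius_cm(1, n), bohr_speed_over_c(1, n), bohr_energy_ev(1, n))
```

For hydrogen in its lowest state (Z = 1, n = 1) this gives a radius close to 0.5292 × 10^-8 cm, a speed close to c/137, and an energy close to −13.6 eV.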
Reduced Mass
But the agreement was not perfect. According to Eq. (3.4.6), all energies and
hence all frequencies in the spectrum of once-ionized helium should be $Z_{\rm He}^2 = 4$
times larger than for neutral hydrogen, but experiment showed that the ratio
was actually larger than 4 by about 0.04%. Bohr realized that the source of this
discrepancy was that in order to take account of the motion of the nucleus the
formulas for energy and angular momentum of an electron in orbit around a
nucleus of mass M should contain the reduced mass μ = me /(1 + me /M)
in place of the electron mass itself. It is therefore the reduced mass that should
appear in Bohr’s formulas for energies and frequencies in place of me . All ener-
gies and frequencies are thus larger for singly ionized helium than for hydrogen
by a factor
\[ \frac{Z_{\rm He}^2}{Z_{\rm H}^2} \times \frac{\mu_{\rm He}}{\mu_{\rm H}} = 4 \times \frac{1 + m_e/m_{\rm H}}{1 + m_e/m_{\rm He}} = 4 \times 1.00041 , \]
in agreement with observation. Bohr’s success in getting this factor right was a
key factor in convincing physicists of the correctness of his assumptions.
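The size of the reduced-mass correction is easy to reproduce. This sketch assumes CODATA 2018 masses in atomic mass units, with the proton mass used for the hydrogen nucleus and the alpha-particle mass for the helium nucleus; the variable names are illustrative.

```python
# Masses in atomic mass units (CODATA 2018, entered by hand).
M_ELECTRON = 5.48579909065e-4
M_PROTON = 1.007276466621   # hydrogen nucleus
M_ALPHA = 4.001506179127    # helium-4 nucleus


def reduced_mass(m_e, M):
    """Reduced mass mu = m_e / (1 + m_e / M) for an electron bound to mass M."""
    return m_e / (1.0 + m_e / M)


mass_ratio = reduced_mass(M_ELECTRON, M_ALPHA) / reduced_mass(M_ELECTRON, M_PROTON)
energy_ratio = 4.0 * mass_ratio  # Z_He^2 / Z_H^2 = 4

print(mass_ratio, energy_ratio)
```

The ratio of reduced masses comes out at about 1.00041, that is, roughly 0.04% above unity, matching the excess over 4 quoted above.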
Incidentally, although Bohr’s formula (3.4.10) for hydrogen energy levels
(with the reduced mass in place of me ) worked very well, the n in this formula
is not quite equal to the angular momentum in units of h̄, as Bohr had assumed.
We will see in Section 5.2 that in general there are several hydrogen states with
energies given by this formula with the same n, in which the electron has orbital
angular momenta (n − 1)h̄, (n − 2)h̄, . . . , 0, but not nh̄. The electrostatic attrac-
tion exerted on electrons by the nucleus is not balanced solely by the centrifugal
force of motion in closed orbits, but by motions implicit in the wave nature of
the electron. Although Bohr’s calculation of the energy levels in hydrogen has
not survived as a correct derivation of the formula for these energies, Bohr made
a contribution of permanent importance in using a hypothesis of discrete energy
levels for electrons in all atoms to explain the existence of bright and dark lines
in atomic spectra.
Atomic Number
The alpha particle scattering experiments in Rutherford’s laboratory had not
settled the crucial question of the electric charge Ze of the atomic nucleus and
its possible relation to the atomic number, which gives the order of an element
in the list of elements in order of increasing atomic weight. One of the great
achievements of the Bohr theory is that it made possible precise measurements
of nuclear charge.
Element Z A
Two aspects of this table stand out dramatically. The first is that Z always
turns out to be very close to an integer; the small discrepancies can be easily
blamed on experimental uncertainties. That of course is what one expects, if Z
is the number of electrons in the atom, but it reassured everyone that Moseley’s
measurements were reliable. The second remarkable feature is that Z goes up
by one unit as you go up one step in the list of elements according to atomic
weight; there are no elements with atomic weights between 40 and 65 other
than those listed here. (Nickel is an exception to the steady increase of A with
Z, understood today as due to forces in the nucleus of nickel that make it
unusually strongly bound, for a reason discussed in Section 6.3.) This tight
correspondence between atomic number and atomic weight goes beyond the
elements in the table. For instance, there are just 19 elements with atomic
weights less than calcium, which has Z = 20. Thus with a few exceptions,
one can find Z for any element just by making a list of all elements in order
of increasing atomic weight; the atomic number, defined as the place of the
element in that list, gives the number Z of electrons in the atom and the positive
charge Ze of the nucleus.
Incidentally, the Bohr theory also provides a rough idea of the sizes of all
atoms. The electric field felt by the outermost electron in any atom is largely
shielded by the Z − 1 electrons closer to the nucleus, so the radius of its orbit
is very crudely given by the Bohr result (3.4.8), only with $Z \simeq 1$. This is why
the sizes of the atoms of heavy elements are not very much larger than that
of the hydrogen atom, of order $10^{-8}$ cm. They are in fact somewhat larger,
because the radius rn increases with n, and for reasons we will learn in Chapter 5
the outermost electrons in heavy atoms have n greater than 1.
Outstanding Questions
Successful as it was, the Bohr theory raised a number of new questions.
1. Why should angular momentum (or anything else) be quantized?
2. How many atomic states are there for each energy? (It was already known
that spectral lines could be split by exposing atoms to external electric and
magnetic fields.)
3. Above all, how should quantum theory be applied to states that cannot be
approximated as consisting of electrons moving in a fixed Coulomb potential?
This includes all molecules.
The solution of these problems had to wait until the advent of modern quantum
mechanics in the 1920s. This is the subject of Chapter 5.
A and B Coefficients
In 1917 Einstein returned to the theory of black body radiation,17 this time
combining it with the Bohr idea of quantized atomic energy states. Einstein
defined a quantity $A_m^n$ as the rate at which an atom will spontaneously make a
transition from a state m of energy $E_m$ to a state n of lower energy $E_n$, emitting
a photon of energy $E_m - E_n$. He also considered the absorption of photons
17 A. Einstein, Phys. Z. 18, 121 (1917), reprinted in English translation in Van der Waerden, Sources of
Quantum Mechanics, listed in the bibliography.
3.5 Emission and Absorption of Radiation 85
from radiation (not necessarily black body radiation) with an energy density
$\mathcal{E}(\nu)\,d\nu$ at frequencies between ν and ν + dν. The rate at which an individual
atom in such a field makes a transition from a state n to a state m of higher
energy is written as $B_n^m \mathcal{E}(\nu_{nm})$, where $\nu_{nm} \equiv (E_m - E_n)/h$ is the frequency
of the absorbed photon. As we will see, Einstein also found it necessary to take
into account the possibility that the radiation would stimulate the emission of
photons of frequency $\nu_{nm}$ by the atom in transitions from a state m to a state
n of lower energy, at a rate written as $B_m^n \mathcal{E}(\nu_{nm})$. The coefficients $B_n^m$ and $B_m^n$,
like $A_m^n$, were assumed to depend only on the properties of individual atoms,
not on their temperature or any properties of the radiation.
Now, suppose the radiation is black body radiation, at a temperature T , with
which the atoms are in equilibrium. The energy density per frequency interval
of the radiation will be the function E (ν, T ) given by Eq. (3.2.4):
\[ \mathcal{E}(\nu, T) = \frac{8\pi h}{c^3} \frac{\nu^3}{\exp(h\nu/kT) - 1} . \]
In equilibrium the rate at which atoms make a transition m → n from higher
to lower energy must equal the rate at which atoms make the reverse transition
n → m:
\[ N_m \left[ A_m^n + B_m^n \mathcal{E}(\nu_{nm}, T) \right] = N_n B_n^m \mathcal{E}(\nu_{nm}, T) , \tag{3.5.1} \]
where Nn and Nm are the numbers of atoms in states n and m. According to the
Boltzmann rule of classical statistical mechanics, at temperature T the number
of atoms in a given state of energy E is proportional to exp(−E/kT ), so
\[ N_m / N_n = \exp\left( -(E_m - E_n)/kT \right) = \exp(-h\nu_{nm}/kT) . \tag{3.5.2} \]
(It is important here to take the various Nn as the numbers of atoms in the
individual states n, some of which may have precisely the same energy, rather
than the numbers of atoms in all states with energies En .) Putting this together,
we have
\[ A_m^n = \frac{8\pi h}{c^3} \frac{\nu_{nm}^3}{\exp(h\nu_{nm}/kT) - 1} \left[ \exp(h\nu_{nm}/kT)\, B_n^m - B_m^n \right] . \tag{3.5.3} \]
For this to be possible at all temperatures for temperature-independent A and B
coefficients, these coefficients must evidently be related by
\[ B_n^m = B_m^n , \qquad A_m^n = \frac{8\pi h \nu_{nm}^3}{c^3} B_m^n . \tag{3.5.4} \]
Hence, knowing the rate at which a classical light wave of a given energy density
is absorbed or stimulates emission by an atom, we can calculate the rate at which
it spontaneously emits photons, an explicitly quantum process.
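A short numerical check may make the Einstein relations more concrete. The sketch below assumes CGS constants entered by hand, sets the B coefficient to an arbitrary value of 1 (only ratios matter here), and picks an illustrative frequency and temperature that are not taken from the text. It verifies that Boltzmann populations balance the upward and downward rates, and evaluates the ratio of spontaneous to stimulated emission, which should equal exp(hν/kT) − 1.

```python
import math

# CGS constants (CODATA 2018, entered by hand).
H = 6.62607015e-27   # erg s
K_B = 1.380649e-16   # erg / K
C = 2.99792458e10    # cm / s


def planck_energy_density(nu, T):
    """Black body energy density per unit frequency interval."""
    return (8.0 * math.pi * H / C**3) * nu**3 / math.expm1(H * nu / (K_B * T))


def spontaneous_rate(nu, B):
    """A coefficient implied by the Einstein relation A = (8 pi h nu^3 / c^3) B."""
    return 8.0 * math.pi * H * nu**3 / C**3 * B


# Illustrative values (assumptions): visible light, roughly solar temperature.
nu, T, B = 5.0e14, 6000.0, 1.0

E = planck_energy_density(nu, T)
A = spontaneous_rate(nu, B)

# Boltzmann populations, with the lower-state population set to 1.
N_n = 1.0
N_m = N_n * math.exp(-H * nu / (K_B * T))

downward = N_m * (A + B * E)   # spontaneous plus stimulated emission
upward = N_n * B * E           # absorption
spont_over_stim = A / (B * E)

print(downward, upward, spont_over_stim)
```

With these illustrative numbers hν/kT is about 4, so spontaneous emission dominates stimulated emission by a factor of roughly 50. At radio frequencies the ratio would instead be much less than 1.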
Lasers
The phenomenon of stimulated emission makes possible the amplification of
beams of light in a laser. (This is an acronym for “light amplification by stim-
ulated emission of radiation.” Before lasers there were masers, in which it was
microwave radiation rather than visible light that was amplified by stimulated
emission.) Suppose a beam of light with energy density distribution E (ν) passes
through a medium consisting of $N_n$ atoms at energy level $E_n$. Stimulated
emission from the first excited state n = 2 to the ground state n = 1 adds photons
of frequency $\nu_{12} \equiv (E_2 - E_1)/h$ to the beam at a rate $N_2 \mathcal{E}(\nu_{12}) B_2^1$, but
absorption from the ground state removes photons at a rate $N_1 \mathcal{E}(\nu_{12}) B_1^2$, and since
$B_2^1 = B_1^2$ there will be a net addition of photons only in the case $N_2 > N_1$.
Unfortunately, such a population inversion never occurs in thermal equilibrium,
and cannot even be produced by exposing the atoms in their ground state to light
at the resonant frequency ν12 . The net rate of change in the population of the
first excited state, labeled n = 2, due to spontaneous and stimulated emission
from the excited state and absorption from the ground state will be
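\[ \frac{dN_2}{dt} = -N_2 \mathcal{E}(\nu_{12}) B_2^1 - N_2 A_2^1 + N_1 \mathcal{E}(\nu_{12}) B_1^2 , \]
or, using the Einstein relations $B_1^2 = B_2^1$ and $A_2^1 = (8\pi h \nu_{12}^3/c^3) B_2^1$ from Eq. (3.5.4),
\[ \frac{dN_2}{dt} = B_2^1 \left[ -N_2 \left( \mathcal{E}(\nu_{12}) + 8\pi \nu_{12}^3 h/c^3 \right) + N_1 \mathcal{E}(\nu_{12}) \right] . \tag{3.5.5} \]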
If we start with $N_2 = 0$, then $N_2$ increases until it approaches a value $N_1/(1+\xi)$,
where $\xi \equiv 8\pi \nu_{12}^3 h/\mathcal{E}(\nu_{12}) c^3$, when $N_2$ becomes constant. Not only can this
process not produce a population inversion; because of spontaneous emission it
cannot even make $N_2$ as large as $N_1$.
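The approach to saturation can be sketched in closed form. The example below makes an assumption not spelled out in the text, namely that the total number of atoms N1 + N2 is fixed, so that Eq. (3.5.5) becomes a linear equation for N2 alone. The symbol `W` stands for the stimulated rate per atom (B times the energy density), and the values of `W` and `xi` are purely illustrative.

```python
import math


def excited_population(t, W, xi, N_total=1.0):
    """N2(t) for a two-level system pumped at resonance, starting from N2 = 0.

    Assumes N1 + N2 = N_total is constant, so Eq. (3.5.5) reads
        dN2/dt = W * [N_total - (2 + xi) * N2],
    where W is the stimulated rate per atom and xi is the ratio of
    spontaneous to stimulated emission rates.
    """
    N2_steady = N_total / (2.0 + xi)
    return N2_steady * (1.0 - math.exp(-W * (2.0 + xi) * t))


W, xi = 1.0, 0.5          # illustrative values only
N2_late = excited_population(1.0e3, W, xi)
N1_late = 1.0 - N2_late

print(N2_late / N1_late, 1.0 / (1.0 + xi))
```

At late times the ratio N2/N1 settles at 1/(1 + ξ), which is less than 1 for any ξ > 0, so resonant pumping of a two-level system saturates without ever inverting the populations.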
A population inversion can be produced in other ways, for instance by optical
pumping, in which atoms are excited to some state, say n = 3, by absorption
of light with frequency ν31 = (E3 − E1 )/ h, and then spontaneously decay to
the state n = 2. This can also happen naturally. Masers have been observed in
the accretion disks surrounding the centers of several galaxies, including NGC
4258 and M33.
Suppressed Absorption
Stimulated emission can not only intensify emission lines, such as those from
masers – it can also suppress absorption lines. Consider a steady beam with
area A of radiation moving in the +x-direction, with local energy density per
unit frequency interval E (ν, x) at x. In the steady state, the rate of change of
energy per unit frequency interval E A dx in the slab between x and x + dx due
to atomic transitions n → m and m → n with Em − En = hν > 0 must
be balanced by the difference in the rates at which radiation energy enters and
leaves the slab:
\[ c \left[ \mathcal{E}(\nu, x + dx) - \mathcal{E}(\nu, x) \right] A = h\nu\, \mathcal{E}(\nu, x) \left[ -n_n B_n^m + n_m B_m^n \right] A\, dx , \]
where nm and nn are the number densities of atoms in states m and n, respec-
tively. The two terms in square brackets on the right arise respectively from
absorption and stimulated emission; we do not include a term for spontaneous
emission because the photons it produces leave the beam. If the medium is in
thermal equilibrium at temperature T then $n_m/n_n = \exp(-h\nu/kT)$; so, since
$B_m^n = B_n^m$, the energy density per unit frequency interval along the beam must
satisfy
\[ \frac{d}{dx} \mathcal{E}(\nu, x) = -\frac{h\nu}{c}\, \mathcal{E}(\nu, x)\, n_n B_n^m \left[ 1 - \exp(-h\nu/kT) \right] . \tag{3.5.6} \]
Thus, if $h\nu \ll kT$, stimulated emission suppresses the intensity of the
absorption line by a factor $h\nu/kT$. This is important for radio and microwave
frequency lines, like the famous “21-cm” line in hydrogen discussed in Section 5.4.
It has hν/k = 0.068 K, which is less even than the temperature of the cosmic
microwave background, so this absorption line is strongly suppressed by stim-
ulated emission everywhere. Nevertheless, the absorption line is observed. Its
intensity and Doppler shifts provide valuable information about the temperature
and motion of hydrogen gas in galactic disks.
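To get a feeling for the size of this suppression, one can evaluate the factor 1 − exp(−hν/kT) from Eq. (3.5.6) for the 21-cm line. The sketch below uses the value hν/k = 0.068 K quoted above and assumes a cosmic microwave background temperature of 2.725 K, a number that does not appear in the text; warmer gas would give a still smaller factor.

```python
import math


def absorption_suppression(h_nu_over_k, T):
    """Factor 1 - exp(-h nu / k T) multiplying the absorption rate in Eq. (3.5.6)."""
    return -math.expm1(-h_nu_over_k / T)


H_NU_OVER_K_21CM = 0.068   # kelvin, as quoted in the text
T_CMB = 2.725              # kelvin, assumed background temperature

factor = absorption_suppression(H_NU_OVER_K_21CM, T_CMB)
print(factor, H_NU_OVER_K_21CM / T_CMB)
```

Both printed numbers are close to 0.025, so even in gas as cold as the microwave background only a few percent of the absorption survives stimulated emission.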
4
Relativity
1 Aristarchus, “On the Sizes and Distances of the Sun and Moon,” translated by T. L. Heath, in Aristarchus
of Samos (Clarendon Press, Oxford, 1923). The calculations of Aristarchus are described in S. Weinberg,
To Explain the World (HarperCollins Publishers, New York, 2015).
4.1 Early Relativity 89
early as the fourth century BC; it was Heraclides of Pontus (ca. 388–310 BC),
a student at Plato’s Academy at Athens.
There is a classic argument against both the rotation and motion of the Earth,
given originally by Aristotle, and picked up around 150 AD by the astronomer
Claudius Ptolemy of Alexandria (ca. 100–170 AD). Ptolemy argued that if the
surface of the Earth were in motion then an arrow shot straight up would not
fall back to the same spot from which it was shot, as is observed, because
while the arrow was in flight that spot would have moved some distance under
the arrow. This argument was first countered in the mid-1300s AD by Nicole
Oresme (1321–1382), bishop of Lisieux. Relying on the concept of impetus
introduced by his teacher at the University of Paris Jean Buridan (1300–1358),
Oresme argued that an arrow on the surface of the Earth would pick up an
impetus from the Earth’s motion, which would keep it moving with the same
horizontal component of velocity while going up and down in the air, so it
would fall back to the same spot on Earth, despite the Earth’s motion. Sadly,
whether from respect for the teachings of the Church or fear of its discipline,
Oresme never publicly adopted the notion that the Earth really is in motion. But
he had established that purely terrestrial observations cannot detect a possible
motion of the Earth.
It was not so obvious that the peculiar motion of the planets around the
constellations of the zodiac, sometimes even seeming to reverse their motion,
could be explained if the Earth were in orbit about the Sun, sometimes passing
Mars or some other outer planet, and sometimes being passed by Venus or Mer-
cury. As everyone knows, this was finally made clear in the 1540s by Nicolaus
Copernicus (1473–1543).
Relativity of Motion
I don’t know if it was the writings of Oresme or similar ideas of their own, but
Johannes Kepler (1571–1630) and Galileo Galilei (1564–1642) in their defense
of Copernicanism were comfortable with the conclusion that there is no way
that a uniformly moving observer without observing the surroundings can tell
that he or she is in motion. It was generally understood that (in modern notation)
if a first observer describes any event as having Cartesian space coordinates $x^i$
(with i = 1, 2, 3 or x, y, z) and time coordinate t, then a second observer who
moves with velocity −u with respect to the first will see the same event with
coordinates
\[ x'^i = x^i + u^i t , \qquad t' = t , \tag{4.1.1} \]
because an object seen by the first observer with any time-independent
coordinates $x^i = a^i$ will seem to the second observer to be moving with velocity +u,
with coordinates $x'^i = a^i + u^i t$.
\[ \frac{d^2 x_N^i}{dt^2} = G \sum_{M \neq N} m_M \frac{x_M^i - x_N^i}{|\mathbf{x}_M - \mathbf{x}_N|^3} , \tag{4.1.2} \]
where G is Newton’s gravitational constant, and $|\mathbf{x}_M - \mathbf{x}_N|^2 \equiv \sum_i (x_M^i - x_N^i)^2$.
These equations are invariant under the transformation (4.1.1), which here is
\[ t \to t' = t , \qquad x_N^i \to x'^i_N = x_N^i + u^i t , \tag{4.1.3} \]
because the term $u^i t$ drops out in the second time derivative on the left-hand side
of Eq. (4.1.2) and does not appear in the differences of spatial coordinates on
the right-hand side. The principle that the laws of nature are invariant under
the transformations (4.1.1) is known as the principle of Galilean relativity.
It is a good approximation for bodies moving at speeds much less than that
of light. For instance, we saw in Section 2.5 how invariance under Galilean
transformations is used to infer the equations of motion for imperfect fluids.
The equations of motion (4.1.2) are of course also invariant under constant
rotations of space coordinates and constant translations of space and time coor-
dinates. The set of all these transformations and all their combinations is known
as the Galileo group.
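The invariance of Eq. (4.1.2) under the transformation (4.1.3) can also be seen numerically. In this sketch the masses, positions, boost velocity, and time are arbitrary illustrative numbers in CGS units, and the function names are assumptions of the example. The accelerations computed from the right-hand side of (4.1.2) are unchanged when every position is shifted by the same amount ut.

```python
import math

G = 6.674e-8  # Newton's constant in cm^3 g^-1 s^-2


def accelerations(masses, positions):
    """Right-hand side of Eq. (4.1.2): gravitational acceleration of each body."""
    result = []
    for N, x_N in enumerate(positions):
        a = [0.0, 0.0, 0.0]
        for M, x_M in enumerate(positions):
            if M == N:
                continue
            r_cubed = math.dist(x_M, x_N) ** 3
            for i in range(3):
                a[i] += G * masses[M] * (x_M[i] - x_N[i]) / r_cubed
        result.append(a)
    return result


def boost(positions, u, t):
    """Galilean transformation (4.1.3): shift every position by u * t."""
    return [[x[i] + u[i] * t for i in range(3)] for x in positions]


# Arbitrary illustrative configuration (grams, centimeters, seconds).
masses = [1.0e3, 2.0e3, 5.0e2]
positions = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 1.0]]

before = accelerations(masses, positions)
after = accelerations(masses, boost(positions, [3.0, -1.0, 2.0], 7.0))
print(before[0], after[0])
```

Only coordinate differences enter the force law, which is why the common shift cancels. The left-hand side is unaffected for a separate reason: the second time derivative of ut vanishes.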
Speed of Light
It is obvious that Maxwell’s equations are not invariant under the Galilean
transformations (4.1.1). Maxwell’s equations tell us that light always travels
at the same speed, which we call c. If a light wave moves along the 1-direction,
the 1-coordinate of the wave front must have the time dependence
\[ x^1(t) = x^1(0) + ct . \tag{4.1.4} \]
But then if a second observer who moves in the −1-direction with speed u uses
the coordinates (4.1.1), she will see the 1-coordinate of the wave front as
\[ x'^1(t) = x'^1(0) + (c + u)t , \tag{4.1.5} \]
so the wave would seem to travel faster or slower than the speed of light ac-
cording to whether u is positive or negative. Observers can use any coordinate
systems they like, but Eq. (4.1.5) shows that if Maxwell’s equations in the form
(3.1.2) are found to hold when an observer uses coordinates $x^i$, $t$ then they
cannot hold in that form when she uses coordinates $x'^i$, $t'$.
Einstein worried about this as a young man. He was particularly concerned
with what a light wave would look like to an observer with u = −c in our
example – that is, an observer moving with the light wave. He concluded that
the electric and magnetic fields would appear frozen in time, though varying
with position along the ray. Needless to say, this is not a solution of the Maxwell
equations.
This problem did not worry Maxwell. In formulating his equations, he
regarded the electric and magnetic fields as vibrations in an elastic medium, the
aether. In this case one would not expect the equations to hold for observers
moving with respect to the aether, any more than the equations for a sound wave
traveling up and down in an organ pipe would seem the same to an observer
flying up the pipe as to an observer at rest with respect to the pipe. Maxwell
thought that his equations would apply only for observers at rest in the aether.
Michelson–Morley Experiment
So, if electromagnetic waves are vibrations in the aether, can we measure the
velocity of the Earth through the aether? The Earth’s orbital motion gives it
a speed of 30 km/sec relative to the Sun, and the rotation of our galaxy gives
the solar system a speed of about 200 km/sec relative to the galaxy’s center.
These speeds are much less than the speed of light, 300 000 km/sec, but not
too small to be measured with a device known as a Michelson interferometer,
invented by the American physicist Albert Michelson (1852–1931). (Michelson
interferometers have been used for many purposes since then, most recently in
the detection of gravitational waves from distant coalescing black holes and/or
neutron stars.)
In 1886 Michelson and Edward Morley (1838–1923) set out to measure the
speed of the Earth through the aether in observations at the US Naval Academy,
where Michelson had been a midshipman. As a base for their interferometer,
they used a large stone disk floating on mercury, to allow an easy change in
its orientation and also to give it some insulation from vibrations in the Earth.
On this disk they placed a strong source of light, which sent a beam of light
toward a half-silvered mirror set at 45° to the beam. (See Figure 4.1.) Half
the beam went straight ahead to an ordinary mirror A at distance LA from the
half-silvered mirror, and half went at a right angle to another ordinary mirror B
at a distance LB . From both these two mirrors the beam was reflected back to
the half-silvered mirror. Some of the two reflected beams went together in the
direction opposite to the direction to mirror B, to a detector which measured
the intensity of the recombined beam. If it takes times tA and tB for the light to
travel from the half-silvered mirror M along the paths to mirrors A and B and
back again, then the intensity observed at the detector is proportional to
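\[ \left| A_A e^{-2\pi i \nu t_A} + A_B e^{-2\pi i \nu t_B} \right|^2 = |A_A|^2 + |A_B|^2 + 2 |A_A| |A_B| \cos\left[ 2\pi\nu (t_A - t_B) + \alpha \right] , \tag{4.1.6} \]
where $A_A$ and $A_B$ are the amplitudes that would be received from mirrors A and
B if $t_A$ and $t_B$ were negligible, α is the relative phase of these amplitudes, and ν
is the light frequency. It is easy to arrange that $|A_A|$ and $|A_B|$ are approximately
equal, in which case the intensity (4.1.6) is quite sensitive to the argument of
the cosine. So we need to calculate the times $t_A$ and $t_B$ for various orientations
of the interferometer.
[Figure 4.1: interferometer diagram; surviving labels are v, φ, $L_A$, $L_B$, and mirror B.]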
Adopting the idea of an aether for the sake of argument, let us assume that
the Earth is traveling through the aether with a speed v, at an angle φ to the
direction of the interferometer’s incident light beam. To calculate tA and tB it is
easiest to work in the frame of reference at rest in the aether, in which the speed
of light according to Maxwell is c in all directions. If the light takes a time tA+ to
travel from the half-silvered mirror M to mirror A and a time tA− to travel back
from A to M, then in the time intervals tA± it travels a distance LA ± tA± v cos φ
along its original direction (because during time tA+ the mirror A moves in
a direction away from M by a distance tA+ v cos φ while in the time tA− the
half-silvered mirror M moves in a direction toward A by a distance tA− v cos φ).
In both time intervals the light beam also moves at right angles to its original
direction by a distance tA± v sin φ. The total distance traveled in these time
intervals is then the hypotenuse of a right triangle with sides LA ±tA± v cos φ and
tA± v sin φ, so
\[ c t_A^\pm = \sqrt{ (L_A \pm t_A^\pm v \cos\phi)^2 + (t_A^\pm v \sin\phi)^2 } = \sqrt{ L_A^2 \pm 2 L_A t_A^\pm v \cos\phi + (t_A^\pm v)^2 } . \]
Because v is presumably much less than c, it will be enough to keep only terms
up to second order in v. We can then use the familiar expansion
\[ \sqrt{1 + x} = 1 + \frac{x}{2} - \frac{x^2}{8} + \cdots , \]
so that
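\[
\begin{aligned}
c t_A^\pm &\simeq L_A \pm t_A^\pm v \cos\phi + \frac{1}{2 L_A} (t_A^\pm v)^2 (1 - \cos^2\phi) \\
&\simeq L_A \pm t_A^\pm v \cos\phi + \frac{1}{2} L_A (v^2/c^2)(1 - \cos^2\phi)
\end{aligned}
\]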
and therefore
\[ t_A^\pm \simeq \frac{L_A}{c \mp v \cos\phi} \left[ 1 + \frac{v^2}{2c^2} (1 - \cos^2\phi) \right] . \]
Adding these results for tA+ and tA− , we see that the terms of first order in v/c
cancel, leaving us with the second-order correction
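\[
\begin{aligned}
t_A = t_A^+ + t_A^- &\simeq \frac{2 L_A c}{c^2 - v^2 \cos^2\phi} \left[ 1 + \frac{v^2}{2c^2} (1 - \cos^2\phi) \right] \\
&\simeq \frac{2 L_A}{c} \left[ 1 + \frac{v^2}{2c^2} (1 + \cos^2\phi) \right] .
\end{aligned}
\]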
Since we assumed that the line from the half-silvered mirror M to mirror A is at
an angle φ to the Earth’s velocity through the aether, the line from M to mirror
B is at an angle 90° − φ to the Earth’s velocity. We can therefore find $t_B$ by
simply replacing φ with 90° − φ and of course replacing $L_A$ with $L_B$:
\[ t_B \simeq \frac{2 L_B}{c} \left[ 1 + \frac{v^2}{2c^2} (1 + \sin^2\phi) \right] . \]
The difference, which appears in Eq. (4.1.6), is then
\[ t_A - t_B \simeq \frac{2 (L_A - L_B)}{c} \left[ 1 + \frac{3 v^2}{4 c^2} \right] + \frac{(L_A + L_B)}{2c} \frac{v^2}{c^2} \cos 2\phi . \tag{4.1.7} \]
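The second-order result (4.1.7) can be compared with an exact solution of the travel-time condition in the aether frame. The sketch below works in units with c = 1 and deliberately uses an exaggerated speed v = 0.01c, an illustrative choice so that the neglected fourth-order terms stay above rounding error; the arm lengths and angle are likewise arbitrary. Squaring the condition for each leg gives a quadratic equation in the travel time, which is solved directly.

```python
import math


def one_way_time(L, v, phi, sign, c=1.0):
    """Positive root of c t = sqrt((L + sign t v cos(phi))^2 + (t v sin(phi))^2).

    Squaring gives (c^2 - v^2) t^2 - 2 sign L v cos(phi) t - L^2 = 0.
    sign = +1 for the outward leg, -1 for the return leg.
    """
    a = c * c - v * v
    b = -2.0 * sign * L * v * math.cos(phi)
    return (-b + math.sqrt(b * b + 4.0 * a * L * L)) / (2.0 * a)


def round_trip_time(L, v, phi, c=1.0):
    return one_way_time(L, v, phi, +1, c) + one_way_time(L, v, phi, -1, c)


def exact_difference(L_A, L_B, v, phi, c=1.0):
    """t_A - t_B, with arm B at an angle of 90 degrees minus phi to the velocity."""
    return round_trip_time(L_A, v, phi, c) - round_trip_time(L_B, v, math.pi / 2 - phi, c)


def approx_difference(L_A, L_B, v, phi, c=1.0):
    """Second-order formula, Eq. (4.1.7)."""
    beta2 = (v / c) ** 2
    return (2.0 * (L_A - L_B) / c) * (1.0 + 0.75 * beta2) \
        + ((L_A + L_B) / (2.0 * c)) * beta2 * math.cos(2.0 * phi)


# Illustrative values only: unit arm lengths, exaggerated speed.
L_A, L_B, v, phi = 1.0, 0.8, 0.01, 0.4
print(exact_difference(L_A, L_B, v, phi), approx_difference(L_A, L_B, v, phi))
```

The two values agree up to terms of order (v/c)^4, as expected for an expansion truncated at second order.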
There is no way that Michelson and Morley could know LA − LB and α accu-
rately enough to allow them to detect the presence of corrections proportional to
v 2 /c2 by measuring the intensity with a fixed orientation of their interferometer,
even if they knew the value of φ for that orientation, which of course they did
not since no one knew the direction of the Earth’s motion through the aether.
But if they rotated the interferometer through 180°, then cos 2φ would vary
through the whole range from −1 to +1, so $t_A - t_B$ in Eq. (4.1.7) would vary by
an amount $(L_A + L_B)v^2/c^3$ and the argument of the cosine in Eq. (4.1.6) would
change by an amount $2\pi\nu(L_A + L_B)v^2/c^3$. This predicts an observable change
in the intensity (4.1.6) as the interferometer is rotated through 180°, provided
that $2\pi\nu(L_A + L_B)v^2/c^3$ is not much less than 2π, or in other words provided
that $v^2/c^2$ is not too small compared with $c/\nu(L_A + L_B) = \lambda/(L_A + L_B)$,
where λ = c/ν is the light wavelength. In the Michelson–Morley experiment
(taking account of repeated reflections between the half-silvered mirror and the
other mirrors) $L_A + L_B$ was of order $10^3$ cm, while the wavelength λ was a few
times $10^{-5}$ cm, so $\lambda/(L_A + L_B)$ was of order $10^{-8}$, and velocities roughly of
order $10^{-4} c = 30$ km/sec could be easily detected.
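The order-of-magnitude estimate above can be turned into a rough fringe count. This sketch assumes a total path $L_A + L_B$ of exactly 10^3 cm and a wavelength of 5 × 10^-5 cm; both are representative values consistent with the orders of magnitude quoted, not the actual parameters of the apparatus.

```python
C_KM_S = 299792.458  # speed of light in km/sec


def fringe_shift(path_sum_cm, wavelength_cm, v_km_s):
    """Change in nu * (t_A - t_B), in fringes, as cos(2 phi) sweeps from -1 to +1.

    This is (L_A + L_B) v^2 / (lambda c^2); multiplying by 2 pi gives the
    change in the argument of the cosine in the intensity formula.
    """
    return (path_sum_cm / wavelength_cm) * (v_km_s / C_KM_S) ** 2


# Assumed representative values: 10^3 cm total path, 5e-5 cm wavelength.
shift_orbital = fringe_shift(1.0e3, 5.0e-5, 30.0)   # Earth's orbital speed
shift_limit = fringe_shift(1.0e3, 5.0e-5, 5.0)      # the 5 km/sec upper limit

print(shift_orbital, shift_limit)
```

With these assumed numbers the orbital speed of 30 km/sec corresponds to a shift of about a fifth of a fringe, while 5 km/sec corresponds to well under a hundredth of a fringe, because the effect scales as the square of the velocity.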
Finding no change in the intensity (4.1.6) as the interferometer was rotated,
Michelson and Morley concluded in 1887 that the velocity of light as observed
from the moving Earth is the same in all directions to within 5 km/sec.2 That
is, within the aether theory of that time, the speed v of the Earth relative to the
aether would have to be less than 5 km/sec, as compared with the undoubted
orbital velocity of the Earth relative to the Sun of 30 km/sec. By 1964, with
the use of a laser instead of an incoherent light source, the upper limit on this
velocity had been reduced to about 1 km/sec.3 Even if one imagined that on
a particular day the Earth happened to be more or less at rest in the aether,
six months later the Earth would be moving in the opposite direction, with the
same speed relative to the Sun, and hence with a speed of 60 km/sec relative to
the aether.
This surprising result evoked various explanations. H. A. Lorentz4 in 1892
and George Francis Fitzgerald (1851–1901) at about the same time proposed
that motion through the aether causes a contraction of the dimension of the
interferometer along the direction of motion, just such as to hide the effect of
motion on the speed of light. Lorentz, acting on the assumption that all mat-
ter consists of electrons, tried to explain this “Lorentz–Fitzgerald contraction”
within a theory of the electron. Similar ideas were elaborated by the polymath
Henri Poincaré5 (1854–1912). But it was Albert Einstein in 1905 who put his
finger on the solution.
Physicists in the first years of the twentieth century were in a strange bind.
Newton’s equations (4.1.2) of matter and gravitation are invariant under the
Postulate of Invariance
Einstein’s solution to this conundrum was presented in 1905 in an article6 “On
the Electrodynamics of Moving Bodies.” As suggested by the title, part of his
motivation was a peculiar feature of electrodynamics. Consider a magnet mov-
ing past a conducting wire. To an observer at rest with respect to the wire,
the changing magnetic field produces an electric field, which, as in an electric
generator, drives a current in the wire. On the other hand, to an observer at
rest with respect to the magnet, there is no electric field; instead the motion
of the wire with velocity v through the magnetic field B produces a force per
charge v × B/c that drives a current in the wire. Somehow the current is the
same, although the two observers use different language to describe what is
happening. So at least some electromagnetic phenomena are unaffected by the
motion of the observer.
Einstein also mentioned in passing “unsuccessful attempts to detect a motion
of the Earth” relative to what he called “the light medium,” but did not give a
reference to the Michelson–Morley experiment. In his 1905 paper he rejected
the idea of Lorentz and Fitzgerald that the change in the speed of light due to the
transformation (4.1.1) is somehow hidden from us by changes in the measuring
apparatus due to motion. Instead, he insisted that Maxwell’s equations are un-
affected by uniform motion – only the change in coordinates due to uniform
motion is not (4.1.1), but something else.
What was truly new and remarkable in Einstein’s paper was that in working
out this change of coordinates he supposed that the time coordinate, as well as
the space coordinates, is affected by the motion of an observer. In writing the
Galilean transformation in Eq. (4.1.1) I was careful to include the specification
$t' = t$. That was an anachronism – no one before Einstein would have bothered
to specify that the time coordinate is unaffected by the motion of an observer. It
was then universally supposed that the flow of time is unaffected by motion or
anything else. Now Einstein was contemplating the possibility that time as well
as distance is affected by an observer’s motion.
Einstein calculated the effect of motion on space and time coordinates by a
variety of thought experiments, under the assumption that times and distances
would be measured using light rays. Though he did not put it in this way, he
Lorentz Transformations
Let us first consider what sort of spacetime transformation preserves the speed
of light. If a light wave front shifts its position by a vector $\Delta\mathbf{x}$ in a time interval
$\Delta t$, then if light travels at a speed c we have $|\Delta\mathbf{x}| = c\,\Delta t$, or in other words
$$0 = (\Delta\mathbf{x})^2 - c^2(\Delta t)^2 . \tag{4.2.1}$$
So, what sort of transformation leaves invariant the quantity $(\Delta\mathbf{x})^2 - c^2(\Delta t)^2$?
Before answering this question, it may be mentioned that there is a larger
group of transformations that leave $(\Delta\mathbf{x})^2 - c^2(\Delta t)^2$ invariant only when it
vanishes. These are known as conformal transformations. One simple example is a
rescaling x → λx, t → λt, with λ an arbitrary constant. Invariance of the laws
of nature under conformal transformations would be enough to keep the speed
of light the same for all observers, but it would apparently make it impossible to
deal with non-zero masses. Nevertheless, conformal symmetry has been revived
again and again up to the present as a possible property of physical law at the
most fundamental level, hidden from us through dynamical effects of one sort
or another. Here we shall content ourselves with asking about the more limited
class of transformations that leave $(\Delta\mathbf{x})^2 - c^2(\Delta t)^2$ invariant, whether or not it
vanishes.
7 H. Minkowski, lecture delivered to the Math. Ges. Göttingen, November 5, 1907, published in Ann. Phys.
47, 927 (1915).
As mentioned above, it will be very convenient to adopt the spacetime notation
due to Minkowski, with a fourth coordinate $x^0 \equiv ct$. We use letters from
the middle of the Greek alphabet to label the coordinates of events in spacetime,
as $x^\mu$, $x^\nu$, etc. Then the right-hand side of Eq. (4.2.1) may be written
$$(\Delta\mathbf{x})^2 - c^2(\Delta t)^2 = \eta_{\mu\nu}\,\Delta x^\mu\,\Delta x^\nu ,$$
it being understood that repeated indices are summed over the values 1, 2, 3, 0.
Here $\eta_{\mu\nu}$ is the matrix
$$\eta_{\mu\nu} = \begin{cases} 1 & \mu = \nu = 1, 2, 3 \\ -1 & \mu = \nu = 0 \\ 0 & \mu \neq \nu . \end{cases} \tag{4.2.2}$$
In this notation, the condition we impose on coordinate transformations
$x^\mu \to x'^\mu$ may be written
$$\eta_{\mu\nu}\,\Delta x'^\mu\,\Delta x'^\nu = \eta_{\mu\nu}\,\Delta x^\mu\,\Delta x^\nu . \tag{4.2.3}$$
It can be shown8 that the most general transformation of the spacetime coordinates
that satisfies this condition is linear:
$$\Delta x'^\mu = \Lambda^\mu{}_\rho\,\Delta x^\rho , \tag{4.2.4}$$
with $\Lambda^\mu{}_\rho$ some set of constants. (We are excluding translations here, under
which $x^\mu$ would change by a constant term $a^\mu$, because $\Delta x^\mu$ is a difference
of spacetime coordinates and hence unaffected by translations.) Recall that the
repetition here of the index ρ indicates that this index is to be summed over the
values 1, 2, 3, 0. Condition (4.2.3) now reads
$$\eta_{\mu\nu}\,\Lambda^\mu{}_\rho\,\Lambda^\nu{}_\sigma\,\Delta x^\rho\,\Delta x^\sigma = \eta_{\rho\sigma}\,\Delta x^\rho\,\Delta x^\sigma .$$
In order for this to be valid for any $\Delta x^\rho$, the coefficients of $\Delta x^\rho\,\Delta x^\sigma$ on both
sides must be equal:
$$\eta_{\mu\nu}\,\Lambda^\mu{}_\rho\,\Lambda^\nu{}_\sigma = \eta_{\rho\sigma} , \tag{4.2.5}$$
for all values of the spacetime coordinate indices ρ and σ. Transformations
(4.2.4) with $\Lambda^\mu{}_\nu$ satisfying (4.2.5) are known as Lorentz transformations.
8 For a proof, see S. Weinberg, Gravitation and Cosmology (Wiley, New York, 1972), Section 2.1.
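$$\Lambda^3{}_3 = \Lambda^0{}_0 = \gamma , \qquad \Lambda^3{}_0 = \Lambda^0{}_3 = \beta\gamma , \qquad \Lambda^1{}_1 = \Lambda^2{}_2 = 1 , \tag{4.2.6}$$
where γ is the positive quantity
$$\gamma = +\frac{1}{\sqrt{1 - \beta^2}} . \tag{4.2.7}$$
That is, in matrix notation,
$$\Lambda^\mu{}_\nu = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \gamma & \beta\gamma \\ 0 & 0 & \beta\gamma & \gamma \end{pmatrix} . \tag{4.2.8}$$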
The free parameter β can have any sign, but |β| < 1. We will see in Section 4.6
that not only the speed of light but also the complete set of Maxwell’s equations
are invariant under these transformations.
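As a numerical illustration, the following Python sketch builds the matrix of Eq. (4.2.8) and checks that it satisfies the defining condition (4.2.5). The function names `boost_3` and `lorentz_residual` and the sample value β = 0.6 are choices made here for illustration and are not taken from the text; rows and columns are ordered 1, 2, 3, 0, as in Eq. (4.2.8).

```python
import math

# Index order (1, 2, 3, 0), matching the rows and columns of Eq. (4.2.8).
ETA = [[1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, -1.0]]


def boost_3(beta):
    """Matrix of Eq. (4.2.8): a boost along the 3-direction with |beta| < 1."""
    if abs(beta) >= 1.0:
        raise ValueError("Eq. (4.2.7) requires |beta| < 1")
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    return [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, gamma, beta * gamma],
            [0.0, 0.0, beta * gamma, gamma]]


def lorentz_residual(lam):
    """Largest deviation of eta_{mu nu} L^mu_rho L^nu_sigma from eta_{rho sigma}."""
    worst = 0.0
    for rho in range(4):
        for sigma in range(4):
            lhs = sum(ETA[mu][nu] * lam[mu][rho] * lam[nu][sigma]
                      for mu in range(4) for nu in range(4))
            worst = max(worst, abs(lhs - ETA[rho][sigma]))
    return worst


residual = lorentz_residual(boost_3(0.6))
print("largest violation of Eq. (4.2.5):", residual)
```

The residual should vanish up to rounding error for any |β| < 1, while a matrix that is not a Lorentz transformation (for instance the same matrix multiplied by 2) gives a residual of order unity.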
I’ll pause to mention that there are other Lorentz transformations that cannot
be obtained by a gradual variation of parameters from a Lorentz transformation
that does nothing. These include the space inversion $x^3 \to -x^3$ with $x^\mu$
unchanged for $\mu \neq 3$, and the time reversal $x^0 \to -x^0$ with $x^\mu$ unchanged
for $\mu \neq 0$. (Space inversion is often described as a change of sign of all three
Cartesian coordinates, but this transformation can be produced by the reversal
of any one coordinate, followed by an ordinary rotation of 180° around that
coordinate direction.) As will be discussed in Section 6.5, experiments in the
1950s showed that invariance under space inversion is only a good approxi-
mation, being violated by the very weak forces that lead to the decay of some
radioactive nuclei and elementary particles, and in Section 2.4 we have already
mentioned that the same is true of invariance under time reversal. We will be
concerned here only with transformations that can be obtained by a gradual
variation of parameters from a Lorentz transformation that does nothing. (These
are known as proper orthochronous Lorentz transformations – proper, meaning
that the determinant of the matrix $\Lambda^\mu{}_\nu$ is unity, and orthochronous, meaning
that $\Lambda^0{}_0 > 0$. In this book, I will refer to proper orthochronous Lorentz
transformations simply as “proper.”)
Now let us consider the physical meaning of β. Consider a tiny body at rest
in the frame of reference with coordinates $x^\mu$. At two different times, separated
by a time difference $\Delta t$, the body is at the same position, so the separation of
positions is $\Delta x^i = 0$ with i = 1, 2, 3. Now suppose we look at the same body in
the frame of reference with coordinates $x'^\mu$, given by the Lorentz transformation
(4.2.6). The 1- and 2-coordinates will be unaffected, but the 3-coordinates and
the times in the new frame of reference will be separated by
$$\Delta x'^3 = \Lambda^3{}_0\,\Delta x^0 = \beta\gamma\,\Delta x^0 , \qquad \Delta t' = \Delta x'^0/c = \Lambda^0{}_0\,\Delta x^0/c = \gamma\,\Delta x^0/c . \tag{4.2.9}$$
Maximum Speed
From Eqs. (4.2.7) and (4.2.10), we see that it is not possible for a finite Lorentz
transformation to take a body from rest to a velocity greater than or even as
large as c. (In Section 4.7 we will see that causality, the principle that effects
cannot precede causes, rules out any signal traveling faster than light.) This
may be surprising, because we can perform a pair of Lorentz transformations,
each of which gives a body at rest a velocity in the 3-direction greater than c/2,
which if these were Galilean transformations of the form (4.1.1) when combined
would give a Galilean transformation from rest to a velocity greater than c. But
velocities add differently in Einsteinian relativity.
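Suppose we perform a Lorentz transformation $x^\mu \to x'^\mu = \Lambda_1{}^\mu{}_\nu x^\nu$ that
gives a particle initially at rest a velocity $c\beta_1$ in the 3-direction and then perform
a Lorentz transformation $x'^\mu \to x''^\mu = \Lambda_2{}^\mu{}_\nu x'^\nu$ that gives the particle that was
initially at rest a velocity $c\beta_2$ in the same direction. The combined effect is a
linear transformation
$$x^\mu \to x''^\mu = \Lambda_2{}^\mu{}_\rho\,\Lambda_1{}^\rho{}_\nu\,x^\nu = \Lambda_{21}{}^\mu{}_\nu\,x^\nu ,$$
where
$$\Lambda_{21}{}^\mu{}_\nu \equiv \Lambda_2{}^\mu{}_\rho\,\Lambda_1{}^\rho{}_\nu .$$
In matrix notation, this means that
$$\Lambda_{21}{}^\mu{}_\nu = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \gamma_2 & \beta_2\gamma_2 \\ 0 & 0 & \beta_2\gamma_2 & \gamma_2 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \gamma_1 & \beta_1\gamma_1 \\ 0 & 0 & \beta_1\gamma_1 & \gamma_1 \end{pmatrix}$$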
Thus the relativistic rule for combining velocities is that a Lorentz transformation
with velocity $v_1 = c\beta_1$ followed by a Lorentz transformation with velocity
$v_2 = c\beta_2$ in the same direction gives a Lorentz transformation with velocity
$$v_{21} = c\beta_{21} = \frac{v_1 + v_2}{1 + v_1 v_2/c^2} . \tag{4.2.11}$$
Even if $v_1$ and $v_2$ both approach c, the combined velocity $v_{21}$ approaches c,
not 2c.
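The rule (4.2.11) can be checked numerically by multiplying two boost matrices and reading off the velocity of the product. The Python sketch below keeps only the 2 × 2 block of Eq. (4.2.8) that acts on the 3- and 0-coordinates; the helper names and the sample velocities 0.6c and 0.7c are illustrative assumptions, not taken from the text.

```python
import math


def boost_block(beta):
    """2x2 block of Eq. (4.2.8) acting on the (3, 0) coordinates."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    return [[gamma, beta * gamma],
            [beta * gamma, gamma]]


def matmul2(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def combined_beta(beta1, beta2):
    """beta of the product Lambda_2 Lambda_1, read off as (beta*gamma)/gamma."""
    product = matmul2(boost_block(beta2), boost_block(beta1))
    return product[0][1] / product[1][1]


def add_velocities(beta1, beta2):
    """Eq. (4.2.11), with velocities expressed in units of c."""
    return (beta1 + beta2) / (1.0 + beta1 * beta2)


beta1, beta2 = 0.6, 0.7
from_matrices = combined_beta(beta1, beta2)
from_formula = add_velocities(beta1, beta2)
print(from_matrices, from_formula)
```

Both numbers should agree to rounding error and stay below 1, even though the Galilean sum 0.6 + 0.7 exceeds 1.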
General Directions
Of course there is nothing special about the 3-direction. Whatever velocity
vector v is given to a body at rest by a given Lorentz transformation, we can
always rotate our coordinate axes so that the 3-direction is in the direction of v.
The Lorentz transformation consequently will have the form (4.2.6), but with
β = |v|/c. If we rotate our coordinate axes back to their original direction,
we find
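$$\Lambda^i{}_j = \delta_{ij} + (\gamma - 1)\hat{v}_i\hat{v}_j , \qquad \Lambda^i{}_0 = \Lambda^0{}_i = \gamma v_i/c , \qquad \Lambda^0{}_0 = \gamma , \tag{4.2.12}$$
where i and j run over the spatial coordinate indices 1, 2, 3; $\delta_{ij}$ is the unit
matrix,
$$\delta_{ij} = \begin{cases} 1 & i = j \\ 0 & i \neq j , \end{cases}$$
and
$$\gamma = 1/\sqrt{1 - \mathbf{v}^2/c^2} . \tag{4.2.13}$$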
Here $\hat{\mathbf{v}}$ is the unit vector $\mathbf{v}/|\mathbf{v}|$. (To check, note that for $\mathbf{v}$ in the 3-direction,
Eq. (4.2.12) gives $\Lambda^1{}_1 = \Lambda^2{}_2 = 1$, $\Lambda^3{}_3 = 1 + (\gamma - 1) = \gamma$, and so on for
the other components.) Performing a Lorentz transformation with velocity $\mathbf{v}_1$
followed by a Lorentz transformation with velocity $\mathbf{v}_2$ in general does not give
a Lorentz transformation of the form (4.2.12), unless $\mathbf{v}_1$ and $\mathbf{v}_2$ happen to be
in the same direction. In general we get a rotation, followed by a Lorentz
transformation of the form (4.2.12). This is not a contradiction, because
rotations satisfy the condition (4.2.5) and therefore can be considered as
belonging to a subgroup of the group of Lorentz transformations. Lorentz
transformations of the special form (4.2.12) are often distinguished from more
general Lorentz transformations by calling them boosts.
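A short numerical sketch may make the last point concrete. A boost of the form (4.2.12) is a symmetric matrix, so one simple diagnostic is to compose two boosts in perpendicular directions and test whether the product is still symmetric. The Python code below does this; the function names, the index order (1, 2, 3, 0), and the sample velocities are assumptions made for illustration.

```python
import math

ETA_DIAG = [1.0, 1.0, 1.0, -1.0]   # diagonal of eta, index order (1, 2, 3, 0)


def boost(beta_vec):
    """Eq. (4.2.12) for velocity v = c * beta_vec (a 3-component sequence)."""
    b2 = sum(b * b for b in beta_vec)
    lam = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    if b2 == 0.0:
        return lam                      # zero velocity: the identity
    gamma = 1.0 / math.sqrt(1.0 - b2)
    for i in range(3):
        for j in range(3):
            lam[i][j] += (gamma - 1.0) * beta_vec[i] * beta_vec[j] / b2
        lam[i][3] = lam[3][i] = gamma * beta_vec[i]
    lam[3][3] = gamma
    return lam


def matmul(a, b):
    """Product of two 4x4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def lorentz_residual(lam):
    """Largest violation of the condition (4.2.5)."""
    worst = 0.0
    for rho in range(4):
        for sigma in range(4):
            lhs = sum(ETA_DIAG[mu] * lam[mu][rho] * lam[mu][sigma]
                      for mu in range(4))
            target = ETA_DIAG[rho] if rho == sigma else 0.0
            worst = max(worst, abs(lhs - target))
    return worst


def asymmetry(lam):
    """Largest |L[i][j] - L[j][i]|; zero for a boost of the form (4.2.12)."""
    return max(abs(lam[i][j] - lam[j][i]) for i in range(4) for j in range(4))


# Boost along the 1-direction first, then along the 3-direction.
product = matmul(boost((0.0, 0.0, 0.5)), boost((0.5, 0.0, 0.0)))
print(lorentz_residual(product), asymmetry(product))
```

The product still satisfies (4.2.5), so it is a Lorentz transformation, but it is not symmetric, which signals that it contains a rotation in addition to a boost.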
Clocks
Consider two ticks separated by a time interval T of a small clock at rest in
the frame of reference with coordinates $x^\mu$. The spacetime coordinates of these
ticks are separated by $\Delta x^i = 0$, $\Delta x^0 = cT$, as usual with i = 1, 2, 3. Now
perform a Lorentz transformation (4.2.6) that gives the clock a velocity v = βc
in the 3-direction. The 1- and 2-coordinates of the clock will be unaffected,
while in the new reference frame the 3- and 0-coordinates of the clock at these
two ticks will be separated by
$$\Delta x'^3 = \Lambda^3{}_0\,\Delta x^0 = \gamma\beta cT = \gamma vT , \tag{4.3.1}$$
$$\Delta x'^0 = \Lambda^0{}_0\,\Delta x^0 = \gamma cT , \tag{4.3.2}$$
where as before $\gamma \equiv (1 - v^2/c^2)^{-1/2}$. From Eq. (4.3.2) we see that the time
interval between ticks of the moving clock is lengthened to $T' = \Delta x'^0/c = \gamma T$.
This is what is seen by the observer who sees the clock moving with velocity v;
an observer who travels with the clock sees its ticks separated by T, just as if it
were at rest.
There is another way of getting this result without ever looking at a specific
Lorentz transformation. If the time interval between ticks of a clock at rest is T,
then the spacetime separation between ticks at rest has components $\Delta x^i = 0$,
$\Delta x^0 = cT$, which satisfy $\eta_{\mu\nu}\,\Delta x^\mu\,\Delta x^\nu = -c^2T^2$, where $\eta_{\mu\nu}$ is again the diagonal
matrix (4.2.2) with elements 1, 1, 1, −1 on the diagonal, and the summation
convention is again in force. If an observer sees the clock moving with velocity
$\mathbf{v}$ in any direction and measures a time $T'$ between ticks, then in the coordinates
$x'^\mu$ used by this observer the spacetime separation between ticks has components
$\Delta\mathbf{x}' = \mathbf{v}T'$, $\Delta x'^0 = cT'$, which satisfy $\eta_{\mu\nu}\,\Delta x'^\mu\,\Delta x'^\nu = (\mathbf{v}^2 - c^2)T'^2$.
But, as discussed in the previous section, Lorentz transformations are designed
to keep this quantity invariant, in the sense that $\eta_{\mu\nu}\,\Delta x'^\mu\,\Delta x'^\nu = \eta_{\mu\nu}\,\Delta x^\mu\,\Delta x^\nu$.
Therefore $(\mathbf{v}^2 - c^2)T'^2 = -c^2T^2$, so as before $T' = T/\sqrt{1 - \mathbf{v}^2/c^2}$.
This lengthening of course applies to any kind of time interval, not just
ticks of a clock. It is vividly displayed in the decay of unstable particles
in cosmic rays. The collision of atomic nuclei in primary cosmic rays with
atoms in the upper atmosphere produces particles known as muons, resem-
bling electrons but about 210 times heavier. At rest, muons are observed to
decay with a mean lifetime 2.2 microseconds, but although they are typically
produced at an altitude of about 15 km, a good fraction of these muons reach
the ground before decaying, so even traveling near the speed of light they
must have survived for a time (as measured on the Earth’s surface) at least
15 km/300 000 km/sec = 50 microseconds, and more if they reach the ground
at a slant. If there were no relativistic time dilation, then the probability
of a particle with a mean lifetime 2.2 microseconds surviving as long as
50 microseconds would be $\exp(-50/2.2) \approx 1.3 \times 10^{-10}$. Evidently the life of
these muons is extended by their motion by a factor γ at least of order 10, which
requires their velocity to be within a fraction of a percent of the speed of light.
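The muon estimate can be reproduced with a few lines of Python. This is only a sketch of the arithmetic in the paragraph above: it assumes exponential decay with the quoted mean lifetime and treats the muon as moving at essentially the speed of light, and the function names and the sample value γ = 10 are chosen here for illustration.

```python
import math

MEAN_LIFETIME_US = 2.2        # muon mean lifetime at rest, microseconds
ALTITUDE_KM = 15.0            # typical production altitude
C_KM_PER_US = 0.3             # 300 000 km/sec expressed in km per microsecond


def survival_probability(gamma):
    """Chance that a muon moving at essentially c survives the trip down.

    gamma = 1 reproduces the hypothetical case of no time dilation.
    """
    flight_time_us = ALTITUDE_KM / C_KM_PER_US
    return math.exp(-flight_time_us / (gamma * MEAN_LIFETIME_US))


def speed_for_gamma(gamma):
    """v/c for a given gamma, from gamma = 1/sqrt(1 - v^2/c^2)."""
    return math.sqrt(1.0 - 1.0 / gamma**2)


without_dilation = survival_probability(1.0)
with_dilation = survival_probability(10.0)
print(without_dilation, with_dilation, speed_for_gamma(10.0))
```

With γ of order 10 the survival probability becomes an appreciable fraction, and the corresponding v/c differs from 1 by less than one percent.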
Rulers
Next consider a ruler of length L at rest, lying along the 3-direction in a frame
of reference with coordinates $x^\mu$. At any fixed time its ends are separated in this
frame by $\Delta x^3 = L$, $\Delta x^0 = 0$, and $\Delta x^1 = \Delta x^2 = 0$. Now perform a Lorentz
transformation (4.2.6) that gives the ruler a velocity v (positive or negative) in
the 3-direction. The spacetime coordinates $x'^\mu$ in the new reference frame will
be separated by
$$\Delta x'^3 = \Lambda^3{}_3\,L = \gamma L , \qquad \Delta x'^1 = \Delta x'^2 = 0 , \tag{4.3.3}$$
$$\Delta x'^0 = \Lambda^0{}_3\,L = \gamma vL/c . \tag{4.3.4}$$
But Eq. (4.3.4) shows that in this frame the two ends of the ruler have been
traveling for times that differ by an amount $\Delta t' = \gamma vL/c^2$, so to find the
difference in the space coordinates at the same time $t'$, we have to subtract $v\,\Delta t'$
from $\Delta x'^3$. The spatial separation of the ends of the ruler at the same time $t'$ is
then
$$\Delta x'^3 - v\,\Delta t' = \gamma L - \gamma Lv^2/c^2 = L/\gamma . \tag{4.3.5}$$
This contraction of lengths in the direction of motion is similar to what Fitzger-
ald and Lorentz had proposed as the cause of the failure to measure the velocity
of the Earth through the aether.
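The steps leading to Eq. (4.3.5) can be followed numerically. The Python sketch below works in units with c = 1 and uses a hypothetical helper name, `contracted_length`; it is meant only to mirror the three equations above.

```python
import math


def contracted_length(rest_length, beta):
    """Length of a moving ruler, following Eqs. (4.3.3)-(4.3.5) with c = 1."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    dx3 = gamma * rest_length            # Eq. (4.3.3): separation of the ends
    dx0 = gamma * beta * rest_length     # Eq. (4.3.4): c times the time offset
    # Eq. (4.3.5): subtract v * (time offset) to compare the ends at equal time.
    return dx3 - beta * dx0


length = contracted_length(1.0, 0.6)
print(length)
```

For β = 0.6 one has γ = 1.25, so a ruler of unit rest length should come out with length 1/γ = 0.8.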
Light Waves
We saw in Eq. (3.1.3) that each component of the electromagnetic fields in a
light wave in empty space can be written as a sum of terms proportional to
$e^{\pm i\phi}$, where φ is the phase:
$$\phi = \mathbf{k}\cdot\mathbf{x} - \omega t . \tag{4.3.6}$$
The factor $(1 - v/c)^{-1}$ is the usual Doppler shift, which applies in both
non-relativistic and relativistic contexts, and indeed was first observed in sound
waves. If the source of the light wave is at rest in the reference frame with
coordinates $x^\mu$, then the Lorentz transformation $\Lambda^\mu{}_\nu$ gives the source a velocity
v in the 3-direction, which for v positive is along the direction of the light wave
and hence toward whoever is observing the wave. If the time interval between
wave crests emitted by the source at rest is 1/ν, then, apart from relativistic
effects, the observer will see these crests arrive at a time interval less by a factor
1 − v/c, since the distance that each crest has to travel is less than that for the
previous crest by a factor 1 − v/c, and hence the observed frequency, the rate
at which wave crests arrive at the observer, is increased (apart from relativistic
time dilation) by a factor 1/(1 − v/c). For negative v the source is moving away
from the observer, and the factor $(1 - v/c)^{-1}$ gives a decrease in frequency, as
seen in the redshift of light from receding galaxies at great distances.
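The interplay of the two factors can be illustrated numerically. The sketch below assumes that the full frequency shift is the classical factor 1/(1 − v/c) multiplied by the time-dilation factor $\sqrt{1 - v^2/c^2}$; that combined formula is not reproduced in this excerpt, so it is an assumption here, although it agrees with the expression for ν± used in the next section. The function names are illustrative.

```python
import math


def classical_factor(beta):
    """Doppler factor 1/(1 - v/c) without time dilation; beta = v/c."""
    return 1.0 / (1.0 - beta)


def relativistic_factor(beta):
    """Assumed full shift: time dilation times the classical factor."""
    return math.sqrt(1.0 - beta * beta) / (1.0 - beta)


approaching = relativistic_factor(0.6)     # source moving toward the observer
receding = relativistic_factor(-0.6)       # source moving away
print(classical_factor(0.6), approaching, receding)
```

For an approaching source the frequency is raised, for a receding one it is lowered, and in both cases time dilation reduces the result relative to the classical factor alone.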
Einstein published two papers on relativity theory in 1905. Shortly after the first
paper, which is cited in Section 4.2, he published in the same journal another
paper10 with the title “Does the inertia of a body depend on its energy content?”
This is often referred to as “the $E = mc^2$ paper,” but as can be gathered from
the title, it would be better called “the $m = E/c^2$ paper.” In this paper, Einstein
showed that the mass of a body decreases by an amount $E/c^2$ when the body
emits radiation with energy E. Here “mass” was defined as inertial mass, by the
prescription that, as in Newtonian mechanics, the kinetic energy of a particle of
mass m with velocity $v \ll c$ is $mv^2/2$.
the decaying particle is traveling with velocity v in the +3-direction both before
and after the decay. Suppose that $v \ll c$. Before it decays, the total energy of
the particle is its internal energy $E_A$ plus its kinetic energy:
$$E_{\rm before} = E_A + \frac{1}{2} m_A v^2 .$$
According to Eq. (4.3.11), in this reference frame the frequencies of the photons
that travel in the +3- and −3-directions are respectively
$$\nu_\pm = \frac{(1 \pm v/c)\,\nu}{\sqrt{1 - v^2/c^2}} = \frac{(1 \pm v/c)}{\sqrt{1 - v^2/c^2}}\,(E_A - E_B)/2h$$
so the total energy of the final state is
$$E_{\rm after} = E_B + \frac{1}{2} m_B v^2 + h\nu_+ + h\nu_- = E_B + \frac{1}{2} m_B v^2 + (E_A - E_B)/\sqrt{1 - v^2/c^2} ,$$
or, since we are assuming that $v \ll c$,
$$E_{\rm after} = E_B + \frac{1}{2} m_B v^2 + (E_A - E_B)(1 + v^2/2c^2) .$$
The conservation of energy requires that $0 = E_{\rm before} - E_{\rm after}$, so
$$0 = E_A - E_B + \frac{1}{2}(m_A - m_B)v^2 - (E_A - E_B)(1 + v^2/2c^2) .$$
In order for this to be possible with velocity-independent internal energies and
masses, we must have
$$m_A - m_B = (E_A - E_B)/c^2 \tag{4.4.1}$$
as was to be proved.
Despite our use of the approximation $v \ll c$, Eq. (4.4.1) is not an approximate
result. No one can stop us from making a Lorentz transformation with an
arbitrarily small velocity, so we can reduce any error we have made along the
way in deriving Eq. (4.4.1) to be as small as we like, simply by making v/c
sufficiently small.
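The claim that the error can be made as small as we like can be seen numerically. The Python sketch below, in units with c = 1, imposes Eq. (4.4.1) and evaluates the energy balance with the Newtonian kinetic energies and the unexpanded photon energy $(E_A - E_B)/\sqrt{1 - v^2/c^2}$. The function name and the sample energies are assumptions made for illustration.

```python
import math


def energy_imbalance(beta, e_a=2.0, e_b=1.0):
    """E_before - E_after with c = 1 and m_A - m_B fixed by Eq. (4.4.1)."""
    mass_difference = e_a - e_b                       # Eq. (4.4.1) with c = 1
    # Sum of the two photon energies, h nu_+ + h nu_-, before expanding in v.
    photon_energy = (e_a - e_b) / math.sqrt(1.0 - beta * beta)
    return (e_a - e_b) + 0.5 * mass_difference * beta**2 - photon_energy


for beta in (0.1, 0.01, 0.001):
    # Compare the leftover with the size of the v^2 terms that were matched.
    print(beta, energy_imbalance(beta) / beta**2)
```

The leftover imbalance, measured against the $v^2$ terms that fix the mass difference, shrinks in proportion to $v^2/c^2$, which is why the approximation does not compromise Eq. (4.4.1).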
Equation (4.4.1) is not yet the famous $E = mc^2$. As long as we are dealing
only with a single body changing its state, as in the above Einstein thought
experiment, it is only changes in its energy that matter for the conservation of
energy, not the energy itself, and we might as well define the energy of any one
state, say the lowest state, as $mc^2$. But $E = mc^2$ goes beyond Einstein’s result
(4.4.1) when we consider a reaction involving a number of bodies, coming into
and going out of existence, and exchanging energy with each other.
In order for this to give the Newtonian result $mv^2/2$ for the kinetic energy, we
must choose the constant of proportionality between $p^0$ and E so that $E = cp^0$,
and hence
$$E(\mathbf{v}) = mc^2\gamma = mc^2 + m\mathbf{v}^2/2 + 3m\mathbf{v}^4/8c^2 + \cdots . \tag{4.4.6}$$
Note that we cannot leave out the term $mc^2$ in the energy (4.4.6), or change it
to any other constant term. If we did, then $p^\mu$ would not satisfy the condition
(4.4.2) for a four-vector, and the conservation of energy and momentum would
not be Lorentz invariant.
We can eliminate the velocity $\mathbf{v}$ from Eqs. (4.4.5) and (4.4.6) to derive a
relation between energy and momentum. Since $\gamma^2(1 - \mathbf{v}^2/c^2) = 1$, we have
$E^2 - \mathbf{p}^2c^2 = m^2c^4$, or in other words,
$$E = \sqrt{\mathbf{p}^2c^2 + m^2c^4} . \tag{4.4.7}$$
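A quick numerical check of Eq. (4.4.7) is sketched below in Python. Eq. (4.4.5) is not reproduced in this excerpt; the code assumes it has the form p = mγv, which is consistent with the remark that γ appears in both (4.4.5) and (4.4.6). The function names and sample values are illustrative, and units with c = 1 are used by default.

```python
import math


def gamma_factor(v, c=1.0):
    """gamma = 1/sqrt(1 - v^2/c^2)."""
    return 1.0 / math.sqrt(1.0 - (v / c) ** 2)


def energy(m, v, c=1.0):
    """E = m c^2 gamma, as in Eq. (4.4.6)."""
    return m * c**2 * gamma_factor(v, c)


def momentum(m, v, c=1.0):
    """p = m gamma v; assumed form of Eq. (4.4.5)."""
    return m * gamma_factor(v, c) * v


m, v = 2.0, 0.6
invariant = energy(m, v) ** 2 - momentum(m, v) ** 2     # E^2 - p^2 c^2, c = 1
kinetic_slow = energy(m, 1.0e-3) - m                    # E - m c^2 at small v
print(invariant, kinetic_slow)
```

The first number should equal $m^2c^4$ (here 4 in these units), and the second should be close to the Newtonian kinetic energy $mv^2/2$.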
$E = mc^2$
Einstein suggested in his 1905 paper that the reduction of mass accompanying
the emission of energy might be detected by the study of radioactive salts.
This proved difficult, because it is not easy to measure accurately the atomic
weights of different states of a radioactive isotope. In the early 1930s it became
possible to verify Einstein’s relation between energy and mass by studying
reactions among stable isotopes, such as $^1\mathrm{H} + {}^7\mathrm{Li} \to 2\,{}^4\mathrm{He}$. The masses of
the atoms of $^1$H, $^7$Li, and $^4$He are respectively 1.007825 $m_1$, 7.016003 $m_1$, and
4.002603 $m_1$, where $m_1$ is the mass of unit atomic weight, defined today as
1/12 the mass of the carbon isotope $^{12}$C. The mass lost in this reaction is thus
$\Delta m = 0.018622\,m_1 = 3.09 \times 10^{-26}$ g $= 17.3$ MeV$/c^2$. Thus it is expected
that the kinetic energies of the two $^4$He nuclei in the final state should
exceed the kinetic energies of the $^1$H and $^7$Li nuclei in the initial state by
$\Delta m\,c^2 = 17.3$ MeV, and this is observed, verifying $E = mc^2$ and not just
Eq. (4.4.1).
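The arithmetic of this example is easy to reproduce. In the Python sketch below the atomic masses are those quoted above, while the two conversion factors for the unit atomic weight $m_1$ (in grams and in MeV/c²) are standard tabulated values supplied here as assumptions, not taken from the text.

```python
# Atomic masses in units of m_1, as quoted in the text.
MASS_H1 = 1.007825
MASS_LI7 = 7.016003
MASS_HE4 = 4.002603

# Assumed conversion factors for one unit of atomic weight (standard tables).
M1_IN_GRAMS = 1.66054e-24
M1_IN_MEV = 931.494          # m_1 c^2 in MeV

mass_lost = MASS_H1 + MASS_LI7 - 2 * MASS_HE4      # in units of m_1
mass_lost_grams = mass_lost * M1_IN_GRAMS
energy_released_mev = mass_lost * M1_IN_MEV

print(mass_lost, mass_lost_grams, energy_released_mev)
```

The three printed numbers correspond to the mass defect in units of $m_1$, the same quantity in grams, and the kinetic energy released in MeV.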
Force
Because of the presence of the factor γ in Eqs. (4.4.5) and (4.4.6), the quantity
mγ is sometimes called the relativistic mass. I will not use this terminology,
because it suggests that we can calculate the acceleration produced by any force
just by replacing m in Newton’s F = ma with mγ , which is not the case. To
find how bodies respond to forces in special relativity, we need to formulate a
general Lorentz-invariant version of Newton’s second law.
Though the time coordinate is Galilean invariant it is not Lorentz invariant,
so neither is the time derivative d/dt. To replace the time derivative in Newton’s
second law, we note that dτ is Lorentz invariant, where
$$d\tau \equiv \sqrt{-\eta_{\mu\nu}\,dx^\mu\,dx^\nu}/c = \sqrt{dt^2 - d\mathbf{x}^2/c^2} = \sqrt{dt^2 - dt^2\,\mathbf{v}^2/c^2} = dt/\gamma . \tag{4.4.8}$$
4.5 Photons as Particles
As we saw in Section 3.2, Einstein in 1905 proposed that the energy of radiation
of a given frequency ν is always an integer multiple of hν. This led to the further
conjecture that the radiation consists of particles, later called photons, each with
energy hν. A state with energy nhν would then be interpreted as consisting of
n photons.
Photon Momentum
If we suppose that photons are real particles, then we need to work out the
relation between their energy and the magnitude of their momentum. In order
for the conservation of energy and momentum to be Lorentz invariant when
photons interact with other particles, the photon energy E and momentum p
must form a four-vector $p^\mu$, with $p^0 = E/c$, just as for other particles. That
is, in changing coordinates from $x^\mu$ to $x'^\mu = \Lambda^\mu{}_\nu x^\nu$, the photon momentum
four-vector is changed to $p'^\mu = \Lambda^\mu{}_\nu p^\nu$. But we cannot work out formulas for
Compton Scattering
If photons carry momentum, then when a photon is scattered by an electron
at rest the electron should recoil. Suppose the incoming and outgoing photons
have wave vectors $\mathbf{k}$ and $\mathbf{k}'$, respectively. According to Eq. (4.4.7), the energy
of an electron of momentum $\mathbf{p}_e$ is given by
$$E_e = \sqrt{\mathbf{p}_e^2c^2 + m_e^2c^4} . \tag{4.5.4}$$
The conservation of energy in the scattering of a photon by an electron at rest
requires that
$$c\hbar|\mathbf{k}| + m_ec^2 = c\hbar|\mathbf{k}'| + \sqrt{\mathbf{p}_e^2c^2 + m_e^2c^4} ,$$
where $\mathbf{p}_e$ is the momentum of the recoiling electron. According to Eq. (4.5.2),
the conservation of momentum gives
$$\mathbf{p}_e = \hbar\mathbf{k} - \hbar\mathbf{k}' ,$$
so the conservation of energy becomes
$$c\hbar|\mathbf{k}| + m_ec^2 = c\hbar|\mathbf{k}'| + \sqrt{c^2\hbar^2\left(|\mathbf{k}|^2 + |\mathbf{k}'|^2 - 2\cos\theta\,|\mathbf{k}||\mathbf{k}'|\right) + m_e^2c^4} ,$$
where θ is the angle between the initial and final photon wave vectors. Subtracting
$c\hbar|\mathbf{k}'|$ from both sides and squaring, we have
$$c^2\hbar^2\left(\mathbf{k}^2 - 2|\mathbf{k}||\mathbf{k}'| + \mathbf{k}'^2\right) + 2c^3\hbar m_e\left(|\mathbf{k}| - |\mathbf{k}'|\right) + m_e^2c^4 = c^2\hbar^2\left(\mathbf{k}^2 + \mathbf{k}'^2 - 2\cos\theta\,|\mathbf{k}||\mathbf{k}'|\right) + m_e^2c^4 .$$
Cancelling the terms $c^2\hbar^2\mathbf{k}^2$, $c^2\hbar^2\mathbf{k}'^2$, and $m_e^2c^4$ on both sides leaves us with
$$|\mathbf{k}| - |\mathbf{k}'| = |\mathbf{k}||\mathbf{k}'|(1 - \cos\theta)\hbar/m_ec .$$
It is conventional to write this in terms of the wavelengths $\lambda = 2\pi/|\mathbf{k}|$ and
$\lambda' = 2\pi/|\mathbf{k}'|$, and $h = 2\pi\hbar$:
$$\lambda' - \lambda = (1 - \cos\theta)h/m_ec . \tag{4.5.5}$$
The quantity $h/m_ec$ equals $2.426 \times 10^{-10}$ cm, and gives the increase in wavelength
for a photon scattered at right angles to its original direction. This is
known as the Compton wavelength of the electron, in honor of Arthur Holly
Compton (1892–1962).
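The Compton formula (4.5.5) can be checked against the conservation laws from which it was derived. The Python sketch below evaluates h/mₑc, applies the formula to a 17 keV photon scattered through a right angle, and verifies that the initial and final total energies then agree. The CGS values of the constants are standard tabulated numbers supplied here as assumptions, and the function names are illustrative.

```python
import math

# CGS values of the constants; assumed from standard tables, not from the text.
H = 6.62607015e-27            # Planck's constant, erg s
M_E = 9.1093837e-28           # electron mass, g
C = 2.99792458e10             # speed of light, cm/s
ERG_PER_EV = 1.602176634e-12

COMPTON_WAVELENGTH_CM = H / (M_E * C)


def scattered_wavelength(wavelength_cm, theta):
    """Eq. (4.5.5): wavelength after scattering through angle theta (radians)."""
    return wavelength_cm + (1.0 - math.cos(theta)) * COMPTON_WAVELENGTH_CM


def energy_mismatch(wavelength_cm, theta):
    """Relative difference between the initial and final total energy."""
    hbar = H / (2.0 * math.pi)
    k_in = 2.0 * math.pi / wavelength_cm
    k_out = 2.0 * math.pi / scattered_wavelength(wavelength_cm, theta)
    # Electron momentum squared, from p_e = hbar k - hbar k'.
    p_e_sq = hbar**2 * (k_in**2 + k_out**2
                        - 2.0 * math.cos(theta) * k_in * k_out)
    initial = C * hbar * k_in + M_E * C**2
    final = C * hbar * k_out + math.sqrt(p_e_sq * C**2 + M_E**2 * C**4)
    return (initial - final) / initial


incoming = H * C / (17.0e3 * ERG_PER_EV)     # wavelength of a 17 keV photon, cm
shift = scattered_wavelength(incoming, math.pi / 2) - incoming
print(COMPTON_WAVELENGTH_CM, shift, energy_mismatch(incoming, math.pi / 2))
```

At a right angle the wavelength shift equals h/mₑc itself, and the energy mismatch should vanish to rounding accuracy, since Eq. (4.5.5) is exact rather than a low-energy approximation.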
Compton at Washington University studied the scattering of monochromatic
X-ray photons, with energy 17 keV. These photons were created by X-ray flu-
orescence: atoms of high atomic number, such as platinum, were exposed to a
beam of high-energy electrons in a tube something like the cathode ray tubes
used by Thomson (with whom Compton had worked at Cambridge). The beam
of high-energy electrons knocked electrons out of these atoms, some from inner
orbits. Then other electrons of nearly zero energy fell into these orbits, emitting
monochromatic radiation, which, as we saw in our discussion of atomic number
in Section 3.4, is at X-ray wavelengths for atoms with $Z \gg 1$. In Compton’s
experiment these photons were directed at a graphite target, where they were
scattered by an outer electron of the carbon atom. These outer electrons have
energies of the order of an eV, or at most tens of eV, negligible compared with
the 17 keV energy of the incoming X-ray photon, so they scattered the X-ray
photons just as if they were at rest. The wavelength of the scattered photon was
measured by diffraction scattering, using a single crystal as a diffraction grating.
Compton’s experiment verified Eq. (4.5.5) in 1923, giving a significant boost to
the acceptance of the quantum of light as a particle of zero mass. It was the
chemist G. N. Lewis (1875–1946) who a few years later gave this particle the
name “photon.”
There are other types of particle with zero mass. One is the graviton,
the quantum of gravitational radiation. This radiation has been observed, but
there is unfortunately no prospect of observing its quantum nature in the
foreseeable future. There are also eight types of gluons, massless particles that
in our present Standard Model are supposed to mediate strong nuclear forces.
They interact so strongly when pulled away from other strongly interacting
particles that they cannot even in principle be observed in isolation, but there is
plenty of indirect evidence of their existence.
of the second equation. So how should ρ(x, t) and J(x, t) behave under Lorentz
transformations in order for Eq. (4.6.3) to be Lorentz invariant?
It helps to put the continuity equation in a revealing four-dimensional form.
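Define a four-component quantity $J^\mu(x)$ with $J^0(x) = c\rho(x)$. Then, recalling
that $x^0 \equiv ct$, Eq. (4.6.3) reads
$$\frac{\partial}{\partial x^\mu} J^\mu(x) = 0 , \tag{4.6.4}$$
with repeated indices summed as usual over the values 1, 2, 3, 0. Now, how does
the partial derivative $\partial/\partial x^\mu$ transform if we perform a Lorentz transformation
$x^\mu \to x'^\mu = \Lambda^\mu{}_\nu x^\nu$? The chain rule of partial differentiation tells us that
$$\frac{\partial}{\partial x^\nu} = \frac{\partial x'^\mu}{\partial x^\nu}\,\frac{\partial}{\partial x'^\mu}$$
so in our case
$$\Lambda^\mu{}_\nu\,\frac{\partial}{\partial x'^\mu} = \frac{\partial}{\partial x^\nu} . \tag{4.6.5}$$
Therefore, if we suppose that $J^\mu(x)$ transforms as a four-vector under the
Lorentz transformation $x^\mu \to x'^\mu = \Lambda^\mu{}_\nu x^\nu$, in the sense that the current
$J'^\mu(x')$ measured by an observer who uses spacetime coordinates $x'^\mu$ is
$$J'^\mu(x') = \Lambda^\mu{}_\nu J^\nu(x) , \tag{4.6.6}$$
then
$$\frac{\partial}{\partial x'^\mu} J'^\mu(x') = \Lambda^\mu{}_\nu\,\frac{\partial}{\partial x'^\mu} J^\nu(x) = \frac{\partial}{\partial x^\nu} J^\nu(x) .$$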
This is the Lorentz transformation of what is called a scalar. The quantity
$\partial J^\mu/\partial x^\mu$ is seen by different observers to have the same value at the same
point in spacetime, although these observers use different spacetime coordinate
systems to label that point.
So, if an observer who uses spacetime coordinates $x^\mu$ sees $\partial J^\mu/\partial x^\mu$ to
vanish at some particular value $x_I^\mu$ of these coordinates, then an observer who
uses spacetime coordinates $x'^\mu = \Lambda^\mu{}_\nu x^\nu$ will see $\partial J'^\mu/\partial x'^\mu$ vanish at the
corresponding coordinates $x_I'^\mu = \Lambda^\mu{}_\nu x_I^\nu$. In particular, if the first observer sees
$\partial J^\mu/\partial x^\mu$ vanish everywhere, then so will any other observer whose coordinates
are related to those of the first observer by a Lorentz transformation.
So the Lorentz transformation (4.6.6) does make the conservation condition
(4.6.4) Lorentz invariant.
$$E_1 = F^{01} = -F^{10} , \qquad E_2 = F^{02} = -F^{20} , \qquad E_3 = F^{03} = -F^{30} , \tag{4.6.7}$$
$$B_1 = F^{23} = -F^{32} , \qquad B_2 = F^{31} = -F^{13} , \qquad B_3 = F^{12} = -F^{21} , \tag{4.6.8}$$
$$\frac{4\pi}{c} J^0 = \nabla\cdot\mathbf{E} = \frac{\partial}{\partial x^1} F^{01} + \frac{\partial}{\partial x^2} F^{02} + \frac{\partial}{\partial x^3} F^{03} = \frac{\partial F^{0\nu}}{\partial x^\nu} .$$
So, in this notation all of the inhomogeneous Maxwell equations (4.6.1) can be
summarized in the single four-component equation
$$\frac{\partial}{\partial x^\nu} F^{\mu\nu}(x) = \frac{4\pi}{c} J^\mu(x) . \tag{4.6.9}$$
It is now almost obvious how to make the inhomogeneous Maxwell equations
Lorentz invariant. We suppose that under a Lorentz transformation $x^\mu \to x'^\mu =
\Lambda^\mu{}_\nu x^\nu$ the field $F^{\mu\nu}(x)$ transforms like $J^\mu(x)$, but with a pair of four-valued
indices. That is, the observer who uses coordinates $x'^\mu$ measures electric and
magnetic fields with
Upstairs, Downstairs
We still have to verify that, with electromagnetic fields obeying the trans-
formation rule (4.6.10) that makes the inhomogeneous Maxwell equations
(4.6.1) Lorentz invariant, the homogeneous Maxwell equations (4.6.2) are also
Lorentz invariant. To check this, we need to widen our ideas about vectors and
tensors.
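In general, we define a four-vector field $V^\mu(x)$ as a quantity that has the same
Lorentz-transformation property as $x^\mu$ or $p^\mu$ or $J^\mu$:
$$V^\mu(x) \to V'^\mu(x') = \Lambda^\mu{}_\nu V^\nu(x) .$$
There is a different kind of four-vector field, conventionally written with a lower
index, that transforms according to
$$U_\mu(x) \to U'_\mu(x') = \Lambda_\mu{}^\nu\,U_\nu(x) , \tag{4.6.11}$$
where $\Lambda_\mu{}^\nu$ is the transposed inverse of the matrix $\Lambda^\mu{}_\nu$ in the sense that
$$\Lambda_\mu{}^\rho\,\Lambda^\mu{}_\sigma = \Lambda^\rho{}_\mu\,\Lambda_\sigma{}^\mu = \delta^\rho_\sigma \equiv \begin{cases} 1 & \rho = \sigma \\ 0 & \rho \neq \sigma . \end{cases} \tag{4.6.12}$$
The classic example of a vector that is naturally defined with a lower index is
the partial derivative. If we multiply Eq. (4.6.5) with $\Lambda_\rho{}^\nu$, sum over the repeated
index ν, and use Eq. (4.6.12), we find
$$\frac{\partial}{\partial x'^\rho} = \Lambda_\rho{}^\nu\,\frac{\partial}{\partial x^\nu} . \tag{4.6.13}$$
It is trivial to calculate the transposed inverse $\Lambda_\mu{}^\nu$ of any given Lorentz
transformation $\Lambda^\mu{}_\nu$. To see this, recall the defining characteristic (4.2.5) of
Lorentz transformations:
$$\eta_{\mu\nu}\,\Lambda^\mu{}_\rho\,\Lambda^\nu{}_\sigma = \eta_{\rho\sigma} .$$
Multiplying with $\Lambda_\kappa{}^\rho$, summing over ρ, and using Eq. (4.6.12) gives
$$\eta_{\kappa\nu}\,\Lambda^\nu{}_\sigma = \eta_{\rho\sigma}\,\Lambda_\kappa{}^\rho . \tag{4.6.14}$$
That is, for i and j each running over 1, 2, 3:
$$\Lambda_i{}^j = \Lambda^i{}_j , \qquad \Lambda_0{}^j = -\Lambda^0{}_j , \qquad \Lambda_i{}^0 = -\Lambda^i{}_0 , \qquad \Lambda_0{}^0 = \Lambda^0{}_0 .$$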
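The sign rule just stated can be tested numerically: flipping the sign of the mixed space-time components of a Lorentz transformation should produce a matrix that satisfies Eq. (4.6.12). The Python sketch below does this for a boost along the 3-direction, with rows and columns ordered 1, 2, 3, 0; the function names and the sample value β = 0.6 are assumptions made for illustration.

```python
import math

ETA_DIAG = [1.0, 1.0, 1.0, -1.0]   # diagonal of eta, index order (1, 2, 3, 0)


def boost_3(beta):
    """Boost along the 3-direction; entry [mu][nu] holds Lambda^mu_nu."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    return [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, gamma, beta * gamma],
            [0.0, 0.0, beta * gamma, gamma]]


def transposed_inverse(lam):
    """Entry [mu][nu] holds Lambda_mu^nu, obtained from Eq. (4.6.14).

    Since eta is diagonal with entries +1 or -1, Eq. (4.6.14) reduces to
    multiplying each component by eta_{mu mu} * eta_{nu nu}.
    """
    return [[ETA_DIAG[mu] * ETA_DIAG[nu] * lam[mu][nu] for nu in range(4)]
            for mu in range(4)]


def inverse_residual(lam):
    """Largest deviation of Lambda_mu^rho Lambda^mu_sigma from delta^rho_sigma."""
    bar = transposed_inverse(lam)
    worst = 0.0
    for rho in range(4):
        for sigma in range(4):
            total = sum(bar[mu][rho] * lam[mu][sigma] for mu in range(4))
            worst = max(worst, abs(total - (1.0 if rho == sigma else 0.0)))
    return worst


deviation = inverse_residual(boost_3(0.6))
print(deviation)
```

For a boost, the transposed inverse is simply the boost with the opposite velocity, as one would expect.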
In general, a tensor can have both upper and lower indices, and transforms
with a $\Lambda$ or its transposed inverse for each. For instance, a tensor $t^{\mu\nu}{}_\rho$ has the
transformation property
$$t^{\mu\nu}{}_\rho \to \Lambda^\mu{}_\lambda\,\Lambda^\nu{}_\kappa\,\Lambda_\rho{}^\sigma\,t^{\lambda\kappa}{}_\sigma .$$
If we set an upper index equal to a lower index and (following the summation
convention) sum over this index, we get another tensor with one less upper index
and one less lower index. For instance, in the above example, if we set ν = ρ
and sum, we obtain a quantity $v^\mu \equiv t^{\mu\nu}{}_\nu$, with the transformation property of a
tensor with one index – that is, a vector:
$$v^\mu \to \Lambda^\mu{}_\lambda\,\Lambda^\nu{}_\kappa\,\Lambda_\nu{}^\sigma\,t^{\lambda\kappa}{}_\sigma = \Lambda^\mu{}_\lambda\,\delta^\sigma_\kappa\,t^{\lambda\kappa}{}_\sigma = \Lambda^\mu{}_\lambda\,t^{\lambda\sigma}{}_\sigma = \Lambda^\mu{}_\lambda\,v^\lambda ,$$
as required for a vector. One case has been already encountered: if we define a
tensor $t^\mu{}_\nu \equiv \partial J^\mu/\partial x^\nu$ and set the upper and lower indices equal and sum, we
obtain a quantity that we already know is a scalar: $t^\mu{}_\mu \equiv \partial J^\mu/\partial x^\mu$.
Although it is important not to confuse upper and lower indices, the differ-
ence between them is just a matter of the sign of the time components. We
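can use the matrix $\eta_{\mu\nu}$ to lower an index on any tensor, giving a new tensor. For
instance, returning to our earlier example, if $t^{\mu\nu}{}_\rho$ is a tensor with transformation
property
$$t^{\mu\nu}{}_\rho \to \Lambda^\mu{}_\lambda\,\Lambda^\nu{}_\kappa\,\Lambda_\rho{}^\xi\,t^{\lambda\kappa}{}_\xi ,$$
we can lower the index ν, defining a new tensor:
$$u^\mu{}_{\sigma\rho} \equiv \eta_{\nu\sigma}\,t^{\mu\nu}{}_\rho .$$
Using Eq. (4.6.14), we see that this has the transformation property
$$u^\mu{}_{\sigma\rho} \to \eta_{\nu\sigma}\,\Lambda^\mu{}_\lambda\,\Lambda^\nu{}_\kappa\,\Lambda_\rho{}^\xi\,t^{\lambda\kappa}{}_\xi = \Lambda^\mu{}_\lambda\,\Lambda_\sigma{}^\tau\,\eta_{\tau\kappa}\,\Lambda_\rho{}^\xi\,t^{\lambda\kappa}{}_\xi = \Lambda^\mu{}_\lambda\,\Lambda_\sigma{}^\tau\,\Lambda_\rho{}^\xi\,u^\lambda{}_{\tau\xi}$$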
as is appropriate for a tensor with one upper index and two lower indices. (It is
also possible to raise any lower indices on tensors, but we won’t need to do this
here.)
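Likewise
$$H_{230} = \left(\frac{1}{c}\frac{\partial\mathbf{B}}{\partial t} + \nabla\times\mathbf{E}\right)_1 , \qquad H_{310} = \left(\frac{1}{c}\frac{\partial\mathbf{B}}{\partial t} + \nabla\times\mathbf{E}\right)_2 .$$
Hence the homogeneous Maxwell equations are the same as the requirement
that, for all μ, ν, and λ,
$$H_{\mu\nu\lambda} = 0 . \tag{4.6.17}$$
This is a manifestly Lorentz-invariant condition; if $H_{\mu\nu\lambda}(x)$ vanishes, then so
does $H'_{\rho\sigma\kappa}(x') = \Lambda_\rho{}^\mu\,\Lambda_\sigma{}^\nu\,\Lambda_\kappa{}^\lambda\,H_{\mu\nu\lambda}(x)$.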
(We will not here use the formalism of differential forms, but for anyone
interested in this subject, I mention in passing that a completely antisymmetric
tensor with p lower indices is known as a p-form. Thus Fμν is a 2-form, and
Hμνλ is a 3-form. Given a p-form, we can form a p + 1-form, known as the
exterior derivative, by taking the spacetime derivative and antisymmetrizing.
Thus, Hμνλ is the exterior derivative of Fμν . A p-form whose exterior deriva-
tive vanishes is said to be closed; a p-form that can be written as the exterior
derivative of a p − 1-form is said to be exact. Thus Hμνλ is exact, and the
homogeneous Maxwell equations (4.6.2) tell us that Fμν is closed. It is easy to
see that, because partial derivatives commute, any exact p-form is closed, and
a profound theorem due to Poincaré tells us that in simply connected spaces
any closed p-form is exact but that this is not necessarily true in spaces with
more complicated topology.11 In electrodynamics, since Fμν is closed, we can
conclude that in ordinary spacetime it is exact, so it can be written as the
exterior derivative of a 1-form $A_\mu$ known as the four-vector potential; that
is, $F_{\mu\nu} = \partial A_\mu/\partial x^\nu - \partial A_\nu/\partial x^\mu$. Maxwell originally wrote his equations as
differential equations for $\mathbf{A}$ and $A^0$, not $\mathbf{E}$ and $\mathbf{B}$.)
11 For a more thorough treatment, see e.g. H. Flanders, Differential Forms (Academic Press, New York,
1963).
So, what should we take for $F^\mu$ in the case of a particle of charge q in a space
that is empty except for being pervaded by electric and magnetic fields? Just
as for the momentum four-vector of massive particles, the force four-vector is
uniquely determined by the condition that it takes the known form for a particle
at rest, and is a four-vector, so that it is given by a Lorentz transformation for a
particle of any velocity.
There is an obvious four-vector that is linear in the electric and magnetic
fields and (because $F^{\mu\rho} = -F^{\rho\mu}$) satisfies Eq. (4.6.19):
$$f^\mu \equiv \eta_{\rho\sigma} F^{\mu\rho} p^\sigma$$
so we can guess that $F^\mu \propto f^\mu$. To check that this gives the right answer for a
particle at rest and to find the coefficient of proportionality, let us evaluate $f^\mu$
for a particle at rest, for which $\mathbf{p} = 0$ and $p^0 = mc$. In this limit
$$f^i \to -mcF^{i0} = mcE_i , \qquad f^0 \to -mcF^{00} = 0$$
with i = 1, 2, 3. Therefore to have agreement with the familiar formula $d\mathbf{p}/dt = q\mathbf{E}$
for the acceleration of a particle of charge q and zero velocity by an electric
field, we take
$$F^\mu = \frac{q}{mc} f^\mu = \frac{q}{mc}\,\eta_{\rho\sigma} F^{\mu\rho} p^\sigma . \tag{4.6.20}$$
That is, for a general velocity,
$$F^i = \frac{q}{mc}\left[F^{ij} p^j - F^{i0} p^0\right] , \tag{4.6.21}$$
and in three-vector notation, recalling that $p^0 = mc\gamma$, $\mathbf{p} = m\gamma\mathbf{v}$,
$$\mathbf{F} = \frac{q}{mc}\left[p^0 \mathbf{E} + \mathbf{p}\times\mathbf{B}\right] = q\gamma\left[\mathbf{E} + \mathbf{v}\times\mathbf{B}/c\right] . \tag{4.6.22}$$
Since $d\tau = dt/\gamma$, this gives
$$m\frac{d}{dt}[\gamma\mathbf{v}] = q\left[\mathbf{E} + \mathbf{v}\times\mathbf{B}/c\right] . \tag{4.6.23}$$
Given the existence of the force exerted by electric fields, the force exerted
by magnetic fields is an inevitable consequence of Lorentz invariance. It is a
special feature of electromagnetic forces that the only change in the equation
of motion introduced by special relativity is the replacement of the mass m in
the momentum with $m\gamma$, which in this one case allows us to treat $m\gamma$ as a
relativistic mass.
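The step from the tensor expression (4.6.20) to the three-vector form (4.6.22) can be checked numerically. The following is only a sketch under assumed conventions read off from Eqs. (4.6.21) and (4.6.22): the metric has $\eta_{00} = -1$, and the field tensor has $F^{i0} = -E_i$ and $F^{ij} = \epsilon_{ijk}B_k$. The units (c = 1), the sample field values, and the helper names are illustrative choices, not taken from the text.

```python
import math

C = 1.0  # speed of light in illustrative units (assumption)

# Index order in this sketch is (1, 2, 3, 0): three space components, then time.
ETA = [1.0, 1.0, 1.0, -1.0]  # diagonal of eta_{mu nu}, with eta_00 = -1


def cross(a, b):
    """Ordinary three-vector cross product."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def field_tensor(E, B):
    """Antisymmetric F^{mu nu} with F^{ij} = eps_{ijk} B_k and F^{i0} = -E_i."""
    F = [[0.0] * 4 for _ in range(4)]
    F[0][1], F[1][0] = B[2], -B[2]
    F[1][2], F[2][1] = B[0], -B[0]
    F[2][0], F[0][2] = B[1], -B[1]
    for i in range(3):
        F[i][3] = -E[i]
        F[3][i] = E[i]
    return F


def four_force(q, m, E, B, v):
    """Evaluate (q/mc) eta_{rho sigma} F^{mu rho} p^sigma for velocity v."""
    gamma = 1.0 / math.sqrt(1.0 - sum(x * x for x in v) / C**2)
    p = [m * gamma * v[0], m * gamma * v[1], m * gamma * v[2], m * C * gamma]
    F = field_tensor(E, B)
    force = [q / (m * C) * sum(ETA[r] * F[mu][r] * p[r] for r in range(4))
             for mu in range(4)]
    return force, gamma


# Arbitrary sample values (hypothetical, for illustration only).
q, m = 2.0, 3.0
E = [0.3, -0.1, 0.2]
B = [0.5, 0.4, -0.7]
v = [0.2, -0.3, 0.1]

force, gamma = four_force(q, m, E, B, v)
v_cross_B = cross(v, B)
three_vector_form = [q * gamma * (E[i] + v_cross_B[i] / C) for i in range(3)]
max_difference = max(abs(force[i] - three_vector_form[i]) for i in range(3))
print(max_difference)
```

The spatial components of the contracted tensor expression agree with $q\gamma[\mathbf{E} + \mathbf{v}\times\mathbf{B}/c]$ to rounding error, so the magnetic term arises from the same contraction that gives the electric term.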
4.7 Causality
result, that no influence whatever can travel faster than light. This is not just a
confession of technological inadequacy, but a consequence of an assumption of
causality, that effects always come after causes.
$$|\Delta\mathbf{x}|/c \le \Delta t . \tag{4.7.2}$$
Whatever physical influence is exerted by the cause to produce the effect travels
at a speed $|\Delta\mathbf{x}|/\Delta t$; the inequality (4.7.2) says that the speed of this influence
must be no greater than c.
Fortunately, if the bound (4.7.2) is seen to be satisfied by one observer then it
is satisfied for any other observer related to the first by the sort of proper Lorentz
transformation discussed in this chapter. The inequality (4.7.2) is equivalent to
the inequality
This quantity is Lorentz invariant, so if Eq. (4.7.3) is satisfied for one observer
using coordinates $x^\mu$, the corresponding inequality must be satisfied for coordinates
$x'^\mu = \Lambda^\mu{}_\nu x^\nu$, so we must also have
Since this gives a non-zero lower bound on $|\Delta t'|$, in order for $\Delta t'$ to have an
opposite sign from $\Delta t$ the Lorentz transformation would have to produce a discontinuous
jump in the coordinates. This is not possible for the sort of “proper”
Lorentz transformation that concerns us in this chapter, which as discussed in
Section 4.2 can be produced from the identity transformation x → x by a
smooth change of parameters. So if one observer sees $\Delta t > 0$ and $|\Delta\mathbf{x}|/c \le \Delta t$,
then any observer related to the first by a proper Lorentz transformation will see
$\Delta t' > 0$ and $|\Delta\mathbf{x}'|/c \le \Delta t'$.
Light Cone
These conclusions are well illustrated by introducing the light cone, the spacetime
surface with $\eta_{\mu\nu}\Delta x^\mu \Delta x^\nu = 0$. Points outside the light cone fall on hyperboloids
with $\eta_{\mu\nu}\Delta x^\mu \Delta x^\nu = a > 0$. Any point on one of these hyperboloids
can be taken to any other point on the same hyperboloid (that is, with the same
value of a) by a proper Lorentz transformation, even if this entails a change
of sign of $\Delta x^0$. On these hyperboloids $|\Delta\mathbf{x}| > c|\Delta t|$, so it is not possible for
any influence traveling at less than the speed of light to traverse a spacetime
interval $\Delta x^\mu$ outside the light cone. Thus as long as we assume that physical
influences never travel faster than light, the circumstance that proper Lorentz
transformations can change the sign of $\Delta t$ outside the light cone presents no
challenge to causality.
Points inside the light cone fall on hyperboloids with $\eta_{\mu\nu}\Delta x^\mu \Delta x^\nu = b < 0$.
For each value of b there are two disconnected hyperboloids, one inside the
future light cone, with $\Delta x^0 > 0$, and one inside the past light cone, with
$\Delta x^0 < 0$. Any point on one of these connected hyperboloids can be taken to
any other point on the same hyperboloid by a proper Lorentz transformation, but
proper Lorentz transformations cannot take us from inside the future light cone
to inside the past light cone. Causality requires that the difference $\Delta x^\mu$ in the
coordinates of an effect and its cause be on or within the future light cone, and
if one observer sees this to be the case then so will all other observers related to
the first by a proper Lorentz transformation.
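The light-cone statements can be illustrated with boosts along a single axis. This is a minimal sketch, assuming units with c = 1, the sign convention $\eta_{00} = -1$, and two hypothetical separations chosen only as examples: one inside the future light cone and one outside the light cone.

```python
import math

C = 1.0  # illustrative units with c = 1 (assumption)


def boost(dx, dt, v):
    """Boost along one axis with velocity v, applied to a separation (dx, dt)."""
    gamma = 1.0 / math.sqrt(1.0 - (v / C) ** 2)
    return gamma * (dx - v * dt), gamma * (dt - v * dx / C**2)


def interval(dx, dt):
    """eta_{mu nu} dx^mu dx^nu with the sign convention eta_00 = -1."""
    return dx**2 - (C * dt) ** 2


velocities = [k / 100.0 for k in range(-99, 100)]

timelike = (1.0, 2.0)    # |dx| < c dt: inside the future light cone
spacelike = (2.0, 1.0)   # |dx| > c dt: outside the light cone

# Time separations seen by observers moving at each sampled velocity.
timelike_dts = [boost(*timelike, v)[1] for v in velocities]
spacelike_dts = [boost(*spacelike, v)[1] for v in velocities]

# The quadratic form is unchanged by every sampled boost.
invariant_ok = all(
    math.isclose(interval(*boost(*sep, v)), interval(*sep), rel_tol=1e-9)
    for sep in (timelike, spacelike) for v in velocities
)

print(min(timelike_dts) > 0, min(spacelike_dts) < 0, invariant_ok)
```

For the separation inside the light cone the time difference stays positive for every sampled boost, while for the separation outside the light cone some boosts reverse its sign, in line with the discussion of the two kinds of hyperboloid.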
5
Quantum Mechanics
Our modern understanding of atoms, molecules, solids, atomic nuclei, and ele-
mentary particles is largely based on quantum mechanics. Quantum mechanics
grew in the mid-1920s out of two independent developments: the 1925 matrix
mechanics of Werner Heisenberg1 (1901–1976), and the 1926 wave mechanics
of Erwin Schrödinger2 (1887–1961). For the most part in this chapter we will
follow the path of wave mechanics, which is far more convenient for all but the
simplest calculations. After a look at the historical inspiration for wave mechan-
ics in Section 5.1 the Schrödinger equation will be introduced in Section 5.2 and
used to derive not only the hydrogen energy levels found by Bohr but also their
degeneracy. The general principles of the wave mechanical formulation of quan-
tum mechanics are laid out in Section 5.3 and provide a basis for the discussion
of spin in Section 5.4, identical particles in Section 5.5, and scattering processes
in Section 5.6. In Section 5.7 the general principles are supplemented with the
canonical formalism, which is used in Section 5.8 to work out the Schrödinger
equation for charged particles in a general electromagnetic field. This will pro-
vide us with examples of the application of a widely useful approximation
scheme, perturbation theory, which is outlined in general terms in Section 5.9.
The two approaches of wave and matrix mechanics were unified by Paul
Dirac (1902–1984) in a more abstract formalism, which he called transforma-
tion theory.3 This has evolved into a modern approach in which physical states
are represented by vectors in an abstract space known as Hilbert space, with
wave functions arising as components of these vectors in a suitable basis. The
Hilbert space approach is briefly described in Section 5.10.
1 W. Heisenberg, Zeit. Phys. 33, 879 (1925). This article is reprinted in English in Van der Waerden, Sources
of Quantum Mechanics, listed in the bibliography.
2 E. Schrödinger, Ann. Physik 79, 361, 409 (1926). These articles are reprinted in English in Shearer,
Collected Papers on Wave Mechanics, listed in the bibliography.
3 This approach is described in Dirac, The Principles of Quantum Mechanics, listed in the bibliography.
5.1 De Broglie Waves
Group Velocity
The association of the wave (5.1.2) with a moving electron gained plausibility
from the remark that a localized packet of these waves travels with the velocity
of the electron. Consider a packet of these waves:
$$\psi(\mathbf{x}, t) = \int d^3p \; g(\mathbf{p}) \exp\left(i\mathbf{p}\cdot\mathbf{x}/\hbar - iE(\mathbf{p})t/\hbar\right) \tag{5.1.3}$$
4 L. de Broglie, Comptes Rendus Acad. Sci. 177, 507, 548, 630 (1923).
where
$$V_i = \left.\frac{\partial E(\mathbf{p})}{\partial p_i}\right|_{\mathbf{p}=\mathbf{P}} . \tag{5.1.4}$$
Because of the way we have constructed the packet function $g(\mathbf{p})$, the magnitude
of (5.1.5) is peaked at $\mathbf{x} = \mathbf{V}t$, which shows that the packet moves at
velocity $\mathbf{V}$, known as its group velocity. But $V_i = \partial E(\mathbf{p})/\partial p_i = c^2 p_i/E(\mathbf{p})$,
which as shown in Eqs. (4.4.5) and (4.4.6) is indeed the velocity of a particle of
momentum $\mathbf{p}$.
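The equality of the group velocity and the particle velocity can be checked numerically. This sketch assumes the relativistic energy $E(p) = \sqrt{p^2c^2 + m^2c^4}$ in one dimension, illustrative units with c = m = 1, and a hypothetical test velocity of 0.6c; the derivative is approximated by a central difference.

```python
import math

C = 1.0  # speed of light, illustrative units (assumption)
M = 1.0  # particle mass, illustrative units (assumption)


def energy(p):
    """Relativistic energy of a particle with momentum p."""
    return math.sqrt((p * C) ** 2 + (M * C**2) ** 2)


def group_velocity(p, h=1e-6):
    """Central-difference estimate of dE/dp."""
    return (energy(p + h) - energy(p - h)) / (2.0 * h)


v = 0.6                                   # hypothetical particle velocity
gamma = 1.0 / math.sqrt(1.0 - (v / C) ** 2)
p = M * gamma * v                         # momentum of a particle moving at v

v_group = group_velocity(p)               # numerical dE/dp at this momentum
v_formula = C**2 * p / energy(p)          # closed form c^2 p / E(p)

print(v, v_group, v_formula)
```

Both the numerical derivative and the closed form $c^2p/E$ reproduce the velocity that was used to build the momentum.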
Application to Hydrogen
De Broglie’s hypothesis met with just one initial success. The electron in a hydrogen
atom is not free, but moves under the influence of the proton’s attraction.
Nevertheless, de Broglie supposed that the electron is described by the free-particle
wave function (5.1.2), but with the waves traveling in a circle around
the proton like sound waves in a toroidal organ pipe. To avoid a discontinuity in
$\psi$, it is necessary that a whole number n of wavelengths $\lambda$ should fit around the
circle, so the radius of the circle is constrained by the condition that $2\pi r = n\lambda$,
with n = 1, 2, . . . According to Eq. (5.1.2), $\lambda = 2\pi\hbar/p$, where $p \equiv |\mathbf{p}|$, so de
Broglie’s condition was
$$pr = n\hbar \tag{5.1.6}$$
which for non-relativistic electrons with $p = m_e v$ is the same as Bohr’s
condition (3.4.2), but now with no need of the correspondence principle to
infer that $\hbar = h/2\pi$. De Broglie could then repeat Bohr’s calculation, using
the non-relativistic formula $E = m_e v^2/2 - e^2/r$ for energy and the formula
$m_e v^2/r = e^2/r^2$ for centripetal acceleration, and thereby obtain Bohr’s formula
$E = -e^4 m_e/2\hbar^2 n^2$ for the hydrogen energy levels. Nothing new had been
learned about hydrogen, but de Broglie’s derivation at least gave a hint at an
explanation of Bohr’s quantization condition.
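The calculation sketched in this paragraph can be carried through numerically. The block below solves the quantization condition $pr = n\hbar$ together with the force balance $m_e v^2/r = e^2/r^2$ for v and r, and then evaluates the energy. It is a sketch under one stated assumption: the text works in units where the Coulomb energy is $e^2/r$, so in SI units the combination $e^2/4\pi\epsilon_0$ (called `K` here, a name introduced only for this example) stands in for $e^2$.

```python
import math

HBAR = 1.054571817e-34       # reduced Planck constant, J s
M_E = 9.1093837015e-31       # electron mass, kg
E_CHARGE = 1.602176634e-19   # elementary charge, C
EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m

# In SI units this combination plays the role of e^2 in the text's formulas.
K = E_CHARGE**2 / (4.0 * math.pi * EPS0)


def bohr_orbit(n):
    """Solve m v r = n hbar and m v^2 / r = K / r^2, then evaluate the energy."""
    v = K / (n * HBAR)                      # from dividing the two conditions
    r = n**2 * HBAR**2 / (M_E * K)          # from r = n hbar / (m v)
    energy = 0.5 * M_E * v**2 - K / r       # kinetic plus potential energy
    return r, v, energy


r1, v1, e1 = bohr_orbit(1)
e1_ev = e1 / E_CHARGE
closed_form_ev = -M_E * K**2 / (2.0 * HBAR**2) / E_CHARGE

print(r1, e1_ev, closed_form_ev)
```

The energy obtained from the two conditions coincides with the closed form $-m_e e^4/2\hbar^2 n^2$, and the n = 1 radius is the familiar Bohr radius.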
Davisson–Germer Experiment
There is a story that in his oral Ph.D. examination, de Broglie was asked if there
was some direct way of observing the wave nature of electrons, and he answered
that it might be possible to observe the diffraction of electron waves by a crystal
lattice, like the well-known diffraction of X-rays used for instance in measuring
the increase of wavelength in Compton scattering. Whether or not this story is
true, the idea was a good one. According to Eq. (5.1.2), the wavelength of a
non-relativistic electron with kinetic energy $E_e \ll m_e c^2$ is given by
$$\lambda = 2\pi\hbar/p_e = 2\pi\hbar/\sqrt{2m_e E_e} = 12.26 \times 10^{-8}\ \mathrm{cm}\ [E_e(\mathrm{eV})]^{-1/2} . \tag{5.1.7}$$
Hence we only need electrons with energy a bit larger than 10 eV to get wavelengths
nearly as small as a typical lattice spacing, about $10^{-8}$ cm. This is no
coincidence. In de Broglie’s interpretation of the Bohr quantization assumption,
the wavelength of an electron with an energy of a few eV, which is typical
of atomic binding energies, must fit a few times around an atomic orbit, and
therefore must be similar to the size of the atom, which is similar to the spacing
of atoms in crystals.
Several physicists tried and failed to observe the diffraction of electron
waves, until it was finally measured in 1927 by Clinton Davisson (1881–1958)
and Lester Germer (1896–1971) at the old Bell Telephone Laboratories building
on West Street in Manhattan.5 (It was also measured at about the same time
at the University of Aberdeen by George Paget Thomson (1892–1975), a
son of J. J. Thomson.) They used a beam of electrons with kinetic energy
54 eV, incident on a single crystal of nickel with a spacing of lattice planes
$d = 0.91 \times 10^{-8}$ cm (already known from measurements using X-ray diffraction).
Electrons are reflected not only from the surface of the crystal, but from
numerous planes within the nickel. At certain angles $\theta$ between the incident
and reflected waves all these reflected waves go off with the same phase and
therefore add constructively, leading to enhanced reflection at these angles.
According to a 1913 formula (derived in the appendix to this section) of
William Henry Bragg (1862–1942) and his son Lawrence Bragg (1890–1971),
for any sort of wave the angles $\theta_n$ between incident and reflected waves at
which reflection is enhanced in this way satisfy the Bragg formula:
$$n\lambda = 2d\cos(\theta_n/2) , \tag{5.1.8}$$
where n = 1, 2, 3, . . . Davisson and Germer found an enhanced n = 1 reflection
at $\theta_1 = 50^\circ$, giving a wavelength
$$\lambda = 2 \times 0.91 \times 10^{-8}\ \mathrm{cm} \times \cos(25^\circ) = 1.65 \times 10^{-8}\ \mathrm{cm} ,$$
in satisfactory agreement with the wavelength $1.67 \times 10^{-8}$ cm expected from
Eq. (5.1.7) for a kinetic energy of 54 eV.
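The two wavelengths being compared can be recomputed from the quoted inputs. This is a sketch using CODATA values for the constants (an addition not in the text) together with the numbers given above: 54 eV kinetic energy, a plane spacing of $0.91 \times 10^{-8}$ cm, and a first-order reflection at 50°. The function names are introduced here for illustration.

```python
import math

H_PLANCK = 6.62607015e-34    # Planck constant h = 2 pi hbar, J s
M_E = 9.1093837015e-31       # electron mass, kg
EV = 1.602176634e-19         # one electron volt in joules


def de_broglie_wavelength_cm(kinetic_energy_ev):
    """Non-relativistic wavelength h / sqrt(2 m_e E), returned in cm."""
    momentum = math.sqrt(2.0 * M_E * kinetic_energy_ev * EV)
    return 100.0 * H_PLANCK / momentum


def bragg_wavelength_cm(d_cm, theta_deg, n=1):
    """Wavelength from n lambda = 2 d cos(theta/2), theta between the rays."""
    return 2.0 * d_cm * math.cos(math.radians(theta_deg / 2.0)) / n


coefficient = de_broglie_wavelength_cm(1.0)        # prefactor in Eq. (5.1.7)
lambda_expected = de_broglie_wavelength_cm(54.0)   # from the beam energy
lambda_measured = bragg_wavelength_cm(0.91e-8, 50.0)  # from the Bragg angle

print(coefficient, lambda_expected, lambda_measured)
```

The first printed number is the coefficient of $[E_e(\mathrm{eV})]^{-1/2}$ in Eq. (5.1.7), and the other two are the wavelengths inferred from the beam energy and from the diffraction angle, which differ by about one percent.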
The wave nature of the electron allowed the development of a new instru-
ment, the electron microscope. Recall that a photon of energy E has wavelength
Figure 5.1 Derivation of the Bragg formula. The bold lines represent the
planes of the crystal lattice, seen edge on. Arrows indicate the direction of the
light rays.
Suppose that a wave of some sort is incident on a crystal lattice, with a ray
striking one plane of the lattice at point A, where it makes an angle $\phi$ between
the ray and the plane. (See Figure 5.1.) Part of the wave is reflected, with the
reflected ray making the same angle $\phi$ with the plane. Another part of the wave
continues in its original direction to the next plane, with the ray striking this
plane at point B, again at angle $\phi$. Part of this ray is reflected at B, again at
angle $\phi$, while another part continues to deeper planes. Draw a line from B in
a direction normal to the first reflected ray, intersecting this ray at a point C.
The purpose of this construction is that the two parallel reflected rays travel the
same distance from B and C to any distant detector, so the difference in the total
distance that each ray travels to the detector is AB − AC. The two rays interfere
constructively if this difference is a whole number n of wavelengths $\lambda$:
$$AB - AC = n\lambda , \qquad n = 1, 2, 3, \ldots$$
If this is satisfied, then the difference in the distance traveled by rays reflected
from the second and third planes will also be $n\lambda$, and so on down into deeper
and deeper planes, so all these reflected waves will interfere constructively. The
same is true of any rays that strike the crystal along parallel directions, whether
they are reflected from the first, second, or any other crystal plane. We then have
a very strong enhancement of the reflection. So we have to ask, how do AB and
AC depend on the lattice spacing and on the angle $\phi$ between the rays and the
crystal planes?
Draw a line from A to the second plane, which intersects it at a right angle at
a point D. The length of the line AD is the spacing d of lattice planes. Looking
at the right triangle ADB (whose hypotenuse is AB) we see that
$$AB = d/\sin\phi .$$
To calculate AC, note that the angle at B between BA and BC is $180^\circ - 2\phi - 90^\circ = 90^\circ - 2\phi$,
so looking at the right triangle BAC (whose hypotenuse is
AB), we see that
$$AC = AB\sin(90^\circ - 2\phi) = AB\cos(2\phi) = AB[1 - 2\sin^2\phi]$$
so
$$AB - AC = 2AB\sin^2\phi = 2d\sin\phi$$
and the condition for constructive interference is therefore
$$n\lambda = 2d\sin\phi .$$
It is common to describe the reflection in terms of the angle $\theta$ between the
incident and reflected rays, $\theta = 180^\circ - 2\phi$, so $\phi = 90^\circ - \theta/2$, and the condition
for constructive interference is then
$$n\lambda = 2d\cos(\theta/2) ,$$
as was to be shown.
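The geometric construction can be verified with explicit coordinates. In this sketch the first lattice plane is taken as the line y = 0 and the second as y = −d, with A at the origin; this coordinate choice and the sample values of d and $\phi$ are assumptions made only for the check.

```python
import math


def path_difference(d, phi):
    """AB - AC for plane spacing d and glancing angle phi (radians)."""
    incident = (math.cos(phi), -math.sin(phi))   # ray heading down into the crystal
    reflected = (math.cos(phi), math.sin(phi))   # ray reflected at A
    ab = d / math.sin(phi)                       # distance from A to the second plane
    b = (ab * incident[0], ab * incident[1])     # point B, which lies on y = -d
    # C is the foot of the perpendicular from B onto the reflected ray through A,
    # so AC is the projection of the vector AB onto the reflected direction.
    ac = b[0] * reflected[0] + b[1] * reflected[1]
    return ab - ac


d = 0.91  # plane spacing in arbitrary length units (sample value)
checks = []
for phi_deg in (10.0, 25.0, 40.0, 65.0, 80.0):
    phi = math.radians(phi_deg)
    theta = math.pi - 2.0 * phi                  # angle between incident and reflected rays
    checks.append((path_difference(d, phi),
                   2.0 * d * math.sin(phi),
                   2.0 * d * math.cos(theta / 2.0)))

for row in checks:
    print(row)
```

Each row lists the path difference computed from the coordinates followed by $2d\sin\phi$ and $2d\cos(\theta/2)$; the three entries agree for every angle tried.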
hydrogen atom, imagining a free electron wave running around the electron
orbit. But of course electrons in atoms are not free. Starting in 1925, Erwin
Schrödinger struggled to extend the idea of the wave function to an electron
moving in a potential.6
Schrödinger’s starting point was de Broglie’s wave theory. Equation (5.1.2)
gives the wave function of a free electron of momentum $\mathbf{p}$ as
$$\psi_{\mathbf{p}}(\mathbf{x}, t) \propto \exp\left(i\mathbf{p}\cdot\mathbf{x}/\hbar - iE(\mathbf{p})t/\hbar\right) .$$
The content of this explicit formula can be expressed as a pair of differential
equations
$$-i\hbar\nabla\psi_{\mathbf{p}}(\mathbf{x}, t) = \mathbf{p}\,\psi_{\mathbf{p}}(\mathbf{x}, t) , \tag{5.2.1}$$
$$i\hbar\frac{\partial}{\partial t}\psi_{\mathbf{p}}(\mathbf{x}, t) = E(\mathbf{p})\,\psi_{\mathbf{p}}(\mathbf{x}, t) , \tag{5.2.2}$$
where, now in a non-relativistic approximation,
$$E(\mathbf{p}) \simeq m_e c^2 + \frac{1}{2m_e}\mathbf{p}^2 . \tag{5.2.3}$$
An electron that is bound in an atom cannot have a definite momentum –
classically, it goes round and round its orbit – so we would not expect the bound
electron wave function to satisfy an equation like (5.2.1). On the other hand, we
can try to use something like Eq. (5.2.2) to find the wave function of a bound
electron, with Eq. (5.2.1) used only to interpret $\mathbf{p}$ as $-i\hbar\nabla$ in $E(\mathbf{p})$. Schrödinger
thus took the equation for a bound electron as
$$i\hbar\frac{\partial}{\partial t}\psi(\mathbf{x}, t) = E(-i\hbar\nabla, \mathbf{x})\,\psi(\mathbf{x}, t) , \tag{5.2.4}$$
where now E is given a dependence on $\mathbf{x}$ to account for the presence of potential
energy. For a non-relativistic electron in a potential $V(\mathbf{x})$ the energy is
$E(\mathbf{p}, \mathbf{x}) = m_e c^2 + \mathbf{p}^2/2m_e + V(\mathbf{x})$, and Eq. (5.2.4) reads
$$i\hbar\frac{\partial}{\partial t}\psi(\mathbf{x}, t) = \left[m_e c^2 - \frac{\hbar^2}{2m_e}\nabla^2 + V(\mathbf{x})\right]\psi(\mathbf{x}, t) . \tag{5.2.5}$$
This is known as the time-dependent Schrödinger equation.
Because the potential is assumed time-independent, Eq. (5.2.5) has solutions
of the form
$$\psi(\mathbf{x}, t) = \exp\left(-i(m_e c^2 + E)t/\hbar\right)\psi(\mathbf{x}) , \tag{5.2.6}$$
where
$$\left[-\frac{\hbar^2}{2m_e}\nabla^2 + V(\mathbf{x})\right]\psi(\mathbf{x}) = E\,\psi(\mathbf{x}) . \tag{5.2.7}$$
Boundary Conditions
This all began as just guesswork. Schrödinger and other physicists at first imagined
that $\psi(\mathbf{x}, t)$ gives an indication of how much of the electron is near $\mathbf{x}$ at
time t. As we will see when we come to scattering in Section 5.6, it was only
a few years later that Max Born correctly interpreted $|\psi(\mathbf{x}, t)|^2$ as a probability
density – that is, $|\psi(\mathbf{x}, t)|^2 d^3x$ is the probability that the electron is in a small
volume $d^3x$ at position $\mathbf{x}$ and time t. For the present, all we need to know is that
the relevant solutions of Eq. (5.2.7) are those with
$$\int |\psi(\mathbf{x})|^2\, d^3x < \infty \tag{5.2.8}$$
so that by dividing $\psi(\mathbf{x})$ by the square root of this integral we obtain a normalized
wave function, corresponding to a probability density for which the total
probability of the electron being somewhere is 100%.
If we assume (as is generally the case in practice) that $V(\mathbf{x})$ vanishes at large
distances $|\mathbf{x}|$, then at large $|\mathbf{x}|$ Eq. (5.2.7) becomes
$$E\,\psi(\mathbf{x}) \to -\frac{\hbar^2}{2m_e}\nabla^2\psi(\mathbf{x}) . \tag{5.2.9}$$
A bound electron must have E < 0 (since otherwise it would be energetically
possible for the electron to escape to infinite distance) so Eq. (5.2.9) has solutions
that at large $|\mathbf{x}|$ behave as
$$\psi(\mathbf{x}) \to P(\mathbf{x})\exp(\pm\kappa|\mathbf{x}|) \tag{5.2.10}$$
where $\kappa$ is the positive square root,
$$\kappa = +\sqrt{2m_e|E|/\hbar^2} , \tag{5.2.11}$$
and $P(\mathbf{x})$ is some function such as a polynomial that varies much more slowly
than an exponential for large $|\mathbf{x}|$. (A derivative $\partial/\partial x_i$ acting on the exponential
in Eq. (5.2.10) yields a constant factor $\pm\kappa$, while a gradient acting on a function
$P(\mathbf{x})$ in Eq. (5.2.10) that grows as a power of $|\mathbf{x}|$ gives a factor for $|\mathbf{x}| \to \infty$
proportional to $1/|\mathbf{x}|$.) Solutions of the time-independent Schrödinger equation
thus come in pairs, one of which (the one with a minus sign in the exponential
in Eq. (5.2.10)) satisfies the condition (5.2.8) at least as far as convergence at
large $|\mathbf{x}|$ is concerned, while the other does not.
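To give a feeling for the scale set by Eq. (5.2.11), the decay constant $\kappa$ can be evaluated for a definite binding energy. The sketch below assumes the hydrogen ground-state value E = −13.6057 eV (the n = 1 case of Bohr's formula quoted in Section 5.1) and CODATA constants; both are inputs supplied for this example.

```python
import math

HBAR = 1.054571817e-34       # reduced Planck constant, J s
M_E = 9.1093837015e-31       # electron mass, kg
EV = 1.602176634e-19         # one electron volt in joules


def kappa(binding_energy_ev):
    """Decay constant sqrt(2 m_e |E|) / hbar in inverse metres, Eq. (5.2.11)."""
    return math.sqrt(2.0 * M_E * abs(binding_energy_ev) * EV) / HBAR


k_ground = kappa(-13.6057)       # assumed hydrogen ground-state energy
decay_length = 1.0 / k_ground    # distance over which exp(-kappa r) falls by 1/e

# Ratio of the growing to the decaying exponential ten decay lengths out.
growth_vs_decay = math.exp(2.0 * k_ground * 10.0 * decay_length)

print(decay_length, growth_vs_decay)
```

The decay length comes out at about half an ångström, the size of the atom, and already at ten decay lengths the solution with the plus sign exceeds the normalizable one by a factor of $e^{20}$.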
We shall see that there is also a smoothness condition on ψ(x) at x → 0
that must be imposed on the wave function. We can always find solutions of
the Schrödinger equation (5.2.7) that satisfy either this condition at x → 0 or
Spherical Symmetry
We now specialize to the case of spherical symmetry, which applies in one-electron
atoms (and approximately for each electron in atoms with many electrons),
for which the potential is only a function of $r = |\mathbf{x}|$. There is a
mathematical identity that is useful for a wide variety of problems with spherical
symmetry in various branches of mathematical physics:
$$\nabla^2 f(\mathbf{x}) = \frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial f(\mathbf{x})}{\partial r}\right) + \frac{1}{r^2}(\mathbf{x}\times\nabla)^2 f(\mathbf{x}) \tag{5.2.12}$$
where $f(\mathbf{x})$ is an arbitrary differentiable function of position. (This can be
derived in the same way that in ordinary vector algebra we derive the familiar
identity $(\mathbf{a}\times\mathbf{b})^2 = \mathbf{a}^2\mathbf{b}^2 - (\mathbf{a}\cdot\mathbf{b})^2$, but here keeping track of the order of
the position variable and derivatives that act on it.) As already mentioned,
Schrödinger assumed that $-i\hbar\nabla$ should be interpreted as the operator representing
the momentum, so the operator representing the orbital angular
momentum is
$$\mathbf{L} \equiv -i\hbar\,\mathbf{x}\times\nabla , \tag{5.2.13}$$
and we can write Eq. (5.2.12) as the identity
$$\nabla^2 f(\mathbf{x}) = \frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial f(\mathbf{x})}{\partial r}\right) - \frac{1}{\hbar^2 r^2}\mathbf{L}^2 f(\mathbf{x}) . \tag{5.2.14}$$
The time-independent Schrödinger equation (5.2.7) thus takes the form
$$-\frac{\hbar^2}{2m_e}\frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial\psi(\mathbf{x})}{\partial r}\right) + \frac{1}{2m_e r^2}\mathbf{L}^2\psi(\mathbf{x}) + V(r)\psi(\mathbf{x}) = E\,\psi(\mathbf{x}) . \tag{5.2.15}$$
Suppose that for some particular wave function, the smallest power of $\mathbf{x}$ in the
expansion of the wave function around the origin is some integer $\ell = 0, 1, 2, \ldots$
Then for $\mathbf{x} \to 0$ the wave function is dominated by a homogeneous polynomial
of order $\ell$ in the coordinates $x_i$ – that is, a sum of terms each of which has $\ell$
factors of the coordinates $x_i$. For instance, a homogeneous polynomial in $\mathbf{x}$ of
order zero is a constant, a homogeneous polynomial in $\mathbf{x}$ of order one is a linear
combination of $x_1$, $x_2$, and $x_3$, and a homogeneous polynomial in $\mathbf{x}$ of order two
is a linear combination of
$$x_1^2 ,\quad x_2^2 ,\quad x_3^2 ,\quad x_1x_2 ,\quad x_2x_3 ,\quad x_3x_1 .$$
Defining a radial coordinate $r \equiv |\mathbf{x}|$ and a unit vector $\hat{x} \equiv \mathbf{x}/r$, the wave
function for $r \to 0$ can now be written
$$\psi(\mathbf{x}) \to r^\ell\, Y(\hat{x}) , \tag{5.2.16}$$
where Y is some homogeneous polynomial of order $\ell$ in the unit vector $\hat{x}$. (As
we shall see, for $\ell \ge 1$ there is more than one such polynomial, which will later
have to be distinguished by attaching an additional label to Y.)
Just knowing the value of $\ell$ is enough to tell us how $\mathbf{L}^2$ acts on the wave
function. Note that $\mathbf{L}$ does not act on functions of r, because
$\mathbf{L}f(r) = -i\hbar(\mathbf{x}\times\hat{x}f'(r)) = 0$, so $\mathbf{L}^2$ acts only on the direction $\hat{x}$. For a wave function that goes
as (5.2.16) for $r \to 0$, the first two terms on the left of Eq. (5.2.15) go as
$$-\frac{\hbar^2}{2m_e}\frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial\psi(\mathbf{x})}{\partial r}\right) \to -\frac{\hbar^2\,\ell(\ell+1)}{2m_e}\, r^{\ell-2}\, Y(\hat{x}) ,$$
$$\frac{1}{2m_e r^2}\mathbf{L}^2\psi(\mathbf{x}) \to \frac{1}{2m_e}\, r^{\ell-2}\, \mathbf{L}^2 Y(\hat{x}) ,$$
while $E\psi$ and (as long as the potential does not blow up as fast as $1/r^2$ for
$r \to 0$) also $V(r)\psi$ are negligible for $r \to 0$ compared with $r^{\ell-2}$. Hence the
time-independent Schrödinger equation (5.2.15) requires that
$$\mathbf{L}^2 Y(\hat{x}) = \hbar^2\,\ell(\ell+1)\,Y(\hat{x}) . \tag{5.2.17}$$
We can therefore find solutions of the Schrödinger equation (5.2.15) of the form
$$\psi(\mathbf{x}) = R(r)\,Y(\hat{x}) , \tag{5.2.18}$$
where R(r) satisfies the radial wave equation
$$-\frac{\hbar^2}{2m_e}\frac{1}{r^2}\frac{d}{dr}\left(r^2\frac{dR(r)}{dr}\right) + \frac{\hbar^2\,\ell(\ell+1)}{2m_e r^2}R(r) + V(r)R(r) = E\,R(r) , \tag{5.2.19}$$
with boundary conditions
$$R(r) \propto r^\ell \ \text{ for } r \to 0 , \qquad R(r) \propto P(r)\exp(-\kappa r) \ \text{ for } r \to \infty .$$
Here the term proportional to $\ell(\ell+1)$ acts as a positive and hence repulsive
potential arising from the centrifugal force acting on an electron with non-zero
angular momentum. The function R(r) will turn out to depend on an index in
addition to $\ell$, on which the energy also depends.
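The small-r behaviour used above rests on the fact that the radial derivative term acting on $r^\ell$ gives $\ell(\ell+1)\,r^{\ell-2}$, which is what lets it cancel against the centrifugal term. A finite-difference sketch of that statement follows; the step size, the sample radius, and the function name are choices made here for illustration.

```python
def radial_laplacian(f, r, h=1e-3):
    """Finite-difference estimate of (1/r^2) d/dr ( r^2 df/dr )."""
    def flux(s):
        # r^2 times a central-difference estimate of df/dr at radius s
        return s * s * (f(s + h / 2.0) - f(s - h / 2.0)) / h
    return (flux(r + h / 2.0) - flux(r - h / 2.0)) / (h * r * r)


r0 = 1.3  # sample radius in arbitrary units
results = []
for ell in range(5):
    numeric = radial_laplacian(lambda s, ell=ell: s**ell, r0)
    expected = ell * (ell + 1) * r0 ** (ell - 2)
    results.append((ell, numeric, expected))

for row in results:
    print(row)
```

For each $\ell$ the numerical value matches $\ell(\ell+1)\,r^{\ell-2}$ to within the discretization error, so a solution behaving as $r^\ell$ near the origin makes the two leading terms of Eq. (5.2.19) cancel.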
Angular Multiplicity
As we will see when we come to the periodic table of elements, it is important to
know the number of independent solutions of Eq. (5.2.17) for a given $\ell$. For this
purpose, it is convenient first to recast Eq. (5.2.17) as a condition on $r^\ell Y(\hat{x})$,
a homogeneous polynomial of order $\ell$ in the three-vector $\mathbf{x}$. From Eq. (5.2.14),
we see that Eq. (5.2.17) is equivalent to the condition
$$\nabla^2\left(r^\ell\, Y(\hat{x})\right) = 0 . \tag{5.2.20}$$
To distinguish among the solutions of Eq. (5.2.20), it is convenient to consider
the action of the operator $L_3 \equiv -i\hbar(x_1\partial/\partial x_2 - x_2\partial/\partial x_1)$. Note that
$$L_3(x_1 \pm ix_2) = -i\hbar(-x_2 \pm ix_1) = \pm\hbar(x_1 \pm ix_2) , \qquad L_3 x_3 = 0 .$$
We can take a complete set of independent homogeneous polynomials in $\mathbf{x}$ of
order $\ell$ as the products of $\nu_\pm$ factors of $x_1 \pm ix_2$ and $\ell - \nu_+ - \nu_-$ factors of $x_3$
for various non-negative integers $\nu_\pm$. The action of $L_3$ on these products is
$$L_3\left[(x_1 + ix_2)^{\nu_+}(x_1 - ix_2)^{\nu_-}x_3^{\ell-\nu_+-\nu_-}\right] = \hbar(\nu_+ - \nu_-)\left[(x_1 + ix_2)^{\nu_+}(x_1 - ix_2)^{\nu_-}x_3^{\ell-\nu_+-\nu_-}\right] .$$
Now, for an arbitrary function $f(\mathbf{x})$,
$$\frac{\partial^2}{\partial x_1^2}(L_3 f) = -2i\hbar\frac{\partial^2 f}{\partial x_1\partial x_2} + L_3\frac{\partial^2 f}{\partial x_1^2} ,$$
$$\frac{\partial^2}{\partial x_2^2}(L_3 f) = +2i\hbar\frac{\partial^2 f}{\partial x_2\partial x_1} + L_3\frac{\partial^2 f}{\partial x_2^2} ,$$
$$\frac{\partial^2}{\partial x_3^2}(L_3 f) = L_3\frac{\partial^2 f}{\partial x_3^2} ,$$
so
$$\nabla^2(L_3 f) = L_3\nabla^2 f .$$
(We shall see in Section 5.4 that this is just a consequence of the rotational invariance
of the Laplacian $\nabla^2$.) It follows that when $\nabla^2$ acts on a sum of terms of
the form $(x_1 + ix_2)^{\nu_+}(x_1 - ix_2)^{\nu_-}x_3^{\ell-\nu_+-\nu_-}$, all with the same value of $\nu_+ - \nu_-$,
it gives a sum of terms that again all have that value of $\nu_+ - \nu_-$. We can therefore
find solutions of Eq. (5.2.20) that are sums of products of coordinates all with
the same value of $m \equiv \nu_+ - \nu_-$, and label the solutions as $r^\ell\, Y_\ell^m(\hat{x})$, where
$$L_3 Y_\ell^m(\hat{x}) = \hbar m\, Y_\ell^m(\hat{x}) .$$
How many solutions of Eq. (5.2.20) are there for a given $\ell$? First let’s ask
how many independent homogeneous polynomials in $\mathbf{x}$ of order $\ell$ of the form
$(x_1 + ix_2)^{\nu_+}(x_1 - ix_2)^{\nu_-}x_3^{\ell-\nu_+-\nu_-}$ there are for a given $\ell$. The exponent $\nu_+$
can be any integer from 0 to $\ell$ and, for a given $\nu_+$, the exponent $\nu_-$ can be any
integer from 0 to $\ell - \nu_+$, so the number of these independent homogeneous
polynomials of order $\ell$ in $\mathbf{x}$ is therefore
$$N_\ell = \sum_{\nu_+=0}^{\ell}\ \sum_{\nu_-=0}^{\ell-\nu_+} 1 = \sum_{\nu_+=0}^{\ell}(\ell - \nu_+ + 1) = (\ell+1)^2 - \frac{\ell(\ell+1)}{2} = \frac{(\ell+1)(\ell+2)}{2} .$$
We also have to impose the condition (5.2.20). The function $\nabla^2(r^\ell Y(\hat{x}))$ is
itself a homogeneous polynomial of order $\ell - 2$ in the three-vector $\mathbf{x}$, so setting
this function equal to zero imposes $N_{\ell-2}$ conditions on $r^\ell Y(\hat{x})$, and the number
of independent Y subject to these conditions is thus
$$N_\ell - N_{\ell-2} = \frac{(\ell+1)(\ell+2)}{2} - \frac{\ell(\ell-1)}{2} = 2\ell + 1 .$$
But this is the same as the number of possible values of $m \equiv \nu_+ - \nu_-$, ranging
from $m = -\ell$ to $m = +\ell$, so there must be just one function $Y_\ell^m(\hat{x})$ for each $\ell$
and m.
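The counting argument can be reproduced by brute-force enumeration. This sketch lists the exponent pairs $(\nu_+, \nu_-)$ allowed for each $\ell$ and compares the counts with the closed forms above; the function names are introduced only for this example.

```python
def exponent_pairs(ell):
    """All (nu_plus, nu_minus) with nu_plus, nu_minus >= 0 and nu_plus + nu_minus <= ell."""
    return [(a, b) for a in range(ell + 1) for b in range(ell - a + 1)]


def count_polynomials(ell):
    """Number N_ell of independent homogeneous polynomials of order ell."""
    return len(exponent_pairs(ell)) if ell >= 0 else 0


def m_values(ell):
    """Distinct values of m = nu_plus - nu_minus that occur at order ell."""
    return sorted({a - b for a, b in exponent_pairs(ell)})


table = []
for ell in range(7):
    n_ell = count_polynomials(ell)
    n_harmonic = n_ell - count_polynomials(ell - 2)   # N_ell - N_{ell-2}
    table.append((ell, n_ell, n_harmonic, m_values(ell)))

for row in table:
    print(row)
```

The enumeration gives $N_\ell = (\ell+1)(\ell+2)/2$ and $N_\ell - N_{\ell-2} = 2\ell+1$, and the distinct values of m run from $-\ell$ to $+\ell$, so the number of conditions left over matches the number of m values.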
The index m does not appear in Eq. (5.2.19), so the $2\ell + 1$ states that differ
only in the value of m all have the same energy as long as the spherical symmetry
of the atom is maintained. The degeneracy of these states can be lifted
by exposing the atom to an external perturbation which marks out a preferred
direction, in which case the energies of the different states will be split from one
another. Where the external perturbation is a magnetic field this is known as the
Zeeman effect, after Pieter Zeeman (1865–1943), who first reported it in 1897.
It was not possible to understand the details of the splitting of energy levels
in the Zeeman effect until the discovery of electron spin, to be discussed in
Section 5.4. We will calculate the Zeeman effect in Section 5.9, using the methods
of perturbation theory, as an application of the quantum theory of the interaction
of electrons with electromagnetic fields, to be described in Section 5.8.
Spherical Harmonics
Explicit formulas for spherical harmonics are needed in some applications of
quantum mechanics but they are not needed in the calculations of energy levels
5.3 General Principles of Quantum Mechanics
As we have seen in the story so far, quantum mechanics began with guesswork:
Einstein’s guess that the energy and momentum of light waves comes in
particles; Bohr’s guess that if the energy and momentum of radiation are quantized
then so are other things, such as the angular momentum of electrons
in atomic orbits; de Broglie’s guess that if electromagnetic waves consist of
particles then particles such as electrons behave like waves; and Schrödinger’s
guess that the differential equations for de Broglie’s waves could be modified for
atomic electrons by inserting a potential. It is time that we move on from this
guesswork and describe the general principles of quantum mechanics as they
emerged in the formalism of wave mechanics soon after 1925. Then in following
sections we shall go on to applications of quantum mechanics in contexts more
general than those considered so far.
Two wave functions that differ only by a constant phase factor of absolute value
unity represent the same state. In solving differential equations for the wave
function, the important thing is that the integral (5.3.2) should be finite – in
that case we can always find a ψ that satisfies Eq. (5.3.2) by dividing the wave
function by the square root of this integral.
in which the constant factor $(\sqrt{\pi}L)^{-3/2}$ is chosen so that this $\psi$ satisfies the
normalization condition (5.3.2). The constant L can be chosen as some very
large length, in which case the particle is almost certainly in a very large volume
$L^3$, where it almost certainly has the momentum $\mathbf{p}$.
The operator $X_{nj}$ that represents the j th component of the position vector of
the nth particle, acting on any function of position to its right such as a wave
function or the derivative of a wave function, simply multiplies that function by
the argument $x_{nj}$ and is obviously linear. Here again we have the problem that its
eigenfunctions cannot be normalized. In the one-particle case, a wave function
$\psi(\mathbf{x})$ that represents a state with a definite position $\mathbf{a}$ would have $\mathbf{X}\psi(\mathbf{x}) \equiv \mathbf{x}\psi(\mathbf{x})$
equal to $\mathbf{a}\psi(\mathbf{x})$ for all $\mathbf{x}$, so that it would have to vanish for all $\mathbf{x} \neq \mathbf{a}$, and
the integral (5.3.2) would vanish. But we can find a normalized wave function
that represents a state in which the particle is almost certainly very close to
position $\mathbf{a}$:
$$\psi(\mathbf{x}) = (\sqrt{\pi}\,d)^{-3/2}\exp\left(-(\mathbf{x} - \mathbf{a})^2/2d^2\right) ,$$
where d is here some very small length.
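The normalization of this Gaussian can be checked by direct integration. The sketch assumes the prefactor is read as $(\sqrt{\pi}\,d)^{-3/2}$, which is the value that makes the integral of $|\psi|^2$ equal to unity; since $|\psi|^2$ then factorizes into three identical one-dimensional Gaussians, a single one-dimensional trapezoidal integral suffices. The width and center used are arbitrary sample values.

```python
import math


def gaussian_norm_1d(d, a=0.0, half_width=10.0, steps=20000):
    """Trapezoidal estimate of the integral over x of exp(-(x-a)^2/d^2) / (sqrt(pi) d)."""
    lo = a - half_width * d
    h = 2.0 * half_width * d / steps
    total = 0.0
    for k in range(steps + 1):
        x = lo + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * math.exp(-((x - a) / d) ** 2)
    return total * h / (math.sqrt(math.pi) * d)


d = 1e-3                                   # sample width (arbitrary units)
norm_1d = gaussian_norm_1d(d, a=0.2)       # one Cartesian factor of the integral
norm_3d = norm_1d**3                       # full three-dimensional integral of |psi|^2

print(norm_1d, norm_3d)
```

With this prefactor the total probability is one for any choice of d, so making d small concentrates the particle near $\mathbf{a}$ without spoiling the normalization.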
From operators we can construct other operators, which may or may not
represent physical quantities. Linear combinations provide a trivial example:
if A and B are operators while a and b are ordinary complex numbers, then
aA + bB is an operator for which
[aA + bB]ψ = aAψ + bBψ .
The product AB of any two operators A and B is defined by associativity: it is
an operator that, acting on any function f to its right, gives the same result as
acting first with B and then acting to the right on Bf with A:
(AB)f ≡ A(Bf ) . (5.3.5)
The Hamiltonian
One linear operator formed in this way is the Hamiltonian, which represents the
energy. For instance, for a single non-relativistic particle moving in a potential
V the Hamiltonian is
$$H = \frac{1}{2m}\mathbf{P}^2 + V(\mathbf{X}) . \tag{5.3.6}$$
The time-independent Schrödinger equation (5.2.7) is just the statement
$H\psi = E\psi$, which tells us that $\psi$ represents a state with energy E. The eigenfunctions
of this Hamiltonian with negative eigenvalues are normalizable, a condition
we imposed in finding the bound state energy values in Section 5.2, but
there are also eigenfunctions with positive eigenvalues, representing unbound
states, which can only be normalized in the same approximate sense as the
eigenfunctions of position and momentum.
Adjoints
There is another process for producing new operators from other operators,
analogous to taking the complex conjugate of a number. For any operator A,
we define the adjoint $A^\dagger$ as the operator for which
$$\int [A\psi_1]^*\,\psi_2 = \int \psi_1^*\,[A^\dagger\psi_2] \tag{5.3.7}$$
where $\psi_1$ and $\psi_2$ are any two wave functions. Here and below we use the
abbreviation
$$\int \psi_1^*\,\psi_2 \equiv \int d^3x_1\, d^3x_2 \cdots\ \psi_1^*(\mathbf{x}_1, \mathbf{x}_2, \ldots)\,\psi_2(\mathbf{x}_1, \mathbf{x}_2, \ldots) . \tag{5.3.8}$$
It is easy to see that the adjoint of a product is the product of the adjoints in the
opposite order:
$$[AB]^\dagger = B^\dagger A^\dagger \tag{5.3.9}$$
because
$$\int [AB\psi_1]^*\,\psi_2 = \int [B\psi_1]^*\,[A^\dagger\psi_2] = \int \psi_1^*\,[B^\dagger A^\dagger\psi_2] .$$
It is also obvious that the adjoint of a linear combination of operators is the same
linear combination of the adjoints, but with complex conjugate coefficients
$$[aA + bB]^\dagger = a^* A^\dagger + b^* B^\dagger , \tag{5.3.10}$$
and the adjoint of an adjoint gives back the original operator:
$$[A^\dagger]^\dagger = A . \tag{5.3.11}$$
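These rules can be tried out in a finite-dimensional stand-in, where wave functions become complex vectors, the integral of $\psi_1^*\psi_2$ becomes a sum over components, and the adjoint becomes the conjugate transpose. The sketch below is only an illustration under that assumption; the 2×2 matrices and vectors are arbitrary sample values, and the helper names are introduced here.

```python
def dagger(M):
    """Conjugate transpose of a square matrix given as nested lists."""
    n = len(M)
    return [[M[j][i].conjugate() for j in range(n)] for i in range(n)]


def matmul(A, B):
    """Matrix product AB."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def apply(A, v):
    """Action of the matrix A on the vector v."""
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]


def inner(u, v):
    """Discrete analogue of the integral of u* v."""
    return sum(x.conjugate() * y for x, y in zip(u, v))


def max_abs_diff(A, B):
    """Largest entrywise difference between two matrices."""
    return max(abs(A[i][j] - B[i][j]) for i in range(len(A)) for j in range(len(A)))


# Arbitrary sample operators, vectors, and coefficients.
A = [[1 + 2j, 3j], [2 + 0j, 1 - 1j]]
B = [[0.5 + 0j, 1 - 1j], [2j, -1 + 0j]]
psi1 = [1 + 1j, 2 + 0j]
psi2 = [0.5j, 1 - 2j]
a, b = 2 - 1j, 0.5j

# Defining property (5.3.7) of the adjoint.
defining_gap = abs(inner(apply(A, psi1), psi2) - inner(psi1, apply(dagger(A), psi2)))

# Product rule (5.3.9), linear-combination rule (5.3.10), and (5.3.11).
product_gap = max_abs_diff(dagger(matmul(A, B)), matmul(dagger(B), dagger(A)))
combo = [[a * A[i][j] + b * B[i][j] for j in range(2)] for i in range(2)]
combo_adj = [[a.conjugate() * dagger(A)[i][j] + b.conjugate() * dagger(B)[i][j]
              for j in range(2)] for i in range(2)]
linear_gap = max_abs_diff(dagger(combo), combo_adj)
double_gap = max_abs_diff(dagger(dagger(A)), A)

print(defining_gap, product_gap, linear_gap, double_gap)
```

All four printed differences vanish to rounding error, which mirrors Eqs. (5.3.7) and (5.3.9) through (5.3.11) in this simplified setting.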
There is an important class of linear operators that are their own adjoints
A† = A . (5.3.12)
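A physical quantity represented by such an operator can only have real values
in any state, for if $A\psi = \alpha\psi$ for some wave function $\psi$, then
$$\alpha\int\psi^*\psi = \int\psi^*[A\psi] = \int[A\psi]^*\psi = \alpha^*\int\psi^*\psi ,$$
and hence $\alpha = \alpha^*$. Similarly, if $A\psi_1 = \alpha_1\psi_1$ and $A\psi_2 = \alpha_2\psi_2$, then
$$\alpha_1\int\psi_2^*\psi_1 = \int\psi_2^*[A\psi_1] = \int[A\psi_2]^*\psi_1 = \alpha_2\int\psi_2^*\psi_1 ,$$
so if $\alpha_1 \neq \alpha_2$ then $\int\psi_2^*\psi_1 = 0$. Such wave functions are said to be orthogonal.
For instance any two different spherical harmonics such as those listed in
Section 5.2 are orthogonal, because they are eigenfunctions of the self-adjoint
operators $\mathbf{L}^2$ and $L_3$ with different eigenvalues $\hbar^2\ell(\ell+1)$ and/or $\hbar m$.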
Expectation Values
The interpretation given by Eq. (5.3.1) of |ψ|2 as a probability density tells us
that if we measure any function f (x1 , x2 , . . . ) of positions many times in the
state represented by wave function ψ, the mean value of the measured values
will be
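$$\langle f\rangle_\psi = \int f(\mathbf{x}_1,\mathbf{x}_2,\dots)\,|\psi(\mathbf{x}_1,\mathbf{x}_2,\dots)|^2\, d^3x_1\, d^3x_2\cdots\,,$$
provided that $\psi$ is normalized so that $\int|\psi(\mathbf{x}_1,\mathbf{x}_2,\dots)|^2\, d^3x_1\, d^3x_2\cdots = 1$.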
Since ψ(x1 , x2 , . . . ) is an eigenfunction of the operator f (X1 , X2 , . . . ) with
eigenvalue f (x1 , x2 , . . . ), this can be written
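$$\langle f\rangle_\psi = \int\psi^*(\mathbf{x}_1,\mathbf{x}_2,\dots)\,[f(\mathbf{X}_1,\mathbf{X}_2,\dots)\psi](\mathbf{x}_1,\mathbf{x}_2,\dots)\,.$$
Extending this to any observable represented by an operator $A$, the expectation
value (5.3.13) is $\langle A\rangle_\psi = \int\psi^*[A\psi]$. If $A$ is self-adjoint then
$\langle A\rangle_\psi^* = \int[A\psi]^*\psi = \int\psi^*[A\psi] = \langle A\rangle_\psi$,
so the expectation value of a self-adjoint operator is real for any wave function.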
It is obvious that if $A\psi = \alpha\psi$ then the expectation value $\langle A\rangle_\psi$ of $A$ for
the wave function $\psi$ is just $\alpha$, but expectation values give useful information
even for wave functions that do not represent states with a definite value for
the observable. For instance, the mean square spread of values of an observable
represented by $A$ around its mean value is
$$(\Delta A)^2 \equiv \big\langle (A - \langle A\rangle)^2 \big\rangle\,. \tag{5.3.14}$$
Probabilities
Suppose a physical system is in a state represented by a normalized wave func-
tion ψ, and we measure an observable represented by a Hermitian operator A
that (to start with the simplest case) has only discrete non-degenerate eigenval-
ues αn with eigenfunctions ϕn . Even though ψ will not in general be one of these
eigenfunctions, it can generally be expanded as a series of terms proportional to
the eigenfunctions
$$\psi = \sum_n c_n\,\varphi_n\,,$$
where cn are some numerical coefficients. (The proof of the possibility of such
an expansion depends on detailed properties of the operator A.) As we have
seen, such eigenfunctions are orthogonal and, if properly normalized, can be
taken as orthonormal in the sense that
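$$\int\varphi_m^*\varphi_n = \delta_{nm} \equiv \begin{cases} 1 & n = m \\ 0 & n \neq m\,. \end{cases}$$
We can find the coefficients $c_m$ by multiplying the expansion with $\varphi_m^*$ and
integrating over all values of the arguments, which gives
$$\int\varphi_m^*\psi = \sum_n c_n\int\varphi_m^*\varphi_n = c_m\,.$$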
The expectation value of the observable A is then
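$$\langle A\rangle_\psi = \int\psi^* A\psi = \sum_{nm} c_n^*\, c_m\int\varphi_n^* A\varphi_m$$
or, since $A\varphi_m = \alpha_m\varphi_m$,
$$\langle A\rangle_\psi = \sum_{nm}\alpha_m\, c_n^*\, c_m\int\varphi_n^*\varphi_m = \sum_m\alpha_m|c_m|^2 = \sum_m\alpha_m\Big|\int\varphi_m^*\psi\Big|^2\,.$$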
Since a corresponding result is true for any function of A, the inevitable inter-
pretation is that when the observable represented by A is measured in a state
represented by the normalized wave function ψ, the probability of finding the
result αm is
$$P_m(\psi) = \Big|\int\varphi_m^*\psi\Big|^2\,. \tag{5.3.15}$$
This is known as the Born rule and can be taken instead of Eq. (5.3.13) as the
third postulate of quantum mechanics.
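The chain from expansion coefficients to probabilities can be illustrated with a small sketch. It assumes a hypothetical two-state system in which the integrals become finite sums; the observable, its eigenvectors, and the state below are invented for the example and are not from the text.

```python
import math

def inner(phi, psi):
    """Finite sum standing in for the integral of phi* psi."""
    return sum(complex(a).conjugate() * b for a, b in zip(phi, psi))

# Hypothetical self-adjoint observable with eigenvalues +1 and -1.
A = [[0.0, 1.0], [1.0, 0.0]]
s = 1 / math.sqrt(2)
alphas = [+1.0, -1.0]
phis = [[s, s], [s, -s]]      # orthonormal eigenvectors of A

psi = [0.6, 0.8]              # normalized state: 0.36 + 0.64 = 1

# Coefficients c_m obtained by projecting psi on each eigenvector.
coeffs = [inner(phi, psi) for phi in phis]

# Born rule, Eq. (5.3.15): P_m is the squared modulus of the projection.
probs = [abs(c) ** 2 for c in coeffs]

# The mean value computed from probabilities and directly from psi* A psi.
mean_from_probs = sum(a * p for a, p in zip(alphas, probs))
A_psi = [sum(A[i][k] * psi[k] for k in range(2)) for i in range(2)]
mean_direct = inner(psi, A_psi).real
```

The two ways of computing the mean agree, and the probabilities add up to one because the state is normalized and the eigenvectors are orthonormal.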
Continuum Limit
We can also calculate probability densities for an observable that takes a con-
tinuum of values, by taking the limit of the case in which the observable takes a
very large number of very close discrete values. If the number of values of the
index n for which the eigenvalue αn is in a range from α to α + dα is N (α)dα,
then in the state represented by normalized wave function ψ the probability of
finding the observable in this range is
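$$dP(\alpha) = N(\alpha)\,d\alpha \times \Big|\int\varphi_\alpha^*\psi\Big|^2\,, \tag{5.3.16}$$
where $\varphi_\alpha$ is an eigenfunction with eigenvalue in this range. It is then natural
to define a wave function that depends on the value of the observable,
$$\psi(\alpha) \equiv \sqrt{N(\alpha)}\int\varphi_\alpha^*\psi\,, \tag{5.3.17}$$
for which Eq. (5.3.16) gives the probability of finding the observable in the
range from $\alpha$ to $\alpha + d\alpha$:
$$dP(\alpha) = |\psi(\alpha)|^2\, d\alpha\,. \tag{5.3.18}$$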
The classic example of such continuum operators and alternative wave functions
is provided by momentum.
Momentum Space
Consider for instance a particle in a cubical box of edge L. The normalized
wave function representing a state with definite momentum p is
$$\varphi_{\mathbf{p}}(\mathbf{x}) = L^{-3/2}\exp(i\mathbf{p}\cdot\mathbf{x}/\hbar)\,.$$
Pretty much as we saw for photons in Section 3.2, the allowed momenta take
the form p = 2πnh̄/L, where n is a vector with integer components, so that
this wave function should have the same values on opposite sides of the box.
In a state represented by a normalized wave function ψ(x), the probability of
finding the momentum to have value p is
$$P_{\mathbf{p}} = \Big|\int_{L^3}\varphi_{\mathbf{p}}^*(\mathbf{x})\,\psi(\mathbf{x})\, d^3x\Big|^2\,,$$
the integral here taken over the interior of the box. We can pass to the continuum
limit by taking the box to be very large, so that the allowed momentum values
are very close together. Since the allowed vectors $\mathbf{n}$ form a lattice of cubes, each
of volume unity, the number $dN$ of these allowed momenta in a small volume of
momentum space $d^3p$ around $\mathbf{p}$ equals the corresponding volume in the space
of vectors $\mathbf{n}$:
$$dN = d^3n = \Big(\frac{L}{2\pi\hbar}\Big)^3 d^3p$$
so the probability of finding the momentum in this range is
$$\Big(\frac{L}{2\pi\hbar}\Big)^3 d^3p \times \Big|\int_{L^3}\varphi_{\mathbf{p}}^*(\mathbf{x})\,\psi(\mathbf{x})\, d^3x\Big|^2 = |\tilde\psi(\mathbf{p})|^2\, d^3p\,, \tag{5.3.19}$$
where
$$\tilde\psi(\mathbf{p}) \equiv \Big(\frac{L}{2\pi\hbar}\Big)^{3/2}\int_{L^3}\varphi_{\mathbf{p}}^*(\mathbf{x})\,\psi(\mathbf{x})\, d^3x \;\to\; (2\pi\hbar)^{-3/2}\int\exp(-i\mathbf{p}\cdot\mathbf{x}/\hbar)\,\psi(\mathbf{x})\, d^3x\,, \tag{5.3.20}$$
with the last integral taken over all space. We can just as well say that the state
of the system is represented by the momentum-space wave function $\tilde\psi(\mathbf{p})$ as by
the coordinate-space wave function $\psi(\mathbf{x})$. Indeed, as we will see in Section 5.10,
both the coordinate-space wave function $\psi$ and the momentum-space wave
function $\tilde\psi$ are nothing but the components in different bases of a vector in
an abstract space, known as Hilbert space.
Commutation Relations
The commutator of two operators A and B, written [A, B], is defined by
[A, B] ≡ AB − BA . (5.3.21)
In order for the physical quantities represented by operators A and B to have
definite numerical values α and β in a state represented by wave function ψ it
is necessary that the commutator [A, B] acting on ψ should vanish, because
[A, B]ψ = βAψ − αBψ = βαψ − αβψ = 0 .
In particular, it is never possible for any state to have definite values for both of
two quantities represented by operators whose commutator is simply a non-zero
number, because such a commutator can never give zero when acting on any ψ.
It is helpful in evaluating commutators to note that commutation acts like
differentiation. For instance,
[A, BC] = ABC − BCA = ABC − BAC + BAC − BCA
= [A, B]C + B[A, C] .
Thus, since all components of momentum commute with one another, they also
commute with any function only of momenta, such as the total kinetic energy
operator $\sum_n \mathbf{P}_n^2/2m_n$.
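The differentiation-like rule for commutators quoted above, [A, BC] = [A, B]C + B[A, C], holds for any three operators. A minimal sketch, using arbitrary 2x2 integer matrices as stand-ins for operators (the matrices and helper names are assumptions for illustration only):

```python
def matmul(a, b):
    """Matrix product of two square nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(a, b):
    """Entrywise sum of two matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def sub(a, b):
    """Entrywise difference of two matrices."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def comm(a, b):
    """Commutator [a, b] = ab - ba, as in Eq. (5.3.21)."""
    return sub(matmul(a, b), matmul(b, a))

# Arbitrary sample operators; integer entries keep the arithmetic exact.
A = [[1, 2], [3, 4]]
B = [[0, 1], [5, -2]]
C = [[2, -1], [1, 3]]

lhs = comm(A, matmul(B, C))
rhs = add(matmul(comm(A, B), C), matmul(B, comm(A, C)))
```

Because the identity is purely algebraic, the two sides agree exactly, with no rounding involved.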
Uncertainty Principle
Note that the commutator of Xni and Pmj acting on any multi-particle wave
function ψ is
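$$[X_{ni}, P_{mj}]\psi = -i\hbar\, x_{ni}\frac{\partial\psi}{\partial x_{mj}} + i\hbar\frac{\partial}{\partial x_{mj}}(x_{ni}\psi)$$
$$= -i\hbar\, x_{ni}\frac{\partial\psi}{\partial x_{mj}} + i\hbar\,\delta_{ij}\delta_{nm}\psi + i\hbar\, x_{ni}\frac{\partial\psi}{\partial x_{mj}} = +i\hbar\,\delta_{ij}\delta_{nm}\psi\,,$$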
a result we write as a commutation relation
[Xni , Pmj ] = i h̄δij δnm . (5.3.22)
This shows in particular that there can be no state in which a component of some
particle’s position and the same component of the same particle’s momentum
both have definite values.
Indeed, using this commutation relation, it is possible to set a lower bound
on the product of the root mean square spread of values of position and
momentum:
$$\Delta X_{ni}\,\Delta P_{ni} \ge \hbar/2 \tag{5.3.23}$$
a result known as the Heisenberg uncertainty principle.7
We can see this in a simple example. The normalized wave function for a
particle confined to a distance d around some position a can be written as a
superposition of wave functions with definite momentum:
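$$(\pi d)^{-3/2}\exp\big(-(\mathbf{x}-\mathbf{a})^2/2d^2\big) = \Big(\frac{d}{2\pi^2\hbar^2}\Big)^{3/2}\int d^3p\,\exp\big(i\mathbf{p}\cdot[\mathbf{x}-\mathbf{a}]/\hbar\big)\exp(-d^2\mathbf{p}^2/2\hbar^2)\,. \tag{5.3.24}$$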
We see that if the spread in values of x is of order d, then the spread in values
of p is of order h̄/d, and the product of the spreads is of order h̄, in accordance
with the uncertainty principle.
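For a Gaussian the bound (5.3.23) is in fact saturated, which can be checked numerically. The sketch below is a one-dimensional version with several assumptions of its own: units with ħ = 1 by default, a Gaussian centered at the origin and normalized as (πd²)^(-1/4), a uniform grid, and a central-difference derivative. The function name and grid parameters are hypothetical.

```python
import math

def gaussian_spreads(d, hbar=1.0, half_width=10.0, n=4001):
    """Return (Delta x, Delta p) for a real 1D Gaussian of width d.

    The mean values of x and p vanish by symmetry, so the spreads are
    the square roots of <x^2> and <p^2>.
    """
    dx = 2 * half_width * d / (n - 1)
    xs = [-half_width * d + i * dx for i in range(n)]
    psi = [(math.pi * d * d) ** -0.25 * math.exp(-x * x / (2 * d * d)) for x in xs]
    norm = sum(p * p for p in psi) * dx

    mean_x2 = sum(x * x * p * p for x, p in zip(xs, psi)) * dx / norm

    # <p^2> = hbar^2 * integral of |d psi/dx|^2, valid because psi vanishes
    # at the ends of the grid; the derivative is a central difference.
    dpsi = [(psi[i + 1] - psi[i - 1]) / (2 * dx) for i in range(1, n - 1)]
    mean_p2 = hbar ** 2 * sum(g * g for g in dpsi) * dx / norm

    return math.sqrt(mean_x2), math.sqrt(mean_p2)

delta_x, delta_p = gaussian_spreads(d=2.0)
product = delta_x * delta_p   # close to hbar / 2 for any choice of d
```

Changing d trades spread in position against spread in momentum while leaving the product essentially fixed.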
7 W. Heisenberg, Zeit. Phys. 43, 172 (1927). For a textbook proof, see Weinberg, Lectures on Quantum
Mechanics, listed in the bibliography.
Time Dependence
In the earliest formulation of quantum mechanics, wave functions were given a
time dependence governed by the time-dependent Schrödinger equation:
$$i\hbar\frac{\partial}{\partial t}\psi(\mathbf{x}_1,\mathbf{x}_2,\dots;t) = H\psi(\mathbf{x}_1,\mathbf{x}_2,\dots;t) \tag{5.3.25}$$
where H is the Hamiltonian operator, representing the energy of the system.
The wave function of a state with a definite energy E thus has a trivial time-
dependence, contained in a phase factor exp(−iEt/h̄). The expectation value
(5.3.13) of any operator for such a wave function is independent of time. More
generally, assuming that the Hamiltonian is self-adjoint, the time dependence
of the expectation value of an observable represented by an operator A in a
state represented by a normalized wave function ψ that satisfies Eq. (5.3.25) is
governed by the differential equation
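$$\frac{d}{dt}\langle A\rangle = \int\psi^*[(-i/\hbar)AH\psi] + \int[(-i/\hbar)H\psi]^*[A\psi] = (-i/\hbar)\Big[\int\psi^*[AH\psi] - \int\psi^*[HA\psi]\Big]$$
and therefore
$$i\hbar\frac{d}{dt}\langle A\rangle = \langle[A,H]\rangle\,. \tag{5.3.26}$$
In particular, the normalization integral $\int\psi^*\psi$ is the expectation value (5.3.13)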
of the unit operator, which acting on any wave function just gives the same wave
function. Since this operator commutes with the Hamiltonian (or anything else),
the normalization integral is constant in time; once normalized, wave functions
remain normalized.
For instance, in the case of a single particle moving in an external potential,
with Hamiltonian (5.3.6),
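$$H = \frac{1}{2m}\mathbf{P}^2 + V(\mathbf{X})\,,$$
we have
$$[\mathbf{X}, H] = \frac{1}{2m}[\mathbf{X}, \mathbf{P}^2] = \frac{i\hbar}{m}\mathbf{P} \tag{5.3.27}$$
and
$$[\mathbf{P}, H] = [\mathbf{P}, V(\mathbf{X})] = -i\hbar\nabla V(\mathbf{X})\,. \tag{5.3.28}$$
The equations of motion of the expectation values are then
$$\frac{d}{dt}\langle\mathbf{X}\rangle = \langle\mathbf{P}\rangle/m\,, \qquad \frac{d}{dt}\langle\mathbf{P}\rangle = -\langle\nabla V(\mathbf{X})\rangle\,. \tag{5.3.29}$$
This is much the same as in classical physics, but note that $\langle\nabla V(\mathbf{X})\rangle$ is not the
same as $\nabla V(\langle\mathbf{X}\rangle)$, so this is not a closed set of equations.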
Conservation Laws
Operators that commute with the Hamiltonian deserve special attention, in part
because their expectation values (and the expectation values of any functions
of them) are time-independent for any wave function. These represent what are
called conserved quantities. Among these operators is of course $H$ itself, so the
mean energy $\langle H\rangle$ of any state is constant in time. The momentum of a particle
moving in an external potential is not conserved, but the total momentum of a
number of particles is conserved if the potential depends only on the differences
of their coordinates. For instance, for such a two-particle system,
$$[\mathbf{P}_1 + \mathbf{P}_2\,,\, V(\mathbf{X}_1 - \mathbf{X}_2)] = -i\hbar\nabla_1 V(\mathbf{X}_1 - \mathbf{X}_2) - i\hbar\nabla_2 V(\mathbf{X}_1 - \mathbf{X}_2) = 0\,.$$
What about angular momentum? For simplicity, consider just a single particle,
whose orbital angular momentum is $\mathbf{L} = \mathbf{X}\times\mathbf{P}$, in a potential that depends
only on $R = \sqrt{\mathbf{X}^2}$. It is straightforward to work out that the commutators of a
general linear combination e · L of the components of L with the position and
momentum operators are
[e · L , X] = −i h̄ e × X , (5.3.30)
[e · L , P] = −i h̄ e × P . (5.3.31)
For instance,
$$[\mathbf{e}\cdot\mathbf{L}, X_1] = \big[e_1(X_2P_3 - X_3P_2) + e_2(X_3P_1 - X_1P_3) + e_3(X_1P_2 - X_2P_1)\,,\, X_1\big]$$
$$= -i\hbar(e_2X_3 - e_3X_2) = -i\hbar(\mathbf{e}\times\mathbf{X})_1\,.$$
It follows that each component of $\mathbf{L}$ commutes with $\mathbf{P}^2$ and $\mathbf{X}^2$, and hence also
with $V(\sqrt{\mathbf{X}^2})$, and so with the Hamiltonian.
Another reason for us to give special attention to operators that commute with
the Hamiltonian is that states with a given energy can be classified according
to the eigenvalues of these conserved quantities. For instance, for a Coulomb
potential the states with a given principal quantum number $n$ and hence with a
given energy can be classified according to the eigenvalues of $\mathbf{L}^2$ and $L_3$, both
of which commute with the Hamiltonian as well as with each other. Of course,
$L_1$ and $L_2$ also commute with the Hamiltonian and with $\mathbf{L}^2$, but as we shall
see in the next section they do not commute with each other, or with $L_3$, so the
best we can do is to classify states according to the eigenvalues of $\mathbf{L}^2$ and $L_3$ as
well as $H$.
Spin Discovered
The counting of states described in Section 5.2 was already known in 1925 to
be in conflict with spectroscopic data. The problem emerged most clearly in the
study of alkali metals. These are elements such as lithium, sodium, potassium,
etc. that were known to readily lose a single electron.8 In the
atomic models of the time, this meant that an alkali metal atom has one loosely
bound electron outside inner shells of more tightly bound electrons. The potential
felt by this outer electron is spherically symmetric but it is not a Coulomb
potential, which would be proportional to $1/r$; so because $L_3$ and $\mathbf{L}^2$ commute
with each other and with $H$ it was still expected that states of definite energy
would also have definite values $\hbar^2\ell(\ell+1)$ for $\mathbf{L}^2$ and $2\ell+1$ states of equal
energy for any given $\ell$, distinguished by different eigenvalues $\hbar m$ of $L_3$, but no
further degeneracy was expected. States could still be labeled with a principal
quantum number $n$, defined so that the number of nodes of the wave function
(values of $r$ where the wave function vanishes) is $n - \ell - 1$, as it is in hydrogen,
but, unlike the case of hydrogen, here the energies depend on $\ell$ as well as on $n$.
There is a very well studied “D-line” in the spectrum of sodium vapor (which
gives sodium vapor lamps their orange color) with wavelength about 5890
angstroms, interpreted as a 3p → 3s transition between states of the outermost
electron with n = 3. But even with moderate resolution, spectroscopists were
able to see that this line was doubled, having two components with wavelengths
5896 angstroms and 5890 angstroms. Wolfgang Pauli (1900–1958) was led
to suggest that, on the basis of this and other data, there is a fourth quantum
number, besides $n$, $\ell$, and $m$, which takes just two values in all states with $\ell \ge 1$.
But the physical significance of this quantum number was at first mysterious.
Then in 1925 two young Dutch physicists, Samuel Goudsmit (1902–1978)
and George Uhlenbeck (1900–1988), suggested9 that the extra quantum number
8 With the charge of the electron and the atomic weights of these elements known, it could be concluded
from the ratio of the metal mass produced in electrolysis to the electric charge used that one electron is
needed to convert one ion in a solution of the metal salt to an alkali metal atom, so the atom in becoming
an ion had to lose just one electron. This was in contrast with metals like beryllium, magnesium, calcium,
etc., which require two electrons to convert an ion to an atom.
9 S. Goudsmit and G. Uhlenbeck, Naturwiss. 13, 953 (1925).
Rotations
In general, an infinitesimal rotation changes any vector v by an amount
δv = e × v (5.4.1)
where e is an infinitesimal 3-vector characterizing the rotation. This is a rotation,
because it leaves all scalar products unchanged:
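$$\delta(\mathbf{v}\cdot\mathbf{v}') = \mathbf{v}\cdot(\mathbf{e}\times\mathbf{v}') + (\mathbf{e}\times\mathbf{v})\cdot\mathbf{v}' = 0\,. \tag{5.4.2}$$
It is in fact (though we don’t need to know this here) a rotation by an
infinitesimal angle of $|\mathbf{e}|$ radians counterclockwise around the direction of $\mathbf{e}$. For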
instance, if e is in the 3-direction then (5.4.1) gives
δv1 = −|e|v2 , δv2 = +|e|v1 , δv3 = 0 .
Now, suppose that one observer sees that a physical system is in a state
represented by a wave function ψ, and suppose a second observer views the
same state using coordinate axes that have been subjected to a slight rotation,
which changes any vector v by an infinitesimal amount e × v. What does she
see? For e infinitesimal the change in the wave function must be linear in e, and
can therefore be written
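$$\delta\psi = -(i/\hbar)\,\mathbf{e}\cdot\mathbf{J}\,\psi \tag{5.4.3}$$
where $\mathbf{J}$ is some triplet of operators and the factor $-(i/\hbar)$ is inserted for future
convenience. We would not want the rotation to change the total probability
$\int\psi^*\psi = 1$ that the particles in the system are somewhere, so we require that
$$0 = \delta\int\psi^*\psi = \int[-(i/\hbar)\,\mathbf{e}\cdot\mathbf{J}\psi]^*\,\psi + \int\psi^*\,[-(i/\hbar)\,\mathbf{e}\cdot\mathbf{J}\psi]$$
$$= (i/\hbar)\int\psi^*\,\mathbf{e}\cdot(\mathbf{J}^\dagger - \mathbf{J})\psi$$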
and the definition (5.3.7) of the adjoint A† of any operator A, except that as we
shall see we must include discrete variables along with coordinates.)
In order for the transformation of the wave function to correspond to a rota-
tion, it is necessary that it should produce a rotation of expectation values. That
is, if V is an operator representing an observable that transforms as a vector
under the general infinitesimal rotation (5.4.1), we must have
δV = e × V . (5.4.5)
From Eqs. (5.4.3) and (5.4.4) we see that
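$$\delta\int\psi^*\mathbf{V}\psi = \int[(-i/\hbar)\,\mathbf{e}\cdot\mathbf{J}\psi]^*\,\mathbf{V}\psi + \int\psi^*\,\mathbf{V}(-i/\hbar)\,\mathbf{e}\cdot\mathbf{J}\psi$$
$$= (i/\hbar)\int\psi^*\,[\mathbf{e}\cdot\mathbf{J}, \mathbf{V}]\psi$$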
in which the coefficient of $e_1$ is $[J_1, J_2] = i\hbar J_3$.) Also, $\mathbf{J}^2$ like any other scalar
commutes with $\mathbf{J}$:
$$[J_i, \mathbf{J}^2] = 0\,. \tag{5.4.10}$$
As we shall see, it is the commutation relations (5.4.9) that determine the possible
values of $\mathbf{J}^2$ and the possible values of $J_3$ for a given $\mathbf{J}^2$.
It follows that
[Ji , Jj ] = [Li , Lj ] + [Si , Sj ]
so the Si satisfy the same commutation relations with each other as in Eq. (5.4.9)
for J and Eq. (5.4.12) for L:
[S1 , S2 ] = i h̄S3 , [S2 , S3 ] = i h̄S1 , [S3 , S1 ] = i h̄S2 . (5.4.16)
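A concrete set of operators obeying Eq. (5.4.16) is provided by the standard spin-1/2 matrices, half the Pauli matrices. The sketch below checks the three commutators and the value of S², working in units with ħ = 1 (an assumption made for brevity); the helper names are hypothetical.

```python
# Spin-1/2 matrices in units where hbar = 1: S_i = sigma_i / 2,
# with sigma_i the Pauli matrices.
S1 = [[0, 0.5], [0.5, 0]]
S2 = [[0, -0.5j], [0.5j, 0]]
S3 = [[0.5, 0], [0, -0.5]]

def matmul(a, b):
    """Matrix product of two 2x2 nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def commutator(a, b):
    """[a, b] = ab - ba."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(2)] for i in range(2)]

def times_i(a):
    """Multiply every entry of a by the imaginary unit."""
    return [[1j * x for x in row] for row in a]

def close(a, b, tol=1e-12):
    """Entrywise comparison of two matrices."""
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

# The three cyclic relations of Eq. (5.4.16) with hbar = 1.
relations_hold = (
    close(commutator(S1, S2), times_i(S3))
    and close(commutator(S2, S3), times_i(S1))
    and close(commutator(S3, S1), times_i(S2))
)

# S^2 = S1^2 + S2^2 + S3^2 should be s(s+1) = 3/4 times the unit matrix.
squares = [matmul(S, S) for S in (S1, S2, S3)]
S_squared = [[sum(sq[i][j] for sq in squares) for j in range(2)] for i in range(2)]
```

The value 3/4 for S² corresponds to s(s + 1) with s = 1/2.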
Multiplets
We next show how to use the commutation relations (5.4.9) to find the allowed
values of $\mathbf{J}^2$ and the range of allowed values of $J_3$ for a given $\mathbf{J}^2$. Though
presented here for the total angular momentum J, precisely the same reason-
ing and corresponding results apply to any angular momentum operators with
corresponding commutation relations, such as the orbital angular momentum
vector L that satisfies Eq. (5.4.12) and the spin angular momentum vector S that
satisfies Eq. (5.4.16).
First, we note that
$$[J_3, (J_1 \pm iJ_2)] = i\hbar J_2 \pm i(-i\hbar J_1) = \pm\hbar(J_1 \pm iJ_2)\,. \tag{5.4.17}$$
Therefore $J_1 \pm iJ_2$ act as raising and lowering operators: for a wave function
$\psi^m$ that satisfies the eigenvalue condition $J_3\psi^m = \hbar m\psi^m$ (with any $m$), we
have
$$J_3(J_1 \pm iJ_2)\psi^m = (m \pm 1)\hbar\,(J_1 \pm iJ_2)\psi^m\,,$$
so if $(J_1 \pm iJ_2)\psi^m$ does not vanish then it is an eigenfunction of $J_3$ with
eigenvalue $\hbar(m \pm 1)$. Since $\mathbf{J}^2$ commutes with $J_3$, we can choose $\psi^m$ to be
an eigenfunction of $\mathbf{J}^2$ as well as $J_3$, and, since $\mathbf{J}^2$ commutes with $(J_1 \pm iJ_2)$,
all the wave functions that are connected with each other by lowering and/or
raising operators will have the same eigenvalue for $\mathbf{J}^2$. We say that such wave
functions form an angular momentum multiplet.
Now, there must be a maximum and a minimum to the eigenvalues of $J_3$ that
can be reached in this way, because the square of any eigenvalue of $J_3$ is
necessarily not more than the eigenvalue of $\mathbf{J}^2$. The reason is that for any wave
function $\psi$ that has an eigenvalue $a$ for $J_3$ and an eigenvalue $b$ for $\mathbf{J}^2$, we have
$$b - a^2 = \langle(\mathbf{J}^2 - J_3^2)\rangle = \langle(J_1^2 + J_2^2)\rangle \ge 0\,.$$
It is conventional to define a quantity $j$ as the maximum value of the eigenvalues
of $J_3/\hbar$ for a particular multiplet of wave functions that are related by raising
and lowering operators. We will also temporarily define $j'$ as the minimum
eigenvalue of $J_3/\hbar$ for these wave functions. The wave function $\psi^j$ for which
$J_3$ takes its maximum eigenvalue $\hbar j$ must satisfy
$$(J_1 + iJ_2)\psi^j = 0\,, \tag{5.4.18}$$
since otherwise $(J_1 + iJ_2)\psi^j$ would be a wave function with a larger eigenvalue
of $J_3$. Likewise, acting on the wave function $\psi^j$ with $(J_1 - iJ_2)$ gives an
eigenfunction of $J_3$ with eigenvalue $\hbar(j - 1)$, unless of course this wave function
vanishes. Continuing in this way, we must eventually get to a wave function $\psi^{j'}$
with the minimum eigenvalue $\hbar j'$ of $J_3$, which satisfies
$$(J_1 - iJ_2)\psi^{j'} = 0\,, \tag{5.4.19}$$
since otherwise $(J_1 - iJ_2)\psi^{j'}$ would be a wave function with an even smaller
eigenvalue of $J_3$. We get to $\psi^{j'}$ from $\psi^j$ by applying the lowering operator
$(J_1 - iJ_2)$ a whole number of times, so $j - j'$ must be a whole number.
To go further, we use the commutation relations of $J_1$ and $J_2$ to show that
$$(J_1 - iJ_2)(J_1 + iJ_2) = J_1^2 + J_2^2 + i[J_1, J_2] = \mathbf{J}^2 - J_3^2 - \hbar J_3\,, \tag{5.4.20}$$
$$(J_1 + iJ_2)(J_1 - iJ_2) = J_1^2 + J_2^2 - i[J_1, J_2] = \mathbf{J}^2 - J_3^2 + \hbar J_3\,. \tag{5.4.21}$$
According to Eq. (5.4.18), the operator (5.4.20) gives zero when acting on
$\psi^j$, so
$$\mathbf{J}^2\psi^j = \hbar^2 j(j + 1)\,\psi^j\,. \tag{5.4.22}$$
On the other hand, according to Eq. (5.4.19) the operator (5.4.21) gives zero
when acting on $\psi^{j'}$, so
$$\mathbf{J}^2\psi^{j'} = \hbar^2 j'(j' - 1)\,\psi^{j'}\,. \tag{5.4.23}$$
But all these wave functions are eigenfunctions of $\mathbf{J}^2$ with the same eigenvalue,
so $j'(j' - 1) = j(j + 1)$. This quadratic equation for $j'$ has two solutions,
$j' = j + 1$, and $j' = -j$. The first solution is impossible, because $j'$ is the
minimum eigenvalue of $J_3/\hbar$ and therefore cannot be greater than the maximum
eigenvalue $j$. This leaves us with the other solution,
$$j' = -j\,. \tag{5.4.24}$$
But we saw that $j - j' = 2j$ must be a non-negative integer, so $j$ must be a
non-negative integer or half integer. The eigenvalues of $J_3$ range over the $2j + 1$
values of $\hbar m$ with $m$ running by unit steps from $-j$ to $+j$. The corresponding
eigenfunctions will be denoted $\psi_j^m$, so that
$$J_3\psi_j^m = \hbar m\,\psi_j^m\,, \qquad m = -j, -j+1, \dots, +j \tag{5.4.25}$$
$$\mathbf{J}^2\psi_j^m = \hbar^2 j(j + 1)\,\psi_j^m\,. \tag{5.4.26}$$
These are the same eigenvalues as those we found in the previous section in
the case of orbital angular momentum, with the one big difference that j and m
may be half-integers rather than integers. This justifies the guess of Goudsmit
5.4 Spin and Orbital Angular Momentum 157
and Uhlenbeck that electrons could have an intrinsic angular momentum with
j = 1/2, but that is the end of the surprises – we see that it is not possible to have
physical systems with weird angular momenta such as j = 1/3, j = 1/4, etc.
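The content of Eqs. (5.4.25) and (5.4.26) can be captured in a few lines. The sketch below uses exact fractions so that half-integer values are handled without rounding; the function names are hypothetical, and the J² eigenvalue is returned in units of ħ².

```python
from fractions import Fraction

def multiplet(j):
    """Return the allowed values of m for a multiplet of angular momentum j.

    Raises ValueError unless 2j is a non-negative integer, which is the
    condition derived from the commutation relations.
    """
    j = Fraction(j)
    if j < 0 or (2 * j).denominator != 1:
        raise ValueError("2j must be a non-negative integer")
    return [-j + k for k in range(int(2 * j) + 1)]

def j_squared_eigenvalue(j):
    """Eigenvalue j(j+1) of J^2, in units of hbar squared."""
    j = Fraction(j)
    return j * (j + 1)

spin_half = multiplet(Fraction(1, 2))     # two states, m = -1/2 and +1/2
spin_two = multiplet(2)                   # five states, m = -2 ... +2
```

Calling `multiplet(Fraction(1, 3))` raises `ValueError`, reflecting the statement that values such as j = 1/3 cannot occur.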
Using these results, we can work out the action of any component of $\mathbf{J}$ on
these multiplets. Because $J_1 - iJ_2$ is a lowering operator, we must have
$$(J_1 - iJ_2)\psi_j^m = \alpha_{jm}\,\psi_j^{m-1}\,,$$
where the $\alpha_{jm}$ are various constants that depend on how the wave functions are
normalized. If we assume that $\psi_j^m$ and $\psi_j^{m-1}$ both have unit norm, then, using
Eq. (5.4.21),
$$|\alpha_{jm}|^2 = \int\psi_j^{m*}(J_1 + iJ_2)(J_1 - iJ_2)\psi_j^m = \int\psi_j^{m*}(\mathbf{J}^2 - J_3^2 + \hbar J_3)\psi_j^m$$
The values of $\sigma_n$ run over the $2s_n + 1$ values from $-s_n$ to $+s_n$. In place of
Eq. (5.3.8), the scalar product of two wave functions for systems of particles
with spin includes a sum over all these $\sigma_n$:
$$\int\psi_a^*\psi_b \equiv \sum_{\sigma_1\sigma_2\cdots}\int d^3x_1\, d^3x_2\cdots\,\psi_a^*(\mathbf{x}_1,\sigma_1,\mathbf{x}_2,\sigma_2,\dots)\,\psi_b(\mathbf{x}_1,\sigma_1,\mathbf{x}_2,\sigma_2,\dots)\,. \tag{5.4.29}$$
The spin operator S does not act on the coordinate arguments, but produces
linear combinations of wave functions with various values of the σn . (Instead of
the 3-component of angular momentum, we can label states with the helicity,
the component of angular momentum in the direction of motion in units of h̄.
Photon states have only helicity ±1, corresponding to the two states of circular
polarization.)
Next, consider the wave functions $\psi_{j_a,j_b}^{j_a-1,\,j_b}$ and $\psi_{j_a,j_b}^{j_a,\,j_b-1}$. Both are
eigenfunctions of $J_3$ with $m = j_a + j_b - 1$. One linear combination of these must be the
member of the j = ja + jb multiplet with m = ja + jb − 1; the other then has
to be a member of some other multiplet, which by the same reasoning as before
must have j = ja + jb − 1.
We can continue in this way, with one multiplet for each j = ja + jb , j =
ja +jb −1, j = ja +jb −2, and so on. After ν steps, with m = ja +jb −ν, there
are ν + 1 choices of ma running up from ja − ν to ja , and mb = m − ma running
down from jb to jb − ν, with one new multiplet having j = ja + jb − ν for each
increase in ν. But this ends with ν = 2jb (taking ja ≥ jb ), for which mb runs
from jb to −jb . At the next step, with ν = 2jb + 1, we would only get a new
multiplet if mb could run from jb down to −jb −1, which is impossible since we
can only have |mb | ≤ jb . So when ja ≥ jb , the lowest value of j that is found
in the addition of angular momenta ja and jb is j = ja + jb − 2jb = ja − jb .
Of course, in the same way, if jb ≥ ja , the lowest value of j is jb − ja . So in
this way, we construct one multiplet for each j in the range
j = ja + jb , j = ja + jb − 1, j = ja + jb − 2, . . . , j = |ja − jb | .
(5.4.30)
This is the general rule for adding angular momenta.
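The rule (5.4.30) is easy to tabulate, and a useful consistency check is that the multiplets together contain exactly as many states as the products of the original wave functions. A minimal sketch, with hypothetical function names and exact fractions for half-integer values:

```python
from fractions import Fraction

def allowed_j(ja, jb):
    """Values of j from ja + jb down to |ja - jb| in unit steps, Eq. (5.4.30)."""
    ja, jb = Fraction(ja), Fraction(jb)
    highest, lowest = ja + jb, abs(ja - jb)
    return [highest - k for k in range(int(highest - lowest) + 1)]

def multiplet_size(j):
    """Number of states 2j + 1 in a multiplet."""
    return int(2 * Fraction(j) + 1)

def counts_match(ja, jb):
    """Compare the total number of states before and after the addition."""
    total = sum(multiplet_size(j) for j in allowed_j(ja, jb))
    return total == multiplet_size(ja) * multiplet_size(jb)

orbital_plus_spin = allowed_j(1, Fraction(1, 2))   # j = 3/2 and j = 1/2
two_units = allowed_j(1, 1)                        # j = 2, 1, 0
```

For example, adding angular momenta 1 and 1/2 gives multiplets with 4 + 2 = 6 states, matching 3 × 2.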
The linear combination of wave functions $\psi_{j_a,j_b}^{m_a,m_b}$ with a definite value for $j$
and $m$ is conventionally written as
$$\psi_{j_a,j_b,j}^{m} = \sum_{m_a=-j_a}^{+j_a}\;\sum_{m_b=-j_b}^{+j_b} C_{j_a,j_b}(j, m\,;\, m_a, m_b)\,\psi_{j_a,j_b}^{m_a,m_b}\,, \tag{5.4.31}$$
Hyperfine Structure
We must not forget the atomic nucleus, for if it has spin this produces a
magnetic field felt by orbiting electrons. This effect is most important for the
s-wave electrons that are not prevented from getting close to the nucleus by
the centrifugal barrier that is present for $\ell \neq 0$. In hydrogen the spin 1/2 of
the nucleus combines with the spin 1/2 of the electron in its $\ell = 0$ ground state
to split the energy of the ground state into components with total spin $s = 0$ and
$s = 1$, separated in energy by $5.9 \times 10^{-6}$ eV. The transition between these states
produces the famous 21-cm absorption and emission spectral lines, discussed
in Section 3.5.
First, as an example of some intrinsic importance, let us work out how to form
hydrogen wave functions with definite total angular momentum from wave
functions with definite 3-components of spin and orbital angular momentum.
Consider the “stretched” hydrogen wave function in which $L_3$ and $S_3$ are both
as large as possible, having eigenvalues $+\hbar\ell$ and $+\hbar/2$, respectively. In general
we shall label hydrogen wave functions with orbital angular momentum $\ell$ and
spin 1/2 and definite values $\hbar m$ and $\hbar\sigma$ for $L_3$ and $S_3$ as $\psi_{\ell,1/2}^{m,\sigma}$, so this stretched
wave function is denoted $\psi_{\ell,1/2}^{\ell,+1/2}$. For this wave function, $J_3 = \hbar(\ell + 1/2)$ is
also as large as possible. This is therefore a wave function with $j = \ell + 1/2$
(where as usual $j$ is defined so that the eigenvalue of $\mathbf{J}^2$ is $\hbar^2 j(j + 1)$). This
wave function could not have a larger $j$ because then there would be states with
$J_3 > \hbar(\ell + 1/2)$, and it could not have a smaller $j$ because then $J_3$ could not be
as large as $\hbar(\ell + 1/2)$. In general we shall label hydrogen wave functions with
orbital angular momentum $\ell$ and spin 1/2 and definite values $\hbar^2 j(j + 1)$ and
$\hbar M$ for $\mathbf{J}^2$ and $J_3$ as $\psi_{\ell,1/2,j}^{M}$. So we have
$$\psi_{\ell,1/2}^{\ell,+1/2} = \psi_{\ell,1/2,\ell+1/2}^{\ell+1/2}\,. \tag{5.4.39}$$
So far, this is pretty trivial, apparently not worth the elaborate notation. But
now consider the wave functions with $J_3 = \hbar(\ell - 1/2)$. There are two of these,
one of them, $\psi_{\ell,1/2}^{\ell-1,+1/2}$, with $L_3 = \hbar(\ell - 1)$ and $S_3 = +\hbar/2$ and the other,
$\psi_{\ell,1/2}^{\ell,-1/2}$, with $L_3 = \hbar\ell$ and $S_3 = -\hbar/2$. One linear combination of these two
can be obtained by letting the lowering operator $J_1 - iJ_2$ act on the stretched
wave function. This is part of the same angular momentum multiplet as the
stretched wave function $\psi_{\ell,1/2,\ell+1/2}^{\ell+1/2}$, with the same eigenvalue for $\mathbf{J}^2$, so is
labeled $\psi_{\ell,1/2,\ell+1/2}^{\ell-1/2}$. According to Eq. (5.4.27), if properly normalized this
wave function is given by
$$\sqrt{2\ell+1}\;\hbar\,\psi_{\ell,1/2,\ell+1/2}^{\ell-1/2} = (J_1 - iJ_2)\,\psi_{\ell,1/2,\ell+1/2}^{\ell+1/2} = (L_1 - iL_2 + S_1 - iS_2)\,\psi_{\ell,1/2}^{\ell,+1/2}\,.$$
Orbital and spin angular momenta obey the same commutation relations as total
angular momentum, so their lowering operators act the same way as given in
Eq. (5.4.27) for $J_1 - iJ_2$:
$$(L_1 - iL_2)\,\psi_{\ell,1/2}^{\ell,+1/2} = \sqrt{2\ell}\;\hbar\,\psi_{\ell,1/2}^{\ell-1,+1/2}\,, \qquad (S_1 - iS_2)\,\psi_{\ell,1/2}^{\ell,+1/2} = \hbar\,\psi_{\ell,1/2}^{\ell,-1/2}\,,$$
and therefore
$$\sqrt{2\ell+1}\;\psi_{\ell,1/2,\ell+1/2}^{\ell-1/2} = \sqrt{2\ell}\;\psi_{\ell,1/2}^{\ell-1,+1/2} + \psi_{\ell,1/2}^{\ell,-1/2}\,. \tag{5.4.40}$$
Since there are two independent wave functions with $J_3 = \hbar(\ell - 1/2)$, there
must be another linear combination that is part of an angular momentum multiplet
with no higher value of $J_3$ than $\hbar(\ell - 1/2)$, so this multiplet has $j = \ell - 1/2$,
and in our notation this linear combination is $\psi_{\ell,1/2,\ell-1/2}^{\ell-1/2}$. Since it has a
different value of $j$, this linear combination can be calculated by requiring it to
be normalized and orthogonal to the one we found by acting with $J_1 - iJ_2$ on
the stretched wave function with $L_3 = \hbar\ell$ and $S_3 = +\hbar/2$. That is (with a
conventional choice of overall phase):
$$\sqrt{2\ell+1}\;\psi_{\ell,1/2,\ell-1/2}^{\ell-1/2} = -\psi_{\ell,1/2}^{\ell-1,+1/2} + \sqrt{2\ell}\;\psi_{\ell,1/2}^{\ell,-1/2}\,. \tag{5.4.41}$$
By continued operation of the lowering operator on the wave functions (5.4.40)
and (5.4.41), we fill out two complete multiplets, one with $j = \ell + 1/2$ and one
with $j = \ell - 1/2$. These results can be summarized as values for the Clebsch–
Gordan coefficients in Eq. (5.4.31):
All the Clebsch–Gordan coefficients can be calculated in this way, but life is too
short. The best way to find Clebsch–Gordan coefficients is to look them up in
a table. At the end of this section there is a table of these coefficients for small
angular momenta.
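For the particular combinations in Eqs. (5.4.40) and (5.4.41) the coefficients are simple enough to compute directly. The sketch below returns them as pairs ordered as (coefficient of the wave function with L3 = ħ(ℓ − 1), S3 = +ħ/2; coefficient of the wave function with L3 = ħℓ, S3 = −ħ/2). The function names and this ordering are assumptions of the example.

```python
import math

def j_plus_half_state(l):
    """Normalized coefficients of the j = l + 1/2 combination, Eq. (5.4.40)."""
    if l < 1:
        raise ValueError("this construction needs l >= 1")
    norm = math.sqrt(2 * l + 1)
    return (math.sqrt(2 * l) / norm, 1 / norm)

def j_minus_half_state(l):
    """Normalized coefficients of the j = l - 1/2 combination, Eq. (5.4.41)."""
    if l < 1:
        raise ValueError("this construction needs l >= 1")
    norm = math.sqrt(2 * l + 1)
    return (-1 / norm, math.sqrt(2 * l) / norm)

def dot(u, v):
    """Scalar product of two real coefficient pairs."""
    return sum(a * b for a, b in zip(u, v))

upper = j_plus_half_state(1)    # (sqrt(2/3), sqrt(1/3)) for l = 1
lower = j_minus_half_state(1)   # (-sqrt(1/3), sqrt(2/3)) for l = 1
overlap = dot(upper, lower)     # vanishes: the two combinations are orthogonal
```

For ℓ = 1 these values reproduce, up to the ordering chosen here, the M = +1/2 entries for adding angular momenta 1 and 1/2 in the table at the end of the section.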
There is a symmetry property of the Clebsch–Gordan coefficients in the case
of adding equal angular momenta that will be important for us when we come
to diatomic molecules in the next section and to nuclear forces in Section 6.2.
For ja = jb ,
ja     jb     j      M       ma      mb      C_{ja,jb}(j M ; ma mb)
1/2    1/2    1      +1      +1/2    +1/2    1
1/2    1/2    1      0       ±1/2    ∓1/2    1/√2
1/2    1/2    1      −1      −1/2    −1/2    1
1/2    1/2    0      0       ±1/2    ∓1/2    ±1/√2
1      1/2    3/2    ±3/2    ±1      ±1/2    1
1      1/2    3/2    ±1/2    ±1      ∓1/2    √(1/3)
1      1/2    3/2    ±1/2    0       ±1/2    √(2/3)
1      1/2    1/2    ±1/2    ±1      ∓1/2    ±√(2/3)
1      1/2    1/2    ±1/2    0       ±1/2    ∓√(1/3)
1      1      2      ±2      ±1      ±1      1
1      1      2      ±1      ±1      0       1/√2
1      1      2      ±1      0       ±1      1/√2
1      1      2      0       ±1      ∓1      1/√6
1      1      2      0       0       0       √(2/3)
1      1      1      ±1      ±1      0       ±1/√2
1      1      1      ±1      0       ±1      ∓1/√2
1      1      0      0       ±1      ∓1      1/√3
1      1      0      0       0       0       −1/√3
Identical Particles
Aside from their momenta and helicities, every photon in the universe is
identical to every other photon. The reason is that all photons are quanta of the
same field, the electromagnetic field. In the same way, aside from their momenta
(or positions) and spin components, according to the modern understandings
outlined in Chapter 7, every electron in the universe is identical to every other
electron because they are all quanta of a single field, known as the electron
field. The same is true of every other species of elementary particle – quarks,
neutrinos, and so on – each is the quantum of a particular field. Indeed, our best
current definition of an elementary particle is that it is the quantum of one of the
fields of which the world is composed. But the same indistinguishability is true
of composite systems in any one specific state. Two protons are indistinguish-
able because they are each composed of three quarks of the same two different
types in the same bound state, and two hydrogen atoms in the same atomic
state are indistinguishable because they are each composed of an electron
and a proton.
In writing a wave function for identical particles as ψ(x1 , σ1 ; x2 , σ2 ; . . . ),
it is incorrect to say that for this wave function the first particle has position
x1 and spin 3-component σ1 while the second particle has position x2 and spin
3-component σ2 , and so on. Instead we should say that there is a particle
with position x1 and spin 3-component σ1 and another particle with posi-
tion x2 and spin 3-component σ2 , and so on. Thus for identical particles,
ψ(x1 , σ1 ; x2 , σ2 ; . . . ) and ψ(x2 , σ2 ; x1 , σ1 ; . . . ) represent the same state.
Two wave functions that represent the same state can only differ by a constant
factor, so
ψ(x2 , σ2 ; x1 , σ1 ; . . . ) = λψ(x1 , σ1 ; x2 , σ2 ; . . . )
for some constant $\lambda$. Integrals don’t depend on how the variables of integration
are labeled, so it follows that $\int|\psi|^2 = |\lambda|^2\int|\psi|^2$, and therefore $\lambda$ can only
be a phase factor, with $|\lambda| = 1$. Further, the constant $\lambda$ cannot depend on
position or spin 3-components without violating various symmetry principles,
such as Galilean or Einsteinian relativity, rotational invariance, and translation
invariance, so we can repeat the same relation with identical particles
1 and 2 interchanged on both sides, but with the same $\lambda$, and write
$$\psi(\mathbf{x}_1,\sigma_1;\mathbf{x}_2,\sigma_2;\dots) = \lambda\,\psi(\mathbf{x}_2,\sigma_2;\mathbf{x}_1,\sigma_1;\dots) = \lambda^2\,\psi(\mathbf{x}_1,\sigma_1;\mathbf{x}_2,\sigma_2;\dots)$$
Nath Bose (1894–1974), who first described multi-photon states, imposing this
symmetry condition.14 Einstein had Bose’s paper translated into German and
published, and then applied these ideas to material particles.15
Particles for which λ = −1, so that the wave function is antisymmetric in
the labels of these particles, are known as fermions, named after Enrico Fermi
(1901–1954). Fermi16 and Dirac17 at about the same time described multi-
electron states, imposing this antisymmetry condition.
It is another consequence of the relativistic quantum theory of fields that
elementary particles (the quanta of fields) are bosons or fermions according
to whether their spin is an integer or half an odd integer.18 The reason for this is
outlined in Section 7.4, but a complete proof is beyond the scope of this book.
It is easy, though, to see that if this correlation with spin is valid for some set
of elementary particles then it is valid for any composites of these particles.
If we interchange two identical composite particles then we are interchanging
all their constituents, so the interchange gives a minus sign multiplying the wave
function if each of the composites contains an odd number of fermions and a
plus sign otherwise, no matter how many bosons it contains. But, according
to the rules for adding angular momenta described in the previous section, a
composite has a half odd integer spin if it contains an odd number of half odd
integer spin particles, and integer spin otherwise, no matter how many integer
spin particles it contains. So a composite with half odd integer spin contains
an odd number of fermions, and is therefore a fermion, while if it has integer
spin it contains an even number of fermions (perhaps zero) and is therefore a
boson. No other correlation of boson/fermion character with spin would have
this consistency.
So electrons, quarks, protons, and neutrons, which have spin 1/2, are
fermions. The spin of massless particles like photons requires special consider-
ation, but as noted in Section 7.5 the components of their angular momentum in
the direction of travel can only be ±h̄, corresponding to left and right circular
polarization, and they are bosons. Indeed, as already mentioned, Bose’s original
introduction of symmetric states had to do with photons. Hydrogen and helium
atoms are bosons, while 6 Li atoms (with three protons, three neutrons, and three
electrons) are fermions.
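The counting argument for composites can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and "helium" is assumed here to mean the isotope 4He.

```python
def is_fermion(n_constituent_fermions: int) -> bool:
    """A composite is a fermion iff it contains an odd number of fermions."""
    return n_constituent_fermions % 2 == 1


def has_half_odd_integer_spin(n_half_odd_spin_constituents: int) -> bool:
    """Adding angular momenta: the total spin is half an odd integer iff an
    odd number of the constituents have half-odd-integer spin."""
    return n_half_odd_spin_constituents % 2 == 1


def neutral_atom_statistics(protons: int, neutrons: int) -> str:
    """Classify a neutral atom (electron number equal to proton number).

    Protons, neutrons, and electrons all have spin 1/2 and are fermions.
    """
    n = protons + neutrons + protons  # the last term counts the electrons
    # The two criteria must agree, which is the consistency noted in the text.
    assert is_fermion(n) == has_half_odd_integer_spin(n)
    return "fermion" if is_fermion(n) else "boson"


# Examples from the text (helium taken to be 4He, an assumption).
examples = {
    "H": neutral_atom_statistics(1, 0),
    "4He": neutral_atom_statistics(2, 2),
    "6Li": neutral_atom_statistics(3, 3),
}
```

With these inputs hydrogen and 4He come out as bosons and 6Li as a fermion, as stated above.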
Statistics
The distinction between bosons and fermions has a profound impact on the
properties of gases in thermal equilibrium. As we did for photons in Section 3.2,
14 S. N. Bose, Z. Phys. 26, 178 (1924).
15 A. Einstein, Sitz. Preuss. Akad. Wiss. 1, 3 (1926).
16 E. Fermi, Rend. Lincei 3, 145 (1926).
17 P. A. M. Dirac, Proc. Roy. Soc. A 112, 661 (1926).
18 This was first stated as a general rule by M. Fierz, Helv. Phys. Acta 12, 3 (1939) and W. Pauli, Phys. Rev.
58, 716 (1940).
168 5 Quantum Mechanics
For fermions it is not possible to have more than one particle with a given
momentum p (and a given spin 3-component), because the wave function for
two such particles would be proportional to
exp(ip · x1 /h̄) exp(ip · x2 /h̄),
which is symmetric rather than antisymmetric in the two particles. Hence the
sums in Eq. (5.5.3) run only over the values N = 0 and N = 1:
$$\bar{N}_{\mathbf{p}} = \frac{1}{\exp\big([E(\mathbf{p}) - \mu]/kT\big) + 1}\,. \tag{5.5.5}$$
This is known as the case of Fermi–Dirac statistics. For very low temperatures
this takes the form
$$\bar{N}_{\mathbf{p}} \to \begin{cases} 1 & E(\mathbf{p}) < \mu \\ 0 & E(\mathbf{p}) > \mu\,. \end{cases} \tag{5.5.6}$$
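The approach of Eq. (5.5.5) to the step function (5.5.6) can be checked numerically. The following is a minimal sketch; the function name and the overflow guard are choices made here, not part of the text.

```python
import math


def fermi_dirac(energy: float, mu: float, kT: float) -> float:
    """Mean occupation number of Eq. (5.5.5) for a single-particle state."""
    x = (energy - mu) / kT
    # Guard against overflow of exp(x) for very large positive arguments,
    # where the occupation number is zero to machine precision anyway.
    if x > 700.0:
        return 0.0
    return 1.0 / (math.exp(x) + 1.0)


# As kT decreases, states below mu fill up and states above mu empty out.
for kT in (0.5, 0.1, 0.001):
    below = fermi_dirac(0.5, 1.0, kT)
    above = fermi_dirac(1.5, 1.0, kT)
    print(f"kT = {kT}: N(E < mu) = {below:.6f}, N(E > mu) = {above:.6f}")
```

At E = μ the occupation number is exactly 1/2 at any temperature, which is why (5.5.6) makes no statement about that point.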
This is used to derive a relation between the number densities and energy densi-
ties in white dwarf stars, whose high density requires electrons to have energies
much larger than chemical binding energies, though they are essentially at zero
temperature. For white dwarfs of relatively low mass, μ is much less than
$m_e c^2$ but much larger than chemical binding energies, so the number density
of electrons is given by
$$n_e = \int_{p<p_F} d\mathcal{N}(p) = \frac{2}{(2\pi\hbar)^3}\int_0^{p_F} 4\pi p^2\,dp = \frac{8\pi p_F^3}{3(2\pi\hbar)^3}$$
where pF is the Fermi momentum defined by E(pF ) = μ. (In practice, we
use a known or assumed value of n to calculate pF .) The corresponding kinetic
energy density is
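$$\mathcal{E} = \int_{|\mathbf{p}|<p_F} E(p)\,d\mathcal{N}(p) = \frac{2}{(2\pi\hbar)^3}\int_0^{p_F} 4\pi p^2\,dp \times \frac{p^2}{2m_e} = \frac{8\pi p_F^5}{10\,m_e(2\pi\hbar)^3} = (8\pi)^{-2/3}(2\pi\hbar)^2(3n_e)^{5/3}/10m_e\,.$$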
As shown in Eq. (1.1.3), the pressure of any non-relativistic monatomic gas is
$p = 2\mathcal{E}/3$, so this gives an equation of state for low-mass white dwarfs:
$$p = K\rho^{5/3}\,, \qquad K = (2/3)(8\pi)^{-2/3}(2\pi\hbar)^2(3Z/Am_1)^{5/3}/10m_e\,,$$
where $\rho = n_e A m_1/Z$ is the mass density.
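As a numerical illustration of these formulas, the sketch below evaluates the Fermi momentum, the kinetic energy density, and the constant K in SI units. The function names are hypothetical, $m_1$ is assumed here to be the atomic mass unit, and Z/A = 1/2 is an assumed default (appropriate for helium, carbon, or oxygen).

```python
import math

HBAR = 1.054571817e-34   # J s
M_E = 9.1093837015e-31   # kg, electron mass
M_1 = 1.66053906660e-27  # kg, atomic mass unit (assumed value of the text's m_1)


def fermi_momentum(n_e: float) -> float:
    """Invert n_e = 8 pi p_F^3 / (3 (2 pi hbar)^3) for p_F."""
    return (3.0 * n_e * (2.0 * math.pi * HBAR) ** 3 / (8.0 * math.pi)) ** (1.0 / 3.0)


def kinetic_energy_density(n_e: float) -> float:
    """Energy density 8 pi p_F^5 / (10 m_e (2 pi hbar)^3), in J/m^3."""
    p_f = fermi_momentum(n_e)
    return 8.0 * math.pi * p_f ** 5 / (10.0 * M_E * (2.0 * math.pi * HBAR) ** 3)


def polytrope_constant(z_over_a: float = 0.5) -> float:
    """The constant K in p = K rho^(5/3)."""
    return ((2.0 / 3.0) * (8.0 * math.pi) ** (-2.0 / 3.0)
            * (2.0 * math.pi * HBAR) ** 2
            * (3.0 * z_over_a / M_1) ** (5.0 / 3.0) / (10.0 * M_E))


def pressure(rho: float, z_over_a: float = 0.5) -> float:
    """Pressure in Pa for mass density rho in kg/m^3."""
    return polytrope_constant(z_over_a) * rho ** (5.0 / 3.0)


rho = 1.0e8                      # kg/m^3, an illustrative density
n_e = rho * 0.5 / M_1            # electrons per m^3 for Z/A = 1/2
print(pressure(rho), (2.0 / 3.0) * kinetic_energy_density(n_e))
```

The two printed numbers agree, which checks that K is consistent with $p = 2\mathcal{E}/3$ and the expression for the energy density.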
1s,
2s, 2p,
3s, 3p,
4s, 3d, 4p,
5s, 4d, 5p,
6s, 4f, 5d, 6p,
7s, 5f, 7p, . . . (5.5.9)
(We are here ignoring the small fine-structure splitting in the energies of these
states, and so are leaving out subscripts giving the values of j.) For a given $\ell$,
increasing n increases the number of nodes of the wave function, so that the
wave function oscillates more with r, which increases the kinetic energy. This
is the main reason why electron energies increase going down the list. But, for
a given n, the increase in centrifugal force with increasing $\ell$ decreases the wave
function at small r where the charge interior to r is largest, which decreases the
effective absolute value of the negative potential energy, increasing the state’s
total energy. Hence, although the one-electron states listed above on the same
line have approximately equal energy, the energies increase somewhat from
left to right. In the case of 3d, 4d, 4f, 5d, and 5f states and many states
with n ≥ 6, the dependence of the energy on $\ell$ turns out to overcome its
dependence on n.
Taking spin into account, the total numbers of states for the energy levels listed
on each line of Eq. (5.5.9) are 2, 2 + 6 = 8, 2 + 6 = 8, 2 + 10 + 6 = 18,
2 + 10 + 6 = 18, 2 + 14 + 10 + 6 = 32, and so on. These are substantially
the same periodicities that had been discovered chemically by Mendeleev. For
instance, electrons that fill up any one of the lines of the table are said to form
a closed shell. It is energetically unfavorable for atoms whose electrons just fill
closed shells to gain or lose electrons, so these atoms are chemically inert. They
are the noble gases: there is helium with Z = 2, neon with Z = 2 + 8 = 10,
argon with Z = 10 + 8 = 18, krypton with Z = 18 + 18 = 36, xenon with
Z = 36 + 18 = 54, and radon with Z = 54 +32 = 86. Elements with one elec-
tron outside closed shells find it easy to lose that electron, which can move freely
through the crystal lattice carrying currents of electricity or of heat. These are
the alkali metals: lithium with Z = 2 + 1 = 3, sodium with Z = 10 + 1 = 11,
potassium with Z = 18+1 = 19, and so on. Elements with one electron missing
from the highest energy closed shell react strongly in chemical reactions in
which they can gain an electron. These are the halogens: there is fluorine with
Z = 10−1 = 9, chlorine with Z = 18−1 = 17, bromine with Z = 36−1 = 35,
and so on.
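The shell counting above is easy to reproduce. The sketch below encodes the first six lines of the filling order (5.5.9) and adds up the capacities $2(2\ell+1)$ of the subshells; the variable and function names are hypothetical.

```python
# Filling order of Eq. (5.5.9), one inner list per line of the table.
# The last, incomplete line of the table is omitted.
FILLING_ORDER = [
    ["1s"],
    ["2s", "2p"],
    ["3s", "3p"],
    ["4s", "3d", "4p"],
    ["5s", "4d", "5p"],
    ["6s", "4f", "5d", "6p"],
]

# Orbital angular momentum for each spectroscopic letter.
ELL = {"s": 0, "p": 1, "d": 2, "f": 3}


def subshell_capacity(label: str) -> int:
    """Number of one-electron states: 2 spin states times (2*ell + 1)."""
    ell = ELL[label[-1]]
    return 2 * (2 * ell + 1)


def shell_capacities() -> list:
    """Total number of states on each line of the table."""
    return [sum(subshell_capacity(s) for s in line) for line in FILLING_ORDER]


def closed_shell_atomic_numbers() -> list:
    """Running totals: the atomic numbers Z at which shells are just filled."""
    totals, z = [], 0
    for capacity in shell_capacities():
        z += capacity
        totals.append(z)
    return totals


noble = closed_shell_atomic_numbers()   # helium through radon
alkali = [z + 1 for z in noble[:3]]     # lithium, sodium, potassium
halogens = [z - 1 for z in noble[1:4]]  # fluorine, chlorine, bromine
```

The running totals reproduce the atomic numbers of the noble gases, and adding or subtracting one electron gives the alkali metals and halogens listed in the text.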
More generally, if an atom has a few electrons outside closed shells, it has
what chemists call a positive valence, equal to that number of extra electrons;
if it has a few electrons less than needed to fill closed shells, then it has neg-
ative valence, equal to that number of missing electrons. Thus alkali metals
have valence +1; the so-called alkali earths beryllium, magnesium, calcium,
etc. have valence +2; the halogens have valence −1; oxygen, sulfur, etc. have
valence −2; and so on. The molecules of many simple chemical compounds
(not all!) are held together by electrostatic attraction between ions of elements
with positive and negative valence that have traded electrons. Since electrons
are neither created nor destroyed in chemistry, the total valence of such a
molecule must be zero if the molecule is electrically neutral. These include such compounds
as salts composed of metal and halogen atoms, like sodium chloride, oxides
like calcium oxide, etc. Hydrogen can act as if it has valence +1, as in water or
ammonia, or valence −1, as in metal hydrides.
Diatomic Molecules
The rotational energy spectrum of molecules like H$_2$, N$_2$, O$_2$, etc. that are com-
posed of two identical atoms is profoundly affected by the bosonic or fermionic
nature of the atomic nuclei. The energy required to excite rotational states of
molecules is less than the energy required to excite vibrational states by factors
of order $(m_e/Am_1)^{1/2}$, and less than the energy required to excite electronic
states by even smaller factors, of order $m_e/Am_1$, so the lowest energy states
of molecules are rotational states in which the separations of atomic nuclei and
the state of atomic electrons can be regarded as fixed. In such states the wave
function of a molecule consisting of two identical atoms is proportional to
$$c_s(\sigma_1, \sigma_2)\,Y_\ell^m(\hat{n}) \pm c_s(\sigma_2, \sigma_1)\,Y_\ell^m(-\hat{n})\,, \tag{5.5.10}$$
where $\hat{n}$ is a unit vector in the direction from nucleus 1 to nucleus 2; $Y_\ell^m$ is
the usual spherical harmonic, of the sort discussed in Section 5.2; $c_s(\sigma_1, \sigma_2)$
is a spin wave function that depends on the total spin s of the two nuclei and
their individual spins $s_1 = s_2$, as well as on the spin 3-components $\sigma_1$ and
$\sigma_2$, about which more later; and the sign is +1 or −1 if the nuclei are bosons or
fermions, respectively. The energy of the rotational states with a given $\ell$ is given
in quantum mechanics by replacing $L^2$ in the classical formula $E = L^2/2I$ with
$\hbar^2\ell(\ell+1)$, so that
$$E_\ell = \frac{\hbar^2\,\ell(\ell+1)}{2I} \tag{5.5.11}$$
with almost no dependence on total spin. Here I is the moment of inertia of
the molecule around a line perpendicular to $\hat{n}$ through the center of mass of the
molecule. Now, $Y_\ell^m(-\hat{n}) = (-1)^\ell\,Y_\ell^m(\hat{n})$. Also, the spin wave functions have
the important symmetry property
$$c_s(\sigma_2, \sigma_1) = \pm(-1)^s\,c_s(\sigma_1, \sigma_2)\,, \tag{5.5.12}$$
where the sign ± is $(-1)^{2s_1}$; that is, +1 for adding two equal integer spins,
and −1 for adding two equal half odd integer spins. (In terms of the Clebsch–
Gordan coefficients described in the previous section,
$$c_s(\sigma_1, \sigma_2) = C_{s_1 s_1}(s\,\sigma;\,\sigma_1\,\sigma_2)\,,$$
where $\sigma = \sigma_1 + \sigma_2$. Equation (5.4.42) with $j_a = s_1$, $J = s$, $m_a = \sigma_1$, $m_b = \sigma_2$,
$M = \sigma$ gives
$$C_{s_1 s_1}(s\,\sigma;\,\sigma_2\,\sigma_1) = (-1)^{s-2s_1}\,C_{s_1 s_1}(s\,\sigma;\,\sigma_1\,\sigma_2)\,,$$
which is the same as Eq. (5.5.12).) Because of the spin–statistics connection,
the ± sign in Eq. (5.5.12) is the same as in Eq. (5.5.10). We see then that these
± signs cancel, and the only states in which the wave function does not vanish
are therefore those in which
$$(-1)^\ell = (-1)^s\,. \tag{5.5.13}$$
Either s and $\ell$ are both even, in which case the molecule is distinguished by the
prefix para, or both are odd, and the prefix is ortho. For instance, in H$_2$ we have
parahydrogen, with s = 0 and $\ell$ even, and orthohydrogen, with s = 1 and $\ell$ odd.
The degeneracy of the states is then $(2\ell+1)$ for parahydrogen and $3(2\ell+1)$
for orthohydrogen.
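The selection of allowed states by Eq. (5.5.13) can be enumerated directly. The sketch below is illustrative only; the function names are hypothetical, and the nuclear spin $s_1$ is passed as a `Fraction` so that half-odd-integer values are handled exactly.

```python
from fractions import Fraction


def allowed_total_spins(s1) -> list:
    """Total spins s = 0, 1, ..., 2*s1 obtained by adding two equal spins s1."""
    two_s1 = int(2 * Fraction(s1))
    return list(range(0, two_s1 + 1))


def is_allowed(s: int, ell: int) -> bool:
    """Eq. (5.5.13): (-1)**ell == (-1)**s, i.e. s and ell have the same parity."""
    return (s - ell) % 2 == 0


def degeneracy(s: int, ell: int) -> int:
    """Number of states with given s and ell: (2s + 1)(2 ell + 1)."""
    return (2 * s + 1) * (2 * ell + 1)


def rotational_levels(s1, ell_max: int) -> list:
    """List (ell, s, prefix, degeneracy) for all allowed states up to ell_max."""
    levels = []
    for ell in range(ell_max + 1):
        for s in allowed_total_spins(s1):
            if is_allowed(s, ell):
                prefix = "para" if s % 2 == 0 else "ortho"
                levels.append((ell, s, prefix, degeneracy(s, ell)))
    return levels


# Hydrogen: each nucleus is a proton with spin 1/2.
for level in rotational_levels(Fraction(1, 2), 3):
    print(level)
```

For hydrogen this reproduces the degeneracies $(2\ell+1)$ for para states and $3(2\ell+1)$ for ortho states.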
The forces acting on spins are so weak that radiative transitions do not
change s and therefore can only change $\ell$ by an even number. The dominant
transitions are those in which $\ell$ changes by two units, giving a radiated energy
$$E_{\ell+2} - E_\ell = \frac{\hbar^2}{2I}\big[(\ell+2)(\ell+3) - \ell(\ell+1)\big] = \frac{\hbar^2}{2I}\big[4\ell+6\big]\,. \tag{5.5.14}$$
For para molecules the energies (5.5.14) are $3\hbar^2/I$, $7\hbar^2/I$, $11\hbar^2/I$, etc., while
for ortho molecules they are $5\hbar^2/I$, $9\hbar^2/I$, $13\hbar^2/I$, etc. Observing this pattern
of energies, with proportions 3 : 7 : 11 : · · · or 5 : 9 : 13 : · · · , it is possible
to judge which transitions are in para and which in ortho molecules, even if one
does not know the moment of inertia I.
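The two ladders of transition energies follow directly from the rotational energies. In this short sketch (with hypothetical function names) energies are measured in units of $\hbar^2/I$, and para and ortho states are taken to have even and odd $\ell$ respectively, as described above.

```python
def rotational_energy(ell: int) -> float:
    """Rotational energy of Eq. (5.5.11) in units of hbar**2 / I."""
    return ell * (ell + 1) / 2


def transition_energy(ell: int) -> float:
    """Radiated energy E_{ell+2} - E_ell in units of hbar**2 / I."""
    return rotational_energy(ell + 2) - rotational_energy(ell)


def ladder(prefix: str, count: int) -> list:
    """First `count` transition energies for para (even ell) or ortho (odd ell)."""
    start = 0 if prefix == "para" else 1
    return [transition_energy(ell) for ell in range(start, start + 2 * count, 2)]


print(ladder("para", 3))   # proportions 3 : 7 : 11
print(ladder("ortho", 3))  # proportions 5 : 9 : 13
```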
The energy $\hbar^2/2I$ is typically much less than kT, so the abundance of
diatomic molecules in a state with given s and $\ell$ is simply proportional to the
degeneracy $(2s+1)(2\ell+1)$. (For instance, for hydrogen $\hbar^2/2I = k \times 45$ K.)
The observed transitions are typically between states with $\ell \gg 1$, and the
intensity of the radiation emitted is mostly a matter of the number of spin states
for $\ell$ and s even or odd, as follows. If the spin $s_1$ of each nucleus is an integer, so
that they are bosons, then the allowed even values of s are $2s_1, 2s_1-2, \ldots, 0$,
and the allowed odd values of s are $2s_1-1, 2s_1-3, \ldots, 1$. Hence in this case
the total number of spin states for para and ortho molecules is
$$\#_{\rm para} = \sum_{n=0}^{s_1}\big[2(2n)+1\big] = (s_1+1)(2s_1+1)\,,$$
$$\#_{\rm ortho} = \sum_{n=1}^{s_1}\big[2(2n-1)+1\big] = s_1(2s_1+1)\,,$$
ted or absorbed in transitions in parahydrogen is about one-third the ratio for
orthohydrogen.) Evidently one can tell whether nuclei are bosons or fermions just
by observing whether radiation from the para or ortho transitions is stronger. In
the next chapter we will see that observations of the diatomic nitrogen molecule
presented a puzzle regarding the nature of the nitrogen nucleus that was only
resolved with the discovery of the neutron.
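The counting of nuclear spin states can be verified by brute force. The sketch below (function name hypothetical) sums the multiplicities 2s + 1 over even and odd total spin s, and compares the result with the closed forms quoted above for integer $s_1$.

```python
from fractions import Fraction


def spin_state_counts(s1) -> tuple:
    """Return (number of states with s even, number of states with s odd)
    for two nuclei of equal spin s1, where s runs over 0, 1, ..., 2*s1."""
    two_s1 = int(2 * Fraction(s1))
    even = sum(2 * s + 1 for s in range(two_s1 + 1) if s % 2 == 0)
    odd = sum(2 * s + 1 for s in range(two_s1 + 1) if s % 2 == 1)
    return even, odd


# Integer nuclear spin: compare with the closed forms in the text.
for s1 in range(4):
    even, odd = spin_state_counts(s1)
    print(s1, even == (s1 + 1) * (2 * s1 + 1), odd == s1 * (2 * s1 + 1))

# Spin-1/2 nuclei, as in hydrogen: one s = 0 state and three s = 1 states.
print(spin_state_counts(Fraction(1, 2)))
```

In every case the two counts add up to $(2s_1+1)^2$, the total number of spin states of the two nuclei.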
Clouds of interstellar diatomic molecules can cool to quite low temperatures
by collisional excitation of rotational energy levels, after which the excitation
energy is emitted as radiation that leaves the cloud. This is an important feature
in the formation of stars by gravitational condensation of interstellar matter,
which requires low temperatures to mitigate pressure forces that can prevent
condensation. But for cooling, it is necessary that radiation should often be
emitted before the molecule gives its excitation energy back to the cloud in
another collision.
This is an obstacle to cooling by diatomic molecules with identical atoms. As
discussed in Section 7.5, the fastest radiative transitions in atoms and molecules
generally are electric dipole transitions, in which there is a non-zero value for
$\int\psi^*_{\rm final}\,\mathbf{P}\,\psi_{\rm initial}$ (where P is the momentum of the radiating particle). Since P is a
three-vector that changes sign under reflection of coordinates, this integral
vanishes unless certain selection rules are obeyed: when spin effects are neglected,
the initial and final states must have opposite signs for the parity $(-1)^\ell$ and must
have values of $\ell$ that differ by no more than one unit. Neither selection rule
is satisfied by the transitions in diatomic molecules with identical atoms, in
which $\ell$ changes by two units. These are what in Section 7.5 are called electric
quadrupole transitions, which are much slower than electric dipole transitions.
Thus, although H2 is by far the most common molecule in interstellar space, it
contributes little to the cooling of molecular clouds.
On the other hand, in diatomic molecules with distinguishable atoms radiative
transitions can occur rapidly as electric dipole transitions in which $\ell$ changes by
one unit, and these molecules when excited by collisions often lose energy by
radiation rather than in further collisions. Of the more abundant molecules of
this sort, the most effective at cooling interstellar clouds is CO. This molecule
has a large moment of inertia, with $\hbar^2/Ik \simeq 5.5$ K, so it can cool clouds to
very low temperatures. The hydroxyl molecule OH is more abundant but has a
smaller moment of inertia and hence larger excitation energies, so it cannot cool
clouds to temperatures as low as can CO.
5.6 Scattering
main body of this section we will consider scattering processes only in the case
that is simplest kinematically: the scattering of a particle by a much heavier
particle, such as the scattering of alpha particles by nuclei of various metals in
the 1911 experiment that led Rutherford to the discovery of the atomic nucleus.
In this case we can approximate the effect of the heavy target particle by taking
it to be at rest at the origin of coordinates, and representing its interaction with
the scattered particle as a fixed external potential V (x) that depends only on
the coordinate of the scattered particle. Not only is this a good approximation
for some scattering processes of historical importance – as we shall see, it was
the study of scattering using this approximation that led to the probabilistic
interpretation of quantum mechanics. An appendix to this section considers the
calculation of more general scattering and decay processes with any number of
particles of any type in the initial and final states.
then
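$$f(\mathbf{x}) = \frac{1}{(2\pi)^3}\int d^3q\,e^{i\mathbf{q}\cdot\mathbf{x}}\,g(\mathbf{q}) = \frac{1}{(2\pi)^3}\int d^3q\,e^{i\mathbf{q}\cdot\mathbf{x}}\int d^3x'\,e^{-i\mathbf{q}\cdot\mathbf{x}'}f(\mathbf{x}')\,,$$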
which with an interchange of the order of integration (discarding mathematical
rigor) is the same as Eq. (5.6.8). If the wave function ϕp (x) for a free particle of
momentum p is defined so that
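$$\varphi_{\mathbf{p}}(\mathbf{x}) = (2\pi\hbar)^{-3/2}\exp(i\mathbf{p}\cdot\mathbf{x}/\hbar)$$
then Eq. (5.6.9) gives these wave functions a simple delta-function normalization
$$\int d^3x\,\varphi^*_{\mathbf{p}'}(\mathbf{x})\,\varphi_{\mathbf{p}}(\mathbf{x}) = \delta^3(\mathbf{p}' - \mathbf{p})\,,$$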
Probabilistic Interpretation
At a distance r from the scattering center that is not only large compared with
the range of the potential but also much greater than the wavelength 2π/k, the
second term in Eq. (5.6.14) at any given direction x̂ behaves like a plane wave
moving outward with wave vector k x̂. This is a familiar behavior for all sorts
of waves. A plane ocean wave encountering an obstacle in the water will break
up and spread out in all directions, just as in Eq. (5.6.14). But a particle like
an alpha particle in Rutherford’s laboratory encountering a target like a gold
nucleus does not break up. It hangs together, and is scattered in some definite
direction, though not a direction that can be predicted in advance. This showed
that $\psi(\mathbf{x})$ or $|\psi(\mathbf{x})|^2$ cannot represent how much of the scattered particle is at x.
It was this remark about scattering that led Max Born (1882–1970) in 1926 to
propose23 that if ψ is suitably normalized then $|\psi(\mathbf{x})|^2$ is the probability density
at x – that is, $|\psi(\mathbf{x})|^2\,d^3x$ is the probability that the particle is in a small volume
$d^3x$ around x.
For a proper treatment of what happens in scattering it is necessary to con-
sider a wave function that at early times is a packet of free-particle waves, as
in Eq. (5.1.3), and use the time-dependent Schrödinger equation to follow the
subsequent scattering. This is the approach followed in the appendix to this
section. But, with a moderate amount of hand-waving, we can derive the most
important results more simply, just using Eq. (5.6.14).
Suppose that at some early time before the scattering the incoming particle is
in a thin disk of area A and thickness L at right angles to the path of the particle.
In order for $|\psi|^2$ to serve as a probability density, we have to arrange that $e^{ikx_3}$
comes with a factor $1/\sqrt{AL}$ instead of $1/(2\pi\hbar)^{3/2}$, so that the integral of $|\psi|^2$
over the disk at early times is unity. The scattering wave function (5.6.14) will
then also be multiplied by $(2\pi\hbar)^{3/2}/\sqrt{AL}$. At a late time t after the collisions
a scattered particle will be in a thin disk of the same thickness L at a distance
Coulomb Scattering
For an important example of the Born approximation, consider a shielded
Coulomb potential
$$V(r) = \frac{Z_1Z_2e^2}{r}\exp(-\kappa r)\,. \tag{5.6.22}$$
This is a rough approximation to the Coulomb energy of a scattered particle of
charge Z2 e in the electric field of an atom whose nucleus has charge Z1 e. The
full electrostatic potential of the nucleus is felt by the scattered particle when the
particle is closer to the nucleus than the electronic orbits, taken to have typical
radii of order 1/κ, but the potential vanishes when the scattered particle is far
enough from the atom for the orbiting electrons to completely shield the charge
of the nucleus. (This potential is also known as a Yukawa potential, because,
as we will see in Section 7.3, in 1935 Hideki Yukawa (1907–1981) showed that
the exchange of a meson of mass $\hbar\kappa/c$ between two nuclear particles would
produce such a potential, though of course with some other constant factor in
place of $Z_1Z_2e^2$.) Using this in Eq. (5.6.20) gives a scattering amplitude
$$f(\hat{x}) \simeq -\frac{2mZ_1Z_2e^2}{\hbar^2}\,\frac{1}{K^2+\kappa^2}\,, \tag{5.6.23}$$
with K given by Eq. (5.6.21). We can find the scattering amplitude for a pure
Coulomb potential by just taking κ = 0 in Eq. (5.6.23). This result is only valid
to first order in $Z_1Z_2e^2$, but a calculation of higher-order corrections shows that
for κ = 0 these higher-order corrections change the scattering amplitude only
by a phase factor, which has no effect on the differential cross section (5.6.17),
so in this case
$$\frac{d\sigma}{d\Omega} = \frac{4m^2Z_1^2Z_2^2e^4}{\hbar^4K^4}\,, \tag{5.6.24}$$
which holds even beyond the Born approximation. This is the same as the
formula calculated classically in 1911 by Rutherford, following the hyperbolic
trajectory of the alpha particle to find the area $d\sigma$ that it must hit to reach a
given direction within a solid angle $d\Omega$. Rutherford’s calculation would not
have given the correct scattering probability for a general potential, except at
very short wavelength. It was just good luck that for Coulomb scattering the
classical calculation gives the right answer for general wavelengths.
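The agreement with Rutherford's formula can be checked numerically. The sketch below assumes that the K of Eq. (5.6.21), which is not reproduced here, is the usual magnitude of the change in wave vector, $K = 2k\sin(\theta/2)$ for scattering angle θ; the function names, the parameter `coupling` (standing for $Z_1Z_2e^2$), and the default units with ħ = 1 are choices made for illustration.

```python
import math


def wave_vector_transfer(k: float, theta: float) -> float:
    """Assumed form of K: K = 2 k sin(theta / 2) for elastic scattering."""
    return 2.0 * k * math.sin(theta / 2.0)


def born_amplitude(k, theta, m, coupling, kappa, hbar=1.0):
    """Scattering amplitude of Eq. (5.6.23); coupling stands for Z1 Z2 e^2."""
    K = wave_vector_transfer(k, theta)
    return -2.0 * m * coupling / (hbar ** 2 * (K ** 2 + kappa ** 2))


def differential_cross_section(k, theta, m, coupling, kappa, hbar=1.0):
    """d(sigma)/d(Omega) = |f|^2; reduces to Eq. (5.6.24) when kappa = 0."""
    return born_amplitude(k, theta, m, coupling, kappa, hbar) ** 2


def rutherford(energy, theta, coupling):
    """Rutherford's classical result, [Z1 Z2 e^2 / (4 E)]^2 / sin^4(theta / 2)."""
    return (coupling / (4.0 * energy)) ** 2 / math.sin(theta / 2.0) ** 4


m, k, theta, coupling = 1.0, 2.0, 1.0, 0.3
energy = k ** 2 / (2.0 * m)  # kinetic energy hbar^2 k^2 / 2m with hbar = 1
print(differential_cross_section(k, theta, m, coupling, kappa=0.0),
      rutherford(energy, theta, coupling))
```

The two numbers coincide for κ = 0, while any κ > 0 reduces the cross section, most strongly at small angles where K is small.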
these particles are very far apart. For instance, for non-relativistic processes H0
is the operator representing the total kinetic energy. The eigenfunctions ϕα of
H0 satisfy
H0 ϕα = Eα ϕα . (5.6.26)
Here α labels the species, three-momenta, and spin z-components (or helicities)
of all the particles in the state α represented by ϕα , and Eα is the sum of
the kinetic plus mass energies of these particles. These wave functions can be
normalized so that
$$\int\varphi_\beta^*\,\varphi_\alpha = \delta(\beta - \alpha)\,, \tag{5.6.27}$$
with the understanding that δ(β − α) vanishes unless the numbers of particles
in the states α and β and the species and spin components of the corresponding
particles in these states are all equal, and where they are equal it is given by
a product of Dirac delta functions for the three-momentum of each particle.
(In Eq. (5.6.27) we continue to use the abbreviation that in $\int\varphi_\beta^*\,\varphi_\alpha$ we
integrate over all coordinates and sum over all spin 3-components on which both
wave functions depend.) To be explicit, for wave functions representing free-
particle states containing respectively $N'$ and $N$ particles, we have
$$\int\varphi^*_{n'_1,\sigma'_1,\mathbf{p}'_1;\ldots;n'_{N'},\sigma'_{N'},\mathbf{p}'_{N'}}\;\varphi_{n_1,\sigma_1,\mathbf{p}_1;\ldots;n_N,\sigma_N,\mathbf{p}_N}$$
Now let us take t → −∞. For t < 0 we can close the contour of integration
over E in Eq. (5.6.32) with a very large semicircle in the upper half of the
complex plane, on which the factor $e^{-iEt/\hbar}$ makes the integrand negligible.
The integral over E is then given by a sum of the residues of any singularities
of the integrand in the upper half of the complex plane. There may well be
such singularities, but for t → −∞ their residues are exponentially suppressed
by the same factor $e^{-iEt/\hbar}$. A singularity infinitesimally above the real axis
would not be suppressed in this way, but the energy at which the denominator
$E - E_\beta + i\epsilon$ vanishes is just below the real axis, and so does not contribute
to this contour integral. (This reveals why we took $\epsilon$ to be positive.) Hence the
integral over E vanishes for t → −∞, so for very early times only the first term
in Eq. (5.6.32) survives:
$$\psi^{(g)}(t) \to \int g(\alpha)\,e^{-iE_\alpha t/\hbar}\,\varphi_\alpha\,d\alpha\,. \tag{5.6.34}$$
This is what we mean when we say that at very early times the state repre-
sented by ψα looks like the free-particle state represented by ϕα , as was to be
shown.
What does this state look like at very late times? For t > 0 we can only
close the contour of integration over E with a very large contour in the lower
half of the complex plane, on which the factor $e^{-iEt/\hbar}$ is now negligible.
The residues of any singularities of $G_\beta(E)$ at a finite distance below the real
axis are exponentially suppressed for t → +∞ by the same factor. But now
the singularity at $E = E_\beta - i\epsilon$ does contribute to the integral. The contour
of integration goes clockwise around this singularity, so this integral equals
$-2\pi i\,G_\beta(E_\beta - i\epsilon)\exp([-iE_\beta - \epsilon]t/\hbar)$. As long as we take $\epsilon \to 0$ before we
take t → +∞, we can drop the $\epsilon$ here, so the integral over E in Eq. (5.6.32)
equals $-2\pi i\,G_\beta(E_\beta)\exp(-iE_\beta t/\hbar)$, and Eq. (5.6.32) then gives
$$\psi^{(g)}(t) \to \int g(\alpha)\,e^{-iE_\alpha t/\hbar}\,\varphi_\alpha\,d\alpha - 2\pi i\int d\beta\,G_\beta(E_\beta)\,\varphi_\beta\exp(-iE_\beta t/\hbar)$$
where
$$S_{\beta\alpha} = \delta(\beta - \alpha) - 2\pi i\,\delta(E_\beta - E_\alpha)\int\varphi_\beta^*\,V\psi_\alpha\,. \tag{5.6.36}$$
So, in the same sense as in the case t → −∞, Eq. (5.6.35) shows that the
state represented by $\psi_\alpha$ looks at t → +∞ like a superposition $\int d\beta\,S_{\beta\alpha}\,\varphi_\beta$. The
coefficient (5.6.36) is known as the S-matrix and is the central object of study
in modern scattering theory.
But experiments do not measure probability amplitudes. They measure
probabilities, or the rates at which probabilities change. However, we cannot just set
the probability for the transition α → β equal to $|S_{\beta\alpha}|^2$. Even if we consider
a process for which α ≠ β, so that we can drop the term δ(β − α) in $S_{\beta\alpha}$,
the S-matrix element will still be proportional to the energy-conservation delta
function $\delta(E_\beta - E_\alpha)$, whose square is not well-defined. Also, in the most
common case, where no external fields affect the transition α → β, momentum is
conserved, so
$$\int\varphi_\beta^*\,V\psi_\alpha = \delta^3(\mathbf{P}_\beta - \mathbf{P}_\alpha)\,M_{\beta\alpha}\,, \tag{5.6.37}$$
where P here denotes the total momentum of the state and $M_{\beta\alpha}$ is some
amplitude that is not singular when $\mathbf{P}_\beta = \mathbf{P}_\alpha$. So we have to worry about the square
of $\delta^3(\mathbf{P}_\beta - \mathbf{P}_\alpha)$ as well as the square of $\delta(E_\beta - E_\alpha)$.
For a completely convincing way of dealing with these problems, we would
need to take superpositions of states with a range of energies and momenta, and
follow the evolution of these wave packets from very early to very late times.
We will adopt a much simpler approach that gives the right answers with a
minimum of trouble.
First, to deal with the inevitable energy-conservation delta function, we adopt
the fiction that the interaction V acts only for a long but finite time interval of
duration T . This should not introduce significant errors if this interval extends
back in time to long before the particles in state α become close to one another,
and extends forward in time to long after the particles in state β have been close
to one another.25 In this case, the one-dimensional version of the representation
(5.6.9) of the delta function becomes instead
$$\delta_T(E_\beta - E_\alpha) = \frac{1}{2\pi\hbar}\int_T dt\,\exp\big(-it(E_\beta - E_\alpha)/\hbar\big)\,, \tag{5.6.38}$$
the integral extending over the time interval of duration T. The square of the
delta function is then
$$\big[\delta_T(E_\beta - E_\alpha)\big]^2 = \delta_T(0)\,\delta_T(E_\beta - E_\alpha) = \frac{T}{2\pi\hbar}\,\delta_T(E_\beta - E_\alpha)\,.$$
As long as we do not attempt to measure energies to an uncertainty less than the
tiny amount $\hbar/T$, we can drop the subscript T on the final delta function, and
write this as
$$\big[\delta_T(E_\beta - E_\alpha)\big]^2 = \frac{T}{2\pi\hbar}\,\delta(E_\beta - E_\alpha)\,. \tag{5.6.39}$$
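A simple numerical check makes Eq. (5.6.39) plausible. If the time interval is taken to run from −T/2 to T/2 (an assumption made here for definiteness), the integral (5.6.38) gives $\delta_T(x) = \sin(xT/2\hbar)/\pi x$ with $x = E_\beta - E_\alpha$. Integrating the square of this function over x should then give $T/2\pi\hbar$, the coefficient of the delta function in Eq. (5.6.39). The sketch below, with hypothetical function names and units in which ħ = 1, does this with the trapezoidal rule.

```python
import math


def delta_T(x: float, T: float, hbar: float = 1.0) -> float:
    """Finite-time delta function for the interval (-T/2, T/2)."""
    if x == 0.0:
        return T / (2.0 * math.pi * hbar)
    return math.sin(x * T / (2.0 * hbar)) / (math.pi * x)


def integral_of_square(T: float, hbar: float = 1.0,
                       half_width: float = 400.0, step: float = 0.01) -> float:
    """Trapezoidal estimate of the integral of delta_T(x)**2 over
    -half_width < x < half_width. The neglected tails fall off like 1/x**2,
    so the estimate is slightly low."""
    n = int(round(2.0 * half_width / step))
    total = 0.0
    for i in range(n + 1):
        x = -half_width + i * step
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * delta_T(x, T, hbar) ** 2
    return total * step


T = 10.0
print(integral_of_square(T), T / (2.0 * math.pi))
```

The two printed values agree to a few parts in ten thousand, the residual being the contribution of the truncated tails.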
Likewise, in the absence of external fields momentum is conserved; to
deal with the momentum-conservation delta function we imagine that the
system is enclosed in a box of large but finite volume V . The representation
(5.6.9) of the momentum-conservation delta function in Eq. (5.6.37) (now with
momentum and position taking the place of position and wave vector) is then
replaced with
25 For a decay process with a single-particle initial state we must take the duration T of the time interval
sufficiently large that the interval extends back in time close enough to the time when the particle was
produced, so that it had not yet had time to decay, and far enough forward in time that if the particle
has decayed by then its decay products will have had time to separate far enough that they are no longer
interacting.
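$$\delta_V^3(\mathbf{P}_\beta - \mathbf{P}_\alpha) = \frac{1}{(2\pi\hbar)^3}\int_V d^3x\,\exp\big(i\mathbf{x}\cdot(\mathbf{P}_\beta - \mathbf{P}_\alpha)/\hbar\big)\,, \tag{5.6.40}$$
the integral running over the interior of the box. The square of this delta
function is
$$\big[\delta_V^3(\mathbf{P}_\beta - \mathbf{P}_\alpha)\big]^2 = \delta_V^3(0)\,\delta_V^3(\mathbf{P}_\beta - \mathbf{P}_\alpha) = \frac{V}{(2\pi\hbar)^3}\,\delta^3(\mathbf{P}_\beta - \mathbf{P}_\alpha)\,, \tag{5.6.41}$$
in which we drop the subscript V in the final expression because the
uncertainty in measurements of momenta is generally larger than the tiny amount
$\hbar V^{-1/3}$. Hence, putting together Eqs. (5.6.36), (5.6.37), (5.6.39), and (5.6.41),
the probability of a transition α → β with α ≠ β occurring in a time T in the
volume V is
$$P(\alpha\to\beta) = \big|S^{\rm box}_{\beta\alpha}\big|^2 = \frac{T}{2\pi\hbar}\,\frac{V}{(2\pi\hbar)^3}\,\delta(E_\beta - E_\alpha)\,\delta^3(\mathbf{P}_\beta - \mathbf{P}_\alpha)\,\big|2\pi M^{\rm box}_{\beta\alpha}\big|^2\,. \tag{5.6.42}$$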
A superscript “box” has been attached to the matrix elements $S_{\beta\alpha}$ and $M_{\beta\alpha}$
because putting the system in a box changes the way that we must normalize
the wave functions $\varphi_\alpha$ and $\varphi_\beta$. Without a box, the wave function for a particle
of momentum p far from any interaction is taken as
$\varphi_{\mathbf{p}}(\mathbf{x}) = \exp(i\mathbf{p}\cdot\mathbf{x}/\hbar)/(2\pi\hbar)^{3/2}$, so that
$\int d^3x\,\varphi^*_{\mathbf{p}}(\mathbf{x})\,\varphi_{\mathbf{p}'}(\mathbf{x}) = \delta^3(\mathbf{p} - \mathbf{p}')$, but in a box of volume V
we must instead take $\varphi_{\mathbf{p}}(\mathbf{x}) = \exp(i\mathbf{p}\cdot\mathbf{x}/\hbar)/\sqrt{V}$, so that the integral of $|\varphi_{\mathbf{p}}(\mathbf{x})|^2$
over the volume of the box is unity. Thus the matrix element for the transition
α → β in a box is related to the usual matrix element by
$$M^{\rm box}_{\beta\alpha} = \left[\frac{(2\pi\hbar)^3}{V}\right]^{(N_\alpha+N_\beta)/2} M_{\beta\alpha}\,, \tag{5.6.43}$$
where $N_\alpha$ and $N_\beta$ are the numbers of particles in the initial and final states.
There is a further complication, that in a large box the final states are very close
together. According to Eq. (5.5.1), the number of allowed momentum values for
a single particle in a range $d^3p$ of momenta is $(V/(2\pi\hbar)^3)\,d^3p$, so the number
of momentum states in the range of final states is
$$d\mathcal{N}(\beta) = \big(V/(2\pi\hbar)^3\big)^{N_\beta}\,d\beta \tag{5.6.44}$$
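$$d\Gamma(\alpha\to\beta) = \frac{\big|S^{\rm box}_{\beta\alpha}\big|^2\,d\mathcal{N}(\beta)}{T} = \frac{2\pi}{\hbar}\left[\frac{V}{(2\pi\hbar)^3}\right]^{1-N_\alpha}\big|M_{\beta\alpha}\big|^2\,\delta(E_\beta - E_\alpha)\,\delta^3(\mathbf{P}_\beta - \mathbf{P}_\alpha)\,d\beta\,. \tag{5.6.45}$$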
This is the master formula for calculating the rates for all sorts of transitions
between free-particle states.
The factor $[V/(2\pi\hbar)^3]^{1-N_\alpha}$ in Eq. (5.6.45) may look peculiar, but it is in
fact just what is needed to account for what is measured. For a decay process
with $N_\alpha = 1$ this factor is of course absent, corresponding to the obvious fact
that the decay rate of a particle does not depend on the size of the box in
which it is contained. For a two-particle initial state α, the differential rate
of the scattering α → β into an arbitrary final state β is proportional to the
flux, the product of the relative velocity $u_\alpha$ and the number density 1/V of
either particle as seen from the other, and is therefore written as the flux times
a differential cross section $d\sigma(\alpha\to\beta)$. (For a pair of non-relativistic particles
$u_\alpha = |\mathbf{p}_1/m_1 - \mathbf{p}_2/m_2|$, while if one of the particles is a photon then $u_\alpha = c$.)
Hence Eq. (5.6.45) gives
$$\delta(E_\beta - E_\alpha)\,\delta^3(\mathbf{P}_\beta - \mathbf{P}_\alpha)\,d\beta \to p^2\,dp\,d\Omega\;\delta\Big((p^2c^2 + m_A'^2c^4)^{1/2} + (p^2c^2 + m_B'^2c^4)^{1/2} - E_\alpha\Big)\,,$$
where p is in the solid angle $d\Omega$. There is a general rule that since for an
arbitrary increasing function f(p) which takes a value $f_0$ at a single point $p_0$
we have $1 = \int\delta(f(p) - f_0)\,df(p)$, it follows that
In our case, this means that when we integrate over p, we are directed to set
$p = p_\beta$, where
$$(p_\beta^2c^2 + m_A'^2c^4)^{1/2} + (p_\beta^2c^2 + m_B'^2c^4)^{1/2} = E_\alpha\,, \tag{5.6.48}$$
and Eq. (5.6.46) becomes
$$d\sigma(\alpha\to\beta) = \frac{(2\pi)^4\hbar^2p_\beta^2}{u_\alpha u_\beta}\,\big|M_{\beta\alpha}\big|^2\,d\Omega\,, \tag{5.6.49}$$
in which it is understood that, in the center-of-mass system, $M_{\beta\alpha}$ is to be
evaluated by placing $\mathbf{p}_A = -\mathbf{p}_B$ in the infinitesimal solid angle $d\Omega$, with
$|\mathbf{p}_A| = |\mathbf{p}_B| = p_\beta$, and
$$u_\beta \equiv \frac{p_\beta c^2}{(p_\beta^2c^2 + m_A'^2c^4)^{1/2}} + \frac{p_\beta c^2}{(p_\beta^2c^2 + m_B'^2c^4)^{1/2}}\,. \tag{5.6.50}$$
Of course in the center-of-mass system the initial relative velocity $u_\alpha$ in
Eq. (5.6.47) is given by similar formulas but with β replaced with α and
the final masses $m_A'$ and $m_B'$ replaced with initial masses $m_A$ and $m_B$:
$$u_\alpha \equiv \frac{p_\alpha c^2}{(p_\alpha^2c^2 + m_A^2c^4)^{1/2}} + \frac{p_\alpha c^2}{(p_\alpha^2c^2 + m_B^2c^4)^{1/2}}\,, \tag{5.6.51}$$
where
$$(p_\alpha^2c^2 + m_A^2c^4)^{1/2} + (p_\alpha^2c^2 + m_B^2c^4)^{1/2} = E_\alpha\,. \tag{5.6.52}$$
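Conditions like Eqs. (5.6.48) and (5.6.52) can be solved in closed form. The sketch below uses the standard two-body kinematic result $p^2c^2 = [E^2 - (m_A+m_B)^2c^4][E^2 - (m_A-m_B)^2c^4]/4E^2$, which is not derived in the text, and then evaluates the relative velocity from Eq. (5.6.51); the function names and the default c = 1 are choices made for illustration.

```python
import math


def cm_momentum(E: float, m_a: float, m_b: float, c: float = 1.0) -> float:
    """Solve (p^2 c^2 + m_a^2 c^4)^(1/2) + (p^2 c^2 + m_b^2 c^4)^(1/2) = E for p."""
    threshold = (m_a + m_b) * c ** 2
    if E < threshold:
        raise ValueError("total energy is below the sum of the rest energies")
    difference = (m_a - m_b) * c ** 2
    return math.sqrt((E ** 2 - threshold ** 2) * (E ** 2 - difference ** 2)) / (2.0 * E * c)


def particle_energy(p: float, m: float, c: float = 1.0) -> float:
    """Energy (p^2 c^2 + m^2 c^4)^(1/2) of a single particle."""
    return math.sqrt(p ** 2 * c ** 2 + m ** 2 * c ** 4)


def relative_velocity(p: float, m_a: float, m_b: float, c: float = 1.0) -> float:
    """Relative velocity in the center-of-mass system, as in Eq. (5.6.51)."""
    return p * c ** 2 / particle_energy(p, m_a, c) + p * c ** 2 / particle_energy(p, m_b, c)


E, m_a, m_b = 5.0, 1.0, 2.0
p = cm_momentum(E, m_a, m_b)
print(p, particle_energy(p, m_a) + particle_energy(p, m_b), relative_velocity(p, m_a, m_b))
```

The second printed number reproduces the input energy E, confirming that the momentum found satisfies the energy condition.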
We can now see how our earlier results for scattering by a fixed potential
emerge from this general formalism. Consider an elastic non-relativistic
scattering process, in which $m_A' = m_A \equiv m$ and $m_B' = m_B \gg m$. In this
case $p_\alpha = p_\beta$, $u_\alpha = u_\beta = p_\alpha/m$, and $E_\alpha - m_Ac^2 - m_Bc^2 = p_\alpha^2/2m$.
Equation (5.6.49) then gives the differential cross section
$$\frac{d\sigma(\alpha\to\beta)}{d\Omega} = (2\pi)^4\hbar^2m^2\,\big|M_{\beta\alpha}\big|^2\,. \tag{5.6.53}$$
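To calculate the matrix element $M_{\beta\alpha}$, we note that the final free-particle wave
function is
$$\varphi_\beta(\mathbf{x}_A, \mathbf{x}_B) = \frac{e^{i\mathbf{p}_A\cdot\mathbf{x}_A/\hbar}}{(2\pi\hbar)^{3/2}}\,\frac{e^{i\mathbf{p}_B\cdot\mathbf{x}_B/\hbar}}{(2\pi\hbar)^{3/2}}$$
and in the center-of-mass system the initial interacting wave function takes the
form
$$\psi_\alpha(\mathbf{x}_A, \mathbf{x}_B) = \psi(\mathbf{x}_A - \mathbf{x}_B)\times\frac{1}{(2\pi\hbar)^{3/2}}\,,$$
where ψ is the wave function discussed in the main body of this section (which
already includes a normalization factor $(2\pi\hbar)^{-3/2}$), and the second factor takes
care of the normalization of the heavy particle wave function. Then, setting
$\mathbf{x}_A = \mathbf{x} + \mathbf{x}_B$ and integrating over $\mathbf{x}_B$,
$$\int\varphi_\beta^*\,V\psi_\alpha \equiv \int\varphi_\beta^*(\mathbf{x}_A, \mathbf{x}_B)\,V(\mathbf{x}_A - \mathbf{x}_B)\,\psi_\alpha(\mathbf{x}_A, \mathbf{x}_B)\,d^3x_A\,d^3x_B = \delta^3(\mathbf{p}_A + \mathbf{p}_B)\int d^3x\,V(\mathbf{x})\,\psi(\mathbf{x})\,\frac{e^{-i\mathbf{p}_A\cdot\mathbf{x}/\hbar}}{(2\pi\hbar)^{3/2}}$$
so
$$M_{\beta\alpha} = \int d^3x\,V(\mathbf{x})\,\psi(\mathbf{x})\,\frac{e^{-i\mathbf{p}_A\cdot\mathbf{x}/\hbar}}{(2\pi\hbar)^{3/2}}\,. \tag{5.6.54}$$
Using Eq. (5.6.54) in Eq. (5.6.53) gives the same differential cross section
(5.6.17) as found earlier.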
It is frequently observed that the cross section for some reaction is a function
of energy with a sharp peak. This is a sign of a resonance, the formation of
a slowly decaying intermediate state in the scattering process. Suppose the integral $\int\psi_\beta^* V\psi_\alpha$ in Eq. (5.6.36) for the S-matrix has a term with an energy dependence proportional to $(E_\alpha - E_R + i\hbar\Gamma/2)^{-1}$, with $E_R$ and $\Gamma$ real and $\Gamma > 0$. This yields a term in the function $G_\beta(E)$ defined by Eq. (5.6.33) with energy dependence proportional to $(E - E_R + i\hbar\Gamma/2)^{-1}$, which has a pole in the lower half of the complex $E$ plane. Although, as noted in the derivation of Eq. (5.6.35), the contribution of any singularity in $G_\beta(E)$ at an energy $E$ at a finite distance below the real axis vanishes for $t\to+\infty$, if the singularity is close to the real axis then this contribution lasts a long time. So if $\Gamma$ is relatively small then the integral over $E$ in Eq. (5.6.32) contains a term that decays slowly, with a time dependence proportional to $\exp(-iE_Rt/\hbar)\exp(-\Gamma t/2)$, giving a term in $|\psi^{(g)}(t)|^2$ that decays as $\exp(-\Gamma t)$, indicating the presence of an intermediate state whose probability decays at a rate $\Gamma$. The singular term in $\int\psi_\beta^* V\psi_\alpha$ gives a term in the cross section with energy dependence
$$\sigma \propto \left|\frac{1}{E - E_R + i\hbar\Gamma/2}\right|^2 = \frac{1}{(E-E_R)^2 + \hbar^2\Gamma^2/4} . \tag{5.6.55}$$
So, this is the general rule for resonances: the decay rate of the intermediate
state is the full width in energy of the resonant peak in the cross section at half
maximum, divided by h̄.
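The rule can be checked numerically. The following is a minimal sketch, assuming units in which ħ = 1 and purely illustrative values of the resonance energy and decay rate; the function names are not from the text. It locates the half-maximum point of the resonant factor by bisection and recovers the decay rate as the full width divided by ħ.

```python
def breit_wigner(E, E_R, gamma, hbar=1.0):
    """Resonant factor 1 / ((E - E_R)^2 + hbar^2 gamma^2 / 4), as in Eq. (5.6.55)."""
    return 1.0 / ((E - E_R) ** 2 + (hbar * gamma) ** 2 / 4.0)


def full_width_at_half_maximum(E_R, gamma, hbar=1.0):
    """Find the upper half-maximum point by bisection and return the full width."""
    half_peak = breit_wigner(E_R, E_R, gamma, hbar) / 2.0
    # At lo the factor is at its peak; at hi it is far below half the peak.
    lo, hi = E_R, E_R + 10.0 * hbar * gamma
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if breit_wigner(mid, E_R, gamma, hbar) > half_peak:
            lo = mid
        else:
            hi = mid
    return 2.0 * (0.5 * (lo + hi) - E_R)


# Illustrative (assumed) values in units with hbar = 1.
HBAR = 1.0
E_R, GAMMA = 5.0, 0.3
width = full_width_at_half_maximum(E_R, GAMMA, HBAR)
decay_rate = width / HBAR  # full width at half maximum divided by hbar
print(width, decay_rate)
```

The recovered decay rate agrees with the value of Γ put in, which is just the statement that the half-maximum points sit at E = E_R ± ħΓ/2.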
where in the Heisenberg picture P = mẊ. This has been adequate in deal-
ing with charged particles moving in an electrostatic potential but not in more
complicated contexts, such as the case of charged particles moving in general
classical electromagnetic fields, discussed in the next section, much less for a
quantum theory of fields. Also, in using commutation relations like Eq. (5.7.1),
we must wonder (or at least we should wonder) why these relations are valid.
Hamiltonian Formalism
There is a more general approach, known as the canonical formalism, according
to which the continuous degrees of freedom (excluding spin) of any system are
represented by a set of canonical variables Qa (such as all the components of
the positions of all the particles in a system) and an equal number of “canonical
conjugates” Pa . Like any operators, in the Heisenberg picture these operators
satisfy the equations of motion (5.3.34):
$$i\hbar\frac{d}{dt}Q_a(t) = [Q_a(t), H] , \qquad i\hbar\frac{d}{dt}P_a(t) = [P_a(t), H] , \tag{5.7.2}$$
where $H = H\big(Q(t), P(t)\big)$ is the Hamiltonian of the system. On the basis of
previous experience with classical phenomena, we commonly need to require
that these equations of motion take the same form as the Hamiltonian equations
of motion in classical mechanics:
$$\frac{d}{dt}Q_a(t) = \frac{\partial}{\partial P_a(t)}H\big(Q(t),P(t)\big) , \tag{5.7.3}$$
$$\frac{d}{dt}P_a(t) = -\frac{\partial}{\partial Q_a(t)}H\big(Q(t),P(t)\big) . \tag{5.7.4}$$
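For instance, for a particle of mass $m$ in a potential $V(\mathbf{X})$, the variables $Q_a$ are the components of the position vector $\mathbf{X}$, the Hamiltonian is
$$H(\mathbf{X},\mathbf{P}) = \frac{\mathbf{P}^2}{2m} + V(\mathbf{X}) ,$$
and the equations of motion (5.7.3) and (5.7.4) are
$$\frac{d}{dt}\mathbf{X} = \frac{\mathbf{P}}{m} , \qquad \frac{d}{dt}\mathbf{P} = -\nabla V(\mathbf{X})$$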
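As a classical illustration of these two equations, here is a minimal sketch that integrates them in one dimension with a leapfrog (kick-drift-kick) scheme. The harmonic potential, the parameter values, and the function names are assumptions made for the example, not part of the text. After one period 2π√(m/k) the particle should return to its starting point.

```python
import math


def leapfrog_step(x, p, dt, m, grad_V):
    """Advance dX/dt = P/m, dP/dt = -dV/dX by one kick-drift-kick step."""
    p_half = p - 0.5 * dt * grad_V(x)          # half kick from the force
    x_new = x + dt * p_half / m                # drift with dX/dt = P/m
    p_new = p_half - 0.5 * dt * grad_V(x_new)  # second half kick
    return x_new, p_new


def evolve(x, p, t_total, steps, m, grad_V):
    """Integrate Hamilton's equations over t_total using the given number of steps."""
    dt = t_total / steps
    for _ in range(steps):
        x, p = leapfrog_step(x, p, dt, m, grad_V)
    return x, p


# Assumed example: V(x) = k x^2 / 2, so grad V = k x.
m, k = 1.0, 4.0
period = 2.0 * math.pi * math.sqrt(m / k)
x_end, p_end = evolve(1.0, 0.0, period, 10000, m, lambda x: k * x)
print(x_end, p_end)
```

A leapfrog step is used rather than a simple Euler step because it respects the Hamiltonian structure of the equations, so the energy does not drift over long runs.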
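$$i\hbar\frac{\partial}{\partial t}\psi(\mathbf{x},t) = -\frac{\hbar^2}{2m}\nabla^2\psi(\mathbf{x},t) - \frac{i\hbar}{2}\nabla\cdot[\mathbf{V}(\mathbf{x})\psi(\mathbf{x},t)] - \frac{i\hbar}{2}\mathbf{V}(\mathbf{x})\cdot\nabla\psi(\mathbf{x},t) . \tag{5.7.10}$$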
Lagrangian Formalism
There is another version of the canonical formalism, in quantum mechanics as
well as classical mechanics, based on a Lagrangian L(Q, Q̇) taken as a function
of canonical variables Qa (t) and their time derivatives Q̇a (t) rather than a
Hamiltonian function of canonical variables and their canonical conjugates. The
fundamental assumption of the Lagrangian formalism is that a quantity known
as the action
$$I \equiv \int_{-\infty}^{+\infty} L\big(Q(t),\dot Q(t)\big)\,dt \tag{5.7.11}$$
In the case where δQa (t) vanishes at t → ±∞, integrating the second term in
the integrand by parts gives
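$$\delta I = \sum_a\int_{-\infty}^{+\infty}\left[\frac{\partial L\big(Q(t),\dot Q(t)\big)}{\partial Q_a(t)} - \frac{d}{dt}\frac{\partial L\big(Q(t),\dot Q(t)\big)}{\partial\dot Q_a(t)}\right]\delta Q_a(t)\,dt ,$$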
and since this is assumed to vanish for arbitrary variations δQa (t) that vanish at
t → ±∞, we must have
$$\frac{d}{dt}\frac{\partial L\big(Q(t),\dot Q(t)\big)}{\partial\dot Q_a(t)} = \frac{\partial L\big(Q(t),\dot Q(t)\big)}{\partial Q_a(t)} . \tag{5.7.12}$$
These are the equations of motion in the Lagrangian formalism.
From this, we can go over to the classical Hamiltonian formalism, defining
$$P_a(t) = \frac{\partial L\big(Q(t),\dot Q(t)\big)}{\partial\dot Q_a(t)} \tag{5.7.13}$$
with Hamiltonian
$$H(Q,P) = \sum_a\dot Q_a P_a - L(Q,\dot Q) . \tag{5.7.14}$$
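(Taken literally, this may not put the $Q$s and $P$s in the right order for $H$ to be self-adjoint, in which case we must average over their ordering to make $H$ self-adjoint as we did in Eq. (5.7.8).) In Eq. (5.7.14) we should regard $\dot Q$ as a function of the $Q$s and $P$s, given by solving Eq. (5.7.13) for $\dot Q$. We can then check that the $Q$s and $P$s satisfy the Hamiltonian equations of motion
$$\frac{\partial H(Q,P)}{\partial Q_a} = \sum_b\frac{\partial\dot Q_b}{\partial Q_a}P_b - \frac{\partial L(Q,\dot Q)}{\partial Q_a} - \sum_b\frac{\partial L(Q,\dot Q)}{\partial\dot Q_b}\frac{\partial\dot Q_b}{\partial Q_a} = -\frac{\partial L(Q,\dot Q)}{\partial Q_a} = -\dot P_a$$
and
$$\frac{\partial H(Q,P)}{\partial P_a} = \sum_b\frac{\partial\dot Q_b}{\partial P_a}P_b + \dot Q_a - \sum_b\frac{\partial L(Q,\dot Q)}{\partial\dot Q_b}\frac{\partial\dot Q_b}{\partial P_a} = \dot Q_a ,$$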
as was to be shown.
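The same check can be made numerically for a specific case. The sketch below assumes a toy one-dimensional Lagrangian with a position-dependent inertia, L = (1 + Q²)Q̇²/2 − Q²/2, chosen only so that the dependence of Q̇ on Q at fixed P is not trivial. It builds H from Eq. (5.7.14) and compares finite-difference derivatives of H with Q̇ and with −∂L/∂Q.

```python
def lagrangian(q, qdot):
    """Toy Lagrangian (an assumption for illustration): L = (1+q^2) qdot^2 / 2 - q^2 / 2."""
    return 0.5 * (1.0 + q * q) * qdot ** 2 - 0.5 * q * q


def qdot_from(q, p):
    """Solve P = dL/dQdot = (1 + q^2) qdot for qdot, i.e. Eq. (5.7.13) for this toy L."""
    return p / (1.0 + q * q)


def hamiltonian(q, p):
    """H(Q, P) = Qdot P - L(Q, Qdot), with Qdot regarded as a function of Q and P."""
    qdot = qdot_from(q, p)
    return qdot * p - lagrangian(q, qdot)


def partial(f, args, index, h=1e-5):
    """Central-difference derivative of f with respect to args[index]."""
    up, dn = list(args), list(args)
    up[index] += h
    dn[index] -= h
    return (f(*up) - f(*dn)) / (2.0 * h)


q, p = 0.7, 1.3                             # arbitrary phase-space point
qdot = qdot_from(q, p)
dH_dP = partial(hamiltonian, (q, p), 1)     # should equal Qdot
dH_dQ = partial(hamiltonian, (q, p), 0)     # should equal -dL/dQ = -Pdot
dL_dQ = partial(lagrangian, (q, qdot), 0)   # derivative at fixed Qdot
print(dH_dP - qdot, dH_dQ + dL_dQ)
```

Both printed differences vanish up to finite-difference error, because the terms involving ∂Q̇/∂Q and ∂Q̇/∂P cancel exactly as in the algebra above.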
Noether’s Theorem
The chief reason for using the Lagrangian formalism to construct a Hamiltonian
is that there is a deep relation between conservation laws and symmetries of
the Lagrangian, first stated in classical physics26 by Amalie Emmy Noether
(1882–1935). Let us consider a symmetry of the Lagrangian under an infinites-
imal transformation that for simplicity takes the Qs into functions of Qs:
$$Q_a \to Q_a + \epsilon f_a(Q) , \qquad \dot Q_a \to \dot Q_a + \epsilon\sum_b\frac{\partial f_a(Q)}{\partial Q_b}\dot Q_b , \tag{5.7.15}$$
where the $f_a(Q)$ are some functions only of the $Q$s that are dictated, up to a constant factor, by the nature of the symmetry principle, and $\epsilon$ is an infinitesimal parameter. (Time-independent rotations and translations of coordinates are of this general form.) The invariance of $L$ under this transformation tells us that
$$0 = \sum_a\frac{\partial L}{\partial Q_a}f_a(Q) + \sum_a\frac{\partial L}{\partial\dot Q_a}\frac{d}{dt}f_a(Q) .$$
Using Eqs. (5.7.12) and (5.7.13), we see that this is a conservation law:
$$\frac{dF(Q,P)}{dt} = 0 \quad\text{where}\quad F(Q,P) \equiv \sum_a P_a f_a(Q) . \tag{5.7.16}$$
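A classical example makes the conservation law concrete. The sketch below assumes a particle in two dimensions in a central potential V = r²/2 + r⁴/4, which is invariant under rotations; for an infinitesimal rotation f = (−y, x), so F = x p_y − y p_x, the angular momentum. The potential, the initial conditions, and the function names are illustrative assumptions.

```python
def grad_V(x, y):
    """Gradient of the assumed central potential V = r^2/2 + r^4/4."""
    r2 = x * x + y * y
    return (1.0 + r2) * x, (1.0 + r2) * y


def noether_charge(x, y, px, py):
    """F = sum_a P_a f_a(Q) with f = (-y, x), the generator of rotations."""
    return px * (-y) + py * x


def leapfrog_step(x, y, px, py, dt, m=1.0):
    """One kick-drift-kick step of Hamilton's equations."""
    gx, gy = grad_V(x, y)
    px, py = px - 0.5 * dt * gx, py - 0.5 * dt * gy
    x, y = x + dt * px / m, y + dt * py / m
    gx, gy = grad_V(x, y)
    px, py = px - 0.5 * dt * gx, py - 0.5 * dt * gy
    return x, y, px, py


state = (1.0, 0.0, 0.3, 0.8)            # x, y, px, py (arbitrary)
F_initial = noether_charge(*state)
max_drift = 0.0
for _ in range(5000):
    state = leapfrog_step(*state, dt=0.001)
    max_drift = max(max_drift, abs(noether_charge(*state) - F_initial))
print(F_initial, max_drift)
```

The drift stays at the level of rounding error: each kick changes the momentum along the radial direction and each drift changes the position along the momentum, and neither alters x p_y − y p_x.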
26 E. Noether, Nachr. König. Gesell. Wiss. zu Göttingen, Math.-Phys. Klasse 235 (1918).
5.8 Charged Particles in Electromagnetic Fields 195
This ensures that the fields satisfy the homogeneous Maxwell equations
∇ × E + Ḃ/c = 0 , ∇·B=0, (5.8.2)
and leads to simplifications in the other Maxwell equations.
What in classical physics is merely a convenience, in quantum mechanics is
a necessity. It is not possible to write a simple local Hamiltonian for a charged
particle in general electric and magnetic fields using just the fields E and B. But
we can write such Hamiltonians in terms of A and φ. For a single non-relativistic
particle of mass m and charge e, the Hamiltonian is
$$H(\mathbf{X},\mathbf{P}) = \frac{1}{2m}\left[\mathbf{P} - \frac{e}{c}\mathbf{A}(\mathbf{X},t)\right]^2 + e\phi(\mathbf{X},t) . \tag{5.8.3}$$
Whether or not we derive this Hamiltonian from a Lagrangian, its real justi-
fication is that it leads to the correct equations of motion. The Hamiltonian
equations of motion (5.7.3) and (5.7.4) here take the form
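$$\dot X_i(t) = \frac{\partial H}{\partial P_i(t)} = \frac{1}{m}\left[P_i(t) - \frac{e}{c}A_i(\mathbf{X},t)\right] ,$$
$$\dot P_i(t) = -\frac{\partial H}{\partial X_i(t)} = \frac{e}{mc}\left[P_j(t) - \frac{e}{c}A_j(\mathbf{X},t)\right]\frac{\partial A_j(\mathbf{X},t)}{\partial X_i} - e\frac{\partial\phi(\mathbf{X},t)}{\partial X_i} ,$$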
where the indices i, j , etc. run over the values 1, 2, 3, and repeated indices are
summed. Eliminating the momentum from these two equations (and dropping
arguments), we have an equation of motion for the position:
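$$m\ddot X_i = \frac{e}{c}\dot X_j\frac{\partial A_j}{\partial X_i} - e\frac{\partial\phi}{\partial X_i} - \frac{e}{c}\left[\frac{\partial A_i}{\partial t} + \dot X_j\frac{\partial A_i}{\partial X_j}\right] = \frac{e}{c}\dot X_j\left[\frac{\partial A_j}{\partial X_i} - \frac{\partial A_i}{\partial X_j}\right] - e\frac{\partial\phi}{\partial X_i} - \frac{e}{c}\frac{\partial A_i}{\partial t} .$$
To put this in a more familiar form, note that
$$\dot X_j\left[\frac{\partial A_j}{\partial X_i} - \frac{\partial A_i}{\partial X_j}\right] = \left[\dot{\mathbf{X}}\times(\nabla\times\mathbf{A})\right]_i .$$
(For instance, for $i = 3$ the left-hand side is
$$\dot X_1\left[\frac{\partial A_1}{\partial X_3} - \frac{\partial A_3}{\partial X_1}\right] + \dot X_2\left[\frac{\partial A_2}{\partial X_3} - \frac{\partial A_3}{\partial X_2}\right] = \dot X_1(\nabla\times\mathbf{A})_2 - \dot X_2(\nabla\times\mathbf{A})_1 = \left[\dot{\mathbf{X}}\times(\nabla\times\mathbf{A})\right]_3$$
and likewise for $i = 1$ and $i = 2$.) Using the formulas (5.8.1) for $\mathbf{E}$ and $\mathbf{B}$, the equation of motion takes the form
$$m\ddot{\mathbf{X}} = e\mathbf{E} + \frac{e}{c}[\dot{\mathbf{X}}\times\mathbf{B}] , \tag{5.8.4}$$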
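The index identity used in this step can be spot-checked numerically. The following sketch assumes an arbitrary smooth vector potential and an arbitrary velocity, both invented for the test, and compares the index form Ẋ_j(∂A_j/∂X_i − ∂A_i/∂X_j) with the components of Ẋ × (∇ × A), using central differences for the derivatives.

```python
import math


def A(x):
    """An arbitrary smooth vector potential, chosen only for illustration."""
    x1, x2, x3 = x
    return (x2 * x3 ** 2, x1 * math.sin(x3), x1 * x2 * x3)


def dA(j, i, x, h=1e-5):
    """Central-difference estimate of dA_j/dX_i (indices run over 0, 1, 2)."""
    up, dn = list(x), list(x)
    up[i] += h
    dn[i] -= h
    return (A(up)[j] - A(dn)[j]) / (2.0 * h)


def curl_A(x):
    """Components of curl A built from the same finite differences."""
    return (dA(2, 1, x) - dA(1, 2, x),
            dA(0, 2, x) - dA(2, 0, x),
            dA(1, 0, x) - dA(0, 1, x))


def cross(a, b):
    """Ordinary vector cross product."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def index_form(v, x, i):
    """Sum over j of v_j (dA_j/dX_i - dA_i/dX_j)."""
    return sum(v[j] * (dA(j, i, x) - dA(i, j, x)) for j in range(3))


point = (0.3, -1.1, 0.8)       # arbitrary position
velocity = (1.5, 0.2, -0.7)    # arbitrary velocity
rhs = cross(velocity, curl_A(point))
differences = [index_form(velocity, point, i) - rhs[i] for i in range(3)]
print(differences)
```

Because both sides are assembled from the same numerical derivatives, the three differences vanish up to rounding, whatever potential is substituted for the assumed one.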
Gauge Transformations
There is more than one set of potentials A and φ that give the same fields E and
B. Given a set of potentials A and φ that yield a set of fields E and B, we can
always find other potentials
$$\mathbf{A}' = \mathbf{A} + \nabla\xi , \qquad \phi' = \phi - \frac{1}{c}\frac{\partial\xi}{\partial t} , \tag{5.8.5}$$
which give the same fields for an arbitrary function ξ(x, t). A given choice of
potentials is called a choice of gauge, and Eq. (5.8.5) is known as a gauge
transformation. Even though the equation of motion (5.8.4) derived from the
Hamiltonian (5.8.3) involves only the fields E and B, the Hamiltonian depends
on A and φ and is not gauge invariant. So it is important to observe that no
physical implications of this Hamiltonian depend on the choice of gauge.
Let us check this for the simple case of a time-independent gauge trans-
formation function ξ(X), which has no effect on φ. The gauge-transformed
Hamiltonian is
$$H' = \frac{1}{2m}\left[\mathbf{P} - \frac{e}{c}\mathbf{A} - \frac{e}{c}\nabla\xi\right]^2 + e\phi . \tag{5.8.6}$$
Define an operator
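$$U(\mathbf{X}) \equiv \exp\left(-\frac{ie}{\hbar c}\xi(\mathbf{X})\right) .$$
According to Eq. (5.7.7),
$$[\mathbf{P}, U(\mathbf{X})] = -\frac{e}{c}\nabla\xi(\mathbf{X})\,U(\mathbf{X})$$
and therefore
$$U^{-1}(\mathbf{X})\,\mathbf{P}\,U(\mathbf{X}) = \mathbf{P} - \frac{e}{c}\nabla\xi(\mathbf{X}) .$$
It follows that
$$H'(\mathbf{X},\mathbf{P}) = U^{-1}(\mathbf{X})\,H(\mathbf{X},\mathbf{P})\,U(\mathbf{X}) . \tag{5.8.7}$$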
So if $\psi(\mathbf{x})$ satisfies the time-independent Schrödinger equation $H\psi = E\psi$ for energy $E$, then the gauge-transformed Schrödinger equation $H'\psi' = E\psi'$ is satisfied for the same energy, with gauge-transformed wave function
$$\psi'(\mathbf{x}) = U^{-1}(\mathbf{x})\,\psi(\mathbf{x}) = \exp\left(\frac{ie}{\hbar c}\xi(\mathbf{x})\right)\psi(\mathbf{x}) . \tag{5.8.8}$$
Not only the energy but also the probability density $|\psi|^2$ is unchanged by this transformation.
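A one-dimensional numerical check may help fix the sign of the phase. The sketch assumes units with ħ = 1, writes q for e/c, starts from A = 0 with a plane wave of wave number k, and uses an arbitrary gauge function ξ(x); all of these are choices made for the example. Applying the gauge-transformed kinetic momentum −iħ d/dx − q ξ′(x) to U⁻¹ψ should return ħk times the same function, so that H′ has the same eigenvalue (ħk)²/2m, while the opposite phase fails.

```python
import cmath
import math

HBAR = 1.0   # assumed units
Q = 0.7      # stands for e/c (illustrative value)


def xi(x):
    """Arbitrary gauge function, an assumption for the example."""
    return math.sin(x) + 0.2 * x * x


def dxi(x):
    """Derivative of the gauge function above."""
    return math.cos(x) + 0.4 * x


def psi(x, k):
    """Plane wave, an eigenfunction of P = -i hbar d/dx when A = 0."""
    return cmath.exp(1j * k * x)


def psi_gauged(x, k, sign):
    """exp(sign * i Q xi / hbar) psi; sign = +1 corresponds to U^{-1} psi."""
    return cmath.exp(sign * 1j * Q * xi(x) / HBAR) * psi(x, k)


def kinetic_momentum(f, x, h=1e-5):
    """Apply (-i hbar d/dx - Q dxi/dx) to f at x, using a central difference."""
    df = (f(x + h) - f(x - h)) / (2.0 * h)
    return -1j * HBAR * df - Q * dxi(x) * f(x)


x0, k = 0.4, 1.5
ratio_plus = kinetic_momentum(lambda y: psi_gauged(y, k, +1), x0) / psi_gauged(x0, k, +1)
ratio_minus = kinetic_momentum(lambda y: psi_gauged(y, k, -1), x0) / psi_gauged(x0, k, -1)
print(ratio_plus, ratio_minus)
```

With the phase of U⁻¹ the ratio is ħk at every point, whereas with the opposite phase it is ħk − 2qξ′(x), which depends on position and so does not describe an eigenfunction.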
Magnetic Interactions
Now let us take the simplest example of magnetic interactions, a one-electron
atom in a uniform time-independent magnetic field B. We can take the vector
potential here as
$$\mathbf{A} = -\frac{1}{2}\,\mathbf{X}\times\mathbf{B} ,$$
for which ∇ × A = B. Of course this is not unique, but as we have seen this
makes no difference.
The factor 1/c multiplying the vector potential in Eq. (5.8.3) makes the mag-
netic term in the Hamiltonian generally very small. To first order in this term, it
shifts the Hamiltonian (5.8.3) by
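$$\Delta H = \frac{e}{m_e c}\mathbf{A}(\mathbf{X})\cdot\mathbf{P} = -\frac{e}{2m_e c}[\mathbf{X}\times\mathbf{B}]\cdot\mathbf{P} = \frac{e}{2m_e c}\mathbf{B}\cdot\mathbf{L} , \tag{5.8.9}$$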
where L = X × P is the orbital angular momentum operator. (Here e has been
changed to −e, because in the usual notation this is the charge of the electron.
Also, we have not had to worry about the order of the operators A(X) and P,
because in this choice of gauge, ∇ · A = 0.)
Spin Coupling
What about spin? The form of the interaction (5.8.9) suggests that there should
also be a similar term in the magnetic interaction Hamiltonian with the spin
operator S in place of L, and not necessarily with the same coefficient. The
magnetic interaction is therefore taken to be in the form
$$\Delta H = \frac{e}{2m_e c}\mathbf{B}\cdot[\mathbf{L} + g_e\mathbf{S}] , \tag{5.8.10}$$
where $g_e$ is a dimensionless coefficient known as the gyromagnetic ratio of
the electron. It was first calculated in 1928 on the basis of a relativistic theory
of the electron by Dirac,27 who found the value ge = 2. The development of
quantum electrodynamics after World War II led to a calculation28 of a radiative
correction due to the emission and reabsorption of a photon by the electron
while it is interacting with the magnetic field. This gave $g_e = 2\times1.00116$, in
good agreement with experiment.
The effect of the interaction (5.8.10) on atomic energy levels in a magnetic
field is described in the next section.
There are few problems in quantum mechanics that can be solved exactly. For-
tunately it is often possible to find useful approximate solutions by a technique
known as perturbation theory. Sometimes it happens that the results obtained
in this way are more revealing than would be provided by a more complicated
exact solution, even where one is available.
The basis of perturbation theory is the assumption that the Hamiltonian can
be divided into two parts:
$$H = H_0 + \Delta H , \tag{5.9.1}$$
where $H_0$ is simple enough to allow exact solutions of the Schrödinger equation, and $\Delta H$ is in some sense small. We have already used a Hamiltonian of this type to derive the Born approximation for scattering amplitudes in Section 5.6. In this section we shall concentrate on deriving approximations for energy levels and the corresponding wave functions, assuming that $\Delta H$ is small enough to allow the eigenfunctions and eigenvalues of $H$ to be usefully expressed as power series in $\Delta H$. That is, in the Schrödinger equation $H\psi = E\psi$ we write
$$\psi = \psi_0 + \psi_1 + \psi_2 + \cdots , \qquad E = E_0 + E_1 + E_2 + \cdots , \tag{5.9.2}$$
where $\psi_N$ and $E_N$ are of $N$th order in $\Delta H$. The Schrödinger equation then takes the form
$$(H_0 + \Delta H)(\psi_0 + \psi_1 + \psi_2 + \cdots) = (E_0 + E_1 + E_2 + \cdots)(\psi_0 + \psi_1 + \psi_2 + \cdots) . \tag{5.9.3}$$
In the $N$th order of perturbation theory we keep all terms in Eq. (5.9.3) up to $N$th order in $\Delta H$. To zeroth order in $\Delta H$, this is the unperturbed Schrödinger
equation
H0 ψ0 = E0 ψ0 , (5.9.4)
whose solutions we assume are known.
or, if $\psi_0$ is normalized,
$$E_1 = \int\psi_0^*\,\Delta H\,\psi_0 . \tag{5.9.6}$$
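To see how well this first-order estimate works, one can compare it with an exactly solvable case. The sketch below assumes a two-level system with a diagonal unperturbed Hamiltonian and a small real symmetric perturbation, with all numbers invented for the illustration. The exact lower eigenvalue of a 2×2 symmetric matrix is known in closed form, so the error of E₀ + E₁ can be computed directly.

```python
import math


def exact_levels(h11, h22, h12):
    """Exact eigenvalues of the real symmetric matrix [[h11, h12], [h12, h22]]."""
    mean = 0.5 * (h11 + h22)
    half_gap = math.sqrt((0.5 * (h11 - h22)) ** 2 + h12 ** 2)
    return mean - half_gap, mean + half_gap


def first_order_error(strength):
    """Difference between E0 + E1 and the exact lower level for a given strength."""
    e0_lower, e0_upper = 1.0, 3.0         # assumed unperturbed energies
    d11, d22, d12 = 0.5, -0.3, 0.2        # assumed matrix elements of the perturbation
    lower_exact, _ = exact_levels(e0_lower + strength * d11,
                                  e0_upper + strength * d22,
                                  strength * d12)
    # psi_0 is the lower unperturbed state, so E1 is the corresponding diagonal element.
    lower_first_order = e0_lower + strength * d11
    return lower_first_order - lower_exact


err_small = first_order_error(0.01)
err_double = first_order_error(0.02)
print(err_small, err_double, err_double / err_small)
```

Doubling the strength of the perturbation roughly quadruples the error, as expected when the leading neglected term is of second order in the perturbation.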
Very nice, but this does not necessarily work in the case where $E_0$ is a degenerate energy eigenvalue, with several independent eigenfunctions $\psi^{(n)}$:
Multiply Eq. (5.9.5) with any of the $\psi^{(n)*}$, integrate, and sum over all coordinates and spin 3-components, and again use the fact that $H_0$ is Hermitian, so that $\int\psi^{(n)*}H_0\psi_1 = E_0\int\psi^{(n)*}\psi_1$. The terms in this integral involving $\psi_1$ again cancel, and we have
$$\int\psi^{(n)*}\,\Delta H\,\psi_0 = E_1\int\psi^{(n)*}\psi_0 . \tag{5.9.9}$$
The difficulty is that with more than one independent solution $\psi^{(n)}$ of Eq. (5.9.7), whatever we choose for our unperturbed wave function $\psi_0$, we can always choose some linear combination $\sum_n c_n\psi^{(n)}$ of these eigenfunctions to be orthogonal to $\psi_0$, in the sense that $\sum_n c_n^*\int\psi^{(n)*}\psi_0 = 0$, so that the same linear combination of Eq. (5.9.9) gives a condition on $\Delta H$:
$$\sum_n c_n^*\int\psi^{(n)*}\,\Delta H\,\psi_0 = 0 , \tag{5.9.10}$$
to replace the $\psi^{(n)}$ with linear combinations for which the orthonormality condition (5.9.8) is still satisfied, and now $\Delta H_{nm}$ is diagonal:
$$\int\psi^{(n)*}\,\Delta H\,\psi^{(m)} = \begin{cases} E_n & n = m \\ 0 & n \neq m , \end{cases} \tag{5.9.11}$$
for some real $E_n$. We must take the zeroth-order solution to be one of these redefined eigenfunctions, say $\psi^{(m)}$, so that if we multiply Eq. (5.9.5) with the complex conjugate of any linear combination $\sum_{n\neq m}c_n\psi^{(n)}$ of the other degenerate eigenfunctions that is orthogonal to $\psi^{(m)}$, Eq. (5.9.11) implies that Eq. (5.9.10) is also necessarily satisfied, and there is no contradiction. (We will see an example of this procedure in our treatment below of the Zeeman effect.) With the zeroth-order wave function $\psi_0 = \psi^{(m)}$, Eq. (5.9.6) gives $E_1 = E_m$.
We can get a further insight into the necessity of a suitable choice of the
zeroth-order wave function by considering a problem of some importance in its
own right, the calculation of the first-order contribution to the wave function.
Let us introduce a complete orthonormal set of solutions ϕa of the zeroth-order
Schrödinger equation
$$H_0\varphi_a = E_a\varphi_a , \qquad \int\varphi_a^*\varphi_b = \delta_{ab} . \tag{5.9.12}$$
with the complex number α the only component of ψ1 that is still unknown.
We can always take α to be real, because any change in the imaginary part of
Note that if the parameters of the theory are changed so that one of the $E_a$ approaches $E_0$, then the corresponding component of the wave function becomes very large, invalidating perturbation theory, unless in this limit $\int\varphi_a^*\,\Delta H\,\psi_0$ becomes very small. So even approximate degeneracy can be a problem.
In the case of degeneracy Eq. (5.9.13) tells us nothing about the components
of ψ1 along the ϕa with Ea = E0 , and the normalization condition on ψ0 +
ψ1 does not determine these components either. For this, it is necessary to
invoke the condition that the changes of the wave function in higher orders of
perturbation theory are small. We will not pursue this aspect here.
Zeeman Effect
For an example of the use of perturbation theory, let us return to the Zeeman
effect, mentioned at the end of the previous section. Here H0 is the Hamilto-
nian of an alkali metal atom, considering the outermost electron to move in an
effective potential arising from the charges of the nucleus and all other electrons,
with no external fields. To calculate the effect of a weak external magnetic field
B, we consider a first-order perturbation given by Eq. (5.8.10):
$$\Delta H = \frac{e}{2m_e c}\mathbf{B}\cdot[\mathbf{L} + g_e\mathbf{S}] , \tag{5.9.16}$$
where $g_e \simeq 2$ is the gyromagnetic ratio of the electron. The eigenfunctions of $H_0$ may be labeled $\psi_{n\ell jM}$. Here
all have the same energy, so the eigenstates of $H_0$ are all degenerate, except for those with $j = 0$. For a magnetic field in an arbitrary direction the operator $\Delta H$ in general includes terms proportional to $L_z$ and $S_z$, which do commute with $J_z$, but also $L_x$, $L_y$, $S_x$, and $S_y$, which do not commute with $J_z$, so there will be non-vanishing components of $\int\psi^*_{n\ell jM'}\,\Delta H\,\psi_{n\ell jM}$ with $M' \neq M$, and first-order perturbation theory will not work if we take the zeroth-order wave function to be one of the $\psi_{n\ell jM}$.29
The cure is obvious. Take the zeroth-order wave function to be an eigenstate of the component of $\mathbf{J}$ in the direction of $\mathbf{B}$. Or, to save writing, just continue to use the $\psi_{n\ell jM}$ as zeroth-order wave functions but from the beginning choose the coordinate system so that the $z$-axis is in the direction of $\mathbf{B}$. In this case, the first-order shift in the energy is given by Eq. (5.9.6) as
$$E_1(n\ell jM) = \frac{eB}{2m_e c}\int\psi^*_{n\ell jM}\,(L_z + g_e S_z)\,\psi_{n\ell jM} . \tag{5.9.18}$$
It is easiest to evaluate $E_1$ for s-wave states with $\ell = 0$, for which $j = 1/2$ and $M = \pm1/2$. In this case Eq. (5.9.18) gives immediately
$$E_1(n\;0\;1/2\;\pm1/2) = \pm\frac{e g_e B\hbar}{4m_e c} . \tag{5.9.19}$$
To deal with the general case with $\ell \neq 0$, we use a general property of angular momentum multiplets. Let $\varphi_{jM}$ be any multiplet of $2j+1$ wave functions, with $\mathbf{J}^2\varphi_{jM} = \hbar^2 j(j+1)\varphi_{jM}$ and $J_z\varphi_{jM} = \hbar M\varphi_{jM}$, formed as described in Section 5.4 by letting lowering operators $J_x - iJ_y$ act on a state with $M = j$. For any vector operator $\mathbf{V}$, the integrals $\int\varphi^*_{jM'}V_i\varphi_{jM}$ can all be calculated from any one of them by using the commutation relations of the raising and lowering operators $J_x \pm iJ_y$ with the $V_i$ and the effect of these operators on the multiplet $\varphi_{jM}$, none of which depends on the choice of the operator $\mathbf{V}$ or the wave functions $\varphi_{jM}$, so in general the integrals $\int\varphi^*_{jM'}V_i\varphi_{jM}$ can depend only on the specific choice of the operator $\mathbf{V}$ or the wave functions $\varphi_{jM}$ through an overall factor. In particular, we have
$$\int\varphi^*_{jM'}V_i\varphi_{jM} = \alpha_V\int\varphi^*_{jM'}J_i\varphi_{jM} , \tag{5.9.20}$$
29 If it were not for the fine structure produced by spin–orbit coupling there would be an additional degeneracy: the energies for states with the same $n$ and $\ell$ but different $j$ would be equal. The discussion here of the Zeeman effect assumes that the magnetic field is sufficiently weak that the energy shift it produces is small compared with the fine-structure splitting, in which case states with the same $n$ and $\ell$ but different $j$ are not effectively degenerate. But we are ignoring the even smaller hyperfine energy shifts due to the interaction of the electron with the magnetic field of the nucleus.
In hydrogen there is a further degeneracy of states with the same $n$ and $j$ but different $\ell$, such as the $2s_{1/2}$ and $2p_{1/2}$ states, which are separated only by the very small Lamb shift described in Section 5.4. The treatment here applies to hydrogen only when the energy shift due to the interaction of the electron with the external magnetic field is less than the Lamb shift but greater than the hyperfine splitting.
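where the factor $\alpha_V$ will in general depend on the nature of the operator $\mathbf{V}$ and the wave functions $\varphi_{jM}$, but not on the vector index $i$ nor on the angular momentum $z$-components $M$ and $M'$. This is an example of a general quantum-mechanical result known as the Wigner–Eckart theorem.30
In its application to the Zeeman effect, Eq. (5.9.20) gives
$$\int\psi^*_{n\ell jM'}L_i\,\psi_{n\ell jM} = \alpha_L(n\ell j)\int\psi^*_{n\ell jM'}J_i\,\psi_{n\ell jM} ,$$
$$\int\psi^*_{n\ell jM'}S_i\,\psi_{n\ell jM} = \alpha_S(n\ell j)\int\psi^*_{n\ell jM'}J_i\,\psi_{n\ell jM} . \tag{5.9.21}$$
Note that
$$\mathbf{L}\cdot\mathbf{J} = \frac{1}{2}\left[-(\mathbf{J}-\mathbf{L})^2 + \mathbf{J}^2 + \mathbf{L}^2\right] = \frac{1}{2}\left[-\mathbf{S}^2 + \mathbf{J}^2 + \mathbf{L}^2\right]$$
and likewise
$$\mathbf{S}\cdot\mathbf{J} = \frac{1}{2}\left[-\mathbf{L}^2 + \mathbf{J}^2 + \mathbf{S}^2\right] ,$$
so
$$\alpha_L(n\ell j) = \frac{-3/4 + j(j+1) + \ell(\ell+1)}{2j(j+1)} ,$$
$$\alpha_S(n\ell j) = \frac{-\ell(\ell+1) + j(j+1) + 3/4}{2j(j+1)} . \tag{5.9.23}$$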
30 For a statement of this theorem and a detailed proof, see Section 4.1 of Weinberg, Lectures on Quantum
Mechanics, listed in the bibliography.
5.9 Perturbation Theory 205
Using Eqs. (5.9.23) and (5.9.21) in Eq. (5.9.18) then gives the first-order Zeeman energy shift:
$$E_1(n\ell jM) = \frac{eB\hbar M}{4m_e c\,j(j+1)}\Big\{-3/4 + j(j+1) + \ell(\ell+1) + g_e\big[-\ell(\ell+1) + j(j+1) + 3/4\big]\Big\} . \tag{5.9.24}$$
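The dimensionless factor in this result is easy to tabulate exactly. The sketch below uses exact rational arithmetic to evaluate α_L + g_e α_S, so that E₁ = (eBħM/2m_e c)(α_L + g_e α_S); the function names are illustrative, and g_e = 2 is taken as an assumed round value rather than the measured one.

```python
from fractions import Fraction


def alpha_L(l, j):
    """alpha_L of Eq. (5.9.23) for orbital quantum number l and total j (spin 1/2)."""
    jj = j * (j + 1)
    return (Fraction(-3, 4) + jj + l * (l + 1)) / (2 * jj)


def alpha_S(l, j):
    """alpha_S of Eq. (5.9.23)."""
    jj = j * (j + 1)
    return (-l * (l + 1) + jj + Fraction(3, 4)) / (2 * jj)


def zeeman_factor(l, j, g_e=Fraction(2)):
    """alpha_L + g_e alpha_S, the factor multiplying e B hbar M / (2 m_e c)."""
    return alpha_L(l, j) + g_e * alpha_S(l, j)


half = Fraction(1, 2)
for l, j in [(0, half), (1, half), (1, 3 * half), (2, 3 * half), (2, 5 * half)]:
    print(l, j, zeeman_factor(l, j))
```

For ℓ = 0 the factor reduces to g_e, in agreement with Eq. (5.9.19), and α_L + α_S = 1 for every multiplet, as it must since L + S = J.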
When field theorists say that the Lamb shift is due to the emission and re-
absorption of a photon by the electron in hydrogen they mean that this is a
second-order effect, in which the wave functions ϕa in Eq. (5.9.27) represent
states containing an electron and a photon. Since these states form a continuum,
the sum over states involves an integral over the photon momentum, which
introduces infinities into the calculation. This calculation was completed only
in 1949, when it was recognized that the same second-order processes require
a redefinition of the mass and charge of the electron and of the photon and
electron fields, which leads to a cancellation of infinities.31
31 N. M. Kroll and W. E. Lamb, Phys. Rev. 75, 388 (1949); J. B. French and V. F. Weisskopf, Phys. Rev. 75,
1240 (1949).
Our discussion of quantum mechanics in this chapter has so far been based
on wave mechanics, in which physical states are represented by functions of
particle positions and spins. This is too parochial a formalism. Why position,
among all observable physical quantities? Indeed, we have already seen in
Section 5.3 that a physical state can just as well be represented by a wave
function depending on momenta (such as (5.3.20) for a one-particle system) as
by a wave function depending on position.
The study of other physical systems forces us much farther away from wave
mechanics than merely substituting momenta for position as the argument of
wave functions. The state of a field, such as the electromagnetic field, cannot
be described in terms of the positions or the momenta of any fixed number of
particles. It is partly as a preparation for our account of quantum field theory in
Chapter 7 that we need to consider a formulation of quantum mechanics, due
chiefly to Dirac,32 that is general enough to apply to any physical system.
In this general formulation, physical states are represented by state vectors in an infinite-dimensional space, known as Hilbert space. Like ordinary vectors in three dimensions, a linear combination $a_1\Psi_1 + a_2\Psi_2$ of two state vectors $\Psi_1$ and $\Psi_2$ is also a state vector, only here the numerical coefficients $a_1$ and $a_2$ can be complex. Addition here has the same properties as the addition of complex numbers, including associativity and commutativity and the existence of a zero for which $0 + \Psi = \Psi + 0 = \Psi$. Also, as in Euclidean space, for any two state vectors $\Phi$ and $\Psi$ there is a scalar product denoted $(\Phi,\Psi)$, here a complex number, with the properties
$$(\Phi,\Psi) = (\Psi,\Phi)^* , \tag{5.10.1}$$
$$(\Phi, a_1\Psi_1 + a_2\Psi_2) = a_1(\Phi,\Psi_1) + a_2(\Phi,\Psi_2) , \tag{5.10.2}$$
$$(\Psi,\Psi) \geq 0 \tag{5.10.3}$$
and $(\Psi,\Psi) = 0$ if and only if $\Psi = 0$. As we shall see, wave functions are the components of these state vectors in one basis or another, and the integrals (5.3.8) of products of these wave functions, abbreviated as $\int\psi^*\varphi$, are the scalar products $(\Psi,\Phi)$ of the state vectors of which they are the components.
Observable quantities are represented in this formulation by linear operators that act on state vectors rather than on wave functions. Here an operator $A$ being “linear” means that for any state vectors $\Psi_1$ and $\Psi_2$ and complex numbers $a_1$ and $a_2$, we have
$$A(a_1\Psi_1 + a_2\Psi_2) = a_1A\Psi_1 + a_2A\Psi_2 . \tag{5.10.4}$$
32 This approach is the basis of Dirac’s 1930 treatise, The Principles of Quantum Mechanics, listed in the
bibliography.
5.10 Beyond Wave Mechanics 207
Inserting this into Eq. (5.10.8) gives the expectation value of the observable represented by $A$:
$$\langle A\rangle_\Psi = \frac{\sum_n\alpha_n\,|(\Psi_n,\Psi)|^2}{\sum_n|(\Psi_n,\Psi)|^2} . \tag{5.10.12}$$
Since a corresponding result applies for any function of this observable, it follows from Eq. (5.10.12) that the probability of finding a value $\alpha_m$ when we measure the observable represented by $A$ is
$$P_m(\Psi) = \frac{|(\Psi_m,\Psi)|^2}{\sum_n|(\Psi_n,\Psi)|^2} . \tag{5.10.13}$$
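For a finite-dimensional illustration, these two formulas can be evaluated directly. The sketch below assumes a two-component state space, takes the eigenvectors and eigenvalues of the Pauli matrix σ_x as the observable, and uses a deliberately unnormalized state vector; the example data and function names are assumptions made for the illustration.

```python
import math


def inner(phi, psi):
    """Scalar product (Phi, Psi): antilinear in the first argument, linear in the second."""
    return sum(a.conjugate() * b for a, b in zip(phi, psi))


def probabilities(eigenvectors, psi):
    """P_m = |(Psi_m, Psi)|^2 / sum_n |(Psi_n, Psi)|^2, as in Eq. (5.10.13)."""
    weights = [abs(inner(v, psi)) ** 2 for v in eigenvectors]
    total = sum(weights)
    return [w / total for w in weights]


def expectation(eigenvalues, eigenvectors, psi):
    """Expectation value as the probability-weighted sum of eigenvalues, Eq. (5.10.12)."""
    probs = probabilities(eigenvectors, psi)
    return sum(a * p for a, p in zip(eigenvalues, probs))


# Assumed example: eigenvectors of sigma_x with eigenvalues +1 and -1.
s = 1.0 / math.sqrt(2.0)
eigenvectors = [(complex(s), complex(s)), (complex(s), complex(-s))]
eigenvalues = [1.0, -1.0]
state = (complex(3.0), complex(1.0))   # not normalized on purpose

probs = probabilities(eigenvectors, state)
mean = expectation(eigenvalues, eigenvectors, state)
print(probs, mean)
```

Dividing by the sum of the weights is what allows the state vector to be left unnormalized; the probabilities still add up to one.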
so with this normalization the scalar product of these eigenvectors is the Dirac delta function discussed in Section 5.6,
$$(\Upsilon_{\alpha'},\Upsilon_\alpha) = \delta(\alpha' - \alpha) . \tag{5.10.19}$$
Of course, if we also normalize the state vector $\Psi$ so that
$$\int|(\Upsilon_\alpha,\Psi)|^2\,d\alpha = 1$$
Atoms were at the center of physicists’ interest in the 1920s. It was largely
from the effort to understand atomic properties that modern quantum mechan-
ics emerged in this decade. In this work physicists did not have to concern
themselves much with the nature of the atomic nucleus. It had been known
since Rutherford’s interpretation in 1911 of the scattering experiments in his
laboratory that almost all the mass of atoms is contained in a tiny positively
charged nucleus, but all that the atomic physicist needed to know about this
nucleus was its electric charge, mass, and (to account for hyperfine splitting) its
spin and magnetic moment.
In the 1930s physicists’ concerns expanded to include the nature of atomic
nuclei. The constituents of the nucleus were identified, and a start was made in
learning what held them together. And, as everyone knows, world history was
changed in subsequent decades by the military application of nuclear physics.
1 E. Rutherford, Phil. Mag. Series 6 37, 381 (1919); reproduced in Beyer, Foundations of Nuclear Physics,
listed in the bibliography.
6.1 Protons and Neutrons 211
It was clear from the beginning that protons could not be the only constituents
of atomic nuclei. This would have been close to a realization of a hypothesis in
1815 of the chemist William Prout (1785–1860). Observing that known atomic
weights were generally close to whole number multiples of the atomic weight
of hydrogen, Prout proposed that all atoms are composites of hydrogen atoms.
Applying Prout’s hypothesis to nuclei rather than to atoms would have done
well in accounting for nuclear masses (which provide almost all of the masses
of atoms). It would even work when applied to isotopes, sets of atoms that have
an equal number of electrons and hence display the same chemical behavior
but differ in their atomic weights. Measurements at the Cavendish Laboratory
by Francis William Aston (1877–1945) had shown by 1919 that the atomic
weights of various isotopes of hydrogen, carbon, oxygen, chlorine, etc. were
all close to whole number multiples of the atomic weight of the lightest isotope
of hydrogen. But to suppose that nuclei are made up only of protons would have
entirely failed in dealing with nuclear electric charges. If nuclei were composed
only of protons their atomic weights in units of the atomic weight of hydrogen
would all be close to their atomic numbers, which as we saw in Section 3.4
were by 1919 already known to equal their electric charges in units of the proton
charge. But light nuclei such as helium, carbon, nitrogen, oxygen, etc. typically
have atomic weights close to twice their atomic numbers.
atom, then why do the electrons in ordinary atoms including hydrogen atoms
not all fall into these states, emitting the released energy as radiation?
There was an even stronger argument coming from molecular physics against
supposing nuclei to consist only of protons and electrons. As we saw in
Section 5.5, we can tell whether the identical nuclei in a diatomic molecule
are bosons or fermions from the ratio of intensities of transitions in the para
and ortho states, which have orbital angular momentum respectively even and
odd. At temperatures T for which the energies of these transitions are much less
than kT , the total intensity of the para lines is greater than for the ortho lines by
a factor (s1 + 1)/s1 if the spin s1 of each nucleus is an integer and the nuclei are
bosons, while the total intensity of the para lines is less than for the ortho lines
by a factor s1 /(s1 + 1) if the spin s1 of each nucleus is a half odd integer and the
nuclei are fermions. In 1929 Walter Heitler (1904–1981) and Gerhard Herzberg
(1904–1999) observed that the total intensity of the para lines in the diatomic
nitrogen molecule is greater than the intensity of the ortho lines, indicating that
the nucleus of the most common nitrogen isotope, $^{14}$N, is a boson.3 (In fact, we now know that it has spin 1.) But if nuclei consist of protons and electrons, then the $^{14}$N nucleus would consist of 14 protons to give atomic weight 14, and seven electrons, to give atomic number 14 − 7 = 7, adding up to 14 + 7 = 21 fermions, and the $^{14}$N nucleus would be a fermion.
Chadwick did not know the initial velocity vB , but he could eliminate it by
taking the ratio of recoil velocities for different target nuclei of known atomic
weights, and from this ratio he could calculate the atomic weight An of the
particle comprising the neutral ray. For instance, measurements showed that
the same neutral ray from beryllium that causes hydrogen nuclei to recoil straight back with speed $3.3\times10^7$ m/sec would cause nitrogen nuclei to recoil straight back with speed $4.7\times10^6$ m/sec, so
$$\frac{3.3\times10^7}{4.7\times10^6} = \frac{A_n/(1+A_n)}{A_n/(14+A_n)} = \frac{14+A_n}{1+A_n}$$
from which it follows that $A_n \simeq 1.16$. Chadwick concluded that these neutral rays consist of particles he called neutrons, with mass close to that of hydrogen.
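The arithmetic behind this estimate is short enough to spell out. The sketch below solves the ratio of recoil speeds for the atomic weight of the neutral particle, using the two speeds quoted above and taking the atomic weights of hydrogen and nitrogen as 1 and 14; the function name is an illustrative choice.

```python
def neutral_particle_weight(v_light, v_heavy, A_light=1.0, A_heavy=14.0):
    """Solve v_light / v_heavy = (A_heavy + A_n) / (A_light + A_n) for A_n."""
    ratio = v_light / v_heavy
    return (A_heavy - ratio * A_light) / (ratio - 1.0)


# Recoil speeds quoted in the text, in m/sec.
A_n = neutral_particle_weight(3.3e7, 4.7e6)
print(round(A_n, 2))
```

The unknown initial velocity drops out because both recoil speeds are proportional to it, which is why only the ratio of the two measurements is needed.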
Chadwick assumed that this was the neutron that Rutherford had anticipated
in his 1920 Bakerian lecture, and he followed Rutherford in supposing that the
neutron is a proton–electron bound state. He knew about the problem that study
of the diatomic nitrogen molecule indicated that the $^{14}$N nucleus is a boson, which is not possible if it consists of 14 protons and seven electrons (whether or not combined into nuclei of $^4$He or $^3$He or proton–electron composites), but at
first he decided to ignore the problem. This may have been due to a widespread
reluctance at the time to contemplate any new fundamental particles besides
the proton, electron, and photon, or perhaps it was just the influence of the
formidable Lord Rutherford. The status of the neutron as a fermion that is every
bit as elementary as the proton only became clear with studies of the forces
between these particles, to be discussed in the next section. As a result of these
studies, neutrons and protons became regarded as two members of a family of
particles known as nucleons.
These radii can be calculated from measurements of the effect of the nuclear
electric quadrupole moment on atomic spectra; from measurements of the scat-
tering of electrons in the Coulomb field of the nucleus; and from the measured
rates of alpha decays, to be discussed in Section 6.4. A consensus of these
measurements gives a nuclear radius
$$R \simeq 1.3\times10^{-13}\ \text{cm}\times A^{1/3} . \tag{6.1.1}$$
The binding energy of a nucleus is the energy required to take all of its nucleons to rest at a great distance. It can easily be calculated from measurements of atomic weights: it is the sum of the atomic weights of all the nucleons in the nucleus minus the atomic weight of the nucleus, times the mass energy $m_1c^2 = 931.494$ MeV of unit atomic weight.
Surface Tension
With a nuclear radius proportional to $A^{1/3}$ the surface area of the nucleus is proportional to $A^{2/3}$, so a fraction proportional to $A^{-1/3}$ of the $A$ nucleons is closer to the surface than the range of the nuclear force and therefore feels less attraction to other nucleons. This decreases the nuclear binding energy per nucleon by a term proportional to $A^{-1/3}$, estimated from measured atomic weights as $-18.3\,A^{-1/3}$ MeV.
Coulomb Repulsion
The electrostatic repulsion of $Z$ protons introduces a negative term in the total binding energy proportional to $Z^2$ and to the inverse nuclear radius, which is proportional to $A^{-1/3}$. The Coulomb contribution to the binding energy per nucleon is therefore proportional to $Z^2A^{-4/3}$. It is approximately $-0.71\,Z^2A^{-4/3}$ MeV. (The energy coefficient here is smaller than for the other terms in the binding energy because electric forces are intrinsically weaker than nuclear forces. For instance, the Coulomb energy of a uniformly charged sphere with charge $Ze$ and radius (6.1.1) is $3Z^2e^2/5R = 0.66\,Z^2A^{-1/3}$ MeV.)
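The quoted coefficient for the uniformly charged sphere can be reproduced in a couple of lines. The sketch assumes the familiar value e² ≈ 1.44 MeV fm in Gaussian units (a standard number, not given in the text) and the radius coefficient 1.3 fm from Eq. (6.1.1); the function name is illustrative.

```python
E2_MEV_FM = 1.44   # e^2 in Gaussian units, approximately alpha * hbar * c (assumed value)
R0_FM = 1.3        # radius coefficient from Eq. (6.1.1): 1.3e-13 cm = 1.3 fm


def uniform_sphere_coulomb_energy(Z, A):
    """Coulomb energy 3 Z^2 e^2 / (5 R) in MeV, with R = R0 * A^(1/3)."""
    radius_fm = R0_FM * A ** (1.0 / 3.0)
    return 3.0 * Z ** 2 * E2_MEV_FM / (5.0 * radius_fm)


# Coefficient multiplying Z^2 A^(-1/3): evaluate at Z = 1, A = 1.
coefficient = uniform_sphere_coulomb_energy(1, 1)
print(round(coefficient, 2))
```

Dividing the total by A gives a per-nucleon term proportional to Z²A^(−4/3), with a coefficient close to the fitted value of 0.71 MeV.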
6 The numerical values of coefficients of various terms in the nuclear binding energy are rounded off here
from values derived from a fit to measured binding energies by A. H. Wapstra and N. B. Gove, Nuclear
Data Tables 9, 267 (1971).
Neutron–Proton Inequality
The Pauli exclusion principle leads to a decrease in the binding energy for nuclei
with unequal numbers of protons and neutrons. Given a nucleus with equal
numbers of protons and neutrons, if we imagine a proton changed into a neutron
the new neutron would be forced by the exclusion principle to occupy a state
of energy higher than any of the originally occupied neutron states. Because
of the symmetry between protons and neutrons (discussed in the next section),
with equal numbers of protons and neutrons the highest energy of the originally
occupied neutron states equals the highest energy of the originally occupied
proton states, so changing this proton into a neutron necessarily increases its
energy. The same is true if we change a neutron into a proton. This decrease in
the total binding energy is approximately proportional to $(N - Z)^2/A$, where
N = A − Z is the number of neutrons. It is taken as proportional to 1/A to take
account of the decrease in the spacing of nuclear energy levels with increasing
A. Observed binding energies indicate a term in the binding energy per nucleon
of $-23.2\ \text{MeV} \times (A - 2Z)^2/A^2$.
Putting this together, the binding energy per nucleon is approximately
$$\text{binding energy}/A \simeq 15.8 - 18.3\,A^{-1/3} - 0.71\,Z^2 A^{-4/3} - 23.2\,(A - 2Z)^2 A^{-2}\ \text{MeV}. \tag{6.1.2}$$
There are also sporadic bumps in the binding energy. Nuclei with even or odd
numbers both of protons and of neutrons have an additional term in the binding
energy that is about $12/\sqrt{A}$ MeV or $-12/\sqrt{A}$ MeV, respectively. Also, the
binding energy is increased for certain “magic” numbers of protons or neutrons,
to be discussed in Section 6.3.
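The smooth part of Eq. (6.1.2) is easy to evaluate numerically. The sketch below is one possible transcription into Python; the function names are hypothetical, and the pairing and magic-number corrections just mentioned are deliberately left out, as is the neutron–proton mass difference, so the brute-force scan over Z need not land exactly on the observed most stable isotope.

```python
def binding_energy_per_nucleon(A, Z):
    """Smooth part of Eq. (6.1.2) in MeV; pairing and shell terms are omitted."""
    volume = 15.8
    surface = -18.3 * A ** (-1.0 / 3.0)
    coulomb = -0.71 * Z**2 * A ** (-4.0 / 3.0)
    asymmetry = -23.2 * (A - 2 * Z) ** 2 / A**2
    return volume + surface + coulomb + asymmetry


def most_bound_Z(A):
    """Integer Z that maximizes the formula at fixed A (simple scan over 1..A-1)."""
    return max(range(1, A), key=lambda Z: binding_energy_per_nucleon(A, Z))


for A, Z, name in [(56, 26, "iron 56"), (238, 92, "uranium 238")]:
    b = binding_energy_per_nucleon(A, Z)
    print(f"{name}: {b:.2f} MeV per nucleon; scan at A={A} prefers Z={most_bound_Z(A)}")
```

The Coulomb term favors fewer protons while the neutron–proton inequality term favors Z = A/2; the scan simply finds where the two balance for a given A.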
among the protons becomes more and more important, and the nuclei with
the lowest ground state energy tend to have an increasing ratio of neutrons to
protons. For instance, for A = 56 the nucleus with the lowest ground state
energy is $^{56}$Fe, with 26 protons and 30 neutrons. The atomic numbers of the
stable valley fall increasingly below the line Z = A/2 for larger values of A, to
a value Z = 92 for A = 238.
In the stable valley, Eqs. (6.1.2) and (6.1.3) give a binding energy per nucleon
that increases with increasing A for lighter nuclei, owing to the decreasing effect
of surface tension, reaches a maximum of about 9 MeV for iron and nickel, and
then, because of the Coulomb term, decreases slowly for larger A, taking a
value of about 7.5 MeV for $^{238}$U. The decrease with A of the binding energy
per nucleon for heavy nuclei makes it energetically favorable for these nuclei to
decay by splitting into fragments, either by spontaneous fission into two nuclei
of much lower A, or more often by emitting an alpha particle. After emitting
one or a few alpha particles a nucleus becomes excessively neutron-rich for the
new, lower, value of A, and it becomes energetically favorable for the nucleus to
lower the neutron–proton ratio by one or more beta decays, moving back toward
the stable valley. These alpha and beta decay processes sometimes yield nuclei
in excited states, which then undergo gamma decay to the ground state, emitting
an energetic photon. A succession of alpha, beta, and gamma decays continues
until the nucleus transforms into a non-radioactive nucleus, such as one of the
stable isotopes of lead.
For instance, in the decay chain that is most important in the history of
physics, uranium 238 alpha-decays to thorium 234 with a half life of $4.47 \times 10^9$
years, and then, with much shorter half lives, thorium 234 beta-decays to
protactinium 234, which beta-decays to uranium 234, which alpha-decays
to thorium 230, which alpha-decays to radium 226, which alpha-decays to
radon 222 (an example of alpha decay considered in detail in Section 6.4),
which alpha-decays to polonium 218, which alpha-decays to lead 214, which
beta-decays to bismuth 214, which beta-decays to polonium 214, which alpha-
decays to lead 210, which beta-decays to bismuth 210, which beta-decays to
polonium 210, which alpha-decays to the stable isotope lead 206, which makes
up 24% of natural lead.
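The bookkeeping in this chain can be checked mechanically: an alpha decay lowers (A, Z) by (4, 2), and a beta decay of the kind occurring here raises Z by one at fixed A. The following Python sketch encodes the sequence of decays listed above as a string; the encoding and the helper name are illustrative choices, not anything taken from the text.

```python
# Element symbols for the atomic numbers that occur in the uranium 238 chain.
SYMBOL = {82: "Pb", 83: "Bi", 84: "Po", 86: "Rn", 88: "Ra", 90: "Th", 91: "Pa", 92: "U"}

# Decay sequence read off from the paragraph above: "a" = alpha, "b" = beta.
CHAIN = "abbaaaaabbabba"


def follow_chain(A, Z, decays):
    """Return the list of (A, Z) pairs visited, starting from the parent nucleus."""
    nuclei = [(A, Z)]
    for decay in decays:
        if decay == "a":
            A, Z = A - 4, Z - 2   # emits a helium 4 nucleus
        elif decay == "b":
            Z += 1                # a neutron turns into a proton
        else:
            raise ValueError(f"unknown decay type {decay!r}")
        nuclei.append((A, Z))
    return nuclei


chain = follow_chain(238, 92, CHAIN)
labels = [f"{SYMBOL[Z]}-{A}" for A, Z in chain]
print(" -> ".join(labels))
```

Eight alpha decays and six beta decays take A from 238 to 206 and Z from 92 to 82, which is lead 206.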
There is a deep symmetry between protons and neutrons, which made it evident
that neutrons are fermions and just as elementary as protons. Knowledge of this
symmetry emerged in the late 1930s from a study of the forces among protons
and neutrons.
6.2 Isotopic Spin Symmetry 217
Nuclear Forces
The first of the nuclear forces to be studied was that between a proton and
a neutron, which could be measured by observing the scattering of neutrons
on the protons in a hydrogen-rich substance such as paraffin. As in all
scattering processes, the scattering amplitude $f(\hat{x})$ introduced in Section 5.6 may
be expanded as a sum over terms with angular dependence proportional to the
spherical harmonic functions $Y_\ell^m(\hat{x})$ defined in Section 5.2. The terms with
$\ell > 0$ are suppressed at low energy by a centrifugal barrier, which makes
the wave function vanish for vanishing separation r as $r^\ell$, so at the energies
available in the 1930s the scattering was dominated by the term with $\ell = 0$, for
which the scattering amplitude f is independent of direction. But it is important
here to keep track of the dependence of the scattering amplitude on spin, which
we ignored in Section 5.6. With the neutron taken like the proton to have spin
1/2, there are now two terms in the amplitude for neutron–proton scattering,
with total spin s = 0 or s = 1. In the absence of orbital angular momentum
the total spin is conserved in the scattering process, so the total scattering cross
section takes the form $\sigma_0 + \sigma_1$, where $\sigma_s$ is the cross section in the $\ell = 0$
proton–neutron state with total spin s. It is possible to separate the contributions
of spin zero and spin one by using data on the deuteron, a proton–neutron bound
state with $\ell = 0$ (and a small admixture of $\ell = 2$) and with total angular
momentum j = 1 and hence total spin s = 1. There is a classic relation⁷ that to
a good approximation gives $\sigma_1 = 2\pi\hbar^2/\mu B$, where μ is the reduced mass of a
proton and a neutron and B is the deuteron binding energy, so using scattering
data and the deuteron binding energy one can separately find $\sigma_0$ and $\sigma_1$.
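To get a feeling for the size of $\sigma_1$, the relation $\sigma_1 = 2\pi\hbar^2/\mu B$ can be evaluated numerically. The Python sketch below does this; the nucleon rest energies, the deuteron binding energy B = 2.2246 MeV, and $\hbar c$ = 197.327 MeV fm are standard values supplied here as assumptions, not numbers given in this passage.

```python
import math

# Assumed inputs (standard values, not quoted in the text).
HBAR_C = 197.327        # hbar * c in MeV fm
M_P = 938.272           # proton rest energy in MeV
M_N = 939.565           # neutron rest energy in MeV
B_DEUTERON = 2.2246     # deuteron binding energy in MeV


def triplet_cross_section_fm2():
    """sigma_1 = 2 pi hbar^2 / (mu B), returned in fm^2."""
    mu = M_P * M_N / (M_P + M_N)          # reduced mass times c^2, in MeV
    return 2.0 * math.pi * HBAR_C**2 / (mu * B_DEUTERON)


sigma1_barn = triplet_cross_section_fm2() / 100.0   # 1 barn = 100 fm^2
print(f"sigma_1 is roughly {sigma1_barn:.2f} barn")
```

The result is a few barns, which sets the scale against which the spin-zero contribution is extracted from the measured low-energy cross section.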
This is important because protons and neutrons are fermions, so the $\ell = 0$
state of two protons or two neutrons must be antisymmetric in the particles’
spin 3-components. As can be seen from either Eq. (5.4.42) or Table 5.1, this
requires the state to have total spin zero. It is therefore of interest to compare the
value of σ0 deduced for proton–neutron scattering for s = 0 with the observed
total low-energy proton–proton scattering cross section.
Unfortunately there is no way to make a target out of the electrically neutral
(and, as we shall see, unstable) neutron, so it was not possible to make a direct
measurement of neutron–neutron scattering. There is no similar obstacle to the
measurement of proton–proton scattering for, as in Rutherford’s 1919
experiments, one can make a target of hydrogen gas or a proton-rich substance like
paraffin. Here the problem is that at low energy the scattering is almost entirely
due to the Coulomb potential, and reveals nothing about the nuclear forces.
7 For a textbook derivation, see Section 8.8 of Weinberg, Lectures on Quantum Mechanics, listed in the
bibliography.
218 6 Nuclear Physics
$[T_a, T_b] = i\epsilon_{abc}T_c$. (The a, b, c indices can be taken like i, j, k to run over the
values 1, 2, 3, but of course for the isotopic spin these values have nothing to
do with directions in ordinary space. Repeated indices are again summed, and
$\epsilon_{abc}$ like $\epsilon_{ijk}$ is a totally antisymmetric quantity with $\epsilon_{123} = 1$.) The proton and
neutron are taken as the states with $T_3 = +1/2$ and $T_3 = -1/2$, respectively.
Just as two particles with ordinary spin 1/2 can combine to form a compound
state with total spin s equal to 0 or 1, two nucleons can combine to form a
compound state with total isotopic spin 0 or 1, which transforms under isotopic spin
rotations in the same way that states with ordinary total spin 0 or 1 transform
under ordinary rotations. The invariance of nuclear forces under isotopic spin
rotations tells us that total isotopic spin is conserved, so the cross section for the
scattering of two nucleons is the sum of a cross section for isotopic spin 1 and
a cross section for isotopic spin zero. The states with total isotopic spin 1 form
a triplet, just like orbital angular momentum states with $\ell = 1$, whose
components are a proton+proton state with $T_3 = +1$, a proton+neutron state with
$T_3 = 0$, and a neutron+neutron state with $T_3 = -1$. The proton+proton and
neutron+neutron $\ell = 0$ states must be antisymmetric in the nucleon spin
3-components, and therefore have total ordinary spin 0. Since spin and isotopic
spin commute, the proton+neutron component of this triplet must then also
have spin zero. Since these three s-wave nucleon–nucleon states with total
ordinary spin 0 form a triplet, the scattering cross sections are the same for each.
On the other hand, an s-wave state of two nucleons with ordinary spin 1
is symmetric in the spin 3-components, so it cannot be a proton+proton or
neutron+neutron state, and can therefore only be a proton+neutron T3 = 0 state
of a singlet with total isotopic spin zero. This is the deuteron, with total angular
momentum and total ordinary spin both equal to one.
Multiplets
The implications of isotopic spin symmetry go far beyond the equality of s-
wave nucleon–nucleon cross sections for total spin zero. Before we go into this,
it is necessary to say something about the relation of isotopic spin quantum
numbers and electric charge. For the proton–neutron doublet, it is obvious that
the electric charge of a nucleon is
Q = e[T3 + 1/2] (6.2.1)
so that protons and neutrons will have charges respectively e and 0. In a nucleus
with B nucleons, the charge is the sum of (6.2.1) for all the nucleons, so
Q = e[T3 + B/2] , (6.2.2)
where now T is the isotopic spin operator of the whole nucleus and B is the
number of nucleons. As we have seen, B is very close to the atomic weight A
of the element, but we use the symbol B instead of A because they are not
precisely equal, and in order that Eq. (6.2.2) should apply for some of the
particles discovered after World War II that are not composed of protons and
neutrons. In this more general context, B is known as the baryon number. (For
some unstable particles a quantity S known as strangeness that is conserved in
strong and electromagnetic interactions must be added to B in Eq. (6.2.2).)
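Equation (6.2.2) is simple enough to encode directly. The Python sketch below is a minimal illustration using exact fractions; the function names are hypothetical, and strangeness is ignored, so it applies only to the nucleons and nuclei discussed here.

```python
from fractions import Fraction


def charge_in_units_of_e(T3, B):
    """Eq. (6.2.2): Q/e = T3 + B/2, for third isotopic spin component T3 and baryon number B."""
    return Fraction(T3) + Fraction(B, 2)


def t3_from_charge(Q, B):
    """Eq. (6.2.2) inverted: T3 = Q/e - B/2."""
    return Fraction(Q) - Fraction(B, 2)


half = Fraction(1, 2)
print("proton  :", charge_in_units_of_e(+half, 1))   # single nucleon, T3 = +1/2
print("neutron :", charge_in_units_of_e(-half, 1))   # single nucleon, T3 = -1/2
print("deuteron:", charge_in_units_of_e(0, 2))       # proton + neutron, T3 = 0
```

Using `Fraction` rather than floating point keeps the half-integer values of T3 exact.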
Of course, electromagnetism does not respect isotopic spin symmetry:
protons are charged while neutrons are not. Equation (6.2.2) shows that in
electromagnetic phenomena involving the charge operator the 3-component of the
isotopic spin operator plays a different role from the 1- and 2-components.
There is also a nucleon mass difference, $m_n - m_p = 1.293$ MeV/$c^2$, which
contributes a term in the total rest mass proportional to T3 . For relatively light
nuclei, with atomic numbers less than about 20 to 30, Coulomb forces are
less important than nuclear forces and isotopic spin symmetry is fairly well
respected, but this is not true for heavy nuclei, where the Coulomb repulsion
of protons in the nucleus comes close to tearing the nucleus apart. It makes no
sense to talk about isotopic spin symmetry when we are dealing with uranium.
Relatively light nuclei must form isotopic spin multiplets. We characterize
any multiplet by a total isotopic spin quantum number t, defined so that (just as
for ordinary spin multiplets) the multiplet consists of 2t + 1 nuclei with T3
equal to t, t − 1, . . . , −t, all with the same ordinary spin (that is, total angular
momentum) and with close to the same energy. Acting on the multiplet, the
isotopic spin operator T satisfies $\mathbf{T}^2 = t(t + 1)$. The proton and neutron form a
t = 1/2 doublet, and the deuteron is a t = 0 singlet. There are many t = 1/2
doublets of complex nuclei; the lightest consists of the light isotope $^3$He of
helium, whose discovery was announced by Rutherford in his 1920 Bakerian
lecture, and tritium, the radioactive isotope $^3$H of hydrogen discovered at the
Cavendish Laboratory¹⁰ in 1934. The $^3$He nucleus consists of two protons and
one neutron and has atomic weight 3.01603, while the $^3$H nucleus is composed
of one proton and two neutrons and has atomic weight 3.01605. Both nuclei
have spin 1/2.
There are also triplets of nuclear states with t = 1, which show again that
this is a symmetry under transformations that go beyond the mere interchange
of protons and neutrons. A famous example includes the ground states of the
nuclei of $^{12}$B and $^{12}$N, which have B = 12 and charges 5e and 7e, and hence
according to Eq. (6.2.2) have $T_3 = -1$ and $T_3 = +1$. The $T_3 = 0$ member
of the triplet would then be ordinary carbon, $^{12}$C, with nuclear charge 6e. But
it is not the ground state of $^{12}$C, which has total angular momentum j = 0,
while the ground states of $^{12}$B and $^{12}$N both have j = 1. Also, although the $^{12}$B
and $^{12}$N ground states have nearly equal atomic weights, 12.0144 and 12.0186,
respectively, the $^{12}$C ground state by definition has atomic weight 12.0000. (The
greater binding energy of $^{12}$C is due to two effects mentioned in the previous
10 M. Oliphant, E. Harteck, and E. Rutherford, Nature 133, 413 (1934); Proc. Roy. Soc. A 144, 692 (1934).
section: the numbers of protons and neutrons in $^{12}$C are equal, and both numbers
are even.) The small difference in atomic weights of $^{12}$B and $^{12}$N is due to the
greater Coulomb repulsion among the seven protons of $^{12}$N than among the five
protons of $^{12}$B, but this cannot account for the large difference from the atomic
weight of the ground state of carbon. In order to provide the $T_3 = 0$ member
of a triplet with $^{12}$B and $^{12}$N, there would have to be a spin 1 state of $^{12}$C
with an excitation energy well above the ground state. Since the number of
protons in $^{12}$C is the average of the numbers in $^{12}$B and $^{12}$N, we would expect
its excitation energy to be about $0.0165\,m_1c^2$ (the average of $0.0144\,m_1c^2$ and
$0.0186\,m_1c^2$), or, taking $m_1c^2 = 931.5$ MeV, about 15.3 MeV. In fact there is
such a state, a spin 1 state of $^{12}$C that is 15.11 MeV above the $^{12}$C ground state,
which decays into the ground state by emission of a photon. This is the $T_3 = 0$
member of the triplet.
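The arithmetic behind this estimate can be laid out in a few lines. The Python sketch below uses only the atomic weights and the value $m_1c^2 = 931.5$ MeV quoted above; the variable and function names are illustrative, and the averaging is the crude estimate described in the text, not a calculation of the Coulomb energy.

```python
from fractions import Fraction

M1_C2_MEV = 931.5   # rest energy of one atomic mass unit, as quoted in the text

# Atomic weights of the ground states, as quoted in the text.
ATOMIC_WEIGHT = {"12B": 12.0144, "12N": 12.0186, "12C": 12.0000}
NUCLEAR_CHARGE = {"12B": 5, "12C": 6, "12N": 7}


def t3_from_charge(charge_in_e, baryon_number):
    """Eq. (6.2.2) inverted: T3 = Q/e - B/2."""
    return Fraction(charge_in_e) - Fraction(baryon_number, 2)


t3 = {name: t3_from_charge(q, 12) for name, q in NUCLEAR_CHARGE.items()}

# Average the mass excesses of 12B and 12N over the 12C ground state.
mean_excess = 0.5 * ((ATOMIC_WEIGHT["12B"] - 12.0) + (ATOMIC_WEIGHT["12N"] - 12.0))
estimated_excitation_mev = mean_excess * M1_C2_MEV

print({name: str(value) for name, value in t3.items()})
print(f"estimated excitation energy: {estimated_excitation_mev:.1f} MeV (observed: 15.11 MeV)")
```

The estimate lands within a few percent of the observed 15.11 MeV, which is as good as could be expected from so simple an average.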
11 Quantum chromodynamics is part of our present theory of elementary particles and their interactions,
the Standard Model. Formulating and testing this model has been the work of many physicists. For an
informal history see Weinberg, “Half a Century of the Standard Model,” listed in the bibliography. A more
detailed account with references to much of this work can be found in Weinberg, The Quantum Theory of
Fields, Vol. II: Modern Applications (Cambridge University Press, Cambridge, UK, 1996).
up quark mass. The reason for the isotopic spin symmetry of strong forces
is just that there is no room in the theory for any violation of the symmetry
other than the quark masses, and the quark masses although unequal are very
small. Almost all of the mass of the proton and neutron comes from the strong
nuclear forces acting among the quarks within a single proton or neutron, not
from the quark masses.
The small mass difference between the proton and the neutron comes both
from differences in the quark masses and from electromagnetic forces among
the quarks, but the quark mass difference is somewhat more important. This is why
the neutron is heavier than the proton, even though the electric charges of the
quarks in the proton are larger than those in the neutron. It is both the smallness
of the quark masses and the relative weakness of electromagnetic effects that
make the neutron–proton mass difference, 1.293 MeV/$c^2$, so tiny compared
with the proton mass, 938 MeV/$c^2$.
Pions
Isotopic spin symmetry had important implications for the new strongly
interacting particles discovered after World War II. The first of these particles was
the pi meson, or pion as it is frequently called. In 1947 a group at the University
of Bristol,¹² studying photographic plates that had been exposed to cosmic
rays at high altitudes in the Pyrenees and Andes, found evidence of a strongly
interacting particle with a mass intermediate (hence the name “meson”) between
the electron and the nucleon. It is today known that these charged pions come
with charges +e and −e, both with masses 139.570 MeV/$c^2$. These particles
are produced singly in reactions such as $p + p \to p + n + \pi^+$, and so if
baryon number is conserved these particles must be supposed to have B = 0.
Equation (6.2.2) then indicates that the $\pi^+$ and $\pi^-$ have $T_3 = +1$ and $T_3 = -1$,
respectively. No doubly charged particles with similar mass have ever been
found, so the pions cannot be part of an isotopic spin multiplet with t ≥ 2,
and therefore must be part of a triplet, with t = 1. The neutral $T_3 = 0$ member
of the triplet, the $\pi^0$, was discovered at the Berkeley cyclotron in 1950, the
first particle to be found at an accelerator before being discovered in cosmic
rays. The mass of the $\pi^0$ is now known to be 134.977 MeV/$c^2$.
In quantum chromodynamics, the $\pi^+$ and $\pi^-$ are respectively $u + \bar{d}$ and
$d + \bar{u}$, where u and d stand for up and down quarks, and the bar denotes
antiquarks. The $\pi^0$ is a 50–50 superposition of $u + \bar{u}$ and $d + \bar{d}$. The quark
masses contribute equally to all three pions, so the 4.6 MeV/$c^2$ mass
difference between charged and neutral pions is entirely due to electromagnetic
forces. In fact, this is the one mass difference in an isotopic spin multiplet
There are no clearly identified multiplets of nuclear states larger than triplets,
but there is a conspicuous quartet of unstable particles that decay into a nucleon
and a pion, with masses all close to 1210 MeV/$c^2$. This is the “three–three
resonance” Δ, where “three–three” means that it has t = 3/2 and j = 3/2,
and “resonance” indicates that these are seen as sharp peaks in pion–nucleon
scattering, interpreted as the formation of an unstable intermediate state that
decays back into a nucleon and a pion. As discussed at the end of the appendix
to Section 5.6, the total decay rate of each of these four states is measured as
the width of the peak of the cross section as a function of energy, divided by h̄;
the rate of decay into any particular pion–nucleon state equals the total decay
rate times the branching ratio, the fraction of scattering events at the resonant
energy that produce that pion–nucleon state.
Since the formation and decay of the Δ both indicate that it has the same
baryon number B = 1 as the nucleon, Eq. (6.2.2) indicates that the four states of
the quartet with charges 2e, e, 0, and −e have $T_3 = 3/2$, $T_3 = 1/2$, $T_3 = -1/2$,
and $T_3 = -3/2$. Like the proton and neutron the Δ states are interpreted as
composites of three quarks: respectively uuu, uud, udd, and ddd.
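The quark assignments can be checked against Eq. (6.2.2) by adding up quark quantum numbers. The Python sketch below assumes the standard values, which are not stated in this passage: charge +2/3 and $T_3 = +1/2$ for the up quark, charge −1/3 and $T_3 = -1/2$ for the down quark, and baryon number 1/3 for each. The function name is hypothetical.

```python
from fractions import Fraction

# Assumed (standard) quark quantum numbers, in units of e where applicable.
QUARK_CHARGE = {"u": Fraction(2, 3), "d": Fraction(-1, 3)}
QUARK_T3 = {"u": Fraction(1, 2), "d": Fraction(-1, 2)}
QUARK_BARYON_NUMBER = Fraction(1, 3)


def quantum_numbers(quarks):
    """Return (Q/e, T3, B) for a composite given as a string such as 'uud'."""
    charge = sum(QUARK_CHARGE[q] for q in quarks)
    t3 = sum(QUARK_T3[q] for q in quarks)
    baryon_number = QUARK_BARYON_NUMBER * len(quarks)
    return charge, t3, baryon_number


for content in ["uuu", "uud", "udd", "ddd"]:
    charge, t3, baryon_number = quantum_numbers(content)
    # Eq. (6.2.2): Q/e = T3 + B/2 should hold for every member of the quartet.
    assert charge == t3 + baryon_number / 2
    print(f"{content}: Q/e = {charge}, T3 = {t3}, B = {baryon_number}")
```

The same function applied to `"uud"` and `"udd"` also reproduces the charges of the proton and neutron, which share quark content with the two middle members of the quartet.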
The three–three resonance provides a good example of the power of
symmetry principles such as isotopic spin symmetry to do more than dictate how
energy eigenstates are grouped into multiplets. The conservation of isotopic
spin tells us that the nucleon and pion produced when a Δ decays must be in a
state of total isotopic spin 3/2 rather than a mixture of isotopic spins 3/2 and
1/2. For a three–three resonance with a given value of $T_3$, the nucleon–pion
state has wave function
$$\sum_{\pm} C_{1,1/2}(3/2, T_3;\, T_3 \mp 1/2, \pm 1/2)\,\psi^{\pi N}_{T_3 \mp 1/2,\,\pm 1/2},$$
where $\psi^{\pi N}_{T_3 \mp 1/2,\,\pm 1/2}$ is the wave function for a pion and a nucleon with their
third components of isotopic spin equal respectively to $T_3 \mp 1/2$ and ±1/2, and
$$V(\mathbf{x}) \simeq V_0 + \frac{1}{2} m_N \omega^2 \mathbf{x}^2, \tag{6.3.1}$$
where $m_N$ can be taken as the mean nucleon mass, and ω is a constant with the
dimensions of frequency. The total Hamiltonian is then a sum of one-nucleon
Hamiltonians, each of the form
$$H = V_0 + \frac{\mathbf{P}^2}{2m_N} + \frac{m_N\omega^2}{2}\mathbf{X}^2, \tag{6.3.2}$$
with X the operator that multiplies the wave function with the coordinate
argument x, and P the operator that acts on the wave function as the differential
operator $-i\hbar\nabla$. This is the Hamiltonian for a harmonic oscillator with circular
frequency ω, the first problem solved using Heisenberg’s matrix mechanics at
the beginning of quantum mechanics.¹³
To find the spectrum of eigenvalues of this Hamiltonian, we introduce a
vector operator
$$\mathbf{a} \equiv \frac{1}{\sqrt{2m_N\omega\hbar}}\,\mathbf{P} - i\sqrt{\frac{m_N\omega}{2\hbar}}\,\mathbf{X}. \tag{6.3.3}$$
Recalling the commutation relations (5.3.22),
$$[X_i, P_j] = i\hbar\delta_{ij}, \qquad [X_i, X_j] = [P_i, P_j] = 0, \tag{6.3.4}$$
it is straightforward to calculate that
$$[a_i, a_j^\dagger] = \delta_{ij}, \qquad [a_i, a_j] = [a_i^\dagger, a_j^\dagger] = 0. \tag{6.3.5}$$
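For readers who want to see the "straightforward" step, here is one way the first commutator in (6.3.5) can be worked out, taking the operator in (6.3.3) to be $\mathbf{a} = \mathbf{P}/\sqrt{2m_N\omega\hbar} - i\sqrt{m_N\omega/2\hbar}\,\mathbf{X}$ and using only the commutators (6.3.4). The product of the two coefficients is $1/2\hbar$, and the terms with two X's or two P's drop out.

```latex
\begin{align*}
[a_i, a_j^\dagger]
 &= \Big[\tfrac{1}{\sqrt{2m_N\omega\hbar}}\,P_i - i\sqrt{\tfrac{m_N\omega}{2\hbar}}\,X_i\,,\;
         \tfrac{1}{\sqrt{2m_N\omega\hbar}}\,P_j + i\sqrt{\tfrac{m_N\omega}{2\hbar}}\,X_j\Big] \\
 &= \frac{i}{2\hbar}\,[P_i, X_j] \;-\; \frac{i}{2\hbar}\,[X_i, P_j] \\
 &= \frac{i}{2\hbar}\,(-i\hbar\,\delta_{ij}) \;-\; \frac{i}{2\hbar}\,(i\hbar\,\delta_{ij})
  \;=\; \tfrac{1}{2}\delta_{ij} + \tfrac{1}{2}\delta_{ij} \;=\; \delta_{ij}.
\end{align*}
```

The same bookkeeping with two lowering operators gives cross terms of opposite sign, which is why $[a_i, a_j]$ and $[a_i^\dagger, a_j^\dagger]$ vanish.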
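The Hamiltonian (6.3.2) can be expressed as
$$H = V_0 + \frac{\hbar\omega}{2}\left[\mathbf{a}\cdot\mathbf{a}^\dagger + \mathbf{a}^\dagger\cdot\mathbf{a}\right].$$
Using the commutators (6.3.5), this is
$$H = V_0 + \frac{3\hbar\omega}{2} + \hbar\omega\,\mathbf{a}^\dagger\cdot\mathbf{a}. \tag{6.3.6}$$
The operators $\mathbf{a}^\dagger$ and $\mathbf{a}$ play the role of raising and lowering operators for
the energy. Using Eq. (6.3.6) and the commutation relations (6.3.5), we easily
see that
$$[\mathbf{a}, H] = \hbar\omega\,\mathbf{a}, \qquad [\mathbf{a}^\dagger, H] = -\hbar\omega\,\mathbf{a}^\dagger. \tag{6.3.7}$$
It follows that if Hψ = Eψ, then
$$H(\mathbf{a}^\dagger\psi) = (E + \hbar\omega)(\mathbf{a}^\dagger\psi), \tag{6.3.8}$$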
13 W. Heisenberg, Zeit. Phys. 33, 879 (1925). This article is reprinted in English in Van der Waerden, Sources
of Quantum Mechanics, listed in the bibliography.
where $Y_\ell^m$ is the spherical harmonic function described in Section 5.2, with ℓ a
non-negative integer and m an integer running over the 2ℓ + 1 values from −ℓ to
+ℓ. For instance, $Y_0^0(\mathbf{v})$ is a constant, and
$$Y_1^{\pm 1}(\mathbf{v}) = \mp\sqrt{\frac{3}{8\pi}}\,(v_1 \pm i v_2), \qquad Y_1^0(\mathbf{v}) = \sqrt{\frac{3}{4\pi}}\,v_3.$$
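We can find a complete set of states with energy $E_0 + n\hbar\omega$ and angular
momentum quantum number ℓ for which n − ℓ is an even non-negative integer:
$$\psi^m_{n,\ell} = (\mathbf{a}^\dagger\cdot\mathbf{a}^\dagger)^{(n-\ell)/2}\,Y_\ell^m(\mathbf{a}^\dagger)\,\psi_0. \tag{6.3.15}$$
For n = 1 we have only ℓ = 1, with $\psi^m_{1,1} = Y_1^m(\mathbf{a}^\dagger)\psi_0$. For n = 2 we have
both ℓ = 2, with $\psi^m_{2,2} = Y_2^m(\mathbf{a}^\dagger)\psi_0$, and also ℓ = 0, with
$\psi^0_{2,0} \propto (\mathbf{a}^\dagger\cdot\mathbf{a}^\dagger)\psi_0$.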
All but the lowest energy states are evidently degenerate. As we have seen, for
energy levels with n = 1 and n = 2 there are respectively three and 5 + 1 = six
states with energies respectively $E_0 + \hbar\omega$ and $E_0 + 2\hbar\omega$. In general, the number
$\#_n$ of states with energy $E_0 + n\hbar\omega$ is the sum of 2ℓ + 1 for all non-negative
integers ℓ with n − ℓ an even non-negative integer 2ν. That is,
$$\#_n = \begin{cases}
\sum_{\nu=0}^{n/2} (2n - 4\nu + 1) = (2n+1)(n/2+1) - 2(n/2)(n/2+1) & \text{for } n \text{ even} \\[4pt]
\sum_{\nu=0}^{(n-1)/2} (2n - 4\nu + 1) = (2n+1)((n-1)/2+1) - 2((n-1)/2)((n-1)/2+1) & \text{for } n \text{ odd}
\end{cases}$$
and so, whether n is even or odd, the degeneracy (apart from spin) is
$$\#_n = (n+1)(n+2)/2. \tag{6.3.16}$$
This can be recognized as the number of ways an integer n can be written as
a sum of three non-negative integers, so this is also the number of independent
wave functions $\psi_{n_1 n_2 n_3}$ with $n = n_1 + n_2 + n_3$ defined by Eq. (6.3.12). Thus
the wave functions (6.3.15) form a complete set of eigenfunctions of H with
eigenvalue $E_0 + n\hbar\omega$.
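The counting argument can be verified by brute force. The Python sketch below counts the states of level n in two ways, once by summing 2ℓ + 1 over the allowed ℓ and once by enumerating triples $(n_1, n_2, n_3)$, and compares both with the closed form (6.3.16). The function names are illustrative.

```python
def count_by_angular_momentum(n):
    """Sum of 2l + 1 over l = n, n - 2, ..., down to 0 or 1."""
    return sum(2 * l + 1 for l in range(n, -1, -2))


def count_by_cartesian(n):
    """Number of triples (n1, n2, n3) of non-negative integers with n1 + n2 + n3 = n."""
    # Once n1 and n2 are chosen with n1 + n2 <= n, n3 is fixed.
    return sum(1 for n1 in range(n + 1) for n2 in range(n + 1 - n1))


def closed_form(n):
    """Eq. (6.3.16): (n + 1)(n + 2) / 2."""
    return (n + 1) * (n + 2) // 2


for n in range(7):
    print(n, count_by_angular_momentum(n), count_by_cartesian(n), closed_form(n))
```

All three columns agree for every n, which is the content of the completeness statement above.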
It has been possible to work out the energy eigenvalues and their degeneracies
here (as Heisenberg did in 1925) without examining the form of these wave
functions as functions of the nucleon coordinates, but it will help to make our
discussion more concrete if we take a moment to look at these wave functions.
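By using Eq. (6.3.3), the defining equation (6.3.10) for the wave function of the
state of minimum energy can be written explicitly as a first-order differential
equation
$$\left[\sqrt{\frac{\hbar}{2m_N\omega}}\,\nabla + \sqrt{\frac{m_N\omega}{2\hbar}}\,\mathbf{x}\right]\psi_0(\mathbf{x}) = 0. \tag{6.3.17}$$
The solution (with arbitrary normalization) is
$$\psi_0(\mathbf{x}) = \exp\left(-\frac{m_N\omega}{2\hbar}\,\mathbf{x}^2\right). \tag{6.3.18}$$
The wave functions $\psi^m_{n,\ell}(\mathbf{x})$ can be found using Eq. (6.3.15), with $\psi_0(\mathbf{x})$ given
by Eq. (6.3.18), and with $\mathbf{a}^\dagger$ replaced with the differential operator
$$\mathbf{a}^\dagger = \frac{-i\hbar}{\sqrt{2m_N\omega\hbar}}\,\nabla + i\sqrt{\frac{m_N\omega}{2\hbar}}\,\mathbf{x}.$$
For instance,
$$\psi^m_{1,1}(\mathbf{x}) \propto |\mathbf{x}|\,Y_1^m(\hat{x})\exp\left(-\frac{m_N\omega}{2\hbar}\,\mathbf{x}^2\right).$$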
Taking into account the two spin states of a nucleon, the actual degeneracy of
the energy level $E = E_0 + n\hbar\omega$ is twice the quantity (6.3.16), or
(n + 1)(n + 2) = 2, 6, 12, 20, 30, 42, . . .
This leads to the expectation that the protons or neutrons in a nucleus would all
form closed shells if the number of protons or of neutrons were equal to
2, 2 + 6 = 8, 8 + 12 = 20, etc.
These are the so-called magic numbers of nuclear physics,¹⁴ analogous to the
atomic numbers 2, 10, 18, etc., of the noble gases in atomic physics. We expect
nuclei with a magic number of protons or neutrons to be more deeply bound
and hence more abundant than other nuclei with similar numbers of neutrons
and protons. A nucleus is likely to be particularly deeply bound if it is doubly
magic, with a magic number of both protons and neutrons. Indeed, the lightest
doubly magic nuclei are $^4$He, $^{16}$O, and $^{40}$Ca, which are more tightly bound and
abundant than other nuclei of similar weight.
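The shell closures of the pure harmonic oscillator follow from running totals of the level capacities (n + 1)(n + 2). A minimal Python sketch, with hypothetical function names, is:

```python
from itertools import accumulate


def level_capacity(n):
    """Number of single-nucleon states at level n, including the two spin states."""
    return (n + 1) * (n + 2)


def oscillator_closures(levels):
    """Running totals of the capacities of levels 0 .. levels - 1."""
    return list(accumulate(level_capacity(n) for n in range(levels)))


print([level_capacity(n) for n in range(6)])
print(oscillator_closures(6))
```

Only the first three totals, 2, 8, and 20, survive as magic numbers; the later ones are those of the unperturbed oscillator and do not correspond to observed closed shells.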
One might expect the magic number following 20 to be 20+20 = 40, but this
is not the case. The degenerate multiplets we found for the harmonic oscillator
begin, for heavier nuclei, to be split in energy, both by the interaction of the
spin and orbital angular momenta of the nucleons and by the breakdown of
the harmonic oscillator approximation (6.3.1) as nucleons in high energy levels
spend increasing time away from the nuclear center. In particular, there is a
term in the Hamiltonian for each nucleon proportional to S · L with a large
14 M. Goeppert-Mayer and J. H. D. Jensen, Elementary Theory of Nuclear Shell Structure (Wiley, New York,
1955).
6.4 Alpha Decay 229
negative coefficient, which for each n lowers the energy of the single-nucleon
state with the largest orbital angular momentum ℓ = n and largest total angular
momentum, j = ℓ + 1/2 = n + 1/2, below the energies of other single-nucleon
states with the same n.
Without these corrections, the n = 3 energy level would have 20 degenerate
states with ℓ = 3 and ℓ = 1, but these corrections lower the eight $f_{7/2}$ states
below the other 12 states, so the magic number following 20 is not 40, but
20 + 8 = 28. The element with 28 protons is nickel, which is known to be
produced abundantly by nuclear reactions occurring in core-collapse supernovae.
The most abundant isotope of nickel is not the doubly magic $^{56}$Ni; this
isotope is less abundant than either $^{58}$Ni or $^{60}$Ni, which have a magic number
only of protons. This is because the negative nuclear potential energy of the
additional neutrons is needed to compensate for the Coulomb repulsion of the
28 protons. Even so, as noted in Section 3.4, the deep binding of nickel isotopes
makes nickel an exception to the rule that atomic weight steadily increases with
atomic number.
The same pattern repeats for larger nucleon numbers. The next shell has
nucleons in the 20 − 8 = 12 states with n = 3 and j < 7/2, and in the 10
n = 4 states with ℓ = 4 and j = ℓ + 1/2 = 9/2, giving a magic number
28 + 12 + 10 = 50. The next shell has nucleons in the 30 − 10 = 20 states
with n = 4 and j < 9/2, and in the 12 n = 5 states with ℓ = 5 and
j = ℓ + 1/2 = 11/2, giving a magic number 50 + 20 + 12 = 82. Finally,
the next shell has nucleons in the 42 − 12 = 30 states with n = 5 and j < 11/2,
and in the 14 n = 6 states with ℓ = 6 and j = ℓ + 1/2 = 13/2, giving a magic
number 82 + 30 + 14 = 126. Thus the complete list of magic numbers is
2, 8, 20, 28, 50, 82, 126.
The only stable doubly magic nucleus heavier than calcium 40 is lead 208.
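The shell bookkeeping described in this section can be written out as a short Python sketch. It assumes, following the text, that the first three oscillator levels close as they stand, and that from n = 3 onward the 2n + 2 states with ℓ = n and j = n + 1/2 drop into the shell below; the function names are illustrative, and the code reproduces only the counting, not the underlying spin-orbit dynamics.

```python
def level_capacity(n):
    """States at oscillator level n, including spin: (n + 1)(n + 2)."""
    return (n + 1) * (n + 2)


def lowered_states(n):
    """States with l = n and j = n + 1/2, of which there are 2j + 1 = 2n + 2."""
    return 2 * n + 2


def magic_numbers():
    magic, total = [], 0
    # Levels n = 0, 1, 2 close as in the pure harmonic oscillator.
    for n in range(3):
        total += level_capacity(n)
        magic.append(total)
    # The eight n = 3 states with j = 7/2 form a shell of their own.
    total += lowered_states(3)
    magic.append(total)
    # Each later shell holds the rest of level n plus the lowered states of level n + 1.
    for n in range(3, 6):
        total += level_capacity(n) - lowered_states(n) + lowered_states(n + 1)
        magic.append(total)
    return magic


print(magic_numbers())
```

Each pass through the last loop adds the two contributions named in the text, for example 12 + 10 for the shell that closes at 50.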
As we saw in Section 3.3, in the first decade of the twentieth century Rutherford
and his collaborators were able to distinguish two kinds of radioactivity. One
was beta decay, the subject of Section 6.5. The other was alpha decay, the
emission of a charged alpha particle, soon identified as a helium 4 nucleus.
These alpha particles furnished Rutherford with a probe of atomic structure,
with which he discovered the nucleus of the atom.
Alpha decay has the remarkable feature that to get out of the nucleus the
alpha particle must pass through a potential barrier that according to classical
physics it cannot inhabit, because the potential energy there is greater than the
total energy of the alpha particle. Only because of the wave nature of particles
in quantum mechanics is it possible for the alpha particle to leak through the
barrier. The presence of this barrier gives the rate of alpha decay an extreme
sensitivity to the energy of the emitted alpha particle and the radius of the
nucleus. Similar Coulomb barriers govern the rate of spontaneous nuclear fission
and of nuclear reactions in stars.
We will assume spherical symmetry, and to avoid mathematical
complications consider only s-wave (ℓ = 0) decays, which are the most common. The
Schrödinger equation (5.2.19) for the radial wave function $R_E(r)$ with alpha particle
energy E and ℓ = 0 takes the form
$$-\frac{\hbar^2}{2m_\alpha}\frac{1}{r^2}\frac{d}{dr}\left(r^2\frac{dR_E(r)}{dr}\right) + V(r)R_E(r) = E\,R_E(r), \tag{6.4.1}$$
where V(r) is taken to include both the Coulomb repulsion and the nuclear
attraction between the alpha particle and the rest of the nucleus. We take E > 0,
so that it is energetically possible for the alpha particle to exist far from the
nucleus. It proves very convenient to write this instead as a differential equation
for the reduced wave function $u_E(r) \equiv r R_E(r)$:
$$-\frac{\hbar^2}{2m_\alpha}\frac{d^2}{dr^2}u_E(r) + V(r)u_E(r) = E\,u_E(r). \tag{6.4.2}$$
As we saw in Section 5.2, the boundary condition for general orbital
angular momentum ℓ is that, for r → 0, $R_E(r)$ is proportional to $r^\ell$ and hence
$u_E(r)$ is proportional to $r^{\ell+1}$, so for ℓ = 0 the condition is that $u_E(r) \propto r$ for
r → 0.
It is assumed that for r less than the nuclear radius R the potential V (r) is
dominated by the nuclear attraction, which gives it negative values. For r greater
than R the nuclear attraction is presumed to be ineffective, so V (r) becomes
positive:
$$V(r) = \frac{2Ze^2}{r} \quad \text{for } r > R, \tag{6.4.3}$$
where Ze is the electric charge of the final nucleus. We assume that for some
range of r greater than R, this potential is greater than E. This is the region that
classically cannot be inhabited by the alpha particle. (See Fig. 6.1.)
To see how the wave function behaves in this region, it is convenient to rewrite
Eq. (6.4.2) for r > R as
$$\frac{d^2}{dr^2}u_E(r) = \kappa_E^2(r)\,u_E(r), \tag{6.4.4}$$
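[Figure 6.1: plot of the potential V(r) against r, with the energy E, the nuclear radius R, and the outer barrier radius $b_E$ marked.]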
Now, if the barrier extended to infinity with V(r) > E then the only
allowed values of energy would be those for which the growing exponential
term in Eq. (6.4.6) was absent, which would require that E takes a value where
$C_+(E) = 0$. These would be the energies of the true bound states of the alpha
particle in the nucleus. In fact, V(r) falls to the value E at a radial coordinate
$r = b_E$:
$$b_E = 2Ze^2/E \tag{6.4.7}$$
and V(r) < E for $r > b_E$. The condition $C_+(E) = 0$ picks out the energies
of unstable states, for which the wave function becomes exponentially small
outside the barrier, though not zero.
For instance, if V(r) in the nucleus were a negative constant $-V_0$, then the
general solution of Eq. (6.4.2) for r < R would be a linear combination of
sin qr and cos qr, where
$$q \equiv +\frac{1}{\hbar}\sqrt{2m_\alpha(E + V_0)}. \tag{6.4.8}$$
The boundary condition that $u_E(r) \propto r$ for r → 0 tells us that (with an arbitrary
normalization) the physical solution for r < R is
$$u_E(r) = \sin qr. \tag{6.4.9}$$
In this case the continuity at R of the values and first derivatives of the wave
functions (6.4.6) and (6.4.9) (with $A_{E\pm}(r) = 1/\sqrt{\kappa_E(r)}$ assumed to vary much
more slowly than the exponentials) gives
$$\frac{1}{\sqrt{\kappa_E(R)}}\,[C_+ + C_-] = \sin qR, \qquad \sqrt{\kappa_E(R)}\,[C_+ - C_-] = q\cos qR,$$
and therefore, for a constant potential in the nucleus,
$$C_\pm(E) \simeq \frac{\sqrt{\kappa_E(R)}}{2}\left[\sin qR \pm \frac{q}{\kappa_E(R)}\cos qR\right]. \tag{6.4.10}$$
The condition $C_+(E) = 0$ requires that $\tan qR = -q/\kappa_E(R)$. For a very deep
potential well, with $\kappa_E(R)$ much less than $\sqrt{2m_\alpha V_0}/\hbar$ and hence much less
than q, the unstable state with lowest energy has q slightly greater than π/2R.
At a value of E where $C_+(E) = 0$, the wave function outside the barrier is
suppressed by a factor exp(−G(E)), where
$$G(E) = \int_R^{b_E}\kappa_E(r)\,dr = \sqrt{\frac{4m_\alpha Ze^2}{\hbar^2}}\int_R^{b_E}\sqrt{\frac{1}{r} - \frac{1}{b_E}}\,dr = \sqrt{\frac{4m_\alpha Ze^2 b_E}{\hbar^2}}\,f(R/b_E) = \frac{4Ze^2}{\hbar v_\alpha}\,f(R/b_E), \tag{6.4.11}$$
where
$$f(x) \equiv \int_x^1\sqrt{\frac{1}{z} - 1}\,dz = \frac{\pi}{2} - \sqrt{x(1-x)} - \arcsin\sqrt{x}, \tag{6.4.12}$$
and $v_\alpha = \sqrt{2E/m_\alpha}$ is the velocity of the alpha particle when it escapes far
from the nucleus. At the energy of an unstable state, where $C_+(E) = 0$, the
probability density $|R_E(b_E)|^2$ at the outer radius of the barrier is suppressed by
a factor of order exp(−2G(E)).
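To see how large the suppression is in practice, the barrier exponent can be evaluated numerically. The Python sketch below implements the closed form for f(x), checks it against a direct midpoint-rule integration, and evaluates G(E) for illustrative inputs: a final nucleus with Z = 86, an alpha energy of 4.8 MeV, and a nuclear radius of 9 fm. These inputs, along with $e^2 = 1.44$ MeV fm, $\hbar c$ = 197.327 MeV fm, and an alpha rest energy of 3727.38 MeV, are assumptions chosen for illustration rather than values taken from this passage.

```python
import math

# Assumed constants (standard values, not quoted in the text).
E2_MEV_FM = 1.44       # e^2 in MeV fm (Gaussian units)
HBAR_C = 197.327       # hbar * c in MeV fm
M_ALPHA = 3727.38      # alpha particle rest energy in MeV


def f(x):
    """Closed form of Eq. (6.4.12), valid for 0 < x <= 1."""
    return math.pi / 2 - math.sqrt(x * (1 - x)) - math.asin(math.sqrt(x))


def f_numeric(x, steps=20000):
    """Midpoint-rule estimate of the integral of sqrt(1/z - 1) from x to 1."""
    h = (1.0 - x) / steps
    return h * sum(math.sqrt(1.0 / (x + (k + 0.5) * h) - 1.0) for k in range(steps))


def barrier_exponent(Z, energy_mev, radius_fm):
    """G(E) = (4 Z e^2 / hbar v_alpha) f(R / b_E), with b_E = 2 Z e^2 / E."""
    b = 2 * Z * E2_MEV_FM / energy_mev
    if radius_fm >= b:
        raise ValueError("no classically forbidden region: R >= b_E")
    v_over_c = math.sqrt(2 * energy_mev / M_ALPHA)
    return 4 * Z * E2_MEV_FM / (HBAR_C * v_over_c) * f(radius_fm / b)


G = barrier_exponent(Z=86, energy_mev=4.8, radius_fm=9.0)
print(f"closed form f(0.2) = {f(0.2):.6f}, numerical = {f_numeric(0.2):.6f}")
print(f"G = {G:.1f}, suppression exp(-2G) = {math.exp(-2 * G):.2e}")
```

Changing the energy by a few hundred keV in this sketch changes exp(−2G) by orders of magnitude, which is the extreme sensitivity referred to in the text.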
In the earliest successful theory of alpha decay,¹⁵ this factor was interpreted
as the probability that an alpha particle coming out of the nucleus would
penetrate the Coulomb barrier. That is, the rate $\Gamma_\alpha$ of alpha decay was presumed to
take the form
$$\Gamma_\alpha = \nu\exp(-2G(E)), \tag{6.4.13}$$
where ν is some sort of rate factor that reflects conditions within the nucleus.
The factor ν is commonly estimated as the rate ν ≃ V/R at which alpha
particles inside the nucleus classically would strike the nuclear surface, where V
is a typical alpha particle velocity inside the nucleus and R is the nuclear radius.
As we have seen, for a very deep potential well the alpha particle wave number
inside the nucleus is close to π/2R, so $V/R \simeq \hbar\pi/2m_\alpha R^2$, which for a large
nucleus with $R \simeq 9 \times 10^{-13}$ cm is $3 \times 10^{20}$ sec$^{-1}$. The rate factor ν is usually
quoted as $10^{21}$ sec$^{-1}$.
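The estimate of the rate factor is a one-line calculation once units are fixed. The Python sketch below evaluates $\hbar\pi/2m_\alpha R^2$ for the radius quoted above, 9 × 10⁻¹³ cm = 9 fm; the constants $\hbar c$ = 197.327 MeV fm, c = 2.998 × 10²³ fm/sec, and the alpha rest energy 3727.38 MeV are standard values assumed here, and the function name is illustrative.

```python
import math

# Assumed constants (standard values, not quoted in the text).
HBAR_C = 197.327          # hbar * c in MeV fm
C_FM_PER_SEC = 2.998e23   # speed of light in fm per second
M_ALPHA = 3727.38         # alpha particle rest energy in MeV


def rate_factor(radius_fm):
    """nu ~ V/R = hbar * pi / (2 m_alpha R^2), returned in inverse seconds."""
    # Writing hbar / m_alpha as (hbar c) * c / (m_alpha c^2) keeps the units in MeV and fm.
    return math.pi * HBAR_C * C_FM_PER_SEC / (2.0 * M_ALPHA * radius_fm**2)


nu = rate_factor(9.0)
print(f"nu is roughly {nu:.1e} per second")
```

Because ν scales as $1/R^2$, modest changes in the assumed radius shift it by less than an order of magnitude, far less than the variation of the exponential factor.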
This is sometimes expressed in terms of the spacing of energy levels. For a
flat deep nuclear potential with $q \gg \kappa_E(R)$, the energy levels of unstable states
where $C_+(E)$ vanishes are at $qR \simeq (n + 1/2)\pi$ with n = 0, 1, 2, . . . , so that
their wave numbers are spaced by $\Delta q = \pi/R$. The spacing D in energy is then
$D \simeq (dE/dq)\Delta q = \pi\hbar V/R$, so $V/R \simeq D/\pi\hbar$.¹⁶
The appendix to this section gives a thoroughly quantum-mechanical deriva-
tion of the decay rate that dispenses with the semi-classical picture of an alpha
particle in the nucleus striking the nuclear surface and occasionally leaking
through. The rate of decay of an unstable state with energy E1 is found to be
given by Eq. (6.4.54):
$$\Gamma_\alpha = \left|\frac{C_-(E_1)}{\hbar\,C_+'(E_1)}\right| e^{-2G(E_1)}\,. \qquad (6.4.14)$$
The factor multiplying $e^{-2G(E_1)}$ in Eq. (6.4.14) is of the same order of magnitude
as the rate factor $V/R$. For instance, for a flat nuclear potential, Eq. (6.4.10)
suggests that the derivative of $C_+(E)$ with respect to wave number is of order
$\kappa_E^{1/2}(R)R$, while $C_-(E_1)$ is of order $\kappa_E^{1/2}(R)$, so $C_+'(E)/C_-(E)$ is of order
$(dq/dE)R = R/\hbar V$ and the factor $C_-(E_1)/\hbar C_+'(E_1)$ in Eq. (6.4.14) is therefore
of order $V/R$.
15 G. Gamow, Zeit. f. Physik 52, 510 (1929); E. U. Condon and R. W. Gurney, Phys. Rev. 33, 127 (1929).
16 The rate factor $\nu$ multiplying $\exp(-2G)$ is sometimes instead estimated as $\nu \simeq D/2\pi\hbar$; for instance,
see J. M. Blatt and V. F. Weisskopf, Theoretical Nuclear Physics (John Wiley & Sons, New York, 1952),
Section XI.2.
It must be admitted that taking the rate factor $\nu$ as $|C_-(E_1)/\hbar C_+'(E_1)|$ instead
of $V/R$ or $D/\hbar\pi$ is not very important, because none of these estimates take into
account the probability that an alpha particle will somehow become detached
inside the nucleus from the rest of the nucleus. But at least Eq. (6.4.14) is a
precise statement (for thick barriers with a slowly varying potential) of the rate
at which an alpha particle that has become detached inside the nucleus will
escape, and it does not depend on semi-classical hand-waving.
This theory does correctly describe the extreme sensitivity of alpha particle
decay rates to the energy and the nuclear radius, due almost entirely to the
barrier penetration factor exp(−2G(E)). In particular, without needing to worry
about the rate factor ν we can use the above results for the barrier penetration
exponent G(E) to understand the trend of the dependence of the logarithm of
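the mean lifetime $\tau_\alpha = 1/\Gamma_\alpha$ on energy. Note that for a thick barrier with
$b_E \gg R$, the leading and next-to-leading terms in the expansion of Eq. (6.4.11)
in powers of $R/b_E$ give
$$G(E) = \frac{4Ze^2}{\hbar v_\alpha}\left[\frac{\pi}{2} - 2\sqrt{\frac{R}{b_E}} + O\!\left(\left(\frac{R}{b_E}\right)^{3/2}\right)\right]. \qquad (6.4.15)$$
Since $v_\alpha \propto \sqrt{E}$ and $b_E \propto 1/E$, we have
$$\ln\tau_\alpha \propto G(E) = \frac{\alpha}{\sqrt{E}} + \beta + O(E)\,, \qquad (6.4.16)$$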
with α and β constant in energy. This dependence of ln τα on energy was
originally noticed in 1911 as a dependence of the alpha particle range in air
on energy, and in that form is known as the Geiger–Nuttall law.17
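To see how good the two-term expansion in Eq. (6.4.15) is, one can compare it with the exact function $f(x)$ of Eq. (6.4.12) at a few values of $x = R/b_E$. The sketch below is illustrative only (the function names are assumptions); it also prints the neglected remainder divided by $x^{3/2}$, which should stay roughly constant if the remainder is of the stated order.

```python
import math

def f_exact(x):
    """Exact barrier function of Eq. (6.4.12)."""
    return math.pi / 2 - math.sqrt(x * (1 - x)) - math.asin(math.sqrt(x))

def f_two_term(x):
    """Leading and next-to-leading terms kept in Eq. (6.4.15)."""
    return math.pi / 2 - 2 * math.sqrt(x)

for x in (0.01, 0.05, 0.19):
    remainder = f_exact(x) - f_two_term(x)
    print(f"x = {x:.2f}  exact = {f_exact(x):.4f}  two-term = {f_two_term(x):.4f}  "
          f"remainder / x^1.5 = {remainder / x ** 1.5:.3f}")
```

For the radium example discussed next ($x \simeq 0.19$) the two-term form is already within a few percent of the exact value.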
For a numerical example let us consider the historically important decay
process $^{226}\mathrm{Ra} \to {}^{222}\mathrm{Rn} + {}^{4}\mathrm{He}$. The nuclei $^{226}$Ra, $^{222}$Rn, and $^{4}$He all have
spin zero and even parity, so the alpha particle in this decay has $\ell = 0$, as we
assumed in our calculation of $G(E)$. The alpha particles from this decay have a
velocity $v_\alpha = 1.519\times10^{9}$ cm/sec, and radon has $Z = 86$, so here the first factor
in Eq. (6.4.11) for $G(E)$ is $4Ze^2/\hbar v_\alpha = 49.55$. Also, $b_E = 5.18\times10^{-12}$ cm.
According to Eq. (6.1.1) the radius of $^{222}$Rn is approximately $7.9\times10^{-13}$ cm,
to which we should add the radius $2\times10^{-13}$ cm of $^{4}$He, and so the effective
nuclear radius here is $R \simeq 9.9\times10^{-13}$ cm, and $R/b_E \simeq 0.19$. The function
(6.4.12) is then $f(R/b_E) = 0.72$. Equation (6.4.11) then gives $G(E) = 35.7$,
and the barrier penetration probability is $\exp(-2G) \approx 10^{-31}$. If we take
$\nu \simeq 10^{21}$ sec$^{-1}$ then Eq. (6.4.13) gives a radium mean life $1/\Gamma_\alpha$ of order $10^{10}$
sec. It is the smallness of the factor $\exp(-2G)$ that is responsible for the radium
226 nuclei produced in a chain of radioactive decays from uranium 238 living
long enough to be discovered in uranium ores in 1898 by Marie and Pierre
Curie. The predicted mean lifetime, of order $10^{10}$ sec, may be compared with
the measured mean life of 2300 years $= 7\times10^{10}$ sec. The agreement, such
as it is, is somewhat accidental, because the decay rate is so sensitive to the
nuclear radius $R$. For instance, if we had taken the effective nuclear radius as
$R = 9.3\times10^{-13}$ cm instead of $R = 9.9\times10^{-13}$ cm, then with everything else
the same we would have found a predicted mean life of 5600 years. Indeed,
rather than using known values of $R$ to calculate alpha decay rates of various
nuclei, the observed decay rates were historically used to estimate $R$. For this
purpose, it is not important to be precise about the value of the factor $\nu$
multiplying $\exp(-2G)$ in Eq. (6.4.13). But it is worth trying to be precise about this
in order to make sure that we understand the decay process.
17 H. Geiger and J. M. Nuttall, Phil. Mag. 22, 613 (1911); 23, 439 (1912).
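The radium estimate above can be retraced numerically. The following is a minimal sketch under stated assumptions: the constants ($e^2 \simeq 1.44$ MeV fm, $m_\alpha c^2 \simeq 3727.4$ MeV, the fine-structure constant, and the length of a year) are standard rounded values supplied here, the function and variable names are illustrative, and $\nu = 10^{21}$ sec$^{-1}$ is the conventional rate factor quoted in the text. Because the lifetime depends exponentially on $G$, the output can differ somewhat from figures obtained with rounded intermediate values.

```python
import math

ALPHA_FS = 1 / 137.036      # fine-structure constant e^2/(hbar c)
C_LIGHT = 2.998e10          # speed of light, cm/s
E2_MEV_FM = 1.440           # e^2 in MeV fm (Gaussian units)
M_ALPHA_MEV = 3727.4        # alpha particle rest energy, MeV
SECONDS_PER_YEAR = 3.156e7

def f(x):
    """Barrier function of Eq. (6.4.12)."""
    return math.pi / 2 - math.sqrt(x * (1 - x)) - math.asin(math.sqrt(x))

def mean_life_seconds(z_daughter, v_alpha, radius_fm, nu=1e21):
    """Return (mean life in s, G, b_E in fm) from Eqs. (6.4.11) and (6.4.13)."""
    beta = v_alpha / C_LIGHT
    energy_mev = 0.5 * M_ALPHA_MEV * beta ** 2          # nonrelativistic kinetic energy
    b_fm = 2 * z_daughter * E2_MEV_FM / energy_mev      # outer turning point, V(b_E) = E
    prefactor = 4 * z_daughter * ALPHA_FS / beta        # 4 Z e^2 / (hbar v_alpha)
    g = prefactor * f(radius_fm / b_fm)
    return 1.0 / (nu * math.exp(-2 * g)), g, b_fm

for radius in (9.9, 9.3):
    tau, g, b = mean_life_seconds(86, 1.519e9, radius)
    print(f"R = {radius} fm: b_E = {b:.1f} fm, G = {g:.2f}, "
          f"mean life = {tau:.2e} s = {tau / SECONDS_PER_YEAR:.0f} yr")
```

Changing the radius by well under ten percent shifts the predicted mean life by roughly an order of magnitude, which is the sensitivity the paragraph describes.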
where the amplitudes $A_{E\pm}(r)$ vary more slowly than the exponentials. Before
making any approximations, Eq. (6.4.4) can be written as a differential equation
for $A_{E\pm}(r)$:
$$2\kappa_E A_{E\pm}' + \kappa_E' A_{E\pm} \pm A_{E\pm}'' = 0\,. \qquad (6.4.18)$$
We can implement the approximation that $A_{E\pm}(r)$ varies slowly by dropping the
second derivative $A_{E\pm}''(r)$, solving Eq. (6.4.18), using the solution to calculate
$A_{E\pm}''(r)$, and checking under what conditions it may indeed be neglected. With
the term $\pm A_{E\pm}''(r)$ dropped, Eq. (6.4.18) becomes $A_{E\pm}'/A_{E\pm} = -\kappa_E'/2\kappa_E$,
which has the easy solution $A_{E\pm}(r) \propto 1/\sqrt{\kappa_E(r)}$. Then
$$\frac{A_{E\pm}''}{\kappa_E' A_{E\pm}} = -\frac{\kappa_E''}{2\kappa_E\kappa_E'} + \frac{3}{4}\frac{\kappa_E'}{\kappa_E^2}\,.$$
Thus $A_{E\pm}''$ is indeed negligible compared with the term $\kappa_E' A_{E\pm}$ in Eq. (6.4.18) if
$$\frac{1}{\kappa_E}\left|\frac{\kappa_E'}{\kappa_E}\right| \ll 1 \quad\text{and}\quad \frac{1}{\kappa_E}\left|\frac{\kappa_E''}{\kappa_E'}\right| \ll 1\,, \qquad (6.4.19)$$
which is to say that both $\kappa_E(r)$ and $\kappa_E'(r)$ undergo only small fractional changes
in a distance of order $1/\kappa_E(r)$. Under these conditions, Eq. (6.4.4) has the two
independent approximate solutions
$$\frac{1}{\sqrt{\kappa_E(r)}}\exp\left(\pm\int\kappa_E(r)\,dr\right).$$
This is known as the WKB approximation.19 We can write the general solution
of Eq. (6.4.4) inside the barrier as a linear combination of these solutions:
$$u_E(r) = \frac{C_+(E)}{\sqrt{\kappa_E(r)}}\exp\left(+\int_R^r\kappa_E(r')\,dr'\right) + \frac{C_-(E)}{\sqrt{\kappa_E(r)}}\exp\left(-\int_R^r\kappa_E(r')\,dr'\right). \qquad (6.4.20)$$
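The validity conditions (6.4.19) can be evaluated explicitly for the Coulomb barrier used in Eq. (6.4.11). The sketch below is an illustration under assumptions: lengths are measured in units of $b_E$, so that $\kappa_E b_E = s\sqrt{1/x - 1}$ with $x = r/b_E$ and $s = 4Ze^2/\hbar v_\alpha$, and the value $s = 49.55$ is borrowed from the radium example earlier in the section. The function name is hypothetical.

```python
import math

def wkb_measures(x, s):
    """Left-hand sides of the two conditions (6.4.19) for a Coulomb barrier.

    x = r / b_E lies between R / b_E and 1; s = 4 Z e^2 / (hbar v_alpha).
    In units of b_E the barrier wave number is kappa = s * sqrt(1/x - 1).
    """
    g = 1.0 / x - 1.0
    kappa = s * math.sqrt(g)
    dkappa = -s / (2.0 * x ** 2 * math.sqrt(g))                       # d kappa / dx
    d2kappa = s * (1.0 / (x ** 3 * math.sqrt(g)) - 1.0 / (4.0 * x ** 4 * g ** 1.5))
    return abs(dkappa / kappa) / kappa, abs(d2kappa / dkappa) / kappa

for x in (0.19, 0.5, 0.9, 0.99):
    first, second = wkb_measures(x, 49.55)
    print(f"r/b_E = {x:.2f}: (1/k)|k'/k| = {first:.3f}, (1/k)|k''/k'| = {second:.3f}")
```

Both measures are small through most of the barrier but grow without bound as $r \to b_E$, which is the breakdown near the turning point that the connection formulas below are designed to handle.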
19 G. Wentzel, Zeit. f. Phys. 38, 518 (1926); H. A. Kramers, Zeit. f. Phys. 39, 828 (1926); L. Brillouin,
Compt. Rendus Acad. Sci. 183, 24 (1926).
Beyond the barrier, where $r > b_E$ (with $V(b_E) \equiv E$), it is convenient to
write the Schrödinger equation (6.4.2) for the reduced wave function as
$$\frac{d^2}{dr^2}u_E(r) = -k_E^2(r)\,u_E(r)\,, \qquad (6.4.21)$$
where
$$k_E(r) = +\frac{1}{\hbar}\sqrt{2m_\alpha\big(E - V(r)\big)}\,. \qquad (6.4.22)$$
Following the same arguments as before, provided that
$$\frac{1}{k_E}\left|\frac{k_E'}{k_E}\right| \ll 1 \quad\text{and}\quad \frac{1}{k_E}\left|\frac{k_E''}{k_E'}\right| \ll 1\,, \qquad (6.4.23)$$
we can use the WKB approximation to find solutions
$$u_E(r) \propto \frac{1}{\sqrt{k_E(r)}}\cos\left(\int k_E(r)\,dr + \vartheta\right), \qquad (6.4.24)$$
where ϑ is any angle. We have two independent solutions, given by using
Eq. (6.4.24) with ϑ taken as two different angles.
We need to work out how each of the two independent solutions of the
Schrödinger equation inside the Coulomb barrier, for r < bE , merges with
linear combinations of the two independent solutions beyond the barrier, where
r > bE . Unfortunately we cannot do this by equating the value and derivative
of the WKB solutions for r just below and just above bE , because κE (r) and
kE (r) both vanish at r = bE , and so the conditions (6.4.19) and (6.4.23) for the
validity of the WKB approximation break down near bE . This is a well-known
problem in the use of the WKB approximation to calculate bound state energies,
but here we will encounter an additional difficulty.
We will make the reasonable assumption that V (r)−E approaches a function
proportional to bE − r for r near bE . In this case, for r > bE ,
$$k_E \to \beta_E\sqrt{r - b_E} \quad\text{for}\quad r \to b_E\,, \qquad (6.4.25)$$
with $\beta_E$ a positive function of $E$. It is convenient to define a new independent
variable
$$\phi \equiv \int_{b_E}^{r}k_E(r')\,dr' \to \frac{2\beta_E}{3}(r - b_E)^{3/2}\,. \qquad (6.4.26)$$
The Schrödinger equation (6.4.21) then takes the form
$$\frac{d^2u}{d\phi^2} + \frac{1}{3\phi}\frac{du}{d\phi} + u = 0\,, \qquad (6.4.27)$$
with two independent solutions
$$u \propto \phi^{1/3}J_{\pm1/3}(\phi)\,, \qquad (6.4.28)$$
where $J_\nu$ is the usual Bessel function of order $\nu$. Likewise, for $r < b_E$ we have
$$\kappa_E \to \beta_E\sqrt{b_E - r} \quad\text{for}\quad r \to b_E\,, \qquad (6.4.29)$$
where here $I_\nu(\tilde\phi)$ is the Bessel function of order $\nu$ with imaginary argument:
$$I_\nu(\tilde\phi) \equiv e^{-i\pi\nu/2}J_\nu\big(e^{+i\pi/2}\tilde\phi\big)\,. \qquad (6.4.33)$$
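To see how the solutions for $r > b_E$ and $r < b_E$ merge with each other at
$r = b_E$, we note that, for $\phi \to 0$,
$$\phi^{1/3}J_{1/3}(\phi) \to \frac{\phi^{2/3}}{2^{1/3}\,\Gamma(4/3)}\,, \qquad \phi^{1/3}J_{-1/3}(\phi) \to \frac{2^{1/3}}{\Gamma(2/3)}\,,$$
while, for $\tilde\phi \to 0$,
$$\tilde\phi^{1/3}I_{1/3}(\tilde\phi) \to \frac{\tilde\phi^{2/3}}{2^{1/3}\,\Gamma(4/3)}\,, \qquad \tilde\phi^{1/3}I_{-1/3}(\tilde\phi) \to \frac{2^{1/3}}{\Gamma(2/3)}\,.$$
But $\phi^{2/3} \to (2\beta_E/3)^{2/3}(r - b_E)$ and $\tilde\phi^{2/3} \to (2\beta_E/3)^{2/3}(b_E - r)$, so
$$\phi^{1/3}J_{\pm1/3}(\phi) \Longleftrightarrow \mp\tilde\phi^{1/3}I_{\pm1/3}(\tilde\phi)\,, \qquad (6.4.34)$$
where “⇐⇒” means “connects smoothly at $r = b_E$.”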
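The claim that $\phi^{1/3}J_{\pm1/3}(\phi)$ solves Eq. (6.4.27), and the small-$\phi$ limits quoted above, can be spot-checked numerically. The sketch below is illustrative and makes its own choices: the Bessel function is evaluated from its standard power series (adequate for moderate $\phi$), derivatives are taken by central differences, and all function names are assumptions.

```python
import math

def bessel_j(nu, phi, terms=40):
    """Power series for J_nu(phi); adequate for moderate phi > 0."""
    return sum((-1) ** k / (math.factorial(k) * math.gamma(k + nu + 1))
               * (phi / 2) ** (2 * k + nu) for k in range(terms))

def u(nu, phi):
    """Candidate solution phi^(1/3) J_nu(phi) of Eq. (6.4.27)."""
    return phi ** (1 / 3) * bessel_j(nu, phi)

def residual(nu, phi, h=1e-3):
    """u'' + u'/(3 phi) + u, with derivatives from central differences."""
    d1 = (u(nu, phi + h) - u(nu, phi - h)) / (2 * h)
    d2 = (u(nu, phi + h) - 2 * u(nu, phi) + u(nu, phi - h)) / h ** 2
    return d2 + d1 / (3 * phi) + u(nu, phi)

for nu in (1 / 3, -1 / 3):
    print(f"nu = {nu:+.3f}: residual of (6.4.27) at phi = 1.5 is {residual(nu, 1.5):.1e}")

small = 1e-3
print(u(1 / 3, small) / small ** (2 / 3), 1 / (2 ** (1 / 3) * math.gamma(4 / 3)))
print(u(-1 / 3, small), 2 ** (1 / 3) / math.gamma(2 / 3))
```

The residuals should be at the level of the finite-difference error, and each pair of numbers in the last two lines should nearly coincide.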
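To learn from these results about the WKB solutions, we note that the
conditions (6.4.19) and (6.4.23) are satisfied if $\phi \gg 1$ and $\tilde\phi \gg 1$. As long as
the approximations (6.4.25) and (6.4.29) are still valid for these large values
of $\phi$ and $\tilde\phi$, we can take wave functions in the WKB approximations as the
asymptotic limits of the solutions (6.4.28) and (6.4.32):
$$\phi^{1/3}J_{\pm1/3}(\phi) \to \sqrt{\frac{2}{\pi}}\;\phi^{-1/6}\cos\left(\phi \mp \frac{\pi}{6} - \frac{\pi}{4}\right)
= \sqrt{\frac{2}{\pi}}\left(\frac{3\beta_E^2}{2}\right)^{1/6}k_E^{-1/2}(r)\cos\left(\int_{b_E}^{r}k_E(r')\,dr' \mp \frac{\pi}{6} - \frac{\pi}{4}\right), \qquad (6.4.35)$$
and
$$\tilde\phi^{1/3}I_{\pm1/3}(\tilde\phi) \to \frac{1}{\sqrt{2\pi}}\;\tilde\phi^{-1/6}\exp\big(\tilde\phi\big)
= \frac{1}{\sqrt{2\pi}}\left(\frac{3\beta_E^2}{2}\right)^{1/6}\kappa_E^{-1/2}(r)\exp\left(\int_r^{b_E}\kappa_E(r')\,dr'\right), \qquad (6.4.36)$$
but
$$\tilde\phi^{1/3}\left[I_{+1/3}(\tilde\phi) - I_{-1/3}(\tilde\phi)\right] \to -\sqrt{\frac{3}{2\pi}}\;\tilde\phi^{-1/6}\exp\big(-\tilde\phi\big)
= -\sqrt{\frac{3}{2\pi}}\left(\frac{3\beta_E^2}{2}\right)^{1/6}\kappa_E^{-1/2}(r)\exp\left(-\int_r^{b_E}\kappa_E(r')\,dr'\right). \qquad (6.4.37)$$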
to $\phi^{-1/6}\cos(\phi - \pi/4)$ with a coefficient that cannot be calculated within our
present approximations. Using the average, we have then from Eqs. (6.4.36),
(6.4.35), and (6.4.34):
$$\kappa_E^{-1/2}(r)\exp\left(\int_r^{b_E}\kappa_E(r')\,dr'\right) \Longleftrightarrow
-k_E^{-1/2}(r)\left[\cos\left(\int_{b_E}^{r}k_E(r')\,dr' + \frac{\pi}{4}\right) + \xi(E)\cos\left(\int_{b_E}^{r}k_E(r')\,dr' - \frac{\pi}{4}\right)\right], \qquad (6.4.39)$$
with $\xi(E)$ an unknown coefficient.20
We can write the exponentials in Eq. (6.4.20) as
$$\exp\left(\pm\int_R^r\kappa_E(r')\,dr'\right) = e^{\pm G(E)}\exp\left(\mp\int_r^{b_E}\kappa_E(r')\,dr'\right),$$
where $G(E)$ is the barrier penetration exponent:
$$G(E) \equiv \int_R^{b_E}\kappa_E(r)\,dr\,, \qquad (6.4.40)$$
given by Eq. (6.4.11) for a Coulomb potential with $\ell = 0$. Then the wave
function $u_E(r)$ that takes the form (6.4.20) for $R < r < b_E$ takes the following
form for $r > b_E$:
$$u_E(r) = -\frac{1}{\sqrt{k_E(r)}}\left\{\left[2C_+(E)e^{G(E)} + \xi(E)C_-(E)e^{-G(E)}\right]\cos\left(\int_{b_E}^{r}k_E(r')\,dr' - \frac{\pi}{4}\right)
+ C_-(E)e^{-G(E)}\cos\left(\int_{b_E}^{r}k_E(r')\,dr' + \frac{\pi}{4}\right)\right\}. \qquad (6.4.41)$$
Now we need to consider the normalization of the wave functions. Since the
Hamiltonian is Hermitian, and allowed values of energy $E$ form a continuum,
we know that wave functions with different energy are orthogonal, in the sense
that
$$\int_0^\infty u_E(r)\,u_{E'}(r)\,dr = \int_0^\infty R_E(r)\,R_{E'}(r)\,r^2\,dr = N^2(E)\,\delta(E - E')\,. \qquad (6.4.42)$$
The only question is, what is the coefficient $N^2(E)$? Once we know this, we can
define orthonormalized wave functions
$$u_E^{(N)}(r) \equiv N^{-1}(E)\,u_E(r)\,, \qquad (6.4.43)$$
for which
$$\int_0^\infty u_E^{(N)}(r)\,u_{E'}^{(N)}(r)\,dr = \delta(E - E')\,, \qquad (6.4.44)$$
20 Without explanation, Fermi in the reference in footnote 18 took $\xi = 0$. This is not justified, but as we
shall see it makes no difference in the decay rate.
(6.4.51)
For $r < R$ the wave function $u_E(r')$ is unaffected by the potential barrier,
and therefore (as shown for example in Eq. (6.4.9)) varies smoothly with $E$.
On the other hand, the term in the denominator proportional to $e^{2G(E)}$ makes
the integrand very small except very near the energies of unstable states, where
$C_+(E)$ vanishes. The integral is therefore dominated by values of $E$ very near
the energies $E_n$ at which $C_+(E)$ vanishes. These are the energies of nearly
stable states, so the wave functions $u_{E_n}(r)$ are approximate eigenfunctions of
the Hamiltonian, and therefore are approximately orthogonal, so in Eq. (6.4.51)
the integral $\int_0^R dr'\,u_{E_n}(r')\,u_{E_1}(r')$ is very small for $n \neq 1$. For $E$ very near
$E_1$, we can approximate $C_+(E) \to C_+'(E_1)(E - E_1)$. Since the contribution
$$\times\,\frac{e^{-iEt/\hbar}}{\left[2C_+'(E_1)\,e^{G(E_1)}(E - E_1) + \xi(E_1)C_-(E_1)\,e^{-G(E_1)}\right]^2 + C_-^2(E_1)\,e^{-2G(E_1)}}\,. \qquad (6.4.52)$$
For $t > 0$ the contour of integration over $E$ can be closed with a large semicircle
in the lower half of the complex plane, on which $e^{-iEt/\hbar}$ is exponentially small.
Since this contour is now closed clockwise, the integral is given by $-2i\pi$ times
the residue of the pole at $E = E_1 - i\,|C_-(E_1)/2C_+'(E_1)|\,e^{-2G(E_1)}$, where
6.5 Beta Decay
But there was a peculiar difference between the observed energies of the
photons emitted in atomic transitions and the electrons emitted in beta decay.
As we saw in Section 3.4, Bohr had realized in 1913 that a photon emitted
in any given atomic transition has a unique energy, given by the difference in
energies of the initial and final atomic states. Chadwick discovered in 1914
that the energies of electrons emitted in a beta transition between any specific
nuclear states do not have any one value, but occupy a range up to some def-
inite maximum. This might be explained if a photon is emitted along with the
electron, with the energy of the nuclear transition shared between the electron
and the photon in a proportion that varies from one decay event to another. The
electron energy would come close to a maximum value, equal to the energy
released in the nuclear transition, only when the photon happens to have very
low energy. If beta decay produced a photon along with the electron, then
when these decay products are caught in a surrounding medium the heat energy
given to the medium would be the same in each decay event, equal to the
energy difference of the initial and final nuclear states, and hence equal to the
maximum value observed for the electron energy in this decay. But experiments
in 1927 by C. D. Ellis (1895–1980) and W. A. Wooster (1903–1984) showed
that the average energy deposited in the medium surrounding the decaying
nucleus was not equal to the maximum energy of the electron, but instead to its
average, as if whatever energy was not carried by the electron was simply lost.
Bohr was even led by this to speculate that energy might not be conserved in
beta decays.
A different explanation was offered in 1930 by Pauli. He proposed that the
electron in beta decay is indeed accompanied by another particle that because
electrically neutral had escaped detection, but this neutral particle is not a pho-
ton. Rather, it is an extremely penetrating particle that is not captured in the
surrounding medium. The particle soon became known as a neutrino, symbol-
ized ν. The underlying reaction is n → p + e− + ν (where n and p stand for the
neutron and proton, and the electron is denoted e− , for a reason we will come
to presently). Among many other examples, this is responsible for the decay of
the ground state of boron 12 to carbon in the reaction 12 B → 12 C + e− + ν,
as well as for the decay of the free neutron, n → p + e− + ν. Since neutrons,
protons, and electrons have spin 1/2, angular momentum conservation requires
the neutrino to have a half-integer spin. It is in fact known to have spin 1/2.
There are also radioactive decays in which instead of an electron there is emitted
a positron, $e^+$, the electron’s antiparticle, with the same mass but opposite
electric charge.21 The conservation of energy forbids the process $p \to n + e^+ + \nu$
for free protons, but in nuclei this process can produce decays such as the beta
decay of the ground state of nitrogen 12, $^{12}\mathrm{N} \to {}^{12}\mathrm{C} + e^+ + \nu$.
In 1934 Fermi proposed a detailed theory of beta decay.22 In Fermi’s theory
the interaction Hamiltonian takes the form
$$H_\beta = (\hbar c)^3\,G_F\,\eta_{\mu\nu}\int d^3x\;V^\mu V^\nu + \text{c.c.}\,, \qquad (6.5.1)$$
where $G_F$ is a constant; $V^\mu$ and $V^\nu$ are operators with the same Lorentz and
space inversion transformation properties as the electric current $J^\mu$ and with
the dimensionality of densities (that is, inverse volumes); $V^\mu$ acts to change
neutrons to protons; $V^\nu$ acts to create electrons and neutrinos; and as usual
c.c. indicates the adjoint of the foregoing term. The factor $(\hbar c)^3$ is extracted
from $G_F$ for later convenience. As we will see in the appendix to Section 7.4
these currents are bilinear functions of Dirac fields, but we will not need that
information for our limited purposes here.
Fermi’s theory almost immediately needed modification. The three-vector
part of the current $V^\mu$ is odd under space inversion, so when acting on nuclear
states it gives a contribution proportional to nucleon velocities $\mathbf{v}$, and so is
suppressed by a factor of order $|\mathbf{v}|/c$, which is small in nuclei, as in atoms.
This leaves the time component $V^0$, which is even under space inversion and
is a rotational scalar. For decays that are not suppressed by a centrifugal barrier
there is no orbital angular momentum, so in these decays neither the parity nor
spin of the nuclear states can change. But many beta decays were observed
in which the spin of the nuclear state did change by one unit, and which yet
have allowed beta decays into the ground state of $^{12}$C, which has spin zero and
even parity.
21 The existence of the positron was anticipated in 1930 by P. A. M. Dirac, Proc. Roy. Soc. A126, 360 (1930).
He had developed a relativistic version of the Schrödinger equation, which turned out to have solutions
corresponding to states of negatively charged electrons with negative energy as well as states with positive
energy. Dirac’s interpretation was that these negative-energy states are normally filled, one electron to
each negative-energy state in accordance with the Pauli exclusion principle, but that occasionally there
appears a vacancy which we observe as a particle of positive charge and positive energy. At first Dirac
identified these holes as protons, but then in 1932 a positively charged particle was unexpectedly found in
cosmic rays by C. D. Anderson, Phys. Rev. 43, 491 (1932). (This article is included in Beyer, Foundations
of Nuclear Physics, listed in the bibliography.) The cloud chamber tracks of these particles were observed
to have the same curvature in a magnetic field as electron tracks, but in the opposite direction, consistent
with a particle having the same mass as an electron and a charge of equal magnitude but opposite sign. It
was widely supposed that these were Dirac’s holes.
The interpretation of positrons as vacancies in a sea of negative-energy electrons has largely been
abandoned. Dirac’s relativistic wave equation works only for particles of spin 1/2. This at first seemed
like a triumph because protons and electrons were known to have spin 1/2, but by now we know of
several particles of spin 0 and spin 1 (the $H^0$, $W^+$, $W^-$, and $Z^0$) that seem every bit as elementary as
the electron. Furthermore, the $W^+$ and $W^-$ are each other’s antiparticles, in the same sense as the $e^+$
and $e^-$. But these are bosons, which do not obey the exclusion principle and so could not form a stable
sea of negative-energy particles. As described in the appendix to Section 7.4, Dirac’s equation survives
as the field equation satisfied by the quantum field of particles of spin 1/2 but not, as Dirac thought, as a
relativistic version of a Schrödinger equation for a probability amplitude.
As explained in Section 7.4, we now understand as a consequence of Lorentz invariance and quantum
mechanics that for every species of particle, elementary or not, fermion or boson, there is a corresponding
species of antiparticle, with the same mass and spin but opposite electric charge. The only qualification is
that a few types of electrically neutral particles like the photon and the $Z^0$ are their own antiparticles.
22 E. Fermi, Zeit. Phys. 88, 161 (1934). This article is reprinted in Beyer, Foundations of Nuclear Physics,
listed in the bibliography. In his article Fermi cited an unpublished suggestion by Pauli that a neutral
weakly interacting particle was emitted along with electrons in beta decay.
In order to allow such decays, Fermi’s theory was modified by adding an
additional term to the interaction Hamiltonian:23
$$H_\beta = (\hbar c)^3\,G_F\,\eta_{\mu\nu}\int d^3x\,\left[V^\mu V^\nu + A^\mu A^\nu\right] + \text{c.c.}\,, \qquad (6.5.2)$$
23 G. Gamow and E. Teller, Phys. Rev. 49, 895 (1936). There were other possibilities involving scalar and
tensor operators that were not finally excluded by experimental data until the 1950s.
mass 80.4 GeV/$c^2$, and in part because the interactions that emit and absorb
$W^\pm$ particles are characterized by a small constant, of the same order as
the fine-structure constant $e^2/\hbar c \simeq 1/137$ of electrodynamics, which gives
$G_F^{-1/2} \approx m_W c^2 \times \sqrt{137} \approx 10^3$ GeV. A more accurate value along with a
more precise definition for $G_F$ will be given in the appendix to Section 7.4.)
The extremely low rate at which neutrinos were absorbed by the medium
surrounding the radioactive nuclei in the 1927 experiments of Ellis and Wooster
is due to the extreme weakness of interactions such as beta decay that involve
neutrinos. In general the rates of neutrino interaction processes are characterized
by the presence in the rate of the factor $G_F^2$, which is what makes them so weak.
For instance, the cross section for the neutrino reactions $\nu + p \to e^+ + n$ and
$\nu + n \to e^- + p$ (whether for a free proton or a proton or neutron inside a
nucleus) is proportional to $G_F^2$, and so, since it has the units of area, dimensional
analysis requires that at a neutrino energy $E$ considerably above $m_e c^2$ the cross
section takes the form
$$\sigma \approx (\hbar c E)^2\,G_F^2\,.$$
Recalling that $\hbar c = 197\ \text{MeV}\times10^{-13}$ cm, we see that for a relatively high-energy
beta decay neutrino with energy $E = 10$ MeV the cross section $\sigma$ is of
order $10^{-44}$ cm$^2$. In ordinary matter, with a number density $n$ of nucleons
of order $10^{24}$ cm$^{-3}$, this gives a mean free path $1/n\sigma \approx 10^{20}$ cm, or about
100 light years. It is no wonder that Ellis and Wooster did not detect energy
deposited by neutrinos in their experiment. There never was any hope of
detecting neutrinos from ordinary laboratory samples of radioactive material,
but nuclear reactors emit such enormous floods of neutrinos from the beta
decay of fission products that at last in 1956 Clyde Cowan, Jr. (1919–1974)
and Frederick Reines (1918–1998) were able to detect neutrinos produced at
the Savannah River reactor by detecting gamma rays from the annihilation of
positrons produced in the reaction $\nu + p \to n + e^+$.
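The order-of-magnitude arithmetic in this paragraph can be laid out as a short script. It is a sketch under assumptions: it uses the rough value $G_F^{-1/2} \approx 10^3$ GeV quoted earlier in this section rather than a measured Fermi constant, a nucleon density of $10^{24}$ cm$^{-3}$, and a rounded conversion for the light year; the function name is illustrative. Only the powers of ten are meaningful.

```python
HBAR_C_GEV_CM = 197e-3 * 1e-13    # hbar*c = 197 MeV x 1e-13 cm, expressed in GeV cm
G_F = 1.0 / (1e3) ** 2            # rough G_F in GeV^-2, from G_F^(-1/2) ~ 1e3 GeV
LIGHT_YEAR_CM = 9.46e17           # one light year in cm (assumed rounded value)

def cross_section_cm2(energy_gev):
    """Dimensional estimate sigma ~ (hbar c E)^2 G_F^2, in cm^2."""
    return (HBAR_C_GEV_CM * energy_gev) ** 2 * G_F ** 2

sigma = cross_section_cm2(0.010)              # E = 10 MeV
mean_free_path = 1.0 / (1e24 * sigma)         # n ~ 1e24 nucleons per cm^3
print(f"sigma ~ {sigma:.1e} cm^2")
print(f"mean free path ~ {mean_free_path:.1e} cm ~ {mean_free_path / LIGHT_YEAR_CM:.0f} light years")
```

The script reproduces the quoted scales, a cross section of order $10^{-44}$ cm$^2$ and a mean free path of tens to hundreds of light years, to within the accuracy one can expect of a dimensional estimate.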
All rates for processes involving neutrinos are suppressed by the factor G2F ,
and there are also reactions due to other weak interactions that do not involve
neutrinos but are similarly suppressed. Among these are the decays of a particle
called the K meson, with a mass of 495 MeV/c2 , into two-pion and three-pion
states, decays that are very slow compared with processes such as the decay
of the three–three resonance into a pion and a nucleon that occur through the
action of strong interactions.
There is another common feature of weak interaction processes, beyond their
weakness. They violate some of the symmetry principles obeyed by strong and
electromagnetic interactions. It appeared that the charged K meson decayed
both into two-pion states that are invariant under the space inversion transfor-
mation x → −x, and also into three-pion states that change sign under space
inversion, which would not be possible if the space inversion operator com-
mutes with the Hamiltonian. It was this that led Tsung-Dao Lee (1926– ) and
discovered tauon, with mass 1776.82 MeV/c2 . Both, like electrons, are emitted
by strongly interacting particles along with neutrinos, but these neutrinos are
not the same as the neutrinos emitted along with electrons or positrons in beta
decay. Rather, the neutrinos emitted in beta decay and along with the production
of muons and tauons are of three different types. For instance, the neutrino
emitted along with a muon in the decay of a charged pion can create another
muon in a reaction ν + n → p + μ− , but it cannot create an electron, and
the neutrino emitted along with an electron in beta decay can create another
electron in a reaction ν + n → p + e− , but even if its energy were high enough
it could not create a muon.
Except that, in a sense, it can. For years there was a mysterious deficiency in
the number of neutrinos observed to be coming from the Sun.28 These would
be electron-type neutrinos, created in reactions such as p + p → d + e+ + ν.
Bruno Pontecorvo (1919–1993) suggested29 that this is because neutrinos have
mass but the states with definite mass are not electron-type or muon-type or
tauon-type neutrinos. Rather, each of these is a superposition of neutrino states
of definite mass. According to this idea, the electron-type neutrinos emitted by
the Sun are superpositions of states of definite mass, which oscillate at different
rates on their way to the Earth, arriving as incoherent mixtures of neutrinos
of all three types. In the search for solar neutrinos the detectors were looking
for the reaction ν + 37 Cl → e− + 37 Ar, and were therefore sensitive only to
electron-type neutrinos, which according to Pontecorvo is why fewer neutrinos
were detected than would have been the case if neutrinos were massless, the
undetected neutrinos arriving as muon-type or tauon-type. This hypothesis was
confirmed when it became possible to detect solar neutrinos in the reaction ν +
d → ν + p + n, which is equally sensitive to neutrinos of all three types, and
the number seen was just what was expected. The existence of neutrino masses
has by now been convincingly confirmed in numerous terrestrial experiments,
which, although they have not yielded values for individual neutrino masses,
indicate that they are in the range of 0.01 to 0.1 eV/c2 .
When neutrinos were thought to have zero mass it was common to call the
particle emitted along with an electron an antineutrino, reserving the term neu-
trino for the particle emitted along with a positron. This was to preserve a widely
accepted conservation law, of a quantity known as lepton number, analogous to
baryon number. Electrons and neutrinos were supposed to have lepton number
+1; positrons and antineutrinos would have lepton number −1, while protons
and neutrons would have lepton number zero, so that lepton number would be
conserved in both kinds of beta decay. But it is not possible to attribute different
values for lepton number or any other conserved quantity to the neutral particles
emitted with electrons or positrons in beta decay if they are just different spin
states of the same particle.
28 J. N. Bahcall, Phys. Rev. Lett. 12, 300 (1964); Phys. Rev. 135, B137 (1964); R. Davis, Jr., Phys. Rev. Lett.
12, 303 (1964); R. Davis, Jr., D. S. Harmer, and K. C. Hoffmann, Phys. Rev. Lett. 26, 1205 (1968).
29 B. Pontecorvo, JETP 53, 1717 (1967).
Now that we know that neutrinos have mass, there are two widely considered
points of view regarding the nature of neutrinos and of lepton number and the
origin of neutrino masses.
First, in order to preserve the exact conservation of lepton number it would be
necessary to suppose that the neutrino fields of electron-type, muon-type, and
tauon-type with lepton number −1 each have distinct adjoints with lepton num-
ber +1. In this view it is the states of helicity +h̄/2 of the field with lepton
number −1 that have been observed to be emitted with electrons in beta decay,
and it is the states of helicity −h̄/2 of the adjoint field with lepton number +1
that have been observed to be emitted with positrons, while the other helicity
states of the two fields of each type exist but are so far unobserved. Neutrinos
of this description are often called Dirac neutrinos, because their fields are
described in the same way as in the description of electrons by Dirac, discussed
in the appendix to Section 7.4.
The other possibility, often associated with the name of Ettore Majorana
(1906–1938), is that lepton number is not conserved, and the three types of
neutral particles emitted with negative and positive leptons are states of the same
three spin 1/2 particles, which as a consequence of Eq. (6.5.4) are overwhelm-
ingly likely to be emitted with helicity h̄/2 when emitted with e− , μ− , τ − and
with helicity −h̄/2 when emitted with e+ , μ+ , τ + . We can then regard these
three neutral particles as their own antiparticles, like the photon or the π 0 .
For what it is worth, the Majorana alternative seems to me a more econom-
ical and plausible view, which is why in this section I have not distinguished
neutrinos and antineutrinos. In the Dirac case neutrinos get masses in much the
same way as the other leptons and the quarks of the Standard Model, so it is
mysterious why they are so light compared with other elementary particles. On
the other hand, the masses of Majorana neutrinos can only arise from effects at
very high energy, and are naturally in the observed range.
Fermi’s theory correctly described the probability distribution for the
energy of the electron or positron emitted in beta decay, a distribution that
was unaffected by the subsequent modifications in the interaction Hamiltonian
described above. With these modifications Fermi’s theory has survived as a cor-
rect approximate theory for nuclear beta decay. It was in fact the first successful
application of quantum field theory outside the context of electrodynamics.
7
Quantum Field Theory
wave functions for fields – they are functionals of the field, quantities that
depend on the value taken by the field at every point in space, equal to the
component of the state vector in a basis labeled by these field values. One still
sometimes hears talk of second quantization, but this idea is an obsolete
historical relic.
This Lagrangian is a functional rather than a function of ϕn (x, t) and ϕ̇n (x, t);
that is, it depends on the values of ϕn (x, t) and ϕ̇n (x, t) for all n and x at a given
time t. Therefore, where derivatives of L appear in the canonical formalism as
described in Section 5.7, they should now be interpreted as functional deriva-
tives. In general, the functional derivatives δF /δϕ and δF /δ ϕ̇ of any functional
F of ϕn (x, t) and ϕ̇n (x, t) at a fixed time t are defined by the prescription that the
effect of independent infinitesimal variations in the arguments of the functional
is given by
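$$F[\varphi(t) + \delta\varphi(t),\,\dot\varphi(t) + \delta\dot\varphi(t)]
\equiv F[\varphi(t),\dot\varphi(t)] + \sum_n\int d^3x\,\left[\frac{\delta F[\varphi(t),\dot\varphi(t)]}{\delta\varphi_n(\mathbf{x},t)}\,\delta\varphi_n(\mathbf{x},t) + \frac{\delta F[\varphi(t),\dot\varphi(t)]}{\delta\dot\varphi_n(\mathbf{x},t)}\,\delta\dot\varphi_n(\mathbf{x},t)\right]. \qquad (7.1.3)$$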
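$$\begin{aligned}
&= L[\varphi(t),\dot\varphi(t)] + \sum_n\int d^3x\,\Bigg\{\frac{\partial\mathcal{L}(\varphi_n(\mathbf{x},t),\nabla\varphi_n(\mathbf{x},t),\dot\varphi_n(\mathbf{x},t))}{\partial\varphi_n(\mathbf{x},t)}\,\delta\varphi_n(\mathbf{x},t) \\
&\qquad + \frac{\partial\mathcal{L}(\varphi_n(\mathbf{x},t),\nabla\varphi_n(\mathbf{x},t),\dot\varphi_n(\mathbf{x},t))}{\partial(\partial\varphi_n(\mathbf{x},t)/\partial x_i)}\,\delta\big(\partial\varphi_n(\mathbf{x},t)/\partial x_i\big)
+ \frac{\partial\mathcal{L}(\varphi_n(\mathbf{x},t),\nabla\varphi_n(\mathbf{x},t),\dot\varphi_n(\mathbf{x},t))}{\partial\dot\varphi_n(\mathbf{x},t)}\,\delta\dot\varphi_n(\mathbf{x},t)\Bigg\} \\
&= L[\varphi(t),\dot\varphi(t)] + \sum_n\int d^3x\,\Bigg\{\frac{\partial\mathcal{L}(\varphi_n(\mathbf{x},t),\nabla\varphi_n(\mathbf{x},t),\dot\varphi_n(\mathbf{x},t))}{\partial\varphi_n(\mathbf{x},t)}\,\delta\varphi_n(\mathbf{x},t) \\
&\qquad - \left[\frac{\partial}{\partial x_i}\frac{\partial\mathcal{L}(\varphi_n(\mathbf{x},t),\nabla\varphi_n(\mathbf{x},t),\dot\varphi_n(\mathbf{x},t))}{\partial(\partial\varphi_n(\mathbf{x},t)/\partial x_i)}\right]\delta\varphi_n(\mathbf{x},t)
+ \frac{\partial\mathcal{L}(\varphi_n(\mathbf{x},t),\nabla\varphi_n(\mathbf{x},t),\dot\varphi_n(\mathbf{x},t))}{\partial\dot\varphi_n(\mathbf{x},t)}\,\delta\dot\varphi_n(\mathbf{x},t)\Bigg\}
\end{aligned}$$
where as usual a repeated index $i$ is summed over the values 1, 2, 3. Comparing
this with the definition (7.1.3), we have
$$\frac{\delta L}{\delta\varphi_n(\mathbf{x},t)} = \frac{\partial\mathcal{L}}{\partial\varphi_n(\mathbf{x},t)} - \frac{\partial}{\partial x_i}\frac{\partial\mathcal{L}}{\partial(\partial\varphi_n(\mathbf{x},t)/\partial x_i)}\,, \qquad (7.1.4)$$
$$\frac{\delta L}{\delta\dot\varphi_n(\mathbf{x},t)} = \frac{\partial\mathcal{L}}{\partial\dot\varphi_n(\mathbf{x},t)}\,, \qquad (7.1.5)$$
in which to save writing we have dropped the arguments of $L$ and $\mathcal{L}$.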
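The meaning of Eq. (7.1.4) can be illustrated on a lattice, where the functional $L$ becomes an ordinary function of finitely many field values. The sketch below is an illustration under assumptions, not the construction used in the text: it takes a single field in one space dimension with periodic boundary conditions, a hypothetical density $\mathcal{L} = -\tfrac12(\partial\varphi/\partial x)^2 - \tfrac12\mu^2\varphi^2$ (the $\dot\varphi$ dependence is omitted because only $\delta L/\delta\varphi$ is tested), and arbitrary values for the lattice size and $\mu$.

```python
import math

N, LENGTH, MU = 200, 2 * math.pi, 1.5   # hypothetical lattice size, box length, mass parameter
A = LENGTH / N                           # lattice spacing (the "volume" of one cell)

def lagrangian(phi):
    """L = sum over cells of A * [-(1/2)(d phi/dx)^2 - (1/2) MU^2 phi^2]."""
    total = 0.0
    for j in range(N):
        grad = (phi[(j + 1) % N] - phi[j]) / A       # forward difference, periodic
        total += A * (-0.5 * grad ** 2 - 0.5 * MU ** 2 * phi[j] ** 2)
    return total

def functional_derivative(phi, j, eps=1e-4):
    """Change in L per unit (field change times cell volume) at site j, cf. Eq. (7.1.3)."""
    up, down = list(phi), list(phi)
    up[j] += eps
    down[j] -= eps
    return (lagrangian(up) - lagrangian(down)) / (2 * eps * A)

def euler_lagrange_side(phi, j):
    """Lattice form of the right-hand side of Eq. (7.1.4): -MU^2 phi + d^2 phi/dx^2."""
    laplacian = (phi[(j + 1) % N] - 2 * phi[j] + phi[j - 1]) / A ** 2
    return -MU ** 2 * phi[j] + laplacian

phi = [math.sin(3 * j * A) for j in range(N)]
j = 17
print(functional_derivative(phi, j), euler_lagrange_side(phi, j))
```

The two printed numbers should agree closely, and both approximate the continuum value $-(k^2 + \mu^2)\sin kx$ with $k = 3$; the gradient term contributes through the integration by parts that produces the minus sign in Eq. (7.1.4).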
Field Equations
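We take the derivatives of the Lagrangian in the equations of motion (5.7.12) to
be functional derivatives:
$$\frac{\partial}{\partial t}\frac{\delta L}{\delta\dot\varphi_n(\mathbf{x},t)} = \frac{\delta L}{\delta\varphi_n(\mathbf{x},t)}\,. \qquad (7.1.6)$$
Equations (7.1.4) and (7.1.5) then give the field equations
$$\frac{\partial}{\partial t}\frac{\partial\mathcal{L}}{\partial\dot\varphi_n(\mathbf{x},t)} = \frac{\partial\mathcal{L}}{\partial\varphi_n(\mathbf{x},t)} - \frac{\partial}{\partial x_i}\frac{\partial\mathcal{L}}{\partial(\partial\varphi_n(\mathbf{x},t)/\partial x_i)}\,. \qquad (7.1.7)$$
These are known as the Euler–Lagrange equations. We can put Eq. (7.1.7) into
a form that appears more consistent with Lorentz invariance:
$$\frac{\partial\mathcal{L}}{\partial\varphi_n(\mathbf{x},t)} = \frac{\partial}{\partial x^\mu}\frac{\partial\mathcal{L}}{\partial(\partial\varphi_n(\mathbf{x},t)/\partial x^\mu)}\,, \qquad (7.1.8)$$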
Commutation Relations
The field equations (7.1.8) could have been derived more easily by directly
requiring that the action (7.1.1) must be stationary with respect to arbitrary
infinitesimal variations of the ϕn (x, t) that vanish when |x| → ∞ or when
|t| → ∞. The calculation of functional derivatives is however important in
finding the commutation relations of the fields. The canonical conjugate πn (x, t)
to ψn (x, t) is defined as in Eq. (5.7.13) but with a functional derivative of L in
place of an ordinary derivative:
δL ∂L
πn (x, t) = = . (7.1.9)
δ ϕ̇n (x, t) ∂ ϕ̇n (x, t)
The canonical commutation relations (5.7.5) here read
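$$[\phi_n(\mathbf{x},t),\pi_m(\mathbf{y},t)] = i\hbar\,\delta_{nm}\,\delta^3(\mathbf{x}-\mathbf{y}),$$
$$[\phi_n(\mathbf{x},t),\phi_m(\mathbf{y},t)] = [\pi_n(\mathbf{x},t),\pi_m(\mathbf{y},t)] = 0. \tag{7.1.10}$$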
We will explore the consequences of these relations in the next section.
We next consider the simplest example of a quantum field theory, with a single
real scalar field ϕ(x), “free” in the sense that the field equations are linear.
Of course, we are really interested in what happens when fields interact, but,
as we will see in the next section, the first step in dealing with interacting fields
is to understand the content of the free-field theory.
We will take the Lagrangian density to have the form
$$\mathcal{L}_0(x) = -\frac{1}{2}\,\eta^{\mu\nu}\frac{\partial\phi(x)}{\partial x^\mu}\frac{\partial\phi(x)}{\partial x^\nu} - \frac{m^2c^2}{2\hbar^2}\,\phi^2, \tag{7.2.1}$$
the justification being that, as we shall see, this gives a sensible theory of
free spinless particles of mass $m$. (We are using the conventions described in
Chapter 4, with $x^0 = ct$; repeated indices are summed, with $\eta_{\mu\nu} = +1$ for
$\mu = \nu = 1, 2, 3$, $\eta_{\mu\nu} = -1$ for $\mu = \nu = 0$, and $\eta_{\mu\nu} = 0$ otherwise. This makes
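or more simply
$$\left(\Box - m^2c^2/\hbar^2\right)\phi = 0, \tag{7.2.2}$$
where
$$\Box \equiv \eta^{\mu\nu}\frac{\partial}{\partial x^\mu}\frac{\partial}{\partial x^\nu} = \nabla^2 - \frac{1}{c^2}\frac{\partial^2}{\partial t^2}. \tag{7.2.3}$$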
The general real solutions of Eq. (7.2.2) are of the form
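$$\phi(\mathbf{x},t) = \int d^3p\,\left[A(\mathbf{p})\exp\left(\frac{i}{\hbar}\big(\mathbf{p}\cdot\mathbf{x} - E(\mathbf{p})t\big)\right) + A^\dagger(\mathbf{p})\exp\left(\frac{-i}{\hbar}\big(\mathbf{p}\cdot\mathbf{x} - E(\mathbf{p})t\big)\right)\right] \tag{7.2.4}$$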
where $E(\mathbf{p}) = \sqrt{c^2\mathbf{p}^2 + m^2c^4}$, and the coefficients $A(\mathbf{p})$ and $A^\dagger(\mathbf{p})$ are
spacetime-independent operators whose properties are to be determined from
the canonical commutation relations.
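As a quick numerical check of the statement that each plane wave in Eq. (7.2.4) solves the field equation (7.2.2) when $E(\mathbf{p}) = \sqrt{c^2\mathbf{p}^2 + m^2c^4}$, the following minimal sketch evaluates the factor that the operator $\Box - m^2c^2/\hbar^2$ brings down from $\exp(i(\mathbf{p}\cdot\mathbf{x} - Et)/\hbar)$. The function names and the sample values of $\mathbf{p}$, $m$, $c$, and $\hbar$ are illustrative assumptions, not taken from the text.

```python
import math

def energy(p, m, c=1.0):
    """E(p) = sqrt(c^2 |p|^2 + m^2 c^4) for a three-momentum p given as a tuple."""
    p2 = sum(pi * pi for pi in p)
    return math.sqrt(c * c * p2 + m * m * c ** 4)

def klein_gordon_residual(p, m, c=1.0, hbar=1.0):
    """Factor multiplying exp(i(p.x - E t)/hbar) after applying (box - m^2 c^2/hbar^2)."""
    p2 = sum(pi * pi for pi in p)
    e = energy(p, m, c)
    laplacian = -p2 / hbar ** 2                      # nabla^2 brings down (i p / hbar)^2
    time_part = -(1.0 / c ** 2) * (-(e / hbar) ** 2)  # -(1/c^2) d^2/dt^2 brings down +(E / (c hbar))^2
    return laplacian + time_part - (m * c / hbar) ** 2

# Arbitrary sample values (hypothetical units); the residual should vanish up to rounding.
residual = klein_gordon_residual((0.3, -1.2, 2.0), m=0.5, c=2.0, hbar=0.7)
print(residual)
```

The residual vanishes only for this particular $E(\mathbf{p})$; replacing it by any other function of $\mathbf{p}$ in `klein_gordon_residual` leaves a non-zero remainder.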
The canonical conjugate to ϕ is here
$$\pi(\mathbf{x},t) = \frac{\partial\mathcal{L}_0}{\partial\dot\phi(\mathbf{x},t)} = \frac{1}{c^2}\,\dot\phi(\mathbf{x},t). \tag{7.2.5}$$
Then
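$$
\begin{aligned}
[\phi(\mathbf{x},t),\pi(\mathbf{y},t)] = \int d^3p\int d^3p'\,\big(-iE(\mathbf{p}')/c^2\hbar\big)
\Bigg[ & A(\mathbf{p})\exp\left(\frac{i}{\hbar}\big(\mathbf{p}\cdot\mathbf{x} - E(\mathbf{p})t\big)\right) + A^\dagger(\mathbf{p})\exp\left(\frac{-i}{\hbar}\big(\mathbf{p}\cdot\mathbf{x} - E(\mathbf{p})t\big)\right), \\
& A(\mathbf{p}')\exp\left(\frac{i}{\hbar}\big(\mathbf{p}'\cdot\mathbf{y} - E(\mathbf{p}')t\big)\right) - A^\dagger(\mathbf{p}')\exp\left(\frac{-i}{\hbar}\big(\mathbf{p}'\cdot\mathbf{y} - E(\mathbf{p}')t\big)\right)\Bigg].
\end{aligned}
$$
Terms in the integrand that are proportional to the product $\exp(-iE(\mathbf{p})t/\hbar)\exp(-iE(\mathbf{p}')t/\hbar)$
or to the product $\exp(iE(\mathbf{p})t/\hbar)\exp(iE(\mathbf{p}')t/\hbar)$ would make
different time-dependent contributions to the integral for any values of $\mathbf{p}$ and $\mathbf{p}'$,
so since the canonical commutation rules give a time-independent commutator,
we must have
$$[A(\mathbf{p}),A(\mathbf{p}')] = [A^\dagger(\mathbf{p}),A^\dagger(\mathbf{p}')] = 0. \tag{7.2.6}$$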
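The commutator is then
$$
\begin{aligned}
[\phi(\mathbf{x},t),\pi(\mathbf{y},t)] = \int d^3p\int d^3p'\,\big(-iE(\mathbf{p}')/c^2\hbar\big)
\Bigg( & -[A(\mathbf{p}),A^\dagger(\mathbf{p}')]\exp\left(\frac{i}{\hbar}\big(\mathbf{p}\cdot\mathbf{x} - E(\mathbf{p})t\big)\right)\exp\left(\frac{-i}{\hbar}\big(\mathbf{p}'\cdot\mathbf{y} - E(\mathbf{p}')t\big)\right) \\
& - [A(\mathbf{p}'),A^\dagger(\mathbf{p})]\exp\left(\frac{i}{\hbar}\big(\mathbf{p}'\cdot\mathbf{y} - E(\mathbf{p}')t\big)\right)\exp\left(\frac{-i}{\hbar}\big(\mathbf{p}\cdot\mathbf{x} - E(\mathbf{p})t\big)\right)\Bigg).
\end{aligned}
$$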
Since this commutator has to vanish, we must have f (p) = f (−p), so we must
take
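$$f(\mathbf{p}) = f(-\mathbf{p}) = \frac{c^2\hbar^2}{(2\pi\hbar)^3\,2E(\mathbf{p})}.$$
It is therefore convenient to define $a(\mathbf{p}) \equiv A(\mathbf{p})/\sqrt{f(\mathbf{p})}$, so that
$$[a(\mathbf{p}),a^\dagger(\mathbf{p}')] = \delta^3(\mathbf{p}-\mathbf{p}') \tag{7.2.7}$$
and
$$\phi(\mathbf{x},t) = \hbar c\int\frac{d^3p}{\sqrt{2E(\mathbf{p})}\,(2\pi\hbar)^{3/2}}\left[a(\mathbf{p})\exp\left(\frac{i}{\hbar}\big(\mathbf{p}\cdot\mathbf{x} - E(\mathbf{p})t\big)\right) + a^\dagger(\mathbf{p})\exp\left(\frac{-i}{\hbar}\big(\mathbf{p}\cdot\mathbf{x} - E(\mathbf{p})t\big)\right)\right]. \tag{7.2.8}$$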
The operators a(p) and a † (p) are analogous to the operators ai and ai† intro-
duced in our discussion of the harmonic oscillator Hamiltonian in Section 6.3
but with a continuum momentum argument in place of the three-valued index i
and a delta function instead of a Kronecker delta.
The Hamiltonian for the free scalar field is given by
$$H_0 = \int d^3x\,[\pi\dot\phi - \mathcal{L}_0] = \frac{1}{2}\int d^3x\left[\frac{1}{c^2}\dot\phi^2 + (\nabla\phi)^2 + \frac{m^2c^2}{\hbar^2}\phi^2\right].$$
Since this is quadratic in the field ϕ, when we insert the expression (7.2.8) for
ϕ in H0 , we encounter a double integral over momentum. The integral over x
yields (2π h̄)3 factors times momentum delta functions that reduce this to a sin-
gle momentum integral. The time-dependent terms in the integrand proportional
Then
$$H_0\,a(\mathbf{p})\psi = [H_0,a(\mathbf{p})]\psi + a(\mathbf{p})H_0\psi = \big(E_\psi - E(\mathbf{p})\big)\,a(\mathbf{p})\psi;$$
so if $a(\mathbf{p})\psi$ does not vanish, it is the wave function for a state with energy
$E_\psi - E(\mathbf{p})$. Likewise, this is a state with momentum $\mathbf{p}_\psi - \mathbf{p}$, while $a^\dagger(\mathbf{p})\psi$
is the wave function for a state with energy $E_\psi + E(\mathbf{p})$ and total momentum
$\mathbf{p}_\psi + \mathbf{p}$. In other words, $a(\mathbf{p})$ and $a^\dagger(\mathbf{p})$ respectively annihilate and create a
particle of momentum $\mathbf{p}$. This is what we mean when we refer to elementary particles
being bundles of the energy and momentum in some field.
At this point, and for the rest of this chapter, we will abandon the language of
wave mechanics, and instead employ the more abstract language of state vectors
and scalar products that was outlined in Section 5.10. In quantum field theory
the wave function of any state such as the vacuum is a complicated functional of
the fields, and the action of operators like a(p) on these wave functions involves
functional derivatives with respect to these fields. None of these complications
plays a role in most calculations. What we use instead are the properties of
operators, such as the field equations and the canonical commutation relations,
and limited assumptions about physical states.
In particular, it is a plausible physical assumption that there should exist a
physical state, the vacuum $\Psi_{\rm vac}$, with the lowest possible energy. Then $a(\mathbf{p})\Psi_{\rm vac}$
must vanish,
$$a(\mathbf{p})\Psi_{\rm vac} = 0, \tag{7.2.13}$$
since otherwise it would be a state with energy less by an amount $E(\mathbf{p})$.
To calculate the energy and momentum of the vacuum, it is convenient to use
the commutator of $a$ with $a^\dagger$ to rewrite Eqs. (7.2.9) and (7.2.10) as
$$H_0 = \int d^3p\,E(\mathbf{p})\,a^\dagger(\mathbf{p})a(\mathbf{p}) + E_{\rm vac}, \tag{7.2.14}$$
$$\mathbf{P} = \int d^3p\,\mathbf{p}\,a^\dagger(\mathbf{p})a(\mathbf{p}), \tag{7.2.15}$$
1 For a textbook discussion and references to the original literature, see S. Weinberg, Cosmology (Oxford
University Press, Oxford, 2008), Sections 1.4 and 1.5.
7.3 Interactions
2 R. P. Feynman, Rev. Mod. Phys. 20, 367 (1948); Phys. Rev. 74, 939, 1430 (1948); ibid., 76, 749, 769
(1949); ibid., 80, 440 (1950).
3 J. Schwinger, Phys. Rev. 74, 1439 (1948); ibid., 75, 651 (1949); ibid., 76, 790 (1949); ibid., 82, 664, 914
(1951); ibid., 91, 713 (1953); Proc. Nat. Acad. Sci. 37, 452 (1951).
4 S. Tomonaga, Prog. Theor. Phys. 1, 27 (1946); Z. Koba, T. Tati, and S. Tomonaga, ibid.,
2, 101 (1947); S. Kanesawa and S. Tomonaga, ibid., 3, 1, 101 (1948); S. Tomonaga, Phys. Rev. 74, 224
(1948); D. Ito, Z. Koba, and S. Tomonaga, Prog. Theor. Phys. 3, 276 (1948); Z. Koba and S. Tomonaga,
ibid., 3, 290 (1948).
5 F. J. Dyson, Phys. Rev. 75, 486, 1736 (1949).
where g(α) is a smooth function of the momenta of all the particles in state α,
introduced to give meaning to the limit for t → −∞. (Recall that the label α is
intended to include the momenta, spin 3-components or helicities, and species
labels of all the particles in the state α, and an integral over α is intended to
include integrals over all momenta and sums over all spin 3-components or
helicities and species labels.) We also showed that at very late times the same
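state $\Psi_\alpha$ looks like the superposition $\int d\beta\,S_{\beta\alpha}\Phi_\beta$, in the sense of Eq. (3.6.35):
for $t \to +\infty$,
$$\int g(\alpha)\,e^{-iHt}\,\Psi_\alpha\,d\alpha \to \int g(\alpha)\,e^{-iH_0t}\,d\alpha\int d\beta\,S_{\beta\alpha}\Phi_\beta.$$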
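where
$$\Omega(t) \equiv e^{iHt}e^{-iH_0t}.$$
From the two equalities we can conclude that
$$S_{\beta\alpha} = \big(\Phi_\beta,\,\Omega^{-1}(+\infty)\Omega(-\infty)\Phi_\alpha\big) = \big(\Phi_\beta,\,U(+\infty,-\infty)\Phi_\alpha\big), \tag{7.3.1}$$
where
$$U(t,t_0) \equiv \Omega^{-1}(t)\Omega(t_0) = e^{iH_0t}e^{-iH(t-t_0)}e^{-iH_0t_0}. \tag{7.3.2}$$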
The justification (such as it is) for treating $\Omega(t)$ as if it had well-defined limits
for t → ±∞ is that at very early and very late times the incoming and outgoing
particles are so far apart that the interaction H − H0 is ineffective. As we shall
see, at least in perturbation theory the limits t → ±∞ do lead to well-defined
probability amplitudes.
where
$$T\{H(t_1)H(t_2)\} = \begin{cases} H(t_1)H(t_2) & t_1 > t_2 \\ H(t_2)H(t_1) & t_2 > t_1. \end{cases}$$
The complete sum is then
$$U(t,t_0) = 1 + \sum_{n=1}^{\infty}\frac{(-i)^n}{n!}\int_{t_0}^{t}dt_1\cdots\int_{t_0}^{t}dt_n\,T\{H(t_1)\cdots H(t_n)\}. \tag{7.3.7}$$
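To make the time-ordering prescription concrete, here is a minimal sketch. The first function orders labeled factors so that later times stand to the left, as in the definition of $T\{\cdots\}$ above. The second sums the series (7.3.7) in the special case, assumed only for illustration, in which the operators at different times commute and can be replaced by ordinary numbers, so that time ordering has no effect and the series reduces to $\exp(-i\int H\,dt)$. The labels and the value `0.8` are hypothetical.

```python
import cmath
import math

def time_order(factors):
    """Order (label, time) pairs with later times to the left.

    The labels are hypothetical stand-ins for operators; only their order matters here.
    """
    return sorted(factors, key=lambda f: f[1], reverse=True)

def dyson_series_commuting(integral, n_max=30):
    """Sum of 1 + sum_n (-i)^n / n! * I^n, where I is the time integral of a c-number H(t).

    For commuting factors the n-fold time-ordered integral is just I^n.
    """
    return 1 + sum((-1j) ** n / math.factorial(n) * integral ** n
                   for n in range(1, n_max + 1))

ordered = time_order([("H(t1)", 0.5), ("H(t2)", 2.0), ("H(t3)", 1.0)])
series = dyson_series_commuting(0.8)
closed_form = cmath.exp(-0.8j)
print([label for label, _ in ordered], abs(series - closed_form))
```

For genuinely non-commuting operators the ordering step is essential, and the sum is no longer a simple exponential.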
Lorentz Invariance
The remaining problem with Lorentz invariance is that the integrand is still
time-ordered. As we saw in Section 4.7, the ordering in time of two events at
spacetime positions x1 and x2 is Lorentz invariant if the separation x1 − x2 is
time-like or light-like – that is (using units with c = 1), if
$$\eta_{\mu\nu}(x_1 - x_2)^\mu(x_1 - x_2)^\nu = (\mathbf{x}_1 - \mathbf{x}_2)^2 - (t_1 - t_2)^2 \le 0.$$
Thus to make the scattering operator Lorentz invariant we need the densities to
commute at space-like separations:
$$[\mathcal{H}(x_1),\mathcal{H}(x_2)] = 0 \quad\text{for}\quad \eta_{\mu\nu}(x_1 - x_2)^\mu(x_1 - x_2)^\nu > 0. \tag{7.3.10}$$
The vanishing of this commutator tells us that there is no obstacle to finding
states that are eigenstates of both $\mathcal{H}(x_1)$ and $\mathcal{H}(x_2)$, which can also be justified
on grounds of causality since for $x_1 - x_2$ space-like no signal could travel from
a measurement of $\mathcal{H}$ at $x_1$ to interfere with a measurement of $\mathcal{H}$ at $x_2$.
Any space-like separation $x_1 - x_2$ can be obtained from a purely spatial
separation with $t_1 = t_2$ by a Lorentz transformation, so as long as $\mathcal{H}(x)$ is a
scalar, the necessary and sufficient condition for (7.3.10) is that the commutator
should vanish at equal times:
$$[\mathcal{H}(\mathbf{x}_1,t),\mathcal{H}(\mathbf{x}_2,t)] = 0. \tag{7.3.11}$$
The scalar field $\phi(\mathbf{x},t)$ introduced in the previous section satisfies the commutation
relation $[\phi(\mathbf{x}_1,t),\phi(\mathbf{x}_2,t)] = 0$ for any positions $\mathbf{x}_1$ and $\mathbf{x}_2$, so an interaction
Hamiltonian density $\mathcal{H}$ constructed as any polynomial function of $\phi$ will satisfy
Eq. (7.3.11). As we shall see in the next section, the condition (7.3.11) is not
so easy to satisfy in more general theories, and this leads to the necessity of
antiparticles.
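The role of space-like separations in this argument can be illustrated with a short sketch in one space dimension, in units with $c = 1$. It evaluates the invariant interval and the time difference seen after a boost, $\gamma(\Delta t - v\,\Delta x)$, for a sample time-like and a sample space-like separation. The function names and sample separations are assumptions made for the illustration.

```python
import math

def interval(dx, dt):
    """eta_{mu nu} dx^mu dx^nu for a separation along one axis (c = 1)."""
    return dx * dx - dt * dt

def boosted_dt(dx, dt, v):
    """Time difference after a boost with velocity v (|v| < 1) along the same axis."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return gamma * (dt - v * dx)

timelike = (1.0, 2.0)    # (dx, dt): interval is negative
spacelike = (2.0, 1.0)   # (dx, dt): interval is positive
velocities = [-0.9, -0.5, 0.0, 0.5, 0.9]

# Collect the signs of the boosted time difference over the sample velocities.
timelike_signs = {boosted_dt(*timelike, v) > 0 for v in velocities}
spacelike_signs = {boosted_dt(*spacelike, v) > 0 for v in velocities}
print(timelike_signs, spacelike_signs)
```

For the time-like pair the time order is the same in every frame sampled, while for the space-like pair it can be reversed by a boost, which is why the time-ordered product is only Lorentz invariant if the densities commute at space-like separation.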
Example: Scattering
To make all this concrete, let us calculate the lowest-order amplitude for scat-
tering of a pair of particles in the theory of real scalar fields, with H0 the
free-particle Hamiltonian described in the previous section, and with the simple
interaction Hamiltonian density
$$\mathcal{H} = \frac{g}{6}\,\phi^3, \tag{7.3.12}$$
with g a constant taken small enough to justify the use of perturbation theory.6
(The factor 1/6 is inserted for later convenience.) To lowest order in g, the
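S-matrix element for particles with momenta $\mathbf{p}_1$ and $\mathbf{p}_2$ to scatter, with momenta
changing to $\mathbf{p}'_1$ and $\mathbf{p}'_2$, is
$$S_{\mathbf{p}'_1\mathbf{p}'_2,\,\mathbf{p}_1\mathbf{p}_2} = -\frac{1}{2}\left(\frac{g}{6}\right)^2\int d^4x\,d^4y\,\big(\Phi_{\mathbf{p}'_1\mathbf{p}'_2},\,T\{\phi^3(x),\phi^3(y)\}\,\Phi_{\mathbf{p}_1\mathbf{p}_2}\big), \tag{7.3.13}$$
where
$$\Phi_{\mathbf{p}_1\mathbf{p}_2} \equiv \frac{1}{\sqrt{2}}\,a^\dagger(\mathbf{p}_1)\,a^\dagger(\mathbf{p}_2)\,\Phi_0, \tag{7.3.14}$$
with $\Phi_0$ the free-particle vacuum state. The factor $1/\sqrt{2}$ is included to compensate
for the sum of two delta functions in the scalar product; using Eqs. (7.2.7)
and (7.2.13),
$$\big(\Phi_{\mathbf{p}'_1\mathbf{p}'_2},\Phi_{\mathbf{p}_1\mathbf{p}_2}\big) = \frac{1}{2}\left[\delta^3(\mathbf{p}_1 - \mathbf{p}'_1)\,\delta^3(\mathbf{p}_2 - \mathbf{p}'_2) + \delta^3(\mathbf{p}_1 - \mathbf{p}'_2)\,\delta^3(\mathbf{p}_2 - \mathbf{p}'_1)\right]. \tag{7.3.15}$$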
There is no term in this S-matrix element that is of first order in g, because
there are not enough creation and annihilation operators in a single ϕ 3 operator
to destroy the two initial particles and create the two final particles.
Our strategy in calculating the scattering amplitude (7.3.13) will be to move
a pair of the annihilation operators in $\phi^3(x)$ and/or $\phi^3(y)$ past the creation
operators in $\Phi_{\mathbf{p}_1\mathbf{p}_2}$, which gives a pair of commutators of annihilation with creation
operators, and use the fact that annihilation operators give zero when acting
on $\Phi_0$; also, to move a pair of the creation operators in $\phi^3(x)$ and/or $\phi^3(y)$
to the left side of the scalar product, so that their adjoints act as annihilation
operators on $\Phi_{\mathbf{p}'_1\mathbf{p}'_2}$, and then move them past the creation operators in $\Phi_{\mathbf{p}'_1\mathbf{p}'_2}$,
giving another pair of commutators of annihilation with creation operators and
again using the fact that annihilation operators give zero when acting on $\Phi_0$.$^{7}$
6 This theory is actually unphysical, because H → −∞ if g > 0 and ϕ → −∞ or if g < 0 and ϕ → +∞.
This problem does not emerge in perturbation theory, and in any case can be dealt with by adding higher
even powers of ϕ with positive coefficients in H .
7 In moving these annihilation operators out of the time-ordered product to the right or their adjoints to
the left, we are ignoring their commutators with the other fields in $\phi^3(x)$ and $\phi^3(y)$, because these terms
involve momentum delta functions that vanish if we assume that neither $\mathbf{p}'_1$ nor $\mathbf{p}'_2$ equals either $\mathbf{p}_1$ or $\mathbf{p}_2$.
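$$= \frac{-g^2}{2(2\pi)^6\sqrt{2E(\mathbf{p}'_1)\cdot 2E(\mathbf{p}'_2)\cdot 2E(\mathbf{p}_1)\cdot 2E(\mathbf{p}_2)}}\int d^4x\,d^4y\;e^{-ip'_1\cdot x}\,e^{-ip'_2\cdot y}\,e^{ip_1\cdot x}\,e^{ip_2\cdot y}\,\big(\Phi_0,\,T\{\phi(x),\phi(y)\}\,\Phi_0\big). \tag{7.3.19}$$
$$
\begin{aligned}
S^{(b)}_{\mathbf{p}'_1\mathbf{p}'_2,\,\mathbf{p}_1\mathbf{p}_2} &= -\frac{g^2}{2}\int d^4x\,d^4y\,\Big([\phi_{\rm an}(y),a^\dagger(\mathbf{p}'_1)][\phi_{\rm an}(x),a^\dagger(\mathbf{p}'_2)]\Phi_0,\;T\{\phi(x),\phi(y)\}\times[\phi_{\rm an}(x),a^\dagger(\mathbf{p}_1)][\phi_{\rm an}(y),a^\dagger(\mathbf{p}_2)]\Phi_0\Big) \\
&= \frac{-g^2}{2(2\pi)^6\sqrt{2E(\mathbf{p}'_1)\cdot 2E(\mathbf{p}'_2)\cdot 2E(\mathbf{p}_1)\cdot 2E(\mathbf{p}_2)}}\int d^4x\,d^4y\;e^{-ip'_1\cdot y}\,e^{-ip'_2\cdot x}\,e^{ip_1\cdot x}\,e^{ip_2\cdot y}\,\big(\Phi_0,\,T\{\phi(x),\phi(y)\}\,\Phi_0\big).
\end{aligned}
\tag{7.3.20}
$$
$$
\begin{aligned}
S^{(c)}_{\mathbf{p}'_1\mathbf{p}'_2,\,\mathbf{p}_1\mathbf{p}_2} &= -\frac{g^2}{2}\int d^4x\,d^4y\,\Big([\phi_{\rm an}(y),a^\dagger(\mathbf{p}'_1)][\phi_{\rm an}(y),a^\dagger(\mathbf{p}'_2)]\Phi_0,\;T\{\phi(x),\phi(y)\}\times[\phi_{\rm an}(x),a^\dagger(\mathbf{p}_1)][\phi_{\rm an}(x),a^\dagger(\mathbf{p}_2)]\Phi_0\Big) \\
&= \frac{-g^2}{2(2\pi)^6\sqrt{2E(\mathbf{p}'_1)\cdot 2E(\mathbf{p}'_2)\cdot 2E(\mathbf{p}_1)\cdot 2E(\mathbf{p}_2)}}\int d^4x\,d^4y\;e^{-ip'_1\cdot y}\,e^{-ip'_2\cdot y}\,e^{ip_1\cdot x}\,e^{ip_2\cdot x}\,\big(\Phi_0,\,T\{\phi(x),\phi(y)\}\,\Phi_0\big).
\end{aligned}
\tag{7.3.21}
$$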
These three contributions are symbolized in three of what are known as Feyn-
man diagrams, shown here in Figure 7.1.
[Figure 7.1: three diagrams, panels (a), (b), and (c), each with incoming lines labeled 1 and 2 and outgoing lines labeled 1′ and 2′.]
Figure 7.1 Feynman diagrams for the scattering of neutral scalar particles.
Here the lines coming into the diagrams from below or going out from the
diagrams above represent particles in initial and final states, respectively; the
vertices represent an interaction and are proportional to $\phi^3$; the line connecting
vertices represents the propagator.
$$\times\left[\frac{1}{(p_1 - p'_1)^2 + m^2} + \frac{1}{(p_1 - p'_2)^2 + m^2} + \frac{1}{(p_1 + p_2)^2 + m^2}\right]. \tag{7.3.27}$$
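The three denominators in Eq. (7.3.27) can be evaluated for given on-shell four-momenta, using the metric convention of this chapter, $p^2 = \mathbf{p}^2 - (p^0)^2$, in units with $\hbar = c = 1$. The sketch below does this for a center-of-mass configuration chosen purely for illustration ($m = 1$, $|\mathbf{k}| = 0.5$, scattering angle $\pi/2$); the helper names are hypothetical. It also checks the standard kinematic identity $s + t + u = 4m^2$, where $s = -(p_1 + p_2)^2$, $t = -(p_1 - p'_1)^2$, $u = -(p_1 - p'_2)^2$.

```python
import math

def minkowski_sq(p):
    """p^2 = |p|^2 - (p^0)^2 for p = (p0, p1, p2, p3), with c = hbar = 1."""
    return p[1] ** 2 + p[2] ** 2 + p[3] ** 2 - p[0] ** 2

def on_shell(three_momentum, m):
    """Four-momentum (E, p) with E = sqrt(|p|^2 + m^2)."""
    e = math.sqrt(sum(x * x for x in three_momentum) + m * m)
    return (e, *three_momentum)

def combine(p, q, sign=1):
    """Componentwise p + sign * q."""
    return tuple(a + sign * b for a, b in zip(p, q))

def propagator_sum(p1, p2, p1f, p2f, m):
    """Bracket of Eq. (7.3.27): one term for each diagram of Figure 7.1."""
    return (1 / (minkowski_sq(combine(p1, p1f, -1)) + m * m)
            + 1 / (minkowski_sq(combine(p1, p2f, -1)) + m * m)
            + 1 / (minkowski_sq(combine(p1, p2)) + m * m))

m, k, theta = 1.0, 0.5, math.pi / 2   # illustrative values only
p1 = on_shell((0.0, 0.0, k), m)
p2 = on_shell((0.0, 0.0, -k), m)
p1f = on_shell((k * math.sin(theta), 0.0, k * math.cos(theta)), m)
p2f = on_shell((-k * math.sin(theta), 0.0, -k * math.cos(theta)), m)

s = -minkowski_sq(combine(p1, p2))
t = -minkowski_sq(combine(p1, p1f, -1))
u = -minkowski_sq(combine(p1, p2f, -1))
bracket = propagator_sum(p1, p2, p1f, p2f, m)
print(s, t, u, bracket)
```

In this configuration the first two terms are equal, since the two final particles are emitted symmetrically about the beam direction.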
8 The circumstance that all four-momenta are fixed by the delta functions generated by integrals over
spacetime coordinates is true of all tree diagrams – that is, diagrams like Figure 7.1 that can be
disconnected by cutting any single internal line. The contributions to the S-matrix of diagrams with L loops,
whose disconnection requires the cutting of a minimum of L + 1 internal lines, involve integrals over L
four-momenta.
The appearance here of the term $1/[(p_1 - p'_1)^2 + m^2]$ may evoke the recollection
of an earlier result. In Eq. (5.6.23) we found in the Born approximation
that a potential proportional to $\exp(-\kappa r)/r$ gives a scattering amplitude
proportional to $1/[(\mathbf{k} - \mathbf{k}')^2 + \kappa^2]$, where $\mathbf{k}$ and $\mathbf{k}'$ are the initial and final wave
numbers of the scattered particle, or, in units with $\hbar = 1$, the initial and final
momenta. There is no energy term in the denominator in Eq. (5.6.23) because
the scattering was supposed there to be due to an external potential that can
transfer momentum but not energy. Aside from that, the comparison shows
that the exchange of a scalar particle of mass m creates effects, like those of
a Yukawa potential, proportional to exp(−κr)/r, with κ = m in natural units
or, in cgs units, with κ = mc/h̄. This was the point made by Yukawa9 in 1935,
which led him to the prediction of a “meson,” with mass intermediate between
the electron and the proton, to carry the nuclear force.
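The relation $\kappa = mc/\hbar$ fixes the range $1/\kappa$ of the force in terms of the mass of the exchanged particle. As a rough illustration, the sketch below converts a rest energy in MeV into a range in femtometres, using the rounded value $\hbar c \approx 197.327$ MeV fm. The charged-pion rest energy of about 139.57 MeV used in the example is an input supplied here, not a number taken from the text.

```python
import math

HBAR_C_MEV_FM = 197.327  # hbar * c in MeV fm (rounded)

def yukawa_range_fm(rest_energy_mev):
    """Range 1/kappa = hbar / (m c) in fm, for an exchanged particle of rest energy m c^2 in MeV."""
    return HBAR_C_MEV_FM / rest_energy_mev

def yukawa_shape(r_fm, rest_energy_mev):
    """exp(-kappa r) / r with r in fm; the overall strength is omitted."""
    kappa = rest_energy_mev / HBAR_C_MEV_FM
    return math.exp(-kappa * r_fm) / r_fm

pion_range = yukawa_range_fm(139.57)
# Suppression of the Yukawa form relative to a pure 1/r potential at r = 3 fm.
suppression = yukawa_shape(3.0, 139.57) * 3.0
print(pion_range, suppression)
```

A range of order one femtometre is comparable to nuclear sizes, which is the sense in which a particle of this mass can carry the nuclear force.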
The real scalar field discussed in the previous two sections could not describe a
particle that carries any conserved quantity, such as electric charge. If the
annihilation part $\phi_{\rm an}(x)$ of the field given by Eq. (7.3.17) destroys a certain amount
of charge then its adjoint $\phi_{\rm an}^\dagger(x)$ would create the same quantity of charge, and
no interaction such as $\phi^3$ constructed from the real field $\phi = \phi_{\rm an} + \phi_{\rm an}^\dagger$ could
possibly conserve this quantity. We could construct interactions that conserve
charge by separating $\phi_{\rm an}$ and $\phi_{\rm an}^\dagger$ and taking the interaction to include equal
numbers of factors of each, such as $\phi_{\rm an}^2\phi_{\rm an}^{\dagger 2}$, but then we would not be able
to preserve Lorentz invariance. The commutator of $\phi_{\rm an}(x)$ and $\phi_{\rm an}^\dagger(y)$ is the
function $\Delta_+(x - y)$ given by Eq. (7.3.23), which does not vanish for $x - y$
space-like, so an interaction such as $\phi_{\rm an}^2\phi_{\rm an}^{\dagger 2}$ that treats $\phi_{\rm an}$ and $\phi_{\rm an}^\dagger$ separately
would not satisfy the condition (7.3.10), which we have seen is necessary for
Lorentz invariance.
So, what to do? The only known way of restoring Lorentz invariance for
charged particles while preserving charge conservation is to take the free field
to be complex, the sum of a term that annihilates a particle and another term
that creates its antiparticle, a particle with the opposite value of electric charge
(and of all other conserved quantities) but the same spin and mass. For spinless
particles this field takes the form
9 H. Yukawa, Proc. Phys.-Math. Soc. Japan 17, 48 (1935). This article is reprinted in Beyer, Foundations of
Nuclear Physics, listed in the bibliography.
7.4 Antiparticles, Spin, Statistics 271
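$$\phi(\mathbf{x},t) = \int\frac{d^3p}{\sqrt{2E(\mathbf{p})}\,(2\pi)^{3/2}}\Big[a(\mathbf{p})\exp\big(i\mathbf{p}\cdot\mathbf{x} - iE(\mathbf{p})t\big) + b^\dagger(\mathbf{p})\exp\big(-i\mathbf{p}\cdot\mathbf{x} + iE(\mathbf{p})t\big)\Big] \tag{7.4.1}$$
where
$$[a(\mathbf{p}),a^\dagger(\mathbf{p}')] = [b(\mathbf{p}),b^\dagger(\mathbf{p}')] = \delta^3(\mathbf{p}-\mathbf{p}'), \tag{7.4.2}$$
$$[a(\mathbf{p}),a(\mathbf{p}')] = [b(\mathbf{p}),b(\mathbf{p}')] = [a(\mathbf{p}),b(\mathbf{p}')] = 0, \tag{7.4.3}$$
$$[a^\dagger(\mathbf{p}),b(\mathbf{p}')] = [b^\dagger(\mathbf{p}),a(\mathbf{p}')] = 0. \tag{7.4.4}$$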
In particular, the commutator of ϕ(x, t) with ϕ † (y, t) vanishes for space-like
x −y because of the same sort of cancellation that we encountered for real scalar
fields. Both terms in ϕ change the electric charge (or any other conserved quan-
tity) by the same amount, so interactions conserve this charge if they contain
an equal number of factors of ϕ and ϕ † . This theory was presented in 1934 by
Pauli and Weisskopf,10 in order to contradict Dirac’s view that antiparticles arise
as holes in a sea of negative-energy particles. Antiparticles are indispensable
for Lorentz invariance in any quantum field theory of particles that carry a
conserved quantity, such as electric charge, even if the particles are bosons,
which do not satisfy the Pauli exclusion principle and so could not form a stable
sea of negative-energy particles. Where particles carry no conserved quantity, as
in the previous sections, these particles can be said to be their own antiparticles.
This is the case for the neutral pion and for the Z 0 particle.
This sort of free complex field theory can be derived as a consequence of a
free-field Lagrangian density, of the form
$$\mathcal{L} = -\eta^{\mu\nu}\frac{\partial\phi^\dagger}{\partial x^\mu}\frac{\partial\phi}{\partial x^\nu} - m^2\phi^\dagger\phi. \tag{7.4.5}$$
The Euler–Lagrange field equations are again
$$(\Box - m^2)\phi = 0 \tag{7.4.6}$$
whose general complex solution is Eq. (7.4.1), with spacetime-independent
operator coefficients $a$ and $b^\dagger$. But now the canonical conjugate to $\phi$ is the time
derivative of an independent canonical variable, the adjoint:
$$\pi(\mathbf{x},t) = \frac{\partial}{\partial t}\phi^\dagger(\mathbf{x},t), \tag{7.4.7}$$
while $\partial\phi/\partial t$ is the canonical conjugate to $\phi^\dagger$. The canonical commutation
relations (7.1.10) then yield the commutation relations (7.4.2)–(7.4.4).
The particles described so far are bosons. The multi-particle state vectors
here are
$$a^\dagger(\mathbf{p}_1)a^\dagger(\mathbf{p}_2)a^\dagger(\mathbf{p}_3)\cdots b^\dagger(\mathbf{p}'_1)b^\dagger(\mathbf{p}'_2)b^\dagger(\mathbf{p}'_3)\cdots\Phi_0, \tag{7.4.8}$$
where $\Phi_0$ is the vacuum, satisfying $a(\mathbf{p})\Phi_0 = b(\mathbf{p})\Phi_0 = 0$. By taking the
adjoints of the commutation relations (7.4.3), we see that these particles are
bosons; the states (7.4.8) are completely symmetric under interchanges of the
labels $\mathbf{p}_1$, $\mathbf{p}_2$, etc. of the particles and under interchanges of the labels $\mathbf{p}'_1$, $\mathbf{p}'_2$,
etc. of the antiparticles.
Suppose we wanted to construct a theory of spinless neutral fermions. We
could suppose that all the commutators [A, B] ≡ [A, B]− ≡ AB − BA
that we previously derived from the canonical commutation relations are now
replaced with anticommutators, [A, B]+ ≡ AB + BA. For instance, we can try
introducing a real scalar field like (7.3.16):
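$$\phi(\mathbf{x},t) = \phi_{\rm an}(\mathbf{x},t) + \phi_{\rm an}^\dagger(\mathbf{x},t),$$
$$\phi_{\rm an}(\mathbf{x},t) = \int\frac{d^3p}{(2\pi)^{3/2}\sqrt{2E(\mathbf{p})}}\exp\big(i\mathbf{p}\cdot\mathbf{x} - iE(\mathbf{p})t\big)\,a(\mathbf{p})$$
but we now suppose that the annihilation and creation operators satisfy the
anticommutation rules:
$$[a(\mathbf{p}),a^\dagger(\mathbf{p}')]_+ = \delta^3(\mathbf{p}-\mathbf{p}'), \tag{7.4.9}$$
$$[a(\mathbf{p}),a(\mathbf{p}')]_+ = [a^\dagger(\mathbf{p}),a^\dagger(\mathbf{p}')]_+ = 0. \tag{7.4.10}$$
The anticommutation relations (7.4.10) imply the complete antisymmetry of the
multi-particle state vector
$$a^\dagger(\mathbf{p}_1)a^\dagger(\mathbf{p}_2)a^\dagger(\mathbf{p}_3)\cdots\Phi_0 \tag{7.4.11}$$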
under interchange of the labels p1 , p2 , etc. of the particles, as required for
fermions.
In place of the vanishing of the equal-time commutator [ϕ(x, t), ϕ(y, t)] we
now have
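$$
\begin{aligned}
[\phi(\mathbf{x},t),\phi(\mathbf{y},t)]_+ &= [\phi_{\rm an}(\mathbf{x},t),\phi_{\rm an}^\dagger(\mathbf{y},t)]_+ + [\phi_{\rm an}^\dagger(\mathbf{x},t),\phi_{\rm an}(\mathbf{y},t)]_+ \\
&= \Delta_+(\mathbf{x}-\mathbf{y},0) + \Delta_+(\mathbf{y}-\mathbf{x},0)
\end{aligned}
$$
where $\Delta_+(\mathbf{x}-\mathbf{y},\,x^0-y^0)$ is the function (7.3.23), which at equal times is
non-zero and even:
$$\Delta_+(\mathbf{x}-\mathbf{y},0) = \int\frac{d^3p}{2E(\mathbf{p})(2\pi)^3}\,e^{i\mathbf{p}\cdot(\mathbf{x}-\mathbf{y})} = \frac{m}{4\pi^2}\frac{K_1(m|\mathbf{x}-\mathbf{y}|)}{|\mathbf{x}-\mathbf{y}|}.$$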
We see that here the two terms in [ϕ(x, t), ϕ(y, t)]+ do not cancel but add,
unlike the bosonic case. It is in fact impossible to construct scalar fields that
anticommute at equal times for spinless fermions.
Though it is not possible here to go into so much detail, the same sort of
analysis leads to the general conclusion cited in Section 5.5, that integer-spin
particles (including spinless particles) must be bosons, while particles with half-
odd-integer spin must be fermions. The free fields for spinning particles take the
general form
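$$\phi_n(\mathbf{x},t) = \phi_{n,{\rm an}}(\mathbf{x},t) + \phi_{n,{\rm cr}}(\mathbf{x},t), \tag{7.4.12}$$
$$\phi_{n,{\rm an}}(\mathbf{x},t) = \sum_\sigma\int\frac{d^3p}{(2\pi)^{3/2}\sqrt{2E(\mathbf{p})}}\,u_n(\mathbf{p},\sigma)\exp\big(i\mathbf{p}\cdot\mathbf{x} - iE(\mathbf{p})t\big)\,a(\mathbf{p},\sigma), \tag{7.4.13}$$
$$\phi_{n,{\rm cr}}(\mathbf{x},t) = \sum_\sigma\int\frac{d^3p}{(2\pi)^{3/2}\sqrt{2E(\mathbf{p})}}\,v_n(\mathbf{p},\sigma)\exp\big(-i\mathbf{p}\cdot\mathbf{x} + iE(\mathbf{p})t\big)\,b^\dagger(\mathbf{p},\sigma). \tag{7.4.14}$$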
Here a(p, σ ) is the operator that annihilates a particle of momentum p and spin
3-component σ ; b† (p, σ ) is the operator that creates its antiparticle (so that
both terms in ϕ have the same effect on the charge and all other conserved
quantities); and un (p, σ ) and vn (p, σ ) are functions about which more later. For
neutral particles that are their own antiparticles, a(p, σ ) = b(p, σ ). For bosons
or fermions, the operators a and b satisfy the commutation or anticommutation
relations
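$$[a(\mathbf{p},\sigma),a^\dagger(\mathbf{p}',\sigma')]_\mp = [b(\mathbf{p},\sigma),b^\dagger(\mathbf{p}',\sigma')]_\mp = \delta_{\sigma\sigma'}\,\delta^3(\mathbf{p}-\mathbf{p}'), \tag{7.4.15}$$
$$[a(\mathbf{p},\sigma),a(\mathbf{p}',\sigma')]_\mp = [b(\mathbf{p},\sigma),b(\mathbf{p}',\sigma')]_\mp = [a(\mathbf{p},\sigma),b(\mathbf{p}',\sigma')]_\mp = 0, \tag{7.4.16}$$
$$[a^\dagger(\mathbf{p},\sigma),b(\mathbf{p}',\sigma')]_\mp = [b^\dagger(\mathbf{p},\sigma),a(\mathbf{p}',\sigma')]_\mp = 0, \tag{7.4.17}$$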
the ∓ signs being minus, denoting commutators, for bosons or plus, denoting
anticommutators, for fermions.
The functions $u_n(\mathbf{p},\sigma)$ and $v_n(\mathbf{p},\sigma)$ are governed by what is assumed for the
Lorentz-transformation property of the fields. Under a Lorentz transformation
$x^\mu \to x'^\mu = \Lambda^\mu{}_\nu x^\nu$, the various fields $\phi_n(x)$ undergo various matrix
transformations$^{11}$
$$\phi_n(x) \to \phi'_n(x) = \sum_m D_{nm}(\Lambda)\,\phi_m(x), \tag{7.4.18}$$
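For particles that are not their own antiparticles, the commutators or
anticommutators of the $\phi_n$ with each other and of the $\phi_n^\dagger$ with each other trivially vanish.
On the other hand, the equal-time commutator or anticommutator of any field
with its adjoint is
$$[\phi_n(\mathbf{x},t),\phi_m^\dagger(\mathbf{y},t)]_\mp = \Delta_{nm}(\mathbf{x}-\mathbf{y}) \mp \tilde\Delta_{nm}(\mathbf{y}-\mathbf{x}) \tag{7.4.20}$$
where
$$\Delta_{nm}(\mathbf{x}-\mathbf{y}) \equiv \sum_\sigma\int\frac{d^3p}{2E(\mathbf{p})(2\pi)^3}\,u_n(\mathbf{p},\sigma)\,u_m^*(\mathbf{p},\sigma)\,e^{i\mathbf{p}\cdot(\mathbf{x}-\mathbf{y})}, \tag{7.4.21}$$
$$\tilde\Delta_{nm}(\mathbf{x}-\mathbf{y}) \equiv \sum_\sigma\int\frac{d^3p}{2E(\mathbf{p})(2\pi)^3}\,v_n(\mathbf{p},\sigma)\,v_m^*(\mathbf{p},\sigma)\,e^{i\mathbf{p}\cdot(\mathbf{x}-\mathbf{y})}. \tag{7.4.22}$$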
The first and second terms on the right of Eq. (7.4.20) come respectively
from the commutator or anticommutator of the annihilation part of $\phi_n(\mathbf{x},t)$
with the creation part of $\phi_m^\dagger(\mathbf{y},t)$ and from the commutator or anticommutator
of the creation part of $\phi_n(\mathbf{x},t)$ with the annihilation part of $\phi_m^\dagger(\mathbf{y},t)$. (The
crucial $\mp$ sign that distinguishes bosons from fermions appears in the second
term of Eq. (7.4.20) because this term comes from the part of the commutator
or anticommutator of $\phi_n$ with $\phi_m^\dagger$ in which $b^\dagger$ appears to the left of $b$.) Detailed
calculations beyond the scope of this book show that$^{12}$
$$\tilde\Delta_{nm}(\mathbf{y}-\mathbf{x}) = (-1)^{2j}\,|\lambda|^2\,\Delta_{nm}(\mathbf{x}-\mathbf{y}) \tag{7.4.23}$$
where j is the particle spin and λ depends on how the un and vn are normalized.
(If we multiply un and vn by factors α and β then λ is changed by a factor β/α.)
For equal-time commutators or anticommutators of fields and their adjoints to
vanish, the two terms in Eq. (7.4.20) must cancel. For this we need
$$|\lambda|^2(-1)^{2j} = \pm 1,$$
with the top sign for bosons and the bottom sign for fermions. This requires that
$|\lambda| = 1$, which can always be arranged by adjusting the relative normalization of
$u_n$ and $v_n$, and thereby imposes a relation between the strengths of interactions
of particles and antiparticles. But with $|\lambda| = 1$, we also need
$$(-1)^{2j} = \pm 1. \tag{7.4.24}$$
This is the famous connection between spin and statistics:13 particles with j an
integer are bosons, and particles with j a half odd integer are fermions.
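The rule (7.4.24) is simple enough to state as a few lines of code. The sketch below takes a spin $j$, checks that $2j$ is a non-negative integer, and returns the statistics implied by the sign of $(-1)^{2j}$. The function name and the use of exact fractions are choices made for this illustration.

```python
from fractions import Fraction

def statistics(j):
    """Return 'boson' or 'fermion' according to the sign of (-1)^(2j)."""
    twice_j = 2 * Fraction(j)
    if twice_j < 0 or twice_j.denominator != 1:
        raise ValueError("spin must be a non-negative integer or half-odd-integer")
    sign = (-1) ** int(twice_j)
    return "boson" if sign == 1 else "fermion"

# Spins 0, 1/2, 1, 3/2, 2 in turn.
table = [(str(j), statistics(j)) for j in
         (Fraction(0), Fraction(1, 2), Fraction(1), Fraction(3, 2), Fraction(2))]
print(table)
```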
12 For a textbook treatment, see e.g. S. Weinberg, The Quantum Theory of Fields, Vol. I (Cambridge
University Press, Cambridge, UK, 1995), Section 5.7.
13 M. Fierz, Helv. Phys. Acta 12, 3 (1939); W. Pauli, Phys. Rev. 58, 716 (1940).
(As usual, [A, B]+ is defined as AB + BA.) The anticommutator of the Dirac
field with its adjoint is given by
$$[\psi_n(x),\bar\psi_m(y)]_+ = \left(-\gamma^\mu\frac{\partial}{\partial x^\mu} + m\right)_{nm}\int\frac{d^3p}{2p^0(2\pi)^3}\left[e^{ip\cdot(x-y)} - e^{-ip\cdot(x-y)}\right], \tag{7.4.40}$$
This yields the Euler–Lagrange field equation for a Dirac field interacting with
any electromagnetic field:
$$\left[\gamma^\mu\left(\frac{\partial}{\partial x^\mu} + ieA_\mu\right) + m_e\right]\psi = 0. \tag{7.4.42}$$
The Dirac wave function used in Dirac’s calculations was not the quantum
field ψ, but its matrix elements:
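$$\psi_{An}(x) \equiv \big(\Psi_{\rm vac},\,\psi_n(x)\Psi_A\big), \qquad \psi_{Bn}(x) \equiv \big(\Psi_B,\,\psi_n(x)\Psi_{\rm vac}\big) \tag{7.4.43}$$
where $\Psi_A$ and $\Psi_B$ are states of charge $-e$ and $+e$, respectively, such as states of
an electron and a positron in the electromagnetic field of an atom. These wave
functions satisfy the same equation as the field:
$$\left[\gamma^\mu\left(\frac{\partial}{\partial x^\mu} + ieA_\mu\right) + m_e\right]\psi_A(x) = \left[\gamma^\mu\left(\frac{\partial}{\partial x^\mu} + ieA_\mu\right) + m_e\right]\psi_B(x) = 0. \tag{7.4.44}$$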
For a time-independent electromagnetic field, the time dependence of the Dirac
field is governed by a time-independent Hamiltonian H in the Heisenberg pic-
ture, so, for states A and B with energy EA and EB , the wave functions have
the time dependences
$$\psi_{An}(\mathbf{x},t) \propto e^{-iE_At}, \qquad \psi_{Bn}(\mathbf{x},t) \propto e^{+iE_Bt}. \tag{7.4.45}$$
The different sign of the argument of e+iEB t does not arise because the state
B has negative energy, but because it appears to the left of the Dirac field in
the definition (7.4.43) of the wave function ψBn (x, t). From solutions of the
wave equation (7.4.44) for $\psi_{An}(\mathbf{x},t)$ with time dependence given by (7.4.45) and a pure
Coulomb field A0 = Ze2 /r, A = 0, Dirac was able to calculate the energies of
the states of hydrogenic atoms, including their fine structure:
$$E(nj) = m_e\left[1 - \frac{Z^2e^4}{2n^2} + \frac{Z^4e^8}{n^4}\left(\frac{3}{8} - \frac{n}{2j+1}\right) + \cdots\right] \tag{7.4.46}$$
with no dependence on $\ell$.
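To get a feeling for the sizes of the terms in the fine-structure formula, the sketch below evaluates $E(nj) = m_e\left[1 - (Z\alpha)^2/2n^2 + ((Z\alpha)^4/n^4)\,(3/8 - n/(2j+1))\right]$, writing $e^2 = \alpha$ as appropriate in units with $\hbar = c = 1$. The numerical values $\alpha \approx 1/137.036$ and $m_e c^2 \approx 510\,998.95$ eV are inputs supplied here rather than taken from the text, and the function name is hypothetical.

```python
ALPHA = 1 / 137.035999   # fine-structure constant, e^2 in units with hbar = c = 1
M_E_EV = 510998.95       # electron rest energy in eV

def dirac_energy_ev(n, j, z=1):
    """Energy through order (Z alpha)^4, including the rest energy, in eV."""
    za2 = (z * ALPHA) ** 2
    return M_E_EV * (1 - za2 / (2 * n * n)
                     + za2 ** 2 / n ** 4 * (3 / 8 - n / (2 * j + 1)))

# Binding energy of the hydrogen ground state (n = 1, j = 1/2).
ground_binding = M_E_EV - dirac_energy_ev(1, 0.5)
# Fine-structure splitting between the j = 3/2 and j = 1/2 levels with n = 2.
splitting = dirac_energy_ev(2, 1.5) - dirac_energy_ev(2, 0.5)
print(ground_binding, splitting)
```

With these inputs the ground-state binding energy comes out close to 13.6 eV and the $n = 2$ splitting is a few times $10^{-5}$ eV, smaller by a factor of order $\alpha^2$.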
As discussed in Section 6.5, Fermi in his 1934 theory of beta decay proposed
an interaction Hamiltonian of the form (6.5.1), proportional to the scalar product
of two vector currents. This then had to be modified, first by the introduction
of axial vector currents and then by including terms that violate invariance
under space inversion, resulting in an interaction of the form (6.5.4). Expressed
explicitly in terms of Dirac fields for the proton, neutron, electron, and neutrino,
Fermi’s original proposed interaction (in units with h̄ = c = 1) was
$$H_\beta = G_F\,(\bar\psi_e\gamma^\mu\psi_\nu)(\bar\psi_p\gamma_\mu\psi_n) + G_F\,(\bar\psi_\nu\gamma^\mu\psi_e)(\bar\psi_n\gamma_\mu\psi_p), \tag{7.4.47}$$
and after 30 years of experiments on nuclear beta decay and other weak inter-
action processes, this was finally modified to
$$
\begin{aligned}
H_\beta = {} & \frac{G_F}{\sqrt{2}}\,\big(\bar\psi_e\gamma^\mu(1+\gamma_5)\psi_\nu\big)\big(\bar\psi_p\gamma_\mu(1+\gamma_5)\psi_n\big) \\
& + \frac{G_F}{\sqrt{2}}\,\big(\bar\psi_\nu\gamma^\mu(1+\gamma_5)\psi_e\big)\big(\bar\psi_n\gamma_\mu(1+\gamma_5)\psi_p\big),
\end{aligned}
\tag{7.4.48}
$$
where $G_F = 1.16\times 10^{-5}\ {\rm GeV}^{-2}$ and $\gamma_5 \equiv i\gamma_1\gamma_2\gamma_3\gamma_0$. It can be shown from
the anticommutation relations that $\gamma_5$ is Lorentz invariant, in the sense of
commuting with $D(\Lambda)$, so (7.4.48) like (7.4.47) transforms as a scalar under any
proper Lorentz transformation. It is the presence of the matrix $1 + \gamma_5$ in $H_\beta$
that produces the violations of invariance under space inversion discussed in
Section 6.5, including the fact that if neutrinos were massless, the neutrinos
created along with electrons by the first term in Eq. (7.4.48) or along with positrons
in the second term in Eq. (7.4.48) would have a component of angular momentum
in the direction of motion respectively equal to $\hbar/2$ or $-\hbar/2$. For the very
small known masses of neutrinos, these helicities are overwhelmingly likely.
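The algebraic properties of $\gamma_5$ quoted here can be checked with explicit matrices. The sketch below assumes one particular $4\times 4$ representation, $\gamma^0 = -i\begin{pmatrix}0&1\\1&0\end{pmatrix}$ and $\gamma^k = -i\begin{pmatrix}0&\sigma_k\\-\sigma_k&0\end{pmatrix}$, chosen because it satisfies $\{\gamma^\mu,\gamma^\nu\} = 2\eta^{\mu\nu}$ with the metric of this chapter; this representation and all helper names are assumptions of the illustration, not taken from the text. It then verifies that $\gamma_5 = i\gamma_1\gamma_2\gamma_3\gamma_0$ anticommutes with every $\gamma^\mu$ and therefore commutes with the commutators $[\gamma^\mu,\gamma^\nu]$, to which the generators of $D(\Lambda)$ are proportional.

```python
def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(c, a):
    return [[c * x for x in row] for row in a]

def kron(a, b):
    """Kronecker product, used to assemble 4x4 matrices from 2x2 blocks."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def is_zero(a, tol=1e-12):
    return all(abs(x) < tol for row in a for x in row)

def anticommutator(a, b):
    return add(matmul(a, b), matmul(b, a))

def commutator(a, b):
    return add(matmul(a, b), scale(-1, matmul(b, a)))

I2 = [[1, 0], [0, 1]]
I4 = kron(I2, I2)
PAULI = [[[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]]
ETA = {0: -1, 1: 1, 2: 1, 3: 1}   # diagonal metric, index 0 is time

# Assumed representation (see lead-in); upper-index gamma matrices.
GAMMA_UP = {0: scale(-1j, kron([[0, 1], [1, 0]], I2))}
for k in (1, 2, 3):
    GAMMA_UP[k] = scale(-1j, kron([[0, 1], [-1, 0]], PAULI[k - 1]))
GAMMA_DOWN = {mu: scale(ETA[mu], GAMMA_UP[mu]) for mu in range(4)}

g = GAMMA_DOWN
GAMMA5 = scale(1j, matmul(matmul(matmul(g[1], g[2]), g[3]), g[0]))

clifford_ok = all(
    is_zero(add(anticommutator(GAMMA_UP[m], GAMMA_UP[n]), scale(-2 * ETA[m] * (m == n), I4)))
    for m in range(4) for n in range(4))
gamma5_anticommutes = all(is_zero(anticommutator(GAMMA5, GAMMA_UP[m])) for m in range(4))
gamma5_commutes_with_generators = all(
    is_zero(commutator(GAMMA5, commutator(GAMMA_UP[m], GAMMA_UP[n])))
    for m in range(4) for n in range(4))
gamma5_squared_is_one = is_zero(add(matmul(GAMMA5, GAMMA5), scale(-1, I4)))
print(clifford_ok, gamma5_anticommutes, gamma5_commutes_with_generators, gamma5_squared_is_one)
```

Because $\gamma_5^2 = 1$, the combination $(1+\gamma_5)/2$ is a projection matrix, which is how the factor $1 + \gamma_5$ selects a definite helicity for massless particles.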
We end our treatment of quantum mechanics where we began, with the quantum
theory of radiation. We will first present the Lagrangian densities both for the
free electromagnetic field and for the field's interactions with matter, then work
out in detail the theory of the free field, which as shown in Section 7.3 is needed
to provide the interaction in the interaction picture in perturbation theory, and
then apply what we have learned to a classic problem, the calculation of the
rate of emission of photons in transitions between atomic or molecular states.
We close with an account of the interaction of electromagnetism with general
matter fields.
Lagrangian Density
It is easy to think of a possible Lagrangian density for the electromagnetic
field that is quadratic in the fields, like all free-field Lagrangians, and is Lorentz
invariant:
$$\mathcal{L}_0 = -\frac{1}{16\pi}\,\eta_{\mu\rho}\,\eta_{\nu\sigma}\,F^{\mu\nu}F^{\rho\sigma}, \tag{7.5.1}$$
where F μν is the field strength tensor, given by Eqs. (4.6.7) and (4.6.8):
$$E_1 = F^{01} = -F^{10}, \quad E_2 = F^{02} = -F^{20}, \quad E_3 = F^{03} = -F^{30}, \tag{7.5.2}$$
7.5 Quantum Theory of Electromagnetism 281
$$B_1 = F^{23} = -F^{32}, \quad B_2 = F^{31} = -F^{13}, \quad B_3 = F^{12} = -F^{21}. \tag{7.5.3}$$
(The factor $-1/16\pi$ is irrelevant now but will be convenient later, when we
consider the coupling of these fields to matter.) This is manifestly Lorentz invariant,
but otherwise appears absurd. If we assume that $\int d^4x\,\mathcal{L}_0$ is stationary under
arbitrary infinitesimal variations of the fields $F^{\mu\nu}$, we find Euler–Lagrange
equations of the form $F^{\mu\nu} = 0$, which certainly do not describe actual free
electromagnetic fields. The error made in deriving this wrong result is that
we must not impose conditions for arbitrary variations of $F^{\mu\nu}$, because the
field-strength tensor is constrained by the homogeneous Maxwell equations (4.6.15),
(4.6.17):
$$0 = \frac{\partial F_{\mu\nu}}{\partial x^\lambda} + \frac{\partial F_{\lambda\mu}}{\partial x^\nu} + \frac{\partial F_{\nu\lambda}}{\partial x^\mu} \tag{7.5.4}$$
where
$$F_{\mu\nu} \equiv \eta_{\mu\rho}\,\eta_{\nu\sigma}\,F^{\rho\sigma}. \tag{7.5.5}$$
We should only demand that the action is stationary for variations in the fields
that preserve the constraint (7.5.4).
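As a check on the index conventions, the sketch below assembles $F^{\mu\nu}$ from the components of $\mathbf{E}$ and $\mathbf{B}$ according to Eqs. (7.5.2) and (7.5.3), contracts it as in Eq. (7.5.1) with $\eta = {\rm diag}(-1, +1, +1, +1)$ in the index order $(0, 1, 2, 3)$, and compares the result with $(\mathbf{E}^2 - \mathbf{B}^2)/8\pi$. The sample field values and function names are arbitrary choices made for the illustration.

```python
import math

ETA = [-1.0, 1.0, 1.0, 1.0]  # diagonal of eta_{mu nu}, index order (0, 1, 2, 3)

def field_tensor(E, B):
    """F^{mu nu} built from Eqs. (7.5.2) and (7.5.3); E and B are 3-tuples."""
    F = [[0.0] * 4 for _ in range(4)]
    for i in range(3):
        F[0][i + 1] = E[i]
        F[i + 1][0] = -E[i]
    F[2][3], F[3][2] = B[0], -B[0]
    F[3][1], F[1][3] = B[1], -B[1]
    F[1][2], F[2][1] = B[2], -B[2]
    return F

def lagrangian_density(E, B):
    """L_0 = -(1/16 pi) eta_{mu rho} eta_{nu sigma} F^{mu nu} F^{rho sigma} for diagonal eta."""
    F = field_tensor(E, B)
    total = sum(ETA[m] * ETA[n] * F[m][n] * F[m][n] for m in range(4) for n in range(4))
    return -total / (16 * math.pi)

E = (1.0, 2.0, 3.0)    # sample values in arbitrary units
B = (0.5, -1.0, 2.0)
value = lagrangian_density(E, B)
expected = (sum(e * e for e in E) - sum(b * b for b in B)) / (8 * math.pi)
print(value, expected)
```

The agreement reflects $\eta_{\mu\rho}\eta_{\nu\sigma}F^{\mu\nu}F^{\rho\sigma} = 2(\mathbf{B}^2 - \mathbf{E}^2)$, so the Lagrangian density (7.5.1) is the familiar combination of field energies.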
It is easy to see that this requirement leads to the remaining free-field
Maxwell equations $\partial F^{\mu\nu}/\partial x^\nu = 0$, but in deriving the canonical commutation
relations it is awkward to work with functional derivatives with respect to
constrained fields like $F^{\mu\nu}$. In electrodynamics it is much easier to express the
field-strength tensor in terms of an unconstrained vector potential $A_\mu$, in such a
way that the constraint (7.5.4) is automatically respected,
$$F_{\mu\nu} = \frac{\partial A_\nu}{\partial x^\mu} - \frac{\partial A_\mu}{\partial x^\nu} \tag{7.5.6}$$
and take all functional derivatives with respect to the Aμ . As shown in
Section 5.8, the introduction of a vector potential is essential anyway in
formulating the quantum theory of charged particles in an electromagnetic
field.
For the present we will introduce a general Lagrangian density $\mathcal{L}_{\rm mat}$ for
matter and its interaction with the electromagnetic field, and define the electric
current four-vector $J^\mu$ as the functional derivative with respect to $A_\mu(x)$ of the
corresponding term in the action:
$$J^\mu(x) \equiv \frac{\delta}{\delta A_\mu(x)}\int d^4y\,\mathcal{L}_{\rm mat}(y). \tag{7.5.7}$$
Under an infinitesimal shift in $A_\mu$, the change in the total action is now
$$\delta\int d^4x\,(\mathcal{L}_0 + \mathcal{L}_{\rm mat}) = \int d^4x\left[-\frac{1}{4\pi}\frac{\partial F^{\mu\nu}(x)}{\partial x^\nu} + J^\mu(x)\right]\delta A_\mu(x),$$
Gauge Transformations
Now we have a problem. We cannot satisfy the canonical commutation relations
for the field $A_0$, because since $F_{00} = 0$ the Lagrangian density does not contain
a time derivative of $A_0$. To deal with this, we note that the action is invariant
under a gauge transformation
$$A_\mu(x) \to A_\mu(x) + \frac{\partial\xi(x)}{\partial x^\mu} \tag{7.5.9}$$
with $\xi(x)$ an arbitrary function of the spacetime coordinate. This has no effect
on the field-strength tensor (7.5.6), and the consistency of the Maxwell equations
requires that the current $J^\mu$ is conserved in the sense that $\partial J^\mu(x)/\partial x^\mu = 0$,
so that according to Eq. (7.5.7) the change produced in the matter action by
the gauge transformation (7.5.9) is
$$\delta\int d^4x\,\mathcal{L}_{\rm mat} = \int d^4x\,J^\mu(x)\frac{\partial\xi(x)}{\partial x^\mu} = -\int d^4x\,\xi(x)\frac{\partial J^\mu(x)}{\partial x^\mu} = 0. \tag{7.5.10}$$
Coulomb Gauge
We can always choose ξ(x) so as to adopt what is known as the Coulomb gauge,
for which
∇·A=0 (7.5.11)
because if $\nabla\cdot\mathbf{A} \neq 0$, we can make it vanish by performing a gauge transformation
with $\nabla^2\xi = -\nabla\cdot\mathbf{A}$. This is called the Coulomb gauge because the $\mu = 0$
component of the inhomogeneous Maxwell equations (7.5.8) is here
$$4\pi J^0 = \frac{\partial F^{0i}}{\partial x^i} = -\nabla^2A^0$$
with solution given by the familiar Coulomb field
$$A^0(\mathbf{x},t) = \int d^3y\,\frac{J^0(\mathbf{y},t)}{|\mathbf{x}-\mathbf{y}|}. \tag{7.5.12}$$
Since A0 is a functional of the matter fields in J 0 at the same time, it is not to
be regarded as an independent canonical variable. The canonical variables of
15 R. P. Feynman, Ph.D. thesis, The Principle of Least Action in Quantum Mechanics (Princeton University,
1942; University Microfilms Publication No. 2948, Ann Arbor).
The usual canonical commutation relations must here be modified to take ac-
count of the conditions (7.5.11) and (7.5.14). We use the formula
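$$\nabla^2 \frac{1}{|\mathbf{x}-\mathbf{y}|} = -4\pi\,\delta^3(\mathbf{x}-\mathbf{y})\,.$$
(This can be derived by showing directly that the left-hand side vanishes for x ≠ y, and using Gauss’s theorem to show that its integral over all space is −4π.) Then we have consistency with conditions (7.5.11) and (7.5.14) if we take
$$[A_i(\mathbf{x},t), \pi_j(\mathbf{y},t)] = i\delta_{ij}\,\delta^3(\mathbf{x}-\mathbf{y}) + i\,\frac{\partial^2}{\partial x^i\,\partial x^j}\,\frac{1}{4\pi|\mathbf{x}-\mathbf{y}|} \tag{7.5.15}$$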
and also
[Ai (x, t), Aj (y, t)] = [πi (x, t), πj (y, t)] = 0 . (7.5.16)
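The Gauss's-theorem step used above to justify the formula for $\nabla^2(1/|\mathbf{x}-\mathbf{y}|)$ can be illustrated numerically. The following sketch (taking y = 0) integrates the flux of $\nabla(1/|\mathbf{x}|)$ over a sphere with a simple midpoint rule; the sphere centers, radius, and grid sizes are arbitrary choices made for this example. A sphere enclosing the origin should give approximately −4π, and one that excludes it should give approximately zero.

```python
import math

def flux_of_grad_inverse_r(center, radius, n_theta=200, n_phi=400):
    """Midpoint-rule surface integral of grad(1/|x|) . n over a sphere."""
    cx, cy, cz = center
    d_theta = math.pi / n_theta
    d_phi = 2.0 * math.pi / n_phi
    total = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta
        st, ct = math.sin(theta), math.cos(theta)
        for j in range(n_phi):
            phi = (j + 0.5) * d_phi
            # Outward unit normal and the corresponding surface point.
            n = (st * math.cos(phi), st * math.sin(phi), ct)
            x = (cx + radius * n[0], cy + radius * n[1], cz + radius * n[2])
            r = math.sqrt(x[0] ** 2 + x[1] ** 2 + x[2] ** 2)
            # grad(1/r) = -x / r^3, projected on the normal.
            g_dot_n = -(x[0] * n[0] + x[1] * n[1] + x[2] * n[2]) / r ** 3
            total += g_dot_n * radius ** 2 * st * d_theta * d_phi
    return total

flux_enclosing = flux_of_grad_inverse_r((0.2, -0.1, 0.3), 1.0)  # origin inside
flux_excluding = flux_of_grad_inverse_r((3.0, 0.0, 0.0), 1.0)   # origin outside
print(flux_enclosing, -4.0 * math.pi)
print(flux_excluding)
```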
Free Fields
As emphasized in Section 7.3, the first step in using time-ordered perturbation
theory to calculate processes involving interacting particles is to write explicit
formulas for the free fields. With zero current and charge densities, and hence
A0 = 0, the field equations (7.5.8) for Ai in Coulomb gauge are
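$$0 = \frac{\partial F^{\mu i}}{\partial x^\mu} = \Box A^i - \frac{\partial}{\partial x_i}\frac{\partial A^\mu}{\partial x^\mu} = \Box A^i\,. \tag{7.5.17}$$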
The general real solution of Eqs. (7.5.11) and (7.5.17) is conveniently written
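$$\mathbf{A}(\mathbf{x},t) = \sqrt{4\pi}\,\sum_\lambda \int \frac{d^3q}{(2\pi)^{3/2}\sqrt{2|\mathbf{q}|}} \left[ \mathbf{e}(\mathbf{q},\lambda)\,a(\mathbf{q},\lambda)\,e^{i\mathbf{q}\cdot\mathbf{x} - i|\mathbf{q}|t} + \mathbf{e}^*(\mathbf{q},\lambda)\,a^\dagger(\mathbf{q},\lambda)\,e^{-i\mathbf{q}\cdot\mathbf{x} + i|\mathbf{q}|t} \right] \tag{7.5.18}$$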
where a(q, λ) is an operator coefficient whose properties will be found from the
canonical commutation relations, and e(q, λ) are any two independent three-
vectors normal to q,
q · e(q, λ) = 0 (7.5.19)
with λ a two-valued index distinguishing the two solutions of (7.5.19). By a
suitable normalization of a(q, λ), we can always normalize these vectors so that
$$\sum_\lambda e_i(\mathbf{q},\lambda)\,e_j^*(\mathbf{q},\lambda) = \delta_{ij} - q_i q_j/|\mathbf{q}|^2\,. \tag{7.5.20}$$
For instance, for q in the 3-direction, we can take $\mathbf{e} = (1, i, 0)/\sqrt{2}$ for λ = 1, and $\mathbf{e} = (1, -i, 0)/\sqrt{2}$ for λ = −1, and, for q in a direction defined by some choice of rotation from the 3-direction, apply the same rotation to e.
These are the same as the polarization vectors for left- and right-handed circular
polarization that appear in the Fourier expansion of an electromagnetic wave.
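The normalization (7.5.20) and the transversality condition (7.5.19) can be checked directly for the circular polarization vectors just given. This is a minimal sketch for q along the 3-direction only; the variable and function names are made up for the example.

```python
import math

SQRT2 = math.sqrt(2.0)

# Polarization vectors for q in the 3-direction, labelled by lambda = +1, -1.
e = {
    +1: (1.0 / SQRT2, 1j / SQRT2, 0.0),
    -1: (1.0 / SQRT2, -1j / SQRT2, 0.0),
}
q_hat = (0.0, 0.0, 1.0)

def polarization_sum(i, j):
    """Left-hand side of (7.5.20): sum over lambda of e_i e_j^*."""
    return sum(e[lam][i] * complex(e[lam][j]).conjugate() for lam in (+1, -1))

def transverse_projector(i, j):
    """Right-hand side of (7.5.20): delta_ij - q_i q_j / |q|^2."""
    return (1.0 if i == j else 0.0) - q_hat[i] * q_hat[j]

max_error = max(
    abs(polarization_sum(i, j) - transverse_projector(i, j))
    for i in range(3) for j in range(3)
)
# Transversality (7.5.19): q . e must vanish for both polarizations.
max_q_dot_e = max(
    abs(sum(q_hat[i] * e[lam][i] for i in range(3))) for lam in (+1, -1)
)
print(max_error, max_q_dot_e)
```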
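With this normalization of the polarization vectors, the field (7.5.18) satisfies the canonical commutation relations (7.5.15)–(7.5.16) if we take
$$[a(\mathbf{q},\lambda), a^\dagger(\mathbf{q}',\lambda')] = \delta^3(\mathbf{q}-\mathbf{q}')\,\delta_{\lambda\lambda'} \tag{7.5.21}$$
and
$$[a(\mathbf{q},\lambda), a(\mathbf{q}',\lambda')] = 0\,. \tag{7.5.22}$$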
Then, just as we saw for a real scalar field in Section 7.2, the operator a † (q, λ)
creates a photon of momentum q and polarization vector e(q, λ) in any state
vector on which it acts, while if there already is such a photon in the state, the
operator a(q, λ) removes it.
To see the physical significance of λ, note that for q in the 3-direction, if we
perform a rotation by angle θ around the 3-axis,
Since there is nothing special about the 3-direction, this is the effect of rotation
by angle θ around the direction of motion for a photon moving in any direction.
In accordance with the general discussion of angular momentum in Section 5.4,
this means that a photon created by a † (q, λ) has a component of angular
momentum around the direction of motion, that is a helicity, equal to h̄λ in cgs
units.
To calculate the free-field Hamiltonian, we first note that, since A0 = 0 for
free fields, the free-field Hamiltonian density is
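$$\mathcal{H}_0 = \pi_j \dot{A}_j - \mathcal{L}_0 = \frac{1}{8\pi}\,\dot{A}_j \dot{A}_j + \frac{1}{16\pi}\,(\partial_i A_j - \partial_j A_i)(\partial_i A_j - \partial_j A_i)\,,$$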
where as usual i and j run over the values 1, 2, 3, and repeated indices are
summed. Using integration by parts and the Coulomb gauge condition (7.5.11)
we find the free-field Hamiltonian
$$H_0 = \int d^3x\, \mathcal{H}_0 = \frac{1}{8\pi}\int d^3x \left[ \dot{A}_i \dot{A}_i + \partial_i A_j\, \partial_i A_j \right].$$
Inserting the field (7.5.18) and following just the same steps as in calculating
the free-field Hamiltonian for a scalar field in Section 7.2, we find the free-field
Hamiltonian for electromagnetism
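$$H_0 = \frac{1}{2}\sum_\lambda \int d^3q\, |\mathbf{q}| \left( a^\dagger(\mathbf{q},\lambda)\,a(\mathbf{q},\lambda) + a(\mathbf{q},\lambda)\,a^\dagger(\mathbf{q},\lambda) \right) = \sum_\lambda \int d^3q\, |\mathbf{q}|\, a^\dagger(\mathbf{q},\lambda)\,a(\mathbf{q},\lambda) + E_{\rm vac}\,, \tag{7.5.23}$$
where
$$E_{\rm vac} = \delta^3(0) \int d^3q\, |\mathbf{q}| = (2\pi)^{-3} \int d^3x \int d^3q\, |\mathbf{q}|\,. \tag{7.5.24}$$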
As in the case of the real scalar field treated in Section 7.2, the vacuum $\Psi_{\rm vac}$, defined as the state of lowest energy, must satisfy the condition
$$a(\mathbf{q},\lambda)\,\Psi_{\rm vac} = 0\,, \tag{7.5.25}$$
since otherwise there would be a state $a(\mathbf{q},\lambda)\Psi_{\rm vac}$ with a lower energy than $\Psi_{\rm vac}$. Thus
$$H_0 \Psi_{\rm vac} = E_{\rm vac} \Psi_{\rm vac}\,. \tag{7.5.26}$$
The energy (7.5.24) is a contribution to the total vacuum energy that must
be added to the contributions of all other fields, such as (7.2.16). The state
consisting of a photon with momentum q1 and helicity λ1 , another photon with
momentum q2 and helicity λ2 , and so on, may be expressed as
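$$\Psi_{\mathbf{q}_1,\lambda_1;\,\mathbf{q}_2,\lambda_2;\,\ldots} \propto a^\dagger(\mathbf{q}_1,\lambda_1)\,a^\dagger(\mathbf{q}_2,\lambda_2)\cdots\Psi_{\rm vac}\,, \tag{7.5.27}$$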
and has energy Evac + |q1 | + |q2 | + · · · . The term Evac appears in the energy
of all states, and so aside from gravitational phenomena may be ignored, as we
shall do here.
Radiative Decay
We now consider the rate at which an excited atom16 will drop into a state
of lower energy, emitting a photon. We shall neglect relativistic effects and
the interaction of the electromagnetic field with the electron spin, so that the
Hamiltonian for the atom interacting with the electromagnetic field is given by
a sum over the particles in the atom of terms of form (5.8.3). Since we are
interested in the emission only of a single photon, the relevant interaction term
is the part of this sum linear in A:
$$V = -\sum_n \frac{e_n}{2m_n}\left[ \mathbf{A}(\mathbf{X}_n)\cdot\mathbf{P}_n + \mathbf{P}_n\cdot\mathbf{A}(\mathbf{X}_n) \right],$$
where en and mn are the charge and mass of the nth particle (electron or nu-
cleus) while Xn and Pn are the position and momentum operators of the nth
16 The calculations here of radiative decay rates apply to molecules as well as to atoms, but to avoid repeating
“or molecules” again and again, I will just refer below to transitions in atoms.
particle and A(X) is the quantum vector potential in the Schrödinger picture.
Because we are using Coulomb gauge, in which A satisfies Eq. (7.5.11), it
makes no difference in what order we write the operators in V , and we can
just as well write
$$V = -\sum_n \frac{e_n}{m_n}\,\mathbf{A}(\mathbf{X}_n)\cdot\mathbf{P}_n\,. \tag{7.5.28}$$
We take the initial and final states of the atom to be eigenstates $\Psi_{i,\mathbf{p}_i}$ and $\Psi_{f,\mathbf{p}_f}$ of the Hamiltonian of the atom, with energies $E_i$ and $E_f$, respectively, and with total momenta $\mathbf{p}_i$ and $\mathbf{p}_f$, respectively. (Because atomic nuclei are heavy, the kinetic energies of the states of the whole atom are always much less than $E_i - E_f$, and so will be neglected.) The atomic state vectors are assumed to be normalized so that
$$(\Psi_{a',\mathbf{p}'}, \Psi_{a,\mathbf{p}}) = \delta_{a'a}\,\delta^3(\mathbf{p}'-\mathbf{p})\,. \tag{7.5.29}$$
Each of these states is a vacuum as far as photons are concerned, so, for any
photon momentum q and helicity λ,
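$$a(\mathbf{q},\lambda)\,\Psi_{i,\mathbf{p}_i} = a(\mathbf{q},\lambda)\,\Psi_{f,\mathbf{p}_f} = 0\,. \tag{7.5.30}$$
The initial state of the radiative decay process is then $\Psi_{i,\mathbf{p}_i}$, and the final state is $a^\dagger(\mathbf{q},\lambda)\Psi_{f,\mathbf{p}_f}$, with q and λ the momentum and helicity of the emitted photon.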
To first order in V we can treat A in Eq. (7.5.28) as a free field, so to this order
the S-matrix element (5.6.36) for the decay process is
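$$S[i(\mathbf{p}_i) \to f(\mathbf{p}_f) + \gamma(\mathbf{q},\lambda)] = -2\pi i\,\delta(E_f + |\mathbf{q}| - E_i)\,\big(a^\dagger(\mathbf{q},\lambda)\Psi_{f,\mathbf{p}_f},\, V\,\Psi_{i,\mathbf{p}_i}\big)$$
$$= -2\pi i\,\delta(E_f + |\mathbf{q}| - E_i)\,\big(\Psi_{f,\mathbf{p}_f},\, a(\mathbf{q},\lambda)\,V\,\Psi_{i,\mathbf{p}_i}\big)$$
$$= 2\pi i\,\delta(E_f + |\mathbf{q}| - E_i)\,\sqrt{4\pi}\,\sum_{\lambda'} \int \frac{d^3q'}{(2\pi)^{3/2}\sqrt{2|\mathbf{q}'|}} \sum_n \frac{e_n}{m_n}\,\big(\Psi_{f,\mathbf{p}_f},\, a(\mathbf{q},\lambda)\,\mathbf{e}^*(\mathbf{q}',\lambda')\cdot\mathbf{P}_n\,e^{-i\mathbf{q}'\cdot\mathbf{X}_n}\,a^\dagger(\mathbf{q}',\lambda')\,\Psi_{i,\mathbf{p}_i}\big)\,. \tag{7.5.31}$$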
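Using the photon vacuum condition (7.5.30) and the commutation relation (7.5.21), we can replace the product $a(\mathbf{q},\lambda)\,a^\dagger(\mathbf{q}',\lambda')$ with $\delta^3(\mathbf{q}-\mathbf{q}')\,\delta_{\lambda\lambda'}$, and do the integral over q′ and the sum over λ′ by just setting q′ = q and λ′ = λ, so
$$S[i(\mathbf{p}_i) \to f(\mathbf{p}_f) + \gamma(\mathbf{q},\lambda)] = \frac{2\pi i\,\sqrt{4\pi}\,\delta(E_f + |\mathbf{q}| - E_i)}{(2\pi)^{3/2}\sqrt{2|\mathbf{q}|}} \sum_n \frac{e_n}{m_n}\,\big(\Psi_{f,\mathbf{p}_f},\, \mathbf{e}^*(\mathbf{q},\lambda)\cdot\mathbf{P}_n\,e^{-i\mathbf{q}\cdot\mathbf{X}_n}\,\Psi_{i,\mathbf{p}_i}\big)\,. \tag{7.5.32}$$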
so¹⁷
$$\exp(i\mathbf{q}\cdot\mathbf{X})\,\Psi_{f,\mathbf{p}_f} = \Psi_{f,\mathbf{p}_f+\mathbf{q}}\,. \tag{7.5.35}$$
Hence, replacing all Xn in the exponent in Eq. (7.5.32) with X, and letting
the adjoint of this exponential act on the final state, we have
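$$S[i(\mathbf{p}_i) \to f(\mathbf{p}_f) + \gamma(\mathbf{q},\lambda)] = \frac{-2\pi i\,\sqrt{4\pi}\,\delta(E_f + |\mathbf{q}| - E_i)}{(2\pi)^{3/2}\sqrt{2|\mathbf{q}|}} \sum_n \frac{e_n}{m_n}\,\big(\Psi_{f,\mathbf{p}_f+\mathbf{q}},\, \mathbf{e}^*(\mathbf{q},\lambda)\cdot\mathbf{P}_n\,\Psi_{i,\mathbf{p}_i}\big)\,. \tag{7.5.36}$$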
The operators Pn all commute with the total momentum, so we can write their
matrix elements as
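$$\big(\Psi_{f,\mathbf{p}_f+\mathbf{q}},\, \mathbf{P}_n\,\Psi_{i,\mathbf{p}_i}\big) = \delta^3(\mathbf{p}_f - \mathbf{p}_i + \mathbf{q})\,(\mathbf{P}_n)_{fi} \tag{7.5.37}$$
and so
$$S[i(\mathbf{p}_i) \to f(\mathbf{p}_f) + \gamma(\mathbf{q},\lambda)] = -2\pi i\,\delta(E_f - E_i + |\mathbf{q}|)\,\delta^3(\mathbf{p}_f - \mathbf{p}_i + \mathbf{q})\, M[i(\mathbf{p}_i) \to f(\mathbf{p}_f) + \gamma(\mathbf{q},\lambda)]\,, \tag{7.5.38}$$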
17 This argument does not rule out the possible presence of a numerical factor multiplying the right-hand side of Eq. (7.5.35). Any such factor of proportionality would have to have absolute magnitude unity, because $[\exp(i\mathbf{q}\cdot\mathbf{X})]^\dagger \exp(i\mathbf{q}\cdot\mathbf{X}) = 1$, and we define both $\Psi_{f,\mathbf{p}_f}$ and $\Psi_{f,\mathbf{p}_f+\mathbf{q}}$ to be normalized in accordance with Eq. (7.5.29). Such a phase factor would depend on our arbitrary choice of the phase of the state $\Psi_{f,\mathbf{p}_f}$ as a function of $\mathbf{p}_f$ and can be defined to be unity, but in any case it cannot affect the radiative transition rate, which is proportional to the absolute value squared of the matrix element for the transition. So this possible phase factor will be ignored here.
where
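$$M[i(\mathbf{p}_i) \to f(\mathbf{p}_f) + \gamma(\mathbf{q},\lambda)] = \frac{\sqrt{4\pi}}{(2\pi)^{3/2}\sqrt{2|\mathbf{q}|}} \sum_n \frac{e_n}{m_n}\,(\mathbf{P}_n)_{fi}\cdot\mathbf{e}^*(\mathbf{q},\lambda)\,. \tag{7.5.39}$$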
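To see how this is calculated in wave mechanics, note for example that in hydrogen the initial and final atomic wave functions take the form
$$\psi_{i,\mathbf{p}_i}(\mathbf{x},\mathbf{X}) = \frac{\exp(i\mathbf{p}_i\cdot\mathbf{X})}{(2\pi)^{3/2}}\,\psi_i(\mathbf{x})\,,$$
$$\psi_{f,\mathbf{p}_f+\mathbf{q}}(\mathbf{x},\mathbf{X}) = \frac{\exp(i[\mathbf{p}_f+\mathbf{q}]\cdot\mathbf{X})}{(2\pi)^{3/2}}\,\psi_f(\mathbf{x})\,,$$
where x is the vector separation of the electron and proton and X is the coordinate vector of the center of mass. With $m_e \ll m_p$, the matrix element of the electron momentum operator $\mathbf{P}_e$ is
$$\big(\Psi_{f,\mathbf{p}_f+\mathbf{q}},\, \mathbf{P}_e\,\Psi_{i,\mathbf{p}_i}\big) = \int d^3x \int d^3X\; \psi^*_{f,\mathbf{p}_f+\mathbf{q}}(\mathbf{x},\mathbf{X})\,\mathbf{P}_e\,\psi_{i,\mathbf{p}_i}(\mathbf{x},\mathbf{X})\,,$$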
Using Eq. (7.5.39) in Eq. (5.6.45) (with the number Nα of particles in the
initial state equal to one), the differential decay rate is
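$$d\Gamma(i \to f + \mathbf{q},\lambda) = 2\pi\,\big| M[i(\mathbf{p}_i) \to f(\mathbf{p}_f) + \gamma(\mathbf{q},\lambda)] \big|^2\, \delta(E_f + |\mathbf{q}| - E_i)\,\delta^3(\mathbf{p}_f + \mathbf{q} - \mathbf{p}_i)\, d^3q\, d^3p_f$$
$$= \frac{1}{2\pi|\mathbf{q}|} \left| \sum_n \frac{e_n}{m_n}\,(\mathbf{P}_n)_{fi}\cdot\mathbf{e}^*(\mathbf{q},\lambda) \right|^2 \delta(E_f + |\mathbf{q}| - E_i)\,\delta^3(\mathbf{p}_f + \mathbf{q} - \mathbf{p}_i)\, d^3q\, d^3p_f\,.$$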
In the common case where the photon helicity is not measured, the observed
rate is given by a sum over helicities. This sum can be calculated using
Eq. (7.5.20):
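$$\sum_\lambda e_j(\mathbf{q},\lambda)\,e_k^*(\mathbf{q},\lambda) = \delta_{jk} - \hat{q}_j \hat{q}_k\,,$$
where $\hat{\mathbf{q}}$ is the unit vector $\mathbf{q}/|\mathbf{q}|$. The observed differential decay rate for emission of a photon with momentum q into a small solid angle dΩ is
$$\sum_\lambda d\Gamma(i \to f + \mathbf{q},\lambda) = \frac{|\mathbf{q}|}{2\pi} \sum_n \frac{e_n}{m_n}\,(P_{nj})_{fi} \sum_{n'} \frac{e_{n'}}{m_{n'}}\,(P_{n'k})^*_{fi}\, (\delta_{jk} - \hat{q}_j \hat{q}_k)\, d\Omega\,. \tag{7.5.42}$$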
We can now easily integrate over the photon direction, using
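$$\int d\Omega\, \left( \delta_{jk} - \hat{q}_j \hat{q}_k \right) = 4\pi\,\delta_{jk}\left( 1 - \frac{1}{3} \right) = \frac{8\pi}{3}\,\delta_{jk}\,.$$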
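The angular integral above is easy to confirm by brute-force quadrature over the unit sphere. The sketch below uses a plain midpoint rule in the polar and azimuthal angles; the grid sizes are arbitrary, and the function name is only a label for this example.

```python
import math

def angular_integral(j, k, n_theta=200, n_phi=200):
    """Midpoint-rule estimate of the integral over solid angle of
    (delta_jk - qhat_j qhat_k), with qhat the unit vector at (theta, phi)."""
    d_theta = math.pi / n_theta
    d_phi = 2.0 * math.pi / n_phi
    delta = 1.0 if j == k else 0.0
    total = 0.0
    for a in range(n_theta):
        theta = (a + 0.5) * d_theta
        st, ct = math.sin(theta), math.cos(theta)
        for b in range(n_phi):
            phi = (b + 0.5) * d_phi
            q_hat = (st * math.cos(phi), st * math.sin(phi), ct)
            # Solid-angle element is sin(theta) dtheta dphi.
            total += (delta - q_hat[j] * q_hat[k]) * st * d_theta * d_phi
    return total

expected_diagonal = 8.0 * math.pi / 3.0
for j in range(3):
    for k in range(3):
        print(j, k, round(angular_integral(j, k), 4))
```

The diagonal entries should come out close to 8π/3 and the off-diagonal entries close to zero, up to the discretization error of the grid.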
The total decay rate for emission of a photon in any direction with any helicity
is then, in Einstein’s notation,
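$$A_i^f = \sum_\lambda \int d\Gamma(i \to f + \mathbf{q},\lambda) = \frac{4|\mathbf{q}|}{3} \left| \sum_n \frac{e_n}{m_n}\,(\mathbf{P}_n)_{fi} \right|^2. \tag{7.5.43}$$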
Section 3.5 shows how to use this also to calculate the rates of absorption and
stimulated emission of radiation.
Calculations are made easier if we replace matrix elements of momen-
tum vectors with matrix elements of position vectors. For this, we use the
commutator
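$$[\mathbf{X}_n, H] = \left[ \mathbf{X}_n,\, \sum_{n'} \frac{1}{2m_{n'}}\,\mathbf{P}_{n'}^2 \right] = \frac{i}{m_n}\,\mathbf{P}_n\,,$$
so
$$\big(\Psi_{f,\mathbf{p}_f+\mathbf{q}},\, \mathbf{P}_n\,\Psi_{i,\mathbf{p}_i}\big) = -i(E_i - E_f)\,m_n\,\big(\Psi_{f,\mathbf{p}_f+\mathbf{q}},\, \mathbf{X}_n\,\Psi_{i,\mathbf{p}_i}\big) = -i|\mathbf{q}|\,m_n\,\big(\Psi_{f,\mathbf{p}_f+\mathbf{q}},\, \mathbf{X}_n\,\Psi_{i,\mathbf{p}_i}\big)\,.$$
Therefore the decay rate (7.5.43) may be written
$$A_i^f = \frac{4|\mathbf{q}|^3}{3} \left| \sum_n e_n\,(\mathbf{X}_n)_{fi} \right|^2, \tag{7.5.44}$$
where
$$\big(\Psi_{f,\mathbf{p}_f+\mathbf{q}},\, \mathbf{X}_n\,\Psi_{i,\mathbf{p}_i}\big) = \delta^3(\mathbf{p}_f - \mathbf{p}_i + \mathbf{q})\,(\mathbf{X}_n)_{fi}\,. \tag{7.5.45}$$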
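To give a feeling for the size of the rate (7.5.44), here is a rough numerical sketch for the Lyman alpha transition 2p → 1s in hydrogen. It assumes units with ħ = c = 1 in which e² equals the fine-structure constant α, the Bohr energy difference |q| = (3/8)α²mₑ, and the standard textbook value (2¹⁵/3¹⁰)a² for the squared dipole matrix element, with a = 1/(αmₑ) the Bohr radius. None of these inputs is derived in this section, and the constant names are just labels chosen for the example.

```python
# Assumed physical constants (standard reference values, not from the text).
ALPHA = 7.2973525693e-3              # fine-structure constant, e^2 in these units
ELECTRON_REST_ENERGY_EV = 0.51099895e6
HBAR_EV_S = 6.582119569e-16          # hbar in eV s

# Work in units with hbar = c = 1 and the electron mass set to 1.
bohr_radius = 1.0 / ALPHA                    # a = 1 / (alpha m_e)
photon_energy = 0.375 * ALPHA ** 2           # |q| = E_2 - E_1 = (3/8) alpha^2 m_e

# Assumed textbook value of |(X)_fi|^2 for 2p -> 1s, summed over components.
dipole_squared = (2 ** 15 / 3 ** 10) * bohr_radius ** 2

# Eq. (7.5.44) with a single effective charge e acting on the relative coordinate.
rate_natural = (4.0 * photon_energy ** 3 / 3.0) * ALPHA * dipole_squared

# Convert from units of m_e c^2 / hbar to inverse seconds.
rate_per_second = rate_natural * ELECTRON_REST_ENERGY_EV / HBAR_EV_S
lifetime_ns = 1.0e9 / rate_per_second

print(f"decay rate ~ {rate_per_second:.3e} per second")
print(f"lifetime   ~ {lifetime_ns:.2f} ns")
```

Under these assumptions the rate reduces to (2/3)⁸α⁵mₑc²/ħ, which is a few times 10⁸ per second, corresponding to a lifetime of order a nanosecond.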
Selection Rules
As we have already warned, the electric dipole approximation is not useful if
selection rules give zero for the decay matrix element. We can derive the selec-
tion rules from either Eq. (7.5.43) or Eq. (7.5.44). First, as shown in Section 5.2,
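the components of the operator X can be assembled into the spherical harmonics for ℓ = 1,
$$\mp\sqrt{\frac{3}{8\pi}}\,(X_1 \pm iX_2) = |\mathbf{X}|\,Y_1^{\pm 1}(\mathbf{X}/|\mathbf{X}|)\,, \qquad \sqrt{\frac{3}{4\pi}}\,X_3 = |\mathbf{X}|\,Y_1^{0}(\mathbf{X}/|\mathbf{X}|)\,.$$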
According to the rules for addition of angular momenta set out in Section 5.4,
if the initial atom at rest has total angular momentum quantum number ji , then
the states $X_k\Psi_{i,\mathbf{p}_i}$ for $\mathbf{p}_i = 0$ can only have total angular momentum quantum
number jf equal to ji + 1, ji , or ji − 1 and, furthermore, if ji = 0 then only
jf = 1 is possible; jf = 0 is only possible if ji = 1. Hence radiative decay
does not occur in the electric dipole approximation unless the initial and final
atomic states satisfy the selection rule
|jf − ji | ≤ 1 ≤ ji + jf . (7.5.46)
There is a further selection rule that follows from space inversion symmetry.
As we saw in Section 5.4, if we change the sign of each of the three Cartesian coordinates, any state vector Ψ is changed to $\mathsf{P}\Psi$, where the operator $\mathsf{P}$ is unitary in the sense that $\mathsf{P}^\dagger\mathsf{P} = 1$, and, since making two space inversions in succession changes nothing, also $\mathsf{P}^2 = 1$. Physical states therefore can be chosen as eigenstates of $\mathsf{P}$ with eigenvalue, known as the parity of the state, equal to +1 or −1. The coordinate vector is obviously odd under space inversion, so
18 W. Heisenberg, Z. Physik 33, 879 (1925); reprinted in English in Van der Waerden, Sources of Quantum
Mechanics, listed in the bibliography.
19 J. Larmor, Phil. Mag. S.5 44, 503 (1897).
20 P. A. M. Dirac, Proc. Roy. Soc. A 114, 710 (1927).
$\mathsf{P}^{-1}\mathbf{X}_n\mathsf{P} = -\mathbf{X}_n$. Hence if the initial and final atomic states have parity πi and
πf , the transition rate will vanish in the electric dipole approximation unless the
initial and final parities satisfy the selection rule
πf = −πi . (7.5.47)
For instance, in hydrogen the transition 2p → 1s (ignoring spin) has ji = 1,
πi = −, jf = 0, πf = +, so it satisfies the selection rules (7.5.46) and (7.5.47)
and is therefore predominantly an electric dipole transition. This is the Lyman
alpha ultraviolet transition. On the other hand, in the electric dipole approxima-
tion the 2s, 3s, and 3d states are forbidden by both selection rules from decaying
into the 1s ground state.
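The two electric dipole selection rules (7.5.46) and (7.5.47) are simple enough to encode as a small predicate, which makes it easy to check the hydrogen examples just mentioned. This is only an illustrative sketch: the function name is invented here, spin is ignored as in the text, and parities are represented as +1 or −1.

```python
from fractions import Fraction

def e1_allowed(j_i, parity_i, j_f, parity_f):
    """Return True if an electric dipole transition is allowed by
    |j_f - j_i| <= 1 <= j_i + j_f  (7.5.46) and  pi_f = -pi_i  (7.5.47)."""
    j_i, j_f = Fraction(j_i), Fraction(j_f)   # allows half-integer j such as "3/2"
    triangle = abs(j_f - j_i) <= 1 <= j_i + j_f
    parity_flip = parity_f == -parity_i
    return triangle and parity_flip

# Hydrogen states ignoring spin: (j, parity) with j equal to the orbital l.
STATE_1S = (0, +1)
STATE_2S = (0, +1)
STATE_2P = (1, -1)
STATE_3D = (2, +1)

print("2p -> 1s:", e1_allowed(*STATE_2P, *STATE_1S))
print("2s -> 1s:", e1_allowed(*STATE_2S, *STATE_1S))
print("3d -> 1s:", e1_allowed(*STATE_3D, *STATE_1S))
```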
Of course the electric dipole approximation is just an approximation. Instead
of simply replacing the coordinates Xn in the exponent in Eq. (7.5.32) with the
center-of-mass coordinate vector X, we can expand the exponential in powers of
the small quantity q·[Xn −X]. With one factor of this quantity, the operator in the
matrix element involves two factors of coordinates, which can be assembled into
the spherical harmonics Y2m and Y1m , which are respectively known as electric
quadrupole and magnetic dipole terms. With two factors of coordinates, these
operators are even under space inversion, so these contributions to the matrix
element vanish unless the initial and final states satisfy the selection rules
$$|j_i - j_f| \le 2 \le j_i + j_f\,, \quad \pi_i = \pi_f \qquad \text{electric quadrupole} \tag{7.5.48}$$
$$|j_i - j_f| \le 1 \le j_i + j_f\,, \quad \pi_i = \pi_f \qquad \text{magnetic dipole}\,. \tag{7.5.49}$$
For instance, in hydrogen the transition 3d → 1s occurs as an electric
quadrupole transition. The rates of both electric quadrupole and magnetic dipole
transitions are suppressed relative to electric dipole transitions by factors of
order (qr/ħ)², where r is a characteristic atomic radius; for optical transitions,
this is of order 10⁻⁷.
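The rules (7.5.48) and (7.5.49), together with the size of the suppression factor, can be sketched in a few lines. The wavelength of 500 nm and the radius of 0.529 Å used below are assumed representative values for an optical transition and an atomic size, not numbers taken from the text, and the function name is hypothetical.

```python
import math

def even_parity_multipole_allowed(j_i, parity_i, j_f, parity_f, kind):
    """Selection rules (7.5.48)-(7.5.49): kind is "E2" (electric quadrupole)
    or "M1" (magnetic dipole); both require unchanged parity."""
    order = {"E2": 2, "M1": 1}[kind]
    return abs(j_i - j_f) <= order <= j_i + j_f and parity_i == parity_f

# Hydrogen 3d -> 1s, ignoring spin: j goes from 2 to 0 with parity + to +.
print("3d -> 1s as E2:", even_parity_multipole_allowed(2, +1, 0, +1, "E2"))
print("3d -> 1s as M1:", even_parity_multipole_allowed(2, +1, 0, +1, "M1"))

# Rough suppression factor (q r / hbar)^2, with q / hbar = 2 pi / wavelength.
wavelength_m = 500e-9      # assumed optical wavelength
atomic_radius_m = 0.529e-10  # assumed atomic radius (Bohr radius)
suppression = (2.0 * math.pi * atomic_radius_m / wavelength_m) ** 2
print(f"suppression factor ~ {suppression:.1e}")
```

With these inputs the factor comes out at a few times 10⁻⁷, consistent with the order of magnitude quoted above.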
We can go on, including higher and higher powers of Xn −X in the expansion
of the exponential in Eq. (7.5.32), and also including effects of electron spin.
But whatever effects we include, there is one kind of transition in which single-
photon emission is completely forbidden: transitions between states that both
have total angular momentum zero. This is a simple consequence of angular
momentum conservation. As we have seen, a photon of helicity ±1 has an
angular momentum component in the direction of motion ±1, and therefore
cannot be emitted in a transition between states that have zero total angular
momentum. For instance, none of the excited states of ¹²C or ¹⁶O with j = 0
can emit a single photon in gamma decay to the j = 0 ground state. Such
transitions require the emission of pairs of photons, or if enough energy is
available, of electron–positron pairs.
Today this reasoning is often run in reverse. It is assumed that the Lagrangian
density is invariant under the local phase transformation (7.5.59), with (x)
an arbitrary infinitesimal function of spacetime coordinates, and from this the
existence is deduced of a vector field Aμ (x) whose properties are governed by
invariance under the gauge transformation (7.5.60).
Indeed, our Standard Model of elementary particles and forces is based on
an assumed invariance under a larger group of local transformations, not just
by x-dependent phases, as in Eq. (7.5.59), but transformations by x-dependent
matrices similar to those for isotopic spin rotations. From this, one deduces
the existence of a number of photon-like particles: some, the gluons with zero
mass, whose strong interactions prevent them from being observed in isolation,
and others that are observed, the W^± and Z^0, that become massive as a result
of a spontaneous breakdown of the local gauge symmetry. But these matters are
beyond the scope of this book.
Assorted Problems
2. Suppose that Einstein in 1905 had assumed that, in the radiation at tem-
perature T in a cubical enclosure, the number n of photons for each wave
number and polarization is not any positive integer but can only be n = 0,
n = 1, or n = 2. What would he have found for the energy density E (ν, T )
per unit frequency interval at frequency ν and temperature T ?
3. Suppose that in the 1910 experiment that revealed the existence of the
nucleus of the atom, the nucleus had been moving toward the radon alpha
ray source with speed v0 . What could one conclude about the mass of the
nucleus from the observation that alpha particles are sometimes scattered
straight backwards from the atom?
• What would Bohr in 1913 have found for the radii rn , velocities vn , and
energies En ?
• For what values of η do circular orbits exist that have En < 0?
• What would be found for the relation between ħ and h if one imposed
Bohr’s correspondence principle on the orbits with n ≫ 1?
5. How does the pressure in a non-relativistic ideal gas vary when the mass
density varies adiabatically, assuming that the internal energy density is
either
Show that with the fields and densities transforming in this way, these
equations really are Lorentz invariant.
12. A particle known as a K meson, with mass 494 MeV/c², decays at rest into
a muon, with mass 106 MeV/c², and a neutrino, with negligible mass. Use
the conservation of energy and momentum to find the velocity of the muon.
13. What second-order partial differential equation (second-order in both time
and space derivatives) is satisfied by the de Broglie wave function for a free
particle when we do not assume that its velocity is much less than c?
14. When a beam of electrons of some definite energy is directed at a perfect
crystal, it is found that the largest angle θ between the incident and reflected
waves at which reflection is enhanced by constructive interference is 150°.
At what other value or values of θ is reflection enhanced by constructive
interference?
15. Suppose we measure the position of the electron in the lowest-energy state
of a hydrogen atom. What is the probability of finding that the electron is
farther than 10⁻⁸ cm from the nucleus?
16. Consider an electron in a d_{3/2} state with orbital angular momentum quantum
number ℓ = 2, total angular momentum quantum number j = 3/2,
and total angular momentum 3-component J_3 = ħ/2. Suppose we measure
the 3-component S3 of the spin. What are the probabilities of getting the
results S3 = h̄/2 and S3 = −h̄/2? (Calculate whatever Clebsch–Gordan
coefficients you need – do not just look them up in a table.)
17. Suppose the electron has spin 3/2 rather than 1/2, but that all other prop-
erties of electrons and nuclei are as they are in the real world. What would
you expect would be the atomic numbers Z of the two lightest halogen
elements, that behave like fluorine and chlorine in our world?
18. When a free electron is placed in a uniform magnetic field B pointing in the
1-direction, the Hamiltonian becomes
$$H = \frac{\mathbf{p}^2}{2m_e} + \mu|\mathbf{B}|S_1$$
where S is the operator representing the electron spin vector and μ is a
constant, related to the electron magnetic moment. Suppose that at t = 0
the expectation value of the spin vector has components
What are the expectation values of the spin vector components at any later
time?
19. Suppose that the interaction of the electron in a hydrogen atom with some
sort of external field produces a term in the potential
V (r) = gr ,
where g is a small constant. Calculate the terms in the resulting shift in the
energy of the 1s state that are of first and second order in g.
20. Suppose that the spin–orbit coupling of the electron in hydrogen produces
a term in the Hamiltonian
H = ξ L · S
where ξ is a constant, and L and S are the orbital angular momentum and
spin angular momentum of the electron. What does this term contribute to
the fine-structure splitting between the 2p1/2 and 2p3/2 states of hydrogen?
21. Consider the scattering of a spinless particle of mass m and momentum p
by a central potential
V (r) = V0 exp (−r 3 /R 3 )
where V0 and R are constants. Use the Born approximation to give a formula
for the scattering amplitude in the limit pR ≪ ħ.
22. Consider a one-particle system with a Lagrangian
$$L = \frac{m}{2}\left( \frac{d\mathbf{X}}{dt} \right)^2 + \frac{d\mathbf{X}}{dt}\cdot\mathbf{V}(\mathbf{X})\,,$$
where V is some vector function of position X.
• What equation of motion is satisfied by X?
• Find the Hamiltonian of this theory.
• What is the differential equation satisfied by the wave function ψ(x) of
a state with a definite energy E?
23. Consider a particle of charge e and mass m in classical electromagnetic
potentials that depend on time as well as position, with Hamiltonian
$$H(\mathbf{X},\mathbf{P}) = \frac{1}{2m}\left[ \mathbf{P} - \frac{e}{c}\,\mathbf{A}(\mathbf{X},t) \right]^2 - e\phi(\mathbf{X},t)\,.$$
Suppose you perform a time-dependent and position-dependent gauge
transformation, to new potentials
$$\mathbf{A}' = \mathbf{A} + \nabla\xi\,, \qquad \phi' = \phi - \frac{1}{c}\frac{\partial \xi}{\partial t}\,,$$
where ξ is an arbitrary real function of position and time. What is the
relation between the wave function ψ′(x, t) that satisfies the time-dependent
Schrödinger equation for the new potentials and the wave function ψ(x, t)
that satisfies the time-dependent Schrödinger equation for the original
potentials?
24. Find the coordinate-space wave function of the one-particle state with
angular momentum ℓ = 0 and energy V0 + 2ħω in the harmonic oscillator
potential (6.3.1).
25. Suppose that the potential felt by an alpha particle for radius r outside the
nuclear radius R is not the Coulomb potential, but instead V(r) = g/r²,
where g is some positive constant. Calculate the exponential suppression
factor in the rate of decay of an unstable alpha particle state with energy
E ≪ g/R².
26. Consider the theory of a neutral spinless particle A and a non-neutral spin-
less particle B, with Lagrangian density
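$$\mathcal{L} = -\frac{1}{2}\,\eta^{\mu\nu}\frac{\partial \varphi_A}{\partial x^\mu}\frac{\partial \varphi_A}{\partial x^\nu} - \frac{m_A^2}{2}\,\varphi_A^2 - \eta^{\mu\nu}\frac{\partial \varphi_B^\dagger}{\partial x^\mu}\frac{\partial \varphi_B}{\partial x^\nu} - m_B^2\,\varphi_B^\dagger\varphi_B - g\,\varphi_A\,\varphi_B^\dagger\varphi_B\,.$$
Calculate the S-matrix elements for the processes A + B → A + B and
B + B̄ → A + A to lowest order in g, where B̄ is the antiparticle of B.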
27. Calculate the rate for emission of a photon in the transition 2p → 1s in
hydrogen. Derive formulas and use them to find numerical values. You can
use the facts that the proton is much heavier than the electron, and that the
wavelength of the photon emitted in this process is much larger than the
atomic size, and you can neglect electron spin.
28. What powers of the photon wave number appear in the rates for single-
photon emission in the decays of the 4f state of hydrogen into the 2s and
2p states?
310 Subject Index