Special Relativity
Benjamin Crowell
Fullerton, California
Copyright © 2013 Benjamin Crowell
rev. February 20, 2014
Permission is granted to copy, distribute and/or modify this docu-
ment under the terms of the Creative Commons Attribution Share-
Alike License, which can be found at The
license applies to the entire text of this book, plus all the illustra-
tions that are by Benjamin Crowell. All the illustrations are by
Benjamin Crowell except as noted in the photo credits or in parentheses
in the caption of the figure. This book can be downloaded
free of charge from in a variety of formats,
including editable formats.
Brief Contents
1 Spacetime 11
2 Foundations (optional) 39
3 Kinematics 49
4 Dynamics 69
5 Inertia (optional) 101
6 Waves 109
7 Coordinates 129
8 Rotation (optional) 139
9 Flux 149
10 Electromagnetism 175
1 Spacetime 11
1.1 Three models of spacetime . . . . . . . . . . . . . 11
Aristotelian spacetime, 12. Galilean spacetime, 13. Einstein's
spacetime, 15.
1.2 Minkowski coordinates . . . . . . . . . . . . . . . 20
1.3 Measurement. . . . . . . . . . . . . . . . . . . 21
Invariants, 21. The metric, 22. The gamma factor, 25.
1.4 The Lorentz transformation . . . . . . . . . . . . . 28
Problems . . . . . . . . . . . . . . . . . . . . . . 34
2 Foundations (optional) 39
2.1 Causality . . . . . . . . . . . . . . . . . . . . 39
The arrow of time, 39. Initial-value problems, 39. A modest
definition of causality, 40.
2.2 Flatness . . . . . . . . . . . . . . . . . . . . . 40
Failure of parallelism, 41. Parallel transport, 41. Special relativity
requires flat spacetime, 41.
2.3 Additional postulates. . . . . . . . . . . . . . . . 43
2.4 Other axiomatizations . . . . . . . . . . . . . . . 43
Einstein's postulates, 43. Maximal time, 44. Comparison of the
systems, 44.
2.5 Lemma: spacetime area is invariant . . . . . . . . . 45
Problems . . . . . . . . . . . . . . . . . . . . . . 47
3 Kinematics 49
3.1 How can they both . . . ? . . . . . . . . . . . . . . 50
3.2 The stretch factor is the Doppler shift . . . . . . . . . 51
3.3 Combination of velocities . . . . . . . . . . . . . . 53
3.4 No frame of reference moving at c . . . . . . . . . . 55
3.5 The velocity and acceleration vectors. . . . . . . . . 56
The velocity vector, 56. The acceleration vector, 57. Constraints
on the velocity and acceleration vectors, 58.
3.6 Some kinematic identities . . . . . . . . . . . . . 61
3.7 The projection operator . . . . . . . . . . . . . . 61
3.8 Faster-than-light frames of reference?. . . . . . . . 64
Problems . . . . . . . . . . . . . . . . . . . . . . 66
4 Dynamics 69
4.1 Ultrarelativistic particles . . . . . . . . . . . . . . 69
4.2 E=mc² . . . . . . . . . . . . . . . . . . . . . . 72
4.3 Relativistic momentum . . . . . . . . . . . . . . . 77
Massless particles travel at c, 82. No global conservation of energy-momentum
in general relativity, 83.
4.4 Systems with internal structure. . . . . . . . . . . 84
4.5 Force . . . . . . . . . . . . . . . . . . . . . 86
Four-force, 86. The force measured by an observer, 86. Transformation
of the force measured by an observer, 88.
4.6 Degenerate matter . . . . . . . . . . . . . . . . 89
4.7 Tachyons and FTL . . . . . . . . . . . . . . . . 92
A defense in depth, 92. Experiments to search for tachyons, 93.
Problems . . . . . . . . . . . . . . . . . . . . . . 96
5 Inertia (optional) 101
5.1 What is inertial motion? . . . . . . . . . . . . . . 101
An operational definition, 101. Equivalence of inertial and gravitational
mass, 103.
5.2 The equivalence principle. . . . . . . . . . . . . . 104
Equivalence of acceleration to a gravitational field, 104. Eötvös
experiments, 104. Gravity without gravity, 105. Gravitational
Doppler shifts, 105. A varying metric, 106.
Problems . . . . . . . . . . . . . . . . . . . . . . 108
6 Waves 109
6.1 Frequency . . . . . . . . . . . . . . . . . . . . 109
Is time's flow constant?, 109. Clock-comparison experiments, 109.
Birdtracks notation, 110. Duality, 111.
6.2 Phase . . . . . . . . . . . . . . . . . . . . . . 111
Phase is a scalar, 111. Scaling, 112.
6.3 The frequency-wavenumber covector . . . . . . . . . 112
Visualization, 112. The gradient, 113. Phase and group velocity,
6.4 Duality . . . . . . . . . . . . . . . . . . . . . 114
Duality in 3+1 dimensions, 114. Change of basis, 117.
6.5 The Doppler shift and aberration. . . . . . . . . . . 117
Doppler shift, 117. Aberration, 118.
6.6 Some related mathematical tools . . . . . . . . . . 121
Abstract index notation, 121. Volume, 124.
Problems . . . . . . . . . . . . . . . . . . . . . . 127
7 Coordinates 129
7.1 An example: accelerated coordinates. . . . . . . . . 129
7.2 Transformation of vectors . . . . . . . . . . . . . . 130
7.3 Transformation of the metric . . . . . . . . . . . . 132
7.4 Summary of transformation laws. . . . . . . . . . . 134
7.5 Inertia and rates of change . . . . . . . . . . . . . 136
Problems . . . . . . . . . . . . . . . . . . . . . . 137
8 Rotation (optional) 139
8.1 Rotating frames of reference . . . . . . . . . . . . 139
No clock synchronization, 139. Rotation is locally detectable, 140.
The Sagnac effect, 140. A rotating coordinate system, 141.
8.2 Boosts and rotations. . . . . . . . . . . . . . . . 143
Rotations, 143. Boosts, 144. Thomas precession, 145.
Problems . . . . . . . . . . . . . . . . . . . . . . 147
9 Flux 149
9.1 The current vector . . . . . . . . . . . . . . . . . 149
Current as the flux of charged particles, 149. Conservation of
charge, 152.
9.2 The stress-energy tensor . . . . . . . . . . . . . . 153
Conservation and flux of energy-momentum, 153. Symmetry of
the stress-energy tensor, 153. Dust, 154. Rank-2 tensors and
their transformation law, 154. Pressure, 156. A perfect fluid,
156. Two simple examples, 158. Energy conditions, 160.
9.3 Gauss's theorem . . . . . . . . . . . . . . . . . 162
Integral conservation laws, 162. A simple form of Gauss's theorem,
162. The general form of Gauss's theorem, 163. The energy-momentum
vector, 165.
9.4 The covariant derivative. . . . . . . . . . . . . . 167
Comma, semicolon, and birdtracks notation, 169. Finding the
Christoffel symbol from the metric, 169. The geodesic equation,
Problems . . . . . . . . . . . . . . . . . . . . . . 174
10 Electromagnetism 175
10.1 Relativity requires magnetism . . . . . . . . . . . 175
10.2 Fields in relativity. . . . . . . . . . . . . . . . . 176
Time delays in forces exerted at a distance, 176. Fields carry
energy, 176. Fields must have transformation laws, 177.
10.3 Electromagnetic fields . . . . . . . . . . . . . . 178
The electric field, 178. The magnetic field, 178. What about
gravity?, 181.
10.4 Transformation of the fields . . . . . . . . . . . . 181
10.5 Invariants . . . . . . . . . . . . . . . . . . . . 182
10.6 Stress-energy tensor of the electromagnetic field . . . 182
10.7 Maxwell's equations . . . . . . . . . . . . . . . 186
Statement and interpretation, 186. Experimental support, 187.
Incompatibility with Galilean spacetime, 187. Not manifestly relativistic
in their original form, 187. Lorentz invariance, 189.
Problems . . . . . . . . . . . . . . . . . . . . . . 193
Appendix ??: Hints and solutions . . . . . . . . . . . . . . . . . . . . . . . . . . ??
Chapter 1
1.1 Three models of spacetime
The test of a first-rate intelligence is the ability to hold two opposing
ideas in mind at the same time and still retain the ability to
function. F. Scott Fitzgerald
a / Three views of spacetime. 1. A
typical graph of a particle's motion:
an oscillation. 2. In relativity,
it's customary to swap the axes,
and 3. we can even remove the
axes entirely.
Time and space together make spacetime, figure a, the stage on
which physics is played out. Until 1905, physicists were trained to
accept two mutually contradictory theories of spacetime. I'll call
these the Aristotelian and Galilean views, although my colleagues
from that era would have been offended to be accused of even partial
Aristotelianism.
c / Valid vectors representing
observers and simultaneity, ac-
cording to the Aristotelian model
of spacetime.
b / 1. An observer and two clocks.
2. Idealization as events. 3. Vec-
tors used to represent relation-
ships between events.
1.1.1 Aristotelian spacetime
Figure b/1 shows an observer and two clocks, represented using
the graphical conventions of figure a/3. The existence of such a
material object at a certain place and time constitutes an event,
which we idealize as a point, b/2. Spacetime consists of the set of
all events. As time passes, a physical object traces out a continuous
curve, a set of events known in relativistic parlance as its world-
line. Since paper and computer screens are two-dimensional, the
drawings only represent one dimension of space plus one dimension
of time, which in relativity we call 1+1 dimensions. The real
universe has three spatial dimensions, so real spacetime has 3+1
dimensions. Most, but not all, of the interesting phenomena in
special relativity can be understood in 1+1 dimensions, so whenever
possible in this book I will draw 1+1-dimensional figures without
apology or explanation.
The relativist's attitude is that events and relationships between
events are primary, while coordinates such as x and t are secondary
and possibly irrelevant. Coordinates let us attach labels like (x, t) to
points, but this is like God asking Adam to name all the birds and
animals: the animals didn't care about the names. Figure b/3 shows
the use of vectors to indicate relationships between points. Vector
o is an observer-vector, connecting two points on the world-line of
the person. It points from the past into the future. The vector s
connecting the two clocks is a vector of simultaneity. The clocks
have previously been synchronized side by side, and if we assume
that transporting them to separate locations doesn't disrupt them,
then the fact that both clocks read two minutes after three o'clock
tells us that the two events occur at the same time.
The Aristotelian model of spacetime is characterized by a set
of rules about what vectors are valid observer- and simultaneity-
vectors. We require that every o vector be parallel to every other,
and likewise for s vectors. But, as is usual with vectors, we allow
the arrow to be drawn anywhere without considering the different
locations to have any significance; that is, our model of spacetime
doesn't allow different regions to have different properties.
When Einstein was a university student, these rules (phrased dif-
ferently) were the ones he was taught to use in describing electricity
and magnetism. He later recalled imagining himself on a motorcycle,
riding along next to a light wave and trying to imagine how his
observations could be reconciled with Maxwell's equations. I don't
know whether he was ever brave enough to describe this daydream
to his professors, but if he had, their answer would have been essen-
tially that his hypothetical o vector was illegal. The good o vectors
were thought to be the ones that represented an observer at rest
relative to the ether, a hypothetical all-pervasive medium whose vi-
brations were electromagnetic waves. However silly this might seem
to us a hundred years later, it was in fact strongly supported by the
evidence. A vast number of experiments had verified the validity of
Maxwell's equations, and it was known that if Maxwell's equations
were valid in coordinates (x, t) defined by an observer o, they would
become invalid under the transformation (x′, t′) = (x + vt, t) to coordinates
defined by an observer o′ in motion at velocity v relative
to o.
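One way to see the trouble is to follow a single pulse of light through this change of coordinates. The sketch below is only an illustration under stated assumptions: it treats the pulse as a point moving at c according to o, and the helper names and the value of v are hypothetical, chosen for this example. Maxwell's equations predict one definite wave speed, but the transformed coordinates give the pulse a different one.

```python
def galilean_transform(x, t, v):
    """Apply the change of coordinates (x', t') = (x + v*t, t)."""
    return x + v * t, t


def speed_between(x0, t0, x1, t1):
    """Speed of something that travels from event (x0, t0) to event (x1, t1)."""
    return (x1 - x0) / (t1 - t0)


c = 299_792_458.0  # speed of light in m/s (exact SI value)
v = 30_000.0       # hypothetical relative velocity of observer o', in m/s

# Two events on a light pulse that moves at c according to observer o.
start = (0.0, 0.0)     # (x, t)
end = (c * 1.0, 1.0)   # one second later

# The same two events in the coordinates of o'.
start_p = galilean_transform(*start, v)
end_p = galilean_transform(*end, v)

speed_o = speed_between(*start, *end)
speed_o_prime = speed_between(*start_p, *end_p)

print("speed according to o: ", speed_o)
print("speed according to o':", speed_o_prime)
print("difference:           ", speed_o_prime - speed_o)  # equals v
```

The time coordinate is untouched by the transformation, which is the Aristotelian and Galilean assumption that all observers share the same s vectors.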
1.1.2 Galilean spacetime
But the Aristotelian model was already known to be wrong when
applied to material objects. The classic empirical demonstration of
this fact came around 1610 with Galileo's discovery of four moons
orbiting Jupiter, figure d. Aristotelianism in its ancient form was
originally devised as an explanation of why objects always seemed to
settle down to a natural state of rest according to an observer standing
on the earth's surface. But as Jupiter flew across the heavens,
its moons circled around it, without showing any natural tendency
to fall behind it like a paper cup thrown out the window of a car.
Just as an observer o₁ standing on the earth would consider the
earth to be at rest, o₂ hovering in a balloon at Jupiter's cloudtops
would say that the jovian clouds represented an equally natural
state of rest.
d / A simulation of how Jupiter
and its moons might appear at
intervals of three hours through
a telescope. Because we see
the moons' circular orbits edge-on,
their world-lines appear sinusoidal.
Over this time period, the
innermost moon, Io, completes
half a cycle.
e / Valid vectors representing
observers and simultaneity, according
to the Galilean model of spacetime.
f / Example 1.
We are thus led to a different, Galilean, set of rules for o and
s vectors. All s vectors are parallel to one another, but any vector
that is not parallel to an s vector is a valid o vector. (We may
wish to require that it point into the future rather than the past,
but Newton's laws are symmetric under time-reversal, so this is not
strictly necessary.)
Galilean spacetime, unlike Aristotelian spacetime, has no univer-
sal notion of same place. I can drive to Gettysburg, Pennsylvania,
and stand in front of the brass plaque that marks the site of the mo-
mentous Civil War battle. But am I really in the same place? An
observer whose frame of reference was fixed to another planet would
say that our planet had moved through space since 1863.
Note that our geometrical description includes a notion of parallelism,
but not of angular measure. We don't know or care whether
the angle between an s and an o is 90 degrees. One represents
a distance, while the other represents an interval of time, and we
can't define the angle between a distance and a time. The same was
true in the Aristotelian model; the vectors in figure c were drawn
perpendicular to one another simply as a matter of convention, but
any other angle could have been used.
The Galilean twin paradox Example 1
Alice and Betty are identical twins. Betty goes on a space voyage,
traveling away from the earth along vector o₁ and then turning
around and coming back on o₂. Meanwhile, Alice stays on earth.
Because this is an experiment involving material objects, and the
conditions are similar to those under which Galilean relativity has
been repeatedly verified by experiment, we expect the results to
be consistent with Galilean relativity's claim that motion is relative.
Therefore it seems that it should be equally valid to consider Betty
and the spaceship as having been at rest the whole time, while
Alice and the planet earth traveled away from the spaceship along
o₃ and then returned via o₄. But this is not consistent with the
experimental results, which show that Betty undergoes a violent
acceleration at her turnaround point, while Alice and the other
inhabitants of the earth feel no such effect.
The paradox is resolved by realizing that Galilean relativity defines
unambiguously whether or not two vectors are parallel. It's
true that we could fix a frame of reference in which o₁ represented
the spaceship staying at rest, but o₂ is not parallel to o₁, so in this
frame we still have a good explanation for why Betty feels an acceleration:
she has gone from being at rest to being in motion.
Regardless of which frame of reference we pick, and regardless
of whether we even fix a frame of reference, o₃ and o₄ are parallel
to one another, and this explains why Alice feels no effect.
g / The clock took up two seats,
and two tickets were bought for it
under the name of Mr. Clock.
h / All three clocks are mov-
ing to the east. Even though the
west-going plane is moving to the
west relative to the air, the air
is moving to the east due to the
earth's rotation.
1.1.3 Einstein's spacetime
We have two models of spacetime, neither of which is capable
of describing all the phenomena we observe. Because of the rela-
tively crude state of technology ca. 1900, it required considerable
insight for Einstein to piece together a fragmentary body of indirect
evidence and arrive at a consistent and correct model of spacetime.
Today, the evidence is part of everyday life. For example, every
time you use a GPS receiver, you're using Einstein's theory of relativity.
Somewhere between 1905 and today, technology became good
enough to allow conceptually simple experiments that students in
the early 20th century could only discuss in terms like "Imagine that
we could . . ."
A good jumping-on point is 1971. In that year, J.C. Hafele and
R.E. Keating brought atomic clocks aboard commercial airliners,
figure g, and went around the world, once from east to west and
once from west to east. Hafele and Keating observed that there was
a discrepancy between the times measured by the traveling clocks
and the times measured by similar clocks that stayed home at the
U.S. Naval Observatory in Washington. The east-going clock lost
time, ending up off by 59 ± 10 nanoseconds, while the west-going
one gained 273 ± 7 ns.
We are used to thinking of time as absolute and universal, so it
is disturbing to find that it can flow at a different rate for observers
in different frames of reference. Nevertheless, the effects that Hafele
and Keating observed were small. This makes sense: Galilean relativity
had already been thoroughly verified for material objects
such as clocks, planets, and airplanes, so a new theory like Einstein's
had to agree with Galileo's to a good approximation, within
the Galilean theory's realm of applicability. This requirement of
backward-compatibility is known as the correspondence principle.
It's also reassuring that the effects on time were small compared
to the three-day lengths of the plane trips. There was therefore no
opportunity for paradoxical scenarios such as one in which the east-
going experimenter arrived back in Washington before he left and
then convinced himself not to take the trip. A theory that maintains
this kind of orderly relationship between cause and effect is said to
satisfy causality.
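To put a number on "small," the following sketch divides the quoted clock discrepancies by the rough length of the trips. It is only back-of-the-envelope arithmetic: the three-day duration is the approximate figure quoted above, and the function name is hypothetical.

```python
NANOSECOND = 1e-9              # seconds
TRIP_DURATION = 3 * 24 * 3600  # about three days, in seconds

# Central values of the discrepancies quoted in the text.
east_shift = -59 * NANOSECOND   # the east-going clock lost time
west_shift = +273 * NANOSECOND  # the west-going clock gained time


def fractional_shift(shift, duration):
    """Clock discrepancy as a fraction of the elapsed time."""
    return shift / duration


for label, shift in [("east-going", east_shift), ("west-going", west_shift)]:
    print(f"{label}: {fractional_shift(shift, TRIP_DURATION):.1e}")
```

The fractions come out to a few parts in 10^13 and about one part in 10^12, which is why effects like this went unnoticed before atomic clocks.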
There were actually several effects at work, but these details do not affect
the present argument, which only depends on the fact that there is no absolute
time. See p. 106 for more on this topic.
For more about causality, see section 2.1, p. 39.
i / According to Einstein, simultaneity
is relative, not absolute.
Hafele and Keating were testing specific quantitative predictions
of relativity, and they verified them to within their experiment's error
bars. Let's work backward instead, and inspect the empirical
results for clues as to how time works. The disagreements among
the clocks suggest that simultaneity is not absolute: different observers
have different notions of simultaneity, as suggested in figure
i. Just as Galilean relativity freed the o vectors from the constraint
of being parallel to one another, Einstein frees the s vectors. Galileo
made "same place" into an ambiguous concept, while Einstein did
the same with "simultaneous." But because a particular observer
does have methods of synchronizing clocks (e.g., Einstein synchronization,
example 4, p. 19), the definition of simultaneity isn't completely
arbitrary. For each o vector we have a corresponding s vector,
which represents that observer's opinion as to what constitutes
simultaneity. Because the convention on a Cartesian x-t graph is to
draw the axes at right angles to one another, we refer to such a pair
of vectors as orthogonal, but the word is not to be interpreted literally,
since we can't define an actual angle between a time interval
and a spatial displacement.
j / Possibilities for the behavior of orthogonality.
What, then, are the rules for orthogonality? Figure j shows three
possibilities. In each case, we have an initial pair of vectors o₁ and s₁
that we assume are orthogonal, and we then draw a new pair o₂
and s₂ for a second observer who is in motion relative to the first.
The Galilean case, where s₂ remains parallel to s₁, has already been
ruled out. The second case is the one in which s rotates in the same
direction as o. This one is forbidden by causality, because if we kept
on rotating, we could eventually end up rotating o by 180 degrees,
so by a continued process of acceleration, we could send an observer
into a state in which her sense of time was reversed. We are left
with only one possibility for Einstein's spacetime, which is the one
in which a clockwise rotation of o causes a counterclockwise rotation
of s, like closing a pair of scissors.
Now there is a limit to how far this process can go, or else the s
and o would eventually lie on the same line. But this is impossible,
for a valid s vector can never be a valid o, nor an o a valid s. Such a
possibility would mean that an observer would describe two different
points on his own world-line as simultaneous, but an observer for
whom no time passes is not an observer at all, since observation
implies collecting data and then being able to remember it at some
later time. We conclude that there is a diagonal line that forms
the boundary between the set of possible s vectors and the set of
valid o vectors. This line has some slope, and the inverse of this
slope corresponds to some velocity, which is apparently a universal
and fixed property of Einstein's spacetime. This velocity we call c,
and the correspondence principle tells us that c must be very large,
or else "Einsteinian," or relativistic, effects such as time distortion
would have been large even for motion at everyday speeds; in the
Hafele-Keating experiment they were quite small, even at the high
speed of a passenger jet.
k / The light cone.
Although c is a large number when expressed in meters per sec-
ond, for convenience in relativity we will always choose units such
that c = 1. The boundary between s and o vectors then appears on
spacetime diagrams as a diagonal line at 45 degrees. In more than
one spatial dimension, this boundary forms a cone, figure k, and for
reasons that will become more clear in a moment, this cone is called
the light cone. Vectors lying inside the light cone are referred to as
timelike, those outside as spacelike, and those on the cone itself as
lightlike or null.
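In units with c = 1, this classification amounts to comparing the time component of a vector with the size of its spatial part. The following is a minimal sketch of that comparison, not a definition taken from later chapters. The function name and the component ordering (t, x, y, z) are assumptions made for this example, and the zero vector, which the text does not discuss, is simply reported as lightlike.

```python
def classify(t, x, y=0.0, z=0.0):
    """Classify a displacement vector in units where c = 1.

    A vector inside the light cone has a time component larger in
    magnitude than its spatial part; outside the cone it is smaller.
    """
    time_sq = t * t
    space_sq = x * x + y * y + z * z
    if time_sq > space_sq:
        return "timelike"
    if time_sq < space_sq:
        return "spacelike"
    return "lightlike"  # on the cone itself (also called null)


print(classify(2, 1))     # inside the cone: a possible o vector
print(classify(1, 2))     # outside the cone: a possible s vector
print(classify(1, 1))     # on the 45-degree boundary
print(classify(5, 3, 4))  # 3+1 dimensions, on the cone
```

Squares are compared rather than magnitudes so that the same test works with one, two, or three spatial dimensions.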
An important advantage of Einstein's relativity over Galileo's is
that it is compatible with the empirical observation that some phenomena
travel at a certain fixed speed. Light travels at a fixed speed,
and so do other phenomena such as gravitational waves (which, although
they have not yet been observed directly, have been indirectly
confirmed to exist through observations of the decaying orbits
of binary neutron stars). So do all massless particles (subsection
4.3.1). This fixed speed is c, and all observers agree on it. In 1905,
the only phenomenon known to travel at c was light, so c is usually
described as "the speed of light," but from the modern point of view
it functions more as a kind of conversion factor between our units
of measurement for time and space. It is a property of spacetime,
not a property of light.
More fundamentally, c is the maximum speed of cause and effect.
If we could propagate cause and effect, e.g., by transmitting a signal,
at a speed greater than c, we would be violating both causality and
the principle that motion is relative:
The argument from causality: If a signal could be propagated
at a speed greater than c, then the vector r connecting the
cause and the effect would be spacelike. But by adding two
spacelike vectors we can make a vector lying in the past timelike
light cone, so by relaying the signal we could send a message
into the past, violating causality.
The argument from relativity of motion: Furthermore, there
would exist a frame in which vector r was a vector of simultaneity,
so that the information would have been propagated
not just faster than c but in fact instantaneously. It would
then be possible to send signals from one place in the universe
to another without any time lag. This would allow perfect
synchronization of all clocks. But observations such as the
Hafele-Keating experiment demonstrate that clocks A and B
that have been initially synchronized will drift out of sync if
one is in motion relative to the other. With instantaneous
transmission of signals, we could determine, without having
to wait for A and B to be reunited, which was ahead and
which was behind. Since they don't need to be reunited, neither
one needs to undergo any acceleration; each clock can
fix an inertial frame of reference, with a velocity vector that
changes neither its direction nor its magnitude. But this violates
the principle that constant-velocity motion is relative,
because each clock can be considered to be at rest, in its own
frame of reference. Since no experiment has ever detected any
violation of the relativity of motion, we conclude that instantaneous
action at a distance is impossible.
l / A ring laser gyroscope.
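The vector addition used in the argument from causality is easy to check numerically. In the sketch below, vectors are written as (t, x) in units with c = 1, and the two hops are hypothetical values chosen only for illustration. Each hop is spacelike, but relaying a message along one and then the other lands it at the sender's own position at an earlier time.

```python
def kind(vec):
    """Classify a 1+1-dimensional vector (t, x) in units where c = 1."""
    t, x = vec
    if abs(t) > abs(x):
        return "timelike"
    if abs(t) < abs(x):
        return "spacelike"
    return "lightlike"


def add(u, v):
    """Add two vectors tip-to-tail."""
    return (u[0] + v[0], u[1] + v[1])


# Hypothetical relay: two faster-than-light hops, out and back.
hop_out = (-1.0, 2.0)    # spacelike
hop_back = (-1.0, -2.0)  # spacelike

round_trip = add(hop_out, hop_back)

print(kind(hop_out), kind(hop_back))  # both spacelike
print(round_trip, kind(round_trip))   # timelike, with negative t
```

The sum has a negative time component and no spatial component, so it lies inside the past light cone of the event at which the message was sent.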
One could complain that giving two separate arguments to this ef-
fect was gilding the lily. If we were doing mathematics, then one
proof would be enough. But this is science, not mathematics, so
every assumption that goes into an argument is not absolute truth
but provisional truth, founded on observations that have limited
precision and that cover a limited domain of conditions. Neither
the relativity of motion nor causality is a logical necessity; they are
both just generalizations based on a body of evidence. For more on
causality, and its uncertain empirical status, see section 2.1, p. 39.
The ring laser gyroscope Example 2
If you've flown in a jet plane, you can thank relativity for helping
you to avoid crashing into a mountain or an ocean. Figure l shows
a standard piece of navigational equipment called a ring laser
gyroscope. A beam of light is split into two parts, sent around the
perimeter of the device, and reunited. Since the speed of light is
constant, we expect the two parts to come back together at the
same time. If they don't, it's evidence that the device has been
rotating. The plane's computer senses this and notes how much
rotation has accumulated.
No frequency-dependence Example 3
Relativity has only one universal speed, so it requires that all light
waves travel at the same speed, regardless of their frequency
and wavelength. Presently the best experimental tests of the in-
variance of the speed of light with respect to wavelength come
from astronomical observations of gamma-ray bursts, which are
sudden outpourings of high-frequency light, believed to originate
from a supernova explosion in another galaxy. One such observation,
in 2009, found that the times of arrival of all the different
frequencies in the burst differed by no more than 2 seconds out
of a total time in flight on the order of ten billion years!
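The figures in this example translate into a bound on how much the speed of light could vary with frequency. The sketch below is an order-of-magnitude estimate only: it takes the travel time to be exactly ten billion years, although the text gives just the order of magnitude, and it assumes that a small fractional difference in speed produces the same fractional difference in travel time.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year, in seconds

arrival_spread = 2.0                   # seconds, from the 2009 observation
flight_time = 1e10 * SECONDS_PER_YEAR  # about ten billion years, in seconds

# For small differences, (delta v)/c is about (delta t)/t.
fractional_bound = arrival_spread / flight_time

print(f"travel time: {flight_time:.1e} s")
print(f"fractional variation in speed: at most about {fractional_bound:.0e}")
```

The result is a few parts in 10^18, far tighter than anything achievable over laboratory distances.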
n / Example 4.
Einstein's train Example 4
The figure shows a famous thought experiment devised by Einstein.
A train is moving at constant velocity to the right when bolts
of lightning strike the ground near its front and back. Alice, standing
on the dirt at the midpoint of the flashes, observes that the
light from the two flashes arrives simultaneously, so she says the
two strikes must have occurred simultaneously. Bob, meanwhile,
is sitting aboard the train, at its middle. He passes by Alice at the
moment when Alice later figures out that the flashes happened.
Later, he receives flash 2, and then flash 1. He infers that since
both flashes traveled half the length of the train, flash 2 must have
occurred first. How can this be reconciled with Alice's belief that
the flashes were simultaneous?
Figure n shows the corresponding spacetime diagram. It seems
paradoxical that Alice and Bob disagree on simultaneity, but this is
only because we have an ingrained prejudice in favor of Galilean
relativity. Alice's method of determining that 1 and 2 were simultaneous
is valid, and is known as Einstein synchronization. The
dashed line connecting 1 and 2 is orthogonal to Alice's world-line.
But Bob has a different opinion about what constitutes simultane-
ity. The slanted dashed line is orthogonal to his world-line. Ac-
cording to Bob, 2 happened before the time represented by this
line, 1 after.
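The order in which Bob receives the flashes can be worked out entirely in Alice's coordinates, using nothing but the fact that light moves at c = 1. The sketch below makes several assumptions for the sake of illustration: the strikes occur at x = -L and x = +L at t = 0, Bob's position is x = vt, flash 2 is taken to be the strike near the front of the train (consistent with Bob receiving it first), and the function name and sample numbers are hypothetical.

```python
def reception_times(half_length, v):
    """Times, in Alice's frame with c = 1, at which Bob receives each flash.

    Bob moves along x = v*t. The front flash travels along x = L - t and
    the back flash along x = -L + t, where L is half_length.
    """
    if not 0 <= v < 1:
        raise ValueError("v must satisfy 0 <= v < 1 in units where c = 1")
    t_front = half_length / (1 + v)  # solve v*t = L - t
    t_back = half_length / (1 - v)   # solve v*t = -L + t
    return t_front, t_back


L = 1.0  # hypothetical half-separation of the strikes
v = 0.5  # hypothetical speed of the train, as a fraction of c

t_front, t_back = reception_times(L, v)
print("Alice receives both flashes at t =", L)
print("Bob receives flash 2 (front) at t =", t_front)
print("Bob receives flash 1 (back) at t =", t_back)
```

Setting v = 0 makes the two times equal, which recovers Alice's own observation. For any v greater than zero, Bob meets the front flash first, and it is from this ordering that he infers that flash 2 happened first.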
Example 4 is of course impractical as described, since real trains
don't travel at speeds anywhere near c relative to the dirt. We say
that their speeds are nonrelativistic. Because Einstein coined the
term "relativity," and his version of relativity superseded Galileo's,
the unmodified word is normally understood to refer to Einsteinian
relativity. A physicist who studies Einstein-relativity is a relativist.
A material object moving at a speed very close to c is described as
ultrarelativistic. One often hears laypeople describing relativity in
terms of certain effects that would happen "if you went at the speed
of light." In fact, as we'll see in ch. 3 and 4, it is not possible to
accelerate material objects to c, and in any case that isn't necessary.
Relativistic effects exist at all speeds, but they're weak at speeds
small compared to c.
Discussion Question
A The machine-gunner in the figure sends out a spray of bullets. Suppose
that the bullets are being shot into outer space, and that the distances
traveled are trillions of miles (so that the human figure in the diagram
is not to scale). After a long time, the bullets reach the points shown
with dots which are all equally far from the gun. Their arrivals at those
points are events A through E, which happen at different times. The chain
of impacts extends across space at a speed greater than c. Does this
violate special relativity?
Discussion question A.
1.2 Minkowski coordinates
It is often convenient to name points in spacetime using coordinates,
and a particular type of naming, chosen by Einstein and Minkowski,
is the default in special relativity. I'll refer to the coordinates of this
system as Minkowski coordinates, and they're what I have in mind
throughout this book when I use letters like t and x (or variations
like x′, t′, etc.) without further explanation. To define Minkowski
coordinates in 1 + 1 dimensions, we need to pick (1) an event that
we consider to be the origin, (t, x) = (0, 0), (2) an observer-vector
o, and (3) a side of the observer's world-line that we will call the
positive x side, and draw on the right in diagrams. The observer is
required to be inertial, so that by repeatedly making copies of o
and laying them tip-to-tail, we get a chain that lies on top of the
observer's world-line and represents ticks on the observer's clock.
Minkowski coordinates use units with c = 1. Explicitly, we define
the unique vector s that is orthogonal to o, points in the positive
direction, and has a length of one clock-tick. In practical terms, the
orthogonality could be defined by Einstein synchronization (example
4, p. 19), and the length by arranging that a radar echo travels to
the tip of s and back in two ticks.
For now we appeal to the freshman mechanics notion of inertial. A better
relativistic definition, which differs from the Newtonian one, is given in ch. 5.
p / Hermann Minkowski (1864-1909).
q / A set of Minkowski coordinates.
We now construct a graph-paper lattice, figure q, by repeating
the vectors o and s. This grid defines a name (t, x) for each point
in spacetime.
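The radar prescription can be turned into a rule for assigning (t, x) to a distant event from the observer's own clock readings. The sketch below is one way to phrase it and not a construction taken from the text: the function name is hypothetical, it assumes an inertial observer using units with c = 1, and it covers only events on the positive x side. A pulse sent at clock reading t_emit and returning at t_return is taken to have been reflected halfway between the two readings, at a distance equal to half the round-trip time.

```python
def radar_coordinates(t_emit, t_return):
    """Minkowski coordinates (t, x) of the event at which a radar pulse
    was reflected, from the observer's own clock readings (c = 1)."""
    if t_return < t_emit:
        raise ValueError("the echo cannot return before the pulse is sent")
    t = (t_emit + t_return) / 2  # Einstein synchronization: the midpoint
    x = (t_return - t_emit) / 2  # half the round-trip time, since c = 1
    return t, x


# The tip of s: a pulse sent one tick before the origin returns one tick
# after it, so the echo takes two ticks in all.
print(radar_coordinates(-1.0, 1.0))  # (0.0, 1.0)

# A more distant event.
print(radar_coordinates(0.0, 4.0))   # (2.0, 2.0)
```

The first call reproduces the definition of s given above: the reflection is simultaneous with the origin and one unit away.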
1.3 Measurement
We would like to have a general system of measurement for relativity,
but so far we have only an incomplete patchwork. The length of a
timelike vector can be dened as the time measured on a clock that
moves along the vector. A spacelike vector has a length that is
measured on a ruler whose motion is such that in the ruler's frame
of reference, the vector's endpoints are simultaneous. But there is no
third measuring instrument designed for the purpose of measuring
lightlike vectors.
Nor do we automagically get a complete system of measurement
just by having defined Minkowski coordinates. For example, we
don't yet know how to find the length of a timelike vector such
as (Δt, Δx) = (2, 1), and we suspect that it will not be equal to 2,
since the Hafele-Keating experiment tells us that a clock undergoing
the motion represented by Δx = 1 will probably not agree with a
clock carried by the observer whose clock we used in defining these
coordinates.
The whole topic of measurement is apt to be confusing, because
the shifting landscape of relativity makes us feel as if we've walked
into a Salvador Dali landscape of melting pocket watches. A good
way to regain our bearings is to look for quantities that are
invariant: they are the same in all frames of reference. A Euclidean
invariant, such as a length or an angle, is one that doesn't change
under rotations: all observers agree on its value, regardless of the
orientations of their frames of reference. For a relativistic invariant,
we require in addition that observers agree no matter what state of
motion they have. (A transformation that changes from one inertial
frame of reference to another, without any rotation, is called a
boost.)
Electric charge is a good example of an invariant. Electrons
in atoms typically have velocities of 0.01 to 0.1 (in our relativistic
units, where c = 1), so if an electron's charge depended on its
motion relative to an observer, atoms would not be electrically
neutral. Experiments have been done to test this to the phenomenal
precision of one part in $10^{21}$, with null results.
r / The two light-rectangles have the same area.
A vector can never be an invariant, since it changes direction
under a rotation. (Some vectors, such as velocities, also change
under a boost.) In freshman mechanics, any quantity, such as energy,
that wasn't a vector usually fell into the category we referred to as
"scalars." In relativity, however, the term "scalar" has a much more
restrictive definition, which we'll discuss in section 6.2.1, p. 111.
By the way, beginners in relativity sometimes get confused about
invariance as opposed to conservation. They are not the same thing,
and neither implies the other. For example, momentum has a
direction in space, so it clearly isn't invariant, but we'll see in section
4.3 that there is a relativistic version of the momentum vector that
is conserved. As in Newtonian mechanics, we don't care if all
observers agree on the momentum of a system; we only care that
the law of momentum conservation is valid and has the same form
in all frames. Conversely, there are quantities that are invariant but
not conserved, mass being an example.
1.3.2 The metric
Area in 1+1 dimensions is also an invariant, as proved on p. 45.
The invariance of area has little importance on its own, but it
provides a good stepping stone toward a relativistic system of
measurement. Suppose that we have events A (Charles VII is restored to
the throne) and B (Joan of Arc is executed). Now imagine that
technologically advanced aliens want to be present at both A and
B, but in the interim they wish to fly away in their spaceship, be
present at some other event P (perhaps a news conference at which
they give an update on the events taking place on earth), but get
back in time for B. Since nothing can go faster than c (which we
take to equal 1), P cannot be too far away. The set of all possible
events P forms a rectangle, figure r/1, in the 1+1-dimensional plane
that has A and B at opposite corners and whose edges have slopes
equal to $\pm 1$. We call this type of rectangle a light-rectangle.
The area of this rectangle will be the same regardless of one's
frame of reference. In particular, we could choose a special frame
of reference, panel 2 of the figure, such that A and B occur in the
same place. (They do not occur at the same place, for example, in
the sun's frame, because the earth is spinning and going around the
sun.) Since the speed c = 1 is the same in all frames of reference,
and the sides of the rectangle had slopes $\pm 1$ in frame 1, they must
still have slopes $\pm 1$ in frame 2. The rectangle becomes a square,
whose diagonals are an o and an s for frame 2. The length of these
diagonals equals the time elapsed on a clock that is at rest in frame
2, i.e., a clock that glides through space at constant velocity from A
to B, reuniting with the planet earth when its orbit brings it to B.
The area of the gray regions can be interpreted as half the square
of this gliding-clock time, which is called the proper time. "Proper"
is used here in the somewhat archaic sense of "own" or "self," as in
"The Vatican does not lie within Italy proper." Proper time, which
we notate $\tau$, can only be defined for timelike world-lines, since a
lightlike or spacelike world-line isn't possible for a material clock.
Marinelli and Morpurgo, "The electric neutrality of matter: A summary,"
Physics Letters B137 (1984) 439.
In terms of (Minkowski) coordinates, suppose that events A and
B are separated by a distance x and a time t. Then in general
$t^2 - x^2$ gives the square of the gliding-clock time. Proof: Because
of the way that area scales with a rescaling of the coordinates, the
expression must have the form $(\ldots)t^2 + (\ldots)tx + (\ldots)x^2$, where each
$(\ldots)$ represents a unitless constant. The tx coefficient must be zero
by the isotropy of space. The $t^2$ coefficient must equal 1 in order
to give the right answer in the case of x = 0, where the coordinates
are those of an observer at rest relative to the clock. Since the area
vanishes for x = t, the $x^2$ coefficient must equal $-1$.
When $|x|$ is greater than $|t|$, events A and B are so far apart in
space and so close together in time that it would be impossible to
have a cause and effect relationship between them, since c = 1 is
the maximum speed of cause and effect. In this situation $t^2 - x^2$ is
negative and cannot be interpreted as a clock time, but it can be
interpreted as minus the square of the distance between A and B, as
measured in a frame of reference in which A and B are simultaneous.
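As a rough illustration of the two cases just described, the following Python sketch evaluates $t^2 - x^2$ for a displacement in 1+1 dimensions and interprets its sign. The function names are our own, chosen only for this example, and units with c = 1 are assumed throughout.

```python
import math

def interval(t, x):
    """Return t^2 - x^2 for a displacement (t, x), in units with c = 1."""
    return t * t - x * x

def classify(t, x):
    """Name the type of displacement according to the sign of the interval."""
    s2 = interval(t, x)
    if s2 > 0:
        return "timelike"
    if s2 < 0:
        return "spacelike"
    return "lightlike"

def proper_time(t, x):
    """Gliding-clock time; only meaningful for a timelike displacement."""
    s2 = interval(t, x)
    if s2 <= 0:
        raise ValueError("proper time is defined only for timelike displacements")
    return math.sqrt(s2)

def proper_distance(t, x):
    """Distance in the frame where the two events are simultaneous."""
    s2 = interval(t, x)
    if s2 >= 0:
        raise ValueError("proper distance is defined only for spacelike displacements")
    return math.sqrt(-s2)

# The vector (t, x) = (2, 1) discussed earlier: the clock time is sqrt(3),
# about 1.73, which is less than the coordinate time 2.
print(classify(2, 1), proper_time(2, 1))
print(classify(1, 2), proper_distance(1, 2))
print(classify(1, 1))
```

The lightlike case is deliberately rejected by both measuring functions, reflecting the fact that neither a clock nor a ruler can be laid along such a vector.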
Generalizing to 3+1 dimensions and to any vector v, not just
a displacement in spacetime, we have a measurement of the vector
defined by
$v_t^2 - v_x^2 - v_y^2 - v_z^2$.
In the special case where v is a spacetime displacement, this can be
referred to as the spacetime interval. Except for the signs, this looks
very much like the Pythagorean theorem, which is a special case of
the vector dot product. We therefore define a function g called the
metric,
$g(u, v) = u_t v_t - u_x v_x - u_y v_y - u_z v_z$.
Because of the analogy with the Euclidean dot product, we often use
the notation $u \cdot v$ for this quantity, and we sometimes call it the inner
product. The metric is the central object of relativity. In general
relativity, which describes gravity as a curvature of spacetime, the
coefficients occurring on the right-hand side are no longer $\pm 1$, but
must vary from point to point. Even in special relativity, where the
coefficients can be made constant, the definition of g is arbitrary up
to a nonzero multiplicative constant, and in particular many authors
define g as the negative of our definition. The sign convention we
use is the most common one in particle physics, while the opposite
is more common in classical relativity. The set of signs, $+---$ or
$-+++$, is called the signature of the metric.
In subsection 1.1.3 we developed the idea of orthogonality of
spacetime vectors, with the physical interpretation that if an
observer moves along a vector o, a vector s that is orthogonal to o is
a vector of simultaneity. This corresponds to the vanishing of the
inner product, $o \cdot s = 0$, and is only imperfectly analogous to the
idea that Euclidean vectors are perpendicular if their dot product is
zero. In particular, a nonzero Euclidean vector is never perpendicular
to itself, but for any lightlike vector v we have $v \cdot v = 0$. The
metric doesn't give us a measure of the length of lightlike vectors.
Physically, neither a ruler nor a clock can measure such a vector.
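The inner product with signature $+---$ is easy to experiment with numerically. The sketch below is only an illustration; the function name `g` follows the notation above, while the representation of vectors as tuples with the time component first, and the sample vectors themselves, are assumptions made for the example.

```python
def g(u, v):
    """Inner product with signature +---.

    Vectors are tuples whose first component is the time component,
    e.g. (t, x) in 1+1 dimensions or (t, x, y, z) in 3+1.
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of components")
    spatial = sum(a * b for a, b in zip(u[1:], v[1:]))
    return u[0] * v[0] - spatial

# An observer-vector and its vector of simultaneity in the same frame.
o = (1, 0)
s = (0, 1)
print(g(o, s))           # orthogonal: the inner product vanishes

# The same pair for an observer moving at v = 0.6 (gamma = 1.25).
o_moving = (1.25, 0.75)
s_moving = (0.75, 1.25)
print(g(o_moving, s_moving))   # still orthogonal

# A lightlike vector is orthogonal to itself.
light = (1, 1, 0, 0)
print(g(light, light))

# Two light rays moving in opposite directions are not orthogonal,
# even though they look perpendicular on a spacetime diagram.
print(g((1, 1), (1, -1)))
```

Note that the moving observer's o and s look far from perpendicular when drawn on the page, yet their inner product is zero, while the two light rays look perpendicular and are not orthogonal.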
The metric in SI units Example 5
Units with c = 1 are known as natural units. (They are natural
to relativity in the same sense that units with $\hbar = 1$ are natural
to quantum mechanics.) Any equation expressed in natural units
can be reexpressed in SI units by the simple expedient of inserting
factors of c wherever they are needed in order to get units that
make sense. The result for the metric could be
$g(u, v) = c^2 u_t v_t - u_x v_x - u_y v_y - u_z v_z$
or
$g(u, v) = u_t v_t - (u_x v_x + u_y v_y + u_z v_z)/c^2$.
It doesn't matter which we pick, since the metric is arbitrary up to
a constant factor. The former expression gives a result in meters,
the latter seconds.
Orthogonal light rays? Example 6
On a spacetime diagram in 1+1 dimensions, we represent the
light cone with the two lines $x = \pm t$, drawn at an angle of 90
degrees relative to one another. Are these lines orthogonal?
No. For example, if u = (1, 1) and v = (1, $-1$), then $u \cdot v$ is 2, not
0.
s / The twin paradox.
t / A graph of $\gamma$ as a function of v.
Pioneer 10 Example 7
The Pioneer 10 space probe was launched in 1972, and in 1973
was the first craft to fly by the planet Jupiter. It crossed the orbit
of the planet Neptune in 1983, after which telemetry data were
received until 2002. The following table gives the spacecraft's
position relative to the sun at exactly midnight on January 1, 1983
and January 1, 1995. The 1983 date is taken to be t = 0.
t                        x                  y                  z
0                        1.784 x 10^12 m    3.951 x 10^12 m    0.237 x 10^12 m
3.7869120000 x 10^8 s    2.420 x 10^12 m    8.827 x 10^12 m    0.488 x 10^12 m
Compare the time elapsed on the spacecraft to the time in a frame
of reference tied to the sun.
We can convert these data into natural units, with the distance
unit being the second (i.e., a light-second, the distance light travels
in one second) and the time unit being seconds. Converting
and carrying out this subtraction, we have:
t                        x                  y                  z
3.7869120000 x 10^8 s    0.2121 x 10^4 s    1.626 x 10^4 s     0.084 x 10^4 s
Comparing the exponents of the temporal and spatial numbers,
we can see that the spacecraft was moving at a velocity on the
order of $10^{-4}$ of the speed of light, so relativistic effects should be
small but not completely negligible.
Since the interval is timelike, we can take its square root and
interpret it as the time elapsed on the spacecraft. The result is
$\tau = 3.786911996 \times 10^8$ s. This is 0.4 s less than the time elapsed
in the sun's frame of reference.
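The arithmetic of this example can be checked with a few lines of Python. This is a sketch rather than a precise ephemeris calculation: the positions are the rounded values from the table (with exponents of 10^12 m assumed, as the sizes of the outer planets' orbits require), the value of c is the defined SI value, and the variable names are ours.

```python
import math

C = 299_792_458.0           # speed of light in m/s (defined SI value)

t = 3.7869120000e8          # elapsed time in the sun's frame, in seconds
r_1983 = (1.784e12, 3.951e12, 0.237e12)   # position in meters at t = 0
r_1995 = (2.420e12, 8.827e12, 0.488e12)   # position in meters 12 years later

# Displacement in natural units: divide meters by c to get light-seconds.
d = [(b - a) / C for a, b in zip(r_1983, r_1995)]

# Spacetime interval t^2 - x^2 - y^2 - z^2, which is positive (timelike) here.
r2 = sum(component ** 2 for component in d)
tau = math.sqrt(t ** 2 - r2)

# t - tau is a difference of nearly equal numbers, so we use the identity
# t - tau = (t^2 - tau^2) / (t + tau) = r2 / (t + tau) to keep full precision.
lag = r2 / (t + tau)

print("displacement (light-seconds):", d)
print("proper time (s):", tau)
print("spacecraft clock lags by (s):", lag)
```

The lag comes out to a few tenths of a second, consistent with the 0.4 s quoted above once the rounding of the tabulated positions is taken into account.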
1.3.3 The gamma factor
Figure s is the relativistic version of example 1 on p. 14. We
intend to analyze it using the metric, and since the metric gives
the same result in any frame, we have chosen for convenience to
represent it in the frame in which the earth is at rest. We have
a = (t, 0) and b = (t, vt), where v is the velocity of the spaceship
relative to the earth. Application of the metric gives proper time t
for the earthbound twin and $t\sqrt{1 - v^2}$ for the traveling twin. The
same results apply for c and d. The result is that the earthbound
twin experiences a time that is greater by a factor $\gamma$ (Greek letter
gamma) defined as $\gamma = 1/\sqrt{1 - v^2}$. If v is close to c, $\gamma$ can be large,
and we find that when the astronaut twin returns home, still
youthful, the earthbound twin can be old and gray. This was at one time
referred to as the twin paradox, and it was considered paradoxical
either because it seemed to defy common sense or because the
traveling twin could argue that she was the one at rest while the earth
was moving. The violation of common sense is in fact what was
observed in the Hafele-Keating experiment, and the latter argument
is fallacious for the same reasons as in the Galilean version given in
example 1.
We have in general the following interpretation:
Time dilation
A clock runs fastest in the frame of reference of an observer
who is at rest relative to the clock. An observer in motion
relative to the clock at speed v perceives the clock as running
more slowly by a factor of $\gamma$.
Although this is phrased in terms of clocks, we interpret it as
telling us something about time itself. The attitude is that we should
define a concept in terms of the operations required in order to
measure it: time is defined as what a clock measures. This philosophy,
which has been immensely influential among physicists, is called
operationalism and was developed by P.W. Bridgman in the 1920s.
Our operational definition of time works because the rates of all
physical processes are affected equally by time dilation. By the
time the twins in figure s are reunited, not only has the traveling
twin heard fewer ticks from her antique mechanical pocket watch,
but she has also had fewer heartbeats, and the ship's atomic clock
agrees with her watch to within the precision of the watch.
self-check A
What is $\gamma$ when v = 0? What does this mean? Express the equation for
$\gamma$ in SI units. Answer, p. ??
Time dilation is symmetrical in the sense that it treats all frames
of reference democratically. If observers A and B aren't at rest
relative to each other, then A says B's time runs slow, but B says A
is the slow one. In figure s, the laws of physics make no distinction
between the frames of reference that coincide with vectors a and
b; as in the corresponding Galilean case of example 1 on p. 14, the
asymmetry comes about because a and c are parallel, but b and d
are not.
As shown in example 8 below, consistency demands that in
addition to the effect on time we have a similar effect on distances:
Length contraction
A meter-stick appears longest to an observer who is at rest
relative to it. An observer moving relative to the meter-stick
at v observes the stick to be shortened by a factor of $\gamma$.
Our present discussion is limited to 1+1 dimensions, but in 3+1,
only the length along the line of motion is contracted (ch. 2, problem
2, p. 47). It should not be imagined that length contraction is
what an observer actually sees visually. Optical observations are
influenced, for example, by the unequal times taken for light to
propagate from the ends of the stick to the eye. A simulation of this
type of effect is drawn in example 6 on p. 120.
An interstellar road trip Example 8
Alice stays on earth while her twin Betty heads off in a spaceship
for Tau Ceti, a nearby star. Tau Ceti is 12 light-years away, so
even though Betty travels at 87% of the speed of light, it will take
her a long time to get there: 14 years, according to Alice.
u / Example 8.
For more on this topic, see section 6.1.
v / Time dilation measured with an atomic clock at low speeds. The
theoretical curve, shown with a dashed line, is calculated from
$\gamma = 1/\sqrt{1 - v^2}$; at these small velocities, the approximation
$\gamma \approx 1 + v^2/2$ is excellent, and the graph is indistinguishable
from a parabola. This graph corresponds to an extreme close-up view of
the lower left corner of figure t. The error bars on the experimental
points are about the same size as the dots.
Betty experiences time dilation. At this speed, her $\gamma$ is 2.0, so that
the voyage will only seem to her to last 7 years. But there is
perfect symmetry between Alice's and Betty's frames of reference, so
Betty agrees with Alice on their relative speed; Betty sees herself
as being at rest, while the sun and Tau Ceti both move backward
at 87% of the speed of light. How, then, can she observe Tau Ceti
to get to her in only 7 years, when it should take 14 years to travel
12 light-years at this speed?
We need to take into account length contraction. Betty sees the
distance between the sun and Tau Ceti to be shrunk by a factor of
2. The same thing occurs for Alice, who observes Betty and her
spaceship to be foreshortened.
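The numbers in this example can be reproduced with a short Python sketch. The speed and distance are the ones given in the example; the function and variable names are ours, and units are years and light-years so that c = 1.

```python
import math

def gamma(v):
    """Gamma factor 1/sqrt(1 - v^2) for a speed v given as a fraction of c."""
    if not -1 < v < 1:
        raise ValueError("speed must satisfy |v| < 1 in units with c = 1")
    return 1.0 / math.sqrt(1.0 - v * v)

v = 0.87            # Betty's speed as a fraction of c
distance = 12.0     # sun to Tau Ceti in Alice's frame, in light-years

g = gamma(v)
t_alice = distance / v          # duration of the trip according to Alice
t_betty = t_alice / g           # Betty's elapsed time, by time dilation
distance_betty = distance / g   # sun-to-Tau-Ceti distance in Betty's frame

print("gamma:", g)
print("Alice's time (years):", t_alice)
print("Betty's time (years):", t_betty)
print("contracted distance (light-years):", distance_betty)

# Consistency check: in Betty's frame, the contracted distance sweeps past
# her at speed v in exactly the time her own clock records.
print("Betty's distance / v (years):", distance_betty / v)
```

The last line is the point of the example: time dilation as computed by Alice and length contraction as computed by Betty give the same answer for Betty's elapsed time.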
A moving atomic clock Example 9
Expanding $\gamma$ in a Taylor series, we find $\gamma \approx 1 + v^2/2$, so that when
v is small, relativistic effects are approximately proportional to $v^2$,
so it is very difficult to observe them at low speeds. This was
the reason that the Hafele-Keating experiment was done aboard
passenger jets, which fly at high speeds. Jets, however, fly at
high altitude, and this brings in a second time dilation effect, a
general-relativistic one due to gravity. The main purpose of the
experiment was actually to test this effect.
It was not until four decades after Hafele and Keating that anyone
did a conceptually simple atomic clock experiment in which the
only effect was motion, not gravity. In 2010, however, Chou et al.
succeeded in building an atomic clock accurate enough to detect
time dilation at speeds as low as 10 m/s. Figure v shows their
results. Since it was not practical to move the entire clock, the
experimenters only moved the aluminum atoms inside the clock
that actually made it tick.
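To get a feeling for how small the effect is at 10 m/s, we can compare $\gamma - 1$ with the Taylor approximation $v^2/2$. The following Python sketch is illustrative only. Because $\gamma - 1$ is of order $10^{-16}$ here, evaluating $1/\sqrt{1 - v^2} - 1$ directly in double precision loses most of its significant digits, so the sketch rewrites it as $\exp(-\tfrac{1}{2}\ln(1 - v^2)) - 1$ and uses the standard-library functions `math.log1p` and `math.expm1`. That rewriting is our own choice for numerical stability, not something taken from the experiment.

```python
import math

C = 299_792_458.0   # speed of light in m/s (defined SI value)

def gamma_minus_one(v):
    """Return gamma - 1 for speed v (fraction of c), accurately even for tiny v.

    Uses gamma = exp(-0.5 * ln(1 - v^2)), with log1p and expm1 so that
    no precision is lost when v^2 is far smaller than 1.
    """
    return math.expm1(-0.5 * math.log1p(-v * v))

v = 10.0 / C                 # 10 m/s expressed in units with c = 1
exact = gamma_minus_one(v)
approx = v * v / 2.0         # leading term of the Taylor series

print("v in natural units:", v)
print("gamma - 1:", exact)
print("v^2 / 2:  ", approx)

# The naive formula, shown for comparison; at this speed it retains
# only about one significant digit.
naive = 1.0 / math.sqrt(1.0 - v * v) - 1.0
print("naive gamma - 1:", naive)
```

A fractional rate difference of this size corresponds to a clock that falls behind by less than a nanosecond per month, which gives some idea of the accuracy the experiment required.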
Large time dilation Example 10
The time dilation effects described in example 9 were very small.
If we want to see a large time dilation effect, we can't do it with
something the size of the atomic clocks they used; the kinetic
energy would be greater than the total megatonnage of all the
world's nuclear arsenals. We can, however, accelerate subatomic
particles to speeds at which $\gamma$ is large. For experimental particle
physicists, relativity is something you do all day before heading
home and stopping off at the store for milk. An early, low-precision
experiment of this kind was performed by Rossi and Hall in 1941,
using naturally occurring cosmic rays. Figure w shows a 1974
experiment of a similar type which verified the time dilation
predicted by relativity to a precision of about one part per thousand.
Science 329 (2010) 1630
Bailey et al., Nucl. Phys. B150 (1979) 1
w / Left: Apparatus used for the test of relativistic time dilation described in example 10. The prominent
black and white blocks are large magnets surrounding a circular pipe with a vacuum inside. (c) 1974 by CERN.
Right: Muons accelerated to nearly c undergo radioactive decay much more slowly than they would according
to an observer at rest with respect to the muons. The first two data-points (unfilled circles) were subject to
large systematic errors.
Particles called muons (named after the Greek letter $\mu$, "myoo")
were produced by an accelerator at CERN, near Geneva. A muon
is essentially a heavier version of the electron. Muons undergo
radioactive decay, lasting an average of only 2.197 $\mu$s before they
evaporate into an electron and two neutrinos. The 1974 experiment
was actually built in order to measure the magnetic properties
of muons, but it produced a high-precision test of time dilation
as a byproduct. Because muons have the same electric charge
as electrons, they can be trapped using magnetic fields. Muons
were injected into the ring shown in figure w, circling around it
until they underwent radioactive decay. At the speed at which these
muons were traveling, they had $\gamma = 29.33$, so on the average they
lasted 29.33 times longer than the normal lifetime. In other words,
they were like tiny alarm clocks that self-destructed at a randomly
selected time. The graph shows the number of radioactive decays
counted, as a function of the time elapsed after a given stream of
muons was injected into the storage ring. The two dashed lines
show the rates of decay predicted with and without relativity. The
relativistic line is the one that agrees with experiment.
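The two dashed curves can be mimicked with a short calculation. The Python sketch below uses the values of $\gamma$ and the rest-frame lifetime quoted above, and assumes the usual exponential law for radioactive decay, in which the fraction of a sample surviving after a time t is $e^{-t/\tau}$ for mean lifetime $\tau$. The choice of 100 microseconds as a sample time is arbitrary, and the names are ours.

```python
import math

GAMMA = 29.33            # gamma factor of the stored muons (from the text)
TAU_REST = 2.197e-6      # mean muon lifetime at rest, in seconds

# Mean lifetime as measured in the laboratory, stretched by time dilation.
tau_lab = GAMMA * TAU_REST

# Speed of the muons as a fraction of c, from gamma = 1/sqrt(1 - v^2).
v = math.sqrt(1.0 - 1.0 / GAMMA ** 2)

def surviving_fraction(t, lifetime):
    """Fraction of an initial sample that has not yet decayed after time t."""
    return math.exp(-t / lifetime)

t_sample = 100e-6        # an arbitrary time after injection, in seconds
with_relativity = surviving_fraction(t_sample, tau_lab)
without_relativity = surviving_fraction(t_sample, TAU_REST)

print("lab-frame lifetime (microseconds):", tau_lab * 1e6)
print("muon speed (fraction of c):", v)
print("surviving fraction with time dilation:", with_relativity)
print("surviving fraction without time dilation:", without_relativity)
```

Without time dilation essentially no muons would survive long enough to be counted at such late times, which is why the two predictions are so easy to tell apart.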
1.4 The Lorentz transformation
Philosophically, coordinates are unnecessary, but in practical terms
they are convenient. They are arbitrary, so we can change from one
set to another. For example, we can simply change the units used to
measure time and position, as in the first and second panels of figure
x. Nothing changes about the underlying events; only the labels are
different. The third panel of figure x shows a convenient convention
we will use to depict such changes visually. The gray rectangle
represents the original coordinate grid from the first panel, while
the grid of black lines represents the new version from the second
panel. Omitting the grid from the gray rectangle makes the diagram
easier to decode visually.
y / The Lorentz transformation.
x / Two events are given as points on a graph of position versus
time. Joan of Arc helps to restore Charles VII to the throne. At
a later time and a different position, Joan of Arc is sentenced to
death.
In special relativity it is of interest to convert between the
Minkowski coordinates of observers who are in motion relative to one
another. The result, shown in figure y, is a kind of stretching and
smooshing of the diagonals. Since the area is invariant, one diagonal
grows by the same factor by which the other shrinks. This change
of coordinates is called the Lorentz transformation.
z / 1. The clock is at rest in the original frame of reference, and
it measures a time interval t. In the new frame of reference, the
time interval is greater by a factor of $\gamma$. 2. The ruler is moving in
the first frame, represented by a square, but at rest in the second
one, shown as a parallelogram. Each picture of the ruler is a snapshot
taken at a certain moment as judged according to the second
frame's notion of simultaneity. An observer in the first frame judges the
ruler's length instead according to that frame's definition of
simultaneity, i.e., using points that are lined up vertically on the graph.
The ruler appears shorter in the frame in which it is moving.
Figure z shows how time dilation and length contraction come
about in this picture. It should be emphasized here that the Lorentz
transformation includes more effects than just length contraction
and time dilation. Many beginners at relativity get confused and
come to erroneous conclusions by trying to reduce everything to a
matter of inserting factors of $\gamma$ in various equations. If the Lorentz
transformation amounted to nothing more than length contraction
and time dilation, it would be merely a change of units like the one
shown in figure x.
The Lorentz transformation translates into algebraic notation
like this:
$t' = \gamma t - v\gamma x$
$x' = -v\gamma t + \gamma x$
The line x = 0 is described in the $(t', x')$ coordinates as having slope
$-1/v$, which is what justifies the interpretation of v as a velocity.
The inverse transformation is the one with v replaced by $-v$. These
properties are shared by the Galilean transformation. The fact that
this is the correct relativistic transformation can be verified by
noting that (1) the lines $x = \pm t$ are preserved, and (2) the determinant
equals 1, so that areas are preserved. Alternatively, it is sufficient
to check the invariance of the spacetime interval under this
transformation.
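The properties just listed can be spot-checked numerically. The following Python sketch is an illustration, not a proof: it implements the transformation in natural units and tests the claims for one arbitrarily chosen velocity, v = 0.6. The function name `lorentz` and the sample points are assumptions made for the example.

```python
import math

def lorentz(t, x, v):
    """Lorentz-transform the coordinates (t, x) to a frame moving at velocity v.

    Natural units (c = 1) are assumed, so |v| must be less than 1.
    """
    g = 1.0 / math.sqrt(1.0 - v * v)
    return (g * t - v * g * x, -v * g * t + g * x)

v = 0.6

# (1) The light-cone lines x = t and x = -t are mapped onto themselves.
# One diagonal shrinks and the other stretches by the same factor.
right = lorentz(1.0, 1.0, v)     # a point on x = t
left = lorentz(1.0, -1.0, v)     # a point on x = -t
print("image of (1, 1): ", right)
print("image of (1, -1):", left)

# (2) The determinant of the transformation is 1. Its columns are the
# images of the unit vectors (1, 0) and (0, 1).
a, c = lorentz(1.0, 0.0, v)
b, d = lorentz(0.0, 1.0, v)
print("determinant:", a * d - b * c)

# The world-line x = 0 of the original observer moves at velocity -v
# in the new coordinates.
t_new, x_new = lorentz(1.0, 0.0, v)
print("velocity of the line x = 0 in the new frame:", x_new / t_new)

# The inverse transformation is the one with v replaced by -v.
print("round trip:", lorentz(*lorentz(3.0, 2.0, v), -v))
```

The factors by which the two diagonals are rescaled multiply to 1, which is another way of seeing that area is preserved.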
A numerical example of invariance Example 11
Figure aa shows two frames of reference in motion relative to
one another at v = 3/5. (For this velocity, the stretching and
squishing of the main diagonals are both by a factor of 2.) Events
are marked at coordinates that in the frame represented by the
square are
(t, x) = (0, 0) and
(t, x) = (13, 11).
The interval between these events is $13^2 - 11^2 = 48$. In the
frame represented by the parallelogram, the same two events lie
at coordinates
$(t', x') = (0, 0)$ and
$(t', x') = (8, 4)$.
Calculating the interval using these values, the result is
$8^2 - 4^2 = 48$, which comes out the same as in the other frame.
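Because v = 3/5 gives the rational value $\gamma = 5/4$, this example can be verified with exact arithmetic. The Python sketch below uses the standard-library `fractions` module so that no rounding enters; the helper names `boost` and `interval` are ours.

```python
from fractions import Fraction

v = Fraction(3, 5)
gamma = Fraction(5, 4)

# Confirm that 5/4 really is the gamma factor for v = 3/5,
# i.e. that gamma^2 * (1 - v^2) = 1.
assert gamma ** 2 * (1 - v ** 2) == 1

def boost(t, x):
    """Lorentz transformation for the fixed velocity v, in units with c = 1."""
    return (gamma * t - v * gamma * x, -v * gamma * t + gamma * x)

def interval(t, x):
    """Spacetime interval t^2 - x^2 between the origin and the event (t, x)."""
    return t * t - x * x

t_new, x_new = boost(13, 11)
print("transformed event:", t_new, x_new)
print("interval in the original frame:", interval(13, 11))
print("interval in the new frame:", interval(t_new, x_new))

# The factor by which one main diagonal is stretched and the other squished.
print("stretch factor:", gamma * (1 + v))
```

Working with fractions makes the equality of the two intervals exact rather than approximate, which is convenient for a check of this kind.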
aa / Example 11.
The garage paradox Example 12
One of the most famous of all the so-called relativity paradoxes
has to do with our incorrect feeling that simultaneity is well
defined. The idea is that one could take a schoolbus and drive it at
relativistic speeds into a garage of ordinary size, in which it
normally would not fit. Because of the length contraction, the bus
would supposedly fit in the garage. The driver, however, will
perceive the garage as being contracted and thus even less able to
contain the bus.
The paradox is resolved when we recognize that the concept of
fitting the bus in the garage "all at once" contains a hidden
assumption, the assumption that it makes sense to ask whether the
front and back of the bus can simultaneously be in the garage.
Observers in different frames of reference moving at high relative
speeds do not necessarily agree on whether things happen
simultaneously. As shown in figure ab, the person in the garage's
frame can shut the door at an instant B he perceives to be
simultaneous with the front bumper's arrival A at the back wall of
the garage, but the driver would not agree about the simultaneity
of these two events, and would perceive the door as having shut
long after she plowed through the back wall.
ab / Example 12: In the garage's frame of reference, the bus is moving, and can fit in the garage due
to its length contraction. In the bus's frame of reference, the garage is moving, and can't hold the bus due to its
length contraction.
ac / Example 13.
Shifting clocks Example 13
The top row of clocks in the figure are located in three different
places. They have been synchronized in the frame of reference
of the earth, represented by the paper. This synchronization is
carried out by exchanging light signals (Einstein synchronization),
as in example 4 on p. 19. For example, if the front and back clocks
both send out flashes of light when they think it's 2 o'clock, the
one in the middle will receive them both at the same time. Event
A is the one at which the back clock A reads 2 o'clock, etc.
The bottom row of clocks are aboard the train, and have been
synchronized in a similar way. For the reasons discussed in
example 4, their synchronization differs from that of the earth-based
clocks. By referring to the diagram of the Lorentz transformation
shown on the right, we see that in the frame of the train, 2, C
happens first, then B, then A.
This is an example of the interpretation of the term $t' = \ldots - v\gamma x$
in the Lorentz transformation (eq. (1), p. 30). Because the events
occur at different x's, each is shifted in time relative to the next,
according to clocks synchronized in frame 2 ($t'$, the train).
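The ordering of the three events can be reproduced from the $-v\gamma x$ term alone. In the Python sketch below, the clock positions and the train's speed are invented purely for illustration, since the example gives no numbers. We assume that the train moves in the positive x direction and that the back clock A has the smallest x.

```python
import math

v = 0.6                                  # hypothetical speed of the train (c = 1)
gamma = 1.0 / math.sqrt(1.0 - v * v)

# Events A, B and C: the three earth clocks each read 2 o'clock. In the
# earth's frame they are simultaneous, so we take t = 0 for all of them.
# The positions are hypothetical: A at the back, C at the front.
events = {"A": (0.0, -1.0), "B": (0.0, 0.0), "C": (0.0, 1.0)}

def train_time(t, x):
    """Time coordinate t' of the event (t, x) in the train's frame."""
    return gamma * t - v * gamma * x

times = {name: train_time(t, x) for name, (t, x) in events.items()}
order = sorted(times, key=times.get)

for name in order:
    print(name, "occurs at t' =", times[name])
print("order in the train's frame:", ", ".join(order))
```

Reversing the sign of v in the sketch reverses the order of the events, which shows that the time-ordering of events like these is a matter of the observer's state of motion.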
Problem 5.
1 Astronauts in three different spaceships are communicating
with each other. Those aboard ships A and B agree on the rate at
which time is passing, but they disagree with the ones on ship C.
(a) Alice is aboard ship A. How does she describe the motion of her
own ship, in its frame of reference?
(b) Describe the motion of the other two ships according to Alice.
(c) Give the description according to Betty, whose frame of reference
is ship B.
(d) Do the same for Cathy, aboard ship C.
2 What happens in the equation for $\gamma$ when you put in a
negative number for v? Explain what this means physically, and
why it makes sense.
3 The Voyager 1 space probe, launched in 1977, is moving faster
relative to the earth than any other human-made object, at 17,000
meters per second.
(a) Calculate the probe's $\gamma$.
(b) Over the course of one year on earth, slightly less than one year
passes on the probe. How much less? (There are 31 million seconds
in a year.)
4 The earth is orbiting the sun, and therefore is contracted
relativistically in the direction of its motion. Compute the amount
by which its diameter shrinks in this direction.
5 The figure shows seven displacement vectors in spacetime.
Which of these represent spacetime intervals that are equal to one
another?
6 (a) In Euclidean geometry in three dimensions, suppose we
have two vectors, a and b, which are unit vectors, i.e., $a \cdot a = 1$ and
$b \cdot b = 1$. What is the range of possible values for the inner product
$a \cdot b$?
(b) Repeat part a for two timelike, future-directed unit vectors in
3 + 1 dimensions.
7 Expressed in natural units, the Lorentz transformation is
$t' = \gamma t - v\gamma x$
$x' = -v\gamma t + \gamma x$.
(a) Insert factors of c to make it valid in units where $c \neq 1$. (b) Show
that in the limit $c \to \infty$, these have the right Galilean behavior.
8 This problem assumes you have some basic knowledge of
quantum physics. One way of expressing the correspondence principle as
applied to special relativity is that in the limit $c \to \infty$, all
relativistic expressions have to go over to their Galilean counterparts. What
would be the corresponding limit if we wanted to recover classical
mechanics from quantum mechanics?
9 In 3 + 1 dimensions, prove that if u and v are nonzero,
future-lightlike, and not parallel to each other, then their sum is
future-timelike.
10 Prove that if u and v are nonzero, lightlike, and orthogonal
to each other, then they are parallel, i.e., u = cv for some $c \neq 0$.
11 The speed at which a disturbance travels along a string
under tension is given by $v = \sqrt{T/\mu}$, where $\mu$ is the mass per unit
length, and T is the tension.
(a) Suppose a string has a density $\rho$, and a cross-sectional area A.
Find an expression for the maximum tension that could possibly
exist in the string without producing v > c, which is impossible
according to relativity. Express your answer in terms of $\rho$, A, and
c. The interpretation is that relativity puts a limit on how strong
any material can be.
(b) Every substance has a tensile strength, defined as the force
per unit area required to break it by pulling it apart. The
tensile strength is measured in units of $\mathrm{N/m^2}$, which is the same as the
pascal (Pa), the mks unit of pressure. Make a numerical estimate
of the maximum tensile strength allowed by relativity in the case
where the rope is made out of ordinary matter, with a density on
the same order of magnitude as that of water. (For comparison,
kevlar has a tensile strength of about $4 \times 10^9$ Pa, and there is
speculation that fibers made from carbon nanotubes could have values
as high as $6 \times 10^{10}$ Pa.)
(c) A black hole is a star that has collapsed and become very dense,
so that its gravity is too strong for anything ever to escape from it.
For instance, the escape velocity from a black hole is greater than
c, so a projectile can't be shot out of it. Many people, when they
hear this description of a black hole in terms of an escape velocity
greater than c, wonder why it still wouldn't be possible to extract
an object from a black hole by other means than launching it out
as a projectile. For example, suppose we lower an astronaut into a
black hole on a rope, and then pull him back out again. Why might
this not work?
Problem 14.
12 The rod in the figure is perfectly rigid. At event A, the
hammer strikes one end of the rod. At event B, the other end moves.
Since the rod is perfectly rigid, it can't compress, so A and B are
simultaneous. In frame 2, B happens before A. Did the motion at
the right end cause the person on the left to decide to pick up the
hammer and use it?
Problem 12.
13 Use a spacetime diagram to resolve the following relativity
paradox. Relativity says that in one frame of reference, event A
could happen before event B, but in someone else's frame B would
come before A. How can this be? Obviously the two people could
meet up at A and talk as they cruised past each other. Wouldn't
they have to agree on whether B had already happened?
14 The grid represents spacetime in a certain frame of reference.
Event A is marked with a dot. Mark additional points satisfying the
following criteria. (Pick points that lie at the intersections of the
gridlines.)
Point B is at the same location as A in this frame of reference, and
lies in its future.
C is also in point A's future, is not at the same location as A in
this frame, but is in the same location as A according to some other
frame of reference.
D is simultaneous with A in this frame of reference.
E is not simultaneous with A in this frame of reference, but is
simultaneous with it according to some other frame.
F lies in A's past according to this frame of reference, but could not
have caused A.
G lies in A's future according to this frame of reference, but is in its
past according to some other frames.
H lies in A's future according to any frame of reference, not just this
one.
I is the departure of a spaceship, which arrives at A.
J could have caused A, but could not have been the departure of a
spaceship like I that arrived later at A.
a / Newton's laws do not distinguish past from future. The
football could travel in either direction while obeying Newton's
laws.
Chapter 2
Foundations (optional)
In this optional chapter we more systematically examine the foun-
dational assumptions of special relativity, which were appealed to
casually in chapter 1. Most readers will want to skip this chapter
and move on to ch. 3. The ordering of chapters 1 and 2 may seem
backwards, but many of the issues to be raised here are very subtle
and hard to appreciate without already understanding something
about special relativity; in fact, Einstein and other relativists did
not understand them properly until decades after the introduction
of special relativity in 1905.
2.1 Causality
2.1.1 The arrow of time
Our intuitive belief in cause-and-effect mechanisms is not supported
in any clearcut way by the laws of physics as currently understood.
For example, we feel that the past affects the future but
not the other way around, but this feeling doesn't seem to translate
into physical law. For example, Newton's laws are invariant under
time reversal, figure a, as are Maxwell's equations. (The weak
nuclear force is the only part of the standard model that violates
time-reversal symmetry, and even it is invariant under the CPT
transformation.)
There is an arrow of time provided by the second law of thermo-
dynamics, and this arises ultimately from the fact that, for reasons
unknown to us, the universe soon after the Big Bang was in a state
of extremely low entropy.
2.1.2 Initial-value problems
So rather than depending on the arrow of time, we may be better
off formulating a notion of causality based on existence and
uniqueness of initial-value problems. In 1776, Laplace gave an
influential early formulation of this idea in the context of
Newtonian mechanics: "Given for one instant an intelligence which
could comprehend all the forces by which nature is animated and the
respective positions of the things which compose it . . . nothing
would be uncertain, and the future as the past would be laid out
before its eyes." The reference to "one instant" is not compatible
with special relativity, which has no frame-independent definition
of simultaneity. We can, however, define initial conditions on some
spacelike three-surface, i.e., a three-dimensional set of events
that is smooth, has the topology of Euclidean space, and whose
events are spacelike in relation to one another.
One can find a vast amount of nonsense written about this, such as claims
that the second law is derivable without reference to any cosmological
context. For a careful treatment, see Callender, "Thermodynamic Asymmetry
in Time," The Stanford Encyclopedia of Philosophy.
Unfortunately it is not obvious whether the classical laws of
physics satisfy Laplace's definition of causality. Two interesting
and accessible papers that express a skeptical view on this issue are
Norton, "Causation as Folk Science,"
1214; and Echeverria et al., "Billiard balls in wormhole spacetimes
with closed timelike curves: Classical theory," http://resolver. The Norton paper in
particular has generated a large literature at the interface between
physics and philosophy, and one can find most of the relevant
material online using the keywords "Norton's dome."
Nor does general relativity offer much support to the Laplacian
version of causality. For example, general relativity says that given
generic initial conditions, gravitational collapse leads to the
formation of singularities, points where the structure of spacetime
breaks down and various measurable quantities become infinite.
Singularities typically violate causality, since the laws of physics
can't describe them. In a famous image, John Earman wrote that if we
have a certain type of singularity (called a naked singularity), "all
sorts of nasty things . . . emerge helter-skelter . . . ," including "TV
sets showing Nixon's 'Checkers' speech, green slime, Japanese horror
movie monsters, etc."
2.1.3 A modest definition of causality
Since there does not seem to be any reason to expect causality
to hold in any grand sense, we will content ourselves here with a
very modest and specialized definition, stated as a postulate, that
works well enough for special relativity.
P1. Causality. There exist events 1 and 2 such that the
displacement vector $\mathbf{r}_{12}$ is timelike in all frames.
This is sufficient to rule out the "rotational" version of the
Lorentz transformation shown in figure j on p. 16. If P1 were
violated, then we could never describe one event as causing another,
since there would always be frames of reference in which the effect
was observed as preceding the cause.
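As a rough numerical illustration of what P1 buys us, the following sketch checks whether the time-ordering of two events can be reversed by a boost. It assumes the standard 1+1-dimensional Lorentz transformation $t' = \gamma(t - vx)$ in units with c = 1, as developed in ch. 1; the function names and the sampled velocities are illustrative choices, not anything defined in the text.

```python
import math

def boosted_dt(dt, dx, v):
    """Time separation of two events in a frame moving at velocity v (c = 1).

    Assumes the standard Lorentz transformation t' = gamma * (t - v * x).
    """
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return gamma * (dt - v * dx)

def order_can_flip(dt, dx, steps=999):
    """Return True if some sampled boost with |v| < 1 reverses the sign of dt."""
    velocities = [-1.0 + 2.0 * (k + 1) / (steps + 1) for k in range(steps)]
    signs = {boosted_dt(dt, dx, v) > 0 for v in velocities}
    return len(signs) > 1

# Timelike separation (|dt| > |dx|): every sampled frame agrees on the order.
print(order_can_flip(dt=2.0, dx=1.0))
# Spacelike separation (|dt| < |dx|): some frames see the order reversed.
print(order_can_flip(dt=1.0, dx=2.0))
```

The sampling only probes finitely many velocities, so it is a plausibility check rather than a proof; the algebraic statement is that $dt - v\,dx$ cannot change sign for $|v| < 1$ when $|dt| > |dx|$.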
2.2 Flatness
b / An airplane flying from
Mexico City to London follows
the shortest path, which is a
segment of a great circle. A path
of extremal length between two
points is called a geodesic.
c / Transporting the vector
along path AC gives a different
result than doing it along the path
d / Parallel transport.
2.2.1 Failure of parallelism
In postulate P1 we implicitly assumed that given two points,
there was a certain vector connecting them. This is analogous to
the Euclidean postulate that two points define a line.
For insight, let's think about how the Euclidean version of this
assumption could fail. Euclidean geometry is only an approximate
description of the earth's surface, for example, and this is why flat
maps always entail distortions of the actual shapes. The distortions
might be negligible on a map of Connecticut, but severe for a map
of the whole world. That is, the globe is only locally Euclidean.
On a spherical surface, the appropriate object to play the role of a
line is a great circle, figure b. The lines of longitude are examples
of great circles, and since these all coincide at the poles, we can see
that two points do not determine a line in noneuclidean geometry.
A two-dimensional bug living on the surface of a sphere would
not be able to tell that the sphere was embedded in a third
dimension, but it could still detect the curvature of the surface. It
could tell that Euclid's postulates were false on large distance
scales. A method that has a better analog in spacetime is shown in
figure c: transporting a vector from one point to another depends on
the path along which it was transported. This effect is our definition
of curvature.
2.2.2 Parallel transport
The particular type of transport that we have in mind here is
called parallel transport. When I walk from the living room to
the kitchen while carrying a mechanical gyroscope, I'm
parallel-transporting the spacelike vector indicated by the direction
of its axis. Figure d shows that parallel transport can also be
defined for timelike vectors, and that parallel transport can be
defined in spacetime using only inertial motion, clocks, and
intersection of world-lines. Observers aboard the two spaceships
exchange clocks in order to verify the parallelism of their
world-lines (vectors AB and CD, which have equal lengths as measured
by the proper time elapsed aboard the ships). The observers shoot the
clocks across the space between them, and the clocks are set up so
that when they pass by one another, they automatically record one
another's readings. The vectors are parallel if the record later
reveals AD and BC intersected at their midpoints, as measured by the
proper times recorded on the clocks.
2.2.3 Special relativity requires flat spacetime
Hidden in a number of spots in chapter 1 was the following assumption.
P2. Flatness of spacetime. Parallel-transporting a vector from
one point to another gives a result that is independent of the path
along which it was transported.
For example, when we established the form of the metric in sec-
tion 1.3.2, we used the fact, proved on p. 45, that area is a scalar,
but that proof depends on P2.
Property P2 is only approximately true, as shown explicitly by
the Gravity Probe B satellite, launched in 2004. The probe carried
four gyroscopes made of quartz, which were the most perfect spheres
ever manufactured, varying from sphericity by no more than about
40 atoms. After one year and about 5000 orbits around the earth,
the gyroscopes were found to have changed their orientations relative
to the distant stars by about 3 10
radians (figure e). This is a
violation of P2, but one that was very small and difficult to detect.
The result was in good agreement with the predictions of general
relativity, which describes gravity as a curvature of spacetime. The
smallness of the effect tells us that the earth's gravitational field
is not so large as to completely invalidate special relativity as a
description of the nearby region of spacetime. One of the basic
assumptions of general relativity is that in a small enough region
of spacetime, it is always a good approximation to assume P2, so
that general relativity is locally the same as special relativity. In
the Gravity Probe B experiment, the effect was small and hard to
detect, and this was the reason for letting the effect accumulate
over a large number of orbits, spanning a large region of spacetime.
Problem 5 on p. 48 investigates more quantitatively how the size of
curvature effects varies with the size of the region.
e / Precession angle as a function of time as measured by the four gyroscopes aboard Gravity Probe B.
2.3 Additional postulates
We make the following additional assumptions:
P3 Spacetime is homogeneous and isotropic. No time or place
has special properties that make it distinguishable from other
points, nor is one direction in space distinguishable from another.
P4 Inertial frames of reference exist. These are frames in which
particles move at constant velocity if not subject to any forces.
We can construct such a frame by using a particular particle,
which is not subject to any forces, as a reference point. Inertial
motion is modeled by vectors and parallelism.
P5 Equivalence of inertial frames: If a frame is in constant-velocity
translational motion relative to an inertial frame, then it is also
an inertial frame. No experiment can distinguish one preferred
inertial frame from all the others.
P6 Relativity of time: There exist events 1 and 2 and frames of
reference defined by observers $o$ and $o'$ such that $o \perp \mathbf{r}_{12}$ is
true but $o' \perp \mathbf{r}_{12}$ is false, where the notation $o \perp \mathbf{r}$ means that
observer $o$ finds $\mathbf{r}$ to be a vector of simultaneity according to
some convenient criterion such as Einstein synchronization.
Postulates P3 and P5 describe symmetries of spacetime, while
P6 differentiates the spacetime of special relativity from Galilean
spacetime; the symmetry described by these three postulates is
referred to as Lorentz invariance, and all known physical laws have
this symmetry. Postulate P4 defines what we have meant when we
referred to the parallelism of vectors in spacetime (e.g., in figure
s on p. 25). Postulates P1-P6 were all the assumptions that were
needed in order to arrive at the picture of spacetime described in
ch. 1. This approach, based on symmetries, dates back to 1911.
2.4 Other axiomatizations
2.4.1 Einstein's postulates
Einstein used a different axiomatization in his 1905 paper on
special relativity:
For the experimental evidence on isotropy, see http://www.
Defining this no-force rule turns out to be tricky when it comes to gravity. As
discussed in ch. 5, this apparently minor technicality turns out to have important
consequences.
example 4, p. 19
W. v. Ignatowsky, Phys. Zeits. 11 (1911) 972. The original paper unfortunately
seems very difficult to obtain.
Paraphrased from the translation by W. Perrett and G.B. Jeffery.
E1. Principle of relativity: The laws of electrodynamics and
optics are valid for all frames of reference for which the equations of
mechanics hold good.
E2. Light is always propagated in empty space with a definite
velocity c which is independent of the state of motion of the emitting
body.
These should be supplemented with our P2 and P3.
Einstein's approach has been slavishly followed in many later
textbook presentations, even though the special role it assigns to
light is not consistent with how modern physicists think about the
fundamental structure of the laws of physics. (In 1905 there was
no other phenomenon known to travel at c.) Einstein did not
explicitly state anything like our P2 (flatness), since he had not yet
developed the theory of general relativity or the idea of representing
gravity in relativity as spacetime curvature. When he did publish
the general theory, he described the distinction between special and
general relativity as a generalization of the class of acceptable frames
of reference to include accelerated as well as inertial frames. This
description has not stood the test of time, and today relativists
use flatness as the distinguishing criterion. In particular, it is not
true, as one sometimes still hears claimed, that special relativity is
incompatible with accelerated frames of reference.
2.4.2 Maximal time
Another approach, presented, e.g., by Laurent, combines our
P2 with the following:
T1 Metric: An inner product exists. Proper time is measured by
the square root of the inner product of a world-line with itself.
T2 Maximum proper time: Inertial motion gives a world-line along
which the proper time is at a maximum with respect to small
changes in the world-line. Inertial motion is modeled by vec-
tors and parallelism, and this vector-space apparatus has the
usual algebraic properties in relation to the inner product
referred to in T1, e.g., $a \cdot (b + c) = a \cdot b + a \cdot c$.
We have already seen an example of T2 in our analysis of the
twin paradox (figure s on p. 25). Conceptually, T2 is similar to
defining a line as the shortest path between two points, except that
we define a geodesic as being the longest one (for our $+---$
signature).
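To make T2 concrete, here is a minimal sketch that compares the proper time along an inertial world-line with that along a world-line that detours and returns, as in the twin paradox. It assumes the 1+1-dimensional metric $\tau^2 = t^2 - x^2$ from ch. 1 with c = 1; the event lists and the function name are hypothetical choices made for the illustration.

```python
import math

def proper_time(events):
    """Sum the proper time along a piecewise-inertial world-line.

    `events` is a list of (t, x) pairs in units with c = 1; each segment
    is assumed timelike, so tau = sqrt(dt^2 - dx^2) is real.
    """
    total = 0.0
    for (t0, x0), (t1, x1) in zip(events, events[1:]):
        total += math.sqrt((t1 - t0) ** 2 - (x1 - x0) ** 2)
    return total

# Stay-at-home twin: inertial motion between the two meeting events.
inertial = [(0.0, 0.0), (10.0, 0.0)]
# Traveling twin: out at v = 3/5 for 5 units of time, then back at v = -3/5.
detour = [(0.0, 0.0), (5.0, 3.0), (10.0, 0.0)]

print(proper_time(inertial))  # 10.0
print(proper_time(detour))    # 8.0
```

Any other kinked path between the same two events gives a smaller total as well, which is the sense in which the inertial world-line maximizes proper time.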
2.4.3 Comparison of the systems
It is useful to compare the axiomatizations P, E, and T from
sections 2.1.1-2.4.2 with each other in order to gain insight into how
Bertel Laurent, Introduction to Spacetime: A First Course on Relativity
f / Area is a scalar.
much wiggle room there is in constructing theories of spacetime.
Since they are logically equivalent, any statement occurring in one
axiomatization can be proved as a theorem in the other.
For example, we might wonder whether it is possible to equip
Galilean spacetime with a metric. The answer is no, since a system
with a metric would satisfy the axioms of system T, which are log-
ically equivalent to our system P. The underlying reason for this is
that in Galilean spacetime there is no natural way to compare the
scales of distance and time.
Or we could ask whether it is possible to compose variations on
the theme of special relativity, alternative theories whose properties
differ in some way. System P shows that this would be unlikely to
succeed without violating the symmetry of spacetime.
Another interesting example is Amelino-Camelia's doubly-special
relativity, in which we have both an invariant speed c and an
invariant length L, which is assumed to be the Planck length
$\sqrt{\hbar G/c^3}$.
The invariance of this length contradicts the existence of length
contraction. In order to make his theory work, Amelino-Camelia is
obliged to assume that energy-momentum vectors (section 4.3) have
their own special inner product that violates the algebraic properties
referred to in T2.
2.5 Lemma: spacetime area is invariant
In this section we prove from axioms P1-P6 that area in the $x$-$t$
plane is invariant, i.e., it does not change between frames of
reference. This result is used in section 1.3.2 to find the form of the
spacetime metric.
Consider figure f. Vectors $\mathbf{o}_1$ and $\mathbf{s}_1$ are orthogonal and have
equal lengths as measured by a clock and a ruler (which are
calibrated in units such that c = 1, e.g., seconds and light-seconds).
The square lattice of white polka-dots is obtained from them by
repeated addition. By assuming that this lattice construction is
possible, we are implicitly assuming postulate P2, flatness of
spacetime.
The same properties hold for vectors $\mathbf{o}_2$ and $\mathbf{s}_2$, which give the
lattice of black dots. As required, the two lattices agree on their
45-degree diagonals. Now within the $10 \times 10$ portion of the white
lattice shown with gray shading, we have an area of 100. In the
same region we count about 100 or 101 black dots; there is some
ambiguity because of the dots that lie on the boundary. The density
of white and black dots is in fact exactly equal, as can be verified
to any desired precision by making the region big enough. In other
words, the diagram is drawn so that area is preserved, which is what
we are going to show is required.
If it was observer 2 rather than 1 who was drawing the diagram,
presumably she would choose to draw the black dots in a square
lattice and vectors $\mathbf{o}_2$ and $\mathbf{s}_2$ at right angles. This would require
vectors $\mathbf{o}_1$ and $\mathbf{s}_1$ to be opened up at an oblique angle and the white
lattice to be non-square.
Now suppose we had not made area conserved. What if a region
containing 100 white dots had held 200 black ones? Then a boost by
$v$ would change areas by a factor of 2, and its inverse, a boost by
$-v$, would have to change them by a factor of 1/2. But a boost
of velocity $v$ is the same as a flip of the spatial dimension followed
by a $-v$ boost and another flip. (If P2 failed, then it might not
be possible to flip a rigid shape in this way.) Therefore this would
violate one or the other of two principles: (1) that all frames of
reference are equally valid (there is no preferred frame such as that
of the ether); or (2) that space is isotropic, meaning that it has the
same properties in all directions (neither $+x$ nor $-x$ is a preferred
direction). We conclude that if these two symmetry principles hold,
then spacetime area is the same for any two observers, so it is an
invariant.
It may seem unnecessarily clumsy that we've used the idea of
counting dots in the above argument, but remember that our main
use of this result is to derive the form of the metric, and before
the metric had been found, we had no system of measurement for
relativity, so we had only very primitive techniques at our disposal.
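With the benefit of hindsight, the lemma can be checked against the finished Lorentz transformation. The sketch below assumes the standard 1+1-dimensional boost $t' = \gamma(t - vx)$, $x' = \gamma(x - vt)$ with c = 1 (the form the lemma is ultimately used to derive, so this is a consistency check and not an independent proof). The helper names are illustrative.

```python
import math

def boost(v):
    """2x2 matrix of a boost acting on (t, x) column vectors, with c = 1."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return ((gamma, -gamma * v),
            (-gamma * v, gamma))

def apply(matrix, vec):
    """Multiply a 2x2 matrix by a 2-component vector."""
    (a, b), (c, d) = matrix
    t, x = vec
    return (a * t + b * x, c * t + d * x)

def signed_area(u, w):
    """Signed area of the parallelogram spanned by vectors u and w."""
    return u[0] * w[1] - u[1] * w[0]

# Unit lattice vectors of one observer: one along t, one along x.
o1, s1 = (1.0, 0.0), (0.0, 1.0)
for v in (0.1, 0.6, 0.99):
    m = boost(v)
    # The transformed lattice cell should still have unit area.
    print(v, signed_area(apply(m, o1), apply(m, s1)))
```

Algebraically the area factor is the determinant $\gamma^2(1 - v^2)$, which equals 1 for every $|v| < 1$.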
1 Section 2.5 gives an argument that spacetime area is a rela-
tivistic invariant. Is this argument also valid for Galilean relativity?
2 Section 2.5 gives an argument that spacetime area is a rela-
tivistic invariant. (a) Generalize this from 1+1 dimensions to 3+1.
(b) Use this result to prove that there is no relativistic length
contraction effect along an axis perpendicular to the velocity.
3 The purpose of this problem is to find how the direction of a
physical object such as a stick changes under a Lorentz
transformation. Part b of problem 2 shows that relativistic length
contraction occurs only along the axis parallel to the motion. The
generalization of the 1+1-dimensional Lorentz transformation to 2+1
dimensions therefore consists simply of augmenting equation (1) on
p. 30 with $y' = y$. Suppose that a stick, in its own rest frame, has
one end with a world-line $(\tau, 0, 0)$ and the other with
$(\tau, p, q)$, where $\tau$ is the stick's proper time. Call these
ends A and B. In other words, we have a stick that goes from the
origin to coordinates $(p, q)$ in the $(x, y)$ plane. Apply a Lorentz
transformation for a boost with velocity $v$ in the $x$ direction, and
find the equations of the world-lines of the ends of the stick in the
new $(t', x', y')$ coordinates. According to this new frame's notion of
simultaneity, find the coordinates of B when A is at
$(t', x', y') = (0, 0, 0)$. (a) In the special case where $q = 0$,
recover the 1+1-dimensional result for length contraction given on
p. 26. (b) Returning to the general case where $q \neq 0$, consider
the angle $\theta$ that the stick makes with the $x$ axis, and the
related angle $\theta'$ that it makes with the $x'$ axis in the new
frame. Show that $\tan\theta' = \gamma\tan\theta$.
4 Section 2.2 discusses the idea that a two-dimensional bug
living on the surface of a sphere could tell that its space was curved.
Figure c on p. 41 shows one way of telling, by detecting the
path-dependence of parallel transport. A different technique would be
to look for violations of the Pythagorean theorem. In the figure below,
1 is a diagram illustrating the proof of the Pythagorean theorem in
Euclid's Elements (proposition I.47). This diagram is equally valid
if the page is rolled onto a cylinder, 2, or formed into a wavy
corrugated shape, 3. These types of curvature, which can be achieved
without tearing or crumpling the surface, are not real to the bug.
They are simply side-effects of visualizing its two-dimensional
universe as if it were embedded in a hypothetical third dimension,
which doesn't exist in any sense that is empirically verifiable to the
bug. Of the curved surfaces in the figure, only the sphere, 4, has
curvature that the bug can measure; the diagram can't be plastered
onto the sphere without folding or cutting and pasting. If a
two-dimensional being lived on the surface of a cone, would it say that
its space was curved, or not? What about a saddle shape?
Problem 5.
Problem 4.
5 The discrepancy in parallel transport shown in figure c on
p. 41 can also be interpreted as a measure of the triangle's angular
defect d, meaning the amount $S - \pi$ by which the sum of its interior
angles S exceeds the Euclidean value. (a) The figure suggests a
simple way of verifying that the angular defect of a triangle inscribed
on a sphere depends on area. It shows a large equilateral triangle
that has been dissected into four smaller triangles, each of which
is also approximately equilateral. Prove that D = 4d, where D is
the angular defect of the large triangle and d the value for one of
the four smaller ones. (b) Given that the proportionality to area
d = kA holds in general, find some triangle on a sphere of radius R
whose area and angular defect are easy to calculate, and use it to
fix the constant of proportionality k.
Remark: A being who lived on a sphere could measure d and A for some triangle
and infer R, which is a measure of curvature. The proportionality of the effect
to the area of the triangle also implies that the effects of curvature become
negligible on sufficiently small scales. The analogy in relativity is that special
relativity is a valid approximation to general relativity in regions of space that
are small enough so that spacetime curvature becomes negligible.
Chapter 3
Kinematics
At this stage, many students raise the following questions, which
turn out to be related to one another:
1. According to Einstein, if observers A and B aren't at rest
relative to each other, then A says B's time is slow, but B
says A is the slow one. How can this be? If A says B is slow,
shouldn't B say A is fast? After all, if I took a pill that sped up
my brain, everyone else would seem slow to me, and I would
seem fast to them.
2. Suppose I keep accelerating my spaceship steadily. What hap-
pens when I get to the speed of light?
3. In all the diagrams in section 1.4, the parallelograms have their
diagonals stretched and squished by a certain factor, which
depends on v. What is the interpretation of this factor?
3.1 How can they both . . . ?
Figure a shows how relativity resolves the first question. If A and B
had an instantaneous method of communication such as Star Trek's
subspace radio, then they could indeed resolve the question of who
was really slow.
a / Signals don't resolve the dispute
over who is really slow.
But relativity does not allow cause and effect to be propagated
outside the light cone, so the best they can actually do is to send
each other signals at c. In a/1, B sends signals to A at time intervals
of one hour as measured by B's clock. According to A's clock, the
signals arrive at an interval that is shorter than one hour as the two
spaceships approach one another, then longer than an hour after
they pass each other and begin to recede. As shown in a/2, the
situation is entirely symmetric if A sends signals to B.
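The arrival intervals can be worked out directly from the world-lines, without appealing to any Doppler formula. The following sketch puts A at rest at x = 0, lets B pass A at t = 0 with velocity v, and has B emit a light signal once per hour of B's proper time. It assumes units with c = 1 and uses the factor $\gamma$ from section 1.3.3 to convert B's proper time into A's coordinate time; the function name and the choice v = 3/5 are illustrative.

```python
import math

def arrival_intervals(v, tau_period=1.0, n_each_side=3):
    """Intervals on A's clock between successive signals received from B.

    A sits at x = 0. B moves along x = v * t and emits one signal per
    `tau_period` of B's own proper time, with c = 1.
    """
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    arrivals = []
    for k in range(-n_each_side, n_each_side + 1):
        tau = k * tau_period       # B's proper time at emission
        t_emit = gamma * tau       # emission time in A's frame
        x_emit = v * t_emit        # B's position at emission
        arrivals.append(t_emit + abs(x_emit))  # light-travel delay
    return [later - earlier for earlier, later in zip(arrivals, arrivals[1:])]

intervals = arrival_intervals(0.6)
print(intervals)  # short intervals while approaching, long ones while receding
```

For v = 3/5 the intervals come out to half an hour on approach and two hours on recession. Swapping the roles of A and B gives the same numbers, which is the symmetry shown in a/2.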
Who is really slow? Neither. If A, like many astronauts, cut her
teeth as a jet pilot, it may occur to her to interpret the observations
by analogy with the Doppler effect for sound waves. Figure a is
in fact a valid diagram if the signals are clicks of sound, provided
that we interpret it as being drawn in the frame of reference of the
air. Sound waves travel at a fixed speed relative to the air, and
the space and time units could be chosen such that the speed of
sound was represented by a slope of 1. But A will find that in
the relativistic case, with signals traveling at c, her observations
of the time intervals are not in quantitative agreement with the
predictions she gets by plugging numbers into the familiar formulas
for the Doppler shift of sound waves. She may then say, "Ah, the
analogy with sound isn't quite right. I need to include a correction
factor for time dilation, since B's time is slow. I'm not slow, of
course. I feel perfectly normal."
But her analogy is false and needlessly complicates the situation.
In the version with sound waves and Galilean relativity, there are
three frames of reference involved: A's, B's, and the air's. The
relativistic version is simpler, because there are only two frames,
A's and B's. It's neither helpful nor necessary to break down the
observations into a factor describing what "really" happens and a
correction factor to account for the relativistic "distortions" of
"reality." All we need to worry about is the world-lines and
intersections of world-lines shown in the spacetime diagrams, along
with the metric, which allows us to compute how much proper time is
experienced by each observer.
b / The twin paradox with signals
sent back to earth by the traveling twin.
3.2 The stretch factor is the Doppler shift
Figure b shows how the ideas in the preceding section apply to the
twin paradox. In b/1 we see the situation as described by an impar-
tial observer, who says that both twins are traveling to the right.
But even the impartial observer agrees that one twin's motion is
inertial and the other's noninertial, which breaks the symmetry and
c / Interpretation of the identity
$D(v)D(-v) = 1$.
also allows the twins to meet up at the end and compare clocks.
For convenience, b/2 shows the situation in the frame where the
earthbound twin is at rest. Both panels of the gure are drawn such
that the relative velocity of the twins is 3/5, and in panel 2 this
is the inverse slope of the traveling twin's world-lines.
Straightforward algebra and geometry (problem 6, p. 67) shows that in this
particular example, the period observed by the earthbound twin is
increased by a factor of 2. But 2 is exactly the factor by which the
diagonals of the parallelogram are stretched and compressed in a
Lorentz transformation for a velocity of 3/5. This is true in general:
the stretching and squishing factors for the diagonals are the same
as the Doppler shift. We notate this factor as D (which can stand
for either Doppler or diagonal), and in general it is given by
$$D(v) = \sqrt{\frac{1+v}{1-v}}$$
(problem 7, p. 67).
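As a quick numerical check of this expression, the sketch below evaluates $D(v) = \sqrt{(1+v)/(1-v)}$ at v = 3/5 and compares it with the equivalent form $\gamma(1+v)$. Units with c = 1 are assumed, and the function names are illustrative.

```python
import math

def doppler(v):
    """Longitudinal Doppler (diagonal stretch) factor D(v), with c = 1."""
    return math.sqrt((1.0 + v) / (1.0 - v))

def gamma(v):
    """Time-dilation factor for velocity v, with c = 1."""
    return 1.0 / math.sqrt(1.0 - v * v)

v = 0.6
print(doppler(v))           # stretch factor for v = 3/5, very nearly 2
print(gamma(v) * (1 + v))   # the same number written as gamma * (1 + v)
```

The second form follows from $\gamma = 1/\sqrt{(1+v)(1-v)}$, and it makes visible that D combines time dilation with the changing light-travel delay.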
self-check A
If you measure with a ruler on gure b/2, you will nd that the labeled
sides of the quadrilateral differ by less than a factor of 2. Why is this?
Answer, p. ??
This expression is for the longitudinal Doppler shift, i.e., the case
where the source and observer are in motion directly away from one
another (or toward one another if v < 0). In the purely transverse
case, there is a Doppler shift $1/\gamma$ which can be interpreted as simply
a measure of time dilation.
The useful identity $D(v)D(-v) = 1$ is trivial to prove
algebraically, and has the following interpretation. Suppose, as in
figure c, that A and C are at rest relative to one another, but B is
moving relative to them. B's velocity relative to A is $v$, and C's
relative to B is $-v$. At regular intervals, A sends lightspeed
"pings" to B, who then immediately retransmits them to C. The interval
between pings accumulates two Doppler shifts, and the result is their
product $D(v)D(-v)$. But B didn't actually need to receive the original
signal and retransmit it; the results would have been the same if B
had just stayed out of the way. Therefore this product must equal
1, so $D(v)D(-v) = 1$.
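The relay argument can be traced event by event. In this sketch A sits at x = 0, C sits at rest at a larger x, and B moves between them at velocity v; each ping travels at c = 1 and is retransmitted by B the instant it arrives. The positions, the period, and the function name are hypothetical values chosen so that B stays between A and C for the pings considered.

```python
import math

def relay_intervals(v, period=1.0, x_b0=1.0, x_c=10.0):
    """Return (interval on B's clock, interval on C's clock) between two pings.

    A emits from x = 0 at t = 0 and t = period. B follows x = x_b0 + v * t.
    C is at rest at x = x_c. All speeds are in units with c = 1.
    """
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    b_times, c_times = [], []
    for t_emit in (0.0, period):
        # Ping world-line x = t - t_emit meets B's world-line x = x_b0 + v * t.
        t_b = (t_emit + x_b0) / (1.0 - v)
        x_b = x_b0 + v * t_b
        b_times.append(t_b)
        # Retransmitted ping covers the remaining distance to C at speed 1.
        c_times.append(t_b + (x_c - x_b))
    tau_b = (b_times[1] - b_times[0]) / gamma  # B's proper time between pings
    return tau_b, c_times[1] - c_times[0]

tau_b, interval_c = relay_intervals(0.6)
print(tau_b)       # stretched by D(v) on the first leg
print(interval_c)  # restored to the original period on the second leg
```

With v = 3/5 the first leg doubles the interval and the second halves it, so C receives the pings at exactly the rate A sent them, as if B had not been there.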
Ives-Stilwell experiments Example 1
The transverse Doppler shift is a characteristic prediction of
special relativity, with no nonrelativistic counterpart, and Einstein
suggested it early on as a test of relativity. However, it is difficult
to measure with high precision, because the results are sensitive
to any error in the alignment of the 90-degree angle. Such
experiments were eventually performed, with results that confirmed
relativity, but one-dimensional measurements provided both the
See, e.g., Hasselkamp, Mondry, and Scharmann, Zeitschrift für Physik A:
Hadrons and Nuclei 289 (1979) 151.
earliest tests of the relativistic Doppler shift and the most precise
ones to date. The first such test was done by Ives and Stilwell
in 1938, using the following trick. The relativistic expression
$D(v) = \sqrt{(1+v)/(1-v)}$ for the Doppler shift has the property
that $D(v)D(-v) = 1$, which differs from the nonrelativistic result
of $(1+v)(1-v) = 1 - v^2$. One can therefore accelerate an
ion up to a relativistic speed, measure both the forward Doppler
shifted frequency $f_f$ and the backward one $f_b$, and compute
$\sqrt{f_f f_b}$.
According to relativity, this should exactly equal the frequency $f_o$
measured in the ion's rest frame.
In a particularly exquisite modern version of the Ives-Stilwell idea,
Saathoff et al. circulated Li$^+$ ions at v = .064 in a storage ring.
An electron-cooler technique was used in order to reduce the
variation in velocity among ions in the beam. Since the identity
$D(v)D(-v) = 1$ is independent of v, it was not necessary to measure
v to the same incredible precision as the frequencies; it was
only necessary that it be stable and well-defined. The natural line
width was 7 MHz, and other experimental effects broadened it further
to 11 MHz. By curve-fitting the line, it was possible to achieve
results good to a few tenths of a MHz. The resulting frequencies,
in units of MHz, were:
$f_f = 582490203.44 \pm 0.09$
$f_b = 512671442.9 \pm 0.5$
$\sqrt{f_f f_b} = 546466918.6 \pm 0.3$
$f_o = 546466918.8 \pm 0.4$ (from previous experimental work)
The spectacular agreement with theory has made this experiment
a lightning rod for anti-relativity kooks.
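The arithmetic behind this comparison is easy to reproduce. The sketch below takes the forward and backward frequencies quoted above, forms their geometric mean, and also backs out the ion velocity from their ratio using $D(v)^2 = (1+v)/(1-v)$. The variable names are illustrative, and the experimental uncertainties are ignored.

```python
import math

# Frequencies quoted in the example, in MHz.
f_forward = 582490203.44
f_backward = 512671442.9
f_rest = 546466918.8  # rest-frame value from previous experimental work

# Relativity predicts sqrt(f_forward * f_backward) == f_rest,
# because the two Doppler factors D(v) and D(-v) multiply to 1.
geometric_mean = math.sqrt(f_forward * f_backward)

# The ratio f_forward / f_backward equals D(v)^2 = (1 + v) / (1 - v),
# which can be solved for v.
velocity = (f_forward - f_backward) / (f_forward + f_backward)

print(geometric_mean - f_rest)  # difference in MHz, a fraction of 1 MHz
print(velocity)                 # close to the quoted v = .064
```

Note that the velocity is not needed to test the identity, which is the point made in the text; it is computed here only as a cross-check on the quoted value.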
If one is searching for small deviations from the predictions of
special relativity, a natural place to look is at high velocities. Ives-
Stilwell experiments have been performed at velocities as high as
0.84, and they confirm special relativity.
3.3 Combination of velocities
In nonrelativistic physics, velocities add in relative motion. For
example, if a boat moves relative to a river, and the river moves
relative to the land, then the boat's velocity relative to the land
is found by vector addition. This linear behavior cannot hold
relativistically. For example, if a spaceship is moving relative to the
earth at velocity 3/5 (in units with c = 1), and it launches a probe
at velocity 3/5 relative to itself, we can't have the probe moving at
a velocity of 6/5 relative to the earth, because this would be greater
G. Saathoff et al., "Improved Test of Time Dilation in Relativity," Phys.
Rev. Lett. 91 (2003) 190403. A publicly available description of the experiment
is given in Saathoff's PhD thesis.
MacArthur et al., Phys. Rev. Lett. 56 (1986) 282
d / Two Lorentz transforma-
tions of v = 3/5 are applied one
after the other. The transforma-
tions are represented according
to the graphical conventions of
section 1.4.
e / Example 2.
than the maximum speed of cause and effect, which is 1. To see how
to add velocities relativistically, we consider the effect of carrying
out the two Lorentz transformations one after the other, figure d.
The inverse slope of the left side of each parallelogram indicates
its velocity relative to the original frame, represented by the square.
Since the left side of the final parallelogram has not swept past the
diagonal, clearly it represents a velocity of less than 1, not more. To
determine the result, we use the fact that the D factors multiply. We
chose velocities 3/5 because it gives D = 2, which is easy to work
with. Doubling the long diagonal twice gives an over-all stretch
factor of 4, and solving the equation D(v) = 4 for v gives the result,
v = 15/17.
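The same recipe works for any pair of collinear velocities: multiply the D factors, then solve for the velocity. A minimal sketch, assuming units with c = 1 and working with $D^2 = (1+v)/(1-v)$ so that exact rational arithmetic can be used; the function names are illustrative.

```python
from fractions import Fraction

def d_squared(v):
    """Square of the Doppler factor, D(v)^2 = (1 + v) / (1 - v), with c = 1."""
    return (1 + v) / (1 - v)

def combine(u, v):
    """Combine two collinear velocities by multiplying their D factors."""
    d2 = d_squared(u) * d_squared(v)   # D(u)^2 * D(v)^2 = D(w)^2
    return (d2 - 1) / (d2 + 1)         # invert D(w)^2 = (1 + w) / (1 - w)

print(combine(Fraction(3, 5), Fraction(3, 5)))  # 15/17
```

Working with the square avoids taking square roots, so `Fraction` inputs give an exact result. For small velocities the result is very close to the nonrelativistic sum u + v.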
We can now see the answer to question 2 on p. 49. If we keep
accelerating a spaceship steadily, we are simply continuing the
process of acceleration shown in figure d. If we do this indefinitely,
the velocity will approach c = 1 but never surpass it. (For more on
this topic of going faster than light, see section 4.7.)
Accelerating electrons Example 2
Figure e shows the results of a 1964 experiment by Bertozzi in
which electrons were accelerated by the static electric field E of
a Van de Graaff accelerator of length $\ell_1$. They were then allowed
to fly down a beamline of length $\ell_2 = 8.4$ m without being acted
on by any force. The time of flight $t_2$ was used to find the final
velocity $v = \ell_2/t_2$ to which they had been accelerated. (To make
the low-energy portion of the graph legible, Bertozzi's highest-
energy data point is omitted.)
If we believed in Newton's laws, then the electrons would have an
acceleration $a_N = Ee/m$, which would be constant if, as we pre-
tend for the moment, the field E were constant. (The electric field
inside a Van de Graaff accelerator is not really quite constant, but
this will turn out not to matter.) The Newtonian prediction for the
time over which this acceleration occurs is $t_N = \sqrt{2m\ell_1/eE}$. An
acceleration $a_N$ acting for a time $t_N$ should produce a final veloc-
ity $a_N t_N = \sqrt{2eV/m}$, where $V = E\ell_1$ is the voltage difference.
(By conservation of energy, this equation holds even if the field
is not constant.) The solid line in the graph shows the prediction
of Newton's laws, which is that a constant force exerted steadily
over time will produce a velocity that rises linearly and without
limit.
The experimental data, shown as black dots, clearly tell a differ-
ent story. The velocity asymptotically approaches a limit, which
we identify as c. The dashed line shows the predictions of spe-
cial relativity, which we are not yet ready to calculate because we
haven't yet seen how kinetic energy depends on velocity at rel-
ativistic speeds. The calculation is carried out in example 4 on
p. 75.
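To see how quickly the Newtonian prediction runs into trouble, the following sketch evaluates $\sqrt{2eV/m}$ as a fraction of c. It assumes the standard electron rest energy of about 0.511 MeV; the voltages in the loop are hypothetical illustrations, not Bertozzi's data points.

```python
import math

ELECTRON_REST_ENERGY_MEV = 0.511   # m c^2 for the electron, in MeV

def newtonian_speed(voltage_mv):
    """Newtonian final speed sqrt(2eV/m) as a fraction of c.

    voltage_mv is the accelerating voltage in megavolts, so that
    eV in MeV is numerically equal to voltage_mv.
    """
    return math.sqrt(2 * voltage_mv / ELECTRON_REST_ENERGY_MEV)

# Hypothetical voltages chosen only for illustration.
for voltage_mv in (0.1, 0.5, 1.0, 5.0):
    ratio = newtonian_speed(voltage_mv)
    print(f"V = {voltage_mv} MV -> Newtonian v/c = {ratio:.2f}")
```

On this Newtonian estimate, any voltage above roughly a quarter of a megavolt would already give v greater than c, which is the regime in which the data in figure e depart from the solid line.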
Note that the relationship between the first and second frames of
reference in figure d is the same as the relationship between the sec-
ond and third. Therefore if a passenger is to feel a steady sensation
of acceleration (or, equivalently, if an accelerometer aboard the ship
is to show a constant reading), then the proper time required to pass
from the first frame to the second must be the same as the proper
time to go from the second to the third. A nice way to express this is
to define the rapidity $\eta = \ln D$. Combining velocities means multi-
plying D's, which is the same as adding their logarithms. Therefore
we can write the relativistic rule for combining velocities simply as
$$\eta_c = \eta_1 + \eta_2 .$$
The passengers perceive the acceleration as steady if $\eta$ increases by
the same amount per unit of proper time. In other words, we can
define a proper acceleration $d\eta/d\tau$, which corresponds to what an
accelerometer measures.
Rapidity is convenient and useful, and is very frequently used
in particle physics. But in terms of ordinary velocities, the rule for
combining velocities can also be rewritten using identity [9] from
section 3.6 as
$$v_c = \frac{v_1 + v_2}{1 + v_1 v_2} .$$
self-check B
How can we tell that this equation is written in natural units? Rewrite it
in SI units. Answer, p. ??
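The two forms of the combination rule, adding rapidities and combining ordinary velocities, can be compared numerically. This is a small sketch in natural units; it uses the fact that $\ln D(v)$ equals the inverse hyperbolic tangent of v, and the function names are illustrative.

```python
import math

def rapidity(v):
    """Rapidity eta = ln D(v), which equals atanh(v) in units with c = 1."""
    return math.atanh(v)

def combine(v1, v2):
    """Combination of velocities in terms of ordinary velocities."""
    return (v1 + v2) / (1 + v1 * v2)

v1, v2 = 0.6, 0.6
eta_c = rapidity(v1) + rapidity(v2)    # rapidities simply add

print(rapidity(v1), math.log(2))       # eta for v = 3/5 is ln 2
print(math.tanh(eta_c), combine(v1, v2))
```

Both printed pairs should agree, which is the content of identity [9] applied to the rapidities.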
3.4 No frame of reference moving at c
We have seen in section 3.3 that no continuous process of accel-
eration can boost a material object to c. That is, the subluminal
(slower than light) nature of an electron or a person is a fundamental
feature of its identity and can never be changed. Einstein can never
get on his motorcycle and drive at c as he imagined when he was a
young man, so we material beings can never see the world from a
frame of reference that travels at c.
Our universe does, however, contain ingredients such as light
rays, gluons, and gravitational waves that travel at c, so we might
wonder whether these things could be put together to form observers
who do move at c. But this is not possible according to special rel-
ativity, because if we let v approach c, extrapolation of figure
d on p. 54 shows that the Lorentz transformation would compress
all of spacetime onto the light cone, reducing its number of dimen-
sions by 1. Distinct points would be merged, which would make it
impossible to use this frame to describe the same phenomena that
a subluminal observer could describe. That is, the transformation
would not be one-to-one, and this is unacceptable physically.
f / A playing card returns to
its original state when rotated
by 180 degrees. Its orientation,
unlike the orientation of an arrow,
doesn't behave as a vector, since
it doesn't transform in the usual
way under rotations. Under a
180-degree rotation, a vector
should negate itself rather than
coming back to its original state.
3.5 The velocity and acceleration vectors
3.5.1 The velocity vector
In a freshman course in Newtonian mechanics, we would define
a vector as something that has three components. Furthermore, we
would require it to transform in a certain way under a rotation.
For example, we could form the collection of numbers (e, T, DJIA),
where e is the fundamental charge, T is the temperature in Buffalo,
New York, and DJIA measures how the stock market is doing. But
this would not be a vector, since it doesn't act the right way when
rotated (this particular collection is invariant under rotations). Fig-
ure f gives a less silly non-example. In contradistinction to a vector,
a scalar is specified by a single real number and is invariant under
rotations. The most basic example of a Newtonian vector was a
displacement $(\Delta x, \Delta y, \Delta z)$, and from the displacement vector we
would go on to construct other quantities such as a velocity vec-
tor $v = \Delta r/\Delta t$. This worked because in Newtonian mechanics $\Delta t$
was treated as a scalar, and dividing a vector by a scalar produces
something that again transforms in the right way to be a vector.
Now let's upgrade to relativity, and work through the same steps
by analogy. When I say "vector" in this book, I mean something that
in 3+1 dimensions has four components. This can also be referred
to as a four-vector. Our only example so far has been the spacetime
displacement vector $\Delta r = (\Delta t, \Delta x, \Delta y, \Delta z)$. This vector transforms
according to the Lorentz transformation. In general, we require as
part of the definition of a (four-)vector that it transform in the usual
way under both rotations and boosts (Lorentz transformations). We
might now imagine that the next step should be to construct a
velocity four-vector $\Delta r/\Delta t$. But relativistically, the quantity $\Delta r/\Delta t$
would not transform like a vector. For example, if $\Delta r$ were spacelike, then there
would be a frame in which we had $\Delta t = 0$, and then $\Delta r/\Delta t$ would
be finite in some frames but infinite in others, which is absurd.
To construct a valid vector, we have to divide $\Delta r$ by a scalar.
The only scalar that could be relevant would be the proper time $\Delta\tau$,
and this is indeed how the velocity vector is defined in relativity.
For an inertial world-line (one with constant velocity), we define
$v = \Delta r/\Delta\tau$. The generalization to noninertial world-lines requires
that we make this definition into a derivative:
$$v = \frac{dr}{d\tau} .$$
Not all objects have well-defined velocity vectors. For exam-
ple, consider a ray of light with a straight world-line, so that the
derivative $d\ldots/d\ldots$ is the same as the ratio of finite differences
$\Delta\ldots/\Delta\ldots$, i.e., calculus isn't needed. A ray of light has $|v| = c$,
so that applying the metric to any segment of its world-line gives
$\Delta\tau = 0$. Attempting to calculate $v = \Delta r/\Delta\tau$ then gives something
g / A spaceship (curved world-
line) moves with an acceleration
perceived as constant by its
passengers.
of the form $(\infty, \infty)$. We will see in section 4.3.1 that all massless
particles, not just photons, travel at c, so the same would apply to
them. Therefore a velocity vector is only defined for particles whose
world-lines are timelike, i.e., massive particles.
Velocity vector of an object at rest Example 3
An object at rest has v = (1, 0). The first component indicates that
if we attach a clock to the object with duct tape, the proper time
measured by the clock suffers no time dilation according to an
observer in this frame, $dt/d\tau = 1$. The second component tells
us that the object's position isn't changing, $dx/d\tau = 0$.
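The same bookkeeping works for a moving object. Below is a minimal sketch in 1+1 dimensions, assuming c = 1 and the $+-$ signature; it uses $dt/d\tau = \gamma$ from section 1.3.3, and the function names are illustrative.

```python
import math

def velocity_vector(v):
    """Velocity vector (dt/dtau, dx/dtau) = (gamma, gamma*v) for speed v."""
    gamma = 1 / math.sqrt(1 - v * v)
    return (gamma, gamma * v)

def inner(p, q):
    """Inner product in 1+1 dimensions with signature + -."""
    return p[0] * q[0] - p[1] * q[1]

at_rest = velocity_vector(0.0)
moving = velocity_vector(0.6)

print(at_rest)                  # the object at rest of example 3
print(moving)                   # approximately (1.25, 0.75)
print(inner(moving, moving))    # squared magnitude, approximately 1
```

The time component of the moving object's vector is its gamma factor, and the squared magnitude stays equal to 1 whatever the speed.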
3.5.2 The acceleration vector
The acceleration vector is defined as the derivative of the velocity
vector with respect to proper time,
$$a = \frac{dv}{d\tau} .$$
It measures the curvature of a world-line. Its squared magnitude
is minus the square of the proper acceleration, meaning the acceleration that
would be measured by an accelerometer carried along that world-
line. The proper acceleration is only approximately equal to the
magnitude of the Newtonian acceleration three-vector, in
the limit of small velocities.
Constant proper acceleration Example 4
Suppose a spaceship moves so that the acceleration is judged
to be the constant value a by an observer on board. Find the
motion x(t) as measured by an observer in an inertial frame.
Let $\tau$ stand for the ship's proper time, and let dots indicate
derivatives with respect to $\tau$. The ship's velocity has magnitude
1, so
$$\dot{t}^2 - \dot{x}^2 = 1 .$$
An observer who is instantaneously at rest with respect to the
ship judges it to have an acceleration vector (0, a) (because the
low-velocity limit applies). The observer in the (t, x) frame agrees
on the magnitude of this vector, so
$$\ddot{t}^2 - \ddot{x}^2 = -a^2 .$$
The solution of these differential equations is $t = \frac{1}{a}\sinh a\tau$,
$x = \frac{1}{a}\cosh a\tau$ (choosing constants of integration so that the ex-
pressions take on their simplest forms). Eliminating $\tau$ gives
$$x = \frac{1}{a}\sqrt{1 + a^2 t^2} ,$$
shown in figure g. The world-line is a hyperbola, and this type of
motion is sometimes referred to as hyperbolic motion.
As t approaches infinity, dx/dt approaches the speed of light.
In the same limit, x increases exponentially with proper time, so
that surprisingly large distances can in theory be traveled within
a human lifetime (problem 7, p. 97). Some further properties of
hyperbolic motion are developed in problems 10, 11, and 12.
Another interesting feature of this problem is the dashed-line asymp-
tote, which is lightlike. Suppose we interpret this as the world-line
of a ray of light. The ray comes closer and closer to the ship,
but will never quite catch up. Thus provided that the rocket never
stops accelerating, the entire region of spacetime to the left of
the dashed line is forever hidden from its passengers. That is,
an observer who undergoes constant acceleration has an event
horizon, a boundary that prevents her from observing anything
on the other side. You may have heard about the event horizon
associated with a black hole. This example shows that we can
have event horizons even when there is no gravity at all.
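The solution of example 4 can be checked numerically. The sketch below assumes units with c = 1 and a hypothetical proper acceleration a = 1; it verifies the two magnitude conditions and the relation between x and t at a few sample proper times.

```python
import math

a = 1.0   # proper acceleration; hypothetical value, units with c = 1

def world_line(tau):
    """(t, x) on the hyperbola: t = sinh(a tau)/a, x = cosh(a tau)/a."""
    return (math.sinh(a * tau) / a, math.cosh(a * tau) / a)

def velocity(tau):
    """First derivative of the world-line with respect to proper time."""
    return (math.cosh(a * tau), math.sinh(a * tau))

def acceleration(tau):
    """Second derivative of the world-line with respect to proper time."""
    return (a * math.sinh(a * tau), a * math.cosh(a * tau))

def inner(p, q):
    """Inner product in 1+1 dimensions with signature + -."""
    return p[0] * q[0] - p[1] * q[1]

for tau in (0.0, 1.0, 3.0):
    t, x = world_line(tau)
    v, acc = velocity(tau), acceleration(tau)
    print(tau, inner(v, v), inner(acc, acc),
          x - math.sqrt(1 + a * a * t * t) / a,   # should be near 0
          v[1] / v[0])                            # dx/dt, approaching 1
```

The last column is dx/dt, which creeps toward 1 as the proper time grows but never reaches it.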
3.5.3 Constraints on the velocity and acceleration vectors
Counting degrees of freedom
There is something misleading about the foregoing treatment of
the velocity and acceleration vectors, and the easiest way to see this
is by introducing the idea of a degree of freedom. Often we can
describe a system using a list of real numbers. For the hand on a
clock, we only need one number, such as 3 o'clock. This is because
the hand is constrained to stay in the plane of the clock's face and
also to keep its tail at the center of the circle. Since one number
describes its position, we say that it has one degree of freedom. If
a hiker wants to know where she is on a map, she has two degrees
of freedom, which could be specified as her latitude and longitude.
If she were in a helicopter, there would be no constraint to stay on
the earth's surface, and the number of degrees of freedom would be
increased to three. If we also considered the helicopter's velocity to
be part of the description of its state, then there would be a total
of six degrees of freedom: one for each coordinate and one for each
component of the velocity vector.
Now suppose that we want to describe a particle's velocity and
acceleration. In Newtonian mechanics, we would describe these
three-vectors as possessing a total of six degrees of freedom: $v_x$,
$v_y$, $v_z$, $a_x$, $a_y$, and $a_z$. Upgrading from Newtonian mechanics to
relativity can't change the number of degrees of freedom. For ex-
ample, an electron's acceleration is fully determined by the force
we exert on it, and we might control that acceleration by placing a
proton nearby and producing an electrical attraction. The position
of the proton (three degrees of freedom for its three coordinates) de-
termines the electron's acceleration, so the acceleration has exactly
three degrees of freedom as well.
h / Both vectors are tangent to the world-line.
This means that there must be some hidden redundancy in the
eight components of the velocity and acceleration four-vectors. The
system only has six degrees of freedom, so there must be two con-
straints that we didn't know about. Similarly, I've gone hiking and
had my GPS unit claim that I was a thousand feet above a lake or
three thousand feet under a mountain. In those situations there was
a constraint that I knew about but that the GPS didn't: that I was
on the surface of the earth.
Normalization of the velocity
The first constraint arises naturally from a geometrical inter-
pretation of the velocity four-vector, shown in figure h. The curve
represents the world-line of a particle. The dashed line is drawn
tangent to the world-line at a certain moment. Under a microscope,
the dashed line, which represents a possible inertial motion of a par-
ticle, is indistinguishable from the solid curve, which is noninertial.
The dashed line has a slope $\Delta t/\Delta x = 2$, which corresponds to a
velocity $\Delta x/\Delta t = 1/2$. The figure is drawn in 1+1 dimensions, but
in 3 + 1 dimensions we would want to know more than this num-
ber. We would want to know the orientation of the dashed line in
the three spatial dimensions, i.e., not just the speed of the particle
but also its direction of motion. All the desired information can be
encapsulated in a vector. Both of the vectors shown in the figure
are parallel to the dashed line, so even though they have different
lengths, there is no difference between the velocities they represent.
Since we want the particle to have a single well-defined vector to
represent its velocity, we want to pick one vector from among all
the vectors parallel to the dashed line, and call that the velocity
vector.
We have already implicitly made this choice. It follows from
the original definition $v = dr/d\tau$ that the velocity vector's squared
magnitude $v^2 = v \cdot v$ is always equal to 1, even though the ob-
ject whose motion it describes is not moving at the speed of light.
This, along with the requirement that the velocity vector lie within
the future rather than the past light cone, uniquely specifies which
tangent vector we want. The requirement $v^2 = 1$ is an example of
a recurring idea in physics and mathematics called normalization.
The idea is that we have some object (a vector, a function, . . . )
that could be scaled up or down by any amount, but from among
all the possible scales, there is only one that is the right one. For
example, a gambler might place a horse's chance of winning at 9 to
1, but a physicist would divide these by 10 in order to normalize the
probabilities to 0.9 and 0.1, the idea being that the total probability
should add up to 1. Our definition of the velocity vector implies that
it is normalized. Thus an alternative, geometrical definition of the
velocity vector would have been that it is the vector that is tangent
to the particle's world-line, future-directed, and normalized to 1.
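The geometrical definition can be turned into a short recipe: take any future-directed tangent vector and rescale it. The sketch below does this in 1+1 dimensions for the tangent line with $\Delta t/\Delta x = 2$; the function name `normalize` and the two sample tangent vectors are illustrative assumptions.

```python
import math

def inner(p, q):
    """Inner product in 1+1 dimensions with signature + -."""
    return p[0] * q[0] - p[1] * q[1]

def normalize(tangent):
    """Rescale a future-directed timelike tangent (dt, dx) to magnitude 1."""
    dt, dx = tangent
    norm2 = inner(tangent, tangent)
    if dt <= 0 or norm2 <= 0:
        raise ValueError("tangent must be future-directed and timelike")
    scale = 1 / math.sqrt(norm2)
    return (dt * scale, dx * scale)

# Two tangent vectors of different lengths along the same line, dt/dx = 2.
short = normalize((2.0, 1.0))
long_ = normalize((6.0, 3.0))

print(short, long_)             # the same vector both times
print(short[1] / short[0])      # the velocity dx/dt = 1/2
print(inner(short, short))      # approximately 1
```

Both inputs collapse to the same normalized vector, which is the sense in which normalization picks out a single representative.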
When we hear something referred to as a vector, we usually
take this as a statement that it not only transforms as a vector, but
also that it adds as a vector. But the sum of two velocity vectors
would not typically be a valid velocity vector at all, since it would
not have unit magnitude. This lack of additivity would in any case
have been expected because velocities don't add linearly in relativity
(section 3.3).
self-check C
Velocity vectors are required to have $v^2 = 1$. If a vector qualifies as a
valid velocity vector in some frame, could it be invalid in another frame?
Answer, p. ??
A nice way of thinking about velocity vectors is that every such
vector represents a potential observer. That is, the velocity vectors
are the observer-vectors o of chapter 1, but with a normalization
requirement $o^2 = 1$ that we did not impose earlier. An observer
writes her own velocity vector as (1, 0), i.e., as the unit vector in the
timelike direction. Since we have no notion of adding one observer
to another observer, it makes sense that velocity vectors don't add.
If u and v are both future-directed, properly normalized velocity
vectors, and if the signature is $+---$ as in this book, then their
inner product is $\gamma = u \cdot v$, the gamma factor, introduced in section
1.3.3, p. 25, corresponding to their relative velocity.
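As a numerical illustration of the inner-product statement, consider two observers moving in opposite directions at 3/5 relative to some third frame. This is a sketch in 1+1 dimensions with c = 1 and the $+-$ signature; the variable names are illustrative.

```python
import math

def velocity_vector(v):
    """Normalized velocity vector (gamma, gamma*v) for speed v."""
    gamma = 1 / math.sqrt(1 - v * v)
    return (gamma, gamma * v)

def inner(p, q):
    """Inner product in 1+1 dimensions with signature + -."""
    return p[0] * q[0] - p[1] * q[1]

u = velocity_vector(3 / 5)     # observer moving to the right
w = velocity_vector(-3 / 5)    # observer moving to the left

# Relative velocity from the combination rule of section 3.3: 15/17.
relative_v = (3 / 5 + 3 / 5) / (1 + (3 / 5) * (3 / 5))
gamma_relative = 1 / math.sqrt(1 - relative_v ** 2)

print(inner(u, w), gamma_relative)   # both approximately 17/8
```

The inner product is computed in the third frame, yet it reproduces the gamma factor that either observer would assign to the other.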
Orthogonality of the velocity and acceleration
Now for the second of the two constraints deduced on p. 58.
Suppose an observer claims that at a certain moment in time,
a particle has v = (1, 0) and a = (3, 0). That is, the particle is
at rest ($v_x = 0$) and its $v_t$ is growing by 3 units per second. This
is impossible, because after an infinitesimal time interval dt, this
rate of change will result in v = (1 + 3 dt, 0), which is not properly
normalized: its magnitude has grown from 1 to 1 + 3 dt. The observer
is mistaken. This is not a possible combination of velocity and
acceleration vectors. In general (problem 9, p. 67), we always have
the following constraint on the velocity and acceleration vectors:
$$a \cdot v = 0 .$$
This is analogous to the three-dimensional idea that in uniform cir-
cular motion, the perpendicularity of the velocity and acceleration
three-vectors is what causes the velocity vector to rotate without
changing its magnitude.
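The argument can be made concrete by stepping the velocity vector forward by a small time interval and watching its squared magnitude. The sketch below compares the impossible combination a = (3, 0) with the orthogonal choice a = (0, 3); the step size and the function name are illustrative assumptions.

```python
def inner(p, q):
    """Inner product in 1+1 dimensions with signature + -."""
    return p[0] * q[0] - p[1] * q[1]

def squared_magnitude_after(v, a, dt):
    """Squared magnitude of v + a*dt, i.e., of v after a small time step."""
    stepped = (v[0] + a[0] * dt, v[1] + a[1] * dt)
    return inner(stepped, stepped)

v = (1.0, 0.0)      # particle at rest
dt = 1e-6           # small time step; hypothetical value

drift_bad = squared_magnitude_after(v, (3.0, 0.0), dt) - 1    # a.v = 3
drift_good = squared_magnitude_after(v, (0.0, 3.0), dt) - 1   # a.v = 0

print(drift_bad)    # first order in dt, about 2 * 3 * dt
print(drift_good)   # only second order in dt
```

When a is orthogonal to v, the normalization is disturbed only at second order in dt, which vanishes in the limit.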
3.6 Some kinematic identities
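In addition to the relations
$$D(v) = \sqrt{\frac{1+v}{1-v}} \qquad \text{and} \qquad v_c = \frac{v_1 + v_2}{1 + v_1 v_2} ,$$
the following identities can be handy. If stranded on a desert island
you should be able to rederive them from scratch. Don't memorize
them.
[1] $v = (D^2 - 1)/(D^2 + 1)$
[2] $\gamma = (D^{-1} + D)/2$
[3] $v\gamma = (D - D^{-1})/2$
[4] $D(v)D(-v) = 1$
[5] $\eta = \ln D$
[6] $v = \tanh \eta$
[7] $\gamma = \cosh \eta$
[8] $v\gamma = \sinh \eta$
[9] $\tanh(x + y) = \dfrac{\tanh x + \tanh y}{1 + \tanh x \tanh y}$
[10] $D_c = D_1 D_2$
[11] $\eta_c = \eta_1 + \eta_2$
[12] $v_c \gamma_c = (v_1 + v_2)\gamma_1 \gamma_2$
The hyperbolic trig functions are defined as follows:
$$\sinh x = \frac{e^x - e^{-x}}{2}$$
$$\cosh x = \frac{e^x + e^{-x}}{2}$$
$$\tanh x = \frac{\sinh x}{\cosh x}$$
Their inverses are built in to some calculators and computer soft-
ware, but they can also be calculated using the following relations:
$$\sinh^{-1} x = \ln\left(x + \sqrt{x^2 + 1}\right)$$
$$\cosh^{-1} x = \ln\left(x + \sqrt{x^2 - 1}\right)$$
$$\tanh^{-1} x = \frac{1}{2}\ln\left(\frac{1+x}{1-x}\right)$$
Their derivatives are, respectively, $(x^2 + 1)^{-1/2}$, $(x^2 - 1)^{-1/2}$, and
$(1 - x^2)^{-1}$.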
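A quick numerical spot check is a useful safeguard when rederiving identities of this kind. The sketch below tests several of them for a pair of arbitrary sample velocities, in units with c = 1; each identity being tested is written out in a comment, and all names are illustrative.

```python
import math

def doppler(v):
    return math.sqrt((1 + v) / (1 - v))

def gamma(v):
    return 1 / math.sqrt(1 - v * v)

def combine(v1, v2):
    return (v1 + v2) / (1 + v1 * v2)

v1, v2 = 0.3, 0.5                 # arbitrary sample velocities
vc = combine(v1, v2)
D = doppler(v1)
eta = math.log(D)                 # rapidity, eta = ln D

# Each entry is a (left-hand side, right-hand side) pair.
checks = {
    "v = (D^2 - 1)/(D^2 + 1)": (v1, (D**2 - 1) / (D**2 + 1)),
    "gamma = (1/D + D)/2": (gamma(v1), (1 / D + D) / 2),
    "v gamma = (D - 1/D)/2": (v1 * gamma(v1), (D - 1 / D) / 2),
    "D(v) D(-v) = 1": (doppler(v1) * doppler(-v1), 1.0),
    "v = tanh eta": (v1, math.tanh(eta)),
    "gamma = cosh eta": (gamma(v1), math.cosh(eta)),
    "v gamma = sinh eta": (v1 * gamma(v1), math.sinh(eta)),
    "D_c = D_1 D_2": (doppler(vc), doppler(v1) * doppler(v2)),
    "v_c gamma_c = (v_1 + v_2) gamma_1 gamma_2":
        (vc * gamma(vc), (v1 + v2) * gamma(v1) * gamma(v2)),
    "asinh x = ln(x + sqrt(x^2 + 1))":
        (math.asinh(v1), math.log(v1 + math.sqrt(v1 * v1 + 1))),
    "atanh x = (1/2) ln((1 + x)/(1 - x))":
        (math.atanh(v1), 0.5 * math.log((1 + v1) / (1 - v1))),
}

for label, (lhs, rhs) in checks.items():
    print(label, math.isclose(lhs, rhs, rel_tol=1e-9))
```

A check like this does not prove anything, but a single False would point to an algebra slip.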
3.7 The projection operator
A frequent source of confusion in relativity is that we write down
equations that are coordinate-dependent, but forget the dependency.
Similarly, it is possible to write expressions that are only valid for
one choice of signature. The following notation, defining a projection
operator P, is one tool for avoiding these difficulties.
$$P_o r = r - \frac{r \cdot o}{o \cdot o}\, o \qquad (1)$$
Usually o is the future timelike vector representing a certain ob-
server, but the definition can be applied as long as o isn't lightlike.
The idea being expressed is that we want to get rid of any part
of r that is parallel to o's arrow of time. In a graph constructed
according to o's Minkowski coordinates, we cast r's shadow down
perpendicularly onto the spacelike axis, or the spacelike three-plane
in 3 + 1 dimensions. This is why P is referred to as a projection op-
erator. The notation sometimes allows us to express the things that
we would otherwise express by explicitly or implicitly constructing
and referring to o's spacelike Minkowski coordinates. P has the
following properties:
1. $o \cdot P_o r = 0$
2. $r - P_o r$ is parallel to o.
3. $P_o o = 0$
4. $P_o P_o r = P_o r$
5. $P_{co} = P_o$
6. $P_o$ is linear, i.e., $P_o(q + r) = P_o q + P_o r$ and $P_o(cr) = cP_o r$
7. $\frac{d}{dx} P_o r = P_o \frac{dr}{dx}$, where x is any variable and o doesn't depend
on x.
8. If o and v are both future timelike, and $|o^2| = 1$, then we can
express v as $v = P_o v + \gamma o$, where $\gamma$ has the usual interpreta-
tion for world-lines that coincide with these two vectors.
All of these hold regardless of whether the signature is $+---$
or $-+++$, and none of them refer to any coordinates. Properties
1 and 2 can serve as an alternative, geometrical definition of P.
Property 3 says that an observer considers herself to be at rest. 4
is a general property of all projection operators. 8 splits the vector
into its spatial and temporal parts according to o.
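The signature independence is easy to test numerically. The following sketch implements the projection operator in 3+1 dimensions with the signature passed in as a parameter, and checks a few of the properties for a sample observer and a sample vector; the names and the sample values are illustrative assumptions.

```python
PLUS_MINUS = (1, -1, -1, -1)    # signature + - - -
MINUS_PLUS = (-1, 1, 1, 1)      # signature - + + +

def inner(p, q, signature):
    """Inner product of two four-vectors for the given signature."""
    return sum(s * a * b for s, a, b in zip(signature, p, q))

def project(o, r, signature):
    """Projection operator: P_o r = r - (r.o)/(o.o) o."""
    coeff = inner(r, o, signature) / inner(o, o, signature)
    return tuple(ri - coeff * oi for ri, oi in zip(r, o))

o = (1.25, 0.75, 0.0, 0.0)      # observer moving at v = 3/5 along x
r = (2.0, 1.0, 3.0, -1.0)       # arbitrary sample vector

for signature in (PLUS_MINUS, MINUS_PLUS):
    pr = project(o, r, signature)
    print("P_o r       =", pr)
    print("o . P_o r   =", inner(o, pr, signature))    # property 1
    print("P_o o       =", project(o, o, signature))   # property 3
    print("P_o P_o r   =", project(o, pr, signature))  # property 4
```

The projected vector comes out the same for both signatures because the sign flips cancel in the ratio of the two inner products.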
Sometimes if we know a position, velocity, or acceleration four-
vector, we want to find out how these would be measured by a par-
ticular observer using clocks and rulers. The following table shows
how to switch back and forth between the two representations. We
use, for example, the notation $v_o$ to mean the velocity vector of the
form $(0, v_x, v_y, v_z)$ that would be measured by an observer whose
velocity vector is o (so that the subscript is an "o" for "observer,"
not a zero). Since this type of vector, expressed in the Minkowski
coordinates of observer o, has a zero time component, we refer to it
as a three-vector. In all of these expressions, the velocity vectors o
and v are assumed to be normalized, and the signature is assumed
to be $+---$ (one implication being that $o \cdot v$ is simply $\gamma$).
i / Example 5.
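finding the three-vector from the four-vector:
    $v_o = P_o v/(o \cdot v)$
    $a_o = \frac{1}{\gamma^2}\left[P_o a - (o \cdot a) v_o\right]$
finding the four-vector from the three-vector:
    $v = \gamma(o + v_o)$
    $a = -\gamma^3 (a_o \cdot v_o) v + \gamma^2 a_o$, where v is found as above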
As an example of how these are derived, the three-velocity $v_o$
is the derivative of $x_o$ with respect to observer o's Minkowski time
coordinate t, whereas the four-velocity is defined as the derivative of
x with respect to the proper time $\tau$ of the world-line being observed.
Therefore we have
$$v_o = \frac{dx_o}{dt} = \frac{d(P_o x)}{dt} ,$$
and applying property 7 of the projection operator this becomes
$$v_o = P_o \frac{dx}{dt} = P_o \frac{dx/d\tau}{dt/d\tau} = \frac{P_o v}{o \cdot v} .$$
The similar but messier derivation of the expression for $a_o$ is problem
15. In manipulating expressions of this type, the identity $d\gamma/dt = \gamma^3 a v$
is often handy (problem 14).
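As an illustration of how the projection operator extracts three-vectors, the sketch below applies $v_o = P_o v/(o \cdot v)$ and $a_o = \frac{1}{\gamma^2}[P_o a - (o \cdot a)v_o]$ to the hyperbolic motion of example 4, in 1+1 dimensions with the $+-$ signature. The proper acceleration and the proper time at which the vectors are evaluated are hypothetical values, and the function names are illustrative.

```python
import math

def inner(p, q):
    """Inner product in 1+1 dimensions with signature + -."""
    return p[0] * q[0] - p[1] * q[1]

def project(o, r):
    """Projection operator: P_o r = r - (r.o)/(o.o) o."""
    coeff = inner(r, o) / inner(o, o)
    return (r[0] - coeff * o[0], r[1] - coeff * o[1])

def three_velocity(o, v):
    """v_o = P_o v / (o.v)."""
    g = inner(o, v)
    pv = project(o, v)
    return (pv[0] / g, pv[1] / g)

def three_acceleration(o, v, a):
    """a_o = [P_o a - (o.a) v_o] / gamma^2, with gamma = o.v."""
    g = inner(o, v)
    v_o = three_velocity(o, v)
    pa = project(o, a)
    oa = inner(o, a)
    return ((pa[0] - oa * v_o[0]) / g**2, (pa[1] - oa * v_o[1]) / g**2)

acc, tau = 1.0, 0.8     # hypothetical proper acceleration and proper time
o = (1.0, 0.0)          # the inertial observer of example 4
v = (math.cosh(acc * tau), math.sinh(acc * tau))
a = (acc * math.sinh(acc * tau), acc * math.cosh(acc * tau))

print(three_velocity(o, v), math.tanh(acc * tau))
print(three_acceleration(o, v, a), acc / math.cosh(acc * tau) ** 3)
```

The spatial components agree with what one gets by differentiating x(t) directly, and the time components are zero, as expected for three-vectors.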
Lewis-Tolman paradox Example 5
The following example is a form of a paradox discussed by Lewis
and Tolman in 1909. Figure i shows the frame of reference of
observer o in which identical particles 1 and 2 are initially at rest
and located at equal distances $\ell$ from the origin along the y and
x axes. External forces of equal strength act in the directions
shown by the arrows so as to produce accelerations of magnitude
$\alpha$. The system is in rotational equilibrium, dL/dt = 0, because the
rate at which particle 1 picks up clockwise angular momentum is the
same as the rate at which 2 acquires it in the counterclockwise
direction.
Now change to the frame of reference o', moving to the right
relative to o at velocity v. Particle 2's distance from the origin
is Lorentz-contracted from $\ell$ to $\ell/\gamma$, so its angular momentum is
also reduced by $1/\gamma$. It now appears that the system's total an-
gular momentum is increasing in the clockwise sense. How can
we have rotational equilibrium in one frame, but not another?
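The resolution of the paradox is that the accelerations transform
as well. In the original frame o, the four-velocities are $v_1 = v_2 =
(1, 0, 0, 0)$, and the four-accelerations are $a_1 = (0, \alpha, 0, 0)$ and $a_2 =
(0, 0, \alpha, 0)$. Applying a Lorentz transformation, we have $v_1' = v_2' =
(\gamma, -\gamma v, 0, 0)$ and
$$a_1' = (-\gamma v \alpha, \gamma\alpha, 0, 0)$$
$$a_2' = (0, 0, \alpha, 0) .$$
Our definition of angular momentum is expressed in terms of
three-vectors such as $a_{1o}'$ and $a_{2o}'$, not four-vectors like $a_1'$
and $a_2'$. We have
$$\frac{dL'}{dt'} = m\,\frac{\ell}{\gamma}\,a_{2o,y}' - m\,\ell\,a_{1o,x}' .$$
Using the relations $v_o = \frac{1}{\gamma}P_o v$ and $a_o = \frac{1}{\gamma^2}\left[P_o a - (o \cdot a)v_o\right]$
we find
$$v_{1o,x}' = v_{2o,x}' = -v ,$$
$$a_{1o,x}' = \frac{1}{\gamma^2}\left[\gamma\alpha - (-\gamma v \alpha)(-v)\right] = \frac{\alpha}{\gamma^3} ,$$
$$a_{2o,y}' = \frac{\alpha}{\gamma^2} .$$
The result is
$$\frac{dL'}{dt'} = m\,\frac{\ell}{\gamma}\,\frac{\alpha}{\gamma^2} - m\,\ell\,\frac{\alpha}{\gamma^3} ,$$
which is zero.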
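The cancellation can be checked with numbers. The sketch below boosts the two four-accelerations into the moving frame, extracts the three-accelerations with the relation $a_o = \frac{1}{\gamma^2}[P_o a - (o \cdot a)v_o]$, and compares the two lever-arm products. The values of $\alpha$, $\ell$, and v are hypothetical, the signature is $+---$, and the function names are illustrative.

```python
import math

def inner(p, q):
    """Inner product of four-vectors with signature + - - -."""
    return p[0] * q[0] - sum(a * b for a, b in zip(p[1:], q[1:]))

def boost(vec, v):
    """Components of vec in a frame moving at velocity v along x."""
    g = 1 / math.sqrt(1 - v * v)
    t, x, y, z = vec
    return (g * (t - v * x), g * (x - v * t), y, z)

def project(o, r):
    """Projection operator: P_o r = r - (r.o)/(o.o) o."""
    coeff = inner(r, o) / inner(o, o)
    return tuple(ri - coeff * oi for ri, oi in zip(r, o))

def three_acceleration(o, v, a):
    """a_o = [P_o a - (o.a) v_o] / gamma^2, with v_o = P_o v / gamma."""
    g = inner(o, v)
    v_o = tuple(c / g for c in project(o, v))
    pa = project(o, a)
    oa = inner(o, a)
    return tuple((pa_i - oa * vo_i) / g**2 for pa_i, vo_i in zip(pa, v_o))

alpha, ell, v = 1.0, 1.0, 0.6          # hypothetical values, c = 1
gamma = 1 / math.sqrt(1 - v * v)

observer = (1.0, 0.0, 0.0, 0.0)        # the observer o' in her own frame
vel = boost((1.0, 0.0, 0.0, 0.0), v)   # both particles, initially at rest in o
a1 = boost((0.0, alpha, 0.0, 0.0), v)  # particle 1, accelerating along x
a2 = boost((0.0, 0.0, alpha, 0.0), v)  # particle 2, accelerating along y

a1_x = three_acceleration(observer, vel, a1)[1]
a2_y = three_acceleration(observer, vel, a2)[2]

print(a1_x, alpha / gamma**3)
print(a2_y, alpha / gamma**2)
print(ell * a1_x, (ell / gamma) * a2_y)   # the two lever-arm products
```

The last line shows the two products coming out equal, so the shorter lever arm of particle 2 is compensated by its larger three-acceleration.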
3.8 Faster-than-light frames of reference?
Special relativity doesn't permit the existence of observers who move
at c (section 3.4). But what about a superluminal observer, one who
moves faster than c? With charming naivete, the special-effects
technicians for Star Trek attempted to show the frame of reference
of such an observer in scenes where a field of stars rushed past
the Enterprise. (Never mind that the stars, which pass in front of
and behind the spaceship, should actually be a million times larger
than it.) Actually such an observer (which could not be made out
of normal material particles such as electrons and protons) would
consider its own world-line, which we call spacelike, to be timelike,
while the world-line of a star such as our sun, which we consider
timelike, would be spacelike. That is, our sun would not appear to
the observer as an object in motion but rather as a line stretching
across space, which would wink into existence and then wink back
out. A typical transformation between our frame and the frame
of such an observer would be $(x, t) \rightarrow (t, x)$, simply swapping the
time and space coordinates. The transformation is one-to-one, and
therefore not subject to the objection raised in section 3.4 to frames
moving at c.
But this is all in 1+1 dimensions. In 3+1 dimensions, we again
run into the difficulty that the transformation between our frame
and that of the superluminal being cannot be one-to-one, since we
can't squish three dimensions to one or expand one to three without
merging points or splitting one point into many. Our conclusion,
then, is that there can be no such thing as a superluminal observer in
our 3+1-dimensional universe. A more formal proof has been given
by Gorini.
For more about faster-than-light motion in relativity,
see section 4.7, p. 92.
Gorini, Linear Kinematical Groups, Commun. Math. Phys. 21 (1971) 150.
Open access via Project Euclid at
1 Fred buys a ticket on a spaceship that will accelerate to an
ultrarelativistic speed v such that c − v is only 6 m/s. Fred was
on the track team in high school, so he knows he can run about 8
m/s. Once the ship is up to speed, Fred plans to run in the forward
direction, thereby becoming the first human to exceed the speed of
light. Other than the possible lack of gravity to allow running, what
is wrong with Fred's plan?
2 (a) In the equation $v_c = (v_1 + v_2)/(1 + v_1 v_2)$ for combination
of velocities, interpret the case where one of the velocities (but not
the other) equals the speed of light. (b) Interpret the case where the
denominator goes to zero. (c) Use the geometric series to rewrite
the factor $1/(1 + v_1 v_2)$, and then expand the expression for $v_c$ as
a series in $v_1$ and $v_2$, retaining terms up to third order in velocity.
How does this relate to the correspondence principle?
3 Determine which of the identities in section 3.6 need to be
modified in order to be valid in units with $c \neq 1$, and describe how
they should be modified.
4 The Large Hadron Collider accelerates counterrotating beams
of protons and collides them head-on. The beam energy has been
gradually increased, and the accelerator is designed to reach a max-
imum energy of 14 TeV, corresponding to a rapidity of 10.3. (a)
Find the velocity of the beam. (b) In any collision, the kinetic en-
ergy available to do something inelastic (smash up your car, produce
nuclear reactions, . . . ) is the energy in the center of mass frame;
in any other frame, there is initial kinetic energy that must also be
present in the final state due to conservation of momentum. Sup-
pose that a particular proton in the LHC beam never undergoes
a collision with a proton from the opposite beam, and instead is
wasted by being dumped into a beamstop. Let's say that this colli-
sion is with a proton in a hydrogen atom left behind by someone's
fingerprint. Find the velocities of the two protons in their common
center of mass frame.
5 Each GPS satellite is in an orbit with a radius of 26,600
km, with an orbital period of half a sidereal day, giving it a velocity
of 3.88 km/s. The atomic clock aboard such a satellite is tuned
to 10.22999999543 MHz, which is chosen so that when the satellite
is directly overhead, the effect of time dilation (transverse Doppler
shift), combined with a general-relativistic effect due to gravity, re-
sults in a frequency of exactly 10.23 MHz. (GPS started out as a
military project, and legend has it that the top brass, suspicious of
the crazy relativity stuff, demanded that the satellites be equipped
with a software switch to turn off the correction, just in case the
physicists were wrong.) There are oscillations superimposed onto
these static effects due to the longitudinal Doppler shifts as the
satellites approach and recede from a given observer on the ground.
(a) Calculate the maximum Doppler-shifted frequency for a hypo-
thetical observer in outer space who is being directly approached by
the satellite in its orbit. (b) In reality, the greatest possible longi-
tudinal component of the velocity is considerably smaller than this
due to the geometry. Use the size of the earth to determine this
velocity and the corresponding maximum frequency.
6 Verify directly, using the geometry of figure b/2 on p. 51 that
for v = 3/5, the Doppler shift factor is D = 2. (Do not simply plug
v = 3/5 into the formula $D = \sqrt{(1 + v)/(1 - v)}$.)
7 Generalize the numerical calculation of problem 6 to prove
the general result $D = \sqrt{(1 + v)/(1 - v)}$.
8 Expand the relativistic equation for the longitudinal Doppler
shift of light D(v) in a Taylor series, and find the first two nonvanish-
ing terms. Show that these two terms agree with the nonrelativistic
expression, so that any relativistic effect is of higher order in v.
9 Prove, as claimed on p. 60, that we must have $a \cdot v = 0$ if the
velocity four-vector is to remain properly normalized.
10 Example 4 on p. 57 described the motion of an object having
constant proper acceleration a, the world-line being $t = \frac{1}{a}\sinh a\tau$
and $x = \frac{1}{a}\cosh a\tau$ in a particular observer's Minkowski coordinates.
(a) Prove the following results for $\gamma$ and for the (three-)velocity and
(three-)acceleration measured by this observer.
$\gamma = \cosh a\tau$
$v = \tanh a\tau$
acceleration $= a \cosh^{-3} a\tau$
Do the calculations simply by taking the first and second derivatives
of position with respect to time. You will find the following facts
helpful:
$1 - \tanh^2 x = \cosh^{-2} x$
$\frac{d}{dx} \tanh x = \cosh^{-2} x$
(b) Interpret the results in the limit of large $\tau$.
11 Example 4 on p. 57 described the motion of an object having
constant proper acceleration a, the world-line being $t = \frac{1}{a}\sinh a\tau$
and $x = \frac{1}{a}\cosh a\tau$ in a particular observer's Minkowski coordinates.
Find the corresponding velocity and acceleration four-vectors.
12 Starting from the results of problem 11, repeat problem 10a
using the techniques of section 3.7 on p. 61. You will find it helpful
to know that $1 - \tanh^2 x = \cosh^{-2} x$.
13 Let v be a future-directed, properly normalized velocity
vector. Compare the value of $v \cdot v$ in the $+---$ signature used in
this book with its value in the signature $-+++$.
14 (a) Prove the relation $d\gamma/dt = \gamma^3 a v$ given on p. 63,
in the special case where the motion is linear. (b) Generalize the
result to 3 + 1 dimensions.
15 Derive the identity $a_o = \frac{1}{\gamma^2}\left[P_o a - (o \cdot a)v_o\right]$ on p. 63.
Chapter 4
Dynamics
4.1 Ultrarelativistic particles
A typical 22-caliber rifle shoots a bullet with a mass of about 3
g at a speed of about 400 m/s. Now consider the firing of such
a rifle as seen through an ultra-powerful telescope by an alien in
a distant galaxy. We happen to be firing in the direction away
from the alien, who gets a view from over our shoulder. Since the
universe is expanding, our two galaxies are receding from each other.
In the alien's frame, our own galaxy is the one that is moving,
let's say at c − (200 m/s). If the two velocities simply added,
the bullet would be moving at c + (200 m/s). But velocities don't
simply add and subtract relativistically, and applying the correct
equation for relativistic combination of velocities, we find that in
the alien's frame, the bullet flies at only c − (199.9995 m/s). That is,
according to the alien, the energy in the gunpowder only succeeded
in accelerating the bullet by 0.0005 m/s! If we insisted on believing
in $K = (1/2)mv^2$, this would clearly violate conservation of energy
in the alien's frame of reference. It appears that kinetic energy must
not only rise faster than $v^2$ as v approaches c, it must blow up to
infinity. This gives a dynamical explanation for why no material
object can ever reach or exceed c, as we have already inferred on
purely kinematical grounds.
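The bullet's speed in the alien's frame can be reproduced with exact rational arithmetic, which avoids the loss of precision that comes from subtracting two numbers that are both nearly equal to c. This is a minimal sketch in SI units, assuming the defined value c = 299 792 458 m/s; the variable names are illustrative.

```python
from fractions import Fraction

C = Fraction(299_792_458)    # speed of light in m/s

def combine(v1, v2):
    """Relativistic combination of velocities, in SI units."""
    return (v1 + v2) / (1 + v1 * v2 / (C * C))

galaxy = C - 200                             # our galaxy, in the alien's frame
bullet = combine(galaxy, Fraction(400))      # bullet fired at 400 m/s

print(float(C - bullet))         # how far short of c the bullet falls, in m/s
print(float(bullet - galaxy))    # speed gained from the gunpowder, in m/s
```

The first number comes out just under 200 m/s and the second is a fraction of a millimeter per second, in line with the figures quoted above.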
To the alien, both our galaxy and the bullet are ultrarelativistic
objects, i.e., objects moving at nearly c. A good way of thinking
about an ultrarelativistic particle is that it's a particle with a very
small mass. For example, the subatomic particle called the neutrino
has a very small mass, thousands of times smaller than that of the
electron. Neutrinos are emitted in radioactive decay, and because
the neutrino's mass is so small, the amount of energy available in
these decays is always enough to accelerate it to very close to c.
Nobody has ever succeeded in observing a neutrino that was not
ultrarelativistic. When a particle's mass is very small, the mass
becomes difficult to measure. For almost 70 years after the neu-
trino was discovered, its mass was thought to be zero. Similarly, we
currently believe that a ray of light has no mass, but it is always
possible that its mass will be found to be nonzero at some point
in the future. A ray of light can be modeled as an ultrarelativistic
particle.
In reality when two galaxies move at relativistic speeds compared with one
another, they are separated by a cosmological distance, and special relativity
does not actually allow us to construct frames of reference this large.
Lets compare ultrarelativistic particles with train cars. A single
car with kinetic energy E has dierent properties than a train of two
cars each with kinetic energy E/2. The single car has half the mass
and a speed that is greater by a factor of

2. But the same is not

true for ultrarelativistic particles. Since an idealized ultrarelativistic
particle has a mass too small to be detectable in any experiment,
we cant detect the dierence between m and 2m. Furthermore,
ultrarelativistic particles move at close to c, so there is no observable
dierence in speed. Thus we expect that a single ultrarelativistic
particle with energy E compared with two such particles, each with
energy E/2, should have all the same properties as measured by a
mechanical detector.
An idealized zero-mass particle also has no frame in which it
can be at rest. It always travels at c, and no matter how fast we
chase after it, we can never catch up. We can, however, observe
it in different frames of reference, and we will find that its energy
is different. For example, distant galaxies are receding from us at
substantial fractions of c, and when we observe them through a
telescope, they appear very dim not just because they are very far
away but also because their light has less energy in our frame than
in a frame at rest relative to the source. This effect must be such
that changing frames of reference according to a specific Lorentz
transformation always changes the energy of the particle by a fixed
factor, regardless of the particle's original energy; for if not, then
the effect of a Lorentz transformation on a single particle of energy
E would be different from its effect on two particles of energy E/2.
How does this energy-shift factor depend on the velocity v of
the Lorentz transformation? Here it becomes nicer to work in
terms of the variable D. Let's write f(D) for the energy-shift
factor that results from a given Lorentz transformation. Since a
Lorentz transformation $D_1$ followed by a second transformation $D_2$
is equivalent to a single transformation by $D_1 D_2$, we must have
$f(D_1 D_2) = f(D_1) f(D_2)$. This tightly constrains the form of the
function f; it must be something like $f(D) = D^n$, where n is a
constant. The interpretation of n is that under a Lorentz transformation
corresponding to 1% of c, energies of ultrarelativistic particles
change by about n% (making the approximation that v = .01 gives
$D \approx 1.01$). In his original 1905 paper on special relativity, Einstein
used Maxwell's equations and the Lorentz transformation to show
that for a light wave n = 1, and we will prove on p. 78 that this
holds for any ultrarelativistic object. He wrote, "It is remarkable
that the energy and the frequency . . . vary with the state of motion
of the observer in accordance with the same law." He was presumably
interested in this fact because 1905 was also the year in which
he published his paper on the photoelectric effect, which formed the
foundations of quantum mechanics. An axiom of quantum mechanics
is that the energy and frequency of any particle are related by
E = hf, and if E and f hadn't transformed in the same way
relativistically, then quantum mechanics would have been incompatible
with relativity.
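As a numerical illustration of the composition rule for D, the following sketch checks that combining two collinear boosts multiplies their stretch factors. It assumes $D = \sqrt{(1+v)/(1-v)}$ and the standard velocity-combination rule in units where c = 1; the helper names are hypothetical.

```python
import math

def doppler(v):
    """Stretch factor D for a boost at velocity v, in units where c = 1."""
    return math.sqrt((1 + v) / (1 - v))

def combine_velocities(u, v):
    """Relativistic combination of collinear velocities, c = 1 (assumed standard rule)."""
    return (u + v) / (1 + u * v)

v1, v2 = 0.3, 0.5
lhs = doppler(combine_velocities(v1, v2))  # D of the single equivalent boost
rhs = doppler(v1) * doppler(v2)            # product of the individual D factors
print(lhs, rhs)

def energy_shift(v, n=1):
    """f(D) = D**n; n = 1 is the value the text attributes to light waves."""
    return doppler(v) ** n

# For a boost at 1% of c the shift should be close to 1%.
print(energy_shift(0.01))
```

Because D composes by multiplication, any power law $D^n$ automatically satisfies $f(D_1 D_2) = f(D_1)f(D_2)$, which is why the argument narrows f down to that form.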
If we assume that certain objects, such as light rays, are truly
massless, rather than just having masses too small to be detectable,
then their D doesn't have any finite value, but we can still find how
the energy differs according to different observers by finding the D
of the Lorentz transformation between the two observers' frames of
reference.
An astronomical energy shift Example 1
For quantum-mechanical reasons, a hydrogen atom can only
exist in states with certain specific energies. By conservation
of energy, the atom can therefore only absorb or emit light that
has an energy equal to the difference between two such atomic
energies. The outer atmosphere of a star is mostly made of
monoatomic hydrogen, and one of the energies that a hydrogen
atom can absorb or emit is $3.0276 \times 10^{-19}$ J. When we observe
light from stars in the Andromeda Galaxy, it has an energy of
$3.0306 \times 10^{-19}$ J. If this is assumed to be due entirely to the
motion of the Milky Way and Andromeda Galaxy relative to one
another, along the line connecting them, find the direction and
magnitude of this velocity.
The energy is shifted upward, which means that the Andromeda
Galaxy is moving toward us. (Galaxies at cosmological distances
are always observed to be receding from one another, but this
doesn't necessarily hold for galaxies as close as these.) Relating
the energy shift to the velocity, with v taken as positive for recession,
we have
$E_\text{emitted}/E_\text{observed} = D = \sqrt{(1 + v)/(1 - v)}$ .
Since the shift is only about one part per thousand, the velocity
is small compared to c, or small compared to 1 in units where
c = 1. Therefore we can employ the low-velocity approximation
$D \approx 1 + v$, which gives
$v \approx D - 1 = E_\text{emitted}/E_\text{observed} - 1 = -1.0 \times 10^{-3}$ .
The negative sign confirms that the source is approaching rather
than receding. This is in units where c = 1. Converting to SI
units, where $c \neq 1$, we have $v = (-1.0 \times 10^{-3})c = -300$ km/s.
Although the Andromeda Galaxy's tangential motion is not accurately
known, it is considered likely that it will collide with the Milky
Way in a few billion years.
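The arithmetic of this example can be reproduced as follows. This is only a sketch: it assumes the convention that v is positive for recession, so that D equals the emitted energy divided by the observed energy, and it compares the low-velocity approximation with the exact inversion of $D = \sqrt{(1+v)/(1-v)}$. The variable names are illustrative.

```python
C_KM_S = 299_792.458  # speed of light in km/s

E_emitted = 3.0276e-19   # J, energy of the hydrogen line at the source
E_observed = 3.0306e-19  # J, energy observed from Andromeda

# Assumed convention: v > 0 means recession, so D = E_emitted / E_observed.
D = E_emitted / E_observed

v_approx = D - 1                     # low-velocity approximation D ~ 1 + v
v_exact = (D**2 - 1) / (D**2 + 1)    # exact inversion of D = sqrt((1+v)/(1-v))

print("approximate v:", v_approx * C_KM_S, "km/s")
print("exact v:      ", v_exact * C_KM_S, "km/s")
```

Both values are negative (approach) and agree to well within the precision of the input energies, which supports using the approximation here.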
4.2 $E=mc^2$
We now know the relativistic expression for kinetic energy in the lim-
iting case of an ultrarelativistic particle: its energy is proportional to
the stretch factor D of the Lorentz transformation. What about
intermediate cases, like v = c/2?
a / The match is lit inside the bell
jar. It burns, and energy escapes
from the jar in the form of light. Af-
ter it stops burning, all the same
atoms are still in the jar: none
have entered or escaped. The figure
shows the outcome expected
before relativity, which was that
the mass measured on the bal-
ance would remain exactly the
same. This is not what happens
in reality.
When we are forced to tinker with a time-honored theory, our
first instinct should always be to tinker as conservatively as possible.
Although we've been forced to admit that kinetic energy doesn't
vary as $v^2/2$ at relativistic speeds, the next most conservative thing
we could do would be to assume that the only change necessary
is to replace the factor of $v^2/2$ in the nonrelativistic expression for
kinetic energy with some other function, which would have to act
like D or 1/D for $v \to c$. I suspect that this is what Einstein
thought when he completed his original paper on relativity in 1905,
because it wasn't until later that year that he published a second
paper showing that this still wasn't enough of a change to produce
a working theory. We now know that there is something more that
needs to be changed about prerelativistic physics, and this is the
assumption that mass is only a property of material particles such
as atoms (figure a). Call this the "atoms-only hypothesis."
Now that we know the correct relativistic way of finding the
energy of a ray of light, it turns out that we can use that to find
what we were originally seeking, which was the energy of a material
object. The following discussion closely follows Einstein's.
Suppose that a material object O of mass $m_\text{o}$, initially at rest
in a certain frame A, emits two rays of light (or any other kind of
ultrarelativistic particles), each with energy E/2. By conservation
of energy, the object must have lost an amount of energy equal to
E. By symmetry, O remains at rest.
We now switch to a different frame of reference B moving at some
arbitrary speed corresponding to a stretch factor D. The change
of frames means that we're chasing one ray, so that its energy is
scaled down to $(E/2)D^{-1}$, while running away from the other, whose
energy gets boosted to $(E/2)D$. In frame B, as in A, O retains the
same speed after emission of the light. But observers in frames A
and B disagree on how much energy O has lost, the discrepancy
being
$E\left[\frac{1}{2}(D + D^{-1}) - 1\right]$ .
This can be rewritten using identity [2] from section 3.6 as
$E(\gamma - 1)$ .
Let's consider the case where B's velocity relative to A is small.
Using the approximation $\gamma \approx 1 + v^2/2$, our result is approximately
$\frac{1}{2}Ev^2$ ,
neglecting terms of order $v^4$ and higher. The interpretation is that
when O reduced its energy by E in order to make the light rays, it
reduced its mass from $m_\text{o}$ to $m_\text{o} - m$, where m = E. Inserting the
necessary factor of $c^2$ to make this valid in units where $c \neq 1$, we
have Einstein's famous
$E = mc^2$ .
This derivation entailed both an approximation and some hidden
assumptions. These issues are explored more thoroughly in section
4.4 on p. 84 and in ch. 9 on p. 149. The result turns out to be valid
for any isolated body.
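The two algebraic steps in this derivation can be spot-checked numerically. The sketch below assumes $D = \sqrt{(1+v)/(1-v)}$ and $\gamma = 1/\sqrt{1-v^2}$ in units where c = 1, and compares the energy discrepancy written in terms of D, the same quantity written as $E(\gamma-1)$, and the low-velocity form $Ev^2/2$. The names are illustrative.

```python
import math

def doppler(v):
    """Stretch factor D, in units where c = 1."""
    return math.sqrt((1 + v) / (1 - v))

def gamma(v):
    """Lorentz gamma factor, in units where c = 1."""
    return 1 / math.sqrt(1 - v * v)

E = 1.0  # total energy of the two emitted rays in frame A (arbitrary units)

for v in (0.001, 0.01, 0.1, 0.5):
    D = doppler(v)
    from_rays = E * (0.5 * (D + 1 / D) - 1)  # discrepancy from the two rays
    from_gamma = E * (gamma(v) - 1)          # same thing written with gamma
    low_speed = 0.5 * E * v**2               # Newtonian-looking limit
    print(f"v={v}: {from_rays:.6e}  {from_gamma:.6e}  {low_speed:.6e}")
```

The first two columns agree at every speed, while the third matches only when v is small, which is the sense in which the text reads off m = E from the low-velocity limit.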
We find that mass is not simply a built-in property of the particles
that make up an object, with the object's mass being the sum of
the masses of its particles. Rather, mass and energy are equivalent,
so that if the experiment of figure a is carried out with a sufficiently
precise balance, the reading will drop because of the mass equivalent
of the energy emitted as light.
The equation $E = mc^2$ tells us how much energy is equivalent
to how much mass: the conversion factor is the square of the speed
of light, c. Since c is a big number, you get a really really big number
when you multiply it by itself to get $c^2$. This means that even
a small amount of mass is equivalent to a very large amount of
energy. Conversely, an ordinary amount of energy corresponds to
an extremely small mass, and this is why nobody detected the
non-null result of experiments like the one in figure a hundreds of years
ago.
The big event here is mass-energy equivalence, but we can also
harvest a result for the energy of a material particle moving at a
certain speed. We have $m(\gamma - 1)$ for the difference between O's
energy in frame B and its energy when it is at rest, i.e., its kinetic
energy. But since mass and energy are equivalent, we assign O an
energy m when it is at rest. The result is that the energy is
$E = m\gamma$
(or $m\gamma c^2$ in units with $c \neq 1$).
b / Top: A PET scanner. Middle:
Each positron annihilates with an
electron, producing two gamma
rays that fly off back-to-back.
When two gamma rays are observed
simultaneously in the ring
of detectors, they are assumed to
come from the same annihilation
event, and the point at which they
were emitted must lie on the line
connecting the two detectors.
Bottom: A scan of a person's
torso. The body has concentrated
the radioactive tracer around the
stomach, indicating an abnormal
medical condition.
Electron-positron annihilation Example 2
Natural radioactivity in the earth produces positrons, which are
like electrons but have the opposite charge. A form of antimat-
ter, positrons annihilate with electrons to produce gamma rays, a
form of high-frequency light. Such a process would have been
considered impossible before Einstein, because conservation of
mass and energy were believed to be separate principles, and
this process eliminates 100% of the original mass. The amount
of energy produced by annihilating 1 kg of matter with 1 kg of
antimatter is
$E = mc^2 = (2\ \text{kg})(3.0 \times 10^8\ \text{m/s})^2 = 2 \times 10^{17}$ J ,
which is on the same order of magnitude as a day's energy
consumption for the entire world's population!
Positron annihilation forms the basis for the medical imaging
technique called a PET (positron emission tomography) scan, in which
a positron-emitting chemical is injected into the patient and mapped
by the emission of gamma rays from the parts of the body
where it accumulates.
A rusting nail Example 3
An iron nail is left in a cup of water until it turns entirely to rust.
The energy released is about 0.5 MJ. In theory, would a sufficiently
precise scale register a change in mass? If so, how much?
The energy will appear as heat, which will be lost to the
environment. The total mass-energy of the cup, water, and iron will
indeed be lessened by 0.5 MJ. (If it had been perfectly insulated,
there would have been no change, since the heat energy would
have been trapped in the cup.) The speed of light is $c = 3 \times 10^8$
meters per second, so converting to mass units, we have
$m = E/c^2 = (0.5 \times 10^6\ \text{J})/(3 \times 10^8\ \text{m/s})^2 = 6 \times 10^{-12}$ kilograms .
The change in mass is too small to measure with any practical
technique. This is because the square of the speed of light is
such a large number.
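The conversions in examples 2 and 3 are simple enough to script. The following is a minimal sketch using the same rounded value of c as the text; the function names are hypothetical.

```python
C = 3.0e8  # speed of light in m/s, rounded as in the examples

def energy_from_mass(mass_kg):
    """Energy in joules equivalent to a mass in kilograms, E = m c^2."""
    return mass_kg * C**2

def mass_from_energy(energy_j):
    """Mass in kilograms equivalent to an energy in joules, m = E / c^2."""
    return energy_j / C**2

# Example 2: 1 kg of matter annihilating with 1 kg of antimatter.
print(energy_from_mass(2.0), "J")

# Example 3: 0.5 MJ of heat lost by the rusting nail.
print(mass_from_energy(0.5e6), "kg")
```

The huge factor $c^2$ is what makes the first number enormous and the second one unmeasurably small.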
Relativistic kinetic energy Example 4
By about 1930, particle accelerators had progressed to the point
at which relativistic effects were routinely taken into account. In
1964, Bertozzi did a special-purpose experiment to test the
predictions of relativity using an electron accelerator. The results were
discussed in less detail in example 2 on p. 54, at which point
we had not yet seen the relativistic equation for kinetic energy.
Electrons were accelerated through a static electric potential
difference V to a variety of kinetic energies K = eV, and their
velocities inferred by measuring their time of flight through a beamline
of length $\ell = 8.4$ m. Electrical pulses were recorded on an
oscilloscope at the beginning and end of the time of flight t. The
energies were confirmed by calorimetry. Figure c shows a sample
photograph of an oscilloscope trace at V = 1.5 MV.
c / Example 4. Each horizontal division is 9.8 ns.
The prediction of Newtonian physics is as follows.
$eV = (1/2)mv^2$
$v/c = \sqrt{2eV/mc^2} = 2.4$
$t = \ell/v = 12$ ns
According to special relativity, we have:
$eV = m(\gamma - 1)c^2$
$v/c = \sqrt{1 - \left(1 + eV/mc^2\right)^{-2}} = 0.97$
$t = \ell/v = 29$ ns
The results contradict the Newtonian prediction and are
consistent with special relativity. According to Newton, this amount of
energy should have accelerated the electrons to several times
the speed of light. In reality, we see a clear demonstration of the
nature of c as a limiting velocity.
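The two predictions in this example can be recomputed with the sketch below. It assumes an electron rest energy of 0.511 MeV (a standard value not stated in the text) and takes the kinetic energy to be 1.5 MeV and the beamline length to be 8.4 m, as in the example. The variable names are illustrative.

```python
import math

C = 2.998e8        # speed of light, m/s
MC2_MEV = 0.511    # electron rest energy in MeV (assumed standard value)
K_MEV = 1.5        # kinetic energy eV of the electrons, MeV
LENGTH_M = 8.4     # length of the beamline, m

# Newtonian: K = (1/2) m v^2  ->  v/c = sqrt(2K / mc^2)
beta_newton = math.sqrt(2 * K_MEV / MC2_MEV)
t_newton_ns = LENGTH_M / (beta_newton * C) * 1e9

# Relativistic: K = m (gamma - 1) c^2  ->  v/c = sqrt(1 - gamma^-2)
gamma = 1 + K_MEV / MC2_MEV
beta_rel = math.sqrt(1 - gamma**-2)
t_rel_ns = LENGTH_M / (beta_rel * C) * 1e9

print(f"Newtonian:    v/c = {beta_newton:.2f}, t = {t_newton_ns:.0f} ns")
print(f"Relativistic: v/c = {beta_rel:.2f}, t = {t_rel_ns:.0f} ns")
```

With each horizontal division of the oscilloscope trace equal to 9.8 ns, the two predicted flight times differ by well over a full division, so the trace can distinguish between them.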
Gravity bending light Example 5
Gravity is a universal attraction between things that have mass,
and since the energy in a beam of light is equivalent to some
very small amount of mass, light should be affected by gravity,
although the effect should be very small. The first experimental
confirmation of relativity came in 1919 when stars next to the sun
during a solar eclipse were observed to have shifted a little from
their ordinary position. (If there was no eclipse, the glare of the
sun would prevent the stars from being observed.) Starlight had
been deflected by the sun's gravity. The figure is a photographic
negative, so the circle that appears bright is actually the dark face
of the moon, and the dark area is really the bright corona of the
sun. The stars, marked by lines above and below them, appeared
at positions slightly different than their normal ones.
Keep in mind that these arguments are very rough and qualitative,
and it is not possible to produce a relativistic theory of gravity
simply by taking $E = mc^2$ and combining it with Newton's law of
gravity. After all, this law doesn't refer to time at all: it predicts
that gravitational forces propagate instantaneously. We know this
can't be consistent with relativity, which forbids cause and effect
from propagating at any speed greater than c. To produce a
relativistic theory of gravity, we need general relativity.
Similar reasoning suggests that there may be stars, black holes,
so dense that their gravity can prevent light from leaving. Such
stars have been detected, and their properties seem so far to be
described correctly by general relativity.
d / In the p-E plane, mass-
less particles lie on the two
diagonals, while particles with
mass lie to the right.
4.3 Relativistic momentum
Newtonian mechanics has two different measures of motion, kinetic
energy and momentum, and the relationship between them is nonlinear.
Doubling your car's momentum quadruples its kinetic energy.
But nonrelativistic mechanics can't handle massless particles,
which are always ultrarelativistic. We saw in section 4.1 that
ultrarelativistic particles are "generic," in the sense that they have
no individual mechanical properties other than an energy and a
direction of motion. Therefore the relationship between kinetic
energy and momentum must be linear for ultrarelativistic particles.
For example, doubling the amplitude of an electromagnetic wave
quadruples both its energy density, which depends on $E^2$ and $B^2$,
and its momentum density, which goes like $E \times B$.
How can we make sense of these energy-momentum relationships,
which seem to take on two completely different forms in the
limiting cases of very low and very high velocities?
The first step is to realize that since mass and energy are equivalent,
we will get more of an apples-to-apples comparison if we stop talking
about a material object's kinetic energy and consider instead its total
energy E, which includes a contribution from its mass.
Figure d is a graph of energy versus momentum. In this
representation, massless particles, which have $E \propto |p|$, lie on two diagonal
lines that connect at the origin. If we like, we can pick units such
that the slopes of these lines are plus and minus one. Material
particles lie above these lines. For example, a car sitting in a parking
lot has p = 0 and E = m.
Now what happens to such a graph when we change to a different
frame of reference that is in motion relative to the original
frame? A massless particle still has to act like a massless particle,
so the diagonals are simply stretched or contracted along their own
lengths. A transformation that always takes a line to a line is a
linear transformation, and if the transformation between different
frames of reference preserves the linearity of the lines p = E and
p = −E, then it's natural to suspect that it is actually some kind of
linear transformation. In fact the transformation must be linear,
because conservation of energy and momentum involve addition, and
we need these laws to be valid in all frames of reference. But now
by the same reasoning as in subsection 1.3.1 on p. 21, the
transformation must be area-preserving. We then have the same three
cases to consider as in figure j on p. 16. The "Galilean" version is
ruled out because it would imply that particles keep the same
energy when we change frames. (This is what would happen if c were
infinite, so that the mass-equivalent $E/c^2$ of a given energy was zero,
and therefore E would be interpreted purely as the mass.) Nor can
the "rotational" version be right, because it doesn't preserve the
E = |p| diagonals. We are left with the third case, which establishes
the following aesthetically appealing fact:
Energy-momentum is a four-vector
Let an isolated object have momentum and mass-energy p and E.
Then the p-E plane transforms according to exactly the same kind
of Lorentz transformation as the x-t plane. That is, $(E, p_x, p_y, p_z)$
is a four-dimensional vector just like (t, x, y, z).
This is a highly desirable result. If it were not true, it would be
like having to learn different mathematical rules for different kinds
of three-vectors in Newtonian mechanics.
The only remaining issue to settle is whether the choice of units
that gives invariant 45-degree diagonals in the x-t plane is the same
as the choice of units that gives such diagonals in the p-E plane.
That is, we need to establish that the c that applies to x and t is
equal to the $c'$ needed for p and E, i.e., that the velocity scales of the
two graphs are matched up. This is true because in the Newtonian
limit, the total mass-energy E is essentially just the particle's mass,
and then $p/E \approx p/m \approx v$. This establishes that the velocity scales
are matched at small velocities, which implies that they coincide for
all velocities, since a large velocity, even one approaching c, can be
built up from many small increments. (This also establishes that
the exponent n defined on p. 70 equals 1 as claimed.)
Suppose that a particle is at rest. Then it has p = 0 and
mass-energy E equal to its mass m. Therefore the inner product of its
(E, p) four-vector with itself equals $m^2$. In other words, the
magnitude of the energy-momentum four-vector is simply equal to the
particle's mass. If we transform into a different frame of reference,
in which $p \neq 0$, the inner product stays the same. In symbols,
$m^2 = E^2 - p^2$ ,
or, in units with $c \neq 1$,
$(mc^2)^2 = E^2 - (pc)^2$ .
We take this as the relativistic definition of mass. Since the
definition is an inner product, which is a scalar, it is the same in all
frames of reference. (Some older books use an obsolete convention
of referring to $m\gamma$ as "mass" and m as "rest mass.")
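The frame-independence of the mass can be illustrated numerically. The sketch below works in 1+1 dimensions with c = 1 and assumes that (E, p) transforms like (t, x) under a boost, i.e. $E' = \gamma(E - vp)$ and $p' = \gamma(p - vE)$; the sign convention and the helper names are choices made for this illustration.

```python
import math

def boost(E, p, v):
    """Transform (E, p) into a frame moving at velocity v (c = 1, one space dimension)."""
    g = 1 / math.sqrt(1 - v * v)
    return g * (E - v * p), g * (p - v * E)

def mass(E, p):
    """Invariant mass from m^2 = E^2 - p^2."""
    return math.sqrt(E * E - p * p)

m = 2.0
E, p = m, 0.0  # particle at rest: p = 0 and E = m

for v in (0.1, 0.6, 0.99):
    E2, p2 = boost(E, p, v)
    # In the new frame the particle moves at -v, and p/E should reproduce that velocity.
    print(f"v={v}: E={E2:.4f}  p={p2:.4f}  mass={mass(E2, p2):.6f}  p/E={p2 / E2:.4f}")
```

The energy and momentum change from frame to frame, but the combination $E^2 - p^2$ does not, and the ratio p/E tracks the particle's velocity in each frame.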
self-check A
Interpret the equation $m^2 = E^2 - p^2$ in the case where m = 0.
Answer, p. ??
Mass of two light waves Example 6
Let the momentum of a certain light wave be $(p^t, p^x) = (E, E)$,
and let another such wave have momentum (E, −E). The total
momentum is (2E, 0). Thus this pair of massless particles has a
collective mass of 2E. This is an example of the non-additivity of
relativistic mass.
Example 6 shows that mass is not additive, nor is it a measure
of the quantity of matter.
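The non-additivity in example 6 is easy to see in a few lines of code. This sketch represents each energy-momentum vector as a pair (E, p) in 1+1 dimensions with c = 1 and computes the mass of a system from the summed vector using $m^2 = E^2 - p^2$; the function name is hypothetical.

```python
import math

def system_mass(vectors):
    """Mass of a system from its total energy-momentum, m^2 = E^2 - p^2 (c = 1)."""
    total_E = sum(E for E, _ in vectors)
    total_p = sum(p for _, p in vectors)
    return math.sqrt(total_E**2 - total_p**2)

E = 1.5
rays = [(E, E), (E, -E)]  # two light waves moving in opposite directions

print("mass of each ray:", [system_mass([ray]) for ray in rays])
print("mass of the pair:", system_mass(rays))
```

Each ray is massless on its own, yet the pair has mass 2E, because the momenta cancel while the energies add.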
Finding velocity given energy and momentum Example 7
If we know that a particle has mass-energy E and momentum p
(which also implies knowledge of its mass m), what is its velocity?
In the particle's rest frame it has a world-line that points straight
up on a spacetime diagram, and its momentum vector p likewise
points up in the p-E plane. Since displacement vectors and
momentum vectors transform according to the same rules, this
parallelism will be maintained in other frames as well. Therefore
in an arbitrarily chosen frame, the vector p = (E, p) lies along a
line whose inverse slope v = p/E gives the velocity.
As a check on our result, we look at its limiting behavior. In the
Newtonian limit, the mass-energy E is nearly all due to the mass,
so we have $v \approx p/m$, the Newtonian result. In the opposite limit
of ultrarelativistic motion, with $E \gg m$, the definition of mass
$m^2 = E^2 - p^2$ gives $E \approx |p|$, and we have $|v| \approx 1$, which is
also correct.
Light rays don't interact Example 8
We observe that when two rays of light cross paths, they continue
through one another without bouncing like material objects. This
behavior follows directly from conservation of energy-momentum.
Any two vectors can be contained in a single plane, so we can
choose our coordinates so that both rays have vanishing $p_z$. By
choosing the state of motion of our coordinate system appropriately,
we can also make $p_y = 0$, so that the collision takes place
along a single line parallel to the x axis. Since only $p_x$ is nonzero,
we write it simply as p. In the resulting p-E plane, there are two
possibilities: either the rays both lie along the same diagonal, or
they lie along different diagonals. If they lie along the same
diagonal, then there can't be a collision, because the two rays are
both moving in the same direction at the same speed c, and the
trailing one will never catch up with the leading one.
Now suppose they lie along different diagonals. We add their
energy-momentum vectors to get their total energy-momentum,
which will lie in the gray area of figure d. That is, a pair of light
rays taken as a single system act sort of like a material object
with a nonzero mass. By a Lorentz transformation, we can always
find a frame in which this total energy-momentum vector
lies along the E axis. This is a frame in which the momenta of the
two rays cancel, and we have a symmetric head-on collision
between two rays of equal energy. It is the "center-of-mass" frame,
although neither object has any mass on an individual basis. For
convenience, let's assume that the x-y-z coordinate system was
chosen so that its origin was at rest in this frame.
Since the collision occurs along the x axis, by symmetry it is not
possible for the rays after the collision to depart from the x axis;
for if they did, then there would be nothing to determine the
orientation of the plane in which they emerged. Therefore we are
justified in continuing to use the same $p_x$-E plane to analyze the
four-vectors of the rays after the collision.
Footnote: In quantum mechanics, there is a loophole here. Quantum mechanics allows
certain kinds of randomness, so that the symmetry can be broken by letting the
outgoing rays be observed in a plane with some random orientation.
Let each ray have energy E in the frame described above. Given
this total energy-momentum vector, how can we cook up two
energy-momentum vectors for the final state such that energy and
momentum will have been conserved? Since there is zero total
momentum, our only choice is two light rays, one with
energy-momentum vector (E, E) and one with (E, −E). But this is exactly
the same as our initial state, except that we can arbitrarily choose
the roles of the two rays to have been interchanged. Such an
interchanging is only a matter of labeling, so there is no observable
sense in which the rays have collided.
Footnote: There is a second loophole here, which is that a ray of light is actually a
wave, and a wave has other properties besides energy and momentum. It has
a wavelength, and some waves also have a property called polarization. As a
mechanical analogy for polarization, consider a rope stretched taut. Side-to-side
vibrations can propagate along the rope, and these vibrations can occur in any
plane that coincides with the rope. The orientation of this plane is referred to
as the polarization of the wave. Returning to the case of the colliding light rays,
it is possible to have nontrivial collisions in the sense that the rays could affect
one another's wavelengths and polarizations. Although this doesn't actually
happen with non-quantum-mechanical light waves, it can happen with other
types of waves; see, e.g., Hu et al., figure 2.
The title of example 8 is only valid if a "ray" is taken to be something that lacks
wave structure. The wave nature of light is not evident in everyday life from
observations with apparatus such as flashlights, mirrors, and eyeglasses, so we
expect the result to hold under those circumstances, and it does. E.g., flashlight
beams do pass through one another without interacting.
Compton scattering Example 9
Figure e/1 is a histogram of gamma rays emitted by a $^{137}$Cs
source and recorded by a NaI scintillation detector. This type
of detector, unlike a Geiger-Müller counter, gives a pulse whose
height is proportional to the energy of the radiation. About half the
gamma rays do what we would like them to do in a detector: they
deposit their full energy of 662 keV in the detector, resulting in a
prominent peak in the histogram. The other half, however, interact
through a process called Compton scattering, in which they
collide with one of the electrons but emerge from the collision
still retaining some of their energy, with which they may escape
from the detector. The amount of energy deposited in the detector
depends solely on the billiard-ball kinematics of the collision,
and can be determined from conservation of energy-momentum
based on the scattering angle. Forward scattering at 0 degrees
is no interaction at all, and deposits no energy, while scattering
at 180 degrees deposits the maximum energy possible if the only
interaction inside the detector is a single Compton scattering. We
will analyze the 180-degree scattering, since it can be tackled in
1+1 dimensions.
e / 1. The Compton edge lies at
the energy deposited by gamma
rays that scatter at 180 degrees
from an electron. 2. The colli-
sion in the lab frame. 3. The
same collision in the center of
mass frame.
Figure e/2 shows the collision in the lab frame, where the electron
is initially at rest. As is conventional in this type of diagram,
the world-line of the photon is shown as a wiggly line; the wiggles
are just a decoration, and the actual world-line consists of
two line segments. The photon enters the detector with the full
energy $E_\text{o} = 662$ keV and leaves with a smaller energy $E_f$. The
difference $E_\text{o} - E_f$ is what the detector will measure, contributing
a count to the Compton edge. In the lab frame, the total initial
momentum vector is $p = (E_\text{o} + m, E_\text{o})$, with the timelike component
representing the total mass-energy. Because the photon is
massless, its momentum $p_x = E_\text{o}$ is equal to its energy.
Let v be the velocity of the center-of-mass frame, e/3, relative to
the lab frame. Using the result of example 7, we find $v = E_\text{o}/(E_\text{o} + m)$.
To make the writing easier we define $\alpha = E_\text{o}/m$, so that
$v = \alpha/(1 + \alpha)$.
The transformation from the lab frame to the c.m. frame Doppler
shifts the energy of the incident photon down to $E_\text{o}' = D(-v)E_\text{o}$.
The collision reverses the spatial part of the photon's
energy-momentum vector while leaving its energy the same.
Transformation back into the lab frame gives
$E_f = D(-v)E_\text{o}' = D(-v)^2 E_\text{o} = E_\text{o}/(1 + 2\alpha)$.
(This can also be rewritten using the quantum-mechanical
relation $E = hc/\lambda$ to give the compact form
$\lambda_f - \lambda_\text{o} = 2hc/m$.) The final result for the energy of the Compton edge is
$E_\text{o} - E_f = \frac{E_\text{o}}{1 + 1/2\alpha} = 478$ keV ,
in good agreement with figure e/1.
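The steps of this calculation can be followed numerically. The sketch below assumes an electron rest energy of 511 keV (a standard value not quoted in the text) and takes $D(v) = \sqrt{(1+v)/(1-v)}$, so that $D(-v) < 1$ is the factor by which a photon's energy drops when we chase it. The variable names are illustrative.

```python
import math

M_KEV = 511.0   # electron rest energy in keV (assumed standard value)
E0_KEV = 662.0  # energy of the incident gamma ray, keV

def doppler(v):
    """Stretch factor D(v) = sqrt((1+v)/(1-v)), in units where c = 1."""
    return math.sqrt((1 + v) / (1 - v))

alpha = E0_KEV / M_KEV
v_cm = alpha / (1 + alpha)          # velocity of the c.m. frame, from v = p/E

E_cm = doppler(-v_cm) * E0_KEV      # photon energy in the c.m. frame
E_final = doppler(-v_cm) * E_cm     # back-scattered photon, returned to the lab frame
edge = E0_KEV - E_final             # energy left in the detector

print("c.m. velocity:     ", v_cm)
print("final photon (keV):", E_final)
print("Compton edge (keV):", edge)
```

The two Doppler factors multiply to $1/(1+2\alpha)$, so the result can also be cross-checked against the closed form for the final photon energy.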
Pair production requires matter Example 10
Example 2 on p. 74 discussed the annihilation of an electron and
a positron into two gamma rays, which is an example of turning
matter into pure energy. An opposite example is pair production,
a process in which a gamma ray disappears, and its energy goes
into creating an electron and a positron.
Pair production cannot happen in a vacuum. For example, gamma
rays from distant black holes can travel through empty space for
thousands of years before being detected on earth, and they don't
turn into electron-positron pairs before they can get here. Pair
production can only happen in the presence of matter. When
lead is used as shielding against gamma rays, one of the ways
the gamma rays can be stopped in the lead is by undergoing pair
production.
To see why pair production is forbidden in a vacuum, consider the
process in the frame of reference in which the electron-positron
pair has zero total momentum. In this frame, the gamma ray
would have to have had zero momentum, but a gamma ray with
zero momentum must have zero energy as well. This means
that conservation of the momentum vector has been violated: the
timelike component of the momentum is the mass-energy, and it
has increased from 0 in the initial state to at least $2mc^2$ in the final
state.
4.3.1 Massless particles travel at c
Massless particles always travel at c (= 1). For suppose that a
massless particle had |v| < 1 in the frame of some observer. Then
some other observer could be at rest relative to the particle. In
such a frame, the particle's momentum p is zero by symmetry, since
there is no preferred direction for it. Then $E^2 = p^2 + m^2$ is zero
as well, so the particle's entire energy-momentum vector is zero.
But a vector that vanishes in one frame also vanishes in every other
frame. That means we're talking about a particle that can't undergo
scattering, emission, or absorption, and is therefore undetectable by
any experiment. This is physically unacceptable because we don't
consider phenomena (e.g., invisible fairies) to be of physical interest
if they are undetectable even in principle.
What about the case of a material particle, i.e., one having mass?
Since we already have an equation $E = m\gamma$ for the energy of a
material particle in terms of its velocity, we can find a similar
equation for the momentum:

$$p = \sqrt{E^2 - m^2} = m\sqrt{\gamma^2 - 1} = m\sqrt{\frac{1}{1 - v^2} - 1} = m\gamma v .$$

As a material particle gets closer and closer to c, its momentum
approaches infinity, so that an infinite force would be required in
order to reach c.
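As a quick numerical illustration of this result, the sketch below (assuming units with c = 1 and a unit mass; the helper names are hypothetical) computes the momentum both from the energy, via the square root of E^2 - m^2, and directly as the product of m, gamma, and v, and tabulates how it grows as v approaches 1.

```python
import math


def gamma(v):
    """Lorentz factor for speed v, in units with c = 1."""
    return 1.0 / math.sqrt(1.0 - v**2)


def momentum_from_energy(m, v):
    """Momentum obtained from E = m*gamma and p = sqrt(E^2 - m^2)."""
    energy = m * gamma(v)
    return math.sqrt(energy**2 - m**2)


def momentum_direct(m, v):
    """Momentum obtained from p = m*gamma*v."""
    return m * gamma(v) * v


m = 1.0
speeds = [0.1, 0.5, 0.9, 0.99, 0.999]
table = [(v, momentum_from_energy(m, v), momentum_direct(m, v)) for v in speeds]

for v, p_energy, p_direct in table:
    print(f"v = {v:<6} p (from E) = {p_energy:10.4f}   p (m*gamma*v) = {p_direct:10.4f}")
```

The two columns agree, and the momentum increases without bound as v gets closer to 1.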
In summary, massless particles always move at v = c, while
massive ones always move at v < c.
Note that the equation $p = m\gamma v$ isn't general enough to serve as
a definition of momentum, since it becomes an indeterminate form
in the limit $m \to 0$.
No half-life for massless particles Example 11
When we describe an unstable nucleus or other particle as hav-
ing some half-life, we mean its half-life in its own rest frame. A
massless particle always moves at c and therefore has no rest
frame (section 3.4), so it doesn't make sense to describe it as
having a half-life in this sense. This is almost, but not quite, the
same thing as saying that massless particles can never decay.
Constraints on polarization Example 12
We observe that electromagnetic waves are always polarized
transversely, never longitudinally. Such a constraint can only ap-
ply to a wave that propagates at c. If it applied to a wave that
propagated at less than c, we could move into a frame of refer-
ence in which the wave was at rest. In this frame, all directions in
space would be equivalent, and there would be no way to decide
which directions of polarization should be permitted. For a wave
that propagates at c, there is no frame in which the wave is at rest
(see section 3.8).
4.3.2 No global conservation of energy-momentum in general
If you read optional chapter 2, you know that the distinction
between special and general relativity is defined by the flatness
of spacetime, and that flatness is in turn defined by the
path-independence of parallel transport. Whereas energy is a scalar in
Newtonian mechanics, in relativity it is the timelike component of
a vector. It therefore follows that in general relativity we should
not expect to have global conservation of energy. For a conservation
law is a statement that when we add up a certain quantity, the total
has a constant value. But if spacetime is curved, then there is no
natural, uniquely defined way to compare vectors that are defined at
different places in spacetime. We could parallel transport one over
to the other, but the result would depend on the path along which
we chose to transport it. For similar reasons, we should not expect
global conservation of momentum.

Footnote to example 11: See Fiore and Modanese,
decay-of-massless-particles. If such a process does exist, then Lorentz
invariance requires that its time-scale be proportional to the particle's
energy. It can be argued that gluons, which are massless, do in fact
undergo decay into less energetic gluons, but the interpretation is
ambiguous because we never observe gluons as free particles, so we can't
just capture one in a box and watch it rattle around inside until it
decays.
This is the answer to a frequently asked question about cosmology.
Since 1998 we've known that the expansion of the universe is
accelerating, rather than decelerating as we would have expected due
to gravitational attraction. What is the source of the ever-increasing
kinetic energy of all those galaxies? The question assumes that
energy must be conserved on cosmological scales, but that just isn't
so.
Nevertheless, general relativity reduces to special relativity on
scales small enough to make curvature effects negligible. Therefore it
is still valid to expect conservation of energy and momentum to hold
locally, as assumed, e.g., in the analysis of Compton scattering in
example 9 on p. 80, and verified in countless experiments. Cf. section
9.2, p. 153, on the stress-energy tensor.
4.4 Systems with internal structure
Section 4.2 presented essentially Einstein's original proof of
$E = mc^2$, which has been criticized on several grounds. A detailed
discussion is given by Ohanian. Putting aside questions that are purely
historical or concerned only with academic priority, we would like
to know whether the proof has logical flaws, and also whether the
claimed result is only valid under certain conditions. We need to
consider the following questions:
1. Does it matter whether the system being described has finite
   spatial extent, or whether the system is isolated?
2. Does it matter whether parts of the system are moving at
   relativistic velocities?
3. Does the low-velocity approximation used in Einstein's proof
   make a difference?
4. How do we handle a system that is not made out of pointlike
   particles, e.g., a capacitor, in which some of the energy-momentum
   is in an electric field?

Ohanian, "Einstein's $E = mc^2$ ..."

f / The world lines of two beads bouncing back and forth on a wire.
The following example demonstrates issues 1-3 and their logical
connections; the definitional question 4 is addressed in ch. 9.
Suppose that two beads slide freely on a wire, bouncing elastically off
of each other and also rebounding elastically from the wire's ends.
Their world-lines are shown in figure f. Let's say the beads each
have unit mass. In frame o, the beads are released from the center
of the wire with velocities $\pm u$. For concreteness, let's set $u = 1/2$,
so that the system has internal motion at relativistic speeds. In
this frame, the total energy-momentum vector of the system, on the
surface of simultaneity labeled 1 in figure f, is $p = (2.31, 0)$. That is,
it has a total mass-energy of 2.31 units, and a total momentum of
zero (meaning that this is the center of mass frame). As time goes
on, an observer in this frame will say that the beads reach the ends of
the wire simultaneously, at which point they rebound, maintaining
the same total energy-momentum vector p. The mass of the system
is, by definition, $m = \sqrt{E^2 - p^2} = 2.31$, and this mass remains
constant as the beads bounce back and forth.
Now let's transform into a frame o', moving at a velocity $v = 1/2$
relative to o. If velocities added linearly in relativity, then the initial
velocities of the beads in this frame would be 0 and 1, but of course
a material object can't move with speed $|v| = c = 1$, and velocities
don't add linearly. Applying the correct velocity addition formula for
relativity, we find that the beads have initial velocities 0 and 0.8 in
this frame, and if we compute their total energy-momentum vector,
on surface of simultaneity 2 in figure f, we get $p' = (2.67, 1.33)$.
This is exactly what we would have gotten by taking the original
vector p and pushing it through a Lorentz transformation. That
is, the energy-momentum vector seems to be acting like a good
four-vector, even though the system has finite spatial extent and
contains parts that move at relativistic speeds. In particular, this
implies that the system has the same mass $m = 2.31$ as in o, since
m is the norm of the p vector, and the norm of a vector stays the
same under a Lorentz transformation.
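The numbers quoted for the two beads can be reproduced with a few lines of code. This is only a sketch under stated assumptions: units with c = 1, unit masses, one spatial dimension, and a sign convention for the boost chosen so that the bead velocities in the second frame come out as 0 and 0.8; only the magnitudes are meant to be compared with the text. All function names are illustrative.

```python
import math


def gamma(v):
    """Lorentz factor for speed v, in units with c = 1."""
    return 1.0 / math.sqrt(1.0 - v**2)


def four_momentum(m, v):
    """(E, p) of a particle of mass m moving with velocity v along x."""
    return (m * gamma(v), m * gamma(v) * v)


def add_velocities(u, w):
    """Relativistic combination of velocities u and w."""
    return (u + w) / (1.0 + u * w)


def boost(p, w):
    """Transform (E, p) to the frame in which a velocity u becomes add_velocities(u, w)."""
    energy, px = p
    g = gamma(w)
    return (g * (energy + w * px), g * (px + w * energy))


def total(vectors):
    """Component-wise sum of a list of (E, p) pairs."""
    return (sum(e for e, _ in vectors), sum(px for _, px in vectors))


def mass(p):
    """Norm sqrt(E^2 - p^2) of an energy-momentum vector."""
    return math.sqrt(p[0]**2 - p[1]**2)


u = 0.5   # bead speeds in frame o
w = 0.5   # assumed boost parameter relating the two frames

# Frame o: beads moving at +u and -u.
p_o = total([four_momentum(1.0, +u), four_momentum(1.0, -u)])

# Second frame, computed bead by bead with the velocity addition formula.
p_o2 = total([four_momentum(1.0, add_velocities(v, w)) for v in (+u, -u)])

# Second frame, computed by transforming the total vector directly.
p_boosted = boost(p_o, w)

print("frame o      :", p_o, " mass", mass(p_o))
print("second frame :", p_o2, " mass", mass(p_o2))
print("boosted total:", p_boosted)
```

Adding the beads' vectors in the second frame and boosting the total from the first frame give the same components, and the norm is the same in both frames, which is the four-vector behavior described above for surfaces 1 and 2.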
But now consider surface 3, which, like 2, observer o' considers
to be a surface of simultaneity. At this time, o' says that both beads
are moving to the left. Between time 2 and time 3, o' says that
the system's total momentum has changed, while its total
mass-energy stayed constant. Its mass is different, and the total
energy-momentum vector p' at time 3 is not related by a Lorentz
transformation to the value of p at any time in frame o. The reason for
this misbehavior is that the right-hand bead has bounced off of the
right end of the wire, but because o and o' have different opinions
about simultaneity, o' says that there has not yet been any matching
collision for the bead on the left.
But all of these difficulties arise only because we have left
something out. When the right-hand bead bounces off of the right-hand
end of the wire, this is a collision between the bead and the wire.
After the collision, the wire rebounds to the right (or a vibration is
created in it). By ignoring the rebound of the wire, we have
violated the law of conservation of momentum. If we take into account
the momentum imparted to the wire, then the energy-momentum
vector of the whole system is conserved, and must therefore be the
same at 2 and 3.
The upshot of all this is that $E = mc^2$ and the four-vector
nature of p are both valid for systems with finite spatial extent,
provided that the systems are isolated. "Isolated" means simply
that we should not gratuitously ignore anything, such as the wire
in this example, that exchanges energy-momentum with our system.
To give a general proof of this, it will be helpful to develop the
idea of the stress-energy tensor (section 9.2, p. 153), which allows
a succinct statement of what we mean by conservation of
energy-momentum (subsection 9.2.1). A proof is given in section 9.3.4 on
p. 165.
4.5 Force
Force is a concept that is seldom needed in relativity, and that's
why this section is optional.
4.5.1 Four-force
By analogy with Newtonian mechanics, we define a relativistic
force vector

$$F = ma ,$$

where a is the acceleration four-vector (sec 3.5, p. 56) and m is the
mass of a particle that has that acceleration as a result of the force
F. This is equivalent to

$$F = \frac{dp}{d\tau} ,$$

where p is the energy-momentum vector of the particle and $\tau$ its
proper time. Since the timelike part of p is the particle's
mass-energy, the timelike component of the force is related to the power
expended by the force. These definitions only work for massive
particles, since for a massless particle we can't define a or $\tau$. F has
been defined in terms of Lorentz invariants and four-vectors, and
therefore it transforms as a god-fearing four-vector itself.
4.5.2 The force measured by an observer
The trouble with all this is that F isn't what we actually measure
when we measure a force, except if we happen to be in a frame
of reference that momentarily coincides with the rest frame of the
particle. As with velocity and acceleration (section 3.7, p. 61), we
have a four-vector that has simple, standard transformation
properties, but a different $F_o$, which is what is actually measured by the
observer o. It's defined as

$$F_o = \frac{dp}{dt} ,$$

with a $dt$ in the denominator rather than a $d\tau$. In other words,
it measures the rate of transfer of momentum according to the
observer, whose time coordinate is t, not $\tau$ (unless the observer
happens to be moving along with the particle). Unlike the
three-vectors $v_o$ and $a_o$, whose timelike components are zero by
definition according to observer o, $F_o$ usually has a nonvanishing timelike
component, which is the rate of change of the particle's mass-energy,
i.e., the power.
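The relation between the two kinds of force can be written out in one line. The following is a sketch, assuming the usual time dilation relation between the observer's time t and the particle's proper time, with the gamma factor being that of the particle relative to observer o; it is not a result quoted from the text.

```latex
\begin{align*}
  d\tau &= \frac{dt}{\gamma}
  && \text{(time dilation for the moving particle)} \\
  F = \frac{dp}{d\tau} &= \gamma\,\frac{dp}{dt} = \gamma F_o
  && \text{(components taken in the frame of observer o)}
\end{align*}
```

In the particle's own rest frame the gamma factor is 1, so the two forces coincide, consistent with the remark that F is what we measure only when our frame momentarily coincides with the rest frame of the particle.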
The following two examples show that an object moving at rel-
ativistic speeds has less inertia in the transverse direction than in
the longitudinal one. A corollary is that the three-acceleration need
not be parallel to the three-force.
Circular motion Example 13
For a particle in uniform circular motion, $\gamma$ is constant, and we
have

$$F_o = \frac{d}{dt}(m\gamma v) = m\gamma\frac{dv}{dt} .$$

The particle's mass-energy is constant, so the timelike component
of $F_o$ does happen to be zero in this example. In terms of
the three-vectors $v_o$ and $a_o$ defined in section 3.7, we have

$$F_o = m\gamma\frac{dv_o}{dt} = m\gamma a_o ,$$

which is greater than the Newtonian value by the factor $\gamma$. As a
practical example, in a cathode ray tube (CRT) such as the tube
in an old-fashioned oscilloscope or television, a beam of electrons
is accelerated up to relativistic speed (problem 2, p. 96). To paint
a picture on the screen, the beam has to be steered by transverse
forces, and since the deflection angles are small, the world-line
of the beam is approximately that of uniform circular motion. The
force required to deflect the beam is greater by a factor of $\gamma$ than
would have been expected according to Newton's laws.
Linear motion Example 14
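For accelerated linear motion in the x direction, ignoring y and z,
we have a velocity vector

$$\mathbf{v} = \frac{d\mathbf{r}}{d\tau}$$

whose x component is $\gamma v$. Then

$$\begin{aligned}
F_o &= m\frac{d(\gamma v)}{dt} \\
    &= m\frac{d\gamma}{dt}v + m\gamma\frac{dv}{dt} \\
    &= m\frac{d\gamma}{dv}\frac{dv}{dt}v + m\gamma a \\
    &= m(v^2\gamma^3 a + \gamma a) \\
    &= ma\gamma^3 .
\end{aligned}$$

The particle's apparent inertia is increased by a factor of $\gamma^3$ due
to relativity.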
The results of examples 13 and 14 can be combined as follows:

$$F_o = m\gamma a_{o,\perp} + m\gamma^3 a_{o,\parallel} ,$$

where the subscripts $\perp$ and $\parallel$ refer to the parts of $a_o$
perpendicular and parallel to $v_o$.
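The two inertia factors can be checked numerically by differentiating the momentum with a finite difference. This is a minimal sketch, assuming units with c = 1, two spatial dimensions, and a central-difference step chosen for illustration; the function names are hypothetical.

```python
import math


def gamma(vx, vy):
    """Lorentz factor for velocity (vx, vy), in units with c = 1."""
    return 1.0 / math.sqrt(1.0 - vx**2 - vy**2)


def momentum(m, vx, vy):
    """Spatial momentum m*gamma*v of a particle with velocity (vx, vy)."""
    g = gamma(vx, vy)
    return (m * g * vx, m * g * vy)


def force(m, v, a, dt=1e-4):
    """Central-difference estimate of dp/dt for velocity v and acceleration a."""
    (vx, vy), (ax, ay) = v, a
    p_plus = momentum(m, vx + ax * dt, vy + ay * dt)
    p_minus = momentum(m, vx - ax * dt, vy - ay * dt)
    return ((p_plus[0] - p_minus[0]) / (2 * dt),
            (p_plus[1] - p_minus[1]) / (2 * dt))


m, v, a = 1.0, 0.6, 1.0
g = gamma(v, 0.0)  # 1.25 for v = 0.6

# Acceleration along the motion: force per unit (m*a) should approach gamma^3.
ratio_parallel = force(m, (v, 0.0), (a, 0.0))[0] / (m * a)

# Acceleration perpendicular to the motion: force per unit (m*a) should approach gamma.
ratio_perp = force(m, (v, 0.0), (0.0, a))[1] / (m * a)

print("longitudinal factor:", ratio_parallel, " gamma^3 =", g**3)
print("transverse factor  :", ratio_perp, " gamma   =", g)
```

The longitudinal factor comes out close to the cube of gamma and the transverse one close to gamma, so an object at relativistic speed is harder to speed up than to deflect.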
4.5.3 Transformation of the force measured by an observer
Define a frame of reference o for the inertial frame of reference of
an observer who does happen to be moving along with the particle
at a particular instant in time. Then t is the same as $\tau$, and $F_o$ the
same as F. In this frame, the particle is momentarily at rest, so the
work being done on it vanishes, and the timelike components of $F_o$
and F are both zero.
Suppose we do a Lorentz transformation from o to a new frame
o', and suppose the boost is parallel to $F_o$ and F (which are both
purely spatial in frame o). Call this direction x. Then
$dp = (dp_t, dp_x) = (0, dp_x)$ transforms to
$dp' = (-\gamma v\,dp_x, \gamma\,dp_x)$, so that
$F'_{o,x} = dp'_x/dt' = (\gamma\,dp_x)/(\gamma\,dt) = F_{o,x}$. The two
factors of $\gamma$ cancel, and we find that $F'_{o,x} = F_{o,x}$.
Now let's do the case where the boost is in the y direction,
perpendicular to the force. The Lorentz transformation doesn't change
$dp_x$, so $F'_{o,x} = dp'_x/dt' = dp_x/(\gamma\,dt) = F_{o,x}/\gamma$.
The summary of our results is as follows. Let $F_o$ be the force
acting on a particle, as measured in a frame instantaneously comoving
with the particle. Then in a frame of reference moving relative
to this one, we have

$$F'_{o,\parallel} = F_{o,\parallel}$$
$$F'_{o,\perp} = \frac{F_{o,\perp}}{\gamma} ,$$

where $\parallel$ indicates the direction parallel to the relative velocity of the
two frames, and $\perp$ a direction perpendicular to it.
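The summary rule can be turned into a small helper that splits a force into parts parallel and perpendicular to the relative velocity of the frames. This is a sketch under stated assumptions: units with c = 1, three-component tuples for the force and velocity, and a force given in the frame instantaneously comoving with the particle. The function name is hypothetical.

```python
import math


def transform_force(f_rest, v):
    """Force seen in a frame moving at velocity v relative to the comoving frame.

    f_rest: force (3-tuple) measured in the particle's instantaneous rest frame.
    v: relative velocity (3-tuple) of the two frames, with |v| < 1.
    The parallel part is unchanged; the perpendicular part is divided by gamma.
    """
    v2 = sum(c * c for c in v)
    if v2 == 0.0:
        return tuple(f_rest)
    g = 1.0 / math.sqrt(1.0 - v2)
    dot = sum(f * c for f, c in zip(f_rest, v))
    f_par = tuple(dot * c / v2 for c in v)
    f_perp = tuple(f - fp for f, fp in zip(f_rest, f_par))
    return tuple(fp + fq / g for fp, fq in zip(f_par, f_perp))


# Boost along the force: unchanged.
f_along = transform_force((1.0, 0.0, 0.0), (0.6, 0.0, 0.0))

# Boost perpendicular to the force: reduced by gamma = 1.25.
f_across = transform_force((1.0, 0.0, 0.0), (0.0, 0.6, 0.0))

print(f_along, f_across)
```

For a relative speed of 0.6, a force along the boost keeps its value, while a force across the boost is reduced by the factor 1/1.25 = 0.8.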
g / Subrahmanyan Chandrasekhar (1910-1995)
4.6 Degenerate matter
The properties of the momentum vector have surprising implications
for matter subject to extreme pressure, as in a star that uses up all
its fuel for nuclear fusion and collapses. These implications were
initially considered too exotic to be taken seriously by astronomers.
An ordinary, smallish star such as our own sun has enough
hydrogen to sustain fusion reactions for billions of years, maintaining
an equilibrium between its gravity and the pressure of its gases.
When the hydrogen is used up, it has to begin fusing heavier
elements. This leads to a period of relatively rapid fluctuations in
structure. Nuclear fusion proceeds up until the formation of
elements as heavy as oxygen (Z = 8), but the temperatures are not
high enough to overcome the strong electrical repulsion of these
nuclei to create even heavier ones. Some matter is blown off, but finally
nuclear reactions cease and the star collapses under the pull of its
own gravity.
To understand what happens in such a collapse, we have to
understand the behavior of gases under very high pressures. In
general, a surface area A within a gas is subject to collisions in a time t
from the n particles occupying the volume $V = Avt$, where v is the
typical velocity of the particles. The resulting pressure is given by
$P \sim npv/V$, where p is the typical momentum.
Nondegenerate gas: In an ordinary gas such as air, the particles
  are nonrelativistic, so $v = p/m$, and the thermal energy
  per particle is $p^2/2m \sim kT$, so the pressure is $P \sim nkT/V$.
Nonrelativistic, degenerate gas: When a fermionic gas is subject
  to extreme pressure, the dominant effects creating pressure
  are quantum-mechanical. Because of the Pauli exclusion
  principle, the volume available to each particle is $\sim V/n$,
  so its wavelength is no more than $\sim (V/n)^{1/3}$, leading to
  $p = h/\lambda \sim h(n/V)^{1/3}$. If the speeds of the particles are
  still nonrelativistic, then $v = p/m$ still holds, so the pressure
  becomes $P \sim (h^2/m)(n/V)^{5/3}$.
Relativistic, degenerate gas: If the compression is strong enough
  to cause highly relativistic motion for the particles, then
  $v \approx c$, and the result is $P \sim hc(n/V)^{4/3}$.
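All three pressure scalings follow from the same starting estimate of pressure as the product of number density, typical momentum, and typical velocity. The worked steps below are a sketch that only substitutes the expressions for the momentum and velocity given in each case, with all factors of order unity dropped.

```latex
\begin{align*}
  P &\sim \frac{n}{V}\,p\,v \\[4pt]
  \text{nondegenerate:}\quad
    v = \frac{p}{m},\ \ \frac{p^2}{2m} \sim kT
    &\ \Rightarrow\ P \sim \frac{n}{V}\,\frac{p^2}{m} \sim \frac{nkT}{V} \\[4pt]
  \text{nonrelativistic, degenerate:}\quad
    p \sim h\left(\frac{n}{V}\right)^{1/3},\ \ v = \frac{p}{m}
    &\ \Rightarrow\ P \sim \frac{n}{V}\,\frac{h^2}{m}\left(\frac{n}{V}\right)^{2/3}
      = \frac{h^2}{m}\left(\frac{n}{V}\right)^{5/3} \\[4pt]
  \text{relativistic, degenerate:}\quad
    p \sim h\left(\frac{n}{V}\right)^{1/3},\ \ v \approx c
    &\ \Rightarrow\ P \sim \frac{n}{V}\,h\left(\frac{n}{V}\right)^{1/3} c
      = hc\left(\frac{n}{V}\right)^{4/3}
\end{align*}
```

The exponent drops from 5/3 to 4/3 in the relativistic case because the velocity saturates at c while the momentum keeps growing with density.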
As a star with the mass of our sun collapses, it reaches a point
at which the electrons begin to behave as a degenerate gas, and
the collapse stops. The resulting object is called a white dwarf. A
white dwarf should be an extremely compact body, about the size
of the Earth. Because of its small surface area, it should emit very
little light. In 1910, before the theoretical predictions had been
made, Russell, Pickering, and Fleming discovered that 40 Eridani B
had these characteristics. Russell recalled: "I knew enough about
it, even in these paleozoic days, to realize at once that there was
an extreme inconsistency between what we would then have called
'possible' values of the surface brightness and density. I must have
shown that I was not only puzzled but crestfallen, at this exception
to what looked like a very pretty rule of stellar characteristics; but
Pickering smiled upon me, and said: 'It is just these exceptions
that lead to an advance in our knowledge,' and so the white dwarfs
entered the realm of study!"
S. Chandrasekhar showed in the 1930s that there was an upper
limit to the mass of a white dwarf. We will recapitulate his
calculation briefly in condensed order-of-magnitude form. The pressure
at the core of the star is $P \sim \rho g r \sim GM^2/r^4$, where M is the
total mass of the star. The star contains roughly equal numbers of
neutrons, protons, and electrons, so $M = Knm$, where m is the mass of
the electron, n is the number of electrons, and $K \approx 4000$. For stars
near the limit, the electrons are relativistic. Setting the pressure at
the core equal to the degeneracy pressure of a relativistic gas, we
find that the Chandrasekhar limit is
$\sim (hc/G)^{3/2}(Km)^{-2} = 6M_\odot$.
A less sloppy calculation gives something more like $1.4M_\odot$.
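The order-of-magnitude estimate can be evaluated directly. The sketch below assumes rounded SI values for the physical constants and the solar mass, takes K = 4000 as in the text, and uses the scaling obtained by equating the gravitational pressure, of order G M^2 / r^4, to the relativistic degeneracy pressure, of order h c (n/V)^(4/3), with V of order r^3 and n = M/(K m). The variable names are illustrative.

```python
# Rounded SI values (assumed for this sketch).
h = 6.626e-34        # Planck's constant, J s
c = 2.998e8          # speed of light, m/s
G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
m_e = 9.109e-31      # electron mass, kg
M_sun = 1.989e30     # solar mass, kg

K = 4000             # stellar mass per electron, in units of the electron mass

# G M^2 / r^4 ~ h c (M / (K m_e))^(4/3) / r^4 gives
# M ~ (h c / G)^(3/2) * (K m_e)^(-2); the radius cancels.
M_limit = (h * c / G) ** 1.5 / (K * m_e) ** 2
ratio = M_limit / M_sun

print(f"crude Chandrasekhar limit: {M_limit:.2e} kg = {ratio:.1f} solar masses")
```

The result is about six solar masses, matching the crude figure in the text; the radius dropping out of the balance is what makes the limit a pure mass scale.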

What happens to a star whose mass is above the Chandrasekhar
limit? As nuclear fusion reactions flicker out, the core of the star
becomes a white dwarf, but once fusion ceases completely this cannot
be an equilibrium state. Now consider the nuclear reactions

$$n \to p + e^- + \bar{\nu}$$
$$p + e^- \to n + \nu ,$$

which happen due to the weak nuclear force. The first of these
releases 0.8 MeV, and has a half-life of about 10 minutes. This explains
why free neutrons are not observed in significant numbers in our
universe, e.g., in cosmic rays. The second reaction requires an input
of 0.8 MeV of energy, so a free hydrogen atom is stable. The white
dwarf contains fairly heavy nuclei, not individual protons, but
similar considerations would seem to apply. A nucleus can absorb an
electron and convert a proton into a neutron, and in this context the
process is called electron capture. Ordinarily this process will only
occur if the nucleus is neutron-deficient; once it reaches a
neutron-to-proton ratio that optimizes its binding energy, electron capture
cannot proceed without a source of energy to make the reaction go.
In the environment of a white dwarf, however, there is such a source.
The annihilation of an electron opens up a hole in the "Fermi sea."
There is now a state into which another electron is allowed to drop
without violating the exclusion principle, and the effect cascades
upward. In a star with a mass above the Chandrasekhar limit, this
process runs to completion, with every proton being converted into a
neutron. The result is a neutron star, which is essentially an atomic
nucleus (with Z = 0) with the mass of a star!
Observational evidence for the existence of neutron stars came
in 1967 with the detection by Bell and Hewish at Cambridge of a
mysterious radio signal with a period of 1.3373011 seconds. The
signal's observability was synchronized with the rotation of the earth
relative to the stars, rather than with legal clock time or the earth's
rotation relative to the sun. This led to the conclusion that its origin
was in space rather than on earth, and Bell and Hewish originally
dubbed it LGM-1 for "little green men." The discovery of a second
signal, from a different direction in the sky, convinced them that it
was not actually an artificial signal being generated by aliens. Bell
published the observation as an appendix to her PhD thesis, and
it was soon interpreted as a signal from a neutron star. Neutron
stars can be highly magnetized, and because of this magnetization
they may emit a directional beam of electromagnetic radiation that
sweeps across the sky once per rotational period (the "lighthouse
effect"). If the earth lies in the plane of the beam, a periodic signal
can be detected, and the star is referred to as a pulsar. It is fairly
easy to see that the short period of rotation makes it difficult to
explain a pulsar as any kind of less exotic rotating object. In the
approximation of Newtonian mechanics, a spherical body of density
$\rho$, rotating with a period $T = \sqrt{3\pi/G\rho}$, has zero apparent
gravity at its equator, since gravity is just strong enough to accelerate an
object so that it follows a circular trajectory above a fixed point on
the surface (problem 17). In reality, astronomical bodies of
planetary size and greater are held together by their own gravity, so we
have $T \gtrsim 1/\sqrt{G\rho}$ for any body that does not fly apart
spontaneously due to its own rotation. In the case of the Bell-Hewish
pulsar, this implies $\rho \gtrsim 10^{10}$ kg/m$^3$, which is far larger
than the density of normal matter, and also 10-100 times greater than
the typical density of a white dwarf near the Chandrasekhar limit.
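The density bound for the Bell-Hewish pulsar is easy to evaluate. This sketch assumes a rounded SI value of G and applies both the order-of-magnitude condition, that the period is at least of order one over the square root of G times the density, and the Newtonian uniform-sphere formula with its factor of 3 pi; the variable names are illustrative.

```python
import math

G = 6.674e-11          # gravitational constant, m^3 kg^-1 s^-2 (rounded)
T = 1.3373011          # period of the Bell-Hewish signal, in seconds

# Order-of-magnitude bound: T >~ 1/sqrt(G*rho)  =>  rho >~ 1/(G*T^2).
rho_order = 1.0 / (G * T**2)

# Uniform Newtonian sphere at breakup: T = sqrt(3*pi/(G*rho)).
rho_sphere = 3.0 * math.pi / (G * T**2)

print(f"order-of-magnitude bound: {rho_order:.1e} kg/m^3")
print(f"uniform-sphere bound    : {rho_sphere:.1e} kg/m^3")
```

Both versions give a minimum density of order ten billion kilograms per cubic meter or more, millions of times the density of ordinary solid matter.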
An upper limit on the mass of a neutron star can be found in a
manner entirely analogous to the calculation of the Chandrasekhar
limit. The only difference is that the mass of a neutron is much
greater than the mass of an electron, and the neutrons are the only
particles present, so there is no factor of K. Assuming the more
precise result of $1.4M_\odot$ for the Chandrasekhar limit rather than
our sloppy one, and ignoring the interaction of the neutrons via the
strong nuclear force, we can infer an upper limit on the mass of a
neutron star:

$$1.4M_\odot\left(\frac{Km_e}{m_n}\right)^2 ,$$

where $m_e$ and $m_n$ are the masses of the electron and the neutron.
This comes to several solar masses.
The theoretical uncertainties in such an estimate are fairly large.
Tolman, Oppenheimer, and Volkoff originally estimated it in 1939
as $0.7M_\odot$, whereas modern estimates are more in the range of 1.5
to $3M_\odot$. These are significantly lower than our crude estimate,
mainly because the attractive nature of the strong nuclear
force tends to pull the star toward collapse. Unambiguous results
are presently impossible because of uncertainties in extrapolating
the behavior of the strong force from the regime of ordinary nuclei,
where it has been relatively well parametrized, into the exotic
environment of a neutron star, where the density is significantly different
and no protons are present. There are a variety of effects that may
be difficult to anticipate or to calculate. For example, Brown and
Bethe found in 1994 that it might be possible for the mass limit to
be drastically revised because of the process $e^- \to K^- + \nu_e$, which is
impossible in free space due to conservation of energy, but might be
possible in a neutron star. Observationally, nearly all neutron stars
seem to lie in a surprisingly small range of mass, beginning at about
$1.3M_\odot$, but in 2010 a neutron star with a mass of $1.97 \pm 0.04\,M_\odot$
was discovered, ruling out most neutron-star models that included
exotic matter.
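The crude upper limit on the mass of a neutron star discussed above can be evaluated by rescaling the Chandrasekhar limit. This is a sketch under stated assumptions: the limit scales as the inverse square of the mass per degenerate fermion, which is K times the electron mass for a white dwarf and the neutron mass for a neutron star; K = 4000 and the 1.4 solar mass figure are taken from the text, the particle masses are rounded SI values, and the strong nuclear force is ignored. The variable names are illustrative.

```python
m_e = 9.109e-31      # electron mass, kg (rounded)
m_n = 1.675e-27      # neutron mass, kg (rounded)
K = 4000             # white dwarf mass per electron, in electron masses
chandrasekhar = 1.4  # Chandrasekhar limit, in solar masses

# The limit scales as (mass per fermion)^(-2), so replacing K*m_e by m_n
# multiplies it by (K*m_e/m_n)^2.
scale = (K * m_e / m_n) ** 2
neutron_star_limit = chandrasekhar * scale

print(f"scale factor: {scale:.2f}")
print(f"crude neutron star limit: {neutron_star_limit:.1f} solar masses")
```

With these rounded inputs the sketch gives between 6 and 7 solar masses; the exact figure is sensitive to the rounded value of K, and, as the text explains, realistic estimates that include the strong force are considerably lower.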
For stars with masses above the Tolman-Oppenheimer-Volkoff
limit, it seems likely, both on theoretical and observational grounds,
that we end up with a black hole: an object with an event horizon
(cf. p. 58) that cuts its interior off from the rest of the universe.

H.A. Bethe and G.E. Brown, "Observational constraints on the maximum
neutron star mass," Astrophys. J. 445 (1995) L129. G.E. Brown and H.A.
Bethe, "A Scenario for a Large Number of Low-Mass Black Holes in the
Galaxy," Astrophys. J. 423 (1994) 659.
Demorest et al.

4.7 Tachyons and FTL
4.7.1 A defense in depth
Let's summarize some ideas about faster-than-light (FTL,
superluminal) motion in relativity:
1. Superluminal transmission of information would violate
   causality, since it would allow a causal relationship between events
   that were spacelike in relation to one another, and the
   time-ordering of such events is different according to different
   observers. Since we never seem to observe causality to be
   violated, we suspect that superluminal transmission of
   information is impossible. This leads us to interpret the metric in
   relativity as being fundamentally a statement of possible cause
   and effect relationships between events.
2. We observe the invariant mass defined by $m^2 = E^2 - p^2$ to be
   a fixed property of all objects. Therefore we suspect that it is
   not possible for an object to change from having $|E| > |p|$ to
   having $|E| < |p|$.
3. No continuous process of acceleration can bring an observer
   from $v < c$ to $v > c$ (see section 3.3). Since it's possible to
   build an observer out of material objects, it seems that it's
   impossible to get a material object past c by a continuous
   process of acceleration.
4. If we could boost a material object past the speed of light,
   even by some discontinuous process, then we could do so for
   an observer. But faster-than-light frames of reference are
   kinematically impossible in 3+1 dimensions (section 3.8).
Special relativity seems to have a defense in depth against
superluminal motion.
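Argument 1 rests on the fact that the time-ordering of spacelike-separated events depends on the observer, while that of timelike-separated events does not. The sketch below illustrates this with the Lorentz transformation of a time separation, assuming units with c = 1, one spatial dimension, and a scan over boost velocities from -0.99 to 0.99; the function names are hypothetical.

```python
import math


def boost_time_separation(dt, dx, v):
    """Time separation of two events in a frame moving at velocity v (c = 1)."""
    g = 1.0 / math.sqrt(1.0 - v**2)
    return g * (dt - v * dx)


def order_can_flip(dt, dx, speeds):
    """True if some boost in `speeds` reverses the sign of the time separation."""
    return any(boost_time_separation(dt, dx, v) * dt < 0 for v in speeds)


speeds = [i / 100.0 for i in range(-99, 100)]

# Spacelike separation (|dx| > |dt|): a fast enough observer sees the order reversed.
spacelike_flips = order_can_flip(1.0, 2.0, speeds)

# Timelike separation (|dt| > |dx|): every observer agrees on the order.
timelike_flips = order_can_flip(2.0, 1.0, speeds)

print("spacelike order can flip:", spacelike_flips)
print("timelike order can flip :", timelike_flips)
```

For the spacelike pair the order reverses for any boost velocity greater than dt/dx = 0.5, which is why a signal connecting such events would be a cause preceding its effect for some observers.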
The weakest of these arguments is 1, since as described in
section 2.1, we have no strong reasons for believing in causality as an
overarching principle of physics. If we're willing to let go of
causality, then we only need to comply with arguments 2, 3, and 4 above.
Based on 2, FTL motion would be a property of an exotic form
of matter built out of hypothetical particles with imaginary mass.
Such particles are called tachyons. Argument 4 tells us that the laws
of physics must conspire to make it impossible to build an observer
out of tachyons; this is not entirely implausible, since there are other
classes of particles such as photons that can't be used to construct
observers.
4.7.2 Experiments to search for tachyons
It would be exciting if we could detect tachyons in particle
accelerator experiments or as naturally occurring radiation. Perhaps we
could even learn to transmit and receive tachyon signals artificially,
allowing us to send ourselves messages from the future! The latter
possibility was pointed out in 1917 by Tolman and is referred to
as the "tachyonic antitelephone." Bilaniuk et al. claimed in a 1962
paper to have found a reinterpretation that eliminated the
causality violation, but their interpretation requires that rates of tachyon
emission in one frame be related to rates of tachyon absorption in
another frame, which in my opinion is equally problematic, since rates
of absorption should depend on the environment, whereas rates of
emission should depend on the emitter; the causality violation has
simply been described in different words, but not eliminated.
Experimental searches are made more difficult by conflicting
theoretical claims as to whether tachyons should be charged or
neutral, whether they should have integral or half-integral spin, and
whether the normal spin-statistics relation even applies to them.
If charged, it is uncertain whether and under what circumstances
they would emit Cerenkov radiation.

Bilaniuk and Sudarshan, "Particles beyond the light barrier," Phys.
Today 22, 43 (1969).
For a different critique, see Benford, Book, and Newcomb, "The tachyonic
antitelephone," Physical Review D 2 (1970) 263. Scans of the paper can be
found online.
Feinberg, "Possibility of Faster-Than-light Particles," Phys. Rev. 159
(1967) 1089.
The most obvious experimental signature of tachyons would be
propagation at speeds greater than c. Negative results were reported
by Murthy and later by Clay, who studied air showers generated
by cosmic rays to look for precursor particles that arrived before the
first photons.
One could also look for particles with $|p| > E$. Alvager and
Erman, in a 1965 experiment, studied the beta decay of
Tm, using
a spectrometer to measure the momentum of charged radiation and
a solid state detector to determine energy. An upper limit of one
tachyon per 10
beta particles was inferred.
If tachyons are neutral, then they might be difficult to detect
directly, but it might be possible to infer their existence indirectly
through missing energy-momentum in reactions. This is how the
neutrino was first discovered. Baltay et al. searched for reactions
such as $\bar{p} + p \to \pi^+ + \pi^- + t$, with t being a neutral tachyon, by
measuring the momenta of all the other initial and final particles
and looking for events in which the missing energy-momentum was
spacelike. They put upper limits of 10
on the branching ratios
of this and several other reactions leading to production of single
tachyons or tachyon-antitachyon pairs.
When we add quantum mechanics to special relativity, we get
quantum field theory, which sounds scary and can be quite
technical, but is governed by some very simple principles. One of these
principles is that "everything not forbidden is compulsory." The
phrase originated as political satire of communism by T.H. White,
but was commandeered by physicist Murray Gell-Mann to express
the idea that any process not forbidden by a conservation law will in
fact occur in nature at some rate. If tachyons exist, then it is
possible to have two tachyons whose energy-momentum vectors add up
to zero (problem 8, p. 97). This would seem to imply that the
vacuum could spontaneously create tachyon-antitachyon pairs. Most
theorists now interpret this as meaning that when tachyons pop up
in the equations, it's a sign that the assumed vacuum state is not
stable, and will change into some other state that is the true state
of minimum energy.
A brief flurry of reawakened interest in tachyons was occasioned
by a 2011 debacle in which the particle-physics experiment OPERA
mistakenly reported faster-than-light propagation of neutrinos; the
anomaly was later found to be the result of a loose connection on a
fiber-optic cable plus a miscalibrated oscillator.

Feinberg, Phys. Rev. 159 (1967) 1089.
Clay, "A search for tachyons in cosmic ray showers," Austr. J. Phys 41
(1988) 93.
Baltay et al., Phys. Rev. D 1 (1970) 759.
Problems
1 Criticize the following reasoning. "Temperature is a measure
of the energy per atom. In nonrelativistic physics, there is a
minimum temperature, which corresponds to zero energy per atom, but
no maximum. In relativity, there should be a maximum temperature,
which would be the temperature at which all the atoms are moving
at c."
2 In an old-fashioned cathode ray tube (CRT) television,
electrons are accelerated through a voltage difference that is typically
about 20 kV. At what fraction of the speed of light are the electrons
moving?
3 In nuclear beta decay, an electron or antielectron is typically
emitted with an energy on the order of 1 MeV. In alpha decay,
the alpha particle typically has an energy of about 5 MeV. In each
case, do a rough estimate of whether the particle is nonrelativistic,
relativistic, or ultrarelativistic.
4 Suppose that the starship Enterprise from Star Trek has a
mass of 8.0 10
kg, about the same as the Queen Elizabeth 2.
Compute the kinetic energy it would have to have if it was moving
at half the speed of light. Compare with the total energy content of
the worlds nuclear arsenals, which is about 10

5 Cosmic-ray neutrinos may be the fastest material particles in
the universe. In 2013 the IceCube neutrino detector in Antarctica
detected two neutrinos, dubbed Bert and Ernie, after the Sesame
Street characters, with energies in the neighborhood of 1 PeV =
$10^{15}$ eV. The higher energy was Ernie's $1.14 \pm 0.17$ PeV. It is not
known what type of neutrino he was, nor do we have exact masses
for neutrinos, but let's assume m = 1 eV. Find Ernie's rapidity.
6 Science fiction stories often depict spaceships traveling through
solar systems at relativistic speeds. Interplanetary space contains
a significant number of tiny dust particles, and such a ship would
sweep these dust particles out of a large volume of space, impacting
them at high speeds. A 1975 experiment aboard the Skylab space
station measured the frequency of impacts from such objects and
found that a square meter of exposed surface experienced an impact
from a particle with a mass of 10
kg about every few hours.
A relativistic object, sweeping through space much more rapidly,
would experience such impacts at rates of more like one every few
seconds. (Larger particles are significantly more rare, with the
frequency falling off as something like m
.) These particles didnt
damage Skylab, because at relative velocities of 10
m/s their kinetic
energies were on the order of microjoules. At relativistic speeds
it would be a different story. Real-world spacecraft are lightweight
and rather fragile, so there would probably be serious consequences
from any impact having a kinetic energy of about 10
J (comparable
to a bullet from a small handgun). (a) Find the speed at which a
starship could cruise through a solar system if frequent 10
J col-
lisions were acceptable, assuming no object with a mass of more
than 10
kg. Express your result relative to c. (b) Find the speed
under the more conservative parameters of 10 J and 10
7 Example 4 on p. 57 derives the equation
$x = \frac{1}{a} \cosh a\tau$
for a particle moving with constant acceleration. (Note that a
constant of integration was taken to be zero, so that $x \ne 0$ at $\tau = 0$.)
(a) Rewrite this equation in metric units by inserting the necessary
factors of c. (b) If we had a rocket ship capable of accelerating
indefinitely at g, how much proper time would be needed in order
to travel the distance x = 27,000 light-years to the galactic center?
(This will be a flyby, so the ship accelerates all the way rather
than decelerating to stop at its destination.) Answer: 11 years (c)
An observer at rest relative to the galaxy explains the surprisingly
short time calculated in part b as being due to the time dilation
experienced by the traveler. How does the traveler explain it?
8 Show, as claimed on p. 94, that if tachyons exist, then it is
possible to have two tachyons whose momentum vectors add up to
9 (a) A free neutron (as opposed to a neutron bound into an
atomic nucleus) is unstable, and undergoes spontaneous radioactive
decay into a proton, an electron, and an antineutrino. The masses
of the particles involved are as follows:
neutron $1.67495 \times 10^{-27}$ kg
proton $1.67265 \times 10^{-27}$ kg
electron $0.00091 \times 10^{-27}$ kg
antineutrino < 10
Find the energy released in the decay of a free neutron.

(b) Neutrons and protons make up essentially all of the mass of the
ordinary matter around us. We observe that the universe around us
has no free neutrons, but lots of free protons (the nuclei of hydrogen,
which is the element that 90% of the universe is made of). We find
neutrons only inside nuclei along with other neutrons and protons,
not on their own.
If there are processes that can convert neutrons into protons, we
might imagine that there could also be proton-to-neutron conver-
sions, and indeed such a process does occur sometimes in nuclei
that contain both neutrons and protons: a proton can decay into a
neutron, a positron, and a neutrino. A positron is a particle with
the same properties as an electron, except that its electrical charge
is positive. A neutrino, like an antineutrino, has negligible mass.
Although such a process can occur within a nucleus, explain why
it cannot happen to a free proton. (If it could, hydrogen would be
radioactive, and you wouldn't exist!)
10 (a) Find a relativistic equation for the velocity of an object
in terms of its mass and momentum (eliminating $\gamma$).
(b) Show that your result is approximately the same as the classical
value, p/m, at low velocities.
(c) Show that very large momenta result in speeds close to the speed
of light.
11 Expand the equation for relativistic kinetic energy
$K = m(\gamma - 1)$ in a Taylor series, and find the first two nonvanishing
terms. Show that the first term is the nonrelativistic expression.
12 Expand the equation $p = m\gamma v$ in a Taylor series, and find
the first two nonvanishing terms. Show that the first term is the
classical expression.
13 An atom in an excited state emits a photon, ending up in
a lower state. The initial state has mass $m_i$, the final one $m_f$. To
a very good approximation, we expect the energy E of the photon
to equal $m_i - m_f$. However, conservation of momentum dictates
that the atom must recoil from the emission, and therefore it carries
away a small amount of kinetic energy that is not available to the
photon. Find the exact energy of the photon, in the frame in which
the atom was initially at rest.
14 The following are the three most common ways in which
gamma rays interact with matter:
Photoelectric effect: The gamma ray hits an electron, is annihilated,
and gives all of its energy to the electron.
Compton scattering: The gamma ray bounces off of an electron,
exiting in some direction with some amount of energy.
Pair production: The gamma ray is annihilated, creating an electron
and a positron.
Example 10 on p. 82 shows that pair production can't occur in a
vacuum due to conservation of the energy-momentum four-vector.
What about the other two processes? Can the photoelectric effect
occur without the presence of some third particle such as an atomic
nucleus? Can Compton scattering happen without a third particle?
15 This problem assumes you know some basic quantum physics.
The point of this problem is to estimate whether or not a neutron or
proton in an atomic nucleus is highly relativistic. Nuclei typically
have diameters of a few fm (1 fm = $10^{-15}$ m). Take a neutron or
proton to be a particle in a box of this size. In the ground state,
half a wavelength would fit in the box. Use the de Broglie relation
to estimate its typical momentum and thus its typical speed. How
relativistic is it?
16 Show, as claimed in example 11 on p. 83, that if a massless
particle were to decay, Lorentz invariance requires that the
timescale for the process be proportional to the particle's energy. What
units would the constant of proportionality have?
17 Derive the equation $T = \sqrt{3\pi/(G\rho)}$ given on page 91 for the
period of a rotating, spherical object that results in zero apparent
gravity at its surface.
Chapter 5
Inertia (optional)
5.1 What is inertial motion?
On p. 43 I stated the following as an axiom of special relativity:
P4. Inertial frames of reference exist. These are frames in
which particles move at constant velocity if not subject to any
forces. We can construct such a frame by using a particular
particle, which is not subject to any forces, as a reference
point. Inertial motion is modeled by vectors and parallelism.
This is a typical modern restatement of Newton's first law. It claims
to define inertial frames and claims that they exist.
a / The spherical chamber, shown in a cutaway view, has layers
of shielding to exclude all known nongravitational forces. The three
guns, at right angles to each other, fire bullets. Once the chamber
has been calibrated by marking the three dashed-line trajectories
under free-fall conditions, an observer inside the chamber can
always tell whether she is in an inertial frame.
5.1.1 An operational definition
In keeping with the philosophy of operationalism (p. 25), we
ought to be able to translate the definition into a method for testing
whether a given frame really is inertial. Figure a shows an idealized
variation on a device actually built for this purpose by Harold Waage
at Princeton as a lecture demonstration to be used by his partner in
crime John Wheeler. We build a sealed chamber whose contents are
isolated as much as possible from outside forces. Of the four known
forces of nature, the ones we know how to exclude are the strong
nuclear force, the weak nuclear force, and the electromagnetic force.
The strong nuclear force has a range of only about 1 fm ($10^{-15}$ m),
so to exclude it we merely need to make the chamber thicker than
that, and also surround it with enough paraffin wax to keep out
any neutrons that happen to be flying by. The weak nuclear force
also has a short range, and although shielding against neutrinos is
a practical impossibility, their influence on the apparatus inside will
be negligible. To shield against electromagnetic forces, we surround
the chamber with a Faraday cage and a solid sheet of mu-metal.
Finally, we make sure that the chamber is not being touched by any
surrounding matter, so that short-range residual electrical forces
(sticky forces, chemical bonds, etc.) are excluded. That is, the
chamber cannot be supported; it is free-falling.
b / Example 1.
Crucially, the shielding does not exclude gravitational forces.
There is in fact no known way of shielding against gravitational
effects such as the attraction of other masses or the propagation of
gravitational waves. (Because the shielding is spherical, it exerts no
gravitational force of its own on the apparatus inside.)
Inside, an observer carries out an initial calibration by firing
bullets along three Cartesian axes and tracing their paths, which she
defines to be linear. (She can also make sure that the chamber isn't
rotating, e.g., by checking for velocity-dependent Coriolis forces.)
After the initial calibration, she can always tell whether or not she
is in an inertial frame. She simply has to fire the bullets, and see
whether or not they follow the precalibrated paths. For example,
she can detect that the frame has become noninertial if the chamber
is rotated, allowed to rest on the ground, or accelerated by a rocket.
Isaac Newton would have been extremely unhappy with our definition.
"This is absurd," he says. "The way you've defined it, my
street in London isn't inertial." Newtonian mechanics only makes
predictions if we input the correct data on all the mass in the
universe. Given this kind of knowledge, we can properly account for all
the gravitational forces, and define the street in London as an
inertial frame because in that frame, the trees and houses have zero total
force on them and don't accelerate. But spacetime isn't Galilean. In
special relativity's description of spacetime, information propagates
at a maximum speed of c, so there will always be distant parts of the
universe that we can never know about, because information from
those regions hasn't had time to reach us yet.
c / According to Galileo's student Viviani, Galileo dropped
a cannonball and a musketball simultaneously from the leaning
tower of Pisa, and observed that they hit the ground at nearly the
same time. This contradicted Aristotle's long-accepted idea
that heavier objects fell faster.
Rotation is noninertial Example 1
Figure b shows a hypothetical example proposed by Einstein.
One planet rotates about its axis and therefore has an equatorial
bulge. The other planet doesn't rotate and has none. Both Newtonian
mechanics and special relativity make these predictions,
and although the scenario is idealized and unrealistic, there is no
doubt that their predictions are correct for this situation, because
the two theories have been tested in similar cases. This also
agrees with our operational definition of inertial motion on p. 102.
Rotational motion is noninertial.
This bothered Einstein for the following reason. If the inhabitants
of the two planets can look up in the sky at the fixed stars, they
have a clear explanation of the reason for the difference in shape.
People on planet A don't see the stars rise or set, and they infer
that this is because they live on a nonrotating world. The
inhabitants of planet B do see the stars rise and set, just as they do
here on earth, so they infer, just as Copernicus did, that their
planet rotates.
But suppose, Einstein said, that the two planets exist alone in an
otherwise empty universe. There are no stars. Then it's equally
valid for someone on either planet to say that it's the one that
doesn't rotate. Each planet rotates relative to the other planet,
but the situation now appears completely symmetric. Einstein
took this argument seriously and felt that it showed a defect in
special relativity. He hoped that his theory of general relativity
would fix this problem, and predict that in an otherwise empty
universe, neither planet would show any tidal bulge. In reality, further
study of the general theory of relativity showed that it made the
same prediction as special relativity. Theorists have constructed
other theories of gravity, most prominently the Brans-Dicke theory,
that do behave more in the way Einstein's physical intuition
expected. Precise solar-system tests have, however, supported
general relativity rather than Brans-Dicke gravity, so it appears
well settled now that rotational motion really shouldn't be considered
inertial.
5.1.2 Equivalence of inertial and gravitational mass
All of the reasoning above depends on the perfect cancellation
referred to by Newton: since gravitational forces are proportional to
mass, and acceleration is inversely proportional to mass, the result
is that accelerations caused by gravity are independent of mass.
This is the universality of free fall, which was famously observed by
Galileo, figure c.
Suppose that, on the contrary, we had access to some matter
that was immune to gravity. It's sold under the brand name
FloatyStuff. The cancellation fails now. Let's say that alien
gangsters land in a flying saucer, kidnap you out of your back yard,
konk you on the head, and take you away. When you regain
consciousness, you're locked up in a sealed cabin in their spaceship.
You pull your keychain out of your pocket and release it, and you
observe that it accelerates toward the floor with an acceleration that
seems quite a bit slower than what you're used to on earth, perhaps
a third of a gee. There are two possible explanations for this. One
is that the aliens have taken you to some other planet, maybe Mars,
where the strength of gravity is a third of what we have on earth.
The other is that your keychain didn't really accelerate at all: you're
still inside the flying saucer, which is accelerating at a third of a gee,
so that it was really the deck that accelerated up and hit the keys.
There is absolutely no way to tell which of these two scenarios is
actually the case unless you happen to have a chunk of FloatyStuff
in your other pocket. If you release the FloatyStuff and it hovers
above the deck, then you're on another planet and experiencing
genuine gravity; your keychain responded to the gravity, but the
FloatyStuff didn't. But if you release the FloatyStuff and see it hit
the deck, then the flying saucer is accelerating through outer space.
d / Loránd Eötvös (1848-1919).
5.2 The equivalence principle
5.2.1 Equivalence of acceleration to a gravitational field
The nonexistence of FloatyStuff in our universe is a special case
of the equivalence principle. The equivalence principle states that
an acceleration (such as the acceleration of the flying saucer) is
always equivalent to a gravitational field, and no observation can ever
tell the difference without reference to something external. (And
suppose you did have some external reference point: how would
you know whether it was accelerating?)
5.2.2 Eötvös experiments
FloatyStuff would be an extreme example, but if there was any
violation of the universality of free fall, no matter how small, then
the equivalence principle would be falsified. Since Galileo's time,
experimental methods have had several centuries in which to improve,
and the second law has been subjected to similar tests with
exponentially improving precision. For such an experiment in 1993,
physicists at the University of Pisa (!) built a metal disk out of
copper and tungsten semicircles joined together at their flat edges.
They evacuated the air from a vertical shaft and dropped the disk
down it 142 times, using lasers to measure any tiny rotation that
would result if the accelerations of the copper and tungsten were
very slightly different. The results were statistically consistent with
zero rotation, and put an upper limit of $1 \times 10^{-9}$ on the fractional
difference in acceleration $|g_\text{copper} - g_\text{tungsten}|/g$. Experiments of this
type are called Eötvös experiments, after Loránd Eötvös, who did
the first modern, high-precision versions.
Carusotto et al., Limits on the violation of g-universality with a Galileo-
type experiment, Phys Lett A183 (1993) 355. Freely available online at re-
e / An artificial horizon.
The artificial horizon Example 2
The pilot of an airplane cannot always easily tell which way is up.
The horizon may not be level simply because the ground has an
actual slope, and in any case the horizon may not be visible if the
weather is foggy. One might imagine that the problem could be
solved simply by hanging a pendulum and observing which way
it pointed, but by the equivalence principle the pendulum cannot
tell the difference between a gravitational field and an acceleration
of the aircraft relative to the ground, nor can any other
accelerometer, such as the pilot's inner ear. For example, when
the plane is turning to the right, accelerometers will be tricked into
believing that "down" is down and to the left. To get around this
problem, airplanes use a device called an artificial horizon, which
is essentially a gyroscope. The gyroscope has to be initialized
when the plane is known to be oriented in a horizontal plane. No
gyroscope is perfect, so over time it will drift. For this reason the
instrument also contains an accelerometer, and the gyroscope is
automatically restored to agreement with the accelerometer, with
a time-constant of several minutes. If the plane is flown in circles
for several minutes, the artificial horizon will be fooled into
indicating that the wrong direction is vertical.
5.2.3 Gravity without gravity
We live immersed in the earth's gravitational field, and that
is where we do almost all of our physics experiments. It's
surprising, then, that special relativity can be confirmed in
earthbound experiments, sometimes with phenomenal precision, as in the
Ives-Stilwell experiment's 10-significant-figure test of the relativistic
Doppler shift equation (p. 52). How can this be, since special
relativity is supposed to be the version of relativity that can't handle
gravity? The equivalence principle provides an answer. If the only
gravitational effect on your experiment is a uniform field g, then it's
valid for you to describe your experiment as having been done in a
region without any gravity, but in a laboratory whose floor happened
to have been accelerating upward with an acceleration g. Special
relativity works just fine in such situations, because switching into
an accelerated frame of reference doesn't have any effect on the
flatness of spacetime (p. 42). Note that Gravity Probe B (p. 42) orbited
the earth, so the field it experienced varied in direction, causing the
above argument to fail; the effects it observed were not explainable
by special relativity.
5.2.4 Gravitational Doppler shifts
For an example of a specifically gravitational experiment that
is explainable by special relativity, and that has actually been
carried out, consider the following. In a laboratory accelerating
upward, a light wave emitted from the floor would be Doppler-shifted
toward lower frequencies when observed at the ceiling, because of
the change in the receiver's velocity during the wave's time of
flight. The effect is given by $\Delta f/f \approx -a\Delta x/c^2$, where a is the
lab's acceleration, $\Delta x$ is the height from floor to ceiling, and c is
the speed of light (problem 1). In units with c = 1, we have
$\Delta f/f \approx -a\Delta x$.
By the equivalence principle, we find that when such an experiment
is done in a gravitational field g, there should be a gravitational
effect on frequency $\Delta f/f \approx -g\Delta x$. This can be expressed more
compactly as $\Delta f/f \approx -\Delta\Phi$, where $\Phi$ is the gravitational potential,
i.e., the gravitational energy per unit mass.
f / 1. A light wave is emitted upward from the floor of the
elevator. The elevator accelerates upward. 2. By the time the light
wave is detected at the ceiling, the elevator has changed its
velocity, so the wave is detected with a Doppler shift.
g / Pound and Rebka at the top and bottom of the tower.
In 1959, Pound and Rebka carried out an experiment in a tower
at Harvard. Gamma rays were emitted by a $^{57}$Fe source at the
bottom and detected at the top, having risen $\Delta x$ = 22.6 m. The
equivalence principle predicts a fractional frequency shift due to
gravity of $2.46 \times 10^{-15}$. This is very small, and would normally have
been masked by recoil effects (problem 13, p. 98), but by exploiting
the Mössbauer effect Pound and Rebka measured the shift to be
$(2.56 \pm 0.25) \times 10^{-15}$.
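As a rough numerical check of the predicted shift, the first-order expression g times the height, divided by c squared, can be evaluated directly. The sketch below assumes g = 9.8 m/s^2 (a typical surface value, not one quoted in the text), and the function name `fractional_shift` is purely illustrative.

```python
# Rough check of the Pound-Rebka prediction, |delta f / f| ~ g * height / c^2.
# Assumption: g = 9.8 m/s^2 (typical surface value, not quoted in the text).

C = 299_792_458.0  # speed of light in m/s (exact by definition)


def fractional_shift(g: float, height: float) -> float:
    """Magnitude of the first-order gravitational frequency shift."""
    return g * height / C**2


predicted = fractional_shift(g=9.8, height=22.6)
measured, uncertainty = 2.56e-15, 0.25e-15  # values quoted in the text

print(f"predicted shift = {predicted:.3g}")
print("agrees within the quoted uncertainty:",
      abs(predicted - measured) <= uncertainty)
```

The comparison uses only the numbers given in the paragraph above, so it shows the consistency of the quoted prediction and measurement rather than reproducing the experiment's own analysis.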
5.2.5 A varying metric
In the Pound-Rebka experiment, the nuclei emitting the gamma
rays at frequency f can be thought of as little clocks. Each wave
crest that propagates upward is a signal saying that the clock has
ticked once. An observer at the top of the tower finds that the
signals come in at the lower frequency $f'$, and concludes naturally
that the clocks at the bottom had been slowed down due to some
kind of time dilation effect arising from gravity.
This may seem like a big conceptual leap, but it has been confirmed
using atomic clocks. In a 1978 experiment by Iijima and Fujiwara,
figure h, identical atomic clocks were kept at rest at the top
and bottom of a mountain near Tokyo. The discrepancies between
the clocks were consistent with the predictions of the equivalence
principle. The gravitational Doppler shift was also one of the effects
that led to the non-null result of the Hafele-Keating experiment
(p. 15), in which atomic clocks were flown around the world aboard
commercial passenger jets. Every time you use the GPS system,
you are making use of these effects.
Phys. Rev. Lett. 4 (1960) 337
h / A graph showing the time difference between two atomic
clocks. One clock was kept at Mitaka Observatory, at 58 m above
sea level. The other was moved back and forth to a second
observatory, Norikura Corona Station, at the peak of the Norikura
volcano, 2876 m above sea level. The plateaus on the graph are
data from the periods when the clocks were compared side by
side at Mitaka. The difference between one plateau and the next
shows a gravitational effect on the rate of flow of time, accumulated
during the period when the mobile clock was at the top of Norikura.
Starting from only the seemingly innocuous assumption of the
equivalence principle, we are led to surprisingly far-reaching
conclusions. We find that time flows at different rates depending on
the height within a gravitational field. Since the metric can be
interpreted as a measure of the amount of proper time along a given
world-line, we conclude that we cannot always express the metric in
the familiar form $\tau^2 = (+1)\Delta t^2 + (-1)\Delta x^2$ with fixed coefficients
+1 and −1. Suppose that the t coordinate is defined by radio
synchronization. Then the +1 in the metric needs to be replaced with
approximately $1 + 2\Delta\Phi$, where we take $\Phi = 0$ by convention at the
height of the standard clock that coordinates the synchronization.
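To get a feeling for the size of this correction, here is a minimal sketch that evaluates the modified time coefficient for the two heights given in the caption of figure h. It assumes a uniform field g = 9.8 m/s^2 over the whole height range and ignores everything else that a real clock comparison would need to account for; the names `potential` and `clock_rate` are hypothetical.

```python
import math

C = 299_792_458.0  # speed of light in m/s
G_SURFACE = 9.8    # m/s^2, assumed uniform over the height range


def potential(height_m: float, reference_height_m: float) -> float:
    """Potential in units with c = 1, set to zero at the reference clock."""
    return G_SURFACE * (height_m - reference_height_m) / C**2


def clock_rate(phi: float) -> float:
    """Proper time per unit coordinate time for a clock at rest,
    using the time coefficient 1 + 2*phi in the metric."""
    return math.sqrt(1.0 + 2.0 * phi)


# Heights from the caption of figure h: Mitaka 58 m, Norikura 2876 m.
phi_top = potential(2876.0, reference_height_m=58.0)
excess = clock_rate(phi_top) - 1.0     # fractional rate difference
ns_per_day = excess * 86_400 * 1e9     # accumulated offset per day

print(f"fractional rate difference: {excess:.3g}")
print(f"upper clock gains roughly {ns_per_day:.1f} ns per day")
```

To first order the square root reduces to 1 plus the potential, which is why the fractional rate difference has the same form as the Doppler shift in section 5.2.4.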
Keep in mind that although we have connected gravity to the
measurement apparatus of special relativity, there is no curvature
of spacetime, so what we are doing here is still special relativity, not
general relativity. In fact there is nothing more mysterious going
on here than a renaming of spacetime events through a change of
coordinates. The renaming might be convenient if we were using
earth-based reference points to measure the x coordinate. But if
we felt like it, we could switch to a good inertial frame of reference,
one that was free-falling. In this frame, we would obtain exactly the
same prediction for the results of any experiment. For example, the
free-falling observer would explain the result of the Pound-Rebka
experiment as arising from the upward acceleration of the detector
away from the source.
1 Carry out the details of the calculation of the gravitational
Doppler effect in section 5.2.4.
2 A student argues as follows. At the center of the earth, there
is zero gravity by symmetry. Therefore time would flow at the same
rate there as at a large distance from the earth, where there is also
zero gravity. Although we can't actually send an atomic clock to the
center of the earth, interpolating between the surface and the center
shows that a clock at the bottom of a mineshaft would run faster
than one on the earth's surface. Find the mistake in this argument.
3 Somewhere in outer space, suppose there is an astronomical
body that is a sphere consisting of solid lead. Assume the Newtonian
expression $\Phi = -GM/r$ for the potential in the space outside the
object. Make an order of magnitude estimate of the diameter it
must have if the gravitational time dilation at its surface is to be
a factor of 2 relative to time as measured far away. (Under these
conditions of strong gravitational fields, special relativity is only a
crude approximation, and that's why we won't get more than an
order of magnitude estimate out of this.) What is the gravitational
field at its surface? If I have a week's vacation from work, and I
spend it lounging on the beach on the lead planet, do I experience
two weeks of relaxation, or half a week?
Chapter 6
Waves
This chapter and the preceding one have good, solid physical titles.
Inertia. Waves. But underlying the physical content is a thread of
mathematics designed to teach you a language for describing
spacetime. Without this language, the complications of relativity rapidly
build up and become unmanageable. In section 5.2.5, we saw that
there are physically compelling reasons for switching back and forth
between different coordinate systems, that is, different ways of
attaching names to the events that make up spacetime. A toddler in a
bilingual family gets a payoff for switching back and forth between
asking Mama in Spanish for dulces and alerting Daddy in English
that Barbie needs to be rescued from falling off the couch. She may
bounce back and forth between the two languages in a single
sentence, a habit that linguists call "code switching." In relativity,
we need to build fluency in a language that lets us talk about actual
phenomena without getting hung up on the naming system.
6.1 Frequency
6.1.1 Is time's flow constant?
The simplest naming task is in 0 + 1 dimensions: a time-line
like the ones in history class. If we name the points in time A, B,
C, . . . or 1, 2, 3, . . . , or Bush, Clinton, Bush, . . . , how do we know
that we're marking off equal time intervals? Does it make sense
to imagine that time itself might speed up and slow down, or even
start and stop? The second law of thermodynamics encourages us
to think that it could. If the universe had existed for an infinite
time, then entropy would have maximized itself a long time ago,
presumably, and we would not exist, because the heat death of
the universe would already have happened.
6.1.2 Clock-comparison experiments
But what would it actually mean empirically for time's rate of
flow to vary? Unless we can tie this to the results of experiments, it's
nothing but cut-rate metaphysics. In a Hollywood movie where time
could stop, the scriptwriters would show us the stopping through
the eyes of an observer, who would stroll past frozen waterfalls and
snapshotted bullets in mid-flight. The observer's brain is a kind
of clock, and so is the waterfall. We're left with what's known
as a clock-comparison experiment. To date, all clock-comparison
experiments have given null results. Matsakis et al.
found that
pulsars match the rates of atomic clocks with a drift of less than
about 10
seconds over 10 years. Guéna et al.
observed that
atomic clocks using atoms of different isotopes drifted relative to
one another by no more than about 10
per year. Any non-null
result would have caused serious problems for relativity. One of
the expectations in an Aristotelian description of spacetime is that
the motion of material objects on earth would naturally slow down
relative to celestial phenomena such as the rising and setting of the
sun. The relativistic interpretation of time dilation as an effect on
time itself (p. 25) also depends crucially on the null results of these
6.1.3 Birdtracks notation
As a simple example of clock comparison, let's imagine using
the hourly emergence of a mechanical bird from a pendulum-driven
cuckoo clock to measure the rate at which the earth spins. There
is clearly a kind of symmetry here, since we could equally well take
our planet's rotation as the standard and use it to measure the
frequency with which the bird pops out of the door. Schematically,
let's represent this measurement process with the following notation,
which is part of a system called birdtracks:
$c \rightarrow e = 24$
Here c represents the cuckoo clock and e the rotation of the earth.
Although the measurement relationship is nearly symmetric, the
arrow has a direction, because, for example, the measurement of
the earth's rotational period in terms of the clock's frequency is
$c \rightarrow e = (1\ \text{hr}^{-1})(24\ \text{hr}) = 24$, but the clock's period in terms of the
earth's frequency is $e \rightarrow c = 1/24$. We say that the relationship is
not symmetric but "dual." By the way, it doesn't matter how we
arrange these diagrams on the page. The notations $c \rightarrow e$ and $e \leftarrow c$
mean exactly the same thing, and expressions like this can even be
drawn vertically.
Astronomy and Astrophysics 326 (1997) 924,
The system used in this book follows the one defined by Cvitanović, which
was based closely on a graphical notation due to Penrose. For a more
complete exposition, see the Wikipedia article "Penrose graphical notation" and
Cvitanović's online book at
Suppose that e is a displacement along some one-dimensional
line of time, and we want to think of it as the thing being measured.
Then we expect that the measurement process represented by c
produces a real-valued result and is a linear function of e. Since the
relationship between c and e is dual, we expect that c also belongs
to some vector space. For example, vector spaces allow multiplication
by a scalar: we could double the frequency of the cuckoo clock
by making the bird come out on the half hour as well as on the
hour, forming 2c. Measurement should be a linear function of both
vectors; we say it is bilinear.
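The bilinearity claim can be checked in a few lines for the one-dimensional case. This is only a sketch: it represents each one-dimensional vector by a single exact number (a modeling assumption made here for illustration), and the function name `measure` is hypothetical.

```python
from fractions import Fraction


def measure(covector: Fraction, vector: Fraction) -> Fraction:
    """Pair a frequency (units of 1/hr) with a duration (units of hr).
    The units cancel, so the result is a pure count."""
    return covector * vector


c = Fraction(1)   # cuckoo clock: one emergence per hour
e = Fraction(24)  # earth's rotation: a period of 24 hours

count = measure(c, e)
print("c -> e =", count)

# Doubling the clock's frequency (bird on the half hour too) doubles the count.
print("(2c) -> e =", measure(2 * c, e))

# Two rotations of the earth give twice the count as well.
print("c -> (e + e) =", measure(c, e + e))
```

Linearity in each argument separately is what "bilinear" means here; in higher dimensions the single numbers would be replaced by lists of components.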
6.1.4 Duality
The two vectors c and e have different units, hr$^{-1}$ and hr, and
inhabit two different one-dimensional vector spaces. The "flavor" of
the vector is represented by whether the arrow goes into it or comes
out. Just as we used notation like $\vec{v}$ in freshman physics to tell
vectors apart from scalars, we can employ arrows in the birdtracks
notation as part of the notation for the vector, so that instead of
writing the two vectors as c and e, we can notate them as $c\rightarrow$ and
$\rightarrow e$. Performing a measurement is like plumbing. We join the two
"pipes" in $c\rightarrow\ \rightarrow e$ and simplify to $c \rightarrow e$.
A confusing and nonstandardized jungle of notation and
terminology has grown up around these concepts. For now, let's refer to
a vector such as $\rightarrow e$, with the arrow coming in, simply as a
"vector," and the type like $c\rightarrow$ as a "covector." In the one-dimensional
example of the earth and the cuckoo clock, the roles played by the
two things were completely equivalent, and it didn't matter which
one we expressed as a vector and which as a covector.
6.2 Phase
6.2.1 Phase is a scalar
In section 1.3.1, p. 21, we defined a (Lorentz) invariant as a
quantity that was unchanged under rotations and Lorentz boosts.
A measurement such as $c \rightarrow e = 24$ is an invariant because it is
simply a count. We've counted the number of periods. In fact,
a count is not just invariant under rotations and boosts but
under any well-behaved change of coordinates, the technical condition
being that each coordinate in each set is a differentiable function
of each coordinate in the other set. Such a change of coordinates
is called a diffeomorphism. For example, a change of units
$(t, x, y, z) \rightarrow (kt, kx, ky, kz)$ is all right as long as k is nonzero. A
quantity that stays the same under any diffeomorphism is called a
scalar. Since a Lorentz transformation is a diffeomorphism, every
scalar is a Lorentz invariant. Not every Lorentz invariant is a scalar.
For example, spacetime volume is Lorentz invariant (p. 45), but
under the change of units described above, spacetime volume doesn't
stay the same.
In birdtracks notation, any expression that has no external
arrows at all represents a scalar. Since the expression $c \rightarrow e = 24$ has no
external arrows, only internal ones, it represents a scalar. Another
way of describing this measurement is as a phase. If we prefer to
measure the phase in units of cycles, then we have $\phi = c \rightarrow e$. If we
like radians, we can use $\phi = 2\pi\, c \rightarrow e$.
a / 1. A displacement vector. 2. A covector. 3. Measurement
is reduced to counting. The observer, represented by the
displacement vector, counts 24
b / Constant-temperature curves for January in North America, at
intervals of 4 °C. The temperature gradient at a given point is a
6.2.2 Scaling
A convenient way of summarizing all of our categories of variables
is by their behavior when we convert units, i.e., when we rescale
our space. If we switch our time unit from hours to minutes, the
number of apples in a bowl is unchanged, the earth's period of
rotation gets 60 times bigger, and the frequency of the cuckoo clock
changes by a factor of 1/60. In other words, a quantity $u$ under
rescaling of coordinates by a factor $\alpha$ becomes $\alpha^p u$,
where the exponents $-1$, $0$, and $+1$ correspond to covectors,
scalars, and vectors, respectively. We can therefore see that these
distinctions are of interest even in one dimension, contrary to what
one would have expected from the freshman-physics concept of a vector
as something transforming in a certain way under rotations.
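The scaling rule can be tried out numerically. The following is a minimal sketch, not part of the original text: the helper name `rescale` is hypothetical, and the numbers are the ones from the earth-and-cuckoo-clock example (a 24-hour period and a clock that sounds once per hour).

```python
# Rescaling behavior of covectors (p = -1), scalars (p = 0), and vectors (p = +1).
# `rescale` is an illustrative helper, not notation from the text.

def rescale(value, alpha, p):
    """Value of a quantity with scaling exponent p after rescaling coordinates by alpha."""
    return value * alpha ** p

alpha = 60                    # switching the time unit from hours to minutes
period_hours = 24.0           # earth's period of rotation: a vector, p = +1
frequency_per_hour = 1.0      # cuckoo clock's frequency: a covector, p = -1

period_minutes = rescale(period_hours, alpha, +1)
frequency_per_minute = rescale(frequency_per_hour, alpha, -1)

# The number of periods counted is a scalar (p = 0), so it should not
# depend on which unit system we use to compute it.
count_in_hours = frequency_per_hour * period_hours
count_in_minutes = frequency_per_minute * period_minutes

print(period_minutes, frequency_per_minute, count_in_hours, count_in_minutes)
```

The product of a covector and a vector picks up the factors $\alpha^{-1}$ and $\alpha^{+1}$, which cancel, so the count comes out the same in either unit.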
In section 1.3.1 (p. 21), we defined an invariant as a quantity
that did not change under rotations or Lorentz boosts, i.e., one that
was independent of the frame of reference. For a scalar we have the
even more restrictive condition that it must not change if we change
our units of measurement. For example, proper time is an invariant,
and so is area in 1+1-dimensional spacetime, but neither is a scalar.
6.3 The frequency-wavenumber covector
Generalizing from 0 + 1 dimensions to 3 + 1, we could have an
observer moving inertially along velocity vector $\mathbf{o}$, while
counting the phase $\phi$ (in radians) of a plane wave (perhaps a
water wave or an electromagnetic wave) that is washing over her.
Since $\phi$ is just a count, it's clearly a scalar. That means that
we have some function that takes as its input a vector $\mathbf{o}$
and gives as an output the scalar $\phi$. This function has all the
right characteristics to be described as a measurement
$\boldsymbol{\omega}\rightarrow\mathbf{o}$ of $\mathbf{o}$ with some
covector $\boldsymbol{\omega}\rightarrow$, and in a constructive
style of mathematics this is a good way of defining a covector: it's
a linear function from the space of vectors to the real numbers. We
call $\boldsymbol{\omega}$ the frequency-wavelength covector, or just
the frequency covector for short. If $\mathbf{o}$ represents one
second as measured on the clock of this observer, then
$\boldsymbol{\omega}\rightarrow\mathbf{o}$ is the frequency $\omega$
measured by this observer in units of radians per second. If the same
observer considers $\mathbf{s}$ to be a vector of simultaneity with a
length of one meter, then $\boldsymbol{\omega}\rightarrow\mathbf{s}$
is the observer's measurement of the wavenumber $k$, defined as
$2\pi$ divided by the wavelength.
6.3.1 Visualization
In more than one dimension, there are natural ways of visualizing
the different vector spaces inhabited by vectors and covectors. A
vector is an arrow. A covector can be visualized as a set of parallel,
evenly spaced lines on a topographic map, a/2, with an arrowhead
to show which way is uphill. The act of measurement consists of
counting how many of these lines are crossed by a certain vector,
a/3.
c / The surfer moves directly to the right with velocity vector u.
The wave also propagates to the
6.3.2 The gradient
Given a scalar field $\phi$, its gradient $\nabla\phi$ at any given
point is a covector. The frequency covector is the gradient of the
phase. In birdtracks notation, we indicate this by writing it with an
outward-pointing arrow, $(\nabla\phi)\rightarrow$. Because gradients
occur so frequently, birdtracks notation has a special shorthand for
them, which is simply a [birdtracks symbol not reproduced].
As discussed in section 9.4.1 on p. 169, this notation, and the
notion of a gradient, have to be defined in a special way when the
coordinates are not Minkowski.
Cosmological observers Example 1
Time is relative, so what do people mean when they say that the
universe is 13.8 billion years old? If a hypothetical observer had
been around since shortly after the big bang, the time elapsed
on that observer's clock would depend on the observer's world-line.
Two such observers, who had had different world-lines, could
have differing clock readings.
Modern cosmologists aren't naive about time dilation. They have
in mind a cosmologically preferred world-line for their observer.
One way of constructing this world-line is as follows. Over time,
the temperature $T$ of the universe has decreased. (We define this
temperature locally, but we average over large enough regions so
that local variations don't matter.) The negative gradient of this
temperature, $-\nabla T$, is a covector that points in a preferred
direction in spacetime, and a preferred world-line for an observer is
one whose velocity vector $\mathbf{v}$ is always parallel to
$-\nabla T$, in the sense that it maximizes
$(-\nabla T)\rightarrow\mathbf{v}$, subject to the usual constraint
$\mathbf{v}^2 = 1$.
6.3.3 Phase and group velocity
Phase velocity
A wavefront is a line or surface of constant phase. In a snapshot
of a wave at one moment of time, the direction of propagation of
the wave is across the wavefronts. The visual situation is different
in a spacetime diagram. In 1 + 1 dimensions, figure c/1, suppose
that the lines represent the crests of the water waves. The surfer is
on top of a crest, riding along with it. His velocity vector
$\mathbf{u}$ is in the spacetime direction that lies on top of the
wavefront, not across it. Clearly both his motion and the propagation
of the wave are to the right, not to the left as we might imagine
based on experience with snapshots of waves.
d / Points on the graph satisfy the dispersion relation $C = 0$
for water waves. At a given point on the graph, the covector
$(\nabla C)\rightarrow$ tells us the group velocity.
In 2 + 1 dimensions, c/2, the surfer's velocity is visualized as an
arrow lying within a plane of constant phase. Given the wave's phase
information, there is more than one possible arrow of this kind. We
could try to resolve the ambiguity by requiring that the arrow's
projection into the $xy$ plane be perpendicular to the intersection of
the wavefronts with that plane, but (with the exception of the case
where the wave travels at $c$, example 4, p. 116) this prescription
gives results that change depending on our frame of reference, and
the changes are not describable by a Lorentz transformation of the
velocity vector. This shows that in the general case, the phase
information of the wave, encoded in the frequency covector
$\boldsymbol{\omega}$, does not describe the direction of the wave's
propagation through space. At most it tells us the wave's phase
velocity, $\omega/k$, which is not really a velocity. All of these are
symptoms of the fact that a velocity is supposed to be a vector, but
$\boldsymbol{\omega}$ is a covector. The phase velocity lacks physical
interest, because it is not the velocity at which any "stuff" moves.
Group velocity
There is a dispersion relation between a wave's frequency and
wavenumber. For example, surface waves in deep water obey the
constraint $C = 0$, where $C = \omega^4 - \alpha^2 k^2$ (figure d)
and $\alpha$ is a constant with units of acceleration, relating to
the acceleration of gravity. (Since the water is infinitely deep,
there is no other scale that could enter into the constraint.)
When a wave is modulated, it can transport energy and momentum
and transmit information, i.e., act as an agent of cause and effect
between events. How fast does it go? If a certain bump on the
envelope with which the wave is modulated visits spacetime events P
and Q, then whatever frequency and wavelength the wave has near the
bump are observed to be the same at P and Q. In general, $k$ and
$\omega$ are constant along the spacetime displacement of any point
on the envelope, so the spacetime displacement $\mathbf{r}$ from P to
Q must satisfy the condition $(\nabla C)\rightarrow\mathbf{r} = 0$.
The set of solutions to this equation is the world-line of the bump,
and the inverse slope of this world-line is called the group velocity.
In our example of water waves, the group velocity is
$\alpha/2\omega$, which is half the phase velocity.
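The factor of one half can be checked with a short calculation. This is only a sketch under two stated assumptions that are not spelled out in the text: the deep-water relation is written in the form $\omega^2 = \alpha k$ (for positive $\omega$ and $k$), and the group velocity is identified with $d\omega/dk$, as is standard.

```latex
\begin{align*}
\omega^2 &= \alpha k \\
2\omega\,d\omega &= \alpha\,dk \\
v_g = \frac{d\omega}{dk} &= \frac{\alpha}{2\omega}, \qquad
v_{ph} = \frac{\omega}{k} = \frac{\omega}{\omega^2/\alpha}
       = \frac{\alpha}{\omega} = 2 v_g .
\end{align*}
```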
6.4 Duality
6.4.1 Duality in 3+1 dimensions
In our original 0 + 1-dimensional example of the cuckoo clock
and the earth, we had duality: the measurements
$\mathbf{c}\rightarrow\mathbf{e} = 24$ and
$\mathbf{e}\rightarrow\mathbf{c} = 1/24$ really provided the same
information, and it didn't matter whether we made our scalar out of
covector $\mathbf{c}\rightarrow$ and vector $\rightarrow\mathbf{e}$ or
covector $\mathbf{e}\rightarrow$ and vector $\rightarrow\mathbf{c}$.
All these quantities were simply clock rates, which could be described
either by their frequencies (covectors) or their periods (vectors).
To generalize this to 3+1 dimensions, we need to use the metric,
a piece of machinery that we have never had to employ since
the beginning of the chapter. Given a vector $\mathbf{r}$, suppose we
knew how to produce its covector version $\mathbf{r}\rightarrow$. Then
we could hook up the plumbing to form
$\mathbf{r}\rightarrow\mathbf{r}$, which is just a number. What number
could it be? The only reasonable possibility is the squared magnitude
of $\mathbf{r}$, which we calculate using the metric as
$\mathbf{r}^2 = g(\mathbf{r}, \mathbf{r})$. Since we can think of
covectors as functions that take vectors to real numbers, clearly
$\mathbf{r}\rightarrow$ should be the function $f$ defined by
$f(\mathbf{x}) = g(\mathbf{r}, \mathbf{x})$.
Finding the dual of a given vector Example 2
Given the vector $\rightarrow\mathbf{v} = (3, 4)$ in
1 + 1-dimensional Minkowski coordinates, find the covector
$\mathbf{v}\rightarrow$, i.e., its dual.
Our goal is to write out an explicit expression for the covector in
component form,
$\mathbf{v}\rightarrow = (a, b)$ .
To define these components, we have to have some basis in
mind, consisting of one timelike observer-vector $\mathbf{o}$ and one
spacelike vector of simultaneity $\mathbf{s}$. Since we're doing this
in Minkowski coordinates (section 1.2, p. 20), let's notate these as
$\hat{\mathbf{t}}$ and $\hat{\mathbf{x}}$, where the hats indicate
that these are unit vectors in the sense $\hat{\mathbf{t}}^2 = 1$ and
$\hat{\mathbf{x}}^2 = -1$. Writing $\mathbf{v}\rightarrow$ in terms of
$a$ and $b$ means that we're identifying $\mathbf{v}\rightarrow$ with
the function $f$ defined by
$f(\mathbf{x}) = g(\mathbf{v}, \mathbf{x})$. Therefore
$f(\hat{\mathbf{t}}) = a$ and $f(\hat{\mathbf{x}}) = b$
$g(\mathbf{v}, \hat{\mathbf{t}}) = 3 = a$ and $g(\mathbf{v}, \hat{\mathbf{x}}) = -4 = b$ .
The result of the formidable, fancy-looking calculation in
example 2 was simply to take the vector
$(3, 4)$
and flip the sign of its spacelike component to give its dual, the
covector
$(3, -4)$ .
Looking back at why this happened, it was because we were using
Minkowski coordinates, and in Minkowski coordinates the form of
the metric is
$g(\mathbf{p}, \mathbf{q}) = (+1)p_t q_t + (-1)p_x q_x + \ldots$.
Therefore, we can always find duals in this way, provided that (1)
we're using Minkowski coordinates, and (2) the signature of the metric
is, as assumed throughout this book, $+---$, not $-+++$.
Going both ways Example 3
Assume Minkowski coordinates and signature $+---$. Given the
vector
$\rightarrow\mathbf{e} = (8, 7)$
and the covector
$\mathbf{f}\rightarrow = (1, 2)$ ,
find $\mathbf{e}\rightarrow$ and $\rightarrow\mathbf{f}$.
By the rule established above, we can find $\mathbf{e}\rightarrow$
simply by flipping the sign of the 7,
$\mathbf{e}\rightarrow = (8, -7)$ .
To find $\rightarrow\mathbf{f}$, we need to ask what vector $(a, b)$,
if we flipped the sign of $b$, would give us $(a, -b) = (1, 2)$.
Obviously this is
$\rightarrow\mathbf{f} = (1, -2)$ .
In other words, flipping the sign of the spacelike part of a vector
is also the recipe for changing covectors into vectors.
Example 3 shows that in Minkowski coordinates, the operation
of changing a covector to the corresponding vector is the same as
that of changing a vector to its covector. Thus, the dual of a dual is
the same thing you started with. In this respect, duality is similar
to arithmetic operations such as $x \rightarrow -x$ and
$x \rightarrow 1/x$. That is, the duality is a self-inverse operation:
it undoes itself, like getting two sex-change operations in a row, or
switching political parties twice in a country that has a two-party
system. Birdtracks notation makes this self-inverse property look
obvious, since duality means switching an inward arrow to an outward
one or vice versa, and clearly doing two such switches gives back the
original notation. This property was established in example 3 by
using Minkowski coordinates and assuming the signature to be $+---$,
but it holds without these assumptions (problem 1, p. 127).
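The sign-flipping recipe of examples 2 and 3 is easy to mechanize. The sketch below assumes Minkowski coordinates and signature $+---$, with the timelike component listed first; the function names `g`, `dual`, and `contract` are illustrative choices, not notation from the book.

```python
# Duals in Minkowski coordinates, signature +---, timelike component first.

def g(p, q):
    """Minkowski inner product of two vectors given as component tuples."""
    return p[0] * q[0] - sum(a * b for a, b in zip(p[1:], q[1:]))

def dual(w):
    """Flip the sign of every spacelike component (vector <-> covector)."""
    return (w[0],) + tuple(-x for x in w[1:])

def contract(covector, vector):
    """Join the pipes: apply a covector to a vector, giving a scalar."""
    return sum(a * b for a, b in zip(covector, vector))

v = (3, 4)
v_dual = dual(v)                   # example 2: the covector (3, -4)

e_dual = dual((8, 7))              # example 3: covector dual to the vector (8, 7)
f_vec = dual((1, 2))               # example 3: vector dual to the covector (1, 2)

# Contracting a vector with its own dual reproduces g(v, v).
squared_magnitude = contract(v_dual, v)

# Duality undoes itself.
round_trip = dual(dual(v))

print(v_dual, e_dual, f_vec, squared_magnitude, round_trip)
```

Writing `dual` as a single function for both directions mirrors the point of example 3: in these coordinates the same recipe converts vectors to covectors and back.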
In the general case where the coordinates may not be Minkowski,
the above analysis plays out as follows. Covectors and vectors are
represented by row and column vectors. The metric can be specified
by a matrix $g$ so that the inner product of column vectors $p$ and
$q$ is given by $p^T g q$, where $T$ represents the transpose.
Rerunning the same logic with these additional complications, we find
that the dual of a vector $q$ is $(gq)^T$, while the dual of a
covector $\omega$ is $(\omega g^{-1})^T$, where $g^{-1}$ is the
inverse of the matrix $g$.
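As an illustration of the matrix version, the following sketch lowers and raises an index with a non-diagonal metric in 1 + 1 dimensions. The particular matrix `g` is an arbitrary symmetric, invertible example chosen for this illustration (it is not from the text), and exact fractions are used so that the round trip can be compared without rounding error.

```python
from fractions import Fraction

def matvec(m, v):
    """Multiply a square matrix (tuple of rows) by a list of components."""
    return tuple(sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m)))

def inverse_2x2(m):
    """Inverse of a 2x2 matrix with nonzero determinant."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))

# Hypothetical metric in some non-Minkowski coordinates (symmetric, invertible).
g = ((Fraction(1), Fraction(1, 2)),
     (Fraction(1, 2), Fraction(-1)))
g_inv = inverse_2x2(g)

def lower(q):
    """Components of the covector dual to the vector q, i.e. of (g q)^T."""
    return matvec(g, q)

def raise_index(w):
    """Components of the vector dual to the covector w, i.e. of (w g^-1)^T.

    Because g is symmetric, so is g^-1, and w g^-1 has the same components
    as g^-1 applied to w written as a column.
    """
    return matvec(g_inv, w)

q = (Fraction(3), Fraction(4))
q_lowered = lower(q)
q_back = raise_index(q_lowered)

print(q_lowered, q_back)
```

Raising the lowered index returns the original components, which is the self-inverse property in coordinates where the simple sign-flip rule no longer applies.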
Velocity vector of a light wave, given its phase Example 4
We saw on p. 114 that in general, the information about the phase
of a wave encoded in $\boldsymbol{\omega}$ does not determine its
direction of propagation. The exception is a wave, such as a light
wave, that propagates at $c$. Let a world-line of propagation of the
wave lie along the vector $\mathbf{v}$. In the case of a wave
propagating at $c$, we have $\mathbf{v}^2 = 0$ (so that $\mathbf{v}$
can't have the usual normalization for a velocity vector), and the
dispersion relation is simply $\boldsymbol{\omega}^2 = 0$. Since the
phase stays constant along a world-line of propagation,
$\boldsymbol{\omega}\rightarrow\mathbf{v} = 0$.
We therefore find that $\mathbf{v}$ and $\boldsymbol{\omega}$ are two
nonzero, lightlike vectors that are orthogonal to each other. But as
shown in problem 10 on p. 35, this implies that the two vectors are
parallel. Thus if we're given the covector
$\boldsymbol{\omega}\rightarrow$, we just have to compute its dual
$\rightarrow\boldsymbol{\omega}$ to find the direction of propagation.
6.4.2 Change of basis
We saw in section 6.2.2 that in 0 + 1 dimensions, vectors and
covectors have opposite scaling properties under a change of units,
so that switching our base unit from hours to minutes caused our
time vectors to go up by a factor of 60, while our frequency covectors
went down by the same factor. This behavior was necessary in order
to keep scalar products the same. In more than one dimension, the
notion of changing units is replaced with that of a change of basis.
In linear algebra, row vectors and column vectors act like covectors
and vectors; they are dual to each other. Let $B$ be a matrix made
of column vectors, representing a basis for the column-vector space.
Then a change of basis for a row vector $r$ is expressed as
$r' = rB$, while the same change of basis for a column vector $c$ is
$c' = B^{-1}c$. We then find that the scalar product is unaffected by
the change of basis, since $r'c' = rBB^{-1}c = rc$.
In the important special case where $B$ is a Lorentz transformation,
this means that covectors transform under the inverse transformation,
which can be found by flipping the sign of $v$. This fact will be
important in the following section.
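The cancellation $rBB^{-1}c = rc$ can be verified with a small numerical sketch. The matrix `B` below is an arbitrary invertible example with determinant 1, chosen so that its inverse has integer entries; it is an assumption of this illustration and is not a Lorentz transformation.

```python
# Row vectors transform with B, column vectors with B^-1; the product is unchanged.

def row_times_matrix(r, m):
    """Row vector times matrix."""
    return tuple(sum(r[i] * m[i][j] for i in range(len(r))) for j in range(len(m[0])))

def matrix_times_col(m, c):
    """Matrix times column vector."""
    return tuple(sum(m[i][j] * c[j] for j in range(len(c))) for i in range(len(m)))

def scalar_product(r, c):
    """Row vector times column vector."""
    return sum(a * b for a, b in zip(r, c))

B = ((2, 1),
     (1, 1))          # hypothetical change-of-basis matrix, det = 1
B_inv = ((1, -1),
         (-1, 2))     # its inverse, written out by hand

r = (1, 2)            # a covector (row vector)
c = (3, 4)            # a vector (column vector)

r_new = row_times_matrix(r, B)
c_new = matrix_times_col(B_inv, c)

print(scalar_product(r, c), scalar_product(r_new, c_new))
```

The individual components change, and in opposite senses, but the scalar product is the same before and after.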
6.5 The Doppler shift and aberration
6.5.1 Doppler shift
As an example, we generalize our previous discussion of the
Doppler shift of light to 3 + 1 dimensions.
For clarity, let's first show how the 1 + 1-dimensional case works
in our new notation. For a wave traveling to the left, we have
$\boldsymbol{\omega} = (\omega, \omega)$ (not $(\omega, -\omega)$;
see figure c/1). We now want to transform into the frame of an
observer moving to the right with velocity $v$ relative to the
original frame. Because $\boldsymbol{\omega}$ is a covector, we do
this using the inverse Lorentz transformation. An ordinary Lorentz
transformation would take a lightlike vector $(\omega, \omega)$
to $(\omega/D, \omega/D)$ (see section 3.2). The inverse Lorentz
transformation gives $(D\omega, D\omega)$. The frequency has been
shifted upward by the factor $D$, as established previously.
In 3 + 1 dimensions, a spatial plane is determined by the light's
direction of propagation and the relative velocity of the source and
observer, so this case reduces without loss of generality to 2 + 1
dimensions. The frequency four-vector must be lightlike, so its most
general possible form is
$(\omega, \omega\cos\theta, \omega\sin\theta)$, where $\theta$ is
interpreted as the angle between the direction of propagation and the
relative velocity. In 2 + 1 dimensions, a Lorentz boost along the $x$
axis looks like this:
$t' = \gamma t - v\gamma x$
$x' = -v\gamma t + \gamma x$
$y' = y$
The inverse transformation is found by flipping the sign of $v$.
Putting our frequency vector through an inverse Lorentz boost, we find
$\omega' = \gamma\omega(1 + v\cos\theta)$ .
For $\theta = 0$ the Doppler factor reduces to $\gamma(1 + v) = D$,
recovering the 1 + 1-dimensional result. For $\theta = 90^\circ$, we
have $\omega' = \gamma\omega$, which is interpreted as a pure time
dilation effect when the source's motion is transverse to the line of
sight.
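The two limiting cases can be checked numerically. This is a minimal sketch in units where $c = 1$; the function names are illustrative, and the closed form $D = \sqrt{(1+v)/(1-v)}$ used for comparison is the standard longitudinal Doppler factor, which is algebraically equal to $\gamma(1+v)$.

```python
import math

def gamma(v):
    """Lorentz factor for speed v, in units where c = 1."""
    return 1.0 / math.sqrt(1.0 - v * v)

def doppler_factor(v, theta):
    """Ratio omega'/omega = gamma * (1 + v cos(theta)), theta in radians."""
    return gamma(v) * (1.0 + v * math.cos(theta))

def longitudinal_D(v):
    """Standard 1+1-dimensional Doppler factor sqrt((1 + v) / (1 - v))."""
    return math.sqrt((1.0 + v) / (1.0 - v))

v = 0.6
head_on = doppler_factor(v, 0.0)               # should agree with longitudinal_D(v)
transverse = doppler_factor(v, math.pi / 2.0)  # should agree with gamma(v)

print(head_on, longitudinal_D(v), transverse, gamma(v))
```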
To see the power of the mathematical tools we've developed in
this chapter, you may wish to look at sections 6 and 7 of Einstein's
1905 paper on special relativity, where a lengthy derivation is needed
in order to arrive at the same result.
6.5.2 Aberration
Imagine that rain is falling vertically while you drive in a
convertible with the top down. To you, the raindrops appear to be
moving at some nonzero angle relative to vertical. This is referred
to as aberration: a world-line's direction changes depending on one's
frame of reference. In the street's frame of reference, the angle
between the rain's three-velocity and the car's is
$\theta = 90^\circ$, but in the car's frame $\theta' \ne 90^\circ$.
In this example, aberration is a large effect because the car's speed
$v$ is comparable to the velocity $u$ of the raindrops. To a snail
crawling along the sidewalk at a much lower $v$, the effect would be
small. Using the small-angle approximation
$\tan\epsilon \approx \epsilon$, we find that for small $v$, the
difference $\Delta\theta = \theta' - \theta$ would be approximately
$v/u$, in units of radians.
Compared to a ray of light, we're all like snails. For example, the
earth's orbital speed is about $v \sim 10^{-4}$ in units where the
speed of light $u = 1$, so we expect a maximum effect of about
$10^{-4}$ radians, or $20''$ of arc, which is small but not
negligible for a telescope with a high-quality mount, being used at
high magnification.
This estimate of astronomical aberration of light is roughly right,
but we don't expect it to be exact, both because of the small-angle
approximation and because we calculated it using a Galilean picture
of spacetime. Let's calculate the exact result. As shown in example
4 on p. 116, the direction of propagation of a light wave lies along
the vector that is the dual to its frequency covector. Let's call this
direction of propagation $\mathbf{u}$. Reusing the expression for
$\boldsymbol{\omega}$ defined in section 6.5.1, and arbitrarily fixing
$\mathbf{u}$'s timelike component to be 1, we have
$\mathbf{u} = (1, -\cos\theta, -\sin\theta)$ .
When this vector undergoes a boost $v$ along the $x$ axis it becomes
$\mathbf{u}' = (\gamma(1 + v\cos\theta), \gamma(-v - \cos\theta), -\sin\theta)$ .
The original angle $\theta = \tan^{-1}(u_y/u_x)$ has been transformed
to $\theta' = \tan^{-1}(u'_y/u'_x)$, the result being
$\tan\theta' = \frac{\sin\theta}{\gamma(\cos\theta + v)}$ .
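The exact aberration formula, $\tan\theta' = \sin\theta / [\gamma(\cos\theta + v)]$, can be evaluated directly. The sketch below works in units where $c = 1$ and makes one choice that is an assumption of this illustration: it uses the two-argument arctangent so that $\theta'$ lands in the correct quadrant when $\cos\theta + v$ is negative. The function name is hypothetical.

```python
import math

def aberrated_angle(theta, v):
    """Angle theta' (radians) seen by an observer boosted by v along x.

    Implements tan(theta') = sin(theta) / (gamma * (cos(theta) + v)),
    with atan2 selecting the quadrant.
    """
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return math.atan2(math.sin(theta), gamma * (math.cos(theta) + v))

# Slow observer: the shift away from 90 degrees should be close to v radians,
# in agreement with the small-angle estimate v/u with u = 1.
v_slow = 1e-4
shift = math.radians(90.0) - aberrated_angle(math.radians(90.0), v_slow)

# Fast observer: a ray arriving from well behind is swung around to the front.
theta_fast = math.degrees(aberrated_angle(math.radians(162.0), 0.99))

print(shift, theta_fast)
```

For the slow observer the exact result agrees with the Galilean estimate to many digits, while at $v = 0.99$ an angle of 162 degrees is carried to a forward angle of less than 50 degrees.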
A test of special relativity Example 5
An assumption underlying this treatment of aberration was that
the speed of light was $u = c$, regardless of the velocity of the
source. Not all prerelativistic theories had this property, and one
would expect that in such a theory, aberration would not be in
accord with the relativistic result. In particular, suppose that we
believed in Galilean spacetime, so that when a distant galaxy,
receding from us at some speed $w$, emitted a ray of light toward
us, the light's velocity in our frame was $u = c - w$. That is, we
imagine a theory in which emitting a ray of light is like shooting
a bullet from a gun. Since aberration effects go approximately
like $v/u$, we would expect that the reduced $u$ would lead to more
aberration compared to the prediction of relativity.
To test theories of this type, Heckmann (Annales d'Astrophysique 23
(1960) 410) used a 24-inch reflector at Hamburg to take
high-magnification photographic plates of a star field in Ursa Major
containing 11 stars inside the Milky Way and 5 distant galaxies.
Measurements of Doppler shifts showed that the galaxies were receding
from us at velocities of about $w = 0.05c$, whereas stars within the
Milky Way move relative to us at speeds that are negligible in
comparison. If, contrary to the relativistic prediction, this led to a
5% decrease in $u$, then we would expect about a 5% increase in
aberration for the galaxies compared to the stars.
Over the course of a year, the earth's orbit carries it toward and
away from Ursa Major, so that in the earth's frame of reference,
the stars and galaxies have varying velocities relative to us, and
the $20''$ aberration effect oscillates in direction. If the effect
was different for the galaxies and the stars, then they ought to shift
their apparent positions relative to one another. The shift ought
to be on the order of 5% of $20''$, or one second of arc. The results
from the observations showed that these relative positions did not
appear to vary at all over the course of a year, with the average
relative shift being $0.00 \pm 0.06''$ of arc. This difference in
aberration is consistent with zero, as predicted by special relativity.
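The size of the effect being tested follows from the small-angle estimate $v/u$. The sketch below reproduces that arithmetic in units where $c = 1$, using only numbers quoted in the example ($v \sim 10^{-4}$, $w = 0.05$, and the 0.06-arcsecond uncertainty); the variable and function names are illustrative.

```python
import math

ARCSEC_PER_RADIAN = math.degrees(1.0) * 3600.0

def small_angle_aberration_arcsec(v, u):
    """Small-angle estimate v/u of the aberration, converted to arcseconds."""
    return (v / u) * ARCSEC_PER_RADIAN

v_earth = 1e-4      # earth's orbital speed
w_galaxy = 0.05     # recession speed of the galaxies

# Relativity: light from stars and galaxies alike travels at u = 1.
relativistic = small_angle_aberration_arcsec(v_earth, u=1.0)

# Ballistic ("bullet from a gun") theory: light from the galaxies has u = 1 - w.
ballistic = small_angle_aberration_arcsec(v_earth, u=1.0 - w_galaxy)

# Predicted shift of the galaxies relative to the stars in the ballistic theory.
relative_shift = ballistic - relativistic

print(relativistic, ballistic, relative_shift)
```

The ballistic prediction for the relative shift is about one second of arc, far larger than the quoted uncertainty of 0.06 seconds of arc, which is why a null result discriminates between the two pictures.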
e / 1. The cube's rest frame. 2. The observer's frame. 3. The observer's view of the cube, severely
distorted by aberration.
The view of an ultrarelativistic observer Example 6
Figure e shows a visualization for an observer flying through a
cube at $v = 0.99$. In e/1, the cube is shown in its own rest frame,
where it has sides of unit length, and the observer, having already
passed through, lies one unit to the right of the cube's center. The
observer is facing to the right, away from the cube. The dashed
line is a ray of light that travels from point P to the observer, and
in this frame it appears as though the ray, arriving from
$\theta = 162^\circ$, would not make it into the observer's eye.
But in the observer's frame, e/2, the ray is at $\theta' = 47^\circ$,
so it actually does fall within her field of view. The cube is
length-contracted by a factor $\gamma \approx 7$. The ray was emitted
earlier, when the cube was out in front of the observer, at the
position shown by the dashed outline.
The image seen by the observer is shown in e/3. The circular
outline defining the field of view represents $\theta' = 50^\circ$.
Note that the relativistic length contraction is not at all what an
observer sees optically. The optical observation is influenced by
length contraction, but also by aberration and by the time it takes
for light to propagate to the observer. The time of propagation is
different for different parts of the cube, so in the observer's frame,
e/2, rays from different points had to be emitted when the cube
was at different points in its motion, if those rays were to reach
the eye.
A group at Australian National University has produced animations
of similar scenes, which can be found online by searching
for "optical effects of special relativity."
It's fun to imagine the view of an observer aboard an
ultrarelativistic starship. For $v$ sufficiently close to 1, any angle
$\theta < 180^\circ$ transforms to a small $\theta'$. Thus, all light
coming to this observer from the surrounding stars (even those in
extreme backward directions!) is gathered into a small, bright patch
of light that appears to come from straight ahead. Some visible light
would be shifted into the extreme ultraviolet and infrared, while some
infrared and ultraviolet light would become visible.
6.6 Some related mathematical tools
This chapter has centered on the physics of waves, but along the way
we've found it helpful to build up some mathematical ideas such
as covectors, which have applications in a much broader physical
context. In this section we'll develop some related notation and
geometrical applications.
6.6.1 Abstract index notation
Expressions in birdtracks notation such as
$\nabla C \rightarrow \mathbf{s}$
can be awkward to type on a computer, which is why we've already
been occasionally resorting to more linear notations such as
$(\nabla C)\rightarrow\mathbf{s}$. For more complicated birdtracks,
the diagrams sometimes look like complicated electrical schematics,
and the problem of generating them on a keyboard gets more acute.
There is in fact a systematic way of representing any such expression
using only ordinary subscripts and superscripts. This is called
abstract index notation, and was introduced by Roger Penrose at around
the same time he invented birdtracks. For practical reasons, it was
the abstract index notation that caught on.
The idea is as follows. Suppose we wanted to describe a complicated
birdtrack verbally, so that someone else could draw it. The
diagram would be made up of various smaller parts, a typical one
looking something like the scalar product
$\mathbf{u}\rightarrow\mathbf{v}$. The verbal instructions might be:
"We have an object u with an arrow coming out of it. For reference,
let's label this arrow as a. Now remember that other object v I had
you draw before? There was an arrow coming into that one, which we
also labeled a. Now connect up the two arrows labeled a."
Shortening this lengthy description to its bare minimum, Penrose
renders it like this: $u_a v^a$. Subscripts depict arrows coming out
of a symbol (think of water flowing from a tank out through a pipe
below). Superscripts indicate arrows going in. When the same letter
is used as both a superscript and a subscript, the two arrows are to
be piped together.
Abstract index notation evolved out of an earlier one called
the Einstein summation convention, in which superscripts and
subscripts referred to specific coordinates. For example, we might
take 0 to be the time coordinate, 1 to be $x$, and so on. A symbol
like $u_\mu$ would then indicate a component of the dual vector
$\mathbf{u}$, which could be its $x$ component if $\mu$ took on the
value 1. Repeated indices were summed over.
The advantage of the birdtrack and abstract index notations is
that they are coordinate-independent, so that an equation written
in them is valid regardless of the choice of coordinates. The Einstein
and abstract-index notations look very similar, so for example if we
want to take a general result expressed in abstract-index notation
and apply it in a specific coordinate system, there is essentially no
translation required. In fact, the two notations look so similar that
we need an explicit way to tell which is which, so that we can tell
whether or not a particular result is coordinate-independent. We
therefore use the convention that Latin indices represent abstract
indices, whereas Greek ones imply a specific coordinate system and
can take on numerical values, e.g., $\mu = 1$.
The following are some examples of equivalent equations written
side by side in birdtracks and abstract index notations.
Observer o's displacement in spacetime is a vector:
$\rightarrow\mathbf{o}$        $o^a$
In Einstein notation, it's awkward to express a vector as a whole,
because in a notation like $o^\mu$, $\mu$ is supposed to take on a
particular value. If we used $o^\mu$ to mean the whole vector, it
would be an abuse of notation. In abstract index notation, however,
the $a$ is simply a name we gave to a pipe coming into vector
$\mathbf{o}$; the fact that we didn't need to refer to the name in
order to connect it to some other pipe is irrelevant.
A wave's frequency is a covector:
$\boldsymbol{\omega}\rightarrow$        $\omega_a$
An observer experiences proper time $\tau$:
$\mathbf{o}\rightarrow\mathbf{o} = \tau^2$        $o_a o^a = \tau^2$
There are no external arrows in the birdtracks version, and in the
abstract-index version all lower indices (pipes coming out) have been
paired with upper indices (pipes coming in); this indicates that the
proper time is a scalar, and therefore independent of any choice of
coordinate system. In Einstein notation, this becomes
$o_\mu o^\mu = \tau^2$, with an implied sum over the repeated index,
$\sum_\mu o_\mu o^\mu = \tau^2$. The $\mu$ refers to a particular
coordinate system, so in the Einstein notation it is no longer obvious
that the equation holds regardless of our choice of coordinates.
A world-line along which a wave propagates lies along a vector
that is orthogonal to the wave's frequency covector:
$\boldsymbol{\omega}\rightarrow\mathbf{u} = 0$        $\omega_a u^a = 0$
The frequency covector is the gradient of the phase:
$\boldsymbol{\omega}\rightarrow = (\nabla\phi)\rightarrow$        $\omega_a = \partial_a \phi$
The following grammatical rules apply to both abstract-index
and Einstein notation:
1. Repeated indices occur in pairs, with one up and one down
and the two factors multiplying each other.
2. Disregarding indices that are paired as in rule 1, all other
indices must appear uniformly in all terms and on both sides
of an equation. "Appear uniformly" means that an index can't
be missing and can't be a superscript in some places but a
subscript in others.
3. For reasons to be explained in section 7.4, p. 134, a partial
derivative with respect to a coordinate, such as
$\partial/\partial x^k$, is treated as if the index were a subscript,
and conversely $\partial/\partial x_k$ is considered to have a
superscripted $k$.
In abstract-index notation, rule 1 follows because the indices are
simply labels describing how, in birdtracks notation, the pipes should
be hooked up. Violating rule 1, as in an expression like $v^a v^a$,
produces a quantity that does not actually behave as a scalar. An
example of a violation of rule 2 is $v^a = v_a$. This doesn't make
sense, for the same reason that it doesn't make sense to equate a
row vector to a column vector in linear algebra. Even if an equation
like this did hold in one frame of reference, it would fail in another,
since the left-hand and right-hand sides transform differently under
a boost.
In section 6.4.1 we discussed the notion of finding the covector
that was dual to a given vector, and the vector dual to a given
covector. Because the distinction between vectors and covectors
is represented in index notation by placing the index on the top
or on the bottom, relativists refer to this kind of thing as raising
and lowering indices. In general, this type of manipulation is called
"index gymnastics." Here's what raising and lowering indices looks
like.
Converting a vector to its covector form:
$u_a = g_{ab} u^b$
Changing a covector to the corresponding vector:
$u^a = g^{ab} u_b$
The symbol $g^{ab}$ refers to the inverse of the matrix $g_{ab}$.
f / Using parallelism to define
6.6.2 Volume
Desirable properties
In 3 + 1 dimensions, we have a natural way of defining
four-dimensional volume, which is to pick a frame of reference and let
the element of volume be $dt\,dx\,dy\,dz$ in the Minkowski coordinates
of that frame. Although this definition of 4-volume is stated in terms
of certain coordinates, it turns out to be Lorentz-invariant (section
2.5, p. 45). It also has the following desirable properties:
V1. Any two m-volumes can be compared in terms of their ratio.
V2. For any m nonzero vectors, the m-volume of the parallelepiped
they span is nonzero if and only if the vectors are linearly
independent (that is, if none of them can be expressed in terms of
the others using scalar multiplication and vector addition).
We would also like to have convenient methods for working with
three-volume, two-volume (area), and one-volume (length). But the
m-volumes for m < 4 give us headaches if we try to define them so
that they obey both V1 and V2. For example, the obvious way to
define length (m = 1) is to use the metric, but then lightlike vectors
would violate V2.
Affine measure
If we're willing to abandon V1, then the following approach succeeds.
Consider the m = 1 case. We ignore the metric completely
and exploit the fact that in special relativity, spacetime is flat
(postulate P2, p. 41), so that parallelism works the same way as in
Euclidean geometry. Let $\ell$ be a line, and suppose we want to
define a number system on this line that measures how far apart events
are. Depending on the type of line, this could be a measurement of
time, of spatial distance, or a mixture of the two. First we
arbitrarily single out two distinct points on $\ell$ and label them 0
and 1, as in figure f. Next, pick some auxiliary point $q_0$ not lying
on $\ell$. Construct $q_0q_1$ parallel to 01 and $1q_1$ parallel to
$0q_0$, forming the parallelogram shown in the figure. Continuing in
this way, we have a scaffolding of parallelograms adjacent to the
line, determining an infinite lattice of points 1, 2, 3, . . . on the
line, which represent the positive integers.
Fractions can be defined in a similar way. For example, $\frac{1}{2}$
is defined as the point such that when the initial lattice segment
$0\,\frac{1}{2}$ is extended by the same construction, the next point
on the lattice is 1.
The continuously varying variable constructed in this way is called
an affine parameter. The time measured by a free-falling clock is
an example of an affine parameter, as is the distance measured by
the tick marks on a free-falling ruler. An affine parameter can only
be defined along a straight world-line, not an arbitrary curve. The
affine measurement of 1-volume violates V1, because it only allows
us to compare distances that lie on $\ell$ or parallel to it. On the
other hand, it has the advantage over metric measurement that it
allows us to measure lengths along lightlike lines.
Figure g shows how to define an affine measure of 2-volume, and
a similar method works for 3-volume.
g / The area of the viola can be determined by counting the
parallelograms formed by the lattice. The area can be determined to
any desired precision, by dividing the parallelograms into fractional
parts that are as small as necessary.
Suppose that a parallelogram is formed with vectors a and b as
two of its sides. It we double a, then the area doubles as well,
area(2a, b) = 2 area(a, b) .
In general, if we scale either of the vectors by a factor c, the area
scales by the same factor, provided that we set some rule for han-
dling signs an issue that well ignore for the time being. Some-
thing similar happens when we add two vectors, e.g.,
area(a, b +c) = area(a, b) + area(a, c) ,
again passing over issues with signs. We refer to these properties as
linearity of the ane 2-volume. Any sensible measure of m-volume
should have similar linearity properties.
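As a quick numerical illustration of these linearity properties, here is a minimal sketch. It assumes one particular realization of the affine 2-volume, namely the signed area given by a 2x2 determinant, with the unit square of the chosen coordinates as the standard parallelogram. The function names are hypothetical and are not taken from the text.

```python
def area(a, b):
    """Signed area of the parallelogram spanned by the 2-vectors a and b
    (assumption: measured in units of the coordinate unit square)."""
    return a[0] * b[1] - a[1] * b[0]


def add(u, v):
    """Componentwise sum of two 2-vectors."""
    return (u[0] + v[0], u[1] + v[1])


def scale(k, u):
    """Multiply a 2-vector by the number k."""
    return (k * u[0], k * u[1])


a, b, c = (2.0, 1.0), (0.5, 3.0), (-1.0, 4.0)

# Doubling one side doubles the area.
print(area(scale(2, a), b), 2 * area(a, b))

# Adding vectors in one slot adds the areas.
print(area(a, add(b, c)), area(a, b) + area(a, c))
```

The signed determinant is one way of fixing the rule for handling signs that the discussion above passes over.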
The 3-volume covector
In the special case of m = 3, linearity leads to an especially
simple characterization of the volume. Let a 3-volume be defined
by the parallelepiped spanned by vectors $\mathbf{a}$, $\mathbf{b}$,
and $\mathbf{c}$. If we threw in a fourth vector $\mathbf{d}$, we
would have a 4-volume, and 4-volume is a scalar. This 4-volume would
depend in a linear way on all four vectors, and in particular it
would depend linearly on $\mathbf{d}$. But this means we have a
scalar function that depends linearly on a vector, and such a
function is exactly what we mean by a covector. We can therefore
define a volume covector S according to
$$S_a d^a = \text{4-volume}(\mathbf{a}, \mathbf{b}, \mathbf{c}, \mathbf{d}) .$$
The volume covector collects the information about the volume of
the 3-parallelepiped, encapsulating it in a convenient form with
known transformation properties. In particular, the statement and
proof of Gauss's theorem in 3 + 1 dimensions are greatly simplified
by the use of this tool (p. 164). The 3-volume covector, unlike the
affine 3-volume, is defined in an absolute sense rather than
in relation to some parallelepiped arbitrarily chosen as a standard.
Both the covector and the affine volume fail to satisfy the
ratio-comparison property V1 on p. 124, since we can't compare
volumes unless they lie in parallel 3-planes.
h / Interpretation of the 3-volume
We've been visualizing covectors in n dimensions as stacks of
(n - 1)-dimensional planes (figure a/2, p. 112; figure c/2, p. 113).
The 3-volume covector should therefore be visualized as a stack of
3-planes in a four-dimensional space. Since most of us can't
visualize things very well in four dimensions, figure h omits one of
the dimensions, so that the 3-surfaces appear as two-dimensional
planes. The small hand h/1 has a certain 3-volume, and the covector
that measures it is represented by the stack of 3-planes parallel to
it, h/2. The bigger hand h/3 has twice the 3-volume, and its covector
is represented by a stack of planes with half the spacing.
Problems
1 In section 6.4.1, I proved that duality is a self-inverse
operation, invoking Minkowski coordinates and assuming the signature
to be + - - -. Show that these assumptions were not necessary.
Chapter 7
Coordinates
In your previous study of physics, you've seen many examples where
one coordinate system makes life easier than another. For a block
being pushed up an inclined plane, the most convenient choice may
be to tilt the x and y axes. To find the moment of inertia of a
disk we use cylindrical coordinates. The same is true in relativity.
Minkowski coordinates are not always the most convenient. In
chapter 6 we learned to classify physical quantities as covectors,
scalars, and vectors, and we learned rules for how these three types
of quantities transformed in two special changes of coordinates:
1. When we rescale all coordinates by a factor $\alpha$, vectors,
scalars, and covectors scale by $\alpha^p$, where p = +1, 0, and -1,
respectively.
2. Under a boost, the three cases require respectively the Lorentz
transformation, no transformation, and the inverse Lorentz
transformation.
In this chapter we'll learn how to generalize this to any change of
coordinates, and also how to find the form of the metric expressed
in non-Minkowski coordinates.
7.1 An example: accelerated coordinates
Let's start with a concrete example that has some physical interest.
In section 5.2, p. 104, we saw that we could have "gravity without
gravity:" an experiment carried out in a uniform gravitational field
can be interpreted as an experiment in flat spacetime (so that
special relativity applies), but with the measurements expressed in
the accelerated frame of the earth's surface. In the Pound-Rebka
experiment, all of the results could have been expressed in an
inertial (free-falling) frame of reference, using Minkowski
coordinates, but this would have been extremely inconvenient,
because, for example, they didn't want to drop their expensive
atomic clocks and take the readings before the clocks hit the floor
and were destroyed.
Since this is "gravity without gravity," we don't actually need
a planet cluttering up the picture. Imagine a universe consisting
of limitless, empty, flat spacetime. Describe it initially using
Minkowski coordinates (t, x, y, z). Now suppose we want to find a
new set of coordinates (T, X, Y, Z) that correspond to the frame of
reference of an observer aboard a spaceship accelerating in the x
direction with a constant acceleration.
We do require the change of coordinates to be smooth in the sense
defined on p. 111, i.e., it should be a diffeomorphism.
a / The transformation between Minkowski coordinates (t, x)
and the accelerated coordinates (T, X).
The Galilean answer would be that we simply take
$X = x - \frac{1}{2}at^2$, where a is the ship's acceleration.
But this is unsatisfactory from a relativistic point of view for
several reasons. At t = c/a the observer would be moving at the
speed of light, but relativity doesn't allow frames of reference
moving at c (section 3.4, p. 55). At t > c/a, the observer's motion
would be faster than c, but this is impossible in 3+1 dimensions
(section 3.8, p. 64).
These problems are related to the fact that the observer's proper
acceleration, i.e., the reading on an accelerometer aboard the ship,
isn't constant if the ship's position is given by
$x = \frac{1}{2}at^2$. We saw in example 4 on p. 57 that constant
proper acceleration is described by
$x = \frac{1}{a}\cosh a\tau$, $t = \frac{1}{a}\sinh a\tau$, where
$\tau$ is the proper time. For this type of motion, the velocity
only approaches c asymptotically. This suggests the following for
the relationship between the two sets of coordinates:
t = X sinh T
x = X cosh T
y = Y
z = Z
For example, if the ship follows a world-line
$(T, X) = (\tau, 1)$, then its motion in the unaccelerated frame is
$(t, x) = (\sinh\tau, \cosh\tau)$, which is of the desired form with
a = 1.
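The behavior of such a world-line can be checked numerically. The following is a minimal sketch in units with c = 1. The function names are hypothetical, and the expression dx/dt = tanh T is obtained by differentiating the transformation at fixed X.

```python
import math


def rindler_to_minkowski(T, X):
    """Map accelerated coordinates (T, X) to Minkowski (t, x), with c = 1."""
    return X * math.sinh(T), X * math.cosh(T)


def coordinate_velocity(T):
    """dx/dt along a world-line of constant X.

    Differentiating t = X sinh T and x = X cosh T at fixed X gives
    dx/dt = tanh T, which does not depend on X.
    """
    return math.tanh(T)


for T in (0.0, 1.0, 3.0, 6.0):
    t, x = rindler_to_minkowski(T, 1.0)
    print(f"T={T}: t={t:.4f}, x={x:.4f}, "
          f"x^2-t^2={x * x - t * t:.6f}, v={coordinate_velocity(T):.6f}")
```

The combination x^2 - t^2 stays equal to X^2, so the world-line is a hyperbola, and the velocity approaches 1 without reaching it.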
The (T, X, Y, Z) coordinates, called Rindler coordinates, have
many, but not all, of the properties we would like for an accelerated
frame. Ideally, we'd like to have all of the following: (1) the proper
acceleration is constant for any world-line of constant (X, Y, Z); (2)
the proper acceleration is the same for all such world-lines, i.e., the
fictitious gravitational field is uniform; and (3) the description of
the accelerated frame is just a change of coordinates, i.e., we're just
talking about the flat spacetime of special relativity, with events
renamed. It turns out that we can pick two out of three of these,
but it's not possible to satisfy all three at the same time. Rindler
coordinates satisfy conditions 1 and 3, but not 2. This is because the
proper acceleration of a world-line of constant (X, Y, Z) can easily
be shown to be 1/X, which depends on X. Thus we don't speak of
Rindler coordinates as "the" coordinates of an accelerated observer.
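The claim that the proper acceleration is 1/X can be checked numerically. The sketch below is only an illustration and makes two assumptions that are labeled in the code: the proper time along a world-line of constant X is taken to be tau = XT (this follows from the metric found in section 7.3), and the second derivative with respect to tau is estimated by a central difference with a hypothetical step size h.

```python
import math


def position(tau, X):
    """Minkowski (t, x) on the world-line of constant X.

    Assumption: proper time along this world-line is tau = X * T.
    """
    T = tau / X
    return X * math.sinh(T), X * math.cosh(T)


def proper_acceleration(tau, X, h=1e-4):
    """Magnitude of d^2(t, x)/dtau^2, via a central second difference."""
    t_m, x_m = position(tau - h, X)
    t_0, x_0 = position(tau, X)
    t_p, x_p = position(tau + h, X)
    a_t = (t_p - 2 * t_0 + t_m) / h**2
    a_x = (x_p - 2 * x_0 + x_m) / h**2
    # The acceleration vector is spacelike, so a_t^2 - a_x^2 is negative;
    # we report the positive magnitude.
    return math.sqrt(a_x**2 - a_t**2)


for X in (0.5, 1.0, 2.0):
    print(X, proper_acceleration(0.3, X), 1 / X)
```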
7.2 Transformation of vectors
Now suppose we want to transform a vector whose components
are expressed in the (T, X) coordinates into components expressed
in (t, x). Our most basic example of a vector is a displacement
$(\Delta T, \Delta X)$, and if we make this an infinitesimal
(dT, dX) then we don't need to worry about the fact that the chart
in figure a has curves on it: close up, curves look like straight
lines.
If we think
of the coordinate t as a function of two variables, t = t(T, X), then
t is changing for two different reasons: its first input T changes,
and also its second input X. If t were only a function of one
variable t(T), then the change in t would be given simply by the
chain rule, $dt = (dt/dT)\,dT$. Since it actually has two such
reasons to change, we add the two changes:
$$dt = \frac{\partial t}{\partial T}\,dT + \frac{\partial t}{\partial X}\,dX$$
The derivatives are partial derivatives, and these derivatives exist
because, as we will always assume, the change of coordinates is
smooth. An exactly analogous expression applies for dx.
$$dx = \frac{\partial x}{\partial T}\,dT + \frac{\partial x}{\partial X}\,dX$$
Before we carry out the details of this calculation, let's stop
and note that the results so far are completely general. Since we
have so far made no use of the actual equations for this particular
change of coordinates, these expressions would apply to any such
transformation, including the special cases we've encountered so far,
such as Lorentz transformations and scaling. (For example, if we'd
been scaling by a factor $\alpha$, then the partial derivative of
each coordinate with respect to its own counterpart would simply
have equaled $\alpha$, and the mixed partial derivatives would have
been zero.) Furthermore, our definition of a vector is
that a vector is anything that transforms like a vector. Since we've
established that the rules above apply to a displacement vector, we
conclude that they would also apply to any other vector, say an
energy-momentum vector.
Returning to this specific example, application of the facts
$d\sinh u/du = \cosh u$ and $d\cosh u/du = \sinh u$ tells us that
the vector (dT, dX) is transformed to:
(dt, dx) = (X cosh T dT + sinh T dX , X sinh T dT + cosh T dX)
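One way to build confidence in this result is to compare it with the actual change in (t, x) produced by a small but finite step in (T, X). The following is a minimal sketch of that check. The function names, the sample point, and the step sizes are arbitrary choices made for illustration.

```python
import math


def to_minkowski(T, X):
    """The change of coordinates t = X sinh T, x = X cosh T."""
    return X * math.sinh(T), X * math.cosh(T)


def transform_vector(T, X, vT, vX):
    """Apply the linear rule for (dt, dx) to components (vT, vX) at (T, X)."""
    vt = X * math.cosh(T) * vT + math.sinh(T) * vX
    vx = X * math.sinh(T) * vT + math.cosh(T) * vX
    return vt, vx


def finite_difference(T, X, dT, dX):
    """Actual change in (t, x) caused by a small finite step (dT, dX)."""
    t0, x0 = to_minkowski(T, X)
    t1, x1 = to_minkowski(T + dT, X + dX)
    return t1 - t0, x1 - x0


T, X, dT, dX = 0.7, 1.3, 1e-6, 2e-6
print(transform_vector(T, X, dT, dX))
print(finite_difference(T, X, dT, dX))
```

The two printed pairs agree up to terms of second order in the step, which is the sense in which curves look like straight lines close up.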
As an example of how this applies universally to any type of
vector, suppose that the observer aboard a spaceship with world-line
$(T, X) = (\tau, 1)$ has a favorite paperweight with mass m.
According to measurements carried out aboard her ship, its
energy-momentum vector is
$$(p^T, p^X) = (m, 0) .$$
In the unaccelerated coordinates, this becomes
$$(p^t, p^x) = (X\cosh T\,p^T + \sinh T\,p^X ,\ X\sinh T\,p^T + \cosh T\,p^X)$$
$$= (mX\cosh T,\ mX\sinh T)$$
$$= (m\cosh\tau,\ m\sinh\tau) .$$
Since the functions cosh x and sinh x behave like $e^x$ for large x,
we find that after the astronaut has spent a reasonable amount of
proper time accelerating, the paperweight's mass-energy and momentum
will have grown to the point where it's an awesome weapon of mass
destruction, capable of obliterating an entire galaxy.
Here we make use of the fact that the change of coordinate was
smooth, i.e., a diffeomorphism. Otherwise the curves could have
kinks in them that would still look like kinks under any
magnification.
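To get a feeling for the numbers, here is a small sketch that evaluates the paperweight's energy and momentum in the unaccelerated frame for a few values of the proper time. It assumes units with c = 1 and a = 1, so that tau is measured in units of c/a, and the function name is hypothetical. It also checks that the invariant combination E^2 - p^2 stays equal to m^2, as it must for a quantity that is merely being re-expressed in different coordinates.

```python
import math


def paperweight_momentum(m, tau):
    """(p^t, p^x) in Minkowski coordinates for the ship at (T, X) = (tau, 1).

    Assumes units with c = 1 and proper acceleration a = 1.
    """
    return m * math.cosh(tau), m * math.sinh(tau)


m = 1.0
for tau in (0.0, 1.0, 5.0, 10.0):
    E, p = paperweight_momentum(m, tau)
    print(f"tau={tau}: E={E:.4e}, p={p:.4e}, E^2-p^2={E * E - p * p:.6f}")
```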
7.3 Transformation of the metric
Continuing with the example of accelerated coordinates, let's find
what happens to the metric when we change from Minkowski
coordinates. Minkowski coordinates are essentially defined so that
the metric has the familiar form with coefficients +1 and -1. In
relativity, one often presents the metric by showing its result when
applied to an infinitesimal displacement (dt, dx):
$$ds^2 = dt^2 - dx^2$$
Here ds would represent proper time, in the case where the
displacement was timelike. Since we've already determined that
dt = X cosh T dT + sinh T dX and
dx = X sinh T dT + cosh T dX ,
we can simply substitute into the expression for $ds^2$ in order to
find the form of the metric in (T, X) coordinates. Employing the
identity $\cosh^2 - \sinh^2 = 1$, we find
$$ds^2 = X^2\,dT^2 - dX^2$$
The varying value of the $dT^2$ coefficient is in fact exactly the
kind of gravitational time dilation effect whose existence we
predicted in section 5.2.5, p. 106 based on the equivalence
principle. The form of the metric inferred there was
$$ds^2 \approx (1 + 2\Delta\Phi)\,dT^2 - dX^2$$
where $\Delta\Phi$ is the difference in gravitational potential
relative to some reference height. One of the approximations
employed was the assumption that the range of heights $\Delta X$ was
small, but subject to that approximation, the two results should
agree. For convenience, let's consider observers in the region
$X \approx 1$, where the acceleration is approximately 1. Then
$\Delta\Phi = \Phi(1 + \Delta X) - \Phi(1) \approx
(\text{acceleration})(\text{height}) \approx \Delta X$, so the time
coefficient in the second form of the metric is
$\approx 1 + 2\Delta\Phi \approx 1 + 2\Delta X$. But to within the
desired level of approximation, this is the same as
$X^2 = (1 + \Delta X)^2 \approx 1 + 2\Delta X$.
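The substitution can also be verified numerically. The sketch below evaluates the interval both ways for an arbitrary small displacement, and then compares the exact time coefficient X^2 with the approximation 1 + 2 Delta X near X = 1. The function names and sample values are hypothetical choices for illustration.

```python
import math


def interval_minkowski(T, X, dT, dX):
    """ds^2 = dt^2 - dx^2, with (dt, dx) obtained from (dT, dX) at (T, X)."""
    dt = X * math.cosh(T) * dT + math.sinh(T) * dX
    dx = X * math.sinh(T) * dT + math.cosh(T) * dX
    return dt**2 - dx**2


def interval_accelerated(X, dT, dX):
    """ds^2 = X^2 dT^2 - dX^2, the metric in (T, X) coordinates."""
    return X**2 * dT**2 - dX**2


T, X, dT, dX = 0.4, 1.7, 0.01, 0.003
print(interval_minkowski(T, X, dT, dX), interval_accelerated(X, dT, dX))

# Near X = 1 the exact coefficient X^2 differs from 1 + 2*delta
# only by delta^2.
delta = 0.01
print((1 + delta)**2, 1 + 2 * delta)
```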
The procedure employed above works in general. To transform
the metric from coordinates (t, x, y, z) to new coordinates
(t', x', y', z'), we obtain the unprimed coordinates in terms of the
primed ones, take differentials on both sides, and eliminate
t, . . . , dt, . . . in favor of t', . . . dt', . . . in the
expression for $ds^2$. We'll see in section 9.2.4, p. 154, that this
is an example of a more general transformation law for tensors,
mathematical objects that generalize vectors and covectors in the
same way that matrices generalize row and column vectors.
b / Example 1.
A map projection Example 1
Because the earth's surface is curved, it is not possible to
represent it on a flat map without distortion. Let $\phi$ be the
longitude, $\theta$ the angle measured down from the north pole
(known as the colatitude), both measured in radians, and let a be
the earth's radius. Then by the definition of radian measure, an
infinitesimal north-south displacement by $d\theta$ is a distance
$a\,d\theta$. A point at a given colatitude $\theta$ lies at a
distance $a\sin\theta$ from the axis, so for an infinitesimal
east-west distance we have $a\sin\theta\,d\phi$. For convenience,
let the units be chosen such that a = 1. Then the metric,
with signature ++, is
$$ds^2 = d\theta^2 + \sin^2\theta\,d\phi^2 .$$
One of the many possible ways of forming a flat map is the Lambert
cylindrical projection,
$$x = \phi$$
$$y = \cos\theta ,$$
shown in figure b. If we see a distance on the map and want
to know how far it actually is on the earth's surface, we need
to transform the metric into the (x, y) coordinates. The inverse
coordinate transformation is
$$\phi = x$$
$$\theta = \cos^{-1} y .$$
Taking differentials on both sides, we get
$$d\phi = dx$$
$$d\theta = -\frac{dy}{\sqrt{1 - y^2}} .$$
We take the metric and eliminate $\theta$, $\phi$, $d\theta$, and
$d\phi$, finding
$$ds^2 = (1 - y^2)\,dx^2 + \frac{dy^2}{1 - y^2} .$$
In figure b, the polka-dot pattern is made of figures that are
actually circles, all of equal size, on the earth's surface. Since
they are fairly small, we can approximate y as having a single value
for each circle, which means that they are represented on the
flat map as approximate ellipses with their east-west dimensions
having been stretched by $(1 - y^2)^{-1/2}$ and their north-south
ones shrunk by $(1 - y^2)^{1/2}$. Since these two factors are
reciprocals of one another, the area of each ellipse is the same as
the area of the original circle, and therefore the same as those of
all the other ellipses. They are a visual representation of the
metric, and they demonstrate the equal-area property of this
projection.
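The result of this example can be spot-checked numerically. The following sketch, with hypothetical function names and an arbitrarily chosen sample point, compares the distance computed on the unit sphere with the distance computed from the metric in the map coordinates, taking x to be the east-west angle and y the cosine of the colatitude. It also evaluates the area factor, the square root of the determinant of the map metric, which equals 1 for an equal-area projection.

```python
import math


def sphere_distance_sq(theta, dtheta, dphi):
    """ds^2 on the unit sphere: dtheta^2 + sin^2(theta) dphi^2."""
    return dtheta**2 + math.sin(theta)**2 * dphi**2


def map_distance_sq(y, dx, dy):
    """ds^2 in Lambert map coordinates: (1 - y^2) dx^2 + dy^2 / (1 - y^2)."""
    return (1 - y**2) * dx**2 + dy**2 / (1 - y**2)


def area_factor(y):
    """Square root of the determinant of the map metric."""
    return math.sqrt((1 - y**2) * (1 / (1 - y**2)))


theta, dtheta, dphi = 1.0, 1e-5, 2e-5
y = math.cos(theta)
dx = dphi
dy = math.cos(theta + dtheta) - math.cos(theta)

on_sphere = sphere_distance_sq(theta, dtheta, dphi)
on_map = map_distance_sq(y, dx, dy)
print(on_sphere, on_map, area_factor(y))
```

The two distances agree up to corrections that vanish as the displacement is made smaller.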
7.4 Summary of transformation laws
Having worked through one example in detail, let's progress from
the specific to the general. In the Einstein concrete index notation,
let coordinates $(x^0, x^1, x^2, x^3)$ be transformed to new
coordinates $(x'^0, x'^1, x'^2, x'^3)$. Then vectors transform
according to the rule
$$v'^\mu = v^\kappa \frac{\partial x'^\mu}{\partial x^\kappa} , \qquad (1)$$
where the Einstein summation convention implies a sum over the
repeated index $\kappa$. By the same reasoning as in section 6.4.2,
p. 117, the transformation for a covector $\omega$ is
$$\omega'_\mu = \omega_\kappa \frac{\partial x^\kappa}{\partial x'^\mu} . \qquad (2)$$
Note the inversion of the partial derivative in one equation compared
to the other. Because these equations describe a change from one
coordinate system to another, they clearly depend on the coordinate
system, so we use Greek indices rather than the Latin ones that
would indicate a coordinate-independent abstract index equation.
The letter $\mu$ in these equations always appears as an index
referring to the new coordinates, $\kappa$ to the old ones. For this
reason, we can get away with dropping the primes and writing, e.g.,
$v^\mu = v^\kappa\,\partial x^\mu/\partial x^\kappa$ rather than
$v'^\mu$, counting on context to show that $v^\mu$ is the vector
expressed in the new coordinates, $v^\kappa$ in the old ones.
This becomes especially natural if we start working in a specific
coordinate system where the coordinates have names. For example,
if we transform from coordinates (t, x, y, z) to (a, b, c, d), then
it is clear that $v^t$ is expressed in one system and $v^c$ in the
other.
In equation (2), $\mu$ appears as a subscript on the left side of
the equation, but as a superscript on the right. This would appear
to violate the grammatical rules given on p. 123, but the
interpretation here is that in expressions of the form
$\partial/\partial x^i$ and $\partial/\partial x_i$, the
superscripts and subscripts should be understood as being turned
upside-down. Similarly, (1) appears to have the implied sum over
$\kappa$ written ungrammatically, with both $\kappa$'s appearing as
superscripts. Normally we only have implied sums in which the index
appears once as a superscript and once as a subscript. With our new
rule for interpreting indices on the bottom of derivatives, the
implied sum is seen to be written correctly. This rule is similar to
the one for analyzing the units of derivatives written in Leibniz
notation, with, e.g., $d^2x/dt^2$ having units of meters per second
squared. That is, the flipping of the indices like this is required
for consistency so that everything will work out properly when we
change our units of measurement, causing all our vector components
to be rescaled.
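The two transformation laws can be tried out numerically. The sketch below is an illustration under stated assumptions, not a general-purpose implementation: it estimates the partial derivatives by central differences with a hypothetical step size, uses the accelerated coordinates of section 7.1 as the old system and Minkowski coordinates as the new one, and checks that the scalar formed by contracting a covector with a vector comes out the same in both systems. All function names are made up for this example.

```python
import math


def jacobian(f, point, h=1e-6):
    """Matrix J[mu][kappa] = d f^mu / d x^kappa, by central differences."""
    n = len(point)
    J = [[0.0] * n for _ in range(n)]
    for k in range(n):
        up = list(point)
        dn = list(point)
        up[k] += h
        dn[k] -= h
        f_up, f_dn = f(up), f(dn)
        for mu in range(n):
            J[mu][k] = (f_up[mu] - f_dn[mu]) / (2 * h)
    return J


def transform_vector(v, forward, old_point):
    """Equation (1): new v^mu = v^kappa d(new x^mu)/d(old x^kappa)."""
    J = jacobian(forward, old_point)
    n = len(v)
    return [sum(J[mu][k] * v[k] for k in range(n)) for mu in range(n)]


def transform_covector(w, inverse, new_point):
    """Equation (2): new w_mu = w_kappa d(old x^kappa)/d(new x^mu).

    The derivatives are taken of the inverted transformation.
    """
    K = jacobian(inverse, new_point)
    n = len(w)
    return [sum(w[k] * K[k][mu] for k in range(n)) for mu in range(n)]


def accelerated_to_minkowski(p):
    T, X = p
    return [X * math.sinh(T), X * math.cosh(T)]


def minkowski_to_accelerated(p):
    t, x = p  # valid in the region x > |t|
    return [math.atanh(t / x), math.sqrt(x * x - t * t)]


old_point = [0.5, 2.0]
new_point = accelerated_to_minkowski(old_point)
v, w = [1.0, 0.3], [0.7, -0.2]

v_new = transform_vector(v, accelerated_to_minkowski, old_point)
w_new = transform_covector(w, minkowski_to_accelerated, new_point)

scalar_old = sum(wi * vi for wi, vi in zip(w, v))
scalar_new = sum(wi * vi for wi, vi in zip(w_new, v_new))
print(scalar_old, scalar_new)
```

Note that the covector rule uses derivatives of the inverse transformation rather than reciprocals of the forward derivatives.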
The identity transformation Example 2
In the case of the identity transformation $x'^\mu = x^\mu$,
equation (1) clearly gives $v' = v$, since all the mixed partial
derivatives $\partial x'^\mu/\partial x^\kappa$ with
$\mu \ne \kappa$ are zero, and all the derivatives for
$\kappa = \mu$ equal 1.
In equation (2), it is tempting to write
$$\frac{\partial x^\kappa}{\partial x'^\mu} = \frac{1}{\partial x'^\mu/\partial x^\kappa} \qquad \text{(wrong!)} ,$$
but this would give infinite results for the mixed terms! Only in the
case of functions of a single variable is it possible to flip
derivatives in this way; it doesn't work for partial derivatives. To
evaluate these partial derivatives, we have to invert the
transformation (which in this example is trivial to accomplish) and
then take the partial derivatives.
Polar coordinates Example 3
None of the techniques discussed here are particular to relativity.
For example, consider the transformation from polar coordinates
$(r, \theta)$ in the plane to Cartesian coordinates
$$x = r\cos\theta$$
$$y = r\sin\theta .$$
A bug sits on the edge of a phonograph turntable, at
$(r, \theta) = (1, 0)$. The turntable rotates clockwise, giving the
bug a velocity vector $v^\kappa = (v^r, v^\theta) = (0, -1)$, i.e.,
the angular velocity is one radian per second in the negative
(clockwise) direction. Let's find the bug's velocity vector in
Cartesian coordinates. The transformation law for vectors gives
$$v^x = v^\kappa \frac{\partial x}{\partial x^\kappa} .$$
Expanding the implied sum over the repeated index $\kappa$, we have
$$v^x = v^r \frac{\partial x}{\partial r} + v^\theta \frac{\partial x}{\partial\theta}$$
$$= (0)\frac{\partial x}{\partial r} + (-1)\frac{\partial x}{\partial\theta}$$
$$= r\sin\theta$$
$$= 0 .$$
For the y component,
$$v^y = v^r \frac{\partial y}{\partial r} + v^\theta \frac{\partial y}{\partial\theta}$$
$$= (0)\frac{\partial y}{\partial r} + (-1)\frac{\partial y}{\partial\theta}$$
$$= -r\cos\theta$$
$$= -1 .$$
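The same calculation can be written as a short function. This is a minimal sketch with a hypothetical function name. It takes the clockwise rotation to mean an angular component of -1, and it uses the partial derivatives of x = r cos(theta) and y = r sin(theta).

```python
import math


def polar_velocity_to_cartesian(r, theta, v_r, v_theta):
    """Transform velocity components from polar to Cartesian coordinates.

    v^x = v^r dx/dr + v^theta dx/dtheta, and likewise for v^y.
    """
    dx_dr, dx_dtheta = math.cos(theta), -r * math.sin(theta)
    dy_dr, dy_dtheta = math.sin(theta), r * math.cos(theta)
    v_x = v_r * dx_dr + v_theta * dx_dtheta
    v_y = v_r * dy_dr + v_theta * dy_dtheta
    return v_x, v_y


# Bug at (r, theta) = (1, 0) on a turntable turning clockwise at 1 rad/s.
print(polar_velocity_to_cartesian(1.0, 0.0, 0.0, -1.0))
```

At this point the bug is moving in the negative y direction, as expected for clockwise motion at the rightmost point of the circle.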
7.5 Inertia and rates of change
Suppose that we describe a flying bullet in polar coordinates. We
neglect the vertical dimension, so the bullet's motion is linear. If
the bullet has a displacement of $(\Delta r_1, \Delta\theta_1)$ in a
short time interval $\Delta t$, then clearly at a later point in its
motion, during an equal interval, it will have a displacement
$(\Delta r_2, \Delta\theta_2)$ with two different numbers
inside the parentheses. This isn't because its velocity or momentum
really changed. It's because the coordinate system is curvilinear.
There are three ways to get around this:
1. Use only Minkowski coordinates.
2. Instead of characterizing inertial motion as motion with
constant velocity components, we can instead characterize it as
motion that maximizes the proper time (section 2.4.2, p. 44).
3. Define a correction term to be added when taking the derivative
of a vector or covector expressed in non-Minkowski coordinates.
These issues become more acute in general relativity, where
curvature of spacetime can make option 1 impossible. Option 3,
called the covariant derivative, is discussed in optional section
9.4 on p. 167. If you aren't going to read that section, just keep
in mind that in non-Minkowski coordinates, you cannot naively use
changes in the components of a vector as a measure of a change in
the vector itself.
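The effect is easy to exhibit numerically. In the sketch below, a bullet moves along a straight line at constant velocity, and we compute its displacement in polar coordinates over two equal time intervals. The starting point, velocity, and function names are arbitrary assumptions made for illustration.

```python
import math


def polar(x, y):
    """Polar coordinates (r, theta) of the Cartesian point (x, y)."""
    return math.hypot(x, y), math.atan2(y, x)


def polar_displacement(t, dt, x0=-5.0, y0=1.0, vx=1.0, vy=0.0):
    """(delta r, delta theta) over [t, t + dt] for uniform straight-line motion."""
    r1, th1 = polar(x0 + vx * t, y0 + vy * t)
    r2, th2 = polar(x0 + vx * (t + dt), y0 + vy * (t + dt))
    return r2 - r1, th2 - th1


early = polar_displacement(0.0, 0.1)
late = polar_displacement(8.0, 0.1)
print(early)
print(late)
```

The Cartesian displacement is the same in both intervals, but the polar components differ, and the radial component even changes sign as the bullet passes its point of closest approach to the origin.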
Problems
1 Example 3 on p. 135 discussed polar coordinates in the
Euclidean plane. Use the technique demonstrated in section 7.3 to
find the metric in these coordinates.
2 Oblique Cartesian coordinates are like normal Cartesian
coordinates in the plane, but their axes are at an angle
$\phi \ne \pi/2$ to one another. Show that the metric in these
coordinates is
$$ds^2 = dx^2 + dy^2 + 2\cos\phi\,dx\,dy .$$
Figure: Problem 2.
Chapter 8
Rotation (optional)
8.1 Rotating frames of reference
8.1.1 No clock synchronization
Panels 1 and 2 of figure a recapitulate the result of example 13
on p. 32. The set of three clocks fixed to the earth in a/1 have
been synchronized by Einstein synchronization (example 4, p. 19),
i.e., by exchanging flashes of light. The three clocks aboard the
moving train, a/2, have been synchronized in the same way, and
the events that were simultaneous according to frame 1 are not
simultaneous in frame 2. There is a systematic shift in the times,
which is represented by the term $t' = \ldots - v\gamma x$ in the
Lorentz transformation (eq. (1), p. 30).
a / Clocks can't be synchronized
in a rotating frame of reference.
Now suppose we take the diagram of the train and wrap it
around, a/3. If we go on and close the loop, making the chain
into a circle like a chain necklace, we have a problem. The trend
in the clock times can continue until it wraps back around to the
beginning, but then there will be a discrepancy.
We conclude that clocks can't be synchronized in a rotating
frame of reference. Such a frame does not admit a universal time
coordinate because Einstein synchronization isn't transitive:
synchronizing clock A with clock B, and B with C, does not imply that
A is synchronized with C. This nontransitivity is one way of defining
what we mean by rotation. That is, if the operational definition of
an inertial frame given in section 5.1, p. 101, shows that our frame
is noninertial, and we want to know more about why it's noninertial,
testing for this nontransitivity is a way of finding out whether it's
because of rotation.
8.1.2 Rotation is locally detectable
The people aboard the circular train know that their attempts at
synchronization fail, so they can tell, without reference to anything
external, that they're going in a circle. (Cf. example 1, p. 102.)
Although this is a book on special, not general, relativity, it's
interesting to note the following possibility. Suppose that we
verify, by local experiments, that we have a good, nonrotating,
inertial frame of reference. It is then imaginable that if we view
distant galaxies from this frame, we will see them rotate at some
angular frequency $\Omega$ about some axis on the celestial sphere.
If this is observed, then we must infer that it is the universe as a
whole, not our laboratory, that is rotating. Such an effect has been
searched for, and, for example, an upper limit
$\Omega \lesssim 10^{-7}$ radian/year was inferred by Clemence.
General-relativistic models of such rotating cosmologies
have a preferred vector constituting the direction of the axis about
which matter rotates, but there is no global center of rotation.
Current upper limits on $\Omega$ are good enough to rule out any
significant effect on cosmological expansion due to centrifugal
forces.
8.1.3 The Sagnac effect
Although the train scenario is obviously unrealistic, the time
shift is far from hypothetical. This type of effect, called the
Sagnac effect, was first observed by M. Georges Sagnac in 1913, and
it relates to the principle of the ring laser gyroscope (example 2,
p. 18), used in passenger jets. (The name is French, and is
pronounced "sah-NYAHK.") To find the Sagnac effect quantitatively,
we note that in the circular train example (ignoring signs) the
relevant term in the Lorentz transformation, $v\gamma x$, would
accumulate, after one complete circuit of Einstein synchronization,
a discrepancy equal to the circumference of the circle multiplied by
$v\gamma$. If the circle's radius is r and the angular velocity
$\omega$, so that $v = \omega r$, we have
$\Delta t = 2\pi r^2\omega\gamma$. This can be rewritten in terms of
the circle's area A as $\Delta t = 2A\omega\gamma$, or, reinserting
factors of c to accommodate SI units,
$\Delta t = 2A\omega\gamma/c^2$. The proportionality to the enclosed
area is not an accident; the product $vx$ has the form of the
integrand $\mathbf{F}\cdot d\mathbf{s}$ occurring in Stokes'
theorem.
Astronomical time, Rev. Mod. Phys. 29 (1957) 2.
Sagnac effect in the Hafele-Keating experiment Example 1
A clock at the equator of the earth rotates at a frequency $\omega$
of $2\pi$ radians per sidereal day, suffering a Sagnac effect of
210 ns per day. The traveling atomic clocks in the Hafele-Keating
experiment (p. 15) went around the world in both directions, and
were compared with a third set of clocks that stayed in Washington,
DC. Since the time required to fly around the earth was also on the
order of one day, the differences in the values of $\omega$ for the
three sets of clocks were on the same order of magnitude as the
$\omega$ of the earth, and we therefore expect cumulative
differential Sagnac effects that are also on the order of a hundred
nanoseconds. These effects exist only in the rotating frame of the
earth, but the things being measured are proper times, and proper
time is a scalar, so the experimental results are independent of
what frame of reference is used for calculating them. Since the
airline pilots provided Hafele and Keating with navigational data
referred to the rotating earth, they analyzed their results in the
rotating frame, in which there was a Sagnac effect. They could
equally well have transformed their data into the frame of the
stars, in which case the same result would have been predicted, but
it would have been described as arising from kinematic time
dilation.
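The figure quoted for the equatorial clock can be reproduced with a few lines of arithmetic. The sketch below uses the low-velocity form of the Sagnac formula, with the time shift per circuit equal to 2 A omega / c^2 and gamma taken to be 1. The round values for the earth's equatorial radius and the length of the sidereal day are assumptions supplied for this illustration, not numbers taken from the text.

```python
import math

C = 299_792_458.0        # speed of light, m/s
R_EARTH = 6.378e6        # equatorial radius in m (assumed round value)
SIDEREAL_DAY = 86_164.1  # length of the sidereal day in s (assumed value)


def sagnac_shift(area, omega):
    """Time shift 2 A omega / c^2 for one circuit enclosing area A.

    Low-velocity approximation: the factor of gamma is taken to be 1.
    """
    return 2 * area * omega / C**2


omega_earth = 2 * math.pi / SIDEREAL_DAY
shift = sagnac_shift(math.pi * R_EARTH**2, omega_earth)
print(f"{shift * 1e9:.0f} ns per circuit")
```

The result comes out within a few percent of the 210 ns quoted above.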
Ring laser gyroscope Example 2
The ring laser gyroscope in the photo in example 2 on p. 18 looks
like it has an area on the order of $10^{-2}\ \text{m}^2$ and uses
red light. For use in navigation, one wants to be able to detect a
change in course of, say, one degree in an hour, or
$5\times10^{-6}$ radians/s. The result is a time shift
$\Delta t \sim 10^{-24}$ s, which for red light is a phase shift of
only $\Delta\phi = 4\pi A\omega/c\lambda \sim 3\times10^{-9}$
radian. In the original early twentieth-century experiments, this
phase shift would have had to be measured by producing interference
between the two beams and measuring the change in intensity
resulting from this change in phase. Our estimate of $\Delta\phi$
shows that this is impractical for a portable instrument. In a
modern ring laser gyroscope, an active laser medium is inserted in
the loop, and the result is that the loop resonates at a frequency
that is shifted from the laser's natural frequency by
$\Delta f \sim c\,\Delta\phi/L$, where L is the circumference.
The result is a frequency shift of a few Hz, which is easily
measurable. An alternative technique, used in the fiber optic
gyroscope, is to wrap N turns of optical fiber around the
circumference, effectively changing A to NA.
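These order-of-magnitude estimates can be reproduced as follows. The sketch assumes an enclosed area of 0.01 square meters (a guess at the size of a hand-held device), a rotation rate of one degree per hour, and a wavelength of 650 nm for red light. All three inputs, and the function names, are assumptions made for illustration.

```python
import math

C = 299_792_458.0  # speed of light, m/s


def sagnac_time_shift(area, omega):
    """Time shift 2 A omega / c^2 (low-velocity approximation)."""
    return 2 * area * omega / C**2


def sagnac_phase_shift(area, omega, wavelength):
    """Phase shift 2 pi c (time shift) / lambda = 4 pi A omega / (c lambda)."""
    return 4 * math.pi * area * omega / (C * wavelength)


area = 1e-2                         # m^2, assumed order of magnitude
omega = math.radians(1.0) / 3600.0  # one degree per hour, in rad/s
wavelength = 650e-9                 # m, assumed value for red light

print(f"omega  = {omega:.2e} rad/s")
print(f"dt     = {sagnac_time_shift(area, omega):.2e} s")
print(f"dphi   = {sagnac_phase_shift(area, omega, wavelength):.2e} rad")
```

A phase shift this small is what motivates the resonant-frequency and multi-turn techniques described above.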
8.1.4 A rotating coordinate system
The GPS system is a practical example of a case where we
naturally want to employ a rotating coordinate system. Hikers and
sailors, after all, want to know where they are relative to the
earth's rotating surface. Since locations need to be determined to
within meters, the timing of signals needs to be done to a precision
of something like (1 m)/c, which is a few nanoseconds. This is why
the GPS satellites have atomic clocks aboard, and timing to this
precision clearly requires that relativistic effects be taken into
account. We therefore need not a rotating Newtonian coordinate
system but a rotating relativistic one. Let's start with the
nonrotating frame, and define coordinates $(t, r, \theta, z)$, with
the spatial part $(r, \theta, z)$ being ordinary cylindrical
coordinates. For simplicity, we'll neglect the z coordinate in what
follows. Extending the result of problem 1 on p. 137 from 2 + 0
dimensions to 2 + 1, we have the metric
$$ds^2 = dt^2 - dr^2 - r^2\,d\theta^2 . \qquad (1)$$
The results of section 8.1.1 show that we do not expect to be able
to define a completely satisfactory time coordinate in the rotating
frame, so let's start with the minimal change
$(t, r, \theta) \rightarrow (t, r, \theta')$, where
$\theta' = \theta - \omega t$. This is at least enough to make
world-lines of constant $(r, \theta')$ be ones that revolve around
the origin at the appropriate frequency. Substituting
$d\theta = d\theta' + \omega\,dt$, we find
$$ds^2 = (1 - \omega^2 r^2)\,dt^2 - dr^2 - r^2\,d\theta'^2 - 2\omega r^2\,d\theta'\,dt . \qquad (2)$$
Recognizing $\omega r$ as the velocity of one frame relative to
another, and $(1 - \omega^2 r^2)^{-1/2}$ as $\gamma$, we see that we
do have a relativistic time dilation effect in the $dt^2$ term. But
the $dr^2$ and $d\theta'^2$ terms look the same as in equation (1).
Why don't we see any Lorentz contraction of the length scale in the
azimuthal direction?
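The algebra of this substitution can be double-checked numerically. The sketch below evaluates the interval in the nonrotating cylindrical coordinates and in the rotating ones, for an arbitrary small displacement related by d(theta) = d(theta') + omega dt. The function names and sample values are hypothetical.

```python
def interval_nonrotating(r, dt, dr, dtheta):
    """ds^2 = dt^2 - dr^2 - r^2 dtheta^2 in the nonrotating frame."""
    return dt**2 - dr**2 - r**2 * dtheta**2


def interval_rotating(r, omega, dt, dr, dtheta_prime):
    """ds^2 after the change theta' = theta - omega t, including the
    cross term in dtheta' dt."""
    return ((1 - omega**2 * r**2) * dt**2
            - dr**2
            - r**2 * dtheta_prime**2
            - 2 * omega * r**2 * dtheta_prime * dt)


r, omega = 0.5, 0.8
dt, dr, dtheta_prime = 0.01, 0.002, 0.003
dtheta = dtheta_prime + omega * dt

print(interval_nonrotating(r, dt, dr, dtheta))
print(interval_rotating(r, omega, dt, dr, dtheta_prime))
```

The two values agree to rounding error, which confirms the coefficients of the time-time and cross terms.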
The answer is that coordinates in relativity are arbitrary, and
just because we can write down a certain set of coordinates, that
doesn't mean they have any special physical interpretation. The
coordinates $(t, r, \theta')$ do not correspond physically to the
quantities that a rotating observer R would measure with clocks and
meter-sticks. If R uses a ruler to measure a short arc along the
circumference of the circle $r = r_0$, the distance is a distance
being measured between events in spacetime that are simultaneous in
the rest frame of the ruler, and these do not occur at the same
value of the time coordinate t. In the Lorentz transformation, for
linear motion, it is the $-v\gamma x$ term applied to the times that
fixes this problem and makes $t'$ properly represent simultaneity in
the new frame. In our rotational version, we could try to do
something similar by defining a time coordinate
$t' = t + f\theta'$, where f is a function of r that is engineered
so that the $d\theta'\,dt$ cross term in the metric would go away.
This can be done (the function f that works turns out to be
$-\omega r^2\gamma^2$), but the problem is that the $t'$ coordinate
is not single-valued, in the sense that $(t, r, \theta)$ and
$(t, r, \theta + 2\pi)$ would not produce the same $t'$.
This is inevitable, as we've seen in section 8.1.1, so we can't
improve on the coordinates $(t, r, \theta')$ and the metric (2).

The coordinates $(t, r, \theta')$, with the metric (2), are the ones
used in the GPS system, and in that context are called
Earth-Centered Inertial (ECI) coordinates. (Another name is Born
coordinates.) Their time coordinate is not the time measured by a
clock in the rotating frame but is simply the time coordinate of the
nonrotating frame of reference tied to the earth's center.
Conceptually, we can imagine this time coordinate as one that is
established by sending out an electromagnetic "tick-tock" signal
from the earth's center, with each satellite correcting the phase of
the signal based on the propagation time inferred from its own r. In
reality, this is accomplished by communication with a master control
station in Colorado Springs, which communicates with the satellites
via relays at Kwajalein, Ascension Island, Diego Garcia, and Cape
Canaveral.
8.2 Boosts and rotations
A relative of mine fell in love. She and her boyfriend bought a house
in the suburbs and started trying to have a baby. They think they'll
get married at some later point. An engineer by training, she says
she doesn't want to get hung up on the "order of operations." For
some mathematical operations, the order doesn't matter: 5 + 7 is
the same as 7 + 5.
b / Performing the rotations in one
order gives one result, 3, while re-
versing the order gives a different
result, 5.
8.2.1 Rotations
But figure b shows that the order of operations does matter
for rotations. Rotating around the x axis and then y produces a
different result than y followed by x. We say that rotations are
noncommutative. This is why, in Newtonian mechanics, we don't
have an angular displacement vector $\Delta\boldsymbol{\theta}$;
vectors are supposed to be additive, and vector addition is
commutative. For small rotations, however, the discrepancy caused by
choosing one order of operations rather than the other becomes small
(of order $\theta^2$), so we can define an infinitesimal
displacement vector $d\boldsymbol{\theta}$, whose direction is given
by the right-hand rule, and an angular velocity
$\boldsymbol{\omega} = d\boldsymbol{\theta}/dt$.
As an example of how this works out for small rotations, lets
take the vector
(0, 0, 1) (3)
and apply the operations shown in gure b, but with rotations of
only = 0.1 radians rather than 90 degrees. Rotation by this
angle about the x axis is given by the transformation (x, y, z)
(x, y cos z sin , y sin +z cos ), and applying this to the original
vector gives this:
(0.00000, 0.09983, 0.99500) (after x) (4)
After a further rotation by the same angle, this time about the y
axis, we have
(0.09933, 0.09983, 0.99003) (after x, then y) (5)
Starting over from the original vector (3) and doing the operations
in the opposite order gives these results:
(0.09983, 0.00000, 0.99500) (after y) (6)
(0.09983, 0.09933, 0.99003) (after y, then x) (7)
The discrepancy between (5) and (7) is a rotation by very nearly
0.005 radians in the xy plane. As claimed, this is on the order of
$\theta^2$ (in fact, it's almost exactly $\theta^2/2$). A single example
can never prove anything, but this is an example of the general rule
that rotations along different axes don't commute, and for small angles
the discrepancy is a rotation in the plane defined by the two axes,
with a magnitude whose maximum size is on the order of $\theta^2$.
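The numbers in equations (3) through (7) can be checked with a short script. This is only a sketch: the x rotation uses the transformation quoted in the text, while the sign convention for the y rotation is an assumption, chosen because it reproduces equation (5). The function names are illustrative.

```python
import math

def rotate_x(vec, theta):
    """Rotate (x, y, z) about the x axis, using the transformation quoted in the text."""
    x, y, z = vec
    c, s = math.cos(theta), math.sin(theta)
    return (x, y * c - z * s, y * s + z * c)

def rotate_y(vec, theta):
    """Rotate about the y axis (sign convention assumed so as to match eq. (5))."""
    x, y, z = vec
    c, s = math.cos(theta), math.sin(theta)
    return (x * c + z * s, y, -x * s + z * c)

def angle_in_xy_plane(vec):
    """Polar angle of the projection of vec onto the xy plane."""
    return math.atan2(vec[1], vec[0])

theta = 0.1
start = (0.0, 0.0, 1.0)                                # eq. (3)
x_then_y = rotate_y(rotate_x(start, theta), theta)     # eq. (5)
y_then_x = rotate_x(rotate_y(start, theta), theta)     # eq. (7)

# The two results differ by a small rotation about the z axis.
discrepancy = angle_in_xy_plane(y_then_x) - angle_in_xy_plane(x_then_y)

print("x then y:", [round(c, 5) for c in x_then_y])
print("y then x:", [round(c, 5) for c in y_then_x])
print("discrepancy:", discrepancy, "theta^2/2:", theta**2 / 2)
```

Trying a smaller angle, such as theta = 0.01, should show the discrepancy shrinking in proportion to the square of the angle.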
8.2.2 Boosts
Something similar happens for boosts. In 3 + 1 dimensions, we
start with the vector
$(0, 1, 0, 0)$,    (8)
pointing along the x axis. A Lorentz boost with v = 0.1 (eq. (1),
p. 30) in the x direction gives
$(0.10050, 1.00504, 0.00000, 0.00000)$ (after x)    (9)
and a second boost, now in the y direction, produces this:
$(0.10101, 1.00504, 0.01010, 0.00000)$ (after x, then y)    (10)
Starting over from (8) and doing the boosts in the opposite order,
we have
$(0.00000, 1.00000, 0.00000, 0.00000)$ (after y)    (11)
$(0.10050, 1.00504, 0.00000, 0.00000)$ (after y, then x)    (12)
c / Nonrelativistically, the gyroscope should not rotate as long
as the forces from the hammer are all transmitted to it at its
center of mass.
The discrepancy between (10) and (12) is a rotation in the xy plane
by very nearly 0.01 radians. This is an example of a more general
fact, which is that boosts along different axes don't commute, and
for small angles the discrepancy is a rotation in the plane defined
by the two boosts, with a magnitude whose maximum size is on the
order of $v^2$, in units of radians.
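Equations (8) through (12) can be reproduced in the same way. The sketch below assumes units with c = 1 and the sign convention t' = γ(t + vx), x' = γ(x + vt), which is the one that matches equation (9); the helper name `boost` is hypothetical.

```python
import math

def boost(vec, v, axis):
    """Boost the four-vector (t, x, y, z) by velocity v along 'x' or 'y' (c = 1)."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    i = {"x": 1, "y": 2}[axis]
    out = list(vec)
    t, r = vec[0], vec[i]
    out[0] = gamma * (t + v * r)   # timelike component mixes with the boosted axis
    out[i] = gamma * (r + v * t)
    return tuple(out)

v = 0.1
start = (0.0, 1.0, 0.0, 0.0)                       # eq. (8)
x_then_y = boost(boost(start, v, "x"), v, "y")     # eq. (10)
y_then_x = boost(boost(start, v, "y"), v, "x")     # eq. (12)

# Angle between the spatial parts of the two results, measured in the xy plane.
angle = (math.atan2(x_then_y[2], x_then_y[1])
         - math.atan2(y_then_x[2], y_then_x[1]))

print("x then y:", [round(c, 5) for c in x_then_y])
print("y then x:", [round(c, 5) for c in y_then_x])
print("angle:", angle, "v^2:", v * v)
```

Only the spatial parts are compared here; the timelike components of the two results also differ slightly.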
8.2.3 Thomas precession
Figure c shows the most important physical consequence of all
this. The gyroscope is sent around the perimeter of a square, with
impulses provided by hammer taps at the corners. Each impulse
can be modeled as a Lorentz boost, notated, e.g., $L_x$ for a boost
in the x direction. The series of four operations can be written as
$L_{-y}L_{-x}L_yL_x$, using the notational convention that the first
operation applied is the one on the right side of the list. If boosts
were commutative, we could swap the two operations in the middle of
the list, giving $L_{-y}L_yL_{-x}L_x$. The $L_{-x}$ would undo the
$L_x$, and the $L_{-y}$ would undo the $L_y$. But boosts aren't
commutative, so the vector representing the orientation of the
gyroscope is rotated in the xy plane. This effect is called the Thomas
precession, after Llewellyn Thomas (1903-1992). Thomas precession is a
purely relativistic effect, since a Newtonian gyroscope does not change
its axis of rotation unless subjected to a torque; if the boosts are
accomplished by forces that act at the gyroscope's center, then there
is no nonrelativistic explanation for the effect.
Clearly we should see the same effect if the jerky motion in figure
c was replaced by uniform circular motion, and something similar
should happen in any case in which a spinning object experiences an
external force. In the limit of low velocities, the general expression
for the angular velocity of the precession is
$\boldsymbol{\Omega} = (1/2)\,\mathbf{a}\times\mathbf{v}$, and in the
case of circular motion, where $a = v\omega$, this gives
$\Omega = (1/2)v^2\omega$, where $\omega$ is the frequency of
the circular motion.
If we want to see this precession effect in real life, we should
look for a system in which both v and a are large. An atom is
such a system. The Bohr model, introduced in 1913, marked the
first quantitatively successful, if conceptually muddled, description
of the atomic energy levels of hydrogen. Continuing to take c = 1,
the over-all scale of the energies was calculated to be proportional to
$m\alpha^2$, where m is the mass of the electron, and
$\alpha = ke^2/\hbar \approx 1/137$, known as the fine structure
constant, is essentially just a unitless way of expressing the coupling
constant for electrical forces. At higher resolution, each excited
energy level is found to be split into several sub-levels. The
transitions among these close-lying states are in the millimeter region
of the microwave spectrum. The energy scale of this fine structure is
$\sim m\alpha^4$. This is down by a factor of $\alpha^2$ compared to the
visible-light transitions, hence the name of the constant. Uhlenbeck
and Goudsmit showed in 1926 that a splitting on this order of magnitude
was to be expected due to the magnetic interaction between the proton
and the electron's magnetic moment, oriented along its spin. The effect
they calculated, however, was too big by a factor of two.
d / States in hydrogen are labeled with their $\ell$ and s quantum
numbers, representing their orbital and spin angular momenta in units
of $\hbar$. The state with s = +1/2 has its spin angular momentum
aligned with its orbital angular momentum, while the s = -1/2 state has
the two angular momenta in opposite directions. The direction and
order of magnitude of the splitting between the two $\ell = 1$ states
is successfully explained by magnetic interactions with the proton, but
the calculated effect is too big by a factor of 2. The relativistic
Thomas precession cancels out half of the effect.
The explanation of the mysterious factor of two had in fact been
implicit in a 1916 calculation by Willem de Sitter, one of the first
applications of general relativity. De Sitter treated the earth-moon
system as a gyroscope, and found the precession of its axis of
rotation, which was partly due to the curvature of spacetime and partly
due to the type of rotation described earlier in this section. The
effect on the motion of the moon was noncumulative, and was only
about one meter, which was much too small to be measured at the
time. In 1927, however, Thomas applied similar reasoning to the
hydrogen atom, with the electron's spin vector playing the role of
gyroscope. Since the electron's spin is $\hbar/2$, the energy splitting
is $\pm(\hbar/2)\Omega$, depending on whether the electron's spin is in
the same direction as its orbital motion, or in the opposite direction.
This is less than the atom's gross energy scale $\hbar\omega$ by a
factor of $v^2/2$, which is $\sim\alpha^2$. The Thomas precession
cancels out half of the magnetic effect, bringing theory in agreement
with experiment.
Uhlenbeck later recalled: "...when I first heard about [the Thomas
precession], it seemed unbelievable that a relativistic effect could
give a factor of 2 instead of something of order v/c... Even the
cognoscenti of relativity theory (Einstein included!) were quite
surprised."
Problems
1 In the 1925 Michelson-Gale-Pearson experiment, the physicists
measured the Sagnac effect due to the earth's rotation. They laid
out a rectangle of sewer pipes with length x = 613 m and width y =
339 m, and pumped out the air. The latitude of the site in Illinois
was 41°46′, so that the effective area was equal to the projection of
the rectangle into the plane perpendicular to the earth's axis. Light
was provided by a sodium discharge with $\lambda$ = 570 nm. The light
was sent in both directions around the rectangle and interfered,
effectively doubling the area. Clever techniques were required in
order to calibrate the apparatus, since it was not possible to change
its orientation. Calculate the number of wavelengths by which the
relative phase of the two beams was expected to shift due to the
Sagnac effect, and compare with the experimentally measured result
of 0.230 ± 0.005 cycles.
a / Charged particles with world-lines that contribute to $J^x$
and $\rho$. The z dimension isn't shown, so the cubical 3-surfaces
appear as squares.
Chapter 9
Flux
9.1 The current vector
9.1.1 Current as the flux of charged particles
The most fundamental laws of physics are conservation laws,
which tell us that we can't create or destroy "stuff," where "stuff"
could mean quantities such as electric charge or energy-momentum.
Since charge is a Lorentz invariant, it's an easy example to start
with. Because charge is invariant, we might also imagine that charge
density $\rho$ was invariant. But this is not the case, essentially
because spatial (3-dimensional) volume isn't invariant; in 3 + 1
dimensions, only four-dimensional volume is an invariant (problem 2,
p. 47). For example, suppose we have an insulator in the shape of a
cube, with charge distributed uniformly throughout it according to an
observer at rest relative to the cube. Then in a frame $o'$ moving
relative to the cube, parallel to one of its axes, the cube becomes
foreshortened by length contraction, and its volume is reduced by the
factor $1/\gamma$. The result is that the charge density in $o'$ is
greater by a factor of $\gamma$.
This means that knowledge of the charge density in one frame
is insufficient to determine the charge density in another frame. In
the example of the cube, what would be sufficient would be knowledge
of the vector $\mathbf{J} = \rho_0\mathbf{v}$, where $\rho_0$ is the
charge density in the cube's rest frame, and $\mathbf{v}$ is the cube's
velocity vector. J, called the current vector, transforms as a
relativistic vector because of the transformation properties of the two
factors that define it. The velocity v is a vector (section 3.5.1). The
factor $\rho_0$ is an invariant, since it in turn breaks down into
charge divided by rest-volume. Charge is an invariant, and all
observers agree on what volume the cube would have in its rest frame.
J can be expressed in Minkowski coordinates as
$(\rho, J^x, J^y, J^z)$, where $\rho$ is the charge density and, e.g.,
$J^x$ is the density of electric current in the x direction. Suppose we
define the three-surface S shown in figure a/1, consisting of the set
of events with coordinates (t, 0, y, z) such that $0 \le t \le 1$,
$0 \le y \le 1$, and $0 \le z \le 1$. Some charged particles have
world-lines that intersect this surface, passing through it either in
the positive x direction or the negative x direction (which we count as
negative charge transport). S has a three-volume V. If we add up the
total charge transport $\Delta q$ across this surface and divide by V,
we get the average value of $J^x$. If we let S shrink down to smaller
and smaller three-surfaces surrounding the event (0, 0, 0, 0), then we
get the value of $J^x$ at this point, $\lim_{V\to 0}\Delta q/V$. In
other words, $J^x$ measures the flux density of charge that passes
through S. Of course this description in terms of a limit implies a
large number of charges, not just one as in figure a.
b / Example 1.
You can write out the analogous definition for $J^t$, using a surface
of simultaneity like $S'$, figure a/2, and you'll see that it expresses
the density of charge $\rho$. In this case $S'$ represents a moment in
time, and the flux through $S'$ means that the charges are crossing the
threshold from the past into the future.
Our argument that J transformed like a vector was based on a
case where all the charged particles had the same velocity vector, but
the above description in terms of the flux of charge eliminated any
discussion of velocity. It's true, but less obvious, that the J
described in this way also transforms as a vector, even in cases where
the charged particles do not all have parallel world-lines. The current
vector is the source of electric and magnetic fields. Remarkably, no
macroscopic electrical measurement is capable of detecting anything
more detailed about the motion of the charges than the averaged
information provided by J.
Boosting a solenoid Example 1
The figure shows a solenoid, at rest, wound from copper wire.
At point P, we construct a rectangular Ampèrian loop in the yz
plane that has its right edge inside the solenoid and its left one
outside. Ampère's law,
$\oint\mathbf{B}\cdot d\mathbf{s} = (4\pi k/c^2)I$, then tells us that
the current density $J^x$ causes a difference between the exterior
field $B_\text{ext} = 0$ and the interior field
$B_\text{int} = (4\pi k/c^2)J^x\Delta y$, where $\Delta y$ is the
thickness of the solenoid. There are two things we can get from
this result, both of them nontrivial.
First, the field depends only on the current density, not on any
information about the details of the motion of the electrons in the
copper. The electrons' motion is fast and highly random, but all
that contributes to $J^x$ is the slow drift velocity, typically
$\sim 1$ cm/s, superimposed on the randomness. This is exact and not at
all obvious. For example, the total momentum of the electrons does
depend on the random part of their motion, because
$p^x = m\gamma v^x$ has a factor of $\gamma$ in it.
Second, we can use the transformation properties of the current
vector to find the field of this solenoid in a frame boosted along
its axis. This is the kind of situation that would naturally arise,
for example, in an electric motor whose rotor contains an
electromagnet. A Lorentz transformation in the z direction doesn't
change the x component of a vector, nor does it change $\Delta y$, so
$B_\text{int}$ is the same in both frames. This is nontrivial both in
the sense that it would have been difficult to figure out by brute
force and in the sense that fields don't have to be the same in
different frames of reference; for example, a boost in the x or the y
direction would have changed the result.
A wire Example 2
In a solid conductor such as a copper wire, we have two types of
charges, protons and electrons. The protons are at rest in the lab
frame o, with charge density $\rho_p$ and current density
$J_p = (\rho_p, 0, 0, 0)$
in Minkowski coordinates. The motion of the electrons is complicated.
Some electrons are bound to a particular atom, but still move at
relativistic speeds within their atoms. Others exhibit violent thermal
motion that very nearly, but not quite, averages out to zero when there
is a current measurable by an ammeter. For simplicity, we treat all the
electrons (both the bound ones and the mobile ones) as a single density
of charge $\rho_e$. Let the average velocity of the electrons, known as
their drift velocity, be v in the x direction. Then in the frame $o'$
moving along with the drift velocity we have
$J'_e = (\rho'_e, 0, 0, 0)$,
which under a Lorentz transformation back into the lab frame becomes
$J_e = (\rho'_e\gamma, \rho'_e\gamma v, 0, 0)$.
Adding the two current vectors, we have a total current in the lab
frame
$J = (\rho_p + \rho'_e\gamma, \rho'_e\gamma v, 0, 0)$.
The wire is electrically neutral in this frame, so
$\rho_p + \rho'_e\gamma = 0$. Since $\rho_p$ is a fixed property of the
wire, we express $\rho'_e$ in terms of it as $-\rho_p/\gamma$.
Eliminating $\rho'_e$ gives
$J = (0, -\rho_p v, 0, 0)$.
Because the $\gamma$ factors canceled, we find that the current is
exactly proportional to the drift velocity. Geometrically, we have
added two timelike vectors and gotten a spacelike one; this is possible
because one of the timelike vectors was future-directed and the
other past-directed.
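The cancellation of the gamma factors in example 2 can be checked numerically. In this sketch the proton density and the (unrealistically large) drift velocity are arbitrary illustrative values, and `boost_x` is a hypothetical helper that transforms components from the electrons' rest frame back into the lab frame, with c = 1.

```python
import math

def boost_x(vec, v):
    """Components of (t, x, y, z) after a boost by velocity v along x (c = 1)."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    t, x, y, z = vec
    return (gamma * (t + v * x), gamma * (x + v * t), y, z)

rho_p = 1.0   # proton charge density in the lab frame (arbitrary units, assumed)
v = 0.6       # drift velocity, exaggerated so that gamma differs visibly from 1
gamma = 1.0 / math.sqrt(1.0 - v * v)

J_p = (rho_p, 0.0, 0.0, 0.0)               # protons at rest in the lab
rho_e_prime = -rho_p / gamma               # neutrality condition solved for rho'_e
J_e_prime = (rho_e_prime, 0.0, 0.0, 0.0)   # electrons in their own rest frame o'
J_e = boost_x(J_e_prime, v)                # electrons as seen in the lab
J_total = tuple(p + e for p, e in zip(J_p, J_e))

print("J_e     =", J_e)
print("J_total =", J_total, " expected (0, -rho_p*v, 0, 0)")
```

The timelike component of the total cancels and the x component comes out equal to -rho_p v with no factor of gamma, as in the text.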
c / 1. Charge is not conserved.
Charges mysteriously appear at
a later time without having been
present before. 2. Charge is
conserved. Although more world-
lines come out through the top
of the box than came in through
the bottom, the discrepancy is
accounted for by others that
entered through the sides.
9.1.2 Conservation of charge
Conservation of charge can be expressed elegantly in terms of J.
Charge density is the timelike component $J^t$. If this charge density
near a certain point is, for example, increasing, then it might be
because charge conservation has been violated as in figure c/1. In
this example, more world-lines emerge into the future at the top
of the four-cube than had entered through the bottom in the past.
Some process inside the cube is creating charge. In the limit where
the cube is made very small, this would be measured by a value of
$\partial J^t/\partial t$ that was greater than zero.
But experiments have never detected any violation of charge
conservation, so if more charge is emerging from the top (future) side
of the cube than came in from the bottom (past), the more likely
explanation is that the charges are not all at rest, as in c/1, but are
moving, c/2, and there has been a net flow in from neighboring regions
of space. We should find this reflected in the spatial components
$J^x$, $J^y$, and $J^z$. Moreover, if these spatial components were all
constant, then any given region of space would have just as much
current flowing into it from one side as there was flowing out the
other. We therefore need to have some nonzero partial derivatives such
as $\partial J^x/\partial x$. For example, figure c/2 has a positive
$J^x$ on the left and a negative $J^x$ on the right, so
$\partial J^x/\partial x < 0$. Charge conservation is expressed by
the simple equation $\partial J^\mu/\partial x^\mu = 0$. Writing out
the implied sum over $\mu$, this says that
$\partial J^t/\partial t + \partial J^x/\partial x + \partial J^y/\partial y + \partial J^z/\partial z = 0$.
If you've taken vector calculus, you'll recognize the operator being
applied to J as a four-dimensional generalization of the divergence.
This charge-conservation equation is valid regardless of the coordinate
system, so it can also be rewritten in abstract index notation as
$\partial J^a/\partial x^a = 0$.    (1)
Conservation of charge in a solenoid Example 3
In a solenoid, we have charge circulating at some drift velocity v.
Ignoring the protons, and adapting the relevant expression from
example 2 to the case of circular rather than linear motion, we
might have for the electrons' contribution to the current something
of the form
$J = p(1, -qy, qx, 0)$,
where $p = \rho'_e\gamma$ and $q$ depends on $v$ and on the radius of the
solenoid. Conservation of charge is satisfied, because each of
the four terms in the equation
$\partial J^t/\partial t + \partial J^x/\partial x + \partial J^y/\partial y + \partial J^z/\partial z = 0$
vanishes individually.
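As a rough numerical check of example 3, the four-divergence can be estimated with centered finite differences. The constants p and q below are arbitrary illustrative values, the minus sign on the x component is the one needed for circulating motion, and the second current is a deliberately non-conserved counterexample that is not taken from the text.

```python
def solenoid_current(t, x, y, z, p=2.0, q=0.5):
    """Current of the form J = p(1, -q*y, q*x, 0); p and q are illustrative."""
    return (p, -p * q * y, p * q * x, 0.0)

def leaky_current(t, x, y, z):
    """A made-up current whose x component grows with x, so charge is not conserved."""
    return (0.0, x, 0.0, 0.0)

def four_divergence(current, point, h=1e-5):
    """Sum over mu of dJ^mu/dx^mu at the given event, by centered differences."""
    total = 0.0
    for mu in range(4):
        up, down = list(point), list(point)
        up[mu] += h
        down[mu] -= h
        total += (current(*up)[mu] - current(*down)[mu]) / (2.0 * h)
    return total

event = (0.3, 1.0, -2.0, 0.7)   # an arbitrary event (t, x, y, z)
div_solenoid = four_divergence(solenoid_current, event)
div_leaky = four_divergence(leaky_current, event)
print("solenoid:", div_solenoid)
print("leaky:   ", div_leaky)
```

For the solenoid each term is zero separately, because the timelike component is constant, the x component depends only on y, and the y component depends only on x.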
9.2 The stress-energy tensor
9.2.1 Conservation and flux of energy-momentum
A particle such as an electron has a charge, but it also has a
mass. We can't define a relativistic mass flux because flux is defined
by addition, but mass isn't additive in relativity (example 6,
p. 78). Mass-energy is additive, but unlike charge it isn't an
invariant. Mass-energy is part of the energy-momentum four vector
$p = (E, p^x, p^y, p^z)$. We then have sixteen different fluxes we can
define. For example, we could replay the description in section 9.1
of the three-surface S perpendicular to the x direction, but now we
would be interested in a quantity such as the z component of
momentum. We then have a measure of the density of flux of $p^z$ in
the x direction, which we notate as $T^{zx}$. The matrix T is called
the stress-energy tensor, and it is an object of central importance
in relativity. In general relativity, it is the source of gravitational
fields. (The reason for the odd name will become more clear in a
moment.)
The stress-energy tensor is related to physical measurements as
follows. Let o be the future-directed, normalized velocity vector of
an observer; let s express a spatial direction according to this
observer, i.e., it points in a direction of simultaneity and is
normalized with $s\cdot s = -1$; and let S be a three-volume covector
(p. 125), directed toward the future (i.e., $o^a S_a > 0$). Then
measurements by this observer come out as follows:
$T^{ab}o_a S_b$ = mass-energy inside the three-volume S    (2a)
$T^{ab}s_a S_b$ = momentum in the direction s, inside S    (2b)
The stress-energy tensor allows us to express conservation of
energy-momentum as
$\partial T^{ab}/\partial x^a = 0$.    (3)
This local conservation of energy-momentum is all we get in general
relativity. As discussed in section 4.3.2, p. 83, there is no such
global law in curved spacetime. However, we will show in section
9.3.4 that in the special case of flat spacetime, i.e., special
relativity, we do have such a global conservation law.
9.2.2 Symmetry of the stress-energy tensor
The stress-energy tensor is a symmetric matrix. For example,
let's say we have some nonrelativistic particles. If we have a nonzero
$T^{tx}$, it represents a flux of mass-energy ($p^t$) through a
three-surface perpendicular to x. This means that mass is moving in the
x direction. But if mass is moving in the x direction, then we have
some x momentum $p^x$. Therefore we must also have a $T^{xt}$, since
this momentum is carried by the particles, whose world-lines pass
through a hypersurface of simultaneity.
9.2.3 Dust
The simplest example of a stress-energy tensor would be a cloud
of particles, all at rest in a certain frame of reference, described in
Minkowski coordinates:
$$T^{\mu\nu} = \begin{pmatrix}
\rho & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{pmatrix},$$
where we now use $\rho$ to indicate the density of mass-energy, not
charge as in section 9.1. This could be the stress-energy tensor of
a stack of oranges at the grocery store, the atoms in a hunk of
copper, or the galaxies in some small neighborhood of the universe.
Relativists refer to this type of matter, in which the velocities are
negligible, as "dust." The nonvanishing component $T^{tt}$ indicates
that for a three-surface S perpendicular to the t axis, particles with
mass-energy $E = p^t$ are crossing that surface from the past to the
future. Conservation of energy-momentum is satisfied, since all the
elements of this T are constant, so all the partial derivatives vanish.
9.2.4 Rank-2 tensors and their transformation law
Suppose we were to look at this cloud in a different frame of
reference. Some or all of the timelike row $T^{t\nu}$ and timelike
column $T^{\mu t}$ would fill in because of the existence of momentum,
but let's just focus for the moment on the change in the mass-energy
density represented by $T^{tt}$. It will increase for two reasons.
First, the kinetic energy of each particle is now nonzero; its
mass-energy increases from $m$ to $m\gamma$. But in addition, the
volume occupied by the cloud has been reduced by $1/\gamma$ due to
length contraction. We've picked up two factors of gamma, so the result
is $\rho \to \rho\gamma^2$. This is different from the transformation
behavior of a vector. When a vector is purely timelike in one frame,
transformation to another frame raises its timelike component only by a
factor of $\gamma$, not $\gamma^2$. This tells us that a matrix like T
transforms differently than a vector (section 7.2, p. 130). The general
rule is that if we transform from coordinates $x$ to $x'$, then:
$$T'^{\mu\nu} = T^{\kappa\lambda}\,\frac{\partial x'^\mu}{\partial x^\kappa}\,\frac{\partial x'^\nu}{\partial x^\lambda} \qquad (4)$$
An object that transforms in this standard way is called a rank-2
tensor. The 2 is because it has two indices. Vectors and covectors
have rank 1, invariants rank 0.
In section 7.3, p. 132, we developed a method of transforming
the metric from one set of coordinates to another; we now see that
technique as an application of the more general rule given in
equation (4). Considered as a tensor, the metric is symmetric,
$g_{ab} = g_{ba}$. In most of the examples we've been considering, the
metric tensor is diagonal, but when it has off-diagonal elements, each
of these is one half the corresponding coefficient in the expression
for $ds^2$, as in the following example.
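A non-diagonal metric tensor Example 4
The answer to problem 2 on p. 137 was the metric
$ds^2 = dx^2 + dy^2 + 2\cos\phi\,dx\,dy$.
Writing this in terms of the metric tensor, we have
$ds^2 = g_{\mu\nu}\,dx^\mu\,dx^\nu$
$= g_{xx}\,dx^2 + g_{xy}\,dx\,dy + g_{yx}\,dy\,dx + g_{yy}\,dy^2$
$= g_{xx}\,dx^2 + 2g_{xy}\,dx\,dy + g_{yy}\,dy^2$.
Therefore we have $g_{xy} = \cos\phi$, not $g_{xy} = 2\cos\phi$.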
Dust in a different frame Example 5
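We start with the stress-energy tensor of the cloud of particles, in
the rest frame of the particles.
$$T^{\mu\nu} = \begin{pmatrix}
\rho & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{pmatrix}$$
Under a boost by v in the x direction, the tensor transformation
law gives
$$T'^{\mu\nu} = \begin{pmatrix}
\gamma^2\rho & \gamma^2 v\rho & 0 & 0 \\
\gamma^2 v\rho & \gamma^2 v^2\rho & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{pmatrix}$$
The over-all factor of $\gamma^2$ arises for the reasons previously
described.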
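The result of example 5 can be reproduced by applying the transformation law (4) directly. This is a minimal sketch in units with c = 1; the density and velocity are arbitrary illustrative values, the function names are hypothetical, and the sign of the off-diagonal entries depends on the direction convention chosen for the boost.

```python
import math

def boost_matrix_x(v):
    """Matrix of partial derivatives dx'^mu/dx^kappa for a boost along x (c = 1)."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    return [[g,     g * v, 0.0, 0.0],
            [g * v, g,     0.0, 0.0],
            [0.0,   0.0,   1.0, 0.0],
            [0.0,   0.0,   0.0, 1.0]]

def transform_rank2(L, T):
    """Apply T'^{mu nu} = L^mu_kappa L^nu_lambda T^{kappa lambda}."""
    return [[sum(L[m][k] * L[n][l] * T[k][l]
                 for k in range(4) for l in range(4))
             for n in range(4)]
            for m in range(4)]

rho = 1.0   # mass-energy density in the rest frame (arbitrary units)
v = 0.6     # boost velocity, so gamma^2 = 1.5625
T_rest = [[rho, 0.0, 0.0, 0.0]] + [[0.0] * 4 for _ in range(3)]
T_boosted = transform_rank2(boost_matrix_x(v), T_rest)

gamma2 = 1.0 / (1.0 - v * v)
for row in T_boosted:
    print([round(entry, 4) for entry in row])
print("gamma^2 * rho =", gamma2 * rho)
```

The timelike component picks up two factors of gamma, one from each index, which is the numerical counterpart of the two physical reasons given in section 9.2.4.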
Parity Example 6
The parity transformation is a change of coordinates that looks
like this:
$t' = t$
$x' = -x$
$y' = -y$
$z' = -z$
It turns right-handed screws into left-handed ones, but leaves the
arrow of time unchanged. Under this transformation, the tensor
transformation law tells us that some of the components of the
stress-energy tensor will flip their signs, while others will stay the
same:
no flip   flip      flip      flip
flip      no flip   no flip   no flip
flip      no flip   no flip   no flip
flip      no flip   no flip   no flip
d / Tullio Levi-Civita (1873-1941) worked on models of number systems
possessing infinitesimals and on differential geometry. He invented the
tensor notation, which Einstein learned from his textbook. He was
appointed to prestigious endowed chairs at Padua and the University of
Rome, but was fired in 1938 because he was a Jew and an anti-fascist.
Everything here was based solely on the fact that T was a rank-2
tensor expressed in Minkowski coordinates, and therefore the
same parity properties hold for other rank-2 tensors as well; cf.
example 1, p. 180.
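The flip table in example 6 follows mechanically from the transformation law: for a diagonal change of coordinates, each component of a rank-2 tensor is multiplied by the product of the signs attached to its two indices. A small sketch, with illustrative variable names:

```python
coords = ["t", "x", "y", "z"]
# Diagonal entries of dx'^mu/dx^mu for the parity transformation.
sign = {"t": +1, "x": -1, "y": -1, "z": -1}

# Component (mu, nu) is multiplied by sign[mu] * sign[nu].
table = [["no flip" if sign[m] * sign[n] > 0 else "flip" for n in coords]
         for m in coords]

for row in table:
    print("  ".join(entry.ljust(7) for entry in row))
```

Only the components with exactly one timelike index change sign, because they are the only ones that pick up a single factor of -1.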
9.2.5 Pressure
The stress-energy tensor carries information about pressure. For
example, $T^{xx}$ is the flux in the x direction of x-momentum. This
is simply the pressure, P, that would be exerted on a surface with
its normal in the x direction. Negative pressure is tension, and
this is the origin of the term "tensor," coined by the mathematician
Levi-Civita.
Pressure as a source of gravitational fields Example 7
Because the stress-energy tensor is the source of gravitational
fields in general relativity, we can see that the gravitational field
of an object should be influenced not just by its mass-energy but by
its internal stresses. The very early universe was dominated by
photons rather than by matter, and photons have a much higher
ratio of momentum to mass-energy than matter, so the importance
of the pressure components in the stress-energy tensor
was much greater in that era. In the universe today, the largest
pressures are those found inside atomic nuclei. Inside a heavy
nucleus, the electromagnetic pressure can be as high as $10^{33}$ Pa!
If general relativity's description of pressure as a source of
gravitational fields were wrong, then we would see anomalous effects
in the gravitational forces exerted by heavy elements compared
to light ones. Such effects have been searched for both in the
laboratory and in lunar laser ranging experiments, with results
that agreed with general relativity's predictions.
Kreuzer, Phys. Rev. 169 (1968) 1007. Described in section 3.7.3 of Will,
"The Confrontation between General Relativity and Experiment," relativity.
Bartlett and van Buren, Phys. Rev. Lett. 57 (1986) 21, also described in
Will.
9.2.6 A perfect fluid
The cloud in example 5 had a stress-energy tensor in its own rest
frame that was isotropic, i.e., symmetric with respect to the x, y,
and z directions. The tensor became anisotropic when we switched
out of this frame. If a physical system has a frame in which its
stress-energy tensor is isotropic, i.e., of the form
$$T^{\mu\nu} = \begin{pmatrix}
\rho & 0 & 0 & 0 \\
0 & P & 0 & 0 \\
0 & 0 & P & 0 \\
0 & 0 & 0 & P
\end{pmatrix},$$
we call it a perfect fluid in equilibrium. Although it may contain
moving particles, this special frame is the one in which their
momenta cancel out. In other cases, the pressure need not be isotropic,
and the stress exerted by the fluid need not be perpendicular to the
surface on which it acts. The space-space components of T would
then be the classical stress tensor, whose diagonal elements are the
anisotropic pressure, and whose off-diagonal elements are the shear
stress. This is the reason for calling T the stress-energy tensor.
The perfect fluid form of the stress-energy tensor is extremely
important and common. For example, cosmologists find that it is a
nearly perfect description of the universe on large scales.
We discussed in section ?? the ideas of converting back and forth
between vectors and their corresponding covectors, and of notating
this as the raising and lowering of indices. We can do the same thing
with the two indices of a rank-2 tensor, so that the stress-energy
tensor can be expressed in four different ways: $T^{ab}$, $T_{ab}$,
$T^a{}_b$, and $T_a{}^b$, but the symmetry of T means that there is no
interesting distinction between the final two of these. In special
relativity, the distinctions among the various forms are not especially
fascinating. We can always cover all of spacetime with Minkowski
coordinates, so that the form of the metric is simply a diagonal matrix
with elements $\pm 1$ on the diagonal. As with a rank-1 tensor, raising
and lowering indices on a rank-2 tensor just flips some components and
leaves others alone. The methods for raising and lowering don't
need to be deduced or memorized, since they follow uniquely from
the grammar of index notation, e.g., $T^a{}_b = g_{bc}T^{ac}$. But
there is the potential for a lot of confusion with all the signs, and
in addition there is the fact that some people use a $+---$ signature
while others use $-+++$. Since perfect fluids are so important, I'll
demonstrate how all of this works out in that case.
For a perfect fluid, we can write the stress-energy tensor in the
coordinate-independent form
$T^{ab} = (\rho + P)\,o^a o^b - (o^c o_c)\,P\,g^{ab}$,
where o represents the velocity vector of an observer in the fluid's
rest frame, and $o^2 = o^a o_a = o\cdot o$ equals 1 for our $+---$
signature or $-1$ for the signature $-+++$. For ease of writing, let's
abbreviate the signature factor as $s = o^a o_a$.
Suppose that the metric is diagonal, but its components are
varying, $g_{\alpha\beta} = \operatorname{diag}(sA^2, -sB^2, \ldots)$.
The properly normalized velocity vector of an observer at
(coordinate-)rest is $o^\alpha = (A^{-1}, 0, 0, 0)$. Lowering the index
gives $o_\alpha = (sA, 0, 0, 0)$. The various forms of the
stress-energy tensor then look like the following:
$T_{00} = A^2\rho$        $T_{11} = B^2 P$
$T^0{}_0 = s\rho$        $T^1{}_1 = -sP$
$T^{00} = A^{-2}\rho$        $T^{11} = B^{-2}P$.
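These components can be checked by building the tensor from the coordinate-independent form and lowering indices with the metric. The sketch below keeps only the 00 and 11 components, which is enough because everything is diagonal; the numerical values of ρ, P, A and B are arbitrary, and the function name is hypothetical.

```python
def perfect_fluid_forms(rho, P, A, B, s):
    """Return the (00, 11) components of T^{ab}, T^a_b and T_{ab} for a perfect
    fluid with diagonal metric g = diag(s*A^2, -s*B^2, ...), s = +1 or -1."""
    g_lower = [s * A * A, -s * B * B]      # g_00, g_11
    g_upper = [1.0 / g for g in g_lower]   # inverse of a diagonal metric
    o_upper = [1.0 / A, 0.0]               # observer at coordinate rest
    # T^{ab} = (rho + P) o^a o^b - s P g^{ab}, diagonal components only
    upper = [(rho + P) * o_upper[i] ** 2 - s * P * g_upper[i] for i in range(2)]
    mixed = [g_lower[i] * upper[i] for i in range(2)]        # one index lowered
    lower = [g_lower[i] ** 2 * upper[i] for i in range(2)]   # both indices lowered
    return {"upper": upper, "mixed": mixed, "lower": lower}

rho, P, A, B = 3.0, 0.5, 2.0, 4.0
results = {s: perfect_fluid_forms(rho, P, A, B, s) for s in (+1, -1)}
for s, forms in results.items():
    print("s =", s, forms)
```

For both signatures the fully raised and fully lowered components come out the same, while the mixed components carry the signature factor s.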
Which of these forms is the "real" one, e.g., which form of the 00
component is the one that the observer o actually measures when
she sticks a shovel in the ground, pulls out a certain volume of dirt,
weighs it, and determines $\rho$? The answer is that the index notation
is so slick and well designed that all of them are equally "real," and
we don't need to memorize which actually corresponds to measurements.
When she does this measurement with the shovel, she could
say that she is measuring the quantity $T^{ab}o_a o_b$. But because all
of the a's and b's are paired off, this expression is a rank-0 tensor.
That means that $T^{ab}o_a o_b$, $T_{ab}o^a o^b$, and $T^a{}_b o_a o^b$
are all the same number. If, for example, we have coordinates in which
the metric is diagonal and has elements $\pm 1$, then in all these
expressions the differing signs of the o's are exactly compensated for
by the signs of the T's.
9.2.7 Two simple examples
A rope under tension Example 8
As a real-world example in which the pressure is not isotropic,
consider a rope that is moving inertially but under tension, i.e.,
equal forces at its ends cancel out so that the rope doesn't
accelerate. Tension is the same as negative pressure. If the rope
lies along the x axis and its fibers are only capable of supporting
tension along that axis, then the rope's stress-energy tensor will
be of the form
$$T^{\mu\nu} = \begin{pmatrix}
\rho & 0 & 0 & 0 \\
0 & P & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{pmatrix},$$
where P is negative and equals minus the tension per unit
cross-sectional area.
Conservation of energy-momentum is expressed as (eq. 3, p. 153)
$\partial T^{ab}/\partial x^a = 0$.
Converting the abstract indices to concrete ones, we have
$\partial T^{\mu\nu}/\partial x^\mu = 0$,
where there is an implied sum over $\mu$, and the equation must hold
both in the case where $\nu$ is a label for t and the one where it
refers to x.
In the first case, we have
$\partial T^{tt}/\partial t + \partial T^{xt}/\partial x = 0$,
which is a statement of conservation of energy, energy being the
timelike component of the energy-momentum. The first term is
zero because $\rho$ is constant by virtue of our assumption that the
rope was uniform. The second term is zero because $T^{xt} = 0$.
Therefore conservation of energy is satisfied. This came about
automatically because by writing down a time-independent
expression for the stress-energy, we were dictating a static
equilibrium.
When $\nu$ stands for x, we get an equation that requires the x
component of momentum to be conserved,
$\partial T^{tx}/\partial t + \partial T^{xx}/\partial x = 0$.
This simply says
$\partial P/\partial x = 0$,
meaning that the tension in the rope is constant along its length.
A rope supporting its own weight Example 9
A variation on example 8 is one in which the rope is hanging
and supports its own weight. Although gravity is involved, we
can solve this problem without general relativity, by exploiting the
equivalence principle (section 5.2, p. 104). As discussed in sec-
tion 5.1 on p. 101, an inertial frame in relativity is one that is free-
falling. We dene an inertial frame of reference o, corresponding
to an observer free-falling past the rope, and a noninertial frame

at rest relative to the rope.

Since the rope is hanging in static equilibrium, observer o

a stress-energy tensor that has no time-dependence. The off-
diagonal components vanish in this frame, since there is no mo-
mentum. The stress-energy tensor is

0 P
where the components involving y and z are zero and not shown,
and P is negative as in example 8. We could try to apply the
conservation of energy condition to this stress-energy tensor as
in example 8, but that would be a mistake. As discussed in 7.5
on p. 136, rates of change can only be measured by taking par-
tial derivatives with respect to the coordinates if the coordinates
are Minkowski, i.e., in an inertial frame. Therefore we need to
transform this stress-energy tensor into the inertial frame o.
For simplicity, we restrict ourselves to the Newtonian approximation, so that the change of coordinates between the two frames is
$$t \approx t'$$
$$x \approx x' + \tfrac{1}{2} a t'^2 ,$$
where $a > 0$ if the free-falling observer falls in the negative $x$ direction, i.e., positive $x$ is up. That is, if a point on the rope at a fixed $x'$ is marked with a spot of paint, then free-falling observer o sees the spot moving up, to larger values of $x$, at $t > 0$. Applying the tensor transformation law, we find
$$T^{\mu\nu} = \begin{pmatrix} \rho & \rho a t \\ \rho a t & P + \rho a^2 t^2 \end{pmatrix} .$$
As in example 8, conservation of energy is trivially satisfied. Conservation of momentum gives
$$\frac{\partial T^{tx}}{\partial t} + \frac{\partial T^{xx}}{\partial x} = 0 ,$$
or
$$\rho a + \frac{\partial P}{\partial x} = 0 .$$
Integrating this with respect to $x$, we have
$$P = -\rho a x + \text{constant} .$$
Let the cross-sectional area of the rope be $A$, and let $\mu = \rho A$ be the mass per unit length and $T = -PA$ the tension. We then find
$$T = \mu a x + \text{constant} .$$
Conservation of momentum requires that the tension vary along the length of the rope, just as we expect from Newton's laws: a section of the rope higher up has more weight below it to support.
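As a rough numerical check of example 9, the following Python sketch evaluates the tension profile $T = \mu a x + \text{constant}$ and confirms by a finite difference that $\rho a + \partial P/\partial x = 0$ when $P = -T/A$. The function names, the sample values, and the default of zero for the integration constant are illustrative assumptions, not part of the text.

```python
def tension(x, mu, a, t0=0.0):
    """Tension at height x for mass per unit length mu and acceleration a.

    t0 is the integration constant (the tension at x = 0); zero is an
    arbitrary illustrative default.
    """
    return mu * a * x + t0


def momentum_residual(x, rho, a, area, h=1e-3):
    """Return rho*a + dP/dx, which should vanish, with P = -T/A."""
    mu = rho * area  # mass per unit length

    def pressure(s):
        return -tension(s, mu, a) / area

    # Central finite difference for dP/dx.
    dp_dx = (pressure(x + h) - pressure(x - h)) / (2.0 * h)
    return rho * a + dp_dx


if __name__ == "__main__":
    print(tension(10.0, mu=2.0, a=9.8))
    print(momentum_residual(2.0, rho=3.0, a=9.8, area=0.5))
```

The residual is zero up to rounding for any height, which is just the statement that the tension gradient balances the weight per unit length.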
9.2.8 Energy conditions
The result of example 9 could cause something scary to happen. If we walk up to a clothesline under tension and give it a quick karate chop, we will observe wave pulses propagating away from the chop in both directions, at velocities $v = \pm\sqrt{T/\mu}$. But the result of the example is that this expression increases without limit as $x$ gets larger and larger. At some point, $v$ will exceed the speed of light. (Of course any real rope would break long before this much tension was achieved.) Two things led to the problematic result: (1) we assumed there was no constraint on the possible stress-energy tensor in the rest frame of the rope; and (2) we used a Newtonian approximation to change from this frame to the free-falling frame. In reality, we don't know of any material so stiff that vibrations propagate in it faster than $c$. In fact, all ordinary materials are made of atoms, atoms are bound to each other by electromagnetic forces, and therefore no material made of atoms can transmit vibrations faster than the speed of an electromagnetic wave, $c$.
Based on these conditions, we therefore expect there to be certain constraints on the stress-energy tensor of any ordinary form of matter. For example, we don't expect to find any rope whose stress-energy tensor looks like this:
$$T^{\mu\nu} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & -2 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} ,$$
because here the tensile stress +2 is greater than the mass density 1, which would lead to $|v| = \sqrt{2/1} > 1$. Constraints of this kind are called energy conditions. Hypothetical forms of matter that violate them are referred to as exotic matter; if they exist, they are not made of atoms. This particular example violates an energy condition known as the dominant energy condition, which requires $\rho > 0$ and $|P| < \rho$. There are about five energy conditions that are commonly used, and a detailed discussion of them is more appropriate for a general relativity text. The common ideas that recur in many of them are: (1) that energy density is never negative in any frame of reference, and (2) that there is never a flux of energy propagating at a speed greater than $c$.
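The rope example can be checked with a few lines of Python. This is a minimal sketch in units with $c = 1$, using $v = \sqrt{T/\mu} = \sqrt{-P/\rho}$ for a rope under tension and taking the dominant energy condition in the form $\rho > 0$, $|P| < \rho$; the function names are hypothetical.

```python
import math


def wave_speed(rho, pressure):
    """Speed of a transverse pulse, sqrt(T/mu) = sqrt(-P/rho), with c = 1.

    Assumes the rope is under tension, i.e. pressure < 0.
    """
    return math.sqrt(-pressure / rho)


def satisfies_dominant_energy_condition(rho, pressure):
    """Check rho > 0 and |P| < rho for a rope with T = diag(rho, P, 0, 0)."""
    return rho > 0 and abs(pressure) < rho


if __name__ == "__main__":
    # The forbidden rope from the text: mass density 1, tensile stress 2.
    print(wave_speed(1.0, -2.0), satisfies_dominant_energy_condition(1.0, -2.0))
    # A rope with a more modest tensile stress.
    print(wave_speed(1.0, -0.5), satisfies_dominant_energy_condition(1.0, -0.5))
```

For this simple stress-energy tensor the two tests agree: the condition fails exactly when the pulse speed would reach or exceed 1.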
An energy condition that is particularly simple to express is the trace energy condition (TEC),
$$T^a{}_a \ge 0 ,$$
where we have to have one upper index and one lower index in order to obey the grammatical rules of index notation. In Minkowski coordinates $(t, x, y, z)$, this becomes $T^\mu{}_\mu \ge 0$, with the implied sum over $\mu$ expanding to give
$$T^t{}_t + T^x{}_x + T^y{}_y + T^z{}_z \ge 0 .$$
The left-hand side of this relation, the sum of the main-diagonal elements of a matrix, is called the trace of the matrix, hence the name of this energy condition. Since this book uses the signature $+---$ for the metric, raising the second index changes this to
$$T^{tt} - T^{xx} - T^{yy} - T^{zz} \ge 0 .$$
In example 5 on p. 155, we computed the stress-energy tensor of a cloud of dust, in a frame moving at velocity $v$ relative to the cloud's rest frame. The result was
$$T^{\mu'\nu'} = \begin{pmatrix} \gamma^2\rho & \gamma^2 v\rho & 0 & 0 \\ \gamma^2 v\rho & \gamma^2 v^2\rho & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} .$$
In this example, the trace energy condition is satisfied precisely under the condition $|v| \le 1$, which can be interpreted as a statement that according to the TEC, the mass-energy of the cloud can never be transported at a speed greater than $c$ in any frame.
e / Three lines go in, and three come out. These could be field lines or world lines.
9.3 Gauss's theorem
9.3.1 Integral conservation laws
We've expressed conservation of charge and energy-momentum in terms of zero divergences,
$$\frac{\partial J^a}{\partial x^a} = 0$$
$$\frac{\partial T^{ab}}{\partial x^a} = 0 .$$
These are expressed in terms of derivatives. The derivative of a function at a certain point only depends on the behavior of the function near that point, so these are local statements of conservation. Conservation laws can also be stated globally: the total amount of something remains constant. Taking charge as an example, observer o defines Minkowski coordinates $(t, x, y, z)$, and at a time $t_1$ says that the total amount of charge in some region is
$$q(t_1) = \int_{t_1} J^a \, dS_a ,$$
where the subscript $t_1$ means that the integrand is to be evaluated over the surface of simultaneity $t = t_1$, and $dS_a = (dx\,dy\,dz, 0, 0, 0)$ is an element of 3-volume expressed as a covector (p. 125). The charge at some later time $t_2$ would be given by a similar integral. If charge is conserved, and if our region is surrounded by an empty region through which no charge is coming in or out, then we should have $q(t_2) = q(t_1)$.
9.3.2 A simple form of Gauss's theorem
The connection between the local and global conservation laws is provided by a theorem called Gauss's theorem. In your course on electromagnetism, you learned Gauss's law, which relates the electric flux through a closed surface to the charge contained inside the surface. In the case where no charges are present, it says that the flux through such a surface cancels out. The interpretation is that since field lines only begin or end on charges, the absence of any charges means that the lines can't begin or end, and therefore, as in figure e, any field line that enters the surface (contributing some negative flux) must eventually come back out (creating some positive flux that cancels out the negative). But there is nothing about figure e that requires it to be interpreted as a drawing of electric field lines. It could just as easily be a drawing of the world-lines of some charged particles in 1 + 1 dimensions. The bottom of the rectangle would then be the surface at $t_1$ and the top $t_2$. We have $q(t_1) = 3$ and $q(t_2) = 3$ as well.
For simplicity, let's start with a very restricted version of Gauss's theorem. Let a vector field $J^a$ be defined in two dimensions. (We don't care whether the two dimensions are both spacelike or one spacelike and one timelike; that is, Gauss's theorem doesn't depend on the signature of the metric.) Let $R$ be a rectangular area, and let $S$ be its boundary. Define the flux of the field through $S$ as
$$\Phi = \int_S J^a \, dS_a ,$$
where the integral is to be taken over all four sides, and the covector $dS_a$ points outward. If the field has zero divergence, $\partial J^a/\partial x^a = 0$, then the flux is zero.
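Proof: Define coordinates $x$ and $y$ aligned with the rectangle. Along the top of the rectangle, the element of the surface, oriented outwards, is $dS = (0, dx)$, so the contribution to the flux from the top is
$$\Phi_\text{top} = \int_\text{top} J^y(y_\text{top}) \, dx .$$
At the bottom, an outward orientation gives $dS = (0, -dx)$, so
$$\Phi_\text{bottom} = -\int_\text{bottom} J^y(y_\text{bottom}) \, dx .$$
Using the fundamental theorem of calculus, the sum of these is
$$\Phi_\text{top} + \Phi_\text{bottom} = \int_R \frac{\partial J^y}{\partial y} \, dy \, dx .$$
Adding in the similar expressions for the left and right, we get
$$\Phi = \int_R \left( \frac{\partial J^x}{\partial x} + \frac{\partial J^y}{\partial y} \right) dx \, dy .$$
But the integrand is the divergence, which is zero by assumption, so $\Phi = 0$ as claimed.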
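The rectangle version of the theorem is easy to test numerically. The sketch below sums the outward flux over the four sides with a midpoint rule. The two sample fields are illustrative choices rather than anything from the text: $J = (x^2, -2xy)$ has zero divergence, while $J = (x, y)$ has divergence 2 and is included for contrast, since its flux should equal twice the area of the rectangle.

```python
def flux_through_rectangle(field, x0, x1, y0, y1, n=200):
    """Outward flux of a 2D field through the boundary of a rectangle.

    field(x, y) returns the pair (Jx, Jy). Each side is integrated with a
    midpoint rule using n subintervals.
    """
    dx = (x1 - x0) / n
    dy = (y1 - y0) / n
    total = 0.0
    for i in range(n):
        x = x0 + (i + 0.5) * dx
        y = y0 + (i + 0.5) * dy
        total += field(x, y1)[1] * dx  # top, outward direction +y
        total -= field(x, y0)[1] * dx  # bottom, outward direction -y
        total += field(x1, y)[0] * dy  # right, outward direction +x
        total -= field(x0, y)[0] * dy  # left, outward direction -x
    return total


def divergence_free_field(x, y):
    """Sample field with dJx/dx + dJy/dy = 2x - 2x = 0."""
    return (x * x, -2.0 * x * y)


def radial_field(x, y):
    """Sample field with constant divergence 2."""
    return (x, y)


if __name__ == "__main__":
    print(flux_through_rectangle(divergence_free_field, 0.5, 2.0, -1.0, 3.0))
    print(flux_through_rectangle(radial_field, 0.5, 2.0, -1.0, 3.0))
```

Nothing in the computation refers to a metric, in keeping with the remark that the theorem does not depend on the signature.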
9.3.3 The general form of Gauss's theorem
Although the coordinates were labeled $x$ and $y$, the proof made no use of the metric, so the result is equally valid regardless of the signature. The rectangle could equally well have been a rectangle in 1 + 1-dimensional spacetime. The generalization to $n$ dimensions is also automatic, and everything also carries through without modification if we replace the vector $J^a$ with a tensor such as $T^{ab}$ that has more indices; the extra index $b$ just comes along for the ride. Sometimes, as with Gauss's law in electromagnetism, we are interested in fields whose divergences are not zero. Gauss's theorem then becomes
$$\int_S J^a \, dS_a = \int_R \frac{\partial J^a}{\partial x^a} \, dv ,$$
where $dv$ is the element of $n$-volume. In 3 + 1 dimensions we could use Minkowski coordinates to write the element of 4-volume as $dv = dt\,dx\,dy\,dz$, and even though this expression is written in terms of these specific coordinates, it is actually Lorentz invariant (section 2.5, p. 45).
f / Proof of Gauss's theorem for a region with an arbitrary shape.
The generalization to a region $R$ with an arbitrary shape, figure f, is less trivial. The basic idea is to break up the region into rectangular boxes, f/1. Where the faces of two boxes coincide on the interior of $R$, their own outward directions are opposite. Therefore if we add up the fluxes through the surfaces of all the boxes, the contributions on the interior cancel, and we're left with only the exterior contributions. If $R$ could be dissected exactly into boxes, then this would complete the proof, since the sum of exterior contributions would be the same as the flux through $S$, and the left-hand side of Gauss's theorem would be additive over the boxes, as is the right-hand side.
The difficulty arises because a smooth shape typically cannot be built out of bricks, a fact that is well known to Lego enthusiasts who build elaborate models of the Death Star. We could argue on physical grounds that no real-world measurement of the flux can depend on the granular structure of $S$ at arbitrarily small scales, but this feels a little unsatisfying. For comparison, it is not strictly true that surface areas can be treated in this way. For example, if we approximate a unit sphere in three dimensions using smaller and smaller boxes, the limit of the surface area is $6\pi$, which is quite a bit greater than the surface area $4\pi$ of the limiting surface.
Instead, we explicitly consider the nonrectangular pieces at the surface, such as the one in f/2. In this drawing in $n = 2$ dimensions, the top of this piece is approximately a line, and in the limit we'll be considering, where its width becomes an infinitesimally small $dx$, the error incurred by approximating it as a line will be negligible. We define vectors $dx$ and $dx^*$ as shown in the figure. In more than the two dimensions shown in the figure, we would approximate the top surface as an $(n-1)$-dimensional parallelepiped spanned by vectors $dx^*$, $dy^*$, . . . This is the point at which the use of the covector $S_a$ (p. 125) pays off by greatly simplifying the proof. Applying this to the top of the triangle, $dS$ is defined as the linear function that takes a vector $J$ and gives the $n$-volume spanned by $J$ along with $dx^*$, . . .
Call the vertical coordinate on the diagram $t$, and consider the contribution to the flux from $J$'s time component, $J^t$. Because the triangle's size is an infinitesimal of order $dx$, we can approximate $J^t$ as being a constant throughout the triangle, while incurring only an error of order $dx$. (By stating Gauss's theorem in terms of derivatives of $J$, we implicitly assumed it to be differentiable, so it is not possible for it to jump discontinuously.) Since $dS$ depends linearly not just on $J$ but on all the vectors, the difference between the flux at the top and bottom of the triangle is proportional to the area spanned by $J$ and $dx^* - dx$. But the latter vector is in the $t$ direction, and therefore the area it spans when taken with $J^t$ is approximately zero. Therefore the contribution of $J^t$ to the flux through the triangle is zero. To estimate the possible error due to the approximations, we have to count powers of $dx$. The possible variation of $J^t$ over the triangle is of order $(dx)^1$. The covector $dS$ is of order $(dx)^{n-1}$, so the possible error in the flux is of order $(dx)^n$.
Footnote on the use of the covector $S_a$: Here is an example of the ugly complications that occur if one doesn't have access to this piece of technology. In the low-tech approach, in Euclidean space, one defines an element of surface area $d\mathbf{A} = \hat{\mathbf{n}}\,dA$, where the unit vector $\hat{\mathbf{n}}$ is outward-directed with $\hat{\mathbf{n}} \cdot \hat{\mathbf{n}} = 1$. But in a signature such as $+---$, we could have a region $R$ such that over some large area of the bounding surface $S$, the normal direction was lightlike. It would therefore be impossible to scale $\hat{\mathbf{n}}$ so that $\hat{\mathbf{n}} \cdot \hat{\mathbf{n}}$ was anything but zero. As an example of how much work it is to resolve such issues using stone-age tools, see Synge, Relativity: The Special Theory, VIII, 6-7, where the complete argument takes up 22 pages.
This was only an estimate of one part of the flux, the part contributed by the component $J^t$. However, we get the same estimate for the other parts. For example, if we refer to the two dimensions in figure f/2 as $t$ and $x$, then interchanging the roles of $t$ and $x$ in the above argument produces the same error estimate for the contribution from $J^x$.
This is good. When we began this argument, we were motivated to be cautious by our observation that a quantity such as the surface area of $R$ can't be calculated as the limit of the surface area as approximated using boxes. The reason we have that problem for surface area is that the error in the approximation on a small patch is of order $(dx)^{n-1}$, which is an infinitesimal of the same order as the surface area of the patch itself. Therefore when we scale down the boxes, the error doesn't get small compared to the total area. But when we consider flux, the error contributed by each of the irregularly shaped pieces near the surface goes like $(dx)^n$, which is of the order of the $n$-volume of the piece. This volume goes to zero in the limit where the boxes get small, and therefore the error goes to zero as well. This establishes the generalization of Gauss's theorem to a region $R$ of arbitrary shape.
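The contrast between surface area and flux can be seen in a two-dimensional analogue, sketched below under stated assumptions. A unit disk is approximated by the union of grid squares of side $1/n$ whose centers lie inside it. The perimeter of this staircase region stays at 8 however fine the grid, rather than approaching $2\pi$, while the flux of the sample field $J = (x, y)$ through the same staircase boundary approaches $2\pi$, which is the divergence 2 times the area of the disk. The function name and the choice of field are illustrative.

```python
import math


def staircase_boundary(n):
    """Perimeter and outward flux of J = (x, y) for a box-built unit disk.

    The region is the union of grid cells of side 1/n whose centers lie
    inside the unit circle. Returns the pair (perimeter, flux).
    """
    h = 1.0 / n

    def inside(i, j):
        cx, cy = (i + 0.5) * h, (j + 0.5) * h
        return cx * cx + cy * cy <= 1.0

    perimeter = 0.0
    flux = 0.0
    for i in range(-n, n):
        for j in range(-n, n):
            if not inside(i, j):
                continue
            cx, cy = (i + 0.5) * h, (j + 0.5) * h
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                if inside(i + di, j + dj):
                    continue  # shared interior face: no boundary here
                # Midpoint of the exposed edge; (di, dj) is its outward normal.
                ex, ey = cx + 0.5 * h * di, cy + 0.5 * h * dj
                perimeter += h
                flux += (ex * di + ey * dj) * h
    return perimeter, flux


if __name__ == "__main__":
    for n in (10, 20, 40):
        print(n, staircase_boundary(n), 2.0 * math.pi)
```

Refining the grid improves the flux but never the perimeter, which is the two-dimensional version of the counting of powers of $dx$ in the argument above.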
9.3.4 The energy-momentum vector
Einstein's celebrated $E = mc^2$ is a special case of the statement that energy-momentum is conserved, transforms like a four-vector, and has a norm $m$ equal to the rest mass. Section 4.4 on p. 84 explored some of the problems with Einstein's original attempt at a proof of this statement, but only now are we prepared to completely resolve them. One of the problems was the definitional one of what we mean by the energy-momentum of a system that is not composed of pointlike particles. The answer is that for any phenomenon that carries energy-momentum, we must decide how it contributes to the stress-energy tensor. For example, the stress-energy tensor of the electric and magnetic fields is described in section 10.6 on p. 182.
g / Conservation of the integrated energy-momentum vector.
h / Lorentz transformation of the integrated energy-momentum vector.
For the reasons discussed in section 4.4 on p. 84, it is necessary to assume that energy-momentum is locally conserved, and also that the system being described is isolated. Local conservation is described by the zero-divergence property of the stress-energy tensor, $\partial T^{ab}/\partial x^a = 0$. Once we assume local conservation, figure g shows how to prove conservation of the integrated energy-momentum vector using Gauss's theorem. Fix a frame of reference o. Surrounding the system, shown as a dark stream flowing through spacetime, we draw a box. The box is bounded on its past side by a surface that o considers to be a surface of simultaneity $s_A$, and likewise on the future side $s_B$. It doesn't actually matter if the sides of the box are straight or curved according to o. What does matter is that because the system is isolated, we have enough room so that between the system and the sides of the box there can be a region of vacuum, in which the stress-energy tensor vanishes.
Observer o says that at the initial time corresponding to $s_A$, the total amount of energy-momentum in the system was
$$p_A^\mu = -\int_{s_A} T^{\mu\nu} \, dS_\nu ,$$
where the minus sign occurs because we take $dS_\nu$ to point outward, for compatibility with Gauss's theorem, and this makes it antiparallel to the velocity vector o, which is the opposite of the orientation defined in equations (2) on p. 153. At the final time we have
$$p_B^\mu = \int_{s_B} T^{\mu\nu} \, dS_\nu ,$$
with a plus sign because the outward direction is now the same as the direction of o. Because of the vacuum region, there is no flux through the sides of the box, and therefore by Gauss's theorem $p_B^\mu - p_A^\mu = 0$. The energy-momentum vector has been globally conserved according to o.
We also need to show that the integrated energy-momentum transforms properly as a four-vector. To prove this, we apply Gauss's theorem to the region shown in figure h, where $s_C$ is a surface of simultaneity according to some other observer o′. Gauss's theorem tells us that $p_C = p_A$, which means that the energy-momentum on the two surfaces is the same vector in the absolute sense, but this doesn't mean that the two vectors have the same components as measured by different observers. Observer o says that $s_A$ is a surface of simultaneity, and therefore considers $p_A$ to be the total energy-momentum at a certain time. She says the total mass-energy is $p_A^a o_a$ (eq. (2a), p. 153), and similarly for the total momentum in the three spatial directions $s_1$, $s_2$, and $s_3$ (eq. (2b)). Observer o′, meanwhile, considers $s_C$ to be a surface of simultaneity, and has the same interpretations for quantities such as $p_C^a o'_a$. But this is just a way of saying that $p_A^a$ and $p_C^a$ are related to each other by a change of basis from $(o, s_1, s_2, s_3)$ to $(o', s'_1, s'_2, s'_3)$. A change of basis like this is just what we mean by a Lorentz transformation, so the integrated energy-momentum $p$ transforms as a four-vector.
i / These three rulers represent three choices of coordinates.
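To make the four-vector property concrete, here is a small sketch of a boost applied to an energy-momentum vector in 1 + 1 dimensions, in units with $c = 1$. The components change from frame to frame, while the norm $m$ stays fixed. The sample vector $(E, p) = (5, 3)$ and the function names are illustrative assumptions.

```python
import math


def boost(p, v):
    """Lorentz-boost an energy-momentum vector p = (E, px) by velocity v.

    Units with c = 1; requires |v| < 1.
    """
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    energy, px = p
    return (gamma * (energy - v * px), gamma * (px - v * energy))


def norm(p):
    """Invariant norm m = sqrt(E^2 - px^2) in the + - signature."""
    energy, px = p
    return math.sqrt(energy * energy - px * px)


if __name__ == "__main__":
    p = (5.0, 3.0)
    print(norm(p))                       # rest mass in the original frame
    print(boost(p, 0.6))                 # components in the frame moving at v = 0.6
    print(norm(boost(p, 0.6)))           # same rest mass
```

Boosting by $v = p/E = 0.6$ lands in the rest frame of the system, where the momentum vanishes and the energy equals the norm.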
9.4 The covariant derivative
In this optional section we deal with the issues raised in section 7.5 on p. 136. We noted there that in non-Minkowski coordinates, one cannot naively use changes in the components of a vector as a measure of a change in the vector itself. A constant scalar function remains constant when expressed in a new coordinate system, but the same is not true for a constant vector function, or for any tensor of higher rank. This is because the change of coordinates changes the units in which the vector is measured, and if the change of coordinates is nonlinear, the units vary from point to point. This topic doesn't logically belong in this chapter, but I've placed it here because it can't be discussed clearly without already having covered tensors of rank higher than one.
Consider the one-dimensional case, in which a vector $v^a$ has only one component, and the metric is also a single number, so that we can omit the indices and simply write $v$ and $g$. (We just have to remember that $v$ is really a vector, even though we're leaving out the upper index.) If $v$ is constant, its derivative $dv/dx$, computed in the ordinary way without any correction term, is zero. If we further assume that the metric is simply the constant $g = 1$, then zero is not just the answer but the right answer.
Now suppose we transform into a new coordinate system $X$, and the metric $G$, expressed in this coordinate system, is not constant. Applying the tensor transformation law, we have $V = v\,dX/dx$, and differentiation with respect to $X$ will not give zero, because the factor $dX/dx$ isn't constant. This is the wrong answer: $V$ isn't really varying, it just appears to vary because $G$ does.
We want to add a correction term onto the derivative operator $d/dX$, forming a new derivative operator $\nabla_X$ that gives the right answer. $\nabla_X$ is called the covariant derivative. This correction term is easy to find if we consider what the result ought to be when differentiating the metric itself. In general, if a tensor appears to vary, it could vary either because it really does vary or because the metric varies. If the metric itself varies, it could be either because the metric really does vary or . . . because the metric varies. In other words, there is no sensible way to assign a nonzero covariant derivative to the metric itself, so we must have $\nabla_X G = 0$. The required correction therefore consists of replacing $d/dX$ with
$$\nabla_X = \frac{d}{dX} - G^{-1}\frac{dG}{dX} .$$
Applying this to $G$ gives zero. $G$ is a second-rank tensor with two lower indices. If we apply the same correction to the derivatives of other tensors of this type, we will get nonzero results, and they will be the right nonzero results.
j / Example 10.
Mathematically, the form of the derivative is $(1/y)\,dy/dx$, which is known as a logarithmic derivative, since it equals $d(\ln y)/dx$. It measures the multiplicative rate of change of $y$. For example, if $y$ scales up by a factor of $k$ when $x$ increases by 1 unit, then the logarithmic derivative of $y$ is $\ln k$. The logarithmic derivative of $e^{cx}$ is $c$. The logarithmic nature of the correction term to $\nabla_X$ is a good thing, because it lets us take changes of scale, which are multiplicative changes, and convert them to additive corrections to the derivative operator. The additivity of the corrections is necessary if the result of a covariant derivative is to be a tensor, since tensors are additive creatures.
What about quantities that are not second-rank covariant tensors? Under a rescaling of coordinates by a factor of $k$, covectors scale by $k^{-1}$, and second-rank tensors with two lower indices scale by $k^{-2}$. The correction term should therefore be half as much for covectors,
$$\nabla_X = \frac{d}{dX} - \frac{1}{2} G^{-1}\frac{dG}{dX} ,$$
and should have an opposite sign for vectors.
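The one-dimensional argument can be tried out numerically. In the sketch below, a constant vector $v = 1$ in a coordinate $x$ with $g = 1$ is re-expressed in a new coordinate $X$ defined by $dx/dX = 1 + X^2$, an arbitrary illustrative choice, so that $V = 1/(1+X^2)$ and $G = (1+X^2)^2$. The ordinary derivative $dV/dX$ is nonzero, while the corrected derivative for a vector, $dV/dX + \frac{1}{2}G^{-1}(dG/dX)V$, vanishes. Derivatives are taken by central differences, and all names are hypothetical.

```python
def derivative(f, X, h=1e-5):
    """Central-difference estimate of df/dX."""
    return (f(X + h) - f(X - h)) / (2.0 * h)


def covariant_derivative_vector(V, G, X):
    """One-dimensional covariant derivative of a vector component V(X).

    Uses the correction +(1/2) G^{-1} dG/dX, i.e. half the size of the
    correction for the metric and with the opposite sign.
    """
    return derivative(V, X) + 0.5 * derivative(G, X) / G(X) * V(X)


def V_example(X):
    """Constant vector v = 1 expressed in X, where dx/dX = 1 + X^2."""
    return 1.0 / (1.0 + X * X)


def G_example(X):
    """Metric g = 1 expressed in X: G = (dx/dX)^2."""
    return (1.0 + X * X) ** 2


if __name__ == "__main__":
    X = 0.7
    print(derivative(V_example, X))                             # naive result, nonzero
    print(covariant_derivative_vector(V_example, G_example, X)) # zero up to rounding
```

The same cancellation occurs for any smooth, monotonic change of coordinate, since the correction term is the logarithmic derivative of $dx/dX$.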
Generalizing the correction term to derivatives of vectors in more than one dimension, we should have something of this form:
$$\nabla_a v^b = \partial_a v^b + \Gamma^b{}_{ac} v^c$$
$$\nabla_a v_b = \partial_a v_b - \Gamma^c{}_{ba} v_c ,$$
where $\Gamma^b{}_{ac}$, called the Christoffel symbol, does not transform like a tensor, and involves derivatives of the metric. ("Christoffel" is pronounced "Krist-AWful," with the accent on the middle syllable.)
An important gotcha is that when we evaluate a particular component of a covariant derivative such as $\nabla_2 v^3$, it is possible for the result to be nonzero even if the component $v^3$ vanishes identically.
Christoffel symbols on the globe Example 10
As a qualitative example, consider the airplane trajectory shown in figure j, from London to Mexico City. This trajectory is the shortest one between these two points; such a minimum-length trajectory is called a geodesic. In physics it is customary to work with the colatitude, $\theta$, measured down from the north pole, rather than the latitude, measured from the equator. At P, over the North Atlantic, the plane's colatitude has a minimum. (We can see, without having to take it on faith from the figure, that such a minimum must occur. The easiest way to convince oneself of this is to consider a path that goes directly over the pole, at $\theta = 0$.)
At P, the plane's velocity vector points directly west. At Q, over New England, its velocity has a large component to the south. Since the path is a geodesic and the plane has constant speed, the velocity vector is simply being parallel-transported; the vector's covariant derivative is zero. Since we have $v^\theta = 0$ at P, the only way to explain the nonzero and positive value of $\partial_\phi v^\theta$ is that we have a nonzero and negative value of $\Gamma^\theta{}_{\phi\phi}$.
By symmetry, we can infer that $\Gamma^\theta{}_{\phi\phi}$ must have a positive value in the southern hemisphere, and must vanish at the equator.
$\Gamma^\theta{}_{\phi\phi}$ is computed in example 11 on page 170.
Symmetry also requires that this Christoffel symbol be independent of $\phi$, and it must also be independent of the radius of the sphere.
k / Birdtracks notation for the covariant derivative.
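To compute the covariant derivative of a higher-rank tensor, we just add more correction terms, e.g.,
$$\nabla_a U_{bc} = \partial_a U_{bc} - \Gamma^d{}_{ba} U_{dc} - \Gamma^d{}_{ca} U_{bd}$$
or
$$\nabla_a U_b{}^c = \partial_a U_b{}^c - \Gamma^d{}_{ba} U_d{}^c + \Gamma^c{}_{ad} U_b{}^d .$$
With the partial derivative $\partial_\mu$, it does not make sense to use the metric to raise the index and form $\partial^\mu$. It does make sense to do so with covariant derivatives, so $\nabla^a = g^{ab}\nabla_b$ is a correct identity.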
9.4.1 Comma, semicolon, and birdtracks notation
Some authors use superscripts with commas and semicolons to indicate partial and covariant derivatives. The following equations give equivalent notations for the same derivatives:
$$\partial_\mu X_\nu = X_{\nu,\mu}$$
$$\nabla_a X_b = X_{b;a}$$
$$\nabla^a X_b = X_b{}^{;a}$$
Figure k shows two examples of the corresponding birdtracks notation. Because birdtracks are meant to be manifestly coordinate-independent, they do not have a way of expressing non-covariant derivatives.
9.4.2 Finding the Christoffel symbol from the metric
We've already found the Christoffel symbol in terms of the metric in one dimension. Expressing it in tensor notation, we have
$$\Gamma^d{}_{ba} = \frac{1}{2} g^{cd} \left( \partial_? g_{??} \right) ,$$
where inversion of the one-component matrix $G$ has been replaced by matrix inversion, and, more importantly, the question marks indicate that there would be more than one way to place the subscripts so that the result would be a grammatical tensor equation. The most general form for the Christoffel symbol would be
$$\Gamma^b{}_{ac} = \frac{1}{2} g^{db} \left( L\,\partial_c g_{ad} + M\,\partial_a g_{cd} + N\,\partial_d g_{ca} \right) ,$$
where $L$, $M$, and $N$ are constants. Consistency with the one-dimensional expression requires $L + M + N = 1$. The condition $L = M$ arises on physical, not mathematical grounds; it reflects the fact that experiments have not shown evidence for an effect called torsion, in which vectors would rotate in a certain way when transported. The $L$ and $M$ terms have a different physical significance than the $N$ term.
Suppose an observer uses coordinates such that all objects are described as lengthening over time, and the change of scale accumulated over one day is a factor of $k > 1$. This is described by the derivative $\partial_t g_{xx} < 1$, which affects the $M$ term. Since the metric is used to calculate squared distances, the $g_{xx}$ matrix element scales down by $1/\sqrt{k}$. To compensate for $\partial_t v^x < 0$, we need to add a positive correction term, $M > 0$, to the covariant derivative. When the same observer measures the rate of change of a vector $v^t$ with respect to space, the rate of change comes out to be too small, because the variable she differentiates with respect to is too big. This requires $N < 0$, and the correction is of the same size as the $M$ correction, so $|M| = |N|$. We find $L = M = -N = 1$.
Self-check: Does the above argument depend on the use of space for one coordinate and time for the other?
The resulting general expression for the Christoffel symbol in terms of the metric is
$$\Gamma^c{}_{ab} = \frac{1}{2} g^{cd} \left( \partial_a g_{bd} + \partial_b g_{ad} - \partial_d g_{ab} \right) .$$
One can go back and check that this gives $\nabla_c g_{ab} = 0$.
Self-check: In the case of 1 dimension, show that this reduces to the earlier result of $(1/2)\,G^{-1}\,dG/dX$.
$\Gamma$ is not a tensor, i.e., it doesn't transform according to the tensor transformation rules. Since $\Gamma$ isn't a tensor, it isn't obvious that the covariant derivative, which is constructed from it, is tensorial. But if it isn't obvious, neither is it surprising: the goal of the above derivation was to get results that would be coordinate-independent.
Christoffel symbols on the globe, quantitatively Example 11
In example 10 on page 168, we inferred the following properties for the Christoffel symbol $\Gamma^\theta{}_{\phi\phi}$ on a sphere of radius $R$: $\Gamma^\theta{}_{\phi\phi}$ is independent of $\phi$ and $R$, $\Gamma^\theta{}_{\phi\phi} < 0$ in the northern hemisphere (colatitude $\theta$ less than $\pi/2$), $\Gamma^\theta{}_{\phi\phi} = 0$ on the equator, and $\Gamma^\theta{}_{\phi\phi} > 0$ in the southern hemisphere.
The metric on a sphere is $ds^2 = R^2\,d\theta^2 + R^2 \sin^2\theta\,d\phi^2$. The only nonvanishing term in the expression for $\Gamma^\theta{}_{\phi\phi}$ is the one involving $\partial_\theta g_{\phi\phi} = 2R^2 \sin\theta\cos\theta$. The result is $\Gamma^\theta{}_{\phi\phi} = -\sin\theta\cos\theta$, which can be verified to have the properties claimed above.
l / The geodesic, 1, preserves tangency under parallel transport. The non-geodesic curve, 2, doesn't have this property; a vector initially tangent to the curve is no longer tangent to it when parallel-transported along it.
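The result of example 11 can be cross-checked numerically. The sketch below evaluates $\Gamma^c{}_{ab} = \frac{1}{2} g^{cd}(\partial_a g_{bd} + \partial_b g_{ad} - \partial_d g_{ab})$ for the metric of a sphere, taking the derivatives of the metric by central differences. The coordinate ordering $(\theta, \phi)$ with indices 0 and 1, and all function names, are assumptions made for illustration.

```python
import math


def sphere_metric(coords, radius=1.0):
    """Metric of a sphere in (theta, phi): diag(R^2, R^2 sin^2 theta)."""
    theta, _phi = coords
    return [[radius ** 2, 0.0], [0.0, (radius * math.sin(theta)) ** 2]]


def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def metric_derivative(metric, coords, k, h=1e-5):
    """Central-difference estimate of the matrix of d g_ij / d x^k."""
    up, down = list(coords), list(coords)
    up[k] += h
    down[k] -= h
    g_up, g_down = metric(up), metric(down)
    return [[(g_up[i][j] - g_down[i][j]) / (2.0 * h) for j in range(2)]
            for i in range(2)]


def christoffel(metric, coords, c, a, b):
    """Gamma^c_ab from the metric, in two dimensions."""
    g_inv = inverse_2x2(metric(coords))
    dg = [metric_derivative(metric, coords, k) for k in range(2)]  # dg[k][i][j]
    return 0.5 * sum(
        g_inv[c][d] * (dg[a][b][d] + dg[b][a][d] - dg[d][a][b])
        for d in range(2)
    )


if __name__ == "__main__":
    theta = 1.0
    print(christoffel(sphere_metric, (theta, 0.3), 0, 1, 1))
    print(-math.sin(theta) * math.cos(theta))
```

Repeating the calculation with a different radius or a different $\phi$ gives the same value, in line with the symmetry argument of example 10.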
9.4.3 The geodesic equation
In this section we show how the Christoffel symbols can be used to find differential equations that describe inertial motion. The world-line of a test particle is called a geodesic. We defined this term in a nonrelativistic context as the shortest curve between two points. Geodesics play the same role in relativity that straight lines play in Euclidean geometry. A relativistic geodesic minimizes or maximizes the metric distance between two events. A timelike geodesic maximizes the proper time (cf. section 2.4.2, p. 44). In special relativity, geodesics are given by linear equations when expressed in Minkowski coordinates, and the velocity vector of a test particle has constant components when expressed in Minkowski coordinates. In general relativity, Minkowski coordinates don't exist, and geodesics don't have the properties we expect based on Euclidean intuition; for example, initially parallel geodesics may later converge or diverge.
Characterization of the geodesic
A geodesic can be defined as a world-line that preserves tangency under parallel transport (figure l). This is essentially a mathematical way of expressing the notion that we have previously expressed more informally in terms of "staying on course" or moving "inertially."
A curve can be specified by giving functions $x^i(\lambda)$ for its coordinates, where $\lambda$ is a real parameter. A vector lying tangent to the curve can then be calculated using partial derivatives, $T^i = \partial x^i/\partial\lambda$. There are three ways in which a vector function of $\lambda$ could change: (1) it could change for the trivial reason that the metric is changing, so that its components changed when expressed in the new metric; (2) it could change its components perpendicular to the curve; or (3) it could change its component parallel to the curve. Possibility 1 should not really be considered a change at all, and the definition of the covariant derivative is specifically designed to be insensitive to this kind of thing. 2 cannot apply to $T^i$, which is tangent by construction. It would therefore be convenient if $T^i$ happened to be always the same length. If so, then 3 would not happen either, and we could reexpress the definition of a geodesic by saying that the covariant derivative of $T^i$ was zero. For this reason, we will assume for the remainder of this section that the parametrization of the curve has this property. In a Newtonian context, we could imagine the $x^i$ to be purely spatial coordinates, and $\lambda$ to be a universal time coordinate. We would then interpret $T^i$ as the velocity, and the restriction would be to a parametrization describing motion with constant speed. In relativity, the restriction is that $\lambda$ must be an affine parameter. For example, it could be the proper time of a particle, if the curve in question is timelike.
Covariant derivative with respect to a parameter
The notation of section 9.4 is not quite adapted to our present purposes, since it allows us to express a covariant derivative with respect to one of the coordinates, but not with respect to a parameter such as $\lambda$. We would like to notate the covariant derivative of $T^i$ with respect to $\lambda$ as $\nabla_\lambda T^i$, even though $\lambda$ isn't a coordinate. To connect the two types of derivatives, we can use a total derivative. To make the idea clear, here is how we calculate a total derivative for a scalar function $f(x, y)$, without tensor notation:
$$\frac{df}{d\lambda} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial\lambda} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial\lambda} .$$
This is just the generalization of the chain rule to a function of two variables. For example, if $\lambda$ represents time and $f$ temperature, then this would tell us the rate of change of the temperature as a thermometer was carried through space. Applying this to the present problem, we express the total covariant derivative as
$$\nabla_\lambda T^i = (\nabla_b T^i)\frac{dx^b}{d\lambda} = \left( \partial_b T^i + \Gamma^i{}_{bc} T^c \right)\frac{dx^b}{d\lambda} .$$
The geodesic equation
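Recognizing $\partial_b T^i\,dx^b/d\lambda$ as a total non-covariant derivative, we find
$$\nabla_\lambda T^i = \frac{dT^i}{d\lambda} + \Gamma^i{}_{bc} T^c \frac{dx^b}{d\lambda} .$$
Substituting $\partial x^i/\partial\lambda$ for $T^i$, and setting the covariant derivative equal to zero, we obtain
$$\frac{d^2 x^i}{d\lambda^2} + \Gamma^i{}_{bc}\frac{dx^c}{d\lambda}\frac{dx^b}{d\lambda} = 0 .$$
This is known as the geodesic equation.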
If this differential equation is satisfied for one affine parameter $\lambda$, then it is also satisfied for any other affine parameter $\lambda' = a\lambda + b$, where $a$ and $b$ are constants (problem 4, p. 174). Recall that affine parameters are only defined along geodesics, not along arbitrary curves. We can't start by defining an affine parameter and then use it to find geodesics using this equation, because we can't define an affine parameter without first specifying a geodesic. Likewise, we can't do the geodesic first and then the affine parameter, because if we already had a geodesic in hand, we wouldn't need the differential equation in order to find a geodesic. The solution to this chicken-and-egg conundrum is to write down the differential equations and try to find a solution, without trying to specify either the affine parameter or the geodesic in advance.
The geodesic equation is useful in establishing one of the necessary theoretical foundations of relativity, which is the uniqueness of geodesics for a given set of initial conditions. If the geodesic were not uniquely determined, then particles would have no way of deciding how to move. The form of the geodesic equation guarantees uniqueness, because one can use it to define an algorithm that constructs a geodesic for a given set of initial conditions.
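One such algorithm, sketched here for the unit sphere rather than for spacetime, is to step the geodesic equation forward from given initial conditions with a standard Runge-Kutta integrator. The sketch uses $\Gamma^\theta{}_{\phi\phi} = -\sin\theta\cos\theta$ from example 11 together with $\Gamma^\phi{}_{\theta\phi} = \Gamma^\phi{}_{\phi\theta} = \cot\theta$, a standard result for the sphere that is not derived in the text. The choice of integrator, the step size, and the function names are illustrative assumptions.

```python
import math


def geodesic_rhs(state):
    """Right-hand side of the geodesic equation on the unit sphere.

    state = (theta, phi, dtheta/dlambda, dphi/dlambda).
    """
    theta, _phi, dtheta, dphi = state
    ddtheta = math.sin(theta) * math.cos(theta) * dphi ** 2
    ddphi = -2.0 * (math.cos(theta) / math.sin(theta)) * dtheta * dphi
    return (dtheta, dphi, ddtheta, ddphi)


def rk4_step(state, h):
    """Advance the state by one fourth-order Runge-Kutta step of size h."""
    def shifted(k, scale):
        return tuple(s + scale * ks for s, ks in zip(state, k))

    k1 = geodesic_rhs(state)
    k2 = geodesic_rhs(shifted(k1, h / 2.0))
    k3 = geodesic_rhs(shifted(k2, h / 2.0))
    k4 = geodesic_rhs(shifted(k3, h))
    return tuple(
        s + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def integrate_geodesic(state, lam_max, steps):
    """Integrate from lambda = 0 to lam_max in the given number of steps."""
    h = lam_max / steps
    for _ in range(steps):
        state = rk4_step(state, h)
    return state


def embed(theta, phi):
    """Cartesian position on the unit sphere for colatitude theta, longitude phi."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))


if __name__ == "__main__":
    # Start on the equator with unit speed, heading north-east.
    start = (math.pi / 2.0, 0.0, -0.6, 0.8)
    end = integrate_geodesic(start, 2.0, 2000)
    print(embed(end[0], end[1]))
    # Great circle through the same point with the same initial velocity.
    print((math.cos(2.0), 0.8 * math.sin(2.0), 0.6 * math.sin(2.0)))
```

The integrated curve follows the great circle fixed by the initial position and velocity, which is the expected geodesic on a sphere. The coordinates are singular at the poles, so this sketch should only be used for paths that stay away from $\theta = 0$ and $\theta = \pi$.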
1 Rewrite the stress-energy tensor of a perfect fluid in SI units.
For air at sea level, compare the sizes of its components.
2 Prove by direct computation that if a rank-2 tensor is sym-
metric when expressed in one Minkowski frame, the symmetry is
preserved under a boost.
3 Consider the following change of coordinates:
$t' = -t$
$x' = x$
$y' = y$
$z' = z$
This is called a time reversal. As in example 6 on p. 155, find the
effect on the stress-energy tensor.
4 Show that if the differential equation for geodesics on page
172 is satisfied for one affine parameter $\lambda$, then it is also satisfied for
any other affine parameter $\lambda' = a\lambda + b$, where a and b are constants.
5 This problem investigates a notational conflict in the
description of the metric tensor using index notation. Suppose that
we have two different metrics, $g_{ab}$ and $g'_{ab}$. The difference of two
rank-2 tensors is also a rank-2 tensor, so we would like the quantity
$\delta g_{ab} = g'_{ab} - g_{ab}$ to be a well-behaved tensor both in its
transformation properties and in its behavior when we manipulate its indices.
Now we also have $g^{ab}$ and $g'^{ab}$, which are defined as the matrix
inverses of their lower-index counterparts; this is a special property
of the metric, not of rank-2 tensors in general. We can then define
$\delta g^{ab} = g'^{ab} - g^{ab}$. (a) Use a simple example to show that $\delta g_{ab}$
and $\delta g^{ab}$ cannot be computed from one another in the usual way
by raising and lowering indices. (b) Find the general relationship
between $\delta g^{ab}$ and $\delta g_{ab}$.
a / A model of a charged particle
and a current-carrying wire,
seen in two different frames of
reference. The relativistic length
contraction is highly exaggerated.
The force on the lone particle is
purely magnetic in 1, and purely
electric in 2.
Chapter 10
Electromagnetism
10.1 Relativity requires magnetism
Figure a/1 is an unrealistic model of a charged particle moving
parallel to a current-carrying wire. What electrical force does the lone
particle in figure a/1 feel? Since the density of "traffic" on the two
sides of the "road" is equal, there is zero overall electrical force on
the lone particle. Each "car" that attracts the lone particle is paired
with a partner on the other side of the road that repels it. If we
didn't know about magnetism, we'd think this was the whole story:
the lone particle feels no force at all from the wire.
Figure a/2 shows what we'd see if we were observing all this from
a frame of reference moving along with the lone charge. Relativity
tells us that moving objects appear contracted to an observer who is
not moving along with them. Both lines of charge are in motion in
both frames of reference, but in frame 1 they were moving at equal
speeds, so their contractions were equal. In frame 2, however, their
speeds are unequal. The dark charges are moving more slowly than
in frame 1, so in frame 2 they are less contracted. The light-colored
charges are moving more quickly, so their contraction is greater now.
The "cars" on the two sides of the "road" are no longer paired off,
so the electrical forces on the lone particle no longer cancel out as
they did in a/1. The lone particle is attracted to the wire, because
the particles attracting it are more dense than the ones repelling it.
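The unequal contractions can be made quantitative with a short calculation. The sketch below assumes units with c = 1, assumes that both lines of charge have the same density n_rest in their own rest frames, and assumes that in frame 1 they move at +u (dark) and -u (light) while the lone particle moves at v; the function names are illustrative. It uses relativistic velocity addition to find each line's speed in frame 2 and multiplies the rest density by the corresponding factor of gamma.

```python
import math

def gamma(v):
    """Lorentz factor for speed v, in units with c = 1."""
    return 1.0 / math.sqrt(1.0 - v * v)

def velocity_in_moving_frame(u, v):
    """Velocity u, as measured in a frame that moves at velocity v."""
    return (u - v) / (1.0 - u * v)

def line_densities(u, v, n_rest=1.0):
    """Charges per unit length of the two lines, as seen in the frame
    of a lone particle moving at v. In frame 1 the dark charges move
    at +u and the light-colored ones at -u (assumed sign convention)."""
    dark = n_rest * gamma(velocity_in_moving_frame(u, v))
    light = n_rest * gamma(velocity_in_moving_frame(-u, v))
    return dark, light

dark1, light1 = line_densities(0.5, 0.0)   # frame 1: lone particle's motion ignored
dark2, light2 = line_densities(0.5, 0.3)   # frame 2: moving with the lone particle
print(dark1, light1)
print(dark2, light2)
```

With v = 0 the two densities come out equal, as in figure a/1. With v in the same direction as the dark charges, the dark line is less dense and the light line denser, and the difference works out to 2 gamma(u) gamma(v) u v times n_rest, which is proportional to both the current and the particle's velocity.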
Now observers in frames 1 and 2 disagree about many things,
but they do agree on concrete events. Observer 2 is going to see
the lone particle drift toward the wire due to the wire's electrical
attraction, gradually speeding up, and eventually hit the wire. If 2
sees this collision, then 1 must as well. But 1 knows that the total
electrical force on the lone particle is exactly zero. There must be
some new type of force. She invents a name for it: magnetism.
Magnetism is a purely relativistic effect. Since relativistic
effects are down by a factor of $v^2$ compared to Newtonian ones, it's
surprising that relativity can produce an effect as vigorous as the
attraction between a magnet and your refrigerator. The explanation
is that although matter is electrically neutral, the cancellation of
electrical forces between macroscopic objects is extremely delicate,
so anything that throws off the cancellation, even slightly, leads to
a surprisingly large force.
b / Fields carry energy.
10.2 Fields in relativity
Based on what we learned in section 10.1, the next natural step
would seem to be to find some way of extending Coulomb's law to
include magnetism. For example, we could try to find a formula
for the magnetic force between charges $q_1$ and $q_2$ based on not just
their relative positions but also on their velocities. The following
considerations, however, tell us not to go down that path.
10.2.1 Time delays in forces exerted at a distance
Relativity forbids Newton's instantaneous action at a distance
(p. 17). Since forces can't be transmitted instantaneously, it
becomes natural to imagine force-effects spreading outward from their
source like ripples on a pond, and we then have no choice but to
impute some physical reality to these ripples. We call them fields,
and they have their own independent existence.
Even empty space, then, is not perfectly featureless. It has
measurable properties. For example, we can drop a rock in order to
measure the direction of the gravitational field, or use a magnetic
compass to find the direction of the magnetic field. This concept
made a deep impression on Einstein as a child. He recalled that as
a five-year-old, the gift of a magnetic compass convinced him that
there was "something behind things, something deeply hidden."
10.2.2 Fields carry energy.
The smoking-gun argument for this strange notion of traveling
force ripples comes from the fact that they carry energy. In
figure b/1, Alice and Betty hold positive charges A and B at some
distance from one another. If Alice chooses to move her charge
closer to Betty's, b/2, Alice will have to do some mechanical work
against the electrical repulsion, burning off some of the calories from
that chocolate cheesecake she had at lunch. This reduction in her
body's chemical energy is offset by a corresponding increase in the
electrical potential energy qV. Not only that, but Alice feels the
resistance stiffen as the charges get closer together and the
repulsion strengthens. She has to do a little extra work, but this is all
properly accounted for in the electrical potential energy.
But now suppose, b/3, that Betty decides to play a trick on Alice
by tossing charge B far away just as Alice is getting ready to move
charge A. We have already established that Alice can't feel charge
B's motion instantaneously, so the electric forces must actually be
propagated by an electric field. Of course this experiment is utterly
impractical, but suppose for the sake of argument that the time it
takes the change in the electric field to propagate across the
diagram is long enough so that Alice can complete her motion before
she feels the effect of B's disappearance. She is still getting stale
information about B's position. As she moves A to the right, she
feels a repulsion, because the field in her region of space is still the
field caused by B in its old position. She has burned some chocolate
cheesecake calories, and it appears that conservation of energy has
been violated, because these calories can't be properly accounted
for by any interaction with B, which is long gone.
If we hope to preserve the law of conservation of energy, then
the only possible conclusion is that the electric field itself carries
away the cheesecake energy. In fact, this example represents an
impractical method of transmitting radio waves. Alice does work
on charge A, and that energy goes into the radio waves. Even if B
had never existed, the radio waves would still have carried energy,
and Alice would still have had to do work in order to create them.
10.2.3 Fields must have transformation laws
In the foregoing discussion I've been guilty of making arguments
that fields were "real." Sorry. In physics, and particularly in
relativity, it's usually a waste of time worrying about whether some
effect such as length contraction is "real" or only "seems that way."
But thinking of fields as having an independent existence does lead
to a useful guiding principle, which is that fields must have
transformation laws. Suppose that at a certain location, observer $o_1$
measures every possible field: electric, magnetic,
bodice-ripper-sexual-attractional, and so on. (The gravitational field is not on the
list, for the reasons discussed in section 5.2.) Observer $o_2$, passing
by the same event but in a different state of motion, could carry
out similar measurements. We're talking about measurements
being carried out on a cubic inch of pure vacuum, but suppose that the
answer to Peggy Lee's famous question is "Yes, that's all there is":
the only information there is to know about that empty parcel of
nothingness is the (frame-dependent) value of the fields it contains.
Then $o_1$ ought to be able to predict the results of $o_2$'s measurements.
For if not, then what is the nature of the information that is hidden
from $o_1$ but revealed to $o_2$? Presumably this would be something
related to how the fields were produced by certain particles long ago
and far away. For example, maybe $o_1$ is at rest relative to a certain
charge q that helped to create the fields, but $o_2$ isn't, so $o_2$ picks
up q's magnetic field, which is information unavailable to $o_1$, who
thinks q was at rest, and therefore didn't make any magnetic field.
This would contradict our "that's all there is" hypothesis.
To show the power of "that's all there is," consider example 1,
p. 150, in which we found that boosting a solenoid along its own axis
doesn't change its internal field. As a fact about solenoids, it's fairly
obscure and useless. But if the fields must have transformation laws,
then we've learned something much more general: a magnetic field
always stays the same under a boost in the direction of the field.
10.3 Electromagnetic fields
10.3.1 The electric field
Section 10.1 showed that relativity requires magnetic forces to
exist, and section 10.2.3 gave us a peek at what this implies about
how electric and magnetic fields transform. To understand this on a
more general basis, let's explicitly list some assumptions about the
electric field and see how they lead to the existence and properties
of a magnetic field:
1. Definition of the electric field: In the frame of reference of an
inertial observer o, take some standard, charged test particle,
release it at rest, and observe the force $F_o$ (section 4.5, p. 86)
acting on the particle. (The timelike component of this force
vanishes.) Then the electric field three-vector E in frame o is
defined by $F_o = qE$, where we fix our system of units by taking
some arbitrary value for the charge q of the test particle.
2. Definition of electric charge: For charges other than the
standard test charge, we take Gauss's law to be our definition of
electric charge.
3. Charge is Lorentz invariant (p. 22).
4. Fields must have transformation laws (section 10.2.3).
Many times already in our study of relativity, we've followed
the strategy of taking a Galilean vector and trying to redefine it
as a four-dimensional vector in relativity. Let's try to do this with
the electric field. Then we would have no other obvious thing to
try than to change its definition to F = qE, where F = ma is the
relativistic force vector (section 4.5, p. 86), so that the electric field
three-vector was just the spacelike part of E. Because $a \cdot v = 0$ for
a material particle, this would imply that E was orthogonal to o
for any observer o. But this is impossible, since then a spacetime
displacement vector s along the direction of E would be a vector of
simultaneity for all observers, and we know that this isn't possible
in relativity.
10.3.2 The magnetic field
Our situation is very similar to the one encountered in section
9.1, p. 149, where we found that knowledge of the charge density
in one frame was insufficient to tell us the charge density in other
frames. There was missing information, which turned out to be
the current density. The problems we've encountered in defining
the transformation properties of the electric field suggest a similar
missing-information situation, and it seems likely that the
missing information is the magnetic field. How should we modify the
assumptions on p. 178 to allow for the existence of a magnetic field
in addition to the electric one? What properties could this
additional field have? How would we define or measure it?
One way of imagining a new type of field would be if, in addition
to charge q, particles had some other characteristic, call it r, and
there were then some entirely separate field defined by its action
on a particle with this "r-ness." But going down this road leads us
to unrelated phenomena such as the strong nuclear interaction.
The nature of the contradiction arrived at in section 10.3.1 is
such that our additional field is closely linked to the electric one,
and therefore we expect it to act on charge, not on r-ness.
Without inventing something new like r-ness, the only other available
property of the test particle is its state of motion, characterized by
its velocity vector v. Now the simplest rule we could imagine for
determining the force on a test particle would be a linear one, which
would look like matrix multiplication:
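$$F = qTv$$
or in index notation,
$$F^a = qT^a{}_b v^b.$$
Although the form $T^a{}_b$ with one upper and one lower index occurs
naturally in this expression, we'll find it more convenient from now
on to work with the upper-upper form $T^{ab}$. T would be $4\times 4$, so it
would have 16 elements:
$$\begin{pmatrix}
T^{tt} & T^{tx} & T^{ty} & T^{tz} \\
T^{xt} & T^{xx} & T^{xy} & T^{xz} \\
T^{yt} & T^{yx} & T^{yy} & T^{yz} \\
T^{zt} & T^{zx} & T^{zy} & T^{zz}
\end{pmatrix}$$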
Presumably these 16 numbers would encode the information about
the electric field, as well as some additional information about the
field or fields we were missing.
But these are not 16 numbers that we can choose freely and
independently. For example, consider a charged particle that is
instantaneously at rest in a certain observer's frame, with v = (1, 0, 0, 0).
(In this situation, the four-force equals the force measured by the
observer.) The work done by a force is positive if the force is in the
same direction as the motion, negative if in the opposite direction,
and zero if there is no motion. Therefore the power P = dW/dt
in this example should be zero. Power is the timelike component of
the force vector, which forces us to take $T^{tt} = 0$.
More generally, consider the kinematical constraint $a \cdot v = 0$
(p. 60). When we require $a \cdot v = 0$ for any v, not just this one, we
end up with the constraint that T must be antisymmetric, meaning
that when we transpose it, the result is another matrix that looks
just like the original one, but with all the signs flipped:
$$\begin{pmatrix}
0 & T^{tx} & T^{ty} & T^{tz} \\
-T^{tx} & 0 & T^{xy} & T^{xz} \\
-T^{ty} & -T^{xy} & 0 & T^{yz} \\
-T^{tz} & -T^{xz} & -T^{yz} & 0
\end{pmatrix}$$
Each element equals minus the corresponding element across the
main diagonal from it, and antisymmetry also requires that the main
diagonal itself be zero. In terms of the concept of degrees of
freedom introduced in section 3.5.3, p. 58, we are down to 6 degrees of
freedom rather than 16.
We now relabel the elements of the matrix and follow up with
a justification of the relabeling. The result is the following rank-2
tensor:
$$T^{ab} = \begin{pmatrix}
0 & -E_x & -E_y & -E_z \\
E_x & 0 & -B_z & B_y \\
E_y & B_z & 0 & -B_x \\
E_z & -B_y & B_x & 0
\end{pmatrix}$$
We'll call this the electromagnetic field tensor. The labeling of the
left column simply expresses the definition of the electric field, which
is expressed in terms of the velocity v = (1, 0, 0, 0) of a particle at
rest. The top row then follows from antisymmetry. For an arbitrary
velocity vector, writing out the matrix multiplication $F^a = qT^a{}_b v^b$
results in expressions such as $F^x = \gamma q(E_x + u_y B_z - u_z B_y)$ (problem 3,
p. 193). Taking into account the difference of a factor of $\gamma$ between
the four-force and the force measured by an observer, we end up
with the familiar Lorentz force law,
$$F_o = q(E + u \times B),$$
where B is the magnetic field. This is expressed in units where c = 1,
so that the electric and magnetic field have the same units. In units
with $c \ne 1$, the magnetic components of the electromagnetic field
matrix should be multiplied by c.
Thus starting only from the assumptions on p. 178, we deduce
that the electric field must be accompanied by a magnetic field.
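As a numerical cross-check of this argument, the sketch below builds an antisymmetric field tensor from E and B, contracts it with a velocity four-vector, and compares the result with the Lorentz force law. It assumes units with c = 1, the signature +--- with Minkowski coordinates (t, x, y, z), and a sign convention for the magnetic entries chosen so that the left column holds E and the Lorentz force law comes out in its familiar form. The function names and sample numbers are illustrative.

```python
import math

METRIC_DIAG = [1.0, -1.0, -1.0, -1.0]   # signature +---, units with c = 1

def field_tensor(E, B):
    """Upper-upper electromagnetic field tensor: left column E,
    top row -E, magnetic field in the space-space block."""
    Ex, Ey, Ez = E
    Bx, By, Bz = B
    return [[0.0, -Ex, -Ey, -Ez],
            [Ex, 0.0, -Bz, By],
            [Ey, Bz, 0.0, -Bx],
            [Ez, -By, Bx, 0.0]]

def four_velocity(u):
    """Velocity four-vector gamma*(1, u) for a three-velocity u."""
    g = 1.0 / math.sqrt(1.0 - sum(c * c for c in u))
    return [g, g * u[0], g * u[1], g * u[2]]

def four_force(q, E, B, u):
    """F^a = q T^{ab} g_{bc} v^c, i.e. the index on v is lowered
    with the diagonal metric before contracting."""
    T = field_tensor(E, B)
    v = four_velocity(u)
    return [q * sum(T[a][b] * METRIC_DIAG[b] * v[b] for b in range(4))
            for a in range(4)]

def lorentz_force(q, E, B, u):
    """Three-force q(E + u x B) measured by the observer."""
    cross = [u[1] * B[2] - u[2] * B[1],
             u[2] * B[0] - u[0] * B[2],
             u[0] * B[1] - u[1] * B[0]]
    return [q * (E[i] + cross[i]) for i in range(3)]

q, E, B, u = 2.0, (1.0, 2.0, 3.0), (0.5, -1.0, 2.0), (0.3, 0.4, -0.2)
F = four_force(q, E, B, u)
v = four_velocity(u)
print([F[i + 1] / v[0] for i in range(3)])   # spatial part divided by gamma
print(lorentz_force(q, E, B, u))
```

The two printed lists should agree. Because the tensor is antisymmetric, the inner product of the four-force with the velocity vector also vanishes, which is the kinematical constraint used above.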
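Parity properties of E and B Example 1
In example 6 on p. 155, we saw that under the parity
transformation $(t, x, y, z) \rightarrow (t, -x, -y, -z)$, any rank-2 tensor expressed
in Minkowski coordinates changes the signs of its components
according to the same rule:
$$\begin{pmatrix}
\text{no flip} & \text{flip} & \text{flip} & \text{flip} \\
\text{flip} & \text{no flip} & \text{no flip} & \text{no flip} \\
\text{flip} & \text{no flip} & \text{no flip} & \text{no flip} \\
\text{flip} & \text{no flip} & \text{no flip} & \text{no flip}
\end{pmatrix}$$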
c / Example 2.
Since this holds for the electromagnetic field tensor T, we find
that under parity, $E \rightarrow -E$ and $B \rightarrow B$. For example, a capacitor
seen in a mirror has its electric field pointing the opposite way, but
there is no change in the magnetic field of a current loop, since
the location of each current element is flipped to the other side
of the loop, but its direction of flow is also reversed, so that the
picture as a whole remains unchanged.
10.3.3 What about gravity?
A funny puzzle pops up if we go back and think about the
assumptions on p. 178 that went into all this. Those assumptions were
so general that it almost seems as though the only possible behavior
for fields is the behavior of electric and magnetic fields. But other
fields do behave differently. How did the assumptions fail in the
case of gravity, for example? Gauss's law (assumption 2) certainly
holds for gravity. But the source of gravitational fields isn't charge,
it's mass-energy, and mass-energy isn't a Lorentz invariant, contrary
to assumption 3. Furthermore, assumption 1 entailed that our field
could be defined in terms of forces measured by an inertial observer,
but for an inertial observer gravity doesn't exist (section 5.2).
10.4 Transformation of the fields
Since we have associated the components of the electric and
magnetic fields with elements of a rank-2 tensor, the transformation law
for these fields now follows from the general tensor transformation
law for rank-2 tensors (p. 154). We first state the general rule, in
a prettified form, and then give some concrete examples. Under a
boost by a three-velocity v, the electric and magnetic fields E and
B transform to E' and B' according to these rules:
$$E'_\parallel = E_\parallel$$
$$E'_\perp = \gamma(E_\perp + v \times B)$$
$$B'_\parallel = B_\parallel$$
$$B'_\perp = \gamma(B_\perp - v \times E)$$
Here the subscripts $\parallel$ and $\perp$ denote the components parallel and
perpendicular to v.
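The claim that these rules are just the rank-2 tensor transformation law in disguise can be checked numerically. The sketch below is written under several assumptions: units with c = 1, a boost along the x axis only, a field tensor laid out with E in the left column and B in the space-space block, and function names that are purely illustrative. It transforms the tensor with two factors of the boost matrix, reads off the new fields, and compares them with the rules written in terms of parallel and perpendicular components.

```python
import math

def field_tensor(E, B):
    """Upper-upper field tensor (assumed layout: left column E)."""
    Ex, Ey, Ez = E
    Bx, By, Bz = B
    return [[0.0, -Ex, -Ey, -Ez],
            [Ex, 0.0, -Bz, By],
            [Ey, Bz, 0.0, -Bx],
            [Ez, -By, Bx, 0.0]]

def boost_matrix_x(v):
    """Lorentz matrix taking components to a frame moving at +v along x."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    return [[g, -g * v, 0.0, 0.0],
            [-g * v, g, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def transform_tensor(T, L):
    """Rank-2 transformation law: T'^{ab} = L^a_c L^b_d T^{cd}."""
    return [[sum(L[a][c] * L[b][d] * T[c][d]
                 for c in range(4) for d in range(4))
             for b in range(4)] for a in range(4)]

def fields_from_tensor(T):
    """Read E from the left column and B from the space-space block."""
    return [T[1][0], T[2][0], T[3][0]], [T[3][2], T[1][3], T[2][1]]

def boost_fields_by_rule(E, B, v):
    """The parallel/perpendicular rules, specialized to a boost along x."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    E_new = [E[0], g * (E[1] - v * B[2]), g * (E[2] + v * B[1])]
    B_new = [B[0], g * (B[1] + v * E[2]), g * (B[2] - v * E[1])]
    return E_new, B_new

E, B, v = (1.0, 2.0, 3.0), (0.5, -1.0, 2.0), 0.6
E_t, B_t = fields_from_tensor(transform_tensor(field_tensor(E, B),
                                               boost_matrix_x(v)))
E_r, B_r = boost_fields_by_rule(E, B, v)
print(E_t, B_t)
print(E_r, B_r)
```

The two printed lines should match. Setting B = 0 and E perpendicular to the boost gives a field that is stronger by a factor of gamma, together with a new magnetic field equal to minus gamma times v cross E.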
A line of charge Example 2
Figure c/1 shows a line of charges. At a given nearby point, it
creates an electric field E that points outward, as measured by
an observer o who is at rest relative to the charges. This field is
represented in the figure by its pattern of field lines, which start
on the charges and radiate outward like the bristles of a bottle
brush. Because the charges are at rest, the magnetic field is
zero. (Finding the magnitude of the field at a certain distance is a
straightforward application of Gauss's law.)
Now consider an observer o', figure c/2, moving at velocity v to
the right relative to o. Without even worrying about how the field
was created, we can transform the fields, at the point in space
discussed previously, into the new frame. The result is $E' = \gamma E$
and $B' = -\gamma v \times E$. In this frame, the electric field is more
intense, and there is also a magnetic field, whose pattern of white
field lines forms circles lying in planes perpendicular to the line.
If we do happen to know that the field was created by
the line of charge, which is moving according to o', then we can
explain these results as arising from two effects. First, the line of
charge has been length-contracted. This causes the density of
charge per unit length to increase by a factor of $\gamma$, with a
proportional increase in the electric field. In the field-line description, we
simply have more charges in the figure, so there are more field
lines coming out of them. Second, the line of charge is moving
to the left in this frame, so it forms an electric current, and this
current is the cause of the magnetic field B'.
10.5 Invariants
We've seen cases before in which an invariant can be formed from
a rank-1 tensor. The square of the proper time corresponding to a
timelike spacetime displacement r is $r \cdot r$ or, in the index notation
introduced in section ??, $r^a r_a$. From the momentum tensor we can
construct the square of the mass $p^a p_a$.
There are good reasons to believe that something similar can
be done with the electromagnetic field tensor, since electromagnetic
fields have certain properties that are preserved when we switch
frames. Specifically, an electromagnetic wave consists of electric
and magnetic fields that are equal in magnitude and perpendicular
to one another. An electromagnetic wave that is a valid solution
to Maxwell's equations in one frame should also be a valid wave in
another frame. It can be shown that the following two quantities
are invariants:
$$P = B^2 - E^2$$
$$Q = E \cdot B$$
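The first of these can also be expressed as $P = \frac{1}{2}T^{ab}T_{ab}$, where T
is the electromagnetic field tensor, while the second equals, up to an
overall sign that depends on index-placement conventions,
$Q = \frac{1}{8}\epsilon^{\kappa\lambda\mu\nu}T_{\kappa\lambda}T_{\mu\nu}$, where the Levi-Civita symbol
$\epsilon^{\kappa\lambda\mu\nu}$ equals +1 if $\kappa\lambda\mu\nu$ are an even permutation of 1234, $-1$ if odd,
and 0 otherwise. For an electromagnetic wave, both P and Q are
zero. (The converse is false: a field for which both invariants vanish
need not be an electromagnetic wave.)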
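A quick numerical experiment makes the invariance plausible. The sketch below assumes units with c = 1, takes the two quantities in question to be the difference of the squared magnitudes of B and E and the dot product of E and B, and uses the standard field transformation rules specialized to a boost along the x axis. The helper names and the sample field values are made up for the illustration.

```python
import math

def boost_fields_x(E, B, v):
    """Fields seen in a frame moving at velocity v along x (c = 1):
    parallel components unchanged, perpendicular ones mixed."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    E_new = (E[0], g * (E[1] - v * B[2]), g * (E[2] + v * B[1]))
    B_new = (B[0], g * (B[1] + v * E[2]), g * (B[2] - v * E[1]))
    return E_new, B_new

def invariants(E, B):
    """Return (B^2 - E^2, E . B)."""
    P = sum(b * b for b in B) - sum(e * e for e in E)
    Q = sum(e * b for e, b in zip(E, B))
    return P, Q

# Arbitrary sample fields, before and after a boost.
E, B = (1.0, 2.0, 3.0), (0.5, -1.0, 2.0)
print(invariants(E, B))
print(invariants(*boost_fields_x(E, B, 0.6)))

# A plane wave moving along x: E along y, B along z, equal magnitudes.
wave = ((0.0, 1.5, 0.0), (0.0, 0.0, 1.5))
print(invariants(*wave))
print(invariants(*boost_fields_x(wave[0], wave[1], 0.6)))
```

Each pair of printed lines should agree to within rounding. For the plane wave both quantities vanish in both frames, even though the boost changes the wave's amplitude.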
10.6 Stress-energy tensor of the electromagnetic field
d / Pressure and tension in electrostatic fields.
The electromagnetic field has a stress-energy tensor associated with
it. From our study of electromagnetism we know that the
electromagnetic field has energy density $U = (E^2 + B^2)/8\pi k$ and
momentum density $S = (E \times B)/4\pi k$ (in units where c = 1, with k being
the Coulomb constant). This fixes the components of the
stress-energy tensor of the form $T^{ta}$ and $T^{at}$, i.e., the top row and left
column, to look like this:
$$T^{ab} = \begin{pmatrix}
U & S_x & S_y & S_z \\
S_x & \cdot & \cdot & \cdot \\
S_y & \cdot & \cdot & \cdot \\
S_z & \cdot & \cdot & \cdot
\end{pmatrix}$$
The following argument tells us something about what to expect
for the components $T^{xx}$, $T^{yy}$, and $T^{zz}$, which are interpreted as
pressures or tensions, depending on their signs. In figure d/1, the
capacitor plates want to collapse against each other in the vertical
(y) direction, but at the same time the internal repulsions within
each plate make that plate want to expand in the x direction. If
the capacitor is built out of materials that hold their shape, then
the electromagnetic tension $T^{yy} < 0$ is counteracted by pressure
$T^{yy} > 0$ in the materials, while the electromagnetic pressure $T^{xx} > 0$
is canceled by the materials' tension $T^{xx} < 0$. We got these results
for a particular physical situation, but relativity requires that the
stress-energy be defined at every point based on the fields at that
point, so our conclusions must hold generally. In d/2 and d/3, white
boxes have been drawn in regions where the total field is strong
and the fields are strongly interacting. In 2, there is tension in
the x direction and pressure in y; the tension can be thought of as
contributing to the attraction between the opposite charges. In 3,
there is also x tension and y pressure; the pressure contributes to
the like charges' repulsion.
To make this more quantitative, consider the discontinuity in E
at the upper plate in figure d/1. The field abruptly switches from
0 on the outside to some value E between the plates. By Gauss's
law, the charge per unit area on the plate must be $\sigma = E/4\pi k$.
The average field experienced by the charge in the plate is
$\bar{E} = (0 + E)/2 = E/2$, so the force per unit area, i.e., the tension in the
field, is $\sigma\bar{E} = E^2/8\pi k$. Thus we expect $T^{yy} = -E^2/8\pi k$ if E is
along the y axis.
For the reader who wants the full derivation of the remaining
nine components of the tensor, we now give an argument that makes
use of the following list of its properties. Other readers can skip
ahead to where the full tensor is presented.
1. T is symmetric, $T^{ab} = T^{ba}$.
2. The components must be second-order in the fields, e.g., we
can have terms like $E_x B_z$, but not first-order terms like $E_x$
or third-order terms like $E_x^2 B_z$. This
is because Maxwell's equations are linear, and when a wave
equation is perfectly linear, the corresponding energy
expression is second-order in the amplitude of the wave.
3. T has the parity properties described in example 6 on p. 155.
4. The electric and magnetic fields are treated symmetrically in
Maxwell's equations, so they should be treated symmetrically
in the stress-energy tensor. E.g., we could have a term like
$7E_x^2 + 7B_x^2$, but not $7E_x^2 + 6B_x^2$.
5. On p. 161 of section 9.2.8, we saw that the trace energy
condition $T^a{}_a \ge 0$ is satisfied by a cloud of dust if and only if
the dust's mass-energy is not transported at a speed greater
than c. In section 4.1, we saw that all ultrarelativistic
particles have the same mechanical properties. Since a cloud of
dust, in the limit where its speed approaches c, is on the edge
of the bound set by the trace energy condition, $T^a{}_a \ge 0$, we
expect that the electromagnetic field, in which disturbances
propagate at c, should also exactly saturate the trace energy
condition, so that $T^a{}_a = 0$.
6. The stress-energy tensor should behave properly under
rotations, which basically means that x, y, and z should be treated
symmetrically.
7. An electromagnetic plane wave propagating in the x direction
should not exert any pressure in the y or z directions.
8. If the field obeys Maxwell's equations, then the energy-conservation
condition $\partial T^{ab}/\partial x^a = 0$ should hold.
These facts are enough to completely determine the form of the
remaining nine components of the stress-energy tensor. Property 3
requires that all of these components be even under parity. Since
electric fields flip under parity but magnetic fields don't (example 1,
p. 180), these components can only have terms like $E_i E_j$ and $B_i B_j$,
not mixed terms like $E_i B_j$. Taking into account properties 4 and 6,
we find that the diagonal terms must look like
$$4\pi k\,T^{xx} = a(E_x^2 + B_x^2) + b(E^2 + B^2),$$
and the off-diagonal ones
$$4\pi k\,T^{xy} = c(E_x E_y + B_x B_y).$$
Property 5 gives $1/2 - a - 3b = 0$ and 7 gives $b = -a/2$, so we have
$a = -1$ and $b = 1/2$. The determination of $c = -1$ is left as an
exercise, problem 4 on p. 193.
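We have now established the complete expression for the
stress-energy tensor of the electromagnetic field, which is
$$T^{\mu\nu} = \begin{pmatrix}
U & S_x & S_y & S_z \\
S_x & -\sigma_{xx} & -\sigma_{xy} & -\sigma_{xz} \\
S_y & -\sigma_{yx} & -\sigma_{yy} & -\sigma_{yz} \\
S_z & -\sigma_{zx} & -\sigma_{zy} & -\sigma_{zz}
\end{pmatrix},$$
where
$$U = \frac{1}{8\pi k}(E^2 + B^2),$$
$$S = \frac{1}{4\pi k}E \times B,$$
and $\sigma$, known as the Maxwell stress tensor, is given by
$$\sigma_{\mu\nu} = \begin{cases}
\dfrac{1}{4\pi k}\left(E_\mu E_\nu + B_\mu B_\nu\right) & \text{if } \mu \ne \nu \\[1ex]
\dfrac{1}{4\pi k}\left(E_\mu^2 + B_\mu^2 - \tfrac{1}{2}(E^2 + B^2)\right) & \text{if } \mu = \nu
\end{cases}$$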
All of this can be expressed more compactly and in a coordinate-
independent way as
, (2)
where o is a future-directed velocity vector, so that $o^a o_a = +1$ for
the signature $+---$ used in this book, and $-1$ if the signature is
$-+++$.
Stress-energy tensor of a plane wave Example 3
Let an electromagnetic plane wave (not necessarily sinusoidal)
propagate along the x axis, with its polarization such that E is in
the y direction and B on the z axis, and |E| = |B| = A. Then we
have the following for the stress-energy tensor.
$$T^{\mu\nu} = \frac{A^2}{4\pi k}\begin{pmatrix}
1 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{pmatrix}$$
The $T^{tt}$ component tells us that the wave has a certain energy
density. Because the wave is massless, we have $E^2 - p^2 = m^2 = 0$
(with E here being energy), so the momentum density is the same as the
energy density, and $T^{tx}$ is the same as $T^{tt}$. If this wave strikes a
surface in the yz plane, the momentum the surface absorbs from the wave
will be felt as a pressure, represented by $T^{xx}$.
In example 5 on p. 155, we saw that a cloud of dust, viewed in a
frame moving at velocity v relative to the dust's rest frame, had
the following stress-energy tensor.
$$T^{\mu\nu} = \begin{pmatrix}
\gamma^2\rho & \gamma^2 v\rho & 0 & 0 \\
\gamma^2 v\rho & \gamma^2 v^2\rho & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{pmatrix}$$
In the ultrarelativistic limit $v \rightarrow 1$, this becomes
$$T^{\mu\nu} = (\text{energy density}) \times \begin{pmatrix}
1 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{pmatrix},$$
which is exactly the same as the result for our electromagnetic
wave. This illustrates the fact discussed in section 4.1 that all
ultrarelativistic particles have the same mechanical properties.
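The plane-wave result can be reproduced by assembling the tensor numerically. The sketch below assumes units with c = 1 and takes as given an energy density (E^2 + B^2)/8 pi k, a momentum density (E x B)/4 pi k, and spatial components equal to minus the Maxwell stress tensor. The function name and the sample numbers are illustrative.

```python
import math

def em_stress_energy(E, B, k=1.0):
    """4x4 stress-energy tensor of the electromagnetic field in
    Minkowski coordinates (t, x, y, z), with Coulomb constant k."""
    E2 = sum(e * e for e in E)
    B2 = sum(b * b for b in B)
    norm = 4.0 * math.pi * k
    T = [[0.0] * 4 for _ in range(4)]
    T[0][0] = (E2 + B2) / (2.0 * norm)            # energy density U
    S = [(E[1] * B[2] - E[2] * B[1]) / norm,      # momentum density S
         (E[2] * B[0] - E[0] * B[2]) / norm,
         (E[0] * B[1] - E[1] * B[0]) / norm]
    for i in range(3):
        T[0][i + 1] = T[i + 1][0] = S[i]
        for j in range(3):
            sigma = E[i] * E[j] + B[i] * B[j]     # Maxwell stress, times 4 pi k
            if i == j:
                sigma -= 0.5 * (E2 + B2)
            T[i + 1][j + 1] = -sigma / norm
    return T

def trace(T):
    """T^a_a for signature +---: time-time minus the spatial diagonal."""
    return T[0][0] - T[1][1] - T[2][2] - T[3][3]

A = 3.0
wave = em_stress_energy((0.0, A, 0.0), (0.0, 0.0, A))
unit = A * A / (4.0 * math.pi)
for row in wave:
    print([round(entry / unit, 12) for entry in row])
print(trace(em_stress_energy((1.0, 2.0, 3.0), (0.5, -1.0, 2.0))))
```

For the wave, the only nonzero entries should be the four in the upper-left block, all equal to A^2/4 pi k. The final line checks that the trace vanishes (to rounding) for an arbitrary choice of fields, as required by the trace condition used in the derivation.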
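10.7 Maxwell's equations
10.7.1 Statement and interpretation
In this book I assume that you've had the usual physics
background acquired in a freshman survey course, which includes an
initial, probably frightening, encounter with Maxwell's equations in
integral form. In units with c = 1, Maxwell's equations are:
$$\Phi_E = 4\pi k q \qquad (3a)$$
$$\Phi_B = 0 \qquad (3b)$$
$$\oint \mathbf{E}\cdot d\boldsymbol{\ell} = -\frac{\partial\Phi_B}{\partial t} \qquad (3c)$$
$$\oint \mathbf{B}\cdot d\boldsymbol{\ell} = \frac{\partial\Phi_E}{\partial t} + 4\pi k I \qquad (3d)$$
where
$$\Phi_E = \int \mathbf{E}\cdot d\mathbf{a} \quad\text{and} \qquad (4)$$
$$\Phi_B = \int \mathbf{B}\cdot d\mathbf{a}. \qquad (5)$$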
Equations (3a) and (3b) refer to a closed surface and the charge q
contained inside that surface. Equation (3a), Gauss's law, says that
charges are the sources of the electric field, while (3b) says that
magnetic "charges" don't exist. Equations (3c) and (3d) refer to
a surface like a potato chip, which has an edge or boundary, and
the current I passing through that surface, with the line integrals
being evaluated along that boundary. The right-hand side of
(3c) says that a changing magnetic field produces a curly electric
field, as in a generator or a transformer. The I term in (3d) says
that currents create magnetic fields that curl around them. The
$\partial\Phi_E/\partial t$ term, which says that changing electric fields create
magnetic fields, is necessary so that the equations produce consistent
results regardless of the surfaces chosen, and is also part of the
apparatus responsible for the existence of electromagnetic waves, in
which the changing E field produces the B, and the changing B
makes the E.
Equations (3a) and (3b) have no time-dependence. They
function as constraints on the possible field patterns. Equations (3c)
and (3d) are dynamical laws that predict how an initial field
pattern will evolve over time. It can be shown that if (3a) and (3b) are
satisfied initially, then (3c) and (3d) ensure that they will continue
to be satisfied later. Because the dynamical laws consist of two
vector equations, they provide a total of 6 constraints, which are the
number needed in order to predict the behavior of the 6 fields $E_x$,
$E_y$, $E_z$, $B_x$, $B_y$, and $B_z$.
Before Einstein's 1905 paper on relativity, the known laws of
physics were Newton's laws and Maxwell's equations (3a)-(3d).
Experiments such as example 4 on p. 75 show that Newton's laws
are only low-velocity approximations. Maxwell's equations are not
low-velocity approximations; for example, in section 1.3.1 we noted
the evidence that atoms are electrically neutral, in agreement with
Gauss's law, (3a), to one part in $10^{21}$, even though the electrons in
atoms typically have velocities on the order of 1-10% of c.
10.7.3 Incompatibility with Galilean spacetime
Maxwell's equations are not compatible with the Galilean
description of spacetime (section 1.1.2, p. 13). If we assume that
equations (3) hold in some frame o, and then apply a Galilean
boost, transforming the coordinates (t, x, y, z) to
$(t', x', y', z') = (t, x - vt, y, z)$, we find that in frame o' the equations
have a different and more complicated form that cannot be simplified so as
to look like the form they had in o. Rather than writing out the
resulting horrible mess and verifying that it can't be cast back into
the simpler form, an easier way to prove this is to note that there
are solutions to the equations in o that are not solutions after a
Galilean boost into o', if we try to keep the equations in the same
form. For example, if a light wave propagates in the x direction at
speed c in o, then after a boost with v = c, we would have a light
wave in frame o' that was standing still. (This is Einstein's thought
experiment of riding alongside a light wave on a motorcycle, p. 13.)
Such a wave would violate (3c), since the left-hand side would be
nonzero for a surface lying in the xy plane, but the time derivative
on the right-hand side would be zero.
10.7.4 Not manifestly relativistic in their original form
e / A magnetic field that violates $\Phi_B = 0$.
Since Maxwell's equations are not low-velocity approximations
and are incompatible with Galilean relativity, we expect with the
benefit of historical hindsight that they are compatible with the
relativistic picture of spacetime. But when they are expressed in
the form (3), they have two features, either one of which seems
enough to make them completely incompatible with relativity:
(i) They appear to describe instantaneous action at a distance.
For example, Gauss's law, $\Phi_E = 4\pi kq$, relates the electric field
in one place (on the closed surface) to the electric charge
somewhere else (inside the surface). This nonlocal structure smells
wrong relativistically, for the reasons discussed in section 10.2.
(ii) They appear to treat time and space asymmetrically.
What's really happening here is that equations (3) are like a version
of Hamlet written in crayon on a long strip of toilet paper. They are
completely relativistic, but have been written in a form that hides
that fact.
The problem of nonlocality, i, can be shown to be a non-issue
because Maxwell's equations can be reworked into a form in which
they are purely local. The idea is shown in figure e. The magnetic
field lines all form closed loops, except for one of them, which begins
at a point in space and extends off to infinity. Drawing the large
box, 1, we find that $\Phi_B$, the flux of the magnetic field through the
box, is not zero, because a line leaves the box but none come in. But
the same discrepancy could have been detected with the smaller box
2, or in fact with an arbitrarily small box containing the source of
the field line. In other words, the equation $\Phi_B = 0$ is nonlocal, but
if it is to hold for any surface, then it must also hold locally, in the
limit of an arbitrarily small surface. This purely local law of physics
can be expressed using the three-dimensional version of the divergence,
introduced on p. 152:
$$\frac{\partial B_x}{\partial x} + \frac{\partial B_y}{\partial y} + \frac{\partial B_z}{\partial z} = 0$$
Of Maxwell's four equations, both (3a) and (3b) can be reexpressed
in this way. This book neither presents the full machinery of vector
calculus nor assumes previous knowledge of it, but a similar limiting
procedure can also be applied to equations (3c) and (3d), using an
operator called the curl.
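The limiting procedure for the divergence can be tried out numerically. This is a minimal sketch, not a general-purpose tool: the two test fields, the sample point, and the function names are hypothetical, and the second field is deliberately not a physically allowed magnetic field. The code computes the outward flux through a small cube and divides by the cube's volume; as the cube shrinks, the ratio approaches the divergence at the center.

```python
def flux_through_cube(field, center, side, n=20):
    """Outward flux of a vector field through an axis-aligned cube,
    using the midpoint rule with n x n sample points per face."""
    cx, cy, cz = center
    half = side / 2.0
    da = (side / n) ** 2                      # area of one surface patch
    total = 0.0
    for i in range(n):
        for j in range(n):
            a = -half + (i + 0.5) * side / n  # patch coordinates on a face
            b = -half + (j + 0.5) * side / n
            # faces whose outward normals are +x and -x
            total += field(cx + half, cy + a, cz + b)[0] * da
            total -= field(cx - half, cy + a, cz + b)[0] * da
            # faces whose outward normals are +y and -y
            total += field(cx + a, cy + half, cz + b)[1] * da
            total -= field(cx + a, cy - half, cz + b)[1] * da
            # faces whose outward normals are +z and -z
            total += field(cx + a, cy + b, cz + half)[2] * da
            total -= field(cx + a, cy + b, cz - half)[2] * da
    return total

def loops(x, y, z):
    """Field lines that circle the z axis: the divergence is zero everywhere."""
    return (-y, x, 0.0)

def leaky(x, y, z):
    """Hypothetical test field with divergence 3x^2 + 1 (not a physical B field)."""
    return (x ** 3, y, 0.0)

point = (0.5, 0.2, -0.1)
for side in (1.0, 0.1, 0.01):
    ratio_loops = flux_through_cube(loops, point, side) / side ** 3
    ratio_leaky = flux_through_cube(leaky, point, side) / side ** 3
    print(side, ratio_loops, ratio_leaky)
```

For the closed-loop field the ratio is zero for every box size, while for the second field it settles toward 3(0.5)^2 + 1 = 1.75 as the box shrinks, so a violation of the flux law shows up even in an arbitrarily small box.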
The following example is one in which both problem i and prob-
lem ii turn out not to be problems.
Jumping through a hoop Example 4
f / 1. An electron jumps through a hoop. 2. An alternative surface
spanning the hoop.
Here is an example in which the non-obvious features of Maxwell's
equations prevent the antirelativistic meltdown projected in i. In
figure f/1, an electron jumps back and forth through an imaginary
circular hoop, across which we construct an imaginary flat
surface. Every time the electron pierces the surface, it makes a
momentary spike in the current I, which appears in (3d),
$$\oint \mathbf{B} \cdot d\boldsymbol{\ell} = \frac{\partial \Phi_E}{\partial t} + 4\pi k I .$$
We might expect that this would cause the field B detected on the
edge of the disk to show similar spikes at the same times. But
"same times" implies some notion of simultaneity, and this would
be incompatible with relativity, since the t coordinate being
referred to here is just one observer's notion of time. Furthermore,
it would seem that information was being transmitted
instantaneously from the center of the disk to its edge, which violates
relativity (p. 17).
Stranger still, we can produce an apparent paradox without even
appealing to relativity. Instead of the flat surface in f/1, we can
pick a dish-shaped one, f/2, with a deep enough curve so that the
electron never crosses it. The current I is always zero according
to this surface, so that no field B would be produced at the rim at
all.
The resolution of all these difficulties lies in the term
$\partial\Phi_E/\partial t$, which we've ignored. With surface 1, the electron
crosses the surface in time $\Delta t$, causing a current $I = e/\Delta t$ but
also causing a change in the flux from $\Phi_E = 2\pi k e$ to
$\Phi_E = -2\pi k e$. The result is that the right-hand side of the
equation is nearly zero. With surface 2, $I = 0$ and
$\partial\Phi_E/\partial t \approx 0$, so the right-hand side is again
nearly zero.
When the approximations used above are eliminated, Maxwell's
equations do predict a nonvanishing field, which is the expected
electromagnetic wave propagating away from the electron at the
proper speed c.
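The cancellation for the flat surface can be checked with a short calculation. What follows is a sketch under stated assumptions: the hoop radius, the crossing time `dt`, the starting and ending heights, and the unit choices k = 1 and e = 1 are all hypothetical, and the closed-form flux is the standard result for a point charge on the axis of a disk, verified here by a numerical integral. The electron is taken to cross from the +z side to the -z side of a surface whose normal points in the +z direction.

```python
import math

K_COULOMB = 1.0   # Coulomb constant k in arbitrary units (assumption)
E_CHARGE = 1.0    # magnitude e of the electron charge, arbitrary units (assumption)

def flux_through_disk(q, z, radius, k=K_COULOMB):
    """Flux of E in the +z direction through a disk in the plane z = 0,
    due to a point charge q on the disk's axis at height z."""
    if z == 0:
        raise ValueError("charge lies in the surface; flux is undefined")
    sign = 1.0 if z > 0 else -1.0
    return -sign * 2.0 * math.pi * k * q * (1.0 - abs(z) / math.hypot(z, radius))

def flux_numeric(q, z, radius, k=K_COULOMB, n=100000):
    """Midpoint-rule integral of E_z over the disk, as a check of the closed form."""
    dr = radius / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        e_z = -k * q * z / (r * r + z * z) ** 1.5   # z component of E on the disk
        total += e_z * 2.0 * math.pi * r * dr
    return total

radius = 1.0
z_before, z_after = 1e-6, -1e-6   # electron moves from the +z side to the -z side
dt = 1e-3                         # hypothetical crossing time
q = -E_CHARGE                     # the electron's charge

# A charge -e crossing in the -z direction counts as a current +e/dt
# through a surface whose normal is +z.
current = E_CHARGE / dt
dflux_dt = (flux_through_disk(q, z_after, radius)
            - flux_through_disk(q, z_before, radius)) / dt

rhs = dflux_dt + 4.0 * math.pi * K_COULOMB * current
print(flux_through_disk(q, z_before, radius),
      flux_through_disk(q, z_after, radius), rhs)
```

The flux goes from nearly +2πke to nearly -2πke, so the flux term is close to -4πke/Δt and almost exactly cancels the current term; the small remainder shrinks as the starting and ending points move closer to the surface.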
10.7.5 Lorentz invariance
Example 4 might seem like a just-so story, but the apparently
miraculous resolution is not a coincidence. It happens because
Maxwell's equations are in fact invariant under a Lorentz
transformation, even though that isn't obvious when they're written
in the form (3a)-(3d). There are various ways of showing this:
• Einstein did it by brute force in his 1905 paper on relativity,
  by transforming the coordinates through a Lorentz transformation
  and the fields as in section 10.4.
• Maxwell's equations are basically wave equations. (They have
  both wave solutions and static solutions.) We can verify that
  when we start with a sinusoidal plane wave in one frame, then
  transform into another frame, the result is again a valid
  sine-wave solution, having been subjected to a Doppler shift
  (section 3.2) and aberration (section 6.5). This requires checking
  that the wave is still purely transverse, but that follows easily
  from examining the invariants described in section 10.5.
  By a celebrated mathematical result called Fourier's theorem,
  any well-behaved wave can be written as a sum of sine waves,
  and therefore any wave solution of Maxwell's equations in one
  frame is also a solution in every other frame.
• Maxwell's equations can be rewritten in terms of tensors, obeying
  all the grammatical rules of index gymnastics. If they can
  be written in this form, they are automatically Lorentz invariant.
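The transversality check mentioned in the second approach can be sketched numerically. The field transformation used below is the standard textbook rule for a boost with velocity v in units where c = 1; it is assumed here to agree with the one in section 10.4, which is not reproduced in this passage. The wave amplitudes, the boost velocity, and the function names are hypothetical choices for the example.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

def boost_fields(e, b, v):
    """Transform (E, B) into a frame moving with velocity v (units with c = 1).
    Standard textbook form; assumed to match the book's section 10.4."""
    gamma = 1.0 / math.sqrt(1.0 - dot(v, v))
    coeff = gamma * gamma / (gamma + 1.0)
    vxb, vxe = cross(v, b), cross(v, e)
    ve, vb = dot(v, e), dot(v, b)
    e_new = tuple(gamma * (e[i] + vxb[i]) - coeff * v[i] * ve for i in range(3))
    b_new = tuple(gamma * (b[i] - vxe[i]) - coeff * v[i] * vb for i in range(3))
    return e_new, b_new

# Plane wave traveling in the z direction: E along x, B along y, equal magnitudes.
e = (1.0, 0.0, 0.0)
b = (0.0, 1.0, 0.0)
v = (0.3, -0.4, 0.5)          # hypothetical boost velocity, |v| < 1

e2, b2 = boost_fields(e, b, v)
e_dot_b = dot(e2, b2)                     # invariant E.B
e2_minus_b2 = dot(e2, e2) - dot(b2, b2)   # invariant E^2 - B^2
print(e_dot_b, e2_minus_b2)
```

Both invariants remain zero after the boost, so in the new frame the fields are still perpendicular to each other and equal in magnitude, as they must be for a transverse plane wave.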
The last approach is the most general and elegant, so we'll provide
a brief sketch of how it works. Equation (3a) has $4\pi k$ times the
charge on the right, while (3d) has $4\pi k$ times the current. These
both relate to the current four-vector J, so clearly we need to combine
them somehow into a single equation with J on the right. Since
the local form of equation (3a) involves the three-dimensional
divergence, which contains first derivatives, the left-hand side of this
combined equation should have a first derivative in it. Given the
grammatical rules of tensors and index gymnastics, we don't have
many possible ways to accomplish this. The only obvious thing to
try is
$$\frac{\partial \mathcal{F}^{\mu\nu}}{\partial x^\nu} = 4\pi k J^\mu . \qquad (6)$$
Writing this out for $\mu$ being the time coordinate, we get a relation
that equates the divergence of E to $4\pi k$ times the charge density;
this is the local equivalent of (3a). If you've taken vector calculus and
know about the curl operator and Stokes' theorem, then you can
verify that for $\mu$ referring to x, y, and z, we recover the local form
of (3d). The tensorial way of expressing (3b) and (3c) turns out to
be
$$\frac{\partial \mathcal{F}_{\mu\nu}}{\partial x^\lambda} + \frac{\partial \mathcal{F}_{\nu\lambda}}{\partial x^\mu} + \frac{\partial \mathcal{F}_{\lambda\mu}}{\partial x^\nu} = 0 . \qquad (7)$$
g / Example 5.
A generator Example 5
Figure g shows a crude, impractical generator, depicted in two
frames of reference.
Flea 1 is sitting on top of the bar magnet, which creates the
magnetic field pattern shown with the arrows. To her, the bar magnet
is obviously at rest, and this magnetic field pattern is static. As
the square wire loop is dragged away from her and the magnet,
its protons experience a force in the -z direction, as determined
by the Lorentz force law $\mathbf{F} = q\mathbf{v} \times \mathbf{B}$. The electrons, which are
negatively charged, feel a force in the +z direction. The conduction
electrons are free to move, but the protons aren't. In the front
and back sides of the loop, this force is perpendicular to the wire.
In the right and left sides, however, the electrons are free to
respond to the force. Since the magnetic field is weaker on the right
side, current circulates around the loop.
Flea 2 is sitting on the loop, which she considers to be at rest. In
her frame of reference, it's the bar magnet that is moving. Like
flea 1, she observes a current circulating around the loop, but
unlike flea 1, she cannot use magnetic forces to explain this
current. As far as she is concerned, the electrons were initially at
rest. Magnetic forces are forces between moving charges and
other moving charges, so a magnetic field can never accelerate
a charged particle starting from rest. A force that accelerates a
charge from rest can only be an electric force, so she is forced
to conclude that there is an electric field in her region of space.
This field drives electrons around and around in circles; it is a
curly field. What reason can flea 2 offer for the existence of this
electric field pattern? Well, she's been noticing that the magnetic
field in her region of space has been changing, possibly because
that bar magnet over there has been getting farther away. She
observes that a changing magnetic field creates a curly electric
field. Thus the $\partial\Phi_B/\partial t$ term in equation (3c) is not optional; it is
required to exist if Maxwell's equations are to be equally valid in
all frames.
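The two fleas' accounts can be compared with a few lines of arithmetic. This is a sketch with made-up numbers: the loop's velocity, the direction and strength of the magnet's field at the wire, and the test charge are hypothetical, and units are chosen so that c = 1. The rule that a purely magnetic field B appears in a frame moving with velocity v as an electric field E' = γ v × B is the standard transformation, assumed here to agree with section 10.4.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

q = 1.0                       # hypothetical positive test charge (a proton)
v = (1e-3, 0.0, 0.0)          # loop velocity in flea 1's frame, c = 1 (assumption)
b_field = (0.0, -2.0, 0.0)    # hypothetical field of the magnet at the wire

# Flea 1: the charge is carried along with the loop, so it feels q v x B.
force_flea1 = tuple(q * c for c in cross(v, b_field))

# Flea 2: the charge is at rest, so only an electric field can push it.
# With E = 0 in flea 1's frame, the transformed field is E' = gamma v x B.
gamma = 1.0 / math.sqrt(1.0 - sum(c * c for c in v))
e_prime = tuple(gamma * c for c in cross(v, b_field))
force_flea2 = tuple(q * c for c in e_prime)

ratio = force_flea2[2] / force_flea1[2]
print(force_flea1, force_flea2, ratio)
```

With these choices the force on a positive charge points in the -z direction in both frames, and the two descriptions differ only by the factor γ, which is indistinguishable from 1 at the low speed of a hand-dragged loop.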
Einstein opens his 1905 paper on relativity ("Zur Elektrodynamik
bewegter Körper," Annalen der Physik 17 (1905) 891; translation by
Perrett and Jeffery) with this sentence: "It is known that Maxwell's
electrodynamics, as usually understood at the present time, when
applied to moving bodies, leads to asymmetries which do not appear
to be inherent in the phenomena." He then gives essentially this
example. Although the observers in frames 1 and 2 agree on all
physical measurements, their explanations of the physical
mechanisms, couched in the language of Maxwell's equations in the
form (3), are completely different. In relativistic language, flea 2's
explanation can be written in terms of equation (7), in the case
where the indices are x, z, and t:
$$\frac{\partial \mathcal{F}_{xz}}{\partial t} + \frac{\partial \mathcal{F}_{zt}}{\partial x} = 0 ,$$
which is the same as
$$\frac{\partial B_y}{\partial t} - \frac{\partial E_z}{\partial x} = 0 .$$
Because the first term is negative, the second term must be
positive. Since equations (6) and (7) are written in terms of tensors,
obeying the grammatical rules of index gymnastics, we are
guaranteed that they give consistent predictions in all frames of
reference.
Conservation of charge and energy-momentum Example 6
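Solving equation (6) for the current vector, we have
$$J^\mu = \frac{1}{4\pi k} \frac{\partial \mathcal{F}^{\mu\nu}}{\partial x^\nu} .$$
Conservation of charge (section 9.1.2, p. 152) can be expressed
as
$$\frac{\partial J^\mu}{\partial x^\mu} = 0 .$$
If we substitute the first equation into the second, we obtain
$$\frac{\partial}{\partial x^\mu} \left( \frac{\partial \mathcal{F}^{\mu\nu}}{\partial x^\nu} \right) = 0$$
$$\frac{\partial^2 \mathcal{F}^{\mu\nu}}{\partial x^\mu \partial x^\nu} = 0 ,$$
with a sum over both $\mu$ and $\nu$. But this equation is automatically
satisfied because $\mathcal{F}$ is antisymmetric, so for every combination of
indices $\mu$ and $\nu$, the term involving $\mathcal{F}^{\mu\nu}$ is canceled by one
containing $\mathcal{F}^{\nu\mu} = -\mathcal{F}^{\mu\nu}$. Thus conservation of charge does not
have to be added as a supplementary condition in addition to Maxwell's
equations; it is automatically implied by Maxwell's equations.
Using equation (2) on p. 185, one can also prove that Maxwell's
equations imply conservation of energy-momentum.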
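The cancellation between the μν and νμ terms can be seen in a small numerical experiment. This is only an illustrative sketch: it assumes a single plane-wave mode, for which each derivative with respect to a coordinate simply brings down a factor proportional to the corresponding component of a wave vector k, and the random amplitudes, the random wave vector, and the variable names are hypothetical.

```python
import random

random.seed(0)
N = 4  # spacetime dimensions: t, x, y, z

# Random antisymmetric array standing in for the amplitude of the field tensor.
amp = [[0.0] * N for _ in range(N)]
for mu in range(N):
    for nu in range(mu + 1, N):
        value = random.uniform(-1.0, 1.0)
        amp[mu][nu] = value
        amp[nu][mu] = -value

# For a mode proportional to exp(i k.x), the double derivative with respect
# to x^mu and x^nu is proportional to k_mu * k_nu times the amplitude, so the
# summed double derivative is proportional to the contraction below.
k = [random.uniform(-1.0, 1.0) for _ in range(N)]
contraction = sum(k[mu] * k[nu] * amp[mu][nu]
                  for mu in range(N) for nu in range(N))
print(contraction)
```

The sum vanishes (up to rounding) for any wave vector and any antisymmetric amplitude, because the product of the two wave-vector components is symmetric under exchange of the indices while the amplitude changes sign.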
Problems
1 (a) A parallel-plate capacitor has charge per unit area $\pm\sigma$ on
its two plates. Use Gauss's law to find the field between the plates.
(b) In the style of example 2 on p. 181, transform the field to a
frame moving perpendicular to the plates, and verify that the result
makes sense in terms of the sources that are present.
(c) Repeat the analysis for a frame moving parallel to the plates.
2 We've seen examples such as figure a on p. 175 in which a
purely magnetic field in one frame becomes a mixture of magnetic
and electric fields in another, and also cases like example 2 on p. 181
in which a purely electric field transforms to a mixture. Can we have
a case in which a purely electric field in one frame transforms to a
purely magnetic one in another? The easy way to do this problem
is by using invariants.
3 (a) Starting from equation (1) on p. 180 for $\mathcal{F}^{\mu\nu}$, lower
an index to find $\mathcal{F}^\mu{}_\nu$. Assume Minkowski coordinates and metric
signature $+---$.
(b) Let $v = (1, u_x, u_y, u_z)$, where $(u_x, u_y, u_z)$ is the velocity
three-vector. Write out the matrix multiplication
$F^\mu = q \mathcal{F}^\mu{}_\nu v^\nu$, and show as claimed on p. 180 that the
result is the Lorentz force law.
4 On p. 183 I presented a list of properties of the electromagnetic
stress tensor, followed by an argument in which the tensor is
constructed with three unknown constants a, b, and c, to be
determined from those properties. The values of a and b are derived in
the text, and the purpose of this problem is to finish up by proving
that c = 1. The idea is to take the field of a point charge, which
we know satisfies Maxwell's equations, and then apply property 8,
which requires that the energy-conservation condition
$\partial T^{\mu\nu}/\partial x^\mu = 0$ hold. This works out nicely if you apply this
property to the x column of T, at a point that lies in the positive x
direction relative to the charge.
5 Show that the number of independent conditions contained
in equations (6) and (7) agrees with the number found in equations
(3a)-(3d).
6 Show that
$$\frac{\partial \mathcal{F}_{\mu\nu}}{\partial x^\lambda} + \frac{\partial \mathcal{F}_{\nu\lambda}}{\partial x^\mu} + \frac{\partial \mathcal{F}_{\lambda\mu}}{\partial x^\nu} = 0$$
(equation (7), p. 190) implies that the magnetic field has zero
divergence.
7 Write down the fields of an electromagnetic plane wave
propagating in the z direction, choosing some polarization. Do not
assume a sinusoidal wave. Show that this is a solution of
$$\frac{\partial \mathcal{F}^{\mu\nu}}{\partial x^\nu} = 0$$
(equation (6), p. 190, in a vacuum).
Photo Credits
15 Atomic clock on plane: Copyright 1971, Associated Press, used under U.S. fair use exception to copyright law.
18 Ring laser gyroscope: Wikimedia Commons user Nockson, CC-BY-SA licensed.
20 Machine gunner's body: Redrawn from a public-domain photo by Cpl. Sheila Brooks.
20 Machine gunner's head: Redrawn from a sketch by Wenceslas Hollar, 17th century.
21 Minkowski: From a 1909 book, public domain.
28 Muon storage ring at CERN: Copyright 1974 by CERN; used here under the U.S. fair use doctrine.
29 Joan of Arc holding banner: Ingres, 1854.
29 Joan of Arc interrogated: Delaroche, 1856.
75 Oscilloscope trace: From Bertozzi, 1964; used here under the U.S. fair use doctrine.
81 Gamma-ray spectrum: Redrawn from a public-domain image by Kieran Maher and Dirk Hünniger.
104 Eötvös: Unknown source. Since Eötvös died in 1919, the painting itself would be public domain if done from life. Under U.S. law, this makes photographic reproductions of the painting public domain.
105 Artificial horizon: NASA, public domain.
106 Pound and Rebka photo: Harvard University. I presume this photo to be in the public domain, since it is unlikely to have had its copyright renewed.
113 Surfer: Redrawn from a photo by Jon Sullivan, CC0 license.
133 Lambert projection: Eric Gaba, CC-BY-SA.
156 Levi-Civita: Believed to be public domain. Source:
Index
aberration, 118
abstract index notation, 121
equivalent to birdtracks, 121
acceleration
proper, 55, 57
acceleration vector, 57
affine parameter, 125
birdtracks, 110
birdtracks notation, 110
equivalent to abstract index notation, 121
black hole, 92
Bohr model, 145
boost
defined, 21
Bridgman, P.W., 25
Brown-Bethe scenario, 92
causality, 15, 40
Chandrasekhar limit, 90
Christoffel symbol, 168
clock-comparison experiments, 109
coordinates
Minkowski, 20
correspondence principle
defined, 15
for time dilation, 15
covariant derivative, 136, 167
in relativity, 167
covector, 111
curl, 188
current vector, 149
Cvitanović, Predrag, 110
de Sitter, Willem, 146
degree of freedom, 58
derivative
covariant, 167
in relativity, 167
diffeomorphism, 111
divergence, 152, 188
dust, 154
Eötvös experiments, 104
Einstein summation convention, 122
Einstein synchronization, 19
electron capture, 90
energy
equivalent to mass, 72
event, 12
event horizon, 58
black hole, 92
fine structure constant, 145
four-vector, 56
gamma factor
as an inner product, 60
defined, 25
gamma ray
pair production, 82
garage paradox, 31
geodesic, 168, 171
differential equation for, 171
geodesic equation, 172
Goudsmit, 145
group velocity, 114
hyperbolic motion, 57, 67
inner product, 23
defined, 23
invariant
compared to scalar, 111, 112
defined, 21
Ives-Stilwell experiments, 52
Lambert cylindrical projection, 133
Levi-Civita symbol, 182
Levi-Civita, Tullio, 156
Lewis-Tolman paradox, 63
light cone, 17
Lorentz invariance, 43
Lorentz invariant, see invariant
Lorentz scalar, see scalar
Lorentz transformation, 29
lowering an index, 124
mass
defined, 78
equivalent to energy, 72
metric, 23
Minkowski coordinates, 20
Minkowski, Hermann, 20
natural units, 24
neutron star, 90
normalization, 59
operationalism, 25
pair production, 82
parallel transport, 41
Penrose
graphical notation for tensors, 110
phase velocity, 114
polarization
of light, 83
positron, 74
projection operator, 61
proper acceleration, 57
proper time, 23
pulsar, 91
raising an index, 124
rapidity, 55
Rindler coordinates, 130
scalar, 111, 112
compared to Lorentz invariant, 111
signature, 24
stress-energy tensor, 153
interpretation of, 157
tensor
Penrose graphical notation, 110
Thomas precession, 145
Thomas, Llewellyn, 145
three-vector, 62
Tolman-Oppenheimer-Volkoff limit, 91
torsion, 170
transverse polarization
of light, 83
Uhlenbeck, 145
vector
acceleration, 57
distinguished from covector, 111
Penrose graphical notation, 110
volume
3-volume covector, 125
affine, 124
Voyager space probe, 34
Waage, Harold, 101
Wheeler, John, 102
white dwarf, 89
world-line, 12