
This is a textbook on special relativity, aimed at undergraduates who have already completed a freshman survey course. The treatment of electromagnetism assumes previous exposure to Maxwell's equations in integral form, but no knowledge of vector calculus. As of January 2014, the coverage of topics is essentially complete, but it's still a work in progress.

Mar 07, 2014

© Attribution Non-Commercial (BY-NC)


Special Relativity

Benjamin Crowell

www.lightandmatter.com

Fullerton, California

www.lightandmatter.com

Copyright © 2013 Benjamin Crowell

rev. February 20, 2014

Permission is granted to copy, distribute and/or modify this document under the terms of the Creative Commons Attribution Share-Alike License, which can be found at creativecommons.org. The license applies to the entire text of this book, plus all the illustrations that are by Benjamin Crowell. All the illustrations are by Benjamin Crowell except as noted in the photo credits or in parentheses in the caption of the figure. This book can be downloaded free of charge from www.lightandmatter.com in a variety of formats, including editable formats.

Brief Contents

1 Spacetime 11

2 Foundations (optional) 39

3 Kinematics 49

4 Dynamics 69

5 Inertia (optional) 101

6 Waves 109

7 Coordinates 129

8 Rotation (optional) 139

9 Flux 149

10 Electromagnetism 175


Contents

1 Spacetime 11

1.1 Three models of spacetime . . . . . . . . . . . . . 11

Aristotelian spacetime, 12. Galilean spacetime, 13. Einstein's spacetime, 15.

1.2 Minkowski coordinates . . . . . . . . . . . . . . . 20

1.3 Measurement. . . . . . . . . . . . . . . . . . . 21

Invariants, 21. The metric, 22. The gamma factor, 25.

1.4 The Lorentz transformation . . . . . . . . . . . . . 28

Problems . . . . . . . . . . . . . . . . . . . . . . 34

2 Foundations (optional) 39

2.1 Causality . . . . . . . . . . . . . . . . . . . . 39

The arrow of time, 39. Initial-value problems, 39. A modest definition of causality, 40.

2.2 Flatness . . . . . . . . . . . . . . . . . . . . . 40

Failure of parallelism, 41. Parallel transport, 41. Special relativity requires flat spacetime, 41.

2.3 Additional postulates. . . . . . . . . . . . . . . . 43

2.4 Other axiomatizations . . . . . . . . . . . . . . . 43

Einstein's postulates, 43. Maximal time, 44. Comparison of the systems, 44.

2.5 Lemma: spacetime area is invariant . . . . . . . . . 45

Problems . . . . . . . . . . . . . . . . . . . . . . 47

3 Kinematics 49

3.1 How can they both . . . ? . . . . . . . . . . . . . . 50

3.2 The stretch factor is the Doppler shift . . . . . . . . . 51

3.3 Combination of velocities . . . . . . . . . . . . . . 53

3.4 No frame of reference moving at c . . . . . . . . . . 55

3.5 The velocity and acceleration vectors. . . . . . . . . 56

The velocity vector, 56. The acceleration vector, 57. Constraints on the velocity and acceleration vectors, 58.

3.6 Some kinematic identities . . . . . . . . . . . . . 61

3.7 The projection operator . . . . . . . . . . . . . . 61

3.8 Faster-than-light frames of reference?. . . . . . . . 64

Problems . . . . . . . . . . . . . . . . . . . . . . 66

4 Dynamics 69

4.1 Ultrarelativistic particles . . . . . . . . . . . . . . 69

4.2 E=mc² . . . . . . . . . . . . . . . . . . . . . . 72

4.3 Relativistic momentum . . . . . . . . . . . . . . . 77

Massless particles travel at c, 82. No global conservation of energy-momentum in general relativity, 83.

4.4 Systems with internal structure. . . . . . . . . . . 84

4.5 Force . . . . . . . . . . . . . . . . . . . . . 86

Four-force, 86. The force measured by an observer, 86. Transformation of the force measured by an observer, 88.

4.6 Degenerate matter . . . . . . . . . . . . . . . . 89

4.7 Tachyons and FTL . . . . . . . . . . . . . . . . 92

A defense in depth, 92. Experiments to search for tachyons, 93.

Problems . . . . . . . . . . . . . . . . . . . . . . 96

5 Inertia (optional) 101

5.1 What is inertial motion? . . . . . . . . . . . . . . 101

An operational definition, 101. Equivalence of inertial and gravitational mass, 103.

5.2 The equivalence principle. . . . . . . . . . . . . . 104

Equivalence of acceleration to a gravitational field, 104. Eötvös experiments, 104. Gravity without gravity, 105. Gravitational Doppler shifts, 105. A varying metric, 106.

Problems . . . . . . . . . . . . . . . . . . . . . . 108

6 Waves 109

6.1 Frequency . . . . . . . . . . . . . . . . . . . . 109

Is time's flow constant?, 109. Clock-comparison experiments, 109. Birdtracks notation, 110. Duality, 111.

6.2 Phase . . . . . . . . . . . . . . . . . . . . . . 111

Phase is a scalar, 111. Scaling, 112.

6.3 The frequency-wavenumber covector . . . . . . . . . 112

Visualization, 112. The gradient, 113. Phase and group velocity, 113.

6.4 Duality . . . . . . . . . . . . . . . . . . . . . 114

Duality in 3+1 dimensions, 114. Change of basis, 117.

6.5 The Doppler shift and aberration. . . . . . . . . . . 117

Doppler shift, 117. Aberration, 118.

6.6 Some related mathematical tools . . . . . . . . . . 121

Abstract index notation, 121. Volume, 124.

Problems . . . . . . . . . . . . . . . . . . . . . . 127

7 Coordinates 129

7.1 An example: accelerated coordinates. . . . . . . . . 129

7.2 Transformation of vectors . . . . . . . . . . . . . . 130

7.3 Transformation of the metric . . . . . . . . . . . . 132

7.4 Summary of transformation laws. . . . . . . . . . . 134

7.5 Inertia and rates of change . . . . . . . . . . . . . 136

Problems . . . . . . . . . . . . . . . . . . . . . . 137

8 Rotation (optional) 139

8.1 Rotating frames of reference . . . . . . . . . . . . 139

No clock synchronization, 139. Rotation is locally detectable, 140. The Sagnac effect, 140. A rotating coordinate system, 141.

8.2 Boosts and rotations. . . . . . . . . . . . . . . . 143

Rotations, 143. Boosts, 144. Thomas precession, 145.

Problems . . . . . . . . . . . . . . . . . . . . . . 147


9 Flux 149

9.1 The current vector . . . . . . . . . . . . . . . . . 149

Current as the flux of charged particles, 149. Conservation of charge, 152.

9.2 The stress-energy tensor . . . . . . . . . . . . . . 153

Conservation and flux of energy-momentum, 153. Symmetry of the stress-energy tensor, 153. Dust, 154. Rank-2 tensors and their transformation law, 154. Pressure, 156. A perfect fluid, 156. Two simple examples, 158. Energy conditions, 160.

9.3 Gauss's theorem . . . . . . . . . . . . . . . . . 162

Integral conservation laws, 162. A simple form of Gauss's theorem, 162. The general form of Gauss's theorem, 163. The energy-momentum vector, 165.

9.4 The covariant derivative. . . . . . . . . . . . . . 167

Comma, semicolon, and birdtracks notation, 169. Finding the Christoffel symbol from the metric, 169. The geodesic equation, 171.

Problems . . . . . . . . . . . . . . . . . . . . . . 174

10 Electromagnetism 175

10.1 Relativity requires magnetism . . . . . . . . . . . 175

10.2 Fields in relativity. . . . . . . . . . . . . . . . . 176

Time delays in forces exerted at a distance, 176. Fields carry energy, 176. Fields must have transformation laws, 177.

10.3 Electromagnetic fields . . . . . . . . . . . . . . 178

The electric field, 178. The magnetic field, 178. What about gravity?, 181.

10.4 Transformation of the fields . . . . . . . . . . . . 181

10.5 Invariants . . . . . . . . . . . . . . . . . . . . 182

10.6 Stress-energy tensor of the electromagnetic field . . . 182

10.7 Maxwell's equations . . . . . . . . . . . . . . . 186

Statement and interpretation, 186. Experimental support, 187. Incompatibility with Galilean spacetime, 187. Not manifestly relativistic in their original form, 187. Lorentz invariance, 189.

Problems . . . . . . . . . . . . . . . . . . . . . . 193

Appendix ??: Hints and solutions . . . . . . . . . . . . . . . . . . . . . . . . . . ??


Chapter 1

Spacetime

1.1 Three models of spacetime

The test of a first-rate intelligence is the ability to hold two opposing ideas in mind at the same time and still retain the ability to function.

F. Scott Fitzgerald

a / Three views of spacetime. 1. A typical graph of a particle's motion: an oscillation. 2. In relativity, it's customary to swap the axes, and 3. we can even remove the axes entirely.

Time and space together make spacetime, figure a, the stage on which physics is played out. Until 1905, physicists were trained to accept two mutually contradictory theories of spacetime. I'll call these the Aristotelian and Galilean views, although my colleagues from that era would have been offended to be accused of even partial Aristotelianism.


c / Valid vectors representing observers and simultaneity, according to the Aristotelian model of spacetime.

b / 1. An observer and two clocks. 2. Idealization as events. 3. Vectors used to represent relationships between events.

1.1.1 Aristotelian spacetime

Figure b/1 shows an observer and two clocks, represented using the graphical conventions of figure a/3. The existence of such a material object at a certain place and time constitutes an event, which we idealize as a point, b/2. Spacetime consists of the set of all events. As time passes, a physical object traces out a continuous curve, a set of events known in relativistic parlance as its world-line. Since paper and computer screens are two-dimensional, the drawings only represent one dimension of space plus one dimension of time, which in relativity we call "1+1 dimensions." The real universe has three spatial dimensions, so real spacetime has 3+1 dimensions. Most, but not all, of the interesting phenomena in special relativity can be understood in 1+1 dimensions, so whenever possible in this book I will draw 1+1-dimensional figures without apology or explanation.

The relativist's attitude is that events and relationships between events are primary, while coordinates such as x and t are secondary and possibly irrelevant. Coordinates let us attach labels like (x, t) to points, but this is like God asking Adam to name all the birds and animals: the animals didn't care about the names. Figure b/3 shows the use of vectors to indicate relationships between points. Vector o is an observer-vector, connecting two points on the world-line of the person. It points from the past into the future. The vector s connecting the two clocks is a vector of simultaneity. The clocks have previously been synchronized side by side, and if we assume that transporting them to separate locations doesn't disrupt them, then the fact that both clocks read two minutes after three o'clock tells us that the two events occur at the same time.

The Aristotelian model of spacetime is characterized by a set of rules about what vectors are valid observer- and simultaneity-vectors. We require that every o vector be parallel to every other, and likewise for s vectors. But, as is usual with vectors, we allow the arrow to be drawn anywhere without considering the different locations to have any significance; that is, our model of spacetime doesn't allow different regions to have different properties.

When Einstein was a university student, these rules (phrased differently) were the ones he was taught to use in describing electricity and magnetism. He later recalled imagining himself on a motorcycle, riding along next to a light wave and trying to imagine how his observations could be reconciled with Maxwell's equations. I don't know whether he was ever brave enough to describe this daydream to his professors, but if he had, their answer would have been essentially that his hypothetical o vector was illegal. The "good" o vectors were thought to be the ones that represented an observer at rest relative to the ether, a hypothetical all-pervasive medium whose vibrations were electromagnetic waves. However silly this might seem to us a hundred years later, it was in fact strongly supported by the evidence. A vast number of experiments had verified the validity of Maxwell's equations, and it was known that if Maxwell's equations were valid in coordinates (x, t) defined by an observer o, they would become invalid under the transformation (x′, t′) = (x + vt, t) to coordinates defined by an observer o′ in motion relative to o.
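To see concretely why this transformation spoils a fixed wave speed, here is a minimal Python sketch. It assumes units in which the wave moves at c = 1 according to o; the value v = 0.25 and the helper names galilean_transform and speed_between are illustrative choices, not anything taken from the text. Two events on the world-line of a light wave are relabeled with (x′, t′) = (x + vt, t), and the speed inferred from the new labels comes out as c + v rather than c.

```python
def galilean_transform(x, t, v):
    """Map coordinates (x, t) of observer o to (x', t') = (x + v*t, t)."""
    return x + v * t, t


def speed_between(event_a, event_b):
    """Speed dx/dt of something whose world-line passes through two events (x, t)."""
    (xa, ta), (xb, tb) = event_a, event_b
    return (xb - xa) / (tb - ta)


c = 1.0   # speed of the light wave according to o (units with c = 1)
v = 0.25  # hypothetical velocity parameter appearing in the transformation

# Two events on the world-line of a light wave, in o's coordinates (x, t).
a = (0.0, 0.0)
b = (c * 2.0, 2.0)

# The same two events, relabeled in the primed coordinates.
a_prime = galilean_transform(*a, v)
b_prime = galilean_transform(*b, v)

speed_o = speed_between(a, b)
speed_o_prime = speed_between(a_prime, b_prime)
print(speed_o, speed_o_prime)
```

Since Maxwell's equations predict one definite wave speed, they can hold in at most one of two coordinate systems related in this way, which is the sense in which only the ether's rest frame was considered "good."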

1.1.2 Galilean spacetime

But the Aristotelian model was already known to be wrong when applied to material objects. The classic empirical demonstration of this fact came around 1610 with Galileo's discovery of four moons orbiting Jupiter, figure d. Aristotelianism in its ancient form was originally devised as an explanation of why objects always seemed to settle down to a natural state of rest according to an observer standing on the earth's surface. But as Jupiter flew across the heavens, its moons circled around it, without showing any natural tendency to fall behind it like a paper cup thrown out the window of a car. Just as an observer o₁ standing on the earth would consider the earth to be at rest, o₂ hovering in a balloon at Jupiter's cloudtops would say that the jovian clouds represented an equally natural state of rest.

d / A simulation of how Jupiter and its moons might appear at intervals of three hours through a telescope. Because we see the moons' circular orbits edge-on, their world-lines appear sinusoidal. Over this time period, the innermost moon, Io, completes half a cycle.


e / Valid vectors representing observers and simultaneity, according to the Galilean model of spacetime.

f / Example 1.

We are thus led to a different, Galilean, set of rules for o and s vectors. All s vectors are parallel to one another, but any vector that is not parallel to an s vector is a valid o vector. (We may wish to require that it point into the future rather than the past, but Newton's laws are symmetric under time-reversal, so this is not strictly necessary.)

Galilean spacetime, unlike Aristotelian spacetime, has no universal notion of "same place." I can drive to Gettysburg, Pennsylvania, and stand in front of the brass plaque that marks the site of the momentous Civil War battle. But am I really in the same place? An observer whose frame of reference was fixed to another planet would say that our planet had moved through space since 1863.

Note that our geometrical description includes a notion of parallelism, but not of angular measure. We don't know or care whether the "angle" between an s and an o is 90 degrees. One represents a distance, while the other represents an interval of time, and we can't define the angle between a distance and a time. The same was true in the Aristotelian model; the vectors in figure c were drawn perpendicular to one another simply as a matter of convention, but any other angle could have been used.

The Galilean twin paradox Example 1

Alice and Betty are identical twins. Betty goes on a space voyage, traveling away from the earth along vector o₁ and then turning around and coming back on o₂. Meanwhile, Alice stays on earth. Because this is an experiment involving material objects, and the conditions are similar to those under which Galilean relativity has been repeatedly verified by experiment, we expect the results to be consistent with Galilean relativity's claim that motion is relative. Therefore it seems that it should be equally valid to consider Betty and the spaceship as having been at rest the whole time, while Alice and the planet earth traveled away from the spaceship along o₃ and then returned via o₄. But this is not consistent with the experimental results, which show that Betty undergoes a violent acceleration at her turnaround point, while Alice and the other inhabitants of the earth feel no such effect.

The paradox is resolved by realizing that Galilean relativity defines unambiguously whether or not two vectors are parallel. It's true that we could fix a frame of reference in which o₁ represented the spaceship staying at rest, but o₂ is not parallel to o₁, so in this frame we still have a good explanation for why Betty feels an acceleration: she has gone from being at rest to being in motion. Regardless of which frame of reference we pick, and regardless of whether we even fix a frame of reference, o₃ and o₄ are parallel to one another, and this explains why Alice feels no effect.
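The claim that parallelism is frame-independent can be checked numerically. The following Python sketch is only an illustration under stated assumptions: the (t, x) components assigned to o₁ through o₄ are made-up values, and is_parallel and change_frame are hypothetical helper names. A Galilean change of frame is applied to each displacement as (t, x) → (t, x + vt), and the parallelism test is run before and after.

```python
def cross(u, w):
    """Cross product of two vectors given as (t, x) components; zero means parallel."""
    return u[0] * w[1] - u[1] * w[0]


def is_parallel(u, w, tol=1e-12):
    """True if the two (t, x) vectors are parallel to within a small tolerance."""
    return abs(cross(u, w)) <= tol


def change_frame(u, v):
    """Galilean change of frame applied to a displacement: (t, x) -> (t, x + v*t)."""
    t, x = u
    return (t, x + v * t)


# Hypothetical components (t, x), chosen only for illustration.
o1 = (1.0, 0.5)    # Betty, outbound leg
o2 = (1.0, -0.5)   # Betty, return leg
o3 = (1.0, 0.0)    # Alice, first half of the trip
o4 = (1.0, 0.0)    # Alice, second half of the trip

# Switch to the frame in which o1 represents the spaceship at rest.
v = -0.5
b1, b2, a3, a4 = (change_frame(u, v) for u in (o1, o2, o3, o4))

print(is_parallel(o1, o2), is_parallel(b1, b2))  # Betty's two legs, both frames
print(is_parallel(o3, o4), is_parallel(a3, a4))  # Alice's two segments, both frames
```

Betty's two legs fail the test in both frames, and Alice's two segments pass it in both, because a linear change of coordinates cannot turn parallel vectors into non-parallel ones or vice versa.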


g / The clock took up two seats, and two tickets were bought for it under the name of "Mr. Clock."

h / All three clocks are moving to the east. Even though the west-going plane is moving to the west relative to the air, the air is moving to the east due to the earth's rotation.

1.1.3 Einstein's spacetime

We have two models of spacetime, neither of which is capable of describing all the phenomena we observe. Because of the relatively crude state of technology ca. 1900, it required considerable insight for Einstein to piece together a fragmentary body of indirect evidence and arrive at a consistent and correct model of spacetime. Today, the evidence is part of everyday life. For example, every time you use a GPS receiver, you're using Einstein's theory of relativity. Somewhere between 1905 and today, technology became good enough to allow conceptually simple experiments that students in the early 20th century could only discuss in terms like "Imagine that we could..."

A good jumping-on point is 1971. In that year, J.C. Hafele and R.E. Keating brought atomic clocks aboard commercial airliners, figure g, and went around the world, once from east to west and once from west to east. Hafele and Keating observed that there was a discrepancy between the times measured by the traveling clocks and the times measured by similar clocks that stayed home at the U.S. Naval Observatory in Washington. [1] The east-going clock lost time, ending up off by 59 ± 10 nanoseconds, while the west-going one gained 273 ± 7 ns.

We are used to thinking of time as absolute and universal, so it is disturbing to find that it can flow at a different rate for observers in different frames of reference. Nevertheless, the effects that Hafele and Keating observed were small. This makes sense: Galilean relativity had already been thoroughly verified for material objects such as clocks, planets, and airplanes, so a new theory like Einstein's had to agree with Galileo's to a good approximation, within the Galilean theory's realm of applicability. This requirement of backward-compatibility is known as the correspondence principle. It's also reassuring that the effects on time were small compared to the three-day lengths of the plane trips. There was therefore no opportunity for paradoxical scenarios such as one in which the east-going experimenter arrived back in Washington before he left and then convinced himself not to take the trip. A theory that maintains this kind of orderly relationship between cause and effect is said to satisfy causality. [2]
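To put a number on "small," the quoted clock offsets can be compared with the duration of a trip. The Python sketch below is a rough estimate only: it takes the trip length to be exactly three days, which is an approximation based on the "three-day" figure above, and it uses the central values 59 ns and 273 ns while ignoring the error bars.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Assumed trip length: the text only says the trips took about three days.
trip_duration = 3 * SECONDS_PER_DAY  # seconds

east_offset = 59e-9   # magnitude of the east-going clock's loss, in seconds
west_offset = 273e-9  # west-going clock's gain, in seconds

# Fractional size of each effect relative to the length of the trip.
east_fraction = east_offset / trip_duration
west_fraction = west_offset / trip_duration

print(f"east-going: {east_fraction:.1e}")
print(f"west-going: {west_fraction:.1e}")
```

Both fractions come out at roughly a part in 10¹², which is why atomic clocks were needed to see the effect at all, and why there is no danger of anyone arriving before they left.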

Hafele and Keating were testing specific quantitative predictions of relativity, and they verified them to within their experiment's error bars. Let's work backward instead, and inspect the empirical results for clues as to how time works. The disagreements among the clocks suggest that simultaneity is not absolute: different observers have different notions of simultaneity, as suggested in figure i. Just as Galilean relativity freed the o vectors from the constraint of being parallel to one another, Einstein frees the s vectors. Galileo made "same place" into an ambiguous concept, while Einstein did the same with "simultaneous." But because a particular observer does have methods of synchronizing clocks (e.g., Einstein synchronization, example 4, p. 19), the definition of simultaneity isn't completely arbitrary. For each o vector we have a corresponding s vector, which represents that observer's opinion as to what constitutes simultaneity. Because the convention on a Cartesian xt graph is to draw the axes at right angles to one another, we refer to such a pair of vectors as orthogonal, but the word is not to be interpreted literally, since we can't define an actual angle between a time interval and a spatial displacement.

i / According to Einstein, simultaneity is relative, not absolute.

[1] There were actually several effects at work, but these details do not affect the present argument, which only depends on the fact that there is no absolute time. See p. 106 for more on this topic.

[2] For more about causality, see section 2.1, p. 39.

j / Possibilities for the behavior of orthogonality.

What, then, are the rules for orthogonality? Figure j shows three possibilities. In each case, we have an initial pair of vectors o₁ and s₁ that we assume are orthogonal, and we then draw a new pair o₂ and s₂ for a second observer who is in motion relative to the first. The Galilean case, where s₂ remains parallel to s₁, has already been ruled out. The second case is the one in which s rotates in the same direction as o. This one is forbidden by causality, because if we kept on rotating, we could eventually end up rotating o by 180 degrees, so by a continued process of acceleration, we could send an observer into a state in which her sense of time was reversed. We are left with only one possibility for Einstein's spacetime, which is the one in which a clockwise rotation of o causes a counterclockwise rotation of s, like closing a pair of scissors.

Now there is a limit to how far this process can go, or else the s and o would eventually lie on the same line. But this is impossible, for a valid s vector can never be a valid o, nor an o a valid s. Such a possibility would mean that an observer would describe two different points on his own world-line as simultaneous, but an observer for whom no time passes is not an observer at all, since observation implies collecting data and then being able to remember it at some later time. We conclude that there is a diagonal line that forms the boundary between the set of possible s vectors and the set of valid o vectors. This line has some slope, and the inverse of this slope corresponds to some velocity, which is apparently a universal and fixed property of Einstein's spacetime. This velocity we call c, and the correspondence principle tells us that c must be very large, or else Einsteinian, or "relativistic," effects such as time distortion would have been large even for motion at everyday speeds; in the Hafele-Keating experiment they were quite small, even at the high speed of a passenger jet.

k / The light cone.

Although c is a large number when expressed in meters per second, for convenience in relativity we will always choose units such that c = 1. The boundary between s and o vectors then appears on spacetime diagrams as a diagonal line at 45 degrees. In more than one spatial dimension, this boundary forms a cone, figure k, and for reasons that will become more clear in a moment, this cone is called the light cone. Vectors lying inside the light cone are referred to as timelike, those outside as spacelike, and those on the cone itself as lightlike or null.
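The classification into timelike, spacelike, and lightlike vectors, and the scissors-like closing of o and s toward the 45-degree line, can be sketched in a few lines of Python. This is an illustration under assumptions that go slightly beyond the text so far: vectors are written as (t, x) components in units with c = 1, the test uses the sign of t² − x² (anticipating the metric of section 1.3), and the orthogonal pair is taken to be o = (1, v), s = (v, 1), a form that anticipates the Lorentz transformation of section 1.4. The function names are hypothetical.

```python
import math


def classify(vec):
    """Classify a (t, x) vector relative to the light cone, in units with c = 1."""
    t, x = vec
    interval = t * t - x * x
    if math.isclose(interval, 0.0, abs_tol=1e-12):
        return "lightlike"
    return "timelike" if interval > 0 else "spacelike"


def orthogonal_pair(v):
    """Return (o, s) for an observer moving at velocity v, with |v| < 1.

    Assumed form: s is the mirror image of o across the 45-degree diagonal.
    """
    return (1.0, v), (v, 1.0)


for v in (0.0, 0.5, 0.9, 0.99):
    o, s = orthogonal_pair(v)
    # o tilts away from the t axis, and s away from the x axis, by the same angle.
    tilt = math.degrees(math.atan(v))
    print(f"v={v}: o is {classify(o)}, s is {classify(s)}, tilt {tilt:.1f} degrees")

# A vector on the diagonal itself is neither a valid o nor a valid s.
print(classify((1.0, 1.0)))
```

As v approaches 1 the tilt approaches 45 degrees from both sides, but for any |v| < 1 the o vector stays timelike and the s vector stays spacelike, so the two never coincide.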

An important advantage of Einstein's relativity over Galileo's is that it is compatible with the empirical observation that some phenomena travel at a certain fixed speed. Light travels at a fixed speed, and so do other phenomena such as gravitational waves (which, although they have not yet been observed directly, have been indirectly confirmed to exist through observations of the decaying orbits of binary neutron stars). So do all massless particles (subsection 4.3.1). This fixed speed is c, and all observers agree on it. In 1905, the only phenomenon known to travel at c was light, so c is usually described as "the speed of light," but from the modern point of view it functions more as a kind of conversion factor between our units of measurement for time and space. It is a property of spacetime, not a property of light.

More fundamentally, c is the maximum speed of cause and effect. If we could propagate cause and effect, e.g., by transmitting a signal, at a speed greater than c, we would be violating both causality and the principle that motion is relative:

The argument from causality: If a signal could be propagated at a speed greater than c, then the vector r connecting the cause and the effect would be spacelike. But by adding two spacelike vectors we can make a vector lying in the past timelike light cone, so by relaying the signal we could send a message into the past, violating causality.

The argument from relativity of motion: Furthermore, there would exist a frame in which vector r was a vector of simultaneity, so that the information would have been propagated not just faster than c but in fact instantaneously. It would then be possible to send signals from one place in the universe to another without any time lag. This would allow perfect synchronization of all clocks. But observations such as the Hafele-Keating experiment demonstrate that clocks A and B that have been initially synchronized will drift out of sync if one is in motion relative to the other. With instantaneous transmission of signals, we could determine, without having to wait for A and B to be reunited, which was ahead and which was behind. Since they don't need to be reunited, neither one needs to undergo any acceleration; each clock can fix an inertial frame of reference, with a velocity vector that changes neither its direction nor its magnitude. But this violates the principle that constant-velocity motion is relative, because each clock can be considered to be at rest, in its own frame of reference. Since no experiment has ever detected any violation of the relativity of motion, we conclude that instantaneous action at a distance is impossible.

l / A ring laser gyroscope.
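The key step in the argument from causality, that two spacelike vectors can add up to a vector in the past light cone, is easy to verify with a concrete pair. The Python sketch below is only an illustration: the components of r1 and r2 are made-up values written as (t, x) in units with c = 1, and the helper names are hypothetical. Each hop has a negative t component in the frame used here, which is permitted for a spacelike vector because there is some other frame in which that same hop is simultaneous or forward in time.

```python
def is_spacelike(vec):
    """True if the (t, x) vector lies outside the light cone (units with c = 1)."""
    t, x = vec
    return x * x > t * t


def in_past_timelike_cone(vec):
    """True if the (t, x) vector lies strictly inside the past light cone."""
    t, x = vec
    return t < 0 and t * t > x * x


def add(a, b):
    """Component-wise sum of two (t, x) vectors."""
    return (a[0] + b[0], a[1] + b[1])


# Two hypothetical faster-than-light hops: out to a relay station and back again.
r1 = (-1.0, 2.0)
r2 = (-1.0, -2.0)

total = add(r1, r2)
print(is_spacelike(r1), is_spacelike(r2))
print(total, in_past_timelike_cone(total))
```

The sum returns to the sender's own position (x = 0) at an earlier time, so the relayed message would arrive before it was sent.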

One could complain that giving two separate arguments to this effect was gilding the lily. If we were doing mathematics, then one proof would be enough. But this is science, not mathematics, so every assumption that goes into an argument is not absolute truth but provisional truth, founded on observations that have limited precision and that cover a limited domain of conditions. Neither the relativity of motion nor causality is a logical necessity; they are both just generalizations based on a body of evidence. For more on causality, and its uncertain empirical status, see section 2.1, p. 39.

The ring laser gyroscope Example 2

If you've flown in a jet plane, you can thank relativity for helping you to avoid crashing into a mountain or an ocean. Figure l shows a standard piece of navigational equipment called a ring laser gyroscope. A beam of light is split into two parts, sent around the perimeter of the device, and reunited. Since the speed of light is constant, we expect the two parts to come back together at the same time. If they don't, it's evidence that the device has been rotating. The plane's computer senses this and notes how much rotation has accumulated.

No frequency-dependence Example 3

Relativity has only one universal speed, so it requires that all light waves travel at the same speed, regardless of their frequency and wavelength. Presently the best experimental tests of the invariance of the speed of light with respect to wavelength come from astronomical observations of gamma-ray bursts, which are sudden outpourings of high-frequency light, believed to originate from a supernova explosion in another galaxy. One such observation, in 2009, [3] found that the times of arrival of all the different frequencies in the burst differed by no more than 2 seconds out of a total time in flight on the order of ten billion years!

[3] http://arxiv.org/abs/0908.1832
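The figures in example 3 translate into a bound on how much the speed of light could vary with frequency. The Python sketch below is an order-of-magnitude estimate only: it takes the flight time to be exactly ten billion years and uses a 365.25-day year, both of which are simplifying assumptions, and it treats the fractional difference in arrival time as the fractional difference in speed.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed length of a year, in seconds

# "On the order of ten billion years," taken here as exactly 1e10 years.
travel_time = 1e10 * SECONDS_PER_YEAR  # seconds
max_spread = 2.0                       # maximum spread in arrival times, seconds

# For small differences, the fractional spread in arrival time is approximately
# the fractional difference in speed between the frequencies.
fraction = max_spread / travel_time
print(f"{fraction:.1e}")
```

The result is a few parts in 10¹⁸, so any dependence of the speed of light on frequency would have to be smaller than that over the range of frequencies observed.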


n / Example 4.

Einstein's train Example 4

The figure shows a famous thought experiment devised by Einstein. A train is moving at constant velocity to the right when bolts of lightning strike the ground near its front and back. Alice, standing on the dirt at the midpoint of the flashes, observes that the light from the two flashes arrives simultaneously, so she says the two strikes must have occurred simultaneously. Bob, meanwhile, is sitting aboard the train, at its middle. He passes by Alice at the moment when Alice later figures out that the flashes happened. Later, he receives flash 2, and then flash 1. He infers that since both flashes traveled half the length of the train, flash 2 must have occurred first. How can this be reconciled with Alice's belief that the flashes were simultaneous?

Figure n shows the corresponding spacetime diagram. It seems paradoxical that Alice and Bob disagree on simultaneity, but this is only because we have an ingrained prejudice in favor of Galilean relativity. Alice's method of determining that 1 and 2 were simultaneous is valid, and is known as Einstein synchronization. The dashed line connecting 1 and 2 is orthogonal to Alice's world-line. But Bob has a different opinion about what constitutes simultaneity. The slanted dashed line is orthogonal to his world-line. According to Bob, 2 happened before the time represented by this line, 1 after.
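The order in which Bob receives the flashes can be worked out entirely in Alice's frame. The Python sketch below is a minimal illustration under stated assumptions: units with c = 1, strikes at x = −L (flash 1, rear) and x = +L (flash 2, front) at t = 0, and Bob at x = vt so that he passes Alice at the moment of the strikes. The values L = 1 and v = 0.5 and the function name reception_times are hypothetical.

```python
def reception_times(half_separation, v):
    """Times, in Alice's frame with c = 1, at which Bob receives each flash.

    Flash 1 starts at x = -L and moves right: x = -L + t.
    Flash 2 starts at x = +L and moves left:  x = +L - t.
    Bob moves along x = v*t, so setting the positions equal gives
    t1 = L / (1 - v) and t2 = L / (1 + v).
    """
    if not -1 < v < 1:
        raise ValueError("v must satisfy |v| < 1 in units with c = 1")
    t_flash_1 = half_separation / (1 - v)
    t_flash_2 = half_separation / (1 + v)
    return t_flash_1, t_flash_2


L = 1.0   # hypothetical distance from Alice to each strike
v = 0.5   # hypothetical speed of the train, as a fraction of c

t_alice = L  # Alice, at x = 0, receives both flashes at t = L
t1, t2 = reception_times(L, v)
print(t_alice, t1, t2)
```

Alice receives both flashes at the same time, while Bob receives flash 2 before flash 1 because he is moving toward one and away from the other. Both observers agree on these reception events; they differ only in what they infer about when the strikes occurred.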

Example 4 is of course impractical as described, since real trains don't travel at speeds anywhere near c relative to the dirt. We say that their speeds are nonrelativistic. Because Einstein coined the term "relativity," and his version of relativity superseded Galileo's, the unmodified word is normally understood to refer to Einsteinian relativity. A physicist who studies Einstein-relativity is a relativist. A material object moving at a speed very close to c is described as ultrarelativistic. One often hears laypeople describing relativity in terms of certain effects that would happen "if you went at the speed of light." In fact, as we'll see in ch. 3 and 4, it is not possible to accelerate material objects to c, and in any case that isn't necessary. Relativistic effects exist at all speeds, but they're weak at speeds small compared to c.


Discussion Question

A The machine-gunner in the figure sends out a spray of bullets. Suppose that the bullets are being shot into outer space, and that the distances traveled are trillions of miles (so that the human figure in the diagram is not to scale). After a long time, the bullets reach the points shown with dots, which are all equally far from the gun. Their arrivals at those points are events A through E, which happen at different times. The chain of impacts extends across space at a speed greater than c. Does this violate special relativity?

Discussion question A.

1.2 Minkowski coordinates

It is often convenient to name points in spacetime using coordinates, and a particular type of naming, chosen by Einstein and Minkowski, is the default in special relativity. I'll refer to the coordinates of this system as Minkowski coordinates, and they're what I have in mind throughout this book when I use letters like t and x (or variations like x′, t_o, etc.) without further explanation. To define Minkowski coordinates in 1 + 1 dimensions, we need to pick (1) an event that we consider to be the origin, (t, x) = (0, 0), (2) an observer-vector o, and (3) a side of the observer's world-line that we will call the positive x side, and draw on the right in diagrams. The observer is required to be inertial,[4] so that by repeatedly making copies of o and laying them tip-to-tail, we get a chain that lies on top of the observer's world-line and represents ticks on the observer's clock. Minkowski coordinates use units with c = 1. Explicitly, we define the unique vector s that is orthogonal to o, points in the positive direction, and has a length of one clock-tick. In practical terms, the orthogonality could be defined by Einstein synchronization (example 4, p. 19), and the length by arranging that a radar echo travels to the tip of s and back in two ticks.

[4] For now we appeal to the freshman mechanics notion of inertial. A better relativistic definition, which differs from the Newtonian one, is given in ch. 5.

p / Hermann Minkowski (1864-1909).

q / A set of Minkowski coordinates.

We now construct a graph-paper lattice, figure q, by repeating the vectors o and s. This grid defines a name (t, x) for each point in spacetime.

1.3 Measurement

We would like to have a general system of measurement for relativity, but so far we have only an incomplete patchwork. The length of a timelike vector can be defined as the time measured on a clock that moves along the vector. A spacelike vector has a length that is measured on a ruler whose motion is such that in the ruler's frame of reference, the vector's endpoints are simultaneous. But there is no third measuring instrument designed for the purpose of measuring lightlike vectors.

Nor do we automagically get a complete system of measurement just by having defined Minkowski coordinates. For example, we don't yet know how to find the length of a timelike vector such as (t, x) = (2, 1), and we suspect that it will not equal 2, since the Hafele-Keating experiment tells us that a clock undergoing the motion represented by x = 1 will probably not agree with a clock carried by the observer whose clock we used in defining these coordinates.

1.3.1 Invariants

The whole topic of measurement is apt to be confusing, because the shifting landscape of relativity makes us feel as if we've walked into a Salvador Dali landscape of melting pocket watches. A good way to regain our bearings is to look for quantities that are invariant: they are the same in all frames of reference. A Euclidean invariant, such as a length or an angle, is one that doesn't change under rotations: all observers agree on its value, regardless of the orientations of their frames of reference. For a relativistic invariant, we require in addition that observers agree no matter what state of motion they have. (A transformation that changes from one inertial frame of reference to another, without any rotation, is called a boost.)

Electric charge is a good example of an invariant. Electrons in atoms typically have velocities of 0.01 to 0.1 (in our relativistic units, where c = 1), so if an electron's charge depended on its motion relative to an observer, atoms would not be electrically neutral. Experiments have been done[5] to test this to the phenomenal precision of one part in $10^{21}$, with null results.

r / The two light-rectangles have the same area.

A vector can never be an invariant, since it changes direction under a rotation. (Some vectors, such as velocities, also change under a boost.) In freshman mechanics, any quantity, such as energy, that wasn't a vector usually fell into the category we referred to as "scalars." In relativity, however, the term "scalar" has a much more restrictive definition, which we'll discuss in section 6.2.1, p. 111.

By the way, beginners in relativity sometimes get confused about invariance as opposed to conservation. They are not the same thing, and neither implies the other. For example, momentum has a direction in space, so it clearly isn't invariant, but we'll see in section 4.3 that there is a relativistic version of the momentum vector that is conserved. As in Newtonian mechanics, we don't care if all observers agree on the momentum of a system; we only care that the law of momentum conservation is valid and has the same form in all frames. Conversely, there are quantities that are invariant but not conserved, mass being an example.

1.3.2 The metric

Area in 1+1 dimensions is also an invariant, as proved on p. 45. The invariance of area has little importance on its own, but it provides a good stepping stone toward a relativistic system of measurement. Suppose that we have events A (Charles VII is restored to the throne) and B (Joan of Arc is executed). Now imagine that technologically advanced aliens want to be present at both A and B, but in the interim they wish to fly away in their spaceship, be present at some other event P (perhaps a news conference at which they give an update on the events taking place on earth), but get back in time for B. Since nothing can go faster than c (which we take to equal 1), P cannot be too far away. The set of all possible events P forms a rectangle, figure r/1, in the 1+1-dimensional plane that has A and B at opposite corners and whose edges have slopes equal to ±1. We call this type of rectangle a light-rectangle.

The area of this rectangle will be the same regardless of one's frame of reference. In particular, we could choose a special frame of reference, panel 2 of the figure, such that A and B occur in the same place. (They do not occur at the same place, for example, in the sun's frame, because the earth is spinning and going around the sun.) Since the speed c = 1 is the same in all frames of reference, and the sides of the rectangle had slopes ±1 in frame 1, they must still have slopes ±1 in frame 2. The rectangle becomes a square, whose diagonals are an o and an s for frame 2. The length of these diagonals equals the time elapsed on a clock that is at rest in frame 2, i.e., a clock that glides through space at constant velocity from A to B, reuniting with the planet earth when its orbit brings it to B. The area of the gray regions can be interpreted as half the square of this gliding-clock time, which is called the proper time. "Proper" is used here in the somewhat archaic sense of "own" or "self," as in "The Vatican does not lie within Italy proper." Proper time, which we notate τ, can only be defined for timelike world-lines, since a lightlike or spacelike world-line isn't possible for a material clock.

[5] Marinelli and Morpurgo, "The electric neutrality of matter: A summary," Physics Letters B137 (1984) 439.

In terms of (Minkowski) coordinates, suppose that events A and B are separated by a distance x and a time t. Then in general $t^2 - x^2$ gives the square of the gliding-clock time. Proof: Because of the way that area scales with a rescaling of the coordinates, the expression must have the form $(\ldots)t^2 + (\ldots)tx + (\ldots)x^2$, where each $(\ldots)$ represents a unitless constant. The tx coefficient must be zero by the isotropy of space. The $t^2$ coefficient must equal 1 in order to give the right answer in the case of x = 0, where the coordinates are those of an observer at rest relative to the clock. Since the area vanishes for x = t, the $x^2$ coefficient must equal $-1$.
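As a numerical cross-check of this result, the following sketch computes the area of a light-rectangle directly and compares it with half of $t^2 - x^2$. The function names are hypothetical, the event coordinates are arbitrary, and the `boost` helper borrows the Lorentz transformation that is only introduced later, in section 1.4, so this is an illustration under those assumptions rather than part of the proof.

```python
import math

def light_rectangle_area(t, x):
    """Area of the light-rectangle with opposite corners at (0, 0) and (t, x).

    The sides lie along lines of slope +1 and -1, so their Euclidean lengths
    on the diagram are |t + x|/sqrt(2) and |t - x|/sqrt(2).
    """
    return abs(t + x) * abs(t - x) / 2.0

def boost(t, x, v):
    """Coordinates of the same event for an observer moving at velocity v (c = 1)."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    return g * (t - v * x), g * (x - v * t)

t, x = 5.0, 3.0                      # a timelike separation, arbitrary numbers
area = light_rectangle_area(t, x)    # (5 + 3)(5 - 3)/2
tau_squared = t**2 - x**2            # square of the gliding-clock time

# Boost to the frame in which A and B happen at the same place (v = x/t).
t2, x2 = boost(t, x, x / t)
area2 = light_rectangle_area(t2, x2)

print(area, tau_squared / 2, area2, t2)
```

In the boosted frame the separation is purely in time, so `t2` is the gliding-clock time itself, and the area is unchanged.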

When |x| is greater than |t|, events A and B are so far apart in space and so close together in time that it would be impossible to have a cause and effect relationship between them, since c = 1 is the maximum speed of cause and effect. In this situation $t^2 - x^2$ is negative and cannot be interpreted as a clock time, but it can be interpreted as minus the square of the distance between A and B, as measured in a frame of reference in which A and B are simultaneous.

Generalizing to 3+1 dimensions and to any vector v, not just a displacement in spacetime, we have a measurement of the vector defined by

$$v_t^2 - v_x^2 - v_y^2 - v_z^2 .$$

In the special case where v is a spacetime displacement, this can be referred to as the spacetime interval. Except for the signs, this looks very much like the Pythagorean theorem, which is a special case of the vector dot product. We therefore define a function g called the metric,

$$g(u, v) = u_t v_t - u_x v_x - u_y v_y - u_z v_z .$$

Because of the analogy with the Euclidean dot product, we often use the notation u · v for this quantity, and we sometimes call it the inner product. The metric is the central object of relativity. In general relativity, which describes gravity as a curvature of spacetime, the coefficients occurring on the right-hand side are no longer ±1, but must vary from point to point. Even in special relativity, where the coefficients can be made constant, the definition of g is arbitrary up to a nonzero multiplicative constant, and in particular many authors define g as the negative of our definition. The sign convention we use is the most common one in particle physics, while the opposite is more common in classical relativity. The set of signs, + − − − or − + + +, is called the signature of the metric.

In subsection 1.1.3 we developed the idea of orthogonality of spacetime vectors, with the physical interpretation that if an observer moves along a vector o, a vector s that is orthogonal to o is a vector of simultaneity. This corresponds to the vanishing of the inner product, o · s = 0, and is only imperfectly analogous to the idea that Euclidean vectors are perpendicular if their dot product is zero. In particular, a nonzero Euclidean vector is never perpendicular to itself, but for any lightlike vector v we have v · v = 0. The metric doesn't give us a measure of the length of lightlike vectors. Physically, neither a ruler nor a clock can measure such a vector.
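The inner product is simple enough to spell out in a few lines of code. The sketch below uses the + − − − signature in natural units; the names `g` and `classify` and the sample vectors are illustrative choices, not notation taken from the text.

```python
def g(u, v):
    """Inner product with signature + - - -; u and v are (t, x, y, z) tuples, c = 1."""
    return u[0] * v[0] - u[1] * v[1] - u[2] * v[2] - u[3] * v[3]

def classify(v):
    """Label a vector timelike, lightlike, or spacelike from the sign of g(v, v)."""
    s = g(v, v)
    if s > 0:
        return "timelike"
    if s < 0:
        return "spacelike"
    return "lightlike"

o = (1, 0, 0, 0)        # observer at rest in these coordinates
s = (0, 1, 0, 0)        # a vector of simultaneity for that observer
ray = (1, 1, 0, 0)      # a light ray moving in the +x direction

print(g(o, s))          # o and s are orthogonal
print(g(ray, ray))      # a lightlike vector is orthogonal to itself
print(classify((2, 1, 0, 0)), classify((1, 2, 0, 0)), classify(ray))
```

Integer components are used so that the comparisons with zero are exact; with floating-point data a tolerance would be needed in `classify`.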

The metric in SI units Example 5

Units with c = 1 are known as natural units. (They are natural to relativity in the same sense that units with ħ = 1 are natural to quantum mechanics.) Any equation expressed in natural units can be reexpressed in SI units by the simple expedient of inserting factors of c wherever they are needed in order to get units that make sense. The result for the metric could be

$$g(u, v) = c^2 u_t v_t - u_x v_x - u_y v_y - u_z v_z$$

or

$$g(u, v) = u_t v_t - (u_x v_x + u_y v_y + u_z v_z)/c^2 .$$

It doesn't matter which we pick, since the metric is arbitrary up to a constant factor. For spacetime displacements, the former expression gives a result in units of m², the latter in s².
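A short sketch can confirm that the two SI forms differ only by the constant factor $c^2$. The function names and the sample displacement are hypothetical; the value of c is the exact SI value.

```python
C = 299_792_458.0  # speed of light in m/s (exact in SI)

def g_meters(u, v):
    """Metric with the time term multiplied by c^2; for displacements, result in m^2."""
    return C**2 * u[0] * v[0] - (u[1] * v[1] + u[2] * v[2] + u[3] * v[3])

def g_seconds(u, v):
    """Metric with the spatial terms divided by c^2; for displacements, result in s^2."""
    return u[0] * v[0] - (u[1] * v[1] + u[2] * v[2] + u[3] * v[3]) / C**2

# Hypothetical displacement (t in seconds; x, y, z in meters).
d = (1.0, 1.0e8, 0.0, 0.0)

print(g_meters(d, d), g_seconds(d, d))
print(g_meters(d, d) / C**2 - g_seconds(d, d))   # zero up to rounding
```

Because the factor is positive, both forms agree on whether a vector is timelike, lightlike, or spacelike.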

Orthogonal light rays? Example 6

On a spacetime diagram in 1+1 dimensions, we represent the light cone with the two lines x = ±t, drawn at an angle of 90 degrees relative to one another. Are these lines orthogonal?

No. For example, if u = (1, 1) and v = (1, −1), then u · v is 2, not zero.

Pioneer 10 Example 7

The Pioneer 10 space probe was launched in 1972, and in 1973 was the first craft to fly by the planet Jupiter. It crossed the orbit of the planet Neptune in 1983, after which telemetry data were received until 2002. The following table gives the spacecraft's position relative to the sun at exactly midnight on January 1, 1983 and January 1, 1995. The 1983 date is taken to be t = 0.

t (s)                   x (m)            y (m)            z (m)
0                       1.784 × 10^12    3.951 × 10^12    0.237 × 10^12
3.7869120000 × 10^8     2.420 × 10^12    8.827 × 10^12    0.488 × 10^12

Compare the time elapsed on the spacecraft to the time in a frame of reference tied to the sun.

s / The twin paradox.

t / A graph of γ as a function of v.

We can convert these data into natural units, with the distance unit being the second (i.e., a light-second, the distance light travels in one second) and the time unit being seconds. Converting and subtracting the first row from the second, we have:

Δt (s)                  Δx (s)           Δy (s)           Δz (s)
3.7869120000 × 10^8     0.2121 × 10^4    1.626 × 10^4     0.084 × 10^4

Comparing the exponents of the temporal and spatial numbers, we can see that the spacecraft was moving at a velocity on the order of $10^{-4}$ of the speed of light, so relativistic effects should be small but not completely negligible.

Since the interval is timelike, we can take its square root and interpret it as the time elapsed on the spacecraft. The result is τ = 3.786911996 × 10^8 s. This is 0.4 s less than the time elapsed in the sun's frame of reference.
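The arithmetic of this example can be reproduced in a few lines. The sketch below takes the tabulated positions at face value and uses the exact SI value of c; the variable names are illustrative. One design choice is worth noting: the difference between the two elapsed times is computed from the algebraically equivalent form $(\Delta x^2+\Delta y^2+\Delta z^2)/(\Delta t + \tau)$ rather than by subtracting two nearly equal nine-digit numbers.

```python
import math

C = 299_792_458.0  # m/s

# Positions from the table (meters) and the elapsed coordinate time (seconds).
r_1983 = (1.784e12, 3.951e12, 0.237e12)
r_1995 = (2.420e12, 8.827e12, 0.488e12)
dt = 3.7869120000e8

# Spatial displacement converted to light-seconds.
dx, dy, dz = ((b - a) / C for a, b in zip(r_1983, r_1995))

dist_sq = dx**2 + dy**2 + dz**2
tau = math.sqrt(dt**2 - dist_sq)      # proper time on the spacecraft

# dt - tau, written so as to avoid cancellation between large numbers.
deficit = dist_sq / (dt + tau)

print(f"dx, dy, dz = {dx:.4g}, {dy:.4g}, {dz:.4g} s")
print(f"tau = {tau:.9e} s, deficit = {deficit:.2f} s")
```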

1.3.3 The gamma factor

Figure s is the relativistic version of example 1 on p. 14. We intend to analyze it using the metric, and since the metric gives the same result in any frame, we have chosen for convenience to represent it in the frame in which the earth is at rest. We have a = (t, 0) and b = (t, vt), where v is the velocity of the spaceship relative to the earth. Application of the metric gives proper time t for the earthbound twin and $t\sqrt{1-v^2}$ for the traveling twin. The same results apply for c and d. The result is that the earthbound twin experiences a time that is greater by a factor γ (Greek letter gamma) defined as $\gamma = 1/\sqrt{1-v^2}$. If v is close to c, γ can be large, and we find that when the astronaut twin returns home, still youthful, the earthbound twin can be old and gray. This was at one time referred to as the twin paradox, and it was considered paradoxical either because it seemed to defy common sense or because the traveling twin could argue that she was the one at rest while the earth was moving. The violation of common sense is in fact what was observed in the Hafele-Keating experiment, and the latter argument is fallacious for the same reasons as in the Galilean version given in example 1.

We have in general the following interpretation:

Time dilation

A clock runs fastest in the frame of reference of an observer who is at rest relative to the clock. An observer in motion relative to the clock at speed v perceives the clock as running more slowly by a factor of γ.

Although this is phrased in terms of clocks, we interpret it as telling us something about time itself. The attitude is that we should define a concept in terms of the operations required in order to measure it: time is defined as what a clock measures. This philosophy, which has been immensely influential among physicists, is called operationalism and was developed by P.W. Bridgman in the 1920s. Our operational definition of time works because the rates of all physical processes are affected equally by time dilation.[6] By the time the twins in figure s are reunited, not only has the traveling twin heard fewer ticks from her antique mechanical pocket watch, but she has also had fewer heartbeats, and the ship's atomic clock agrees with her watch to within the precision of the watch.

self-check A

What is γ when v = 0? What does this mean? Express the equation for γ in SI units. Answer, p. ??

Time dilation is symmetrical in the sense that it treats all frames of reference democratically. If observers A and B aren't at rest relative to each other, then A says B's time runs slow, but B says A is the slow one. In figure s, the laws of physics make no distinction between the frames of reference that coincide with vectors a and b; as in the corresponding Galilean case of example 1 on p. 14, the asymmetry comes about because a and c are parallel, but b and d are not.

As shown in example 8 below, consistency demands that in addition to the effect on time we have a similar effect on distances:

Length contraction
A meter-stick appears longest to an observer who is at rest relative to it. An observer moving relative to the meter-stick at v observes the stick to be shortened by a factor of γ.

Our present discussion is limited to 1+1 dimensions, but in 3+1, only the length along the line of motion is contracted (ch. 2, problem 2, p. 47). It should not be imagined that length contraction is what an observer actually sees visually. Optical observations are influenced, for example, by the unequal times taken for light to propagate from the ends of the stick to the eye. A simulation of this type of effect is drawn in example 6 on p. 120.

An interstellar road trip Example 8

Alice stays on earth while her twin Betty heads off in a spaceship for Tau Ceti, a nearby star. Tau Ceti is 12 light-years away, so even though Betty travels at 87% of the speed of light, it will take her a long time to get there: 14 years, according to Alice.

u / Example 8.

[6] For more on this topic, see section 6.1.

v / Time dilation measured with an atomic clock at low speeds. The theoretical curve, shown with a dashed line, is calculated from $\gamma = 1/\sqrt{1-v^2}$; at these small velocities, the approximation $\gamma \approx 1 + v^2/2$ is excellent, and the graph is indistinguishable from a parabola. This graph corresponds to an extreme close-up view of the lower left corner of figure t. The error bars on the experimental points are about the same size as the dots.

Betty experiences time dilation. At this speed, her γ is 2.0, so that the voyage will only seem to her to last 7 years. But there is perfect symmetry between Alice's and Betty's frames of reference, so Betty agrees with Alice on their relative speed; Betty sees herself as being at rest, while the sun and Tau Ceti both move backward at 87% of the speed of light. How, then, can she observe Tau Ceti to get to her in only 7 years, when it should take 14 years to travel 12 light-years at this speed?

We need to take into account length contraction. Betty sees the distance between the sun and Tau Ceti to be shrunk by a factor of 2. The same thing occurs for Alice, who observes Betty and her spaceship to be foreshortened.
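The numbers in this example follow directly from the definition of γ. The sketch below is one way to check them; the function and variable names are illustrative, and the inputs are the rounded figures quoted above (v = 0.87, 12 light-years).

```python
import math

def gamma(v):
    """Gamma factor for speed v in natural units (c = 1), valid for |v| < 1."""
    return 1.0 / math.sqrt(1.0 - v * v)

v = 0.87                 # Betty's speed as a fraction of c
distance = 12.0          # light-years, measured in Alice's frame

g = gamma(v)
t_alice = distance / v           # years elapsed for Alice
t_betty = t_alice / g            # years elapsed for Betty (time dilation)
d_betty = distance / g           # contracted distance in Betty's frame

print(f"gamma = {g:.2f}")
print(f"Alice: {t_alice:.1f} y, Betty: {t_betty:.1f} y")
print(f"contracted distance / speed = {d_betty / v:.1f} y")
```

The last line is the consistency check that the example is driving at: the contracted distance divided by the relative speed gives the same travel time that time dilation assigns to Betty.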

A moving atomic clock Example 9

Expanding γ in a Taylor series, we find $\gamma \approx 1 + v^2/2$, so that when v is small, relativistic effects are approximately proportional to $v^2$, so it is very difficult to observe them at low speeds. This was the reason that the Hafele-Keating experiment was done aboard passenger jets, which fly at high speeds. Jets, however, fly at high altitude, and this brings in a second time dilation effect, a general-relativistic one due to gravity. The main purpose of the experiment was actually to test this effect.

It was not until four decades after Hafele and Keating that anyone did a conceptually simple atomic clock experiment in which the only effect was motion, not gravity. In 2010, however, Chou et al.[7] succeeded in building an atomic clock accurate enough to detect time dilation at speeds as low as 10 m/s. Figure v shows their results. Since it was not practical to move the entire clock, the experimenters only moved the aluminum atoms inside the clock that actually made it tick.
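To get a feel for how small the effect is at 10 m/s, one can evaluate γ − 1 numerically. A naive evaluation of $1/\sqrt{1-v^2} - 1$ in double precision loses most of its accuracy here, so the sketch below uses an algebraically equivalent form, $v^2/[\sqrt{1-v^2}\,(1+\sqrt{1-v^2})]$. The function name is hypothetical, and the rearrangement is a numerical convenience chosen for this illustration, not something taken from the text.

```python
import math

C = 299_792_458.0  # m/s

def gamma_minus_one(v):
    """gamma - 1 for speed v (c = 1), written to stay accurate when v is tiny.

    Algebraically identical to 1/sqrt(1 - v^2) - 1, but avoids subtracting
    two numbers that agree to about 15 digits.
    """
    root = math.sqrt(1.0 - v * v)
    return v * v / (root * (1.0 + root))

v = 10.0 / C                      # 10 m/s in natural units
exact = gamma_minus_one(v)
taylor = v * v / 2.0              # leading term of the Taylor series

print(f"v = {v:.3e}, gamma - 1 = {exact:.3e}, v^2/2 = {taylor:.3e}")
```

The fractional rate difference comes out at a few parts in $10^{16}$, which indicates the level of clock stability such a measurement requires.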

Large time dilation Example 10

The time dilation effects described in example 9 were very small. If we want to see a large time dilation effect, we can't do it with something the size of the atomic clocks they used; the kinetic energy would be greater than the total megatonnage of all the world's nuclear arsenals. We can, however, accelerate subatomic particles to speeds at which γ is large. For experimental particle physicists, relativity is something you do all day before heading home and stopping off at the store for milk. An early, low-precision experiment of this kind was performed by Rossi and Hall in 1941, using naturally occurring cosmic rays. Figure w shows a 1974 experiment[8] of a similar type which verified the time dilation predicted by relativity to a precision of about one part per thousand.

[7] Science 329 (2010) 1630
[8] Bailey et al., Nucl. Phys. B150 (1979) 1

w / Left: Apparatus used for the test of relativistic time dilation described in example 10. The prominent black and white blocks are large magnets surrounding a circular pipe with a vacuum inside. (c) 1974 by CERN. Right: Muons accelerated to nearly c undergo radioactive decay much more slowly than they would according to an observer at rest with respect to the muons. The first two data-points (unfilled circles) were subject to large systematic errors.

Particles called muons (named after the Greek letter μ, "myoo") were produced by an accelerator at CERN, near Geneva. A muon is essentially a heavier version of the electron. Muons undergo radioactive decay, lasting an average of only 2.197 μs before they evaporate into an electron and two neutrinos. The 1974 experiment was actually built in order to measure the magnetic properties of muons, but it produced a high-precision test of time dilation as a byproduct. Because muons have the same electric charge as electrons, they can be trapped using magnetic fields. Muons were injected into the ring shown in figure w, circling around it until they underwent radioactive decay. At the speed at which these muons were traveling, they had γ = 29.33, so on the average they lasted 29.33 times longer than the normal lifetime. In other words, they were like tiny alarm clocks that self-destructed at a randomly selected time. The graph shows the number of radioactive decays counted, as a function of the time elapsed after a given stream of muons was injected into the storage ring. The two dashed lines show the rates of decay predicted with and without relativity. The relativistic line is the one that agrees with experiment.
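The two dashed lines can be mimicked with a simple exponential decay law. The sketch below uses the quoted values γ = 29.33 and 2.197 μs; the observation time of 100 μs is a made-up number for illustration, and the function name is hypothetical.

```python
import math

gamma = 29.33                 # value quoted for the stored muons
tau_rest = 2.197e-6           # mean lifetime at rest, in seconds

v = math.sqrt(1.0 - 1.0 / gamma**2)     # speed as a fraction of c
tau_lab = gamma * tau_rest              # dilated mean lifetime in the lab

def surviving_fraction(t, mean_life):
    """Fraction of an initial sample not yet decayed after time t."""
    return math.exp(-t / mean_life)

t = 100e-6   # hypothetical observation time of 100 microseconds

print(f"v = {v:.5f} c, lab lifetime = {tau_lab * 1e6:.2f} microseconds")
print(f"surviving with time dilation:    {surviving_fraction(t, tau_lab):.3g}")
print(f"surviving without time dilation: {surviving_fraction(t, tau_rest):.3g}")
```

Without time dilation essentially no muons would remain after a few tens of microseconds, which is why the two predictions are so easy to tell apart.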

1.4 The Lorentz transformation

Philosophically, coordinates are unnecessary, but in practical terms they are convenient. They are arbitrary, so we can change from one set to another. For example, we can simply change the units used to measure time and position, as in the first and second panels of figure x. Nothing changes about the underlying events; only the labels are different. The third panel of figure x shows a convenient convention we will use to depict such changes visually. The gray rectangle represents the original coordinate grid from the first panel, while the grid of black lines represents the new version from the second panel. Omitting the grid from the gray rectangle makes the diagram easier to decode visually.

In special relativity it is of interest to convert between the Minkowski coordinates of observers who are in motion relative to one another. The result, shown in figure y, is a kind of stretching and smooshing of the diagonals. Since the area is invariant, one diagonal grows by the same factor by which the other shrinks. This change of coordinates is called the Lorentz transformation.

x / Two events are given as points on a graph of position versus time. Joan of Arc helps to restore Charles VII to the throne. At a later time and a different position, Joan of Arc is sentenced to death.

y / The Lorentz transformation.

z / 1. The clock is at rest in the original frame of reference, and it measures a time interval t. In the new frame of reference, the time interval is greater by a factor of γ. 2. The ruler is moving in the first frame, represented by a square, but at rest in the second one, shown as a parallelogram. Each picture of the ruler is a snapshot taken at a certain moment as judged according to the second frame's notion of simultaneity. An observer in the first frame judges the ruler's length instead according to that frame's definition of simultaneity, i.e., using points that are lined up vertically on the graph. The ruler appears shorter in the frame in which it is moving.

Figure z shows how time dilation and length contraction come about in this picture. It should be emphasized here that the Lorentz transformation includes more effects than just length contraction and time dilation. Many beginners at relativity get confused and come to erroneous conclusions by trying to reduce everything to a matter of inserting factors of γ in various equations. If the Lorentz transformation amounted to nothing more than length contraction and time dilation, it would be merely a change of units like the one shown in figure x.

The Lorentz transformation translates into algebraic notation like this:

$$\begin{aligned} t' &= \gamma t - v\gamma x \\ x' &= -v\gamma t + \gamma x \end{aligned} \qquad (1)$$

The line x = 0 is described in the (t′, x′) plane by a line with slope −1/v, which is what justifies the interpretation of v as a velocity. The inverse transformation is the one with v replaced by −v. These properties are shared by the Galilean transformation. The fact that this is the correct relativistic transformation can be verified by noting that (1) the lines x = ±t are preserved, and (2) the determinant equals 1, so that areas are preserved. Alternatively, it is sufficient to check the invariance of the spacetime interval under this transformation.
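These properties can be checked numerically for any particular velocity. The sketch below writes the transformation as a 2×2 matrix acting on (t, x); the helper names, the choice v = 0.6, and the test events are all arbitrary illustrations.

```python
import math

def lorentz_matrix(v):
    """2x2 matrix taking (t, x) to (t', x') for relative velocity v, with c = 1."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    return ((g, -v * g),
            (-v * g, g))

def apply(m, event):
    """Multiply the matrix m by the column vector (t, x)."""
    t, x = event
    return (m[0][0] * t + m[0][1] * x, m[1][0] * t + m[1][1] * x)

v = 0.6
m = lorentz_matrix(v)

det = m[0][0] * m[1][1] - m[0][1] * m[1][0]      # should be 1 (areas preserved)
t1, x1 = apply(m, (1.0, 1.0))                    # a point on the line x = t
t2, x2 = apply(m, (1.0, -1.0))                   # a point on the line x = -t
back = apply(lorentz_matrix(-v), apply(m, (2.0, 5.0)))   # inverse uses -v

print(det)
print((t1, x1), (t2, x2))
print(back)
```

The two light-cone points stay on their respective lines but move along them by reciprocal factors, which is the stretching and smooshing of the diagonals described above.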

A numerical example of invariance Example 11

Figure aa shows two frames of reference in motion relative to one another at v = 3/5. (For this velocity, the stretching and squishing of the main diagonals are both by a factor of 2.) Events are marked at coordinates that in the frame represented by the square are

(t, x) = (0, 0) and
(t, x) = (13, 11) .

The interval between these events is $13^2 - 11^2 = 48$. In the frame represented by the parallelogram, the same two events lie at coordinates

(t′, x′) = (0, 0) and
(t′, x′) = (8, 4) .

Calculating the interval using these values, the result is $8^2 - 4^2 = 48$, which comes out the same as in the other frame.
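The coordinates quoted in this example can be reproduced by applying the transformation directly. In the sketch below the function names are illustrative, and the expression $\sqrt{(1+v)/(1-v)}$ for the stretch factor of the diagonals is a standard result that is assumed here rather than derived in the text.

```python
import math

def boost(t, x, v):
    """Lorentz transformation of an event (t, x) for relative velocity v, c = 1."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    return g * t - v * g * x, -v * g * t + g * x

def interval(t, x):
    """Spacetime interval t^2 - x^2 between the origin and the event (t, x)."""
    return t * t - x * x

v = 3 / 5
t, x = 13, 11
tp, xp = boost(t, x, v)

stretch = math.sqrt((1 + v) / (1 - v))   # factor by which one diagonal stretches

print(tp, xp)                            # approximately 8 and 4
print(interval(t, x), interval(tp, xp))  # both approximately 48
print(stretch)                           # approximately 2
```

The results are only approximate in floating point because 3/5 has no exact binary representation.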


aa / Example 11.

The garage paradox Example 12

One of the most famous of all the so-called relativity paradoxes has to do with our incorrect feeling that simultaneity is well defined. The idea is that one could take a schoolbus and drive it at relativistic speeds into a garage of ordinary size, in which it normally would not fit. Because of the length contraction, the bus would supposedly fit in the garage. The driver, however, will perceive the garage as being contracted and thus even less able to contain the bus.

The paradox is resolved when we recognize that the concept of fitting the bus in the garage "all at once" contains a hidden assumption, the assumption that it makes sense to ask whether the front and back of the bus can simultaneously be in the garage. Observers in different frames of reference moving at high relative speeds do not necessarily agree on whether things happen simultaneously. As shown in figure ab, the person in the garage's frame can shut the door at an instant B he perceives to be simultaneous with the front bumper's arrival A at the back wall of the garage, but the driver would not agree about the simultaneity of these two events, and would perceive the door as having shut long after she plowed through the back wall.

ab / Example 12: In the garage's frame of reference, the bus is moving, and can fit in the garage due to its length contraction. In the bus's frame of reference, the garage is moving, and can't hold the bus due to its length contraction.

ac / Example 13.

Shifting clocks Example 13

The top row of clocks in the figure are located in three different places. They have been synchronized in the frame of reference of the earth, represented by the paper. This synchronization is carried out by exchanging light signals (Einstein synchronization), as in example 4 on p. 19. For example, if the front and back clocks both send out flashes of light when they think it's 2 o'clock, the one in the middle will receive them both at the same time. Event A is the one at which the back clock A reads 2 o'clock, etc.

The bottom row of clocks are aboard the train, and have been synchronized in a similar way. For the reasons discussed in example 4, their synchronization differs from that of the earth-based clocks. By referring to the diagram of the Lorentz transformation shown on the right, we see that in the frame of the train, 2, C happens first, then B, then A.

This is an example of the interpretation of the term $t' = \ldots - v\gamma x$ in the Lorentz transformation (eq. (1), p. 30). Because the events occur at different x's, each is shifted in time relative to the next, according to clocks synchronized in frame 2 (t′, the train).
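The ordering of the three events can be recovered from the $-v\gamma x$ term alone. The sketch below assumes, for illustration, that the train moves in the positive x direction at v = 0.6 and that the back, middle, and front clocks sit at x = −1, 0, and 1; none of these numbers come from the figure.

```python
import math

def t_prime(t, x, v):
    """Time coordinate in the moving frame: t' = gamma*t - v*gamma*x, with c = 1."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    return g * t - v * g * x

v = 0.6   # hypothetical speed of the train in the +x direction

# Hypothetical positions of the back (A), middle (B), and front (C) clocks.
# All three events happen at t = 0 in the earth's frame.
events = {"A": (0.0, -1.0), "B": (0.0, 0.0), "C": (0.0, 1.0)}

order = sorted(events, key=lambda name: t_prime(*events[name], v))
print(order)   # ['C', 'B', 'A']
```

Events farther forward along the direction of motion receive earlier values of t′, so the train's clocks place C before B and B before A.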


Problem 5.

Problems

1 Astronauts in three different spaceships are communicating with each other. Those aboard ships A and B agree on the rate at which time is passing, but they disagree with the ones on ship C.
(a) Alice is aboard ship A. How does she describe the motion of her own ship, in its frame of reference?
(b) Describe the motion of the other two ships according to Alice.
(c) Give the description according to Betty, whose frame of reference is ship B.
(d) Do the same for Cathy, aboard ship C.

2 What happens in the equation for γ when you put in a negative number for v? Explain what this means physically, and why it makes sense.

3 The Voyager 1 space probe, launched in 1977, is moving faster relative to the earth than any other human-made object, at 17,000 meters per second.
(a) Calculate the probe's γ.
(b) Over the course of one year on earth, slightly less than one year passes on the probe. How much less? (There are 31 million seconds in a year.)

relativistically in the direction of its motion. Compute the amount

by which its diameter shrinks in this direction.

Which of these represent spacetime intervals that are equal to one

another?

6 (a) In Euclidean geometry in three dimensions, suppose we have two vectors, a and b, which are unit vectors, i.e., a · a = 1 and b · b = 1. What is the range of possible values for the inner product a · b?
(b) Repeat part a for two timelike, future-directed unit vectors in 3 + 1 dimensions.

7 Expressed in natural units, the Lorentz transformation is

$$\begin{aligned} t' &= \gamma t - v\gamma x \\ x' &= -v\gamma t + \gamma x . \end{aligned}$$

(a) Insert factors of c to make it valid in units where c ≠ 1. (b) Show that in the limit c → ∞, these have the right Galilean behavior.

8 This problem assumes you have some basic knowledge of quantum physics. One way of expressing the correspondence principle as applied to special relativity is that in the limit c → ∞, all relativistic expressions have to go over to their Galilean counterparts. What would be the corresponding limit if we wanted to recover classical mechanics from quantum mechanics?


9 In 3 + 1 dimensions, prove that if u and v are nonzero, future-lightlike, and not parallel to each other, then their sum is future-timelike.

10 Prove that if u and v are nonzero, lightlike, and orthogonal to each other, then they are parallel, i.e., u = cv for some c ≠ 0.

11 The speed at which a disturbance travels along a string under tension is given by $v = \sqrt{T/\mu}$, where μ is the mass per unit length, and T is the tension.
(a) Suppose a string has a density ρ, and a cross-sectional area A. Find an expression for the maximum tension that could possibly exist in the string without producing v > c, which is impossible according to relativity. Express your answer in terms of ρ, A, and c. The interpretation is that relativity puts a limit on how strong any material can be.

(b) Every substance has a tensile strength, defined as the force
per unit area required to break it by pulling it apart. The tensile
strength is measured in units of N/m$^2$, which is the same as the
pascal (Pa), the mks unit of pressure. Make a numerical estimate
of the maximum tensile strength allowed by relativity in the case
where the rope is made out of ordinary matter, with a density on
the same order of magnitude as that of water. (For comparison,
kevlar has a tensile strength of about $4 \times 10^{9}$ Pa, and there is
speculation that fibers made from carbon nanotubes could have values
as high as $6 \times 10^{10}$ Pa.)

(c) A black hole is a star that has collapsed and become very dense,

so that its gravity is too strong for anything ever to escape from it.

For instance, the escape velocity from a black hole is greater than

c, so a projectile can't be shot out of it. Many people, when they
hear this description of a black hole in terms of an escape velocity
greater than c, wonder why it still wouldn't be possible to extract

an object from a black hole by other means than launching it out

as a projectile. For example, suppose we lower an astronaut into a

black hole on a rope, and then pull him back out again. Why might

this not work?


Problem 14.

12 The rod in the figure is perfectly rigid. At event A, the
hammer strikes one end of the rod. At event B, the other end moves.
Since the rod is perfectly rigid, it can't compress, so A and B are
simultaneous. In frame 2, B happens before A. Did the motion at
the right end cause the person on the left to decide to pick up the
hammer and use it?

Problem 12.

13 Use a spacetime diagram to resolve the following relativity

paradox. Relativity says that in one frame of reference, event A

could happen before event B, but in someone else's frame B would
come before A. How can this be? Obviously the two people could
meet up at A and talk as they cruised past each other. Wouldn't

they have to agree on whether B had already happened?

14 The grid represents spacetime in a certain frame of reference.

Event A is marked with a dot. Mark additional points satisfying the

following criteria. (Pick points that lie at the intersections of the

gridlines.)

Point B is at the same location as A in this frame of reference, and

lies in its future.

C is also in point A's future, is not at the same location as A in
this frame, but is in the same location as A according to some other
frame of reference.

D is simultaneous with A in this frame of reference.

E is not simultaneous with A in this frame of reference, but is
simultaneous with it according to some other frame.

F lies in A's past according to this frame of reference, but could not
have caused A.

G lies in A's future according to this frame of reference, but is in its
past according to some other frames.

H lies in A's future according to any frame of reference, not just this
one.


I is the departure of a spaceship, which arrives at A.

J could have caused A, but could not have been the departure of a

spaceship like I that arrived later at A.


a / Newton's laws do not distinguish past from future. The
football could travel in either direction while obeying Newton's
laws.

Chapter 2

Foundations (optional)

In this optional chapter we more systematically examine the foun-

dational assumptions of special relativity, which were appealed to

casually in chapter 1. Most readers will want to skip this chapter

and move on to ch. 3. The ordering of chapters 1 and 2 may seem

backwards, but many of the issues to be raised here are very subtle

and hard to appreciate without already understanding something

about special relativity; in fact, Einstein and other relativists did

not understand them properly until decades after the introduction

of special relativity in 1905.

2.1 Causality

2.1.1 The arrow of time

Our intuitive belief in cause-and-effect mechanisms is not supported
in any clearcut way by the laws of physics as currently understood.
For example, we feel that the past affects the future but
not the other way around, but this feeling doesn't seem to translate
into physical law. For example, Newton's laws are invariant under
time reversal, figure a, as are Maxwell's equations. (The weak
nuclear force is the only part of the standard model that violates
time-reversal symmetry, and even it is invariant under the CPT
transformation.)

There is an arrow of time provided by the second law of thermodynamics,
and this arises ultimately from the fact that, for reasons
unknown to us, the universe soon after the Big Bang was in a state
of extremely low entropy.[1]

2.1.2 Initial-value problems

So rather than depending on the arrow of time, we may be better
off formulating a notion of causality based on existence and uniqueness
of initial-value problems. In 1776, Laplace gave an influential
early formulation of this idea in the context of Newtonian mechanics:
"Given for one instant an intelligence which could comprehend
all the forces by which nature is animated and the respective positions
of the things which compose it . . . nothing would be uncertain,
and the future as the past would be laid out before its eyes." The
reference to "one instant" is not compatible with special relativity,
which has no frame-independent definition of simultaneity. We can,
however, define initial conditions on some spacelike three-surface,
i.e., a three-dimensional set of events that is smooth, has the topology
of Euclidean space, and whose events are spacelike in relation
to one another.

[1] One can find a vast amount of nonsense written about this, such as claims
that the second law is derivable without reference to any cosmological context.
For a careful treatment, see Callender, "Thermodynamic Asymmetry
in Time," The Stanford Encyclopedia of Philosophy,
plato.stanford.edu/archives/fall2011/entries/time-thermo.

Unfortunately it is not obvious whether the classical laws of
physics satisfy Laplace's definition of causality. Two interesting
and accessible papers that express a skeptical view on this issue are
Norton, "Causation as Folk Science," philsci-archive.pitt.edu/1214;
and Echeverria et al., "Billiard balls in wormhole spacetimes
with closed timelike curves: Classical theory,"
http://resolver.caltech.edu/CaltechAUTHORS:ECHprd91. The Norton paper in
particular has generated a large literature at the interface between
physics and philosophy, and one can find most of the relevant material
online using the keywords "Norton's dome."

Nor does general relativity offer much support to the Laplacian
version of causality. For example, general relativity says that given
generic initial conditions, gravitational collapse leads to the formation
of singularities, points where the structure of spacetime breaks
down and various measurable quantities become infinite. Singularities
typically violate causality, since the laws of physics can't
describe them. In a famous image, John Earman wrote that if we
have a certain type of singularity (called a naked singularity), "all
sorts of nasty things . . . emerge helter-skelter . . . ," including "TV
sets showing Nixon's 'Checkers' speech, green slime, Japanese horror
movie monsters, etc."

2.1.3 A modest definition of causality

Since there does not seem to be any reason to expect causality
to hold in any grand sense, we will content ourselves here with a
very modest and specialized definition, stated as a postulate, that
works well enough for special relativity.

P1. Causality. There exist events 1 and 2 such that the displacement
vector $r_{12}$ is timelike in all frames.

This is sufficient to rule out the rotational version of the
Lorentz transformation shown in figure j on p. 16. If P1 were violated,
then we could never describe one event as causing another,
since there would always be frames of reference in which the effect
was observed as preceding the cause.
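As a rough numerical illustration of what P1 buys us, the sketch below applies the ordinary (hyperbolic) Lorentz transformation from chapter 1, in units with c = 1, to one timelike and one spacelike displacement over a range of boost velocities. The function names `boost` and `time_orderings` and the sample displacements are assumptions made only for this example.

```python
import math

def boost(t, x, v):
    """Lorentz-transform the displacement (t, x) to a frame moving at velocity v (c = 1)."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return gamma * (t - v * x), gamma * (x - v * t)

def time_orderings(t, x, velocities):
    """Collect the signs of the transformed time component over the given boosts."""
    signs = set()
    for v in velocities:
        t_prime, _ = boost(t, x, v)
        signs.add(0.0 if t_prime == 0 else math.copysign(1.0, t_prime))
    return signs

# Boost velocities from -0.99 to 0.99 in steps of 0.01.
velocities = [i / 100.0 for i in range(-99, 100)]

timelike = (2.0, 1.0)    # |t| > |x|: hypothetical cause-and-effect pair
spacelike = (1.0, 2.0)   # |t| < |x|: no causal relationship possible

timelike_signs = time_orderings(timelike[0], timelike[1], velocities)
spacelike_signs = time_orderings(spacelike[0], spacelike[1], velocities)

print("timelike :", timelike_signs)    # a single sign: all frames agree on the order
print("spacelike:", spacelike_signs)   # both signs occur: the order is frame-dependent
```

For the timelike pair every sampled frame agrees on which event comes first, so it makes sense to call one the cause of the other; for the spacelike pair the time-ordering flips once the boost is fast enough.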

2.2 Flatness


b / An airplane flying from
Mexico City to London follows
the shortest path, which is a
segment of a great circle. A path
of extremal length between two
points is called a geodesic.

c / Transporting the vector

along path AC gives a different

result than doing it along the path

ABC.

d / Parallel transport.

2.2.1 Failure of parallelism

In postulate P1 we implicitly assumed that given two points,
there was a certain vector connecting them. This is analogous to
the Euclidean postulate that two points define a line.

For insight, let's think about how the Euclidean version of this
assumption could fail. Euclidean geometry is only an approximate
description of the earth's surface, for example, and this is why flat
maps always entail distortions of the actual shapes. The distortions
might be negligible on a map of Connecticut, but severe for a map
of the whole world. That is, the globe is only locally Euclidean.
On a spherical surface, the appropriate object to play the role of a
line is a great circle, figure b. The lines of longitude are examples
of great circles, and since these all coincide at the poles, we can see
that two points do not determine a line in noneuclidean geometry.

A two-dimensional bug living on the surface of a sphere would
not be able to tell that the sphere was embedded in a third dimension,
but it could still detect the curvature of the surface. It could
tell that Euclid's postulates were false on large distance scales. A
method that has a better analog in spacetime is shown in figure
c: transporting a vector from one point to another depends on the
path along which it was transported. This effect is our definition of
curvature.

2.2.2 Parallel transport

The particular type of transport that we have in mind here is
called parallel transport. When I walk from the living room to
the kitchen while carrying a mechanical gyroscope, I'm
parallel-transporting the spacelike vector indicated by the direction of its
axis. Figure d shows that parallel transport can also be defined for
timelike vectors, and that parallel transport can be defined in spacetime
using only inertial motion, clocks, and intersection of world-lines.
Observers aboard the two spaceships exchange clocks in order
to verify the parallelism of their world-lines (vectors AB and CD,
which have equal lengths as measured by the proper time elapsed
aboard the ships). The observers shoot the clocks across the space
between them, and the clocks are set up so that when they pass by
one another, they automatically record one another's readings. The
vectors are parallel if the record later reveals AD and BC intersected
at their midpoints, as measured by the proper times recorded on the
clocks.

2.2.3 Special relativity requires flat spacetime

Hidden in a number of spots in chapter 1 was the following

assumption.

P2. Flatness of spacetime. Parallel-transporting a vector from


one point to another gives a result that is independent of the path

along which it was transported.

For example, when we established the form of the metric in sec-

tion 1.3.2, we used the fact, proved on p. 45, that area is a scalar,

but that proof depends on P2.

Property P2 is only approximately true, as shown explicitly by
the Gravity Probe B satellite, launched in 2004. The probe carried
four gyroscopes made of quartz, which were the most perfect spheres
ever manufactured, varying from sphericity by no more than about
40 atoms. After one year and about 5000 orbits around the earth,
the gyroscopes were found to have changed their orientations relative
to the distant stars by about $3 \times 10^{-6}$ radians (figure e). This is a
violation of P2, but one that was very small and difficult to detect.
The result was in good agreement with the predictions of general
relativity, which describes gravity as a curvature of spacetime. The
smallness of the effect tells us that the earth's gravitational field
is not so large as to completely invalidate special relativity as a
description of the nearby region of spacetime. One of the basic
assumptions of general relativity is that in a small enough region
of spacetime, it is always a good approximation to assume P2, so
that general relativity is locally the same as special relativity. In
the Gravity Probe B experiment, the effect was small and hard to
detect, and this was the reason for letting the effect accumulate
over a large number of orbits, spanning a large region of spacetime.
Problem 5 on p. 48 investigates more quantitatively how the size of
curvature effects varies with the size of the region.

e / Precession angle as a function of time as measured by the four gyroscopes aboard Gravity Probe B.


2.3 Additional postulates

We make the following additional assumptions:

P3 Spacetime is homogeneous and isotropic. No time or place
has special properties that make it distinguishable from other
points, nor is one direction in space distinguishable from another.[2]

P4 Inertial frames of reference exist. These are frames in which
particles move at constant velocity if not subject to any forces.[3]
We can construct such a frame by using a particular particle,
which is not subject to any forces, as a reference point. Inertial
motion is modeled by vectors and parallelism.

P5 Equivalence of inertial frames: If a frame is in constant-velocity
translational motion relative to an inertial frame, then it is also
an inertial frame. No experiment can distinguish one preferred
inertial frame from all the others.

P6 Relativity of time: There exist events 1 and 2 and frames of
reference defined by observers $o$ and $o'$ such that $o \perp r_{12}$ is
true but $o' \perp r_{12}$ is false, where the notation $o \perp r$ means that
observer $o$ finds $r$ to be a vector of simultaneity according to
some convenient criterion such as Einstein synchronization.[4]

Postulates P3 and P5 describe symmetries of spacetime, while
P6 differentiates the spacetime of special relativity from Galilean
spacetime; the symmetry described by these three postulates is referred
to as Lorentz invariance, and all known physical laws have
this symmetry. Postulate P4 defines what we have meant when we
referred to the parallelism of vectors in spacetime (e.g., in figure
s on p. 25). Postulates P1-P6 were all the assumptions that were
needed in order to arrive at the picture of spacetime described in
ch. 1. This approach, based on symmetries, dates back to 1911.[5]

2.4 Other axiomatizations

2.4.1 Einstein's postulates

Einstein used a different axiomatization in his 1905 paper on
special relativity:[6]

[2] For the experimental evidence on isotropy, see
http://www.edu-observatory.org/physics-faq/Relativity/SR/experiments.html#Tests_of_isotropy_of_space.

[3] Defining this no-force rule turns out to be tricky when it comes to gravity. As
discussed in ch. 5, this apparently minor technicality turns out to have important
consequences.

[4] Example 4, p. 19.

[5] W. v. Ignatowsky, Phys. Zeits. 11 (1911) 972. The original paper unfortunately
seems very difficult to obtain.

[6] Paraphrased from the translation by W. Perrett and G.B. Jeffery.


E1. Principle of relativity: The laws of electrodynamics and

optics are valid for all frames of reference for which the equations of

mechanics hold good.

E2. Light is always propagated in empty space with a definite
velocity c which is independent of the state of motion of the emitting
body.

These should be supplemented with our P2 and P3.

Einstein's approach has been slavishly followed in many later
textbook presentations, even though the special role it assigns to
light is not consistent with how modern physicists think about the
fundamental structure of the laws of physics. (In 1905 there was
no other phenomenon known to travel at c.) Einstein did not explicitly
state anything like our P2 (flatness), since he had not yet
developed the theory of general relativity or the idea of representing
gravity in relativity as spacetime curvature. When he did publish
the general theory, he described the distinction between special and
general relativity as a generalization of the class of acceptable frames
of reference to include accelerated as well as inertial frames. This
description has not stood the test of time, and today relativists
use flatness as the distinguishing criterion. In particular, it is not
true, as one sometimes still hears claimed, that special relativity is
incompatible with accelerated frames of reference.

2.4.2 Maximal time

Another approach, presented, e.g., by Laurent,[7] combines our
P2 with the following:

T1 Metric: An inner product exists. Proper time is measured by
the square root of the inner product of a world-line with itself.

T2 Maximum proper time: Inertial motion gives a world-line along
which the proper time is at a maximum with respect to small
changes in the world-line. Inertial motion is modeled by vectors
and parallelism, and this vector-space apparatus has the
usual algebraic properties in relation to the inner product referred
to in T1, e.g., $a \cdot (b + c) = a \cdot b + a \cdot c$.

We have already seen an example of T2 in our analysis of the
twin paradox (figure s on p. 25). Conceptually, T2 is similar to
defining a line as the shortest path between two points, except that
we define a geodesic as being the longest one (for our $+---$
signature).
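To make T2 concrete, here is a minimal sketch that adds up proper time along piecewise-inertial world-lines in 1+1 dimensions, using the $+-$ metric in units with c = 1. The helper `proper_time`, the endpoints, and the detour sizes are hypothetical choices for illustration, not taken from the text.

```python
import math

def proper_time(events):
    """Sum the proper time along a piecewise-inertial world-line.

    `events` is a list of (t, x) points in units with c = 1. Each segment
    must be timelike, i.e. dt^2 - dx^2 > 0.
    """
    total = 0.0
    for (t0, x0), (t1, x1) in zip(events, events[1:]):
        dt, dx = t1 - t0, x1 - x0
        total += math.sqrt(dt * dt - dx * dx)
    return total

start, end = (0.0, 0.0), (10.0, 0.0)

# Inertial world-line: a single straight segment from start to end.
inertial = proper_time([start, end])

# Noninertial world-lines: travel out to x = d by the halfway time, then return.
detours = {d: proper_time([start, (5.0, d), end]) for d in (0.5, 1.0, 3.0, 4.0)}

print("inertial:", inertial)
for d, tau in detours.items():
    print("detour to x =", d, "gives proper time", tau)
```

Every detour yields less proper time than the straight world-line, and the deficit grows with the size of the detour, which is the twin paradox stated as an extremum principle.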

2.4.3 Comparison of the systems

It is useful to compare the axiomatizations P, E, and T from
sections 2.1.1-2.4.2 with each other in order to gain insight into how
much wiggle room there is in constructing theories of spacetime.
Since they are logically equivalent, any statement occurring in one
axiomatization can be proved as a theorem in the others.

[7] Bertel Laurent, Introduction to Spacetime: A First Course on Relativity

f / Area is a scalar.

For example, we might wonder whether it is possible to equip

Galilean spacetime with a metric. The answer is no, since a system

with a metric would satisfy the axioms of system T, which are log-

ically equivalent to our system P. The underlying reason for this is

that in Galilean spacetime there is no natural way to compare the

scales of distance and time.

Or we could ask whether it is possible to compose variations on
the theme of special relativity, alternative theories whose properties
differ in some way. System P shows that this would be unlikely to
succeed without violating the symmetry of spacetime.

Another interesting example is Amelino-Camelia's doubly-special
relativity,[8] in which we have both an invariant speed c and an invariant
length L, which is assumed to be the Planck length $\sqrt{\hbar G/c^3}$.
The invariance of this length contradicts the existence of length
contraction. In order to make his theory work, Amelino-Camelia is
obliged to assume that energy-momentum vectors (section 4.3) have
their own special inner product that violates the algebraic properties
referred to in T2.

2.5 Lemma: spacetime area is invariant

In this section we prove from axioms P1-P6 that area in the x-t
plane is invariant, i.e., it does not change between frames of reference.
This result is used in section 1.3.2 to find the form of the
spacetime metric.

Consider figure f. Vectors $o_1$ and $s_1$ are orthogonal and have
equal lengths as measured by a clock and a ruler (which are calibrated
in units such that c = 1, e.g., seconds and light-seconds). The
square lattice of white polka-dots is obtained from them by repeated
addition. By assuming that this lattice construction is possible, we
are implicitly assuming postulate P2, flatness of spacetime.

The same properties hold for vectors $o_2$ and $s_2$, which give the
lattice of black dots. As required, the two lattices agree on their
45-degree diagonals. Now within the $10 \times 10$ portion of the white
lattice shown with gray shading, we have an area of 100. In the
same region we count about 100 or 101 black dots; there is some
ambiguity because of the dots that lie on the boundary. The density
of white and black dots is in fact exactly equal, as can be verified
to any desired precision by making the region big enough. In other
words, the diagram is drawn so that area is preserved, which is what
we are going to show is required.

[8] arxiv.org/abs/gr-qc/0012051


If it was observer 2 rather than 1 who was drawing the diagram,
presumably she would choose to draw the black dots in a square
lattice and vectors $o_2$ and $s_2$ at right angles. This would require
vectors $o_1$ and $s_1$ to be opened up at an oblique angle and the white
lattice to be non-square.

Now suppose we had not made area conserved. What if a region
containing 100 white dots had held 200 black ones? But a boost
of velocity $v$ is the same as a flip of the spatial dimension followed
by a $-v$ boost and another flip. (If P2 failed, then it might not
be possible to flip a rigid shape in this way.) Therefore this would
violate one or the other of two principles: (1) that all frames of
reference are equally valid (there is no preferred frame such as that
of the ether); or (2) that space is isotropic, meaning that it has the
same properties in all directions (neither $+x$ nor $-x$ is a preferred
direction). We conclude that if these two symmetry principles hold,
then spacetime area is the same for any two observers, so it is an
invariant.
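The following is only a consistency check, not a substitute for the argument above: assuming the 1+1-dimensional Lorentz transformation in natural units quoted in chapter 1, it confirms numerically that the boost has determinant 1 (so it preserves area in the x-t plane) and that flipping x, boosting by $-v$, and flipping again reproduces a boost by $v$. The helper names `boost_matrix`, `det2`, and `matmul2` are introduced here for illustration.

```python
import math

def boost_matrix(v):
    """Matrix of the 1+1-dimensional Lorentz boost acting on (t, x), with c = 1."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return [[gamma, -v * gamma],
            [-v * gamma, gamma]]

def det2(m):
    """Determinant of a 2x2 matrix, i.e. the factor by which it rescales area."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def matmul2(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

FLIP = [[1.0, 0.0], [0.0, -1.0]]   # spatial reflection x -> -x

area_factors = {}
flip_mismatch = {}
for v in (0.1, 0.6, 0.9):
    direct = boost_matrix(v)
    # Flip the spatial axis, boost by -v, then flip back.
    composed = matmul2(FLIP, matmul2(boost_matrix(-v), FLIP))
    area_factors[v] = det2(direct)
    flip_mismatch[v] = max(abs(direct[i][j] - composed[i][j])
                           for i in range(2) for j in range(2))

print(area_factors)    # each value is 1 up to rounding error
print(flip_mismatch)   # each value is 0 up to rounding error
```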

It may seem unnecessarily clumsy that we've used the idea of
counting dots in the above argument, but remember that our main
use of this result is to derive the form of the metric, and before
the metric had been found, we had no system of measurement for
relativity, so we had only very primitive techniques at our disposal.


Problems

1 Section 2.5 gives an argument that spacetime area is a rela-

tivistic invariant. Is this argument also valid for Galilean relativity?

2 Section 2.5 gives an argument that spacetime area is a rela-

tivistic invariant. (a) Generalize this from 1+1 dimensions to 3+1.

(b) Use this result to prove that there is no relativistic length
contraction effect along an axis perpendicular to the velocity.

3 The purpose of this problem is to find how the direction of a
physical object such as a stick changes under a Lorentz transformation.
Part b of problem 2 shows that relativistic length contraction
occurs only along the axis parallel to the motion. The generalization
of the 1+1-dimensional Lorentz transformation to 2+1 dimensions
therefore consists simply of augmenting equation (1) on p. 30 with
$y' = y$. Suppose that a stick, in its own rest frame, has one end
with a world-line $(\tau, 0, 0)$ and the other with $(\tau, p, q)$, where $\tau$ is the
stick's proper time. Call these ends A and B. In other words, we
have a stick that goes from the origin to coordinates (p, q) in the
(x, y) plane. Apply a Lorentz transformation for a boost with velocity
v in the x direction, and find the equations of the world-lines
of the ends of the stick in the new $(t', x', y')$ coordinates. According
to this new frame's notion of simultaneity, find the coordinates of
B when A is at $(t', x', y') = (0, 0, 0)$. (a) For the case
q = 0, recover the 1+1-dimensional result for length contraction
given on p. 26. (b) Returning to the general case where $q \neq 0$,
consider the angle $\theta$ that the stick makes with the x axis, and the
related angle $\theta'$ that it makes with the $x'$ axis in the new frame.
Show that $\tan\theta' = \gamma\tan\theta$.

4 Section 2.2 discusses the idea that a two-dimensional bug
living on the surface of a sphere could tell that its space was curved.
Figure c on p. 41 shows one way of telling, by detecting the
path-dependence of parallel transport. A different technique would be to
look for violations of the Pythagorean theorem. In the figure below,
1 is a diagram illustrating the proof of the Pythagorean theorem in
Euclid's Elements (proposition I.47). This diagram is equally valid
if the page is rolled onto a cylinder, 2, or formed into a wavy corrugated
shape, 3. These types of curvature, which can be achieved
without tearing or crumpling the surface, are not real to the bug.
They are simply side-effects of visualizing its two-dimensional universe
as if it were embedded in a hypothetical third dimension,
which doesn't exist in any sense that is empirically verifiable to the
bug. Of the curved surfaces in the figure, only the sphere, 4, has
curvature that the bug can measure; the diagram can't be plastered
onto the sphere without folding or cutting and pasting. If a two-dimensional
being lived on the surface of a cone, would it say that
its space was curved, or not? What about a saddle shape?


Problem 5.

Problem 4.

5 The discrepancy in parallel transport shown in figure c on
p. 41 can also be interpreted as a measure of the triangle's angular
defect d, meaning the amount $S - \pi$ by which the sum of its interior
angles S exceeds the Euclidean value. (a) The figure suggests a
simple way of verifying that the angular defect of a triangle inscribed
on a sphere depends on area. It shows a large equilateral triangle
that has been dissected into four smaller triangles, each of which
is also approximately equilateral. Prove that D = 4d, where D is
the angular defect of the large triangle and d the value for one of
the four smaller ones. (b) Given that the proportionality to area
d = kA holds in general, find some triangle on a sphere of radius R
whose area and angular defect are easy to calculate, and use it to
fix the constant of proportionality k.

Remark: A being who lived on a sphere could measure d and A for some triangle
and infer R, which is a measure of curvature. The proportionality of the effect
to the area of the triangle also implies that the effects of curvature become
negligible on sufficiently small scales. The analogy in relativity is that special
relativity is a valid approximation to general relativity in regions of space that
are small enough so that spacetime curvature becomes negligible.


Chapter 3

Kinematics

At this stage, many students raise the following questions, which

turn out to be related to one another:

1. According to Einstein, if observers A and B aren't at rest
relative to each other, then A says B's time is slow, but B
says A is the slow one. How can this be? If A says B is slow,
shouldn't B say A is fast? After all, if I took a pill that sped up
my brain, everyone else would seem slow to me, and I would
seem fast to them.

2. Suppose I keep accelerating my spaceship steadily. What hap-

pens when I get to the speed of light?

3. In all the diagrams in section 1.4, the parallelograms have their

diagonals stretched and squished by a certain factor, which

depends on v. What is the interpretation of this factor?


3.1 How can they both . . . ?

Figure a shows how relativity resolves the first question. If A and B
had an instantaneous method of communication such as Star Trek's
subspace radio, then they could indeed resolve the question of who
was really slow.

a / Signals don't resolve the dispute
over who is really slow.

But relativity does not allow cause and effect to be propagated
outside the light cone, so the best they can actually do is to send
each other signals at c. In a/1, B sends signals to A at time intervals
of one hour as measured by B's clock. According to A's clock, the
signals arrive at an interval that is shorter than one hour as the two
spaceships approach one another, then longer than an hour after
they pass each other and begin to recede. As shown in a/2, the
situation is entirely symmetric if A sends signals to B.

Who is really slow? Neither. If A, like many astronauts, cut her
teeth as a jet pilot, it may occur to her to interpret the observations
by analogy with the Doppler effect for sound waves. Figure a is
in fact a valid diagram if the signals are clicks of sound, provided
that we interpret it as being drawn in the frame of reference of the
air. Sound waves travel at a fixed speed relative to the air, and
the space and time units could be chosen such that the speed of
sound was represented by a slope of 1. But A will find that in
the relativistic case, with signals traveling at c, her observations
of the time intervals are not in quantitative agreement with the
predictions she gets by plugging numbers into the familiar formulas
for the Doppler shift of sound waves. She may then say, "Ah, the
analogy with sound isn't quite right. I need to include a correction
factor for time dilation, since B's time is slow. I'm not slow, of
course. I feel perfectly normal."

But her analogy is false and needlessly complicates the situation.
In the version with sound waves and Galilean relativity, there are
three frames of reference involved: A's, B's, and the air's. The
relativistic version is simpler, because there are only two frames, A's and
B's. It's neither helpful nor necessary to break down the observations
into a factor describing what really happens and a correction
factor to account for the relativistic distortions of reality. All we
need to worry about is the world-lines and intersections of world-lines
shown in the spacetime diagrams, along with the metric, which
allows us to compute how much proper time is experienced by each
observer.

b / The twin paradox with signals

sent back to earth by the traveling

twin.

3.2 The stretch factor is the Doppler shift

Figure b shows how the ideas in the preceding section apply to the
twin paradox. In b/1 we see the situation as described by an impartial
observer, who says that both twins are traveling to the right.
But even the impartial observer agrees that one twin's motion is
inertial and the other's noninertial, which breaks the symmetry and
also allows the twins to meet up at the end and compare clocks.
For convenience, b/2 shows the situation in the frame where the
earthbound twin is at rest. Both panels of the figure are drawn such
that the relative velocity of the twins is 3/5, and in panel 2 this
is the inverse slope of the traveling twin's world-lines. Straightforward
algebra and geometry (problem 6, p. 67) shows that in this
particular example, the period observed by the earthbound twin is
increased by a factor of 2. But 2 is exactly the factor by which the
diagonals of the parallelogram are stretched and compressed in a
Lorentz transformation for a velocity of 3/5. This is true in general:
the stretching and squishing factors for the diagonals are the same
as the Doppler shift. We notate this factor as D (which can stand
for either Doppler or diagonal), and in general it is given by

$$D(v) = \sqrt{\frac{1+v}{1-v}}$$

(problem 7, p. 67).

c / Interpretation of the identity $D(v)D(-v) = 1$.
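The claim that the Doppler factor and the diagonal stretch factor coincide can be spot-checked numerically. The sketch below, in units with c = 1, compares $D(v)$ with (i) the interval between arrivals of signals from a receding source, computed directly from world-lines, and (ii) the factors by which a boost rescales the two lightlike diagonals. The function names and the setup of a receiver at rest at x = 0 are assumptions made for this illustration.

```python
import math

def doppler(v):
    """Longitudinal Doppler factor D(v) = sqrt((1 + v)/(1 - v)), with c = 1."""
    return math.sqrt((1.0 + v) / (1.0 - v))

def received_interval(v, emitted_interval=1.0):
    """Interval between arrivals at a receiver at rest at x = 0.

    The source recedes along x = v*t and emits one signal per
    `emitted_interval` of its own proper time; signals travel at c = 1.
    """
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    arrivals = []
    for n in (1, 2):
        t_emit = gamma * n * emitted_interval   # receiver-frame time of emission
        x_emit = v * t_emit                     # source position at emission
        arrivals.append(t_emit + x_emit)        # add the light travel time x/c
    return arrivals[1] - arrivals[0]

def diagonal_factors(v):
    """Factors by which a boost rescales the lightlike vectors (1, -1) and (1, 1)."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return gamma * (1.0 + v), gamma * (1.0 - v)

v = 3.0 / 5.0
stretch, squish = diagonal_factors(v)
print("D(v)              :", doppler(v))            # approximately 2
print("received interval :", received_interval(v))  # approximately 2
print("diagonal factors  :", stretch, squish)       # approximately 2 and 1/2
```

For v = 3/5 all three approaches give a factor of 2 (and its reciprocal for the other diagonal), in agreement with the example in the text.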

self-check A

If you measure with a ruler on figure b/2, you will find that the labeled
sides of the quadrilateral differ by less than a factor of 2. Why is this?

Answer, p. ??

This expression is for the longitudinal Doppler shift, i.e., the case
where the source and observer are in motion directly away from one
another (or toward one another if v < 0). In the purely transverse
case, there is a Doppler shift $1/\gamma$ which can be interpreted as simply
a measure of time dilation.

The useful identity $D(v)D(-v) = 1$ is trivial to prove algebraically, and has the following interpretation. Suppose, as in figure c, that A and C are at rest relative to one another, but B is moving relative to them. B's velocity relative to A is $v$, and C's relative to B is $-v$. At regular intervals, A sends lightspeed "pings" to B, who then immediately retransmits them to C. The interval between pings accumulates two Doppler shifts, and the result is their product $D(v)D(-v)$. But B didn't actually need to receive the original signal and retransmit it; the results would have been the same if B had just stayed out of the way. Therefore this product must equal 1, so $D(v)D(-v) = 1$.
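The two facts used so far, that a velocity of 3/5 gives a stretch factor of 2 and that the shifts for $v$ and $-v$ cancel, can be checked numerically. The following is a minimal sketch in natural units ($c = 1$); the function name `doppler` is an illustrative choice, not notation from the text.

```python
import math

def doppler(v):
    """Longitudinal Doppler (stretch) factor D(v) in units with c = 1."""
    if not -1.0 < v < 1.0:
        raise ValueError("v must satisfy -1 < v < 1 in units with c = 1")
    return math.sqrt((1.0 + v) / (1.0 - v))

# D(3/5) is the factor of 2 quoted for the twins.
print(doppler(3 / 5))

# Receding at v followed by approaching at the same speed gives no net shift.
for v in (0.1, 0.6, 0.99):
    print(v, doppler(v) * doppler(-v))
```

The products printed in the loop should agree with 1 up to floating-point rounding, whatever value of $v$ is chosen.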

Ives-Stilwell experiments Example 1

The transverse Doppler shift is a characteristic prediction of special relativity, with no nonrelativistic counterpart, and Einstein suggested it early on as a test of relativity. However, it is difficult to measure with high precision, because the results are sensitive to any error in the alignment of the 90-degree angle. Such experiments were eventually performed, with results that confirmed relativity,¹ but one-dimensional measurements provided both the earliest tests of the relativistic Doppler shift and the most precise ones to date. The first such test was done by Ives and Stilwell in 1938, using the following trick. The relativistic expression $D(v) = \sqrt{(1+v)/(1-v)}$ for the Doppler shift has the property that $D(v)D(-v) = 1$, which differs from the nonrelativistic result of $(1+v)(1-v) = 1 - v^2$. One can therefore accelerate an ion up to a relativistic speed, measure both the forward Doppler shifted frequency $f_f$ and the backward one $f_b$, and compute $\sqrt{f_f f_b}$. According to relativity, this should exactly equal the frequency $f_o$ measured in the ion's rest frame.

¹ See, e.g., Hasselkamp, Mondry, and Scharmann, Zeitschrift für Physik A: Hadrons and Nuclei 289 (1979) 151.

In a particularly exquisite modern version of the Ives-Stilwell idea,² Saathoff et al. circulated Li⁺ ions at $v = .064$ in a storage ring. An electron-cooler technique was used in order to reduce the variation in velocity among ions in the beam. Since the identity $D(v)D(-v) = 1$ is independent of $v$, it was not necessary to measure $v$ to the same incredible precision as the frequencies; it was only necessary that it be stable and well-defined. The natural line width was 7 MHz, and other experimental effects broadened it further to 11 MHz. By curve-fitting the line, it was possible to achieve results good to a few tenths of a MHz. The resulting frequencies, in units of MHz, were:

$f_f = 582490203.44 \pm .09$

$f_b = 512671442.9 \pm 0.5$

$\sqrt{f_f f_b} = 546466918.6 \pm 0.3$

$f_o = 546466918.8 \pm 0.4$ (from previous experimental work)

The spectacular agreement with theory has made this experiment a lightning rod for anti-relativity kooks.
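The arithmetic behind the table can be reproduced directly from the two measured frequencies. The sketch below uses only the numbers quoted above; the variable names are illustrative, and the velocity it infers relies on identity [1] of section 3.6.

```python
import math

# Frequencies in MHz, copied from the table above.
f_forward = 582490203.44
f_backward = 512671442.9
f_rest = 546466918.8

# Relativity predicts sqrt(f_f * f_b) = f_o, independent of v.
geometric_mean = math.sqrt(f_forward * f_backward)

# Since f_f = f_o * D(v) and f_b = f_o * D(-v) = f_o / D(v), the ratio
# f_f / f_b equals D(v)**2, and v = (D**2 - 1) / (D**2 + 1).
d_squared = f_forward / f_backward
implied_v = (d_squared - 1.0) / (d_squared + 1.0)

print(f"sqrt(f_f * f_b)     = {geometric_mean:.1f} MHz")
print(f"difference from f_o = {geometric_mean - f_rest:+.2f} MHz")
print(f"implied v           = {implied_v:.4f}")
```

The difference from $f_o$ comes out smaller than the quoted uncertainties, and the implied velocity is consistent with the stated $v = .064$.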

If one is searching for small deviations from the predictions of special relativity, a natural place to look is at high velocities. Ives-Stilwell experiments have been performed at velocities as high as 0.84, and they confirm special relativity.³

² G. Saathoff et al., "Improved Test of Time Dilation in Relativity," Phys. Rev. Lett. 91 (2003) 190403. A publicly available description of the experiment is given in Saathoff's PhD thesis, www.mpi-hd.mpg.de/ato/homes/saathoff/diss-saathoff.pdf.

³ MacArthur et al., Phys. Rev. Lett. 56 (1986) 282.

3.3 Combination of velocities

In nonrelativistic physics, velocities add in relative motion. For example, if a boat moves relative to a river, and the river moves relative to the land, then the boat's velocity relative to the land is found by vector addition. This linear behavior cannot hold relativistically. For example, if a spaceship is moving relative to the earth at velocity 3/5 (in units with $c = 1$), and it launches a probe at velocity 3/5 relative to itself, we can't have the probe moving at a velocity of 6/5 relative to the earth, because this would be greater than the maximum speed of cause and effect, which is 1. To see how to add velocities relativistically, we consider the effect of carrying out the two Lorentz transformations one after the other, figure d.

d / Two Lorentz transformations of $v = 3/5$ are applied one after the other. The transformations are represented according to the graphical conventions of section 1.4.

e / Example 2.

The inverse slope of the left side of each parallelogram indicates its velocity relative to the original frame, represented by the square. Since the left side of the final parallelogram has not swept past the diagonal, clearly it represents a velocity of less than 1, not more. To determine the result, we use the fact that the $D$ factors multiply. We chose velocities 3/5 because it gives $D = 2$, which is easy to work with. Doubling the long diagonal twice gives an over-all stretch factor of 4, and solving the equation $D(v) = 4$ for $v$ gives the result, $v = 15/17$.

We can now see the answer to question 2 on p. 49. If we keep accelerating a spaceship steadily, we are simply continuing the process of acceleration shown in figure d. If we do this indefinitely, the velocity will approach $c = 1$ but never surpass it. (For more on this topic of going faster than light, see section 4.7.)
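The procedure just described (multiply the $D$ factors, then solve $D(v) = D_1 D_2$ for $v$) can be sketched in a few lines. The helper names below are illustrative assumptions, and the inversion formula is identity [1] of section 3.6.

```python
import math

def doppler(v):
    """Stretch factor D(v) in units with c = 1."""
    return math.sqrt((1.0 + v) / (1.0 - v))

def velocity_from_doppler(d):
    """Invert D(v): v = (D^2 - 1) / (D^2 + 1)."""
    return (d * d - 1.0) / (d * d + 1.0)

def combine(v1, v2):
    """Combine collinear velocities by multiplying their D factors."""
    return velocity_from_doppler(doppler(v1) * doppler(v2))

# Two boosts of 3/5: D = 2 * 2 = 4, so v should match 15/17.
print(combine(3 / 5, 3 / 5), 15 / 17)

# Repeating the same boost many times approaches 1 without reaching it.
v = 0.0
for n in range(1, 11):
    v = combine(v, 3 / 5)
    print(n, v)
```

Each pass through the loop doubles $D$, so the velocity creeps toward 1 from below, in line with the answer to question 2.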

Accelerating electrons Example 2

Figure e shows the results of a 1964 experiment by Bertozzi in which electrons were accelerated by the static electric field $E$ of a Van de Graaff accelerator of length $\ell_1$. They were then allowed to fly down a beamline of length $\ell_2 = 8.4$ m without being acted on by any force. The time of flight $t_2$ was used to find the final velocity $v = \ell_2/t_2$ to which they had been accelerated. (To make the low-energy portion of the graph legible, Bertozzi's highest-energy data point is omitted.)

If we believed in Newton's laws, then the electrons would have an acceleration $a_N = Ee/m$, which would be constant if, as we pretend for the moment, the field $E$ were constant. (The electric field inside a Van de Graaff accelerator is not really quite constant, but this will turn out not to matter.) The Newtonian prediction for the time over which this acceleration occurs is $t_N = \sqrt{2m\ell_1/eE}$. An acceleration $a_N$ acting for a time $t_N$ should produce a final velocity $a_N t_N = \sqrt{2eV/m}$, where $V = E\ell_1$ is the voltage difference. (By conservation of energy, this equation holds even if the field is not constant.) The solid line in the graph shows the prediction of Newton's laws, which is that a constant force exerted steadily over time will produce a velocity that rises linearly and without limit.

The experimental data, shown as black dots, clearly tell a different story. The velocity asymptotically approaches a limit, which we identify as $c$. The dashed line shows the predictions of special relativity, which we are not yet ready to calculate because we haven't yet seen how kinetic energy depends on velocity at relativistic speeds. The calculation is carried out in example 4 on p. 75.
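To see how quickly the Newtonian prediction runs into trouble, one can evaluate $\sqrt{2eV/m}$ for an electron at a few accelerating voltages. This is only a sketch: the voltages below are hypothetical round numbers chosen for illustration, not Bertozzi's settings, and the constants are standard SI values.

```python
import math

E_CHARGE = 1.602176634e-19     # C, elementary charge
M_ELECTRON = 9.1093837015e-31  # kg, electron mass
C_LIGHT = 299792458.0          # m/s, speed of light

def newtonian_speed(voltage):
    """Final speed sqrt(2 e V / m) predicted by Newton's laws, in m/s."""
    return math.sqrt(2.0 * E_CHARGE * voltage / M_ELECTRON)

# Illustrative voltages in volts (hypothetical, not the experimental values).
for voltage in (1.0e3, 1.0e5, 2.555e5, 1.0e6):
    ratio = newtonian_speed(voltage) / C_LIGHT
    print(f"V = {voltage:9.3e} V   v_N / c = {ratio:.3f}")
```

The Newtonian speed reaches $c$ at roughly a quarter of a million volts and exceeds it beyond that, which is the regime where the measured points fall away from the solid line.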


Note that the relationship between the first and second frames of reference in figure d is the same as the relationship between the second and third. Therefore if a passenger is to feel a steady sensation of acceleration (or, equivalently, if an accelerometer aboard the ship is to show a constant reading), then the proper time required to pass from the first frame to the second must be the same as the proper time to go from the second to the third. A nice way to express this is to define the rapidity $\eta = \ln D$. Combining velocities means multiplying $D$'s, which is the same as adding their logarithms. Therefore we can write the relativistic rule for combining velocities simply as

$$\eta_c = \eta_1 + \eta_2 .$$

The passengers perceive the acceleration as steady if $\eta$ increases by the same amount per unit of proper time. In other words, we can define a proper acceleration $d\eta/d\tau$, which corresponds to what an accelerometer measures.

Rapidity is convenient and useful, and is very frequently used in particle physics. But in terms of ordinary velocities, the rule for combining velocities can also be rewritten using identity [9] from section 3.6 as

$$v_c = \frac{v_1 + v_2}{1 + v_1 v_2} .$$
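The equivalence of the two forms of the rule can be checked numerically. In this sketch the function names are illustrative; the rapidity is computed from $\eta = \ln D$, and the conversion back to a velocity uses $v = \tanh\eta$ (identity [6] of section 3.6).

```python
import math

def rapidity(v):
    """eta = ln D(v) = (1/2) ln((1 + v) / (1 - v)), in units with c = 1."""
    return 0.5 * math.log((1.0 + v) / (1.0 - v))

def combine_by_rapidity(v1, v2):
    """Add rapidities, then convert back with v = tanh(eta)."""
    return math.tanh(rapidity(v1) + rapidity(v2))

def combine_by_formula(v1, v2):
    """Direct formula v_c = (v1 + v2) / (1 + v1 v2)."""
    return (v1 + v2) / (1.0 + v1 * v2)

for v1, v2 in ((0.6, 0.6), (0.9, 0.9), (0.5, -0.2)):
    print(v1, v2, combine_by_rapidity(v1, v2), combine_by_formula(v1, v2))
```

The two columns of results should agree to within rounding error for any pair of subluminal velocities.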

self-check B

How can we tell that this equation is written in natural units? Rewrite it

in SI units. Answer, p. ??

3.4 No frame of reference moving at c

We have seen in section 3.3 that no continuous process of acceleration can boost a material object to $c$. That is, the subluminal (slower than light) nature of an electron or a person is a fundamental feature of its identity and can never be changed. Einstein can never get on his motorcycle and drive at $c$ as he imagined when he was a young man, so we material beings can never see the world from a frame of reference that travels at $c$.

Our universe does, however, contain ingredients such as light rays, gluons, and gravitational waves that travel at $c$, so we might wonder whether these things could be put together to form observers who do move at $c$. But this is not possible according to special relativity, because if we let $v$ approach $c$, so that the stretch factor $D$ approaches infinity, extrapolation of figure d on p. 54 shows that the Lorentz transformation would compress all of spacetime onto the light cone, reducing its number of dimensions by 1. Distinct points would be merged, which would make it impossible to use this frame to describe the same phenomena that a subluminal observer could describe. That is, the transformation would not be one-to-one, and this is unacceptable physically.


f / A playing card returns to its original state when rotated by 180 degrees. Its orientation, unlike the orientation of an arrow, doesn't behave as a vector, since it doesn't transform in the usual way under rotations. Under a 180-degree rotation, a vector should negate itself rather than coming back to its original state.

3.5 The velocity and acceleration vectors

3.5.1 The velocity vector

In a freshman course in Newtonian mechanics, we would define a vector as something that has three components. Furthermore, we would require it to transform in a certain way under a rotation. For example, we could form the collection of numbers $(e, T, \text{DJIA})$, where $e$ is the fundamental charge, $T$ is the temperature in Buffalo, New York, and DJIA measures how the stock market is doing. But this would not be a vector, since it doesn't act the right way when rotated (this particular "vector" is invariant under rotations). Figure f gives a less silly non-example. In contradistinction to a vector, a scalar is specified by a single real number and is invariant under rotations. The most basic example of a Newtonian vector was a displacement $(\Delta x, \Delta y, \Delta z)$, and from the displacement vector we would go on to construct other quantities such as a velocity vector $v = \Delta r/\Delta t$. This worked because in Newtonian mechanics $\Delta t$ was treated as a scalar, and dividing a vector by a scalar produces something that again transforms in the right way to be a vector.

Now let's upgrade to relativity, and work through the same steps by analogy. When I say "vector" in this book, I mean something that in 3+1 dimensions has four components. This can also be referred to as a four-vector. Our only example so far has been the spacetime displacement vector $\Delta r = (\Delta t, \Delta x, \Delta y, \Delta z)$. This vector transforms according to the Lorentz transformation. In general, we require as part of the definition of a (four-)vector that it transform in the usual way under both rotations and boosts (Lorentz transformations). We might now imagine that the next step should be to construct a velocity four-vector $\Delta r/\Delta t$. But relativistically, the quantity $\Delta r/\Delta t$ would not transform like a vector, e.g., if $\Delta r$ was spacelike, then there would be a frame in which we had $\Delta t = 0$, and then $\Delta r/\Delta t$ would be finite in some frames but infinite in others, which is absurd.

To construct a valid vector, we have to divide $\Delta r$ by a scalar. The only scalar that could be relevant would be the proper time $\Delta\tau$, and this is indeed how the velocity vector is defined in relativity. For an inertial world-line (one with constant velocity), we define $v = \Delta r/\Delta\tau$. The generalization to noninertial world-lines requires that we make this definition into a derivative:

$$v = \frac{dr}{d\tau}$$

Not all objects have well-defined velocity vectors. For example, consider a ray of light with a straight world-line, so that the derivative $d\ldots/d\ldots$ is the same as the ratio of finite differences $\Delta\ldots/\Delta\ldots$, i.e., calculus isn't needed. A ray of light has $|v| = c$, so that applying the metric to any segment of its world-line gives $\Delta\tau = 0$. Attempting to calculate $v = \Delta r/\Delta\tau$ then gives something of the form $(\infty, \infty)$. We will see in section 4.3.1 that all massless particles, not just photons, travel at $c$, so the same would apply to them. Therefore a velocity vector is only defined for particles whose world-lines are timelike, i.e., massive particles.

g / A spaceship (curved world-line) moves with an acceleration perceived as constant by its passengers.
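A short numerical sketch makes the breakdown at $c$ concrete. It works in 1+1 dimensions with $c = 1$ and the $+-$ signature; the function names are illustrative, and the components $(\gamma, \gamma v)$ follow from dividing $(\Delta t, \Delta x)$ by $\Delta\tau = \Delta t/\gamma$.

```python
import math

def minkowski_dot(a, b):
    """Inner product in 1+1 dimensions with signature +-, c = 1."""
    return a[0] * b[0] - a[1] * b[1]

def four_velocity(v):
    """Components (dt/dtau, dx/dtau) for an inertial world-line with |v| < 1."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return (gamma, gamma * v)

# The components grow without bound as v approaches 1,
# while the squared magnitude stays fixed.
for v in (0.0, 0.6, 0.99, 0.999999):
    u = four_velocity(v)
    print(v, u, minkowski_dot(u, u))
```

Calling `four_velocity(1.0)` raises a `ZeroDivisionError`, which is the numerical counterpart of the $(\infty, \infty)$ result for a ray of light. The fixed squared magnitude is the normalization property discussed in section 3.5.3.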

Velocity vector of an object at rest Example 3

An object at rest has $v = (1, 0)$. The first component indicates that if we attach a clock to the object with duct tape, the proper time measured by the clock suffers no time dilation according to an observer in this frame, $dt/d\tau = 1$. The second component tells us that the object's position isn't changing, $dx/d\tau = 0$.

3.5.2 The acceleration vector

The acceleration vector is defined as the derivative of the velocity vector with respect to proper time,

$$a = \frac{dv}{d\tau} .$$

It measures the curvature of a world-line. Its squared magnitude is minus the square of the proper acceleration, meaning the acceleration that would be measured by an accelerometer carried along that world-line. The proper acceleration is only approximately equal to the magnitude of the Newtonian acceleration three-vector, in the limit of small velocities.

Constant proper acceleration Example 4

Suppose a spaceship moves so that the acceleration is judged to be the constant value $a$ by an observer on board. Find the motion $x(t)$ as measured by an observer in an inertial frame.

Let $\tau$ stand for the ship's proper time, and let dots indicate derivatives with respect to $\tau$. The ship's velocity has magnitude 1, so

$$\dot{t}^2 - \dot{x}^2 = 1 .$$

An observer who is instantaneously at rest with respect to the ship judges it to have an acceleration vector $(0, a)$ (because the low-velocity limit applies). The observer in the $(t, x)$ frame agrees on the magnitude of this vector, so

$$\ddot{t}^2 - \ddot{x}^2 = -a^2 .$$

The solution of these differential equations is $t = \frac{1}{a}\sinh a\tau$, $x = \frac{1}{a}\cosh a\tau$ (choosing constants of integration so that the expressions take on their simplest forms). Eliminating $\tau$ gives

$$x = \frac{1}{a}\sqrt{1 + a^2 t^2} ,$$

shown in figure g. The world-line is a hyperbola, and this type of motion is sometimes referred to as hyperbolic motion.


As $t$ approaches infinity, $dx/dt$ approaches the speed of light. In the same limit, $x$ increases exponentially with proper time, so that surprisingly large distances can in theory be traveled within a human lifetime (problem 7, p. 97). Some further properties of hyperbolic motion are developed in problems 10, 11, and 12.

Another interesting feature of this problem is the dashed-line asymptote, which is lightlike. Suppose we interpret this as the world-line of a ray of light. The ray comes closer and closer to the ship, but will never quite catch up. Thus provided that the rocket never stops accelerating, the entire region of spacetime to the left of the dashed line is forever hidden from its passengers. That is, an observer who undergoes constant acceleration has an event horizon, a boundary that prevents her from observing anything on the other side. You may have heard about the event horizon associated with a black hole. This example shows that we can have event horizons even when there is no gravity at all.
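The solution of example 4 is easy to tabulate. The sketch below evaluates $t(\tau)$ and $x(\tau)$, compares $x$ with the eliminated form $x(t)$, and computes the coordinate velocity $dx/dt = \dot{x}/\dot{t} = \tanh a\tau$. The value $a = 1$ is an illustrative assumption; with $c = 1$ and time measured in years it corresponds to about 9.5 m/s².

```python
import math

def hyperbolic_motion(a, tau):
    """Return (t, x) at proper time tau for constant proper acceleration a (c = 1)."""
    return math.sinh(a * tau) / a, math.cosh(a * tau) / a

a = 1.0  # illustrative value: 1 per year, about 9.5 m/s^2 in SI units
for tau in (0.0, 1.0, 5.0, 10.0):
    t, x = hyperbolic_motion(a, tau)
    x_from_t = math.sqrt(1.0 + a * a * t * t) / a   # the eliminated form x(t)
    speed = math.tanh(a * tau)                       # dx/dt = (dx/dtau) / (dt/dtau)
    print(f"tau={tau:5.1f}  t={t:12.3f}  x={x:12.3f}  x(t)={x_from_t:12.3f}  dx/dt={speed:.6f}")
```

The rapid growth of $x$ with $\tau$ while $dx/dt$ stays below 1 illustrates both limits described above.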

3.5.3 Constraints on the velocity and acceleration vectors

Counting degrees of freedom

There is something misleading about the foregoing treatment of the velocity and acceleration vectors, and the easiest way to see this is by introducing the idea of a degree of freedom. Often we can describe a system using a list of real numbers. For the hand on a clock, we only need one number, such as 3 o'clock. This is because the hand is constrained to stay in the plane of the clock's face and also to keep its tail at the center of the circle. Since one number describes its position, we say that it has one degree of freedom. If a hiker wants to know where she is on a map, she has two degrees of freedom, which could be specified as her latitude and longitude. If she was in a helicopter, there would be no constraint to stay on the earth's surface, and the number of degrees of freedom would be increased to three. If we also considered the helicopter's velocity to be part of the description of its state, then there would be a total of six degrees of freedom: one for each coordinate and one for each component of the velocity vector.

Now suppose that we want to describe a particle's velocity and acceleration. In Newtonian mechanics, we would describe these three-vectors as possessing a total of six degrees of freedom: $v_x$, $v_y$, $v_z$, $a_x$, $a_y$, and $a_z$. Upgrading from Newtonian mechanics to relativity can't change the number of degrees of freedom. For example, an electron's acceleration is fully determined by the force we exert on it, and we might control that acceleration by placing a proton nearby and producing an electrical attraction. The position of the proton (three degrees of freedom for its three coordinates) determines the electron's acceleration, so the acceleration has exactly three degrees of freedom as well.


h / Both vectors are tangent vectors.

This means that there must be some hidden redundancy in the eight components of the velocity and acceleration four-vectors. The system only has six degrees of freedom, so there must be two constraints that we didn't know about. Similarly, I've gone hiking and had my GPS unit claim that I was a thousand feet above a lake or three thousand feet under a mountain. In those situations there was a constraint that I knew about but that the GPS didn't: that I was on the surface of the earth.

Normalization of the velocity

The first constraint arises naturally from a geometrical interpretation of the velocity four-vector, shown in figure h. The curve represents the world-line of a particle. The dashed line is drawn tangent to the world-line at a certain moment. Under a microscope, the dashed line, which represents a possible inertial motion of a particle, is indistinguishable from the solid curve, which is noninertial. The dashed line has a slope $\Delta t/\Delta x = 2$, which corresponds to a velocity $\Delta x/\Delta t = 1/2$. The figure is drawn in 1+1 dimensions, but in 3+1 dimensions we would want to know more than this number. We would want to know the orientation of the dashed line in the three spatial dimensions, i.e., not just the speed of the particle but also its direction of motion. All the desired information can be encapsulated in a vector. Both of the vectors shown in the figure are parallel to the dashed line, so even though they have different lengths, there is no difference between the velocities they represent. Since we want the particle to have a single well-defined vector to represent its velocity, we want to pick one vector from among all the vectors parallel to the dashed line, and call that the velocity vector.

We have already implicitly made this choice. It follows from the original definition $v = dr/d\tau$ that the velocity vector's squared magnitude $v^2 = v\cdot v$ is always equal to 1, even though the object whose motion it describes is not moving at the speed of light. This, along with the requirement that the velocity vector lie within the future rather than the past light cone, uniquely specifies which tangent vector we want. The requirement $v^2 = 1$ is an example of a recurring idea in physics and mathematics called normalization. The idea is that we have some object (a vector, a function, ...) that could be scaled up or down by any amount, but from among all the possible scales, there is only one that is the right one. For example, a gambler might place a horse's chance of winning at 9 to 1, but a physicist would divide these by 10 in order to normalize the probabilities to 0.9 and 0.1, the idea being that the total probability should add up to 1. Our definition of the velocity vector implies that it is normalized. Thus an alternative, geometrical definition of the velocity vector would have been that it is the vector that is tangent to the particle's world-line, future-directed, and normalized to 1.


When we hear something referred to as a vector, we usually take this as a statement that it not only transforms as a vector, but also that it adds as a vector. But the sum of two velocity vectors would not typically be a valid velocity vector at all, since it would not have unit magnitude. This lack of additivity would in any case have been expected because velocities don't add linearly in relativity (section 3.3).

self-check C

Velocity vectors are required to have $v^2 = 1$. If a vector qualifies as a valid velocity vector in some frame, could it be invalid in another frame?

Answer, p. ??

A nice way of thinking about velocity vectors is that every such vector represents a potential observer. That is, the velocity vectors are the observer-vectors $o$ of chapter 1, but with a normalization requirement $o^2 = 1$ that we did not impose earlier. An observer writes her own velocity vector as $(1, 0)$, i.e., as the unit vector in the timelike direction. Since we have no notion of adding one observer to another observer, it makes sense that velocity vectors don't add relativistically.

If $u$ and $v$ are both future-directed, properly normalized velocity vectors, and if the signature is $+---$ as in this book, then their inner product is $\gamma = u\cdot v$, the gamma factor, introduced in section 1.3.3, p. 25, corresponding to their relative velocity.
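The statement about the inner product can be tested on the example of two objects moving at $\pm 3/5$, whose relative velocity is 15/17. The sketch below works in 1+1 dimensions with $c = 1$; the helper names are illustrative assumptions.

```python
import math

def dot(a, b):
    """Inner product in 1+1 dimensions with signature +-."""
    return a[0] * b[0] - a[1] * b[1]

def gamma(v):
    return 1.0 / math.sqrt(1.0 - v * v)

def four_velocity(v):
    """Normalized, future-directed velocity vector (gamma, gamma v)."""
    return (gamma(v), gamma(v) * v)

def relative_velocity(v1, v2):
    """Velocity of object 1 as measured in the rest frame of object 2."""
    return (v1 - v2) / (1.0 - v1 * v2)

u = four_velocity(3 / 5)
w = four_velocity(-3 / 5)

print(dot(u, w))                               # gamma from the inner product
print(gamma(relative_velocity(3 / 5, -3 / 5)))  # gamma of the relative velocity
```

Both numbers should equal 17/8 up to rounding, the gamma factor for a relative velocity of 15/17.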

Orthogonality of the velocity and acceleration

Now for the second of the two constraints deduced on p. 58. Suppose an observer claims that at a certain moment in time, a particle has $v = (1, 0)$ and $a = (3, 0)$. That is, the particle is at rest ($v_x = 0$) and its $v_t$ is growing by 3 units per second. This is impossible, because after an infinitesimal time interval $dt$, this rate of change will result in $v = (1 + 3\,dt, 0)$, which is not properly normalized: its magnitude has grown from 1 to $1 + 3\,dt$. The observer is mistaken. This is not a possible combination of velocity and acceleration vectors. In general (problem 9, p. 67), we always have the following constraint on the velocity and acceleration vectors:

$$a\cdot v = 0 .$$

This is analogous to the three-dimensional idea that in uniform circular motion, the perpendicularity of the velocity and acceleration three-vectors is what causes the velocity vector to rotate without changing its magnitude.
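Both constraints can be checked on the hyperbolic motion of example 4, where $v = (\cosh a\tau, \sinh a\tau)$ and $a = (a\sinh a\tau, a\cosh a\tau)$ follow from differentiating $t(\tau)$ and $x(\tau)$. This is a sketch with illustrative function names and an arbitrary value of the proper acceleration.

```python
import math

def dot(p, q):
    """Inner product in 1+1 dimensions with signature +-."""
    return p[0] * q[0] - p[1] * q[1]

def velocity(a, tau):
    """Velocity vector (dt/dtau, dx/dtau) for hyperbolic motion."""
    return (math.cosh(a * tau), math.sinh(a * tau))

def acceleration(a, tau):
    """Acceleration vector, the tau-derivative of the velocity vector."""
    return (a * math.sinh(a * tau), a * math.cosh(a * tau))

a = 2.0  # arbitrary illustrative proper acceleration
for tau in (0.0, 0.5, 1.5):
    v_vec = velocity(a, tau)
    a_vec = acceleration(a, tau)
    print(tau, dot(v_vec, v_vec), dot(a_vec, v_vec), dot(a_vec, a_vec))

# The impossible combination from the text fails the orthogonality test.
print(dot((3.0, 0.0), (1.0, 0.0)))
```

Along the hyperbola the three printed inner products stay at 1, 0, and $-a^2$ respectively, while the last line gives a nonzero result for the claimed $v = (1, 0)$, $a = (3, 0)$.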


3.6 Some kinematic identities

In addition to the relations

$$D(v) = \sqrt{\frac{1+v}{1-v}} \qquad\text{and}\qquad v_c = \frac{v_1 + v_2}{1 + v_1 v_2} ,$$

the following identities can be handy. If stranded on a desert island you should be able to rederive them from scratch. Don't memorize them.

[1] $v = (D^2 - 1)/(D^2 + 1)$

[2] $\gamma = (D^{-1} + D)/2$

[3] $v\gamma = (D - D^{-1})/2$

[4] $D(v)D(-v) = 1$

[5] $\eta = \ln D$

[6] $v = \tanh\eta$

[7] $\gamma = \cosh\eta$

[8] $v\gamma = \sinh\eta$

[9] $\tanh(x + y) = \dfrac{\tanh x + \tanh y}{1 + \tanh x \tanh y}$

[10] $D_c = D_1 D_2$

[11] $\eta_c = \eta_1 + \eta_2$

[12] $v_c\gamma_c = (v_1 + v_2)\gamma_1\gamma_2$

The hyperbolic trig functions are defined as follows:

$$\sinh x = \frac{e^x - e^{-x}}{2}$$

$$\cosh x = \frac{e^x + e^{-x}}{2}$$

$$\tanh x = \frac{\sinh x}{\cosh x}$$

Their inverses are built in to some calculators and computer software, but they can also be calculated using the following relations:

$$\sinh^{-1} x = \ln\left(x + \sqrt{x^2 + 1}\right)$$

$$\cosh^{-1} x = \ln\left(x + \sqrt{x^2 - 1}\right)$$

$$\tanh^{-1} x = \frac{1}{2}\ln\left(\frac{1+x}{1-x}\right)$$

Their derivatives are, respectively, $(x^2 + 1)^{-1/2}$, $(x^2 - 1)^{-1/2}$, and $(1 - x^2)^{-1}$.
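One way to guard against sign or exponent slips when using these identities is to test them numerically. The sketch below returns, for each numbered identity, the left side minus the right side for a pair of test velocities; identity [5] is omitted because it is a definition. The function names are illustrative assumptions.

```python
import math

def doppler(v):
    return math.sqrt((1 + v) / (1 - v))

def gamma(v):
    return 1 / math.sqrt(1 - v * v)

def identity_residuals(v1, v2):
    """Left side minus right side for the numbered identities; all should be ~0."""
    d1, d2 = doppler(v1), doppler(v2)
    g1, g2 = gamma(v1), gamma(v2)
    eta1, eta2 = math.log(d1), math.log(d2)
    vc = (v1 + v2) / (1 + v1 * v2)
    th1, th2 = math.tanh(eta1), math.tanh(eta2)
    return {
        1: v1 - (d1**2 - 1) / (d1**2 + 1),
        2: g1 - (1 / d1 + d1) / 2,
        3: v1 * g1 - (d1 - 1 / d1) / 2,
        4: d1 * doppler(-v1) - 1,
        6: v1 - math.tanh(eta1),
        7: g1 - math.cosh(eta1),
        8: v1 * g1 - math.sinh(eta1),
        9: math.tanh(eta1 + eta2) - (th1 + th2) / (1 + th1 * th2),
        10: doppler(vc) - d1 * d2,
        11: math.log(doppler(vc)) - (eta1 + eta2),
        12: vc * gamma(vc) - (v1 + v2) * g1 * g2,
    }

for number, residual in identity_residuals(0.6, -0.3).items():
    print(f"[{number}] residual = {residual:.2e}")
```

Residuals at the level of floating-point rounding indicate that the identities are mutually consistent for the chosen velocities.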

3.7 The projection operator

A frequent source of confusion in relativity is that we write down equations that are coordinate-dependent, but forget the dependency. Similarly, it is possible to write expressions that are only valid for one choice of signature. The following notation, defining a projection operator $P$, is one tool for avoiding these difficulties.

$$P_o r = r - \frac{r\cdot o}{o\cdot o}\,o \qquad (1)$$

Usually $o$ is the future timelike vector representing a certain observer, but the definition can be applied as long as $o$ isn't lightlike.


The idea being expressed is that we want to get rid of any part of $r$ that is parallel to $o$'s arrow of time. In a graph constructed according to $o$'s Minkowski coordinates, we cast $r$'s shadow down perpendicularly onto the spacelike axis, or the spacelike three-plane in 3+1 dimensions. This is why $P$ is referred to as a projection operator. The notation sometimes allows us to express the things that we would otherwise express by explicitly or implicitly constructing and referring to $o$'s spacelike Minkowski coordinates. $P$ has the following properties:

1. $o\cdot P_o r = 0$

2. $r - P_o r$ is parallel to $o$.

3. $P_o o = 0$

4. $P_o P_o r = P_o r$

5. $P_{co} = P_o$

6. $P_o$ is linear, i.e., $P_o(q + r) = P_o q + P_o r$ and $P_o(cr) = cP_o r$

7. $\frac{d}{dx}P_o r = P_o\frac{dr}{dx}$, where $x$ is any variable and $o$ doesn't depend on $x$.

8. If $o$ and $v$ are both future timelike, and $|o^2| = 1$, then we can express $v$ as $v = P_o v + \gamma o$, where $\gamma$ has the usual interpretation for world-lines that coincide with these two vectors.

All of these hold regardless of whether the signature is $+---$ or $-+++$, and none of them refer to any coordinates. Properties 1 and 2 can serve as an alternative, geometrical definition of $P$. Property 3 says that an observer considers herself to be at rest. Property 4 is a general property of all projection operators. Property 8 splits the vector into its spatial and temporal parts according to $o$.
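Definition (1) translates directly into code, which gives a way to test the listed properties for either signature. The following is a sketch in 3+1 dimensions; the function names and the sample vectors are illustrative assumptions.

```python
def dot(a, b, signature=(1, -1, -1, -1)):
    """Minkowski inner product for the given signature."""
    return sum(s * x * y for s, x, y in zip(signature, a, b))

def project(o, r, signature=(1, -1, -1, -1)):
    """P_o r = r - (r.o / o.o) o; o must not be lightlike."""
    oo = dot(o, o, signature)
    if oo == 0:
        raise ValueError("o must not be lightlike")
    k = dot(r, o, signature) / oo
    return tuple(ri - k * oi for ri, oi in zip(r, o))

o = (1.25, 0.75, 0.0, 0.0)   # a normalized observer moving at 3/5
r = (2.0, -1.0, 3.0, 0.5)    # an arbitrary sample vector

for sig in ((1, -1, -1, -1), (-1, 1, 1, 1)):
    p = project(o, r, sig)
    print("signature", sig)
    print("  property 1, o . P_o r      :", dot(o, p, sig))
    print("  property 3, P_o o          :", project(o, o, sig))
    print("  property 4, P_o P_o r - P_o r:",
          tuple(x - y for x, y in zip(project(o, p, sig), p)))
    print("  property 5, P_(3o) r - P_o r :",
          tuple(x - y for x, y in zip(project(tuple(3 * c for c in o), r, sig), p)))
```

Because the definition divides by $o\cdot o$, flipping the signature changes the sign of numerator and denominator together, which is why the same code passes for both conventions.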

Sometimes if we know a position, velocity, or acceleration four-vector, we want to find out how these would be measured by a particular observer using clocks and rulers. The following table shows how to switch back and forth between the two representations. We use, for example, the notation $v_o$ to mean the velocity vector of the form $(0, v_x, v_y, v_z)$ that would be measured by an observer whose velocity vector is $o$ (so that the subscript is an "o" for "observer," not a zero). Since this type of vector, expressed in the Minkowski coordinates of observer $o$, has a zero time component, we refer to it as a three-vector. In all of these expressions, the velocity vectors $o$ and $v$ are assumed to be normalized, and the signature is assumed to be $+---$ (one implication being that $o\cdot v$ is simply $\gamma$).


i / Example 5.

Finding the three-vector from the four-vector:

$x_o = P_o x$

$v_o = \dfrac{P_o v}{o\cdot v}$

$a_o = \dfrac{1}{(o\cdot v)^2}\left[P_o a - (o\cdot a)v_o\right]$

Finding the four-vector from the three-vector:

$v = \gamma(o + v_o)$

$a = \gamma^3(a_o\cdot v_o)v + \gamma^2 a_o$, where $v$ is found as above

As an example of how these are derived, the three-velocity $v_o$ is the derivative of $x_o$ with respect to observer $o$'s Minkowski time coordinate $t$, whereas the four-velocity is defined as the derivative of $x$ with respect to the proper time $\tau$ of the world-line being observed. Therefore we have

$$v_o = \frac{dx_o}{dt} = \frac{dP_o x}{dt}$$

and applying property 7 of the projection operator this becomes

$$v_o = P_o\frac{dx}{dt} = P_o\frac{dx}{d\tau}\frac{d\tau}{dt} = \frac{1}{\gamma}P_o\frac{dx}{d\tau} = \frac{1}{o\cdot v}P_o\frac{dx}{d\tau} = \frac{P_o v}{o\cdot v} .$$

The similar but messier derivation of the expression for $a_o$ is problem 15. In manipulating expressions of this type, the identity $d\gamma/dt = \gamma^3 a_o\cdot v_o$ is often handy (problem 14).

Lewis-Tolman paradox Example 5

The following example is a form of a paradox discussed by Lewis and Tolman in 1909. Figure i shows the frame of reference of observer $o$ in which identical particles 1 and 2 are initially at rest and located at equal distances $\ell$ from the origin along the $y$ and $x$ axes. External forces of equal strength act in the directions shown by the arrows so as to produce accelerations of magnitude $\alpha$. The system is in rotational equilibrium $dL/dt = 0$, because the rate at which particle 1 picks up clockwise angular momentum is the same as the rate at which 2 acquires it in the counterclockwise direction.

Now change to the frame of reference $o'$, which moves relative to $o$ at velocity $v$. Particle 2's distance from the origin is Lorentz-contracted from $\ell$ to $\ell/\gamma$, so its angular momentum is also reduced by $1/\gamma$. It now appears that the system's total angular momentum is increasing in the clockwise sense. How can we have rotational equilibrium in one frame, but not another?


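The resolution of the paradox is that the accelerations transform as well. In the original frame $o$, the four-velocities are $v_1 = v_2 = (1, 0, 0, 0)$, and the four-accelerations are $a_1 = (0, \alpha, 0, 0)$ and $a_2 = (0, 0, \alpha, 0)$. Applying a Lorentz transformation, we have $v_1 = v_2 = (\gamma, \gamma v, 0, 0)$ and

$$a_1 = (\gamma v\alpha, \gamma\alpha, 0, 0)$$

$$a_2 = (0, 0, \alpha, 0) .$$

Our definition of angular momentum is expressed in terms of three-vectors such as $a_{o1}$ and $a_{o2}$, not four-vectors like $a_1$ and $a_2$. We have

$$\frac{dL}{dt} = m a_{o1x}\,\ell - m a_{o2y}\,\frac{\ell}{\gamma} .$$

Using the relations $v_o = \gamma^{-1}P_o v$ and $a_o = \gamma^{-2}\left[P_o a - (o\cdot a)v_o\right]$, we find

$$v_{o1x} = v ,$$

$$a_{o1x} = \frac{1}{\gamma^2}\left[\gamma\alpha - (\gamma v\alpha)(v)\right] = \alpha\gamma^{-3} ,$$

and

$$a_{o2y} = \alpha\gamma^{-2} .$$

The result is

$$\frac{dL}{dt} = m\alpha\gamma^{-3}\ell - m\alpha\gamma^{-2}\frac{\ell}{\gamma} ,$$

which is zero.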
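The cancellation can also be verified numerically by applying the table's formulas for $v_o$ and $a_o$ to the transformed four-vectors. This is a sketch under stated assumptions: the observer in the new frame is taken to have components $(1, 0, 0, 0)$ in her own coordinates, the signature is $+---$, and the function names and default values of $\alpha$, $\ell$, and $m$ are illustrative.

```python
import math

def dot(a, b):
    """Minkowski inner product with signature +---."""
    return a[0] * b[0] - a[1] * b[1] - a[2] * b[2] - a[3] * b[3]

def project(o, r):
    """P_o r = r - (r.o / o.o) o."""
    k = dot(r, o) / dot(o, o)
    return tuple(ri - k * oi for ri, oi in zip(r, o))

def three_velocity(o, v):
    """v_o = P_o v / (o.v)."""
    ov = dot(o, v)
    return tuple(c / ov for c in project(o, v))

def three_acceleration(o, v, a):
    """a_o = [P_o a - (o.a) v_o] / (o.v)^2."""
    ov, oa = dot(o, v), dot(o, a)
    v_o = three_velocity(o, v)
    return tuple((p - oa * w) / ov**2 for p, w in zip(project(o, a), v_o))

def torque_balance(v, alpha=1.0, ell=1.0, m=1.0):
    """dL/dt in the moving frame, using the transformed four-vectors."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    o = (1.0, 0.0, 0.0, 0.0)                  # the observer, in her own coordinates
    u = (g, g * v, 0.0, 0.0)                  # four-velocity of both particles
    a1 = (g * v * alpha, g * alpha, 0.0, 0.0)
    a2 = (0.0, 0.0, alpha, 0.0)
    a1_o = three_acceleration(o, u, a1)
    a2_o = three_acceleration(o, u, a2)
    return m * a1_o[1] * ell - m * a2_o[2] * (ell / g)

for v in (0.1, 0.6, 0.99):
    print(v, torque_balance(v))
```

The printed values of $dL/dt$ should vanish up to rounding for every $v$, confirming that the contracted lever arm is compensated by the different transformation of the two accelerations.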

3.8 Faster-than-light frames of reference?

Special relativity doesn't permit the existence of observers who move at $c$ (section 3.4). But what about a superluminal observer, one who moves faster than $c$? With charming naiveté, the special-effects technicians for Star Trek attempted to show the frame of reference of such an observer in scenes where a field of stars rushed past the Enterprise. (Never mind that the stars, which pass in front of and behind the spaceship, should actually be a million times larger than it.) Actually such an observer (which could not be made out of normal material particles such as electrons and protons) would consider its own world-line, which we call spacelike, to be timelike, while the world-line of a star such as our sun, which we consider timelike, would be spacelike. That is, our sun would not appear to the observer as an object in motion but rather as a line stretching across space, which would wink into existence and then wink back out. A typical transformation between our frame and the frame of such an observer would be $(x, t) \rightarrow (t, x)$, simply swapping the time and space coordinates. The transformation is one-to-one, and therefore not subject to the objection raised in section 3.4 to frames moving at $c$.

But this is all in 1+1 dimensions. In 3+1 dimensions, we again run into the difficulty that the transformation between our frame and that of the superluminal being cannot be one-to-one, since we can't squish three dimensions to one or expand one to three without merging points or splitting one point into many. Our conclusion, then, is that there can be no such thing as a superluminal observer in our 3+1-dimensional universe. A more formal proof has been given by Gorini.⁴ For more about faster-than-light motion in relativity, see section 4.7, p. 92.

⁴ Gorini, "Linear Kinematical Groups," Commun. Math. Phys. 21 (1971) 150. Open access via Project Euclid at http://projecteuclid.org/DPubS?service=UI&version=1.0&verb=Display&handle=euclid.cmp/1103857292


Problems

1 Fred buys a ticket on a spaceship that will accelerate to an ultrarelativistic speed v such that c − v is only 6 m/s. Fred was on the track team in high school, so he knows he can run about 8 m/s. Once the ship is up to speed, Fred plans to run in the forward direction, thereby becoming the first human to exceed the speed of light. Other than the possible lack of gravity to allow running, what is wrong with Fred's plan?

2 (a) In the equation $v_c = (v_1 + v_2)/(1 + v_1 v_2)$ for combination of velocities, interpret the case where one of the velocities (but not the other) equals the speed of light. (b) Interpret the case where the denominator goes to zero. (c) Use the geometric series to rewrite the factor $1/(1 + v_1 v_2)$, and then expand the expression for $v_c$ as a series in $v_1$ and $v_2$, retaining terms up to third order in velocity. How does this relate to the correspondence principle?

3 Determine which of the identities in section 3.6 need to be modified in order to be valid in units with c ≠ 1, and describe how they should be modified.

4 The Large Hadron Collider accelerates counterrotating beams of protons and collides them head-on. The beam energy has been gradually increased, and the accelerator is designed to reach a maximum energy of 14 TeV, corresponding to a rapidity of 10.3. (a) Find the velocity of the beam. (b) In any collision, the kinetic energy available to do something inelastic (smash up your car, produce nuclear reactions, . . . ) is the energy in the center of mass frame; in any other frame, there is initial kinetic energy that must also be present in the final state due to conservation of momentum. Suppose that a particular proton in the LHC beam never undergoes a collision with a proton from the opposite beam, and instead is wasted by being dumped into a beamstop. Let's say that this collision is with a proton in a hydrogen atom left behind by someone's fingerprint. Find the velocities of the two protons in their common center of mass frame.

5 Each GPS satellite is in an orbit with a radius of 26,600 km, with an orbital period of half a sidereal day, giving it a velocity of 3.88 km/s. The atomic clock aboard such a satellite is tuned to 10.22999999543 MHz, which is chosen so that when the satellite is directly overhead, the effect of time dilation (transverse Doppler shift), combined with a general-relativistic effect due to gravity, results in a frequency of exactly 10.23 MHz. (GPS started out as a military project, and legend has it that the top brass, suspicious of the crazy relativity stuff, demanded that the satellites be equipped with a software switch to turn off the correction, just in case the physicists were wrong.) There are oscillations superimposed onto these static effects due to the longitudinal Doppler shifts as the satellites approach and recede from a given observer on the ground. (a) Calculate the maximum Doppler-shifted frequency for a hypothetical observer in outer space who is being directly approached by the satellite in its orbit. (b) In reality, the greatest possible longitudinal component of the velocity is considerably smaller than this due to the geometry. Use the size of the earth to determine this velocity and the corresponding maximum frequency.

6 Verify directly, using the geometry of figure b/2 on p. 51, that for v = 3/5, the Doppler shift factor is D = 2. (Do not simply plug v = 3/5 into the formula $D = \sqrt{(1+v)/(1-v)}$.)

7 Generalize the numerical calculation of problem 6 to prove the general result $D = \sqrt{(1+v)/(1-v)}$.

8 Expand the relativistic equation for the longitudinal Doppler shift of light D(v) in a Taylor series, and find the first two nonvanishing terms. Show that these two terms agree with the nonrelativistic expression, so that any relativistic effect is of higher order in v.

9 Prove, as claimed on p. 60, that we must have a · v = 0 if the velocity four-vector is to remain properly normalized.

10 Example 4 on p. 57 described the motion of an object having constant proper acceleration $a$, the world-line being $t = \frac{1}{a}\sinh a\tau$ and $x = \frac{1}{a}\cosh a\tau$ in a particular observer's Minkowski coordinates. (a) Prove the following results for $\gamma$ and for the (three-)velocity and (three-)acceleration measured by this observer.

$$\gamma = \cosh a\tau$$
$$v = \tanh a\tau$$
$$\text{acceleration} = a\cosh^{-3} a\tau$$

Do the calculations simply by taking the first and second derivatives of position with respect to time. You will find the following facts helpful:

$$1 - \tanh^2 = \cosh^{-2}$$
$$\frac{d}{dx}\tanh x = \cosh^{-2} x$$

(b) Interpret the results in the limit of large $\tau$.

11 Example 4 on p. 57 described the motion of an object having constant proper acceleration $a$, the world-line being $t = \frac{1}{a}\sinh a\tau$ and $x = \frac{1}{a}\cosh a\tau$ in a particular observer's Minkowski coordinates. Find the corresponding velocity and acceleration four-vectors.

12 Starting from the results of problem 11, repeat problem 10a using the techniques of section 3.7 on p. 61. You will find it helpful to know that $1 - \tanh^2 = \cosh^{-2}$.

13 Let v be a future-directed, properly normalized velocity vector. Compare the value of v · v in the +−−− signature used in this book with its value in the signature −+++.

14 (a) Prove the relation $d\gamma/dt = \gamma^3\, a_o \cdot v_o$ given on p. 63, in the special case where the motion is linear. (b) Generalize the result to 3 + 1 dimensions.

15 Derive the identity $a_o = \frac{1}{(o \cdot v)^2}\left[P_o a - (o \cdot a)\, v_o\right]$ on p. 63.

Chapter 4

Dynamics

4.1 Ultrarelativistic particles

A typical 22-caliber rifle shoots a bullet with a mass of about 3 g at a speed of about 400 m/s. Now consider the firing of such a rifle as seen through an ultra-powerful telescope by an alien in a distant galaxy. We happen to be firing in the direction away from the alien, who gets a view from over our shoulder. Since the universe is expanding, our two galaxies are receding from each other. In the alien's frame, our own galaxy is the one that is moving, let's say at c − (200 m/s).[1] If the two velocities simply added, the bullet would be moving at c + (200 m/s). But velocities don't simply add and subtract relativistically, and applying the correct equation for relativistic combination of velocities, we find that in the alien's frame, the bullet flies at only c − (199.9995 m/s). That is, according to the alien, the energy in the gunpowder only succeeded in accelerating the bullet by 0.0005 m/s! If we insisted on believing in $K = (1/2)mv^2$, this would clearly violate conservation of energy in the alien's frame of reference. It appears that kinetic energy must not only rise faster than $v^2$ as v approaches c, it must blow up to infinity. This gives a dynamical explanation for why no material object can ever reach or exceed c, as we have already inferred on purely kinematical grounds.
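The bullet's speed in the alien's frame can be checked numerically. The following is a minimal sketch, assuming the standard formula for combining collinear velocities; the function name `combine` and the use of exact rational arithmetic (to avoid round-off when subtracting two numbers that are both extremely close to c) are illustrative choices, not something taken from the text.

```python
from fractions import Fraction

C = 299_792_458  # speed of light in m/s (exact SI value)

def combine(u, v, c=C):
    """Relativistic combination of two collinear velocities u and v."""
    return (u + v) / (1 + u * v / (c * c))

# Exact rational arithmetic: the quantities of interest differ from c
# by only about one part in a million.
galaxy = Fraction(C) - 200   # our galaxy in the alien's frame: c - 200 m/s
bullet = Fraction(400)       # bullet relative to the rifle: 400 m/s

total = combine(galaxy, bullet, Fraction(C))
deficit = Fraction(C) - total   # how far below c the bullet moves, in m/s

print(f"c - v_bullet = {float(deficit):.6f} m/s")
print(f"speed gained = {float(200 - deficit):.6f} m/s")
```

The deficit comes out very close to the 199.9995 m/s quoted above, so the gain in speed is only about 0.0005 m/s.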

To the alien, both our galaxy and the bullet are ultrarelativistic objects, i.e., objects moving at nearly c. A good way of thinking about an ultrarelativistic particle is that it's a particle with a very small mass. For example, the subatomic particle called the neutrino has a very small mass, thousands of times smaller than that of the electron. Neutrinos are emitted in radioactive decay, and because the neutrino's mass is so small, the amount of energy available in these decays is always enough to accelerate it to very close to c. Nobody has ever succeeded in observing a neutrino that was not ultrarelativistic. When a particle's mass is very small, the mass becomes difficult to measure. For almost 70 years after the neutrino was discovered, its mass was thought to be zero. Similarly, we currently believe that a ray of light has no mass, but it is always possible that its mass will be found to be nonzero at some point in the future. A ray of light can be modeled as an ultrarelativistic particle.

[1] In reality when two galaxies move at relativistic speeds compared with one another, they are separated by a cosmological distance, and special relativity does not actually allow us to construct frames of reference this large.

Let's compare ultrarelativistic particles with train cars. A single car with kinetic energy E has different properties than a train of two cars each with kinetic energy E/2. The single car has half the mass and a speed that is greater by a factor of $\sqrt{2}$. But the same is not true for ultrarelativistic particles. Since an idealized ultrarelativistic particle has a mass too small to be detectable in any experiment, we can't detect the difference between m and 2m. Furthermore, ultrarelativistic particles move at close to c, so there is no observable difference in speed. Thus we expect that a single ultrarelativistic particle with energy E compared with two such particles, each with energy E/2, should have all the same properties as measured by a mechanical detector.

An idealized zero-mass particle also has no frame in which it can be at rest. It always travels at c, and no matter how fast we chase after it, we can never catch up. We can, however, observe it in different frames of reference, and we will find that its energy is different. For example, distant galaxies are receding from us at substantial fractions of c, and when we observe them through a telescope, they appear very dim not just because they are very far away but also because their light has less energy in our frame than in a frame at rest relative to the source. This effect must be such that changing frames of reference according to a specific Lorentz transformation always changes the energy of the particle by a fixed factor, regardless of the particle's original energy; for if not, then the effect of a Lorentz transformation on a single particle of energy E would be different from its effect on two particles of energy E/2.

How does this energy-shift factor depend on the velocity v of the Lorentz transformation? Here it becomes nicer to work in terms of the variable D. Let's write f(D) for the energy-shift factor that results from a given Lorentz transformation. Since a Lorentz transformation $D_1$ followed by a second transformation $D_2$ is equivalent to a single transformation by $D_1 D_2$, we must have $f(D_1 D_2) = f(D_1) f(D_2)$. This tightly constrains the form of the function f; it must be something like $f(D) = D^n$, where n is a constant. The interpretation of n is that under a Lorentz transformation corresponding to 1% of c, energies of ultrarelativistic particles change by about n% (making the approximation that v = .01 gives D ≈ 1.01). In his original 1905 paper on special relativity, Einstein used Maxwell's equations and the Lorentz transformation to show that for a light wave n = 1, and we will prove on p. 78 that this holds for any ultrarelativistic object. He wrote, "It is remarkable that the energy and the frequency . . . vary with the state of motion of the observer in accordance with the same law." He was presumably interested in this fact because 1905 was also the year in which he published his paper on the photoelectric effect, which formed the foundations of quantum mechanics. An axiom of quantum mechanics is that the energy and frequency of any particle are related by E = hf, and if E and f hadn't transformed in the same way relativistically, then quantum mechanics would have been incompatible with relativity.
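The claim that successive Lorentz transformations multiply their D factors, and that a power law $f(D) = D^n$ respects this, can be spot-checked numerically. This is only a sketch in units with c = 1; the function names `doppler`, `combine`, and `energy_shift`, and the sample velocities, are illustrative assumptions rather than anything defined in the text.

```python
import math

def doppler(v):
    """Stretch factor D = sqrt((1+v)/(1-v)) for velocity v, with c = 1."""
    return math.sqrt((1 + v) / (1 - v))

def combine(v1, v2):
    """Relativistic combination of collinear velocities, with c = 1."""
    return (v1 + v2) / (1 + v1 * v2)

def energy_shift(D, n=1):
    """Assumed power-law form f(D) = D**n for the energy-shift factor."""
    return D ** n

v1, v2 = 0.3, 0.5
D1, D2 = doppler(v1), doppler(v2)
D12 = doppler(combine(v1, v2))

print("D1 * D2         =", D1 * D2)
print("D(v1 combined v2) =", D12)
print("D(0.01)         =", doppler(0.01))  # close to 1.01, as claimed
```

Any exponent n gives a function satisfying $f(D_1 D_2) = f(D_1) f(D_2)$, which is why a separate argument is needed to fix n = 1.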

If we assume that certain objects, such as light rays, are truly massless, rather than just having masses too small to be detectable, then their D doesn't have any finite value, but we can still find how the energy differs according to different observers by finding the D of the Lorentz transformation between the two observers' frames of reference.

An astronomical energy shift Example 1

For quantum-mechanical reasons, a hydrogen atom can only exist in states with certain specific energies. By conservation of energy, the atom can therefore only absorb or emit light that has an energy equal to the difference between two such atomic energies. The outer atmosphere of a star is mostly made of monoatomic hydrogen, and one of the energies that a hydrogen atom can absorb or emit is $3.0276 \times 10^{-19}$ J. When we observe light from stars in the Andromeda Galaxy, it has an energy of $3.0306 \times 10^{-19}$ J. If this is assumed to be due entirely to the motion of the Milky Way and Andromeda Galaxy relative to one another, along the line connecting them, find the direction and magnitude of this velocity.

The energy is shifted upward, which means that the Andromeda Galaxy is moving toward us. (Galaxies at cosmological distances are always observed to be receding from one another, but this doesn't necessarily hold for galaxies as close as these.) Relating the energy shift to the velocity, we have

$$\frac{E'}{E} = D = \sqrt{(1+v)/(1-v)} ,$$

where $E'$ is the energy in the frame of the source and $E$ is the energy we observe. Since the shift is only about one part per thousand, the velocity is small compared to c, or small compared to 1 in units where c = 1. Therefore we can employ the low-velocity approximation D ≈ 1 + v, which gives

$$v \approx D - 1 = \frac{E'}{E} - 1 = -1.0 \times 10^{-3} .$$

The negative sign confirms that the source is approaching rather than receding. This is in units where c = 1. Converting to SI units, where c ≠ 1, we have $v = (-1.0 \times 10^{-3})c = -300$ km/s. Although the Andromeda Galaxy's tangential motion is not accurately known, it is considered likely that it will collide with the Milky Way in a few billion years.
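The arithmetic of this example can be reproduced in a few lines. The sketch below assumes that D is taken as the ratio of the source-frame energy to the observed energy, so that a negative v means approach, and it compares the low-velocity approximation with the exact inversion of $D = \sqrt{(1+v)/(1-v)}$. The names `velocity_from_doppler`, `E_SOURCE`, and `E_OBSERVED` are illustrative.

```python
C_KM_S = 299_792.458     # speed of light in km/s

E_SOURCE = 3.0276e-19    # J, energy of the hydrogen line in the source's frame
E_OBSERVED = 3.0306e-19  # J, energy observed from the Andromeda Galaxy

def velocity_from_doppler(D):
    """Invert D = sqrt((1+v)/(1-v)) for v, in units with c = 1."""
    return (D**2 - 1) / (D**2 + 1)

# Assumed convention: D < 1 (observed energy higher) means approach.
D = E_SOURCE / E_OBSERVED

v_exact = velocity_from_doppler(D)
v_approx = D - 1         # low-velocity approximation D ~ 1 + v

print(f"v (exact)  = {v_exact:.3e} c = {v_exact * C_KM_S:.0f} km/s")
print(f"v (approx) = {v_approx:.3e} c = {v_approx * C_KM_S:.0f} km/s")
```

The two values agree to better than one part in a thousand of v itself, which justifies using the approximation here; both round to the −300 km/s quoted in the example.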


4.2 $E=mc^2$

We now know the relativistic expression for kinetic energy in the limiting case of an ultrarelativistic particle: its energy is proportional to the "stretch factor" D of the Lorentz transformation. What about intermediate cases, like v = c/2?

a / The match is lit inside the bell jar. It burns, and energy escapes from the jar in the form of light. After it stops burning, all the same atoms are still in the jar: none have entered or escaped. The figure shows the outcome expected before relativity, which was that the mass measured on the balance would remain exactly the same. This is not what happens in reality.

When we are forced to tinker with a time-honored theory, our first instinct should always be to tinker as conservatively as possible. Although we've been forced to admit that kinetic energy doesn't vary as $v^2/2$ at relativistic speeds, the next most conservative thing we could do would be to assume that the only change necessary is to replace the factor of $v^2/2$ in the nonrelativistic expression for kinetic energy with some other function, which would have to act like D or 1/D for v → c. I suspect that this is what Einstein thought when he completed his original paper on relativity in 1905, because it wasn't until later that year that he published a second paper showing that this still wasn't enough of a change to produce a working theory. We now know that there is something more that needs to be changed about prerelativistic physics, and this is the assumption that mass is only a property of material particles such as atoms (figure a). Call this the "atoms-only hypothesis."

Now that we know the correct relativistic way of finding the energy of a ray of light, it turns out that we can use that to find what we were originally seeking, which was the energy of a material object. The following discussion closely follows Einstein's.

Suppose that a material object O of mass $m_o$, initially at rest in a certain frame A, emits two rays of light (or any other kind of ultrarelativistic particles), each with energy E/2. By conservation of energy, the object must have lost an amount of energy equal to E. By symmetry, O remains at rest.

We now switch to a different frame of reference B moving at some arbitrary speed corresponding to a stretch factor D. The change of frames means that we're chasing one ray, so that its energy is scaled down to $(E/2)D^{-1}$, while running away from the other, whose energy gets boosted to $(E/2)D$. In frame B, as in A, O retains the same speed after emission of the light. But observers in frames A and B disagree on how much energy O has lost, the discrepancy being

$$E\left[\frac{1}{2}\left(D + D^{-1}\right) - 1\right] .$$

This can be rewritten using identity [2] from section 3.6 as

$$E(\gamma - 1) .$$

Let's consider the case where B's velocity relative to A is small. Using the approximation $\gamma \approx 1 + v^2/2$, our result is approximately

$$\frac{1}{2}Ev^2 ,$$

neglecting terms of order $v^4$ and higher. The interpretation is that when O reduced its energy by E in order to make the light rays, it reduced its mass from $m_o$ to $m_o - m$, where m = E. Inserting the necessary factor of $c^2$ to make this valid in units where c ≠ 1, we have Einstein's famous

$$E = mc^2 .$$

This derivation entailed both an approximation and some hidden assumptions. These issues are explored more thoroughly in section 4.4 on p. 84 and in ch. 9 on p. 149. The result turns out to be valid for any isolated body.
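The two steps of this argument, the identity $\frac{1}{2}(D + D^{-1}) = \gamma$ and the small-velocity limit $E(\gamma - 1) \approx \frac{1}{2}Ev^2$, are easy to verify numerically. The following sketch works in units with c = 1; the function names and the sample velocities are illustrative assumptions.

```python
import math

def doppler(v):
    """Stretch factor D = sqrt((1+v)/(1-v)), with c = 1."""
    return math.sqrt((1 + v) / (1 - v))

def gamma(v):
    """Lorentz factor, with c = 1."""
    return 1 / math.sqrt(1 - v * v)

def discrepancy(E, v):
    """Disagreement between frames A and B about the energy lost by O:
    E * [ (D + 1/D)/2 - 1 ]."""
    D = doppler(v)
    return E * (0.5 * (D + 1 / D) - 1)

E = 1.0
for v in (0.6, 0.1, 0.01):
    print(f"v = {v}: discrepancy = {discrepancy(E, v):.6e}, "
          f"E(gamma-1) = {E * (gamma(v) - 1):.6e}, "
          f"E v^2/2 = {0.5 * E * v * v:.6e}")
```

The first two columns agree at every velocity, while the Newtonian-looking expression $\frac{1}{2}Ev^2$ matches only when v is small, with an error that shrinks like $v^4$.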

We find that mass is not simply a built-in property of the particles that make up an object, with the object's mass being the sum of the masses of its particles. Rather, mass and energy are equivalent, so that if the experiment of figure a is carried out with a sufficiently precise balance, the reading will drop because of the mass equivalent of the energy emitted as light.

The equation $E = mc^2$ tells us how much energy is equivalent to how much mass: the conversion factor is the square of the speed of light, c. Since c is a big number, you get a really really big number when you multiply it by itself to get $c^2$. This means that even a small amount of mass is equivalent to a very large amount of energy. Conversely, an ordinary amount of energy corresponds to an extremely small mass, and this is why nobody detected the non-null result of experiments like the one in figure a hundreds of years ago.

The big event here is mass-energy equivalence, but we can also harvest a result for the energy of a material particle moving at a certain speed. We have $m(\gamma - 1)$ for the difference between O's energy in frame B and its energy when it is at rest, i.e., its kinetic energy. But since mass and energy are equivalent, we assign O an energy m when it is at rest. The result is that the energy is

$$E = m\gamma$$

(or $m\gamma c^2$ in units with c ≠ 1).


b / Top: A PET scanner. Middle: Each positron annihilates with an electron, producing two gamma-rays that fly off back-to-back. When two gamma rays are observed simultaneously in the ring of detectors, they are assumed to come from the same annihilation event, and the point at which they were emitted must lie on the line connecting the two detectors. Bottom: A scan of a person's torso. The body has concentrated the radioactive tracer around the stomach, indicating an abnormal medical condition.

Electron-positron annihilation Example 2

Natural radioactivity in the earth produces positrons, which are like electrons but have the opposite charge. A form of antimatter, positrons annihilate with electrons to produce gamma rays, a form of high-frequency light. Such a process would have been considered impossible before Einstein, because conservation of mass and energy were believed to be separate principles, and this process eliminates 100% of the original mass. The amount of energy produced by annihilating 1 kg of matter with 1 kg of antimatter is

$$E = mc^2 = (2\ \text{kg})\left(3.0 \times 10^8\ \text{m/s}\right)^2 = 2 \times 10^{17}\ \text{J} ,$$

which is on the same order of magnitude as a day's energy consumption for the entire world's population!

Positron annihilation forms the basis for the medical imaging technique called a PET (positron emission tomography) scan, in which a positron-emitting chemical is injected into the patient and mapped by the emission of gamma rays from the parts of the body where it accumulates.

A rusting nail Example 3

An iron nail is left in a cup of water until it turns entirely to rust. The energy released is about 0.5 MJ. In theory, would a sufficiently precise scale register a change in mass? If so, how much?

The energy will appear as heat, which will be lost to the environment. The total mass-energy of the cup, water, and iron will indeed be lessened by 0.5 MJ. (If it had been perfectly insulated, there would have been no change, since the heat energy would have been trapped in the cup.) The speed of light is $c = 3 \times 10^8$ meters per second, so converting to mass units, we have

$$m = \frac{E}{c^2} = \frac{0.5 \times 10^6\ \text{J}}{\left(3 \times 10^8\ \text{m/s}\right)^2} = 6 \times 10^{-12}\ \text{kilograms} .$$

The change in mass is too small to measure with any practical technique. This is because the square of the speed of light is such a large number.
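The conversions in examples 2 and 3 are the same relation used in opposite directions. As a minimal sketch, using the rounded value of c from the examples (the function names are illustrative):

```python
C = 3.0e8  # speed of light in m/s, rounded as in the examples

def energy_from_mass(m_kg, c=C):
    """Energy equivalent, in joules, of a mass in kilograms."""
    return m_kg * c**2

def mass_from_energy(E_joules, c=C):
    """Mass equivalent, in kilograms, of an energy in joules."""
    return E_joules / c**2

# Example 2: 1 kg of matter annihilating with 1 kg of antimatter.
annihilation_energy = energy_from_mass(2.0)

# Example 3: a nail releasing about 0.5 MJ as it rusts.
rust_mass_change = mass_from_energy(0.5e6)

print(f"annihilation energy: {annihilation_energy:.1e} J")
print(f"mass lost by rusting: {rust_mass_change:.1e} kg")
```

The roughly 29 orders of magnitude between the two results illustrate why mass changes from ordinary chemical reactions went unnoticed.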


Relativistic kinetic energy Example 4

By about 1930, particle accelerators had progressed to the point at which relativistic effects were routinely taken into account. In 1964, Bertozzi did a special-purpose experiment to test the predictions of relativity using an electron accelerator. The results were discussed in less detail in example 2 on p. 54, at which point we had not yet seen the relativistic equation for kinetic energy. Electrons were accelerated through a static electric potential difference V to a variety of kinetic energies K = eV, and their velocities inferred by measuring their time of flight through a beamline of length $\ell = 8.4$ m. Electrical pulses were recorded on an oscilloscope at the beginning and end of the time of flight t. The energies were confirmed by calorimetry. Figure c shows a sample photograph of an oscilloscope trace at V = 1.5 MV.

c / Example 4. Each horizontal division is 9.8 ns.

The prediction of Newtonian physics is as follows.

$$eV = (1/2)mv^2$$
$$v/c = 2.4$$
$$t = 12\ \text{ns}$$

According to special relativity, we have:

$$eV = m(\gamma - 1)c^2$$
$$\frac{v}{c} = \sqrt{1 - \left(1 + \frac{eV}{mc^2}\right)^{-2}} = 0.97$$
$$t = 29\ \text{ns}$$

The results contradict the Newtonian prediction and are consistent with special relativity. According to Newton, this amount of energy should have accelerated the electrons to several times the speed of light. In reality, we see a clear demonstration of the nature of c as a limiting velocity.
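The two predictions for the 1.5 MeV data point can be recomputed directly. This sketch assumes the electron's rest energy $mc^2 = 0.511$ MeV, a standard value that is not quoted in the example; the function names are illustrative.

```python
import math

C = 299_792_458.0   # speed of light, m/s
MC2_MEV = 0.511     # electron rest energy in MeV (assumed standard value)
K_MEV = 1.5         # kinetic energy K = eV for this data point, in MeV
LENGTH_M = 8.4      # beamline length from the example, in meters

def beta_newtonian(K, mc2):
    """v/c from K = (1/2) m v^2."""
    return math.sqrt(2 * K / mc2)

def beta_relativistic(K, mc2):
    """v/c from K = m (gamma - 1) c^2."""
    g = 1 + K / mc2
    return math.sqrt(1 - g**-2)

def time_of_flight_ns(beta, length=LENGTH_M):
    """Time of flight over the beamline, in nanoseconds."""
    return length / (beta * C) * 1e9

for label, beta in (("Newtonian", beta_newtonian(K_MEV, MC2_MEV)),
                    ("relativistic", beta_relativistic(K_MEV, MC2_MEV))):
    print(f"{label}: v/c = {beta:.2f}, t = {time_of_flight_ns(beta):.0f} ns")
```

With a horizontal scale of 9.8 ns per division, the two predictions differ by nearly two divisions on the oscilloscope trace, which is why the measurement discriminates so clearly between them.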


Gravity bending light Example 5

Gravity is a universal attraction between things that have mass, and since the energy in a beam of light is equivalent to some very small amount of mass, light should be affected by gravity, although the effect should be very small. The first experimental confirmation of relativity came in 1919 when stars next to the sun during a solar eclipse were observed to have shifted a little from their ordinary position. (If there was no eclipse, the glare of the sun would prevent the stars from being observed.) Starlight had been deflected by the sun's gravity. The figure is a photographic negative, so the circle that appears bright is actually the dark face of the moon, and the dark area is really the bright corona of the sun. The stars, marked by lines above and below them, appeared at positions slightly different than their normal ones.

Keep in mind that these arguments are very rough and qualitative, and it is not possible to produce a relativistic theory of gravity simply by taking $E = mc^2$ and combining it with Newton's law of gravity. After all, this law doesn't refer to time at all: it predicts that gravitational forces propagate instantaneously. We know this can't be consistent with relativity, which forbids cause and effect from propagating at any speed greater than c. To produce a relativistic theory of gravity, we need general relativity.

Similar reasoning suggests that there may be stars (black holes) so dense that their gravity can prevent light from leaving. Such stars have been detected, and their properties seem so far to be described correctly by general relativity.


d / In the p-E plane, massless particles lie on the two diagonals, while particles with mass lie to the right.

4.3 Relativistic momentum

Newtonian mechanics has two different measures of motion, kinetic energy and momentum, and the relationship between them is nonlinear. Doubling your car's momentum quadruples its kinetic energy.

But nonrelativistic mechanics can't handle massless particles, which are always ultrarelativistic. We saw in section 4.1 that ultrarelativistic particles are "generic," in the sense that they have no individual mechanical properties other than an energy and a direction of motion. Therefore the relationship between kinetic energy and momentum must be linear for ultrarelativistic particles. For example, doubling the amplitude of an electromagnetic wave quadruples both its energy density, which depends on $E^2$ and $B^2$, and its momentum density, which goes like $E \times B$.

How can we make sense of these energy-momentum relationships, which seem to take on two completely different forms in the limiting cases of very low and very high velocities?

The first step is to realize that since mass and energy are equivalent, we will get more of an apples-to-apples comparison if we stop talking about a material object's kinetic energy and consider instead its total energy E, which includes a contribution from its mass.

Figure d is a graph of energy versus momentum. In this representation, massless particles, which have E ∝ |p|, lie on two diagonal lines that connect at the origin. If we like, we can pick units such that the slopes of these lines are plus and minus one. Material particles lie above these lines. For example, a car sitting in a parking lot has p = 0 and E = m.

Now what happens to such a graph when we change to a different frame of reference that is in motion relative to the original frame? A massless particle still has to act like a massless particle, so the diagonals are simply stretched or contracted along their own lengths. A transformation that always takes a line to a line is a linear transformation, and if the transformation between different frames of reference preserves the linearity of the lines p = E and p = −E, then it's natural to suspect that it is actually some kind of linear transformation. In fact the transformation must be linear, because conservation of energy and momentum involve addition, and we need these laws to be valid in all frames of reference. But now by the same reasoning as in subsection 1.3.1 on p. 21, the transformation must be area-preserving. We then have the same three cases to consider as in figure j on p. 16. The "Galilean" version is ruled out because it would imply that particles keep the same energy when we change frames. (This is what would happen if c were infinite, so that the mass-equivalent $E/c^2$ of a given energy was zero, and therefore E would be interpreted purely as the mass.) Nor can the "rotational" version be right, because it doesn't preserve the E = |p| diagonals. We are left with the third case, which establishes the following aesthetically appealing fact:

Energy-momentum is a four-vector

Let an isolated object have momentum and mass-energy p and E. Then the p-E plane transforms according to exactly the same kind of Lorentz transformation as the x-t plane. That is, $(E, p_x, p_y, p_z)$ is a four-dimensional vector just like $(t, x, y, z)$.

This is a highly desirable result. If it were not true, it would be like having to learn different mathematical rules for different kinds of three-vectors in Newtonian mechanics.

The only remaining issue to settle is whether the choice of units that gives invariant 45-degree diagonals in the x-t plane is the same as the choice of units that gives such diagonals in the p-E plane. That is, we need to establish that the c that applies to x and t is equal to the c′ that applies to p and E, i.e., that the velocity scales of the two graphs are matched up. This is true because in the Newtonian limit, the total mass-energy E is essentially just the particle's mass, and then p/E ≈ p/m ≈ v. This establishes that the velocity scales are matched at small velocities, which implies that they coincide for all velocities, since a large velocity, even one approaching c, can be built up from many small increments. (This also establishes that the exponent n defined on p. 70 equals 1 as claimed.)

Suppose that a particle is at rest. Then it has p = 0 and mass-energy E equal to its mass m. Therefore the inner product of its (E, p) four-vector with itself equals $m^2$. In other words, the magnitude of the energy-momentum four-vector is simply equal to the particle's mass. If we transform into a different frame of reference, in which p ≠ 0, the inner product stays the same. In symbols,

$$m^2 = E^2 - p^2 ,$$

or, in units with c ≠ 1,

$$(mc^2)^2 = E^2 - (pc)^2 .$$

We take this as the relativistic definition of mass. Since the definition is an inner product, which is a scalar, it is the same in all frames of reference. (Some older books use an obsolete convention of referring to $m\gamma$ as "mass" and m as "rest mass.")
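The frame-independence of $m^2 = E^2 - p^2$ can be illustrated by applying a Lorentz transformation to the (E, p) pair, treated exactly like (t, x). The sketch below uses 1+1 dimensions and units with c = 1; the function names `boost` and `mass` and the sample values are illustrative assumptions.

```python
import math

def boost(E, p, v):
    """Lorentz-transform (E, p) into a frame moving at velocity v (c = 1),
    using the same rule as for (t, x)."""
    g = 1 / math.sqrt(1 - v * v)
    return g * (E - v * p), g * (p - v * E)

def mass(E, p):
    """Mass defined through m^2 = E^2 - p^2 (c = 1)."""
    return math.sqrt(max(E * E - p * p, 0.0))

# A material particle of mass 2 at rest, viewed from several moving frames.
for v in (0.1, 0.5, 0.9, -0.99):
    E, p = boost(2.0, 0.0, v)
    print(f"v = {v:+.2f}: E = {E:.4f}, p = {p:+.4f}, m = {mass(E, p):.6f}")

# A light ray with (E, p) = (1, 1), chased by an observer moving at v = 0.6.
E_ray, p_ray = boost(1.0, 1.0, 0.6)
print(f"light ray: E = {E_ray:.4f}, p = {p_ray:.4f}, m = {mass(E_ray, p_ray):.1f}")
```

For the light ray, the energy is scaled by $1/D$ with $D = \sqrt{(1+v)/(1-v)} = 2$, consistent with the exponent n = 1, while the mass stays zero.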

self-check A

Interpret the equation $m^2 = E^2 - p^2$ in the case where m = 0. Answer, p. ??

Mass of two light waves Example 6

Let the momentum of a certain light wave be $(p_t, p_x) = (E, E)$, and let another such wave have momentum $(E, -E)$. The total momentum is $(2E, 0)$. Thus this pair of massless particles has a collective mass of 2E. This is an example of the non-additivity of relativistic mass.

Example 6 shows that mass is not additive, nor is it a measure of the quantity of matter.

Finding velocity given energy and momentum Example 7

If we know that a particle has mass-energy E and momentum p (which also implies knowledge of its mass m), what is its velocity?

In the particle's rest frame it has a world-line that points straight up on a spacetime diagram, and its momentum vector p likewise points up in the p-E plane. Since displacement vectors and momentum vectors transform according to the same rules, this parallelism will be maintained in other frames as well. Therefore in an arbitrarily chosen frame, the vector p = (E, p) lies along a line whose inverse slope v = p/E gives the velocity.

As a check on our result, we look at its limiting behavior. In the Newtonian limit, the mass-energy E is nearly all due to the mass, so we have v ≈ p/m, the Newtonian result. In the opposite limit of ultrarelativistic motion, with E ≫ m, the definition of mass $m^2 = E^2 - p^2$ gives E ≈ |p|, and we have |v| ≈ 1, which is also correct.
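Examples 6 and 7 can both be illustrated with a few lines of arithmetic on (E, p) pairs. This is a sketch in 1+1 dimensions with c = 1; the helper names `energy`, `velocity`, `mass`, and `add` and the sample numbers are illustrative assumptions.

```python
import math

def energy(m, p):
    """Mass-energy from the definition m^2 = E^2 - p^2 (c = 1)."""
    return math.sqrt(m * m + p * p)

def velocity(E, p):
    """Velocity as the inverse slope in the p-E plane: v = p/E."""
    return p / E

def mass(E, p):
    """Mass of a system with total mass-energy E and total momentum p."""
    return math.sqrt(max(E * E - p * p, 0.0))

def add(a, b):
    """Add two (E, p) energy-momentum vectors component by component."""
    return (a[0] + b[0], a[1] + b[1])

# Example 6: two light waves of energy 1 moving in opposite directions.
total = add((1.0, 1.0), (1.0, -1.0))
print("two light waves:", total, "collective mass =", mass(*total))

# Example 7, Newtonian limit: p much smaller than m, so v is close to p/m.
m, p_slow = 1.0, 1e-3
print("slow particle: v =", velocity(energy(m, p_slow), p_slow))

# Example 7, ultrarelativistic limit: E much greater than m, so |v| is close to 1.
p_fast = 1e3
print("fast particle: v =", velocity(energy(m, p_fast), p_fast))
```

The pair of light waves has zero total momentum, so v = p/E gives a velocity of zero for the system as a whole even though each wave moves at c.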

Light rays dont interact Example 8

We observe that when two rays of light cross paths, they continue

through one another without bouncing like material objects. This

behavior follows directly from conservation of energy-momentum.

Any two vectors can be contained in a single plane, so we can

choose our coordinates so that both rays have vanishing p

z

. By

choosing the state of motion of our coordinate system appropri-

ately, we can also make p

y

= 0, so that the collision takes place

along a single line parallel to the x axis. Since only p

x

is nonzero,

we write it simply as p. In the resulting p-E plane, there are two

possibilities: either the rays both lie along the same diagonal, or

they lie along different diagonals. If they lie along the same di-

agonal, then there cant be a collision, because the two rays are

both moving in the same direction at the same speed c, and the

trailing one will never catch up with the leading one.

Now suppose they lie along different diagonals. We add their

energy-momentum vectors to get their total energy-momentum,

which will lie in the gray area of figure d. That is, a pair of light
rays taken as a single system act sort of like a material object
with a nonzero mass. By a Lorentz transformation, we can always
find a frame in which this total energy-momentum vector
lies along the E axis. This is a frame in which the momenta of the
two rays cancel, and we have a symmetric head-on collision between
two rays of equal energy. It is the center-of-mass frame,
although neither object has any mass on an individual basis. For
convenience, let's assume that the x-y-z coordinate system was
chosen so that its origin was at rest in this frame.

Since the collision occurs along the x axis, by symmetry it is not

possible for the rays after the collision to depart from the x axis;

for if they did, then there would be nothing to determine the orientation
of the plane in which they emerged.[2] Therefore we are
justified in continuing to use the same $p_x$-E plane to analyze the
four-vectors of the rays after the collision.

Let each ray have energy E in the frame described above. Given

this total energy-momentum vector, how can we cook up two

energy-momentum vectors for the final state such that energy and
momentum will have been conserved? Since there is zero total
momentum, our only choice is two light rays, one with energy-momentum
vector $(E, E)$ and one with $(E, -E)$. But this is exactly
the same as our initial state, except that we can arbitrarily choose
the roles of the two rays to have been interchanged. Such an interchanging
is only a matter of labeling, so there is no observable
sense in which the rays have collided.[3]

[2] In quantum mechanics, there is a loophole here. Quantum mechanics allows
certain kinds of randomness, so that the symmetry can be broken by letting the
outgoing rays be observed in a plane with some random orientation.

[3] There is a second loophole here, which is that a ray of light is actually a
wave, and a wave has other properties besides energy and momentum. It has
a wavelength, and some waves also have a property called polarization. As a
mechanical analogy for polarization, consider a rope stretched taut. Side-to-side
vibrations can propagate along the rope, and these vibrations can occur in any
plane that coincides with the rope. The orientation of this plane is referred to
as the polarization of the wave. Returning to the case of the colliding light rays,
it is possible to have nontrivial collisions in the sense that the rays could affect
one another's wavelengths and polarizations. Although this doesn't actually
happen with non-quantum-mechanical light waves, it can happen with other
types of waves; see, e.g., Hu et al., arxiv.org/abs/hep-ph/9502276, figure 2.
The title of example 8 is only valid if a "ray" is taken to be something that lacks
wave structure. The wave nature of light is not evident in everyday life from
observations with apparatus such as flashlights, mirrors, and eyeglasses, so we
expect the result to hold under those circumstances, and it does. E.g., flashlight
beams do pass through one another without interacting.

Compton scattering Example 9

Figure e/1 is a histogram of gamma rays emitted by a $^{137}$Cs
source and recorded by a NaI scintillation detector. This type
of detector, unlike a Geiger-Müller counter, gives a pulse whose
height is proportional to the energy of the radiation. About half the
gamma rays do what we would like them to do in a detector: they
deposit their full energy of 662 keV in the detector, resulting in a
prominent peak in the histogram. The other half, however, interact
through a process called Compton scattering, in which they
collide with one of the electrons but emerge from the collision
still retaining some of their energy, with which they may escape
from the detector. The amount of energy deposited in the detector
depends solely on the billiard-ball kinematics of the collision,
and can be determined from conservation of energy-momentum
based on the scattering angle. Forward scattering at 0 degrees
is no interaction at all, and deposits no energy, while scattering
at 180 degrees deposits the maximum energy possible if the only
interaction inside the detector is a single Compton scattering. We
will analyze the 180-degree scattering, since it can be tackled in
1+1 dimensions.

Figure e. 1. The Compton edge lies at the energy deposited by gamma rays that scatter at 180 degrees from an electron. 2. The collision in the lab frame. 3. The same collision in the center of mass frame.

Figure e/2 shows the collision in the lab frame, where the electron
is initially at rest. As is conventional in this type of diagram,
the world-line of the photon is shown as a wiggly line; the wiggles
are just a decoration, and the actual world-line consists of
two line segments. The photon enters the detector with the full
energy $E_o = 662$ keV and leaves with a smaller energy $E_f$. The
difference $E_o - E_f$ is what the detector will measure, contributing
a count to the Compton edge. In the lab frame, the total initial
momentum vector is $p = (E_o + m, E_o)$, with the timelike component
representing the total mass-energy. Because the photon is
massless, its momentum $p_x = E_o$ is equal to its energy.

Let v be the velocity of the center-of-mass frame, e/3, relative to
the lab frame. Using the result of example 7, we find $v = E_o/(E_o + m)$.
To make the writing easier we define $\alpha = E_o/m$, so that
$v = \alpha/(1 + \alpha)$.

The transformation from the lab frame to the c.m. frame Doppler
shifts the energy of the incident photon down to $E' = D(v)E_o$.
The collision reverses the spatial part of the photon's energy-momentum
vector while leaving its energy the same. Transformation
back into the lab frame gives $E_f = D(v)E' = D(v)^2 E_o = E_o/(1 + 2\alpha)$.
(This can also be rewritten using the quantum-mechanical
relation $E = hc/\lambda$ to give the compact form
$\lambda_f - \lambda_o = 2hc/m$.) The final result for the energy of the Compton edge is

$$E_o - E_f = \frac{E_o}{1 + 1/2\alpha} = 478\ \text{keV} ,$$

in good agreement with figure e/1.
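The arithmetic behind the Compton edge can be sketched as follows. The code assumes that the Doppler factor for the down-shift is $\sqrt{(1-v)/(1+v)}$ and that the electron's mass-energy is 511 keV; neither the numerical value of m nor the function names appear in the text above, so they should be read as assumptions made for illustration.

```python
import math

M_E = 511.0  # electron mass-energy in keV (assumed value, not given in the text)

def redshift_factor(v):
    """Assumed Doppler factor for a down-shift at relative speed v (c = 1)."""
    return math.sqrt((1.0 - v) / (1.0 + v))

def compton_edge(E_o, m=M_E):
    """Energy deposited by a photon of energy E_o that scatters at 180 degrees."""
    alpha = E_o / m
    v = alpha / (1.0 + alpha)              # velocity of the c.m. frame
    E_f = redshift_factor(v) ** 2 * E_o    # shift down twice: lab -> c.m. -> lab
    return E_o - E_f

edge = compton_edge(662.0)
closed_form = 662.0 / (1.0 + 1.0 / (2.0 * 662.0 / M_E))
print(round(edge), round(closed_form))
```

Both the step-by-step route and the closed-form expression round to 478 keV for the 662 keV line.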

Pair production requires matter Example 10

Example 2 on p. 74 discussed the annihilation of an electron and

a positron into two gamma rays, which is an example of turning

matter into pure energy. An opposite example is pair production,

a process in which a gamma ray disappears, and its energy goes

into creating an electron and a positron.

Pair production cannot happen in a vacuum. For example, gamma

rays from distant black holes can travel through empty space for

thousands of years before being detected on earth, and they don't

turn into electron-positron pairs before they can get here. Pair

production can only happen in the presence of matter. When

lead is used as shielding against gamma rays, one of the ways

the gamma rays can be stopped in the lead is by undergoing pair

production.

To see why pair production is forbidden in a vacuum, consider the

process in the frame of reference in which the electron-positron

pair has zero total momentum. In this frame, the gamma ray

would have to have had zero momentum, but a gamma ray with

zero momentum must have zero energy as well. This means

that conservation of the momentum vector has been violated: the

timelike component of the momentum is the mass-energy, and it

has increased from 0 in the initial state to at least $2mc^2$ in the final
state.

4.3.1 Massless particles travel at c

Massless particles always travel at c (= 1). For suppose that a
massless particle had $|v| < 1$ in the frame of some observer. Then
some other observer could be at rest relative to the particle. In
such a frame, the particle's momentum p is zero by symmetry, since
there is no preferred direction for it. Then $E^2 = p^2 + m^2$ is zero
as well, so the particle's entire energy-momentum vector is zero.
But a vector that vanishes in one frame also vanishes in every other
frame. That means we're talking about a particle that can't undergo
scattering, emission, or absorption, and is therefore undetectable by
any experiment. This is physically unacceptable because we don't
consider phenomena (e.g., invisible fairies) to be of physical interest
if they are undetectable even in principle.

What about the case of a material particle, i.e., one having mass?

Since we already have an equation $E = m\gamma$ for the energy of a material
particle in terms of its velocity, we can find a similar equation
for the momentum:

$$p = \sqrt{E^2 - m^2} = m\sqrt{\gamma^2 - 1} = m\sqrt{\frac{1}{1 - v^2} - 1} = m\gamma v .$$

As a material particle gets closer and closer to c, its momentum
approaches infinity, so that an infinite force would be required in
order to reach c.

In summary, massless particles always move at v = c, while
massive ones always move at v < c.

Note that the equation $p = m\gamma v$ isn't general enough to serve as
a definition of momentum, since it becomes an indeterminate form
in the limit $m \to 0$.

No half-life for massless particles Example 11

When we describe an unstable nucleus or other particle as hav-

ing some half-life, we mean its half-life in its own rest frame. A

massless particle always moves at c and therefore has no rest

frame (section 3.4), so it doesn't make sense to describe it as
having a half-life in this sense. This is almost, but not quite, the
same thing as saying that massless particles can never decay.[4]

Constraints on polarization Example 12

We observe that electromagnetic waves are always polarized

transversely, never longitudinally. Such a constraint can only ap-

ply to a wave that propagates at c. If it applied to a wave that

propagated at less than c, we could move into a frame of refer-

ence in which the wave was at rest. In this frame, all directions in

space would be equivalent, and there would be no way to decide

which directions of polarization should be permitted. For a wave

that propagates at c, there is no frame in which the wave is at rest

(see section 3.8).

[4] See Fiore and Modanese, arxiv.org/abs/hep-th/9508018,
and http://physics.stackexchange.com/questions/12488/decay-of-massless-particles.
If such a process does exist, then Lorentz
invariance requires that its time-scale be proportional to the particle's energy.
It can be argued that gluons, which are massless, do in fact undergo decay into
less energetic gluons, but the interpretation is ambiguous because we never
observe gluons as free particles, so we can't just capture one in a box and watch
it rattle around inside until it decays.

4.3.2 No global conservation of energy-momentum in general relativity

If you read optional chapter 2, you know that the distinction
between special and general relativity is defined by the flatness
of spacetime, and that flatness is in turn defined by the path-independence
of parallel transport. Whereas energy is a scalar in
Newtonian mechanics, in relativity it is the timelike component of
a vector. It therefore follows that in general relativity we should
not expect to have global conservation of energy. For a conservation
law is a statement that when we add up a certain quantity, the total
has a constant value. But if spacetime is curved, then there is no
natural, uniquely defined way to compare vectors that are defined at
different places in spacetime. We could parallel transport one over
to the other, but the result would depend on the path along which
we chose to transport it. For similar reasons, we should not expect
global conservation of momentum.

This is the answer to a frequently asked question about cosmology.
Since 1998 we've known that the expansion of the universe is
accelerating, rather than decelerating as we would have expected due
to gravitational attraction. What is the source of the ever-increasing
kinetic energy of all those galaxies? The question assumes that energy
must be conserved on cosmological scales, but that just isn't
so.

Nevertheless, general relativity reduces to special relativity on
scales small enough to make curvature effects negligible. Therefore it
is still valid to expect conservation of energy and momentum to hold
locally, as assumed, e.g., in the analysis of Compton scattering in
example 9 on p. 80, and verified in countless experiments. Cf. section
9.2, p. 153, on the stress-energy tensor.

4.4 Systems with internal structure

Section 4.2 presented essentially Einstein's original proof of $E = mc^2$,
which has been criticized on several grounds. A detailed discussion
is given by Ohanian.[5] Putting aside questions that are purely
historical or concerned only with academic priority, we would like
to know whether the proof has logical flaws, and also whether the
claimed result is only valid under certain conditions. We need to
consider the following questions:

1. Does it matter whether the system being described has finite
spatial extent, or whether the system is isolated?

2. Does it matter whether parts of the system are moving at
relativistic velocities?

3. Does the low-velocity approximation used in Einstein's proof
make a difference?

4. How do we handle a system that is not made out of pointlike
particles, e.g., a capacitor, in which some of the energy-momentum
is in an electric field?

[5] "Einstein's $E = mc^2$ mistakes," arxiv.org/abs/0805.1400

Figure f. The world lines of two beads bouncing back and forth on a wire.

The following example demonstrates issues 1-3 and their logical
connections; the definitional question 4 is addressed in ch. 9. Suppose
that two beads slide freely on a wire, bouncing elastically off
of each other and also rebounding elastically from the wire's ends.
Their world-lines are shown in figure f. Let's say the beads each
have unit mass. In frame o, the beads are released from the center
of the wire with velocities $\pm u$. For concreteness, let's set $u = 1/2$,
so that the system has internal motion at relativistic speeds. In
this frame, the total energy-momentum vector of the system, on the
surface of simultaneity labeled 1 in figure f, is $p = (2.31, 0)$. That is,
it has a total mass-energy of 2.31 units, and a total momentum of
zero (meaning that this is the center of mass frame). As time goes
on, an observer in this frame will say that the beads reach the ends of
the wire simultaneously, at which point they rebound, maintaining
the same total energy-momentum vector p. The mass of the system
is, by definition, $m = \sqrt{p_t^2 - p_x^2} = 2.31$, which stays
constant as the beads bounce back and forth.

Now let's transform into a frame o′ moving at velocity $v = 1/2$
relative to o. If velocities added linearly in relativity, then the initial
velocities of the beads in this frame would be 0 and $-1$, but of course
a material object can't move with speed $|v| = c = 1$, and velocities
don't add linearly. Applying the correct velocity addition formula for
relativity, we find that the beads have initial velocities 0 and $-0.8$ in
this frame, and if we compute their total energy-momentum vector,
on surface of simultaneity 2 in figure f, we get $p' = (2.67, -1.33)$.
This is exactly what we would have gotten by taking the original
vector p and pushing it through a Lorentz transformation. That
is, the energy-momentum vector seems to be acting like a good
four-vector, even though the system has finite spatial extent and
contains parts that move at relativistic speeds. In particular, this
implies that the system has the same mass $m = 2.31$ as in o, since
m is the norm of the p vector, and the norm of a vector stays the
same under a Lorentz transformation.
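The numbers quoted for the two beads can be reproduced with a short calculation. This is a sketch under stated assumptions: units with c = 1, unit masses, and a second frame taken to move at velocity +1/2 relative to the first (the sign of the resulting momentum depends on that choice of direction). All function names are made up for the illustration.

```python
import math

def gamma(v):
    return 1.0 / math.sqrt(1.0 - v * v)

def energy_momentum(m, v):
    """(E, p) of a particle with mass m and velocity v, in units with c = 1."""
    g = gamma(v)
    return (m * g, m * g * v)

def add_velocities(u, v):
    """Velocity u as measured in a frame moving at velocity v (relativistic addition)."""
    return (u - v) / (1.0 - u * v)

def boost(vec, v):
    """Lorentz-transform an energy-momentum vector (E, p) into a frame moving at v."""
    E, p = vec
    g = gamma(v)
    return (g * (E - v * p), g * (p - v * E))

def total(vectors):
    return (sum(E for E, _ in vectors), sum(p for _, p in vectors))

def mass(vec):
    E, p = vec
    return math.sqrt(E * E - p * p)

u = 0.5
frame_v = 0.5  # assumed direction of the second frame's motion

# Total energy-momentum in the original frame: about (2.31, 0).
p_o = total([energy_momentum(1.0, +u), energy_momentum(1.0, -u)])

# Route 1: transform each bead's velocity, then add up the vectors.
velocities_prime = [add_velocities(+u, frame_v), add_velocities(-u, frame_v)]
p_prime_direct = total([energy_momentum(1.0, w) for w in velocities_prime])

# Route 2: push the original total through a Lorentz transformation.
p_prime_boosted = boost(p_o, frame_v)

print(p_o, velocities_prime)
print(p_prime_direct, p_prime_boosted)
print(mass(p_o), mass(p_prime_direct))
```

Both routes give the same transformed vector, and the norm is the same in the two frames, which is the four-vector behavior described above.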

But now consider surface 3, which, like 2, observer o′ considers
to be a surface of simultaneity. At this time, o′ says that both beads
are moving to the left. Between time 2 and time 3, o′ says that
the system's total momentum has changed, while its total mass-energy
stayed constant. Its mass is different, and the total energy-momentum
vector p′ at time 3 is not related by a Lorentz transformation
to the value of p at any time in frame o. The reason for
this misbehavior is that the right-hand bead has bounced off of the
right end of the wire, but because o and o′ disagree
about simultaneity, o′ says that there has not yet been a corresponding
collision for the bead on the left.

But all of these difficulties arise only because we have left something
out. When the right-hand bead bounces off of the right-hand
end of the wire, this is a collision between the bead and the wire.
After the collision, the wire rebounds to the right (or a vibration is
created in it). By ignoring the rebound of the wire, we have violated
the law of conservation of momentum. If we take into account
the momentum imparted to the wire, then the energy-momentum
vector of the whole system is conserved, and must therefore be the
same at 2 and 3.

The upshot of all this is that $E = mc^2$ and the four-vector
nature of p are both valid for systems with finite spatial extent,
provided that the systems are isolated. "Isolated" means simply
that we should not gratuitously ignore anything, such as the wire
in this example, that exchanges energy-momentum with our system.

To give a general proof of this, it will be helpful to develop the

idea of the stress-energy tensor (section 9.2, p. 153), which allows

a succinct statement of what we mean by conservation of energy-

momentum (subsection 9.2.1). A proof is given in section 9.3.4 on

p. 165.

4.5 Force

Force is a concept that is seldom needed in relativity, and that's
why this section is optional.

4.5.1 Four-force

By analogy with Newtonian mechanics, we define a relativistic
force vector

$$F = ma ,$$

where a is the acceleration four-vector (sec 3.5, p. 56) and m is the
mass of a particle that has that acceleration as a result of the force
F. This is equivalent to

$$F = \frac{dp}{d\tau} ,$$

where p is the energy-momentum vector of the particle and $\tau$ its proper time. Since
the timelike part of p is the particle's mass-energy, the timelike
component of the force is related to the power expended by the
force. These definitions only work for massive particles, since for a
massless particle we can't define a or $\tau$. F has been defined in terms
of Lorentz invariants and four-vectors, and therefore it transforms
as a god-fearing four-vector itself.

4.5.2 The force measured by an observer

The trouble with all this is that F isn't what we actually measure
when we measure a force, except if we happen to be in a frame
of reference that momentarily coincides with the rest frame of the
particle. As with velocity and acceleration (section 3.7, p. 61), we
have a four-vector that has simple, standard transformation properties,
but a different $F_o$, which is what is actually measured by the
observer o. It's defined as

$$F_o = \frac{dp}{dt} ,$$

with a dt in the denominator rather than a $d\tau$. In other words,
it measures the rate of transfer of momentum according to the observer,
whose time coordinate is t, not $\tau$, unless the observer
happens to be moving along with the particle. Unlike the three-vectors
$v_o$ and $a_o$, whose timelike components are zero by definition
according to observer o, $F_o$ usually has a nonvanishing timelike
component, which is the rate of change of the particle's mass-energy,
i.e., the power.

The following two examples show that an object moving at relativistic
speeds has less inertia in the transverse direction than in
the longitudinal one. A corollary is that the three-acceleration need
not be parallel to the three-force.

Circular motion Example 13

For a particle in uniform circular motion, $\gamma$ is constant, and we
have

$$F_o = \frac{d}{dt}(m\gamma v) = m\gamma\frac{dv}{dt} .$$

The particle's mass-energy is constant, so the timelike component
of $F_o$ does happen to be zero in this example. In terms of
the three-vectors $v_o$ and $a_o$ defined in section 3.7, we have

$$F_o = m\gamma\frac{dv_o}{dt} = m\gamma a_o ,$$

which is greater than the Newtonian value by the factor $\gamma$. As a
practical example, in a cathode ray tube (CRT) such as the tube
in an old-fashioned oscilloscope or television, a beam of electrons
is accelerated up to relativistic speed (problem 2, p. 96). To paint
a picture on the screen, the beam has to be steered by transverse
forces, and since the deflection angles are small, the world-line
of the beam is approximately that of uniform circular motion. The
force required to deflect the beam is greater by a factor of $\gamma$ than
would have been expected according to Newton's laws.

Linear motion Example 14

For accelerated linear motion in the x direction, ignoring y and z,
we have a velocity vector

$$v = \frac{dr}{d\tau} ,$$

whose x component is $\gamma v$. Then

$$\begin{aligned}
F_{o,x} &= m\frac{d(\gamma v)}{dt} \\
        &= m\frac{d\gamma}{dt}v + m\gamma\frac{dv}{dt} \\
        &= m\frac{d\gamma}{dv}\frac{dv}{dt}v + m\gamma a \\
        &= m(v^2\gamma^3 a + \gamma a) \\
        &= ma\gamma^3 .
\end{aligned}$$

The particle's apparent inertia is increased by a factor of $\gamma^3$ due
to relativity.

The results of examples 13 and 14 can be combined as follows:

$$F_o = m\gamma a_{o,\perp} + m\gamma^3 a_{o,\parallel} ,$$

where the subscripts $\perp$ and $\parallel$ refer to the parts of $a_o$ perpendicular
and parallel to $v_o$.
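The factors of $\gamma$ and $\gamma^3$ can be checked by differentiating the momentum numerically. The sketch below assumes units with c = 1 and uses a simple central-difference derivative; the helper names, the step size, and the sample velocity and acceleration are illustrative choices, not anything prescribed by the text.

```python
import math

def gamma(vx, vy=0.0):
    return 1.0 / math.sqrt(1.0 - vx * vx - vy * vy)

def momentum(m, vx, vy):
    """Spatial momentum (p_x, p_y) = m * gamma * v for a three-velocity (vx, vy)."""
    g = gamma(vx, vy)
    return (m * g * vx, m * g * vy)

def force(m, v, a, dt=1e-4):
    """Central-difference estimate of dp/dt for three-velocity v and three-acceleration a."""
    p_plus = momentum(m, v[0] + a[0] * dt, v[1] + a[1] * dt)
    p_minus = momentum(m, v[0] - a[0] * dt, v[1] - a[1] * dt)
    return tuple((hi - lo) / (2.0 * dt) for hi, lo in zip(p_plus, p_minus))

m, v, a = 1.0, 0.6, 0.01   # sample values (assumptions); gamma = 1.25 at v = 0.6
g = gamma(v)

# Acceleration parallel to the velocity: expect F = m * gamma^3 * a.
f_parallel = force(m, (v, 0.0), (a, 0.0))[0]

# Acceleration perpendicular to the velocity: expect F = m * gamma * a.
f_perp = force(m, (v, 0.0), (0.0, a))[1]

print(f_parallel / (m * a), g ** 3)
print(f_perp / (m * a), g)
```

The two printed pairs agree closely, showing that the same particle resists longitudinal acceleration more strongly than transverse acceleration.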

4.5.3 Transformation of the force measured by an observer

Define a frame of reference o for the inertial frame of reference of
an observer who does happen to be moving along with the particle
at a particular instant in time. Then t is the same as $\tau$, and $F_o$ the
same as F. In this frame, the particle is momentarily at rest, so the
work being done on it vanishes, and the timelike components of $F_o$
and F are both zero.

Suppose we do a Lorentz transformation from o to a new frame
o′, with the boost parallel to $F_o$ and F (which are both
purely spatial in frame o). Call this direction x. Then
$dp = (dp_t, dp_x) = (0, dp_x)$ transforms to $dp' = (-\gamma v\,dp_x, \gamma\,dp_x)$, so that
$F_{o',x} = dp'_x/dt' = (\gamma\,dp_x)/(\gamma\,dt) = F_{o,x}$. The two factors of $\gamma$
cancel, and we find that $F_{o',x} = F_{o,x}$.

Now let's do the case where the force is in the y direction, perpendicular
to the boost. The Lorentz transformation doesn't change
$dp_y$, so $F_{o',y} = dp'_y/dt' = dp_y/(\gamma\,dt) = F_{o,y}/\gamma$.

The summary of our results is as follows. Let $F_o$ be the force
acting on a particle, as measured in a frame instantaneously comoving
with the particle. Then in a frame of reference moving relative
to this one, we have

$$F_{o',\parallel} = F_{o,\parallel}$$

and

$$F_{o',\perp} = \frac{F_{o,\perp}}{\gamma} ,$$

where $\parallel$ indicates the direction parallel to the relative velocity of the
two frames, and $\perp$ a direction perpendicular to it.
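As a rough numerical illustration of this summary, the sketch below boosts a small momentum transfer from the comoving frame and divides by the dilated time interval. It assumes a boost along x with the standard sign convention, units with c = 1, and made-up function names and sample values.

```python
import math

def boost_x(dp, v):
    """Lorentz boost along x of a momentum increment (dp_t, dp_x, dp_y)."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    dp_t, dp_x, dp_y = dp
    return (g * (dp_t - v * dp_x), g * (dp_x - v * dp_t), dp_y)

def transformed_force(F, v, dtau=1.0):
    """Force measured in a frame moving at v along x, given the comoving-frame force F = (Fx, Fy)."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    # In the comoving frame the timelike component of dp vanishes.
    dp = (0.0, F[0] * dtau, F[1] * dtau)
    dp_prime = boost_x(dp, v)
    dt_prime = g * dtau  # time dilation for a particle at rest in the comoving frame
    return (dp_prime[1] / dt_prime, dp_prime[2] / dt_prime)

F_parallel, F_perp = transformed_force((2.0, 3.0), 0.6)  # gamma = 1.25
print(F_parallel, F_perp)
```

The component parallel to the boost is unchanged, while the perpendicular component is reduced by the factor $\gamma$.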

Figure g. Subrahmanyan Chandrasekhar (1910-1995).

4.6 Degenerate matter

The properties of the momentum vector have surprising implications

for matter subject to extreme pressure, as in a star that uses up all

its fuel for nuclear fusion and collapses. These implications were

initially considered too exotic to be taken seriously by astronomers.

An ordinary, smallish star such as our own sun has enough hy-

drogen to sustain fusion reactions for billions of years, maintaining

an equilibrium between its gravity and the pressure of its gases.

When the hydrogen is used up, it has to begin fusing heavier elements.
This leads to a period of relatively rapid fluctuations in
structure. Nuclear fusion proceeds up until the formation of elements
as heavy as oxygen (Z = 8), but the temperatures are not
high enough to overcome the strong electrical repulsion of these nuclei
to create even heavier ones. Some matter is blown off, but finally

nuclear reactions cease and the star collapses under the pull of its

own gravity.

To understand what happens in such a collapse, we have to understand
the behavior of gases under very high pressures. In general,
a surface area A within a gas is subject to collisions in a time t
from the n particles occupying the volume V = Avt, where v is the
typical velocity of the particles. The resulting pressure is given by
$P \sim npv/V$, where p is the typical momentum.

Nondegenerate gas: In an ordinary gas such as air, the particles
are nonrelativistic, so v = p/m, and the thermal energy
per particle is $p^2/2m \sim kT$, so the pressure is $P \sim nkT/V$.

Nonrelativistic, degenerate gas: When a fermionic gas is subject
to extreme pressure, the dominant effects creating pressure
are quantum-mechanical. Because of the Pauli exclusion
principle, the volume available to each particle is $\sim V/n$,
so its wavelength is no more than $\sim (V/n)^{1/3}$, leading to
$p = h/\lambda \gtrsim h(n/V)^{1/3}$. If the speeds of the particles are still
nonrelativistic, then v = p/m still holds, so the pressure becomes
$P \sim (h^2/m)(n/V)^{5/3}$.

Relativistic, degenerate gas: If the compression is strong enough
to cause highly relativistic motion for the particles, then $v \approx c$,
and the result is $P \sim hc(n/V)^{4/3}$.

As a star with the mass of our sun collapses, it reaches a point

at which the electrons begin to behave as a degenerate gas, and

the collapse stops. The resulting object is called a white dwarf. A

white dwarf should be an extremely compact body, about the size

of the Earth. Because of its small surface area, it should emit very

little light. In 1910, before the theoretical predictions had been

made, Russell, Pickering, and Fleming discovered that 40 Eridani B
had these characteristics. Russell recalled: "I knew enough about
it, even in these paleozoic days, to realize at once that there was
an extreme inconsistency between what we would then have called
'possible' values of the surface brightness and density. I must have
shown that I was not only puzzled but crestfallen, at this exception
to what looked like a very pretty rule of stellar characteristics; but
Pickering smiled upon me, and said: 'It is just these exceptions
that lead to an advance in our knowledge,' and so the white dwarfs
entered the realm of study!"

S. Chandrasekhar showed in the 1930s that there was an upper
limit to the mass of a white dwarf. We will recapitulate his calculation
briefly in condensed order-of-magnitude form. The pressure
at the core of the star is $P \sim \rho g r \sim GM^2/r^4$, where M is the total
mass of the star. The star contains roughly equal numbers of neutrons,
protons, and electrons, so M = Knm, where m is the mass of
the electron, n is the number of electrons, and $K \approx 4000$. For stars
near the limit, the electrons are relativistic. Setting the pressure at
the core equal to the degeneracy pressure of a relativistic gas, we
find that the Chandrasekhar limit is $\sim (hc/G)^{3/2}(Km)^{-2} = 6M_\odot$.
A less sloppy calculation gives something more like $1.4M_\odot$.
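The order-of-magnitude estimate is easy to evaluate numerically. The sketch below plugs rounded SI values of the constants into $(hc/G)^{3/2}(Km)^{-2}$; the constant values and the function name are supplied here as assumptions, since the text quotes only the result.

```python
H = 6.626e-34           # Planck's constant, J s (rounded)
C = 2.998e8             # speed of light, m/s (rounded)
G = 6.674e-11           # gravitational constant, m^3 kg^-1 s^-2 (rounded)
M_ELECTRON = 9.109e-31  # electron mass, kg (rounded)
M_SUN = 1.989e30        # solar mass, kg (rounded)

def chandrasekhar_estimate(K=4000.0, m=M_ELECTRON):
    """Order-of-magnitude limit (hc/G)^(3/2) (K m)^(-2), in kg."""
    return (H * C / G) ** 1.5 / (K * m) ** 2

ratio = chandrasekhar_estimate() / M_SUN
print(round(ratio, 1))
```

With these inputs the estimate comes out at roughly six solar masses, consistent with the sloppy figure quoted above and a few times larger than the more careful value.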

What happens to a star whose mass is above the Chandrasekhar

limit? As nuclear fusion reactions flicker out, the core of the star becomes
a white dwarf, but once fusion ceases completely this cannot
be an equilibrium state. Now consider the nuclear reactions

$$n \to p + e^- + \bar{\nu}$$
$$p + e^- \to n + \nu ,$$

which happen due to the weak nuclear force. The first of these releases
0.8 MeV, and has a half-life of about 10 minutes. This explains
why free neutrons are not observed in significant numbers in our
universe, e.g., in cosmic rays. The second reaction requires an input
of 0.8 MeV of energy, so a free hydrogen atom is stable. The white
dwarf contains fairly heavy nuclei, not individual protons, but similar
considerations would seem to apply. A nucleus can absorb an
electron and convert a proton into a neutron, and in this context the
process is called electron capture. Ordinarily this process will only
occur if the nucleus is neutron-deficient; once it reaches a neutron-to-proton
ratio that optimizes its binding energy, electron capture
cannot proceed without a source of energy to make the reaction go.
In the environment of a white dwarf, however, there is such a source.
The annihilation of an electron opens up a hole in the "Fermi sea."
There is now a state into which another electron is allowed to drop
without violating the exclusion principle, and the effect cascades
upward. In a star with a mass above the Chandrasekhar limit, this
process runs to completion, with every proton being converted into a
neutron. The result is a neutron star, which is essentially an atomic
nucleus (with Z = 0) with the mass of a star!

Observational evidence for the existence of neutron stars came
in 1967 with the detection by Bell and Hewish at Cambridge of a
mysterious radio signal with a period of 1.3373011 seconds. The signal's
observability was synchronized with the rotation of the earth
relative to the stars, rather than with legal clock time or the earth's
rotation relative to the sun. This led to the conclusion that its origin
was in space rather than on earth, and Bell and Hewish originally
dubbed it LGM-1 for "little green men." The discovery of a second
signal, from a different direction in the sky, convinced them that it
was not actually an artificial signal being generated by aliens. Bell
published the observation as an appendix to her PhD thesis, and
it was soon interpreted as a signal from a neutron star. Neutron
stars can be highly magnetized, and because of this magnetization
they may emit a directional beam of electromagnetic radiation that
sweeps across the sky once per rotational period, the "lighthouse
effect." If the earth lies in the plane of the beam, a periodic signal
can be detected, and the star is referred to as a pulsar. It is fairly
easy to see that the short period of rotation makes it difficult to
explain a pulsar as any kind of less exotic rotating object. In the
approximation of Newtonian mechanics, a spherical body of density
$\rho$, rotating with a period $T = \sqrt{3\pi/G\rho}$, has zero apparent gravity
at its equator, since gravity is just strong enough to accelerate an
object so that it follows a circular trajectory above a fixed point on
the surface (problem 17). In reality, astronomical bodies of planetary
size and greater are held together by their own gravity, so we
have $T \gtrsim 1/\sqrt{G\rho}$ for any body that does not fly apart spontaneously
due to its own rotation. In the case of the Bell-Hewish pulsar, this
implies $\rho \gtrsim 10^{10}\ \text{kg/m}^3$, which is far larger than the density of normal
matter, and also 10-100 times greater than the typical density
of a white dwarf near the Chandrasekhar limit.
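The density bound follows from inverting $T = \sqrt{3\pi/G\rho}$ for the observed period. The short sketch below does this with a rounded value of G; the function name and the constant's value are assumptions made for the illustration.

```python
import math

G = 6.674e-11  # gravitational constant in SI units (rounded, supplied as an assumption)

def min_density(T):
    """Density at which a body rotating with period T has zero apparent gravity at its equator."""
    return 3.0 * math.pi / (G * T * T)

rho = min_density(1.3373011)  # period of the Bell-Hewish pulsar, in seconds
print(f"{rho:.1e} kg/m^3")
```

The result is a few times $10^{10}\ \text{kg/m}^3$, in line with the order of magnitude quoted above.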

An upper limit on the mass of a neutron star can be found in a

manner entirely analogous to the calculation of the Chandrasekhar

limit. The only difference is that the mass of a neutron is much
greater than the mass of an electron, and the neutrons are the only
particles present, so there is no factor of K. Assuming the more
precise result of $1.4M_\odot$ for the Chandrasekhar limit rather than
our sloppy one, and ignoring the interaction of the neutrons via the
strong nuclear force, we can infer an upper limit on the mass of a
neutron star:

$$1.4M_\odot\left(\frac{Km_e}{m_n}\right)^2 \approx 5M_\odot$$

Tolman, Oppenheimer, and Volkoff originally estimated it in 1939
as $0.7M_\odot$, whereas modern estimates extend up to about $3M_\odot$.
These are lower than our crude estimate of $5M_\odot$, mainly because
the attractive nature of the strong nuclear
force tends to pull the star toward collapse. Unambiguous results
are presently impossible because of uncertainties in extrapolating
the behavior of the strong force from the regime of ordinary nuclei,
where it has been relatively well parametrized, into the exotic environment
of a neutron star, where the density is significantly different
and no protons are present. There are a variety of effects that may
be difficult to anticipate or to calculate. For example, Brown and
Bethe found in 1994[6] that it might be possible for the mass limit to
be drastically revised because of the process $e^- \to K^- + \nu_e$, which is
impossible in free space due to conservation of energy, but might be
possible in a neutron star. Observationally, nearly all neutron stars
seem to lie in a surprisingly small range of mass, between 1.3 and
$1.45M_\odot$, but in 2010 a neutron star with a mass of $1.97 \pm 0.04M_\odot$
was discovered, ruling out most neutron-star models that included
exotic matter.[7]

For stars with masses above the Tolman-Oppenheimer-Volkoff
limit, it seems likely, both on theoretical and observational grounds,
that we end up with a black hole: an object with an event horizon
(cf. p. 58) that cuts its interior off from the rest of the universe.

4.7 Tachyons and FTL

4.7.1 A defense in depth

Let's summarize some ideas about faster-than-light (FTL, superluminal)
motion in relativity:

1. Superluminal transmission of information would violate causality,
since it would allow a causal relationship between events
that were spacelike in relation to one another, and the time-ordering
of such events is different according to different observers.
Since we never seem to observe causality to be violated,
we suspect that superluminal transmission of information
is impossible. This leads us to interpret the metric in
relativity as being fundamentally a statement of possible cause
and effect relationships between events.

2. We observe the invariant mass defined by $m^2 = E^2 - p^2$ to be
a fixed property of all objects. Therefore we suspect that it is
not possible for an object to change from having $|E| > |p|$ to
having $|E| < |p|$.

3. No continuous process of acceleration can bring an observer
from v < c to v > c (see section 3.3). Since it's possible to
build an observer out of material objects, it seems that it's
impossible to get a material object past c by a continuous
process of acceleration.

4. If we could boost a material object past the speed of light,
even by some discontinuous process, then we could do so for
an observer. But faster-than-light frames of reference are
kinematically impossible in 3+1 dimensions (section 3.8).

Special relativity seems to have a defense in depth against
superluminal motion.

6 H.A. Bethe and G.E. Brown, "Observational constraints on the maximum neutron star mass," Astrophys. J. 445 (1995) L129. G.E. Brown and H.A. Bethe, "A Scenario for a Large Number of Low-Mass Black Holes in the Galaxy," Astrophys. J. 423 (1994) 659. Both papers are available at adsabs.harvard.edu.

7 Demorest et al., arxiv.org/abs/1010.5788v1.

The weakest of these arguments is 1, since as described in section 2.1,
we have no strong reasons for believing in causality as an
overarching principle of physics. If we're willing to let go of causality,
then we only need to comply with arguments 2, 3, and 4 above.
Based on 2, FTL motion would be a property of an exotic form
of matter built out of hypothetical particles with imaginary mass.
Such particles are called tachyons. Argument 4 tells us that the laws
of physics must conspire to make it impossible to build an observer
out of tachyons; this is not entirely implausible, since there are other
classes of particles such as photons that can't be used to construct
observers.
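As a rough illustration of argument 2, the sign of m^2 = E^2 - p^2 sorts energy-momentum vectors into three classes. The following is a minimal sketch, assuming units with c = 1 and motion along a single axis; the function names and the sample values are hypothetical, chosen only for illustration.

```python
def mass_squared(E, p):
    """Invariant m^2 = E^2 - p^2 in units with c = 1 (one spatial dimension)."""
    return E**2 - p**2


def classify(E, p):
    """Label an energy-momentum vector by the sign of m^2."""
    m2 = mass_squared(E, p)
    if m2 > 0:
        return "timelike (ordinary matter, |E| > |p|)"
    if m2 == 0:
        return "lightlike (massless, |E| = |p|)"
    return "spacelike (tachyonic, |E| < |p|, imaginary mass)"


# Hypothetical sample values, in arbitrary energy units.
for E, p in [(5.0, 3.0), (2.0, 2.0), (3.0, 5.0)]:
    print(E, p, mass_squared(E, p), classify(E, p))
```

A negative value of m^2 is what is meant above by an imaginary mass: no real number m satisfies the defining relation.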

4.7.2 Experiments to search for tachyons

It would be exciting if we could detect tachyons in particle accelerator
experiments or as naturally occurring radiation. Perhaps we
could even learn to transmit and receive tachyon signals artificially,
allowing us to send ourselves messages from the future! The latter
possibility was pointed out in 1917 by Tolman⁸ and is referred to
as the "tachyonic antitelephone." Bilaniuk et al. claimed in a 1962
paper to have found a reinterpretation that eliminated the causality
violation,⁹ but their interpretation requires that rates of tachyon
emission in one frame be related to rates of tachyon absorption in another
frame, which in my opinion is equally problematic, since rates
of absorption should depend on the environment, whereas rates of
emission should depend on the emitter; the causality violation has
simply been described in different words, but not eliminated.¹⁰

Experimental searches are made more difficult by conflicting theoretical
claims as to whether tachyons should be charged or neutral,
whether they should have integral or half-integral spin, and
whether the normal spin-statistics relation even applies to them.¹¹
If charged, it is uncertain whether and under what circumstances
they would emit Cerenkov radiation.

The most obvious experimental signature of tachyons would be
propagation at speeds greater than c. Negative results were reported
by Murthy and later by Clay,¹² who studied air showers generated
by cosmic rays to look for precursor particles that arrived before the
first photons.

One could also look for particles with $|p| > E$. Alvager and
Erman, in a 1965 experiment, studied the beta decay of $^{170}$Tm, using
a spectrometer to measure the momentum of charged radiation and
a solid state detector to determine energy. An upper limit of one
tachyon per $10^4$ beta particles was inferred.

If tachyons are neutral, then they might be difficult to detect
directly, but it might be possible to infer their existence indirectly
through missing energy-momentum in reactions. This is how the
neutrino was first discovered. Baltay et al.¹³ searched for reactions
such as p + p [...] + + [...],
measuring the momenta of all the other initial and final particles
and looking for events in which the missing energy-momentum was
spacelike. They put upper limits of $10^{-3}$ on the branching ratios
of this and several other reactions leading to production of single
tachyons or tachyon-antitachyon pairs.

When we add quantum mechanics to special relativity, we get
quantum field theory, which sounds scary and can be quite technical,
but is governed by some very simple principles. One of these
principles is that "everything not forbidden is compulsory." The
phrase originated as political satire of communism by T.H. White,
but was commandeered by physicist Murray Gell-Mann to express
the idea that any process not forbidden by a conservation law will in
fact occur in nature at some rate. If tachyons exist, then it is possible
to have two tachyons whose energy-momentum vectors add up
to zero (problem 8, p. 97). This would seem to imply that the vacuum
could spontaneously create tachyon-antitachyon pairs. Most
theorists now interpret this as meaning that when tachyons pop up
in the equations, it's a sign that the assumed vacuum state is not
stable, and will change into some other state that is the true state
of minimum energy.

A brief flurry of reawakened interest in tachyons was occasioned
by a 2011 debacle in which the particle-physics experiment OPERA
mistakenly reported faster-than-light propagation of neutrinos; the
anomaly was later found to be the result of a loose connection on a
fiber-optic cable plus a miscalibrated oscillator.

8 www.archive.org/details/theoryrelativmot00tolmrich

9 Bilaniuk and Sudarshan, "Particles beyond the light barrier," Phys. Today 22, 43 (1969), available online at wildcard.ph.utexas.edu/~sudarshan/publications.htm

10 For a different critique, see Benford, Book, and Newcomb, "The tachyonic antitelephone," Physical Review D 2 (1970) 263. Scans of the paper can be found online.

11 Feinberg, "Possibility of Faster-Than-light Particles," Phys. Rev. 159 (1967) 1089, http://www.scribd.com/doc/144943457/G-Feinberg-Possibility-of-Faster-Than-light-Particles-Phys-Rev-159-1967-1089

12 "A search for tachyons in cosmic ray showers," Austr. J. Phys 41 (1988) 93, http://adsabs.harvard.edu/full/1988AuJPh..41...93C

13 Phys. Rev. D 1 (1970) 759

Problems

1 Criticize the following reasoning. Temperature is a measure

of the energy per atom. In nonrelativistic physics, there is a min-

imum temperature, which corresponds to zero energy per atom, but

no maximum. In relativity, there should be a maximum temperature,

which would be the temperature at which all the atoms are moving

at c.

2 In an old-fashioned cathode ray tube (CRT) television, electrons
are accelerated through a voltage difference that is typically
about 20 kV. At what fraction of the speed of light are the electrons
moving?

3 In nuclear beta decay, an electron or antielectron is typically

emitted with an energy on the order of 1 MeV. In alpha decay,

the alpha particle typically has an energy of about 5 MeV. In each

case, do a rough estimate of whether the particle is nonrelativistic,

relativistic, or ultrarelativistic.

4 Suppose that the starship Enterprise from Star Trek has a
mass of $8.0\times10^7$ kg, about the same as the Queen Elizabeth 2.
Compute the kinetic energy it would have to have if it was moving
at half the speed of light. Compare with the total energy content of
the world's nuclear arsenals, which is about $10^{21}$ J.

5 [...] the universe. In 2013 the IceCube neutrino detector in Antarctica
detected two neutrinos,¹⁴ dubbed Bert and Ernie, after the Sesame
Street characters, with energies in the neighborhood of 1 PeV =
$10^{15}$ eV. The higher energy was Ernie's $1.14\pm0.17$ PeV. It is not
known what type of neutrino he was, nor do we have exact masses
for neutrinos, but let's assume m = 1 eV. Find Ernie's rapidity.

14 arxiv.org/abs/1304.5356

6 Science fiction stories often depict spaceships traveling through
solar systems at relativistic speeds. Interplanetary space contains
a significant number of tiny dust particles, and such a ship would
sweep these dust particles out of a large volume of space, impacting
them at high speeds. A 1975 experiment aboard the Skylab space
station measured the frequency of impacts from such objects and
found that a square meter of exposed surface experienced an impact
from a particle with a mass of $10^{-15}$ kg about every few hours.
A relativistic object, sweeping through space much more rapidly,
would experience such impacts at rates of more like one every few
seconds. (Larger particles are significantly more rare, with the frequency
falling off as something like $m^{-8}$.) These particles didn't
damage Skylab, because at relative velocities of $10^4$ m/s their kinetic
energies were on the order of microjoules. At relativistic speeds
it would be a different story. Real-world spacecraft are lightweight
and rather fragile, so there would probably be serious consequences
from any impact having a kinetic energy of about $10^2$ J (comparable
to a bullet from a small handgun). (a) Find the speed at which a
starship could cruise through a solar system if frequent $10^2$ J collisions
were acceptable, assuming no object with a mass of more
than $10^{-15}$ kg. Express your result relative to c. (b) Find the speed
under the more conservative parameters of 10 J and $10^{-14}$ kg.

7 Example 4 on p. 57 derives the equation

$$x = \frac{1}{a}\cosh a\tau$$

for a particle moving with constant acceleration. (Note that a constant
of integration was taken to be zero, so that $x \ne 0$ at $\tau = 0$.)
(a) Rewrite this equation in metric units by inserting the necessary
factors of c. (b) If we had a rocket ship capable of accelerating
indefinitely at g, how much proper time would be needed in order
to travel the distance x = 27,000 light-years to the galactic center?
(This will be a flyby, so the ship accelerates all the way rather
than decelerating to stop at its destination.) Answer: 11 years (c)
An observer at rest relative to the galaxy explains the surprisingly
short time calculated in part b as being due to the time dilation
experienced by the traveler. How does the traveler explain it?

8 Show, as claimed on p. 94, that if tachyons exist, then it is

possible to have two tachyons whose momentum vectors add up to

zero.

9 (a) A free neutron (as opposed to a neutron bound into an

atomic nucleus) is unstable, and undergoes spontaneous radioactive

decay into a proton, an electron, and an antineutrino. The masses

of the particles involved are as follows:

neutron        $1.67495\times10^{-27}$ kg
proton         $1.67265\times10^{-27}$ kg
electron       $0.00091\times10^{-27}$ kg
antineutrino   $< 10^{-35}$ kg

Find the energy released in the decay of a free neutron.

(b) Neutrons and protons make up essentially all of the mass of the
ordinary matter around us. We observe that the universe around us
has no free neutrons, but lots of free protons (the nuclei of hydrogen,
which is the element that 90% of the universe is made of). We find
neutrons only inside nuclei along with other neutrons and protons,
not on their own.

If there are processes that can convert neutrons into protons, we

might imagine that there could also be proton-to-neutron conver-

sions, and indeed such a process does occur sometimes in nuclei

that contain both neutrons and protons: a proton can decay into a

neutron, a positron, and a neutrino. A positron is a particle with

the same properties as an electron, except that its electrical charge

is positive. A neutrino, like an antineutrino, has negligible mass.

Although such a process can occur within a nucleus, explain why
it cannot happen to a free proton. (If it could, hydrogen would be
radioactive, and you wouldn't exist!)

10 (a) Find a relativistic equation for the velocity of an object
in terms of its mass and momentum (eliminating $\gamma$).

(b) Show that your result is approximately the same as the classical

value, p/m, at low velocities.

(c) Show that very large momenta result in speeds close to the speed

of light.

11 Expand the equation for relativistic kinetic energy $K = m(\gamma - 1)$
in a Taylor series, and find the first two nonvanishing
terms. Show that the first term is the nonrelativistic expression.

12 Expand the equation $p = m\gamma v$ in a Taylor series, and find
the first two nonvanishing terms. Show that the first term is the
classical expression.

13 An atom in an excited state emits a photon, ending up in
a lower state. The initial state has mass $m_1$, the final one $m_2$. To
a very good approximation, we expect the energy E of the photon
to equal $m_1 - m_2$. However, conservation of momentum dictates
that the atom must recoil from the emission, and therefore it carries
away a small amount of kinetic energy that is not available to the
photon. Find the exact energy of the photon, in the frame in which
the atom was initially at rest.

14 The following are the three most common ways in which
gamma rays interact with matter:

Photoelectric effect: The gamma ray hits an electron, is annihilated,
and gives all of its energy to the electron.

Compton scattering: The gamma ray bounces off of an electron,
exiting in some direction with some amount of energy.

Pair production: The gamma ray is annihilated, creating an electron
and a positron.

Example 10 on p. 82 shows that pair production can't occur in a
vacuum due to conservation of the energy-momentum four-vector.
What about the other two processes? Can the photoelectric effect
occur without the presence of some third particle such as an atomic
nucleus? Can Compton scattering happen without a third particle?

15 This problem assumes you know some basic quantum physics.
The point of this problem is to estimate whether or not a neutron or
proton in an atomic nucleus is highly relativistic. Nuclei typically
have diameters of a few fm (1 fm = $10^{-15}$ m). Take a neutron or
proton to be a particle in a box of this size. In the ground state,
half a wavelength would fit in the box. Use the de Broglie relation
to estimate its typical momentum and thus its typical speed. How
relativistic is it?

16 Show, as claimed in example 11 on p. 83, that if a massless
particle were to decay, Lorentz invariance requires that the timescale
for the process be proportional to the particle's energy. What
units would the constant of proportionality have?

17 Derive the equation $T = \sqrt{3\pi/G\rho}$ given on page 91 for the
period of a rotating, spherical object that results in zero apparent
gravity at its surface.

Chapter 5

Inertia (optional)

5.1 What is inertial motion?

On p. 43 I stated the following as an axiom of special relativity:

P4. Inertial frames of reference exist. These are frames in

which particles move at constant velocity if not subject to any

forces. We can construct such a frame by using a particular

particle, which is not subject to any forces, as a reference

point. Inertial motion is modeled by vectors and parallelism.

This is a typical modern restatement of Newton's first law. It claims
to define inertial frames and claims that they exist.

a / The spherical chamber, shown in a cutaway view, has layers of shielding to exclude all known nongravitational forces. The three guns, at right angles to each other, fire bullets. Once the chamber has been calibrated by marking the three dashed-line trajectories under free-fall conditions, an observer inside the chamber can always tell whether she is in an inertial frame.

5.1.1 An operational definition

In keeping with the philosophy of operationalism (p. 25), we
ought to be able to translate the definition into a method for testing
whether a given frame really is inertial. Figure a shows an idealized
variation on a device actually built for this purpose by Harold Waage
at Princeton as a lecture demonstration to be used by his partner in
crime John Wheeler. We build a sealed chamber whose contents are
isolated as much as possible from outside forces. Of the four known
forces of nature, the ones we know how to exclude are the strong
nuclear force, the weak nuclear force, and the electromagnetic force.
The strong nuclear force has a range of only about 1 fm ($10^{-15}$ m),
so to exclude it we merely need to make the chamber thicker than
that, and also surround it with enough paraffin wax to keep out
any neutrons that happen to be flying by. The weak nuclear force
also has a short range, and although shielding against neutrinos is
a practical impossibility, their influence on the apparatus inside will
be negligible. To shield against electromagnetic forces, we surround
the chamber with a Faraday cage and a solid sheet of mu-metal.
Finally, we make sure that the chamber is not being touched by any
surrounding matter, so that short-range residual electrical forces
(sticky forces, chemical bonds, etc.) are excluded. That is, the
chamber cannot be supported; it is free-falling.

b / Example 1.

Crucially, the shielding does not exclude gravitational forces.
There is in fact no known way of shielding against gravitational
effects such as the attraction of other masses or the propagation of
gravitational waves. (Because the shielding is spherical, it exerts no
gravitational force of its own on the apparatus inside.)

Inside, an observer carries out an initial calibration by firing
bullets along three Cartesian axes and tracing their paths, which she
defines to be linear. (She can also make sure that the chamber isn't
rotating, e.g., by checking for velocity-dependent Coriolis forces.)
After the initial calibration, she can always tell whether or not she
is in an inertial frame. She simply has to fire the bullets, and see
whether or not they follow the precalibrated paths. For example,
she can detect that the frame has become noninertial if the chamber
is rotated, allowed to rest on the ground, or accelerated by a rocket
engine.

Isaac Newton would have been extremely unhappy with our definition.
"This is absurd," he says. "The way you've defined it, my
street in London isn't inertial." Newtonian mechanics only makes
predictions if we input the correct data on all the mass in the universe.
Given this kind of knowledge, we can properly account for all
the gravitational forces, and define the street in London as an inertial
frame because in that frame, the trees and houses have zero total
force on them and don't accelerate. But spacetime isn't Galilean. In
special relativity's description of spacetime, information propagates
at a maximum speed of c, so there will always be distant parts of the
universe that we can never know about, because information from
those regions hasn't had time to reach us yet.

Rotation is noninertial Example 1

Figure b shows a hypothetical example proposed by Einstein.
One planet rotates about its axis and therefore has an equatorial
bulge. The other planet doesn't rotate and has none. Both Newtonian
mechanics and special relativity make these predictions,
and although the scenario is idealized and unrealistic, there is no
doubt that their predictions are correct for this situation, because
the two theories have been tested in similar cases. This also
agrees with our operational definition of inertial motion on p. 102.
Rotational motion is noninertial.

This bothered Einstein for the following reason. If the inhabitants
of the two planets can look up in the sky at the fixed stars, they
have a clear explanation of the reason for the difference in shape.
People on planet A don't see the stars rise or set, and they infer
that this is because they live on a nonrotating world. The inhabitants
of planet B do see the stars rise and set, just as they do
here on earth, so they infer, just as Copernicus did, that their
planet rotates.

But suppose, Einstein said, that the two planets exist alone in an
otherwise empty universe. There are no stars. Then it's equally
valid for someone on either planet to say that it's the one that
doesn't rotate. Each planet rotates relative to the other planet,
but the situation now appears completely symmetric. Einstein
took this argument seriously and felt that it showed a defect in
special relativity. He hoped that his theory of general relativity
would fix this problem, and predict that in an otherwise empty universe,
neither planet would show any tidal bulge. In reality, further
study of the general theory of relativity showed that it made the
same prediction as special relativity. Theorists have constructed
other theories of gravity, most prominently the Brans-Dicke theory,
that do behave more in the way Einstein's physical intuition
expected. Precise solar-system tests have, however, supported
general relativity rather than Brans-Dicke gravity, so it appears
well settled now that rotational motion really shouldn't be considered
inertial.

c / According to Galileo's student Viviani, Galileo dropped a cannonball and a musketball simultaneously from the leaning tower of Pisa, and observed that they hit the ground at nearly the same time. This contradicted Aristotle's long-accepted idea that heavier objects fell faster.

5.1.2 Equivalence of inertial and gravitational mass

All of the reasoning above depends on the perfect cancellation
referred to by Newton: since gravitational forces are proportional to
mass, and acceleration is inversely proportional to mass, the result
is that accelerations caused by gravity are independent of mass.
This is the universality of free fall, which was famously observed by
Galileo, figure c.

Suppose that, on the contrary, we had access to some matter
that was immune to gravity. It's sold under the brand name
FloatyStuff™. The cancellation fails now. Let's say that alien
gangsters land in a flying saucer, kidnap you out of your back yard,
konk you on the head, and take you away. When you regain consciousness,
you're locked up in a sealed cabin in their spaceship.
You pull your keychain out of your pocket and release it, and you
observe that it accelerates toward the floor with an acceleration that
seems quite a bit slower than what you're used to on earth, perhaps
a third of a gee. There are two possible explanations for this. One
is that the aliens have taken you to some other planet, maybe Mars,
where the strength of gravity is a third of what we have on earth.
The other is that your keychain didn't really accelerate at all: you're
still inside the flying saucer, which is accelerating at a third of a gee,
so that it was really the deck that accelerated up and hit the keys.
There is absolutely no way to tell which of these two scenarios is
actually the case unless you happen to have a chunk of FloatyStuff
in your other pocket. If you release the FloatyStuff and it hovers
above the deck, then you're on another planet and experiencing
genuine gravity; your keychain responded to the gravity, but the
FloatyStuff didn't. But if you release the FloatyStuff and see it hit
the deck, then the flying saucer is accelerating through outer space.

d / Loránd Eötvös (1848-1919).

5.2 The equivalence principle

5.2.1 Equivalence of acceleration to a gravitational field

The nonexistence of FloatyStuff in our universe is a special case
of the equivalence principle. The equivalence principle states that
an acceleration (such as the acceleration of the flying saucer) is always
equivalent to a gravitational field, and no observation can ever
tell the difference without reference to something external. (And
suppose you did have some external reference point. How would
you know whether it was accelerating?)

5.2.2 Eötvös experiments

FloatyStuff would be an extreme example, but if there was any
violation of the universality of free fall, no matter how small, then
the equivalence principle would be falsified. Since Galileo's time,
experimental methods have had several centuries in which to improve,
and the second law has been subjected to similar tests with
exponentially improving precision. For such an experiment in 1993,¹
physicists at the University of Pisa (!) built a metal disk out of
copper and tungsten semicircles joined together at their flat edges.
They evacuated the air from a vertical shaft and dropped the disk
down it 142 times, using lasers to measure any tiny rotation that
would result if the accelerations of the copper and tungsten were
very slightly different. The results were statistically consistent with
zero rotation, and put an upper limit of $1\times10^{-9}$ on the fractional
difference in acceleration $|g_\text{copper} - g_\text{tungsten}|/g$. Experiments of this
type are called Eötvös experiments, after Loránd Eötvös, who did
the first modern, high-precision versions.

1 Carusotto et al., "Limits on the violation of g-universality with a Galileo-type experiment," Phys Lett A183 (1993) 355. Freely available online at researchgate.net.

e / An artificial horizon.

The artificial horizon Example 2

The pilot of an airplane cannot always easily tell which way is up.
The horizon may not be level simply because the ground has an
actual slope, and in any case the horizon may not be visible if the
weather is foggy. One might imagine that the problem could be
solved simply by hanging a pendulum and observing which way
it pointed, but by the equivalence principle the pendulum cannot
tell the difference between a gravitational field and an acceleration
of the aircraft relative to the ground, and neither can any other
accelerometer, such as the pilot's inner ear. For example, when
the plane is turning to the right, accelerometers will be tricked into
believing that "down" is down and to the left. To get around this
problem, airplanes use a device called an artificial horizon, which
is essentially a gyroscope. The gyroscope has to be initialized
when the plane is known to be oriented in a horizontal plane. No
gyroscope is perfect, so over time it will drift. For this reason the
instrument also contains an accelerometer, and the gyroscope is
automatically restored to agreement with the accelerometer, with
a time-constant of several minutes. If the plane is flown in circles
for several minutes, the artificial horizon will be fooled into
indicating that the wrong direction is vertical.
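The size of the error in a turn can be estimated with a little trigonometry. In a level turn at speed v around a circle of radius r, a pendulum hangs along the combination of the gravitational field g and the centripetal acceleration v^2/r, so it tilts away from the true vertical by an angle whose tangent is v^2/(rg). The sketch below is only an illustration; the function name, the speed, and the radius are hypothetical values, not data from the text.

```python
import math


def apparent_tilt_deg(speed, radius, g=9.80):
    """Angle (degrees) between a pendulum and the true vertical in a level turn."""
    centripetal = speed**2 / radius          # m/s^2, horizontal acceleration
    return math.degrees(math.atan2(centripetal, g))


# Hypothetical light aircraft: 60 m/s around a circle of radius 1000 m.
print(f"{apparent_tilt_deg(60.0, 1000.0):.1f} degrees")
```

For these assumed values the accelerometer's idea of "down" is off by roughly twenty degrees, which is why an accelerometer alone cannot serve as the reference during a sustained turn.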

5.2.3 Gravity without gravity

We live immersed in the earth's gravitational field, and that
is where we do almost all of our physics experiments. It's surprising,
then, that special relativity can be confirmed in earthbound
experiments, sometimes with phenomenal precision, as in the
Ives-Stilwell experiment's 10-significant-figure test of the relativistic
Doppler shift equation (p. 52). How can this be, since special relativity
is supposed to be the version of relativity that can't handle
gravity? The equivalence principle provides an answer. If the only
gravitational effect on your experiment is a uniform field g, then it's
valid for you to describe your experiment as having been done in a
region without any gravity, but in a laboratory whose floor happened
to have been accelerating upward with an acceleration g. Special
relativity works just fine in such situations, because switching into
an accelerated frame of reference doesn't have any effect on the flatness
of spacetime (p. 42). Note that Gravity Probe B (p. 42) orbited
the earth, so the field it experienced varied in direction, causing the
above argument to fail; the effects it observed were not explainable
by special relativity.

5.2.4 Gravitational Doppler shifts

For an example of a specifically gravitational experiment that
is explainable by special relativity, and that has actually been carried
out, consider a laboratory accelerating upward. A light wave emitted
from the floor would be Doppler-shifted toward lower frequencies
when observed at the ceiling, because of the change in the receiver's
velocity during the wave's time of flight. The effect is given by
$\Delta f/f \approx -ax/c^2$, where $a$ is the lab's acceleration, $x$ is the
height from floor to ceiling, and $c$ is the speed of light (problem 1).
In units with $c = 1$, we have $\Delta f/f \approx -ax$.

By the equivalence principle, we find that when such an experiment
is done in a gravitational field $g$, there should be a gravitational
effect on frequency $\Delta f/f \approx -gx$. This can be expressed more
compactly as $\Delta f/f \approx -\Delta\Phi$, where $\Phi$ is the gravitational potential,
i.e., the gravitational energy per unit mass.

f / 1. A light wave is emitted upward from the floor of the elevator. The elevator accelerates upward. 2. By the time the light wave is detected at the ceiling, the elevator has changed its velocity, so the wave is detected with a Doppler shift.

g / Pound and Rebka at the top and bottom of the tower.

In 1959, Pound and Rebka² carried out an experiment in a tower
at Harvard. Gamma rays were emitted by a $^{57}$Fe source at the
bottom and detected at the top, having risen x = 22.6 m. The
equivalence principle predicts a fractional frequency shift due to
gravity of $2.46\times10^{-15}$. This is very small, and would normally have
been masked by recoil effects (problem 13, p. 98), but by exploiting
the Mössbauer effect Pound and Rebka measured the shift to be
$(2.56\pm0.25)\times10^{-15}$.

5.2.5 A varying metric

In the Pound-Rebka experiment, the nuclei emitting the gamma
rays at frequency f can be thought of as little clocks. Each wave
crest that propagates upward is a signal saying that the clock has
ticked once. An observer at the top of the tower finds that the
signals come in at the lower frequency $f'$, and concludes
that the clocks at the bottom had been slowed down due to some
kind of time dilation effect arising from gravity.

This may seem like a big conceptual leap, but it has been confirmed
using atomic clocks. In a 1978 experiment by Iijima and Fujiwara,
figure h, identical atomic clocks were kept at rest at the top
and bottom of a mountain near Tokyo. The discrepancies between
the clocks were consistent with the predictions of the equivalence
principle. The gravitational Doppler shift was also one of the effects
that led to the non-null result of the Hafele-Keating experiment
(p. 15), in which atomic clocks were flown around the world aboard
commercial passenger jets. Every time you use the GPS system,
you are making use of these effects.

h / A graph showing the time difference between two atomic clocks. One clock was kept at Mitaka Observatory, at 58 m above sea level. The other was moved back and forth to a second observatory, Norikura Corona Station, at the peak of the Norikura volcano, 2876 m above sea level. The plateaus on the graph are data from the periods when the clocks were compared side by side at Mitaka. The difference between one plateau and the next shows a gravitational effect on the rate of flow of time, accumulated during the period when the mobile clock was at the top of Norikura.

2 Phys. Rev. Lett. 4 (1960) 337

Starting from only the seemingly innocuous assumption of the
equivalence principle, we are led to surprisingly far-reaching conclusions.
We find that time flows at different rates depending on
the height within a gravitational field. Since the metric can be interpreted
as a measure of the amount of proper time along a given
world-line, we conclude that we cannot always express the metric in
the familiar form $\tau^2 = (+1)t^2 + (-1)x^2$ with fixed coefficients
+1 and −1. Suppose that the t coordinate is defined by radio synchronization.
Then the +1 in the metric needs to be replaced with
approximately $1 + 2\Phi$, where we take $\Phi = 0$ by convention at the
height of the standard clock that coordinates the synchronization.

Keep in mind that although we have connected gravity to the

measurement apparatus of special relativity, there is no curvature

of spacetime, so what we are doing here is still special relativity, not

general relativity. In fact there is nothing more mysterious going

on here than a renaming of spacetime events through a change of

coordinates. The renaming might be convenient if we were using

earth-based reference points to measure the x coordinate. But if

we felt like it, we could switch to a good inertial frame of reference,

one that was free-falling. In this frame, we would obtain exactly the

same prediction for the results of any experiment. For example, the

free-falling observer would explain the result of the Pound-Rebka

experiment as arising from the upward acceleration of the detector

away from the source.
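As a rough numerical illustration of the size of the effect, the sketch below estimates the rate difference between two clocks at the heights quoted in the caption of figure h. It assumes a uniform field of 9.8 m/s^2 and the weak-field estimate $\Delta\Phi/c^2 = g\,\Delta h/c^2$ for the fractional rate difference; the function names and constants are illustrative choices, not taken from the experiment.

```python
# Sketch only: weak-field estimate of gravitational time dilation between two
# clocks at different heights. The uniform field strength and the reduction
# to g*h/c^2 are assumptions made for this illustration.

G_FIELD = 9.8        # m/s^2, assumed uniform over the height range
C_LIGHT = 2.998e8    # m/s


def fractional_rate_difference(h_low, h_high, g=G_FIELD, c=C_LIGHT):
    """Fraction by which the higher clock runs fast: (Phi_high - Phi_low)/c^2."""
    return g * (h_high - h_low) / c**2


def accumulated_offset_ns(h_low, h_high, days):
    """Clock offset in nanoseconds accumulated over the given number of days."""
    seconds = days * 86400.0
    return fractional_rate_difference(h_low, h_high) * seconds * 1e9


if __name__ == "__main__":
    rate = fractional_rate_difference(58.0, 2876.0)
    print(f"fractional rate difference: {rate:.2e}")
    print(f"offset per day: {accumulated_offset_ns(58.0, 2876.0, 1.0):.1f} ns")
```

Under these assumptions the fractional difference is a few parts in $10^{13}$, which is why atomic clocks are needed to see it.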

Section 5.2 The equivalence principle 107

Problems

1 Carry out the details of the calculation of the gravitational Doppler effect in section 5.2.4.

2 A student argues as follows. At the center of the earth, there is zero gravity by symmetry. Therefore time would flow at the same rate there as at a large distance from the earth, where there is also zero gravity. Although we can't actually send an atomic clock to the center of the earth, interpolating between the surface and the center shows that a clock at the bottom of a mineshaft would run faster than one on the earth's surface. Find the mistake in this argument.

3 Somewhere in outer space, suppose there is an astronomical body that is a sphere consisting of solid lead. Assume the Newtonian expression $\Phi = -GM/r$ for the potential in the space outside the object. Make an order of magnitude estimate of the diameter it must have if the gravitational time dilation at its surface is to be a factor of 2 relative to time as measured far away. (Under these conditions of strong gravitational fields, special relativity is only a crude approximation, and that's why we won't get more than an order of magnitude estimate out of this.) What is the gravitational field at its surface? If I have a week's vacation from work, and I spend it lounging on the beach on the lead planet, do I experience two weeks of relaxation, or half a week?


Chapter 6

Waves

This chapter and the preceding one have good, solid physical titles. Inertia. Waves. But underlying the physical content is a thread of mathematics designed to teach you a language for describing spacetime. Without this language, the complications of relativity rapidly build up and become unmanageable. In section 5.2.5, we saw that there are physically compelling reasons for switching back and forth between different coordinate systems, that is, different ways of attaching names to the events that make up spacetime. A toddler in a bilingual family gets a payoff for switching back and forth between asking Mama in Spanish for "dulces" and alerting Daddy in English that Barbie needs to be rescued from falling off the couch. She may bounce back and forth between the two languages in a single sentence, a habit that linguists call "code switching." In relativity, we need to build fluency in a language that lets us talk about actual phenomena without getting hung up on the naming system.

6.1 Frequency

6.1.1 Is time's flow constant?

The simplest naming task is in 0+1 dimensions: a time-line like the ones in history class. If we name the points in time A, B, C, . . . or 1, 2, 3, . . . , or Bush, Clinton, Bush, . . . , how do we know that we're marking off equal time intervals? Does it make sense to imagine that time itself might speed up and slow down, or even start and stop? The second law of thermodynamics encourages us to think that it could. If the universe had existed for an infinite time, then entropy would have maximized itself a long time ago, presumably, and we would not exist, because the heat death of the universe would already have happened.

6.1.2 Clock-comparison experiments

But what would it actually mean empirically for time's rate of flow to vary? Unless we can tie this to the results of experiments, it's nothing but cut-rate metaphysics. In a Hollywood movie where time could stop, the scriptwriters would show us the stopping through the eyes of an observer, who would stroll past frozen waterfalls and snapshotted bullets in mid-flight. The observer's brain is a kind of clock, and so is the waterfall. We're left with what's known as a clock-comparison experiment. To date, all clock-comparison experiments have given null results. Matsakis et al. (footnote 1) found that pulsars match the rates of atomic clocks with a drift of less than about $10^{-6}$ seconds over 10 years. Guena et al. (footnote 2) observed that atomic clocks using atoms of different isotopes drifted relative to one another by no more than about $10^{-16}$ per year. Any non-null result would have caused serious problems for relativity. One of the expectations in an Aristotelian description of spacetime is that the motion of material objects on earth would naturally slow down relative to celestial phenomena such as the rising and setting of the sun. The relativistic interpretation of time dilation as an effect on time itself (p. 25) also depends crucially on the null results of these experiments.

6.1.3 Birdtracks notation

As a simple example of clock comparison, let's imagine using the hourly emergence of a mechanical bird from a pendulum-driven cuckoo clock to measure the rate at which the earth spins. There is clearly a kind of symmetry here, since we could equally well take our planet's rotation as the standard and use it to measure the frequency with which the bird pops out of the door. Schematically, let's represent this measurement process with the following notation, which is part of a system called birdtracks (footnote 3):

$c \rightarrow e = 24$

Here $c$ represents the cuckoo clock and $e$ the rotation of the earth. Although the measurement relationship is nearly symmetric, the arrow has a direction, because, for example, the measurement of the earth's rotational period in terms of the clock's frequency is $c \rightarrow e = (1\ \text{hr}^{-1})(24\ \text{hr}) = 24$, but the clock's period in terms of the earth's frequency is $e \rightarrow c = 1/24$. We say that the relationship is not symmetric but "dual." By the way, it doesn't matter how we arrange these diagrams on the page. The notations $c \rightarrow e$ and $e \leftarrow c$ mean exactly the same thing, and expressions like this can even be drawn vertically.

Suppose that $e$ is a displacement along some one-dimensional line of time, and we want to think of it as the thing being measured. Then we expect that the measurement process represented by $c$ produces a real-valued result and is a linear function of $e$. Since the relationship between $c$ and $e$ is dual, we expect that $c$ also belongs to some vector space. For example, vector spaces allow multiplication by a scalar: we could double the frequency of the cuckoo clock by making the bird come out on the half hour as well as on the hour, forming $2c$. Measurement should be a linear function of both vectors; we say it is "bilinear."

1 Astronomy and Astrophysics 326 (1997) 924, adsabs.harvard.edu/full/1997A&26A...326..924M

2 arxiv.org/abs/1205.4235

3 The system used in this book follows the one defined by Cvitanović, which was based closely on a graphical notation due to Penrose. For a more complete exposition, see the Wikipedia article "Penrose graphical notation" and Cvitanović's online book at birdtracks.eu.
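The bilinearity of the cuckoo-clock measurement can be checked with a few lines of arithmetic. The following is a minimal sketch in which a one-dimensional covector and vector are each stored as a single number; the function name `measure` and the variable names are hypothetical, chosen only for this illustration.

```python
from fractions import Fraction


def measure(covector, vector):
    """Join the pipes: a 1-D covector (units hr^-1) acting on a 1-D vector (hr)."""
    return covector * vector


c = Fraction(1)     # cuckoo clock frequency: one emergence per hour
e = Fraction(24)    # earth's rotation: a displacement of 24 hours

count = measure(c, e)               # cuckoos counted per rotation

# Bilinearity: the result is linear in the covector and in the vector.
doubled_clock = measure(2 * c, e)   # bird also comes out on the half hour
two_rotations = measure(c, e + e)   # displacement of two rotations

# The dual measurement swaps the roles: the earth's frequency (1/24 per hour)
# acting on the clock's period (1 hour).
dual_count = measure(1 / e, 1 / c)
```

Exact fractions are used so that the dual result comes out as exactly 1/24 rather than a rounded decimal.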

6.1.4 Duality

The two vectors $c$ and $e$ have different units, $\text{hr}^{-1}$ and hr, and inhabit two different one-dimensional vector spaces. The "flavor" of the vector is represented by whether the arrow goes into it or comes out. Just as we used notation like $\vec{v}$ in freshman physics to tell vectors apart from scalars, we can employ arrows in the birdtracks notation as part of the notation for the vector, so that instead of writing the two vectors as $c$ and $e$, we can notate them as $c \rightarrow$ and $\rightarrow e$. Performing a measurement is like plumbing. We join the two "pipes" in $c \rightarrow\ \rightarrow e$ and simplify to $c \rightarrow e$.

A confusing and nonstandardized jungle of notation and terminology has grown up around these concepts. For now, let's refer to a vector such as $\rightarrow e$, with the arrow coming in, simply as a "vector," and the type like $c \rightarrow$ as a "covector." In the one-dimensional example of the earth and the cuckoo clock, the roles played by the two things were completely equivalent, and it didn't matter which one we expressed as a vector and which as a covector.

6.2 Phase

6.2.1 Phase is a scalar

In section 1.3.1, p. 21, we defined a (Lorentz) invariant as a quantity that was unchanged under rotations and Lorentz boosts. A measurement such as $c \rightarrow e = 24$ is an invariant because it is simply a count. We've counted the number of periods. In fact, a count is not just invariant under rotations and boosts but under any well-behaved change of coordinates, the technical condition being that each coordinate in each set is a differentiable function of each coordinate in the other set. Such a change of coordinates is called a diffeomorphism. For example, a change of units $(t, x, y, z) \rightarrow (kt, kx, ky, kz)$ is all right as long as $k$ is nonzero. A quantity that stays the same under any diffeomorphism is called a scalar. Since a Lorentz transformation is a diffeomorphism, every scalar is a Lorentz invariant. Not every Lorentz invariant is a scalar. For example, spacetime volume is Lorentz invariant (p. 45), but under the change of units described above, spacetime volume doesn't stay the same.

In birdtracks notation, any expression that has no external arrows at all represents a scalar. Since the expression $c \rightarrow e = 24$ has no external arrows, only internal ones, it represents a scalar. Another way of describing this measurement is as a phase. If we prefer to measure the phase in units of cycles, then we have $\phi = c \rightarrow e$. If we like radians, we can use $\phi = 2\pi\, c \rightarrow e$.

a / 1. A displacement vector. 2. A covector. 3. Measurement is reduced to counting. The observer, represented by the displacement vector, counts 24 wavefronts.

b / Constant-temperature curves for January in North America, at intervals of 4 °C. The temperature gradient at a given point is a covector.

6.2.2 Scaling

A convenient way of summarizing all of our categories of variables is by their behavior when we convert units, i.e., when we rescale our space. If we switch our time unit from hours to minutes, the number of apples in a bowl is unchanged, the earth's period of rotation gets 60 times bigger, and the frequency of the cuckoo clock changes by a factor of 1/60. In other words, a quantity $u$ under rescaling of coordinates by a factor $\alpha$ becomes $\alpha^p u$, where the exponents $-1$, $0$, and $+1$ correspond to covectors, scalars, and vectors, respectively. We can therefore see that these distinctions are of interest even in one dimension, contrary to what one would have expected from the freshman-physics concept of a vector as something transforming in a certain way under rotations.
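The hours-to-minutes conversion described above can be written out as a short check. This is only a sketch: the helper `rescale` and the variable names are hypothetical, and the rescaling factor is taken to be 60 as in the text.

```python
from fractions import Fraction


def rescale(value, alpha, p):
    """Return alpha**p * value, with p = -1 (covector), 0 (scalar), +1 (vector)."""
    return Fraction(alpha) ** p * value


ALPHA = 60  # switching the time unit from hours to minutes

period_hr = Fraction(24)     # vector: earth's rotation period in hours
freq_per_hr = Fraction(1)    # covector: cuckoo clock frequency per hour
count = freq_per_hr * period_hr

period_min = rescale(period_hr, ALPHA, +1)      # period expressed in minutes
freq_per_min = rescale(freq_per_hr, ALPHA, -1)  # frequency per minute
count_rescaled = rescale(count, ALPHA, 0)       # scalars do not change
```

The product of the rescaled covector and vector equals the original count, which is the sense in which the opposite exponents keep the scalar fixed.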

In section 1.3.1 (p. 21), we defined an invariant as a quantity that did not change under rotations or Lorentz boosts, i.e., one that was independent of the frame of reference. For a scalar we have the even more restrictive condition that it must not change if we change our units of measurement. For example, proper time is an invariant, and so is area in 1+1-dimensional spacetime, but neither is a scalar.

6.3 The frequency-wavenumber covector

Generalizing from 0+1 dimensions to 3+1, we could have an observer moving inertially along velocity vector $o$, while counting the phase $\phi$ (in radians) of a plane wave (perhaps a water wave or an electromagnetic wave) that is washing over her. Since $\phi$ is just a count, it's clearly a scalar. That means that we have some function that takes as its input a vector $o$ and gives as an output the scalar $\phi$. This function has all the right characteristics to be described as a measurement $\omega \rightarrow o$ of $o$ with some covector $\omega \rightarrow$, and in a constructive style of mathematics this is a good way of defining a covector: it's a linear function from the space of vectors to the real numbers. We call $\omega \rightarrow$ the frequency-wavelength covector, or just the frequency covector for short. If $o$ represents one second as measured on the clock of this observer, then $\omega \rightarrow o$ is the frequency measured by this observer in units of radians per second. If the same observer considers $s$ to be a vector of simultaneity with a length of one meter, then $\omega \rightarrow s$ is the observer's measurement of the wavenumber $k$, defined as $2\pi$ divided by the wavelength.

6.3.1 Visualization

In more than one dimension, there are natural ways of visualizing the different vector spaces inhabited by vectors and covectors. A vector is an arrow. A covector can be visualized as a set of parallel, evenly spaced lines on a topographic map, a/2, with an arrowhead to show which way is uphill. The act of measurement consists of counting how many of these lines are crossed by a certain vector, a/3.

c / The surfer moves directly to the right with velocity vector u. The wave also propagates to the right.

6.3.2 The gradient

Given a scalar field, its gradient at any given point is a covector. The frequency covector is the gradient of the phase. In birdtracks notation, we indicate this by writing it with an outward-pointing arrow, $(\nabla\phi) \rightarrow$. Because gradients occur so frequently, birdtracks notation has a special shorthand for them, which is simply a circle:

[...]tion of a gradient, have to be defined in a special way when the coordinates are not Minkowski.

Cosmological observers Example 1

Time is relative, so what do people mean when they say that the universe is 13.8 billion years old? If a hypothetical observer had been around since shortly after the big bang, the time elapsed on that observer's clock would depend on the observer's world-line. Two such observers, who had had different world-lines, could have differing clock readings.

Modern cosmologists aren't naive about time dilation. They have in mind a cosmologically preferred world-line for their observer. One way of constructing this world-line is as follows. Over time, the temperature $T$ of the universe has decreased. (We define this temperature locally, but we average over large enough regions so that local variations don't matter.) The negative gradient of this temperature, $-\nabla T$, is a covector that points in a preferred direction in spacetime, and a preferred world-line for an observer is one whose velocity vector $v$ is always parallel to $-\nabla T$, in the sense that it maximizes $(-\nabla T) \rightarrow v$, subject to the usual constraint $v^2 = 1$.

6.3.3 Phase and group velocity

Phase velocity

A wavefront is a line or surface of constant phase. In a snapshot of a wave at one moment of time, the direction of propagation of the wave is across the wavefronts. The visual situation is different in a spacetime diagram. In 1+1 dimensions, figure c/1, suppose that the lines represent the crest of the water waves. The surfer is on top of a crest, riding along with it. His velocity vector $u$ is in the spacetime direction that lies on top of the wavefront, not across it. Clearly both his motion and the propagation of the wave are to the right, not to the left as we might imagine based on experience with snapshots of waves.


d / Points on the graph satisfy the dispersion relation $C = 0$ for water waves. At a given point on the graph, the covector $(\nabla C)$ tells us the group velocity.

In 2+1 dimensions, c/2, the surfer's velocity is visualized as an arrow lying within a plane of constant phase. Given the wave's phase information, there is more than one possible arrow of this kind. We could try to resolve the ambiguity by requiring that the arrow's projection into the x−y plane be perpendicular to the intersection of the wavefronts with that plane, but (with the exception of the case where the wave travels at c, example 4, p. 116) this prescription gives results that change depending on our frame of reference, and the changes are not describable by a Lorentz transformation of the velocity vector. This shows that in the general case, the phase information of the wave, encoded in the frequency covector $\omega \rightarrow$, does not describe the direction of the wave's propagation through space. At most it tells us the wave's phase velocity, $\omega/k$, which is not really a velocity. All of these are symptoms of the fact that a velocity is supposed to be a vector, but $\omega$ is a covector. The phase velocity lacks physical interest, because it is not the velocity at which any "stuff" moves.

Group velocity

There is a dispersion relation between a wave's frequency and wavenumber. For example, surface waves in deep water obey the constraint $C = 0$, where $C = \omega^4 - \alpha^2 k^2$ (figure d) and $\alpha$ is a constant with units of acceleration, relating to the acceleration of gravity. (Since the water is infinitely deep, there is no other scale that could enter into the constraint.)

When a wave is modulated, it can transport energy and momentum and transmit information, i.e., act as an agent of cause and effect between events. How fast does it go? If a certain bump on the envelope with which the wave is modulated visits spacetime events P and Q, then whatever frequency and wavelength the wave has near the bump are observed to be the same at P and Q. In general, $k$ and $\omega$ are constant along the spacetime displacement of any point on the envelope, so the spacetime displacement $r$ from P to Q must satisfy the condition $(\nabla C) \rightarrow r = 0$. The set of solutions to this equation is the world-line of the bump, and the inverse slope of this world-line is called the group velocity. In our example of water waves, the group velocity is $\alpha/2\omega$, which is half the phase velocity. (Since $\omega^2 = \alpha k$ on the dispersion curve, $\alpha/2\omega = \omega/2k$.)
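The factor of one half for deep-water waves can be checked numerically. The sketch below assumes the dispersion relation $\omega^2 = \alpha k$ (the positive-frequency branch of $C = 0$), takes the group velocity to be $d\omega/dk$, and estimates the derivative by a central difference. The value of $\alpha$, the sample wavenumber, and the function names are arbitrary illustrative choices.

```python
import math

ALPHA = 9.8  # assumed value of the constant alpha, in m/s^2


def omega(k, alpha=ALPHA):
    """Positive-frequency branch of the dispersion relation omega^2 = alpha*k."""
    return math.sqrt(alpha * k)


def phase_velocity(k):
    return omega(k) / k


def group_velocity(k, dk=1e-6):
    """Central-difference estimate of d(omega)/dk."""
    return (omega(k + dk) - omega(k - dk)) / (2.0 * dk)


k_sample = 2.0  # rad/m, arbitrary
v_phase = phase_velocity(k_sample)
v_group = group_velocity(k_sample)
```

A finite-difference derivative is used here so that the check does not presuppose the analytic result it is meant to confirm.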

6.4 Duality

6.4.1 Duality in 3+1 dimensions

In our original 0+1-dimensional example of the cuckoo clock and the earth, we had duality: the measurements $c \rightarrow e = 24$ and $e \rightarrow c = 1/24$ really provided the same information, and it didn't matter whether we made our scalar out of covector $c \rightarrow$ and vector $\rightarrow e$ or covector $e \rightarrow$ and vector $\rightarrow c$. All these quantities were simply clock rates, which could be described either by their frequencies (covectors) or their periods (vectors).

To generalize this to 3+1 dimensions, we need to use the metric, a piece of machinery that we have never had to employ since the beginning of the chapter. Given a vector $\rightarrow r$, suppose we knew how to produce its covector version $r \rightarrow$. Then we could hook up the plumbing to form $r \rightarrow r$, which is just a number. What number could it be? The only reasonable possibility is the squared magnitude of $r$, which we calculate using the metric as $r^2 = g(r, r)$. Since we can think of covectors as functions that take vectors to real numbers, clearly $r \rightarrow$ should be the function $f$ defined by $f(x) = g(r, x)$.

Finding the dual of a given vector Example 2

Given the vector $\rightarrow v = (3, 4)$ in 1+1-dimensional Minkowski coordinates, find the covector $v \rightarrow$, i.e., its dual.

Our goal is to write out an explicit expression for the covector in component form,

$v \rightarrow = (a, b)$.

To define these components, we have to have some basis in mind, consisting of one timelike observer-vector $o$ and one spacelike vector of simultaneity $s$. Since we're doing this in Minkowski coordinates (section 1.2, p. 20), let's notate these as $\hat{t}$ and $\hat{x}$, where the hats indicate that these are unit vectors in the sense that $\hat{t}^2 = 1$ and $\hat{x}^2 = -1$. Writing $v \rightarrow$ in terms of $a$ and $b$ means that we're identifying $v \rightarrow$ with the function $f$ defined by $f(x) = g(v, x)$. Therefore

$f(\hat{t}) = a$ and $f(\hat{x}) = b$

or

$g(v, \hat{t}) = 3 = a$ and $g(v, \hat{x}) = -4 = b$.

The result of the formidable, fancy-looking calculation in example 2 was simply to take the vector

$(3, 4)$

and flip the sign of its spacelike component to give its dual, the covector

$(3, -4)$.

Looking back at why this happened, it was because we were using Minkowski coordinates, and in Minkowski coordinates the form of the metric is $g(p, q) = (+1)p_t q_t + (-1)p_x q_x + \ldots$. Therefore, we can always find duals in this way, provided that (1) we're using Minkowski coordinates, and (2) the signature of the metric is, as assumed throughout this book, $+ - - -$, not $- + + +$.
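The calculation in example 2 can be reproduced in a few lines. This is a minimal sketch assuming Minkowski coordinates with signature $+ - - -$; the function names `g` and `dual` and the tuple representation of vectors are choices made for this illustration.

```python
def g(p, q):
    """Minkowski inner product with signature + - - - (timelike component first)."""
    return p[0] * q[0] - sum(a * b for a, b in zip(p[1:], q[1:]))


def dual(components):
    """Flip the sign of every spacelike component, leaving the timelike one alone."""
    return (components[0],) + tuple(-x for x in components[1:])


v = (3, 4)
t_hat, x_hat = (1, 0), (0, 1)

# Components of the covector defined as f(x) = g(v, x), evaluated on the basis.
a, b = g(v, t_hat), g(v, x_hat)
```

Evaluating the covector on the basis vectors and flipping the sign of the spacelike part are two routes to the same pair of components.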


Going both ways Example 3

Assume Minkowski coordinates at signature $+ - - -$. Given the vector

$\rightarrow e = (8, 7)$

and the covector

$f \rightarrow = (1, 2)$,

find $e \rightarrow$ and $\rightarrow f$.

By the rule established above, we can find $e \rightarrow$ simply by flipping the sign of the 7,

$e \rightarrow = (8, -7)$.

To find $\rightarrow f$, we need to ask what vector $(a, b)$, if we flipped the sign of $b$, would give us $(a, -b) = (1, 2)$. Obviously this is

$\rightarrow f = (1, -2)$.

In other words, flipping the sign of the spacelike part of a vector is also the recipe for changing covectors into vectors.

Example 3 shows that in Minkowski coordinates, the operation of changing a covector to the corresponding vector is the same as that of changing a vector to its covector. Thus, the dual of a dual is the same thing you started with. In this respect, duality is similar to arithmetic operations such as $x \rightarrow -x$ and $x \rightarrow 1/x$. That is, the duality is a self-inverse operation: it undoes itself, like getting two sex-change operations in a row, or switching political parties twice in a country that has a two-party system. Birdtracks notation makes this self-inverse property look obvious, since duality means switching an inward arrow to an outward one or vice versa, and clearly doing two such switches gives back the original notation. This property was established in example 3 by using Minkowski coordinates and assuming the signature to be $+ - - -$, but it holds without these assumptions (problem 1, p. 127).

In the general case where the coordinates may not be Minkowski, the above analysis plays out as follows. Covectors and vectors are represented by row and column vectors. The metric can be specified by a matrix $g$ so that the inner product of column vectors $p$ and $q$ is given by $p^T g q$, where $T$ represents the transpose. Rerunning the same logic with these additional complications, we find that the dual of a vector $q$ is $(gq)^T$, while the dual of a covector $\omega$ is $(\omega g^{-1})^T$, where $g^{-1}$ is the inverse of the matrix $g$.
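The matrix recipe can be tried out in 1+1 dimensions. The sketch below uses plain tuples for 2x2 matrices and exact fractions; the function names and the non-Minkowski metric `g_skewed` are hypothetical and introduced only to show that lowering and then raising an index gives back the original vector.

```python
from fractions import Fraction as F


def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("metric must be nondegenerate")
    return ((d / det, -b / det), (-c / det, a / det))


def lower(g, q):
    """Dual of a vector q: the row (g q)^T, returned as a tuple."""
    return tuple(sum(g[i][j] * q[j] for j in range(2)) for i in range(2))


def raise_index(g, w):
    """Dual of a covector w: the column (w g^-1)^T, returned as a tuple."""
    g_inv = inverse_2x2(g)
    return tuple(sum(w[i] * g_inv[i][j] for i in range(2)) for j in range(2))


g_mink = ((F(1), F(0)), (F(0), F(-1)))    # Minkowski coordinates, signature + -
g_skewed = ((F(1), F(1)), (F(1), F(-1)))  # hypothetical non-Minkowski metric

q = (F(3), F(4))
q_lowered_mink = lower(g_mink, q)
q_lowered_skewed = lower(g_skewed, q)
round_trip = raise_index(g_skewed, q_lowered_skewed)
```

With `g_mink` the general recipe reduces to the sign flip found in example 2, while with `g_skewed` the components mix, but the round trip still returns `q`.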

Velocity vector of a light wave, given its phase Example 4

We saw on p. 114 that in general, the information about the phase of a wave encoded in $\omega \rightarrow$ does not determine its direction of propagation. The exception is a wave, such as a light wave, that propagates at c. Let a world-line of propagation of the wave lie along the vector $v$. In the case of a wave propagating at c, we have $v^2 = 0$ (so that $v$ can't have the usual normalization for a velocity vector), and the dispersion relation is simply $\omega^2 = 0$. Since the phase stays constant along a world-line of propagation, $\omega \rightarrow v = 0$. We therefore find that $v$ and $\omega$ are two nonzero, lightlike vectors that are orthogonal to each other. But as shown in problem 10 on p. 35, this implies that the two vectors are parallel. Thus if we're given the covector $\omega \rightarrow$, we just have to compute its dual $\rightarrow \omega$ to find the direction of propagation.

6.4.2 Change of basis

We saw in section 6.2.2 that in 0+1 dimensions, vectors and covectors have opposite scaling properties under a change of units, so that switching our base unit from hours to minutes caused our frequency covectors to go down by a factor of 60, while our time vectors went up by the same factor. This behavior was necessary in order to keep scalar products the same. In more than one dimension, the notion of changing units is replaced with that of a change of basis. In linear algebra, row vectors and column vectors act like covectors and vectors; they are dual to each other. Let $B$ be a matrix made of column vectors, representing a basis for the column-vector space. Then a change of basis for a row vector $r$ is expressed as $r' = rB$, while the same change of basis for a column vector $c$ is $c' = B^{-1}c$. We then find that the scalar product is unaffected by the change of basis, since $r'c' = rBB^{-1}c = rc$.

In the important special case where $B$ is a Lorentz transformation, this means that covectors transform under the inverse transformation, which can be found by flipping the sign of $v$. This fact will be important in the following section.
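The invariance of the scalar product under a change of basis can be verified numerically for the case where $B$ is a boost in 1+1 dimensions. In this sketch the boost matrix is written in units with $c = 1$, its inverse is obtained by flipping the sign of $v$ as stated above, and the helper names and sample components are hypothetical.

```python
import math


def boost(v):
    """1+1-dimensional Lorentz boost matrix for velocity v, in units with c = 1."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return ((gamma, -v * gamma), (-v * gamma, gamma))


def row_times_matrix(r, m):
    """Row vector times matrix: r' = r B."""
    return tuple(sum(r[i] * m[i][j] for i in range(2)) for j in range(2))


def matrix_times_column(m, c):
    """Matrix times column vector: c' = B^-1 c."""
    return tuple(sum(m[i][j] * c[j] for j in range(2)) for i in range(2))


def scalar_product(r, c):
    return sum(a * b for a, b in zip(r, c))


v = 0.6
B = boost(v)
B_inv = boost(-v)    # inverse boost: flip the sign of v

r = (1.0, 1.0)       # sample covector (row vector)
c = (3.0, 4.0)       # sample vector (column vector)

r_new = row_times_matrix(r, B)
c_new = matrix_times_column(B_inv, c)
```

The individual components of `r_new` and `c_new` differ from the originals, but their scalar product does not.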

6.5 The Doppler shift and aberration

6.5.1 Doppler shift

As an example, we generalize our previous discussion of the

Doppler shift of light to 3 + 1 dimensions.

For clarity, let's first show how the 1+1-dimensional case works in our new notation. For a wave traveling to the left, we have $\omega \rightarrow = (\omega, \omega)$ (not $(\omega, -\omega)$; see figure c/1). We now want to transform into the frame of an observer moving to the right with velocity $v$ relative to the original frame. Because $\omega$ is a covector, we do this using the inverse Lorentz transformation. An ordinary Lorentz transformation would take a lightlike vector $(\omega, \omega)$ to $(\omega/D, \omega/D)$ (see section 3.2). The inverse Lorentz transformation gives $(D\omega, D\omega)$. The frequency has been shifted upward by the factor $D$, as established previously.

In 3+1 dimensions, a spatial plane is determined by the light's direction of propagation and the relative velocity of the source and observer, so this case reduces without loss of generality to 2+1 dimensions. The frequency four-vector must be lightlike, so its most general possible form is $(\omega, \omega\cos\theta, \omega\sin\theta)$, where $\theta$ is interpreted as the angle between the direction of propagation and the relative velocity. In 2+1 dimensions, a Lorentz boost along the x axis looks like this:

$t' = \gamma t - v\gamma x$

$x' = -v\gamma t + \gamma x$

$y' = y$

The inverse transformation is found by flipping the sign of $v$. Putting our frequency vector through an inverse Lorentz boost, we find

$\omega' = \gamma\omega(1 + v\cos\theta)$.

For $\theta = 0$ the Doppler factor reduces to $\gamma(1 + v) = D$, recovering the 1+1-dimensional result. For $\theta = 90^\circ$, we have $\omega' = \gamma\omega$, which is interpreted as a pure time dilation effect when the source's motion is transverse to the line of sight.

To see the power of the mathematical tools we've developed in this chapter, you may wish to look at sections 6 and 7 of Einstein's 1905 paper on special relativity, where a lengthy derivation is needed in order to arrive at the same result.
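The Doppler formula can be checked by pushing the frequency components through the inverse boost directly. The sketch below works in units with $c = 1$ and takes $D = \sqrt{(1+v)/(1-v)}$ for the longitudinal Doppler factor; the function names are hypothetical.

```python
import math


def gamma(v):
    return 1.0 / math.sqrt(1.0 - v * v)


def inverse_boost(components, v):
    """Inverse of the boost t' = gamma*t - v*gamma*x, x' = -v*gamma*t + gamma*x."""
    t, x, y = components
    g = gamma(v)
    return (g * t + v * g * x, v * g * t + g * x, y)


def doppler_shifted_frequency(omega, theta, v):
    """Timelike component of the transformed lightlike frequency covector."""
    w = (omega, omega * math.cos(theta), omega * math.sin(theta))
    return inverse_boost(w, v)[0]


v = 0.6
D = math.sqrt((1.0 + v) / (1.0 - v))
longitudinal = doppler_shifted_frequency(1.0, 0.0, v)
transverse = doppler_shifted_frequency(1.0, math.pi / 2.0, v)
```

The transformed components remain lightlike, which is a useful sanity check on the sign conventions.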

6.5.2 Aberration

Imagine that rain is falling vertically while you drive in a convertible with the top down. To you, the raindrops appear to be moving at some nonzero angle relative to vertical. This is referred to as aberration: a world-line's direction changes depending on one's frame of reference. In the street's frame of reference, the angle between the rain's three-velocity and the car's is $\theta = 90^\circ$, but in the car's frame $\theta' \neq 90^\circ$, because the car's speed $v$ is comparable to the velocity $u$ of the raindrops. To a snail crawling along the sidewalk at a much lower $v$, the effect would be small. Using the small-angle approximation $\tan\theta \approx \theta$, we find that for small $v$, the difference $\Delta\theta$ between the two angles would be approximately $v/u$, in units of radians.

Compared to a ray of light, we're all like snails. For example, the earth's orbital speed is about $v \approx 10^{-4}$ in units where the speed of light $u = 1$, so we expect a maximum effect of about $10^{-4}$ radians, or 20″ [...] a high-quality mount, being used at high magnification.

This estimate of astronomical aberration of light is roughly right, but we don't expect it to be exact, both because of the small-angle approximation and because we calculated it using a Galilean picture of spacetime. Let's calculate the exact result. As shown in example 4 on p. 116, the direction of propagation of a light wave lies along the vector that is the dual to its frequency covector. Let's call this direction of propagation $u$. Reusing the expression for $\omega$ defined in section 6.5.1, and arbitrarily fixing $u$'s timelike component to be 1, we have

$u = (1, \cos\theta, \sin\theta)$.

When this vector undergoes a boost $v$ along the x axis it becomes

$u' = (\gamma(1 + v\cos\theta),\ \gamma(\cos\theta + v),\ \sin\theta)$.

The original angle $\theta = \tan^{-1}(u_y/u_x)$ has been transformed to $\theta' = \tan^{-1}(u'_y/u'_x)$, the result being

$\tan\theta' = \dfrac{\sin\theta}{\gamma(\cos\theta + v)}$.
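The aberration formula is easy to explore numerically. The sketch below evaluates $\theta'$ from the expression above in units with $c = 1$, using `atan2` so that angles beyond $90^\circ$ land in the correct quadrant; the function name `aberrated_angle` is a hypothetical choice for this illustration.

```python
import math


def gamma(v):
    return 1.0 / math.sqrt(1.0 - v * v)


def aberrated_angle(theta, v):
    """Angle theta' satisfying tan(theta') = sin(theta) / (gamma*(cos(theta) + v))."""
    return math.atan2(math.sin(theta), gamma(v) * (math.cos(theta) + v))


# Small-velocity check: for theta = 90 degrees the shift is close to v radians.
v_earth = 1e-4
shift = math.pi / 2.0 - aberrated_angle(math.pi / 2.0, v_earth)

# Ultrarelativistic case: a ray from far behind is swung toward the forward direction.
theta_prime_deg = math.degrees(aberrated_angle(math.radians(162.0), 0.99))
```

For the earth's orbital speed the exact shift agrees with the small-angle estimate $v/u$ to many digits, so the Galilean estimate is adequate there, while at $v = 0.99$ the change in angle is enormous.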

A test of special relativity Example 5

An assumption underlying this treatment of aberration was that the speed of light was $u = c$, regardless of the velocity of the source. Not all prerelativistic theories had this property, and one would expect that in such a theory, aberration would not be in accord with the relativistic result. In particular, suppose that we believed in Galilean spacetime, so that when a distant galaxy, receding from us at some speed $w$, emitted a ray of light toward us, the light's velocity in our frame was $u = c - w$. That is, we imagine a theory in which emitting a ray of light is like shooting a bullet from a gun. Since aberration effects go approximately like $v/u$, we would expect that the reduced $u$ would lead to more aberration compared to the prediction of relativity.

To test theories of this type, Heckmann (footnote 4) used a 24-inch reflector at Hamburg to take high-magnification photographic plates of a star field in Ursa Major containing 11 stars inside the Milky Way and 5 distant galaxies. Measurements of Doppler shifts showed that the galaxies were receding from us at velocities of about $w = 0.05c$, whereas stars within the Milky Way move relative to us at speeds that are negligible in comparison. If, contrary to the relativistic prediction, this led to a 5% decrease in $u$, then we would expect about a 5% increase in aberration for the galaxies compared to the stars.

Over the course of a year, the earth's orbit carries it toward and away from Ursa Major, so that in the earth's frame of reference, the stars and galaxies have varying velocities relative to us, and the 20″ [...] different for the galaxies and the stars, then they ought to shift their apparent positions relative to one another. The shift ought to be on the order of 5% of 20″ [...] from the observations showed that these relative positions did not appear to vary at all over the course of a year, with the average relative shift being 0.00 ± 0.06 [...]tion is consistent with zero, as predicted by special relativity.

4 Annales d'Astrophysique 23 (1960) 410, adsabs.harvard.edu/abs/1960AnAp...23..410H.

e / 1. The cube's rest frame. 2. The observer's frame. 3. The observer's view of the cube, severely distorted by aberration.

The view of an ultrarelativistic observer Example 6

Figure e shows a visualization for an observer flying through a cube at v = 0.99. In e/1, the cube is shown in its own rest frame, where it has sides of unit length, and the observer, having already passed through, lies one unit to the right of the cube's center. The observer is facing to the right, away from the cube. The dashed line is a ray of light that travels from point P to the observer, and in this frame it appears as though the ray, arriving from an angle of 162°, would not make it into the observer's eye.

But in the observer's frame, e/2, the ray is at an angle of 47°, so it actually does fall within her field of view. The cube is length-contracted by a factor of about 7. The ray was emitted earlier, when the cube was out in front of the observer, at the position shown by the dashed outline.

The image seen by the observer is shown in e/3. The circular outline defining the field of view represents an angle of 50°. Note that the relativistic length contraction is not at all what an observer sees optically. The optical observation is influenced by length contraction, but also by aberration and by the time it takes for light to propagate to the observer. The time of propagation is different for different parts of the cube, so in the observer's frame, e/2, rays from different points had to be emitted when the cube was at different points in its motion, if those rays were to reach

120 Chapter 6 Waves

the eye.

A group at Australian National University has produced animations of similar scenes, which can be found online by searching for "optical effects of special relativity."

It's fun to imagine the view of an observer aboard an ultrarelativistic starship. For v sufficiently close to 1, any angle less than 180° transforms to a small angle in the observer's frame. Light from the surrounding stars (even those in extreme backward directions!) is gathered into a small, bright patch of light that appears to come from straight ahead. Some visible light would be shifted into the extreme ultraviolet and infrared, while some infrared and ultraviolet light would become visible.
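The numbers quoted in this example can be checked with a few lines of Python. This is only a sketch: it assumes the standard relativistic aberration formula, cos θ′ = (cos θ + v)/(1 + v cos θ), with angles measured from the observer's direction of motion and units in which c = 1. The function names are ours and do not come from the text.

```python
import math

def gamma(v):
    """Lorentz factor for speed v, in units where c = 1."""
    return 1.0 / math.sqrt(1.0 - v * v)

def aberrate(theta_deg, v):
    """Angle (degrees) at which the observer sees a ray that arrives from
    theta_deg in the cube's frame. Assumes the standard aberration formula."""
    c = math.cos(math.radians(theta_deg))
    c_obs = (c + v) / (1.0 + v * c)
    return math.degrees(math.acos(c_obs))

v = 0.99
print(gamma(v))            # length-contraction factor
print(aberrate(162.0, v))  # angle of the ray from P in the observer's frame
```

With v = 0.99 the contraction factor comes out close to 7.09, and the 162° ray lands at roughly 48°, inside a 50° field of view. The small difference from the quoted 47° is what one would expect if the 162° input had been rounded.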

6.6 Some related mathematical tools

This chapter has centered on the physics of waves, but along the way we've found it helpful to build up some mathematical ideas such as covectors, which have applications in a much broader physical context. In this section we'll develop some related notation and geometrical applications.

6.6.1 Abstract index notation

Expressions in birdtracks notation such as

C s

can be awkward to type on a computer, which is why we've already been occasionally resorting to more linear notations such as (C) s. For more complicated birdtracks, the diagrams sometimes look like complicated electrical schematics, and the problem of generating them on a keyboard gets more acute. There is in fact a systematic way of representing any such expression using only ordinary subscripts and superscripts. This is called abstract index notation, and was introduced by Roger Penrose at around the same time he invented birdtracks. For practical reasons, it was the abstract index notation that caught on.

The idea is as follows. Suppose we wanted to describe a complicated birdtrack verbally, so that someone else could draw it. The diagram would be made up of various smaller parts, a typical one looking something like the scalar product u v. The verbal instructions might be: "We have an object u with an arrow coming out of it. For reference, let's label this arrow as a. Now remember that other object v I had you draw before? There was an arrow coming into that one, which we also labeled a. Now connect up the two arrows labeled a."

Shortening this lengthy description to its bare minimum, Penrose renders it like this: $u_a v^a$. Subscripts depict arrows coming out of a symbol (think of water flowing from a tank out through a pipe below). Superscripts indicate arrows going in. When the same letter is used as both a superscript and a subscript, the two arrows are to be piped together.

Abstract index notation evolved out of an earlier one called the Einstein summation convention, in which superscripts and subscripts referred to specific coordinates. For example, we might take 0 to be the time coordinate, 1 to be x, and so on. A symbol like $u^\mu$ would then stand for a single component of u, which would be its x component if $\mu$ took on the value 1. Repeated indices were summed over.

The advantage of the birdtrack and abstract index notations is that they are coordinate-independent, so that an equation written in them is valid regardless of the choice of coordinates. The Einstein and abstract-index notations look very similar, so for example if we want to take a general result expressed in abstract-index notation and apply it in a specific coordinate system, there is essentially no translation required. In fact, the two notations look so similar that we need an explicit way to tell which is which, so that we can tell whether or not a particular result is coordinate-independent. We therefore use the convention that Latin indices represent abstract indices, whereas Greek ones imply a specific coordinate system and can take on numerical values, e.g., $\mu = 1$.

The following are some examples of equivalent equations written side by side in birdtracks and abstract index notations.

Observer o's displacement in spacetime is a vector:

    birdtracks: →o        abstract index: $o^a$

In Einstein notation, it's awkward to express a vector as a whole, because in a notation like $o^\mu$, the index $\mu$ is supposed to take on a particular value. If we used $o^\mu$ to stand for the vector as a whole, it would be an abuse of notation. In abstract index notation, however, the a is simply a name we gave to a pipe coming into vector o; the fact that we didn't need to refer to the name in order to connect it to some other pipe is irrelevant.

A wave's frequency is a covector:

    birdtracks: ω→        abstract index: $\omega_a$

An observer experiences proper time τ:

    birdtracks: o→o = τ²        abstract index: $o^a o_a = \tau^2$

There are no external arrows in the birdtracks version, and in the abstract-index version all lower indices (pipes coming out) have been paired with upper indices (pipes coming in); this indicates that the proper time is a scalar, and therefore independent of any choice of coordinate system. In Einstein notation, this becomes $o^\mu o_\mu = \tau^2$, with an implied sum over the repeated index $\mu$. The $\mu$ refers to a particular coordinate system, so in the Einstein notation it is no longer obvious that the equation holds regardless of our choice of coordinates.

A world-line along which a wave propagates lies along a vector that is orthogonal to the wave's frequency covector:

    birdtracks: ω→u = 0        abstract index: $\omega_a u^a = 0$

The frequency covector is the gradient of the phase:

    birdtracks: ω→ = ∂→φ        abstract index: $\omega_a = \partial_a \phi$

The following rules of grammar apply to both abstract-index notation and Einstein notation:

1. Repeated indices occur in pairs, with one up and one down and the two factors multiplying each other.

2. Disregarding indices that are paired as in rule 1, all other indices must appear uniformly in all terms and on both sides of an equation. "Appear uniformly" means that an index can't be missing and can't be a superscript in some places but a subscript in others.

3. For reasons to be explained in section 7.4, p. 134, a partial derivative with respect to a coordinate, such as $\partial/\partial x^k$, is treated as if the index were a subscript, and conversely $\partial/\partial x_k$ is considered to have a superscripted k.

In abstract-index notation, rule 1 follows because the indices are simply labels describing how, in birdtracks notation, the pipes should be hooked up. Violating rule 1, as in an expression like $v^a v^a$, produces a quantity that does not actually behave as a scalar. An example of a violation of rule 2 is $v^a = \omega_a$. This doesn't make sense, for the same reason that it doesn't make sense to equate a row vector to a column vector in linear algebra. Even if an equation like this did hold in one frame of reference, it would fail in another, since the left-hand and right-hand sides transform differently under a boost.
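A quick numerical check illustrates why rule 1 matters. The sketch below works in 1+1 dimensions with c = 1 and assumes the metric diag(+1, −1); the vector u and the boost velocity 0.6 are arbitrary illustrative choices, and the helper names are ours.

```python
import math

def boost(vec, v):
    """Lorentz-boost the (t, x) components of a vector (c = 1)."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    t, x = vec
    return (g * (t - v * x), g * (x - v * t))

def lower(vec):
    """Lower the index using the metric diag(+1, -1)."""
    t, x = vec
    return (t, -x)

def contract(first, second):
    """Sum over the products of corresponding components."""
    return sum(a * b for a, b in zip(first, second))

u = (2.0, 1.0)
u_boosted = boost(u, 0.6)

# Grammatical: one index up, one down. The result is the same in both frames.
grammatical = (contract(u, lower(u)), contract(u_boosted, lower(u_boosted)))
# Ungrammatical: both indices up. The result changes under the boost.
ungrammatical = (contract(u, u), contract(u_boosted, u_boosted))
print(grammatical, ungrammatical)
```

The properly paired contraction gives 3 in both frames, while the version with two upper indices gives 5 in one frame and 3.125 in the other, so it cannot be a scalar.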

f / Using parallelism to define 1-volume.

In section 6.4.1 we discussed the notion of finding the covector that was dual to a given vector, and the vector dual to a given covector. Because the distinction between vectors and covectors is represented in index notation by placing the index on the top or on the bottom, relativists refer to this kind of thing as "raising and lowering indices." In general, this type of manipulation is called "index gymnastics." Here's what raising and lowering indices looks like.

Converting a vector to its covector form:

\[ u_a = g_{ab} u^b \]

Changing a covector to the corresponding vector:

\[ u^a = g^{ab} u_b \]

The symbol $g^{ab}$ refers to the inverse of the matrix $g_{ab}$.
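To see that lowering and then raising an index gets us back where we started, here is a minimal sketch in two dimensions. The metric used is a made-up, non-diagonal example chosen only so that the inverse is not trivial; it is not one of the metrics discussed in the text, and the helper names are ours.

```python
def matvec(m, v):
    """Apply a matrix to a list of components (sum over the second index)."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def inverse_2x2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

g_lower = [[1.0, 0.5], [0.5, 1.0]]  # g_ab: hypothetical illustrative metric
g_upper = inverse_2x2(g_lower)      # g^ab: the matrix inverse of g_ab

u_up = [3.0, -2.0]                  # components u^a
u_down = matvec(g_lower, u_up)      # u_a = g_ab u^b
u_back = matvec(g_upper, u_down)    # u^a = g^ab u_b
print(u_down, u_back)
```

Because $g^{ab}$ is defined as the inverse of $g_{ab}$, the round trip returns the original components.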

6.6.2 Volume

Desirable properties

In 3 + 1 dimensions, we have a natural way of defining four-dimensional volume, which is to pick a frame of reference and let the element of volume be dt dx dy dz in the Minkowski coordinates of that frame. Although this definition of 4-volume is stated in terms of certain coordinates, it turns out to be Lorentz-invariant (section 2.5, p. 45). It also has the following desirable properties:

V1. Any two m-volumes can be compared in terms of their ratio.

V2. For any m nonzero vectors, the m-volume of the parallelepiped they span is nonzero if and only if the vectors are linearly independent (that is, if none of them can be expressed in terms of the others using scalar multiplication and vector addition).

We would also like to have convenient methods for working with three-volume, two-volume (area), and one-volume (length). But the m-volumes for m < 4 give us headaches if we try to define them so that they obey both V1 and V2. For example, the obvious way to define length (m = 1) is to use the metric, but then lightlike vectors would violate V2.

Affine measure

If we're willing to abandon V1, then the following approach succeeds. Consider the m = 1 case. We ignore the metric completely and exploit the fact that in special relativity, spacetime is flat (postulate P2, p. 41), so that parallelism works the same way as in Euclidean geometry. Let $\ell$ be a line, and suppose we want to define a number system on this line that measures how far apart events are. Depending on the type of line, this could be a measurement of time, of spatial distance, or a mixture of the two. First we arbitrarily single out two distinct points on $\ell$ and label them 0 and 1, as in figure f. Next, pick some auxiliary point $q_0$ not lying on $\ell$. Construct $q_0 q_1$ parallel to 01 and $1q_1$ parallel to $0q_0$, forming the parallelogram shown in the figure. Continuing in this way, we have a scaffolding of parallelograms adjacent to the line, determining an infinite lattice of points 1, 2, 3, . . . on the line, which represent the positive integers. Fractions can be defined in a similar way. For example, $\frac{1}{2}$ is defined as the point such that when the initial lattice segment from 0 to $\frac{1}{2}$ is extended by the same construction, the next point on the lattice is 1.

The continuously varying variable constructed in this way is called an affine parameter. The time measured by a free-falling clock is an example of an affine parameter, as is the distance measured by the tick marks on a free-falling ruler. An affine parameter can only be defined along a straight world-line, not an arbitrary curve. The affine measurement of 1-volume violates V1, because it only allows us to compare distances that lie on $\ell$ or parallel to it. On the other hand, it has the advantage over metric measurement that it allows us to measure lengths along lightlike lines.

Figure g shows how to define an affine measure of 2-volume, and a similar method works for 3-volume.

g / The area of the viola can be determined by counting the parallelograms formed by the lattice. The area can be determined to any desired precision, by dividing the parallelograms into fractional parts that are as small as necessary.

Linearity

Suppose that a parallelogram is formed with vectors a and b as two of its sides. If we double a, then the area doubles as well,

area(2a, b) = 2 area(a, b) .

In general, if we scale either of the vectors by a factor c, the area scales by the same factor, provided that we set some rule for handling signs, an issue that we'll ignore for the time being. Something similar happens when we add two vectors, e.g.,

area(a, b + c) = area(a, b) + area(a, c) ,

again passing over issues with signs. We refer to these properties as linearity of the affine 2-volume. Any sensible measure of m-volume should have similar linearity properties.
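These linearity properties are easy to verify numerically. The sketch below measures area in units of the parallelogram spanned by the two coordinate basis vectors, and it adopts the signed area (a 2×2 determinant) as one possible rule for handling signs; that choice is our assumption, since the text leaves the sign convention open.

```python
def area(a, b):
    """Signed area of the parallelogram spanned by 2D vectors a and b,
    in units of the cell spanned by the coordinate basis vectors."""
    return a[0] * b[1] - a[1] * b[0]

def scale(k, v):
    return (k * v[0], k * v[1])

def add(v, w):
    return (v[0] + w[0], v[1] + w[1])

a, b, c = (3.0, 1.0), (1.0, 2.0), (-2.0, 4.0)

print(area(scale(2.0, a), b), 2.0 * area(a, b))    # scaling one side
print(area(a, add(b, c)), area(a, b) + area(a, c)) # adding vectors
```

Each printed pair agrees, which is the statement of linearity in each argument separately.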

The 3-volume covector

In the special case of m = 3, linearity leads to an especially simple characterization of the volume. Let a 3-volume be defined by the parallelepiped spanned by vectors a, b, and c. If we threw in a fourth vector d, we would have a 4-volume, and 4-volume is a scalar. This 4-volume would depend in a linear way on all four vectors, and in particular it would depend linearly on d. But this means we have a scalar function that depends linearly on a vector, and such a function is exactly what we mean by a covector. We can therefore define a volume covector S according to

\[ S_i d^i = \text{4-volume}(a, b, c, d) . \]

The volume covector collects the information about the volume of the 3-parallelepiped, encapsulating it in a convenient form with known transformation properties. In particular, the statement and proof of Gauss's theorem in 3 + 1 dimensions are greatly simplified by the use of this tool (p. 164). The 3-volume covector, unlike the affine 3-volume, is defined in an absolute sense rather than in relation to some parallelepiped arbitrarily chosen as a standard. Both the covector and the affine volume fail to satisfy the ratio-comparison property V1 on p. 124, since we can't compare volumes unless they lie in parallel 3-planes.

h / Interpretation of the 3-volume covector.

We've been visualizing covectors in n dimensions as stacks of (n − 1)-dimensional planes (figure a/2, p. 112; figure c/2, p. 113). The volume covector should therefore be visualized as a stack of 3-planes in a four-dimensional space. Since most of us can't visualize things very well in four dimensions, figure h omits one of the dimensions, so that the 3-surfaces appear as two-dimensional planes. The small hand h/1 has a certain 3-volume, and the covector that measures it is represented by the stack of 3-planes parallel to it, h/2. The bigger hand h/3 has twice the 3-volume, and its covector is represented by a stack of planes with half the spacing.
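The definition of the volume covector can be made concrete in Minkowski coordinates. The sketch below assumes that the 4-volume of a parallelepiped is the determinant of the matrix whose rows are the four vectors' components (with the sign that this ordering happens to give), and it finds the components $S_i$ by feeding in each basis vector as the fourth vector. The particular vectors are arbitrary examples and the function names are ours.

```python
def det(m):
    """Determinant by Laplace expansion along the first row (fine for 4x4)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def four_volume(a, b, c, d):
    """Signed 4-volume spanned by a, b, c, d, in units of the coordinate cell."""
    return det([a, b, c, d])

def volume_covector(a, b, c):
    """Components S_i, found by using each basis vector as the fourth vector."""
    basis = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
    return [four_volume(a, b, c, e) for e in basis]

a, b, c = [0, 1, 0, 0], [0, 0, 2, 0], [0, 1, 0, 3]
d = [2, 5, -1, 3]

S = volume_covector(a, b, c)
lhs = sum(S_i * d_i for S_i, d_i in zip(S, d))  # S_i d^i
print(S, lhs, four_volume(a, b, c, d))
```

Because the determinant is linear in its last row, contracting S with any d reproduces the 4-volume. Doubling one of a, b, or c doubles every component of S, which corresponds to halving the spacing of the planes in figure h.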


Problems

1 In section 6.4.1, I proved that duality is a self-inverse operation, invoking Minkowski coordinates and assuming the signature to be + − − −. Show that these assumptions were not necessary.


Chapter 7

Coordinates

In your previous study of physics, you've seen many examples where one coordinate system makes life easier than another. For a block being pushed up an inclined plane, the most convenient choice may be to tilt the x and y axes. To find the moment of inertia of a disk we use cylindrical coordinates. The same is true in relativity. Minkowski coordinates are not always the most convenient. In chapter 6 we learned to classify physical quantities as covectors, scalars, and vectors, and we learned rules for how these three types of quantities transformed in two special changes of coordinates:

1. When we rescale all coordinates by a factor α, vectors, scalars, and covectors scale by $\alpha^p$, where p = +1, 0, and −1, respectively.

2. Under a boost, the three cases require respectively the Lorentz transformation, no transformation, and the inverse Lorentz transformation.

In this chapter we'll learn how to generalize this to any change of coordinates,¹ and also how to find the form of the metric expressed in non-Minkowski coordinates.

7.1 An example: accelerated coordinates

Let's start with a concrete example that has some physical interest. In section 5.2, p. 104, we saw that we could have "gravity without gravity": an experiment carried out in a uniform gravitational field can be interpreted as an experiment in flat spacetime (so that special relativity applies), but with the measurements expressed in the accelerated frame of the earth's surface. In the Pound-Rebka experiment, all of the results could have been expressed in an inertial (free-falling) frame of reference, using Minkowski coordinates, but this would have been extremely inconvenient, because, for example, they didn't want to drop their expensive atomic clocks and take the readings before the clocks hit the floor and were destroyed.

Since this is "gravity without gravity," we don't actually need a planet cluttering up the picture. Imagine a universe consisting of limitless, empty, flat spacetime. Describe it initially using Minkowski coordinates (t, x, y, z). Now suppose we want to find a new set of coordinates (T, X, Y, Z) that correspond to the frame of reference of an observer aboard a spaceship accelerating in the x direction with a constant acceleration.

a / The transformation between Minkowski coordinates (t, x) and the accelerated coordinates (T, X).

¹ We do require the change of coordinates to be smooth in the sense defined on p. 111, i.e., it should be a diffeomorphism.

The Galilean answer would be that we simply take $X = x - \frac{1}{2}at^2$. But this is unsatisfactory from a relativistic point of view for several reasons. At t = c/a the observer would be moving at the speed of light, but relativity doesn't allow frames of reference moving at c (section 3.4, p. 55). At t > c/a, the observer's motion would be faster than c, but this is impossible in 3+1 dimensions (section 3.8, p. 64).

These problems are related to the fact that the observer's proper acceleration, i.e., the reading on an accelerometer aboard the ship, isn't constant if the ship's position is given by $\frac{1}{2}at^2$. We saw in example 4 on p. 57 that constant proper acceleration is described by $x = \frac{1}{a}\cosh a\tau$, $t = \frac{1}{a}\sinh a\tau$, where $\tau$ is the proper time. For this type of motion, the velocity only approaches c asymptotically. This suggests the following for the relationship between the two sets of coordinates:

\begin{align*}
t &= X \sinh T \\
x &= X \cosh T \\
y &= Y \\
z &= Z
\end{align*}

For example, if the ship follows a world-line $(T, X) = (\tau, 1)$, then its motion in the unaccelerated frame is $(t, x) = (\sinh\tau, \cosh\tau)$, which is of the desired form with a = 1.

The (T, X, Y, Z) coordinates, called Rindler coordinates, have many, but not all, of the properties we would like for an accelerated frame. Ideally, we'd like to have all of the following: (1) the proper acceleration is constant for any world-line of constant (X, Y, Z); (2) the proper acceleration is the same for all such world-lines, i.e., the fictitious gravitational field is uniform; and (3) the description of the accelerated frame is just a change of coordinates, i.e., we're just talking about the flat spacetime of special relativity, with events renamed. It turns out that we can pick two out of three of these, but it's not possible to satisfy all three at the same time. Rindler coordinates satisfy conditions 1 and 3, but not 2. This is because the proper acceleration of a world-line of constant (X, Y, Z) can easily be shown to be 1/X, which depends on X. Thus we don't speak of Rindler coordinates as "the" coordinates of an accelerated observer.
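The claim that the proper acceleration is 1/X can be checked numerically. The sketch below follows a world-line of constant X through the transformation t = X sinh T, x = X cosh T, uses finite differences to estimate the velocity vector and its rate of change with respect to proper time, and measures the result with the Minkowski metric. The step size h and the sample point T = 0.3 are arbitrary choices, and the function names are ours.

```python
import math

def event(T, X):
    """Minkowski coordinates (t, x) of the event with Rindler coordinates (T, X)."""
    return (X * math.sinh(T), X * math.cosh(T))

def proper_acceleration(X, T=0.3, h=1e-4):
    """Finite-difference estimate of the proper acceleration at constant X."""
    def velocity_and_dtau(Tc):
        t0, x0 = event(Tc - h, X)
        t1, x1 = event(Tc + h, X)
        dt, dx = t1 - t0, x1 - x0
        dtau = math.sqrt(dt * dt - dx * dx)   # proper time along the segment
        return (dt / dtau, dx / dtau), dtau
    (ut0, ux0), _ = velocity_and_dtau(T - h)
    (ut1, ux1), _ = velocity_and_dtau(T + h)
    _, dtau = velocity_and_dtau(T)            # proper time between the two samples
    at, ax = (ut1 - ut0) / dtau, (ux1 - ux0) / dtau
    return math.sqrt(ax * ax - at * at)       # the acceleration vector is spacelike

for X in (0.5, 1.0, 2.0):
    print(X, proper_acceleration(X))
```

The estimates come out very close to 2, 1, and 0.5, so the fictitious gravitational field is stronger at small X, which is why condition 2 fails.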

7.2 Transformation of vectors

Now suppose we want to transform a vector whose components are expressed in the (T, X) coordinates into components expressed in (t, x). Our most basic example of a vector is a displacement $(\Delta T, \Delta X)$, and if we make this an infinitesimal (dT, dX) then we don't need to worry about the fact that the chart in figure a has curves on it: close up, curves look like straight lines.² If we think of the coordinate t as a function of two variables, t = t(T, X), then t is changing for two different reasons: its first input T changes, and also its second input X. If t were only a function of one variable t(T), then the change in t would be given simply by the chain rule, $dt = (dt/dT)\,dT$. Since it actually has two such reasons to change, we add the two changes:

\[ dt = \frac{\partial t}{\partial T}\,dT + \frac{\partial t}{\partial X}\,dX \]

The derivatives are partial derivatives, and these derivatives exist because, as we will always assume, the change of coordinates is smooth. An exactly analogous expression applies for dx.

\[ dx = \frac{\partial x}{\partial T}\,dT + \frac{\partial x}{\partial X}\,dX \]

Before we carry out the details of this calculation, let's stop and note that the results so far are completely general. Since we have so far made no use of the actual equations for this particular change of coordinates, these expressions would apply to any such transformation, including the special cases we've encountered so far, such as Lorentz transformations and scaling. (For example, if we'd been scaling by a factor α, then the partial derivatives ∂t/∂T and ∂x/∂X would simply have equaled α, and the mixed ones would have been zero.) Furthermore, our definition of a vector is that a vector is anything that transforms like a vector. Since we've established that the rules above apply to a displacement vector, we conclude that they would also apply to any other vector, say an energy-momentum vector.

Returning to this specific example, application of the facts d sinh u/du = cosh u and d cosh u/du = sinh u tells us that the vector (dT, dX) is transformed to:

\[ (dt, dx) = (X\cosh T\,dT + \sinh T\,dX,\; X\sinh T\,dT + \cosh T\,dX) \]

As an example of how this applies universally to any type of vector, suppose that the observer aboard a spaceship with world-line $(T, X) = (\tau, 1)$ has a favorite paperweight with mass m. According to measurements carried out aboard her ship, its energy-momentum vector is

\[ (p^T, p^X) = (m, 0) . \]

² Here we make use of the fact that the change of coordinates was smooth, i.e., a diffeomorphism. Otherwise the curves could have kinks in them that would still look like kinks under any magnification.

In the unaccelerated coordinates, this becomes

\begin{align*}
(p^t, p^x) &= (X\cosh T\,p^T + \sinh T\,p^X,\; X\sinh T\,p^T + \cosh T\,p^X) \\
           &= (mX\cosh T,\; mX\sinh T) \\
           &= (m\cosh\tau,\; m\sinh\tau) .
\end{align*}

Since the functions cosh and sinh behave like $e^x$ for large x, we find that after the astronaut has spent a reasonable amount of proper time accelerating, the paperweight's mass-energy and momentum will have grown to the point where it's an awesome weapon of mass destruction, capable of obliterating an entire galaxy.
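The paperweight calculation can be reproduced in a few lines. The sketch below applies the transformation of (dT, dX) derived above to the components $(p^T, p^X) = (m, 0)$ along the world-line X = 1, and also checks that $(p^t)^2 - (p^x)^2$ stays equal to $m^2$. The mass m = 1 and the sample proper times are arbitrary, and the function name is ours.

```python
import math

def rindler_to_minkowski_vector(vT, vX, T, X):
    """Transform vector components from (T, X) to (t, x) at the point (T, X)."""
    vt = X * math.cosh(T) * vT + math.sinh(T) * vX
    vx = X * math.sinh(T) * vT + math.cosh(T) * vX
    return vt, vx

m = 1.0
results = []
for tau in (0.0, 1.0, 5.0):
    pt, px = rindler_to_minkowski_vector(m, 0.0, T=tau, X=1.0)
    results.append((tau, pt, px, pt * pt - px * px))
    print(results[-1])
```

The energy and momentum components grow roughly exponentially with τ, while the invariant combination remains $m^2$, as it must for a vector.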

7.3 Transformation of the metric

Continuing with the example of accelerated coordinates, let's find what happens to the metric when we change from Minkowski coordinates. Minkowski coordinates are essentially defined so that the metric has the familiar form with coefficients +1 and −1. In relativity, one often presents the metric by showing its result when applied to an infinitesimal displacement (dt, dx):

\[ ds^2 = dt^2 - dx^2 \]

Here ds would represent proper time, in the case where the displacement was timelike. Since we've already determined that

\[ dt = X\cosh T\,dT + \sinh T\,dX \quad\text{and}\quad dx = X\sinh T\,dT + \cosh T\,dX , \]

we can simply substitute into the expression for $ds^2$ in order to find the form of the metric in (T, X) coordinates. Employing the identity $\cosh^2 - \sinh^2 = 1$, we find

\[ ds^2 = X^2\,dT^2 - dX^2 . \]

The varying value of the $dT^2$ coefficient is in fact exactly the kind of gravitational time dilation effect whose existence we predicted in section 5.2.5, p. 106 based on the equivalence principle. The form of the metric inferred there was

\[ ds^2 \approx (1 + 2\Delta\Phi)\,dT^2 - dX^2 , \]

where $\Delta\Phi$ is the difference in gravitational potential relative to some reference height. One of the approximations employed was the assumption that the range of heights $\Delta X$ was small, but subject to that approximation, the two results should agree. For convenience, let's consider observers in the region $X \approx 1$, where the acceleration is approximately 1. Then $\Delta\Phi = \Phi(1 + \Delta X) - \Phi(1) \approx (\text{acceleration})(\text{height}) \approx \Delta X$, so the time coefficient in the second form of the metric is $\approx 1 + 2\Delta\Phi \approx 1 + 2\Delta X$. But to within the desired level of approximation, this is the same as $X^2 = (1 + \Delta X)^2 \approx 1 + 2\Delta X$.
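The substitution that leads to the Rindler form of the metric can also be done numerically, as a check on the algebra. The sketch below builds the matrix of partial derivatives of (t, x) with respect to (T, X), and uses it to re-express the Minkowski metric diag(+1, −1) in the new coordinates. Writing the computation as a double sum over the partial derivatives is our way of organizing the substitution, not a formula quoted from the text, and the sample point is arbitrary.

```python
import math

def jacobian(T, X):
    """Partial derivatives; rows are (t, x), columns are (T, X)."""
    return [[X * math.cosh(T), math.sinh(T)],
            [X * math.sinh(T), math.cosh(T)]]

def rindler_metric(T, X):
    """Coefficients of dT^2, dT dX, dX^2 terms, arranged as a 2x2 matrix."""
    J = jacobian(T, X)
    eta = [[1.0, 0.0], [0.0, -1.0]]   # Minkowski metric, ds^2 = dt^2 - dx^2
    return [[sum(J[k][i] * eta[k][l] * J[l][j]
                 for k in range(2) for l in range(2))
             for j in range(2)] for i in range(2)]

g = rindler_metric(T=0.7, X=1.5)
print(g)   # expect approximately [[X**2, 0], [0, -1]]

dX = 0.01
print((1 + dX) ** 2, 1 + 2 * dX)   # the two time coefficients for small dX
```

At (T, X) = (0.7, 1.5) the result is diag(2.25, −1) to within rounding, independent of T, and for ΔX = 0.01 the exact and approximate time coefficients differ only by $\Delta X^2$.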

The procedure employed above works in general. To transform the metric from coordinates (t, x, y, z) to new coordinates (t′, x′, y′, z′), we obtain the unprimed coordinates in terms of the primed ones, take differentials on both sides, and eliminate t, . . . , dt, . . . in favor of t′, . . . , dt′, . . . in the expression for $ds^2$. We'll see in section 9.2.4, p. 154, that this is an example of a more general transformation law for tensors, mathematical objects that generalize vectors and covectors in the same way that matrices generalize row and column vectors.

b / Example 1.

A map projection Example 1

Because the earth's surface is curved, it is not possible to represent it on a flat map without distortion. Let φ be the longitude, θ the angle measured down from the north pole (known as the colatitude), both measured in radians, and let a be the earth's radius. Then by the definition of radian measure, an infinitesimal north-south displacement by dθ is a distance a dθ. A point at a given colatitude θ lies at a distance a sin θ from the axis, so for an infinitesimal east-west distance we have a sin θ dφ. For convenience, let the units be chosen such that a = 1. Then the metric, with signature ++, is

\[ ds^2 = d\theta^2 + \sin^2\theta\,d\phi^2 . \]

One of the many possible ways of forming a flat map is the Lambert cylindrical projection,

\begin{align*}
x &= \phi \\
y &= \cos\theta ,
\end{align*}

shown in figure b. If we see a distance on the map and want to know how far it actually is on the earth's surface, we need to transform the metric into the (x, y) coordinates. The inverse coordinate transformation is

\begin{align*}
\phi &= x \\
\theta &= \cos^{-1} y .
\end{align*}


Taking differentials on both sides, we get

\begin{align*}
d\phi &= dx \\
d\theta &= -\frac{dy}{\sqrt{1 - y^2}} .
\end{align*}

We take the metric and eliminate θ, φ, dθ, and dφ, finding

\[ ds^2 = (1 - y^2)\,dx^2 + \frac{1}{1 - y^2}\,dy^2 . \]

In figure b, the polka-dot pattern is made of figures that are actually circles, all of equal size, on the earth's surface. Since they are fairly small, we can approximate y as having a single value for each circle, which means that they are represented on the flat map as approximate ellipses with their east-west dimensions having been stretched by $(1 - y^2)^{-1/2}$ and their north-south ones shrunk by $(1 - y^2)^{1/2}$. Since these two factors are reciprocals of one another, the area of each ellipse is the same as the area of the original circle, and therefore the same as those of all the other ellipses. They are a visual representation of the metric, and they demonstrate the equal-area property of this projection.
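For readers who want to see the elimination step spelled out, the following lines fill it in. The only ingredients are y = cos θ, the differentials found above, and the original metric; the final line, which expresses the equal-area property as the product of the two metric coefficients, is our own restatement rather than something taken from the text.

```latex
\begin{align*}
\sin^2\theta &= 1 - \cos^2\theta = 1 - y^2 \\
d\theta^2 &= \left(-\frac{dy}{\sqrt{1-y^2}}\right)^2 = \frac{dy^2}{1-y^2} \\
ds^2 &= d\theta^2 + \sin^2\theta\,d\phi^2
      = \frac{dy^2}{1-y^2} + (1-y^2)\,dx^2 \\
(1-y^2)\cdot\frac{1}{1-y^2} &= 1
  \quad\text{(so a map region of area } dx\,dy \text{ has true area } dx\,dy\text{)}
\end{align*}
```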

7.4 Summary of transformation laws

Having worked through one example in detail, let's progress from the specific to the general. In the Einstein concrete index notation, let coordinates $(x^0, x^1, x^2, x^3)$ be transformed to new coordinates $(x'^0, x'^1, x'^2, x'^3)$. Then vectors transform according to the rule

\[ v'^\mu = v^\kappa \frac{\partial x'^\mu}{\partial x^\kappa} , \qquad (1) \]

where the Einstein summation convention implies a sum over the repeated index κ. By the same reasoning as in section 6.4.2, p. 117, the transformation for a covector ω is

\[ \omega'_\mu = \omega_\kappa \frac{\partial x^\kappa}{\partial x'^\mu} . \qquad (2) \]

Note the inversion of the partial derivative in one equation compared to the other. Because these equations describe a change from one coordinate system to another, they clearly depend on the coordinate system, so we use Greek indices rather than the Latin ones that would indicate a coordinate-independent abstract index equation.

The letter μ in these equations always appears as an index referring to the new coordinates, κ to the old ones. For this reason, we can get away with dropping the primes and writing, e.g., $v^\mu = v^\kappa\,\partial x^\mu/\partial x^\kappa$ rather than $v'^\mu = v^\kappa\,\partial x'^\mu/\partial x^\kappa$. This becomes especially natural if we start working in a specific coordinate system where the coordinates have names. For example, if we transform from coordinates (t, x, y, z) to (a, b, c, d), then it is clear that $v^t$ is expressed in one system and $v^c$ in the other.

In equation (2), μ appears as a subscript on the left side of the equation, but as a superscript on the right. This would appear to violate the grammatical rules given on p. 123, but the interpretation here is that in expressions of the form $\partial/\partial x^i$ and $\partial/\partial x_i$, the superscripts and subscripts should be understood as being turned upside-down. Similarly, (1) appears to have the implied sum over κ written ungrammatically, with both κ's appearing as superscripts. Normally we only have implied sums in which the index appears once as a superscript and once as a subscript. With our new rule for interpreting indices on the bottom of derivatives, the implied sum is seen to be written correctly. This rule is similar to the one for analyzing the units of derivatives written in Leibniz notation, with, e.g., $d^2x/dt^2$ having units of meters per second squared. That is, the flipping of the indices like this is required for consistency so that everything will work out properly when we change our units of measurement, causing all our vector components to be rescaled.

The identity transformation Example 2

In the case of the identity transformation $x'^\mu = x^\mu$, equation (1) clearly gives $v' = v$, since all the mixed partial derivatives $\partial x'^\mu/\partial x^\kappa$ with $\mu \ne \kappa$ are zero, and all the others are unity.

In equation (2), it is tempting to write

\[ \frac{\partial x^\kappa}{\partial x'^\mu} = \frac{1}{\partial x'^\mu/\partial x^\kappa} \qquad \text{(wrong!)} , \]

but this would give infinite results for the mixed terms! Only in the case of functions of a single variable is it possible to flip derivatives in this way; it doesn't work for partial derivatives. To evaluate these partial derivatives, we have to invert the transformation (which in this example is trivial to accomplish) and then take the partial derivatives.

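Polar coordinates Example 3

None of the techniques discussed here are particular to relativity. For example, consider the transformation from polar coordinates (r, θ) in the plane to Cartesian coordinates

\begin{align*}
x &= r\cos\theta \\
y &= r\sin\theta .
\end{align*}

A bug sits on the edge of a phonograph turntable, at (r, θ) = (1, 0). The turntable rotates clockwise, giving the bug a velocity vector $v^\kappa = (v^r, v^\theta) = (0, -1)$, i.e., the angle θ changes at a rate of 1 radian per second in the negative (clockwise) direction. Let's find the bug's velocity vector in Cartesian coordinates. The transformation law for vectors gives

\[ v^x = v^\kappa \frac{\partial x}{\partial x^\kappa} . \]

Expanding the implied sum over the repeated index κ, we have

\begin{align*}
v^x &= v^r \frac{\partial x}{\partial r} + v^\theta \frac{\partial x}{\partial \theta} \\
    &= (0)\frac{\partial x}{\partial r} + (-1)\frac{\partial x}{\partial \theta} \\
    &= r\sin\theta \\
    &= 0 .
\end{align*}

For the y component,

\begin{align*}
v^y &= v^r \frac{\partial y}{\partial r} + v^\theta \frac{\partial y}{\partial \theta} \\
    &= (0)\frac{\partial y}{\partial r} + (-1)\frac{\partial y}{\partial \theta} \\
    &= -r\cos\theta \\
    &= -1 .
\end{align*}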
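The same calculation can be written as a small function. This sketch assumes the bug's polar velocity components are (0, −1), i.e., clockwise motion at one radian per second, and it hard-codes the four partial derivatives of x = r cos θ and y = r sin θ. The function name is ours.

```python
import math

def polar_to_cartesian_vector(v_r, v_theta, r, theta):
    """Transform vector components from (r, theta) to (x, y) at the point (r, theta)."""
    dx_dr, dx_dtheta = math.cos(theta), -r * math.sin(theta)
    dy_dr, dy_dtheta = math.sin(theta), r * math.cos(theta)
    v_x = v_r * dx_dr + v_theta * dx_dtheta
    v_y = v_r * dy_dr + v_theta * dy_dtheta
    return v_x, v_y

v_x, v_y = polar_to_cartesian_vector(0.0, -1.0, r=1.0, theta=0.0)
print(v_x, v_y)
```

At (r, θ) = (1, 0) the bug is on the positive x axis, and a clockwise rotation carries it in the negative y direction, so we get zero for the x component and −1 for the y component.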

7.5 Inertia and rates of change

Suppose that we describe a flying bullet in polar coordinates. We neglect the vertical dimension, so the bullet's motion is linear. If the bullet has a displacement of $(\Delta r_1, \Delta\theta_1)$ in a short time interval $\Delta t$, then clearly at a later point in its motion, during an equal interval, it will have a displacement $(\Delta r_2, \Delta\theta_2)$ with two different numbers inside the parentheses. This isn't because its velocity or momentum really changed. It's because the coordinate system is curvilinear. There are three ways to get around this:

1. Use only Minkowski coordinates.

2. Instead of characterizing inertial motion as motion with constant velocity components, we can instead characterize it as motion that maximizes the proper time (section 2.4.2, p. 44).

3. Define a correction term to be added when taking the derivative of a vector or covector expressed in non-Minkowski coordinates.

These issues become more acute in general relativity, where curvature of spacetime can make option 1 impossible. Option 3, called the covariant derivative, is discussed in optional section 9.4 on p. 167. If you aren't going to read that section, just keep in mind that in non-Minkowski coordinates, you cannot naively use changes in the components of a vector as a measure of a change in the vector itself.
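The bullet example is easy to try out numerically. In the sketch below, the bullet moves along the straight line x = 1, y = t at constant velocity; these numbers are hypothetical and chosen only for illustration. We compare its polar-coordinate displacements over two equal time intervals.

```python
import math

def polar(x, y):
    """Polar coordinates (r, theta) of the Cartesian point (x, y)."""
    return math.hypot(x, y), math.atan2(y, x)

def displacement(t0, dt):
    """Change in (r, theta) between times t0 and t0 + dt for x = 1, y = t."""
    r0, th0 = polar(1.0, t0)
    r1, th1 = polar(1.0, t0 + dt)
    return r1 - r0, th1 - th0

early = displacement(0.0, 0.1)
late = displacement(2.0, 0.1)
print(early)
print(late)
```

The Cartesian displacement is (0, 0.1) in both intervals, but the polar components differ: early on the motion is almost entirely in θ, and later it is mostly in r. Nothing happened to the bullet; only the coordinate description changed.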


Problem 2.

Problems

1 Example 3 on p. 135 discussed polar coordinates in the Euclidean plane. Use the technique demonstrated in section 7.3 to find the metric in these coordinates.

2 Oblique Cartesian coordinates are like normal Cartesian coordinates in the plane, but their axes are at an angle $\phi \ne \pi/2$ to one another. Show that the metric in these coordinates is

\[ ds^2 = dx^2 + dy^2 + 2\cos\phi\,dx\,dy . \]


Chapter 8

Rotation (optional)

8.1 Rotating frames of reference

8.1.1 No clock synchronization

Panels 1 and 2 of figure a recapitulate the result of example 13 on p. 32. The set of three clocks fixed to the earth in a/1 has been synchronized by Einstein synchronization (example 4, p. 19), i.e., by exchanging flashes of light. The three clocks aboard the moving train, a/2, have been synchronized in the same way, and the events that were simultaneous according to frame 1 are not simultaneous in frame 2. There is a systematic shift in the times, which is represented by the term $t' = \ldots - v\gamma x$ in the Lorentz transformation (eq. (1), p. 30).

a / Clocks can't be synchronized in a rotating frame of reference.

Now suppose we take the diagram of the train and wrap it around, a/3. If we go on and close the loop, making the chain into a circle like a chain necklace, we have a problem. The trend in the clock times can continue until it wraps back around to the beginning, but then there will be a discrepancy.


We conclude that clocks can't be synchronized in a rotating frame of reference. Such a frame does not admit a universal time coordinate because Einstein synchronization isn't transitive: synchronizing clock A with clock B, and B with C, does not imply that A is synchronized with C. This nontransitivity is one way of defining what we mean by rotation. That is, if the operational definition of an inertial frame given in section 5.1, p. 101, shows that our frame is noninertial, and we want to know more about why it's noninertial, testing for this nontransitivity is a way of finding out whether it's because of rotation.

8.1.2 Rotation is locally detectable

The people aboard the circular train know that their attempts at

synchronization fail, so they can tell, without reference to anything

external, that they're going in a circle. (Cf. example 1, p. 102.)

Although this is a book on special, not general, relativity, it's interesting to note the following possibility. Suppose that we verify, by local experiments, that we have a good, nonrotating, inertial frame of reference. It is then imaginable that if we view distant galaxies from this frame, we will see them rotate at some angular frequency $\Omega$ about some axis on the celestial sphere. If this is observed, then we must infer that it is the universe as a whole (not our laboratory!) that is rotating. Such an effect has been searched for, and, for example, an upper limit $\Omega \lesssim 10^{-7}$ radian/year was inferred by Clemence.¹ General-relativistic models of such rotating cosmologies have a preferred vector constituting the direction of the axis about which matter rotates, but there is no global center of rotation. Current upper limits on $\Omega$ are good enough to rule out any significant effect on cosmological expansion due to centrifugal forces.

8.1.3 The Sagnac effect

Although the train scenario is obviously unrealistic, the time shift is far from hypothetical. This type of effect, called the Sagnac effect, was first observed by M. Georges Sagnac in 1913, and it relates to the principle of the ring laser gyroscope (example 2, p. 18), used in passenger jets. (The name is French, and is pronounced "sah-NYAHK.") To find the Sagnac effect quantitatively, we note that in the circular train example (ignoring signs) the relevant term in the Lorentz transformation, $v\gamma x$, would accumulate, after one complete circuit of Einstein synchronization, a discrepancy equal to the circumference of the circle multiplied by $v\gamma$. If the circle's radius is $r$ and the angular velocity $\omega$, we have $\Delta t = 2\pi r^2\omega\gamma$. This can be rewritten in terms of the circle's area $A$ as $\Delta t = 2A\omega\gamma$, or, reinserting factors of $c$ to accommodate SI units, $\Delta t = 2A\omega\gamma/c^2$. The proportionality to the enclosed area is not an accident; the product $v\gamma x$ has the form of the integrand $\mathbf{F}\cdot d\mathbf{s}$ occurring in Stokes' theorem.
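As a rough numerical check of the formula $\Delta t = 2A\omega\gamma/c^2$, the following sketch evaluates it for a circuit around the earth's equator. The radius and sidereal-day values are assumed round numbers, and the function name `sagnac_discrepancy` is only illustrative.

```python
import math

C = 299_792_458.0        # speed of light in m/s
R_EQUATOR = 6.378e6      # assumed equatorial radius of the earth, in m
SIDEREAL_DAY = 86_164.1  # assumed length of the sidereal day, in s

def sagnac_discrepancy(radius, omega):
    """Synchronization discrepancy 2*A*omega*gamma/c^2 for one circuit of a circle."""
    v = omega * radius                           # rim speed
    gamma = 1.0 / math.sqrt(1.0 - (v / C) ** 2)  # indistinguishable from 1 for the earth
    area = math.pi * radius ** 2                 # enclosed area A
    return 2.0 * area * omega * gamma / C ** 2

omega_earth = 2.0 * math.pi / SIDEREAL_DAY       # 2*pi radians per sidereal day
dt_equator = sagnac_discrepancy(R_EQUATOR, omega_earth)
print(f"Sagnac discrepancy for one equatorial circuit: {dt_equator * 1e9:.0f} ns")
```

With these inputs the result comes out at roughly two hundred nanoseconds, which is the scale that matters for clocks carried around the world.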

¹ "Astronomical time," Rev. Mod. Phys. 29 (1957) 2.


Sagnac effect in the Hafele-Keating experiment Example 1

A clock at the equator of the earth rotates at a frequency of $2\pi$ radians per sidereal day, suffering a Sagnac effect of 210 ns per day. The traveling atomic clocks in the Hafele-Keating experiment (p. 15) went around the world in both directions, and were compared with a third set of clocks that stayed in Washington, DC. Since the time required to fly around the earth was also on the order of one day, the differences in the values of $\omega$ for the three sets of clocks were on the same order of magnitude as the $\omega$ of the earth, and we therefore expect cumulative differential Sagnac effects that are also on the order of a hundred nanoseconds. These effects exist only in the rotating frame of the earth, but the things being measured are proper times, and proper time is a scalar, so the experimental results are independent of what frame of reference is used for calculating them. Since the airline pilots provided Hafele and Keating with navigational data referred to the rotating earth, they analyzed their results in the rotating frame, in which there was a Sagnac effect. They could equally well have transformed their data into the frame of the stars, in which case the same result would have been predicted, but it would have been described as arising from kinematic time dilation.

Ring laser gyroscope Example 2

The ring laser gyroscope in the photo in example 2 on p. 18 looks like it has an area on the order of $10^2\ \text{cm}^2$ and uses red light. For use in navigation, one wants to be able to detect a change in course of, say, one degree in an hour, or $\omega \sim 5\times10^{-6}$ radian/s. The result is a time shift $\Delta t \sim 10^{-24}$ s, which for red light is a phase shift of only $\Delta\phi = 4\pi A\omega/c\lambda \sim 3\times10^{-9}$ radian. In the original early twentieth-century experiments, this phase shift would have had to be measured by producing interference between the two beams and measuring the change in intensity resulting from this change in phase. Our estimate of $\Delta\phi$ shows that this is impractical for a portable instrument. In a modern ring laser gyroscope, an active laser medium is inserted in the loop, and the result is that the loop resonates at a frequency that is shifted from the laser's natural frequency by $\Delta f \sim \Delta\phi\,c/L$, where $L$ is the circumference. The result is a frequency shift of a few Hz, which is easily measurable. An alternative technique, used in the fiber optic gyroscope, is to wrap $N$ turns of optical fiber around the circumference, effectively changing $A$ to $NA$.
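The order-of-magnitude estimates in this example can be reproduced with a few lines of arithmetic. This is only a sketch: the wavelength of 650 nm and the square loop with circumference 0.4 m are assumptions chosen to be consistent with "red light" and an area of about $10^2\ \text{cm}^2$, and the number of fiber turns is an arbitrary illustrative value.

```python
import math

C = 299_792_458.0                    # speed of light in m/s
AREA = 1.0e-2                        # 10^2 cm^2 expressed in m^2
OMEGA = math.radians(1.0) / 3600.0   # one degree per hour, in radian/s
WAVELENGTH = 650e-9                  # assumed wavelength of red light, in m
CIRCUMFERENCE = 0.4                  # assumed: square loop 10 cm on a side, in m

dt = 2.0 * AREA * OMEGA / C ** 2                         # Sagnac time shift (gamma ~ 1)
dphi = 4.0 * math.pi * AREA * OMEGA / (C * WAVELENGTH)   # phase shift, radians
df = dphi * C / CIRCUMFERENCE                            # resonant frequency shift, Hz

n_turns = 1000                       # hypothetical number of fiber turns
dphi_fiber = n_turns * dphi          # fiber optic gyroscope: A -> N*A

print(f"omega      = {OMEGA:.2e} radian/s")
print(f"Delta t    = {dt:.2e} s")
print(f"Delta phi  = {dphi:.2e} radian")
print(f"Delta f    = {df:.2f} Hz")
print(f"Delta phi with {n_turns} turns = {dphi_fiber:.2e} radian")
```

The phase shift is far too small to read off directly, while the frequency shift of the active loop lands in the range of a few Hz.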

8.1.4 A rotating coordinate system

The GPS system is a practical example of a case where we naturally want to employ a rotating coordinate system. Hikers and sailors, after all, want to know where they are relative to the earth's rotating surface. Since locations need to be determined to within meters, the timing of signals needs to be done to a precision of something like $(1\ \text{m})/c$, which is a few nanoseconds. This is why the GPS satellites have atomic clocks aboard, and timing to this precision clearly requires that relativistic effects be taken into account. We therefore need not a rotating Newtonian coordinate system but a rotating relativistic one. Let's start with the nonrotating frame, and define coordinates $(t, r, \theta, z)$, with the spatial part $(r, \theta, z)$ being ordinary cylindrical coordinates. For simplicity, we'll neglect the $z$ coordinate in what follows. Extending the result of problem 1 on p. 137 from 2 + 0 dimensions to 2 + 1, we have the metric

$$ds^2 = dt^2 - dr^2 - r^2\,d\theta^2 . \qquad (1)$$

The results of section 8.1.1 show that we do not expect to be able to define a completely satisfactory time coordinate in the rotating frame, so let's start with the minimal change $(t, r, \theta) \to (t, r, \theta')$, where $\theta' = \theta - \omega t$ and $\omega$ is a constant frequency. Substituting $d\theta = d\theta' + \omega\,dt$, we find

$$ds^2 = (1 - \omega^2 r^2)\,dt^2 - dr^2 - r^2\,d\theta'^2 - 2\omega r^2\,d\theta'\,dt . \qquad (2)$$
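The algebra behind equation (2) can be written out step by step. This is only a sketch of the substitution, writing $\theta'$ for the angle in the rotating coordinates and $\omega$ for the constant angular frequency, so that $d\theta = d\theta' + \omega\,dt$, and taking equation (1) as the starting point.

```latex
\begin{align*}
ds^2 &= dt^2 - dr^2 - r^2\,d\theta^2 \\
     &= dt^2 - dr^2 - r^2\,(d\theta' + \omega\,dt)^2 \\
     &= dt^2 - dr^2 - r^2\,d\theta'^2 - 2\omega r^2\,d\theta'\,dt - \omega^2 r^2\,dt^2 \\
     &= (1-\omega^2 r^2)\,dt^2 - dr^2 - r^2\,d\theta'^2 - 2\omega r^2\,d\theta'\,dt .
\end{align*}
```

The cross term in $d\theta'\,dt$ comes entirely from expanding the square, which is why it carries the factor of 2.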

Recognizing $\omega r$ as the velocity of one frame relative to another, and $(1 - \omega^2 r^2)^{-1/2}$ as $\gamma$, we see that we do have a relativistic time dilation effect in the $dt^2$ term. But the $dr^2$ and $d\theta'^2$ terms look the same as in equation (1). Why don't we see any Lorentz contraction of the length scale in the azimuthal direction?

The answer is that coordinates in relativity are arbitrary, and just because we can write down a certain set of coordinates, that doesn't mean they have any special physical interpretation. The coordinates $(t, r, \theta')$ do not correspond to what a rotating observer R would measure with clocks and meter-sticks. If R uses a ruler to measure a short arc along the circumference of the circle $r = r_0$, the distance is a distance being measured between events in spacetime that are simultaneous in the rest frame of the ruler, and these do not occur at the same value of the time coordinate $t$. In the Lorentz transformation, for linear motion, it is the $-v\gamma x$ term applied to the times that fixes this problem and makes $t'$ properly represent simultaneity in the moving frame. In the rotating version, we could try to do something similar by defining a time coordinate $t' = t + f\theta'$ so that the $d\theta'\,dt'$ cross-term would be eliminated. This can be done (the function $f$ that works turns out to be $-\omega r^2/(1 - \omega^2 r^2)$), but the problem is that the $t'$ defined in this way is multiple-valued, in the sense that $(t, r, \theta')$ and $(t, r, \theta' + 2\pi)$ would not produce the same $t'$. This is inevitable, as we've seen in section 8.1.1, so we can't improve on the coordinates $(t, r, \theta')$.

The coordinates $(t, r, \theta')$ are the ones used in the GPS system, and in that context are called Earth-Centered Inertial (ECI) coordinates. (Another name is Born coordinates.) Their time coordinate is not the time measured by a clock in the rotating frame but is simply the time coordinate of the nonrotating frame of reference tied to the earth's center. Conceptually, we can imagine this time coordinate as one that is established by sending out an electromagnetic "tick-tock" signal from the earth's center, with each satellite correcting the phase of the signal based on the propagation time inferred from its own $r$. In reality, this is accomplished by communication with a master control station in Colorado Springs, which communicates with the satellites via relays at Kwajalein, Ascension Island, Diego Garcia, and Cape Canaveral.

8.2 Boosts and rotations

A relative of mine fell in love. She and her boyfriend bought a house in the suburbs and started trying to have a baby. They think they'll get married at some later point. An engineer by training, she says she doesn't want to get hung up on the order of operations. For some mathematical operations, the order doesn't matter: 5 + 7 is the same as 7 + 5.

b / Performing the rotations in one

order gives one result, 3, while re-

versing the order gives a different

result, 5.

8.2.1 Rotations

But figure b shows that the order of operations does matter for rotations. Rotating around the $x$ axis and then $y$ produces a different result than $y$ followed by $x$. We say that rotations are noncommutative. This is why, in Newtonian mechanics, we don't have an angular displacement vector $\Delta\boldsymbol{\theta}$; vectors are supposed to be additive, and vector addition is commutative. For small rotations, however, the discrepancy caused by choosing one order of operations rather than the other becomes small (of order $\theta^2$), so we can define an infinitesimal displacement vector $d\boldsymbol{\theta}$, whose direction is given by the right-hand rule, and an angular velocity $\boldsymbol{\omega} = d\boldsymbol{\theta}/dt$.

As an example of how this works out for small rotations, let's take the vector

(0, 0, 1) (3)

and apply the operations shown in figure b, but with rotations of only $\theta = 0.1$ radians rather than 90 degrees. Rotation by this angle about the $x$ axis is given by the transformation $(x, y, z) \to (x,\ y\cos\theta - z\sin\theta,\ y\sin\theta + z\cos\theta)$, and applying this to the original vector gives this:

(0.00000, -0.09983, 0.99500) (after x) (4)

After a further rotation by the same angle, this time about the $y$ axis, we have

(0.09933, -0.09983, 0.99003) (after x, then y) (5)

Starting over from the original vector (3) and doing the operations in the opposite order gives these results:

(0.09983, 0.00000, 0.99500) (after y) (6)

(0.09983, -0.09933, 0.99003) (after y, then x) (7)

The discrepancy between (5) and (7) is a rotation by very nearly 0.005 radians in the $xy$ plane. As claimed, this is on the order of $\theta^2$ (in fact, it's almost exactly $\theta^2/2$). A single example can never prove anything, but this is an example of the general rule that rotations along different axes don't commute, and for small angles the discrepancy is a rotation in the plane defined by the two axes, with a magnitude whose maximum size is on the order of $\theta^2$.
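The numbers in (4) through (7) can be reproduced with a short script. This is a minimal sketch: the helper names `rot_x` and `rot_y` are illustrative, and the sense of the rotation about the $y$ axis is an assumption, chosen because it reproduces the values quoted above.

```python
import math

def rot_x(vec, theta):
    """Rotate by theta about the x axis: (x, y cos - z sin, y sin + z cos)."""
    x, y, z = vec
    c, s = math.cos(theta), math.sin(theta)
    return (x, y * c - z * s, y * s + z * c)

def rot_y(vec, theta):
    """Rotate by theta about the y axis (sense assumed so as to match the text)."""
    x, y, z = vec
    c, s = math.cos(theta), math.sin(theta)
    return (x * c + z * s, y, -x * s + z * c)

theta = 0.1
start = (0.0, 0.0, 1.0)

x_then_y = rot_y(rot_x(start, theta), theta)   # compare with (5)
y_then_x = rot_x(rot_y(start, theta), theta)   # compare with (7)

# Angle in the xy plane separating the two results.
discrepancy = math.atan2(y_then_x[1], y_then_x[0]) - math.atan2(x_then_y[1], x_then_y[0])

print("x then y:", tuple(round(c, 5) for c in x_then_y))
print("y then x:", tuple(round(c, 5) for c in y_then_x))
print("discrepancy (radians):", round(discrepancy, 5), " theta^2/2 =", theta ** 2 / 2)
```

Trying smaller values of `theta` shows the discrepancy shrinking quadratically, which is the behavior that makes infinitesimal rotations behave like vectors.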

8.2.2 Boosts

Something similar happens for boosts. In 3 + 1 dimensions, we start with the vector

(0, 1, 0, 0) , (8)

pointing along the $x$ axis. A Lorentz boost with $v = 0.1$ (eq. (1), p. 30) in the $x$ direction gives

(-0.10050, 1.00504, 0.00000, 0.00000) (after x) (9)

and a second boost, now in the $y$ direction, produces this:

(-0.10101, 1.00504, 0.01010, 0.00000) (after x, then y) (10)

Starting over from (8) and doing the boosts in the opposite order, we have

(0.00000, 1.00000, 0.00000, 0.00000) (after y) (11)

(-0.10050, 1.00504, 0.00000, 0.00000) (after y, then x) (12)


c / Nonrelativistically, the gy-

roscope should not rotate as long

as the forces from the hammer

are all transmitted to it at its

center of mass.

The discrepancy between (10) and (12) is a rotation in the $xy$ plane by very nearly 0.01 radians. This is an example of a more general fact, which is that boosts along different axes don't commute, and for small velocities the discrepancy is a rotation in the plane defined by the two boosts, with a magnitude whose maximum size is on the order of $v^2$, in units of radians.
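The same kind of check works for the boosts in (9) through (12). This sketch assumes the Lorentz transformation in the form $t' = \gamma t - v\gamma x$, $x' = -v\gamma t + \gamma x$ (with $c = 1$); the function name `boost` is illustrative. With the opposite sign convention for $v$, the time components change sign but the spatial components and the angle are unchanged.

```python
import math

def boost(vec, v, axis):
    """Boost the four-vector (t, x, y, z) with velocity v along spatial axis 1, 2 or 3.

    Assumed convention: t' = gamma*t - v*gamma*s, s' = -v*gamma*t + gamma*s.
    """
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    out = list(vec)
    t, s = vec[0], vec[axis]
    out[0] = gamma * t - v * gamma * s
    out[axis] = -v * gamma * t + gamma * s
    return tuple(out)

v = 0.1
start = (0.0, 1.0, 0.0, 0.0)            # the vector (8), pointing along x

x_then_y = boost(boost(start, v, 1), v, 2)   # compare with (10)
y_then_x = boost(boost(start, v, 2), v, 1)   # compare with (12)

# Angle in the xy plane between the spatial parts of the two results.
angle = math.atan2(x_then_y[2], x_then_y[1]) - math.atan2(y_then_x[2], y_then_x[1])

print("x then y:", tuple(round(c, 5) for c in x_then_y))
print("y then x:", tuple(round(c, 5) for c in y_then_x))
print("angle (radians):", round(angle, 5), " v^2 =", v ** 2)
```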

8.2.3 Thomas precession

Figure c shows the most important physical consequence of all this. The gyroscope is sent around the perimeter of a square, with impulses provided by hammer taps at the corners. Each impulse can be modeled as a Lorentz boost, notated, e.g., $L_x$ for a boost in the $x$ direction. The series of four operations can be written as $L_{-y}L_{-x}L_{y}L_{x}$, using the notational convention that the first operation applied is the one on the right side of the list. If boosts were commutative, we could swap the two operations in the middle of the list, giving $L_{-y}L_{y}L_{-x}L_{x}$. The $L_{-x}$ would undo the $L_x$, and the $L_{-y}$ would undo the $L_y$. But boosts aren't commutative, so the vector representing the orientation of the gyroscope is rotated in the $xy$ plane. This effect is called the Thomas precession, after Llewellyn Thomas (1903-1992). Thomas precession is a purely relativistic effect, since a Newtonian gyroscope does not change its axis of rotation unless subjected to a torque; if the boosts are accomplished by forces that act at the gyroscope's center, then there is no nonrelativistic explanation for the effect.
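To see that the four boosts really do leave behind a rotation, one can multiply out the matrices numerically. The sketch below is an illustration under assumed conventions (boost matrices acting on $(t, x, y, z)$ with $c = 1$, all four boosts having the same speed $v$, and the helper names `boost_matrix` and `matmul` invented for this example). It also leaves a small residual boost of order $v^3$, which is ignored here.

```python
import math

def boost_matrix(v, axis):
    """4x4 Lorentz boost with velocity v along spatial axis 1 (x), 2 (y) or 3 (z)."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    m = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    m[0][0] = gamma
    m[axis][axis] = gamma
    m[0][axis] = -v * gamma
    m[axis][0] = -v * gamma
    return m

def matmul(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

v = 0.1
Lx, Ly = boost_matrix(v, 1), boost_matrix(v, 2)
Lmx, Lmy = boost_matrix(-v, 1), boost_matrix(-v, 2)

# First operation applied is the rightmost one.
loop = matmul(Lmy, matmul(Lmx, matmul(Ly, Lx)))      # L_{-y} L_{-x} L_y L_x
swapped = matmul(Lmy, matmul(Ly, matmul(Lmx, Lx)))   # L_{-y} L_y L_{-x} L_x

print("xy element of the loop:", loop[1][2])    # close to v^2 in magnitude
print("yx element of the loop:", loop[2][1])    # nearly equal and opposite
print("swapped order, xy element:", swapped[1][2])  # zero: the boosts cancel in pairs
```

The off-diagonal elements in the $xy$ block are nearly equal and opposite with magnitude close to $v^2$, which is the signature of a small rotation in that plane, while the swapped ordering collapses to the identity.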

Clearly we should see the same effect if the jerky motion in figure c was replaced by uniform circular motion, and something similar should happen in any case in which a spinning object experiences an external force. In the limit of low velocities, the general expression for the angular velocity of the precession is $\boldsymbol{\Omega} = \tfrac{1}{2}\,\mathbf{a}\times\mathbf{v}$, and in the case of circular motion, where $a = v\omega$, this gives $\Omega = (1/2)v^2\omega$, where $\omega$ is the frequency of the circular motion.

If we want to see this precession effect in real life, we should look for a system in which both $v$ and $a$ are large. An atom is such a system. The Bohr model, introduced in 1913, marked the first quantitatively successful, if conceptually muddled, description of the atomic energy levels of hydrogen. Continuing to take $c = 1$, the over-all scale of the energies was calculated to be proportional to $m\alpha^2$, where $m$ is the mass of the electron, and $\alpha = ke^2/\hbar \approx 1/137$, known as the fine structure constant, is essentially just a unitless way of expressing the coupling constant for electrical forces. At higher resolution, each excited energy level is found to be split into several sub-levels. The transitions among these close-lying states are in the millimeter region of the microwave spectrum. The energy scale of this fine structure is $\sim m\alpha^4$. This is down by a factor of $\alpha^2$ compared to the visible-light transitions, hence the name of the constant. Uhlenbeck and Goudsmit showed in 1926 that a splitting on this order of magnitude was to be expected due to the magnetic interaction between the proton and the electron's magnetic moment, oriented along its spin. The effect they calculated, however, was too big by a factor of two.

d / States in hydrogen are labeled with their $\ell$ and $s$ quantum numbers, representing their orbital and spin angular momenta in units of $\hbar$. The state with $s = +1/2$ has its spin angular momentum aligned with its orbital angular momentum, while the $s = -1/2$ state has the two angular momenta in opposite directions. The direction and order of magnitude of the splitting between the two $\ell = 1$ states is successfully explained by magnetic interactions with the proton, but the calculated effect is too big by a factor of 2. The relativistic Thomas precession cancels out half of the effect.

The explanation of the mysterious factor of two had in fact been implicit in a 1916 calculation by Willem de Sitter, one of the first applications of general relativity. De Sitter treated the earth-moon system as a gyroscope, and found the precession of its axis of rotation, which was partly due to the curvature of spacetime and partly due to the type of rotation described earlier in this section. The effect on the motion of the moon was noncumulative, and was only about one meter, which was much too small to be measured at the time. In 1927, however, Thomas applied similar reasoning to the hydrogen atom, with the electron's spin vector playing the role of gyroscope. Since the electron's spin is $\hbar/2$, the energy splitting is $\pm(\hbar/2)\Omega$, depending on whether the electron's spin is in the same direction as its orbital motion, or in the opposite direction. This is less than the atom's gross energy scale $\hbar\omega$ by a factor of $v^2/2$, which is $\sim\alpha^2$. The Thomas precession cancels out half of the magnetic effect, bringing theory in agreement with experiment.

Uhlenbeck later recalled: "...when I first heard about [the Thomas precession], it seemed unbelievable that a relativistic effect could give a factor of 2 instead of something of order v/c... Even the cognoscenti of relativity theory (Einstein included!) were quite surprised."


Problems

1 In the 1925 Michelson-Gale-Pearson experiment, the physicists measured the Sagnac effect due to the earth's rotation. They laid out a rectangle of sewer pipes with length $x = 613$ m and width $y = 339$ m, and pumped out the air. The latitude of the site in Illinois was 41°46′, so that the effective area was found by projecting the rectangle into the plane perpendicular to the earth's axis. Light was provided by a sodium discharge with $\lambda = 570$ nm. The light was sent in both directions around the rectangle and interfered, effectively doubling the area. Clever techniques were required in order to calibrate the apparatus, since it was not possible to change its orientation. Calculate the number of wavelengths by which the relative phase of the two beams was expected to shift due to the Sagnac effect, and compare with the experimentally measured result of $0.230 \pm 0.005$ cycles.


a / Charged particles with world-lines that contribute to $J^x$ and $\rho$. The $z$ dimension isn't shown, so the cubical 3-surfaces appear as squares.

Chapter 9

Flux

9.1 The current vector

9.1.1 Current as the flux of charged particles

The most fundamental laws of physics are conservation laws, which tell us that we can't create or destroy "stuff," where "stuff" could mean quantities such as electric charge or energy-momentum. Since charge is a Lorentz invariant, it's an easy example to start with. Because charge is invariant, we might also imagine that charge density $\rho$ was invariant. But this is not the case, essentially because spatial (3-dimensional) volume isn't invariant; in 3 + 1 dimensions, only four-dimensional volume is an invariant (problem 2, p. 47). For example, suppose we have an insulator in the shape of a cube, with charge distributed uniformly throughout it according to an observer $o_1$ at rest relative to the cube. Then in a frame $o_2$ moving relative to the cube, parallel to one of its axes, the cube becomes foreshortened by length contraction, and its volume is reduced by the factor $1/\gamma$. The result is that the charge density in $o_2$ is greater by a factor of $\gamma$.

This means that knowledge of the charge density $\rho$ in one frame is insufficient to determine the charge density in another frame. In the example of the cube, what would be sufficient would be knowledge of the vector $\mathbf{J} = \rho_0\mathbf{v}$, where $\rho_0$ is the charge density in the cube's rest frame, and $\mathbf{v}$ is the cube's velocity vector. $\mathbf{J}$, called the current vector, transforms as a relativistic vector because of the transformation properties of the two factors that define it. The velocity $\mathbf{v}$ is a vector (section 3.5.1). The factor $\rho_0$ is an invariant, since it in turn breaks down into charge divided by rest-volume. Charge is an invariant, and all observers agree on what volume the cube would have in its rest frame.

$\mathbf{J}$ can be expressed in Minkowski coordinates as $(\rho, J^x, J^y, J^z)$, where $\rho$ is the charge density and, e.g., $J^x$ is the density of electric current in the $x$ direction. Suppose we define the three-surface $S$ shown in figure a/1, consisting of the set of events with coordinates $(t, 0, y, z)$ such that $0 \le t \le 1$, $0 \le y \le 1$, and $0 \le z \le 1$. Some charged particles have world-lines that intersect this surface, passing through it either in the positive $x$ direction or the negative $x$ direction (which we count as negative charge transport). $S$ has a three-volume $V$. If we add up the total charge transport $\Delta q$ across this surface and divide by $V$, we get the average value of $J^x$. If we let $S$ shrink down to smaller and smaller three-surfaces surrounding the event $(0, 0, 0, 0)$, then we get the value of $J^x$ at this point, $\lim_{V\to 0}\Delta q/V$. In other words, $J^x$ measures the flux density of charge that passes through $S$. Of course this description in terms of a limit implies a large number of charges, not just one as in figure a.

b / Example 1.

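You can write out the analogous definition for $J^t$, using a surface of simultaneity like $S'$, and you will find that it is the density of charge $\rho$. In this case $S'$ is a region of space taken at a single time, and the flux through $S'$ counts the charges that cross the threshold from the past into the future.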

Our argument that $\mathbf{J}$ transformed like a vector was based on a case where all the charged particles had the same velocity vector, but the above description in terms of the flux of charge eliminated any discussion of velocity. It's true, but less obvious, that the $\mathbf{J}$ described in this way also transforms as a vector, even in cases where the charged particles do not all have parallel world-lines. The current vector is the source of electric and magnetic fields. Remarkably, no macroscopic electrical measurement is capable of detecting anything more detailed about the motion of the charges than the averaged information provided by $\mathbf{J}$.

Boosting a solenoid Example 1

The figure shows a solenoid, at rest, wound from copper wire. At point P, we construct a rectangular Ampèrian loop in the $yz$ plane that has its right edge inside the solenoid and its left one outside. Ampère's law, $\oint\mathbf{B}\cdot d\mathbf{s} = (4\pi k/c^2)I$, then tells us that the current density $J^x$ causes a difference between the exterior field $B_z = 0$ and the interior field $B_z = (4\pi k/c^2)J^x\Delta y$, where $\Delta y$ is the thickness of the solenoid. There are two things we can get from this result, both of them nontrivial.

First, the field depends only on the current density, not on any information about the details of the motion of the electrons in the copper. The electrons' motion is fast and highly random, but all that contributes to $J^x$ is the slow drift velocity, typically $\sim 1$ cm/s, superimposed on the randomness. This is exact and not at all obvious. For example, the total momentum of the electrons does depend on the random part of their motion, because $p^x = m\gamma v^x$ has a factor of $\gamma$ in it.

Second, we can use the transformation properties of the current vector to find the field of this solenoid in a frame boosted along its axis. This is the kind of situation that would naturally arise, for example, in an electric motor whose rotor contains an electromagnet. A Lorentz transformation in the $z$ direction doesn't change the $x$ component of a vector, nor does it change $\Delta y$, so $B_z$ is the same in both frames. This is nontrivial both in the sense that it would have been difficult to figure out by brute force and in the sense that fields don't have to be the same in different frames of reference; for example, a boost in the $x$ or the $y$ direction would have changed the result.

A wire Example 2

In a solid conductor such as a copper wire, we have two types of charges, protons and electrons. The protons are at rest in the lab frame $o$, with charge density $\rho_p$ and current density

$$J_p = (\rho_p, 0, 0, 0)$$

in Minkowski coordinates. The motion of the electrons is complicated. Some electrons are bound to a particular atom, but still move at relativistic speeds within their atoms. Others exhibit violent thermal motion that very nearly, but not quite, averages out to zero when there is a current measurable by an ammeter. For simplicity, we treat all the electrons (both the bound ones and the mobile ones) as a single density of charge $\rho_e$. Let the average velocity of the electrons, known as their drift velocity, be $v$ in the $x$ direction. Then in the frame $o'$ moving along with the electrons at this velocity we have

$$J'_e = (\rho'_e, 0, 0, 0) ,$$

which under a Lorentz transformation back into the lab frame becomes

$$J_e = (\rho'_e\gamma, \rho'_e\gamma v, 0, 0) .$$

Adding the two current vectors, we have a total current in the lab frame

$$J = (\rho_p + \rho'_e\gamma, \rho'_e\gamma v, 0, 0) .$$

The wire is electrically neutral in this frame, so $\rho_p + \rho'_e\gamma = 0$. Since $\rho_p$ is a fixed property of the wire, we express $\rho'_e$ in terms of it as $-\rho_p/\gamma$. Eliminating $\rho'_e$ gives

$$J = (0, -\rho_p v, 0, 0) .$$

Because the $\gamma$ factors canceled, we find that the current is exactly proportional to the drift velocity. Geometrically, we have added two timelike vectors and gotten a spacelike one; this is possible because one of the timelike vectors was future-directed and the other past-directed.
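The bookkeeping in this example can be checked numerically. The sketch below is only an illustration: it uses units with $c = 1$, an arbitrary proton charge density of 1, and an unrealistically large drift velocity of 0.6 so that $\gamma$ differs visibly from 1. The function names `boost_x` and `norm_squared` are made up for the example.

```python
import math

def boost_x(J, v):
    """Components of the four-vector J in a frame moving at velocity v along x."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    Jt, Jx, Jy, Jz = J
    return (gamma * (Jt - v * Jx), gamma * (Jx - v * Jt), Jy, Jz)

def norm_squared(J):
    """Squared magnitude with signature (+, -, -, -)."""
    return J[0] ** 2 - J[1] ** 2 - J[2] ** 2 - J[3] ** 2

rho_p = 1.0          # proton charge density in the lab frame (arbitrary units)
v = 0.6              # drift velocity, exaggerated for illustration
gamma = 1.0 / math.sqrt(1.0 - v * v)

rho_e_rest = -rho_p / gamma                  # electron density in the electrons' frame
J_p = (rho_p, 0.0, 0.0, 0.0)                 # protons, lab frame
J_e_rest = (rho_e_rest, 0.0, 0.0, 0.0)       # electrons, their own rest frame
J_e = boost_x(J_e_rest, -v)                  # back into the lab, which moves at -v

J = tuple(a + b for a, b in zip(J_p, J_e))   # total current vector in the lab

print("J_e in lab:", J_e)
print("total J   :", J)
print("norms     :", norm_squared(J_p), norm_squared(J_e), norm_squared(J))
```

The total comes out with zero charge density and an $x$ component equal to $-\rho_p v$, and its squared magnitude is negative even though each of the two vectors being added has a positive one.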


c / 1. Charge is not conserved.

Charges mysteriously appear at

a later time without having been

present before. 2. Charge is

conserved. Although more world-

lines come out through the top

of the box than came in through

the bottom, the discrepancy is

accounted for by others that

entered through the sides.

9.1.2 Conservation of charge

Conservation of charge can be expressed elegantly in terms of $\mathbf{J}$. Charge density is the timelike component $J^t$. If this charge density near a certain point is, for example, increasing, then it might be because charge conservation has been violated as in figure c/1. In this example, more world-lines emerge into the future at the top of the four-cube than had entered through the bottom in the past. Some process inside the cube is creating charge. In the limit where the cube is made very small, this would be measured by a value of $\partial J^t/\partial t$ that was greater than zero.

But experiments have never detected any violation of charge conservation, so if more charge is emerging from the top (future) side of the cube than came in from the bottom (past), the more likely explanation is that the charges are not all at rest, as in c/1, but are moving, c/2, and there has been a net flow in from neighboring regions of space. We should find this reflected in the spatial components $J^x$, $J^y$ and $J^z$. Moreover, if these spatial components were all constant, then any given region of space would have just as much current flowing into it from one side as there was flowing out the other. We therefore need to have some nonzero partial derivatives such as $\partial J^x/\partial x$. For example, figure c/2 has a positive $J^x$ on the left and a negative $J^x$ on the right, so $\partial J^x/\partial x < 0$. Charge conservation is expressed by the simple equation $\partial J^\lambda/\partial x^\lambda = 0$, with an implied sum over the index $\lambda$. Written out in full, this says that $\partial J^t/\partial t + \partial J^x/\partial x + \partial J^y/\partial y + \partial J^z/\partial z = 0$. If you've taken vector calculus, you'll recognize the operator being applied to $\mathbf{J}$ as a four-dimensional generalization of the divergence. This charge-conservation equation is valid regardless of the coordinate system, so it can also be rewritten in abstract index notation as

$$\frac{\partial J^a}{\partial x^a} = 0 . \qquad (1)$$

Conservation of charge in a solenoid Example 3

In a solenoid, we have charge circulating at some drift velocity $v$. Ignoring the protons, and adapting the relevant expression from example 2 to the case of circular rather than linear motion, we might have for the electrons' contribution to the current something of the form

$$J = p(1, -qy, qx, 0) ,$$

where p = v and q depends on the v and on the radius of the

solenoid. Conservation of charge is satisfied, because each of the four terms in the equation $\partial J^t/\partial t + \partial J^x/\partial x + \partial J^y/\partial y + \partial J^z/\partial z = 0$ vanishes individually.
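The term-by-term check can be written out explicitly. This is a sketch that assumes $p$ and $q$ are constants (independent of $t$, $x$, $y$ and $z$) and that the current has the circulating form $J = p(1, -qy, qx, 0)$, with the $x$ component proportional to $-y$ and the $y$ component proportional to $x$.

```latex
\begin{align*}
\frac{\partial J^t}{\partial t} &= \frac{\partial p}{\partial t} = 0, &
\frac{\partial J^x}{\partial x} &= \frac{\partial(-pqy)}{\partial x} = 0, \\
\frac{\partial J^y}{\partial y} &= \frac{\partial(pqx)}{\partial y} = 0, &
\frac{\partial J^z}{\partial z} &= \frac{\partial(0)}{\partial z} = 0 .
\end{align*}
```

Each component depends only on coordinates other than the one it is differentiated with respect to, which is why no cancellation between terms is needed.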


9.2 The stress-energy tensor

9.2.1 Conservation and ux of energy-momentum

A particle such as an electron has a charge, but it also has a mass. We can't define a relativistic mass flux because flux is defined by addition, but mass isn't additive in relativity (example 6, p. 78). Mass-energy is additive, but unlike charge it isn't an invariant. Mass-energy is part of the energy-momentum four vector $p = (E, p^x, p^y, p^z)$. We then have sixteen different fluxes we can define. For example, we could replay the description in section 9.1 of the three-surface $S$ perpendicular to the $x$ direction, but now we would be interested in a quantity such as the $z$ component of momentum. We then have a measure of the density of flux of $p^z$ in the $x$ direction, which we notate as $T^{zx}$. The matrix $T$ is called the stress-energy tensor, and it is an object of central importance in relativity. In general relativity, it is the source of gravitational fields. (The reason for the odd name will become more clear in a moment.)

The stress-energy tensor is related to physical measurements as follows. Let $o$ be the future-directed, normalized velocity vector of an observer; let $s$ express a spatial direction according to this observer, i.e., it points in a direction of simultaneity and is normalized with $s\cdot s = -1$; and let $S$ be a three-volume covector (p. 125), directed toward the future (i.e., $o^a S_a > 0$). Then measurements by this observer come out as follows:

$T^{ab}o_a S_b$ = mass-energy inside the three-volume $S$ (2a)

$T^{ab}s_a S_b$ = momentum in the direction $s$, inside $S$ (2b)

The stress-energy tensor allows us to express conservation of energy-momentum as

$$\frac{\partial T^{ab}}{\partial x^a} = 0 . \qquad (3)$$

This local conservation of energy-momentum is all we get in general relativity. As discussed in section 4.3.2, p. 83, there is no such global law in curved spacetime. However, we will show in section 9.3.4 that in the special case of flat spacetime, i.e., special relativity, we do have such a global conservation law.

9.2.2 Symmetry of the stress-energy tensor

The stress-energy tensor is a symmetric matrix. For example, let's say we have some nonrelativistic particles. If we have a nonzero $T^{tx}$, it represents a flux of mass-energy ($p^t$) through a three-surface perpendicular to $x$. This means that mass is moving in the $x$ direction. But if mass is moving in the $x$ direction, then we have some $x$ momentum $p^x$. Therefore we must also have a $T^{xt}$, since this momentum is carried by the particles, whose world-lines pass through a hypersurface of simultaneity.


9.2.3 Dust

The simplest example of a stress-energy tensor would be a cloud of particles, all at rest in a certain frame of reference, described in Minkowski coordinates:

$$T^{\mu\nu} = \begin{pmatrix} \rho & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} ,$$

where we now use $\rho$ to indicate the density of mass-energy, not charge as in section 9.1. This could be the stress-energy tensor of a stack of oranges at the grocery store, the atoms in a hunk of copper, or the galaxies in some small neighborhood of the universe. Relativists refer to this type of matter, in which the velocities are negligible, as "dust." The nonvanishing component $T^{tt}$ indicates that for a three-surface $S$ perpendicular to the $t$ axis, particles with mass-energy $E = p^t$ are crossing that surface from the past to the future. Conservation of energy-momentum is satisfied, since all the elements of this $T$ are constant, so all the partial derivatives vanish.

9.2.4 Rank-2 tensors and their transformation law

Suppose we were to look at this cloud in a different frame of reference. Some or all of the timelike row $T^{t\nu}$ and timelike column $T^{\mu t}$ would fill in because of the existence of momentum, but let's just focus for the moment on the change in the mass-energy density represented by $T^{tt}$. It will increase for two reasons. First, the kinetic energy of each particle is now nonzero; its mass-energy increases from $m$ to $m\gamma$. But in addition, the volume occupied by the cloud has been reduced by $1/\gamma$ due to length contraction. We've picked up two factors of gamma, so the result is $\rho \to \rho\gamma^2$. This is different from the transformation behavior of a vector. When a vector is purely timelike in one frame, transformation to another frame raises its timelike component only by a factor of $\gamma$, not $\gamma^2$. This tells us that a matrix like $T$ transforms differently than a vector (section 7.2, p. 130). The general rule is that if we transform from coordinates $x$ to $x'$, then:

$$T'^{\mu\nu} = T^{\kappa\lambda} \frac{\partial x'^{\mu}}{\partial x^{\kappa}} \frac{\partial x'^{\nu}}{\partial x^{\lambda}} \qquad (4)$$

An object that transforms in this standard way is called a rank-2 tensor. The 2 is because it has two indices. Vectors and covectors have rank 1, invariants rank 0.

In section 7.3, p. 132, we developed a method of transforming the metric from one set of coordinates to another; we now see that technique as an application of the more general rule given in equation (4). Considered as a tensor, the metric is symmetric, $g_{ab} = g_{ba}$. In most of the examples we've been considering, the metric tensor is diagonal, but when it has off-diagonal elements, each of these is one half the corresponding coefficient in the expression for $ds^2$, as in the following example.

A non-diagonal metric tensor Example 4

The answer to problem 2 on p. 137 was the metric

$$ds^2 = dx^2 + dy^2 + 2\cos\phi \, dx \, dy .$$

Writing this in terms of the metric tensor, we have

$$\begin{aligned} ds^2 &= g_{\mu\nu} \, dx^\mu \, dx^\nu \\ &= g_{xx} \, dx^2 + g_{xy} \, dx \, dy + g_{yx} \, dy \, dx + g_{yy} \, dy^2 \\ &= g_{xx} \, dx^2 + 2 g_{xy} \, dx \, dy + g_{yy} \, dy^2 . \end{aligned}$$

Therefore we have $g_{xy} = \cos\phi$, not $g_{xy} = 2\cos\phi$.
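The factor-of-two bookkeeping in example 4 can be checked numerically. The following is a minimal sketch, not part of the original text: the helper name `ds2`, the angle of 60 degrees, and the sample displacement are all illustrative assumptions.

```python
import math

def ds2(g, d):
    """Return g_{mu nu} d^mu d^nu, summing over both indices."""
    n = len(d)
    return sum(g[i][j] * d[i] * d[j] for i in range(n) for j in range(n))

phi = math.radians(60.0)            # hypothetical angle appearing in the metric
g = [[1.0, math.cos(phi)],
     [math.cos(phi), 1.0]]          # symmetric: g_xy = g_yx = cos(phi)

dx, dy = 0.3, -0.2                  # arbitrary sample displacement
from_tensor = ds2(g, (dx, dy))
from_formula = dx**2 + dy**2 + 2 * math.cos(phi) * dx * dy
print(from_tensor, from_formula)    # the two values agree
```

Because the double sum visits both the $xy$ and $yx$ entries, each off-diagonal element has to be half of the coefficient of $dx\,dy$.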

Dust in a different frame Example 5

We start with the stress-energy tensor of the cloud of particles, in the rest frame of the particles.

$$T^{\mu\nu} = \begin{pmatrix} \rho & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} .$$

Under a boost by $v$ in the $x$ direction, the tensor transformation law gives

$$T'^{\mu\nu} = \begin{pmatrix} \gamma^2\rho & \gamma^2 v \rho & 0 & 0 \\ \gamma^2 v \rho & \gamma^2 v^2 \rho & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} .$$

The over-all factor of $\gamma^2$ arises for the reasons previously described.
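As a numerical illustration of the transformation law (4) applied to dust, here is a minimal sketch in units with $c = 1$. The function names, the sample values $\rho = 2$ and $v = 0.6$, and the sign convention for the boost (chosen so that the dust ends up moving in the positive $x$ direction) are assumptions made for this example.

```python
import math

def boost_matrix(v):
    """Partial derivatives dx'^mu/dx^kappa for a boost along x (c = 1).
    Sign convention (an assumption): the dust moves toward +x in the new frame."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return [[gamma, gamma * v, 0.0, 0.0],
            [gamma * v, gamma, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def transform_rank2(L, T):
    """T'^{mu nu} = L^mu_kappa L^nu_lambda T^{kappa lambda}."""
    n = len(T)
    return [[sum(L[m][k] * L[nu][l] * T[k][l]
                 for k in range(n) for l in range(n))
             for nu in range(n)] for m in range(n)]

def transform_vector(L, p):
    """p'^mu = L^mu_kappa p^kappa, for comparison with the rank-2 case."""
    return [sum(L[m][k] * p[k] for k in range(len(p))) for m in range(len(p))]

rho, v = 2.0, 0.6
gamma = 1.0 / math.sqrt(1.0 - v * v)
T_rest = [[rho, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
L = boost_matrix(v)
T_boosted = transform_rank2(L, T_rest)
p_boosted = transform_vector(L, [1.0, 0.0, 0.0, 0.0])

print(T_boosted[0][0], gamma**2 * rho)   # tt component picks up gamma^2
print(p_boosted[0], gamma)               # a vector picks up only one gamma
```

The rank-2 object gets one factor of the transformation matrix per index, which is where the second factor of $\gamma$ comes from.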

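Parity Example 6

The parity transformation is a change of coordinates that looks like this:

$$\begin{aligned} t' &= t \\ x' &= -x \\ y' &= -y \\ z' &= -z \end{aligned}$$

It turns right-handed screws into left-handed ones, but leaves the arrow of time unchanged. Under this transformation, the tensor transformation law tells us that some of the components of the stress-energy tensor will flip their signs, while others will stay the same:

$$\begin{pmatrix} \text{no flip} & \text{flip} & \text{flip} & \text{flip} \\ \text{flip} & \text{no flip} & \text{no flip} & \text{no flip} \\ \text{flip} & \text{no flip} & \text{no flip} & \text{no flip} \\ \text{flip} & \text{no flip} & \text{no flip} & \text{no flip} \end{pmatrix} .$$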


d / Tullio Levi-Civita (1873-1941) worked on models of number systems possessing infinitesimals and on differential geometry. He invented the tensor notation, which Einstein learned from his textbook. He was appointed to prestigious endowed chairs at Padua and the University of Rome, but was fired in 1938 because he was a Jew and an anti-fascist.

Everything here was based solely on the fact that $T$ was a rank-2 tensor expressed in Minkowski coordinates, and therefore the same parity properties hold for other rank-2 tensors as well; cf. example 1, p. 180.
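The flip pattern in example 6 follows mechanically from the transformation law, since the matrix of partial derivatives for parity is diagonal with entries $(1, -1, -1, -1)$. The sketch below (variable names are illustrative, not from the text) just multiplies the two relevant signs for each component.

```python
# Diagonal of dx'^mu/dx^nu for the parity transformation t'=t, x'=-x, y'=-y, z'=-z.
parity = [1, -1, -1, -1]
labels = "txyz"

# Each component T^{mu nu} is multiplied by parity[mu] * parity[nu].
sign_table = [[parity[m] * parity[n] for n in range(4)] for m in range(4)]

for m in range(4):
    row = ["flip" if sign_table[m][n] < 0 else "no flip" for n in range(4)]
    print(labels[m], row)
```

Only the components with exactly one timelike index change sign, which is the pattern shown in the matrix above and which holds for any rank-2 tensor in Minkowski coordinates.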

9.2.5 Pressure

The stress-energy tensor carries information about pressure. For example, $T^{xx}$ is the flux in the $x$ direction of $x$-momentum. This is simply the pressure, $P$, that would be exerted on a surface with its normal in the $x$ direction. Negative pressure is tension, and this is the origin of the term "tensor," coined by the mathematician Levi-Civita.

Pressure as a source of gravitational fields Example 7

Because the stress-energy tensor is the source of gravitational fields in general relativity, we can see that the gravitational field of an object should be influenced not just by its mass-energy but by its internal stresses. The very early universe was dominated by photons rather than by matter, and photons have a much higher ratio of momentum to mass-energy than matter, so the importance of the pressure components in the stress-energy tensor was much greater in that era. In the universe today, the largest pressures are those found inside atomic nuclei. Inside a heavy nucleus, the electromagnetic pressure can be as high as $10^{33}$ Pa! If general relativity's description of pressure as a source of gravitational fields were wrong, then we would see anomalous effects in the gravitational forces exerted by heavy elements compared to light ones. Such effects have been searched for both in the laboratory [1] and in lunar laser ranging experiments, [2] with results that agreed with general relativity's predictions.

[1] Kreuzer, Phys. Rev. 169 (1968) 1007. Described in section 3.7.3 of Will, "The Confrontation between General Relativity and Experiment," relativity.livingreviews.org/Articles/lrr-2006-3/.

[2] Bartlett and van Buren, Phys. Rev. Lett. 57 (1986) 21, also described in Will.

9.2.6 A perfect fluid

The cloud in example 5 had a stress-energy tensor in its own rest frame that was isotropic, i.e., symmetric with respect to the $x$, $y$, and $z$ directions. The tensor became anisotropic when we switched out of this frame. If a physical system has a frame in which its stress-energy tensor is isotropic, i.e., of the form

$$T^{\mu\nu} = \begin{pmatrix} \rho & 0 & 0 & 0 \\ 0 & P & 0 & 0 \\ 0 & 0 & P & 0 \\ 0 & 0 & 0 & P \end{pmatrix} ,$$

we call it a perfect fluid in equilibrium. Although it may contain moving particles, this special frame is the one in which their momenta cancel out. In other cases, the pressure need not be isotropic, and the stress exerted by the fluid need not be perpendicular to the surface on which it acts. The space-space components of $T$ would then be the classical stress tensor, whose diagonal elements are the anisotropic pressure, and whose off-diagonal elements are the shear stress. This is the reason for calling $T$ the stress-energy tensor.

The perfect fluid form of the stress-energy tensor is extremely important and common. For example, cosmologists find that it is a nearly perfect description of the universe on large scales.

We discussed in section ?? the ideas of converting back and forth between vectors and their corresponding covectors, and of notating this as the raising and lowering of indices. We can do the same thing with the two indices of a rank-2 tensor, so that the stress-energy tensor can be expressed in four different ways: $T^{ab}$, $T_{ab}$, $T^a{}_b$, and $T_a{}^b$, but the symmetry of $T$ means that there is no interesting distinction between the final two of these. In special relativity, the distinctions among the various forms are not especially fascinating. We can always cover all of spacetime with Minkowski coordinates, so that the form of the metric is simply a diagonal matrix with elements $\pm 1$ on the diagonal. As with a rank-1 tensor, raising and lowering indices on a rank-2 tensor just flips some components and leaves others alone. The methods for raising and lowering don't need to be deduced or memorized, since they follow uniquely from the grammar of index notation, e.g., $T^a{}_b = g_{bc} T^{ac}$. But there is the potential for a lot of confusion with all the signs, and in addition there is the fact that some people use a $+---$ signature while others use $-+++$. Since perfect fluids are so important, I'll demonstrate how all of this works out in that case.

For a perfect fluid, we can write the stress-energy tensor in the coordinate-independent form

$$T^{ab} = (\rho + P) \, o^a o^b - (o^c o_c) P g^{ab} ,$$

where $o$ represents the velocity vector of an observer in the fluid's rest frame, and $o^c o_c = o^2 = o \cdot o$ equals $1$ for our $+---$ signature or $-1$ for the signature $-+++$. For ease of writing, let's abbreviate the signature factor as $s = o^c o_c$.

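Suppose that the metric is diagonal, but its components are varying, $g_{\alpha\beta} = \operatorname{diag}(sA^2, -sB^2, \ldots)$. The properly normalized velocity vector of an observer at (coordinate-)rest is $o^\alpha = (A^{-1}, 0, 0, 0)$. Lowering the index gives $o_\alpha = (sA, 0, 0, 0)$. The various forms of the stress-energy tensor then look like the following:

$$\begin{aligned} T_{00} &= A^2 \rho & T_{11} &= B^2 P \\ T^0{}_0 &= s\rho & T^1{}_1 &= -sP \\ T^{00} &= A^{-2} \rho & T^{11} &= B^{-2} P . \end{aligned}$$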

Which of these forms is the "real" one, e.g., which form of the 00 component is the one that the observer $o$ actually measures when she sticks a shovel in the ground, pulls out a certain volume of dirt, weighs it, and determines $\rho$? The answer is that the index notation is so slick and well designed that all of them are equally "real," and we don't need to memorize which actually corresponds to measurements. When she does this measurement with the shovel, she could say that she is measuring the quantity $T^{ab} o_a o_b$. But because all of the $a$'s and $b$'s are paired off, this expression is a rank-0 tensor. That means that $T^{ab} o_a o_b$, $T_{ab} o^a o^b$, and $T^a{}_b o_a o^b$ are all the same number. If, for example, we have coordinates in which the metric is diagonal and has elements $\pm 1$, then in all these expressions the differing signs of the $o$'s are exactly compensated for by the signs of the $T$'s.
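The claim that every index placement yields the same measured density can be tested directly. The following is a sketch under stated assumptions: the metric is taken to be $\operatorname{diag}(sA^2, -sB^2, -sB^2, -sB^2)$ (the text leaves the last two entries unspecified), and the function name and sample numbers are hypothetical.

```python
def perfect_fluid_forms(rho, P, A, B, s):
    """Build T^{ab}, T^a_b, T_{ab} and the three fully contracted scalars.
    s = +1 for signature +---, s = -1 for -+++."""
    g_low = [s * A * A, -s * B * B, -s * B * B, -s * B * B]   # diagonal of g_{ab}
    g_up = [1.0 / g for g in g_low]                           # diagonal of g^{ab}
    o_up = [1.0 / A, 0.0, 0.0, 0.0]                           # o^a, with o.o = s
    o_low = [g_low[a] * o_up[a] for a in range(4)]            # o_a = g_{ab} o^b

    # T^{ab} = (rho + P) o^a o^b - s P g^{ab}
    T_up = [[(rho + P) * o_up[a] * o_up[b] - (s * P * g_up[a] if a == b else 0.0)
             for b in range(4)] for a in range(4)]
    # Lower indices with the diagonal metric.
    T_mixed = [[T_up[a][b] * g_low[b] for b in range(4)] for a in range(4)]
    T_low = [[g_low[a] * g_low[b] * T_up[a][b] for b in range(4)] for a in range(4)]

    idx = range(4)
    scalars = (
        sum(T_up[a][b] * o_low[a] * o_low[b] for a in idx for b in idx),
        sum(T_low[a][b] * o_up[a] * o_up[b] for a in idx for b in idx),
        sum(T_mixed[a][b] * o_low[a] * o_up[b] for a in idx for b in idx),
    )
    return T_up, T_mixed, T_low, scalars

for s in (+1, -1):
    T_up, T_mixed, T_low, scalars = perfect_fluid_forms(rho=5.0, P=0.5, A=2.0, B=3.0, s=s)
    print(s, T_up[0][0], T_mixed[0][0], T_low[0][0], scalars)
```

For either signature the three contractions all come out equal to $\rho$, even though the individual components $T^{00}$, $T^0{}_0$, and $T_{00}$ differ.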

9.2.7 Two simple examples

A rope under tension Example 8

As a real-world example in which the pressure is not isotropic, consider a rope that is moving inertially but under tension, i.e., equal forces at its ends cancel out so that the rope doesn't accelerate. Tension is the same as negative pressure. If the rope lies along the $x$ axis and its fibers are only capable of supporting tension along that axis, then the rope's stress-energy tensor will be of the form

$$T^{\mu\nu} = \begin{pmatrix} \rho & 0 & 0 & 0 \\ 0 & P & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} ,$$

where $P$ is negative and equals minus the tension per unit cross-sectional area.

Conservation of energy-momentum is expressed as (eq. 3, p. 153)

$$\frac{\partial T^{ab}}{\partial x^a} = 0 .$$

Converting the abstract indices to concrete ones, we have

$$\frac{\partial T^{\mu\nu}}{\partial x^\mu} = 0 ,$$

where there is an implied sum over $\mu$, and the equation must hold both in the case where $\nu$ is a label for $t$ and the one where it refers to $x$.

In the first case, we have

$$\frac{\partial T^{tt}}{\partial t} + \frac{\partial T^{xt}}{\partial x} = 0 ,$$

which is a statement of conservation of energy, energy being the timelike component of the energy-momentum. The first term is zero because $\rho$ is constant by virtue of our assumption that the rope was uniform. The second term is zero because $T^{xt} = 0$. Therefore conservation of energy is satisfied. This came about automatically because by writing down a time-independent expression for the stress-energy, we were dictating a static equilibrium.

When $\nu$ stands for $x$, we get an equation that requires the $x$ component of momentum to be conserved,

$$\frac{\partial T^{tx}}{\partial t} + \frac{\partial T^{xx}}{\partial x} = 0 .$$

This simply says

$$\frac{\partial P}{\partial x} = 0 ,$$

meaning that the tension in the rope is constant along its length.

A rope supporting its own weight Example 9

A variation on example 8 is one in which the rope is hanging and supports its own weight. Although gravity is involved, we can solve this problem without general relativity, by exploiting the equivalence principle (section 5.2, p. 104). As discussed in section 5.1 on p. 101, an inertial frame in relativity is one that is free-falling. We define an inertial frame of reference $o$, corresponding to an observer free-falling past the rope, and a noninertial frame $o'$ at rest relative to the rope.

Since the rope is hanging in static equilibrium, observer $o'$ sees a stress-energy tensor that has no time-dependence. The off-diagonal components vanish in this frame, since there is no momentum. The stress-energy tensor is

$$T'^{\mu\nu} = \begin{pmatrix} \rho & 0 \\ 0 & P \end{pmatrix} ,$$

where the components involving $y$ and $z$ are zero and not shown, and $P$ is negative as in example 8. We could try to apply the conservation of energy condition to this stress-energy tensor as in example 8, but that would be a mistake. As discussed in section 7.5 on p. 136, rates of change can only be measured by taking partial derivatives with respect to the coordinates if the coordinates are Minkowski, i.e., in an inertial frame. Therefore we need to transform this stress-energy tensor into the inertial frame $o$.

For simplicity, we restrict ourselves to the Newtonian approximation, so that the change of coordinates between the two frames is

$$\begin{aligned} t &\approx t' \\ x &\approx x' + \tfrac{1}{2} a t'^2 , \end{aligned}$$

where $a > 0$ if the free-falling observer falls in the negative $x$ direction, i.e., positive $x$ is up. That is, if we mark a spot on the rope at a fixed $x'$, then observer $o$ sees the spot moving up, to larger values of $x$, at $t > 0$. Applying the tensor transformation law, we find

$$T^{\mu\nu} = \begin{pmatrix} \rho & \rho a t \\ \rho a t & P + \rho a^2 t^2 \end{pmatrix} .$$

As in example 8, conservation of energy is trivially satisfied. Conservation of momentum gives

$$\frac{\partial T^{tx}}{\partial t} + \frac{\partial T^{xx}}{\partial x} = 0 ,$$

or

$$\rho a + \frac{\partial P}{\partial x} = 0 .$$

Integrating this with respect to $x$, we have

$$P = -\rho a x + \text{constant} .$$

Let the cross-sectional area of the rope be $A$, and let $\mu = \rho A$ be the mass per unit length and $T = -PA$ the tension. We then find

$$T = \mu a x + \text{constant} .$$

Conservation of momentum requires that the tension vary along the length of the rope, just as we expect from Newton's laws: a section of the rope higher up has more weight below it to support.
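The steps of example 9 can be sketched numerically: transform the static rope tensor with the matrix of partial derivatives of $t = t'$, $x = x' + \frac{1}{2}at'^2$, then test momentum conservation by finite differences. This is an illustration under assumptions, not the author's calculation: the function names, the sample values, and the choice to evaluate the pressure profile at $x' = x - \frac{1}{2}at^2$ are mine.

```python
def rope_tensor_inertial(t, x, rho, a, pressure):
    """T^{mu nu} (t and x components only) in the free-falling frame,
    using the Newtonian change of coordinates t = t', x = x' + a t'^2 / 2."""
    x_rope = x - 0.5 * a * t * t            # position along the rope, x'
    T_rope = [[rho, 0.0], [0.0, pressure(x_rope)]]
    L = [[1.0, 0.0],                        # dt/dt', dt/dx'
         [a * t, 1.0]]                      # dx/dt', dx/dx'
    return [[sum(L[m][k] * L[n][l] * T_rope[k][l]
                 for k in range(2) for l in range(2))
             for n in range(2)] for m in range(2)]

def momentum_residual(t, x, rho, a, pressure, h=1e-4):
    """Central-difference estimate of dT^{tx}/dt + dT^{xx}/dx."""
    dTtx_dt = (rope_tensor_inertial(t + h, x, rho, a, pressure)[0][1]
               - rope_tensor_inertial(t - h, x, rho, a, pressure)[0][1]) / (2 * h)
    dTxx_dx = (rope_tensor_inertial(t, x + h, rho, a, pressure)[1][1]
               - rope_tensor_inertial(t, x - h, rho, a, pressure)[1][1]) / (2 * h)
    return dTtx_dt + dTxx_dx

rho, a = 1.0, 0.01                          # hypothetical values, units with c = 1
hanging = lambda xr: -rho * a * xr - 0.05   # P = -rho a x + constant
uniform = lambda xr: -0.05                  # constant tension, as in example 8

T = rope_tensor_inertial(2.0, 1.0, rho, a, hanging)
print(T)
print(momentum_residual(2.0, 1.0, rho, a, hanging))   # approximately zero
print(momentum_residual(2.0, 1.0, rho, a, uniform))   # approximately rho * a
```

The linear pressure profile satisfies the conservation law, while a uniform tension leaves a residual of $\rho a$, i.e., the unsupported weight per unit volume.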

9.2.8 Energy conditions

The result of example 9 could cause something scary to happen. If we walk up to a clothesline under tension and give it a quick karate chop, we will observe wave pulses propagating away from the chop in both directions, at velocities $v = \pm\sqrt{T/\mu}$. But the result of the example is that this expression increases without limit as $x$ gets larger and larger. At some point, $v$ will exceed the speed of light. (Of course any real rope would break long before this much tension was achieved.) Two things led to the problematic result: (1) we assumed there was no constraint on the possible stress-energy tensor in the rest frame of the rope; and (2) we used a Newtonian approximation to change from this frame to the free-falling frame.

In reality, we don't know of any material so stiff that vibrations propagate in it faster than $c$. In fact, all ordinary materials are made of atoms, atoms are bound to each other by electromagnetic forces, and therefore no material made of atoms can transmit vibrations faster than the speed of an electromagnetic wave, $c$.
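To get a feeling for the scale at which the naive result breaks down, one can plug numbers into $v = \sqrt{T/\mu}$ with $T = \mu a x$. The sketch below assumes the integration constant is zero (the free lower end of the rope sits at $x = 0$) and uses $a = 9.8\ \text{m/s}^2$; both are illustrative assumptions rather than values from the text.

```python
import math

C = 299_792_458.0        # speed of light in m/s
A_GRAVITY = 9.8          # assumed acceleration in m/s^2

def wave_speed(x, a=A_GRAVITY):
    """Naive wave speed sqrt(T/mu) with T = mu * a * x (constant taken as zero)."""
    return math.sqrt(a * x)

# Height above the free end at which the naive formula reaches c.
x_critical = C**2 / A_GRAVITY
print(x_critical)                    # in meters; comparable to a light-year
print(wave_speed(x_critical) / C)    # approximately 1
```

The critical length is on the order of $10^{16}$ m, so the paradox only appears for a rope of astronomical length, far beyond the regime where the Newtonian approximation or any real material could be trusted.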

Based on these considerations, we therefore expect there to be certain constraints on the stress-energy tensor of any ordinary form of matter. For example, we don't expect to find any rope whose stress-energy tensor looks like this:

$$T^{\mu\nu} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & -2 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} ,$$

because here the tensile stress $+2$ is greater than the mass density $1$, which would lead to $|v| = \sqrt{2/1} > 1$. Constraints of this kind are called energy conditions. Hypothetical forms of matter that violate them are referred to as exotic matter; if they exist, they are not made of atoms. This particular example violates an energy condition known as the dominant energy condition, which requires $\rho > 0$ and $|P| \le \rho$. There are about five energy conditions that are commonly used, and a detailed discussion of them is more appropriate for a general relativity text. The common ideas that recur in many of them are: (1) that energy density is never negative in any frame of reference, and (2) that there is never a flux of energy propagating at a speed greater than $c$.

An energy condition that is particularly simple to express is the trace energy condition (TEC),

$$T^a{}_a \ge 0 ,$$

where we have to have one upper index and one lower index in order to obey the grammatical rules of index notation. In Minkowski coordinates $(t, x, y, z)$, this becomes $T^\mu{}_\mu \ge 0$, with the implied sum over $\mu$ expanding to give

$$T^t{}_t + T^x{}_x + T^y{}_y + T^z{}_z \ge 0 .$$

The left-hand side of this relation, the sum of the main-diagonal elements of a matrix, is called the trace of the matrix, hence the name of this energy condition. Since this book uses the signature $+---$ for the metric, raising the second index changes this to

$$T^{tt} - T^{xx} - T^{yy} - T^{zz} \ge 0 .$$

In example 5 on p. 155, we computed the stress-energy tensor of a cloud of dust, in a frame moving at velocity $v$ relative to the cloud's rest frame. The result was

$$T'^{\mu\nu} = \begin{pmatrix} \gamma^2\rho & \gamma^2 v \rho & 0 & 0 \\ \gamma^2 v \rho & \gamma^2 v^2 \rho & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} .$$

In this example, the trace energy condition is satisfied precisely under the condition $|v| \le 1$, which can be interpreted as a statement that according to the TEC, the mass-energy of the cloud can never be transported at a speed greater than $c$ in any frame.


e / Three lines go in, and three come out. These could be field lines or world lines.

9.3 Gauss's theorem

9.3.1 Integral conservation laws

We've expressed conservation of charge and energy-momentum in terms of zero divergences,

$$\frac{\partial J^a}{\partial x^a} = 0$$

$$\frac{\partial T^{ab}}{\partial x^a} = 0 .$$

These are expressed in terms of derivatives. The derivative of a function at a certain point only depends on the behavior of the function near that point, so these are local statements of conservation. Conservation laws can also be stated globally: the total amount of something remains constant. Taking charge as an example, observer $o$ defines Minkowski coordinates $(t, x, y, z)$, and at a time $t_1$ says that the total amount of charge in some region is

$$q(t_1) = \int_{t_1} J^a \, dS_a ,$$

where the subscript $t_1$ means that the integrand is to be evaluated over the surface of simultaneity $t = t_1$, and $dS_a = (dx \, dy \, dz, 0, 0, 0)$ is an element of 3-volume expressed as a covector (p. 125). The charge at some later time $t_2$ would be given by a similar integral. If charge is conserved, and if our region is surrounded by an empty region through which no charge is coming in or out, then we should have $q(t_2) = q(t_1)$.

9.3.2 A simple form of Gauss's theorem

The connection between the local and global conservation laws is provided by a theorem called Gauss's theorem. In your course on electromagnetism, you learned Gauss's law, which relates the electric flux through a closed surface to the charge contained inside the surface. In the case where no charges are present, it says that the flux through such a surface cancels out. The interpretation is that since field lines only begin or end on charges, the absence of any charges means that the lines can't begin or end, and therefore, as in figure e, any field line that enters the surface (contributing some negative flux) must eventually come back out (creating some positive flux that cancels out the negative). But there is nothing about figure e that requires it to be interpreted as a drawing of electric field lines. It could just as easily be a drawing of the world-lines of some charged particles in 1 + 1 dimensions. The bottom of the rectangle would then be the surface at $t_1$ and the top $t_2$. We have $q(t_1) = 3$ and $q(t_2) = 3$ as well.

For simplicity, let's start with a very restricted version of Gauss's theorem. Let a vector field $J^a$ be defined in two dimensions. (We don't care whether the two dimensions are both spacelike or one spacelike and one timelike; that is, Gauss's theorem doesn't depend on the signature of the metric.) Let $R$ be a rectangular area, and let $S$ be its boundary. Define the flux of the field through $S$ as

$$\Phi = \int_S J^a \, dS_a ,$$

where the integral is to be taken over all four sides, and the covector $dS_a$ points outward. If the field has zero divergence, $\partial J^a / \partial x^a = 0$, then the flux is zero.

Proof: Define coordinates $x$ and $y$ aligned with the rectangle. Along the top of the rectangle, the element of the surface, oriented outwards, is $dS = (0, dx)$, so the contribution to the flux from the top is

$$\Phi_\text{top} = \int_\text{top} J^y(y_\text{top}) \, dx .$$

At the bottom, an outward orientation gives $dS = (0, -dx)$, so

$$\Phi_\text{bottom} = -\int_\text{bottom} J^y(y_\text{bottom}) \, dx .$$

Using the fundamental theorem of calculus, the sum of these is

$$\Phi_\text{top} + \Phi_\text{bottom} = \int_R \frac{\partial J^y}{\partial y} \, dy \, dx .$$

Adding in the similar expressions for the left and right, we get

$$\Phi = \int_R \left( \frac{\partial J^x}{\partial x} + \frac{\partial J^y}{\partial y} \right) dx \, dy .$$

But the integrand is the divergence, which is zero by assumption, so $\Phi = 0$ as claimed.
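The rectangle version of Gauss's theorem proved above is easy to test numerically. The sketch below uses a midpoint rule on each of the four sides; the two sample fields, the rectangle, and all function names are illustrative assumptions, not taken from the text.

```python
import math

def flux_through_rectangle(J, x0, x1, y0, y1, n=1000):
    """Outward flux of the field J(x, y) -> (Jx, Jy) through the boundary of
    the rectangle [x0, x1] x [y0, y1], using the midpoint rule on each side."""
    hx = (x1 - x0) / n
    hy = (y1 - y0) / n
    total = 0.0
    for i in range(n):
        xm = x0 + (i + 0.5) * hx
        ym = y0 + (i + 0.5) * hy
        total += J(xm, y1)[1] * hx    # top: outward normal +y
        total -= J(xm, y0)[1] * hx    # bottom: outward normal -y
        total += J(x1, ym)[0] * hy    # right: outward normal +x
        total -= J(x0, ym)[0] * hy    # left: outward normal -x
    return total

def divergence_free(x, y):
    # dJx/dx + dJy/dy = 1 + (-1) = 0
    return (x + math.sin(y), -y + math.cos(x))

def unit_divergence(x, y):
    # dJx/dx + dJy/dy = 1, so the flux should equal the enclosed area
    return (x, 0.0)

flux_zero = flux_through_rectangle(divergence_free, 0.0, 2.0, 0.0, 1.0)
flux_area = flux_through_rectangle(unit_divergence, 0.0, 2.0, 0.0, 1.0)
print(flux_zero)   # approximately 0
print(flux_area)   # approximately 2, the area of the rectangle
```

Nothing in the calculation refers to a metric, in keeping with the remark that the result does not depend on the signature. The second field illustrates the case of nonzero divergence, where the flux equals the integral of the divergence over the region.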

9.3.3 The general form of Gauss's theorem

Although the coordinates were labeled $x$ and $y$, the proof made no use of the metric, so the result is equally valid regardless of the signature. The rectangle could equally well have been a rectangle in 1 + 1-dimensional spacetime. The generalization to $n$ dimensions is also automatic, and everything also carries through without modification if we replace the vector $J^a$ with a tensor such as $T^{ab}$ that has more indices; the extra index $b$ just comes along for the ride. Sometimes, as with Gauss's law in electromagnetism, we are interested in fields whose divergences are not zero. Gauss's theorem then becomes

$$\int_S J^a \, dS_a = \int_R \frac{\partial J^a}{\partial x^a} \, dv ,$$

where $dv$ is the element of $n$-volume. In 3 + 1 dimensions we could use Minkowski coordinates to write the element of 4-volume as $dv = dt \, dx \, dy \, dz$, and even though this expression is written in terms of these specific coordinates, it is actually Lorentz invariant (section 2.5, p. 45).

f / Proof of Gauss's theorem for a region with an arbitrary shape.

The generalization to a region $R$ with an arbitrary shape, figure f, is less trivial. The basic idea is to break up the region into rectangular boxes, f/1. Where the faces of two boxes coincide on the interior of $R$, their own outward directions are opposite. Therefore if we add up the fluxes through the surfaces of all the boxes, the contributions on the interior cancel, and we're left with only the exterior contributions. If $R$ could be dissected exactly into boxes, then this would complete the proof, since the sum of exterior contributions would be the same as the flux through $S$, and the left-hand side of Gauss's theorem would be additive over the boxes, as is the right-hand side.

The difficulty arises because a smooth shape typically cannot be built out of bricks, a fact that is well known to Lego enthusiasts who build elaborate models of the Death Star. We could argue on physical grounds that no real-world measurement of the flux can depend on the granular structure of $S$ at arbitrarily small scales, but this feels a little unsatisfying. For comparison, it is not strictly true that surface areas can be treated in this way. For example, if we approximate a unit sphere in three dimensions using smaller and smaller boxes, the limit of the surface area is $6\pi$, which is quite a bit greater than the surface area $4\pi$ of the limiting surface.

Instead, we explicitly consider the nonrectangular pieces at the surface, such as the one in f/2. In this drawing in $n = 2$ dimensions, the top of this piece is approximately a line, and in the limit we'll be considering, where its width becomes an infinitesimally small $dx$, the error incurred by approximating it as a line will be negligible. We define vectors $dx$ and $dx^*$ as shown in the figure. In more than the two dimensions shown in the figure, we would approximate the top surface as an $(n-1)$-dimensional parallelepiped spanned by vectors $dx^*$, $dy^*$, ... This is the point at which the use of the covector $S_a$ (p. 125) pays off by greatly simplifying the proof. [3] Applying this to the top of the triangle, $dS$ is defined as the linear function that takes a vector $J$ and gives the $n$-volume spanned by $J$ along with $dx^*$, ...

Call the vertical coordinate on the diagram $t$, and consider the contribution to the flux from $J$'s time component, $J^t$. Because the triangle's size is an infinitesimal of order $dx$, we can approximate $J^t$ as being a constant throughout the triangle, while incurring only an error of order $dx$. (By stating Gauss's theorem in terms of derivatives of $J$, we implicitly assumed it to be differentiable, so it is not possible for it to jump discontinuously.) Since $dS$ depends linearly not just on $J$ but on all the vectors, the difference between the flux at the top and bottom of the triangle is proportional to the area spanned by $J$ and $dx^* - dx$. But this latter vector is in the $t$ direction, and therefore the area it spans when taken with $J^t$ is approximately zero. Therefore the contribution of $J^t$ to the flux through the triangle is zero. To estimate the possible error due to the approximations, we have to count powers of $dx$. The possible variation of $J^t$ over the triangle is of order $(dx)^1$. The covector $dS$ is of order $(dx)^{n-1}$, so the possible error in the flux is of order $(dx)^n$.

[3] Here is an example of the ugly complications that occur if one doesn't have access to this piece of technology. In the low-tech approach, in Euclidean space, one defines an element of surface area $d\mathbf{A} = \hat{\mathbf{n}} \, dA$, where the unit vector $\hat{\mathbf{n}}$ is outward-directed with $\hat{\mathbf{n}} \cdot \hat{\mathbf{n}} = 1$. But in a signature such as $+---$, we could have a region $R$ such that over some large area of the bounding surface $S$, the normal direction was lightlike. It would therefore be impossible to scale $\hat{\mathbf{n}}$ so that $\hat{\mathbf{n}} \cdot \hat{\mathbf{n}}$ was anything but zero. As an example of how much work it is to resolve such issues using stone-age tools, see Synge, Relativity: The Special Theory, VIII, §6-7, where the complete argument takes up 22 pages.

This was only an estimate of one part of the flux, the part contributed by the component $J^t$. However, we get the same estimate for the other parts. For example, if we refer to the two dimensions in figure f/2 as $t$ and $x$, then interchanging the roles of $t$ and $x$ in the above argument produces the same error estimate for the contribution from $J^x$.

This is good. When we began this argument, we were motivated to be cautious by our observation that a quantity such as the surface area of $R$ can't be calculated as the limit of the surface area as approximated using boxes. The reason we have that problem for surface area is that the error in the approximation on a small patch is of order $(dx)^{n-1}$, which is an infinitesimal of the same order as the surface area of the patch itself. Therefore when we scale down the boxes, the error doesn't get small compared to the total area. But when we consider flux, the error contributed by each of the irregularly shaped pieces near the surface goes like $(dx)^n$, which is of the order of the $n$-volume of the piece. This volume goes to zero in the limit where the boxes get small, and therefore the error goes to zero as well. This establishes the generalization of Gauss's theorem to a region $R$ of arbitrary shape.

9.3.4 The energy-momentum vector

Einstein's celebrated $E = mc^2$ is a special case of the statement that energy-momentum is conserved, transforms like a four-vector, and has a norm $m$ equal to the rest mass. Section 4.4 on p. 84 explored some of the problems with Einstein's original attempt at a proof of this statement, but only now are we prepared to completely resolve them. One of the problems was the definitional one of what we mean by the energy-momentum of a system that is not composed of pointlike particles. The answer is that for any phenomenon that carries energy-momentum, we must decide how it contributes to the stress-energy tensor. For example, the stress-energy tensor of the electric and magnetic fields is described in section 10.6 on p. 182.

g / Conservation of the integrated energy-momentum vector.

h / Lorentz transformation of the integrated energy-momentum vector.

For the reasons discussed in section 4.4 on p. 84, it is necessary to assume that energy-momentum is locally conserved, and also that the system being described is isolated. Local conservation is described by the zero-divergence property of the stress-energy tensor, $\partial T^{ab} / \partial x^a = 0$. Once we assume local conservation, figure g shows how to prove conservation of the integrated energy-momentum vector using Gauss's theorem. Fix a frame of reference $o$. Surrounding the system, shown as a dark stream flowing through spacetime, we draw a box. The box is bounded on its past side by a surface that $o$ considers to be a surface of simultaneity $s_A$, and likewise on the future side $s_B$. It doesn't actually matter if the sides of the box are straight or curved according to $o$. What does matter is that because the system is isolated, we have enough room so that between the system and the sides of the box there can be a region of vacuum, in which the stress-energy tensor vanishes.

Observer $o$ says that at the initial time corresponding to $s_A$, the total amount of energy-momentum in the system was

$$p_A^\mu = -\int_{s_A} T^{\mu\nu} \, dS_\nu ,$$

where the minus sign occurs because we take $dS_\nu$ to point outward, for compatibility with Gauss's theorem, and this makes it antiparallel to the velocity vector $o$, which is the opposite of the orientation defined in equations (2) on p. 153. At the final time we have

$$p_B^\mu = \int_{s_B} T^{\mu\nu} \, dS_\nu ,$$

with a plus sign because the outward direction is now the same as the direction of $o$. Because of the vacuum region, there is no flux through the sides of the box, and therefore by Gauss's theorem $p_B - p_A = 0$. The energy-momentum vector has been globally conserved according to $o$.

We also need to show that the integrated energy-momentum transforms properly as a four-vector. To prove this, we apply Gauss's theorem to the region shown in figure h, where $s_C$ is a surface of simultaneity according to some other observer $o'$. Gauss's theorem tells us that $p_B = p_C$, which means that the energy-momentum on the two surfaces is the same vector in the absolute sense, but this doesn't mean that the two vectors have the same components as measured by different observers. Observer $o$ says that $s_B$ is a surface of simultaneity, and therefore considers $p_B$ to be the total energy-momentum at a certain time. She says the total mass-energy is $p_B^a o_a$ (eq. (2a)), and similarly for the total momentum in the three spatial directions $s_1$, $s_2$, and $s_3$ (eq. (2b)). Observer $o'$, meanwhile, considers $s_C$ to be a surface of simultaneity, and has the same interpretations for quantities such as $p_C^a o'_a$. But this is just a way of saying that $p_B$ and $p_C$ are related to each other by a change of basis from $(o, s_1, s_2, s_3)$ to $(o', s'_1, s'_2, s'_3)$. A change of basis like this is just what we mean by a Lorentz transformation, so the integrated energy-momentum $p$ transforms as a four-vector.

i / These three rulers represent three choices of coordinates.

9.4 The covariant derivative

In this optional section we deal with the issues raised in section 7.5 on p. 136. We noted there that in non-Minkowski coordinates, one cannot naively use changes in the components of a vector as a measure of a change in the vector itself. A constant scalar function remains constant when expressed in a new coordinate system, but the same is not true for a constant vector function, or for any tensor of higher rank. This is because the change of coordinates changes the units in which the vector is measured, and if the change of coordinates is nonlinear, the units vary from point to point. This topic doesn't logically belong in this chapter, but I've placed it here because it can't be discussed clearly without already having covered tensors of rank higher than one.

Consider the one-dimensional case, in which a vector $v^a$ has only one component, and the metric is also a single number, so that we can omit the indices and simply write $v$ and $g$. (We just have to remember that $v$ is really a vector, even though we're leaving out the upper index.) If $v$ is constant, its derivative $dv/dx$, computed in the ordinary way without any correction term, is zero. If we further assume that the metric is simply the constant $g = 1$, then zero is not just the answer but the right answer.

Now suppose we transform into a new coordinate system $X$, and the metric $G$, expressed in this coordinate system, is not constant. Applying the tensor transformation law, we have $V = v\,dX/dx$, and differentiation with respect to $X$ will not give zero, because the factor $dX/dx$ isn't constant. This is the wrong answer: $V$ isn't really varying, it just appears to vary because $G$ does.

We want to add a correction term onto the derivative operator $d/dX$, forming a new derivative operator $\nabla_X$ that gives the right answer. $\nabla_X$ is called the covariant derivative. This correction term is easy to find if we consider what the result ought to be when differentiating the metric itself. In general, if a tensor appears to vary, it could vary either because it really does vary or because the metric varies. If the metric itself varies, it could be either because the metric really does vary or . . . because the metric varies. In other words, there is no sensible way to assign a nonzero covariant derivative to the metric itself, so we must have $\nabla_X G = 0$. The required correction therefore consists of replacing $d/dX$ with

$$\nabla_X = \frac{d}{dX} - G^{-1}\frac{dG}{dX} .$$

Applying this to $G$ gives zero. $G$ is a second-rank tensor with two lower indices. If we apply the same correction to the derivatives of other tensors of this type, we will get nonzero results, and they will be the right nonzero results.

j / Example 10.

Mathematically, the form of the derivative is $(1/y)\,dy/dx$, which is known as a logarithmic derivative, since it equals $d(\ln y)/dx$. It measures the multiplicative rate of change of $y$. For example, if $y$ scales up by a factor of $k$ when $x$ increases by 1 unit, then the logarithmic derivative of $y$ is $\ln k$. The logarithmic derivative of $e^{cx}$ is $c$. The logarithmic nature of the correction term to $\nabla_X$ is a good thing, because it lets us take changes of scale, which are multiplicative changes, and convert them to additive corrections to the derivative operator. The additivity of the corrections is necessary if the result of a covariant derivative is to be a tensor, since tensors are additive creatures.
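The two claims about logarithmic derivatives are easy to spot-check numerically. The following is only a minimal sketch: the helper `log_derivative`, the step size, and the sample values of `k` and `c` are illustrative assumptions, not anything defined in the text.

```python
import math

def log_derivative(f, x, h=1e-6):
    """Central-difference estimate of (1/f) df/dx, i.e. d(ln f)/dx."""
    return (f(x + h) - f(x - h)) / (2 * h * f(x))

k = 3.0   # hypothetical scale factor gained per unit of x
c = 0.7   # hypothetical exponent

def grows_by_k(x):
    # y scales up by a factor of k whenever x increases by 1
    return k ** x

def exponential(x):
    return math.exp(c * x)

# Each pair printed below should agree to within the finite-difference error.
print(log_derivative(grows_by_k, 2.0), math.log(k))
print(log_derivative(exponential, 2.0), c)
```

Because the logarithmic derivative of each function is the same at every $x$, the choice of the sample point $x = 2$ is arbitrary.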

What about quantities that are not second-rank covariant tensors? Under a rescaling of coordinates by a factor of $k$, covectors scale by $k^{-1}$, and second-rank tensors with two lower indices scale by $k^{-2}$. The correction term should therefore be half as much for covectors,

$$\nabla_X = \frac{d}{dX} - \frac{1}{2}G^{-1}\frac{dG}{dX} ,$$

and should have an opposite sign for vectors.
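As a sanity check on the one-dimensional argument, here is a small numerical sketch. It assumes one particular nonlinear change of coordinates, $X = e^x$, chosen only for illustration, and takes $g = 1$ in the original coordinate. With that choice $V = v\,dX/dx = vX$ and $G = (dx/dX)^2 = 1/X^2$. The function names are hypothetical.

```python
def d_dX(f, X, h=1e-5):
    """Central-difference estimate of df/dX."""
    return (f(X + h) - f(X - h)) / (2 * h)

v = 2.0  # constant vector component in the original coordinate x, where g = 1

# Assumed change of coordinates (for illustration only): X = exp(x), so dX/dx = X.
def V(X):
    """Transformed vector component, V = v dX/dx."""
    return v * X

def G(X):
    """Transformed metric, G = g (dx/dX)^2 with g = 1."""
    return 1.0 / X**2

def covariant_derivative_of_vector(X):
    """Ordinary derivative plus the correction (1/2) G^-1 dG/dX, with the sign for a vector."""
    correction = 0.5 * d_dX(G, X) / G(X)
    return d_dX(V, X) + correction * V(X)

X0 = 1.5
print("naive dV/dX:         ", d_dX(V, X0))                        # not zero
print("covariant derivative:", covariant_derivative_of_vector(X0))  # zero up to rounding
```

The naive derivative reports a spurious variation, while the corrected one vanishes to within finite-difference error, which is the behavior the correction term was designed to produce.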

Generalizing the correction term to derivatives of vectors in more than one dimension, we should have something of this form:

$$\nabla_a v^b = \partial_a v^b + \Gamma^b{}_{ac} v^c$$

$$\nabla_a v_b = \partial_a v_b - \Gamma^c{}_{ba} v_c ,$$

where $\Gamma^b{}_{ac}$, called the Christoffel symbol, does not transform like a tensor, and involves derivatives of the metric. ("Christoffel" is pronounced "Krist-AWful," with the accent on the middle syllable.) An important gotcha is that when we evaluate a particular component of a covariant derivative such as $\nabla_2 v^3$, it is possible for the result to be nonzero even if the component $v^3$ vanishes identically.

Christoffel symbols on the globe Example 10

As a qualitative example, consider the airplane trajectory shown in figure j, from London to Mexico City. This trajectory is the shortest one between these two points; such a minimum-length trajectory is called a geodesic. In physics it is customary to work with the colatitude, $\theta$, measured down from the north pole, rather than the latitude, measured from the equator. At P, over the North Atlantic, the plane's colatitude has a minimum. (We can see, without having to take it on faith from the figure, that such a minimum must occur. The easiest way to convince oneself of this is to consider a path that goes directly over the pole, at $\theta = 0$.)

At P, the plane's velocity vector points directly west. At Q, over New England, its velocity has a large component to the south. Since the path is a geodesic and the plane has constant speed, the velocity vector is simply being parallel-transported; the vector's covariant derivative is zero. Since we have $v^\theta = 0$ at P, the only way to explain the nonzero and positive value of $\partial_\phi v^\theta$ is that we have a nonzero and negative value of $\Gamma^\theta{}_{\phi\phi}$.

By symmetry, we can infer that $\Gamma^\theta{}_{\phi\phi}$ must have a positive value in the southern hemisphere, and must vanish at the equator.

Symmetry also requires that this Christoffel symbol be independent of $\phi$, and it must also be independent of the radius of the sphere.

k / Birdtracks notation for the covariant derivative.

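To compute the covariant derivative of a higher-rank tensor, we just add more correction terms, e.g.,

$$\nabla_a U_{bc} = \partial_a U_{bc} - \Gamma^d{}_{ba} U_{dc} - \Gamma^d{}_{ca} U_{bd}$$

or

$$\nabla_a U_b{}^c = \partial_a U_b{}^c - \Gamma^d{}_{ba} U_d{}^c + \Gamma^c{}_{ad} U_b{}^d .$$

With the partial derivative $\partial_a$, it does not make sense to use the metric to raise the index and form $\partial^a$. It does make sense to do so with covariant derivatives, so $\nabla^a = g^{ab}\nabla_b$ is a correct identity.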

9.4.1 Comma, semicolon, and birdtracks notation

Some authors use superscripts with commas and semicolons to indicate partial and covariant derivatives. The following equations give equivalent notations for the same derivatives:

$$\partial_\mu = \frac{\partial}{\partial x^\mu}$$

$$\partial_\mu X_\nu = X_{\nu,\mu}$$

$$\nabla_a X_b = X_{b;a}$$

$$\nabla^a X_b = X_b{}^{;a}$$

Figure k shows two examples of the corresponding birdtracks notation. Because birdtracks are meant to be manifestly coordinate-independent, they do not have a way of expressing non-covariant derivatives.

9.4.2 Finding the Christoffel symbol from the metric

We've already found the Christoffel symbol in terms of the metric in one dimension. Expressing it in tensor notation, we have

$$\Gamma^d{}_{ba} = \frac{1}{2} g^{cd}\left(\partial_? g_{??}\right) ,$$

where inversion of the one-component matrix $G$ has been replaced by matrix inversion, and, more importantly, the question marks indicate that there would be more than one way to place the subscripts so that the result would be a grammatical tensor equation. The most general form for the Christoffel symbol would be

$$\Gamma^b{}_{ac} = \frac{1}{2} g^{db}\left(L\,\partial_c g_{ad} + M\,\partial_a g_{cd} + N\,\partial_d g_{ca}\right) ,$$

where $L$, $M$, and $N$ are constants. Consistency with the one-dimensional expression requires $L + M + N = 1$. The condition $L = M$ arises on physical, not mathematical grounds; it reflects the fact that experiments have not shown evidence for an effect called torsion, in which vectors would rotate in a certain way when transported. The $L$ and $M$ terms have a different physical significance than the $N$ term.

Suppose an observer uses coordinates such that all objects are described as lengthening over time, and the change of scale accumulated over one day is a factor of $k > 1$. This is described by the derivative $\partial_t g_{xx} < 1$, which affects the $M$ term. Since the metric is used to calculate squared distances, the $g_{xx}$ matrix element scales down by $1/\sqrt{k}$. To compensate for $\partial_t v^x < 0$, we need to add a positive correction term, $M > 0$, to the covariant derivative. When the same observer measures the rate of change of a vector $v^t$ with respect to space, the rate of change comes out to be too small, because the variable she differentiates with respect to is too big. This requires $N < 0$, and the correction is of the same size as the $M$ correction, so $|M| = |N|$. We find $L = M = -N = 1$.

Self-check: Does the above argument depend on the use of space for one coordinate and time for the other?

The resulting general expression for the Christoffel symbol in terms of the metric is

$$\Gamma^c{}_{ab} = \frac{1}{2} g^{cd}\left(\partial_a g_{bd} + \partial_b g_{ad} - \partial_d g_{ab}\right) .$$

One can go back and check that this gives $\nabla_c g_{ab} = 0$.
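For readers who want to see the check written out, here is one way it can go. This is a sketch that assumes only the rule for two lower indices, $\nabla_a U_{bc} = \partial_a U_{bc} - \Gamma^d{}_{ba} U_{dc} - \Gamma^d{}_{ca} U_{bd}$, together with the symmetry of the metric and the fact that $g^{de} g_{db} = \delta^e{}_b$.

```latex
\begin{align*}
\nabla_c g_{ab}
  &= \partial_c g_{ab} - \Gamma^d{}_{ac}\, g_{db} - \Gamma^d{}_{bc}\, g_{ad} \\
\Gamma^d{}_{ac}\, g_{db}
  &= \tfrac{1}{2}\left(\partial_a g_{cb} + \partial_c g_{ab} - \partial_b g_{ac}\right) \\
\Gamma^d{}_{bc}\, g_{ad}
  &= \tfrac{1}{2}\left(\partial_b g_{ca} + \partial_c g_{ba} - \partial_a g_{bc}\right) \\
\nabla_c g_{ab}
  &= \partial_c g_{ab} - \tfrac{1}{2}\left(\partial_c g_{ab} + \partial_c g_{ba}\right) = 0
\end{align*}
```

In the last line the $\partial_a$ and $\partial_b$ terms cancel in pairs because $g_{ab} = g_{ba}$, and the two remaining terms add up to $\partial_c g_{ab}$.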

Self-check: In the case of 1 dimension, show that this reduces to the earlier result, a correction term of $-(1/2)G^{-1}\,dG/dX$ for covectors.

$\Gamma$ is not a tensor, i.e., it doesn't transform according to the tensor transformation rules. Since $\Gamma$ isn't a tensor, it isn't obvious that the covariant derivative, which is constructed from it, is tensorial. But if it isn't obvious, neither is it surprising: the goal of the above derivation was to get results that would be coordinate-independent.

Christoffel symbols on the globe, quantitatively Example 11

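In example 10 on page 168, we inferred the following properties for the Christoffel symbol $\Gamma^\theta{}_{\phi\phi}$ on a sphere of radius $R$: $\Gamma^\theta{}_{\phi\phi}$ is independent of $\phi$ and $R$, $\Gamma^\theta{}_{\phi\phi} < 0$ in the northern hemisphere (colatitude $\theta$ less than $\pi/2$), $\Gamma^\theta{}_{\phi\phi} = 0$ on the equator, and $\Gamma^\theta{}_{\phi\phi} > 0$ in the southern hemisphere.

The metric on a sphere is $ds^2 = R^2\,d\theta^2 + R^2 \sin^2\theta\,d\phi^2$. The only nonvanishing term in the expression for $\Gamma^\theta{}_{\phi\phi}$ is the one involving $\partial_\theta g_{\phi\phi} = 2R^2 \sin\theta \cos\theta$. The result is $\Gamma^\theta{}_{\phi\phi} = -\sin\theta\cos\theta$, which can be verified to have the properties claimed above.

l / The geodesic, 1, preserves tangency under parallel transport. The non-geodesic curve, 2, doesn't have this property; a vector initially tangent to the curve is no longer tangent to it when parallel-transported along it.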
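The same result can be checked by brute force. The sketch below evaluates the general formula $\Gamma^c{}_{ab} = \frac{1}{2} g^{cd}(\partial_a g_{bd} + \partial_b g_{ad} - \partial_d g_{ab})$ for the sphere, using finite differences for the derivatives of the metric. The function names, the step size, and the index convention (0 for $\theta$, 1 for $\phi$) are assumptions made for this illustration, and the inverse metric is computed in a way that only works for a diagonal metric.

```python
import math

THETA, PHI = 0, 1  # index convention assumed for this sketch

def sphere_metric(theta, phi, R):
    """Components g_ab of ds^2 = R^2 dtheta^2 + R^2 sin^2(theta) dphi^2."""
    return [[R**2, 0.0], [0.0, (R * math.sin(theta))**2]]

def metric_derivative(point, a, R, h=1e-5):
    """Central-difference estimate of partial_a g_bd, returned as a 2x2 list."""
    up, down = list(point), list(point)
    up[a] += h
    down[a] -= h
    g_up, g_down = sphere_metric(*up, R), sphere_metric(*down, R)
    return [[(g_up[b][d] - g_down[b][d]) / (2 * h) for d in range(2)]
            for b in range(2)]

def christoffel(point, R):
    """gamma[c][a][b] = (1/2) g^{cd} (d_a g_bd + d_b g_ad - d_d g_ab)."""
    g = sphere_metric(*point, R)
    g_inv = [[1.0 / g[0][0], 0.0], [0.0, 1.0 / g[1][1]]]  # valid for a diagonal metric only
    dg = [metric_derivative(point, a, R) for a in range(2)]  # dg[a][b][d] = d_a g_bd
    return [[[0.5 * sum(g_inv[c][d] * (dg[a][b][d] + dg[b][a][d] - dg[d][a][b])
                        for d in range(2))
              for b in range(2)]
             for a in range(2)]
            for c in range(2)]

for theta in (math.pi / 3, math.pi / 2, 2 * math.pi / 3):
    for R in (1.0, 7.0):
        gamma = christoffel((theta, 0.4), R)
        print(theta, R, gamma[THETA][PHI][PHI], -math.sin(theta) * math.cos(theta))
```

The printed pairs should agree for both radii, with a negative value in the northern hemisphere, zero at the equator, and a positive value in the southern hemisphere.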

9.4.3 The geodesic equation

In this section we show how the Christoffel symbols can be used to find differential equations that describe inertial motion. The world-line of a test particle is called a geodesic. We defined this term in a nonrelativistic context as the shortest curve between two points. Geodesics play the same role in relativity that straight lines play in Euclidean geometry. A relativistic geodesic minimizes or maximizes the metric distance between two events. A timelike geodesic maximizes the proper time (cf. section 2.4.2, p. 44). In special relativity, geodesics are given by linear equations when expressed in Minkowski coordinates, and the velocity vector of a test particle has constant components when expressed in Minkowski coordinates. In general relativity, Minkowski coordinates don't exist, and geodesics don't have the properties we expect based on Euclidean intuition; for example, initially parallel geodesics may later converge or diverge.

Characterization of the geodesic

A geodesic can be defined as a world-line that preserves tangency under parallel transport, figure l. This is essentially a mathematical way of expressing the notion that we have previously expressed more informally in terms of "staying on course" or moving "inertially."

A curve can be specified by giving functions $x^i(\lambda)$ for its coordinates, where $\lambda$ is a real parameter. A vector lying tangent to the curve can then be calculated using partial derivatives, $T^i = \partial x^i/\partial\lambda$. There are three ways in which a vector function of $\lambda$ could change: (1) it could change for the trivial reason that the metric is changing, so that its components changed when expressed in the new metric; (2) it could change its components perpendicular to the curve; or (3) it could change its component parallel to the curve. Possibility 1 should not really be considered a change at all, and the definition of the covariant derivative is specifically designed to be insensitive to this kind of thing. 2 cannot apply to $T^i$, which is tangent by construction. It would therefore be convenient if $T^i$ happened to be always the same length. If so, then 3 would not happen either, and we could reexpress the definition of a geodesic by saying that the covariant derivative of $T^i$ was zero. For this reason, we will assume for the remainder of this section that the parametrization of the curve has this property. In a Newtonian context, we could imagine the $x^i$ to be purely spatial coordinates, and $\lambda$ to be a universal time coordinate. We would then interpret $T^i$ as the velocity, and the restriction would be to a parametrization describing motion with constant speed. In relativity, the restriction is that $\lambda$ must be an affine parameter. For example, it could be the proper time of a particle, if the curve in question is timelike.

Covariant derivative with respect to a parameter

The notation of section 9.4 is not quite adapted to our present purposes, since it allows us to express a covariant derivative with respect to one of the coordinates, but not with respect to a parameter such as $\lambda$. We would like to notate the covariant derivative of $T^i$ with respect to $\lambda$ as $\nabla_\lambda T^i$, even though $\lambda$ isn't a coordinate. To connect the two types of derivatives, we can use a total derivative. To make the idea clear, here is how we calculate a total derivative for a scalar function $f(x, y)$, without tensor notation:

$$\frac{df}{d\lambda} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial\lambda} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial\lambda} .$$

This is just the generalization of the chain rule to a function of two variables. For example, if $\lambda$ represents time and $f$ temperature, then this would tell us the rate of change of the temperature as a thermometer was carried through space. Applying this to the present problem, we express the total covariant derivative as

$$\nabla_\lambda T^i = (\nabla_b T^i)\frac{dx^b}{d\lambda} = \left(\partial_b T^i + \Gamma^i{}_{bc} T^c\right)\frac{dx^b}{d\lambda} .$$

The geodesic equation

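Recognizing $\partial_b T^i\,dx^b/d\lambda$ as a total non-covariant derivative, we find

$$\nabla_\lambda T^i = \frac{dT^i}{d\lambda} + \Gamma^i{}_{bc} T^c \frac{dx^b}{d\lambda} .$$

Substituting $\partial x^i/\partial\lambda$ for $T^i$, and setting the covariant derivative equal to zero, we obtain

$$\frac{d^2 x^i}{d\lambda^2} + \Gamma^i{}_{bc}\frac{dx^c}{d\lambda}\frac{dx^b}{d\lambda} = 0 .$$

This is known as the geodesic equation.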

If this differential equation is satisfied for one affine parameter $\lambda$, then it is also satisfied for any other affine parameter $\lambda' = a\lambda + b$, where $a$ and $b$ are constants (problem 4, p. 174). Recall that affine parameters are only defined along geodesics, not along arbitrary curves. We can't start by defining an affine parameter and then use it to find geodesics using this equation, because we can't define an affine parameter without first specifying a geodesic. Likewise, we can't do the geodesic first and then the affine parameter, because if we already had a geodesic in hand, we wouldn't need the differential equation in order to find a geodesic. The solution to this chicken-and-egg conundrum is to write down the differential equations and try to find a solution, without trying to specify either the affine parameter or the geodesic in advance.

The geodesic equation is useful in establishing one of the necessary theoretical foundations of relativity, which is the uniqueness of geodesics for a given set of initial conditions. If the geodesic were not uniquely determined, then particles would have no way of deciding how to move. The form of the geodesic equation guarantees uniqueness, because one can use it to define an algorithm that constructs a geodesic for a given set of initial conditions.
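One possible version of such an algorithm is sketched below for the unit sphere, where the nonvanishing Christoffel symbols are $\Gamma^\theta{}_{\phi\phi} = -\sin\theta\cos\theta$ and $\Gamma^\phi{}_{\theta\phi} = \Gamma^\phi{}_{\phi\theta} = \cot\theta$. The choice of a fourth-order Runge-Kutta stepper, the step size, and all function names are assumptions made for illustration; any standard method for stepping a second-order differential equation forward from initial conditions would serve equally well.

```python
import math

def geodesic_rhs(state):
    """Geodesic equation on the unit sphere; state = (theta, phi, dtheta/dlam, dphi/dlam)."""
    theta, phi, theta_dot, phi_dot = state
    theta_ddot = math.sin(theta) * math.cos(theta) * phi_dot**2              # -Gamma^theta_{phi phi} term
    phi_ddot = -2.0 * math.cos(theta) / math.sin(theta) * theta_dot * phi_dot  # -2 Gamma^phi_{theta phi} term
    return (theta_dot, phi_dot, theta_ddot, phi_ddot)

def rk4_step(state, dlam):
    """Advance the state by one step of the affine parameter (classic Runge-Kutta)."""
    def nudge(k, fraction):
        return tuple(s + fraction * dlam * ki for s, ki in zip(state, k))
    k1 = geodesic_rhs(state)
    k2 = geodesic_rhs(nudge(k1, 0.5))
    k3 = geodesic_rhs(nudge(k2, 0.5))
    k4 = geodesic_rhs(nudge(k3, 1.0))
    return tuple(s + dlam / 6.0 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def integrate_geodesic(state, dlam, steps):
    """Construct the geodesic determined by the initial position and velocity."""
    path = [state]
    for _ in range(steps):
        state = rk4_step(state, dlam)
        path.append(state)
    return path

def to_cartesian(theta, phi):
    return (math.sin(theta) * math.cos(phi), math.sin(theta) * math.sin(phi), math.cos(theta))

def plane_normal(state):
    """Normal to the plane through the origin containing the position and velocity."""
    theta, phi, theta_dot, phi_dot = state
    r = to_cartesian(theta, phi)
    e_theta = (math.cos(theta) * math.cos(phi), math.cos(theta) * math.sin(phi), -math.sin(theta))
    e_phi = (-math.sin(phi), math.cos(phi), 0.0)
    vel = tuple(theta_dot * a + math.sin(theta) * phi_dot * b for a, b in zip(e_theta, e_phi))
    return (r[1] * vel[2] - r[2] * vel[1], r[2] * vel[0] - r[0] * vel[2], r[0] * vel[1] - r[1] * vel[0])

def speed_squared(state):
    theta, _, theta_dot, phi_dot = state
    return theta_dot**2 + (math.sin(theta) * phi_dot)**2

start = (1.0, 0.2, 0.3, 0.5)   # hypothetical initial conditions, away from the poles
path = integrate_geodesic(start, 0.001, 2000)
normal = plane_normal(start)
worst = max(abs(sum(a * b for a, b in zip(to_cartesian(s[0], s[1]), normal))) for s in path)
print("largest deviation from the great-circle plane:", worst)
print("change in squared speed:", speed_squared(path[-1]) - speed_squared(path[0]))
```

Both printed quantities should be zero to within the integration error: the curve stays on a great circle, and the tangent vector keeps a constant length, as expected for an affine parametrization. The coordinates themselves are singular at the poles, so this sketch should not be trusted for trajectories that pass through them.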


Problems

1 Rewrite the stress-energy tensor of a perfect fluid in SI units. For air at sea level, compare the sizes of its components.

2 Prove by direct computation that if a rank-2 tensor is symmetric when expressed in one Minkowski frame, the symmetry is preserved under a boost.

3 Consider the following change of coordinates:

$$t' = -t \qquad x' = x \qquad y' = y \qquad z' = z$$

This is called a time reversal. As in example 6 on p. 155, find the effect on the stress-energy tensor.

4 Show that if the differential equation for geodesics on page 172 is satisfied for one affine parameter $\lambda$, then it is also satisfied for any other affine parameter $\lambda' = a\lambda + b$, where $a$ and $b$ are constants.

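5 This problem investigates a notational conflict in the description of the metric tensor using index notation. Suppose that we have two different metrics, $g_{\mu\nu}$ and $g'_{\mu\nu}$. The difference of two rank-2 tensors is also a rank-2 tensor, so we would like the quantity $\delta g_{\mu\nu} = g'_{\mu\nu} - g_{\mu\nu}$ to be a well-behaved tensor both in its transformation properties and in its behavior when we manipulate its indices. Now we also have $g^{\mu\nu}$ and $g'^{\mu\nu}$, which are defined as the matrix inverses of their lower-index counterparts; this is a special property of the metric, not of rank-2 tensors in general. We can then define $\delta g^{\mu\nu} = g'^{\mu\nu} - g^{\mu\nu}$. (a) Use a simple example to show that $\delta g_{\mu\nu}$ and $\delta g^{\mu\nu}$ cannot be computed from one another in the usual way by raising and lowering indices. (b) Find the general relationship between $\delta g^{\mu\nu}$ and $\delta g_{\mu\nu}$.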


a / A model of a charged particle and a current-carrying wire, seen in two different frames of reference. The relativistic length contraction is highly exaggerated. The force on the lone particle is purely magnetic in 1, and purely electric in 2.

Chapter 10

Electromagnetism

10.1 Relativity requires magnetism

Figure a/1 is an unrealistic model of a charged particle moving parallel to a current-carrying wire. What electrical force does the lone particle in figure a/1 feel? Since the density of "traffic" on the two sides of the "road" is equal, there is zero overall electrical force on the lone particle. Each "car" that attracts the lone particle is paired with a partner on the other side of the road that repels it. If we didn't know about magnetism, we'd think this was the whole story: the lone particle feels no force at all from the wire.

Figure a/2 shows what we'd see if we were observing all this from a frame of reference moving along with the lone charge. Relativity tells us that moving objects appear contracted to an observer who is not moving along with them. Both lines of charge are in motion in both frames of reference, but in frame 1 they were moving at equal speeds, so their contractions were equal. In frame 2, however, their speeds are unequal. The dark charges are moving more slowly than in frame 1, so in frame 2 they are less contracted. The light-colored charges are moving more quickly, so their contraction is greater now. The "cars" on the two sides of the "road" are no longer paired off, so the electrical forces on the lone particle no longer cancel out as they did in a/1. The lone particle is attracted to the wire, because the particles attracting it are more dense than the ones repelling it.
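The argument can be made quantitative with nothing more than velocity addition and length contraction. The sketch below is one way of doing so, under assumptions that the figure does not state: the two lines of charge have equal and opposite densities of magnitude `lam` in frame 1, the positive line moves at $+u$ and the negative line at $-u$, the lone particle moves at $v$ in the same direction as the positive line, and units are chosen with $c = 1$. All names and sample values are hypothetical.

```python
import math

def gamma(speed):
    """Lorentz factor in units with c = 1."""
    return 1.0 / math.sqrt(1.0 - speed**2)

def velocity_seen_from(u, v):
    """Velocity of an object moving at u, as measured in a frame moving at v (same axis)."""
    return (u - v) / (1.0 - u * v)

def net_density_in_particle_frame(lam, u, v):
    """Net charge per unit length in frame 2, the rest frame of the lone particle."""
    rest_density = lam / gamma(u)             # density of each line in its own rest frame
    u_plus = velocity_seen_from(+u, v)        # positive line, slower in frame 2
    u_minus = velocity_seen_from(-u, v)       # negative line, faster in frame 2
    return rest_density * (gamma(u_plus) - gamma(u_minus))

lam, u, v = 1.0, 0.3, 0.6                     # hypothetical sample values
current = 2.0 * lam * u                       # current in frame 1, where the wire is neutral
print("net density in frame 2:", net_density_in_particle_frame(lam, u, v))
print("-gamma(v) * v * current:", -gamma(v) * v * current)
```

The two printed numbers should agree. The net density in frame 2 comes out proportional to the current in frame 1 and to the particle's velocity, which is the dependence one expects of a magnetic force, and its negative sign means that a positive lone charge is attracted to the wire.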

Now observers in frames 1 and 2 disagree about many things, but they do agree on concrete events. Observer 2 is going to see the lone particle drift toward the wire due to the wire's electrical attraction, gradually speeding up, and eventually hit the wire. If 2 sees this collision, then 1 must as well. But 1 knows that the total electrical force on the lone particle is exactly zero. There must be some new type of force. She invents a name for it: magnetism.

Magnetism is a purely relativistic effect. Since relativistic effects are down by a factor of $v^2$ compared to Newtonian ones, it's surprising that relativity can produce an effect as vigorous as the attraction between a magnet and your refrigerator. The explanation is that although matter is electrically neutral, the cancellation of electrical forces between macroscopic objects is extremely delicate, so anything that throws off the cancellation, even slightly, leads to a surprisingly large force.


b / Fields carry energy.

10.2 Fields in relativity

Based on what we learned in section 10.1, the next natural step would seem to be to find some way of extending Coulomb's law to include magnetism. For example, we could try to find a formula for the magnetic force between charges $q_1$ and $q_2$ based on not just their relative positions but also on their velocities. The following considerations, however, tell us not to go down that path.

10.2.1 Time delays in forces exerted at a distance

Relativity forbids Newton's instantaneous action at a distance (p. 17). Since forces can't be transmitted instantaneously, it becomes natural to imagine force-effects spreading outward from their source like ripples on a pond, and we then have no choice but to impute some physical reality to these ripples. We call them fields, and they have their own independent existence.

Even empty space, then, is not perfectly featureless. It has measurable properties. For example, we can drop a rock in order to measure the direction of the gravitational field, or use a magnetic compass to find the direction of the magnetic field. This concept made a deep impression on Einstein as a child. He recalled that as a five-year-old, the gift of a magnetic compass convinced him that there was "something behind things, something deeply hidden."

10.2.2 Fields carry energy.

The smoking-gun argument for this strange notion of traveling force ripples comes from the fact that they carry energy. In figure b/1, Alice and Betty hold positive charges A and B at some distance from one another. If Alice chooses to move her charge closer to Betty's, b/2, Alice will have to do some mechanical work against the electrical repulsion, burning off some of the calories from that chocolate cheesecake she had at lunch. This reduction in her body's chemical energy is offset by a corresponding increase in the electrical potential energy $q\Delta V$. Not only that, but Alice feels the resistance stiffen as the charges get closer together and the repulsion strengthens. She has to do a little extra work, but this is all properly accounted for in the electrical potential energy.

But now suppose, b/3, that Betty decides to play a trick on Alice by tossing charge B far away just as Alice is getting ready to move charge A. We have already established that Alice can't feel charge B's motion instantaneously, so the electric forces must actually be propagated by an electric field. Of course this experiment is utterly impractical, but suppose for the sake of argument that the time it takes the change in the electric field to propagate across the diagram is long enough so that Alice can complete her motion before she feels the effect of B's disappearance. She is still getting stale information about B's position. As she moves A to the right, she feels a repulsion, because the field in her region of space is still the field caused by B in its old position. She has burned some chocolate cheesecake calories, and it appears that conservation of energy has been violated, because these calories can't be properly accounted for by any interaction with B, which is long gone.

If we hope to preserve the law of conservation of energy, then the only possible conclusion is that the electric field itself carries away the cheesecake energy. In fact, this example represents an impractical method of transmitting radio waves. Alice does work on charge A, and that energy goes into the radio waves. Even if B had never existed, the radio waves would still have carried energy, and Alice would still have had to do work in order to create them.

10.2.3 Fields must have transformation laws

In the foregoing discussion I've been guilty of making arguments that fields were "real." Sorry. In physics, and particularly in relativity, it's usually a waste of time worrying about whether some effect such as length contraction is "real" or only "seems that way." But thinking of fields as having an independent existence does lead to a useful guiding principle, which is that fields must have transformation laws. Suppose that at a certain location, observer $o_1$ measures every possible field: electric, magnetic, bodice-ripper-sexual-attractional, and so on. (The gravitational field is not on the list, for the reasons discussed in section 5.2.) Observer $o_2$, passing by the same event but in a different state of motion, could carry out similar measurements. We're talking about measurements being carried out on a cubic inch of pure vacuum, but suppose that the answer to Peggy Lee's famous question is "Yes, that's all there is": the only information there is to know about that empty parcel of nothingness is the (frame-dependent) value of the fields it contains. Then $o_1$ ought to be able to predict the results of $o_2$'s measurements. For if not, then what is the nature of the information that is hidden from $o_1$ but revealed to $o_2$? Presumably this would be something related to how the fields were produced by certain particles long ago and far away. For example, maybe $o_1$ is at rest relative to a certain charge $q$ that helped to create the fields, but $o_2$ isn't, so $o_2$ picks up $q$'s magnetic field, which is information unavailable to $o_1$, who thinks $q$ was at rest, and therefore didn't make any magnetic field. This would contradict our "that's all there is" hypothesis.

To show the power of "that's all there is," consider example 1, p. 150, in which we found that boosting a solenoid along its own axis doesn't change its internal field. As a fact about solenoids, it's fairly obscure and useless. But if the fields must have transformation laws, then we've learned something much more general: a magnetic field always stays the same under a boost in the direction of the field.


10.3 Electromagnetic fields

10.3.1 The electric field

Section 10.1 showed that relativity requires magnetic forces to exist, and section 10.2.3 gave us a peek at what this implies about how electric and magnetic fields transform. To understand this on a more general basis, let's explicitly list some assumptions about the electric field and see how they lead to the existence and properties of a magnetic field:

1. Definition of the electric field: In the frame of reference of an inertial observer $o$, take some standard, charged test particle, release it at rest, and observe the force $F_o$ (section 4.5, p. 86) acting on the particle. (The timelike component of this force vanishes.) Then the electric field three-vector $E$ in frame $o$ is defined by $F_o = qE$, where we fix our system of units by taking some arbitrary value for the charge $q$ of the test particle.

2. Definition of electric charge: For charges other than the standard test charge, we take Gauss's law to be our definition of electric charge.

3. Charge is Lorentz invariant (p. 22).

4. Fields must have transformation laws (section 10.2.3).

Many times already in our study of relativity, we've followed the strategy of taking a Galilean vector and trying to redefine it as a four-dimensional vector in relativity. Let's try to do this with the electric field. Then we would have no other obvious thing to try than to change its definition to $F = qE$, where $F = ma$ is the relativistic force vector (section 4.5, p. 86), so that the electric field three-vector was just the spacelike part of $E$. Because $a \cdot v = 0$ for a material particle, this would imply that $E$ was orthogonal to $o$ for any observer $o$. But this is impossible, since then a spacetime displacement vector $s$ along the direction of $E$ would be a vector of simultaneity for all observers, and we know that this isn't possible in relativity.

10.3.2 The magnetic field

Our situation is very similar to the one encountered in section 9.1, p. 149, where we found that knowledge of the charge density in one frame was insufficient to tell us the charge density in other frames. There was missing information, which turned out to be the current density. The problems we've encountered in defining the transformation properties of the electric field suggest a similar "missing-information" situation, and it seems likely that the missing information is the magnetic field. How should we modify the assumptions on p. 178 to allow for the existence of a magnetic field in addition to the electric one? What properties could this additional field have? How would we define or measure it?

One way of imagining a new type of field would be if, in addition to charge $q$, particles had some other characteristic, call it $r$, and there would then be some entirely separate field defined by their action on a particle with this "r-ness." But going down this road leads us to unrelated phenomena such as the strong nuclear interaction.

The nature of the contradiction arrived at in section 10.3.1 is such that our additional field is closely linked to the electric one, and therefore we expect it to act on charge, not on r-ness. Without inventing something new like r-ness, the only other available property of the test particle is its state of motion, characterized by its velocity vector $v$. Now the simplest rule we could imagine for determining the force on a test particle would be a linear one, which would look like matrix multiplication:

$$F = qTv$$

or in index notation,

$$F^a = qT^a{}_b v^b .$$

Although the form $T^a{}_b$ with one upper and one lower index occurs naturally in this expression, we'll find it more convenient from now on to work with the upper-upper form $T^{ab}$. $T$ would be $4 \times 4$, so it would have 16 elements:

$$\begin{pmatrix}
T^{tt} & T^{tx} & T^{ty} & T^{tz} \\
T^{xt} & T^{xx} & T^{xy} & T^{xz} \\
T^{yt} & T^{yx} & T^{yy} & T^{yz} \\
T^{zt} & T^{zx} & T^{zy} & T^{zz}
\end{pmatrix}$$

Presumably these 16 numbers would encode the information about the electric field, as well as some additional information about the field or fields we were missing.

But these are not 16 numbers that we can choose freely and independently. For example, consider a charged particle that is instantaneously at rest in a certain observer's frame, with $v = (1, 0, 0, 0)$. (In this situation, the four-force equals the force measured by the observer.) The work done by a force is positive if the force is in the same direction as the motion, negative if in the opposite direction, and zero if there is no motion. Therefore the power $P = dW/dt$ in this example should be zero. Power is the timelike component of the force vector, which forces us to take $T^{tt} = 0$.

More generally, consider the kinematical constraint $a \cdot v = 0$ (p. 60). When we require $a \cdot v = 0$ for any $v$, not just this one, we end up with the constraint that $T$ must be antisymmetric, meaning that when we transpose it, the result is another matrix that looks just like the original one, but with all the signs flipped:

$$\begin{pmatrix}
0 & T^{tx} & T^{ty} & T^{tz} \\
-T^{tx} & 0 & T^{xy} & T^{xz} \\
-T^{ty} & -T^{xy} & 0 & T^{yz} \\
-T^{tz} & -T^{xz} & -T^{yz} & 0
\end{pmatrix}$$

Each element equals minus the corresponding element across the main diagonal from it, and antisymmetry also requires that the main diagonal itself be zero. In terms of the concept of degrees of freedom introduced in section 3.5.3, p. 58, we are down to 6 degrees of freedom rather than 16.

We now relabel the elements of the matrix and follow up with a justification of the relabeling. The result is the following rank-2 tensor:

$$T^{ab} = \begin{pmatrix}
0 & -E_x & -E_y & -E_z \\
E_x & 0 & -B_z & B_y \\
E_y & B_z & 0 & -B_x \\
E_z & -B_y & B_x & 0
\end{pmatrix} \qquad (1)$$

We'll call this the electromagnetic field tensor. The labeling of the left column simply expresses the definition of the electric field, which is expressed in terms of the velocity $v = (1, 0, 0, 0)$ of a particle at rest. The top row then follows from antisymmetry. For an arbitrary velocity vector, writing out the matrix multiplication $F^a = qT^a{}_b v^b$ results in expressions such as $F^x = \gamma q(E_x + u_y B_z - u_z B_y)$ (problem 3, p. 193). Taking into account the difference of a factor of $\gamma$ between the four-force and the force measured by an observer, we end up with the familiar Lorentz force law,

$$F_o = q(E + u \times B) ,$$

where $B$ is the magnetic field. This is expressed in units where $c = 1$, so that the electric and magnetic field have the same units. In units with $c \ne 1$, the magnetic components of the electromagnetic field matrix should be multiplied by $c$.

Thus starting only from the assumptions on p. 178, we deduce that the electric field must be accompanied by a magnetic field.
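The matrix multiplication can be carried out numerically as a check on the signs. The sketch below assumes a metric with signature $+---$ for lowering the index on the velocity, so that $F^a = qT^{ab}v_b$, and it places the minus signs in the tensor in the way required by antisymmetry and by the Lorentz force law. The function names and sample field values are hypothetical.

```python
import math

METRIC_SIGNS = (1.0, -1.0, -1.0, -1.0)   # assumed signature + - - -, order (t, x, y, z)

def field_tensor(E, B):
    """Upper-index electromagnetic field tensor T^{ab} in units with c = 1."""
    Ex, Ey, Ez = E
    Bx, By, Bz = B
    return [
        [0.0, -Ex, -Ey, -Ez],
        [Ex,  0.0, -Bz,  By],
        [Ey,  Bz,  0.0, -Bx],
        [Ez, -By,  Bx,  0.0],
    ]

def four_force(q, E, B, u):
    """Return (F^a, gamma, v^a) for a charge q with three-velocity u, using F^a = q T^{ab} v_b."""
    g = 1.0 / math.sqrt(1.0 - sum(c * c for c in u))
    v_upper = (g, g * u[0], g * u[1], g * u[2])
    v_lower = [sign * c for sign, c in zip(METRIC_SIGNS, v_upper)]
    T = field_tensor(E, B)
    F = [q * sum(T[a][b] * v_lower[b] for b in range(4)) for a in range(4)]
    return F, g, v_upper

def lorentz_force(q, E, B, u):
    """Three-force q(E + u x B) measured by the observer."""
    cross = (u[1] * B[2] - u[2] * B[1],
             u[2] * B[0] - u[0] * B[2],
             u[0] * B[1] - u[1] * B[0])
    return [q * (E[i] + cross[i]) for i in range(3)]

q, E, B, u = 2.0, (0.3, -0.1, 0.5), (0.2, 0.4, -0.6), (0.1, -0.3, 0.5)
F, g, v = four_force(q, E, B, u)
print("spatial four-force / gamma:", [F[i] / g for i in (1, 2, 3)])
print("q(E + u x B):              ", lorentz_force(q, E, B, u))
print("F . v:", sum(s * f * c for s, f, c in zip(METRIC_SIGNS, F, v)))
```

The first two lines of output should match, and the inner product $F \cdot v$ should vanish, which is the kinematical constraint that forced the tensor to be antisymmetric in the first place.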

Parity properties of E and B Example 1

In example 6 on p. 155, we saw that under the parity transformation $(t, x, y, z) \to (t, -x, -y, -z)$, any rank-2 tensor expressed in Minkowski coordinates changes the signs of its components according to the same rule:

$$\begin{pmatrix}
\text{no flip} & \text{flip} & \text{flip} & \text{flip} \\
\text{flip} & \text{no flip} & \text{no flip} & \text{no flip} \\
\text{flip} & \text{no flip} & \text{no flip} & \text{no flip} \\
\text{flip} & \text{no flip} & \text{no flip} & \text{no flip}
\end{pmatrix} .$$

Since this holds for the electromagnetic field tensor $T$, we find that under parity, $E \to -E$ and $B \to B$. For example, a capacitor seen in a mirror has its electric field pointing the opposite way, but there is no change in the magnetic field of a current loop, since the location of each current element is flipped to the other side of the loop, but its direction of flow is also reversed, so that the picture as a whole remains unchanged.

c / Example 2.

10.3.3 What about gravity?

A funny puzzle pops up if we go back and think about the assumptions on p. 178 that went into all this. Those assumptions were so general that it almost seems as though the only possible behavior for fields is the behavior of electric and magnetic fields. But other fields do behave differently. How did the assumptions fail in the case of gravity, for example? Gauss's law (assumption 2) certainly holds for gravity. But the source of gravitational fields isn't charge, it's mass-energy, and mass-energy isn't a Lorentz invariant, contrary to assumption 3. Furthermore, assumption 1 entailed that our field could be defined in terms of forces measured by an inertial observer, but for an inertial observer gravity doesn't exist (section 5.2).

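10.4 Transformation of the fields

Since we have associated the components of the electric and magnetic fields with elements of a rank-2 tensor, the transformation law for these fields now follows from the general tensor transformation law for rank-2 tensors (p. 154). We first state the general rule, in a prettified form, and then give some concrete examples. Under a boost by a three-velocity $\mathbf{v}$, the electric and magnetic fields $\mathbf{E}$ and $\mathbf{B}$ transform to $\mathbf{E}'$ and $\mathbf{B}'$, whose parts parallel ($\parallel$) and perpendicular ($\perp$) to $\mathbf{v}$ are

$$
\begin{aligned}
\mathbf{E}'_\parallel &= \mathbf{E}_\parallel \\
\mathbf{E}'_\perp &= \gamma(\mathbf{E}_\perp + \mathbf{v}\times\mathbf{B}) \\
\mathbf{B}'_\parallel &= \mathbf{B}_\parallel \\
\mathbf{B}'_\perp &= \gamma(\mathbf{B}_\perp - \mathbf{v}\times\mathbf{E})
\end{aligned}
$$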

A line of charge Example 2

Figure c/1 shows a line of charges. At a given nearby point, it creates an electric field $\mathbf{E}$ that points outward, as measured by an observer $o$ who is at rest relative to the charges. This field is represented in the figure by its pattern of field lines, which start on the charges and radiate outward like the bristles of a bottle brush. Because the charges are at rest, the magnetic field is zero. (Finding the magnitude of the field at a certain distance is a straightforward application of Gauss's law.)

Now consider an observer $o'$ moving at velocity $\mathbf{v}$ to the right relative to $o$. Without even worrying about how the field was created, we can transform the fields, at the point in space discussed previously, into the new frame. The result is $\mathbf{E}' = \gamma\mathbf{E}$ and $\mathbf{B}' = -\gamma\mathbf{v}\times\mathbf{E}$. The electric field is more intense, and there is also a magnetic field, whose pattern of white field lines forms circles lying in planes perpendicular to the line.

If we do happen to know that the field was created by the line of charge, which is moving according to $o'$, then we can explain these results as arising from two effects. First, the line of charge has been length-contracted. This causes the density of charge per unit length to increase by a factor of $\gamma$, with a proportional increase in the electric field. In the field-line description, we simply have more charges in the figure, so there are more field lines coming out of them. Second, the line of charge is moving to the left in this frame, so it forms an electric current, and this current is the cause of the magnetic field $\mathbf{B}'$.
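The line-of-charge result can be reproduced by coding the transformation rule directly. This is a sketch under stated assumptions: the rule is taken to be $\mathbf{E}'_\parallel = \mathbf{E}_\parallel$, $\mathbf{E}'_\perp = \gamma(\mathbf{E}_\perp + \mathbf{v}\times\mathbf{B})$, $\mathbf{B}'_\parallel = \mathbf{B}_\parallel$, $\mathbf{B}'_\perp = \gamma(\mathbf{B}_\perp - \mathbf{v}\times\mathbf{E})$ in units with $c = 1$; the line of charge is placed along the $x$ axis with the field point on the $+y$ side; and the field strength 1 and speed 0.6 are arbitrary illustrative values.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def boost_fields(E, B, v):
    """Fields seen in a frame moving at nonzero three-velocity v (c = 1)."""
    v2 = dot(v, v)
    gamma = 1.0 / math.sqrt(1.0 - v2)

    def split(X):
        # Decompose X into parts parallel and perpendicular to v.
        par = tuple(dot(X, v) / v2 * vi for vi in v)
        perp = tuple(x - p for x, p in zip(X, par))
        return par, perp

    E_par, E_perp = split(E)
    B_par, B_perp = split(B)
    vxB, vxE = cross(v, B), cross(v, E)
    E_new = tuple(E_par[i] + gamma * (E_perp[i] + vxB[i]) for i in range(3))
    B_new = tuple(B_par[i] + gamma * (B_perp[i] - vxE[i]) for i in range(3))
    return E_new, B_new

# Observer o: purely electric field pointing away from the line, no magnetic field.
E = (0.0, 1.0, 0.0)
B = (0.0, 0.0, 0.0)
v = (0.6, 0.0, 0.0)  # observer o' moves to the right along the line

E_new, B_new = boost_fields(E, B, v)
print(E_new)  # electric field strengthened by gamma (about 1.25 here)
print(B_new)  # magnetic field -gamma v x E, circling the line
```

With these numbers the magnetic field points in the $-z$ direction at the chosen point, which is the direction expected from the right-hand rule for positive charges streaming to the left.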

10.5 Invariants

We've seen cases before in which an invariant can be formed from a rank-1 tensor. The square of the proper time corresponding to a timelike spacetime displacement $r$ is $r\cdot r$ or, in the index notation introduced in section ??, $r^a r_a$. From the momentum tensor we can construct the square of the mass $p^a p_a$.

There are good reasons to believe that something similar can be done with the electromagnetic field tensor, since electromagnetic fields have certain properties that are preserved when we switch frames. Specifically, an electromagnetic wave consists of electric and magnetic fields that are equal in magnitude and perpendicular to one another. An electromagnetic wave that is a valid solution to Maxwell's equations in one frame should also be a valid wave in another frame. It can be shown that the following two quantities are invariants:

$$P = B^2 - E^2$$

and

$$Q = \mathbf{E}\cdot\mathbf{B}$$

The first of these can also be expressed as $P = \frac{1}{2}\mathcal{F}^{ab}\mathcal{F}_{ab}$, where $\mathcal{F}$ is the electromagnetic field tensor, while the second equals $Q = \frac{1}{4}\epsilon^{abcd}\mathcal{F}_{ab}\mathcal{F}_{cd}$, where the Levi-Civita symbol $\epsilon^{abcd}$ equals $\pm 1$ when $abcd$ is an even or odd permutation of $txyz$, and 0 otherwise. For an electromagnetic wave, both $P$ and $Q$ are zero. (The converse is false: a field for which both invariants vanish need not be an electromagnetic wave.)
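A quick numerical check of the two invariants is sketched below. It assumes the standard transformation of the fields under a boost along $x$ in units with $c = 1$, namely $E'_x = E_x$, $E'_y = \gamma(E_y - vB_z)$, $E'_z = \gamma(E_z + vB_y)$, $B'_x = B_x$, $B'_y = \gamma(B_y + vE_z)$, $B'_z = \gamma(B_z - vE_y)$. The function names and sample field values are hypothetical.

```python
import math

def boost_x(E, B, v):
    """Transform the fields to a frame moving at speed v along x (c = 1)."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    Ex, Ey, Ez = E
    Bx, By, Bz = B
    E_new = (Ex, g * (Ey - v * Bz), g * (Ez + v * By))
    B_new = (Bx, g * (By + v * Ez), g * (Bz - v * Ey))
    return E_new, B_new

def invariants(E, B):
    """Return P = B^2 - E^2 and Q = E . B."""
    P = sum(b * b for b in B) - sum(e * e for e in E)
    Q = sum(e * b for e, b in zip(E, B))
    return P, Q

# Generic fields: P and Q should be unchanged by the boost.
E, B = (0.3, -1.2, 0.7), (0.5, 0.4, -0.9)
P0, Q0 = invariants(E, B)
P1, Q1 = invariants(*boost_x(E, B, 0.8))
print(P0, P1)
print(Q0, Q1)

# Plane wave moving along x: E along y, B along z, equal magnitudes.
wave_E, wave_B = (0.0, 2.0, 0.0), (0.0, 0.0, 2.0)
Pw, Qw = invariants(*boost_x(wave_E, wave_B, 0.5))
print(Pw, Qw)  # both should vanish in the new frame as well
```

The boosted wave has a smaller amplitude (it is Doppler shifted), but its fields remain equal in magnitude and perpendicular, which is the content of $P = Q = 0$.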

10.6 Stress-energy tensor of the electromagnetic field

The electromagnetic field has a stress-energy tensor associated with it. From our study of electromagnetism we know that the electromagnetic field has energy density $U = (E^2 + B^2)/8\pi k$ and momentum density $\mathbf{S} = (\mathbf{E}\times\mathbf{B})/4\pi k$ (in units where $c = 1$, with $k$ being the Coulomb constant). This fixes the components of the stress-energy tensor of the form $T^{t\ldots}$ and $T^{\ldots t}$, i.e., the top row and left column, to look like this:

$$
T^{\mu\nu} =
\begin{pmatrix}
U & S_x & S_y & S_z \\
S_x & & & \\
S_y & & & \\
S_z & & &
\end{pmatrix} .
$$

d / Pressure and tension in electrostatic fields.

The following argument tells us something about what to expect for the components $T^{xx}$, $T^{yy}$, and $T^{zz}$, which are interpreted as pressures or tensions, depending on their signs. In figure d/1, the capacitor plates want to collapse against each other in the vertical ($y$) direction, but at the same time the internal repulsions within each plate make that plate want to expand in the $x$ direction. If the capacitor is built out of materials that hold their shape, then the electromagnetic tension $T^{yy} < 0$ is counteracted by pressure $T^{yy} > 0$ in the materials, while the electromagnetic pressure $T^{xx} > 0$ is canceled by the materials' tension $T^{xx} < 0$. We got these results for a particular physical situation, but relativity requires that the stress-energy be defined at every point based on the fields at that point, so our conclusions must hold generally. In d/2 and d/3, white boxes have been drawn in regions where the total field is strong and the fields are strongly interacting. In 2, there is tension in the $x$ direction and pressure in $y$; the tension can be thought of as contributing to the attraction between the opposite charges. In 3, there is also $x$ tension and $y$ pressure; the pressure contributes to the like charges' repulsion.

To make this more quantitative, consider the discontinuity in $E_y$ at the upper plate in figure d/1. The field abruptly switches from 0 on the outside to some value $E$ between the plates. By Gauss's law, the charge per unit area on the plate must be $\sigma = E/4\pi k$. The average field experienced by the charge in the plate is $\bar{E} = (0 + E)/2 = E/2$, so the force per unit area, i.e., the tension in the field, is $\sigma\bar{E} = E^2/8\pi k$. Thus we expect $T^{yy} = -E^2/8\pi k$ if $\mathbf{E}$ is along the $y$ axis.

For the reader who wants the full derivation of the remaining

nine components of the tensor, we now give an argument that makes

use of the following list of its properties. Other readers can skip

ahead to where the full tensor is presented.

1. $T$ is symmetric, $T^{\mu\nu} = T^{\nu\mu}$.

2. The components must be second-order in the fields, e.g., we can have terms like $E_x B_z$, but not $E_x^3 B_z^7$ or $E_x B_z B_y$. This is because Maxwell's equations are linear, and when a wave equation is perfectly linear, the corresponding energy expression is second-order in the amplitude of the wave.

3. $T$ has the parity properties described in example 6 on p. 155.

4. The electric and magnetic fields are treated symmetrically in Maxwell's equations, so they should be treated symmetrically in the stress-energy tensor. E.g., we could have a term like $7E_x^2 + 7B_z^2$, but not $7E_x^2 + 6B_z^2$.

5. On p. 161 of section 9.2.8, we saw that the trace energy condition $T^a{}_a \ge 0$ is satisfied by a cloud of dust if and only if the dust's mass-energy is not transported at a speed greater than $c$. In section 4.1, we saw that all ultrarelativistic particles have the same mechanical properties. Since a cloud of dust, in the limit where its speed approaches $c$, is on the edge of the bound set by the trace energy condition, $T^a{}_a \ge 0$, we expect that the electromagnetic field, in which disturbances propagate at $c$, should also exactly saturate the trace energy condition, so that $T^a{}_a = 0$.

6. The stress-energy tensor should behave properly under rotations, which basically means that $x$, $y$, and $z$ should be treated symmetrically.

7. An electromagnetic plane wave propagating in the $x$ direction should not exert any pressure in the $y$ or $z$ directions.

8. If the field obeys Maxwell's equations, then the energy-conservation condition $\partial T^{ab}/\partial x^a = 0$ should hold.

These facts are enough to completely determine the form of the remaining nine components of the stress-energy tensor. Property 3 requires that all of these components be even under parity. Since electric fields flip under parity but magnetic fields don't (example 1, p. 180), these components can only have terms like $E_i E_j$ and $B_i B_j$, not mixed terms like $E_i B_j$. Taking into account properties 4 and 6, we find that the diagonal terms must look like

$$4\pi k T^{xx} = a(E_x^2 + B_x^2) + b(E^2 + B^2) ,$$

and the off-diagonal ones

$$4\pi k T^{xy} = c(E_x E_y + B_x B_y) .$$

Property 5 gives $1/2 - a - 3b = 0$ and 7 gives $b = -a/2$, so we have $a = -1$ and $b = 1/2$. The determination of $c = -1$ is left as an exercise, problem 4 on p. 193.
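The two conditions on $a$ and $b$ form a small linear system, and solving it exactly is a useful check on the signs. The sketch below assumes the conditions read $1/2 - a - 3b = 0$ (property 5) and $b = -a/2$ (property 7), and uses Cramer's rule with exact fractions; the variable names are illustrative.

```python
from fractions import Fraction

# Property 5 (vanishing trace):            a + 3b = 1/2
# Property 7 (no sideways pressure):       a + 2b = 0
(a1, b1, c1) = (Fraction(1), Fraction(3), Fraction(1, 2))
(a2, b2, c2) = (Fraction(1), Fraction(2), Fraction(0))

# Cramer's rule for the 2x2 system.
det = a1 * b2 - b1 * a2
a = (c1 * b2 - b1 * c2) / det
b = (a1 * c2 - c1 * a2) / det
print(a, b)

# Cross-check with a plane wave moving along x, with E along y, B along z,
# and |E| = |B| = 1, so that E^2 + B^2 = 2. Diagonal components are in
# units of 1/(4 pi k).
T_tt = Fraction(1, 2) * 2          # energy density (E^2 + B^2)/2
T_xx = a * 0 + b * 2               # no x components of E or B
T_yy = a * 1 + b * 2               # E_y^2 = 1
T_zz = a * 1 + b * 2               # B_z^2 = 1
trace = T_tt - T_xx - T_yy - T_zz  # signature + - - -
print(T_xx, T_yy, T_zz, trace)
```

The cross-check reproduces the expected behavior: pressure along the direction of propagation equal to the energy density, no sideways pressure, and zero trace.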


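We have now established the complete expression for the stress-energy tensor of the electromagnetic field, which is

$$
T^{\mu\nu} =
\begin{pmatrix}
U & S_x & S_y & S_z \\
S_x & \sigma_{xx} & \sigma_{xy} & \sigma_{xz} \\
S_y & \sigma_{yx} & \sigma_{yy} & \sigma_{yz} \\
S_z & \sigma_{zx} & \sigma_{zy} & \sigma_{zz}
\end{pmatrix} ,
$$

where

$$U = \frac{1}{8\pi k}(E^2 + B^2) ,$$

$$\mathbf{S} = \frac{1}{4\pi k}\mathbf{E}\times\mathbf{B} ,$$

and $\sigma$, known as the Maxwell stress tensor, is given by

$$
\sigma_{\mu\nu} = \frac{1}{4\pi k}
\begin{cases}
-E_\mu E_\nu - B_\mu B_\nu & \text{if } \mu \ne \nu \\
-E_\mu E_\nu - B_\mu B_\nu + \frac{1}{2}(E^2 + B^2) & \text{if } \mu = \nu
\end{cases}
$$

All of this can be expressed more compactly and in a coordinate-independent way as

$$
T^{ab} = \frac{1}{4\pi k}\left(-\mathcal{F}^{ac}\mathcal{F}^b{}_c + \frac{1}{4}\, o^d o_d\, g^{ab}\mathcal{F}_{ef}\mathcal{F}^{ef}\right) , \qquad (2)
$$

where $\mathcal{F}$ is the electromagnetic field tensor and $o$ is a future-directed velocity vector, so that $o^d o_d = +1$ for the signature $+---$ used in this book, and $-1$ if the signature is $-+++$.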
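The compact tensor expression can be compared numerically against the component formulas for $U$, $\mathbf{S}$, and $\sigma$. The sketch below makes several assumptions that should be read as such: the signature is $+---$ (so $o^d o_d = 1$), the field tensor has the mixed-index layout of equation (1), indices are raised and lowered with the diagonal metric, and the first term of equation (2) carries a minus sign, $-\mathcal{F}^{ac}\mathcal{F}^b{}_c$. Function names and sample values are hypothetical.

```python
import math

G = [1.0, -1.0, -1.0, -1.0]  # diagonal of the metric, signature + - - -

def field_tensor(E, B):
    """Mixed-index field tensor of equation (1), in units with c = 1."""
    Ex, Ey, Ez = E
    Bx, By, Bz = B
    return [[0.0, Ex, Ey, Ez], [Ex, 0.0, Bz, -By],
            [Ey, -Bz, 0.0, Bx], [Ez, By, -Bx, 0.0]]

def stress_energy(E, B, k=1.0):
    """Evaluate equation (2) with o^d o_d = 1."""
    F = field_tensor(E, B)                                           # first index up, second down
    F_up = [[F[a][c] * G[c] for c in range(4)] for a in range(4)]    # both indices up
    F_low = [[G[a] * F[a][c] for c in range(4)] for a in range(4)]   # both indices down
    FF = sum(F_low[e][f] * F_up[e][f] for e in range(4) for f in range(4))
    T = [[0.0] * 4 for _ in range(4)]
    for a in range(4):
        for b in range(4):
            first = -sum(F_up[a][c] * F[b][c] for c in range(4))
            g_ab = G[a] if a == b else 0.0
            T[a][b] = (first + 0.25 * g_ab * FF) / (4.0 * math.pi * k)
    return T

def component_form(E, B, k=1.0):
    """Build the same tensor from U, S, and the Maxwell stress tensor."""
    E2, B2 = sum(x * x for x in E), sum(x * x for x in B)
    norm = 4.0 * math.pi * k
    U = (E2 + B2) / (2.0 * norm)
    S = [(E[1] * B[2] - E[2] * B[1]) / norm,
         (E[2] * B[0] - E[0] * B[2]) / norm,
         (E[0] * B[1] - E[1] * B[0]) / norm]
    sigma = [[(-E[i] * E[j] - B[i] * B[j]
               + (0.5 * (E2 + B2) if i == j else 0.0)) / norm
              for j in range(3)] for i in range(3)]
    return [[U] + S] + [[S[i]] + sigma[i] for i in range(3)]

E, B = (0.3, -1.2, 0.7), (0.5, 0.4, -0.9)
T_compact = stress_energy(E, B)
T_components = component_form(E, B)
worst = max(abs(T_compact[a][b] - T_components[a][b])
            for a in range(4) for b in range(4))
print(worst)  # should be at the level of rounding error
```

Agreement here only confirms internal consistency between the two forms under the stated sign and index conventions; it is not an independent derivation.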

Stress-energy tensor of a plane wave Example 3

Let an electromagnetic plane wave (not necessarily sinusoidal) propagate along the $x$ axis, with its polarization such that $\mathbf{E}$ is in the $y$ direction and $\mathbf{B}$ on the $z$ axis, and $|\mathbf{E}| = |\mathbf{B}| = A$. Then we have the following for the stress-energy tensor.

$$
T^{\mu\nu} = \frac{A^2}{4\pi k}
\begin{pmatrix}
1 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{pmatrix}
$$

The $T^{tt}$ component tells us that the wave has a certain energy density. Because the wave is massless, we have $E^2 - p^2 = m^2 = 0$ (where $E$ here is the energy, not the field), so the momentum density is the same as the energy density, and $T^{tx}$ is the same as $T^{tt}$. If this wave strikes a surface in the $yz$ plane, the momentum the surface absorbs from the wave will be felt as a pressure, represented by $T^{xx}$.

In example 5 on p. 155, we saw that a cloud of dust, viewed in a frame moving at velocity $v$ relative to the dust's rest frame, had the following stress-energy tensor.

$$
T^{\mu\nu} =
\begin{pmatrix}
\gamma^2\rho & \gamma^2 v\rho & 0 & 0 \\
\gamma^2 v\rho & \gamma^2 v^2\rho & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{pmatrix} .
$$

In the ultrarelativistic limit $v \rightarrow 1$, this becomes

$$
T^{\mu\nu} = (\text{energy density}) \times
\begin{pmatrix}
1 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{pmatrix} ,
$$

which is exactly the same as the result for our electromagnetic wave. This illustrates the fact discussed in section 4.1 that all ultrarelativistic particles have the same mechanical properties.
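The approach of the dust tensor to the plane-wave form can be watched numerically. This sketch assumes the dust tensor has the nonzero block $\gamma^2\rho$, $\gamma^2 v\rho$, $\gamma^2 v^2\rho$ quoted in the example, in units with $c = 1$; the rest-frame density and the list of speeds are arbitrary illustrative values.

```python
def dust_tensor(rho, v):
    """Stress-energy tensor of dust moving at speed v along x (c = 1)."""
    g2 = 1.0 / (1.0 - v * v)  # gamma squared
    return [
        [g2 * rho, g2 * v * rho, 0.0, 0.0],
        [g2 * v * rho, g2 * v * v * rho, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
    ]

def normalized(T):
    """Divide every component by the energy density T^tt."""
    return [[x / T[0][0] for x in row] for row in T]

rho = 2.0  # hypothetical rest-frame density
for v in (0.9, 0.99, 0.999999):
    N = normalized(dust_tensor(rho, v))
    # The upper-left 2x2 block approaches [[1, 1], [1, 1]] as v -> 1.
    print(v, N[0][0], N[0][1], N[1][1])

fastest = normalized(dust_tensor(rho, 0.999999))
```

The normalized components are simply $1$, $v$, and $v^2$, so the convergence to the plane-wave pattern is as fast as $v$ approaches 1, even though the energy density itself diverges.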

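10.7 Maxwell's equations

10.7.1 Statement and interpretation

In this book I assume that you've had the usual physics background acquired in a freshman survey course, which includes an initial, probably frightening, encounter with Maxwell's equations in integral form. In units with $c = 1$, Maxwell's equations are:

$$\Phi_E = 4\pi k q \qquad (3a)$$

$$\Phi_B = 0 \qquad (3b)$$

$$\oint \mathbf{E}\cdot d\boldsymbol{\ell} = -\frac{\partial\Phi_B}{\partial t} \qquad (3c)$$

$$\oint \mathbf{B}\cdot d\boldsymbol{\ell} = \frac{\partial\Phi_E}{\partial t} + 4\pi k I \qquad (3d)$$

where

$$\Phi_E = \int \mathbf{E}\cdot d\mathbf{a} \quad \text{and} \qquad (4)$$

$$\Phi_B = \int \mathbf{B}\cdot d\mathbf{a} . \qquad (5)$$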

Equations (3a) and (3b) refer to a closed surface and the charge $q$ contained inside that surface. Equation (3a), Gauss's law, says that charges are the sources of the electric field, while (3b) says that magnetic charges don't exist. Equations (3c) and (3d) refer to a surface like a potato chip, which has an edge or boundary, and the current $I$ passing through that surface, with the line integrals being evaluated along that boundary. The right-hand side of (3c) says that a changing magnetic field produces a curly electric field, as in a generator or a transformer. The $I$ term in (3d) says that currents create magnetic fields that curl around them. The $\partial\Phi_E/\partial t$ term, which says that changing electric fields create magnetic fields, is necessary so that the equations produce consistent results regardless of the surfaces chosen, and is also part of the apparatus responsible for the existence of electromagnetic waves, in which the changing $\mathbf{E}$ field produces the $\mathbf{B}$, and the changing $\mathbf{B}$ makes the $\mathbf{E}$.

Equations (3a) and (3b) have no time-dependence. They function as constraints on the possible field patterns. Equations (3c) and (3d) are dynamical laws that predict how an initial field pattern will evolve over time. It can be shown that if (3a) and (3b) are satisfied initially, then (3c) and (3d) ensure that they will continue to be satisfied later. Because the dynamical laws consist of two vector equations, they provide a total of 6 constraints, which is the number needed in order to predict the behavior of the 6 fields $E_x$, $E_y$, $E_z$, $B_x$, $B_y$, and $B_z$.

10.7.2 Experimental support

Before Einstein's 1905 paper on relativity, the known laws of physics were Newton's laws and Maxwell's equations (3a)-(3d). Experiments such as example 4 on p. 75 show that Newton's laws are only low-velocity approximations. Maxwell's equations are not low-velocity approximations; for example, in section 1.3.1 we noted the evidence that atoms are electrically neutral, in agreement with Gauss's law, (3a), to one part in $10^{21}$, even though the electrons in atoms typically have velocities on the order of 1-10% of $c$.

10.7.3 Incompatibility with Galilean spacetime

Maxwell's equations are not compatible with the Galilean description of spacetime (section 1.1.2, p. 13). If we assume that equations (3) hold in some frame $o$, and then apply a Galilean boost, transforming the coordinates $(t, x, y, z)$ to $(t', x', y', z') = (t, x - vt, y, z)$, we find that in frame $o'$ the equations have a different and more complicated form that cannot be simplified so as to look like the form they had in $o$. Rather than writing out the resulting horrible mess and verifying that it can't be cast back into the simpler form, an easier way to prove this is to note that there are solutions to the equations in $o$ that are not solutions after a Galilean boost into $o'$, so the equations cannot have kept the same form. For example, if a light wave propagates in the $x$ direction at speed $c$ in $o$, then after a boost with $v = c$, we would have a light wave in frame $o'$ that was standing still. (This is the thought experiment of riding alongside a light wave on a motorcycle, p. 13.) Such a wave would violate (3c), since the left-hand side would be nonzero for a surface lying in the $xy$ plane, but the time derivative on the right-hand side would be zero.
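The frozen wave can be illustrated with a few lines of code. This is only a sketch of the kinematics: it assumes a hypothetical Gaussian pulse profile for one field component, treats the field value at each event as unchanged by the Galilean boost, and uses units with $c = 1$. None of the names below come from the text.

```python
import math

C = 1.0  # speed of light in units with c = 1

def pulse_in_o(x, t):
    """Field of a pulse moving in the +x direction at speed C, as seen in frame o."""
    return math.exp(-(x - C * t) ** 2)

def pulse_in_o_prime(x_prime, t_prime, v):
    """Same pulse in Galilean-boosted coordinates, where x = x' + v t' and t = t'."""
    return pulse_in_o(x_prime + v * t_prime, t_prime)

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]

# Boost with v = C: the profile is the same at t' = 0 and t' = 5, i.e. it stands still.
early = [pulse_in_o_prime(x, 0.0, C) for x in xs]
late = [pulse_in_o_prime(x, 5.0, C) for x in xs]
print(early)
print(late)

# With no boost, the pulse has moved away from the origin by t = 5.
moved = [pulse_in_o_prime(x, 5.0, 0.0) for x in xs]
print(moved)
```

A static, spatially varying transverse electric field like this one has a nonzero line integral around a suitable loop while nothing changes in time, which is the conflict with (3c) described above.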

10.7.4 Not manifestly relativistic in their original form

Since Maxwell's equations are not low-velocity approximations and are incompatible with Galilean relativity, we expect with the benefit of historical hindsight that they are compatible with the relativistic picture of spacetime. But when they are expressed in the form (3), they have two features, either one of which seems enough to make them completely incompatible with relativity:

(i) They appear to describe instantaneous action at a distance. For example, Gauss's law, $\Phi_E = 4\pi k q$, relates the electric field in one place (on the closed surface) to the electric charge somewhere else (inside the surface). This nonlocal structure smells wrong relativistically, for the reasons discussed in section 10.2.

(ii) They appear to treat time and space asymmetrically.

What's really happening here is that equations (3) are like a version of Hamlet written in crayon on a long strip of toilet paper. They are completely relativistic, but have been written in a form that hides that fact.

e / A magnetic field that violates $\Phi_B = 0$.

The problem of nonlocality, i, can be shown to be a non-issue because Maxwell's equations can be reworked into a form in which they are purely local. The idea is shown in figure e. The magnetic field lines all form closed loops, except for one of them, which begins at a point in space and extends off to infinity. Drawing the large box, 1, we find that $\Phi_{B,1}$, the flux of the magnetic field through the box, is not zero, because a line leaves the box but none come in. But the same discrepancy could have been detected with the smaller box 2, or in fact with an arbitrarily small box containing the source of the field line. In other words, the equation $\Phi_B = 0$ is nonlocal, but if it is to hold for any surface, then it must also hold locally, in the limit of an arbitrarily small surface. This purely local law of physics can be expressed using the three-dimensional version of the divergence, introduced on p. 152:

$$\frac{\partial B_x}{\partial x} + \frac{\partial B_y}{\partial y} + \frac{\partial B_z}{\partial z} = 0$$

Of the four Maxwell's equations, both equation (3a) and (3b) can be reexpressed in this way. This book neither presents the full machinery of vector calculus nor assumes previous knowledge of it, but a similar limiting procedure can also be applied to equations (3c) and (3d), using an operator called the curl.

The following example is one in which both problem i and problem ii turn out not to be problems.

Jumping through a hoop Example 4

f / 1. An electron jumps through a hoop. 2. An alternative surface spanning the hoop.

Here is an example in which the non-obvious features of Maxwell's equations prevent the antirelativistic meltdown projected in i. In figure f/1, an electron jumps back and forth through an imaginary circular hoop, across which we construct an imaginary flat surface. Every time the electron pierces the surface, it makes a momentary spike in the current $I$, which appears in (3d),

$$\oint \mathbf{B}\cdot d\boldsymbol{\ell} = \frac{\partial\Phi_E}{\partial t} + 4\pi k I .$$

We might expect that this would cause the field $\mathbf{B}$ detected on the edge of the disk to show similar spikes at the same times. But "same times" implies some notion of simultaneity, and this would be incompatible with relativity, since the $t$ coordinate being referred to here is just one observer's notion of time. Furthermore, it would seem that information was being transmitted instantaneously from the center of the disk to its edge, which violates relativity (p. 17).

Stranger still, we can produce an apparent paradox without even appealing to relativity. Instead of the flat surface in f/1, we can pick a dish-shaped one, f/2, with a deep enough curve so that the electron never crosses it. The current $I$ is always zero according to this surface, so that no field $\mathbf{B}$ would be produced at the rim at all.

The resolution of all these difficulties lies in the term $\partial\Phi_E/\partial t$, which we've ignored. With surface 1, the electron crosses the surface in time $\Delta t$, causing a current $I = -e/\Delta t$ but also causing a change in the flux from $\Phi_E \approx -2\pi k e$ to $\Phi_E \approx 2\pi k e$. The result is that the right-hand side of the equation is nearly zero. With surface 2, $I = 0$ and $\partial\Phi_E/\partial t \approx 0$, so the right-hand side is again nearly zero.

When the approximations used above are eliminated, Maxwell's equations do predict a nonvanishing field, which is the expected electromagnetic wave propagating away from the electron at the proper speed $c$.
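The near-cancellation for the flat surface can be checked with numbers. The sketch below uses the same approximation as the example, namely the static Coulomb field of the charge, and assumes that the flux of a point charge $q$ on the axis of a disk is $kq$ times the solid angle the disk subtends, with the disk's normal taken along $+z$, the direction of the electron's motion. The charge, disk radius, and crossing time are hypothetical values in units with $k = 1$ and $e = 1$.

```python
import math

def disk_flux(q, z, R, k=1.0):
    """Flux along +z through a disk of radius R at z = 0, from a static charge q at height z on the axis."""
    solid_angle = 2.0 * math.pi * (1.0 - abs(z) / math.hypot(z, R))
    # A positive charge below the disk sends field lines up through it;
    # one above the disk sends them down through it.
    direction = 1.0 if z < 0 else -1.0
    return direction * k * q * solid_angle

def rhs_terms(q, h, R, dt, k=1.0):
    """The two terms on the right of (3d) as the charge moves from z = -h to z = +h in time dt."""
    dflux_dt = (disk_flux(q, h, R, k) - disk_flux(q, -h, R, k)) / dt
    current_term = 4.0 * math.pi * k * (q / dt)
    return dflux_dt, current_term

e = 1.0
before = disk_flux(-e, -1e-3, 1.0)   # close to -2 pi k e
after = disk_flux(-e, 1e-3, 1.0)     # close to +2 pi k e
dflux_dt, current_term = rhs_terms(q=-e, h=1e-3, R=1.0, dt=1e-3)
total = dflux_dt + current_term

print(before, after)
print(dflux_dt, current_term, total)
```

The two terms are each of order $4\pi k e/\Delta t$ but opposite in sign, and their sum is smaller by a factor of about $h/R$, the ratio of the electron's distance from the surface to the radius of the hoop.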

10.7.5 Lorentz invariance

Example 4 might seem like a "just-so story," but the apparently miraculous resolution is not a coincidence. It happens because Maxwell's equations are in fact invariant under a Lorentz transformation, even though that isn't obvious when they're written in the form (3a)-(3d). There are various ways of showing this:

- Einstein did it by brute force in his 1905 paper on relativity, by transforming the coordinates through a Lorentz transformation and the fields as in section 10.4.

- Maxwell's equations are basically wave equations. (They have both wave solutions and static solutions.) We can verify that when we start with a sinusoidal plane wave in one frame, then transform into another frame, the result is again a valid sine-wave solution, having been subjected to a Doppler shift (section 3.2) and aberration (section 6.5). This requires checking that the wave is still purely transverse, but that follows easily from examining the invariants described in section 10.5. By a celebrated mathematical result called Fourier's theorem, any well-behaved wave can be written as a sum of sine waves, and therefore any wave solution of Maxwell's equations in one frame is also a solution in every other frame.

- Maxwell's equations can be rewritten in terms of tensors, obeying all the grammatical rules of index gymnastics. If they can be written in this form, they are automatically Lorentz invariant.

The last approach is the most general and elegant, so we'll provide a brief sketch of how it works. Equation (3a) has $4\pi k$ times the charge on the right, while (3d) has $4\pi k$ times the current. These both relate to the current four-vector $J$, so clearly we need to combine them somehow into a single equation with $J$ on the right. Since the local form of equation (3a) involves the three-dimensional divergence, which contains first derivatives, the left-hand side of this combined equation should have a first derivative in it. Given the grammatical rules of tensors and index gymnastics, we don't have many possible ways to accomplish this. The only obvious thing to try is

$$\frac{\partial\mathcal{F}^{\nu\mu}}{\partial x^\nu} = 4\pi k J^\mu , \qquad (6)$$

where $\mathcal{F}$ is the electromagnetic field tensor. Writing this out for $\mu$ being the time coordinate, we get a relation that equates the divergence of $\mathbf{E}$ to $4\pi k$ times the charge density; this is the local equivalent of (3a). If you've taken vector calculus and know about the curl operator and Stokes' theorem, then you can verify that for $\mu$ referring to $x$, $y$, and $z$, we recover the local form of (3d). The tensorial way of expressing (3b) and (3c) turns out to be

$$\frac{\partial\mathcal{F}_{\mu\nu}}{\partial x^\lambda} + \frac{\partial\mathcal{F}_{\nu\lambda}}{\partial x^\mu} + \frac{\partial\mathcal{F}_{\lambda\mu}}{\partial x^\nu} = 0 . \qquad (7)$$

g / Example 5.

A generator Example 5

Figure g shows a crude, impractical generator, depicted in two frames of reference.

Flea 1 is sitting on top of the bar magnet, which creates the magnetic field pattern shown with the arrows. To her, the bar magnet is obviously at rest, and this magnetic field pattern is static. As the square wire loop is dragged away from her and the magnet, its protons experience a force in the $-z$ direction, as determined by the Lorentz force law $\mathbf{F} = q\mathbf{v}\times\mathbf{B}$. The electrons, which are negatively charged, feel a force in the $+z$ direction. The conduction electrons are free to move, but the protons aren't. In the front and back sides of the loop, this force is perpendicular to the wire. In the right and left sides, however, the electrons are free to respond to the force. Since the magnetic field is weaker on the right side, current circulates around the loop.

Flea 2 is sitting on the loop, which she considers to be at rest. In her frame of reference, it's the bar magnet that is moving. Like flea 1, she observes a current circulating around the loop, but unlike flea 1, she cannot use magnetic forces to explain this current. As far as she is concerned, the electrons were initially at rest. Magnetic forces are forces between moving charges and other moving charges, so a magnetic field can never accelerate a charged particle starting from rest. A force that accelerates a charge from rest can only be an electric force, so she is forced to conclude that there is an electric field in her region of space. This field drives electrons around and around in circles; it is a curly field. What reason can flea 2 offer for the existence of this electric field pattern? Well, she's been noticing that the magnetic field in her region of space has been changing, possibly because that bar magnet over there has been getting farther away. She observes that a changing magnetic field creates a curly electric field. Thus the $\partial\Phi_B/\partial t$ term in equation (3c) is not optional; it is required to exist if Maxwell's equations are to be equally valid in all frames.

Einstein opens his 1905 paper on relativity (see footnote 1) with this sentence: "It is known that Maxwell's electrodynamics, as usually understood at the present time, when applied to moving bodies, leads to asymmetries which do not appear to be inherent in the phenomena." He then gives essentially this example. Although the observers in frames 1 and 2 agree on all physical measurements, their explanations of the physical mechanisms, couched in the language of Maxwell's equations in the form (3), are completely different. In relativistic language, flea 2's explanation can be written in terms of equation (7), in the case where the indices are $x$, $z$, and $t$:

$$\frac{\partial\mathcal{F}_{xz}}{\partial t} + \frac{\partial\mathcal{F}_{zt}}{\partial x} + \frac{\partial\mathcal{F}_{tx}}{\partial z} = 0 ,$$

which is the same as

$$-\frac{\partial B_y}{\partial t} + \frac{\partial E_z}{\partial x} - \frac{\partial E_x}{\partial z} = 0 .$$

Because the first term is negative, the second term must be positive. Since equations (6) and (7) are written in terms of tensors, obeying the grammatical rules of index gymnastics, we are guaranteed that they give consistent predictions in all frames of reference.

1 "Zur Elektrodynamik bewegter Körper," Annalen der Physik 17 (1905) 891. Translation by Perrett and Jeffery.

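Conservation of charge and energy-momentum Example 6

Solving equation (6) for the current vector, we have

$$J^\mu = \frac{1}{4\pi k}\frac{\partial\mathcal{F}^{\nu\mu}}{\partial x^\nu} .$$

Conservation of charge (section 9.1.2, p. 152) can be expressed as

$$\frac{\partial J^\mu}{\partial x^\mu} = 0 .$$

If we substitute the first equation into the second, we obtain

$$\frac{\partial}{\partial x^\mu}\left(\frac{1}{4\pi k}\frac{\partial\mathcal{F}^{\nu\mu}}{\partial x^\nu}\right) = 0$$

or

$$\frac{\partial^2\mathcal{F}^{\nu\mu}}{\partial x^\mu\,\partial x^\nu} = 0 ,$$

with a sum over both $\mu$ and $\nu$. But this equation is automatically satisfied because $\mathcal{F}$ is antisymmetric, so for every combination of indices $\mu$ and $\nu$, the term involving $\mathcal{F}^{\mu\nu}$ is canceled by the one containing $\mathcal{F}^{\nu\mu} = -\mathcal{F}^{\mu\nu}$. Thus conservation of charge does not have to be added as a supplementary condition in addition to Maxwell's equations; it is automatically implied by Maxwell's equations.

Using equation (2) on p. 185, one can also prove that Maxwell's equations imply conservation of energy-momentum.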
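The cancellation in the charge-conservation argument is a purely algebraic fact: contracting an antisymmetric array with a symmetric one (such as the array of second partial derivatives, which is symmetric in its two indices) always gives zero. The sketch below illustrates this with hypothetical integer entries, so that the arithmetic is exact; the arrays are not meant to represent any particular field.

```python
def contract(A, S):
    """Sum of A[m][n] * S[m][n] over both indices."""
    return sum(A[m][n] * S[m][n] for m in range(4) for n in range(4))

# Build an antisymmetric array from arbitrary upper-triangular entries.
upper = [[0, 3, -1, 4],
         [0, 0, 5, -2],
         [0, 0, 0, 7],
         [0, 0, 0, 0]]
A = [[upper[m][n] - upper[n][m] for n in range(4)] for m in range(4)]

# An arbitrary symmetric array, standing in for the second-derivative operator.
S = [[2, 1, 0, 5],
     [1, 3, 4, -1],
     [0, 4, 6, 2],
     [5, -1, 2, 8]]

result = contract(A, S)
print(result)  # each (m, n) term is canceled by the (n, m) term
```

Diagonal terms vanish because the antisymmetric array has zeros on its diagonal, and each off-diagonal term pairs with its transpose partner of opposite sign.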


Problems

1 (a) A parallel-plate capacitor has charge per unit area $\pm\sigma$ on its two plates. Use Gauss's law to find the field between the plates.

(b) In the style of example 2 on p. 181, transform the field to a frame moving perpendicular to the plates, and verify that the result makes sense in terms of the sources that are present.

(c) Repeat the analysis for a frame moving parallel to the plates.

2 We've seen examples such as figure a on p. 175 in which a purely magnetic field in one frame becomes a mixture of magnetic and electric fields in another, and also cases like example 2 on p. 181 in which a purely electric field transforms to a mixture. Can we have a case in which a purely electric field in one frame transforms to a purely magnetic one in another? The easy way to do this problem is by using invariants.

3 (a) Starting from equation (1) on p. 180 for $\mathcal{F}^\mu{}_\nu$, lower an index to find $\mathcal{F}_{\mu\nu}$, using the signature $+---$.

(b) Let $v^\mu = \gamma(1, u_x, u_y, u_z)$, where $(u_x, u_y, u_z)$ is the velocity three-vector. Write out the matrix multiplication $F^\mu = q\mathcal{F}^\mu{}_\nu v^\nu$, and show as claimed on p. 180 that the result is the Lorentz force law.

4 On p. 183 I presented a list of properties of the electromagnetic stress tensor, followed by an argument in which the tensor is constructed with three unknown constants $a$, $b$, and $c$, to be determined from those properties. The values of $a$ and $b$ are derived in the text, and the purpose of this problem is to finish up by proving that $c = -1$. The idea is to take the field of a point charge, which we know satisfies Maxwell's equations, and then apply property 8, which requires that the energy-conservation condition $\partial T^{ab}/\partial x^a = 0$ hold. This works out nicely if you apply this property to the $x$ column of $T$, at a point that lies in the positive $x$ direction relative to the charge.

5 Show that the number of independent conditions contained

in equations (6) and (7) agrees with the number found in equations

(3a)-(3d).

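6 Show that

$$\frac{\partial\mathcal{F}_{\mu\nu}}{\partial x^\lambda} + \frac{\partial\mathcal{F}_{\nu\lambda}}{\partial x^\mu} + \frac{\partial\mathcal{F}_{\lambda\mu}}{\partial x^\nu} = 0$$

(equation (7), p. 190) implies that the magnetic field has zero divergence.

7 Write down the fields of an electromagnetic plane wave propagating in the $z$ direction, choosing some polarization. Do not assume a sinusoidal wave. Show that this is a solution of

$$\frac{\partial\mathcal{F}^{\nu\mu}}{\partial x^\nu} = 0$$

(equation (6), p. 190, in a vacuum).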


Photo Credits

15 Atomic clock on plane: Copyright 1971, Associated Press, used under U.S. fair use exception to copyright law.
18 Ring laser gyroscope: Wikimedia Commons user Nockson, CC-BY-SA licensed.
20 Machine gunner's body: Redrawn from a public-domain photo by Cpl. Sheila Brooks.
20 Machine gunner's head: Redrawn from a sketch by Wenceslas Hollar, 17th century.
21 Minkowski: From a 1909 book, public domain.
28 Muon storage ring at CERN: Copyright 1974 by CERN; used here under the U.S. fair use doctrine.
29 Joan of Arc holding banner: Ingres, 1854.
29 Joan of Arc interrogated: Delaroche, 1856.
75 Oscilloscope trace: From Bertozzi, 1964; used here under the U.S. fair use doctrine.
81 Gamma-ray spectrum: Redrawn from a public-domain image by Kieran Maher and Dirk Hünniger.
104 Eötvös: Unknown source. Since Eötvös died in 1919, the painting itself would be public domain if done from life. Under U.S. law, this makes photographic reproductions of the painting public domain.
105 Artificial horizon: NASA, public domain.
106 Pound and Rebka photo: Harvard University. I presume this photo to be in the public domain, since it is unlikely to have had its copyright renewed.
113 Surfer: Redrawn from a photo by Jon Sullivan, CC0 license.
133 Lambert projection: Eric Gaba, CC-BY-SA.
156 Levi-Civita: Believed to be public domain. Source: http://www-history.mcs.st-and.ac.uk/PictDisplay/Levi-Civita.html.


Index

aberration, 118

abstract index notation, 121

equivalent to birdtracks, 121

acceleration

proper, 55, 57

acceleration vector, 57

affine parameter, 125

birdtracks, 110

birdtracks notation, 110

equivalent to abstract index notation, 121

black hole, 92

Bohr model, 145

boost

defined, 21

Bridgman, P.W., 25

Brown-Bethe scenario, 92

causality, 15, 40

Chandrasekhar limit, 90

Christoffel symbol, 168

clock-comparison experiments, 109

coordinates

Minkowski, 20

correspondence principle

defined, 15

for time dilation, 15

covariant derivative, 136, 167

in relativity, 167

covector, 111

curl, 188

current vector, 149

Cvitanovic, Predrag, 110

de Sitter, Willem, 146

degree of freedom, 58

derivative

covariant, 167

in relativity, 167

diffeomorphism, 111

divergence, 152, 188

dust, 154

Eötvös experiments, 104

Einstein summation convention, 122

Einstein synchronization, 19

electron capture, 90

energy

equivalent to mass, 72

event, 12

event horizon, 58

black hole, 92

fine structure constant, 145

four-vector, 56

gamma factor

as an inner product, 60

defined, 25

gamma ray

pair production, 82

garage paradox, 31

geodesic, 168, 171

differential equation for, 171

geodesic equation, 172

Goudsmit, 145

group velocity, 114

hyperbolic motion, 57, 67

inner product, 23

interval

defined, 23

invariant

compared to scalar, 111, 112

defined, 21

Ives-Stilwell experiments, 52

Lambert cylindrical projection, 133

Levi-Civita symbol, 182

Levi-Civita, Tullio, 156

Lewis-Tolman paradox, 63

light cone, 17

Lorentz invariance, 43

Lorentz invariant, see invariant

Lorentz scalar, see scalar

Lorentz transformation, 29

lowering an index, 124

mass

defined, 78

equivalent to energy, 72

metric, 23

Minkowski coordinates, 20

Minkowski, Hermann, 20

natural units, 24

neutron star, 90

normalization, 59

operationalism, 25

pair production, 82

parallel transport, 41

Penrose

graphical notation for tensors, 110

phase velocity, 114

polarization

of light, 83

positron, 74

projection operator, 61

proper acceleration, 57

proper time, 23

pulsar, 91

raising an index, 124

rapidity, 55

Rindler coordinates, 130

scalar, 111, 112

compared to Lorentz invariant, 111

signature, 24

stress-energy tensor, 153

interpretation of, 157

tensor

Penrose graphical notation, 110

Thomas precession, 145

Thomas, Llewellyn, 145

three-vector, 62

Tolman-Oppenheimer-Volkoff limit, 91

torsion, 170

transverse polarization

of light, 83

Uhlenbeck, 145

vector

acceleration, 57

distinguished from covector, 111

Penrose graphical notation, 110

volume

3-volume covector, 125

affine, 124

Voyager space probe, 34

Waage, Harold, 101

Wheeler, John, 102

white dwarf, 89

world-line, 12
