Physics 16 Section 1, September 6-7, 2011

Getting Acquainted

On your sheet of scrap paper, please provide the following information.


1. What is your name?
2. What is your level of experience with math, what math courses are you currently
taking, and what is the general state of your relationship with math?
3. Do you like to work with your hands? Have you built or fixed something recently?
4. How comfortable are you with computer programming, LaTeX, and Mathematica?

Getting Acquainted with Your Brain

On your sheet of scrap paper, please attempt to answer the following questions. Half-baked
ideas, gut feelings, and wild guesses are welcome! You may skip questions that don't inspire
you.
1. Why are clouds white?
2. What is likelier to damage your microwave, a steel ball or a steel fork?
3. Why can you jump higher by lowering yourself quickly before jumping than from a static
squat position?
4. How do .zip files work? Why does .jpg format reduce the file size of a raw bitmap file
more than .zip compression?
5. Prove that a circle has the highest possible ratio of area to perimeter.
6. Why is it that when you blow over a glass bottle with a steady stream of air it produces a
pitch, i.e. a vibrating column of air inside the bottle?

Section, Office Hours, Homework

Section is a chance to review lecture, but this usually deserves at most fifteen minutes
because you can review much more efficiently alone or with classmates. When I do review
in section, it is to summarize pithily ideas from lecture for your memory files and to give
them motivation and intuition. The real purpose of section is to solve problems. Usually
we will do a warm-up problem and one or two harder problems. I will ask a lot of difficult
questions on the way to a solution. The most obvious pitfall of easy questions is that they are
boring. Furthermore, there's no glory in getting the right answer, while the wrong answer is
embarrassing. This causes awkward silences, which are a TF's greatest fear. It doesn't cost
any pride to get a hard question wrong and it feels great to get it right. If your answer has a
sliver of truth we can run with it and everyone is happy. The same applies if it's wrong for
an interesting reason. It's not for your ego that I reward partial answers; it's simply the
way science works. Finally, when you use ideas from lecture in unexpected ways, combine
them, and generalize them, you get a deeper and more confident grasp of them.
At office hours (as well as before and after class) you can talk about anything. If something was presented in class from a point of view that doesn't work well for you, we can
approach it from a different angle. Or, you may have a few busy weeks and fall behind. If
this happens, don't be embarrassed to come in for a detailed review of weeks of material. My
opinions about section do not imply a disdain for review in general. Nothing is too simple
for office hours, including, for example, line-by-line review of lecture slides. When you have
the time, office hours are great for oddball tangents such as rigorous proofs and other areas
of physics.
Homework is mainly for your practice, but it is also a dialogue. It shows me how well
you understand things, and I give comments accordingly. Sometimes I will just ask you to look
at the solutions, but if you make a very tempting error I try to explain why it doesn't quite
work. I also note particularly elegant and original solutions. Basically, check the margins of
your problem sets.

A Bit of Review

So far all we have seen is a bit of F = ma and solutions to the differential equations that
this implies.

4.1 Example: Air Resistance

As discussed in lecture, you can model air resistance with a frictional force proportional to
the square of the speed. Hence

$$F = ma = -\beta m v^2 \;\Longrightarrow\; -\beta v^2 = \frac{dv}{dt} \;\Longrightarrow\; -\beta = \frac{1}{v^2}\frac{dv}{dt}, \tag{1}$$

where we arbitrarily write the force constant as $\beta m$ instead of just $\beta$ to simplify our expressions. We have grouped all the $v$-dependent terms on the same side as the $dv/dt$ because
then we can integrate both sides using the change-of-variables formula. Integrating from
some initial time $t_i$ to some final time $t_f$ and changing variables from $t$ to $v(t)$ gives

$$-\beta\int_{t_i}^{t_f} dt = \int_{t_i}^{t_f} \frac{1}{v^2}\frac{dv}{dt}\,dt = \int_{v(t_i)}^{v(t_f)} \frac{1}{v^2}\,dv = -\left[\frac{1}{v}\right]_{v(t_i)}^{v(t_f)}, \tag{2}$$

which reduces to

$$\beta(t_f - t_i) = \frac{1}{v(t_f)} - \frac{1}{v(t_i)} \;\Longrightarrow\; v(t_f) = \frac{v(t_i)}{1 + \beta\, v(t_i)(t_f - t_i)}. \tag{3}$$
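
As a quick sanity check on the closed form for $v(t_f)$, here is a minimal Python sketch that integrates $dv/dt = -\beta v^2$ step by step and compares the result with $v(t_i)/(1 + \beta v(t_i)(t_f - t_i))$. The name `beta` for the drag constant per unit mass and all numerical values are illustrative assumptions, not values from lecture.

```python
def v_exact(v0, beta, t):
    """Closed-form speed after a time t, starting from speed v0."""
    return v0 / (1.0 + beta * v0 * t)


def v_numeric(v0, beta, t, steps=100_000):
    """Integrate dv/dt = -beta * v**2 with a midpoint (second-order) step."""
    dt = t / steps
    v = v0
    for _ in range(steps):
        v_mid = v - 0.5 * dt * beta * v * v   # half step to the midpoint
        v -= dt * beta * v_mid * v_mid        # full step using the midpoint slope
    return v


# Illustrative values only.
print(v_exact(10.0, 0.3, 2.0), v_numeric(10.0, 0.3, 2.0))
```

The two printed numbers should agree closely, which is a cheap way to catch a sign error in the integration.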

4.2 Example: Harmonic Oscillator

As you probably saw in AP Physics, a useful model is a system whose degree of freedom is
restored to equilibrium by a force proportional to its displacement:

$$ma = m\frac{d^2x}{dt^2} = F = -Kx. \tag{4}$$

Note that for positive $K$ the minus sign is essential. There are systematic ways of deriving the
solution to such an equation, but let's use the more venerable method of guessing a solution
of the form

$$x(t) = a\cos(\omega t) + b\sin(\omega t). \tag{5}$$

Plugging this into the differential equation gives

$$-m\omega^2\left(a\cos(\omega t) + b\sin(\omega t)\right) = -K\left(a\cos(\omega t) + b\sin(\omega t)\right) \;\Longrightarrow\; \omega = \sqrt{K/m}. \tag{6}$$

We see that the differential equation has nothing to say about $a$ and $b$. They are determined
by the initial conditions. For example, we might be given $x(0) \equiv x_0$ and $x'(0) \equiv v_0$. In
terms of $a$ and $b$ we find

$$x(0) = a \;\Longrightarrow\; a = x_0, \qquad x'(0) = b\omega \;\Longrightarrow\; b = v_0/\omega. \tag{7}$$
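
The guess-and-check logic above can be tested numerically. The sketch below builds $x(t) = a\cos(\omega t) + b\sin(\omega t)$ from given initial conditions and evaluates $m x'' + Kx$ with a finite-difference second derivative; it should come out near zero. The function names and the sample values are assumptions made for illustration.

```python
import math


def oscillator(x0, v0, K, m):
    """Return (omega, x) with x(t) matching x(0) = x0 and x'(0) = v0."""
    omega = math.sqrt(K / m)
    a, b = x0, v0 / omega

    def x(t):
        return a * math.cos(omega * t) + b * math.sin(omega * t)

    return omega, x


def newton_residual(x, K, m, t, h=1e-4):
    """m x'' + K x at time t, with x'' estimated by a central difference."""
    accel = (x(t + h) - 2.0 * x(t) + x(t - h)) / h**2
    return m * accel + K * x(t)


# Illustrative values only.
omega, x = oscillator(x0=1.0, v0=2.0, K=4.0, m=1.0)
print(omega, x(0.0), newton_residual(x, 4.0, 1.0, t=0.7))
```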

A Problem, if Time Permits

Problem: Up to some dimensionless prefactor, what is the frequency of the fundamental
mode of vibration of a circular drumhead?
Solution: Frequency has dimensions $T^{-1}$. We need to figure out a few parameters that
ought to affect the frequency. If we're lucky, there will be a unique way of combining them
so that the dimensions come out to inverse time.
The radius $r$, with dimensions $L$, ought to matter. Equivalently, we could express things
in terms of the circumference or area. It seems intuitively clear that the density $\sigma$, with
dimensions of mass per area ($M L^{-2}$), should also affect the frequency: something with lots of
inertia should vibrate slower.
The last one is a bit trickier. In one dimension, you know that taut strings have higher
pitches. Thus we need the two-dimensional version of tension, that is, surface tension.
Question: What are the dimensions of surface tension?

Answer: Surface tension $\gamma$ represents the amount of energy it takes to stretch a surface,
that is, to increase its area. Thus we can define it as energy divided by area, which has
dimensions
$$[\gamma] = \frac{[E]}{[A]} = \frac{M L^2 T^{-2}}{L^2} = M T^{-2}. \tag{8}$$
So now we need to combine $r$, $\sigma$, and $\gamma$ to get dimensions $T^{-1}$. Only surface tension
involves time, so the answer must contain $\gamma^{1/2}$, which has dimensions
$$[\gamma^{1/2}] = M^{1/2} T^{-1}. \tag{9}$$
Since frequency does not involve any length dimensions, we must combine $r$ and $\sigma$ so as to
cancel out length. The combination that does this is
$$[\sigma r^2] = M. \tag{10}$$
Finally, we take the square root of $\sigma r^2$ to get dimensions of $M^{1/2}$ that cancel the mass
dimensions in $\gamma^{1/2}$:
$$\left[\sqrt{\frac{\gamma}{\sigma r^2}}\right] = T^{-1}. \tag{11}$$
We conclude $\omega \propto (1/r)\sqrt{\gamma/\sigma}$.
Question: Did some cheating occur here? Is there a variable we left out?
Answer: Perhaps the height $h$ of the vibration makes a difference. In fact, it does.
This ruins everything, because the ratio $r/h$ is dimensionless, and as such you can act on
it with any function, not just raise it to various powers. We would then have to include a
prefactor $f(r/h)$, where $f$ is an unknown function. This greatly reduces the predictive power
of dimensional analysis.
However, ignoring $h$ actually isn't so bad.
Question: Why?
Answer: We know from experience with the harmonic oscillator that for small oscillations, with restoring force approximately proportional to displacement, the frequency actually doesn't depend on the amplitude. So in the case of small vibrations, our original answer
is not so bad.
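
The exponent hunt in this problem is a small linear system, so it can be automated. The sketch below is one way to do it, assuming each quantity is described by its (mass, length, time) powers: radius $(0, 1, 0)$, mass per area $(1, -2, 0)$, surface tension $(1, 0, -2)$, and the target frequency $(0, 0, -1)$. The function names are hypothetical, and Cramer's rule is used only because the system is 3 by 3.

```python
from fractions import Fraction


def det3(a):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def exponents(dims, target):
    """Solve sum_k e_k * dims[k] = target for the exponents e_k (Cramer's rule).

    Each entry of dims, and target itself, is a (mass, length, time) power triple.
    """
    a = [[Fraction(dims[col][row]) for col in range(3)] for row in range(3)]
    d = det3(a)
    result = []
    for k in range(3):
        a_k = [[Fraction(target[row]) if col == k else a[row][col]
                for col in range(3)] for row in range(3)]
        result.append(det3(a_k) / d)
    return result


RADIUS = (0, 1, 0)
MASS_PER_AREA = (1, -2, 0)
SURFACE_TENSION = (1, 0, -2)
FREQUENCY = (0, 0, -1)

# Exponents of radius, mass per area, and surface tension, in that order.
print(exponents([RADIUS, MASS_PER_AREA, SURFACE_TENSION], FREQUENCY))
```

The exponents come out as $-1$, $-1/2$, and $1/2$, matching the combination found by hand. Adding a fourth quantity with the dimensions of length (such as the height of the vibration) would make the system underdetermined, which is the algebraic face of the "unknown function of $r/h$" problem.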

Physics 16 Section 2, September 13, 2011


1 A Bit of Review

Last Thursday's lecture stressed two concepts. The first, conservation of momentum, is
familiar to you from high school. It will come up in today's problems. The second is not
simply vectors, which you know very well, but the subtler idea that it is often useful to manipulate vectors independent of their components, that is, as objects with inherent meaning.
The dot product and cross product are very useful tools for the so-called coordinate-free
representation of vectors. You can think of this as analogous to solving problems with all
the parameters as symbols and only plugging in values at the end, if at all. It saves time
and increases clarity to preserve generality as long as possible.

Problem 1: Sliding off a Hemisphere (Morin 5.53)

Problem: A point particle of mass m slides without friction, starting with infinitesimal
initial speed, at the top of a hemisphere of mass M and radius R that moves on a frictionless
plane. At what angle relative to the hemisphere does the particle fly off?
Solution: First, note that we can treat this as a 2-D problem: the particle moves along a
longitude line, which we can say lies in the $xy$ plane. Let
$$\vec v_h = -v_h\,\hat x \quad\text{and}\quad \vec v_p = (v_{p,x}, v_{p,y})$$
denote the hemisphere and particle velocities, so that $v_h$ is the recoil speed of the hemisphere, directed opposite to the particle's horizontal motion. We will find it useful to express some things in
the lab frame of reference and some things in the hemisphere's frame, so let
$$\vec v_p\,' = (v_{p,x}', v_{p,y}')$$
be the particle's velocity relative to the hemisphere. Let $\theta$ be the angle of the arc the particle
travels relative to vertical, as in the diagram below.

Question: What is the condition for flying off?


Answer: The particle flies off when the normal force vanishes, that is, when gravity just
barely provides the centripetal force $mv^2/R$ for circular motion.
Question: How do we express this in terms of our variables?

Answer: The normal component of gravity is $mg\cos\theta$, and the particle's motion is only
circular in the hemisphere's frame. Thus the condition is
$$mg\cos\theta = m\,|\vec v_p\,'|^2/R \;\Longrightarrow\; gR\cos\theta = {v_{p,x}'}^2 + {v_{p,y}'}^2. \tag{1}$$
Now we could set up an ugly differential equation and solve for as a function of time.
It is easier to use conservation laws.
Question: What is the consequence of momentum conservation?
Answer: The hemisphere's recoil has to balance the particle's $x$-momentum:
$$M v_h = m v_{p,x} \;\Longrightarrow\; v_h = \frac{m}{M}\,v_{p,x}. \tag{2}$$

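As for energy conservation, the potential energy lost by falling a distance $(1-\cos\theta)R$
must equal the kinetic energy gained:
$$\begin{aligned}
mgR(1-\cos\theta) &= \frac{1}{2}\left(M v_h^2 + m v_{p,x}^2 + m v_{p,y}^2\right) \\
&= \frac{1}{2}\left(M\left(\frac{m}{M}\right)^2 v_{p,x}^2 + m v_{p,x}^2 + m v_{p,y}^2\right) \\
\Longrightarrow\; 2gR(1-\cos\theta) &= \left(1+\frac{m}{M}\right)v_{p,x}^2 + v_{p,y}^2.
\end{aligned} \tag{3}$$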

Now there is only one degree of freedom as long as the particle is constrained to move
along the hemisphere's surface, so you know $v_{p,x}$ and $v_{p,y}$ can't be independent. A little bit
of geometry in the hemisphere's reference frame tells you that
$$\frac{v_{p,y}'}{v_{p,x}'} = \tan\theta. \tag{4}$$
Question: How do we translate this to the lab frame?

Answer: It is not hard to see that $v_{p,y}' = v_{p,y}$ and
$$v_{p,x}' = v_{p,x} + v_h = \left(1+\frac{m}{M}\right)v_{p,x}, \tag{5}$$
which gives
$$v_{p,y} = \left(1+\frac{m}{M}\right)\tan\theta\; v_{p,x}. \tag{6}$$
The energy conservation equation becomes
$$2gR(1-\cos\theta) = \left[\left(1+\frac{m}{M}\right) + \left(1+\frac{m}{M}\right)^2\tan^2\theta\right] v_{p,x}^2, \tag{7}$$
while the vanishing normal force equation becomes
$$gR\cos\theta = \left(1+\frac{m}{M}\right)^2\left(1+\tan^2\theta\right) v_{p,x}^2. \tag{8}$$

Define $\mu \equiv 1 + m/M$ for convenience and eliminate $v_{p,x}^2$ to get
$$2gR(1-\cos\theta) = \frac{\left(\mu + \mu^2\tan^2\theta\right) gR\cos\theta}{\mu^2\left(1+\tan^2\theta\right)} \;\Longrightarrow\; 2(1-\cos\theta)\,\mu\left(1+\tan^2\theta\right) = \left(1+\mu\tan^2\theta\right)\cos\theta. \tag{9}$$
Since $\tan^2\theta = \sin^2\theta/\cos^2\theta = (1-\cos^2\theta)/\cos^2\theta$, this is a polynomial in $\cos\theta$:
$$(1-\mu)\cos^3\theta + 3\mu\cos\theta - 2\mu = 0. \tag{10}$$
Question: Are there any cases that we can check this result against?
Answer: The one that first comes to mind is an infinitely heavy, i.e. immobile, hemisphere. Then $\mu = 1$ and we get $3\cos\theta - 2 = 0$, so $\theta = \cos^{-1}(2/3)$. Personally, I don't have
any reason to think this is intuitive, so let's try a different limit, perhaps slightly silly: an
infinitely massive particle or infinitely light hemisphere. Then $\mu \to \infty$ and we have
$$-\cos^3\theta + 3\cos\theta - 2 = 0 \;\Longrightarrow\; 3\cos\theta - \cos^3\theta = 2, \tag{11}$$
which implies $\theta = 0$.
Question: Why is this intuitive?
Answer: If the hemisphere is infinitely light, it acquires an infinite recoil velocity infinitely quickly, hence it scoots out from under the particle when $\theta = 0$.
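
Between the two limits the cubic has to be solved numerically. Here is a minimal sketch, assuming the polynomial has the form $(1-\mu)c^3 + 3\mu c - 2\mu = 0$ with $c = \cos\theta$ and $\mu = 1 + m/M$ (the symbol `mu` is this sketch's own label for that ratio). The polynomial is negative at $c = 0$, equals 1 at $c = 1$, and is increasing in between, so plain bisection finds the single physical root.

```python
import math


def cos_theta_fly_off(m_over_M, tol=1e-12):
    """Root in [0, 1] of (1 - mu) c^3 + 3 mu c - 2 mu = 0, where mu = 1 + m/M."""
    mu = 1.0 + m_over_M

    def p(c):
        return (1.0 - mu) * c**3 + 3.0 * mu * c - 2.0 * mu

    lo, hi = 0.0, 1.0          # p(0) = -2 mu < 0 and p(1) = 1 > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if p(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Mass ratios m/M chosen only for illustration.
for ratio in (0.0, 1.0, 100.0):
    c = cos_theta_fly_off(ratio)
    print(ratio, c, math.degrees(math.acos(c)))
```

As the mass ratio grows, the fly-off angle shrinks toward zero, in line with the infinitely light hemisphere limit.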

Problem 2: Drag on a Sphere (Morin 5.86)

Problem: What is the drag force on a sphere of mass $M$, radius $R$, and speed $v$ moving
through a gas of particles with mass $m$ and density (number per unit volume) $\rho$, assuming
that $m \ll M$?
Solution: We are going to try to do this as coordinate-free as possible.
Question: What should our strategy be? Hint: force equals time derivative of momentum.
Answer: Let's consider the momentum that the sphere loses in an individual collision
and multiply by the rate of collisions. Of course, this depends on what part of the sphere
collides with a gas particle, so we will need to integrate over the different parts of the sphere.
Since the sphere is much more massive than the gas particles, we may ignore its recoil. Then
the momentum it loses equals the momentum that it imparts to a gas particle. We can
consider the sphere to be motionless and the particles to be incoming with velocity
$$\vec u = -v\,\hat v,$$
where $\hat v$ is the unit vector along the sphere's direction of motion.
Question: In as coordinate-free a way as possible, what is the velocity of the gas particle
after collision, in terms of the position vector $\vec r$ of the point at which it hits the surface of
the sphere?
Answer: Instead of a sphere, we could equally well imagine the particle bouncing off a
plane tangent to the sphere and perpendicular to $\vec r$.
Question: What is a coordinate-free way of saying that the angle of incidence equals
the angle of reflection?

Answer: The component of $\vec u$ perpendicular to the plane (parallel to $\hat r$) is reversed,
while the components parallel to the plane (perpendicular to $\hat r$) are unaffected. We can
divide the velocity vector into these components with the dot product:
$$\begin{aligned}
\vec u &= (\vec u\cdot\hat r)\,\hat r + \left(\vec u - (\vec u\cdot\hat r)\,\hat r\right) = (\text{parallel to } \hat r) + (\text{perpendicular to } \hat r) \\
\vec u_f &= -(\vec u\cdot\hat r)\,\hat r + \left(\vec u - (\vec u\cdot\hat r)\,\hat r\right).
\end{aligned} \tag{12}$$
Hence the momentum change is $-2m\,(\vec u\cdot\hat r)\,\hat r$.
Question: Can we simplify this by discarding extraneous information and/or exploiting
symmetry?
Answer: Only the momentum change parallel to the direction of motion matters; the other
components must balance out to zero. To isolate this component we take the dot product with $\hat v$:
$$\Delta p = -2m\,(\vec u\cdot\hat r)\,\hat r\cdot\hat v = 2mv\,(\hat r\cdot\hat v)^2. \tag{13}$$
Finally, we can delay the inevitable no longer and we institute a coordinate system. It is
not hard to see that $\hat v\cdot\hat r = \cos\theta$. In addition to $\theta$, we have an angle $\phi$ that rotates around
$\hat v$.

Question: We will find the average momentum transfer and then multiply by the total
rate of collisions. To get the average, do we simply integrate over the half sphere and divide
by the surface area of the half sphere?
$$\frac{\int_0^{\pi/2} R^2\sin\theta\,d\theta\int_0^{2\pi} d\phi\; 2mv\cos^2\theta}{2\pi R^2} \tag{14}$$
Answer: No! The incoming particles are not evenly distributed over the surface of the
sphere. Rather, they are evenly distributed over the circular cross section of the sphere.
Looking at the sphere head-on, this cross section can be divided into annuli of infinitesimal
width $dr$:

The area of an annulus at (cross-sectional) distance $r$ from the center of the circle is $2\pi r\,dr$,
so what we want is
$$\frac{\int_0^R (2\pi r)\left(2mv\cos^2\theta\right)dr}{\pi R^2}. \tag{15}$$
We must relate $r$ to $\theta$. From the diagram below, we see that $R\sin\theta = r$, and therefore
$$\cos^2\theta = 1 - (r/R)^2. \tag{16}$$
Thus we have the average momentum loss per collision:
$$2mv\,\frac{\int_0^R (2\pi r)\left(1 - (r/R)^2\right)dr}{\pi R^2} = mv. \tag{17}$$
Question: Finally, what is the rate of collisions?

Answer: The sphere moves through a distance $v\,dt$ in time $dt$, hence it sweeps through
a volume $\pi R^2 v\,dt$. This volume has $\rho\,\pi R^2 v\,dt$ gas particles, so the rate of collisions is $\rho\,\pi R^2 v$.
Therefore, the rate of momentum loss (force) is
$$F = \pi R^2 \rho\, v^2 m. \tag{18}$$
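
The difference between the two ways of averaging is easy to see numerically. The sketch below computes the average of $2\cos^2\theta$ (the momentum loss per collision in units of $mv$) with two weightings: uniform over the head-on cross section, and uniform over the surface of the front half sphere. The midpoint-rule helper and the function names are this sketch's own choices, and `rho` in `drag_force` is assumed to be the number density of gas particles.

```python
import math


def midpoint(f, a, b, n=10_000):
    """Midpoint-rule estimate of the integral of f from a to b."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def average_over_cross_section(R=1.0):
    """Average of 2 cos^2(theta), weighting each annulus by its area 2 pi r dr."""
    numerator = midpoint(lambda r: 2.0 * math.pi * r * 2.0 * (1.0 - (r / R) ** 2), 0.0, R)
    return numerator / (math.pi * R**2)


def average_over_surface():
    """Average of 2 cos^2(theta), weighting by surface area sin(theta) d(theta)."""
    numerator = midpoint(lambda th: math.sin(th) * 2.0 * math.cos(th) ** 2, 0.0, math.pi / 2)
    denominator = midpoint(math.sin, 0.0, math.pi / 2)
    return numerator / denominator


def drag_force(R, rho, v, m):
    """(collision rate) x (average momentum loss) = (rho pi R^2 v) x (m v)."""
    return math.pi * R**2 * rho * v**2 * m


print(average_over_cross_section(), average_over_surface())
```

The cross-section weighting gives 1, while the surface weighting gives 2/3, so the naive average would underestimate the drag by a third.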

Question: Why does this start to fail for a denser medium?


Answer: We assumed that each particle is hit once and then goes away forever. However,
it could be stopped by another gas particle and collide with the sphere again. For a liquid,
our formula completely falls apart because the liquid molecules grip each other strongly.
Question: We have seen a reason why the drag is greater than our estimate. Is there
some effect that reduces the amount of drag?
Answer: Pressure in the medium means that isolated collisions are a poor model and a
better approximation is sheets of laminar flow. The angle by which particles are deflected would
then be much less. In effect, you could imagine an extreme case in which the sphere is
permanently attached to a streamlined narrow wedge of gas particles that it pushes in front
of itself, in which case we would solve the same problem with a wedge instead of a sphere to
find a smaller coefficient of drag. Obviously, this is not realistic, but the effect exists.

Problem 3: Distribution of Velocities in a Gas

Problem: What are the relative probabilities for molecules in a gas at equilibrium to
have velocity $\vec v = (v_x, v_y, v_z)$? That is, what is the function $f(\vec v)$ such that the probability
to have velocity $v_{0,x} < v_x < v_{0,x} + dv_x$, $v_{0,y} < v_y < v_{0,y} + dv_y$, $v_{0,z} < v_z < v_{0,z} + dv_z$ is
$f(v_{0,x}, v_{0,y}, v_{0,z})\,dv_x\,dv_y\,dv_z$?
1

Thanks go to Eric Kramer for suggesting how to solve this problem.

Solution: This seems hopeless. When something in physics seems hopeless, a direct attack
usually increases the hopelessness. We need something clever.
Question: What are some useful symmetries?
Answer: First, no direction is more awesome than any other, so $f$ can only depend on
the magnitude $v = \sqrt{v_x^2 + v_y^2 + v_z^2}$. Second, motion in every direction is independent, which
means that the probability distribution of $x$-velocities is independent of the distribution of
$y$-velocities. When things are independent, probabilities simply multiply, so we must have
for some function $g$
$$f(v) = g(v_x)\,g(v_y)\,g(v_z). \tag{19}$$
Now this is actually a very significant constraint, because there are lots of ways to change
the components of $\vec v$ in such a way that the magnitude $v$ is unaffected. Then the left side
of this equation doesn't change, and so everything must also work out so that the right side
doesn't change! To make this perhaps a little starker, take the natural logarithm:
$$\ln f(v) = \ln g(v_x) + \ln g(v_y) + \ln g(v_z). \tag{20}$$
Question: Now what? Hint: what's a mathematical way to talk about changes?
Answer: At this point it is at most a hunch, but let's try taking a derivative. Specifically,
take the partial derivative of both sides with respect to $v_x$, which just means a derivative
in which we pretend $v_y$ and $v_z$ are constants. If you remember the chain rule, you get
$$\frac{1}{f(v)}\frac{df}{dv}\frac{\partial v}{\partial v_x} = \frac{1}{g(v_x)}\frac{dg}{dv_x}. \tag{21}$$
Now $v = \sqrt{v_x^2 + v_y^2 + v_z^2}$, so
$$\frac{\partial v}{\partial v_x} = \frac{2v_x}{2\sqrt{v_x^2 + v_y^2 + v_z^2}} = \frac{v_x}{v}, \tag{22}$$
whence
$$\frac{1}{f(v)}\frac{df}{dv}\frac{v_x}{v} = \frac{1}{g(v_x)}\frac{dg}{dv_x} \;\Longrightarrow\; \frac{1}{f(v)\,v}\frac{df}{dv} = \frac{1}{g(v_x)\,v_x}\frac{dg}{dv_x}. \tag{23}$$

Question: What is very peculiar about this equation?


Answer: The left side is a function of $v$ alone. The right side is a function of $v_x$ alone.
But these are two different variables! The left side can't depend on $v_y$ or $v_z$ even though it
depends on $v$. This is only possible if it is a constant. Calling this constant $-2\alpha$, we get
$$\frac{1}{f(v)\,v}\frac{df}{dv} = -2\alpha = \frac{1}{g(v_x)\,v_x}\frac{dg}{dv_x}. \tag{24}$$
Question: How do we solve the left of the equations?

Answer: Use the separation of variables method that you used in your first homework:
$$\begin{aligned}
\frac{1}{f(v)\,v}\frac{df}{dv} &= -2\alpha \\
\Longrightarrow\; \frac{df}{f} &= -2\alpha\, v\,dv \\
\ln f &= -\alpha v^2 + c \\
f &\propto \exp(-\alpha v^2).
\end{aligned} \tag{25}$$
Question: What is the physical meaning of $\alpha$?

Answer: When $\alpha$ is large, large velocities are suppressed. Thus $1/\alpha$ represents the
temperature. To see this more directly, we could use the distribution to calculate the average
kinetic energy of a molecule (which is proportional to the temperature) and we would find
that it is proportional to $1/\alpha$.
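
The claim about the average kinetic energy can be checked by sampling. In the sketch below, `alpha` is this sketch's name for the constant in a distribution of the form $f \propto \exp(-\alpha v^2)$. Each velocity component is then Gaussian with variance $1/(2\alpha)$, so the mean of $v^2$ should come out close to $3/(2\alpha)$. The sample size and seed are arbitrary choices.

```python
import math
import random


def f(vx, vy, vz, alpha):
    """Unnormalized distribution in terms of the speed squared."""
    return math.exp(-alpha * (vx * vx + vy * vy + vz * vz))


def g(u, alpha):
    """Unnormalized distribution of a single velocity component."""
    return math.exp(-alpha * u * u)


def mean_speed_squared(alpha, n=100_000, seed=0):
    """Sample velocities component by component and average v^2."""
    rng = random.Random(seed)
    sigma = math.sqrt(1.0 / (2.0 * alpha))   # exp(-alpha u^2) is Gaussian with this width
    total = 0.0
    for _ in range(n):
        vx = rng.gauss(0.0, sigma)
        vy = rng.gauss(0.0, sigma)
        vz = rng.gauss(0.0, sigma)
        total += vx * vx + vy * vy + vz * vz
    return total / n


# Doubling alpha should roughly halve the mean of v^2 (and hence the mean kinetic energy).
print(mean_speed_squared(1.0), mean_speed_squared(2.0))
```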

Physics 16 Section 3, September 20, 2011


1 Crash Course in Multivariable Differential Calculus

1.1 Partial Derivatives

Suppose we have some multivariable function $f(x, y)$ and want to know how much it changes
when $x$ or $y$ changes. To do this, we simply differentiate with respect to $x$ (say) in the familiar
way and pretend that $y$ is a constant. The definition of such an operation is
$$\frac{\partial f}{\partial x}(x_0, y_0) \equiv \lim_{\Delta x\to 0}\frac{f(x_0 + \Delta x, y_0) - f(x_0, y_0)}{\Delta x}. \tag{1}$$
We define the partial derivative with respect to $y$ analogously. For example,
$$\frac{\partial}{\partial x}\left(x^2 + y^2\right) = 2x, \qquad \frac{\partial}{\partial x}(xy) = y, \qquad \frac{\partial}{\partial y}\left(\sin(xy)\right) = x\cos(xy). \tag{2}$$

1.2 Chain Rule

Now that we can take derivatives, let's invent a chain rule. First, a digression about fruit
sales. Suppose I sell $n_a$ apples and $n_p$ pears at prices $p_a$ and $p_p$. Further suppose that the
number of apples and pears I can sell is proportional to the number $m$ of sunny days in the
growing season, so that $n_{a(p)} = \alpha_{a(p)}\, m$. Then the rate of change of my total revenue $R$ with
respect to the number of sunny days is
$$\frac{\partial R}{\partial m} = \alpha_a p_a + \alpha_p p_p. \tag{3}$$
The way you reasoned this is as follows: each sunny day gives $\alpha_a$ apples, and each one of
those apples gives revenue $p_a$, for a total of $\alpha_a p_a$. Repeating the logic for pears gives the
above total. Now note that this is just another way of saying
$$\frac{\partial R}{\partial m} = \frac{\partial R}{\partial n_a}\frac{\partial n_a}{\partial m} + \frac{\partial R}{\partial n_p}\frac{\partial n_p}{\partial m}. \tag{4}$$
This is exactly the same as the single-variable chain rule, but now the independent variable $m$
has two paths, one via $n_a$ and one via $n_p$, through which it can cause a change in $R$. This
logic generalizes: partial derivatives of composite functions are the sum of chain rule-like
expressions for each path. For example, suppose we have some function $f(g_1(x, y), g_2(x, y))$.
Then
$$\frac{\partial f}{\partial y} = \frac{\partial f}{\partial g_1}\frac{\partial g_1}{\partial y} + \frac{\partial f}{\partial g_2}\frac{\partial g_2}{\partial y}. \tag{5}$$
Note that we often get lazy and write the partial derivative symbol when technically something is a single derivative. Finally, you might think the apples and pears example was
unrigorous because it was a linear function. However, any well-behaved function looks linear
locally, so as far as derivatives are concerned our example did not suffer a loss of generality.
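
To see the "sum over paths" rule at work on something nonlinear, the sketch below estimates partial derivatives with central differences and compares two routes to the $y$-derivative of a composite function: differentiating the composite directly, and adding up the contributions through each intermediate function. The functions `g1`, `g2`, and `f` are hypothetical examples chosen for illustration.

```python
import math


def partial(func, point, index, h=1e-6):
    """Central-difference estimate of d(func)/d(point[index]), other arguments held fixed."""
    up = list(point)
    down = list(point)
    up[index] += h
    down[index] -= h
    return (func(*up) - func(*down)) / (2.0 * h)


# Hypothetical example functions.
def g1(x, y):
    return x * y


def g2(x, y):
    return x + y**2


def f(u, w):
    return math.sin(u) * w


def composite(x, y):
    return f(g1(x, y), g2(x, y))


def df_dy_by_paths(x, y):
    """Chain rule: one term for the path through g1, one for the path through g2."""
    u, w = g1(x, y), g2(x, y)
    return (partial(f, (u, w), 0) * partial(g1, (x, y), 1)
            + partial(f, (u, w), 1) * partial(g2, (x, y), 1))


print(df_dy_by_paths(0.7, 1.3), partial(composite, (0.7, 1.3), 1))
```

The two printed numbers should agree to many digits, even though neither `f` nor the intermediate functions are linear.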
1.3 Vector Operator # 1: Divergence

Quick definition: a vector field is something that assigns a vector to each point in space.
Suppose we wanted a measure of how much a vector field $\vec A$ is flowing outward. In the
diagram below, the right face (colored yellow) of the infinitesimal box faces in the $\hat x$ direction,
so the amount of flow leaving this face is proportional to the $x$-component $A_x$ of the
field, evaluated at $(x + dx, y, z)$. The amount entering the left face is similarly $A_x(x, y, z)$.
Subtracting, we get a contribution
$$A_x(x + dx, y, z) - A_x(x, y, z) \propto \frac{\partial A_x}{\partial x}. \tag{6}$$
Adding the contributions from the four other faces we can define the divergence
$$\nabla\cdot\vec A \equiv \frac{\partial A_x}{\partial x} + \frac{\partial A_y}{\partial y} + \frac{\partial A_z}{\partial z}. \tag{7}$$
The dot product notation is a slight abuse of notation but makes sense if we define the
nabla operator
$$\nabla \equiv \left(\frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z}\right). \tag{8}$$
Those of you who have seen the divergence theorem can now appreciate its intuitive
meaning. The flux across a closed surface is the amount flowing out minus the amount flowing in.
If this surface encloses a volume $V$, then we can subdivide $V$ into infinitesimal boxes, and
the flux is the total amount flowing out minus the total amount flowing in for these boxes.
But that is simply the integral of the divergence over $V$.
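
The box-counting argument can be tried out on a concrete field. The sketch below uses a hypothetical field $\vec A = (x^2, xy, z)$ on the unit cube, estimates the divergence by central differences, and compares the volume integral of the divergence with the net outward flux through the six faces. The grid size and step are arbitrary choices.

```python
def A(x, y, z):
    """Hypothetical example field; its divergence is 2x + x + 1 = 3x + 1."""
    return (x * x, x * y, z)


def divergence(field, x, y, z, h=1e-5):
    """Central-difference estimate of dAx/dx + dAy/dy + dAz/dz."""
    dAx = (field(x + h, y, z)[0] - field(x - h, y, z)[0]) / (2 * h)
    dAy = (field(x, y + h, z)[1] - field(x, y - h, z)[1]) / (2 * h)
    dAz = (field(x, y, z + h)[2] - field(x, y, z - h)[2]) / (2 * h)
    return dAx + dAy + dAz


def volume_integral_of_divergence(field, n=20):
    """Midpoint sum of the divergence over the unit cube, split into n^3 small boxes."""
    h = 1.0 / n
    pts = [(i + 0.5) * h for i in range(n)]
    return sum(divergence(field, x, y, z) for x in pts for y in pts for z in pts) * h**3


def outward_flux(field, n=20):
    """Flow out of the far face minus flow into the near face, for each axis."""
    h = 1.0 / n
    pts = [(i + 0.5) * h for i in range(n)]
    flux = 0.0
    for u in pts:
        for w in pts:
            flux += field(1.0, u, w)[0] - field(0.0, u, w)[0]   # faces at x = 1 and x = 0
            flux += field(u, 1.0, w)[1] - field(u, 0.0, w)[1]   # faces at y = 1 and y = 0
            flux += field(u, w, 1.0)[2] - field(u, w, 0.0)[2]   # faces at z = 1 and z = 0
    return flux * h * h


print(volume_integral_of_divergence(A), outward_flux(A))
```

Both numbers should be close to 2.5 for this field.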

1.4 Vector Operator # 2: Curl

The curl is also motivated by a tangible quantity: if you were to walk around an infinitesimal
square, how much would the force field go with or against your motion? In other words,
what is the integral of the component of the force field that is tangent to your path?
Let's say your path were in the $x$-$y$ plane (this is defined as the $z$-component of the
curl, since paths in this plane wrap around the $z$-axis). Then the component of the force
along the path $c_4$ is $-F_x(x, y + \Delta y)$, while along $c_2$ it is $F_x(x, y)$. This gives a total of
$F_x(x, y) - F_x(x, y + \Delta y) \propto -\partial F_x/\partial y$. Adding in the other two sides, the total amount the
force field $\vec F(x, y)$ pushes you as you go around the path is
$$(\nabla\times\vec F)_z \equiv -\frac{\partial F_x}{\partial y} + \frac{\partial F_y}{\partial x}, \tag{9}$$
which defines the curl. As with the divergence, the notation can be taken literally: the curl
can be computed as
$$\nabla\times\vec F = \begin{vmatrix} \hat x & \hat y & \hat z \\ \dfrac{\partial}{\partial x} & \dfrac{\partial}{\partial y} & \dfrac{\partial}{\partial z} \\ F_x & F_y & F_z \end{vmatrix}. \tag{10}$$
Using this geometric definition of the curl, we can give a non-rigorous but compelling
proof of Stokes' Theorem, which says that the line integral of a vector field along the boundary of a surface (the surface is two-dimensional, possibly curved like the surface of a hemisphere, and the boundary is one-dimensional) equals the integral of the curl of the vector
field over the surface itself.

The proof follows from the picture above. If we add (integrate) the curl over all the
infinitesimal squares, we see that contributions from an edge shared by two squares cancel;
the clockwise flow is going left for one square and right for the other, or up for one and down
for the other. The only things that don't cancel are edges that aren't shared, that is, the
boundary of the surface.
A corollary of Stokes' Theorem is that a field with zero curl does zero net pushing
around any closed path, not just infinitesimal ones. This explains why conservative forces
must have zero curl. If not, there would exist some path that would give you more and more
energy the more you walked around it.
Another corollary is that the work a conservative force does between two points doesn't
depend on the path. To prove this, suppose we have two paths, $\gamma_1$ and $\gamma_2$, from $A$ to $B$.
Then reversing $\gamma_2$ and adding it to $\gamma_1$ gives a closed path that starts and ends at $A$. Thus
the work along $\gamma_1$ minus the work along $\gamma_2$ has to equal zero, which implies that the work
along the two paths is the same. This lets us define the potential energy at point $X$ as the
(path-independent) amount of work it takes to get to $X$ from some arbitrary starting point.
The arbitrary starting point only affects the potential by an additive constant.
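
The link between curl and "pushing around a loop" can be illustrated with two hypothetical force fields: $(-y, x)$, whose curl has $z$-component 2 everywhere, and $(y, x)$, which is the gradient of $xy$ and so has zero curl. The sketch below walks counterclockwise around a square and adds up the tangential component of the force on each side. The square's position and size are arbitrary.

```python
def circulation(F, x0, y0, s, n=1000):
    """Line integral of F counterclockwise around the square with corner (x0, y0) and side s."""
    h = s / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        total += F(x0 + t, y0)[0] * h        # bottom side, walking in +x
        total += F(x0 + s, y0 + t)[1] * h    # right side, walking in +y
        total -= F(x0 + t, y0 + s)[0] * h    # top side, walking in -x
        total -= F(x0, y0 + t)[1] * h        # left side, walking in -y
    return total


def rotational(x, y):
    """Hypothetical field with curl_z = 2."""
    return (-y, x)


def gradient_of_xy(x, y):
    """Hypothetical conservative field, the gradient of x*y, with curl_z = 0."""
    return (y, x)


side = 0.5
print(circulation(rotational, 0.3, -0.2, side), 2 * side**2)   # curl times area
print(circulation(gradient_of_xy, 0.3, -0.2, side))            # conservative: no net push
```

For the rotational field the circulation equals the curl times the enclosed area, and for the gradient field it vanishes, as the corollaries above require.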

Resonance

Resonance is simple: as we have seen, many systems like to oscillate at a particular frequency.
If you push such a system back and forth at that frequency, it will keep absorbing energy without limit
unless some damping force keeps the amplitude in check. If you push it near the natural
frequency, you will generate a large amplitude.
In section we also talked about the bottle problem posed in section 1. However, it would
take too long to code the diagrams. It's easier to explain in person.

Conceptual Problem

Complete the following sentence profoundly and concisely:


Non-conservative forces appear to exist even though the fundamental forces
are conservative because...
Answer: Think about friction. An object slows down because it transfers momentum to the
atoms of the surface it's moving on. If there were only a few of these atoms (and if quantum
physics were turned off), we could keep track of them as a billiards problem. However, there
are unspeakably many atoms, which in turn transfer energy to unspeakably more atoms in
the earth's crust. Clearly, it's much easier to pretend there is some frictional force acting on
the moving object than to solve for the motion of gondwillions (one gondwillion is defined
as the number of years ago that tectonic activity produced Gondwanaland) of atoms. Hence
our answer is
...we can't keep track of all the degrees of freedom.

Problem: Collisions and Vectors

Problem: A moving mass collides elastically with a pair of other masses at rest, something
like figure 5.44 in Morin. Assume that the three masses are equal. Show that in a two-dimensional
collision, it is impossible to have the same non-zero angle between the velocities
of each pair of particles after the collision. Note that the statement of the problem implicitly
assumes that none of the velocities in the final state vanish, because at zero velocity the
angles are not defined.
Solution: The idea is to use energy and momentum conservation efficiently in vector notation. Call the initial velocity of mass 1 $\vec v$. Call the final velocity of mass $j$ $\vec v_j$. Then
conservation of energy is (cancelling a factor of $m/2$)
$$\vec v\cdot\vec v = \vec v_1\cdot\vec v_1 + \vec v_2\cdot\vec v_2 + \vec v_3\cdot\vec v_3 \tag{11}$$
and conservation of momentum is
$$\vec v = \vec v_1 + \vec v_2 + \vec v_3. \tag{12}$$
Taking the dot product of both sides of (12) with itself gives
$$\vec v\cdot\vec v = (\vec v_1 + \vec v_2 + \vec v_3)\cdot(\vec v_1 + \vec v_2 + \vec v_3) = \vec v_1\cdot\vec v_1 + \vec v_2\cdot\vec v_2 + \vec v_3\cdot\vec v_3 + 2\left(\vec v_1\cdot\vec v_2 + \vec v_1\cdot\vec v_3 + \vec v_2\cdot\vec v_3\right). \tag{13}$$
Subtracting (11) from (13) gives
$$\vec v_1\cdot\vec v_2 + \vec v_1\cdot\vec v_3 + \vec v_2\cdot\vec v_3 = 0. \tag{14}$$
This is the mathematics we need for both parts. If all the angles are equal to $\theta$, (14) becomes
$$(v_1 v_2 + v_1 v_3 + v_2 v_3)\cos\theta = 0, \tag{15}$$
where $v_j \equiv |\vec v_j|$ is the length of $\vec v_j$.

Plane geometry implies that three vectors in a plane can all be at the same angle from
one another only if $\theta = 120^\circ$ or $0^\circ$. But in both cases, $\cos\theta \neq 0$, and so (15) implies
$$v_1 v_2 + v_1 v_3 + v_2 v_3 = 0, \tag{16}$$
which is impossible if all the masses are moving in the final state. Thus two dimensions
doesn't work.
Q: Show that this is possible in three dimensions, and compute $\theta$.
A: This is now easy from (15). We can satisfy it only if $\cos\theta = 0$. This is possible in
three dimensions: the three velocities in the final state are perpendicular to one another, so $\theta = 90^\circ$.

Problem: More Springs

Problem: A mass is attached to a two-dimensional collection of springs, all with different


spring constants and non-zero relaxed lengths. Describe the motion after it is bonked from
equilibrium.
Solution: As we have seen, the relaxed lengths need to be zero for the force to simplify
nicely. What about the potential? The potential of one spring is $(k/2)|\vec r - \vec r_0|^2$, which is
proportional to $(\vec r - \vec r_0)\cdot(\vec r - \vec r_0)$. Hence the total potential is a quadratic function of the
coordinates. This means that near equilibrium (choose coordinates where this is $x = y = 0$),
we can write
$$U(x, y) = c_0 + c_1 x^2 + c_2 y^2 + c_3 xy. \tag{17}$$
Note that there are no terms linear in $x$ or $y$, since these have non-zero partial derivative and thus
yield a non-zero force at $x = y = 0$. A new feature is the cross term $xy$, which in general
appears in any quadratic function of $x$ and $y$. Differentiating, we get
$$\vec F(x, y) = -\left(\frac{\partial U}{\partial x}, \frac{\partial U}{\partial y}\right) = -\left(2c_1 x + c_3 y,\; 2c_2 y + c_3 x\right) = -\begin{pmatrix} 2c_1 & c_3 \\ c_3 & 2c_2 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}. \tag{18}$$
Q: Is it true as it was in the homework that the system oscillates with simple harmonic
motion along the axis of the bonk?
A: No. As we can see, a bonk in the $x$-direction produces a force in the $y$-direction.
Q: Do there exist any directions in which motion is simple harmonic along a single axis?
A: Let's check. We need for some direction $\hat n$ that the force of motion along $\hat n$ is parallel
to $\hat n$. That is,
$$\begin{pmatrix} 2c_1 & c_3 \\ c_3 & 2c_2 \end{pmatrix}\begin{pmatrix} n_x \\ n_y \end{pmatrix} = \lambda\begin{pmatrix} n_x \\ n_y \end{pmatrix} \tag{19}$$
for some number $\lambda$. Those of you who know linear algebra will recognize this as an eigenvector
equation. It yields two perpendicular vectors $\hat n_1$ and $\hat n_2$, the eigenvectors, that are the axes
of simple harmonic motion.
Q: What are the eigenvalues $\lambda_{1,2}$?
A: Along the directions $\hat n_{1,2}$, the restoring force has magnitude $\lambda_{1,2}$ times the displacement. Thus they are
effective spring constants for the two special directions.
We have found two types of motion of the system that have a definite frequency. These
are its normal modes. A general motion is a superposition of the two of these, which is a
Lissajous figure (see Wikipedia) along axes $\hat n_{1,2}$.
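
For a two-by-two symmetric matrix the eigenvector problem can be solved in closed form. The sketch below does so for a matrix with rows $(2c_1, c_3)$ and $(c_3, 2c_2)$, returning the two effective spring constants and the unit vectors along the special axes. The coefficient values in the example call are made up for illustration.

```python
import math


def normal_modes(c1, c2, c3):
    """Eigenvalues and unit eigenvectors of the matrix [[2*c1, c3], [c3, 2*c2]]."""
    if c3 == 0.0:
        # No cross term: the coordinate axes are already the special directions.
        return [(2.0 * c1, (1.0, 0.0)), (2.0 * c2, (0.0, 1.0))]
    mean = c1 + c2
    half_gap = math.hypot(c1 - c2, c3)
    modes = []
    for lam in (mean + half_gap, mean - half_gap):
        nx, ny = c3, lam - 2.0 * c1          # solves the first row of the eigenvector equation
        norm = math.hypot(nx, ny)
        modes.append((lam, (nx / norm, ny / norm)))
    return modes


# Illustrative coefficients only.
for lam, (nx, ny) in normal_modes(1.0, 2.0, 0.5):
    print(lam, nx, ny)
```

For a mass $m$, each mode would oscillate at angular frequency $\sqrt{\lambda/m}$, provided both eigenvalues are positive so that the equilibrium is stable.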

Physics 16 Section 4, September 27-28, 2011


1 Review

1.1 Euler-Lagrange Equations

If you can write the kinetic energy $T$ and the potential energy $U$ in terms of some set of
coordinates $\{q_i\}$ and their time derivatives, then, defining the Lagrangian $L = T - U$, the
equations of motion are
$$\frac{d}{dt}\frac{\partial L}{\partial \dot q_i} - \frac{\partial L}{\partial q_i} = 0 \quad \forall i. \tag{1}$$
In Morin it is proved that these are equivalent to Newtonian mechanics. The proof is
unilluminating and there is no point recapitulating it here. This may seem like pulling a
rabbit out of a hat, and historically Lagrangian mechanics was just a convenient coincidence.
From the modern point of view, the Lagrangian is more fundamental than force. Physicists
like to start with a few symmetries and logical properties that the laws of physics ought to
exhibit and then deduce the laws from these properties. It turns out, as perhaps we may see
later in the semester, that the Lagrangian is an excellent way to encode symmetries.
By the way, you might be skeptical whether taking a partial derivative with respect to
$\dot q$ (while holding $q$ constant) is a legal operation. If you retrace Morin's proof, you will see
that what this really means is that you write the Lagrangian in terms of $q$ and $\dot q$,
pretend
that they have nothing to do with each other, and then take the derivative as a formal
(and perfectly well-defined) operation. Since this is the operation that the proof refers to,
the Euler-Lagrange equations you obtain this way are valid. Your skepticism is about the
notation, not the equations.
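
The "pretend they have nothing to do with each other" step is exactly how one would code it. In the sketch below the Lagrangian of a mass on a spring is an ordinary function of two independent arguments, `q` and `qdot`; the partial derivatives are formal finite differences in each slot; only afterwards is a trial path substituted. The spring example, the constants, and the function names are assumptions made for illustration.

```python
import math

M, K = 1.0, 4.0   # illustrative mass and spring constant


def lagrangian(q, qdot):
    """L = T - U for a mass on a spring; q and qdot are independent arguments."""
    return 0.5 * M * qdot**2 - 0.5 * K * q**2


def dL_dq(q, qdot, h=1e-4):
    """Formal partial derivative with respect to q, holding qdot fixed."""
    return (lagrangian(q + h, qdot) - lagrangian(q - h, qdot)) / (2 * h)


def dL_dqdot(q, qdot, h=1e-4):
    """Formal partial derivative with respect to qdot, holding q fixed."""
    return (lagrangian(q, qdot + h) - lagrangian(q, qdot - h)) / (2 * h)


def euler_lagrange_residual(path, t, h=1e-4):
    """d/dt (dL/dqdot) - dL/dq evaluated along a trial path q(t)."""
    def velocity(s):
        return (path(s + h) - path(s - h)) / (2 * h)

    def momentum(s):
        return dL_dqdot(path(s), velocity(s))

    return (momentum(t + h) - momentum(t - h)) / (2 * h) - dL_dq(path(t), velocity(t))


omega = math.sqrt(K / M)


def true_path(t):
    return math.cos(omega * t)


def wrong_path(t):
    return math.cos(1.5 * omega * t)


print(euler_lagrange_residual(true_path, 0.3), euler_lagrange_residual(wrong_path, 0.3))
```

The residual is essentially zero along the actual motion and clearly non-zero along the path with the wrong frequency.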

1.2 Conservation Laws and Noether's Theorem

There are two kinds of conservation laws that follow from the Lagrangian method. The
easy one is that if the Lagrangian depends on $\dot q_i$ but not $q_i$ for some coordinate $q_i$, then the
Euler-Lagrange equations immediately tell us that
$$\frac{d}{dt}\frac{\partial L}{\partial \dot q_i} = 0. \tag{2}$$
Hence $\partial L/\partial \dot q_i$ is conserved. Conservation laws of this easy variety include conservation of
linear and angular momentum.
These are a subcategory of laws that follow from Noether's Theorem, which goes as
follows. Suppose the Lagrangian (and hence the physics) is unchanged under some transformation law
\[ q_j \to \tilde{q}_j(\epsilon) = q_j + \epsilon\,\delta_j + \dots \tag{3} \]
where $\epsilon$ is some continuous parameter that represents how big the transformation is. Above,
we have Taylor-expanded to first order. For example, rotating 2-D coordinates by an
angle $\epsilon$ around the origin yields
\[ \begin{aligned}
\tilde{x} &= x\cos\epsilon - y\sin\epsilon = x - \epsilon y + \dots \\
\tilde{y} &= y\cos\epsilon + x\sin\epsilon = y + \epsilon x + \dots
\end{aligned} \tag{4} \]
Hence $\delta_x = -y$ and $\delta_y = x$.
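So far there is no physics, only the statement that a continuous transformation can be
expanded to first order. Physics occurs when the Lagrangian is unchanged by this transformation, which means that the laws of physics look the same in both the original and the
transformed coordinate systems. If two things are equal, they must also be equal to first
order in $\epsilon$, so we have
\[ \begin{aligned}
L(q_i, \dot{q}_i) &= L(\tilde{q}_i, \dot{\tilde{q}}_i) \\
&= L(q_i + \epsilon\delta_i + \dots,\ \dot{q}_i + \epsilon\dot{\delta}_i + \dots) \\
&= L(q_i, \dot{q}_i) + \epsilon \sum_i \left( \delta_i \frac{\partial L}{\partial q_i} + \dot{\delta}_i \frac{\partial L}{\partial \dot{q}_i} \right) + \dots \\
\Longrightarrow\quad 0 &= \sum_i \left( \delta_i \frac{\partial L}{\partial q_i} + \dot{\delta}_i \frac{\partial L}{\partial \dot{q}_i} \right).
\end{aligned} \tag{5} \]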

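By the Euler-Lagrange equations
\[ \frac{\partial L}{\partial q_i} = \frac{d}{dt}\frac{\partial L}{\partial \dot{q}_i}, \tag{6} \]
whence
\[ \sum_i \left( \delta_i \frac{d}{dt}\frac{\partial L}{\partial \dot{q}_i} + \dot{\delta}_i \frac{\partial L}{\partial \dot{q}_i} \right) = \frac{d}{dt}\left( \sum_i \delta_i \frac{\partial L}{\partial \dot{q}_i} \right) = 0. \tag{7} \]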
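As a numerical illustration of this result, here is a minimal sketch for the 2-D rotation example, where the first-order shifts are $-y$ for $x$ and $+x$ for $y$. The potential $U = kr^2/2 + br^4/4$, the parameter values, the initial state, and the hand-rolled RK4 integrator are all assumptions chosen for the example; any rotationally symmetric potential would do.

```python
m, k, b = 1.0, 1.0, 0.5  # assumed mass and potential parameters

def derivative(state):
    """Time derivative of (x, y, vx, vy) for U = k r^2/2 + b r^4/4."""
    x, y, vx, vy = state
    pull = -(k + b * (x * x + y * y)) / m   # central force per unit displacement
    return (vx, vy, pull * x, pull * y)

def rk4_step(state, dt):
    def nudge(slope, scale):
        return tuple(s + scale * d for s, d in zip(state, slope))
    s1 = derivative(state)
    s2 = derivative(nudge(s1, dt / 2))
    s3 = derivative(nudge(s2, dt / 2))
    s4 = derivative(nudge(s3, dt))
    return tuple(s + dt / 6 * (a + 2 * p + 2 * q + r)
                 for s, a, p, q, r in zip(state, s1, s2, s3, s4))

def noether_charge(state):
    x, y, vx, vy = state
    delta_x, delta_y = -y, x                      # first-order rotation
    return delta_x * m * vx + delta_y * m * vy    # sum_i delta_i * dL/d(qdot_i)

state = (1.0, 0.0, 0.3, 0.8)
charges = [noether_charge(state)]
for _ in range(5000):
    state = rk4_step(state, 1e-3)
    charges.append(noether_charge(state))
drift = max(charges) - min(charges)
```

The charge is just the angular momentum $m(x\dot{y} - y\dot{x})$, and `drift` stays at the level of integration error.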

Thus we have a conserved quantity. To summarize, if the Lagrangian is unaffected by a
transformation that is, to first order,
\[ q_j \to q_j + \epsilon\,\delta_j, \tag{8} \]
then
\[ \sum_i \delta_i \frac{\partial L}{\partial \dot{q}_i} \tag{9} \]
is conserved.

2 Bead on Rotating Hoop

See the section 4 Mathematica notebook.

3 Waves on a String

Problem: Derive the equation of motion of a string of length $L$, tension $T$, and density $\rho$
with its endpoints held in place. You may assume that all motion occurs in a single plane.
Solution: The string is essentially a continuous object, while we have only formulated
Lagrangian mechanics in terms of finitely many degrees of freedom. To get around this
difficulty, we could concentrate all the mass of the string at $N$ evenly-spaced points and take
the $N \to \infty$ limit. The total mass is $\rho L = mN$, where $m$ is the mass of each point, so
$m = \rho L/N = \rho\,\Delta x$, where $\Delta x = L/N$ is the spacing between points.

The natural choice of coordinates is the heights $y_i$ of each mass, where $y_0 = y_N = 0$ are
not free.
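The kinetic energy is easy:
\[ KE = \frac{m}{2} \sum_j \dot{y}_j^2. \tag{10} \]
The potential energy is the tension times the amount of stretch, which is
\[ PE = T \sum_i \left[ \sqrt{\Delta x^2 + \Delta y_i^2} - \Delta x \right] = T \sum_i \left[ \sqrt{\Delta x^2 + (y_{i+1} - y_i)^2} - \Delta x \right]. \tag{11} \]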

We could leave the potential energy like this, but then the equations of motion would be
non-linear. In a future section we might discuss the effects of non-linearity. For now, we
note that for small displacements we can take a Taylor approximation to get
\[ PE = \frac{T}{2\Delta x} \sum_i (y_{i+1} - y_i)^2. \tag{12} \]
Putting it together,
\[ L = \frac{m}{2} \sum_j \dot{y}_j^2 - \frac{T}{2\Delta x} \sum_i (y_{i+1} - y_i)^2. \tag{13} \]

To write the Euler-Lagrange equations, the easy part is
\[ \frac{\partial L}{\partial \dot{y}_j} = m\dot{y}_j = \rho\,\Delta x\,\dot{y}_j. \tag{14} \]
Q: In evaluating $\partial L/\partial y_j$, what do we have to be careful of?


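A: We have to be careful because we get a contribution when the dummy index $i$ equals
$j$ but also when $i + 1 = j$. Thus we have
\[ \frac{\partial L}{\partial y_j} = -\frac{T}{2\Delta x} \frac{\partial}{\partial y_j} \left[ (y_{j+1} - y_j)^2 + (y_j - y_{j-1})^2 \right] = \frac{T}{\Delta x} \left[ (y_{j+1} - y_j) - (y_j - y_{j-1}) \right]. \tag{15} \]
Hence the equations of motion are
\[ \rho\,\Delta x\,\ddot{y}_j - \frac{T}{\Delta x} \left[ (y_{j+1} - y_j) - (y_j - y_{j-1}) \right] = 0. \tag{16} \]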

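Q: What should we do next?
A: Dividing everything by $\Delta x$, the second term becomes a second derivative
as $\Delta x \to 0$:
\[ \rho\,\ddot{y}_k - T\,\frac{\dfrac{y_{k+1} - y_k}{\Delta x} - \dfrac{y_k - y_{k-1}}{\Delta x}}{\Delta x} = 0 \quad\Longrightarrow\quad \rho \frac{\partial^2 y}{\partial t^2} - T \frac{\partial^2 y}{\partial x^2} = 0. \tag{17} \]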

Q: How do we solve this equation?
A: If you guessed that the time dependence of the solution should involve $\cos(\omega t)$ due to
the presence of the second derivative, you are correct. But let's digress a little to show why
this is so. This equation clearly has the property of time translation invariance, which
means that if $y(x, t)$ is a solution, then so is $y(x, t + \Delta t)$. Since the equation is also linear,
all solutions can be written as linear combinations of some basis $y_i(x, t)$ of solutions. Thus
if we time-translate some solution $y_j$, we get
\[ \mathcal{T}[y_j] = \sum_k M_{jk}\, y_k, \tag{18} \]
where $\mathcal{T}$ is the time translation operator $\mathcal{T} : f(x, t) \mapsto f(x, t + \Delta t)$ and $M$ is some matrix.
Taking the eigenvectors of $M$, we get solutions that map to constant multiples of themselves
when time-translated:
\[ \mathcal{T}[y](x, t) = y(x, t + \Delta t) \propto y(x, t). \tag{19} \]
The only well-behaved functions with this property are exponentials, so we are guaranteed
a basis of solutions
\[ y(x, t) = e^{i\omega t} f(x). \tag{20} \]
Q: Furthermore...
A: ...the same logic applies to $x$, so we get solutions
\[ y(x, t) = e^{i\omega t} e^{ikx}. \tag{21} \]
Plugging this in, we find that it is a solution iff
\[ -\rho\,\omega^2 + T k^2 = 0 \quad\Longrightarrow\quad \omega = k\sqrt{T/\rho}. \tag{22} \]
Taking linear combinations of complex exponentials we get a real-valued solution
\[ y(x, t) = \cos(\omega_n t + \phi) \sin(k_n x). \tag{23} \]

Q: What is one more constraint that fixes the different values $k_n$, and in turn, $\omega_n$?
A: We need $y(0, t) = y(L, t) = 0$, so $k_n = n\pi/L$. Thus we have determined the normal
modes of the string and their frequencies.
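The continuum modes can be compared against the chain of point masses they came from. The sketch below assumes illustrative values for tension, density, and length, and uses the standard fact that $y_j = \sin(k_n x_j)\cos(\omega t)$ solves the discrete equation of motion (16) exactly when $\omega = (2/\Delta x)\sqrt{T/\rho}\,\sin(n\pi/2N)$; the function names are made up for the example.

```python
import math

T, rho, L = 4.0, 1.0, 1.0     # assumed tension, density, and length

def discrete_omega(n, N):
    """Mode frequency of the N-segment chain of point masses."""
    dx = L / N
    return 2 * math.sqrt(T / rho) / dx * math.sin(n * math.pi / (2 * N))

def continuum_omega(n):
    """omega_n = k_n * sqrt(T / rho) with k_n = n * pi / L."""
    return n * math.pi / L * math.sqrt(T / rho)

def mode_residual(n, N):
    """Largest violation of equation (16) by y_j = sin(k_n x_j) cos(omega t) at t = 0."""
    dx = L / N
    k_n = n * math.pi / L
    y = [math.sin(k_n * j * dx) for j in range(N + 1)]
    w2 = discrete_omega(n, N) ** 2
    worst = 0.0
    for j in range(1, N):
        inertia = -rho * dx * w2 * y[j]          # rho * dx * (second time derivative of y_j)
        tension = T / dx * ((y[j + 1] - y[j]) - (y[j] - y[j - 1]))
        worst = max(worst, abs(inertia - tension))
    return worst
```

For example, `discrete_omega(1, 1000)` and `continuum_omega(1)` agree to better than one part in $10^5$, and the two frequencies approach each other as `N` grows.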
Q: What quantity is conserved by virtue of horizontal translational invariance?
A: Before we take the limit $N \to \infty$, this is a discrete symmetry and Noether's Theorem
technically doesn't apply. However, it is a continuous symmetry in reality so we expect
the result to be forgiving of a lack of rigor. In terms of the continuous function $y(x, t)$ a
translation is
\[ y(x, t) \to y(x + \epsilon, t) = y(x, t) + \epsilon \frac{\partial y}{\partial x} + \dots \tag{24} \]
Back in discrete language, this says that
\[ y_i \to y_i + \frac{\epsilon}{\Delta x} (y_{i+1} - y_i), \tag{25} \]
from which we read off $\delta_i = (y_{i+1} - y_i)/\Delta x$. Thus we have the conserved quantity
\[ \sum_i \frac{y_{i+1} - y_i}{\Delta x} \frac{\partial L}{\partial \dot{y}_i} = \frac{m}{\Delta x} \sum_i (y_{i+1} - y_i)\, \dot{y}_i \;\to\; \rho \int_0^L \frac{\partial y}{\partial x} \frac{\partial y}{\partial t}\, dx. \tag{26} \]
If it seems redundant to switch back and forth from continuous to discrete so much, it should.
A tool called the functional derivative allows one to carry out the whole analysis without
discretizing.

4 Conceptual Questions

4.1 Bathtubs

Question: When the water in a bathtub drains, it forms a whirlpool. How is this possible
without violating Noether's Theorem?

4.2 Throat Singing

Question: It is possible to simultaneously sing a low note and a very high overtone. Why
doesn't changing the low note (essentially, reducing tension in your vocal cords to reduce
$\omega$, which should affect all modes, not just the fundamental one) also change the high note?
Note: in section I will try to demonstrate this. If I fail, look it up on YouTube.

Physics 16 Section 5, October 4, 2011


1 Review

1.1 Partial Derivatives and Their Notation

Suppose we had a function $f(x)$ that depends on $x$ via $x^2$ and $x^3$. For example
\[ f(x) = \sin(x^2 + x^3) = g(x^2, x^3), \quad \text{where } g(a, b) = \sin(a + b). \tag{1} \]
We could directly obtain the derivative
\[ \frac{df}{dx} = \cos(x^2 + x^3)(2x + 3x^2). \tag{2} \]

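Alternatively, we could differentiate via $g$. As discussed earlier, we apply the
chain rule to all paths of intermediate variables by which a change in $x$ can change $g(x^2, x^3)$.
Here they are $x^2$ and $x^3$ and there are two paths. The formula for what we are doing is
\[ \frac{dg}{dx} = \frac{\partial g}{\partial(\text{1st argument of } g)} \frac{d(\text{1st argument of } g)}{dx} + \frac{\partial g}{\partial(\text{2nd argument of } g)} \frac{d(\text{2nd argument of } g)}{dx}, \tag{3} \]
where we evaluate the partial derivatives of $g$ at (1st argument, 2nd argument) $= (x^2, x^3)$.
Henceforth let's call the arguments arg1 and arg2. Then
\[ \begin{aligned}
\frac{\partial g}{\partial\,\mathrm{arg1}} &= \frac{\partial}{\partial\,\mathrm{arg1}} \sin(\mathrm{arg1} + \mathrm{arg2}) = \cos(\mathrm{arg1} + \mathrm{arg2}) \\
\frac{\partial g}{\partial\,\mathrm{arg2}} &= \frac{\partial}{\partial\,\mathrm{arg2}} \sin(\mathrm{arg1} + \mathrm{arg2}) = \cos(\mathrm{arg1} + \mathrm{arg2}).
\end{aligned} \tag{4} \]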

We can introduce notation that is precise, well-suited for computers, and perhaps somewhat
soulless as follows. Define
\[ g^{(1,0)}(\mathrm{arg1}, \mathrm{arg2}) \equiv \frac{\partial g}{\partial\,\mathrm{arg1}}, \qquad g^{(0,1)}(\mathrm{arg1}, \mathrm{arg2}) \equiv \frac{\partial g}{\partial\,\mathrm{arg2}}. \tag{5} \]
Then, for example, $g^{(2,0)}$ is the second partial derivative with respect to its first argument.
The precise statement of the chain rule is then
\[ \frac{dg}{dx} = g^{(1,0)}(x^2, x^3) \frac{d(x^2)}{dx} + g^{(0,1)}(x^2, x^3) \frac{d(x^3)}{dx} = \cos(x^2 + x^3)(2x + 3x^2), \tag{6} \]

as obtained by the direct method. The important thing to take away from this is that we
were in no sense pretending that $x^2$ and $x^3$ were independent of each other. All we said is
that a change in $x$ affects both $x^2$ and $x^3$, which in turn affects $\sin(x^2 + x^3)$. Confusion comes
when we get sloppy and write
\[ g^{(1,0)}(x^2, x^3) \equiv \frac{\partial g}{\partial (x^2)}, \qquad g^{(0,1)}(x^2, x^3) \equiv \frac{\partial g}{\partial (x^3)}. \tag{7} \]
On the LHS, it is clear that we differentiate $g$ with respect to its first argument and then
plug in the value $x^2$. On the RHS, it seems that $x^2$ and $x^3$ are two distinct independent
variables.
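The slot-based reading of the chain rule is easy to check on a computer, which is where this notation is most at home. In the sketch below the helper names `g10` and `g01` are just illustrative stand-ins for $g^{(1,0)}$ and $g^{(0,1)}$, implemented with central differences: each one varies a single slot of $g$ and knows nothing about $x$.

```python
import math

def g(a, b):
    return math.sin(a + b)

def g10(a, b, h=1e-6):
    """Derivative of g with respect to its first slot, second slot held fixed."""
    return (g(a + h, b) - g(a - h, b)) / (2 * h)

def g01(a, b, h=1e-6):
    """Derivative of g with respect to its second slot, first slot held fixed."""
    return (g(a, b + h) - g(a, b - h)) / (2 * h)

def df_dx_chain(x):
    # Differentiate each slot, plug in (x^2, x^3), then multiply by the
    # derivative of whatever was plugged into that slot.
    return g10(x**2, x**3) * 2 * x + g01(x**2, x**3) * 3 * x**2

def df_dx_direct(x):
    return math.cos(x**2 + x**3) * (2 * x + 3 * x**2)
```

Calling `df_dx_chain(1.1)` and `df_dx_direct(1.1)` gives the same number up to finite-difference error.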
Incidentally, this lets us make sense of many of the partial derivatives in the textbook.
For example, for a Lagrangian $L = L(q, \dot{q}, t)$,
\[ \frac{\partial L}{\partial t} \neq \frac{dL}{dt} \ ??? \tag{8} \]
Here's what these two quantities refer to. The partial derivative really means the partial
derivative of $L$ with respect to its third argument. That is
\[ L^{(0,0,1)}(q, \dot{q}, t) \overset{\text{sloppiness}}{\equiv} \frac{\partial L}{\partial t}. \tag{9} \]

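The normal derivative, on the other hand, is the derivative of the number $L$, which depends
on $t$ via its first argument $q(t)$, its second argument $\dot{q}(t)$, and its third argument $t$. By the
chain rule we have
\[ \begin{aligned}
\frac{dL}{dt} &= L^{(1,0,0)}(q(t), \dot{q}(t), t)\, \frac{dq}{dt} + L^{(0,1,0)}(q(t), \dot{q}(t), t)\, \frac{d\dot{q}}{dt} + L^{(0,0,1)}(q(t), \dot{q}(t), t)\, \frac{dt}{dt} \\
&\overset{\text{sloppiness}}{\equiv} \frac{\partial L}{\partial q} \frac{dq}{dt} + \frac{\partial L}{\partial \dot{q}} \frac{d\dot{q}}{dt} + \frac{\partial L}{\partial t}.
\end{aligned} \tag{10} \]

1.2 Stationary Action and Extrema of Functionals

Let's review how stationary action leads to the Euler-Lagrange equations using the good
notation introduced above. We define the action $S : C^\infty[t_1, t_2] \to \mathbb{R}$,
\[ S[q] \equiv \int_{t_1}^{t_2} L(q(t), \dot{q}(t), t)\, dt. \tag{11} \]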

This is a very general kind of functional that comes up in all sorts of contexts. Now suppose
we want to find the analog of a critical point (minimum, maximum, or saddle point). Just
as a single-variable function has a critical point when its derivative vanishes, that is, when
you can change its independent variable by a small amount $\epsilon$ and get no change up to first
order in $\epsilon$, the condition for some function $q(t)$ to be a stationary point, given fixed boundary
conditions $q(t_1) = q_1$ and $q(t_2) = q_2$, is
\[ \frac{d}{d\epsilon} \int_{t_1}^{t_2} L(q(t) + \epsilon\eta(t),\ \dot{q}(t) + \epsilon\dot{\eta}(t),\ t)\, dt = 0 \quad \text{s.t. } \eta(t_1) = \eta(t_2) = 0. \tag{12} \]
We need $\eta(t)$ to vanish at the boundaries because otherwise we would be comparing a nearby
function with different boundary conditions. This would be a valid question, but it's not the
question we are asking! We are only maximizing or minimizing or finding a saddle point
within the set of functions that have the same boundary conditions at $t_1$ and $t_2$. There is no a
priori reason for this except that a lot of useful problems are of this form. Of course, we know
a posteriori that finding the stationary action subject to the boundary conditions yields the
Euler-Lagrange equations.
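Anyway, the math proceeds and we cavalierly take the derivative inside the integral sign.
Then we apply the chain rule, noting that $t$ does not depend on $\epsilon$. The condition for $q(t)$ to
be stationary is (the condition s.t. $\eta(t_1) = \eta(t_2) = 0$ is implicit throughout)
\[ \int_{t_1}^{t_2} \frac{d}{d\epsilon} L(q(t) + \epsilon\eta(t),\ \dot{q}(t) + \epsilon\dot{\eta}(t),\ t)\, dt = 0 \tag{13} \]
\[ \int_{t_1}^{t_2} \left[ L^{(1,0,0)}(q(t), \dot{q}(t), t)\, \frac{d}{d\epsilon}\big(q(t) + \epsilon\eta(t)\big) + L^{(0,1,0)}(q(t), \dot{q}(t), t)\, \frac{d}{d\epsilon}\big(\dot{q}(t) + \epsilon\dot{\eta}(t)\big) + L^{(0,0,1)}(q(t), \dot{q}(t), t)\, \frac{dt}{d\epsilon} \right] dt = 0 \tag{14} \]
\[ \int_{t_1}^{t_2} \left[ L^{(1,0,0)}(q(t), \dot{q}(t), t)\, \eta(t) + L^{(0,1,0)}(q(t), \dot{q}(t), t)\, \dot{\eta}(t) \right] dt = 0 \tag{15} \]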
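Now we integrate by parts ($\int u\, dv = uv - \int v\, du$ with $u = L^{(0,1,0)}(q(t), \dot{q}(t), t)$ and $dv = \dot{\eta}(t)\, dt$, hence $v = \eta(t)$):
\[ \begin{aligned}
\int_{t_1}^{t_2} L^{(0,1,0)}(q(t), \dot{q}(t), t)\, \dot{\eta}(t)\, dt &= \underbrace{\Big[ L^{(0,1,0)}(q(t), \dot{q}(t), t)\, \eta(t) \Big]_{t_1}^{t_2}}_{=0} - \int_{t_1}^{t_2} \frac{d}{dt}\Big[ L^{(0,1,0)}(q(t), \dot{q}(t), t) \Big] \eta(t)\, dt \qquad (17) \\
&= -\int_{t_1}^{t_2} \frac{d}{dt}\Big[ L^{(0,1,0)}(q(t), \dot{q}(t), t) \Big] \eta(t)\, dt, \qquad (18)
\end{aligned} \]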

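where we used the fact that $\eta(t_1) = \eta(t_2) = 0$. Substituting this, the stationarity (is that a word?)
condition becomes
\[ \int_{t_1}^{t_2} \left[ L^{(1,0,0)}(q(t), \dot{q}(t), t) - \frac{d}{dt} L^{(0,1,0)}(q(t), \dot{q}(t), t) \right] \eta(t)\, dt = 0. \tag{19} \]
Now this is true for all $\eta(t)$. The only thing that can be multiplied by any function such
that the product always integrates to zero is the zero function (you should be able to convince yourself
of this hand-wavingly), so we conclude that the thing in brackets is zero:
\[ \begin{aligned}
L^{(1,0,0)}(q(t), \dot{q}(t), t) - \frac{d}{dt} L^{(0,1,0)}(q(t), \dot{q}(t), t) &= 0 \qquad (20) \\
\overset{\text{sloppiness}}{\Longrightarrow}\quad \frac{\partial L}{\partial q} - \frac{d}{dt} \frac{\partial L}{\partial \dot{q}} &= 0. \qquad (21)
\end{aligned} \]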

Thus we have obtained the Euler-Lagrange equation. There are two nifty consequences of
this derivation. First, we have shown that a lot of interesting optimization problems reduce
to the Euler-Lagrange equation. Second, we have shown that Newtonian mechanics, which
is equivalent to the Euler-Lagrange equations, is equivalent to the principle of stationary
action.
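Stationarity can be seen directly by discretizing the action. The sketch below is an illustration under assumed choices: a particle in uniform gravity with $L = m\dot{q}^2/2 - mgq$ on $[0, 1]$ with $q(0) = q(1) = 0$, the perturbation $\eta(t) = \sin(\pi t)$, and a simple midpoint discretization of the integral. None of these specifics come from the notes.

```python
import math

m, g = 1.0, 9.8          # assumed mass and gravitational acceleration
N = 1000                 # number of time steps on [0, 1]
dt = 1.0 / N

def true_path(t):
    # Solves q'' = -g with q(0) = q(1) = 0.
    return 0.5 * g * t * (1.0 - t)

def eta(t):
    # Perturbation that vanishes at both endpoints.
    return math.sin(math.pi * t)

def action(eps):
    """Discretized S[q + eps * eta] for L = m * qdot^2 / 2 - m * g * q."""
    q = [true_path(i * dt) + eps * eta(i * dt) for i in range(N + 1)]
    total = 0.0
    for i in range(N):
        qdot = (q[i + 1] - q[i]) / dt
        q_mid = 0.5 * (q[i] + q[i + 1])
        total += (0.5 * m * qdot**2 - m * g * q_mid) * dt
    return total

eps = 0.01
first_order = (action(eps) - action(-eps)) / (2 * eps)    # estimate of dS/d(eps) at 0
second_order = (action(eps) - action(0.0)) / eps**2       # leading change in S
```

Here `first_order` vanishes to rounding error, while `second_order` is close to $m\pi^2/4$, the kinetic contribution of $\eta$ alone: the action changes only at order $\epsilon^2$ around the true path.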

1.3 Some Remarks on Noether's Theorem

1. $L$ and $S$ do not really have an intuitive meaning.
2. Noether's Theorem is sort of intuitive within quantum mechanics.
3. Noether's Theorem is really not an important computational tool in classical physics.
However, it is very useful in quantum field theory. Furthermore, it provides a new way
of defining momentum and energy as the quantities that are conserved by virtue of
translation invariance in space and time. This lets us apply conservation of momentum
when matter interacts with electromagnetic fields, for example. It also tells us to think
of energy as the time component of momentum, which is exactly what we will do in
special relativity.

Problem 1: Lagrangian and Small Oscillation

Problem: A particle of mass $m$ slides along a hoop of radius $R$. The hoop is aligned
vertically and rotates at angular speed $\omega$ around a vertical axis running through its center.
What are the equilibria of the particle's position, are they stable, and what are the frequencies
of small oscillations about those equilibria if they are stable?
Solution: The easiest coordinate to use is the angle $\theta$ along the hoop's circumference that
the particle is displaced from the bottom. There are two perpendicular components of the
velocity. They are $R\dot{\theta}$ along the hoop and $R\omega\sin\theta$ in a plane parallel to the ground due to
the hoop's rotation. The height relative to the center of the hoop is $-R\cos\theta$, so
\[ L = \frac{mR^2}{2} \left( \omega^2 \sin^2\theta + \dot{\theta}^2 \right) + mgR\cos\theta. \tag{22} \]
The corresponding Euler-Lagrange equation is
\[ mR^2\ddot{\theta} - mR^2\omega^2 \sin\theta\cos\theta + mgR\sin\theta = 0. \tag{23} \]

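The points of equilibrium have constant $\theta$, so $\ddot{\theta} = 0$. We find that either $\sin\theta = 0$, in which
case $\theta = 0$ or $\theta = \pi$, or $\cos\theta = g/(R\omega^2)$. $\theta = \pi$ is obviously unstable. What about $\theta = 0$?
Expanding to first order in small $\theta$ we get
\[ \ddot{\theta} - \omega^2\theta + \frac{g}{R}\theta = 0. \tag{24} \]
By now we can read off that the frequency of small oscillations is $\Omega = \sqrt{g/R - \omega^2}$, which is
real (stable) if $g/(R\omega^2) > 1$. What's interesting about this is that this is the condition for
the other equilibrium not to exist at all, since $\cos\theta = g/(R\omega^2) > 1$ is impossible.
Now let's look at the other equilibrium. We have already found a condition for its
existence; now we must check its stability. Expand to first order in the deviation of $\theta$ from
$\theta_0 = \cos^{-1}(g/(R\omega^2))$: $\theta = \theta_0 + \delta$. Then
\[ \begin{aligned}
\sin\theta &= \sin(\theta_0 + \delta) = \sin\theta_0 \cos\delta + \cos\theta_0 \sin\delta \approx \sin\theta_0 + \delta\cos\theta_0 \qquad (25) \\
\cos\theta &= \cos(\theta_0 + \delta) = \cos\theta_0 \cos\delta - \sin\theta_0 \sin\delta \approx \cos\theta_0 - \delta\sin\theta_0. \qquad (26)
\end{aligned} \]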
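Substituting these into the Euler-Lagrange equation gives
\[ \ddot{\delta} - \omega^2 (\sin\theta_0 + \delta\cos\theta_0)(\cos\theta_0 - \delta\sin\theta_0) + \frac{g}{R} (\sin\theta_0 + \delta\cos\theta_0) = 0. \tag{27} \]
Now the part of the LHS that doesn't involve $\delta$ must equal zero, since this is the definition
of an equilibrium (you could of course check this). The first order remainder is
\[ \begin{aligned}
\ddot{\delta} + \left[ \omega^2 (\sin^2\theta_0 - \cos^2\theta_0) + \frac{g}{R}\cos\theta_0 \right] \delta &= 0 \qquad (28) \\
\ddot{\delta} + \left[ \omega^2 (1 - 2\cos^2\theta_0) + \frac{g}{R}\cos\theta_0 \right] \delta &= 0 \qquad (29) \\
\ddot{\delta} + \left[ \omega^2 - \frac{g^2}{R^2\omega^2} \right] \delta &= 0. \qquad (30)
\end{aligned} \]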
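The frequency is then $\sqrt{\omega^2 - g^2/(R^2\omega^2)}$, which is real (stable) iff
\[ \omega^2 - \frac{g^2}{R^2\omega^2} > 0 \iff \omega^2 > \frac{g^2}{R^2\omega^2} \iff 1 > \frac{g^2}{R^2\omega^4} \iff \frac{g}{R\omega^2} < 1. \tag{31} \]
This is the same condition as before, so if this equilibrium exists, it is stable.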
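The equilibria and their small-oscillation frequencies can be cross-checked numerically. The sketch below writes the hoop's angular speed as `omega` and reads the angular acceleration off the Euler-Lagrange equation (23); the function names and the sample values of `g`, `R`, and `omega` are assumptions made for illustration. The squared frequency is obtained by linearizing with a finite difference, so a negative value signals instability.

```python
import math

def angular_accel(theta, omega, g, R):
    """Second time derivative of theta, solved from equation (23)."""
    return omega**2 * math.sin(theta) * math.cos(theta) - (g / R) * math.sin(theta)

def equilibria(omega, g, R):
    """theta = 0 and pi always; the tilted equilibrium only if g/(R omega^2) < 1."""
    points = [0.0, math.pi]
    c = g / (R * omega**2)
    if c < 1:
        points.append(math.acos(c))
    return points

def small_oscillation_freq_sq(theta0, omega, g, R, h=1e-5):
    """Linearize about theta0: frequency^2 is minus the slope of angular_accel."""
    slope = (angular_accel(theta0 + h, omega, g, R)
             - angular_accel(theta0 - h, omega, g, R)) / (2 * h)
    return -slope

g, R = 9.8, 1.0
fast = equilibria(5.0, g, R)     # fast rotation: three equilibria
slow = equilibria(2.0, g, R)     # slow rotation: only theta = 0 and pi
```

With `omega = 5.0`, the tilted equilibrium has squared frequency close to $\omega^2 - g^2/(R^2\omega^2)$ and the bottom is unstable; with `omega = 2.0` the bottom is stable with squared frequency close to $g/R - \omega^2$.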

Problem 2: 2-D bubbles

Problem: Show that a two-dimensional bubble subject to pressure and surface tension
forms a circle.
Solution: We need to find some energy functional for the system. Perhaps it is
\[ E = \text{circumference} \times \text{surface tension } (\sigma) - \text{area} \times \text{pressure } (P). \tag{32} \]
Q: How accurate is this? Does it matter?
A: The glaring error is that in reality the pressure decreases to zero as the circle expands, which this energy ignores. This
shouldn't matter because the essential qualitative feature is that there exists some isotropic
tendency to expand. It is not hard to generalize this to handle pressure as a function of area.
Q: Are we looking for a global minimum?
A: In general we would do this, but with our caricature of pressure we can expand the
shape by some factor $x$ so that the circumference, which goes as $x$, is dwarfed by the
area, which goes as $x^2$. Thus the energy becomes infinitely low as $x$ increases. However,
we hope that there will be a local minimum where area and circumference are balanced in
competition.
Q: How do we formulate the problem? What coordinate system? What boundary conditions?
A: The simplest way to represent a closed curve is in polar coordinates with the angle $\theta$
going from $0$ to $2\pi$. For the curve to close we need $r(0) = r(2\pi) = R$ for some $R$. We are
then hoping that our Euler-Lagrange equation can be solved by $r(\theta) = R = \text{constant}$.
Q: What's the problem with this?
A: The physics will be controlled by the ratio $\sigma/P$. For some value of this ratio and
fixed boundary condition $R$, there is no guarantee that the solution has radius $R$. It could
be a circle of a different radius, and then we would have to try a solution that is a circle not
centered at the origin.
Q: Is there a better way?
A: Yes there is. We know that for some value of the ratio $\sigma/P$, the radius will be $R$.
Thus we can plug in $r(\theta) = R$ and then solve for the $\sigma/P$ that makes this work. If we want,
we could then invert this relationship to get the radius $R$ in terms of the parameters.
Now we can start. You may recall that the area in polar coordinates is
\[ A = \int_0^{2\pi} \frac{1}{2} r^2(\theta)\, d\theta. \tag{33} \]
Q: What is the circumference?
A: The infinitesimal length can be found from the Pythagorean theorem with one perpendicular direction along increasing $r$ ($dl = dr$) and another along increasing $\theta$ ($dl = r\,d\theta$).
Then
\[ dl = \sqrt{dr^2 + r^2 d\theta^2} = \sqrt{r'(\theta)^2 + r^2}\, d\theta \quad\Longrightarrow\quad L = \int_0^{2\pi} \sqrt{r'(\theta)^2 + r^2}\, d\theta. \tag{34} \]

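Then
\[ E \propto \int_0^{2\pi} \left[ \frac{\sigma}{P} \sqrt{r'(\theta)^2 + r^2} - \frac{1}{2} r^2(\theta) \right] d\theta. \tag{35} \]
The Lagrangian is simply the integrand, and the Euler-Lagrange equation is
\[ \frac{\sigma}{P} \frac{d}{d\theta} \left[ \frac{r'(\theta)}{\sqrt{r'(\theta)^2 + r^2}} \right] - \frac{\sigma}{P} \frac{r(\theta)}{\sqrt{r'(\theta)^2 + r^2}} + r(\theta) = 0. \tag{36} \]
Plugging in $r(\theta) = R$ and $r'(\theta) = 0$, we get
\[ -\frac{\sigma R}{P R} + R = 0 \quad\Longrightarrow\quad \frac{\sigma}{P} = R. \tag{37} \]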

We have shown that the shape is a circle and we have found how the radius depends on the
physical parameters.
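The remark about circles not centered at the origin can be probed numerically. The sketch below evaluates the left-hand side of the Euler-Lagrange equation (36) by nested finite differences for an arbitrary polar curve; the helper names and the polar form used for a shifted circle, $r(\theta) = a\cos\theta + \sqrt{R^2 - a^2\sin^2\theta}$ for a center at $(a, 0)$ with $a < R$, are assumptions introduced for this illustration.

```python
import math

def el_residual(r, theta, sigma_over_P, h=1e-4):
    """Left-hand side of equation (36) for a curve r(theta), by finite differences."""
    def r_prime(th):
        return (r(th + h) - r(th - h)) / (2 * h)
    def speed(th):
        return math.sqrt(r_prime(th)**2 + r(th)**2)
    def flux(th):                       # r' / sqrt(r'^2 + r^2)
        return r_prime(th) / speed(th)
    d_flux = (flux(theta + h) - flux(theta - h)) / (2 * h)
    return sigma_over_P * (d_flux - r(theta) / speed(theta)) + r(theta)

def off_center_circle(radius, shift):
    """Polar form of a circle of the given radius centered at (shift, 0), shift < radius."""
    return lambda th: shift * math.cos(th) + math.sqrt(radius**2 - (shift * math.sin(th))**2)

angles = [0.3 * n for n in range(20)]
shifted = off_center_circle(2.0, 0.5)
shifted_residuals = [abs(el_residual(shifted, th, 2.0)) for th in angles]
wrong_radius = el_residual(lambda th: 3.0, 1.0, 2.0)
```

A circle of radius $\sigma/P$ satisfies the equation whether or not it is centered at the origin (`shifted_residuals` are all tiny), while a centered circle of radius 3 with $\sigma/P = 2$ leaves a residual of 1.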

Challenge Problem

Find the function that maximizes (absolute maximum) the functional


Z

Z

V [f ] =
0

2
f (t) dt dx

(38)

subject to the constraint


Z

f 2 (t) dt = 1.

(39)

Physics 16 Section 6, October 11-12, 2011


1 Review

Unlike previous weeks, the new material is not mathematically difficult but is conceptually
bizarre. Thus we will do something new and integrate conceptual questions into the review.
Anyway, relativity starts from two postulates:
1. The laws of physics are the same in all non-accelerating reference frames.
2. The speed of light (in a vacuum) is a law of physics.
Now this seems very nice, but something really ought to bother you about the second
postulate.
Q: What is it?
A: What's so special about light that its speed is a law of physics? After all, an electron's
speed is not a law of physics; electrons move at all sorts of speeds. If that's too abstract,
your speed is not a law of physics. Hence,
Q: Why is the speed of light a fundamental law of the universe? You should think about
it for a few minutes, after which the hint is: think hand-wavingly about the Newtonian
mechanics of massless objects.
A1: If you apply $F = ma$ to a massless object, you will find that a massless object that
interacts with other stuff with any force at all experiences an infinite acceleration. This is
untenable and requires that we somehow change the laws of physics so that speeds don't
become infinite. We could do this in three ways: give every massless object its own speed
limit with its own ad hoc reason for having that speed, invent some fancy new speed limit
laws, or simply impose the same universal speed limit on all objects. The third is the most
aesthetically satisfying.
A2: Using only Maxwell's equations, one can derive a wave equation for light which is
identical to the equation of a string we found two weeks ago:
\[ -\frac{\partial^2 \psi}{\partial x^2} + \frac{1}{c^2} \frac{\partial^2 \psi}{\partial t^2} = 0, \tag{1} \]
where $\psi$ is some component of the electric or magnetic field. The actual equation is the 3-D
generalization of this, but that's not important. We solved this and found standing waves,
but you can equally well get travelling waves. Try a solution $\psi(x, t) = f(x - vt)$ for some
function $f$. This represents a shape $f(x)$ travelling at velocity $v$. Plugging in, we get
\[ -f''(x - vt) + \frac{v^2}{c^2} f''(x - vt) = 0. \tag{2} \]
Hence $v = \pm c$. Since Maxwell's equations are laws of physics, the corollary that electromagnetic waves travel at the fixed speed $c$, which can be computed from the permeability and
permittivity of free space, must also be a law of physics.

Q: How could one get around this? Hint: why isn't the speed of waves on the string from
two weeks ago a fundamental law of physics?
A: Just as the wave equation on a string implicitly involves the reference frame of the
string, perhaps Maxwell's equations also involve some implicit reference frame, that is, some
medium in which electromagnetic waves travel. Nineteenth-century physicists hypothesized
an invisible medium called the ether, which was disproved in the famous Michelson-Morley
experiment. Nowadays people make fun of the ether concept, but really it's a very reasonable
idea. Certainly it fares no worse than relativity when subjected to Occam's Razor.
So anyway, if you are forced to accept these postulates, you can start calculating some
odd stuff such as time dilation and length contraction. To prevent chaos, one must always be
clear about what reference frame one is using. In class, a reference frame was described as a
collection of synchronized clocks. In more human terms, it is a collection of observers who can
agree on time and distance measurements. Measurement is not the same as perception;
for example, suppose two people live several miles apart and watch lightning storms for fun.
Suppose they want to agree on the exact time of lightning strikes. Although they observe
(hear) the thunder arriving at different times due to the finite speed of sound, they can
compare the times at which they heard the thunder and come up with some triangulation
scheme to agree on when the lightning actually occurred. Similarly relativity deals with
measurements that every observer in a given reference frame could agree on after accounting
for their relative positions, etc.
In lecture you saw the example of two black blocks defining a reference frame with one
blue block moving relative to them. By a light clock argument, we saw that the amount of
time (that is, the number of light clock ticks) that the blue block measures is less than the
time that the black blocks measure. That is, time for the lone observer moves more slowly
by a factor $\gamma = 1/\sqrt{1 - v^2/c^2}$.
Q: First things first, if a light clock runs slow, how do you conclude that all physical
processes run slow?
A: Otherwise, the speed of light relative to other things would be different, which is
the same as saying that the speed of light is not the same in every reference frame. For
example, if the blue observer's atomic clock did not run slowly but his light clock did, he
would measure a different speed of light (using his atomic clock) than the black observers
would with their atomic clock.
Q: How would you challenge the "lone observer, slow clock" statement?
A: Add a second blue observer who synchronizes with the first!
So now you have two reference frames, blue and black. Each thinks that the other's clock
is running slowly! But then we seem to have a paradox: for some time interval, the blue
clock ticks less than the black clock, which ticks less than the blue clock...
Q: Help!
A: The two frames don't agree on time intervals. For example, suppose the two frames
decided to measure each other's time as follows: Starting from when the right blue block
passes the left black block and ending when the right and left blocks are at the same positions
(we assume the blocks are separated by the same distance in their reference frames), the blue
frame counts the number of black ticks and vice versa. The problem is that this relies on
the right blocks and left blocks lining up simultaneously, which can be true in one frame and
false in the other. Thus there is no paradox, although there is still weirdness.
Another thing we introduced was the idea of spacetime coordinates and a spacetime
interval. If we have a point $\mathbf{x}$ and a time $t$, then we can combine them into a four-dimensional
object $(t, \mathbf{x})$. This is just convenient notation for a theory where space and time get mixed
up. Then the spacetime interval between two events $(t_1, \mathbf{x}_1)$ and $(t_2, \mathbf{x}_2)$ is just the difference
$(t_2, \mathbf{x}_2) - (t_1, \mathbf{x}_1)$. In Morin, the way to switch between different reference frames is derived.
Suppose reference frame $S'$ moves at velocity $v\hat{x}$ with respect to reference frame $S$. Then if
frame $S$ measures some two events separated by the interval $(x, t)$, then frame $S'$ measures
the interval as $(x', t')$, where in units $c = 1$
\[ \begin{aligned}
x' &= \gamma(x - vt) \qquad (3) \\
t' &= \gamma(t - vx) \qquad (4) \\
y' &= y \qquad (5) \\
z' &= z \qquad (6)
\end{aligned} \]
This is called a Lorentz transformation and can be expressed as the matrix equation
\[ \begin{pmatrix} t' \\ x' \\ y' \\ z' \end{pmatrix} = \begin{pmatrix} \gamma & -\gamma v & 0 & 0 \\ -\gamma v & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} t \\ x \\ y \\ z \end{pmatrix}. \tag{7} \]

Q: Without doing any calculation, what is the inverse of this matrix?
A: This matrix gives coordinates moving at velocity $v\hat{x}$ relative to our original coordinates.
Moving with velocity $-v\hat{x}$ restores the original frame, so the inverse is given by taking
$v \to -v$ above.
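The no-calculation argument is quick to confirm by brute force. This minimal sketch keeps only the $(t, x)$ block of the boost, in units $c = 1$; the helper names `boost` and `matmul` are illustrative.

```python
import math

def boost(v):
    """2x2 Lorentz boost acting on (t, x), in units c = 1."""
    gamma = 1 / math.sqrt(1 - v * v)
    return [[gamma, -gamma * v],
            [-gamma * v, gamma]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Boosting by v and then by -v should give back the identity.
product = matmul(boost(0.6), boost(-0.6))
```

The diagonal entries of `product` come out as $\gamma^2(1 - v^2) = 1$ and the off-diagonal entries cancel.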
Q: If the spacetime interval between two events is zero, the linearity of the transformation
implies that it is zero in every reference frame. What's the common sense reason for this?
A: If you are walking through Harvard Square, and a fish falls from the sky and hits you,
then in your reference frame the events of you being in Harvard Square and a fish being in
Harvard Square had zero spacetime interval between them. But one would hope that the
fish hits you in all reference frames, which requires zero spacetime interval.
There is an odd generalization of distance in Euclidean space, which is invariant under
rotations, to four-dimensional spacetime. It is defined as
\[ s^2 \equiv c^2 t^2 - x^2 \tag{8} \]
and you can check that it is invariant under Lorentz transformations. In words, $s$ measures
the time surplus relative to how long it takes light to travel a certain distance in space. Then
if $s^2$ is positive it means light could travel the spatial interval in less than the time interval.
If two events are separated by such an interval, this means that the earlier can affect the
later one.

Q: How does this logic work?
A: All forces are transmitted by particles travelling at or slower than $c$, so if two events
are too far separated in space for light to traverse the interval in time, one cannot influence
the other. Conversely, if the distance in space is not too great, one can influence the other.
Q: What worries you about something I said recently?
A: "The earlier one can affect the later one." Which event is earlier might depend on the
reference frame, in which case the order of cause and effect is switched! Fortunately, you can
show that if $s^2$ is positive then Lorentz transformations do not affect which event is earlier
and which is later.
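Both claims, the invariance of $s^2$ and the frame-independence of time ordering when $s^2 > 0$, can be spot-checked. The sketch below works in units $c = 1$ with one event at the origin; the sample separations and the list of boost velocities are arbitrary choices for illustration.

```python
import math

def boost_event(t, x, v):
    """Coordinates (t', x') of an event in a frame moving at velocity v, with c = 1."""
    gamma = 1 / math.sqrt(1 - v * v)
    return gamma * (t - v * x), gamma * (x - v * t)

def interval_sq(t, x):
    return t * t - x * x

velocities = [i / 10 for i in range(-9, 10)]
timelike = (2.0, 1.0)     # s^2 = 3 > 0
spacelike = (1.0, 2.0)    # s^2 = -3 < 0

# The second event is later than the origin event (t > 0) in the original frame.
timelike_order_kept = all(boost_event(*timelike, v)[0] > 0 for v in velocities)
spacelike_order_flips = any(boost_event(*spacelike, v)[0] < 0 for v in velocities)
```

For the spacelike pair, any frame with $v > 1/2$ sees the order reversed, which is harmless because neither event can influence the other.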
In lecture it was pointed out that time and space aren't as different as we perceive. For
example, we only experience one instant of time at once, while space seems to be spread out
before us all at once. However, really we only experience one point in space at once as well.
Photons (sound waves, wafting scents, etc.) emitted at different points in space and previous
times travel through time and space to reach us, so we see the past. However, because light
travels so quickly we think we are seeing an instantaneous snapshot of all points in space.
This brings up a conundrum.
Q: If you can only observe one point in spacetime, then how is a sense of self possible
when our brains are extended objects and not localized to a single point?
A: I have no idea!
Finally, since a lot of people mentioned this on their QAs, let's talk a bit about rapidity.
Unlike some things we have dealt with recently, rapidity has a very clear physical meaning.
In Newtonian mechanics, if an object is pushed by a constant force, its velocity increases
linearly with time. In relativity, an object pushed by a constant force (that is, it experiences
a constant acceleration in its rest frame) has a rapidity that increases linearly with time.
Basically, rapidity measures how much something has been pushed, but because there is a
fundamental speed limit $c$, to get a speed you have to smush the interval $[0, \infty)$ of possible rapidities into the interval $[0, c)$ of possible speeds. As you can check for yourself, the
hyperbolic tangent is such a smushing function.
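To prove that it's a hyperbolic tangent, let $\phi$ be the total amount that an object has
accelerated according to its rest frame and let it accelerate in its rest frame an infinitesimal
amount $\delta$. Then by velocity addition
\[ v(\phi + \delta) = \frac{v(\phi) + \delta}{1 + v(\phi)\,\delta} = v(\phi) + \delta\,(1 - v(\phi)^2) + O(\delta^2). \tag{9} \]
Thus
\[ v'(\phi) = \lim_{\delta \to 0} \frac{v(\phi + \delta) - v(\phi)}{\delta} = 1 - v^2 \quad\Longrightarrow\quad v = \tanh\phi. \tag{10} \]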
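The limiting argument can be mimicked by applying many small kicks with the velocity addition formula. This is a minimal sketch in units $c = 1$; the step count and the sample rapidities are arbitrary choices.

```python
import math

def add_velocities(u, v):
    """Relativistic velocity addition in units c = 1."""
    return (u + v) / (1 + u * v)

def velocity_after_many_kicks(phi, steps=100000):
    """Start at rest and add the small velocity phi/steps, `steps` times over."""
    delta = phi / steps
    v = 0.0
    for _ in range(steps):
        v = add_velocities(v, delta)
    return v

final_speed = velocity_after_many_kicks(1.5)
```

The result approaches `math.tanh(1.5)` as the kicks get smaller, and the same addition formula shows that rapidities simply add: `add_velocities(math.tanh(a), math.tanh(b))` equals `math.tanh(a + b)`.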

I uploaded a document that derives Lorentz transformations from the invariant interval
onto the website, and it shows the mathematical significance of the rapidity.
Let's do some problems.

2 Problems

2.1 Problem 1: Lorentz invariance of wave equation

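Problem: Show that the wave equation is Lorentz-invariant. That is, if
\[ \left( \frac{\partial^2}{\partial x^2} - \frac{\partial^2}{\partial t^2} \right) \psi = 0 \tag{11} \]
is true in one reference frame, then the same equation is true if we replace $(t, x) \to (t', x')$.
Solution: This calls for the chain rule, which we will apply to second derivatives for the
first time. We can say that $\psi$ depends on $(t, x)$ via the intermediate variables
\[ \begin{aligned}
x' &= \gamma(x - vt) \qquad (12) \\
t' &= \gamma(t - vx) \qquad (13)
\end{aligned} \]
Then the chain rule says that, for any function $f$,
\[ \begin{aligned}
\frac{\partial f}{\partial x} &= \frac{\partial f}{\partial x'} \frac{\partial x'}{\partial x} + \frac{\partial f}{\partial t'} \frac{\partial t'}{\partial x} \qquad (14) \\
&= \gamma \frac{\partial f}{\partial x'} - \gamma v \frac{\partial f}{\partial t'}. \qquad (15)
\end{aligned} \]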

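As an operator equation, this says that
\[ \frac{\partial}{\partial x} = \gamma \left( \frac{\partial}{\partial x'} - v \frac{\partial}{\partial t'} \right). \tag{16} \]
Likewise, we find that
\[ \frac{\partial}{\partial t} = \gamma \left( -v \frac{\partial}{\partial x'} + \frac{\partial}{\partial t'} \right). \tag{17} \]
Hence
\[ \begin{aligned}
\frac{\partial^2}{\partial x^2} - \frac{\partial^2}{\partial t^2} &= \gamma^2 \left[ \left( \frac{\partial}{\partial x'} - v \frac{\partial}{\partial t'} \right)^2 - \left( -v \frac{\partial}{\partial x'} + \frac{\partial}{\partial t'} \right)^2 \right] \qquad (18) \\
&= \gamma^2 \left[ (1 - v^2) \left( \frac{\partial^2}{\partial (x')^2} - \frac{\partial^2}{\partial (t')^2} \right) + (2v - 2v) \frac{\partial^2}{\partial x'\, \partial t'} \right] \qquad (19) \\
&= \gamma^2 (1 - v^2) \left( \frac{\partial^2}{\partial (x')^2} - \frac{\partial^2}{\partial (t')^2} \right) \qquad (20) \\
&= \frac{\partial^2}{\partial (x')^2} - \frac{\partial^2}{\partial (t')^2}. \qquad (21)
\end{aligned} \]

2.2 Problem 2: Taste of Relativistic Electromagnetism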

Problem: One of the postulates of relativity is that the laws of physics are the same in all
inertial reference frames. It would certainly contradict this if a force existed in one frame
and not in another. But wait... do you remember magnetism from AP physics?
Q: What's the problem?
A: Magnetic fields are produced by and act on moving charges. A charged particle that
is moving in one frame and experiences a magnetic force is motionless in some other frame.
The force is no more.
Q: What's a possible resolution?
A: As we will see, the forces remain the same, but in one frame they are called magnetic
and in another they are called electric. In other words, given relativity and electrostatics,
magnetism must exist. If that's not compelling enough for you, I don't know what is. Let's
now state the quantitative part of our problem.
Problem: Suppose we model an electric current as a very long line with density $\lambda$ of
positive charges moving at velocity $v\hat{z}$ and a density $-\lambda$ of negative charges moving at velocity
$-v\hat{z}$, for a net current of $I = 2\lambda v$. This produces a magnetic field
\[ B(r) = \frac{\mu_0 I}{2\pi r}. \tag{22} \]
If a charge $q$ is a distance $r$ from the wire and moves at a small velocity $u\hat{z}$, it experiences
a magnetic force $q\mathbf{u} \times \mathbf{B}$, which you can check points toward the wire for positive $u$ and $q$.
How can we understand this in the rest frame of the charged particle?
Solution: We have no magnetic force, since the particle is at rest in its own frame, so somehow the wire must produce an electric force
acting on the charged particle. However, the wire is neutral, so it seems there can be no
electric force.
Q: What's going on?
A: Perhaps the wire is not so neutral after all. Consider the positive and negative current
carriers separately, since they are moving with different velocities. We can use the velocity
addition formula to obtain the velocities $v_\pm$ of the positive and negative carriers in the frame
of the particle (the lab frame has velocity $-u$ with respect to the particle, to which we add velocities
$\pm v$ with respect to the lab frame):
\[ v_\pm = \frac{-u \pm v}{1 \mp uv}. \tag{23} \]
Common sense and the above formula tell us that for $u > 0$ the negative carriers, which are
moving opposite the direction of the particle in the lab frame, have a larger velocity in the
particle's frame. Thus $|v_-| > |v_+|$.
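Q: What's the next step?
A: Consider some length-$L$ chunk of the wire, and consider the positive and negative
charges separately. They are length-contracted by factors
$$L_\pm = L/\gamma_\pm = L\sqrt{1 - v_\pm^2}. \tag{24}$$
This means that the magnitudes of their densities are increased by the same factors:
$$\lambda_\pm = \lambda\gamma_\pm = \lambda\big/\sqrt{1 - v_\pm^2}. \tag{25}$$
(Strictly, the contraction is relative to each species' own rest frame, which rescales both
densities by the same factor and does not change the sign of the result.)
Since the negative carriers move faster, $\gamma_- > \gamma_+$ and their density is enhanced more. Thus
the charged particle sees a negatively charged wire with net charge density $\lambda(\gamma_+ - \gamma_-)$. This
confirms that for $q > 0$ and $u > 0$ the particle is attracted to the wire.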
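The sign argument above can be checked numerically. The following is a minimal sketch in units with $c = 1$; the function names and the sample values of $\lambda$, $v$, $u$ are illustrative assumptions, not part of the notes. It evaluates eq. (23) for both carrier species and the net density $\lambda(\gamma_+ - \gamma_-)$, which should come out negative for $u > 0$ and equal to $-2\lambda uv\,\gamma_u\gamma_v$.

```python
import math

def gamma(v):
    """Lorentz factor for speed v (units with c = 1)."""
    return 1.0 / math.sqrt(1.0 - v * v)

def carrier_velocities(v, u):
    """Eq. (23): velocities of the + and - carriers in the particle's frame."""
    v_plus = (-u + v) / (1.0 - u * v)
    v_minus = (-u - v) / (1.0 + u * v)
    return v_plus, v_minus

def net_density(lam, v, u):
    """Net line charge density lam * (gamma_+ - gamma_-) seen by the particle."""
    v_plus, v_minus = carrier_velocities(v, u)
    return lam * (gamma(v_plus) - gamma(v_minus))

# Hypothetical sample values: carriers at 0.3c, test charge at 0.01c.
print(net_density(1.0, 0.3, 0.01))
```

The net density is proportional to $u$ for small $u$, just as the magnetic force $quB$ is in the lab frame.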

2.3 Problem 3: Trains, Conductors, etc.

Problem: A train has length $L$ in its rest frame and is moving with velocity $v_t\hat{x}$ with respect
to a station, where $v_t > 0$. A conductor walks from the left end to the right end at speed $v_c$ relative to the train.
His dog also starts at the left end and runs to the right end, back to the conductor, back to
the right end, etc. at speed $v_d > v_c$ relative to the train. The conductor has an extremely
accurate biological clock to measure the dog's age at both ends of the train. According to
the conductor's clock, how much does his dog age?
Solution: The first thing to realize is that the station frame is irrelevant. The second
thing to do is to translate this as a physics question.
Q: What is the question?
A: It is, how much time elapses in the dog's reference frame?
Q: How can we obtain this?
A: We could find the time in the train's reference frame and then use the fact that
the dog's clock runs slowly. It's essentially the twin paradox with the dog replacing the
space-traveling twin and the train replacing the Earth.
Q: In the train's reference frame, how much time elapses?
A: Considering the conductor's path, it is $L/v_c$, so the time elapsed in the dog's frame is
$$\frac{L\sqrt{1 - v_d^2}}{v_c} = \frac{L}{\gamma_d v_c}. \tag{26}$$
Q: Does it matter that the conductor's frame is not the same as the train's frame?
A: No. The clock is moving with respect to the train, but its measurements are perfectly
valid spacetime events in the train frame. Of course, we could solve the problem in the
conductor's frame too, but it would be more complicated.
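As a quick sanity check of eq. (26), here is a small sketch (units with $c = 1$; the function name and sample numbers are made up for illustration). Because the dog's speed relative to the train is constant, its proper time is just the train-frame duration divided by $\gamma_d$.

```python
import math

def dog_aging(L, v_c, v_d):
    """Proper time elapsed for the dog, eq. (26), in units with c = 1."""
    t_train = L / v_c                      # duration of the walk in the train frame
    return t_train * math.sqrt(1.0 - v_d ** 2)

# Hypothetical numbers: unit-length train, conductor at 0.1c, dog at 0.6c.
print(dog_aging(1.0, 0.1, 0.6))
```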

Physics 16 Section 7, October 18, 2011


1 Review

1.1 4-vectors

A 4-vector is defined as a collection of four numbers $A = (A_0, A_1, A_2, A_3) = (A_0, \mathbf{A})$ that
mix up with the same Lorentz transformation that mixes up position and time as
$$A'_0 = \gamma(A_0 - vA_1) \tag{1}$$
$$A'_1 = \gamma(A_1 - vA_0) \tag{2}$$
when you switch between reference frames with relative velocity $v$. This is analogous to
regular vectors in two and three dimensions. For example, in two dimensions the position
vector $(x, y)$ transforms as
$$x' = \cos\theta\, x - \sin\theta\, y \tag{3}$$
$$y' = \sin\theta\, x + \cos\theta\, y \tag{4}$$
when you rotate your coordinates by an angle $\theta$. Then velocity and acceleration are vectors
because the same rule works if you replace $(x, y)$ by $(v_x, v_y)$ or $(a_x, a_y)$. In contrast, (Moose,
Squirrel) is not a vector, since when you rotate your coordinates you do not get (Moose
$\cos\theta$ + Squirrel $\sin\theta$ ...)
We can divide both sides of the above equations by some number $\alpha$ that does not depend
on the reference frame to get an identical transformation law for $A/\alpha$, which shows that $A/\alpha$
is a 4-vector. This seems a little useless: it tells us that $2A$ is also a 4-vector. However, let
$x(\tau) = (t, \mathbf{x})$ be the 4-vector describing a particle's spacetime coordinates in a laboratory
frame as a function of the particle's proper time $\tau$. Then $x(\tau + d\tau) - x(\tau)$ is also a 4-vector
for any $d\tau$. Now $d\tau$ is the same in every frame (it is defined that way), so we can choose
$\alpha = d\tau$ and take the limit $d\tau \to 0$ to get that
$$V \equiv \frac{dx}{d\tau} = \frac{dt}{d\tau}\frac{dx}{dt} = \gamma(1, \mathbf{v}) \tag{5}$$
is a 4-vector. Multiplying by the particle's mass $m$ we get another 4-vector
$$P \equiv (\gamma m, \gamma m\mathbf{v}). \tag{6}$$
Taylor expanding gives
$$P_0 \approx m + \frac{1}{2}mv^2, \tag{7}$$
while $\mathbf{P}$ looks like regular momentum with a factor of $\gamma$ slapped on to let it become infinite
even though $v$ is bounded by $c$. Furthermore, Morin shows that both of these quantities are
conserved in simple collisions. Thus heuristically it seems like they ought to be the energy
and momentum, so that $P = (E, \mathbf{p})$. More on that later.
Just as $t^2 - x^2$ is invariant as a consequence of how Lorentz transformations work, the
same must be true for any 4-vector:
$$A_0^2 - |\mathbf{A}|^2 = \text{constant}. \tag{8}$$
In particular, for the energy-momentum 4-vector
$$P_0^2 - |\mathbf{P}|^2 = E^2 - p^2 = m^2\gamma^2(1 - v^2) = m^2\frac{\gamma^2}{\gamma^2} = m^2. \tag{9}$$
This is actually just a special case of the invariance of the 4-vector inner product. Defining
$$A \cdot B \equiv A_0B_0 - \mathbf{A}\cdot\mathbf{B}, \tag{10}$$
one can show that $A \cdot B = A' \cdot B'$ is the same in any inertial reference frame. Then for
energy-momentum we have
$$P^2 \equiv P \cdot P = m^2. \tag{11}$$
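The invariance claims above are easy to test numerically. The following is a minimal sketch (units with $c = 1$; the helper names `boost_x`, `dot4`, and `four_momentum` are hypothetical, chosen for this illustration). It boosts two 4-momenta along $x$ and compares inner products before and after.

```python
import math

def boost_x(A, v):
    """Eqs. (1)-(2): boost the 4-vector A = (A0, A1, A2, A3) along x with velocity v."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    A0, A1, A2, A3 = A
    return (g * (A0 - v * A1), g * (A1 - v * A0), A2, A3)

def dot4(A, B):
    """Eq. (10): A0*B0 minus the ordinary 3-vector dot product."""
    return A[0] * B[0] - A[1] * B[1] - A[2] * B[2] - A[3] * B[3]

def four_momentum(m, vel):
    """Eq. (6): P = (gamma*m, gamma*m*v) for mass m and 3-velocity vel."""
    g = 1.0 / math.sqrt(1.0 - sum(c * c for c in vel))
    return (g * m,) + tuple(g * m * c for c in vel)

# Hypothetical particles; compare P.Q and P.P in two frames.
P = four_momentum(2.0, (0.3, 0.4, 0.0))
Q = four_momentum(1.0, (-0.5, 0.0, 0.2))
print(dot4(P, Q), dot4(boost_x(P, 0.6), boost_x(Q, 0.6)))
print(dot4(P, P))  # should equal m**2 for P
```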
For a single particle this is a useful algebraic relation between $E$, $p$, and $m$, but we can use
it for the total momentum of a system, since the sum of 4-vectors is a 4-vector. Then we get
the mass of the system
$$m = \sqrt{P_{\text{tot}}^2}, \tag{12}$$
which says that the mass of a system is the energy in its zero-momentum frame ($P_0 = m$
when $v = 0 \Leftrightarrow \gamma = 1$). Said another way, mass is defined as the energy something has
just for existing, and not for doing anything. The unsettling consequence of this is that the
mass of a system of particles is not simply the sum of the masses of its constituents. If this
doesn't bother you, you are the most sanguine physics student in history. Let's try to make
it bother you less.
Q: Basically the problem is that you think of mass as essentially "stuff". Masses should
add together in the same way that 4 ducks and 5 ducks make 9 ducks (or 4 atoms plus 5
atoms etc). How do you get around this?
A: You skirt the issue completely by stating that "stuff" does not exist! On a subatomic
level there is no such thing as a solid lump of stuff, just point particles and empty space
or fuzzy wavefunctions, depending on your point of view. Thus there is no sense in which
two electron lumps make a lump that's twice as big. So we have gotten over this issue,
with the price that now we realize that in some sense only abstract mathematical objects,
not "things", actually exist.
Now maybe it bothers you that something has energy just by existing, without doing
anything. For a composite particle it's not a big deal, since in the zero-momentum frame
of a proton there are a lot of internal motions of the quarks, hence the proton is doing
something. But for a fundamental particle like an electron, it seems weird that something
has energy in the absence of dynamics.
Q: What's a way around this one?
A: As a condensed matter theorist I am over my head here, but in string theory all the
things we thought of as fundamental particles are actually excitations of a string. The string
has to do something for the particles to exist and hence to have energy.
Anyway, to answer the question that some of you may be wondering about, this way of defining
mass is experimentally correct. A proton's mass is much larger than the sum of its three quark
masses. This mass is in no sense not a real mass. You can use it for any calculation of
gravity and inertia as if the proton were a black box.
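To see concretely that system mass is not additive, here is a short sketch of eq. (12) (units with $c = 1$; `system_mass` is a hypothetical helper written for this example). Two massless photons moving in opposite directions form a system with nonzero mass, while two parallel photons do not.

```python
import math

def system_mass(four_momenta):
    """Eq. (12): invariant mass sqrt(P_tot^2) of a list of (E, px, py, pz) tuples."""
    E, px, py, pz = (sum(P[k] for P in four_momenta) for k in range(4))
    m_sq = E * E - px * px - py * py - pz * pz
    return math.sqrt(max(m_sq, 0.0))  # clamp tiny negative rounding errors

back_to_back = [(1.0, 1.0, 0.0, 0.0), (1.0, -1.0, 0.0, 0.0)]
parallel = [(1.0, 1.0, 0.0, 0.0), (1.0, 1.0, 0.0, 0.0)]
print(system_mass(back_to_back), system_mass(parallel))
```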

1.2 Lagrangian

As discussed a few weeks ago, the fundamental way to define energy and momentum is
as the quantities that are conserved by virtue of translation invariance in time and space,
respectively. This requires a Lagrangian and Noether's Theorem. Since the laws of motion
follow from stationary action and need to be the same in all inertial frames, we hope that the
action itself is frame-independent. For little reason other than that we don't have many invariant
quantities to work with and that the Lagrangian ought not to depend explicitly on position,
we take
$$S = -m\int d\tau = -m\int\sqrt{dt^2 - dx^2} = -m\int\sqrt{1 - \dot{x}^2}\,dt. \tag{13}$$
This is your first taste of writing down a Lagrangian that satisfies certain symmetries (in this
case, Lorentz invariance) with no other justification. This method isn't as much of a blind
guess as it seems, for very deep reasons that you will learn about in quantum field theory.
If it strikes you as unsatisfying now, without the perspective of QFT, I don't blame you.
What we have done is to toss out all the physics we have learned except the idea that maybe
Lagrangians are a good way of describing things and the need for Lorentz invariance. Then
we tried to write down the simplest possible Lorentz-invariant Lagrangian and hoped that the
consequences would match experimental evidence. This is not a derivation of the relativistic
Lagrangian in the sense that you start with some laws of physics and work backwards in
logical steps until you cast it all in the form of a Lagrangian. In some sense all we did was
hope that the simple Lagrangian was actually correct. This works in physics unreasonably
often. Sometimes (always?) there is a deep principle in the background and it's not actually
voodoo.
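Anyway, the Lagrangian has the desired invariance properties under translations in space
and time, so the machinery we developed earlier gives
$$E = \sum_i \frac{\partial L}{\partial\dot{x}_i}\dot{x}_i - L \tag{14}$$
$$= \sum_i \frac{m\dot{x}_i^2}{\sqrt{1 - \dot{x}^2}} + m\sqrt{1 - \dot{x}^2} \tag{15}$$
$$= \gamma mv^2 + m/\gamma \tag{16}$$
$$= (m/\gamma)(1 + v^2\gamma^2) \tag{17}$$
$$= (m/\gamma)\left(1 + \frac{v^2}{1 - v^2}\right) \tag{18}$$
$$= (m/\gamma)\,\frac{1 - v^2 + v^2}{1 - v^2} = \frac{m}{\gamma}\gamma^2 = \gamma m \tag{19}$$
and
$$p_i = \frac{\partial L}{\partial\dot{x}_i} = \gamma m\dot{x}_i. \tag{20}$$
These are the energy and momentum that we found earlier. The relativistic Lagrangian has
now served its purpose.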
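The chain (14)-(20) can be checked without any algebra by differentiating the Lagrangian numerically. This is a rough sketch, assuming units with $c = 1$ and a central finite difference with a hand-picked step `h`; the function names are illustrative only.

```python
import math

def lagrangian(vel, m=1.0):
    """L = -m * sqrt(1 - v^2), the integrand of eq. (13)."""
    return -m * math.sqrt(1.0 - sum(c * c for c in vel))

def momentum(vel, m=1.0, h=1e-6):
    """p_i = dL/dv_i, estimated by central finite differences."""
    p = []
    for i in range(3):
        up, dn = list(vel), list(vel)
        up[i] += h
        dn[i] -= h
        p.append((lagrangian(up, m) - lagrangian(dn, m)) / (2.0 * h))
    return p

def energy(vel, m=1.0):
    """E = sum_i p_i v_i - L, eq. (14)."""
    p = momentum(vel, m)
    return sum(pi * vi for pi, vi in zip(p, vel)) - lagrangian(vel, m)

vel = (0.3, 0.4, 0.0)                      # hypothetical velocity, speed 0.5
g = 1.0 / math.sqrt(1.0 - 0.25)
print(energy(vel), g)                      # both should be close to gamma * m
```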

2 Problems

2.1 Pair Formation

Problem: Two photons of energy $E_1$ and $E_2$ collide at an angle $\theta$. What is the minimum
possible $E_1$ to produce two particles of mass $m$?
Solution: Instead of immediately crunching a lot of kinematics, let's make our task easier.
Q: What is the least possible total energy of the two emitted particles? Hint: their
total momentum is fixed by momentum conservation, so the question is: given $p$, how do we
minimize $E$?
A: For fixed total momentum, energy is minimized by having the least possible internal
dynamics of the two-particle system, that is, the least possible energy in the zero-momentum
frame. This is achieved if the two particles have the same velocity. Then in this best-case
scenario, we essentially produce one particle of mass $2m$. Let $P$ be the 4-momentum of this
fictitious particle. Then
$$\gamma_1 + \gamma_2 = P, \tag{21}$$
where $\gamma_{1(2)}$ are the photon 4-momenta (not Lorentz factors).
Q: What are these?
A: Since $E = |\mathbf{p}|$ for photons, we can choose coordinates where
$$\gamma_1 = (E_1, E_1, 0, 0) \tag{22}$$
$$\gamma_2 = (E_2, E_2\cos\theta, E_2\sin\theta, 0) \tag{23}$$
Q: Now what?
A: Square both sides of 4-momentum conservation and use $\gamma_1^2 = \gamma_2^2 = 0$ to get
$$(\gamma_1 + \gamma_2)^2 = P^2 \tag{24}$$
$$2\gamma_1\cdot\gamma_2 = (2m)^2 \tag{25}$$
$$\gamma_1\cdot\gamma_2 = 2m^2 \tag{26}$$
$$E_1E_2(1 - \cos\theta) = 2m^2 \tag{27}$$
$$E_1 = \frac{2m^2}{E_2(1 - \cos\theta)}. \tag{28}$$
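As a check on eq. (28), the sketch below (units with $c = 1$; function names and sample numbers are assumptions for illustration) computes the threshold energy and confirms that at threshold the invariant mass squared of the photon pair equals $(2m)^2$. Here $\theta$ is the angle between the photon momenta, so $\theta = \pi$ is a head-on collision.

```python
import math

def min_photon_energy(E2, theta, m):
    """Eq. (28): threshold E1 for producing two particles of mass m."""
    return 2.0 * m * m / (E2 * (1.0 - math.cos(theta)))

def pair_mass_squared(E1, E2, theta):
    """(gamma_1 + gamma_2)^2 for photons with the momenta of eqs. (22)-(23)."""
    E = E1 + E2
    px = E1 + E2 * math.cos(theta)
    py = E2 * math.sin(theta)
    return E * E - px * px - py * py

m, E2, theta = 0.511, 2.0, math.pi / 2     # hypothetical values
E1 = min_photon_energy(E2, theta, m)
print(E1, pair_mass_squared(E1, E2, theta), (2 * m) ** 2)
```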

2.2 Photon Rocket

Problem: A rocket propels itself by emitting photons. When it starts from rest it has mass
$m_0$ and eventually reaches 4-momentum $(E, p)$ and then decelerates back to rest, again by
emitting photons. What is its final mass?
Solution: By conservation of momentum, the photons emitted during the acceleration
phase have total momentum $-p$, and hence have 4-momentum $(p, -p)$. Thus 4-momentum
conservation gives
$$(m_0, 0) = (p, -p) + (E, p) \tag{29}$$
$$E = m_0 - p, \tag{30}$$
where $E$ is the rocket's energy after acceleration (its mass at that point is $m_1 = \sqrt{E^2 - p^2}$, which is smaller than $E$). By the same logic, the deceleration photons have
momentum $p$ and carry off energy $p$, which is reflected in the rocket's loss of mass. The rocket ends at rest, so its final mass equals its final energy:
$$m_2 = E - p = m_0 - 2p. \tag{31}$$
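The bookkeeping can be sketched in a few lines (units with $c = 1$; `photon_rocket` is a hypothetical helper). It returns the energy and mass after the acceleration phase and the final mass, and makes explicit that the intermediate mass follows from $m_1^2 = E^2 - p^2$.

```python
import math

def photon_rocket(m0, p):
    """Return (E, m1, m2): energy and mass after acceleration, and final rest mass."""
    if not 0.0 <= p < m0 / 2.0:
        raise ValueError("need 0 <= p < m0/2 so the rocket keeps a positive mass")
    E = m0 - p                        # eq. (30)
    m1 = math.sqrt(E * E - p * p)     # invariant mass while moving
    m2 = E - p                        # eq. (31): at rest again, mass = energy
    return E, m1, m2

print(photon_rocket(1.0, 0.3))
```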

2.3 Morin 13.11

Problem: Three particles go off at equal speeds $v$ at angle $2\pi/3$ with respect to each other.
What is the angle $\theta$ between any two particles' velocities in the rest frame of the third?
Solution: The strategy for problems like this is to (1) write out relevant 4-vectors, in this
case the velocity 4-vectors, in terms of unknowns in the desired frame; (2) take inner products
of 4-vectors in this frame and relate them to the desired unknown quantities; (3) calculate
the inner products in whatever frame is easiest. In the rest frame of particle 3 (frame 3),
$$V'_1 = \gamma'(1, v', 0, 0) \tag{32}$$
$$V'_2 = \gamma'(1, v'\cos\theta, v'\sin\theta, 0) \tag{33}$$
$$V'_3 = (1, 0, 0, 0), \tag{34}$$
where $v'$ and $\gamma'$ are the speed and associated $\gamma$ factor of particles 1 and 2 in frame 3. There
are only two independent unknowns, since $(\gamma')^2 = 1/(1 - (v')^2)$. Next we take some dot
products; there are only two interesting ones:
$$V'_1\cdot V'_2 = (\gamma')^2\left(1 - (v')^2\cos\theta\right) \tag{35}$$
$$V'_1\cdot V'_3 = \gamma'. \tag{36}$$
The second of these is really convenient since we have an expression for $\gamma'$ without any
algebra. Using also the relation
$$(v')^2 = \frac{1}{(\gamma')^2}\left((\gamma')^2 - 1\right), \tag{37}$$
we get
$$V'_1\cdot V'_2 = (V'_1\cdot V'_3)^2\left[1 - \left(1 - \frac{1}{(V'_1\cdot V'_3)^2}\right)\cos\theta\right]. \tag{38}$$
Now we exploit invariance of the inner product to replace the primed inner products by
unprimed ones. Furthermore, symmetry lets us say
$$V'_1\cdot V'_2 = V_1\cdot V_2 = V'_2\cdot V'_3 = V_2\cdot V_3 \equiv \Gamma, \tag{39}$$
so every inner product in (38) equals the same number $\Gamma$. Then we must compute $V_1\cdot V_2$ and solve
$$\Gamma = \Gamma^2\left[1 - \left(1 - \frac{1}{\Gamma^2}\right)\cos\theta\right] \tag{40}$$
$$\cos\theta = \frac{\Gamma}{\Gamma + 1}. \tag{41}$$
So finally, what is the inner product? As in the tetrahedron problem from a problem set,
the sum of velocity 3-vectors is zero and the square of a velocity 4-vector is 1. Thus
$$(V_1 + V_2 + V_3)^2 = 3V_1^2 + 6V_1\cdot V_2 = (3\gamma, 0, 0, 0)^2 \tag{42}$$
$$3 + 6\Gamma = 9\gamma^2 \tag{43}$$
$$\Gamma = \frac{3\gamma^2 - 1}{2}. \tag{44}$$
Hence
$$\cos\theta = \frac{\Gamma}{\Gamma + 1} = \frac{\frac{3\gamma^2 - 1}{2}}{\frac{3\gamma^2 - 1}{2} + 1} = \frac{3\gamma^2 - 1}{3\gamma^2 + 1} \tag{45}$$
$$= \frac{\frac{3}{1 - v^2} - 1}{\frac{3}{1 - v^2} + 1} = \frac{3 - 1 + v^2}{3 + 1 - v^2} \tag{46}$$
$$= \frac{2 + v^2}{4 - v^2}. \tag{47}$$
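The result (47) can be cross-checked by brute force, transforming the velocities of particles 1 and 2 into the frame of particle 3 with the standard velocity transformation for a boost along $x$. This is a sketch in units with $c = 1$; the choice of axes (particle 3 along $+x$) and the function names are assumptions made for the check.

```python
import math

def cos_angle_by_boost(v):
    """cos(theta) between particles 1 and 2 in the rest frame of particle 3."""
    w = v                                   # particle 3 moves along +x at speed v
    g = 1.0 / math.sqrt(1.0 - w * w)
    seen = []
    for ang in (2.0 * math.pi / 3.0, -2.0 * math.pi / 3.0):
        ux, uy = v * math.cos(ang), v * math.sin(ang)
        denom = 1.0 - ux * w
        seen.append(((ux - w) / denom, uy / (g * denom)))
    (ax, ay), (bx, by) = seen
    return (ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by))

def cos_angle_formula(v):
    """Eq. (47)."""
    return (2.0 + v * v) / (4.0 - v * v)

for v in (0.1, 0.5, 0.9):
    print(v, cos_angle_by_boost(v), cos_angle_formula(v))
```

In the limit $v \to 0$ both give $\cos\theta \to 1/2$, the 60 degree angle of an equilateral triangle seen from one vertex.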

Physics 16 Section 8, October 25, 2011


1 Review

There's not much new this week. We have the definition
$$\mathbf{F} = \frac{d\mathbf{p}}{dt} \tag{1}$$
and its corollary
$$dE = \mathbf{F}\cdot d\mathbf{r}. \tag{2}$$
We learned a bit about relativistic rocket motion, which is essentially just conservation of
momentum. Relativistic strings are just a matter of connecting objects with a constant
tension. That doesn't imply that your homework this week is easy, just that a review of
concepts won't be as useful as going straight to the problems.

2 Problems

2.1 Three Quarks

Problem: Three massless quarks are in the same place and go off with equal energies $E$ at
equal angles from each other. They are connected by massless strings of tension $T$. What is
the period of the resulting motion?
Solution: The time it takes to stop, which for particles travelling at speed $c = 1$ is equal to
the distance it takes to turn around, is one quarter of a period. Thus the period is four times
the distance $d$ from the center of an equilateral triangle of side length $L$ to one of its vertices,
where $L$ is the length of each string when the quarks first turn around. By conservation of
energy, we need $3E = 3LT$, so $L = E/T$. Some geometry gives $d = L/\sqrt{3}$, so
$$P = \frac{4E}{T\sqrt{3}}. \tag{3}$$

2.2 Laser-Driven Rocket

Problem: A rocket ship is driven, starting from rest, by a laser beam of power $w$ shot from
Earth. The ship varies the fraction $\eta$ of photons it absorbs by spreading photon-absorbing
sails so that it experiences a constant acceleration $g$ in its own frame. What is $\eta(t)$ according
to the Earth's frame?
Solution: First, we need a plan of what information we need to calculate in what order.
Let's work backwards. To get $\eta(t)$, we need to know the rate at which the ship consumes
energy. To get energy consumption, we need to know its energy as a function of time. From
$E = \gamma m$, we need to know its speed $v(t)$ and mass $m(t)$ as a function of time. Since its
acceleration is known, in one frame at least, it seems like $v(t)$ can't be too hard to obtain.
Perhaps $m(t)$ will be a bit harder, but maybe once we have $v(t)$ we will have a chance. So,
a loose plan is
1. Get $v(t)$ somehow.
2. Get $m(t)$ somehow.
3. Get $E(t)$ and $dE/dt$ straightforwardly.
4. Find the rate of laser energy available to the ship.
5. Take the ratio.
Q: To get you warmed up, tell me what's wrong with this argument: The rapidity $\phi$
is just the integral of acceleration in the ship's frame, so $\phi(t) = \int_0^t g\,dt' = gt$, so
$v(t) = \tanh(gt)$.
A: The problem is that the upper limit of integration that defines rapidity is the ship's
proper time $\tau$, not the Earth time $t$. And you can't just slap on a factor of $\gamma$ because the speed
is not constant.
Here's an efficient way to get $v(t)$. First, convince yourself that if you know the acceleration as a function of time, then you know the velocity as a function of time, independent
of the mass. Thus to get $v(t)$, we can pretend we have an object of constant mass $m$ that
feels acceleration $g$ in its own frame. Then the force is $F = mg$ in its frame, and since the
relative velocity of the frames is collinear with the force, this is also the Earth frame force.
Now apply $F = dp/dt$ to get
$$F = mg = \frac{dp}{dt} \;\Rightarrow\; p = mgt = m\gamma(t)v(t) \;\Rightarrow\; \gamma(t)v(t) = gt. \tag{4}$$
Hence
$$\frac{v(t)}{\sqrt{1 - v^2(t)}} = gt \;\Rightarrow\; v(t) = \frac{gt}{\sqrt{g^2t^2 + 1}}. \tag{5}$$
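The warm-up question and eq. (5) fit together as follows: the rapidity really is $g\tau$ in terms of the ship's proper time, and the two descriptions agree once $t$ and $\tau$ are related. The sketch below assumes the standard constant-proper-acceleration relation $t = \sinh(g\tau)/g$ (with $c = 1$), which is not derived in these notes; function names are illustrative.

```python
import math

def v_of_earth_time(g, t):
    """Eq. (5): speed as a function of Earth time t."""
    return g * t / math.sqrt(g * g * t * t + 1.0)

def earth_time(g, tau):
    """Assumed standard relation t = sinh(g*tau)/g for constant proper acceleration."""
    return math.sinh(g * tau) / g

g, tau = 1.3, 0.8                           # hypothetical values
t = earth_time(g, tau)
print(v_of_earth_time(g, t), math.tanh(g * tau))   # these agree
print(math.tanh(g * t))                            # the flawed tanh(g t) does not
```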

Now we need to find $m(t)$.
Q: How can we get $m(t)$ quickly? Hint: all the momentum and energy comes from
photons.
A: Since energy and momentum come from photons, the increase in one equals the
increase in the other. The ship starts from rest with $E = m_0$, $p = 0$, so we must always have
$E = p + m_0$. Now we plug this into the definitions of $E$ and $p$, which involve $m(t)$ and $v(t)$:
$$E(t) = \gamma(t)m(t) = p(t) + m_0 = \gamma(t)m(t)v(t) + m_0 \tag{6}$$
$$m(t) = \frac{m_0}{\gamma(t)(1 - v(t))} \tag{7}$$
The next step is rote plugging in:
$$E(t) = m(t)\gamma(t) = \frac{m_0}{1 - v(t)} \tag{8}$$
$$\frac{dE}{dt} = \frac{m_0\,dv/dt}{(1 - v(t))^2} \tag{9}$$

Finally, we divide this by the rate at which energy reaches the ship in the Earth frame.
This is not simply $w$.
Q: Why?
A: Because the laser beam travels at speed $c = 1$, while the ship travels at speed $v(t)$.
Thus energy reaches the ship at a reduced rate $w(1 - v(t))$, so the absorbed fraction is
$$\eta(t) = \frac{dE/dt}{w(1 - v(t))} = \frac{m_0}{w}\frac{dv/dt}{(1 - v(t))^3} = \frac{m_0 g}{w}\left(gt + \sqrt{g^2t^2 + 1}\right)^3. \tag{10}$$
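The closed form in eq. (10) can be compared against a direct numerical derivative of $E(t) = m_0/(1 - v(t))$. This is a sketch with $c = 1$; the function names, the finite-difference step `h`, and the sample parameters are assumptions.

```python
import math

def speed(g, t):
    """Eq. (5)."""
    return g * t / math.sqrt(g * g * t * t + 1.0)

def ship_energy(m0, g, t):
    """Eq. (8)."""
    return m0 / (1.0 - speed(g, t))

def fraction_closed_form(m0, g, w, t):
    """Right-hand side of eq. (10)."""
    return (m0 * g / w) * (g * t + math.sqrt(g * g * t * t + 1.0)) ** 3

def fraction_numeric(m0, g, w, t, h=1e-6):
    """(dE/dt) / (w (1 - v)), with dE/dt from a central difference."""
    dE_dt = (ship_energy(m0, g, t + h) - ship_energy(m0, g, t - h)) / (2.0 * h)
    return dE_dt / (w * (1.0 - speed(g, t)))

print(fraction_closed_form(1.0, 0.5, 10.0, 0.7), fraction_numeric(1.0, 0.5, 10.0, 0.7))
```

Since a fraction cannot exceed 1 and the closed form grows without bound, the constant acceleration can only be maintained for a finite time.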

2.3 Why Field Theory is Necessary

Problem: Construct the Lagrangian of two interacting relativistic particles.
Solution: That's easy, just take the free particle kinetic part and add an interaction:
$$S = -\sum_i m_i\int\sqrt{1 - \dot{x}_i^2}\,dt - \int U(|\mathbf{x}_1 - \mathbf{x}_2|)\,dt \tag{11}$$
Q: What's wrong with this?
A: Two things. First, the distance between the particles is not Lorentz-invariant. Second,
this says that particles separated by a spacelike interval can influence one another. To solve
it, we could replace the spatial distance by the invariant interval, and then require the
potential to vanish for spacelike intervals:
$$S = -\sum_i m_i\int\sqrt{1 - \dot{x}_i^2}\,dt - \iint U\!\left((t_1 - t_2)^2 - (\mathbf{x}_1 - \mathbf{x}_2)^2\right)dt_1\,dt_2 \tag{12}$$

Unfortunately, it seems that we must have a double integral over time, since different times $t_1$
and $t_2$ interact. We no longer have an action expressed as the time integral of a Lagrangian.
Worse, by making the interaction Lorentz-invariant, we have lost the Lorentz invariance of
the action, since $dt_1$ and $dt_2$ are not invariant. To fix the first problem, we could say that we
only want an interaction that applies when $t_1 = t_2$, but if we need the condition $t_1 - t_2 = 0$
to hold in every frame, we also need $\mathbf{x}_1 - \mathbf{x}_2 = 0$, so that we need something like
$$S_{\text{int}} = \int \begin{cases} L(0, 0) & (\text{if } \mathbf{x}_1 = \mathbf{x}_2) \\ 0 & (\text{otherwise}) \end{cases}\;dt \tag{13}$$
In addition to our queasiness with the discontinuity inherent in having an interaction that
suddenly turns on for particles that are in exactly the same place, we still don't have Lorentz
invariance since $dt$ is not invariant.

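Q: What could we do instead of $\int dt$? Hint: what is the determinant of a Lorentz
matrix?
A: The quantity $dt\,dx\,dy\,dz$ is Lorentz invariant, so we could integrate over space and
time. To verify this, recall that
$$dt'\,dx'\,dy'\,dz' = dt\,dx\,dy\,dz\,\begin{vmatrix}
\frac{\partial t'}{\partial t} & \frac{\partial x'}{\partial t} & \frac{\partial y'}{\partial t} & \frac{\partial z'}{\partial t} \\
\frac{\partial t'}{\partial x} & \frac{\partial x'}{\partial x} & \frac{\partial y'}{\partial x} & \frac{\partial z'}{\partial x} \\
\frac{\partial t'}{\partial y} & \frac{\partial x'}{\partial y} & \frac{\partial y'}{\partial y} & \frac{\partial z'}{\partial y} \\
\frac{\partial t'}{\partial z} & \frac{\partial x'}{\partial z} & \frac{\partial y'}{\partial z} & \frac{\partial z'}{\partial z}
\end{vmatrix} \tag{14}$$
Because the Lorentz transformation is linear, the matrix of derivatives (the Jacobian) is
simply the matrix of Lorentz transformation coefficients. For boosts along the x-axis, this
comes out to:
$$\begin{vmatrix}
\gamma & -\gamma v & 0 & 0 \\
-\gamma v & \gamma & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{vmatrix} = \gamma^2(1 - v^2) = 1. \tag{15}$$
Hence $dt'\,dx'\,dy'\,dz' = dt\,dx\,dy\,dz$. The interaction part of the action now looks like
$$S_{\text{int}} = \int L_{\text{int}}\big(\phi(x), \dot{\phi}(x)\big)\,d^4x, \tag{16}$$
where $x = (t, x, y, z)$ and $d^4x \equiv dt\,dx\,dy\,dz$.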
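The determinant in eq. (15) is simple enough to verify by machine. The sketch below builds the boost matrix for a few velocities and evaluates the $4 \times 4$ determinant by cofactor expansion; the helper names are hypothetical and the code uses only the standard library.

```python
import math

def boost_matrix_x(v):
    """4x4 Lorentz boost along x with velocity v (units with c = 1)."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    return [[g, -g * v, 0.0, 0.0],
            [-g * v, g, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def det(M):
    """Determinant by Laplace expansion along the first row (fine for 4x4)."""
    if len(M) == 1:
        return M[0][0]
    total = 0.0
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

for v in (0.0, 0.5, 0.99):
    print(v, det(boost_matrix_x(v)))
```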


We now have a theory with a degree of freedom defined at each point in spacetime.
The spatial coordinates are just dummy variables. This means we have a field theory. For
example, an electric field does not have a position. Rather, it has some value at every point
in space, and its Lagrangian is obtained by integrating over all of space.
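Finally, the form we gave above isn't quite right because $\partial/\partial t$ is not invariant. However,
the combination
$$\frac{\partial^2}{\partial t^2} - \frac{\partial^2}{\partial x^2} - \frac{\partial^2}{\partial y^2} - \frac{\partial^2}{\partial z^2} \tag{17}$$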
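is invariant. Then one simple action for a relativistic field is
$$S = \int\left(\phi(x)^2 + \frac{\partial^2\phi}{\partial t^2} - \frac{\partial^2\phi}{\partial x^2} - \frac{\partial^2\phi}{\partial y^2} - \frac{\partial^2\phi}{\partial z^2}\right)d^4x. \tag{18}$$

2.4 Relativistic Bucket

See Morin problem 12.19.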

Physics 16 Section 10, November 8, 2011


1 Review

1.1 Tensors

By explicit calculation, one finds that the angular momentum of an object is related to its
angular velocity by
$$\mathbf{L} = I\boldsymbol{\omega}, \tag{1}$$
where $I$ is a symmetric $3 \times 3$ matrix that depends on the geometry of the object and on the
point with respect to which $\mathbf{L}$ is measured. In general, you calculate it by doing tedious integrals.
Okay, I lied a little. Technically, $I$ is a tensor which is represented as a matrix when we
choose a particular coordinate system. Just as the vector one meter in length that points in
front of you could be (3, 3, 1) or (4, 6, 7) etc, but always points in front of you, a tensor has
an abstract meaning independent of any coordinate system. Let's explore what makes it a
tensor. One definition of a vector is that it transforms in a certain way. In special relativity
we saw that 4-vectors transform by Lorentz matrices when we change coordinate systems.
Similarly, a 3-vector is an object that transforms via rotation matrices, which I assume most
of you have encountered before. That is, for each rotation there exists a matrix $R$ such that
$$\mathbf{x}' = R\mathbf{x}, \qquad x'_i = \sum_j R_{ij}x_j \tag{2}$$
relates the vector's coordinates in the original and rotated frames. Now $\mathbf{L}$ and $\boldsymbol{\omega}$ are vectors,
and
$$\mathbf{L} = I\boldsymbol{\omega}, \qquad L_i = \sum_j I_{ij}\omega_j \tag{3}$$

ought to be true in any coordinate system. Thus we must have
$$\mathbf{L}' = I'\boldsymbol{\omega}' \tag{4}$$
$$R\mathbf{L} = I'(R\boldsymbol{\omega}) \tag{5}$$
$$\mathbf{L} = (R^{-1}I'R)\boldsymbol{\omega} = I\boldsymbol{\omega} \tag{6}$$
$$I' = RIR^{-1} \tag{7}$$
We can actually do a little better using the fact that rotation matrices leave angles and
magnitudes invariant, and hence do not affect the inner product. This is analogous to the
4-vector inner product of special relativity, which normal human beings learn about after
covering 3-vectors. The consequence of invariance is
$$\mathbf{a}'\cdot\mathbf{b}' = \mathbf{a}\cdot\mathbf{b} \tag{8}$$
$$(R\mathbf{a})^T(R\mathbf{b}) = \mathbf{a}^T\mathbf{b} \tag{9}$$
$$\mathbf{a}^TR^TR\mathbf{b} = \mathbf{a}^T\mathbf{b} \tag{10}$$
$$R^TR = \mathrm{Id}_{3\times 3} \tag{11}$$
$$R^{-1} = R^T \tag{12}$$
Hence
$$I' = RIR^{-1} \tag{13}$$
$$= RIR^T \tag{14}$$
$$I'_{jk} = (RIR^T)_{jk} = \sum_{m,n}R_{jm}I_{mn}(R^T)_{nk} \tag{15}$$
$$= \sum_{m,n}R_{jm}R_{kn}I_{mn} \tag{16}$$
This looks like two separate matrix multiplications by the rotation matrix, one for the
first index and one for the second. This transformation defines a tensor. If you have ever
encountered the tensor product in your math classes, this definition is equivalent, for suppose
you have a tensor product $x = v \otimes w$, $x_{ij} \equiv v_iw_j$. Then to get its value under a change of
basis, you would transform the two vectors separately:
$$x'_{ij} = v'_iw'_j = \sum_{m,n}R_{im}R_{jn}v_mw_n = \sum_{m,n}R_{im}R_{jn}x_{mn} \tag{17}$$
In short, a tensor transforms like the outer product of two vectors.
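A numerical check of the transformation law may help. The sketch below assumes the convention $\mathbf{x}' = R\mathbf{x}$ of eq. (2), so that the tensor transforms as $I' = RIR^T$, and verifies that $\mathbf{L}' = I'\boldsymbol{\omega}'$ agrees with $R\mathbf{L}$. The inertia matrix, angular velocity, and helper names are made up for illustration.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def matvec(A, x):
    return [sum(A[i][k] * x[k] for k in range(3)) for i in range(3)]

def transpose(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]

def rotation_z(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

# Hypothetical symmetric inertia tensor and angular velocity.
I = [[2.0, 0.3, 0.0], [0.3, 1.5, -0.2], [0.0, -0.2, 3.0]]
omega = [0.1, -0.4, 0.7]
R = rotation_z(0.6)

I_rot = matmul(matmul(R, I), transpose(R))       # I' = R I R^T
L_rotated = matvec(R, matvec(I, omega))          # R L
L_from_rotated = matvec(I_rot, matvec(R, omega)) # I' omega'
print(L_rotated, L_from_rotated)
```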

1.2 Inertia Tensor Meaning

Mathematically, the existence of three orthogonal principal axes, where angular momentum
and angular velocity are parallel to one another, follows from the spectral theorem, which
says that any real symmetric matrix (or complex matrix equal to its own conjugate transpose,
or under certain conditions, self-adjoint infinite-dimensional linear operators) has a complete
set of eigenvectors. The unintuitive existence of non-principal axes follows from the fact that
not everything is an eigenvector! Now let's try to understand things more physically. What
follows is my personal interpretation, which I hope you will find helpful.
Since angular momentum is conserved while angular velocity is not, you can think of $\boldsymbol{\omega}$
as a temporary condition that holds only as long as rotation around a certain axis is
imposed. A system may want to rotate around one axis based on its angular momentum
but may have to rotate around another axis, for example if a rod is stuck through it. If
the rod is removed instantaneously, $\mathbf{L}$ does not change but $\boldsymbol{\omega}$ certainly may. Thus, at least
one of the angular velocities is not parallel to the angular momentum. Consider just about
the simplest example: a globe. Suppose the globe is a perfect sphere and has a rod going
through the (geographic, not magnetic) poles. By symmetry the rod is a principal axis. Now
let's ruin this by adding two really big identical mountains at antipodal points, for example
Lake Baikal and Tierra del Fuego.
Q: How do these two mountains want to move?
A: You know in your heart that centrifugal forces will try to push the mountains toward
the equator, which the globe could achieve by rotating. If the rod were removed, that is
exactly what the globe would try to do. Thus we see a tangible case where a torque has to
be applied to maintain the direction of angular velocity.
Now suppose these two mountains are in the $x$-$z$ plane, so that their coordinates are
$(x, z)$ and $(-x, -z)$. Then their contribution to the inertia tensor's off-diagonal element $I_{xz}$
is, up to the conventional overall minus sign, $mxz + m(-x)(-z) = 2mxz$. The off-diagonal element is non-zero, which represents that
these are not principal axes.
Q: How could we locate the mountains so that the off-diagonal inertia tensor elements
cancel?
A: If the mountains were related by reflection symmetry, that is, at the same latitude but
opposite longitude, the contribution would be $mxz + m(-x)z = 0$. In fact, if we had a lot of
mountains, but every mountain at $(x, y, z)$ had a clone at $(-x, y, z)$, the off-diagonal element
would vanish. Physically, our centrifugal force argument doesn't work because you can't
move them both closer to the equator simultaneously, so they remain at equal distances
from the equator. The off-diagonal elements of the inertia tensor cancel to zero due to
symmetry. We can turn this around and say "The off-diagonal elements of the inertia tensor
measure the amount of asymmetry of an object." We can further say that the principal axes
define rotational motion where all the centrifugal forces balance each other.
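The two mountain arrangements can be compared directly. This is a small sketch using the standard convention $I_{xz} = -\sum m\,x\,z$ for point masses; the masses and coordinates are invented for the example.

```python
def product_of_inertia_xz(point_masses):
    """I_xz = -sum(m * x * z) for point masses given as (m, x, y, z)."""
    return -sum(m * x * z for m, x, y, z in point_masses)

# Two identical "mountains" of unit mass on a unit sphere (hypothetical positions).
antipodal = [(1.0, 0.6, 0.0, 0.8), (1.0, -0.6, 0.0, -0.8)]
mirrored = [(1.0, 0.6, 0.0, 0.8), (1.0, -0.6, 0.0, 0.8)]

print(product_of_inertia_xz(antipodal))  # non-zero: the rod is not a principal axis
print(product_of_inertia_xz(mirrored))   # zero: contributions cancel by symmetry
```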

1.3 Addition of Angular Velocities

The proof that angular velocities add is quite simple. Let's go over it.
Theorem: Suppose object A is rotating around some point $\mathbf{R}$ with angular velocity $\boldsymbol{\omega}_A$
and object B is rotating with angular velocity $\boldsymbol{\omega}_B$ with respect to the same point $\mathbf{R}$ on object
A. Then object B is (instantaneously) rotating around $\mathbf{R}$ with angular velocity $\boldsymbol{\omega}_A + \boldsymbol{\omega}_B$.
Proof: When object B is at point $\mathbf{r}$, it is moving with velocity $\boldsymbol{\omega}_B \times (\mathbf{r} - \mathbf{R})$ in the rest
frame of object A. The same point $\mathbf{r}$ on object A has velocity $\boldsymbol{\omega}_A \times (\mathbf{r} - \mathbf{R})$. Since relative
velocities simply add, the net velocity is $(\boldsymbol{\omega}_A + \boldsymbol{\omega}_B) \times (\mathbf{r} - \mathbf{R})$, which describes rotational
motion with angular velocity $\boldsymbol{\omega}_A + \boldsymbol{\omega}_B$.
Morin first discusses Chasles' theorem because we had to know that the most general
motion of a rigid body is a translation plus a rotation, which means that the only instantaneous motion about some fixed point is a rotation. This let us say that the velocity $(\boldsymbol{\omega}_A + \boldsymbol{\omega}_B) \times (\mathbf{r} - \mathbf{R})$ must equal $\boldsymbol{\omega} \times (\mathbf{r} - \mathbf{R})$ for some $\boldsymbol{\omega}$, which lets us conclude that
$\boldsymbol{\omega} = \boldsymbol{\omega}_A + \boldsymbol{\omega}_B$.

One final note: this only applies to instantaneous angular velocity, which in general is
always changing. It does not mean that the motion over time is a simple rotation!

1.4 Principal Axis Guessing

Often the principal axes can be guessed based on instinct. If you don't have instinct, or
if you used to but relativity made it feel unloved and now it's sulking, there is a helpful
theorem.
Q: What are the principal axes of an airplane propeller or a fan?
A: Because these have $N$-fold rotational symmetry (with $N \ge 3$), the main rotational axis and any two
perpendicular directions are principal axes. The fact that the blades are sloped and do not
lie in a plane does not affect this.
For an asymmetric object, you need to diagonalize the inertia tensor by finding its eigenvectors, which you will not have to do in this course.
Q: What are the principal axes of a flat-head screw? What about a Phillips head screw?
A: A flat-head screw only has 2-fold symmetry, so our theorem doesn't quite apply. A
Phillips head screw has 4-fold symmetry on top, so the theorem applies, but the shaft does
not quite have rotational symmetry. It has helical symmetry, but it is not symmetric under
rotations about the shaft. To see this, consider the point on the bottom of the screw where
the thread ends. There is only one such point, and it moves when you rotate the screw.
Q: How could you design a screw for extreme precision drilling?
A: You could have $N \ge 3$ separate threads of the screw winding up the shaft in parallel.
Then the shaft would have $N$-fold symmetry, so that the rotation induced by drilling would
be about a principal axis.
Q: What are the principal axes of a football?
A: A football has fourfold symmetry about its main axis, so the long axis and any two
perpendicular axes are principal axes.
Q: What about if you take into account the laces of the football?
A: The long axis should still be a principal axis by the centrifugal force argument:
the laces are already centered around the equator so they can't get pushed out any farther.
Similarly, an axis through the middle of the laces ought to be a principal axis. Finally,
the axis perpendicular to these two is principal. If you don't like the centrifugal point of
view, let the long axis be $z$ and let the axis through the center of the laces be $x$. Then the
football is symmetric under $z \to -z$ and $y \to -y$, so all the off-diagonal integrals cancel,
for example $xz + x(-z)$, $xy + x(-y)$, etc.

2 Problems

2.1 Problem 1: Plausibility of Tennis Racket Theorem

Problem: Note: this motivates a theorem that will be shown more rigorously, and
with a demonstration, in lecture. Suppose an object subject to no external torque has
principal axes with moments $I_1 < I_2 < I_3$. Show that rotations around axes 1 and 3 are
stable, while rotations around axis 2 are (probably) unstable, in the sense that a rotation
that at some instant has $\boldsymbol{\omega}$ nearly parallel to axis 2 will not stay near it.
Solution: Use conservation of $E$ and $L^2$. In terms of the principal moments,
$$\mathbf{L} = I\boldsymbol{\omega} = (I_1\omega_1, I_2\omega_2, I_3\omega_3) \tag{18}$$
so
$$L^2 = I_1^2\omega_1^2 + I_2^2\omega_2^2 + I_3^2\omega_3^2 \tag{19}$$
and
$$2E = \mathbf{L}\cdot\boldsymbol{\omega} = I_1\omega_1^2 + I_2\omega_2^2 + I_3\omega_3^2 \tag{20}$$
are constants of motion.
Q: Any ideas?
A: We could eliminate any one of the $\omega$s as follows:
$$2EI_j - L^2 = \sum_{k\neq j}\left(I_jI_k - I_k^2\right)\omega_k^2. \tag{21}$$
Taking $j = 3, 2, 1$ gives
$$2EI_3 - L^2 = (I_3I_1 - I_1^2)\omega_1^2 + (I_3I_2 - I_2^2)\omega_2^2 \tag{22}$$
$$2EI_2 - L^2 = (I_2I_1 - I_1^2)\omega_1^2 + (I_2I_3 - I_3^2)\omega_3^2 \tag{23}$$
$$2EI_1 - L^2 = (I_1I_2 - I_2^2)\omega_2^2 + (I_1I_3 - I_3^2)\omega_3^2. \tag{24}$$
Using $I_1 < I_2 < I_3$, these become
$$\text{positive}\cdot\omega_1^2 + \text{positive}\cdot\omega_2^2 = \text{(positive) constant} \tag{25}$$
$$\text{positive}\cdot\omega_1^2 + \text{negative}\cdot\omega_3^2 = \text{constant} \tag{26}$$
$$\text{negative}\cdot\omega_2^2 + \text{negative}\cdot\omega_3^2 = \text{(negative) constant} \tag{27}$$
Q: So what?
A: These equations describe ellipses in the $\omega_1$-$\omega_2$ and $\omega_3$-$\omega_2$ planes and a hyperbola
in the $\omega_1$-$\omega_3$ plane. Thus if $\omega_1$ and $\omega_2$ start out small (for a rotation nearly about axis 3),
they remain small. Similarly if $\omega_3$ and $\omega_2$ start small they remain small. However, we can't
say the same if $\omega_1$ and $\omega_3$ start small.
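The ellipse-versus-hyperbola argument can be illustrated by integrating the motion. The sketch below assumes the standard torque-free Euler equations, $I_1\dot{\omega}_1 = (I_2 - I_3)\omega_2\omega_3$ and cyclic permutations, which are not derived in this section, and uses a hand-written RK4 step. The moments $(1, 2, 3)$, the step size, and the function names are arbitrary choices for the demonstration.

```python
def euler_rhs(w, I):
    """Torque-free Euler equations (assumed, standard form)."""
    w1, w2, w3 = w
    I1, I2, I3 = I
    return ((I2 - I3) * w2 * w3 / I1,
            (I3 - I1) * w3 * w1 / I2,
            (I1 - I2) * w1 * w2 / I3)

def max_excursion(w0, I, axis, dt=0.01, steps=4000):
    """Integrate with RK4; return the largest |omega_k| reached for k != axis."""
    w = tuple(w0)
    worst = 0.0
    for _ in range(steps):
        k1 = euler_rhs(w, I)
        k2 = euler_rhs(tuple(a + 0.5 * dt * b for a, b in zip(w, k1)), I)
        k3 = euler_rhs(tuple(a + 0.5 * dt * b for a, b in zip(w, k2)), I)
        k4 = euler_rhs(tuple(a + dt * b for a, b in zip(w, k3)), I)
        w = tuple(a + dt * (b + 2 * c + 2 * d + e) / 6.0
                  for a, b, c, d, e in zip(w, k1, k2, k3, k4))
        worst = max(worst, max(abs(w[k]) for k in range(3) if k != axis))
    return worst

I = (1.0, 2.0, 3.0)
print(max_excursion((1.0, 0.01, 0.01), I, axis=0))  # stays small
print(max_excursion((0.01, 1.0, 0.01), I, axis=1))  # grows to order one
print(max_excursion((0.01, 0.01, 1.0), I, axis=2))  # stays small
```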

2.2 Problem 2: Stochastic Sphere

Problem: A sphere of radius $R$ and angular momentum $L$ absorbs cosmic microwave background photons of momentum $p$ at a rate of $n$ per second. The photons come from all
directions with equal probability. For small times $t$ what is the expected angle by which the
angular momentum vector has rotated?
Solution: Let the initial direction be $\hat{z}$. If the net angular momentum absorbed is $\Delta\mathbf{L}$, then
the final angular momentum is $(\Delta L_x, \Delta L_y, L + \Delta L_z)$, which makes an angle
$$\tan\theta = \frac{\sqrt{\Delta L_x^2 + \Delta L_y^2}}{L + \Delta L_z} \tag{28}$$
with the z-axis. To first order in small quantities,
$$\theta = \frac{1}{L}\sqrt{\Delta L_x^2 + \Delta L_y^2}. \tag{29}$$
Now let's consider the random variable inside the square root:
$$\Delta L_x^2 + \Delta L_y^2 = |\mathbf{R}|^2, \qquad \mathbf{R} = (R_x, R_y) \equiv \sum_{i=1}^{N}(\Delta L_{i,x}, \Delta L_{i,y}), \tag{30}$$
where $\Delta L_{i,x}$ is the random variable describing the angular momentum carried by the $i$th photon
and there are $N$ photons. (The vector $\mathbf{R}$ is not the sphere's radius $R$.)
Q: What can we say about the random variable R?
A: It is the sum of a large number of identical independent random variables and thus
the central limit theorem tells us that it is essentially normally distributed. This is great
because we only need to know its mean and variance in order to know everything!
Q: What is the mean of (Rx , Ry )?
A: By isotropy, the means are zero.
To calculate the variance, let 2 be the variance of the single photon angular momentum
components Li,x(y) . Then the variance of the total Rx is N 2 (the variance of the sum is
the sum of the variance for sums of independent random variables. Then the probability
distribution for Rx (and also Ry ) is
f (x) =

ex

2 /(2N 2 )

.
2N 2
Then we can calculate the expected by integrating over these ditribution functions:
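\begin{align}
\langle\theta\rangle &= \frac{1}{L}\left\langle \sqrt{R_x^2 + R_y^2} \right\rangle \tag{31} \\
&= \frac{1}{L}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \sqrt{x^2+y^2}\,\frac{e^{-x^2/(2N\sigma^2)}}{\sqrt{2\pi N\sigma^2}}\,\frac{e^{-y^2/(2N\sigma^2)}}{\sqrt{2\pi N\sigma^2}}\,dx\,dy \tag{32} \\
&= \frac{1}{L}\,\frac{1}{2\pi N\sigma^2}\int_0^{2\pi}\int_0^{\infty} r\,e^{-r^2/(2N\sigma^2)}\,r\,dr\,d\theta \tag{33} \\
&= \frac{1}{N\sigma^2 L}\int_0^{\infty} r^2 e^{-r^2/(2N\sigma^2)}\,dr \tag{34} \\
&= \frac{1}{N\sigma^2 L}\sqrt{\frac{\pi}{2}}\,(N\sigma^2)^{3/2} \tag{35} \\
&= \sqrt{\frac{\pi}{2}}\,\frac{\sigma}{L}\,N^{1/2} \tag{36} \\
&= \sqrt{\frac{\pi}{2}}\,\frac{\sigma}{L}\,(nt)^{1/2} \tag{37}
\end{align}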
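The polar-coordinate integral above can be sanity-checked numerically. The following is a minimal sketch, not part of the original solution: it draws independent zero-mean Gaussian pairs whose common variance stands in for $N\sigma^2$ and compares the sample mean of the radius with $\sqrt{\pi N\sigma^2/2}$. The function names, the sample count, and the seed are illustrative assumptions.

```python
import math
import random


def mean_radius(variance, samples=100_000, seed=0):
    """Monte Carlo estimate of <sqrt(Rx^2 + Ry^2)> for independent zero-mean
    Gaussian Rx, Ry that each have the given variance (hypothetical helper)."""
    rng = random.Random(seed)
    s = math.sqrt(variance)
    total = 0.0
    for _ in range(samples):
        total += math.hypot(rng.gauss(0.0, s), rng.gauss(0.0, s))
    return total / samples


def predicted_radius(variance):
    """Closed form from the polar-coordinate integral: sqrt(pi * variance / 2)."""
    return math.sqrt(math.pi * variance / 2.0)


variance = 4.0  # plays the role of N * sigma^2; the value is arbitrary
print(f"Monte Carlo: {mean_radius(variance):.4f}")
print(f"Closed form: {predicted_radius(variance):.4f}")
```

The two numbers should agree to within the statistical error of the sample, which shrinks as the inverse square root of the number of samples.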

Note that the average deviation grows as the square root of time. This is typical of random
walks and other random/diffusive processes. Now we just need the variance $\sigma^2$ of the $L_x$ imparted
by a single photon. Since the mean of this is zero, the variance is
\[ \sigma^2 = \langle L_{i,x}^2 \rangle. \tag{38} \]

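Q: Can we exploit symmetry to reduce or eliminate the amount of trig we will have to
do?
A: By symmetry,
\[ \sigma^2 = \langle L_{i,x}^2 \rangle = \frac{1}{3}\langle L_{i,x}^2 + L_{i,y}^2 + L_{i,z}^2 \rangle = \frac{1}{3}\langle L_i^2 \rangle = \frac{1}{3}\langle (bp)^2 \rangle = \frac{p^2}{3}\langle b^2 \rangle, \tag{39} \]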

where $b$ is the impact parameter of the incident photons. The impact parameter $b$ is geometrically just the polar coordinate $r$ in the cross-sectional circle, so
\[ \langle b^2 \rangle = \frac{\int_0^{2\pi}\int_0^{R} r^2\, r\,dr\,d\theta}{\pi R^2} = R^2/2. \tag{40} \]
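Thus
\[ \sigma^2 = \frac{R^2 p^2}{6} \tag{41} \]
and
\[ \langle\theta\rangle = \sqrt{\frac{\pi}{2}}\,\frac{1}{L}\sqrt{\frac{R^2 p^2}{6}}\,(nt)^{1/2} = \frac{pR}{2L}\,(\pi n t/3)^{1/2}. \tag{42} \]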
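The symmetry argument for the single-photon variance can also be checked by brute force. The sketch below is an illustration under stated assumptions rather than the section's own method: each photon is modeled as travelling along an isotropic random direction and striking a point chosen uniformly on the sphere's cross-sectional disk, and its angular momentum is taken as $\mathbf{r}\times\mathbf{p}$. All function names, the sample count, and the seed are hypothetical.

```python
import math
import random


def cross(a, b):
    """Cross product of two 3-vectors given as lists."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def random_unit_vector(rng):
    """Isotropic direction: normalize three independent standard Gaussians."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-12:
            return [c / norm for c in v]


def photon_angular_momentum(rng, radius, p):
    """Angular momentum r x p of one photon hitting the cross-sectional disk."""
    n = random_unit_vector(rng)  # direction of travel
    # Build an orthonormal pair (e1, e2) spanning the plane perpendicular to n.
    helper = [1.0, 0.0, 0.0] if abs(n[0]) < 0.9 else [0.0, 1.0, 0.0]
    e1 = cross(n, helper)
    norm = math.sqrt(sum(c * c for c in e1))
    e1 = [c / norm for c in e1]
    e2 = cross(n, e1)
    # Uniform point on a disk: radius ~ R * sqrt(u), angle uniform.
    b = radius * math.sqrt(rng.random())
    phi = 2.0 * math.pi * rng.random()
    r = [b * (math.cos(phi) * e1[k] + math.sin(phi) * e2[k]) for k in range(3)]
    return cross(r, [p * c for c in n])


def single_photon_variance(radius, p, samples=100_000, seed=1):
    """Monte Carlo estimate of <L_x^2> for a single absorbed photon."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        total += photon_angular_momentum(rng, radius, p)[0] ** 2
    return total / samples


radius, p = 2.0, 0.5
print("Monte Carlo:", single_photon_variance(radius, p))
print("R^2 p^2 / 6:", radius ** 2 * p ** 2 / 6.0)
```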

Deriving the Lorentz Transformation from the Invariant Interval

David Benjamin

We derive the Lorentz transformation using only the property that it leaves the spacetime interval
$s^2 = t^2 - x^2 - y^2 - z^2$ invariant. We first handle the related problem of rotations in the x-y plane,
which satisfy the property that they leave the distance $r^2 = x^2 + y^2$ invariant. We then generalize
these techniques to four-dimensional spacetime.

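ROTATIONS IN THE PLANE

Rotations in the x-y plane about the origin do not change the distance of a point from the origin. That is, if
$x$ and $y$ map to $x'$ and $y'$, then $x^2 + y^2 = x'^2 + y'^2$. If we
treat position as a row vector ($1 \times 2$ matrix) $\mathbf{x}^T = (x\ \ y)$,
then we require
\[ x^2 + y^2 = \begin{pmatrix} x & y \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \mathbf{x}^T\mathbf{x} = \mathbf{x}'^T\mathbf{x}' \tag{1} \]
It had better be true that $\mathbf{x}' = M\mathbf{x}$, where $M$ is a
$2 \times 2$ matrix representing a linear transformation. Convince yourself that any non-linear transformation would
have some very undesirable properties. For example, we
would expect that if points A and B are twice as far apart
in one frame, then they are twice as far apart in any
other. You could think of it this way once we do Lorentz
transformations: we are replacing the two postulates of
relativity with the two postulates of the invariant interval
and linearity of transformations.
Substituting $\mathbf{x}' = M\mathbf{x}$ into Eq. (1) gives a condition
on $M$:
\begin{align}
(M\mathbf{x})^T (M\mathbf{x}) &= \mathbf{x}^T\mathbf{x} \\
\mathbf{x}^T M^T M \mathbf{x} &= \mathbf{x}^T\mathbf{x} \\
M^T M &= I \tag{2}
\end{align}
where $I$ is the ($2 \times 2$) identity matrix. $M = I$ is a solution, the trivial rotation of doing nothing at all. Thus we
restrict ourselves for now to the seemingly modest goal of
finding small rotations, that is, we look for $M = I + \epsilon M_1$,
where $\epsilon$ is infinitesimal. Substituting this into Eq. (2),
we obtain
\begin{align}
(I + \epsilon M_1^T)(I + \epsilon M_1) &= I \\
I + \epsilon (M_1^T + M_1) + O(\epsilon^2) &= I \\
M_1^T + M_1 &= 0 + O(\epsilon) \\
M_1 &\propto \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} + O(\epsilon) \tag{3}
\end{align}
Hence, absorbing the constant of proportionality into $\epsilon$, we obtain
\[ M = I + \epsilon\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} + O(\epsilon^2) \tag{4} \]
As you know, the composition of two linear transformations is given by matrix multiplication, so for any $N$,
$N$ successive rotations, which is just one larger rotation,
must be represented by
\begin{align}
\left[ I + \epsilon\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} + O(\epsilon^2) \right]^N
= \left[ I + \epsilon\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \right]^N + N\,O(\epsilon^2) \tag{5}
\end{align}
Setting $\epsilon = a/N$ and taking the $N \to \infty$ limit gives a
rotation matrix
\begin{align}
M &= \lim_{N\to\infty}\left\{ \left[ I + \frac{a}{N}\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \right]^N + N\,O(a^2/N^2) \right\} \\
&= \exp\left[ a\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \right] \\
&= \sum_{n=0}^{\infty} \frac{a^n}{n!}\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}^n \tag{6}
\end{align}
where matrix exponentiation is defined by the usual Taylor series for exponential functions, applied to a matrix.
The powers of the matrix are actually simple to handle
because of the nice property
\[ \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}^2 = -I \tag{7} \]
so that
\begin{align}
M &= I\left( 1 - \frac{a^2}{2!} + \frac{a^4}{4!} - \ldots \right) + \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}\left( \frac{a}{1!} - \frac{a^3}{3!} + \frac{a^5}{5!} - \ldots \right) \\
&= I\cos(a) + \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}\sin(a) \\
&= \begin{pmatrix} \cos a & -\sin a \\ \sin a & \cos a \end{pmatrix} \tag{8}
\end{align}
which is the familiar matrix for 2-D rotations.

LORENTZ TRANSFORMATIONS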

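Now we have coordinates $\mathbf{x}^T = (t, x, y, z)$. The invariant interval is now
\[ s^2 = t^2 - x^2 - y^2 - z^2 = \mathbf{x}^T \eta\, \mathbf{x} \tag{9} \]
where $\eta = \mathrm{diag}(1, -1, -1, -1)$. We again have very sensible reasons to want the transformation to be linear, so
for $\mathbf{x}' = \Lambda\mathbf{x}$, we require
\begin{align}
(\Lambda\mathbf{x})^T \eta\, (\Lambda\mathbf{x}) &= \mathbf{x}^T \eta\, \mathbf{x} \\
\mathbf{x}^T \Lambda^T \eta \Lambda\, \mathbf{x} &= \mathbf{x}^T \eta\, \mathbf{x} \\
\Lambda^T \eta \Lambda &= \eta \tag{10}
\end{align}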

The question we need to ask is: how many free parameters determine $\Lambda$? It's a $4 \times 4$ matrix
with 16 entries, but these are related to each other by
Eq. (10). How many of these are we free to choose, i.e.
how many different types of Lorentz transformations are
there?
As before, the identity is clearly among the solutions
to Eq. (10). Thus we can look for small Lorentz transformations $\Lambda = I + \epsilon\Lambda_1$. The claim is that the number
of free parameters for small transformations is equal to
the number of free parameters for arbitrary transformations. Intuitively, this makes sense, because a Lorentz
transformation at a very large speed should equal the
product of many successive Lorentz transformations at
low speeds. However, we can also prove it using the preimage theorem (see e.g. Guillemin and Pollack, Differential Topology, pp. 20-23). With the theorem, we would
use the fact that the set of all $\Lambda$ is the inverse image of a regular value
of a smooth map to show that the set of all possible $\Lambda$ is
a manifold. Since a manifold's dimension (which equals
the number of free parameters) is the same everywhere,
we can evaluate the dimension locally near $I$.
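In analogy to Eq. (3), we find
\begin{align}
(I + \epsilon\Lambda_1^T)\,\eta\,(I + \epsilon\Lambda_1) &= \eta \\
\eta + \epsilon(\Lambda_1^T \eta + \eta \Lambda_1) + O(\epsilon^2) &= \eta \\
\Lambda_1^T \eta + \eta \Lambda_1 &= 0 + O(\epsilon) \tag{11}
\end{align}
If we actually write out the elements of $\Lambda_1$ explicitly as
\[ \Lambda_1 = \begin{pmatrix} a & b & c & d \\ e & f & g & h \\ i & j & k & l \\ m & n & o & p \end{pmatrix} \tag{12} \]
then Eq. (11) becomes
\[ \begin{pmatrix} a & -e & -i & -m \\ b & -f & -j & -n \\ c & -g & -k & -o \\ d & -h & -l & -p \end{pmatrix} = \begin{pmatrix} -a & -b & -c & -d \\ e & f & g & h \\ i & j & k & l \\ m & n & o & p \end{pmatrix} \tag{13} \]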

from which we can read off $a = f = k = p = 0$, $e = b$, $i = c$, $m = d$, and $j = -g$, $n = -h$, $o = -l$. Thus there
are six free parameters $b, c, d, g, h, l$. This makes a lot of
sense. Clearly, rotations of space don't affect $s^2$, since
they do nothing to time and they leave spatial distance
invariant. There are three of these, for example rotation
in the y-z plane, obtained by the choice $l = \theta$ and
setting all other parameters to zero:
\[ \Lambda_{yz} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \cos\theta & \sin\theta \\ 0 & 0 & -\sin\theta & \cos\theta \end{pmatrix} \tag{14} \]
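The parameter count can be checked mechanically. The sketch below is an illustration, not part of the original derivation: it builds the generator with the six free parameters placed as read off from Eq. (13) and confirms that it satisfies the first-order condition $\Lambda_1^T\eta + \eta\Lambda_1 = 0$ of Eq. (11). The helper names are hypothetical.

```python
ETA = [[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]]


def matmul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transpose(A):
    return [list(row) for row in zip(*A)]


def generator(b, c, d, g, h, l):
    """Lambda_1 with a = f = k = p = 0, e = b, i = c, m = d,
    j = -g, n = -h, o = -l, as read off from Eq. (13)."""
    return [[0, b, c, d],
            [b, 0, g, h],
            [c, -g, 0, l],
            [d, -h, -l, 0]]


def constraint_residual(L1):
    """Left-hand side of Eq. (11): Lambda_1^T eta + eta Lambda_1."""
    left = matmul(transpose(L1), ETA)
    right = matmul(ETA, L1)
    return [[left[i][j] + right[i][j] for j in range(4)] for i in range(4)]


def is_generator(L1, tol=1e-12):
    return all(abs(x) < tol for row in constraint_residual(L1) for x in row)


sample = generator(0.3, -1.2, 0.7, 2.0, -0.5, 1.1)
print(is_generator(sample))
```

Changing any entry away from this pattern, for example making a diagonal element non-zero, leaves a non-zero residual, which is the sense in which only six numbers are free.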
The first row and first column of $\Lambda$ affect time, so transformations involving non-zero $b, c, d$ are something new.
Let's look at the transformation $e = b \neq 0$, all other parameters equal zero. Then the same reasoning as above
gives
\[ \Lambda = \exp\left[ b\begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} \right] \tag{15} \]
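Now this matrix that we are exponentiating has the nice
property
\[ \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}^2 = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} \tag{16} \]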

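so that when we take the matrix Taylor series we get
\begin{align}
\Lambda &= \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}\left( 1 + \frac{b^2}{2!} + \frac{b^4}{4!} + \ldots \right)
+ \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}\left( \frac{b}{1!} + \frac{b^3}{3!} + \frac{b^5}{5!} + \ldots \right)
+ \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \\
&= \begin{pmatrix} \cosh b & \sinh b & 0 & 0 \\ \sinh b & \cosh b & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \tag{17}
\end{align}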
which is just the identity when acting on the $y$ and $z$
coordinates. Acting on $x$ and $t$, it gives
\begin{align}
t' &= t\cosh b + x\sinh b \\
x' &= t\sinh b + x\cosh b \tag{18}
\end{align}
This is exactly equation 11.57 in your textbook, and $b$
is the rapidity. We could find similar Lorentz boosts
for motion along the y and z directions by setting
$c = i \neq 0$ and $d = m \neq 0$, respectively.
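The matrix exponential in Eq. (15) can be summed numerically and compared with the closed form in Eq. (17). This is a minimal sketch using a truncated Taylor series; the helper names and the truncation order are assumptions made for illustration. It also checks the defining property $\Lambda^T\eta\Lambda = \eta$ of Eq. (10).

```python
import math

ETA = [[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]]
# Generator of the x-boost: the matrix multiplying b in Eq. (15).
K = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]


def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transpose(A):
    return [list(row) for row in zip(*A)]


def matrix_exp(A, terms=40):
    """Truncated Taylor series sum_{n < terms} A^n / n!."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for order in range(1, terms):
        # term now holds A^order / order!
        term = [[x / order for x in row] for row in matmul(term, A)]
        result = [[r + t for r, t in zip(rr, tr)]
                  for rr, tr in zip(result, term)]
    return result


def boost_series(b):
    return matrix_exp([[b * x for x in row] for row in K])


def boost_closed_form(b):
    c, s = math.cosh(b), math.sinh(b)
    return [[c, s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]


def max_abs_diff(A, B):
    return max(abs(x - y) for ra, rb in zip(A, B) for x, y in zip(ra, rb))


def preserves_interval(L, tol=1e-9):
    """True if L^T eta L equals eta to within tol."""
    return max_abs_diff(matmul(transpose(L), matmul(ETA, L)), ETA) < tol


b = 0.8
print("series vs closed form:", max_abs_diff(boost_series(b), boost_closed_form(b)))
print("interval preserved:", preserves_interval(boost_closed_form(b)))
```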

Physics 16 Variational Challenge with Solutions by
Aleksandar Makelov and Jeffrey Wang

Problem: Find the function that globally maximizes the functional
\[ V[f] = \int_0^1 \left( \int_0^x f(t)\,dt \right)^2 dx \tag{1} \]
subject to the constraint
\[ \int_0^1 f^2(t)\,dt = 1. \tag{2} \]

Solution

Both solutions start as follows: Define the function
\[ F(x) \equiv \int_0^x f(t)\,dt \;\Rightarrow\; F'(x) = f(x). \tag{3} \]
Then our task is to maximize
\[ \int_0^1 F^2(x)\,dx \quad \text{s.t.} \quad \int_0^1 (F'(x))^2\,dx = 1. \tag{4} \]

Jeffrey's Method

If we were dealing with a multivariable calculus problem of maximizing given a constraint, we
would use the method of Lagrange multipliers, which says that at a maximum of $f(x_1, x_2, \ldots)$
subject to $g(x_1, x_2, \ldots) = c$, there exists some constant $\lambda$ such that
\[ \nabla f = \lambda \nabla g \;\Leftrightarrow\; \frac{\partial f}{\partial x_i} = \lambda\frac{\partial g}{\partial x_i} \;\; \forall i. \tag{5} \]

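For our purposes, the best way to recast this is to switch to vector notation $(x_1, x_2, \ldots) \equiv \mathbf{x}$
and write that if a maximum occurs at $\mathbf{x}$, then for a change $\delta\mathbf{x}$ and small constant $\epsilon$,
\begin{align}
f(\mathbf{x} + \epsilon\,\delta\mathbf{x}) - f(\mathbf{x}) &= \lambda\left( g(\mathbf{x} + \epsilon\,\delta\mathbf{x}) - g(\mathbf{x}) \right) + O(\epsilon^2) \tag{6} \\
\left. \frac{\partial}{\partial\epsilon} f(\mathbf{x} + \epsilon\,\delta\mathbf{x}) \right|_{\epsilon=0} &= \lambda \left. \frac{\partial}{\partial\epsilon} g(\mathbf{x} + \epsilon\,\delta\mathbf{x}) \right|_{\epsilon=0} \tag{7}
\end{align}
Generalizing this to the case of a functional, if $F(x)$ maximizes
\[ V[F] \equiv \int_0^1 F^2(x)\,dx \tag{8} \]
subject to
\[ G[F] \equiv \int_0^1 (F'(x))^2\,dx = 1, \tag{9} \]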

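we must have, for any change $\delta F(x)$, small constant $\epsilon$, and some constant $\lambda$:
\begin{align}
\left. \frac{\partial}{\partial\epsilon} V[F + \epsilon\,\delta F] \right|_{\epsilon=0} &= \lambda \left. \frac{\partial}{\partial\epsilon} G[F + \epsilon\,\delta F] \right|_{\epsilon=0} \tag{10} \\
\left. \frac{\partial}{\partial\epsilon} \int_0^1 (F(x) + \epsilon\,\delta F(x))^2\,dx \right|_{\epsilon=0} &= \lambda \left. \frac{\partial}{\partial\epsilon} \int_0^1 (F'(x) + \epsilon\,\delta F'(x))^2\,dx \right|_{\epsilon=0} \tag{11} \\
\int_0^1 2F(x)\,\delta F(x)\,dx &= \lambda\int_0^1 2F'(x)\,\delta F'(x)\,dx \tag{12} \\
\int_0^1 F(x)\,\delta F(x)\,dx &= -\lambda\int_0^1 F''(x)\,\delta F(x)\,dx \tag{13} \\
F(x) &= -\lambda F''(x). \tag{14}
\end{align}
The second-to-last line followed from integration by parts with boundary conditions $\delta F(0) =
\delta F(1) = 0$, and the logic of the last step was that the integrals are equal for any $\delta F(x)$ that
satisfies the boundary conditions, hence the things multiplying $\delta F(x)$ within the integral
must be equal for all $x$. We conclude from the differential equation and $F(0) = 0$ that for some constants
$a$ and $b$
\[ F(x) = a\sin(bx). \tag{15} \]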

Aleksandar's Method

Consider the enlarged functional acting on a function $F(x)$ and a real number $\lambda$:
\[ U[F, \lambda] \equiv \int_0^1 \left[ F^2(x) + \lambda (F'(x))^2 \right] dx \tag{16} \]
Set the boundary conditions
\[ F(0) = 0, \quad F(1) = \alpha. \tag{17} \]
Note that $\alpha$ is still unknown since we have not yet found the solution $F(x)$; however, in
principle we know that $F(1)$ is some constant. Think of it this way: we are finding a global
maximum over the space of all functions with $F(0) = 0$, which will certainly also be a maximum over
the subset of functions with $F(1) = \alpha$.
We will now show that maximizing $U[F, \lambda]$ is equivalent to maximizing $V[F]$ with the
constraint. Thus we will replace a constrained problem with an unconstrained one, which
we can solve using the Euler-Lagrange equations. Suppose for some fixed $\lambda$ we could find
some function $F_0(x)$ that maximizes $U[F, \lambda]$ over all functions with the boundary conditions
$F(0) = 0$, $F(1) = \alpha$. That is,
\[ U[F, \lambda] \le U[F_0, \lambda] \;\; \forall F. \tag{18} \]

Note that $F_0$ depends on $\lambda$. Because of this dependence, suppose we choose some particular
$\lambda_0$ such that $F_0$ satisfies the constraint
\[ \int_0^1 (F_0'(x))^2\,dx = 1. \tag{19} \]

Since this $F_0$ maximizes $U[F, \lambda_0]$ over all functions with the boundary conditions, it certainly
maximizes $U[F, \lambda_0]$ over all functions with the boundary conditions and with the integral
constraint. Thus for all functions $F(x)$ satisfying the integral constraint and boundary
conditions
\begin{align}
\int_0^1 F^2(x)\,dx + \lambda_0\int_0^1 (F'(x))^2\,dx &\le \int_0^1 F_0^2(x)\,dx + \lambda_0\int_0^1 (F_0'(x))^2\,dx \tag{20} \\
\int_0^1 F^2(x)\,dx + \lambda_0 &\le \int_0^1 F_0^2(x)\,dx + \lambda_0 \tag{21} \\
\int_0^1 F^2(x)\,dx &\le \int_0^1 F_0^2(x)\,dx. \tag{22}
\end{align}
The second-to-last line followed from the definition of the constraint: $\int_0^1 (F')^2\,dx = 1$.

This last line explicitly says that $F_0$ maximizes $V[F]$ subject to the constraint, as desired.
Thus all we have to do is maximize the functional
\[ U[F, \lambda_0] = \int_0^1 \left[ F^2(x) + \lambda_0 (F'(x))^2 \right] dx \tag{23} \]
with the boundary conditions $F(0) = 0$, $F(1) = \alpha$. But this just follows from the Euler-Lagrange equations, which are
\[ \lambda_0 F''(x) = F(x). \tag{24} \]
Thus we expect $F(x) = a\sin(bx)$.

Finding the Constants a and b

Note that we could state the problem as: maximize the functional
\[ V[f] = \frac{\int_0^1 \left( \int_0^x f(t)\,dt \right)^2 dx}{\int_0^1 f^2(t)\,dt}. \tag{25} \]
The reason is that if we have a maximum $f(x)$ such that
\[ \int_0^1 f^2(t)\,dt = z \neq 1 \tag{26} \]
we could simply divide $f$ by $\sqrt{z}$. Then the new function would satisfy the integral constraint
without affecting the value of the functional, the numerator and denominator of which would
both get divided by $z$. From our earlier work we know that $f(x) \propto \cos(bx)$, so let's maximize
with respect to $b$. Thus we need
\begin{align}
\frac{d}{db}\,\frac{\int_0^1 \left( \int_0^x \cos(bt)\,dt \right)^2 dx}{\int_0^1 \cos^2(bt)\,dt} &= 0 \tag{27} \\
\frac{d}{db}\,\frac{2b - \sin(2b)}{b^2\,(2b + \sin(2b))} &= 0 \tag{28}
\end{align}

You can evaluate this derivative (tedious without Mathematica) and plot it against $b$. The
graph strongly suggests that $b = \pi/2$ is a solution. Plugging $b = \pi/2$ into the derivative,
we confirm that $b = \pi/2$.
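The check described above does not require Mathematica. As a rough sketch (the function names, grid, and step size are illustrative assumptions), one can evaluate the ratio inside Eq. (28), estimate its derivative by a central difference, and scan a grid of $b$ values for the largest value:

```python
import math


def ratio(b):
    """(2b - sin 2b) / (b^2 (2b + sin 2b)): the functional evaluated on cos(bx)."""
    s = math.sin(2.0 * b)
    return (2.0 * b - s) / (b * b * (2.0 * b + s))


def derivative(b, h=1e-5):
    """Central-difference estimate of d(ratio)/db."""
    return (ratio(b + h) - ratio(b - h)) / (2.0 * h)


def grid_argmax(lo=0.05, hi=3.0, step=0.001):
    """Grid value of b in [lo, hi] at which ratio(b) is largest."""
    count = int(round((hi - lo) / step))
    candidates = [lo + k * step for k in range(count + 1)]
    return max(candidates, key=ratio)


print("derivative at pi/2:", derivative(math.pi / 2.0))
print("grid argmax:", grid_argmax(), "vs pi/2 =", math.pi / 2.0)
print("ratio at pi/2:", ratio(math.pi / 2.0), "vs 4/pi^2 =", 4.0 / math.pi ** 2)
```

The grid scan only covers the interval it is given, so it supports rather than proves the claim that $b = \pi/2$ is the global maximizer.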
Finally, we obtain the prefactor by setting
\[ \int_0^1 (a\cos(\pi x/2))^2\,dx = 1, \tag{29} \]
which gives $a = \sqrt{2}$. Hence
\[ f(x) = \sqrt{2}\cos(\pi x/2). \tag{30} \]
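As a plausibility check on this answer, the functional in Eq. (25) can be evaluated numerically for the claimed maximizer and for a couple of other trial functions. The sketch below uses a simple midpoint rule; the function name, the step count, and the choice of trial functions are assumptions made for illustration.

```python
import math


def functional(f, steps=2000):
    """Midpoint-rule estimate of
    V[f] = int_0^1 (int_0^x f dt)^2 dx / int_0^1 f^2 dt."""
    h = 1.0 / steps
    running = 0.0      # estimate of F at the left edge of the current cell
    numerator = 0.0
    denominator = 0.0
    for k in range(steps):
        fm = f((k + 0.5) * h)
        f_mid = running + 0.5 * fm * h   # F at the midpoint of the cell
        numerator += f_mid * f_mid * h
        denominator += fm * fm * h
        running += fm * h
    return numerator / denominator


def claimed_maximizer(x):
    return math.sqrt(2.0) * math.cos(math.pi * x / 2.0)


print("sqrt(2) cos(pi x / 2):", functional(claimed_maximizer))
print("constant function:    ", functional(lambda x: 1.0))
print("linear ramp 1 - x:    ", functional(lambda x: 1.0 - x))
print("4 / pi^2:             ", 4.0 / math.pi ** 2)
```

The constant function gives 1/3 and the ramp gives 0.4, both below $4/\pi^2 \approx 0.405$, which is consistent with the cosine being the maximizer but is of course not a proof.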

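A Third Method

According to our discussion above, we need to maximize (without constraint)
\[ V[f] = \frac{\int_0^1 \left( \int_0^x f(t)\,dt \right)^2 dx}{\int_0^1 f^2(t)\,dt}. \tag{31} \]
Alternatively, set $F(x) = \int_0^x f(t)\,dt$ as before and maximize
\[ V[F] = \frac{\int_0^1 F^2(x)\,dx}{\int_0^1 (F'(x))^2\,dx}. \tag{32} \]
Then we need, for a perturbation $\eta(x)$,
\[ \left. \frac{\partial}{\partial\epsilon}\left[ \frac{\int_0^1 (F(x) + \epsilon\,\eta(x))^2\,dx}{\int_0^1 (F'(x) + \epsilon\,\eta'(x))^2\,dx} \right] \right|_{\epsilon=0} = 0. \tag{33} \]
Straightforward application of the quotient rule and integrating by parts gives
\begin{align}
\int_0^1 F'^2(x)\,dx \int_0^1 F(x)\,\eta(x)\,dx - \int_0^1 F^2(x)\,dx \int_0^1 F'(x)\,\eta'(x)\,dx &= 0 \tag{34} \\
\int_0^1 F'^2(x)\,dx \int_0^1 F(x)\,\eta(x)\,dx + \int_0^1 F^2(x)\,dx \int_0^1 F''(x)\,\eta(x)\,dx &= 0 \tag{35}
\end{align}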
Now since $\eta(x)$ can be anything, choose it to be a Dirac $\delta$ function, $\eta(x) = \delta(x - y)$. If you
are not familiar with this function, look it up on Wikipedia. It's not complicated! Then we
have
\[ F(y)\int_0^1 F'^2(x)\,dx + F''(y)\int_0^1 F^2(x)\,dx = 0 \tag{36} \]
Now as far as $F(y)$ is concerned, the integrals over the dummy variable $x$ are just some
constants, so we have
\[ F(y) + \beta^{-2} F''(y) = 0 \;\Rightarrow\; F(y) = \sin(\beta y), \tag{37} \]

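where
\begin{align}
\beta^{-2} &= \frac{\int_0^1 F^2(x)\,dx}{\int_0^1 F'(x)^2\,dx} \tag{38} \\
&= \frac{\int_0^1 \sin^2(\beta x)\,dx}{\beta^2\int_0^1 \cos^2(\beta x)\,dx} \tag{39} \\
&= \frac{2\beta - \sin(2\beta)}{\beta^2\,(2\beta + \sin(2\beta))}. \tag{40}
\end{align}
This is equivalent to the algebraic equation we solved above.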
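The self-consistency condition in Eqs. (38)-(40) can be solved numerically. The following sketch, written under the assumption that the relevant root lies between 1 and 2, applies plain bisection to the difference between the right-hand side of Eq. (40) and $\beta^{-2}$. The function names and the bracketing interval are illustrative choices.

```python
import math


def quotient(beta):
    """Right-hand side of Eq. (40)."""
    s = math.sin(2.0 * beta)
    return (2.0 * beta - s) / (beta ** 2 * (2.0 * beta + s))


def mismatch(beta):
    """Zero exactly when Eq. (40) equals beta^(-2)."""
    return quotient(beta) - 1.0 / beta ** 2


def bisect(func, lo, hi, tol=1e-12):
    """Bisection root finder; requires a sign change of func on [lo, hi]."""
    f_lo = func(lo)
    if f_lo * func(hi) > 0.0:
        raise ValueError("root is not bracketed")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if f_lo * f_mid <= 0.0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


beta = bisect(mismatch, 1.0, 2.0)
print("beta =", beta, " pi/2 =", math.pi / 2.0)
```

Algebraically the condition reduces to $\sin(2\beta) = 0$, so larger multiples of $\pi/2$ also satisfy it; the bracket above selects the smallest positive root, which matches the value of $b$ found earlier.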
