
Classical Theoretical Physics

Alexander Altland
Contents

1 Electrodynamics
   1.1 Magnetic field energy
   1.2 Electromagnetic gauge field
       1.2.1 Covariant formulation
   1.3 Fourier transformation
       1.3.1 Fourier transform (finite space)
       1.3.2 Fourier transform (infinite space)
       1.3.3 Fourier transform and solution of linear differential equations
       1.3.4 Algebraic interpretation of Fourier transform
   1.4 Electromagnetic waves in vacuum
       1.4.1 Solution of the homogeneous wave equations
       1.4.2 Polarization
   1.5 Green function of electrodynamics
       1.5.1 Physical meaning of the Green function
       1.5.2 Electromagnetic gauge field and Green functions
   1.6 Field energy and momentum
       1.6.1 Field energy
       1.6.2 Field momentum ∗
       1.6.3 Energy and momentum of plane electromagnetic waves
   1.7 Electromagnetic radiation

2 Macroscopic electrodynamics
   2.1 Macroscopic Maxwell equations
       2.1.1 Averaged sources
   2.2 Applications of macroscopic electrodynamics
       2.2.1 Electrostatics in the presence of matter ∗
       2.2.2 Magnetostatics in the presence of matter
   2.3 Wave propagation in media
       2.3.1 Model dielectric function
       2.3.2 Plane waves in matter
       2.3.3 Dispersion
       2.3.4 Electric conductivity

3 Relativistic invariance
   3.1 Introduction
       3.1.1 Galilei invariance and its limitations
       3.1.2 Einstein's postulates
       3.1.3 First consequences
   3.2 The mathematics of special relativity I: Lorentz group
       3.2.1 Background
   3.3 Aspects of relativistic dynamics
       3.3.1 Proper time
       3.3.2 Relativistic mechanics
   3.4 Mathematics of special relativity II: Co- and Contravariance
       3.4.1 Covariant and Contravariant vectors
       3.4.2 The relativistic covariance of electrodynamics

4 Lagrangian mechanics
   4.1 Variational principles
       4.1.1 Definitions
       4.1.2 Euler-Lagrange equations
       4.1.3 Coordinate invariance of Euler-Lagrange equations
   4.2 Lagrangian mechanics
       4.2.1 The idea
       4.2.2 Hamilton's principle
       4.2.3 Lagrange mechanics and symmetries
       4.2.4 Noether theorem
       4.2.5 Examples

5 Hamiltonian mechanics
   5.1 Hamiltonian mechanics
       5.1.1 Legendre transform
       5.1.2 Hamiltonian function
   5.2 Phase space
       5.2.1 Phase space and structure of the Hamilton equations
       5.2.2 Hamiltonian flow
       5.2.3 Phase space portraits
       5.2.4 Poisson brackets
       5.2.5 Variational principle
   5.3 Canonical transformations
       5.3.1 When is a coordinate transformation "canonical"?
       5.3.2 Canonical transformations: why?
Chapter 1

Electrodynamics

1.1 Magnetic field energy


As a prelude to our discussion of the full set of Maxwell equations, let us address a question
which, in principle, should have been answered in the previous chapter: What is the energy
stored in a static magnetic field? In section ??, the analogous question for the electric field
was answered in a constructive manner: we computed the mechanical energy required to build
up a system of charges. It turned out that the answer could be formulated entirely in terms of
the electric field, without explicit reference to the charge distribution creating it. By symmetry,
one might expect a similar prescription to work in the magnetic case. Here, one would ask
for the energy needed to build up a current distribution against the magnetic field created by
those elements of the current distribution that have already been put in place. A moment's
thought, however, shows that this strategy is not quite as straightforwardly implemented as
in the electric case: no matter how slowly we move a 'current loop' in a magnetic field, an
electric field acting on the charge carriers maintaining the current in the loop will be induced
— the induction law. Work will have to be done to maintain a constant current, and it is this
work that essentially enters the energy balance of the current distribution.
To make this picture quantitative, consider a current loop carrying a current I. We may think of this loop as being consecutively built up by importing small current loops (carrying current I) from infinity (see the figure.) The currents flowing along adjacent segments of these loops will eventually cancel so that only the net current I flowing around the boundary remains. Let us, then, compute the work that needs to be done to bring in one of these loops from infinity.

Consider, first, an ordinary point particle kept at a (mechanical) potential U. The rate at which this potential changes if the particle changes its position is $d_t U = d_t U(x(t)) = \nabla U\cdot d_t x = -F\cdot v$, where F is the force acting on the particle. Specifically, for the charged particles moving in our prototypical current loops, F = qE, where E is the electric field induced by variation of


the magnetic flux through the loop as it approaches from infinity.


Next consider an infinitesimal volume element $d^3x$ inside the loop. Assuming that the charge carriers move at a velocity $v$, the charge of the volume element is given by $q = \rho\, d^3x$ and the rate of its potential change by $d_t\phi = -\rho\, d^3x\; v\cdot E = -d^3x\; j\cdot E$. We may now use the induction law to express the electric field in terms of magnetic quantities:
$$\oint ds\cdot E = -c^{-1}\int_S d\sigma\; n\cdot d_t B = -c^{-1}\int_S d\sigma\; n\cdot d_t(\nabla\times A) = -c^{-1}\oint ds\cdot d_t A.$$
Since this holds regardless of the geometry of the loop, we have $E = -c^{-1} d_t A$, where $d_t A$ is the change in vector potential due to the movement of the loop. Thus, the rate at which the potential of a volume element changes is given by $d_t\phi = c^{-1} d^3x\; j\cdot d_t A$. Integrating over time and space, we find that the total potential energy of the loop due to the presence of a vector potential is given by
$$E = \frac{1}{c}\int d^3x\; j\cdot A. \tag{1.1}$$

Although derived for the specific case of a current loop, Eq. (1.1) holds for general current distributions subject to a magnetic field. (For example, for the current density carried by a point particle at x(t), $j = q\,\delta(x - x(t))\, v(t)$, we obtain $E = \frac{q}{c}\, v(t)\cdot A(x(t))$, i.e. the familiar Lorentz-force contribution to the Lagrangian of a charged particle.)
Now, assume that we shift the loop at fixed current against the magnetic field. The change in potential energy corresponding to a small shift is given by $\delta E = c^{-1}\int d^3x\; j\cdot\delta A$, where $\nabla\times\delta A = \delta B$ denotes the change in the field strength. Using that $\nabla\times H = 4\pi c^{-1} j$, we represent $\delta E$ as
$$\delta E = \frac{1}{4\pi}\int d^3x\,(\nabla\times H)\cdot\delta A = \frac{1}{4\pi}\int d^3x\;\epsilon_{ijk}(\partial_j H_k)\,\delta A_i = -\frac{1}{4\pi}\int d^3x\; H_k\,\epsilon_{ijk}\,\partial_j\delta A_i = \frac{1}{4\pi}\int d^3x\; H\cdot\delta B,$$
where in the integration by parts we noted that due to the spatial decay of the fields no surface terms at infinity arise. Due to the linear relation $H = \mu_0^{-1} B$, we may write $H\cdot\delta B = \delta(H\cdot B)/2$, i.e. $\delta E = \frac{1}{8\pi}\,\delta\!\int d^3x\; H\cdot B$. Finally, summing over all shifts required to bring the current loop in from infinity, we obtain
$$E = \frac{1}{8\pi}\int d^3x\; H\cdot B \tag{1.2}$$
for the magnetic field energy. Notice (a) that we have again managed to express the energy of the system entirely in terms of the fields, i.e. without explicit reference to the sources creating these fields, and (b) the structural similarity to the electric field energy (??). Finally, (c) the above construction shows that a separation of electromagnetism into static and dynamic phenomena is not quite as unambiguous as one might have expected: the derivation of the magnetostatic field energy relied on the dynamic law of induction.
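EXAMPLE As a quick illustration of Eq. (1.2) (a standard example, added here for concreteness), consider a long solenoid of cross section S and length l, wound with n turns per unit length and carrying a current I. In vacuum (B = H), Ampère's law gives the homogeneous interior field $B = 4\pi n I/c$, while the exterior field is negligible, so that (1.2) yields
$$E = \frac{1}{8\pi}\int d^3x\; H\cdot B \simeq \frac{B^2}{8\pi}\, S l = \frac{2\pi}{c^2}\, n^2 I^2\, S l,$$
i.e. the stored field energy grows quadratically with the driving current.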
We next proceed to embed the general static theory – as described by the concept of electric and magnetic potentials – into the larger framework of electrodynamics.

Thus, from now on, we take up the theory of the full set of Maxwell equations

$$\nabla\cdot D = 4\pi\rho, \tag{1.3}$$
$$\nabla\times H - \frac{1}{c}\,\frac{\partial}{\partial t} D = \frac{4\pi}{c}\, j, \tag{1.4}$$
$$\nabla\times E + \frac{1}{c}\,\frac{\partial}{\partial t} B = 0, \tag{1.5}$$
$$\nabla\cdot B = 0. \tag{1.6}$$

1.2 Electromagnetic gauge field


Consider the full set of Maxwell equations in vacuum (E = D and B = H),
$$\nabla\cdot E = 4\pi\rho,\qquad \nabla\times B - \frac{1}{c}\,\partial_t E = \frac{4\pi}{c}\, j,\qquad \nabla\times E + \frac{1}{c}\,\partial_t B = 0,\qquad \nabla\cdot B = 0. \tag{1.7}$$

As in previous sections we will try to use constraints inherent to these equations to compactify
them to a smaller set of equations. As in section ??, the equation ∇ · B = 0 implies

B = ∇ × A. (1.8)

(However, we may no longer expect A to be time independent.) Now, substitute this representation into the law of induction: $\nabla\times(E + c^{-1}\partial_t A) = 0$. This implies that $E + c^{-1}\partial_t A$ can be written as the gradient of a scalar potential, $E + c^{-1}\partial_t A = -\nabla\phi$, or
$$E = -\nabla\phi - \frac{1}{c}\,\partial_t A. \tag{1.9}$$

We have, thus, managed to represent the electromagnetic fields in terms of derivatives of a generalized four-component potential $A = (\phi, A)$. (The negative sign multiplying A has been introduced for later reference.) Substituting Eqs. (1.8) and (1.9) into the inhomogeneous Maxwell equations, we obtain
$$-\Delta\phi - \frac{1}{c}\,\partial_t\nabla\cdot A = 4\pi\rho,\qquad -\Delta A + \frac{1}{c^2}\,\partial_t^2 A + \nabla(\nabla\cdot A) + \frac{1}{c}\,\partial_t\nabla\phi = \frac{4\pi}{c}\, j. \tag{1.10}$$

These equations do not look particularly inviting. However, as in section ?? we may observe that the choice of the generalized vector potential A is not unique; this freedom can be used to transform Eq. (1.10) into a more manageable form. Consider an arbitrary function $f: \mathbb{R}^3\times\mathbb{R}\to\mathbb{R}$, $(x,t)\mapsto f(x,t)$. The transformation $A \to A + \nabla f$ leaves the magnetic field unchanged, while the electric field changes according to $E \to E - c^{-1}\partial_t\nabla f$. If, however, we synchronously redefine the scalar potential as $\phi \to \phi - c^{-1}\partial_t f$, the electric field, too, will not be affected by the transformation. Summarizing, the generalized gauge transformation
$$A \to A + \nabla f,\qquad \phi \to \phi - \frac{1}{c}\,\partial_t f, \tag{1.11}$$
leaves the electromagnetic fields (1.8) and (1.9) unchanged.
The gauge freedom may be used to transform the vector potential into one of several
convenient representations. Of particular relevance to the solution of the time dependent
Maxwell equations is the so–called Lorentz gauge

$$\nabla\cdot A + \frac{1}{c}\,\partial_t\phi = 0. \tag{1.12}$$

Importantly, it is always possible to satisfy this condition by a suitable gauge transformation. Indeed, if $(\phi', A')$ does not obey the Lorentz condition, i.e. $\nabla\cdot A' + \frac{1}{c}\partial_t\phi' \neq 0$, we may define a transformed potential $(\phi, A)$ through $\phi = \phi' - c^{-1}\partial_t f$ and $A = A' + \nabla f$. We now require that the newly defined potential obey (1.12). This is equivalent to the condition
$$\left(\Delta - \frac{1}{c^2}\,\partial_t^2\right) f = -\left(\nabla\cdot A' + \frac{1}{c}\,\partial_t\phi'\right),$$
i.e. a wave equation with inhomogeneity $-(\nabla\cdot A' + c^{-1}\partial_t\phi')$. We will see in a moment that such equations can always be solved, i.e.

we may always choose a potential (φ, A) so as to obey the Lorentz gauge condition.

In the Lorentz gauge, the Maxwell equations assume the simplified form
$$\left(\Delta - \frac{1}{c^2}\,\partial_t^2\right)\phi = -4\pi\rho,\qquad \left(\Delta - \frac{1}{c^2}\,\partial_t^2\right) A = -\frac{4\pi}{c}\, j. \tag{1.13}$$
In combination with the gauge condition (1.12), Eqs. (1.13) are fully equivalent to the set of Maxwell equations (1.7).

1.2.1 Covariant formulation


The formulation introduced above is mathematically efficient; however, it has an unsatisfactory touch to it: the gauge transformation (1.11) couples the "zeroth component" φ with the "vectorial" components A. While this indicates that the four components (φ, A) should form an entity, the equations above treat φ and A separately. In the following we show that there exists a much more concise and elegant covariant formulation of electrodynamics that transcends this separation. At this point, we introduce the covariant notation merely as a mnemonic aid. Later in the text, we will understand the physical principles behind the new formulation.

Definitions

Consider a vector $x \in \mathbb{R}^4$. We represent this object in terms of four components $\{x^\mu\}$, $\mu = 0,1,2,3$, where $x^0$ will be called the time-like component and $x^i$, $i = 1,2,3$, the space-like components.¹ Four-component objects of this structure will be called contravariant vectors. To a contravariant vector $x^\mu$, we associate a covariant vector $x_\mu$ (notice the positioning of the indices as superscripts and subscripts, resp.) through
$$x_0 = x^0,\qquad x_i = -x^i. \tag{1.14}$$
A co- and a contravariant vector $x_\mu$ and $x'^\mu$ can be "contracted" to produce a number:²
$$x_\mu x'^\mu \equiv x^0 x'^0 - x^i x'^i.$$
Here, we are employing the Einstein summation convention according to which pairs of identical indices are summed over, $x_\mu x'^\mu \equiv \sum_{\mu=0}^{3} x_\mu x'^\mu$, etc. Notice that $x_\mu x'^\mu = x^\mu x'_\mu$.
We next introduce a number of examples of co- and contravariant vectors that will be of importance below:

. Consider a point in space time $(t, r)$ defined by a time argument and three real space components $r^i$, $i = 1,2,3$. We store this information in a so-called contravariant four-component vector $x^\mu$, $\mu = 0,1,2,3$, as $x^0 = ct$, $x^i = r^i$, $i = 1,2,3$. Occasionally, we write $x \equiv \{x^\mu\} = (ct, r)$ for short. The associated covariant vector has components $\{x_\mu\} = (ct, -r)$.

. Derivatives taken w.r.t. the components of the contravariant 4-vector are denoted by $\partial_\mu \equiv \frac{\partial}{\partial x^\mu} = (c^{-1}\partial_t, \partial_r)$. Notice that this is a covariant object. Its contravariant partner is given by $\{\partial^\mu\} = (c^{-1}\partial_t, -\partial_r)$.

. The 4-current $j = \{j^\mu\}$ is defined through $j \equiv (c\rho, j)$, where ρ and j are charge density and vectorial current density, resp.

. The 4-vector potential $A \equiv \{A^\mu\}$ is defined through $A = (\phi, A)$.
¹ Following a widespread convention, we denote four component indices $\mu, \nu, \dots = 0,1,2,3$ by Greek letters (from the middle of the alphabet.) "Space-like" indices $i, j, \dots = 1,2,3$ are denoted by (mid-alphabet) latin letters.
² Although one may formally consider sums $x^\mu x'^\mu$, objects of this type do not make sense in the present formalism.

We next engage these definitions to introduce the

Covariant representation of electrodynamics


Above, we saw that the freedom to represent the potentials (φ, A) in different gauges may be used to obtain convenient formulations of electrodynamics. Within the covariant formalism, the general gauge transformation (1.11) assumes the form
$$A^\mu \to A^\mu - \partial^\mu f. \tag{1.15}$$

(Exercise: convince yourself that this is equivalent to (1.11); with $\{\partial^\mu\} = (c^{-1}\partial_t, -\partial_r)$, the time-like component reproduces $\phi \to \phi - \frac{1}{c}\partial_t f$ and the space-like components reproduce $A \to A + \nabla f$.) Specifically, this freedom may be used to realize the Lorentz gauge,
$$\partial_\mu A^\mu = 0. \tag{1.16}$$

Once more, we verify that any 4-potential may be transformed to one that obeys the Lorentz condition: for if $A'^\mu$ obeys $\partial_\mu A'^\mu \equiv \lambda$, where λ is some function, we may define the transformed potential $A^\mu \equiv A'^\mu - \partial^\mu f$. Imposing on $A^\mu$ the Lorentz condition,
$$0 \overset{!}{=} \partial_\mu A^\mu = \partial_\mu(A'^\mu - \partial^\mu f) = \lambda - \partial_\mu\partial^\mu f,$$
we obtain the equivalent condition $\Box f \overset{!}{=} \lambda$. Here, we have introduced the d'Alembert operator
$$\Box \equiv \partial_\mu\partial^\mu = \frac{1}{c^2}\,\partial_t^2 - \Delta. \tag{1.17}$$
(In a sense, this is a four-dimensional generalization of the Laplace operator, hence the four-cornered square.) This is the covariant formulation of the condition discussed above. A gauge transformation $A^\mu \to A^\mu + \partial^\mu f$ of a potential $A^\mu$ in the Lorentz gauge class $\partial_\mu A^\mu = 0$ is compatible with the Lorentz condition iff $\Box f = 0$. In other words, the class of Lorentz gauge vector potentials still contains a huge gauge freedom.
The general representation of the Maxwell equations through potentials (1.10) affords the 4-representation
$$\Box A^\mu - \partial^\mu(\partial_\nu A^\nu) = \frac{4\pi}{c}\, j^\mu.$$
We may act on this equation with $\partial_\mu$ to obtain the constraint
$$\partial_\mu j^\mu = 0, \tag{1.18}$$
which we identify as the covariant formulation of charge conservation, $\partial_\mu j^\mu = \partial_t\rho + \nabla\cdot j = 0$: the Maxwell equations imply current conservation. Finally, imposing the Lorentz condition, $\partial_\mu A^\mu = 0$, we obtain the simplified representation
$$\partial_\nu\partial^\nu A^\mu = \frac{4\pi}{c}\, j^\mu,\qquad \partial_\mu A^\mu = 0. \tag{1.19}$$

Before turning to the discussion of the solution of these equations a few general remarks
are in order:

. The Lorentz condition is the prevalent gauge choice in electrodynamics because (a) it brings the Maxwell equations into a maximally simple form and (b) will turn out below to be invariant under the most general class of coordinate transformations, the Lorentz transformations (see below). (It is worthwhile to note that the Lorentz condition does not unambiguously fix the gauge of the vector potential. Indeed, we may modify any Lorentz gauge vector potential $A^\mu$ as $A^\mu \to A^\mu + \partial^\mu f$, where f satisfies the homogeneous wave equation $\partial^\mu\partial_\mu f = 0$. This doesn't alter the condition $\partial_\mu A^\mu = 0$.) Other gauge conditions frequently employed include

. the Coulomb gauge or radiation gauge $\nabla\cdot A = 0$ (employed earlier in section ??.) The advantage of this gauge is that the scalar Maxwell equation assumes the simple form of a Poisson equation, $\Delta\phi(x,t) = -4\pi\rho(x,t)$, which is solved by $\phi(x,t) = \int d^3x'\,\rho(x',t)/|x - x'|$, i.e. by an instantaneous 'Coulomb potential' (hence the name Coulomb gauge.) This gauge representation has also proven advantageous within the context of quantum electrodynamics. However, we won't discuss it any further in this text.

. The passage from the physical fields E, B to potentials $A^\mu$ as fundamental variables implies a convenient reduction of variables. Working with fields, we have 2 × 3 vectorial components, which, however, are not independent: equations such as $\nabla\cdot B = 0$ represent constraints. This should be compared to the four components of the potential $A^\mu$. The gauge freedom implies that of the four components only three are independent (think about this point.) In general, no further reduction in the number of variables is possible. To see this, notice that the electromagnetic fields are determined by the "sources" of the theory, i.e. the 4-current $j^\mu$. The continuity equation $\partial_\mu j^\mu = 0$ reduces the number of independent sources down to 3. We certainly need three independent field components to describe the field patterns created by the three independent sources. (The electromagnetic field in a source-free environment $j^\mu = 0$ is described in terms of only two independent components. This can be seen, e.g., by going to the Coulomb gauge. The above equation for the time-like component of the potential is then uniquely solved by φ = 0. Thus, only the two independent components of A constrained by $\nabla\cdot A = 0$ remain.)

1.3 Fourier transformation


In this – purely mathematical – section, we introduce the concept of Fourier transformation. We start with a formal definition of the Fourier transform as an abstract integral transformation. Later in the section, we will discuss the application of Fourier transformation in the solution of linear differential equations, and its conceptual interpretation as a basis-change in function space.

1.3.1 Fourier transform (finite space)


Consider a function $f: [0,L] \to \mathbb{C}$ on an interval $[0,L] \subset \mathbb{R}$. Conceptually, the Fourier transformation is a representation of f as a superposition (a sum) of plane waves:
$$f(x) = \frac{1}{L}\sum_k e^{ikx} f_k, \tag{1.20}$$
where the sum runs over all values $k = 2\pi n/L$, $n \in \mathbb{Z}$. Series representations of this type are summarily called Fourier series representations. At this point, we haven't proven that the function f can actually be represented by a series of this type. However, if it exists, the coefficients $f_k$ can be computed with the help of the identity (prove it)
$$\frac{1}{L}\int_0^L dx\; e^{-ikx} e^{ik'x} = \delta_{kk'}, \tag{1.21}$$
where $k = 2\pi n/L$, $k' = 2\pi n'/L$ and $\delta_{kk'} \equiv \delta_{nn'}$. We just need to multiply Eq. (1.20) by the function $e^{-ikx}$ and integrate over x to obtain
$$\int_0^L dx\; e^{-ikx} f(x) \overset{(1.20)}{=} \frac{1}{L}\sum_{k'}\int_0^L dx\; e^{-ikx} e^{ik'x} f_{k'} = \sum_{k'} f_{k'}\,\frac{1}{L}\int_0^L dx\; e^{-ikx} e^{ik'x} \overset{(1.21)}{=} f_k.$$

We call the numbers $f_k \in \mathbb{C}$ the Fourier coefficients of the function f. The relation
$$f_k = \int_0^L dx\; e^{-ikx} f(x), \tag{1.22}$$
connecting a function with its Fourier coefficients is called the Fourier transform of f. The 'inverse' identity (1.20) expressing f through its Fourier coefficients is called the inverse Fourier transform. To repeat, at this point we haven't proven the existence of the inverse transform, i.e. the possibility to 'reconstruct' a function f from its Fourier coefficients $f_k$.
A criterion for the existence of the Fourier transform is that the insertion of (1.22) into the r.h.s. of (1.20) gives back the original function f:
$$f(x) \overset{!}{=} \frac{1}{L}\sum_k e^{ikx}\int_0^L dx'\; e^{-ikx'} f(x') = \int_0^L dx'\;\frac{1}{L}\sum_k e^{ik(x-x')}\, f(x').$$
A necessary and sufficient condition for this to hold is that
$$\frac{1}{L}\sum_k e^{ik(x-x')} = \delta(x - x'), \tag{1.23}$$

Figure 1.1: Representation of a function (thick line) by Fourier series expansion. The graphs show series expansions truncated after (1, 3, 5, 7, 9) terms.

where δ(x) is the Dirac δ-function. For reasons that will become transparent below, Eq. (1.23) is called a completeness relation. Referring for a proof of this relation to the exercise below, we here merely conclude that Eq. (1.23) establishes Eq. (1.22) and its inverse (1.20) as an integral transform, i.e. the full information carried by f(x) is contained in the set of all Fourier coefficients $\{f_k\}$. By way of example, figure 1.1 illustrates the approximation of a sawtooth shaped function f by Fourier series truncated after a finite number of terms.
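For readers who want to experiment with these formulae, the following short Python script (an illustration added here; the sawtooth function and truncation orders are arbitrary choices mimicking figure 1.1) evaluates the coefficients (1.22) by numerical integration and sums the truncated series (1.20):

```python
import numpy as np

L = 1.0
N_x = 2000
x = np.arange(N_x) * L / N_x            # sampling grid on [0, L)
dx = L / N_x
f = x / L - 0.5                          # sawtooth test function

def f_k(n):
    """Fourier coefficient f_k = int_0^L dx e^{-ikx} f(x), k = 2*pi*n/L."""
    k = 2 * np.pi * n / L
    return np.sum(np.exp(-1j * k * x) * f) * dx

def partial_sum(N):
    """Truncated inverse transform (1/L) * sum_{|n| <= N} e^{ikx} f_k."""
    s = sum(np.exp(2j * np.pi * n * x / L) * f_k(n) for n in range(-N, N + 1))
    return (s / L).real

for N in (1, 3, 5, 7, 9):
    print(f"N = {N}: mean deviation {np.mean(np.abs(partial_sum(N) - f)):.4f}")
```

The printed mean deviation decreases as more terms are retained, in line with the convergence of the series expansion.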

EXERCISE Prove the completeness relation. To this end, add an infinitesimal "convergence generating factor" δ > 0 to the sum:
$$\frac{1}{L}\sum_k e^{ik(x-x')-\delta|k|} = \frac{1}{L}\sum_{n=-\infty}^{\infty} e^{2\pi i n(x-x')/L - 2\pi|n|\delta/L}.$$
Next compute the sum as a convergent geometric series. Take the limit δ ↘ 0 and convince yourself that you obtain a representation of the Dirac δ-function.

It is straightforward to generalize the formulae above to functions defined in higher dimensional spaces: consider a function $f: M \to \mathbb{C}$, where $M = [0,L_1]\times[0,L_2]\times\dots\times[0,L_d]$ is a d-dimensional cuboid. It is straightforward to check that the generalization of the formulae above reads
$$f(x) = \prod_{l=1}^d\frac{1}{L_l}\sum_k e^{ik\cdot x} f_k,\qquad f_k = \int_M d^dx\; e^{-ix\cdot k} f(x), \tag{1.24}$$
where the sum runs over all vectors $k = (k_1,\dots,k_d)$, $k_l = 2\pi n_l/L_l$, $n_l \in \mathbb{Z}$. Eqs. (1.21)

and (1.23) generalize to
$$\prod_{l=1}^d\frac{1}{L_l}\int_M d^dx\; e^{i(k-k')\cdot x} = \delta_{k,k'},\qquad \prod_{l=1}^d\frac{1}{L_l}\sum_k e^{ik\cdot(x-x')} = \delta(x - x'),$$
where $\delta_{k,k'} \equiv \prod_{l=1}^d \delta_{k_l,k'_l}$.

Finally, the Fourier transform of vector valued functions, $f: M \to \mathbb{C}^n$, is defined through the Fourier transform of its components, i.e. $f_k$ is an n-dimensional vector whose components $f_k^l$ are given by the Fourier transform of the component functions $f^l(x)$, $l = 1,\dots,n$.
This completes the definition of Fourier transformation in finite integration domains. At
this point, a few general remarks are in order:

. Why is Fourier transformation useful? There are many different answers to this question, which underpins the fact that we are dealing with an important concept: In many areas of electrical engineering, sound engineering, etc., it is customary to work with wave-like signals, i.e. signals of harmonic modulation $\sim \mathrm{Re}\, e^{ikx} = \cos(kx)$. The (inverse) Fourier transform is but the decomposition of a general signal into harmonic constituents. Similarly, many "signals" encountered in nature (water or sound waves reflected from a complex pattern of obstacles, light reflected from rough surfaces, etc.) can be interpreted in terms of a superposition of planar waves. Again, this is a situation where a Fourier 'mode' decomposition is useful.

On a different note, Fourier transformation is an extremely potent tool in the solution


of linear differential equations. This is a subject to which we will turn below.

. There is some freedom in the definitions of Fourier transforms. For example, we might have chosen $f(x) = c\sum_k e^{ikx} f_k$ with $f_k = \frac{1}{cL}\int dx\, e^{-ikx} f(x)$ and an arbitrary constant c. Or $f(x) = \frac{1}{L}\sum_k e^{-ikx} f_k$ with $f_k = \int dx\, e^{ikx} f(x)$ (inverted sign of the phases), or combinations thereof. The actual choice of convention is often a matter of convenience.

1.3.2 Fourier transform (infinite space)

In many applications, we want to consider functions $f: \mathbb{R}^d \to \mathbb{C}$ without explicit reference to some finite cuboid shaped integration domain. Formally, the extension to infinite space may be effected by sending the extension of the integration interval, L, to infinity. In the limit L → ∞, the spacing of consecutive Fourier modes, Δk = 2π/L, approaches zero, and Eq. (1.20) is identified as the Riemann sum representation of an integral,
$$f(x) = \lim_{L\to\infty}\frac{1}{L}\sum_k e^{ikx} f_k = \lim_{\Delta k\to 0}\frac{1}{2\pi}\sum_{n\in\mathbb{Z}}\Delta k\; e^{i\Delta k\, n\, x}\, f_{\Delta k\, n} = \frac{1}{2\pi}\int dk\; e^{ikx} f(k).$$
Following a widespread convention, we denote the Fourier representation of functions in infinite space by f(k) instead of $f_k$. Explicit reference to the argument "k" will prevent us from confusing the Fourier representation f(k) with the original function f(x).
The infinite space version of the inverse transform reads
$$f(k) = \int_{-\infty}^{\infty} dx\; e^{-ikx} f(x). \tag{1.25}$$

While this definition is formally consistent, it has a problem: only functions decaying sufficiently fast at infinity so as to make $f(x)\, e^{ikx}$ integrable can be transformed in this way. This makes numerous function classes of applied interest – polynomials, various types of transcendental functions, etc. – non-transformable and severely restricts the usefulness of the formalism.
However, by a little trick, we may upgrade the definition to one that is far more useful. Let us modify the transform according to
$$f(k) = \lim_{\delta\searrow 0}\int dx\; e^{-ikx - \delta x^2} f(x). \tag{1.26}$$
Due to the inclusion of the convergence generating factor, δ, any function that does not increase faster than exponentially can be transformed! Similarly, it will be useful to include a convergence generating factor in the reverse transform, i.e.
$$f(x) = \lim_{\delta\searrow 0}\frac{1}{2\pi}\int dk\; e^{ikx - \delta k^2} f(k).$$

Finally, upon straightforward extension of these formulae to multi-dimensional functions $f: \mathbb{R}^d \to \mathbb{C}$, we arrive at the definition of the Fourier transform in infinite space
$$f(x) = \lim_{\delta\searrow 0}\int\frac{d^dk}{(2\pi)^d}\; e^{ix\cdot k - \delta k^2} f(k),\qquad f(k) = \lim_{\delta\searrow 0}\int d^dx\; e^{-ik\cdot x - \delta x^2} f(x). \tag{1.27}$$
In these formulae, $x^2 \equiv x\cdot x$ is the squared norm of the vector x, etc. A few more general remarks on these definitions:
. To keep the notation simple, we will suppress the explicit reference to the convergence
generating factor, unless convergence is an issue.

. Occasionally, it will be useful to work with convergence generating factors different from the one above. For example, in one dimension, it may be convenient to effect convergence by the definition
$$f(x) = \lim_{\delta\searrow 0}\frac{1}{2\pi}\int dk\; e^{ixk - \delta|k|} f(k),\qquad f(k) = \lim_{\delta\searrow 0}\int dx\; e^{-ikx - \delta|x|} f(x).$$

. For later reference, we note that the generalization of the completeness relation to infinite space reads
$$\int\frac{d^dk}{(2\pi)^d}\; e^{ik\cdot(x-x')} = \delta(x - x'). \tag{1.28}$$

. Finally, one may wonder whether the inclusion of the convergence generating factor might not spoil the consistency of the transform. To check that this is not the case, we need to verify the infinite space variant of the completeness relation (1.23), i.e.
$$\lim_{\delta\searrow 0}\int\frac{d^dk}{(2\pi)^d}\; e^{ik\cdot(x-x') - \delta k^2} = \delta(x - x'). \tag{1.29}$$
(For only if this relation holds do the two equations in (1.27) form a consistent pair, cf. the argument in section 1.3.1.)
EXERCISE Prove relation (1.29). Do the Gaussian integral over k (hint: notice that the integral factorizes into d one-dimensional Gaussian integrals) to obtain a result that in the limit δ ↘ 0 asymptotes to a representation of the δ-function.

1.3.3 Fourier transform and solution of linear differential equations


One of the most striking features of the Fourier transform is that it turns derivative operations (complicated) into algebraic multiplications (simple): consider the function $\partial_x f(x)$.³ Without much loss of generality, we assume the function $f: [0,L] \to \mathbb{C}$ to obey periodic boundary conditions, $f(0) = f(L)$. A straightforward integration by parts shows that the Fourier coefficients of $\partial_x f$ are given by $ik f_k$. In other words, denoting the passage from a function to its Fourier transform by an arrow,
$$f(x) \longrightarrow f_k \quad\Rightarrow\quad \partial_x f(x) \longrightarrow ik\, f_k.$$
This operation can be iterated to give
$$\partial_x^n f(x) \longrightarrow (ik)^n f_k.$$


³ Although our example is one-dimensional, we write $\partial_x$ instead of $\frac{d}{dx}$. This is in anticipation of our higher-dimensional applications below.

In a higher dimensional setting these identities generalize to (think why!)
$$\partial_l f(x) \longrightarrow (ik_l)\, f_k,$$
where $\partial_l \equiv \frac{\partial}{\partial x_l}$ and $k_l$ is the l-th component of the vector k. Specifically, the operations of vector analysis become purely algebraic operations:
$$\nabla f(x) \longrightarrow ik\, f_k,\qquad \nabla\cdot f(x) \longrightarrow ik\cdot f_k,\qquad \nabla\times f(x) \longrightarrow ik\times f_k,\qquad \Delta f(x) \longrightarrow -(k\cdot k)\, f_k, \tag{1.30}$$
etc. Relations of this type are of invaluable use in the solution of linear differential equations (with which electrodynamics is crawling!)
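These rules are easy to check numerically. The following sketch (an added illustration; NumPy's discrete FFT stands in for the continuum transform on a periodic interval, and the test function is arbitrary) differentiates a smooth periodic function by multiplying its Fourier coefficients with ik:

```python
import numpy as np

N, L = 256, 2 * np.pi
x = np.arange(N) * L / N
k = 2 * np.pi * np.fft.fftfreq(N, d=L / N)     # wave numbers k = 2*pi*n/L

f = np.exp(np.sin(x))                           # smooth periodic test function
df_exact = np.cos(x) * np.exp(np.sin(x))        # analytic derivative

# transform, multiply by ik, transform back: spectral differentiation
df = np.fft.ifft(1j * k * np.fft.fft(f)).real

print(np.max(np.abs(df - df_exact)))            # ~1e-13: machine-precision accuracy
```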
To begin with, consider a linear differential equation with constant coefficients. This is a differential equation of the structure
$$\bigl(c^{(0)} + c_i^{(1)}\partial_i + c_{ij}^{(2)}\partial_i\partial_j + \dots\bigr)\,\psi(x) = \phi(x),$$
where the $c^{(n)}_{i,j,\dots} \in \mathbb{C}$ are constants. We may now Fourier transform both sides of the equation to obtain
$$\bigl(c^{(0)} + i c_i^{(1)} k_i - c_{ij}^{(2)} k_i k_j + \dots\bigr)\,\psi_k = \phi_k.$$
This is an algebraic equation which is readily solved as
$$\psi_k = \frac{\phi_k}{c^{(0)} + i c_i^{(1)} k_i - c_{ij}^{(2)} k_i k_j + \dots}.$$
The solution ψ(x) can now be obtained by computation of the inverse Fourier transform,
$$\psi(x) = \int\frac{d^dk}{(2\pi)^d}\;\frac{\phi_k\, e^{ik\cdot x}}{c^{(0)} + i c_i^{(1)} k_i - c_{ij}^{(2)} k_i k_j + \dots}. \tag{1.31}$$

Fourier transform methods thus reduce the solution of linear partial differential equations with constant coefficients to the computation of a definite integral. Technically, this integral may be difficult to compute; however, the conceptual task of solving the differential equation is under control.
EXAMPLE By way of example, consider the second order ordinary differential equation
$$\bigl(-\partial_x^2 + \lambda^{-2}\bigr)\,\psi(x) = c\,\delta(x - x_0),$$
where λ, c and $x_0$ are constants. Application of formula (1.31) gives a solution⁴
$$\psi(x) = c\int\frac{dk}{2\pi}\;\frac{e^{ik(x - x_0)}}{k^2 + \lambda^{-2}}.$$
The integral above can be done by the method of residues (or looked up in a table). As a result we obtain the solution
$$\psi(x) = \frac{c\lambda}{2}\, e^{-|x - x_0|/\lambda}.$$
(Exercise: convince yourself that ψ is a solution to the differential equation.) In the limit λ → ∞, the differential equation collapses to a one-dimensional variant of the Poisson equation, $-\partial_x^2\psi(x) = c\,\delta(x - x_0)$, for a charge c/4π at $x_0$. Up to an additive constant, our solution asymptotes to
$$\psi(x) = -\frac{c}{2}\,|x - x_0|,$$
which we may interpret as a one-dimensional version of the Coulomb potential.

⁴ The solution is not unique: addition of a solution $\psi_0$ of the homogeneous equation $(-\partial_x^2 + \lambda^{-2})\,\psi_0(x) = 0$ gives another solution $\psi(x) \to \psi(x) + \psi_0(x)$ of the inhomogeneous differential equation.
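The example can also be checked numerically along the lines of Eq. (1.31) (a sketch; a periodic box of size L ≫ λ approximates the infinite interval, and the grid parameters are arbitrary): represent the δ-inhomogeneity on a grid, divide its transform by the symbol $k^2 + \lambda^{-2}$, and transform back.

```python
import numpy as np

N, L = 4096, 100.0
lam, c0, x0 = 1.0, 1.0, 50.0                   # lambda, source strength c, position x0
x = np.arange(N) * L / N
dx = L / N
k = 2 * np.pi * np.fft.fftfreq(N, d=dx)

g = np.zeros(N)
g[int(x0 / dx)] = c0 / dx                      # grid representation of c * delta(x - x0)

# divide the transformed source by k^2 + lambda^{-2}, cf. Eq. (1.31)
psi = np.fft.ifft(np.fft.fft(g) / (k**2 + lam**(-2))).real

psi_exact = 0.5 * c0 * lam * np.exp(-np.abs(x - x0) / lam)
print(np.max(np.abs(psi - psi_exact)))         # small; shrinks with finer grids
```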

However, the usefulness of Fourier transform methods has its limits. The point is that the Fourier transform does not only turn complicated operations (derivatives) simple; the converse is also true. The crux of the matter is expressed by the so-called convolution theorem: straightforward application of the definitions (1.27) and of the completeness relation (1.29) shows that
$$f(x)\, g(x) \longrightarrow (f * g)(k) \equiv \int\frac{d^dp}{(2\pi)^d}\; f(k - p)\, g(p). \tag{1.32}$$
In other words, the Fourier transform turns a simple product operation into a non-local integral convolution. Specifically, this means that the Fourier transform representation of (linear) differential equations with spatially varying coefficient functions will contain convolution operations. Needless to say, the solution of the ensuing integral equations can be complicated.
Finally, notice the mathematical similarity between the Fourier transform $f(x) \to f(k)$ and its reverse $f(k) \to f(x)$. Apart from a sign change in the phase and the constant pre-factor $(2\pi)^{-d}$, the two operations in (1.27) are identical. This means that identities such as (1.30) or (1.32) all have a reverse version. For example,
$$\partial_k f(k) \longrightarrow -ix\, f(x),$$
and a convolution
$$(f * g)(x) \equiv \int d^dy\; f(x - y)\, g(y) \longrightarrow f(k)\, g(k)$$
transforms into a simple product, etc. In practice, one often needs to compromise and figure out whether the real space or the Fourier transformed variant of a problem offers the best perspectives for mathematical attack.
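As a minimal numerical illustration of the convolution theorem (an added sketch; the discrete Fourier transform obeys the same rule, with integrals replaced by sums and periodic wrap-around implied):

```python
import numpy as np

N = 512
rng = np.random.default_rng(0)
f = rng.standard_normal(N)
g = rng.standard_normal(N)

# circular convolution (f*g)_j = sum_m f_m g_{(j-m) mod N}
conv = np.array([np.sum(f * g[(j - np.arange(N)) % N]) for j in range(N)])

# Fourier transform of the convolution = product of the Fourier transforms
print(np.allclose(np.fft.fft(conv), np.fft.fft(f) * np.fft.fft(g)))  # True
```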

1.3.4 Algebraic interpretation of Fourier transform


While it is perfectly o.k. to apply the Fourier transform as a purely operational tool, new perspectives become available once we understand the conceptual meaning of the transform. The Fourier transform is but one member of an entire family of integral transforms. All these transforms have a common basis, which we introduce below. The understanding of these principles will elucidate the mathematical structure of the formulas introduced above, and be of considerable practical use.

A word of caution: this section won’t contain difficult formulae. Nonetheless, it doesn’t
make for easy reading. Please do carefully think about the meaning of the structures introduced
below.

Function spaces as vector spaces

Consider the space V of $L^2$ functions $f: [0,L] \to \mathbb{C}$, i.e. the space of functions for which $\int_0^L dx\, |f|^2$ exists. We call this a 'space' of functions, because V is a vector space: for two elements f, g ∈ V and $a, b \in \mathbb{C}$, af + bg ∈ V, the defining criterion of a complex vector space. However, unlike the finite-dimensional spaces studied in ordinary linear algebra, the dimension of V is infinite. Heuristically, we may think of the dimension of a complex vector space as the number of linearly independent complex components required to uniquely specify a vector. In the case of functions, we need the infinitely many "components" f(x), x ∈ [0, L], for a full characterization.
In the following we aim to develop some intuition for the "linear algebra" on function spaces. To this end, it will be useful to think of V as the limit of finite dimensional vector spaces. Let $x_i = i\delta$, $\delta = L/N$, be a discretization of the interval [0, L] into N segments. Provided N is sufficiently large, the data set $f_N \equiv (f_1, \dots, f_N)$, $f_i \equiv f(x_i)$, will be an accurate representative of a (smooth) function f (see the figure for a discretization of a smooth function into N = 30 data points). We think of the N-component complex vectors $f_N$ generated in this way as elements of an N-dimensional complex vector space $V_N$. Heuristically it should be clear that in the limit of infinitely fine discretization, N → ∞, the vectors $f_N$ carry the full information on their reference functions, and $V_N$ approaches a representative of the function space "$\lim_{N\to\infty} V_N = V$".⁵

⁵ The quotes are quite appropriate here. Many objects standard in linear algebra, traces, determinants, etc., do not afford a straightforward extrapolation to the infinite dimensional case. However, in the present context, the ensuing structures – the subject of functional analysis – do not play an essential role, and a naive approach will be sufficient.

On $V_N$, we may define a complex scalar product, $\langle\,,\,\rangle: V_N\times V_N \to \mathbb{C}$, through $\langle f, g\rangle \equiv \frac{1}{N}\sum_{i=1}^N f_i^* g_i$.⁶ Here, the factor $N^{-1}$ has been included to keep the scalar product roughly constant as N increases. In the limit N → ∞, the operation $\langle\,,\,\rangle$ evolves into a scalar product on the function space V: to two functions f, g ∈ V, we assign
$$\langle f, g\rangle \equiv \lim_{N\to\infty}\frac{1}{N}\sum_i f_i^* g_i = \int_0^L dx\; f^*(x)\, g(x).$$
Corresponding to the scalar product on $V_N$, we have a natural orthonormal basis $\{e_i\,|\,i = 1,\dots,N\}$, where $e_i = (0,\dots,0,N,0,\dots,0)$ and the entry N sits at the i-th position. With this definition, we have
$$\langle e_i, e_j\rangle = N\,\delta_{ij}.$$
The N-normalization of the basis vectors $e_i$ implies that the coefficients of general vectors f are simply obtained as
$$f_i = \langle e_i, f\rangle. \tag{1.33}$$
What is the N → ∞ limit of these expressions? At N → ∞ the vectors $e_i$ become representatives of functions that are "infinitely sharply peaked" around a single point (see the figure above). Increasing N, and choosing a sequence of indices i that correspond to a fixed point y ∈ [0, L],⁷ we denote this function by $\delta_y$. Equation (1.33) then asymptotes to
$$f(y) = \langle\delta_y, f\rangle = \int_0^L dx\;\delta_y(x)\, f(x),$$
where we noted that $\delta_y$ is real. This equation identifies $\delta_y(x) = \delta(y - x)$ as the δ-function.⁸ This consideration identifies the δ-functions $\{\delta_y\,|\,y \in [0,L]\}$ as the function space analog of the standard basis associated to the canonical scalar product. Notice that there are "infinitely many" basis vectors. Much like we need to distinguish between vectors $v \in V_N$ and the components of vectors, $v_i \in \mathbb{C}$, we have to distinguish between functions f ∈ V and their "coefficients" or function values $f(x) = \langle\delta_x, f\rangle \in \mathbb{C}$.

⁶ If no confusion is possible, we omit the subscript N in $f_N$.
⁷ For example, $i = [Ny/L]$, where [x] is the largest integer smaller than $x \in \mathbb{R}$.
⁸ Strictly speaking, $\delta_y$ is not a "function". Rather, it is what in mathematics is called a distribution. However, we will continue to use the loose terminology "δ-function".

Fourier transformation is a change of basis


Why did we bother to introduce these formal analogies? The point is that much like in linear algebra we may consider bases different from the standard one. In fact, there is often good motivation to switch to another basis. Imagine one aims to mathematically describe a problem displaying a degree of "symmetry". Think, for example, of the Poisson equation of a rotationally symmetric charge distribution. It will certainly be a good idea to seek for

solutions that reflect this symmetry, i.e. one might aim to represent the solution in terms
of a basis of functions that show definite behavior under rotations. Specifically, the Fourier
transformation is a change to a function basis that behaves conveniently when acted upon by
derivative operations.
To start with, let us briefly recall the linear algebra of changes of basis in finite dimensional vector spaces. Let $\{w_\alpha\,|\,\alpha = 1,\dots,N\}$ be a second basis of the space $V_N$ (besides the reference basis $\{e_i\,|\,i = 1,\dots,N\}$). For simplicity, we presume that the new basis is orthonormal with respect to the given scalar product, i.e.
$$\langle w_\alpha, w_\beta\rangle = N\,\delta_{\alpha\beta}.$$
Under these conditions, any vector $v \in V_N$ may be expanded as
$$v = \sum_\alpha v'_\alpha\, w_\alpha. \tag{1.34}$$

Written out for individual components $v_i = \langle e_i, v\rangle$, this reads
$$v_i = \sum_\alpha v'_\alpha\, w_{\alpha,i}, \tag{1.35}$$
where $w_{\alpha,i} \equiv \langle e_i, w_\alpha\rangle$ are the components of the new basis vectors in the old basis. The coefficients of the expansion, $v'_\alpha$, of v in the new basis are obtained by taking the scalar product with $w_\alpha$: $\langle w_\alpha, v\rangle = \sum_\beta v'_\beta\,\langle w_\alpha, w_\beta\rangle = N v'_\alpha$, or
$$v'_\alpha = \frac{1}{N}\sum_j w^*_{\alpha,j}\, v_j. \tag{1.36}$$

Substitution of this expansion into (1.35) obtains $v_i = N^{-1}\sum_j\sum_\alpha w^*_{\alpha,j}\, w_{\alpha,i}\, v_j$. This relation holds for arbitrary component sets $\{v_i\}$, which is only possible if
$$\delta_{ij} = \frac{1}{N}\sum_\alpha w^*_{\alpha,j}\, w_{\alpha,i}. \tag{1.37}$$

This is the finite dimensional variant of a completeness relation. The completeness relation holds only for orthonormalized bases. However, the converse is also true: any system of vectors $\{w_\alpha\,|\,\alpha = 1,\dots,N\}$ satisfying the completeness criterion (1.37) is orthonormal and "complete" in the sense that it can be used to represent arbitrary vectors as in (1.34). We may also write the completeness relation in a less index-oriented notation,
$$\langle e_i, e_j\rangle = \sum_\alpha\langle e_i, w_\alpha\rangle\,\langle w_\alpha, e_j\rangle. \tag{1.38}$$

EXERCISE Prove that the completeness relation (1.37) represents a criterion for the orthonor-
mality of a basis.

Let us now turn to the infinite dimensional generalization of these concepts. The (inverse) Fourier transform (1.20) states that an arbitrary element f ∈ V of function space (a "vector") can be expanded in a basis of functions $\{e_k \in V\,|\,k \in 2\pi L^{-1}\mathbb{Z}\}$. The function values ("components in the standard basis") of these candidate basis functions are given by
$$e_k(x) = \langle\delta_x, e_k\rangle = e^{ikx}.$$
The functions $e_k$ are properly orthonormalized⁹ (cf. Eq. (1.21)):
$$\langle e_k, e_{k'}\rangle = \int_0^L dx\; e^*_k(x)\, e_{k'}(x) = L\,\delta_{kk'}.$$
The inverse Fourier transform states that an arbitrary function can be expanded as
$$f = \frac{1}{L}\sum_k f_k\, e_k. \tag{1.39}$$

⁹ However, this by itself does not yet prove their completeness as a basis!

This is the analog of (1.34). In components,
$$f(x) = \frac{1}{L}\sum_k f_k\, e_k(x), \tag{1.40}$$
which is the analog of (1.35). Substitution of $e_k(x) = e^{ikx}$ gives (1.20). The coefficients can be obtained by taking the scalar product $\langle e_k, \cdot\rangle$ with (1.39), i.e.
$$f_k = \langle e_k, f\rangle = \int_0^L dx\; e^{-ikx} f(x),$$
which is the Fourier transform. Finally, completeness holds if the substitution of these coefficients into the inverse transform (1.40) reproduces the function f(x). Since this must hold for arbitrary functions, we are led to the condition
$$\frac{1}{L}\sum_k e^{-ikx}\, e^{iky} = \delta(x - y).$$
The abstract representation of this relation (i.e. the analog of (1.38)) reads (check it!)
$$\langle\delta_x, \delta_y\rangle = \frac{1}{L}\sum_k \langle\delta_x, e_k\rangle\,\langle e_k, \delta_y\rangle.$$

For the convenience of the reader, the most important relations revolving around completeness and basis change are summarized in table 1.1. The column "invariant" contains abstract definitions, while "components" refers to the coefficients of vectors / function values. (For better readability, the normalization factors N from our specific definition of scalar products above are omitted in the table.)

| | vector space (invariant) | vector space (components) | function space (invariant) | function space (components) |
|---|---|---|---|---|
| elements | vector $v$ | components $v_i$ | function $f$ | value $f(x)$ |
| scalar product | $\langle v,w\rangle$ | $v_i^* w_i$ | $\langle f,g\rangle$ | $\int dx\, f^*(x)\,g(x)$ |
| standard basis | $e_i$ | $e_{i,j}=\delta_{ij}$ | $\delta_x$ | $\delta_x(y)=\delta(x-y)$ |
| alternate basis | $w_\alpha$ | $w_{\alpha,i}$ | $e_k$ | $e_k(x)=e^{ikx}$ |
| orthonormality | $\langle w_\alpha,w_\beta\rangle=\delta_{\alpha\beta}$ | $w_{\alpha,i}^*\, w_{\beta,i}=\delta_{\alpha\beta}$ | $\langle e_k,e_{k'}\rangle=L\,\delta_{kk'}$ | $\int dx\, e^{i(k-k')x}=L\,\delta_{kk'}$ |
| expansion | $v=v_\alpha w_\alpha$ | $v_i=v_\alpha w_{\alpha,i}$ | $f=\frac{1}{L}\sum_k f_k e_k$ | $f(x)=\frac{1}{L}\sum_k f_k e^{ikx}$ |
| coefficients | $v_\alpha=\langle w_\alpha,v\rangle$ | $v_\alpha=w_{\alpha,i}^*\, v_i$ | $f_k=\langle e_k,f\rangle$ | $f_k=\int dx\, e^{-ikx} f(x)$ |
| completeness | $\langle e_i,e_j\rangle=\langle e_i,w_\alpha\rangle\langle w_\alpha,e_j\rangle$ | $\delta_{ij}=w_{\alpha,i}^*\, w_{\alpha,j}$ | $\langle\delta_x,\delta_y\rangle=\frac{1}{L}\sum_k\langle\delta_x,e_k\rangle\langle e_k,\delta_y\rangle$ | $\delta(x-y)=\frac{1}{L}\sum_k e^{ik(x-y)}$ |

Table 1.1: Summarizing the linear algebraic interpretation of basis changes in function space. (In all formulas, a double index summation convention is implied.)

With this background, we are in a position to formulate the motivation for Fourier transformation from a different perspective. Above, we had argued that "Fourier transformation transforms derivatives into something simple". The derivative operation $\partial_x: V \to V$ can be viewed as a linear operator in function space: derivatives map functions onto functions, $f \to \partial_x f$, and $\partial_x(af + bg) = a\,\partial_x f + b\,\partial_x g$, $a, b \in \mathbb{C}$, the defining criterion for a linear map. In linear algebra we learn that given a linear operator $A: V_N \to V_N$, it may be a good idea to look at its eigenvectors, $w_\alpha$, and eigenvalues, $\lambda_\alpha$, i.e. $A w_\alpha = \lambda_\alpha w_\alpha$. In the specific case where A is hermitean, $\langle v, Aw\rangle = \langle Av, w\rangle$ for arbitrary v, w, we even know that the set of eigenvectors $\{w_\alpha\}$ can be arranged to form an orthonormal basis.
These structures extend to function space. Specifically, consider the operator $-i\partial_x$, where the factor $(-i)$ has been added for convenience. A straightforward integration by parts shows that this operator is hermitean:
$$\langle f, (-i\partial_x)\, g\rangle = \int dx\; f^*(x)\,(-i\partial_x)\, g(x) = \int dx\;\bigl((-i\partial_x) f\bigr)^*(x)\, g(x) = \langle(-i\partial_x) f, g\rangle.$$
Second, we have
$$(-i\partial_x\, e_k)(x) = -i\,\partial_x \exp(ikx) = k\,\exp(ikx) = k\, e_k(x),$$
or $(-i\partial_x)\, e_k = k\, e_k$ for short. This demonstrates that

the exponential functions exp(ikx) are eigenfunctions of the derivative operator, and k is the corresponding eigenvalue.

This observation implies the conclusion:

Fourier transformation is an expansion of functions in the "eigenbasis" of the derivative operator, i.e. the basis of exponential functions $\{e_k\}$.

This expansion is appropriate whenever we need a "simple" representation of the derivative operator. In the sections to follow, we will explore how Fourier transformation works in practice.
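The eigenbasis statement can be made tangible in the discretized N-dimensional picture used above (an added sketch; the symmetric difference quotient on a periodic grid plays the role of $\partial_x$):

```python
import numpy as np

N, L = 64, 2 * np.pi
x = np.arange(N) * L / N
delta = L / N

def minus_i_dx(f):
    """-i times the symmetric difference quotient, periodic boundary conditions."""
    return -1j * (np.roll(f, -1) - np.roll(f, 1)) / (2 * delta)

for n in (1, 2, 5):
    k = 2 * np.pi * n / L
    e_k = np.exp(1j * k * x)
    ratio = minus_i_dx(e_k) / e_k        # constant vector: e_k is an eigenvector
    # the discrete eigenvalue sin(k*delta)/delta approaches k as delta -> 0
    print(n, np.allclose(ratio, ratio[0]), round(ratio[0].real, 4), "vs k =", k)
```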

1.4 Electromagnetic waves in vacuum


As a warmup to our discussion of the full problem posed by the solution of Eqs. (1.13), we consider the vacuum problem, i.e. a situation where no sources are present, $j^\mu = 0$. Taking the curl of the second of Eqs. (1.13) and using (1.8), we then find
$$\left(\Delta - \frac{1}{c^2}\,\partial_t^2\right) B = 0.$$
Similarly, adding to the gradient of the first equation $c^{-1}$ times the time derivative of the second, and using Eq. (1.9), we obtain
$$\left(\Delta - \frac{1}{c^2}\,\partial_t^2\right) E = 0,$$
i.e. in vacuum both the electric field and the magnetic field obey homogeneous wave equations.

1.4.1 Solution of the homogeneous wave equations


The homogeneous wave equations are conveniently solved by Fourier transformation. To this end, we define a four-dimensional variant of Eq. (1.27),
$$f(t,x) = \int\frac{d^3k\, d\omega}{(2\pi)^4}\; f(\omega, k)\, e^{ik\cdot x - i\omega t},\qquad f(\omega, k) = \int d^3x\, dt\; f(t,x)\, e^{-ik\cdot x + i\omega t}. \tag{1.41}$$
The one difference to Eq. (1.27) is that the sign convention in the exponent of the (ω/t)-sector of the transform differs from that in the (k, x)-sector. We next subject the homogeneous wave equation
$$\left(\Delta - \frac{1}{c^2}\,\partial_t^2\right)\psi(t,x) = 0$$
(where ψ is meant to represent any of the components of the E- or B-field) to this transformation and obtain
$$\left(k^2 - \frac{\omega^2}{c^2}\right)\psi(\omega, k) = 0,$$
where $k \equiv |k|$. Evidently, the solution ψ must vanish for all values (ω, k), except for those for which $k^2 - (\omega/c)^2 = 0$. We may thus write
$$\psi(\omega, k) = 2\pi\, c_+(k)\,\delta(\omega - kc) + 2\pi\, c_-(k)\,\delta(\omega + kc),$$
where $c_\pm \in \mathbb{C}$ are arbitrary complex functions of the wave vector k. Substituting this representation into the inverse transform, we obtain the general solution of the scalar homogeneous wave equation
$$\psi(t,x) = \int\frac{d^3k}{(2\pi)^3}\left( c_+(k)\, e^{i(k\cdot x - ckt)} + c_-(k)\, e^{i(k\cdot x + ckt)} \right). \tag{1.42}$$
In words:

The general solution of the homogeneous wave equation is obtained by linear superposition of elementary plane waves, $e^{i(k\cdot x \mp ckt)}$, where each wave is weighted with an arbitrary coefficient $c_\pm(k)$.

The elementary constituents are called waves because for any fixed instance of space, x / time, t, they harmonically depend on the complementary argument time, t / position vector, x. The waves are planar in the sense that for all points in the plane fixed by the condition x · k = const. the phase of the wave is identical, i.e. the set of points k · x = const. defines a 'wave front' perpendicular to the wave vector k. The spacing between consecutive wave fronts with the same phase $\arg(\exp(i(k\cdot x - ckt)))$ is given by $\Delta x = \frac{2\pi}{k} \equiv \lambda$, where λ is the wave length of the wave and $\lambda^{-1} = k/2\pi$ its wave number. The temporal oscillation period of the wave fronts is set by $2\pi/ck$.
Focusing on a fixed wave vector k, we next generalize our results to the vectorial problem posed by the homogeneous wave equations. Since every component of the fields E and B is subject to its own independent wave equation, we may write down the prototypical solutions
$$E(x,t) = E_0\, e^{i(k\cdot x - \omega t)},\qquad B(x,t) = B_0\, e^{i(k\cdot x - \omega t)}, \tag{1.43}$$
where we introduced the abbreviation ω = ck and $E_0, B_0 \in \mathbb{C}^3$ are constant coefficient vectors. The Maxwell equations ∇ · B = 0 and (vacuum) ∇ · E = 0 imply the condition $k\cdot E_0 = k\cdot B_0 = 0$, i.e. the coefficient vectors are orthogonal to the wave vector. Waves of this type, oscillating in a direction perpendicular to the wave vector, are called transverse waves. Finally, evaluating Eqs. (1.43) on the law of induction $\nabla\times E + c^{-1}\partial_t B = 0$, we obtain the additional equation $B_0 = n_k\times E_0$, where $n_k \equiv k/k$, i.e. the vector $B_0$ is perpendicular to both k and $E_0$, and of equal magnitude as $E_0$. Summarizing, the vectors
$$k \perp E \perp B \perp k,\qquad |B| = |E| \tag{1.44}$$
form an orthogonal system and B is uniquely determined by E (and vice versa).

At first sight, it may seem that we have been too liberal in formulating the solution (1.43): while the physical electromagnetic field is a real vector field, the solutions (1.43) are manifestly complex. The simple solution to this conflict is to identify Re E and Re B with the physical fields.¹⁰

¹⁰ One may ask why, then, we introduced complex notation at all. The reason is that working with exponents of phases is way more convenient than explicitly having to distinguish between the sin and cos functions that arise after real parts have been taken.
1.4.2 Polarization

In the following we will discuss a number of physically different realizations of plane electromagnetic waves. Since B is uniquely determined by E, we will focus attention on the latter.

Let us choose a coordinate system such that $e_3 \parallel k$. We may then write
$$E(x,t) = (E_1 e_1 + E_2 e_2)\, e^{ik\cdot x - i\omega t},$$
where $E_i = |E_i|\, e^{i\phi_i}$. Depending on the choice of the complex coefficients $E_i$, a number of physically different wave types can be distinguished.

Linearly polarized waves

For identical phases $\phi_1 = \phi_2 = \phi$, we obtain
$$\mathrm{Re}\, E = (|E_1|\, e_1 + |E_2|\, e_2)\cos(k\cdot x - \omega t + \phi),$$
i.e. a vector field linearly polarized in the direction of the vector $|E_1| e_1 + |E_2| e_2$. The corresponding magnetic field is given by
$$\mathrm{Re}\, B = (|E_1|\, e_2 - |E_2|\, e_1)\cos(k\cdot x - \omega t + \phi),$$
i.e. a vector perpendicular to both E and k, and phase synchronous with E.


INFO Linear polarization is a hallmark of many artificial light sources, e.g. laser light is usually linearly polarized. Likewise, the radiation emitted by many antennae shows approximately linear polarization. In nature, the light reflected from optically dense media – water surfaces, window panes, etc. – tends to exhibit a partial linear polarization.¹¹

Circularly polarized waves

Next consider the case of a phase difference, $\phi_1 = \phi_2 \mp \pi/2 \equiv \phi$, and equal amplitudes $|E_1| = |E_2| = E$:
$$\mathrm{Re}\, E(x,t) = E\,\bigl(e_1\cos(k\cdot x - \omega t + \phi) \mp e_2\sin(k\cdot x - \omega t + \phi)\bigr).$$
Evidently, the tip of the vector E moves on a circle — E is circularly polarized. Regarded as a function of x (for any fixed instance of time), E traces out a spiral whose characteristic period is set by the wave length λ = 2π/k (see Fig. ??). The sense of orientation of this spiral is determined by the sign of the phase mismatch. For $\phi_1 = \phi_2 + \pi/2$ ($\phi_1 = \phi_2 - \pi/2$) we speak of a right (left) circularly polarized wave or a wave of positive (negative) helicity.


INFO As with classical point particles, electromagnetic fields can be subjected to a quantization procedure. In quantum electrodynamics it is shown that the quanta of the electromagnetic field, the photons, carry definite helicity, i.e. they represent the minimal quantum of circularly polarized waves.
¹¹ This phenomenon motivates the application of polarizing filters in photography. Such a filter blocks light of a pre-defined linear polarization. This direction can be adjusted so as to suppress light reflections. As a result one obtains images exhibiting less glare, more intense colouring, etc.

Elliptically polarized waves

Circular and linear polarization represent limiting cases of a more general form of polarization. Indeed, the minimal geometric structure capable of continuously interpolating between a line segment and a circle is the ellipse. To conveniently see the appearance of ellipses, consider the basis change $e_\pm = \frac{1}{\sqrt{2}}(e_1 \pm i e_2)$, and represent the electric field as
$$E(x,t) = (E_+ e_+ + E_- e_-)\, e^{ik\cdot x - i\omega t}.$$
In this representation, a circularly polarized wave corresponds to the limit $E_+ = 0$ (positive helicity) or $E_- = 0$ (negative helicity). Linear polarization is obtained by setting $E_+ = \pm E_-$. It is straightforward to verify that for generic values of the ratio $|E_-/E_+| \equiv r$ one obtains an elliptically polarized wave where the ratio between the major and the minor axis is set by $|(1+r)/(1-r)|$. The tilting angle α of the ellipse w.r.t. the 1-axis is given by $\alpha = \frac{1}{2}\arg(E_-/E_+)$.
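The limiting cases are easily visualized numerically (an added sketch; we sample the physical field Re E at the fixed point x = 0 over one period — for a circle, min and max of |Re E| coincide; for a line, the minimum nearly vanishes; an ellipse lies in between):

```python
import numpy as np

omega = 1.0
t = np.linspace(0.0, 2 * np.pi / omega, 400)
e_p = np.array([1.0, 1j, 0.0]) / np.sqrt(2)   # e_+ = (e_1 + i e_2)/sqrt(2)
e_m = np.array([1.0, -1j, 0.0]) / np.sqrt(2)  # e_- = (e_1 - i e_2)/sqrt(2)

def field(E_p, E_m):
    """Physical field Re[(E_+ e_+ + E_- e_-) e^{-i omega t}] at x = 0."""
    amp = E_p * e_p + E_m * e_m
    return np.outer(np.exp(-1j * omega * t), amp).real

for E_p, E_m, label in [(1.0, 0.0, "circular"), (1.0, 1.0, "linear"),
                        (1.0, 0.5, "elliptic")]:
    r = np.linalg.norm(field(E_p, E_m), axis=1)
    print(f"{label:9s} min |E| = {r.min():.2f}, max |E| = {r.max():.2f}")
```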

This concludes our discussion of the polarization of electromagnetic radiation. It is important to keep in mind that the different types of polarization discussed above represent limiting cases of what in general is only partially or completely un-polarized radiation. Radiation of a given value of the frequency, ω, usually involves the superposition of waves of different wave vector k (at fixed wave number $k = |k| = \omega c^{-1}$). Only if the amplitudes of all partial waves share a definite phase/amplitude relation do we obtain a polarized signal. The degree of polarization of a wave can be determined by computing the so-called Stokes parameters. However, we will not discuss this concept in more detail in this text.

1.5 Green function of electrodynamics


To prepare our discussion of the full problem (1.19), let us consider the inhomogeneous scalar wave equation
$$\left(\Delta - \frac{1}{c^2}\,\partial_t^2\right)\psi = -4\pi g, \tag{1.45}$$
where the inhomogeneity g and the solution ψ are scalar functions. As with the Poisson equation (??), the weak spot of the wave equation is its linearity. We may, therefore, again employ the concept of Green functions to simplify the solution of the problem.

INFO It may be worthwhile to rephrase the concept of the solution of linear differential equations by Green function methods in the present dynamical context: think of a generic inhomogeneity 4πg(x,t) as an additive superposition of δ-inhomogeneities, $\delta_{(y,s)}(x,t) \equiv \delta(x-y)\,\delta(t-s)$, concentrated in space and time around y and s, resp. (see the figure, where the vertical bars represent individual contributions $\delta_{(y,s)}$ weighted by g(y,s) and a finite discretization in space-time is understood):
$$4\pi g(x,t) = \int d^3y\, ds\; \bigl(4\pi g(y,s)\bigr)\,\delta_{(y,s)}(x,t).$$
We may suppress the arguments (x,t) to write this identity as
$$4\pi g = \int d^3y\, ds\; \bigl(4\pi g(y,s)\bigr)\,\delta_{(y,s)}.$$
This formula emphasizes the interpretation of g as a weighted sum of particular functions $\delta_{(y,s)}$, with constant "coefficients" (i.e. numbers independent of (x,t)) $4\pi g(y,s)$. Denote the solutions of the wave equation for an individual point source $\delta_{(y,s)}$ by $G_{(y,s)}$. The linearity of the wave equation means that the solution ψ of the equation for the function g is given by the sum $\psi = \int d^3y\, ds\; (4\pi g(y,s))\, G_{(y,s)}$ of point source solutions, weighted by the coefficients $4\pi g(y,s)$.¹² Or, reinstating the arguments (x,t),
$$\psi(x,t) = 4\pi\int d^3y\, ds\; G(x,t;y,s)\, g(y,s). \tag{1.46}$$
The functions $G_{(y,s)}(x,t) \equiv G(x,t;y,s)$ are the Green functions of the wave equation. Knowledge of the Green function is equivalent to a full solution of the general equation, up to the final integration (1.46).

The Green function of the scalar wave equation (1.45), G(x, t; x′, t′), is defined by the
differential equation

(∆ − c⁻² ∂_t²) G(x, t; x′, t′) = −δ(x − x′)δ(t − t′).   (1.47)

Once the solution of this equation has been obtained (which requires specification of a set of
boundary conditions), the solution of (1.45) becomes a matter of a straightforward integration:

f(x, t) = 4π ∫ d³x′ dt′ G(x, t; x′, t′) g(x′, t′).   (1.48)

Assuming vanishing boundary conditions at infinity, G(x, t; x′, t′) → 0 for |x − x′|, |t − t′| → ∞,
we next turn to the solution of Eq. (1.47).
¹² To check that this is a solution, act with the wave operator ∆ − c⁻²∂_t² on both sides of the
equation and use that (∆ − c⁻²∂_t²) G(x, t; y, s) = −δ_(y,s)(x, t) ≡ −δ(x − y)δ(t − s).

INFO In fact, the most elegant and efficient solution strategy utilizes methods of the theory
of complex functions. However, we do not assume familiarity of the reader with this piece of
mathematics and will use a more elementary technique.

We first note that for the chosen set of boundary conditions, the Green function G(x −
x′, t − t′) will depend on the difference of its arguments only. We next Fourier transform
Eq. (1.47) in the temporal variable, i.e. we act on the equation with the integral transform
f(ω) = ∫ (dt/2π) exp(iωt) f(t) (whose inverse is f(t) = ∫ dω exp(−iωt) f(ω)). The temporally
transformed equation is given by

(∆ + k²) G(x, ω) = −(1/2π) δ(x),   (1.49)

where we defined k ≡ ω/c. If it were not for the constant k (and a trivial scaling factor (2π)⁻¹
on the r.h.s.) this equation, known in the literature as the Helmholtz equation, would be
equivalent to the Poisson equation of electrostatics. Indeed, it is straightforward to verify that
the solution of (1.49) is given by

G^±(x, ω) = (1/8π²) e^{±ik|x|}/|x|,   (1.50)

where the sign ambiguity needs to be fixed on physical grounds.
where the sign ambiguity needs to be fixed on physical grounds.
INFO To prove Eq. (1.50), we introduce polar coordinates centered around x′ and act with the
spherical representation of the Laplace operator (cf. section ??) on G^±(r) = (1/8π²) e^{±ikr}/r.
Noting that the radial part of the Laplace operator, ∆_r, is given by r⁻²∂_r r² ∂_r = ∂_r² + (2/r)∂_r,
and that ∆_r (4πr)⁻¹ = −δ(x) (the equation of the Green function of electrostatics), we obtain

(∆ + k²) G^±(r, ω) = (1/8π²) (∂_r² + 2r⁻¹∂_r + k²) e^{±ikr}/r =
= (e^{±ikr}/8π²) [(∂_r² + 2r⁻¹∂_r) r⁻¹ + 2 e^{∓ikr} (∂_r r⁻¹)(∂_r e^{±ikr}) + e^{∓ikr} r⁻¹ (∂_r² + k² + 2r⁻¹∂_r) e^{±ikr}] =
= −(1/2π) δ(x) + (e^{±ikr}/8π²) (∓2ik r⁻² ± 2ik r⁻²) = −(1/2π) δ(x),

as required.
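
The radial computation away from the origin is also easily delegated to a computer algebra system (a sketch using sympy; for r > 0 the distributional δ–contribution is absent and the result must vanish identically):

    import sympy as sp

    r, k = sp.symbols('r k', positive=True)
    G = sp.exp(sp.I * k * r) / (8 * sp.pi**2 * r)     # G^+(r, omega)

    # radial Laplacian plus k^2: (d_r^2 + (2/r) d_r + k^2) G
    helmholtz = sp.diff(G, r, 2) + (2 / r) * sp.diff(G, r) + k**2 * G
    print(sp.simplify(helmholtz))                     # -> 0 away from the origin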

Doing the inverse temporal Fourier transform, we obtain

G^±(x, t) = ∫ dω e^{−iωt} G^±(x, ω) = (1/8π²) ∫ dω e^{−iωt} e^{±iω|x|/c}/|x| = (1/4π) δ(t ∓ |x|/c)/|x|,

or

G^±(x − x′, t − t′) = (1/4π) δ(t − t′ ∓ |x − x′|/c)/|x − x′|.   (1.51)

For reasons to become clear momentarily, we call G+ (G− ) the retarded (advanced) Green
function of the wave equation.


Figure 1.2: Wave front propagation of the pulse created by a point source at the origin
schematically plotted as a function of two–dimensional space and time. The width of the
front is set by δx = cδt, where δt is the duration of the time pulse. Its intensity decays as
∼ x−1 .

1.5.1 Physical meaning of the Green function


Retarded Green function
To understand the physical significance of the retarded Green function G⁺, we substitute the
r.h.s. of Eq. (1.51) into (1.48) and obtain

f(x, t) = ∫ d³x′ g(x′, t − |x − x′|/c)/|x − x′|.   (1.52)

For any fixed instance of space and time, (x, t), the solution f(x, t) is affected by the sources
g(x′, t′) at all points in space and fixed earlier times t′ = t − |x − x′|/c. Put differently, a time
t − t′ = |x − x′|/c has to pass before the amplitude of the source g(x′, t′) may cause an effect
at the observation point x at time t — the signal received at (x, t) is subject to a retardation
mechanism. When the signal is received, it is so at a strength g(x′, t′)/|x − x′|, similar to the
Coulomb potential in electrostatics. (Indeed, for a time independent source g(x′, t′) = g(x′),
we may enter the wave equation with a time–independent ansatz f(x), whereupon it reduces
to the Poisson equation.) Summarizing, the sources act as 'instantaneous Coulomb charges'
of strength g(x′, t′) which are felt at times t = t′ + |x − x′|/c.

EXAMPLE By way of example, consider a point source at the origin, which 'broadcasts' a signal
for a short instance of time at t′ ≈ 0, g(x′, t′) = δ(x′)F(t′), where the function F is sharply
peaked around t′ = 0 and describes the temporal profile of the source. The signal is then given
by f(x, t) = F(t − |x|/c)/|x|, i.e. we obtain a pattern of out–moving spherical waves, whose
amplitude diminishes as |x|⁻¹ or, equivalently, as ∼ (ct)⁻¹ (see Fig. 1.2.)
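
The outward motion of the front is easily made quantitative (a minimal numerical sketch; the Gaussian profile F, its width, and the sample times are illustrative assumptions):

    import numpy as np

    c = 1.0
    F = lambda t: np.exp(-(t / 0.05)**2)     # sharply peaked temporal profile

    r = np.linspace(0.01, 3.0, 3000)         # radial distances from the source
    for t in (0.5, 1.0, 2.0):
        f = F(t - r / c) / r                 # retarded signal f = F(t - r/c)/r
        front = r[np.argmax(r * f)]          # position of the pulse maximum
        print(t, front, (r * f).max())       # front at r = c*t; r*f stays at 1,
                                             # i.e. the amplitude falls off as 1/r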

Advanced Green function


Consider now the solution we would have obtained from the advanced Green function,
f(x, t) = ∫ d³x′ g(x′, t + |x − x′|/c)/|x − x′|. Here, the signal responds to the behaviour of
the source in the future. The principle of cause–and–effect, or causality, is violated, implying
that the advanced Green function, though mathematically a legitimate solution of the wave
equation, does not carry physical meaning. Two more remarks are in order: (a) when solving
the wave equation by techniques borrowed from the theory of complex functions, the causality
principle is built in from the outset, and the retarded Green function is automatically selected.
(b) Although the advanced Green function does not carry immanent physical meaning, it is
not a senseless object altogether. However, the utility of this object discloses itself only in
quantum theories.

1.5.2 Electromagnetic gauge field and Green functions


So far we have solved the wave equation for an arbitrary scalar source. Let us now specialize
to the wave equations (1.19) for the components of the electromagnetic potential, A^µ. To a
first approximation, these are four independent scalar wave equations for the four sources j^µ.
We may, therefore, just copy the prototypical solution to obtain the retarded potentials of
electrodynamics,

φ(x, t) = ∫ d³x′ ρ(x′, t − |x − x′|/c)/|x − x′|,
A(x, t) = (1/c) ∫ d³x′ j(x′, t − |x − x′|/c)/|x − x′|.   (1.53)
INFO There is, however, one important consistency check that needs to be performed: recalling
that the wave equations (1.19) hold only in the Lorentz gauge ∂_µA^µ = 0 ⇔ c⁻¹∂_tφ + ∇·A = 0,
we need to check that the solutions (1.53) actually meet the Lorentz gauge condition.
Relatedly, we have to keep in mind that the sources are not quite independent; they obey the
continuity equation ∂_µ j^µ = 0 ⇔ ∂_tρ + ∇·j = 0.
As in section ??, it will be most convenient to probe the gauge behaviour of the vector potential
in a fully developed Fourier language. Also, we will use a 4–vector notation throughout. In this
notation, the Fourier transformation (1.41) assumes the form

f(k) = ∫ (d⁴x/(2π)⁴) f(x) e^{ix^µ k_µ},   (1.54)

where a factor of c⁻¹ has been absorbed in the integration measure and k_µ = (k₀, k) with k₀ =
ω/c.¹³ The Fourier transform of the scalar wave equation (1.45) becomes k_µk^µ ψ(k) = −4πg(k).
Specifically, the Green function obeys the equation k_µk^µ G(k) = −(2π)⁻⁴, which is solved by
G(k) = −((2π)⁴ k_µk^µ)⁻¹. (One may check explicitly that this is the Fourier transform of our
solution (1.51); however, for our present purposes there is no need to do so.) The solution of the
general scalar wave equation (1.48) obtains by convoluting the Green function and the source,

f(x) = 4π (G ∗ g)(x) ≡ 4π ∫ d⁴x′ G(x − x′) g(x′).

Using the convolution theorem, this transforms to f(k) = (2π)⁴ 4π G(k) g(k) = −4πg(k)/(k_µk^µ).
Specifically, for the vector potential we obtain

A^µ(k) = −(4π/c) j^µ(k)/(k_νk^ν).   (1.55)

¹³ Recall that x⁰ = ct, i.e. k_µx^µ = k₀x⁰ − k·x = ωt − k·x.

This is all we need to check that the gauge condition is fulfilled: the Fourier transform of the
Lorentz gauge relation ∂_µA^µ = 0 is given by k_µA^µ = 0. Probing this relation on (1.55),
we obtain k_µA^µ ∝ k_µ j^µ = 0, where we noted that k_µ j^µ = 0 is the continuity relation.
We have thus shown that the solution (1.53) conforms with the gauge constraints. As an important
byproduct, our proof above reveals an intimate connection between the gauge behaviour of the
electromagnetic potential and current conservation.

Eqs. (1.53) generalize Eqs.(??) and (??) to the case of dynamically varying sources. In
the next sections, we will explore various aspects of the physical contents of these equations.

1.6 Field energy and momentum


In sections ?? and 1.1 we have seen that the static electric and the magnetic field, resp.,
carry energy. How does the concept of field energy generalize to the dynamical case where
the electric and the magnetic field form an inseparable entity? Naively, one may speculate
that the energy, E, carried by an electromagnetic field is given by the sum of its electric and
magnetic components, E = (1/8π) ∫ d³x (E·D + B·H). As we will show below, this expectation
actually turns out to be correct. However, there is more to the electromagnetic field energy
than the formula above. For example, we know from daily experience that electromagnetic
energy can flow, and that it can be converted to mechanical energy. (Think of a microwave
oven where radiation flows into whatever fast–food you put into it, to then deposit its energy
into [mechanical] heating.)
In section 1.6.1, we explore the balance between bulk electromagnetic energy, energy flow
(current), and mechanical energy. In section 1.6.2, we will see that the concept of field energy
can be generalized to one of “field momentum”. Much like energy, the momentum carried by
the electromagnetic field can flow, get converted to mechanical momentum, etc.

1.6.1 Field energy


We begin our discussion of electromagnetic field energy with a few formal operations on the
Maxwell equations: multiplying Eq. (1.4) by E· and Eq. (1.5) by −H·, and adding the results
to each other, we obtain

E·(∇ × H) − H·(∇ × E) − (1/c)(E·∂_tD + H·∂_tB) = (4π/c) j·E.

Using that for general vector fields v and w (check!) ∇·(v × w) = w·(∇ × v) − v·(∇ × w),
as well as E·∂_tD = ∂_t(E·D)/2 and H·∂_tB = ∂_t(H·B)/2, this equation can be rewritten as

(1/8π) ∂_t(E·D + B·H) + (c/4π) ∇·(E × H) + j·E = 0.   (1.56)

To understand the significance of Eq. (1.56), let us integrate it over a test volume V:

d_t ∫_V d³x (1/8π)(E·D + B·H) = −∮_{S(V)} dσ n·(c/4π)(E × H) − ∫_V d³x j·E,   (1.57)

where use of Gauß's theorem has been made. The first term is the sum over the electric and
the magnetic field energy density, as derived earlier for static field configurations. We now
interpret this term as the energy stored in general electromagnetic field configurations and

w ≡ (1/8π)(E·D + B·H)   (1.58)

as the electromagnetic energy density. What is new in electrodynamics is that the field
energy may change in time. The r.h.s. of the equation tells us that there are two mechanisms
whereby the electromagnetic field energy may be altered. First, there is a surface integral over
the so–called Poynting vector field

S ≡ (c/4π) E × H.   (1.59)

We interpret the integral over this vector as the energy current passing through the surface S
and the Poynting vector field as the energy current density. Thus, the pair (energy density,
w)/(Poynting vector, S) plays a role similar to the pair (charge density, ρ)/(current density,
j) for the matter field. However, unlike with matter, the energy of the electromagnetic field
is not conserved. Rather, the balance equation of electromagnetic field energy above
states that

∂_t w + ∇·S = −j·E.   (1.60)

The r.h.s. of this equation contains the matter field j, which suggests that it describes the
conversion of electromagnetic into mechanical energy. Indeed, the temporal change of the
(potential) energy of a charged point particle in an electromagnetic field (cf. section 1.1 for a
similar line of reasoning) is given by d_t U(x(t)) = −F·ẋ = −q(E + c⁻¹v × B)·v = −qE·v,
where we observed that the magnetic component of the Lorentz force is perpendicular to the
velocity and, therefore, does not do work on the particle. Recalling that the current density of
a point particle is given by j = qδ(x − x(t))v, this expression may be rewritten as
d_t U = −∫ d³x j·E. The r.h.s. of Eq. (1.60) is the generalization of this expression to arbitrary
current densities. Energy conservation implies that the work done on a current of charged
particles has to be taken from the electromagnetic field. This explains the appearance of the
mechanical energy on the r.h.s. of the balance equation (1.60).

INFO Equations of the structure

∂_tρ + ∇·j = σ,   (1.61)

generally represent continuity equations: suppose we have some quantity – energy, oxygen
molecules, voters of some political party – that may be described by a density distribution ρ.¹⁴

¹⁴ In the case of discrete quantities (the voters, say) ρ(x, t)d³x is the average number of "particles" present
in a volume d³x, where the reference volume d³x must be chosen large enough to make this number bigger
than unity (on average.)

The contents of our quantity in a small reference volume d³x may change due to (i) flow
of the quantity to somewhere else. Mathematically, this is described by a current density j,
where j may be a function of ρ. However, (ii) changes may also occur as a consequence of a
conversion of ρ into something else (the chemical reaction of oxygen to carbon dioxide, voters
changing their party, etc.). In (1.61), the rate at which this happens is described by σ. If σ
equals zero, we speak of a conserved quantity. Any conservation law can be represented in
differential form, Eq. (1.61), or in the equivalent integral form

d_t ∫_V d³x ρ(x, t) + ∮_S dS·j = ∫_V d³x σ(x, t).   (1.62)

It is useful to memorize the form of the continuity equation. For if a mathematical equation of the
form ∂t (function no. 1) + ∇ · (vector field) = function no. 2 crosses our way, we know that we
have encountered a conservation law, and that the vector field must be “the current” associated
to function no. 1. This can be non–trivial information in contexts where the identification of
densities/currents from first principles is difficult. For example, the structure of (1.60) identifies
w as a density and S as its current.

1.6.2 Field momentum ∗


Unlike in previous sections on conservation laws, we here identify B = H, D = E, i.e. we
consider the vacuum case.¹⁵

As a second example of a physical quantity that can be exchanged between matter and
electromagnetic fields, we consider momentum. According to Newton's equations, the change
of the total mechanical momentum P_mech carried by the particles inside a volume B is given
by the integrated force density, i.e.

d_t P_mech = ∫_B d³x f = ∫_B d³x (ρE + (1/c) j × B),

where in the second equality we inserted the Lorentz force density. Using Eqs. (??) and (??)
to eliminate the sources, we obtain

d_t P_mech = (1/4π) ∫_B d³x ((∇·E)E − B × (∇ × B) + c⁻¹ B × Ė).

Now, using that B × Ė = d_t(B × E) − Ḃ × E = d_t(B × E) − cE × (∇ × E) and adding
0 = B(∇·B) to the r.h.s., we obtain the symmetric expression

d_t P_mech = (1/4π) ∫_B d³x ((∇·E)E − E × (∇ × E) + B(∇·B) − B × (∇ × B) + c⁻¹ d_t(B × E)),

¹⁵ The discussion of the field momentum in matter (cf. ??) turns out to be a delicate matter, wherefore we
prefer to stay on the safe ground of the vacuum theory.

which may be reorganized as

d_t (P_mech − (1/4πc) ∫_B d³x B × E) =
= (1/4π) ∫_B d³x ((∇·E)E − E × (∇ × E) + B(∇·B) − B × (∇ × B)).

This equation is of the form d_t(something) = (something else). Comparing to our earlier
discussion of conservation laws, we are led to interpret the 'something' on the l.h.s. as a
conserved quantity. Presently, this quantity is the sum of the total mechanical momentum
P_mech and the integral

P_field = ∫ d³x g,   g ≡ (1/4πc) E × B = S/c².   (1.63)

The structure of this expression suggests to interpret P_field as the momentum carried by
the electromagnetic field and g as the momentum density (which happens to be given
by c⁻² × the Poynting vector.) If our tentative interpretation of the equation above as a
conservation law is to make sense, we must be able to identify its r.h.s. as a surface integral.
This in turn requires that the components of the (vector valued) integrand be representable
as X_j ≡ ∂_i T_ij, i.e. as the divergence of a vector field T_j with components T_ij. (Here,
j = 1, 2, 3 plays the role of a spectator index.) If this is the case, we may, indeed, transform
the integral to a surface integral, ∫_B d³x X_j = ∫_B d³x ∂_i T_ij = ∮_{∂B} dσ n·T_j. Indeed,
it is not difficult to verify that

((∇·X)X − X × (∇ × X))_j = ∂_i (X_i X_j − (δ_ij/2) X·X).

Identifying X = E, B and introducing the components of the Maxwell stress tensor as

T_ij = (1/4π) (E_iE_j + B_iB_j − (δ_ij/2)(E·E + B·B)),   (1.64)

the r.h.s. of the conservation law assumes the form ∫_B d³x ∂_i T_ij = ∮_{∂B} dσ n_i T_ij, where
n_i are the components of the normal vector field of the system boundary. The law of the
conservation of momentum thus assumes the final form

d_t (P_mech + P_field)_j = ∮_{∂B} dσ n_i T_ij.   (1.65)

Physically, dt dσ n_i T_ij is the (jth component of the) momentum that gets pushed through dσ
in time dt. Thus, dσ n_i T_ij is the momentum per time, or force, exerted on dσ, and n_i T_ij the
force per area or radiation pressure due to the change of linear momentum in the system.
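
The divergence identity quoted above is easily verified symbolically for a generic vector field X (a sketch; the placeholder functions X1, X2, X3 stand for arbitrary field components):

    import sympy as sp

    x = sp.symbols('x1:4', real=True)
    X = [sp.Function(f'X{i + 1}')(*x) for i in range(3)]

    div = sum(sp.diff(X[i], x[i]) for i in range(3))
    curl = [sp.diff(X[(i + 2) % 3], x[(i + 1) % 3])
            - sp.diff(X[(i + 1) % 3], x[(i + 2) % 3]) for i in range(3)]
    X2 = sum(Xi**2 for Xi in X)

    for j in range(3):
        lhs = div * X[j] - (X[(j + 1) % 3] * curl[(j + 2) % 3]
                            - X[(j + 2) % 3] * curl[(j + 1) % 3])
        rhs = sum(sp.diff(X[i] * X[j] - sp.Rational(1, 2) * int(i == j) * X2, x[i])
                  for i in range(3))
        print(sp.simplify(lhs - rhs))      # prints 0 for each component j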

It is straightforward to generalize the discussion above to the conservation of angular
momentum: the angular momentum carried by a mechanical system of charged particles
may be converted into angular momentum of the electromagnetic field. It is evident from
Eq. (1.63) that the angular momentum density of the field is given by

l = x × g = (1/4πc) x × (E × B).

However, in this course we will not discuss angular momentum conservation any further.

1.6.3 Energy and momentum of plane electromagnetic waves


Consider a plane wave in vacuum. Assuming that the wave propagates in 3–direction, the
physical electric field, E_phys., is given by

E_phys.(x, t) = Re E(x, t) = Re (E_1 e_1 + E_2 e_2) e^{ikx_3 − iωt} = r_1 u_1(x_3, t) e_1 + r_2 u_2(x_3, t) e_2,

where u_i(x_3, t) = cos(φ_i + kx_3 − ωt) and we defined E_i = r_i exp(iφ_i) with real r_i. Similarly,
the magnetic field B_phys. is given by

B_phys.(x, t) = Re (e_3 × E(x, t)) = r_1 u_1(x_3, t) e_2 − r_2 u_2(x_3, t) e_1.

From these relations we obtain the energy density and Poynting vector as

w(x, t) = (1/8π)(E² + B²) = (1/4π)((r_1 u_1)² + (r_2 u_2)²)(x_3, t),
S(x, t) = c w(x, t) e_3,

where we omitted the subscript 'phys.' for notational simplicity.

EXERCISE Check that w and S above comply with the conservation law (1.60).
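
A symbolic version of this check (a sketch; in vacuum j·E = 0, so the balance (1.60) reduces to ∂_t w + ∂_{x_3} S_3 = 0, which holds on the dispersion ω = ck):

    import sympy as sp

    x3, t = sp.symbols('x3 t', real=True)
    k, omega, r1, r2, p1, p2 = sp.symbols('k omega r1 r2 phi1 phi2', positive=True)
    c = omega / k                                     # vacuum dispersion omega = c*k

    u1 = sp.cos(p1 + k * x3 - omega * t)
    u2 = sp.cos(p2 + k * x3 - omega * t)
    w = ((r1 * u1)**2 + (r2 * u2)**2) / (4 * sp.pi)   # energy density
    S3 = c * w                                        # Poynting vector along e_3

    print(sp.simplify(sp.diff(w, t) + sp.diff(S3, x3)))   # -> 0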

1.7 Electromagnetic radiation


To better understand the physics of the expressions (1.53), let us consider the case where the
sources are confined to within a region in space of typical extension d. Without loss of generality,
we assume the time dependence of the sources to be harmonic, with a certain characteristic
frequency ω, j^µ(x, t) = j^µ(x) exp(−iωt). (The signal generated by sources of more general
time dependence can always be obtained by superposition of harmonic signals.) As we shall
see, all other quantities of relevance to us (potentials, fields, etc.) will inherit the same
time dependence, i.e. X(x, t) = X(x) exp(−iωt). Specifically, Eq. (1.53) implies that the
electromagnetic potentials, too, oscillate in time, A^µ(x, t) = A^µ(x) exp(−iωt). Substituting
this ansatz into Eq. (1.53), we obtain A^µ(x) = (1/c) ∫ d³x′ j^µ(x′) e^{ik|x−x′|}/|x − x′|.

As in our earlier analyses of multipole fields, we assume that the observation point x is far
away from the source, r = |x| ≫ d. Under these circumstances, we may focus attention on
the spatial components of the vector potential,

A(x) = (1/c) ∫ d³x′ j(x′) e^{ik|x−x′|}/|x − x′|,   (1.66)

where we have substituted the time–oscillatory ansatz for sources and potential into (1.53)
and divided out the time dependent phases. From (1.66) the magnetic and electric fields are
obtained as

B = ∇ × A,   E = i k⁻¹ ∇ × (∇ × A),   (1.67)

where in the second identity we used the source–free variant of the law of magnetic circulation,
c⁻¹∂_tE = −ikE = ∇ × B.
We may now use the smallness of the ratio d/r to expand |x − x′| ≃ r − n·x′ + ..., where
n is the unit vector in x–direction. Substituting this expansion into (1.66) and expanding
the result in powers of r⁻¹, we obtain a series similar in spirit to the electric and magnetic
multipole series discussed above. For simplicity, we here focus attention on the dominant
contribution to the series, obtained by approximating |x − x′| ≃ r. For reasons to become
clear momentarily, this term,

A(x) ≃ (e^{ikr}/cr) ∫ d³x′ j(x′),   (1.68)
generates electric dipole radiation. In section ?? we have seen that for a static current
distribution, the integral ∫ j vanishes. However, for a dynamic distribution, we may engage
the Fourier transform of the continuity relation, iωρ = ∇·j, to obtain

∫ d³x′ j_i = ∫ d³x′ (∇x′_i)·j = −∫ d³x′ x′_i (∇·j) = −iω ∫ d³x′ x′_i ρ.

Substituting this result into (1.68) and recalling the definition of the electric dipole moment
of a charge distribution, d ≡ ∫ d³x′ x′ ρ(x′), we conclude that

A(x) ≃ −ik d e^{ikr}/r   (1.69)

is indeed controlled by the electric dipole moment.


Besides d and r, another characteristic length scale in the problem is the characteristic
wave length λ ≡ 2π/k = 2πc/ω. For simplicity, we assume that the wave length λ ≫ d is
much larger than the extent of the source.

INFO This latter condition is usually met in practice. E.g. for radiation in the MHz range,
ω ∼ 10⁶ s⁻¹, the wave length is of order λ ∼ 3·10⁸ m s⁻¹/10⁶ s⁻¹ ≃ 300 m, much larger than
the extension of typical antennae.

We then need to distinguish between three different regimes (see Fig. ??): the near zone,
d ≪ r ≪ λ, the intermediate zone, d ≪ λ ∼ r, and the far zone, d ≪ λ ≪ r. We next
discuss these regions in turn.

Near zone
For r ≪ λ or kr ≪ 1, the exponential in (1.68) may be approximated by unity and we obtain

A(x, t) ≃ −i (k/r) d e^{−iωt},

where we have re–introduced the time dependence of the sources. Using Eq. (1.67) to compute
the electric and magnetic field, we get

B(x, t) = i (k/r²) n × d e^{−iωt},
E(x, t) = ((3n(n·d) − d)/r³) e^{−iωt}.

The electric field equals the dipole field (??) created by a dipole with time dependent moment
d exp(−iωt) (cf. the figure, where the numbers on the ordinate indicate the separation from
the radiation center in units of the wave length and the color coding is a measure of the field
intensity.) This motivates the designation 'electric dipole radiation'. The magnetic field is by
a factor kr ≪ 1 smaller than the electric field, i.e. in the near zone, the electromagnetic field
is dominantly electric. In the limit k → 0 the magnetic field vanishes and the electric field
reduces to that of a static dipole.
The analysis of the intermediate zone, kr ∼ 1, is complicated inasmuch as all powers in
an expansion of the exponential in (1.68) in kr must be retained. For a discussion of the
resulting field distribution, we refer to [1].

Far zone
For kr ≫ 1, it suffices to let the derivative operations act only on the exponential factor
e^{ikr}. Carrying out the derivatives, we obtain

B = k² n × d e^{i(kr−ωt)}/r,   E = −n × B.   (1.70)

These asymptotic field distributions have much in common with the vacuum electromagnetic
fields above. Indeed, one would expect that far away from the source (i.e. many wavelengths
away from the source), the electromagnetic field resembles a spherical wave (by which we
mean that the surfaces of constant phase form spheres.) For observation points sufficiently
far from the center of the wave, its curvature will be nearly invisible and the wave will look
approximately planar. These expectations are met by the field distributions (1.70): neglecting
all derivatives other than those acting on the combination kr (i.e. neglecting corrections of
O(kr)⁻¹), the components of E and B obey the wave equation (the Maxwell equations in
vacuum).¹⁶ The wave fronts are spheres, outwardly propagating at a speed c. The vectors
n ⊥ E ⊥ B ⊥ n form an orthogonal set, as we saw is characteristic for a vacuum plane wave.

¹⁶ Indeed, one may verify (do it!) that the characteristic factors r⁻¹e^{i(kr−ωt)} are exact solutions of the
wave equations; they describe the space–time profile of a spherical wave.
Finally, it is instructive to compute the energy current carried by the wave. To this end,
we recall that the physically realized values of the electromagnetic field obtain by taking the
real part of (1.70). We may then compute the Poynting vector (1.59) as

S = (ck⁴/4πr²) n (d² − (n·d)²) cos²(kr − ωt)  →  ⟨S⟩_t = (ck⁴/8πr²) n (d² − (n·d)²),

where the last expression is the energy current temporally averaged over several oscillation
periods ω⁻¹. The energy current is maximal in the plane perpendicular to the dipole moment of
the source and decays according to an inverse square law. It is also instructive to compute the
total power radiated by the source, i.e. the energy current integrated over spheres of constant
radius r. (Recall that the integrated energy current accounts for the change of energy inside
the reference volume per time, i.e. the power radiated by the source.) Choosing the z–axis of
the coordinate system to be colinear with the dipole moment, we have d² − (n·d)² = d² sin²θ
and

P ≡ ∮_S dσ n·S = (ck⁴d²/8π) ∫₀^π sin³θ dθ ∫₀^{2π} dφ = ck⁴d²/3.

Notice that the radiated power does not depend on the radius of the reference sphere, i.e.
the work done by the source is entirely converted into radiation and not, say, into a steadily
increasing density of vacuum field energy.
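
The angular integral is elementary but easy to mistype; a one–line symbolic check (sketch):

    import sympy as sp

    theta = sp.symbols('theta')
    solid = 2 * sp.pi * sp.integrate(sp.sin(theta)**3, (theta, 0, sp.pi))
    print(solid)    # 8*pi/3, hence P = (c k^4 d^2 / 8 pi) * (8 pi / 3) = c k^4 d^2 / 3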

INFO As a concrete example of a radiation source, let us consider a center–fed linear
antenna, i.e. a piece of wire of length a carrying an AC current that is maximal at the center
of the wire. We model the current flow by the distribution I(z, t) = I_0(1 − 2|z|a⁻¹) exp(−iωt),
where a ≪ λ = 2πc/ω and alignment with the z–axis has been assumed. Using the continuity
equation, ∂_zI(z, t) + ∂_tρ(z, t) = 0, we obtain the charge density (charge per unit length) in the
wire as ρ(z, t) = 2iI_0(ωa)⁻¹ sgn(z) exp(−iωt). The dipole moment of the source is thus given by
d = e_z ∫_{−a/2}^{a/2} dz z ρ(z, t) = iI_0(a/2ω) e_z exp(−iωt), and the radiated power by

P = ((ka)²/12c) I_0².

The coefficient R_rad ≡ (ka)²/12c of the factor I_0² is called the radiation resistance of the
antenna. To understand the origin of this designation, notice that [R_rad] = [c⁻¹] indeed has
the dimension of resistance (exercise.) Next recall that the power required to drive a current
through a conventional resistor R is given by P = UI = RI². Comparison with the expression
above suggests to interpret R_rad as the 'resistance' of the antenna. However, this resistance
has nothing to do with dissipative energy losses inside the antenna (i.e. those losses that are
responsible for the DC resistivity of metals.) Rather, work has to be done to feed energy into
electromagnetic radiation. This work determines the radiation resistance. Also notice that
R_rad ∼ k² ∼ ω², i.e. the radiation losses increase quadratically in frequency. This latter fact
is of great technological importance.
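
To get a feeling for the numbers, one may convert the Gaussian–unit result to SI (a rough sketch; the conversion factor 1 s cm⁻¹ ≈ 8.99·10¹¹ Ω and the sample values of ka are assumptions made here for illustration):

    c_cgs = 2.998e10             # speed of light in cm/s
    ohm_per_s_per_cm = 8.99e11   # SI value of the Gaussian resistance unit (assumption)

    for ka in (0.01, 0.1, 0.3):
        R_gauss = ka**2 / (12 * c_cgs)         # R_rad = (ka)^2/12c, in s/cm
        print(ka, R_gauss * ohm_per_s_per_cm)  # ~0.25 mOhm up to ~0.2 Ohm

The smallness of these values for ka ≪ 1 illustrates why electrically short antennae are rather inefficient radiators.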

Our results above relied on a first order expansion in the ratio d/r between the extent of
the sources and the distance of the observation point. We saw that at this order of the
expansion, the source couples to the electromagnetic field by its electric dipole moment. A
more sophisticated description of the fields may be obtained by driving the expansion in d/r
to higher order. E.g. at next order, the electric quadrupole moment and the magnetic dipole
moment enter the stage. This leads to the generation of electric quadrupole radiation and
magnetic dipole radiation. For an in–depth discussion of these types of radiation we refer
to [1].
Chapter 2

Macroscopic electrodynamics

Suppose we want to understand the electromagnetic fields in an environment containing ex-
tended pieces of matter. From a purist point of view, we would need to regard the O(10²³)
carriers of electric and magnetic moments — electrons, protons, neutrons — comprising the
medium as sources. Throughout, we will denote the fields e, ..., h created by this system
of sources as microscopic fields. (Small characters are used to distinguish the microscopic
fields from the effective macroscopic fields to be introduced momentarily.) The 'microscopic
Maxwell equations' read as

∇·e = 4πρ_µ,
∇×b − (1/c) ∂_t e = (4π/c) j_µ,
∇×e + (1/c) ∂_t b = 0,
∇·b = 0,

where ρ_µ and j_µ are the microscopic charge and current density, respectively. Now, for several
reasons, it does not make much sense to consider these equations as they stand: First, it is
clear that attempts to get a system of O(10²³) dynamical sources under control are bound
to fail. Second, the dynamics of the microscopic sources is governed by quantum effects and
cannot be described in terms of a classical vector field j. Finally, we aren't even interested in
knowing the microscopic fields. Rather, (in classical electrodynamics) we want to understand
the behaviour of fields on 'classical' length scales which generally exceed the atomic scales by
far.¹
In the following, we develop a more practice–oriented theory of electromagnetism in matter.
We aim to lump the influence of the microscopic degrees of freedom of the matter–environment
into a few effective modifiers of the Maxwell equations.

¹ For example, the wave–length of visible light is about 600 nm, whilst the typical extension of a molecule
is 0.1 nm. Thus, 'classical' length scales of interest are about three to four orders of magnitude larger than
the microscopic scales.


Figure 2.1: Schematic on the spatially averaged system of sources. A zoom into a solid (left)
reveals a pattern of ions or atoms or molecules (center left). Individual ions/atoms/molecules
centered at lattice coordinate x_m may carry a net charge q_m and a dipole moment d_m (center
right). A dipole moment forms if sub-atomic constituents (electrons/nuclei) are shifted asym-
metrically w.r.t. the center coordinate. In an ionic system, mobile electrons (at coordinates
x_j) contribute to the charge balance.

2.1 Macroscopic Maxwell equations


Let us, then, introduce macroscopic fields by averaging the microscopic fields as

E ≡ ⟨e⟩,   B ≡ ⟨b⟩,

where the averaging procedure is defined by ⟨f(x)⟩ ≡ ∫ d³x′ f(x − x′) g(x′), and g is a weight
function that is unit normalized, ∫ d³x′ g(x′) = 1, and decays over sufficiently large regions in
space. Since the averaging procedure commutes with taking derivatives w.r.t. both space and
time, ∂_{t,x}E = ⟨∂_{t,x}e⟩ (and the same for B), the averaged Maxwell equations read as

∇·E = 4π⟨ρ_µ⟩,
∇×B − (1/c) ∂_t E = (4π/c) ⟨j_µ⟩,
∇×E + (1/c) ∂_t B = 0,
∇·B = 0.
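
The averaging procedure, and the fact that it commutes with derivatives, is easily illustrated in one dimension (a sketch; the two–scale test density and the Gaussian weight are arbitrary model choices):

    import numpy as np
    from scipy.ndimage import gaussian_filter1d

    L, N = 100.0, 8192
    x = np.linspace(0.0, L, N, endpoint=False)
    dx = x[1] - x[0]

    # 'microscopic' density: atomic-scale oscillation plus a smooth macroscopic part
    rho = np.cos(2 * np.pi * x / 0.5) + 0.1 * np.sin(2 * np.pi * x / 25.0)

    sigma = 2.0                               # weight width: >> 0.5, << 25
    avg = lambda f: gaussian_filter1d(f, sigma / dx, mode='wrap')

    print(np.abs(avg(rho)).max())             # ~0.1: only the smooth part survives
    # averaging commutes with d/dx:
    print(np.abs(np.gradient(avg(rho), dx) - avg(np.gradient(rho, dx))).max())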

2.1.1 Averaged sources


To better understand the effect of the averaging on the sources, we need to take a (superficial)
look into the atomic structure of a typical solid. Generally, two different types of charges in
solids have to be distinguished: first, there are charges — nuclei and valence electrons —
that are displaced by no more than a vector a_{m,j} from the center coordinate x_m of a molecule
(or atom, for that matter). Second, (in metallic systems) there are free conduction electrons
which may reside at arbitrary coordinates x_i(t) in the system. Denoting the two contributions
by ρ_b and ρ_f, respectively, we have ρ_µ = ρ_b + ρ_f, where ρ_b(x, t) = Σ_{m,j} q_{m,j} δ(x −
x_m(t) − a_{m,j}(t)) and ρ_f = q_e Σ_i δ(x − x_i(t)). Here, q_{m,j} is the charge of the jth
constituent of molecule m and q_e the electron charge.

Averaged charge density

The density of the bound charges, averaged over macroscopic proportions, is given by

⟨ρ_b(x)⟩ = ∫ d³x′ g(x′) Σ_{m,j} q_{m,j} δ(x − x′ − x_m − a_{m,j}) =
= Σ_{m,j} g(x − x_m − a_{m,j}) q_{m,j} ≃
≃ Σ_m q_m g(x − x_m) − Σ_m ∇g(x − x_m) · Σ_j a_{m,j} q_{m,j} =
= Σ_m q_m ⟨δ(x − x_m)⟩ − ∇·P(x).

In the second line, we used the fact that the range over which the weight function g changes
exceeds atomic extensions by far to Taylor expand to first order in the offsets a. The zeroth
order term of the expansion is oblivious to the molecular structure, i.e. it contains only the
total charge q_m = Σ_j q_{m,j} of the molecules and their center positions. The first order term
contains the molecular dipole moments, d_m ≡ Σ_j a_{m,j} q_{m,j}. By the symbol

P(x, t) ≡ ⟨Σ_m δ(x − x_m(t)) d_m(t)⟩,   (2.1)

we denote the average polarization of the medium.² Evidently, P is a measure of the density
of the dipole moments carried by the molecules in the system.
The average density of the mobile carriers in the system is given by ⟨ρ_f(x, t)⟩ = q_e⟨Σ_i δ(x −
x_i(t))⟩, so that we obtain the average charge density as

⟨ρ_µ(x, t)⟩ = ρ_av(x, t) − ∇·P(x, t),   (2.2)

where we defined the effective or macroscopic charge density in the system,

ρ_av(x, t) ≡ ⟨Σ_m q_m δ(x − x_m(t)) + q_e Σ_i δ(x − x_i(t))⟩.   (2.3)

Notice that the molecules/atoms present in the medium enter the quantity ρ_av(x, t) as point–
like entities. Their finite polarizability is accounted for by the second term contributing to ⟨ρ_µ⟩.
Also notice that most solids are electrically neutral on average, i.e. the positive charge of ions
accounted for in the first term in (2.3) cancels against the negative charge of mobile electrons
(the second term), meaning that ρ_av(x, t) = 0. Under these circumstances,

⟨ρ_µ(x, t)⟩ = −∇·P(x, t),   (2.4)

² Here, we allow for dynamical atomic or molecular center coordinates, x_m = x_m(t). In this way, the
definition of P becomes general enough to encompass non-crystalline, or liquid substances in which atoms/
molecules are not tied to a crystalline matrix.

where P is given by the averaged polarization (2.1). The electric field E in the medium
is caused by the sum of the averaged microscopic density, ⟨ρ_µ⟩, and an 'external' charge
density,³

∇·E = 4π(ρ + ⟨ρ_µ⟩),   (2.5)

where ρ is the external charge density.

³ For example, for a macroscopic system placed into a capacitor, the role of the external charges will be
taken by the charged capacitor plates, etc.

Averaged current density


Averaging the current density proceeds in much the same way as the charge density procedure
outlined above. As a result of a somewhat more tedious calculation (see the info block below)
we obtain

⟨j_µ⟩ ≃ j_av + Ṗ + c∇×M,   (2.6)

where

M(x) = ⟨Σ_m δ(x − x_m) (1/2c) Σ_j q_{m,j} a_{m,j} × ȧ_{m,j}⟩   (2.7)

is the average density of magnetic dipole moments in the system. The quantity

j_av ≡ q_e ⟨Σ_i ẋ_i δ(x − x_i)⟩ + ⟨Σ_m q_m ẋ_m δ(x − x_m)⟩

is the current carried by the free charge carriers and the point–like approximation of the
molecules. Much like the average charge density ρ_av, this quantity vanishes in the majority of
cases – solids usually do not support non–vanishing current densities on average, i.e. j_av = 0,
meaning that the average of the microscopic current density is given by

⟨j_µ⟩ = Ṗ + c∇×M.   (2.8)

In analogy to Eq. (2.5), the magnetic induction has the sum of ⟨j_µ⟩ and an 'external' current
density, j, as its source:

∇×B − (1/c) ∂_t E = (4π/c) (⟨j_µ⟩ + j).   (2.9)

INFO To compute the average of the microscopic current density, we decompose the current
density into a bound and a free part, j_µ = j_b + j_f. With j_b(x) = Σ_{m,j} q_{m,j}(ẋ_m +
ȧ_{m,j}) δ(x − x_m − a_{m,j}), the former is averaged as

⟨j_b(x)⟩ = ∫ d³x′ g(x′) Σ_{m,j} q_{m,j}(ẋ_m + ȧ_{m,j}) δ(x − x′ − x_m − a_{m,j}) =
= Σ_{m,j} g(x − x_m − a_{m,j})(ẋ_m + ȧ_{m,j}) q_{m,j} ≃
≃ Σ_{m,j} q_{m,j} (g(x − x_m) − ∇g(x − x_m)·a_{m,j}) (ẋ_m + ȧ_{m,j}).

We next consider the different orders in a contributing to this expansion in turn. At zeroth order,
we obtain

⟨j_b(x)⟩^(0) = Σ_m q_m g(x − x_m) ẋ_m = ⟨Σ_m q_m ẋ_m δ(x − x_m)⟩,

i.e. the current carried by the molecules in a point–like approximation. The first order term is
given by

⟨j_b(x)⟩^(1) = Σ_m (g(x − x_m) ḋ_m − (∇g(x − x_m)·d_m) ẋ_m).

The form of this expression suggests to compare it with the time derivative of the polarization
vector,

Ṗ = d_t Σ_m g(x − x_m) d_m = Σ_m (g(x − x_m) ḋ_m − (∇g(x − x_m)·ẋ_m) d_m).

The difference of these two expressions is given by X ≡ ⟨j_b(x)⟩^(1) − Ṗ = Σ_m ∇g(x − x_m) ×
(d_m × ẋ_m). By a dimensional argument, one may show that this quantity is negligibly small: the
magnitude |ẋ_m| ∼ v is of the order of the typical velocity of the atomic or electronic compounds
inside the solid. We thus obtain the rough estimate |X(q, ω)| ∼ |v(∇P)(q, ω)| ∼ vq|P|, where
in the second step we generously neglected the differences in the vectorial structure of P and X,
resp. However, |Ṗ(q, ω)| = ω|P(q, ω)|. This means that |X|/|Ṗ| ∼ vq/ω ∼ v/c is of the order
of the ratio of typical velocities of non–relativistic matter and the speed of light. This ratio is so
small that it (a) over–compensates the crudeness of our estimates above by far and (b) justifies
to neglect the difference X. We thus conclude that

⟨j_b⟩^(1) ≃ Ṗ.

The second order term contributing to the average current density is given by

⟨j_b⟩^(2) = −Σ_{m,j} q_{m,j} ȧ_{m,j} (a_{m,j}·∇g(x − x_m)).

This expression is related to the density of magnetic dipole moments carried by the molecules.
The latter is given by (cf. Eq. (??) and Eq. (2.7))

M = (1/2c) Σ_{m,j} q_{m,j} a_{m,j} × ȧ_{m,j} g(x − x_m).

As a result of a straightforward calculation, we find that the curl of this expression is given by

∇×M = (1/c) ⟨j_b⟩^(2) + (1/2c) d_t Σ_{m,j} q_{m,j} a_{m,j} (a_{m,j}·∇g(x − x_m)).

The second term on the r.h.s. of this expression engages the time derivative of the electric
quadrupole moments ∼ Σ q a_a a_b, a, b = 1, 2, 3, carried by the molecules. The fields generated
by these terms are arguably very weak, so that

⟨j_b⟩^(2) ≃ c∇×M.

Adding to the results above the contribution of the free carriers, ⟨j_f(x)⟩ = q_e ⟨Σ_i ẋ_i δ(x − x_i)⟩,
we arrive at Eq. (2.6).

Interpretation of the sources I: charge and current densities

Eqs. (2.4) and (2.8) describe the spatial average of the microscopic sources of electromagnetic
fields in an extended medium. The expressions appearing on the r.h.s. of these equations can
be interpreted in two different ways, both useful in their own right. One option is to look at the
r.h.s. of, say, Eq. (2.4) as an effective charge density. This density obtains as the divergence
of a vector field P describing the average polarization in the medium (cf. Eq. (2.1)).
To understand the meaning of this expression, consider the cartoon of a solid shown in
Fig. 2.2. Assume that the microscopic constituents of the medium are 'polarized' in a certain
direction.⁴ Formally, this polarization is described by a vector field P, as indicated by the
horizontal cut through the medium in Fig. 2.2. In the bulk of the medium, the 'positive'
and 'negative ends' of the molecules carrying the polarization mutually neutralize and no net
charge density remains. (Remember that the medium is assumed to be electrically neutral on
average.) However, the boundaries of the system carry layers of uncompensated positive and
negative polarization, and this is where a non-vanishing charge density will form (indicated
by a solid and a dashed line at the system boundaries.) Formally, the vector field P begins
and ends (has its 'sources') at the boundaries, i.e. ∇·P is non-vanishing at the boundaries,
where it represents the effective surface charge density. This explains the appearance of ∇·P
as a source term in the Maxwell equations.
Dynamical changes of the polarization, ∂_tP ≠ 0 – due to, for example, fluctuations in the
orientation of polar molecules – correspond to the movement of charges parallel to P. The
rate ∂_tP of charge movement is a current flow in P direction.⁵ Thus, ∂_tP must appear as a
bulk current source in the Maxwell equations.
Finally, a non-vanishing magnetization M (cf. the vertical cut through the system) corre-
sponds to the flow of circular currents in a plane perpendicular to M. For homogeneous M,
the currents mutually cancel in the bulk of the system, but a non-vanishing surface current
remains. Formally, this surface current density is represented by the curl of M (Exercise:
compute the curl of a vector field that is perpendicular to the cut plane and homogeneous
across the system.), i.e. c∇×M is a current source in the effective Maxwell equations.

⁴ Such a polarization may form spontaneously, or, more frequently, be induced externally, cf. the info block
below.
⁵ Notice that a current flow j = ∂_tP is not in conflict with the condition of electro-neutrality.

Figure 2.2: Cartoon illustrating the origin of (surface) charge and current densities forming in
a solid with non-vanishing polarization and/or magnetization.
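
The surface–charge picture can be made quantitative in a one–dimensional caricature (a sketch; the smoothed slab profile is an arbitrary model choice):

    import numpy as np

    # uniformly polarized slab |x| < 1 with smoothed edges
    x = np.linspace(-3.0, 3.0, 6001)
    dx = x[1] - x[0]
    P = 0.5 * (np.tanh((x + 1) / 0.05) - np.tanh((x - 1) / 0.05))

    rho_pol = -np.gradient(P, dx)       # bound charge density, -dP/dx
    print(x[np.argmax(rho_pol)])        # ~ +1: positive charge where P ends
    print(x[np.argmin(rho_pol)])        # ~ -1: negative charge where P begins
    print(rho_pol.sum() * dx)           # ~ 0: the slab remains neutral overall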

Interpretation of the sources II: fields

Above, we have interpreted ∇·P, ∂_tP and ∇×M as charge and current sources in the Maxwell
equations. But how do we know these sources? It stands to reason that the polarization, P,
and magnetization, M, of a bulk material will sensitively depend on the electric field, E, and the
magnetic field, B, present in the system. For in the absence of such fields, E = 0, B = 0, only
a few materials will 'spontaneously' build up a non–vanishing polarization or magnetization.⁶
Usually, it takes an external electric field to orient the microscopic constituents of a solid so
as to add up to a bulk polarization. Similarly, a magnetic field is usually needed to generate
a bulk magnetization.
This means that the solution of the Maxwell equations poses us with a self consistency
problem: an external charge distribution, ρ_ext, immersed into a solid gives rise to an electric
field. This field in turn creates a bulk polarization, P[E], which in turn alters the charge
distribution, ρ → ρ_eff[E] ≡ ρ_ext − ∇·P[E]. (The notation indicates that the polarization
depends on E.) Our task then is to self consistently solve the equation ∇·E = 4πρ_eff[E], with
a charge distribution that depends on the sought-for electric field E. Similarly, the current
distribution j = j_eff[B, E] in a solid depends on the electric and the magnetic field, which
means that 'Ampère's law', too, poses a self consistency problem.
In this section we re–interpret polarization and magnetization in a manner tailor made to
the self consistent solution of the Maxwell equations. We start by substituting Eq. (2.2) into
(2.5). Rearranging terms we obtain

∇·(E + 4πP) = 4πρ.

⁶ Materials disobeying this rule are called ferroelectric (spontaneous formation of polarization), or fer-
romagnetic (spontaneous magnetization), respectively. In spite of their relative scarcity, ferroelectric and
–magnetic materials are of huge applied importance. Application fields include storage and display technol-
ogy, sensor technology, general electronics, and, of course, magnetism.

This equation suggests to introduce a field

D ≡ E + 4πP.   (2.10)

The source of this displacement field is the external charge density,

∇·D = 4πρ,   (2.11)

(while the electric field has the sum of external and averaged microscopic charge densities as
its source, ∇·E = 4π(ρ + ⟨ρ_µ⟩).) Notice the slightly different perspectives at which Eqs. (2.5)
and (2.10) interpret the phenomenon of matter polarization. Eq. (2.5) places the emphasis on
the intrinsic charge density, ⟨ρ_µ⟩ = −∇·P. The actual electric field, E, is caused by the sum of
external charges and polarization charges. In Eq. (2.10), we consider a fictitious field D, whose
sources are purely external. The field D differs from the actual electric field E by (4π×) the
polarization, P, building up in the system. Of course the two alternative interpretations of
polarization are physically equivalent. (Think about this point.)
The magnetization, M, can be treated in similar terms: substituting Eq. (2.6) into (2.9),
we obtain

∇×(B − 4πM) − (1/c) ∂_t(E + 4πP) = (4π/c) j.

The form of this equation motivates the definition of the magnetic field

H = B − 4πM.   (2.12)

Expressed in terms of this quantity, the Maxwell equation assumes the form

∇×H − (1/c) ∂_t D = (4π/c) j,   (2.13)

where j is the external current density. According to this equation, H plays a role similar to D:
the sources of the magnetic field H are purely external. The field H differs from the induction
B (i.e. the field a magnetometer would actually measure in the system) by (−4π×) the
magnetization M. In conceptual analogy to the electric field, E, the induction B has the sum
of external and intrinsic charge and current densities as its source (cf. Eq. (2.9).)

Electric and magnetic susceptibility

To make progress with Eqs. (2.10) and (2.11), we need to say more about the polarization.
Above, we have anticipated that a finite polarization is usually induced by an electric field, i.e.
P = P[E]. Now, external electric fields triggering a non–vanishing polarization are usually
much weaker than the intrinsic microscopic fields keeping the constituents of a solid together.
In practice, this means that the polarization may be approximated by a linear functional of the
electric field.⁷ The most general form of a linear functional reads as

P(x, t) = (1/(2π)⁴) ∫ d³x′ ∫ dt′ χ(x, t; x′, t′) E(x′, t′),   (2.14)

where the integral kernel χ is called the electric susceptibility of the medium and the factor
(2π)⁻⁴ has been introduced for convenience. The susceptibility χ(x, t; x′, t′) describes how
an electric field amplitude at x′ and t′ < t affects the polarization at (x, t).⁸ Notice that the
polarization obtains by convolution of χ and E. In Fourier space, the equation assumes the
simpler form P(q) = χ(q)E(q), where q = (ω/c, q) as usual. Accordingly, the electric field
and the displacement field are connected by the linear relation

D(q) = ε(q)E(q),   ε(q) = 1 + 4πχ(q),   (2.15)

where the function ε(q) is known as the dielectric function.
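
The statement that the convolution (2.14) becomes a simple multiplication in Fourier space is easily verified numerically in the temporal variable alone (a sketch; the exponential kernel and the Gaussian field pulse are model assumptions, not the physical χ of any particular material):

    import numpy as np

    N, dt = 4096, 0.01
    t = np.arange(N) * dt
    chi = np.exp(-t)                        # causal model kernel chi(t), t >= 0
    E = np.exp(-((t - 10.0) / 1.0)**2)      # electric field pulse

    P_time = dt * np.convolve(chi, E)[:N]   # P(t) = int dt' chi(t - t') E(t')
    P_freq = dt * np.fft.ifft(np.fft.fft(chi) * np.fft.fft(E)).real

    print(np.abs(P_time - P_freq).max())    # agreement up to tiny wrap-around terms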

INFO On a microscopic level, at least two different mechanisms causing field–induced polar-
ization need to be distinguished: firstly, a finite electric field may cause a distortion of the electron
clouds surrounding otherwise unpolarized atoms or molecules. The relative shift of the electron
clouds against the nuclear centers causes a finite molecular dipole moment which integrates to
a macroscopic polarization P. Secondly, in many substances (mostly of organic chemical origin),
the molecules (a) carry a permanent intrinsic dipole moment and (b) are free to move. A finite
electric field will cause spatial alignment of the otherwise disoriented molecules. This leads to a
polarization of the medium as a whole.

Information on the frequency/momentum dependence of the dielectric function has to
be inferred from outside electrodynamics, typically from condensed matter physics, or plasma
physics.⁹ In many cases of physical interest — e.g. in the physics of plasmas or the physics of
metallic systems — the dependence of ε on both q and ω has to be taken seriously. Sometimes,
only the frequency dependence, or the dependence on the spatial components of momentum,
is of importance. However, the crudest approximation of all (often adopted in this course)
is to approximate the function ε(q) ≃ ε by a constant. (Which amounts to approximating
χ(x, t; x′, t′) ∝ δ(x − x′)δ(t − t′) by a function infinitely short ranged in space and time.) If
not mentioned otherwise, we will thus assume

D = εE,

where ε is a material constant.


⁷ Yet more generally, one might allow for a non co–linear dependence of the polarization on the field vector:

P_i(x, t) = (1/(2π)⁴) ∫ d³x′ ∫ dt′ χ_ij(x, t; x′, t′) E_j(x′, t′),

where χ = {χ_ij} is a 3 × 3–matrix field.
⁸ By causality, χ(x, t; x′, t′) ∝ Θ(t − t′).
⁹ A plasma is a gas of charged particles.

As with the electric sector of the theory, the magnetic fields created by externally imposed
macroscopic current distributions (a) 'magnetize' solids, i.e. generate finite fields M, and (b)
are generally much weaker than the intrinsic microscopic fields inside a solid. This means that
we can write B = H + 4πM[H], where M[H] may be assumed to be linear in H. In analogy
to Eq. (2.14) we thus define

M(x, t) = (1/(2π)⁴) ∫ d³x′ ∫ dt′ χ_m(x, t; x′, t′) H(x′, t′),   (2.16)

where the function χ_m is called the magnetic susceptibility of the system. The magnetic sus-
ceptibility describes how a magnetic field at (x′, t′) causes magnetization at (x, t). Everything
we said above about the electric susceptibility applies in a similar manner to the magnetic
susceptibility. Specifically, we have the Fourier space relation M(q) = χ_m(q)H(q) and

B(q) = µ(q)H(q),   µ(q) = 1 + 4πχ_m(q),   (2.17)

where µ(q) is called the magnetic permeability. In cases where the momentum dependence
of the susceptibility is negligible, we obtain the simplified relation

B = µH,   (2.18)

where µ is a constant. As in the electric case, a number of different microscopic mechanisms


causing deviations of µ off unity can be distinguished. Specifically, for molecules carrying no
intrinsic magnetic moment, the presence of an external magnetic field may induce molecular
currents and, thus, a non–vanishing magnetization. This magnetization is generally directed
opposite to the external field, i.e. µ < 1. Substances of this type are called diamagnetic.
(Even for the most diamagnetic substance known, bismuth, 1 − µ = O(10−4 ), i.e. diamag-
netism is a very weak effect.) For molecules carrying a non–vanishing magnetic moment, an
external field will cause alignment of the latter and, thus, an effective amplification of the
field, µ > 1. Materials of this type are called paramagnetic. Typical values of paramagnetic
permeabilities are of the order of µ − 1 = O(10−5 ) − O(10−2 ).
Before leaving this section, it is worthwhile to take a final look at the general structure
of the Maxwell equations in matter. We note that

- The averaged charge densities and currents are sources of the electric displacement field
D and the magnetic field H, respectively.

- These fields are related to the electric field E and the magnetic induction B by Eqs. (2.15)
and (2.17), respectively.

- The actual fields one would measure in an experiment are E and B; these fields determine
the coupling between fields and matter as described by the Lorentz force law.

- Similarly, the (matter un–related) homogeneous Maxwell equations contain the fields E
and B.

2.2 Applications of macroscopic electrodynamics


In this section, we will explore a number of phenomena caused by the joint presence of matter
and electromagnetic fields. The organization of the section parallels that of the vacuum part
of the course, i.e. we will begin by studying static electric phenomena, then advance to the
static magnetic case and finally discuss a few applications in electrodynamics.

2.2.1 Electrostatics in the presence of matter *


Generalities
Consider the macroscopic Maxwell equations of electrostatics,

∇·D = 4πρ,
∇×E = 0,
D = εE,   (2.19)

where in the last equation we assumed a constant dielectric function for simplicity. Now assume
that we wish to explore the fields in the vicinity of a boundary separating two regions containing
different types of matter. In general, the dielectric constants characterizing the two domains
will be different, i.e. we have D = εE in region #1 and D = ε′E in region #2. (For ε′ = 1, we
describe the limiting case of a matter/vacuum interface.) Optionally, the system boundary
may carry a finite surface charge density η.
Arguing as in section ??, we find that the first and the second of the Maxwell equations
imply the boundary conditions¹⁰

(D(x) − D′(x))·n(x) = 0,
(E(x) − E′(x))×n(x) = 0,   (2.20)

where the unprimed/primed quantities refer to the fields in regions #1 and #2, respectively,
and n(x) is a unit vector normal to the boundary at x. Thus,

the tangential component of the electric field and the normal
component of the displacement field are continuous at the interface.

Example: Dielectric sphere in a homogeneous electric field


To illustrate the computation of electric fields in static environments with matter, consider the
example of a massive sphere of radius R and dielectric constant ε placed in a homogeneous
external displacement field D. We wish to compute the electric field E in the entire medium.

¹⁰ If the interface between #1 and #2 carries external surface charges, the first of these equations generalizes
to (D(x) − D′(x))·n(x) = 4πη(x), where η is the surface charge density. In this case, the normal component
of the displacement field jumps by an amount set by the surface charge density.

Since there are no charges present, the electric potential inside and outside the sphere
obeys the Laplace equation ∆φ = 0. Further, choosing polar coordinates such that the z–axis
is aligned with the orientation of the external field, we expect the potential to be azimuthally
symmetric, φ(r, θ, φ) = φ(r, θ). Expressed in terms of this potential, the boundary equations
(2.20) assume the form

ε ∂_rφ|_{r=R−0} = ∂_rφ|_{r=R+0},   (2.21)
∂_θφ|_{r=R−0} = ∂_θφ|_{r=R+0},   (2.22)
φ|_{r=R−0} = φ|_{r=R+0},   (2.23)

where R ± 0 ≡ lim_{η→0} (R ± η). To evaluate these conditions, we expand the potential inside
and outside the sphere in a series of spherical harmonics (??). Due to the azimuthal symmetry
of the problem, only φ–independent functions Y_{l,0} = ((2l+1)/4π)^{1/2} P_l contribute, i.e.

φ(r, θ) = Σ_l P_l(cos θ) × { A_l r^l, r < R;   B_l r^l + C_l r^{−(l+1)}, r ≥ R },

where we have absorbed the normalization factors ((2l+1)/4π)^{1/2} in the expansion coefficients
A_l, B_l, C_l.
At spatial infinity, the potential must approach the potential φ = −Dz = −Dr cos θ =
−Dr P_1(cos θ) of the uniform external field D = D e_z. Comparison with the series above
shows that B_1 = −D and B_{l≠1} = 0. To determine the as yet unknown coefficients A_l and C_l,
we consider the boundary conditions above. Specifically, Eq. (2.21) translates to the condition

Σ_l P_l(cos θ) (ε l A_l R^{l−1} + (l+1) C_l R^{−(l+2)} + D δ_{l,1}) = 0.

Now, the Legendre polynomials are a complete set of functions, i.e. the vanishing of the l.h.s.
of the equation implies that all series coefficients must vanish individually:

C_0 = 0,
ε A_1 + 2R⁻³ C_1 + D = 0,
ε l A_l R^{l−1} + (l+1) C_l R^{−(l+2)} = 0,   l > 1.

A second set of equations is obtained from Eq. (2.22): ∂_θ Σ_l P_l(cos θ)((A_l − B_l)R^l −
C_l R^{−(l+1)}) = 0, or Σ_{l>0} P_l(cos θ)((A_l − B_l)R^l − C_l R^{−(l+1)}) = const. The second
condition implies (think why!) (A_l − B_l)R^l − C_l R^{−(l+1)} = 0 for all l > 0. Substituting this
condition into Eq. (2.23), we finally obtain A_0 = 0. To summarize, the expansion coefficients
are determined by the set of equations

A_0 = 0,
(A_1 + D)R − C_1 R⁻² = 0,
A_l R^l − C_l R^{−(l+1)} = 0,   l > 1.
Figure 2.3: Polarization of a sphere by an external electric field.

It is straightforward to verify that these equations are equivalent to the conditions

−1 3
C1 = DR3 , A1 = − D,
+2 +2
while Cl>1 = Al>1 = 0. The potential distorted by the presence of the sphere is thus given by

3

 ,r < R
+2

φ(r, θ) = −rE0 cos θ × −1 R
  3 (2.24)

 1 − , r ≥ R,
+2 r

The result (2.24) affords a very intuitive physical interpretation:

3 3
. Inside the sphere the electric field E = −∇φ = ∇E0 z +2 = E0 +2 ez is parallel to the
external electric field, but weaker in magnitude (as long as  > 1 which usually is the
case.) This indicates the buildup of a polarization P = −1

E = 4π3 −1
E.
+2 0

Outside the sphere, the electric field is given by a superposition of the external field and
the field generated by a dipole potential φ = d · x/r3 = d cos θ/r2 , where the dipole
moment is given by d = V P and V = 4πR3 /3 is the volume of the sphere.
To understand the origin of this dipole moment, notice that in a polarizable medium, and
in the absence of external charges, 0 = ∇ · D = ∇ · E + 4π∇ · P, while the microscopic
charge density ∇ · E = 4πρmic need not necessarily vanish. In exceptional cases where
ρmic does not vanish upon averaging, we denote hρmic i = ρpol the polarization charge.
The polarization charge is determined by the condition

∇ · P = −ρpol .

Turning back to the sphere, the equation 0 = ∇ · D = χ ∇ · P tells us that there is


no polarization charge in the bulk of the sphere (nor, of course, outside the sphere.)
50 CHAPTER 2. MACROSCOPIC ELECTRODYNAMICS

However, arguing as in section ??, we find that the sphere (or any boundary between
two media with different dielectric constants) carries a surface polarization charge
ηind = Pn0 − Pn .
Specifically, in our problem, Pn0 = 0 so that η = −Pn = − 4π 3 −1
E cos θ. Qualitatively,
+2 0
the origin of this charge is easy enough to understand: while in the bulk of the sphere,
the induced dipole moments cancel each other upon spatial averaging (see the figure),
uncompensated charges remain at the surface of the medium. These surface charges
induce a finite and quite macroscopic dipole moment which is aligned opposite to E0
and acts as a source of the electric field (but not of the displacement field).

2.2.2 Magnetostatics in the presence of matter


Generalities
The derivation of ‘magnetic boundary conditions’ in the presence of materials with a finite
permeability largely parallels the electric case: the macroscopic equations of magnetostatics
read as

∇×H = j,
c
∇ · B = 0,
H = µ−1 B. (2.25)
Again, we wish to explore the behaviour of the fields in the vicinity of a boundary between
two media with different permeabilities. Proceeding in the usual manner, i.e. by application
of infinitesimal variants of Gaus̈’s and Stokes law, respectively, we derive the equations
(B(x) − B0 (x)) · n = 0,

(H(x) − H0 (x)) × n = js , (2.26)
c
where js is an optional surface current.

Example: paramagnetic sphere in a homogeneous magnetic field


Consider a sphere of radius R and permeability µ > 1 in the presence of an external magnetic
field H. We wish to compute the magnetic induction B inside and outside the sphere. In
principle, this can be done as in the analogous problem on the dielectric sphere, i.e. by
Legendre polynomial series expansion. However, comparing the magnetic Maxwell equations
to electric Maxwell equations considered earlier,
∇·B=0 ↔ ∇ · D = 0,
∇×H=0 ↔ ∇ × E = 0,
B = µH ↔ D = E,
−1 µ−1
P= E ↔ M= H,
4π 4π
2.3. WAVE PROPAGATION IN MEDIA 51

we notice that there is actually no need to do so: Formally identifying B ↔ D, H ↔ E, µ ↔


, M ↔ P, the two sets of equation become equivalent, and we may just copy the solution
obtained above. Thus (i) the magnetic field outside the sphere is a superposition of the
3 µ−1
external field and a magnetic dipole field, where (ii) the dipole moment M = 4π µ+2
H is
parallel to the external field. This magnetic moment is caused by the orientation of the
intrinsic moments along the external field axis; consequently, the actually felt magnetic field
(the magnetic induction) B = H + 4πM exceeds the external field.

2.3 Wave propagation in media


What happens if an electromagnetic wave impinges upon a medium characterized by a non–
11
trivial dielectric function ? And how does the propagation behaviour of waves relate to
actual physical processes inside the medium? To prepare the discussion of these questions, we
will first introduce a physically motivated model for thhe dependence of the dielectric function
on its arguments. This modelling will establish a concrete link between the amplitude of the
electromagnetic waves and the dynamics of the charge carriers inside the medium.

2.3.1 Model dielectric function


In Fourier space, the dielectric function, (q, ω) is a function of both wave vector and frequency.
It owes its frequency dependence to the fact that a wave in matter create excitations of the
molecular degrees of freedom. The feedback of these excitations to the electromagnetic wave
is described by the frequency dependence of the function . To get an idea of the relevant
frequency scales, notice that typical excitation energies in solids are of the order ~ω < 1 eV.
Noting that Planck’s constant ~ ' 0.6 × 10−15 eVs, we find that the characteristic frequencies
at which solids dominantly respond to electromagnetic waves are of the order ω < 1015 s−1 .
However, the wave lengths corresponding to these frequencies, λ = 2π/|q| ∼ 2πc/ω ∼ 10−6 m
are much larger than the range of a few interatomic spacings over which we expect the
non–local feedback of the external field into the polarization to be ‘screened’ (the range of
the susceptibility χ.) This means (think why!) that the function (q, ω) ' (ω) is largely
momentum–independent at the momentum scales of physical relevance.
To obtain a crude model for the function (ω), we imagine that the electrons surrounding
the molecules inside the solid are harmonically bound to the nuclear center coordinates. An
individual electron at spatial coordinate x (measured w.r.t. a nuclear position) will then be
subject to the equation of motion

m d2t + ω02 /2 + γdt x(t) = eE(t),



(2.27)

where ω0 is the characteristic frequency of the oscillator motion of the electron, γ a relaxation
constant, and the variation of the external field over the extent of the molecule has been
11
We emphasize dielectric function because, as shall be seen below, the magnetic permeability has a much
lesser impact on electromagnetic wave propagation.
52 CHAPTER 2. MACROSCOPIC ELECTRODYNAMICS

neglected (cf. the discussion above.) Fourier transforming this equation, we obtain d ≡
ex(ω) = e2 E(ω)m−1 /(ω02 − ω 2 − iγω). The dipole moment d(ω) of a molecule
P with electrons
fi oscillating at frequency ωi and damping rate γi is given by d(ω) = i fi d(ω0 →ωi ,γ→γi ) .
Denoting the density of molecules in the by n, we thus obtain (cf. Eqs. (2.1), (2.10), and
(2.15))
4πne2 X fi
(ω) = 1 + 2
. (2.28)
m i
ωi − ω 2 − iγi ω

With suitable quantum mechanical definitions of the material constants fi , ωi and γi , the
model dielectric function (2.28) provides an accurate description of the molecular contri-
bution to the dielectric behaviour of solids. A schematic of the typical behaviour of real and
imaginary part of the dielectric function is shown in Fig. 2.4. A few remarks on the profile of
these functions:

. The material constant  employed in the ‘static’ sections of this chapter is given by
 = (0). The imaginary part of the zero frequency dielectric function is negligible.

. For reasons to become clear later on, we call the function

2ω p
α(ω) = Im µ(ω) (2.29)
c

the (frequency dependent) absorption coefficient and


p
n(ω) = Re µ(ω) (2.30)

the (frequency dependent) refraction coefficient of the system.

. For most frequencies, i.e. everywhere except for the immediate vicinity of a resonant
d
frequency ωi , the absorption coefficient is negligibly small. In these regions, dω Re (ω) >
0. The positivity of this derivative defines a region of normal dispersion.

. In the immediate vicinity of a resonant frequency, |ω − ωi | < γi /2, the absorption


d
coefficient shoots up while the derivative dω Re (ω) < 0 changes sign. As we shall
discuss momentarily, the physics of these frequency windows of anomalous dispersion
is governed by resonant absorption processes.

2.3.2 Plane waves in matter


To study the impact of the non–uniform dielectric function on the behaviour of electromagnetic
waves in matter, we consider the spatio–temporal Fourier transform of the Maxwell equations
in a medium free of extraneous sources, ρ = 0, j = 0. Using that D(k) = (k)E(k) and
2.3. WAVE PROPAGATION IN MEDIA 53

Im 

Re 

γi

ωi ω

Figure 2.4: Schematic of the functional profile of the function (ω). Each resonance frequency
ωi is center of a peak in the imaginary part of  and of a resonant swing of in the real part.
The width of these structures is determined by the damping rate γi .

H(k) = µ−1 (k)B(k), where k = (ω/c, k), we have


k · (E) = 0,
ω
k × (µ−1 B) + (E) = 0,
c
ω
k×E− B = 0,
c
k·B = 0,
where we omitted momentum arguments for notational clarity. We next compute k×(third
equation) + (ωµ/c)×(second equation) and (ω/c)×(third equation) − k×(second equation)
to obtain the Fourier representation of the wave equation
ω2
  
2 E(k, ω)
k − 2 × = 0, (2.31)
c (ω) B(k, ω)
where
c
c(ω) = p , (2.32)
µ(ω)
is the effective velocity of light in matter.
Comparing with our discussion of section 1.4, we conclude that a plane electric wave in
a matter is mathematically described by the function
E(x, t) = E0 exp(i(k · x − ωt)),
54 CHAPTER 2. MACROSCOPIC ELECTRODYNAMICS

where k = kn and k = ω/c(ω) is a (generally complex) function of the wave frequency.


Specifically, for Im (k · x) > 0, the wave is will be exponentially damped. Splitting the wave
number k into its real and imaginary part, we have

ω i
k = Re k + i Im k = n(ω) + α(ω),
c 2
where n and α are the refraction and absorption coefficient, respectively. To better understand
the meaning of the absorption coefficient, let us compute the Poynting vector of the electro-
magnetic wave. As in section 1.4, the Maxwell equation 0 = ∇ · D = ∇ · E ∝ k · E0 implies
transversality of the electric field. (Here we rely on the approximate x–independence of the
dielectric function.) Choosing coordinates such that n = e3 and assuming planar polarization
for simplicity, we set E0 = E0 e1 with a real coefficient E0 . The — physically relevant — real
part of the electric field is thus given by

Re E = E0 e1 cos(ω(zn/c − t))e−αz/2 .

From the Maxwell equations ∇ · B = 0 and ∇ × E + c−1 ∂t B, we further obtain (exercise)


Re B = E0 e2 cos(ω(zn/c − t))e−αz/2 . Assuming that the permeability is close to unity, µ ' 1
or B ' H, we thus obtain for the magnitude of the Poynting vector

c cE02 2
h... it cE0 −αz
|S| = |E × H| = cos2 (ω(zn/c − t))e−αz −→ e ,
4π 4π 8π
where in the last step we have averaged over several intervals of the oscillation period 2π/ω.
According to this result,

The electromagnetic energy current inside a medium decays at a rate


set by the absorption coefficient.

This phenomenon is easy enough to interpret: according to our semi–phenomenological model


of the dielectric function above, the absorption coefficient is non–vanishing in the immediate
vicinity of a resonant frequency ωi , i.e. a frequency where molecular degrees of freedom
oscillate. The energy stored in an electromagnetic wave of this frequency may thus get
converted into mechanical oscillator energy. This goes along with a loss of field intensity, i.e.
a diminishing of the Poynting vector or, equivalently, a loss of electromagnetic energy density,
w.

2.3.3 Dispersion
In the previous section we have focused on the imaginary part of the dielectric function and
on its attenuating impact on individual plane waves propagating in matter. We next turn
to the discussion of the — equally important — role played by the real part. To introduce
2.3. WAVE PROPAGATION IN MEDIA 55

the relevant physical principles in a maximally simple environment, we will focus on a one–
dimensional model of wave propagation throughout. Consider, thus, the one–dimensional
variant of the wave equations (2.31),

ω2
 
2
k − 2 ψ(k, ω) = 0, (2.33)
c (ω)
p
where k is a one–dimensional wave ’vector’ and c(ω) = c/ (ω) as before. (For simplicity,
we set µ = 1 from the outset.) Plane wave solutions to this equation are given by ψ(x, t) =
12
ψ0 exp(ik(ω)x − iωt), where k(ω) = ω/c(ω). Notice that an alternative representation of
the plane wave configurations reads as ψ(x, t) = ψ0 exp(ikx − iω(k)t), where ω(k) is defined
implicitly, viz. as a solution of the equation ω/c(ω) = k.
To understand the consequences of frequency dependent variations in the real part of the
dielectric function, we need, however, to go beyond the level of isolated plane wave. Rather,
let us consider a superposition of plane waves (cf. Eq. (1.42)),
Z
ψ(x, t) = dk ψ0 (k)e−i(kx−ω(k)t) , (2.34)

where ψ0 (k) is an arbitrary function. (Exercise: check that this superposition solves the wave
equations.)
Specifically, let us chose the function ψ0 (k) such that, at initial time t = 0, the distribution
ψ(x, t = 0) is localized in a finite region in space. Configurations of this type are called
wave packets. While there is a lot of freedom in choosing such spatially localizing envelope
functions, a convenient (and for our purposes sufficiently general) choice of the weight function
ψ0 is a Gaussian,
2
 

ψ0 (k) = ψ0 exp −(k − k0 ) ,
4
where ψ0 ∈ C is a amplitude coefficient fixed and ξ a coefficient of dimensionality ‘length’
that determines the spatial extent of the wave package (at time t = 0.) To check that latter
assertion, let us compute the spatial profile ψ(x, 0) of the wave package at the initial time:
Z 2
Z 2
−(k−k0 )2 ξ4 −ikx 2ξ
ψ(x, 0) = ψ0 dk e = ψ0 eik0 x
dk e−k 4 −ikx =

Z 2
2ξ 2 2
= ψ0 eik0 x
dk e−k 4 −(x/ξ) = 2 πψ0 ξ eik0 x e−(x/ξ) .

The function ψ(x, 0) is concentrated in a volume of extension ξ; it describes the profile of a


small wave ‘packet’ at initial time t = 0. What will happen to this wave packet as time goes
on?
To answer this question, let us assume that the scales over which the function ω(k) varies
are much smaller than the extension of the wave package in wave number space, k. It is, then,
12
There is also the ‘left moving’ solution, ψ(x, t) = ψ0 exp(ik(ω)x + iωt), but for our purposes it will be
sufficient to consider right moving waves.
56 CHAPTER 2. MACROSCOPIC ELECTRODYNAMICS

vg t

Figure 2.5: Left: Dispersive propagation of an initially sharply focused wave package. Right:
A more shallow wave package suffers less drastically from dispersive deformation.

a good idea to expand ω(k) around k0 :


a
ω(k) = ω0 + vg (k − k0 ) + (k − k0 )2 + . . . ,
2
where we introduced the abbreviations ω0 ≡ ω(k0 ), vg ≡ ω 0 (k0 ) and a ≡ ω 00 (k0 ). Substituting
this expansion into Eq. (2.34) and doing the Gaussian integral over wave numbers we obtain
Z
2 2
ψ(x, t) = ψ0 dk e−(k−k0 ) ξ /4−i(kx−ω(k)t) '
Z
2 2
' ψ0 eik0 (x−vp t)
dk e−(ξ +2iat)(k−k0 ) /4−i(k−k0 )(x−vg t) =
√ 2
' 2 πψ0 ξ(t)eik0 (x−vp t) e−((x−vg t)/ξ(t)) ,
p
where we introduced the function ξ(t) = ξ 2 + 2iat, and the abbreviation vp ≡ ω(k0 )/k0
Let us try to understand the main characteristics of this result:

. The center of the wave package moves with a characteristic velocity vg = ∂k ω(k0 ) to
the right. In as much as vg determines the effective center velocity of a superposition of
a continuum or ‘group’ of plain waves, vg is called the group velocity of the system.
Only in vacuum where c(ω) = c is independent of frequency we have ω = ck and the
group velocity vg = ∂k ω = c coincides with the vacuum velocity of light.
A customary yet far less useful notion is that of a phase velocity: an individual plane
wave behaves as ∼ exp(i(kx − ω(k)t)) = exp(ik(x − (ω(k)/k)t)). In view of the
structure of the exponent, one may call vp ≡ ω(k)/k the ‘velocity’ of the wave. Since,
however, the wave extends over the entire system anyway, the phase velocity is not of
direct physical relevance. Notice that the phase velocity vp = ω(k)/k = c(k) = c/n(ω),
where the refraction coefficient has been introduced in (2.30). For most frequencies and
in almost all substances, n > 1 ⇒ vp < c. In principle, however, the phase velocity
2.3. WAVE PROPAGATION IN MEDIA 57

13
may become larger than the speed of light. The group velocity vg = ∂k ω(k) =
(∂ω k(ω)) = (∂ω ωn(ω)) c = (n(ω) + ω∂ω n(ω))−1 c = vp (1 + n−1 ω∂ω n)−1 . In
−1 −1

regions of normal dispersion, ∂ω n > 0 implying that vg < vp . For the discussion of
anomalous cases, where vg may exceed the phase velocity or even the vacuum velocity
of light, see [1].

. The width of the wave package, ∆x changes in time. Inspection of the Gaussian fac-
2 −2 ∗−2
tor controlling the envelope of the packet shows that Re ψ ∼ e(x−vg t) (ξ +ξ )/2 , where
the symbol ’∼’ indicates that only the exponential dependence of the package is taken
into account. Thus, the width of the package is given by ξ(t) = ξ(1 + (2at/ξ 2 )2 )1/2 .
According to this formula, the rate at which ξ(t) increases (the wave package flows
apart) is the larger, the sharper the spatial concentration of the initial wave package
was (cf. Fig. 2.5). For large times, t  ξ 2 /a, the width of the wave package increases
as ∆x ∼ t. The disintegration of the wave package is caused by a phenomenon called
dispersion: The form of the package at t = 0 is obtained by (Fourier) superposition
of a large number of plane waves. If all these plane waves propagated with the same
phase velocity (i.e. if ω(k) was k–independent), the form of the wave package would
remain un–changed. However, due to the fact that in a medium plane waves of different
wave number k generally propagate at different velocities, the Fourier spectrum of the
package at non–zero times differs from that at time zero. Which in turn means that the
integrity of the form of the wave package gets gradually lost.
The dispersive spreading of electromagnetic wave packages is a phenomenon of immense
applied importance. For example, dispersive deformation is one of the main factors limiting
the information load that can be pushed through fiber optical cables. (For too high loads,
the ’wave packets’ constituting the bit stream through such fibers begin to overlap thus loosing
their identity. The construction of ever more sophisticated counter measures optimizing the
data capacity of optical fibers represents a major stream of applied research.

2.3.4 Electric conductivity


As a last phenomenon relating to macroscopic electrodynamics, we discuss the electric con-
duction properties of metallic systems.
Empirically, we know that in a metal, a finite electric field will cause current flow. In
its most general form, the relation between field and current assumes a form similar to that
between field and polarization discussed above:
Z Z t
3 0
ji (x, t) = dx dt0 σij (x − x0 , t − t0 )Ej (x0 , t0 ), (2.35)
−∞

13
However, there is no need to worry that such anomalies conflict with Einsteins principle of relativity to be
discussed in chapter 3 below: the dynamics of a uniform wave train is not linked to the transport of energy
or other observable physical quantities, i.e. there is no actual physical entity that is transported at a velocity
larger than the speed of light.
58 CHAPTER 2. MACROSCOPIC ELECTRODYNAMICS

where σ = {σij (x, t)} is called the conductivity tensor. This equation states that a field may
cause current flow at different spatial locations in time. Equally, a field may cause current flow
in directions different from the field vector. (For example, in a system subject to a magnetic
field in z–direction, an electric field in x–direction will cause current flow in both x– and
y–direction, why?) As with the electric susceptibility, the non–local space dependence of the
conductivity may often be neglected, σ(x, t) ∝ δ(x), or σ(q, ω) = σ(ω) in Fourier space.
However, except for very low frequencies, the ω–dependence is generally important. Generally,
one calls the finite–frequency conductivity ’AC conductivity’, where ’AC’ stands for ’alternating
current’. For ω → 0, the conductivity crosses over to the ’DC conductivity’, where ’DC’ stands
for ’directed current’. In the DC limit, and for an isotropic medium, the field–current relation
assumes the form of Ohm’s law
j = σE. (2.36)
In the following, we wish to understand how the phenomenon of electrical conduction can be
explained from our previous considerations.
There are two different ways to specify the dynamical response of a metal to external
fields: We may either postulate a current–field relation in the spirit of Ohms law. Or we
may determine a dielectric function which (in a more microscopic way) describes the response
of the mobile charge carriers in the system to the presence of a field. However, since that
response generally implies current flow, the simultaneous speccificiation of both a frequency
14
dependent dielectric function and a current–field relation would over–determine the problem.
In the following, we discuss the two alternatives in more detail.

The phenomenological route


Let us impose Eq. (2.36) as a condition extraneous to the Maxwell equations; We simply
postulate that an electric field drives a current whose temporal and spatial profile is rigidly
locked to the electric field. As indiciated above, this postulate largely determines the dynamical
response of the system to the external field, i.e. the dielectric function (ω) =  has to be
chosen constant. Substitution of this ansatz into the temporal Fourier transform of the Maxwell
equation (??) obtains

1
∇ × H(x, ω) = (−iω + 4πσ) E(x, ω), (2.37)
c
For the moment, we leave this result as it is and turn to

The microscopic route


We want to describe the electromagnetic response of a metal in terms of a model dielectric
function. The dielectric function (2.28) constructed above describes the response of charge
carriers harmonically bound to molecular center coordinates by a harmonic potential ∼ mi ωi2 /2
and subject to an effective damping or friction mechanism. Now, the conduction electrons of a
14
As we shall see below, a dielectric constant does not lead to current flow.
2.3. WAVE PROPAGATION IN MEDIA 59

metal mey be thought of as charge carriers whose confining potential is infinitely weak, ωi = 0,
so that their motion is not tied to a reference coordinate. Still, the conduction electrons will
be subject to friction mechanisms. (E.g., scattering off atomic imperfections will impede their
ballistic motion through the solid.) We thus model the dielectric function of a metal as

4πne e2 4πne2 X fi ωωi 4πne e2


(ω) = 1 − + ' − , (2.38)
mω(ω + τi ) m i
ωi2 − ω 2 − iγi ω mω(ω + τi )

where ne ≡ N f0 is a measure of the concentration of conduction electrons, the friction


15
coefficient of the electrons has been denoted by τ −1 , and in the last step we noted that
for frequencies well below the oscillator frequencies ωi of the valence electrons, the frequency
dependence of the second contribution to the dielectric function may be neglected; We thus
lump that contribution into an effective material constant . Keep in mind that the free electron
contribution to the dielectric function exhaustingly describes the acceleration of charge carriers
by the field (the onset of current), i.e. no extraneous current–field relations must be imposed.
Substitution of Eq. (2.38) into the Maxwell equation (??) obtains the result

4πne e2
 
iω(ω) 1
∇ × H(x, ω) = − E(x, ω) = −iω + E(x, ω), (2.39)
c c m(−iω + τ1 )

Comparison to the phenomenological result (2.37) yields the identification

ne e 2
σ(ω) = , (2.40)
m(−iω + τ1 )

i.e. a formula for the AC conductivity in terms of electron density and mass, and the impurity
collision rate.
Eq. (2.40) affords a very intuitive physical interpretations:

. For high frequencies, ω  τ −1 , we may approximate σ ∼ (iω)−1 , or ωj ∼ E. In


time space, this means ∂t j ∼ ρv̇ ∼ E. This formula describes the ballistic acceleration
of electrons by an electric field: On time scales t  τ , which, in a Fourier sense,
correspond to large frequencies ω  τ −1 the motion of electrons is not hindered by
impurity collisions and they accelarate as free particles.

. For low frequencies, ω  τ −1 , the conductivity may be approximated by a constant,


σ ∼ τ . In this regime, the motion of the electrons is impeded by repeated impurity
collisions. As a result they diffuse with a constant drift induced by the electric field.

15
... alluding to the fact that the attenuation of the electrons is due to collisions of impurities which take
place at a rate denoted by τ .
60 CHAPTER 2. MACROSCOPIC ELECTRODYNAMICS
Chapter 3

Relativistic invariance

3.1 Introduction
Consult an entry level physics textbook to re–familiarize yourself with the basic notions of
special relativity!

3.1.1 Galilei invariance and its limitations


The laws of mechanics are the same in two coordinate systems K and K 0 moving at a con-
stant velocity relative to one another. Within classical Newtonian mechanics, the space time
coordinates (t, x) and (t0 , x0 ) of two such systems are related by a Galilei transformation,

t0 = t
x0 = x − vt.

Substituting this transformation into Newton’s equations of motion as formulated in system


K,
d2 xi X
K: = −∇ xi Vij (xi − xj )
dt2 j

(xi are particle coordinates in system K and Vij is a pair potential which, crucially, depends
only on the differences between particle coordinates but not on any distinguished reference
point in universe.) we find that

d2 x0i X
K0 : 02
= −∇x 0 Vij (x0i − x0j ),
dt i
j

i.e. the equations remain form invariant. This is what is meant when we say that classical
1
mechanics is Galilei invariant.
1
More precisely, Galilei invariance implies invariance under all transformations of the Galilei group, the
uniform motions above, rotations of space, and uniform translations of space and time.

61
62 CHAPTER 3. RELATIVISTIC INVARIANCE

Now, suppose a certain physical phenomenon in K (the dynamics of a water surface, say)
is effectively described by a wave equation,
 
1 2
K: ∆ − 2 ∂t f (x, t) = 0.
c
Substitution of the Galilei transformation into this equation obtains
 
0 1 2 2 1
K : ∆ − 2 ∂t0 − 2 v · ∇ ∂t0 − 2 (v · ∇ ) f (x0 , t0 ) = 0,
0 0 0 2
c c c
i.e. an equation of modified form. At first sight this result may look worrisome. After all,
many waves (water waves, sound waves, etc.) are of pure mechanical origin which means that
wave propagation ought to be a Galilei invariant phenomenon. The resolution to this problem
lies with the fact, that matter waves generally propagate in a host medium (water, say.) While
this medium is at rest in K, it isn’t in K 0 . But a wave propagating in a non–stationary medium
must be controlled by a different equation of motion, i.e. there is no a priori contradiction
to the principle of Galilei invariance. Equivalently, one may observe that a wave emitted by a
stationary point source in K will propagate with velocity c (in K.) However, an observer in
K 0 will observe wave fronts propagating at different velocities. Mathematically, this distortion
of the wave pattern is described by K 0 ’s ‘wave equation’.
But what about electromagnetic waves? So far, we have never talked about a ‘host
medium’ supporting the propagation of electromagnetic radiation. The lack of Galilean invari-
ance of Maxwells equations then leaves us with three possibilities:
1. Nature is Galilean invariant, but Maxwells equations are incorrect (or, to put it more
mildly, incomplete.)

2. An electromagnetic host medium (let’s call it the ether) does, indeed, exist.

3. Nature is invariant under a group of space–time transformations different from Galilei.


If so, Newton’s equations of motion are in need of modification.
These questions became pressing in the second half of the 19th century, after Maxwell had
brought the theory of electromagnetism to completion, and the exploration of all kinds of
electromagnetic wave equation was a focal point of theoretical and experimental research. In
view of the sensational success of Maxwells theory the first answer was not really considered
an option.
Since Newtonian mechanics wasn’t a concept to be sacrificed in a lightheartededly either,
the existence of an ether was postulated. However, it was soon realized that the ether medium
would have to have quite eccentric physical properties. Since no one had ever actually observed
an ether, this medium would have to be infinitely light and fully non–interacting with matter.
(Its presumed etheric properties actually gave it its name.) Equally problematic, electromag-
netic waves would uniformly propagate with the speed of light, c, only in the distinguished
ether rest frame. The ether postulate was tantamount to the abandonment of the idea of ‘no
preferential intertial frames in nature’ a concept close at heart to the development of modern
3.1. INTRODUCTION 63

Figure 3.1: Measurement of Newtons law of gravitational attraction (any other law would be
just as good) in two frames that are relatively inertial. The equations describing the force
between masses will be the same in both frames

physics. Towards the end of the nineteenth century, experimental evidence was mounting that
the vacuum velocity of light was, independent of the observation frame, given by c; the idea
of an ether became increasingly difficult to sustain.

3.1.2 Einsteins postulates


In essence, this summarizes the state of affairs before Einstein entered the debate. In view
of the arguments outlined above, Einstein decided that 3. had to be the correct option.
Specifically, he put forward two fundamental postulates

. The postulate of relativity: This postulate is best formulated in a negative form:


there is no such thing like ‘absolute rest’ or an ’absolute velocity’, or an ‘absolute point
in space’. No absolute frame exists in which the laws of nature assume a distinguished
form.
To formulate this principle in a positive form, we define two coordinate frames to be
inertial relatively to each other if one is related to the other by a translation in space
and/or time, a constant rotation, a uniform motion, or a combination of these operations.
The postulate then states that the laws of nature will assume the same form (be expressed
by identical fundamental equations) in both systems.
As a corollary, this implies that physical laws must never make reference to absolute
coordinates, times, angles, etc.

. The postulate of the constancy of the speed of light: the speed of light is inde-
pendent of the motion of its source.
What makes this postulate — which superficially might seem to be no more than an
2
innocent tribute to experimental observation — so revolutionary is that it abandons the
2
... although the constancy of the speed of light had not yet been fully established experimentally when
Einstein made his postulate.
64 CHAPTER 3. RELATIVISTIC INVARIANCE

notion of ‘absolute time’. For example, two events that are simultaneous in one inertial
frame will in general no longer be simultaneous in other inertial frames. While, more
than one hundred years later, we have grown used to its implications the revolutionary
character of Einsteins second postulate cannot be exaggerated.

3.1.3 First consequences


As we shall see below, the notion of an ’absolute presence’ — which makes perfect sense in
3
Newtonian mechanics — becomes meaningless in Einsteins theory of relativity. When we
monitor physical processes we must be careful to assign to each physical event its own space
and time coordinate x = (ct, x) (where the factor of c has been included for later convenience.)
An event is canonical in that it can be observed in all frames (no matter whether they are
relatively inertial or not). However, both its space and time coordinates are non–canonical,
i.e. when observed from a different frame K 0 it will have coordinates x0 = (ct0 , x0 ), where t0
may be different from t.
When we talk of a ’body’ or a ’particle’ in relativity, what we ct
actually mean is the continuous sequence of events x = (ct, x)
defined by its instantaneous position x at time t (both moni-
tored in a specific frame). The assembly of these events obtains
the world line of the body, a (directed) curve in a space–time
coordinate frame (cf. the figure for a schematic of the (1 + 1)–
dimensional world lines of a body in uniform motion (left) and an x
accelerated body (right).)
The most important characteristic of the relative motion of inertial frames is that no
acceleration is involved. This implies, in particular, that a uniform motion in one frame will stay
uniform in the other; straight world lines transform into straight world lines. Parameterizing
a general uniform motion in system K by x = (ct, wt + b), where w is the velocity and b
an offset vector, we must require that the coordinate representation x0 in any inertial system
K 0 is again a straight line. Now, the most general family of transformations mapping straight
lines onto straight lines are the affine transformations
x0 = Λx + a, (3.1)
4
where Λ : Rd+1 → Rd+1 is a linear map from (d + 1)–dimensional space time into itself and
a ∈ Rd+1 a vector describing the displacement of the origin of coordinate systems in space
and or time. For a = 0, we speak of a homogeneous coordinate transformation. Since any
transformation may be represented as a succession of a homogeneous transformation and a
trivial translation, we will focus on the former class throughout.
INFO Above, we have set up criteria for two frames being relatively inertial. However, in the
theory of relativity, one meets with the notion of (absolutely) inertial frames (no reference to
3
Two events that occur simultaneously in one frame will remain simultaneous under Galilei transformations.
4
Although we will be foremostly interested in the case d = 3, it is occasionaly useful to generalize to other
dimensions. Specifically, space–time diagrams are easiest to draw/imagine in (1 + 1)–dimensions.
3.1. INTRODUCTION 65

another frame.) But how can we tell, whether a frame is inertial or not? And what, for that
matter, does the attribute ’inertial’ mean, if no transformation is involved?
Newton approached this question by proposing the concept of an ’absolute space’. Systems at rest
in this space, or at constant motion relative to it, were ’inertial’. For practical purposes, it was
proposed that a reference system tied to the position of the fixed stars was (a good approximation
of) an absolute system. Nowadays, we know that these concepts are problematic. The fixed stars
aren’t fixed (nor for that matter is any known constituent of the universe), i.e. we are not aware
of a suitable ’anchor’ reference system. What is more, the declaration of an absolute frame is at
odds with the ideas of relativity.
In the late nineteenth century, the above definition of inertial frames was superseded by a purely
operational one. A modern formulation of the alternative definition might read:

An inertial frame of reference is one in which the motion of a particle


not subject to forces is in a straight line at constant speed.

The meaning of this definition is best exposed by considering its negative: imagine an observer
tied to a rotating disc. A particle thrown by that observer will not move at a straight line (relative
to the disc.) If the observer does not know that he/she is on a rotating frame, he/she will have to
attribute bending of the trajectory to some ’fictitious’ forces. The presence of such forces is proof
that the disc system is not inertial.
However, in most physical contexts questions concerning the ’absolute’ status of a system do not
5
arise. Far more frequently, we care about what happens as we pass from one frame (laboratory,
say) to another (that of a particle moving relatively to the lab.) Thus, the truly important issue is
whether systems are relatively inertial or not.

What conditions will the transformation matrix Λ have to fulfill in order to qualify as a trans-
formation between inertial frames? Referring for a rigorous discussion of this question to
Ref. [2], we here merely note that symmetry conditions implied by the principle of relativ-
ity nearly but not completely specify the class of permissible transformations. This class of
transformations includes the Gallilei transformations which we saw are incompatible with the
physics of electromagnetic wave propagation.
At this point, Einsteins second postulate enters the game. Consider two inertial frames
K and K 0 related by a homogeneous coordinate transformation. At time t = 0 the origins of
the two systems coalesce. Let us assume, however, the K 0 moves relatively to K at constant
velocity v. Now consider the event: at time t = 0 a light source at x = 0 emits a signal. In
K, the wave fronts of the light signal then trace out world lines propagating at the vacuum
speed of light, i.e. the spatial coordinates x of a wave front obey the relation

x2 − c2 t2 = 0.

The crucial point now is that in spite of its motion relative to the point of emanation of the
light signal, an observer in K 0 , too, will observe a light signal moving with velocity c in all
directions. (Notice the difference to, say, a sound wave. From K 0 ’s point of view, fronts
moving along v would be slowed down while those moving in the opposite direction would
5
Exceptions include high precision measurements of gravitational forces.
66 CHAPTER 3. RELATIVISTIC INVARIANCE

0
propagate at higher speed, vsound = vsound − v.) This means that the light front coordinates
0
in x obey the same relation
x02 − c2 t02 = 0.
This additional condition unambiguously fixes the set of permissible transformations. To
formulate it in the language of linear coordinate transformations (which, in view of the linearity
of transformations between inertial frames is the adequate one), let us define the matrix
 
1
 −1 
g = {gµν } ≡  , (3.2)
 −1 
−1

where all non–diagonal entries are zero. The constancy of the speed of light may then be
expressed in concise form as xT gx = 0 ⇒ x0T gx0 = 0. This condition suggests to focus on
6
coordinate transformations for which the bilinear form xt gx is conserved,

xT gx = x0T gx0 . (3.3)

Substituting x0 = Λx, we find that Λ must obey the condition

ΛT gΛ = g (3.4)

Linear coordinate transformations satisfying this condition are called Lorentz transforma-
tions. We thus postulate that

All laws of nature must be invariant under affine coordinate transformations (3.1)
where the homogeneous part of the transformation obeys the Lorentz condition (3.4).

The statement above characterizes the set of legitimate transformations in implicit terms. In
the next section, we aim for a more explicit description of the Lorentz transformation. These
results will form the basis for our discussion of Lorentz invariant electrodynamics further down.

3.2 The mathematics of special relativity I: Lorentz group


In this section, we will develop the mathematical framework required to describe transforma-
tions between inertial frames. We will show that these transformations form a continuous
group, the Lorentz group.
6
Notice the gap in the logics of the argument. Einsteins second postulate requires Eq. (3.3) only for those
space time vectors for which xT gx = 0 vanishes; We now suggest to declare it as a universal condition, i.e.
one that holds irrespective of the value of xT gx. To show that the weaker condition implies the stronger, one
may first prove (cf. Ref. [2]) that relatively straightforward symmetry considerations suffice to fix the class of
allowed transformations up to one undetermined scalar constant. The value of that constant is then fixed by
the ‘weak condition’ above. Once it has been fixed, one finds that Eq. (3.3) holds in general.
3.2. THE MATHEMATICS OF SPECIAL RELATIVITY I: LORENTZ GROUP 67

ct

time like forward light cone

space like

backward light cone

Figure 3.2: Schematic of the decomposition of space time into a time like region and a space
like region. The two regions are separated by a ‘light cone’ of light–like vectors. (A ’cone’
because in space dimensions d > 1 it acquires a conical structure.) The forward/backward
part of the light cone extends to positive/negative times

3.2.1 Background
The bilinear form introduced above defines a scalar product of R4 :
g : R4 × R4 → R, (3.5)
(x, y) 7→ xT gy ≡ xµ gµν y ν . (3.6)
A crucial feature of this scalar product is that it is not positive definite. For reasons to be
discussed below, we call vectors whose norm is positive, xT gx = (ct)2 − x2 > 0, time like
vectors. Vectors with negative norm will be called space like and vectors of vanishing norm
light like.
We are interested in Lorentz transformations, i.e. linear transformations Λ : R4 → R4 , x 7→
Λx that leave the scalar product g invariant, ΛT gΛ = g. (Similarly to, say, the orthogonal
transformations which leave the standard scalar product invariant.) Before exploring the struc-
ture of these transformations in detail, let us observe a number of general properties. First
notice that the set of Lorentz transformations forms a group, the Lorentz group, L. For if
Λ and Λ0 are Lorentz transformations, so is the composition ΛΛ0 . The identity transformation
obviously obeys the Lorentz condition. Finally, the equation ΛT gΛ = g implies that Λ is
non–singular (why?), i.e. it possesses an inverse, Λ−1 . We have thus shown that the Lorentz
transformations form a group.

Global considerations on the Lorentz group


What more can be said about this group? Taking the determinant of the invariance relation,
we find that det(ΛT gΛ) = det(Λ)2 det(g) = det(g), i.e. det(Λ)2 = 1 which means that
68 CHAPTER 3. RELATIVISTIC INVARIANCE

det(Λ) ∈ {−1, 1}. (Λ is a real matrix, i.e. det(Λ) ∈ R.) We may further observe that the
absolute value of the component Λ00 is always larger than unity. This is shown
P by inspection
T 2 2
of the 00–element of the invariance relation: 1 = g00 = (Λ gΛ)00 = Λ00 − i Λ0i . Since the
subtracted term is positive (or zero), we have |Λ00 | ≥ 1. We thus conclude that L contains
four components defined by

L↑+ : det Λ = +1, Λ00 ≥ +1,


L↓+ : det Λ = +1, Λ00 ≤ −1,
(3.7)
L↑− : det Λ = −1, Λ00 > +1,
L↓− : det Λ = −1, Λ00 ≤ −1.

Transformations belonging to any one of these subsets cannot be continuously deformed into a
transformation of a different subset (why?), i.e. the subsets are truly disjoint. In the following,
we introduce a few distinguished transformations belonging to the different components of the
Lorentz group.
Space reflection, or parity, P : (ct, x) → (ct, −x) is a Lorentz transformation. It belongs
to the component L↑− . Time inversion, T : (ct, x) → (−ct, x) belongs to the component
L↓− . The product of space reflection and time inversion, PT : (ct, x) → (−ct, −x) belongs
to the component L↓+ . Generally, the product of two transformations belonging to a given
component no longer belongs to that component, i.e. the subsets do not for subgroups of
the Lorentz group. However, the component L↑+ is exceptional in that it is a subgroup. It is
called the proper orthochrome Lorentz group, or restricted Lorentz group. The attribute
’proper’ indicates that it does not contain excpetional transformations (such as parity and time
inversion) which cannot be continuosly deformed back to the identity transform. It is called
‘orthochronous’ because it maps the positive light cone into it self, i.e. it respects the ‘direction’
of time. To see that L↑+ is a group, notice that it contains the identity transformation. Further,
its elements can be continuously contracted to unity; however, if this property holds for two
elements Λ, Λ0 ∈ L↑+ , then so for the product ΛΛ0 , i.e. the product is again in L↑+ .

The restricted Lorentz group


The restricted Lorentz group contains those transformations which we foremostly associate
with coordinate transformations between inertial frames. For example, rotations of space,
 
1
ΛR ≡ ,
R

where R is a three–dimensional rotation matrix trivially fall into this subgroup. However, it
also contains the second large family of homogeneous transformations between inertial frames,
uniform motions.
3.2. THE MATHEMATICS OF SPECIAL RELATIVITY I: LORENTZ GROUP 69

x3
Consider two inertial frames K and K 0 whose origins K x03 K0

at time t = 0 coalesce (which means that the affine x2 x02


coordinate transformation (3.1) will be homogeneous, vt
a = 0.) We assume that K 0 moves at constant veloc- x1 x01
ity v relative to K. Lorentz transformations of this type,
Λv , are called special Lorentz transformations. Phys-
x03
ically, they describe what is sometimes callled a Lorentz K0
boost, i.e. the passage from a stationary to a moving x02
frame. (Although we are talking about a mere coor- x3 K
dinate transformation, i.e. no acceleration processes is x01
vt
described.) We wish to explore the mathematical and x2

physical properties of these transformations.


x1
Without loss of generality, we assume that the vector
v describing the motion is parallel to the e1 direction of K (and hence to the e01 –direction of
K 0 : Any general velocity vector v may be rotated to v k e1 by a space like rotation R. This
means that a generic special Lorentz transformation Λv may be obtained from the prototypical
one Λve1 by a spatial rotation, Λv = ΛR Λve1 Λ−1 R (cf. the figure.)
A special transformation in e1 –direction will leave the coordinates x2 = x02 and x3 = x03
7
unchanged. The sought for transformation thus assumes the form
 
Av
Λve1 =  1 ,
1

where the 2 × 2 matrix Av operates in (ct, x1 )–space.


There are many ways to determine the matrix Av . We here use a group theory inspired
method which may come across as somewhat abstract but has the advantage of straightfor-
ward applicability to many other physical problems. P Being element of a continuous group of
transformations (a Lie group), the matrix Av ≡ exp( i λi Ti ) can be written in an exponential
parameterization. Here, λi are real parameters and Ti two–dimensional fixed ‘generator ma-
trices’ (Cf. with the conventional rotation group; the Ti ’s play the role of angular momentum
generator matrices, and the λi ’s are generalized real–valued ‘angles’.) To determine the group
generators, we consider the transformation Av for infinitesimal angles λi . Substituting the
matrix Av into the defining equation of the Lorentz group, using that the restriction of the
metric to (ct, x) space assumes the form g → σ3 ≡ diag(1, −1), and expanding to first order
in λi , we obtain

!
X X X
ATv σ3 Av ' (1 + λi TiT )σ3 (1 + λi Ti ) ' σ3 + λi (TiT σ3 + σ3 Ti ) = σ3 .
i i i

7
To actually prove this, notice that from the point of view of an observer in K 0 we consider a special trans-
formation with velocity −ve01 . Then apply symmetry arguments (notably the condition of parity invariance; a
mirrored universe should not differ in an observable manner in its transformation behaviour from our one.) to
show that any change in these coordinates would lead to a contradiction.
70 CHAPTER 3. RELATIVISTIC INVARIANCE

This relation must hold regardless of the value of λi , i.e. the matrices Ti must obey the
condition TiT σ3 = −σ3 Ti . Upto normalization, there is only one matrix that meets this
0 1
condition, viz Ti = σ1 ≡ . The most general form of the matrix Av is thus given by
1 0
    
0 1 cosh λ sinh λ
Av = exp λ = ,
1 0 sinh λ cosh λ

where we denoted the single remaining parameter by λ and in the second equality expanded
the exponential in a power series. We now must determine the parameter λ = λ(v) in such
a way that it actually describes our v–dependent transformation. We first substitute Av into
the transformation law x0 = Λv x, or
 00   0
x x
10 = Av .
x x1

Now, the origin of K 0 (having coordinate x10 = 0) is at coordinate x = vt. Substituting this
condition into the equation above, we obtain

tanh λ = −v/c.

Using this result and introducing the (standard) notation

β = |v|/c, γ ≡ (1 − β 2 )−1/2 , (3.8)

the special Lorentz transformation assumes the form


 
γ −βγ
−βγ γ 
Λve1 =  (3.9)
 1 
1

EXERCISE Repeat the argument above for the full transformation group, i.e. not just the set
of special Lorentz transformations along a certain axis. To this end, introduce an exponential
P
representation Λ = exp( i λi Ti ) for general elements of the restricted Lorentz group. Use the
8
defining condition of the group to identify six linearly independent matrix generators. Among
the matrices Ti , identify three as the generators Ji of the spatial rotation group, Ti = Ji . Three
others generate special Lorentz transformations along the three coordinate axes of K. (Linear
combinations of these generators can be used to describe arbitrary special Lorentz transformations.)

A few more remarks on the transformations of the restricted Lorentz group:


8
Matrices are elements of a certain vector space (the space of linear transformations of a given vector
space.)
P As A set of matrices A1 , . . . Ak is linearly independent if no linear combination exists such that
i i i = 0 vanishes.
x A
3.3. ASPECTS OF RELATIVISTIC DYNAMICS 71

. In the limit of small β, the transformation Λve1 asymptotes to the Galilei transformation,
t0 = t and x01 = x1 − vt.

. We repeat that a general special Lorentz transformation can be obtained from the
transformation along e1 by a space–like rotation, Λv = ΛR Λve1 ΛR−1 , where R is a
rotation mapping e1 onto the unit vector in v–direction.

. Without proof (However, the proof is by and large implied by the exercise above.) we
mention that a general restricted Lorentz transformation can always be represented as

Λ = ΛR Λv ,

i.e. as the product of a rotation and a special Lorentz transformation.

3.3 Aspects of relativistic dynamics


In this section we explore a few of the physical consequences deriving from Einsteins postulates;
Our discussion will be embarassingly superficial, key topics relating to the theory of relativity
aren’t mentioned at all. Rather the prime objective of this section will be to provide the core
background material required to discuss the relativistic invariance of electrodynammics below.

3.3.1 Proper time


Consider the world line of a particle moving in space in a (not nec- ct

essarily uniform manner.) At any instance of time, t, the particle 0


ct
will be at rest in an inertial system K 0 whose origin coincides with 0
x
x(t) and whose instantaneous velocity w.r.t. K is given by dt x(t).
At this point, it is important to avoid confusion. Of course, the
coordinate system whose origin globally coincides with x(t) is not
inertial w.r.t. K (unless the motion is uniform.) Put differently,
the origin of the inertial system K 0 defined by the instantaneous
position x(t) and the instantaneous velocity dt x(t) coincides with
the particles position only at the very instant t. The t0 axis of x

K 0 is parallel to the vector x0 (t). Also, we know that the t0 axis


cannot be tilted w.r.t. the t axis by more than an angle π/4, i.e. it must stay within the
instantaneous light cone attached to the reference coordinate x.
Now, imagine an observer traveling along the world line x(t) and carrying a watch. At
time t (corresponding to time zero in the instantaneous rest frame) the watch is reset. We
wish to identify the coordinates of the event ‘watch shows (infinitesimal) time dτ ’ in both
K and K 0 . In K’ the answer to this question is formulated easily enough, the event has
coordinates (cdτ, 0), where we noted that, due to the orthogonality of the space–like coordinate
axes to the world line, the watch will remain at coordinate x0 = 0. To identify the K–
coordinates, we first note that the origin of K 0 is translated w.r.t. that of K by a = (ct, x(t).
72 CHAPTER 3. RELATIVISTIC INVARIANCE

Thus, the coordinate transformation between the two frames assumes the general affine form
x = Lx0 + a. In K, the event takes place at some (as yet undetermined) time dt later than t.
It’s spatial coordinates will be x(t) + vdt, where v = dt x is the instantaneous velocity of the
moving observer. We thus have the identification (cdτ, 0) ↔ (c(t + dt), x + vdt). Comparing
this with the general form of the coordinate transformation, we realize that (cdτ, 0) and
(cdt, vdt) are related by a homogeneous Lorentz transformation. This implies, in particular,
that (cdτ )2 = (cdt)2 − (vdt)2 , or
p
dτ = dt 1 − v2 /c2 . (3.10)

The time between events of finite duration may be obtained by integration,


Z p
τ = dt 1 − v2 /c2 .

This equation expresses the famous phenomenon of the relativity of time: For a moving ob-
server time proceeds slower than for an observer at rest. The time measured by the propagating
watch, τ , is called proper time. The attribute ‘proper’ indicates that τ is defined with refer-
ence to a coordinate system (the instantaneous rest frame) that can be canonically defined.
I.e. while the time t associated to a given point on the world line x(t) = (ct, x(t)) will change
from system to system (as we just saw), the proper time remains the same; the proper time can
be used to parameterize the space–time curve in an invariant manner as x(τ ) = (ct(τ ), x(τ )).

3.3.2 Relativistic mechanics


Relativistic energy momentum relation
In section 3.1.2 we argued that Einstein solved the puzzle posed by the Lorentz invariance of
electrodynamics by postulating that Newtonian mechanics was in need of generalization. We
will begin our discussion of the physical ramifications of Lorentz invariance by developing this
extended picture. Again, we start from postulates motivated by physical reasoning:
. The extended variant of Newtonian mechanics (lets call it relativistic mechanics) ought to
be invariant under Lorentz transformations, i.e. the generalization of Newtons equations
should assume the same form in all inertial frames.

. In the rest frame of a particle subject to mechanical forces, or in frames moving at low
velocity v/c  1 relatively to the rest frame, the equations must asymptote to Newtons
equations.
Newton’s equations in the rest frame K are of course given by
d2
m x = f,
dτ 2
where m is the particle’s mass (in its rest frame — as we shall see, the mass is not an invariant
quantity. We thus better speak of a particle rest mass), and f is the force. As always in
3.3. ASPECTS OF RELATIVISTIC DYNAMICS 73

relativity, a spatial vector (such as x or f ) must be interpreted as the space–like component


of a four–vector. The zeroth component of x is given by x0 = cτ , so that the rest frame
generalization of the Newton equation reads as md2τ xµ = f µ , where f = (0, f ) and we noted
that for the zeroth component md2τ x0 = 0, so that f 0 = 0. It is useful to define the four–
momentum of the particle by pµ = mdτ xµ . (Notice that both τ and the rest mass m are
invariant quantities, they are not affected by Lorentz transformations. The transformation
behaviour of p is determined by that of x.) The zeroth component of the momentum, mdτ x0
carries the dimension [energy]/[velocity]. This suggests to define
 
E/c
p= ,
p

where E is some characteristic energy. (In the instantaneous rest frame of the particle,
E = mc2 and p = 0.)
Expressed in terms of the four–momentum, the Newton equation assumes the simple form

dτ p = f. (3.11)

Let us explore what happens to the four momentum as we boost the particle from its rest
frame to a frame moving with velocity vector v = ve1 . Qualitatively, one will expect that
in the moving frame, the particle does carry a finite space–like momentum (which will be
proportional to the velocity of the boost, v). We should also expect a renormalization of its
energy (from the point of view of the moving frame, the particle has acquired kinetic energy).
Focusing on the 0 and 1 component of the momentum (the others won’t change), Eq.
(3.9) implies  2   0   
mc /c E /c γmc2 /c
→ ≡ .
0 p01 −(γm)v
This equation may be interpreted in two different ways. Substituting the explicit formula
E = mc2 , we find E 0 = (γm)c2 and p01 = −(γm)v. In Newtonian mechanics, we would have
expected that a particle at rest in K carries momentum p01 = −mv in K 0 . The difference to
the relativistic result is the renormalization of the mass factor: A particle that has mass m in
its rest frame appears to carry a velocity dependent mass
m
m(v) = γm = p
1 − (v/c)2

in K 0 . For v → c, it becomes infinitely heavy and its further acceleration will become progres-
sively more difficult:

Particles of finite rest frame mass cannot move faster than with the speed of light.

The energy in K' is given by (mγ)c², and is again affected by mass renormalization. However,
a more useful representation is obtained by noting that $p^T g p$ is an invariant quantity. In
K, $p^T g p = (E/c)^2 = (mc)^2$. Comparing with the bilinear form in K', we find $(mc)^2 = (E'/c)^2 - \mathbf{p}'^2$, or
$$E' = \sqrt{(mc^2)^2 + (\mathbf{p}'c)^2}. \tag{3.12}$$

This relativistic energy–momentum relation determines the relation between energy and
momentum of a free particle.
Its most important implication is that even a free particle at rest carries the energy E = mc²,
Einstein's world-famous result. However, this energy is usually not observable (unless it gets
released in nuclear processes of mass fragmentation, with all the known consequences). What
we do observe in daily life are the changes in energy as we observe the particle in different
frames. This suggests defining the kinetic energy of a particle by
$$T \equiv E - mc^2 = \sqrt{(mc^2)^2 + (pc)^2} - mc^2.$$
For velocities v ≪ c, the kinetic energy T ≃ p²/2m indeed reduces to the familiar non-relativistic result. Deviations from this relation become sizeable at velocities comparable to
the speed of light.
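The crossover between the two regimes is easy to make quantitative. The following is a minimal numerical sketch (not part of the original text; units with c = 1 and an arbitrary rest mass are assumed) comparing the exact kinetic energy with its non-relativistic approximation:

```python
import numpy as np

# Sketch: compare the exact relativistic kinetic energy
# T = sqrt((m c^2)^2 + (p c)^2) - m c^2 with the Newtonian p^2/(2m).
# Units with c = 1 and an arbitrary rest mass m are assumed.
c, m = 1.0, 1.0

for p in np.linspace(0.0, 2.0, 5):        # momenta from 0 up to 2 mc
    T_exact = np.sqrt((m*c**2)**2 + (p*c)**2) - m*c**2
    T_newton = p**2 / (2*m)
    print(f"p = {p:4.2f} mc:  T = {T_exact:6.4f},  p^2/2m = {T_newton:6.4f}")
# For p << mc the two agree; the deviation becomes sizeable as p approaches mc.
```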

Particle subject to a Lorentz force


We next turn back to the discussion of the relativistically invariant Newton equation $d_\tau p^\mu = f^\mu$. Both $p^\mu$ and the components of the force, $f^\mu$, transform as vectors under Lorentz
transformations. Specifically, we wish to explore this transformation behaviour on an example
of obvious relevance to electrodynamics: the Newton equation of a particle of velocity v subject
to an electric and a magnetic field. Unlike in much of our previous discussion, our observer
frame K is not the rest frame of the particle. The four-momentum of the particle is thus
given by p = (mγc, mγ**v**).
What we take for granted is the Lorentz force relation,
$$\frac{d}{dt}\mathbf{p} = q\left(\mathbf{E} + (\mathbf{v}/c)\times\mathbf{B}\right).$$
In view of our discussion above, it is important to appreciate the meaning of the non–invariant
constituents of this equation. Specifically, t is the time in the observer frame, and p is the
space–like component of the actual momentum. This quantity is different from mv. Rather,
it is given by the product (observed mass) × (observed velocity), where the observed mass
differs from the rest mass, as discussed above.
We now want to bring this equation into the form of an invariant Newton equation (3.11).
Noting that (cf. Eq. (3.10)) $\frac{d}{dt} = \gamma^{-1}\frac{d}{d\tau}$, we obtain
$$d_\tau \mathbf{p} = \mathbf{f},$$
where the force acting on the particle is given by
$$\mathbf{f} = q\gamma\left(\mathbf{E} + (\mathbf{v}/c)\times\mathbf{B}\right).$$



To complete our derivation of the Newton equation in K, we need to identify the zeroth
component of the force, f^0. The zeroth component of the left hand side of the Newton
equation is given by $(\gamma/c)\,d_t E$, i.e. (γ/c)× the rate at which the kinetic energy of the particle
changes. However, we have seen before that the rate of potential energy change is given by
$d_t U = -\mathbf{f}\cdot\mathbf{v} = -q\mathbf{E}\cdot\mathbf{v}$. Energy conservation implies that this energy gets converted into
kinetic energy, $d_t T = d_t E = +q\mathbf{E}\cdot\mathbf{v}$. This implies the identification $f^0 = (\gamma q/c)\,\mathbf{E}\cdot\mathbf{v}$.
The generalized form of Newton's equations is thus given by
$$m\,d_\tau \begin{pmatrix} \gamma c \\ \gamma\mathbf{v} \end{pmatrix} = \frac{q}{c}\begin{pmatrix} \mathbf{E}\cdot(\gamma\mathbf{v}) \\ \gamma c\,\mathbf{E} + (\gamma\mathbf{v})\times\mathbf{B} \end{pmatrix}.$$

The form of these equations suggests introducing the four-velocity vector
$$v \equiv \begin{pmatrix} \gamma c \\ \gamma\mathbf{v} \end{pmatrix}$$
(in terms of which the four-momentum is given by p = mv, i.e. by multiplication by the rest
mass). The equations of motion can then be expressed as

$$m\,d_\tau v^\mu = \frac{q}{c}\,F^{\mu\nu} v_\nu, \tag{3.13}$$

where we used the index raising and lowering convention originally introduced in chapter ??, i.e. $v^0 = v_0$ and $v^i = -v_i$, and the matrix F is defined by
$$F = \{F^{\mu\nu}\} = \begin{pmatrix} 0 & -E_1 & -E_2 & -E_3 \\ E_1 & 0 & -B_3 & B_2 \\ E_2 & B_3 & 0 & -B_1 \\ E_3 & -B_2 & B_1 & 0 \end{pmatrix}. \tag{3.14}$$

The significance of Eq. (3.13) extends far beyond that of a mere reformulation of Newton's
equations: the electromagnetic field enters the equation through a matrix, the so-called
field strength tensor. This signals that the transformation behaviour of the electromagnetic
field will be different from that of vectors. Throughout much of the course, we treated the
electric field as if it were a (space-like) vector. Naively, one might have expected that in the
theory of relativity this field gets augmented by a fourth component to become a four-vector.
However, a moment's thought shows that this picture cannot be correct. Consider, say, a
charged particle at rest in K. This particle will create a Coulomb electric field. However, in
a frame K' moving relative to K, the charge will be in motion. Thus, an observer in K' will
see an electric current plus the corresponding magnetic field. This tells us that under inertial
transformations, electric and magnetic fields get transformed into one another. We thus need
to find a relativistically invariant object accommodating the six components of the electric and
magnetic field 'vectors'. Eq. (3.14) provides a tentative answer to that problem. The idea is
that the fields enter the theory as a matrix and, therefore, transform in a way fundamentally
different from that of vectors.
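As a consistency check of the matrix structure (3.14), one can evaluate the right hand side of (3.13) numerically. The sketch below (illustrative field and velocity values, Python/numpy assumed, not the author's code) assembles F^{µν}, lowers the index of the four-velocity with the Lorentz metric, and reproduces the explicit components (q/c)(E·γv, γcE + γv×B):

```python
import numpy as np

# Sketch: assemble the field strength tensor (3.14) for given E, B and evaluate
# the right hand side of (3.13), (q/c) F^{mu nu} v_nu. Units with c = 1;
# the field values and the velocity below are arbitrary illustration data.
E = np.array([0.1, 0.0, 0.0])
B = np.array([0.0, 0.0, 0.2])
F = np.array([[0.0,  -E[0], -E[1], -E[2]],
              [E[0],  0.0,  -B[2],  B[1]],
              [E[1],  B[2],  0.0,  -B[0]],
              [E[2], -B[1],  B[0],  0.0]])

g = np.diag([1.0, -1.0, -1.0, -1.0])      # Lorentz metric
q, c = 1.0, 1.0                           # charge; units with c = 1

v3 = np.array([0.5, 0.0, 0.0])            # three-velocity
gamma = 1.0 / np.sqrt(1.0 - v3 @ v3)
v_up = gamma * np.array([c, *v3])         # four-velocity v^mu = (gamma c, gamma v)
v_down = g @ v_up                         # lower the index: v_mu = g_{mu nu} v^nu

rhs = (q/c) * F @ v_down                  # (q/c) F^{mu nu} v_nu
# compare with the explicit components (q/c)(E.(gamma v), gamma c E + gamma v x B):
explicit = (q/c) * np.concatenate(([E @ (gamma*v3)],
                                   gamma*c*E + np.cross(gamma*v3, B)))
assert np.allclose(rhs, explicit)
print(rhs)
```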

3.4 Mathematics of special relativity II: Co– and Contravariance
So far, we have focused on the Lorentz transformation behaviour of vectors in R⁴. However, our
observations made at the end of the previous section suggest including other objects (such
as matrices) in our discussion. We will begin by providing some essential mathematical
background and then, finally, turn to the discussion of the Lorentz invariance of electrodynamics.

3.4.1 Covariant and Contravariant vectors


Generalities

Let x = {x^µ} be a four-component object that transforms under homogeneous Lorentz transformations as x → Λx, or x^µ → Λ^µ_ν x^ν in components. (By Λ^µ_ν we denote the components
of the Lorentz transformation matrix. For the up/down arrangement of indices, see the discussion below.) Quantities of this transformation behaviour are called contravariant vectors
(or contravariant tensors of first rank). Now, let us define another four-component object,
x_µ ≡ g_{µν} x^ν. Under a Lorentz transformation, x_µ → g_{µν} Λ^ν_ρ x^ρ = (Λ^{T-1})_µ^ν x_ν. Quantities
transforming in this way will be denoted as covariant vectors (or covariant tensors of first
rank). Defining g^{µν} to be the inverse of the Lorentz metric (in a basis where g is diagonal,
g = g^{-1}), the contravariant ancestor of x_µ may be obtained back by 'raising the index' as
x^µ = g^{µν} x_ν.
Critical readers may find these ’definitions’ unsatisfactory. Formulated in a given basis,
one may actually wonder whether they are definitions at all. For a discussion of the basis
invariant meaning behind the notions of co- and contravariance, we refer to the info block
below. However, we here proceed in a pragmatic way and continue to explore the consequences
of the definitions (which, in fact, they are) above.

INFO Consider a general real vector space V . Recall that the dual space V ∗ is the linear space of
all mappings ṽ : V → R, v 7→ ṽ(v). (This space is a vector space by itself which is why we denote
its elements by vector-like symbols, ṽ.) For a given basis {e_µ} of V, a dual basis {ẽ^µ} of V* may
be defined by the condition ẽ^µ(e_ν) = δ^µ_ν. With the expansions $v = \sum_\mu v^\mu e_\mu$ and $\tilde v = \sum_\mu \tilde v_\mu \tilde e^\mu$,
respectively, we have $\tilde v(v) = \tilde v_\mu v^\mu$. (Notice that components of objects in dual space will be
indexed by subscripts throughout.) In the literature of relativity, elements of the vector space —
then to be identified with space–time, see below — are usually called contravariant vectors while
their partners in dual space are called covariant vectors.

EXERCISE Let A : V → V be a linear map defining a basis change in V, i.e. e_µ = A_µ^ν e'_ν,
where {A_µ^ν} are the matrix elements of A. Show that
. The components of a contravariant vector v transform as v^µ → v'^µ = (A^T)^µ_ν v^ν.
. The components of a covariant vector w̃ transform as w̃_µ → w̃'_µ = (A^{-1})_µ^ν w̃_ν.
. The action of w̃ on v remains invariant, i.e. w̃(v) = w̃_µ v^µ = w̃'_µ v'^µ does not change.
(Of course, the action of the linear map w̃ on vectors must not depend on the choice of a
particular basis.)

In physics it is widespread (if somewhat tautological) practice to define co- or contravariant
vectors by their transformation behaviour. For example, a set of d components {w_µ} is called
a covariant vector if these components change under linear transformations as w_µ → A_µ^ν w_ν.
Now, let us assume that V is a vector space with scalar product g : V × V → R, (v, w) ↦
v^T g w ≡ ⟨v, w⟩. A special feature of vector spaces with scalar product is the existence of a
canonical mapping V → V*, v ↦ ṽ, i.e. a mapping that to each vector v canonically (without
reference to a specific basis) assigns a dual element ṽ. The vector ṽ is implicitly defined by the
condition ∀w ∈ V : ṽ(w) = ⟨v, w⟩. For a given basis {e_µ} of V, the components of $\tilde v = \sum_\mu \tilde v_\mu \tilde e^\mu$
may be obtained as ṽ_µ = ṽ(e_µ) = ⟨v, e_µ⟩ = v^ν g_{νµ} = g_{µν} v^ν, where in the last step we used the
symmetry of g. With the so-called index lowering convention
$$v_\mu = g_{\mu\nu} v^\nu, \tag{3.15}$$
we are led to the identification ṽ_µ = v_µ and ṽ(w) = v_µ w^µ.


Before carrying on, let us introduce some more notation: by g^{µν} ≡ (g^{-1})^{µν} we denote the components of the inverse of the metric (which, in the case of the Lorentz metric in its standard diagonal
representation, equals the metric itself, but that's an exception). Then g^{µν} g_{νρ} = δ^µ_ρ, where δ^µ_ρ is the
standard Kronecker symbol. With this definition, we may introduce an index raising convention
as v^µ ≡ g^{µν} v_ν. Indeed, g^{µν} v_ν = g^{µν} g_{νρ} v^ρ = δ^µ_ρ v^ρ = v^µ. (Notice that indices that are summed
over or 'contracted' always appear crosswise, one upper and one lower index.)
Consider a map A : V → V, v ↦ Av. In general, the action of the canonical covariant vector $\widetilde{Av}$ on
transformed vectors Aw need not equal the original value ṽ(w). However, it is natural to focus on
those transformations that are compatible with the canonical assignment, i.e. $\widetilde{Av}(Aw) = \tilde v(w)$.
In a component language, this condition translates to
$$(A^T)_\mu{}^\sigma\, g_{\sigma\rho}\, A^\rho{}_\nu = g_{\mu\nu},$$
or A^T g A = g. I.e., transformations compatible with the canonical identification (vector space)
↔ (dual space) have to respect the metric. E.g., in the case of the Lorentz metric, the 'good'
transformations belong to the Lorentz group.
Summarizing, we have seen that the invariant meaning behind contra- and covariant vectors is
that of vectors and dual vectors, respectively. Under Lorentz transformations these objects behave
as required by the component definitions given in the main text. A general tensor of degree
(n, m) is an element of $(\otimes_1^n V)\otimes(\otimes_1^m V^*)$.

The definitions above may be extended to objects of higher complexity. A two-component
quantity W^{µν} is called a contravariant tensor of second degree if it transforms under
Lorentz transformations as $W^{\mu\nu} \to \Lambda^\mu{}_{\mu'}\Lambda^\nu{}_{\nu'} W^{\mu'\nu'}$. Similarly, a covariant tensor of second
degree transforms as $W_{\mu\nu} \to (\Lambda^{T-1})_\mu{}^{\mu'}(\Lambda^{T-1})_\nu{}^{\nu'} W_{\mu'\nu'}$. Covariant and contravariant tensors
are related to each other by index raising/lowering, e.g. $W_{\mu\nu} = g_{\mu\mu'}\, g_{\nu\nu'}\, W^{\mu'\nu'}$. A mixed
second rank tensor $W_\mu{}^\nu$ transforms as $W_\mu{}^\nu \to (\Lambda^{T-1})_\mu{}^{\mu'}\Lambda^\nu{}_{\nu'} W_{\mu'}{}^{\nu'}$. The generalization of
these definitions to tensors of higher degree should be obvious.
Finally, we define a contravariant vector field (as opposed to a fixed vector) as a field
v^µ(x) that transforms under Lorentz transformations as v^µ(x) → v'^µ(x') = Λ^µ_ν v^ν(x). A
Lorentz scalar (field), or tensor field of degree zero, is one that does not actively transform under
Lorentz transformations: φ(x) → φ'(x') = φ(x). Covariant vector fields, tensor fields, etc.
are defined in an analogous manner.

The summation over a contravariant and a covariant index is called a contraction. For
example, v^µ w_µ ≡ φ is the contraction of a contravariant and a covariant vector (field). As
a result, we obtain a Lorentz scalar, i.e. an object that does not change under Lorentz
transformations. The contraction of two indices lowers the tensor degree of an object by two.
For example, the contraction of a mixed tensor of degree two with a contravariant tensor of
degree one, $A^\mu{}_\nu v^\nu \equiv w^\mu$, yields a contravariant tensor of degree one, etc.
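The index gymnastics of this section reduces to simple matrix algebra once the metric is written out. The following is a minimal numpy sketch (our own illustration with arbitrary numbers, units with c = 1) of lowering, raising, and contracting indices, including a check that a contraction is invariant under a boost:

```python
import numpy as np

# Sketch: index lowering/raising with the Lorentz metric and the contraction
# v^mu w_mu. Example numbers; units with c = 1.
g = np.diag([1.0, -1.0, -1.0, -1.0])        # g_{mu nu} (equal to its own inverse)

v_up = np.array([2.0, 1.0, 0.0, 0.0])       # contravariant components v^mu
w_up = np.array([1.0, 0.5, 0.5, 0.0])

v_down = g @ v_up                           # v_mu = g_{mu nu} v^nu (lowering)
assert np.allclose(g @ v_down, v_up)        # raising undoes lowering (g = g^{-1})

scalar = v_up @ g @ w_up                    # v^mu w_mu, a Lorentz scalar

# invariance check: boost both vectors along the 1-direction and recompute
beta = 0.6
gamma = 1.0 / np.sqrt(1.0 - beta**2)
Lam = np.eye(4)
Lam[:2, :2] = [[gamma, -gamma*beta], [-gamma*beta, gamma]]
assert np.isclose((Lam @ v_up) @ g @ (Lam @ w_up), scalar)
print(scalar)
```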

3.4.2 The relativistic covariance of electrodynamics


We now have everything in store to prove the relativistic covariance of electrodynamics, i.e.
the form-invariance of its basic equations under Lorentz transformations. Basically, what we
need to do is assign a definite transformation behaviour (scalar, co-/contravariant tensor, etc.)
to the building blocks of electrodynamics.
The coordinate vector x^µ is a contravariant vector. In electrodynamics we frequently
differentiate w.r.t. the components x^µ, i.e. the next question to ask is whether the four-dimensional gradient {∂_{x^µ}} is co- or contravariant. According to the chain rule,
$$\frac{\partial}{\partial x'^{\mu'}} = \frac{\partial x^\mu}{\partial x'^{\mu'}}\,\frac{\partial}{\partial x^\mu}.$$
Using that $x^\mu = (\Lambda^{-1})^\mu{}_{\mu'}\, x'^{\mu'}$, we find that
$$\frac{\partial}{\partial x'^{\mu'}} = (\Lambda^{-1T})_{\mu'}{}^{\mu}\,\frac{\partial}{\partial x^\mu},$$
i.e. $\partial_{x^\mu} \equiv \partial_\mu$ transforms as a covariant vector. In components,
$$\partial_\mu = (c^{-1}\partial_t, \nabla), \qquad \partial^\mu = (c^{-1}\partial_t, -\nabla).$$

The four–current vector


We begin by showing that the four-component object j = (cρ, **j**) is a Lorentz vector. Consider
the current density carried by a point particle at coordinate x(t) in some coordinate system
K. The four-current density carried by the particle is given by $j(x) = q\,(c, d_t\mathbf{x}(t))\,\delta(\mathbf{x}-\mathbf{x}(t))$,
or $j^\mu(x) = q\,d_t x^\mu(t)\,\delta(\mathbf{x}-\mathbf{x}(t))$, where $x^\mu(t) = (ct, \mathbf{x}(t))$. To show that this is a contravariant
vector, we introduce a dummy integration,
$$j^\mu(x) = q\int d\tilde t\;\frac{dx^\mu(\tilde t)}{d\tilde t}\,\delta(\mathbf{x}-\mathbf{x}(\tilde t))\,\delta(t-\tilde t) = qc\int d\tau\,\frac{dx^\mu}{d\tau}\,\delta^4(x - x(\tau)),$$
where in the last step we switched to an integration over the proper time τ = τ(t̃) uniquely
assigned to the world line parameterization x(t̃) = (ct̃, **x**(t̃)) in K. Now, the four-component
δ-distribution, δ⁴(x) = δ(**x**)δ(ct), is a Lorentz scalar (why?). The proper time, τ, also is a
Lorentz scalar. Thus, the transformation behaviour of j^µ is dictated by that of x^µ, i.e. j^µ
defines a contravariant vector.
As an important corollary we note that the continuity equation is Lorentz invariant: being the
contraction of the covariant vector ∂_µ with the contravariant vector j^µ, the left hand side of
∂_µ j^µ = 0 is a Lorentz scalar.

Electric field strength tensor and vector potential


In section 3.3.2 we obtained Eq. (3.13) as the Lorentz invariant generalization of Newton's
equations. Since v^µ and v_ν transform as contravariant and covariant vectors, respectively, the
matrix F^{µν} must transform as a contravariant tensor of rank two. Now, let us define the four-vector potential as {A^µ} = (φ, **A**). It is then a straightforward exercise to verify that the
field strength tensor is obtained from the vector potential as
$$F^{\mu\nu} = \partial^\mu A^\nu - \partial^\nu A^\mu. \tag{3.16}$$
(Just work out the antisymmetric combination of derivatives and compare to the definitions
**B** = ∇ × **A**, **E** = −∇φ − c^{-1}∂_t **A**.) This implies that the four-component object A^µ indeed
transforms as a contravariant vector.
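The 'straightforward exercise' can also be delegated to a computer algebra system. Below is a hedged sympy sketch (the symbols and function names are our own choices, not the text's) verifying two representative components of Eq. (3.16):

```python
import sympy as sp

# Sketch: verify two components of F^{mu nu} = d^mu A^nu - d^nu A^mu, Eq. (3.16),
# with A^mu = (phi, A) and d^mu = (c^{-1} d_t, -grad). Symbols are our own choice.
t, x, y, z, c = sp.symbols('t x y z c')
coords = (x, y, z)
phi = sp.Function('phi')(t, *coords)
A = [sp.Function(f'A{i}')(t, *coords) for i in (1, 2, 3)]
A_up = [phi] + A                               # A^mu

def d_up(mu, f):                               # contravariant derivative d^mu f
    return f.diff(t)/c if mu == 0 else -f.diff(coords[mu - 1])

F = [[d_up(m, A_up[n]) - d_up(n, A_up[m]) for n in range(4)] for m in range(4)]

E1 = -phi.diff(x) - A[0].diff(t)/c             # E = -grad phi - c^{-1} d_t A
print(sp.simplify(F[0][1] - (-E1)))            # F^{01} = -E_1  ->  0
B3 = A[1].diff(x) - A[0].diff(y)               # (curl A)_3
print(sp.simplify(F[1][2] - (-B3)))            # F^{12} = -B_3  ->  0
```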
Now, we have seen in chapter 1.2 that the Lorentz condition can be expressed as ∂_µ A^µ = 0, i.e. in a Lorentz invariant form. In the Lorentz gauge, the inhomogeneous wave equations
assume the form
$$\Box A^\mu = \frac{4\pi}{c}\, j^\mu, \tag{3.17}$$
where $\Box = \partial_\mu\partial^\mu$ is the Lorentz invariant wave operator. Formally, this completes the proof
of the Lorentz invariance of the theory. The combination Lorentz condition/wave equations,
which we saw carries the same information as the Maxwell equations, has been proven to be
invariant. However, to stay in closer contact with the original formulation of the theory, we next
express Maxwell's equations themselves in a manifestly invariant form.

Invariant formulation of Maxwell's equations


One may verify by direct inspection that the two inhomogeneous Maxwell equations can be
formulated in a covariant manner as
$$\partial_\mu F^{\mu\nu} = \frac{4\pi}{c}\, j^\nu. \tag{3.18}$$
To obtain the covariant formulation of the homogeneous equations, a bit of preparatory work
is required: let us define the fourth rank antisymmetric tensor as
$$\epsilon^{\mu\nu\rho\sigma} \equiv \begin{cases} \;\;\,1, & (\mu,\nu,\rho,\sigma) = (0,1,2,3) \text{ or an even permutation thereof,} \\ -1 & \text{for an odd permutation,} \\ \;\;\,0 & \text{else.} \end{cases} \tag{3.19}$$

One may show (do it!) that ε^{µνρσ} transforms as a contravariant tensor of rank four (under
the transformations of the unit-determinant subgroup L₊). Contracting this object with the
covariant field strength tensor F_{µν}, we obtain a contravariant tensor of rank two,
$$\tilde F^{\mu\nu} \equiv \frac{1}{2}\,\epsilon^{\mu\nu\rho\sigma} F_{\rho\sigma},$$

known as the dual field strength tensor. Using (3.19), it is straightforward to verify that
$$\tilde F = \{\tilde F^{\mu\nu}\} = \begin{pmatrix} 0 & -B_1 & -B_2 & -B_3 \\ B_1 & 0 & E_3 & -E_2 \\ B_2 & -E_3 & 0 & E_1 \\ B_3 & E_2 & -E_1 & 0 \end{pmatrix}, \tag{3.20}$$
i.e. that F̃ is obtained from F by the replacement **E** → **B**, **B** → −**E**. One also verifies that the
homogeneous equations assume the covariant form
$$\partial_\mu \tilde F^{\mu\nu} = 0. \tag{3.21}$$
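Both the duality map and the replacement rule of Eq. (3.20) can be checked mechanically. The sketch below (arbitrary example field values, not from the text) builds ε^{µνρσ} from permutation signs, lowers the indices of F, and confirms E → B, B → −E:

```python
import numpy as np
from itertools import permutations

# Sketch: form the dual tensor (1/2) eps^{mu nu rho sigma} F_{rho sigma} and
# check the rule E -> B, B -> -E of Eq. (3.20). Field values are examples.
def F_of(E, B):
    return np.array([[0.0,  -E[0], -E[1], -E[2]],
                     [E[0],  0.0,  -B[2],  B[1]],
                     [E[1],  B[2],  0.0,  -B[0]],
                     [E[2], -B[1],  B[0],  0.0]])

def parity(p):                        # sign of a permutation of (0,1,2,3)
    p, sgn = list(p), 1
    for i in range(len(p)):
        while p[i] != i:
            j = p[i]
            p[i], p[j] = p[j], p[i]
            sgn = -sgn
    return sgn

eps = np.zeros((4, 4, 4, 4))
for p in permutations(range(4)):
    eps[p] = parity(p)

g = np.diag([1.0, -1.0, -1.0, -1.0])
E = np.array([1.0, 2.0, 3.0])
B = np.array([0.5, -1.0, 0.25])
F_down = g @ F_of(E, B) @ g                        # F_{rho sigma}: lower both indices
F_dual = 0.5 * np.einsum('mnrs,rs->mn', eps, F_down)
assert np.allclose(F_dual, F_of(B, -E))            # E -> B, B -> -E
print(F_dual)
```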

This completes the invariance proof. Critical readers may find the definition of the dual tensor
somewhat unmotivated. Also, the excessive use of indices (something one normally tries
to avoid in physics) does not help to make the structure of the theory transparent. Indeed,
there exists a much more satisfactory, coordinate independent formulation of the theory in
terms of differential forms. However, as we do not assume familiarity of the reader with the
theory of forms, all we can do is encourage him/her to learn it and to advance to the truly
invariant formulation of electrodynamics not discussed in this text ...
Chapter 4

Lagrangian mechanics

Newton's equations of motion provide a complete description of mechanical motion. Even
from today's perspective, their scope is limited only by velocity (at high velocities v ∼ c,
Newtonian mechanics has to be replaced by its relativistic generalization) and classicality
(the dynamics of small bodies is affected by quantum effects). Otherwise, they remain fully
applicable, which is remarkable for a theory of that age.
Newtonian theory had its first striking successes in celes-
tial mechanics. But how useful is this theory in a context more
worldly than that of a planet in open space? To see the jus-
tification of this question, consider the system shown in the
figure, a setup known as the Atwood machine: two bodies
subject to gravitational force are tied to each other by an ideal-
ized massless string over an idealized frictionless pulley. What
makes this problem different from those considered in celestial
mechanics is that the participating bodies (the two masses)
are constrained in their motion. The question then arises how
this constraint can be incorporated into (Newton’s) mechan-
ical equations of motion, and how the ensuing equations can
be solved. This problem is of profound applied relevance – of the hundreds of motions taking
place in, say, the engine of a car practically all are constrained. In the eighteenth century,
the era initiating the age of engineering and industrialization, the availability of a formalism
capable of efficient formulation of problems subject to mechanical constraints became pressing.
Early solution schemes in terms of Newtonian mechanics relied on the concept of “con-
straining forces”. The strategy there was to formulate a problem in terms of its basal un-
constrained variables (e.g., the real space coordinates of the two masses in the figure). In
a second step one would then introduce (infinitely strong) constraining forces serving to
reduce the number of free coordinates (e.g., down to the one coordinate measuring the height
of one of the masses in the figure.) However, strategies of this type soon turned out to be
operationally sub-optimal. The need to find more efficient formulations was motivation for
intensive research activity, which eventually culminated in the modern formulations of classical
mechanics, Lagrangian and Hamiltonian mechanics.

81

INFO There are a number of conceptually different types of mechanical constraints: con-
straints that can be expressed in terms of equalities such as f (q1 , . . . , qn ; q̇1 , . . . , q̇n ) = 0 are
called holonomic. Here, q1 , . . . , qn are the basal coordinates of the problem, and q̇1 , . . . , q̇n
the corresponding velocities. (The constraint of the Atwood machine belongs to this category:
x1 + x2 = l, where l is a constant, and xi , i = 1, 2 are the heights of the two bodies measured with
respect to a common reference height.) This has to be contrasted to non-holonomic constraints,
i.e. constraints formulated by inequalities. (Think of the molecules moving in a piston. Their
coordinates obey the constraint 0 ≤ qj ≤ Lj , j = 1, 2, 3, where Lj are the extensions of the
piston.)
Constraints explicitly involving time, f(q₁, ..., q_n; q̇₁, ..., q̇_n; t) = 0, are called rheonomic. For
example, a particle constrained to move on a moving surface is subject to a rheonomic constraint.
Constraints void of explicit time dependence are called scleronomic.

In this chapter, we will introduce the concept of variational principles to derive Lagrangian (and
later Hamiltonian) mechanics from their Newtonian ancestor. Our construction falls short of
elucidating the beautiful and important historical developments that eventually led to the modern
formulation. Also, it is difficult to motivate in advance. However, within the framework of a
short introductory course, these shortcomings are outweighed by the brevity of the derivation.
Still, it is highly recommended to consult a textbook on classical mechanics to learn more
about the historical developments that led from Newtonian to Lagrangian mechanics.
Let us begin with a few simple and seemingly uninspired manipulations of Newton's
equations mq̈ = F for a single particle (the generalization to many particles will be obvious)
subject to a conservative force **F**. Notice that the l.h.s. of the equation may be written
as $m\ddot{\mathbf q} = d_t \partial_{\dot{\mathbf q}} T$, where $T = T(\dot{\mathbf q}) = \frac{m}{2}\dot{\mathbf q}^2$ is the particle's kinetic energy. By definition, the
conservative force affords a representation **F** = −∂_q U, where U = U(**q**) is the potential. This
means that Newton's equation may be equivalently represented as
$$(d_t \partial_{\dot q} - \partial_q)\, L(q, \dot q) = 0, \tag{4.1}$$

where we have defined the Lagrangian function,

L = T − U. (4.2)

But what is this reformulation good for? To appreciate the meaning of the mathematical
structure of Eq. (4.1), we need to introduce the purely mathematical concept of

4.1 Variational principles


In standard calculus, one is concerned with functions F(v) that take vectors v ∈ Rⁿ as
arguments. Variational calculus generalizes standard calculus in that one considers "functions"
F[f] taking functions as arguments. Now, "function of a function" does not sound nice. This
may be the reason why the "function" F is actually called a functional. Similarly, it is
customary to indicate the argument of a functional in square brackets. The generalization
v → f is not in the least mysterious: as we have seen in previous chapters, one may discretize
a function (cf. the discussion in section 1.3.4), f → {f_i | i = 1, ..., N}, thus interpreting it as

Figure 4.1: On the mathematical setting of functionals on curves (discussion, see text)

the limiting case of an N -dimensional vector; in many aspects, variational calculus amounts
to a straightforward generalization of standard calculus. At any rate, we will see that one may
work with functionals much like with ordinary functions.

4.1.1 Definitions
In this chapter we will focus on the class of functionals relevant to classical mechanics, viz.
functionals F[γ] taking curves in (subsets of) Rⁿ as arguments.¹ To be specific, consider the
set of all curves M ≡ {γ : I ≡ [t₀, t₁] → U} mapping an interval I into a subset U ⊂ Rⁿ of
n-dimensional space² (see Fig. 4.1). Now, consider a mapping
$$\Phi : M \to \mathbb{R}, \qquad \gamma \mapsto \Phi[\gamma], \tag{4.3}$$
assigning to each curve γ a real number, i.e. a (real) functional on M.

EXAMPLE The length of a curve is defined as
$$L[\gamma] \equiv \int_{t_0}^{t_1} dt\,\left(\sum_{i=1}^n \dot\gamma_i(t)\,\dot\gamma_i(t)\right)^{1/2}. \tag{4.4}$$

¹ We have actually met with more general functionals before. For example, the electric susceptibility χ[E]
is a functional of the electric field E : R⁴ → R³.
² The set U may actually be a lower dimensional submanifold of Rⁿ. For example, we might consider curves
in a two-dimensional plane embedded in R³, etc.

It assigns to each curve its euclidean length. Some readers may question the consistency of the
notation: on the l.h.s. we have the symbol γ (no derivatives), and on the r.h.s. γ̇ (temporal
derivatives.) However, there is no contradiction here. The notation [γ] indicates dependence on
the curve as a geometric object. By definition, this contains the full information on the curve,
including all derivatives. The r.h.s. indicates that the functional L reads only partial information
on the curve, viz. that contained in first derivatives.

Consider now two curves γ, γ' ∈ M that lie "close" to each other. (For example, we may
require that $|\gamma(t)-\gamma'(t)| \equiv \big(\sum_i (\gamma_i(t)-\gamma'_i(t))^2\big)^{1/2} < \epsilon$ for all t and some positive ε.) We
are interested in the increment Φ[γ] − Φ[γ']. Defining γ' = γ + h, the functional Φ is called
differentiable iff
$$\Phi[\gamma + h] - \Phi[\gamma] = F|_\gamma[h] + O(h^2), \tag{4.5}$$

where $F|_\gamma[h]$ is a linear functional of h, i.e. a functional obeying $F|_\gamma[c_1 h_1 + c_2 h_2] = c_1 F|_\gamma[h_1] + c_2 F|_\gamma[h_2]$ for c₁, c₂ ∈ R and h₁,₂ ∈ M. In (4.5), O(h²) indicates residual
contributions of order h²; for example, if |h(t)| < ε for all t, these terms would be of O(ε²).
The functional $F|_\gamma$ is called the differential of the functional Φ at γ. Notice that $F|_\gamma$ need
not depend linearly on γ. The differential generalizes the notion of a derivative to functionals.
Similarly, we may think of $\Phi[\gamma+h] = \Phi[\gamma] + F|_\gamma[h] + O(h^2)$ as a generalized Taylor expansion.
The linear functional $F|_\gamma$ describes the behavior of Φ in the vicinity of the reference curve γ.
A curve γ is called an extremal curve of Φ if $F|_\gamma = 0$.

EXERCISE Re-familiarize yourself with the definition of the derivative f'(x) of higher dimensional
functions f : Rᵏ → R. Interpret the functional Φ[γ] as the limit of a function Φ : Rᴺ → R, {γ_i} ↦
Φ({γ_i}), where the vector {γ_i | i = 1, ..., N} is a discrete approximation of the curve γ. Think how
the definition (4.5) generalizes the notion of differentiability and how $F|_\gamma \leftrightarrow f'(x)$ generalizes the
definition of a derivative.
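To make the discretization picture concrete, here is a minimal sketch (our own illustration, with N sample points standing in for the curve) of the length functional (4.4) as a function of finitely many variables; the straight line indeed yields the smaller value:

```python
import numpy as np

# Sketch: the length functional (4.4), discretized into a function of N samples.
# A straight line and a wiggly curve share the same endpoints; the line is shorter.
N = 1000
t = np.linspace(0.0, 1.0, N)

def length(gamma):                     # gamma: array of shape (N, n)
    dgamma = np.diff(gamma, axis=0)    # finite-difference velocities
    return np.sqrt((dgamma**2).sum(axis=1)).sum()

straight = np.stack([t, t], axis=1)    # line from (0,0) to (1,1)
wiggle = 0.05 * np.sin(2*np.pi*t)      # vanishes at both endpoints
wiggly = straight + np.stack([np.zeros(N), wiggle], axis=1)

print(length(straight), length(wiggly))   # ~1.414 < perturbed length
```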

This is about as much as we need to say/define in most general terms. In the next section
we will learn how to determine the extremal curves for an extremely important sub–family of
functionals.

4.1.2 Euler–Lagrange equations


In the following, we will focus on functionals that afford a "local representation"
$$S[\gamma] = \int_{t_0}^{t_1} dt\, L(\gamma(t), \dot\gamma(t), t), \tag{4.6}$$

where L : Rⁿ ⊕ Rⁿ ⊕ R → R is a function. The functional S is
"local" in that the integral kernel L does not depend on points on the
curve at different times. Local functionals play an important role
in applications. (For example, the length functional (4.4) belongs
to this family.) We will consider the restriction of local functionals
to the set of all curves γ ∈ M that begin and end at common terminal points: γ(t₀) = γ₀
and γ(t₁) = γ₁ with fixed γ₀ and γ₁ (see the figure). Again, this is a restriction motivated by
the applications below. To keep the notation slim, we will denote the space of all curves thus
restricted again by M.
We now prove the following important fact: the local functional S[γ] is differentiable and
its derivative is given by
$$F|_\gamma[h] = \int_{t_0}^{t_1} dt\, \left(\partial_\gamma L - d_t \partial_{\dot\gamma} L\right)\cdot h. \tag{4.7}$$
Here, we are using the shorthand notation $\partial_\gamma L \cdot h \equiv \sum_{i=1}^n \partial_{x_i} L(x, \dot\gamma, t)\big|_{x_i = \gamma_i}\, h_i$, and analogously
for $\partial_{\dot\gamma} L \cdot h$. Eq. (4.7) is verified by straightforward Taylor series expansion:

$$\begin{aligned}
S[\gamma+h] - S[\gamma] &= \int_{t_0}^{t_1} dt\,\left( L(\gamma+h, \dot\gamma+\dot h, t) - L(\gamma, \dot\gamma, t)\right) = \\
&= \int_{t_0}^{t_1} dt\,\left[ \partial_\gamma L\cdot h + \partial_{\dot\gamma} L\cdot \dot h \right] + O(h^2) = \\
&= \int_{t_0}^{t_1} dt\,\left[ \partial_\gamma L - d_t(\partial_{\dot\gamma} L)\right]\cdot h + \partial_{\dot\gamma} L\cdot h\Big|_{t_0}^{t_1} + O(h^2),
\end{aligned}$$
where in the last step we have integrated by parts. The surface term vanishes because
h(t₀) = h(t₁) = 0, on account of the condition γ(t_i) = (γ+h)(t_i) = γ_i, i = 0, 1. Comparison
with the definition (4.5) then readily gets us to (4.7).
Eq. (4.7) entails an important corollary: the local functional S is extremal on all curves
obeying the so-called Euler-Lagrange equations
$$\frac{d}{dt}\frac{\partial L}{\partial \dot\gamma_i} - \frac{\partial L}{\partial \gamma_i} = 0, \qquad i = 1, \ldots, N.$$
The reason is that if and only if these N conditions hold will the linear functional (4.7) vanish
on arbitrary curves h. (Exercise: assuming that one or several of the conditions above are
violated, construct a curve h on which the functional (4.7) does not vanish.)

Let us summarize what we have got: for a given function L : Rⁿ ⊕ Rⁿ ⊕ R → R,
the local functional
$$S : M \to \mathbb{R}, \qquad \gamma \mapsto S[\gamma] \equiv \int_{t_0}^{t_1} dt\, L(\gamma(t), \dot\gamma(t), t), \tag{4.8}$$
defined on the set M = {γ : [t₀, t₁] → Rⁿ | γ(t₀) = γ₀, γ(t₁) = γ₁}, is extremal
on curves obeying the conditions
$$\frac{d}{dt}\frac{\partial L}{\partial \dot\gamma_i} - \frac{\partial L}{\partial \gamma_i} = 0, \qquad i = 1, \ldots, N. \tag{4.9}$$
These equations are called the Euler-Lagrange equations of the functional S.
The function L is often called the Lagrangian (function) and S an action
functional.
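For concrete Lagrangians, the Euler-Lagrange equations (4.9) can be generated mechanically. A short sympy sketch (the example L = (m/2)q̇² − U(q) is our assumed illustration, not taken from the text) that reproduces Newton's equation:

```python
import sympy as sp

# Sketch: generate the Euler-Lagrange equation (4.9) mechanically, for the
# assumed example L = (m/2) qdot^2 - U(q) of one-dimensional Newtonian motion.
t, m = sp.symbols('t m')
q = sp.Function('q')(t)
U = sp.Function('U')

L = m/2 * q.diff(t)**2 - U(q)
EL = sp.diff(L.diff(q.diff(t)), t) - L.diff(q)   # d_t (dL/dqdot) - dL/dq
print(sp.simplify(EL))   # m*q'' + U'(q); setting it to zero is Newton's equation
```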

At this stage, one may observe a suspicious structural similarity between the Euler-Lagrange
equations and our early reformulation of Newton’s equations (4.1). However, before shedding
more light on this connection, it is worthwhile to illustrate the usage of Euler-Lagrange calculus
on an

EXAMPLE Let us ask which curve between fixed initial and final points γ₀ and γ₁ has
extremal curve length. Here, extremal means "shortest"; there is no "longest" curve. To answer
this question, we need to formulate and solve the Euler-Lagrange equations of the functional (4.4).
Differentiating its Lagrangian $L(\dot\gamma) = \left(\sum_{i=1}^n \dot\gamma_i\dot\gamma_i\right)^{1/2}$, we obtain (show it)
$$\frac{d}{dt}\frac{\partial L}{\partial \dot\gamma_i} - \frac{\partial L}{\partial \gamma_i} = \frac{\ddot\gamma_i}{L(\dot\gamma)} - \frac{\dot\gamma_i\,(\dot\gamma_j\ddot\gamma_j)}{L(\dot\gamma)^3}.$$
These equations are solved by all curves connecting γ₀ and γ₁ as a straight line. For example,
the "constant velocity" curve $\gamma(t) = \gamma_0\frac{t-t_1}{t_0-t_1} + \gamma_1\frac{t-t_0}{t_1-t_0}$ has γ̈_i = 0, thus solving the equation.
(Exercise: the most general curve connecting γ₀ and γ₁ by a straight line is given by γ(t) =
γ₀ + f(t)(γ₁ − γ₀), where f(t₀) = 0 and f(t₁) = 1. In general, these curves describe "accelerated
motion", γ̈ ≠ 0. Still, they solve the Euler-Lagrange equations, i.e. they are of shortest geometric
length. Show it!)

4.1.3 Coordinate invariance of Euler-Lagrange equations


What we actually mean when we write $\frac{\partial L}{\partial \gamma_i(t)}$ is: differentiate the function L(γ, γ̇, t) w.r.t. the
ith coordinate of the curve γ at time t. However, the same curve can be represented in different
coordinates! For example, a three dimensional curve γ will have different representations
depending on whether we work in cartesian coordinates γ(t) ↔ (x₁(t), x₂(t), x₃(t)) or spherical
coordinates γ(t) ↔ (r(t), θ(t), φ(t)).


Figure 4.2: On the representation of curves and functionals in different coordinates

Yet, nowhere in our derivation of the Euler–Lagrange equations did we make reference to
specific properties of the coordinate system. This suggests that the Euler–Lagrange equations
assume the same form, Eq. (4.9), in all coordinate systems. (To appreciate the meaning
of this statement, compare with the Newton equations which assume their canonical form
ẍi = fi (x) only in cartesian coordinates.) Coordinate changes play an extremely important
role in analytical mechanics, especially when it comes to problems with constraints. Hence,
anticipating that the Euler–Lagrange formalism is slowly revealing itself as a replacement of
Newton’s formulation, it is well invested time to take a close look at the role of coordinates.
Both a curve γ and the functional S[γ] are canonical objects; no reference to coordinates
is made here. The curve is simply a map γ : I → U, and S[γ] assigns to that map a number.
However, most of the time when we actually need to work with a curve, we do so in a system
of coordinates. Mathematically, a coordinate system of U is a diffeomorphic³ map
$$\phi : U \to V, \qquad x \mapsto \phi(x) \equiv (x_1, \ldots, x_m),$$
where the coordinate domain V is an open subset of Rᵐ and m is the dimensionality of
U ⊂ Rⁿ. (The set U may be of lower dimension than the embedding space Rⁿ.) For example,
spherical coordinates ]0, π[ × ]0, 2π[ ∋ (θ, φ) ↦ S² ⊂ R³ assign to each coordinate pair (θ, φ)
a point on the two-dimensional sphere, etc.⁴
³ Loosely speaking, this means invertible and smooth (differentiable).
⁴ We here avoid the discussion of the complications arising when U cannot be covered by a single coordinate
"chart" φ. For example, the sphere S² cannot be fully covered by a single coordinate mapping. (Think why.)

Given a coordinate system φ, the abstract curve γ defines a curve x ≡ φ ∘ γ : I → V; t ↦
x(t) = φ(γ(t)) in coordinate space (see Fig. 4.2). What we actually mean when we refer to
γ_i(t) in the previous sections are the coordinates x_i(t) of γ in the coordinate system φ. (In
x(t) = φ(γ(t)) in coordinate space (see Fig. ??.) What we actually mean when we refer to
γi (t) in the previous sections, are the coordinates xi (t) of γ in the coordinate system φ. (In
cases where unambiguous reference to one coordinate system is made, it is customary to avoid
the introduction of extra symbols (xi ) and to write γi instead. As long as one knows what
one is doing, no harm is done.) Similarly, the abstract functional S[γ] defines a functional
Sc [x] ≡ S[φ−1 ◦x] on curves in the coordinate domain. In our derivation of the Euler-Lagrange
equations we have been making tacit use of a coordinate representation of this kind. In other
words, the Euler–Lagrange equations were actually derived for the representation Sc [x]. (Again
it is customary to simply write S[γ] if unambiguous reference to a coordinate representation
is made. Other customary notations include S[x] (omitting the subscript c), or just S[γ] if
γ is tacitly identified with a specific coordinate representation. Occasionally one writes S[γ]
where γ is meant to be the vector of coordinates of γ in a specific system.)
Now, suppose we are given another coordinate representation of U, that is, a diffeomorphism φ' : U → V', x ↦ φ'(x) = x' ≡ (x'₁, ..., x'_m). A point on the curve, γ(t), now has two
different coordinate representations, x(t) = φ(γ(t)) and x'(t) = φ'(γ(t)). The coordinate
transformation between these representations is mediated by the map
$$\phi' \circ \phi^{-1} : V \to V', \qquad x \mapsto x' = \phi'\circ\phi^{-1}(x).$$
By construction, this is a smooth and invertible map between open subsets of Rᵐ. For example,
if x = (x₁, x₂, x₃) are cartesian coordinates and x' = (r, θ, φ) are spherical coordinates, we
would have $x'_1(x) = (x_1^2 + x_2^2 + x_3^2)^{1/2}$, etc.
The most important point is that x(t) and x'(t) describe the same curve γ(t), only in
different representations. Specifically, if the reference curve γ is an extremal curve, the coordinate representations x : I → V and x' : I → V' will be extremal curves, too. According to
our discussion in section 4.1.2 both curves must be solutions of the Euler-Lagrange equations,
i.e. we can draw the conclusion:
$$\gamma \text{ extremal} \;\Rightarrow\; \frac{d}{dt}\frac{\partial L}{\partial \dot x_i} - \frac{\partial L}{\partial x_i} = 0 \quad\text{and}\quad \frac{d}{dt}\frac{\partial L'}{\partial \dot x'_i} - \frac{\partial L'}{\partial x'_i} = 0, \tag{4.10}$$
where L'(x', ẋ', t) ≡ L(x(x'), ẋ(x', ẋ'), t) is the Lagrangian in the φ'-coordinate representation.
In other words
In other words

The Euler–Lagrange equations are coordinate invariant.

They assume the same form in all coordinate systems.



EXERCISE Above we have shown the coordinate invariance of the Euler-Lagrange equations by
conceptual reasoning. However, it must be possible to obtain the same invariance properties by
brute force computation. To show that the second set of equations in (4.10) follows from the first, use the
chain rule, $d_t x'_i = \sum_j \frac{\partial x'_i}{\partial x_j}\, d_t x_j$, and its immediate consequence $\frac{\partial \dot x'_i}{\partial \dot x_j} = \frac{\partial x'_i}{\partial x_j}$.

EXAMPLE Let us illustrate the coordinate invariance of the variational formalism on the example
of the functional "curve length" discussed on p. 86. Considering the case of curves in the plane,
n = 2, we might get the idea to attack this problem in polar coordinates, φ(x) = (r, ϕ). The
polar coordinate representation of the cartesian Lagrangian $L(x_1, x_2, \dot x_1, \dot x_2) = (\dot x_1^2 + \dot x_2^2)^{1/2}$ reads
(verify it)
$$L(r, \varphi, \dot r, \dot\varphi) = (\dot r^2 + r^2\dot\varphi^2)^{1/2}.$$
It is now straightforward to compute the Euler-Lagrange equations
$$\frac{d}{dt}\frac{\partial L}{\partial \dot r} - \frac{\partial L}{\partial r} = \dot\varphi(\ldots) \stackrel{!}{=} 0, \qquad \frac{d}{dt}\frac{\partial L}{\partial \dot\varphi} - \frac{\partial L}{\partial \varphi} = \dot\varphi(\ldots) \stackrel{!}{=} 0.$$
Here, the notation ϕ̇(...) indicates that we are getting a lengthy list of terms which, however,
are all weighted by ϕ̇. Putting the initial point into the origin, φ(γ₀) = (0, 0), and the final point
somewhere into the plane, φ(γ₁) = (r₁, ϕ₁), we thus conclude that the straight line connectors,
φ(γ(t)) = (r(t), ϕ₀), are solutions of the Euler-Lagrange equations. (It is less straightforward to
show that these are the only solutions.)
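The polar-coordinate claim can also be verified symbolically. A small sympy sketch (the test curve r(t) = t, ϕ = const is our own choice of ray) confirming that a straight ray through the origin solves both Euler-Lagrange equations:

```python
import sympy as sp

# Sketch: check that a straight ray through the origin, r(t) = t, phi(t) = phi0,
# solves both polar Euler-Lagrange equations of the length functional.
t, phi0 = sp.symbols('t phi0')
r = sp.Function('r')(t)
ph = sp.Function('phi')(t)
L = sp.sqrt(r.diff(t)**2 + r**2 * ph.diff(t)**2)

def EL(f):                                      # Euler-Lagrange expression for f
    return sp.diff(L.diff(f.diff(t)), t) - L.diff(f)

ray = {r: t, ph: phi0}                          # the test curve
print(sp.simplify(EL(r).subs(ray).doit()),      # -> 0
      sp.simplify(EL(ph).subs(ray).doit()))     # -> 0
```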

With these preparations in store, we are now in a very good position to apply variational
calculus to a new and powerful reformulation of Newtonian mechanics.

4.2 Lagrangian mechanics


4.2.1 The idea
Imagine a mechanical problem subject to constraints.
For definiteness, we may consider the system shown
on the right – a bead sliding on a string and subject
to the gravitational force, Fg . In principle, we may
describe the situation in terms of Newton’s equa-
tions. These equations must contain an “infinitely
strong” force Fc whose sole function is to keep
the bead on the string. About this force we do not
know much (other than that it acts perpendicular to
the string, and vanishes right on the string).
According to the structures outlined above, we
may now reformulate Newton’s equations as (4.1), where the potential V = Vg + Vc in the
Lagrangian L = T − V contains two contributions, one accounting for the gravitational force,

Vg, and another, Vc, for the force Fc. We also know that the sought-for solution curve
q : I → R³, t ↦ q(t), will extremize the action $S[q] = \int_{t_0}^{t_1} dt\, L(q, \dot q)$. (Our problem does not
include explicit time dependence, i.e. L does not carry a time argument.) So far, we have
not gained much. But let us now play the trump card of the new formulation, its invariance
under coordinate changes.
In the above formulation of the problem, we are seeking an extremum of S[q] on the set of
all curves I → R³. However, we know that all curves in R³ will be subject to the "infinitely
strong" potential of the constraint force, unless they lie right on the string S. The action of
those generic curves will be infinitely large and we may remove them from the set of curves
entering the variational procedure from the outset. This observation suggests to represent the
problem in terms of coordinates (s, q⊥ ), where the one–dimensional coordinate s parameterizes
the curve, and the (3 − 1)–dimensional coordinate vector q⊥ parameterizes the space normal
to the curve. Knowing that curves with non-vanishing q⊥ (t) will have an infinitely large action,
we may restrict the set of curves under consideration to M ≡ {γ : I → S, t 7→ (s(t), 0)},
i.e. to curves in S. On the string, S, the constraint force Fc vanishes. The above limitation
thus entails that the constraint forces will never explicitly appear in the variational procedure;
this is an enormous simplification of the problem. Also, our problem has effectively become
one–dimensional. The Lagrangian evaluated on curves in S is a function L(s(t), ṡ(t)). It is
much simpler than the original Lagrangian L(q, q̇(t)) with its constraint force potential.
Before generalizing these ideas to a general solution strategy of problems with constraints,
let us illustrate its potential on a concrete problem.

EXAMPLE We consider the Atwood machine depicted on p. 81. The Lagrangian of this problem
reads
$$L(\mathbf q_1, \mathbf q_2, \dot{\mathbf q}_1, \dot{\mathbf q}_2) = \frac{m_1}{2}\dot{\mathbf q}_1^2 + \frac{m_2}{2}\dot{\mathbf q}_2^2 - m_1 g x_1 - m_2 g x_2.$$
Here, q_i is the position of mass i, i = 1, 2, and x_i is its height. In principle, the sought-for solution
γ(t) = (q1 (t), q2 (t)) is a curve in six dimensional space. We assume that the masses are released
at rest at initial coordinates (xi (t0 ), yi (t0 ), zi (t0 )). The coordinates yi and zi will not change in
the process, i.e. effectively we are seeking for solution curves in the two–dimensional space of
coordinates (x1 , x2 ). The constraint now implies that x ≡ x1 = l − x2 , or x2 = l − x. Thus,
our solution curves are uniquely parameterized by the “generalized coordinate” x as (x1 , x2 ) =
(x, l − x). We now enter with this parameterization into the Lagrangian above to obtain

$$L(x, \dot x) = \frac{m_1+m_2}{2}\,\dot x^2 - (m_1 - m_2)\, g x + \text{const}.$$
It is important to realize that this function uniquely specifies the action
$$S[x] = \int_{t_0}^{t_1} dt\, L(x(t), \dot x(t))$$
of physically allowed curves. The extremal curve, which then describes the actual motion of the
two-body system, will be a solution of the equation
$$d_t\frac{\partial L}{\partial \dot x} - \frac{\partial L}{\partial x} = (m_1+m_2)\,\ddot x + (m_1-m_2)\, g = 0.$$

For the given initial conditions x(t₀) = x₁(t₀) and ẋ(t₀) = 0, this equation is solved by
$$x(t) = x(t_0) - \frac{m_1 - m_2}{m_1 + m_2}\,\frac{g}{2}\,(t - t_0)^2.$$
Substitution of this solution into q₁ = (x, y₁, z₁) and q₂ = (l − x, y₂, z₂) solves our problem.
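As a sanity check, the reduced equation of motion can also be integrated numerically and compared with the closed-form solution. A minimal sketch with assumed example parameters:

```python
import numpy as np

# Sketch: integrate the Atwood equation (m1+m2) xddot = -(m1-m2) g and compare
# with the closed-form solution above. Parameter values are arbitrary examples.
m1, m2, g = 2.0, 1.0, 9.81
a = -(m1 - m2) / (m1 + m2) * g        # constant acceleration of the x coordinate

dt, T = 1e-4, 2.0
x, v = 0.0, 0.0                       # released at rest at x(t0) = 0
for _ in range(int(T / dt)):          # simple explicit Euler integration
    x, v = x + dt*v, v + dt*a

x_exact = -(m1 - m2) / (m1 + m2) * g/2 * T**2
print(x, x_exact)                     # agree up to the O(dt) integration error
```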

4.2.2 Hamilton’s principle


After this preparation, we are in a position to formulate a new approach to solving mechanical
problems. Suppose we are given an N-particle setup specified by the following data: (i) a Lagrangian function L : R^{6N+1} → R, (x₁, ..., x_N, ẋ₁, ..., ẋ_N, t) ↦ L(x₁, ..., x_N, ẋ₁, ..., ẋ_N, t),
defined in the 6N-dimensional space needed to register the coordinates and velocities of N
particles, and (ii) a set of constraints limiting the motion of particles to an f-dimensional
submanifold⁵ of R^{3N}. Mathematically, a (holonomic) set of constraints will be implemented
through 3N − f equations
$$F_j(\mathbf x_1, \ldots, \mathbf x_N, t) = 0, \qquad j = 1, \ldots, 3N - f. \tag{4.11}$$

The number f is called the number of degrees of freedom of the problem.


Hamilton's principle states that such a problem is to be solved by a three-step algorithm:

. Resolve the constraints (4.11) in terms of f parameters q ≡ (q1 , . . . , qf ), i.e. find a


representation xl (q), l = 1, . . . , N , such that the constraints Fj (x1 (q), . . . , xN (q)) = 0
are resolved for all j = 1, . . . , 3N − f . The parameters qi are called generalized coor-
dinates of the problem. The maximal set of parameter configurations q ≡ (q1 , . . . qf )
compatible with the constraint defines a subset V ⊂ Rf and the map V → R3N , q 7→
(x1 (q), . . . , xN (q)) defines an f –dimensional submanifold of R3N .

. Reduce the Lagrangian of the problem to an effective Lagrangian

L(q, q̇, t) ≡ L(x1 (q), . . . , xN (q), ẋ1 (q), . . . , ẋN (q), t).

In practice, this amounts to a substitution of xi (t) = xi (q(t)) into the original La-
grangian. The effective Lagrangian is a function L : V × Rf × R → R, (q, q̇, t) 7→
L(q, q̇, t).

. Finally, formulate and solve the Euler-Lagrange equations
$$\frac{d}{dt}\frac{\partial L(q, \dot q, t)}{\partial \dot q_i} - \frac{\partial L(q, \dot q, t)}{\partial q_i} = 0, \qquad i = 1, \ldots, f. \tag{4.12}$$
⁵ Loosely speaking, a d-dimensional submanifold of Rⁿ is a subset of Rⁿ that affords a smooth parameterization in terms of d < n coordinates (i.e. is locally diffeomorphic to open subsets of R^d). Think of a
smooth surface in three-dimensional space (n = 3, d = 2) or a line (n = 3, d = 1), etc.

The prescription above is equivalent to the following statement, which is known as Hamilton's
principle:

Consider a mechanical problem formulated in terms of f generalized coordinates q =
(q₁, ..., q_f) and a Lagrangian L(q, q̇, t) = (T − U)(q, q̇, t). Let q(t) be a solution of
the Euler-Lagrange equations
$$(d_t \partial_{\dot q_i} - \partial_{q_i})\, L(q, \dot q, t) = 0, \qquad i = 1, \ldots, f,$$
at given initial and final configurations q(t₀) = q₀ and q(t₁) = q₁. This curve describes
the physical motion of the system. It is an extremal curve of the action functional
$$S[q] = \int_{t_0}^{t_1} dt\, L(q, \dot q, t).$$

To conclude this section, let us transfer a number of important physical quantities from
Newtonian mechanics to the more general framework of Lagrangian mechanics: the generalized momentum associated to a generalized coordinate q_i is defined as
$$p_i = \frac{\partial L(q, \dot q, t)}{\partial \dot q_i}. \tag{4.13}$$
Notice that for cartesian coordinates, $p_i = \partial_{\dot q_i} L = \partial_{\dot q_i} T = m\dot q_i$ reduces to the familiar
momentum variable. We call a coordinate q_i a cyclic variable if it does not enter the
Lagrangian function, that is, if ∂_{q_i} L = 0. These two definitions imply that

the generalized momentum corresponding to a cyclic variable is


conserved, dt pi = 0.

This follows from dt pi = dt ∂q̇i L = ∂qi L = 0. In general, we call the derivative ∂qi L ≡ Fi
a generalized force. In the language of generalized momenta and forces, the Lagrange
equations (formally) assume the form of Newton-like equations,

dt pi = F i .

For the convenience of the reader, the most important quantities revolving around the La-
grangian formalism are summarized in table 4.1.

4.2.3 Lagrange mechanics and symmetries


Above, we have emphasized the capacity of the Lagrange formalism to handle problems with
constraints. However, an advantage of equal importance is its flexibility in the choice of
problem adjusted coordinates: unlike the Newton equations, the Lagrange equations maintain
their form in all coordinate systems.

quantity                    designation or definition
----------------------------------------------------------------
generalized coordinate      q_i
generalized momentum        p_i = ∂_{q̇_i} L
generalized force           F_i = ∂_{q_i} L
Lagrangian                  L(q, q̇, t) = (T − U)(q, q̇, t)
Action (functional)         S[q] = ∫_{t₀}^{t₁} dt L(q, q̇, t)
Euler-Lagrange equations    (d_t ∂_{q̇_i} − ∂_{q_i}) L = 0

Table 4.1: Basic definitions of Lagrangian mechanics

The choice of “good coordinates” becomes instrumental in problems with symmetries.


From experience we know that a symmetry (think of z–axis rotational invariance) entails the
conservation of a physical variable (the z–component of angular momentum), and that it
is important to work in coordinates reflecting the symmetry (z–axis cylindrical coordinates.)
But how do we actually define the term “symmetry”? And how can we find the ensuing
conservation laws? Finally, how do we obtain a system of symmetry–adjusted coordinates? In
this section, we will provide answers to these questions.

4.2.4 Noether theorem

Consider a family of mappings, hs , of the coordinate


manifold of a mechanical system into itself,

$$h_s : V \to V, \qquad q \mapsto h_s(q). \tag{4.14}$$

Here, s ∈ R is a control parameter, and we require that
h₀ = id is the identity transform. By way of example, consider the map h_s(r, θ, φ) ≡ (r, θ, φ + s),
describing a rotation around the z-axis in the language of spherical coordinates, etc. For
each curve q : I → V, t ↦ q(t), the map h_s gives us a new curve h_s ∘ q : I → V, t ↦ h_s(q(t))
(see the figure).
We call the transformation hs a symmetry (transformation) of a mechanical system iff

S[hs ◦ q] = S[q],

for all s and curves q. The action is then said to be invariant under the symmetry transfor-
mation.
Suppose, we have found a symmetry and its representation in terms of a family of in-
variant transformations. Associated to that symmetry there is a quantity that is conserved
during the dynamical evolution of the system. The correspondence between symmetry and its
conservation law is established by a famous result due to Emmy Noether:

Noether theorem (1915-18): Let the action S[q] be invariant under the transformation q ↦ h_s(q). Let q(t) be a solution curve of the system (a solution of the
Euler-Lagrange equations). Then, the quantity
$$I(q, \dot q) \equiv \sum_{i=1}^f p_i\,\frac{d}{ds}\big(h_s(q)\big)_i \tag{4.15}$$
is dynamically conserved:
$$d_t\, I(q, \dot q) = 0.$$
Here, $p_i = \partial_{\dot q_i} L\big|_{(q,\dot q)=(h_s(q),\, \dot h_s(q))}$ is the generalized momentum of the ith coordinate,
and the quantity I(q, q̇) is known as the Noether momentum.

The proof of Noether's theorem is straightforward: the requirement of action invariance under
the transformation h_s is equivalent to the condition $d_s S[h_s(q)] = 0$ for all values of s. We
now consider the action
$$S[q] = \int_{t_0}^{t_f} dt\, L(q, \dot q, t)$$
a solution curve samples during a time interval [t₀, t_f]. Here, t_f is a free variable, and the
transformed value h_s(q(t_f)) may differ from q(t_f). We next explore what the condition of
action invariance tells us about the Lagrangian of the theory:
$$\begin{aligned}
0 &\stackrel{!}{=} d_s \int_{t_0}^{t_f} dt\, L(h_s(q), \dot h_s(q), t) \\
&= \int_{t_0}^{t_f} dt\,\left[ \partial_{q_i} L\big|_{q=h_s(q)}\, d_s\big(h_s(q)\big)_i + \partial_{\dot q_i} L\big|_{\dot q=\dot h_s(q)}\, d_s\big(\dot h_s(q)\big)_i \right] \\
&= \int_{t_0}^{t_f} dt\,\left(\partial_{q_i} L - d_t \partial_{\dot q_i} L\right)\Big|_{(h_s(q),\,\dot h_s(q))}\, d_s\big(h_s(q)\big)_i + \partial_{\dot q_i} L\big|_{\dot q=\dot h_s(q)}\, d_s\big(h_s(q)\big)_i \Big|_{t_0}^{t_f}.
\end{aligned}$$
Now, our reference curve is a solution curve, which means that the integrand in the last line
vanishes. We are thus led to the conclusion $\partial_{\dot q_i} L\big|_{\dot q=\dot h_s(q)}\, d_s\big(h_s(q)\big)_i\Big|_{t_0}^{t_f} = 0$ for all values of t_f, and
this is equivalent to the constancy of the Noether momentum.
It is often convenient to evaluate both the symmetry transformation h_s and the Noether
momentum I = I(s) at infinitesimal values of the control parameter s. In practice, this
means that we fix a pair (q, q̇) comprising coordinate and velocity of a solution curve. We
then consider an infinitesimal transformation h_ε(q), where ε is infinitesimally small. The
Noether momentum is then obtained as
$$I(q, \dot q) = \sum_{i=1}^f p_i\,\frac{d}{d\epsilon}\big(h_\epsilon(q)\big)_i. \tag{4.16}$$

It is usually convenient to work in symmetry adjusted coordinates, i.e. in coordinates
where the transformation h_s assumes its simplest possible form. These are coordinates where
h_s acts by translation in one coordinate direction, that is, (h_s(q))_i = q_i + sδ_{ij}, where j is the
affected coordinate direction. Coordinates adjusted to a symmetry are cyclic (think about this
point), and the Noether momentum
$$I = p_j$$
collapses to the standard momentum of the symmetry coordinate.

4.2.5 Examples
In this section, we will discuss two prominent examples of symmetries and their conservation
laws.

Translational invariance ↔ conservation of momentum


Consider a mechanical system that is invariant under translation in some direction. Without
loss of generality, we choose cartesian coordinates q = (q₁, q₂, q₃) in such a way that the
coordinate q₁ parameterizes the invariant direction. The symmetry transformation h_s then
translates in this direction: h_s(q) = q + s e₁, or h_s(q) = (q₁ + s, q₂, q₃). With d_s h_s(q) = e₁,
we readily obtain
$$I(q, \dot q) = \frac{\partial L}{\partial \dot q_1} = p_1,$$
where p1 = mq̇1 is the ordinary cartesian momentum of the particle. We are thus led to the
conclusion

Translational invariance entails the conservation of cartesian momentum.

EXERCISE Generalize the construction above to a system of N particles in the absence of external
potentials. The (interaction) potential of the system then depends only on coordinate differences
q_i − q_j. Show that translation in arbitrary directions is a symmetry of the system and that this
implies the conservation of the total momentum $P = \sum_{j=1}^N m_j \dot q_j$.

Rotational invariance ↔ conservation of angular momentum


As a second example, consider a system invariant under rotations around the 3-axis. In
cartesian coordinates, the corresponding symmetry transformation is described by
$$q \mapsto h_\phi(q) \equiv R^{(3)}_\phi\, q,$$
where the angle φ serves as a control parameter (i.e. φ assumes the role of the parameter s
above), and
$$R^{(3)}_\phi \equiv \begin{pmatrix} \cos\phi & \sin\phi & 0 \\ -\sin\phi & \cos\phi & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

is an O(3) matrix generating rotations by the angle φ around the 3-axis. The infinitesimal
variant of a rotation is described by
$$R^{(3)}_\epsilon \equiv \begin{pmatrix} 1 & \epsilon & 0 \\ -\epsilon & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} + O(\epsilon^2).$$
From this representation, we obtain the Noether momentum as
$$I(q, \dot q) = \frac{\partial L}{\partial \dot q_i}\,\frac{d}{d\epsilon}\bigg|_{\epsilon=0}\big(R^{(3)}_\epsilon q\big)_i = m(\dot q_1 q_2 - \dot q_2 q_1).$$

This is (the negative of) the 3-component of the particle’s angular momentum. Since the
choice of the 3–axis as a reference axis was arbitrary, we have established the result:

Rotational invariance around a symmetry axis entails the conservation of the


angular momentum component along that axis.

Now, we have argued above that in cases with symmetries one should employ adjusted coordinates. Presently, this means coordinates that are organized around the 3-axis: spherical
or cylindrical coordinates. Choosing cylindrical coordinates for definiteness, the Lagrangian
assumes the form
$$L(r, \phi, \dot r, \dot\phi, \dot z) = \frac{m}{2}\left(\dot r^2 + r^2\dot\phi^2 + \dot z^2\right) - U(r, z), \tag{4.17}$$
where we noted that the Lagrangian of a rotationally invariant system does not depend on φ.
The symmetry transformation now simply acts by translation, h_s(r, z, φ) = (r, z, φ + s), and
the Noether momentum is given by
$$I \equiv l_3 = \frac{\partial L}{\partial \dot\phi} = mr^2\dot\phi. \tag{4.18}$$

We recognize this as the cylindrical coordinate representation of the 3–component of angular


momentum.
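The conservation law can be watched at work numerically. The sketch below (the potential U(r) = −1/r, unit mass, and the initial data are our assumed example) integrates planar motion and monitors l₃ = mr²φ̇, which stays constant up to integration error:

```python
import numpy as np

# Sketch: planar motion in the central potential U(r) = -1/r (m = 1 assumed),
# integrated with a velocity-Verlet step; the Noether momentum l3 = m r^2 phidot
# = m (q1 v2 - q2 v1) stays constant up to integration error.
m = 1.0

def acc(q):                                   # acceleration for U(r) = -1/r
    return -q / np.linalg.norm(q)**3          # F = -grad U

q = np.array([1.0, 0.0])
v = np.array([0.0, 0.8])
dt, n_steps = 1e-4, 20000
for step in range(n_steps + 1):
    if step % 10000 == 0:
        l3 = m * (q[0]*v[1] - q[1]*v[0])      # cartesian form of m r^2 phidot
        print(f"t = {step*dt:3.1f}:  l3 = {l3:.10f}")
    v_half = v + 0.5*dt*acc(q)/m
    q = q + dt*v_half
    v = v_half + 0.5*dt*acc(q)/m
```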

EXAMPLE Let us briefly extend the discussion above to re-derive the symmetry optimized representation of a particle in a central potential. We choose cylindrical coordinates such that (a)
the force center lies in the origin, and (b) at time t = 0 both q and q̇ lie in the z = 0 plane.
Under these conditions, the motion will stay in the z = 0 plane (exercise: show this from the
Euler-Lagrange equations), and the cylindrical coordinates (r, φ, z) can be reduced to the polar
coordinates (r, φ) of the invariant z = 0 plane.
The reduced Lagrangian reads
$$L(r, \dot r, \dot\phi) = \frac{m}{2}\left(\dot r^2 + r^2\dot\phi^2\right) - U(r).$$
From our discussion above, we know that l ≡ l₃ = mr²φ̇ is
a constant, and this enables us to express the angular velocity
φ̇ = l/mr² in terms of the radial coordinate. This leads us to
the effective Lagrangian of the radial coordinate,
$$L(r, \dot r) = \frac{m}{2}\dot r^2 - \frac{l^2}{2mr^2} - U(r).$$
(Mind the sign of the angular term: naively substituting φ̇ = l/mr² into the Lagrangian would flip it; the elimination must reproduce the radial Euler-Lagrange equation mr̈ = mrφ̇² − ∂_r U of the full problem.) The Euler-Lagrange equation of this effective Lagrangian,
$$m\ddot r = -\partial_r U + \frac{l^2}{mr^3},$$
has been discussed in section *
Chapter 5

Hamiltonian mechanics

The Lagrangian approach has introduced a whole new degree of flexibility into mechanical
theory building. Still, there is room for further development. To see how, notice that in the
last chapter we have consistently characterized the state of a mechanical system in terms of
the 2f variables (q1 , . . . , qf , q̇1 , . . . , q̇f ), the generalized coordinates and velocities. It is clear
that we need 2f variables to specify the state of a mechanical system. But are coordinates
and velocities necessarily the best variables? The answer is: often yes, but not always. In
our discussion of symmetries in section 4.2.3 above, we have argued that in problems with
symmetries, one should work with variables qi that transform in the simplest possible way (i.e.
additively) under symmetry transformations. If so, the corresponding momentum pi = ∂q̇i L is
conserved. Now, if it were possible to express the velocities uniquely in terms of coordinates
and momenta, q̇ = q̇(q, p), we would be in possession of an alternative set of variables (q, p)
such that in a situation with symmetries a fraction of our variables stays constant. And that’s
the simplest type of dynamics a variable can show.
In this chapter, we show that the reformulation of mechanics in terms of coordinates and
momenta as independent variables is indeed an option. The resulting description is called Hamiltonian mechanics. Important features of this new approach include:
. It exhibits a maximal degree of flexibility in the choice of problem adjusted coordinates.
. The variables (q, p) live in a mathematical space, so called “phase space”, that carries
a high degree of mathematical structure (much more than an ordinary vector space.)
This structure turns out to be of invaluable use in the solution of complex problems.
Relatedly,
. Hamiltonian theory is the method of choice in the solution of advanced mechanical
problems. For example, the theory of (conservative) chaotic systems is almost exclusively
formulated in the Hamiltonian approach.
. Hamiltonian mechanics is the gateway into quantum mechanics. Virtually all concepts
introduced below have a direct quantum mechanical extension.
However, this does not mean that Hamiltonian mechanics is “better” than the Lagrangian
theory. The Hamiltonian approach has its specific advantages. However, in some cases it


may be preferable to stay on the level of the Lagrangian theory. At any rate, Lagrangian
and Hamiltonian mechanics form a pair that represents the conceptual basis of many modern
theories of physics, not just mechanics. For example, electrodynamics (classical and quantum)
can be formulated in terms of a Lagrangian and a Hamiltonian theory, and these formulations
are strikingly powerful.

5.1 Hamiltonian mechanics


The Lagrangian L = L(q, q̇, t) is a function of coordinates and velocities. What we are after is
a function H = H(q, p, t) that is a function of coordinates and momenta, where the momenta
and velocities are related to each other as pi = ∂q̇i L = pi (q, q̇, t). Once in possession of the
new function H, there must be a way to express the key information carriers of the theory –
the Euler Lagrange equations – in the language of the new variables. In mathematics, there
exist different ways of formulating variable changes of this type, and one of them is known as

5.1.1 Legendre transform


Reducing the notation to a necessary minimum, the task formulated above amounts to the
following problem: given a function f (x) (here, f assumes the role of L and x represents q̇i ),
consider the variable z = ∂x f (x) = z(x) (z represents the momentum pi .) If this equation is
invertible, i.e. if a representation x = x(z) exists, find a partner function g(z), such that g
carries the same amount of information as f . The latter condition means that we are looking
for a transformation, i.e. a mapping between functions f (x) → g(z) that can be inverted
g(z) → f (x), so that the function f (x) can be reconstructed in unique terms from g(z).
If this latter condition is met, it must be possible to express any mathematical operation formulated for the function f in terms of a corresponding operation on the function g (cf. the analogous situation with the Fourier transform.)
Let us now try to find a transformation that meets the criteria above. The most obvious guess might be a direct variable substitution: compute z = ∂x f(x) = z(x), invert to x = x(z), and substitute this into f, i.e. g(z) = f(x(z)). This idea goes in the right direction but is not quite good enough. The problem is that in this way information stored in the function f may get lost. To exemplify the situation, consider the function

f(x) = C exp(x),

where C is a constant. Now, z = ∂x f(x) = C exp(x), which means x = ln(z/C). Substitution back into f gets us to

g(z) = C exp(ln(z/C)) = z.

The function g no longer knows about C, so information on this constant has been irretrievably
lost. (Knowledge of g(z) is not sufficient to re-construct f (x).)
However, it turns out that a slight extension of the above idea does the trick. Namely, consider the so called Legendre transform of f(x),

g(z) ≡ f(x(z)) − z x(z). (5.1)

We claim that g(z) defines a transformation of functions. To verify this, we need to show that the function f(x) can be obtained from g(z) by some suitable inverse transformation. It turns out that (up to a harmless sign change) the Legendre transform is self-inverse; just apply it once more and you get back to the function f. Indeed, let us define the variable y ≡ ∂z g(z) = ∂x f(x(z)) ∂z x(z) − z ∂z x(z) − x(z). Now, by definition of the function z(x), we have ∂x f(x(z)) = z(x(z)) = z. Thus, the first two terms cancel, and we have y(z) = −x(z). We next compute the Legendre transform of g(z):

h(y) = f(x(z(y))) − x(z(y)) z(y) − z(y) y.

Evaluating the relation y(z) = −x(z) on the specific argument z(y), we get y = y(z(y)) = −x(z(y)), i.e. f(x(z(y))) = f(−y), and the last two terms in the definition of h(y) are seen to cancel. We thus obtain

h(y) = f(−y):

the Legendre transform is (almost) self–inverse, and this means that by passing from f(x) to g(z) no information has been lost.
The abstract definition of the Legendre transform with its nested variable dependencies can be somewhat confusing. However, when applied to concrete functions, the transform is actually easy to handle. Let us illustrate this on the example considered above: with f(x) = C exp(x) and x = ln(z/C), we get

g(z) = C exp(ln(z/C)) − ln(z/C) z = z(1 − ln(z/C)).

(Notice that g(z) “looks” very different from f(x), but this need not worry us.) Now, let us apply the transform once more: define y = ∂z g(z) = −ln(z/C), or z(y) = C exp(−y). This gives

h(y) = C exp(−y)(1 + y) − C exp(−y) y = C exp(−y),

in accordance with the general result h(y) = f(−y).
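INFO The self–inverseness of the Legendre transform is easily checked by computer algebra. The following minimal sketch (assuming the sympy library is available) replays the example above; the symbol names are arbitrary:

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
z, C = sp.symbols('z C', positive=True)

f = C * sp.exp(x)                                 # the example function f(x) = C exp(x)

# first transform: z = df/dx, g(z) = f(x(z)) - z*x(z), cf. (5.1)
x_of_z = sp.solve(sp.Eq(sp.diff(f, x), z), x)[0]  # x(z) = log(z/C)
g = f.subs(x, x_of_z) - z * x_of_z                # g(z) = z*(1 - log(z/C))

# second transform: y = dg/dz, h(y) = g(z(y)) - y*z(y)
z_of_y = sp.solve(sp.Eq(sp.diff(g, z), y), z)[0]  # z(y) = C*exp(-y)
h = g.subs(z, z_of_y) - y * z_of_y

print(sp.simplify(g))   # z - z*log(z/C)
print(sp.simplify(h))   # C*exp(-y) = f(-y): no information has been lost
```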
The Legendre transform of multivariate functions f(x) is obtained by application of the rule to all variables: compute zi ≡ ∂xi f(x). Next construct the inverse x = x(z). Then define

g(z) = f(x(z)) − Σi zi xi(z). (5.2)

5.1.2 Hamiltonian function


Definition
We now apply the construction above to compute the Legendre transform of the Lagrange function. We thus apply Eq. (5.2) to the function L(q, q̇, t) and identify variables as x ↔ q̇ and z ↔ p. (Notice that the coordinates, q, themselves play the role of spectator variables. The Legendre transform is in the variables q̇!) Let us now formulate the few steps it takes to pass to the Legendre transform of the Lagrange function:

1. Compute the f momenta

pi = ∂q̇i L(q, q̇, t). (5.3)

2. Invert these relations to obtain q̇i = q̇i(q, p, t).

3. Define the function

H(q, p, t) ≡ Σi pi q̇i(q, p, t) − L(q, q̇(q, p, t), t). (5.4)

Technically, this is the negative of L’s Legendre transform. Of course, this function
carries the same information as the Legendre transform itself.

The function H is known as the Hamiltonian function of the system. It is usually called “Hamiltonian” for short (much like the Lagrangian function is called the “Lagrangian”). The notation above emphasizes the variable dependencies of the quantities q̇i, pi, etc. One usually keeps the notation more compact, e.g. by writing H = Σi pi q̇i − L. However, it is important to remember that in both L and Σi q̇i pi, the variable q̇i has to be expressed as a function of q and p. It doesn’t make sense to write down formulae such as H = . . . , if the right hand side contains q̇i’s as fundamental variables!
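INFO The three steps above are mechanical enough to be delegated to computer algebra. A minimal sketch (assuming sympy is available), anticipating the point–particle example below, with an unspecified potential U:

```python
import sympy as sp

q, p, qdot = sp.symbols('q p qdot', real=True)
m = sp.symbols('m', positive=True)
U = sp.Function('U')(q)

L = m * qdot**2 / 2 - U                          # Lagrangian L(q, qdot)

p_def = sp.diff(L, qdot)                         # step 1: p = dL/dqdot = m*qdot
qdot_of_p = sp.solve(sp.Eq(p_def, p), qdot)[0]   # step 2: invert, qdot(q, p) = p/m
H = p * qdot_of_p - L.subs(qdot, qdot_of_p)      # step 3: H = p*qdot(q, p) - L

print(sp.simplify(H))                            # p**2/(2*m) + U(q)
```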

Hamilton equations
Now we need to do something with the Hamiltonian function. Our goal will be to transcribe the Euler–Lagrange equations to equations defined in terms of the Hamiltonian. The hope, then, is that these new equations contain operational advantages over the Euler–Lagrange equations.
Now, the Euler–Lagrange equations probe changes (derivatives) of the function L. It will therefore be a good idea to explore what happens if we ask similar questions of the function H. In doing so, we should keep in mind that our ultimate goal is to obtain unambiguous information on the evolution of particle trajectories. Let us then compute

∂qi H(q, p, t) = Σj pj ∂qi q̇j(q, p) − ∂qi L(q, q̇(q, p)) − Σj ∂q̇j L(q, q̇(q, p)) ∂qi q̇j(q, p).

The first and the third term cancel, because ∂q̇j L = pj. Now, if the curve q(t) is a solution curve, the middle term equals −dt ∂q̇i L = −dt pi. We thus conclude that the (q, p)–representation of solution curves must obey the equation

ṗi = −∂qi H(q, p, t).


Similarly,

∂pi H(q, p, t) = q̇i(q, p) + Σj pj ∂pi q̇j(q, p, t) − Σj ∂q̇j L(q, q̇(q, p)) ∂pi q̇j(q, p, t).

The last two terms cancel, and we have

q̇i = ∂pi H(q, p, t).


Finally, let us compute the partial time derivative ∂t H:¹

∂t H(q, p, t) = Σi pi ∂t q̇i(q, p, t) − Σi ∂q̇i L(q, q̇(q, p, t), t) ∂t q̇i(q, p, t) − ∂t L(q, q̇, t).

Again, two terms cancel and we have

∂t H(q, p, t) = −∂t L(q, q̇(q, p, t), t), (5.5)

where the time derivative on the r.h.s. acts only on the third argument of L( · , · , t).
Let us summarize where we are: The solution curve q(t) of a mechanical system de-
fines a 2f –dimensional “curve”, (q(t), q̇(t)). Eq. (5.3) then defines a 2f –dimensional curve
(q(t), p(t)). The invertibility of the relation p ↔ q̇ implies that either representation faithfully
describes the curve. The derivation above then shows that

The solution curves (q, p)(t) of mechanical problems obey the so–called Hamilton
equations

q̇i = ∂pi H,
ṗi = −∂qi H. (5.6)

Some readers may worry about the number of variables employed in the description of curves: in principle, a curve is uniquely represented by the f variables q(t). However, we have now decided to represent curves in terms of the 2f variables (q, p). Is there some redundancy here? No, there isn’t, and the reason can be understood in different ways. First notice that in Lagrangian mechanics, solution curves are obtained from second order differential equations in time. (The Euler–Lagrange equations contain terms q̈i.) In contrast, the Hamilton equations are first order in time.² The price to be paid for this reduction is the introduction of a second set of variables, p. Another way to understand the rationale behind the introduction of 2f variables is by noting that the solution of the (2nd order differential) Euler–Lagrange equations
¹ Here it is important to be very clear about what we are doing: in the present context, q̇ is a variable in the Lagrange function. (We could also name it v or z, or whatever.) It is considered a free variable (no time dependence), unless the transformation q̇i = q̇i(q, p, t) becomes explicitly time dependent. Remembering the origin of this transformation, we see that this may happen if the function L(q, q̇, t) contains explicit time dependence. This happens, e.g., in the case of rheonomic constraints, or time dependent potentials.
² Ordinary differential equations of nth order can be transformed to systems of n ordinary differential equations of first order. The passage L → H is an example of such an order reduction.
requires the specification of 2f boundary conditions. These can be the 2f conditions stored in the specification q(t0) = q0 and q(t1) = q1 of an initial and a final point,³ or the specification q(t0) = q0 and q̇(t0) = v0 of an initial configuration and velocity.⁴ In contrast, the Hamilton equations are uniquely solvable once an initial configuration (q(t0), p(t0)) = (q0, p0) has been specified. Either way, we need 2f boundary conditions, and hence 2f variables, to uniquely specify the state of a mechanical system.
EXAMPLE To make the new approach more concrete, let us formulate the Hamilton equations of a point particle in cartesian coordinates. The Lagrangian is given by

L(q, q̇, t) = (m/2) q̇² − U(q, t),

where we allow for time dependence of the potential. Thus,

pi = m q̇i,

which is inverted as q̇i(p) = pi/m. This leads to

H(q, p, t) = Σ_{i=1}^{3} pi (pi/m) − L(q, p/m, t) = p²/2m + U(q, t).

The Hamilton equations assume the form

q̇ = p/m,
ṗ = −∂q U(q, t).

Further, ∂t H = −∂t L = ∂t U inherits its time dependence from the time dependence of the potential. The two Hamilton equations are recognized as a reformulation of the Newton equation. (Substituting the time derivative of the first equation, q̈ = ṗ/m, into the second, we obtain the Newton equation m q̈ = −∂q U.)
Notice that the Hamiltonian function H = p²/2m + U(q, t) equals the energy of the particle! In the next section, we will discuss this connection in more general terms.
EXAMPLE Let us now solve these equations for the very elementary example of the one-dimensional harmonic oscillator. In this case, V(q) = mω²q²/2 and the Hamilton equations assume the form

q̇ = p/m,
ṗ = −mω² q.

For a given initial configuration x(0) = (q(0), p(0)), these equations afford the unique solution

q(t) = q(0) cos ωt + (p(0)/mω) sin ωt,
p(t) = p(0) cos ωt − mω q(0) sin ωt. (5.7)
³ But notice that conditions of this type do not always uniquely specify a solution – think of examples!
⁴ Formally, this follows from a result of the theory of ordinary differential equations: a system of n first order differential equations ẋi = fi(x1, . . . , xn), i = 1, . . . , n affords a unique solution, once n initial conditions xi(0) have been specified.
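INFO Being first order in time, the Hamilton equations are directly amenable to numerical integration. A minimal sketch (assuming numpy; mass, frequency and initial data chosen arbitrarily) that integrates the oscillator equations with the symplectic Euler scheme and compares against the closed solution (5.7):

```python
import numpy as np

m, omega = 1.0, 2.0        # arbitrary illustrative parameters
q, p = 1.0, 0.5            # initial configuration (q(0), p(0))
q0, p0 = q, p
dt, T = 1e-4, 10.0

# symplectic Euler: update the momentum first, then the coordinate with the new momentum
for _ in range(int(round(T / dt))):
    p += -dt * m * omega**2 * q    # pdot = -m w^2 q
    q += dt * p / m                # qdot = p/m

q_exact = q0 * np.cos(omega * T) + p0 / (m * omega) * np.sin(omega * T)
p_exact = p0 * np.cos(omega * T) - m * omega * q0 * np.sin(omega * T)
print(abs(q - q_exact), abs(p - p_exact))   # both small, of order dt
```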

Physical meaning of the Hamilton function


The Lagrangian L = T − U did not carry immediate physical meaning; its essential role
was that of a generator of the Euler–Lagrange equations. However, with the Hamiltonian
the situation is different. To understand its physical interpretation, let us compute the full
time derivative dt H = dt H(q(t), p(t), t) of the Hamiltonian evaluated on a solution curve (q(t), p(t)) (i.e. a solution of the Hamilton equations (5.6)):

dt H(q, p, t) = Σ_{i=1}^{f} ( ∂H/∂qi · q̇i + ∂H/∂pi · ṗi ) + ∂H/∂t = ∂H/∂t,

where the second equality follows upon substitution of the Hamilton equations (5.6).
Now, above we have seen (cf. Eq. (5.5)) that the Hamiltonian inherits its explicit time
dependence ∂t H = −∂t L from the explicit time dependence of the Lagrangian. In problems
without explicitly time dependent potentials (they are called autonomous problems), the
Hamiltonian stays constant on solution curves.
We have thus found that the function H(q, p) (the missing time argument indicates that we are now looking at an autonomous situation) defines a constant of motion of extremely general nature. What is its physical meaning? To answer this question, we consider a general N–body system in an arbitrary potential. (This is the most general conservative autonomous setup one may imagine.) In cartesian coordinates, its Lagrangian is given by

L(q, q̇) = Σ_{j=1}^{N} (mj/2) q̇j² − U(q1, . . . , qN),

where U is the N–body potential and mj the mass of the jth particle. Proceeding as in the first example above, we readily find

H(q, p) = Σ_{j=1}^{N} pj²/(2mj) + U(q1, . . . , qN). (5.8)

The expression on the r.h.s. we recognize as the energy of the system. We are thus led to the
following important conclusion:

The Hamiltonian H(q, p) of an autonomous problem (a problem with time–independent potentials) is dynamically conserved: H(q(t), p(t)) = E = const. on solution curves (q(t), p(t)).

However, our discussion above implies another very important corollary. It has shown that

The Hamiltonian H = T + U is given by the sum of potential energy, U (q, t), and
kinetic energy, T (q, p, t), expressed as a function of coordinates and momenta.
We have established this connection in cartesian coordinates. However, the coordinate invariance of the theory implies that this identification holds in general coordinate systems. Also, the identification H = T + U did not rely on the time–independence of the potential; it extends to potentials U(q, t).

INFO In principle, the identification H = T + U provides an option to access the Hamiltonian without reference to the Lagrangian. In cases where the expression T = T(q, p, t) of the kinetic energy in terms of the coordinates and momenta of the theory is known, we may simply add the potential: H(q, p, t) = T(q, p, t) + U(q, t). In such circumstances, there is no need to compute the Hamiltonian by the route L(q, q̇, t) → H(q, p, t) via Legendre transform. Often, however, the identification of the momenta of the theory is not so obvious, and one is better advised to proceed via the Legendre transform.

5.2 Phase space


In this section, we will introduce phase space as the basic “arena” of Hamiltonian mechanics.
We will start from an innocent definition of phase space as a 2f –dimensional coordinate space
comprising configuration space coordinates, q, and momenta, p. However, as we go along,
we will realize that this working definition is but the tip of an iceberg: the coordinate spaces of Hamiltonian mechanics are endowed with very deep mathematical structure, and this is one way of understanding why Hamiltonian mechanics is so powerful.

5.2.1 Phase space and structure of the Hamilton equations


To keep the notation simple, we will suppress reference to an optional explicit time depen-
dence of the Hamilton functions throughout this section, i.e. we just write H(. . . ) instead of
H(. . . , t).
The Hamilton equations are coupled equations for coordinates q, and momenta p. As was
argued above, the coordinate pair (q, p) contains sufficient information to uniquely encode
the state of a mechanical system. This suggests considering the 2f–component objects

x ≡ (q, p)ᵀ (5.9)

as the new fundamental variables of the theory. Given a coordinate system, we may think of x as an element of a 2f–dimensional vector space. However, all that has been said in section 4.1.3 about the curves of mechanical systems and their coordinate representations carries over to the present context: in abstract terms, the pair “(configuration space points, momenta)” defines a mathematical space known as phase space, Γ. The formulation of coordinate
invariant descriptions of phase space is a subject beyond the scope of the present course.⁵ However, locally it is always possible to parameterize Γ in terms of coordinates, and to describe its elements through 2f–component objects such as (5.9). This means that, locally, phase space can be identified with a 2f–dimensional vector space. However, different coordinate representations correspond to different coordinate vector spaces, and sometimes it is important to recall that the identification (phase space) ↔ (vector space) may fail globally.⁶

⁵ Which means that Γ has the status of a 2f–dimensional manifold.
⁶ For example, there are mechanical systems whose phase space is a two–sphere, and the sphere is not a vector space. (However, locally, it can be represented in terms of vector–space coordinates.)
Keeping the above words of caution in mind, we will temporarily identify phase space Γ with
a 2f –dimensional vector space. We will soon see that this space contains a very particular sort
of “scalar product”. This additional structure makes phase space a mathematical object far
more interesting than an ordinary vector space. Let us start with a rewriting of the Hamilton
equations. The identification xi = qi and xf +i = pi , i = 1, . . . , f enables us to rewrite
Eq. (5.6) as,

ẋi = ∂xi+f H,
ẋi+f = −∂xi H,

where i = 1, . . . , f . We can express this in a more compact form as

ẋi = Iij ∂xj H, i = 1, . . . , 2f, (5.10)

or, in vectorial notation,

ẋ = I∂x H. (5.11)

Here, the (2f) × (2f) matrix I is defined as

I = (  0    1f
      −1f   0  ),  (5.12)

and 1f is the f–dimensional unit matrix. The matrix I is sometimes called the symplectic unity.
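INFO In numerical applications, the block structure of (5.12) is conveniently set up once and for all. A small sketch (assuming numpy; f = 2 chosen for illustration):

```python
import numpy as np

f = 2                                              # number of degrees of freedom
I = np.block([[np.zeros((f, f)), np.eye(f)],
              [-np.eye(f), np.zeros((f, f))]])     # symplectic unity, cf. (5.12)

print(np.allclose(I @ I, -np.eye(2 * f)))          # I^2 = -1 (True)

# Hamilton equations xdot = I grad H for H = (|p|^2 + |q|^2)/2 (unit mass/frequency):
x = np.array([1.0, 0.0, 0.0, 0.5])                 # x = (q1, q2, p1, p2)
grad_H = x                                         # for this quadratic H, grad H(x) = x
print(I @ grad_H)                                  # (p1, p2, -q1, -q2): qdot = p, pdot = -q
```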

5.2.2 Hamiltonian flow


For any x, the quantity I∂x H(x) = {Iij ∂xj H(x)} is a vector in phase space. This means that the prescription

XH : Γ → R^{2f},
x ↦ I∂x H, (5.13)

defines a vector field in phase space, the so-called Hamiltonian vector field. The form of the Hamilton equations

ẋ = XH(x), (5.14)
suggests an interpretation in terms of the “flow lines” of the Hamiltonian vector field: at each point in phase space, the field XH defines a vector XH(x). The Hamilton equations state that the solution curve x(t) is tangent to that vector, ẋ(t) = XH(x(t)). One may visualize the situation in terms of the streamlines of a fluid. Within that analogy, the value XH(x) is a measure of the local current flow. If one injected a drop of colored ink into the fluid, its trace would be a representation of the curve x(t).

Figure 5.1: Visualization of the Hamiltonian vector field and its flow lines
For a general vector field v : U → RN, x ↦ v(x), where U ⊂ RN, one may define a parameter–dependent map

Φ : U × R → U,
(x, t) ↦ Φ(x, t), (5.15)

through the condition ∂t Φ(x, t) = v(Φ(x, t)). The map Φ is called the flow of the vector
field v. Specifically, the flow of the Hamiltonian vector field XH is defined by the prescription

Φ(x, t) ≡ x(t), (5.16)

where x(t) is a curve with initial condition x(t = 0) = x. Equation (5.16) is a proper
definition because for a given x(0) ≡ x, the solution x(t) is uniquely defined. Further,
∂t Φ(x, t) = dt x(t) = XH (x(t)) satisfies the flow condition. The map Φ(x, t) is called the
Hamiltonian flow.
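INFO The flow map Φ can be computed numerically by integrating the Hamilton equations. The sketch below (assuming numpy and scipy; an oscillator with unit mass and frequency) also checks the characteristic composition property Φ(x, t + s) = Φ(Φ(x, s), t), a direct consequence of the uniqueness of solution curves:

```python
import numpy as np
from scipy.integrate import solve_ivp

def X_H(t, x):
    """Hamiltonian vector field of the oscillator (m = omega = 1): (qdot, pdot)."""
    q, p = x
    return [p, -q]

def flow(x, t):
    """Phi(x, t): transport the phase space point x along X_H for a time t."""
    sol = solve_ivp(X_H, (0.0, t), x, rtol=1e-10, atol=1e-12)
    return sol.y[:, -1]

x0 = np.array([1.0, 0.0])
lhs = flow(x0, 1.2)                  # Phi(x, t + s) with s = 0.5, t = 0.7
rhs = flow(flow(x0, 0.5), 0.7)       # Phi(Phi(x, s), t)
print(np.allclose(lhs, rhs))         # True
```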
There is a corollary to the uniqueness of the curve x(t) emanating from a given initial
condition x(0):

Phase space curves never cross.

For if they did, the crossing point would be the initial point of the two out-going stretches
of the curve, and this would be in contradiction to the uniqueness of the solution for a given
initial configuration. There is, however, one subtle caveat to the statement above: although
phase space curves do not cross, they may actually touch each other in common terminal
points. (For an example, see the discussion in section 5.2.3 below.)
5.2.3 Phase space portraits

Graphical representations of phase flow, so-called phase space portraits, are very useful in the qualitative characterization of mechanical motion. The only problem with this is that phase flow is defined in 2f–dimensional space. For f > 1, we cannot draw this. What can be drawn, however, is the projection of a curve onto any of the 2f(2f − 1)/2 coordinate planes (xi, xj), 1 ≤ i < j ≤ 2f. See the figure for the example of a coordinate plane projection out of 3–dimensional space. However, it takes some experience to work with those representations and we will not discuss them any further.
However, the concept of phase space portraits becomes truly useful in problems with
one degree of freedom, f = 1. In this case, the conservation of energy on individual curves, H(q, p) = E = const., enables us to compute an explicit representation q = q(p, E), or p = p(q, E), just by solving the relation H(q, p) = E for q or p. This does not “solve” the mechanical problem under consideration. (For that, one would need to know the time dependence at which the curves are traversed.) Nonetheless, it provides a far–reaching characterization of the motion: for any phase space point (q, p) we can construct the curve running through it without solving differential equations. The projection of these curves onto configuration space, (q(t), p(t)) ↦ q(t), tells us something about the evolution of the configuration space coordinate.
Let us illustrate these statements on an example where a closed solution is possible, the harmonic oscillator. From the Hamiltonian H(q, p) = p²/2m + mω²q²/2 = E, we find

p = p(q, E) = ±√( 2m ( E − mω²q²/2 ) ).

Examples of these (elliptical) curves are shown in the figure. It is straightforward to show (do it!) that they are tangential to the Hamiltonian vector field

XH = ( p/m, −mω²q )ᵀ.

At the turning points, p = 0, the energy of the curve, E = mω²q²/2, equals the potential energy.
Now, the harmonic oscillator may be an example a little bit too basic to illustrate the usefulness of phase space portraits. Instead, consider the problem defined by the potential shown in Fig. 5.2. The Hamilton equations can now no longer be solved in closed form. However, we may still compute curves by solution of H(q, p) = E ⇒ p = p(q, E), and the results look as shown qualitatively for three different values of E in the figure. These curves give a far–reaching impression of the motion of a point particle in the potential landscape.
Specifically, notice the threshold energy E2 corresponding to the local potential maximum. At first sight, the corresponding phase space curve appears to violate the criterion of the non-existence of crossing points (cf. the “critical point” at p = 0 beneath the potential maximum.) In fact, however, this isn’t a crossing point. Rather, the two incoming curves “terminate” at this point: coming from the left or right, it takes infinite time for the particle to climb the potential maximum.

EXERCISE Show that a trajectory at energy E = V* ≡ V(q*) equal to the potential maximum at q = q* needs infinite time to climb the potential hill. To this end, use that the potential maximum at q* can be locally modelled by an inverted harmonic oscillator potential, V(q) ≈ V* − C(q − q*)², where C is a positive constant. Consider the local approximation of the Hamilton function, H = p²/2m − C(q − q*)² + const., and formulate and solve the corresponding equations of motion (hint: compare to the standard oscillator discussed above). Compute the time it takes to reach the maximum if E = V(q*).

Eventually it will rest at the maximum in an unstable equilibrium position. Similarly, the out-
going trajectories “begin” at this point. Starting at zero velocity (corresponding to zero initial
momentum), it takes infinite time to accelerate and move downhill. In this sense, the curves
touching (not crossing!) in the critical point represent idealizations that are never actually
realized. Trajectories of this type are called separatrices. Separatrices are important in that
they separate phase space into regions of qualitatively different type of motion (presently,
curves that make it over the hill and those which do not.)

EXERCISE Take a few minutes to familiarize yourself with the phase space portrait and to learn
how to “read” such representations. Sketch the periodic potential V (q) = cos(2πq/a), a = const.
and its phase space portrait.
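INFO For f = 1, phase space portraits are conveniently generated as level lines of H(q, p). A minimal sketch (assuming numpy and matplotlib; parameters chosen arbitrarily) for the periodic potential of the exercise; the separatrix is the contour at the energy of the potential maxima:

```python
import numpy as np
import matplotlib.pyplot as plt

m, a = 1.0, 1.0
V = lambda q: np.cos(2 * np.pi * q / a)           # the periodic potential of the exercise
H = lambda q, p: p**2 / (2 * m) + V(q)

q = np.linspace(-1.5 * a, 1.5 * a, 400)
p = np.linspace(-3.0, 3.0, 400)
Q, P = np.meshgrid(q, p)

# each level line H(q, p) = E is a phase space curve
plt.contour(Q, P, H(Q, P), levels=np.linspace(-0.8, 3.0, 12))
plt.contour(Q, P, H(Q, P), levels=[V(0.0)], colors='red')   # separatrix, E = max V
plt.xlabel('q'); plt.ylabel('p')
plt.show()
```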

5.2.4 Poisson brackets


Consider a (differentiable) function in phase space, g : Γ → R, x ↦ g(x). A natural question to ask is how g(x) will evolve as x ≡ x(0) ↦ x(t) traces out a phase space trajectory. In other words, we ask for the time evolution of g(x(t)) with initial condition g(x(0)) = g(x). The answer is obtained by straightforward time differentiation:

dt g(x(t), t) = dt g(q(t), p(t), t) = Σ_{i=1}^{f} ( ∂g/∂qi · dqi/dt + ∂g/∂pi · dpi/dt )|_{(q(t),p(t),t)} + ∂t g(q(t), p(t), t)
= Σ_{i=1}^{f} ( ∂g/∂qi · ∂H/∂pi − ∂g/∂pi · ∂H/∂qi )|_{(q(t),p(t),t)} + ∂t g(q(t), p(t), t),

where the terms ∂t g account for an optional explicit time dependence of g.

Figure 5.2: Phase space portrait of a “non–trivial” one-dimensional potential. Notice the “critical point” at the threshold energy E2.

The characteristic combination of derivatives governing these expressions appears frequently in Hamiltonian mechanics. It motivates the introduction of a shorthand notation,

{f, g} ≡ Σ_{i=1}^{f} ( ∂f/∂pi · ∂g/∂qi − ∂f/∂qi · ∂g/∂pi ). (5.17)

This expression is called the Poisson bracket of two phase space functions f and g. In the invariant notation of phase space coordinates x, it assumes the form

{f, g} = (∂x g)ᵀ I ∂x f, (5.18)

where I is the symplectic unit matrix defined above. The time evolution of functions may be
concisely expressed in terms of the Poisson bracket of the two functions H and g:

dt g = {H, g} + ∂t g. (5.19)

Notice that the Poisson bracket operation bears similarity to a scalar product of functions. Let CΓ be the space of smooth (and integrable) functions in phase space. Then

{ , } : CΓ × CΓ → CΓ,
(f, g) ↦ {f, g}, (5.20)

defines a product. This product comes with a number of characteristic algebraic properties:

. {f, g} = −{g, f}, (skew symmetry),

. {cf + c′f′, g} = c{f, g} + c′{f′, g}, (linearity),

. {c, g} = 0,

. {ff′, g} = f{f′, g} + {f, g}f′, (product rule),

. {f, {g, h}} + {h, {f, g}} + {g, {h, f}} = 0 (Jacobi identity),

where c, c′ ∈ R are constants. The first three of these relations are immediate consequences of the definition, and the fourth follows from the product rule of ordinary differentiation. The proof of the Jacobi identity amounts to a straightforward, if tedious, exercise in differentiation.
At this point, the Poisson bracket appears to be little more than convenient notation.
However, as we go along, we will see that it encapsulates important information on the
mathematical structure of phase space.
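INFO The definition (5.17) and the algebraic properties above can be verified by computer algebra. A minimal sketch (assuming sympy) for a single degree of freedom, which also confirms that dt g = {H, g} reproduces the Hamilton equations of the oscillator:

```python
import sympy as sp

q, p = sp.symbols('q p', real=True)
m, w = sp.symbols('m omega', positive=True)

def pb(f, g):
    """Poisson bracket {f, g} for one degree of freedom, cf. (5.17)."""
    return sp.diff(f, p) * sp.diff(g, q) - sp.diff(f, q) * sp.diff(g, p)

H = p**2 / (2 * m) + m * w**2 * q**2 / 2

print(pb(H, q))    # p/m           = qdot
print(pb(H, p))    # -m*omega**2*q = pdot

# Jacobi identity for an arbitrary triple of phase space functions
f, g, h = q * p, sp.sin(q), p**3
print(sp.simplify(pb(f, pb(g, h)) + pb(h, pb(f, g)) + pb(g, pb(h, f))))   # 0
```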

5.2.5 Variational principle


The solutions of the Hamilton equations x(t) can be interpreted as extremal curves of a certain
action functional. This connection will be a gateway to the further development of the theory.
Let us consider the set M = {x : [t0 , t1 ] → Γ, q(t0 ) = q0 , q(t1 ) = q1 } of all phase space
curves beginning and ending at a common configuration space point, q0,1 . We next define the
local functional

S : M → R,
x ↦ S[x] = ∫_{t0}^{t1} dt ( Σ_{i=1}^{f} pi q̇i − H(q, p, t) ). (5.21)

We now claim that the extremal curves of this functional are but the solutions of the Hamilton equations (5.6). To see this, define the function F(x, ẋ, t) ≡ Σ_{i=1}^{f} pi q̇i − H(q, p, t), in terms of which S[x] = ∫_{t0}^{t1} dt F(x, ẋ, t). Now, according to the general discussion of section 4.1.2, the extremal curves are solutions of the Euler–Lagrange equations

(dt ∂ẋi − ∂xi) F(x, ẋ, t) = 0, i = 1, . . . , 2f.

Evaluating these equations for i = 1, . . . , f and i = f + 1, . . . , 2f, respectively, we obtain the two sets of the equations (5.6).
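Explicitly: for the coordinate components, xi = qi (i = 1, . . . , f), we have ∂q̇i F = pi and ∂qi F = −∂qi H, so that the Euler–Lagrange equation yields ṗi = −∂qi H; for the momentum components, x_{f+i} = pi, we have ∂ṗi F = 0 and ∂pi F = q̇i − ∂pi H, so that q̇i = ∂pi H.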
5.3 Canonical transformations


In the previous chapter, we have emphasized the flexibility of the Lagrangian approach in the choice of problem adjusted coordinates: any diffeomorphic transformation qi ↦ qi′ = qi′(q), i = 1, . . . , f leaves the Euler–Lagrange equations form–invariant. Now, in the Hamiltonian formalism, we have twice as many variables xi, i = 1, . . . , 2f. Of course, we may consider “restricted” transformations, qi ↦ qi′(q), where only the configuration space coordinates are transformed (the transformation of the pi’s then follows.) However, things get more interesting when we set out to explore the full freedom of coordinate transformations “mixing” configuration space coordinates and momenta. This would include, for example, a mapping qi ↦ qi′ ≡ pi, pi ↦ pi′ = qi (a diffeomorphism!) that exchanges the roles of coordinates and momenta. Now, this newly gained freedom raises two questions: (i) what might such generalized transformations be good for, and (ii) does every diffeomorphic mapping xi ↦ xi′ qualify as a good coordinate transformation? Beginning with the second one, let us address these two questions in turn:

5.3.1 When is a coordinate transformation “canonical”?


In the literature of variable transformations in Hamiltonian dynamics, it is customary to denote the “new variables” X ≡ x′ by capital letters. We will follow this convention.

Following our general paradigm of valuing “form invariant equations”, we deem a coordinate transformation “canonical” if the following condition is met:

A diffeomorphism x ↦ X(x) defines a canonical transformation if there exists a function H′(X, t) such that the representation of solution curves (i.e. solutions of the equations ẋ = I∂x H) in the new coordinates X = X(x) solves the “transformed Hamilton equations”

Ẋ = I∂X H′(X, t). (5.22)

Now, this definition may leave us a bit at a loss: it does not tell us how the “new” Hamiltonian H′ relates to the old one, H, and in this sense it seems strangely “under–defined”. However, before exploring the ways by which canonical transformations can be constructively obtained, let us take a look at possible motivations for switching to new coordinates.

5.3.2 Canonical transformations: why?


Why would one start thinking about generalized coordinate transformations mixing configuration space coordinates and momenta? In previous sections, we argued that coordinates should relate to the symmetries of a problem. Symmetries, in turn, generate conservation laws (Noether), i.e. for each symmetry one obtains a quantity, I, that is dynamically conserved. Within the framework of phase space dynamics, this means that there is a function I(x) such that dt I(x(t)) = 0, where x(t) is a solution curve. We may think of I(x) as a function that is constant along the Hamiltonian flow lines.
Now, suppose we have s ≤ f symmetries and, accordingly, s conserved functions Ii, i = 1, . . . , s. Further assume that we find a canonical transformation to phase space coordinates x ↦ X, where P = (I1, . . . , Is, Ps+1, . . . , Pf) and Q = (φ1, . . . , φs, Qs+1, . . . , Qf). In words: the first s of the new momenta are the functions Ii. Their conjugate configuration space coordinates are called φi.⁸ The new Hamiltonian will be some function of the new coordinates and momenta, H′(φ1, . . . , φs, Qs+1, . . . , Qf, I1, . . . , Is, Ps+1, . . . , Pf). Now let’s take a look at the Hamilton equations associated to the variables (φi, Ii):

dt φi = ∂Ii H′,
0 = dt Ii = −∂φi H′.
The second equation tells us that H′ must be independent of the variables φi:

The coordinates, φi, corresponding to conserved momenta Ii are cyclic coordinates,

∂H′/∂φi = 0.

At this point, it should have become clear that coordinate sets containing conserved momenta, Ii, and their coordinates, φi, are tailored to the optimal formulation of mechanical problems.
The power of these coordinates becomes fully evident in the extreme case where s = f, i.e. where we have as many conserved quantities as degrees of freedom. In this case

H′ = H′(I1, . . . , If),

and the Hamilton equations assume the form

dt φi = ∂Ii H′ ≡ ωi,
dt Ii = −∂φi H′ = 0.

These equations can now be trivially solved:⁹

φi(t) = ωi t + αi,
Ii(t) = Ii = const.
⁸ It is natural to designate the conserved quantities as “momenta”: in section 4.2.4 we saw that the conserved quantity associated to the “natural” coordinate of a symmetry (a coordinate transforming additively under the symmetry operation) is the Noether momentum of that coordinate.
⁹ The problem remains solvable even if H′(I1, . . . , If, t) contains explicit time dependence. In this case, φ̇i = ∂Ii H′(I1, . . . , If, t) ≡ ωi(t), where ωi(t) is a function of known time dependence. (The time dependence is known because the variables Ii are constant and we are left with the externally imposed dependence on time.) These equations – ordinary first order differential equations in time – are solved by φi(t) = ∫_{t0}^{t} dt′ ωi(t′).
A very natural guess would be H′(X, t) = H(x(X), t), i.e. the “old” Hamiltonian expressed in new coordinates. However, we will see that this ansatz can be too restrictive.
Progress with this situation is made once we remember that solution curves x are extrema of an action functional S[x] (cf. section 5.2.5). This functional is the x–representation of an “abstract” functional. The same object can be expressed in terms of X–coordinates (cf. our discussion in section 4.1.3) as S′[X] = S[x]. The form of the functional S′ follows from the condition that the X–representation of the curve be a solution of the equations (5.22). This is equivalent to the condition

S′[X] = ∫_{t0}^{t1} dt ( Σ_{i=1}^{f} Pi Q̇i − H′(Q, P, t) ) + const.,

where “const.” is a contribution whose significance we will discuss in a moment. Indeed, the variation of S′[X] generates Euler–Lagrange equations which, as we saw in section 5.2.5, are just the Hamilton equations (5.22).