Advanced Quantum Mechanics
Peter S. Riseborough
May 10, 2011
Contents
1 Introduction
2 Quantum Mechanics of a Single Photon
    2.1 Rotations and Intrinsic Spin
    2.2 Massless Particles with Spin Zero
    2.3 Massless Particles with Spin One
3 Maxwell's Equations
    3.1 Vector and Scalar Potentials
    3.2 Gauge Invariance
4 Relativistic Formulation of Electrodynamics
    4.1 Lorentz Scalars and Vectors
    4.2 Covariant and Contravariant Derivatives
    4.3 Lorentz Transformations
    4.4 Invariant Form of Maxwell's Equations
5 The Simplest Classical Field Theory
    5.1 The Continuum Limit
    5.2 Normal Modes
    5.3 Rules of Canonical Quantization
    5.4 The Algebra of Boson Operators
    5.5 The Classical Limit
6 Classical Field Theory
    6.1 The Hamiltonian Formulation
    6.2 Symmetry and Conservation Laws
        6.2.1 Conservation Laws
        6.2.2 Noether Charges
        6.2.3 Noether's Theorem
    6.3 The Energy-Momentum Tensor
7 The Electromagnetic Lagrangian
    7.1 Conservation Laws for Electromagnetic Fields
    7.2 Massive Spin-One Particles
8 Symmetry Breaking and Mass Generation
    8.1 Symmetry Breaking and Goldstone Bosons
    8.2 The Kibble-Higgs Mechanism
9 Quantization of the Electromagnetic Field
    9.1 The Lagrangian and Hamiltonian Density
    9.2 Quantizing the Normal Modes
        9.2.1 The Energy of the Field
        9.2.2 The Electromagnetic Field
        9.2.3 The Momentum of the Field
        9.2.4 The Angular Momentum of the Field
    9.3 Uncertainty Relations
    9.4 Coherent States
        9.4.1 The Phase-Number Uncertainty Relation
        9.4.2 Argand Representation of Coherent States
10 Non-Relativistic Quantum Electrodynamics
    10.1 Emission and Absorption of Photons
        10.1.1 The Emission of Radiation
        10.1.2 The Dipole Approximation
        10.1.3 Electric Dipole Radiation Selection Rules
        10.1.4 Angular Distribution of Dipole Radiation
        10.1.5 The Decay Rate from Dipole Transitions
        10.1.6 The 2p → 1s Electric Dipole Transition Rate
        10.1.7 Electric Quadrupole and Magnetic Dipole Transitions
        10.1.8 The 3d → 1s Electric Quadrupole Transition Rate
        10.1.9 Two-photon decay of the 2s state of Hydrogen
        10.1.10 The Absorption of Radiation
        10.1.11 The Photoelectric Effect
        10.1.12 Impossibility of absorption of photons by free electrons
    10.2 Scattering of Light
        10.2.1 Rayleigh Scattering
        10.2.2 Thomson Scattering
        10.2.3 Raman Scattering
        10.2.4 Radiation Damping and Resonance Fluorescence
        10.2.5 Natural Line-Widths
    10.3 Renormalization
        10.3.1 The Casimir Effect
        10.3.2 The Lamb Shift
        10.3.3 The Self-Energy of a Free Electron
        10.3.4 The Self-Energy of a Bound Electron
        10.3.5 Bremsstrahlung
11 The Dirac Equation
    11.1 Conservation of Probability
    11.2 Covariant Form of the Dirac Equation
    11.3 The Field Free Solution
    11.4 Coupling to Fields
        11.4.1 Mott Scattering
        11.4.2 Maxwell's Equations
        11.4.3 The Gordon Decomposition
    11.5 Lorentz Covariance of the Dirac Equation
        11.5.1 The Space of the Anticommuting $\gamma^\mu$ Matrices
        11.5.2 Polarization in Mott Scattering
    11.6 The Non-Relativistic Limit
    11.7 Conservation of Angular Momentum
    11.8 Conservation of Parity
    11.9 Bilinear Covariants
    11.10 The Spherically Symmetric Dirac Equation
        11.10.1 The Hydrogen Atom
        11.10.2 Lowest-Order Radial Wavefunctions
        11.10.3 The Relativistic Corrections for Hydrogen
        11.10.4 The Kinematic Correction
        11.10.5 Spin-Orbit Coupling
        11.10.6 The Darwin Term
        11.10.7 The Fine Structure of Hydrogen
        11.10.8 A Particle in a Spherical Square Well
        11.10.9 The MIT Bag Model
        11.10.10 The Temple Meson Model
    11.11 Scattering by a Spherically Symmetric Potential
        11.11.1 Polarization in Coulomb Scattering
        11.11.2 Partial Wave Analysis
    11.12 An Electron in a Uniform Magnetic Field
    11.13 Motion of an Electron in a Classical Electromagnetic Field
    11.14 The Limit of Zero Mass
    11.15 Classical Dirac Field Theory
        11.15.1 Chiral Gauge Symmetry
    11.16 Hole Theory
        11.16.1 Compton Scattering
        11.16.2 Charge Conjugation
12 The Many-Particle Dirac Field
    12.1 The Algebra of Fermion Operators
    12.2 Quantizing the Dirac Field
    12.3 Parity, Charge and Time Reversal Invariance
        12.3.1 Parity
        12.3.2 Charge Conjugation
        12.3.3 Time Reversal
        12.3.4 The CPT Theorem
    12.4 The Connection between Spin and Statistics
13 Massive Gauge Field Theory
    13.1 The Gauge Symmetry
    13.2 The Coupling to the Gauge Field
    13.3 The Free Gauge Fields
    13.4 Breaking the Symmetry
1 Introduction
Non-relativistic mechanics yields a reasonable approximate description of physical phenomena in the range where the particles' kinetic energies are small compared with their rest mass energies. However, it should be noted that when the relativistic invariant mass of a particle is expressed in terms of its energy $E$ and momentum $\mathbf{p}$ via
$$E^2 - p^2 c^2 = m^2 c^4 \tag{1}$$
it implies that its dispersion relation has two branches
$$E = \pm\, c \sqrt{p^2 + m^2 c^2} \tag{2}$$
In relativistic classical mechanics, it is assumed expedient to neglect the negative-energy solutions. This assumption is based on the expectation that, since energy can only change continuously, it is impossible for a particle with positive energy to make a transition from the positive-energy to the negative-energy states. However, in quantum mechanics, particles can make discontinuous transitions. Therefore, it is necessary to consider both the positive and negative-energy branches. These considerations naturally lead one to the concept of particles and anti-particles, and also to the realization that one must consider multi-particle quantum mechanics or field theory.

Figure 1: The positive and negative energy branches $E(p)$ for a relativistic particle with rest mass $m$, plotted against $pc$. The minimum separation between the positive-energy branch and the negative-energy branch is $2mc^2$.
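The two branches of Eq. (2) and the gap of $2mc^2$ between them can be illustrated numerically. The following is a minimal sketch in Python; the function name `energy_branches` and the sample values are illustrative assumptions and do not appear in the notes.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def energy_branches(p, m, c=C):
    """Return (E_plus, E_minus) from Eq. (2) for momentum magnitude p and rest mass m."""
    e = c * math.sqrt(p * p + m * m * c * c)
    return e, -e


if __name__ == "__main__":
    m_e = 9.1093837e-31  # electron rest mass in kg (illustrative value)
    for p in (0.0, 1.0e-22, 1.0e-21):
        e_plus, e_minus = energy_branches(p, m_e)
        # The separation between the branches is smallest at p = 0, where it is 2 m c^2.
        print(p, e_plus, e_minus, e_plus - e_minus)
```

At $p = 0$ the two energies reduce to $\pm mc^2$, so the printed separation in the first row is the minimum gap quoted in the caption of Figure 1.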
We shall take a look at the quantum mechanical description of the electromagnetic field, Dirac's relativistic theory of spin one-half fermions (such as leptons and quarks), and then look at the interaction between these particles and the electromagnetic field. The interaction between charged fermions and the electromagnetic field is known as Quantum Electrodynamics. Quantum Electrodynamics contains some surprises, namely that although the interaction appears to be governed by a small coupling strength
$$\left( \frac{e^2}{\hbar c} \right) \sim \frac{1}{137.0359979} \tag{3}$$
perturbation theory does not converge. In fact, straightforward perturbation theory is plagued by infinities. However, physics is a discipline which is aimed at uncovering the relationships between measured quantities. The quantities $e$ and $m$ which occur in quantum electrodynamics are theoretical constructs which, respectively, describe the bare charge of the electron and bare mass of the electron. This means one is assuming that $e$ and $m$ would be the results of measurements on a (fictional) electron which does not interact electromagnetically. That is, $e$ and $m$ are not physically measurable and their values are therefore unknown. What can be measured experimentally are the renormalized mass and the renormalized charge of the electron. The divergences found in quantum electrodynamics can be shown to cancel or drop out, when one relates different physically measurable quantities, as only the renormalized masses and energies enter the theory. Despite the existence of infinities, quantum electrodynamics is an extremely accurate theory. Experimentally determined quantities can be predicted to an extremely high degree of precision.
The framework of quantum electrodynamics can be extended to describe the unification of electrodynamics and the weak interaction via electro-weak theory, which is also well tested. The scalar (U(1)) gauge symmetry of the electromagnetic field is replaced by a matrix (SU(2)) symmetry of the combined electro-weak theory, in which the gauge field couples to the two components of the spinor wave functions of the fermions. The generalization of the gauge field necessitates the inclusion of additional components. Through symmetry breaking, some components of the field which mediates the electro-weak interaction become massive, i.e. have finite masses. The finite masses are responsible for the short range of the weak interaction. More tentatively, the gauge theory framework of quantum electrodynamics has also been extended to describe the interactions between quarks, which are mediated by the gluon field. The gauge symmetry of the interaction is enlarged to an SU(3) symmetry. However, unlike quantum electrodynamics, where photons are uncharged and do not interact with themselves, the gluons do interact amongst themselves.
2 Quantum Mechanics of a Single Photon
Maxwell's equations were formulated to describe classical electromagnetism. In the quantum description, the classical electromagnetic field is described as being composed of a very large number of photons. Before one describes multi-photon quantum mechanics of the electromagnetic field, one should ascertain the form of the Schrödinger equation for a single photon. The photon is a massless, uncharged particle of spin one.

A spin-one particle is described by a vector wave function. This can be heuristically motivated as follows:

A spin-zero particle has just one state and is uniquely described by a one-component field $\psi$.
A spin one-half particle has two independent states corresponding to the two allowed values of the z-component of the intrinsic angular momentum, $S_z = \pm \frac{\hbar}{2}$. The wave function $\psi$ of a spin one-half particle is a spinor which has two independent components
$$\psi(\mathbf{r}, t) = \begin{pmatrix} \psi^{(1)}(\mathbf{r}, t) \\ \psi^{(2)}(\mathbf{r}, t) \end{pmatrix} \tag{4}$$
These two components can be used to represent two independent basis states. We conjecture that since a particle with intrinsic spin $S$ has $(2S + 1)$ independent basis states, then the wave function should have $(2S + 1)$ independent components.
A (non-relativistic) spin-one particle should have three independent states corresponding to the three possible values of the z-component of the intrinsic angular momentum. From the conjecture, one expects that the wave function $\psi$ of a spin-one particle should have three components
$$\psi(\mathbf{r}, t) = \begin{pmatrix} \psi^{(1)}(\mathbf{r}, t) \\ \psi^{(2)}(\mathbf{r}, t) \\ \psi^{(3)}(\mathbf{r}, t) \end{pmatrix} \tag{5}$$
This conjecture can be verified by examining the transformational properties of a vector field under rotations. Under a rotation of the field, the components of the field are transported in space, and also the direction of the vector field is rotated. This implies that the components of the transported field have to be rotated. The rotation of the direction of the field is generated by operators which turn out to be the intrinsic angular momentum operators. Specifically, the generators satisfy the commutation relations defining angular momentum, but also correspond to the subspace with angular momentum one.
2.1 Rotations and Intrinsic Spin
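Under the transformation which takes $\mathbf{r} \to \mathbf{r}' = \hat{R}\, \mathbf{r}$, the magnitude of the scalar field $\psi$ at $\mathbf{r}$ is transferred to the point $\mathbf{r}'$. This defines the transformation $\psi \to \psi'$. The transformed scalar field $\psi'$ is defined so that its value at $\mathbf{r}'$ is the same as the value of $\psi$ at $\mathbf{r}$. That is
$$\psi'(\mathbf{r}') = \psi(\mathbf{r}) \tag{6}$$
or equivalently
$$\psi'(\hat{R}\, \mathbf{r}) = \psi(\mathbf{r}) \tag{7}$$
The above equation can be used to determine $\psi'(\mathbf{r})$ by using the substitution $\mathbf{r} \to \hat{R}^{-1}\, \mathbf{r}$, so
$$\psi'(\mathbf{r}) = \psi(\hat{R}^{-1}\, \mathbf{r}) \tag{8}$$
If $\hat{e}$ is a unit vector along the axis of rotation, the rotation of $\mathbf{r}$ through an infinitesimal angle $\delta\varphi$ is expressed as
$$\hat{R}\, \mathbf{r} = \mathbf{r} + \delta\varphi\; \hat{e} \wedge \mathbf{r} + \ldots \tag{9}$$
where terms of order $\delta\varphi^2$ have been neglected.

Figure 2: A rotation of the vector $\mathbf{r}$ through an angle $\varphi$ about an axis $\hat{e}$.

Figure 3: The effect of a rotation $\hat{R}$ on a scalar field $\psi(\mathbf{r})$.

Hence, under an infinitesimal rotation, the transformation of a scalar wave function can be found from the Taylor expansion
$$\begin{aligned}
\psi'(\mathbf{r}) &= \psi(\mathbf{r} - \delta\varphi\; \hat{e} \wedge \mathbf{r}) \\
&= \psi(\mathbf{r}) - \delta\varphi\, ( \hat{e} \wedge \mathbf{r} ) \cdot \nabla \psi(\mathbf{r}) + \ldots \\
&= \psi(\mathbf{r}) - \delta\varphi\, ( \mathbf{r} \wedge \nabla ) \cdot \hat{e}\; \psi(\mathbf{r}) + \ldots \\
&= \psi(\mathbf{r}) - \frac{i\, \delta\varphi}{\hbar}\, ( \hat{e} \cdot \hat{\mathbf{L}} )\, \psi(\mathbf{r}) + \ldots \\
&= \exp\left[ - \frac{i\, \delta\varphi}{\hbar}\, ( \hat{e} \cdot \hat{\mathbf{L}} ) \right] \psi(\mathbf{r})
\end{aligned} \tag{10}$$
where the operator $\hat{\mathbf{L}}$ has been defined as
$$\hat{\mathbf{L}} = - i \hbar\; \mathbf{r} \wedge \nabla \tag{11}$$
Therefore, locally, rotations of the scalar field are generated by the orbital angular momentum operator $\hat{\mathbf{L}}$.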
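The first-order expansion in Eq. (10) can be checked numerically: the shifted field $\psi(\mathbf{r} - \delta\varphi\, \hat{e} \wedge \mathbf{r})$ should agree with $\psi(\mathbf{r}) - \delta\varphi\, (\hat{e} \wedge \mathbf{r}) \cdot \nabla \psi(\mathbf{r})$ up to terms of order $\delta\varphi^2$. The sketch below assumes a hypothetical smooth test field $\psi = x^2 y + z$, chosen only for illustration; the function names are not from the notes.

```python
def cross(a, b):
    """Vector (wedge) product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def psi(r):
    """Hypothetical scalar test field psi(r) = x^2 y + z."""
    x, y, z = r
    return x * x * y + z


def grad_psi(r):
    """Analytic gradient of the test field."""
    x, y, z = r
    return (2.0 * x * y, x * x, 1.0)


def psi_shifted(r, e, dphi):
    """First line of Eq. (10): psi evaluated at r - dphi (e ^ r)."""
    w = cross(e, r)
    return psi(tuple(ri - dphi * wi for ri, wi in zip(r, w)))


def psi_first_order(r, e, dphi):
    """Second line of Eq. (10): psi(r) - dphi (e ^ r) . grad psi(r)."""
    return psi(r) - dphi * dot(cross(e, r), grad_psi(r))


if __name__ == "__main__":
    r, e = (1.0, 2.0, 3.0), (0.0, 0.0, 1.0)
    for dphi in (1e-2, 1e-3, 1e-4):
        # The difference should shrink roughly as dphi squared.
        print(dphi, abs(psi_shifted(r, e, dphi) - psi_first_order(r, e, dphi)))
```

The same helpers also confirm the rearrangement used between the second and third lines of Eq. (10), namely $(\hat{e} \wedge \mathbf{r}) \cdot \nabla \psi = \hat{e} \cdot (\mathbf{r} \wedge \nabla \psi)$.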
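Since the operation $\hat{R}$ is a rotation, it also rotates a vector field $\boldsymbol{\psi}(\mathbf{r})$. Not only does the rotation transfer the magnitude of $\boldsymbol{\psi}(\mathbf{r})$ to the new point $\mathbf{r}'$, but it must also rotate the transferred vector so that $\boldsymbol{\psi}'(\mathbf{r}')$ has the same direction as $\hat{\mathcal{R}}\, \boldsymbol{\psi}(\mathbf{r})$. That is
$$\boldsymbol{\psi}'(\mathbf{r}') = \hat{\mathcal{R}}\, \boldsymbol{\psi}(\mathbf{r}) \tag{12}$$
or equivalently
$$\boldsymbol{\psi}'(\hat{R}\, \mathbf{r}) = \hat{\mathcal{R}}\, \boldsymbol{\psi}(\mathbf{r}) \tag{13}$$
The above equation can be used to determine $\boldsymbol{\psi}'(\mathbf{r})$ as
$$\boldsymbol{\psi}'(\mathbf{r}) = \hat{\mathcal{R}}\, \boldsymbol{\psi}(\hat{R}^{-1}\, \mathbf{r}) \tag{14}$$

Figure 4: The effect of a rotation $\hat{R}$ on a vector field $\boldsymbol{\psi}(\mathbf{r})$. The rotation affects both the magnitude and direction of the vector.

The part of the rotational operator designated by $\hat{\mathcal{R}}$ does not affect the positional coordinates $(\mathbf{r})$ of the vector field, and so can be found by considering the rotation of the vector field $\boldsymbol{\psi}$ at the origin
$$\hat{\mathcal{R}}\, \boldsymbol{\psi} = \left( \hat{I} + \delta\varphi\; \hat{e}\, \wedge \right) \boldsymbol{\psi} \tag{15}$$
That is, the operator $\hat{\mathcal{R}}$ only produces a mixing of the components of $\boldsymbol{\psi}$. Hence, the complete rotational transformation of the vector field can be represented as
$$\begin{aligned}
\boldsymbol{\psi}'(\mathbf{r}) &= \hat{\mathcal{R}}\, \boldsymbol{\psi}(\mathbf{r} - \delta\varphi\; \hat{e} \wedge \mathbf{r}) \\
&= \boldsymbol{\psi}(\mathbf{r} - \delta\varphi\; \hat{e} \wedge \mathbf{r}) + \delta\varphi\; \hat{e} \wedge \boldsymbol{\psi}(\mathbf{r} - \delta\varphi\; \hat{e} \wedge \mathbf{r}) \\
&= \boldsymbol{\psi}(\mathbf{r} - \delta\varphi\; \hat{e} \wedge \mathbf{r}) + \delta\varphi\; \hat{e} \wedge \boldsymbol{\psi}(\mathbf{r}) + \ldots \\
&= \boldsymbol{\psi}(\mathbf{r}) - \frac{i\, \delta\varphi}{\hbar}\, ( \hat{e} \cdot \hat{\mathbf{L}} )\, \boldsymbol{\psi}(\mathbf{r}) + \delta\varphi\; \hat{e} \wedge \boldsymbol{\psi}(\mathbf{r}) + \ldots \\
&= \boldsymbol{\psi}(\mathbf{r}) - \frac{i\, \delta\varphi}{\hbar}\, ( \hat{e} \cdot \hat{\mathbf{L}} )\, \boldsymbol{\psi}(\mathbf{r}) - \frac{i\, \delta\varphi}{\hbar}\, ( \hat{e} \cdot \hat{\mathbf{S}} )\, \boldsymbol{\psi}(\mathbf{r})
\end{aligned} \tag{16}$$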
where the terms of order $(\delta\varphi)^2$ have been neglected and a vector operator $\hat{\mathbf{S}}$ has been introduced. The operator $\hat{\mathbf{S}}$ only admixes the components $\psi^{(\mu)}$ of $\boldsymbol{\psi}$, unlike $\hat{\mathbf{L}}$, which only acts on the $\mathbf{r}$ dependence of the components. The components of the three-dimensional vector operator $\hat{\mathbf{S}}$ are expressed as $3 \times 3$ matrices (footnote 1), with matrix elements
$$( \hat{S}^{(i)} )_{j,k} = - i \hbar\, \xi_{i,j,k} \tag{18}$$
where $\xi_{i,j,k}$ is the antisymmetric Levi-Civita symbol. The Levi-Civita symbol is defined by $\xi_{i,j,k} = 1$ if the ordered set $(i, j, k)$ is obtained by an even number of permutations of $(1, 2, 3)$, is $-1$ if it is obtained by an odd number of permutations, and is zero if two or more indices are repeated. Specifically, the antisymmetric matrices are given by
$$\hat{S}^{(1)} = \hbar \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -i \\ 0 & i & 0 \end{pmatrix} \tag{19}$$
and by
$$\hat{S}^{(2)} = \hbar \begin{pmatrix} 0 & 0 & i \\ 0 & 0 & 0 \\ -i & 0 & 0 \end{pmatrix} \tag{20}$$
and finally by
$$\hat{S}^{(3)} = \hbar \begin{pmatrix} 0 & -i & 0 \\ i & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \tag{21}$$
By using a unitary transform, these operators can be transformed into the standard representation of spin-one operators where $\hat{S}^{(3)}$ is chosen to be diagonal.

Footnote 1: The component of the matrix denoted by
$$( \hat{S} )_{j,k} \tag{17}$$
denotes the element of $\hat{S}$ in the $j$-th row and $k$-th column.

It is easily shown that the components of the matrix operators $\hat{\mathbf{L}}$ and $\hat{\mathbf{S}}$ satisfy the same type of commutation relations
$$[\, \hat{L}^{(i)} , \hat{L}^{(j)} \,] = i \hbar\, \xi_{i,j,k}\, \hat{L}^{(k)} \tag{22}$$
and
$$[\, \hat{S}^{(i)} , \hat{S}^{(j)} \,] = i \hbar\, \xi_{i,j,k}\, \hat{S}^{(k)} \tag{23}$$
where the repeated index $(k)$ is summed over. The above set of operators form a Lie algebra associated with the corresponding Lie group of continuous rotations. Thus, it is natural to identify these operators which arise in the analysis of transformations in classical physics with the angular momentum operators of quantum mechanics. In terms of these operators, the infinitesimal transformation has the form
$$\boldsymbol{\psi}'(\mathbf{r}) \approx \boldsymbol{\psi}(\mathbf{r}) - \frac{i\, \delta\varphi}{\hbar}\; \hat{e} \cdot ( \hat{\mathbf{L}} + \hat{\mathbf{S}} )\, \boldsymbol{\psi}(\mathbf{r}) + \ldots \tag{24}$$
or
$$\boldsymbol{\psi}'(\mathbf{r}) = \exp\left[ - \frac{i\, \delta\varphi}{\hbar}\; \hat{e} \cdot ( \hat{\mathbf{L}} + \hat{\mathbf{S}} ) \right] \boldsymbol{\psi}(\mathbf{r}) \tag{25}$$
Thus, the transformation is locally accomplished by
$$\boldsymbol{\psi}'(\mathbf{r}) = \exp\left[ - \frac{i\, \delta\varphi}{\hbar}\, ( \hat{e} \cdot \hat{\mathbf{J}} ) \right] \boldsymbol{\psi}(\mathbf{r}) \tag{26}$$
where
$$\hat{\mathbf{J}} = \hat{\mathbf{L}} + \hat{\mathbf{S}} \tag{27}$$
is the total angular momentum. The operator $\hat{\mathbf{S}}$ is the intrinsic angular momentum of the vector field $\boldsymbol{\psi}$. The magnitude of $S$ is found from
$$\hat{\mathbf{S}}^2 = ( \hat{S}^{(1)} )^2 + ( \hat{S}^{(2)} )^2 + ( \hat{S}^{(3)} )^2 \tag{28}$$
which is evaluated as
$$\hat{\mathbf{S}}^2 = 2 \hbar^2 \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \tag{29}$$
which is the Casimir operator. It is seen that a vector field has intrinsic angular momentum, with a magnitude given by the eigenvalue of $\hat{\mathbf{S}}^2$, which is
$$S ( S + 1 )\, \hbar^2 = 2 \hbar^2 \tag{30}$$
hence $S = 1$. Thus, it is seen that a vector field is associated with an intrinsic angular momentum of spin one.
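The commutation relations (23) and the Casimir result (29) can be verified directly from the matrix elements in Eq. (18). The following is a minimal sketch in pure Python, assuming units with $\hbar = 1$ and 0-based indices; all function names are illustrative and not taken from the notes.

```python
HBAR = 1.0  # units with hbar = 1 (an assumption of this sketch)


def levi_civita(i, j, k):
    """Antisymmetric symbol xi_{i,j,k} for indices in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2


def spin_matrix(i):
    """(S^(i))_{j,k} = -i hbar xi_{i,j,k}, as in Eq. (18)."""
    return [[-1j * HBAR * levi_civita(i, j, k) for k in range(3)] for j in range(3)]


def matmul(a, b):
    return [[sum(a[r][m] * b[m][c] for m in range(3)) for c in range(3)]
            for r in range(3)]


def commutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[r][c] - ba[r][c] for c in range(3)] for r in range(3)]


def matrices_close(a, b, tol=1e-12):
    return all(abs(a[r][c] - b[r][c]) < tol for r in range(3) for c in range(3))


def commutation_rhs(i, j):
    """i hbar sum_k xi_{i,j,k} S^(k), the right-hand side of Eq. (23)."""
    total = [[0j] * 3 for _ in range(3)]
    for k in range(3):
        s_k = spin_matrix(k)
        eps = levi_civita(i, j, k)
        for r in range(3):
            for c in range(3):
                total[r][c] += 1j * HBAR * eps * s_k[r][c]
    return total


def spin_squared():
    """Sum of the squares of the three spin matrices, Eq. (28)."""
    total = [[0j] * 3 for _ in range(3)]
    for i in range(3):
        sq = matmul(spin_matrix(i), spin_matrix(i))
        for r in range(3):
            for c in range(3):
                total[r][c] += sq[r][c]
    return total


if __name__ == "__main__":
    ok = all(matrices_close(commutator(spin_matrix(i), spin_matrix(j)),
                            commutation_rhs(i, j))
             for i in range(3) for j in range(3))
    print("Eq. (23) holds:", ok)
    print("S^2 =", spin_squared())
```

With $\hbar = 1$ the matrix returned by `spin_squared()` is twice the unit matrix, which corresponds to $S(S+1) = 2$ in Eq. (30).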
2.2 Massless Particles with Spin Zero
First, we shall try to construct the Schrödinger equation describing a massless uncharged spinless particle. A spinless particle is described by a scalar wave function, and an uncharged particle is described by a real wave function. The derivation is based on the energy-momentum relation for a massless particle
$$E^2 - p^2 c^2 = 0 \tag{31}$$
which is quantized by using the substitutions
$$E \to i \hbar \frac{\partial}{\partial t} , \qquad \mathbf{p} \to \hat{\mathbf{p}} = - i \hbar \nabla \tag{32}$$
One finds that the real scalar wave function $\psi(\mathbf{r}, t)$ satisfies the wave equation
$$\left[ \frac{1}{c^2} \frac{\partial^2}{\partial t^2} - \nabla^2 \right] \psi = 0 \tag{33}$$
since $\hbar$ drops out. This is not a very useful result, since it is a second-order differential equation in time, and the solution of a second-order differential equation can only be determined if two initial conditions are given. Usually, the initial conditions are given by
$$\psi(\mathbf{r}, 0) = f(\mathbf{r}) , \qquad \left. \frac{\partial \psi}{\partial t} \right|_{t=0} = g(\mathbf{r}) \tag{34}$$
In quantum mechanics, measurements disturb the state of the system and so it becomes impossible to design two independent measurements which can uniquely specify two initial conditions for one state. Hence, one has reached an impasse. Due to this difficulty and since there are no known examples of massless spinless particles found in nature, this theory is not very useful.
2.3 Massless Particles with Spin One
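The wave function of an uncharged spin-one particle is expected to be represented by a real vector function.

We shall try to factorize the wave equation for the vector $\mathbf{E}$ into two first-order differential equations, each of which requires one boundary condition. This requires one to specify six quantities. Therefore, one needs to postulate the existence of two independently measurable fields, $\mathbf{E}$ and $\mathbf{B}$. Each of these fields should satisfy a wave equation, namely
$$\left[ \frac{1}{c^2} \frac{\partial^2}{\partial t^2} - \nabla^2 \right] \mathbf{E} = 0 \tag{35}$$
and
$$\left[ \frac{1}{c^2} \frac{\partial^2}{\partial t^2} - \nabla^2 \right] \mathbf{B} = 0 \tag{36}$$
The first-order equations must have the form
$$\begin{aligned}
i \hbar \frac{\partial \mathbf{E}}{\partial t} &= c \left[ a\, \hat{\mathbf{p}} \wedge \mathbf{E} + b\, \hat{\mathbf{p}} \wedge \mathbf{B} \right] \\
i \hbar \frac{\partial \mathbf{B}}{\partial t} &= c \left[ d\, \hat{\mathbf{p}} \wedge \mathbf{B} + e\, \hat{\mathbf{p}} \wedge \mathbf{E} \right]
\end{aligned} \tag{37}$$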
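since the left-hand side is a vector, the right-hand side must also be a vector composed of the operator $\hat{\mathbf{p}}$ and the wave functions. Like Newton's laws, these equations must be invariant under time reversal, $t \to -t$. The transformation leads to the identification
$$a = d = 0 \tag{38}$$
and
$$b = - e \tag{39}$$
if one also requires that one of the two fields changes sign under time reversal (footnote 2). We shall adopt the convention that the field $\mathbf{E}$ retains its sign, so
$$\mathbf{E} \to \mathbf{E} , \qquad \mathbf{B} \to - \mathbf{B} \tag{40}$$
under time reversal.

Footnote 2: For the non-relativistic Schrödinger equation, time-reversal invariance implies that $t \to t' = -t$ and $\psi \to \psi' = \psi^*$.

On taking the time derivative of the first equation and using the second equation to eliminate $\partial \mathbf{B} / \partial t$, one obtains
$$- \hbar^2 \frac{\partial^2 \mathbf{E}}{\partial t^2} = - c^2 b^2\; \hat{\mathbf{p}} \wedge \left( \hat{\mathbf{p}} \wedge \mathbf{E} \right) \tag{41}$$
Likewise, the $\mathbf{B}$ field is found to satisfy
$$- \hbar^2 \frac{\partial^2 \mathbf{B}}{\partial t^2} = - c^2 b^2\; \hat{\mathbf{p}} \wedge \left( \hat{\mathbf{p}} \wedge \mathbf{B} \right) \tag{42}$$
Thus, one has found the two equations
$$- \hbar^2 \frac{\partial^2 \mathbf{E}}{\partial t^2} = - c^2 b^2 \left[ - \hat{p}^2\, \mathbf{E} + \hat{\mathbf{p}} \left( \hat{\mathbf{p}} \cdot \mathbf{E} \right) \right] \tag{43}$$
and
$$- \hbar^2 \frac{\partial^2 \mathbf{B}}{\partial t^2} = - c^2 b^2 \left[ - \hat{p}^2\, \mathbf{B} + \hat{\mathbf{p}} \left( \hat{\mathbf{p}} \cdot \mathbf{B} \right) \right] \tag{44}$$
On substituting the operator $\hat{\mathbf{p}} = - i \hbar \nabla$, one obtains
$$\frac{\partial^2 \mathbf{E}}{\partial t^2} = - c^2 b^2 \left[ - \nabla^2 \mathbf{E} + \nabla \left( \nabla \cdot \mathbf{E} \right) \right] \tag{45}$$
and
$$\frac{\partial^2 \mathbf{B}}{\partial t^2} = - c^2 b^2 \left[ - \nabla^2 \mathbf{B} + \nabla \left( \nabla \cdot \mathbf{B} \right) \right] \tag{46}$$
so $\hbar$ drops out. To reduce these equations to the form of wave equations, one needs to impose the conditions
$$\nabla \cdot \mathbf{E} = 0 \tag{47}$$
and
$$\nabla \cdot \mathbf{B} = 0 \tag{48}$$
On identifying the coefficients with those of the wave equation, one requires that
$$b^2 = 1 \tag{49}$$
Thus, one has arrived at the set of source-free Maxwell equations
$$\begin{aligned}
\frac{1}{c} \frac{\partial \mathbf{E}}{\partial t} &= \nabla \wedge \mathbf{B} \\
- \frac{1}{c} \frac{\partial \mathbf{B}}{\partial t} &= \nabla \wedge \mathbf{E} \\
\nabla \cdot \mathbf{E} &= 0 \\
\nabla \cdot \mathbf{B} &= 0
\end{aligned} \tag{50}$$
which form the one-particle Schrödinger equation for a massless spin-one particle, with the wave function $(\mathbf{E}, \mathbf{B})$. These have a form which appears to be completely classical, since $\hbar$ has dropped out. Furthermore, in the absence of sources, Maxwell's equations are invariant under the symmetry transformation $(\mathbf{E}, \mathbf{B}) \to (-\mathbf{B}, \mathbf{E})$.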
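The step from Eqs. (41) and (42) to Eqs. (43) and (44) relies on the double vector product identity $\mathbf{a} \wedge ( \mathbf{a} \wedge \mathbf{b} ) = \mathbf{a}\, ( \mathbf{a} \cdot \mathbf{b} ) - a^2\, \mathbf{b}$, which carries over to $\hat{\mathbf{p}}$ because its components commute with one another. The sketch below checks the identity for ordinary numerical vectors; the function names and sample vectors are illustrative assumptions.

```python
def cross(a, b):
    """Vector (wedge) product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def double_cross(a, b):
    """Left-hand side: a ^ (a ^ b)."""
    return cross(a, cross(a, b))


def expanded(a, b):
    """Right-hand side: a (a . b) - (a . a) b."""
    ab, aa = dot(a, b), dot(a, a)
    return tuple(ai * ab - aa * bi for ai, bi in zip(a, b))


if __name__ == "__main__":
    a, b = (1, 2, 3), (4, -5, 6)
    print(double_cross(a, b), expanded(a, b))
    # For a transverse field (a . b = 0) only the -a^2 b term survives,
    # which mirrors the role of the conditions (47) and (48).
    k, field = (0, 0, 2), (3, 0, 0)
    print(double_cross(k, field))
```

The second example illustrates why the transversality conditions are needed: once $\mathbf{a} \cdot \mathbf{b} = 0$, the double vector product reduces to $- a^2\, \mathbf{b}$, the analogue of the $\nabla^2$ term in the wave equation.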
3 Maxwell’s Equations
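Classical Field Theories describe systems in which a very large number of particles are present. Measurements on systems containing very large numbers of particles are expected to result in average values, with only very small deviations. Hence, we expect that the subtleties of quantum measurements should be completely absent in systems that can be described as quantum fields. Classical Electromagnetism is an example of such a quantum field, in which an infinitely large number of photons are present.

In the presence of a current density $\mathbf{j}$ and a charge density $\rho$, Maxwell's equations assume the forms
$$\begin{aligned}
\nabla \wedge \mathbf{B} - \frac{1}{c} \frac{\partial \mathbf{E}}{\partial t} &= \frac{4 \pi}{c}\, \mathbf{j} \\
\nabla \wedge \mathbf{E} + \frac{1}{c} \frac{\partial \mathbf{B}}{\partial t} &= 0 \\
\nabla \cdot \mathbf{E} &= 4 \pi \rho \\
\nabla \cdot \mathbf{B} &= 0
\end{aligned} \tag{51}$$
The field equations ensure that the sources $\mathbf{j}$ and $\rho$ satisfy a continuity equation. Taking the divergence of the first equation and combining it with the time derivative of the third, one obtains
$$\begin{aligned}
\nabla \cdot \left( \nabla \wedge \mathbf{B} \right) - \frac{1}{c} \left( \frac{\partial}{\partial t} \nabla \cdot \mathbf{E} \right) &= \frac{4 \pi}{c}\, \nabla \cdot \mathbf{j} \\
- \frac{1}{c} \left( \frac{\partial}{\partial t} \nabla \cdot \mathbf{E} \right) &= \frac{4 \pi}{c}\, \nabla \cdot \mathbf{j} \\
- \frac{4 \pi}{c} \frac{\partial \rho}{\partial t} &= \frac{4 \pi}{c}\, \nabla \cdot \mathbf{j}
\end{aligned} \tag{52}$$
Hence, one has derived the continuity equation
$$\frac{\partial \rho}{\partial t} + \nabla \cdot \mathbf{j} = 0 \tag{53}$$
which shows that charge is conserved.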
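As a simple illustration of the continuity equation (53), consider a charge distribution that moves rigidly along the x-axis with velocity $v$, so that $\rho(x, t) = f(x - v t)$ and $j = v \rho$. The sketch below assumes a hypothetical Gaussian profile and uses central finite differences; none of the names or values come from the notes.

```python
import math


def rho(x, t, v=0.5):
    """Hypothetical charge density: a Gaussian profile moving with velocity v."""
    return math.exp(-(x - v * t) ** 2)


def current(x, t, v=0.5):
    """Current density j = v rho of the rigidly moving distribution."""
    return v * rho(x, t, v)


def continuity_residual(x, t, v=0.5, h=1e-5):
    """Finite-difference estimate of d(rho)/dt + d(j)/dx, which should vanish."""
    d_rho_dt = (rho(x, t + h, v) - rho(x, t - h, v)) / (2.0 * h)
    d_j_dx = (current(x + h, t, v) - current(x - h, t, v)) / (2.0 * h)
    return d_rho_dt + d_j_dx


if __name__ == "__main__":
    for x in (-1.0, 0.0, 0.7):
        # The residual is zero up to discretization and rounding error.
        print(x, continuity_residual(x, t=0.3))
```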
3.1 Vector and Scalar Potentials
Counting each component of Maxwell's equations separately, one arrives at eight equations for the six components of the unknown fields $\mathbf{E}$ and $\mathbf{B}$. As the equations are linear, this would overdetermine the fields. Two of the eight equations must be regarded as self-consistency equations for the initial conditions on the fields.

One can solve the two source-free Maxwell equations by expressing the electric field $\mathbf{E}$ and the magnetic field $\mathbf{B}$ in terms of the vector potential $\mathbf{A}$ and the scalar potential $\phi$, via
$$\mathbf{E} = - \frac{1}{c} \frac{\partial \mathbf{A}}{\partial t} - \nabla \phi \tag{54}$$
and
$$\mathbf{B} = \nabla \wedge \mathbf{A} \tag{55}$$
The expressions for $\mathbf{B}$ and $\mathbf{E}$ automatically satisfy the two source-free Maxwell equations. This can be seen by examining
$$\nabla \wedge \mathbf{E} + \frac{1}{c} \frac{\partial \mathbf{B}}{\partial t} = 0 \tag{56}$$
which, on substituting the expressions for the electromagnetic fields in terms of the vector and scalar potentials, becomes
$$\nabla \wedge \left( - \frac{1}{c} \frac{\partial \mathbf{A}}{\partial t} - \nabla \phi \right) + \frac{1}{c} \frac{\partial}{\partial t} \left( \nabla \wedge \mathbf{A} \right) = 0 \tag{57}$$
which is automatically satisfied since
$$\nabla \wedge \left( \nabla \phi \right) = 0 \tag{58}$$
and the terms involving $\mathbf{A}$ cancel, since the order of the space and time derivatives can be interchanged when $\mathbf{A}$ is analytic. The remaining source-free Maxwell equation is satisfied, since it has the form
$$\nabla \cdot \mathbf{B} = 0 \tag{59}$$
which reduces to
$$\nabla \cdot \left( \nabla \wedge \mathbf{A} \right) = 0 \tag{60}$$
which is identically zero.

Therefore, the six components of $\mathbf{E}$ and $\mathbf{B}$ have been replaced by the four quantities $\mathbf{A}$ and $\phi$. These four quantities are determined by the Maxwell equations which involve the sources, which are four in number.
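The fields are governed by the set of non-trivial equations which relate $\mathbf{A}$ and $\phi$ to the sources $\mathbf{j}$ and $\rho$. When expressed in terms of $\mathbf{A}$ and $\phi$, the remaining non-trivial Maxwell equations become
$$\begin{aligned}
\nabla \wedge \left( \nabla \wedge \mathbf{A} \right) + \frac{1}{c} \frac{\partial}{\partial t} \left( \frac{1}{c} \frac{\partial \mathbf{A}}{\partial t} + \nabla \phi \right) &= \frac{4 \pi}{c}\, \mathbf{j} \\
- \nabla \cdot \left( \frac{1}{c} \frac{\partial \mathbf{A}}{\partial t} + \nabla \phi \right) &= 4 \pi \rho
\end{aligned} \tag{61}$$
but since
$$\nabla \wedge \left( \nabla \wedge \mathbf{A} \right) = \nabla \left( \nabla \cdot \mathbf{A} \right) - \nabla^2 \mathbf{A} \tag{62}$$
the pair of equations can be written as
$$\begin{aligned}
\left( - \nabla^2 \mathbf{A} + \frac{1}{c^2} \frac{\partial^2 \mathbf{A}}{\partial t^2} \right) + \nabla \left( \nabla \cdot \mathbf{A} + \frac{1}{c} \frac{\partial \phi}{\partial t} \right) &= \frac{4 \pi}{c}\, \mathbf{j} \\
- \nabla^2 \phi - \frac{1}{c} \frac{\partial}{\partial t} \left( \nabla \cdot \mathbf{A} \right) &= 4 \pi \rho
\end{aligned} \tag{63}$$
We shall make use of gauge invariance to simplify these equations.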
3.2 Gauge Invariance
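The vector and scalar potentials are defined as the solutions of the coupled partial differential equations describing the electric and magnetic fields
$$\mathbf{E} = - \frac{1}{c} \frac{\partial \mathbf{A}}{\partial t} - \nabla \phi \tag{64}$$
and
$$\mathbf{B} = \nabla \wedge \mathbf{A} \tag{65}$$
Hence, one expects that the solutions are only determined up to functions of integration. That is, the vector and scalar potentials are not completely determined, even if the electric and magnetic fields are known precisely. It is possible to transform the vector and scalar potentials in such a way that the $\mathbf{E}$ and $\mathbf{B}$ fields remain invariant. These transformations are known as gauge transformations of the second kind (footnote 3).

Footnote 3: The transformation
$$\psi \to \psi' = \psi\, \exp\left[ \frac{i \chi}{\hbar} \right] , \qquad \mathbf{p} \to \hat{\mathbf{p}}' = - i \hbar \nabla - \nabla \chi$$
used in quantum mechanics is known as a gauge transformation of the first kind.

In particular, one can perform the transform
$$\begin{aligned}
\mathbf{A} \to \mathbf{A}' &= \mathbf{A} - \nabla \Lambda \\
\phi \to \phi' &= \phi + \frac{1}{c} \frac{\partial \Lambda}{\partial t}
\end{aligned} \tag{66}$$
where $\Lambda$ is an arbitrary analytic function, and this transformation leaves the $\mathbf{E}$ and $\mathbf{B}$ fields invariant. The magnetic field is seen to be invariant since
$$\begin{aligned}
\mathbf{B}' &= \nabla \wedge \mathbf{A}' \\
&= \nabla \wedge \left( \mathbf{A} - \nabla \Lambda \right) \\
&= \nabla \wedge \mathbf{A} \\
&= \mathbf{B}
\end{aligned} \tag{67}$$
where the identity
$$\nabla \wedge \left( \nabla \Lambda \right) = 0 \tag{68}$$
valid for any scalar function $\Lambda$ has been used. The electric field is invariant, since the transformed electric field is given by
$$\begin{aligned}
\mathbf{E}' &= - \frac{1}{c} \frac{\partial \mathbf{A}'}{\partial t} - \nabla \phi' \\
&= - \frac{1}{c} \frac{\partial}{\partial t} \left( \mathbf{A} - \nabla \Lambda \right) - \nabla \left( \phi + \frac{1}{c} \frac{\partial \Lambda}{\partial t} \right) \\
&= - \frac{1}{c} \frac{\partial \mathbf{A}}{\partial t} - \nabla \phi \\
&= \mathbf{E}
\end{aligned} \tag{69}$$
In the above derivation, it has been noted that the order of the derivatives can be interchanged,
$$\nabla \frac{\partial \Lambda}{\partial t} = \frac{\partial}{\partial t} \nabla \Lambda \tag{70}$$
since $\Lambda$ is an analytic scalar function.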
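The invariance of the magnetic field expressed by Eq. (67) can be illustrated numerically for the static case. The sketch below assumes a hypothetical vector potential and a hypothetical gauge function, both chosen only for illustration, and estimates the curl with central finite differences; the time-dependent check of Eq. (69) is not attempted here.

```python
import math


def partial(f, r, axis, h=1e-4):
    """Central-difference estimate of the partial derivative of f along one axis."""
    forward = list(r)
    backward = list(r)
    forward[axis] += h
    backward[axis] -= h
    return (f(forward) - f(backward)) / (2.0 * h)


def curl(field, r):
    """Numerical curl of a vector field given as a function r -> (Fx, Fy, Fz)."""
    def component(i):
        return lambda q: field(q)[i]
    return (
        partial(component(2), r, 1) - partial(component(1), r, 2),
        partial(component(0), r, 2) - partial(component(2), r, 0),
        partial(component(1), r, 0) - partial(component(0), r, 1),
    )


def vector_potential(r):
    """Hypothetical static vector potential A(r)."""
    x, y, z = r
    return (y * z, x * x, math.sin(x) * y)


def gauge_function(r):
    """Hypothetical smooth gauge function Lambda(r)."""
    x, y, z = r
    return x * y * z + math.cos(y)


def transformed_potential(r):
    """A' = A - grad(Lambda), the first line of Eq. (66)."""
    a = vector_potential(r)
    grad = tuple(partial(gauge_function, r, axis) for axis in range(3))
    return tuple(ai - gi for ai, gi in zip(a, grad))


if __name__ == "__main__":
    point = (0.3, -1.2, 0.7)
    print("B  =", curl(vector_potential, point))
    print("B' =", curl(transformed_potential, point))
```

The two printed fields agree to within the finite-difference error, because the curl of the gradient term vanishes as in Eq. (68).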
The gauge invariance allows us the freedom to impose a gauge condition which fixes the gauge. Two gauge conditions which are commonly used are the Lorenz gauge
$$\nabla \cdot \mathbf{A} + \frac{1}{c} \frac{\partial \phi}{\partial t} = 0 \tag{71}$$
and the Coulomb or radiation gauge
$$\nabla \cdot \mathbf{A} = 0 \tag{72}$$
The Lorenz gauge is manifestly Lorentz invariant, whereas the Coulomb gauge is frequently used in cases where the electrostatic interactions are important. It is always possible to impose one or the other of these gauge conditions. If the vector and scalar potentials $(\phi, \mathbf{A})$ do not satisfy the gauge condition, then one can perform a gauge transformation so that the transformed fields $(\phi', \mathbf{A}')$ satisfy the gauge condition.
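For example, if the fields $(\phi, \mathbf{A})$ do not satisfy the Lorenz gauge condition, since
$$\nabla \cdot \mathbf{A} + \frac{1}{c} \frac{\partial \phi}{\partial t} = \chi(\mathbf{r}, t) \tag{73}$$
where $\chi$ is non-zero, then one can perform the gauge transformation to the new fields $(\phi', \mathbf{A}')$
$$\begin{aligned}
\nabla \cdot \mathbf{A}' + \frac{1}{c} \frac{\partial \phi'}{\partial t} &= \nabla \cdot \mathbf{A} - \nabla^2 \Lambda + \frac{1}{c} \frac{\partial \phi}{\partial t} + \frac{1}{c^2} \frac{\partial^2 \Lambda}{\partial t^2} \\
&= \chi - \left[ \nabla^2 - \frac{1}{c^2} \frac{\partial^2}{\partial t^2} \right] \Lambda
\end{aligned} \tag{74}$$
The new fields satisfy the Lorenz condition if one chooses $\Lambda$ to be the solution of the wave equation
$$\left[ \nabla^2 - \frac{1}{c^2} \frac{\partial^2}{\partial t^2} \right] \Lambda = \chi(\mathbf{r}, t) \tag{75}$$
This can always be done, since the driven wave equation always has a solution. Hence, one can always insist that the fields satisfy the gauge condition
$$\nabla \cdot \mathbf{A}' + \frac{1}{c} \frac{\partial \phi'}{\partial t} = 0 \tag{76}$$
Alternatively, if one is to impose the Coulomb gauge condition
$$\nabla \cdot \mathbf{A}' = 0 \tag{77}$$
one can use Poisson's equation to show that one can always find a $\Lambda$ such that the Coulomb gauge condition is satisfied (footnote 4).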
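In the Lorenz gauge, the equations of motion for the electromagnetic field are given by
$$\begin{aligned}
\left[ - \nabla^2 + \frac{1}{c^2} \frac{\partial^2}{\partial t^2} \right] \mathbf{A} &= \frac{4 \pi}{c}\, \mathbf{j} \\
\left[ - \nabla^2 + \frac{1}{c^2} \frac{\partial^2}{\partial t^2} \right] \phi &= 4 \pi \rho
\end{aligned} \tag{78}$$
Hence, $\mathbf{A}$ and $\phi$ both satisfy the wave equation, where $\mathbf{j}$ and $\rho$ are the sources. The solutions are waves which travel with velocity $c$.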
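For completeness, the reduction of the pair of equations (63) to the form (78) can be written out as a short worked step. It uses only the Lorenz condition (71), in the form $\nabla \cdot \mathbf{A} = - \frac{1}{c} \frac{\partial \phi}{\partial t}$; the layout below is a sketch of that substitution and is not taken verbatim from the notes.

```latex
% Lorenz condition (71): \nabla \cdot \mathbf{A} = - \frac{1}{c} \frac{\partial \phi}{\partial t}
\begin{align*}
\left( - \nabla^2 \mathbf{A} + \frac{1}{c^2} \frac{\partial^2 \mathbf{A}}{\partial t^2} \right)
  + \nabla \underbrace{\left( \nabla \cdot \mathbf{A} + \frac{1}{c} \frac{\partial \phi}{\partial t} \right)}_{=\, 0}
  &= \frac{4 \pi}{c}\, \mathbf{j} \\
- \nabla^2 \phi - \frac{1}{c} \frac{\partial}{\partial t} \left( \nabla \cdot \mathbf{A} \right)
  = - \nabla^2 \phi + \frac{1}{c^2} \frac{\partial^2 \phi}{\partial t^2}
  &= 4 \pi \rho
\end{align*}
```

The first line loses its gradient term because the bracket vanishes identically in this gauge, while the second line acquires the second time derivative of $\phi$ from the substitution for $\nabla \cdot \mathbf{A}$.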
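In the Coulomb gauge, the fields satisfy the equations
$$\begin{aligned}
\left[ - \nabla^2 + \frac{1}{c^2} \frac{\partial^2}{\partial t^2} \right] \mathbf{A} &= \frac{4 \pi}{c}\, \mathbf{j} - \frac{1}{c} \frac{\partial}{\partial t} \nabla \phi \\
- \nabla^2 \phi &= 4 \pi \rho
\end{aligned} \tag{79}$$
The second equation is Poisson's equation and has solutions given by
$$\phi(\mathbf{r}, t) = \int d^3 r'\; \frac{\rho(\mathbf{r}', t)}{| \mathbf{r} - \mathbf{r}' |} \tag{80}$$
which is an "instantaneous" Coulomb interaction. However, the force from the electric field $\mathbf{E}$ is not transmitted instantaneously from $\mathbf{r}'$ to $\mathbf{r}$, since there is a term in the equation for $\mathbf{A}$ which compensates for the "instantaneous" interaction described by $\phi$.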
Exercise:
Consider the case of a uniform magnetic field of magnitude $B$ which is oriented along the z-axis. Using the Coulomb gauge, find a general solution for the vector potential.

Footnote 4: Imposing a gauge condition is insufficient to uniquely determine the vector potential $\mathbf{A}$, since in the case of the Coulomb gauge, the vector potential is only known up to the gradient of any harmonic function $\Lambda$.
4 Relativistic Formulation of Electrodynamics
Physical quantities can be classified as being scalars, vectors or tensors according to how they behave under transformations. Scalars are invariant under Lorentz transformations, and all vectors transform in the same way.
4.1 Lorentz Scalars and Vectors
The spacetime fourvector has components given by the time t and the three
space coordinates (x
(1)
, x
(2)
, x
(3)
) which label an event. The zerothcomponent
of the fourvector x
(0)
(the time component) is deﬁned to be ct, where c is the
velocity of light, in order that all the components have the dimensions of length.
In Minkowski space, the fourvector is deﬁned as having contravariant compo
nents x
µ
= (ct, x
(1)
, x
(2)
, x
(3)
), while the covariant components are denoted
by x
µ
= (ct, −x
(1)
, −x
(2)
, −x
(3)
). The invariant length is given by the scalar
product
x
µ
x
µ
= ( c t )
2
− ( x
(1)
)
2
− ( x
(2)
)
2
− ( x
(3)
)
2
(81)
where repeated indices are summed over. The invariant length $x^\mu x_\mu$ is related
to the proper time $\tau$. This definition can be generalized to the scalar product
of two arbitrary four-vectors $A^\mu$ and $B^\mu$ as
$$ A^\mu B_\mu = A^{(0)} B^{(0)} - A^{(1)} B^{(1)} - A^{(2)} B^{(2)} - A^{(3)} B^{(3)} \tag{82} $$
In special relativity, the four-vector scalar product can be written in terms of
the product of the time-index components and the scalar product of the usual
three-vectors as
$$ A^\mu B_\mu = A^{(0)} B^{(0)} - \mathbf{A} \cdot \mathbf{B} \tag{83} $$
The Lorentz invariant four-vector scalar product can also be written as
$$ A^\mu B_\mu = g_{\mu,\nu} \, A^\mu B^\nu \tag{84} $$
where $g_{\mu,\nu}$ is the Minkowski metric. These equations imply that
$$ A_\mu = g_{\mu,\nu} \, A^\nu \tag{85} $$
That is, the metric tensor transforms contravariant components to covariant
components. The Minkowski metric can be expressed as a four by four matrix
$$ g_{\mu,\nu} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix} \tag{86} $$
where $\mu$ labels the rows and $\nu$ labels the columns. If the four-vectors are
expressed as column-vectors
$$ A^\nu = \begin{pmatrix} A^{(0)} \\ A^{(1)} \\ A^{(2)} \\ A^{(3)} \end{pmatrix} \tag{87} $$
and
$$ A_\nu = \begin{pmatrix} A_{(0)} \\ A_{(1)} \\ A_{(2)} \\ A_{(3)} \end{pmatrix} \tag{88} $$
then the transformation from contravariant to covariant components can be
expressed as
$$ \begin{pmatrix} A_{(0)} \\ A_{(1)} \\ A_{(2)} \\ A_{(3)} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix} \begin{pmatrix} A^{(0)} \\ A^{(1)} \\ A^{(2)} \\ A^{(3)} \end{pmatrix} \tag{89} $$
The inverse transform is expressed as
$$ A^\mu = g^{\mu,\nu} \, A_\nu \tag{90} $$
where, most generally, $g^{\mu,\nu}$ is the inverse metric
$$ g^{\mu,\nu} = ( g_{\mu,\nu} )^{-1} \tag{91} $$
In our particular case of Cartesian coordinates in (flat) Minkowski space, the
inverse metric coincides with the metric.
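The index-lowering rule (85) and the scalar product (84) can be illustrated numerically. The following is a minimal sketch, not part of the original notes: the function names `lower_index` and `scalar_product` and the sample components are assumptions chosen for illustration.

```python
# Minkowski metric g_{mu,nu} with signature (+, -, -, -), as in eqn (86).
METRIC = [
    [1, 0, 0, 0],
    [0, -1, 0, 0],
    [0, 0, -1, 0],
    [0, 0, 0, -1],
]


def lower_index(contravariant):
    """Return A_mu = g_{mu,nu} A^nu for a four-vector given as a list."""
    return [sum(METRIC[mu][nu] * contravariant[nu] for nu in range(4))
            for mu in range(4)]


def scalar_product(a, b):
    """Return A^mu B_mu = g_{mu,nu} A^mu B^nu, eqn (84)."""
    b_lower = lower_index(b)
    return sum(a[mu] * b_lower[mu] for mu in range(4))


# Illustrative contravariant components (A^0, A^1, A^2, A^3).
a = [2.0, 1.0, 0.0, 0.0]
b = [3.0, 1.0, 2.0, 0.0]
a_lower = lower_index(a)        # the spatial components change sign
product = scalar_product(a, b)  # A^0 B^0 - A.B, as in eqn (83)
print(a_lower, product)
```

Because the metric is diagonal with entries ±1, applying `lower_index` twice returns the original components, which is the statement that the inverse metric coincides with the metric.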
A familiar example of the Lorentz invariant scalar product involves the momentum
four-vector with contravariant components $p^\mu \equiv ( E/c, p^{(1)}, p^{(2)}, p^{(3)} )$,
where $E$ is the energy. The covariant components of the momentum four-vector
are given by $p_\mu \equiv ( E/c, -p^{(1)}, -p^{(2)}, -p^{(3)} )$ and the scalar product defines the
invariant mass $m$ via
$$ p^\mu p_\mu = \left( \frac{E}{c} \right)^2 - \mathbf{p}^2 = m^2 c^2 \tag{92} $$
Another scalar product which is frequently encountered is $p^\mu x_\mu$, which is given
by
$$ p^\mu x_\mu = E \, t - \mathbf{p} \cdot \mathbf{x} \tag{93} $$
This scalar product is frequently seen in the description of planes of constant
phase of waves.
4.2 Covariant and Contravariant Derivatives
We shall now generalize the idea of the differential operator $\nabla$ to Minkowski
space. The generalization we consider will have to be modified when the metric
varies in space, i.e. when $g_{\mu,\nu}$ depends on the coordinates $x^\mu$ of the points in
space.
Consider a scalar function $\phi(x^\mu)$ defined in terms of the contravariant
coordinates $x^\mu$. Under an infinitesimal translation $a^\mu$
$$ x^\mu \to x'^\mu = x^\mu + a^\mu \tag{94} $$
the scalar function $\phi(x^\mu)$ is still a scalar. Therefore, on performing a Taylor
expansion, one has
$$ \phi(x^\mu + a^\mu) = \phi(x^\mu) + a^\mu \frac{\partial}{\partial x^\mu} \phi(x^\mu) + \ldots \tag{95} $$
which is also a scalar. Therefore, the quantity
$$ a^\mu \frac{\partial}{\partial x^\mu} \phi(x^\mu) \tag{96} $$
is a scalar and can be interpreted as a scalar product between the contravariant
vector displacement $a^\mu$ and the covariant gradient
$$ \frac{\partial}{\partial x^\mu} \phi(x^\mu) \tag{97} $$
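The covariant gradient can be interpreted in terms of a covariant derivative
$$ \partial_\mu = \frac{\partial}{\partial x^\mu} = \left( \frac{1}{c} \frac{\partial}{\partial t} , \frac{\partial}{\partial x^{(1)}} , \frac{\partial}{\partial x^{(2)}} , \frac{\partial}{\partial x^{(3)}} \right) = \left( \frac{1}{c} \frac{\partial}{\partial t} , \nabla \right) \tag{98} $$
Likewise, one can introduce the contravariant derivative as
$$ \partial^\mu = \frac{\partial}{\partial x_\mu} = \left( \frac{1}{c} \frac{\partial}{\partial t} , - \frac{\partial}{\partial x^{(1)}} , - \frac{\partial}{\partial x^{(2)}} , - \frac{\partial}{\partial x^{(3)}} \right) = \left( \frac{1}{c} \frac{\partial}{\partial t} , - \nabla \right) \tag{99} $$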
These covariant and contravariant derivative operators are useful in making
relativistic transformational properties explicit. For example, if one defines the
four-vector potential $A^\mu$ via
$$ A^\mu = \left( \phi , A^{(1)} , A^{(2)} , A^{(3)} \right) = \left( \phi , \mathbf{A} \right) \tag{100} $$
then the Lorenz gauge condition can be expressed as
$$ \partial_\mu A^\mu = \frac{1}{c} \frac{\partial \phi}{\partial t} + \nabla \cdot \mathbf{A} = 0 \tag{101} $$
which is of the form of a Lorentz scalar. Likewise, if one introduces the current
density four-vector $j^\mu$ with contravariant components
$$ j^\mu = \left( c \rho , j^{(1)} , j^{(2)} , j^{(3)} \right) = \left( c \rho , \mathbf{j} \right) \tag{102} $$
then the condition for conservation of charge can be written as
$$ \frac{\partial \rho}{\partial t} + \nabla \cdot \mathbf{j} = 0 $$
$$ \partial_\mu j^\mu = 0 \tag{103} $$
which is a Lorentz scalar. The gauge transformation can also be compactly
expressed in terms of a transformation of the contravariant vector potential
$$ A^\mu \to A'^\mu = A^\mu + \partial^\mu \Lambda \tag{104} $$
where $\Lambda$ is an arbitrary scalar function. The gauge transformation reduces to
$$ \phi \to \phi' = \phi + \frac{1}{c} \frac{\partial \Lambda}{\partial t} $$
$$ \mathbf{A} \to \mathbf{A}' = \mathbf{A} - \nabla \Lambda \tag{105} $$
Similarly, one can use the contravariant notation to express the quantization
conditions
$$ E \to i \hbar \frac{\partial}{\partial t} $$
$$ \mathbf{p} \to - i \hbar \nabla \tag{106} $$
in the form
$$ p^\mu \to i \hbar \, \partial^\mu \tag{107} $$
One can also express the wave equation operator in terms of the scalar product
of the contravariant and covariant derivative operators
$$ \partial_\mu \partial^\mu = \frac{1}{c^2} \frac{\partial^2}{\partial t^2} - \nabla^2 \tag{108} $$
Hence, in the Lorenz gauge, the equations of motion for the four-vector potential
$A^\mu$ can be expressed concisely as
$$ \partial_\nu \partial^\nu A^\mu = \frac{4 \pi}{c} \, j^\mu \tag{109} $$
However, these equations are not gauge invariant.
4.3 Lorentz Transformations
A Lorentz transform can be defined as any transformation which leaves the
scalar product of two four-vectors invariant. Under a Lorentz transformation,
an arbitrary four-vector $A^\mu$ is transformed to $A'^\mu$, via
$$ A'^\mu = \Lambda^\mu{}_\nu \, A^\nu \tag{110} $$
where the repeated index $\nu$ is summed over. The inverse transformation is
represented by
$$ A^\mu = ( \Lambda^{-1} )^\mu{}_\nu \, A'^\nu \tag{111} $$
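Since the scalar product is to be invariant, one requires
$$ A'^\mu B'_\mu = \Lambda^\mu{}_\nu \, A^\nu \, g_{\mu,\sigma} \, \Lambda^\sigma{}_\tau \, B^\tau \tag{112} $$
If the scalar product is to be invariant, the transform must satisfy the condition
$$ g_{\nu,\tau} = \Lambda^\mu{}_\nu \, g_{\mu,\sigma} \, \Lambda^\sigma{}_\tau \tag{113} $$
If this condition is satisfied, then $\Lambda^\mu{}_\nu$ is a Lorentz transformation.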
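Like the metric tensor, the Lorentz transformation can be expressed as a
four by four matrix
$$ \Lambda^\mu{}_\nu = \begin{pmatrix} \Lambda^0{}_0 & \Lambda^0{}_1 & \Lambda^0{}_2 & \Lambda^0{}_3 \\ \Lambda^1{}_0 & \Lambda^1{}_1 & \Lambda^1{}_2 & \Lambda^1{}_3 \\ \Lambda^2{}_0 & \Lambda^2{}_1 & \Lambda^2{}_2 & \Lambda^2{}_3 \\ \Lambda^3{}_0 & \Lambda^3{}_1 & \Lambda^3{}_2 & \Lambda^3{}_3 \end{pmatrix} \tag{114} $$
where $\mu$ labels the rows and $\nu$ labels the columns. In terms of the matrices, the
condition that $\Lambda$ is a Lorentz transformation can be written as
$$ g = \Lambda^T \, g \, \Lambda \tag{115} $$
where $\Lambda^T$ is the transpose of the matrix $\Lambda$, i.e.
$$ ( \Lambda^T )_\nu{}^\mu = \Lambda^\mu{}_\nu \tag{116} $$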
Figure 5: Two inertial frames of reference moving with a constant relative
velocity with respect to each other.
A specific transformation, which is the transformation from a stationary
frame to a reference frame moving along the $x^{(1)}$-axis with velocity $v$, is
represented by the matrix
$$ \Lambda^\mu{}_\nu = \frac{1}{\sqrt{1 - \frac{v^2}{c^2}}} \begin{pmatrix} 1 & - \frac{v}{c} & 0 & 0 \\ - \frac{v}{c} & 1 & 0 & 0 \\ 0 & 0 & \sqrt{1 - \frac{v^2}{c^2}} & 0 \\ 0 & 0 & 0 & \sqrt{1 - \frac{v^2}{c^2}} \end{pmatrix} \tag{117} $$
which can be seen to satisfy the condition
$$ g_{\nu,\tau} = \Lambda^\mu{}_\nu \, g_{\mu,\sigma} \, \Lambda^\sigma{}_\tau \tag{118} $$
which has to be satisfied if $\Lambda^\mu{}_\nu$ is to represent a Lorentz transformation.
Figure 6: Two inertial frames of reference rotated with respect to each other.
Likewise, the rotation through an angle $\varphi$ about the $x^{(3)}$-axis, represented by
$$ \Lambda^\mu{}_\nu = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos \varphi & \sin \varphi & 0 \\ 0 & - \sin \varphi & \cos \varphi & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \tag{119} $$
is a Lorentz transformation, since it also satisfies the condition eqn(113).
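The matrix condition (115) is easy to check numerically for the boost (117) and the rotation (119). The sketch below is illustrative only: the helper names (`boost_x1`, `rotation_x3`, `is_lorentz`) and the parameter values are assumptions, and the boost is written with the common factor of eqn (117) multiplied into the matrix.

```python
import math

# Minkowski metric g_{mu,nu} with signature (+, -, -, -), eqn (86).
METRIC = [[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]]


def matmul(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(a):
    """Transpose of a 4x4 matrix."""
    return [[a[j][i] for j in range(4)] for i in range(4)]


def boost_x1(beta):
    """Boost along x^(1) with beta = v/c, following eqn (117)."""
    gamma = 1.0 / math.sqrt(1.0 - beta ** 2)
    return [[gamma, -gamma * beta, 0, 0],
            [-gamma * beta, gamma, 0, 0],
            [0, 0, 1, 0],
            [0, 0, 0, 1]]


def rotation_x3(phi):
    """Rotation through the angle phi about x^(3), following eqn (119)."""
    c, s = math.cos(phi), math.sin(phi)
    return [[1, 0, 0, 0], [0, c, s, 0], [0, -s, c, 0], [0, 0, 0, 1]]


def is_lorentz(lam, tol=1e-12):
    """True if Lambda^T g Lambda equals g to within tol, eqn (115)."""
    lhs = matmul(transpose(lam), matmul(METRIC, lam))
    return all(abs(lhs[i][j] - METRIC[i][j]) < tol
               for i in range(4) for j in range(4))


print(is_lorentz(boost_x1(0.6)))      # boost with v = 0.6 c
print(is_lorentz(rotation_x3(0.3)))   # rotation through 0.3 rad
```

The same test applied to a product such as `matmul(boost_x1(0.6), rotation_x3(0.3))` also passes, since the condition (115) is preserved under matrix multiplication.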
Since the boost velocity $v$ and the angles of rotation $\varphi$ are continuous, one
could consider transformations where these quantities are infinitesimal. Such
infinitesimal transformations can be expanded as
$$ \Lambda^\mu{}_\nu = \delta^\mu{}_\nu + \epsilon^\mu{}_\nu + \ldots \tag{120} $$
where $\delta^\mu{}_\nu$ is the Kronecker delta function representing the identity
transformation (footnote 5) and $\epsilon^\mu{}_\nu$ is a matrix which is first-order in the
infinitesimal parameter. The condition on $\epsilon^\mu{}_\nu$ required for $\Lambda^\mu{}_\nu$ to be a
Lorentz transform is given by
$$ 0 = \epsilon^\mu{}_\nu \, g_{\mu,\tau} + g_{\nu,\sigma} \, \epsilon^\sigma{}_\tau \tag{122} $$
or, on using the metric tensor to lower the indices, one has
$$ \epsilon_{\tau,\nu} = - \epsilon_{\nu,\tau} \tag{123} $$
Hence, an arbitrary infinitesimal Lorentz transformation is represented by an
arbitrary antisymmetric $4 \times 4$ matrix $\epsilon_{\tau,\nu}$. This matrix occurs in the expression
for the transformation matrix
$$ \Lambda_{\tau,\nu} = g_{\tau,\nu} + \epsilon_{\tau,\nu} + \ldots \tag{124} $$
which transforms the contravariant components of a vector into covariant
components. It follows that, if $\nu$ and $\tau$ are either both space-indices or are both
time-indices, the components of the finite Lorentz transformation matrix $\Lambda^\tau{}_\nu$
are antisymmetric on interchanging $\tau$ and $\nu$, whereas if the pair of indices $\nu$
and $\tau$ are mixed space- and time-indices, the components of the transformation
matrix $\Lambda^\tau{}_\nu$ are symmetric.
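The statement that an antisymmetric $\epsilon_{\tau,\nu}$ generates a Lorentz transformation to first order can be probed numerically: for $\Lambda = 1 + \epsilon$ the violation of the condition (113) should be of second order in the small parameter. This is a hedged sketch; the antisymmetric matrix `BASE` holds arbitrary illustrative numbers and `lorentz_defect` is a name introduced here.

```python
# Diagonal of the Minkowski metric (+1, -1, -1, -1); it is its own inverse.
G = [1, -1, -1, -1]

# An arbitrary antisymmetric matrix epsilon_{tau,nu} (illustrative numbers).
BASE = [[0, 1, 2, 3],
        [-1, 0, 4, 5],
        [-2, -4, 0, 6],
        [-3, -5, -6, 0]]


def lorentz_defect(scale):
    """Largest element of |Lambda^T g Lambda - g| for Lambda = 1 + epsilon."""
    # Raise the first index: epsilon^mu_nu = g^{mu,tau} epsilon_{tau,nu}.
    eps = [[G[mu] * scale * BASE[mu][nu] for nu in range(4)]
           for mu in range(4)]
    lam = [[(1.0 if mu == nu else 0.0) + eps[mu][nu] for nu in range(4)]
           for mu in range(4)]
    defect = 0.0
    for nu in range(4):
        for tau in range(4):
            # (Lambda^T g Lambda)_{nu,tau}, using the diagonal form of g.
            value = sum(lam[mu][nu] * G[mu] * lam[mu][tau] for mu in range(4))
            target = G[nu] if nu == tau else 0.0
            defect = max(defect, abs(value - target))
    return defect


coarse = lorentz_defect(1e-2)
fine = lorentz_defect(1e-3)
print(coarse, fine, coarse / fine)
```

Reducing the parameter by a factor of ten reduces the defect by roughly a factor of one hundred, which is the behaviour expected when the first-order terms cancel as in eqn (122).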
Exercise:
Show that a Lorentz transformation from the unprimed rest frame to the
primed reference frame moving along the $x^{(3)}$-axis with constant velocity $v$ can
be considered as a rotation through an imaginary angle $\theta = i \chi$ in space-time,
where $i c t$ plays the role of a spatial coordinate. Find the equation that
determines $\chi$.
Footnote 5: The student more adept in index-gymnastics may consider the advantages and
disadvantages of replacing the Kronecker delta function $\delta^\mu{}_\nu$ by $g^\mu{}_\nu$, since
$$ \delta^\mu{}_\nu \equiv g^\mu{}_\nu = g^{\mu,\rho} \, g_{\rho,\nu} = \delta_{\mu,\nu} \tag{121} $$
4.4 Invariant Form of Maxwell’s Equations
In physics, one strives to write the fundamental equations in forms which are
independent of arbitrary choices, such as the coordinate system or the choice of
gauge condition. However, in particular applications it is expedient to choose
the coordinate system and gauge condition in ways that highlight the symmetries
and simplify the mathematics.
We shall introduce an antisymmetric field tensor $F^{\mu,\nu}$ which is gauge
invariant. That is, the form of $F^{\mu,\nu}$ is independent of the choice of gauge. We
shall express the six ($\frac{(16-4)}{2} = 6$) independent components of the antisymmetric
tensor in terms of the four-vector potential $A^\mu$ and the contravariant derivative
as
$$ F^{\mu,\nu} = \partial^\mu A^\nu - \partial^\nu A^\mu \tag{125} $$
so the tensor is antisymmetric
$$ F^{\mu,\nu} = - F^{\nu,\mu} \tag{126} $$
It is immediately obvious that $F^{\mu,\nu}$ is invariant under gauge transformations
$$ A^\mu \to A'^\mu = A^\mu + \partial^\mu \Lambda \tag{127} $$
since
$$ \partial^\mu \partial^\nu \Lambda - \partial^\nu \partial^\mu \Lambda \equiv 0 \tag{128} $$
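Alternatively, explicit evaluation of $F^{\mu,\nu}$ shows that the six independent
components can be expressed in terms of the electric and magnetic fields, which are
gauge invariant. Components of the field tensor are explicitly evaluated from
the definition as
$$ \begin{aligned} F^{0,1} &= \frac{1}{c} \frac{\partial}{\partial t} A^{(1)} - \frac{\partial}{\partial x_1} \phi \\ &= \frac{1}{c} \frac{\partial}{\partial t} A^{(1)} + \frac{\partial}{\partial x^{(1)}} \phi \\ &= - E^{(1)} \end{aligned} \tag{129} $$
and
$$ \begin{aligned} F^{1,2} &= \frac{\partial}{\partial x_1} A^{(2)} - \frac{\partial}{\partial x_2} A^{(1)} \\ &= - \frac{\partial}{\partial x^{(1)}} A^{(2)} + \frac{\partial}{\partial x^{(2)}} A^{(1)} \\ &= - B^{(3)} \end{aligned} \tag{130} $$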
The non-zero components of the field tensor are related to the spatial
components $(i, j, k)$ of the electromagnetic field by
$$ F^{i,0} = E^{(i)} \tag{131} $$
and
$$ F^{i,j} = - \xi^{i,j,k} \, B^{(k)} \tag{132} $$
where $\xi^{i,j,k}$ is the Levi-Civita symbol. Therefore, the field tensor can be
expressed as the matrix
$$ F^{\mu,\nu} = \begin{pmatrix} 0 & -E^{(1)} & -E^{(2)} & -E^{(3)} \\ E^{(1)} & 0 & -B^{(3)} & B^{(2)} \\ E^{(2)} & B^{(3)} & 0 & -B^{(1)} \\ E^{(3)} & -B^{(2)} & B^{(1)} & 0 \end{pmatrix} \tag{133} $$
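Maxwell’s equations can be written in terms of the field tensor as
$$ \partial_\nu F^{\nu,\mu} = \frac{4 \pi}{c} \, j^\mu \tag{134} $$
For $\mu = i$, the field equations become
$$ \begin{aligned} \frac{1}{c} \frac{\partial}{\partial t} F^{0,i} + \frac{\partial}{\partial x^j} F^{j,i} &= \frac{4 \pi}{c} \, j^{(i)} \\ - \frac{1}{c} \frac{\partial}{\partial t} E^{(i)} + \xi^{i,j,k} \frac{\partial}{\partial x^j} B^{(k)} &= \frac{4 \pi}{c} \, j^{(i)} \\ - \frac{1}{c} \frac{\partial}{\partial t} E^{(i)} + \left( \nabla \wedge \mathbf{B} \right)^{(i)} &= \frac{4 \pi}{c} \, j^{(i)} \end{aligned} \tag{135} $$
while for $\mu = 0$ the equations reduce to
$$ \begin{aligned} \frac{\partial}{\partial x^j} F^{j,0} &= \frac{4 \pi}{c} \, j^{(0)} \\ \frac{\partial}{\partial x^j} E^{(j)} &= 4 \pi \rho \\ \left( \nabla \cdot \mathbf{E} \right) &= 4 \pi \rho \end{aligned} \tag{136} $$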
since $F^{0,0}$ vanishes. The above field equations are the two Maxwell’s equations
which involve the sources of the fields. The remaining two sourceless Maxwell
equations are expressed in terms of the antisymmetric field tensor as
$$ \partial^\mu F^{\nu,\rho} + \partial^\rho F^{\mu,\nu} + \partial^\nu F^{\rho,\mu} = 0 \tag{137} $$
where the indices are permuted cyclically. These internal equations reduce to
$$ \nabla \cdot \mathbf{B} = 0 \tag{138} $$
when $\mu$, $\nu$ and $\rho$ are the space indices (1, 2, 3). When one index taken from the
set $(\mu, \nu, \rho)$ is the time index, and the other two are different space indices, the
field equations reduce to
$$ \frac{1}{c} \frac{\partial \mathbf{B}}{\partial t} + \nabla \wedge \mathbf{E} = 0 \tag{139} $$
If two indices are repeated, the above equations are satisfied identically, due to
the antisymmetry of the field tensor.
Alternatively, when expressed in terms of the vector potential, the field
equations of motion are equivalent to the wave equations
$$ \partial_\nu \partial^\nu A^\mu - \partial^\mu \left( \partial_\nu A^\nu \right) = \frac{4 \pi}{c} \, j^\mu \tag{140} $$
Since four-vectors $A^\mu$ and $j^\mu$ transform as
$$ A'^\mu = \Lambda^\mu{}_\nu \, A^\nu $$
$$ j'^\mu = \Lambda^\mu{}_\nu \, j^\nu \tag{141} $$
and likewise for the contravariant derivative
$$ \partial'^\mu = \Lambda^\mu{}_\nu \, \partial^\nu \tag{142} $$
then one can conclude that the field tensor transforms as
$$ F'^{\mu,\nu} = \Lambda^\mu{}_\sigma \, \Lambda^\nu{}_\tau \, F^{\sigma,\tau} \tag{143} $$
This shows that, under a Lorentz transform, the electric and magnetic fields
(E, B) transform into themselves.
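The transformation law (143) can be applied numerically to the matrix (133). The sketch below assumes a boost along the $x^{(1)}$-axis of the form (117) and arbitrary illustrative field values. It also compares $E^2 - B^2$ and $\mathbf{E} \cdot \mathbf{B}$ before and after the boost; their invariance is a standard result that is not derived in the text above. All function names are assumptions introduced for this example.

```python
import math


def field_tensor(e, b):
    """Contravariant F^{mu,nu} built from E and B as in eqn (133)."""
    return [[0.0, -e[0], -e[1], -e[2]],
            [e[0], 0.0, -b[2], b[1]],
            [e[1], b[2], 0.0, -b[0]],
            [e[2], -b[1], b[0], 0.0]]


def boost_x1(beta):
    """Boost along x^(1) with beta = v/c, following eqn (117)."""
    gamma = 1.0 / math.sqrt(1.0 - beta ** 2)
    return [[gamma, -gamma * beta, 0.0, 0.0],
            [-gamma * beta, gamma, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def transform(lam, f):
    """F'^{mu,nu} = Lambda^mu_sigma Lambda^nu_tau F^{sigma,tau}, eqn (143)."""
    return [[sum(lam[mu][s] * lam[nu][t] * f[s][t]
                 for s in range(4) for t in range(4))
             for nu in range(4)] for mu in range(4)]


def fields(f):
    """Read E and B back out of F^{mu,nu} using eqns (131) and (132)."""
    e = [f[1][0], f[2][0], f[3][0]]
    b = [f[3][2], f[1][3], f[2][1]]
    return e, b


def dot(u, v):
    """Ordinary three-vector scalar product."""
    return sum(x * y for x, y in zip(u, v))


e, b = [1.0, 2.0, 3.0], [0.5, -1.0, 2.0]   # illustrative field values
e_new, b_new = fields(transform(boost_x1(0.6), field_tensor(e, b)))
print(dot(e, e) - dot(b, b), dot(e_new, e_new) - dot(b_new, b_new))
print(dot(e, b), dot(e_new, b_new))
```

The components of E and B along the boost direction are unchanged, while the transverse components mix with one another.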
Exercise:
Show explicitly how the components of the electric and magnetic fields
change when the coordinate system is transformed from the unprimed reference
frame to a primed reference frame which is moving along the $x^{(3)}$-axis with
constant velocity $v$.
5 The Simplest Classical Field Theory
Consider a string stretched along the x-axis, which can support motion in the
y-direction. We shall consider the string to be composed of mass elements
$m_i = \rho a$, that have fixed x-coordinates denoted by $x_i$ and are separated by
a distance $a$. The mass elements can be displaced along the y-axis. The
y-coordinate of the $i$-th mass element is denoted by $y_i$. We shall assume that the
string satisfies spatial boundary conditions at each end; specifically, we shall assume
that the string satisfies periodic boundary conditions, so that $y_0 = y_{N+1}$.
The Lagrangian for the string is a function of the coordinates $y_i$ and the
velocities $\frac{dy_i}{dt}$. The Lagrangian is given by
$$ L = \sum_{i=1}^{i=N} \left[ \frac{m_i}{2} \left( \frac{dy_i}{dt} \right)^2 - \frac{\kappa_i}{2} \left( y_i - y_{i-1} \right)^2 \right] \tag{144} $$
Figure 7: A string composed of a discrete set of particles of masses $m_i$
separated by a distance $a$ along the x-axis. The particles can be moved from their
equilibrium positions by displacements $y_i$ transverse to the x-axis.
The first term represents the kinetic energy of the mass elements, and the second
term represents the increase in the elastic potential energy of the section of the
string between the $i$-th and $(i-1)$-th element as the string is stretched from its
equilibrium position. This follows since $\Delta s_i$, the length of the section of string
between mass elements $i$ and $i-1$ in a non-equilibrium position, is given by
$$ \begin{aligned} \Delta s_i^2 &= ( x_i - x_{i-1} )^2 + ( y_i - y_{i-1} )^2 \\ &= a^2 + ( y_i - y_{i-1} )^2 \end{aligned} \tag{145} $$
since the x-coordinates are fixed. Thus, if one assumes that the spring constant
for the stretched string segment is $\kappa_i$, then the potential energy of the segment
is given by
$$ V_i = \frac{\kappa_i}{2} ( y_i - y_{i-1} )^2 \tag{146} $$
We shall consider the case of a uniform string for which $\kappa_i = \kappa$ for all $i$.
The equations of motion are obtained by minimizing the action $S$ which is
defined as the integral
$$ S = \int_0^T dt \, L \tag{147} $$
between an initial configuration at time 0 and a final configuration at time $T$.
The action is a functional of the coordinates $y_i$ and the velocities $\frac{dy_i}{dt}$, which are
to be evaluated for arbitrary functions $y_i(t)$. The string follows the trajectory
$y_i^{ex}(t)$ which minimizes the action and which travels between the fixed initial value
$y_i(0)$ and the final value $y_i(T)$. We shall represent the deviation of an arbitrary
trajectory $y_i(t)$ from the extremal trajectory by $\delta y_i(t)$, so that
$$ \delta y_i(t) = y_i(t) - y_i^{ex}(t) \tag{148} $$
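The action can be expanded in powers of the deviations $\delta y_i$ as
$$ S = S_0 + \delta^1 S + \delta^2 S + \ldots \tag{149} $$
where $S_0$ is the action evaluated for the extremal trajectories. The first-order
deviation found by varying $\delta y_i$ is given by
$$ \delta^1 S = \int_0^T dt \sum_{i=1}^{i=N} \left[ m_i \left( \frac{d \delta y_i}{dt} \right) \left( \frac{dy_i^{ex}}{dt} \right) - \kappa \, \delta y_i \left( y_i^{ex} - y_{i-1}^{ex} \right) + \kappa \, \delta y_i \left( y_{i+1}^{ex} - y_i^{ex} \right) \right] \tag{150} $$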
in which $y_i(t)$ and $\frac{dy_i}{dt}$ are to be evaluated for the extremal trajectory. Since
the trajectory which the string follows minimizes the action, the term $\delta^1 S$ must
vanish for an arbitrary variation $\delta y_i$. We can eliminate the time derivative of
the deviation by integrating by parts with respect to $t$. This yields
$$ \begin{aligned} \delta^1 S = & \int_0^T dt \sum_{i=1}^{i=N} \left[ - m_i \, \delta y_i \frac{d}{dt} \left( \frac{dy_i^{ex}}{dt} \right) - \kappa \, \delta y_i \left( y_i^{ex} - y_{i-1}^{ex} \right) + \kappa \, \delta y_i \left( y_{i+1}^{ex} - y_i^{ex} \right) \right] \\ & + \sum_i m_i \, \delta y_i(t) \left( \frac{dy_i^{ex}}{dt} \right) \bigg|_0^T \end{aligned} \tag{151} $$
The boundary term vanishes since the initial and final configurations are fixed,
so
$$ \delta y_i(T) = \delta y_i(0) = 0 \tag{152} $$
Hence the first-order variation of the action reduces to
$$ \delta^1 S = \int_0^T dt \sum_{i=1}^{i=N} \delta y_i \left[ - m_i \frac{d}{dt} \left( \frac{dy_i^{ex}}{dt} \right) - \kappa \left( 2 y_i^{ex} - y_{i-1}^{ex} - y_{i+1}^{ex} \right) \right] \tag{153} $$
The linear variation of the action vanishes for an arbitrary $\delta y_i(t)$, if the term in
the square brackets vanishes
$$ m_i \frac{d}{dt} \left( \frac{dy_i^{ex}}{dt} \right) + \kappa \left( 2 y_i^{ex} - y_{i-1}^{ex} - y_{i+1}^{ex} \right) = 0 \tag{154} $$
Thus, out of all possible trajectories, the physical trajectory $y_i^{ex}(t)$ is determined
by the equation of motion
$$ m_i \frac{d}{dt} \left( \frac{dy_i}{dt} \right) = - \kappa \left( 2 y_i - y_{i-1} - y_{i+1} \right) \tag{155} $$
The momentum $p_i$ which is canonically conjugate to $y_i$ is determined by
$$ p_i = \left( \frac{\partial L}{\partial ( \frac{dy_i}{dt} )} \right) \tag{156} $$
which yields the momentum as
$$ p_i = m_i \frac{dy_i}{dt} \tag{157} $$
The Hamiltonian is defined as the Legendre transform of $L$, so
$$ H = \sum_i p_i \frac{dy_i}{dt} - L \tag{158} $$
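The Hamiltonian is only a function of the pairs of canonically conjugate
momenta $p_i$ and coordinates $y_i$. This can be seen by considering infinitesimal changes
in $y_i$, $\frac{dy_i}{dt}$ and $p_i$. The resulting infinitesimal change in the Hamiltonian $dH$ is
expressed as
$$ \begin{aligned} dH &= \sum_{i=1}^{i=N} \left[ dp_i \frac{dy_i}{dt} + p_i \, d\left( \frac{dy_i}{dt} \right) - \left( \frac{\partial L}{\partial ( \frac{dy_i}{dt} )} \right) d\left( \frac{dy_i}{dt} \right) - \left( \frac{\partial L}{\partial y_i} \right) dy_i \right] \\ &= \sum_{i=1}^{i=N} \left[ dp_i \frac{dy_i}{dt} - \left( \frac{\partial L}{\partial y_i} \right) dy_i \right] \end{aligned} \tag{159} $$
since the terms proportional to the infinitesimal change $d( \frac{dy_i}{dt} )$ vanish
identically, due to the definition of $p_i$. From this, one finds
$$ \frac{\partial H}{\partial p_i} = \frac{dy_i}{dt} \tag{160} $$
and
$$ \frac{\partial H}{\partial y_i} = - \left( \frac{\partial L}{\partial y_i} \right) \tag{161} $$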
Therefore, the Hamiltonian is only a function of the pairs of canonically
conjugate variables $p_i$ and $y_i$. The Hamiltonian is given by
$$ H = \sum_{i=1}^{i=N} \left[ \frac{p_i^2}{2 m_i} + \frac{\kappa_i}{2} \left( y_i - y_{i-1} \right)^2 \right] \tag{162} $$
When expressed in terms of the Hamiltonian, the equations of motion have the
form
$$ \frac{dy_i}{dt} = \frac{\partial H}{\partial p_i} $$
$$ \frac{dp_i}{dt} = - \frac{\partial H}{\partial y_i} \tag{163} $$
The Hamilton equations of motion reduce to
$$ \frac{dy_i}{dt} = \frac{p_i}{m_i} $$
$$ \frac{dp_i}{dt} = - \kappa_i \left( y_i - y_{i-1} \right) + \kappa_{i+1} \left( y_{i+1} - y_i \right) \tag{164} $$
for each i value N ≥ i ≥ 1.
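Hamilton's equations (164) can be integrated numerically for a small uniform chain. The sketch below is an illustration under stated assumptions rather than part of the notes: the chain is closed into a ring of N elements (indices taken modulo N), a leapfrog (velocity Verlet) integrator is used because it conserves energy well for Hamiltonian systems, and the parameter values are arbitrary.

```python
import math


def forces(y, kappa):
    """dp_i/dt from eqn (164) for a uniform chain with periodic ends."""
    n = len(y)
    return [-kappa * (y[i] - y[i - 1]) + kappa * (y[(i + 1) % n] - y[i])
            for i in range(n)]


def energy(y, p, mass, kappa):
    """Hamiltonian of eqn (162) with m_i = mass and kappa_i = kappa."""
    n = len(y)
    kinetic = sum(pi ** 2 / (2.0 * mass) for pi in p)
    potential = sum(0.5 * kappa * (y[i] - y[i - 1]) ** 2 for i in range(n))
    return kinetic + potential


def evolve(y, p, mass, kappa, dt, steps):
    """Integrate Hamilton's equations with the leapfrog (Verlet) scheme."""
    y, p = list(y), list(p)
    f = forces(y, kappa)
    for _ in range(steps):
        p = [pi + 0.5 * dt * fi for pi, fi in zip(p, f)]    # half kick
        y = [yi + dt * pi / mass for yi, pi in zip(y, p)]   # drift
        f = forces(y, kappa)
        p = [pi + 0.5 * dt * fi for pi, fi in zip(p, f)]    # half kick
    return y, p


n, mass, kappa = 16, 1.0, 1.0          # illustrative parameters
y0 = [0.1 * math.sin(2.0 * math.pi * i / n) for i in range(n)]
p0 = [0.0] * n
y1, p1 = evolve(y0, p0, mass, kappa, dt=0.01, steps=2000)
e0, e1 = energy(y0, p0, mass, kappa), energy(y1, p1, mass, kappa)
print(e0, e1)
```

The two printed energies should agree closely, reflecting the fact that H is a constant of the motion; the small residual difference comes from the finite time step.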
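One can define the Poisson brackets of two arbitrary quantities $A$ and $B$ in
terms of derivatives with respect to the canonically conjugate variables
$$ \left\{ A , B \right\} = \sum_{i=1}^{i=N} \left( \frac{\partial A}{\partial y_i} \frac{\partial B}{\partial p_i} - \frac{\partial B}{\partial y_i} \frac{\partial A}{\partial p_i} \right) \tag{165} $$
The Poisson bracket is antisymmetric in $A$ and $B$
$$ \left\{ A , B \right\} = - \left\{ B , A \right\} \tag{166} $$
The Poisson brackets of the canonically conjugate variables are given by
$$ \left\{ p_i , y_j \right\} = - \delta_{i,j} \tag{167} $$
and
$$ \left\{ p_i , p_j \right\} = \left\{ y_i , y_j \right\} = 0 \tag{168} $$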
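The definition (165) can be evaluated numerically by replacing the partial derivatives with central differences. The following sketch uses a hypothetical helper `poisson_bracket` and an arbitrary phase-space point, both introduced only for illustration, and checks the elementary brackets (167) and (168).

```python
def poisson_bracket(f, g, y, p, h=1e-5):
    """{f, g} of eqn (165), with derivatives taken by central differences."""
    def partial(func, use_p, i):
        up_y, up_p, dn_y, dn_p = list(y), list(p), list(y), list(p)
        if use_p:
            up_p[i] += h
            dn_p[i] -= h
        else:
            up_y[i] += h
            dn_y[i] -= h
        return (func(up_y, up_p) - func(dn_y, dn_p)) / (2.0 * h)

    return sum(partial(f, False, i) * partial(g, True, i)
               - partial(g, False, i) * partial(f, True, i)
               for i in range(len(y)))


def coordinate(j):
    """The phase-space function that returns y_j."""
    return lambda y, p: y[j]


def momentum(j):
    """The phase-space function that returns p_j."""
    return lambda y, p: p[j]


y, p = [0.3, -0.1, 0.2], [1.0, 0.5, -0.7]   # illustrative phase-space point
print(poisson_bracket(momentum(0), coordinate(0), y, p))    # {p_0, y_0}
print(poisson_bracket(momentum(0), coordinate(1), y, p))    # {p_0, y_1}
print(poisson_bracket(coordinate(0), coordinate(1), y, p))  # {y_0, y_1}
```

With this sign convention the bracket of a coordinate with the Hamiltonian, $\{ y_i , H \}$, reduces to $\partial H / \partial p_i$, which is the first of Hamilton's equations (163).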
We shall show how energy is conserved by considering a finite segment of
the string. For example, we shall consider the segment of the string consisting
of the $i$-th mass element and the string which connects the $i$-th and $(i-1)$-th
mass element. The energy of this segment will be described by $H_i$, where
$$ H_i = \frac{\rho a}{2} \left( \frac{dy_i}{dt} \right)^2 + \frac{\kappa}{2} \left( y_i - y_{i-1} \right)^2 \tag{169} $$
The rate of increase of energy in this segment is given by
$$ \frac{dH_i}{dt} = \rho a \left( \frac{d^2 y_i}{dt^2} \right) \left( \frac{dy_i}{dt} \right) + \kappa \left( \frac{dy_i}{dt} \right) \left( y_i - y_{i-1} \right) - \kappa \left( \frac{dy_{i-1}}{dt} \right) \left( y_i - y_{i-1} \right) \tag{170} $$
One can use the equation of motion to eliminate the acceleration term, leading
to
$$ \frac{dH_i}{dt} = \kappa \left( \frac{dy_i}{dt} \right) \left( y_{i+1} - y_i \right) - \kappa \left( \frac{dy_{i-1}}{dt} \right) \left( y_i - y_{i-1} \right) \tag{171} $$
The increase in energy of this segment, per unit time, is clearly given by the
difference of the quantity
$$ T_i = - \kappa \left( \frac{dy_i}{dt} \right) \left( y_{i+1} - y_i \right) \tag{172} $$
at the front end of the segment and $T_{i-1}$ at the back end of the segment. Since,
from continuity of energy, the rate of increase in the energy of the segment must
equal the net inflow of energy into the segment, one can identify $T_i$ as the flux
of energy flowing out of the $i$-th into the $(i+1)$-th segment.
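The energy balance between eqns (170) and (172) is an algebraic identity once the equation of motion (155) is used, so it can be checked at a single instant with arbitrary displacements and velocities. The sketch below assumes periodic neighbours (indices modulo N) and illustrative parameter values; `energy_rates` is a name introduced here.

```python
def energy_rates(y, v, rho, a, kappa):
    """Return dH_i/dt from eqn (170) and T_{i-1} - T_i built from eqn (172)."""
    n = len(y)
    # Acceleration from the equation of motion (155) with m_i = rho * a;
    # the neighbour of the last element is the first (assumed periodic).
    acc = [-kappa * (2.0 * y[i] - y[i - 1] - y[(i + 1) % n]) / (rho * a)
           for i in range(n)]
    # Energy flux leaving segment i, eqn (172).
    flux = [-kappa * v[i] * (y[(i + 1) % n] - y[i]) for i in range(n)]
    # Rate of change of the segment energy, eqn (170).
    rate = [rho * a * acc[i] * v[i]
            + kappa * v[i] * (y[i] - y[i - 1])
            - kappa * v[i - 1] * (y[i] - y[i - 1])
            for i in range(n)]
    net_inflow = [flux[i - 1] - flux[i] for i in range(n)]
    return rate, net_inflow


y = [0.0, 0.3, -0.2, 0.5, 0.1]       # illustrative displacements
v = [0.4, -0.1, 0.2, 0.0, -0.3]      # illustrative velocities
rate, net_inflow = energy_rates(y, v, rho=2.0, a=0.5, kappa=3.0)
print(rate)
print(net_inflow)
```

The two printed lists should coincide element by element, and the entries of `rate` sum to zero on a closed ring, which expresses conservation of the total energy.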
5.1 The Continuum Limit
The displacement of each element of the string can be expressed as a function
of its position, via
$$ y_i = y(x_i) \tag{173} $$
where each segment has length $a$, so that $x_{i+1} = x_i + a$. The displacement
$y(x_{i+1})$ can be Taylor expanded about $x_i$ as
$$ y(x_{i+1}) = y(x_i) + a \, \frac{\partial y}{\partial x} \bigg|_{x_i} + \frac{a^2}{2!} \frac{\partial^2 y}{\partial x^2} \bigg|_{x_i} + \ldots \tag{174} $$
We intend to take the limit $a \to 0$, so that only the first few terms of the series
need to be retained. The summations over $i$ are to be replaced by integrations
$$ \sum_{i=1}^{N} \to \frac{1}{a} \int_0^L dx \tag{175} $$
The tension in the string $T$ is given by
$$ T = \kappa \, a \tag{176} $$
and this has to be kept constant when the limit $a \to 0$ is taken.
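As a worked illustration of this limit, the discrete equation of motion (155) can be combined with the Taylor expansion (174). The steps below are a sketch that assumes a uniform string ($m_i = \rho a$, $\kappa_i = \kappa$) and uses the expansion for both neighbours $x_{i \pm 1} = x_i \pm a$.

```latex
% Equation of motion (155) with m_i = \rho a:
\rho \, a \, \frac{\partial^2 y(x_i)}{\partial t^2}
  = \kappa \left[ y(x_{i+1}) + y(x_{i-1}) - 2 \, y(x_i) \right]

% Taylor expansion (174) for both neighbours; the odd powers of a cancel:
y(x_{i+1}) + y(x_{i-1}) - 2 \, y(x_i)
  = a^2 \, \frac{\partial^2 y}{\partial x^2} \bigg|_{x_i} + O(a^4)

% Divide by a and take a \to 0 at fixed tension T = \kappa a:
\rho \, \frac{\partial^2 y}{\partial t^2}
  = \kappa \, a \, \frac{\partial^2 y}{\partial x^2}
\qquad \Longrightarrow \qquad
v^2 = \frac{\kappa \, a}{\rho} = \frac{T}{\rho}
```

The result is a wave equation with wave velocity $v$, which shows why the product $\kappa a$, rather than $\kappa$ alone, has to be held fixed.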
In the continuum limit, the Lagrangian $L$ can be expressed as an integral of
the Lagrangian density $\mathcal{L}$ as
$$ L = \int_0^L dx \, \mathcal{L} \tag{177} $$
where
$$ \mathcal{L} = \frac{1}{2} \left[ \rho \left( \frac{dy}{dt} \right)^2 - \kappa a \left( \frac{\partial y}{\partial x} \right)^2 \right] \tag{178} $$
The equations of motion are found from the extrema of the action
$$ S = \int_0^T dt \int_0^L dx \, \mathcal{L} \tag{179} $$
It should be noted that in $S$ time and space are treated on the same footing
and that $\mathcal{L}$ is a scalar quantity.
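In the continuum limit, the Hamiltonian is given by
$$ H = \int_0^L dx \, \mathcal{H} \tag{180} $$
where the Hamiltonian density $\mathcal{H}$ is given by
$$ \mathcal{H} = \frac{1}{2} \left[ \rho \left( \frac{dy}{dt} \right)^2 + \kappa a \left( \frac{\partial y}{\partial x} \right)^2 \right] \tag{181} $$
and the energy flux $T$ is given by
$$ T = - \kappa a \left( \frac{dy}{dt} \right) \left( \frac{\partial y}{\partial x} \right) \tag{182} $$
The condition of conservation of energy is expressed as the continuity equation
$$ \frac{d \mathcal{H}}{dt} + \frac{\partial T}{\partial x} = 0 \tag{183} $$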
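The continuity equation can be verified directly from the densities (181) and (182). The short derivation below is a sketch which assumes that $y(x,t)$ obeys the wave equation $\rho \, \partial^2 y / \partial t^2 = \kappa a \, \partial^2 y / \partial x^2$.

```latex
% Time derivative of the Hamiltonian density (181):
\frac{d \mathcal{H}}{dt}
  = \rho \, \frac{\partial y}{\partial t} \frac{\partial^2 y}{\partial t^2}
  + \kappa a \, \frac{\partial y}{\partial x} \frac{\partial^2 y}{\partial x \, \partial t}

% Space derivative of the energy flux (182):
\frac{\partial T}{\partial x}
  = - \kappa a \, \frac{\partial^2 y}{\partial x \, \partial t} \frac{\partial y}{\partial x}
  - \kappa a \, \frac{\partial y}{\partial t} \frac{\partial^2 y}{\partial x^2}

% The mixed-derivative terms cancel in the sum, leaving
\frac{d \mathcal{H}}{dt} + \frac{\partial T}{\partial x}
  = \frac{\partial y}{\partial t}
    \left( \rho \, \frac{\partial^2 y}{\partial t^2}
         - \kappa a \, \frac{\partial^2 y}{\partial x^2} \right) = 0
```

The bracket in the last line vanishes whenever the wave equation is satisfied, so energy is conserved locally for every solution.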
5.2 Normal Modes
The solutions of the equations of motion are of the form of a (real) superposition
of plane waves
$$ \psi_k(x) = \frac{1}{\sqrt{L}} \exp \left[ i ( k x - \omega t ) \right] \tag{184} $$
The above expression satisfies the wave equation if the frequency $\omega$ satisfies the
dispersion relation
$$ \omega_k^2 = v^2 k^2 \tag{185} $$
The above dispersion relation yields both positive and negative frequency
solutions. If the plane-waves are to satisfy periodic boundary conditions, $k$ must be
quantized so that
$$ k_n = \frac{2 \pi}{L} \, n \tag{186} $$
for integer $n$. The positive-frequency solutions shall be written as
$$ \psi_k(x) = \frac{1}{\sqrt{L}} \exp \left[ i ( k_n x - \omega_n t ) \right] \tag{187} $$
and the negative frequency solutions as
$$ \psi^*_{-k}(x) = \frac{1}{\sqrt{L}} \exp \left[ i ( k_n x + \omega_n t ) \right] \tag{188} $$
These solutions form an orthonormal set since
$$ \int_0^L dx \, \psi^*_{k'}(x) \, \psi_k(x) = \delta_{k',k} \tag{189} $$
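The orthonormality relation can be checked numerically by replacing the integral with a Riemann sum on a uniform grid. The sketch below evaluates the plane waves at t = 0; the length L = 2, the number of grid points and the function names are assumptions made for illustration.

```python
import cmath
import math


def plane_wave(n, x, length):
    """psi_k(x) at t = 0 for k_n = 2 pi n / L, as in eqns (186) and (187)."""
    k = 2.0 * math.pi * n / length
    return cmath.exp(1j * k * x) / math.sqrt(length)


def overlap(n_prime, n, length=2.0, points=64):
    """Riemann-sum estimate of the integral in eqn (189)."""
    dx = length / points
    return dx * sum(plane_wave(n_prime, j * dx, length).conjugate()
                    * plane_wave(n, j * dx, length)
                    for j in range(points))


print(abs(overlap(3, 3)))    # same mode
print(abs(overlap(3, 5)))    # different modes
```

On a uniform grid the sum over equally spaced phases vanishes unless the two mode numbers coincide (or differ by a multiple of the number of grid points), which is the discrete counterpart of the Kronecker delta.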
Hence, a general solution can be written as
$$ y(x) = \sum_k \left[ c_k(0) \, \psi_k(x) + c^*_{-k}(0) \, \psi^*_{-k}(x) \right] \tag{190} $$
where the $c_k$ are arbitrary complex numbers that depend on $k$. If the time
dependence of the $\psi_k(x)$ is absorbed into the complex functions $c_k$ via
$$ c_k(t) = c_k(0) \exp \left[ - i \omega_k t \right] \tag{191} $$
then one has
$$ y(x) = \frac{1}{\sqrt{L}} \sum_k \left( c_k(t) + c^*_{-k}(t) \right) \exp \left[ i k x \right] \tag{192} $$
which is purely real. Thus, the field $y(x)$ is determined by the amplitudes of
the normal modes, i.e. by $c_k(t)$. The time-dependent amplitude $c_k(t)$ satisfies
the equation of motion
$$ \frac{d^2 c_k}{dt^2} = - \omega_k^2 \, c_k \tag{193} $$
and, therefore, behaves like a classical harmonic oscillator. To quantize this
classical field theory, one needs to quantize these harmonic oscillators.
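The Hamiltonian is expressed as
$$ H = \frac{1}{2 a} \int_0^L dx \left[ \frac{1}{\rho a} \, p(x)^2 + \kappa a^2 \left( \frac{\partial y(x)}{\partial x} \right)^2 \right] \tag{194} $$
On substituting $y(x)$ in the form
$$ y(x) = \frac{1}{\sqrt{L}} \sum_k \left( c_k(t) + c^*_{-k}(t) \right) \exp \left[ i k x \right] \tag{195} $$
and
$$ p(x) = \frac{\rho a}{\sqrt{L}} \sum_k \left( \frac{dc_k}{dt} + \frac{dc^*_{-k}}{dt} \right) \exp \left[ i k x \right] \tag{196} $$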
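then after integrating over $x$, one finds that the energy has the form
$$ \begin{aligned} H = & \frac{\rho}{2} \sum_k \left( \frac{dc_{-k}}{dt} + \frac{dc^*_k}{dt} \right) \left( \frac{dc_k}{dt} + \frac{dc^*_{-k}}{dt} \right) \\ & + \frac{\kappa a}{2} \sum_k k^2 \left( c_{-k}(t) + c^*_k(t) \right) \left( c_k(t) + c^*_{-k}(t) \right) \end{aligned} \tag{197} $$
Furthermore, on using the time-dependence of the Fourier coefficients $c_k(t)$, one
has
$$ \begin{aligned} H = & - \frac{\rho}{2} \sum_k \omega_k^2 \left( c_{-k}(t) - c^*_k(t) \right) \left( c_k(t) - c^*_{-k}(t) \right) \\ & + \frac{\kappa a}{2} \sum_k k^2 \left( c_{-k}(t) + c^*_k(t) \right) \left( c_k(t) + c^*_{-k}(t) \right) \end{aligned} \tag{198} $$
but the frequency is given by the dispersion relation
$$ \omega_k^2 = v^2 k^2 = \left( \frac{\kappa a}{\rho} \right) k^2 \tag{199} $$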
Therefore, the expression for the Hamiltonian simplifies to
$$ \begin{aligned} H &= \rho \sum_k \omega_k^2 \left( c^*_k(t) \, c_k(t) + c_{-k}(t) \, c^*_{-k}(t) \right) \\ &= \rho \sum_k \omega_k^2 \left( c^*_k(0) \, c_k(0) + c_{-k}(0) \, c^*_{-k}(0) \right) \end{aligned} \tag{200} $$
which is time-independent, since the time-dependent phase factors cancel out.
Thus, one can think of the energy as a function of the variables $c_k$ and $c^*_{-k}$.
Since the Hamiltonian is strictly expressed in terms of canonically conjugate
coordinates and momenta, one should examine the Poisson brackets of $c_k$ and $c^*_{-k}$.
The variables $y(x_i)$ and $p(x_j)$ have the Poisson brackets
$$ \left\{ p(x_i) , y(x_j) \right\} = - \delta_{i,j} $$
$$ \left\{ p(x_i) , p(x_j) \right\} = \left\{ y(x_i) , y(x_j) \right\} = 0 \tag{201} $$
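Due to the orthogonality properties of the plane-waves, one has
$$ \left( c_k + c^*_{-k} \right) \approx \frac{a}{\sqrt{L}} \sum_i y(x_i) \exp \left[ - i k x_i \right] \tag{202} $$
and also
$$ - i \, \omega_{k'} \, \rho \, a \left( c_{k'} - c^*_{-k'} \right) \approx \frac{a}{\sqrt{L}} \sum_j p(x_j) \exp \left[ - i k' x_j \right] \tag{203} $$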
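These relations are simply the results of applying the inverse Fourier transform
to $y(x)$ and $p(x)$. One can find the Poisson brackets relations between $c_k$ and
$c^*_{k'}$ from
$$ \begin{aligned} - i \, \omega_{k'} \, \rho \left\{ \left( c_{k'} - c^*_{-k'} \right) , \left( c_k + c^*_{-k} \right) \right\} &= - \frac{a}{L} \sum_{i,j} \left\{ p(x_i) , y(x_j) \right\} \exp \left[ - i ( k x_i + k' x_j ) \right] \\ &= + \frac{a}{L} \sum_{i,j} \delta_{i,j} \exp \left[ - i ( k x_i + k' x_j ) \right] \\ &= + \frac{a}{L} \sum_i \exp \left[ - i ( k + k' ) x_i \right] \\ &= + \delta_{k+k'} \end{aligned} \tag{204} $$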
Likewise, one can obtain similar expressions for the other Poisson bracket relations.
This set of equations can be satisfied by setting
$$ \left\{ c^*_{k'} , c_k \right\} = \frac{i}{2 \, \omega_k \, \rho} \, \delta_{k,k'} \tag{205} $$
and
$$ \left\{ c^*_k , c^*_{k'} \right\} = \left\{ c_k , c_{k'} \right\} = 0 \tag{206} $$
The above set of Poisson brackets can be recast in a simpler form by defining
$$ c_k = \frac{1}{\sqrt{2 \, \omega_k \, \rho}} \, a_k \tag{207} $$
etc., so that the Poisson brackets reduce to
$$ \left\{ a^*_{k'} , a_k \right\} = i \, \delta_{k,k'} \tag{208} $$
and
$$ \left\{ a^*_k , a^*_{k'} \right\} = \left\{ a_k , a_{k'} \right\} = 0 \tag{209} $$
where the non-universal factors have cancelled out.
5.3 Rules of Canonical Quantization
The first rule of Canonical Quantization states, “Physical quantities should be
represented by operators”. Hence $a_k$ and $a^*_k$ should be replaced by the operators
$\hat{a}_k$ and $\hat{a}^\dagger_k$. The second rule of Canonical Quantization states, “Poisson Brackets
should be replaced by Commutators”. Hence,
$$ i \hbar \left\{ A , B \right\} \to [ \hat{A} , \hat{B} ] \tag{210} $$
So one has
$$ [ \hat{a}^\dagger_{k'} , \hat{a}_k ] = - \hbar \, \delta_{k,k'} \tag{211} $$
and
$$ [ \hat{a}^\dagger_k , \hat{a}^\dagger_{k'} ] = [ \hat{a}_k , \hat{a}_{k'} ] = 0 \tag{212} $$
To get rid of the annoying $\hbar$ in the commutator, one can set
$$ \hat{a}_k = \sqrt{\hbar} \, \hat{b}_k $$
$$ \hat{a}^\dagger_k = \sqrt{\hbar} \, \hat{b}^\dagger_k \tag{213} $$
Whether it was noted or not, $\hat{b}^\dagger_k$ is the Hermitean conjugate of $\hat{b}_k$. The
Hermitean relation can be proved by taking the Hermitean conjugate of $\hat{y}(x_i)$, and
noting that the third rule of quantization states, “Measurable quantities are to
be replaced by Hermitean operators”. Therefore, the operator
$$ \hat{y}(x) = \frac{1}{\sqrt{L}} \sum_k \sqrt{\frac{\hbar}{2 \rho \omega_k}} \left( \hat{b}_k(t) + \hat{b}^\dagger_{-k}(t) \right) \exp \left[ i k x \right] \tag{214} $$
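must be Hermitean. What this means is, the Hermitean conjugate
$$ \hat{y}^\dagger(x) = \frac{1}{\sqrt{L}} \sum_k \sqrt{\frac{\hbar}{2 \rho \omega_k}} \left( \hat{b}^\dagger_k(t) + ( \hat{b}^\dagger_{-k}(t) )^\dagger \right) \exp \left[ - i k x \right] \tag{215} $$
has to be the same as $\hat{y}(x)$. On setting $k = -k'$ in the above equation, one has
$$ \hat{y}^\dagger(x) = \frac{1}{\sqrt{L}} \sum_{k'} \sqrt{\frac{\hbar}{2 \rho \omega_{k'}}} \left( \hat{b}^\dagger_{-k'}(t) + ( \hat{b}^\dagger_{k'}(t) )^\dagger \right) \exp \left[ + i k' x \right] \tag{216} $$
For $\hat{y}^\dagger(x)$ to be equal to $\hat{y}(x)$, it is necessary that the Hermitean conjugate of
the operator $\hat{b}^\dagger_k$ is equal to $\hat{b}_k$. This shows that the pair of operators are indeed
Hermitean conjugates. The quantum field is represented by the operator
$$ \hat{y}(x) = \frac{1}{\sqrt{L}} \sum_{k'} \sqrt{\frac{\hbar}{2 \rho \omega_{k'}}} \left( \hat{b}^\dagger_{-k'}(t) + \hat{b}_{k'}(t) \right) \exp \left[ + i k' x \right] \tag{217} $$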
where the time-dependent creation and annihilation operators are given by
$$ \hat{b}_k(t) = \hat{b}_k \exp \left[ - i \omega_k t \right] $$
$$ \hat{b}^\dagger_k(t) = \hat{b}^\dagger_k \exp \left[ + i \omega_k t \right] \tag{218} $$
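The quantized Hamiltonian becomes
$$ \begin{aligned} \hat{H} &= \rho \sum_k \omega_k^2 \left( \hat{c}^\dagger_k \hat{c}_k + \hat{c}_{-k} \hat{c}^\dagger_{-k} \right) \\ &= \sum_k \frac{\hbar \omega_k}{2} \left( \hat{b}^\dagger_k \hat{b}_k + \hat{b}_{-k} \hat{b}^\dagger_{-k} \right) \end{aligned} \tag{219} $$
On transforming $k \to -k$ in the second term of the summation, one obtains the
standard form
$$ \hat{H} = \sum_k \frac{\hbar \omega_k}{2} \left( \hat{b}^\dagger_k \hat{b}_k + \hat{b}_k \hat{b}^\dagger_k \right) \tag{220} $$
where the $\hat{b}_k$ and $\hat{b}^\dagger_k$ are to be identified as annihilation and creation operators
for the quanta.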
The quantum operator $\hat{P}$ corresponding to the classical quantity $P$
$$ P = \int_0^L dx \, T \tag{221} $$
is evaluated as
$$ \hat{P} = - \left( \frac{\kappa a}{2 \rho} \right) \sum_k \hbar k \left( \hat{b}^\dagger_{-k} - \hat{b}_k \right) \left( \hat{b}^\dagger_k + \hat{b}_{-k} \right) \tag{222} $$
where the plane-wave orthogonality properties have been used. This quantity
can be expressed as the sum of two terms
$$ \begin{aligned} \hat{P} = & - \left( \frac{\kappa a}{2 \rho} \right) \sum_k \hbar k \left( \hat{b}^\dagger_{-k} \hat{b}^\dagger_k - \hat{b}_k \hat{b}_{-k} \right) \\ & + \left( \frac{\kappa a}{2 \rho} \right) \sum_k \hbar k \left( \hat{b}_k \hat{b}^\dagger_k - \hat{b}^\dagger_{-k} \hat{b}_{-k} \right) \end{aligned} \tag{223} $$
The operator $\hat{P}$ can be shown to be equivalent to
$$ \hat{P} = v^2 \sum_k \hbar k \, \hat{b}^\dagger_k \hat{b}_k \tag{224} $$
which obviously is proportional to the sum of the momenta of the quanta. The
quantity $\frac{\kappa a}{\rho}$ is just the square of the wave velocity, $v^2$. On noting that the
quanta travel with velocities given by $v \, \mathrm{sign}(k)$ and have energies given by
$\hbar \omega_k = \hbar v | k |$, one sees that $T$ is expressed as the total energy flux
associated with the quanta.
5.4 The Algebra of Boson Operators
The number operator $\hat{n}_k$ can be defined (footnote 6) as
$$ \hat{n}_k = \hat{b}^\dagger_k \hat{b}_k \tag{225} $$
which has eigenstates $| n_k \rangle$ with eigenvalues $n_k$
$$ \hat{n}_k \, | n_k \rangle = n_k \, | n_k \rangle \tag{226} $$
The eigenvalues $n_k$ are non-negative integers. This can be inferred
from the commutation relations
$$ [ \hat{b}^\dagger_{k'} , \hat{b}_k ] = - \delta_{k,k'} \tag{227} $$
which has the consequence that
$$ [ \hat{n}_k , \hat{b}^\dagger_{k'} ] = + \delta_{k,k'} \, \hat{b}^\dagger_k $$
$$ [ \hat{n}_k , \hat{b}_{k'} ] = - \delta_{k,k'} \, \hat{b}_k \tag{228} $$
Hence, when $\hat{b}_k$ acts on an eigenstate of $\hat{n}_k$ with eigenvalue $n_k$ it produces
another eigenstate of $\hat{n}_k$ but with an eigenvalue of $n_k - 1$, as can be seen since
$$ \begin{aligned} \hat{n}_k \, \hat{b}_k \, | n_k \rangle &= \hat{b}_k \, ( \hat{n}_k - 1 ) \, | n_k \rangle \\ &= ( n_k - 1 ) \, \hat{b}_k \, | n_k \rangle \end{aligned} \tag{229} $$
Therefore, since $\hat{b}_k$ lowers the eigenvalue of the number operator by one unit,
one can write
$$ \hat{b}_k \, | n_k \rangle = C(n_k) \, | n_k - 1 \rangle \tag{230} $$
where the complex number $C(n_k)$ has to be determined. The normalization
coefficient $C(n_k)$ can be determined by noting that
$$ \langle n_k | \, \hat{b}^\dagger_k \tag{231} $$
is the Hermitean conjugate of the state
$$ \hat{b}_k \, | n_k \rangle \tag{232} $$
On taking the norm of the state and its conjugate, one finds the normalization
$$ \begin{aligned} \langle n_k | \, \hat{b}^\dagger_k \hat{b}_k \, | n_k \rangle &= C^*(n_k) \, C(n_k) \, \langle n_k - 1 | n_k - 1 \rangle \\ &= | C(n_k) |^2 \end{aligned} \tag{233} $$
However, on using the definition of the number operator and the normalization
condition, one finds that
$$ | C(n_k) |^2 = n_k \tag{234} $$
so, on choosing the phase factor, one can define
$$ \hat{b}_k \, | n_k \rangle = \sqrt{n_k} \, | n_k - 1 \rangle \tag{235} $$
as the annihilation operator.
Footnote 6: P. Jordan and O. Klein, Zeit. für Physik, 45, 751 (1927).
Likewise, one can see that the Hermitean conjugate operator $\hat{b}^\dagger_k$, when acting on
an eigenstate of the number operator, increases its eigenvalue by one unit
$$ \begin{aligned} \hat{n}_k \, \hat{b}^\dagger_k \, | n_k \rangle &= \hat{b}^\dagger_k \, ( \hat{n}_k + 1 ) \, | n_k \rangle \\ &= ( n_k + 1 ) \, \hat{b}^\dagger_k \, | n_k \rangle \end{aligned} \tag{236} $$
Therefore, one has
$$ \hat{b}^\dagger_k \, | n_k \rangle = C'(n_k) \, | n_k + 1 \rangle \tag{237} $$
The coefficient $C'(n_k)$ is found from the normalization condition
$$ \begin{aligned} \langle n_k | \, \hat{b}_k \hat{b}^\dagger_k \, | n_k \rangle &= C'^*(n_k) \, C'(n_k) \, \langle n_k + 1 | n_k + 1 \rangle \\ &= | C'(n_k) |^2 \end{aligned} \tag{238} $$
which with
$$ \hat{b}_k \hat{b}^\dagger_k = \hat{n}_k + 1 \tag{239} $$
yields
$$ | C'(n_k) |^2 = n_k + 1 \tag{240} $$
Since the phase factor has already been determined by the Hermitean conjugate
equation, one has
$$ \hat{b}^\dagger_k \, | n_k \rangle = \sqrt{n_k + 1} \, | n_k + 1 \rangle \tag{241} $$
which raises the eigenvalue of the number operator.
Hence, one sees that the eigenvalues of the number operator are separated
by integers. Furthermore, the smallest eigenvalue corresponds to $n_k = 0$, since
for $n_k = 0$ the equation
$$ \hat{b}_k \, | n_k \rangle = \sqrt{n_k} \, | n_k - 1 \rangle \tag{242} $$
reduces to
$$ \hat{b}_k \, | 0 \rangle = 0 \tag{243} $$
Hence, the hierarchy of states produced by the annihilation operator acting on
a number operator eigenstate terminates at $n_k = 0$. Thus, the eigenvalues of
the number operator $n_k$ can have integer values 0 , 1 , 2 , 3 , . . . , ∞.
Therefore, any arbitrary number operator eigenstate $| \{ n_k \} \rangle$, in which
the number of excitations ($n_k$) in each normal mode has been specified, can
be written in terms of the vacuum state $| 0 \rangle$ and the creation operators as
$$ | \{ n_k \} \rangle = \prod_k \left[ \frac{ ( \hat{b}^\dagger_k )^{n_k} }{ \sqrt{ n_k ! } } \right] | 0 \rangle \tag{244} $$
The repeated operation of the creation operator $\hat{b}^\dagger_k$ creates a state with $n_k$
bosonic excitations present in mode $k$ and the denominator provides the correct
normalization for this state.
Any arbitrary state $| \Psi \rangle$ can be expressed as a linear superposition of
number operator eigenstates
$$ | \Psi \rangle = \sum_{\{ n_k \}} C( \{ n_k \} ) \, | \{ n_k \} \rangle \tag{245} $$
where the sum runs over all possible number eigenstates, and the complex
coefficients $C( \{ n_k \} )$ are arbitrary except that they must satisfy the normalization
condition
$$ \sum_{\{ n_k \}} | C( \{ n_k \} ) |^2 = 1 \tag{246} $$
5.5 The Classical Limit
The classical limit of the quantum field theory can be characterized by the limit
in which the field operator can be replaced by a function. This requires that the
“classical” states are not only described as states with large numbers of quanta
in the excited normal modes, but also that the state is a linear superposition
of states with different numbers of quanta, with a reasonably well defined phase
of the complex coefficients. For a quantum state to ideally represent a given
classical state, one needs the quantum state to be composed of a coherent
superposition of states with different numbers of quanta.
That states which are eigenstates of the number operators ( $| \{ n_k \} \rangle$ ) cannot
represent classical states can be seen by noting that the expectation value
of the field operator is zero
$$ \langle \{ n_k \} | \, \hat{y}(x) \, | \{ n_k \} \rangle = 0 \tag{247} $$
which follows from the expectation value of the creation and annihilation operators
$$ \langle \{ n_k \} | \, \hat{a}_k \, | \{ n_k \} \rangle = 0 \tag{248} $$
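Despite the fact that the average value of the field is zero, the fluctuation in the field amplitude is infinite since
$$ \begin{aligned}
\langle \{n_k\} | \, \hat{y}(x)^2 \, | \{n_k\} \rangle
&= \frac{1}{L} \sum_k \frac{\hbar}{2 \rho \omega_k} \, \langle \{n_k\} | \left( \hat{b}^{\dagger}_k + \hat{b}_{-k} \right) \left( \hat{b}_k + \hat{b}^{\dagger}_{-k} \right) | \{n_k\} \rangle \\
&= \frac{1}{L} \sum_k \frac{\hbar}{2 \rho \omega_k} \, ( 1 + 2 n_k )
\end{aligned} \tag{249} $$
and the zero-point contribution diverges logarithmically at the upper and lower limits of integration.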
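The origin of the logarithm can be made explicit by converting the mode sum into an integral. The following is only a sketch: it assumes a linear dispersion relation $\omega_k = v | k |$, where the wave speed $v$ and the cut-offs $k_{min}$ and $k_{max}$ are symbols introduced here for illustration.

```latex
% Assumptions: omega_k = v |k|;  (1/L) sum_k -> int dk / (2 pi);
% k_min and k_max are illustrative lower and upper cut-offs.
\begin{align}
\frac{1}{L} \sum_k \frac{\hbar}{2 \rho \omega_k}
&\rightarrow \int \frac{dk}{2 \pi} \, \frac{\hbar}{2 \rho v | k |}
 = 2 \int_{k_{min}}^{k_{max}} \frac{dk}{2 \pi} \, \frac{\hbar}{2 \rho v k} \\
&= \frac{\hbar}{2 \pi \rho v} \, \ln \left( \frac{k_{max}}{k_{min}} \right)
\end{align}
```

The result grows without bound both as $k_{max} \rightarrow \infty$ and as $k_{min} \rightarrow 0$, which is the logarithmic divergence at both limits.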
Hence, the eigenstates of the number operator, or equivalently of $\hat{H}$, do not describe the classical states of the string. Classical states must be expressed as linear superpositions of energy eigenstates.
6 Classical Field Theory
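The dynamics of a multi-component classical field $\phi^{\alpha}$ is governed by a Lagrange density $\mathcal{L}$, which is a scalar quantity that is a function of the fields $\phi^{\alpha}$ and their derivatives $\partial_{\mu} \phi^{\alpha}$. The equations of motion for the classical field are determined by the principle of extremal action. That is, the classical fields are those for which the action $S$
$$ S = \int dt \int d^3x \; \mathcal{L} \left[ \phi^{\alpha} , \partial_{\mu} \phi^{\alpha} \right] \tag{250} $$
is extremal. An arbitrary field $\phi^{\alpha}$ can be expressed in terms of the extremal value $\phi^{\alpha}_{ex}$ and the deviation $\delta\phi^{\alpha}$ as
$$ \phi^{\alpha} = \phi^{\alpha}_{ex} + \delta\phi^{\alpha} \tag{251} $$
The space and time derivatives of the arbitrary field can also be expressed as the derivatives of the sum of the extremal field and the deviation
$$ \partial_{\nu} \phi^{\alpha} = \partial_{\nu} \phi^{\alpha}_{ex} + \partial_{\nu} \delta\phi^{\alpha} \tag{252} $$
The first-order change in the action $\delta S$ is given by
$$ \delta S = \int_0^t dt \int d^3x \left[ \delta\phi^{\alpha} \, \frac{\partial}{\partial \phi^{\alpha}} \mathcal{L} \left[ \phi^{\alpha}_{ex} , \partial_{\mu} \phi^{\alpha}_{ex} \right] + ( \partial_{\nu} \delta\phi^{\alpha} ) \, \frac{\partial}{\partial ( \partial_{\nu} \phi^{\alpha} )} \mathcal{L} \left[ \phi^{\alpha}_{ex} , \partial_{\mu} \phi^{\alpha}_{ex} \right] \right] \tag{253} $$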
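On integrating by parts with respect to $x^{\nu}$ in the last term, and on assuming appropriate boundary conditions, one finds
$$ \delta S = \int_0^t dt \int d^3x \; \delta\phi^{\alpha} \left[ \frac{\partial}{\partial \phi^{\alpha}} \mathcal{L} \left[ \phi^{\alpha}_{ex} , \partial_{\mu} \phi^{\alpha}_{ex} \right] - \partial_{\nu} \left( \frac{\partial}{\partial ( \partial_{\nu} \phi^{\alpha} )} \mathcal{L} \left[ \phi^{\alpha}_{ex} , \partial_{\mu} \phi^{\alpha}_{ex} \right] \right) \right] \tag{254} $$
which has to vanish for an arbitrary choice of $\delta\phi^{\alpha}$. Hence, one obtains the Euler-Lagrange equations
$$ \frac{\partial}{\partial \phi^{\alpha}} \mathcal{L} \left[ \phi^{\alpha}_{ex} , \partial_{\mu} \phi^{\alpha}_{ex} \right] = \partial_{\nu} \left( \frac{\partial}{\partial ( \partial_{\nu} \phi^{\alpha} )} \mathcal{L} \left[ \phi^{\alpha}_{ex} , \partial_{\mu} \phi^{\alpha}_{ex} \right] \right) \tag{255} $$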
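As an illustration of how the Euler-Lagrange equations are applied, consider a single real field with a generic self-interaction. The Lagrangian density and the potential $V(\phi)$ below are hypothetical choices made for this sketch and are not taken from the text.

```latex
% Hypothetical example: one real field phi with an arbitrary potential V(phi).
\begin{align}
\mathcal{L} &= \tfrac{1}{2} \, ( \partial_{\mu} \phi ) ( \partial^{\mu} \phi ) - V(\phi) \\
\frac{\partial \mathcal{L}}{\partial \phi} &= - V'(\phi) ,
\qquad
\frac{\partial \mathcal{L}}{\partial ( \partial_{\nu} \phi )} = \partial^{\nu} \phi \\
\Rightarrow \quad \partial_{\nu} \partial^{\nu} \phi + V'(\phi) &= 0
\end{align}
```

The factor of one half cancels because the derivative acts on both factors of $\partial \phi$ in the kinetic term. A potential quadratic in $\phi$ yields a linear wave equation.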
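This set of equations determines the time dependence of the classical fields $\phi^{\alpha}_{ex}(x)$. That is, out of all possible fields with components $\phi^{\alpha}$, the equations of motion determine the physical field which has the components $\phi^{\alpha}_{ex}$. It is convenient to define the field momentum density $\pi^{0}_{\alpha}(x)$ conjugate to $\phi^{\alpha}$ as
$$ \pi^{0}_{\alpha}(x^{\nu}) = \frac{1}{c} \, \frac{\partial}{\partial ( \partial_{0} \phi^{\alpha} )} \mathcal{L} \left[ \phi^{\beta} , \partial_{\mu} \phi^{\beta} \right] \tag{256} $$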
The Hamiltonian density $\mathcal{H}$ is then defined as the Legendre transform
$$ \mathcal{H} = c \sum_{\alpha} \pi^{0}_{\alpha} \, ( \partial_{0} \phi^{\alpha} ) - \mathcal{L} \tag{257} $$
which eliminates the time-derivative of the fields in terms of the momentum density of the fields.
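The Legendre transform can be illustrated with a single real field. The sketch assumes the hypothetical Lagrangian density $\mathcal{L} = \frac{1}{2} ( \partial_{\mu} \phi ) ( \partial^{\mu} \phi ) - V(\phi)$, a metric with signature $(+,-,-,-)$, and $\partial_{0} = \frac{1}{c} \frac{\partial}{\partial t}$; the potential $V$ is an arbitrary function introduced for illustration.

```latex
% Hypothetical example: L = (1/2)(d_mu phi)(d^mu phi) - V(phi), signature (+,-,-,-).
\begin{align}
\mathcal{L} &= \tfrac{1}{2} ( \partial_{0} \phi )^2 - \tfrac{1}{2} ( \nabla \phi )^2 - V(\phi) \\
\pi^{0} &= \frac{1}{c} \frac{\partial \mathcal{L}}{\partial ( \partial_{0} \phi )} = \frac{1}{c} \, \partial_{0} \phi \\
\mathcal{H} &= c \, \pi^{0} ( \partial_{0} \phi ) - \mathcal{L}
 = \tfrac{1}{2} c^2 ( \pi^{0} )^2 + \tfrac{1}{2} ( \nabla \phi )^2 + V(\phi)
\end{align}
```

In the last step the time derivative has been eliminated through $\partial_{0} \phi = c \, \pi^{0}$, so that $\mathcal{H}$ depends only on $\pi^{0}$, $\phi$ and $\nabla \phi$.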
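Exercise:
Starting from the Lorentz scalar Lagrangian
$$ \mathcal{L} = \frac{1}{2} \left[ ( \partial_{\mu} \phi ) ( \partial^{\mu} \phi ) - \left( \frac{m c}{\hbar} \right)^2 \phi^2 \right] \tag{258} $$
for a real scalar field $\phi$, determine the Euler-Lagrange equation and the Hamiltonian density $\mathcal{H}$.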
Exercise:
Consider the Lagrangian density
$$ \mathcal{L} = \frac{1}{2} \left[ ( \partial_{\mu} \psi^{*} ) ( \partial^{\mu} \psi ) - \left( \frac{m c}{\hbar} \right)^2 | \psi |^2 \right] \tag{259} $$
for a complex scalar field $\psi$. Treat $\psi$ and $\psi^{*}$ as independent fields.
(i) Determine the Euler-Lagrange equation and the Hamiltonian density $\mathcal{H}$.
(ii) By Fourier transforming with respect to space and time, determine the form of the general solution for $\psi$.
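Exercise:
The Lagrangian density for the complex field $\psi$ representing a charged particle is given by
$$ \mathcal{L} = - \frac{\hbar^2}{2 m} \left( \nabla \psi^{*} \right) \cdot \left( \nabla \psi \right) - \frac{\hbar}{2 i} \left[ \psi^{*} \left( \frac{\partial \psi}{\partial t} \right) - \left( \frac{\partial \psi^{*}}{\partial t} \right) \psi \right] - \psi^{*} \, V(x) \, \psi \tag{260} $$
(i) Determine the equation of motion, and the Hamiltonian density $\mathcal{H}$.
(ii) Consider the case $V(x) \equiv 0$, then by Fourier transforming with respect to space and time, determine the form of the general solution for $\psi$.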
6.1 The Hamiltonian Formulation
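The Hamiltonian formulation reserves a special role for time, and so is not Lorentz covariant. However, the Hamiltonian formulation is the most convenient formulation for quantizing fields. The Hamilton equations of motion are determined from the Hamiltonian
$$ H = \int d^3x \; \mathcal{H} \tag{261} $$
by noting that $H$ is only a functional of $\pi^{0}_{\alpha}$ and $\phi^{\alpha}$. This can be seen as follows. Since
$$ H = \int d^3x \left[ c \sum_{\alpha} \pi^{0}_{\alpha} \, ( \partial_{0} \phi^{\alpha} ) - \mathcal{L} \right] \tag{262} $$
the first-order variation of the Hamiltonian $\delta H$ is given by
$$ \delta H = \int d^3x \left[ c \sum_{\alpha} \left( \delta\pi^{0}_{\alpha} \, ( \partial_{0} \phi^{\alpha} ) + \pi^{0}_{\alpha} \, ( \partial_{0} \delta\phi^{\alpha} ) \right) - \delta\mathcal{L} \right] \tag{263} $$
but, from the Lagrangian formulation of field theory, one has
$$ \frac{1}{c} \, \delta\mathcal{L} = \delta\phi^{\alpha} \, ( \partial_{0} \pi^{0}_{\alpha} ) + ( \partial_{0} \delta\phi^{\alpha} ) \, \pi^{0}_{\alpha} \tag{264} $$
where the Euler-Lagrange equations were substituted into the first term. Therefore, the variation in the Hamiltonian is given by
$$ \delta H = \int d^3x \; c \sum_{\alpha} \left[ \delta\pi^{0}_{\alpha} \, ( \partial_{0} \phi^{\alpha} ) - \delta\phi^{\alpha} \, ( \partial_{0} \pi^{0}_{\alpha} ) \right] \tag{265} $$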
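which does not involve the variation of the time derivative of the fields. This implies that the Hamiltonian is a function of the fields $\pi^{0}_{\alpha}$, $\phi^{\alpha}$ and their spatial derivatives. On calculating the variation of $H$ using the independent variables $\pi^{0}_{\alpha}$ and $\phi^{\alpha}$, and integrating by parts, one finds that the Hamiltonian equations of motion are given by
$$ \begin{aligned}
c \, \partial_{0} \phi^{\alpha} &= \left( \frac{\partial \mathcal{H}}{\partial \pi^{0}_{\alpha}} \right) - \nabla \cdot \left( \frac{\partial \mathcal{H}}{\partial ( \nabla \pi^{0}_{\alpha} )} \right) \\
- \, c \, \partial_{0} \pi^{0}_{\alpha} &= \left( \frac{\partial \mathcal{H}}{\partial \phi^{\alpha}} \right) - \nabla \cdot \left( \frac{\partial \mathcal{H}}{\partial ( \nabla \phi^{\alpha} )} \right)
\end{aligned} \tag{266} $$
The structure of these equations is similar to that of the classical mechanics of point particles. As in the classical mechanics of point particles, one can define Poisson Brackets with fields. When quantizing the fields, the Poisson Bracket relations between the fields can be replaced by commutation relations.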
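As a consistency check, the Hamilton equations can be applied to a simple case. The sketch below assumes the hypothetical Hamiltonian density $\mathcal{H} = \frac{1}{2} c^2 ( \pi^{0} )^2 + \frac{1}{2} ( \nabla \phi )^2 + V(\phi)$ for a single real field, with $V$ an arbitrary potential introduced for illustration.

```latex
% Hypothetical example: H = (1/2) c^2 pi^2 + (1/2)(grad phi)^2 + V(phi).
\begin{align}
c \, \partial_{0} \phi &= \frac{\partial \mathcal{H}}{\partial \pi^{0}} = c^2 \, \pi^{0} \\
- \, c \, \partial_{0} \pi^{0} &= \frac{\partial \mathcal{H}}{\partial \phi}
 - \nabla \cdot \left( \frac{\partial \mathcal{H}}{\partial ( \nabla \phi )} \right)
 = V'(\phi) - \nabla^2 \phi \\
\Rightarrow \quad \partial_{0}^2 \phi - \nabla^2 \phi + V'(\phi) &= 0
\end{align}
```

The first equation reproduces the definition of the momentum density, and eliminating $\pi^{0}$ from the pair gives a second-order wave equation, as expected from the Lagrangian formulation.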
6.2 Symmetry and Conservation Laws
Emmy Noether produced a theorem linking continuous symmetries of a Lagrangian to conservation laws [7].

[7] E. Noether, Nachr. d. Kgl. Gesellsch. d. Wiss. Göttingen, Kl. Math. Phys. (1918) 235.
6.2.1 Conservation Laws
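Consider a Lagrangian density $\mathcal{L}$ which is a function of a set of fields $\phi^{\alpha}(x)$ and their derivatives defined in a Minkowski space $x$. Consider how the Lagrangian density changes for a particular choice of a combination of infinitesimal transformations of the field components
$$ \phi^{\alpha}(x) \rightarrow \phi'^{\alpha}(x) = \phi^{\alpha}(x) + \delta\phi^{\alpha}(x) \tag{267} $$
and, as a consequence, the derivatives of the field components also transform as
$$ \partial_{\mu} \phi^{\alpha}(x) \rightarrow \partial_{\mu} \phi'^{\alpha}(x) = \partial_{\mu} \phi^{\alpha}(x) + \partial_{\mu} \delta\phi^{\alpha}(x) \tag{268} $$
Under this combined transformation, the Lagrangian density changes by an infinitesimal amount $\delta\mathcal{L}$, given by
$$ \delta\mathcal{L} = \left( \frac{\partial \mathcal{L}}{\partial ( \partial_{\mu} \phi^{\alpha} )} \right) \partial_{\mu} \delta\phi^{\alpha} + \frac{\partial \mathcal{L}}{\partial \phi^{\alpha}} \, \delta\phi^{\alpha} \tag{269} $$
where the field index $\alpha$ is to be summed over. However, the generalized momentum density $\pi^{\mu}_{\alpha}(x)$ is defined by
$$ \pi^{\mu}_{\alpha}(x) = \left( \frac{\partial \mathcal{L}}{\partial ( \partial_{\mu} \phi^{\alpha} )} \right) \tag{270} $$
so
$$ \delta\mathcal{L} = \pi^{\mu}_{\alpha} \, ( \partial_{\mu} \delta\phi^{\alpha} ) + \frac{\partial \mathcal{L}}{\partial \phi^{\alpha}} \, \delta\phi^{\alpha} \tag{271} $$
The Euler-Lagrange equation for each field $\phi^{\alpha}$ is given by
$$ \partial_{\mu} \pi^{\mu}_{\alpha} - \frac{\partial \mathcal{L}}{\partial \phi^{\alpha}} = 0 \tag{272} $$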
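where $\phi^{\alpha}$ satisfies the appropriate boundary conditions. Thus, on adding and subtracting a term
$$ ( \partial_{\mu} \pi^{\mu}_{\alpha} ) \, \delta\phi^{\alpha} \tag{273} $$
to $\delta\mathcal{L}$, one finds
$$ \begin{aligned}
\delta\mathcal{L} &= \left[ \pi^{\mu}_{\alpha} \, \partial_{\mu} \delta\phi^{\alpha} + ( \partial_{\mu} \pi^{\mu}_{\alpha} ) \, \delta\phi^{\alpha} \right] + \left[ \frac{\partial \mathcal{L}}{\partial \phi^{\alpha}} - \partial_{\mu} \pi^{\mu}_{\alpha} \right] \delta\phi^{\alpha} \\
&= \partial_{\mu} \left( \pi^{\mu}_{\alpha} \, \delta\phi^{\alpha} \right) + \left[ \frac{\partial \mathcal{L}}{\partial \phi^{\alpha}} - \partial_{\mu} \pi^{\mu}_{\alpha} \right] \delta\phi^{\alpha} \\
&= \partial_{\mu} \left( \pi^{\mu}_{\alpha} \, \delta\phi^{\alpha} \right)
\end{aligned} \tag{274} $$
since the last term in the second line vanishes if the fields $\phi^{\alpha}$ satisfy the Euler-Lagrange equations. If the Lagrangian is invariant under the transformation, then $\delta\mathcal{L} = 0$, so
$$ \partial_{\mu} \left( \pi^{\mu}_{\alpha} \, \delta\phi^{\alpha} \right) = 0 \tag{275} $$
where the field index $\alpha$ is to be summed over. The above equation can be rewritten as a continuity equation
$$ \partial_{\mu} j^{\mu} = 0 \tag{276} $$
where the conserved current $j^{\mu}(x)$ is given by
$$ \begin{aligned}
j^{\mu}(x) &\propto \pi^{\mu}_{\alpha}(x) \, \delta\phi^{\alpha}(x) \\
&\propto \left( \frac{\partial \mathcal{L}}{\partial ( \partial_{\mu} \phi^{\alpha} )} \right) \delta\phi^{\alpha}(x)
\end{aligned} \tag{277} $$
up to a constant of proportionality. The normalization of the conserved current is arbitrary and can be chosen at will. Since it is recognized that $\delta\phi^{\alpha}$ is infinitesimal, the normalization is chosen by introducing an infinitesimal constant $\epsilon$ via
$$ \begin{aligned}
\epsilon \, j^{\mu}(x) &= \pi^{\mu}_{\alpha}(x) \, \delta\phi^{\alpha}(x) \\
&= \left( \frac{\partial \mathcal{L}}{\partial ( \partial_{\mu} \phi^{\alpha} )} \right) \delta\phi^{\alpha}(x)
\end{aligned} \tag{278} $$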
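The conserved charge $Q$ is defined as the integral over all space of the time component of the current density $j^{(0)}$. That is, the conserved charge is given by
$$ Q = \int d^3x \; j^{(0)}(x) \tag{279} $$
or, more specifically
$$ \begin{aligned}
\epsilon \, Q &= \int d^3x \; \pi^{(0)}_{\alpha}(x) \, \delta\phi^{\alpha}(x) \\
&= \int d^3x \left( \frac{\partial \mathcal{L}}{\partial ( \partial_{0} \phi^{\alpha} )} \right) \delta\phi^{\alpha}(x)
\end{aligned} \tag{280} $$
Since $\epsilon$ is a constant and the current satisfies the continuity equation, the total charge $Q$ is constant. Therefore, the total time derivative of $Q$ vanishes
$$ \frac{dQ}{dt} = 0 \tag{281} $$
The spatial components of $j^{\mu}$ form the current density vector.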
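The step from the continuity equation to the constancy of the total charge can be spelled out as follows. This sketch assumes $x^{0} = c \, t$ and that the spatial current $\mathbf{j}$ falls off fast enough for the surface integral at spatial infinity to vanish.

```latex
% Assumptions: x^0 = c t, and j vanishes sufficiently fast at spatial infinity.
\begin{align}
\frac{dQ}{dt} &= \int d^3x \; \frac{\partial j^{(0)}}{\partial t}
 = c \int d^3x \; \partial_{0} j^{(0)} \\
&= - \, c \int d^3x \; \nabla \cdot \mathbf{j}
 = - \, c \oint d\mathbf{S} \cdot \mathbf{j} = 0
\end{align}
```

The second line uses the continuity equation $\partial_{\mu} j^{\mu} = 0$ followed by Gauss's theorem.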
6.2.2 Noether Charges
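Consider the infinitesimal variation of a complex field $\phi^{\alpha}(x)$ defined by
$$ \phi^{\alpha}(x) \rightarrow \phi'^{\alpha}(x) = \phi^{\alpha}(x) + i \, \epsilon \sum_{\beta} \lambda^{\alpha}_{\beta} \, \phi^{\beta}(x) \tag{282} $$
If this infinitesimal variation leads to $\mathcal{L}$ being invariant, one has a conserved current
$$ j^{\mu} = i \sum_{\alpha,\beta} \left( \frac{\partial \mathcal{L}}{\partial ( \partial_{\mu} \phi^{\alpha} )} \right) \lambda^{\alpha}_{\beta} \, \phi^{\beta}(x) \tag{283} $$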
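An important example is given by the infinitesimal transformation
$$ \begin{aligned}
\psi' &= \psi + i \, \epsilon \, \psi \\
\psi'^{*} &= \psi^{*} - i \, \epsilon \, \psi^{*}
\end{aligned} \tag{284} $$
where $\psi$ and its complex conjugate $\psi^{*}$ are regarded as independent fields. The transformation represents an infinitesimal constant shift of the phase of the field [8]. The conserved current is
$$ j^{\mu} = - \, i \left[ \left( \frac{\partial \mathcal{L}}{\partial ( \partial_{\mu} \psi )} \right) \psi(x) - \left( \frac{\partial \mathcal{L}}{\partial ( \partial_{\mu} \psi^{*} )} \right) \psi^{*}(x) \right] \tag{285} $$
which is the electromagnetic current density four-vector.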
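Exercise:
The Lagrangian density for the complex Schrödinger field representing a charged particle is given by
$$ \mathcal{L} = - \frac{\hbar^2}{2 m} \left( \nabla \psi^{*} \right) \cdot \left( \nabla \psi \right) - \frac{\hbar}{2 i} \left[ \psi^{*} \left( \frac{\partial \psi}{\partial t} \right) - \left( \frac{\partial \psi^{*}}{\partial t} \right) \psi \right] - \psi^{*} \, V(x) \, \psi \tag{286} $$
(i) Determine the conserved Noether charges.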
Exercise:
Determine the Noether charges for a complex Klein-Gordon field theory, governed by the Lagrangian density
$$ \mathcal{L} = \frac{1}{2} \left[ ( \partial_{\mu} \psi^{*} ) ( \partial^{\mu} \psi ) - \left( \frac{m c}{\hbar} \right)^2 | \psi |^2 \right] \tag{287} $$
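[8] This particular transformation is a specific example of a gauge transformation of the first kind, in which
$$ \psi'(x) = \exp \left[ - \, i \, \frac{q}{\hbar c} \, \Lambda(x) \right] \psi(x) $$
A gauge transformation of the second kind is one in which the field changes according to
$$ A'_{\mu} = A_{\mu} + ( \partial_{\mu} \Lambda ) $$
Since $\hat{p}_{\mu} = i \hbar \, \partial_{\mu}$, the combination of these transformations keeps the quantity $( \hat{p}_{\mu} - \frac{q}{c} A_{\mu} ) \psi$ invariant up to the same phase factor that multiplies $\psi$.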
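The behaviour of the combination $( \hat{p}_{\mu} - \frac{q}{c} A_{\mu} ) \psi$ under the two gauge transformations can be verified directly. The following is a sketch that uses only the definitions quoted in the footnote, with $\Lambda(x)$ an arbitrary differentiable function.

```latex
% Uses: psi' = exp(-i q Lambda / (hbar c)) psi,  A'_mu = A_mu + d_mu Lambda,  p_mu = i hbar d_mu.
\begin{align}
i \hbar \, \partial_{\mu} \psi' &= e^{ - i \frac{q}{\hbar c} \Lambda }
 \left[ i \hbar \, \partial_{\mu} \psi + \frac{q}{c} \, ( \partial_{\mu} \Lambda ) \, \psi \right] \\
- \frac{q}{c} \, A'_{\mu} \, \psi' &= e^{ - i \frac{q}{\hbar c} \Lambda }
 \left[ - \frac{q}{c} \, A_{\mu} \, \psi - \frac{q}{c} \, ( \partial_{\mu} \Lambda ) \, \psi \right] \\
\Rightarrow \quad \left( \hat{p}_{\mu} - \frac{q}{c} A'_{\mu} \right) \psi'
 &= e^{ - i \frac{q}{\hbar c} \Lambda } \left( \hat{p}_{\mu} - \frac{q}{c} A_{\mu} \right) \psi
\end{align}
```

The terms involving $\partial_{\mu} \Lambda$ cancel, so the combination acquires only the overall phase factor, and any bilinear formed from it and its complex conjugate is unchanged.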
6.2.3 Noether’s Theorem
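The basic theorem can be generalized to the case where the Lagrangian density is not invariant under the infinitesimal transformation, but instead changes by a combination of total derivatives. That is,
$$ \delta\mathcal{L} = \partial_{\mu} \Lambda^{\mu} \tag{288} $$
for some analytic vector function with components $\Lambda^{\mu}$. This type of transformation changes the action only by a boundary term, and so does not affect the equations of motion. If the Lagrangian changes by the above amount for the combined transformation $\delta\phi^{\alpha}$
$$ \phi^{\alpha}(x) \rightarrow \phi'^{\alpha}(x) = \phi^{\alpha}(x) + \delta\phi^{\alpha}(x) \tag{289} $$
then, since it has previously been shown that
$$ \begin{aligned}
\delta\mathcal{L} &= \left[ \pi^{\mu}_{\alpha} \, ( \partial_{\mu} \delta\phi^{\alpha} ) + ( \partial_{\mu} \pi^{\mu}_{\alpha} ) \, \delta\phi^{\alpha} \right] + \left[ \frac{\partial \mathcal{L}}{\partial \phi^{\alpha}} - \partial_{\mu} \pi^{\mu}_{\alpha} \right] \delta\phi^{\alpha} \\
&= \partial_{\mu} \left( \pi^{\mu}_{\alpha} \, \delta\phi^{\alpha} \right) + \left[ \frac{\partial \mathcal{L}}{\partial \phi^{\alpha}} - \partial_{\mu} \pi^{\mu}_{\alpha} \right] \delta\phi^{\alpha} \\
&= \partial_{\mu} \left( \pi^{\mu}_{\alpha} \, \delta\phi^{\alpha} \right)
\end{aligned} \tag{290} $$
one has
$$ \partial_{\mu} \Lambda^{\mu} = \partial_{\mu} \left( \pi^{\mu}_{\alpha} \, \delta\phi^{\alpha} \right) \tag{291} $$
If the conserved currents are identified as
$$ j^{\mu} = \left( \pi^{\mu}_{\alpha} \, \delta\phi^{\alpha} \right) - \Lambda^{\mu} \tag{292} $$
then the continuity condition
$$ \partial_{\mu} j^{\mu} = 0 \tag{293} $$
holds.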
6.3 The Energy-Momentum Tensor
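An example of Noether's theorem is given by the transformation
$$ \phi^{\alpha}(x) \rightarrow \phi^{\alpha}(x + \epsilon) = \phi^{\alpha}(x) + \epsilon^{\mu} \, ( \partial_{\mu} \phi^{\alpha} ) \tag{294} $$
which represents an infinitesimal space-time translation. This is a symmetry appropriate to a Lagrangian density $\mathcal{L}$ which has no explicit $x$ dependence. We shall assume that the Lagrangian density only depends on the field $\phi^{\alpha}$ and its derivatives $\partial_{\nu} \phi^{\alpha}$
$$ \mathcal{L} = \mathcal{L} \left[ \phi^{\alpha} , ( \partial_{\nu} \phi^{\alpha} ) \right] \tag{295} $$
In this case, the change in the Lagrangian density is given by the total derivative
$$ \begin{aligned}
\delta\mathcal{L} &= \left( \frac{\partial \mathcal{L}}{\partial ( \partial_{\nu} \phi^{\alpha} )} \right) ( \partial_{\nu} \delta\phi^{\alpha} ) + \left( \frac{\partial \mathcal{L}}{\partial \phi^{\alpha}} \right) \delta\phi^{\alpha} \\
&= \epsilon^{\mu} \left[ \left( \frac{\partial \mathcal{L}}{\partial ( \partial_{\nu} \phi^{\alpha} )} \right) \left( \partial_{\nu} \partial_{\mu} \phi^{\alpha} \right) + \left( \frac{\partial \mathcal{L}}{\partial \phi^{\alpha}} \right) \partial_{\mu} \phi^{\alpha} \right] \\
&= \epsilon^{\mu} \, \partial_{\mu} \mathcal{L}
\end{aligned} \tag{296} $$
where the last line follows since the Lagrangian only depends implicitly on $x^{\mu}$ through the fields. Hence, the change in the Lagrangian is a total derivative
$$ \delta\mathcal{L} = \epsilon^{\mu} \, \partial_{\mu} \Lambda \tag{297} $$
where $\Lambda = \mathcal{L}$. Therefore, for transformations of the type
$$ \phi^{\alpha} \rightarrow \phi^{\alpha} + \epsilon^{\mu} \, ( \partial_{\mu} \phi^{\alpha} ) \tag{298} $$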
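Noether's theorem takes the form
$$ \begin{aligned}
\epsilon^{\mu} \left( \partial_{\mu} \mathcal{L} \right)
&= \epsilon^{\mu} \left[ \left( \frac{\partial \mathcal{L}}{\partial ( \partial_{\nu} \phi^{\alpha} )} \right) \partial_{\nu} \left( \partial_{\mu} \phi^{\alpha} \right) + \left( \frac{\partial \mathcal{L}}{\partial \phi^{\alpha}} \right) \partial_{\mu} \phi^{\alpha} \right] \\
&= \epsilon^{\mu} \left[ \left( \frac{\partial \mathcal{L}}{\partial ( \partial_{\nu} \phi^{\alpha} )} \right) \partial_{\nu} \left( \partial_{\mu} \phi^{\alpha} \right) + \partial_{\nu} \left( \frac{\partial \mathcal{L}}{\partial ( \partial_{\nu} \phi^{\alpha} )} \right) \partial_{\mu} \phi^{\alpha} \right] \\
&= \epsilon^{\mu} \, \partial_{\nu} \left[ \left( \frac{\partial \mathcal{L}}{\partial ( \partial_{\nu} \phi^{\alpha} )} \right) \left( \partial_{\mu} \phi^{\alpha} \right) \right]
\end{aligned} \tag{299} $$
where the Euler-Lagrange equation has been used in the second line. Thus, the fields satisfy the continuity conditions
$$ 0 = \epsilon^{\mu} \, \partial_{\nu} \left[ \left( \frac{\partial \mathcal{L}}{\partial ( \partial_{\nu} \phi^{\alpha} )} \right) \left( \partial_{\mu} \phi^{\alpha} \right) - \delta^{\nu}_{\mu} \, \mathcal{L} \right] \tag{300} $$
where
$$ \begin{aligned}
\delta^{\mu}_{\nu} &= 1 \quad \text{if } \mu = \nu \\
\delta^{\mu}_{\nu} &= 0 \quad \text{otherwise}
\end{aligned} \tag{301} $$
The conserved current density is identified as
$$ T^{\nu}{}_{\mu} = \left[ \left( \frac{\partial \mathcal{L}}{\partial ( \partial_{\nu} \phi^{\alpha} )} \right) \left( \partial_{\mu} \phi^{\alpha} \right) - \delta^{\nu}_{\mu} \, \mathcal{L} \right] \tag{302} $$
which is the energy-momentum density $T^{\nu}{}_{\mu}$. The energy-momentum tensor satisfies the conservation law
$$ \partial_{\nu} T^{\nu}{}_{\mu} = 0 \tag{303} $$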
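The second-rank tensor can be written in contravariant form as
$$ T^{\nu,\mu} = \left[ \left( \frac{\partial \mathcal{L}}{\partial ( \partial_{\nu} \phi^{\alpha} )} \right) \left( \partial^{\mu} \phi^{\alpha} \right) - g^{\nu,\mu} \, \mathcal{L} \right] \tag{304} $$
where the metric tensor has been used to raise the index $\mu$. The component with $\mu = \nu = 0$ is the Hamiltonian density $\mathcal{H}$ for the fields
$$ \mathcal{H} = T^{0,0} = \left[ \left( \frac{\partial \mathcal{L}}{\partial ( \partial_{0} \phi^{\alpha} )} \right) \left( \partial^{0} \phi^{\alpha} \right) - \mathcal{L} \right] \tag{305} $$
so the total energy of the field is given by
$$ E = \int d^3x \; \mathcal{H} = \int d^3x \; T^{0,0} \tag{306} $$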
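A simple instance of these formulas is provided by a single real scalar field. The sketch below assumes the hypothetical Lagrangian density $\mathcal{L} = \frac{1}{2} ( \partial_{\mu} \phi ) ( \partial^{\mu} \phi ) - V(\phi)$, with $V$ an arbitrary potential introduced for illustration, and a metric with signature $(+,-,-,-)$.

```latex
% Hypothetical example: L = (1/2)(d_mu phi)(d^mu phi) - V(phi), signature (+,-,-,-).
\begin{align}
T^{\nu,\mu} &= ( \partial^{\nu} \phi ) ( \partial^{\mu} \phi ) - g^{\nu,\mu} \, \mathcal{L} \\
T^{0,0} &= ( \partial^{0} \phi )^2 - \mathcal{L}
 = \tfrac{1}{2} ( \partial_{0} \phi )^2 + \tfrac{1}{2} ( \nabla \phi )^2 + V(\phi) \\
T^{0,j} &= ( \partial^{0} \phi ) ( \partial^{(j)} \phi )
\end{align}
```

In this example $T^{\nu,\mu}$ is manifestly symmetric under the interchange of $\nu$ and $\mu$, and $T^{0,0}$ is positive whenever $V$ is positive.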
The energy is conserved since
$$ \frac{1}{c} \frac{\partial \mathcal{H}}{\partial t} + \sum_{j} \frac{\partial}{\partial x^{(j)}} T^{j,0} = 0 \tag{307} $$
where the components $c \, T^{j,0}$ represent the components of the energy-density flux. Likewise, the components
$$ T^{0,j} = c \, \pi^{(0)}_{\alpha} \left( \partial^{(j)} \phi^{\alpha} \right) \tag{308} $$
are related to the momentum density since the total momentum of the field is given by
$$ P^{(j)} = \int d^3x \; \frac{1}{c} \, T^{0,j} \tag{309} $$
Since $T^{0,j}$ is the momentum density, one expects that the components of the orbital angular momentum density are proportional to
$$ M^{0,j,k} = T^{0,j} \, x^{(k)} - T^{0,k} \, x^{(j)} \tag{310} $$
One can define a third-rank tensor via
$$ M^{\mu,\nu,\rho} = T^{\mu,\nu} \, x^{\rho} - T^{\mu,\rho} \, x^{\nu} \tag{311} $$
The divergence of the third-rank tensor is evaluated as
$$ \begin{aligned}
\partial_{\mu} M^{\mu,\nu,\rho} &= \left( \partial_{\mu} T^{\mu,\nu} \right) x^{\rho} + T^{\mu,\nu} \, \delta^{\rho}_{\mu} - \left( \partial_{\mu} T^{\mu,\rho} \right) x^{\nu} - T^{\mu,\rho} \, \delta^{\nu}_{\mu} \\
&= T^{\rho,\nu} - T^{\nu,\rho}
\end{aligned} \tag{312} $$
where the conservation law for $T^{\mu,\nu}$ and the condition
$$ \partial_{\mu} x^{\rho} = \delta_{\mu}^{\rho} \tag{313} $$
expressing the independence of the variables $x^{\rho}$ and $x^{\mu}$ have been used. The divergence of the third-rank tensor vanishes if $T^{\mu,\nu}$ is symmetric. Thus, the angular momentum tensor $M^{\mu,\nu,\rho}$ is conserved if the energy-momentum tensor is symmetric.
It should be noted that the tensor $T^{\mu,\nu}$ is only symmetric for scalar fields. This is related to the fact that a vector or tensor field carries a non-zero intrinsic angular momentum. It is possible to incorporate an additional term in the momentum-energy tensor of a vector field to make it symmetric.
Exercise:
(i) Determine the momentum-energy tensor for a complex scalar field $\psi$ governed by the Lagrangian density
$$ \mathcal{L} = \frac{1}{2} \left[ ( \partial_{\mu} \psi^{*} ) ( \partial^{\mu} \psi ) - \left( \frac{m c}{\hbar} \right)^2 | \psi |^2 \right] \tag{314} $$
(ii) Find the forms of the energy and momentum density of the field.
(iii) Using the form of the general solution, find expressions for the total energy and momentum of the field in terms of the Fourier components of the field.
Exercise:
(i) Determine the energy-momentum tensor for the Lagrangian density for the complex Schrödinger field representing a charged particle given by
$$ \mathcal{L} = - \frac{\hbar^2}{2 m} \left( \nabla \psi^{*} \right) \cdot \left( \nabla \psi \right) - \frac{\hbar}{2 i} \left[ \psi^{*} \left( \frac{\partial \psi}{\partial t} \right) - \left( \frac{\partial \psi^{*}}{\partial t} \right) \psi \right] - \psi^{*} \, V(x) \, \psi \tag{315} $$
(ii) Find the forms of the energy and momentum density of the field.
(iii) Find the forms of the generalized orbital angular momentum density of the field.
(iv) Consider the case where $V(x) \equiv 0$. Using the form of the general solution, find expressions for the total energy and momentum of the field in terms of the Fourier components of the field.
7 The Electromagnetic Lagrangian
The Lagrangian for a source-free electromagnetic field must be gauge invariant and must be a Lorentz scalar. An appropriate scalar Lagrange density can be constructed as
$$ \mathcal{L} = - \frac{1}{16 \pi} \, F_{\mu,\nu} \, F^{\mu,\nu} \tag{316} $$
where $A^{\mu}$ are the fields. The constant of proportionality is merely a matter of convention. The Euler-Lagrange equations are found by expressing the Lagrangian density in the symmetrical form
$$ \mathcal{L} = - \frac{1}{16 \pi} \, F^{\mu,\nu} \, g_{\mu,\sigma} \, g_{\nu,\tau} \, F^{\sigma,\tau} \tag{317} $$
From the above expression, it is seen that the two factors of the antisymmetrical second-rank field tensors produce identical variations of the action. Furthermore, in this form the Lagrangian only depends on the contravariant derivatives of the contravariant components of the field. This form allows variations to be made directly without using the properties of the metric tensor. The first-order variation of the action can be expressed as
$$ \begin{aligned}
\delta S &= - \frac{2}{16 \pi c} \int d^4x \left[ ( \partial^{\mu} \delta A^{\nu} ) \, F_{\mu,\nu} - ( \partial^{\nu} \delta A^{\mu} ) \, F_{\mu,\nu} \right] \\
&= \frac{2}{16 \pi c} \int d^4x \left[ \delta A^{\nu} \, \partial^{\mu} \left( F_{\mu,\nu} - F_{\nu,\mu} \right) \right] \\
&= \frac{1}{4 \pi c} \int d^4x \; \delta A^{\nu} \, ( \partial^{\mu} F_{\mu,\nu} )
\end{aligned} \tag{318} $$
where the second line has been obtained by integrating by parts and the last line was obtained by using the antisymmetric nature of the field tensor. The vanishing of the first-order variation of the action $\delta S$, for arbitrary $\delta A^{\nu}$, yields the Euler-Lagrange equation
$$ \partial^{\mu} F_{\mu,\nu} = 0 \tag{319} $$
which is the same as Maxwell's equations in the absence of any sources.
In the absence of the source, the Lagrangian density is gauge invariant. This can be seen by noting that the contravariant field tensor $F^{\mu,\nu}$ is gauge invariant, and the covariant tensor is obtained from the contravariant tensor by lowering both indices with the metric tensor. The contravariant field tensor can be expressed as the matrix
$$ F^{\mu,\nu} \equiv \begin{pmatrix}
0 & -E^{(1)} & -E^{(2)} & -E^{(3)} \\
E^{(1)} & 0 & -B^{(3)} & B^{(2)} \\
E^{(2)} & B^{(3)} & 0 & -B^{(1)} \\
E^{(3)} & -B^{(2)} & B^{(1)} & 0
\end{pmatrix} \tag{320} $$
and the covariant field tensor can be expressed as the matrix
$$ F_{\mu,\nu} \equiv \begin{pmatrix}
0 & E^{(1)} & E^{(2)} & E^{(3)} \\
-E^{(1)} & 0 & -B^{(3)} & B^{(2)} \\
-E^{(2)} & B^{(3)} & 0 & -B^{(1)} \\
-E^{(3)} & -B^{(2)} & B^{(1)} & 0
\end{pmatrix} \tag{321} $$
in which the signs of the terms with mixed time and space indices have changed. Therefore, the Lagrangian density can be expressed in terms of the electromagnetic fields as
$$ \mathcal{L} = \frac{1}{8 \pi} \left( E^2 - B^2 \right) \tag{322} $$
Since the Lagrangian density is completely expressed in terms of the electromagnetic field, it is gauge invariant.
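The expression of the Lagrangian density in terms of $E$ and $B$ follows from contracting the two field tensors component by component. The sketch below assumes the usual sign pattern of the field tensor, $F^{0,i} = - E^{(i)}$ and $F_{0,i} = + E^{(i)}$, with the purely spatial components (which contain only $B$) identical in the covariant and contravariant forms.

```latex
% Assumes F^{0,i} = -E^{(i)}, F_{0,i} = +E^{(i)}, and F_{i,j} = F^{i,j} for spatial indices.
\begin{align}
F_{0,i} F^{0,i} + F_{i,0} F^{i,0} &= 2 \sum_{i} \left( E^{(i)} \right) \left( - E^{(i)} \right) = - \, 2 \, E^2 \\
\sum_{i \neq j} F_{i,j} F^{i,j} &= 2 \left[ ( B^{(1)} )^2 + ( B^{(2)} )^2 + ( B^{(3)} )^2 \right] = 2 \, B^2 \\
\Rightarrow \quad - \frac{1}{16 \pi} F_{\mu,\nu} F^{\mu,\nu}
 &= - \frac{1}{16 \pi} \, 2 \left( B^2 - E^2 \right) = \frac{1}{8 \pi} \left( E^2 - B^2 \right)
\end{align}
```

Each component of $B$ appears twice in the antisymmetric spatial block, which accounts for the factor of two in the second line.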
In the presence of source densities, the Lagrangian density is extended to include the interaction to become
$$ \mathcal{L} = - \frac{1}{16 \pi} \, F_{\mu,\nu} \, F^{\mu,\nu} - \frac{1}{c} \, A_{\mu} \, j^{\mu} \tag{323} $$
This interaction term is the only Lorentz scalar that one can form with the four-vector current and the field. It should be noted that the last term is not gauge invariant. This action yields the equation of motion
$$ \partial_{\mu} F^{\mu,\nu} = \frac{4 \pi}{c} \, j^{\nu} \tag{324} $$
as expected.
The lack of gauge invariance in the interaction Lagrangian
$$ \mathcal{L}_{int} = - \frac{1}{c} \, A_{\mu} \, j^{\mu} \tag{325} $$
does not affect the equations of motion. On performing the gauge transformation
$$ A_{\mu} \rightarrow A'_{\mu} = A_{\mu} + \partial_{\mu} \Lambda \tag{326} $$
one finds that the interaction part of the Lagrangian density is transformed to
$$ \mathcal{L}'_{int} = - \frac{1}{c} \left[ A_{\mu} + \partial_{\mu} \Lambda \right] j^{\mu} \tag{327} $$
Since charge is conserved, the current density must satisfy the continuity equation
$$ \partial_{\mu} j^{\mu} = 0 \tag{328} $$
The continuity condition can be used to express the interaction as the untransformed Lagrangian density and a perfect derivative
$$ \mathcal{L}'_{int} = - \frac{1}{c} \, A_{\mu} \, j^{\mu} - \frac{1}{c} \, \partial_{\mu} ( \Lambda \, j^{\mu} ) \tag{329} $$
The perfect derivative term only adds a constant term to the action which does not affect the equations of motion [9]. Hence, although the Lagrangian density is not gauge invariant in the presence of sources, the Lagrangian equations of motion are gauge invariant.

[9] The change in the form of the interaction Lagrangian density produced by a gauge transformation should be taken as a warning against considering quantities in a field theory as being localized.
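The rewriting of the gauge-dependent term as a perfect derivative relies on a single application of the product rule together with charge conservation. As a short sketch:

```latex
% Product rule, followed by the continuity equation d_mu j^mu = 0.
\begin{align}
( \partial_{\mu} \Lambda ) \, j^{\mu}
 &= \partial_{\mu} ( \Lambda \, j^{\mu} ) - \Lambda \, ( \partial_{\mu} j^{\mu} ) \\
 &= \partial_{\mu} ( \Lambda \, j^{\mu} )
\end{align}
```

On integrating over space-time, the four-divergence becomes a surface term by Gauss's theorem. It is therefore fixed by the values of $\Lambda \, j^{\mu}$ on the boundary and is unaffected by variations of the fields that vanish there.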
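The momentum density conjugate to $A_{\mu}$ is calculated as
$$ \pi^{0,\mu} = - \frac{c}{4 \pi} \, F^{0,\mu} \tag{330} $$
which vanishes for $\mu = 0$, indicating that the scalar potential $A^{0}$ is not a dynamic variable. This suggests that it may be appropriate to completely fix the scalar potential by a choice of gauge, such as the Coulomb gauge which leads to the scalar potential $\phi$ being fixed by Poisson's equation. In the presence of sources, the Hamiltonian density is expressed as
$$ \begin{aligned}
\mathcal{H} &= - \frac{1}{4 \pi} \, ( \partial_{0} A_{\nu} ) \, F^{0,\nu} - \mathcal{L} \\
&= - \frac{1}{4 \pi} \, ( F_{0,\nu} + \partial_{\nu} A_{0} ) \, F^{0,\nu} - \mathcal{L} \\
&= - \frac{1}{4 \pi} \, ( F_{0,\nu} + \partial_{\nu} A_{0} ) \, F^{0,\nu} + \frac{1}{8 \pi} \left( B^2 - E^2 \right) + \frac{1}{c} \, j^{\mu} A_{\mu} \\
&= + \frac{1}{8 \pi} \left( E^2 + B^2 \right) - \frac{1}{4 \pi} \, ( \nabla \cdot \mathbf{E} ) \, A_{0} + \frac{1}{c} \, j^{\mu} A_{\mu} + \frac{1}{4 \pi} \, \nabla \cdot ( A^{(0)} \mathbf{E} )
\end{aligned} \tag{331} $$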
The fourth line has been derived by noting that the components of $F^{0,\mu}$ are non-zero only for space-like $\mu$ and are given by
$$ F^{0,i} = - E^{(i)} \tag{332} $$
Thus, the first term in the third line is given by
$$ - \frac{1}{4 \pi} \, F_{0,\nu} \, F^{0,\nu} = + \frac{1}{4 \pi} \, E^2 \tag{333} $$
which can be combined with the term
$$ - \frac{1}{8 \pi} \left( E^2 - B^2 \right) \tag{334} $$
originating from the Lagrangian density. This combination results in the term
$$ \frac{1}{8 \pi} \left[ E^2 + B^2 \right] \tag{335} $$
which is recognized as the usual expression for the energy density of a free electromagnetic field. On substituting eqn(332) into the second term in the third line, one finds
$$ + \frac{1}{4 \pi} \, ( \nabla A_{0} ) \cdot \mathbf{E} \tag{336} $$
which can be expressed as
$$ \frac{1}{4 \pi} \, ( \nabla A_{0} ) \cdot \mathbf{E} = \frac{1}{4 \pi} \, \nabla \cdot ( A_{0} \mathbf{E} ) - \frac{1}{4 \pi} \, A_{0} \, ( \nabla \cdot \mathbf{E} ) \tag{337} $$
This relation has been used in arriving at the fourth line of eqn(331). Since the divergence of the electric field satisfies Gauss's law
$$ \nabla \cdot \mathbf{E} = 4 \pi \rho \tag{338} $$
the expression given in eqn(337) simplifies to
$$ \frac{1}{4 \pi} \, ( \nabla A_{0} ) \cdot \mathbf{E} = \frac{1}{4 \pi} \, \nabla \cdot ( A_{0} \mathbf{E} ) - A_{0} \, \rho \tag{339} $$
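Therefore, the Hamiltonian density can be expressed as
$$ \mathcal{H} = + \frac{1}{8 \pi} \left( E^2 + B^2 \right) - \rho \, A_{0} + \frac{1}{c} \, j^{\mu} A_{\mu} + \frac{1}{4 \pi} \, \nabla \cdot ( A^{(0)} \mathbf{E} ) \tag{340} $$
On combining the term $\rho \, A_{0}$ with the last term
$$ \frac{1}{c} \, j^{\mu} A_{\mu} = \rho \, A_{0} - \frac{1}{c} \, \mathbf{j} \cdot \mathbf{A} \tag{341} $$
which originates from the Lagrangian interaction ($- \mathcal{L}_{int}$), one finds that the terms proportional to $A_{0} \, \rho$ in the Hamiltonian density cancel. On neglecting the total derivative term [ $+ \frac{1}{4 \pi} \nabla \cdot ( \phi \, \mathbf{E} )$ ], one finds that the Hamiltonian density reduces to
$$ \mathcal{H} = \frac{1}{8 \pi} \left( E^2 + B^2 \right) - \frac{1}{c} \, \mathbf{j} \cdot \mathbf{A} \tag{342} $$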
The first term is the energy density of the free electromagnetic field and the second term represents the energy of the interaction between the electromagnetic field and "charged particles". It should be noted that the interaction Hamiltonian is expressed entirely in terms of an interaction between the current density and the vector potential
$$ \mathcal{H}_{int} = - \frac{1}{c} \, \mathbf{j} \cdot \mathbf{A} \tag{343} $$
which demonstrates that the Hamiltonian is not invariant under a Lorentz transformation, but is invariant under rotations in space. This situation is to be contrasted with the interaction term in the Lagrangian, which was Lorentz invariant as it explicitly included an interaction between the scalar potential and the charge density.
7.1 Conservation Laws for Electromagnetic Fields
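The Lagrangian density $\mathcal{L}$ of an electromagnetic field is given by the Lorentz scalar
$$ \mathcal{L} = - \frac{1}{16 \pi} \, F_{\mu,\nu} \, F^{\mu,\nu} - \frac{1}{c} \, j^{\mu} A_{\mu} \tag{344} $$
or
$$ \mathcal{L} = - \frac{1}{16 \pi} \left[ \left( \partial_{\mu} A_{\nu} \right) - \left( \partial_{\nu} A_{\mu} \right) \right] \left[ \left( \partial^{\mu} A^{\nu} \right) - \left( \partial^{\nu} A^{\mu} \right) \right] - \frac{1}{c} \, j^{\mu} A_{\mu} \tag{345} $$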
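The Noetherian energy-momentum tensor $T^{\nu,\mu}$ is found from
$$ \begin{aligned}
T^{\nu}{}_{\mu} &= \left( \frac{\partial \mathcal{L}}{\partial ( \partial_{\nu} A^{\rho} )} \right) \left( \partial_{\mu} A^{\rho} \right) - \delta^{\nu}_{\mu} \, \mathcal{L} \\
&= \left( \frac{\partial \mathcal{L}}{\partial ( \partial_{\nu} A_{\rho} )} \right) \left( \partial_{\mu} A_{\rho} \right) - \delta^{\nu}_{\mu} \, \mathcal{L}
\end{aligned} \tag{346} $$
The derivative of the Lagrangian density is evaluated as
$$ \begin{aligned}
\left( \frac{\partial \mathcal{L}}{\partial ( \partial_{\nu} A_{\rho} )} \right) &= - \frac{1}{8 \pi} \left( F^{\nu,\rho} - F^{\rho,\nu} \right) \\
&= - \frac{1}{4 \pi} \, F^{\nu,\rho}
\end{aligned} \tag{347} $$
Therefore, the energy-momentum density is found as
$$ T^{\nu}{}_{\mu} = - \frac{1}{4 \pi} \, F^{\nu,\rho} \left( \partial_{\mu} A_{\rho} \right) - \delta^{\nu}_{\mu} \, \mathcal{L} \tag{348} $$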
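On raising the index $\mu$ with the metric tensor, one has the contravariant second-rank tensor
$$ T^{\nu,\mu} = - \frac{1}{4 \pi} \, F^{\nu,\rho} \left( \partial^{\mu} A_{\rho} \right) - g^{\nu,\mu} \, \mathcal{L} \tag{349} $$
The energy-momentum tensor is not gauge invariant, as it explicitly involves the fields $A^{\mu}$. On using the expression for the source-free Lagrangian density
$$ \mathcal{L} = \frac{1}{8 \pi} \left[ E^2 - B^2 \right] \tag{350} $$
one finds that the time components of $T^{\mu,\nu}$ are given by
$$ T^{0,0} = \frac{1}{8 \pi} \left[ E^2 + B^2 \right] + \frac{1}{4 \pi} \, \nabla \cdot \left( \phi \, \mathbf{E} \right) \tag{351} $$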
The expression $T^{0,0}$ is the Hamiltonian density $\mathcal{H}$, in the absence of sources, which represents the energy density of the free field. The momentum density is given by the mixed time and space components
$$ T^{0,j} = - \frac{1}{4 \pi} \, F^{0,\rho} \, ( \partial^{(j)} A_{\rho} ) \tag{352} $$
but since $F^{\mu,\nu}$ is antisymmetric, only the terms where $\rho$ is a spatial index are non-zero. Hence, one has
$$ \begin{aligned}
T^{0,j} &= - \frac{1}{4 \pi} \sum_{i} F^{0,i} \, ( \partial^{(j)} A_{i} ) \\
&= + \frac{1}{4 \pi} \sum_{i} F^{0,i} \left( \partial^{(j)} A^{(i)} - \partial^{(i)} A^{(j)} \right) + \frac{1}{4 \pi} \sum_{i} F^{0,i} \, ( \partial^{(i)} A^{(j)} )
\end{aligned} \tag{353} $$
where the relation between the space-like components of the covariant and contravariant four-vector $A_{i} = - A^{(i)}$ has been used. Since the time component of the field tensor is given by
$$ F^{0,i} = - E^{(i)} \tag{354} $$
and [10]
$$ \left( \partial^{(i)} A^{(j)} - \partial^{(j)} A^{(i)} \right) = - \sum_{k} \xi^{i,j,k} \, B^{(k)} \tag{355} $$
one finds that the momentum density is given by
$$ \begin{aligned}
T^{0,j} &= - \frac{1}{4 \pi} \sum_{i,k} \xi^{i,j,k} \, E^{(i)} B^{(k)} - \frac{1}{4 \pi} \sum_{i} E^{(i)} \, ( \partial^{(i)} A^{(j)} ) \\
&= \frac{1}{4 \pi} \left( \mathbf{E} \wedge \mathbf{B} \right)^{(j)} + \frac{1}{4 \pi} \, \mathbf{E} \cdot ( \nabla A^{(j)} )
\end{aligned} \tag{356} $$
On noting that in the absence of sources, one has
$$ \nabla \cdot \mathbf{E} = 0 \tag{357} $$
and by adding a term proportional to $A^{(j)} \, ( \nabla \cdot \mathbf{E} )$ to the expression for $T^{0,j}$ in eqn(356), one arrives at the result
$$ T^{0,j} = \frac{1}{4 \pi} \left( \mathbf{E} \wedge \mathbf{B} \right)^{(j)} + \frac{1}{4 \pi} \, \nabla \cdot \left( A^{(j)} \mathbf{E} \right) \tag{358} $$
The components $T^{0,\nu}$, apart from the terms involving total derivatives which integrate out to zero, are related to the total energy and the components of the total momentum of the electromagnetic field. The components of $T^{\mu,\nu}$ satisfy the continuity equations
$$ \partial_{\mu} T^{\mu,\nu} = 0 \tag{359} $$
which represent the conservation of energy and momentum. The other mixed time and spatial components of the energy-momentum tensor are evaluated as
$$ T^{j,0} = \frac{1}{4 \pi} \left( \mathbf{E} \wedge \mathbf{B} \right)^{(j)} + \frac{1}{4 \pi} \left[ \left( \nabla \wedge \left( \phi \, \mathbf{B} \right) \right)^{(j)} - \frac{1}{c} \frac{\partial}{\partial t} \left( \phi \, E^{(j)} \right) \right] \tag{360} $$
The components $T^{j,0}$ represent the components of the energy flux.

[10] Since the vector relationship $\mathbf{B} = \nabla \wedge \mathbf{A}$ involves the covariant derivative, there is a negative sign in the analogous expression involving the contravariant derivative.
It should be noted that the energy-momentum tensor $T^{\mu,\nu}$ is not symmetric. This has the consequence that the covariant generalization of the angular momentum to the third-rank tensor
$$ M^{\mu,\nu,\rho} = T^{\mu,\nu} \, x^{\rho} - T^{\mu,\rho} \, x^{\nu} \tag{361} $$
is not conserved. Additional terms can be added to the energy-momentum tensor [11] to create a symmetric tensor $\Theta^{\mu,\nu}$. These extra terms account for the intrinsic angular momentum of the photon.
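The symmetric energy-momentum tensor $\Theta^{\mu,\nu}$ can be found by substituting
$$ ( \partial^{\nu} A^{\lambda} ) = - F^{\lambda,\nu} + ( \partial^{\lambda} A^{\nu} ) \tag{362} $$
into the expression for $T^{\mu,\nu}$, to yield
$$ T^{\mu,\nu} = \frac{1}{4 \pi} \left[ g^{\mu,\rho} \, F_{\rho,\lambda} \, F^{\lambda,\nu} + \frac{1}{4} \, g^{\mu,\nu} \, F_{\rho,\lambda} \, F^{\rho,\lambda} \right] - \frac{1}{4 \pi} \, g^{\mu,\rho} \, F_{\rho,\lambda} \, ( \partial^{\lambda} A^{\nu} ) \tag{363} $$
The first two terms are symmetric and are gauge invariant. These two terms will form the basis for $\Theta^{\mu,\nu}$, which will be expressed as
$$ \Theta^{\mu,\nu} = \frac{1}{4 \pi} \left[ g^{\mu,\rho} \, F_{\rho,\lambda} \, F^{\lambda,\nu} + \frac{1}{4} \, g^{\mu,\nu} \, F_{\rho,\lambda} \, F^{\rho,\lambda} \right] \tag{364} $$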
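The expression $\Theta^{\mu,\nu}$ is symmetric under the interchange of $\mu$ and $\nu$, as can be seen by writing
$$ \begin{aligned}
\Theta^{\mu,\nu} &= \frac{1}{4 \pi} \left[ F^{\mu}{}_{\lambda} \, F^{\lambda,\nu} + \frac{1}{4} \, g^{\mu,\nu} \, F_{\rho,\lambda} \, F^{\rho,\lambda} \right] \\
&= \frac{1}{4 \pi} \left[ F^{\mu,\lambda} \, F_{\lambda}{}^{\nu} + \frac{1}{4} \, g^{\mu,\nu} \, F_{\rho,\lambda} \, F^{\rho,\lambda} \right]
\end{aligned} \tag{365} $$

[11] J. Belinfante, Physica 6, 887 (1939) has shown that the modified tensor $\Theta^{\mu,\nu}$ defined by
$$ \Theta^{\mu,\nu} = T^{\mu,\nu} + \left( \partial_{\rho} \Lambda^{\rho;\mu,\nu} \right) $$
where $\Lambda^{\rho;\mu,\nu}$ is an arbitrary tensor that is antisymmetric under the interchange of the first pair of indices
$$ \Lambda^{\rho;\mu,\nu} = - \Lambda^{\mu;\rho,\nu} $$
will automatically satisfy the same continuity conditions as $T^{\mu,\nu}$ and leave the total energy and momentum unaltered.

If $\Theta^{\mu,\nu}$ and $T^{\mu,\nu}$ are to represent the same set of conserved quantities, the last term in eqn(363) must be expressible as a total derivative. That this is true can be seen by examining the asymmetric term
$$ - \frac{1}{4 \pi} \, g^{\mu,\rho} \, F_{\rho,\lambda} \, ( \partial^{\lambda} A^{\nu} ) = - \frac{1}{4 \pi} \, F^{\mu,\lambda} \, ( \partial_{\lambda} A^{\nu} ) \tag{366} $$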
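where the index $\rho$ was raised by using the metric tensor. On combining the above expression with the source-free Maxwell equation
$$ ( \partial_{\lambda} F^{\mu,\lambda} ) = 0 \tag{367} $$
one obtains
$$ \begin{aligned}
- \frac{1}{4 \pi} \, g^{\mu,\rho} \, F_{\rho,\lambda} \, ( \partial^{\lambda} A^{\nu} ) &= - \frac{1}{4 \pi} \left[ F^{\mu,\lambda} \, ( \partial_{\lambda} A^{\nu} ) + A^{\nu} \, ( \partial_{\lambda} F^{\mu,\lambda} ) \right] \\
&= - \frac{1}{4 \pi} \, \partial_{\lambda} \left( F^{\mu,\lambda} A^{\nu} \right)
\end{aligned} \tag{368} $$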
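which is a total derivative. Furthermore, this term does not alter the conservation laws since their difference involves the double derivative
$$ \partial_{\mu} \left( \Theta^{\mu,\nu} - T^{\mu,\nu} \right) = - \frac{1}{4 \pi} \, \partial_{\mu} \partial_{\lambda} \left( F^{\lambda,\mu} A^{\nu} \right) \tag{369} $$
and $F^{\lambda,\mu}$ is antisymmetric. On interchanging the order of the derivatives in the right hand side, switching the summation labels, and using the antisymmetric property of $F^{\lambda,\mu}$, one has
$$ \begin{aligned}
\partial_{\mu} \left( \Theta^{\mu,\nu} - T^{\mu,\nu} \right) &= - \frac{1}{4 \pi} \, \partial_{\mu} \partial_{\lambda} \left( F^{\lambda,\mu} A^{\nu} \right) \\
&= - \frac{1}{4 \pi} \, \partial_{\lambda} \partial_{\mu} \left( F^{\lambda,\mu} A^{\nu} \right) \\
&= - \frac{1}{4 \pi} \, \partial_{\mu} \partial_{\lambda} \left( F^{\mu,\lambda} A^{\nu} \right) \\
&= + \frac{1}{4 \pi} \, \partial_{\mu} \partial_{\lambda} \left( F^{\lambda,\mu} A^{\nu} \right)
\end{aligned} \tag{370} $$
On comparing the right hand sides of the first and last lines, one finds that they are equal but have opposite signs, and therefore must vanish. Thus, the difference between the continuity relations vanishes
$$ \partial_{\mu} \left( \Theta^{\mu,\nu} - T^{\mu,\nu} \right) = 0 \tag{371} $$
Hence, since $T^{\mu,\nu}$ is conserved, the symmetrized energy-momentum tensor $\Theta^{\mu,\nu}$ is also conserved.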
Thus, the symmetric energy-momentum tensor $\Theta^{\mu,\nu}$ expressed by
$$ \Theta^{\mu,\nu} = \frac{1}{4 \pi} \left[ g^{\mu,\rho} \, F_{\rho,\lambda} \, F^{\lambda,\nu} + \frac{1}{4} \, g^{\mu,\nu} \, F_{\rho,\lambda} \, F^{\rho,\lambda} \right] \tag{372} $$
is a conserved quantity. The purely temporal component is given by
$$ \Theta^{0,0} = \frac{1}{8 \pi} \left[ E^2 + B^2 \right] \tag{373} $$
and the mixed temporal and spatial components are given by
$$ \Theta^{0,j} = \frac{1}{4 \pi} \left( \mathbf{E} \wedge \mathbf{B} \right)^{(j)} \tag{374} $$
The temporal and spatial components of $\Theta^{0,\mu}$ are, respectively, recognized as being the energy-density of the free field and the momentum-density vector. The components $\Theta^{j,0}$ are recognized as forming the Poynting vector which represents the energy flux of the electromagnetic field. The spatial components are given by
$$ \Theta^{i,j} = - \frac{1}{4 \pi} \left[ E^{(i)} E^{(j)} + B^{(i)} B^{(j)} - \frac{1}{2} \, \delta^{i,j} \left( E^2 + B^2 \right) \right] \tag{375} $$
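The purely temporal component can be checked directly from the definition of the symmetric tensor. The sketch below assumes a metric with $g^{0,0} = 1$, the components $F_{0,i} = E^{(i)}$ and $F^{i,0} = E^{(i)}$, and the contraction $F_{\rho,\lambda} F^{\rho,\lambda} = 2 ( B^2 - E^2 )$.

```latex
% Assumes g^{0,0} = 1, F_{0,i} = E^{(i)}, F^{i,0} = E^{(i)}, and F_{rho,lambda} F^{rho,lambda} = 2 (B^2 - E^2).
\begin{align}
g^{0,\rho} F_{\rho,\lambda} F^{\lambda,0} &= \sum_{i} F_{0,i} \, F^{i,0} = E^2 \\
\tfrac{1}{4} \, g^{0,0} \, F_{\rho,\lambda} F^{\rho,\lambda} &= \tfrac{1}{2} \left( B^2 - E^2 \right) \\
\Rightarrow \quad \Theta^{0,0} &= \frac{1}{4 \pi} \left[ E^2 + \tfrac{1}{2} \left( B^2 - E^2 \right) \right]
 = \frac{1}{8 \pi} \left( E^2 + B^2 \right)
\end{align}
```

Unlike the corresponding component of the unsymmetrized tensor, this expression contains no total-derivative term involving the scalar potential, and it is gauge invariant.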
Noether's theorem is purely classical, but there are generalizations for quantum fields. Quantum generalizations include the Ward-Takahashi and Taylor-Slavnov identities.
Exercise:
Evaluate the components $T^{j,0}$ and $T^{i,j}$ of the (asymmetric) energy-momentum tensor for a source-free electromagnetic field.
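Exercise:
Show that in the presence of sources, the symmetric energy-momentum tensor has components with the form
$$ \begin{aligned}
\Theta^{0,0} &= \frac{1}{8 \pi} \left[ E^2 + B^2 \right] - \frac{1}{c} \, \mathbf{j} \cdot \mathbf{A} \\
\Theta^{0,j} &= \frac{1}{4 \pi} \left( \mathbf{E} \wedge \mathbf{B} \right)^{(j)} - \rho \, A^{(j)}
\end{aligned} \tag{376} $$
Verify the form of the conservation laws for energy and momentum.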
Exercise:
Show that the extra term included in the tensor $\Theta^{i,j}$ produces a contribution to the angular momentum density of the form
$$ \mathcal{S}^{0,j} = \frac{1}{4 \pi} \left( \mathbf{E} \wedge \mathbf{A} \right)^{(j)} \tag{377} $$
which is the intrinsic spin density of the electromagnetic field.
7.2 Massive Spin-One Particles
The electromagnetic theory has been unified with the theory of weak interactions. This generalization requires the existence of two new types of spin-one particles in addition to the photon, which together mediate the electro-weak interaction. These new particles have non-zero mass. The massive spin-one particle has to satisfy the equation [12]
$$ p^{\mu} \, p_{\mu} = m^2 c^2 \tag{378} $$
and with the quantization condition,
$$ p_{\mu} \rightarrow \hat{p}_{\mu} = i \hbar \, \frac{\partial}{\partial x^{\mu}} \tag{379} $$
the four-vector field $A^{\mu}$ must satisfy the Klein-Gordon equation
$$ \left[ \frac{1}{c^2} \frac{\partial^2}{\partial t^2} - \nabla^2 + \left( \frac{m c}{\hbar} \right)^2 \right] A^{\mu} = \frac{4 \pi}{c} \, j^{\mu} \tag{380} $$
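where $\hbar$ no longer drops out. This equation can be derived from the Lagrangian
$$ \mathcal{L} = - \frac{1}{16 \pi} \, F_{\mu,\nu} \, F^{\mu,\nu} + \frac{1}{8 \pi} \left( \frac{m c}{\hbar} \right)^2 A_{\mu} A^{\mu} - \frac{1}{c} \, j^{\mu} A_{\mu} \tag{381} $$
For example, on varying $A_{\mu}$, one obtains the equation of motion
$$ \partial_{\nu} F^{\nu,\mu} + \left( \frac{m c}{\hbar} \right)^2 A^{\mu} = \frac{4 \pi}{c} \, j^{\mu} \tag{382} $$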
Neither the Lagrangian nor the equation of motion is gauge invariant. The appropriate gauge condition can be enforced by imposing conservation of charge [13]
$$ \partial_{\mu} j^{\mu} = 0 \tag{383} $$
On taking the four-divergence of the equation of motion, one finds
$$ \partial_{\mu} \partial_{\nu} F^{\nu,\mu} + \left( \frac{m c}{\hbar} \right)^2 \partial_{\mu} A^{\mu} = \frac{4 \pi}{c} \, \partial_{\mu} j^{\mu} \tag{384} $$
[12] A. Proca, J. Phys. et Radium 7, 147 (1936).
[13] Note that, unlike the massless photon, charge conservation has to be imposed as an additional assumption.

The first term on the left-hand side vanishes due to the definition of $F^{\mu,\nu}$. Since
$$ F^{\nu,\mu} = \partial^{\nu} A^{\mu} - \partial^{\mu} A^{\nu} \tag{385} $$
one finds
$$ \partial_{\nu} F^{\nu,\mu} = \partial_{\nu} \partial^{\nu} A^{\mu} - \partial^{\mu} \partial_{\nu} A^{\nu} \tag{386} $$
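therefore
$$ \partial_{\mu} \partial_{\nu} F^{\nu,\mu} = \partial_{\nu} \partial^{\nu} \, \partial_{\mu} A^{\mu} - \partial_{\mu} \partial^{\mu} \, \partial_{\nu} A^{\nu} = 0 \tag{387} $$
The term on the right-hand side of eqn(384) also vanishes, because it was chosen to impose charge conservation. Hence, one finds that $A^{\mu}$ for a massive spin-one particle must satisfy the Lorenz gauge condition
$$ \partial_{\mu} A^{\mu} = 0 \tag{388} $$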
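Once the Lorenz condition holds, the equation of motion for the massive vector field reduces to a wave equation with a mass term for each component. As a sketch, assuming $\partial_{\nu} \partial^{\nu} = \frac{1}{c^2} \frac{\partial^2}{\partial t^2} - \nabla^2$ and the equation of motion $\partial_{\nu} F^{\nu,\mu} + ( m c / \hbar )^2 A^{\mu} = \frac{4 \pi}{c} j^{\mu}$:

```latex
% Assumes d_nu d^nu = (1/c^2) d^2/dt^2 - grad^2 and the Lorenz condition d_nu A^nu = 0.
\begin{align}
\partial_{\nu} F^{\nu,\mu} &= \partial_{\nu} \partial^{\nu} A^{\mu} - \partial^{\mu} ( \partial_{\nu} A^{\nu} )
 = \partial_{\nu} \partial^{\nu} A^{\mu} \\
\Rightarrow \quad \left[ \frac{1}{c^2} \frac{\partial^2}{\partial t^2} - \nabla^2
 + \left( \frac{m c}{\hbar} \right)^2 \right] A^{\mu} &= \frac{4 \pi}{c} \, j^{\mu}
\end{align}
```

In the absence of sources, a plane wave $A^{\mu} \propto \exp [ i ( \mathbf{k} \cdot \mathbf{x} - \omega t ) ]$ then satisfies $\omega^2 = c^2 k^2 + ( m c^2 / \hbar )^2$, which is the dispersion relation of a particle of mass $m$.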
Exercise:
Starting from the expression eqn(372), determine the symmetrized energy-momentum tensor for the massive vector field. Hence, find the energy and momentum densities.
8 Symmetry Breaking and Mass Generation
We shall first look at an example of Goldstone's theorem, which states that if a system described by a Lagrangian with a continuous symmetry (and only short-ranged interactions) has a broken symmetry state, then the system supports a branch of small amplitude excitations with a dispersion relation $\omega_k$ that vanishes at $k = 0$. We shall then examine the situation in which the system is coupled by long-ranged interactions, as modelled by an electromagnetic field. As was first pointed out by Anderson, the long-ranged interactions alter the excitation spectrum of the symmetry broken state by removing the Goldstone modes and generating a branch of massive excitations.
8.1 Symmetry Breaking and Goldstone Bosons
Consider a Lagrangian density for a complex scalar field of the form
$$ \mathcal{L} = ( \partial_{\mu} \psi^{*} ) ( \partial^{\mu} \psi ) - \left( \frac{m c}{2 \hbar \phi_0} \right)^2 \left( \psi^{*} \psi - \phi_0^2 \right)^2 \tag{389} $$
The Lagrangian density is invariant under the continuous global symmetry
$$ \psi \rightarrow \psi' = \exp \left[ - \, i \, \alpha \right] \psi \tag{390} $$
for any real constant $\alpha$.

[Figure 8: surface plot of the potential $V(\Psi)$ over the plane with axes Re[Ψ] and Im[Ψ]; the minima lie on a circle of radius $\phi_0$.]
Figure 8: The potential $V[\psi]$ described by the Lagrangian is invariant under global rotations of the phase of $\psi$. The minima occur at values of $\psi$ which have magnitudes $\phi_0$; therefore, the uniform static field is infinitely degenerate.

The static or minimum energy solution corresponds to
$$ | \psi | = \phi_0 \tag{391} $$
which leaves the phase of $\psi$ undetermined. Since the phase of $\psi$ is continuous, the ground state is infinitely degenerate. If one writes
$$ \begin{aligned}
\psi &= \phi_1 + i \, \phi_2 \\
\psi^{*} &= \phi_1 - i \, \phi_2
\end{aligned} \tag{392} $$
then the Lagrangian can be written as a Lagrangian density involving the two real scalar fields $\phi_1$ and $\phi_2$. The Lagrangian density has a U(1) symmetry which corresponds to the rotation of $\psi$ around a circle about the origin in the $( \phi_1 , \phi_2 )$ plane.
We shall assume the field ψ representing the physical ground state corresponds
to only one of the infinite number of possible candidates. The physical
state must have a phase, which shall be defined as zero. That is, one starts with
a ground state $\psi = \phi_0$, and then considers the small-amplitude excitations. A
low-energy excited state corresponds to the complex field
$$ \psi = \phi_0 + \delta\psi \tag{393} $$
where δψ is static and uniform and can be considered to be very small. The
small-amplitude complex field δψ can be expressed in terms of its real and
imaginary parts
$$ \delta\psi = \chi_1 + i \chi_2 \tag{394} $$
The Lagrangian density takes the form
$$ \mathcal{L} = ( \partial_\mu \chi_1 ) ( \partial^\mu \chi_1 ) + ( \partial_\mu \chi_2 ) ( \partial^\mu \chi_2 ) - \left( \frac{m c}{2 \hbar \phi_0} \right)^2 \left( 2 \phi_0 \chi_1 + \chi_1^2 + \chi_2^2 \right)^2 \tag{395} $$
If one only considers infinitesimally small amplitude oscillations, one need only
consider terms quadratic in the fields. The quadratic Lagrangian density
$\mathcal{L}_{Free}$ describes non-interacting fields and is given by
$$ \mathcal{L}_{Free} = ( \partial_\mu \chi_1 ) ( \partial^\mu \chi_1 ) - \left( \frac{m c}{\hbar} \right)^2 \chi_1^2 + ( \partial_\mu \chi_2 ) ( \partial^\mu \chi_2 ) \tag{396} $$
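As a rough numerical illustration of how the quadratic terms arise (a sketch, not part of the original derivation), the potential of eqn(389) can be expanded around $\psi = \phi_0$ by finite differences. The values of `M` (standing for m c / ħ) and `PHI0`, and the helper names `potential` and `second_derivative`, are assumptions introduced only for this example.

```python
# Minimal sketch: curvature of the symmetry-breaking potential at psi = phi0.
# M stands for m c / hbar; M and PHI0 are arbitrary illustrative values.

M = 1.0
PHI0 = 1.5


def potential(chi1, chi2):
    """V = (M / (2 phi0))^2 (|psi|^2 - phi0^2)^2 with psi = phi0 + chi1 + i chi2."""
    psi_sq = (PHI0 + chi1) ** 2 + chi2 ** 2
    return (M / (2.0 * PHI0)) ** 2 * (psi_sq - PHI0 ** 2) ** 2


def second_derivative(f, h=1e-3):
    """Central-difference estimate of f''(0)."""
    return (f(h) - 2.0 * f(0.0) + f(-h)) / h ** 2


# A term M^2 chi^2 in V corresponds to a curvature of 2 M^2.
radial_curvature = second_derivative(lambda x: potential(x, 0.0))
tangential_curvature = second_derivative(lambda y: potential(0.0, y))

print("coefficient of chi_1^2:", radial_curvature / 2.0)
print("coefficient of chi_2^2:", tangential_curvature / 2.0)
```

Within the finite-difference error, the coefficient of $\chi_1^2$ should come out close to $(m c / \hbar)^2$ and that of $\chi_2^2$ close to zero, in line with eqn(396).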
The symmetry breaking has resulted in the complex field breaking up into two
fields: the first field $\chi_1$ describes excitations with mass m, and the second field $\chi_2$
describes massless excitations. The first field $\chi_1$ has plane-wave solutions if the
energy and momentum are related via the dispersion relation
$$ \omega^2 = c^2 k^2 + \left( \frac{m c^2}{\hbar} \right)^2 \tag{397} $$
and represents excitations which correspond to a "stretching" of $\phi_0$. It is
massive since this excitation moves the field away from the minimum of the
potential. The second excitation $\chi_2$ represents δψ which is transverse to $\phi_0$ in
the $(\phi_1, \phi_2)$ plane. This last excitation is known as a Goldstone boson [14]. The
Goldstone boson has a dispersion relation
$$ \omega^2 = c^2 k^2 \tag{398} $$
which vanishes at k = 0. The Goldstone boson dynamically restores the spontaneously
broken U(1) symmetry since, at k = 0, it just corresponds to a change
of the value of the (static and uniform) broken-symmetry field from $(\phi_0, 0)$
to the new direction $(\phi_0, \chi_2)$. Therefore, if infinitely many zero-energy Goldstone
bosons are excited in the system, the resulting state should correspond
to a new ground state with a different value of the phase. As noted by Anderson [15]
prior to Goldstone's work, the Goldstone theorem breaks down when
long-ranged interactions are present. Anderson was responsible for the concept
of mass generation through symmetry breaking due to the coupling with gauge
fields [16]. This concept was subsequently developed by Peter Higgs [17] and Tom
Kibble and coworkers [18].
[14] J. Goldstone, Il Nuovo Cimento, 19, 154 (1961).
[15] P. W. Anderson, Phys. Rev. 112, 1900 (1958).
[16] P. W. Anderson, "Plasmons, Gauge Invariance, and Mass", Physical Review 130, 439
(1963).
[17] P. W. Higgs, "Broken Symmetries, Massless Particles and Gauge Fields", Physics Letters
12, 132 (1964); P. W. Higgs, "Broken Symmetries and the Masses of Gauge Bosons", Physical
Review Letters 13, 508 (1964); P. W. Higgs, "Spontaneous Symmetry Breaking without
Massless Bosons", Physical Review 145, 1156 (1966).
[18] G. S. Guralnik, C. R. Hagen, and T. W. B. Kibble, "Global Conservation Laws and Massless
Particles", Physical Review Letters 13, 585-587 (1964); T. W. B. Kibble, "Symmetry Breaking
in non-Abelian Gauge Theories", Physical Review 155, 1554 (1967).
8.2 The Kibble-Higgs Mechanism
We shall now consider the coupling of a scalar field ψ with charge q to a gauge
field $A^\mu$. The Lagrangian density is related to the sum of the Lagrangian density
for the electromagnetic field and the Lagrangian density for the charged scalar
particle. The coupling between the fields is found from the minimal coupling
assumption
$$ \hat{p}^\mu \to \hat{p}^{\mu\prime} = \hat{p}^\mu - \frac{q}{c} A^\mu \tag{399} $$
which becomes
$$ i \hbar \partial^\mu \to i \hbar \partial^\mu - \frac{q}{c} A^\mu \tag{400} $$
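Therefore, the Lagrangian density for the coupled fields has the form
$$ \mathcal{L} = \left[ \left( \partial_\mu - i \frac{q}{\hbar c} A_\mu \right) \psi^* \right] \left[ \left( \partial^\mu + i \frac{q}{\hbar c} A^\mu \right) \psi \right] - \frac{1}{16 \pi} F_{\mu\nu} F^{\mu\nu} - \left( \frac{m c}{2 \hbar \phi_0} \right)^2 \left( \psi^* \psi - \phi_0^2 \right)^2 \tag{401} $$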
The Lagrangian density is invariant under the local gauge transformation
$$ \begin{aligned}
\psi &\to \psi' = \exp\left[ - i \frac{q}{\hbar c} \Lambda \right] \psi \\
A_\mu &\to A_\mu' = A_\mu + \partial_\mu \Lambda
\end{aligned} \tag{402} $$
where $\Lambda = \Lambda(x^\mu)$ is a function of space-time. The system has minimum energy
when ψ has a constant value with a magnitude given by
$$ | \psi | = \phi_0 \tag{403} $$
and the $A_\mu$ vanish. Any local gauge transformation leads to a state with the
same energy; therefore, the ground state is infinitely degenerate.
We shall assume that a physical system spontaneously breaks the symmetry
in that it corresponds to a specific constant value of Λ. We shall choose the
local gauge Λ(x) such that the field ψ representing the excited states is purely
real. However, once the gauge has been fixed, no further gauge transformations
can be made.
The small-amplitude excitations can be expressed as
$$ \psi = \phi_0 + \delta\psi \tag{404} $$
The fluctuations can be expressed as
$$ \delta\psi = \chi_1 \tag{405} $$
and on substituting in the Lagrangian and collecting the quadratic terms, one
obtains
$$ \mathcal{L}_{Free} = ( \partial_\mu \chi_1 ) ( \partial^\mu \chi_1 ) - \left( \frac{m c}{\hbar} \right)^2 \chi_1^2 - \frac{1}{16 \pi} F_{\mu\nu} F^{\mu\nu} + \left( \frac{q \phi_0}{\hbar c} \right)^2 A_\mu A^\mu \tag{406} $$
Therefore, one finds that the charged boson field has a mass m and the gauge
field has acquired a mass $m_A$ given by
$$ m_A^2 = 8 \pi \left( \frac{q \phi_0}{c^2} \right)^2 \tag{407} $$
Hence, by coupling an electromagnetic field with two components to a scalar
charged boson field, one has found a massive vector boson gauge field with three
independent components. The massless spinless component of the charged boson
field which described the Goldstone mode has become the longitudinal mode
of the gauge field.
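The step from the coupled Lagrangian density to the gauge-field mass can be sketched as follows. This is a worked illustration under two stated assumptions: the field is taken to be real, $\psi = \phi_0 + \chi_1$, as in the gauge choice above, and the mass term of the massive spin-one Lagrangian density of Section 7.2 is assumed to have the form $\frac{1}{8\pi} ( m c / \hbar )^2 A_\mu A^\mu$, which is not reproduced in this section.

```latex
% For real \psi = \phi_0 + \chi_1 the cross terms in the kinetic part cancel:
\left[ \left( \partial_\mu - i \frac{q}{\hbar c} A_\mu \right) \psi \right]
\left[ \left( \partial^\mu + i \frac{q}{\hbar c} A^\mu \right) \psi \right]
  = ( \partial_\mu \chi_1 ) ( \partial^\mu \chi_1 )
  + \left( \frac{q}{\hbar c} \right)^2 A_\mu A^\mu \, ( \phi_0 + \chi_1 )^2

% Keeping only the quadratic part gives the last term of eqn(406).
% Comparing it with the assumed mass term of a massive spin-one field,
\frac{1}{8 \pi} \left( \frac{m_A c}{\hbar} \right)^2 A_\mu A^\mu
  = \left( \frac{q \phi_0}{\hbar c} \right)^2 A_\mu A^\mu
\quad \Longrightarrow \quad
m_A^2 = 8 \pi \left( \frac{q \phi_0}{c^2} \right)^2
```

Under these assumptions the comparison reproduces eqn(407); the terms of higher order in $\chi_1$ and $A_\mu$ describe interactions and are dropped from $\mathcal{L}_{Free}$.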
9 Quantization of the Electromagnetic Field
Following the work of Dirac [19], the energy, momentum and angular momentum
of the electromagnetic field shall be reduced into contributions from a set of
normal modes. A particular normal mode will correspond to a particular wave
vector and a particular polarization of the field. The normal modes can be
described in terms of a set of harmonic oscillators and, when quantized, the
normal modes will be described by quantum mechanical harmonic oscillators.
In the absence of sources, the (classical) wave equation for the vector potential
has the form
$$ \left[ - \nabla^2 + \frac{1}{c^2} \frac{\partial^2}{\partial t^2} \right] A = 0 \tag{408} $$
when the Coulomb gauge condition is imposed
$$ \nabla \cdot A = 0 \tag{409} $$
The Fourier transformation with respect to space is defined as
$$ A(k, t) = \frac{1}{\sqrt{V}} \int d^3 r \, \exp\left[ i k \cdot r \right] A(r, t) \tag{410} $$
where V is the volume of the system. The inverse Fourier transform is given
by
$$ A(r, t) = \frac{1}{\sqrt{V}} \sum_k \exp\left[ - i k \cdot r \right] A(k, t) \tag{411} $$

[19] P. A. M. Dirac, Proc. Roy. Soc. A 114, 243 (1927).
In this paper Dirac uses two different approaches to quantizing electromagnetism. In one
approach he treated a single photon as satisfying a single-particle Schrödinger equation that
has a similar form to Maxwell's equations. The other approach treated the fields as dynamical
variables and then quantized them. Dirac then showed that these two methods produce
equivalent results. By doing this, Dirac created second quantization.
On Fourier transforming the wave equation with respect to space, one
finds the equation of motion
$$ \left[ k^2 + \frac{1}{c^2} \frac{\partial^2}{\partial t^2} \right] A(k, t) = 0 \tag{412} $$
and the Coulomb gauge condition becomes
$$ k \cdot A(k, t) = 0 \tag{413} $$
We shall look for solutions for A(k, t) that have a time dependence given by
linear superpositions of the terms proportional to
$$ \exp\left[ \mp i \omega_k t \right] \tag{414} $$
By substituting the above terms into the wave equation, it is found that linear
superpositions of plane waves are solutions of Maxwell's equations, but only if
the frequency $\omega_k$ and wave vector k are related via the dispersion relation
$$ \omega_k^2 = c^2 k^2 \tag{415} $$
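The role of the dispersion relation can be checked numerically. The sketch below is an illustration only: it uses a real one-dimensional plane wave $\cos( k x - \omega t )$ rather than the complex exponentials of the text, sets c = 1 in arbitrary units, and the function name `wave_operator_residual` is an assumption made for this example.

```python
import math

C = 1.0  # speed of light in arbitrary units (assumption for this sketch)


def wave_operator_residual(k, omega, x=0.3, t=0.2, h=1e-3):
    """Finite-difference value of (-d^2/dx^2 + c^-2 d^2/dt^2) acting on
    cos(k x - omega t), a one-dimensional real plane wave."""
    def wave(x_, t_):
        return math.cos(k * x_ - omega * t_)

    d2_dx2 = (wave(x + h, t) - 2.0 * wave(x, t) + wave(x - h, t)) / h ** 2
    d2_dt2 = (wave(x, t + h) - 2.0 * wave(x, t) + wave(x, t - h)) / h ** 2
    return -d2_dx2 + d2_dt2 / C ** 2


on_shell = wave_operator_residual(k=2.0, omega=2.0 * C)   # omega = c k
off_shell = wave_operator_residual(k=2.0, omega=4.0 * C)  # omega = 2 c k

print("residual with omega = c k  :", on_shell)
print("residual with omega = 2 c k:", off_shell)
```

The residual should be negligible only when $\omega = c k$, which is the content of eqn(415).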
The gauge condition also requires that the vector potential is oriented perpendicular
to the direction of propagation. Therefore, an arbitrary plane-wave
solution can be represented as a linear superposition of two polarized waves
with polarizations described by two mutually orthogonal unit vectors denoted
by $\hat{\epsilon}_\alpha(k)$. The polarization vectors satisfy
$$ \begin{aligned}
k \cdot \hat{\epsilon}_\alpha(k) &= 0 \\
\hat{\epsilon}_\alpha(k) \cdot \hat{\epsilon}_\beta(k) &= \delta_{\alpha,\beta}
\end{aligned} \tag{416} $$
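One possible way of constructing such a pair of polarization vectors for a given wave vector is sketched below. The helper `polarization_basis` is hypothetical and is introduced only for illustration; it checks the conditions of eqn(416) and makes no attempt to impose any particular convention relating the vectors at k and −k.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(a):
    norm = math.sqrt(dot(a, a))
    return tuple(x / norm for x in a)


def polarization_basis(k):
    """Return unit vectors (e1, e2) with k . e1 = k . e2 = 0 and e1 . e2 = 0.
    Hypothetical helper; the choice of e1 is arbitrary up to a rotation about k."""
    k_hat = normalize(k)
    # Pick the Cartesian axis least aligned with k so the cross product is safe.
    axis = min(((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)),
               key=lambda u: abs(dot(u, k_hat)))
    e1 = normalize(cross(axis, k_hat))
    e2 = cross(k_hat, e1)  # already of unit length since k_hat and e1 are orthonormal
    return e1, e2


k = (1.0, 2.0, -0.5)
e1, e2 = polarization_basis(k)
print("k . e1 =", dot(k, e1), " k . e2 =", dot(k, e2), " e1 . e2 =", dot(e1, e2))
```

With this construction the triad (e1, e2, k) is right-handed, since e1 ∧ e2 points along k.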
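We shall assume that the three vectors $\left( k, \hat{\epsilon}_1(k), \hat{\epsilon}_2(k) \right)$ form a mutually orthogonal
coordinate system. We shall define
$$ \begin{aligned}
\hat{\epsilon}_1(-k) &= \hat{\epsilon}_1(k) \\
\hat{\epsilon}_2(-k) &= \hat{\epsilon}_2(k)
\end{aligned} \tag{417} $$
The algebraic equations for A(k) can be solved trivially. One can express the
vector potential as a linear superposition
$$ A(r, t) = \frac{1}{\sqrt{V}} \sum_{k,\alpha} \hat{\epsilon}_\alpha(k) \exp\left[ - i k \cdot r \right] \Phi_\alpha(k, t) \tag{418} $$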
[Figure 9 labels: E, B, k]
Figure 9: The normal modes of the classical electromagnetic field are plane-polarized
waves, in which E and B are transverse to the direction of propagation
k, and oscillate in phase.
However, since the vector potential is real
$$ A(r, t) = A^*(r, t) \tag{419} $$
one must have
$$ \Phi_\alpha(k, t) = \Phi^*_\alpha(-k, t) \tag{420} $$
Therefore, if $\Phi_\alpha(k)$ and $\Phi^*_\alpha(k)$ are to be considered as being independent fields,
then one must restrict k to have values in a volume of k-space that does not
contain both k and −k for any fixed value of k. This curiosity is associated with
the fact that, for purely real fields, particles are identical to their antiparticles.
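The reality condition can be illustrated with a discrete analogue of the Fourier transform of eqn(410). The sketch below assumes a one-dimensional lattice of N sites with $x_j = j$ and $k_n = 2 \pi n / N$; the function name `fourier_components` and the sample values are made up for the example.

```python
import cmath
import math


def fourier_components(samples):
    """Discrete analogue of eqn(410):
    Phi(k_n) = N^(-1/2) sum_j exp(+i k_n x_j) A(x_j), with k_n = 2 pi n / N, x_j = j."""
    n_sites = len(samples)
    return [
        sum(cmath.exp(2j * math.pi * n * j / n_sites) * a
            for j, a in enumerate(samples)) / math.sqrt(n_sites)
        for n in range(n_sites)
    ]


field = [0.3, -1.2, 0.7, 2.0, -0.4, 0.9, 1.1, -0.6]  # arbitrary real samples
phi = fourier_components(field)
n_sites = len(field)

# For a real field, Phi(k) should equal the complex conjugate of Phi(-k).
max_violation = max(abs(phi[n] - phi[(-n) % n_sites].conjugate())
                    for n in range(n_sites))
print("largest |Phi(k) - Phi*(-k)| =", max_violation)
```

Only about half of the complex amplitudes are therefore independent, which is why the sums over k are later restricted to half of k-space.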
9.1 The Lagrangian and Hamiltonian Density
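The Lagrangian density $\mathcal{L}$ for the electromagnetic field can be expressed as
$$ \mathcal{L} = \frac{1}{8 \pi} \left[ E^2 - B^2 \right] \tag{421} $$
In the Coulomb gauge, the electromagnetic field is given by
$$ \begin{aligned}
E &= - \frac{1}{c} \frac{\partial A}{\partial t} \\
B &= \nabla \wedge A
\end{aligned} \tag{422} $$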
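Hence, the Lagrangian density is expressed as
$$ \mathcal{L} = \frac{1}{8 \pi} \left[ \frac{1}{c^2} \left( \frac{\partial A}{\partial t} \right)^2 - \left( \nabla \wedge A \right)^2 \right] \tag{423} $$
The Lagrangian is given by the space integral of the Lagrangian density
$$ L = \int d^3 r \, \mathcal{L} \tag{424} $$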
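On substituting A(r, t) in the form of eqn(418), integrating over r, and using
the identity
$$ \frac{1}{V} \int d^3 r \, \exp\left[ i ( k + k' ) \cdot r \right] = \delta_{k+k'} \tag{425} $$
one finds the Lagrangian is given by
$$ \begin{aligned}
L &= \frac{1}{8 \pi} \sum_{k,k'} \sum_{\alpha,\beta} \delta_{k+k'} \left[ \hat{\epsilon}_\alpha(k) \cdot \hat{\epsilon}_\beta(k') \, \frac{1}{c^2} \left( \frac{\partial \Phi_\alpha(k)}{\partial t} \right) \left( \frac{\partial \Phi_\beta(k')}{\partial t} \right) + ( k \wedge \hat{\epsilon}_\alpha(k) ) \cdot ( k' \wedge \hat{\epsilon}_\beta(k') ) \, \Phi_\alpha(k) \Phi_\beta(k') \right] \\
&= \frac{1}{8 \pi} \sum_{k,\alpha} \left[ \frac{1}{c^2} \left( \frac{\partial \Phi^*_\alpha(k)}{\partial t} \right) \left( \frac{\partial \Phi_\alpha(k)}{\partial t} \right) - k^2 \, \Phi^*_\alpha(k) \Phi_\alpha(k) \right]
\end{aligned} \tag{426} $$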
In the above expression, the summation over k is unrestricted. If the Lagrangian
is to be expressed in terms of the independent components, then the summation
over k must be restricted to half the set of allowed values. With this restriction,
one obtains
$$ L = \frac{2}{8 \pi} {\sum_{k,\alpha}}' \left[ \frac{1}{c^2} \left( \frac{\partial \Phi^*_\alpha(k)}{\partial t} \right) \left( \frac{\partial \Phi_\alpha(k)}{\partial t} \right) - k^2 \, \Phi^*_\alpha(k) \Phi_\alpha(k) \right] \tag{427} $$
where the prime over the summation denotes the restriction of k to values
in the "positive" half volume of k-space. Since there are half the number of
independent normal modes, their contributions are twice as big. The Lagrangian
is a function of the six generalized variables $\Phi_\alpha(k)$ and $\Phi^*_\alpha(k)$ for the independent
k values. The generalized momenta variables are found as
$$ \begin{aligned}
\Pi_\alpha(k) &= \frac{2}{8 \pi c^2} \left( \frac{\partial \Phi^*_\alpha(k)}{\partial t} \right) \\
\Pi^*_\alpha(k) &= \frac{2}{8 \pi c^2} \left( \frac{\partial \Phi_\alpha(k)}{\partial t} \right)
\end{aligned} \tag{428} $$

Figure 10: A possible partition of k-space, which does not contain both k and
its inverse −k.
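The Lagrangian equations of motion of the field are given by
$$ \frac{\partial}{\partial t} \left[ \frac{1}{8 \pi c^2} \left( \frac{\partial \Phi_\alpha(k)}{\partial t} \right) \right] = - \frac{k^2}{8 \pi} \Phi_\alpha(k) \tag{429} $$
or
$$ \left( \frac{\partial^2 \Phi_\alpha(k)}{\partial t^2} \right) = - \omega_k^2 \, \Phi_\alpha(k) \tag{430} $$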
where $\omega_k = c k$. Thus, the classical field $\Phi_\alpha(k)$ has a time-dependent amplitude
which resembles that of a harmonic oscillator with frequency $\omega_k = c k$.
The Hamiltonian can be obtained from the Lagrangian via the Legendre
transformation
$$ H = {\sum_{k,\alpha}}' \left[ \Pi^*_\alpha(k) \frac{\partial \Phi^*_\alpha(k)}{\partial t} + \Pi_\alpha(k) \frac{\partial \Phi_\alpha(k)}{\partial t} \right] - L \tag{431} $$
which leads to the explicit expression for the Hamiltonian
$$ H = {\sum_{k,\alpha}}' \left[ \frac{8 \pi c^2}{2} \, \Pi^*_\alpha(k) \Pi_\alpha(k) + \frac{2}{8 \pi} k^2 \, \Phi^*_\alpha(k) \Phi_\alpha(k) \right] \tag{432} $$
where the summation over (k, α) runs over the independent normal modes.
Hence, the k summation only runs over the set of points in k-space which are
not related via the inversion operator. The Hamiltonian is related to the energy
of the electromagnetic field, as shall be seen below.
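The energy density $\mathcal{H}$ for the electromagnetic field can be expressed as
$$ \mathcal{H} = \frac{1}{8 \pi} \left[ E^2 + B^2 \right] \tag{433} $$
in the Coulomb gauge. The energy density can be written in terms of the vector
potential as
$$ \mathcal{H} = \frac{1}{8 \pi} \left[ \frac{1}{c^2} \left( \frac{\partial A}{\partial t} \right)^2 + \left( \nabla \wedge A \right)^2 \right] \tag{434} $$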
The energy is the integral of the energy density over all space
$$ H = \int d^3 r \, \mathcal{H} \tag{435} $$
When expressed in terms of the generalized coordinates and the generalized
momenta, the energy reduces to the expression
$$ H = \sum_{k,\alpha} \left[ \frac{8 \pi c^2}{4} \, \Pi_\alpha(k) \Pi^*_\alpha(k) + \frac{1}{8 \pi} k^2 \, \Phi^*_\alpha(k) \Phi_\alpha(k) \right] \tag{436} $$
in which the summation over k is unrestricted. Thus, the above expression for
the energy is identical to the Hamiltonian for the electromagnetic field. Furthermore,
the Hamiltonian has been expressed in terms of a set of normal
modes labeled by (k, α).
9.2 Quantizing the Normal Modes
The quantized Hamiltonian is obtained from the classical Hamiltonian by replacing
the field components and their canonically conjugate momenta
$$ \Phi_\alpha(k) \; , \; \Pi_\alpha(k) \tag{437} $$
by the operators
$$ \hat{\Phi}_\alpha(k) \; , \; \hat{\Pi}_\alpha(k) \tag{438} $$
and their complex conjugates are replaced by the Hermitean conjugate operators.
The canonically conjugate coordinates and momenta operators satisfy the
commutation relations
$$ \begin{aligned}
[ \hat{\Phi}_\alpha(k) , \hat{\Pi}_\beta(k') ] &= i \hbar \, \delta_{\alpha,\beta} \, \delta_{k,k'} \\
[ \hat{\Pi}_\alpha(k) , \hat{\Pi}_\beta(k') ] &= 0 \\
[ \hat{\Phi}_\alpha(k) , \hat{\Phi}_\beta(k') ] &= 0
\end{aligned} \tag{439} $$
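The quantized Hamiltonian for the electromagnetic field is given by
$$ \hat{H} = \sum_{k,\alpha} \left[ \frac{8 \pi c^2}{4} \, \hat{\Pi}_\alpha(k) \hat{\Pi}^\dagger_\alpha(k) + \frac{1}{8 \pi} k^2 \, \hat{\Phi}^\dagger_\alpha(k) \hat{\Phi}_\alpha(k) \right] \tag{440} $$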
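The Hamiltonian can be factorized by introducing the annihilation operators
$$ \hat{a}_{k,\alpha} = \frac{1}{\sqrt{2}} \left[ i \sqrt{\frac{8 \pi c^2}{2 \hbar \omega_k}} \, \hat{\Pi}_\alpha(k) + \sqrt{\frac{2 k^2}{8 \pi \hbar \omega_k}} \, \hat{\Phi}^\dagger_\alpha(k) \right] \tag{441} $$
and the Hermitean conjugate operators
$$ \hat{a}^\dagger_{k,\alpha} = \frac{1}{\sqrt{2}} \left[ - i \sqrt{\frac{8 \pi c^2}{2 \hbar \omega_k}} \, \hat{\Pi}^\dagger_\alpha(k) + \sqrt{\frac{2 k^2}{8 \pi \hbar \omega_k}} \, \hat{\Phi}_\alpha(k) \right] \tag{442} $$
known as creation operators. The commutation relations for the creation and
annihilation operators can be obtained directly from the commutation relations
of the field operators $\hat{\Phi}_\alpha(k)$ and $\hat{\Pi}_\alpha(k)$ which are shown in eqn(439). It can
be shown that the creation and annihilation operators satisfy the commutation
relations
$$ \begin{aligned}
[ \hat{a}_{k,\alpha} , \hat{a}^\dagger_{k',\beta} ] &= \delta_{\alpha,\beta} \, \delta_{k,k'} \\
[ \hat{a}^\dagger_{k,\alpha} , \hat{a}^\dagger_{k',\beta} ] &= 0 \\
[ \hat{a}_{k,\alpha} , \hat{a}_{k',\beta} ] &= 0
\end{aligned} \tag{443} $$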
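The field operators can be expressed in terms of the creation and annihilation
operators. Starting with
$$ \hat{a}_{k,\alpha} = \frac{1}{\sqrt{2}} \left[ i \sqrt{\frac{8 \pi c^2}{2 \hbar \omega_k}} \, \hat{\Pi}_\alpha(k) + \sqrt{\frac{2 k^2}{8 \pi \hbar \omega_k}} \, \hat{\Phi}^\dagger_\alpha(k) \right] \tag{444} $$
transforming k → −k, and then noting that $\hat{\Pi}_\alpha(-k) = \hat{\Pi}^\dagger_\alpha(k)$ and
$\hat{\Phi}^\dagger_\alpha(-k) = \hat{\Phi}_\alpha(k)$, one finds
$$ \hat{a}_{-k,\alpha} = \frac{1}{\sqrt{2}} \left[ i \sqrt{\frac{8 \pi c^2}{2 \hbar \omega_k}} \, \hat{\Pi}^\dagger_\alpha(k) + \sqrt{\frac{2 k^2}{8 \pi \hbar \omega_k}} \, \hat{\Phi}_\alpha(k) \right] \tag{445} $$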
One can eliminate $\hat{\Pi}^\dagger_\alpha(k)$ by adding the expression for the creation operator
given by eqn(442) and the expression for the annihilation operator with momentum
−k given by eqn(445). This process yields the expression for the field
component operators $\hat{\Phi}_\alpha(k)$ in the form
$$ \hat{\Phi}_\alpha(k) = \sqrt{\frac{2 \pi \hbar \omega_k}{k^2}} \left( \hat{a}^\dagger_{k,\alpha} + \hat{a}_{-k,\alpha} \right) \tag{446} $$
and, by an analogous procedure, the Hermitean conjugate operator is found to
be given by
$$ \hat{\Phi}^\dagger_\alpha(k) = \sqrt{\frac{2 \pi \hbar \omega_k}{k^2}} \left( \hat{a}_{k,\alpha} + \hat{a}^\dagger_{-k,\alpha} \right) \tag{447} $$
which is identical to $\hat{\Phi}_\alpha(-k)$. Likewise, the canonically conjugate momenta
operators are given by
$$ \hat{\Pi}_\alpha(k) = i \sqrt{\frac{\hbar \omega_k}{8 \pi c^2}} \left( \hat{a}^\dagger_{-k,\alpha} - \hat{a}_{k,\alpha} \right) \tag{448} $$
and their Hermitean conjugates are
$$ \hat{\Pi}^\dagger_\alpha(k) = - i \sqrt{\frac{\hbar \omega_k}{8 \pi c^2}} \left( \hat{a}_{-k,\alpha} - \hat{a}^\dagger_{k,\alpha} \right) \tag{449} $$
as was anticipated.
9.2.1 The Energy of the Field
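The Hamiltonian of the electromagnetic field
$$ \hat{H} = \sum_{k,\alpha} \left[ \frac{8 \pi c^2}{4} \, \hat{\Pi}_\alpha(k) \hat{\Pi}^\dagger_\alpha(k) + \frac{1}{8 \pi} k^2 \, \hat{\Phi}^\dagger_\alpha(k) \hat{\Phi}_\alpha(k) \right] \tag{450} $$
can be expressed in terms of the creation and annihilation operators as
$$ \begin{aligned}
\hat{H} &= \sum_{k,\alpha} \frac{\hbar \omega_k}{4} \left[ \left( \hat{a}^\dagger_{-k,\alpha} - \hat{a}_{k,\alpha} \right) \left( \hat{a}_{-k,\alpha} - \hat{a}^\dagger_{k,\alpha} \right) + \left( \hat{a}^\dagger_{k,\alpha} + \hat{a}_{-k,\alpha} \right) \left( \hat{a}_{k,\alpha} + \hat{a}^\dagger_{-k,\alpha} \right) \right] \\
&= \sum_{k,\alpha} \frac{\hbar \omega_k}{4} \left[ \hat{a}^\dagger_{k,\alpha} \hat{a}_{k,\alpha} + \hat{a}_{k,\alpha} \hat{a}^\dagger_{k,\alpha} + \hat{a}^\dagger_{-k,\alpha} \hat{a}_{-k,\alpha} + \hat{a}_{-k,\alpha} \hat{a}^\dagger_{-k,\alpha} \right]
\end{aligned} \tag{451} $$
where the terms containing two creation operators or two annihilation operators
cancel, since these operators commute. If one sets k → −k in the second set of
terms, then one finds the Hamiltonian
becomes the sum over independent harmonic oscillators for each k value and
polarization
$$ \hat{H} = \sum_{k,\alpha} \frac{\hbar \omega_k}{2} \left[ \hat{a}^\dagger_{k,\alpha} \hat{a}_{k,\alpha} + \hat{a}_{k,\alpha} \hat{a}^\dagger_{k,\alpha} \right] \tag{452} $$
The number operator for each normal mode is given by
$$ \hat{n}_{k,\alpha} = \hat{a}^\dagger_{k,\alpha} \hat{a}_{k,\alpha} \tag{453} $$
and has non-negative integer eigenvalues denoted by $n_{k,\alpha}$. Hence, the energy eigenvalues E
are given by
$$ E = \sum_{k,\alpha} \hbar \omega_k \left( n_{k,\alpha} + \frac{1}{2} \right) \tag{454} $$
The energy of the electromagnetic field is quantized in units of $\hbar \omega_k = \hbar c k$.
The quanta are known as photons.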
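The oscillator spectrum of a single normal mode can be checked with explicit matrices. The sketch below is an illustration under stated assumptions: the Fock space of one mode is truncated to `DIM` number states (which spoils the commutator only in the highest state), $\hbar \omega_k$ is set to one in arbitrary units, and the helper names are invented for the example.

```python
import math

HBAR_OMEGA = 1.0  # hbar * omega_k in arbitrary energy units (assumption)


def annihilation_matrix(dim):
    """Matrix of a in the truncated basis |0>, ..., |dim-1>: <n-1| a |n> = sqrt(n)."""
    a = [[0.0] * dim for _ in range(dim)]
    for n in range(1, dim):
        a[n - 1][n] = math.sqrt(n)
    return a


def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def hamiltonian(dim):
    """Single-mode version of eqn(452): H = (hbar omega / 2) (a^dag a + a a^dag)."""
    a = annihilation_matrix(dim)
    a_dag = transpose(a)  # a is real, so the Hermitean conjugate is the transpose
    n_op = matmul(a_dag, a)
    aa_dag = matmul(a, a_dag)
    return [[0.5 * HBAR_OMEGA * (n_op[i][j] + aa_dag[i][j]) for j in range(dim)]
            for i in range(dim)]


DIM = 6
H = hamiltonian(DIM)
# The last basis state is affected by the truncation, so it is left out.
energies = [H[n][n] for n in range(DIM - 1)]
print("energies in units of hbar omega:", energies)
```

The diagonal entries should follow $\hbar \omega_k ( n + 1/2 )$, in agreement with eqn(454) for a single mode.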
It should be noted that the contributions to the total energy from the zero-point
energy terms $\hbar \omega_k / 2$ diverge. However, in most circumstances, only the
excitation energy of the field is measurable, hence the divergence is mainly
irrelevant. The zero-point energy does have physical consequences, and can be
observed if the volume or boundary conditions of the field are changed. The
change in the zero-point energy of the field due to a change in volume or boundary
conditions is known as the Casimir effect [20].
[20] H. B. G. Casimir, Proc. Neth. Aka. Wetenschapen, 51, 793 (1948); M. J. Sparnaay,
Physica 24, 761 (1959).
9.2.2 The Electromagnetic Field
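The quantized vector potential is given by the operator
$$ \hat{A}(r) = \sum_{k,\alpha} \hat{\epsilon}_\alpha(k) \sqrt{\frac{2 \pi \hbar c^2}{\omega_k V}} \left( \hat{a}^\dagger_{k,\alpha} + \hat{a}_{-k,\alpha} \right) \exp\left[ - i k \cdot r \right] \tag{455} $$
In the Heisenberg representation, the time dependence of the vector potential
is found from
$$ i \hbar \frac{\partial \hat{A}(r, t)}{\partial t} = [ \hat{A}(r, t) , \hat{H} ] \tag{456} $$
which has the solution
$$ \hat{A}(r, t) = \exp\left[ + i \frac{t}{\hbar} \hat{H} \right] \hat{A}(r, 0) \exp\left[ - i \frac{t}{\hbar} \hat{H} \right] \tag{457} $$
or
$$ \hat{A}(r, t) = \sum_{k,\alpha} \hat{\epsilon}_\alpha(k) \sqrt{\frac{2 \pi \hbar c^2}{\omega_k V}} \left( \hat{a}^\dagger_{k,\alpha} \exp\left[ i \omega_k t \right] + \hat{a}_{-k,\alpha} \exp\left[ - i \omega_k t \right] \right) \exp\left[ - i k \cdot r \right] \tag{458} $$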
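The above equation was obtained by noting that, in the basis composed of
eigenstates of the number operators $| n_{k,\alpha} \rangle$, one has
$$ \begin{aligned}
\hat{a}_{k,\alpha}(t) \, | n_{k,\alpha} \rangle &= \exp\left[ + i \omega_k t \left( \hat{a}^\dagger_{k,\alpha} \hat{a}_{k,\alpha} + 1/2 \right) \right] \hat{a}_{k,\alpha}(0) \, | n_{k,\alpha} \rangle \exp\left[ - i \omega_k t \left( n_{k,\alpha} + 1/2 \right) \right] \\
&= \exp\left[ + i \omega_k t \left( n_{k,\alpha} - 1/2 \right) \right] \sqrt{n_{k,\alpha}} \, | n_{k,\alpha} - 1 \rangle \exp\left[ - i \omega_k t \left( n_{k,\alpha} + 1/2 \right) \right] \\
&= \exp\left[ - i \omega_k t \right] \hat{a}_{k,\alpha} \, | n_{k,\alpha} \rangle
\end{aligned} \tag{459} $$
and that the time-dependent creation operator is given by the Hermitean conjugate
expression. Thus, the explicit form of the time dependence of the vector
potential is a consequence of the explicit time dependence of the creation and
annihilation operators in the Heisenberg representation. Alternatively, one can
find the time dependence of the creation and annihilation operators directly
from the Heisenberg equations of motion without invoking a privileged set of
basis states. The equation of motion for the creation operator is given by
$$ i \hbar \frac{\partial \hat{a}^\dagger_{k,\alpha}}{\partial t} = [ \hat{a}^\dagger_{k,\alpha} , \hat{H} ] \tag{460} $$
and the commutator is evaluated as
$$ [ \hat{a}^\dagger_{k,\alpha} , \hat{a}^\dagger_{k',\beta} \hat{a}_{k',\beta} ] = - \hat{a}^\dagger_{k,\alpha} \, \delta_{\alpha,\beta} \, \delta_{k,k'} \tag{461} $$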
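so the equation of motion simplifies to
$$ i \hbar \frac{\partial \hat{a}^\dagger_{k,\alpha}}{\partial t} = - \hbar \omega_k \, \hat{a}^\dagger_{k,\alpha} \tag{462} $$
Therefore, one finds the result
$$ \hat{a}^\dagger_{k,\alpha}(t) = \hat{a}^\dagger_{k,\alpha} \exp\left[ i \omega_k t \right] \tag{463} $$
Likewise, the annihilation operator satisfies the equation of motion
$$ i \hbar \frac{\partial \hat{a}_{k,\alpha}}{\partial t} = [ \hat{a}_{k,\alpha} , \hat{H} ] \tag{464} $$
and, since
$$ [ \hat{a}_{k,\alpha} , \hat{a}^\dagger_{k',\beta} \hat{a}_{k',\beta} ] = + \hat{a}_{k,\alpha} \, \delta_{\alpha,\beta} \, \delta_{k,k'} \tag{465} $$
the equation of motion simplifies to
$$ i \hbar \frac{\partial \hat{a}_{k,\alpha}}{\partial t} = + \hbar \omega_k \, \hat{a}_{k,\alpha} \tag{466} $$
Hence, one finds that the time-dependent annihilation operator is given by
$$ \hat{a}_{k,\alpha}(t) = \hat{a}_{k,\alpha} \exp\left[ - i \omega_k t \right] \tag{467} $$
which is just the Hermitean conjugate of the $\hat{a}^\dagger_{k,\alpha}(t)$ that was found previously.
Therefore, the time dependence of the vector potential is entirely due to the
time dependence of the Heisenberg representation of the creation and annihilation
operators.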
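The phase factor acquired by the annihilation operator can be verified matrix element by matrix element. The sketch below is a hedged illustration: it works in units with ħ = 1, uses the fact that the single-mode Hamiltonian is diagonal in the number basis with eigenvalues ω(n + 1/2), and the function name `heisenberg_element` is an assumption made for this example.

```python
import cmath
import math


def heisenberg_element(n, t, omega=1.0):
    """<n-1| exp(+i H t) a exp(-i H t) |n> with H |n> = omega (n + 1/2) |n>, hbar = 1."""
    bra_phase = cmath.exp(1j * omega * t * ((n - 1) + 0.5))  # from exp(+iHt) acting on <n-1|
    ket_phase = cmath.exp(-1j * omega * t * (n + 0.5))       # from exp(-iHt) acting on |n>
    return bra_phase * math.sqrt(n) * ket_phase


T = 0.7
for n in range(1, 5):
    expected = math.sqrt(n) * cmath.exp(-1j * T)  # sqrt(n) exp(-i omega t) with omega = 1
    print(n, heisenberg_element(n, T), expected)
```

Every matrix element carries the same factor $\exp[ - i \omega_k t ]$, independent of n, which is why the result can be stated as an operator identity.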
9.2.3 The Momentum of the Field
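The total momentum operator for the electromagnetic field is given by the
integral over all space of the Poynting vector
$$ \hat{P} = \frac{1}{4 \pi c} \int d^3 r \left( \hat{E} \wedge \hat{B} \right) \tag{468} $$
This will be evaluated by expressing the $\hat{E}$ and $\hat{B}$ field operators in terms of the
vector potential operator $\hat{A}$ via
$$ \begin{aligned}
\hat{E} &= - \frac{1}{c} \frac{\partial \hat{A}}{\partial t} \\
\hat{B} &= \nabla \wedge \hat{A}
\end{aligned} \tag{469} $$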
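The vector potential operator can be written in terms of the creation and annihilation
operators for the normal modes as
$$ \hat{A}(r, t) = \sum_{k,\alpha} \hat{\epsilon}_\alpha(k) \sqrt{\frac{2 \pi \hbar c^2}{\omega_k V}} \left( \hat{a}^\dagger_{k,\alpha}(t) + \hat{a}_{-k,\alpha}(t) \right) \exp\left[ - i k \cdot r \right] \tag{470} $$
then the E and B field operators are found as
$$ \hat{E}(r) = - i \sum_{k,\alpha} \hat{\epsilon}_\alpha(k) \sqrt{\frac{2 \pi \hbar \omega_k}{V}} \left( \hat{a}^\dagger_{k,\alpha} - \hat{a}_{-k,\alpha} \right) \exp\left[ - i k \cdot r \right] \tag{471} $$
and
$$ \hat{B}(r) = - i \sum_{k,\alpha} ( k \wedge \hat{\epsilon}_\alpha(k) ) \sqrt{\frac{2 \pi \hbar c^2}{\omega_k V}} \left( \hat{a}^\dagger_{k,\alpha} + \hat{a}_{-k,\alpha} \right) \exp\left[ - i k \cdot r \right] \tag{472} $$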
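For a fixed k, the polarization vectors $\hat{\epsilon}_\alpha(k)$ and k are mutually orthogonal.
Therefore, one has
$$ \begin{aligned}
\hat{\epsilon}_\alpha(k) \wedge ( k \wedge \hat{\epsilon}_\beta(k) ) &= k \, ( \hat{\epsilon}_\alpha(k) \cdot \hat{\epsilon}_\beta(k) ) - \hat{\epsilon}_\beta(k) \, ( k \cdot \hat{\epsilon}_\alpha(k) ) \\
&= k \, \delta_{\alpha,\beta}
\end{aligned} \tag{473} $$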
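Hence, the total momentum of the electromagnetic field is determined from
$$ \begin{aligned}
\hat{P} &= \frac{\hbar}{2} \sum_{k,\alpha} \hat{\epsilon}_\alpha(k) \wedge ( k \wedge \hat{\epsilon}_\alpha(k) ) \left( \hat{a}^\dagger_{k,\alpha} - \hat{a}_{-k,\alpha} \right) \left( \hat{a}^\dagger_{-k,\alpha} + \hat{a}_{k,\alpha} \right) \\
&= \frac{\hbar}{2} \sum_{k,\alpha} k \left( \hat{a}^\dagger_{k,\alpha} - \hat{a}_{-k,\alpha} \right) \left( \hat{a}^\dagger_{-k,\alpha} + \hat{a}_{k,\alpha} \right)
\end{aligned} \tag{474} $$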
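It should be noted that the momentum from each normal mode of the field is
parallel to its direction of propagation. Since the creation operators commute
$$ \hat{a}^\dagger_{k,\alpha} \hat{a}^\dagger_{-k,\alpha} = \hat{a}^\dagger_{-k,\alpha} \hat{a}^\dagger_{k,\alpha} \tag{475} $$
and the annihilation operators also commute
$$ \hat{a}_{-k,\alpha} \hat{a}_{k,\alpha} = \hat{a}_{k,\alpha} \hat{a}_{-k,\alpha} \tag{476} $$
one finds that the part of the momentum represented by the summation over k
given by
$$ \hbar \sum_{k,\alpha} k \left( \hat{a}^\dagger_{k,\alpha} \hat{a}^\dagger_{-k,\alpha} - \hat{a}_{-k,\alpha} \hat{a}_{k,\alpha} \right) = 0 \tag{477} $$
vanishes since the summand is odd under inversion symmetry. Thus, the momentum
of the electromagnetic field is given by
$$ \begin{aligned}
\hat{P} &= \frac{\hbar}{2} \sum_{k,\alpha} k \left( \hat{a}^\dagger_{k,\alpha} \hat{a}_{k,\alpha} - \hat{a}_{-k,\alpha} \hat{a}^\dagger_{-k,\alpha} \right) \\
&= \frac{1}{2} \sum_{k,\alpha} \left( \hbar k \, \hat{a}^\dagger_{k,\alpha} \hat{a}_{k,\alpha} - \hbar k \, \hat{a}^\dagger_{-k,\alpha} \hat{a}_{-k,\alpha} - \hbar k \right)
\end{aligned} \tag{478} $$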
where the commutation relations for the creation and annihilation operators
were used to obtain the last line. The last term vanishes when summed over k,
due to inversion symmetry. Hence, the momentum of the field is given by the
operator
$$ \hat{P} = \frac{1}{2} \sum_{k,\alpha} \left( \hbar k \, \hat{a}^\dagger_{k,\alpha} \hat{a}_{k,\alpha} - \hbar k \, \hat{a}^\dagger_{-k,\alpha} \hat{a}_{-k,\alpha} \right) \tag{479} $$
Finally, on transforming −k to k in the last term of the summand, one finds the
total momentum of the field is carried by the excitations since
$$ \hat{P} = \sum_{k,\alpha} \hbar k \, \hat{a}^\dagger_{k,\alpha} \hat{a}_{k,\alpha} \tag{480} $$
Thus, each quantum excitation of wave vector k has momentum $\hbar k$.
Since a photon has an energy $\hbar c k$ and momentum $\hbar k$, these quanta are
massless, because the mass of a quantum is defined through the relativistically invariant
length of the momentum four-vector
$$ \left( \frac{E}{c} \right)^2 - p^2 = m^2 c^2 \tag{481} $$
which yields m = 0. The energy-momentum dispersion relation of the quanta
of the electromagnetic field was conclusively demonstrated by A. H. Compton [21].
Compton showed that when quanta are scattered by charged particles, the photon's
dispersion relation follows directly by application of conservation laws to
the recoiling particle.
9.2.4 The Angular Momentum of the Field
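The total angular momentum operator of the electromagnetic field $\hat{J}_{EM}$ is given
by
$$ \hat{J}_{EM} = \frac{1}{4 \pi c} \int d^3 r \left[ r \wedge ( \hat{E} \wedge \hat{B} ) \right] \tag{482} $$
The ith component is given by
$$ \begin{aligned}
\hat{J}^{(i)}_{EM} &= \frac{1}{4 \pi c} \int d^3 r \; \xi_{i,j,k} \left[ x^{(j)} ( \hat{E} \wedge \hat{B} )^{(k)} \right] \\
&= \frac{1}{4 \pi c} \int d^3 r \; \xi_{i,j,k} \left[ x^{(j)} \, \xi_{k,l,m} \, \hat{E}^{(l)} \hat{B}^{(m)} \right] \\
&= \frac{1}{4 \pi c} \int d^3 r \; \xi_{i,j,k} \left[ x^{(j)} \, \xi_{k,l,m} \, \hat{E}^{(l)} \, \xi_{m,n,p} \, \frac{\partial \hat{A}^{(p)}}{\partial x^{(n)}} \right]
\end{aligned} \tag{483} $$
[21] A. H. Compton, Phys. Rev. Second Series, 21, 483 (1923).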
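However, due to the identity
$$ \xi_{k,l,m} \, \xi_{m,n,p} = \left( \delta_{k,n} \delta_{l,p} - \delta_{k,p} \delta_{l,n} \right) \tag{484} $$
one finds
$$ \hat{J}^{(i)}_{EM} = \frac{1}{4 \pi c} \int d^3 r \; \xi_{i,j,k} \left[ x^{(j)} \hat{E}^{(l)} \frac{\partial \hat{A}^{(l)}}{\partial x^{(k)}} - x^{(j)} \hat{E}^{(l)} \frac{\partial \hat{A}^{(k)}}{\partial x^{(l)}} \right] \tag{485} $$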
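The contraction identity of eqn(484) involves only a finite number of index combinations, so it can be checked by brute force. The sketch below labels the three Cartesian indices 0, 1, 2 (an implementation convenience, not the document's convention), and the function names are assumptions made for this example.

```python
def levi_civita(i, j, k):
    """Antisymmetric symbol xi_{i,j,k} for indices taken from {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2


def delta(i, j):
    """Kronecker delta."""
    return 1 if i == j else 0


def identity_holds():
    """Check sum_m xi_{k,l,m} xi_{m,n,p} = delta_{k,n} delta_{l,p} - delta_{k,p} delta_{l,n}
    for every choice of the free indices k, l, n, p."""
    indices = range(3)
    for k in indices:
        for l in indices:
            for n in indices:
                for p in indices:
                    lhs = sum(levi_civita(k, l, m) * levi_civita(m, n, p) for m in indices)
                    rhs = delta(k, n) * delta(l, p) - delta(k, p) * delta(l, n)
                    if lhs != rhs:
                        return False
    return True


print("identity (484) holds for all index values:", identity_holds())
```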
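On integrating by parts in the last term, one has
$$ \hat{J}^{(i)}_{EM} = \frac{1}{4 \pi c} \int d^3 r \; \xi_{i,j,k} \left[ x^{(j)} \hat{E}^{(l)} \frac{\partial \hat{A}^{(l)}}{\partial x^{(k)}} + \frac{\partial}{\partial x^{(l)}} \left( x^{(j)} \hat{E}^{(l)} \right) \hat{A}^{(k)} \right] \tag{486} $$
The divergence of the electric field vanishes,
$$ \frac{\partial \hat{E}^{(l)}}{\partial x^{(l)}} = 0 \tag{487} $$
and since
$$ \frac{\partial x^{(j)}}{\partial x^{(l)}} = \delta_{j,l} \tag{488} $$
the total angular momentum can be rewritten as
$$ \hat{J}^{(i)}_{EM} = \frac{1}{4 \pi c} \int d^3 r \; \xi_{i,j,k} \left[ \hat{E}^{(l)} x^{(j)} \frac{\partial \hat{A}^{(l)}}{\partial x^{(k)}} + \hat{E}^{(j)} \hat{A}^{(k)} \right] \tag{489} $$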
The first term can be recognized as the orbital angular momentum of the field.
The orbital angular momentum operator $\hat{L}^{(i)}$ is given by
$$ \hat{L}^{(i)} = - i \hbar \, \xi_{i,j,k} \, x^{(j)} \frac{\partial}{\partial x^{(k)}} \tag{490} $$
so the total angular momentum of the field is given by
$$ \begin{aligned}
\hat{J}^{(i)}_{EM} &= \frac{i}{4 \pi \hbar c} \int d^3 r \left[ \hat{E}^{(l)} \hat{L}^{(i)} \hat{A}^{(l)} - i \hbar \, \xi_{i,j,k} \, \hat{E}^{(j)} \hat{A}^{(k)} \right] \\
&= \frac{i}{4 \pi \hbar c} \int d^3 r \left[ \hat{E}^{(l)} \hat{L}^{(i)} \hat{A}^{(l)} + \hat{E}^{(j)} ( \hat{S}^{(i)} )_{j,k} \hat{A}^{(k)} \right]
\end{aligned} \tag{491} $$
where the definition
$$ ( \hat{S}^{(i)} )_{j,k} = - i \hbar \, \xi_{i,j,k} \tag{492} $$
for $\hat{S}$, the intrinsic spin operator for the photon, has been used in obtaining the
second line. The total vector angular momentum operator can be expressed as
$$ \hat{J}_{EM} = \frac{i}{4 \pi \hbar c} \int d^3 r \; \hat{E}^{(j)} \left[ \hat{L} \, \delta_{j,k} + ( \hat{S} )_{j,k} \right] \hat{A}^{(k)} \tag{493} $$
which shows that the orbital angular momentum is diagonal with respect to
the field components and the spin angular momentum mixes the different field
components.
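The total spin component of the angular momentum operator for the electromagnetic
field is given by
$$ \begin{aligned}
\hat{S}^{(i)}_{EM} &= \frac{i}{4 \pi \hbar c} \int d^3 r \left[ \hat{E}^{(j)} ( \hat{S}^{(i)} )_{j,k} \hat{A}^{(k)} \right] \\
&= \frac{1}{4 \pi c} \int d^3 r \; \xi_{i,j,k} \, \hat{E}^{(j)} \hat{A}^{(k)} \\
&= \frac{1}{4 \pi c} \int d^3 r \left( \hat{E} \wedge \hat{A} \right)^{(i)}
\end{aligned} \tag{494} $$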
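This can be expressed in terms of the photon creation and annihilation operators
as
$$ \hat{S}^{(i)}_{EM} = - i \frac{\hbar}{2} \sum_{\mathbf{k},\alpha,\beta} \left[ \hat{\epsilon}^{(j)}_\beta(\mathbf{k}) \, \xi_{i,j,k} \, \hat{\epsilon}^{(k)}_\alpha(\mathbf{k}) \right] \left( \hat{a}^\dagger_{-\mathbf{k},\beta} - \hat{a}_{\mathbf{k},\beta} \right) \left( \hat{a}^\dagger_{\mathbf{k},\alpha} + \hat{a}_{-\mathbf{k},\alpha} \right) \tag{495} $$
in which the wave vector is written in bold to distinguish it from the Cartesian
index k. The first term in brackets is recognized as the ith component of the vector
product
$$ \hat{\epsilon}_\beta(k) \wedge \hat{\epsilon}_\alpha(k) \tag{496} $$
and, therefore, it is antisymmetric in the polarization indices α and β and the
non-zero contributions are restricted to the case α ≠ β. Since the creation
and annihilation operators corresponding to different polarizations commute,
the product of the two remaining parentheses can be rearranged as the sum of
two terms
$$ \hat{S}^{(i)}_{EM} = - i \frac{\hbar}{2} \sum_{k,\alpha,\beta} \left( \hat{\epsilon}_\beta(k) \wedge \hat{\epsilon}_\alpha(k) \right)^{(i)} \left[ \left( \hat{a}^\dagger_{-k,\beta} \hat{a}^\dagger_{k,\alpha} - \hat{a}_{k,\beta} \hat{a}_{-k,\alpha} \right) + \left( \hat{a}^\dagger_{-k,\beta} \hat{a}_{-k,\alpha} - \hat{a}^\dagger_{k,\alpha} \hat{a}_{k,\beta} \right) \right] \tag{497} $$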
On transforming the summation variable k → −k and commuting the operators,
one finds that the first term is symmetric whereas the second term is
antisymmetric under the interchange of α and β. Hence, on summing over
the polarization indices, the contribution from the first term vanishes, as it is
the product of a symmetric and an antisymmetric term. Therefore, the total spin
operator of the electromagnetic field is expressed as
$$ \hat{S}^{(i)}_{EM} = i \frac{\hbar}{2} \sum_{k,\alpha,\beta} \left( \hat{\epsilon}_\beta(k) \wedge \hat{\epsilon}_\alpha(k) \right)^{(i)} \left( \hat{a}^\dagger_{k,\alpha} \hat{a}_{k,\beta} - \hat{a}^\dagger_{-k,\beta} \hat{a}_{-k,\alpha} \right) \tag{498} $$
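On defining the sense of the polarization vectors relative to $\hat{k}$ ($\equiv \hat{e}_3(k)$, the unit
vector in the direction of propagation) via
$$ \left( \hat{\epsilon}_1(k) \wedge \hat{\epsilon}_2(k) \right) = \hat{k} \tag{499} $$
so that $\hat{k}$ corresponds to the z-direction, one finds that
$$ \hat{S}_{EM} = i \frac{\hbar}{2} \sum_k \left[ \hat{k} \left( \hat{a}^\dagger_{k,2} \hat{a}_{k,1} - \hat{a}^\dagger_{k,1} \hat{a}_{k,2} \right) - \hat{k} \left( \hat{a}^\dagger_{-k,2} \hat{a}_{-k,1} - \hat{a}^\dagger_{-k,1} \hat{a}_{-k,2} \right) \right] \tag{500} $$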
On setting −k → k in the second part of the summation, the spin of the electromagnetic
field is found as
$$ \hat{S}_{EM} = i \hbar \sum_k \hat{k} \left( \hat{a}^\dagger_{k,2} \hat{a}_{k,1} - \hat{a}^\dagger_{k,1} \hat{a}_{k,2} \right) \tag{501} $$
It should be noted that in this expression, the indices (1, 2) refer to directions
in three-dimensional space and do not refer to the z-component of the spin angular
momentum. Therefore, the above equation shows that a plane-polarized
photon is not an eigenstate of the single-particle spin operator quantized along
the k-axis [22].
In our Cartesian component basis, the eigenstates of the component of the
spin operator parallel to the direction of propagation $\hat{S}^{(3)}$, where
$$ \hat{S}^{(3)} = \hbar \begin{pmatrix} 0 & -i & 0 \\ i & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \tag{502} $$
are given by $\bar{\Phi}_m(k)$, where
$$ \bar{\Phi}_{+1}(k) = - \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ i \\ 0 \end{pmatrix} , \qquad
\bar{\Phi}_{0}(k) = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} , \qquad
\bar{\Phi}_{-1}(k) = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ - i \\ 0 \end{pmatrix} \tag{503} $$
[22] Strictly speaking, this quantum number corresponds to the helicity, as it is the spin eigenvalue
which is quantized along the direction of propagation.
and where the subscript m refers to the eigenvalue of $\hat{S}^{(3)}$, in units of $\hbar$. From
this, it follows that an arbitrary transverse vector wave function Φ(k) can only
be expressed as a linear superposition of states involving m = ±1, and that the
m = 0 component is absent. On expressing an arbitrary (non-transverse) vector
wave function Φ(k) with components $\Phi^{(1)}(k)$, $\Phi^{(2)}(k)$ and $\Phi^{(3)}(k)$ in terms of
its components referred to the helicity eigenstates $\Phi_m(k)$, one has
$$ \begin{pmatrix} \Phi_{+1}(k) \\ \Phi_{0}(k) \\ \Phi_{-1}(k) \end{pmatrix}
= \frac{1}{\sqrt{2}} \begin{pmatrix} -1 & i & 0 \\ 0 & 0 & \sqrt{2} \\ 1 & i & 0 \end{pmatrix}
\begin{pmatrix} \Phi^{(1)}(k) \\ \Phi^{(2)}(k) \\ \Phi^{(3)}(k) \end{pmatrix} \tag{504} $$
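The eigenvalue equations behind eqn(502) to eqn(504) can be checked directly with 3×3 complex matrices. The sketch below works in units of ħ; the names `S3`, `HELICITY_STATES`, `TO_HELICITY` and the helper functions are assumptions introduced for this illustration.

```python
import math

# Spin matrix of eqn(502) in units of hbar.
S3 = [[0, -1j, 0],
      [1j, 0, 0],
      [0, 0, 0]]

R = 1.0 / math.sqrt(2.0)

# Column vectors of eqn(503), keyed by the helicity m.
HELICITY_STATES = {
    +1: [-R, -1j * R, 0],
    0: [0, 0, 1],
    -1: [R, -1j * R, 0],
}

# Transformation matrix of eqn(504), rows ordered as m = +1, 0, -1.
TO_HELICITY = [[-R, 1j * R, 0],
               [0, 0, 1],
               [R, 1j * R, 0]]


def matvec(m, v):
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def max_eigen_error(m):
    """Largest component of S3 v - m v for the helicity state v with eigenvalue m."""
    v = HELICITY_STATES[m]
    return max(abs(s - m * x) for s, x in zip(matvec(S3, v), v))


for m in (+1, 0, -1):
    print("m =", m, " eigen error =", max_eigen_error(m),
          " helicity components =", matvec(TO_HELICITY, HELICITY_STATES[m]))
```

Applying the matrix of eqn(504) to each eigenvector should return a single unit entry in the corresponding helicity slot, which confirms that the two equations use the same phase conventions.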
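This relation between the two bases can be expressed in the alternate form
$$ \Phi(k) = \sum_{m=-1}^{m=1} \hat{e}_m \, \Phi_m(k) \tag{505} $$
where the circularly-polarized unit vectors are introduced via
$$ \begin{aligned}
\hat{e}_{+1} &= - \frac{1}{\sqrt{2}} \left( \hat{\epsilon}_1(k) + i \, \hat{\epsilon}_2(k) \right) \\
\hat{e}_{0} &= \hat{\epsilon}_3(k) \\
\hat{e}_{-1} &= \frac{1}{\sqrt{2}} \left( \hat{\epsilon}_1(k) - i \, \hat{\epsilon}_2(k) \right)
\end{aligned} \tag{506} $$
The circularly-polarized unit vectors are associated with photons which have
definite helicity eigenvalues. It should be noted that these complex unit vectors
are orthogonal, and satisfy
$$ \hat{e}^*_m \cdot \hat{e}_{m'} = \delta_{m,m'} \tag{507} $$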
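The above relations allow one to define the circularly-polarized creation and
annihilation operators via their relation to the quantum fields. This procedure
yields
$$ \sum_{i=1}^{i=3} \hat{\epsilon}_i(k) \, \hat{a}_{k,i} = \sum_{m=-1}^{m=1} \hat{e}_m(k) \, \hat{a}_{k,m} \tag{508} $$
Hence, the photon annihilation operators corresponding to a definite helicity
are related to the annihilation operators for plane-polarized photons via
$$ \begin{aligned}
\hat{a}_{k,m=+1} &= - \frac{1}{\sqrt{2}} \left( \hat{a}_{k,1} - i \, \hat{a}_{k,2} \right) \\
\hat{a}_{k,m=0} &= \hat{a}_{k,3} \\
\hat{a}_{k,m=-1} &= \frac{1}{\sqrt{2}} \left( \hat{a}_{k,1} + i \, \hat{a}_{k,2} \right)
\end{aligned} \tag{509} $$
and the inverse relations are given by
$$ \begin{aligned}
\hat{a}_{k,1} &= - \frac{1}{\sqrt{2}} \left( \hat{a}_{k,m=1} - \hat{a}_{k,m=-1} \right) \\
\hat{a}_{k,2} &= - \frac{i}{\sqrt{2}} \left( \hat{a}_{k,m=1} + \hat{a}_{k,m=-1} \right) \\
\hat{a}_{k,3} &= \hat{a}_{k,m=0}
\end{aligned} \tag{510} $$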
When expressed in terms of the circularly-polarized unit vectors, the spin operator
for the electromagnetic field becomes
$$ \hat{S}_{EM} = \hbar \sum_k \hat{k} \left( \hat{a}^\dagger_{k,m=1} \hat{a}_{k,m=1} - \hat{a}^\dagger_{k,m=-1} \hat{a}_{k,m=-1} \right) \tag{511} $$
which is expressed in terms of photons with definite helicity. Within the manifold
of single-photon states with momentum $\hbar k$, the spin operator has eigenvalues
of $\pm \hbar$ when measured along the direction $\hat{k}$. It is seen that the photon
has helicity m = ±1 but does not involve the helicity state with m = 0 since
the electromagnetic field is transverse. The transverse nature of the field is due
to the photon being massless. In general, a massive particle with spin S should
have (2S + 1) helicity states. However, a massless particle can only have the
two helicity states corresponding to m = ±S.

[Figure 11 labels: k, e_1, e_2]
Figure 11: The circularly-polarized normal modes of a classical electromagnetic
field are composed of two plane-polarized waves which are out of phase,
are mutually orthogonal, and are transverse to the direction of propagation k.
The resulting electric field spirals along the direction of propagation. The
left circularly-polarized wave shown in the diagram corresponds to a helicity of $+\hbar$.
The angular momentum of the elementary excitation of the electromagnetic
field was inferred from experiments in which beams of circularly-polarized light
were absorbed by a sensitive torsional pendulum [23]. Quantum electromagnetic
theory shows that the angular momentum density of left circularly-polarized
light is just $\hbar$ times the photon density or, equivalently, is just $\omega^{-1}$ times the
energy density, which is also the case for classical electromagnetism [24]. Hence, the
net increase of angular momentum per unit time can easily be calculated from
the excess of the angular momentum flux flowing into the pendulum over that
flowing out. Beth's experiments verified that the net torque on the pendulum
was consistent with the theoretical prediction. Thus, the quantized electromagnetic
field has been shown to be related to a massless particle with spin $\hbar$ and
energy-momentum given by the four-vector $( \hbar \omega_k / c , \hbar k )$. This particle is the
photon. Every quantized field is to be associated with a type of particle.
9.3 Uncertainty Relations
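The eigenstates of the field operators such as $\hat{A}(r, t)$ do not correspond to eigenstates
of the photon number operators.
Consider the electric field
$$ \hat{E} = - \frac{1}{c} \frac{\partial \hat{A}}{\partial t} \tag{512} $$
Although the expectation value of $\hat{E}$ vanishes for any eigenstate of the set of
occupation numbers $| \{ n_{k',\beta} \} \rangle$
$$ \langle \{ n_{k',\beta} \} | \, \hat{E} \, | \{ n_{k',\beta} \} \rangle = 0 \tag{513} $$
since
$$ \langle \{ n_{k',\beta} \} | \, \hat{a}_{k,\alpha} \, | \{ n_{k',\beta} \} \rangle = 0 \tag{514} $$
the fluctuation in the field is given by
$$ \begin{aligned}
\langle \{ n_{k',\beta} \} | \, \hat{E} \cdot \hat{E} \, | \{ n_{k',\beta} \} \rangle - \left| \langle \{ n_{k',\beta} \} | \, \hat{E} \, | \{ n_{k',\beta} \} \rangle \right|^2
&= \langle \{ n_{k',\beta} \} | \, \hat{E} \cdot \hat{E} \, | \{ n_{k',\beta} \} \rangle \\
&= \frac{4 \pi}{V} \sum_{k,\alpha} \hbar \omega_k \left( n_{k,\alpha} + \frac{1}{2} \right) \\
&\to \infty
\end{aligned} \tag{515} $$
The fluctuations in the field diverge because the zero-point energy fluctuations
diverge.
[23] R. A. Beth, Phys. Rev. 50, 115 (1936).
[24] J. H. Poynting, Proc. Roy. Soc. A82, 560 (1909).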
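The commutation relations between the x-component of the E field and the
y-component of the B field at the same instant of time are non-zero [25]. That is,
$$ \begin{aligned}
[ \hat{E}_x(r) , \hat{B}_y(r') ] &= \frac{2 \pi}{V} \sum_{k,\alpha} \hbar \omega_k \, \hat{\epsilon}_\alpha(k)_x \, ( \hat{k} \wedge \hat{\epsilon}_\alpha(k) )_y \exp\left[ i k \cdot ( r' - r ) \right] \\
&\quad - \frac{2 \pi}{V} \sum_{k,\alpha} \hbar \omega_k \, \hat{\epsilon}_\alpha(k)_x \, ( \hat{k} \wedge \hat{\epsilon}_\alpha(k) )_y \exp\left[ - i k \cdot ( r' - r ) \right] \\
&= - \frac{4 \pi \hbar c}{V} \sum_k k_z \exp\left[ - i k \cdot ( r' - r ) \right] \\
&= i \frac{4 \pi c \hbar}{V} \frac{\partial}{\partial z} \sum_k \exp\left[ - i k \cdot ( r' - r ) \right] \\
&= i \, 4 \pi c \hbar \, \frac{\partial}{\partial z} \delta^3( r' - r )
\end{aligned} \tag{516} $$
The fact that the two polarizations are transverse to the unit vector $\hat{k}$ has been
used to obtain the third line, and the completeness relation
$\frac{1}{V} \sum_k \exp[ - i k \cdot ( r' - r ) ] = \delta^3( r' - r )$ has been used to obtain the last line. Since
$\hat{E}$ and $\hat{B}$ do not commute, it follows that E
and B obey an uncertainty relation in that the values of E and B cannot both
be specified to arbitrary accuracy at the same point.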
However, if two points in space-time $x$ and $x'$ are not causally related, i.e.
$$| \mathbf{r}' - \mathbf{r} | \neq c \, | t' - t | \tag{517}$$
then the operators commute
$$[\, \hat{E}_x(\mathbf{r}, t) , \hat{B}_y(\mathbf{r}', t') \,] = 0 \tag{518}$$
Thus, if the two points in space-time are not connected by the propagation of light, then the $E_x$ and $B_y$ fields can both be determined to arbitrary accuracy.
9.4 Coherent States
We shall focus our attention on one normal mode of the electromagnetic field, and shall drop the indices $(k, \alpha)$ labelling the normal mode. A coherent state $|a_\varphi\rangle$ is defined as an eigenstate of the annihilation operator
$$\hat{a} \, |a_\varphi\rangle = a_\varphi \, |a_\varphi\rangle \tag{519}$$
For example, the vacuum state or ground state is an eigenstate of the annihilation operator, in which case $a_\varphi = 0$.
[25] P. Jordan and W. Pauli Jr. 47, 151 (1927).
The coherent state [26] can be found as a linear superposition of eigenstates of the number operator with eigenvalues $n$
$$|a_\varphi\rangle = \sum_{n=0}^{\infty} C_n \, |n\rangle \tag{520}$$
On substituting this form in the definition of the coherent state
$$\hat{a} \, |a_\varphi\rangle = \sum_n C_n \, \hat{a} \, |n\rangle = a_\varphi \sum_n C_n \, |n\rangle \tag{521}$$
and using the property of the annihilation operator, one has
$$\sum_n C_n \sqrt{n} \, |n-1\rangle = a_\varphi \sum_n C_n \, |n\rangle \tag{522}$$
On taking the matrix elements of this equation with the state $\langle m|$, and using the orthonormality of the eigenstates of the number operator, one finds
$$C_{m+1} \sqrt{m+1} = a_\varphi \, C_m \tag{523}$$
Hence, on iterating downwards, one finds
$$C_m = \left( \frac{a_\varphi^m}{\sqrt{m!}} \right) C_0 \tag{524}$$
and the coherent state can be expressed as
$$|a_\varphi\rangle = C_0 \sum_{n=0}^{\infty} \left( \frac{a_\varphi^n}{\sqrt{n!}} \right) |n\rangle \tag{525}$$
The normalization constant $C_0$ can be found from
$$1 = C_0^* C_0 \sum_{n=0}^{\infty} \left( \frac{a_\varphi^{*n} a_\varphi^{n}}{n!} \right) \tag{526}$$
by noting that the sum exponentiates to yield
$$1 = C_0^* C_0 \exp\left[ a_\varphi^* a_\varphi \right] \tag{527}$$
so, on choosing the phase of $C_0$, one has
$$C_0 = \exp\left[ - \frac{1}{2} a_\varphi^* a_\varphi \right] \tag{528}$$
[Figure 12: The probability of finding n photons P(n) in a normal mode represented by a coherent state. Horizontal axis: n; vertical axis: P(n).]
From this, it can be shown that if the number of photons in a coherent state is measured, the result $n$ will occur with a probability given by
$$P(n) = \frac{( a_\varphi^* a_\varphi )^n}{n!} \exp\left[ - a_\varphi^* a_\varphi \right] \tag{529}$$
Thus, the photon statistics are governed by a Poisson distribution. Furthermore, the quantity $a_\varphi^* a_\varphi$ is the average number of photons $\bar{n}$ present in the coherent state.
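The recursion of eqn(523) and the normalization of eqn(528) can be checked numerically. The following is a minimal sketch, assuming a truncation of the number basis at 60 photons and an illustrative eigenvalue with $|a_\varphi|^2 = 4$; the function name `coherent_coefficients` is introduced here for illustration only.

```python
import cmath
import math

def coherent_coefficients(alpha, n_max):
    """Return C_0 .. C_n_max for the coherent state with eigenvalue alpha.

    Uses C_0 = exp(-|alpha|^2 / 2) and the recursion
    C_{m+1} = alpha * C_m / sqrt(m + 1) of eqn(523).
    """
    coeffs = [cmath.exp(-0.5 * abs(alpha) ** 2)]
    for m in range(n_max):
        coeffs.append(alpha * coeffs[m] / math.sqrt(m + 1))
    return coeffs

alpha = 2.0 * cmath.exp(0.3j)          # illustrative value, |alpha|^2 = 4
coeffs = coherent_coefficients(alpha, 60)
probs = [abs(c) ** 2 for c in coeffs]  # P(n) = |C_n|^2

total = sum(probs)
mean_n = sum(n * p for n, p in enumerate(probs))
poisson_3 = abs(alpha) ** 6 / math.factorial(3) * math.exp(-abs(alpha) ** 2)

print(f"sum of P(n) = {total:.12f}")
print(f"mean photon number = {mean_n:.12f}")
print(f"P(3) from recursion = {probs[3]:.12f}, Poisson formula = {poisson_3:.12f}")
```

Within the truncation, the probabilities sum to one, the mean equals $a_\varphi^* a_\varphi$, and the recursion reproduces the Poisson form of eqn(529).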
The coherent states can be written in a more compact form. Since the state with occupation number $n$ can be written as
$$|n\rangle = \frac{( \hat{a}^\dagger )^n}{\sqrt{n!}} \, |0\rangle \tag{530}$$
the coherent state can also be expressed as
$$|a_\varphi\rangle = \exp\left[ - \frac{1}{2} a_\varphi^* a_\varphi \right] \sum_{n=0}^{\infty} \frac{( a_\varphi \hat{a}^\dagger )^n}{n!} \, |0\rangle \tag{531}$$
or on summing the series as an exponential
$$|a_\varphi\rangle = \exp\left[ - \frac{1}{2} a_\varphi^* a_\varphi \right] \exp\left[ a_\varphi \hat{a}^\dagger \right] |0\rangle \tag{532}$$
Thus the coherent state is an infinite linear superposition of states with different occupation numbers, in which each coefficient has a specific phase relation with every other coefficient.
[26] R. J. Glauber, Phys. Rev. Lett. 10, 84 (1963).
The above equation represents a transformation between number operator states and the coherent states. The inverse transformation can be found by expressing $a_\varphi$ as a magnitude $a$ and a phase $\varphi$
$$a_\varphi = a \exp\left[ i \varphi \right] \tag{533}$$
The number states can be expressed in terms of the coherent states via the inverse transformation
$$|n\rangle = \frac{\sqrt{n!}}{a^n} \exp\left[ + \frac{1}{2} a^2 \right] \int_0^{2\pi} \frac{d\varphi}{2\pi} \exp\left[ - i n \varphi \right] |a_\varphi\rangle \tag{534}$$
by integrating over the phase $\varphi$ of the coherent state. Since the set of occupation number states is complete, the set of coherent states must also span Hilbert space. In fact, the set of coherent states is over-complete.
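The inverse transformation of eqn(534) can be tested by projecting its right-hand side onto a number state $\langle m|$ and confirming that only $m = n$ survives. The sketch below assumes an illustrative magnitude $a = 1.5$ and replaces the phase integral by a uniform sum over 256 phases; the helper names are hypothetical.

```python
import cmath
import math

def coherent_amplitude(m, a, phi):
    """<m|a_phi> for a_phi = a * exp(i * phi), from eqns (525) and (528)."""
    a_phi = a * cmath.exp(1j * phi)
    return math.exp(-0.5 * a * a) * a_phi ** m / math.sqrt(math.factorial(m))

def number_state_component(m, n, a, n_phi=256):
    """<m|n> reconstructed from the phase integral of eqn(534)."""
    prefactor = math.sqrt(math.factorial(n)) / a ** n * math.exp(0.5 * a * a)
    total = 0j
    for j in range(n_phi):
        phi = 2.0 * math.pi * j / n_phi
        total += cmath.exp(-1j * n * phi) * coherent_amplitude(m, a, phi)
    return prefactor * total / n_phi   # (1 / 2 pi) * d_phi = 1 / n_phi

a = 1.5   # illustrative magnitude of a_phi
for m in range(5):
    row = [abs(number_state_component(m, n, a)) for n in range(5)]
    print(m, [round(x, 10) for x in row])
```

The printed table should be the unit matrix to within rounding, since the phase integration removes every component with $m \neq n$.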
The coherent state $|a_\varphi\rangle$ can be represented by the point $a_\varphi$ in the Argand plane. The overlap between two coherent states is calculated as
$$| \langle a'_{\varphi'} | a_\varphi \rangle |^2 = \exp\left[ - | a_\varphi - a'_{\varphi'} |^2 \right] \tag{535}$$

[Figure 13: Since a coherent state $|a_\varphi\rangle$ is completely determined by a complex number $a_\varphi$, it can be represented by a point in the complex plane. Horizontal axis: Re z; vertical axis: Im z.]
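Hence, coherent states corresponding to different points are not orthogonal. The coherent states form an over-complete basis set. The over-completeness relation can be expressed as
$$\int \frac{d\,\Re e\, a_\varphi \; d\,\Im m\, a_\varphi}{\pi} \; |a_\varphi\rangle \langle a_\varphi| = \hat{I} \tag{536}$$
This relation can be proved by taking the matrix elements between the occupation number states $\langle n'|$ and $|n\rangle$, which leads to
$$\int \frac{d\,\Re e\, a_\varphi \; d\,\Im m\, a_\varphi}{\pi} \; \langle n'|a_\varphi\rangle \langle a_\varphi|n\rangle = \delta_{n',n} \tag{537}$$
which can be evaluated as
$$
\begin{aligned}
&\int \frac{d\,\Re e\, a_\varphi \; d\,\Im m\, a_\varphi}{\pi} \; \langle n'|a_\varphi\rangle \langle a_\varphi|n\rangle \\
&= \int_0^\infty da \, a \int_0^{2\pi} \frac{d\varphi}{\pi} \; \frac{a_\varphi^{*n} \, a_\varphi^{n'}}{\sqrt{n'! \, n!}} \exp\left[ - | a_\varphi |^2 \right] \\
&= \int_0^\infty da \, a \; \frac{a^{n+n'}}{\sqrt{n'! \, n!}} \exp\left[ - | a_\varphi |^2 \right] \int_0^{2\pi} \frac{d\varphi}{\pi} \exp\left[ i ( n' - n ) \varphi \right] \\
&= \int_0^\infty da \, a \; \frac{a^{n+n'}}{\sqrt{n'! \, n!}} \exp\left[ - a^2 \right] \, 2 \, \delta_{n,n'}
\end{aligned} \tag{538}
$$
On changing variable to $s = a^2$, one proves the completeness relation by noting that
$$\int_0^\infty ds \, s^n \exp\left[ - s \right] = n! \tag{539}$$
Hence, the coherent states form a complete basis set.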
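The last step of the completeness proof can be checked numerically: for $n' = n$ the remaining radial integral $2 \int_0^\infty da \, a^{2n+1} \exp[-a^2] / n!$ should equal one. The following sketch uses a simple midpoint rule; the cutoff `a_max` and the number of steps are assumptions chosen for illustration.

```python
import math

def radial_integral(n, a_max=12.0, steps=20000):
    """Midpoint-rule estimate of 2 * int_0^inf da a^(2n+1) exp(-a^2) / n!."""
    h = a_max / steps
    total = 0.0
    for j in range(steps):
        a = (j + 0.5) * h
        total += 2.0 * a ** (2 * n + 1) * math.exp(-a * a)
    return total * h / math.factorial(n)

for n in range(6):
    print(n, radial_integral(n))
```

Each value should be close to one, in agreement with eqn(539) after the substitution $s = a^2$.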
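The effect of the creation operator on the coherent state can be expressed as
$$
\begin{aligned}
\hat{a}^\dagger \, |a_\varphi\rangle
&= \hat{a}^\dagger \exp\left[ - \frac{1}{2} a_\varphi^* a_\varphi \right] \exp\left[ a_\varphi \hat{a}^\dagger \right] |0\rangle \\
&= \exp\left[ - \frac{1}{2} a_\varphi^* a_\varphi \right] \hat{a}^\dagger \exp\left[ a_\varphi \hat{a}^\dagger \right] |0\rangle \\
&= \exp\left[ - \frac{1}{2} a_\varphi^* a_\varphi \right] \frac{\partial}{\partial a_\varphi} \exp\left[ a_\varphi \hat{a}^\dagger \right] |0\rangle \\
&= \exp\left[ - \frac{1}{2} a_\varphi^* a_\varphi \right] \frac{\partial}{\partial a_\varphi} \exp\left[ + \frac{1}{2} a_\varphi^* a_\varphi \right] |a_\varphi\rangle
\end{aligned} \tag{540}
$$
The coherent state is not an eigenstate of the creation operator, since the resulting state does not include the zero-photon state.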
The expectation value of the field operators between the coherent states yields the classical value, since
$$\langle a_\varphi | ( \hat{a}^\dagger + \hat{a} ) | a_\varphi \rangle = ( a_\varphi^* + a_\varphi ) \tag{541}$$
In deriving the above equation, the definition
$$\hat{a} \, |a_\varphi\rangle = a_\varphi \, |a_\varphi\rangle \tag{542}$$
has been used in the term involving the annihilation operator and the term originating from the creation operator is evaluated using the Hermitean conjugate equation
$$\langle a_\varphi | \, \hat{a}^\dagger = \langle a_\varphi | \, a_\varphi^* \tag{543}$$
One also finds that the expectation value of the number operator is given by
$$\langle a_\varphi | \hat{a}^\dagger \hat{a} | a_\varphi \rangle = a_\varphi^* a_\varphi \tag{544}$$
so the magnitude of $a_\varphi$ is related to the average number of photons in the coherent state $\bar{n}$. This identification is consistent with the Poisson distribution of eqn(529) which governs the probability of finding $n$ photons in the coherent state. The coherent state is not an eigenstate of the number operator since there are fluctuations in any measurement of the number of photons. The rms fluctuation $\Delta n$ can be evaluated by noting that
$$
\begin{aligned}
\langle a_\varphi | \hat{n}^2 | a_\varphi \rangle
&= \langle a_\varphi | \hat{a}^\dagger \hat{a} \, \hat{a}^\dagger \hat{a} | a_\varphi \rangle \\
&= \langle a_\varphi | \hat{a}^\dagger \hat{a}^\dagger \hat{a} \, \hat{a} | a_\varphi \rangle + \langle a_\varphi | \hat{a}^\dagger \hat{a} | a_\varphi \rangle \\
&= ( a_\varphi^* )^2 ( a_\varphi )^2 + a_\varphi^* a_\varphi
\end{aligned} \tag{545}
$$
where the boson commutation relations have been used in the second line. Thus, the mean squared fluctuation in the number operator is given by
$$\langle a_\varphi | \Delta\hat{n}^2 | a_\varphi \rangle = a_\varphi^* a_\varphi \tag{546}$$
The rms fluctuation of the photon number is only negligible when compared to the average value if $a_\varphi$ has a large magnitude
$$a_\varphi^* a_\varphi \gg 1 \tag{547}$$
The expectation values of coherent states almost behave completely classically. The deviation from the classical expectation values can be seen by examining
$$\langle a_\varphi | \hat{a} \, \hat{a}^\dagger | a_\varphi \rangle = a_\varphi^* a_\varphi + 1 \tag{548}$$
which is evaluated by using the commutation relations. It is seen that the expectation values can be approximated by the classical values, if the magnitude of $a_\varphi$ is much greater than unity.
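The expectation values of eqns (544), (546) and (548) can be reproduced by letting the ladder operators act on the expansion coefficients of a coherent state. This is a minimal sketch, assuming a truncated number basis (`N_MAX = 80`) and an illustrative eigenvalue with $|a_\varphi|^2 = 9$; all function names are hypothetical.

```python
import cmath
import math

N_MAX = 80   # truncation of the number basis; an assumption of this sketch

def coherent_state(alpha):
    """Coefficients C_0 .. C_N_MAX of the coherent state, eqns (523) and (528)."""
    c = [cmath.exp(-0.5 * abs(alpha) ** 2)]
    for m in range(N_MAX):
        c.append(alpha * c[m] / math.sqrt(m + 1))
    return c

def annihilate(c):
    """Coefficients of a|psi>: the new C_n is sqrt(n+1) * C_{n+1}."""
    return [math.sqrt(n + 1) * c[n + 1] for n in range(len(c) - 1)] + [0j]

def create(c):
    """Coefficients of a^dagger|psi>: the new C_n is sqrt(n) * C_{n-1}."""
    return [0j] + [math.sqrt(n) * c[n - 1] for n in range(1, len(c))]

def inner(bra, ket):
    """Scalar product <bra|ket> of two coefficient lists."""
    return sum(b.conjugate() * k for b, k in zip(bra, ket))

alpha = 3.0 * cmath.exp(0.7j)                   # |alpha|^2 = 9
psi = coherent_state(alpha)
n_psi = create(annihilate(psi))                 # n|psi>
mean_n = inner(psi, n_psi).real                 # <n>
mean_n2 = inner(n_psi, n_psi).real              # <n^2>, since n is Hermitean
a_adag = inner(create(psi), create(psi)).real   # <a a^dagger>

print(mean_n, mean_n2 - mean_n ** 2, a_adag)
```

Within the truncation error, the three printed numbers should be $a_\varphi^* a_\varphi$, $a_\varphi^* a_\varphi$ and $a_\varphi^* a_\varphi + 1$ respectively.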
Exercise:
Determine the expectation values for the electric and magnetic field operators in a coherent state which represents a plane-polarized electromagnetic wave.
Exercise:
Determine the expectation values for the electric and magnetic field operators in a coherent state which represents a left circularly-polarized electromagnetic wave composed of photons with a helicity of +1.
9.4.1 The Phase-Number Uncertainty Relation
From the discussion of coherent states, it is seen that the coherent state has a definite phase, but does not have a definite number of quanta. In general, it is impossible to know both the phase of a state and the number of a state. This is formalized as a phase-number uncertainty relation.
The phase and amplitude of a state is related to the annihilation operator. Since the annihilation operator is non-Hermitean, one can construct the annihilation operator as a function of Hermitean operators. Formally, the amplitude can be related to the square root operator, and the phase to a phase operator [27]. Hence, one can write
$$\hat{a}_{k,\alpha} = \exp\left[ + i ( \hat{\varphi}_{k,\alpha} - \omega_k t ) \right] \widehat{\sqrt{n}}_{k,\alpha} \tag{549}$$
and the Hermitean conjugate operator, the creation operator, can be expressed as
$$\hat{a}^\dagger_{k,\alpha} = \widehat{\sqrt{n}}_{k,\alpha} \exp\left[ - i ( \hat{\varphi}_{k,\alpha} - \omega_k t ) \right] \tag{550}$$
since it has been required that $\widehat{\sqrt{n}}$ and $\hat{\varphi}$ are Hermitean. Furthermore, the operator $\widehat{\sqrt{n}}$ must have the property
$$\widehat{\sqrt{n}}_{k,\alpha} \, \widehat{\sqrt{n}}_{k,\alpha} = \hat{n}_{k,\alpha} \tag{551}$$
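On substituting the expressions for the creation and annihilation operators, in terms of the phase and amplitude, into boson commutation relations
$$[\, \hat{a}_{k,\alpha} , \hat{a}^\dagger_{k',\beta} \,] = \delta_{k,k'} \, \delta_{\alpha,\beta} \tag{552}$$
etc, one finds
$$
\begin{aligned}
\delta_{k,k'} \, \delta_{\alpha,\beta}
&= \exp\left[ + i ( \hat{\varphi}_{k,\alpha} - \omega_k t ) \right] \widehat{\sqrt{n}}_{k,\alpha} \, \widehat{\sqrt{n}}_{k',\beta} \exp\left[ - i ( \hat{\varphi}_{k',\beta} - \omega_{k'} t ) \right] \\
&\quad - \widehat{\sqrt{n}}_{k',\beta} \exp\left[ - i ( \hat{\varphi}_{k',\beta} - \omega_{k'} t ) \right] \exp\left[ + i ( \hat{\varphi}_{k,\alpha} - \omega_k t ) \right] \widehat{\sqrt{n}}_{k,\alpha}
\end{aligned} \tag{553}
$$
Thus, for $k = k'$ and $\alpha = \beta$, one has
$$\exp\left[ + i \hat{\varphi}_{k,\alpha} \right] \hat{n}_{k,\alpha} - \hat{n}_{k,\alpha} \exp\left[ + i \hat{\varphi}_{k,\alpha} \right] = \exp\left[ + i \hat{\varphi}_{k,\alpha} \right] \tag{554}$$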
[27] P. A. M. Dirac, Proc. Roy. Soc. A 114, 243 (1927).
This relationship is satisfied, if the phase and number operators satisfy the commutation relation
$$[\, \hat{n}_{k,\alpha} , \hat{\varphi}_{k,\alpha} \,] = i \tag{555}$$
If one can construct the Hermitean operators that satisfy this commutation relation, then one can show that the rms uncertainties in the phase and the number must satisfy the inequality
$$( \Delta\varphi_{k,\alpha} )_{rms} \, ( \Delta n_{k,\alpha} )_{rms} \geq 1 \tag{556}$$
It should be noted that only the relative phase can be measured [28]. Thus, if the phase difference of any two components $(k, \alpha)$ and $(k', \alpha')$ is specified precisely, then the occupation number of either component cannot be specified.
Exercise:
Express the vector potential and the electric and magnetic field operators in terms of the amplitude and phase operators.
9.4.2 Argand Representation of Coherent States
The coherent state $|a_\varphi\rangle$ can be represented by the point $a_\varphi$ in the Argand plane. The overlap between two coherent states is calculated as
$$| \langle a'_{\varphi'} | a_\varphi \rangle |^2 = \exp\left[ - | a_\varphi - a'_{\varphi'} |^2 \right] \tag{557}$$
Hence, coherent states are not orthogonal. In fact, their overlap decreases exponentially with large "separations" between the points $a_\varphi$ and $a'_{\varphi'}$ in the Argand plane. We shall denote $| a_\varphi |$ by $a$. Two states separated by distances $a \, \Delta\varphi$ or $\Delta a$ such that $a \, \Delta\varphi \geq 1$ and $\Delta a \geq 1$ are effectively orthogonal or independent. However, states within an area given by $\Delta a \; a \, \Delta\varphi \approx 1$ have significant overlap and so can represent the same state. Therefore, the minimum uncertainty state occupies an area $\Delta a \; a \, \Delta\varphi \approx 1$. We note that $2 \, a \, \Delta a$ can be interpreted as a measure of the uncertainty $\Delta n_\varphi$ in the particle number for the state, and $\Delta\varphi$ is the uncertainty in the phase of the state. Hence, the phase-number uncertainty relation sets the area of the Argand diagram that can be associated with a single state as
$$a \, \Delta a \, \Delta\varphi \sim 1 \tag{558}$$
[28] L. Susskind and J. Glogower, Physics 1, 49 (1964).

[Figure 14: Due to the phase-number uncertainty principle, the minimum area of the Argand diagram needed to represent a minimum uncertainty state has dimensions such that $a \, \Delta a \, \Delta\varphi \sim 1$. Horizontal axis: Re z; vertical axis: Im z.]
10 Non-Relativistic Quantum Electrodynamics
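The non-relativistic Hamiltonian for a particle with charge $q$ and mass $m$ interacting with a quantized electromagnetic field can be expressed as
$$
\begin{aligned}
\hat{H} &= \frac{\hat{\mathbf{p}}^2}{2m} + q \, \phi(\mathbf{r}) - \frac{q}{2mc} \left( \hat{\mathbf{p}}\cdot\hat{\mathbf{A}}(\mathbf{r}) + \hat{\mathbf{A}}(\mathbf{r})\cdot\hat{\mathbf{p}} \right) + \frac{q^2}{2mc^2} \hat{\mathbf{A}}^2(\mathbf{r}) \\
&\quad + \int d^3\mathbf{r}' \left( \frac{\hat{\mathbf{E}}^2(\mathbf{r}') + \hat{\mathbf{B}}^2(\mathbf{r}')}{8\pi} \right)
\end{aligned} \tag{559}
$$
when the vector potential is chosen to satisfy the Coulomb gauge. The second, third and fourth terms are to be evaluated at the location of the charged point particle, $\mathbf{r}$, and the last term is evaluated at all points in space. The Hamiltonian can be expressed as
$$\hat{H} = \hat{H}_0 + \hat{H}_{rad} + \hat{H}_{int} \tag{560}$$
where $\hat{H}_0$ is the Hamiltonian for the charged particle in the electrostatic potential $\phi$
$$\hat{H}_0 = \frac{\hat{\mathbf{p}}^2}{2m} + q \, \phi(\mathbf{r}) \tag{561}$$
and $\hat{H}_{rad}$ is the Hamiltonian for the electromagnetic radiation and $\hat{H}_{int}$ is the interaction
$$\hat{H}_{int} = - \frac{q}{2mc} \left( \hat{\mathbf{p}}\cdot\hat{\mathbf{A}} + \hat{\mathbf{A}}\cdot\hat{\mathbf{p}} \right) + \frac{q^2}{2mc^2} \hat{\mathbf{A}}^2 \tag{562}$$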
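The interaction term is composed of a paramagnetic interaction which is linearly proportional to the vector potential and the diamagnetic interaction which is proportional to the square of the vector potential. When the electromagnetic field is quantized, the radiation Hamiltonian has the form
$$\hat{H}_{rad} = \sum_{k,\alpha} \frac{\hbar\omega_k}{2} \left( \hat{a}^\dagger_{k,\alpha} \hat{a}_{k,\alpha} + \hat{a}_{k,\alpha} \hat{a}^\dagger_{k,\alpha} \right) \tag{563}$$
Since the quantized vector potential is given by
$$\hat{\mathbf{A}}(\mathbf{r}, t) = \frac{1}{\sqrt{V}} \sum_{k,\alpha} \sqrt{\frac{2\pi\hbar c^2}{\omega_k}} \; \hat{e}_\alpha(k) \left( \hat{a}^\dagger_{k,\alpha} + \hat{a}_{-k,\alpha} \right) \exp\left[ - i \mathbf{k}\cdot\mathbf{r} \right] \tag{564}$$
the paramagnetic interaction can be expressed as
$$\hat{H}_{para} = - \frac{q}{mc} \sum_{k,\alpha} \sqrt{\frac{2\pi\hbar c^2}{V \omega_k}} \; \hat{\mathbf{p}}\cdot\hat{\epsilon}_\alpha(k) \left( \hat{a}^\dagger_{k,\alpha} + \hat{a}_{-k,\alpha} \right) \exp\left[ - i \mathbf{k}\cdot\mathbf{r} \right] \tag{565}$$
in which the transverse gauge condition $\nabla\cdot\mathbf{A} = 0$ has also been used.

[Figure 15: The paramagnetic interaction leads to scattering of an electron from p to p' by either (a) absorbing a photon, or (b) by emitting a photon.]

The diamagnetic interaction is expressed as
$$
\begin{aligned}
\hat{H}_{dia} &= \frac{q^2}{2mc^2} \sum_{k,k',\alpha,\beta} \left( \frac{2\pi\hbar c^2}{\sqrt{\omega_k \omega_{k'}} \, V} \right) \hat{\epsilon}_\beta(k')\cdot\hat{\epsilon}_\alpha(k) \exp\left[ - i ( \mathbf{k} + \mathbf{k}' )\cdot\mathbf{r} \right] \\
&\quad \times \left( \hat{a}^\dagger_{k',\beta} \hat{a}^\dagger_{k,\alpha} + \hat{a}^\dagger_{k',\beta} \hat{a}_{-k,\alpha} + \hat{a}_{-k',\beta} \hat{a}^\dagger_{k,\alpha} + \hat{a}_{-k',\beta} \hat{a}_{-k,\alpha} \right)
\end{aligned} \tag{566}
$$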
For charged particles with spin one-half, analysis of the non-relativistic Pauli equation shows that there is another interaction term involving the particles' spins. This interaction can be described by the anomalous Zeeman interaction
$$\hat{H}_{Zeeman} = - \frac{q\hbar}{2mc} \left( \sigma\cdot\mathbf{B} \right) \tag{567}$$
where
$$\mathbf{B} = \nabla \wedge \mathbf{A}(\mathbf{r}) \tag{568}$$
and $\sigma_i$ are the three Pauli matrices.
Generally, the paramagnetic interaction has a greater strength than the Zeeman interaction. This can be seen by examining the magnitudes of the interactions. The paramagnetic interaction has a magnitude given by
$$\frac{e}{mc} \, \mathbf{p}\cdot\mathbf{A} \tag{569}$$
and for an atom of size $a$, the uncertainty principle yields
$$p \sim \frac{\hbar}{a} \tag{570}$$
The Zeeman interaction has a magnitude given by
$$\frac{e\hbar}{mc} \, \sigma\cdot( \mathbf{k} \wedge \mathbf{A} ) \tag{571}$$
but since $\mathbf{k}$ is the wave vector of the light, its magnitude is set by the inverse of the wavelength
$$k \sim \frac{1}{\lambda} \tag{572}$$
Hence, since the wavelength of light is larger than the linear dimension of an atom, $\lambda > a$, one finds the inequality between the magnitude of the paramagnetic interaction and the Zeeman interaction
$$\frac{e\hbar}{mc} \frac{1}{a} A > \frac{e\hbar}{mc} \frac{1}{\lambda} A \tag{573}$$
Both the paramagnetic and Zeeman coupling strengths are proportional to the magnitude of the vector potential $A$, hence the ratio of the strengths of the interactions is independent of $A$. Therefore, their magnitudes satisfy the inequality
$$\frac{1}{a} > \frac{1}{\lambda} \tag{574}$$
so the Zeeman interaction can frequently be neglected in comparison with the paramagnetic interaction.
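As an order-of-magnitude illustration of eqn(574), the ratio of the two coupling strengths is roughly $\lambda / a$. The numbers below (an atom of about one Angstrom and visible light of about 5000 Angstrom) are assumptions chosen for illustration and do not appear in the text; the function name is hypothetical.

```python
def coupling_ratio(atom_size_angstrom, wavelength_angstrom):
    """Order-of-magnitude ratio (paramagnetic / Zeeman) ~ (1/a) / (1/lambda)."""
    return wavelength_angstrom / atom_size_angstrom

# Illustrative numbers: a ~ 1 Angstrom atom and light of ~ 5000 Angstrom.
ratio = coupling_ratio(1.0, 5000.0)
print(f"paramagnetic / Zeeman ~ {ratio:.0f}")
```

With these inputs the paramagnetic coupling exceeds the Zeeman coupling by a factor of a few thousand, which is the sense in which the Zeeman term can frequently be neglected.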
10.1 Emission and Absorption of Photons
An atom in an electromagnetic field has its constituent charges perturbed by the oscillating field, and those perturbations may lead either to the absorption of radiation or to the emission of further radiation. However, thermal equilibrium between matter and radiation can only be reached if, in addition to these induced processes, there exists also a spontaneous process in which an excited atom emits radiation even in the absence of any measurable radiation. This spontaneous emission process may be considered as being induced by the zero-point fluctuations of the electromagnetic field.
10.1.1 The Emission of Radiation
We shall consider a state $| (nlm) \{n_{k',\beta}\} \rangle$ which is an energy eigenstate of the unperturbed Hamiltonian $\hat{H}_0$ and the radiation Hamiltonian $\hat{H}_{rad}$. The interaction $\hat{H}_{int}$ causes the system to make a transition from the initial state to a final state. In the initial state, the electron is in an energy state designated by the quantum numbers $(n, l, m)$ and the electromagnetic field is in a state specified by the number of photons in each normal mode. That is, the photon field is in an initial state which is specified by the set of photon quantum numbers, $\{n_{k',\beta}\}$. We shall consider the transition in which the electron makes a transition from the initial state to a final state denoted by $(n', l', m')$. Since the photon is emitted, the final state of the photon field is described by the set $\{n'_{k',\beta}\}$ where
$$n'_{k',\beta} = n_{k',\beta} \quad \text{for} \quad (k', \beta) \neq (k, \alpha) \tag{575}$$
and the number of photons in a normal mode $(k, \alpha)$ is increased by one
$$n'_{k,\alpha} = n_{k,\alpha} + 1 \tag{576}$$

[Figure 16: An electron in the initial atomic state with energy $E_{n,l,m}$ makes a transition to the final atomic state with energy $E_{n',l',m'}$, by emitting a photon with energy $\hbar\omega_k$.]
The transition rate for the electron to make a transition from $(n, l, m)$ to $(n', l', m')$ can be calculated [29] from the Fermi-Golden rule expression
$$\left( \frac{1}{\tau} \right) = \left( \frac{2\pi}{\hbar} \right) \sum_{k,\alpha} | \langle n'l'm' \{n'_{k',\beta}\} | \hat{H}_{int} | nlm \{n_{k',\beta}\} \rangle |^2 \; \delta( E_{nlm} - E_{n'l'm'} - \hbar\omega_{k,\alpha} ) \tag{577}$$

[29] P. A. M. Dirac, Proc. Roy. Soc. A 112, 661 (1926), A 114, 243 (1927).
The delta function expresses the conservation of energy. The energy of the initial state is given by
$$E_{nlm} + \sum_{k',\beta} \hbar\omega_{k',\beta} \left( n_{k',\beta} + \frac{1}{2} \right) \tag{578}$$
and the final state has energy
$$E_{n'l'm'} + \sum_{k',\beta} \hbar\omega_{k',\beta} \left( n'_{k',\beta} + \frac{1}{2} \right) \tag{579}$$
The difference in the energy of the initial state and final state is evaluated as
$$E_{nlm} - E_{n'l'm'} - \hbar\omega_{k,\alpha} \tag{580}$$
which is the argument of the delta function and must vanish if energy is conserved. The sum over $k$ can be evaluated by assuming that the radiation field is confined to a volume $V$. The allowed $k$ values for the normal modes are determined by the boundary conditions. In this case, the sum over $k$ is transformed to an integral over k-space via
$$\sum_k \to \frac{V}{( 2\pi )^3} \int d^3\mathbf{k} \tag{581}$$
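The matrix elements of the interaction Hamiltonian between photon energy eigenstates are evaluated as
$$\langle \{n'_{k',\beta}\} | \hat{H}_{int} | \{n_{k',\beta}\} \rangle = - \frac{q}{mc} \sum_{k,\alpha} \sqrt{\frac{2\pi\hbar c^2}{V \omega_k}} \; \langle \{n'_{k',\beta}\} | \, \hat{\epsilon}_\alpha(k)\cdot\hat{\mathbf{p}} \, ( \hat{a}^\dagger_{k,\alpha} + \hat{a}_{-k,\alpha} ) \exp\left[ - i \mathbf{k}\cdot\mathbf{r} \right] | \{n_{k',\beta}\} \rangle \tag{582}$$
since only the paramagnetic part of the interaction has non-zero matrix elements. For the photon emission process, the matrix elements of the creation operator between the initial and final states of the electromagnetic cavity are evaluated as
$$\langle \{n'_{k',\beta}\} | \hat{a}^\dagger_{k,\alpha} | \{n_{k',\beta}\} \rangle = \sqrt{n_{k,\alpha} + 1} \tag{583}$$
hence, the matrix elements of the interaction are given by
$$\langle n'l'm' \{n'_{k',\beta}\} | \hat{H}_{int} | nlm \{n_{k',\beta}\} \rangle = - \frac{q}{mc} \sqrt{\frac{2\pi\hbar c^2}{V \omega_k}} \sqrt{n_{k,\alpha} + 1} \; \langle n'l'm' | \, \hat{\epsilon}_\alpha(k)\cdot\hat{\mathbf{p}} \exp\left[ - i \mathbf{k}\cdot\mathbf{r} \right] | nlm \rangle \tag{584}$$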
Therefore, the transition rate for photon emission can be expressed as
$$
\begin{aligned}
\frac{1}{\tau} &= \frac{2\pi}{\hbar} \left( \frac{q}{mc} \right)^2 \frac{V}{( 2\pi )^3} \int d^3\mathbf{k} \left( \frac{2\pi\hbar c^2}{V \omega_k} \right) \sum_\alpha ( n_{k,\alpha} + 1 ) \\
&\quad \times | \langle n'l'm' | \, \hat{\epsilon}_\alpha(k)\cdot\hat{\mathbf{p}} \exp\left[ - i \mathbf{k}\cdot\mathbf{r} \right] | nlm \rangle |^2 \; \delta( E_{nlm} - E_{n'l'm'} - \hbar\omega_k )
\end{aligned} \tag{585}
$$
The above expression shows that the rate for emitting a photon into state $(k, \alpha)$ is proportional to a factor of $n_{k,\alpha} + 1$, which depends on the state of occupation of the normal mode. The term proportional to the photon occupation number describes stimulated emission. However, if there are no photons initially present in this normal mode, one still has a non-zero transition rate corresponding to spontaneous emission. These factors are the result of the rigorous calculations [30] based on Dirac's quantization of the electromagnetic field, but were previously derived by Einstein [31] using a different argument. From the above expression, it is seen that the number of photons emitted into state $(k, \alpha)$ increases in proportion to the number of photons present in that normal mode. This stimulated emission increases the number of photons and can lead to amplification of the number of quanta in the normal mode, and leads to the phenomenon of Light Amplification by Stimulated Emission of Radiation (LASER).
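The origin of the factor $n_{k,\alpha} + 1$ can be made concrete by writing the creation operator of a single mode as a matrix in a truncated number basis and squaring the element that connects $|n\rangle$ to $|n+1\rangle$. This is only a sketch; the function name and the basis size are assumptions made for illustration.

```python
import math

def creation_matrix(dim):
    """Matrix of a^dagger in the number basis |0>, ..., |dim-1> (truncated)."""
    mat = [[0.0] * dim for _ in range(dim)]
    for n in range(dim - 1):
        mat[n + 1][n] = math.sqrt(n + 1)   # <n+1| a^dagger |n>
    return mat

adag = creation_matrix(6)
for n in range(5):
    weight = adag[n + 1][n] ** 2           # enters the emission rate as n + 1
    print(f"n = {n}: stimulated part = {n}, spontaneous part = 1, total = {weight:.1f}")
```

The squared matrix element splits into a part proportional to the occupation number (stimulated emission) and a part that remains when the mode is empty (spontaneous emission).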
10.1.2 The Dipole Approximation
The dipole approximation is justified by noting that in an emission process, the typical energy of the photon is of the order of 10 eV. Hence, a typical wavelength of the photon is given by
$$\lambda = \frac{2\pi c\hbar}{\hbar\omega} \sim 10^3 \; \text{Å} \tag{586}$$
whereas the typical length scale $r$ for the electronic state is of the order of an Angstrom. Therefore the product $k \, r \sim 10^{-3}$, so the exponential factor in the vector potential can be Taylor expanded as
$$\exp\left[ - i \mathbf{k}\cdot\mathbf{r} \right] \sim 1 - i \mathbf{k}\cdot\mathbf{r} + \ldots \tag{587}$$
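The size of the expansion parameter can be estimated directly. The sketch below assumes the standard value $hc \approx 12398$ eV Å and takes a 10 eV photon with a one Angstrom electronic length scale; for these inputs the wavelength comes out at roughly 1240 Å, of the order of $10^3$ Å, and $k r$ is a few times $10^{-3}$. The function names are hypothetical.

```python
import math

HC_EV_ANGSTROM = 12398.4   # h * c in eV * Angstrom (standard value, rounded)

def wavelength_angstrom(photon_energy_ev):
    """lambda = 2 pi c hbar / (hbar omega) = h c / E."""
    return HC_EV_ANGSTROM / photon_energy_ev

def k_times_r(photon_energy_ev, r_angstrom):
    """Dimensionless expansion parameter k * r with k = 2 pi / lambda."""
    return 2.0 * math.pi * r_angstrom / wavelength_angstrom(photon_energy_ev)

lam = wavelength_angstrom(10.0)
kr = k_times_r(10.0, 1.0)
print(f"lambda = {lam:.0f} Angstrom, k r = {kr:.1e}")
```

Each further term of the Taylor expansion is suppressed by another power of this small parameter.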
The first term in the expression produces results that are equivalent to the radiation from an oscillating classical electric dipole. If only the first term in the expansion is retained, the resulting approximation is known as the dipole approximation. The second term in the expansion yields results equivalent to the radiation from an electric quadrupole. The dipole approximation, where only the first term in the expansion is retained, is justified for transitions where the successive terms in the expansion are successively smaller by factors of the order of $10^{-3}$. The dipole approximation crudely restricts consideration to the case where the emitted photon can only have zero orbital angular momentum. This follows from the dipole approximation's requirement that the size of the atom is negligible compared with the scale over which the vector potential varies. Then the vector potential in the spatial region where the electron is located only describes photons with zero orbital angular momentum.

[Figure 17: A cartoon depicting the relative length-scales assumed in the dipole approximation. Curves shown: V(r), Ψ(r), A(r).]

[30] P. A. M. Dirac, Proc. Roy. Soc. A 114, 243 (1927).
[31] A. Einstein, Verh. Deutsche Phys. Ges. 18, 318 (1916), Phys. Z. 18, 121 (1917).
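In the dipole approximation, the transition rate for single photon emission is given by
$$
\begin{aligned}
\frac{1}{\tau} &\approx \frac{2\pi}{\hbar} \left( \frac{q}{mc} \right)^2 \frac{V}{( 2\pi )^3} \int d^3\mathbf{k} \left( \frac{2\pi\hbar c^2}{V \omega_k} \right) \sum_\alpha ( n_{k,\alpha} + 1 ) \\
&\quad \times | \, \hat{\epsilon}_\alpha(k)\cdot\langle n'l'm' | \hat{\mathbf{p}} | nlm \rangle |^2 \; \delta( E_{nlm} - E_{n'l'm'} - \hbar\omega_k )
\end{aligned} \tag{588}
$$
The matrix elements of the momentum can be evaluated by noting that the states $|nlm\rangle$ are eigenstates of the unperturbed electronic Hamiltonian so
$$\hat{H}_0 \, |nlm\rangle = E_{nlm} \, |nlm\rangle \tag{589}$$
where the unperturbed Hamiltonian is given by
$$\hat{H}_0 = \frac{\hat{\mathbf{p}}^2}{2m} + V(\mathbf{r}) \tag{590}$$
The electronic momentum operator $\hat{\mathbf{p}}$ can be expressed in terms of the commutator of the Hamiltonian $\hat{H}_0$ and $\mathbf{r}$ through the relation
$$[\, \mathbf{r} , \hat{H}_0 \,] = i \frac{\hbar}{m} \hat{\mathbf{p}} \tag{591}$$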
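On using this relation, the matrix elements of the momentum operator can be written in terms of the matrix elements of the electron's position operator $\mathbf{r}$ by
$$
\begin{aligned}
\langle n'l'm' | \hat{\mathbf{p}} | nlm \rangle &= - i \frac{m}{\hbar} \langle n'l'm' | [\, \mathbf{r} , \hat{H}_0 \,] | nlm \rangle \\
&= - i \frac{m}{\hbar} ( E_{nlm} - E_{n'l'm'} ) \, \langle n'l'm' | \mathbf{r} | nlm \rangle
\end{aligned} \tag{592}
$$
Only the squared modulus of this matrix element enters the transition rate, so the overall phase factor does not affect the result.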
Therefore, in the dipole approximation, the transition rate is given by
$$\frac{1}{\tau} \approx \frac{q^2}{( 2\pi )} \int d^3\mathbf{k} \; \omega_k \sum_\alpha ( n_{k,\alpha} + 1 ) \; | \, \hat{\epsilon}_\alpha(k)\cdot\langle n'l'm' | \mathbf{r} | nlm \rangle |^2 \; \delta( E_{nlm} - E_{n'l'm'} - \hbar\omega_k ) \tag{593}$$
where the property of the delta function has been used to set
$$\frac{( E_{nlm} - E_{n'l'm'} )^2}{\hbar^2} \; \delta( E_{nlm} - E_{n'l'm'} - \hbar\omega_k ) = \omega_k^2 \; \delta( E_{nlm} - E_{n'l'm'} - \hbar\omega_k ) \tag{594}$$
It is seen that the volume of the electromagnetic cavity has dropped out of the expression of eqn(593) for the transition rate. We shall assume that the number of photons $n_{k,\alpha}$ in the initial state is zero. The (complex) factor
$$\mathbf{d}_{nlm,n'l'm'} = q \, \langle n'l'm' | \mathbf{r} | nlm \rangle \tag{595}$$
is defined as the electric dipole moment, and the electronic energy difference is denoted by the frequency
$$E_{nlm} - E_{n'l'm'} = \hbar\omega_{nl,n'l'} \tag{596}$$
With this notation, the transition rate can be expressed as
$$\frac{1}{\tau} \approx \frac{1}{2\pi} \int d^3\mathbf{k} \left( \frac{\omega^2_{nl,n'l'}}{\omega_k} \right) \sum_\alpha | \, \hat{\epsilon}_\alpha(k)\cdot\mathbf{d}_{nlm,n'l'm'} |^2 \; \delta( \hbar\omega_{nl,n'l'} - \hbar\omega_k ) \tag{597}$$
The integration over $d^3\mathbf{k}$ can be performed by separating the integration over the direction $d\Omega_k$ of the outgoing photon and an integration over the magnitude of $k$. The integration over the magnitude of $k$ can be performed by noting that the integrand is proportional to a Dirac delta function, so the transition rate can be evaluated as
$$
\begin{aligned}
\frac{1}{\tau} &= \frac{\omega^2_{nl,n'l'}}{2\pi\hbar} \int d\Omega_k \int_0^\infty dk \left( \frac{k^2}{\omega_k} \right) \sum_\alpha | \, \hat{\epsilon}_\alpha(k)\cdot\mathbf{d}_{nlm,n'l'm'} |^2 \; \delta( \omega_{nl,n'l'} - c \, k ) \\
&= \frac{\omega^3_{nl,n'l'}}{2\pi\hbar c^3} \int d\Omega_k \sum_\alpha | \, \hat{\epsilon}_\alpha(k)\cdot\mathbf{d}_{nlm,n'l'm'} |^2
\end{aligned} \tag{598}
$$
The above expression yields the rate at which an electron makes a transition between the initial and final electronic state, in which one photon of any polarization is emitted in any direction.
If one is only interested in the decay rate of the electronic state via the emission of a photon, one should sum over all polarizations and integrate over all directions of the emitted photon. The direction of the emitted photon $\hat{\mathbf{k}}$ is expressed in terms of polar coordinates defined with respect to an arbitrarily chosen polar axis. The direction of the photon's wave vector $\hat{\mathbf{k}}$ is defined as $\hat{\mathbf{k}} = ( \sin\theta_k \cos\varphi_k , \sin\theta_k \sin\varphi_k , \cos\theta_k )$. The directions of the two transverse polarizations $\alpha$ are defined as
$$
\begin{aligned}
\hat{\epsilon}_1(k) &= ( \cos\theta_k \cos\varphi_k , \; \cos\theta_k \sin\varphi_k , \; - \sin\theta_k ) \\
\hat{\epsilon}_2(k) &= ( - \sin\varphi_k , \; \cos\varphi_k , \; 0 )
\end{aligned} \tag{599}
$$

[Figure 18: A photon is emitted with wave vector k with a direction denoted by the polar coordinates $(\theta_k, \varphi_k)$. The polarization vector $\hat{e}_1(k)$ is chosen to be in the plane containing the polar-axis and k, therefore, $\hat{e}_2(k)$ is parallel to the x-y plane.]
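The scalar product between the polarization vectors and the dipole moment can be expressed in terms of the Cartesian components via
$$\hat{\epsilon}_\alpha(k)\cdot\mathbf{d}_{nlm,n'l'm'} = \sum_i \hat{\epsilon}^{(i)}_\alpha(k) \, ( d^{(i)}_{nlm,n'l'm'} ) \tag{600}$$
As neither the polarization nor the direction of the outgoing photon is measured, the transition rate is determined as an integral over all directions
$$\left( \frac{1}{\tau} \right) = \frac{\omega^3_{nl,n'l'}}{2\pi\hbar c^3} \sum_{i,j} \sum_\alpha \int d\Omega_k \; \hat{\epsilon}^{(j)}_\alpha(k) \, \hat{\epsilon}^{(i)}_\alpha(k) \; ( d^{(j)}_{nlm,n'l'm'} )^* \, ( d^{(i)}_{nlm,n'l'm'} ) \tag{601}$$
On using the identity
$$\frac{1}{4\pi} \sum_\alpha \int d\Omega_k \; \hat{\epsilon}^{(j)}_\alpha(k) \, \hat{\epsilon}^{(i)}_\alpha(k) = \frac{2}{3} \, \delta_{i,j} \tag{602}$$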
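The angular identity of eqn(602) can be verified by direct numerical integration with the polarization vectors of eqn(599). The following is a minimal sketch using a midpoint rule on a 200 by 200 grid in $(\theta_k, \varphi_k)$; the grid size and the function names are assumptions made for illustration.

```python
import math

def polarization_vectors(theta, phi):
    """The two transverse unit vectors of eqn(599)."""
    e1 = (math.cos(theta) * math.cos(phi), math.cos(theta) * math.sin(phi), -math.sin(theta))
    e2 = (-math.sin(phi), math.cos(phi), 0.0)
    return e1, e2

def angular_average(i, j, n_theta=200, n_phi=200):
    """(1 / 4 pi) * sum over alpha of the solid-angle integral of eps^(j) eps^(i)."""
    d_theta = math.pi / n_theta
    d_phi = 2.0 * math.pi / n_phi
    total = 0.0
    for a in range(n_theta):
        theta = (a + 0.5) * d_theta
        for b in range(n_phi):
            phi = (b + 0.5) * d_phi
            for eps in polarization_vectors(theta, phi):
                total += eps[i] * eps[j] * math.sin(theta)
    return total * d_theta * d_phi / (4.0 * math.pi)

for i in range(3):
    print([round(angular_average(i, j), 4) for j in range(3)])
```

The diagonal entries should come out close to 2/3 and the off-diagonal entries close to zero, as the identity requires.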
one finds that the transition rate is given by the scalar product of complex vectors
$$\left( \frac{1}{\tau} \right) = \frac{4 \, \omega^3_{nl,n'l'} \; \mathbf{d}^*_{nlm,n'l'm'}\cdot\mathbf{d}_{nlm,n'l'm'}}{3 \hbar c^3} \tag{603}$$
The electric dipole matrix elements can be shown to vanish between most pairs of states. The selection rules determine which matrix elements are non-zero and, therefore, which electric dipole transitions are allowed.
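To get a feeling for the magnitude of eqn(603), it can be evaluated for a concrete transition. The sketch below is an illustration that is not taken from the text: it assumes the hydrogen 2p to 1s transition, with the standard result $| \langle 1s | \mathbf{r} | 2p \rangle |^2 = (2^{15}/3^{10}) \, a_0^2$, a transition energy of 10.2 eV, and rounded Gaussian-unit constants.

```python
# Gaussian (cgs) units, matching the conventions of the text.
HBAR = 1.054572e-27      # erg s
C = 2.997925e10          # cm / s
E_CHARGE = 4.803205e-10  # esu
BOHR = 5.291772e-9       # cm
EV = 1.602177e-12        # erg

def dipole_decay_rate(omega, dipole_sq):
    """Eqn(603): 1/tau = 4 omega^3 |d|^2 / (3 hbar c^3)."""
    return 4.0 * omega ** 3 * dipole_sq / (3.0 * HBAR * C ** 3)

# Illustrative input (not from the text): hydrogen 2p -> 1s, with
# |<1s| r |2p>|^2 = (2^15 / 3^10) a_0^2 and a transition energy of 10.2 eV.
omega = 10.2 * EV / HBAR
dipole_sq = E_CHARGE ** 2 * (2 ** 15 / 3 ** 10) * BOHR ** 2
rate = dipole_decay_rate(omega, dipole_sq)
print(f"1/tau = {rate:.3e} per second, tau = {1e9 / rate:.2f} ns")
```

Under these assumptions the decay rate is of the order of $10^8$ to $10^9$ per second, that is, a lifetime of the order of a nanosecond.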
10.1.3 Electric Dipole Radiation Selection Rules
Electric dipole induced transitions obey the selection rules $\Delta l = \pm 1$ and either $\Delta m = \pm 1$ or 0, where $l$ is the quantum number for the electron's orbital angular momentum and $m$ is the z-component. The dipole selection rules can be derived by writing the wave functions for the one-electron states as
$$\psi_{n,l,m}(\mathbf{r}) = R_{nl}(r) \, Y^l_m(\theta, \varphi) \tag{604}$$
where $R_{n,l}(r)$ is the radial wave function, and $Y^l_m(\theta, \varphi)$ is the spherical harmonic function quantized along the z-direction. The components of an arbitrarily oriented electric dipole matrix element involve matrix elements of the quantities
$$
\begin{aligned}
x &= r \sin\theta \cos\varphi \\
y &= r \sin\theta \sin\varphi \\
z &= r \cos\theta
\end{aligned} \tag{605}
$$
Since the above expressions are the components of a vector, they can be re-written as combinations of the spherical harmonics with angular momentum $l = 1$, via
$$
\begin{aligned}
x &= r \, \frac{1}{2} \sqrt{\frac{8\pi}{3}} \left( Y^1_{-1}(\theta, \varphi) - Y^1_{1}(\theta, \varphi) \right) \\
y &= r \, \frac{i}{2} \sqrt{\frac{8\pi}{3}} \left( Y^1_{-1}(\theta, \varphi) + Y^1_{1}(\theta, \varphi) \right) \\
z &= r \sqrt{\frac{4\pi}{3}} \, Y^1_{0}(\theta, \varphi)
\end{aligned} \tag{606}
$$
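Hence, the components of the vector $\mathbf{r}$ can be written as
$$\mathbf{r} = \sqrt{\frac{4\pi}{3}} \, r \left( \frac{\hat{e}_x + i \hat{e}_y}{\sqrt{2}} \, Y^1_{-1} + \hat{e}_z \, Y^1_{0} - \frac{\hat{e}_x - i \hat{e}_y}{\sqrt{2}} \, Y^1_{1} \right) \tag{607}$$
The circular polarization vectors are given by
$$
\begin{aligned}
\hat{e}_{m=-1} &= \frac{\hat{e}_x - i \hat{e}_y}{\sqrt{2}} \\
\hat{e}_{m=0} &= \hat{e}_z \\
\hat{e}_{m=+1} &= - \frac{\hat{e}_x + i \hat{e}_y}{\sqrt{2}}
\end{aligned} \tag{608}
$$
which are orthonormal
$$\hat{e}^*_{m'}\cdot\hat{e}_m = \delta_{m',m} \tag{609}$$
Hence, the vector $\mathbf{r}$ can be written in the alternate forms
$$
\begin{aligned}
\mathbf{r} &= \sqrt{\frac{4\pi}{3}} \, r \sum_m \hat{e}^*_m \, Y^1_m(\theta, \varphi) \\
&= \sqrt{\frac{4\pi}{3}} \, r \sum_m \hat{e}_m \, Y^1_m(\theta, \varphi)^*
\end{aligned} \tag{610}
$$
This illustrates that through the dipole approximation coupling term
$$\mathbf{r}\cdot\left( \hat{e}_m \, \hat{a}_{k,m} + \hat{e}^*_m \, \hat{a}^\dagger_{k,m} \right) \tag{611}$$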
(where $k \approx 0$), an electron with angular momentum quantized along the z-direction most naturally couples to circularly-polarized light with the same quantization axis. The electric dipole matrix elements involve the three factors
$$
\begin{aligned}
&\int_0^{2\pi} d\varphi \int_0^{\pi} d\theta \, \sin\theta \; Y^{l'}_{m'}(\theta, \varphi)^* \, Y^1_{\pm 1}(\theta, \varphi) \, Y^l_m(\theta, \varphi) \\
&\int_0^{2\pi} d\varphi \int_0^{\pi} d\theta \, \sin\theta \; Y^{l'}_{m'}(\theta, \varphi)^* \, Y^1_{0}(\theta, \varphi) \, Y^l_m(\theta, \varphi)
\end{aligned} \tag{612}
$$
which come from the angular integrations. Conservation of angular momentum leads to the dipole-transition selection rules
$$l' = l \pm 1 \tag{613}$$
and
$$
\begin{aligned}
m' &= m \pm 1 \\
m' &= m
\end{aligned} \tag{614}
$$
because one unit of angular momentum is carried away by the photon in the form of its spin [32].

[32] In the dipole approximation, the photon is restricted to have zero orbital angular momentum. Therefore, the angular momentum is completely transformed to the photon's spin. More generally, the spatial (plane-wave) part of the vector potential should be expanded in terms of spherical harmonics to exhibit the photon's orbital angular momentum components.
The m-selection rules for electric dipole transitions.
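The selection rules on the z-component of the angular momentum, $m$, follow directly from the $\varphi$-dependence of the spherical harmonics
$$Y^l_m(\theta, \varphi) = \Theta^l_m(\theta) \, \frac{1}{\sqrt{2\pi}} \exp\left[ i m \varphi \right] \tag{615}$$
so, the integral over the Cartesian components of the dipole matrix elements involve
$$\frac{1}{2\pi} \int_0^{2\pi} d\varphi \, \exp\left[ i ( m - m' ) \varphi \right] \begin{pmatrix} \sin\varphi \\ \cos\varphi \end{pmatrix} = \frac{1}{2} \begin{pmatrix} - i \, \delta_{m+1,m'} + i \, \delta_{m-1,m'} \\ \delta_{m+1,m'} + \delta_{m-1,m'} \end{pmatrix} \tag{616}$$
and
$$\frac{1}{2\pi} \int_0^{2\pi} d\varphi \, \exp\left[ i ( m - m' ) \varphi \right] = \delta_{m,m'} \tag{617}$$
The above results lead to the selection rules for the z-component of the electron's orbital angular momentum
$$
\begin{aligned}
m' &= m \pm 1 \\
m' &= m
\end{aligned} \tag{618}
$$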
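The azimuthal integrals of eqns (616) and (617) are simple enough to check by a uniform sum over the angle. The sketch below fixes $m = 1$ and scans $m'$; the number of grid points and the function name are assumptions made for illustration.

```python
import cmath
import math

def phi_integral(m, m_prime, trig, n_phi=64):
    """(1 / 2 pi) * integral over phi of exp(i (m - m') phi) * trig(phi)."""
    total = 0j
    for j in range(n_phi):
        phi = 2.0 * math.pi * j / n_phi
        total += cmath.exp(1j * (m - m_prime) * phi) * trig(phi)
    return total / n_phi

m = 1
for m_prime in range(-1, 4):
    s = phi_integral(m, m_prime, math.sin)
    c = phi_integral(m, m_prime, math.cos)
    one = phi_integral(m, m_prime, lambda phi: 1.0)
    print(m_prime, [complex(round(z.real, 12), round(z.imag, 12)) for z in (s, c, one)])
```

Only the rows with $m' = m \pm 1$ give non-zero $\sin\varphi$ and $\cos\varphi$ integrals, and only $m' = m$ survives for the constant factor, which is the content of eqn(618).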
An alternate derivation of the selection rules for the z-component of the electron's orbital angular momentum can be found from considerations of the commutation relations
$$
\begin{aligned}
[\, \hat{L}_z , x \,] &= i \hbar \, y \\
[\, \hat{L}_z , y \,] &= - i \hbar \, x \\
[\, \hat{L}_z , z \,] &= 0
\end{aligned} \tag{619}
$$
On taking the matrix elements between states with definite z-components of the angular momenta, one finds
$$
\begin{aligned}
\langle n'l'm' | [\, \hat{L}_z , x \,] | nlm \rangle &= i \hbar \, \langle n'l'm' | y | nlm \rangle \\
\langle n'l'm' | [\, \hat{L}_z , y \,] | nlm \rangle &= - i \hbar \, \langle n'l'm' | x | nlm \rangle \\
\langle n'l'm' | [\, \hat{L}_z , z \,] | nlm \rangle &= 0
\end{aligned} \tag{620}
$$
which reduce to
$$
\begin{aligned}
( m' - m ) \, \langle n'l'm' | x | nlm \rangle &= i \, \langle n'l'm' | y | nlm \rangle \\
( m' - m ) \, \langle n'l'm' | y | nlm \rangle &= - i \, \langle n'l'm' | x | nlm \rangle \\
( m' - m ) \, \langle n'l'm' | z | nlm \rangle &= 0
\end{aligned} \tag{621}
$$
From the last equation, it follows that either $m' = m$ or that
$$\langle n'l'm' | z | nlm \rangle = 0 \tag{622}$$
On combining the first two equations, one finds that
$$
\begin{aligned}
( m' - m )^2 \, \langle n'l'm' | x | nlm \rangle &= i \, ( m' - m ) \, \langle n'l'm' | y | nlm \rangle \\
&= \langle n'l'm' | x | nlm \rangle
\end{aligned} \tag{623}
$$
The above equation is solved by requiring that either
$$( m' - m )^2 = 1 \tag{624}$$
or
$$\langle n'l'm' | x | nlm \rangle = 0 \tag{625}$$
Hence, the m-selection rules for the electric dipole transitions are $\Delta m = \pm 1, 0$.
The l-selection rules for electric dipole transitions.
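The selection rules for the magnitude of the electron's orbital angular momentum can be found by considering the double commutator
$$[\, \hat{L}^2 , [\, \hat{L}^2 , \mathbf{r} \,] \,] = 2 \hbar^2 \left( \mathbf{r} \, \hat{L}^2 + \hat{L}^2 \, \mathbf{r} \right) \tag{626}$$
On taking the matrix elements of this equation between different eigenstates of the magnitude of the orbital angular momentum, one finds
$$\left( l' ( l' + 1 ) - l ( l + 1 ) \right)^2 \langle n'l'm' | \mathbf{r} | nlm \rangle = 2 \left( l' ( l' + 1 ) + l ( l + 1 ) \right) \langle n'l'm' | \mathbf{r} | nlm \rangle \tag{627}$$
Since
$$\left( l' ( l' + 1 ) - l ( l + 1 ) \right)^2 = ( l' + l + 1 )^2 \, ( l' - l )^2 \tag{628}$$
and
$$2 \left( l' ( l' + 1 ) + l ( l + 1 ) \right) = \left( ( l' + l + 1 )^2 + ( l' - l )^2 - 1 \right) \tag{629}$$
the above equation is satisfied if, either
$$\langle n'l'm' | \mathbf{r} | nlm \rangle = 0 \tag{630}$$
or
$$\left( ( l' + l + 1 )^2 - 1 \right) \left( ( l' - l )^2 - 1 \right) = 0 \tag{631}$$
The first factor in eqn(631) is always positive when $l' \neq l$, therefore, the electric dipole selection rule becomes $\Delta l = \pm 1$. (For $l' = l$ the second factor is non-zero, and the first factor vanishes only for $l' = l = 0$, in which case the matrix element of $\mathbf{r}$ between two s-states vanishes by parity.)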
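The integer solutions of eqn(631) can be listed by brute force. The sketch below scans $l, l' \leq 4$ (an arbitrary cutoff chosen for illustration; the function name is hypothetical). Note that the pair $l = l' = 0$ also satisfies the algebraic condition, because the first factor vanishes there, but the dipole matrix element between two s-states is zero by parity, so it does not correspond to an allowed transition.

```python
def allowed_pairs(l_max):
    """Integer (l, l') with [(l'+l+1)^2 - 1] * [(l'-l)^2 - 1] = 0, eqn(631)."""
    pairs = []
    for l in range(l_max + 1):
        for l_prime in range(l_max + 1):
            if ((l_prime + l + 1) ** 2 - 1) * ((l_prime - l) ** 2 - 1) == 0:
                pairs.append((l, l_prime))
    return pairs

pairs = allowed_pairs(4)
print(pairs)
```

Apart from the $(0, 0)$ entry, every pair in the list differs by exactly one unit of $l$.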
The actual values of the matrix elements can be found from explicit calcu
lations. The θdependence of the matrix elements is governed by the associated
Legendre functions through
Θ
l
m
(θ) =
¸
( 2 l + 1 )
2
( l − m )!
( l + m )!
P
l
m
(cos θ) (632)
which obey the recursion relations
\[ \sin\theta \, P^l_{m-1}(\cos\theta) = \frac{ P^{l+1}_m(\cos\theta) - P^{l-1}_m(\cos\theta) }{ 2l + 1 } \tag{633} \]
and
\[ \sin\theta \, P^l_{m+1}(\cos\theta) = \frac{ ( l + m ) ( l + m + 1 ) \, P^{l-1}_m(\cos\theta) - ( l - m ) ( l - m + 1 ) \, P^{l+1}_m(\cos\theta) }{ 2l + 1 } \tag{634} \]
appropriate for the $\Delta m = \pm 1$ transitions and
\[ \cos\theta \, P^l_m(\cos\theta) = \frac{ ( l - m + 1 ) \, P^{l+1}_m(\cos\theta) + ( l + m ) \, P^{l-1}_m(\cos\theta) }{ 2l + 1 } \tag{635} \]
for the constant m transition, $\Delta m = 0$. Using the recursion relations, one finds that
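\[ \sin\theta \, \Theta^l_m(\theta) = \sqrt{ \frac{ ( l + m + 2 ) ( l + m + 1 ) }{ ( 2l + 1 ) ( 2l + 3 ) } } \; \Theta^{l+1}_{m+1} - \sqrt{ \frac{ ( l - m ) ( l - m - 1 ) }{ ( 2l - 1 ) ( 2l + 1 ) } } \; \Theta^{l-1}_{m+1} \tag{636} \]
for $\Delta m = 1$, while for $\Delta m = -1$ one finds
\[ \sin\theta \, \Theta^l_m(\theta) = \sqrt{ \frac{ ( l + m ) ( l + m - 1 ) }{ ( 2l + 1 ) ( 2l - 1 ) } } \; \Theta^{l-1}_{m-1} - \sqrt{ \frac{ ( l + 2 - m ) ( l + 1 - m ) }{ ( 2l + 1 ) ( 2l + 3 ) } } \; \Theta^{l+1}_{m-1} \tag{637} \]
and for constant m
\[ \cos\theta \, \Theta^l_m(\theta) = \sqrt{ \frac{ ( l + 1 + m ) ( l + 1 - m ) }{ ( 2l + 1 ) ( 2l + 3 ) } } \; \Theta^{l+1}_m + \sqrt{ \frac{ ( l + m ) ( l - m ) }{ ( 2l - 1 ) ( 2l + 1 ) } } \; \Theta^{l-1}_m \tag{638} \]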
The coefficients in the above equation have a similar form to the Clebsch-Gordan coefficients. The dipole matrix elements can be evaluated by taking the matrix elements of the above set of relations with $\Theta^{l'}_{m'}(\theta)^*$ and then using the orthogonality properties. The above three relations give rise to the selection rules for the magnitude of the orbital angular momentum l
\[ l' = l \pm 1 \tag{639} \]
Hence, not only have the selection rules on l been re-derived but the angular integrations have also been evaluated.
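As an illustration, the constant-m relation of eqn (638) can be checked by hand in the simplest non-trivial case, $l = 1$, $m = 0$. The explicit functions used below follow from eqn (632) with $P^0_0 = 1$, $P^1_0 = \cos\theta$ and $P^2_0 = \frac{1}{2}(3\cos^2\theta - 1)$; this worked case is an added sketch and is not part of the original derivation.

```latex
\Theta^0_0 = \sqrt{\tfrac{1}{2}} , \qquad
\Theta^1_0 = \sqrt{\tfrac{3}{2}} \, \cos\theta , \qquad
\Theta^2_0 = \sqrt{\tfrac{5}{8}} \, ( 3\cos^2\theta - 1 )

% Right-hand side of eqn (638) for l = 1, m = 0:
\sqrt{\tfrac{4}{15}} \, \Theta^2_0 + \sqrt{\tfrac{1}{3}} \, \Theta^0_0
  = \sqrt{\tfrac{1}{6}} \, ( 3\cos^2\theta - 1 ) + \sqrt{\tfrac{1}{6}}
  = \sqrt{\tfrac{3}{2}} \, \cos^2\theta
  = \cos\theta \, \Theta^1_0
```

The two coefficients $\sqrt{4/15}$ and $\sqrt{1/3}$ are the angular factors that multiply the radial integrals for the $l' = 2$ and $l' = 0$ final states, respectively.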
What the above mathematics describes is how the spin angular momentum of the emitted photon is combined with the orbital angular momentum of the electron in the final state, so that total angular momentum is conserved. This implies selection rules which require the magnitudes of the initial and final electronic orbital angular momenta, $l$ and $l'$, to satisfy the triangular inequality
\[ l' + 1 \geq l \geq | l' - 1 | \tag{640} \]
as required by the rules of combination of angular momentum. The evaluation of the dipole matrix elements is an explicit example of the Wigner-Eckart theorem. For this example, the irreducible tensor is the vector $\mathbf{V}$ with components $V_\mu$ given by
\[ V_{\pm} = \mp \frac{ ( x \pm i y ) }{ \sqrt{2} } = r \sqrt{ \frac{4\pi}{3} } \; Y^1_{\pm 1} , \qquad V_0 = z = r \sqrt{ \frac{4\pi}{3} } \; Y^1_0 \tag{641} \]
Then, since the electric dipole carries angular momentum (1, µ), the Wigner-Eckart theorem reduces to
\[ \langle n'l'm' | V_\mu | nlm \rangle = \frac{1}{ \sqrt{ 2l' + 1 } } \, \langle l, m ; 1, \mu | l' m' \rangle \, \langle n'l' || V || nl \rangle \tag{642} \]
where the first factor, which represents the angular integration, is a Clebsch-Gordan coefficient and the second factor is the reduced matrix element, which does depend on the form of the particular vector but is independent of any choice of coordinate system. Furthermore, the Wigner-Eckart theorem yields the selection rules for the electric dipole transition
\[ l + l' \geq 1 \geq | l - l' | \tag{643} \]
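Table 1: Matrix Elements of the Components of the Dipole Moment
\[
\begin{array}{ll|ccc}
l' & m' & x & y & z \\ \hline
l + 1 & m + 1 & \frac{1}{2} \sqrt{ \frac{(l+2+m)(l+1+m)}{(2l+1)(2l+3)} } & - \frac{i}{2} \sqrt{ \frac{(l+2+m)(l+1+m)}{(2l+1)(2l+3)} } & - \\
l + 1 & m & - & - & \sqrt{ \frac{(l+1+m)(l+1-m)}{(2l+1)(2l+3)} } \\
l + 1 & m - 1 & - \frac{1}{2} \sqrt{ \frac{(l+2-m)(l+1-m)}{(2l+1)(2l+3)} } & - \frac{i}{2} \sqrt{ \frac{(l+2-m)(l+1-m)}{(2l+1)(2l+3)} } & - \\ \hline
l - 1 & m + 1 & - \frac{1}{2} \sqrt{ \frac{(l-m)(l-1-m)}{(2l-1)(2l+1)} } & \frac{i}{2} \sqrt{ \frac{(l-m)(l-1-m)}{(2l-1)(2l+1)} } & - \\
l - 1 & m & - & - & \sqrt{ \frac{(l+m)(l-m)}{(2l-1)(2l+1)} } \\
l - 1 & m - 1 & \frac{1}{2} \sqrt{ \frac{(l+m)(l-1+m)}{(2l-1)(2l+1)} } & \frac{i}{2} \sqrt{ \frac{(l+m)(l-1+m)}{(2l-1)(2l+1)} } & -
\end{array}
\]

Exercise: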
Using the commutation relations for the jth component of a vector $\hat{V}_j$ with the ith component of the orbital angular momentum $\hat{L}_i$,
\[ [ \hat{L}_i , \hat{V}_j ] = i \hbar \sum_k \xi^{i,j,k} \, \hat{V}_k \tag{644} \]
where $\xi^{i,j,k}$ is the antisymmetric Levi-Civita symbol, show that
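\[ [ \hat{L}^2 , \hat{\mathbf{V}} ] = - i \hbar \left( \hat{\mathbf{L}} \wedge \hat{\mathbf{V}} - \hat{\mathbf{V}} \wedge \hat{\mathbf{L}} \right) = - 2 i \hbar \left( \hat{\mathbf{L}} \wedge \hat{\mathbf{V}} - i \hbar \, \hat{\mathbf{V}} \right) \tag{645} \]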
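From the above equation, derive the double commutation relation
\[ [ \hat{L}^2 , [ \hat{L}^2 , \hat{\mathbf{V}} ] ] = 2 \hbar^2 \left( \hat{\mathbf{V}} \, \hat{L}^2 + \hat{L}^2 \, \hat{\mathbf{V}} \right) - 4 \hbar^2 \, \hat{\mathbf{L}} \left( \hat{\mathbf{L}} \cdot \hat{\mathbf{V}} \right) \tag{646} \]
and show that the last term of the above expression is zero if $\hat{\mathbf{V}} = \mathbf{r}$.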
The parity selection rule.
In addition to the electronic orbital angular momentum selection rules, there is a parity selection rule. The parity operation is an inversion through the origin given by $\mathbf{r} \to - \mathbf{r}$. The parity operator $\hat{\mathcal{P}}$ has the effect
\[ \hat{\mathcal{P}} \, \psi(\mathbf{r}) = \psi(-\mathbf{r}) \tag{647} \]
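The parity operator is its own inverse since for any state $\psi(\mathbf{r})$
\[ \hat{\mathcal{P}}^2 \, \psi(\mathbf{r}) = \hat{\mathcal{P}} \, \psi(-\mathbf{r}) = \psi(\mathbf{r}) \tag{648} \]
Therefore, the parity operator has eigenvalues $p = \pm 1$ for the eigenstates which are defined by
\[ \hat{\mathcal{P}} \, \phi_p(\mathbf{r}) = p \, \phi_p(\mathbf{r}) \tag{649} \]
so
\[ \hat{\mathcal{P}}^2 \, \phi_p(\mathbf{r}) = p^2 \, \phi_p(\mathbf{r}) = \phi_p(\mathbf{r}) \tag{650} \]
which yields $p^2 = 1$ or $p = \pm 1$. In polar coordinates, the parity operation is equivalent to a reflection
\[ \theta \to \pi - \theta \tag{651} \]
followed by a rotation
\[ \varphi \to \varphi + \pi \tag{652} \]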
Figure 19: The effect of the parity operator on a displacement vector r, in spherical polar coordinates.

In electromagnetic processes, parity is conserved since the Coulomb potential is symmetric under reflection.$^{33}$ Therefore, the parity operator $\hat{\mathcal{P}}$ commutes with the Hamiltonian
\[ [ \hat{\mathcal{P}} , \hat{H} ] = 0 \tag{653} \]
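and so one can find states $| \phi_n \rangle$ that are simultaneous eigenstates of $\hat{H}$ and $\hat{\mathcal{P}}$,
\[ \hat{H} \, | \phi_n \rangle = E_n \, | \phi_n \rangle , \qquad \hat{\mathcal{P}} \, | \phi_n \rangle = p_n \, | \phi_n \rangle \tag{654} \]

$^{33}$ The weak interaction does not conserve parity.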
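Inversion transforms vector operators according to
\[ \hat{\mathcal{P}} \, \mathbf{r} \, \hat{\mathcal{P}}^{-1} = - \mathbf{r} \tag{655} \]
Hence, for any matrix elements of $\mathbf{r}$ between any eigenstates of the parity operator, one has
\[ \langle \phi_{n'} | \mathbf{r} | \phi_n \rangle = - \langle \phi_{n'} | \hat{\mathcal{P}} \, \mathbf{r} \, \hat{\mathcal{P}}^{-1} | \phi_n \rangle = - p_{n'} \, p_n \, \langle \phi_{n'} | \mathbf{r} | \phi_n \rangle \tag{656} \]
Therefore, the parity must change in an electric dipole transition
\[ p_{n'} \, p_n = - 1 \tag{657} \]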
This is known as the Laporte selection rule for electric dipole transitions.$^{34}$ The validity of this selection rule follows from the fact that inversion commutes with the orbital angular momentum operator. The spherical harmonics are eigenstates of the parity operator since
\[ \hat{\mathcal{P}} \, Y^l_m(\theta, \varphi) = ( - 1 )^l \, Y^l_m(\theta, \varphi) \tag{658} \]
This is proved by examining
\[ Y^l_l(\theta, \varphi) \propto \sin^l\theta \, \exp\left[ i l \varphi \right] \tag{659} \]
which is seen to be an eigenfunction of the parity operator with eigenvalue $( - 1 )^l$. All other spherical harmonics with the same value of l have the same eigenvalue since the lowering operator (like any component of the angular momentum) commutes with the parity operator. Therefore, one can use the angular momentum selection rule to show that parity does change in an electric dipole transition since
\[ ( - 1 )^{l + l'} = - 1 \tag{660} \]
The Laporte selection rule is satisfied since $\Delta l = \pm 1$.
10.1.4 Angular Distribution of Dipole Radiation
We shall assume that the initial state is polarized so that the electron is in an electronic state labelled by m, where the axis of quantization is fixed in space. The decay rate in which a photon of polarization α is emitted into the solid angle $d\Omega_k$ is given by
\[ \frac{1}{\tau} \, d\Omega_k = \frac{ \omega^3_{nl,n'l'} }{ 2 \pi \hbar c^3 } \, d\Omega_k \sum_\alpha | \hat{\epsilon}_\alpha(\mathbf{k}) \cdot \mathbf{d}_{nlm,n'l'm'} |^2 \tag{661} \]

$^{34}$ O. Laporte, Z. Physik, 23, 135 (1924).
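For a photon emitted in the direction $\hat{\mathbf{k}}$
\[ \hat{\mathbf{k}} = ( \sin\theta_k \cos\varphi_k , \; \sin\theta_k \sin\varphi_k , \; \cos\theta_k ) \tag{662} \]
the polar polarization vectors are given by
\[ \begin{aligned} \hat{\epsilon}_1(\mathbf{k}) &= ( \cos\theta_k \cos\varphi_k , \; \cos\theta_k \sin\varphi_k , \; - \sin\theta_k ) \\ \hat{\epsilon}_2(\mathbf{k}) &= ( - \sin\varphi_k , \; \cos\varphi_k , \; 0 ) \end{aligned} \tag{663} \]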
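Therefore, the scalar products of the transition matrix elements of $\mathbf{r}$ with the polarizations are given by
\[ \begin{aligned} \hat{\epsilon}_1(\mathbf{k}) \cdot \langle n'l'm' | \mathbf{r} | nlm \rangle = {} & \frac{1}{2} \cos\theta_k \exp[ - i \varphi_k ] \, \langle n'l'm' | ( x + i y ) | nlm \rangle \\ & + \frac{1}{2} \cos\theta_k \exp[ + i \varphi_k ] \, \langle n'l'm' | ( x - i y ) | nlm \rangle \\ & - \sin\theta_k \, \langle n'l'm' | z | nlm \rangle \end{aligned} \tag{664} \]
and
\[ \begin{aligned} \hat{\epsilon}_2(\mathbf{k}) \cdot \langle n'l'm' | \mathbf{r} | nlm \rangle = {} & - \frac{i}{2} \exp[ - i \varphi_k ] \, \langle n'l'm' | ( x + i y ) | nlm \rangle \\ & + \frac{i}{2} \exp[ + i \varphi_k ] \, \langle n'l'm' | ( x - i y ) | nlm \rangle \end{aligned} \tag{665} \]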
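Due to the m-selection rules
\[ \langle n'l'm' | ( x + i y ) | nlm \rangle \propto \delta_{m'-m-1} , \qquad \langle n'l'm' | ( x - i y ) | nlm \rangle \propto \delta_{m'-m+1} \tag{666} \]
and
\[ \langle n'l'm' | z | nlm \rangle \propto \delta_{m'-m} \tag{667} \]
the cross-terms in the square of the matrix elements are zero. Hence, on summing over the polarizations, one finds that the $(\theta_k, \varphi_k)$ dependence of the decay is governed by the dipole matrix elements through
\[ \begin{aligned} \sum_\alpha | \hat{\epsilon}_\alpha(\mathbf{k}) \cdot \mathbf{r}_{nlm,n'l'm'} |^2 = {} & \frac{1}{4} \left( 1 + \cos^2\theta_k \right) | \langle n'l'm' | ( x + i y ) | nlm \rangle |^2 \\ & + \frac{1}{4} \left( 1 + \cos^2\theta_k \right) | \langle n'l'm' | ( x - i y ) | nlm \rangle |^2 \\ & + \sin^2\theta_k \, | \langle n'l'm' | z | nlm \rangle |^2 \end{aligned} \tag{668} \]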
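For $l' = l + 1$, the above sum is found to depend on the angular factors
\[ \begin{aligned} I^{m'}_{l'=l+1}(\theta_k, \varphi_k) = {} & \left( 1 + \cos^2\theta_k \right) \frac{1}{4} \, \frac{ ( l + 2 + m ) ( l + 1 + m ) }{ ( 2l + 1 ) ( 2l + 3 ) } \, \delta_{m'-m-1} \\ & + \left( 1 + \cos^2\theta_k \right) \frac{1}{4} \, \frac{ ( l + 2 - m ) ( l + 1 - m ) }{ ( 2l + 1 ) ( 2l + 3 ) } \, \delta_{m'-m+1} \\ & + \sin^2\theta_k \, \frac{ ( l + 1 + m ) ( l + 1 - m ) }{ ( 2l + 1 ) ( 2l + 3 ) } \, \delta_{m'-m} \end{aligned} \tag{669} \]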
Since the z-component of the final electron's orbital angular momentum is not measured, $m'$ should be summed over. The angular distribution of the emitted radiation for the $l' = l + 1$ transition when neither the polarization nor the final state $m'$ value are measured is given by
\[ \sum_{m'} I^{m'}_{l'=l+1}(\theta_k, \varphi_k) = \frac{1}{2} \left( 1 + \cos^2\theta_k \right) \frac{ ( l + 2 ) ( l + 1 ) + m^2 }{ ( 2l + 1 ) ( 2l + 3 ) } + \sin^2\theta_k \, \frac{ ( l + 1 )^2 - m^2 }{ ( 2l + 1 ) ( 2l + 3 ) } \tag{670} \]
This factor determines the angular dependence of the emitted electromagnetic radiation, which clearly depends on the value of m specifying the initial electronic state. On rearranging the expression, one finds that the anisotropy is governed by the factor
\[ \sum_{m'} I^{m'}_{l'=l+1}(\theta_k, \varphi_k) = \frac{ ( l + 2 ) ( l + 1 ) + m^2 }{ ( 2l + 1 ) ( 2l + 3 ) } + \frac{1}{2} \, \frac{ l ( l + 1 ) - 3 m^2 }{ ( 2l + 1 ) ( 2l + 3 ) } \, \sin^2\theta_k \tag{671} \]
which shows that for m = 0 the photons are preferentially emitted perpendicular to the direction of the quantization axis since this maximizes the overlap between the polarization and the dipole matrix element. In the opposite case of large values of $m^2$ [ $3 m^2 > l ( l + 1 )$ ], one finds that the photons are preferentially emitted parallel (or anti-parallel) to the axis of quantization. On integrating over all directions of the emitted photon, one obtains
\[ \frac{1}{4\pi} \int d\Omega_k \sum_{m'} I^{m'}_{l'=l+1}(\theta_k, \varphi_k) = \frac{2}{3} \, \frac{ ( l + 1 ) }{ ( 2l + 1 ) } \tag{672} \]
The independence of the result on m follows since, in this case, there are no angular correlations and the choice of direction of quantization of m is completely arbitrary. The total decay rate for an electron in a state with fixed m due to an $l' = l + 1$ transition is given by
\[ \frac{1}{\tau_{l'=l+1}} = \frac{ 4 e^2 \omega^3_{nl,n'l'} }{ 3 \hbar c^3 } \, \frac{ ( l + 1 ) }{ ( 2l + 1 ) } \left| \int_0^\infty dr \, r^2 \, R^*_{n'\,l+1}(r) \, r \, R_{nl}(r) \right|^2 \tag{673} \]
for $l' = l + 1$. This decay rate would be measured in experiments in which neither the final state of the electron nor the final photon state is measured.
However, if the initial electronic state is unpolarized, then one should statistically average over the initial m. In this case, the emitted radiation becomes isotropic
\[ \frac{1}{2l + 1} \sum_{m=-l}^{l} \sum_{m'} I^{m'}_{l'=l+1}(\theta_k, \varphi_k) = \frac{2}{3} \, \frac{ ( l + 1 ) }{ ( 2l + 1 ) } \tag{674} \]
since
\[ \frac{1}{2l + 1} \sum_{m=-l}^{l} m^2 = \frac{1}{3} \, l ( l + 1 ) \tag{675} \]
Hence, if the initial electronic state is unpolarized, the electromagnetic radiation is isotropic. The decay rate for the $l' = l + 1$ transition starting with a statistical distribution of m values is given by
\[ \frac{1}{\tau_{l'=l+1}} = \frac{ 4 e^2 \omega^3_{nl,n'l'} }{ 3 \hbar c^3 } \, \frac{ ( l + 1 ) }{ ( 2l + 1 ) } \left| \int_0^\infty dr \, r^2 \, R^*_{n'\,l+1}(r) \, r \, R_{nl}(r) \right|^2 \tag{676} \]
for $l' = l + 1$. This is the same result that was previously obtained for the decay rate of a level with a specific m value, when the $m'$ value of the final state and the polarization or direction of the emitted photon are not measured.
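The sum over $m'$ in eqn (669) and the average over $m$ in eqn (674) can be reproduced with exact rational arithmetic. The sketch below is an illustration only: the function names are hypothetical, and the angular factor is represented by the pair of coefficients $(c_0, c_1)$ in $c_0 + c_1 \sin^2\theta_k$, obtained by writing $1 + \cos^2\theta_k = 2 - \sin^2\theta_k$.

```python
from fractions import Fraction


def summed_angular_factor(l, m):
    """Coefficients (c0, c1) of c0 + c1*sin^2(theta_k) for l' = l + 1,
    after summing eqn (669) over m' = m - 1, m, m + 1."""
    denom = (2 * l + 1) * (2 * l + 3)
    # Combined weight of the two Delta m = +-1 terms, each carrying (1 + cos^2)/4
    pm = Fraction((l + 2 + m) * (l + 1 + m) + (l + 2 - m) * (l + 1 - m),
                  4 * denom)
    # Weight of the Delta m = 0 term, which multiplies sin^2
    zz = Fraction((l + 1 + m) * (l + 1 - m), denom)
    # (1 + cos^2) = 2 - sin^2, so the constant part is 2*pm and sin^2 gets zz - pm
    return 2 * pm, zz - pm


def average_over_m(l):
    """Statistical average of (c0, c1) over the 2l + 1 initial values of m."""
    ms = range(-l, l + 1)
    c0 = sum(summed_angular_factor(l, m)[0] for m in ms) / (2 * l + 1)
    c1 = sum(summed_angular_factor(l, m)[1] for m in ms) / (2 * l + 1)
    return c0, c1


if __name__ == "__main__":
    for l in range(4):
        print(l, average_over_m(l))
```

For each $l$ the averaged coefficient of $\sin^2\theta_k$ vanishes, so the radiation is isotropic, and the constant term equals $\frac{2}{3} (l+1)/(2l+1)$, in agreement with eqn (674).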
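For the case where $l' = l - 1$, one finds that the decay rate involves the angular factor
\[ \begin{aligned} I^{m'}_{l'=l-1}(\theta_k, \varphi_k) = {} & \left( 1 + \cos^2\theta_k \right) \frac{1}{4} \, \frac{ ( l - m ) ( l - 1 - m ) }{ ( 2l - 1 ) ( 2l + 1 ) } \, \delta_{m'-m-1} \\ & + \left( 1 + \cos^2\theta_k \right) \frac{1}{4} \, \frac{ ( l - 1 + m ) ( l + m ) }{ ( 2l - 1 ) ( 2l + 1 ) } \, \delta_{m'-m+1} \\ & + \sin^2\theta_k \, \frac{ ( l + m ) ( l - m ) }{ ( 2l - 1 ) ( 2l + 1 ) } \, \delta_{m'-m} \end{aligned} \tag{677} \]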
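which on summing over the final values of $m'$ yields the angular dependence of the radiation field
\[ \sum_{m'} I^{m'}_{l'=l-1}(\theta_k, \varphi_k) = \frac{1}{2} \left( 1 + \cos^2\theta_k \right) \frac{ l ( l - 1 ) + m^2 }{ ( 2l - 1 ) ( 2l + 1 ) } + \sin^2\theta_k \, \frac{ l^2 - m^2 }{ ( 2l - 1 ) ( 2l + 1 ) } \tag{678} \]
The anisotropy of the emitted radiation is determined by the factor
\[ \sum_{m'} I^{m'}_{l'=l-1}(\theta_k, \varphi_k) = \frac{ l ( l - 1 ) + m^2 }{ ( 2l - 1 ) ( 2l + 1 ) } + \frac{1}{2} \, \frac{ l ( l + 1 ) - 3 m^2 }{ ( 2l - 1 ) ( 2l + 1 ) } \, \sin^2\theta_k \tag{679} \]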
which shows that for m = 0 the photons are preferentially emitted perpendicular to the direction of the quantization axis since this maximizes the overlap between the polarization and the dipole matrix element. In the opposite case of larger $m^2$ [ $3 m^2 > l ( l + 1 )$ ], one finds that the photons are preferentially emitted parallel (or anti-parallel) to the axis of quantization.

Again it is noted that if the initial state is unpolarized, so that m has to be averaged over, then the radiation field is isotropic since
\[ \frac{1}{2l + 1} \sum_{m=-l}^{l} \sum_{m'} I^{m'}_{l'=l-1}(\theta_k, \varphi_k) = \frac{2}{3} \, \frac{ l }{ ( 2l + 1 ) } \tag{680} \]
Therefore, the decay rate in which the photon is emitted in any direction is given by the expression
\[ \frac{1}{\tau_{l'=l-1}} = \frac{ 4 e^2 \omega^3_{nl,n'l'} }{ 3 \hbar c^3 } \, \frac{ l }{ ( 2l + 1 ) } \left| \int_0^\infty dr \, r^2 \, R^*_{n'\,l-1}(r) \, r \, R_{nl}(r) \right|^2 \tag{681} \]
for $l' = l - 1$.
Classical Interpretation.
The quantum mechanical results for the angular distribution of the radiation can be understood in terms of a simple classical model of the atom. In Bohr's model, a single electron orbits a central nucleus to which it is bound by the attractive Coulomb potential. We shall assume that the radius of the orbit is a and that the electron is performing a circular orbit in the x-y plane. Since the direction of the electron's orbital angular momentum is aligned with the z-axis, it corresponds to the case where $m \approx l$ and $l \gg 1$. In this case, the electron has an oscillating dipole moment given by
\[ \mathbf{d}(t) = q \, a \left( \cos\omega t \; \hat{e}_x + \sin\omega t \; \hat{e}_y \right) = q \, a \; \Re e \left[ \left( \hat{e}_x - i \, \hat{e}_y \right) \exp\left[ i \omega t \right] \right] \tag{682} \]
This rotating dipole moment can be decomposed into two orthogonal linear dipole moments which oscillate out of phase with each other.

Figure 20: A classical electron orbiting in the x-y plane (m = l) can be considered as producing two perpendicular linearly-oscillating electric-dipole moments.

It should be recalled that a classical oscillating (linear) electric dipole moment radiates power P(ω) into a solid angle $d\Omega_k$ with a distribution given by
\[ \left( \frac{dP}{d\Omega_k} \right)_{linear} = \frac{c}{8\pi} \left( \frac{\omega}{c} \right)^4 | \mathbf{d} |^2 \, \sin^2\Theta_{kd} \tag{683} \]
where $\Theta_{kd}$ is the angle between the detector and the direction of the electric dipole. On considering the radiation from the atom to be generated from two orthogonal linear oscillating dipoles, one finds
\[ \left( \frac{dP}{d\Omega_k} \right)_{dipole} = \frac{c}{8\pi} \left( \frac{\omega}{c} \right)^4 | \mathbf{d} |^2 \left( \sin^2\Theta_{kx} + \sin^2\Theta_{ky} \right) \tag{684} \]
which on using
\[ \cos\Theta_{kx} = \sin\theta_k \cos\varphi_k , \qquad \cos\Theta_{ky} = \sin\theta_k \sin\varphi_k \tag{685} \]
becomes
\[ \left( \frac{dP}{d\Omega_k} \right)_{dipole} = \frac{c}{8\pi} \left( \frac{\omega}{c} \right)^4 | \mathbf{d} |^2 \left( 1 + \cos^2\theta_k \right) \tag{686} \]

Figure 21: The polarization of the radiated electromagnetic field for an electron orbiting in the x-y plane (m = l) can be comprehended in terms of the classical radiation emanating from two linearly oscillating electric-dipole moments. The angles $\Theta_{kx}$ and $\Theta_{ky}$, respectively, are the angle between the emitted radiation and the x-axis and the angle subtended by the emitted radiation and the y-axis.
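The reduction of the two-dipole factor of eqn (684) to the form in eqn (686) relies only on the direction cosines of eqn (685). A short numerical sketch, with a hypothetical function name, makes the step explicit:

```python
import math


def two_dipole_factor(theta_k, phi_k):
    """sin^2(Theta_kx) + sin^2(Theta_ky), built from the direction cosines
    of eqn (685)."""
    cos_kx = math.sin(theta_k) * math.cos(phi_k)
    cos_ky = math.sin(theta_k) * math.sin(phi_k)
    return (1.0 - cos_kx ** 2) + (1.0 - cos_ky ** 2)


if __name__ == "__main__":
    for theta_k, phi_k in [(0.0, 0.0), (0.7, 1.3), (math.pi / 2, 2.0)]:
        print(theta_k, two_dipole_factor(theta_k, phi_k),
              1.0 + math.cos(theta_k) ** 2)
```

The result does not depend on $\varphi_k$, as expected for a circular orbit about the z-axis.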
Since the energy of the emitted photon is given by $\hbar\omega$, one finds the angular dependence of the semi-classical prediction of the decay rate is given by
\[ \frac{1}{\tau} \, d\Omega_k = \frac{e^2}{8 \pi \hbar a} \left( \frac{\omega a}{c} \right)^3 \left( 1 + \cos^2\theta_k \right) d\Omega_k \tag{687} \]
The polarization vector is parallel to the direction of the electric field, which in turn is given by the direction of the oscillating dipole that produced it. Hence, a detector which is arranged to accept radiation travelling in the direction $\hat{\mathbf{k}}$ will detect polarizations that are found by projecting the electron's orbit onto the plane perpendicular to $\hat{\mathbf{k}}$. For example, in this case where the electron's orbit is in the x-y plane, radiation emitted along the z-axis will be circularly polarized, whereas radiation emitted in the x-y plane will be linearly polarized.

Figure 22: The polarization at a field point can be determined by considering the projection of the electron's orbit on a plane perpendicular to the direction of emission k. The polarization vector of the classical EM wave follows the projected orbit of the dipole moment.
The angular dependence of the decay rate follows directly from the expressions of eqn (670) and eqn (678) by setting $m \approx l \gg 1$, replacing the radial matrix elements of r by a, adding the expressions and inserting them into eqn (661). The analysis shows that quantum mechanics reproduces the classical limit correctly, as is expected from the correspondence principle.
10.1.5 The Decay Rate from Dipole Transitions.
The decay rate due to dipole transitions includes processes in which photons of all polarizations are emitted in all directions. Accordingly, the decay rate is found by summing over all polarizations and integrating over the directions of the emitted photon. For a spherically symmetric system, the energy will be independent of the z-component of the orbital angular momentum. In this case, one should sum over all values of $m'$ corresponding to the degenerate final states. On summing over all final states corresponding to a specific $l'$ value, that is on summing over $m'$ where $m' = m, m \pm 1$, one finds that the transition rate can be expressed as
\[ \left( \frac{1}{\tau} \right) = \frac{ 4 e^2 \omega^3_{nl,n'l'} }{ 3 \hbar c^3 } \left\{ \begin{array}{c} \frac{ (l+1) }{ (2l+1) } \\ \frac{ l }{ (2l+1) } \end{array} \right\} \left| \int_0^\infty dr \, r^2 \, R_{n'l'}(r) \, r \, R_{nl}(r) \right|^2 \tag{688} \]
for
\[ l' = \left\{ \begin{array}{c} l + 1 \\ l - 1 \end{array} \right. \tag{689} \]
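It should be noted that, for a fixed $l'$, the lifetime of the state $| nlm \rangle$ is independent of the value of m. This is expected since the choice of the quantization direction is completely arbitrary.

Table 2: Radial wave functions $R_{nl}(\rho)$ for a hydrogen-like atom, where $\rho = \frac{Zr}{a}$. The functions are normalized so that $\int_0^\infty d\rho \, \rho^2 \, R_{nl} R_{nl} = 1$.
\[
\begin{array}{ccl}
n & l & R_{nl}(\rho) \\ \hline
1 & 0 & 2 \exp\left[ - \rho \right] \\
2 & 0 & \frac{1}{\sqrt{2}} \left( 1 - \frac{\rho}{2} \right) \exp\left[ - \frac{\rho}{2} \right] \\
2 & 1 & \frac{1}{2 \sqrt{6}} \, \rho \, \exp\left[ - \frac{\rho}{2} \right] \\
3 & 0 & \frac{2}{3^{3/2}} \left( 1 - \frac{2}{3} \rho + \frac{2}{27} \rho^2 \right) \exp\left[ - \frac{\rho}{3} \right] \\
3 & 1 & \frac{2^{5/2}}{3^{7/2}} \left( 1 - \frac{\rho}{6} \right) \rho \, \exp\left[ - \frac{\rho}{3} \right] \\
3 & 2 & \frac{2^{3/2}}{3^{9/2} \sqrt{5}} \, \rho^2 \, \exp\left[ - \frac{\rho}{3} \right]
\end{array}
\]

Table 3: Values of $\left| \int_0^\infty dr \, r^2 \, R_{nl} \, r \, R_{n'\,l-1} \right|^2$ in atomic units.
\[
\begin{array}{cc|l}
n, l & n', l-1 & \\ \hline
np & 1s & 2^{8} \, n^{7} \, (n-1)^{2n-5} \, (n+1)^{-2n-5} \\
   & 2s & 2^{17} \, n^{7} \, (n^2-1) \, (n-2)^{2n-6} \, (n+2)^{-2n-6} \\
   & 3s & 2^{8} \, 3^{7} \, n^{7} \, (n^2-1) \, (n-3)^{2n-8} \, (7n^2-27)^2 \, (n+3)^{-2n-8} \\ \hline
nd & 2p & 2^{19} \, 3^{-1} \, n^{9} \, (n^2-1) \, (n-2)^{2n-7} \, (n+2)^{-2n-7} \\
   & 3p & 2^{11} \, 3^{9} \, n^{9} \, (n^2-1) \, (n^2-4) \, (n-3)^{2n-8} \, (n+3)^{-2n-8} \\ \hline
nf & 3d & 2^{13} \, 3^{9} \, 5^{-1} \, n^{11} \, (n^2-1) \, (n^2-4) \, (n-3)^{2n-9} \, (n+3)^{-2n-9}
\end{array}
\]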
There are no selection rules associated with the radial integration in the dipole matrix elements
\[ \int_0^\infty dr \, r^2 \, R_{n'\,l-1} \, r \, R_{nl} \tag{690} \]
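The radial part of the dipole matrix element can be expressed in terms of the hypergeometric function F(a, b, c, x) via
\[ \begin{aligned} \int_0^\infty dr \, r^2 \, R_{n'\,l-1} \, r \, R_{nl} = {} & \frac{ a \, (-1)^{n'-l} }{ 4 \, (2l-1)! } \sqrt{ \frac{ (n+l)! \, (n'+l-1)! }{ (n-l-1)! \, (n'-l)! } } \; \frac{ (4 n' n)^{l+1} \, (n - n')^{n'+n-2l-2} }{ (n' + n)^{n'+n} } \\ & \times \left[ F\left( l+1-n , \, l-n' , \, 2l , \, - \frac{4 n' n}{(n'-n)^2} \right) - \left( \frac{n' - n}{n' + n} \right)^2 F\left( l-1-n , \, l-n' , \, 2l , \, - \frac{4 n' n}{(n'-n)^2} \right) \right] \end{aligned} \tag{691} \]
Simple analytic expressions for the squares of the matrix elements for small values of $(n', l)$ are shown in Table (3).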
The radial integrations were evaluated by Schrödinger$^{35}$ using the generating function expansion for the Laguerre polynomials. Eckart$^{36}$ and Gordon$^{37}$ have calculated these dipole matrix elements by other means. In general, the lifetime of the hydrogenic states increases with increasing n, varying roughly as $n^3$ for a fixed value of l. The decrease in the dipole matrix elements with increasing n is simply due to the increasing numbers of nodes in the radial wave functions.

$^{35}$ E. Schrödinger, Ann. der Phys. 79, 362 (1926).
The magnitude of the decay rate is estimated as
\[ \frac{1}{\tau} \sim \frac{c}{a} \left( \frac{\omega a}{c} \right)^3 \left( \frac{e^2}{\hbar c} \right) \tag{692} \]
where the magnitude of the dipole matrix element is estimated as e a, where a is the Bohr radius. On setting $\hbar\omega$ equal to the electrostatic energy of hydrogen, the remaining factor is estimated to have the magnitude
\[ \left( \frac{\omega a}{c} \right) \sim \left( \frac{e^2}{\hbar c} \right) \tag{693} \]
where the length scale a has dropped out. Hence, as
\[ \left( \frac{e^2}{\hbar c} \right) \approx \frac{1}{137.0359979} \tag{694} \]
one finds that the decay rate is given by
\[ \frac{1}{\tau} \sim \frac{c}{a} \left( \frac{e^2}{\hbar c} \right)^4 \tag{695} \]
so the decay time is approximately eight orders of magnitude larger than the
time taken for the photon to cross the atom. When averaged over l, the electric dipole decay rate is given by
\[ \frac{1}{\tau_n} \propto \sum_l \frac{ ( 2l + 1 ) }{ n^5 } \sim n^{-\frac{9}{2}} \tag{696} \]
so, as seen in Table (4), the decay is slower for the higher energy levels.
10.1.6 The 2p →1s Electric Dipole Transition Rate.
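$^{36}$ C. Eckart, Phys. Rev. 28, 927 (1926).
$^{37}$ W. Gordon, Ann. der Phys. 2, 1031 (1929).

Table 4: Electric Dipole Transition Rates for Hydrogen, in units of $10^8$ sec$^{-1}$.
\[
\begin{array}{cc|ccc}
\text{Initial} & \text{Final} & n = 1 & n = 2 & n = 3 \\ \hline
2p & ns & 6.25 & - & - \\
3s & np & - & 0.063 & - \\
3p & ns & 1.64 & 0.22 & - \\
3d & np & - & 0.64 & - \\
4s & np & - & 0.025 & 0.018 \\
4p & ns & 0.68 & 0.095 & 0.030 \\
4p & nd & - & - & 0.003 \\
4d & np & - & 0.204 & 0.070 \\
4f & nd & - & - & 0.137
\end{array}
\]

Consider the decay of the 2p state (with m = 0) to the 1s state in the hydrogen atom. As can be seen from Table (2), the initial state is described by an electronic wave function
\[ \psi_{2p}(\mathbf{r}) = \frac{1}{ 2 \sqrt{ 6 a^3 } } \left( \frac{r}{a} \right) \exp\left[ - \frac{1}{2} \frac{r}{a} \right] \sqrt{ \frac{3}{4\pi} } \, \cos\theta \tag{697} \]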
and the final electronic state is given by
\[ \psi_{1s}(\mathbf{r}) = \frac{2}{ \sqrt{a^3} } \exp\left[ - \frac{r}{a} \right] \frac{1}{ \sqrt{4\pi} } \tag{698} \]
where the length scale a is the Bohr radius
\[ a = \frac{ \hbar^2 }{ m e^2 } \tag{699} \]
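The decay rate in the Fermi-Golden rule, evaluated in the dipole approximation, is given by
\[ \frac{1}{\tau} = \frac{ 4 \, \omega^3_{1,2} \; \mathbf{d}^*_{1s,2p} \cdot \mathbf{d}_{1s,2p} }{ 3 \hbar c^3 } \tag{700} \]
The frequency is evaluated from
\[ \hbar \omega_{12} = E_{2p} - E_{1s} = \frac{ m e^4 }{ 2 \hbar^2 } \left( 1 - \frac{1}{4} \right) = \frac{3}{8} \, \frac{e^2}{a} \tag{701} \]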
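Hence,
\[ \begin{aligned} \frac{1}{\tau} &= \frac{4}{3} \left( \frac{3}{8} \right)^3 \left( \frac{e^2}{\hbar a} \right)^3 \frac{ e^2 a^2 }{ \hbar c^3 } \left| \frac{ \mathbf{d}_{1s,2p} }{ e a } \right|^2 \\ &= \frac{9}{128} \left( \frac{e^2}{\hbar c} \right)^4 \frac{c}{a} \left| \frac{ \mathbf{d}_{1s,2p} }{ e a } \right|^2 \end{aligned} \tag{702} \]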
Therefore, the decay rate is determined by the ratio $\frac{c}{a}$ but is also modified by the fourth power of the dimensionless electromagnetic coupling strength
\[ \left( \frac{e^2}{\hbar c} \right) \approx \frac{1}{137.0359979} \tag{703} \]
The smallness of this factor allows us to only consider the Fermi-Golden rule expression for the decay rate. The dimensionless dipole matrix elements are expected to be non-zero, since they obey the selection rules. They are non-zero, as can be directly verified by performing an integration. The only non-zero dipole matrix element originates from the z-component of the dipole
\[ \mathbf{d}_{1s,2p} = e \int d^3r \; \psi^*_{1s}(\mathbf{r}) \; \mathbf{r} \; \psi_{2p}(\mathbf{r}) \tag{704} \]
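since only the z-component satisfies the $\Delta m = 0$ selection rule. The angular integration is evaluated as
\[ \frac{ \sqrt{3} }{ 4\pi } \int_0^{2\pi} d\varphi \int_0^{\pi} d\theta \, \sin\theta \, \cos^2\theta = \frac{ \sqrt{3} }{ 4\pi } \; 2\pi \; \frac{2}{3} = \frac{1}{ \sqrt{3} } \tag{705} \]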
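and the radial integration yields
\[ \begin{aligned} \int_0^\infty dr \, r^2 \, R^*_{1s}(r) \, r \, R_{2p}(r) &= \frac{2}{ 2 \sqrt{6} } \int_0^\infty dr \, \frac{r^4}{a^4} \exp\left[ - \frac{3}{2} \frac{r}{a} \right] \\ &= \frac{a}{ \sqrt{6} } \left( \frac{2}{3} \right)^5 \int_0^\infty dx \, x^4 \exp\left[ - x \right] \\ &= a \, \frac{4!}{ \sqrt{6} } \left( \frac{2}{3} \right)^5 = 4 \, a \, \sqrt{6} \left( \frac{2}{3} \right)^5 \end{aligned} \tag{706} \]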
Hence, the magnitude of the dipole matrix element is evaluated as
\[ \frac{ d_{1s,2p} }{ e a } = 4 \sqrt{2} \left( \frac{2}{3} \right)^5 \tag{707} \]
Therefore, the dipole allowed decay rate is given by
\[ \frac{1}{\tau} = \left( \frac{2}{3} \right)^8 \left( \frac{e^2}{\hbar c} \right)^4 \frac{c}{a} \tag{708} \]
Hence, the time scale τ is of the order of $10^{-10}$ seconds. The exact value of the decay time is calculated to be $1.6 \times 10^{-9}$ seconds.
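The numerical value of the $2p \to 1s$ rate in eqn (708) can be reproduced with a few lines of arithmetic. The sketch below is illustrative only; the SI values used for c and for the Bohr radius a are assumptions supplied here and are not quoted in the text.

```python
from fractions import Fraction

ALPHA = 1 / 137.0359979        # e^2 / (hbar c), as quoted in eqn (703)
C = 2.99792458e8               # speed of light in m/s (assumed value)
BOHR_RADIUS = 5.29177e-11      # Bohr radius a in m (assumed value)


def dimensionless_prefactor():
    """(9/128) |d/(e a)|^2 from eqns (702) and (707), as an exact fraction."""
    dipole_sq = 32 * Fraction(2, 3) ** 10      # (4 sqrt(2) (2/3)^5)^2
    return Fraction(9, 128) * dipole_sq


def decay_rate_2p_1s():
    """Decay rate of eqn (708) in inverse seconds."""
    return float(dimensionless_prefactor()) * ALPHA ** 4 * C / BOHR_RADIUS


if __name__ == "__main__":
    rate = decay_rate_2p_1s()
    print("rate     =", rate, "per second")
    print("lifetime =", 1.0 / rate, "seconds")
```

With these inputs the rate comes out close to $6.3 \times 10^{8}$ sec$^{-1}$, corresponding to a lifetime of about $1.6 \times 10^{-9}$ seconds, consistent with the $2p \to 1s$ entry of Table 4.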
10.1.7 Electric Quadrupole and Magnetic Dipole Transitions.
Consider decays such as the 3d state (with m = 0) to the 1s state in the hydrogen atom. Since, in this transition, the change in the electron's angular momentum is two units, the transition is forbidden in the dipole approximation. Therefore, the transition rate is evaluated by keeping the next order term in the expansion
\[ \exp\left[ - i \, \mathbf{k} \cdot \mathbf{r} \right] \approx 1 - i \, \mathbf{k} \cdot \mathbf{r} + \ldots \tag{709} \]
The second term in the expansion describes electric quadrupole and magnetic dipole transitions.
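The matrix elements that have to be evaluated are of the form
\[ \langle n'l'm' | \, ( \mathbf{k} \cdot \mathbf{r} ) \, ( \hat{\epsilon}_\alpha(\mathbf{k}) \cdot \hat{\mathbf{p}} ) \, | nlm \rangle \tag{710} \]
This shall be written as the sum of two terms, with different symmetries with respect to interchange of $\mathbf{r}$ and $\hat{\mathbf{p}}$. These two terms will describe electric quadrupole and magnetic dipole transitions. The matrix elements are written as the sum of a term symmetric under the interchange of $\mathbf{r}$ and $\hat{\mathbf{p}}$ and a term that is anti-symmetric
\[ \begin{aligned} ( \mathbf{k} \cdot \mathbf{r} ) \, ( \hat{\epsilon}_\alpha(\mathbf{k}) \cdot \hat{\mathbf{p}} ) = {} & \frac{1}{2} \left[ ( \mathbf{k} \cdot \mathbf{r} ) \, ( \hat{\epsilon}_\alpha(\mathbf{k}) \cdot \hat{\mathbf{p}} ) + ( \mathbf{k} \cdot \hat{\mathbf{p}} ) \, ( \hat{\epsilon}_\alpha(\mathbf{k}) \cdot \mathbf{r} ) \right] \\ & + \frac{1}{2} \left[ ( \mathbf{k} \cdot \mathbf{r} ) \, ( \hat{\epsilon}_\alpha(\mathbf{k}) \cdot \hat{\mathbf{p}} ) - ( \mathbf{k} \cdot \hat{\mathbf{p}} ) \, ( \hat{\epsilon}_\alpha(\mathbf{k}) \cdot \mathbf{r} ) \right] \end{aligned} \tag{711} \]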
The first term represents the matrix elements for the electric quadrupole transitions$^{38}$, and the second term represents the matrix elements for the magnetic dipole transitions. The first term can be written as the scalar products of a symmetric dyadic
\[ \mathbf{k} \cdot \left( \mathbf{r} \, \hat{\mathbf{p}} + \hat{\mathbf{p}} \, \mathbf{r} \right) \cdot \hat{\epsilon}_\alpha(\mathbf{k}) \tag{712} \]
The scalar products are organized such that the left-most vector outside the parenthesis forms a scalar product with the left-most vectors within the parenthesis, and likewise with the right-most vectors. The electronic matrix elements only involve the dyadic operator, as the wave vector and polarization vectors are properties of the photon. The matrix elements
\[ \langle n'l'm' | \left( \mathbf{r} \, \hat{\mathbf{p}} + \hat{\mathbf{p}} \, \mathbf{r} \right) | nlm \rangle \tag{713} \]
are evaluated by first noting that
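\[ [ \mathbf{r} , \hat{\mathbf{p}}^2 ] = 2 i \hbar \, \hat{\mathbf{p}} \tag{714} \]
which allows the momentum operator to be written as
\[ \hat{\mathbf{p}} = \frac{i m}{\hbar} \, [ \hat{H} , \mathbf{r} ] \tag{715} \]
Therefore, the matrix elements of the dyadic can be expressed in the form of the matrix elements of the commutator with the dyadic
\[ \begin{aligned} \langle n'l'm' | \left( \mathbf{r} \, \hat{\mathbf{p}} + \hat{\mathbf{p}} \, \mathbf{r} \right) | nlm \rangle &= \frac{i m}{\hbar} \, \langle n'l'm' | \, \mathbf{r} \, [ \hat{H} , \mathbf{r} ] + [ \hat{H} , \mathbf{r} ] \, \mathbf{r} \, | nlm \rangle \\ &= \frac{i m}{\hbar} \, \langle n'l'm' | \, [ \hat{H} , \mathbf{r} \, \mathbf{r} ] \, | nlm \rangle \\ &= i m \, \frac{ ( E_{n'l'm'} - E_{nlm} ) }{ \hbar } \, \langle n'l'm' | \, \mathbf{r} \, \mathbf{r} \, | nlm \rangle \\ &= i m \, \omega_{n',n} \, \langle n'l'm' | \, \mathbf{r} \, \mathbf{r} \, | nlm \rangle \end{aligned} \tag{716} \]

$^{38}$ J. A. Gaunt and W. H. McCrea, Proc. Camb. Phil. Soc. 23, 930 (1927).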
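The decay rate in the Fermi-Golden rule, evaluated in the electric quadrupole approximation, is given by
\[ \begin{aligned} \frac{1}{\tau} &= \left( \frac{e}{m c} \right)^2 \int d^3k \; \frac{ m^2 c^2 \, \omega_k^2 }{ 8 \pi \, \omega_k } \sum_\alpha | \, \mathbf{k} \cdot \langle n'l'm' | \mathbf{r} \, \mathbf{r} | nlm \rangle \cdot \hat{\epsilon}_\alpha(\mathbf{k}) \, |^2 \; \delta( E_{nlm} - E_{n'l'm'} - \hbar \omega_k ) \\ &= \frac{e^2}{8 \pi \hbar} \left( \frac{ \omega_{nl,n'l'} }{ c } \right)^5 \int d\Omega_k \sum_\alpha | \, \hat{\mathbf{k}} \cdot \langle n'l'm' | \mathbf{r} \, \mathbf{r} | nlm \rangle \cdot \hat{\epsilon}_\alpha(\mathbf{k}) \, |^2 \end{aligned} \tag{717} \]
where, in the second line, k is restricted to have the magnitude
\[ k = \frac{ \omega_{nl,n'l'} }{ c } \tag{718} \]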
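The frequency is evaluated from
\[ \hbar \omega_{nl,n'l'} = E_{nl} - E_{n'l'} \sim \frac{ m e^4 }{ \hbar^2 } = m c^2 \left( \frac{e^2}{\hbar c} \right)^2 \sim \frac{e^2}{a} \tag{719} \]
Hence,
\[ \begin{aligned} \frac{1}{\tau} &\sim \left( \frac{e^2}{\hbar a c} \right)^5 \frac{e^2}{8 \pi \hbar} \int d\Omega_k \sum_\alpha | \, \hat{\mathbf{k}} \cdot \langle n'l'm' | \mathbf{r} \, \mathbf{r} | nlm \rangle \cdot \hat{\epsilon}_\alpha(\mathbf{k}) \, |^2 \\ &\sim \left( \frac{e^2}{\hbar c} \right)^6 \frac{c}{a} \end{aligned} \tag{720} \]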
Therefore, the decay rate is determined by the ratio $\frac{c}{a}$ but is also modified by the sixth power of the dimensionless electromagnetic coupling strength
\[ \left( \frac{e^2}{\hbar c} \right) \approx \frac{1}{137.0359979} \tag{721} \]
The smallness of this factor allows us to only consider the Fermi-Golden rule expression for the decay rate. The dimensionless quadrupole matrix elements are expected to be non-zero, since they obey the selection rules which involve the exchange of two units of angular momentum. They are non-zero, as can be directly verified by performing an integration. Therefore, the quadrupole allowed decay rate is given by
\[ \frac{1}{\tau} \sim \left( \frac{e^2}{\hbar c} \right)^6 \frac{c}{a} \tag{722} \]
Hence, the time scale τ is expected to be of the order of $10^{-6}$ seconds.$^{39}$
This type of transition is known as an electric quadrupole transition. Because of the transversality condition
\[ \hat{\mathbf{k}} \cdot \hat{\epsilon}_\alpha(\mathbf{k}) = 0 \tag{723} \]
one can add a diagonal term to the dyadic without affecting the result. A diagonal term with a magnitude that makes the resulting dyadic traceless is added to the dyadic, leading to the expression
\[ Q_{i,j} = e \left( x_i \, x_j - \frac{1}{3} \, \delta_{i,j} \, | \mathbf{r} |^2 \right) \tag{724} \]
Therefore, the transition rate is governed by the electric quadrupole tensor
\[ \langle n'l'm' | \, Q_{i,j} \, | nlm \rangle \tag{725} \]
The symmetric dyadic $Q_{i,j}$ has six inequivalent components which, because of the restriction that the dyadic is traceless, can be reduced to five independent components. Due to the transformational properties of the dyadic under rotation, it can be expressed as a linear combination of the spherical harmonics $Y^2_m(\theta, \varphi)$ and nothing else.$^{40}$ This can be seen from rewriting the quadrupole tensor $\tilde{Q}$
\[ \frac{ \tilde{Q} }{ e } = \begin{pmatrix} xx - \frac{r^2}{3} & xy & xz \\ yx & yy - \frac{r^2}{3} & yz \\ zx & zy & zz - \frac{r^2}{3} \end{pmatrix} \tag{726} \]
in terms of spherical polar coordinates
\[ \frac{ \tilde{Q} }{ e \, r^2 } = \begin{pmatrix} \frac{1}{2} \sin^2\theta \cos 2\varphi - \frac{1}{6} ( 3 \cos^2\theta - 1 ) & \frac{1}{2} \sin^2\theta \sin 2\varphi & \sin\theta \cos\theta \cos\varphi \\ \frac{1}{2} \sin^2\theta \sin 2\varphi & - \frac{1}{2} \sin^2\theta \cos 2\varphi - \frac{1}{6} ( 3 \cos^2\theta - 1 ) & \sin\theta \cos\theta \sin\varphi \\ \sin\theta \cos\theta \cos\varphi & \sin\theta \cos\theta \sin\varphi & \frac{1}{3} ( 3 \cos^2\theta - 1 ) \end{pmatrix} \tag{727} \]

$^{39}$ This estimate will be modified upwards by several orders of magnitude, due to the presence of a large dimensionless factor that was not accounted for.
$^{40}$ The transformational properties of the dyadic follow immediately from the transformational properties of the vector r.
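The entries of eqn (727) can be checked against the Cartesian definition of eqn (724) at arbitrary angles. The following is a minimal numerical sketch; the function names are hypothetical, and the tensor is expressed in units of $e r^2$.

```python
import math


def quadrupole_cartesian(theta, phi):
    """Q / (e r^2) from eqn (724), using the unit vector along r."""
    n = (math.sin(theta) * math.cos(phi),
         math.sin(theta) * math.sin(phi),
         math.cos(theta))
    return [[n[i] * n[j] - (1.0 / 3.0 if i == j else 0.0) for j in range(3)]
            for i in range(3)]


def quadrupole_polar(theta, phi):
    """Q / (e r^2) written out entry by entry as in eqn (727)."""
    s2 = math.sin(theta) ** 2
    sc = math.sin(theta) * math.cos(theta)
    p2 = (3.0 * math.cos(theta) ** 2 - 1.0)
    return [
        [0.5 * s2 * math.cos(2 * phi) - p2 / 6.0,
         0.5 * s2 * math.sin(2 * phi),
         sc * math.cos(phi)],
        [0.5 * s2 * math.sin(2 * phi),
         -0.5 * s2 * math.cos(2 * phi) - p2 / 6.0,
         sc * math.sin(phi)],
        [sc * math.cos(phi), sc * math.sin(phi), p2 / 3.0],
    ]


if __name__ == "__main__":
    q = quadrupole_polar(0.9, 2.1)
    print("trace =", q[0][0] + q[1][1] + q[2][2])
```

The trace vanishes identically, which is the property that reduces the six components of the symmetric dyadic to five independent ones.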
The presence of states with orbital angular momentum of only two makes the dyadic an irreducible second rank tensor. Application of the Wigner-Eckart theorem to an irreducible second rank tensor results in the electric quadrupole selection rules
\[ l + l' \geq 2 \geq | l - l' | \tag{728} \]
The angular momentum carried away by the photon consists of the photon's spin of one unit together with one unit of orbital angular momentum, which is carried by the component of the photon's wave function described by the spherical Bessel function $j_1(kr) \sim k r$. In addition to the angular momentum selection rules, there are parity selection rules for the electric quadrupole transitions. Since the parity operator satisfies
\[ \hat{\mathcal{P}} \, \mathbf{r} \, \hat{\mathcal{P}}^{-1} = - \mathbf{r} \tag{729} \]
then the electric quadrupole matrix elements satisfy
\[ \langle n' | \, \mathbf{r} \, \mathbf{r} \, | n \rangle = \langle n' | \, \hat{\mathcal{P}} \, \mathbf{r} \, \mathbf{r} \, \hat{\mathcal{P}}^{-1} | n \rangle = p_{n'} \, p_n \, \langle n' | \, \mathbf{r} \, \mathbf{r} \, | n \rangle \tag{730} \]
Therefore, the parity does not change in an electric quadrupole transition as $p_{n'} \, p_n = 1$.
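The magnetic dipole matrix elements are given by
\[ \frac{1}{2} \left[ ( \mathbf{k} \cdot \mathbf{r} ) \, ( \hat{\epsilon}_\alpha(\mathbf{k}) \cdot \hat{\mathbf{p}} ) - ( \mathbf{k} \cdot \hat{\mathbf{p}} ) \, ( \hat{\epsilon}_\alpha(\mathbf{k}) \cdot \mathbf{r} ) \right] \tag{731} \]
which can be rewritten as
\[ \frac{1}{2} \left[ ( \mathbf{k} \wedge \hat{\epsilon}_\alpha(\mathbf{k}) ) \cdot ( \mathbf{r} \wedge \hat{\mathbf{p}} ) \right] \tag{732} \]
The first factor is of the form
\[ \mathbf{B} = \nabla \wedge \mathbf{A} \to \mathbf{k} \wedge \mathbf{A} \to \mathbf{k} \wedge \hat{\epsilon}_\alpha(\mathbf{k}) \tag{733} \]
and the second factor
\[ \mathbf{r} \wedge \mathbf{p} \tag{734} \]
is the orbital angular momentum. The orbital angular momentum produces the orbital magnetic moment given by
\[ \mu = \frac{e}{2 m c} \, ( \mathbf{r} \wedge \mathbf{p} ) \tag{735} \]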
The magnetic dipole transition should be extended from orbital angular momentum to include the spin magnetic moment, which is of the same order
\[ \left( \frac{e \hbar}{2 m c} \right) \sigma \cdot ( \mathbf{k} \wedge \hat{\epsilon}_\alpha(\mathbf{k}) ) \sim \left( \frac{e}{2 m c} \right) ( \mathbf{r} \wedge \mathbf{p} ) \cdot ( \mathbf{k} \wedge \hat{\epsilon}_\alpha(\mathbf{k}) ) \tag{736} \]
since orbital angular momentum is quantized in units of $\hbar$. The angular momentum selection rule for the magnetic dipole transition is given by
\[ \Delta l = 0 \tag{737} \]
and
\[ 1 \geq | \Delta m | \tag{738} \]
Also, parity does not change.
Terms with higher-order orbital angular momentum that occur in the expansion of the photon's wave function $\exp[ i \, \mathbf{k} \cdot \mathbf{r} ]$ can be found by using the Rayleigh expansion. The terms with orbital angular momentum l are proportional to the spherical Bessel functions $j_l(kr)$, which vary as $(kr)^l$ when $kr \to 0$, as is found from the expansion of the exponential term. The presence of the extra factors $k^l$ in the matrix element has the result that the electric $2^s$-th multipole transition rates are found to vary as
\[ \frac{1}{\tau} \propto \left( \frac{ \omega_{n,n'} }{ c } \right)^{2s+1} a^{2s} \tag{739} \]
where s is the magnitude of the change in the electronic orbital angular momentum, which satisfies the inequality
\[ ( l + l' ) \geq s \geq | l' - l | \tag{740} \]
The extra factors from the photon's angular momentum result in an overall decrease in the electric multipole transition rate by a factor of $\left( \frac{e^2}{\hbar c} \right)^{2s}$. It should also be noted that the relative strength of the higher-order electric multipole transitions increases more rapidly with Z than that of the electric dipole transitions. Therefore, it is frequently found that the quadrupole transitions cannot be neglected for the heavy elements. Alternatively, higher-order multipole transitions do become important in the x-ray region, since in this region the wavelength of the radiation is comparable to the spatial extent of the charged particle's wave function.
10.1.8 The 3d →1s Electric Quadrupole Transition Rate
The transition 3d → 1s is forbidden to occur via the dipole process, since it involves a change of l by two units. It may occur as an electric quadrupole transition. The electric quadrupole transition rate can be expressed as
\[ \frac{1}{\tau} = \frac{1}{8 \pi \hbar} \left( \frac{\omega}{c} \right)^5 \int d\Omega_k \sum_\alpha | \, \hat{\mathbf{k}} \cdot \langle 1s | \tilde{Q} | 3d \rangle \cdot \hat{\epsilon}_\alpha(\mathbf{k}) \, |^2 \tag{741} \]
where $\tilde{Q}$ represents the quadrupole tensor. The frequency factor can be evaluated as
\[ \left( \frac{\omega}{c} \right) = \frac{4}{9 a} \left( \frac{e^2}{\hbar c} \right) \tag{742} \]
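hence, the rate can be expressed in the form
\[ \frac{1}{\tau} = \frac{c}{8 \pi a} \left( \frac{4}{9} \right)^5 \left( \frac{e^2}{\hbar c} \right)^6 \int d\Omega_k \sum_\alpha \frac{1}{e^2 a^4} \, | \, \hat{\mathbf{k}} \cdot \langle 1s | \tilde{Q} | 3d \rangle \cdot \hat{\epsilon}_\alpha(\mathbf{k}) \, |^2 \tag{743} \]
We shall consider the transition from the m = 0 state of the 3d level to the 1s state. As can be easily shown, the matrix elements of the quadrupole tensor for this transition are diagonal and are given by
\[ \langle 1s | \tilde{Q} | 3d \rangle = \langle 1s | \begin{pmatrix} - \frac{Q_{zz}}{2} & 0 & 0 \\ 0 & - \frac{Q_{zz}}{2} & 0 \\ 0 & 0 & Q_{zz} \end{pmatrix} | 3d \rangle \tag{744} \]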
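Therefore, the transition matrix elements are of the form
\[ \int d\Omega_k \sum_\alpha \frac{1}{e^2 a^4} \, | \, \hat{\mathbf{k}} \cdot \langle 1s | \tilde{Q} | 3d \rangle \cdot \hat{\epsilon}_\alpha(\mathbf{k}) \, |^2 = \int d\Omega_k \sum_\alpha \frac{ | \langle 1s | Q_{zz} | 3d \rangle |^2 }{ e^2 a^4 } \left( \hat{k}_z \, \hat{\epsilon}_\alpha(\mathbf{k})_z - \frac{1}{2} \, \hat{k}_x \, \hat{\epsilon}_\alpha(\mathbf{k})_x - \frac{1}{2} \, \hat{k}_y \, \hat{\epsilon}_\alpha(\mathbf{k})_y \right)^2 \tag{745} \]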
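The direction of the emitted photon $\hat{\mathbf{k}}$ is expressed as
\[ \hat{\mathbf{k}} = ( \sin\theta_k \cos\varphi_k , \; \sin\theta_k \sin\varphi_k , \; \cos\theta_k ) \tag{746} \]
and the polarization vectors are given by
\[ \begin{aligned} \hat{\epsilon}_1(\mathbf{k}) &= ( \cos\theta_k \cos\varphi_k , \; \cos\theta_k \sin\varphi_k , \; - \sin\theta_k ) \\ \hat{\epsilon}_2(\mathbf{k}) &= ( - \sin\varphi_k , \; \cos\varphi_k , \; 0 ) \end{aligned} \tag{747} \]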
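Thus, for the m = 0 level, one finds that the integral over the angular
distribution is given by
$$ \int d\Omega_{k} \sum_{\alpha} \frac{1}{e^{2} a^{4}} \left| \, \hat{\mathbf{k}} \cdot \langle 1s | \tilde{Q} | 3d \rangle \cdot \hat{\epsilon}_{\alpha}(\mathbf{k}) \, \right|^{2} = \int d\Omega_{k} \frac{ | \langle 1s | Q_{zz} | 3d \rangle |^{2} }{e^{2} a^{4}} \left( - \frac{3}{2} \sin\theta_{k} \cos\theta_{k} \right)^{2} = \frac{3}{10} \, 4 \pi \, \frac{ | \langle 1s | Q_{zz} | 3d \rangle |^{2} }{e^{2} a^{4}} \tag{748} $$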
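The angular factor in eqn (748) can be checked numerically. The sketch below is only an illustration: the function name and the midpoint-rule grid size are choices made here, not part of the notes. It uses the fact that only the polarization $\hat{\epsilon}_{1}$ contributes and that the integrand is independent of $\varphi_{k}$.

```python
import math

def quadrupole_angular_integral(n_theta=2000):
    """Integrate (-(3/2) sin(theta) cos(theta))^2 over the full solid angle.

    The integrand does not depend on phi, so the phi integral
    contributes a factor 2*pi. A midpoint rule is used in theta.
    """
    h = math.pi / n_theta
    total = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * h
        amplitude = -1.5 * math.sin(theta) * math.cos(theta)
        total += amplitude ** 2 * math.sin(theta) * h   # sin(theta) from dOmega
    return 2.0 * math.pi * total

value = quadrupole_angular_integral()
expected = 0.3 * 4.0 * math.pi      # the factor (3/10) 4 pi quoted in eqn (748)
print(value, expected)
```

The two printed numbers should agree to within the accuracy of the quadrature.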
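The transition rate becomes
$$ \frac{1}{\tau} = \frac{3}{20} \left( \frac{4}{9} \right)^{5} \frac{c}{a} \left( \frac{e^{2}}{\hbar c} \right)^{6} \frac{ | \langle 1s | Q_{zz} | 3d \rangle |^{2} }{e^{2} a^{4}} \tag{749} $$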
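The dimensionless quadrupole matrix elements are evaluated as the product of
an angular integral and a radial integral
$$ \frac{ \langle 1s | Q_{zz} | 3d \rangle }{e a^{2}} = \int d\Omega \; Y^{0}_{0}(\theta,\varphi)^{*} \; \frac{1}{3} ( 3 \cos^{2}\theta - 1 ) \; Y^{2}_{0}(\theta,\varphi) \int_{0}^{\infty} dr \; r^{2} \; R^{*}_{1s}(r) \, \frac{r^{2}}{a^{2}} \, R_{3d}(r) = - \frac{1}{3} \sqrt{\frac{4}{5}} \; 4 \left( \frac{3}{4} \right)^{9/2} \tag{750} $$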
Finally, one finds the resulting expression for the quadrupole decay rate of the
3d state with m = 0
$$ \frac{1}{\tau} = \frac{1}{3600} \, \frac{c}{a} \left( \frac{e^{2}}{\hbar c} \right)^{6} \tag{751} $$
which is evaluated as 228 sec$^{-1}$. From the above analysis, it is seen that
the angular distribution for the emitted photon is governed by the factor
$$ \cos^{2}\theta_{k} \, \sin^{2}\theta_{k} \tag{752} $$
and the intensity is largest for the cone with $\theta_{k} \approx 0.28 \pi$ or
$0.72 \pi$. This angular dependence of the emitted radiation is the same as
found by considering the radiation from an oscillating classical quadrupole,
for which the radiated power is given by
$$ \left( \frac{dP}{d\Omega_{k}} \right)_{quad} = \frac{c}{288 \pi} \left( \frac{\omega}{c} \right)^{6} Q^{2} \cos^{2}\theta_{k} \, \sin^{2}\theta_{k} \tag{753} $$
However, the angular distribution of the emitted quadrupole radiation is to
be contrasted with the ∆m = 0 decay of a 2p electron, that has an angular
distribution of emitted photons given by
$$ \sin^{2}\theta_{k} \tag{754} $$
which is maximum for $\theta_{k} = \frac{\pi}{2}$.
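The quoted cone angles can be reproduced with a short numerical search. This is a sketch under an assumption made here, not stated in the notes: that the angles 0.28π and 0.72π refer to the intensity per unit polar angle, so that the factor of eqn (752) is weighted by the extra $\sin\theta_{k}$ from the solid-angle element. Without that weight, the factor $\cos^{2}\theta_{k}\sin^{2}\theta_{k}$ alone peaks at π/4 and 3π/4.

```python
import math

def weighted_intensity(theta):
    """cos^2(theta) sin^2(theta) from eqn (752), times sin(theta) from dOmega."""
    return math.cos(theta) ** 2 * math.sin(theta) ** 3

n = 100000
# Search the upper hemisphere only; the lower one follows by symmetry.
thetas = [0.5 * math.pi * i / n for i in range(n + 1)]
theta_max = max(thetas, key=weighted_intensity)

# Setting the derivative to zero gives tan^2(theta) = 3/2.
analytic = math.atan(math.sqrt(1.5))
print(theta_max / math.pi, analytic / math.pi)
```

Both numbers come out close to 0.28, and the mirror angle π − θ gives the second cone near 0.72π.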
The 3d → 1s quadrupole decay rate will be dwarfed by dipole-allowed cascade
emission processes such as 3d → 2p followed by 2p → 1s. Therefore, the
intensity of the emitted light corresponding to the 3d → 1s process is expected
to be extremely weak. However, the quadrupole line is expected to be much
more readily observed in absorption spectra.

Figure 23: The angular distribution of quadrupole radiation for the ∆m = 0
transition, as a function of polar angle $\theta_{k}$.
10.1.9 Two-photon decay of the 2s state of Hydrogen.
The 2s state of the hydrogen atom cannot decay via the paramagnetic interaction
taken to first order since, as can be shown, the matrix elements that govern
the emission intensity vanish
$$ \langle 1s | \, \hat{\epsilon}_{\alpha}(\mathbf{k}) \cdot \hat{\mathbf{p}} \, \exp[ - i \, \mathbf{k} \cdot \mathbf{r} ] \, | 2s \rangle = 0 \tag{755} $$
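First, on integrating by parts, the matrix elements can be written as
$$ i \hbar \, \langle 1s | \, \hat{\epsilon}_{\alpha}(\mathbf{k}) \cdot \nabla \exp[ - i \, \mathbf{k} \cdot \mathbf{r} ] \, | 2s \rangle = + \, i \hbar \, \langle 2s | \, \exp[ + i \, \mathbf{k} \cdot \mathbf{r} ] \, \hat{\epsilon}_{\alpha}(\mathbf{k}) \cdot \nabla \, | 1s \rangle^{*} \tag{756} $$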
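On utilizing the expression for the 1s wave function
$$ \psi_{1s}(\mathbf{r}) = \frac{1}{\sqrt{\pi a^{3}}} \exp\left[ - \frac{r}{a} \right] \tag{757} $$
one finds that
$$ \nabla \psi_{1s}(\mathbf{r}) = - \frac{\mathbf{r}}{r} \, \frac{\psi_{1s}(\mathbf{r})}{a} \tag{758} $$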
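Hence, the transition matrix element is given by
$$ - \, i \, \frac{\hbar}{a} \int d^{3}r \; \psi_{2s}(\mathbf{r}) \, \exp[ - i \, \mathbf{k} \cdot \mathbf{r} ] \, \frac{ \hat{\epsilon}_{\alpha}(\mathbf{k}) \cdot \mathbf{r} }{r} \, \psi^{*}_{1s}(\mathbf{r}) \tag{759} $$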
The matrix elements can be simply evaluated in spherical polar coordinates if
one chooses the direction of k as the polar axis. The plane wave, therefore,
only depends on the z-component of r and since
$$ \hat{\epsilon}_{\alpha}(\mathbf{k}) \cdot \mathbf{k} = 0 \tag{760} $$
the factor $\hat{\epsilon}_{\alpha}(\mathbf{k}) \cdot \mathbf{r}$ only depends
on x and y and is antisymmetric with respect to the transformations x → − x and
y → − y. All other factors are even functions of x and y. On integrating over
the directions in the x − y plane, one finds that the integral is identically
zero.
The above result could have been (partially) anticipated by considering the
selection rules. The electric dipole transition is forbidden by parity. The
magnetic dipole transition is zero in this non-relativistic treatment. All
magnetic and electric quadrupole and higher multipole transitions are forbidden
by angular momentum conservation.
The 2s state decays via two-photon emission which is described by the
diamagnetic interaction and by the effect of the paramagnetic interaction taken
to second-order in time-dependent perturbation theory. Since only the part of
the paramagnetic interaction that creates a photon is involved, for our
purposes the paramagnetic interaction can be replaced by
$$ \hat{H}_{para} \rightarrow - \frac{q}{m c} \sum_{\mathbf{k},\alpha} \left( \frac{2 \pi \hbar c^{2}}{V \omega_{k}} \right)^{\frac{1}{2}} \hat{\mathbf{p}} \cdot \hat{\epsilon}_{\alpha}(\mathbf{k}) \; a^{\dagger}_{\mathbf{k},\alpha} \exp[ - i \, \mathbf{k} \cdot \mathbf{r} ] \tag{761} $$
Likewise, the diamagnetic interaction can be replaced by
$$ \hat{H}_{dia} \rightarrow \frac{q^{2}}{2 m c^{2}} \sum_{\mathbf{k},\alpha;\mathbf{k}',\alpha'} \left( \frac{2 \pi \hbar c^{2}}{V} \right) \frac{ \hat{\epsilon}_{\alpha}(\mathbf{k}) \cdot \hat{\epsilon}_{\alpha'}(\mathbf{k}') }{ \sqrt{ \omega_{k} \omega_{k'} } } \; a^{\dagger}_{\mathbf{k},\alpha} \, a^{\dagger}_{\mathbf{k}',\alpha'} \exp[ - i \, ( \mathbf{k} + \mathbf{k}' ) \cdot \mathbf{r} ] \tag{762} $$
for two-photon emission.

Figure 24: The one-photon emission part of the paramagnetic interaction.

Figure 25: The two-photon emission part of the diamagnetic interaction.

The system is assumed to be initially in an eigenstate $| n \rangle$ of the
unperturbed Hamiltonian but, due to the interaction $\hat{H}_{int}$, makes
transitions to states $| n' \rangle$. Following the usual procedure of
time-dependent perturbation theory, the state $| \psi_{n} \rangle$ can be
decomposed in terms of a complete set of non-interacting energy eigenstates
$| n' \rangle$ via
$$ | \psi_{n} \rangle = \sum_{n'} C_{n'}(t) \, | n' \rangle \tag{763} $$
where $C_{n'}(t)$ are time-dependent coefficients. The probability of finding
the system in the final state $| n' \rangle$ at time t is then given by
$| C_{n'}(t) |^{2}$. The rate at which the transition n → n' occurs is then
given by the time-derivative of $| C_{n'}(t) |^{2}$.
It shall be assumed that the interaction is slowly turned on when t → − ∞.
The interaction can be reduced to zero at large negative times by introducing a
multiplicative factor of exp[ + η t ] in the interaction, where η is an
infinitesimally small positive constant. To first-order in the diamagnetic
interaction, one finds
$$ C^{(1)}_{n'}(t) = \left( - \frac{i}{\hbar} \right) \langle n' | \exp[ - i ( \mathbf{k}' + \mathbf{k} ) \cdot \mathbf{r} ] | n \rangle \left( \frac{q^{2}}{2 m c^{2}} \right) \left( \frac{2 \pi \hbar c^{2}}{ \sqrt{ \omega_{k} \omega_{k'} } \, V } \right) 2 \, \hat{\epsilon}_{\alpha}(\mathbf{k}) \cdot \hat{\epsilon}_{\alpha'}(\mathbf{k}') \int_{-\infty}^{t} dt' \exp\left[ \frac{i}{\hbar} ( \hbar\omega' + \hbar\omega + E_{n'} - E_{n} ) \, t' \right] \tag{764} $$
where ω = c k and ω' = c k' are the energies of the two photons in the final
state. The small quantity η has been absorbed as a small imaginary part to the
initial state energy
$$ E_{n} \rightarrow E_{n} + i \, \eta \, \hbar \tag{765} $$
The paramagnetic interaction is of order q and the diamagnetic interaction
is of order $q^{2}$. Thus, to second-order in q, one must include the
diamagnetic interaction and the paramagnetic interaction to second-order. There
are two second-order terms which represent:
(a) emission of the photon (k, α) followed by the emission of a photon (k', α').
(b) emission of a photon (k', α') followed by the emission of the photon (k, α).

Figure 26: The two-photon emission processes due to the paramagnetic
interaction to second-order.
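The second-order contribution to the transition amplitude is given by
$$ C^{(2)}_{n'}(t) = \left( - \frac{i}{\hbar} \right)^{2} \left( \frac{q}{m c} \right)^{2} \left( \frac{2 \pi \hbar c^{2}}{ \sqrt{ \omega_{k} \omega_{k'} } \, V } \right) \int_{-\infty}^{t} dt' \int_{-\infty}^{t'} dt'' \; \Bigg[ \sum_{n''} \exp\left[ \frac{i}{\hbar} ( E_{n'} + \hbar\omega' - E_{n''} ) \, t' \right] \exp\left[ \frac{i}{\hbar} ( E_{n''} + \hbar\omega - E_{n} ) \, t'' \right] \langle n'l'm' | \hat{\epsilon}_{\alpha'}(\mathbf{k}') \cdot \hat{\mathbf{p}} | n''l''m'' \rangle \langle n''l''m'' | \hat{\epsilon}_{\alpha}(\mathbf{k}) \cdot \hat{\mathbf{p}} | nlm \rangle $$
$$ \qquad + \sum_{n''} \exp\left[ \frac{i}{\hbar} ( E_{n'} + \hbar\omega - E_{n''} ) \, t' \right] \exp\left[ \frac{i}{\hbar} ( E_{n''} + \hbar\omega' - E_{n} ) \, t'' \right] \langle n'l'm' | \hat{\epsilon}_{\alpha}(\mathbf{k}) \cdot \hat{\mathbf{p}} | n''l''m'' \rangle \langle n''l''m'' | \hat{\epsilon}_{\alpha'}(\mathbf{k}') \cdot \hat{\mathbf{p}} | nlm \rangle \Bigg] \tag{766} $$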
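The earliest time integration can be evaluated leading to
$$ C^{(2)}_{n'}(t) = \left( - \frac{i}{\hbar} \right) \left( \frac{q}{m c} \right)^{2} \left( \frac{2 \pi \hbar c^{2}}{ \sqrt{ \omega_{k} \omega_{k'} } \, V } \right) \int_{-\infty}^{t} dt' \sum_{n''} \exp\left[ \frac{i}{\hbar} ( \hbar\omega' + \hbar\omega + E_{n'} - E_{n} ) \, t' \right] $$
$$ \qquad \times \Bigg[ \frac{ \langle n'l'm' | \hat{\epsilon}_{\alpha'}(\mathbf{k}') \cdot \hat{\mathbf{p}} | n''l''m'' \rangle \langle n''l''m'' | \hat{\epsilon}_{\alpha}(\mathbf{k}) \cdot \hat{\mathbf{p}} | nlm \rangle }{ ( E_{n} - E_{n''} - \hbar\omega ) } + \frac{ \langle n'l'm' | \hat{\epsilon}_{\alpha}(\mathbf{k}) \cdot \hat{\mathbf{p}} | n''l''m'' \rangle \langle n''l''m'' | \hat{\epsilon}_{\alpha'}(\mathbf{k}') \cdot \hat{\mathbf{p}} | nlm \rangle }{ ( E_{n} - E_{n''} - \hbar\omega' ) } \Bigg] \tag{767} $$
as long as the denominators are non-vanishing. Both photons are emitted, so
the two photon energies enter the remaining exponent with the same sign, as in
the first-order amplitude.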
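The coefficients $C^{(1)}_{n'}(t)$ and $C^{(2)}_{n'}(t)$ have the same type of
time-dependence. The remaining integration over time yields
$$ \left( - \frac{i}{\hbar} \right) \int_{-\infty}^{t} dt' \exp\left[ \frac{i}{\hbar} ( \hbar\omega' + \hbar\omega + E_{n'} - E_{n} - i \hbar \eta ) \, t' \right] = - \frac{ \exp\left[ \frac{i}{\hbar} ( \hbar\omega' + \hbar\omega + E_{n'} - E_{n} - i \hbar \eta ) \, t \right] }{ ( \hbar\omega' + \hbar\omega + E_{n'} - E_{n} - i \hbar \eta ) } \tag{768} $$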
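The transition rate is given by
$$ \frac{1}{\tau} = \frac{\partial}{\partial t} \left( \left| C^{(1)}_{n'}(t) + C^{(2)}_{n'}(t) \right|^{2} \right) \tag{769} $$
but the time-dependence of the squared modulus is contained in the common
factor
$$ \left| \frac{ \exp\left[ \frac{i}{\hbar} ( \hbar\omega' + \hbar\omega + E_{n'} - E_{n} - i \hbar \eta ) \, t \right] }{ ( \hbar\omega' + \hbar\omega + E_{n'} - E_{n} - i \hbar \eta ) } \right|^{2} = \frac{ \exp[ 2 \eta t ] }{ ( \hbar\omega' + \hbar\omega + E_{n'} - E_{n} )^{2} + \hbar^{2} \eta^{2} } \tag{770} $$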
Since the momenta and polarizations of the emitted photons are not measured,
the rate is summed over (k, α) and (k', α'). Therefore, the transition rate is
given by the expression
$$ \frac{1}{\tau} = \sum_{\mathbf{k},\alpha;\mathbf{k}',\alpha'} \frac{ 2 \eta \exp[ 2 \eta t ] }{ ( \hbar\omega' + \hbar\omega + E_{n'} - E_{n} )^{2} + \hbar^{2} \eta^{2} } \; | M |^{2} \tag{771} $$
where the matrix elements M are due to the combined effect of the diamagnetic
interaction and the paramagnetic interaction taken to second-order. That is,
$$ M = \langle 1s \; \mathbf{k}\alpha , \mathbf{k}'\alpha' | \hat{H}_{dia} | 2s \rangle + \sum_{n''l''m''} \frac{ \langle 1s \; \mathbf{k}\alpha , \mathbf{k}'\alpha' | \hat{H}_{para} | n''l''m'' \; \mathbf{k}'\alpha' \rangle \langle n''l''m'' \; \mathbf{k}'\alpha' | \hat{H}_{para} | 2s \rangle }{ E_{2s} - E_{n''l''m''} - \hbar\omega_{k'} } + \sum_{n''l''m''} \frac{ \langle 1s \; \mathbf{k}\alpha , \mathbf{k}'\alpha' | \hat{H}_{para} | n''l''m'' \; \mathbf{k}\alpha \rangle \langle n''l''m'' \; \mathbf{k}\alpha | \hat{H}_{para} | 2s \rangle }{ E_{2s} - E_{n''l''m''} - \hbar\omega_{k} } \tag{772} $$
These three terms add coherently, and it should be noted that the intermediate
state is only a virtual state and it can have a higher energy than the 2s
state [41].

[41] Due to the Lamb shift, there is a 2p state with slightly lower energy than
the 2s state. However, due to the small magnitude of the energy difference, the
part of the decay process involving any real 2p transition is negligibly small.
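In the limit η → 0 the first factor in the expression for the transition rate
of eqn (771) reduces to a delta function which expresses conservation of energy
between the initial and final states,
$$ \lim_{\eta \rightarrow 0} \; \frac{1}{\pi} \, \frac{ \hbar \eta \exp[ 2 \eta t ] }{ ( \hbar\omega' + \hbar\omega + E_{n'} - E_{n} )^{2} + \hbar^{2} \eta^{2} } = \delta( E_{2s} - E_{1s} - \hbar\omega_{k} - \hbar\omega_{k'} ) \tag{773} $$
where the factor of ħ in the numerator normalizes the Lorentzian to unity and
accounts for the prefactor 2π/ħ in the resulting rate.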
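In the limit η → 0 the transition rate reduces to the Fermi-Golden rule
expression
$$ \frac{1}{\tau} = \frac{2 \pi}{\hbar} \sum_{\mathbf{k},\alpha;\mathbf{k}',\alpha'} | M |^{2} \; \delta( E_{2s} - E_{1s} - \hbar\omega_{k} - \hbar\omega_{k'} ) \tag{774} $$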
The emitted photons have continuous spectra. In the expression for the matrix
elements M, the last two terms differ in the time-order that the two photons
are emitted. On inserting the expressions for the interactions into M, one can
pull out the common factors leaving a dimensionless matrix element M'. This
leads to the expression
$$ M = \left( \frac{q^{2}}{2 m c^{2}} \right) \left( \frac{2 \pi \hbar c^{2}}{V} \right) \frac{1}{ \sqrt{ \omega_{k} \omega_{k'} } } \; M' \tag{775} $$
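where M' is the dimensionless factor given by
$$ M' = \hat{\epsilon}_{\alpha}(\mathbf{k}) \cdot \hat{\epsilon}_{\alpha'}(\mathbf{k}') \; \langle 1s | \exp[ - i ( \mathbf{k} + \mathbf{k}' ) \cdot \mathbf{r} ] | 2s \rangle $$
$$ \qquad + \frac{2}{m} \sum_{n''l''m''} \frac{ \langle 1s | \hat{\epsilon}_{\alpha}(\mathbf{k}) \cdot \hat{\mathbf{p}} \exp[ - i \mathbf{k} \cdot \mathbf{r} ] | n''l''m'' \rangle \langle n''l''m'' | \hat{\epsilon}_{\alpha'}(\mathbf{k}') \cdot \hat{\mathbf{p}} \exp[ - i \mathbf{k}' \cdot \mathbf{r} ] | 2s \rangle }{ E_{2s} - E_{n''l''m''} - \hbar\omega_{k'} } $$
$$ \qquad + \frac{2}{m} \sum_{n''l''m''} \frac{ \langle 1s | \hat{\epsilon}_{\alpha'}(\mathbf{k}') \cdot \hat{\mathbf{p}} \exp[ - i \mathbf{k}' \cdot \mathbf{r} ] | n''l''m'' \rangle \langle n''l''m'' | \hat{\epsilon}_{\alpha}(\mathbf{k}) \cdot \hat{\mathbf{p}} \exp[ - i \mathbf{k} \cdot \mathbf{r} ] | 2s \rangle }{ E_{2s} - E_{n''l''m''} - \hbar\omega_{k} } \tag{776} $$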
The first term is negligible, since $1 \gg \mathbf{k} \cdot \mathbf{r}$ and the
electronic eigenstates are orthogonal. The order of magnitude of the second
term is given by the electronic kinetic energy divided by the excitation
energy. Hence, the reduced matrix elements have a magnitude of the order of
unity. The transition rate is given by
$$ \frac{1}{\tau} = \left( \frac{e^{2}}{2 m c^{2}} \right)^{2} \frac{ ( 2 \pi )^{3} }{ V^{2} } \sum_{\mathbf{k},\alpha;\mathbf{k}',\alpha'} \frac{ \hbar c^{2} }{ k \, k' } \; | M' |^{2} \; \delta( E_{2s} - E_{1s} - \hbar\omega_{k} - \hbar\omega_{k'} ) \tag{777} $$
One can assume that the dipole matrix elements of the intermediate states
should be randomly oriented in space, since the initial and final electronic
states are isotropic. After summing over the polarizations, the transition rate
becomes isotropic. On setting
$$ \sum_{\alpha,\alpha'} | M' |^{2} \approx 1 \tag{778} $$
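one finds
$$ \frac{1}{\tau} = \left( \frac{e^{2}}{2 m c^{2}} \right)^{2} \frac{ \hbar c^{2} }{ ( 2 \pi )^{3} } \int \frac{d^{3}k}{k} \int \frac{d^{3}k'}{k'} \; \delta( E_{2s} - E_{1s} - \hbar\omega_{k} - \hbar\omega_{k'} ) \tag{779} $$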
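Since the integrand is independent of the direction of k and k', the angular
integrations can be performed leaving
$$ \frac{1}{\tau} = \left( \frac{e^{2}}{m c^{2}} \right)^{2} \frac{ \hbar c^{2} }{ 2 \pi } \int_{0}^{\infty} dk \; k \int_{0}^{\infty} dk' \; k' \; \delta( E_{2s} - E_{1s} - \hbar\omega_{k} - \hbar\omega_{k'} ) \tag{780} $$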
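On integrating over k', one obtains
$$ \frac{1}{\tau} = \left( \frac{e^{2}}{m c^{2}} \right)^{2} \frac{c}{2 \pi} \int_{0}^{\omega_{12}/c} dk \; k \left( \frac{\omega_{12}}{c} - k \right) \tag{781} $$
where $\omega_{12}$ is related to the energy difference of the 1s and 2s states
by $\hbar \omega_{12} = E_{2s} - E_{1s}$. An elementary integration,
$\int_{0}^{K} dk \, k ( K - k ) = K^{3}/6$, yields
$$ \frac{1}{\tau} = \left( \frac{e^{2}}{m c^{2}} \right)^{2} \frac{c}{12 \pi} \left( \frac{\omega_{12}}{c} \right)^{3} \tag{782} $$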
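The first factor has dimensions of length squared and can be recognized as the
square of the classical radius of the electron. However, since
$$ \frac{\omega_{12}}{c} = \frac{3}{8} \, \frac{e^{2}}{\hbar c \, a} \tag{783} $$
and
$$ a = \frac{\hbar^{2}}{m e^{2}} \tag{784} $$
or
$$ \frac{e^{2}}{m c^{2}} = \frac{a \, e^{4}}{\hbar^{2} c^{2}} \tag{785} $$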
one finds the decay rate is approximated by
$$ \frac{1}{\tau} = \frac{1}{12 \pi} \left( \frac{3}{8} \right)^{3} \frac{c}{a} \left( \frac{e^{2}}{\hbar c} \right)^{7} \tag{786} $$
Thus, the estimated decay rate is 8.75 sec$^{-1}$. The exact value calculated
by Shapiro and Breit [42] is 8.266 sec$^{-1}$.

[42] J. Shapiro and G. Breit, Phys. Rev. 113, 179 (1959).
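The estimate of eqn (786) is easy to evaluate numerically. The following is a minimal sketch; the constant names and the rounded values of c, the Bohr radius a and the fine-structure constant are inputs chosen here, not taken from the notes.

```python
import math

C_LIGHT = 2.99792458e8        # speed of light in m/s
BOHR_RADIUS = 5.29177e-11     # Bohr radius a in m (rounded)
ALPHA = 1.0 / 137.036         # fine-structure constant e^2 / (hbar c)

def two_photon_rate_estimate():
    """Order-of-magnitude 2s -> 1s two-photon decay rate from eqn (786), in 1/sec."""
    prefactor = (3.0 / 8.0) ** 3 / (12.0 * math.pi)
    return prefactor * (C_LIGHT / BOHR_RADIUS) * ALPHA ** 7

rate = two_photon_rate_estimate()
print(rate)
```

With these inputs the result lies between 8 and 9 sec$^{-1}$, in line with the estimate quoted above and within about ten per cent of the value of Shapiro and Breit.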
10.1.10 The Absorption of Radiation
If a process occurs in which only a photon with quantum numbers (k, α) is
absorbed, then the numbers of quanta in the initial and final state of the
electromagnetic field are given by
$$ n'_{\mathbf{k},\alpha} = n_{\mathbf{k},\alpha} - 1 , \qquad n'_{\mathbf{k}',\beta} = n_{\mathbf{k}',\beta} \tag{787} $$
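The matrix elements of the paramagnetic interaction are given by
$$ \langle n'l'm' \, \{ n'_{\mathbf{k}',\beta} \} | \hat{H}_{para} | nlm \, \{ n_{\mathbf{k}',\beta} \} \rangle = - \left( \frac{q}{m c} \right) \sum_{\mathbf{k},\alpha} \sqrt{ n_{\mathbf{k},\alpha} } \, \sqrt{ \frac{2 \pi \hbar c^{2}}{V \omega_{k}} } \; \langle n'l'm' | \hat{\mathbf{p}} \cdot \hat{\epsilon}_{\alpha}(\mathbf{k}) \exp[ + i \, \mathbf{k} \cdot \mathbf{r} ] | nlm \rangle \tag{788} $$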
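The photon absorption rate is found from the Fermi-Golden rule expression
$$ \frac{1}{\tau} = \frac{2 \pi}{\hbar} \left( \frac{q}{m c} \right)^{2} \left( \frac{2 \pi \hbar c^{2}}{V \omega_{k}} \right) n_{\mathbf{k},\alpha} \sum_{n'l'm'} \delta( E_{nlm} + \hbar\omega_{k} - E_{n'l'm'} ) \; \left| \langle n'l'm' | \hat{\mathbf{p}} \cdot \hat{\epsilon}_{\alpha}(\mathbf{k}) \exp[ + i \, \mathbf{k} \cdot \mathbf{r} ] | nlm \rangle \right|^{2} \tag{789} $$
This is related to the lifetime due to stimulated emission, if the initial and
final states are interchanged.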
The scattering cross-section for photon absorption $\sigma_{absorb}(\omega)$ is
found by relating the number of photons absorbed (per second) to the product of
the incident flux and the cross-section. The photon flux is given by the photon
density times the velocity of light
$$ \mathbf{j} = \frac{ n_{\mathbf{k},\alpha} }{V} \; c \; \hat{\mathbf{k}} \tag{790} $$
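Hence, the cross-section can be written as
$$ \sigma_{absorb}(\omega_{k}) = \left( \frac{V}{ n_{\mathbf{k},\alpha} \, c } \right) \frac{2 \pi}{\hbar} \left( \frac{q}{m c} \right)^{2} \left( \frac{2 \pi \hbar c^{2}}{V \omega_{k}} \right) n_{\mathbf{k},\alpha} \sum_{n'l'm'} \delta( E_{nlm} + \hbar\omega_{k} - E_{n'l'm'} ) \; \left| \langle n'l'm' | \hat{\mathbf{p}} \cdot \hat{\epsilon}_{\alpha}(\mathbf{k}) \exp[ + i \, \mathbf{k} \cdot \mathbf{r} ] | nlm \rangle \right|^{2} \tag{791} $$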
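which simplifies to
$$ \sigma_{absorb}(\omega_{k}) = \left( \frac{4 \pi^{2} e^{2}}{ m^{2} \omega_{k} \, c } \right) \sum_{n'l'm'} \delta( E_{nlm} + \hbar\omega_{k} - E_{n'l'm'} ) \; \left| \langle n'l'm' | \hat{\mathbf{p}} \cdot \hat{\epsilon}_{\alpha}(\mathbf{k}) \exp[ + i \, \mathbf{k} \cdot \mathbf{r} ] | nlm \rangle \right|^{2} \tag{792} $$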
The absorption cross-section is independent of the volume of the
electromagnetic cavity and the number of photons in the incident beam. As a
function of frequency, the Born approximation for the cross-section for photon
absorption contains delta function lines corresponding to the atomic excitation
energies. Measured absorption lines do have natural widths
$\Delta\omega_{nl,n'l'}$ and the absorption spectra can be approximated by the
sums of Lorentzian functions. The widths of the lines are governed by half the
sum of the decay rates of the initial and final electronic levels.
$$ \Delta\omega_{nl,n'l'} = \frac{1}{2} \left( \frac{1}{\tau_{nl}} + \frac{1}{\tau_{n'l'}} \right) \tag{793} $$

Figure 27: A sketch of the photon absorption cross-section σ(ω) (in units of
$\hbar^{2}/m$) as a function of photon energy ħω (in units of Rydbergs). The
plot over-emphasizes the role of the photon lifetimes, since the ratio of the
linewidth to the photon frequency is of the order of $( e^{2} / \hbar c )^{3}$.
This formula implies that rapidly decaying levels will yield broad lines, but
does not imply the converse [43]. The spectral widths can be described by the
inclusion of the effects of interaction to higher orders [44]. The higher-order
processes produce small shifts of the atomic energy levels and also give the
energies small imaginary parts, resulting in a Lorentzian line shape. Since a
typical atomic transition rate is of the order of $10^{8}$ sec$^{-1}$ and a
typical photon frequency is of the order of $10^{15}$ sec$^{-1}$, the widths of
the lines can usually be neglected.
The absorption cross-section can be evaluated in the dipole approximation
$$ \sigma_{absorb}(\omega_{k}) = \left( \frac{4 \pi^{2} e^{2}}{ m^{2} \omega_{k} \, c } \right) \sum_{n'l'm'} \delta( E_{nlm} + \hbar\omega_{k} - E_{n'l'm'} ) \; \left| \langle n'l'm' | \hat{\mathbf{p}} \cdot \hat{\epsilon}_{\alpha}(\mathbf{k}) | nlm \rangle \right|^{2} \tag{794} $$

[43] Lines in the absorption spectra with weak intensities can be broad if the
final states are rapidly decaying.
[44] V. F. Weisskopf and E. Wigner, Zeit. für Physik, 63, 54 (1930), Zeit. für
Physik, 65, 18 (1930).
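which can be rewritten as
$$ \sigma_{absorb}(\omega_{k}) = \left( \frac{4 \pi^{2} e^{2} \omega_{k}}{c} \right) \sum_{n'l'm'} \delta( E_{nlm} + \hbar\omega_{k} - E_{n'l'm'} ) \; \left| \langle n'l'm' | \mathbf{r} \cdot \hat{\epsilon}_{\alpha}(\mathbf{k}) | nlm \rangle \right|^{2} \tag{795} $$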
For an isotropic medium, the electronic states are degenerate with respect to
the z-components of the orbital angular momentum, so the initial state
(n, l, m) should be averaged over the different values of m
$$ \frac{1}{( 2 l + 1 )} \sum_{m=-l}^{l} \tag{796} $$
and the values of m' for the final states are summed over all possible values.
This averaging process results in an isotropic absorption rate, and is
equivalent to averaging the polarization vector over all directions in space.
Therefore, in the dipole approximation, the absorption cross-section for an
isotropic medium is given by the expression
$$ \sigma_{absorb}(\omega) = \frac{4 \pi^{2}}{3} \left( \frac{e^{2}}{\hbar c} \right) \sum_{n'l'm'} \omega_{n'l',nl} \; | \langle n'l'm' | \mathbf{r} | nlm \rangle |^{2} \; \delta( \omega_{n'l',nl} - \omega ) \tag{797} $$
The strength of each absorption line can be found by integrating the
cross-section over a narrow frequency range centered on the frequency of the
absorption line. (More specifically, the width of the interval of integration
must be greater than the natural linewidth.) The integrated intensity of the
transition (nlm) → (n'l'm') is given by
$$ \int_{\omega_{nl,n'l'} - \epsilon}^{\omega_{nl,n'l'} + \epsilon} d\omega \; \sigma_{absorb}(\omega) = \frac{4 \pi^{2}}{3} \left( \frac{e^{2}}{\hbar c} \right) \omega_{n'l',nl} \; | \langle n'l'm' | \mathbf{r} | nlm \rangle |^{2} \tag{798} $$
The intensity of each line is proportional to the "oscillator strength"
$f_{nl \rightarrow n'l'}$ defined as
$$ f_{nl \rightarrow n'l'} = \frac{ 2 m \, \omega_{n'l',nl} }{ \hbar } \; | \langle n'l'm' | \mathbf{r} | nlm \rangle |^{2} \tag{799} $$
The intensities and the frequencies of all the transitions are related via sum
rules [45]. These sum rules involve quantities of the form
$$ \sum_{n'l'm'} \omega^{p}_{n'l'm',nlm} \; | \langle n'l'm' | \mathbf{r} | nlm \rangle |^{2} \tag{800} $$
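and have values given in Table 5. The sum rules can be used to provide
checks of experimental data.

Table 5: Sum Rules for Dipole Transitions

  p     $\sum_{n'l'm'} \omega^{p}_{n'l'm',nlm} \, | \langle n'l'm' | \mathbf{r} | nlm \rangle |^{2}$
  0     $\langle nlm | r^{2} | nlm \rangle$
  1     $\frac{3 \hbar}{2 m}$
  2     $\frac{2}{m} \left( E_{nlm} - \langle nlm | V | nlm \rangle \right)$
  3     $\frac{\hbar}{2 m^{2}} \langle nlm | \nabla^{2} V | nlm \rangle$

(The p = 3 entry carries a factor $1/m^{2}$, as required on dimensional
grounds.)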
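The structure of these sum rules can be illustrated with a model for which every matrix element is known in closed form. The sketch below uses a one-dimensional harmonic oscillator as a stand-in for the atom; that choice is an assumption made here, as are the parameter values and function names. For a single Cartesian component the p = 1 rule reads ħ/2m rather than 3ħ/2m, and the p = 3 rule is taken in the dimensionally consistent form $(\hbar/2m^{2}) \langle V'' \rangle$.

```python
import math

HBAR, MASS, OMEGA = 1.0, 2.0, 1.5    # illustrative values (assumptions)
N_BASIS = 30                          # number of oscillator levels kept

def x_element(i, j):
    """Matrix element <i| x |j> of the 1D harmonic oscillator."""
    if abs(i - j) != 1:
        return 0.0
    return math.sqrt(HBAR / (2.0 * MASS * OMEGA)) * math.sqrt(max(i, j))

def energy(n):
    return HBAR * OMEGA * (n + 0.5)

def moment(n, p):
    """Sum over n' of omega_{n'n}^p |<n'|x|n>|^2, omega_{n'n} = (E_n' - E_n)/hbar."""
    return sum(((energy(k) - energy(n)) / HBAR) ** p * x_element(k, n) ** 2
               for k in range(N_BASIS))

n = 3
mean_x2 = HBAR / (2.0 * MASS * OMEGA) * (2 * n + 1)      # <x^2>
kinetic = 0.5 * energy(n)                                 # <T> = E - <V> = E/2
rules = {
    0: mean_x2,
    1: HBAR / (2.0 * MASS),
    2: (2.0 / MASS) * kinetic,
    3: HBAR / (2.0 * MASS ** 2) * (MASS * OMEGA ** 2),   # V'' = m omega^2
}
for p, closed_form in rules.items():
    print(p, moment(n, p), closed_form)
```

Each printed pair should agree, which also shows how only the two neighbouring levels contribute for this model.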
Sum Rules for Dipole Radiation
There exists a systematic way of deriving sum rules for the weighted
intensities of the dipole-allowed transitions. The sum rules are of the form
$$ \sum_{n'l'm'} \omega^{p}_{nl,n'l'} \; | \langle nlm | \hat{A} | n'l'm' \rangle |^{2} \tag{801} $$
where
$$ \hbar \, \omega_{nl,n'l'} = E_{nl} - E_{n'l'} \tag{802} $$
and p is a positive integer.
Consider the expression
$$ F(t) = \langle nlm | \hat{A}(t) \, \hat{A}^{\dagger}(0) | nlm \rangle \tag{803} $$
where the operator $\hat{A}(t)$ is given in the Heisenberg representation
$$ \hat{A}(t) = \exp\left[ + \frac{i}{\hbar} \hat{H}_{0} t \right] \hat{A} \, \exp\left[ - \frac{i}{\hbar} \hat{H}_{0} t \right] \tag{804} $$

[45] W. Thomas, Naturwiss. 11, 527 (1925).
F. Reiche and W. Thomas, Zeit. für Physik, 34, 510 (1925).
W. Kuhn, Zeit. für Physik, 33, 408 (1925).
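Then, on taking successive derivatives of F(t) with respect to t, one finds
$$ \left( \frac{\partial F}{\partial t} \right) = \frac{i}{\hbar} \, \langle nlm | \, [ \hat{H}_{0} , \hat{A}(t) ] \, \hat{A}^{\dagger}(0) \, | nlm \rangle \tag{805} $$
and
$$ \left( \frac{\partial^{2} F}{\partial t^{2}} \right) = \left( \frac{i}{\hbar} \right)^{2} \langle nlm | \, [ \hat{H}_{0} , [ \hat{H}_{0} , \hat{A}(t) ] ] \, \hat{A}^{\dagger}(0) \, | nlm \rangle \tag{806} $$
etc. This process shows that the pth derivative is expressed as p nested
commutators
$$ \left( \frac{\partial^{p} F}{\partial t^{p}} \right) = \left( \frac{i}{\hbar} \right)^{p} \langle nlm | \, [ \hat{H}_{0} , [ \ldots [ \hat{H}_{0} , [ \hat{H}_{0} , \hat{A}(t) ] ] \ldots ] ] \, \hat{A}^{\dagger}(0) \, | nlm \rangle \tag{807} $$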
Alternatively, one can insert a complete set of states in the definition for
F(t) yielding
$$ F(t) = \sum_{n'l'm'} \langle nlm | \hat{A}(t) | n'l'm' \rangle \langle n'l'm' | \hat{A}^{\dagger}(0) | nlm \rangle \tag{808} $$
but since the states $| nlm \rangle$ are eigenstates of $\hat{H}_{0}$, one has
$$ F(t) = \sum_{n'l'm'} \exp[ \, i \, \omega_{nl,n'l'} \, t \, ] \; \left| \langle nlm | \hat{A} | n'l'm' \rangle \right|^{2} \tag{809} $$
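On taking the pth derivative of this form of F(t), one finds
$$ \left( \frac{\partial^{p} F}{\partial t^{p}} \right) = \sum_{n'l'm'} i^{p} \, \omega^{p}_{nl,n'l'} \exp[ \, i \, \omega_{nl,n'l'} \, t \, ] \; \left| \langle nlm | \hat{A} | n'l'm' \rangle \right|^{2} \tag{810} $$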
The sum rules are found by equating the two forms of the pth time-derivative
and then setting t = 0
$$ \sum_{n'l'm'} \left( E_{nl} - E_{n'l'} \right)^{p} \left| \langle nlm | \hat{A} | n'l'm' \rangle \right|^{2} = \langle nlm | \, [ \hat{H}_{0} , [ \ldots , [ \hat{H}_{0} , [ \hat{H}_{0} , \hat{A} ] ] \ldots ] ] \, \hat{A}^{\dagger} \, | nlm \rangle \tag{811} $$
Hence, the pth moment of the matrix elements of $\hat{A}$ is related to the
expectation value of the product of the pth nested commutator of $\hat{H}_{0}$
and $\hat{A}$ multiplied by $\hat{A}^{\dagger}$.
The expectation value of p nested commutators of $\hat{A}$ can be expressed as
the expectation value of the product of p − q nested commutators of $\hat{A}$
with q nested commutators of $\hat{A}^{\dagger}$. This can be demonstrated by
noting that the expectation value is homogeneous in time. The qth nested
commutator of the operator $\hat{A}$ can be defined by
$$ \hat{B}_{q} = [ \hat{H}_{0} , \hat{B}_{q-1} ] \tag{812} $$
where
$$ \hat{B}_{0} = \hat{A} \tag{813} $$
ˆ
C
q
can be deﬁned as the qth nested commutator of
ˆ
A
†
. However, for
any pair of operators
ˆ
B
p−q−1
and
ˆ
C
q
, one has
< nlm [
ˆ
B
p−q−1
(t)
ˆ
C
q
(0) [ nlm >
=
n
l
m
exp
_
i ω
nl,n
l
t
_
< nlm [
ˆ
B
p−q−1
[ n
l
m
> < n
l
m
[
ˆ
C
q
[ nlm >
= < nlm [
ˆ
B
p−q−1
(0)
ˆ
C
q
(−t) [ nlm > (814)
This is an expression of the homogeneity of time. Hence, on taking a derivative
with respect to t and then setting t = 0, one finds
$$ \langle nlm | \, [ \hat{H}_{0} , \hat{B}_{p-q-1} ] \, \hat{C}_{q} \, | nlm \rangle = ( - 1 ) \, \langle nlm | \, \hat{B}_{p-q-1} \, [ \hat{H}_{0} , \hat{C}_{q} ] \, | nlm \rangle \tag{815} $$
On using the definition of the operators $\hat{B}_{p}$ and $\hat{C}_{q}$, the
above equation reduces to
$$ \langle nlm | \hat{B}_{p-q} \, \hat{C}_{q} | nlm \rangle = ( - 1 ) \, \langle nlm | \hat{B}_{p-q-1} \, \hat{C}_{q+1} | nlm \rangle \tag{816} $$
By induction, this shows that the nested commutators can be distributed
between the two sides of the expression,
$$ \langle nlm | \hat{B}_{p} \, \hat{C}_{0} | nlm \rangle = ( - 1 )^{q} \, \langle nlm | \hat{B}_{p-q} \, \hat{C}_{q} | nlm \rangle \tag{817} $$
which was to be shown.
10.1.11 The Photoelectric Eﬀect
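The differential scattering cross-section for the absorption of a photon by a
hydrogen atom in the ground state accompanied by the emission of an electron
shall be derived. For emitted electrons with sufficiently high energies, the
wave function for the photo-emitted electron can be approximated by a plane
wave. The transition rate is given by the Fermi-Golden rule expression
involving the paramagnetic interaction
$$ \frac{1}{\tau} = \frac{2 \pi}{\hbar} \left( \frac{e}{m c} \right)^{2} \left( \frac{2 \pi \hbar c^{2}}{V \omega_{k}} \right) n_{\mathbf{k},\alpha} \sum_{\mathbf{k}'} \left| \langle \mathbf{k}' | \, \hat{\mathbf{p}} \cdot \hat{\epsilon}_{\alpha}(\mathbf{k}) \exp[ \, i \, \mathbf{k} \cdot \mathbf{r} \, ] \, | 1s \rangle \right|^{2} \delta\left( E_{1s} + \hbar\omega_{k} - \frac{\hbar^{2} k'^{2}}{2 m} \right) \tag{818} $$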
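The cross-section is given by
$$ \sigma = \sum_{\mathbf{k}'} \left( \frac{4 \pi^{2} e^{2}}{ m^{2} \omega_{k} \, c } \right) \left| \langle \mathbf{k}' | \, \hat{\mathbf{p}} \cdot \hat{\epsilon}_{\alpha}(\mathbf{k}) \exp[ \, i \, \mathbf{k} \cdot \mathbf{r} \, ] \, | 1s \rangle \right|^{2} \delta\left( E_{1s} + \hbar\omega_{k} - \frac{\hbar^{2} k'^{2}}{2 m} \right) \tag{819} $$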
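where the initial wave function is given by
$$ \psi_{1s}(\mathbf{r}) = \frac{1}{\sqrt{\pi a^{3}}} \exp\left[ - \frac{r}{a} \right] \tag{820} $$
As long as the emitted electron is not close to threshold, the final state wave
function can be approximated by a plane wave
$$ \psi_{\mathbf{k}'}(\mathbf{r}) = \frac{1}{\sqrt{V}} \exp[ \, i \, \mathbf{k}' \cdot \mathbf{r} \, ] \tag{821} $$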
The sum over final states of the electron can be replaced by an integral over
the magnitude of its momentum and its direction
$$ \sum_{\mathbf{k}'} \rightarrow \frac{V}{( 2 \pi )^{3}} \int_{0}^{\infty} dk' \; k'^{2} \int d\Omega_{k'} \tag{822} $$
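It is seen that the factor of the volume in the density of final states cancels
with the factors from the normalization of the electron's final state. The
differential cross-section corresponds to the part of the cross-section where
the outgoing electron is emitted into the solid angle $d\Omega_{k'}$. Hence,
$$ \frac{d\sigma}{d\Omega'} = \frac{V}{2 \pi} \left( \frac{e^{2}}{ m^{2} \omega_{k} \, c } \right) \int_{0}^{\infty} dk' \; k'^{2} \left| \langle \mathbf{k}' | \, \hat{\mathbf{p}} \cdot \hat{\epsilon}_{\alpha}(\mathbf{k}) \exp[ \, i \, \mathbf{k} \cdot \mathbf{r} \, ] \, | 1s \rangle \right|^{2} \delta\left( E_{1s} + \hbar\omega_{k} - \frac{\hbar^{2} k'^{2}}{2 m} \right) \tag{823} $$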
The integration over the magnitude of the electron's final momentum k' can be
performed by using the properties of the energy-conserving delta function. The
magnitude of the electron's final momentum is denoted by $k_{f}$
$$ k_{f}^{2} = \frac{2 m}{\hbar^{2}} \left( \hbar\omega_{k} + E_{1s} \right) \tag{824} $$
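The result of the integration over k' is
$$ \frac{d\sigma}{d\Omega'} = \frac{V}{2 \pi \hbar^{2}} \left( \frac{e^{2}}{ m \, \omega_{k} \, c } \right) k_{f} \left| \langle k_{f} \, d\Omega' | \, \hat{\mathbf{p}} \cdot \hat{\epsilon}_{\alpha}(\mathbf{k}) \exp[ \, i \, \mathbf{k} \cdot \mathbf{r} \, ] \, | 1s \rangle \right|^{2} \tag{825} $$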
It is assumed that the initial photon is propagating along the x-axis and is
polarized along the z-direction. The matrix elements involving the momentum
operator only yield a finite result when $\hat{\mathbf{p}}$ acts on
$\psi_{1s}(\mathbf{r})$, since $\mathbf{k} \cdot \hat{\epsilon}_{\alpha}(\mathbf{k}) = 0$.
However,
$$ \hat{\epsilon}_{\alpha}(\mathbf{k}) \cdot \hat{\mathbf{p}} \; \psi_{1s}(\mathbf{r}) = i \, \frac{\hbar \cos\theta}{a} \, \psi_{1s}(\mathbf{r}) \tag{826} $$
which results in the replacement $\hat{p}_{z} \rightarrow i \, \frac{\hbar \cos\theta}{a}$.
Thus, one finds
$$ \frac{d\sigma}{d\Omega'} = \frac{V}{2 \pi} \left( \frac{e^{2}}{ m \, \omega_{k} \, c \, a^{2} } \right) k_{f} \left| \langle k_{f} \, d\Omega' | \cos\theta \exp[ \, i \, \mathbf{k} \cdot \mathbf{r} \, ] | 1s \rangle \right|^{2} \tag{827} $$

Figure 28: The geometry for the photoemission of an electron from an atom.
An electromagnetic wave, with polarization along the z-axis, is incident along
the x-axis. The photo-emitted electron propagates along the direction k'.
where (θ, ϕ) are the polar coordinates of the vector r. The matrix elements are
evaluated using the dipole approximation for the photon wave function, in which
one sets
$$ \exp[ \, i \, \mathbf{k} \cdot \mathbf{r} \, ] \approx 1 + i \, \mathbf{k} \cdot \mathbf{r} + \ldots \tag{828} $$
and only keeps the first term of the expansion. The factor cos θ can be
expressed as a spherical harmonic through
$$ \cos\theta = \sqrt{\frac{4 \pi}{3}} \; Y^{1}_{0}(\theta,\varphi) \tag{829} $$
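and the final state electronic wave function can be expressed in terms of the
Rayleigh expansion
$$ \exp[ \, i \, k_{f} \, \hat{\mathbf{k}}' \cdot \mathbf{r} \, ] = 4 \pi \sum_{l,m} i^{l} \, j_{l}( k_{f} r ) \; Y^{l}_{m}(\theta,\varphi) \; Y^{l \, *}_{m}(\theta',\varphi') \tag{830} $$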
where (θ', ϕ') are the polar coordinates of the electron's final momentum. The
angular integration over the polar coordinates (θ, ϕ) can be performed by using
the orthogonality relations for the spherical harmonics. The end result is
$$ \langle k_{f} \, d\Omega' | \cos\theta | 1s \rangle = - \, 4 \pi \, i \, \cos\theta' \; \frac{1}{\sqrt{ \pi a^{3} V }} \int_{0}^{\infty} dr \; r^{2} \, j_{1}( k_{f} r ) \exp\left[ - \frac{r}{a} \right] \tag{831} $$
where the cos θ' dependence refers to the direction of the emitted electron's
momentum. The radial integral is evaluated to yield
$$ \langle k_{f} \, d\Omega' | \cos\theta | 1s \rangle = - \, 4 \pi \, i \, \cos\theta' \, \sqrt{ \frac{a^{3}}{\pi V} } \; \frac{ 2 \, k_{f} a }{ ( 1 + k_{f}^{2} a^{2} )^{2} } \tag{832} $$
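The radial integral that links eqn (831) to eqn (832) can be checked by direct quadrature. In this sketch the Simpson grid, the cut-off radius and the function names are choices made here. The closed form being tested is $\int_{0}^{\infty} dr \, r^{2} j_{1}(k r) e^{-r/a} = 2 k a^{4} / ( 1 + k^{2} a^{2} )^{2}$, which reproduces the last factor of eqn (832) once the normalization $1/\sqrt{\pi a^{3} V}$ is included.

```python
import math

def j1(x):
    """Spherical Bessel function j_1(x) = sin(x)/x^2 - cos(x)/x."""
    if abs(x) < 1e-4:
        return x / 3.0                  # leading small-x behaviour
    return math.sin(x) / x ** 2 - math.cos(x) / x

def radial_integral(k, a, r_max_in_a=60.0, n=20000):
    """Simpson-rule estimate of the integral of r^2 j_1(k r) exp(-r/a) from 0 to r_max."""
    h = r_max_in_a * a / n              # n must be even for Simpson's rule
    total = 0.0
    for i in range(n + 1):
        r = i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * r * r * j1(k * r) * math.exp(-r / a)
    return total * h / 3.0

def closed_form(k, a):
    return 2.0 * k * a ** 4 / (1.0 + k * k * a * a) ** 2

for k, a in [(0.5, 1.0), (2.0, 1.0), (3.0, 0.5)]:
    print(k, a, radial_integral(k, a), closed_form(k, a))
```

The exponential factor makes the contribution beyond 60 a negligible, so truncating the upper limit does not affect the comparison.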
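Therefore, the differential cross-section is given by
$$ \left( \frac{d\sigma}{d\Omega'} \right) = 8 \left( \frac{k_{f}}{k} \right) \left( \frac{e^{2} a}{m c^{2}} \right) \cos^{2}\theta' \left( \frac{ 2 \, k_{f} a }{ ( 1 + k_{f}^{2} a^{2} )^{2} } \right)^{2} \tag{833} $$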
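Using
$$ a = \frac{\hbar^{2}}{m e^{2}} \tag{834} $$
the photoemission cross-section can be rewritten as
$$ \left( \frac{d\sigma}{d\Omega'} \right) = 8 \left( \frac{k_{f}}{k} \right) a^{2} \left( \frac{e^{2}}{\hbar c} \right)^{2} \cos^{2}\theta' \left( \frac{ 2 \, k_{f} a }{ ( 1 + k_{f}^{2} a^{2} )^{2} } \right)^{2} \tag{835} $$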
Thus, although the photon is propagating along the x-direction, the electron is
preferentially emitted along the direction of the polarization (θ' ≈ 0). This
can be understood as a consequence of c being large, so that the photon's
momentum is negligible for a given photon energy; therefore (in the dipole
approximation) only the direction of the polarization determines the angular
distribution of the emitted electron. It should be noted that in the
relativistic case, where the momentum of the photon is important, the electrons
are predominantly ejected in the direction of the photon [46]. This formula
also breaks down for emitted electrons with low energies. In this case, the
correct electronic wave function for the continuous spectrum of $\hat{H}_{0}$
should be used [47]. The inclusion of the Coulomb attraction of the ion in the
final state has the effect of reducing the cross-section near the threshold.
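As an illustration of the angular dependence, the final plane-wave formula for the photoemission cross-section can be coded directly. This is a sketch under assumptions made here: Hartree atomic units (ħ = m = e = a = 1, so that c = 1/α and the 1s energy is −1/2), an arbitrarily chosen photon energy, and hypothetical function names.

```python
import math

ALPHA = 1.0 / 137.036      # fine-structure constant e^2 / (hbar c)

def final_wavevector(photon_energy, binding_energy=0.5):
    """k_f from energy conservation, in atomic units (hbar = m = e = a = 1)."""
    kinetic = photon_energy - binding_energy
    if kinetic <= 0.0:
        raise ValueError("photon energy is below the ionization threshold")
    return math.sqrt(2.0 * kinetic)

def differential_cross_section(photon_energy, theta_e):
    """Plane-wave d(sigma)/d(Omega') in units of a^2; theta_e is measured from the polarization."""
    k = photon_energy * ALPHA                  # k = omega / c with c = 1 / ALPHA
    k_f = final_wavevector(photon_energy)
    shape = 2.0 * k_f / (1.0 + k_f ** 2) ** 2
    return 8.0 * (k_f / k) * ALPHA ** 2 * math.cos(theta_e) ** 2 * shape ** 2

energy = 5.0               # hartree, well above threshold (illustrative choice)
along = differential_cross_section(energy, 0.0)
at_60_degrees = differential_cross_section(energy, math.pi / 3.0)
perpendicular = differential_cross_section(energy, math.pi / 2.0)
print(along, at_60_degrees, perpendicular)
```

The emission is strongest along the polarization direction, falls to one quarter of that value at 60 degrees, and vanishes perpendicular to the polarization. Since the plane-wave approximation is unreliable near threshold, the sketch should only be used for photon energies well above the binding energy.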
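10.1.12 Impossibility of absorption of photons by free electrons.
Free electrons are described by the non-interacting Hamiltonian $\hat{H}_{0}$
where
$$ \hat{H}_{0} = \frac{ \hat{\mathbf{p}}^{2} }{2 m} \tag{836} $$
which has plane waves as energy eigenstates
$$ \phi_{\mathbf{k}'}(\mathbf{r}) = \frac{1}{\sqrt{V}} \exp[ - i \, \mathbf{k}' \cdot \mathbf{r} ] \tag{837} $$
corresponding to the energy eigenvalues
$$ E_{k'} = \frac{\hbar^{2} k'^{2}}{2 m} \tag{838} $$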
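The matrix elements for electromagnetic transitions in which a photon (k, α) is
absorbed are given by
$$ - \left( \frac{q}{m c} \right) \sqrt{ \frac{2 \pi \hbar c^{2}}{V \omega_{k}} } \; \langle \mathbf{k}'' | \, \hat{\mathbf{p}} \cdot \hat{\epsilon}_{\alpha}(\mathbf{k}) \exp[ + i \, \mathbf{k} \cdot \mathbf{r} ] \, | \mathbf{k}' \rangle \, \sqrt{ n_{\mathbf{k},\alpha} } \tag{839} $$
which is evaluated as
$$ - \left( \frac{q}{m c} \right) \sqrt{ \frac{2 \pi \hbar c^{2}}{V \omega_{k}} } \; \hat{\mathbf{p}} \cdot \hat{\epsilon}_{\alpha}(\mathbf{k}) \; \delta_{\mathbf{k} + \mathbf{k}' - \mathbf{k}''} \, \sqrt{ n_{\mathbf{k},\alpha} } \tag{840} $$

[46] F. Sauter, Ann. Phys. 9, 217 (1931), Ann. Phys. 11, 454 (1931).
[47] M. Stobbe, Ann. Phys. 7, 661 (1930).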
This shows that momentum is conserved. Furthermore, for the transition rate
to represent a real process, it is necessary that energy is conserved between
the initial and final states
$$ \hbar\omega_{k} + \frac{\hbar^{2} k'^{2}}{2 m} = \frac{\hbar^{2} k''^{2}}{2 m} \tag{841} $$

Figure 29: The absorption of a photon via the paramagnetic interaction.
It is impossible for this process to satisfy the conditions for conservation of
energy and momentum. This can be seen by appealing to the relativistic
formulation where the four-vector momentum is conserved
$$ p^{\mu} + p'^{\mu} = p''^{\mu} \tag{842} $$
Hence,
$$ ( \, p^{\mu} + p'^{\mu} \, ) \, ( \, p_{\mu} + p'_{\mu} \, ) = p''^{\mu} \, p''_{\mu} \tag{843} $$
but the electron's momenta form a Lorentz scalar which is related to the rest
mass
$$ p'^{\mu} \, p'_{\mu} = p''^{\mu} \, p''_{\mu} = m^{2} c^{2} \tag{844} $$
and the photon has zero mass
$$ p^{\mu} \, p_{\mu} = 0 \tag{845} $$
Therefore, one finds that the cross-terms vanish
$$ p^{\mu} \, p'_{\mu} = 0 \tag{846} $$
In the rest frame of the electron one has $p'^{\mu} = ( \, m c \, , \, 0 \, )$, so the
energy of the photon is identically zero. Therefore, there is no photon and the
absorption process is impossible.
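The same conclusion can be seen numerically in the electron's rest frame. The sketch below works in units with m = c = 1 (an assumption made here for illustration; the function name is also hypothetical). Momentum conservation forces the final electron to carry the photon momentum p, and the energy available before absorption always exceeds the energy of a free electron with that momentum.

```python
import math

def energy_mismatch(photon_momentum, m=1.0, c=1.0):
    """Initial energy minus final energy for absorption by an electron at rest.

    Initial state: electron at rest plus a photon of momentum p.
    Final state: a single electron of momentum p, as required by
    momentum conservation.
    """
    initial = m * c ** 2 + photon_momentum * c
    final = math.sqrt((m * c ** 2) ** 2 + (photon_momentum * c) ** 2)
    return initial - final

momenta = [1e-3, 1e-2, 0.1, 1.0, 10.0, 100.0]
mismatches = [energy_mismatch(p) for p in momenta]
print(mismatches)
```

Every entry is strictly positive, and the mismatch only vanishes at p = 0, that is, when there is no photon.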
10.2 Scattering of Light
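Kramers and Heisenberg evaluated the scattering cross-section for light
incident on atomic electrons [48]. The incident photon is denoted by (k, α) and
the scattered photon by (k', α'). The scattering cross-section involves the
paramagnetic interaction to second-order and the diamagnetic interaction to
first-order. The matrix elements of the diamagnetic interaction are given by
$$ \langle n'l'm' \, \mathbf{k}'\alpha' | \hat{H}_{dia} | nlm \, \mathbf{k}\alpha \rangle = \left( \frac{e^{2}}{2 m c^{2}} \right) \langle n'l'm' \, \mathbf{k}'\alpha' | \hat{\mathbf{A}} \cdot \hat{\mathbf{A}} | nlm \, \mathbf{k}\alpha \rangle $$
$$ \qquad = \left( \frac{e^{2}}{2 m c^{2}} \right) \langle n'l'm' \, \mathbf{k}'\alpha' | \, ( \, a_{\mathbf{k},\alpha} \, a^{\dagger}_{\mathbf{k}',\alpha'} + a^{\dagger}_{\mathbf{k}',\alpha'} \, a_{\mathbf{k},\alpha} \, ) \exp[ \, i \, ( \mathbf{k} - \mathbf{k}' ) \cdot \mathbf{r} \, ] \, | nlm \, \mathbf{k}\alpha \rangle \left( \frac{2 \pi \hbar c^{2}}{ \sqrt{ \omega_{k} \omega_{k'} } \, V } \right) \hat{\epsilon}_{\alpha}(\mathbf{k}) \cdot \hat{\epsilon}_{\alpha'}(\mathbf{k}') \tag{847} $$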
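where it has been assumed that only the initial and final photon are present.
On making use of the long-wavelength approximation $\lambda \gg a$, the matrix
elements simplify to
$$ \langle n'l'm' \, \mathbf{k}'\alpha' | \hat{H}_{dia} | nlm \, \mathbf{k}\alpha \rangle \approx \left( \frac{e^{2}}{2 m c^{2}} \right) \langle n'l'm' | nlm \rangle \left( \frac{2 \pi \hbar c^{2}}{ \sqrt{ \omega_{k} \omega_{k'} } \, V } \right) \hat{\epsilon}_{\alpha}(\mathbf{k}) \cdot \hat{\epsilon}_{\alpha'}(\mathbf{k}') \tag{848} $$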
The scattering cross-section will be expressed in terms of a transition rate
and the transition rate will be calculated using a similar procedure to that
which was used in describing two-photon decay. An arbitrary state
$| \psi_{n} \rangle$ can be expressed in terms of a complete set of
non-interacting states $| n' \rangle$
$$ | \psi_{n} \rangle = \sum_{n'} C_{n'}(t) \, | n' \rangle \tag{849} $$
where $C_{n'}(t)$ are time-dependent coefficients. Initially, the system is
assumed to be in an energy eigenstate $| n \rangle$ of the unperturbed
Hamiltonian, and due to the interaction makes a transition to a state
$| n' \rangle$. The probability of finding the system in the state
$| n' \rangle$ at time t is then given by $| C_{n'}(t) |^{2}$. It shall be
assumed that the interaction is turned off when t → − ∞. The interaction
can be turned off at large negative times by introducing a multiplicative
factor of exp[ + η t ] in the interaction, where η is an infinitesimally small
positive constant. To first-order in the diamagnetic interaction, one finds
$$ C^{(1)}_{n'}(t) = \left( - \frac{i}{\hbar} \right) \delta_{n',n} \left( \frac{e^{2}}{2 m c^{2}} \right) \left( \frac{2 \pi \hbar c^{2}}{ \sqrt{ \omega_{k} \omega_{k'} } \, V } \right) 2 \, \hat{\epsilon}_{\alpha}(\mathbf{k}) \cdot \hat{\epsilon}_{\alpha'}(\mathbf{k}') \int_{-\infty}^{t} dt' \exp\left[ \frac{i}{\hbar} ( \hbar\omega' + E_{n'} - \hbar\omega - E_{n} ) \, t' \right] \tag{850} $$
where ω = c k and ω' = c k' and the long-wavelength approximation has
been used. The small quantity η has been absorbed as a small imaginary part
to the initial state energy
$$ E_{n} \rightarrow E_{n} + i \, \eta \, \hbar \tag{851} $$

[48] H. A. Kramers and W. Heisenberg, Z. Physik, 31, 681 (1925).
The paramagnetic interaction is of order e and the diamagnetic interaction
is of order $e^{2}$. Thus, to second-order in e, one must include the
diamagnetic interaction and the paramagnetic interaction to second-order. There
are two terms which are second-order in the paramagnetic interactions that
represent:
(a) absorption of the photon (k, α) followed by the emission of a photon (k', α').
(b) emission of a photon (k', α') followed by the absorption of the photon (k, α).

Figure 30: Photon scattering processes due to the diamagnetic interaction to
first-order.

Figure 31: Photon scattering processes due to the paramagnetic interaction to
second-order.
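The second-order contribution to the transition amplitude is given by
$$ C^{(2)}_{n'}(t) = \left( - \frac{i}{\hbar} \right)^{2} \left( \frac{e}{m c} \right)^{2} \left( \frac{2 \pi \hbar c^{2}}{ \sqrt{ \omega_{k} \omega_{k'} } \, V } \right) \int_{-\infty}^{t} dt' \int_{-\infty}^{t'} dt'' \; \Bigg[ \sum_{n''} \exp\left[ \frac{i}{\hbar} ( \hbar\omega' + E_{n'} - E_{n''} ) \, t' \right] \exp\left[ - \frac{i}{\hbar} ( E_{n} - E_{n''} + \hbar\omega ) \, t'' \right] \langle n'l'm' | \hat{\epsilon}_{\alpha'}(\mathbf{k}') \cdot \hat{\mathbf{p}} | n''l''m'' \rangle \langle n''l''m'' | \hat{\epsilon}_{\alpha}(\mathbf{k}) \cdot \hat{\mathbf{p}} | nlm \rangle $$
$$ \qquad + \sum_{n''} \exp\left[ \frac{i}{\hbar} ( E_{n'} - E_{n''} - \hbar\omega ) \, t' \right] \exp\left[ - \frac{i}{\hbar} ( E_{n} - E_{n''} - \hbar\omega' ) \, t'' \right] \langle n'l'm' | \hat{\epsilon}_{\alpha}(\mathbf{k}) \cdot \hat{\mathbf{p}} | n''l''m'' \rangle \langle n''l''m'' | \hat{\epsilon}_{\alpha'}(\mathbf{k}') \cdot \hat{\mathbf{p}} | nlm \rangle \Bigg] \tag{852} $$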
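The earliest time integration can be evaluated leading to
$$ C^{(2)}_{n'}(t) = \left( - \frac{i}{\hbar} \right) \left( \frac{e}{m c} \right)^{2} \left( \frac{2 \pi \hbar c^{2}}{ \sqrt{ \omega_{k} \omega_{k'} } \, V } \right) \int_{-\infty}^{t} dt' \sum_{n''} \exp\left[ \frac{i}{\hbar} ( \hbar\omega' + E_{n'} - E_{n} - \hbar\omega ) \, t' \right] $$
$$ \qquad \times \Bigg[ \frac{ \langle n'l'm' | \hat{\epsilon}_{\alpha'}(\mathbf{k}') \cdot \hat{\mathbf{p}} | n''l''m'' \rangle \langle n''l''m'' | \hat{\epsilon}_{\alpha}(\mathbf{k}) \cdot \hat{\mathbf{p}} | nlm \rangle }{ ( E_{n} - E_{n''} + \hbar\omega ) } + \frac{ \langle n'l'm' | \hat{\epsilon}_{\alpha}(\mathbf{k}) \cdot \hat{\mathbf{p}} | n''l''m'' \rangle \langle n''l''m'' | \hat{\epsilon}_{\alpha'}(\mathbf{k}') \cdot \hat{\mathbf{p}} | nlm \rangle }{ ( E_{n} - E_{n''} - \hbar\omega' ) } \Bigg] \tag{853} $$
as long as the denominators are non-vanishing.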
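The coefficients $C^{(1)}_{n'}(t)$ and $C^{(2)}_{n'}(t)$ have the same type of
time-dependence. The remaining integration over time yields
$$ \left( - \frac{i}{\hbar} \right) \int_{-\infty}^{t} dt' \exp\left[ \frac{i}{\hbar} ( \hbar\omega' + E_{n'} - E_{n} - \hbar\omega - i \hbar \eta ) \, t' \right] = - \frac{ \exp\left[ \frac{i}{\hbar} ( \hbar\omega' + E_{n'} - E_{n} - \hbar\omega - i \hbar \eta ) \, t \right] }{ ( \hbar\omega' + E_{n'} - E_{n} - \hbar\omega - i \hbar \eta ) } \tag{854} $$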
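The transition rate is given by
$$ \frac{1}{\tau} = \frac{\partial}{\partial t} \left( \left| C^{(1)}_{n'}(t) + C^{(2)}_{n'}(t) \right|^{2} \right) \tag{855} $$
but the time-dependence of the squared modulus is contained in the common
factor
$$ \left| \frac{ \exp\left[ \frac{i}{\hbar} ( \hbar\omega' + E_{n'} - E_{n} - \hbar\omega - i \hbar \eta ) \, t \right] }{ ( \hbar\omega' + E_{n'} - E_{n} - \hbar\omega - i \hbar \eta ) } \right|^{2} = \frac{ \exp[ 2 \eta t ] }{ ( \hbar\omega' + E_{n'} - E_{n} - \hbar\omega )^{2} + \hbar^{2} \eta^{2} } \tag{856} $$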
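Therefore, one finds the transition rate is given by the expression
$$ \frac{1}{\tau} = \frac{ 2 \eta \exp[ 2 \eta t ] }{ ( \hbar\omega' + E_{n'} - E_{n} - \hbar\omega )^{2} + \hbar^{2} \eta^{2} } \left( \frac{e^{2}}{m c^{2}} \right)^{2} \left( \frac{2 \pi \hbar c^{2}}{ \sqrt{ \omega_{k} \omega_{k'} } \, V } \right)^{2} | M |^{2} \tag{857} $$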
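where the matrix elements are given by
$$ M = \Bigg[ \hat{\epsilon}_{\alpha}(\mathbf{k}) \cdot \hat{\epsilon}_{\alpha'}(\mathbf{k}') \; \langle n'l'm' | nlm \rangle + \frac{1}{m} \sum_{n''} \frac{ \langle n'l'm' | \hat{\epsilon}_{\alpha'}(\mathbf{k}') \cdot \hat{\mathbf{p}} | n''l''m'' \rangle \langle n''l''m'' | \hat{\epsilon}_{\alpha}(\mathbf{k}) \cdot \hat{\mathbf{p}} | nlm \rangle }{ ( E_{n} - E_{n''} + \hbar\omega ) } + \frac{1}{m} \sum_{n''} \frac{ \langle n'l'm' | \hat{\epsilon}_{\alpha}(\mathbf{k}) \cdot \hat{\mathbf{p}} | n''l''m'' \rangle \langle n''l''m'' | \hat{\epsilon}_{\alpha'}(\mathbf{k}') \cdot \hat{\mathbf{p}} | nlm \rangle }{ ( E_{n} - E_{n''} - \hbar\omega' ) } \Bigg] \tag{858} $$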
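On taking the limit η → 0, the first factor in the decay rate reduces to an
energy-conserving delta function. Therefore, one obtains the Fermi-Golden rule
expression
$$ \frac{1}{\tau} = \frac{2 \pi}{\hbar} \left( \frac{e^{2}}{m c^{2}} \right)^{2} \left( \frac{2 \pi \hbar c^{2}}{ \sqrt{ \omega_{k} \omega_{k'} } \, V } \right)^{2} | M |^{2} \; \delta( \hbar\omega_{k'} + E_{n'} - E_{n} - \hbar\omega_{k} ) \tag{859} $$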
The magnitudes of the final state photon quantum numbers (k') must be
integrated over, since these are not measured. This integration imparts a
physical meaning to the expression for the rate which contains the Dirac delta
function. We shall assume that the direction of the scattered photon is to be
measured and that the photon is absorbed by a detector which subtends a solid
angle dΩ' to the atom. Therefore, the scattering rate is given by
$$ \frac{1}{\tau} \, d\Omega' = \frac{2 \pi}{\hbar} \left( \frac{e^{2}}{m c^{2}} \right)^{2} \frac{V}{( 2 \pi )^{3}} \, d\Omega' \int_{0}^{\infty} dk' \; k'^{2} \left( \frac{2 \pi \hbar c^{2}}{ \sqrt{ \omega_{k} \omega_{k'} } \, V } \right)^{2} | M |^{2} \; \delta( \hbar\omega_{k'} + E_{n'} - E_{n} - \hbar\omega_{k} ) \tag{860} $$
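Since $\hbar \omega_{k'} = \hbar c k'$, the integration over the delta function can be performed, yielding
$$
\frac{1}{\tau} \, d\Omega' = \frac{2 \pi}{\hbar} \left( \frac{e^{2}}{m c^{2}} \right)^{2} \frac{V \, d\Omega'}{( 2 \pi )^{3}} \, \frac{\omega'^{2}}{\hbar c^{3}} \left( \frac{2 \pi \hbar c^{2}}{\sqrt{\omega \omega'} \, V} \right)^{2} | M |^{2}
\tag{861}
$$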
The scattering cross-section is defined as the transition rate divided by the photon flux. The photon flux is found by noting that it has been assumed that there is one photon per volume $V$, so the photon density is $1/V$, and the speed of light is $c$. Hence, the photon flux is given by $c/V$. Therefore, the cross-section is determined by the Kramers-Heisenberg formula
$$
\left( \frac{d\sigma}{d\Omega'} \right) = \left( \frac{e^{2}}{m c^{2}} \right)^{2} \left( \frac{\omega'}{\omega} \right) | M |^{2}
\tag{862}
$$
The magnitude of the scattering rate is determined by the quantity $r_e$ which has the dimensions of length
$$
r_e = \left( \frac{e^{2}}{m c^{2}} \right)
\tag{863}
$$
This quantity is often called the classical radius of the electron. The quantity $r_e$ can be expressed as
$$
r_e = \left( \frac{e^{2}}{m c^{2}} \right) = \left( \frac{e^{2}}{\hbar c} \right) \left( \frac{\hbar}{m c} \right) \approx 2.82 \times 10^{-15} \ \mathrm{m}
\tag{864}
$$
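As a quick numerical check of eqn (864), the following minimal sketch evaluates $r_e$ as the fine-structure constant times the reduced Compton wavelength. The constant values and the function name `classical_electron_radius` are assumptions introduced for illustration (rounded SI values); they do not appear in the notes.

```python
# Rounded SI constants (assumed values, used only for this illustration)
ALPHA = 7.2973525693e-3   # fine-structure constant, e^2/(hbar c) in Gaussian units
HBAR = 1.054571817e-34    # reduced Planck constant, J s
M_E = 9.1093837015e-31    # electron mass, kg
C = 299792458.0           # speed of light, m/s


def classical_electron_radius():
    """Return r_e = (e^2 / hbar c) * (hbar / m c) in metres."""
    reduced_compton_wavelength = HBAR / (M_E * C)
    return ALPHA * reduced_compton_wavelength


r_e = classical_electron_radius()
print(f"r_e = {r_e:.3e} m")
```

The result reproduces the quoted value of approximately $2.82 \times 10^{-15}$ m.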
10.2.1 Rayleigh Scattering
Rayleigh scattering corresponds to the limit in which the light is elastically
scattered. Hence, one has
$$
\omega = \omega'
\tag{865}
$$
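In the case of elastic scattering, all the terms in the Kramers-Heisenberg formula are equally important. That all terms have a similar magnitude can be seen by rewriting the first term $\hat{\epsilon}_{\alpha}(\mathbf{k}) \cdot \hat{\epsilon}_{\alpha'}(\mathbf{k}')$ in a way which is similar to the second. The scalar product of the polarization vectors can be expressed as
$$
\hat{\epsilon}_{\alpha}(\mathbf{k}) \cdot \hat{\epsilon}_{\alpha'}(\mathbf{k}') = \sum_{i,j} \hat{\epsilon}_{\alpha}(\mathbf{k})_{i} \, \delta_{i,j} \, \hat{\epsilon}_{\alpha'}(\mathbf{k}')_{j}
\tag{866}
$$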
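but one can rewrite the Kronecker delta function in terms of the commutation relation
$$
[ \, x_{i} \, , \, \hat{p}_{j} \, ] = i \hbar \, \delta_{i,j}
\tag{867}
$$
Thus, one can express the scalar product as a commutator
$$
\hat{\epsilon}_{\alpha}(\mathbf{k}) \cdot \hat{\epsilon}_{\alpha'}(\mathbf{k}') = \frac{1}{i \hbar} \sum_{i,j} \hat{\epsilon}_{\alpha}(\mathbf{k})_{i} \, [ \, x_{i} \, , \, \hat{p}_{j} \, ] \, \hat{\epsilon}_{\alpha'}(\mathbf{k}')_{j} = \frac{1}{i \hbar} \, [ \, \hat{\epsilon}_{\alpha}(\mathbf{k}) \cdot \mathbf{r} \, , \, \hat{\mathbf{p}} \cdot \hat{\epsilon}_{\alpha'}(\mathbf{k}') \, ]
\tag{868}
$$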
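Since, in the dipole approximation, the diamagnetic contribution to the matrix elements $M$ is proportional to the overlap integral
$$
\langle n'l'm' | nlm \rangle
\tag{869}
$$
the initial and final states must be identical if this is non-zero. Hence, the result is equivalent to the expectation value in the state $| nlm \rangle$. On replacing the matrix elements by the expectation value and then inserting a complete set of electronic states, one finds
$$
\begin{aligned}
\langle n'l'm' | nlm \rangle \; \hat{\epsilon}_{\alpha}(\mathbf{k}) \cdot \hat{\epsilon}_{\alpha'}(\mathbf{k}') = \frac{1}{i \hbar} \sum_{n''l''m''} \Big[ & \langle n'l'm' | \mathbf{r} \cdot \hat{\epsilon}_{\alpha}(\mathbf{k}) | n''l''m'' \rangle \langle n''l''m'' | \hat{\epsilon}_{\alpha'}(\mathbf{k}') \cdot \hat{\mathbf{p}} | nlm \rangle \\
& - \langle n'l'm' | \hat{\epsilon}_{\alpha'}(\mathbf{k}') \cdot \hat{\mathbf{p}} | n''l''m'' \rangle \langle n''l''m'' | \mathbf{r} \cdot \hat{\epsilon}_{\alpha}(\mathbf{k}) | nlm \rangle \Big]
\end{aligned}
\tag{870}
$$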
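The matrix elements of $\mathbf{r}$ can be expressed in terms of the matrix elements of $\hat{\mathbf{p}}$ via
$$
\begin{aligned}
\langle n'l'm' | \hat{\mathbf{p}} | n''l''m'' \rangle & = \frac{1}{2 i \hbar} \langle n'l'm' | \, [ \, \mathbf{r} \, , \, \hat{\mathbf{p}}^{2} \, ] \, | n''l''m'' \rangle \\
& = \frac{m}{i \hbar} \langle n'l'm' | \, [ \, \mathbf{r} \, , \, \hat{H}_{0} \, ] \, | n''l''m'' \rangle \\
& = \frac{m}{i \hbar} ( E_{n''l''m''} - E_{n'l'm'} ) \, \langle n'l'm' | \mathbf{r} | n''l''m'' \rangle
\end{aligned}
\tag{871}
$$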
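Therefore, one finds
$$
\langle n'l'm' | \mathbf{r} | n''l''m'' \rangle = \frac{i}{m \, \omega_{n'',n'}} \langle n'l'm' | \hat{\mathbf{p}} | n''l''m'' \rangle
\tag{872}
$$
where
$$
E_{n''l''m''} - E_{n'l'm'} = \hbar \, \omega_{n''n'}
\tag{873}
$$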
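Thus, the elastic scattering term in the Kramers-Heisenberg formula is given by
$$
\begin{aligned}
\delta_{nlm,n'l'm'} \; \hat{\epsilon}_{\alpha}(\mathbf{k}) \cdot \hat{\epsilon}_{\alpha'}(\mathbf{k}') = {} & \frac{1}{m} \sum_{n''l''m''} \frac{ \langle n'l'm' | \hat{\mathbf{p}} \cdot \hat{\epsilon}_{\alpha}(\mathbf{k}) | n''l''m'' \rangle \langle n''l''m'' | \hat{\mathbf{p}} \cdot \hat{\epsilon}_{\alpha'}(\mathbf{k}') | nlm \rangle }{ \hbar \, \omega_{n''n'} } \\
& - \frac{1}{m} \sum_{n''l''m''} \frac{ \langle n'l'm' | \hat{\mathbf{p}} \cdot \hat{\epsilon}_{\alpha'}(\mathbf{k}') | n''l''m'' \rangle \langle n''l''m'' | \hat{\mathbf{p}} \cdot \hat{\epsilon}_{\alpha}(\mathbf{k}) | nlm \rangle }{ \hbar \, \omega_{nn''} }
\end{aligned}
\tag{874}
$$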
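but since for elastic scattering $E_{nlm} = E_{n'l'm'}$, one has
$$
\begin{aligned}
\delta_{nlm,n'l'm'} \; \hat{\epsilon}_{\alpha}(\mathbf{k}) \cdot \hat{\epsilon}_{\alpha'}(\mathbf{k}') = {} & \frac{1}{m} \sum_{n''l''m''} \frac{ \langle n'l'm' | \hat{\mathbf{p}} \cdot \hat{\epsilon}_{\alpha}(\mathbf{k}) | n''l''m'' \rangle \langle n''l''m'' | \hat{\mathbf{p}} \cdot \hat{\epsilon}_{\alpha'}(\mathbf{k}') | nlm \rangle }{ E_{n''} - E_{n} } \\
& + \frac{1}{m} \sum_{n''l''m''} \frac{ \langle n'l'm' | \hat{\mathbf{p}} \cdot \hat{\epsilon}_{\alpha'}(\mathbf{k}') | n''l''m'' \rangle \langle n''l''m'' | \hat{\mathbf{p}} \cdot \hat{\epsilon}_{\alpha}(\mathbf{k}) | nlm \rangle }{ E_{n''} - E_{n} }
\end{aligned}
\tag{875}
$$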
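On substituting this back into the expression for the matrix elements $M$, one obtains
$$
\begin{aligned}
M = {} & \frac{1}{m} \sum_{n''l''m''} \frac{ \langle n'l'm' | \hat{\mathbf{p}} \cdot \hat{\epsilon}_{\alpha}(\mathbf{k}) | n''l''m'' \rangle \langle n''l''m'' | \hat{\mathbf{p}} \cdot \hat{\epsilon}_{\alpha'}(\mathbf{k}') | nlm \rangle }{ E_{n''} - E_{n} } \\
& + \frac{1}{m} \sum_{n''l''m''} \frac{ \langle n'l'm' | \hat{\mathbf{p}} \cdot \hat{\epsilon}_{\alpha'}(\mathbf{k}') | n''l''m'' \rangle \langle n''l''m'' | \hat{\mathbf{p}} \cdot \hat{\epsilon}_{\alpha}(\mathbf{k}) | nlm \rangle }{ E_{n''} - E_{n} } \\
& + \frac{1}{m} \sum_{n''l''m''} \frac{ \langle n'l'm' | \hat{\epsilon}_{\alpha'}(\mathbf{k}') \cdot \hat{\mathbf{p}} | n''l''m'' \rangle \langle n''l''m'' | \hat{\mathbf{p}} \cdot \hat{\epsilon}_{\alpha}(\mathbf{k}) | nlm \rangle }{ ( E_{n} - E_{n''} + \hbar\omega ) } \\
& + \frac{1}{m} \sum_{n''l''m''} \frac{ \langle n'l'm' | \hat{\epsilon}_{\alpha}(\mathbf{k}) \cdot \hat{\mathbf{p}} | n''l''m'' \rangle \langle n''l''m'' | \hat{\mathbf{p}} \cdot \hat{\epsilon}_{\alpha'}(\mathbf{k}') | nlm \rangle }{ ( E_{n} - E_{n''} - \hbar\omega ) }
\end{aligned}
\tag{876}
$$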
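which simplifies to
$$
\begin{aligned}
M = \frac{\omega}{m \hbar} \sum_{n''l''m''} \Bigg[ & \frac{ \langle n'l'm' | \hat{\epsilon}_{\alpha'}(\mathbf{k}') \cdot \hat{\mathbf{p}} | n''l''m'' \rangle \langle n''l''m'' | \hat{\mathbf{p}} \cdot \hat{\epsilon}_{\alpha}(\mathbf{k}) | nlm \rangle }{ \omega_{n''n} \, ( \omega_{nn''} + \omega ) } \\
& - \frac{ \langle n'l'm' | \hat{\epsilon}_{\alpha}(\mathbf{k}) \cdot \hat{\mathbf{p}} | n''l''m'' \rangle \langle n''l''m'' | \hat{\mathbf{p}} \cdot \hat{\epsilon}_{\alpha'}(\mathbf{k}') | nlm \rangle }{ \omega_{n''n} \, ( \omega_{nn''} - \omega ) } \Bigg]
\end{aligned}
\tag{877}
$$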
In the limit of small photon frequencies compared with the electronic energies, one can expand the denominators of the matrix element as
$$
\frac{1}{ \omega_{nn''} \, ( \omega_{nn''} \pm \omega ) } = \frac{1}{\omega_{nn''}^{2}} \mp \frac{\omega}{\omega_{nn''}^{3}} + \ldots
\tag{878}
$$
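When this low-frequency expansion is substituted into the matrix elements, the leading term vanishes. This can be seen since the leading term is proportional to
$$
\begin{aligned}
\sum_{n''} \frac{1}{\omega_{nn''}^{2}} \Big[ & \langle n'l'm' | \hat{\epsilon}_{\alpha'}(\mathbf{k}') \cdot \hat{\mathbf{p}} | n''l''m'' \rangle \langle n''l''m'' | \hat{\mathbf{p}} \cdot \hat{\epsilon}_{\alpha}(\mathbf{k}) | nlm \rangle \\
& - \langle n'l'm' | \hat{\epsilon}_{\alpha}(\mathbf{k}) \cdot \hat{\mathbf{p}} | n''l''m'' \rangle \langle n''l''m'' | \hat{\mathbf{p}} \cdot \hat{\epsilon}_{\alpha'}(\mathbf{k}') | nlm \rangle \Big]
\end{aligned}
\tag{879}
$$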
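which can be expressed as
$$
\begin{aligned}
m^{2} \sum_{n''} \Big[ & \langle n'l'm' | \hat{\epsilon}_{\alpha'}(\mathbf{k}') \cdot \mathbf{r} | n''l''m'' \rangle \langle n''l''m'' | \mathbf{r} \cdot \hat{\epsilon}_{\alpha}(\mathbf{k}) | nlm \rangle \\
& - \langle n'l'm' | \hat{\epsilon}_{\alpha}(\mathbf{k}) \cdot \mathbf{r} | n''l''m'' \rangle \langle n''l''m'' | \mathbf{r} \cdot \hat{\epsilon}_{\alpha'}(\mathbf{k}') | nlm \rangle \Big]
\end{aligned}
\tag{880}
$$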
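or, on using the completeness relation, one finds the expectation value of the commutator is given by
$$
m^{2} \, \langle n'l'm' | \, [ \, \hat{\epsilon}_{\alpha'}(\mathbf{k}') \cdot \mathbf{r} \, , \, \mathbf{r} \cdot \hat{\epsilon}_{\alpha}(\mathbf{k}) \, ] \, | nlm \rangle = 0
\tag{881}
$$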
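Thus, the leading term of the low-frequency expansion vanishes. Therefore, the scattering rate is expressed as
$$
\begin{aligned}
\left( \frac{d\sigma}{d\Omega'} \right) = \left( \frac{r_e}{m \hbar} \right)^{2} \omega^{4} \, \Bigg| \sum_{n''l''m''} \left( \frac{1}{\omega_{nn''}} \right)^{3} \Big[ & \langle n'l'm' | \hat{\epsilon}_{\alpha'}(\mathbf{k}') \cdot \hat{\mathbf{p}} | n''l''m'' \rangle \langle n''l''m'' | \hat{\mathbf{p}} \cdot \hat{\epsilon}_{\alpha}(\mathbf{k}) | nlm \rangle \\
& + \langle n'l'm' | \hat{\epsilon}_{\alpha}(\mathbf{k}) \cdot \hat{\mathbf{p}} | n''l''m'' \rangle \langle n''l''m'' | \hat{\mathbf{p}} \cdot \hat{\epsilon}_{\alpha'}(\mathbf{k}') | nlm \rangle \Big] \Bigg|^{2}
\end{aligned}
\tag{882}
$$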
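Finally, the scattering rate can be expressed in terms of the dipole matrix elements. Each momentum matrix element contributes a factor $m \, \omega_{nn''}$ in magnitude, which reduces the power of the energy denominator from three to one:
$$
\begin{aligned}
\left( \frac{d\sigma}{d\Omega'} \right) = \left( \frac{r_e \, m}{\hbar} \right)^{2} \omega^{4} \, \Bigg| \sum_{n''l''m''} \left( \frac{1}{\omega_{nn''}} \right) \Big[ & \langle n'l'm' | \hat{\epsilon}_{\alpha'}(\mathbf{k}') \cdot \mathbf{r} | n''l''m'' \rangle \langle n''l''m'' | \mathbf{r} \cdot \hat{\epsilon}_{\alpha}(\mathbf{k}) | nlm \rangle \\
& + \langle n'l'm' | \hat{\epsilon}_{\alpha}(\mathbf{k}) \cdot \mathbf{r} | n''l''m'' \rangle \langle n''l''m'' | \mathbf{r} \cdot \hat{\epsilon}_{\alpha'}(\mathbf{k}') | nlm \rangle \Big] \Bigg|^{2}
\end{aligned}
\tag{883}
$$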
Hence, at long wavelengths, the scattering cross-section varies as $\omega^{4}$ as expected from Rayleigh's law. Since the typical electronic frequency $\omega_{nn''}$ is in the ultraviolet spectrum, then
$$
\omega_{nn''} \gg \omega
\tag{884}
$$
for all frequencies in the visible optical spectrum. This leads to the phenomena of blue skies in the day and red sunsets at dusk.
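The $\omega^{4}$ law can be made concrete with a small sketch that compares the scattering of blue and red light. The wavelengths 450 nm and 700 nm and the helper name `rayleigh_ratio` are illustrative assumptions, not values taken from the notes.

```python
def rayleigh_ratio(wavelength_a_nm, wavelength_b_nm):
    """Ratio sigma(a) / sigma(b) for a cross-section proportional to omega^4.

    Since omega is inversely proportional to the wavelength, the ratio is
    (lambda_b / lambda_a)^4.
    """
    return (wavelength_b_nm / wavelength_a_nm) ** 4


# Assumed representative wavelengths for blue and red light
ratio = rayleigh_ratio(450.0, 700.0)
print(f"blue light is scattered about {ratio:.1f} times more strongly than red light")
```

Under these assumptions, blue light is scattered roughly six times more strongly than red light, which is the origin of the colour of the sky.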
10.2.2 Thomson Scattering
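Thomson scattering occurs for photons with sufficiently high energies
$$
\omega \gg \omega_{nn''}
\tag{885}
$$
so that the photon energy is greater than the atomic binding-energy. In this case, the second and third terms in the Kramers-Heisenberg formula can be neglected. This is because
$$
\omega \sim \omega' \gg \frac{1}{m} \langle n'l'm' | \hat{\epsilon}_{\alpha'}(\mathbf{k}') \cdot \hat{\mathbf{p}} | n''l''m'' \rangle \langle n''l''m'' | \hat{\mathbf{p}} \cdot \hat{\epsilon}_{\alpha}(\mathbf{k}) | nlm \rangle
\tag{886}
$$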
Therefore, the scattering predominantly occurs elastically and the scattering cross-section is given by
$$
\left( \frac{d\sigma}{d\Omega'} \right) = r_e^{2} \, \left| \hat{\epsilon}_{\alpha}(\mathbf{k}) \cdot \hat{\epsilon}_{\alpha'}(\mathbf{k}') \right|^{2}
\tag{887}
$$
which is independent of $\omega$. The above result is dependent on the scattering angle via the polarization vectors.
In the investigation of the angular dependence of Thomson scattering, it is convenient to introduce a coordinate system which is defined by the direction of propagation of the incident photon and its polarization $\hat{\epsilon}_{1}(\mathbf{k})$. The coordinate system is composed of the three orthogonal unit vectors $( \hat{\epsilon}_{1}(\mathbf{k}), \hat{\epsilon}_{2}(\mathbf{k}), \hat{\mathbf{k}} )$. Thus the direction of the polarization vector $\hat{\epsilon}_{1}(\mathbf{k})$ defines the x-direction. In this coordinate system, the scattered photon $(\mathbf{k}', \alpha')$ is in the direction $\mathbf{k}'$ with polar coordinates $(\theta_{k'}, \varphi_{k'})$. The polarization of the final photons $\hat{\epsilon}_{\alpha'}(\mathbf{k}')$ must be transverse to $\mathbf{k}'$. Two polarization vectors are defined according to
$$
\hat{\epsilon}_{1}(\mathbf{k}') = ( \cos\theta_{k'} \cos\varphi_{k'} \, , \, \cos\theta_{k'} \sin\varphi_{k'} \, , \, - \sin\theta_{k'} )
\tag{888}
$$
which lies in the plane of $\mathbf{k}$ and $\mathbf{k}'$ and
$$
\hat{\epsilon}_{2}(\mathbf{k}') = ( - \sin\varphi_{k'} \, , \, \cos\varphi_{k'} \, , \, 0 )
\tag{889}
$$
which lies in the plane perpendicular to $\mathbf{k}$.

Figure 32: The coordinate system and polarization vectors used to describe Thomson scattering.

In terms of the chosen polarization vectors, the scattering cross-section for incident radiation that is polarized along the x-direction takes on the form
$$
\left( \frac{d\sigma}{d\Omega'} \right)_{x\text{-pol}} = r_e^{2} \begin{cases} \cos^{2}\theta_{k'} \cos^{2}\varphi_{k'} & \text{for } \alpha' = 1 \\ \sin^{2}\varphi_{k'} & \text{for } \alpha' = 2 \end{cases}
\tag{890}
$$
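The polarization-resolved result can be checked directly from the vectors of eqns (888) and (889). The sketch below builds the final-state polarization vectors, confirms that they are transverse to $\mathbf{k}'$, and evaluates $| \hat{\epsilon}_{1}(\mathbf{k}) \cdot \hat{\epsilon}_{\alpha'}(\mathbf{k}') |^{2}$ for an incident photon polarized along x. The function names and the sample angles are assumptions made for illustration.

```python
import math


def dot(a, b):
    """Scalar product of two 3-component tuples."""
    return sum(x * y for x, y in zip(a, b))


def final_polarizations(theta, phi):
    """Return (k'_hat, eps_1(k'), eps_2(k')) for polar angles (theta, phi)."""
    k_hat = (math.sin(theta) * math.cos(phi),
             math.sin(theta) * math.sin(phi),
             math.cos(theta))
    eps1 = (math.cos(theta) * math.cos(phi),
            math.cos(theta) * math.sin(phi),
            -math.sin(theta))
    eps2 = (-math.sin(phi), math.cos(phi), 0.0)
    return k_hat, eps1, eps2


def x_polarized_cross_section(theta, phi):
    """d(sigma)/d(Omega') in units of r_e^2 for alpha' = 1 and alpha' = 2."""
    x_hat = (1.0, 0.0, 0.0)  # incident polarization eps_1(k)
    _, eps1, eps2 = final_polarizations(theta, phi)
    return dot(x_hat, eps1) ** 2, dot(x_hat, eps2) ** 2


theta, phi = 0.7, 1.1  # arbitrary sample scattering angles (radians)
k_hat, eps1, eps2 = final_polarizations(theta, phi)
s1, s2 = x_polarized_cross_section(theta, phi)
print(s1, s2)
```

The two returned values agree with $\cos^{2}\theta_{k'} \cos^{2}\varphi_{k'}$ and $\sin^{2}\varphi_{k'}$ respectively.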
if the polarizations of the final photon are measured.

If the incident beam has its polarization along the x-direction, and the detector is not sensitive to the polarization, then the final polarization must be summed over. In this case of a polarized beam and a polarization insensitive detector, the cross-section is given by
$$
\left( \frac{d\sigma}{d\Omega'} \right)_{x\text{-pol}} = r_e^{2} \left( \cos^{2}\theta_{k'} \cos^{2}\varphi_{k'} + \sin^{2}\varphi_{k'} \right)
\tag{891}
$$
where the polarizations of the final state photon have been summed over.
If the incident beam of photons is unpolarized, then $\varphi_{k'}$ is undefined since the azimuthal direction of the scattered photon is defined with respect to the assumed polarization $\hat{\epsilon}_{1}(\mathbf{k})$. In the case of an unpolarized incident beam the expression should be integrated over $\varphi_{k'}$ and divided by $2\pi$. The scattering rate is given by
$$
\left( \frac{d\sigma}{d\Omega'} \right)_{\text{unpol}} = \frac{r_e^{2}}{2} \begin{cases} \cos^{2}\theta_{k'} & \text{for } \alpha' = 1 \\ 1 & \text{for } \alpha' = 2 \end{cases}
\tag{892}
$$
if the polarizations of the final state photons are measured. This result is identical to that obtained by assuming that the initial beam is composed of one half of the number of photons polarized along the x-direction and the other half of the number of photons polarized along the y-direction. That is
$$
\left( \frac{d\sigma}{d\Omega'} \right)_{\text{unpol}} = r_e^{2} \begin{cases} \frac{1}{2} \cos^{2}\theta_{k'} \, ( \cos^{2}\varphi_{k'} + \sin^{2}\varphi_{k'} ) & \text{for } \alpha' = 1 \\ \frac{1}{2} \, ( \sin^{2}\varphi_{k'} + \cos^{2}\varphi_{k'} ) & \text{for } \alpha' = 2 \end{cases}
\tag{893}
$$
The cross-section for unpolarized photons with a polarization insensitive detector is given by
$$
\left( \frac{d\sigma}{d\Omega'} \right)_{\text{unpol}} = \frac{r_e^{2}}{2} \left( 1 + \cos^{2}\theta_{k'} \right)
\tag{894}
$$
where the final polarizations have been summed over.
The total cross-section $\sigma$ is obtained by integrating over all directions. The total Thomson scattering cross-section is independent of whether the initial beam was polarized or unpolarized. The final result is
$$
\sigma = \frac{8 \pi}{3} \, r_e^{2}
\tag{895}
$$
which has a magnitude of $6.65 \times 10^{-29}$ m$^{2}$. More massive charged particles, such as protons, can also produce Thomson scattering but the cross-sections for these processes are smaller by factors of $( m / M )^{2}$. The derivation of the Thomson scattering cross-section breaks down for photons which have energies of the order of the electron's rest energy
$$
\hbar \omega \sim m_e c^{2}
\tag{896}
$$
For photons with these high energies, one must describe the scattering process relativistically. In this energy region, Compton scattering dominates.
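The statement that the total cross-section is the same for polarized and unpolarized beams can be verified numerically. The sketch below integrates eqns (891) and (894) over the full solid angle with a simple midpoint rule, in units of $r_e^{2}$, and then evaluates eqn (895). The grid sizes, the function names and the rounded value of $r_e$ are assumptions made for illustration.

```python
import math


def integrate_over_sphere(f, n_theta=400, n_phi=400):
    """Midpoint-rule estimate of the integral of f(theta, phi) sin(theta) dtheta dphi."""
    d_theta = math.pi / n_theta
    d_phi = 2.0 * math.pi / n_phi
    total = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta
        for j in range(n_phi):
            phi = (j + 0.5) * d_phi
            total += f(theta, phi) * math.sin(theta)
    return total * d_theta * d_phi


def x_polarized(theta, phi):
    """Eqn (891) in units of r_e^2: x-polarized beam, final polarizations summed."""
    return math.cos(theta) ** 2 * math.cos(phi) ** 2 + math.sin(phi) ** 2


def unpolarized(theta, phi):
    """Eqn (894) in units of r_e^2: unpolarized beam, final polarizations summed."""
    return 0.5 * (1.0 + math.cos(theta) ** 2)


R_E = 2.818e-15  # classical electron radius in metres (rounded, assumed)

sigma_pol = integrate_over_sphere(x_polarized)
sigma_unpol = integrate_over_sphere(unpolarized)
sigma_thomson = 8.0 * math.pi / 3.0 * R_E ** 2

print(sigma_pol, sigma_unpol, 8.0 * math.pi / 3.0)
print(f"sigma_Thomson = {sigma_thomson:.3e} m^2")
```

Both angular integrals converge to $8\pi/3$, and the resulting cross-section is consistent with the quoted $6.65 \times 10^{-29}$ m$^{2}$.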
Classical Interpretation
The classical counterparts of Rayleigh and Thomson scattering can be described by a two-step process. In the first step, the incident classical electromagnetic field causes an electron to undergo forced oscillations. In the second step, the oscillating electrons emit electromagnetic radiation.

In the first process, an electron bound harmonically to the atom which responds to an electromagnetic field $\mathbf{E}_{0} \exp[ \, i \omega t \, ]$ can be described by the equation of motion
$$
\ddot{\mathbf{r}} + \omega_{0}^{2} \, \mathbf{r} = \frac{q}{m} \, \mathbf{E}_{0} \; \Re e \, \exp\left[ \, i \omega t \, \right]
\tag{897}
$$
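where $\omega_{0}$ is the frequency of the electron's natural motion. In the steady state, one finds
$$
\mathbf{r} = \frac{q}{m} \, \frac{\mathbf{E}_{0}}{\omega_{0}^{2} - \omega^{2}} \; \Re e \, \exp\left[ \, i \omega t \, \right]
\tag{898}
$$
The acceleration of the charged particle can be described by
$$
\ddot{\mathbf{r}} = - \frac{q}{m} \, \frac{\omega^{2} \, \mathbf{E}_{0}}{\omega_{0}^{2} - \omega^{2}} \; \Re e \, \exp\left[ \, i \omega t \, \right]
\tag{899}
$$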
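The accelerating charged particle radiates electromagnetic energy. The emitted power is given by the Larmor formula
$$
P(\omega) = \frac{2 \, q^{2} \, r^{2} \, \omega^{4}}{3 \, c^{3}} = \frac{2 \, q^{4} \, E_{0}^{2}}{3 \, m^{2} \, c^{3}} \, \frac{\omega^{4}}{( \omega_{0}^{2} - \omega^{2} )^{2}}
\tag{900}
$$
while the incident energy flux is given by
$$
\frac{c}{4 \pi} \, E_{0}^{2}
\tag{901}
$$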
Hence, the scattering cross-section is described by
$$
\sigma = \frac{8 \pi}{3} \, r_e^{2} \, \frac{\omega^{4}}{( \omega_{0}^{2} - \omega^{2} )^{2}}
\tag{902}
$$
This formula has the correct frequency dependence in the limit $\omega \ll \omega_{0}$, in which case the classical cross-section varies as $\omega^{4}$, as expected for Rayleigh scattering. On the other hand, in the limit $\omega \gg \omega_{0}$ the cross-section becomes frequency independent, as is expected for Thomson scattering.
10.2.3 Raman Scattering
For inelastic scattering, one has $\hbar \omega \neq \hbar \omega'$; therefore, the condition of conservation of energy requires that
$$
E_{nlm} + \hbar \omega = E_{n'l'm'} + \hbar \omega'
\tag{903}
$$
Since it is most probable that the initial electron is in the ground state, one has
$$
E_{n'l'm'} > E_{nlm}
\tag{904}
$$
which leads to the inequality
$$
\hbar \omega > \hbar \omega'
\tag{905}
$$
Hence, the final photon has less energy than the initial photon. That is, the electromagnetic field has lost energy and left the electron in an excited state. This inelastic process describes the Stokes line. On the other hand, if the electron is initially in an excited state, then it is possible that the electron loses energy and makes a transition to the ground state. In this case,
$$
E_{n'l'm'} < E_{nlm}
\tag{906}
$$
so the final photon is more energetic
$$
\hbar \omega < \hbar \omega'
\tag{907}
$$
This process results in the anti-Stokes line.

Figure 33: The schematic frequency dependence of the observed intensity $I(\omega')$ expected in a Raman scattering experiment, showing the Stokes line ($n \to n'$) at $\omega - \omega_{n',n}$ and the anti-Stokes line ($n' \to n$) at $\omega + \omega_{n',n}$. The ratio of intensities of the Stokes and anti-Stokes lines provides a relative measure of the initial occupation of the low-energy state $n$ and the higher-energy excited state $n'$.
10.2.4 Radiation Damping and Resonance Fluorescence
In the analysis of photon scattering, it has been assumed that the energy denominators $( E_{n} - E_{n''} + \hbar \omega )$ do not vanish. If the energy denominator vanishes, the Kramers-Heisenberg formula becomes singular. However, the physically observed scattering cross-section may become large but does not diverge. This is the phenomenon of resonance fluorescence. Using the classical model, one can describe the scattering cross-section if damping is introduced to represent the lifetime of the electronic states. That is, the dynamics of the bound electron is modelled by a damped harmonic oscillator
$$
\ddot{\mathbf{r}} + \gamma \, \dot{\mathbf{r}} + \omega_{0}^{2} \, \mathbf{r} = \frac{q}{m} \, \mathbf{E}_{0} \; \Re e \, \exp\left[ \, i \omega t \, \right]
\tag{908}
$$
which has the solution
$$
\mathbf{r} = \frac{q}{m} \, \mathbf{E}_{0} \; \Re e \left\{ \frac{1}{\omega_{0}^{2} + i \gamma \omega - \omega^{2}} \, \exp\left[ \, i \omega t \, \right] \right\}
\tag{909}
$$
Since $\gamma$ is related to the decay rate and is of the order of $10^{8}$ sec$^{-1}$, it is usually negligible compared with the frequency of light, which is estimated as $\omega \sim 10^{15}$ sec$^{-1}$. Following our previous arguments, one finds that the scattering cross-section is given by
$$
\sigma(\omega) = \frac{8 \pi}{3} \left( \frac{q^{2}}{m c^{2}} \right)^{2} \frac{\omega^{4}}{( \omega^{2} - \omega_{0}^{2} )^{2} + \gamma^{2} \omega^{2}}
\tag{910}
$$
which no longer diverges when the resonance condition is satisfied, because of the damping of the electronic states.
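The three regimes of the classical cross-section can be illustrated with a short sketch of eqn (910), expressed in units of the Thomson value $(8\pi/3) r_e^{2}$. The function name `sigma_over_thomson` and the sample frequencies are assumptions; the values of $\omega_{0}$ and $\gamma$ follow the order-of-magnitude estimates quoted above.

```python
def sigma_over_thomson(omega, omega0, gamma=0.0):
    """Classical cross-section of eqn (910) divided by (8 pi / 3) r_e^2."""
    return omega ** 4 / ((omega ** 2 - omega0 ** 2) ** 2 + gamma ** 2 * omega ** 2)


omega0 = 1.0e15  # natural frequency in 1/sec (order of magnitude from the text)
gamma = 1.0e8    # damping rate in 1/sec (order of magnitude from the text)

low = sigma_over_thomson(1.0e13, omega0, gamma)   # Rayleigh regime: ~ (omega/omega0)^4
high = sigma_over_thomson(1.0e17, omega0, gamma)  # Thomson regime: ~ 1
peak = sigma_over_thomson(omega0, omega0, gamma)  # on resonance: (omega0/gamma)^2, finite

print(low, high, peak)
```

Far below resonance the ratio follows $(\omega/\omega_{0})^{4}$, far above it tends to one, and on resonance it is large but finite, equal to $(\omega_{0}/\gamma)^{2}$.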
The lifetime of a quantum mechanical state which at $t = 0$ is represented by $| \psi_{n}(0) \rangle$, calculated to second-order in the interaction $\hat{H}_{I}$, is given by the Fermi-Golden rule expression. The rate can be expressed as the limit $\eta \to 0$ of
$$
\frac{1}{\tau_{n}} = - \frac{2}{\hbar} \; \Im m \sum_{n'} \frac{ | \langle \psi_{n'} | \hat{H}_{I} | \psi_{n} \rangle |^{2} }{ E_{n} - E_{n'} + i \eta }
\tag{911}
$$
whereas the energy-shift found in second-order (Rayleigh-Schrödinger) perturbation theory is also given by the limit $\eta \to 0$ of
$$
\Delta E_{n} = \Re e \sum_{n'} \frac{ | \langle \psi_{n'} | \hat{H}_{I} | \psi_{n} \rangle |^{2} }{ E_{n} - E_{n'} + i \eta }
\tag{912}
$$
so
$$
E_{n} = E^{(0)}_{n} + \Delta E_{n}
\tag{913}
$$
Hence, due to the form of the expressions for the shift and the lifetime as the real and imaginary parts of a complex function, it is possible to consider an unstable state as having a complex energy$^{49}$ given by
$$
E_{n} - i \, \Gamma_{n} \approx E^{(0)}_{n} + \Delta E_{n} - i \, \frac{\hbar}{2 \tau_{n}}
\tag{914}
$$
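That is, the lifetime can be considered as giving the state an energy-width $\Gamma_{n}$. This is the natural width of the electronic state. The factor of two in the width can be understood by considering the time-dependence of the state $| \psi_{n}(t) \rangle$ which is given by
$$
| \psi_{n}(t) \rangle = \exp\left[ - \frac{i}{\hbar} ( E_{n} - i \, \Gamma_{n} ) \, t \right] | \psi_{n}(0) \rangle
\tag{915}
$$
Hence, the probability $P_{n}(t)$ that the state has not decayed at time $t$ is given by
$$
\begin{aligned}
P_{n}(t) & = | \langle \psi_{n}(0) | \psi_{n}(t) \rangle |^{2} \\
& = \left| \langle \psi_{n}(0) | \exp\left[ - \frac{i}{\hbar} ( E_{n} - i \, \Gamma_{n} ) \, t \right] | \psi_{n}(0) \rangle \right|^{2} \\
& = \exp\left[ - \frac{2}{\hbar} \, \Gamma_{n} \, t \right]
\end{aligned}
\tag{916}
$$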
due to the normalization of the initial state. This time-dependence of $P_{n}(t)$ is interpreted in terms of the exponential decay of the probability for finding the initial state
$$
P_{n}(t) = P_{n}(0) \, \exp\left[ - \frac{t}{\tau_{n}} \right]
\tag{917}
$$
This leads to the identification of the relation between the energy-width and the lifetime
$$
\Gamma_{n} = \frac{\hbar}{2 \tau_{n}}
\tag{918}
$$
Hence, the lifetime $\tau_{n}$ of an unstable or metastable state can be incorporated by introducing an imaginary part $\Gamma_{n}$ to the energy.
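The factor of two relating the width to the lifetime can be confirmed with a small numerical sketch: evolving the amplitude with the complex energy $E_{n} - i \Gamma_{n}$, where $\Gamma_{n} = \hbar / (2 \tau_{n})$, reproduces the survival probability $\exp[ - t / \tau_{n} ]$. The lifetime, the energy and the function name are illustrative assumptions.

```python
import cmath
import math

HBAR = 1.054571817e-34  # reduced Planck constant in J s (rounded, assumed)


def survival_probability(t, energy, tau):
    """|<psi(0)|psi(t)>|^2 for a state with complex energy E - i Gamma."""
    gamma_width = HBAR / (2.0 * tau)  # Gamma_n = hbar / (2 tau_n), eqn (918)
    amplitude = cmath.exp(-1j * (energy - 1j * gamma_width) * t / HBAR)
    return abs(amplitude) ** 2


tau = 1.0e-8       # assumed lifetime in seconds
energy = 1.6e-18   # assumed real part of the energy in joules (about 10 eV)

p_one_lifetime = survival_probability(tau, energy, tau)
expected = math.exp(-1.0)
print(p_one_lifetime, expected)
```

After one lifetime the survival probability has dropped to $1/e$, independently of the real part of the energy.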
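Therefore, for the case of resonant scattering, one should replace the energies by complex numbers such that the real part represents the state's energy and the imaginary part describes half the state's decay rate. In the case of resonant scattering, the Kramers-Heisenberg formula is modified$^{50}$ to
$$
\left( \frac{d\sigma}{d\Omega'} \right) = \left( \frac{e^{2}}{m c^{2}} \right)^{2} \left( \frac{\omega'}{\omega} \right) | M |^{2}
\tag{919}
$$
where the matrix elements are given by
$$
\begin{aligned}
M = {} & \hat{\epsilon}_{\alpha}(\mathbf{k}) \cdot \hat{\epsilon}_{\alpha'}(\mathbf{k}') \, \langle n'l'm' | nlm \rangle \\
& + \frac{1}{m} \sum_{n''l''m''} \frac{ \langle n'l'm' | \hat{\epsilon}_{\alpha'}(\mathbf{k}') \cdot \hat{\mathbf{p}} | n''l''m'' \rangle \langle n''l''m'' | \hat{\epsilon}_{\alpha}(\mathbf{k}) \cdot \hat{\mathbf{p}} | nlm \rangle }{ ( E_{n} - E_{n''} - i \, \Gamma_{n''} + \hbar\omega ) } \\
& + \frac{1}{m} \sum_{n''l''m''} \frac{ \langle n'l'm' | \hat{\epsilon}_{\alpha}(\mathbf{k}) \cdot \hat{\mathbf{p}} | n''l''m'' \rangle \langle n''l''m'' | \hat{\epsilon}_{\alpha'}(\mathbf{k}') \cdot \hat{\mathbf{p}} | nlm \rangle }{ ( E_{n} - E_{n''} - i \, \Gamma_{n''} - \hbar\omega' ) }
\end{aligned}
\tag{920}
$$

$^{49}$ That is, the perturbation produces a complex shift of the energy, which is related to the self-energy $\Sigma_{n}(E)$ which is to be discussed later.
$^{50}$ P. A. M. Dirac, Proc. Roy. Soc. A 114, 710 (1927).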
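Close to resonance, the resonant denominator is of the order of $\Gamma \sim \frac{\hbar c}{a} \left( \frac{e^{2}}{\hbar c} \right)^{4}$, whereas the numerator is of the order of $\frac{e^{2}}{a}$. Hence, on resonance the matrix elements can be of the order $\left( \frac{e^{2}}{\hbar c} \right)^{-3}$ larger than the non-resonant matrix elements. Therefore, on resonance, the non-resonant terms may be neglected. In the following, it shall be assumed that the resonant state is non-degenerate
$$
\left( \frac{d\sigma}{d\Omega'} \right) = \left( \frac{e^{2}}{m^{2} c^{2}} \right)^{2} \left( \frac{\omega'}{\omega} \right) \frac{ | \langle n'l'm' | \hat{\epsilon}_{\alpha'}(\mathbf{k}') \cdot \hat{\mathbf{p}} | n''l''m'' \rangle \langle n''l''m'' | \hat{\epsilon}_{\alpha}(\mathbf{k}) \cdot \hat{\mathbf{p}} | nlm \rangle |^{2} }{ ( E_{n} - E_{n''} + \hbar\omega )^{2} + \Gamma_{n''}^{2} }
\tag{921}
$$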
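This expression can be re-expressed in terms of the product of two factors
$$
\begin{aligned}
\left( \frac{d\sigma}{d\Omega'} \right) = {} & \left( \frac{e^{2}}{m^{2}} \, \frac{2 \pi \hbar}{\omega V} \right) \frac{ | \langle nlm | \hat{\epsilon}_{\alpha}(\mathbf{k}) \cdot \hat{\mathbf{p}} | n''l''m'' \rangle |^{2} }{ ( E_{n} - E_{n''} + \hbar\omega )^{2} + \Gamma_{n''}^{2} } \, \frac{V}{c} \\
& \times \frac{2 \pi}{\hbar} \left( \frac{e^{2}}{m^{2}} \, \frac{2 \pi \hbar}{\omega' V} \right) | \langle n'l'm' | \hat{\epsilon}_{\alpha'}(\mathbf{k}') \cdot \hat{\mathbf{p}} | n''l''m'' \rangle |^{2} \, \frac{V \, \omega'^{2}}{( 2 \pi )^{3} \, \hbar c^{3}}
\end{aligned}
\tag{922}
$$
which is the probability for absorption from the ground state to the resonant state $| n''l''m'' \rangle$ (divided by the incident flux) times the probability for its decay via emission. On resonance, it appears that the process corresponds to two sequential processes, first absorption and secondly emission.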
For energies slightly off-resonance, the resonant scattering is expected to interfere with the non-resonant scattering process. Likewise, if the resonant state is degenerate, the sum over the degeneracy must be performed before the matrix elements are squared, leading to constructive interference.

The difference between a resonant process and a two-step process is determined by the lifetime of the intermediate state $| n''l''m'' \rangle$ compared with the frequency width of the photon beam. The frequency width of the photon beam may be limited by the monochromator, or by the time-scale of the experiment if it involves a pulsed light source. If the lifetime of the intermediate state is sufficiently long compared with the time-scale of the experiment, it may be possible to observe the decay long after the incident light has been switched off. In this case, the resonance can be considered to be composed of two independent processes$^{51}$. Furthermore, it may be possible to perform further experiments on the surviving intermediate state. In the opposite case, where the lifetime of the intermediate state is shorter than the time-scale of the experiment, the intermediate state will have decayed before the experiment has terminated.
10.2.5 Natural Line-Widths
The interaction representation will be used to calculate the natural width for the absorption of light $(\mathbf{k}, \alpha)$, by introducing the lifetimes of the initial and final state. Strictly speaking, one should not take the exponential decay of a probability $P_{n}(t)$ of finding an electron in state $\psi_{n}$ too literally. If one considers the approximate exponential decay as being rigorous, this implies that the Hamiltonian should be non-Hermitian, which is strictly forbidden. One should think of the decaying wave function as a wave packet or linear superposition of exact energy eigenstates (with energies denoted by $E$). The Fourier transform of the time-dependent wave function should provide the energy-distribution $\rho_{n}(E)$ of the exact energy eigenstates in the wave packet $| \psi_{n}(t) \rangle$
$$
\rho_{n}(E) = \frac{1}{2 \pi \hbar} \int_{-\infty}^{\infty} dt \, \exp\left[ + \frac{i}{\hbar} E \, t \right] \langle \psi_{n}(0) | \psi_{n}(t) \rangle
\tag{923}
$$
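On assuming the approximate form of a decaying wave packet
$$
\langle \psi_{n}(0) | \psi_{n}(t) \rangle = \exp\left[ - \frac{i}{\hbar} E_{n} \, t - \frac{| t |}{2 \tau_{n}} \right]
\tag{924}
$$
where the decay includes transitions to all possible final states, one finds
$$
\begin{aligned}
\rho_{n}(E) & = \frac{1}{2 \pi i} \left[ \frac{1}{E - E_{n} - i \frac{\hbar}{2 \tau_{n}}} - \frac{1}{E - E_{n} + i \frac{\hbar}{2 \tau_{n}}} \right] \\
& = \frac{1}{\pi} \, \frac{ \frac{\hbar}{2 \tau_{n}} }{ ( E - E_{n} )^{2} + \left( \frac{\hbar}{2 \tau_{n}} \right)^{2} }
\end{aligned}
\tag{925}
$$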
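The step from eqn (924) to eqn (925) can be checked numerically. The sketch below evaluates the Fourier transform of eqn (923) with a trapezoidal rule and compares it with the Lorentzian of eqn (925). Units with $\hbar = 1$, the truncation of the time integral and the function names are assumptions made for illustration.

```python
import math


def rho_numeric(E, E_n, tau, hbar=1.0, n_steps=200000):
    """Trapezoidal estimate of eqn (923) for the decaying overlap of eqn (924).

    The real part of the integrand is even in t, so the integral over all t
    is twice the integral over t > 0; the imaginary part is odd and cancels.
    """
    decay_time = 2.0 * tau
    t_max = 40.0 * decay_time  # the integrand is negligible beyond this point
    dt = t_max / n_steps
    total = 0.0
    for i in range(n_steps + 1):
        t = i * dt
        weight = 0.5 if i in (0, n_steps) else 1.0
        total += weight * math.cos((E - E_n) * t / hbar) * math.exp(-t / decay_time)
    return 2.0 * total * dt / (2.0 * math.pi * hbar)


def rho_lorentzian(E, E_n, tau, hbar=1.0):
    """Closed form of eqn (925)."""
    half_width = hbar / (2.0 * tau)
    return (half_width / math.pi) / ((E - E_n) ** 2 + half_width ** 2)


E_n, tau = 0.0, 1.0
for E in (-2.0, 0.0, 0.5, 3.0):
    print(E, rho_numeric(E, E_n, tau), rho_lorentzian(E, E_n, tau))
```

The two columns agree to within the accuracy of the quadrature, confirming that an exponentially decaying amplitude corresponds to a Lorentzian energy-distribution of half-width $\hbar / (2 \tau_{n})$.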
This can only be an approximate form of the energy-distribution since the energy must be bounded from below. The existence of a lower bound to the energy distribution implies that the width of the electronic energy level $\frac{\hbar}{2 \tau_{n}} = \Gamma_{n}$ has to be energy-dependent, as this must become zero below a threshold energy. However, it should be noted that the width of the energy-distribution will determine the approximate exponential decay. Since the perturbations introduce an energy-dependent width to the wave packet, causality requires that the energy-shift $\Delta E_{n}$ should also be energy-dependent. Hence, the effects of the perturbation (such as the energy-shift and lifetime) should be described in terms of a self-energy $\Sigma_{n}(E)$
$$
\begin{aligned}
\Sigma_{n}(E) & = \Re e \, \Sigma_{n}(E) + i \, \Im m \, \Sigma_{n}(E) \\
& \approx \Delta E_{n} - i \, \Gamma_{n}
\end{aligned}
\tag{926}
$$

$^{51}$ V. Weisskopf, Ann. der Physik, 9, 23 (1931).
The energy-dependent self-energy appears most naturally if one uses Brillouin-Wigner perturbation theory to calculate the correction to an approximate energy $E_{n}$. From second-order Brillouin-Wigner perturbation theory, one finds that the energy-dependent self-energy, when evaluated just above the real $E$ axis, is given by
$$
\Sigma_{n}(E + i \eta) = \sum_{n'} \frac{ | \langle n' | \hat{H}_{I} | n \rangle |^{2} }{ E + i \eta - E_{n'} }
\tag{927}
$$
This complex self-energy has a real and imaginary part. The imaginary part can be thought of as occurring via amplification of the infinitesimal imaginary term $i \eta$ in the denominator, and can be seen to be non-zero when the energy $E$ of the component in the wave packet falls in the region where the spectral density of the approximate $E_{n'}$ is finite. Hence, since the $E_{n'}$ are bounded from below, then so is the energy-distribution $\rho_{n}(E)$ since
$$
\rho_{n}(E) = - \frac{1}{\pi} \, \frac{ \Im m \, \Sigma_{n}(E + i \eta) }{ ( E - E_{n} - \Re e \, \Sigma_{n}(E) )^{2} + ( \Im m \, \Sigma_{n}(E + i \eta) )^{2} }
\tag{928}
$$
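The real part of the self-energy must also be energy-dependent, since it is related to the imaginary part via the Kramers-Kronig relations
$$
\begin{aligned}
\Re e \, \Sigma_{n}(E) & = - \frac{\mathrm{Pr}}{\pi} \int_{-\infty}^{\infty} dz \, \frac{ \Im m \, \Sigma_{n}(z + i \eta) }{ E - z } \\
\Im m \, \Sigma_{n}(E + i \eta) & = + \frac{\mathrm{Pr}}{\pi} \int_{-\infty}^{\infty} dz \, \frac{ \Re e \, \Sigma_{n}(z) }{ E - z }
\end{aligned}
\tag{929}
$$
where the Principal Part of an integral with a simple pole is defined as
$$
\mathrm{Pr} \int_{-\infty}^{\infty} dz \, \frac{f(z)}{z} = \lim_{\varepsilon \to 0} \left[ \int_{+\varepsilon}^{\infty} dz \, \frac{f(z)}{z} + \int_{-\infty}^{-\varepsilon} dz \, \frac{f(z)}{z} \right]
\tag{930}
$$
Hence, the real part of the self-energy is also energy-dependent. The Kramers-Kronig relation is an expression of causality.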
Since the electronic states in the expression for the Fermi-Golden rule decay rate
$$
\frac{1}{\tau_{nl \to n'l'}} = \frac{2 \pi}{\hbar} \, | \langle n'l'm' | \hat{H}_{I} | nlm \rangle |^{2} \, \delta( E_{n'l'} - E_{nl} - \hbar \omega )
\tag{931}
$$
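are to be interpreted as wave packets with a distribution of energies, the factor expressing conservation of energy should be expressed in terms of the energy conservation for the components of the wave packets. Hence, the decay rate should be written as the convolution
$$
\begin{aligned}
\frac{1}{\tau_{nl \to n'l'}} & = \frac{2 \pi}{\hbar} \, | \langle n'l'm' | \hat{H}_{I} | nlm \rangle |^{2} \int_{-\infty}^{\infty} dE' \, \rho_{n'l'}(E') \int_{-\infty}^{\infty} dE \, \rho_{nl}(E) \, \delta( E' - E - \hbar \omega ) \\
& = \frac{2 \pi}{\hbar} \, | \langle n'l'm' | \hat{H}_{I} | nlm \rangle |^{2} \int_{-\infty}^{\infty} dE \, \rho_{n'l'}( E + \hbar \omega ) \, \rho_{nl}(E)
\end{aligned}
\tag{932}
$$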
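We shall use the approximation for the energy distributions suggested by eqn (925). In this case, the convolution is evaluated by contour integration as
$$
\frac{1}{\tau_{nl \to n'l'}} = \frac{2}{\hbar} \, | \langle n'l'm' | \hat{H}_{I} | nlm \rangle |^{2} \, \frac{ \frac{\hbar}{2 \tau_{nl}} + \frac{\hbar}{2 \tau_{n'l'}} }{ ( \hbar \omega + E_{nl} - E_{n'l'} )^{2} + \left( \frac{\hbar}{2 \tau_{nl}} + \frac{\hbar}{2 \tau_{n'l'}} \right)^{2} }
\tag{933}
$$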
since only the terms with poles on the opposite sides of the real-axis yield non-zero contributions. From this, one can show that the optical absorption cross-section is given by
$$
\sigma_{\text{absorb}}(\omega) = \frac{4 \pi}{3} \left( \frac{e^{2}}{\hbar c} \right) \sum_{n'l'm'} | \langle n'l'm' | \mathbf{r} | nlm \rangle |^{2} \; \omega_{n'l',nl} \; \frac{ \left( \frac{1}{2 \tau_{n'l'}} + \frac{1}{2 \tau_{nl}} \right) }{ ( \omega_{n'l',nl} - \omega )^{2} + \left( \frac{1}{2 \tau_{n'l'}} + \frac{1}{2 \tau_{nl}} \right)^{2} }
\tag{934}
$$
which was first derived by Weisskopf and Wigner$^{52}$. Hence, the natural width is given by the average of the decay rates for the initial and final electronic states. This leads to the conclusion that even weak lines can be broad, if the final electronic state has a short lifetime.
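The addition of the widths in eqn (933) follows from the fact that the convolution of two Lorentzians is again a Lorentzian whose half-width is the sum of the individual half-widths. The sketch below checks this numerically for the integral in eqn (932). The energies, widths, integration cutoff and function names are illustrative assumptions, in arbitrary energy units.

```python
import math


def lorentzian(E, center, half_width):
    """Normalized Lorentzian of the form given in eqn (925)."""
    return (half_width / math.pi) / ((E - center) ** 2 + half_width ** 2)


def convolved_lineshape(hbar_omega, E_i, gamma_i, E_f, gamma_f,
                        cutoff=500.0, step=0.01):
    """Numerical estimate of the integral of rho_f(E + hbar*omega) rho_i(E) dE."""
    n_points = int(2.0 * cutoff / step)
    total = 0.0
    for j in range(n_points + 1):
        E = E_i - cutoff + j * step
        total += lorentzian(E + hbar_omega, E_f, gamma_f) * lorentzian(E, E_i, gamma_i)
    return total * step


# Assumed sample values: initial and final level energies and half-widths
E_i, gamma_i = 0.0, 0.5
E_f, gamma_f = 3.0, 0.7
hbar_omega = 2.5

numeric = convolved_lineshape(hbar_omega, E_i, gamma_i, E_f, gamma_f)
# Expected: a Lorentzian in hbar*omega centred at E_f - E_i with the summed half-width
analytic = lorentzian(hbar_omega, E_f - E_i, gamma_i + gamma_f)
print(numeric, analytic)
```

Multiplying this line shape by $\frac{2 \pi}{\hbar} | \langle n'l'm' | \hat{H}_{I} | nlm \rangle |^{2}$ reproduces the form of eqn (933).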
10.3 Renormalization
Quantum Electrodynamics treats the interactions between charged particles and the electromagnetic field, and often contains infinities. The zero-point energy of the electromagnetic field is one such infinity. In most cases, these infinities can be ignored since they are not measurable: the infinities occur as modifications caused by the introduction of interactions between the charged particles of a hypothetical system and an electromagnetic field. That is, the infinities occur in the form of a renormalization of the quantities of the non-interacting theory. These infinite renormalizations do not lead to the rejection of the theory of Quantum Electrodynamics since the quantities of the non-interacting system are not measurable. To be sure, the infinities occur in relations between hypothetical quantities and physically measurable quantities, and so these infinities can be ignored since the hypothetical quantities are undefined. However, it is possible to use the theory to eliminate the unmeasurable quantities, thereby yielding relations between physically measurable quantities and other physically measurable quantities. In Quantum Electrodynamics, the infinities cancel in equations which only contain physically measurable quantities. This fortunate circumstance makes the theory of Quantum Electrodynamics renormalizable.

First, it shall be shown how the infinite zero-point energy of the electromagnetic field can lead to a (finite) physically measurable force between its containing walls.

$^{52}$ V. F. Weisskopf and E. Wigner, Z. Physik, 63, 54 (1930).
10.3.1 The Casimir Eﬀect
The zero-point energy of the electromagnetic field can lead to measurable effects. In general relativity, the total energy including the zero-point energy of the electromagnetic radiation is the source for the gravitational field. The Casimir effect$^{53}$ shows that the zero-point energy of the electromagnetic radiation produces a force on the walls of the cavity. We shall consider a cubic volume $V = L^{3}$ which is enclosed by conducting walls that act as a cavity for the electromagnetic radiation. This volume is divided into two by a metallic partition, which is located at a distance $d$ from one side of the cavity. We shall evaluate the total energy of this configuration and then deduce the form of the interaction between the partition and the walls of the cavity.

Figure 34: The geometry of the partitioned electromagnetic cavity used to consider the Casimir effect. The partition divides the cavity into regions of width $d$ and $L - d$.
We shall consider the total energy due to the zero-point fluctuations in the container. Since the zero-point energy is divergent due to the presence of arbitrarily large frequencies, we shall introduce a convergence factor. The convergence factor can be motivated by the observation that, in matter, electromagnetic radiation becomes exponentially damped at large frequencies. Hence, one can write
$$
E = \frac{1}{2} \sum_{\mathbf{k},\alpha} \hbar \, \omega_{k,\alpha} \, \exp\left[ - \lambda \, \frac{\omega_{k,\alpha}}{c} \right]
\tag{935}
$$
and then take the limit $\lambda \to 0$.

$^{53}$ H. B. G. Casimir, Physica 19, 846 (1953).
The presence of the conducting walls introduces boundary conditions such that the EM field is zero at every boundary. The boundary conditions restrict the allowed values of $\mathbf{k}$ so that the components satisfy
$$
k_{i} \, L = \pi \, n_{i}
\tag{936}
$$
for $i = x, y$, where the $n_{i}$ are positive integers. The boundary condition for the remaining two boundaries leads to the restriction
$$
k_{z} \, d = \pi \, n_{z}
\tag{937}
$$
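The energy of the radiation in one part of the partition can be expressed as
$$
E_{d} = \hbar c \, \frac{L^{2}}{\pi^{2}} \int_{0}^{\infty} dk_{x} \int_{0}^{\infty} dk_{y} \sum_{n_{z}=1}^{\infty} \sqrt{ k_{x}^{2} + k_{y}^{2} + \left( \frac{n_{z} \pi}{d} \right)^{2} } \; \exp\left[ - \lambda \sqrt{ k_{x}^{2} + k_{y}^{2} + \left( \frac{n_{z} \pi}{d} \right)^{2} } \right]
\tag{938}
$$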
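where the two polarizations have been summed over. The integration has cylindrical symmetry but only extends over the quadrant with positive $k_{x}$ and $k_{y}$; therefore, it shall be rewritten as
$$
E_{d} = \hbar c \, \frac{2 L^{2}}{4 \pi} \int_{0}^{\infty} dk \, k \sum_{n_{z}=1}^{\infty} \sqrt{ k^{2} + \left( \frac{n_{z} \pi}{d} \right)^{2} } \; \exp\left[ - \lambda \sqrt{ k^{2} + \left( \frac{n_{z} \pi}{d} \right)^{2} } \right]
\tag{939}
$$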
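or, on changing variable to the dimensionless $\kappa = k^{2} \left( \frac{d}{n_{z} \pi} \right)^{2}$,
$$
E_{d} = \hbar c \, \frac{L^{2}}{4 \pi} \sum_{n_{z}=1}^{\infty} \left( \frac{n_{z} \pi}{d} \right)^{3} \int_{0}^{\infty} d\kappa \, \sqrt{\kappa + 1} \; \exp\left[ - \frac{n_{z} \pi \lambda}{d} \sqrt{\kappa + 1} \right]
\tag{940}
$$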
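The factor of $n_{z}^{3}$ can be expressed as a third-order derivative of the exponential factor w.r.t. $\lambda$
$$
E_{d} = - \hbar c \, \frac{L^{2}}{4 \pi} \sum_{n_{z}=1}^{\infty} \int_{0}^{\infty} d\kappa \, \frac{1}{\kappa + 1} \, \frac{\partial^{3}}{\partial \lambda^{3}} \left( \exp\left[ - \frac{n_{z} \pi \lambda}{d} \sqrt{\kappa + 1} \right] \right)
\tag{941}
$$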
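The summation over $n_{z}$ can be performed, leading to
$$
\begin{aligned}
E_{d} & = - \hbar c \, \frac{L^{2}}{4 \pi} \int_{0}^{\infty} d\kappa \, \frac{1}{\kappa + 1} \, \frac{\partial^{3}}{\partial \lambda^{3}} \left( \frac{ \exp[ - \frac{\pi \lambda}{d} \sqrt{\kappa + 1} \, ] }{ 1 - \exp[ - \frac{\pi \lambda}{d} \sqrt{\kappa + 1} \, ] } \right) \\
& = - \hbar c \, \frac{L^{2}}{4 \pi} \int_{0}^{\infty} d\kappa \, \frac{1}{\kappa + 1} \, \frac{\partial^{3}}{\partial \lambda^{3}} \left( \frac{1}{ \exp[ \frac{\pi \lambda}{d} \sqrt{\kappa + 1} \, ] - 1 } \right)
\end{aligned}
\tag{942}
$$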
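Let $t = \sqrt{\kappa + 1}$ so
$$
E_{d} = - \hbar c \, \frac{2 L^{2}}{4 \pi} \int_{1}^{\infty} \frac{dt}{t} \, \frac{\partial^{3}}{\partial \lambda^{3}} \left( \frac{1}{ \exp[ \frac{\pi \lambda}{d} t \, ] - 1 } \right)
\tag{943}
$$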
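The factor of $t^{-1}$ can be eliminated by performing one of the differentiations with respect to $\lambda$.
$$
E_{d} = \frac{\hbar c \, L^{2}}{2 d} \int_{1}^{\infty} dt \, \frac{\partial^{2}}{\partial \lambda^{2}} \left( \frac{ \exp[ \frac{\pi \lambda}{d} t \, ] }{ ( \exp[ \frac{\pi \lambda}{d} t \, ] - 1 )^{2} } \right)
\tag{944}
$$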
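We shall set
$$
s = \exp\left[ \frac{\pi \lambda t}{d} \right] - 1
\tag{945}
$$
therefore
$$
E_{d} = \frac{\hbar c \, L^{2}}{2 d} \, \frac{\partial^{2}}{\partial \lambda^{2}} \left( \frac{d}{\pi \lambda} \int_{s_{0}}^{\infty} \frac{ds}{s^{2}} \right)
\tag{946}
$$
where the lower limit of integration depends on $\lambda$ and is given by
$$
s_{0} = \exp\left[ \frac{\pi \lambda}{d} \right] - 1
\tag{947}
$$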
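The integration can be performed trivially, yielding
$$
\begin{aligned}
E_{d} & = \frac{\hbar c \, L^{2}}{2 d} \, \frac{\partial^{2}}{\partial \lambda^{2}} \left( \frac{d}{\pi \lambda \, s_{0}} \right) \\
& = \frac{\hbar c \, L^{2}}{2 d} \, \frac{\partial^{2}}{\partial \lambda^{2}} \left( \frac{ \frac{d}{\pi \lambda} }{ \exp[ \frac{\pi \lambda}{d} ] - 1 } \right) \\
& = \frac{\hbar c \, L^{2}}{2 d} \, \frac{\partial^{2}}{\partial \lambda^{2}} \left( \left( \frac{d}{\pi \lambda} \right)^{2} \left( \frac{ \frac{\pi \lambda}{d} }{ \exp[ \frac{\pi \lambda}{d} ] - 1 } \right) \right)
\end{aligned}
\tag{948}
$$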
The last factor in the above expression can be expanded as
$$
\frac{x}{\exp[x] - 1} = \sum_{n=0}^{\infty} B_{n} \, \frac{x^{n}}{n!}
\tag{949}
$$
where $B_{n}$ are the Bernoulli numbers, which are given by $B_{0} = 1$, $B_{1} = - \frac{1}{2}$, $B_{2} = \frac{1}{6}$, $B_{3} = 0$, $B_{4} = - \frac{1}{30}$, etc. Therefore, the energy of the electromagnetic cavity at zero temperature is finite for a finite value of the cut-off $\lambda$ but diverges as $\lambda^{-4}$ when $\lambda \to 0$. The zero-point energy of the cavity can be expressed as
$$
E_{d} = \frac{\hbar c \, L^{2}}{2} \left[ \frac{\pi^{2}}{d^{3}} \sum_{n=0}^{\infty} B_{n} \, \frac{(n - 2)(n - 3)}{n!} \left( \frac{\pi \lambda}{d} \right)^{n-4} \right]
\tag{950}
$$
where the $n = 0$ term diverges as $\lambda^{-4}$ in the limit as $\lambda \to 0$ and is proportional to the volume of the cavity $d \, L^{2}$. The term with $n = 1$ also diverges, but diverges as $\lambda^{-3}$ and has the form of a surface energy since it is proportional to $L^{2}$. The terms with $n = 2$ and $n = 3$ are identically equal to zero. The term with $n = 4$ remains finite in the limit $\lambda \to 0$ and all the higher-order terms vanish in this limit. Explicitly, one has
$$
E_{d} = \frac{\hbar c \, L^{2}}{2 d} \left[ \frac{6 \, B_{0} \, d^{2}}{\pi^{2} \lambda^{4}} + \frac{2 \, B_{1} \, d}{\pi \lambda^{3}} + \frac{2 \, B_{4} \, \pi^{2}}{4! \, d^{2}} + O(\lambda) \right]
\tag{951}
$$
The first term in the energy is proportional to $L^{2} d$, which is the volume of the cavity, and the second term is proportional to $L^{2}$, the surface area of the walls. The third term is independent of the cut-off and the higher order terms vanish in the limit $\lambda \to 0$.
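The coefficients in eqns (950) and (951) can be checked with exact rational arithmetic. The sketch below generates the Bernoulli numbers from the standard recurrence $\sum_{k=0}^{m} \binom{m+1}{k} B_{k} = 0$ (with the convention $B_{1} = -\frac{1}{2}$ used above) and evaluates the coefficient $B_{n} (n-2)(n-3)/n!$ of each term. The function names are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb, factorial


def bernoulli_numbers(n_max):
    """Return [B_0, ..., B_n_max] as exact fractions, with B_1 = -1/2."""
    numbers = [Fraction(1)]
    for m in range(1, n_max + 1):
        partial_sum = sum(comb(m + 1, k) * numbers[k] for k in range(m))
        numbers.append(-partial_sum / (m + 1))
    return numbers


def expansion_coefficient(n, numbers):
    """Coefficient B_n (n-2)(n-3)/n! multiplying (pi lambda / d)^(n-4) in eqn (950)."""
    return numbers[n] * (n - 2) * (n - 3) / factorial(n)


B = bernoulli_numbers(6)
coefficients = [expansion_coefficient(n, B) for n in range(5)]
print(B[:5])
print(coefficients)
```

The $n = 2$ and $n = 3$ coefficients vanish, and the $n = 4$ coefficient is $-\frac{1}{360}$, so that the cut-off independent part of the energy is $\frac{\hbar c L^{2}}{2} \frac{\pi^{2}}{d^{3}} \left( - \frac{1}{360} \right) = - \frac{\pi^{2}}{720} \frac{\hbar c L^{2}}{d^{3}}$.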
The Casimir force is the force between two planes, which originates from the energy of the field$^{54}$. This energy can be separated out into a volume part and parts due to the creation of the surfaces and an interaction energy between the surfaces. In order to eliminate both the volume dependence of the energy and the surface energies, we are considering two configurations of the partitions in the cavity. In one configuration the plane divides the volume into two unequal volumes $d \, L^{2}$ and $(L - d) \, L^{2}$, and the other configuration is a reference configuration where the cavity is partitioned into two equal volumes $\frac{L^{3}}{2}$. The difference of energies for these configurations is given by
$$
\Delta E = E_{d} + E_{L-d} - 2 \, E_{\frac{L}{2}}
\tag{952}
$$
In the limit $L \to \infty$ this is expected to reduce to the energy of interaction between the planes separated by distance $d$. Since the volume and surface areas of the two partitions are identical, one finds that the difference in energy of the two configurations is finite and is given by
$$
\lim_{L \gg d, \; \lambda \to 0} \Delta E \to - \frac{\pi^{2}}{720} \, \frac{\hbar c \, L^{2}}{d^{3}}
\tag{953}
$$
The d-dependence of the energy difference leads to an attractive force between the two plates separated by a distance d, which is the Casimir force

$$F = - \frac{\pi^2}{240} \frac{\hbar c L^2}{d^4} \tag{954}$$

The force is proportional to $L^2$, which is the area of the wall of the cavity. The predicted force was measured by Sparnaay$^{55}$. A more recent experiment involving a similar force between a planar surface and a sphere has achieved greater accuracy$^{56}$.
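To get a feeling for the size of the effect, the following minimal sketch evaluates eq. (953) and eq. (954) per unit plate area. The function names and the sample separations are illustrative assumptions, not taken from the text; the constants are CODATA values in SI units.

```python
import math

HBAR = 1.054571817e-34   # J s (CODATA 2018)
C = 299792458.0          # m / s

def casimir_pressure(d):
    """Magnitude of the attractive force per unit area, pi^2 hbar c / (240 d^4), in Pa."""
    return math.pi**2 * HBAR * C / (240.0 * d**4)

def casimir_energy_per_area(d):
    """Interaction energy per unit area, -pi^2 hbar c / (720 d^3), in J / m^2."""
    return -math.pi**2 * HBAR * C / (720.0 * d**3)

if __name__ == "__main__":
    # Hypothetical plate separations, chosen only for illustration
    for d in (1e-7, 5e-7, 1e-6):
        print(f"d = {d:.1e} m  ->  F/L^2 = {casimir_pressure(d):.3e} Pa")
```

At a separation of one micron the pressure is of the order of a millipascal, and it grows as $d^{-4}$ as the plates approach.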
$^{54}$ Our considerations only include the part of Fock space that corresponds to having zero numbers of excited quanta. Hence, the Casimir force is due to the properties of the field, and is not due to the transmission of real particles (photons) between the planes.
$^{55}$ M. J. Sparnaay, Physica 24, 751 (1958).
$^{56}$ S. K. Lamoreaux, Phys. Rev. Lett. 78, 5 (1997).
Figure 35: The separation-dependent force between two closely spaced metallic surfaces due to the modification of the zero-point energy. The lower panel shows the difference between the experimental results and the theoretical prediction for the Casimir force. [After S. K. Lamoreaux, Phys. Rev. Lett. 78, 5 (1997).]
To summarize, the physical quantity is the force or difference in energies when one wall is moved. When the change in energy is calculated, the difference between the two divergent energies is finite and independent of the choice of cutoff$^{57}$.
Cut-Off Independence
It is the boundary condition and not the cutoff that plays an important role in the Casimir effect. For simplicity, one can choose zero boundary conditions. The zero-point energy of a cylindrical electromagnetic cavity of radius R and length d can be expressed as the sum

$$E_d = 2\, \frac{\hbar c}{2}\, \frac{\pi R^2}{(2\pi)} \sum_{n_z=1}^{\infty} \int dk_\rho\, k_\rho \sqrt{k_\rho^2 + \left(\frac{\pi n_z}{d}\right)^2}\; F\!\left( \sqrt{k_\rho^2 + \left(\frac{\pi n_z}{d}\right)^2} \right) \tag{955}$$
where F(z) is an arbitrary cutoff function (which may depend on an arbitrary parameter λ which is ultimately going to be set to zero). The cutoff must not affect the low-energy modes, so one can choose F(0) = 1 and all the derivatives of F(z) to be zero for finite values of z. These assumptions are all in accord with the ideal case of no cutoff function, or F(z) = 1. The energy can be written as

$$E_d = \frac{\hbar c R^2}{2} \sum_{n_z=1}^{\infty} f(n_z) \tag{956}$$
$^{57}$ The independence of any cutoff procedure can be shown by evaluating the divergent sums by using the Euler-Maclaurin summation formula.
Figure 36: The schematic form of the cutoff function F(z), plotted against z/N.
where

$$f(n_z) = \int_0^{\infty} dk_\rho\, k_\rho \sqrt{k_\rho^2 + \left(\frac{\pi n_z}{d}\right)^2}\; F\!\left( \sqrt{k_\rho^2 + \left(\frac{\pi n_z}{d}\right)^2} \right) \tag{957}$$
The summation can be performed by changing it into an integral; however, the corrections due to smoothing will be kept. This is accomplished by the Euler-Maclaurin formula. The integral between 0 and N of a function can be roughly expressed as a summation

$$\int_0^N dx\, f(x) \approx \frac{1}{2} f(0) + \sum_{n=1}^{N-1} f(n) + \frac{1}{2} f(N) \tag{958}$$
by choosing to approximate the integral by the area under a histogram where the x variable is binned into intervals of width unity centered around x = n. The corrections at n = 0 and n = N are needed to account for the fact that the range of integration excludes half the width of the rectangular blocks centered on n = 0 and n = N. The Euler-Maclaurin formula is equivalent to finding a good smooth polynomial fit to the integrand, and then integrating the polynomial. It generates corrections which are given by the derivatives at the end points

$$\begin{aligned} \int_0^N dx\, f(x) = {} & \frac{1}{2} f(0) + \sum_{n=1}^{N-1} f(n) + \frac{1}{2} f(N) \\ & + \frac{B_2}{2!} \left( f^{(1)}(0) - f^{(1)}(N) \right) + \frac{B_4}{4!} \left( f^{(3)}(0) - f^{(3)}(N) \right) + \ldots \end{aligned} \tag{959}$$
We shall assume that f(n) and all its derivatives vanish in the limit of large n, $\lim_{N \to \infty} f(N) \to 0$, due to the behavior of the cutoff function. The corrections in the Euler-Maclaurin summation formula can be evaluated by noting that the first derivative of f(n) with respect to n is given by

$$f^{(1)}(n) = \frac{\pi^2 n}{d^2} \int_0^{\infty} dk_\rho\, \frac{k_\rho}{\sqrt{k_\rho^2 + \left(\frac{\pi n}{d}\right)^2}}\; F\!\left( \sqrt{k_\rho^2 + \left(\frac{\pi n}{d}\right)^2} \right) \tag{960}$$
since the derivatives of F(z) all vanish for finite z. The integration over the variable $k_\rho$ is re-expressed in terms of an integration over the variable z, defined by

$$z = \sqrt{k_\rho^2 + \left(\frac{\pi n}{d}\right)^2} \tag{961}$$

so

$$dz = dk_\rho\, \frac{k_\rho}{\sqrt{k_\rho^2 + \left(\frac{\pi n}{d}\right)^2}} \tag{962}$$
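The integration is evaluated through integration by parts

$$\begin{aligned} f^{(1)}(n) & = \frac{\pi^2 n}{d^2} \int_{\pi n / d}^{\infty} dz\, F(z) \\ & = \frac{\pi^2 n}{d^2} \int_{\pi n / d}^{\infty} dz \left( \frac{\partial z}{\partial z} \right) F(z) \\ & = \frac{\pi^2 n}{d^2}\, z\, F(z) \bigg|_{\pi n / d}^{\infty} = - \frac{\pi^3 n^2}{d^3} \end{aligned} \tag{963}$$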
In deriving the above expression, the condition that the first-order derivative of F(z) vanishes for finite z has been used. It immediately follows that

$$f^{(2)}(n) = - \frac{2 \pi^3 n}{d^3} \tag{964}$$

and

$$f^{(3)}(n) = - \frac{2 \pi^3}{d^3} \tag{965}$$

and all higher-order derivatives vanish. Hence, one finds that at n = 0 all the mth order derivatives $f^{(m)}(0)$ vanish, except for m = 3 which is given by

$$f^{(3)}(0) = - \frac{2 \pi^3}{d^3} \tag{966}$$
Hence, on evaluating the energy of the cylindrical cavity (and using the zero boundary conditions), one finds that the energy is composed of the sum of an integral and a finite number of other terms. The integral part of the expression only depends on the volume of the cavity and is proportional to a divergent integral, and hence drops out when the energy differences are taken. The only terms that yield non-zero contributions to the energy difference originate with $f^{(3)}(0)$ and depend on d. It is these terms that give rise to the Casimir force. This approach also shows that any particular choice made for the cutoff is irrelevant.
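As a consistency check, the d-dependent part of the cylindrical-cavity energy can be assembled from eqs. (956), (959) and (966) and compared with eq. (953), using the cross-sectional area $\pi R^2$ in place of $L^2$. The sketch below does the bookkeeping with exact rational arithmetic; the variable names are illustrative, and the value $B_4 = -1/30$ is the signed Bernoulli number generated by $z/(e^z - 1)$.

```python
from fractions import Fraction
import math

B4 = Fraction(-1, 30)            # signed Bernoulli number B_4
F3_AT_ZERO = Fraction(-2)        # f'''(0) in units of pi^3 / d^3, from eq. (966)

# From eq. (959) with N -> infinity, the sum over n >= 1 of f(n) contains the
# d-dependent end-point term  -(B_4 / 4!) f'''(0)  (in units of pi^3 / d^3).
sum_coefficient = -B4 / math.factorial(4) * F3_AT_ZERO

# Eq. (956): E_d = (hbar c R^2 / 2) * sum, so in units of hbar c (pi R^2) pi^2 / d^3:
energy_coefficient = Fraction(1, 2) * sum_coefficient

print(energy_coefficient)  # coefficient multiplying pi^2 hbar c (pi R^2) / d^3
```

The coefficient agrees with the factor $-1/720$ in eq. (953), whatever form is taken for the cutoff function.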
Mathematical Interlude:
The Euler-Maclaurin Summation Formula.
The Euler-Maclaurin formula allows one to accurately evaluate the difference of finite summations and their approximate evaluations in the form of integrals.
The Euler-Maclaurin Formula
If N is an integer and f(x) is a smooth differentiable function defined for all real values of x between 0 and N, then the summation

$$S = \sum_{n=1}^{N-1} f(n) \tag{967}$$

can be approximated by an integral

$$I = \int_0^N dx\, f(x) \tag{968}$$
In particular, by utilizing the “trapezoidal rule”, one expects that

$$I \sim S + \frac{1}{2} \left[ f(0) + f(N) \right] \tag{969}$$
The Euler-Maclaurin formula provides expressions for the difference between the sum and the integral in terms of the higher derivatives $f^{(n)}$ at the end points of the interval 0 and N. For any integer p, one has

$$S + \frac{1}{2} \left[ f(0) + f(N) \right] - I = \sum_{n=1}^{p} \frac{B_{2n}}{(2n)!} \left[ f^{(2n-1)}(N) - f^{(2n-1)}(0) \right] + R \tag{970}$$

where $B_1 = -1/2$, $B_2 = 1/6$, $B_3 = 0$, $B_4 = -1/30$, $B_5 = 0$, $B_6 = 1/42$, $B_7 = 0$, $B_8 = -1/30$, ... are the Bernoulli numbers (with the signs that follow from the generating function given below), and R is an error term which is normally small if the series on the right is truncated at a suitable value of p.
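The formula can be checked numerically on a simple case. The sketch below uses the test function $f(x) = e^{-x/4}$ (an assumption made purely for illustration, since its sum, integral and derivatives are all known in closed form) and compares the two sides of eq. (970) truncated at p = 2.

```python
import math
from fractions import Fraction

# Signed Bernoulli numbers B_2 and B_4, as generated by z / (e^z - 1)
B = {2: Fraction(1, 6), 4: Fraction(-1, 30)}
A = 0.25  # decay rate of the hypothetical test function f(x) = exp(-A x)

def f_deriv(x, order=0):
    """order-th derivative of exp(-A x); order = 0 returns the function itself."""
    return (-A) ** order * math.exp(-A * x)

def euler_maclaurin_check(N, p=2):
    """Return (lhs, rhs) of eq. (970) with the remainder R dropped."""
    S = sum(f_deriv(n) for n in range(1, N))     # sum over n = 1 .. N-1
    I = (1.0 - math.exp(-A * N)) / A              # exact integral from 0 to N
    lhs = S + 0.5 * (f_deriv(0) + f_deriv(N)) - I
    rhs = sum(float(B[2 * n]) / math.factorial(2 * n)
              * (f_deriv(N, 2 * n - 1) - f_deriv(0, 2 * n - 1))
              for n in range(1, p + 1))
    return lhs, rhs

if __name__ == "__main__":
    lhs, rhs = euler_maclaurin_check(8)
    print(lhs, rhs, abs(lhs - rhs))
```

For this slowly varying function two correction terms already reproduce the trapezoidal-rule error to better than one part in $10^{6}$; the residual difference is the remainder R.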
The Remainder Term
The remainder R when the series is truncated after p terms is given by

$$R = (-1)^p \int_0^N dx\, f^{(p+1)}(x)\, \frac{P_{p+1}(x)}{(p+1)!} \tag{971}$$

where $P_n(x) = B_n(x - [x])$ are the periodic Bernoulli polynomials. The remainder term can be estimated as

$$| R | \le \frac{2}{(2\pi)^p} \int_0^N dx\, | f^{(2p-1)}(x) | \tag{972}$$
Derivation by Induction
First we shall examine the properties of the Bernoulli polynomials and the Bernoulli numbers. Then we shall indicate how the Euler-Maclaurin formula can be obtained by induction.
The Bernoulli polynomials $B_n(x)$, for n = 0, 1, 2, ... are defined by the generating function expansion

$$G(z, x) = \frac{z\, e^{zx}}{e^z - 1} = \sum_{n=0}^{\infty} B_n(x)\, \frac{z^n}{n!} \tag{973}$$
Furthermore, when x = 0, one has

$$G(z, 0) = \frac{z}{e^z - 1} = \sum_{n=0}^{\infty} B_n\, \frac{z^n}{n!} \tag{974}$$

where $B_n$ are the Bernoulli constants. Hence, the Bernoulli constants are the Bernoulli polynomials evaluated at x = 0, i.e. $B_n(0) = B_n$. Furthermore, on differentiating the generating function w.r.t. x, one finds

$$\frac{\partial G(z, x)}{\partial x} = z\, G(z, x) \tag{975}$$
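which implies that

$$\sum_{n=0}^{\infty} \frac{\partial B_n(x)}{\partial x}\, \frac{z^n}{n!} = z \sum_{n=0}^{\infty} B_n(x)\, \frac{z^n}{n!} \tag{976}$$

On equating the coefficients of $z^n$ in the above equation, one obtains the important relation

$$\frac{\partial B_n(x)}{\partial x} = n\, B_{n-1}(x) \tag{977}$$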
Therefore, by integration it is easy to show that $B_n(x)$ are polynomials of degree n. The first few Bernoulli polynomials can be explicitly constructed from the generating function expansion. The first few polynomials are given by

$$\begin{aligned} B_0(x) & = 1 \\ B_1(x) & = x - \frac{1}{2} \\ B_2(x) & = x^2 - x + \frac{1}{6} \\ B_3(x) & = x^3 - \frac{3}{2} x^2 + \frac{1}{2} x \\ B_4(x) & = x^4 - 2 x^3 + x^2 - \frac{1}{30} \\ B_5(x) & = x^5 - \frac{5}{2} x^4 + \frac{5}{3} x^3 - \frac{1}{6} x \\ & \ldots \end{aligned} \tag{978}$$
From the generating function expansion, one can show that the Bernoulli polynomials are either even or odd functions of $x - \frac{1}{2}$. The generating function can be expressed as

$$G(z, x) = e^{z (x - \frac{1}{2})} \left[ \frac{z}{e^{z/2} - e^{-z/2}} \right] = \sum_{n=0}^{\infty} B_n(x)\, \frac{z^n}{n!} \tag{979}$$
where the second factor is an even function of z; thus, the generating function is invariant under the combined transformation $z \to -z$ and $(x - \frac{1}{2}) \to -(x - \frac{1}{2})$. Therefore, one has

$$\sum_{n=0}^{\infty} B_n\!\left( \frac{1}{2} + x - \frac{1}{2} \right) \frac{z^n}{n!} = \sum_{n=0}^{\infty} B_n\!\left( \frac{1}{2} + \frac{1}{2} - x \right) (-1)^n\, \frac{z^n}{n!} \tag{980}$$

so the polynomials satisfy

$$B_n(x) = (-1)^n B_n(1 - x) \tag{981}$$

In particular for x = 1, one has

$$B_n(1) = (-1)^n B_n(0) \tag{982}$$
The generating function with x = 0 can be rewritten as the sum of its even and odd parts

$$G(z, 0) = \left[ \frac{z}{2 \tanh \frac{z}{2}} \right] - \frac{z}{2} = \sum_{n=0}^{\infty} B_n(0)\, \frac{z^n}{n!} \tag{983}$$

The even part has only even terms in its Taylor expansion, and there is only one term in the odd part. Hence, the odd Bernoulli numbers vanish for n > 1, i.e. $B_{2n+1}(0) = 0$ for n > 0. Therefore, for n ≥ 2, one has $B_n(0) = B_n(1)$. This equality can be used to evaluate the integrals of the Bernoulli polynomial over the range from 0 to 1. On expressing the integral of $B_n(x)$ in terms of $B_{n+1}(x)$, one has

$$\int_0^1 dx\, B_n(x) = \frac{1}{(n+1)} \int_0^1 dx\, \frac{\partial B_{n+1}(x)}{\partial x} = \frac{B_{n+1}(1) - B_{n+1}(0)}{(n+1)} = 0 \quad \text{for } n \ge 1 \tag{984}$$
Hence, the Bernoulli polynomials may be defined recursively via the relation

$$\frac{\partial B_n(x)}{\partial x} = n\, B_{n-1}(x) \tag{985}$$

if the constant of integration is fixed by

$$\int_0^1 dx\, B_n(x) = 0 \quad \text{for } n \ge 1 \tag{986}$$
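The recursive definition lends itself to a short computation. The sketch below builds the polynomials from eqs. (985) and (986) with exact rational coefficients; the function name and the coefficient-list representation (lowest power first) are illustrative choices, not part of the text.

```python
from fractions import Fraction

def bernoulli_polynomials(n_max):
    """Return coefficient lists [c0, c1, ...] (lowest power first) for B_0 .. B_n_max."""
    polys = [[Fraction(1)]]  # B_0(x) = 1
    for n in range(1, n_max + 1):
        prev = polys[-1]
        # Integrate n * B_{n-1}(x) term by term: n c_k x^k -> n c_k x^(k+1) / (k+1)
        integ = [Fraction(0)] + [n * c / (k + 1) for k, c in enumerate(prev)]
        # Fix the integration constant so that the integral over [0, 1] vanishes
        mean = sum(c / (k + 1) for k, c in enumerate(integ))
        integ[0] = -mean
        polys.append(integ)
    return polys

if __name__ == "__main__":
    for n, coeffs in enumerate(bernoulli_polynomials(5)):
        print(n, [str(c) for c in coeffs])
```

The constant terms of the resulting polynomials are the Bernoulli numbers $B_n = B_n(0)$, so the same routine also generates the signed values $B_1 = -1/2$, $B_2 = 1/6$ and $B_4 = -1/30$.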
The periodic Bernoulli functions $P_n(x)$ can be defined by

$$P_n(x) = B_n(x - [x]) \tag{987}$$

where [x] is the integral part of x. This definition of $P_n(x)$ reduces to the Bernoulli polynomials on the interval (0, 1) since [x] = 0 in this interval. The functions $P_n(x)$ are periodic over an extended range of x with period 1.
The Euler-Maclaurin formula can be obtained by mathematical induction. Consider the integral

$$\int_n^{n+1} dx\, f(x) = \int_n^{n+1} dx\, u\, \frac{\partial v}{\partial x} \tag{988}$$

with the identification of

$$u = f(x) \tag{989}$$

and

$$\frac{\partial v}{\partial x} = 1 = P_0(x) \tag{990}$$

since $P_0(x) = 1$. Therefore, on using the recursion relation involving the derivative of the Bernoulli polynomials, one finds that

$$v = P_1(x) \tag{991}$$
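Integrating by parts, one obtains

$$\int_n^{n+1} dx\, f(x) = \left[ f(x)\, P_1(x) \right]_n^{n+1} - \int_n^{n+1} dx\, \frac{\partial f(x)}{\partial x}\, P_1(x) \tag{992}$$

but since the periodic Bernoulli polynomial $P_1(x)$ is given by

$$P_1(x) = (x - [x]) - \frac{1}{2} \tag{993}$$

it takes the value +1/2 as x approaches the upper limit of integration from below and the value −1/2 at the lower limit. Hence, the integration reduces to

$$\int_n^{n+1} dx\, f(x) = \left[ \frac{f(n+1) + f(n)}{2} \right] - \int_n^{n+1} dx\, \frac{\partial f(x)}{\partial x}\, P_1(x) \tag{994}$$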
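Summing the above expression from n = 1 to n = N − 1 yields

$$\int_1^N dx\, f(x) = \left[ \frac{f(1) + f(N)}{2} \right] + \sum_{n=2}^{N-1} f(n) - \int_1^N dx\, \frac{\partial f(x)}{\partial x}\, P_1(x) \tag{995}$$

Adding $\left[ \frac{f(1) + f(N)}{2} \right]$ to both sides of the equation and rearranging, one finds

$$\sum_{n=1}^{N} f(n) = \int_1^N dx\, f(x) + \left[ \frac{f(1) + f(N)}{2} \right] + \int_1^N dx\, \frac{\partial f(x)}{\partial x}\, P_1(x) \tag{996}$$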
The last two terms, therefore, give the error when the sum is approximated by an integral. The first correction is simply the end point corrections from the “trapezoidal rule”, and the second correction has to be evaluated to yield the Euler-Maclaurin formula. The last correction is of the form of an integral which can be expressed in terms of the sum of the integrals

$$\int_n^{n+1} dx\, f'(x)\, P_1(x) \tag{997}$$

where the prime refers to the derivative of f(x) w.r.t. x. The above expression can be evaluated by integrating by parts. The integrand is rewritten as

$$\int_n^{n+1} dx\, f'(x)\, P_1(x) = \int_n^{n+1} dx\, u\, \frac{\partial v}{\partial x} \tag{998}$$

where one identifies the two factors as

$$u = f'(x), \qquad \frac{\partial v}{\partial x} = P_1(x) \tag{999}$$
Since the indefinite integral is evaluated as

$$\int^x dx'\, P_1(x') = \frac{1}{2} P_2(x) \tag{1000}$$

the integration by parts yields

$$\int_n^{n+1} dx\, P_1(x)\, f'(x) = \left[ \frac{P_2(x)\, f'(x)}{2} \right]_n^{n+1} - \frac{1}{2} \int_n^{n+1} dx\, f''(x)\, P_2(x) \tag{1001}$$
However, one has $P_2(0) = P_2(1) = B_2$, therefore the above expression simplifies to

$$\int_n^{n+1} dx\, P_1(x)\, f'(x) = B_2 \left[ \frac{f'(n+1) - f'(n)}{2} \right] - \frac{1}{2} \int_n^{n+1} dx\, f''(x)\, P_2(x) \tag{1002}$$
Then, on summing the above expression from n = 1 to n = N − 1, one finds

$$\int_1^N dx\, P_1(x)\, f'(x) = B_2 \left[ \frac{f'(N) - f'(1)}{2} \right] - \frac{1}{2} \int_1^N dx\, f''(x)\, P_2(x) \tag{1003}$$

This yields the first term in the series of end point corrections in the Euler-Maclaurin formula, where the correction is the difference of the first derivatives at the end points multiplied by $B_2/2!$. The above process can be iterated, yielding a complete proof of the Euler-Maclaurin summation formula.
In order to get bounds on the size of the error when the sum is approximated by the integral, we note that the Bernoulli polynomials on the interval [0, 1] attain their maximum absolute values at the endpoints and the value $B_n(1)$ is the nth Bernoulli number.
References
T. M. Apostol, "An Elementary View of Euler's Summation Formula", American Mathematical Monthly, 106, 409-418 (1999).
D. H. Lehmer, "On the Maxima and Minima of Bernoulli Polynomials", American Mathematical Monthly, 47, 533-538 (1940).
10.3.2 The Lamb Shift
The Lamb shift is the splitting between the $2\,^2S_{1/2}$ and the $2\,^2P_{1/2}$ levels of hydrogen, which departs from the predictions of the Dirac equation. Since these levels have the same n and j values (j = l ± s), the Dirac equation predicts that they should be degenerate. However, these levels were measured by Lamb and Retherford$^{58}$ who found that the $2\,^2S_{1/2}$ level is higher than the $2\,^2P_{1/2}$ level by 1058 MHz or 0.033 cm$^{-1}$. Bethe explained this in terms of the interaction between the bound electron and the quantized electromagnetic field$^{59}$. Similar shifts should also occur between the $n\,^2S_{1/2}$ and the $n\,^2P_{1/2}$ levels, but the magnitude of the shifts should be much smaller, as the magnitude varies as $n^{-3}$.
Qualitatively, the electron interacts with the fluctuating electromagnetic field and with the potential due to the nucleus. The zero-point fluctuations cause the electron to deviate from its quantum orbit by an amount given by $\Delta \mathbf{r}$ and, therefore, it experiences a potential given by

$$V(\mathbf{r} + \Delta \mathbf{r}) = V(\mathbf{r}) + \Delta \mathbf{r} \cdot \nabla V(\mathbf{r}) + \frac{1}{2!} \left( \Delta \mathbf{r} \cdot \nabla \right)^2 V(\mathbf{r}) + \ldots \tag{1004}$$

so one expects an energy shift given by

$$\Delta E = \frac{1}{3 \cdot 2!}\, \langle \Delta r^2 \rangle\, \langle nlm |\, \nabla^2 V(r)\, | nlm \rangle \tag{1005}$$

$^{58}$ W. E. Lamb Jr. and R. C. Retherford, Phys. Rev. 72, 241 (1947).
$^{59}$ H. A. Bethe, Phys. Rev. 72, 339 (1947).
Due to the form of the Coulomb potential

$$V(r) = - \frac{e^2}{r} \tag{1006}$$

the Laplacian is related to a point charge density at the nucleus

$$\nabla^2 V(r) = 4 \pi e^2\, \delta(\mathbf{r}) \tag{1007}$$
Hence, the shift due to the fluctuations in the electron's potential energy occurs primarily at the origin. The effect of the electromagnetic fluctuations on the kinetic energy is not state specific, and can be considered as a uniform shift of all the energy levels, like the electron's rest mass energy $m c^2$. Thus, the relative energy shift of the levels is solely determined by the potential at the origin. Therefore, the states with non-zero angular momenta do not experience the relative energy shift since the electronic wave functions vanish at the origin. Thus, only the 2s state experiences a shift while the 2p state is unshifted.
The magnitude of the Lamb shift can be ascertained by expressing $\Delta \mathbf{r}$ in terms of the zero-point fluctuations in the electromagnetic field$^{60}$. If it is assumed that the electron is bound to the atom harmonically, $\Delta \mathbf{r}$ is determined from the equation of motion

$$\Delta \ddot{\mathbf{r}} + \omega_0^2\, \Delta \mathbf{r} = \frac{q}{m}\, \mathbf{E} \tag{1008}$$

where the electric field E has components that are fluctuating with wave vector k or equivalently with frequency ω.

Figure 37: A cartoon depicting the modification of the classical orbit of an electron due to the zero-point fluctuations of the electromagnetic field.

This has the result that the position fluctuates at the frequency ω with an amplitude given by

$$\Delta \mathbf{r}_\omega = \frac{q}{m}\, \mathbf{E}_\omega\, \frac{1}{\omega_0^2 - \omega^2} \tag{1009}$$

$^{60}$ T. A. Welton, Phys. Rev. 74, 1557 (1948).
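where $\mathbf{E}_\omega$ is the Fourier component of the fluctuating electric field. Hence, the ω-component of the mean squared fluctuation$^{61}$ in the particle's position is given by

$$\langle\, | \Delta r_\omega^2 |\, \rangle = \left( \frac{q}{m} \right)^2 \left\langle\, \left| \frac{E_\omega^2}{( \omega_0^2 - \omega^2 )^2} \right|\, \right\rangle \tag{1010}$$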
On approximating the electromagnetic energy associated with the fluctuating electromagnetic field $\langle | E_\omega^2 | \rangle$ by half the sum of the zero-point energies of the photon modes, one has

$$\frac{V}{8 \pi}\, \langle\, | E_\omega^2 |\, \rangle = 2\, \frac{1}{4}\, \hbar \omega \tag{1011}$$
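where the factor 2 represents the two types of polarization of the normal modes. Therefore, on summing over the normal modes, one finds that the mean squared deviation of the electron's trajectory from the classical orbit is proportional to

$$\begin{aligned} & \frac{V}{( 2 \pi c )^3} \int d\Omega \int_0^{\infty} d\omega\, \omega^2 \left\langle\, \left| \frac{E_\omega^2}{( \omega_0^2 - \omega^2 )^2} \right|\, \right\rangle \\ & \quad = \frac{4 \pi \hbar}{V}\, \frac{V}{( 2 \pi c )^3} \int d\Omega \int_0^{\infty} d\omega\, \frac{\omega^3}{( \omega_0^2 - \omega^2 )^2} \end{aligned} \tag{1012}$$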
The integration over ω can be approximated as

$$\int_{\omega_0}^{m c^2 / \hbar} \frac{d\omega}{\omega} = \ln \frac{m c^2}{\hbar \omega_0} \tag{1013}$$
where an upper and lower cutoff have been introduced to prevent the integral from diverging$^{62}$. The expectation value of the second derivative of the potential for the 2s state is given by

$$\langle\, | \nabla^2 V |\, \rangle = 4 \pi e^2\, \frac{1}{\pi a^3} \tag{1014}$$
where the second factor represents the 2s electron density at the origin. The corresponding factor for an ns level is expected to vary proportionally to $n^{-3}$. Combining the above expressions, one finds that the 2s level is shifted by an energy given by

$$\Delta E_{2s} = \frac{4}{2 \pi} \left( \frac{e^2}{\hbar c} \right)^3 \left( \frac{m e^4}{\hbar^2} \right) \ln \frac{m c^2}{\hbar \omega_0} \tag{1015}$$

$^{61}$ The average squared fluctuation of the electromagnetic field should, in principle, be calculated as an average over a volume in time and space which encompasses the electron's trajectory.
$^{62}$ The upper limit can be considered as being determined by the spatial dimension of the volume over which the electromagnetic fluctuations are being averaged. The divergence at the lower limit of integration is unphysical and is caused by the neglect of the lifetimes of the electronic states. The inclusion of the lifetimes results in the integrand being finite at the resonance frequency $\omega_0$.
where the frequency of the electron's orbit $\omega_0$ has been chosen as a lower cutoff on the frequency of the electromagnetic fluctuations. Since the logarithmic factor is approximately given by $\sim -2 \ln \frac{e^2}{\hbar c}$, one can see that the above estimate is consistent with the Lamb shift having a magnitude of approximately 4.372 × 10$^{-6}$ eV.
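The numbers quoted here are easy to cross-check. The short sketch below converts the measured 1058 MHz splitting into an energy and evaluates the logarithmic factor $-2 \ln (e^2/\hbar c)$; the constant values are CODATA figures and the variable names are illustrative assumptions.

```python
import math

H_EV_S = 4.135667696e-15      # Planck constant in eV s (CODATA 2018)
ALPHA = 1.0 / 137.035999      # fine-structure constant e^2 / (hbar c)

def frequency_to_ev(freq_hz):
    """Photon energy h * nu in eV for a frequency in Hz."""
    return H_EV_S * freq_hz

lamb_shift_ev = frequency_to_ev(1058e6)   # the measured 2S - 2P splitting
log_factor = -2.0 * math.log(ALPHA)       # approximate logarithmic factor

if __name__ == "__main__":
    print(f"1058 MHz corresponds to {lamb_shift_ev:.4e} eV")
    print(f"-2 ln(alpha) = {log_factor:.3f}")
```

The conversion gives an energy of about 4.4 × 10$^{-6}$ eV, in line with the magnitude quoted above, and the logarithm is of order ten.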
10.3.3 The Self-Energy of a Free Electron
The corrections to the energy of a free electron due to its coupling to the electromagnetic field are to be considered$^{63}$. It shall be assumed that the electromagnetic field is in the ground state $| \{0\} \rangle$, and the energy of an electron in a state with momentum q will be evaluated via perturbation theory.
The lowest-order correction to the electron's energy comes from the diamagnetic interaction. From first-order perturbation theory, one finds the correction

$$\Delta E^{(1)}_q = \langle q\, \{0\} |\, \hat{H}_{dia}\, | q\, \{0\} \rangle \tag{1016}$$

Figure 38: The first-order correction to the rest mass of the electron due to the diamagnetic interaction.
On using a plane-wave to represent the electronic wave function

$$\psi_q(\mathbf{r}) = \frac{1}{\sqrt{V}} \exp\left[\, i\, \mathbf{q} \cdot \mathbf{r}\, \right] \tag{1017}$$
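then the first-order change in the electron's energy due to the coupling to the field is given by

$$\begin{aligned} \Delta E^{(1)}_q = {} & \left( \frac{e^2}{2 m c^2} \right) \sum_{k,\alpha,k',\alpha'} \left( \frac{2 \pi \hbar c^2}{V} \right) \frac{\hat{\epsilon}_\alpha(k) \cdot \hat{\epsilon}_{\alpha'}(k')}{\sqrt{\omega_k\, \omega_{k'}}}\, \langle \{0\} |\, a_{k',\alpha'}\, a^\dagger_{k,\alpha}\, | \{0\} \rangle \\ & \times \frac{1}{V} \int d^3r\, \exp\left[ - i\, \mathbf{q} \cdot \mathbf{r} \right] \exp\left[ i\, ( \mathbf{k}' - \mathbf{k} ) \cdot \mathbf{r} \right] \exp\left[ i\, \mathbf{q} \cdot \mathbf{r} \right] \\ = {} & \left( \frac{e^2}{2 m c^2} \right) \sum_{k,\alpha,k',\alpha'} \left( \frac{2 \pi \hbar c^2}{V} \right) \frac{\hat{\epsilon}_\alpha(k) \cdot \hat{\epsilon}_{\alpha'}(k')}{\sqrt{\omega_k\, \omega_{k'}}}\, \delta_{k,k'}\, \delta_{\alpha,\alpha'} \end{aligned} \tag{1018}$$

$^{63}$ W. Heisenberg and W. Pauli, Z. Physik, 56, 1 (1929),
W. Heisenberg and W. Pauli, Z. Physik, 59, 168 (1930).
I. Waller, Z. Physik, 59, 168 (1930).
I. Waller, Z. Physik, 61, 721 & 837 (1930).
I. Waller, Z. Physik, 62, 673 (1930).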
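since the electronic matrix elements give rise to the condition of conservation of momentum. Hence, the correction to the energy is found as

$$\begin{aligned} \Delta E^{(1)}_q & = \left( \frac{e^2}{2 m c^2} \right) \frac{V}{( 2 \pi )^3} \left( \frac{2 \pi \hbar c}{V} \right) 2 \int d^3k\, \frac{1}{k} \\ & = \left( \frac{e^2}{2 m c^2} \right) \frac{V}{( 2 \pi )^3} \left( \frac{2 \pi \hbar c}{V} \right) 8 \pi \int_0^{\infty} dk\, k \\ & = \left( \frac{e^2 \hbar}{\pi m c} \right) \int_0^{\infty} dk\, k \end{aligned} \tag{1019}$$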
which diverges. This contribution is independent of the electron's momentum q, and since k = k' it can be seen that the contribution of the diamagnetic interaction to first order is independent of the quantum state of the electron. This contribution to the electron's energy can be lumped together with the electron's rest energy $m c^2$. However, since the corrections are being evaluated for non-relativistic electrons for which the rest energy is not observable, it is customary to ignore the rest energy and, therefore, this correction shall no longer be considered.
The paramagnetic interaction when taken to second order also yields a correction to the electron's self-energy. This correction can be considered to be due to a virtual process in which the electron emits a photon and then reabsorbs it.

Figure 39: The second-order self-energy correction of a free electron due to the paramagnetic interaction. The electron with momentum q emits a virtual photon with momentum k and then reabsorbs it.

The second-order correction to the energy is evaluated from

$$\Delta E^{(2)}_q = \sum_{q',k,\alpha} \frac{\langle q\, \{0\} |\, \hat{H}_{para}\, | q'\, 1_{k,\alpha} \rangle\, \langle q'\, 1_{k,\alpha} |\, \hat{H}_{para}\, | q\, \{0\} \rangle}{E_q - E_{q'} - \hbar \omega_k} \tag{1020}$$
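where $| q'\, 1_{k,\alpha} \rangle$ is a one-photon intermediate state of the electron-photon system. We assume that the process does not conserve energy, so that the denominator is finite. The matrix elements are evaluated as

$$\begin{aligned} \langle q'\, 1_{k,\alpha} |\, \hat{H}_{para}\, | q\, \{0\} \rangle & = \sqrt{\frac{2 \pi \hbar c^2}{V \omega_k}}\; \hbar\, \hat{\epsilon}_\alpha(k) \cdot \mathbf{q}\; \frac{1}{V} \int d^3r\, \exp\left[ - i\, \mathbf{q}' \cdot \mathbf{r} \right] \exp\left[ - i\, \mathbf{k} \cdot \mathbf{r} \right] \exp\left[ i\, \mathbf{q} \cdot \mathbf{r} \right] \\ & = \sqrt{\frac{2 \pi \hbar c^2}{V \omega_k}}\; \hbar\, \hat{\epsilon}_\alpha(k) \cdot \mathbf{q}\; \delta_{q'+k-q} \end{aligned} \tag{1021}$$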
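which leads to momentum conservation. The second-order correction to the electron's energy takes the form

$$\Delta E^{(2)}_q = \left( \frac{e^2}{m^2 c^2} \right) \sum_{k,\alpha} \left( \frac{2 \pi \hbar c^2}{V \omega_k} \right) \frac{| \hbar\, \mathbf{q} \cdot \hat{\epsilon}_\alpha(k) |^2}{\frac{\hbar^2 q^2}{2 m} - \frac{\hbar^2 (\mathbf{q} - \mathbf{k})^2}{2 m} - \hbar \omega_k} \tag{1022}$$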
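On summing over the polarizations by using the dyadic completeness relation$^{64}$

$$\sum_\alpha \hat{\epsilon}_\alpha(k)\, \hat{\epsilon}_\alpha(k) = \hat{I} - \hat{k}\, \hat{k} \tag{1023}$$

one finds that the numerator is given by

$$\sum_\alpha \hbar^2\, | \mathbf{q} \cdot \hat{\epsilon}_\alpha(k) |^2 = \hbar^2 q^2 \left( 1 - \cos^2 \theta \right) \tag{1024}$$

where θ is the angle between q and k

$$\mathbf{q} \cdot \mathbf{k} = q\, k \cos \theta \tag{1025}$$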
∆E
(2)
q
=
_
e
2
m
2
c
2
_
V
( 2 π )
3
_
d
3
k
_
2 π ¯h c
2
V ω
k
_
¯h
2
q
2
( 1 − cos
2
θ )
¯ h
2
q
2
2 m
−
¯ h
2
(q−k)
2
2 m
− ¯h ω
k
=
_
e
2
¯h
2 π m
2
c
_ _
∞
0
dk k
_
π
0
dθ sin θ
¯h
2
q
2
( 1 − cos
2
θ )
¯ h
2
q k
m
cos θ −
¯ h
2
k
2
2 m
− ¯h c k
(1026)
64
The completeness relation merely expresses the fact that any vector in a threedimensional
space can be expressed in terms of the components along three orthogonal directions ˆ e
i
A =
3
i=1
A
i
ˆ e
i
where the components are given by the scalar product
A
i
= A . ˆ e
i
Hence, the completeness relation follows as
I =
i
ˆ e
i
ˆ e
i
.
182
This contribution can be written as being explicitly proportional to the kinetic energy of the electron, and a factor of k can be cancelled from the numerator and the denominator

$$\Delta E^{(2)}_q = \frac{\hbar^2 q^2}{2 m} \left( \frac{e^2}{\hbar c} \right) \frac{2}{\pi} \int_0^{\infty} dk \int_0^{\pi} d\theta \sin \theta\; \frac{\left( 1 - \cos^2 \theta \right)}{2 q \cos \theta - k - \frac{2 m c}{\hbar}} \tag{1027}$$
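It should be evident that the integral diverges logarithmically at large k. The divergent part of the integral can be written as

$$\begin{aligned} \Delta E^{(2)}_q & \sim - \frac{\hbar^2 q^2}{2 m} \left( \frac{e^2}{\hbar c} \right) \frac{2}{\pi} \int_0^{\pi} d\theta \sin \theta \left( 1 - \cos^2 \theta \right) \int_{2 m c / \hbar}^{\infty} \frac{dk}{k} \\ & = - \frac{\hbar^2 q^2}{2 m} \left( \frac{e^2}{\hbar c} \right) \frac{8}{3 \pi} \int_{2 m c / \hbar}^{\infty} \frac{dk}{k} \end{aligned} \tag{1028}$$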
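If an upper cutoff $\lambda_+^{-1}$ is introduced, then the correction to the electron's kinetic energy can be estimated as

$$\Delta E^{(2)}_q = - \frac{\hbar^2 q^2}{2 m}\, \frac{8}{3 \pi} \left( \frac{e^2}{\hbar c} \right) \ln \left( \frac{\hbar}{2 m c \lambda_+} \right) \tag{1029}$$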
This shift can be interpreted as a (second-order) renormalization of the electron's mass from the unrenormalized mass m to the physical mass $m^*$

$$\frac{1}{m^*} = \frac{1}{m} \left[ 1 - \frac{8}{3 \pi} \left( \frac{e^2}{\hbar c} \right) \ln \left( \frac{\hbar}{2 m c \lambda_+} \right) + \ldots \right] \tag{1030}$$

It is the renormalized mass $m^*$ which would be determined by an experiment.
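The logarithmic dependence on the cutoff means that the correction stays modest even for very short cutoff lengths. The sketch below checks the angular integral that produces the factor $8/(3\pi)$ in eq. (1028) and evaluates the bracketed correction of eq. (1030); the cutoff lengths used are hypothetical values chosen only to illustrate the trend, and the function names are not from the text.

```python
import math

ALPHA = 1.0 / 137.035999            # e^2 / (hbar c)
REDUCED_COMPTON = 3.8615926796e-13  # hbar / (m c) for the electron, in m

def angular_integral(steps=20000):
    """Midpoint-rule estimate of the integral of sin(t) (1 - cos^2 t) over 0..pi."""
    h = math.pi / steps
    # sin(t) (1 - cos^2 t) = sin^3(t)
    return sum(math.sin((i + 0.5) * h) ** 3 for i in range(steps)) * h

def inverse_mass_correction(cutoff_length):
    """(8 / 3 pi) alpha ln(hbar / (2 m c lambda_+)), the bracketed shift in eq. (1030)."""
    log_term = math.log(REDUCED_COMPTON / (2.0 * cutoff_length))
    return 8.0 / (3.0 * math.pi) * ALPHA * log_term

if __name__ == "__main__":
    print(f"angular integral = {angular_integral():.6f} (expected 4/3)")
    for lam in (1e-15, 1e-18, 1e-20):   # hypothetical cutoff lengths in m
        print(f"lambda_+ = {lam:.0e} m -> correction = {inverse_mass_correction(lam):.4f}")
```

Reducing the cutoff length by two orders of magnitude changes the correction by only a few percent of the bare inverse mass, which is the hallmark of a logarithmic divergence.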
10.3.4 The Self-Energy of a Bound Electron
The Lamb shift (a quantum electrodynamic shift of the 2s level of Hydrogen upwards by 1058 MHz) is caused by the self-energy of a bound electron. The self-energy of the state nlm can be estimated from second-order perturbation theory using the dipole approximation, as is appropriate for a completely non-relativistic calculation. The second-order shift is given by

$$\Delta E^{(2)}_{nlm} = \left( \frac{e^2}{m^2 c^2} \right) \sum_{k,\alpha,n'l'm'} \left( \frac{2 \pi \hbar c^2}{V \omega_k} \right) \frac{| \langle n'l'm' |\, \hat{\epsilon}_\alpha(k) \cdot \hat{\mathbf{p}}\, | nlm \rangle |^2}{E_{nlm} - E_{n'l'm'} - \hbar \omega_k} \tag{1031}$$
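On summing over the polarizations using the completeness relation, one obtains

$$\Delta E^{(2)}_{nlm} = \left( \frac{e^2}{m^2 c^2} \right) \frac{\hbar c}{( 2 \pi )} \int_0^{\infty} dk\, k \int_0^{\pi} d\theta_k \sin \theta_k \sum_{n'l'm'} \frac{| \langle n'l'm' |\, \hat{\mathbf{p}}\, | nlm \rangle |^2 \left( 1 - \cos^2 \theta_k \right)}{E_{nlm} - E_{n'l'm'} - \hbar \omega_k} \tag{1032}$$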
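where $\theta_k$ is the angle subtended between k and the matrix elements of p. The angular integration can be performed, yielding

$$\Delta E^{(2)}_{nlm} = \left( \frac{e^2}{m^2 c^2} \right) \frac{2 \hbar c}{3 \pi} \int_0^{\infty} dk\, k \sum_{n'l'm'} \frac{| \langle n'l'm' |\, \hat{\mathbf{p}}\, | nlm \rangle |^2}{E_{nlm} - E_{n'l'm'} - \hbar \omega_k} \tag{1033}$$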
In the completely non-relativistic limit, the integration over k can be shown to be linearly divergent at the upper limit of integration.
Hans Bethe argued$^{65}$ that, within the same dipole approximation, the correction to the kinetic energy of the electron in the state $| nlm \rangle$ is given by an expression analogous to that of an electron in a continuum state n

$$\Delta T^{(2)}_n = \frac{2}{3 \pi} \left( \frac{e^2}{\hbar c} \right) \left( \frac{\hbar}{m c} \right)^2 \int_0^{\infty} d\omega\, \omega \sum_{n'} \frac{| \langle n' |\, \hat{\mathbf{p}}\, | n \rangle |^2}{E_n - E_{n'} - \hbar \omega} \tag{1034}$$
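Since momentum is conserved for continuum states (on average), only the state where n = n' contributes, so the denominator simplifies$^{66}$. The expression for the mass renormalization is divergent and is given by

$$\begin{aligned} \Delta T^{(2)}_n & = - \frac{2}{3 \pi} \left( \frac{e^2}{\hbar c} \right) \left( \frac{\hbar}{m c} \right)^2 \int_0^{\infty} d\omega\, \omega \sum_{n'} \frac{| \langle n' |\, \hat{\mathbf{p}}\, | n \rangle |^2}{\hbar \omega} \\ & = - \frac{4}{3 \pi} \left( \frac{e^2}{\hbar c} \right) \left( \frac{\hbar}{m c^2} \right) \int_0^{\infty} d\omega\, \frac{\omega}{\omega}\; \frac{\langle n |\, \hat{\mathbf{p}}^2\, | n \rangle}{2 m} \end{aligned} \tag{1035}$$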
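where the completeness relation has been used. This expression is valid if n labels either a continuum or a discrete state, since only the mass of the electron is being altered and the expectation value of $\hat{\mathbf{p}}$ is unaltered. Thus, Bethe argued, the kinetic energy of an electron in a bound state which has the physical mass $m^*$ should be approximated as

$$\begin{aligned} \langle nlm |\, \frac{\hat{\mathbf{p}}^2}{2 m^*}\, | nlm \rangle = {} & \langle nlm |\, \frac{\hat{\mathbf{p}}^2}{2 m}\, | nlm \rangle \\ & - \frac{4}{3 \pi} \left( \frac{e^2}{\hbar c} \right) \left( \frac{\hbar}{m c^2} \right) \int_0^{\infty} d\omega\, \frac{\omega}{\omega}\; \frac{\langle nlm |\, \hat{\mathbf{p}}^2\, | nlm \rangle}{2 m} \end{aligned} \tag{1036}$$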
Now the bare Hamiltonian for an electron bound to a nucleus is given by

$$\hat{H}_0 = \frac{\hat{\mathbf{p}}^2}{2 m} + V(r) \tag{1037}$$

and the unperturbed energy of the hypothetical state $| nlm \rangle$ is calculated in the non-relativistic Schrödinger theory as

$$E^{(0)}_{nlm} = \langle nlm |\, \frac{\hat{\mathbf{p}}^2}{2 m}\, | nlm \rangle + \langle nlm |\, V(r)\, | nlm \rangle \tag{1038}$$

$^{65}$ H. A. Bethe, Phys. Rev. 72, 339 (1947).
$^{66}$ Since we are now using the dipole approximation, the recoil of the free electron which was taken into account in our previous analysis is now being ignored. [See the denominator of the first line of eqn (1026).]
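However, when this is evaluated, the approximate energy $E^{(0)}_{nlm}$ has to be expressed in terms of the observed physical mass $m^*$ via

$$\begin{aligned} E^{(0)}_{nlm} = {} & \langle nlm |\, \frac{\hat{\mathbf{p}}^2}{2 m^*}\, | nlm \rangle + \langle nlm |\, V(r)\, | nlm \rangle \\ & + \frac{4}{3 \pi} \left( \frac{e^2}{\hbar c} \right) \left( \frac{\hbar}{m c^2} \right) \int_0^{\infty} d\omega\, \frac{\omega}{\omega}\; \frac{\langle nlm |\, \hat{\mathbf{p}}^2\, | nlm \rangle}{2 m} \\ = {} & \langle nlm |\, \frac{\hat{\mathbf{p}}^2}{2 m^*}\, | nlm \rangle + \langle nlm |\, V(r)\, | nlm \rangle \\ & + \frac{4}{3 \pi} \left( \frac{e^2}{\hbar c} \right) \left( \frac{\hbar}{m c^2} \right) \int_0^{\infty} d\omega\, \frac{\omega}{\omega} \sum_{n'l'm'} \frac{| \langle n'l'm' |\, \hat{\mathbf{p}}\, | nlm \rangle |^2}{2 m} \end{aligned} \tag{1039}$$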
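where the completeness relation was used in obtaining the last line. The second term in the unperturbed energy is a correction due to the mass renormalization$^{67}$ which should be combined with the second-order radiative correction. The total energy (to second order) is given by

$$\begin{aligned} E_{nlm} = {} & E^{(0)}_{nlm} + \Delta E^{(2)}_{nlm} \\ = {} & \langle nlm |\, \frac{\hat{\mathbf{p}}^2}{2 m^*}\, | nlm \rangle + \langle nlm |\, V(r)\, | nlm \rangle \\ & + \frac{2}{3 \pi} \left( \frac{e^2}{\hbar c} \right) \left( \frac{\hbar}{m c} \right)^2 \int_0^{\infty} d\omega\, \omega \sum_{n'l'm'} \frac{| \langle n'l'm' |\, \hat{\mathbf{p}}\, | nlm \rangle |^2}{\hbar \omega} \\ & + \frac{2}{3 \pi} \left( \frac{e^2}{\hbar c} \right) \left( \frac{\hbar}{m c} \right)^2 \int_0^{\infty} d\omega\, \omega \sum_{n'l'm'} \frac{| \langle n'l'm' |\, \hat{\mathbf{p}}\, | nlm \rangle |^2}{E_{nlm} - E_{n'l'm'} - \hbar \omega} \end{aligned} \tag{1040}$$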
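The overall (second-order) shift from the Schrödinger estimate of the energy for
the state $| nlm \rangle$ (as calculated with the physical mass) is given by the sum
of the last two terms, which is expressed as
$$
\Delta E^{\rm shift}_{nlm} = \frac{2}{3 \pi} \left( \frac{e^2}{\hbar c} \right) \left( \frac{\hbar}{m c} \right)^2 \int_0^\infty d\omega \, \omega \sum_{n'l'm'} \frac{| \langle n'l'm' | \hat{\mathbf{p}} | nlm \rangle |^2 \, ( E_{nlm} - E_{n'l'm'} )}{( E_{nlm} - E_{n'l'm'} - \hbar \omega ) \, \hbar \omega} \tag{1041}
$$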
[67] Renormalization is an idea which Bethe attributed to H. A. Kramers. Kramers had
proposed that physical quantities should be expressed in terms of observable quantities, with
all mention of bare quantities removed. Kramers was advocating a classical treatment from
which Bethe created a non-relativistic quantum treatment.
The integration over $\omega$ is logarithmically divergent, and can be made to
converge by introducing an upper cutoff $\omega_+ = c \lambda^{-1}$. Therefore, the difference
of the linearly divergent self-energy of the bound electron and the linearly
divergent self-energy of the free electron is only logarithmically divergent. After
introducing the cutoff, one finds the result
$$
\Delta E^{\rm shift}_{nlm} = - \frac{2}{3 \pi} \left( \frac{e^2}{\hbar c} \right) \sum_{n'l'm'} \frac{| \langle n'l'm' | \hat{\mathbf{p}} | nlm \rangle |^2}{m^2 c^2} \, ( E_{nlm} - E_{n'l'm'} ) \, \ln \left| \frac{\hbar c \lambda^{-1} + E_{n'l'm'} - E_{nlm}}{E_{n'l'm'} - E_{nlm}} \right| \tag{1042}
$$
If the rest energy of the electron is used as the upper cutoff energy, $m c^2 \sim 0.5 \times 10^6$ eV,
and assuming that the averaged logarithm of the electron excitation
energy corresponds to an energy of the order of 17.8 Ryd, then the logarithm has
a value of about 7.63 and is not sensitive to the precise value of $E_{nlm} - E_{n'l'm'}$
and, therefore, can be taken outside the summation
$$
\Delta E^{\rm shift}_{nlm} = - \frac{2}{3 \pi} \left( \frac{e^2}{\hbar c} \right) \ln \left| \frac{2 \hbar^2 c^2}{Z^2 e^4} \right| \sum_{n'l'm'} \frac{| \langle n'l'm' | \hat{\mathbf{p}} | nlm \rangle |^2}{m^2 c^2} \, ( E_{nlm} - E_{n'l'm'} ) \tag{1043}
$$
As later shown by Dyson [68], the divergences found in any order in $\frac{e^2}{\hbar c}$ can
be removed by consistently using the ideas of mass and charge renormalization [69].
Hence, a completely consistent relativistic theory does yield a finite
shift, without the need to invoke any cutoff [70]. The weighted sum over the
matrix elements can be evaluated by expressing it in terms of an expectation
value involving commutators of $\hat{H}_0$ with $\hat{\mathbf{p}}$. That is
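$$
\sum_{n'l'm'} | \langle n'l'm' | \hat{\mathbf{p}} | nlm \rangle |^2 \, ( E_{nlm} - E_{n'l'm'} ) = \sum_{n'l'm'} \langle nlm | \hat{\mathbf{p}} | n'l'm' \rangle \cdot \langle n'l'm' | [ \hat{\mathbf{p}} , \hat{H}_0 ] | nlm \rangle \tag{1044}
$$
and using the completeness relation, one obtains
$$
= \langle nlm | \hat{\mathbf{p}} \cdot [ \hat{\mathbf{p}} , \hat{H}_0 ] | nlm \rangle \tag{1045}
$$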
[68] F. J. Dyson, Phys. Rev. 75, 1736 (1949).
[69] This statement does not imply that a properly renormalized perturbation theory is
convergent. In fact, one may argue that if the coupling constant changed sign then systems
containing electrons would be unstable to BCS pairing. Since the radius of convergence of
any expansion is limited by the closest singularity, perturbation theory may only have a zero
radius of convergence. In this case, the theory may be expected to contain non-analytic terms
of the form $\exp[ - \hbar c / e^2 ]$.
[70] F. J. Dyson, Phys. Rev. 73, 617 (1948).
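On substituting
$$
\hat{\mathbf{p}} = - i \hbar \nabla \tag{1046}
$$
and
$$
\hat{H}_0 = \frac{\hat{p}^2}{2 m} + V(r) \tag{1047}
$$
into the expression for the matrix elements, one obtains
$$
- \hbar^2 \int d^3r \, \psi^*_{nlm}(r) \, \nabla \cdot \Big( ( \nabla V(r) ) \, \psi_{nlm}(r) \Big) \tag{1048}
$$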
On integrating by parts, expanding the derivative of the big brackets in the
above equation and rearranging both sides of the resulting equation, one finds
$$
\int d^3r \, ( \nabla \psi^*_{nlm}(r) ) \cdot \nabla V(r) \, \psi_{nlm}(r) = - \frac{1}{2} \int d^3r \, \psi^*_{nlm}(r) \, \nabla^2 V(r) \, \psi_{nlm}(r) \tag{1049}
$$
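The chain of steps leading from eqn (1045) to eqn (1049) can be written out explicitly. The following is only a sketch: it assumes that the wave functions vanish at infinity, so that the surface terms produced by the integration by parts drop out, and that the integrand is real, as it is for a central potential.

```latex
\begin{aligned}
[ \hat{\mathbf{p}} , \hat{H}_0 ] &= [ \hat{\mathbf{p}} , V(r) ] = - i \hbar \, \nabla V(r) \\
\langle nlm | \hat{\mathbf{p}} \cdot [ \hat{\mathbf{p}} , \hat{H}_0 ] | nlm \rangle
  &= ( - i \hbar )^2 \int d^3r \, \psi^*_{nlm} \, \nabla \cdot \big( ( \nabla V ) \, \psi_{nlm} \big) \\
  &= + \hbar^2 \int d^3r \, ( \nabla \psi^*_{nlm} ) \cdot ( \nabla V ) \, \psi_{nlm}
     \qquad \text{(integration by parts)} \\
  &= - \frac{\hbar^2}{2} \, \langle nlm | \nabla^2 V | nlm \rangle
     \qquad \text{(using eqn (1049))}
\end{aligned}
```

The kinetic term of $\hat{H}_0$ drops out of the commutator because $\hat{p}^2$ commutes with $\hat{\mathbf{p}}$, so only the potential contributes.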
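Substituting the expressions for the matrix elements into the expression for the
Lamb shift, eqn (1043), and using the factor of $- \frac{1}{2}$ from eqn (1049), yields
$$
\Delta E^{\rm shift}_{nlm} = \frac{1}{3 \pi} \left( \frac{e^2}{\hbar c} \right) \left( \frac{\hbar^2}{m^2 c^2} \right) \ln \left| \frac{2 \hbar^2 c^2}{Z^2 e^4} \right| \, \langle nlm | \nabla^2 V(r) | nlm \rangle \tag{1050}
$$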
Thus, the energy-shift only occurs for bound electrons as the expectation value of
the Laplacian of the potential will vanish for extended states. For a hydrogen-like
atom
$$
\nabla^2 V(r) = 4 \pi Z e^2 \, \delta^3(r) \tag{1051}
$$
so
$$
\Delta E^{\rm shift}_{nlm} = \frac{4 Z e^2}{3} \left( \frac{e^2}{\hbar c} \right) \left( \frac{\hbar^2}{m^2 c^2} \right) | \psi_{nlm}(0) |^2 \, \ln \left| \frac{2 \hbar^2 c^2}{Z^2 e^4} \right| \tag{1052}
$$
Therefore, the Lamb shift only occurs for electrons with $l = 0$, since electronic
wave functions with $l \neq 0$ vanish at the origin. The atomic wave function at
the position of the nucleus is given by
$$
| \psi_{n00}(0) |^2 = \frac{1}{\pi} \left( \frac{Z}{n a} \right)^3 \tag{1053}
$$
This yields Bethe's estimate for the Lamb shift as
$$
\Delta E^{\rm shift}_{n00} = \frac{4}{3 \pi n^3} \left( \frac{e^2}{\hbar c} \right)^3 \left( \frac{Z^4 e^4 m}{\hbar^2} \right) \ln \left| \frac{2 \hbar^2 c^2}{Z^2 e^4} \right| \tag{1054}
$$
The above formula leads to an estimate of 1040 MHz, which is in good agreement
with the experimentally determined value [71]. The exact relativistic
calculation [72] yields the result
$$
\Delta E^{\rm shift}_{n00} = \frac{4}{3 \pi n^3} \, Z^4 \left( \frac{e^2}{\hbar c} \right)^5 m c^2 \left[ \ln \left| \frac{m c^2}{2 \hbar \omega_{n,n'}} \right| + \frac{31}{120} \right] \tag{1055}
$$
[71] W. E. Lamb Jr. and R. E. Retherford, Phys. Rev. 72, 241 (1947).
[72] N. M. Kroll and W. E. Lamb Jr., Phys. Rev. 75, 388 (1949).
where the $m c^2$ in the logarithm comes from the Dirac theory without invoking
any cutoff. The most recent experimentally measured value [73] is 1057.851 MHz
which is in good agreement with the theoretical value of 1057.857 MHz.
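Bethe's estimate, eqn (1054), is easy to evaluate numerically. The sketch below does this for the 2s level of hydrogen. It takes the value 7.63 quoted in the text for the logarithm rather than evaluating the logarithm itself, and the numerical constants (fine-structure constant, Hartree energy, eV to Hz conversion) are assumed standard values that are not given in the notes. The function names are illustrative only.

```python
import math

# Assumed standard values (not quoted in the text)
ALPHA = 1.0 / 137.035999      # fine-structure constant e^2 / (hbar c)
HARTREE_EV = 27.211386        # e^4 m / hbar^2 = 2 Ryd, in eV
EV_TO_HZ = 2.417989e14        # frequency E/h corresponding to 1 eV

def bethe_lamb_shift_ev(n, Z=1, log_factor=7.63):
    """Evaluate eqn (1054) for an l = 0 level; log_factor is the value quoted in the text."""
    prefactor = 4.0 / (3.0 * math.pi * n**3)
    return prefactor * ALPHA**3 * Z**4 * HARTREE_EV * log_factor

def ev_to_mhz(energy_ev):
    """Convert an energy in eV to the equivalent frequency in MHz."""
    return energy_ev * EV_TO_HZ / 1.0e6

shift_2s_mhz = ev_to_mhz(bethe_lamb_shift_ev(n=2))
print(f"Estimated 2s shift: {shift_2s_mhz:.0f} MHz")
```

With these inputs the result lands within about one percent of the 1040 MHz quoted above; the small difference depends on the constants and on the rounding of the logarithm.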
10.3.5 Bremsstrahlung
Accelerating (or decelerating) charged particles radiate. We shall consider the
radiation emitted by a charged particle (such as an electron) that scatters from
a massive charged particle via the Coulomb interaction. It is assumed that the
mass M of the massive charged particle (in most cases, this is a nucleus) is
significantly greater than the electron mass, so that the recoil of the nucleus can
be neglected. The (instantaneous) Coulomb interaction between the electron
and the nucleus is given by
$$
V(r) = - \frac{Z e^2}{r} \tag{1056}
$$
The Hamiltonian of the unperturbed electron is simply the kinetic energy. The
incident electron is assumed to have a momentum $\mathbf{q}$ and the scattered electron
has momentum $\mathbf{q}'$, and the cross-section for the scattering process will be
calculated via low-order perturbation theory.
Rutherford Scattering
To second-order, the scattering cross-section is expressed as Rutherford
scattering which is elastic and, therefore, involves no emission of photons. The
Rutherford scattering cross-section is found from the Fermi Golden rule decay
rate
$$
\left( \frac{1}{\tau} \right)_{\rm Rutherford} = \frac{2 \pi}{\hbar} \, | \langle \mathbf{q}' | V(r) | \mathbf{q} \rangle |^2 \, \delta( E_q - E_{q'} ) \tag{1057}
$$
[Figure 40: The Rutherford scattering process. Diagram labels: $\mathbf{q}$, $\mathbf{q}'$, $\mathbf{q} - \mathbf{q}'$.]
[73] G. C. Bhatt and H. Grotch, Ann. Phys. 187, 1 (1987).
The matrix element of the Coulomb potential is evaluated as
$$
\langle \mathbf{q}' | V(r) | \mathbf{q} \rangle = - \frac{4 \pi Z e^2}{V \, | \mathbf{q} - \mathbf{q}' |^2} \tag{1058}
$$
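On integrating over the magnitude of the scattered electron's momentum, one
obtains
$$
\begin{aligned}
\left( \frac{1}{\tau_{d\Omega'}} \right)_{\rm Rutherford} &= \frac{2 \pi}{\hbar} \, \frac{V}{( 2 \pi )^3} \, d\Omega' \int_0^\infty dq' \, q'^2 \left( \frac{4 \pi Z e^2}{V \, | \mathbf{q} - \mathbf{q}' |^2} \right)^2 \delta( E_q - E_{q'} ) \\
&= \frac{2 \pi}{\hbar} \, \frac{V}{( 2 \pi )^3} \, \frac{m}{\hbar^2} \, q \left( \frac{4 \pi Z e^2}{V \, | \mathbf{q} - \mathbf{q}' |^2} \right)^2 d\Omega'
\end{aligned}
\tag{1059}
$$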
The denominator in the potential has to be evaluated on the energy shell. On
introducing the scattering angle $\theta'$ and using the elastic scattering condition
$$
\begin{aligned}
| \mathbf{q} - \mathbf{q}' |^2 &= 2 q^2 ( 1 - \cos \theta' ) \\
&= 4 q^2 \sin^2 \frac{\theta'}{2}
\end{aligned}
\tag{1060}
$$
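one finds
$$
\left( \frac{1}{\tau_{d\Omega'}} \right)_{\rm Rutherford} = \frac{2 \pi}{\hbar} \, \frac{V}{( 2 \pi )^3} \, \frac{m}{\hbar^2} \, q \left( \frac{4 \pi Z e^2}{V \, 4 q^2 \sin^2 \frac{\theta'}{2}} \right)^2 d\Omega' \tag{1061}
$$
[Figure 41: The geometry for Rutherford scattering. For elastic scattering, the
magnitude of the initial momentum $q$ is equal to the magnitude of the final
momentum $q'$ and the scattering angle is $\theta'$. The momentum transfer has magnitude $2 q \sin ( \theta' / 2 )$.]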
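On dividing the scattering rate by the incident flux $\mathcal{F}$ of electrons
$$
\mathcal{F} = \frac{\hbar q}{m V} \tag{1062}
$$
the elastic scattering cross-section is found to be given by
$$
\begin{aligned}
\left( \frac{d\sigma}{d\Omega'} \right)_{\rm Rutherford} &= \frac{2 \pi}{\hbar} \, \frac{V}{( 2 \pi )^3} \, \frac{V m^2}{\hbar^3} \left( \frac{4 \pi Z e^2}{V \, 4 q^2 \sin^2 \frac{\theta'}{2}} \right)^2 \\
&= \left( \frac{m Z e^2}{2 \hbar^2 q^2 \sin^2 \frac{\theta'}{2}} \right)^2
\end{aligned}
\tag{1063}
$$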
which is the Rutherford scattering cross-section for electrons. The scattering
cross-section diverges at $\theta' = 0$ and is always finite at $\theta' = \pi$ no matter how
large $q$ is. The scattering at $\theta' = \pi$ is known as back-scattering, and is caused
by the extremely high potential experienced by electrons with very small
impact parameters. It was the large cross-section for back-scattering of charged
α-particles from atoms, found by H. Geiger and E. Marsden in 1913 [74], that
was instrumental in verifying Rutherford's 1911 conjecture [75] that atoms have
nuclei which are of very small spatial extent. The divergence in the scattering
cross-section at $\theta' = 0$ is due to the long-ranged nature of the Coulomb
interaction, which causes electrons to undergo scattering (no matter how slight the
scattering is) at arbitrarily large distances from the nucleus.
[Figure 42: The scattering angle dependence of the differential scattering
cross-section. Axes: $d\sigma / d\Omega'$ against $\theta' / \pi$, from 0 to 1.]
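The angular behaviour described above can be checked directly from eqn (1063). The following is a minimal sketch in units where $m = \hbar = e = 1$ (an arbitrary choice made here for illustration); the function name is hypothetical.

```python
import math

def rutherford_dsigma_domega(q, theta, Z=1.0, m=1.0, e2=1.0, hbar=1.0):
    """Eqn (1063): (m Z e^2 / (2 hbar^2 q^2 sin^2(theta/2)))^2."""
    s = math.sin(0.5 * theta)
    return (m * Z * e2 / (2.0 * hbar**2 * q**2 * s**2)) ** 2

q = 2.0
for theta in (0.1, 0.5, 0.5 * math.pi, math.pi):
    value = rutherford_dsigma_domega(q, theta)
    print(f"theta = {theta:.3f}   dsigma/dOmega = {value:.4e}")

# Forward divergence: halving a small angle multiplies the result by about 16
small_angle_ratio = rutherford_dsigma_domega(q, 0.01) / rutherford_dsigma_domega(q, 0.02)
# Back-scattering stays finite: (m Z e^2 / (2 hbar^2 q^2))^2
back_scattering = rutherford_dsigma_domega(q, math.pi)
```

The factor of roughly 16 reflects the $\sin^{-4} ( \theta' / 2 )$ dependence, while the back-scattering value falls off as $q^{-4}$ but never vanishes.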
Bremsstrahlung
Elastic scattering of electrons by the Coulomb potential is highly unlikely,
since from classical electrodynamics it is known that accelerated particles
radiate. Hence, it is expected that photons should be emitted in this process.
This phenomenon is known as Bremsstrahlung. We shall calculate the
Bremsstrahlung scattering cross-section [76] using low-order perturbation theory. The
electron is scattered between the free electron eigenstates due to a perturbation
which is a linear superposition of the Coulomb interaction with the nucleus and
the paramagnetic interaction.
[74] H. Geiger and E. Marsden, Phil. Mag. 25, 1798 (1913).
[75] E. Rutherford, Phil. Mag. 21, 669 (1911).
[76] H. A. Bethe and W. Heitler, Proc. Roy. Soc. A 146, 82 (1934).
[Figure 43: The two lowest-order processes contributing to Bremsstrahlung. In one diagram
the intermediate electron has momentum $\mathbf{q}' + \mathbf{k}$, in the other $\mathbf{q} - \mathbf{k}$; in both, the
emitted photon is labelled $( \mathbf{k}, \alpha )$.]
The lowest-order probability amplitude describing Bremsstrahlung is a
linear superposition of two processes. These are:
(a) Scattering of an electron from the nucleus followed by the emission of a
photon. The initial state of the electron is assumed to have momentum $\mathbf{q}$ and the
final state of the electron is given by $\mathbf{q}'$ while the emitted photon has momentum
$\mathbf{k}$. Therefore, from conservation of momentum, the momentum of the electron
in the intermediate state is given by $\mathbf{q}' + \mathbf{k}$.
(b) Emission of a photon followed by scattering from the nucleus. Conservation
of momentum indicates that the intermediate state has momentum given by
$\mathbf{q} - \mathbf{k}$.
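The matrix elements for these second-order processes are given by
$$
M_a = \left( \frac{4 \pi Z e^2}{V \, | \mathbf{q} - \mathbf{q}' - \mathbf{k} |^2} \right) \left( \frac{e \hbar}{m c} \right) \sqrt{\frac{2 \pi \hbar c^2}{V \omega_k}} \; \hat{\epsilon}_\alpha(\mathbf{k}) \cdot ( \mathbf{q}' + \mathbf{k} ) \left( \frac{1}{E_q - E_{q'+k} + i \eta} \right) \tag{1064}
$$
and
$$
M_b = \left( \frac{4 \pi Z e^2}{V \, | \mathbf{q} - \mathbf{q}' - \mathbf{k} |^2} \right) \left( \frac{e \hbar}{m c} \right) \sqrt{\frac{2 \pi \hbar c^2}{V \omega_k}} \; \hat{\epsilon}_\alpha(\mathbf{k}) \cdot ( \mathbf{q} - \mathbf{k} ) \left( \frac{1}{E_q - E_{q-k} - \hbar \omega_k + i \eta} \right) \tag{1065}
$$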
It should be noted that the numerators of the matrix elements simplify because
the photons have transverse polarizations
$$
\hat{\epsilon}_\alpha(\mathbf{k}) \cdot \mathbf{k} = 0 \tag{1066}
$$
From the energy conserving delta function in the expression for the decay rate,
one finds
$$
E_q = E_{q'} + \hbar \omega_k \tag{1067}
$$
hence the first energy-denominator can be expressed in a similar form to the
second
$$
E_q - E_{q'+k} = E_{q'} - E_{q'+k} + \hbar \omega_k \tag{1068}
$$
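For small $k$, the energy-denominators can be expanded, yielding
$$
E_{q'} - E_{q'+k} + \hbar \omega_k = \hbar \omega_k - \frac{\hbar^2}{m} \, \mathbf{q}' \cdot \mathbf{k} - \frac{\hbar^2 k^2}{2 m} \tag{1069}
$$
and
$$
E_q - E_{q-k} - \hbar \omega_k = - \hbar \omega_k + \frac{\hbar^2}{m} \, \mathbf{q} \cdot \mathbf{k} - \frac{\hbar^2 k^2}{2 m} \tag{1070}
$$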
Since the energy of the photon cannot exceed the energy of the initial electron,
one must have $q > k$, so the third term is smaller than the second term. Due to
the large magnitude of $c$ compared with the electron velocities $\frac{\hbar q}{m}$, the second
and third terms can be neglected. Therefore, the photon-energy dominates both
the energy-denominators. On substituting the above expressions in the sum of
the matrix elements, one finds
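$$
\begin{aligned}
M_a + M_b &= \left( \frac{4 \pi Z e^2}{V \, | \mathbf{q} - \mathbf{q}' - \mathbf{k} |^2} \right) \left( \frac{e}{m c} \right) \sqrt{\frac{2 \pi \hbar c^2}{V \omega_k}} \left[ \frac{\hbar \, \hat{\epsilon}_\alpha(\mathbf{k}) \cdot \mathbf{q}'}{E_{q'} - E_{q'+k} + \hbar \omega_k + i \eta} + \frac{\hbar \, \hat{\epsilon}_\alpha(\mathbf{k}) \cdot \mathbf{q}}{E_q - E_{q-k} - \hbar \omega_k + i \eta} \right] \\
&\approx \left( \frac{4 \pi Z e^2}{V \, | \mathbf{q} - \mathbf{q}' - \mathbf{k} |^2} \right) \left( \frac{e}{m c} \right) \sqrt{\frac{2 \pi \hbar c^2}{V \omega_k}} \left[ \frac{\hat{\epsilon}_\alpha(\mathbf{k}) \cdot ( \mathbf{q}' - \mathbf{q} )}{\omega_k} \right]
\end{aligned}
\tag{1071}
$$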
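The size of the terms dropped from the energy-denominators (1069) and (1070) can be estimated by comparing each with the photon energy. The following sketch assumes free-electron energies $E_q = \hbar^2 q^2 / 2m$, the photon dispersion $\omega_k = c k$, and writes $\theta_{qk}$ for the angle between $\mathbf{q}$ and $\mathbf{k}$ (a symbol introduced here for illustration).

```latex
\begin{aligned}
\frac{ ( \hbar^2 / m ) \, \mathbf{q} \cdot \mathbf{k} }{ \hbar \omega_k }
  &= \frac{ \hbar q }{ m c } \cos \theta_{qk} = \frac{v}{c} \cos \theta_{qk} , \\
\frac{ \hbar^2 k^2 / 2 m }{ \hbar \omega_k }
  &= \frac{ \hbar \omega_k }{ 2 m c^2 } .
\end{aligned}
```

Both ratios are small for a non-relativistic electron emitting a soft photon, which is the regime in which the approximate form in eqn (1071) applies.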
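Using this approximation for the matrix elements, the transition rate is given
by
$$
\frac{1}{\tau} = \frac{2 \pi}{\hbar} \sum_{\mathbf{q}'} \sum_{\mathbf{k}, \alpha} \left( \frac{4 \pi Z e^2}{V \, | \mathbf{q} - \mathbf{q}' - \mathbf{k} |^2} \right)^2 \left( \frac{e}{m c} \right)^2 \left( \frac{2 \pi \hbar c^2}{V \omega_k} \right) \left| \frac{\hat{\epsilon}_\alpha(\mathbf{k}) \cdot ( \mathbf{q}' - \mathbf{q} )}{\omega_k} \right|^2 \delta( E_q - E_{q'} - \hbar \omega_k ) \tag{1072}
$$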
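The terms proportional to $\mathbf{k}$ in the Coulomb scattering terms can be neglected,
for low $k$ values. The inelastic scattering cross-section for Bremsstrahlung is
found by replacing the sums over $\mathbf{q}'$ and $\mathbf{k}$ by integrals, and dividing by the
incident flux of electrons. This procedure results in the expression
$$
\left( \frac{d^2 \sigma}{d\Omega' \, d\omega_k} \right)_{\rm Brems} = \frac{q'}{q} \left( \frac{2 m Z e^2}{\hbar^2 \, | \mathbf{q} - \mathbf{q}' |^2} \right)^2 \sum_\alpha \int \frac{d\Omega_k}{4 \pi^2 \omega_k} \left( \frac{e^2}{\hbar c} \right) \left| \frac{\hbar \, \hat{\epsilon}_\alpha(\mathbf{k}) \cdot ( \mathbf{q}' - \mathbf{q} )}{m c} \right|^2 \tag{1073}
$$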
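If the angular distributions of the emitted photon ($d\Omega_k$) and the scattered
electron ($d\Omega'$) are both measured, the scattering cross-section can be represented
as
$$
\left( \frac{d^3 \sigma}{d\Omega' \, d\Omega_k \, d\omega_k} \right)_{\rm Brems} = \frac{q'}{q} \left( \frac{d\sigma}{d\Omega'} \right)_{\rm Rutherford} \frac{1}{4 \pi^2 \omega_k} \left( \frac{e^2}{\hbar c} \right) \sum_\alpha \left| \frac{\hbar \, \hat{\epsilon}_\alpha(\mathbf{k}) \cdot ( \mathbf{q}' - \mathbf{q} )}{m c} \right|^2 \tag{1074}
$$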
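where the second factor is the probability of emitting a photon with energy $\hbar \omega_k$
into solid angle $d\Omega_k$. On summing over the polarization $\alpha$ and integrating over
the directions of the emitted photon, one obtains
$$
\left( \frac{d^2 \sigma}{d\Omega' \, d\omega_k} \right)_{\rm Brems} = \frac{q'}{q} \left( \frac{2 m Z e^2}{\hbar^2 \, | \mathbf{q} - \mathbf{q}' |^2} \right)^2 \frac{2}{3 \pi \omega_k} \left( \frac{e^2}{\hbar c} \right) \left| \frac{\hbar \, ( \mathbf{q}' - \mathbf{q} )}{m c} \right|^2 \tag{1075}
$$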
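The step from eqn (1073) to eqn (1075) rests on the polarization sum and the angular integral. For transverse polarizations, the sum over $\alpha$ of $| \hat{\epsilon}_\alpha \cdot \mathbf{d} |^2$ equals $| \mathbf{d} |^2 - ( \hat{k} \cdot \mathbf{d} )^2$, and integrating this over photon directions should give $\frac{8 \pi}{3} | \mathbf{d} |^2$, so that $\frac{1}{4 \pi^2} \times \frac{8 \pi}{3} = \frac{2}{3 \pi}$. The sketch below checks this numerically with a simple midpoint rule; the vector `d` stands for $\mathbf{q}' - \mathbf{q}$ and its components are arbitrary illustrative values.

```python
import math

def polarization_sum(k_hat, d):
    """Sum over the two transverse polarizations of |eps . d|^2 = |d|^2 - (k_hat . d)^2."""
    d2 = sum(c * c for c in d)
    kd = sum(a * b for a, b in zip(k_hat, d))
    return d2 - kd * kd

def angular_integral(d, n_theta=200, n_phi=200):
    """Midpoint-rule integral of the polarization sum over all photon directions."""
    dth = math.pi / n_theta
    dph = 2.0 * math.pi / n_phi
    total = 0.0
    for i in range(n_theta):
        th = (i + 0.5) * dth
        for j in range(n_phi):
            ph = (j + 0.5) * dph
            k_hat = (math.sin(th) * math.cos(ph),
                     math.sin(th) * math.sin(ph),
                     math.cos(th))
            total += polarization_sum(k_hat, d) * math.sin(th) * dth * dph
    return total

d = (0.3, -1.2, 0.7)                      # arbitrary stand-in for q' - q
d_squared = sum(c * c for c in d)
ratio = angular_integral(d) / d_squared   # should be close to 8 pi / 3
emission_factor = ratio / (4.0 * math.pi**2)   # should be close to 2 / (3 pi)
print(ratio, 8.0 * math.pi / 3.0)
```

The result is independent of the direction chosen for `d`, as expected from rotational invariance.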
Hence, the scattering rate which includes the emission of a photon of energy
$\hbar \omega_k$ is given by the product of the Rutherford scattering rate with a factor
$$
\frac{q'}{q} \, \frac{2}{3 \pi \omega_k} \left( \frac{e^2}{\hbar c} \right) \left( \frac{2 q \hbar \sin \frac{\theta'}{2}}{m c} \right)^2 \tag{1076}
$$
This particular factorization of the cross-section involving the simultaneous
emission of a soft photon is common to many processes involving the
emission of low-energy bosons. The soft-photon theorem [77] shows that the properties of
the emitted low-energy photon are insensitive to anything except the global
properties (such as the total charge or total magnetic moment) of the scattered
particle. The cross-section involving the emission of a low-energy photon diverges
as $\omega_k \rightarrow 0$, due to the factor of $\omega_k^{-1}$ in eqn (1075). This type of divergence
is an infrared divergence. What this implies is that, in Bremsstrahlung,
arbitrarily large numbers of low-energy photons are emitted. Furthermore, similar
singularities are also found in the $\omega = 0$ limit when elastic scattering
corrections to the Rutherford scattering process are considered [78]. In any experiment
with finite energy resolution, elastic scattering and very low-energy quasi-elastic
scattering processes cannot be distinguished, so it might be expected that the
elastic scattering and quasi-elastic scattering divergences should be combined.
[77] F. E. Low, Phys. Rev. 96, 1428 (1958).
[78] R. H. Dalitz, Proc. Roy. Soc. A 206, 509 (1950).
The divergences found in the problem of Bremsstrahlung were first
considered by Bloch and Nordsieck [79] who showed that the infrared divergences
cancel. That is, the infrared divergence does not exist [80]. The cancellation
was achieved by adding virtual emission processes for Rutherford scattering to the
Bremsstrahlung cross-section for the emission of photons of energy less than
$\hbar \omega_0$, since these processes cannot be distinguished for sufficiently small photon
frequencies $\omega_0$. That is, on introducing an infrared cutoff $\lambda_-$, one finds that
the total inelastic scattering in which a photon with frequency less than $\omega_0$ is
emitted is given by
$$
\left( \frac{d\sigma}{d\Omega'} \right)_{\rm Brems} = \left( \frac{d\sigma}{d\Omega'} \right)_{\rm Rutherford} \left[ \frac{1}{2 \pi} \left( \frac{e^2}{\hbar c} \right) \left( A \ln \frac{2 \omega_0 \lambda_-}{c} + \ldots \right) + \ldots \right] \tag{1077}
$$
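where the factor $A$ depends on the initial and final momentum of the electron.
This result is logarithmically divergent as $\lambda_- \rightarrow 0$. On the other hand, to the
same order, the elastic scattering cross-section is found as
$$
\left( \frac{d\sigma}{d\Omega'} \right)_{\rm Elastic} = \left( \frac{d\sigma}{d\Omega'} \right)_{\rm Rutherford} \left[ 1 + \frac{1}{2 \pi} \left( \frac{e^2}{\hbar c} \right) \left( A \ln \frac{\hbar}{\lambda_- m c} + \ldots \right) + \ldots \right] \tag{1078}
$$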
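Hence, on combining the results, one finds that the quasi-elastic scattering
cross-section is given by
$$
\left( \frac{d\sigma}{d\Omega'} \right)_{\rm Quasi-Elastic} = \left( \frac{d\sigma}{d\Omega'} \right)_{\rm Rutherford} \left[ 1 + \frac{1}{2 \pi} \left( \frac{e^2}{\hbar c} \right) \left( A \ln \frac{2 \hbar \omega_0}{m c^2} + \ldots \right) + \ldots \right] \tag{1079}
$$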
so the cutoff $\lambda_-$ cancels and the scattering cross-section does not diverge
logarithmically. With this reasoning, Bloch and Nordsieck found that the
appropriate expansion parameter is not $\frac{e^2}{\hbar c}$ but instead is given by $\frac{e^2}{\hbar c} \ln \frac{\hbar \omega_0}{m c^2}$. The
higher-order perturbations may also describe processes involving larger numbers
of emitted soft photons, and result in a multiplicative exponential factor in the
quasi-elastic scattering rate
$$
\left( \frac{d\sigma}{d\Omega'} \right)_{\rm Quasi-Elastic} \approx \left( \frac{d\sigma}{d\Omega'} \right)_{\rm Rutherford} \exp \left[ \frac{1}{2 \pi} \left( \frac{e^2}{\hbar c} \right) B \ln \frac{2 \hbar \omega_0}{m c^2} + \ldots \right] \tag{1080}
$$
Therefore, the scattering rate from soft photons vanishes in the limit $\omega_0 \rightarrow 0$.
This occurs because perturbation theory causes the normalization of the starting
approximate wave function to change, and hence the probabilities of the
various processes are changed by including higher-order processes. In other words,
since the probability of emitting an arbitrarily large number of soft photons is
finite, the probability of emitting either zero or any fixed number of soft photons
must be zero. Bloch and Nordsieck's calculation was restricted to the case of
emission of sufficiently low-energy photons. Pauli and Fierz [81] also considered
Bremsstrahlung in a non-relativistic approximation. Pauli and Fierz showed
that the infrared divergences, discussed above, cancel. Pauli and Fierz went on
to examine the remaining ultraviolet divergences, and showed that portions of
the ultraviolet infinities that were found in the calculations of the scattering
processes could be associated with mass renormalization. Using a relativistic
theory Ito, Koba and Tomonaga [82] showed that the remaining infinities could
be absorbed into a renormalization of the electron charge. Similar conclusions
were arrived at by Lewis [83] and by Epstein [84]. Dyson [85] showed that all infinities
that appear in Quantum Electrodynamics could be cured by renormalization to
arbitrarily high-orders in perturbation theory.
[79] F. Bloch and A. Nordsieck, Phys. Rev. 52, 54 (1937).
[80] Since there are an infinite number of low-energy photons present in Bremsstrahlung, then
it is expected that the classical limit of quantum theory applies so that classical
electromagnetic theory should produce exact results.
11 The Dirac Equation
In 1928, Dirac searched for a relativistically invariant form of the one-particle
Schrödinger equation for electrons
$$
i \hbar \frac{\partial}{\partial t} \psi = \hat{H} \psi \tag{1081}
$$
Since this equation is only first-order in time, then the solution is uniquely
specified by the initial condition for $\psi$. It is essential to only require an evolution
equation which is first-order in time. Dirac [86] searched for a set of coupled
first-order (in time) equations for a multi-component wave function $\psi$
$$
\psi = \begin{pmatrix} \psi^{(0)} \\ \psi^{(1)} \\ \vdots \\ \psi^{(N-1)} \end{pmatrix} \tag{1082}
$$
The wave function was assumed to satisfy an equation of the form
$$
\begin{aligned}
\left( i \frac{\hbar}{c} \frac{\partial}{\partial t} - \alpha \cdot \hat{p} \right) \psi &= \beta \, m c \, \psi \\
\left( i \frac{\hbar}{c} \frac{\partial}{\partial t} + i \hbar \, \alpha \cdot \nabla \right) \psi &= \beta \, m c \, \psi
\end{aligned}
\tag{1083}
$$
[81] W. Pauli and M. Fierz, Nuovo Cimento, 15, 167 (1938).
[82] D. Ito, Z. Koba and S-I. Tomonaga, Prog. Theor. Phys. (Kyoto), 3, 276 (1948).
[83] H. W. Lewis, Phys. Rev. 73, 173 (1948).
[84] Saul T. Epstein, Phys. Rev. 73, 177 (1948).
[85] F. J. Dyson, Phys. Rev. 75, 486 (1949).
[86] P. A. M. Dirac, Proc. Roy. Soc. A 117, 610 (1928);
P. A. M. Dirac, Proc. Roy. Soc. A 118, 351 (1928).
The equations have to be of this form since, if the equation is a first-order partial
differential equation in time then it must also only involve the first-order partial
derivatives with respect to the spatial components for the resulting equation to
be relativistically covariant. The wave function $\psi$ is an N-component (column)
wave function, and the three as yet unknown components of $\alpha$ and the matrix $\beta$ are
$N \times N$ matrices. Since the Hamiltonian is the generator of time translations,
then $\hat{H}$ should be equivalent to $i \hbar \frac{\partial}{\partial t}$. Hence, as the Hamiltonian operator $\hat{H}$
must be Hermitean, then the operators $\alpha$ and $\beta$ must be Hermitean matrices.
This set of equations is required to yield the dispersion relation for a relativistic
particle
$$
\left( \frac{E}{c} \right)^2 - p^2 = m^2 c^2 \tag{1084}
$$
which, following the ordinary rules of quantization, leads to the Klein-Gordon
equation
$$
\left[ - \frac{\hbar^2}{c^2} \frac{\partial^2}{\partial t^2} + \hbar^2 \nabla^2 \right] \psi = m^2 c^2 \, \psi \tag{1085}
$$
(which is a second-order partial differential equation in time). The requirement
that the Dirac equation is compatible with the Klein-Gordon equation imposes
conditions on the form of the matrices. On writing the Dirac equation as
$$
i \frac{\hbar}{c} \frac{\partial \psi}{\partial t} = \left[ \beta \, m c - i \hbar \, \alpha \cdot \nabla \right] \psi \tag{1086}
$$
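and iterating, one has
$$
\begin{aligned}
- \left( \frac{\hbar}{c} \right)^2 \frac{\partial^2 \psi}{\partial t^2} &= \left[ \beta \, m c - i \hbar \, \alpha \cdot \nabla \right]^2 \psi \\
&= \left[ \beta^2 m^2 c^2 - i \hbar \, m c \, ( \beta \alpha + \alpha \beta ) \cdot \nabla - \hbar^2 ( \alpha \cdot \nabla )^2 \right] \psi
\end{aligned}
\tag{1087}
$$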
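When expressed in terms of individual matrices $\alpha^{(j)}$, the above equation
becomes
$$
- \left( \frac{\hbar}{c} \right)^2 \frac{\partial^2 \psi}{\partial t^2} = \left[ \beta^2 m^2 c^2 - i \hbar \, m c \sum_j ( \beta \alpha^{(j)} + \alpha^{(j)} \beta ) \nabla_j - \frac{\hbar^2}{2} \sum_{i,j} ( \alpha^{(i)} \alpha^{(j)} + \alpha^{(j)} \alpha^{(i)} ) \nabla_i \nabla_j \right] \psi \tag{1088}
$$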
since the derivatives commute. If the above equation is to be equivalent to the
Klein-Gordon equation, then the coefficients of the various derivatives must be
identical for both equations. Therefore, it is required that the constant terms
are equal
$$
\beta^2 = \hat{I} \tag{1089}
$$
It is also required that the first-order derivative terms vanish and that the
second-order derivative terms should be equal, hence the matrices must satisfy
the anti-commutation relations
$$
\begin{aligned}
\alpha^{(i)} \beta + \beta \alpha^{(i)} &= 0 \\
\alpha^{(i)} \alpha^{(j)} + \alpha^{(j)} \alpha^{(i)} &= 2 \, \delta_{i,j} \, \hat{I}
\end{aligned}
\tag{1090}
$$
On imposing the above conditions, Dirac's form of the relativistic Schrödinger
equation is compatible with the Klein-Gordon equation.
From eqn (1089), one concludes that if the Hermitean matrices are brought to
diagonal form then the diagonal elements are given by $\pm 1$. The possible
dimensions N of the matrix can be determined by considering the anti-commutation
relations. On taking the determinant of eqn (1090) (with $i \neq j$ in the second relation), one finds
$$
\begin{aligned}
\det \alpha^{(i)} \, \det \beta &= ( -1 )^N \det \beta \, \det \alpha^{(i)} \\
\det \alpha^{(i)} \, \det \alpha^{(j)} &= ( -1 )^N \det \alpha^{(j)} \, \det \alpha^{(i)}
\end{aligned}
\tag{1091}
$$
Hence, on cancelling the common factors of determinants, one finds
$$
( - 1 )^N = 1 \tag{1092}
$$
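The determinant step can be spelled out as follows. This is a short sketch that uses only the anti-commutation relations (1090), the fact that multiplying an $N \times N$ matrix by $-1$ multiplies its determinant by $( -1 )^N$, and the fact that the determinants are non-zero because each matrix squares to the identity.

```latex
\begin{aligned}
\alpha^{(i)} \beta = - \beta \alpha^{(i)}
  \;\; &\Rightarrow \;\;
  \det ( \alpha^{(i)} \beta ) = \det ( - \beta \alpha^{(i)} )
  = ( -1 )^N \det ( \beta \alpha^{(i)} ) , \\
( \alpha^{(i)} )^2 = \beta^2 = \hat{I}
  \;\; &\Rightarrow \;\;
  ( \det \alpha^{(i)} )^2 = ( \det \beta )^2 = 1 ,
\end{aligned}
```

so the common factor $\det \alpha^{(i)} \det \beta \neq 0$ can be divided out, leaving $( -1 )^N = 1$.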
so N must be even. Furthermore, the matrices must be traceless. This can be
seen by considering (for $i \neq j$)
$$
\alpha^{(i)} \alpha^{(j)} = - \alpha^{(j)} \alpha^{(i)} \tag{1093}
$$
which on multiplying by $\alpha^{(i)}$, yields the relation
$$
\begin{aligned}
\alpha^{(j)} &= - \alpha^{(i)} \alpha^{(j)} \alpha^{(i)} \\
\alpha^{(j)} &= - ( \alpha^{(i)} )^{-1} \alpha^{(j)} \alpha^{(i)}
\end{aligned}
\tag{1094}
$$
since $\alpha^{(i)}$ is its own inverse. Apart from the negative sign, the
right-hand side has the form of an equivalence transformation. By using cyclic
invariance, it can be shown that the trace of a matrix is invariant under
equivalence transformations. Therefore, one has
$$
{\rm Trace} \, \alpha^{(j)} = - \, {\rm Trace} \, \alpha^{(j)} \tag{1095}
$$
or
$$
{\rm Trace} \, \alpha^{(j)} = 0 \tag{1096}
$$
which proves that the matrices are traceless.
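Since the Dirac matrices satisfy
$$
\begin{aligned}
\beta^2 &= \hat{I} \\
( \alpha^{(i)} )^2 &= \hat{I}
\end{aligned}
\tag{1097}
$$
then their eigenvalues must all be $\pm 1$, as can be seen by operating on the
eigenvalue equation
$$
\beta \, \phi_\beta = \lambda_\beta \, \phi_\beta \tag{1098}
$$
with $\beta$. This process yields
$$
\beta^2 \phi_\beta = \lambda_\beta \, \beta \, \phi_\beta = \lambda_\beta^2 \, \phi_\beta \tag{1099}
$$
which with $\beta^2 = \hat{I}$, requires that the eigenvalues must satisfy the equation
$$
\lambda_\beta^2 = 1 \tag{1100}
$$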
This and the condition that the matrices are traceless implies that the set of
eigenvalues of each matrix are composed of equal numbers of $+1$ and $-1$, and
it also confirms the conclusion that dimension N of the matrices must be even.
The smallest value of the dimension for which there is a representation of the
matrices is N = 4. The smallest even value of N, N = 2 cannot be used since
one can only construct three linearly independent anti-commuting $2 \times 2$
matrices [87]. These three matrices are the Pauli spin matrices $\sigma^{(j)}$. Hence, Dirac
constructed the relativistic theory with N = 4.
It is useful to find a representation in which the mass term is diagonal, since
this represents the largest energy which occurs in the non-relativistic limit.
When diagonalized, the $\beta$ matrix has two eigenvalues of $+1$ and two eigenvalues
of $-1$ and so $\beta$ can be expressed in $2 \times 2$ block-diagonal form. We shall express
the $4 \times 4$ matrices in the form of $2 \times 2$ block matrices. In this case, one can
represent the matrix in the block-diagonal form
$$
\beta = \begin{pmatrix} I & 0 \\ 0 & -I \end{pmatrix} \tag{1101}
$$
If the three matrices $\alpha^{(i)}$ are to anti-commute with $\beta$ and be Hermitean, they
must have the off-diagonal form
$$
\alpha^{(i)} = \begin{pmatrix} 0 & A^{(i)} \\ A^{(i)\dagger} & 0 \end{pmatrix} \tag{1102}
$$
[87] In d + 1 space-time dimensions, one can form $2^{d+1}$ matrices from products of the set of
d + 1 linearly independent (anti-commuting) Dirac matrices. We shall assume that the product
matrices are linearly independent. Since the number of linearly independent $N \times N$ matrices
is $N^2$, the minimum dimension N which will yield a representation of the Dirac matrices is
$N = 2^{\frac{d+1}{2}}$.
where $A^{(i)}$ is an arbitrary $2 \times 2$ matrix. We shall choose all three $A^{(i)}$ matrices
to be Hermitean. Since the three $\alpha^{(i)}$ matrices must anti-commute with each
other, the $A^{(i)}$ must also anti-commute with each other. Since the three Pauli
matrices are mutually anti-commuting, one can set
$$
\alpha^{(i)} = \begin{pmatrix} 0 & \sigma^{(i)} \\ \sigma^{(i)} & 0 \end{pmatrix} \tag{1103}
$$
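where the $\sigma^{(i)}$ and $I$ are, respectively, the $2 \times 2$ Pauli matrices and the $2 \times 2$
unit matrix. The Pauli matrices are given by
$$
\sigma^{(1)} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \tag{1104}
$$
$$
\sigma^{(2)} = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix} \tag{1105}
$$
and
$$
\sigma^{(3)} = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \tag{1106}
$$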
The matrix $\alpha^{(0)}$ is defined as the $4 \times 4$ identity matrix
$$
\alpha^{(0)} = \begin{pmatrix} I & 0 \\ 0 & I \end{pmatrix} \tag{1107}
$$
This set of matrices forms a representation of the Dirac matrices. This can
be seen by directly showing that they satisfy the appropriate relations. Many
different representations of the Dirac matrices can be found, but they are all
related by equivalence transformations and the physical results are independent
of which choice is made.
Exercise:
By direct matrix multiplication, show that the above matrices satisfy the
relations
$$
( \alpha^{(j)} )^2 = \beta^2 = \hat{I} \tag{1108}
$$
and the anti-commutation relations
$$
\begin{aligned}
\alpha^{(i)} \beta + \beta \alpha^{(i)} &= 0 \\
\alpha^{(i)} \alpha^{(j)} + \alpha^{(j)} \alpha^{(i)} &= 2 \, \delta_{i,j} \, \hat{I}
\end{aligned}
\tag{1109}
$$
and so form a representation of the Dirac matrices.
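A numerical check is no substitute for doing the exercise by hand, but it is a useful way to catch sign errors. The sketch below builds the matrices of eqns (1101) and (1103) from the Pauli matrices using plain nested lists, checks the relations (1108) and (1109), and also confirms that $( \alpha \cdot p + \beta m c )^2 = ( p^2 + m^2 c^2 ) \hat{I}$ for one arbitrary choice of $p$ and $m c$. All helper names and the numerical values are illustrative assumptions.

```python
from itertools import product

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(c, a):
    return [[c * x for x in row] for row in a]

def block(tl, tr, bl, br):
    """Assemble a 4x4 matrix from four 2x2 blocks."""
    return [tl[i] + tr[i] for i in range(2)] + [bl[i] + br[i] for i in range(2)]

def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def anticommutator(a, b):
    return add(matmul(a, b), matmul(b, a))

I2 = [[1, 0], [0, 1]]
Z2 = [[0, 0], [0, 0]]
SIGMA = [[[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]]

I4 = block(I2, Z2, Z2, I2)
ZERO4 = block(Z2, Z2, Z2, Z2)
beta = block(I2, Z2, Z2, scale(-1, I2))          # eqn (1101)
alpha = [block(Z2, s, s, Z2) for s in SIGMA]     # eqn (1103)

# Relations (1108) and (1109)
squares_ok = close(matmul(beta, beta), I4) and all(close(matmul(a, a), I4) for a in alpha)
alpha_beta_ok = all(close(anticommutator(a, beta), ZERO4) for a in alpha)
alpha_alpha_ok = all(
    close(anticommutator(alpha[i], alpha[j]), scale(2 if i == j else 0, I4))
    for i, j in product(range(3), repeat=2)
)

# Compatibility with the relativistic dispersion relation, for arbitrary test values
p, mc = (0.3, -1.1, 2.0), 0.7
h = scale(mc, beta)
for p_j, alpha_j in zip(p, alpha):
    h = add(h, scale(p_j, alpha_j))
dispersion_ok = close(matmul(h, h), scale(sum(x * x for x in p) + mc**2, I4))

print(squares_ok, alpha_beta_ok, alpha_alpha_ok, dispersion_ok)
```

The last check is the matrix counterpart of the requirement that iterating the Dirac equation reproduces the Klein-Gordon equation.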
11.1 Conservation of Probability
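One can find a conservation law for Dirac's equation
$$
i \hbar \frac{\partial \psi}{\partial t} = \left[ - i \hbar c \, \alpha \cdot \nabla + \beta \, m c^2 \right] \psi \tag{1110}
$$
On pre-multiplying the Dirac equation by $\psi^\dagger$, which is the Hermitean conjugate
of the spinor wave function and is defined as the row vector formed by the
complex conjugate of the components
$$
\psi^\dagger = \left( \psi^{(0)*} , \psi^{(1)*} , \psi^{(2)*} , \psi^{(3)*} \right) , \tag{1111}
$$
one obtains
$$
i \hbar \, \psi^\dagger \frac{\partial \psi}{\partial t} = \left[ - i \hbar c \, \psi^\dagger \alpha \cdot \nabla \psi + \psi^\dagger \beta \, \psi \, m c^2 \right] \tag{1112}
$$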
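The Hermitean conjugate of the Dirac equation is given by
$$
- i \hbar \frac{\partial \psi^\dagger}{\partial t} = \left[ + i \hbar c \, ( \nabla \psi^\dagger ) \cdot \alpha^\dagger + \psi^\dagger \beta^\dagger \, m c^2 \right] \tag{1113}
$$
Since $\alpha$ and $\beta$ are Hermitean matrices, the Hermitean conjugate equation
simplifies to
$$
- i \hbar \frac{\partial \psi^\dagger}{\partial t} = \left[ + i \hbar c \, ( \nabla \psi^\dagger ) \cdot \alpha + \psi^\dagger \beta \, m c^2 \right] \tag{1114}
$$
Post-multiplying the Hermitean conjugate equation by the column-vector $\psi$,
yields
$$
- i \hbar \frac{\partial \psi^\dagger}{\partial t} \psi = \left[ + i \hbar c \, ( \nabla \psi^\dagger ) \cdot \alpha \, \psi + \psi^\dagger \beta \, \psi \, m c^2 \right] \tag{1115}
$$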
On subtracting eqn (1115) from eqn (1112) and combining terms, one obtains
$$
i \hbar \frac{\partial}{\partial t} ( \psi^\dagger \psi ) = - i \hbar c \, \nabla \cdot ( \psi^\dagger \alpha \, \psi ) \tag{1116}
$$
The above equation has the form of a continuity equation
$$
\frac{\partial \rho}{\partial t} + \nabla \cdot j = 0 \tag{1117}
$$
in which the probability density is given by
$$
\rho = \psi^\dagger \psi \tag{1118}
$$
Using the rules of matrix multiplication the probability density is a real scalar
quantity, which is given by the sum of squares
$$
\rho = | \psi^{(0)} |^2 + | \psi^{(1)} |^2 + | \psi^{(2)} |^2 + | \psi^{(3)} |^2 \tag{1119}
$$
and so it is positive definite. Hence, unlike the Klein-Gordon equation, the
Dirac equation does not lead to negative probability densities. The probability
current density $j$ is given by
$$
j = c \, \psi^\dagger \alpha \, \psi \tag{1120}
$$
In this case, the total probability
$$
Q = \int d^3x \, \psi^\dagger \psi = \int d^3x \, \rho \tag{1121}
$$
is conserved, since
$$
\begin{aligned}
\frac{dQ}{dt} &= \int d^3x \, \frac{\partial \rho}{\partial t} = - \int d^3x \, \nabla \cdot j \\
&= - \int d^2S \cdot j
\end{aligned}
\tag{1122}
$$
where Gauss's theorem has been used to represent the volume integral as a surface
integral. For a sufficiently large volume, the current at the boundary vanishes,
hence the total probability is conserved
$$
\frac{dQ}{dt} = 0 \tag{1123}
$$
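The density (1118) and current (1120) can be evaluated for any four-component spinor. The sketch below does so for a randomly chosen spinor, using the representation of eqn (1103), and checks two properties: both $\rho$ and $j$ are real, and $| j | \leq c \rho$, which follows because $\hat{n} \cdot \alpha$ has eigenvalues $\pm 1$ for any unit vector $\hat{n}$. Units with $c = 1$, the random seed and the helper names are illustrative assumptions.

```python
import random

C = 1.0  # speed of light in units where c = 1 (arbitrary choice for this sketch)

SIGMA = [[[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]]

def alpha_matrix(s):
    """4x4 matrix with the 2x2 block s in both off-diagonal positions, eqn (1103)."""
    zero = [0, 0]
    return [zero + s[0], zero + s[1], s[0] + zero, s[1] + zero]

def matvec(m, v):
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def inner(u, v):
    """u^dagger v for column vectors stored as lists of complex numbers."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def density_and_current(psi):
    rho = inner(psi, psi)                                               # eqn (1118)
    j = [C * inner(psi, matvec(alpha_matrix(s), psi)) for s in SIGMA]   # eqn (1120)
    return rho, j

random.seed(0)
psi = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(4)]
rho, j = density_and_current(psi)
j_magnitude = sum(abs(c) ** 2 for c in j) ** 0.5
print(rho.real, [c.real for c in j], j_magnitude)
```

The bound $| j | \leq c \rho$ expresses the fact that probability cannot flow faster than the speed of light.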
11.2 Covariant Form of the Dirac Equation
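In the absence of an electromagnetic field, the Dirac equation can be expressed
in either of the two forms
$$
\begin{aligned}
\alpha^\mu \, \hat{p}_\mu \, \psi &= \beta \, m c \, \psi \\
i \hbar \, \alpha^\mu \, \partial_\mu \, \psi &= \beta \, m c \, \psi
\end{aligned}
\tag{1124}
$$
where it has been recalled that
$$
\alpha^{(0)} = \hat{I} \tag{1125}
$$
and the covariant momentum operator is given by
$$
\hat{p}_\mu = i \hbar \left( \frac{\partial}{\partial x^\mu} \right) \tag{1126}
$$
Or equivalently, after multiplying the Dirac equation by $\beta$ and then introducing
the four $\gamma$ matrices via
$$
\gamma^\mu = \beta \, \alpha^\mu \tag{1127}
$$
one finds that the Dirac equation appears in the alternate forms
$$
\begin{aligned}
\gamma^\mu \, \hat{p}_\mu \, \psi &= m c \, \psi \\
i \hbar \, \gamma^\mu \, \partial_\mu \, \psi &= m c \, \psi
\end{aligned}
\tag{1128}
$$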
The four gamma matrices satisfy the anti-commutation relations
$$
\gamma^\mu \gamma^\nu + \gamma^\nu \gamma^\mu = 2 \, g^{\mu,\nu} \, \hat{I} \tag{1129}
$$
where $\hat{I}$ is the $4 \times 4$ identity matrix, and $g^{\mu,\nu}$ is the Minkowski metric. The
gamma matrices labelled by the spatial indices are unitary and anti-Hermitean,
as shall be proved below.
It is easy to show that the matrix with the temporal index (0) is unitary
and Hermitean
$$
\begin{aligned}
( \gamma^{(0)} )^{-1} &= \gamma^{(0)} \\
( \gamma^{(0)} )^\dagger &= \gamma^{(0)}
\end{aligned}
\tag{1130}
$$
since $\beta$ is its own inverse and $\beta$ is Hermitean.
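The gamma matrices with spatial indices are anti-Hermitean as
$$
\begin{aligned}
( \gamma^{(i)} )^\dagger &= ( \beta \, \alpha^{(i)} )^\dagger \\
&= ( \alpha^{(i)} )^\dagger \beta^\dagger \\
&= \alpha^{(i)} \beta \\
&= - \beta \, \alpha^{(i)} \\
&= - \gamma^{(i)}
\end{aligned}
\tag{1131}
$$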
since $\alpha^{(i)}$ and $\beta$ are Hermitean and, in the fourth line the operators have been
anti-commuted. Now, the gamma matrices with spatial indices can be shown
to be unitary since
$$
\begin{aligned}
\gamma^{(i)} \gamma^{(i)} &= \beta \, \alpha^{(i)} \beta \, \alpha^{(i)} \\
&= - \beta \, \beta \, \alpha^{(i)} \alpha^{(i)} \\
&= - \hat{I}
\end{aligned}
\tag{1132}
$$
where, in obtaining the second line, the anti-commutation properties of $\alpha^{(i)}$ and
$\beta$ have been used, and the property
$$
( \alpha^{(i)} )^2 = \beta^2 = \hat{I} \tag{1133}
$$
was used to obtain the last line. Since it has already been demonstrated that
the spatial matrices are anti-Hermitean
$$
( \gamma^{(i)} )^\dagger = - \gamma^{(i)} \tag{1134}
$$
then it follows that $\gamma^{(i)}$ is unitary as
$$
( \gamma^{(i)} )^\dagger \gamma^{(i)} = \hat{I} \tag{1135}
$$
which completes the proof.
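The properties proved above can also be confirmed numerically. The sketch below constructs $\gamma^\mu = \beta \alpha^\mu$ in the representation of eqns (1101) and (1103) and checks the anti-commutation relations (1129), the Hermiticity properties (1130) and (1134), and unitarity (1135). The metric is assumed to be diagonal with entries $( +1, -1, -1, -1 )$, consistent with eqn (1132); the helper names are illustrative.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(c, a):
    return [[c * x for x in row] for row in a]

def dagger(a):
    """Hermitean conjugate: transpose and complex-conjugate."""
    n = len(a)
    return [[complex(a[j][i]).conjugate() for j in range(n)] for i in range(n)]

def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def block(tl, tr, bl, br):
    """Assemble a 4x4 matrix from four 2x2 blocks."""
    return [tl[i] + tr[i] for i in range(2)] + [bl[i] + br[i] for i in range(2)]

I2 = [[1, 0], [0, 1]]
Z2 = [[0, 0], [0, 0]]
SIGMA = [[[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]]

I4 = block(I2, Z2, Z2, I2)
beta = block(I2, Z2, Z2, scale(-1, I2))
alpha = [block(Z2, s, s, Z2) for s in SIGMA]

# gamma^0 = beta alpha^0 = beta, gamma^i = beta alpha^i, eqn (1127)
gamma = [beta] + [matmul(beta, a) for a in alpha]
METRIC = [1, -1, -1, -1]   # assumed diagonal entries of g^{mu,nu}

clifford_ok = all(
    close(add(matmul(gamma[m], gamma[n]), matmul(gamma[n], gamma[m])),
          scale(2 * METRIC[m] if m == n else 0, I4))
    for m in range(4) for n in range(4)
)
hermitean_ok = close(dagger(gamma[0]), gamma[0])
anti_hermitean_ok = all(close(dagger(g), scale(-1, g)) for g in gamma[1:])
unitary_ok = all(close(matmul(dagger(g), g), I4) for g in gamma)

print(clifford_ok, hermitean_ok, anti_hermitean_ok, unitary_ok)
```

Because the checks only use the algebra, they should hold in any other representation related to this one by a unitary equivalence transformation.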
The continuity equation can also be expressed in a covariant form. The
covariant Dirac adjoint of $\psi$ is defined as $\bar{\psi}$ where
$$
\bar{\psi} = \psi^\dagger \gamma^{(0)} \tag{1136}
$$
Hence, since
$$
( \gamma^{(0)} )^2 = \hat{I} \tag{1137}
$$
the Hermitean conjugate wave function $\psi^\dagger$ can be expressed in terms of the
adjoint spinor $\bar{\psi}$ via
$$
\psi^\dagger = \bar{\psi} \, \gamma^{(0)} \tag{1138}
$$
The continuity equation has the Lorentz covariant form
$$
\frac{\partial j^\mu}{\partial x^\mu} = 0 \tag{1139}
$$
where the four-vector conserved probability current $j^\mu$ is given by
$$
j^\mu = c \, \psi^\dagger \alpha^\mu \, \psi \tag{1140}
$$
By using the definition of the Dirac adjoint, the current density can be
re-expressed as the four quantities
$$
\begin{aligned}
j^{(0)} &= c \, \bar{\psi} \, \gamma^{(0)} \, \psi \\
j^{(i)} &= c \, \bar{\psi} \, \gamma^{(i)} \, \psi
\end{aligned}
\tag{1141}
$$
where $j^{(0)}$ represents c times the probability density and the $j^{(i)}$ are the
contravariant components of the probability current density.
11.3 The Field Free Solution
In the absence of fields, the Dirac equation can be solved exactly by assuming
a solution in the form of plane-waves. This is because the momentum operator
$\hat{p}$ commutes with the Hamiltonian $\hat{H}$ since in the absence of fields there is no
explicit dependence on position. The solution can be expressed as a momentum
eigenstate in the form
$$
\psi = \begin{pmatrix} u^{(0)} \\ u^{(1)} \\ u^{(2)} \\ u^{(3)} \end{pmatrix} \exp \left[ - i \, k^\mu x_\mu \right] \tag{1142}
$$
where the functions $u^{(\mu)}(k)$ are to be determined. On substituting this form in
the Dirac equation, it becomes an algebraic equation of the form
$$
\left[ k^{(0)} \, \hat{I} - k \cdot \alpha - \left( \frac{m c}{\hbar} \right) \beta \right] \psi = 0 \tag{1143}
$$
where $k$ is a three-vector with components given by the contravariant
spatial components of $k^\mu$. In order to write this equation in two by two
block-diagonal form, the four-component spinor $\psi$ can be written in terms of two
two-component spinors
$$
\psi = \begin{pmatrix} \phi^A \\ \phi^B \end{pmatrix} \tag{1144}
$$
where the two two-component spinors are given by
$$
\phi^A = \begin{pmatrix} u^{(0)} \\ u^{(1)} \end{pmatrix} \qquad \phi^B = \begin{pmatrix} u^{(2)} \\ u^{(3)} \end{pmatrix} \tag{1145}
$$
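Hence, the Dirac equation can be expressed as the block-diagonal matrix
equation
$$
\begin{pmatrix} \left( - k^{(0)} + \frac{m c}{\hbar} \right) I & k \cdot \sigma \\ k \cdot \sigma & \left( - k^{(0)} - \frac{m c}{\hbar} \right) I \end{pmatrix} \begin{pmatrix} \phi^A \\ \phi^B \end{pmatrix} = 0 \tag{1146}
$$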
where the three-vector scalar product involves the contravariant components
of the momentum $k^{(i)}$ with the Pauli spin matrices $\sigma^{(i)}$. The above equation
is an eigenvalue equation for $k^{(0)}$. The eigenvalues are given by the solution of
the secular equation
$$
\begin{vmatrix} \left( - k^{(0)} + \frac{m c}{\hbar} \right) I & k \cdot \sigma \\ k \cdot \sigma & \left( - k^{(0)} - \frac{m c}{\hbar} \right) I \end{vmatrix} = 0 \tag{1147}
$$
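which can be written as
$$
\left[ k^{(0)2} - \left( \frac{m c}{\hbar} \right)^2 \right] = \left( \sigma \cdot k \right)^2 \tag{1148}
$$
Using Pauli's identity
$$
\left( \sigma \cdot A \right) \left( \sigma \cdot B \right) = \left( A \cdot B \right) I + i \, \sigma \cdot \left( A \wedge B \right) \tag{1149}
$$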
one finds the energy eigenvalues are given by the doubly-degenerate dispersion
relations
$$
k^{(0)} = \pm \sqrt{ \left( \frac{m c}{\hbar} \right)^2 + k^2 } \tag{1150}
$$
Thus, the field free relativistic electron can have positive and negative-energy
eigenvalues given by
$$
E = \pm \sqrt{ m^2 c^4 + p^2 c^2 } \tag{1151}
$$
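The step from eqn (1148) to eqn (1150) uses Pauli's identity (1149) with both vectors set equal to $k$. Written out as a short sketch, with the identifications $E = \hbar c \, k^{(0)}$ and $p = \hbar k$ assumed in the last line:

```latex
\begin{aligned}
( \sigma \cdot k ) ( \sigma \cdot k ) &= ( k \cdot k ) \, I + i \, \sigma \cdot ( k \wedge k ) = k^2 \, I
  \qquad \text{since } k \wedge k = 0 , \\
k^{(0)2} &= \left( \frac{m c}{\hbar} \right)^2 + k^2 , \\
E^2 = \hbar^2 c^2 \, k^{(0)2} &= m^2 c^4 + p^2 c^2 .
\end{aligned}
```

Each sign of $k^{(0)}$ occurs twice because the condition does not depend on the two-component spin state, which is the origin of the double degeneracy.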
Since the solutions are degenerate, solutions can be found that are simultaneous
eigenstates of the Hamiltonian $\hat{H}$ given by
$$
\hat{H} = \begin{pmatrix} m c^2 \, I & - i \hbar c \, \sigma \cdot \nabla \\ - i \hbar c \, \sigma \cdot \nabla & - m c^2 \, I \end{pmatrix} \tag{1152}
$$
and another operator that commutes with $\hat{H}$. It is convenient to choose the
second operator to be the helicity operator.
The helicity operator $\hat{\Sigma}$ corresponds to the projection of the electron's spin
along the direction of momentum. The (unnormalized) helicity operator is
given by
$$
\hat{\Sigma} = - i \hbar \begin{pmatrix} \sigma \cdot \nabla & 0 \\ 0 & \sigma \cdot \nabla \end{pmatrix} \tag{1153}
$$
[Figure 44: A cartoon depicting the two helicity states of a spin one-half particle,
$\Sigma_k = +1$ and $\Sigma_k = -1$.]
This is the appropriate relativistic generalization of spin valid only for free
particles$^{88}$, as the helicity is a conserved quantity since
$$ [ \, \hat{H} \, , \, \hat{\Sigma} \, ] = 0 \tag{1154} $$
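In the absence of electromagnetic fields, the Hamiltonian is evaluated as
$$ \hat{H}(k) = \begin{pmatrix} m c^2 I & \hbar c \, \boldsymbol{\sigma} \cdot \mathbf{k} \\ \hbar c \, \boldsymbol{\sigma} \cdot \mathbf{k} & - m c^2 I \end{pmatrix} \tag{1155} $$
Likewise, for the source-free case, the properly normalized helicity operator is
found as
$$ \Lambda(k) = \begin{pmatrix} \boldsymbol{\sigma} \cdot \hat{\mathbf{k}} & 0 \\ 0 & \boldsymbol{\sigma} \cdot \hat{\mathbf{k}} \end{pmatrix} \tag{1156} $$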
which has eigenvalues of ±1.

$^{88}$ Helicity is not conserved for spherically symmetric potentials. However, if only a time-independent vector potential is present, the generalized quantity $\hat{\Sigma} = \boldsymbol{\sigma} \cdot ( \hat{\mathbf{p}} - \frac{q}{c} \mathbf{A} )$ is conserved. This conservation law implies that the spin will always retain its alignment with the velocity.
The axis of quantization of $\boldsymbol{\sigma}$ will be chosen to be along the direction of
propagation $\hat{\mathbf{k}}$. In this case, the helicity operator becomes
$$ \Lambda(k) = \begin{pmatrix} \sigma^{(3)} & 0 \\ 0 & \sigma^{(3)} \end{pmatrix} \tag{1157} $$
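and the eigenstates of helicity with eigenvalue +1 are composed of a linear
superposition of the spin-up eigenstates. We shall represent the two-component
spinors $\phi^{A}_{+}$ via
$$ \phi^{A}_{+} = u^{(0)} \chi_{+} = u^{(0)} \begin{pmatrix} 1 \\ 0 \end{pmatrix} \tag{1158} $$
and $\phi^{B}_{+}$ as
$$ \phi^{B}_{+} = u^{(2)} \chi_{+} = u^{(2)} \begin{pmatrix} 1 \\ 0 \end{pmatrix} \tag{1159} $$
Therefore, one has
$$ \psi_{+}(x) = \begin{pmatrix} u^{(0)} \chi_{+} \\ u^{(2)} \chi_{+} \end{pmatrix} \exp\left( - i k_{\mu} x^{\mu} \right) \tag{1160} $$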
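Likewise, for the negative helicity states, $\phi^{A}_{-}$ can be represented via
$$ \phi^{A}_{-} = u^{(1)} \chi_{-} = u^{(1)} \begin{pmatrix} 0 \\ 1 \end{pmatrix} \tag{1161} $$
and $\phi^{B}_{-}$ as
$$ \phi^{B}_{-} = u^{(3)} \chi_{-} = u^{(3)} \begin{pmatrix} 0 \\ 1 \end{pmatrix} \tag{1162} $$
Thus, the eigenstates with helicity −1 are the spin-down eigenstates
$$ \psi_{-}(x) = \begin{pmatrix} u^{(1)} \chi_{-} \\ u^{(3)} \chi_{-} \end{pmatrix} \exp\left( - i k_{\mu} x^{\mu} \right) \tag{1163} $$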
Clearly, states with different helicities are orthogonal since
$$ \chi^{\dagger}_{\Lambda'} \, \chi_{\Lambda} = \delta_{\Lambda', \Lambda} \tag{1164} $$
which is as it should be since they are eigenstates of a Hermitean operator.
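On substituting the helicity eigenstates $\psi_{\Lambda}$ into the Dirac equation for the
free spin one-half particle
$$ i \hbar \frac{\partial}{\partial t} \psi_{\Lambda} = \hat{H} \, \psi_{\Lambda} \tag{1165} $$
one finds
$$ E \begin{pmatrix} \phi^{A}_{\Lambda} \\ \phi^{B}_{\Lambda} \end{pmatrix} = \begin{pmatrix} m c^2 & \sigma^{(3)} c \hbar k^{(3)} \\ \sigma^{(3)} c \hbar k^{(3)} & - m c^2 \end{pmatrix} \begin{pmatrix} \phi^{A}_{\Lambda} \\ \phi^{B}_{\Lambda} \end{pmatrix} \tag{1166} $$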
Therefore, the complex amplitudes $\phi^{A}_{\Lambda}$ and $\phi^{B}_{\Lambda}$ are found to be related by
$$ \phi^{B}_{\Lambda} = \frac{ \sigma^{(3)} c \hbar k^{(3)} }{ E + m c^2 } \, \phi^{A}_{\Lambda} \tag{1167} $$
This equation shows that the components $\phi^{B}_{\Lambda}$ are small for the positive-energy
solutions, whereas the complementary expression
$$ \phi^{A}_{\Lambda} = - \frac{ \sigma^{(3)} c \hbar k^{(3)} }{ m c^2 - E } \, \phi^{B}_{\Lambda} \tag{1168} $$
shows that $\phi^{A}_{\Lambda}$ is small for the negative-energy solutions. Hence, the two
positive-energy and two negative-energy (unnormalized) solutions of the Dirac
equation can be written as
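$$ \psi_{+}(x) = \mathcal{N}_e \begin{pmatrix} \chi_{+} \\ \frac{ c \hbar k^{(3)} }{ E + m c^2 } \chi_{+} \end{pmatrix} \exp\left( - i k_{\mu} x^{\mu} \right) \tag{1169} $$
for helicity +1 and
$$ \psi_{-}(x) = \mathcal{N}_e \begin{pmatrix} \chi_{-} \\ - \frac{ c \hbar k^{(3)} }{ E + m c^2 } \chi_{-} \end{pmatrix} \exp\left( - i k_{\mu} x^{\mu} \right) \tag{1170} $$
for helicity −1. In this expression $\mathcal{N}_e$ is a normalization factor.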
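The normalization condition is
$$ \int d^3r \; \psi^{\dagger} \psi = 1 \tag{1171} $$
which determines the magnitude of the normalization constant through
$$ \begin{aligned} 1 &= V \, | \mathcal{N}_e |^2 \left( 1 + \frac{ c^2 \hbar^2 k^2 }{ ( E + m c^2 )^2 } \right) \\ &= V \, | \mathcal{N}_e |^2 \left( \frac{ E^2 + 2 E m c^2 + m^2 c^4 + c^2 \hbar^2 k^2 }{ ( E + m c^2 )^2 } \right) \\ &= V \, | \mathcal{N}_e |^2 \left( \frac{ 2 E^2 + 2 E m c^2 }{ ( E + m c^2 )^2 } \right) \\ &= V \, | \mathcal{N}_e |^2 \left( \frac{ 2 E }{ E + m c^2 } \right) \end{aligned} $$
Hence, the normalization constant can be set as
$$ \mathcal{N}_e = \sqrt{ \frac{ E + m c^2 }{ 2 E V } } \tag{1172} $$
for positive E.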
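As a consistency check on eqns (1155), (1169), (1170) and (1172), the sketch below builds the positive-energy helicity spinors for $\mathbf{k}$ along the 3-axis and confirms numerically that they satisfy $\hat{H}(k) \psi = E \psi$ and the normalization condition. It assumes units with $\hbar = c = 1$ and a unit volume; the function names and sample values are hypothetical.

```python
import math

def dirac_hamiltonian(k, m):
    """H(k) of eqn (1155) for k along the 3-axis, in units hbar = c = 1."""
    return [
        [m, 0, k, 0],
        [0, m, 0, -k],
        [k, 0, -m, 0],
        [0, -k, 0, -m],
    ]

def positive_energy_spinor(k, m, helicity, volume=1.0):
    """Spinor amplitudes of eqns (1169)-(1172); helicity is +1 or -1."""
    energy = math.sqrt(m * m + k * k)
    norm = math.sqrt((energy + m) / (2.0 * energy * volume))
    ratio = helicity * k / (energy + m)          # lower / upper amplitude
    chi = [1.0, 0.0] if helicity == +1 else [0.0, 1.0]
    return energy, [norm * c for c in chi] + [norm * ratio * c for c in chi]

def apply(matrix, vector):
    """Matrix times column vector."""
    return [sum(row[j] * vector[j] for j in range(len(vector))) for row in matrix]

m, k = 1.0, 0.75                                  # illustrative values
checks = []
for helicity in (+1, -1):
    energy, psi = positive_energy_spinor(k, m, helicity)
    h_psi = apply(dirac_hamiltonian(k, m), psi)
    residual = max(abs(h - energy * p) for h, p in zip(h_psi, psi))
    checks.append((residual, sum(p * p for p in psi)))  # (|H psi - E psi|, V psi^dagger psi)
```

Each entry of `checks` pairs the residual of the eigenvalue equation with the value of $V \psi^{\dagger} \psi$, which should be zero and one respectively.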
For states with negative energies,
$$ E = - \sqrt{ m^2 c^4 + c^2 \hbar^2 k^2 } \tag{1173} $$
the lower components are the large components. In this case, it is more
convenient to express the negative-energy solutions as
$$ \psi_{+}(x) = \mathcal{N}_p \begin{pmatrix} - \frac{ c \hbar k }{ m c^2 - E } \chi_{+} \\ \chi_{+} \end{pmatrix} \exp\left( - i k_{\mu} x^{\mu} \right) \tag{1174} $$
for helicity +1 and
$$ \psi_{-}(x) = \mathcal{N}_p \begin{pmatrix} \frac{ c \hbar k }{ m c^2 - E } \chi_{-} \\ \chi_{-} \end{pmatrix} \exp\left( - i k_{\mu} x^{\mu} \right) \tag{1175} $$
for helicity −1. Furthermore, in this expression the normalization constant has
the form
$$ \mathcal{N}_p = \sqrt{ \frac{ m c^2 - E }{ - 2 E V } } \tag{1176} $$
Hence, the positive and negative-energy solutions are symmetric under the
interchange $E \rightarrow - E$, if $\Lambda \rightarrow - \Lambda$ and the upper and lower two-component
spinors $( \phi^{A} , \phi^{B} )$ are interchanged.
General Helicity Eigenstates
The helicity operator for a particle with a momentum $\hbar \mathbf{k}$ is given by the
Hermitean operator
$$ \Lambda(k) = \frac{1}{k} \begin{pmatrix} k^{(3)} & k^{(1)} - i k^{(2)} \\ k^{(1)} + i k^{(2)} & - k^{(3)} \end{pmatrix} = \begin{pmatrix} \cos \theta_k & \sin \theta_k \exp[ - i \varphi_k ] \\ \sin \theta_k \exp[ + i \varphi_k ] & - \cos \theta_k \end{pmatrix} \tag{1177} $$
which since
$$ \Lambda(k) \, \Lambda(k) = I \tag{1178} $$
has eigenvalues $\Lambda$ of ±1. The helicity eigenstates$^{89}$ are given by the
two-component spinors $\chi_{\Lambda \pm}$. The positive helicity state is given by
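$$ \chi_{\Lambda +} = \frac{1}{ \sqrt{ 2 k ( k - k^{(3)} ) } } \begin{pmatrix} k^{(1)} - i k^{(2)} \\ k - k^{(3)} \end{pmatrix} = \exp\left[ - i \frac{\varphi_k}{2} \right] \begin{pmatrix} \cos \frac{\theta_k}{2} \exp[ - i \frac{\varphi_k}{2} ] \\ \sin \frac{\theta_k}{2} \exp[ + i \frac{\varphi_k}{2} ] \end{pmatrix} \tag{1179} $$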
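in which $( k , \theta_k , \varphi_k )$ are the polar coordinates of $\mathbf{k}$. The negative helicity
eigenstate is given by the spinor $\chi_{\Lambda -}$
$$ \chi_{\Lambda -} = \frac{1}{ \sqrt{ 2 k ( k - k^{(3)} ) } } \begin{pmatrix} - k + k^{(3)} \\ k^{(1)} + i k^{(2)} \end{pmatrix} = \exp\left[ + i \frac{\varphi_k}{2} \right] \begin{pmatrix} - \sin \frac{\theta_k}{2} \exp[ - i \frac{\varphi_k}{2} ] \\ \cos \frac{\theta_k}{2} \exp[ + i \frac{\varphi_k}{2} ] \end{pmatrix} \tag{1180} $$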
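The spinors of eqns (1179) and (1180) can be checked against the helicity matrix of eqn (1177) numerically. The sketch below is only an illustration: the function names and the sample angles are assumptions, and the Cartesian form is undefined when $\mathbf{k}$ points along $+\hat{e}_3$, where $k - k^{(3)}$ vanishes.

```python
import cmath
import math

def helicity_matrix(k):
    """Lambda(k) = sigma . k / |k| of eqn (1177) for k = (k1, k2, k3)."""
    k1, k2, k3 = k
    kk = math.sqrt(k1 * k1 + k2 * k2 + k3 * k3)
    return [[k3 / kk, (k1 - 1j * k2) / kk],
            [(k1 + 1j * k2) / kk, -k3 / kk]]

def helicity_spinors(k):
    """Cartesian forms of chi_plus and chi_minus; k must not lie along +e_3."""
    k1, k2, k3 = k
    kk = math.sqrt(k1 * k1 + k2 * k2 + k3 * k3)
    n = math.sqrt(2.0 * kk * (kk - k3))
    chi_plus = [(k1 - 1j * k2) / n, (kk - k3) / n]
    chi_minus = [(-kk + k3) / n, (k1 + 1j * k2) / n]
    return chi_plus, chi_minus

def apply(m, v):
    """2x2 matrix times two-component spinor."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]

theta, phi = 1.1, 0.4                      # illustrative polar angles of k
k = [math.sin(theta) * math.cos(phi),
     math.sin(theta) * math.sin(phi),
     math.cos(theta)]
chi_p, chi_m = helicity_spinors(k)
lam = helicity_matrix(k)
# Polar form of chi_plus with the overall phase multiplied through.
polar_plus = [math.cos(theta / 2) * cmath.exp(-1j * phi), math.sin(theta / 2)]
overlap = chi_p[0].conjugate() * chi_m[0] + chi_p[1].conjugate() * chi_m[1]
```

Here `apply(lam, chi_p)` reproduces `chi_p`, `apply(lam, chi_m)` reproduces `chi_m` with the opposite sign, and `overlap` vanishes, in line with eqn (1164).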
Therefore, the general helicity eigenstate plane-wave solutions of the Dirac
equation can be written in terms of two two-component spinors as
$$ \psi_{\Lambda \pm}(x) = \mathcal{N}_e \begin{pmatrix} \chi_{\Lambda \pm} \\ \frac{ c \hbar k \, \Lambda_{\pm} }{ E + m c^2 } \chi_{\Lambda \pm} \end{pmatrix} \exp\left( - i k_{\mu} x^{\mu} \right) \tag{1181} $$
In this expression $\mathcal{N}_e$ is a normalization factor
$$ \mathcal{N}_e = \sqrt{ \frac{ E + m c^2 }{ 2 E V } } \tag{1182} $$
These plane-wave solutions are useful in considerations of scattering processes.
11.4 Coupling to Fields
The Dirac equation describes relativistic spin one-half fermions, and their
anti-particles. It describes all massive leptons such as the electron, muon and tau
particle, and can be generalized to describe their interaction with the
electromagnetic field, or its generalization the electroweak field. In the limit $m \rightarrow 0$,
the Dirac equation reduces to the Weyl equation$^{90}$ which describes neutrinos.
The Dirac equation also describes massive quarks and the interaction can be
generalized to quantum chromodynamics.

$^{89}$ C. G. Darwin, Proc. Roy. Soc. A 118, 654 (1928); C. G. Darwin, Proc. Roy. Soc. A 120, 631 (1928).
$^{90}$ H. Weyl, Z. Physik, 56, 330 (1929).
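In the absence of interactions, the Dirac equation can be expressed in either
of the two forms
$$ \begin{aligned} \alpha^{\mu} \, \hat{p}_{\mu} \, \psi &= \beta \, m c \, \psi \\ i \hbar \, \alpha^{\mu} \, \partial_{\mu} \, \psi &= \beta \, m c \, \psi \end{aligned} \tag{1183} $$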
The interaction with the electromagnetic field is introduced as follows. Using the
minimal coupling approximation, where
$$ \hat{p}_{\mu} \rightarrow \hat{p}'_{\mu} = \hat{p}_{\mu} - \frac{q}{c} A_{\mu} \tag{1184} $$
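and q is the charge of the particle, the Dirac equation in the presence of an
electromagnetic field becomes
$$ \begin{aligned} \alpha^{\mu} \left( \hat{p}_{\mu} - \frac{q}{c} A_{\mu} \right) \psi &= \beta \, m c \, \psi \\ i \hbar \, \alpha^{\mu} \left( \partial_{\mu} + i \frac{q}{\hbar c} A_{\mu} \right) \psi &= \beta \, m c \, \psi \end{aligned} \tag{1185} $$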
This process has resulted in the inclusion of the interaction with the
electromagnetic field in a gauge invariant, Lorentz covariant manner. The appearance
of the gauge field together with the derivative results in local gauge invariance.
Sometimes it is convenient to define a covariant derivative as the gauge-invariant
combination
$$ \mathcal{D}_{\mu} \equiv \partial_{\mu} + i \frac{q}{\hbar c} A_{\mu} \tag{1186} $$
The concept of the covariant derivative also appears in the context of other
gauge field theories. Using this definition we can express the Dirac equation in
the presence of an electromagnetic field in the compact covariant form
$$ i \hbar \, \gamma^{\mu} \, \mathcal{D}_{\mu} \, \psi = m c \, \psi \tag{1187} $$
The presence of an electromagnetic field does not alter the form of the conserved
four-vector current
$$ j^{\mu} = c \; \overline{\psi}^{\dagger} \gamma^{\mu} \psi \tag{1188} $$
which is explicitly gauge invariant.
11.4.1 Mott Scattering
We shall consider the scattering of positive-energy electrons from a nucleus of
charge Z. The initial electron beam has momentum $\hbar \mathbf{k}$ which is scattered by
the target nucleus. The detector is positioned so that it detects all the scattered
electrons with momentum $\hbar \mathbf{k}'$. The initial and final states of the positive-energy
electron can be represented by Dirac spinors $\psi_{k,\sigma}$ of the form
$$ \psi_{k,\sigma}(x) = \mathcal{N}_k \begin{pmatrix} \chi_{\sigma} \\ \frac{ c \hbar \, \mathbf{k} \cdot \boldsymbol{\sigma} }{ E_k + m c^2 } \chi_{\sigma} \end{pmatrix} \exp\left( - i k_{\mu} x^{\mu} \right) \tag{1189} $$
where the normalization constant is chosen as
$$ \mathcal{N}_k = \sqrt{ \frac{ E_k + m c^2 }{ 2 E_k V } } \tag{1190} $$
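The interaction Hamiltonian with the electrostatic field of the nucleus is given
by the diagonal matrix
$$ \hat{H}_{Int} = - \frac{ Z e^2 }{ r } \begin{pmatrix} I & 0 \\ 0 & I \end{pmatrix} \tag{1191} $$
The flux of incident electrons is defined by
$$ \mathcal{F} = \frac{ | \mathbf{v} | }{ V } \tag{1192} $$
where
$$ \mathbf{v} = \left( \frac{ \partial E }{ \partial \mathbf{p} } \right) \tag{1193} $$
Therefore, the electron flux is given by
$$ \mathcal{F} = \left( \frac{ \hbar k c^2 }{ V E_k } \right) \tag{1194} $$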
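The elastic scattering cross-section in which the final state polarization is
unmeasured is given by
$$ \left( \frac{ d\sigma }{ d\Omega } \right) = \frac{1}{ ( 2 \pi )^2 } \left( \frac{ E_k V^2 }{ \hbar^2 k c^2 } \right) \sum_{\sigma'} \int_0^{\infty} dk' \, k'^2 \, | < k' \sigma' | \hat{H}_{Int} | k , \sigma > |^2 \, \delta( E_k - E_{k'} ) \tag{1195} $$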
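where the delta function ensures conservation of energy. Since the polarization
of the final state electron is unmeasured, the spin $\sigma'$ is summed over. The
integration over $k'$ can be performed, yielding
$$ \left( \frac{ d\sigma }{ d\Omega } \right) = \left( \frac{ E V }{ 2 \pi \hbar^2 c^2 } \right)^2 \sum_{\sigma'} | < k' \sigma' | \hat{H}_{Int} | k , \sigma > |^2 \tag{1196} $$
where $k$ and $k'$ are restricted to be on the energy shell ($E = E_k = E_{k'}$). The
matrix elements can then be evaluated as
$$ < k' , \sigma' | \hat{H}_{Int} | k , \sigma > = - \left( \frac{ 4 \pi Z e^2 }{ V | \mathbf{k} - \mathbf{k}' |^2 } \right) \left( \frac{ E + m c^2 }{ 2 E V } \right) \chi^{T}_{\sigma'} \left( I + \frac{ c^2 \hbar^2 ( \boldsymbol{\sigma} \cdot \mathbf{k}' ) ( \boldsymbol{\sigma} \cdot \mathbf{k} ) }{ ( E + m c^2 )^2 } \right) \chi_{\sigma} \tag{1197} $$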
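where the normalization constants have been combined, since energy is
conserved. Likewise, the complex conjugate matrix elements are given by
$$ < k , \sigma | \hat{H}_{Int} | k' , \sigma' > = - \left( \frac{ 4 \pi Z e^2 }{ V | \mathbf{k} - \mathbf{k}' |^2 } \right) \left( \frac{ E + m c^2 }{ 2 E V } \right) \chi^{T}_{\sigma} \left( I + \frac{ c^2 \hbar^2 ( \boldsymbol{\sigma} \cdot \mathbf{k} ) ( \boldsymbol{\sigma} \cdot \mathbf{k}' ) }{ ( E + m c^2 )^2 } \right) \chi_{\sigma'} \tag{1198} $$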
These expressions for the matrix elements are inserted into the scattering
cross-section. Since the final state polarization is not detected, $\sigma'$ must be
summed over. The trace over $\sigma'$ is evaluated by using the completeness relation
$$ \sum_{\sigma'} \chi_{\sigma'} \, \chi^{T}_{\sigma'} = I \tag{1199} $$
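The resulting matrix elements involve the spin-dependent factor
$$ \chi^{T}_{\sigma} \left( I + \frac{ c^2 \hbar^2 ( \boldsymbol{\sigma} \cdot \mathbf{k} ) ( \boldsymbol{\sigma} \cdot \mathbf{k}' ) }{ ( E + m c^2 )^2 } \right) \left( I + \frac{ c^2 \hbar^2 ( \boldsymbol{\sigma} \cdot \mathbf{k}' ) ( \boldsymbol{\sigma} \cdot \mathbf{k} ) }{ ( E + m c^2 )^2 } \right) \chi_{\sigma} \tag{1200} $$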
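The products of matrix elements shown above can be evaluated with the aid of
the Pauli identity. The sum of the cross-terms can be evaluated directly using
the Pauli identity. We note that since the vector products are antisymmetric in
$\mathbf{k}$ and $\mathbf{k}'$, the sum of the vector product terms cancels. That is
$$ \frac{ c^2 \hbar^2 }{ ( E + m c^2 )^2 } \left( ( \boldsymbol{\sigma} \cdot \mathbf{k} ) ( \boldsymbol{\sigma} \cdot \mathbf{k}' ) + ( \boldsymbol{\sigma} \cdot \mathbf{k}' ) ( \boldsymbol{\sigma} \cdot \mathbf{k} ) \right) = \frac{ c^2 \hbar^2 }{ ( E + m c^2 )^2 } \; 2 \, ( \mathbf{k} \cdot \mathbf{k}' ) \, \hat{I} \tag{1201} $$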
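The remaining term is evaluated by using the Pauli identity for the inner two
scalar products, and then reusing the identity for the outer two scalar products.
Explicitly, this process yields
$$ \frac{ c^4 \hbar^4 }{ ( E + m c^2 )^4 } \left( ( \boldsymbol{\sigma} \cdot \mathbf{k} ) ( \boldsymbol{\sigma} \cdot \mathbf{k}' ) ( \boldsymbol{\sigma} \cdot \mathbf{k}' ) ( \boldsymbol{\sigma} \cdot \mathbf{k} ) \right) = \frac{ c^4 \hbar^4 }{ ( E + m c^2 )^4 } \; k^2 \, k'^2 \, \hat{I} \tag{1202} $$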
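Hence, the cross-section is given by
$$ \left( \frac{ d\sigma }{ d\Omega } \right) = \left( \frac{ Z e^2 }{ \hbar^2 c^2 | \mathbf{k} - \mathbf{k}' |^2 } \right)^2 \left( ( E + m c^2 )^2 + 2 c^2 \hbar^2 \, \mathbf{k} \cdot \mathbf{k}' + \frac{ c^4 \hbar^4 k^2 k'^2 }{ ( E + m c^2 )^2 } \right) \tag{1203} $$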
It should be noted that the last two terms originated from the combined action
of the Pauli spin operators and involved the lower two-component spinors. The
last term can be simplified by using the elastic scattering condition $k = k'$ and
then using the identity
$$ c^4 \hbar^4 k^4 = ( E^2 - m^2 c^4 )^2 \tag{1204} $$
in the numerator. On canceling the factor of $( E + m c^2 )^2$ in the denominator of
the last term with a similar factor in the numerator, the last term is recognized
as being just $( E - m c^2 )^2$. Hence, on combining the first and last terms, one
finds the result
$$ \left( \frac{ d\sigma }{ d\Omega } \right) = \left( \frac{ Z e^2 }{ \hbar^2 c^2 | \mathbf{k} - \mathbf{k}' |^2 } \right)^2 \left( 2 ( E^2 + m^2 c^4 ) + 2 c^2 \hbar^2 \, \mathbf{k} \cdot \mathbf{k}' \right) \tag{1205} $$
The scattering angle $\theta$ is introduced in the square parenthesis through
$$ \mathbf{k} \cdot \mathbf{k}' = k^2 \cos \theta \tag{1206} $$
and also in the denominator of the Coulomb interaction by
$$ | \mathbf{k} - \mathbf{k}' |^2 = 4 k^2 \sin^2 \frac{\theta}{2} \tag{1207} $$
Furthermore, the factor of $m^2 c^4$ in the square parenthesis can be replaced by
$$ m^2 c^4 = E^2 - c^2 \hbar^2 k^2 \tag{1208} $$
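so that the cross-section takes the form
$$ \begin{aligned} \left( \frac{ d\sigma }{ d\Omega } \right) &= \left( \frac{ Z e^2 }{ 2 \hbar^2 c^2 k^2 \sin^2 \frac{\theta}{2} } \right)^2 \left( E^2 - c^2 \hbar^2 k^2 \sin^2 \frac{\theta}{2} \right) \\ &= \left( \frac{ 2 Z e^2 E }{ 4 \hbar^2 c^2 k^2 \sin^2 \frac{\theta}{2} } \right)^2 \left( 1 - \left( \frac{v}{c} \right)^2 \sin^2 \frac{\theta}{2} \right) \end{aligned} \tag{1209} $$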
where the expression for the magnitude of the velocity
$$ v^2 = \left( \frac{ c^2 \hbar k }{ E } \right)^2 \tag{1210} $$
has been introduced. The above result is the Mott scattering cross-section$^{91}$,
which describes the scattering of charged electrons. It differs from the
Rutherford scattering cross-section due to the multiplicative factor of relativistic origin,
which deviates from unity due to the electron's internal degree of freedom. The
extra contribution to the scattering is interpreted in terms of scattering from the
magnetic moment associated with the electron's spin interacting with the
magnetic field of the nuclear charge that the electron experiences in its rest frame.
It should be noted that even if the initial beam of electrons is unpolarized, the
scattered beam will be partially spin-polarized (due to higher-order corrections).

$^{91}$ N. F. Mott, Proc. Roy. Soc. A 124, 425 (1929).
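The algebra leading from eqn (1203) to eqn (1209) can be spot-checked numerically. The sketch below assumes units with $\hbar = c = 1$ and elastic scattering with $k' = k$; the function names, the parameter `e2` (standing for $e^2$) and the sample values are illustrative assumptions only.

```python
import math

def spin_sum_bracket(energy, mass, k, theta):
    """Bracket of eqn (1203) for elastic scattering, |k'| = |k|."""
    k_dot_kp = k * k * math.cos(theta)
    return ((energy + mass) ** 2 + 2.0 * k_dot_kp
            + k ** 4 / (energy + mass) ** 2)

def mott_bracket(energy, k, theta):
    """4 E^2 [1 - (v/c)^2 sin^2(theta/2)] with v/c = k/E, cf. eqns (1209)-(1210)."""
    beta = k / energy
    return 4.0 * energy ** 2 * (1.0 - (beta * math.sin(theta / 2.0)) ** 2)

def mott_cross_section(z, e2, energy, k, theta):
    """Second form of eqn (1209); e2 stands for e^2 in the chosen units."""
    s2 = math.sin(theta / 2.0) ** 2
    beta = k / energy
    return (z * e2 * energy / (2.0 * k * k * s2)) ** 2 * (1.0 - beta * beta * s2)

def first_form(z, e2, energy, k, theta):
    """First form of eqn (1209), used only for comparison."""
    s2 = math.sin(theta / 2.0) ** 2
    return (z * e2 / (2.0 * k * k * s2)) ** 2 * (energy ** 2 - k * k * s2)

mass, k = 1.0, 2.0                               # illustrative values
energy = math.sqrt(mass ** 2 + k ** 2)
angles = [0.2, 1.0, 2.5]
differences = [abs(spin_sum_bracket(energy, mass, k, t) - mott_bracket(energy, k, t))
               for t in angles]
form_gap = [abs(mott_cross_section(1, 1.0, energy, k, t) - first_form(1, 1.0, energy, k, t))
            for t in angles]
```

In the limit $v/c \rightarrow 0$ the factor in the last bracket tends to unity, which is the sense in which the result reduces to the Rutherford form.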
11.4.2 Maxwell’s Equations
Maxwell's equations can be written in the form of the Dirac equation. We
introduce a four-component wave function $\psi$ given by
$$ \psi = \begin{pmatrix} 0 \\ B^{(1)} - i E^{(1)} \\ B^{(2)} - i E^{(2)} \\ B^{(3)} - i E^{(3)} \end{pmatrix} \tag{1211} $$
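Maxwell's equations can be written in the form
$$ i \, \alpha^{\mu} \, \partial_{\mu} \, \psi = - \frac{ 4 \pi }{ c } \, j \tag{1212} $$
where $j$ is the contravariant form of the current four-vector
$$ j = \begin{pmatrix} c \rho \\ j^{(1)} \\ j^{(2)} \\ j^{(3)} \end{pmatrix} \tag{1213} $$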
We shall require that the matrices $\alpha^{\mu}$ are Hermitean and that they satisfy the
equation
$$ ( \alpha^{\mu} )^2 = \hat{I} \tag{1214} $$
On comparing with the form of Maxwell's equations$^{92}$, one finds that the
matrices are given by
$$ \alpha^{(0)} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \tag{1215} $$
$$ \alpha^{(1)} = \begin{pmatrix} 0 & -1 & 0 & 0 \\ -1 & 0 & 0 & 0 \\ 0 & 0 & 0 & -i \\ 0 & 0 & i & 0 \end{pmatrix} \tag{1216} $$
$$ \alpha^{(2)} = \begin{pmatrix} 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & i \\ -1 & 0 & 0 & 0 \\ 0 & -i & 0 & 0 \end{pmatrix} \tag{1217} $$
$$ \alpha^{(3)} = \begin{pmatrix} 0 & 0 & 0 & -1 \\ 0 & 0 & -i & 0 \\ 0 & i & 0 & 0 \\ -1 & 0 & 0 & 0 \end{pmatrix} \tag{1218} $$

$^{92}$ Since the first element of $\psi$ is zero, the first columns of the matrices are not determined directly from the comparison. The first rows are determined by demanding that the matrices are Hermitean.
The matrices corresponding to the spatial indices are traceless and satisfy the
anti-commutation relations
$$ \alpha^{(i)} \alpha^{(j)} + \alpha^{(j)} \alpha^{(i)} = 2 \, \delta^{i,j} \tag{1219} $$
and
$$ \alpha^{(i)} \alpha^{(j)} = i \sum_{k} \xi^{i,j,k} \, \alpha^{(k)} \tag{1220} $$
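The stated properties of the matrices in eqns (1215) to (1218) can be confirmed by direct multiplication. The sketch below is a hypothetical check with helper names chosen for illustration; it tests Hermiticity, eqn (1214), the anti-commutation relation (1219), and the product rule (1220) for the cyclic pairs of distinct spatial indices.

```python
# The four matrices of eqns (1215)-(1218), indexed 0..3.
ALPHA = [
    [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]],
    [[0, -1, 0, 0], [-1, 0, 0, 0], [0, 0, 0, -1j], [0, 0, 1j, 0]],
    [[0, 0, -1, 0], [0, 0, 0, 1j], [-1, 0, 0, 0], [0, -1j, 0, 0]],
    [[0, 0, 0, -1], [0, 0, -1j, 0], [0, 1j, 0, 0], [-1, 0, 0, 0]],
]

def matmul(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][n] * b[n][j] for n in range(4)) for j in range(4)]
            for i in range(4)]

def dagger(a):
    """Hermitean conjugate."""
    return [[complex(a[j][i]).conjugate() for j in range(4)] for i in range(4)]

def scale(c, a):
    return [[c * x for x in row] for row in a]

def add(a, b):
    return [[a[i][j] + b[i][j] for j in range(4)] for i in range(4)]

def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(4) for j in range(4))

IDENTITY = ALPHA[0]
ZERO = scale(0, IDENTITY)

hermitean = all(close(dagger(a), a) for a in ALPHA)
squares_to_identity = all(close(matmul(a, a), IDENTITY) for a in ALPHA)
anticommute = all(
    close(add(matmul(ALPHA[i], ALPHA[j]), matmul(ALPHA[j], ALPHA[i])), ZERO)
    for i in range(1, 4) for j in range(1, 4) if i != j
)
# Cyclic products: alpha1 alpha2 = i alpha3, and cyclic permutations.
cyclic = all(
    close(matmul(ALPHA[i], ALPHA[j]), scale(1j, ALPHA[k]))
    for i, j, k in [(1, 2, 3), (2, 3, 1), (3, 1, 2)]
)
```

Note that eqn (1220) is checked only for $i \neq j$; for $i = j$ the product is the unit matrix by eqn (1214).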
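On pre-multiplying Maxwell's equations in the form
$$ i \, \alpha^{\mu} \, \partial_{\mu} \, \psi = - \frac{ 4 \pi }{ c } \, j \tag{1221} $$
with the operator
$$ i \, \alpha^{\nu} \, \partial_{\nu} \tag{1222} $$
one obtains
$$ - \alpha^{\nu} \alpha^{\mu} \, \partial_{\nu} \partial_{\mu} \, \psi = - i \, \frac{ 4 \pi }{ c } \, \alpha^{\nu} \, \partial_{\nu} \, j \tag{1223} $$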
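Utilizing the anti-commutation of the spatial matrices, the left-hand side
simplifies to
$$ - \left( - \partial_{\mu} \partial^{\mu} + 2 \left( \alpha^{\nu} \partial_{\nu} \right) \left( \frac{1}{c} \frac{\partial}{\partial t} \right) \right) \psi \tag{1224} $$
On substituting the new form of Maxwell's equations in the second term, the
expression reduces to
$$ - \left( - \partial_{\mu} \partial^{\mu} \psi + i \, \frac{ 8 \pi }{ c^2 } \frac{\partial}{\partial t} \, j \right) \tag{1225} $$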
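Thus, the equation becomes
$$ \partial_{\mu} \partial^{\mu} \psi = i \, \frac{ 4 \pi }{ c } \left( \frac{2}{c} \frac{\partial}{\partial t} - \alpha^{\nu} \partial_{\nu} \right) j \tag{1226} $$
The zeroth component of the source term vanishes, due to conservation of
charge.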
11.4.3 The Gordon Decomposition
The interaction of the Dirac particle with the electromagnetic field is described
by the interaction Hamiltonian which is described by the 4 × 4 matrix
$$ \hat{H}_I = \left( \frac{q}{c} \right) c \, \gamma^{(0)} \gamma^{\mu} A_{\mu} \tag{1227} $$
The matrix interaction Hamiltonian operator yields an interaction Hamiltonian
density $\hat{\mathcal{H}}_I$ given by
$$ \hat{\mathcal{H}}_I = \left( \frac{q}{c} \right) c \; \overline{\psi}^{\dagger} \gamma^{\mu} \psi \, A_{\mu} = \left( \frac{q}{c} \right) j^{\mu} A_{\mu} \tag{1228} $$
where $j^{\mu}$ is the four-vector probability current density which satisfies the
condition for conservation of probability. Due to the prominence of the current
density operator in applications of the Dirac equation, since it naturally
describes interactions with an electromagnetic field and the conservation laws, the
physical content of the current densities shall be examined next.
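In the presence of an electromagnetic field, the four-vector current density
is given by the expression
$$ j^{\nu} = c \; \overline{\psi}^{\dagger} \gamma^{\nu} \psi \tag{1229} $$
where
$$ \overline{\psi}^{\dagger} = \psi^{\dagger} \gamma^{(0)} \tag{1230} $$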
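One can rewrite the current density by using the Dirac equation
$$ i \hbar \, \gamma^{\mu} \left( \partial_{\mu} + i \frac{q}{\hbar c} A_{\mu} \right) \psi = m c \, \psi \tag{1231} $$
and the Hermitean conjugate equation
$$ - i \hbar \left( \partial_{\mu} - i \frac{q}{\hbar c} A_{\mu} \right) \psi^{\dagger} \, \gamma^{\mu \dagger} = m c \, \psi^{\dagger} \tag{1232} $$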
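On symmetrizing the current density and then substituting the Dirac equation
in one term and its Hermitean conjugate in the other term, one obtains
$$ \begin{aligned} j^{\nu} &= \frac{c}{2} \left( \overline{\psi}^{\dagger} \gamma^{\nu} \psi + \overline{\psi}^{\dagger} \gamma^{\nu} \psi \right) \\ &= \frac{ i \hbar }{ 2 m } \left( - ( \partial_{\mu} - i \frac{q}{\hbar c} A_{\mu} ) \psi^{\dagger} \, \gamma^{\mu \dagger} \gamma^{(0)} \gamma^{\nu} \psi + \psi^{\dagger} \gamma^{(0)} \gamma^{\nu} \gamma^{\mu} ( \partial_{\mu} + i \frac{q}{\hbar c} A_{\mu} ) \psi \right) \\ &= \frac{ i \hbar }{ 2 m } \left( - ( \partial_{\mu} - i \frac{q}{\hbar c} A_{\mu} ) \overline{\psi}^{\dagger} \, \gamma^{(0)} \gamma^{\mu \dagger} \gamma^{(0)} \gamma^{\nu} \psi + \overline{\psi}^{\dagger} \gamma^{\nu} \gamma^{\mu} ( \partial_{\mu} + i \frac{q}{\hbar c} A_{\mu} ) \psi \right) \end{aligned} \tag{1233} $$
where the partial derivatives only operate on the wave function immediately to
the right of them. The identity
$$ \gamma^{(0)} \gamma^{(0)} = \hat{I} \tag{1234} $$
has been used to express $\psi^{\dagger}$ in terms of $\overline{\psi}^{\dagger}$. However, since the $\gamma$ matrices
satisfy
$$ \gamma^{(0)} \gamma^{\mu \dagger} \gamma^{(0)} = \gamma^{\mu} \tag{1235} $$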
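the current can be further simplified to yield
$$ j^{\nu} = \frac{ i \hbar }{ 2 m } \left( - ( \partial_{\mu} - i \frac{q}{\hbar c} A_{\mu} ) \overline{\psi}^{\dagger} \, \gamma^{\mu} \gamma^{\nu} \psi + \overline{\psi}^{\dagger} \gamma^{\nu} \gamma^{\mu} ( \partial_{\mu} + i \frac{q}{\hbar c} A_{\mu} ) \psi \right) \tag{1236} $$
where, once again, the partial derivative only operates on the wave function
immediately to the right of it. Furthermore, if one sets
$$ \begin{aligned} \frac{1}{2} \left( \gamma^{\mu} \gamma^{\nu} + \gamma^{\nu} \gamma^{\mu} \right) &= g^{\mu,\nu} \, \hat{I} \\ \frac{1}{2} \left( \gamma^{\mu} \gamma^{\nu} - \gamma^{\nu} \gamma^{\mu} \right) &= - i \, \sigma^{\mu,\nu} \end{aligned} \tag{1237} $$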
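then the current density can be expressed as the sum of two contributions
$$ \begin{aligned} j^{\nu} &= j^{\nu}_{c} + j^{\nu}_{s} \\ &= \frac{ i \hbar }{ 2 m } \left( - g^{\mu,\nu} \left( ( \partial_{\mu} \overline{\psi}^{\dagger} ) \psi - \overline{\psi}^{\dagger} ( \partial_{\mu} \psi ) \right) + 2 i \frac{q}{\hbar c} \, g^{\mu,\nu} \, \overline{\psi}^{\dagger} A_{\mu} \psi \right) - \frac{ \hbar }{ 2 m } \frac{ \partial }{ \partial x^{\mu} } \left( \overline{\psi}^{\dagger} \sigma^{\mu,\nu} \psi \right) \end{aligned} \tag{1238} $$
where
$$ \begin{aligned} j^{\nu}_{c} &= \frac{ i \hbar }{ 2 m } \left( - \left( ( \partial^{\nu} \overline{\psi}^{\dagger} ) \psi - \overline{\psi}^{\dagger} ( \partial^{\nu} \psi ) \right) + 2 i \frac{q}{\hbar c} \, \overline{\psi}^{\dagger} A^{\nu} \psi \right) \\ j^{\nu}_{s} &= - \frac{ \hbar }{ 2 m } \frac{ \partial }{ \partial x^{\mu} } \left( \overline{\psi}^{\dagger} \sigma^{\mu,\nu} \psi \right) \end{aligned} \tag{1239} $$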
This is the Gordon decomposition$^{93}$ of the probability current density. A similar
expression can be derived for the matrix elements of the interaction operator
between states $\overline{\psi}^{\dagger}_{\beta}$ and $\psi_{\alpha}$. As shall be shown, the first contribution in the
Gordon decomposition is gauge invariant and dominates the current density in the
non-relativistic limit. The second contribution involves the matrix $\sigma^{\mu,\nu}$ which
is anti-symmetric in its indices and has the form of a spin contribution to the
current density.
Let us examine the first term in the probability current density. If $\psi$
represents an energy eigenstate, then $j^{(0)}_{c}$ is given by
$$ j^{(0)}_{c} = \left( \frac{ E }{ m c } \right) \overline{\psi}^{\dagger} \psi - \frac{ q }{ m c } \, \overline{\psi}^{\dagger} A^{(0)} \psi \tag{1240} $$
This contribution obviously yields the main contribution to (c times) the
probability density
$$ j^{(0)}_{c} \approx c \; \overline{\psi}^{\dagger} \psi \tag{1241} $$
in the non-relativistic limit since the rest mass energy dominates the energy
$E \sim m c^2$. The spatial components of $j^{(i)}_{c}$ are given by
$$ \mathbf{j}_{c} = \frac{ i \hbar }{ 2 m } \left( ( \nabla \overline{\psi}^{\dagger} ) \, \psi - \overline{\psi}^{\dagger} ( \nabla \psi ) \right) - \frac{ q }{ m c } \, \overline{\psi}^{\dagger} \mathbf{A} \, \psi \tag{1242} $$
where the derivatives have been expressed as derivatives w.r.t. the contravariant
components $x^{(i)}$ of the position vector. This expression coincides with the full
non-relativistic expression for the current density $j^{(i)}$.

$^{93}$ W. Gordon, Zeit. für Physik, 50, 630 (1928).
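We now examine the second term $j^{\mu}_{s}$ in the Gordon decomposition. For
future reference, the anti-symmetrized products of the Dirac matrices $\sigma^{\mu,\nu}$ will
be expressed in 2 × 2 block diagonal form. Therefore, since
$$ \gamma^{(0)} = \begin{pmatrix} I & 0 \\ 0 & - I \end{pmatrix} \qquad \gamma^{(i)} = \begin{pmatrix} 0 & \sigma^{(i)} \\ - \sigma^{(i)} & 0 \end{pmatrix} \tag{1243} $$
and
$$ \sigma^{\mu,\nu} = \frac{i}{2} \left( \gamma^{\mu} \gamma^{\nu} - \gamma^{\nu} \gamma^{\mu} \right) \tag{1244} $$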
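the matrices are found as
$$ \sigma^{0,j} = i \begin{pmatrix} 0 & \sigma^{(j)} \\ \sigma^{(j)} & 0 \end{pmatrix} \tag{1245} $$
and
$$ \sigma^{i,j} = \sum_{k} \xi^{i,j,k} \begin{pmatrix} \sigma^{(k)} & 0 \\ 0 & \sigma^{(k)} \end{pmatrix} \tag{1246} $$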
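The two by two block diagonal matrix of Pauli spin matrices will be denoted
by $\hat{\boldsymbol{\sigma}}$. For an energy eigenstate, the time component $j^{(0)}_{s}$ is identically zero.
Hence, the spatial components of $j^{(i)}_{s}$ are given by
$$ \mathbf{j}_{s} = - \frac{ \hbar }{ 2 m } \, \nabla \wedge \left( \overline{\psi}^{\dagger} \, \hat{\boldsymbol{\sigma}} \, \psi \right) \tag{1247} $$
where $\hat{\boldsymbol{\sigma}}$ is the 2 × 2 block-diagonal Pauli spin matrix
$$ \hat{\boldsymbol{\sigma}} = \begin{pmatrix} \boldsymbol{\sigma} & 0 \\ 0 & \boldsymbol{\sigma} \end{pmatrix} \tag{1248} $$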
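The additional term in the current density clearly involves the Pauli spin
matrices. To elucidate its meaning, its contribution to the energy shall be
examined. On substituting this term in the interaction Hamiltonian density,
one finds a contribution
$$ \begin{aligned} \hat{\mathcal{H}}^{spin}_{I} &= - \frac{q}{c} \, \mathbf{j}_{s} \cdot \mathbf{A} \\ &= + \frac{ q \hbar }{ 2 m c } \, \mathbf{A} \cdot \left( \nabla \wedge ( \overline{\psi}^{\dagger} \, \hat{\boldsymbol{\sigma}} \, \psi ) \right) \end{aligned} \tag{1249} $$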
On integrating over space, the interaction Hamiltonian density gives rise to the
interaction's contribution to the total energy. By integrating by parts, it can
be shown that this energy contribution is equivalent to the energy contribution
caused by an equivalent form of the interaction Hamiltonian density
$$ \begin{aligned} \hat{\mathcal{H}}^{spin}_{I} &\equiv - \frac{ q \hbar }{ 2 m c } \, ( \overline{\psi}^{\dagger} \, \hat{\boldsymbol{\sigma}} \, \psi ) \cdot ( \nabla \wedge \mathbf{A} ) \\ &\equiv - \frac{ q \hbar }{ 2 m c } \, ( \overline{\psi}^{\dagger} \, \hat{\boldsymbol{\sigma}} \, \psi ) \cdot \mathbf{B} \end{aligned} \tag{1250} $$
where $\mathbf{B}$ is the magnetic field. Hence, the interaction energy contains a term
which represents an interaction between the electron's internal degree of
freedom and the magnetic field.
11.5 Lorentz Covariance of the Dirac Equation
One goal of Physics is to write the laws in a manner which is independent
of any arbitrary choices that are made. Within special relativity, this implies
that the laws of Physics should be written in a way which is independent of
the choice of inertial reference frame. Dirac’s theory is Lorentz covariant if the
results are independent of the Lorentz frame used. To this end, it is required
that the Dirac equation in a Lorentz transformed frame of reference has the
same form as the Dirac equation in the original reference frame, and also that
the solutions of these two equations describe the same physical states. That is,
the two solutions must describe the same set of measurable properties in the
diﬀerent reference frames, and therefore the results are simply related by the
Lorentz transformation.
The first step of the proof of the Lorentz covariance of the Dirac equation
requires that one should be able to show that under a Lorentz transformation
defined by
$$ A^{\mu} \rightarrow A'^{\mu} = \Lambda^{\mu}{}_{\nu} \, A^{\nu} \tag{1251} $$
then the Dirac equation is transformed from
$$ \gamma^{\mu} \left( \hat{p}_{\mu} - \frac{q}{c} A_{\mu} \right) \psi = m c \, \psi \tag{1252} $$
to an equation with an equivalent form
$$ \gamma'^{\mu} \left( \hat{p}'_{\mu} - \frac{q}{c} A'_{\mu} \right) \psi' = m c \, \psi' \tag{1253} $$
Furthermore, the four components of the spinor wave function $\psi'$ are assumed
to be linearly related to the components of $\psi$ by a four by four matrix $\hat{\mathcal{R}}(\Lambda)$
which is independent of $x^{\mu}$
$$ \psi'(x') = \hat{\mathcal{R}}(\Lambda) \, \psi(x) \tag{1254} $$
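Hence, the transformed Dirac equation can be rewritten in terms of the
untransformed spinor
$$ \begin{aligned} \gamma'^{\mu} \left( \hat{p}'_{\mu} - \frac{q}{c} A'_{\mu} \right) \psi' &= m c \, \psi' \\ \gamma'^{\mu} \left( \hat{p}'_{\mu} - \frac{q}{c} A'_{\mu} \right) \hat{\mathcal{R}}(\Lambda) \, \psi &= m c \, \hat{\mathcal{R}}(\Lambda) \, \psi \end{aligned} \tag{1255} $$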
if such an $\hat{\mathcal{R}}(\Lambda)$ exists. The $\gamma'^{\mu}$ matrices must satisfy the same anti-commutation
relations as the $\gamma^{\mu}$ and, therefore, only differ from them by a similarity
transformation$^{94}$. The transformation of $\gamma^{\mu}$ just results in the set of the four linear
equations that compose the Dirac equation being combined in different ways,
so this rearrangement can be absorbed in the definition of $\hat{\mathcal{R}}(\Lambda)$. That is, one
can choose to impose the convention that $\gamma'^{\mu} = \gamma^{\mu}$. The transformed Dirac
equation can be expressed as
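$$ \begin{aligned} \gamma^{\mu} \left( \hat{p}'_{\mu} - \frac{q}{c} A'_{\mu} \right) \hat{\mathcal{R}}(\Lambda) \, \psi &= m c \, \hat{\mathcal{R}}(\Lambda) \, \psi \\ \gamma^{\mu} \, \Lambda_{\mu}{}^{\nu} \left( \hat{p}_{\nu} - \frac{q}{c} A_{\nu} \right) \hat{\mathcal{R}}(\Lambda) \, \psi &= m c \, \hat{\mathcal{R}}(\Lambda) \, \psi \end{aligned} \tag{1256} $$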
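where the transformation properties of the momentum four-vector have been
used$^{95}$. On multiplying by the inverse of $\hat{\mathcal{R}}(\Lambda)$, one has
$$ \hat{\mathcal{R}}^{-1}(\Lambda) \, \gamma^{\mu} \, \Lambda_{\mu}{}^{\nu} \left( \hat{p}_{\nu} - \frac{q}{c} A_{\nu} \right) \hat{\mathcal{R}}(\Lambda) \, \psi = m c \, \psi \tag{1257} $$
$$ \hat{\mathcal{R}}^{-1}(\Lambda) \, \gamma^{\mu} \, \hat{\mathcal{R}}(\Lambda) \, \Lambda_{\mu}{}^{\nu} \left( \hat{p}_{\nu} - \frac{q}{c} A_{\nu} \right) \psi = m c \, \psi \tag{1258} $$
where the four by four matrices $\hat{\mathcal{R}}(\Lambda)$ have been commuted with the differential
operators and also with the components of the Lorentz transform. The condition
for covariance is then
$$ \hat{\mathcal{R}}^{-1}(\Lambda) \, \gamma^{\mu} \, \hat{\mathcal{R}}(\Lambda) \, \Lambda_{\mu}{}^{\nu} = \gamma^{\nu} \tag{1259} $$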
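The transformed Dirac equation has the same form as the original equation if
the transformed $\gamma'^{\mu}$ matrices satisfy the same anti-commutation relations and conditions
as the unprimed matrices. This can be achieved by choosing $\gamma'^{\mu} = \gamma^{\mu}$. This
choice yields the condition for covariance as
$$ \hat{\mathcal{R}}^{-1}(\Lambda) \, \gamma^{\mu} \, \hat{\mathcal{R}}(\Lambda) \, \Lambda_{\mu}{}^{\nu} = \gamma^{\nu} \tag{1260} $$
Since for a Lorentz transform one has
$$ \Lambda_{\mu}{}^{\nu} \, \Lambda^{\rho}{}_{\nu} = \delta_{\mu}{}^{\rho} \tag{1261} $$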
then multiplying the above covariance condition by $\Lambda^{\rho}{}_{\nu}$ leads to
$$ \hat{\mathcal{R}}^{-1}(\Lambda) \, \gamma^{\mu} \, \hat{\mathcal{R}}(\Lambda) = \Lambda^{\mu}{}_{\nu} \, \gamma^{\nu} \tag{1262} $$

$^{94}$ This is a statement of Pauli's fundamental theorem [W. Pauli, Ann. Inst. Henri Poincaré 6, 109 (1936)]. For a general discussion, see R. H. Good Jr., Rev. Mod. Phys. 27, 187 (1955).
$^{95}$ It should be noted that the matrices $\Lambda^{\mu}{}_{\nu}$ and $\hat{\mathcal{R}}$ act on totally different spaces. The matrices $\Lambda^{\mu}{}_{\nu}$ act on the components of the four-vectors $x^{\nu}$, whereas the $\hat{\mathcal{R}}$ matrices act on the components of the four-component Dirac spinor $\psi$.
The above equation determines the 4 × 4 matrix $\hat{\mathcal{R}}(\Lambda)$. If $\hat{\mathcal{R}}(\Lambda)$ exists, the Dirac
equation has the same form in the two frames of reference and the solutions
are linearly related. Pauli's “fundamental theorem” guarantees that a matrix
$\hat{\mathcal{R}}(\Lambda)$ exists which does satisfy the condition. Instead of following the general
theorem, the solution will be inferred from consideration of infinitesimal Lorentz
transformations.
The matrix $\hat{\mathcal{R}}(\Lambda)$ will be determined by considering the effect of an
infinitesimal Lorentz transformation
$$ \Lambda^{\mu}{}_{\nu} = \delta^{\mu}{}_{\nu} + \epsilon^{\mu}{}_{\nu} + \ldots \tag{1263} $$
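where $\delta^{\mu}{}_{\nu}$ is the Kronecker delta function. The matrix $\hat{\mathcal{R}}(\Lambda)$ for the infinitesimal
transformation can also be expanded as
$$ \hat{\mathcal{R}} = \hat{I} - \frac{i}{4} \, \epsilon^{\mu}{}_{\nu} \, \omega_{\mu}{}^{\nu} + \ldots \tag{1264} $$
where $\omega_{\mu}{}^{\nu}$ is a four by four matrix that has yet to be determined. The inverse
matrix can be written as
$$ \hat{\mathcal{R}}^{-1} = \hat{I} + \frac{i}{4} \, \epsilon^{\mu}{}_{\nu} \, \omega_{\mu}{}^{\nu} + \ldots \tag{1265} $$
to first-order in the infinitesimal quantity $\epsilon^{\mu}{}_{\nu}$. On substituting the matrices for
the infinitesimal transform into the equation that determines $\hat{\mathcal{R}}$, one obtains
$$ \frac{i}{4} \, \epsilon^{\rho}{}_{\sigma} \left( \omega_{\rho}{}^{\sigma} \gamma^{\mu} - \gamma^{\mu} \omega_{\rho}{}^{\sigma} \right) = \epsilon^{\mu}{}_{\nu} \, \gamma^{\nu} \tag{1266} $$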
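or on raising and lowering indices
$$ \frac{i}{4} \, \epsilon_{\rho\sigma} \left( \omega^{\rho\sigma} \gamma^{\mu} - \gamma^{\mu} \omega^{\rho\sigma} \right) = g^{\mu\rho} \, \epsilon_{\rho\sigma} \, \gamma^{\sigma} \tag{1267} $$
Thus, since $\epsilon_{\rho\sigma}$ is anti-symmetric as it represents an infinitesimal Lorentz
transformation, the matrix $\omega^{\rho\sigma}$ can be restricted to be anti-symmetric in the indices,
because any symmetric part does not contribute to the matrix $\hat{\mathcal{R}}$. By making
specific choices for the anti-symmetric quantities $\epsilon_{\rho\sigma}$, which are zero except for
a chosen pair of indices (say $\alpha$ and $\beta$), one finds that the anti-symmetric part
of $\omega^{\alpha\beta}$ is determined from the equation
$$ \frac{i}{2} \, [ \, \omega^{\alpha\beta} \, , \, \gamma^{\mu} \, ] = g^{\mu\alpha} \gamma^{\beta} - g^{\mu\beta} \gamma^{\alpha} \tag{1268} $$
These sets of equations have to be satisfied even if arbitrary choices are made for
the infinitesimal Lorentz transformations $\epsilon_{\rho\sigma}$. The infinitesimal unitary matrix
$\hat{\mathcal{R}}$ can be expressed in terms of six generators $\omega^{\rho\sigma}$ of the infinitesimal Lorentz
transformation
$$ \hat{\mathcal{R}} = \hat{I} - \frac{i}{4} \, \epsilon_{\rho\sigma} \, \omega^{\rho\sigma} + \ldots \tag{1269} $$
The set of matrices $\omega^{\rho\sigma}$ that define $\hat{\mathcal{R}}$ must satisfy the equation
$$ \frac{i}{2} \, [ \, \omega^{\rho\sigma} \, , \, \gamma^{\mu} \, ] = g^{\mu\rho} \gamma^{\sigma} - g^{\mu\sigma} \gamma^{\rho} \tag{1270} $$
The set of (as yet unknown) matrices $\omega^{\rho\sigma}$ that solve the above set of equations
are given by
$$ \omega^{\alpha\beta} = \sigma^{\alpha\beta} = \frac{i}{2} \, [ \, \gamma^{\alpha} \, , \, \gamma^{\beta} \, ] \tag{1271} $$
which are the six generators of the general infinitesimal Lorentz transformation.
This solution, and hence, the existence of $\hat{\mathcal{R}}(\Lambda)$ shows that the solutions of the
Dirac equation and the transformed equation are in a one to one correspondence.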
——————————————————————————————————
Proof of Solution
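It can be shown that the expression for $\sigma^{\alpha,\beta}$ given in eqn(1271) satisfies the
requirement of eqn(1270), by evaluating the nested commutator through repeatedly
using the anti-commutation properties of the $\gamma$ matrices. The commutator
can be expressed as a nested commutator or as the sum of two commutators
$$ \begin{aligned} [ \, \sigma^{\alpha\beta} \, , \, \gamma^{\mu} \, ] &= \frac{i}{2} \, [ \, [ \, \gamma^{\alpha} \, , \, \gamma^{\beta} \, ] \, , \, \gamma^{\mu} \, ] \\ &= \frac{i}{2} \, [ \, \gamma^{\alpha} \gamma^{\beta} \, , \, \gamma^{\mu} \, ] - \frac{i}{2} \, [ \, \gamma^{\beta} \gamma^{\alpha} \, , \, \gamma^{\mu} \, ] \end{aligned} \tag{1272} $$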
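On using the anti-commutation relation for the $\gamma$ matrices
$$ \frac{1}{2} \left( \gamma^{\alpha} \gamma^{\beta} + \gamma^{\beta} \gamma^{\alpha} \right) = g^{\alpha,\beta} \, \hat{I} \tag{1273} $$
one can eliminate the second term leading to
$$ \begin{aligned} [ \, \sigma^{\alpha\beta} \, , \, \gamma^{\mu} \, ] &= i \, [ \, \gamma^{\alpha} \gamma^{\beta} \, , \, \gamma^{\mu} \, ] - i \, g^{\alpha,\beta} \, [ \, \hat{I} \, , \, \gamma^{\mu} \, ] \\ &= i \, [ \, \gamma^{\alpha} \gamma^{\beta} \, , \, \gamma^{\mu} \, ] \end{aligned} \tag{1274} $$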
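where the second line follows since the identity matrix commutes with $\gamma^{\mu}$. One
notices that if the $\gamma^{\mu}$'s are anti-commuted to the center of each product, some
terms will cancel and there may be some simplification. On using the
anti-commutation relation in the second term of the expression
$$ [ \, \sigma^{\alpha\beta} \, , \, \gamma^{\mu} \, ] = i \left( \gamma^{\alpha} \gamma^{\beta} \gamma^{\mu} - \gamma^{\mu} \gamma^{\alpha} \gamma^{\beta} \right) \tag{1275} $$
one finds
$$ [ \, \sigma^{\alpha\beta} \, , \, \gamma^{\mu} \, ] = i \left( \gamma^{\alpha} \gamma^{\beta} \gamma^{\mu} + \gamma^{\alpha} \gamma^{\mu} \gamma^{\beta} - 2 \, g^{\mu,\alpha} \gamma^{\beta} \right) \tag{1276} $$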
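Likewise, the $\gamma$ matrices in the first term can also be anti-commuted, leading to
$$ \begin{aligned} [ \, \sigma^{\alpha\beta} \, , \, \gamma^{\mu} \, ] &= i \left( 2 \, g^{\mu,\beta} \gamma^{\alpha} - \gamma^{\alpha} \gamma^{\mu} \gamma^{\beta} + \gamma^{\alpha} \gamma^{\mu} \gamma^{\beta} - 2 \, g^{\mu,\alpha} \gamma^{\beta} \right) \\ &= 2 i \left( g^{\mu,\beta} \gamma^{\alpha} - g^{\mu,\alpha} \gamma^{\beta} \right) \end{aligned} \tag{1277} $$
since the middle pair of terms cancel. Hence, one has proved that
$$ \frac{i}{2} \, [ \, \sigma^{\alpha\beta} \, , \, \gamma^{\mu} \, ] = \left( g^{\mu,\alpha} \gamma^{\beta} - g^{\mu,\beta} \gamma^{\alpha} \right) \tag{1278} $$
which completes the identification of the solution of the equation for $\omega^{\alpha,\beta}$.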
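The identity of eqn (1278) can also be confirmed by brute force for all index combinations, using the explicit Dirac matrices of eqn (1243). The sketch below is an illustrative check only; the helper names and the storage of matrices as nested lists are assumptions made for this example.

```python
def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]

def combine(ca, a, cb, b):
    """Linear combination ca * a + cb * b of two matrices."""
    n = len(a)
    return [[ca * a[i][j] + cb * b[i][j] for j in range(n)] for i in range(n)]

def commutator(a, b):
    return combine(1, matmul(a, b), -1, matmul(b, a))

def block(a, b, c, d):
    """Assemble a 4x4 matrix from four 2x2 blocks [[a, b], [c, d]]."""
    return [a[0] + b[0], a[1] + b[1], c[0] + d[0], c[1] + d[1]]

def neg(a):
    return [[-x for x in row] for row in a]

I2 = [[1, 0], [0, 1]]
Z2 = [[0, 0], [0, 0]]
PAULI = [[[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]]

# Dirac matrices of eqn (1243) and the diagonal of the metric g^{mu nu}.
GAMMA = [block(I2, Z2, Z2, neg(I2))] + [block(Z2, s, neg(s), Z2) for s in PAULI]
METRIC = [1, -1, -1, -1]

def sigma(a, b):
    """sigma^{a b} = (i/2) [gamma^a, gamma^b], eqn (1271)."""
    return combine(0.5j, matmul(GAMMA[a], GAMMA[b]), -0.5j, matmul(GAMMA[b], GAMMA[a]))

def max_violation():
    """Largest entry of (i/2)[sigma^{ab}, gamma^mu] - (g^{mu a} gamma^b - g^{mu b} gamma^a)."""
    worst = 0.0
    for a in range(4):
        for b in range(4):
            for mu in range(4):
                c = commutator(sigma(a, b), GAMMA[mu])
                g_mu_a = METRIC[mu] if mu == a else 0
                g_mu_b = METRIC[mu] if mu == b else 0
                rhs = combine(g_mu_a, GAMMA[b], -g_mu_b, GAMMA[a])
                worst = max(worst, max(abs(0.5j * c[i][j] - rhs[i][j])
                                       for i in range(4) for j in range(4)))
    return worst
```

The same construction gives `sigma(1, 2)` as the block-diagonal matrix with $\sigma^{(3)}$ in each block, which is the generator used for rotations about the $\hat{e}_3$ axis.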
Therefore, since $\hat{\mathcal{R}}(\Lambda)$ exists, it has been shown that the form of the Dirac
equation is maintained in the primed reference frame and that there is a one
to one correspondence between the solutions of the primed and unprimed frames.
——————————————————————————————————
Equivalence of Physical Properties
It remains to be shown that $\psi$ and $\psi'$ describe the properties of the same
physical system, albeit in two different frames of reference. That is, the
properties associated with $\psi$ must be related to the properties of $\psi'$ and the relation
can be obtained by considering the Lorentz transformation. The most complete
physical descriptions of a unique quantum mechanical state are related to the
probability density, which can only be inferred from an infinite set of position
measurements. The probability density should behave similarly to the time
component of a four-vector as was seen from the consideration of the continuity
equation. Therefore, it follows that if the four-vector probability currents of $\psi$
and $\psi'$ are related via a Lorentz transformation, then the two spinors describe
the same physical state of the system.
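The probability current four-vector $j^{\mu}$ in the unprimed frame is described
by
$$ \begin{aligned} j^{\mu} &= c \; \overline{\psi}^{\dagger} \gamma^{\mu} \psi \\ &= c \; \psi^{\dagger} \gamma^{(0)} \gamma^{\mu} \psi \end{aligned} \tag{1279} $$
and in the primed frame, one has
$$ \begin{aligned} j'^{\mu} &= c \; \psi'^{\dagger} \gamma^{(0)} \gamma^{\mu} \psi' \\ &= c \; \psi^{\dagger} \hat{\mathcal{R}}^{\dagger} \gamma^{(0)} \gamma^{\mu} \hat{\mathcal{R}} \, \psi \end{aligned} \tag{1280} $$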
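The identity
$$ \hat{\mathcal{R}}^{-1} = \gamma^{(0)} \, \hat{\mathcal{R}}^{\dagger} \, \gamma^{(0)} \tag{1281} $$
will be proved below, so on using this identity together with
$$ \gamma^{(0)} \gamma^{(0)} = \hat{I} \tag{1282} $$
the probability current density can be rewritten as
$$ \begin{aligned} j'^{\mu} &= c \; \psi^{\dagger} \gamma^{(0)} \gamma^{(0)} \hat{\mathcal{R}}^{\dagger} \gamma^{(0)} \gamma^{\mu} \hat{\mathcal{R}} \, \psi \\ &= c \; \psi^{\dagger} \gamma^{(0)} \hat{\mathcal{R}}^{-1} \gamma^{\mu} \hat{\mathcal{R}} \, \psi \end{aligned} \tag{1283} $$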
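However, because the covariance condition is given by
$$ \hat{\mathcal{R}}^{-1}(\Lambda) \, \gamma^{\mu} \, \hat{\mathcal{R}}(\Lambda) = \Lambda^{\mu}{}_{\nu} \, \gamma^{\nu} \tag{1284} $$
the current density can be expressed as
$$ \begin{aligned} j'^{\mu} &= c \; \psi^{\dagger} \gamma^{(0)} \Lambda^{\mu}{}_{\nu} \gamma^{\nu} \psi \\ &= \Lambda^{\mu}{}_{\nu} \; c \; \overline{\psi}^{\dagger} \gamma^{\nu} \psi \\ &= \Lambda^{\mu}{}_{\nu} \, j^{\nu} \end{aligned} \tag{1285} $$
Hence, the probability current densities $j'^{\mu}$ and $j^{\mu}$ found in the two reference
frames are simply related via the Lorentz transformation. Therefore, the Dirac
equation gives consistent results, no matter what inertial frame of reference is
used.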
——————————————————————————————————
Proof of Identity
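The identity
$$ \hat{\mathcal{R}}^{-1}(\Lambda) = \gamma^{(0)} \, \hat{\mathcal{R}}^{\dagger}(\Lambda) \, \gamma^{(0)} \tag{1286} $$
can be proved by starting from the expression for $\hat{\mathcal{R}}$
appropriate for an infinitesimal transformation given by
$$ \hat{\mathcal{R}} = \hat{I} + \frac{1}{8} \, \epsilon_{\mu\nu} \, [ \, \gamma^{\mu} \, , \, \gamma^{\nu} \, ] + \ldots \tag{1287} $$
Hence, the Hermitean conjugate is given by
$$ \begin{aligned} \hat{\mathcal{R}}^{\dagger} &= \hat{I} + \frac{1}{8} \, \epsilon_{\mu\nu} \, [ \, \gamma^{\nu\dagger} \, , \, \gamma^{\mu\dagger} \, ] + \ldots \\ \hat{\mathcal{R}}^{\dagger} &= \hat{I} - \frac{1}{8} \, \epsilon_{\mu\nu} \, [ \, \gamma^{\mu\dagger} \, , \, \gamma^{\nu\dagger} \, ] + \ldots \end{aligned} \tag{1288} $$
since the Hermitean conjugate of a product is the product of the Hermitean
conjugate of the factors taken in opposite order. On forming the product
$\gamma^{(0)} \hat{\mathcal{R}}^{\dagger} \gamma^{(0)}$ and inserting a factor of
$$ \gamma^{(0)} \gamma^{(0)} = \hat{I} \tag{1289} $$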
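between the pairs of four by four $\gamma$ matrices in the commutator and noting that
$$ \gamma^{(0)} \gamma^{\mu\dagger} \gamma^{(0)} = \gamma^{\mu} \tag{1290} $$
one finds that
$$ \begin{aligned} \gamma^{(0)} \, \hat{\mathcal{R}}^{\dagger} \, \gamma^{(0)} &= \hat{I} - \frac{1}{8} \, \epsilon_{\mu\nu} \, [ \, \gamma^{\mu} \, , \, \gamma^{\nu} \, ] + \ldots \\ &= \hat{\mathcal{R}}^{-1} \end{aligned} \tag{1291} $$
The last line follows from the observation that on combining the expression for
$\hat{\mathcal{R}}$ with the expression for $\gamma^{(0)} \hat{\mathcal{R}}^{\dagger} \gamma^{(0)}$, the terms of order $\epsilon$ cancel. Hence to
the order of $\epsilon^2$, the product $\gamma^{(0)} \hat{\mathcal{R}}^{\dagger} \gamma^{(0)}$ coincides with $\hat{\mathcal{R}}^{-1}$. This concludes
the discussion of the desired identity.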
Finite Rotations
Consider a finite rotation of the coordinate system specified by the
transformation matrix $\Lambda$
$$ x'^{\mu} = \Lambda^{\mu}{}_{\nu} \, x^{\nu} \tag{1292} $$
Specifically, a finite (passive) rotation through an angle $\varphi$ about the $\hat{e}_3$ direction
can be expressed in terms of the transformation matrix
$$ \Lambda = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos \varphi & + \sin \varphi & 0 \\ 0 & - \sin \varphi & \cos \varphi & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \tag{1293} $$
The above transformation represents a rotation of the coordinate system while
the physical system stays put. For an infinitesimal rotation through $\delta\varphi$, the
transformation matrix reduces to
$$ \Lambda = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & + \delta\varphi & 0 \\ 0 & - \delta\varphi & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} + \ldots \tag{1294} $$
[Figure 45: A passive rotation of the coordinate system through an angle $\varphi$ about the $\hat{e}_3$ axis, showing the axes $x^{(1)}$, $x^{(2)}$ and the rotated axes $x^{(1)\prime}$, $x^{(2)\prime}$.]
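to first-order in the infinitesimal quantity $\delta\varphi$. Therefore, with the infinitesimal
form of the general Lorentz transformation
$$ \Lambda^{\mu}{}_{\nu} = \delta^{\mu}{}_{\nu} + \epsilon^{\mu}{}_{\nu} + \ldots \tag{1295} $$
on lowering the first index, one identifies
$$ \epsilon_{12} = - \epsilon_{21} = - \delta\varphi \tag{1296} $$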
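The infinitesimal transformation of a Dirac spinor was determined to be given
by
$$ \hat{\mathcal{R}}(\delta\varphi) = \hat{I} - \frac{i}{4} \, \epsilon_{\mu\nu} \, \sigma^{\mu\nu} + \ldots \tag{1297} $$
Hence, for an infinitesimal rotation one has
$$ \begin{aligned} \hat{\mathcal{R}}(\delta\varphi) &= \hat{I} + \frac{i}{4} \left( \delta\varphi \, \sigma^{1,2} - \delta\varphi \, \sigma^{2,1} \right) + \ldots \\ &= \hat{I} + \frac{i}{2} \, \delta\varphi \, \sigma^{1,2} + \ldots \\ &= \exp\left[ i \, \frac{\delta\varphi}{2} \, \sigma^{1,2} \right] \end{aligned} \tag{1298} $$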
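since $\sigma^{\mu,\nu}$ is antisymmetric. On compounding $N$ infinitesimal transformations $\hat{\mathcal{R}}(\delta\varphi)$ about the same axis using their exponential form, and defining $N \, \delta\varphi = \varphi$, one obtains the finite rotation $\hat{\mathcal{R}}(\varphi)$
$$ \hat{\mathcal{R}}(\varphi) = \left( \hat{\mathcal{R}}(\delta\varphi) \right)^{N} = \exp\left[ i \, N \, \frac{\delta\varphi}{2} \, \sigma^{1,2} \right] = \exp\left[ i \, \frac{\varphi}{2} \, \sigma^{1,2} \right] \tag{1299} $$
Therefore, for a finite rotation, the transformation matrix is given by
$$ \hat{\mathcal{R}}(\varphi) = \exp\left[ i \, \frac{\varphi}{2} \, \sigma^{1,2} \right] \tag{1300} $$
which can be expressed in terms of even and odd powers of $\sigma^{1,2}$ via
$$ \hat{\mathcal{R}}(\varphi) = \cos\left( \frac{\varphi}{2} \, \sigma^{1,2} \right) + i \, \sin\left( \frac{\varphi}{2} \, \sigma^{1,2} \right) \tag{1301} $$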
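but since
$$ \sigma^{1,2} = \frac{i}{2} \, [ \gamma^{(1)} , \gamma^{(2)} ] = \hat{\sigma}^{(3)} \tag{1302} $$
the transformation can be expressed as
$$ \hat{\mathcal{R}}(\varphi) = \cos\left( \frac{\varphi}{2} \, \hat{\sigma}^{(3)} \right) + i \, \sin\left( \frac{\varphi}{2} \, \hat{\sigma}^{(3)} \right) \tag{1303} $$
The above expression can be simplified by expanding the trigonometric functions in series of $\varphi$ and then using the property of the $\hat{\sigma}^{(j)}$ matrices
$$ ( \hat{\sigma}^{(3)} )^{2} = \hat{I} \tag{1304} $$
Since the repeated use of the above identity leads to
$$ ( \hat{\sigma}^{(3)} )^{2n} = \hat{I} , \qquad ( \hat{\sigma}^{(3)} )^{2n+1} = \hat{\sigma}^{(3)} \tag{1305} $$
the series simplify and can be resummed leading to
$$ \hat{\mathcal{R}}(\varphi) = \cos\left( \frac{\varphi}{2} \right) \hat{I} + i \, \sin\left( \frac{\varphi}{2} \right) \hat{\sigma}^{(3)} \tag{1306} $$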
Therefore, under a finite rotation through angle $\varphi$ around the unit vector $\hat{e}$, a spinor is rotated by the operator
$$ \hat{\mathcal{R}}(\varphi) = \cos\frac{\varphi}{2} \, \hat{I} + i \, \sin\frac{\varphi}{2} \; \hat{e} \cdot \hat{\boldsymbol{\sigma}} \tag{1307} $$
From the above equation, due to the presence of the half-angle, one notes that rotations through $\varphi$ and through $\varphi + 2\pi$ are not equivalent, since
$$ \hat{\mathcal{R}}(\varphi + 2\pi) = - \hat{\mathcal{R}}(\varphi) \tag{1308} $$
which changes the sign of the spinor. For spin one-half electrons, it is necessary to rotate through $4\pi$ to return to the same state
$$ \hat{\mathcal{R}}(\varphi + 4\pi) = \hat{\mathcal{R}}(\varphi) \tag{1309} $$
A quantity which is bilinear in $\psi^{\dagger}$ and $\psi$ will remain invariant under a rotation of $2\pi$.
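The resummation leading to eqn(1306) and the sign change of eqn(1308) can be checked numerically. The following is only a minimal sketch, assuming the standard representation in which $\hat{\sigma}^{(3)}$ is block diagonal; the helper names `expm_series`, `R_exp` and `R_closed` are hypothetical and the matrix exponential is approximated by a truncated Taylor series.

```python
import numpy as np

def expm_series(M, terms=60):
    """Matrix exponential via a truncated Taylor series (fine for small 4x4 matrices)."""
    result = np.eye(M.shape[0], dtype=complex)
    term = np.eye(M.shape[0], dtype=complex)
    for n in range(1, terms):
        term = term @ M / n          # M^n / n!
        result = result + term
    return result

sigma3 = np.array([[1, 0], [0, -1]], dtype=complex)
Sigma3 = np.kron(np.eye(2), sigma3)   # 4x4 block-diagonal sigma^(3) (assumed representation)
I4 = np.eye(4, dtype=complex)

def R_exp(phi):
    """Exponential form, eqn(1300) with sigma^{1,2} = sigma^(3)."""
    return expm_series(1j * (phi / 2) * Sigma3)

def R_closed(phi):
    """Resummed form, eqn(1306)."""
    return np.cos(phi / 2) * I4 + 1j * np.sin(phi / 2) * Sigma3

phi = 0.7
print(np.allclose(R_exp(phi), R_closed(phi)))                  # resummation
print(np.allclose(R_closed(phi + 2 * np.pi), -R_closed(phi)))  # eqn(1308)
print(np.allclose(R_closed(phi + 4 * np.pi), R_closed(phi)))   # eqn(1309)
```

The three comparisons correspond to the resummation step, the sign reversal under a rotation through $2\pi$, and the return to the original spinor after $4\pi$.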
Finite Lorentz Boosts
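A finite Lorentz boost by velocity $v$ along the $\hat{e}_1$ direction can be expressed in terms of the transformation
$$ \Lambda = \begin{pmatrix} \cosh\chi & -\sinh\chi & 0 & 0 \\ -\sinh\chi & \cosh\chi & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \tag{1310} $$
where the rapidity $\chi$ is defined by
$$ \tanh\chi = \frac{v}{c} \tag{1311} $$
so
$$ \cosh\chi = \frac{1}{\sqrt{1 - ( \frac{v}{c} )^{2}}} , \qquad \sinh\chi = \frac{\frac{v}{c}}{\sqrt{1 - ( \frac{v}{c} )^{2}}} \tag{1312} $$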
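For an infinitesimal boost through $\delta\chi$, the transformation matrix reduces to
$$ \Lambda = \begin{pmatrix} 1 & -\delta\chi & 0 & 0 \\ -\delta\chi & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} + \ldots \tag{1313} $$
to first order in the infinitesimal quantity $\delta\chi$. Therefore, with the infinitesimal form of the general Lorentz transformation
$$ \Lambda^{\mu}{}_{\nu} = \delta^{\mu}{}_{\nu} + \epsilon^{\mu}{}_{\nu} + \ldots \tag{1314} $$
on lowering the first index, one identifies
$$ \epsilon_{01} = - \epsilon_{10} = - \delta\chi \tag{1315} $$
The infinitesimal transformation of a Dirac spinor was determined to be given by
$$ \hat{\mathcal{R}}(\delta\chi) = \hat{I} - \frac{i}{4} \, \epsilon_{\mu\nu} \, \sigma^{\mu\nu} + \ldots \tag{1316} $$
Hence, for an infinitesimal Lorentz boost one has
$$ \hat{\mathcal{R}}(\delta\chi) = \hat{I} + 2 \, \frac{i}{4} \, \delta\chi \, \sigma^{0,1} + \ldots = \exp\left[ i \, \frac{\delta\chi}{2} \, \sigma^{0,1} \right] \tag{1317} $$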
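On compounding $N$ successive infinitesimal Lorentz boosts (with parallel velocities) given by $\hat{\mathcal{R}}(\delta\chi)$ and defining $N \, \delta\chi = \chi$, one obtains the finite Lorentz boost $\hat{\mathcal{R}}(\chi)$
$$ \hat{\mathcal{R}}(\chi) = \left( \hat{\mathcal{R}}(\delta\chi) \right)^{N} = \exp\left[ i \, N \, \frac{\delta\chi}{2} \, \sigma^{0,1} \right] = \exp\left[ i \, \frac{\chi}{2} \, \sigma^{0,1} \right] \tag{1318} $$
Therefore, for a finite Lorentz boost, the transformation matrix is given by
$$ \hat{\mathcal{R}}(\chi) = \exp\left[ i \, \frac{\chi}{2} \, \sigma^{0,1} \right] \tag{1319} $$
which can be expressed in terms of even and odd powers of $\sigma^{0,1}$ via
$$ \hat{\mathcal{R}}(\chi) = \cosh\left( i \, \frac{\chi}{2} \, \sigma^{0,1} \right) + \sinh\left( i \, \frac{\chi}{2} \, \sigma^{0,1} \right) \tag{1320} $$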
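but since
$$ \sigma^{0,1} = \frac{i}{2} \, [ \gamma^{(0)} , \gamma^{(1)} ] = i \, \alpha^{(1)} \tag{1321} $$
the transformation can be expressed as
$$ \hat{\mathcal{R}}(\chi) = \cosh\left( - \frac{\chi}{2} \, \alpha^{(1)} \right) + \sinh\left( - \frac{\chi}{2} \, \alpha^{(1)} \right) \tag{1322} $$
The above expression can be simplified by expanding the hyperbolic functions in series of $\chi$ and then using the property of the $\alpha$ matrices
$$ ( \alpha^{(1)} )^{2} = \hat{I} \tag{1323} $$
Since the repeated use of the above identity leads to
$$ ( \alpha^{(1)} )^{2n} = \hat{I} , \qquad ( \alpha^{(1)} )^{2n+1} = \alpha^{(1)} \tag{1324} $$
the series simplify and can be resummed leading to
$$ \begin{aligned} \hat{\mathcal{R}}(\chi) &= \cosh\left( - \frac{\chi}{2} \right) \hat{I} + \sinh\left( - \frac{\chi}{2} \right) \alpha^{(1)} \\ &= \cosh\left( \frac{\chi}{2} \right) \hat{I} - \sinh\left( \frac{\chi}{2} \right) \alpha^{(1)} \end{aligned} \tag{1325} $$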
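Therefore, under a finite boost through velocity $v$, a spinor is transformed by the operator
$$ \hat{\mathcal{R}}(\chi) = \cosh\frac{\chi}{2} \left( \hat{I} - \tanh\frac{\chi}{2} \; \alpha^{(1)} \right) = \cosh\frac{\chi}{2} \left( \hat{I} - \tanh\frac{\chi}{2} \; \hat{v} \cdot \boldsymbol{\alpha} \right) \tag{1326} $$
where the rapidity $\chi$ is determined by
$$ \tanh\chi = \frac{v}{c} \tag{1327} $$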
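As a cross-check of eqns(1321) and (1325), one can verify numerically that the resummed boost operator satisfies the covariance condition $\hat{\mathcal{R}}^{-1} \gamma^{\mu} \hat{\mathcal{R}} = \Lambda^{\mu}{}_{\nu} \gamma^{\nu}$ with the matrix $\Lambda$ of eqn(1310). This is a sketch only; it assumes the standard representation of the Dirac matrices, and the function names are hypothetical.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

beta = np.kron(sz, I2)                            # diag(I, -I)
alpha = [np.kron(sx, s) for s in (sx, sy, sz)]    # off-diagonal blocks sigma
gamma = [beta] + [beta @ a for a in alpha]        # gamma^(0), gamma^(1..3)

def boost_spinor(chi):
    """Resummed spinor boost along e_1, eqn(1325)."""
    return np.cosh(chi / 2) * np.eye(4) - np.sinh(chi / 2) * alpha[0]

def boost_lambda(chi):
    """Coordinate transformation matrix of eqn(1310)."""
    L = np.eye(4)
    L[0, 0] = L[1, 1] = np.cosh(chi)
    L[0, 1] = L[1, 0] = -np.sinh(chi)
    return L

def covariance_holds(chi):
    """Check R^{-1} gamma^mu R = Lambda^mu_nu gamma^nu for all mu."""
    R = boost_spinor(chi)
    R_inv = np.linalg.inv(R)
    Lam = boost_lambda(chi)
    for mu in range(4):
        lhs = R_inv @ gamma[mu] @ R
        rhs = sum(Lam[mu, nu] * gamma[nu] for nu in range(4))
        if not np.allclose(lhs, rhs):
            return False
    return True

# sigma^{0,1} = (i/2)[gamma^(0), gamma^(1)] should equal i alpha^(1), eqn(1321)
sigma01 = 0.5j * (gamma[0] @ gamma[1] - gamma[1] @ gamma[0])
print(np.allclose(sigma01, 1j * alpha[0]))
print(covariance_holds(0.9))
```

The rapidity value 0.9 is arbitrary; any real $\chi$ should pass the same check.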
Exercise:
Determine the relationship between the rapidities for a combined Lorentz transformation consisting of two successive Lorentz boosts with parallel velocities $v_0$ and $v_1$.
Exercise:
Starting from a solution of a free stationary Dirac particle with spin $\sigma$, perform a Lorentz boost to determine the solution for a Dirac electron with momentum $\mathbf{p}$.
Figure 46: A cartoon depicting a stationary free electron confined in a volume $V$ with proper length $L$, viewed from a coordinate system moving with velocity $v$ antiparallel to the $\hat{e}_1$ axis.
Exercise:
Show that the helicity eigenvalue of a free Dirac particle can be reversed by
going to a new reference frame which is “overtaking” the particle.
11.5.1 The Space of the Anticommuting $\gamma^{\mu}$ Matrices
One can form sixteen matrices $\Gamma_i$ from the product of the four $\gamma$ matrices. Since the $\gamma^{\mu}$ matrices obey the anticommutation relations
$$ \{ \gamma^{\mu} , \gamma^{\nu} \}_{+} = 2 \, g^{\mu,\nu} \, \hat{I} \tag{1328} $$
all other products can be reduced to the above products. The order of the matrices is irrelevant, since the different matrices anticommute. Also, since $( \gamma^{\mu} )^{2} = \pm \hat{I}$, one only needs to consider the products in which each matrix enters at most one time. Hence, since each of the four matrices either appears as a factor or does not, there are only $2^{4}$ such matrices. These sixteen $\Gamma_i$ matrices can be constructed from $\hat{I}$, $\gamma^{\mu}$, $\sigma^{\mu,\nu} = i \, \gamma^{\mu} \gamma^{\nu}$ (for $\mu \neq \nu$), $\gamma^{(4)}$ and $\gamma^{(4)} \gamma^{\mu}$, by choosing appropriate phase factors.
Table 6: The Set of the Sixteen Matrices $\Gamma_n$ with their Phase Factors ($j > i$)
    $\hat{I}$
    $\gamma^{(0)}$          $i \, \gamma^{(i)}$
    $\gamma^{(0)} \gamma^{(i)}$     $i \, \varepsilon_{i,j,k} \, \gamma^{(i)} \gamma^{(j)}$
    $\gamma^{(4)} = i \, \gamma^{(0)} \gamma^{(1)} \gamma^{(2)} \gamma^{(3)}$
    $- i \, \gamma^{(0)} \gamma^{(4)}$    $\gamma^{(i)} \gamma^{(4)}$
Closure under Multiplication
The set of matrices $\Gamma_i$ formed from the set of $\gamma^{\mu}$ is closed under multiplication, so
$$ \Gamma_i \, \Gamma_j = a_{i,j} \, \Gamma_k \tag{1329} $$
where $a_{i,j}^{4} = 1$. The sixteen $\Gamma_i$ matrices can be chosen as the product of the members of the above set multiplied by a phase factor taken from the set $\pm 1$ and $\pm i$, such that the condition
$$ ( \Gamma_i )^{2} = \hat{I} \tag{1330} $$
is satisfied. Furthermore, by counting the number of non-equivalent factors of the $\gamma^{\mu}$ in the products, one can show that
$$ \Gamma_i \, \Gamma_j = \hat{I} \quad \text{only if} \quad i = j \tag{1331} $$
Also, by anticommuting the factors of $\gamma^{\mu}$ in the products, one can show that
$$ \Gamma_i \, \Gamma_j = \pm \, \Gamma_j \, \Gamma_i \tag{1332} $$
Specifically, for a fixed $\Gamma_i$ not equal to the identity, one can always find a specific $\Gamma_k$ such that
$$ \Gamma_i \, \Gamma_k = - \, \Gamma_k \, \Gamma_i \tag{1333} $$
which on multiplying by $\Gamma_k$ results in
$$ \Gamma_k \, \Gamma_i \, \Gamma_k = - \, \Gamma_i \tag{1334} $$
Traceless Matrices
The above facts can be used to show that the $\Gamma_i$ matrices, other than the identity, are traceless. This can be proved by considering
$$ \begin{aligned} - \, \mathrm{Trace} \, \Gamma_i &= \mathrm{Trace}( - \Gamma_i ) = \mathrm{Trace}( \Gamma_k \, \Gamma_i \, \Gamma_k ) \\ &= \mathrm{Trace}( \Gamma_i \, \Gamma_k \, \Gamma_k ) = \mathrm{Trace} \, \Gamma_i \end{aligned} \tag{1335} $$
in which the existence of a specific $\Gamma_k$ which anticommutes with $\Gamma_i$ has been used, together with the cyclic invariance of the trace and the property $( \Gamma_k )^{2} = \hat{I}$. Hence, all the $\Gamma_i$ matrices, other than the identity, are traceless
$$ \mathrm{Trace} \, \Gamma_i = 0 \tag{1336} $$
Linear Independence
The sixteen $\Gamma_i$ matrices are linearly independent. Linear independence means that the equation
$$ \sum_i C_i \, \Gamma_i = 0 \tag{1337} $$
has no solution other than
$$ C_i \equiv 0 \quad \text{for all } i \tag{1338} $$
This can be proved by multiplying eqn(1337) by any one $\Gamma_j$ in the set, which leads to
$$ \begin{aligned} C_j \, \hat{I} + \sum_{i \neq j} C_i \, \Gamma_i \, \Gamma_j &= 0 \\ C_j \, \hat{I} + \sum_{i \neq j} C_i \, a_{i,j} \, \Gamma_k &= 0 \end{aligned} \tag{1339} $$
where $\Gamma_k \neq \hat{I}$ for $i \neq j$, by eqn(1331). On taking the trace one finds
$$ 0 = C_j \, \mathrm{Trace} \, \hat{I} + \sum_{i \neq j} C_i \, a_{i,j} \, \mathrm{Trace} \, \Gamma_k = 4 \, C_j \tag{1340} $$
since the matrices $\Gamma_k$ are traceless. Hence, all the $C_j$ are zero, so the matrices are linearly independent.
Uniqueness of Expansions
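The existence of sixteen linearly independent matrices requires that the matrices be represented in a space of $N \times N$ matrices, where $N \geq 4$. Any matrix $A$ in the space of $4 \times 4$ matrices can be uniquely expressed in terms of the basis set of the $\Gamma_i$. For example, if
$$ A = \sum_i C_i \, \Gamma_i \tag{1341} $$
then on multiplying by $\Gamma_j$ and taking the trace, one has
$$ \begin{aligned} \mathrm{Trace}( A \, \Gamma_j ) &= \sum_i C_i \, \mathrm{Trace}( \Gamma_i \, \Gamma_j ) \\ &= C_j \, \mathrm{Trace}( \Gamma_j \, \Gamma_j ) + \sum_{i \neq j} C_i \, \mathrm{Trace}( \Gamma_i \, \Gamma_j ) \\ &= C_j \, \mathrm{Trace}( \hat{I} ) + \sum_{i \neq j} C_i \, \mathrm{Trace}( a_{i,j} \, \Gamma_k ) \\ &= 4 \, C_j \end{aligned} \tag{1342} $$
Hence, the coefficients $C_j$ in the expansion of $A$ are uniquely determined as
$$ C_j = \frac{1}{4} \, \mathrm{Trace}( A \, \Gamma_j ) \tag{1343} $$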
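The properties derived above (unit squares, vanishing traces, and the trace formula for the expansion coefficients) can be illustrated numerically. The sketch below assumes the standard representation of the $\gamma^{\mu}$ and builds the sixteen matrices with the phase factors of Table 6; the function names `dirac_gammas` and `clifford_basis`, and the random test matrix `A`, are hypothetical choices made only for this illustration.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def dirac_gammas():
    """gamma^(0..3) in the standard representation (an assumption of this sketch)."""
    beta = np.kron(sz, I2)
    return [beta] + [beta @ np.kron(sx, s) for s in (sx, sy, sz)]

def clifford_basis(g):
    """Sixteen matrices with phases chosen so that each squares to the identity."""
    g0, g1, g2, g3 = g
    sp = [g1, g2, g3]
    g4 = 1j * g0 @ g1 @ g2 @ g3
    basis = [np.eye(4, dtype=complex), g0]
    basis += [1j * gi for gi in sp]
    basis += [g0 @ gi for gi in sp]
    basis += [1j * sp[i] @ sp[j] for i, j in [(0, 1), (0, 2), (1, 2)]]
    basis += [g4, -1j * g0 @ g4]
    basis += [gi @ g4 for gi in sp]
    return basis

basis = clifford_basis(dirac_gammas())
I4 = np.eye(4)

squares_ok = all(np.allclose(G @ G, I4) for G in basis)            # eqn(1330)
traceless_ok = all(abs(np.trace(G)) < 1e-12 for G in basis[1:])    # eqn(1336)
gram = np.array([[np.trace(Gi @ Gj) for Gj in basis] for Gi in basis])
orthogonal_ok = np.allclose(gram, 4 * np.eye(16))                  # Trace(G_i G_j) = 4 delta_ij

# Expand an arbitrary 4x4 matrix and rebuild it from C_j = Trace(A G_j)/4, eqn(1343)
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
coeffs = [np.trace(A @ G) / 4 for G in basis]
A_rebuilt = sum(c * G for c, G in zip(coeffs, basis))

print(len(basis), squares_ok, traceless_ok, orthogonal_ok, np.allclose(A, A_rebuilt))
```

The orthogonality of the traces is what makes the expansion coefficients unique, which is the content of eqn(1343).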
Schur’s Lemma
The uniqueness of the expansion can be used to show that, for fixed $i$, the products of $\Gamma_i$ with the members of the set of $\Gamma_j$ lead to a different $\Gamma_k$ for each $j$. This can be shown by assuming that there exist two different (linearly independent) matrices $\Gamma_j$ and $\Gamma_{j'}$ which lead to the same $\Gamma_k$
$$ \Gamma_i \, \Gamma_j = a_{i,j} \, \Gamma_k , \qquad \Gamma_i \, \Gamma_{j'} = a_{i,j'} \, \Gamma_k \tag{1344} $$
On multiplying by $\Gamma_i$, one obtains
$$ \Gamma_j = a_{i,j} \, \Gamma_i \, \Gamma_k , \qquad \Gamma_{j'} = a_{i,j'} \, \Gamma_i \, \Gamma_k \tag{1345} $$
Hence, one infers that
$$ \Gamma_j = \frac{a_{i,j}}{a_{i,j'}} \, \Gamma_{j'} \tag{1346} $$
which contradicts the assumption that $\Gamma_j$ and $\Gamma_{j'}$ are linearly independent. Therefore, for fixed $i$, the product $\Gamma_i \, \Gamma_j$ leads to a different result $\Gamma_k$ for the different $\Gamma_j$.
One can also prove Schur's lemma. Schur's Lemma states that if a matrix $A$ commutes with all the $\gamma^{\mu}$'s, then $A$ is a multiple of the identity. If $A$ commutes with the $\gamma^{\mu}$'s, it also commutes with all the $\Gamma_i$'s. Schur's lemma follows from the expansion of $A$ as
$$ A = C_i \, \Gamma_i + \sum_{j \neq i} C_j \, \Gamma_j \tag{1347} $$
for any $i$ such that $\Gamma_i \neq \hat{I}$. Then, one notes that there exists a $\Gamma_k$ such that
$$ \Gamma_k \, \Gamma_i \, \Gamma_k = - \, \Gamma_i \tag{1348} $$
Since it has been assumed that $A$ commutes with all the $\Gamma_i$, for the specific $\Gamma_k$ one has
$$ \begin{aligned} A = \Gamma_k \, A \, \Gamma_k &= C_i \, \Gamma_k \, \Gamma_i \, \Gamma_k + \sum_{j \neq i} C_j \, \Gamma_k \, \Gamma_j \, \Gamma_k \\ &= - \, C_i \, \Gamma_i + \sum_{j \neq i} C_j \, \Gamma_k \, \Gamma_j \, \Gamma_k \end{aligned} \tag{1349} $$
Furthermore, since the $\Gamma_i$ matrices either commute or anticommute
$$ \Gamma_k \, \Gamma_j \, \Gamma_k = ( \pm 1 )_{j,k} \, \Gamma_j \tag{1350} $$
the above equation reduces to
$$ A = - \, C_i \, \Gamma_i + \sum_{j \neq i} C_j \, ( \pm 1 )_{j,k} \, \Gamma_j \tag{1351} $$
which should be compared with the assumed form of the expansion
$$ A = C_i \, \Gamma_i + \sum_{j \neq i} C_j \, \Gamma_j \tag{1352} $$
Since the expansion is unique, the coefficients of the $\Gamma_j$ are unique and in particular
$$ C_i = - \, C_i \tag{1353} $$
so $C_i = 0$ for any $i$ such that $\Gamma_i \neq \hat{I}$. Hence, if $A$ commutes with all the $\Gamma_i$ then $A$ must be proportional to the identity.
Pauli’s Fundamental Theorem
Pauli's fundamental theorem states that if there are two representations of the algebra of anticommuting $\gamma$-matrices, say $\gamma^{\mu}$ and $\gamma'^{\mu}$, then these representations are related via a similarity transformation
$$ \gamma'^{\mu} = \hat{S} \, \gamma^{\mu} \, \hat{S}^{-1} \tag{1354} $$
where $\hat{S}$ is a non-singular matrix.
The theorem requires that one constructs a set of sixteen matrices $\Gamma'_i$ from the $\gamma'^{\mu}$ following the same rules with which the $\Gamma_i$ were constructed from $\gamma^{\mu}$. Then one can describe the non-singular matrix by
$$ \hat{S} = \sum_i \Gamma'_i \, F \, \Gamma_i \tag{1355} $$
where $F$ is an arbitrary $4 \times 4$ matrix.
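First one notes that
$$ \Gamma_i \, \Gamma_j = a_{i,j} \, \Gamma_k \tag{1356} $$
so on iterating, one has
$$ \Gamma_i \, \Gamma_j \, \Gamma_i \, \Gamma_j = a_{i,j}^{2} \, \Gamma_k^{2} = a_{i,j}^{2} \, \hat{I} \tag{1357} $$
since
$$ \Gamma_k^{2} = \hat{I} \tag{1358} $$
On pre-multiplying eqn(1357) by $\Gamma_j \, \Gamma_i$, one obtains
$$ \Gamma_j \, \Gamma_i \, \Gamma_i \, \Gamma_j \, \Gamma_i \, \Gamma_j = a_{i,j}^{2} \, \Gamma_j \, \Gamma_i \tag{1359} $$
but since
$$ \Gamma_j \, \Gamma_i \, \Gamma_i \, \Gamma_j = \hat{I} \tag{1360} $$
eqn(1359) reduces to
$$ \Gamma_i \, \Gamma_j = a_{i,j}^{2} \, \Gamma_j \, \Gamma_i \tag{1361} $$
However, as
$$ \Gamma_i \, \Gamma_j = a_{i,j} \, \Gamma_k \tag{1362} $$
the equation becomes
$$ a_{i,j} \, \Gamma_k = a_{i,j}^{2} \, \Gamma_j \, \Gamma_i \tag{1363} $$
or since $a_{i,j}^{4} = 1$, the equation can be expressed as
$$ \Gamma_j \, \Gamma_i = a_{i,j}^{3} \, \Gamma_k \tag{1364} $$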
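The $\Gamma'_i$ matrices are constructed so that they satisfy similar relations to the $\Gamma_i$. In particular, the $\Gamma'_i$ matrices satisfy
$$ \Gamma'_i \, \Gamma'_j = a_{i,j} \, \Gamma'_k \tag{1365} $$
with the same constants $a_{i,j}$ as the unprimed matrices.
Pauli's theorem follows from the above relations by noting that
$$ \Gamma'_i \, \hat{S} \, \Gamma_i = \Gamma'_i \left( \sum_j \Gamma'_j \, F \, \Gamma_j \right) \Gamma_i \tag{1366} $$
but on recalling that
$$ \Gamma_j \, \Gamma_i = a_{i,j}^{3} \, \Gamma_k \tag{1367} $$
and
$$ \Gamma'_i \, \Gamma'_j = a_{i,j} \, \Gamma'_k \tag{1368} $$
one finds
$$ \Gamma'_i \, \hat{S} \, \Gamma_i = \sum_j a_{i,j}^{4} \, \Gamma'_k \, F \, \Gamma_k \tag{1369} $$
Therefore, with $a_{i,j}^{4} = 1$, the above equation reduces to
$$ \Gamma'_i \, \hat{S} \, \Gamma_i = \sum_j \Gamma'_k \, F \, \Gamma_k \tag{1370} $$
However, since $i$ is fixed and $j$ is being summed over, every $\Gamma_k$ appears once and only once in the product. Therefore, the sum can be performed over $k$
$$ \Gamma'_i \, \hat{S} \, \Gamma_i = \sum_k \Gamma'_k \, F \, \Gamma_k = \hat{S} \tag{1371} $$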
If one can show that the matrix $\hat{S}$ has an inverse, then on post-multiplying by $\hat{S}^{-1}$, one finds
$$ \Gamma'_i \, \hat{S} \, \Gamma_i \, \hat{S}^{-1} = \hat{I} \tag{1372} $$
Furthermore, since $\Gamma'_i$ is its own inverse, then on pre-multiplying by $\Gamma'_i$ the equation reduces to
$$ \hat{S} \, \Gamma_i \, \hat{S}^{-1} = \Gamma'_i \tag{1373} $$
This is a generalization of the statement of the theorem. As a particular case, one may choose $\Gamma_i = \gamma^{\mu}$ in which case the theorem becomes
$$ \gamma'^{\mu} = \hat{S} \, \gamma^{\mu} \, \hat{S}^{-1} \tag{1374} $$
which was the initial statement of Pauli's fundamental theorem made above.
The matrix $\hat{S}$ is non-singular and has an inverse. This can be shown by using Schur's Lemma. One can construct a matrix $\hat{S}'$ in a manner which is symmetrical to the construction of $\hat{S}$. That is
$$ \hat{S}' = \sum_i \Gamma_i \, G \, \Gamma'_i \tag{1375} $$
From symmetry it follows that since eqn(1371) is given by
$$ \hat{S} = \Gamma'_i \, \hat{S} \, \Gamma_i \tag{1376} $$
one also has
$$ \hat{S}' = \Gamma_i \, \hat{S}' \, \Gamma'_i \tag{1377} $$
Therefore, on taking the product, one obtains
$$ \hat{S}' \, \hat{S} = \Gamma_i \, \hat{S}' \, \Gamma'_i \, \Gamma'_i \, \hat{S} \, \Gamma_i = \Gamma_i \, \hat{S}' \, \hat{S} \, \Gamma_i \tag{1378} $$
Hence, by Schur's Lemma one sees that $\hat{S}' \hat{S}$ commutes with all the matrices in the space, therefore it must be a multiple of the identity
$$ \hat{S}' \, \hat{S} = \kappa \, \hat{I} \tag{1379} $$
where $\kappa$ is a constant. By a judicious choice of the magnitude of the elements of $F$, the constant $\kappa$ can be set to unity, yielding
$$ \hat{S}' \, \hat{S} = \hat{I} \tag{1380} $$
Thus, $\hat{S}$ is non-singular so the inverse exists and is given by $\hat{S}^{-1} = \hat{S}'$.
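The construction of $\hat{S}$ in eqn(1355) can be tried out on two concrete representations. The sketch below is only an illustration under stated assumptions: it takes the standard (Dirac) representation as $\gamma^{\mu}$ and the chiral (Weyl) representation as $\gamma'^{\mu}$, picks $F = \hat{I}$ (any $F$ giving a non-singular $\hat{S}$ would do), and uses hypothetical helper names.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def dirac_gammas():
    """Standard representation: gamma^0 = diag(I, -I)."""
    return [np.kron(sz, I2)] + [np.kron(1j * sy, s) for s in (sx, sy, sz)]

def weyl_gammas():
    """Chiral representation: gamma^0 is off-diagonal (assumed second representation)."""
    return [np.kron(sx, I2)] + [np.kron(1j * sy, s) for s in (sx, sy, sz)]

def clifford_basis(g):
    """Sixteen Gamma matrices built by the same rule for any representation."""
    g0, g1, g2, g3 = g
    sp = [g1, g2, g3]
    g4 = 1j * g0 @ g1 @ g2 @ g3
    basis = [np.eye(4, dtype=complex), g0]
    basis += [1j * gi for gi in sp]
    basis += [g0 @ gi for gi in sp]
    basis += [1j * sp[i] @ sp[j] for i, j in [(0, 1), (0, 2), (1, 2)]]
    basis += [g4, -1j * g0 @ g4]
    basis += [gi @ g4 for gi in sp]
    return basis

g, gp = dirac_gammas(), weyl_gammas()
F = np.eye(4, dtype=complex)                      # arbitrary choice of F
S = sum(Gp @ F @ G for Gp, G in zip(clifford_basis(gp), clifford_basis(g)))
S_inv = np.linalg.inv(S)                          # fails if S were singular

relation_ok = all(np.allclose(S @ g[mu] @ S_inv, gp[mu]) for mu in range(4))
print(abs(np.linalg.det(S)) > 1e-9, relation_ok)
```

For a choice of $F$ that happens to give $\hat{S} = 0$, the inversion would fail, which is why the text leaves $F$ arbitrary and only fixes its normalization.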
11.5.2 Polarization in Mott Scattering
When evaluated in the Born approximation, Mott scattering does not result in the polarization of an unpolarized beam. However, when higher-order corrections are included, Mott scattering produces a partial polarization of the scattered electrons [96]. If the incident beam is polarized by having a definite helicity, it is expected that the helicity may change as a result of the scattering.
The probability of helicity non-flip scattering and helicity flip scattering can be evaluated using the Born approximation. The initial beam will be considered as having a momentum $\mathbf{p}$ parallel to the $\hat{e}_3$ axis and as having a helicity of $+1$. The initial spinor is proportional to
$$ \psi_{p,+}(\mathbf{r}) = \sqrt{\frac{E_p + m c^{2}}{2 \, E_p \, V}} \begin{pmatrix} \chi_{+} \\ \frac{c \, p}{E_p + m c^{2}} \, \chi_{+} \end{pmatrix} \exp\left[ i \, \frac{\mathbf{p} \cdot \mathbf{r}}{\hbar} \right] \tag{1381} $$
[96] N. F. Mott, Proc. Roy. Soc. A 124, 425 (1929).
Figure 47: Helicity non-flip and helicity flip Mott scattering of an electron with helicity $+1$. The scattering angle is $\theta_p$.
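The electrons are assumed to be elastically scattered to a state with final momentum $\mathbf{p}'$. The scattering is defined to occur through an angle $\theta_p$ in the $z - x$ plane. The final state is composed of a linear superposition of states with different helicities. Since the final state helicities are specified relative to the final momentum, the final state helicity eigenstates can be obtained by rotating the initial state helicity eigenstates through an angle $\theta_p$ around the $\hat{e}_2$ axis
$$ \begin{aligned} \psi_{p',\Lambda}(\mathbf{r}) &= \hat{\mathcal{R}}(\theta_p) \, \psi_{p,\Lambda}(\mathbf{r}) \\ &= \sqrt{\frac{E_p + m c^{2}}{2 \, E_p \, V}} \; \hat{\mathcal{R}}(\theta_p) \begin{pmatrix} \chi_{\Lambda} \\ \frac{c \, p \, \Lambda}{E_p + m c^{2}} \, \chi_{\Lambda} \end{pmatrix} \exp\left[ i \, \frac{\mathbf{p}' \cdot \mathbf{r}}{\hbar} \right] \end{aligned} \tag{1382} $$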
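where the rotation operator is given by
$$ \hat{\mathcal{R}}(\theta_p) = \left( \cos\frac{\theta_p}{2} \, \hat{I} - i \, \sin\frac{\theta_p}{2} \, \hat{\sigma}^{(2)} \right) \tag{1383} $$
which does not mix the upper and lower two-component spinors. Therefore, one finds that the final state two-component spinors representing helicity eigenstates are given by
$$ \chi'_{+} = \left( \cos\frac{\theta_p}{2} \, I - i \, \sin\frac{\theta_p}{2} \, \sigma^{(2)} \right) \chi_{+} = \begin{pmatrix} \cos\frac{\theta_p}{2} \\ \sin\frac{\theta_p}{2} \end{pmatrix} \tag{1384} $$
and
$$ \chi'_{-} = \left( \cos\frac{\theta_p}{2} \, I - i \, \sin\frac{\theta_p}{2} \, \sigma^{(2)} \right) \chi_{-} = \begin{pmatrix} - \sin\frac{\theta_p}{2} \\ \cos\frac{\theta_p}{2} \end{pmatrix} \tag{1385} $$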
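Therefore, the final state basis states are given by
$$ \psi_{p',\Lambda}(\mathbf{r}) = \sqrt{\frac{E_p + m c^{2}}{2 \, E_p \, V}} \begin{pmatrix} \chi'_{\Lambda} \\ \frac{c \, p \, \Lambda}{E_p + m c^{2}} \, \chi'_{\Lambda} \end{pmatrix} \exp\left[ i \, \frac{\mathbf{p}' \cdot \mathbf{r}}{\hbar} \right] \tag{1386} $$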
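The Born approximation scattering cross-section can be expressed in terms of the modulus squared matrix elements
$$ \left| \int d^{3}r \; \psi^{\dagger}_{p',\Lambda}(\mathbf{r}) \, \gamma^{(0)} \, \frac{Z e^{2}}{| \mathbf{r} |} \, \psi_{p,+}(\mathbf{r}) \right|^{2} \tag{1387} $$
which is evaluated as
$$ \begin{aligned} & \left( \frac{4 \pi Z e^{2}}{V \, | \mathbf{p} - \mathbf{p}' |^{2}} \right)^{2} \left( 1 + \Lambda \, \frac{c^{2} p^{2}}{( E_p + m c^{2} )^{2}} \right)^{2} \left( \frac{E_p + m c^{2}}{2 \, E_p} \right)^{2} \left| \chi'^{\dagger}_{\Lambda} \, \chi_{+} \right|^{2} \\ &= \left( \frac{4 \pi Z e^{2}}{V \, | \mathbf{p} - \mathbf{p}' |^{2}} \right)^{2} \left( \frac{( E_p + m c^{2} )^{2} + \Lambda \, c^{2} p^{2}}{2 \, E_p \, ( E_p + m c^{2} )} \right)^{2} \left| \chi'^{\dagger}_{\Lambda} \, \chi_{+} \right|^{2} \end{aligned} \tag{1388} $$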
Therefore, the probability for helicity non-flip scattering is proportional to
$$ \propto \left( \frac{4 \pi Z e^{2}}{V \, | \mathbf{p} - \mathbf{p}' |^{2}} \right)^{2} \cos^{2}\frac{\theta_p}{2} \tag{1389} $$
whereas the probability for helicity flip scattering is given by
$$ \propto \left( \frac{4 \pi Z e^{2}}{V \, | \mathbf{p} - \mathbf{p}' |^{2}} \right)^{2} \left( \frac{m c^{2}}{E_p} \right)^{2} \sin^{2}\frac{\theta_p}{2} \tag{1390} $$
It is seen that the probability for helicity flip scattering vanishes in the ultra-relativistic limit. Also, in the non-relativistic limit, a static charge cannot flip the spin. Therefore, in the non-relativistic limit, if one expresses the spin eigenstate as a linear superposition of the final helicity eigenstates
$$ \chi_{+} = \cos\frac{\theta_p}{2} \, \chi'_{+} - \sin\frac{\theta_p}{2} \, \chi'_{-} \tag{1391} $$
one is led to expect that the relative probability of helicity flip to helicity non-flip will be governed by a factor of
$$ \tan^{2}\frac{\theta_p}{2} \tag{1392} $$
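which agrees with the above matrix elements evaluated in the non-relativistic limit. The cross-section for non-flip scattering is determined as
$$ \left( \frac{d\sigma}{d\Omega} \right)_{+,+} = \left( \frac{2 \, Z e^{2} \, E_p}{4 \, c^{2} p^{2} \sin^{2}\frac{\theta_p}{2}} \right)^{2} \cos^{2}\frac{\theta_p}{2} \tag{1393} $$
whereas the cross-section for spin flip scattering is given by
$$ \left( \frac{d\sigma}{d\Omega} \right)_{+,-} = \left( \frac{2 \, Z e^{2} \, m c^{2}}{4 \, c^{2} p^{2} \sin^{2}\frac{\theta_p}{2}} \right)^{2} \sin^{2}\frac{\theta_p}{2} \tag{1394} $$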
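The Born approximation to the total cross-section for scattering of polarized electrons, in which the final polarization is not measured, is given by
$$ \begin{aligned} \left( \frac{d\sigma}{d\Omega} \right)_{+} &= \left( \frac{d\sigma}{d\Omega} \right)_{+,+} + \left( \frac{d\sigma}{d\Omega} \right)_{+,-} \\ &= \left( \frac{2 \, Z e^{2} \, E_p}{4 \, c^{2} p^{2} \sin^{2}\frac{\theta_p}{2}} \right)^{2} \left( \cos^{2}\frac{\theta_p}{2} + \left( \frac{m c^{2}}{E_p} \right)^{2} \sin^{2}\frac{\theta_p}{2} \right) \\ &= \left( \frac{2 \, Z e^{2} \, E_p}{4 \, c^{2} p^{2} \sin^{2}\frac{\theta_p}{2}} \right)^{2} \left( 1 - \left( \frac{p \, c}{E_p} \right)^{2} \sin^{2}\frac{\theta_p}{2} \right) \end{aligned} \tag{1395} $$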
which is the same as the cross-section calculated for unpolarized electrons. The degree of polarization of the scattered beam is given by
$$ P(\theta_p) = \left( \frac{E_p^{2} \cos^{2}\frac{\theta_p}{2} - m^{2} c^{4} \sin^{2}\frac{\theta_p}{2}}{E_p^{2} \cos^{2}\frac{\theta_p}{2} + m^{2} c^{4} \sin^{2}\frac{\theta_p}{2}} \right) \tag{1396} $$
If the initial beam of electrons is unpolarized, the scattered electrons would be observed to be partially polarized, where the net polarization is in the plane perpendicular to the scattering plane. However, the polarization is due to processes of higher order than the Born approximation and is governed by the factor $( \frac{Z e^{2}}{\hbar c} )$.
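The helicity factors of eqns(1389), (1390), (1395) and (1396) are simple enough to evaluate directly. The following sketch, with hypothetical function names and with energies expressed in arbitrary but consistent units (`p_c` standing for $p c$ and `m_c2` for $m c^{2}$), checks the algebra of the last line of eqn(1395) and the limiting behaviour discussed in the text.

```python
import numpy as np

def helicity_factors(theta, p_c, m_c2):
    """Angular factors of eqn(1389) (non-flip) and eqn(1390) (flip)."""
    E = np.hypot(p_c, m_c2)                     # E_p = sqrt((pc)^2 + (mc^2)^2)
    nonflip = np.cos(theta / 2) ** 2
    flip = (m_c2 / E) ** 2 * np.sin(theta / 2) ** 2
    return nonflip, flip

def polarization(theta, p_c, m_c2):
    """Degree of polarization, eqn(1396), written as (nonflip - flip)/(nonflip + flip)."""
    nonflip, flip = helicity_factors(theta, p_c, m_c2)
    return (nonflip - flip) / (nonflip + flip)

theta, p_c, m_c2 = 1.1, 0.8, 0.511              # illustrative values only
nonflip, flip = helicity_factors(theta, p_c, m_c2)
E = np.hypot(p_c, m_c2)

# Last line of eqn(1395): cos^2 + (mc^2/E)^2 sin^2 = 1 - (pc/E)^2 sin^2
print(np.isclose(nonflip + flip, 1 - (p_c / E) ** 2 * np.sin(theta / 2) ** 2))

# Non-relativistic limit: flip/non-flip tends to tan^2(theta/2), eqn(1392)
nf_nr, f_nr = helicity_factors(theta, 1e-6, 1.0)
print(np.isclose(f_nr / nf_nr, np.tan(theta / 2) ** 2))

# Ultra-relativistic limit: helicity flip is suppressed and P tends to 1
print(polarization(theta, 1e6, 1.0))
```

The numerical values of the angle and momenta are not taken from the text; they are chosen only to exercise the formulas.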
11.6 The Non-Relativistic Limit
The non-relativistic limit of the Dirac equation should reduce to the Schrödinger equation. As shall be seen, the appropriate Schrödinger equation for a particle with positive energy is modified due to the existence of spin. The non-relativistic limit is described by the Pauli equation [97].
The Dirac equation can be written as
$$ \left( i \, \frac{\hbar}{c} \, \frac{\partial}{\partial t} - \frac{q}{c} \, A^{0} \right) \psi = \left( \boldsymbol{\alpha} \cdot ( \hat{\mathbf{p}} - \frac{q}{c} \, \mathbf{A} ) + \beta \, m c \right) \psi \tag{1397} $$
The equation can be written in $2 \times 2$ block form, if the wave function is expressed in the form of two two-component spinors. We shall mainly focus on the positive-energy solutions and recognize that, in the non-relativistic limit, the largest component of the wave function is $\phi^{A}$ and the largest term in the energy is the rest mass energy $m c^{2}$. Therefore, the spinor wave function will be expressed as
$$ \psi = \begin{pmatrix} \phi^{A} \\ \phi^{B} \end{pmatrix} \exp\left[ - i \, \frac{m c^{2}}{\hbar} \, t \right] \tag{1398} $$
[97] W. Pauli, Z. Phys. 44, 601 (1927).
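The above form explicitly displays the rest-mass energy of the positive-energy solution of the Dirac equation. Hence, the Dirac equation takes the form
$$ \left( i \hbar \, \frac{\partial}{\partial t} - q \, A^{0} \right) \begin{pmatrix} \phi^{A} \\ \phi^{B} \end{pmatrix} = \begin{pmatrix} c \, \boldsymbol{\sigma} \cdot ( \hat{\mathbf{p}} - \frac{q}{c} \, \mathbf{A} ) \, \phi^{B} \\ c \, \boldsymbol{\sigma} \cdot ( \hat{\mathbf{p}} - \frac{q}{c} \, \mathbf{A} ) \, \phi^{A} \end{pmatrix} - 2 \, m c^{2} \begin{pmatrix} 0 \\ \phi^{B} \end{pmatrix} \tag{1399} $$
where the rest mass has been eliminated from the equation for the large component $\phi^{A}$ of the positive-energy solution. Since the kinetic energy and the potential energy are assumed to be smaller than the rest mass energy, the left-hand side of the equation for the small component
$$ \left( i \hbar \, \frac{\partial}{\partial t} - q \, A^{0} \right) \phi^{B} = c \, \boldsymbol{\sigma} \cdot \left( \hat{\mathbf{p}} - \frac{q}{c} \, \mathbf{A} \right) \phi^{A} - 2 \, m c^{2} \, \phi^{B} \tag{1400} $$
can be neglected in comparison with the term $2 \, m c^{2} \, \phi^{B}$, so the small component can be expressed as
$$ \phi^{B} \approx \frac{1}{2 \, m c} \; \boldsymbol{\sigma} \cdot \left( \hat{\mathbf{p}} - \frac{q}{c} \, \mathbf{A} \right) \phi^{A} \tag{1401} $$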
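Substituting the expression for the small component into the equation for the large component, hence eliminating $\phi^{B}$, one finds the equation
$$ \left( i \hbar \, \frac{\partial}{\partial t} - q \, A^{0} \right) \phi^{A} = \frac{1}{2 \, m} \left( \boldsymbol{\sigma} \cdot \left( \hat{\mathbf{p}} - \frac{q}{c} \, \mathbf{A} \right) \right)^{2} \phi^{A} \tag{1402} $$
which is the Pauli equation. The equation can be simplified by expanding the terms involving the Pauli spin matrices. The Pauli identity can be used to obtain
$$ \begin{aligned} \left( \boldsymbol{\sigma} \cdot \left( \hat{\mathbf{p}} - \frac{q}{c} \, \mathbf{A} \right) \right)^{2} &= I \left( \hat{\mathbf{p}} - \frac{q}{c} \, \mathbf{A} \right)^{2} + i \, \boldsymbol{\sigma} \cdot \left( \left( \hat{\mathbf{p}} - \frac{q}{c} \, \mathbf{A} \right) \wedge \left( \hat{\mathbf{p}} - \frac{q}{c} \, \mathbf{A} \right) \right) \\ &= I \left( \hat{\mathbf{p}} - \frac{q}{c} \, \mathbf{A} \right)^{2} - \frac{q \, \hbar}{c} \; \boldsymbol{\sigma} \cdot \left( \nabla \wedge \mathbf{A} \right) \end{aligned} \tag{1403} $$
where the last term originates from the non-commutativity of the components of $\hat{\mathbf{p}}$ and $\mathbf{A}$. Since the magnetic field $\mathbf{B}$ is given by
$$ \mathbf{B} = \nabla \wedge \mathbf{A} \tag{1404} $$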
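the Pauli equation can be expressed as
$$ i \hbar \, \frac{\partial}{\partial t} \, \phi^{A} = \frac{1}{2 \, m} \left( \hat{\mathbf{p}} - \frac{q}{c} \, \mathbf{A} \right)^{2} \phi^{A} + q \, A^{0} \, \phi^{A} - \frac{q \, \hbar}{2 \, m c} \; \boldsymbol{\sigma} \cdot \mathbf{B} \; \phi^{A} \tag{1405} $$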
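The Pauli identity used in eqn(1403) reads $( \boldsymbol{\sigma} \cdot \mathbf{a} ) ( \boldsymbol{\sigma} \cdot \mathbf{b} ) = I \, \mathbf{a} \cdot \mathbf{b} + i \, \boldsymbol{\sigma} \cdot ( \mathbf{a} \wedge \mathbf{b} )$. A minimal numerical sketch for ordinary commuting vectors is given below (the vectors and the helper `sigma_dot` are hypothetical). For commuting vectors with $\mathbf{a} = \mathbf{b}$ the cross term vanishes; in eqn(1403) it survives only because the components of $\hat{\mathbf{p}}$ and $\mathbf{A}$ do not commute.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def sigma_dot(v):
    """sigma . v for an ordinary (commuting) 3-vector v."""
    return v[0] * sx + v[1] * sy + v[2] * sz

a = np.array([0.3, -1.2, 0.5])
b = np.array([1.1, 0.4, -0.7])

lhs = sigma_dot(a) @ sigma_dot(b)
rhs = np.dot(a, b) * I2 + 1j * sigma_dot(np.cross(a, b))
print(np.allclose(lhs, rhs))

# With a = b the cross product vanishes and (sigma . a)^2 = |a|^2 I
print(np.allclose(sigma_dot(a) @ sigma_dot(a), np.dot(a, a) * I2))
```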
The Pauli equation [98] is the non-relativistic limit of the Dirac equation. It represents the Schrödinger equation for a charged particle with spin one-half. The two components of the spinor $\phi^{A}$ in the Pauli equation represent the internal spin of the electron. The last term represents the anomalous Zeeman interaction between the magnetic field and the electron's spin.
The other contribution to the Zeeman interaction originates with the electron's orbital angular momentum $\mathbf{L}$. The ordinary Zeeman interaction occurs between the constant magnetic field $\mathbf{B}$ and the orbital angular momentum and originates from the gauge-invariant term in the Hamiltonian
$$ \frac{1}{2 \, m} \left( \hat{\mathbf{p}} - \frac{q}{c} \, \mathbf{A} \right)^{2} = \frac{1}{2 \, m} \left( \hat{\mathbf{p}} - \frac{q}{2 \, c} \, \mathbf{B} \wedge \mathbf{r} \right)^{2} \tag{1406} $$
where the vector potential has been expressed in terms of the uniform magnetic field via
$$ \mathbf{A} = \frac{1}{2} \, \mathbf{B} \wedge \mathbf{r} \tag{1407} $$
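The expression for the energy term can be further simplified to
$$ \begin{aligned} \frac{1}{2 \, m} \left( \hat{\mathbf{p}} - \frac{q}{c} \, \mathbf{A} \right)^{2} &= \frac{\hat{\mathbf{p}}^{2}}{2 \, m} - \frac{q}{4 \, m c} \left( \hat{\mathbf{p}} \cdot ( \mathbf{B} \wedge \mathbf{r} ) + ( \mathbf{B} \wedge \mathbf{r} ) \cdot \hat{\mathbf{p}} \right) + \frac{q^{2}}{2 \, m c^{2}} \, \mathbf{A}^{2} \\ &= \frac{\hat{\mathbf{p}}^{2}}{2 \, m} - \frac{q}{2 \, m c} \left( ( \mathbf{B} \wedge \mathbf{r} ) \cdot \hat{\mathbf{p}} \right) + \frac{q^{2}}{2 \, m c^{2}} \, \mathbf{A}^{2} \\ &= \frac{\hat{\mathbf{p}}^{2}}{2 \, m} - \frac{q}{2 \, m c} \left( \mathbf{B} \cdot ( \mathbf{r} \wedge \hat{\mathbf{p}} ) \right) + \frac{q^{2}}{2 \, m c^{2}} \, \mathbf{A}^{2} \\ &= \frac{\hat{\mathbf{p}}^{2}}{2 \, m} - \frac{q}{2 \, m c} \left( \mathbf{B} \cdot \hat{\mathbf{L}} \right) + \frac{q^{2}}{2 \, m c^{2}} \, \mathbf{A}^{2} \end{aligned} \tag{1408} $$
In obtaining the second line, the $i$th component of $\hat{\mathbf{p}}$ has been commuted with the $i$th component of $( \mathbf{B} \wedge \mathbf{r} )$. In obtaining the third line, the (cyclic) vector identity
$$ ( \mathbf{A} \wedge \mathbf{B} ) \cdot \mathbf{C} = ( \mathbf{B} \wedge \mathbf{C} ) \cdot \mathbf{A} \tag{1409} $$
has been used. The first term in eqn(1408) represents the usual non-relativistic expression for the kinetic energy of the electrons, and the second term represents the ordinary Zeeman interaction which originates from the paramagnetic interaction. The last term represents the diamagnetic interaction.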
The total Zeeman interaction is the energy of the total magnetic moment $\mathbf{M}$ in the field $\mathbf{B}$
$$ \hat{H}_{Zeeman} = - \, \mathbf{M} \cdot \mathbf{B} \tag{1410} $$
The Dirac equation results in the Zeeman interaction of the form
$$ \hat{H}_{Zeeman} = - \frac{q}{2 \, m c} \; \mathbf{B} \cdot \left( \hat{\mathbf{L}} + \hbar \, \boldsymbol{\sigma} \right) = - \frac{q}{2 \, m c} \; \mathbf{B} \cdot \left( \hat{\mathbf{L}} + 2 \, \mathbf{S} \right) \tag{1411} $$
where the spin angular momentum $\mathbf{S}$ has been identified as
$$ \mathbf{S} = \frac{\hbar}{2} \, \boldsymbol{\sigma} \tag{1412} $$
It is seen that both the spin angular momentum and the orbital angular momentum of the charged particle interact with the magnetic field and, therefore, both contribute to the magnetic moment. However, it is noted that the magnetic moment can be written in the form
$$ \mathbf{M} = \frac{q}{2 \, m c} \left( \hat{\mathbf{L}} + g \, \mathbf{S} \right) \tag{1413} $$
where the magnitude of the magnetic moment is determined by the factor $\frac{q \hbar}{2 m c}$ which is the Bohr magneton. The Dirac equation shows that the spin angular momentum couples to the field with a different strength than the orbital angular momentum, and the relative coupling strength $g$ (the gyromagnetic ratio) is given by $g = 2$.
[98] W. Pauli, Z. Phys. 44, 601 (1927).
The existence of spin and the value of 2 for the gyromagnetic ratio were the first successes of Dirac's theory. Quantum Electrodynamics [99] yields a small correction to the gyromagnetic ratio of
$$ g = 2 \left( 1 + \frac{1}{2 \pi} \left( \frac{e^{2}}{\hbar c} \right) + \ldots \right) \tag{1414} $$
which has been experimentally verified to incredible precision [100]. Using the features associated with spin, Dirac's theory correctly described the fine structure of the Hydrogen atom. The second success of the Dirac equation followed Dirac's physical interpretation of the negative-energy states in terms of antiparticles [101]. The second round of success came with the discovery of the positron by Anderson [102].
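For orientation, the size of the leading correction in eqn(1414) can be evaluated numerically. The value used below for the fine-structure constant $e^{2}/\hbar c$ is an assumption of this sketch (it is not quoted in the text), and only the first-order term of the series is retained.

```python
import math

# Assumed approximate value of the fine-structure constant e^2/(hbar c)
alpha_fs = 1.0 / 137.035999

# Leading-order correction of eqn(1414); higher-order terms are omitted
g = 2.0 * (1.0 + alpha_fs / (2.0 * math.pi))
print(f"g = {g:.6f}")
print(f"(g - 2)/2 = {(g - 2.0) / 2.0:.6e}")
```

The correction enters at roughly the one-part-in-a-thousand level, which is why its experimental verification required high-precision measurements.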
Exercise:
The Dirac equation can be phenomenologically modified to describe particles with anomalous magnetic moments. The Dirac equation is modified to
$$ \left[ i \hbar \, \gamma^{\mu} \left( \partial_{\mu} + i \, \frac{q}{\hbar c} \, A_{\mu} \right) + \kappa \, \frac{q \, \hbar}{4 \, m c^{2}} \, \sigma^{\mu,\nu} \, F_{\mu,\nu} - m c \, \hat{I} \right] \psi = 0 \tag{1415} $$
Show that the modified equation is Lorentz covariant and that the Hamiltonian is Hermitean. Also derive the corrections to the magnetic moment due to the spin by examining the non-relativistic limit.
[99] J. S. Schwinger, Phys. Rev. 73, 416 (1948).
[100] H. M. Foley and P. Kusch, Phys. Rev. 73, 412 (1948); R. S. Van Dyck Jr., P. B. Schwinberg and H. G. Dehmelt, Phys. Rev. Lett. 59, 26 (1987).
[101] P. A. M. Dirac, Proc. Roy. Soc. A 126, 360 (1930).
[102] C. D. Anderson, Phys. Rev. 43, 491 (1933).
11.7 Conservation of Angular Momentum
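The law of conservation of angular momentum will now be examined. For a relativistic electron the orbital angular momentum and the spin angular momentum are not separately conserved. However, the total angular momentum, which is the sum of the orbital angular momentum and spin angular momentum, is conserved.
The orbital angular momentum $\hat{\mathbf{L}}$ defined by
$$ \hat{\mathbf{L}} = \mathbf{r} \wedge \hat{\mathbf{p}} \tag{1416} $$
is not conserved for a spherically symmetric potential. The Dirac Hamiltonian is given by
$$ \hat{H} = c \, \boldsymbol{\alpha} \cdot \hat{\mathbf{p}} + \beta \, m c^{2} + \hat{I} \, V(r) \tag{1417} $$
The matrices shall be expressed in $2 \times 2$ block form. Therefore, the identity matrix is written as
$$ \hat{I} = \begin{pmatrix} I & 0 \\ 0 & I \end{pmatrix} \tag{1418} $$
and
$$ \beta = \begin{pmatrix} I & 0 \\ 0 & -I \end{pmatrix} \tag{1419} $$
Finally, the $\alpha$ matrices are of off-diagonal form
$$ \boldsymbol{\alpha} = \begin{pmatrix} 0 & \boldsymbol{\sigma} \\ \boldsymbol{\sigma} & 0 \end{pmatrix} = \begin{pmatrix} 0 & I \\ I & 0 \end{pmatrix} \hat{\boldsymbol{\sigma}} \tag{1420} $$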
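where $\hat{\boldsymbol{\sigma}}$ is the $2 \times 2$ block-diagonal Pauli spin matrix. The rate of change of orbital angular momentum is given by the Heisenberg equation of motion
$$ i \hbar \, \frac{\partial}{\partial t} \, \hat{\mathbf{L}} = [ \, \hat{\mathbf{L}} \, , \, \hat{H} \, ] \tag{1421} $$
The orbital angular momentum operator commutes with the mass term and with the spherically symmetric potential $V(r)$. The orbital angular momentum does not commute with the momentum. Thus,
$$ i \hbar \, \frac{\partial}{\partial t} \, \hat{\mathbf{L}} = c \, [ \, \hat{\mathbf{L}} \, , \, \boldsymbol{\alpha} \cdot \hat{\mathbf{p}} \, ] \tag{1422} $$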
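Hence, the Heisenberg equation of motion can be expressed in the form
$$ \begin{aligned} i \hbar \, \frac{\partial}{\partial t} \, \hat{\mathbf{L}} &= c \begin{pmatrix} 0 & I \\ I & 0 \end{pmatrix} [ \, \hat{\mathbf{L}} \, , \, \hat{\boldsymbol{\sigma}} \cdot \hat{\mathbf{p}} \, ] \\ &= - \, c \begin{pmatrix} 0 & I \\ I & 0 \end{pmatrix} \hat{\boldsymbol{\sigma}} \cdot [ \, \hat{\mathbf{p}} \, , \, \hat{\mathbf{L}} \, ] \end{aligned} \tag{1423} $$
However, the components of the orbital angular momentum $\hat{L}^{(i)}$ and momenta $p^{(j)}$ satisfy the commutation relations
$$ [ \, \hat{L}^{(i)} \, , \, p^{(j)} \, ] = i \hbar \sum_k \xi^{i,j,k} \, p^{(k)} \tag{1424} $$
Therefore, one finds
$$ i \hbar \, \frac{\partial}{\partial t} \, \hat{\mathbf{L}} = i \hbar \, c \begin{pmatrix} 0 & I \\ I & 0 \end{pmatrix} ( \, \hat{\boldsymbol{\sigma}} \wedge \hat{\mathbf{p}} \, ) \tag{1425} $$
which shows that orbital angular momentum is not conserved for a relativistic electron with a central potential.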
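The spin angular momentum is also not conserved. This can be seen by examining the Heisenberg equation of motion for the Pauli spin operator
$$ i \hbar \, \frac{\partial}{\partial t} \, \hat{\boldsymbol{\sigma}} = [ \, \hat{\boldsymbol{\sigma}} \, , \, \hat{H} \, ] \tag{1426} $$
The spin operator commutes with $\hat{I}$ and $\beta$ but does not commute with the $\alpha$ matrices. Hence,
$$ \begin{aligned} i \hbar \, \frac{\partial}{\partial t} \, \hat{\boldsymbol{\sigma}} &= c \, [ \, \hat{\boldsymbol{\sigma}} \, , \, \boldsymbol{\alpha} \cdot \hat{\mathbf{p}} \, ] \\ &= c \, [ \, \hat{\boldsymbol{\sigma}} \, , \, \hat{\boldsymbol{\sigma}} \, ] \cdot \hat{\mathbf{p}} \begin{pmatrix} 0 & I \\ I & 0 \end{pmatrix} \end{aligned} \tag{1427} $$
The components of the Pauli spin operators satisfy the commutation relations
$$ [ \, \sigma^{(i)} \, , \, \sigma^{(j)} \, ] = 2 \, i \sum_k \xi^{i,j,k} \, \sigma^{(k)} \tag{1428} $$
which, clearly, have a similar form to the commutation relations for the orbital angular momentum. Hence, spin angular momentum is not conserved since
$$ i \hbar \, \frac{\partial}{\partial t} \, \hat{\boldsymbol{\sigma}} = - \, 2 \, i \, c \begin{pmatrix} 0 & I \\ I & 0 \end{pmatrix} ( \, \hat{\boldsymbol{\sigma}} \wedge \hat{\mathbf{p}} \, ) \tag{1429} $$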
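The total angular momentum $\hat{\mathbf{J}}$ is defined via
$$ \hat{\mathbf{J}} = \hat{\mathbf{L}} + \hat{\mathbf{S}} = \hat{\mathbf{L}} + \frac{\hbar}{2} \, \hat{\boldsymbol{\sigma}} \tag{1430} $$
The total angular momentum is conserved since
$$ \begin{aligned} i \hbar \, \frac{\partial}{\partial t} \, \hat{\mathbf{J}} &= i \hbar \, \frac{\partial}{\partial t} \, \hat{\mathbf{L}} + i \hbar \, \frac{\partial}{\partial t} \, \hat{\mathbf{S}} \\ &= i \hbar \, c \left( \begin{pmatrix} 0 & I \\ I & 0 \end{pmatrix} ( \, \hat{\boldsymbol{\sigma}} \wedge \hat{\mathbf{p}} \, ) - \begin{pmatrix} 0 & I \\ I & 0 \end{pmatrix} ( \, \hat{\boldsymbol{\sigma}} \wedge \hat{\mathbf{p}} \, ) \right) \\ &= 0 \end{aligned} \tag{1431} $$
which follows from combining eqn(1425) and eqn(1429). This confirms the interpretation of the quantity $\hat{\mathbf{S}}$ defined by
$$ \hat{\mathbf{S}} = \frac{\hbar}{2} \, \hat{\boldsymbol{\sigma}} \tag{1432} $$
as the spin angular momentum of the electron.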
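The matrix part of this argument, namely the commutator of eqn(1429), can be verified numerically when the momentum is replaced by an ordinary numerical vector (which is legitimate for this step, since $\hat{\boldsymbol{\sigma}}$ acts only on the spinor indices). The sketch below assumes the standard representation; the names `Sigma`, `alpha_cross` and the sample momentum are hypothetical.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
pauli = (sx, sy, sz)

Sigma = [np.kron(I2, s) for s in pauli]     # block-diagonal spin matrices
alpha = [np.kron(sx, s) for s in pauli]     # off-diagonal alpha matrices
beta = np.kron(sz, I2)

p = np.array([0.4, -0.9, 1.3])              # sample c-number momentum
alpha_dot_p = sum(p[j] * alpha[j] for j in range(3))

def alpha_cross(p):
    """Components of (alpha ^ p), i.e. rho (sigma ^ p) in the notation of the text."""
    return [alpha[1] * p[2] - alpha[2] * p[1],
            alpha[2] * p[0] - alpha[0] * p[2],
            alpha[0] * p[1] - alpha[1] * p[0]]

cross = alpha_cross(p)
spin_commutators_ok = all(
    np.allclose(Sigma[i] @ alpha_dot_p - alpha_dot_p @ Sigma[i], -2j * cross[i])
    for i in range(3)
)
commutes_with_beta = all(np.allclose(S @ beta, beta @ S) for S in Sigma)
print(spin_commutators_ok, commutes_with_beta)
```

Multiplying the spin commutator by $\hbar / 2$ gives $- i \hbar \, c \, ( \boldsymbol{\alpha} \wedge \hat{\mathbf{p}} )$, which cancels the orbital contribution of eqn(1425).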
11.8 Conservation of Parity
Dirac was very conscious that his book “Principles of Quantum Mechanics” never contained any mention of parity. It seems that he had questioned the requirement of parity invariance [103] since biological systems are not parity invariant. Dirac's viewpoint was vindicated by the discovery that the weak interaction violates parity.
The parity transform $T$ acting on the coordinates $(t, \mathbf{r})$ has the effect
$$ T \, (t, \mathbf{r}) \rightarrow (t', \mathbf{r}') = (t, -\mathbf{r}) \tag{1433} $$
which is an inversion of the spatial coordinates. Thus, parity reverses the space-like components of vectors, so the effects of the parity operation on the position and momentum vectors are given by
$$ \hat{T} \, \mathbf{r} \, \hat{T}^{-1} = - \, \mathbf{r} , \qquad \hat{T} \, \mathbf{p} \, \hat{T}^{-1} = - \, \mathbf{p} \tag{1434} $$
However, the effect of the parity transform on pseudo-vectors such as orbital angular momentum $\mathbf{L} = \mathbf{r} \wedge \mathbf{p}$ is such that
$$ \hat{T} \, \mathbf{L} \, \hat{T}^{-1} = \mathbf{L} \tag{1435} $$
[103] The question of parity conservation in weak interactions was raised subsequently by T. D. Lee and C. N. Yang [T. D. Lee and C. N. Yang, Phys. Rev. 104, 254 (1956).]
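which is unchanged. This implies that spin angular momentum should also be invariant under the parity transform
$$ \hat{T} \, \boldsymbol{\sigma} \, \hat{T}^{-1} = \boldsymbol{\sigma} \tag{1436} $$
If the Hamiltonian $\hat{H}$ is invariant under a parity transform, one requires that
$$ \hat{H} = \hat{T} \, \hat{H} \, \hat{T}^{-1} \tag{1437} $$
Imposing parity invariance of the Dirac Hamiltonian
$$ \hat{H} = c \, \boldsymbol{\alpha} \cdot \hat{\mathbf{p}} + \beta \, m c^{2} + \hat{I} \, V(\mathbf{r}) \tag{1438} $$
yields a condition on the potential
$$ V(\mathbf{r}) = V(-\mathbf{r}) \tag{1439} $$
and also leads to conditions on the Dirac matrices
$$ \hat{T} \, \boldsymbol{\alpha} \, \hat{T}^{-1} = - \, \boldsymbol{\alpha} , \qquad \hat{T} \, \beta \, \hat{T}^{-1} = \beta \tag{1440} $$
The condition on the potential is the familiar condition for parity invariance in classical mechanics. In the standard representation, in $2 \times 2$ block form, the requirements of parity invariance on the Dirac matrices become the matrix equations
$$ \hat{T} \begin{pmatrix} 0 & \boldsymbol{\sigma} \\ \boldsymbol{\sigma} & 0 \end{pmatrix} \hat{T}^{-1} = - \begin{pmatrix} 0 & \boldsymbol{\sigma} \\ \boldsymbol{\sigma} & 0 \end{pmatrix} , \qquad \hat{T} \begin{pmatrix} I & 0 \\ 0 & -I \end{pmatrix} \hat{T}^{-1} = \begin{pmatrix} I & 0 \\ 0 & -I \end{pmatrix} \tag{1441} $$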
The above equation shows that, in the standard representation, the parity operator can be uniquely factorized as
$$ \hat{T} = \begin{pmatrix} I & 0 \\ 0 & -I \end{pmatrix} \hat{P} \tag{1442} $$
where the operator $\hat{P}$ only acts on the coordinates $\mathbf{r}$. The presence of the matrix in the parity operation on the Dirac spinor should be compared with the effect of the parity operator on the four-vector potential of Electrodynamics $A^{\mu}(\mathbf{r})$ which is given by the product of spatial inversion and a matrix operation
$$ \hat{T} \, A^{\mu}(\mathbf{r}, t) = \gamma^{\mu}{}_{\nu} \, \hat{P} \, A^{\nu}(\mathbf{r}, t) = \gamma^{\mu}{}_{\nu} \, A^{\nu}(-\mathbf{r}, t) \tag{1443} $$
where the matrix $\gamma^{\mu}{}_{\nu}$ given by
$$ \gamma^{\mu}{}_{\nu} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix} \tag{1444} $$
reverses the direction of the spatial components of the vector field.
The effect of the parity operator on the Dirac four-component spinor wave function can be computed from
$$ \begin{aligned} \hat{T} \, \psi(t, \mathbf{r}) &= \hat{T} \begin{pmatrix} \phi^{A}(t, \mathbf{r}) \\ \phi^{B}(t, \mathbf{r}) \end{pmatrix} \\ &= \begin{pmatrix} I & 0 \\ 0 & -I \end{pmatrix} \begin{pmatrix} \phi^{A}(t, -\mathbf{r}) \\ \phi^{B}(t, -\mathbf{r}) \end{pmatrix} \\ &= \begin{pmatrix} \phi^{A}(t, -\mathbf{r}) \\ - \, \phi^{B}(t, -\mathbf{r}) \end{pmatrix} \end{aligned} \tag{1445} $$
Hence, in the standard representation, the parity operator changes the relative sign of the two two-component spinors. Due to the presence of the term $- I$ in the lower diagonal block of the parity matrix, the lower two-component spinor $\phi^{B}$ in the Dirac spinor is said to have a negative intrinsic parity.
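The matrix factor in eqn(1442) can be checked against the conditions of eqns(1436) and (1440). The following is a small sketch in the standard representation, in which the matrix part of the parity operator is taken to be $\beta$ itself; the variable names are hypothetical.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
pauli = (sx, sy, sz)

beta = np.kron(sz, I2)                        # diag(I, -I): matrix part of eqn(1442)
alpha = [np.kron(sx, s) for s in pauli]
Sigma = [np.kron(I2, s) for s in pauli]

T_mat = beta
T_inv = np.linalg.inv(T_mat)

alpha_flips = all(np.allclose(T_mat @ a @ T_inv, -a) for a in alpha)   # eqn(1440)
beta_fixed = np.allclose(T_mat @ beta @ T_inv, beta)                   # eqn(1440)
spin_fixed = all(np.allclose(T_mat @ S @ T_inv, S) for S in Sigma)     # eqn(1436)
squares_to_one = np.allclose(T_mat @ T_mat, np.eye(4))

# Acting on a spinor, the matrix flips the sign of the lower two components only
psi = np.array([1.0, 2.0, 3.0, 4.0], dtype=complex)
print(alpha_flips, beta_fixed, spin_fixed, squares_to_one, T_mat @ psi)
```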
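The parity eigenstates satisfy the eigenvalue equation
$$ \hat{T} \, \psi = \eta_p \, \psi \tag{1446} $$
with eigenvalues $\eta_p = \pm 1$, since $\hat{T}^{2} = \hat{I}$. The application of the parity operator on the Dirac spinor leads to the equation
$$ \begin{pmatrix} \hat{P} \, \phi^{A} \\ - \, \hat{P} \, \phi^{B} \end{pmatrix} = \begin{pmatrix} \eta_p \, \phi^{A} \\ \eta_p \, \phi^{B} \end{pmatrix} \tag{1447} $$
Hence, the two-component spinors $\phi^{A}(\mathbf{r})$ and $\phi^{B}(\mathbf{r})$ have opposite parities under spatial inversion
$$ \hat{P} \, \phi^{A}(\mathbf{r}) = \eta_p \, \phi^{A}(\mathbf{r}) , \qquad \hat{P} \, \phi^{B}(\mathbf{r}) = - \, \eta_p \, \phi^{B}(\mathbf{r}) \tag{1448} $$
In polar coordinates, the spatial part of the parity operation $\hat{P}$ is equivalent to a reflection
$$ \theta \rightarrow \pi - \theta \tag{1449} $$
followed by a rotation
$$ \varphi \rightarrow \varphi + \pi \tag{1450} $$
which has the effect that
$$ \sin\theta \rightarrow \sin\theta , \qquad \cos\theta \rightarrow - \cos\theta , \qquad \exp\left[ i \, m \, \varphi \right] \rightarrow ( - 1 )^{m} \exp\left[ i \, m \, \varphi \right] \tag{1451} $$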
Hence, the spherical harmonics with $m = l$
$$ Y^{l}_{l}(\theta, \varphi) = \frac{( - 1 )^{l}}{2^{l} \, l!} \sqrt{\frac{( 2 l + 1 )!}{4 \pi}} \; \sin^{l}\theta \; \exp\left[ i \, l \, \varphi \right] \tag{1452} $$
are eigenstates of the parity operator and have parity eigenvalues of $(-1)^{l}$. The lowering operator $\hat{L}^{-}$, defined via
$$ \hat{L}^{-} = - \, \hbar \, \exp\left[ - i \, \varphi \right] \left( \frac{\partial}{\partial \theta} - i \, \cot\theta \, \frac{\partial}{\partial \varphi} \right) \tag{1453} $$
is invariant under the parity transformation
$$ \hat{T} \, \hat{L}^{-} \, \hat{T}^{-1} = \hat{L}^{-} \tag{1454} $$
Therefore, on repeatedly operating on $Y^{l}_{l}(\theta, \varphi)$ with the lowering operator $\hat{L}^{-}$ $(l - m)$ times, one finds that under the parity transformation
$$ Y^{l}_{m}(\theta, \varphi) \rightarrow ( - 1 )^{l} \; Y^{l}_{m}(\theta, \varphi) \tag{1455} $$
which shows that all states with a definite magnitude of the orbital angular momentum $l$ are eigenstates of the parity operator and have the same eigenvalue.
Exercise:
Show that under a parity transformation the positive-energy solution for the free Dirac particle $\psi^{+}_{k,\sigma}(x)$ transforms as
$$ \hat{T} \, \psi^{+}_{k,\sigma}(x) = \psi^{+}_{-k,\sigma}(x) \tag{1456} $$
while the negative-energy solutions $\psi^{-}_{k,\sigma}(x)$ transform as
$$ \hat{T} \, \psi^{-}_{k,\sigma}(x) = - \, \psi^{-}_{-k,\sigma}(x) \tag{1457} $$
Hence, the parity operation reverses the momentum and keeps the spin invariant for both the positive-energy and negative-energy solutions. The extra negative sign implies that the negative-energy solution has opposite intrinsic parity to the positive-energy solution.
Exercise:
Consider the parity transform as an example of an improper Lorentz transformation $\Lambda$, for which $\det | \Lambda | = - 1$. If the Lorentz transform is given by
$$ x'^{\mu} = \Lambda^{\mu}{}_{\nu} \, x^{\nu} \tag{1458} $$
the spinor wave function transforms via
$$ \psi'(x') = \hat{\mathcal{R}}(\Lambda) \, \psi(x) \tag{1459} $$
where $\hat{\mathcal{R}}(\Lambda)$ “rotates” the spinor. The covariant condition for the Dirac equation is
$$ \hat{\mathcal{R}}^{-1}(\Lambda) \, \gamma^{\mu} \, \hat{\mathcal{R}}(\Lambda) = \Lambda^{\mu}{}_{\nu} \, \gamma^{\nu} \tag{1460} $$
For a parity transformation, one has
$$ x'^{\mu} = x_{\mu} \tag{1461} $$
since the spatial components of $x^{\mu}$ change sign. Hence, for a parity transformation, the transformation matrix is determined as
$$ \Lambda^{\mu}{}_{\nu} = g^{\mu,\nu} \tag{1462} $$
which is an improper Lorentz transformation since
$$ \det | g | = - 1 \tag{1463} $$
Therefore, the covariant condition reduces to
$$ \hat{\mathcal{R}}^{-1}(\Lambda) \, \gamma^{\mu} \, \hat{\mathcal{R}}(\Lambda) = g^{\mu,\nu} \, \gamma^{\nu} \tag{1464} $$
Solve for the matrix $\hat{\mathcal{R}}(\Lambda)$ which shuffles the components of the Dirac spinor.
11.9 Bilinear Covariants
Under a Lorentz transformation
$$x'^{\mu} = \Lambda^{\mu}{}_{\nu}\, x^{\nu} \tag{1465}$$
(where $\Lambda^{0}{}_{0} > 0$ for an orthochronous transformation), the Dirac spinor $\psi$ transforms according to
$$\psi'(x') = \hat{\mathcal{R}}(\Lambda)\, \psi(x) \tag{1466}$$
and the condition that the Dirac equation is covariant under the orthochronous Lorentz transformation is
$$\hat{\mathcal{R}}^{-1}(\Lambda)\, \gamma^{\mu}\, \hat{\mathcal{R}}(\Lambda) = \Lambda^{\mu}{}_{\nu}\, \gamma^{\nu} \tag{1467}$$
From the transformation properties of the Dirac spinors, together with the identity
$$\gamma^{(0)}\, \hat{\mathcal{R}}^{\dagger}(\Lambda)\, \gamma^{(0)} = \hat{\mathcal{R}}^{-1}(\Lambda) \tag{1468}$$
one can find the transformation properties of quantities that are bilinear in the Dirac spinors.
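Thus, for example, the bilinear quantity $\overline{\psi}^{\dagger} \psi$ transforms according to
$$\begin{aligned}
\overline{\psi}'^{\dagger}\, \psi' &= \psi'^{\dagger}\, \gamma^{(0)}\, \psi' \\
&= \psi^{\dagger}\, \hat{\mathcal{R}}^{\dagger}(\Lambda)\, \gamma^{(0)}\, \hat{\mathcal{R}}(\Lambda)\, \psi \\
&= \psi^{\dagger}\, ( \gamma^{(0)} )^2\, \hat{\mathcal{R}}^{\dagger}(\Lambda)\, \gamma^{(0)}\, \hat{\mathcal{R}}(\Lambda)\, \psi \\
&= \psi^{\dagger}\, \gamma^{(0)}\, \hat{\mathcal{R}}^{-1}(\Lambda)\, \hat{\mathcal{R}}(\Lambda)\, \psi \\
&= \overline{\psi}^{\dagger}\, \psi
\end{aligned} \tag{1469}$$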
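Table 7: The sixteen bilinear covariants $\overline{\psi}^{\dagger} \hat{Q} \psi$ for the Dirac equation.

| Quantity | Bilinear covariant $\overline{\psi}^{\dagger} \hat{Q} \psi$ | Transformed matrix $\hat{\mathcal{R}}^{-1}(\Lambda)\, \hat{Q}\, \hat{\mathcal{R}}(\Lambda)$ | Number of matrices |
|---|---|---|---|
| Scalar | $\overline{\psi}^{\dagger}\, \hat{I}\, \psi$ | $\hat{I}$ | 1 |
| Vector | $\overline{\psi}^{\dagger}\, \gamma^{\mu}\, \psi$ | $\Lambda^{\mu}{}_{\nu}\, \gamma^{\nu}$ | 4 |
| Antisymmetric Tensor | $\overline{\psi}^{\dagger}\, \sigma^{\mu,\nu}\, \psi$ | $\Lambda^{\mu}{}_{\rho}\, \Lambda^{\nu}{}_{\tau}\, \sigma^{\rho,\tau}$ | 6 |
| Pseudoscalar | $\overline{\psi}^{\dagger}\, \gamma^{(4)}\, \psi$ | $\det\lvert\Lambda\rvert\, \gamma^{(4)}$ | 1 |
| Axial-Vector | $\overline{\psi}^{\dagger}\, \gamma^{(4)} \gamma^{\mu}\, \psi$ | $\det\lvert\Lambda\rvert\, \Lambda^{\mu}{}_{\nu}\, \gamma^{(4)} \gamma^{\nu}$ | 4 |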
where a factor of $( \gamma^{(0)} )^2 = \hat{I}$ has been used in the third line and the identity (1468) has been used in the fourth. Thus, one finds that $\overline{\psi}^{\dagger} \psi$ transforms like a scalar.
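Likewise, one can show that the bilinear quantities $\overline{\psi}^{\dagger} \gamma^{\mu} \psi$ transform like the components of a four-vector. That is
$$\begin{aligned}
\overline{\psi}'^{\dagger}\, \gamma^{\mu}\, \psi' &= \psi'^{\dagger}\, \gamma^{(0)}\, \gamma^{\mu}\, \psi' \\
&= \psi^{\dagger}\, \hat{\mathcal{R}}^{\dagger}(\Lambda)\, \gamma^{(0)}\, \gamma^{\mu}\, \hat{\mathcal{R}}(\Lambda)\, \psi \\
&= \psi^{\dagger}\, ( \gamma^{(0)} )^2\, \hat{\mathcal{R}}^{\dagger}(\Lambda)\, \gamma^{(0)}\, \gamma^{\mu}\, \hat{\mathcal{R}}(\Lambda)\, \psi \\
&= \psi^{\dagger}\, \gamma^{(0)}\, \hat{\mathcal{R}}^{-1}(\Lambda)\, \gamma^{\mu}\, \hat{\mathcal{R}}(\Lambda)\, \psi \\
&= \overline{\psi}^{\dagger}\, \hat{\mathcal{R}}^{-1}(\Lambda)\, \gamma^{\mu}\, \hat{\mathcal{R}}(\Lambda)\, \psi \\
&= \Lambda^{\mu}{}_{\nu}\, \overline{\psi}^{\dagger}\, \gamma^{\nu}\, \psi
\end{aligned} \tag{1470}$$
where the covariant condition has been used in obtaining the last line. Since this relation holds for Lorentz boosts, rotations and spatial inversions, $\overline{\psi}^{\dagger} \gamma^{\mu} \psi$ is a four-vector.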
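The antisymmetric quantity $\sigma^{\mu,\nu}$ defined as
$$\sigma^{\mu,\nu} = \frac{i}{2}\, [\, \gamma^{\mu} , \gamma^{\nu} \,] \tag{1471}$$
can be used to form a bilinear quantity $\overline{\psi}^{\dagger} \sigma^{\mu,\nu} \psi$. This bilinear quantity transforms like a second-rank antisymmetric tensor, since
$$\begin{aligned}
\overline{\psi}'^{\dagger}\, \sigma^{\mu,\nu}\, \psi' &= \psi'^{\dagger}\, \gamma^{(0)}\, \sigma^{\mu,\nu}\, \psi' \\
&= \psi^{\dagger}\, \hat{\mathcal{R}}^{\dagger}(\Lambda)\, \gamma^{(0)}\, \sigma^{\mu,\nu}\, \hat{\mathcal{R}}(\Lambda)\, \psi \\
&= \psi^{\dagger}\, ( \gamma^{(0)} )^2\, \hat{\mathcal{R}}^{\dagger}(\Lambda)\, \gamma^{(0)}\, \sigma^{\mu,\nu}\, \hat{\mathcal{R}}(\Lambda)\, \psi \\
&= \psi^{\dagger}\, \gamma^{(0)}\, \hat{\mathcal{R}}^{-1}(\Lambda)\, \sigma^{\mu,\nu}\, \hat{\mathcal{R}}(\Lambda)\, \psi \\
&= \overline{\psi}^{\dagger}\, \hat{\mathcal{R}}^{-1}(\Lambda)\, \sigma^{\mu,\nu}\, \hat{\mathcal{R}}(\Lambda)\, \psi
\end{aligned} \tag{1472}$$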
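For $\mu \neq \nu$, the antisymmetric quantity $\sigma^{\mu,\nu}$ can be written as
$$\sigma^{\mu,\nu} = i\, \gamma^{\mu}\, \gamma^{\nu} \tag{1473}$$
Therefore, one may re-express the bilinear quantity as
$$\begin{aligned}
\overline{\psi}'^{\dagger}\, \sigma^{\mu,\nu}\, \psi' &= i\, \overline{\psi}^{\dagger}\, \hat{\mathcal{R}}^{-1}(\Lambda)\, \gamma^{\mu}\, \gamma^{\nu}\, \hat{\mathcal{R}}(\Lambda)\, \psi \\
&= i\, \overline{\psi}^{\dagger}\, \hat{\mathcal{R}}^{-1}(\Lambda)\, \gamma^{\mu}\, \hat{\mathcal{R}}(\Lambda)\, \hat{\mathcal{R}}^{-1}(\Lambda)\, \gamma^{\nu}\, \hat{\mathcal{R}}(\Lambda)\, \psi \\
&= i\, \overline{\psi}^{\dagger}\, \Lambda^{\mu}{}_{\rho}\, \gamma^{\rho}\, \Lambda^{\nu}{}_{\tau}\, \gamma^{\tau}\, \psi \\
&= \Lambda^{\mu}{}_{\rho}\, \Lambda^{\nu}{}_{\tau}\, \overline{\psi}^{\dagger}\, \sigma^{\rho,\tau}\, \psi
\end{aligned} \tag{1474}$$
where we have inserted a factor of $\hat{I} = \hat{\mathcal{R}}(\Lambda)\, \hat{\mathcal{R}}^{-1}(\Lambda)$ in the second line, and used the covariant condition (twice) in the third line. Hence, the bilinear quantity $\overline{\psi}^{\dagger} \sigma^{\mu,\nu} \psi$ transforms like an antisymmetric second-rank tensor.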
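One can define a quantity $\gamma^{(4)}$ in terms of a product of all the $\gamma$-matrices
$$\gamma^{(4)} = i\, \gamma^{(0)}\, \gamma^{(1)}\, \gamma^{(2)}\, \gamma^{(3)} \tag{1475}$$
It is easily verified that $\gamma^{(4)}$ anticommutes with all the $\gamma^{\mu}$,
$$\{\, \gamma^{\mu} , \gamma^{(4)} \,\}_{+} = 0 \tag{1476}$$
Furthermore, one has
$$( \gamma^{(4)} )^2 = \hat{I} \tag{1477}$$
In the standard representation of the Dirac matrices, the matrix $\gamma^{(4)}$ has the two-by-two block off-diagonal form
$$\gamma^{(4)} = \begin{pmatrix} 0 & I \\ I & 0 \end{pmatrix} \tag{1478}$$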
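The three stated properties of $\gamma^{(4)}$ are easy to confirm by direct matrix multiplication. The sketch below assumes the standard (Dirac) representation, $\gamma^{(0)} = \mathrm{diag}(I, -I)$ and $\gamma^{(i)}$ with off-diagonal blocks $\pm\sigma^{(i)}$; the helper names are hypothetical and only plain Python lists are used.

```python
def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def scale(c, a):
    """Multiply every entry of matrix a by the number c."""
    return [[c * x for x in row] for row in a]


def add(a, b):
    """Entry-wise sum of two matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def block(a, b, c, d):
    """Assemble a 4x4 matrix from four 2x2 blocks [[a, b], [c, d]]."""
    return [ra + rb for ra, rb in zip(a, b)] + [rc + rd for rc, rd in zip(c, d)]


def max_abs_diff(a, b):
    """Largest entry-wise deviation between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


I2 = [[1, 0], [0, 1]]
Z2 = [[0, 0], [0, 0]]
PAULI = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]

# Assumed standard representation: gamma^0 = diag(I, -I), gamma^i = [[0, s_i], [-s_i, 0]]
gamma = [block(I2, Z2, Z2, scale(-1, I2))]
gamma += [block(Z2, s, scale(-1, s), Z2) for s in PAULI]

# gamma^(4) = i gamma^0 gamma^1 gamma^2 gamma^3, eqn (1475)
gamma4 = scale(1j, matmul(matmul(gamma[0], gamma[1]), matmul(gamma[2], gamma[3])))

I4 = block(I2, Z2, Z2, I2)
Z4 = block(Z2, Z2, Z2, Z2)
anticommutators = [add(matmul(g, gamma4), matmul(gamma4, g)) for g in gamma]

print("off-diagonal form, eqn (1478):", max_abs_diff(gamma4, block(Z2, I2, I2, Z2)) < 1e-12)
print("anticommutes, eqn (1476):", all(max_abs_diff(ac, Z4) < 1e-12 for ac in anticommutators))
print("squares to unity, eqn (1477):", max_abs_diff(matmul(gamma4, gamma4), I4) < 1e-12)
```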
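The quantity $\gamma^{(4)}$ can be used to construct a bilinear covariant quantity $\overline{\psi}^{\dagger} \gamma^{(4)} \psi$. Under an orthochronous Lorentz transformation, the bilinear quantity transforms according to
$$\begin{aligned}
\overline{\psi}'^{\dagger}\, \gamma^{(4)}\, \psi' &= \psi'^{\dagger}\, \gamma^{(0)}\, \gamma^{(4)}\, \psi' \\
&= \psi^{\dagger}\, \hat{\mathcal{R}}^{\dagger}(\Lambda)\, \gamma^{(0)}\, \gamma^{(4)}\, \hat{\mathcal{R}}(\Lambda)\, \psi \\
&= \psi^{\dagger}\, ( \gamma^{(0)} )^2\, \hat{\mathcal{R}}^{\dagger}(\Lambda)\, \gamma^{(0)}\, \gamma^{(4)}\, \hat{\mathcal{R}}(\Lambda)\, \psi \\
&= \overline{\psi}^{\dagger}\, \gamma^{(0)}\, \hat{\mathcal{R}}^{\dagger}(\Lambda)\, \gamma^{(0)}\, \gamma^{(4)}\, \hat{\mathcal{R}}(\Lambda)\, \psi \\
&= \overline{\psi}^{\dagger}\, \hat{\mathcal{R}}^{-1}(\Lambda)\, \gamma^{(4)}\, \hat{\mathcal{R}}(\Lambda)\, \psi
\end{aligned} \tag{1479}$$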
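Proper Lorentz transformations, such as boosts and rotations, are generated by the quantities $\sigma^{\mu,\nu}$, which involve the antisymmetrized product of the two Dirac matrices $\gamma^{\mu}$ and $\gamma^{\nu}$. Since the matrices $\gamma^{\mu}$ and $\gamma^{\nu}$ individually anticommute with $\gamma^{(4)}$, their product commutes with $\gamma^{(4)}$. Hence, one can commute $\gamma^{(4)}$ and $\hat{\mathcal{R}}(\Lambda)$. Therefore, the bilinear quantity transforms as
$$\overline{\psi}'^{\dagger}(x')\, \gamma^{(4)}\, \psi'(x') = \overline{\psi}^{\dagger}(x)\, \gamma^{(4)}\, \psi(x) \tag{1480}$$
which behaves like a scalar under a proper orthochronous Lorentz transformation for which $\det|\Lambda| = 1$. However, for an inversion where $\det|\Lambda| = -1$, one has $\hat{\mathcal{R}}(\mathcal{P}) = \gamma^{(0)}$, which anticommutes with $\gamma^{(4)}$. Hence, for an inversion, one has
$$\overline{\psi}'^{\dagger}(x')\, \gamma^{(4)}\, \psi'(x') = -\, \overline{\psi}^{\dagger}(x)\, \gamma^{(4)}\, \psi(x) \tag{1481}$$
so the quantity changes sign. In general, for an orthochronous transformation one can show that
$$\hat{\mathcal{R}}^{-1}(\Lambda)\, \gamma^{(4)}\, \hat{\mathcal{R}}(\Lambda) = \det|\Lambda|\; \gamma^{(4)} \tag{1482}$$
so one has
$$\overline{\psi}'^{\dagger}(x')\, \gamma^{(4)}\, \psi'(x') = \det|\Lambda|\; \overline{\psi}^{\dagger}(x)\, \gamma^{(4)}\, \psi(x) \tag{1483}$$
Therefore, the quantity $\overline{\psi}^{\dagger} \gamma^{(4)} \psi$ transforms as a pseudoscalar.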
One can also define the bilinear axial-vector $\overline{\psi}^{\dagger} \gamma^{(4)} \gamma^{\mu} \psi$. From considerations similar to those used previously, one can show that these quantities transform according to
$$\overline{\psi}'^{\dagger}(x')\, \gamma^{(4)} \gamma^{\mu}\, \psi'(x') = \det|\Lambda|\; \Lambda^{\mu}{}_{\nu}\; \overline{\psi}^{\dagger}(x)\, \gamma^{(4)} \gamma^{\nu}\, \psi(x) \tag{1484}$$
Hence, $\overline{\psi}^{\dagger} \gamma^{(4)} \gamma^{\mu} \psi$ transforms like a four-vector under proper orthochronous Lorentz transformations. However, under an inversion the spatial components do not change sign, whereas the time component does change sign. Therefore, $\overline{\psi}^{\dagger} \gamma^{(4)} \gamma^{\mu} \psi$ transforms like an axial-vector.
Exercise:
Show that a modified Dirac equation described by
$$\left[\; i \hbar\, \gamma^{\mu} \left( \partial_{\mu} + i\, \frac{q}{\hbar c}\, A_{\mu} \right) - i\, \frac{\kappa\, q\, \hbar}{4\, m\, c^2}\; \sigma^{\mu,\nu}\, \gamma^{(4)}\, F_{\mu,\nu} - m\, c\, \hat{I} \;\right] \psi = 0 \tag{1485}$$
is covariant under proper Lorentz transformations, but is not covariant under improper transformations.
Show, by considering the non-relativistic limit, that the above equation describes an electron with an electric dipole moment. Determine an expression for the electric dipole moment.
11.10 The Spherically Symmetric Dirac Equation
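The Dirac Hamiltonian for a spherically symmetric (electrostatic) potential is given by
$$\hat{H} = c\, \boldsymbol{\alpha} \cdot \hat{\mathbf{p}} + \beta\, m c^2 + \hat{I}\, V(r) \tag{1486}$$
The angular momentum operator $\hat{\mathbf{J}}$ and the parity operator $\hat{\mathcal{P}}$ commute with the Hamiltonian $\hat{H}$. Therefore, one can find simultaneous eigenstates of the operators $\hat{H}$, $\hat{J}^2$, $\hat{J}_z$ and $\hat{\mathcal{P}}$. The energy eigenstates satisfy the equation
$$\left[\; c\, \boldsymbol{\alpha} \cdot \hat{\mathbf{p}} + \beta\, m c^2 + \hat{I}\, V(r) \;\right] \psi = E\, \psi \tag{1487}$$
On writing the four-component spinor in terms of the two two-component spinors $\phi^A$ and $\phi^B$, the energy eigenvalue equation reduces to the set of coupled equations
$$\begin{aligned}
( E - V(r) - m c^2 )\, \phi^A(\mathbf{r}) &= c\, ( \boldsymbol{\sigma} \cdot \hat{\mathbf{p}} )\, \phi^B(\mathbf{r}) \\
( E - V(r) + m c^2 )\, \phi^B(\mathbf{r}) &= c\, ( \boldsymbol{\sigma} \cdot \hat{\mathbf{p}} )\, \phi^A(\mathbf{r})
\end{aligned} \tag{1488}$$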
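In spherical polar coordinates, the operator $( \boldsymbol{\sigma} \cdot \hat{\mathbf{p}} )$ can be expressed as
$$\begin{aligned}
( \boldsymbol{\sigma} \cdot \hat{\mathbf{p}} ) = &- i \hbar \begin{pmatrix} \cos\theta & \sin\theta\, \exp[-i\varphi] \\ \sin\theta\, \exp[+i\varphi] & -\cos\theta \end{pmatrix} \frac{\partial}{\partial r} \\
&- \frac{i \hbar}{r} \begin{pmatrix} -\sin\theta & \cos\theta\, \exp[-i\varphi] \\ \cos\theta\, \exp[+i\varphi] & \sin\theta \end{pmatrix} \frac{\partial}{\partial \theta} \\
&- \frac{i \hbar}{r \sin\theta} \begin{pmatrix} 0 & -i \exp[-i\varphi] \\ i \exp[+i\varphi] & 0 \end{pmatrix} \frac{\partial}{\partial \varphi}
\end{aligned} \tag{1489}$$
which has a quite complicated structure. For future reference, it shall be noted that the matrix part of the coefficient of the partial derivative w.r.t. $r$ is simply equal to
$$\left( \frac{\mathbf{r} \cdot \boldsymbol{\sigma}}{r} \right) \tag{1490}$$
which is independent of the radial coordinate $r$. The operator $( \boldsymbol{\sigma} \cdot \hat{\mathbf{p}} )$ can be cast in a more convenient form through the repeated use of the Pauli identity.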
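First, the $2 \times 2$ unit matrix can be written as
$$I = \left( \frac{\mathbf{r} \cdot \boldsymbol{\sigma}}{r} \right)^2 \tag{1491}$$
since different Pauli spin matrices anticommute
$$\{\, \sigma^{(i)} , \sigma^{(j)} \,\}_{+} = 2\, \delta_{i,j}\, \hat{I} \tag{1492}$$
and are their own inverses. Therefore, one can express the operator $( \boldsymbol{\sigma} \cdot \hat{\mathbf{p}} )$ as
$$\begin{aligned}
( \boldsymbol{\sigma} \cdot \hat{\mathbf{p}} ) &= \left( \frac{\mathbf{r} \cdot \boldsymbol{\sigma}}{r} \right)^2 ( \boldsymbol{\sigma} \cdot \hat{\mathbf{p}} ) \\
&= \left( \frac{\mathbf{r} \cdot \boldsymbol{\sigma}}{r^2} \right) ( \mathbf{r} \cdot \boldsymbol{\sigma} )\, ( \boldsymbol{\sigma} \cdot \hat{\mathbf{p}} ) \\
&= \left( \frac{\mathbf{r} \cdot \boldsymbol{\sigma}}{r^2} \right) \left( \mathbf{r} \cdot \hat{\mathbf{p}} + i\, \boldsymbol{\sigma} \cdot ( \mathbf{r} \wedge \hat{\mathbf{p}} ) \right) \\
&= \left( \frac{\mathbf{r} \cdot \boldsymbol{\sigma}}{r^2} \right) \left( - i \hbar\, r\, \frac{\partial}{\partial r} + i\, \boldsymbol{\sigma} \cdot \hat{\mathbf{L}} \right) \\
&= \left( \frac{\mathbf{r} \cdot \boldsymbol{\sigma}}{r^2} \right) \left( - i \hbar\, r\, \frac{\partial}{\partial r} + \frac{2 i}{\hbar}\, \mathbf{S} \cdot \hat{\mathbf{L}} \right)
\end{aligned} \tag{1493}$$
where the Pauli identity has been used in going between the second and third lines. Therefore, the two-component spinors satisfy the set of coupled equations
$$\begin{aligned}
( E - V(r) - m c^2 )\, \phi^A(\mathbf{r}) &= c \left( \frac{\mathbf{r} \cdot \boldsymbol{\sigma}}{r^2} \right) \left( - i \hbar\, r\, \frac{\partial}{\partial r} + \frac{2 i}{\hbar}\, \mathbf{S} \cdot \hat{\mathbf{L}} \right) \phi^B(\mathbf{r}) \\
( E - V(r) + m c^2 )\, \phi^B(\mathbf{r}) &= c \left( \frac{\mathbf{r} \cdot \boldsymbol{\sigma}}{r^2} \right) \left( - i \hbar\, r\, \frac{\partial}{\partial r} + \frac{2 i}{\hbar}\, \mathbf{S} \cdot \hat{\mathbf{L}} \right) \phi^A(\mathbf{r})
\end{aligned} \tag{1494}$$
It is seen that, due to the effect of special relativity, the Dirac equation results in the coupling of the spin and the orbital angular momentum.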
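The Pauli identity used above, $( \boldsymbol{\sigma} \cdot \mathbf{a} )( \boldsymbol{\sigma} \cdot \mathbf{b} ) = \mathbf{a} \cdot \mathbf{b} + i\, \boldsymbol{\sigma} \cdot ( \mathbf{a} \wedge \mathbf{b} )$, can be spot-checked numerically. The sketch below is an illustration only: it uses ordinary commuting number vectors, whereas in the text $\mathbf{r}$ and $\hat{\mathbf{p}}$ are operators whose ordering must be preserved. The helper names are hypothetical.

```python
def sigma_dot(v):
    """2x2 matrix sigma . v for a 3-vector v of ordinary numbers."""
    x, y, z = v
    return [[z, x - 1j * y], [x + 1j * y, -z]]


def matmul2(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def pauli_identity_defect(a, b):
    """Largest entry of (sigma.a)(sigma.b) - [a.b I + i sigma.(a x b)]."""
    lhs = matmul2(sigma_dot(a), sigma_dot(b))
    sc = sigma_dot(cross(a, b))
    d = dot(a, b)
    rhs = [[d * (i == j) + 1j * sc[i][j] for j in range(2)] for i in range(2)]
    return max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))


a = (0.3, -1.2, 0.5)
b = (2.0, 0.1, -0.7)
print("Pauli identity defect:", pauli_identity_defect(a, b))

# Special case a = b = unit vector: (sigma . n)^2 = I, as in eqn (1491)
n = (2 / 3, -1 / 3, 2 / 3)
square = matmul2(sigma_dot(n), sigma_dot(n))
print("(sigma.n)^2 =", square)
```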
Two-Component Spinor Spherical Harmonics
The angular dependence of the two-component wave functions $\phi^A(\mathbf{r})$ and $\phi^B(\mathbf{r})$ is determined by the eigenvalue equations for the magnitude and the z-component of the total angular momentum
$$\mathbf{J} = \mathbf{L} + \mathbf{S} \tag{1495}$$
Thus, the two-component spinor eigenstates of total angular momentum $\Omega^{l}_{j,j_z}(\theta,\varphi)$, which describe the angular dependence, are formed by combining states of orbital angular momentum $l$, represented by $Y^l_m(\theta,\varphi)$, and the spin eigenfunctions $\chi_{\pm}$. On combining states with orbital angular momentum $l$ and spin $s = \frac{1}{2}$, one finds states with total angular momentum which satisfy
$$l + \tfrac{1}{2} \ge j \ge l - \tfrac{1}{2} \tag{1496}$$

Table 8: The Clebsch-Gordan Coefficients for adding orbital angular momentum $(l, m)$ with spin quantum numbers $(\frac{1}{2}, s_z)$ to yield a state with total angular momentum quantum numbers $(j, j_z)$. The allowed values of $m$ are given by $j_z = m + s_z$.

| | $s_z = +\frac{1}{2}$ | $s_z = -\frac{1}{2}$ |
|---|---|---|
| $j = l + \frac{1}{2}$ | $\sqrt{\dfrac{l + j_z + \frac{1}{2}}{2 l + 1}}$ | $\sqrt{\dfrac{l - j_z + \frac{1}{2}}{2 l + 1}}$ |
| $j = l - \frac{1}{2}$ | $-\sqrt{\dfrac{l - j_z + \frac{1}{2}}{2 l + 1}}$ | $\sqrt{\dfrac{l + j_z + \frac{1}{2}}{2 l + 1}}$ |
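Thus, it is found that the possible eigenstates correspond to $j = l + \frac{1}{2}$ and $j = l - \frac{1}{2}$. Furthermore, the corresponding eigenfunctions are expressed as
$$\begin{aligned}
\Omega^{l}_{l+\frac{1}{2},j_z}(\theta,\varphi) &= \sqrt{\frac{l + \frac{1}{2} + j_z}{2l + 1}}\; Y^{l}_{j_z - \frac{1}{2}}(\theta,\varphi)\, \chi_{+} + \sqrt{\frac{l + \frac{1}{2} - j_z}{2l + 1}}\; Y^{l}_{j_z + \frac{1}{2}}(\theta,\varphi)\, \chi_{-} \\
\Omega^{l}_{l-\frac{1}{2},j_z}(\theta,\varphi) &= -\sqrt{\frac{l + \frac{1}{2} - j_z}{2l + 1}}\; Y^{l}_{j_z - \frac{1}{2}}(\theta,\varphi)\, \chi_{+} + \sqrt{\frac{l + \frac{1}{2} + j_z}{2l + 1}}\; Y^{l}_{j_z + \frac{1}{2}}(\theta,\varphi)\, \chi_{-}
\end{aligned} \tag{1497}$$
where the coefficients are identified with the Clebsch-Gordan coefficients given in Table(8). The functions $\Omega^{l}_{j,j_z}(\theta,\varphi)$ are the analogue of the spherical harmonics $Y^l_m(\theta,\varphi)$ in relativistic problems where spin and orbital angular momentum are coupled.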
However, since orbital angular momentum is not a good quantum number, the angular dependence of the eigenstates of the Dirac Hamiltonian can be expressed as a linear superposition of states with different values of the orbital angular momentum $l$. For a fixed value of $j$, one finds that the possible values of the orbital angular momentum are determined by
$$j = l + \tfrac{1}{2}, \qquad j = l' - \tfrac{1}{2} \tag{1498}$$
where $l' = l + 1$. The appropriate two-component spinor angular momentum eigenstate with quantum numbers $(j, j_z)$ found by combining a spin one-half and orbital angular momentum $l' = (l + 1)$ is given by
$$\Omega^{l+1}_{l+\frac{1}{2},j_z}(\theta,\varphi) = -\sqrt{\frac{l + \frac{3}{2} - j_z}{2l + 3}}\; Y^{l+1}_{j_z - \frac{1}{2}}(\theta,\varphi)\, \chi_{+} + \sqrt{\frac{l + \frac{3}{2} + j_z}{2l + 3}}\; Y^{l+1}_{j_z + \frac{1}{2}}(\theta,\varphi)\, \chi_{-} \tag{1499}$$
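As shall be seen later, the two-component spinors $\Omega^{l'}_{l'-\frac{1}{2},j_z}(\theta,\varphi)$ and $\Omega^{l}_{l+\frac{1}{2},j_z}(\theta,\varphi)$ have opposite parities. In fact, the two-component spinors generated by angular momentum $l$ and $l' = (l + 1)$ are related by the action of the pseudoscalar
$$\left( \frac{\mathbf{r} \cdot \boldsymbol{\sigma}}{r} \right) = \begin{pmatrix} \cos\theta & \sin\theta\, \exp[-i\varphi] \\ \sin\theta\, \exp[+i\varphi] & -\cos\theta \end{pmatrix} \tag{1500}$$
which changes sign under a parity transformation, $(\theta, \varphi) \to (\pi - \theta, \varphi + \pi)$. The explicit relationship is given by
$$\left( \frac{\mathbf{r} \cdot \boldsymbol{\sigma}}{r} \right) \Omega^{l}_{j,j_z}(\theta,\varphi) = -\, \Omega^{l+1}_{j,j_z}(\theta,\varphi) \tag{1501}$$
as can be shown by examination of the Clebsch-Gordan coefficients in Table(8). Likewise, on using the identity
$$\left( \frac{\mathbf{r} \cdot \boldsymbol{\sigma}}{r} \right)^2 = I \tag{1502}$$
one finds that the inverse relationship between the two-component spinors is also given by
$$\left( \frac{\mathbf{r} \cdot \boldsymbol{\sigma}}{r} \right) \Omega^{l+1}_{j,j_z}(\theta,\varphi) = -\, \Omega^{l}_{j,j_z}(\theta,\varphi) \tag{1503}$$
Therefore, one concludes that the two angular momentum eigenstates have different properties under the spatial inversion transformation $\mathbf{r} \to -\mathbf{r}$.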
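The relation between the spinor spherical harmonics with $l$ and $l + 1$ can be tested numerically before it is derived. The sketch below is an illustration under stated assumptions: the $\Omega^{l}_{j,j_z}$ are assembled from the Clebsch-Gordan coefficients quoted above, the spherical harmonics carry the Condon-Shortley phase, and all function names are hypothetical.

```python
import cmath
import math


def assoc_legendre(l, m, x):
    """P_l^m(x) for 0 <= m <= l with the Condon-Shortley phase."""
    root = math.sqrt(max(0.0, 1.0 - x * x))
    p_mm = 1.0
    for k in range(1, m + 1):
        p_mm *= -(2 * k - 1) * root
    if l == m:
        return p_mm
    prev, curr = p_mm, x * (2 * m + 1) * p_mm
    for n in range(m + 2, l + 1):
        prev, curr = curr, ((2 * n - 1) * x * curr - (n + m - 1) * prev) / (n - m)
    return curr


def sph_harm(l, m, theta, phi):
    """Y^l_m(theta, phi); returns 0 when |m| > l."""
    if abs(m) > l:
        return 0.0
    if m < 0:
        return (-1) ** (-m) * sph_harm(l, -m, theta, phi).conjugate()
    norm = math.sqrt((2 * l + 1) / (4 * math.pi)
                     * math.factorial(l - m) / math.factorial(l + m))
    return norm * assoc_legendre(l, m, math.cos(theta)) * cmath.exp(1j * m * phi)


def spinor_harmonic(l, two_j, two_jz, theta, phi):
    """(upper, lower) components of Omega^l_{j,jz}; two_j = 2j, two_jz = 2 jz."""
    jz = two_jz / 2
    y_up = sph_harm(l, (two_jz - 1) // 2, theta, phi)   # multiplies chi_+
    y_dn = sph_harm(l, (two_jz + 1) // 2, theta, phi)   # multiplies chi_-
    plus = math.sqrt((l + 0.5 + jz) / (2 * l + 1))
    minus = math.sqrt((l + 0.5 - jz) / (2 * l + 1))
    if two_j == 2 * l + 1:
        return plus * y_up, minus * y_dn
    if two_j == 2 * l - 1:
        return -minus * y_up, plus * y_dn
    raise ValueError("j must equal l + 1/2 or l - 1/2")


def sigma_r(theta, phi, spinor):
    """Apply the matrix (r . sigma) / r to a two-component spinor."""
    up, dn = spinor
    c, s = math.cos(theta), math.sin(theta)
    return (c * up + s * cmath.exp(-1j * phi) * dn,
            s * cmath.exp(1j * phi) * up - c * dn)


def defect(l, two_jz, theta, phi):
    """Size of (r.sigma/r) Omega^l + Omega^{l+1} for j = l + 1/2."""
    two_j = 2 * l + 1
    lhs = sigma_r(theta, phi, spinor_harmonic(l, two_j, two_jz, theta, phi))
    rhs = spinor_harmonic(l + 1, two_j, two_jz, theta, phi)
    return max(abs(a + b) for a, b in zip(lhs, rhs))


max_defect = max(defect(l, two_jz, 0.9, 2.1)
                 for l in range(4)
                 for two_jz in range(-(2 * l + 1), 2 * l + 2, 2))
print(f"largest deviation for l <= 3: {max_defect:.2e}")
```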
——————————————————————————————————
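Mathematical Interlude:
The Action of the Operator $( \hat{\mathbf{r}} \cdot \boldsymbol{\sigma} )$ on the Spinor Spherical Harmonics $\Omega^{j\pm\frac{1}{2}}_{j,j_z}(\theta,\varphi)$.
Here, it will be argued that the spinor spherical harmonics satisfy the equations
$$\begin{aligned}
\left( \frac{\mathbf{r} \cdot \boldsymbol{\sigma}}{r} \right) \Omega^{j+\frac{1}{2}}_{j,j_z}(\theta,\varphi) &= -\, \Omega^{j-\frac{1}{2}}_{j,j_z}(\theta,\varphi) \\
\left( \frac{\mathbf{r} \cdot \boldsymbol{\sigma}}{r} \right) \Omega^{j-\frac{1}{2}}_{j,j_z}(\theta,\varphi) &= -\, \Omega^{j+\frac{1}{2}}_{j,j_z}(\theta,\varphi)
\end{aligned} \tag{1504}$$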
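The components of the total angular momentum
$$\hat{J}^{(i)} = \hat{L}^{(i)} + \hat{S}^{(i)} \tag{1505}$$
commute with $( \mathbf{r} \cdot \hat{\mathbf{S}} )$. That is
$$[\, \hat{J}^{(i)} , ( \mathbf{r} \cdot \hat{\mathbf{S}} ) \,] = 0 \tag{1506}$$
The complete proof of this statement immediately follows from the proof of the relation for any one component $\hat{J}^{(i)}$, since $( \mathbf{r} \cdot \hat{\mathbf{S}} )$ is spherically symmetric. Thus, for $i = 1$, one has
$$\begin{aligned}
[\, \hat{J}^{(1)} , ( \mathbf{r} \cdot \hat{\mathbf{S}} ) \,] &= [\, \hat{L}^{(1)} + \hat{S}^{(1)} \, , \, x^{(1)} \hat{S}^{(1)} + x^{(2)} \hat{S}^{(2)} + x^{(3)} \hat{S}^{(3)} \,] \\
&= \hat{S}^{(2)}\, [\, \hat{L}^{(1)} , x^{(2)} \,] + \hat{S}^{(3)}\, [\, \hat{L}^{(1)} , x^{(3)} \,] \\
&\quad + x^{(2)}\, [\, \hat{S}^{(1)} , \hat{S}^{(2)} \,] + x^{(3)}\, [\, \hat{S}^{(1)} , \hat{S}^{(3)} \,]
\end{aligned} \tag{1507}$$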
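Using the commutation relations
$$[\, \hat{S}^{(i)} , \hat{S}^{(j)} \,] = i \hbar\, \varepsilon^{i,j,k}\, \hat{S}^{(k)} \tag{1508}$$
and
$$[\, \hat{L}^{(i)} , x^{(j)} \,] = i \hbar\, \varepsilon^{i,j,k}\, x^{(k)} \tag{1509}$$
one finds that
$$[\, \hat{J}^{(1)} , ( \mathbf{r} \cdot \hat{\mathbf{S}} ) \,] = i \hbar \left( \hat{S}^{(2)} x^{(3)} - \hat{S}^{(3)} x^{(2)} + x^{(2)} \hat{S}^{(3)} - x^{(3)} \hat{S}^{(2)} \right) = 0 \tag{1510}$$
which was to be shown. From repeated use of the above commutation relations which involve the components $\hat{J}^{(i)}$, it immediately follows that
$$[\, \hat{J}^2 , ( \mathbf{r} \cdot \hat{\mathbf{S}} ) \,] = 0 \tag{1511}$$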
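Thus, since $\Omega^{j\pm\frac{1}{2}}_{j,j_z}$ is a simultaneous eigenstate of $\hat{J}^2$ and $\hat{J}^{(3)}$ and because these operators commute with $( \mathbf{r} \cdot \hat{\mathbf{S}} )$, then $( \mathbf{r} \cdot \hat{\mathbf{S}} )\, \Omega^{j\pm\frac{1}{2}}_{j,j_z}$ is also a simultaneous eigenstate with eigenvalues $(j, j_z)$.
Since the states $( \mathbf{r} \cdot \hat{\mathbf{S}} )\, \Omega^{j\pm\frac{1}{2}}_{j,j_z}$ are simultaneous eigenstates of $\hat{J}^2$ and $\hat{J}^{(3)}$ with eigenvalues $(j, j_z)$, and because this subspace is spanned by the basis composed of the two states $\Omega^{j\pm\frac{1}{2}}_{j,j_z}(\theta,\varphi)$, the transformed states can be decomposed as
$$\begin{aligned}
\left( \frac{\mathbf{r} \cdot \boldsymbol{\sigma}}{r} \right) \Omega^{j+\frac{1}{2}}_{j,j_z}(\theta,\varphi) &= C_{++}(j,j_z)\, \Omega^{j+\frac{1}{2}}_{j,j_z}(\theta,\varphi) + C_{+-}(j,j_z)\, \Omega^{j-\frac{1}{2}}_{j,j_z}(\theta,\varphi) \\
\left( \frac{\mathbf{r} \cdot \boldsymbol{\sigma}}{r} \right) \Omega^{j-\frac{1}{2}}_{j,j_z}(\theta,\varphi) &= C_{-+}(j,j_z)\, \Omega^{j+\frac{1}{2}}_{j,j_z}(\theta,\varphi) + C_{--}(j,j_z)\, \Omega^{j-\frac{1}{2}}_{j,j_z}(\theta,\varphi)
\end{aligned} \tag{1512}$$
where the coefficients $C_{\pm,\pm}(j,j_z)$ will be determined below.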
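First, we shall show that the coefficients $C_{\pm,\pm}(j,j_z)$ are independent of $j_z$. This follows as $\hat{J}^{\pm}$ commutes with $( \mathbf{r} \cdot \hat{\mathbf{S}} )$, since all the components $\hat{J}^{(i)}$ commute with $( \mathbf{r} \cdot \hat{\mathbf{S}} )$. Thus, one has
$$\hat{J}^{\pm} \left( \frac{\mathbf{r} \cdot \boldsymbol{\sigma}}{r} \right) \Omega^{j+\frac{1}{2}}_{j,j_z}(\theta,\varphi) = \left( \frac{\mathbf{r} \cdot \boldsymbol{\sigma}}{r} \right) \hat{J}^{\pm}\, \Omega^{j+\frac{1}{2}}_{j,j_z}(\theta,\varphi) \tag{1513}$$
and
$$\begin{aligned}
\hat{J}^{\pm} \left( \frac{\mathbf{r} \cdot \boldsymbol{\sigma}}{r} \right) \Omega^{j+\frac{1}{2}}_{j,j_z}(\theta,\varphi) &= C_{++}(j,j_z)\, \hat{J}^{\pm}\, \Omega^{j+\frac{1}{2}}_{j,j_z}(\theta,\varphi) + C_{+-}(j,j_z)\, \hat{J}^{\pm}\, \Omega^{j-\frac{1}{2}}_{j,j_z}(\theta,\varphi) \\
\left( \frac{\mathbf{r} \cdot \boldsymbol{\sigma}}{r} \right) \hat{J}^{\pm}\, \Omega^{j+\frac{1}{2}}_{j,j_z}(\theta,\varphi) &= C_{++}(j,j_z \pm 1)\, \hat{J}^{\pm}\, \Omega^{j+\frac{1}{2}}_{j,j_z}(\theta,\varphi) + C_{+-}(j,j_z \pm 1)\, \hat{J}^{\pm}\, \Omega^{j-\frac{1}{2}}_{j,j_z}(\theta,\varphi)
\end{aligned} \tag{1514}$$
Hence, since the left-hand sides are equal by eqn(1513), on comparing the linearly independent terms on the right-hand sides, one concludes that
$$\begin{aligned}
C_{++}(j,j_z \pm 1) &= C_{++}(j,j_z) \\
C_{+-}(j,j_z \pm 1) &= C_{+-}(j,j_z)
\end{aligned} \tag{1515}$$
etc. Therefore, the coefficients $C_{\pm,\pm}(j,j_z)$ are independent of the value of $j_z$. Henceforth, we shall omit the index $j_z$ in $C_{\pm,\pm}(j,j_z)$.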
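From considerations of parity, it can be determined that $C_{++}(j) = C_{--}(j) = 0$. Under the parity transformation $\mathbf{r} \to -\mathbf{r}$, one has
$$\Omega^{j\pm\frac{1}{2}}_{j,j_z}(\theta,\varphi) \to (-1)^{j\pm\frac{1}{2}}\; \Omega^{j\pm\frac{1}{2}}_{j,j_z}(\theta,\varphi) \tag{1516}$$
which follows from the properties of the spherical harmonics $Y^l_m(\theta,\varphi)$ under the parity transformation. Also one has
$$\left( \frac{\mathbf{r} \cdot \boldsymbol{\sigma}}{r} \right) \to -\left( \frac{\mathbf{r} \cdot \boldsymbol{\sigma}}{r} \right) \tag{1517}$$
under the parity transform. Thus, after the parity transform, one finds that the transformed states have the decompositions
$$\begin{aligned}
\left( \frac{\mathbf{r} \cdot \boldsymbol{\sigma}}{r} \right) \Omega^{j+\frac{1}{2}}_{j,j_z}(\theta,\varphi) &= -\, C_{++}(j)\, \Omega^{j+\frac{1}{2}}_{j,j_z}(\theta,\varphi) + C_{+-}(j)\, \Omega^{j-\frac{1}{2}}_{j,j_z}(\theta,\varphi) \\
\left( \frac{\mathbf{r} \cdot \boldsymbol{\sigma}}{r} \right) \Omega^{j-\frac{1}{2}}_{j,j_z}(\theta,\varphi) &= C_{-+}(j)\, \Omega^{j+\frac{1}{2}}_{j,j_z}(\theta,\varphi) - C_{--}(j)\, \Omega^{j-\frac{1}{2}}_{j,j_z}(\theta,\varphi)
\end{aligned} \tag{1518}$$
which by comparison with eqn(1512) leads to the identification
$$C_{++}(j) = C_{--}(j) = 0 \tag{1519}$$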
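Therefore, recalling that the coefficients are independent of $j_z$, one can express the effect of the operator on the spinor spherical harmonics as
$$\begin{aligned}
\left( \frac{\mathbf{r} \cdot \boldsymbol{\sigma}}{r} \right) \Omega^{j+\frac{1}{2}}_{j,j_z}(\theta,\varphi) &= C_{+-}(j)\, \Omega^{j-\frac{1}{2}}_{j,j_z}(\theta,\varphi) \\
\left( \frac{\mathbf{r} \cdot \boldsymbol{\sigma}}{r} \right) \Omega^{j-\frac{1}{2}}_{j,j_z}(\theta,\varphi) &= C_{-+}(j)\, \Omega^{j+\frac{1}{2}}_{j,j_z}(\theta,\varphi)
\end{aligned} \tag{1520}$$
Furthermore, since
$$\left( \frac{\mathbf{r} \cdot \boldsymbol{\sigma}}{r} \right)^2 = I \tag{1521}$$
one obtains the condition
$$C_{+-}(j)\, C_{-+}(j) = 1 \tag{1522}$$
This condition can be made more restrictive as $\left( \frac{\mathbf{r} \cdot \boldsymbol{\sigma}}{r} \right)$ is Hermitean, which leads to
$$C_{+-}(j) = C_{-+}(j)^{*} \tag{1523}$$
The above two equations suggest that $C_{-+}(j)$ and $C_{+-}(j)$ are pure phase factors, such as
$$C_{+-}(j) = \exp[\, + i\, \phi(j) \,], \qquad C_{-+}(j) = \exp[\, - i\, \phi(j) \,] \tag{1524}$$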
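The phase factor can be completely determined by considering the relations (1520) with specific choices of the values of $(\theta, \varphi)$. As can be seen by examining the case where $\varphi = 0$, the phase $\phi(j)$ is either zero or $\pi$. For the case $\varphi = 0$, the operator simplifies to
$$\left( \frac{\mathbf{r} \cdot \boldsymbol{\sigma}}{r} \right) = \begin{pmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{pmatrix} \tag{1525}$$
The spinor spherical harmonics are given by
$$\begin{aligned}
\Omega^{j+\frac{1}{2}}_{j,j_z}(\theta,\varphi) &= \begin{pmatrix} -\sqrt{\frac{j + 1 - j_z}{2j + 2}}\; Y^{j+\frac{1}{2}}_{j_z-\frac{1}{2}}(\theta,\varphi) \\ +\sqrt{\frac{j + 1 + j_z}{2j + 2}}\; Y^{j+\frac{1}{2}}_{j_z+\frac{1}{2}}(\theta,\varphi) \end{pmatrix} \\
\Omega^{j-\frac{1}{2}}_{j,j_z}(\theta,\varphi) &= \begin{pmatrix} \sqrt{\frac{j + j_z}{2j}}\; Y^{j-\frac{1}{2}}_{j_z-\frac{1}{2}}(\theta,\varphi) \\ \sqrt{\frac{j - j_z}{2j}}\; Y^{j-\frac{1}{2}}_{j_z+\frac{1}{2}}(\theta,\varphi) \end{pmatrix}
\end{aligned} \tag{1526}$$
which become real for $\varphi = 0$ since the spherical harmonics become real. Hence, on inspecting eqn(1520) with $\varphi = 0$, one concludes that the phase factors are equal and are purely real. That is
$$C_{+-}(j) = C_{-+}(j) = \pm 1 \tag{1527}$$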
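Finally, by considering $\theta = 0$, for which
$$\left( \frac{\mathbf{r} \cdot \boldsymbol{\sigma}}{r} \right) = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \tag{1528}$$
and the spherical harmonics reduce to
$$Y^{j\pm\frac{1}{2}}_{j_z\mp\frac{1}{2}}(0,\varphi) = \sqrt{\frac{2j + 1 \pm 1}{4\pi}}\; \delta_{j_z\mp\frac{1}{2},0} \tag{1529}$$
one finds that, for fixed $j$, only the four spinor spherical harmonics $\Omega^{j\pm\frac{1}{2}}_{j,j_z}(0,\varphi)$
$$\begin{aligned}
\Omega^{j+\frac{1}{2}}_{j,j_z}(0,\varphi) &= \begin{pmatrix} -\sqrt{\frac{2j+1}{8\pi}}\; \delta_{j_z-\frac{1}{2},0} \\ +\sqrt{\frac{2j+1}{8\pi}}\; \delta_{j_z+\frac{1}{2},0} \end{pmatrix} \\
\Omega^{j-\frac{1}{2}}_{j,j_z}(0,\varphi) &= \begin{pmatrix} \sqrt{\frac{2j+1}{8\pi}}\; \delta_{j_z-\frac{1}{2},0} \\ \sqrt{\frac{2j+1}{8\pi}}\; \delta_{j_z+\frac{1}{2},0} \end{pmatrix}
\end{aligned} \tag{1530}$$
are non-zero. The spinor spherical harmonics with $\theta = 0$ are connected via
$$\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \Omega^{j\pm\frac{1}{2}}_{j,\pm\frac{1}{2}}(0,\varphi) = -\, \Omega^{j\mp\frac{1}{2}}_{j,\pm\frac{1}{2}}(0,\varphi) \tag{1531}$$
Hence, one has determined that
$$C_{+-}(j) = C_{-+}(j) = -1 \tag{1532}$$
which holds independent of the values of $\theta$ and $j$, so the effect of the operator on the spinor spherical harmonics is completely specified by
$$\begin{aligned}
\left( \frac{\mathbf{r} \cdot \boldsymbol{\sigma}}{r} \right) \Omega^{j+\frac{1}{2}}_{j,j_z}(\theta,\varphi) &= -\, \Omega^{j-\frac{1}{2}}_{j,j_z}(\theta,\varphi) \\
\left( \frac{\mathbf{r} \cdot \boldsymbol{\sigma}}{r} \right) \Omega^{j-\frac{1}{2}}_{j,j_z}(\theta,\varphi) &= -\, \Omega^{j+\frac{1}{2}}_{j,j_z}(\theta,\varphi)
\end{aligned} \tag{1533}$$
as was to be shown.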
——————————————————————————————————
The Ansatz
If one only considers the spatial part of the parity operator, $\hat{P}$, the two-component spinor states $\Omega^{l}_{l\pm\frac{1}{2},j_z}(\theta,\varphi)$ have parities $(-1)^{l}$
$$\hat{P}\; \Omega^{l}_{l\pm\frac{1}{2},j_z}(\theta,\varphi) = (-1)^{l}\; \Omega^{l}_{l\pm\frac{1}{2},j_z}(\theta,\varphi) \tag{1534}$$
Furthermore, as has been seen, the upper and lower two-component spinors of the four-component Dirac spinor must have opposite intrinsic parity. Therefore, the desired simultaneous eigenstates for the relativistic electron can be represented either by the four-component Dirac spinor $\psi^{-}_{j,j_z}(\mathbf{r})$ with parity $(-1)^{l} = (-1)^{(l+\frac{1}{2}-\frac{1}{2})}$ of the form
$$\psi^{-}_{l+\frac{1}{2},j_z}(\mathbf{r}) = \begin{pmatrix} \dfrac{f^{-}(r)}{r}\; \Omega^{l}_{l+\frac{1}{2},j_z}(\theta,\varphi) \\ i\, \dfrac{g^{-}(r)}{r}\; \Omega^{l+1}_{l+\frac{1}{2},j_z}(\theta,\varphi) \end{pmatrix} \tag{1535}$$
or by $\psi^{+}_{j,j_z}(\mathbf{r})$
$$\psi^{+}_{l+\frac{1}{2},j_z}(\mathbf{r}) = \begin{pmatrix} \dfrac{f^{+}(r)}{r}\; \Omega^{l+1}_{l+\frac{1}{2},j_z}(\theta,\varphi) \\ i\, \dfrac{g^{+}(r)}{r}\; \Omega^{l}_{l+\frac{1}{2},j_z}(\theta,\varphi) \end{pmatrix} \tag{1536}$$
which has parity $(-1)^{l+\frac{1}{2}+\frac{1}{2}}$. In these expressions $f^{\pm}(r)$ and $g^{\pm}(r)$ are scalar radial functions that have to be determined as solutions of the radial equation. These states do not correspond to definite values of the orbital angular momentum since the upper and lower two-component spinors correspond to the different values of either $l$ or $l' = l + 1$ for the orbital angular momentum.
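To condense the notation, the energy eigenstates will be written in the compact form
$$\psi^{\pm}_{j,j_z}(\mathbf{r}) = \begin{pmatrix} \dfrac{f^{\pm}(r)}{r}\; \Omega^{l_A}_{j,j_z}(\theta,\varphi) \\ i\, \dfrac{g^{\pm}(r)}{r}\; \Omega^{l_B}_{j,j_z}(\theta,\varphi) \end{pmatrix} \tag{1537}$$
where $l_A = j \pm \frac{1}{2}$ and $l_B = j \mp \frac{1}{2}$.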
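The Radial Equation
We shall find the radial Dirac equation for the solution $\psi^{\pm}_{j,j_z}(\mathbf{r})$. The Dirac spinor wave functions in eqn(1536) and eqn(1535) are substituted into eqns(1494). The spin-orbit interaction term can be evaluated by squaring the expression
$$\hat{\mathbf{J}} = \hat{\mathbf{L}} + \mathbf{S} \tag{1538}$$
which leads to the identity
$$\mathbf{S} \cdot \hat{\mathbf{L}} = \frac{1}{2} \left( \hat{J}^2 - \hat{L}^2 - S^2 \right) \tag{1539}$$
When this operator acts on the relativistic two-component spinor spherical harmonic $\Omega^{l_A}_{j,j_z}$, one finds
$$\mathbf{S} \cdot \hat{\mathbf{L}}\; \Omega^{l_A}_{j,j_z} = \frac{\hbar^2}{2} \left( j ( j + 1 ) - l_A ( l_A + 1 ) - \frac{3}{4} \right) \Omega^{l_A}_{j,j_z} \tag{1540}$$
which for $j = l_A + \frac{1}{2}$ yields
$$\mathbf{S} \cdot \hat{\mathbf{L}}\; \Omega^{l_A}_{j,j_z} = \frac{\hbar^2}{2}\, ( j - \tfrac{1}{2} )\; \Omega^{l_A}_{j,j_z} \tag{1541}$$
and for $j = l_A - \frac{1}{2}$, one obtains
$$\mathbf{S} \cdot \hat{\mathbf{L}}\; \Omega^{l_A}_{j,j_z} = -\, \frac{\hbar^2}{2}\, ( j + \tfrac{3}{2} )\; \Omega^{l_A}_{j,j_z} \tag{1542}$$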
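The two special cases of the spin-orbit eigenvalue follow from eqn(1540) by simple arithmetic, which can be confirmed with exact rational numbers. The sketch below works in units of $\hbar^2$; the function name `s_dot_l` is hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def s_dot_l(j, l):
    """Eigenvalue of S.L in units of hbar^2, following eqn (1540)."""
    return HALF * (j * (j + 1) - l * (l + 1) - Fraction(3, 4))


rows = []
for two_j in (1, 3, 5, 7):
    j = Fraction(two_j, 2)
    aligned = s_dot_l(j, j - HALF)   # case j = l_A + 1/2, eqn (1541)
    opposed = s_dot_l(j, j + HALF)   # case j = l_A - 1/2, eqn (1542)
    rows.append((j, aligned, opposed))
    print(f"j = {j}:  l_A = j - 1/2 gives {aligned},  l_A = j + 1/2 gives {opposed}")
```

For each $j$ the first value equals $\frac{1}{2} ( j - \frac{1}{2} )$ and the second equals $- \frac{1}{2} ( j + \frac{3}{2} )$, in agreement with the two equations above.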
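The Dirac equation can be written in the general form
$$\begin{aligned}
( E - V(r) - m c^2 )\, \frac{f(r)}{r}\, \Omega^{l_A}_{j,j_z} &= c\, \hbar \left( \frac{\mathbf{r} \cdot \boldsymbol{\sigma}}{r^2} \right) \left( r\, \frac{\partial}{\partial r} - \frac{2}{\hbar^2}\, \mathbf{S} \cdot \hat{\mathbf{L}} \right) \frac{g(r)}{r}\, \Omega^{l_B}_{j,j_z} \\
( E - V(r) + m c^2 )\, \frac{g(r)}{r}\, \Omega^{l_B}_{j,j_z} &= -\, c\, \hbar \left( \frac{\mathbf{r} \cdot \boldsymbol{\sigma}}{r^2} \right) \left( r\, \frac{\partial}{\partial r} - \frac{2}{\hbar^2}\, \mathbf{S} \cdot \hat{\mathbf{L}} \right) \frac{f(r)}{r}\, \Omega^{l_A}_{j,j_z}
\end{aligned} \tag{1543}$$
Following Dirac, it is customary to define an integer $\kappa$ in terms of the eigenvalues of $\mathbf{S} \cdot \hat{\mathbf{L}}$ via
$$\begin{aligned}
( \mathbf{S} \cdot \hat{\mathbf{L}} )\; \Omega^{l_A}_{j,j_z} &= -\, \frac{\hbar^2}{2}\, ( 1 + \kappa )\; \Omega^{l_A}_{j,j_z} \\
( \mathbf{S} \cdot \hat{\mathbf{L}} )\; \Omega^{l_B}_{j,j_z} &= -\, \frac{\hbar^2}{2}\, ( 1 - \kappa )\; \Omega^{l_B}_{j,j_z}
\end{aligned} \tag{1544}$$
Therefore, if $\Omega^{l_A}_{j,j_z} = \Omega^{j+\frac{1}{2}}_{j,j_z}$, i.e. $j = l_A - \frac{1}{2}$, then $\kappa = ( j + \frac{1}{2} )$. Otherwise, if $\Omega^{l_A}_{j,j_z} = \Omega^{j-\frac{1}{2}}_{j,j_z}$, i.e. $j = l_A + \frac{1}{2}$, then $\kappa = - ( j + \frac{1}{2} )$.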
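On substituting the above expressions in the Dirac energy eigenvalue equation for $\psi_{j,j_z}(\mathbf{r})$ one finds
$$\begin{aligned}
( E - V(r) - m c^2 )\, \frac{f(r)}{r}\, \Omega^{l_A}_{j,j_z} &= c\, \hbar \left( \frac{\mathbf{r} \cdot \boldsymbol{\sigma}}{r^2} \right) \left( r\, \frac{\partial}{\partial r} + 1 - \kappa \right) \frac{g(r)}{r}\, \Omega^{l_B}_{j,j_z} \\
( E - V(r) + m c^2 )\, \frac{g(r)}{r}\, \Omega^{l_B}_{j,j_z} &= -\, c\, \hbar \left( \frac{\mathbf{r} \cdot \boldsymbol{\sigma}}{r^2} \right) \left( r\, \frac{\partial}{\partial r} + 1 + \kappa \right) \frac{f(r)}{r}\, \Omega^{l_A}_{j,j_z}
\end{aligned} \tag{1545}$$
Since the radial spin projection operator is independent of $r$, it can be commuted to the right of the differential operator in the large parenthesis. Then on using either the relation given in eqn(1501) or in eqn(1503), one finds that the relativistic spherical harmonics factor out of the equations, leading to
$$\begin{aligned}
( E - V(r) - m c^2 )\, \frac{f(r)}{r} &= -\, c\, \hbar \left( \frac{\partial}{\partial r} + \frac{1 - \kappa}{r} \right) \frac{g(r)}{r} \\
( E - V(r) + m c^2 )\, \frac{g(r)}{r} &= c\, \hbar \left( \frac{\partial}{\partial r} + \frac{1 + \kappa}{r} \right) \frac{f(r)}{r}
\end{aligned} \tag{1546}$$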
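Table 9: The Relationship between $j$, $l_A$, $l_B$, $\kappa$ and Parity. The parity eigenvalue is given by $\eta_p = (-1)^{l_A}$ and $\kappa = \pm ( j + \frac{1}{2} )$.

| $\kappa$ | $l_A$ | $l_B$ | Parity |
|---|---|---|---|
| $\kappa = ( j + \frac{1}{2} )$ | $j + \frac{1}{2}$ | $j - \frac{1}{2}$ | $(-1)^{\kappa}$ |
| $\kappa = - ( j + \frac{1}{2} )$ | $j - \frac{1}{2}$ | $j + \frac{1}{2}$ | $(-1)^{1-\kappa}$ |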
Therefore, the Dirac radial equation consists of two coupled first-order differential equations for $f(r)$ and $g(r)$. On multiplying by a factor of $r$ and simplifying the derivatives of $f(r)/r$ and $g(r)/r$, one finds the pair of more symmetrical equations
$$\begin{aligned}
( E - V(r) - m c^2 )\, f(r) + c\, \hbar \left( \frac{\partial}{\partial r} - \frac{\kappa}{r} \right) g(r) &= 0 \\
( E - V(r) + m c^2 )\, g(r) - c\, \hbar \left( \frac{\partial}{\partial r} + \frac{\kappa}{r} \right) f(r) &= 0
\end{aligned} \tag{1547}$$
The above pair of equations is the central result of this lecture.
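The simplification that turns eqn(1546) into eqn(1547) rests on a single derivative identity. It is spelled out here as a short worked step, where $h(r)$ is a placeholder symbol (not used in the original) standing for $g(r)$ with the upper sign and for $f(r)$ with the lower sign:

```latex
\begin{align*}
\left( \frac{\partial}{\partial r} + \frac{1 \mp \kappa}{r} \right) \frac{h(r)}{r}
  &= \frac{1}{r} \frac{\partial h}{\partial r} - \frac{h}{r^2} + \frac{( 1 \mp \kappa )\, h}{r^2} \\
  &= \frac{1}{r} \left( \frac{\partial}{\partial r} \mp \frac{\kappa}{r} \right) h(r)
\end{align*}
```

The term $- h / r^2$ produced by differentiating $1/r$ cancels the "1" in $( 1 \mp \kappa )$, so that multiplying eqn(1546) by $r$ leaves only the $\mp \kappa / r$ terms that appear in eqn(1547).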
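The Probability Density in Spherical Polar Coordinates.
The probability density $P(\mathbf{r})$ that an electron, in an energy eigenstate of a spherically symmetric potential, is found in the vicinity of the point $(r, \theta, \varphi)$ is given by
$$\begin{aligned}
P(\mathbf{r}) &= \psi^{\dagger}(\mathbf{r})\, \psi(\mathbf{r}) \\
&= \left( \frac{|f(r)|^2}{r^2} \right) \Omega^{l_A}_{j,j_z}(\theta,\varphi)^{\dagger}\; \Omega^{l_A}_{j,j_z}(\theta,\varphi) + \left( \frac{|g(r)|^2}{r^2} \right) \Omega^{l_B}_{j,j_z}(\theta,\varphi)^{\dagger}\; \Omega^{l_B}_{j,j_z}(\theta,\varphi)
\end{aligned} \tag{1548}$$
However, due to the identity
$$\Omega^{l_A}_{j,j_z}(\theta,\varphi)^{\dagger}\; \Omega^{l_A}_{j,j_z}(\theta,\varphi) = \Omega^{l_B}_{j,j_z}(\theta,\varphi)^{\dagger}\; \Omega^{l_B}_{j,j_z}(\theta,\varphi) = A_{j,j_z}(\theta) \tag{1549}$$
the probability is independent of the azimuthal angle $\varphi$ and the sign of $j_z$ (just like in the non-relativistic case) and has a common angular factor of $A_{j,j_z}(\theta)$. Thus, the probability distribution factorizes into a radial and an angular factor
$$P(\mathbf{r}) = \left( \frac{|f(r)|^2}{r^2} + \frac{|g(r)|^2}{r^2} \right) A_{j,j_z}(\theta) \tag{1550}$$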
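Table 10: Relativistic Angular Distribution Functions

| $j$ | $\lvert j_z \rvert$ | $A_{j,j_z}(\theta)$ |
|---|---|---|
| $\frac{1}{2}$ | $\frac{1}{2}$ | $\frac{1}{4\pi}$ |
| $\frac{3}{2}$ | $\frac{1}{2}$ | $\frac{1}{8\pi}\, ( 1 + 3 \cos^2\theta )$ |
| $\frac{3}{2}$ | $\frac{3}{2}$ | $\frac{3}{8\pi}\, \sin^2\theta$ |
| $\frac{5}{2}$ | $\frac{1}{2}$ | $\frac{3}{16\pi}\, ( 1 - 2 \cos^2\theta + 5 \cos^4\theta )$ |
| $\frac{5}{2}$ | $\frac{3}{2}$ | $\frac{3}{32\pi}\, \sin^2\theta\, ( 1 + 15 \cos^2\theta )$ |
| $\frac{5}{2}$ | $\frac{5}{2}$ | $\frac{15}{32\pi}\, \sin^4\theta$ |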
The angular distribution function for a closed shell is given by the sum over the angular distribution functions. Due to the identity,
$$\sum_{j_z=-j}^{j} A_{j,j_z}(\theta) = \frac{2 j + 1}{4 \pi} \tag{1551}$$
one finds that closed shells are spherically symmetric, as is expected. The first few angular dependent factors $A_{j,j_z}(\theta)$ are given in Table(10) and the corresponding non-relativistic angular factors are given in Table(11). On comparing the relativistic angular dependent factors with the non-relativistic factors $|Y^l_m(\theta,\varphi)|^2$, one finds that they are identical for $|j_z| = j$. Since the relativistic distribution is the sum of two generally different positive definite forms originally associated with the two spinors $\chi_{+}$ and $\chi_{-}$, it generally does not go to zero for non-zero values of $\theta$.
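The closed-shell sum rule can be checked directly against the tabulated angular factors. The sketch below simply encodes the entries of Table 10 (keyed by the integers $2j$ and $2|j_z|$, a choice made here for convenience) and compares the sum over $j_z$ with $(2j+1)/(4\pi)$ at a few angles; the function names are hypothetical.

```python
import math


def angular_factor(two_j, two_abs_jz, theta):
    """A_{j,jz}(theta) as tabulated in Table 10, keyed by (2j, 2|jz|)."""
    c2 = math.cos(theta) ** 2
    s2 = math.sin(theta) ** 2
    table = {
        (1, 1): 1 / (4 * math.pi),
        (3, 1): (1 + 3 * c2) / (8 * math.pi),
        (3, 3): 3 * s2 / (8 * math.pi),
        (5, 1): 3 * (1 - 2 * c2 + 5 * c2 ** 2) / (16 * math.pi),
        (5, 3): 3 * s2 * (1 + 15 * c2) / (32 * math.pi),
        (5, 5): 15 * s2 ** 2 / (32 * math.pi),
    }
    return table[(two_j, two_abs_jz)]


def closed_shell_sum(two_j, theta):
    """Sum of A_{j,jz} over jz = -j..j; +jz and -jz contribute equally."""
    return 2 * sum(angular_factor(two_j, k, theta) for k in range(1, two_j + 1, 2))


deviations = []
for two_j in (1, 3, 5):
    expected = (two_j + 1) / (4 * math.pi)   # (2j + 1) / (4 pi)
    for theta in (0.0, 0.4, 1.1, math.pi / 2, 2.5):
        deviations.append(abs(closed_shell_sum(two_j, theta) - expected))

print(f"largest deviation from (2j+1)/(4 pi): {max(deviations):.2e}")
```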
11.10.1 The Hydrogen Atom
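The radial energy eigenvalue equation for a hydrogen-like atom is given by
$$\begin{aligned}
\left( E + \frac{Z e^2}{r} - m c^2 \right) f(r) + c\, \hbar \left( \frac{\partial}{\partial r} - \frac{\kappa}{r} \right) g(r) &= 0 \\
\left( E + \frac{Z e^2}{r} + m c^2 \right) g(r) - c\, \hbar \left( \frac{\partial}{\partial r} + \frac{\kappa}{r} \right) f(r) &= 0
\end{aligned} \tag{1552}$$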
Figure 48: The relativistic (left) and non-relativistic (right) angular distributions $A_{j,j_z}(\theta)$ for $j = \frac{1}{2}$ and $j = \frac{3}{2}$.

Figure 49: The relativistic (left) and non-relativistic (right) angular distributions $A_{j,j_z}(\theta)$ for $j = \frac{5}{2}$.
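Table 11: Non-Relativistic Angular Distribution Functions

| $l$ | $\lvert m \rvert$ | $A_{l,m}(\theta)$ |
|---|---|---|
| 0 | 0 | $\frac{1}{4\pi}$ |
| 1 | 0 | $\frac{3}{4\pi}\, \cos^2\theta$ |
| 1 | 1 | $\frac{3}{8\pi}\, \sin^2\theta$ |
| 2 | 0 | $\frac{5}{16\pi}\, ( 1 - 3 \cos^2\theta )^2$ |
| 2 | 1 | $\frac{15}{8\pi}\, \sin^2\theta\, \cos^2\theta$ |
| 2 | 2 | $\frac{15}{32\pi}\, \sin^4\theta$ |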
The above equations will be written in dimensionless units, where the energy is expressed in terms of the rest mass energy $m c^2$ and lengths are expressed in terms of the Compton wave length $\frac{\hbar}{m c}$. A dimensionless energy $\epsilon$ is defined as the ratio of $E$ to the rest mass energy
$$\epsilon = \frac{E}{m c^2} \tag{1553}$$
For a bound state, $m c^2 > E > - m c^2$ so the magnitude of $\epsilon$ is expected to be a little less than unity. A dimensionless radial variable $\rho$ is introduced which governs the asymptotic large $r$ decay of the bound state wave function. The variable is defined by
$$\rho = \sqrt{1 - \epsilon^2}\, \left( \frac{r\, m\, c}{\hbar} \right) \tag{1554}$$
In terms of these dimensionless variables, the Dirac radial equations for the hydrogen-like atom become
$$\begin{aligned}
\left( -\sqrt{\frac{1 - \epsilon}{1 + \epsilon}} + \frac{\gamma}{\rho} \right) f + \left( \frac{\partial}{\partial \rho} - \frac{\kappa}{\rho} \right) g &= 0 \\
\left( \sqrt{\frac{1 + \epsilon}{1 - \epsilon}} + \frac{\gamma}{\rho} \right) g - \left( \frac{\partial}{\partial \rho} + \frac{\kappa}{\rho} \right) f &= 0
\end{aligned} \tag{1555}$$
where
$$\gamma = \left( \frac{Z e^2}{\hbar c} \right) \tag{1556}$$
is a small number.
Boundary Conditions
The asymptotic $\rho \to \infty$ form of the solution can be found from the asymptotic form of the equations
$$\begin{aligned}
-\sqrt{\frac{1 - \epsilon}{1 + \epsilon}}\; f + \frac{\partial g}{\partial \rho} &\sim 0 \\
\sqrt{\frac{1 + \epsilon}{1 - \epsilon}}\; g - \frac{\partial f}{\partial \rho} &\sim 0
\end{aligned} \tag{1557}$$
Hence, on combining these equations, one sees that the asymptotic form of the equation is given by
$$\frac{\partial^2 f}{\partial \rho^2} = f \tag{1558}$$
Therefore, one has
$$f \sim A \exp[\, - \rho \,] + B \exp[\, + \rho \,] \tag{1559}$$
and, likewise, $g$ has a similar exponential form. If the solution is to be normalizable, then the coefficient $B$ in front of the increasing exponential must be exactly zero ($B \equiv 0$).
The asymptotic $\rho \to 0$ behavior of the solution can be found from
$$\begin{aligned}
\gamma\, f + \left( \rho\, \frac{\partial}{\partial \rho} - \kappa \right) g &= 0 \\
\gamma\, g - \left( \rho\, \frac{\partial}{\partial \rho} + \kappa \right) f &= 0
\end{aligned} \tag{1560}$$
where it has been noted that both the angular momentum term $\kappa$ and the Coulomb potential $\gamma$ govern the small $\rho$ variation, while the mass and energy terms are negligible. This is in contrast to the case of the non-relativistic Schrödinger equation with the Coulomb potential, where for small $r$ the Coulomb potential term is negligible in comparison with the centrifugal potential. We shall make the ansatz for the asymptotic small $\rho$ variation
$$f \sim A\, \rho^{s}, \qquad g \sim B\, \rho^{s} \tag{1561}$$
where the exponent $s$ is an unknown constant and then substitute the ansatz in the above equations. This procedure yields the coupled algebraic equations
$$\begin{aligned}
\gamma\, A + ( s - \kappa )\, B &= 0 \\
\gamma\, B - ( s + \kappa )\, A &= 0
\end{aligned} \tag{1562}$$
Hence, it is found that the exponent $s$ is determined by the solutions of the indicial equation, which is a quadratic equation. The solutions are given by
$$s = \pm \sqrt{\kappa^2 - \gamma^2} \tag{1563}$$
Since the wave function must be normalizable near $\rho \to 0$,
$$\lim_{\eta \to 0} \int_{\eta} dr\, \left( |f|^2 + |g|^2 \right) < \infty \tag{1564}$$
one must choose the positive solution for $s$. Normalizability near the origin requires that $2 s > - 1$. Hence, one may set
$$s = \sqrt{\kappa^2 - \gamma^2} \tag{1565}$$
This will be a good solution for $\kappa = - 1$ if $Z$ does not exceed a critical value. For values of $Z$ greater than $\approx 172$, the point charge can spark the vacuum and spontaneously generate electron-positron pairs$^{104}$. The solution with the negative value of $s$ given by
$$s = - \sqrt{\kappa^2 - \gamma^2} \tag{1566}$$
could also possibly exist and be normalizable if $\gamma$ is greater than a critical value $\gamma_c$ determined as
$$\frac{1}{2} = \sqrt{1 - \gamma_c^2} \tag{1567}$$
This critical value of $\gamma$ is found from
$$\gamma_c = \frac{\sqrt{3}}{2} \tag{1568}$$
which corresponds to $Z_c \sim 118$. The solutions corresponding to negative $s$ are, in fact, unphysical and do not survive if the nucleus is considered to have a finite spatial extent.
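The small-$\rho$ analysis can be illustrated with a few lines of arithmetic. The sketch below solves the indicial equation, checks that the resulting amplitudes satisfy both algebraic relations of eqn(1562), and converts the critical coupling $\gamma_c = \sqrt{3}/2$ into a nuclear charge. The numerical value of the fine structure constant and all function names are assumptions made for the illustration.

```python
import math

ALPHA = 1 / 137.035999  # assumed value of the fine structure constant e^2 / (hbar c)


def indicial_exponents(kappa, gamma):
    """Both roots s = +/- sqrt(kappa^2 - gamma^2) of the indicial equation."""
    disc = kappa ** 2 - gamma ** 2
    if disc < 0:
        raise ValueError("gamma exceeds |kappa|: the exponent becomes imaginary")
    root = math.sqrt(disc)
    return root, -root


def residuals(kappa, gamma, s):
    """Residuals of eqn (1562) with A = 1 and B fixed by the second relation."""
    a = 1.0
    b = (s + kappa) * a / gamma
    return gamma * a + (s - kappa) * b, gamma * b - (s + kappa) * a


def critical_charge(kappa=-1):
    """Z at which the negative root reaches -1/2, i.e. sqrt(kappa^2 - gamma^2) = 1/2."""
    gamma_c = math.sqrt(kappa ** 2 - 0.25)
    return gamma_c / ALPHA


s_plus, s_minus = indicial_exponents(-1, 1 * ALPHA)   # hydrogen, kappa = -1
print("s for hydrogen ground state:", s_plus)
print("residuals of eqn (1562):", residuals(-1, ALPHA, s_plus))
print("critical charge for the negative root:", critical_charge())
```

With these inputs the critical charge comes out between 118 and 119, consistent with the estimate $Z_c \sim 118$ quoted above.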
The Frobenius Method
We shall use the Frobenius method to find a solution. The solutions of the radial equation shall be written in the form
$$\begin{aligned}
f(r) &= \exp[\, - \rho \,]\; \rho^{s}\; F(\rho) \\
g(r) &= \exp[\, - \rho \,]\; \rho^{s}\; G(\rho)
\end{aligned} \tag{1569}$$
104
H. Backe, L. Handschug, F. Hessberger, E. Kankeleit, L. Richter, F. Weik, R. Willwater,
H. Bokemeyer, P. Vincent, Y. Nakayama, and J. S. Greenberg, Phys. Rev. Lett. 40, 1443
(1978).
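This form incorporates the appropriate boundary conditions at $\rho \to 0$ and $\rho \to \infty$. The coupled radial equations are transformed to
$$\begin{aligned}
\left( -\sqrt{\frac{1 - \epsilon}{1 + \epsilon}}\; \rho + \gamma \right) F + \left( \rho\, \frac{\partial}{\partial \rho} + s - \kappa - \rho \right) G &= 0 \\
\left( \sqrt{\frac{1 + \epsilon}{1 - \epsilon}}\; \rho + \gamma \right) G - \left( \rho\, \frac{\partial}{\partial \rho} + s + \kappa - \rho \right) F &= 0
\end{aligned} \tag{1570}$$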
The functions $F(\rho)$ and $G(\rho)$ can be expressed as infinite power series in $\rho$
$$F(\rho) = \sum_{n=0}^{\infty} a_n\, \rho^n, \qquad G(\rho) = \sum_{n=0}^{\infty} b_n\, \rho^n \tag{1571}$$
where the coefficients $a_n$ and $b_n$ are constants which have still to be determined. The coefficients are determined by substituting the series in the differential equation and then equating the coefficients of the same power in $\rho$. Equating the coefficient of $\rho^n$ yields the set of relations
$$\begin{aligned}
\left( -\sqrt{\frac{1 - \epsilon}{1 + \epsilon}}\; a_{n-1} + \gamma\, a_n \right) + ( n + s - \kappa )\, b_n - b_{n-1} &= 0 \\
\left( \sqrt{\frac{1 + \epsilon}{1 - \epsilon}}\; b_{n-1} + \gamma\, b_n \right) - ( n + s + \kappa )\, a_n + a_{n-1} &= 0
\end{aligned} \tag{1572}$$
This equation is automatically satisfied for $n = 0$, since by definition $a_{-1} = b_{-1} \equiv 0$ so the equation reduces to the indicial equation for $s$. These relations yield recursion relations between the coefficients $(a_n, b_n)$ with different values of $n$. The form of the recursion relation can be made explicit by using a relation between $a_n$ and $b_n$ valid for any $n$. This relation is found by multiplying the first relation of eqn(1572) by the factor
$$\sqrt{\frac{1 + \epsilon}{1 - \epsilon}} \tag{1573}$$
and adding it to the second, whereupon one sees that the coefficients with index $n - 1$ vanish. This process results in the equation
$$\left( \sqrt{\frac{1 + \epsilon}{1 - \epsilon}}\; \gamma - ( n + s + \kappa ) \right) a_n + \left( \gamma + \sqrt{\frac{1 + \epsilon}{1 - \epsilon}}\; ( n + s - \kappa ) \right) b_n = 0 \tag{1574}$$
valid for any $n$. The above equation can be used to eliminate the coefficients $b_n$ and yield a recursion relation between $a_n$ and $a_{n-1}$. The ensuing recursion relation will enable us to explicitly calculate the wave functions $F(\rho)$ and hence $G(\rho)$.
Truncation of the Series
The behavior of the recursion relation for large values of $n$ can be found by noting that eqn(1574) yields
$$n\, a_n \sim \sqrt{\frac{1 + \epsilon}{1 - \epsilon}}\; n\, b_n \tag{1575}$$
which when substituted back into the large $n$ limit of the first relation of eqn(1572) yields
$$n\, a_n \sim 2\, a_{n-1} \tag{1576}$$
Since the large $\rho$ limit of the function is dominated by the highest powers of $\rho$, it is seen that if the series does not terminate, the functions $F(\rho)$ and $G(\rho)$ would be exponentially growing functions of $\rho$
$$F(\rho) \sim \exp[\, + 2 \rho \,], \qquad G(\rho) \sim \exp[\, + 2 \rho \,] \tag{1577}$$
Therefore, the set of recursion relations must terminate, since if the series does not terminate, the large $\rho$ behavior of the functions $F(\rho)$ and $G(\rho)$ would be governed by the growing exponentials. Even when combined with the decaying exponential terms that appear in the relations
$$\begin{aligned}
f(r) &= \rho^{s}\; F(\rho)\, \exp[\, - \rho \,] \\
g(r) &= \rho^{s}\; G(\rho)\, \exp[\, - \rho \,]
\end{aligned} \tag{1578}$$
the resulting functions $f(r)$ and $g(r)$ would not satisfy the required boundary conditions at $\rho \to \infty$. We shall assume that the series truncate after the $n_r$-th terms. That is, it is possible to set
$$a_{n_r+1} = 0, \qquad b_{n_r+1} = 0 \tag{1579}$$
Thus, the components of the radial wave function may have $n_r$ nodes. Assuming that the coefficients with indices $n_r + 1$ vanish and using the first relation in eqn(1572) with $n = n_r + 1$, one obtains the condition
$$\sqrt{\frac{1 + \epsilon}{1 - \epsilon}}\; b_{n_r} = -\, a_{n_r} \tag{1580}$$
A second condition is given by the relation between $a_n$ and $b_n$
$$\left( \sqrt{\frac{1 + \epsilon}{1 - \epsilon}}\; \gamma - ( n + s + \kappa ) \right) a_n + \left( \gamma + \sqrt{\frac{1 + \epsilon}{1 - \epsilon}}\; ( n + s - \kappa ) \right) b_n = 0 \tag{1581}$$
valid for any $n$. We shall set $n = n_r$ and then eliminate $a_{n_r}$ using the termination condition expressed by eqn(1580). After some simplification, this leads to the equation
$$\epsilon\, \gamma = ( n_r + s )\, \sqrt{1 - \epsilon^2} \tag{1582}$$
This equation determines the square of the dimensionless energy eigenvalue $\epsilon^2$.
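The simplification that leads to the quantization condition can be filled in as follows. This is a sketch of the intermediate algebra, using the shorthand $\lambda$ (introduced here, not in the original) for the recurring square root:

```latex
\begin{align*}
\lambda &\equiv \sqrt{\frac{1 + \epsilon}{1 - \epsilon}} ,
\qquad a_{n_r} = - \lambda\, b_{n_r}
\quad \text{(termination condition)} \\[4pt]
0 &= \Big( \lambda \gamma - ( n_r + s + \kappa ) \Big) ( - \lambda\, b_{n_r} )
   + \Big( \gamma + \lambda\, ( n_r + s - \kappa ) \Big)\, b_{n_r} \\
  &= \Big( \gamma\, ( 1 - \lambda^2 ) + 2 \lambda\, ( n_r + s ) \Big)\, b_{n_r} \\[4pt]
1 - \lambda^2 &= - \frac{2 \epsilon}{1 - \epsilon}
\quad \Longrightarrow \quad
\epsilon\, \gamma = \lambda\, ( 1 - \epsilon )\, ( n_r + s )
 = ( n_r + s )\, \sqrt{1 - \epsilon^2}
\end{align*}
```

The terms proportional to $\kappa$ cancel in the second line, which is why the resulting condition, and hence the energy, depends on $\kappa$ only through $s$.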
On squaring this equation, simplifying and taking the square root, one ﬁnds
= ±
( n
r
+ s )
_
( n
r
+ s )
2
+ γ
2
(1583)
or, equivalently, the energy of the hydrogen atom
105
is given by
E = ±
m c
2
_
1 +
γ
2
( nr + s )
2
(1584)
where
s =
_
( j +
1
2
)
2
− γ
2
(1585)
This expression for the energy eigenvalue is independent of the sign of $\kappa$ and, therefore, it holds for both cases
\[
\begin{aligned}
j &= ( l + 1 ) - \tfrac{1}{2} \\
j &= l + \tfrac{1}{2}
\end{aligned}
\tag{1586}
\]
Hence, the energy eigenstates are predicted to be doubly degenerate (in addition to the $(2j+1)$ degeneracy associated with $j_3$), since states with the same $j$ but different values of $l$ have the same energy. If the positive-energy eigenvalue is expanded in powers of $\gamma$, one obtains
\[ E \approx m c^2 - \frac{1}{2} \, m c^2 \, \frac{\gamma^2}{( n_r + j + \frac{1}{2} )^2} + \ldots \tag{1587} \]
which agrees with the energy eigenvalues found from the non-relativistic Schrödinger equation. However, as has been seen, the exact energy eigenvalue depends on $n_r$ and $( j + \frac{1}{2} )$ separately, as opposed to being a function of the principal quantum number $n$ which is defined as the sum $n = n_r + j + \frac{1}{2}$. Hence, the Dirac equation lifts the degeneracy between states with different values of the angular momentum. The energy levels together with their quantum numbers are shown in Table(12). The energy splitting between states with the same $n$ and different $j$ values has a magnitude which is governed by the square of the fine structure constant $Z ( \frac{e^2}{\hbar c} )$. That is
\[
E \approx m c^2 \left[ 1 - \frac{1}{2} \frac{\gamma^2}{( n_r + j + \frac{1}{2} )^2} - \frac{1}{2} \frac{\gamma^4}{( n_r + j + \frac{1}{2} )^3} \left( \frac{1}{( j + \frac{1}{2} )} - \frac{3}{4 ( n_r + j + \frac{1}{2} )} \right) + \ldots \right]
\tag{1588}
\]

$^{105}$ C. G. Darwin, Proc. Roy. Soc. A 118, 654 (1928); C. G. Darwin, Proc. Roy. Soc. A 120, 621 (1928); W. Gordon, Zeit. für Physik 48, 11 (1928).
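As a rough numerical check of how well the expansion (1588) tracks the exact result (1584)-(1585), the following sketch evaluates both for a few low-lying levels. The function names and the value of the fine structure constant (with $Z = 1$ assumed) are illustrative choices, not part of the original notes.

```python
import math

GAMMA = 1.0 / 137.035999  # assumed value of Z e^2 / (hbar c) for Z = 1


def dirac_energy(n_r, j, gamma):
    """Exact positive-energy eigenvalue E / (m c^2), eqns (1584)-(1585)."""
    s = math.sqrt((j + 0.5) ** 2 - gamma ** 2)
    return 1.0 / math.sqrt(1.0 + gamma ** 2 / (n_r + s) ** 2)


def expanded_energy(n_r, j, gamma):
    """Expansion of E / (m c^2) through order gamma^4, eqn (1588)."""
    k = j + 0.5
    n = n_r + k  # principal quantum number
    return (1.0
            - 0.5 * gamma ** 2 / n ** 2
            - 0.5 * gamma ** 4 / n ** 3 * (1.0 / k - 3.0 / (4.0 * n)))


if __name__ == "__main__":
    levels = [("1S1/2", 0, 0.5), ("2S1/2 and 2P1/2", 1, 0.5), ("2P3/2", 0, 1.5)]
    for label, n_r, j in levels:
        exact = dirac_energy(n_r, j, GAMMA) - 1.0
        approx = expanded_energy(n_r, j, GAMMA) - 1.0
        print(f"{label:16s} exact: {exact:.12e}   expansion: {approx:.12e}")
```

The two columns should differ only at order $\gamma^6$, and only $n_r$ and $j$ enter, so the $2S_{1/2}$ and $2P_{1/2}$ levels share one entry.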
The fine structure splittings for H-like atoms were first observed by Michelson$^{106}$, and the theoretical prediction is in agreement with the accurate measurements of Paschen$^{107}$. The fine structure splitting is important for atoms with larger $Z$. This observation has a classical interpretation which reflects the fact that for large $Z$ the electrons move in orbits with smaller radii and, therefore, the electrons must move faster. Relativistic effects become more important for electrons which move faster, and this occurs for atoms with larger values of $Z$. Although the fine structure splitting does remove some degeneracy, the two states with the same principal quantum number $n$ and the same angular momentum $j$ but which have different values of $l$ are still predicted to be degenerate. Thus, for example, the $2S_{j=\frac{1}{2}}$ and the $2P_{j=\frac{1}{2}}$ states of Hydrogen are predicted to be degenerate by the Dirac equation. It has been shown that this degeneracy is removed by the Lamb shift, which is due to the interaction of an electron with its own radiation field. The Lamb shift is smaller than the fine structure shifts discussed above because it involves an extra factor of $\frac{e^2}{\hbar c}$.
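$^{106}$ A. A. Michelson, Phil. Mag. 31, 338 (1891).
$^{107}$ F. Paschen, Ann. Phys. 50, 901 (1916).

Table 12: The Equivalence between Relativistic and Spectroscopic Quantum Numbers.

\begin{tabular}{cccccc}
$n = n_r + |\kappa|$ & $n_r$ & $\kappa = \pm ( j + \frac{1}{2} )$ & $nL_j$ & Degenerate Partner & $E / ( m c^2 )$ \\
\hline
1 & 0 & $-1$ & $1S_{1/2}$ & none & $\sqrt{1 - \gamma^2}$ \\
2 & 1 & $-1$ & $2S_{1/2}$ & $2P_{1/2}$ & $\sqrt{1 - \frac{\gamma^2}{2 + 2 \sqrt{1 - \gamma^2}}}$ \\
2 & 1 & $+1$ & $2P_{1/2}$ & $2S_{1/2}$ & same as $2S_{1/2}$ \\
2 & 0 & $-2$ & $2P_{3/2}$ & none & $\sqrt{1 - \frac{1}{4} \gamma^2}$ \\
3 & 2 & $-1$ & $3S_{1/2}$ & $3P_{1/2}$ & $\sqrt{1 - \frac{\gamma^2}{5 + 4 \sqrt{1 - \gamma^2}}}$ \\
3 & 2 & $+1$ & $3P_{1/2}$ & $3S_{1/2}$ & same as $3S_{1/2}$ \\
3 & 1 & $-2$ & $3P_{3/2}$ & $3D_{3/2}$ & $\sqrt{1 - \frac{\gamma^2}{5 + 2 \sqrt{4 - \gamma^2}}}$ \\
3 & 1 & $+2$ & $3D_{3/2}$ & $3P_{3/2}$ & same as $3P_{3/2}$ \\
3 & 0 & $-3$ & $3D_{5/2}$ & none & $\sqrt{1 - \frac{1}{9} \gamma^2}$ \\
\end{tabular}

The Ground State Wave Function

The ground state wave function of the hydrogen atom is slightly singular at the origin. This can be seen by noting that it corresponds to $n_r = 0$ and $\kappa = -1$. Since the dimensionless energy is given by the expression
\[ \epsilon = \sqrt{1 - \gamma^2} \tag{1589} \]
one finds that the dimensionless radial distance $\rho$ is simply given by
\[
\rho = \sqrt{1 - \epsilon^2} \left( \frac{r \, m c}{\hbar} \right) = \gamma \left( \frac{m c}{\hbar} \right) r = \frac{Z e^2 m}{\hbar^2} \, r
\tag{1590}
\]
where the characteristic length scale is just the non-relativistic Bohr radius divided by $Z$. The wave functions are written as
\[
\begin{aligned}
f(\rho) &= \exp[ - \rho ] \, \rho^{s} \, F(\rho) \\
g(\rho) &= \exp[ - \rho ] \, \rho^{s} \, G(\rho)
\end{aligned}
\tag{1591}
\]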
Since $n_r = 0$, the recursion relations terminate immediately, leading to the functions $F(\rho)$ and $G(\rho)$ being given, respectively, by constants $a_0$ and $b_0$. Since $\kappa = -1$ for the ground state, the recursion relations are simply
\[
\begin{aligned}
\gamma \, a_0 + ( s + 1 ) \, b_0 &= 0 \\
\gamma \, b_0 - ( s - 1 ) \, a_0 &= 0
\end{aligned}
\tag{1592}
\]
The solution of the equations results in the index $s$ being given by
\[ s = \sqrt{1 - \gamma^2} \tag{1593} \]
and
\[ \frac{b_0}{a_0} = \frac{s + \kappa}{\gamma} = \frac{\sqrt{1 - \gamma^2} - 1}{\gamma} \tag{1594} \]
This shows that the lower component is smaller than the upper component by approximately $\gamma$, which has the magnitude of $\frac{v}{c}$ where $v$ is the velocity in Bohr's theory. The ratio of $b_0$ to $a_0$ determines the radial functions as
\[
\begin{aligned}
\frac{f(r)}{r} &= a_0 \, \rho^{s-1} \exp[ - \rho ] \\
\frac{g(r)}{r} &= a_0 \, \frac{\sqrt{1 - \gamma^2} - 1}{\gamma} \, \rho^{s-1} \exp[ - \rho ]
\end{aligned}
\tag{1595}
\]
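Since $Y^0_0(\theta, \varphi) = \frac{1}{\sqrt{4 \pi}}$, the angular spherical harmonics for the upper components are just
\[ \Omega_A(\theta, \varphi) = \frac{1}{\sqrt{4 \pi}} \, \chi_\sigma \tag{1596} \]
and the lower components are given by
\[
\Omega_B(\theta, \varphi) = - \left( \frac{\mathbf{r} \cdot \boldsymbol{\sigma}}{r} \right) \frac{1}{\sqrt{4 \pi}} \, \chi_\sigma
= - \frac{1}{\sqrt{4 \pi}}
\begin{pmatrix}
\cos\theta & \sin\theta \, \exp[ - i \varphi ] \\
\sin\theta \, \exp[ + i \varphi ] & - \cos\theta
\end{pmatrix}
\chi_\sigma
\tag{1597}
\]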
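Thus, apart from an overall normalization factor, the four-component spinor Dirac wave function $\psi$ is given by
\[
\psi = \frac{\mathcal{N}}{\sqrt{4 \pi}} \, \rho^{\sqrt{1 - \gamma^2} - 1} \exp[ - \rho ]
\begin{pmatrix}
\chi_\sigma \\
- i \, \frac{\sqrt{1 - \gamma^2} - 1}{\gamma} \left( \frac{\mathbf{r} \cdot \boldsymbol{\sigma}}{r} \right) \chi_\sigma
\end{pmatrix}
\tag{1598}
\]
where the ratio multiplying the lower component is the factor $b_0 / a_0$ of eqn(1594).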
[Figure 50: The large $f(r)$ and small component $g(r)$ radial wave functions for the $1S_{1/2}$ ground state of Hydrogen. Horizontal axis: $m c \gamma r / \hbar$; curves: $f(r)$ and $g(r) \times 100$.]
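Hence, it is seen that as $\rho$ approaches the origin, at first the wave function is slowly varying since
\[
\rho^{\sqrt{1 - \gamma^2} - 1} \sim \exp\left[ - \frac{\gamma^2}{2} \ln \rho \right] \sim 1 - \frac{\gamma^2}{2} \ln \rho
\tag{1599}
\]
but for distances smaller than the characteristic length scale
\[ r_c = \frac{\hbar}{m c \gamma} \exp\left[ - \frac{2}{\gamma^2} \right] \tag{1600} \]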
the wave function exhibits a slight singularity. This length scale is much smaller than the nuclear radius so, due to the spatial distribution of the nuclear charge, the singularity is largely irrelevant. This singularity is not present in the non-relativistic limit, since in this limit one assumes that the inequality $| V(r) | \ll m c^2$ always holds, although this assumption is invalid for $r \sim 0$. Therefore, one concludes that the relativistic theory differs from the non-relativistic theory at small distances, which could have been discerned from the use of the Heisenberg uncertainty principle.
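To get a feeling for how small the length scale of eqn(1600) is, the exponential can be evaluated through its logarithm, since $\exp[ - 2 / \gamma^2 ]$ underflows ordinary floating point arithmetic for light atoms. The sketch below assumes $\gamma = Z \alpha$ with an illustrative value of the fine structure constant; the function name is hypothetical.

```python
import math

ALPHA = 1.0 / 137.035999  # assumed value of the fine structure constant


def log10_rc_ratio(z=1):
    """log10 of r_c / (hbar / (m c gamma)), i.e. log10(exp(-2 / gamma^2))."""
    gamma = z * ALPHA
    return -2.0 / gamma ** 2 / math.log(10.0)


if __name__ == "__main__":
    for z in (1, 20, 80):
        exponent = log10_rc_ratio(z)
        print(f"Z = {z:2d}: r_c is about 10^({exponent:.1f}) times hbar/(m c gamma)")
```

For $Z = 1$ the exponent is of the order of minus sixteen thousand, which illustrates why the singularity is irrelevant in practice for light atoms.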
[Figure 51: The radial wave functions for the $2S_{1/2}$ and $2P_{1/2}$ states of Hydrogen. Two panels ($2S_{1/2}$ and $2P_{1/2}$); horizontal axis: $m c \gamma r / \hbar$; curves: $f(r)$ and $g(r) \times 100$.]
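11.10.2 Lowest-Order Radial Wavefunctions
The first few radial functions for the hydrogen atom can be expressed in the form
\[
\begin{aligned}
f(r) &= \mathcal{N} \sqrt{1 + \left( \frac{E}{m c^2} \right)} \left( \frac{r}{a} \right)^{s} \exp\left[ - \frac{r}{a} \right] \left( c_0 - 2 \, c_1 \left( \frac{r}{a} \right) \right) \\
g(r) &= - \mathcal{N} \sqrt{1 - \left( \frac{E}{m c^2} \right)} \left( \frac{r}{a} \right)^{s} \exp\left[ - \frac{r}{a} \right] \left( d_0 - 2 \, d_1 \left( \frac{r}{a} \right) \right)
\end{aligned}
\tag{1601}
\]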
where the above form is restricted to the case where the radial quantum number $n_r$ takes on the values of 0 or 1. The index $s$ is the same as that which occurs in the Frobenius method and is given by the positive solution
\[ s = \sqrt{\kappa^2 - \gamma^2} \tag{1602} \]
It is seen that the radial wavefunctions depend on the dimensionless variable $\rho$ defined by
\[ \rho = \frac{r}{a} \tag{1603} \]
where the length scale $a$ is given in terms of the energy $E$ and the Compton wavelength by
\[ a = \left[ 1 - \left( \frac{E}{m c^2} \right)^2 \right]^{- \frac{1}{2}} \left( \frac{\hbar}{m c} \right) \tag{1604} \]
The values of the indices $s$, energy $E$, length scale $a$ and normalization $\mathcal{N}$ are given in Table(13). Since the two-component spinor spherical harmonics $\Omega^{l}_{j,j_z}(\theta, \varphi)$ are normalized to unity, the normalization condition is determined from the integral
\[ \mathcal{N}^2 \int_0^\infty dr \left( | f |^2 + | g |^2 \right) = 1 \tag{1605} \]
involving the radial wave functions. The integral is evaluated with the aid of the identity
\[ \int_0^\infty d\rho \; \rho^{a+b} \exp[ - 2 \rho ] = 2^{-(a+b+1)} \, \Gamma(a + b + 1) \tag{1606} \]
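The identity (1606) and the $1S_{1/2}$ normalization can be checked numerically. The following is a minimal sketch using a composite Simpson rule; the helper names, the integration cutoff and the value of $\gamma$ (for $Z = 1$) are assumptions made for illustration, and lengths are measured in units of $a$.

```python
import math

GAMMA = 1.0 / 137.035999  # assumed value of Z e^2 / (hbar c) for Z = 1


def simpson_integral(power, upper=60.0, steps=60000):
    """Composite Simpson estimate of the integral of rho**power * exp(-2 rho)
    over [0, upper]; the tail beyond `upper` is negligible."""
    h = upper / steps

    def integrand(x):
        return x ** power * math.exp(-2.0 * x)

    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * integrand(i * h)
    return total * h / 3.0


def gamma_identity(power):
    """Right-hand side of eqn (1606) with a + b = power."""
    return 2.0 ** (-(power + 1.0)) * math.gamma(power + 1.0)


def ground_state_norm():
    """Integral of |f|^2 + |g|^2 for the 1S state, with a = 1 and c_0 = d_0 = 1.
    The prefactors (1 + E/mc^2) and (1 - E/mc^2) of eqn (1601) sum to 2."""
    s = math.sqrt(1.0 - GAMMA ** 2)
    n_squared = 2.0 ** (2.0 * s + 1.0) / (2.0 * math.gamma(2.0 * s + 1.0))
    return n_squared * 2.0 * simpson_integral(2.0 * s)


if __name__ == "__main__":
    print(simpson_integral(2.0), gamma_identity(2.0))
    print(ground_state_norm())
```

With these assumptions the first line should print two nearly equal numbers and the second a value close to one, consistent with the normalization quoted for the $1S_{1/2}$ state.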
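Table 13: Parameters specifying the Radial Functions for the Hydrogen atom.

\begin{tabular}{lllll}
State & $s$ & $\left( \frac{E}{m c^2} \right)$ & $a \left( \frac{\gamma m c}{\hbar} \right)$ & $\mathcal{N}$ \\
\hline
$1S_{1/2}$, $\kappa = -1$ & $\sqrt{1 - \gamma^2}$ & $\sqrt{1 - \gamma^2}$ & $1$ & $\frac{1}{\sqrt{a}} \frac{2^{s + \frac{1}{2}}}{\sqrt{2 \Gamma(2s+1)}}$ \\
$2S_{1/2}$, $\kappa = -1$ & $\sqrt{1 - \gamma^2}$ & $\sqrt{\frac{1 + \sqrt{1 - \gamma^2}}{2}}$ & $2 \left( \frac{E}{m c^2} \right)$ & $\frac{1}{2} \sqrt{\frac{( \frac{2 E}{m c^2} - 1 )}{( \frac{2 E}{m c^2} )}} \; \frac{1}{\sqrt{a}} \frac{2^{s + \frac{1}{2}}}{\sqrt{\Gamma(2s+1)}}$ \\
$2P_{1/2}$, $\kappa = 1$ & $\sqrt{1 - \gamma^2}$ & $\sqrt{\frac{1 + \sqrt{1 - \gamma^2}}{2}}$ & $2 \left( \frac{E}{m c^2} \right)$ & $\frac{1}{2} \sqrt{\frac{( \frac{2 E}{m c^2} + 1 )}{( \frac{2 E}{m c^2} )}} \; \frac{1}{\sqrt{a}} \frac{2^{s + \frac{1}{2}}}{\sqrt{\Gamma(2s+1)}}$ \\
$2P_{3/2}$, $\kappa = -2$ & $\sqrt{4 - \gamma^2}$ & $\sqrt{1 - \frac{\gamma^2}{4}}$ & $2$ & $\frac{1}{\sqrt{a}} \frac{2^{s + \frac{1}{2}}}{\sqrt{2 \Gamma(2s+1)}}$ \\
\end{tabular}

The coefficients $c_n$ and $d_n$ in the above expansion of the radial functions differ from the coefficients $a_n$ and $b_n$ that occur in the Frobenius expansion, since the value of the ratio $c_{n_r} / d_{n_r}$ has been chosen to simplify in the limit of large $n$. In particular, at the value of $n_r$ (at which the series terminates), the ratio is chosen to satisfy
\[ \frac{c_{n_r}}{d_{n_r}} = 1 \tag{1607} \]
instead of the condition
\[ \frac{a_{n_r}}{b_{n_r}} = - \sqrt{\frac{1 + ( \frac{E}{m c^2} )}{1 - ( \frac{E}{m c^2} )}} \tag{1608} \]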
The relative negative sign and the square root factors in the coefficients have been absorbed into the expressions for the upper and lower components $f(r)$ and $g(r)$. The square root factors are responsible for converting the upper and lower components, respectively, into the large and small components for positive $E$, and vice versa for negative $E$. The expansion coefficients are given in Table(14). Since the ratio of the magnitudes of the polynomial factors is generally of the order of unity, the ratio of the magnitudes of the small to large components is found to be of the order of $\gamma$.

Table 14: Coefficients for the Polynomial in the Hydrogen atom Radial Wave functions.

\begin{tabular}{lllll}
State & $c_0$ & $c_1$ & $d_0$ & $d_1$ \\
\hline
$1S_{1/2}$, $\kappa = -1$ & $1$ & $0$ & $1$ & $0$ \\
$2S_{1/2}$, $\kappa = -1$ & $2 \left( \frac{E}{m c^2} \right)$ & $\frac{( \frac{2 E}{m c^2} ) + 1}{2 s + 1}$ & $2 \left[ \left( \frac{E}{m c^2} \right) + 1 \right]$ & $\frac{( \frac{2 E}{m c^2} ) + 1}{2 s + 1}$ \\
$2P_{1/2}$, $\kappa = 1$ & $2 \left[ \left( \frac{E}{m c^2} \right) - 1 \right]$ & $\frac{( \frac{2 E}{m c^2} ) - 1}{2 s + 1}$ & $2 \left( \frac{E}{m c^2} \right)$ & $\frac{( \frac{2 E}{m c^2} ) - 1}{2 s + 1}$ \\
$2P_{3/2}$, $\kappa = -2$ & $1$ & $0$ & $1$ & $0$ \\
\end{tabular}
11.10.3 The Relativistic Corrections for Hydrogen
The Dirac equation for Hydrogen will be examined in the non-relativistic limit, and the lowest-order relativistic corrections will be retained. The resulting equation will be recast in the form of a Schrödinger equation, in which the Hamiltonian contains additional interaction terms. The resulting interactions, when treated by first-order perturbation theory, yield the fine structure. The physical interpretation of the interactions will be examined. Historically, the following type of analysis and the ensuing discussion of the Thomas precession played a decisive role in compelling Pauli to reluctantly accept Dirac's theory.
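The Dirac equation can be expressed as the set of coupled equations
\[
\begin{aligned}
\left( i \hbar \frac{\partial}{\partial t} - V - m c^2 \right) \phi^A &= - i \hbar c \, ( \boldsymbol{\sigma} \cdot \nabla ) \, \phi^B \\
\left( i \hbar \frac{\partial}{\partial t} - V + m c^2 \right) \phi^B &= - i \hbar c \, ( \boldsymbol{\sigma} \cdot \nabla ) \, \phi^A
\end{aligned}
\tag{1609}
\]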
where $\phi^A$ and $\phi^B$ are, respectively, the upper and lower two-component spinors of the four-component Dirac spinor $\psi$. The energy eigenvalues of these equations are sought, so to this end the explicit time-dependence of the energy eigenstates will be separated out via
\[ \psi = \begin{pmatrix} \phi^A \\ \phi^B \end{pmatrix} \exp\left[ - \frac{i}{\hbar} E \, t \right] \tag{1610} \]
Also the non-relativistic energy $\epsilon$ will be defined as the energy referenced with respect to the rest-mass energy
\[ E = m c^2 + \epsilon \tag{1611} \]
The coupled equations reduce to
\[
\begin{aligned}
( \epsilon - V ) \, \phi^A &= - i \hbar c \, ( \boldsymbol{\sigma} \cdot \nabla ) \, \phi^B \\
( \epsilon - V + 2 m c^2 ) \, \phi^B &= - i \hbar c \, ( \boldsymbol{\sigma} \cdot \nabla ) \, \phi^A
\end{aligned}
\tag{1612}
\]
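The pair of equations will be expanded in powers of $( \frac{p}{m c} )^2$ and only the first-order relativistic corrections will be retained. One can express $\phi^B$ as
\[
\begin{aligned}
\phi^B &= \frac{- i \hbar c \, ( \boldsymbol{\sigma} \cdot \nabla ) \, \phi^A}{\epsilon - V + 2 m c^2} \\
&= \frac{1}{2 m c} \left( 1 + \frac{\epsilon - V}{2 m c^2} \right)^{-1} ( \boldsymbol{\sigma} \cdot \hat{\mathbf{p}} ) \, \phi^A \\
&\approx \frac{1}{2 m c} \left( 1 - \frac{\epsilon - V}{2 m c^2} + \ldots \right) ( \boldsymbol{\sigma} \cdot \hat{\mathbf{p}} ) \, \phi^A
\end{aligned}
\tag{1613}
\]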
to the required order of approximation. The above equation can be used to obtain a Schrödinger-like equation for the two-component spinor $\phi^A$. Since a Schrödinger equation is sought for $\psi_S$, a correspondence must be established between the pair of spinors $( \phi^A , \phi^B )$ and $\psi_S$. The probability density is the physical quantity which is directly associated with both types of wave functions. The probability density associated with the Schrödinger equation should be equivalent to the probability density associated with the Dirac equation. The probability density associated with the four-component Dirac spinor depends on both $\phi^A$ and $\phi^B$,
\[ P(\mathbf{r}) = \left( \phi^{A \dagger} \phi^A + \phi^{B \dagger} \phi^B \right) \tag{1614} \]
The probability density associated with the two-component Schrödinger wave function depends on $\psi_S$
\[ P(\mathbf{r}) = \psi_S^\dagger \, \psi_S \tag{1615} \]
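The probability density is normalized to unity. On equating the two expressions for the normalization and substituting for $\phi^B$, one obtains
\[
\begin{aligned}
\int d^3r \; \psi_S^\dagger \psi_S &= \int d^3r \left( \phi^{A \dagger} \phi^A + \frac{1}{4 m^2 c^2} \, ( \boldsymbol{\sigma} \cdot \hat{\mathbf{p}} \, \phi^A )^\dagger ( \boldsymbol{\sigma} \cdot \hat{\mathbf{p}} \, \phi^A ) \right) \\
&= \int d^3r \left( \phi^{A \dagger} \phi^A + \frac{1}{4 m^2 c^2} \, \phi^{A \dagger} ( \boldsymbol{\sigma} \cdot \hat{\mathbf{p}} ) ( \boldsymbol{\sigma} \cdot \hat{\mathbf{p}} ) \, \phi^A \right) \\
&= \int d^3r \; \phi^{A \dagger} \left( \hat{I} + \frac{\hat{p}^2}{4 m^2 c^2} \right) \phi^A
\end{aligned}
\tag{1616}
\]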
Therefore, the two-component Schrödinger wave function can be identified as
\[ \psi_S = \left( \hat{I} + \frac{\hat{p}^2}{8 m^2 c^2} + \ldots \right) \phi^A \tag{1617} \]
or, on inverting the expansion
\[ \phi^A \approx \left( \hat{I} - \frac{\hat{p}^2}{8 m^2 c^2} + \ldots \right) \psi_S \tag{1618} \]
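Expressing $\phi^A$ in terms of $\psi_S$ in the equation for $\phi^B$ yields the equation
\[
\begin{aligned}
\phi^B &\approx \frac{1}{2 m c} \left( 1 - \frac{\epsilon - V}{2 m c^2} \right) ( \boldsymbol{\sigma} \cdot \hat{\mathbf{p}} ) \left( \hat{I} - \frac{\hat{p}^2}{8 m^2 c^2} \right) \psi_S \\
&\approx \frac{1}{2 m c} \left[ ( \boldsymbol{\sigma} \cdot \hat{\mathbf{p}} ) \left( \hat{I} - \frac{\hat{p}^2}{8 m^2 c^2} \right) - \left( \frac{\epsilon - V}{2 m c^2} \right) ( \boldsymbol{\sigma} \cdot \hat{\mathbf{p}} ) \right] \psi_S
\end{aligned}
\tag{1619}
\]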
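On substituting $\phi^B$ and $\psi_S$ into the equation for $\phi^A$, one finds the (two-component) energy eigenvalue equation
\[
( \epsilon - V ) \left( \hat{I} - \frac{\hat{p}^2}{8 m^2 c^2} \right) \psi_S = \frac{( \boldsymbol{\sigma} \cdot \hat{\mathbf{p}} )}{2 m} \left[ ( \boldsymbol{\sigma} \cdot \hat{\mathbf{p}} ) \left( \hat{I} - \frac{\hat{p}^2}{8 m^2 c^2} \right) - \left( \frac{\epsilon - V}{2 m c^2} \right) ( \boldsymbol{\sigma} \cdot \hat{\mathbf{p}} ) \right] \psi_S
\tag{1620}
\]
or
\[
\left[ \epsilon - V - \epsilon \, \frac{\hat{p}^2}{8 m^2 c^2} + V \frac{\hat{p}^2}{8 m^2 c^2} \right] \psi_S = \left[ \frac{\hat{p}^2}{2 m} \left( \hat{I} - \frac{\hat{p}^2}{8 m^2 c^2} \right) - ( \boldsymbol{\sigma} \cdot \hat{\mathbf{p}} ) \left( \frac{\epsilon - V}{4 m^2 c^2} \right) ( \boldsymbol{\sigma} \cdot \hat{\mathbf{p}} ) \right] \psi_S
\tag{1621}
\]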
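The above energy eigenvalue equation can be expressed as
\[
\left[ \epsilon - V - \frac{\hat{p}^2}{2 m} + \epsilon \, \frac{\hat{p}^2}{8 m^2 c^2} + V \frac{\hat{p}^2}{8 m^2 c^2} \right] \psi_S = \left[ - \frac{\hat{p}^4}{16 m^3 c^2} + ( \boldsymbol{\sigma} \cdot \hat{\mathbf{p}} ) \left( \frac{V}{4 m^2 c^2} \right) ( \boldsymbol{\sigma} \cdot \hat{\mathbf{p}} ) \right] \psi_S
\tag{1622}
\]
The term proportional to the product of the energy eigenvalue and the kinetic energy can be rewritten as
\[
\epsilon \, \frac{\hat{p}^2}{8 m^2 c^2} \, \psi_S = \frac{\hat{p}^2}{8 m^2 c^2} \, \epsilon \, \psi_S \approx \frac{\hat{p}^2}{8 m^2 c^2} \left( V + \frac{\hat{p}^2}{2 m} \right) \psi_S
\tag{1623}
\]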
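to the required order of approximation. On substituting the above expression into the energy eigenvalue equation (1622), one finds
\[
\left[ \epsilon - V - \frac{\hat{p}^2}{2 m} + \frac{\hat{p}^4}{8 m^3 c^2} + V \frac{\hat{p}^2}{8 m^2 c^2} + \frac{\hat{p}^2}{8 m^2 c^2} V \right] \psi_S = \left[ ( \boldsymbol{\sigma} \cdot \hat{\mathbf{p}} ) \left( \frac{V}{4 m^2 c^2} \right) ( \boldsymbol{\sigma} \cdot \hat{\mathbf{p}} ) \right] \psi_S
\tag{1624}
\]
The above equation will be interpreted as the non-relativistic energy eigenvalue equation for the two-component wave function $\psi_S$, which contains relativistic corrections of order $( \frac{v}{c} )^2$. The energy eigenvalue equation (1624) will be written in the form
\[
\epsilon \, \psi_S = \left[ \frac{\hat{p}^2}{2 m} + V \right] \psi_S - \frac{\hat{p}^4}{8 m^3 c^2} \, \psi_S - \left[ \frac{\hat{p}^2 V + V \hat{p}^2}{8 m^2 c^2} \right] \psi_S + \left[ ( \boldsymbol{\sigma} \cdot \hat{\mathbf{p}} ) \left( \frac{V}{4 m^2 c^2} \right) ( \boldsymbol{\sigma} \cdot \hat{\mathbf{p}} ) \right] \psi_S
\tag{1625}
\]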
where the relativistic corrections are symmetric in $p^2$ and $V$. This represents the energy eigenvalue equation for a two-component wave function $\psi_S$, similar to the Schrödinger wave function, but the above equation does include relativistic corrections to the Hamiltonian. The first correction term is
\[ \hat{H}_{Kin} = - \frac{\hat{p}^4}{8 m^3 c^2} \tag{1626} \]
which is recognized as the relativistic kinematic energy correction that originates from the expansion of the kinetic energy
\[ \epsilon = \sqrt{m^2 c^4 + p^2 c^2} - m c^2 \approx \frac{p^2}{2 m} - \frac{p^4}{8 m^3 c^2} + \ldots \tag{1627} \]
The remaining two correction terms
\[
\left[ ( \boldsymbol{\sigma} \cdot \hat{\mathbf{p}} ) \left( \frac{V}{4 m^2 c^2} \right) ( \boldsymbol{\sigma} \cdot \hat{\mathbf{p}} ) \right] - \left[ \frac{\hat{p}^2 V + V \hat{p}^2}{8 m^2 c^2} \right]
\tag{1628}
\]
will be interpreted as the sum of the spin-orbit interaction and the Darwin term. It should be noted that the sum of these two terms would identically cancel in a purely classical theory. This cancellation can be shown to occur since, in the classical limit, $V$ and $p$ commute, and then the Pauli identity can be used to show that the resulting pairs of terms cancel.
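The factor
\[ 2 \, ( \boldsymbol{\sigma} \cdot \hat{\mathbf{p}} ) \, V \, ( \boldsymbol{\sigma} \cdot \hat{\mathbf{p}} ) - \left( \hat{p}^2 V + V \hat{p}^2 \right) \tag{1629} \]
can be evaluated as
\[ 2 \, \hat{\mathbf{p}} \cdot V \hat{\mathbf{p}} - \left( \hat{p}^2 V + V \hat{p}^2 \right) + 2 \, i \, \boldsymbol{\sigma} \cdot \left( \hat{\mathbf{p}} \wedge V \hat{\mathbf{p}} \right) \tag{1630} \]
The first two terms can be combined to form a double commutator, yielding
\[ - [ \, \hat{\mathbf{p}} \, , [ \, \hat{\mathbf{p}} \, , V \, ] \, ] + 2 \, i \, \boldsymbol{\sigma} \cdot \left( \hat{\mathbf{p}} \wedge V \hat{\mathbf{p}} \right) \tag{1631} \]
or
\[ + \hbar^2 \nabla^2 V + 2 \, i \, \boldsymbol{\sigma} \cdot \left( \hat{\mathbf{p}} \wedge V \hat{\mathbf{p}} \right) \tag{1632} \]
The last term can be evaluated, resulting in the expression
\[ + \hbar^2 \nabla^2 V + 2 \, \hbar \, \boldsymbol{\sigma} \cdot \left( \nabla V \wedge \hat{\mathbf{p}} \right) \tag{1633} \]
since
\[ \hat{\mathbf{p}} \wedge \hat{\mathbf{p}} \equiv 0 \tag{1634} \]
Using these substitutions, the remaining interactions can be expressed as the sum of the spin-orbit interaction and the Darwin interaction
\[
\hat{H}_{SO} + \hat{H}_{Darwin} = + \frac{\hbar}{4 m^2 c^2} \, \boldsymbol{\sigma} \cdot \left( \nabla V \wedge \hat{\mathbf{p}} \right) + \frac{\hbar^2}{8 m^2 c^2} \, \nabla^2 V
\tag{1635}
\]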
The first term is the spin-orbit interaction term, and the second term is the Darwin term. For central potentials, the Darwin term is only important for electrons with $l = 0$. The evaluation and the physical interpretation of the energy shifts due to the three fine-structure interactions will be discussed separately.
11.10.4 The Kinematic Correction
The kinematic interaction
\[ \hat{H}_{Kin} = - \frac{\hat{p}^4}{8 m^3 c^2} \tag{1636} \]
originates from the expansion of the relativistic expression for the kinetic energy of a classical particle
\[ \epsilon = \sqrt{m^2 c^4 + p^2 c^2} - m c^2 \approx \frac{p^2}{2 m} - \frac{p^4}{8 m^3 c^2} + \ldots \tag{1637} \]
The first-order energy shift due to the kinematic correction $\hat{H}_{Kin}$ can be evaluated by using the solution to the non-relativistic Schrödinger equation
\[
\frac{\hat{p}^2}{2 m} \, \psi_S = \left[ - \frac{m c^2}{2 n^2} \left( \frac{Z e^2}{\hbar c} \right)^2 + \frac{Z e^2}{r} \right] \psi_S
\tag{1638}
\]
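which leads to
\[
\begin{aligned}
\Delta E_{Kin} &= - \int d^3r \; \psi_S^\dagger(\mathbf{r}) \, \frac{\hat{p}^4}{8 m^3 c^2} \, \psi_S(\mathbf{r}) \\
&= - \frac{1}{2 m c^2} \int d^3r \; \psi_S^\dagger(\mathbf{r}) \left[ - \frac{m c^2}{2 n^2} \left( \frac{Z e^2}{\hbar c} \right)^2 + \frac{Z e^2}{r} \right]^2 \psi_S(\mathbf{r}) \\
&= m c^2 \left( \frac{Z e^2}{\hbar c} \right)^4 \frac{3}{8 n^4} - \frac{Z^2 e^4}{2 m c^2} \int d^3r \; \psi_S^\dagger(\mathbf{r}) \, \frac{1}{r^2} \, \psi_S(\mathbf{r})
\end{aligned}
\tag{1639}
\]
Hence, the first-order energy shift due to the kinematic correction is evaluated as
\[
\Delta E_{Kin} = m c^2 \left( \frac{Z e^2}{\hbar c} \right)^4 \left[ \frac{3}{8 n^4} - \frac{1}{n^3 ( 2 l + 1 )} \right]
\tag{1640}
\]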
This term is found to lift the degeneracy between states with a fixed principal quantum number $n$ and different values of the angular momentum $l$. The relativistic kinematic correction to the energy is found to be smaller than the non-relativistic energy by a factor of
\[ \left( \frac{Z e^2}{\hbar c} \right)^2 \sim Z^2 \, 10^{-4} \tag{1641} \]
which can be identified with a factor of $( \frac{v}{c} )^2$ as can be inferred from an analysis based on the Bohr model of the atom. One sees that the relativistic corrections become more important for atoms with larger $Z$, since the correction varies as $Z^4$. This occurs because for larger $Z$ the electrons are drawn closer to the nucleus and, hence, have higher kinetic energies, so the electrons' velocities draw closer to the velocity of light.
11.10.5 Spin-Orbit Coupling
To elucidate the meaning of the spin-orbit interaction, the interaction will be rederived starting from quasi-classical considerations of the anomalous Zeeman interaction of a spin with a magnetic field.
Consider a particle moving with a velocity $\mathbf{v}$ in a static electric field $\mathbf{E}$. In the particle's rest frame, it will experience a magnetic field $\mathbf{B}'$ which is given by
\[
\mathbf{B}' = - \frac{1}{c} \, \frac{\mathbf{v} \wedge \mathbf{E}}{\sqrt{1 - ( \frac{v}{c} )^2}} \approx \frac{1}{c} \, \mathbf{E} \wedge \mathbf{v}
\tag{1642}
\]
for small velocities $v$. The magnetic field $\mathbf{B}'$ is a relativistic correction due to the motion of the source of the electric field. If an electron is moving in a central electrostatic potential $\phi(r)$ caused by a charged nucleus, the radial electric field is given by
\[ \mathbf{E} = - \frac{\mathbf{r}}{r} \left( \frac{\partial \phi}{\partial r} \right) \tag{1643} \]
Hence, the magnetic field experienced by an electron in its rest frame is given by
\[
\begin{aligned}
\mathbf{B}' &= - \frac{1}{m c \, r} \left( \frac{\partial \phi}{\partial r} \right) \mathbf{r} \wedge \mathbf{p} \\
&= - \frac{1}{m c \, r} \left( \frac{\partial \phi}{\partial r} \right) \mathbf{L}
\end{aligned}
\tag{1644}
\]
which is caused by the apparent rotation of the charged nucleus. In the electron's rest frame, the electron's spin $\mathbf{S}$ should interact with the magnetic field through the Zeeman interaction
\[ \hat{H}^{rest}_{Int} = - \frac{q}{2 m c} \, g_S \, \mathbf{B}' \cdot \mathbf{S} \tag{1645} \]
where $g_S$ is the gyromagnetic ratio for the electron's spin. Dirac's theory predicts that the spin is a relativistic phenomenon and also that $g_S = 2$ for an electron in its rest frame. This interaction with the magnetic field will cause the spin of the electron to precess. The spin precession rate found in the electron's rest frame is calculated as
\[ \boldsymbol{\omega}_{rest} = \frac{e}{2 m c} \, g_S \, \mathbf{B}' \tag{1646} \]
However, the electron is bound to the nucleus and is orbiting with angular
momentum L. Therefore, one has to consider the corrections to the precession
rate (and the interaction) caused by the acceleration of the electron’s rest frame.
Thomas Precession
Electrons exhibit two different gyromagnetic ratios. The gyromagnetic ratio of $g_S = 2$ couples a spin to an external magnetic field and there is a gyromagnetic ratio of unity for the lab frame. This gyromagnetic ratio of unity (in the lab frame) enters the coupling between the spin of an electron in a circular orbit to the magnetic field $\mathbf{B}'$ experienced in the electron's rest frame$^{108}$. We shall find the gyromagnetic ratio in the lab frame, by calculating the rate of precession that is observed in the lab frame and then inferring the (lab frame) interaction which produces the same rate of precession.
In the electron's rest frame, the gyromagnetic ratio due to the orbital magnetic field $\mathbf{B}'$ (caused by the charged nucleus) is given by $g_S = 2$. This gyromagnetic ratio yields a spin precession rate in the electron's rest frame of
\[ \boldsymbol{\omega}_{rest} = \frac{e}{2 m c} \, g_S \, \mathbf{B}' \tag{1647} \]
The spin precession rate observed in the lab frame will be calculated later. The rate of precession as observed in the electron's rest frame has to be corrected by taking into account the motion of the electron. The correction is due to the non-additivity of velocities in successive Lorentz transformations. First, the transformation properties of Dirac spinors under infinitesimal rotations and boosts will be re-examined. Secondly, infinitesimal transformations will be successively applied to describe the particle's instantaneous rest frame and the Thomas precession.
A Lorentz transform of a spinor field $\psi$ is achieved by the rotation operator $\hat{\mathcal{R}}$ via
\[ \psi'(\mathbf{r}) = \hat{\mathcal{R}} \, \psi(R^{-1} \mathbf{r}) \tag{1648} \]
where $\hat{\mathcal{R}}$ shuffles the components of the spinor. For a passive rotation (of the coordinate system) through the infinitesimal angle $\delta\varphi$ in the $i$-$j$ plane, the infinitesimal Lorentz transform has the non-zero elements
\[ \epsilon_{i,j} = - \epsilon_{j,i} = - \delta\varphi \tag{1649} \]
Hence, the four-component spinor is transformed by a rotation operator of the form
\[ \hat{\mathcal{R}}(\delta\varphi) = \exp\left[ + i \, \frac{\delta\varphi}{2} \, \sigma_{i,j} \right] \tag{1650} \]

$^{108}$ L. H. Thomas, Nature 117, 514 (1926); L. H. Thomas, Phil. Mag. 3, 1 (1927).
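where
\[
\begin{aligned}
\sigma_{i,j} &= \frac{i}{2} \, [ \, \gamma^{(i)} , \gamma^{(j)} \, ] \\
&= \sum_k \xi_{i,j,k} \begin{pmatrix} \sigma^{(k)} & 0 \\ 0 & \sigma^{(k)} \end{pmatrix}
\end{aligned}
\tag{1651}
\]
Hence, for a passive rotation through an infinitesimal angle $\delta\varphi$, the four-component Dirac spinor is rotated by
\[
\hat{\mathcal{R}}(\delta\varphi) = \hat{I} + i \, \frac{\delta\varphi}{2} \sum_k \xi_{i,j,k} \begin{pmatrix} \sigma^{(k)} & 0 \\ 0 & \sigma^{(k)} \end{pmatrix} + \ldots
\tag{1652}
\]
which can be expressed in terms of the projection of the (block diagonal) spin operator $\hat{\mathbf{S}} = \frac{\hbar}{2} \hat{\boldsymbol{\sigma}}$ on the axis of rotation $\hat{e}$ as
\[
\begin{aligned}
\hat{\mathcal{R}}(\delta\varphi) &= \hat{I} + \frac{i \, \delta\varphi}{2} \, ( \hat{e} \cdot \hat{\boldsymbol{\sigma}} ) + \ldots \\
&= \hat{I} + \frac{i \, \delta\varphi}{\hbar} \, ( \hat{e} \cdot \hat{\mathbf{S}} ) + \ldots
\end{aligned}
\tag{1653}
\]
which is in accord with the definition of spin $\hat{\mathbf{S}}$ as the generator of rotations.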
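If the primed frame of reference has a velocity $v$ along the $k$-axis relative to the unprimed frame, the infinitesimal Lorentz transform has the non-zero elements
\[ \epsilon_{0,k} = - \epsilon_{k,0} = - \frac{v}{c} \tag{1654} \]
A Lorentz boost along the $k$-axis corresponds to a rotation in the $0$-$k$ plane through an "angle" $\chi$
\[ \hat{\mathcal{R}}(\chi) = \exp\left[ + i \, \frac{\chi}{2} \, \sigma_{0,k} \right] \tag{1655} \]
where the "angle" $\chi$ is governed by the boost velocity $v$ through
\[ \tanh \chi = \frac{v}{c} \tag{1656} \]
However,
\[ \sigma_{0,k} = \frac{i}{2} \, [ \, \gamma^{(0)} , \gamma^{(k)} \, ] = i \, \alpha^{(k)} \tag{1657} \]
so
\[ \hat{\mathcal{R}}(\chi) = \exp\left[ - \frac{\chi}{2} \, \alpha^{(k)} \right] \tag{1658} \]
Therefore, for a Lorentz boost with an infinitesimal velocity $v$ along the $k$-th direction, one finds
\[ \hat{\mathcal{R}}(\chi) = \hat{I} - \frac{v}{2 c} \, \alpha^{(k)} + \ldots \tag{1659} \]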
The infinitesimal transformation is guaranteed to be consistent with the source-free solution of the Dirac equation. For example, if the above transformation is applied to the solution of the Dirac equation describing a positive-energy particle at rest, the transformed solution describes a particle moving with momentum $\mathbf{p} = - m \mathbf{v}$ when viewed from the moving frame of reference.
[Figure 52: A cartoon depicting a rotating charged spin one-half particle, along with the precession of the spin due to the external field in the particle's rest frame and the Thomas precession. Labels: $q$, $\mathbf{v}$, $\mathbf{a}$, $\mathbf{E}$, $\boldsymbol{\omega}_{Rest}$, $\boldsymbol{\omega}_{T}$.]
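Consider the rotation $\hat{\mathcal{R}}_1$ of a spinor due to an infinitesimal Lorentz transformation with "small" velocity $\mathbf{v}$, then
\[ \hat{\mathcal{R}}_1 = \hat{I} - \frac{1}{2 c} \, \boldsymbol{\alpha} \cdot \mathbf{v} + \ldots \tag{1660} \]
At a time $\delta t$ later, the electron has changed its velocity since it is accelerating. The new velocity of the electron's rest frame is given by
\[ \mathbf{v}' = \mathbf{v} + \mathbf{a} \, \delta t \tag{1661} \]
On performing a second Lorentz transform with the boost $\mathbf{a} \, \delta t$, one finds the rotation
\[ \hat{\mathcal{R}}_2 = \hat{I} - \frac{1}{2 c} \, \boldsymbol{\alpha} \cdot \mathbf{a} \, \delta t + \ldots \tag{1662} \]
The combined Lorentz transform is given by
\[
\begin{aligned}
\hat{\mathcal{R}} = \hat{\mathcal{R}}_2 \, \hat{\mathcal{R}}_1 &= \left( \hat{I} - \frac{1}{2 c} \, \boldsymbol{\alpha} \cdot \mathbf{a} \, \delta t \right) \left( \hat{I} - \frac{1}{2 c} \, \boldsymbol{\alpha} \cdot \mathbf{v} \right) + \ldots \\
&= \hat{I} - \frac{1}{2 c} \, \boldsymbol{\alpha} \cdot ( \mathbf{v} + \mathbf{a} \, \delta t ) + \frac{1}{4 c^2} \, ( \boldsymbol{\alpha} \cdot \mathbf{a} ) ( \boldsymbol{\alpha} \cdot \mathbf{v} ) \, \delta t + \ldots
\end{aligned}
\tag{1663}
\]
The Pauli identity can be used to evaluate the last term
\[
\begin{aligned}
( \boldsymbol{\alpha} \cdot \mathbf{a} ) ( \boldsymbol{\alpha} \cdot \mathbf{v} ) &= ( \hat{\boldsymbol{\sigma}} \cdot \mathbf{a} ) ( \hat{\boldsymbol{\sigma}} \cdot \mathbf{v} ) \\
&= \mathbf{a} \cdot \mathbf{v} \; \hat{I} + i \, \hat{\boldsymbol{\sigma}} \cdot ( \mathbf{a} \wedge \mathbf{v} )
\end{aligned}
\tag{1664}
\]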
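where the product of the two $\alpha$'s yields a two-by-two block-diagonal form which involves the four-by-four matrices $\hat{I}$ and $\hat{\boldsymbol{\sigma}}$. Hence, the right-hand side acts equally on both the upper and lower two-component spinors. Furthermore, since the orbit is circular, the acceleration is perpendicular to the velocity, therefore
\[ \mathbf{a} \cdot \mathbf{v} = 0 \tag{1665} \]
Thus, the combined boost corresponds to the transformation
\[
\hat{\mathcal{R}} = \hat{I} - \frac{1}{2 c} \, \boldsymbol{\alpha} \cdot ( \mathbf{v} + \mathbf{a} \, \delta t ) + \frac{i}{4 c^2} \, \hat{\boldsymbol{\sigma}} \cdot ( \mathbf{a} \wedge \mathbf{v} ) \, \delta t + \ldots
\tag{1666}
\]
The combined boost is identified as producing an infinitesimal Lorentz boost through $\mathbf{v} + \mathbf{a} \, \delta t$ and a rotation around an axis $\hat{e}$ through the infinitesimal angle $\delta\varphi$ given by
\[ \delta\varphi \; \hat{e} \approx \frac{1}{2 c^2} \, ( \mathbf{a} \wedge \mathbf{v} ) \, \delta t \tag{1667} \]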
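The Pauli identity used in eqn(1664) can be verified numerically for arbitrary vectors. The sketch below does this with plain nested lists for the two-by-two matrices; all function names are illustrative, and the particular vectors are arbitrary choices rather than values taken from the text.

```python
# Pauli matrices sigma^(1), sigma^(2), sigma^(3) as nested tuples.
SIGMA = (
    ((0, 1), (1, 0)),
    ((0, -1j), (1j, 0)),
    ((1, 0), (0, -1)),
)


def sigma_dot(vec):
    """Return the 2x2 matrix sigma . vec."""
    return [[sum(vec[k] * SIGMA[k][r][c] for k in range(3)) for c in range(2)]
            for r in range(2)]


def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]


def cross(a, b):
    """Vector product a ^ b."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def pauli_identity_residual(a, v):
    """Largest element of (sigma.a)(sigma.v) - [a.v I + i sigma.(a ^ v)]."""
    lhs = matmul(sigma_dot(a), sigma_dot(v))
    dot = sum(x * y for x, y in zip(a, v))
    rot = sigma_dot(cross(a, v))
    rhs = [[dot * (1 if r == c else 0) + 1j * rot[r][c] for c in range(2)]
           for r in range(2)]
    return max(abs(lhs[r][c] - rhs[r][c]) for r in range(2) for c in range(2))


if __name__ == "__main__":
    print(pauli_identity_residual((1.0, 2.0, -0.5), (0.3, -1.0, 4.0)))
    # Circular orbit: a is perpendicular to v, so only the rotation term survives.
    print(cross((-1, 0, 0), (0, 1, 0)))
```

For perpendicular $\mathbf{a}$ and $\mathbf{v}$ the scalar term drops out, so the product of the two boosts leaves only the rotation generated by $\hat{\boldsymbol{\sigma}} \cdot ( \mathbf{a} \wedge \mathbf{v} )$, as stated in eqn(1666).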
The rotation part acts on both the upper and lower two-component spinors in the Dirac spinor. The rotation angle $\delta\varphi$ is linearly proportional to the time interval $\delta t$. Rotations of this class, which arise from the combination of Lorentz boosts, are known as Wigner rotations. Hence, it was shown that the spinor rotates with the angular velocity given by
\[
\begin{aligned}
\boldsymbol{\omega}_T &= \frac{1}{2 c^2} \, ( \mathbf{a} \wedge \mathbf{v} ) \\
&= \frac{q}{2 m c^2} \, ( \mathbf{E} \wedge \mathbf{v} )
\end{aligned}
\tag{1668}
\]
The magnitude of $\boldsymbol{\omega}_T$ is calculated as
\[ \omega_T = \frac{e}{2 m c} \, B' \tag{1669} \]
and its direction is opposite to the precession of the spin in the electron's rest frame.
On combining the two precession frequencies, one finds that in the lab frame the spin's precession rate is given by
\[
\begin{aligned}
\omega_{Lab} &= \omega_{rest} - \omega_T \\
&= \frac{e}{2 m c} \, ( g_S - 1 ) \, B'
\end{aligned}
\tag{1670}
\]
It is clear that the moving spin experiences an effective interaction which is reduced by the factor
\[ \left( \frac{g_S - 1}{g_S} \right) \tag{1671} \]
when compared to the interaction in the electron's rest frame. Hence, the gyromagnetic ratio that enters the spin-orbit coupling should not be $g_S$ but should be given by $( g_S - 1 )$.
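The Spin-Orbit Interaction
In the lab frame, the interaction between the magnetic moment of the moving electron's spin $\mathbf{S}$ and its field is inferred to be
\[ \hat{H}^{Lab}_{Int} = - \frac{q}{2 m c} \, ( g_S - 1 ) \, \mathbf{B}' \cdot \mathbf{S} \tag{1672} \]
where $g_S$ is the gyromagnetic ratio. Since the magnetic induction field is given by
\[ \mathbf{B}' = - \frac{1}{m c \, r} \left( \frac{\partial \phi}{\partial r} \right) \mathbf{L} \tag{1673} \]
where the electrostatic potential is given by
\[ \phi(r) = \frac{q}{r} \tag{1674} \]
the spin-orbit interaction can be expressed as
\[ \hat{H}_{SO} = - \frac{q}{2 m c} \, ( g_S - 1 ) \, \frac{q}{m c \, r^3} \, \mathbf{L} \cdot \mathbf{S} \tag{1675} \]
Hence, the spin-orbit interaction is found to be given by
\[ \hat{H}_{SO} = \frac{Z e^2}{2 m^2 c^2 r^3} \, ( g_S - 1 ) \, \mathbf{L} \cdot \mathbf{S} \tag{1676} \]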
The spin-orbit coupling is a relativistic coupling which, apart from the Thomas precession factor, indicates that the electron's spin interacts with a magnetic field in its rest frame via the gyromagnetic ratio of 2. The magnitude of the interaction agrees precisely with the interaction found from the perturbative treatment of the Dirac equation.
To first-order in perturbation theory, the spin-orbit coupling interaction yields a shift of the energy levels. Since the total angular momentum $\mathbf{J}$ is a good quantum number, one can write
\[ \mathbf{L} \cdot \mathbf{S} = \frac{\hbar^2}{2} \left[ j ( j + 1 ) - l ( l + 1 ) - \frac{3}{4} \right] \tag{1677} \]
but $j$ for a single electron can only take on the values $j = l \pm \frac{1}{2}$, so
\[ \mathbf{L} \cdot \mathbf{S} = \frac{\hbar^2}{2} \left[ \pm ( l + \tfrac{1}{2} ) - \tfrac{1}{2} \right] \tag{1678} \]
The expectation value of $r^{-3}$ is evaluated as
\[
\int d^3r \; \psi_S^\dagger(\mathbf{r}) \, \frac{1}{r^3} \, \psi_S(\mathbf{r}) = \frac{1}{l \, ( l + \frac{1}{2} ) \, ( l + 1 )} \left( \frac{Z}{n a} \right)^3
\tag{1679}
\]
for $l \neq 0$. So the first-order energy-shift due to the spin-orbit coupling can be expressed as
\[
\Delta E_{SO} = m c^2 \left( \frac{Z e^2}{\hbar c} \right)^4 \left[ \frac{\pm ( l + \frac{1}{2} ) - \frac{1}{2}}{4 n^3 \, l \, ( l + \frac{1}{2} ) \, ( l + 1 )} \right]
\tag{1680}
\]
Therefore, the spin-orbit interaction lifts the degeneracy between states with different $j = l \pm \frac{1}{2}$ values. For $l = 0$, the numerator vanishes since the total angular momentum can only take the value
\[ j = + \frac{1}{2} \tag{1681} \]
The energy shift produced by the spin-orbit coupling is about a factor of the square of the fine structure constant
\[ \left( \frac{e^2}{\hbar c} \right)^2 \sim \left( \frac{1}{137} \right)^2 \sim 10^{-4} \tag{1682} \]
smaller than the energy levels of the hydrogen-like atom
\[ E_n \approx - \frac{m c^2}{2 n^2} \left( \frac{Z e^2}{\hbar c} \right)^2 \tag{1683} \]
calculated using the non-relativistic Schrödinger equation. The spin-orbit split levels are labeled by the angular momentum values and the $j$ values, and are denoted by $nL_j$. Hence, for $n = 2$ and $l = 1$, one has the two levels $2P_{1/2}$ and $2P_{3/2}$, while for $n = 3$ and $l = 2$ one has the levels $3D_{3/2}$ and $3D_{5/2}$, and so on. It is seen that the spin-orbit interaction is increasingly important for atoms with large $Z$ values, as it varies like $Z^4$.
11.10.6 The Darwin Term
The Darwin term has no obvious classical interpretation. It only has physical consequences for states with zero orbital angular momentum. However, it does play an important role for the $s$ electronic states of hydrogen, and is essential in describing why Dirac's theory makes the $2S_{1/2}$ and $2P_{1/2}$ states of hydrogen degenerate. This degeneracy was an essential ingredient in the discovery of the Lamb shift and the subsequent development of Quantum Electrodynamics.
The Darwin interaction is given by
\[ \hat{H}_{Darwin} = \frac{\pi Z e^2 \hbar^2}{2 m^2 c^2} \, \delta^3(\mathbf{r}) \tag{1684} \]
which produces the first-order shift
\[ \Delta E_{Darwin} = \frac{\pi Z e^2 \hbar^2}{2 m^2 c^2} \, \psi_S^\dagger(0) \, \psi_S(0) \tag{1685} \]
Hence, the shift only occurs for electrons with $l = 0$. Furthermore, since the probability density for finding the electron at the origin is given by
\[ \psi_S^\dagger(0) \, \psi_S(0) = \frac{1}{\pi} \left( \frac{Z}{n a} \right)^3 \delta_{l,0} \tag{1686} \]
to first order, the Darwin term produces a shift
\[ \Delta E_{Darwin} = \frac{m c^2 Z^4}{2 n^3} \left( \frac{e^2}{\hbar c} \right)^4 \delta_{l,0} \tag{1687} \]
which shifts the energies of $s$ states upwards. The Darwin term reflects the fact that the relativistic corrections are important for small $r$ since the inequality
\[ m c^2 \gg \frac{Z e^2}{r} \tag{1688} \]
required for the non-relativistic treatment to be reasonable is violated in this region.
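The steps connecting the Darwin term of eqn(1635) to eqns(1684) and (1687) can be spelled out as follows. This is a sketch which assumes the attractive Coulomb potential $V(r) = - Z e^2 / r$ and the non-relativistic Bohr radius $a = \hbar^2 / ( m e^2 )$, neither of which is written explicitly at this point in the text.

```latex
\begin{align*}
V(r) &= - \frac{Z e^2}{r} , \qquad
\nabla^2 \frac{1}{r} = - 4 \pi \, \delta^3(\mathbf{r})
\quad \Rightarrow \quad
\nabla^2 V = 4 \pi Z e^2 \, \delta^3(\mathbf{r}) \\
\hat{H}_{Darwin} &= \frac{\hbar^2}{8 m^2 c^2} \, \nabla^2 V
= \frac{\pi Z e^2 \hbar^2}{2 m^2 c^2} \, \delta^3(\mathbf{r}) \\
\Delta E_{Darwin} &= \frac{\pi Z e^2 \hbar^2}{2 m^2 c^2} \, \frac{1}{\pi} \left( \frac{Z}{n a} \right)^3 \delta_{l,0}
= \frac{Z^4 e^2 \hbar^2}{2 m^2 c^2 n^3} \left( \frac{m e^2}{\hbar^2} \right)^3 \delta_{l,0}
= \frac{m c^2 Z^4}{2 n^3} \left( \frac{e^2}{\hbar c} \right)^4 \delta_{l,0}
\end{align*}
```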
11.10.7 The Fine Structure of Hydrogen
[Figure 53: The Grotrian energy level diagram for the $n = 2$ shell of hydrogen (blue). The diagram shows the magnitude and sign of the various relativistic corrections (Kinematic, Darwin, Spin-Orbit) for the $2S_{1/2}$, $2P_{1/2}$ and $2P_{3/2}$ levels. It should be noted that states with the same $j$ are degenerate.]
When the various relativistic corrections are combined, for $l = 0$, the Darwin term exactly compensates for the absence of the spin-orbit interaction. Therefore, the energy shifts combine to yield one formula in which $l$ drops out. This implies that the energy levels only depend on the principal quantum number $n$ and the total angular momentum $j$. States with different orbital angular momenta are degenerate, even though the individual interactions appear to lift the degeneracy. The relativistic corrections inherent in Dirac's theory of hydrogen yield energy shifts and line-splittings which are described as fine structure.
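The statement that $l$ drops out of the combined shift can be checked with exact rational arithmetic. The sketch below adds the kinematic shift (1640), the spin-orbit shift (1680) and the Darwin shift (1687), all in units of $m c^2 ( Z e^2 / \hbar c )^4$, and compares the sum with the order-$\alpha^4$ term of the fine structure formula. The function names are illustrative only.

```python
from fractions import Fraction as F


def kinematic(n, l):
    """Eqn (1640) in units of m c^2 (Z alpha)^4."""
    return F(3, 8 * n ** 4) - F(1, n ** 3 * (2 * l + 1))


def spin_orbit(n, l, j):
    """Eqn (1680) in the same units; vanishes for l = 0."""
    if l == 0:
        return F(0)
    l_dot_s = (j * (j + 1) - l * (l + 1) - F(3, 4)) / 2  # in units of hbar^2
    return l_dot_s / (2 * n ** 3 * l * (l + F(1, 2)) * (l + 1))


def darwin(n, l):
    """Eqn (1687) in the same units; non-zero only for l = 0."""
    return F(1, 2 * n ** 3) if l == 0 else F(0)


def fine_structure(n, j):
    """Order alpha^4 term of the combined formula, which depends on n and j only."""
    return -F(1, 2 * n ** 3) * (1 / (j + F(1, 2)) - F(3, 4 * n))


if __name__ == "__main__":
    for n in range(1, 4):
        for l in range(n):
            for j in (l - F(1, 2), l + F(1, 2)):
                if j < 0:
                    continue
                total = kinematic(n, l) + spin_orbit(n, l, j) + darwin(n, l)
                assert total == fine_structure(n, j)
                print(f"n={n} l={l} j={j}: {total}")
```

Because the comparison uses fractions, agreement is exact rather than approximate; for example the $l = 0$ and $l = 1$ states with $j = \frac{1}{2}$ and $n = 2$ give the same total.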
The energy levels are described by
\[
E \approx m c^2 \left[ 1 - \frac{1}{2} \frac{Z^2 \alpha^2}{n^2} - \frac{1}{2} \frac{Z^4 \alpha^4}{n^3} \left( \frac{1}{( j + \frac{1}{2} )} - \frac{3}{4 n} \right) + \ldots \right]
\tag{1689}
\]
where
\[ \alpha = \left( \frac{e^2}{\hbar c} \right) \tag{1690} \]
is the fine structure constant. Generally, states with larger $j$ values have higher energies. The fine structure splittings decrease with increasing $n$ like $n^{-3}$, but increase with increasing $Z$ like $Z^4$. The splittings of the lower energy levels are largest, for example
\[
E_{2P_{3/2}} - E_{2P_{1/2}} = - \frac{m c^2 \alpha^4}{16} \left( \frac{1}{2} - \frac{1}{1} \right) \approx 4.533 \times 10^{-5} \; \mathrm{eV}
\tag{1691}
\]
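The size of this splitting can be reproduced from the order-$\alpha^4$ term of eqn(1689). The sketch below uses rounded values for the electron rest energy, the fine structure constant and Planck's constant; these numerical constants and the function name are assumptions made for illustration, so the last digits may differ slightly from those quoted in the text.

```python
MC2_EV = 510998.95        # assumed electron rest energy in eV
ALPHA = 1.0 / 137.035999  # assumed fine structure constant
H_EV_S = 4.135667696e-15  # assumed Planck constant in eV s


def splitting_2p_ev(z=1):
    """E(2P3/2) - E(2P1/2) in eV from the order alpha^4 term of eqn (1689)."""
    def shift(n, j):
        return -0.5 * (z * ALPHA) ** 4 / n ** 3 * (1.0 / (j + 0.5) - 3.0 / (4.0 * n))

    return MC2_EV * (shift(2, 1.5) - shift(2, 0.5))


if __name__ == "__main__":
    delta_e = splitting_2p_ev()
    print(f"splitting: {delta_e:.4e} eV")
    print(f"frequency: {delta_e / H_EV_S / 1e9:.2f} GHz")
```

With these inputs the splitting comes out at roughly $4.5 \times 10^{-5}$ eV, corresponding to a frequency close to 11 GHz.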
This splitting corresponds to a frequency of 10.96 GHz. The energy levels are
predicted to be doubly degenerate (in addition to the degeneracy associated
with j
3
), the degeneracy is just the number of states with diﬀerent l values that
yield the same value of j. Since j is found by combining l with the electronic
spin s =
1
2
, there are two possible l values for each energy level which are given
by the solutions of either
j = l +
1
2
(1692)
or
j = l −
1
2
(1693)
The higherorder relativistic corrections do not alter the conclusion that the
states labeled by (n, j) are degenerate, as the energy levels found from the exact
solution of the Dirac equation only depend on n and j. For j =
1
2
the energy
levels, although predicted to be degenerate by Dirac’s theory, are experimentally
observed as being nondegenerate. The ﬁrst experiments that revealed this split
ting were performed by Lamb and Retherford
109
. These scientists found that
the 2S1
2
was shifted by about 1057 MHz to higher energies relative to the 2P1
2
.
The relative shift of the nS1
2
level of hydrogen with respect to the nP1
2
level is
known as the Lamb shift.
Lamb and Retherford’s Experiment
[109] W. E. Lamb Jr. and R. C. Retherford, Phys. Rev. 72, 241 (1947).
[Figure 54: energy level diagram with the energy E plotted vertically and columns labeled S, P and D. The n = 3 levels $3S_{1/2}$, $3P_{1/2}$, $3P_{3/2}$, $3D_{3/2}$ and $3D_{5/2}$ are shown, with the shifts labeled Kinematic, Darwin and Spin Orbit.]
Figure 54: The Grotrian energy level diagram for the n = 3 shell of hydrogen
(blue). The diagram shows the magnitude and sign of the various relativistic
corrections. It should be noted that states with the same j are degenerate.
Lamb and Retherford designed an experiment to accurately measure the fine
structure of the hydrogen atom. In the experiment, the time scales were such
that the populations of all excited states, other than the metastable $2S_{1/2}$ state
of hydrogen, radiatively decayed to the ground state. Hence, the number of
induced transitions from the $2S_{1/2}$ state could be monitored by simply observing
the population of hydrogen atoms not in the ground state.
[Figure 55: schematic showing the oven, the hydrogen beam (H), the crossed electron beam (e), the EM cavity and the detector current I.]
Figure 55: A schematic of the apparatus used in the Lamb-Retherford experiment.
A beam of hydrogen atoms is produced in an oven and is excited
by cross-bombardment with an electron beam. The population of the $2S_{1/2}$ state is
altered in the microwave resonator, and the surviving population is observed via the current
emitted at the tungsten plate.
A beam of hydrogen atoms was produced by dissociating hydrogen molecules
in an oven. The thermal beam of hydrogen atoms was then cross-bombarded
with electrons, which excited some of the hydrogen atoms out of the ground
state. Since electron-atom scattering does not obey the radiation selection
rules, a finite population of atoms (about 1 in $10^8$) was excited to the long-lived
$2S_{1/2}$ state. Subsequently, the other excited electronic states rapidly decayed to
the ground state by the emission of radiation. The beam of hydrogen atoms was
then passed through a tuneable (microwave) electromagnetic resonator, which
could cause the hydrogen atoms in the metastable level to make transitions
to selected nearby energy levels. Again, any non-2S excited state of hydrogen
produced by the action of the resonator rapidly decayed to the ground state.
The resulting beam of hydrogen atoms was incident on a tungsten plate, and
the collision could result in electron emission if the atom was in an excited
state, but no emission would take place if the hydrogen atom was in the ground
state. Therefore, the current due to the emitted electrons was proportional to
the number of metastable hydrogen atoms that survived the passage through
the resonator. Hence, analysis of the experiment yielded the number of transitions
undergone in the electromagnetic resonator.
Figure 56: The dependence of the current emitted from the tungsten plate on
the applied magnetic ﬁeld. The resonance frequency was set to 9487 Megacycles.
[W. E. Lamb Jr. and R. C. Retherford, Phys. Rev. 72, 241 (1947).]
In the resonator, an applied magnetic field Zeeman split the excited levels of
hydrogen and, when the oscillating field was on resonance with the splitting of
the energy levels, the hydrogen atoms made transitions out of the metastable
$2S_{1/2}$ state. At resonance, the frequency of the oscillating electromagnetic field,
multiplied by Planck's constant, is equal to the energy splitting. Therefore, for fixed frequency, knowledge of the
resonance magnetic field allowed the splitting of the energy levels to be accurately
determined. The field dependence of the resonance frequency indicated
that at zero field the degeneracy between the $2S_{1/2}$ and $2P_{1/2}$ states was lifted,
with the $2S_{1/2}$ state having the higher energy.
Figure 57: The observed dependence of the resonance frequencies on the applied
magnetic field. The solid lines are the predictions of the Dirac theory and the
dashed lines are the result of Dirac's theory if the energy of the 2S state is
simply shifted. [W. E. Lamb Jr. and R. C. Retherford, Phys. Rev. 72, 241 (1947).]
11.10.8 A Particle in a Spherical Square Well
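The radial equation for a relativistic spin one-half particle in a spherically
symmetric "square well" potential is given by
\[ \begin{aligned}
( E - V(r) - m c^2 ) \, f(r) + c \hbar \left( \frac{\partial}{\partial r} - \frac{\kappa}{r} \right) g(r) &= 0 \\
( E - V(r) + m c^2 ) \, g(r) - c \hbar \left( \frac{\partial}{\partial r} + \frac{\kappa}{r} \right) f(r) &= 0
\end{aligned} \tag{1694} \]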
We shall examine the case of an attractive central square well potential V(r)
which is defined by
\[ V(r) = \begin{cases} - V_0 & \text{for } r < a \\ 0 & \text{for } r > a \end{cases} \tag{1695} \]
[Figure 58: plot of $V(r)/V_0$ versus r/a.]
Figure 58: A spherically symmetric potential well, of depth $V_0$ and radius a.
In the region r < a, where the potential is non-zero, the Dirac radial equation
becomes
\[ \begin{aligned}
( E + V_0 - m c^2 ) \, f(r) + c \hbar \left( \frac{\partial}{\partial r} - \frac{\kappa}{r} \right) g(r) &= 0 \\
( E + V_0 + m c^2 ) \, g(r) - c \hbar \left( \frac{\partial}{\partial r} + \frac{\kappa}{r} \right) f(r) &= 0
\end{aligned} \tag{1696} \]
The function f(r) satisfies a second-order differential equation, which can be
found by pre-multiplying the second equation by the operator
\[ c \hbar \left( \frac{\partial}{\partial r} - \frac{\kappa}{r} \right) \tag{1697} \]
and then eliminating g(r) by using the first equation. This process yields the
equation
\[ c^2 \hbar^2 \left( \frac{\partial^2}{\partial r^2} - \frac{\kappa ( \kappa + 1 )}{r^2} \right) f(r) = - \left( ( E + V_0 )^2 - m^2 c^4 \right) f(r) \tag{1698} \]
By using a similar procedure, starting from the first equation, one can find
the analogous equation for g(r)
\[ c^2 \hbar^2 \left( \frac{\partial^2}{\partial r^2} - \frac{\kappa ( \kappa - 1 )}{r^2} \right) g(r) = - \left( ( E + V_0 )^2 - m^2 c^4 \right) g(r) \tag{1699} \]
It should be recognized that the term proportional to $\kappa ( \kappa + 1 )$ on the
left-hand side of eqn (1698) for the large component, when divided by $2 m c^2$,
is equivalent to the centrifugal potential in the non-relativistic limit. The small
component experiences a different centrifugal potential. Furthermore, the quantity
\[ ( E + V_0 )^2 - m^2 c^4 \tag{1700} \]
plays a similar role to the kinetic energy in the non-relativistic Schrödinger
equation.
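The elimination step behind eqn (1698) and eqn (1699) can be spelled out explicitly. The lines below are only a worked restatement of the operator algebra, using the fact that $\partial_r ( \kappa / r ) = - \kappa / r^2$; no new assumptions enter.

```latex
\[
\left( \frac{\partial}{\partial r} - \frac{\kappa}{r} \right)
\left( \frac{\partial}{\partial r} + \frac{\kappa}{r} \right) f
= \frac{\partial^2 f}{\partial r^2} + \frac{\kappa}{r} \frac{\partial f}{\partial r}
- \frac{\kappa}{r^2} f - \frac{\kappa}{r} \frac{\partial f}{\partial r} - \frac{\kappa^2}{r^2} f
= \left( \frac{\partial^2}{\partial r^2} - \frac{\kappa ( \kappa + 1 )}{r^2} \right) f
\]
\[
\left( \frac{\partial}{\partial r} + \frac{\kappa}{r} \right)
\left( \frac{\partial}{\partial r} - \frac{\kappa}{r} \right) g
= \frac{\partial^2 g}{\partial r^2} - \frac{\kappa}{r} \frac{\partial g}{\partial r}
+ \frac{\kappa}{r^2} g + \frac{\kappa}{r} \frac{\partial g}{\partial r} - \frac{\kappa^2}{r^2} g
= \left( \frac{\partial^2}{\partial r^2} - \frac{\kappa ( \kappa - 1 )}{r^2} \right) g
\]
```

The first-derivative terms cancel in both products, which is why only the centrifugal terms $\kappa ( \kappa \pm 1 ) / r^2$ survive, and the product $( E + V_0 + m c^2 ) ( E + V_0 - m c^2 )$ supplies the right-hand sides.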
Real Momenta
If the quantity $( E + V_0 )^2 - m^2 c^4$ is positive, it can be written as
\[ ( E + V_0 )^2 - m^2 c^4 = c^2 \hbar^2 k_0^2 > 0 \tag{1701} \]
where $k_0$ is real. These equations can be expressed in dimensionless form by
introducing the dimensionless variable $\rho = k_0 r$. The radial equations
simplify to become
\[ \begin{aligned}
\rho^2 \frac{\partial^2 f}{\partial \rho^2} + \left( \rho^2 - \kappa ( \kappa + 1 ) \right) f &= 0 \\
\rho^2 \frac{\partial^2 g}{\partial \rho^2} + \left( \rho^2 - \kappa ( \kappa - 1 ) \right) g &= 0
\end{aligned} \tag{1702} \]
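Since (apart from the sign) κ is identified with a form of angular momentum,
one sees that the upper and lower components experience different centrifugal
potentials. These equations have forms which are closely related to Bessel's
equation. If one sets
\[ f = \rho^{\frac{1}{2}} X_{| \kappa + \frac{1}{2} |} \tag{1703} \]
and
\[ g = \rho^{\frac{1}{2}} Y_{| \kappa - \frac{1}{2} |} \tag{1704} \]
the equations reduce to the pair of Bessel's equations
\[ \begin{aligned}
\rho^2 \frac{\partial^2 X_{| \kappa + \frac{1}{2} |}}{\partial \rho^2} + \rho \frac{\partial X_{| \kappa + \frac{1}{2} |}}{\partial \rho} + \left( \rho^2 - ( \kappa + \tfrac{1}{2} )^2 \right) X_{| \kappa + \frac{1}{2} |} &= 0 \\
\rho^2 \frac{\partial^2 Y_{| \kappa - \frac{1}{2} |}}{\partial \rho^2} + \rho \frac{\partial Y_{| \kappa - \frac{1}{2} |}}{\partial \rho} + \left( \rho^2 - ( \kappa - \tfrac{1}{2} )^2 \right) Y_{| \kappa - \frac{1}{2} |} &= 0
\end{aligned} \tag{1705} \]
of half-integer order. The spherical Bessel functions and spherical Neumann
functions of order n are defined in terms of the Bessel functions via
\[ \begin{aligned}
j_n(\rho) &= \sqrt{\frac{\pi}{2 \rho}} \, J_{n + \frac{1}{2}}(\rho) \\
\eta_n(\rho) &= \sqrt{\frac{\pi}{2 \rho}} \, N_{n + \frac{1}{2}}(\rho)
\end{aligned} \tag{1706} \]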
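Therefore, the general solutions of each of the radial equations can be expressed
as
\[ \frac{f(r)}{r} = A_0 \, j_{| \kappa + \frac{1}{2} | - \frac{1}{2}}(k_0 r) + A_1 \, \eta_{| \kappa + \frac{1}{2} | - \frac{1}{2}}(k_0 r) \tag{1707} \]
and
\[ \frac{g(r)}{r} = B_0 \, j_{| \kappa - \frac{1}{2} | - \frac{1}{2}}(k_0 r) + B_1 \, \eta_{| \kappa - \frac{1}{2} | - \frac{1}{2}}(k_0 r) \tag{1708} \]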
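However, since the functions f(r) and g(r) in the upper and lower components
are related by the differential equations
\[ \left( \frac{\partial}{\partial \rho} + \frac{1 + \kappa}{\rho} \right) \left( \frac{f(r)}{r} \right) = \frac{( E + V_0 + m c^2 )}{c \hbar k_0} \left( \frac{g(r)}{r} \right) \tag{1709} \]
and
\[ \left( \frac{\partial}{\partial \rho} + \frac{1 - \kappa}{\rho} \right) \left( \frac{g(r)}{r} \right) = - \frac{( E + V_0 - m c^2 )}{c \hbar k_0} \left( \frac{f(r)}{r} \right) \tag{1710} \]
the two sets of coefficients $(A_0, A_1)$ and $(B_0, B_1)$ must also be related. The
explicit relations can be found by using the recurrence relations for the spherical
Bessel functions $j_n(\rho)$
\[ \frac{\partial}{\partial \rho} \left( \rho^{n+1} j_n(\rho) \right) = \rho^{n+1} j_{n-1}(\rho) \tag{1711} \]
and
\[ \frac{\partial}{\partial \rho} \left( \rho^{-n} j_n(\rho) \right) = - \rho^{-n} j_{n+1}(\rho) \tag{1712} \]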
The spherical Neumann functions $\eta_n(\rho)$ satisfy identical recurrence relations.
This yields the relations
\[ \begin{aligned}
A_0 &= \mathrm{sign}( \kappa ) \left( \frac{E + V_0 + m c^2}{c \hbar k_0} \right) B_0 \\
A_1 &= \mathrm{sign}( \kappa ) \left( \frac{E + V_0 + m c^2}{c \hbar k_0} \right) B_1
\end{aligned} \tag{1713} \]
Hence, for positive-energy solutions, the upper components are the large components
and the lower components are the small components. In the inner
region, one must set $A_1 = B_1 = 0$, since the wave functions are required to be
normalizable near the origin and the spherical Neumann functions $\eta_n(\rho)$ diverge
as $\rho^{-(n+1)}$ as $\rho \to 0$.
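The claim that the regular solution $f(r) = r \, j_l(k_0 r)$ of eqn (1707) satisfies the second-order equation (1698) can be checked numerically. The sketch below is an illustration only: it uses the closed forms of $j_0$, $j_1$, $j_2$, takes the order to be $l = | \kappa + \frac{1}{2} | - \frac{1}{2}$, approximates the second derivative by a central difference, and uses an arbitrary value $k_0 = 1.3$. All function names are hypothetical.

```python
import math


def spherical_jn(order, x):
    """Closed forms of the spherical Bessel functions j_0, j_1 and j_2."""
    if order == 0:
        return math.sin(x) / x
    if order == 1:
        return math.sin(x) / x ** 2 - math.cos(x) / x
    if order == 2:
        return (3.0 / x ** 2 - 1.0) * math.sin(x) / x - 3.0 * math.cos(x) / x ** 2
    raise ValueError("only orders 0, 1 and 2 are implemented in this sketch")


def upper_order(kappa):
    """Order |kappa + 1/2| - 1/2 of the Bessel function in eqn (1707)."""
    return int(abs(kappa + 0.5) - 0.5)


def residual_1698(kappa, k0, r, h=1.0e-4):
    """Residual of eqn (1698), divided by c^2 hbar^2, for f(r) = r j_l(k0 r)."""
    order = upper_order(kappa)

    def f(radius):
        return radius * spherical_jn(order, k0 * radius)

    # Central finite-difference estimate of the second derivative of f.
    second_derivative = (f(r + h) - 2.0 * f(r) + f(r - h)) / h ** 2
    return second_derivative - kappa * (kappa + 1) * f(r) / r ** 2 + k0 ** 2 * f(r)


for kappa in (-2, -1, 1, 2):
    worst = max(abs(residual_1698(kappa, 1.3, 0.5 + 0.25 * i)) for i in range(20))
    print(f"kappa = {kappa:+d}: largest residual = {worst:.2e}")
```

The residuals should be at the level of the finite-difference error, far below the size of the individual terms, for both signs of κ.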
Imaginary Momenta
If the quantity $( E + V_0 )^2 - m^2 c^4$ is negative, it can be written as
\[ ( E + V_0 )^2 - m^2 c^4 = - c^2 \hbar^2 \kappa_0^2 < 0 \tag{1714} \]
where $\kappa_0$ is real. This corresponds to the case of negative kinetic energies. In
this case, one can express the solution in terms of the modified spherical Bessel
functions
\[ \frac{f(r)}{r} = A_0 \, i_{| \kappa + \frac{1}{2} |}(\kappa_0 r) + A_1 \, k_{| \kappa + \frac{1}{2} |}(\kappa_0 r) \tag{1715} \]
and
\[ \frac{g(r)}{r} = B_0 \, i_{| \kappa - \frac{1}{2} |}(\kappa_0 r) + B_1 \, k_{| \kappa - \frac{1}{2} |}(\kappa_0 r) \tag{1716} \]
Because of the factors of i in the definitions of the modified spherical Bessel
functions, the amplitudes of the upper and lower components are related via
\[ \begin{aligned}
A_0 &= - \left( \frac{E + V_0 + m c^2}{c \hbar \kappa_0} \right) B_0 \\
A_1 &= \left( \frac{E + V_0 + m c^2}{c \hbar \kappa_0} \right) B_1
\end{aligned} \tag{1717} \]
where a minus sign has appeared in the first equation. Again, we see that for
positive energies, for r < a, the upper components are the larger components
and the lower components are the smaller components.
Bound States
The bound state energy E must occur in the energy interval
\[ m c^2 > E > - m c^2 \tag{1718} \]
so that the wave function in the region r > a, where the potential is zero, is
exponentially decaying. Since $E^2 - m^2 c^4 < 0$, the wave functions in the
outer region should also be expressed in terms of the modified spherical Bessel
functions. The quantity $\kappa_1$ can be defined as
\[ E^2 - m^2 c^4 = - \hbar^2 c^2 \kappa_1^2 \tag{1719} \]
and the equations can be expressed in terms of the dimensionless variable
\[ \rho = i \kappa_1 r \tag{1720} \]
In this case, it is more useful to express the solution of the radial Dirac equation
in terms of the spherical Hankel functions $h^{\pm}_n(\rho)$. The spherical Hankel functions
are defined via
\[ h^{\pm}_n(\rho) = j_n(\rho) \pm i \, \eta_n(\rho) \tag{1721} \]
For asymptotically large ρ, these functions are complex conjugates and represent
outgoing or incoming spherical waves
\[ \lim_{\rho \to \infty} h^{\pm}_n(\rho) \to \frac{1}{\rho} \exp \left[ \pm i \left( \rho - ( n + 1 ) \frac{\pi}{2} \right) \right] \tag{1722} \]
The factor of $\rho^{-1}$ reflects the fact that the intensity of an outgoing wave packet
decreases in proportion to $\rho^{-2}$ in order to conserve energy and probability. From
the asymptotic variation, it is seen that the spherical Hankel functions $h^{\pm}_n(i\rho)$
with imaginary arguments, respectively, represent exponentially attenuating or
growing spherical waves. In the exterior region, the solutions are represented
by
\[ \frac{f(r)}{r} = C_0 \, h^{+}_{| \kappa + \frac{1}{2} | - \frac{1}{2}}(i \kappa_1 r) + C_1 \, h^{-}_{| \kappa + \frac{1}{2} | - \frac{1}{2}}(i \kappa_1 r) \tag{1723} \]
and
\[ \frac{g(r)}{r} = D_0 \, h^{+}_{| \kappa - \frac{1}{2} | - \frac{1}{2}}(i \kappa_1 r) + D_1 \, h^{-}_{| \kappa - \frac{1}{2} | - \frac{1}{2}}(i \kappa_1 r) \tag{1724} \]
The coefficients of the upper and lower components are related via
\[ \begin{aligned}
C_0 &= - \left( \frac{E + m c^2}{c \hbar \kappa_1} \right) D_0 \\
C_1 &= \left( \frac{E + m c^2}{c \hbar \kappa_1} \right) D_1
\end{aligned} \tag{1725} \]
as can be seen by substituting the asymptotic form of the Hankel functions given
by eqn (1722) in the asymptotic form of the differential equations relating f(r)
and g(r) with $V_0 = 0$. If this wave function is to be normalizable as $r \to \infty$,
one must set $C_1 = D_1 = 0$.
The solutions for the wave functions have been found in the inner and outer
regions of the potential. The solution must also hold at r = a. This is achieved
by demanding that the upper and lower components of the wave function are
continuous at r = a. These conditions are demanded due to charge conservation,
$\partial_\mu j^\mu = 0$, since the current $j^\mu$ only depends on the components of ψ and does
not (explicitly) depend on their derivatives.
Since the wave function at the origin must be normalizable, and since the
wave function must be exponentially decaying when $r \to \infty$, the matching
condition for the upper component becomes
\[ A_0 \, j_{| \kappa + \frac{1}{2} | - \frac{1}{2}}(k_0 a) = C_0 \, h^{+}_{| \kappa + \frac{1}{2} | - \frac{1}{2}}(i \kappa_1 a) \tag{1726} \]
and the matching condition for the lower component becomes
\[ B_0 \, j_{| \kappa - \frac{1}{2} | - \frac{1}{2}}(k_0 a) = D_0 \, h^{+}_{| \kappa - \frac{1}{2} | - \frac{1}{2}}(i \kappa_1 a) \tag{1727} \]
By eliminating the amplitudes from the two matching conditions by using
eqn (1713) and eqn (1725), one can arrive at the equation
\[ \mathrm{sign}(\kappa) \left( \frac{E + V_0 + m c^2}{c \hbar k_0} \right) \left( \frac{j_{| \kappa + \frac{1}{2} | - \frac{1}{2}}(k_0 a)}{j_{| \kappa - \frac{1}{2} | - \frac{1}{2}}(k_0 a)} \right) = - \left( \frac{E + m c^2}{c \hbar \kappa_1} \right) \left( \frac{h^{+}_{| \kappa + \frac{1}{2} | - \frac{1}{2}}(i \kappa_1 a)}{h^{+}_{| \kappa - \frac{1}{2} | - \frac{1}{2}}(i \kappa_1 a)} \right) \tag{1728} \]
In the above expression, the quantities $k_0$ and $\kappa_1$ are defined by
\[ \hbar^2 c^2 k_0^2 = ( E + V_0 )^2 - m^2 c^4 \tag{1729} \]
and
\[ \hbar^2 c^2 \kappa_1^2 = m^2 c^4 - E^2 \tag{1730} \]
These equations determine the allowed values for the energy. The above set of
equations has to be solved numerically to find the energy eigenvalues. We note
that for the Dirac particle, the spin effectively results in the formation of a
centrifugal barrier (either for the upper or the lower component) even for electrons
in s states. As a result, the potential $V_0$ must exceed a critical strength if it is
to yield a bound state.
11.10.9 The MIT Bag Model
From the point of view of symmetry, a baryon, such as a neutron or proton, is
thought of as being composed of three (valence) quarks. For example, the proton
is considered to be made of two up quarks and a down quark (p = (uud)), while
the neutron is considered to be made of one up quark and two down quarks
(n = (udd)). These valence quarks are assumed to be surrounded by a sea of
gluons which bind the quarks together and a sea of virtual quark/anti-quark
pairs that are produced by the gluon field. Likewise, mesons are considered
to be made of a quark and an anti-quark, but these valence quarks are also
surrounded by a sea of gluons and quark/anti-quark pairs. The gluon force has
the property that the energy of interaction increases as the separation between
the quarks increases. It is this property of the gluon force that results in the
quarks being confined, so that no single quark can be found in nature.
The MIT bag model [110] is a simple, purely phenomenological model for the
structure of strongly interacting particles (hadrons). The model is based on
the spherically symmetric potential of radius a, but it will be assumed that the
quark mass can have one or the other of two values. The quark is assumed to
have a small mass (approximately zero) if it is located within a sphere of radius
a, and the mass is assumed to be very large (or infinite) if r > a. To be specific,
the quark mass is assumed to be a function of r such that
\[ \begin{aligned}
m &= 0 && \text{if } r < a \\
m &\to \infty && \text{if } r > a
\end{aligned} \tag{1731} \]
[110] A. Chodos, R. L. Jaffe, K. Johnson, C. B. Thorn, and V. F. Weisskopf, Phys. Rev. D 9, 3471 (1974).
It is the infinite mass of the quark for r > a that results in the confinement of
the quark to within the hadron. That is, in the exterior region, the infinite rest
mass energy exceeds the bound state energy, so the exterior region is classically
forbidden. Therefore, the particle is confined to the interior.
Inside the hadron, where both the potential energy and the mass m are zero,
the kinetic energy parameter $k_0$ can be expressed entirely in terms of the energy
via $E = \hbar c k_0$. Therefore, the radial
components of the Dirac wave function can be expressed as
\[ \begin{aligned}
\frac{f(r)}{r} &= A_0 \, j_{| \kappa + \frac{1}{2} | - \frac{1}{2}}(k_0 r) \\
\frac{g(r)}{r} &= \mathrm{sign}(\kappa) \, A_0 \, j_{| \kappa - \frac{1}{2} | - \frac{1}{2}}(k_0 r)
\end{aligned} \tag{1732} \]
where the amplitudes of the upper and lower components have the same magnitude, since
the potential and mass are zero for r < a.
Outside the hadron, where r > a, the energy E is assumed to be much less
than the rest mass energy, $m c^2 \gg E$. Therefore, the momentum parameter is
imaginary and one can set $\hbar c \kappa_1 \approx m c^2$. In the exterior region, the radial
functions can be expressed as
\[ \begin{aligned}
\frac{f(r)}{r} &= C_0 \, h^{+}_{| \kappa + \frac{1}{2} | - \frac{1}{2}}(i \kappa_1 r) \\
\frac{g(r)}{r} &= - C_0 \, h^{+}_{| \kappa - \frac{1}{2} | - \frac{1}{2}}(i \kappa_1 r)
\end{aligned} \tag{1733} \]
since the imaginary momentum parameter has a magnitude which is governed
by the large mass m. Due to the large magnitude of $\kappa_1$, the wave function
decays very rapidly in the exterior region.
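The bound state energy is determined from the matching condition
\[ \mathrm{sign}(\kappa) \left( \frac{j_{| \kappa + \frac{1}{2} | - \frac{1}{2}}(k_0 a)}{j_{| \kappa - \frac{1}{2} | - \frac{1}{2}}(k_0 a)} \right) = - \left( \frac{h^{+}_{| \kappa + \frac{1}{2} | - \frac{1}{2}}(i \kappa_1 a)}{h^{+}_{| \kappa - \frac{1}{2} | - \frac{1}{2}}(i \kappa_1 a)} \right) \tag{1734} \]
Due to the asymptotic properties of the spherical Hankel functions, their ratio
is unity for large $\kappa_1$. This leads to the energies of the quarks being governed by
the simplified matching condition
\[ j_{| \kappa + \frac{1}{2} | - \frac{1}{2}}(k_0 a) = - \mathrm{sign}(\kappa) \, j_{| \kappa - \frac{1}{2} | - \frac{1}{2}}(k_0 a) \tag{1735} \]
where
\[ E = c \hbar k_0 \tag{1736} \]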
The above equation governs the ground state and excited state energies of the
individual quarks inside the hadron. Since the spherical Bessel functions oscillate
in sign, the above equations will result in a set of solutions for $k_0$ with fixed
κ. From the structure of the equations, it is seen that the solutions $k_0$ will only
depend on the integer number κ and the value of a. Since another boundary
condition should also be imposed at the bag's surface, only states with angular
momentum $j = \frac{1}{2}$ should be retained. This extra condition restricts the interest
to states with κ = −1.
We shall examine the lowest-energy bound state, which corresponds to the
case κ = −1. The bound state energies are given by the matching condition
\[ j_0(k_0 a) = j_1(k_0 a) \tag{1737} \]
Since
\[ j_0(\rho) = \frac{\sin \rho}{\rho} \tag{1738} \]
and
\[ j_1(\rho) = \left( \frac{\sin \rho - \rho \cos \rho}{\rho^2} \right) \tag{1739} \]
the energy eigenvalues are determined by the solutions of
\[ \rho = \frac{1}{1 + \cot \rho} \tag{1740} \]
which has an infinite number of solutions which, asymptotically, are spaced by
π. The smallest solution corresponds to $k_0 a = 2.04$. Hence, the energy of the
lowest-energy quark is given by the formula
\[ E_{\kappa = -1, n_r = 0} = \frac{2.04 \, c \hbar}{a} \tag{1741} \]
[Figure 59: plot of P(ρ)/P(0) versus ρ.]
Figure 59: The radial dependence of the quark distribution in the ground state
of the MIT bag.
The solutions with larger values of $k_0$, corresponding to excited states with
κ = −1, are given by analogous expressions. Therefore, if one knows the value
of a, one could find the energies required to excite a single quark between the
single-particle levels. This could allow one to calculate the excitation energies
required to change the hadron's internal structure.

Table 15: The lowest single-particle energies (in units of $E_{\kappa, n_r} a / c \hbar$) of the MIT
Bag Model.

  n_r    κ = −1    κ = +1    κ = −2    κ = +2
  0       2.04      3.81      3.21      5.12
  1       5.40      7.00      6.76      8.41
  2       8.58     10.17     10.01     11.61
  3      11.73     13.31     13.20     14.79
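The simplified matching condition (1735) is easy to solve numerically, which gives a way to reproduce the entries of Table 15. The sketch below is only an illustration: it uses the closed forms of $j_0$, $j_1$, $j_2$ (enough for κ = ±1, ±2), takes the orders to be $| \kappa \pm \frac{1}{2} | - \frac{1}{2}$, and locates roots with a simple sign scan followed by bisection. The function names, scan step and starting point are arbitrary choices.

```python
import math


def spherical_jn(order, x):
    """Closed forms of the spherical Bessel functions j_0, j_1 and j_2."""
    if order == 0:
        return math.sin(x) / x
    if order == 1:
        return math.sin(x) / x ** 2 - math.cos(x) / x
    if order == 2:
        return (3.0 / x ** 2 - 1.0) * math.sin(x) / x - 3.0 * math.cos(x) / x ** 2
    raise ValueError("only orders 0, 1 and 2 are implemented in this sketch")


def bag_condition(kappa, x):
    """Left-hand side minus right-hand side of eqn (1735) at x = k0 a."""
    upper = int(abs(kappa + 0.5) - 0.5)
    lower = int(abs(kappa - 0.5) - 0.5)
    sign = 1.0 if kappa > 0 else -1.0
    return spherical_jn(upper, x) + sign * spherical_jn(lower, x)


def bag_roots(kappa, count=4, start=0.05, step=0.01):
    """First `count` roots of eqn (1735), found by a sign scan plus bisection."""
    roots = []
    left = start
    while len(roots) < count:
        right = left + step
        if bag_condition(kappa, left) * bag_condition(kappa, right) < 0.0:
            lo, hi = left, right
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                if bag_condition(kappa, lo) * bag_condition(kappa, mid) <= 0.0:
                    hi = mid
                else:
                    lo = mid
            roots.append(0.5 * (lo + hi))
        left = right
    return roots


for kappa in (-1, 1, -2, 2):
    print(f"kappa = {kappa:+d}:", [round(x, 2) for x in bag_roots(kappa)])
```

For κ = −1 the first root should come out at $k_0 a \approx 2.04$, with successive roots approaching a spacing of π; small differences from the tabulated values in the last digit can arise from rounding.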
In conclusion, the MIT bag model, when interpreted as being a strictly
single-particle picture, predicts that the set of excitation energies (for the
internal structure) of each of the basic hadrons can be put into a one-to-one
correspondence with each other. That is, the family of excitation energies for
each hadron should fall on top of each other, if one scales the energies by
multiplying them with the hadron's characteristic length scale a. The bag radius
is determined by the use of further phenomenological considerations. However,
although the model can be used to fit the right size for a nucleon (∼ 1 fm),
the model predicts that a meson (such as the pions, which are composed of a
quark and anti-quark in the combinations of either $(u \bar{d})$, $(d \bar{u})$ or $\frac{1}{\sqrt{2}} ( d \bar{d} - u \bar{u} )$)
should have almost the same radius [111]
\[ \frac{a_n}{a_\pi} = \left( \frac{3}{2} \right)^{\frac{1}{4}} \tag{1742} \]
Hence, the ratio of the nucleon mass $M_n$ to the pion mass $M_\pi$ is expected to be
given by the formula
\[ \frac{M_n}{M_\pi} = \frac{3 \times 2.04 / a_n}{2 \times 2.04 / a_\pi} = \frac{3}{2} \left( \frac{2}{3} \right)^{\frac{1}{4}} \tag{1743} \]
[111] It is assumed that the bag energy is given by the sum of a volume term $B a^3$ and the sum
of the quark energies $\frac{c \hbar}{a} \sum_n \alpha_n$. Minimizing the energy w.r.t. a results in the bag radius a
being determined by $a^4 = \frac{c \hbar}{3 B} \sum_n \alpha_n$.
Table 16: The Observed Energy Levels for the charmonium system $(c \bar{c})$ in units
of MeV/c$^2$.

  ^1S_0    ^3S_1    ^3P_0    ^3P_1    ^3P_2
  2981     3097     3415     3510     3556
  3686     3770     -        -        -
  -        4040     -        -        -
  -        4160     -        -        -
which yields a ratio of 1.36. This ratio is far too small for the triplet of π mesons,
since $M_\pi \sim 139$ MeV/c$^2$ and $M_n \sim 938$ MeV/c$^2$. Although it is inadequate
for the pseudo-scalar mesons, the MIT bag model is more appropriate for the ω
vector meson, which is composed of $\frac{1}{\sqrt{2}} ( u \bar{u} + d \bar{d} )$ and has a mass of $M_\omega \sim 783$
MeV/c$^2$, or the ρ vector meson $\frac{1}{\sqrt{2}} ( u \bar{u} - d \bar{d} )$ with a mass $M_\rho \sim 776$ MeV/c$^2$.
Hence, at best, the MIT Bag model produces mixed results. The MIT Bag
model is also quite unappealing, since the basic assumptions of the bag model
do not follow from Quantum Chromodynamics, and the model is neither
renormalizable nor is it Lorentz invariant.
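The arithmetic behind the ratio of 1.36 in eqn (1743) can be laid out in a few lines. The sketch below assumes, as in the footnote on the bag radius, that $a^4$ is proportional to the sum of the quark eigenvalues and that every quark sits in the lowest level with $k_0 a = 2.04$; the function names are hypothetical, and the masses are those quoted in the text.

```python
# Mass ratio of eqn (1743), using the bag-radius relation a^4 ~ sum of quark eigenvalues.
ALPHA_GROUND = 2.04      # lowest root k0 a quoted in the text
QUARKS_NUCLEON = 3       # valence quarks in a nucleon
QUARKS_PION = 2          # valence quark plus anti-quark in a pion


def radius_ratio(n_a, n_b):
    """Ratio a_a / a_b when a^4 is proportional to the summed quark eigenvalues."""
    return ((n_a * ALPHA_GROUND) / (n_b * ALPHA_GROUND)) ** 0.25


def mass_ratio(n_a, n_b):
    """Ratio of the summed quark energies n * alpha * c hbar / a for the two bags."""
    return (n_a / n_b) / radius_ratio(n_a, n_b)


predicted = mass_ratio(QUARKS_NUCLEON, QUARKS_PION)
observed = 938.0 / 139.0   # masses in MeV/c^2 as quoted in the text

print(f"a_n / a_pi = {radius_ratio(QUARKS_NUCLEON, QUARKS_PION):.3f}")
print(f"predicted M_n / M_pi = {predicted:.2f}, observed = {observed:.2f}")
```

The predicted ratio is about 1.36, while the quoted masses give a ratio of nearly 7, which is the discrepancy referred to above.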
11.10.10 The Temple Meson Model
A quark and anti-quark pair form bound states. Thus, for example, a charmed
quark/anti-quark pair $(c, \bar{c})$ can form states with different internal quantum
numbers [112]. The experimentally determined energies for the J/Ψ system [113]
are given in Table(16). Similarly, the Upsilon particle [114] $(b \bar{b})$ has a similar set
of energy levels. The energy levels of the Upsilon system [115] are tabulated in
Table(17). For positronium [116], like the hydrogen atom, it is the electromagnetic
force mediated by vector photons which binds the electron and positron into a
bound state. For a quark/anti-quark bound state, it is the color force mediated
by massless vector gluons that binds the quark/anti-quark pair together. The
color force has the property that it increases with increasing separation of the
quark/anti-quark pair, which has the consequence that the quarks are confined.
Furthermore, high-energy inelastic scattering experiments on hadrons indicate
that at small separations the quarks only interact weakly. This property is
called asymptotic freedom. It was the realization by 't Hooft [117], Gross and
Wilczek [118] and Politzer [119] that non-Abelian gauge theories possess the
property of asymptotic freedom that led to the acceptance of the theory of Quantum
Chromodynamics. The screening of the color force between the quarks at large
distances (due to virtual quark/anti-quark pairs) is more than compensated by
an anti-screening due to virtual gluon pairs. However, at small distances the
color force vanishes.

Table 17: The Observed Energy Levels for the Upsilon system $(b \bar{b})$ in units of
MeV/c$^2$.

  ^1S_0    ^3P_0    ^3P_1    ^3P_2
  9460     9860     9893     9913
  10025    10232    10255    10268
  10355    -        -        -
  10580    -        -        -

[112] J. E. Augustin et al., Phys. Rev. Lett. 33, 1406 (1974); J. J. Aubert et al., Phys. Rev. Lett. 33, 1404 (1974).
[113] The data are taken from the Particle Data Group: http://pdg.lbl.gov
[114] S. W. Herb et al., Phys. Rev. Lett. 39, 252 (1977); W. R. Innes et al., Phys. Rev. Lett. 39, 1240 (1977).
[115] The data are taken from the Particle Data Group: http://pdg.lbl.gov
[116] M. Deutsch, Phys. Rev. 82, 455 (1951).
The rest-mass energy of the quarks and anti-quarks will be modeled by
\[ m(r) \, c = m_0 \left( c - i \, \omega \, \underline{\alpha} \cdot \underline{r} \right) \tag{1744} \]
which describes an energy similar to that of an elastic string which couples to
the spin [120]. The model has two undetermined parameters, the quark mass $m_0$
and the string tension $m_0 c \, \omega$. The mass m(r), and the Dirac equation, can be
used to determine the energy levels of quarkonium.

Exercise:
Show that the positive energy eigenvalues of the Dirac equation with the
mass m(r) given by
\[ m(r) \, c = m_0 \left( c - i \, \omega \, \underline{\alpha} \cdot \underline{r} \right) \tag{1745} \]
are determined as
\[ E_{n,j,l} = m_0 c^2 \sqrt{t A + 1} \tag{1746} \]
where the dimensionless parameter t corresponding to the string tension is given
by
\[ t = \left( \frac{\hbar \omega}{m_0 c^2} \right) \tag{1747} \]
and A is given in terms of the quantum numbers as
\[ \begin{aligned}
A &= 2 ( n - j ) + 1 && \text{if } j = l + \tfrac{1}{2} \\
A &= 2 ( n + j ) + 3 && \text{if } j = l - \tfrac{1}{2}
\end{aligned} \tag{1748} \]
Hence, find the best fit to the excitation spectra of quarkonium.

[117] G. 't Hooft, unpublished (1972).
[118] D. J. Gross and F. A. Wilczek, Phys. Rev. Lett. 30, 1343 (1973).
[119] H. D. Politzer, Phys. Rev. Lett. 30, 1346 (1973).
[120] D. Ito, K. Mori and E. Carriere, Nuovo Cimento 51 A, 1119 (1967); P. A. Cook, Nuovo Cimento Lett. 1, 419 (1971).
11.11 Scattering by a Spherically Symmetric Potential
First, the polarization dependence of scattering of an electron from a Coulomb
potential will be examined in terms of the scattering amplitudes, and second,
by using a partial wave analysis, the scattering amplitudes will be expressed in
terms of phase shifts.
11.11.1 Polarization in Coulomb Scattering.
The scattering of a relativistic electron by a Coulomb force field results in
spin-flip scattering, since the electron has a magnetic moment which interacts with
the magnetic field produced in the electron's rest frame. Since the Coulomb
potential is spherically symmetric, the angular momentum operators $\hat{J}^2$ and $\hat{J}^{(3)}$
commute with the Hamiltonian; hence, $(j, j_3)$ are constants of motion. However,
the orbital angular momentum $\hat{\underline{L}}$ does not commute with $\hat{H}$.
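The Dirac wave function ψ(r) can be expressed in terms of two two-component
spinors
\[ \psi(r) = \begin{pmatrix} \phi^A(r) \\ \phi^B(r) \end{pmatrix} \tag{1749} \]
One only need specify the upper component $\phi^A(r)$, since once $\phi^A(r)$ has been
specified $\phi^B(r)$ is completely determined. For example, for the in and out
asymptotes, the Dirac equation reduces to
\[ \begin{pmatrix} E_p - m c^2 & - c \, \hat{\underline{p}} \cdot \underline{\sigma} \\ - c \, \hat{\underline{p}} \cdot \underline{\sigma} & E_p + m c^2 \end{pmatrix} \begin{pmatrix} \phi^A(r) \\ \phi^B(r) \end{pmatrix} = 0 \tag{1750} \]
Hence, the lower two-component spinor is completely determined in terms of
the upper two-component spinor
\[ \phi^B(r) = \frac{c \, \hat{\underline{p}} \cdot \underline{\sigma}}{E_p + m c^2} \, \phi^A(r) \tag{1751} \]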
In the scattering experiment, a plane wave with momentum p parallel to the
$\hat{e}_3$ axis falls incident on the target. The in-asymptote can be described by a
state which is in a superposition of eigenstates of $\hat{S}^{(3)}$ given by
\[ \psi^{in}_{\pm}(r) = \mathcal{N}_{E_p} \begin{pmatrix} \chi_{\pm} \\ \pm \dfrac{c \, p}{E_p + m c^2} \, \chi_{\pm} \end{pmatrix} \exp \left[ i \frac{p \, r}{\hbar} \cos \theta \right] \tag{1752} \]
From the Rayleigh expansion, one observes that the in-asymptotes are not eigenstates
of $( \hat{\underline{J}} )^2 = ( \hat{\underline{L}} + \hat{\underline{S}} )^2$, since they are formed of linear superpositions of many
states with differing eigenvalues of $\hat{L}^2$ but have a fixed eigenvalue of $\hat{S}^2$. However,
the in-asymptotes are eigenstates of $\hat{J}^{(3)} = \hat{L}^{(3)} + \hat{S}^{(3)}$ with eigenvalues $\pm \frac{\hbar}{2}$.

[Figure 60: sketch of the incident wave $\Psi_{in}$ and the scattered wave $\Psi_{out}$, with momentum vectors labeled p and the scattering direction (θ, ϕ).]
Figure 60: The geometry of the asymptotic final state of Mott scattering. At
large r, the beam separates into an unscattered beam $\psi^{in}$ and a spherical
outgoing wave $\psi^{out}$.
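The corresponding out-asymptotes can be described as spherical outgoing
waves. Even though the in-asymptote may have a definite eigenvalue of $\hat{S}^{(3)}$,
the spherical out-asymptote waves may contain a component with
flipped spin, due to the action of the spin-orbit coupling
\[ \left( \hat{\underline{S}} \cdot \hat{\underline{L}} \right) = \hat{S}^{(3)} \hat{L}^{(3)} + \frac{1}{2} \left( \hat{S}^{+} \hat{L}^{-} + \hat{S}^{-} \hat{L}^{+} \right) \tag{1753} \]
active in the vicinity of the target. In spherical polar coordinates, the orbital
angular momentum raising and lowering operators are given by
\[ \hat{L}^{\pm} = \pm \hbar \exp[ \pm i \varphi ] \left( \frac{\partial}{\partial \theta} \pm i \cot \theta \frac{\partial}{\partial \varphi} \right) \tag{1754} \]
Hence, on noting that
\[ \hat{S}^{\pm} \chi_{\pm} \equiv 0 \tag{1755} \]
one finds that the out-asymptotes can be expressed as
\[ \psi^{out}_{\pm}(r) = \mathcal{N}_{E_p} \begin{pmatrix} ( f(\theta) \pm g(\theta) \exp[ \pm i \varphi ] \, \hat{S}^{\mp} ) \, \chi_{\pm} \\ \dfrac{c \, p \, ( \hat{e}_r \cdot \underline{\sigma} )}{E_p + m c^2} \, ( f(\theta) \pm g(\theta) \exp[ \pm i \varphi ] \, \hat{S}^{\mp} ) \, \chi_{\pm} \end{pmatrix} \frac{1}{r} \exp \left[ i \frac{p \, r}{\hbar} \right] \tag{1756} \]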
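where $\hat{e}_r$ is a unit vector in the radial direction. It should be noted that the
out-asymptote describes an outgoing spherical wave when $r \to \infty$. Therefore,
the operator $( \underline{\sigma} \cdot \hat{\underline{p}} )$ appearing in the asymptote has simplified since
\[ \lim_{r \to \infty} ( \underline{\sigma} \cdot \hat{\underline{p}} ) = \lim_{r \to \infty} \left( \frac{\underline{r} \cdot \underline{\sigma}}{r} \right) \left( - i \hbar \frac{\partial}{\partial r} + \frac{2 i}{\hbar} \left( \frac{\hat{\underline{S}} \cdot \hat{\underline{L}}}{r} \right) \right) \to \left( \frac{\underline{r} \cdot \underline{\sigma}}{r} \right) \left( - i \hbar \frac{\partial}{\partial r} \right) \tag{1757} \]
which reflects that the spin-orbit coupling term is ineffective at $r \to \infty$. Similarly,
the effect of the differential operator can be evaluated as
\[ \lim_{r \to \infty} - i \hbar \frac{\partial}{\partial r} \left( \frac{1}{r} \exp \left[ i \frac{p \, r}{\hbar} \right] \right) \to \frac{p}{r} \exp \left[ i \frac{p \, r}{\hbar} \right] \tag{1758} \]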
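For reference, the derivative in eqn (1758) can be written out before the limit is taken. This is only a worked restatement of the product rule and introduces no new assumptions.

```latex
\[
- i \hbar \frac{\partial}{\partial r} \left( \frac{1}{r} \exp \left[ i \frac{p \, r}{\hbar} \right] \right)
= \left( \frac{p}{r} + \frac{i \hbar}{r^2} \right) \exp \left[ i \frac{p \, r}{\hbar} \right]
\]
```

The second term is smaller than the first by a factor of order $\hbar / ( p \, r )$, so it drops out when $r \to \infty$, in the same way that the spin-orbit term in eqn (1757) does.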
In light of the comment about the upper two-component spinor, one sees that
the scattered wave is determined by
\[ \begin{pmatrix} f(\theta) \\ g(\theta) \exp[ + i \varphi ] \end{pmatrix} \tag{1759} \]
for an incident beam with positive helicity, and by
\[ \begin{pmatrix} - g(\theta) \exp[ - i \varphi ] \\ f(\theta) \end{pmatrix} \tag{1760} \]
if the initial beam has a negative helicity. The quantities f(θ) and g(θ) are
generalized scattering amplitudes that have the dimensions of length, and depend
on θ but do not depend on ϕ as both the in and out asymptotes are eigenstates
of $\hat{J}^{(3)}$ with eigenvalues $\pm \frac{\hbar}{2}$. A partial wave analysis can be performed on the
Dirac equation to yield expressions for the scattering amplitudes f(θ) and g(θ)
in terms of phase shifts. A detailed knowledge of the scattering amplitudes is
not required for the following analysis.
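If the in-asymptote has the spin quantized along the direction given by
$( \sin \theta_s \cos \varphi_s , \sin \theta_s \sin \varphi_s , \cos \theta_s )$, the upper component of the Dirac wave spinor
is determined by the two-component spinor
\[ \chi_s = \begin{pmatrix} \cos \frac{\theta_s}{2} \exp[ - i \frac{\varphi_s}{2} ] \\ \sin \frac{\theta_s}{2} \exp[ + i \frac{\varphi_s}{2} ] \end{pmatrix} \tag{1761} \]
The out-asymptote is then determined by the two-component spinor
\[ \phi^A(r) = \begin{pmatrix} f(\theta) \cos \frac{\theta_s}{2} \exp[ - i \frac{\varphi_s}{2} ] - g(\theta) \sin \frac{\theta_s}{2} \exp[ + i \frac{\varphi_s}{2} ] \exp[ - i \varphi ] \\ g(\theta) \cos \frac{\theta_s}{2} \exp[ - i \frac{\varphi_s}{2} ] \exp[ + i \varphi ] + f(\theta) \sin \frac{\theta_s}{2} \exp[ + i \frac{\varphi_s}{2} ] \end{pmatrix} \frac{1}{r} \exp \left[ i \frac{p \, r}{\hbar} \right] \tag{1762} \]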
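The probability for scattering is proportional to
\[ \begin{aligned}
I(\theta, \varphi) \propto {} & \left| f(\theta) \cos \frac{\theta_s}{2} \exp[ - i \frac{\varphi_s}{2} ] - g(\theta) \sin \frac{\theta_s}{2} \exp[ + i \frac{\varphi_s}{2} ] \exp[ - i \varphi ] \right|^2 \\
& + \left| g(\theta) \cos \frac{\theta_s}{2} \exp[ - i \frac{\varphi_s}{2} ] \exp[ + i \varphi ] + f(\theta) \sin \frac{\theta_s}{2} \exp[ + i \frac{\varphi_s}{2} ] \right|^2 \\
= {} & \left( | f(\theta) |^2 + | g(\theta) |^2 \right) + \sin \theta_s \sin( \varphi - \varphi_s ) \; i \left( f^*(\theta) \, g(\theta) - f(\theta) \, g^*(\theta) \right)
\end{aligned} \tag{1763} \]
which clearly depends on the azimuthal angle ϕ.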
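The algebra leading from the two squared moduli to the closed form in eqn (1763) can be spot-checked numerically. In the sketch below the complex amplitudes f and g and the angles are hypothetical values chosen only to exercise the identity; the function names are likewise illustrative.

```python
import cmath
import math


def intensity_direct(f, g, theta_s, phi_s, phi):
    """Sum of the squared moduli of the two spinor components in eqn (1762)."""
    a = math.cos(theta_s / 2) * cmath.exp(-0.5j * phi_s)
    b = math.sin(theta_s / 2) * cmath.exp(+0.5j * phi_s)
    upper = f * a - g * b * cmath.exp(-1j * phi)
    lower = g * a * cmath.exp(+1j * phi) + f * b
    return abs(upper) ** 2 + abs(lower) ** 2


def intensity_closed_form(f, g, theta_s, phi_s, phi):
    """Right-hand side of eqn (1763)."""
    # i (f* g - f g*) is real; taking .real only discards rounding noise.
    interference = (1j * (f.conjugate() * g - f * g.conjugate())).real
    return (abs(f) ** 2 + abs(g) ** 2
            + math.sin(theta_s) * math.sin(phi - phi_s) * interference)


# Hypothetical amplitudes and angles, used only to test the identity.
f_amp, g_amp = 0.8 + 0.3j, -0.2 + 0.5j
for theta_s, phi_s, phi in [(0.4, 1.1, 2.0), (1.3, 0.2, 5.0), (2.9, 4.0, 0.7)]:
    lhs = intensity_direct(f_amp, g_amp, theta_s, phi_s, phi)
    rhs = intensity_closed_form(f_amp, g_amp, theta_s, phi_s, phi)
    print(f"direct = {lhs:.12f}   closed form = {rhs:.12f}")
```

The two columns should agree to rounding error. The interference term vanishes when f* g is real, in which case the azimuthal dependence disappears.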
If the initial beam is unpolarized, the direction of the initial spin $( \theta_s , \varphi_s )$
should be averaged over by integrating over the solid angle $d\Omega_s = d\varphi_s \, d\theta_s \sin \theta_s$.
This process yields the scattering probability for the unpolarized beam
\[ \int \frac{d\Omega_s}{4 \pi} \, I(\theta, \varphi) = \left( | f(\theta) |^2 + | g(\theta) |^2 \right) \tag{1764} \]
which is independent of the azimuthal angle ϕ. It should be noted that the
unpolarized cross-section differs from the polarized cross-section.
Even if the initial beam is unpolarized, the final beam will be partially
polarized. The direction of the net polarization is determined by evaluating the
matrix elements of $\hat{\underline{S}}$ and averaging over the direction of the initial spin, $\theta_s$ and
$\varphi_s$. The result is proportional to
\[ \langle \hat{\underline{S}} \rangle = \frac{\hbar}{2} \; i \left( \frac{f^*(\theta) \, g(\theta) - f(\theta) \, g^*(\theta)}{| f(\theta) |^2 + | g(\theta) |^2} \right) ( \sin \varphi , - \cos \varphi , 0 ) \tag{1765} \]
Hence, the polarization is perpendicular to the scattering plane. It should also
be noted that the net polarization of the scattered wave is determined by the
relative deviation of the scattering cross-section for polarized electrons from the
unpolarized scattering cross-section.
11.11.2 Partial Wave Analysis
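The Dirac equation with a spherically symmetric potential V(r) has solutions
of the form
\[ \psi(r) = \begin{pmatrix} \dfrac{f(r)}{r} \, \Omega^{j \pm \frac{1}{2}}_{j, j_z}(\theta, \varphi) \\ i \, \dfrac{g(r)}{r} \, \Omega^{j \mp \frac{1}{2}}_{j, j_z}(\theta, \varphi) \end{pmatrix} \tag{1766} \]
where the two-component spinor spherical harmonics $\Omega^{j \pm \frac{1}{2}}_{j, j_z}(\theta, \varphi)$ are given by
\[ \Omega^{j \pm \frac{1}{2}}_{j, j_z}(\theta, \varphi) = \begin{pmatrix} \mp \sqrt{\dfrac{j + \frac{1}{2} \pm \frac{1}{2} \mp j_z}{2 j + 1 \pm 1}} \; Y^{j \pm \frac{1}{2}}_{j_z - \frac{1}{2}}(\theta, \varphi) \\ \sqrt{\dfrac{j + \frac{1}{2} \pm \frac{1}{2} \pm j_z}{2 j + 1 \pm 1}} \; Y^{j \pm \frac{1}{2}}_{j_z + \frac{1}{2}}(\theta, \varphi) \end{pmatrix} \tag{1767} \]
and the radial functions $f_\kappa(r)$ and $g_\kappa(r)$ satisfy
\[ \begin{aligned}
\left( E - V(r) - m c^2 \right) f_\kappa(r) &= - c \hbar \left( \frac{\partial}{\partial r} - \frac{\kappa}{r} \right) g_\kappa(r) \\
\left( E - V(r) + m c^2 \right) g_\kappa(r) &= c \hbar \left( \frac{\partial}{\partial r} + \frac{\kappa}{r} \right) f_\kappa(r)
\end{aligned} \tag{1768} \]
where $\kappa = \pm ( j + \frac{1}{2} )$. If the momentum $\hbar k$ is defined via
\[ c^2 \hbar^2 k^2 = E^2 - m^2 c^4 \tag{1769} \]
the asymptotic $r \to \infty$ form of the solutions of these coupled equations with
positive values of κ are of the form of a linear superposition
\[ \frac{f_\kappa(r)}{r} = A_\kappa \, j_\kappa(kr) + B_\kappa \, \eta_\kappa(kr) \tag{1770} \]
where $j_\kappa(kr)$ and $\eta_\kappa(kr)$ are the spherical Bessel and the spherical Neumann
functions. For negative values of κ, the solutions are given by
\[ \frac{f_\kappa(r)}{r} = A_\kappa \, j_{-\kappa-1}(kr) + B_\kappa \, \eta_{-\kappa-1}(kr) \tag{1771} \]
The spherical Bessel and spherical Neumann functions have the asymptotic
forms
\[ \begin{aligned}
j_\kappa(kr) &\to \frac{\cos( kr - ( \kappa + 1 ) \frac{\pi}{2} )}{kr} \\
\eta_\kappa(kr) &\to \frac{\sin( kr - ( \kappa + 1 ) \frac{\pi}{2} )}{kr}
\end{aligned} \tag{1772} \]
The solutions for a free particle do not involve the spherical Neumann functions,
since they are not normalizable at the origin. The amplitudes of the asymptotic
solution in the presence of a finite potential V(r) are usually written as
\[ \frac{B_\kappa}{A_\kappa} = - \tan \delta_\kappa(k) \tag{1773} \]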
where $\delta_\kappa(k)$ are the phase shifts that characterize the potential. The phase
shifts depend directly on κ (and the energy) and only depend indirectly on j
and l through κ. The phase shifts are defined so that the asymptotic variation
of the radial functions is given by
\[ \frac{f_\kappa(r)}{r} \sim e^{i \delta_\kappa(k)} \, \frac{\cos( kr - ( \kappa + 1 ) \frac{\pi}{2} + \delta_\kappa(k) )}{r} \tag{1774} \]
and only differs from the asymptotic variation of the free particle solutions
through the phase shifts. Furthermore, if this is decomposed in terms of incoming
and outgoing spherical waves,
\[ \frac{f_\kappa(r)}{r} \sim \frac{\exp \left[ i \left( k r - ( \kappa + 1 ) \frac{\pi}{2} + 2 \delta_\kappa(k) \right) \right]}{2 r} + \frac{\exp \left[ - i \left( k r - ( \kappa + 1 ) \frac{\pi}{2} \right) \right]}{2 r} \tag{1775} \]
their fluxes are equal due to conservation of particles and, as written, the
incoming spherical waves are not modified by the phase shifts.
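The step from eqn (1774) to eqn (1775) is just the decomposition of a cosine into exponentials, and it can be verified numerically together with the equality of the incoming and outgoing amplitudes. In the sketch below x stands for $k r - ( \kappa + 1 ) \frac{\pi}{2}$, and the sample phase shifts are hypothetical.

```python
import cmath
import math


def standing_wave(x, delta):
    """exp(i delta) cos(x + delta), the numerator of eqn (1774)."""
    return cmath.exp(1j * delta) * math.cos(x + delta)


def outgoing_part(x, delta):
    """Outgoing term of eqn (1775), carrying the full phase shift 2 delta."""
    return 0.5 * cmath.exp(1j * (x + 2.0 * delta))


def incoming_part(x):
    """Incoming term of eqn (1775), which is independent of the phase shift."""
    return 0.5 * cmath.exp(-1j * x)


# x stands for k r - (kappa + 1) pi / 2; the phase shifts are hypothetical values.
for x, delta in [(0.3, 0.0), (2.0, 0.7), (11.5, -1.2)]:
    total = outgoing_part(x, delta) + incoming_part(x)
    mismatch = abs(standing_wave(x, delta) - total)
    flux_ratio = abs(outgoing_part(x, delta)) ** 2 / abs(incoming_part(x)) ** 2
    print(f"x = {x:5.1f}, delta = {delta:+.1f}: mismatch = {mismatch:.1e}, flux ratio = {flux_ratio:.3f}")
```

The mismatch should vanish to rounding error and the flux ratio should equal one for any real phase shift, which is the statement of particle conservation made above.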
The general asymptotic $r \to \infty$ form of the wave function for the scattering is composed of the unscattered wave and a spherical outgoing wave. The polar axis is chosen to be parallel to the direction of the incident beam, which is also chosen to be the quantization axis for the spin. If the incident beam is polarized with spin-up, the upper two-component spinor has the form
$$ \phi^A_\uparrow(r) = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \exp\left[ i k r \cos\theta \right] + \begin{pmatrix} f(\theta) \\ g(\theta) \exp[ i \varphi ] \end{pmatrix} \frac{\exp\left[ i k r \right]}{r} \tag{1776} $$
whereas for a down-spin polarized incident beam
$$ \phi^A_\downarrow(r) = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \exp\left[ i k r \cos\theta \right] + \begin{pmatrix} - g(\theta) \exp[ - i \varphi ] \\ f(\theta) \end{pmatrix} \frac{\exp\left[ i k r \right]}{r} \tag{1777} $$
On recalling the Rayleigh expansion
$$ \exp\left[ i k r \cos\theta \right] = \sum_l i^l \, ( 2 l + 1 ) \, j_l(kr) \, P_l(\cos\theta) \tag{1778} $$
one can find the scattered spherical outgoing wave by subtracting the unscattered beam from the total wave function. On using the asymptotic large $r$ variation, one obtains the asymptotic form
$$ \exp\left[ i k r \cos\theta \right] \to \sum_l i^l \, ( 2 l + 1 ) \, \frac{\cos\left( kr - (l + 1) \frac{\pi}{2} \right)}{kr} \, P_l(\cos\theta) \tag{1779} $$
which has a similar form to the asymptotic form of the total wave function. In particular, the spin and orbital angular momentum eigenstates can be decomposed in terms of the spinor spherical harmonics. Thus, for the up-spin polarized incident beam one has the upper two-component spinor
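$$ P_l(\cos\theta) \, \chi_+ = \sqrt{\frac{4 \pi}{2 l + 1}} \, Y^l_0(\theta, \varphi) \, \chi_+ = \frac{\sqrt{4 \pi}}{2 l + 1} \left( \sqrt{l + 1} \; \Omega^{l}_{l+\frac{1}{2},\frac{1}{2}} - \sqrt{l} \; \Omega^{l}_{l-\frac{1}{2},\frac{1}{2}} \right) \tag{1780} $$
and for the down-spin beam
$$ P_l(\cos\theta) \, \chi_- = \sqrt{\frac{4 \pi}{2 l + 1}} \, Y^l_0(\theta, \varphi) \, \chi_- = \frac{\sqrt{4 \pi}}{2 l + 1} \left( \sqrt{l + 1} \; \Omega^{l}_{l+\frac{1}{2},-\frac{1}{2}} + \sqrt{l} \; \Omega^{l}_{l-\frac{1}{2},-\frac{1}{2}} \right) \tag{1781} $$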
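The Rayleigh expansion of eqn(1778), on which the partial-wave decomposition rests, can be checked numerically by truncating the sum over $l$. The sketch below uses only the standard library; the helper names, the truncation `lmax` and the sample point are assumptions made for illustration. The spherical Bessel functions are evaluated from their ascending power series, which is adequate for the moderate argument used here.

```python
import cmath
import math

def spherical_jn(l, x, terms=80):
    """Spherical Bessel function j_l(x) from its ascending power series."""
    term = 1.0
    for m in range(1, l + 1):
        term *= x / (2 * m + 1)            # builds x^l / (2l+1)!!
    total = term
    for k in range(terms):
        term *= -(x * x / 2) / ((k + 1) * (2 * l + 2 * k + 3))
        total += term
    return total

def legendre(l, u):
    """Legendre polynomial P_l(u) from the three-term recurrence."""
    if l == 0:
        return 1.0
    p_prev, p = 1.0, u
    for n in range(1, l):
        p_prev, p = p, ((2 * n + 1) * u * p - n * p_prev) / (n + 1)
    return p

def rayleigh_sum(kr, theta, lmax):
    """Partial sum of eqn(1778) up to l = lmax."""
    u = math.cos(theta)
    return sum((1j ** l) * (2 * l + 1) * spherical_jn(l, kr) * legendre(l, u)
               for l in range(lmax + 1))

# Hypothetical sample point
kr, theta = 3.0, 0.7
exact = cmath.exp(1j * kr * math.cos(theta))
approx = rayleigh_sum(kr, theta, lmax=30)
print(abs(exact - approx))
```

For larger values of $kr$ the power series loses accuracy through cancellation, so a library routine for $j_l$ would be the better choice there.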
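Therefore, when expressed in terms of a superposition of continuum energy eigenstates corresponding to different values of $j$ and $\kappa$, the asymptotic form of the Rayleigh expansion becomes
$$ \exp\left[ i k r \cos\theta \right] \chi_+ \to \sqrt{4 \pi} \sum_l i^l \, \frac{\cos\left( kr - (l + 1) \frac{\pi}{2} \right)}{kr} \left( \sqrt{l + 1} \; \Omega^{l}_{l+\frac{1}{2},\frac{1}{2}} - \sqrt{l} \; \Omega^{l}_{l-\frac{1}{2},\frac{1}{2}} \right) \tag{1782} $$
and
$$ \exp\left[ i k r \cos\theta \right] \chi_- \to \sqrt{4 \pi} \sum_l i^l \, \frac{\cos\left( kr - (l + 1) \frac{\pi}{2} \right)}{kr} \left( \sqrt{l + 1} \; \Omega^{l}_{l+\frac{1}{2},-\frac{1}{2}} + \sqrt{l} \; \Omega^{l}_{l-\frac{1}{2},-\frac{1}{2}} \right) \tag{1783} $$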
Although the coefficients $A_\kappa$ of the exact wave function are as yet unknown, they can be determined by requiring that the scattered spherical wave does not contain terms proportional to
$$ \frac{\exp\left[ - i k r \right]}{r} \tag{1784} $$
which would represent an incoming spherical wave. This requirement leads to the outgoing spherical wave having a spin-up component given by
$$ \frac{\sqrt{4 \pi}}{2 i k} \sum_l \left[ \frac{( l + 1 )}{\sqrt{2 l + 1}} \left( \exp[ 2 i \delta_{-l-1}(k) ] - 1 \right) + \frac{l}{\sqrt{2 l + 1}} \left( \exp[ 2 i \delta_{l}(k) ] - 1 \right) \right] Y^l_0(\theta, \varphi) \, \frac{\exp\left[ i k r \right]}{r} \tag{1785} $$
and the down-spin component is given by
$$ \frac{\sqrt{4 \pi}}{2 i k} \sum_l \sqrt{\frac{l ( l + 1 )}{2 l + 1}} \left[ \left( \exp[ 2 i \delta_{-l-1}(k) ] - 1 \right) - \left( \exp[ 2 i \delta_{l}(k) ] - 1 \right) \right] Y^l_1(\theta, \varphi) \, \frac{\exp\left[ i k r \right]}{r} \tag{1786} $$
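In the above expressions, the index on the phase shifts $\delta_\kappa(k)$ refers to the value of $\kappa$. Hence, for a spin-up polarized incident beam, the scattering amplitudes are given in terms of the phase shifts via
$$ f(\theta) = \frac{\sqrt{4 \pi}}{2 i k} \sum_l \left[ \frac{( l + 1 )}{\sqrt{2 l + 1}} \left( \exp[ 2 i \delta_{-l-1}(k) ] - 1 \right) + \frac{l}{\sqrt{2 l + 1}} \left( \exp[ 2 i \delta_{l}(k) ] - 1 \right) \right] Y^l_0(\theta, \varphi) \tag{1787} $$
and
$$ g(\theta) \exp[ i \varphi ] = \frac{\sqrt{4 \pi}}{2 i k} \sum_l \sqrt{\frac{l ( l + 1 )}{2 l + 1}} \left[ \left( \exp[ 2 i \delta_{-l-1}(k) ] - 1 \right) - \left( \exp[ 2 i \delta_{l}(k) ] - 1 \right) \right] Y^l_1(\theta, \varphi) \tag{1788} $$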
A similar analysis can be applied to the scattering of an incident beam which is down-spin polarized, giving similar results.
If the incident beam is unpolarized, the elastic scattering cross-section is given in terms of the scattering amplitudes by
$$ \left( \frac{d\sigma}{d\Omega} \right) = \left( | f(\theta) |^2 + | g(\theta) |^2 \right) \tag{1789} $$
where the polar angle $\theta$ is the scattering angle.
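As an illustration of how the phase shifts enter the cross-section, the sketch below evaluates the non-spin-flip amplitude $f(\theta)$ of eqn(1787), using $\sqrt{4\pi/(2l+1)} \, Y^l_0 = P_l(\cos\theta)$. The dictionary `delta`, which maps each value of $\kappa$ to a phase shift, and all numerical values are hypothetical. The spin-flip amplitude $g(\theta)$ is not implemented, since it depends on the phase convention adopted for $Y^l_1$; when $\delta_{-l-1} = \delta_l$ for every $l$ it vanishes and $f(\theta)$ reduces to the familiar non-relativistic partial-wave sum.

```python
import cmath
import math

def legendre(l, u):
    """Legendre polynomial P_l(u) from the three-term recurrence."""
    if l == 0:
        return 1.0
    p_prev, p = 1.0, u
    for n in range(1, l):
        p_prev, p = p, ((2 * n + 1) * u * p - n * p_prev) / (n + 1)
    return p

def nonflip_amplitude(k, theta, delta):
    """f(theta) of eqn(1787); delta maps kappa -> phase shift (radians)."""
    u = math.cos(theta)
    lmax = max(kappa if kappa > 0 else -kappa - 1 for kappa in delta)
    total = 0j
    for l in range(lmax + 1):
        s_a = cmath.exp(2j * delta.get(-l - 1, 0.0)) - 1   # kappa = -l-1 (j = l + 1/2)
        s_b = cmath.exp(2j * delta.get(l, 0.0)) - 1        # kappa = l    (j = l - 1/2)
        total += ((l + 1) * s_a + l * s_b) * legendre(l, u)
    return total / (2j * k)

# Hypothetical s-wave-only example: |f|^2 = sin^2(delta) / k^2
k, d0 = 2.0, 0.3
f_s = nonflip_amplitude(k, 1.0, {-1: d0})
print(abs(f_s) ** 2, math.sin(d0) ** 2 / k ** 2)

# Spin-independent phase shifts: f reduces to the non-relativistic sum
d1 = 0.2
f_equal = nonflip_amplitude(k, 1.0, {-1: d0, -2: d1, 1: d1})
f_nonrel = ((cmath.exp(2j * d0) - 1) * legendre(0, math.cos(1.0))
            + 3 * (cmath.exp(2j * d1) - 1) * legendre(1, math.cos(1.0))) / (2j * k)
print(abs(f_equal - f_nonrel))
```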
11.12 An Electron in a Uniform Magnetic Field
We shall consider a Dirac electron in a constant magnetic field $\mathbf{B} = B^{(z)} \, \hat{e}_z$ aligned parallel to the $z$ direction. The vector potential can be chosen such that
$$ \mathbf{A} = B \, x \, \hat{e}_y \tag{1790} $$
We shall search for stationary states with energy $E$, where
$$ \psi = \begin{pmatrix} \phi^A \\ \phi^B \end{pmatrix} \exp\left[ - \frac{i}{\hbar} E t \right] \tag{1791} $$
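In the standard representation, the energy eigenvalue equation is represented by the set of coupled equations
$$
\begin{aligned}
( E - m c^2 ) \, \phi^A(r) &= c \, \boldsymbol{\sigma} \cdot \left( \hat{\mathbf{p}} - \frac{q}{c} \mathbf{A} \right) \phi^B(r) \\
( E + m c^2 ) \, \phi^B(r) &= c \, \boldsymbol{\sigma} \cdot \left( \hat{\mathbf{p}} - \frac{q}{c} \mathbf{A} \right) \phi^A(r)
\end{aligned}
\tag{1792}
$$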
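Substituting the expression for $\phi^B$ from the second equation into the first, one obtains the second-order differential equation for $\phi^A$
$$
\begin{aligned}
( E^2 - m^2 c^4 ) \, \phi^A &= c^2 \left( \boldsymbol{\sigma} \cdot \left( \hat{\mathbf{p}} - \frac{q}{c} \mathbf{A} \right) \right)^2 \phi^A(r) \\
&= c^2 \left[ \left( \hat{\mathbf{p}} - \frac{q}{c} \mathbf{A} \right)^2 - \frac{q \hbar}{c} \, \boldsymbol{\sigma} \cdot \mathbf{B} \right] \phi^A(r) \\
&= \left[ \hat{p}^2 c^2 + q^2 B^2 x^2 - 2 q \, \hat{p}_y \, c \, B \, x - q \, c \, \hbar \, \sigma^{(z)} B \right] \phi^A(r)
\end{aligned}
\tag{1793}
$$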
Since $\hat{p}_y$ and $\hat{p}_z$ commute with $x$, one can find simultaneous eigenstates of $\hat{H}$, $\hat{p}_y$ and $\hat{p}_z$. Hence, the two-component spinor $\phi^A$ can be expressed as
$$ \phi^A(r) = \exp\left[ i k_y y + i k_z z \right] \Phi^A(x) \tag{1794} $$
in which $\Phi^A(x)$ is a two-component spinor which only depends on the variable $x$. In this case, the exponential term can be factored out of the eigenvalue equation. The resulting equation has the form
$$ \left[ - \hbar^2 c^2 \frac{\partial^2}{\partial x^2} + ( c \hbar k_y - q B x )^2 - q c \hbar B \, \sigma^{(z)} \right] \Phi^A(x) = ( E^2 - m^2 c^4 - c^2 \hbar^2 k_z^2 ) \, \Phi^A(x) \tag{1795} $$
The equations decouple if the two-component spinor $\Phi^A(x)$ can be taken to be an eigenstate of the $z$-component of the spin operator
$$ \Phi^A(x) = f(x) \, \chi_\sigma \tag{1796} $$
where
$$ \sigma^{(z)} \chi_\sigma = \sigma \, \chi_\sigma \tag{1797} $$
in which the eigenvalues of $\sigma^{(z)}$ are denoted by $\sigma$. Therefore, the eigenvalue equation can be reduced to
$$ \left[ - \hbar^2 c^2 \frac{\partial^2}{\partial x^2} + ( q B )^2 \left( x - \frac{c \hbar k_y}{q B} \right)^2 \right] f(x) = ( E^2 - m^2 c^4 - c^2 \hbar^2 k_z^2 + q c \hbar B \sigma ) \, f(x) \tag{1798} $$
which (apart from an overall scale factor) is formally equivalent [121] to the (non-relativistic) energy eigenvalue equation for a shifted harmonic oscillator, with frequency $\omega_{HO} = 2 c | q | B$. The modulus sign was inserted to ensure that the frequency $\omega_{HO}$ is positive. The energy eigenvalues are determined from
$$ ( E^2 - m^2 c^4 - c^2 \hbar^2 k_z^2 + q c \hbar B \sigma ) = 2 | q | c \hbar B \left( n + \frac{1}{2} \right) \tag{1799} $$
Hence, for an electron with negative charge $q = - e$ one finds that the positive-energy eigenvalue is given by the solution
$$ E = c \sqrt{ m^2 c^2 + \hbar^2 k_z^2 + ( 2 n + 1 + \sigma ) \frac{| e | \hbar}{c} B } \tag{1800} $$
This expression has an infinite degeneracy as it is independent of the continuous variable $k_y$. It also has a discrete (two-fold) degeneracy between the levels with quantum numbers $(n, \sigma = 1)$ and $(n + 1, \sigma = -1)$. The two-fold degeneracy can be understood as a consequence of the generalized helicity $\boldsymbol{\sigma} \cdot ( \hat{\mathbf{p}} - \frac{q}{c} \mathbf{A} )$ commuting with the Hamiltonian $\hat{H}$. This results in the spin's alignment with the electron's velocity being preserved, as the spin's precession is precisely balanced by the electron's orbital precession. It should be noted that if the $g$ factor deviates from 2, and such an anomaly in the $g$ factor is expected from Quantum Electrodynamics and has been found in experiment, then this degeneracy will be lifted. The calculated $( g - 2 )$ anomaly for an electron is given by
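$$ \left( \frac{g - 2}{2} \right)_{Theor} = \frac{1}{2} \left( \frac{\alpha}{\pi} \right) - 0.3284986 \left( \frac{\alpha}{\pi} \right)^2 + 1.17611 \left( \frac{\alpha}{\pi} \right)^3 - 1.434 \left( \frac{\alpha}{\pi} \right)^4 + \ldots \tag{1801} $$
where
$$ \alpha = \left( \frac{e^2}{\hbar c} \right) \tag{1802} $$
is the fine structure constant. The experimentally determined value of the $g$ anomaly is found as
$$ \left( \frac{g - 2}{2} \right)_{Expt} = 0.0011659208 \tag{1803} $$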
[121] The explicit (but dimensionally incorrect) analogy is obtained by setting the Harmonic Oscillator mass, $m_{HO}$, as $m_{HO} = \frac{1}{2 c^2}$ and then determining the frequency from $m_{HO}^2 \, \omega_{HO}^2 = \left( \frac{q B}{c} \right)^2$.
and differs from the theoretical value in the last two decimal places [122]. In the non-relativistic limit, the expression for the relativistic energy eigenvalue reproduces the expression for the energies of the well-known Landau levels
$$ E \approx m c^2 + \frac{\hbar^2 k_z^2}{2 m} + \left( n + \frac{1 + \sigma}{2} \right) \left( \frac{| e | \hbar B}{m c} \right) \tag{1804} $$
which are doubly degenerate.
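The degeneracy of the levels $(n, \sigma = 1)$ and $(n + 1, \sigma = -1)$ and the non-relativistic limit can both be checked numerically from eqn(1800) and eqn(1804). The following is a minimal sketch in Gaussian units; the function names and the default choice $m = |e| = c = \hbar = 1$ are assumptions made purely for illustration.

```python
import math

def dirac_landau_energy(n, sigma, kz, B, m=1.0, e=1.0, c=1.0, hbar=1.0):
    """Positive-energy eigenvalue of eqn(1800) for an electron with q = -|e|."""
    return c * math.sqrt(m ** 2 * c ** 2 + hbar ** 2 * kz ** 2
                         + (2 * n + 1 + sigma) * abs(e) * hbar * B / c)

def landau_energy_nonrel(n, sigma, kz, B, m=1.0, e=1.0, c=1.0, hbar=1.0):
    """Non-relativistic limit of eqn(1804), including the rest energy m c^2."""
    return (m * c ** 2 + hbar ** 2 * kz ** 2 / (2 * m)
            + (n + (1 + sigma) / 2) * abs(e) * hbar * B / (m * c))

# Two-fold degeneracy between (n, sigma = +1) and (n + 1, sigma = -1)
e_up = dirac_landau_energy(2, +1, kz=0.3, B=0.5)
e_down = dirac_landau_energy(3, -1, kz=0.3, B=0.5)
print(e_up, e_down)

# Weak field and small k_z: the relativistic result approaches eqn(1804)
e_rel = dirac_landau_energy(1, +1, kz=1e-2, B=1e-4)
e_nr = landau_energy_nonrel(1, +1, kz=1e-2, B=1e-4)
print(e_rel - e_nr)
```

The residual difference in the second comparison is of second order in the small quantities, as expected from expanding the square root.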
11.13 Motion of an Electron in a Classical Electromagnetic Field
Consider an electron in a classical electromagnetic field represented by the real vector potential $A^\mu$. For simplicity, the electromagnetic field will be represented by a plane wave defined over Minkowski space that depends on the phase $\phi$ defined by
$$ \phi = k^\mu x_\mu \tag{1805} $$
Hence, the vector potential is written as
$$ A^\mu = A^\mu(\phi) \tag{1806} $$
The vector potential satisfies the Lorenz gauge condition
$$ \partial_\mu A^\mu = k_\mu A^{\mu\prime}(\phi) = 0 \tag{1807} $$
where the prime indicates differentiation with respect to $\phi$. The classical vector potential must satisfy the source-free wave equation
$$ \partial_\nu \partial^\nu A^\mu = k_\nu k^\nu A^{\mu\prime\prime}(\phi) = 0 \tag{1808} $$
which results in the condition
$$ k_\nu k^\nu = 0 \tag{1809} $$
which is the dispersion relation for a free electromagnetic field.
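The Dirac equation for a spin one-half particle with charge $q$ can be used to obtain the second-order differential equation
$$ \left[ - \hbar^2 \partial_\mu \partial^\mu - 2 i \hbar \frac{q}{c} A^\mu \partial_\mu + \frac{q^2}{c^2} A^\mu A_\mu - m^2 c^2 - i \hbar \frac{q}{c} \gamma^\mu k_\mu \gamma^\nu A_\nu^{\prime}(\phi) \right] \psi = 0 \tag{1810} $$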
[122] This discrepancy could indicate the importance of virtual processes in which heavy particle/anti-particle pairs are created. The $(g - 2)$ anomalies for the muon and its anti-particle have also been measured [G. W. Bennett et al., Phys. Rev. Lett. 92, 161802 (2004)]. These experiments show that particles and anti-particles precess at the same rate. However, the value of the $(g - 2)$ anomaly is inconsistent with the theoretical prediction based on the standard model of particle physics.
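where $\psi$ is the four-component Dirac spinor. In deriving this, the Lorenz gauge condition has been used to rewrite
$$ \gamma^\mu \gamma^\nu \partial_\mu \left( A_\nu \psi \right) = \gamma^\mu \gamma^\nu \partial_\mu \left( A_\nu \psi \right) - g^{\mu,\nu} \left( \partial_\mu A_\nu \right) \psi \tag{1811} $$
in the diagonal terms.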
Following Volkow [123], the solution of the second-order differential equation can be found in the form
$$ \psi = \exp\left[ - i \frac{p^\mu x_\mu}{\hbar} \right] F(\phi) \tag{1812} $$
where $p^\mu$ is a four-vector and $F(\phi)$ is a four-component spinor. This form reduces to the form of a free particle solution when $A^\mu \equiv 0$, in which case $p^\mu$ becomes the momentum of the free particle. The exponential form is unaltered when the vector potential is non-zero since arbitrary multiples of the electromagnetic wave vector $k$ can be added to the momentum of the free particle, in which case $p^\mu$ has a different interpretation. For a transverse polarized vector potential describing an electromagnetic wave travelling in the $\hat{e}_3$ direction, the operators
$$
\begin{aligned}
\hat{p}^1 &= i \hbar \, \partial^1 \\
\hat{p}^2 &= i \hbar \, \partial^2
\end{aligned}
\tag{1813}
$$
commute with the time-dependent Dirac Hamiltonian and are constants of motion. Although the particle's energy and momentum operators do not commute with the Hamiltonian, as these quantities are not conserved due to the interaction with the field, the quantity
$$ \hat{p}^3 - \hat{p}^0 = i \hbar \left( \partial^3 - \partial^0 \right) \tag{1814} $$
does commute with the Hamiltonian and, therefore, is conserved. The conservation of this quantity can be interpreted in terms of the energy absorbed or emitted by the electron due to interaction with the classical electromagnetic field being accompanied by the absorption or emission of a similar amount of momentum [124]. Despite the different interpretation of $p^\mu$ in the presence of the classical field, the four-vector $p^\mu$ shall be chosen to satisfy the condition
$$ p^\mu p_\mu = m^2 c^2 \tag{1815} $$
which is the dispersion relation for a free electron [125].
[123] D. M. Volkow, Zeit. für Physik, 94, 25 (1935).
[124] For the quantized electromagnetic field, the absorption of a photon involves the absorption of the energy and momentum given by the four-vector $\hbar k^\mu$, where $k^\mu = (k, 0, 0, k)$.
[125] If the condition on $p^\mu$ is dropped, the function $F(\phi)$ will acquire an overall phase factor that depends linearly on $\phi$ and on the constant value of $p^\mu p_\mu - m^2 c^2$.
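The form of the wave function of eqn(1812) is to be substituted into the second-order differential eqn(1810). It shall be noted that
$$
\begin{aligned}
A^\mu \partial_\mu F(\phi) &= k_\mu A^\mu F^{\prime}(\phi) = 0 \\
\partial^\mu \partial_\mu F(\phi) &= k^\mu k_\mu F^{\prime\prime}(\phi) = 0
\end{aligned}
\tag{1816}
$$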
since $A^\mu$ satisfies the Lorenz gauge condition and $k^\mu$ satisfies the dispersion relation for electromagnetic waves in vacuum. On substituting the ansatz into the second-order equation, using the above two equations and the choice of $p^\mu$ satisfying the free-electron dispersion relation, one finds that the second-order equation reduces to a first-order differential equation for the spinor $F(\phi)$
$$ 2 i \hbar \, p^\mu k_\mu \, F^{\prime}(\phi) = \left[ 2 \frac{q}{c} A^\mu p_\mu - \frac{q^2}{c^2} A^\mu A_\mu + i \hbar \frac{q}{c} \gamma^\mu k_\mu \gamma^\nu A_\nu^{\prime}(\phi) \right] F(\phi) \tag{1817} $$
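which only depends on $\phi$ since the exponential phase-factor which depends on $p^\mu x_\mu$ has been factored out. The first-order equation can be integrated w.r.t. $\phi$ to yield
$$ F(\phi) = \exp\left[ - \frac{i q}{\hbar c \, p^\lambda k_\lambda} \int_0^\phi \left( p_\mu A^\mu(\phi^\prime) - \frac{1}{2} \frac{q}{c} A^\mu(\phi^\prime) A_\mu(\phi^\prime) \right) d\phi^\prime + \frac{q}{c} \frac{\gamma^\mu k_\mu \gamma^\nu A_\nu}{2 \, p^\lambda k_\lambda} \right] F(0) \tag{1818} $$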
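where $F(0)$ is an arbitrary constant four-component spinor. The exponential of the matrix is defined in terms of its series expansion. Since the integral in the exponent multiplies the unit matrix, it commutes with the matrix term and the exponential factorizes as
$$ F(\phi) = \exp\left[ - \frac{i q}{\hbar c \, p^\lambda k_\lambda} \int_0^\phi \left( p_\mu A^\mu(\phi^\prime) - \frac{1}{2} \frac{q}{c} A^\mu(\phi^\prime) A_\mu(\phi^\prime) \right) d\phi^\prime \right] \exp\left[ \frac{q}{c} \frac{\gamma^\mu k_\mu \gamma^\nu A_\nu}{2 \, p^\lambda k_\lambda} \right] F(0) \tag{1819} $$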
The above form can be simplified by expanding the last exponential factor due to the identity
$$ \left( \gamma^\mu k_\mu \gamma^\nu A_\nu \right)^n = 0 \tag{1820} $$
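for all integers $n$ such that $n > 1$. The identity can be proved by
$$
\begin{aligned}
\gamma^\mu k_\mu \gamma^\nu A_\nu \gamma^\tau k_\tau \gamma^\rho A_\rho &= - \gamma^\mu k_\mu \gamma^\tau k_\tau \gamma^\nu A_\nu \gamma^\rho A_\rho + 2 \, g^{\nu,\tau} A_\nu k_\tau \, \gamma^\mu k_\mu \gamma^\rho A_\rho \\
&= - \gamma^\mu k_\mu \gamma^\tau k_\tau \gamma^\nu A_\nu \gamma^\rho A_\rho
\end{aligned}
\tag{1821}
$$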
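where the first line follows by using the anti-commutation relations for the $\gamma$ matrices and the second line follows from applying the Lorenz gauge condition. The expression can be further simplified by noting that on anti-commuting the first pair of $\gamma$ matrices, one has
$$
\begin{aligned}
&= - \gamma^\mu k_\mu \gamma^\tau k_\tau \gamma^\nu A_\nu \gamma^\rho A_\rho \\
&= \gamma^\tau k_\tau \gamma^\mu k_\mu \gamma^\nu A_\nu \gamma^\rho A_\rho - 2 \, g^{\mu,\tau} k_\mu k_\tau \, \gamma^\nu A_\nu \gamma^\rho A_\rho \\
&= \gamma^\tau k_\tau \gamma^\mu k_\mu \gamma^\nu A_\nu \gamma^\rho A_\rho \\
&= \gamma^\mu k_\mu \gamma^\tau k_\tau \gamma^\nu A_\nu \gamma^\rho A_\rho
\end{aligned}
\tag{1822}
$$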
the third line follows from the condition $k^\mu k_\mu = 0$ and the last line follows from interchanging the first two pairs of summation indices. On comparing the first and last lines, one notes that the right-hand side is zero. Therefore, one has proved the identity
$$ \gamma^\mu k_\mu \gamma^\nu A_\nu \gamma^\tau k_\tau \gamma^\rho A_\rho = 0 \tag{1823} $$
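The identity of eqn(1823) can also be confirmed numerically with explicit matrices. The sketch below assumes the standard representation of the $\gamma$ matrices, the metric signature $(+,-,-,-)$, and a hypothetical null wave vector $k^\mu = (k, 0, 0, k)$ with a transverse potential, so that $k_\mu A^\mu = 0$; the helper names and numerical values are illustrative only.

```python
def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]

def gamma_spatial(s):
    """gamma^(i) = [[0, sigma], [-sigma, 0]] in the standard representation."""
    return [[0, 0, s[0][0], s[0][1]],
            [0, 0, s[1][0], s[1][1]],
            [-s[0][0], -s[0][1], 0, 0],
            [-s[1][0], -s[1][1], 0, 0]]

SIGMA = [[[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]]
GAMMA = [[[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]]]
GAMMA += [gamma_spatial(s) for s in SIGMA]
METRIC = [1, -1, -1, -1]

def slash(v):
    """gamma^mu v_mu for contravariant components v^mu."""
    return [[sum(METRIC[mu] * v[mu] * GAMMA[mu][i][j] for mu in range(4))
             for j in range(4)] for i in range(4)]

# Hypothetical plane wave travelling along the 3-direction
k_vec = [2.0, 0.0, 0.0, 2.0]     # k^mu k_mu = 0
a_vec = [0.0, 0.7, -0.3, 0.0]    # transverse, so k_mu A^mu = 0

M = matmul(slash(k_vec), slash(a_vec))
M2 = matmul(M, M)
max_M = max(abs(x) for row in M for x in row)
max_M2 = max(abs(x) for row in M2 for x in row)
print(max_M, max_M2)   # M itself is non-zero, while its square vanishes
```

Because the square vanishes, the series expansion of the matrix exponential in eqn(1819) terminates after the term linear in the matrix.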
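Using the above identity, the spinor $F(\phi)$ can be expanded as
$$ F(\phi) = \exp\left[ - \frac{i q}{\hbar c \, p^\lambda k_\lambda} \int_0^\phi \left( p_\mu A^\mu(\phi^\prime) - \frac{1}{2} \frac{q}{c} A^\mu(\phi^\prime) A_\mu(\phi^\prime) \right) d\phi^\prime \right] \left[ \hat{I} + \frac{q}{c} \frac{\gamma^\mu k_\mu \gamma^\nu A_\nu}{2 \, p^\lambda k_\lambda} \right] F(0) \tag{1824} $$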
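Hence, the spinor solution of the second-order differential equation can be expressed as
$$ \psi(x) = \exp\left[ i \frac{S}{\hbar} \right] \left[ \hat{I} + \frac{q}{c} \frac{\gamma^\mu k_\mu \gamma^\nu A_\nu}{2 \, p^\lambda k_\lambda} \right] F(0) \tag{1825} $$
where $S$, given by
$$ S = - p^\mu x_\mu - \frac{q}{c \, p^\lambda k_\lambda} \int_0^\phi \left( p_\mu A^\mu(\phi^\prime) - \frac{1}{2} \frac{q}{c} A^\mu(\phi^\prime) A_\mu(\phi^\prime) \right) d\phi^\prime \tag{1826} $$
is the classical action of a particle moving in an electromagnetic field.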
If the above equation is to be a solution of the Dirac equation, one needs to exclude redundant solutions of the second-order equation. This can be achieved by demanding that as $r \to \infty$ one has $A^\mu \to 0$. In this limit, the above solution reduces to
$$ \psi \to \exp\left[ - i \frac{p^\mu x_\mu}{\hbar} \right] F(0) \tag{1827} $$
which satisfies the Dirac equation if
$$ \left( \gamma^\mu p_\mu - m c \right) F(0) = 0 \tag{1828} $$
Therefore, one demands that $F(0)$ satisfies the above supplementary condition, which is the same as for a free particle. Hence, one can set
$$ F(0) = \mathcal{N}_F \begin{pmatrix} \chi_\sigma \\ \dfrac{\mathbf{p} \cdot \boldsymbol{\sigma}}{p^{(0)} + m c} \, \chi_\sigma \end{pmatrix} \tag{1829} $$
where the normalization constant is given by
$$ \mathcal{N}_F = \sqrt{ \frac{p^{(0)} + m c}{2 \, p^{(0)} \, V} } \tag{1830} $$
The spectrum of eigenvalues of the electron's energy can be found by Fourier transforming the above solution with respect to time, which shows that the electron absorbs and emits radiation in multiples of $\hbar \omega$. The Volkov solutions have been used to describe the Compton scattering of electrons by intense coherent laser beams, and are also the basis of the strong-field approximation sometimes found useful in atomic physics [126].
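The current density is derived from the expression
$$ j^\mu = c \, \overline{\psi} \gamma^\mu \psi \tag{1831} $$
Since the Dirac adjoint spinor $\overline{\psi} = \psi^\dagger \gamma^{(0)}$ is given by
$$ \overline{\psi} = \overline{F}(0) \left[ \hat{I} + \frac{q}{c} \frac{\gamma^\nu A_\nu \gamma^\mu k_\mu}{2 \, p^\lambda k_\lambda} \right] \exp\left[ - i \frac{S}{\hbar} \right] \tag{1832} $$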
the current density is evaluated as
$$ j^\mu = \frac{c}{p^{(0)} V} \left[ p^\mu - \frac{q}{c} A^\mu + k^\mu \left( \frac{q}{c} \frac{p^\nu A_\nu}{k^\lambda p_\lambda} - \frac{q^2}{c^2} \frac{A^\nu A_\nu}{2 \, k^\lambda p_\lambda} \right) \right] \tag{1833} $$
Hence, the current is composed of a constant component $p^\mu$, an oscillatory component from the vector potential, and an oscillatory component which is second order in the vector potential. This implies that the electromagnetic field has measurable consequences. For a vector potential $A^\mu$ which is a periodic function with a time-averaged value of zero, the time-averaged current density (time averages being denoted by an overline) is given by
$$ \overline{j^\mu} = \frac{c}{p^{(0)} V} \left[ p^\mu - k^\mu \left( \frac{q^2}{c^2} \frac{\overline{A^\nu A_\nu}}{2 \, k^\lambda p_\lambda} \right) \right] \tag{1834} $$
which shows that the electromagnetic wave does not drop out from time-averaged quantities.
11.14 The Limit of Zero Mass
The Dirac equation has the form
$$ \gamma^\mu \hat{p}_\mu \psi = m c \, \psi \tag{1835} $$
where the $\gamma$ matrices are any set of matrices which satisfy the anti-commutation relations
$$ \gamma^\mu \gamma^\nu + \gamma^\nu \gamma^\mu = 2 \, g^{\mu,\nu} \, \hat{I} \tag{1836} $$
[126] L. V. Keldysh, Zh. Eksp. Teor. Fiz. 47, 1945 (1964) [Sov. Phys. J.E.T.P. 20, 1307 (1965)]. F. H. M. Faisal, J. Phys. B 6, L89 (1973). H. R. Reiss, Phys. Rev. A 22, 1786 (1980).
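The Dirac equation is independent of the specific representation of the $\gamma$ matrices. We have chosen the representation
$$ \gamma^{(0)} = \begin{pmatrix} I & 0 \\ 0 & -I \end{pmatrix} \tag{1837} $$
and
$$ \gamma^{(i)} = \begin{pmatrix} 0 & \sigma^{(i)} \\ -\sigma^{(i)} & 0 \end{pmatrix} \tag{1838} $$
where $\sigma^{(i)}$ are the Pauli matrices. This is the standard representation.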
We can find other representations which differ through unitary transformations
$$ \psi^\prime = \hat{U} \psi \tag{1839} $$
where the explicit form of the $\gamma$ matrices transform via
$$ \gamma^{\mu\prime} = \hat{U} \gamma^\mu \hat{U}^\dagger \tag{1840} $$
and the Dirac adjoint is transformed via
$$ \overline{\psi}^{\,\prime} = \psi^{\prime\dagger} \gamma^{(0)\prime} \tag{1841} $$
These unitary transformations of the gamma operators keep matrix elements of the form
$$ \int d^3r \; \psi^\dagger \hat{A} \, \psi \tag{1842} $$
invariant.
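The chiral representation is found by performing the unitary transform
$$ \hat{U} = \frac{1}{\sqrt{2}} \begin{pmatrix} I & -I \\ I & I \end{pmatrix} \tag{1843} $$
starting with the standard representation. In the chiral representation, the $\gamma$ matrices have the form
$$ \gamma^{(0)\prime} = \begin{pmatrix} 0 & I \\ I & 0 \end{pmatrix} \tag{1844} $$
and
$$ \gamma^{(i)\prime} = \begin{pmatrix} 0 & \sigma^{(i)} \\ -\sigma^{(i)} & 0 \end{pmatrix} \tag{1845} $$
The components of the wave function in the chiral representation $\psi^\prime$ are denoted as
$$ \psi^\prime = \begin{pmatrix} \phi^L \\ \phi^R \end{pmatrix} \tag{1846} $$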
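The components $\phi^L$ and $\phi^R$ are related to the components of $\psi$ in the standard representation via
$$ \begin{pmatrix} \phi^L \\ \phi^R \end{pmatrix} = \frac{1}{\sqrt{2}} \begin{pmatrix} \phi^A - \phi^B \\ \phi^A + \phi^B \end{pmatrix} \tag{1847} $$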
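The transformation from the standard to the chiral representation can be verified by explicit matrix multiplication. The sketch below is only illustrative: the helper names are assumptions, and the matrices are stored as plain nested lists so that no external library is needed.

```python
import math

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]

def dagger(a):
    n = len(a)
    return [[complex(a[j][i]).conjugate() for j in range(n)] for i in range(n)]

def neg(a):
    return [[-x for x in row] for row in a]

def block(a, b, c, d):
    """Assemble a 4x4 matrix from the 2x2 blocks [[a, b], [c, d]]."""
    return [a[0] + b[0], a[1] + b[1], c[0] + d[0], c[1] + d[1]]

def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

I2 = [[1, 0], [0, 1]]
Z2 = [[0, 0], [0, 0]]
SIGMA = [[[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]]

# Standard representation, eqn(1837) and eqn(1838)
GAMMA0_STD = block(I2, Z2, Z2, neg(I2))
GAMMAS_STD = [block(Z2, s, neg(s), Z2) for s in SIGMA]

# Unitary transformation of eqn(1843)
norm = 1 / math.sqrt(2)
U = [[norm * x for x in row] for row in block(I2, neg(I2), I2, I2)]

def to_chiral(g):
    """gamma' = U gamma U^dagger, as in eqn(1840)."""
    return matmul(matmul(U, g), dagger(U))

GAMMA0_CHIRAL = to_chiral(GAMMA0_STD)
GAMMAS_CHIRAL = [to_chiral(g) for g in GAMMAS_STD]

print(close(GAMMA0_CHIRAL, block(Z2, I2, I2, Z2)))                    # eqn(1844)
print(all(close(gc, gs) for gc, gs in zip(GAMMAS_CHIRAL, GAMMAS_STD)))  # eqn(1845)
```

With this choice of $\hat{U}$ only $\gamma^{(0)}$ changes its form, while the spatial $\gamma^{(i)}$ keep the form they have in the standard representation.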
The chiral representation is particularly useful for the description of massless spin one-half particles, such as might be the case for the neutrino. The neutrino masses are extremely small. The masses have evaded direct experimental measurement. However, direct measurements have set upper limits on the masses which decrease with time [127]. In this case, with the limit $m \to 0$, the Dirac equation takes the form
$$ \begin{pmatrix} 0 & \left( \dfrac{\partial}{\partial t} + c \, \boldsymbol{\sigma} \cdot \nabla \right) \\ \left( \dfrac{\partial}{\partial t} - c \, \boldsymbol{\sigma} \cdot \nabla \right) & 0 \end{pmatrix} \begin{pmatrix} \phi^L \\ \phi^R \end{pmatrix} = 0 \tag{1848} $$
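Hence, the Dirac equation for a massless free particle reduces to two uncoupled equations, each of which are equations proposed by Weyl [128]
$$ \left( \frac{\partial}{\partial t} + c \, \boldsymbol{\sigma} \cdot \nabla \right) \phi^R = 0 \tag{1849} $$
and
$$ \left( \frac{\partial}{\partial t} - c \, \boldsymbol{\sigma} \cdot \nabla \right) \phi^L = 0 \tag{1850} $$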
The Weyl equation describes a spin one-half massless particle by a two-component spinor wave function. The Weyl equation violates parity invariance. The Weyl equation was considered to be unphysical until the discovery of the (anti-)neutrino [129] and the associated violation of parity invariance [130]. After the parity violation of the weak interaction was established, the Weyl equation was adopted to describe the neutrino [131].
Inexplicably, nature seems to have selected the Weyl equation for $\phi^L$, but not $\phi^R$, to describe neutrinos. The solutions of the Weyl equation for free particles
$$ \left( \frac{\partial}{\partial t} - c \, \boldsymbol{\sigma} \cdot \nabla \right) \phi^L = 0 \tag{1851} $$
[127] L. Langer and R. Moffat, Phys. Rev. 88, 689 (1952). V. A. Lyubimov, F. G. Novikov, V. Z. Nozik, F. F. Tretyakov, and V. S. Kosik, Phys. Lett. 94B, 266 (1980). A. I. Belesev et al., Phys. Lett. 350, 263 (1995).
[128] H. Weyl, Zeit. für Physik, 56, 330 (1929).
[129] C. L. Cowan Jr., F. Reines, F. B. Harrison, H. W. Kruse and A. D. McGuire, Science 124, 103 (1956). F. Reines and C. L. Cowan Jr., Phys. Rev. 113, 273 (1959).
[130] C. S. Wu, E. Ambler, R. W. Hayward, D. D. Hoppes and R. F. Hudson, Phys. Rev. 108, 1413 (1957).
[131] T. D. Lee and C. N. Yang, Phys. Rev. 105, 1671 (1957). A. Salam, Nuovo Cimento 5, 299 (1957). L. Landau, Nuclear Phys. 3, 127 (1957).
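can be written as
$$ \phi^L = \begin{pmatrix} u^{(0)} \\ u^{(1)} \end{pmatrix} \frac{1}{\sqrt{V}} \exp\left[ - \frac{i}{\hbar} ( E t - \mathbf{p} \cdot \mathbf{r} ) \right] \tag{1852} $$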
Since helicity is conserved, one can choose the direction of $\mathbf{p}$ as the axis of quantization. The positive-energy solution is given by
$$ \phi^L_- = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \frac{1}{\sqrt{V}} \exp\left[ - \frac{i}{\hbar} ( E t - p z ) \right] \tag{1853} $$
which has negative helicity and has energy given by
$$ E_- = c \, p \tag{1854} $$
The negative-energy solution is given by
$$ \phi^L_+ = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \frac{1}{\sqrt{V}} \exp\left[ - \frac{i}{\hbar} ( E t - p z ) \right] \tag{1855} $$
which has positive helicity and the energy is given by
$$ E_+ = - c \, p \tag{1856} $$
This negative-energy solution will describe anti-particles. The Weyl equation for $\phi^R$ has a positive-energy solution with positive helicity, and a negative-energy solution with negative helicity. Since only neutrinos with negative helicity are observed in nature, only $\phi^L$ is needed. The anti-neutrinos have positive helicity and are represented by $\phi^R$.
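The link between helicity and the sign of the energy can be made explicit. Acting on a plane wave with $\hat{\mathbf{p}} = - i \hbar \nabla$, the Weyl equation for $\phi^L$ reads $E \, \phi^L = - c \, \boldsymbol{\sigma} \cdot \mathbf{p} \, \phi^L$, while for $\phi^R$ the sign is reversed. The following minimal sketch evaluates these expressions for the spinors of eqn(1853) and eqn(1855); the function names, the choice $c = 1$ and the value of $p$ are assumptions made for illustration.

```python
def sigma_dot(p):
    """The 2x2 matrix sigma . p for a three-vector p."""
    px, py, pz = p
    return [[pz, px - 1j * py], [px + 1j * py, -pz]]

def expectation(m, v):
    """<v| m |v> / <v|v> for a two-component spinor v."""
    mv = [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]
    num = sum(complex(a).conjugate() * b for a, b in zip(v, mv))
    den = sum(abs(a) ** 2 for a in v)
    return (num / den).real

def helicity(spinor, p):
    """Expectation value of sigma . p / |p|."""
    pmag = sum(x * x for x in p) ** 0.5
    return expectation(sigma_dot(p), spinor) / pmag

def weyl_energy(spinor, p, chirality, c=1.0):
    """E = -c <sigma.p> for phi^L and E = +c <sigma.p> for phi^R."""
    sign = -1.0 if chirality == "L" else 1.0
    return sign * c * expectation(sigma_dot(p), spinor)

p = (0.0, 0.0, 2.0)                 # momentum along the quantization axis
down, up = (0.0, 1.0), (1.0, 0.0)

print(helicity(down, p), weyl_energy(down, p, "L"))   # negative helicity, E = + c p
print(helicity(up, p), weyl_energy(up, p, "L"))       # positive helicity, E = - c p
print(helicity(up, p), weyl_energy(up, p, "R"))       # positive helicity, E = + c p
```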
Figure 61: The dispersion relations for $\phi^L$ and $\phi^R$ (panel title: Elementary Excitations; the branches are labelled by the helicity $\Lambda = \pm 1$). The elementary excitations are the negative-helicity neutrino $\nu$ and a positive-helicity anti-neutrino $\overline{\nu}$.
The Neutrino
The neutrino was postulated by Pauli to balance energy and momentum conservation in beta decay. In beta decay, it had been observed that neutron decay products included a proton and an electron. However, it was observed that the emitted electron had a continuous range of kinetic energies. Therefore, another neutral particle must have been emitted in the decay. This particle was termed the anti-neutrino, and the reaction can be written as
$$ n \to p + e^- + \overline{\nu}_e \tag{1857} $$
Conservation of angular momentum requires that the neutrino has a spin of $\frac{\hbar}{2}$. Furthermore, since an energy of 1.2934 MeV is released in the transformation of a neutron to a proton, and since sometimes the decay processes produce electrons which seem to take up all the released energy, the neutrino was suggested as having zero mass. An upper limit on the neutrino's mass of a few eV follows from the Fermi-Kurie plot [132].

Figure 62: The Fermi-Kurie plot ($[ N(p) / F p^2 ]^{1/2}$ versus energy in keV) of the energy distribution of the electrons emitted in the beta decay of tritium, ${}^3\mathrm{H} \to {}^3\mathrm{He} + e^- + \overline{\nu}_e$. The decay releases 18.1 keV. It is seen that the electrons produced in the decay process have a non-zero probability for carrying off most of the released energy. Hence, one concludes that the anti-neutrinos are almost massless. The dashed blue curve is the curve expected if the neutrino had a mass of 3 keV.

The Fermi-Kurie plot of the electron energy distribution is based on the phase space available for the emission of the electron and anti-neutrino [133]. The joint phase-space available for the electron of four-momentum $(E_e/c, \mathbf{p})$ and the anti-neutrino of four-momentum $(E_\nu/c, \mathbf{q})$ is proportional to the factor
$$
\begin{aligned}
d\Gamma &= dp \, p^2 \int dq \, q^2 \, \delta( E - E_e(p) - E_\nu(q) ) \\
&= dE_e(p) \, \frac{p \, E_e(p)}{c^2} \int dE_\nu(q) \, \frac{q \, E_\nu(q)}{c^2} \, \delta( E - E_e(p) - E_\nu(q) ) \\
&= \frac{1}{c^5} \, dE_e(p) \, p \, E_e(p) \, \sqrt{ E_\nu(q)^2 - m_\nu^2 c^4 } \; E_\nu(q) \, \bigg|_{E_\nu(q) = E - E_e(p)} \\
&= \frac{1}{c^5} \, dE_e(p) \, p \, E_e(p) \, \sqrt{ ( E - E_e(p) )^2 - m_\nu^2 c^4 } \; ( E - E_e(p) )
\end{aligned}
\tag{1858}
$$
[132] L. Langer and R. Moffat, Phys. Rev. 88, 689 (1952). V. A. Lyubimov, F. G. Novikov, V. Z. Nozik, F. F. Tretyakov, and V. S. Kosik, Phys. Lett. 94B, 266 (1980). A. I. Belesev et al., Phys. Lett. 350, 263 (1995).
[133] E. Fermi, Zeit. für Physik, 88, 161 (1934). F. N. D. Kurie, J. R. Richardson and H. C. Paxton, Phys. Rev. 48, 167 (1935).
where, since the anti-neutrino's trajectory is unobservable, its momentum is integrated over. This phase-space factor partially governs the energy distribution of the emitted electrons. The second to last factor in the accessible volume of phase-space contains the dependence on the anti-neutrino's mass $m_\nu$ and it is this factor which is highlighted by the Fermi-Kurie plot. The plot is designed to exhibit a linear energy variation until the line cuts the $E$-axis, if the anti-neutrino is massless. On the other hand, if the anti-neutrino has a finite mass, the line should curve over and cut the $E$-axis vertically. In this case, the anti-neutrino mass would be determined by the difference between the linearly extrapolated intercept and the actual intercept.
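The behaviour of the Fermi-Kurie plot near the end-point can be illustrated with the last two factors of eqn(1858). Dividing the electron momentum distribution by $p^2$ (and by the Coulomb factor $F$, which is not modelled here) leaves a quantity proportional to $( E - E_e ) \sqrt{ ( E - E_e )^2 - m_\nu^2 c^4 }$, whose square root is plotted. The sketch below is written in terms of the electron kinetic energy; the function name, the overall normalization and the numerical values are assumptions chosen only for illustration.

```python
import math

def kurie(t_e, q_value, m_nu_c2=0.0):
    """Kurie ordinate (up to a constant) at electron kinetic energy t_e.

    q_value is the energy shared between the electron's kinetic energy and the
    anti-neutrino's total energy; m_nu_c2 is the anti-neutrino rest energy.
    All energies are in the same units (keV, say).
    """
    e_nu = q_value - t_e                 # total anti-neutrino energy E - E_e
    if e_nu < m_nu_c2:
        return 0.0                       # kinematically forbidden region
    return math.sqrt(e_nu * math.sqrt(e_nu ** 2 - m_nu_c2 ** 2))

q_value = 18.1   # keV, the energy release quoted for tritium

# Massless case: a straight line that cuts the axis at the end-point
print([round(kurie(t, q_value), 3) for t in (5.0, 10.0, 15.0, 18.1)])

# Hypothetical massive case: the curve bends over and ends at q_value - m c^2
print([round(kurie(t, q_value, m_nu_c2=3.0), 3) for t in (5.0, 10.0, 15.0, 16.0)])
```

In the massless case the ordinate is simply $E - E_e$, so the linear extrapolation and the actual intercept coincide.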
The process of beta decay does not conserve parity. The non-conservation of parity was discovered in the experiments of C. S. Wu et al. [134]. In these experiments, the spin of a ${}^{60}\mathrm{Co}$ nucleus was aligned with a magnetic field. The spin $S = 5 \hbar$ ${}^{60}\mathrm{Co}$ nucleus decayed into a spin $S = 4 \hbar$ ${}^{60}\mathrm{Ni}$ nucleus by emitting an electron and an anti-neutrino.
$$ {}^{60}\mathrm{Co} \to {}^{60}\mathrm{Ni} + e^- + \overline{\nu}_e \tag{1859} $$
Since angular momentum is conserved, the spin of the electron and the anti-neutrino initially must both be aligned with the field. In the experiment, the angular distribution of the emitted electrons was observed. Because the helicity of the electrons is conserved, the angular distribution of the electrons can be used to prove that the electrons all have negative helicity, and hence it is inferred that the anti-neutrinos should have positive helicity. Since helicity should be reversed under the parity operation, and since only negative helicity electrons are observed, the process is not invariant under parity. Hence, parity is not conserved.
The electrons that are emitted in beta decay have negative helicities. If the momentum of an emitted electron is given by $(p, \theta_p, \varphi_p)$, then its helicity operator is
$$ \Lambda_p = \begin{pmatrix} \cos\theta_p & \sin\theta_p \exp[ - i \varphi_p ] \\ \sin\theta_p \exp[ + i \varphi_p ] & - \cos\theta_p \end{pmatrix} \tag{1860} $$
The helicity operator has eigenstates $\chi$ given by
$$ \Lambda_p \, \chi^{\pm}_{\theta_p} = \pm \, \chi^{\pm}_{\theta_p} \tag{1861} $$
[134] C. S. Wu, E. Ambler, R. W. Hayward, D. D. Hoppes and R. F. Hudson, Phys. Rev. 108, 1413 (1957).
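which are determined as
$$
\begin{aligned}
\chi^{+}_{\theta_p} &= \begin{pmatrix} \cos\frac{\theta_p}{2} \exp[ - i \frac{\varphi_p}{2} ] \\ \sin\frac{\theta_p}{2} \exp[ + i \frac{\varphi_p}{2} ] \end{pmatrix} \\
\chi^{-}_{\theta_p} &= \begin{pmatrix} - \sin\frac{\theta_p}{2} \exp[ - i \frac{\varphi_p}{2} ] \\ \cos\frac{\theta_p}{2} \exp[ + i \frac{\varphi_p}{2} ] \end{pmatrix}
\end{aligned}
\tag{1862}
$$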
Since angular momentum is conserved and the emitted electrons only have negative helicity, the angular distribution of the emitted electrons is proportional to the square of the overlap of the initial electron spin-up spinor with the negative helicity spinors
$$ \left| \chi^{+ \, \dagger}_{\theta=0} \; \chi^{-}_{\theta_p} \right|^2 = \sin^2\frac{\theta_p}{2} = \frac{1}{2} ( 1 - \cos\theta_p ) \tag{1863} $$
which is in exact agreement with the experimentally observed distribution. From the distribution of emitted electrons one is led to expect that the anti-neutrino has positive helicity.
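The helicity eigenstates of eqn(1862) and the overlap of eqn(1863) can be checked directly. The sketch below is a minimal numerical verification; the function names and the sample angles are hypothetical.

```python
import cmath
import math

def helicity_operator(theta, phi):
    """The matrix Lambda_p of eqn(1860)."""
    return [[math.cos(theta), math.sin(theta) * cmath.exp(-1j * phi)],
            [math.sin(theta) * cmath.exp(1j * phi), -math.cos(theta)]]

def chi_plus(theta, phi):
    return [math.cos(theta / 2) * cmath.exp(-1j * phi / 2),
            math.sin(theta / 2) * cmath.exp(1j * phi / 2)]

def chi_minus(theta, phi):
    return [-math.sin(theta / 2) * cmath.exp(-1j * phi / 2),
            math.cos(theta / 2) * cmath.exp(1j * phi / 2)]

def apply(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]

def overlap_sq(a, b):
    """|a^dagger b|^2 for two-component spinors."""
    return abs(sum(x.conjugate() * y for x, y in zip(a, b))) ** 2

theta_p, phi_p = 1.1, 0.4            # hypothetical emission angles
lam = helicity_operator(theta_p, phi_p)
plus, minus = chi_plus(theta_p, phi_p), chi_minus(theta_p, phi_p)

# Eigenvalue equations of eqn(1861): Lambda chi^(+-) = +- chi^(+-)
residual_plus = max(abs(x - y) for x, y in zip(apply(lam, plus), plus))
residual_minus = max(abs(x + y) for x, y in zip(apply(lam, minus), minus))

# Angular distribution of eqn(1863)
prob = overlap_sq(chi_plus(0.0, phi_p), minus)
print(residual_plus, residual_minus)
print(prob, 0.5 * (1 - math.cos(theta_p)))
```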
The helicity of the neutrino was measured in an experiment performed by Maurice Goldhaber et al. [135]. In the experiment, a ${}^{152}\mathrm{Eu}$ nucleus with $J = 0$ captures an electron from the K-shell and decays to the excited state of a ${}^{152}\mathrm{Sm}$ nucleus with angular momentum $J = \hbar$ and emits a neutrino.
$$ {}^{152}\mathrm{Eu} + e^- \to {}^{152}\mathrm{Sm}^* + \nu_e \tag{1864} $$
The $J = \hbar$ excited state of $\mathrm{Sm}^*$ subsequently decays into the $J = 0$ ground state of Sm by emitting a photon.
$$ {}^{152}\mathrm{Sm}^* \to {}^{152}\mathrm{Sm} + \gamma \tag{1865} $$
[135] M. Goldhaber, I. Grodzins, A. W. Sunyar, Phys. Rev. 109, 1015 (1958).

Figure 63: The spin $S = 5 \hbar$ of the Co nucleus is aligned with the magnetic field. The Co undergoes beta decay to Ni which has $S = 4 \hbar$ by emitting an electron $e^-$ and an anti-neutrino $\overline{\nu}_e$. The spin of the electron and the anti-neutrino produced by the decay must initially be aligned with the magnetic field, due to conservation of angular momentum.

Figure 64: The angular distribution $I(\theta)$, plotted against $\theta/\pi$, of the emitted electron in the beta decay experiment of Wu et al.
Goldhaber et al. measured the photons with the full Doppler shift, from which they were able to infer the direction of the recoil of the nucleus. The photons were observed to be right-circularly polarized, which corresponds to having a negative helicity. Therefore, the photon's spin was parallel to the momentum of the emitted neutrino. Since the ground state of Sm has zero angular momentum, the excited state of the $\mathrm{Sm}^*$ nucleus must have had its angular momentum oriented along the direction of motion of the emitted neutrino. Since the sum of the angular momentum of the excited state ($J = \hbar$) and the emitted neutrino must equal the spin of the captured electron $\frac{\hbar}{2}$, the neutrino must have its spin oriented anti-parallel to the angular momentum of the $\mathrm{Sm}^*$ nucleus. Hence, the neutrino has negative helicity.

Figure 65: A schematic depiction of the experiments of Goldhaber et al. which determined the helicity of the neutrino.
11.15 Classical Dirac Field Theory
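The Dirac Lagrangian density is given by
$$ \mathcal{L} = \overline{\psi} \left[ i \hbar c \, \gamma^\mu \left( \partial_\mu + i \frac{q}{c \hbar} A_\mu \right) - m c^2 \right] \psi \tag{1866} $$
where $\overline{\psi} = \psi^\dagger \gamma^{(0)}$ is the Dirac adjoint. Since $\psi^\dagger$ and $\psi$ are independent, the momentum conjugate to $\psi$ is
$$ \Pi = \frac{1}{c} \frac{\partial \mathcal{L}}{\partial ( \partial_0 \psi )} = i \hbar \, \overline{\psi} \gamma^{(0)} = i \hbar \, \psi^\dagger \tag{1867} $$
The momentum conjugate to $\psi^\dagger$ vanishes
$$ \Pi^\dagger = \frac{1}{c} \frac{\partial \mathcal{L}}{\partial ( \partial_0 \psi^\dagger )} = 0 \tag{1868} $$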
The Lagrangian equation of motion is found from the variational principle which states that the action is extremal with respect to $\psi$ and $\psi^\dagger$. The condition that the action is extremal with respect to variations in $\psi^\dagger$ leads to the Dirac equation
$$ i \hbar \, \gamma^\mu \left( \partial_\mu + i \frac{q}{c \hbar} A_\mu \right) \psi = m c \, \psi \tag{1869} $$
after the resulting equation has been multiplied by a factor of $\gamma^{(0)}$. On making a variation of the action with respect to $\psi$, one finds the Hermitean conjugate equation
$$ - i \hbar c \left( \partial_\mu - i \frac{q}{c \hbar} A_\mu \right) \overline{\psi} \gamma^\mu - m c^2 \, \overline{\psi} = 0 \tag{1870} $$
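That this is the Hermitean conjugate of the Dirac equation can be shown by taking its Hermitean conjugate, which results in
$$ i \hbar c \, \gamma^{\mu\dagger} \gamma^{(0)} \left( \partial_\mu + i \frac{q}{c \hbar} A_\mu \right) \psi - m c^2 \, \gamma^{(0)} \psi = 0 \tag{1871} $$
The above equation can be reduced to the conventional form by multiplying by $\gamma^{(0)}$ and by using the identities
$$
\begin{aligned}
\gamma^{(0)} \gamma^{(0)} &= \hat{I} \\
\gamma^{(0)} \gamma^{\mu\dagger} \gamma^{(0)} &= \gamma^\mu
\end{aligned}
\tag{1872}
$$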
Hence, the equation found by varying $\psi$ is just the Hermitean conjugate of the Dirac equation
$$ i \hbar \, \gamma^\mu \left( \partial_\mu + i \frac{q}{c \hbar} A_\mu \right) \psi = m c \, \psi \tag{1873} $$
Furthermore, it is surmised that the starting Lagrangian is appropriate to describe the Dirac field theory.
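The Hamiltonian density $\mathcal{H}$ is determined from the Lagrangian by the usual Legendre transformation process
$$
\begin{aligned}
\mathcal{H} &= c \, \Pi \, \partial_0 \psi + c \, \Pi^\dagger \, \partial_0 \psi^\dagger - \mathcal{L} \\
&= i \hbar c \, \psi^\dagger \partial_0 \psi - \mathcal{L} \\
&= i \hbar c \, \overline{\psi} \gamma^{(0)} \partial_0 \psi - \mathcal{L} \\
&= - i \hbar c \, \overline{\psi} \, \boldsymbol{\gamma} \cdot \left( \nabla - i \frac{q}{c \hbar} \mathbf{A} \right) \psi + \overline{\psi} \left( m c^2 + q \, \gamma^{(0)} A^{(0)} \right) \psi
\end{aligned}
\tag{1874}
$$
where the relation between the covariant components of the vector potential and the contravariant components, $A^{(i)} = - A_{(i)}$, has been used in the last line. The result is identifiable with the Hamiltonian density that appears in the usual expression for the quantum mechanical expectation value for the energy for the Dirac electron.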
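The set of conserved quantities can be obtained from Noether's theorem. The momentum-energy tensor $T^\mu{}_\nu$ is given by
$$ T^\mu{}_\nu = \frac{\partial \mathcal{L}}{\partial ( \partial_\mu \psi )} \, \partial_\nu \psi + \frac{\partial \mathcal{L}}{\partial ( \partial_\mu \psi^\dagger )} \, \partial_\nu \psi^\dagger - \delta^\mu_\nu \, \mathcal{L} \tag{1875} $$
which is evaluated as
$$
\begin{aligned}
T^\mu{}_\nu &= i \hbar c \, \overline{\psi} \gamma^\mu \partial_\nu \psi - \delta^\mu_\nu \, \mathcal{L} \\
&= i \hbar c \, \overline{\psi} \gamma^\mu \partial_\nu \psi + \delta^\mu_\nu \left[ - i \hbar c \, \overline{\psi} \gamma^\rho \left( \partial_\rho + i \frac{q}{c \hbar} A_\rho \right) \psi + m c^2 \, \overline{\psi} \psi \right]
\end{aligned}
\tag{1876}
$$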
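Hence, one finds the energy density $T^0{}_0$ is given by
$$
\begin{aligned}
T^0{}_0 &= - i \hbar c \, \overline{\psi} \, \boldsymbol{\gamma} \cdot \left( \nabla - i \frac{q}{c \hbar} \mathbf{A} \right) \psi + \overline{\psi} \left( m c^2 + q \, \gamma^{(0)} A^{(0)} \right) \psi \\
&= - i \hbar c \, \psi^\dagger \, \boldsymbol{\alpha} \cdot \left( \nabla - i \frac{q}{c \hbar} \mathbf{A} \right) \psi + \psi^\dagger \left( \beta \, m c^2 + q \, A^{(0)} \right) \psi
\end{aligned}
\tag{1877}
$$
which is the Hamiltonian density $\mathcal{H}$. On integrating over all space, one sees that the energy of the Dirac Field is equal to the expectation value of the Dirac Hamiltonian operator
$$ \int d^3r \; T^0{}_0 = \int d^3r \; \psi^\dagger \hat{H} \, \psi \tag{1878} $$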
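Likewise, ($c$ times) the momentum density $T^0{}_j$ is found from
$$
\begin{aligned}
T^0{}_j &= i \hbar c \, \overline{\psi} \gamma^{(0)} \partial_j \psi \\
&= i \hbar c \, \psi^\dagger \partial_j \psi \\
&= c \, \psi^\dagger \hat{p}_j \psi
\end{aligned}
\tag{1879}
$$
where the partial derivative has been identified with the covariant momentum operator. Hence, the contravariant component of the momentum is given by
$$
\begin{aligned}
T^{0,j} &= - i \hbar c \, \psi^\dagger \frac{\partial}{\partial x^j} \psi \\
&= c \, \psi^\dagger \hat{p}^{(j)} \psi
\end{aligned}
\tag{1880}
$$
where the usual (contravariant) momentum operator is defined as
$$ \hat{p}^{(j)} = - i \hbar \frac{\partial}{\partial x^j} \tag{1881} $$
Therefore, the $j$-th component of the momentum is given by
$$
\begin{aligned}
P^{(j)} &= \frac{1}{c} \int d^3r \; T^{0,j} \\
&= \int d^3r \; \psi^\dagger \hat{p}^{(j)} \psi
\end{aligned}
\tag{1882}
$$
which is equal to the expectation value of the momentum operator.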
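One can also determine the conserved Noether charges by noting that the Lagrangian is invariant under a global gauge transformation
$$
\begin{aligned}
\psi \to \psi^\prime &= \exp\left[ + i \varphi \right] \psi \\
\psi^* \to \psi^{*\prime} &= \exp\left[ - i \varphi \right] \psi^*
\end{aligned}
\tag{1883}
$$
where $\varphi$ is a constant real number. The infinitesimal global gauge transformation produces a variation in the (independent) fields
$$
\begin{aligned}
\delta\psi &= + i \, \delta\varphi \, \psi \\
\delta\psi^* &= - i \, \delta\varphi \, \psi^*
\end{aligned}
\tag{1884}
$$
Since the Lagrangian is invariant under the transformation, then
$$ \delta\mathcal{L} = 0 \tag{1885} $$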
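so we have
$$
\begin{aligned}
0 &= \delta\mathcal{L} \\
&= \left( \frac{\partial \mathcal{L}}{\partial \psi} \right) \delta\psi + \left( \frac{\partial \mathcal{L}}{\partial \psi^*} \right) \delta\psi^* + \left( \frac{\partial \mathcal{L}}{\partial ( \partial_\mu \psi )} \right) \delta( \partial_\mu \psi ) + \left( \frac{\partial \mathcal{L}}{\partial ( \partial_\mu \psi^* )} \right) \delta( \partial_\mu \psi^* )
\end{aligned}
\tag{1886}
$$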
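After substituting the Euler-Lagrange equations for the derivatives w.r.t. the fields $\psi$ and $\psi^*$, the variation is expressed as
$$ 0 = \partial_\mu \left[ \left( \frac{\partial \mathcal{L}}{\partial ( \partial_\mu \psi )} \right) \delta\psi + \left( \frac{\partial \mathcal{L}}{\partial ( \partial_\mu \psi^* )} \right) \delta\psi^* \right] \tag{1887} $$
For an arbitrary gauge transformation through the fixed infinitesimal angle $\delta\varphi$, this condition becomes
$$ 0 = i \, \delta\varphi \; \partial_\mu \left[ \left( \frac{\partial \mathcal{L}}{\partial ( \partial_\mu \psi )} \right) \psi - \left( \frac{\partial \mathcal{L}}{\partial ( \partial_\mu \psi^* )} \right) \psi^* \right] \tag{1888} $$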
Hence, one finds that there is a current $j^{\mu}$ which satisfies the continuity equation
$$
\partial_{\mu}j^{\mu} = 0 \tag{1889}
$$
where (apart from the infinitesimal constant of proportionality) the current is given by
$$
j^{\mu} \propto i\,\delta\varphi\left[\left(\frac{\partial\mathcal{L}}{\partial(\partial_{\mu}\psi)}\right)\psi - \left(\frac{\partial\mathcal{L}}{\partial(\partial_{\mu}\psi^{*})}\right)\psi^{*}\right] \tag{1890}
$$
For the Dirac Lagrangian, the second term is identically zero and the first term is non-zero. Hence, on adopting a conventional normalization, the conserved current is identified as
$$
j^{\mu} = c\,\overline{\psi}^{\dagger}\gamma^{\mu}\psi \tag{1891}
$$
This is the same expression for the conserved current that was previously derived for the one-electron Dirac equation. Hence, the one-particle Dirac equation yields the same expectation values and obeys the same conservation laws as the (classical) Dirac field theory.
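As a worked check of the step from eqn (1890) to eqn (1891), the sketch below evaluates the two derivative terms. It assumes the Lagrangian density in the form $\mathcal{L} = \overline{\psi}^{\dagger}\,(\, i\hbar c\,\gamma^{\mu}\partial_{\mu} - mc^{2}\,)\,\psi$ with $\overline{\psi}^{\dagger} \equiv \psi^{\dagger}\gamma^{(0)}$; the coupling to $A_{\mu}$ is left out because it contains no derivatives of the fields and so does not affect the result.

```latex
\begin{aligned}
\frac{\partial\mathcal{L}}{\partial(\partial_{\mu}\psi)} &= i\hbar c\,\overline{\psi}^{\dagger}\gamma^{\mu},
\qquad
\frac{\partial\mathcal{L}}{\partial(\partial_{\mu}\psi^{*})} = 0, \\[4pt]
i\,\delta\varphi\left[\left(\frac{\partial\mathcal{L}}{\partial(\partial_{\mu}\psi)}\right)\psi
 - \left(\frac{\partial\mathcal{L}}{\partial(\partial_{\mu}\psi^{*})}\right)\psi^{*}\right]
&= -\,\hbar c\,\delta\varphi\;\overline{\psi}^{\dagger}\gamma^{\mu}\psi .
\end{aligned}
```

Dividing out the constant factor $-\hbar\,\delta\varphi$ is the "conventional normalization" that leaves $j^{\mu} = c\,\overline{\psi}^{\dagger}\gamma^{\mu}\psi$.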
11.15.1 Chiral Gauge Symmetry
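In the limit of zero mass, the Dirac Lagrangian takes the form
$$
\mathcal{L} = i\hbar c\,\overline{\psi}^{\dagger}\gamma^{\mu}\partial_{\mu}\psi \tag{1892}
$$
Starting with the standard representation and making the unitary transformation
$$
\hat{U} = \frac{1}{\sqrt{2}}\begin{pmatrix} I & -I \\ I & I \end{pmatrix} \tag{1893}
$$
one finds that in the chiral representation the Dirac Lagrangian reduces to
$$
\mathcal{L} = i\hbar c\left[\phi_{L}^{\dagger}\,\sigma_{L}^{\mu}\,\partial_{\mu}\phi_{L} + \phi_{R}^{\dagger}\,\sigma_{R}^{\mu}\,\partial_{\mu}\phi_{R}\right] \tag{1894}
$$
where $\phi_{L}$ and $\phi_{R}$ are two-component Dirac spinors and the two sets of quantities $\sigma_{L}^{\mu}$ and $\sigma_{R}^{\mu}$ are expressed in terms of the Pauli matrices as
$$
\begin{aligned}
\sigma_{L}^{\mu} &= (\,\sigma^{0}, -\boldsymbol{\sigma}\,) \\
\sigma_{R}^{\mu} &= (\,\sigma^{0}, \boldsymbol{\sigma}\,)
\end{aligned}
\tag{1895}
$$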
The difference between $\sigma_{L}^{\mu}$ and $\sigma_{R}^{\mu}$ reflects the different chirality of $\phi_{L}$ and $\phi_{R}$. In the absence of the mass term, the Dirac Lagrangian is invariant under two independent scalar gauge transformations. These correspond to the global gauge transformations
$$
\begin{aligned}
\phi_{L} &\rightarrow \phi_{L}' = \phi_{L}\exp\left[i\theta_{L}\right] \\
\phi_{R} &\rightarrow \phi_{R}' = \phi_{R}\exp\left[i\theta_{R}\right]
\end{aligned}
\tag{1896}
$$
where $\theta_{L}$ and $\theta_{R}$ are independent angles. The Lagrangian has a $U(1)\times U(1)$ gauge symmetry. The presence of a mass term would couple the two fields and reduce the gauge transformation to one in which $\theta_{R} = \theta_{L}$.
In the chiral representation, the Hermitean matrix defined by
$$
\gamma^{(4)} = i\,\gamma^{(0)}\gamma^{(1)}\gamma^{(2)}\gamma^{(3)} \tag{1897}
$$
takes the form
$$
\gamma^{(4)} = \begin{pmatrix} -I & 0 \\ 0 & I \end{pmatrix} \tag{1898}
$$
The general gauge transformations for the massless fermion can be expressed as the product of two independent transformations
$$
\psi \rightarrow \psi' = \exp\left[i\left(\frac{\theta_{L}+\theta_{R}}{2}\right)\hat{I}\right]\exp\left[i\left(\frac{\theta_{R}-\theta_{L}}{2}\right)\gamma^{(4)}\right]\psi \tag{1899}
$$
where $\psi$ is a four-component spinor
$$
\psi = \begin{pmatrix} \phi_{L} \\ \phi_{R} \end{pmatrix} \tag{1900}
$$
The first factor represents the usual global gauge transformation for the Dirac Lagrangian with finite mass. This transformation yields the usual conserved four-vector current $j_{V}^{\mu}$ defined by
$$
j_{V}^{\mu} = c\,\overline{\psi}^{\dagger}\gamma^{\mu}\psi \tag{1901}
$$
The second factor is specific to the Dirac Lagrangian with zero mass. It is called the chiral transformation or axial $U(1)$ transformation. Using the anti-commutation relation
$$
\{\,\gamma^{(4)}, \gamma^{\mu}\,\}_{+} = 0 \tag{1902}
$$
one can show that the exponential factor in the chiral gauge transformation has the property that
$$
\gamma^{\mu}\exp\left[i\left(\frac{\theta_{R}-\theta_{L}}{2}\right)\gamma^{(4)}\right] = \exp\left[-i\left(\frac{\theta_{R}-\theta_{L}}{2}\right)\gamma^{(4)}\right]\gamma^{\mu} \tag{1903}
$$
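Eqns (1902) and (1903) can be verified numerically. The following sketch again assumes the standard representation of the $\gamma$ matrices, and uses $(\gamma^{(4)})^{2} = \hat{I}$ to write the exponential as $\cos a\,\hat{I} + i \sin a\,\gamma^{(4)}$; the function name `chiral_factor` and the test angle are illustrative choices only.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
Z2 = np.zeros((2, 2), dtype=complex)
I4 = np.eye(4, dtype=complex)
sigma = [
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]

# Assumed standard representation of the Dirac matrices.
gammas = [np.block([[I2, Z2], [Z2, -I2]])]
gammas += [np.block([[Z2, s], [-s, Z2]]) for s in sigma]
gamma4 = 1j * gammas[0] @ gammas[1] @ gammas[2] @ gammas[3]


def chiral_factor(a):
    """exp(i a gamma4), valid because gamma4 squares to the identity."""
    return np.cos(a) * I4 + 1j * np.sin(a) * gamma4


a = 0.37  # stands for (theta_R - theta_L) / 2; arbitrary test value

# Eqn (1902): gamma4 anti-commutes with every gamma^mu.
anticommutes = all(np.allclose(gamma4 @ g + g @ gamma4, 0) for g in gammas)

# Eqn (1903): moving gamma^mu through the exponential flips the sign of a.
flips_sign = all(
    np.allclose(g @ chiral_factor(a), chiral_factor(-a) @ g) for g in gammas
)

# Two such moves leave gamma^(0) gamma^mu unchanged, which is why the
# massless Lagrangian is invariant.
kernel_invariant = all(
    np.allclose(chiral_factor(-a) @ gammas[0] @ g @ chiral_factor(a), gammas[0] @ g)
    for g in gammas
)
```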
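This property can be used to show that the Lagrangian is invariant under the chiral transformation because
$$
\begin{aligned}
\overline{\psi'}^{\dagger}\gamma^{\mu}\partial_{\mu}\psi' &= \psi^{\dagger}\exp\left[-i\left(\frac{\theta_{R}-\theta_{L}}{2}\right)\gamma^{(4)}\right]\gamma^{(0)}\gamma^{\mu}\partial_{\mu}\exp\left[i\left(\frac{\theta_{R}-\theta_{L}}{2}\right)\gamma^{(4)}\right]\psi \\
&= \psi^{\dagger}\gamma^{(0)}\gamma^{\mu}\partial_{\mu}\psi \\
&= \overline{\psi}^{\dagger}\gamma^{\mu}\partial_{\mu}\psi
\end{aligned}
\tag{1904}
$$
which involves two commutations. Since the massless Dirac Lagrangian is invariant under the chiral transformation, Noether's theorem shows that the current
$$
j_{A}^{\mu} = c\,\overline{\psi}^{\dagger}\gamma^{\mu}\gamma^{(4)}\psi \tag{1905}
$$
is conserved. This conserved current transforms like a vector under proper orthochronous Lorentz transformations but does not transform as a vector under improper orthochronous transformations. Therefore, the current is an axial current. The conserved axial density $j_{A}^{(0)}$ is given by
$$
\begin{aligned}
j_{A}^{(0)} &= \overline{\psi}^{\dagger}\gamma^{(0)}\gamma^{(4)}\psi \\
&= \psi^{\dagger}\gamma^{(4)}\psi \\
&= -\,\phi_{L}^{\dagger}\phi_{L} + \phi_{R}^{\dagger}\phi_{R}
\end{aligned}
\tag{1906}
$$
which is the difference between the number of particles with positive helicity and the number of particles with negative helicity.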
In the presence of a mass $m$, the Dirac Lagrangian in the chiral representation is
$$
\mathcal{L} = i\hbar c\left[\phi_{L}^{\dagger}\,\sigma_{L}^{\mu}\,\partial_{\mu}\phi_{L} + \phi_{R}^{\dagger}\,\sigma_{R}^{\mu}\,\partial_{\mu}\phi_{R}\right] - mc^{2}\left[\phi_{L}^{\dagger}\phi_{R} + \phi_{R}^{\dagger}\phi_{L}\right] \tag{1907}
$$
and one finds that the axial current is not conserved, because the mass term is not invariant and acts like a current source
$$
\partial_{\mu}j_{A}^{\mu} = i\,\frac{2mc}{\hbar}\,\overline{\psi}^{\dagger}\gamma^{(4)}\psi \tag{1908}
$$
To summarize, just like the Proca equation yields a zero mass for the photon if one imposes $U(1)$ gauge invariance on the electromagnetic field$^{136}$, the neutrino must have zero mass if one imposes a global $U(1)$ chiral gauge invariance. Furthermore, the existence of conservation of chirality for the massless neutrino implies that the weak interaction must involve a coupling proportional to a factor of either $(\hat{I} + \gamma^{(4)})$ or $(\hat{I} - \gamma^{(4)})$.
$^{136}$ Schwinger has noted that the condition of gauge invariance does not necessarily result in the photon being massless. He argued that if the electromagnetic coupling strength were larger, the photon could have finite mass. [J. Schwinger, "Gauge Invariance and Mass", Physical Review, 125, 397 (1962).]

Exercise:
By considering an infinitesimal chiral gauge transformation on the Lagrangian for massive Dirac particles, determine $\delta\mathcal{L}$ and show that this leads to the axial current $j_{A}^{\mu}$ not being conserved.
11.16 Hole Theory
The negative-energy solutions of the Dirac equation lead to the conclusion that one-particle quantum mechanics is an inadequate description of nature. In classical mechanics, the dispersion relation for a free particle is found to be given by
$$
E = \pm\sqrt{m^{2}c^{4} + p^{2}c^{2}} \tag{1909}
$$
The negative-energy states found in classical mechanics can be safely ignored. The rationale for ignoring the negative-energy states in classical mechanics is that the dynamics is governed by a set of differential equations which result in the classical variables changing in a continuous fashion. Since the particle's energy can only change in a continuous fashion, there is no mechanism which allows it to connect with the negative branch of the dispersion relation. However, in quantum mechanics, particles can make discontinuous transitions between different energy levels by emitting photons. Hence, if one has a single electron in a positive-energy state where $E > mc^{2}$, this state would be unstable to the electron making a transition to a negative-energy state, which occurs with the simultaneous emission of photons that carry away an energy greater than $2mc^{2}$. The transition rate for such a process is quite large; therefore, one might conclude that positive-energy particles should not exist in nature. Furthermore, if one does have particles in the negative-energy branch, they might be able to further lower their energies by multiple photon emission processes. Hence, the states of negative-energy particles with finite momenta could be unstable to states in which the momentum has an infinite value.
Dirac noted that if the negative-energy states were all filled, then the Pauli exclusion principle would prevent the decay of positive-energy particles into the negative-energy states. Furthermore, in the absence of any positive-energy particles, the Pauli exclusion principle would cause the set of particles in the negative-energy states to be completely inert. In this picture, the filled sea of negative-energy states would represent the physical vacuum, and would be unobservable in experiments. For example, if charge is measured, it is the non-uniform part of the charge distribution that is measured, but the infinite number of particles in the negative-energy states do produce a uniform charge density. Likewise, when energies are measured, the energy is usually measured with respect to some reference level. For the case of a vacuum in which all the negative-energy states are filled with electrons, the measured energies correspond to energy differences and so the infinite negative energy of the vacuum should cancel. Therefore, Dirac postulated that the vacuum consists of the state in which all the negative-energy states are filled with electrons$^{137}$. Furthermore, physical states correspond to the states where relatively few of the positive-energy states are filled with electrons and a few negative-energy states are unoccupied. In this case, the electrons in the positive-energy states are identified with observable electrons, and the unfilled states or holes in the distribution of negative-energy states are also observable. These holes are known as positrons and are the antiparticles of the electrons. The properties of a positron are found by computing the difference between the property for a state with an absent negative-energy electron and the property of the vacuum state.

Figure 66: A cartoon depicting the vacuum for Dirac's Hole Theory, in which the negative-energy states are filled and the positive-energy states are empty. [Vertical axis: $E / m c^{2}$; regions labelled "Unoccupied Positive Energy States" and "Occupied Negative Energy States".]
We shall assume that the vacuum consists of $N$ electrons which completely fill all the $N$ negative-energy states and, for simplicity of discussion, that the effect of coupling to the electromagnetic field can be ignored. Then the charge of a positron $q_{p}$ is the difference between the charge of the vacuum with one missing electron and the charge of the vacuum
$$
q_{p} = (\,N - 1\,)\,q_{e} - N\,q_{e} \tag{1910}
$$
Therefore, one finds that the positron has the opposite charge to that of an electron
$$
q_{p} = -\,q_{e} \tag{1911}
$$
$^{137}$ P. A. M. Dirac, Proc. Roy. Soc. A 126, 360 (1930).
Hence, the positron has a positive charge. Likewise, the energy of the vacuum in which the electrons occupy all the negative-energy states is denoted by $E_{0}$. The positron energy will be denoted as $E_{p}(\mathbf{p}_{p})$. The positron corresponds to all states with negative energy being filled except for the state with the energy
$$
E_{e}(\mathbf{p}_{e}) = -\sqrt{m^{2}c^{4} + p_{e}^{2}c^{2}} \tag{1912}
$$
which is unfilled. The positron energy is defined as the energy difference
$$
\begin{aligned}
E_{p}(\mathbf{p}_{p}) &= \left[\,E_{0} - E_{e}(\mathbf{p}_{e})\,\right] - E_{0} \\
&= -\,E_{e}(\mathbf{p}_{e}) \\
&= \sqrt{m^{2}c^{4} + p_{e}^{2}c^{2}}
\end{aligned}
\tag{1913}
$$
Therefore, the positron corresponds to a particle with a positive energy. From this it is seen that the rest mass energy of the positron is identical to the rest mass energy of the electron. If the vacuum corresponds to a state with momentum $\mathbf{P}_{0}$ and if the negative-energy state with momentum $\mathbf{p}_{e}$ is unfilled, then the momentum of the positron would be given by $\mathbf{p}_{p}$ where
$$
\begin{aligned}
\mathbf{p}_{p} &= \left[\,\mathbf{P}_{0} - \mathbf{p}_{e}\,\right] - \mathbf{P}_{0} \\
&= -\,\mathbf{p}_{e}
\end{aligned}
\tag{1914}
$$
Hence, the momentum of the positron is the negative of the momentum of the missing electron
$$
\mathbf{p}_{p} = -\,\mathbf{p}_{e} \tag{1915}
$$
Likewise, the spin of the positron is opposite to the spin of the missing electron, etc. The velocity of an electron is defined as the group velocity of a wave packet of momentum $\mathbf{p}_{e}$. Hence, one finds the velocity of the negative-energy electron from
$$
\begin{aligned}
\mathbf{v}_{e} &= \frac{\partial}{\partial\mathbf{p}_{e}}E_{e}(\mathbf{p}_{e}) \\
&= -\,\frac{\mathbf{p}_{e}\,c^{2}}{\sqrt{m^{2}c^{4} + p_{e}^{2}c^{2}}}
\end{aligned}
\tag{1916}
$$
while the velocity of the positron is given by
$$
\begin{aligned}
\mathbf{v}_{p} &= \frac{\partial}{\partial\mathbf{p}_{p}}E_{p}(\mathbf{p}_{p}) \\
&= \frac{\mathbf{p}_{p}\,c^{2}}{\sqrt{m^{2}c^{4} + p_{p}^{2}c^{2}}} \\
&= -\,\frac{\mathbf{p}_{e}\,c^{2}}{\sqrt{m^{2}c^{4} + p_{e}^{2}c^{2}}} \\
&= \mathbf{v}_{e}
\end{aligned}
\tag{1917}
$$
Therefore, the positron and the negative-energy electron states have the same velocities.

Table 18: The relation between properties of Negative Energy Electron and Positron States.

Particle    Charge    Energy    Momentum    Spin          Helicity    Velocity
Electron    -|e|      -|E|      +p          +(ħ/2) σ      σ . p       v
Positron    +|e|      +|E|      -p          -(ħ/2) σ      σ . p       v
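The bookkeeping of eqns (1911) to (1917) can be summarized in a few lines of code. This is a minimal one-dimensional sketch, with units in which $c = 1$ by default; the function names and the sample values are illustrative assumptions, not part of the original derivation.

```python
import math


def negative_branch_energy(p, m, c=1.0):
    """Energy of a negative-energy electron state, as in eqn (1912)."""
    return -math.sqrt(m**2 * c**4 + p**2 * c**2)


def negative_branch_velocity(p, m, c=1.0):
    """Group velocity dE/dp on the negative-energy branch, as in eqn (1916)."""
    return -p * c**2 / math.sqrt(m**2 * c**4 + p**2 * c**2)


def positron_from_hole(p_e, m, q_e, c=1.0):
    """Properties of the positron obtained by emptying the state with momentum p_e.

    Each property is the (vacuum minus one electron) value minus the vacuum value.
    """
    p_p = -p_e                                   # eqn (1915)
    energy = -negative_branch_energy(p_e, m, c)  # eqn (1913)
    velocity = p_p * c**2 / math.sqrt(m**2 * c**4 + p_p**2 * c**2)  # eqn (1917)
    return {"charge": -q_e, "momentum": p_p, "energy": energy, "velocity": velocity}


# Example: remove the negative-energy electron with p_e = 0.75 (m = 1, q_e = -1).
hole = positron_from_hole(p_e=0.75, m=1.0, q_e=-1.0)
```

The positron's velocity computed this way coincides with `negative_branch_velocity(0.75, 1.0)`, even though its momentum has the opposite sign.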
Hole theory provides a simple description of the relation between negative-energy states and antiparticle states. Mathematically, this relation is expressed in terms of the charge conjugation transformation. A unique signature of the hole theory is that a positive-energy electron can make a transition to an unfilled negative-energy state by emitting radiation, which corresponds to the process in which an electron-positron pair annihilates$^{138}$
$$
e^{-} + e^{+} \rightarrow 2\,\gamma \tag{1918}
$$
In this process, it is necessary that the excess energy be carried off by two photons if the energy-momentum conservation laws are to be satisfied. Likewise, by supplying an energy greater than a threshold energy of $2mc^{2}$, it should be possible to promote an electron from a negative-energy state, thereby creating an electron-positron pair. Since it is unlikely that more than one photon can be absorbed simultaneously, electron-positron pair creation only occurs in the vicinity of a charged nucleus which can carry off any excess momentum.
$$
\gamma \rightarrow e^{-} + e^{+} \tag{1919}
$$
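The need for a second photon in annihilation, or for a nucleus in pair creation, follows from a short kinematic argument. The sketch below assumes the metric signature $(+,-,-,-)$ and writes $k^{\mu}$ for the photon four-momentum and $p_{\mp}^{\mu}$ for the electron and positron four-momenta; these symbols are introduced here only for illustration.

```latex
\begin{aligned}
k^{\mu}k_{\mu} &= 0 , \\
(\,p_{-} + p_{+}\,)^{\mu}(\,p_{-} + p_{+}\,)_{\mu}
 &= 2\,m^{2}c^{2} + 2\left(\frac{E_{-}E_{+}}{c^{2}} - \mathbf{p}_{-}\cdot\mathbf{p}_{+}\right)
 \;\geq\; 4\,m^{2}c^{2} \;>\; 0 .
\end{aligned}
```

Since the invariant mass of a single photon vanishes while that of the pair is at least $2m$, the condition $k^{\mu} = p_{-}^{\mu} + p_{+}^{\mu}$ cannot be satisfied for a free photon, in either direction of the process.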
The positively charged electron, predicted by Dirac, was found experimentally by Anderson$^{139}$, and the electron-positron creation$^{140}$ and annihilation processes$^{141}$ were observed shortly afterwards.

Figure 67: A cartoon depicting electron-positron production in Dirac's Hole Theory. In this case, an incident γ-ray produces an electron-hole pair. The process is restricted to occur in the vicinity of heavy particles that can act as momentum sinks. [Vertical axis: $E / m c^{2}$; regions labelled "Unoccupied Positive Energy States" and "Occupied Negative Energy States"; incident photon labelled $(k,\alpha)$.]

Dirac commented$^{142}$ that in scattering processes involving low-energy electrons, such as Thomson scattering, it is essential that negative-energy states appear as virtual states if one is to recover the correct scattering cross-section in the non-relativistic limit. The involvement of negative-energy states in the scattering of light is a consequence of the fact that, in the standard representation, the lower two-component spinor in the Dirac wave function for a free (positive-energy) electron vanishes in the low-energy limit, and also of the fact that the coupling to the radiation field is produced by $\gamma^{(0)}\boldsymbol{\gamma}\cdot\mathbf{A}$. The interaction operator can be expressed as
$$
\hat{H}_{Int} = -\,q\,\boldsymbol{\alpha}\cdot\mathbf{A} = -\,q\begin{pmatrix} 0 & \boldsymbol{\sigma} \\ \boldsymbol{\sigma} & 0 \end{pmatrix}\cdot\mathbf{A} \tag{1920}
$$
which only connects the upper and lower two-component spinors of the initial and final states $\psi_{n}$ and $\psi_{n'}$. Hence, as light scattering processes are at least of second order in $\mathbf{A}$, the intermediate state $\psi_{n''}$ must involve a negative-energy electron state. Since the Pauli exclusion principle forbids the occupation of the filled negative-energy states, hole theory ascribes the intermediate states as involving virtual electron-positron creation and annihilation processes. This shows that, even for processes which appear to involve a single electron in the initial and final states, a purely single-particle description is inadequate and one must adopt a many-particle description such as quantum field theory.

$^{138}$ P. A. M. Dirac, Proc. Camb. Phil. Soc. 26, 361 (1930).
$^{139}$ C. D. Anderson, Phys. Rev. 43, 491 (1933). Anderson observed the curved trajectories of the charged particles in a cloud chamber in the presence of a magnetic field. Anderson inferred the charge of the particles from their direction of motion. The insertion of a lead plate in a cloud chamber caused the particles to lose energy on one side of the plate, which was observed as a change in the radius of curvature of the particle's track. Therefore, the examination of the radius of curvature of the track on both sides of the plate allowed the direction of motion to be established.
$^{140}$ P. M. S. Blackett and G. P. S. Occhialini, Proc. Roy. Soc. A 139, 688 (1933). These authors were the first who correctly identified the positively charged particle as the antiparticle of the electron, in full accord with the predictions of Dirac's hole theory.
$^{141}$ J. Thibaud, Phys. Rev. 35, 78 (1934).
$^{142}$ P. A. M. Dirac, Proc. Roy. Soc. A 126, 360 (1930).
11.16.1 Compton Scattering
We shall consider Thomson scattering of light by free electrons. In this process, light is scattered from the initial state $(k,\alpha)$ to the final state $(k',\alpha')$ and the (positive-energy) electron makes a transition from the initial state $(q,\sigma)$ to its final (positive-energy) state $(q',\sigma')$. The Thomson scattering cross-section of light is given by the expression
$$
\left(\frac{d\sigma}{d\Omega_{k'}}\right) = \left(\frac{V\,\omega_{k'}}{2\pi\hbar c^{2}}\right)^{2}|\,M\,|^{2} \tag{1921}
$$
where the matrix element $M$ is determined from
$$
\begin{aligned}
M = \sum_{q''}\Bigg[\;
&\frac{\langle q', k', \alpha' |\,\hat{H}_{Int}\,| q'' \rangle\,\langle q'' |\,\hat{H}_{Int}\,| q, k, \alpha \rangle}{(\,E_{q} + \hbar\omega_{k} - E_{q''}\,)} \\
+\;&\frac{\langle q', k', \alpha' |\,\hat{H}_{Int}\,| q'', k, \alpha, k', \alpha' \rangle\,\langle q'', k, \alpha, k', \alpha' |\,\hat{H}_{Int}\,| q, k, \alpha \rangle}{(\,E_{q} - E_{q''} - \hbar\omega_{k'}\,)}
\;\Bigg]
\end{aligned}
\tag{1922}
$$
and where $q$ indicates all the quantum numbers of a positive-energy free electron state. The sum over $q''$ represents a sum over all possible intermediate states of the electron, no matter whether they are positive or negative-energy states. The matrix element $M$ is composed of a coherent superposition of matrix elements for virtual processes which represent the absorption of a photon $(k,\alpha)$ followed by the subsequent emission of a photon $(k',\alpha')$, and the process where the emission of light precedes the absorption process.
Figure 68: Processes involving negative-energy electron states $q''$ which contribute to Compton scattering. [Two diagrams, each with photon lines $(k,\alpha)$ and $(k',\alpha')$ and electron lines $q$, $q''$ and $q'$.]
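Since the basis set is composed of momentum eigenstates, the evaluation of the spatial integration in the matrix elements of the interaction results in the condition of conservation of momentum. Hence, for the process where the photon $(k,\alpha)$ is absorbed before the emission of the photon $(k',\alpha')$, the momenta are restricted by
$$
\begin{aligned}
\mathbf{k} + \mathbf{q} &= \mathbf{q}'' \\
\mathbf{q}'' &= \mathbf{k}' + \mathbf{q}'
\end{aligned}
\tag{1923}
$$
which leads to the identification of the momentum of the intermediate and final states as
$$
\begin{aligned}
\mathbf{q}'' &= \mathbf{q} + \mathbf{k} \\
\mathbf{q}' &= \mathbf{q} + \mathbf{k} - \mathbf{k}'
\end{aligned}
\tag{1924}
$$
In the second process, where the emission process precedes the absorption, conservation of momentum yields
$$
\begin{aligned}
\mathbf{k} + \mathbf{q} &= \mathbf{k} + \mathbf{k}' + \mathbf{q}'' \\
\mathbf{k} + \mathbf{k}' + \mathbf{q}'' &= \mathbf{k}' + \mathbf{q}'
\end{aligned}
\tag{1925}
$$
which yields
$$
\begin{aligned}
\mathbf{q}'' &= \mathbf{q} - \mathbf{k}' \\
\mathbf{q}' &= \mathbf{q} + \mathbf{k} - \mathbf{k}'
\end{aligned}
\tag{1926}
$$
The limit in which the initial electron is at rest, $\mathbf{q} = 0$, shall be considered. The momenta of the incident and scattered photon will be assumed sufficiently low so that the momentum of the electron in the intermediate state can be neglected, since $\mathbf{q}'' \approx 0$. That is, the Compton scattering process will be considered in the limit $\mathbf{k} \rightarrow 0$ and $\mathbf{k}' \rightarrow 0$.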
If the initial (positive-energy) electron is stationary and has spin $\sigma$, its wave function can be represented by the Dirac spinor
$$
\psi_{\sigma,q}(\mathbf{r}) = \frac{1}{\sqrt{V}}\begin{pmatrix} \chi_{\sigma} \\ 0 \end{pmatrix} \tag{1927}
$$
Because the interaction Hamiltonian has the form of an off-diagonal $2\times 2$ block matrix
$$
\hat{H}_{Int} = -\,\frac{q}{c}\begin{pmatrix} 0 & \boldsymbol{\sigma} \\ \boldsymbol{\sigma} & 0 \end{pmatrix}\cdot\mathbf{A} \tag{1928}
$$
the only non-zero matrix elements are those which connect the upper two-component spinor to the lower two-component spinor of the virtual state. Also, momentum conservation requires that the virtual state also be one of almost zero momentum. Hence, the electron in the virtual state must have the form of a negative-energy eigenstate
$$
\psi_{\sigma'',q''}(\mathbf{r}) \approx \frac{1}{\sqrt{V}}\begin{pmatrix} 0 \\ \chi_{\sigma''} \end{pmatrix} \tag{1929}
$$
since the contribution from a positive-energy state with small momentum is negligibly small. Therefore, the electronic part of the matrix elements involving the initial electron simply reduces to the expression
$$
\langle \psi_{\sigma'',q''} |\,\hat{H}_{Int}\,| \psi_{\sigma,q} \rangle = |\,e\,|\;\chi_{\sigma''}^{\dagger}\,\boldsymbol{\sigma}\,\chi_{\sigma}\cdot\mathbf{A} \tag{1930}
$$
Likewise, the matrix elements which involve the final (positive-energy) electron are evaluated as
$$
\langle \psi_{\sigma',q'} |\,\hat{H}_{Int}\,| \psi_{\sigma'',q''} \rangle = |\,e\,|\;\chi_{\sigma'}^{\dagger}\,\boldsymbol{\sigma}\,\chi_{\sigma''}\cdot\mathbf{A} \tag{1931}
$$
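From these one finds that, to second order, the matrix elements that appear in the transition rate are given by
$$
\begin{aligned}
M &= e^{2}\left(\frac{2\pi\hbar c^{2}}{V\sqrt{\omega_{k}\,\omega_{k'}}}\right)\sum_{\sigma''}\Bigg[\;
\frac{(\,\chi_{\sigma'}^{\dagger}\,\boldsymbol{\sigma}\cdot\hat{\epsilon}_{\alpha'}(\mathbf{k}')\,\chi_{\sigma''}\,)\,(\,\chi_{\sigma''}^{\dagger}\,\boldsymbol{\sigma}\cdot\hat{\epsilon}_{\alpha}(\mathbf{k})\,\chi_{\sigma}\,)}{E_{q} - E_{q''} + \hbar\omega_{k}} \\
&\qquad\qquad\qquad\qquad\quad
+ \frac{(\,\chi_{\sigma'}^{\dagger}\,\boldsymbol{\sigma}\cdot\hat{\epsilon}_{\alpha}(\mathbf{k})\,\chi_{\sigma''}\,)\,(\,\chi_{\sigma''}^{\dagger}\,\boldsymbol{\sigma}\cdot\hat{\epsilon}_{\alpha'}(\mathbf{k}')\,\chi_{\sigma}\,)}{E_{q} - E_{q''} - \hbar\omega_{k'}}\;\Bigg] \\
&\approx \left(\frac{e^{2}}{2mc^{2}}\right)\left(\frac{2\pi\hbar c^{2}}{V\sqrt{\omega_{k}\,\omega_{k'}}}\right)\sum_{\sigma''}\Big[\;
(\,\chi_{\sigma'}^{\dagger}\,\boldsymbol{\sigma}\cdot\hat{\epsilon}_{\alpha'}(\mathbf{k}')\,\chi_{\sigma''}\,)\,(\,\chi_{\sigma''}^{\dagger}\,\boldsymbol{\sigma}\cdot\hat{\epsilon}_{\alpha}(\mathbf{k})\,\chi_{\sigma}\,) \\
&\qquad\qquad\qquad\qquad\qquad\qquad
+ (\,\chi_{\sigma'}^{\dagger}\,\boldsymbol{\sigma}\cdot\hat{\epsilon}_{\alpha}(\mathbf{k})\,\chi_{\sigma''}\,)\,(\,\chi_{\sigma''}^{\dagger}\,\boldsymbol{\sigma}\cdot\hat{\epsilon}_{\alpha'}(\mathbf{k}')\,\chi_{\sigma}\,)\;\Big]
\end{aligned}
\tag{1932}
$$
where one has set
$$
E_{q} - E_{q''} \approx 2\,mc^{2} \tag{1933}
$$
On using the completeness relation for the two-component Dirac spinors
$$
\sum_{\sigma''}\chi_{\sigma''}\,\chi_{\sigma''}^{\dagger} = I \tag{1934}
$$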
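the matrix elements are evaluated as
$$
M \approx \left(\frac{e^{2}}{2mc^{2}}\right)\left(\frac{2\pi\hbar c^{2}}{V\sqrt{\omega_{k}\,\omega_{k'}}}\right)\chi_{\sigma'}^{\dagger}\Big[\,(\,\boldsymbol{\sigma}\cdot\hat{\epsilon}_{\alpha}(\mathbf{k})\,)\,(\,\boldsymbol{\sigma}\cdot\hat{\epsilon}_{\alpha'}(\mathbf{k}')\,) + (\,\boldsymbol{\sigma}\cdot\hat{\epsilon}_{\alpha'}(\mathbf{k}')\,)\,(\,\boldsymbol{\sigma}\cdot\hat{\epsilon}_{\alpha}(\mathbf{k})\,)\,\Big]\chi_{\sigma} \tag{1935}
$$
The products in the above expression can be evaluated with the aid of the Pauli identity. The result is
$$
(\,\boldsymbol{\sigma}\cdot\hat{\epsilon}_{\alpha}(\mathbf{k})\,)\,(\,\boldsymbol{\sigma}\cdot\hat{\epsilon}_{\alpha'}(\mathbf{k}')\,) = (\,\hat{\epsilon}_{\alpha}(\mathbf{k})\cdot\hat{\epsilon}_{\alpha'}(\mathbf{k}')\,) + i\,\boldsymbol{\sigma}\cdot(\,\hat{\epsilon}_{\alpha}(\mathbf{k})\wedge\hat{\epsilon}_{\alpha'}(\mathbf{k}')\,) \tag{1936}
$$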
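The Pauli identity, and the cancellation of the vector-product terms in the symmetrized sum, can be checked numerically. In the sketch below the two real unit vectors `a` and `b` stand in for the polarization vectors; they are random test values, and the helper `sigma_dot` is a hypothetical name.

```python
import numpy as np

sigma = np.array([
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
], dtype=complex)
I2 = np.eye(2, dtype=complex)


def sigma_dot(v):
    """Return the 2x2 matrix sigma . v for a real 3-vector v."""
    return np.einsum("i,ijk->jk", np.asarray(v, dtype=complex), sigma)


rng = np.random.default_rng(0)
a = rng.normal(size=3)
a /= np.linalg.norm(a)
b = rng.normal(size=3)
b /= np.linalg.norm(b)

# Pauli identity, eqn (1936).
lhs = sigma_dot(a) @ sigma_dot(b)
rhs = np.dot(a, b) * I2 + 1j * sigma_dot(np.cross(a, b))

# The symmetrized product appearing in eqn (1935): the cross-product terms
# cancel and only 2 (a . b) times the identity survives.
symmetrized = sigma_dot(a) @ sigma_dot(b) + sigma_dot(b) @ sigma_dot(a)
```

Because `symmetrized` is proportional to the identity matrix, the resulting matrix element is diagonal in the spin indices.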
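Therefore, after combining both terms and noting that the pair of vector product terms cancel, since the vector product is antisymmetric under the interchange $\mathbf{k} \leftrightarrow \mathbf{k}'$, one finds that the matrix elements are evaluated as
$$
\begin{aligned}
M &\approx \left(\frac{e^{2}}{2mc^{2}}\right)\left(\frac{2\pi\hbar c^{2}}{V\sqrt{\omega_{k}\,\omega_{k'}}}\right)\chi_{\sigma'}^{\dagger}\Big[\,2\,\hat{\epsilon}_{\alpha}(\mathbf{k})\cdot\hat{\epsilon}_{\alpha'}(\mathbf{k}')\,\Big]\chi_{\sigma} \\
&\approx \left(\frac{e^{2}}{2mc^{2}}\right)\left(\frac{2\pi\hbar c^{2}}{V\sqrt{\omega_{k}\,\omega_{k'}}}\right)\delta_{\sigma',\sigma}\;2\,\hat{\epsilon}_{\alpha}(\mathbf{k})\cdot\hat{\epsilon}_{\alpha'}(\mathbf{k}')
\end{aligned}
\tag{1937}
$$
Hence, the matrix elements are diagonal in the spin indices. The above matrix elements are identical to the matrix elements that occur in the non-relativistic quantum theory of Thomson scattering. On substituting this result into eqn (1921), one recovers the non-relativistic expression for the differential scattering cross-section
$$
\begin{aligned}
\left(\frac{d\sigma}{d\Omega_{k'}}\right) &\approx \delta_{\sigma',\sigma}\left(\frac{\omega_{k'}}{\omega_{k}}\right)\left(\frac{e^{2}}{mc^{2}}\right)^{2}|\,\hat{\epsilon}_{\alpha}(\mathbf{k})\cdot\hat{\epsilon}_{\alpha'}(\mathbf{k}')\,|^{2} \\
&\approx \delta_{\sigma',\sigma}\left(\frac{\omega_{k'}}{\omega_{k}}\right)\left(\frac{e^{2}}{mc^{2}}\right)^{2}\cos^{2}\Theta
\end{aligned}
\tag{1938}
$$
where
$$
\cos\Theta = \hat{\epsilon}_{\alpha}(\mathbf{k})\cdot\hat{\epsilon}_{\alpha'}(\mathbf{k}') \tag{1939}
$$
and $\Theta$ is the angle subtended by the initial and final polarization vectors. Hence, one concludes that the negative-energy states do play an important role in light scattering processes which involve low-energy electrons. The result, although correct, does need reinterpretation, since the states of negative energy are assumed to be filled with electrons in the vacuum and, therefore, the electron is forbidden to occupy these levels in the intermediate states.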
Electron-Positron Interpretation

The first contribution to the matrix elements, which was described above, has to be reinterpreted as representing a process in which an electron that initially occupies the negative-energy state $q''$ makes a transition to the positive-energy state $q'$ while emitting the photon $(k',\alpha')$. This transition is subsequently followed by the positive-energy electron $q$ absorbing the photon $(k,\alpha)$ and falling into the empty negative-energy state. In this process, the negative-energy states are completely occupied in the initial and final states, and the total energy is the same in the initial and final states. By reordering the factors in the matrix elements and noting that
$$
E_{q} + \hbar\omega_{k} = E_{q'} + \hbar\omega_{k'} \tag{1940}
$$
one finds that the contributions to the matrix element from these two descriptions are identical (apart from an overall negative sign).

The second contribution to the matrix elements can be viewed as originating from an electron which initially occupies a negative-energy state $q''$, absorbs the photon $(k,\alpha)$ and makes a transition to the positive-energy state $q'$. This is followed by the electron in the positive-energy state $q$ emitting the photon $(k',\alpha')$ and then falling into the empty negative-energy state $q''$. Again, on reordering the matrix elements and noting that
$$
E_{q} - \hbar\omega_{k'} = E_{q'} - \hbar\omega_{k} \tag{1941}
$$
one finds an identical expression (and the multiplicative factor of minus one). Hence, Dirac hole theory does lead to the correct classical result.

Figure 69: Processes involving positrons which contribute to Compton scattering. [Two diagrams, each with photon lines $(k,\alpha)$ and $(k',\alpha')$, electron lines $e^{-}$ labelled $q$ and $q'$, and a positron line $e^{+}$ labelled $q''$.]
The above description is quite cumbersome, but can be made more concise by adopting an antiparticle description of the unoccupied negative-energy states. The first contribution to $M$ first involves the creation of a virtual electron-positron pair with the emission of the photon $(k',\alpha')$. The electron which has just been created in the momentum eigenstate $(q',\sigma')$ remains unchanged in the final state. Subsequently, the positron annihilates with the initial electron $(q,\sigma)$ while absorbing the photon $(k,\alpha)$. Since the intermediate state is a virtual state, energy does not have to be conserved. The second contribution to $M$ involves the creation of a virtual electron-positron pair with the absorption of the photon $(k,\alpha)$. The created electron $(q',\sigma')$ remains in the final state while the positron subsequently annihilates with the initial electron $(q,\sigma)$ and emits the photon $(k',\alpha')$. This process is also a virtual process if the energy of the incident light $\hbar\omega_{k}$ is less than $2mc^{2}$.
The perturbative expression for the Compton scattering cross-section can be evaluated exactly, without recourse to non-relativistic approximations. The exact result is
$$
\left(\frac{d\sigma}{d\Omega}\right) = \frac{1}{4}\,r_{e}^{2}\left(\frac{\omega'}{\omega}\right)^{2}\left[\frac{\omega}{\omega'} + \frac{\omega'}{\omega} - 2 + 4\cos^{2}\Theta\right] \tag{1942}
$$
where $\Theta$ is the angle between the polarization vectors. This result was first derived by Klein and Nishina$^{143}$ in 1928.
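As a quick consistency check, the exact expression reduces to the low-energy result $r_{e}^{2}\cos^{2}\Theta$ when $\omega' \rightarrow \omega$. The sketch below evaluates eqn (1942) with $r_{e} = e^{2}/mc^{2}$ set to one. The relation between $\omega'$ and $\omega$ is not given in the text above; the standard Compton formula $\omega'/\omega = 1/[\,1 + (\hbar\omega/mc^{2})(1 - \cos\theta)\,]$ is assumed here, with $\theta$ the photon scattering angle, and all function names are illustrative.

```python
def compton_frequency_ratio(x, cos_theta):
    """omega'/omega from the standard Compton formula (an assumption here).

    x is the photon energy in units of the electron rest energy,
    x = hbar * omega / (m c^2); cos_theta is the cosine of the scattering angle.
    """
    return 1.0 / (1.0 + x * (1.0 - cos_theta))


def klein_nishina(ratio, cos_Theta, r_e=1.0):
    """Polarized cross-section of eqn (1942); ratio = omega'/omega,
    cos_Theta is the cosine of the angle between the polarization vectors."""
    return 0.25 * r_e**2 * ratio**2 * (1.0 / ratio + ratio - 2.0 + 4.0 * cos_Theta**2)


def thomson(cos_Theta, r_e=1.0):
    """Low-energy limit, eqn (1938), with omega' = omega."""
    return r_e**2 * cos_Theta**2


# Low-energy photon: the two expressions agree closely.
low = klein_nishina(compton_frequency_ratio(1e-9, 0.5), cos_Theta=0.3)

# Photon energy comparable to m c^2, back-scattering: the cross-section is reduced.
high = klein_nishina(compton_frequency_ratio(1.0, -1.0), cos_Theta=0.3)
```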
11.16.2 Charge Conjugation
Charge conjugation is the operation of replacing matter by antimatter, so that, for example, electrons will be replaced by positrons and vice versa. The operation of charge conjugation consists of first taking the complex conjugate of the Dirac equation
$$
\left[\,\gamma^{\mu}\left(i\hbar\,\partial_{\mu} - \frac{q}{c}A_{\mu}\right) - mc\,\right]\psi = 0 \tag{1943}
$$
which describes a particle with charge $q$. We shall also assume that $\psi$ describes a positive-energy solution. Complex conjugation yields the equation
$$
\left[\,\gamma^{\mu *}\left(-\,i\hbar\,\partial_{\mu} - \frac{q}{c}A_{\mu}^{*}\right) - mc\,\right]\psi^{*} = 0 \tag{1944}
$$
The complex conjugate of a positive-energy solution $\psi^{*}$ has a time-dependent phase that identifies it with a negative-energy solution. The vector potential $A_{\mu}$ is real. In the standard representation $\gamma^{(0)}$, $\gamma^{(1)}$ and $\gamma^{(3)}$ are real, whereas $\gamma^{(2)}$ is imaginary and, therefore, satisfies
$$
\gamma^{(2)*} = -\,\gamma^{(2)} \tag{1945}
$$
We shall multiply the complex conjugate of the Dirac equation by $\gamma^{(2)}$, anti-commute $\gamma^{(2)}$ with the real $\gamma^{\mu *}$ and commute $\gamma^{(2)}$ with the $\gamma^{(2)*}$ matrix. This procedure changes the sign in front of the term originating from the differential momentum operator w.r.t. the sign of the mass term. This procedure yields
$$
\begin{aligned}
\gamma^{(2)}\left[\,\gamma^{\mu *}\left(-\,i\hbar\,\partial_{\mu} - \frac{q}{c}A_{\mu}\right) - mc\,\right]\psi^{*} &= 0 \\
\left[\,\gamma^{\mu}\left(i\hbar\,\partial_{\mu} + \frac{q}{c}A_{\mu}\right) - mc\,\right]\gamma^{(2)}\psi^{*} &= 0
\end{aligned}
\tag{1946}
$$
$^{143}$ O. Klein and Y. Nishina, Zeit. für Physik, 52, 843 (1928).
Hence, one sees that $\gamma^{(2)}\psi^{*}$ describes a Dirac particle with mass $m$ and a charge of $-q$ moving in the presence of a vector potential $A_{\mu}$. The fact that the operation of charge conjugation (in any representation) involves complex conjugation is related to gauge invariance. Charge conjugation is a new type of symmetry for particles that have complex wave functions, which relates particles to particles with opposite charges. The charge conjugate field $\psi^{c}$ is defined as
$$
\psi^{c} = \hat{C}\,\psi^{*} \tag{1947}
$$
which is the result of the complex conjugation followed by the action of a linear operator $\hat{C}$. The joint operation can be represented as an anti-unitary operator. The charge conjugation operator $\hat{C}$ is defined as the unitary and Hermitean operator
$$
\hat{C} = -\,i\,\gamma^{(2)} \tag{1948}
$$
The charge conjugation operator is Hermitean as
$$
\hat{C}^{\dagger} = +\,i\,\gamma^{(2)\dagger} = -\,i\,\gamma^{(2)} = \hat{C} \tag{1949}
$$
and it is unitary since
$$
\hat{C}^{\dagger}\hat{C} = -\,\gamma^{(2)}\gamma^{(2)} = \hat{I} \tag{1950}
$$
where the anti-commutation relations of the $\gamma$ matrices have been used. It was through this type of logic that Kramers$^{144}$ discovered the form of the charge conjugation transformation which turns a particle into an antiparticle.
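The expectation values of an operator $\hat{A}$ in a general charge conjugated state $\psi^{c}$ are related to the expectation values in a general state $\psi$ via
$$
\langle \psi^{c} |\,\hat{A}\,| \psi^{c} \rangle = -\left(\,\langle \psi |\,\gamma^{(2)}\hat{A}^{*}\gamma^{(2)}\,| \psi \rangle\,\right)^{*} \tag{1951}
$$
This can be shown in the position representation, by writing
$$
\begin{aligned}
\int d^{3}r\; \psi^{c\dagger}(\mathbf{r})\,\hat{A}\,\psi^{c}(\mathbf{r}) &= \int d^{3}r\; \psi^{*\dagger}(\mathbf{r})\,\hat{C}^{\dagger}\hat{A}\,\hat{C}\,\psi^{*}(\mathbf{r}) \\
&= \left(\int d^{3}r\; \psi^{\dagger}(\mathbf{r})\,\hat{C}^{\dagger *}\hat{A}^{*}\hat{C}^{*}\,\psi(\mathbf{r})\right)^{*}
\end{aligned}
\tag{1952}
$$
where we have used the identity $z = (z^{*})^{*}$ in the second line. However, since $\hat{C}$ is real, one finds
$$
\begin{aligned}
\int d^{3}r\; \psi^{c\dagger}(\mathbf{r})\,\hat{A}\,\psi^{c}(\mathbf{r}) &= \left(\int d^{3}r\; \psi^{\dagger}(\mathbf{r})\,\hat{C}\,\hat{A}^{*}\,\hat{C}\,\psi(\mathbf{r})\right)^{*} \\
&= -\left(\int d^{3}r\; \psi^{\dagger}(\mathbf{r})\,\gamma^{(2)}\hat{A}^{*}\gamma^{(2)}\,\psi(\mathbf{r})\right)^{*}
\end{aligned}
\tag{1953}
$$
$^{144}$ H. A. Kramers, Proc. Amst. Akad. Sci. 40, 814 (1937).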
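This shows the relation between expectation values of a general operator $\hat{A}$ in a state $\psi(\mathbf{r})$ and its charge conjugated state $\psi^{c}(\mathbf{r})$.
We shall examine the effect of charge conjugation on the plane-wave solutions of the Dirac equation. The plane-wave solutions can be written as
$$
\psi_{\sigma,k}(x) = \sqrt{\frac{(\,E + mc^{2}\,)}{2\,E\,V}}\begin{pmatrix} \chi_{\sigma} \\[4pt] \dfrac{c\hbar\,\boldsymbol{\sigma}\cdot\mathbf{k}}{E + mc^{2}}\,\chi_{\sigma} \end{pmatrix}\exp\left[-\,i\,k^{\mu}x_{\mu}\right] \tag{1954}
$$
The charge conjugate wave function is given by
$$
\begin{aligned}
\psi^{c}_{\sigma,k}(x) &= \hat{C}\,\psi^{*}_{\sigma,k}(x) \\
&= \sqrt{\frac{(\,E + mc^{2}\,)}{2\,E\,V}}\;\hat{C}\begin{pmatrix} \chi_{\sigma}^{*} \\[4pt] \dfrac{c\hbar\,\boldsymbol{\sigma}^{*}\cdot\mathbf{k}}{E + mc^{2}}\,\chi_{\sigma}^{*} \end{pmatrix}\exp\left[+\,i\,k^{\mu}x_{\mu}\right]
\end{aligned}
\tag{1955}
$$
where
$$
\hat{C} = -\,i\,\gamma^{(2)} = \begin{pmatrix} 0 & -i\sigma^{(2)} \\ i\sigma^{(2)} & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 & -1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ -1 & 0 & 0 & 0 \end{pmatrix} \tag{1956}
$$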
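The stated properties of $\hat{C}$ can be confirmed numerically. The sketch below assumes the standard representation of the $\gamma$ matrices (the construction of `gammas` is an assumption of this example) and checks that $-i\gamma^{(2)}$ equals the explicit matrix of eqn (1956), that it is real, Hermitean and unitary, and that $\gamma^{(2)}\gamma^{\mu *} = -\gamma^{\mu}\gamma^{(2)}$, which is the step used to pass between the two lines of eqn (1946).

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
Z2 = np.zeros((2, 2), dtype=complex)
sigma = [
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]

# Assumed standard representation of the Dirac matrices.
gammas = [np.block([[I2, Z2], [Z2, -I2]])]
gammas += [np.block([[Z2, s], [-s, Z2]]) for s in sigma]

C = -1j * gammas[2]  # eqn (1948)

C_explicit = np.array([
    [0, 0, 0, -1],
    [0, 0, 1, 0],
    [0, 1, 0, 0],
    [-1, 0, 0, 0],
], dtype=complex)

checks = {
    "matches_eqn_1956": np.allclose(C, C_explicit),
    "is_real": np.allclose(C.imag, 0),
    "is_hermitean": np.allclose(C, C.conj().T),
    "is_unitary": np.allclose(C.conj().T @ C, np.eye(4)),
    # gamma2 gamma^mu* = - gamma^mu gamma2 for every mu.
    "flips_gamma_sign": all(
        np.allclose(gammas[2] @ g.conj(), -g @ gammas[2]) for g in gammas
    ),
}
```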
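Therefore, the charge conjugate wave function is found to be given by
$$
\psi^{c}_{\sigma,k}(x) = i\,\hat{\sigma}^{(2)}\sqrt{\frac{(\,E + mc^{2}\,)}{2\,E\,V}}\begin{pmatrix} -\,\dfrac{c\hbar\,\boldsymbol{\sigma}^{*}\cdot\mathbf{k}}{E + mc^{2}}\,\chi_{\sigma}^{*} \\[4pt] \chi_{\sigma}^{*} \end{pmatrix}\exp\left[+\,i\,k^{\mu}x_{\mu}\right] \tag{1957}
$$
which has the form of a plane-wave solution with negative energy $E \rightarrow -E$, and momentum $\hbar\mathbf{k} \rightarrow -\hbar\mathbf{k}$. Furthermore, the spin of the charge conjugated wave function has been reversed$^{145}$, $\sigma \rightarrow -\sigma$, since when $i\sigma^{(2)}$ acts on the complex conjugated positive-eigenvalue eigenstate of the spin projected on an arbitrary direction
$$
\chi_{+\sigma}(\theta,\varphi)^{*} = \begin{pmatrix} \cos\frac{\theta}{2}\,\exp[+i\frac{\varphi}{2}] \\[4pt] \sin\frac{\theta}{2}\,\exp[-i\frac{\varphi}{2}] \end{pmatrix} \tag{1958}
$$
it turns it into the negative-eigenvalue eigenstate
$$
\chi_{-\sigma}(\theta,\varphi) = \begin{pmatrix} -\sin\frac{\theta}{2}\,\exp[-i\frac{\varphi}{2}] \\[4pt] \cos\frac{\theta}{2}\,\exp[+i\frac{\varphi}{2}] \end{pmatrix} \tag{1959}
$$
That is, up to an arbitrary phase factor, the lower two-component spinor is given by
$$
i\,\sigma^{(2)}\,\chi_{+\sigma}(\theta,\varphi)^{*} = \chi_{-\sigma}(\theta,\varphi) \tag{1960}
$$
$^{145}$ Note that the helicity is invariant under the joint transformation $\boldsymbol{\sigma} \rightarrow -\boldsymbol{\sigma}$, $\mathbf{k} \rightarrow -\mathbf{k}$.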
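The spin reversal in eqn (1960) can be illustrated numerically. The sketch below builds the two spinors for an arbitrarily chosen direction (the angles are test values, and the function names are hypothetical), applies $i\sigma^{(2)}$ to the complex conjugate of the positive-eigenvalue spinor, and compares the result with the negative-eigenvalue spinor of eqn (1959). With these explicit forms the two differ by an overall factor of $-1$, which is the "arbitrary phase factor" mentioned in the text.

```python
import numpy as np

sigma = [
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]


def sigma_n(theta, phi):
    """Spin operator projected on the direction with polar angles (theta, phi)."""
    n = (np.sin(theta) * np.cos(phi), np.sin(theta) * np.sin(phi), np.cos(theta))
    return sum(component * s for component, s in zip(n, sigma))


def chi_plus(theta, phi):
    """Positive-eigenvalue spinor; its complex conjugate is eqn (1958)."""
    return np.array([np.cos(theta / 2) * np.exp(-1j * phi / 2),
                     np.sin(theta / 2) * np.exp(+1j * phi / 2)])


def chi_minus(theta, phi):
    """Negative-eigenvalue spinor of eqn (1959)."""
    return np.array([-np.sin(theta / 2) * np.exp(-1j * phi / 2),
                     np.cos(theta / 2) * np.exp(+1j * phi / 2)])


theta, phi = 0.7, 1.9  # arbitrary test direction

flipped = (1j * sigma[1]) @ chi_plus(theta, phi).conj()

# Modulus of the overlap with chi_minus: unity if the states agree up to a phase.
overlap = abs(np.vdot(chi_minus(theta, phi), flipped))
```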
Likewise, it can be shown that the upper two-component spinor is proportional to
$$
\begin{aligned}
i\,\sigma^{(2)}\,(\,\boldsymbol{\sigma}^{*}\cdot\mathbf{k}\,)\,\chi_{+\sigma}(\theta,\varphi)^{*} &= -\,(\,\boldsymbol{\sigma}\cdot\mathbf{k}\,)\,(\,i\,\sigma^{(2)}\,)\,\chi_{+\sigma}(\theta,\varphi)^{*} \\
&= -\,(\,\boldsymbol{\sigma}\cdot\mathbf{k}\,)\,\chi_{-\sigma}(\theta,\varphi) \\
&= (\,\boldsymbol{\sigma}\cdot(-\mathbf{k})\,)\,\chi_{-\sigma}(\theta,\varphi)
\end{aligned}
\tag{1961}
$$
The end result is that the charge conjugated single-particle wave function has the form
$$
\psi^{c}_{\sigma,k}(x) = \sqrt{\frac{(\,E + mc^{2}\,)}{2\,E\,V}}\begin{pmatrix} -\,\dfrac{c\hbar\,(\,\boldsymbol{\sigma}\cdot(-\mathbf{k})\,)}{E + mc^{2}}\,\chi_{-\sigma} \\[4pt] \chi_{-\sigma} \end{pmatrix}\exp\left[+\,i\,k^{\mu}x_{\mu}\right] \tag{1962}
$$
The properties described above are the properties of a state of a relativistic free particle with a negative energy eigenvalue $-E$, momentum $-\hbar\mathbf{k}$ and spin $-\sigma$. The absence of an electron in the charge conjugated state describes a positron, with positive energy $E$, momentum $\hbar\mathbf{k}$ and spin $\sigma$.
More generally, even when an electromagnetic field is present, the charge conjugated wave function of a positive-energy particle corresponds to the wave function of a state with reversed energy $E \rightarrow -E$, reversed spin $\sigma \rightarrow -\sigma$ and reversed charge $q \rightarrow -q$. Therefore, the charge conjugated state corresponds to the (negative-energy) state which, when unoccupied, is described as an antiparticle.
Exercise:
Consider massless Dirac particles, $m \rightarrow 0$. (i) Show that the energy-helicity eigenstates coincide with the eigenstates of $\gamma^{(4)}$. (ii) Hence, show that the operators $\frac{1}{2}(\,\hat{I} \pm \gamma^{(4)}\,)$ project onto helicity eigenstates. These projection operators map the four-component Dirac spinors onto the independent two-component Weyl spinors $\phi_{L}$ and $\phi_{R}$. (iii) Show that charge conjugation transforms $\phi_{L}$ into $\phi_{R}$.

Exercise:
Prove the completeness relation for the set of solutions for the Dirac equation for a free particle
$$
\sum_{\alpha}\left[\,\phi_{\alpha}^{\dagger}(\mathbf{r})_{\lambda}\,\phi_{\alpha}(\mathbf{r}')_{\rho} + \phi_{\alpha}^{c\,\dagger}(\mathbf{r})_{\lambda}\,\phi_{\alpha}^{c}(\mathbf{r}')_{\rho}\,\right] = \delta^{3}(\mathbf{r} - \mathbf{r}')\;\delta_{\lambda,\rho} \tag{1963}
$$
where $\lambda$ and $\rho$ denote the components of the Dirac spinor$^{146}$.
12 The Many-Particle Dirac Field
12.1 The Algebra of Fermion Operators
Second quantization of fermions amounts to adopting an occupation number representation. Therefore, we shall examine number accounting for fermions$^{147}$.
Fermion operators satisfy anti-commutation relations. The anti-commutator of two operators $\hat{A}$ and $\hat{B}$ is defined as
$$
\{\,\hat{A}, \hat{B}\,\}_{+} \equiv \hat{A}\,\hat{B} + \hat{B}\,\hat{A} \tag{1964}
$$
The fermion creation and annihilation operators, $\hat{c}_{\alpha}^{\dagger}$ and $\hat{c}_{\alpha}$, satisfy the anti-commutation relations
$$
\begin{aligned}
\{\,\hat{c}_{\alpha}^{\dagger}, \hat{c}_{\beta}^{\dagger}\,\}_{+} &= 0 \\
\{\,\hat{c}_{\alpha}, \hat{c}_{\beta}\,\}_{+} &= 0
\end{aligned}
\tag{1965}
$$
and
$$
\{\,\hat{c}_{\alpha}^{\dagger}, \hat{c}_{\beta}\,\}_{+} = \delta_{\alpha,\beta} \tag{1966}
$$
where the quantum numbers $\alpha$ and $\beta$ describe a complete set of single-particle states.
The anti-commutation relation
$$
\hat{c}_{\alpha}^{\dagger}\,\hat{c}_{\beta}^{\dagger} = -\,\hat{c}_{\beta}^{\dagger}\,\hat{c}_{\alpha}^{\dagger} \tag{1967}
$$
is merely a restatement of the antisymmetric nature of a fermionic many-particle wave function under the permutation of a pair of particles, as is the Hermitean conjugate relation
$$
\hat{c}_{\alpha}\,\hat{c}_{\beta} = -\,\hat{c}_{\beta}\,\hat{c}_{\alpha} \tag{1968}
$$
$^{146}$ Frequently, the relativistic free electron states are given a manifestly covariant normalization, in order to facilitate covariant perturbation theory. The use of different normalization conventions results in changes to the form of the completeness relation.
$^{147}$ P. Jordan and E. Wigner, Zeit. für Physik, 47, 631 (1928).
The number operator ˆ n
α
is deﬁned as
ˆ n
α
= ˆ c
†
α
ˆ c
α
(1969)
The choice of anticommutation relations results in the eigenvalues of the num
ber operator to be restricted to either n
α
= 1 or n
α
= 0. This can be seen by
examining the identity
ˆ n
α
ˆ n
α
= ˆ n
α
(1970)
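which follows from
\[
\begin{aligned}
\hat{n}_{\alpha} \hat{n}_{\alpha} &= \hat{c}^{\dagger}_{\alpha} \hat{c}_{\alpha} \hat{c}^{\dagger}_{\alpha} \hat{c}_{\alpha} \\
&= \hat{c}^{\dagger}_{\alpha} \hat{c}_{\alpha} - \hat{c}^{\dagger}_{\alpha} \hat{c}^{\dagger}_{\alpha} \hat{c}_{\alpha} \hat{c}_{\alpha} \\
&= \hat{c}^{\dagger}_{\alpha} \hat{c}_{\alpha} + \hat{c}^{\dagger}_{\alpha} \hat{c}^{\dagger}_{\alpha} \hat{c}_{\alpha} \hat{c}_{\alpha}
\end{aligned}
\tag{1971}
\]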
where we have used the anti-commutation relation for the creation and annihilation operators to obtain the second line, and the anti-commutation relation for two annihilation operators to obtain the last line. On comparing the second and third lines, one recognizes that
\[
\hat{c}^{\dagger}_{\alpha} \hat{c}^{\dagger}_{\alpha} \hat{c}_{\alpha} \hat{c}_{\alpha} = 0 \tag{1972}
\]
Hence, we have
\[
\hat{n}_{\alpha} \hat{n}_{\alpha} = \hat{c}^{\dagger}_{\alpha} \hat{c}_{\alpha} = \hat{n}_{\alpha} \tag{1973}
\]
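Thus, the eigenstates of the number operator satisfy the equation
\[
\begin{aligned}
\hat{n}_{\alpha} \hat{n}_{\alpha} \, | n_{\alpha} \rangle &= \hat{n}_{\alpha} \, | n_{\alpha} \rangle \\
n_{\alpha}^{2} \, | n_{\alpha} \rangle &= n_{\alpha} \, | n_{\alpha} \rangle
\end{aligned}
\tag{1974}
\]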
Therefore, for there to be non-trivial eigenstates, the eigenvalues must satisfy the equation
\[
n_{\alpha} \, ( n_{\alpha} - 1 ) = 0 \tag{1975}
\]
which only has the solutions $n_{\alpha} = 0$ and $n_{\alpha} = 1$. Thus, the choice of anti-commutation relations for the creation and annihilation operators results in the Pauli exclusion principle. The Pauli exclusion principle states that a non-degenerate quantum state cannot be occupied by more than one fermion.
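As a minimal numerical sketch of the argument above, a single fermion mode can be represented by 2×2 matrices in the basis $(| 0 \rangle, | 1 \rangle)$. The explicit matrices and the helper names (`matmul`, `transpose`, `add`) are illustrative assumptions and do not appear in the text.

```python
# Single fermion mode represented in the basis (|0>, |1>).
# The explicit 2x2 matrices and helper names are illustrative assumptions.

def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

c = [[0, 1],
     [0, 0]]            # annihilation operator: c|1> = |0>, c|0> = 0
c_dag = transpose(c)    # creation operator (c is real, so dagger = transpose)

n = matmul(c_dag, c)                                  # number operator
anticommutator = add(matmul(c_dag, c), matmul(c, c_dag))
n_squared = matmul(n, n)
eigenvalues = sorted(n[i][i] for i in range(2))       # n is diagonal here

print("{c^dagger, c} =", anticommutator)
print("n^2 == n:", n_squared == n)
print("eigenvalues of n:", eigenvalues)
```

In this representation the projector property $\hat{n}^2 = \hat{n}$ is visible directly, which is the matrix counterpart of the restriction of the occupation numbers to zero or one.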
The number operator satisfies the commutation relations
\[
[ \hat{n}_{\alpha} , \hat{c}^{\dagger}_{\beta} ] = \delta_{\alpha,\beta} \, \hat{c}^{\dagger}_{\beta} , \qquad
[ \hat{n}_{\alpha} , \hat{c}_{\beta} ] = - \delta_{\alpha,\beta} \, \hat{c}_{\beta} \tag{1976}
\]
as can be seen by using the fermion anti-commutation relations. The hierarchy of eigenstates of the number operator can be found from the action of the creation operator. In particular, one can define the eigenstate of the annihilation operator with eigenvalue zero by
\[
\hat{c}_{\alpha} \, | 0 \rangle = 0 \tag{1977}
\]
The state $| 0 \rangle$ is also an eigenstate of the number operator with eigenvalue zero since
\[
\hat{n}_{\alpha} \, | 0 \rangle = \hat{c}^{\dagger}_{\alpha} \hat{c}_{\alpha} \, | 0 \rangle = 0 \tag{1978}
\]
A general eigenstate of the number operator with eigenvalue $n_{\alpha}$ can be expressed as
\[
| n_{\alpha} \rangle = \frac{ ( \hat{c}^{\dagger}_{\alpha} )^{n_{\alpha}} }{ \sqrt{ n_{\alpha} ! } } \, | 0 \rangle \tag{1979}
\]
as can be seen by using the commutation relation
\[
[ \hat{n}_{\alpha} , ( \hat{c}^{\dagger}_{\alpha} )^{n_{\alpha}} ] = n_{\alpha} \, ( \hat{c}^{\dagger}_{\alpha} )^{n_{\alpha}} \tag{1980}
\]
If this operator equation acts on the state $| 0 \rangle$, in which the quantum state $\alpha$ is unoccupied, and the condition
\[
\hat{n}_{\alpha} \, | 0 \rangle = 0 \tag{1981}
\]
is used, one finds that the state of equation (1979) satisfies the eigenvalue equation
\[
\hat{n}_{\alpha} \, | n_{\alpha} \rangle = n_{\alpha} \, | n_{\alpha} \rangle \tag{1982}
\]
with an eigenvalue of either unity or zero.
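A general number operator eigenstate can be expressed in terms of the occupation numbers of all the single-particle states
\[
| \{ n_{\alpha} \} \rangle = \prod_{\alpha=1}^{\infty} \left[ \frac{ ( \hat{c}^{\dagger}_{\alpha} )^{n_{\alpha}} }{ \sqrt{ n_{\alpha} ! } } \right] | 0 , 0 , 0 , \ldots , 0 \rangle \tag{1983}
\]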
where the allowed values of the set of occupation numbers $n_{\alpha}$ are either unity or zero. The sequencing or ordering of the creation operators in this expression is crucial, since interchanging the positions of the operators may result in a change in sign of the state. For example, the action of a creation operator on a general number eigenstate has the effect
\[
\hat{c}^{\dagger}_{\beta} \, | n_{1} \, n_{2} \ldots n_{\beta} \ldots \rangle = ( - 1 )^{ ( \sum_{i=1}^{\beta} n_{i} ) } \, | n_{1} \, n_{2} \ldots n_{\beta} + 1 \ldots \rangle \tag{1984}
\]
where the sign occurs since this involves anti-commuting $\hat{c}^{\dagger}_{\beta}$ with $\sum_{i=1}^{\beta} n_{i}$ other creation operators to bring it into the $\beta$th position.
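The sign rule for ordered creation operators can be checked numerically. The sketch below assumes a Jordan-Wigner matrix representation for three modes; the mode count, the representation and all helper names are illustrative choices rather than anything prescribed by the text. It verifies the anti-commutation relations and the sign produced when $\hat{c}^{\dagger}_{\beta}$ acts on an occupation-number state with $n_{\beta} = 0$.

```python
# Jordan-Wigner matrices for a few fermion modes (an illustrative
# representation; mode count and helper names are assumptions).
from itertools import product

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def kron(a, b):
    """Kronecker product of two matrices stored as nested lists."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

I2 = [[1, 0], [0, 1]]
Z = [[1, 0], [0, -1]]     # (-1)^n for a single mode
A = [[0, 1], [0, 0]]      # single-mode annihilation operator

def annihilator(j, n_modes):
    """c_j: the Z factors on the modes before j supply the fermion signs."""
    out = [[1]]
    for factor in [Z] * j + [A] + [I2] * (n_modes - j - 1):
        out = kron(out, factor)
    return out

def basis_index(occ):
    """Index of |n_1 n_2 ...> in the Kronecker-product basis."""
    idx = 0
    for n in occ:
        idx = 2 * idx + n
    return idx

def create(beta, occ):
    """Apply c_beta^dagger to |occ>; return (sign, new_occ), or None if blocked."""
    c_dag = transpose(annihilator(beta, len(occ)))
    col = basis_index(occ)
    for row in range(len(c_dag)):
        if c_dag[row][col] != 0:
            bits = tuple((row >> (len(occ) - 1 - i)) & 1 for i in range(len(occ)))
            return c_dag[row][col], bits
    return None

N_MODES = 3
ops = [annihilator(j, N_MODES) for j in range(N_MODES)]
dim = 2 ** N_MODES
zero = [[0] * dim for _ in range(dim)]
identity = [[int(i == j) for j in range(dim)] for i in range(dim)]

relations_hold = all(
    add(matmul(ops[j], ops[k]), matmul(ops[k], ops[j])) == zero
    and add(matmul(transpose(ops[j]), ops[k]),
            matmul(ops[k], transpose(ops[j]))) == (identity if j == k else zero)
    for j in range(N_MODES) for k in range(N_MODES)
)

# Sign rule: (-1)^(n_1 + ... + n_beta), evaluated for states with n_beta = 0.
sign_rule_holds = all(
    create(beta, occ)[0] == (-1) ** sum(occ[:beta + 1])
    for occ in product((0, 1), repeat=N_MODES)
    for beta in range(N_MODES) if occ[beta] == 0
)

print("anti-commutation relations hold:", relations_hold)
print("sign rule holds:", sign_rule_holds)
```

Since the rule is only evaluated on states with $n_{\beta} = 0$, the sum in the exponent may run up to $\beta$ or $\beta - 1$ without changing the sign.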
12.2 Quantizing the Dirac Field
The quantization of the Dirac field proceeds in exactly the same way as for non-relativistic electrons$^{148}$. However, the negative-energy states will be described with a different notation from the positive-energy states. The change of notation reflects the intent of describing the (quasi-particle) excitations of the system, and not the many-particle ground state, which is unobservable. The wave functions $\phi_{\alpha}(\mathbf{r})$ describing the positive-energy states of the non-interacting electrons are indexed by the set of quantum numbers $\alpha \equiv (\mathbf{k}, \sigma)$. The negative-energy states are described as the charge conjugates of the positive-energy states. Therefore, the negative-energy states are described by the same set of indices $\alpha$ and the corresponding wave functions are denoted by $\phi^{c}_{\alpha}(\mathbf{r})$. The annihilation operator for electrons in the positive-energy state $\alpha$ is denoted by $\hat{c}_{\alpha}$. However, the operator which removes an electron from the (negative-energy) charge conjugated state $\phi^{c}_{\alpha}(\mathbf{r})$ is denoted by a creation operator $\hat{b}^{\dagger}_{\alpha}$. The change from annihilation operator to creation operator merely represents that creating a positron with quantum numbers $\alpha$ is equivalent to creating a hole in the negative-energy state$^{149}$. The effects of the annihilation operators on Dirac's vacuum $| 0 \rangle$, in which all the negative-energy states are fully occupied, are
\[
\hat{c}_{\alpha} \, | 0 \rangle = 0 , \qquad
\hat{b}_{\alpha} \, | 0 \rangle = 0 \tag{1985}
\]
where the first expression follows from the assumed absence of electrons in the positive-energy states, and the second expression follows from the assumption that all the negative-energy states are completely filled, so adding an extra electron to the state $\phi^{c}_{\alpha}$ is forbidden by the Pauli exclusion principle. More concisely, the above relations state that the vacuum contains neither (positive-energy) electrons nor positrons. The form of the anti-commutation relations is unchanged by this simple change of notation. The anti-commutation relations become
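\[
\begin{aligned}
\{ \hat{c}^{\dagger}_{\alpha} , \hat{c}^{\dagger}_{\beta} \}_{+} &= \{ \hat{c}_{\alpha} , \hat{c}_{\beta} \}_{+} = 0 \\
\{ \hat{c}^{\dagger}_{\alpha} , \hat{c}_{\beta} \}_{+} &= \delta_{\alpha,\beta}
\end{aligned}
\tag{1986}
\]
for the electron operators,
\[
\begin{aligned}
\{ \hat{b}^{\dagger}_{\alpha} , \hat{b}^{\dagger}_{\beta} \}_{+} &= \{ \hat{b}_{\alpha} , \hat{b}_{\beta} \}_{+} = 0 \\
\{ \hat{b}^{\dagger}_{\alpha} , \hat{b}_{\beta} \}_{+} &= \delta_{\alpha,\beta}
\end{aligned}
\tag{1987}
\]
for the positron operators, and the mixed electron/positron anti-commutation relations are given by
\[
\{ \hat{c}^{\dagger}_{\alpha} , \hat{b}^{\dagger}_{\beta} \}_{+} = \{ \hat{c}_{\alpha} , \hat{b}_{\beta} \}_{+} = \{ \hat{c}^{\dagger}_{\alpha} , \hat{b}_{\beta} \}_{+} = 0 \tag{1988}
\]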
$^{148}$ W. Heisenberg and W. Pauli, Zeit. für Physik, 56, 1 (1929); W. Heisenberg and W. Pauli, Zeit. für Physik, 59, 168 (1930).
$^{149}$ W. H. Furry and J. R. Oppenheimer, Phys. Rev. 45, 245 (1934).
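The mixed electron/positron anti-commutation relations are all zero, since the operators describe electrons in different single-particle energy eigenstates. In this notation, the field operators are expressed as$^{150}$
\[
\hat{\psi}(\mathbf{r}) = \sum_{\alpha} \left[ \phi_{\alpha}(\mathbf{r}) \, \hat{c}_{\alpha} + \phi^{c}_{\alpha}(\mathbf{r}) \, \hat{b}^{\dagger}_{\alpha} \right] \tag{1989}
\]
and
\[
\hat{\psi}^{\dagger}(\mathbf{r}) = \sum_{\alpha} \left[ \phi^{*}_{\alpha}(\mathbf{r}) \, \hat{c}^{\dagger}_{\alpha} + \phi^{c \, *}_{\alpha}(\mathbf{r}) \, \hat{b}_{\alpha} \right] \tag{1990}
\]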
The field operators $\hat{\psi}(\mathbf{r})$ and $\hat{\psi}^{\dagger}(\mathbf{r})$ are expected to be canonically conjugate, as we shall show below.
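The Lagrangian density is given by
\[
\mathcal{L} = c \, \hat{\overline{\psi}}^{\dagger} \left( i \hbar \, \gamma^{\mu} \partial_{\mu} - m c \right) \hat{\psi} \tag{1991}
\]
where $\hat{\overline{\psi}}^{\dagger} = \hat{\psi}^{\dagger} \gamma^{(0)}$ is the Dirac adjoint, so the momentum field operator $\hat{\Pi}(\mathbf{r})$ canonically conjugate to $\hat{\psi}(\mathbf{r})$ is given by
\[
\hat{\Pi}(\mathbf{r}) = \frac{1}{c} \frac{\delta \mathcal{L}}{\delta ( \partial_{0} \hat{\psi} )}
= i \hbar \, \hat{\overline{\psi}}^{\dagger}(\mathbf{r}) \, \gamma^{(0)}
= i \hbar \, \hat{\psi}^{\dagger}(\mathbf{r}) \tag{1992}
\]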
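Hence, one expects that the field operators $\hat{\psi}^{\dagger}(\mathbf{r})$ and $\hat{\psi}(\mathbf{r})$ are canonically conjugate and, therefore, satisfy the equal-time anti-commutation relations
\[
\{ \hat{\psi}^{\dagger}(\mathbf{r})_{\lambda} , \hat{\psi}(\mathbf{r}')_{\rho} \}_{+} = \delta^{3}(\mathbf{r} - \mathbf{r}') \, \delta_{\lambda,\rho} \tag{1993}
\]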
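where $\lambda$ and $\rho$ label the components of the Dirac spinor. The anti-commutation relations for the field operators can be verified by noting that
\[
\begin{aligned}
\{ \hat{\psi}^{\dagger}(\mathbf{r}) , \hat{\psi}(\mathbf{r}') \}_{+}
&= \sum_{\alpha,\beta} \Big[ \{ \hat{c}^{\dagger}_{\alpha} , \hat{c}_{\beta} \}_{+} \, \phi^{*}_{\alpha}(\mathbf{r}) \, \phi_{\beta}(\mathbf{r}')
+ \{ \hat{c}^{\dagger}_{\alpha} , \hat{b}^{\dagger}_{\beta} \}_{+} \, \phi^{*}_{\alpha}(\mathbf{r}) \, \phi^{c}_{\beta}(\mathbf{r}') \\
&\qquad\quad + \{ \hat{b}_{\alpha} , \hat{c}_{\beta} \}_{+} \, \phi^{c \, *}_{\alpha}(\mathbf{r}) \, \phi_{\beta}(\mathbf{r}')
+ \{ \hat{b}_{\alpha} , \hat{b}^{\dagger}_{\beta} \}_{+} \, \phi^{c \, *}_{\alpha}(\mathbf{r}) \, \phi^{c}_{\beta}(\mathbf{r}') \Big] \\
&= \sum_{\alpha,\beta} \Big[ \delta_{\alpha,\beta} \, \phi^{*}_{\alpha}(\mathbf{r}) \, \phi_{\beta}(\mathbf{r}') + \delta_{\alpha,\beta} \, \phi^{c \, *}_{\alpha}(\mathbf{r}) \, \phi^{c}_{\beta}(\mathbf{r}') \Big] \\
&= \sum_{\alpha} \Big[ \phi^{*}_{\alpha}(\mathbf{r}) \, \phi_{\alpha}(\mathbf{r}') + \phi^{c \, *}_{\alpha}(\mathbf{r}) \, \phi^{c}_{\alpha}(\mathbf{r}') \Big] \\
&= \delta^{3}(\mathbf{r} - \mathbf{r}')
\end{aligned}
\tag{1994}
\]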
where the fermion anti-commutation relations have been used in arriving at the second line. The positive-energy states and their charge conjugated states form a complete set of basis states for the single-particle Dirac equation, so their completeness condition has been used in going from the third to the fourth line. The equal-time field anti-commutation relations can be generalized to field anti-commutators at space-time points with a general type of separation. In the case where the two field points $x$ and $x'$ have a space-like separation
\[
( x_{\mu} - x'_{\mu} ) \, ( x^{\mu} - x'^{\mu} ) < 0
\]
causality dictates that the anti-commutators are zero
\[
\{ \hat{\psi}^{\dagger}(x) , \hat{\psi}(x') \}_{+} = 0
\]
That is, for space-like separations, there is no causal connection$^{151}$, so a measurement of a local field at $x'$ cannot affect a measurement at $x$. N. Bohr and L. Rosenfeld$^{152}$ have put forward general arguments that the commutation relations also place limitations on the measurement of fields at time-like separations.

[Figure 70: space-time diagram with axes $r$ and $c t$, showing the regions $\Delta x^{2} > 0$ and $\Delta x^{2} < 0$ separated by the light cone.]
Figure 70: Due to causality, the anti-commutator of the field operator should vanish for space-like separations. The anti-commutators can be non-zero inside or on the light cone.

$^{150}$ W. Heisenberg and W. Pauli, Zeit. für Physik, 56, 1 (1929); W. Heisenberg and W. Pauli, Zeit. für Physik, 59, 168 (1930).
The Hamiltonian density for the (non-interacting) quantized Dirac field theory can be expressed as the operator
\[
\hat{\mathcal{H}} = \hat{\psi}^{\dagger} \, \gamma^{(0)} \, c \left( - i \hbar \, \boldsymbol{\gamma} \cdot \nabla + m c \right) \hat{\psi} \tag{1995}
\]
and the Hamiltonian operator is given by
\[
\hat{H} = \int d^{3}r \; \hat{\mathcal{H}} \tag{1996}
\]
$^{151}$ Outside the light cone there is no way to distinguish between future and past.
$^{152}$ N. Bohr and L. Rosenfeld, Kon. Dansk. Vid. Selskab., Mat.-Fys. Medd. XII, 8 (1933).
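When the expansion of the quantized field in terms of single-particle wave functions is substituted into the Hamiltonian, one finds
\[
\begin{aligned}
\hat{H} &= \sum_{\alpha} \left[ E_{\alpha} \, \hat{c}^{\dagger}_{\alpha} \hat{c}_{\alpha} + E^{c}_{\alpha} \, \hat{b}_{\alpha} \hat{b}^{\dagger}_{\alpha} \right] \\
&= \sum_{\alpha} \left[ E_{\alpha} \, \hat{c}^{\dagger}_{\alpha} \hat{c}_{\alpha} - E_{\alpha} \, \hat{b}_{\alpha} \hat{b}^{\dagger}_{\alpha} \right]
\end{aligned}
\tag{1997}
\]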
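where the expression for the energy of the charge conjugated state
\[
E^{c}_{\alpha} = - E_{\alpha} \tag{1998}
\]
has been used. On anti-commuting the positron creation and annihilation operators, one finds
\[
\hat{H} = \sum_{\alpha} E_{\alpha} \left[ \hat{c}^{\dagger}_{\alpha} \hat{c}_{\alpha} + \hat{b}^{\dagger}_{\alpha} \hat{b}_{\alpha} - 1 \right] \tag{1999}
\]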
The last term, when summed over $\alpha$, yields the infinitely negative energy of Dirac's vacuum in which all the negative-energy states are filled. The vacuum energy shall be used as the reference energy, so the Hamiltonian becomes
\[
\hat{H} = \sum_{\alpha} E_{\alpha} \left[ \hat{c}^{\dagger}_{\alpha} \hat{c}_{\alpha} + \hat{b}^{\dagger}_{\alpha} \hat{b}_{\alpha} \right] \tag{2000}
\]
which describes the energy of the excited state as the sum of the energies of the excited electrons and the excited positrons. The energies of the positrons and electrons are given by positive numbers.
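The reordering that leads from (1997) to (2000) can be illustrated for a single label $\alpha$. The sketch below assumes a Jordan-Wigner matrix representation of one electron mode $\hat{c}$ and one positron mode $\hat{b}$, and the value $E = 3$ in arbitrary units; both choices, and the helper names, are illustrative assumptions.

```python
# One label alpha carrying an electron mode c and a positron mode b.
# Jordan-Wigner matrices, E = 3 and helper names are illustrative assumptions.

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(s, a):
    return [[s * x for x in row] for row in a]

def kron(a, b):
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

I2 = [[1, 0], [0, 1]]
Z = [[1, 0], [0, -1]]
A = [[0, 1], [0, 0]]

c = kron(A, I2)               # electron annihilation operator
b = kron(Z, A)                # positron annihilation operator (anti-commutes with c)
c_dag, b_dag = transpose(c), transpose(b)
identity4 = kron(I2, I2)
E = 3

# Form before reordering: E c^dagger c + E^c b b^dagger with E^c = -E
H_raw = add(scale(E, matmul(c_dag, c)), scale(-E, matmul(b, b_dag)))

# Form after anti-commuting b and b^dagger: E (c^dagger c + b^dagger b - 1)
H_reordered = scale(E, add(add(matmul(c_dag, c), matmul(b_dag, b)),
                           scale(-1, identity4)))

# Excitation Hamiltonian measured from the vacuum energy
H_excitation = scale(E, add(matmul(c_dag, c), matmul(b_dag, b)))
excitation_energies = [H_excitation[i][i] for i in range(4)]

print("forms agree:", H_raw == H_reordered)
print("excitation energies (basis |n_c n_b>):", excitation_energies)
```

The diagonal entries are ordered as $| n_c \, n_b \rangle = | 00 \rangle, | 01 \rangle, | 10 \rangle, | 11 \rangle$, so the lowest entry of `H_raw` is the single-mode contribution $- E$ to the vacuum energy.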
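The momentum operator defined by Noether's theorem is found as
\[
\hat{\mathbf{P}} = \sum_{k,\sigma} \hbar \mathbf{k} \left[ \hat{c}^{\dagger}_{k,\sigma} \hat{c}_{k,\sigma} + \hat{b}^{\dagger}_{k,\sigma} \hat{b}_{k,\sigma} \right] \tag{2001}
\]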
which is just the sum of the momenta of the (positive-energy) electrons and the positrons. The spin operator is defined as
\[
\hat{\mathbf{S}} = \frac{\hbar}{2} \int d^{3}r \; \hat{\psi}^{\dagger} \, \hat{\boldsymbol{\sigma}} \, \hat{\psi} \tag{2002}
\]
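This is evaluated by substituting the expression for the field operators in terms of the single-particle wave functions and the particle creation and annihilation operators. The expectation value of the spin operator in the charge conjugated state $\phi^{c}_{\alpha}$ is given by
\[
\begin{aligned}
\int d^{3}r \; \phi^{c \, \dagger}_{\alpha}(\mathbf{r}) \, \hat{\boldsymbol{\sigma}} \, \phi^{c}_{\alpha}(\mathbf{r})
&= - \left[ \int d^{3}r \; \phi^{\dagger}_{\alpha}(\mathbf{r}) \, \gamma^{(2)} \, \hat{\boldsymbol{\sigma}}^{*} \, \gamma^{(2)} \, \phi_{\alpha}(\mathbf{r}) \right]^{*} \\
&= \left[ \int d^{3}r \; \phi^{\dagger}_{\alpha}(\mathbf{r}) \, \sigma^{(2)} \, \hat{\boldsymbol{\sigma}}^{*} \, \sigma^{(2)} \, \phi_{\alpha}(\mathbf{r}) \right]^{*} \\
&= - \left[ \int d^{3}r \; \phi^{\dagger}_{\alpha}(\mathbf{r}) \, \hat{\boldsymbol{\sigma}} \, \phi_{\alpha}(\mathbf{r}) \right]^{*} \\
&= - \left[ \int d^{3}r \; \phi^{\dagger}_{\alpha}(\mathbf{r}) \, \hat{\boldsymbol{\sigma}} \, \phi_{\alpha}(\mathbf{r}) \right]
\end{aligned}
\tag{2003}
\]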
The third line follows from the identity
\[
\sigma^{(2)} \, \hat{\boldsymbol{\sigma}}^{*} \, \sigma^{(2)} = - \hat{\boldsymbol{\sigma}} \tag{2004}
\]
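The last line follows since $\boldsymbol{\sigma}$ is Hermitean. Hence, the spin operator is evaluated as
\[
\hat{\mathbf{S}} = \frac{\hbar}{2} \sum_{k ; \sigma' , \sigma''} \chi^{\dagger}_{\sigma'} \, \boldsymbol{\sigma} \, \chi_{\sigma''} \left[ \hat{c}^{\dagger}_{k,\sigma'} \hat{c}_{k,\sigma''} + \hat{b}^{\dagger}_{k,\sigma'} \hat{b}_{k,\sigma''} \right] \tag{2005}
\]
which is just the sum of the spins of the electrons and positrons.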
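The Pauli-matrix identity $\sigma^{(2)} \, \hat{\boldsymbol{\sigma}}^{*} \, \sigma^{(2)} = - \hat{\boldsymbol{\sigma}}$ used in the spin calculation can be confirmed component by component. The following is a small sketch using the standard representation of the Pauli matrices; the helper names are illustrative.

```python
# Check sigma2 * conj(sigma_i) * sigma2 == -sigma_i for i = 1, 2, 3,
# using the standard Pauli matrices (helper names are illustrative).

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def conj(a):
    """Element-wise complex conjugate (no transpose)."""
    return [[complex(x).conjugate() for x in row] for row in a]

def neg(a):
    return [[-x for x in row] for row in a]

sigma = {
    1: [[0, 1], [1, 0]],
    2: [[0, -1j], [1j, 0]],
    3: [[1, 0], [0, -1]],
}

identity_holds = {
    i: matmul(matmul(sigma[2], conj(s)), sigma[2]) == neg(s)
    for i, s in sigma.items()
}

print(identity_holds)
```

The identity holds because $\sigma^{(2)}$ is the only imaginary Pauli matrix: complex conjugation flips its sign, while conjugation by $\sigma^{(2)}$ flips the sign of the other two.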
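Finally, the conserved Noether charge corresponding to the global gauge invariance is given by
\[
\begin{aligned}
\hat{Q} &= \int d^{3}r \; \hat{\psi}^{\dagger}(\mathbf{r}) \, \hat{\psi}(\mathbf{r}) \\
&= \sum_{\alpha} \left[ \hat{c}^{\dagger}_{\alpha} \hat{c}_{\alpha} + \hat{b}_{\alpha} \hat{b}^{\dagger}_{\alpha} \right] \\
&= \sum_{\alpha} \left[ \hat{c}^{\dagger}_{\alpha} \hat{c}_{\alpha} - \hat{b}^{\dagger}_{\alpha} \hat{b}_{\alpha} + 1 \right]
\end{aligned}
\tag{2006}
\]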
The last term in the parentheses, when summed over all states $\alpha$, yields the total charge of the vacuum, which is to be discarded. Hence, the observable charge is defined as
\[
\hat{Q} = \sum_{\alpha} \left[ \hat{c}^{\dagger}_{\alpha} \hat{c}_{\alpha} - \hat{b}^{\dagger}_{\alpha} \hat{b}_{\alpha} \right] \tag{2007}
\]
which shows that the total electrical charge, defined as the difference between the number of electrons and the number of positrons, is conserved.
12.3 Parity, Charge and Time Reversal Invariance
The Lagrangian density may possess continuous symmetries and it may also possess discrete symmetries. Some of the discrete symmetries are examined below.
12.3.1 Parity
The parity eigenvalue equation for a multi-particle state with parity $\eta_{\psi}$ can be expressed as
\[
\hat{\mathcal{P}} \, | \psi \rangle = \eta_{\psi} \, | \psi \rangle \tag{2008}
\]
Since the action of the parity operator on states is described by a unitary operator, operators transform under parity according to the general form of a unitary transformation. In particular, the effect of the parity transformation on the field operator is determined as
\[
\hat{\psi}(\mathbf{r}) \rightarrow \hat{\psi}'(\mathbf{r}') = \hat{\mathcal{P}} \, \hat{\psi}(\mathbf{r}) \, \hat{\mathcal{P}} \tag{2009}
\]
The parity transformation is going to be determined in analogy with the parity transformation of a classical field, in which the creation and annihilation operators are replaced by complex numbers. The parity operation on the quantum field can be interpreted as only acting on the wave functions and not the particle creation and annihilation operators. Quantum mechanically, this corresponds to viewing the parity operator as changing the properties of the states to the properties associated with the parity reversed states. Since the field operator is expressed as
\[
\hat{\psi}(\mathbf{r}) = \sum_{\alpha} \left[ \hat{c}_{\alpha} \, \phi_{\alpha}(\mathbf{r}) + \hat{b}^{\dagger}_{\alpha} \, \phi^{c}_{\alpha}(\mathbf{r}) \right] \tag{2010}
\]
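one has
\[
\hat{\mathcal{P}} \, \hat{\psi}(\mathbf{r}) \, \hat{\mathcal{P}} = \sum_{\alpha} \left[ \hat{c}_{\alpha} \, \hat{\mathcal{P}} \, \phi_{\alpha}(\mathbf{r}) \, \hat{\mathcal{P}} + \hat{b}^{\dagger}_{\alpha} \, \hat{\mathcal{P}} \, \phi^{c}_{\alpha}(\mathbf{r}) \, \hat{\mathcal{P}} \right] \tag{2011}
\]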
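However, under a parity transform a general Dirac spinor satisfies
\[
\begin{aligned}
\hat{\mathcal{P}} \, \phi_{\alpha}(\mathbf{r}) &= \eta^{P}_{\alpha} \, \phi_{P\alpha}(\mathbf{r}) \\
\hat{\mathcal{P}} \, \phi^{c}_{\alpha}(\mathbf{r}) &= \eta^{P}_{\alpha^{c}} \, \phi^{c}_{P\alpha}(\mathbf{r})
\end{aligned}
\tag{2012}
\]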
where $\eta^{P}_{\alpha}$ is a phase factor which represents the intrinsic parity of the state. Furthermore, since $\hat{\mathcal{P}}^{2} = \hat{I}$, the intrinsic parities $\eta^{P}_{\alpha}$ and $\eta^{P}_{\alpha^{c}}$ have to satisfy the conditions
\[
( \eta^{P}_{\alpha} )^{2} = 1 , \qquad
( \eta^{P}_{\alpha^{c}} )^{2} = 1 \tag{2013}
\]
So the intrinsic parities are $\pm 1$. The intrinsic parity of a state $\phi_{\alpha}(\mathbf{r})$ and its charge conjugated state $\phi^{c}_{\alpha}(\mathbf{r})$ are related by
\[
\eta^{P}_{\alpha^{c}} = - \eta^{P}_{\alpha} \tag{2014}
\]
This follows since charge conjugation flips the upper and lower two-component spinors, and these two-component spinors have opposite intrinsic parity. Therefore, the state $\phi_{\alpha}(\mathbf{r})$ and the charge conjugated state $\phi^{c}_{\alpha}(\mathbf{r})$ have opposite parities. It follows that the field operator transforms as
\[
\hat{\mathcal{P}} \, \hat{\psi}(\mathbf{r}) \, \hat{\mathcal{P}} = \sum_{\alpha} \left[ \eta^{P}_{\alpha} \, \hat{c}_{\alpha} \, \phi_{P\alpha}(\mathbf{r}) - \eta^{P}_{\alpha} \, \hat{b}^{\dagger}_{\alpha} \, \phi^{c}_{P\alpha}(\mathbf{r}) \right] \tag{2015}
\]
so the quantum field operator transforms in a similar fashion to the classical field.
The relations between parity reversed states and parity reversed charge conjugated states can be verified by examining the free particle solutions of the Dirac equation and noting that the parity operator consists of the product of $\gamma^{(0)}$ and spatial inversion $\mathbf{r} \rightarrow - \mathbf{r}$. This spatial inversion, acting on a wave function with momentum $\mathbf{k}$ and spin $\sigma$, produces a wave function with momentum $- \mathbf{k}$ and spin $\sigma$, up to a constant of proportionality. A free particle momentum eigenstate is given by
\[
\phi_{\sigma,k}(x) = \mathcal{N} \begin{pmatrix} \chi_{\sigma} \\ \dfrac{c \hbar \, \mathbf{k} \cdot \boldsymbol{\sigma}}{E + m c^{2}} \, \chi_{\sigma} \end{pmatrix} \exp \left[ - i \, ( k_{0} x^{(0)} - \mathbf{k} \cdot \mathbf{r} ) \right] \tag{2016}
\]
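The application of the parity operator to the above wave function yields
\[
\begin{aligned}
\hat{\mathcal{P}} \, \phi_{\sigma,k}(x) &= \mathcal{N} \, \gamma^{(0)} \begin{pmatrix} \chi_{\sigma} \\ \dfrac{c \hbar \, \mathbf{k} \cdot \boldsymbol{\sigma}}{E + m c^{2}} \, \chi_{\sigma} \end{pmatrix} \exp \left[ - i \, ( k_{0} x^{(0)} + \mathbf{k} \cdot \mathbf{r} ) \right] \\
&= \mathcal{N} \begin{pmatrix} \chi_{\sigma} \\ - \dfrac{c \hbar \, \mathbf{k} \cdot \boldsymbol{\sigma}}{E + m c^{2}} \, \chi_{\sigma} \end{pmatrix} \exp \left[ - i \, ( k_{0} x^{(0)} + \mathbf{k} \cdot \mathbf{r} ) \right] \\
&= \phi_{\sigma,-k}(x)
\end{aligned}
\tag{2017}
\]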
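as anticipated. The charge conjugated state is given by
\[
\begin{aligned}
\phi^{c}_{\sigma,k}(x) &= \hat{C} \, \phi^{*}_{\sigma,k}(x) \\
&= \mathcal{N} \, \hat{C} \begin{pmatrix} \chi^{*}_{\sigma} \\ \dfrac{c \hbar \, \mathbf{k} \cdot \boldsymbol{\sigma}^{*}}{E + m c^{2}} \, \chi^{*}_{\sigma} \end{pmatrix} \exp \left[ + i \, k^{\mu} x_{\mu} \right]
\end{aligned}
\tag{2018}
\]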
where
\[
\hat{C} = - i \, \gamma^{(2)} = \begin{pmatrix} 0 & 0 & 0 & -1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ -1 & 0 & 0 & 0 \end{pmatrix} \tag{2019}
\]
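Therefore, the charge conjugate wave function is given by
\[
\phi^{c}_{\sigma,k}(x) = - i \, \hat{\sigma}^{(2)} \, \mathcal{N} \begin{pmatrix} \dfrac{c \hbar \, \mathbf{k} \cdot \boldsymbol{\sigma}^{*}}{E + m c^{2}} \, \chi^{*}_{\sigma} \\ - \chi^{*}_{\sigma} \end{pmatrix} \exp \left[ + i \, k^{\mu} x_{\mu} \right] \tag{2020}
\]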
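The explicit matrix quoted for $\hat{C} = - i \gamma^{(2)}$, and the way it exchanges the upper and lower two-component spinors, can be checked with a short sketch. It assumes the Dirac (standard) representation of $\gamma^{(2)}$; the test spinor components and the helper names are arbitrary illustrative choices.

```python
# Charge conjugation matrix C = -i gamma^(2), assuming the Dirac (standard)
# representation. Test spinor values and helper names are illustrative.

def matvec(m, v):
    return [sum(m[i][k] * v[k] for k in range(len(v))) for i in range(len(m))]

sigma2 = [[0, -1j], [1j, 0]]

# gamma^(2) = [[0, sigma2], [-sigma2, 0]] in 2x2 blocks
gamma2 = [
    [0, 0] + sigma2[0],
    [0, 0] + sigma2[1],
    [-x for x in sigma2[0]] + [0, 0],
    [-x for x in sigma2[1]] + [0, 0],
]

C = [[-1j * x for x in row] for row in gamma2]
C_expected = [
    [0, 0, 0, -1],
    [0, 0, 1, 0],
    [0, 1, 0, 0],
    [-1, 0, 0, 0],
]

# Action on a four-spinor (upper, lower): C (u, l) = -i sigma2 (l, -u)
upper = [1 + 2j, 3 - 1j]
lower = [2j, -2]
minus_i_sigma2 = [[-1j * x for x in row] for row in sigma2]

lhs = matvec(C, upper + lower)
rhs = matvec(minus_i_sigma2, lower) + matvec(minus_i_sigma2, [-u for u in upper])

print("matrix agrees:", C == C_expected)
print("upper and lower spinors exchanged:", lhs == rhs)
```

The second check mirrors the step from (2018) to (2020): the lower spinor moves to the upper position, and the upper spinor moves down with a change of sign, with a common factor of $- i \hat{\sigma}^{(2)}$.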
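The effect of the parity operator on this state leads to
\[
\begin{aligned}
\hat{\mathcal{P}} \, \phi^{c}_{\sigma,k}(x) &= - i \, \hat{\sigma}^{(2)} \, \mathcal{N} \begin{pmatrix} \dfrac{c \hbar \, \mathbf{k} \cdot \boldsymbol{\sigma}^{*}}{E + m c^{2}} \, \chi^{*}_{\sigma} \\ + \chi^{*}_{\sigma} \end{pmatrix} \exp \left[ + i \, ( k^{(0)} x_{0} + \mathbf{k} \cdot \mathbf{r} ) \right] \\
&= - i \, \hat{\sigma}^{(2)} \, \mathcal{N} \begin{pmatrix} - \dfrac{c \hbar \, ( - \mathbf{k} \cdot \boldsymbol{\sigma}^{*} )}{E + m c^{2}} \, \chi^{*}_{\sigma} \\ + \chi^{*}_{\sigma} \end{pmatrix} \exp \left[ + i \, ( k^{(0)} x_{0} + \mathbf{k} \cdot \mathbf{r} ) \right] \\
&= - \phi^{c}_{\sigma,-k}(x)
\end{aligned}
\tag{2021}
\]
where in the first line the parity operator has sent $\mathbf{r} \rightarrow - \mathbf{r}$ and the factor of $\gamma^{(0)}$ has flipped the sign of the lower components. In the second line we have rewritten $\mathbf{k}$ as $- ( - \mathbf{k} )$ in the upper two-component spinor, in anticipation of the comparison with eqn (2018), which allows us to identify the factor of $\phi^{c}_{\sigma,-k}(x)$. This example shows that a state and its charge conjugate have opposite intrinsic parities.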
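From the general form of the parity transformation on Dirac spinors, one infers that the parity transform of the field operator is given by
\[
\hat{\mathcal{P}} \, \hat{\psi}(\mathbf{r}) \, \hat{\mathcal{P}} = \sum_{\alpha} \left[ \hat{c}_{\alpha} \, \eta^{P}_{\alpha} \, \phi_{P\alpha}(\mathbf{r}) - \hat{b}^{\dagger}_{\alpha} \, \eta^{P}_{\alpha} \, \phi^{c}_{P\alpha}(\mathbf{r}) \right] \tag{2022}
\]
On setting $\alpha' = P \alpha$ and noting that $\alpha = P \alpha'$, one finds
\[
\hat{\mathcal{P}} \, \hat{\psi}(\mathbf{r}) \, \hat{\mathcal{P}} = \sum_{P\alpha'} \left[ \eta^{P}_{P\alpha'} \, \hat{c}_{P\alpha'} \, \phi_{\alpha'}(\mathbf{r}) - \eta^{P}_{P\alpha'} \, \hat{b}^{\dagger}_{P\alpha'} \, \phi^{c}_{\alpha'}(\mathbf{r}) \right] \tag{2023}
\]
and on transforming the summation index from $\alpha'$ to $\alpha$
\[
\hat{\mathcal{P}} \, \hat{\psi}(\mathbf{r}) \, \hat{\mathcal{P}} = \sum_{\alpha} \left[ \eta^{P}_{P\alpha} \, \hat{c}_{P\alpha} \, \phi_{\alpha}(\mathbf{r}) - \eta^{P}_{P\alpha} \, \hat{b}^{\dagger}_{P\alpha} \, \phi^{c}_{\alpha}(\mathbf{r}) \right] \tag{2024}
\]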
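Thus, the parity operation can also be interpreted as only affecting the particle creation and annihilation operators, and not the wave functions. Quantum mechanically, this interpretation corresponds to viewing the particles as being transferred into their parity reversed states
\[
\hat{\mathcal{P}} \, \hat{\psi}(\mathbf{r}) \, \hat{\mathcal{P}} = \sum_{\alpha} \left[ \hat{\mathcal{P}} \, \hat{c}_{\alpha} \, \hat{\mathcal{P}} \, \phi_{\alpha}(\mathbf{r}) + \hat{\mathcal{P}} \, \hat{b}^{\dagger}_{\alpha} \, \hat{\mathcal{P}} \, \phi^{c}_{\alpha}(\mathbf{r}) \right] \tag{2025}
\]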
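In this new interpretation, the effects of parity on the fermion operators are found by identifying the operators multiplying the single-particle wave functions in the previous two equations. The resulting operator equations are
\[
\hat{\mathcal{P}} \, \hat{c}_{\alpha} \, \hat{\mathcal{P}} = \eta^{P}_{P\alpha} \, \hat{c}_{P\alpha} \tag{2026}
\]
and
\[
\hat{\mathcal{P}} \, \hat{b}^{\dagger}_{\alpha} \, \hat{\mathcal{P}} = - \eta^{P}_{P\alpha} \, \hat{b}^{\dagger}_{P\alpha} \tag{2027}
\]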
which shows that fermion particles and antiparticles have opposite intrinsic parities. Therefore, we conclude that, irrespective of which interpretation is used, the field operator transforms as
\[
\hat{\mathcal{P}} \, \hat{\psi}(\mathbf{r}) \, \hat{\mathcal{P}} = \sum_{\alpha} \left[ \eta^{P}_{\alpha} \, \hat{c}_{\alpha} \, \phi_{P\alpha}(\mathbf{r}) - \eta^{P}_{\alpha} \, \hat{b}^{\dagger}_{\alpha} \, \phi^{c}_{P\alpha}(\mathbf{r}) \right] \tag{2028}
\]
which shows that the quantum field operator transforms in a similar fashion to the classical field.
12.3.2 Charge Conjugation
Under charge conjugation, the classical Dirac field transforms as
\[
\psi \rightarrow \psi^{c} = - i \, \gamma^{(2)} \, \psi^{*} \tag{2029}
\]
(up to an arbitrary phase) since this is how the single-particle wave functions transform. Classically, the (antilinear) charge conjugation operator $\hat{\mathcal{C}}$ is the product of complex conjugation and the unitary matrix operator $\hat{C} = - i \gamma^{(2)}$.
If the classical field is expressed as a linear superposition of energy eigenfunctions, the amplitudes of the eigenfunctions are represented by complex numbers. In the charge conjugated state, these amplitudes are replaced by their complex conjugates. In the quantum field, the amplitudes must be replaced by particle creation and annihilation operators. If an amplitude is associated with an annihilation operator, then the complex conjugate of the amplitude is usually associated with a creation operator. Hence, we should expect that charge conjugation will result in the creation and annihilation operators being switched. Since the quantum field operator is expressed as
\[
\hat{\psi}(\mathbf{r}) = \sum_{\alpha} \left[ \hat{c}_{\alpha} \, \phi_{\alpha}(\mathbf{r}) + \hat{b}^{\dagger}_{\alpha} \, \phi^{c}_{\alpha}(\mathbf{r}) \right] \tag{2030}
\]
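the charge conjugation operation $\hat{\mathcal{C}}$ transforms the field operator via
\[
\hat{\psi}^{c}(\mathbf{r}) = \hat{\mathcal{C}} \, \hat{\psi}(\mathbf{r}) \, \hat{\mathcal{C}} = \sum_{\alpha} \left[ \hat{c}^{\dagger}_{\alpha} \, \hat{\mathcal{C}} \, \phi_{\alpha}(\mathbf{r}) \, \hat{\mathcal{C}} + \hat{b}_{\alpha} \, \hat{\mathcal{C}} \, \phi^{c}_{\alpha}(\mathbf{r}) \, \hat{\mathcal{C}} \right] \tag{2031}
\]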
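where, in accord with the earlier comment about the relation between the quantum and classical fields, the single-particle operators have been replaced by their Hermitean conjugates. However, under charge conjugation general Dirac spinors satisfy
\[
\begin{aligned}
\hat{\mathcal{C}} \, \phi_{\alpha}(\mathbf{r}) &= \eta_{c} \, \phi^{c}_{\alpha}(\mathbf{r}) \\
\hat{\mathcal{C}} \, \phi^{c}_{\alpha}(\mathbf{r}) &= \eta_{c} \, \phi_{\alpha}(\mathbf{r})
\end{aligned}
\tag{2032}
\]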
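therefore,
\[
\hat{\psi}^{c}(\mathbf{r}) = \hat{\mathcal{C}} \, \hat{\psi}(\mathbf{r}) \, \hat{\mathcal{C}} = \sum_{\alpha} \eta_{c} \left[ \hat{c}^{\dagger}_{\alpha} \, \phi^{c}_{\alpha}(\mathbf{r}) + \hat{b}_{\alpha} \, \phi_{\alpha}(\mathbf{r}) \right] \tag{2033}
\]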
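However, if the charge conjugation operator is to be interpreted as only acting on the single-particle operators, one has
\[
\hat{\psi}^{c}(\mathbf{r}) = \sum_{\alpha} \left[ \hat{\mathcal{C}} \, \hat{c}_{\alpha} \, \hat{\mathcal{C}} \, \phi_{\alpha}(\mathbf{r}) + \hat{\mathcal{C}} \, \hat{b}^{\dagger}_{\alpha} \, \hat{\mathcal{C}} \, \phi^{c}_{\alpha}(\mathbf{r}) \right] \tag{2034}
\]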
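For consistency, the two expressions for $\hat{\psi}^{c}(\mathbf{r})$ must be equivalent. Hence, the operator coefficients of $\phi_{\alpha}(\mathbf{r})$ and $\phi^{c}_{\alpha}(\mathbf{r})$ in the two expressions should be identical. Therefore, one requires that
\[
\begin{aligned}
\hat{\mathcal{C}} \, \hat{c}_{\alpha} \, \hat{\mathcal{C}} &= \eta_{c} \, \hat{b}_{\alpha} \\
\hat{\mathcal{C}} \, \hat{b}^{\dagger}_{\alpha} \, \hat{\mathcal{C}} &= \eta_{c} \, \hat{c}^{\dagger}_{\alpha}
\end{aligned}
\tag{2035}
\]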
In other words, charge conjugation replaces particles by their antiparticles and their quantum numbers $\alpha$ are unchanged. Furthermore, we identify the charge conjugated field operator as
\[
\hat{\psi}^{c} = \hat{\mathcal{C}} \, \hat{\psi} \, \hat{\mathcal{C}} = - i \, \eta_{c} \, \gamma^{(2)} \, \hat{\psi}^{\dagger} \tag{2036}
\]
where $\hat{\psi}^{\dagger}$ is the Hermitean conjugate (column) field operator. Apart from the replacement of the complex amplitudes with the Hermitean conjugates of the creation and annihilation operators, the above expression is identical to the expression for charge conjugation on the classical field.
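The charge conjugation operator has the effect of reversing the current density operator
\[
\hat{\mathcal{C}} \, \hat{\overline{\psi}}^{\dagger} \, \gamma^{\mu} \, \hat{\psi} \, \hat{\mathcal{C}} = - \hat{\overline{\psi}}^{\dagger} \, \gamma^{\mu} \, \hat{\psi} \tag{2037}
\]
where $\hat{\overline{\psi}}^{\dagger} = \hat{\psi}^{\dagger} \gamma^{(0)}$. This is understood as resulting from the change in the sign of the charge.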
12.3.3 Time Reversal
The time-reversal operation interchanges the past with the future. Time reversal transforms the space-time coordinates via
\[
\hat{\mathcal{T}} \, ( c t , \mathbf{r} ) = ( - c t , \mathbf{r} ) \tag{2038}
\]
Thus, under time reversal, the time and spatial components of the position four-vector have different transformational properties. Furthermore, the energy-momentum four-vector transforms as
\[
\hat{\mathcal{T}} \, ( p^{(0)} , \mathbf{p} ) = ( p^{(0)} , - \mathbf{p} ) \tag{2039}
\]
Hence, the position four-vector and momentum four-vector have different transformational properties. Due to the above properties, angular momentum (including spin) transforms as
\[
\hat{\mathcal{T}} \, \mathbf{J} = - \mathbf{J} \tag{2040}
\]
Therefore, we find that time reversal reverses momenta and flips spins.
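According to the Wigner theorem$^{153}$, time reversal can only be implemented by an antilinear anti-unitary transformation. Since the time reversal operator $\hat{\mathcal{T}}$ interchanges the initial and final states, then
\[
\begin{aligned}
\langle \hat{\mathcal{T}} \psi_{f} \, | \, \hat{\mathcal{T}} \psi_{i} \rangle &= \langle \psi_{i} \, | \, \psi_{f} \rangle \\
&= \langle \psi_{f} \, | \, \psi_{i} \rangle^{*}
\end{aligned}
\tag{2041}
\]
$^{153}$ E. P. Wigner, Göttinger Nachr. 31, 546 (1932).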
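Thus, $\hat{\mathcal{T}}$ must be an anti-unitary operator. Furthermore, if the initial state is given by a linear superposition
\[
| \psi_{i} \rangle = \sum_{\alpha} C_{\alpha} \, | \phi_{\alpha} \rangle \tag{2042}
\]
then the overlap is given by
\[
\sum_{\alpha} \langle \hat{\mathcal{T}} \psi_{f} \, | \, \hat{\mathcal{T}} \, C_{\alpha} \, | \phi_{\alpha} \rangle = \sum_{\alpha} C^{*}_{\alpha} \, \langle \psi_{f} \, | \, \phi_{\alpha} \rangle^{*} \tag{2043}
\]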
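Hence, one infers that
\[
\hat{\mathcal{T}} \sum_{\alpha} C_{\alpha} \, | \phi_{\alpha} \rangle = \sum_{\alpha} C^{*}_{\alpha} \, \hat{\mathcal{T}} \, | \phi_{\alpha} \rangle \tag{2044}
\]
which is the definition of an antilinear operator, and so we identify $\hat{\mathcal{T}}$ as an antilinear operator.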
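It can be shown that the time-reversed Dirac wave function defined by
\[
\hat{\mathcal{T}} \, \psi(t, \mathbf{r}) = - \gamma^{(1)} \gamma^{(3)} \, \psi^{*}(-t, \mathbf{r}) \tag{2045}
\]
satisfies the Dirac equation with $t \rightarrow - t$. For example, the plane wave solutions of the Dirac equation can be shown to transform as
\[
\begin{aligned}
\hat{\mathcal{T}} \, \phi_{\sigma,k}(\mathbf{r}, t) &= - \gamma^{(1)} \gamma^{(3)} \, \phi^{*}_{\sigma,k}(\mathbf{r}, -t) \\
&= \phi_{-\sigma,-k}(\mathbf{r}, t)
\end{aligned}
\tag{2046}
\]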
which flips the momentum and the spin angular momentum. It should be noted that the matrix operator $\gamma^{(1)} \gamma^{(3)}$ does not couple the upper and lower two-component spinors, but nevertheless is closely related to the operator $- i \gamma^{(2)}$ which occurs in the charge conjugation operator.
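The block structure of $- \gamma^{(1)} \gamma^{(3)}$ can be made explicit with a short sketch. It assumes the Dirac (standard) representation, in which $\gamma^{(i)}$ has the Pauli matrix $\sigma^{(i)}$ in the upper-right block and $- \sigma^{(i)}$ in the lower-left block; the helper names are illustrative.

```python
# Structure of the time-reversal matrix -gamma^(1) gamma^(3), assuming the
# Dirac (standard) representation. Helper names are illustrative.

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def neg(a):
    return [[-x for x in row] for row in a]

def block(a, b, c, d):
    """Assemble a 4x4 matrix from four 2x2 blocks [[a, b], [c, d]]."""
    return [ra + rb for ra, rb in zip(a, b)] + [rc + rd for rc, rd in zip(c, d)]

zero2 = [[0, 0], [0, 0]]
sigma1 = [[0, 1], [1, 0]]
sigma3 = [[1, 0], [0, -1]]

gamma1 = block(zero2, sigma1, neg(sigma1), zero2)
gamma3 = block(zero2, sigma3, neg(sigma3), zero2)

T_matrix = neg(matmul(gamma1, gamma3))

# -i sigma^(2) written out explicitly: [[0, -1], [1, 0]]
minus_i_sigma2 = [[0, -1], [1, 0]]
expected = block(minus_i_sigma2, zero2, zero2, minus_i_sigma2)

print("-gamma1 gamma3 is block diagonal with -i sigma2:", T_matrix == expected)
```

The same $2 \times 2$ matrix $- i \sigma^{(2)}$ appears in the off-diagonal blocks of $- i \gamma^{(2)}$ (up to sign), which is the sense in which the two operators are closely related.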
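Also, if the Dirac field operator is required to satisfy
\[
\hat{\mathcal{T}} \, \hat{\psi}(t, \mathbf{r}) \, \hat{\mathcal{T}} = - \gamma^{(1)} \gamma^{(3)} \, \hat{\psi}^{*}(-t, \mathbf{r}) \tag{2047}
\]
then the single-particle operators must satisfy
\[
\begin{aligned}
\hat{\mathcal{T}} \, \hat{c}_{\alpha} \, \hat{\mathcal{T}} &= \hat{c}_{T\alpha} \\
\hat{\mathcal{T}} \, \hat{b}_{\alpha} \, \hat{\mathcal{T}} &= \hat{b}_{T\alpha}
\end{aligned}
\tag{2048}
\]
which correspond to particles following time-reversed trajectories.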
Table 19: Discrete Symmetries of Particles. The charge conjugate of a state is a negative-energy state with momentum $- \mathbf{p}$ and spin $- \sigma$, which is interpreted as the state of an antiparticle with momentum $\mathbf{p}$ and spin $\sigma$.

                       Q    p    σ    Λ
  Charge Conjugation   −    +    +    +
  Parity               +    −    +    −
  Time Reversal        +    −    −    +
  CPT                  −    +    −    −
It is known that the weak interaction violates parity invariance. However, there was a slight possibility that the weak interaction conserves the combined operation of charge conjugation and spatial inversion$^{154}$. Christenson, Cronin, Fitch and Turlay$^{155}$ performed experiments which showed that the combined operation CP is violated in the decay of K mesons. There is reason to believe that the weak interaction is invariant under the combined symmetry operation CPT, since this is related to Lorentz invariance.
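The combined symmetry operation $\hat{\mathcal{C}} \hat{\mathcal{P}} \hat{\mathcal{T}}$ transforms a Dirac spinor as
\[
\begin{aligned}
\psi'(x') &= \hat{\mathcal{C}} \, \hat{\mathcal{P}} \, \hat{\mathcal{T}} \, \psi(x) \\
&= - i \, \gamma^{(2)} \left[ \hat{\mathcal{P}} \, \hat{\mathcal{T}} \, \psi(x) \right]^{*} \\
&= + i \, \gamma^{(2)} \left[ \gamma^{(0)} \gamma^{(1)} \gamma^{(3)} \, \psi^{*}(-x) \right]^{*} \\
&= i \, \gamma^{(2)} \gamma^{(0)} \gamma^{(1)} \gamma^{(3)} \, \psi(-x) \\
&= i \, \gamma^{(0)} \gamma^{(1)} \gamma^{(2)} \gamma^{(3)} \, \psi(-x) \\
&= \gamma^{(4)} \, \psi(-x)
\end{aligned}
\tag{2049}
\]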
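The reordering of the gamma matrices in the last steps of (2049) can be checked numerically. The sketch below assumes the Dirac (standard) representation; the reordering itself only relies on the anti-commutation of distinct gamma matrices, while the explicit form obtained for $\gamma^{(4)}$ is specific to that representation. Helper names are illustrative.

```python
# Check i g2 g0 g1 g3 == i g0 g1 g2 g3 (= gamma^(4)), assuming the Dirac
# (standard) representation. Helper names are illustrative.
from functools import reduce

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def neg(a):
    return [[-x for x in row] for row in a]

def scale(s, a):
    return [[s * x for x in row] for row in a]

def block(a, b, c, d):
    """Assemble a 4x4 matrix from four 2x2 blocks [[a, b], [c, d]]."""
    return [ra + rb for ra, rb in zip(a, b)] + [rc + rd for rc, rd in zip(c, d)]

zero2 = [[0, 0], [0, 0]]
id2 = [[1, 0], [0, 1]]
sigma = {
    1: [[0, 1], [1, 0]],
    2: [[0, -1j], [1j, 0]],
    3: [[1, 0], [0, -1]],
}

gamma0 = block(id2, zero2, zero2, neg(id2))
gamma = {i: block(zero2, s, neg(s), zero2) for i, s in sigma.items()}

def prod(*matrices):
    return reduce(matmul, matrices)

lhs = scale(1j, prod(gamma[2], gamma0, gamma[1], gamma[3]))
rhs = scale(1j, prod(gamma0, gamma[1], gamma[2], gamma[3]))
gamma4_dirac = block(zero2, id2, id2, zero2)

print("reordering is sign-free:", lhs == rhs)
print("gamma^(4) in this representation:", rhs == gamma4_dirac)
```

Moving $\gamma^{(2)}$ to the right past $\gamma^{(0)}$ and $\gamma^{(1)}$ produces two sign changes, which cancel.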
$^{154}$ G. C. Wick, A. S. Wightman and E. P. Wigner, Phys. Rev. 88, 101 (1952).
$^{155}$ J. H. Christenson, J. W. Cronin, V. L. Fitch and R. Turlay, Phys. Rev. Lett. 13, 138 (1964).
12.3.4 The CPT Theorem
The CPT theorem states that any local$^{156}$ quantum field theory with a Hermitean Lorentz invariant Lagrangian which satisfies the spin-statistics theorem is invariant under the compound operation $\hat{\mathcal{C}} \hat{\mathcal{P}} \hat{\mathcal{T}}$, where the operators can be placed in any order.
The proof of the theorem relies on the fact that any Lorentz invariant quantity must be created out of contracting the indices of bilinear covariants (quantities such as the current density $j^{\mu}$ which involve products of the $\gamma^{\mu}$) with the indices of contravariant derivatives $\partial^{\mu}$. Since the joint operation $\hat{\mathcal{P}} \hat{\mathcal{T}}$ results in each of the contravariant derivatives $\partial^{\mu}$ in the product changing sign, the theorem ensures that the bilinear covariants with which the derivatives are contracted must undergo an equivalent number of sign changes under the compound operation $\hat{\mathcal{C}} \hat{\mathcal{P}} \hat{\mathcal{T}}$. The theorem only assumes invariance under proper orthochronous Lorentz transformations and makes no assumptions about reflection. The improper transformations are treated as analytic continuations of the Lorentz transformation into complex space-time. The theorem was first discussed by Lüders$^{157}$ and Pauli$^{158}$, and then by Lee, Oehme and Yang$^{159}$.
The theorem has several consequences, such as the equality of the masses of particles and their antiparticles. This follows since the mass $m c$ is an eigenvalue of $\hat{p}^{(0)}$ in the particle's rest frame and since one can find simultaneous eigenstates of the commuting operators $\hat{p}^{\mu}$ and the product $\hat{\mathcal{C}} \hat{\mathcal{P}} \hat{\mathcal{T}}$. If one denotes the compound operator as
\[
\hat{\Theta} = \hat{\mathcal{C}} \, \hat{\mathcal{P}} \, \hat{\mathcal{T}} \tag{2050}
\]
then
\[
\begin{aligned}
\langle \Psi | \, \hat{H} \, | \Psi \rangle &= \langle \Psi | \, \hat{\Theta}^{-1} \hat{\Theta} \, \hat{H} \, \hat{\Theta}^{-1} \hat{\Theta} \, | \Psi \rangle \\
&= \langle \Psi | \, \hat{\Theta}^{-1} \, \hat{H} \, \hat{\Theta} \, | \Psi \rangle
\end{aligned}
\tag{2051}
\]
since the CPT theorem ensures that $\hat{\Theta}$ commutes with the Hamiltonian
\[
\hat{\Theta} \, \hat{H} \, \hat{\Theta}^{-1} = \hat{H} \tag{2052}
\]
If $| \Psi \rangle$ represents a stable single-particle state, such as
\[
| \Psi \rangle = c^{\dagger}_{\alpha} \, | 0 \rangle \tag{2053}
\]
then the state $\hat{\Theta} | \Psi \rangle$ describes an antiparticle with flipped angular momentum. This follows since the vacuum satisfies
\[
\hat{\Theta} \, | 0 \rangle = | 0 \rangle \tag{2054}
\]
$^{156}$ A Local Field Theory is one expressible in terms of a local Lagrangian density in which interactions can be expressed in terms of products of fields at the same point in space-time. It would be truly remarkable if this concept were to continue to work at arbitrarily small distances!
$^{157}$ G. Lüders, Dan. Mat. Fys. Medd. 28, 5 (1954); G. Lüders, Ann. Phys. 2, 1 (1957).
$^{158}$ W. Pauli in Niels Bohr and the Development of Physics, Pergamon Press, London (1955).
$^{159}$ T. D. Lee, R. Oehme and C. N. Yang, Phys. Rev. 106, 340 (1957).
Therefore, the single-particle state transforms as
\[
\begin{aligned}
\hat{\Theta} \, | \Psi \rangle &= \hat{\Theta} \, c^{\dagger}_{\alpha} \, \hat{\Theta}^{-1} \, \hat{\Theta} \, | 0 \rangle \\
&= \hat{\Theta} \, c^{\dagger}_{\alpha} \, \hat{\Theta}^{-1} \, | 0 \rangle
\end{aligned}
\tag{2055}
\]
By successive applications of $\hat{\mathcal{C}}$, $\hat{\mathcal{P}}$ and $\hat{\mathcal{T}}$, one finds that the operator $\hat{\Theta} \, c^{\dagger}_{\alpha} \, \hat{\Theta}^{-1}$ reduces to the creation operator for the antiparticle with reversed angular momentum. Therefore, the state $\hat{\Theta} | \Psi \rangle$ describes an antiparticle with flipped angular momentum. From the equality of the expectation values
\[
\langle \Psi | \, \hat{H} \, | \Psi \rangle = \langle \Psi | \, \hat{\Theta}^{-1} \, \hat{H} \, \hat{\Theta} \, | \Psi \rangle \tag{2056}
\]
one finds that the energy of a particle is equal to the energy of an antiparticle with a reversed spin. However, as the rest mass cannot depend on the angular momentum, the mass of a particle is equal to the mass of its antiparticle. For unstable particles, the equality of the mass of the particle and antiparticle is ensured by the invariance of the S-matrix under $\hat{\mathcal{C}} \hat{\mathcal{P}} \hat{\mathcal{T}}$.
Likewise, one can use the CPT theorem to show that the total decay rate of a particle into products is equal to the total decay rate of the antiparticle into its products$^{160}$. It should be noted that the partial decay rates into specific final states are not equivalent; only the sums over all final states are equal.
12.4 The Connection between Spin and Statistics
The above result for the energy operator of the Dirac field illustrates the "Spin-Statistics Theorem" proposed by Pauli$^{161}$. The theorem states that particles with half odd-integer spins obey Fermi-Dirac Statistics and particles with integer spins obey Bose-Einstein Statistics. The Dirac spinor describes spin one-half particles, and if these particles are chosen to satisfy anti-commutation relations, then the energy of the excited states is given by
\[
\hat{H}_{Dirac} = \sum_{\alpha} E_{\alpha} \left[ \hat{c}^{\dagger}_{\alpha} \hat{c}_{\alpha} + \hat{b}^{\dagger}_{\alpha} \hat{b}_{\alpha} \right] \tag{2057}
\]
which only has positive excitation energies. Hence, if the wave function changes sign under the interchange of a pair of spin one-half particles, the energy is bounded from below. If the field operators had been chosen to obey commutation relations, then the wave function would have been symmetric under the interchange of particles. If this were the case, there would be a negative sign in front of the positron energies, so that the energy would have been unbounded from below. This would have implied that the vacuum would not be stable, and that the theory is erroneous. This can be taken as implying that spin one-half particles must obey Fermi-Dirac Statistics. The other part of the theorem compels integer spin particles to be bosons. Therefore, since photons have spin one, the expression for the energy of the electromagnetic field is considered to be given by
\[
\hat{H}_{Photon} = \sum_{k,\alpha} \frac{\hbar \omega_{k}}{2} \left[ \hat{a}^{\dagger}_{k,\alpha} \hat{a}_{k,\alpha} + \hat{a}_{k,\alpha} \hat{a}^{\dagger}_{k,\alpha} \right] \tag{2058}
\]
$^{160}$ T. D. Lee, R. Oehme and C. N. Yang, Phys. Rev. 106, 340 (1957).
$^{161}$ W. Pauli, Phys. Rev. 58, 716 (1940).
This Hamiltonian represents the energy of a spin-one particle. The photon creation and annihilation operators satisfy commutation relations; therefore, the energy can be expressed as
\[
\hat{H}_{Photon} = \sum_{k,\alpha} \frac{\hbar \omega_{k}}{2} \left[ 2 \, \hat{a}^{\dagger}_{k,\alpha} \hat{a}_{k,\alpha} + 1 \right] \tag{2059}
\]
which is the sum of the vacuum energy (the zero-point energies) and the energies of each excited photon. The excitation energies are positive. If it had been assumed that the photon wave functions were antisymmetric under the interchange of particles, then one would have found that the photon energies would have been identically equal to zero. Furthermore, the excited photons would have carried zero momentum and, therefore, be completely void of any physical consequence. Hence, one concludes that spin-one photons must obey Bose-Einstein Statistics. The generalized theorem$^{162}$ is an assertion that a non-trivial integer spin field cannot have an anti-commutator that vanishes for space-like separations and a non-trivial odd half-integer spin field cannot have a commutator that vanishes for space-like separations.
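The reordering step behind (2059), and the contrast with the hypothetical anti-commuting case described above, can be written out for a single mode. The shorthand $\hat{a} \equiv \hat{a}_{k,\alpha}$ is introduced here only for brevity and is not the text's notation.

```latex
% Commuting (Bose-Einstein) case: [ a , a^\dagger ] = 1
\hat{a} \hat{a}^{\dagger} = \hat{a}^{\dagger} \hat{a} + 1
\quad \Rightarrow \quad
\frac{\hbar \omega_k}{2} \left( \hat{a}^{\dagger} \hat{a} + \hat{a} \hat{a}^{\dagger} \right)
= \frac{\hbar \omega_k}{2} \left( 2 \, \hat{a}^{\dagger} \hat{a} + 1 \right)

% Hypothetical anti-commuting case: \{ a , a^\dagger \}_+ = 1
\hat{a} \hat{a}^{\dagger} = 1 - \hat{a}^{\dagger} \hat{a}
\quad \Rightarrow \quad
\frac{\hbar \omega_k}{2} \left( \hat{a}^{\dagger} \hat{a} + \hat{a} \hat{a}^{\dagger} \right)
= \frac{\hbar \omega_k}{2}
```

In the anti-commuting case the number operator drops out entirely, so only a constant remains and the excitation energies vanish, as stated in the text.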
13 Massive Gauge Field Theory
Following Yang and Mills$^{163}$, we shall consider a two-component complex scalar field. The field can be expressed as a two-component field, representing states with different isospin
\[
\Phi = \begin{pmatrix} \Phi_{1} \\ \Phi_{2} \end{pmatrix} \tag{2060}
\]
where the $\Phi_{i}$ are complex scalars. That is
\[
\begin{aligned}
\Phi_{1} &= \Re e \, \Phi_{1} + i \, \Im m \, \Phi_{1} \\
\Phi_{2} &= \Re e \, \Phi_{2} + i \, \Im m \, \Phi_{2}
\end{aligned}
\tag{2061}
\]
$^{162}$ R. Streater and A. S. Wightman, PCT, Spin and Statistics, and All That, Princeton Univ. Press (2000).
$^{163}$ C. N. Yang and R. L. Mills, Phys. Rev. 96, 191 (1954).
This is equivalent to assuming four independent real fields. The inner product is defined as
\[
\Phi^{\dagger} \Phi = \Phi^{*}_{1} \Phi_{1} + \Phi^{*}_{2} \Phi_{2} \tag{2062}
\]
13.1 The Gauge Symmetry
We shall assume that the Lagrangian is invariant under generalized gauge transformations of the form
\[
\Phi \rightarrow \Phi' = \exp \left[ - i \, \alpha^{(0)} \right] \hat{U} \, \Phi \tag{2063}
\]
where $\alpha^{(0)}$ is an arbitrary scalar. The invariance of the Lagrangian under multiplication of the wave function by the phase factor $\exp [ - i \alpha^{(0)} ]$ is equivalent to the usual U(1) gauge invariance which has been discussed in the context of the electromagnetic field. The operator $\hat{U}$ must be a unitary operator, if the norm of $\Phi$ is conserved by the generalized gauge transformation
\[
\begin{aligned}
\Phi'^{\dagger} \Phi' &= \Phi^{\dagger} \, \hat{U}^{\dagger} \hat{U} \, \Phi \\
&= \Phi^{\dagger} \Phi
\end{aligned}
\tag{2064}
\]
Therefore, one requires
\[
\hat{U}^{\dagger} \hat{U} = \hat{I} \tag{2065}
\]
and so $\hat{U}$ must be a unitary operator. The operator $\hat{U}$ is assumed to be an arbitrary unitary matrix that acts on isospin states, that is, it acts on the two components of $\Phi$. Furthermore, it shall be assumed that the unitary matrix has determinant $+1$. Hence, the Lagrangian is assumed to be invariant under a set of SU(2) gauge transformations. A general transformation of SU(2) is generated by the three operators
\[
\tau^{(1)} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} , \qquad
\tau^{(2)} = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix} , \qquad
\tau^{(3)} = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \tag{2066}
\]
where these matrices generate a Lie algebra. That is, the algebra of the commutation relations is closed, since
\[
[ \tau^{(i)} , \tau^{(j)} ] = 2 \, i \, \xi_{i,j,k} \, \tau^{(k)} \tag{2067}
\]
where $\xi_{i,j,k}$ is the antisymmetric Levi-Civita symbol. An arbitrary unitary transformation can be expressed as
\[
\hat{U} = \exp \left[ - i \sum_{k} \alpha_{k} \, \tau^{(k)} \right] \tag{2068}
\]
where the $\alpha_k$ are three real quantities. This represents an arbitrary rotation in isospin space[164]. The U(1) gauge transformation can also be represented in the same way. Namely, the U(1) transformation can be expressed as
$$
\hat{U}_0 = \exp\left[ - i \alpha^{(0)} \, \tau^{(0)} \right] \tag{2069}
$$
where $\tau^{(0)}$ is the unit matrix
$$
\tau^{(0)} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \tag{2070}
$$
We should note that since $\tau^{(0)}$ commutes with all isospin operators, the U(1) symmetry is decoupled from the SU(2) symmetry. Hence, when a coupling to gauge fields is introduced, the U(1) gauge field may have a coupling constant which is different from the coupling constant for the three SU(2) gauge fields.
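The algebraic statements of this section can be checked numerically. The following is a minimal sketch, using only the Python standard library, that verifies the commutation relations (2067) and confirms that a transformation of the form (2068) is unitary with determinant $+1$. The helper names (`matmul`, `lincomb`, `su2`, and so on) and the sample angles are assumptions introduced for illustration; the exponential is evaluated through the closed form $\cos a \, \tau^{(0)} - i \sin a \, (\hat{n} . \tau)$ with $a = |\alpha|$, which holds because $(\hat{n} . \tau)^2 = \tau^{(0)}$.

```python
import math

# Unit matrix tau^(0) of (2070) and the Pauli matrices tau^(1..3) of (2066).
TAU = {
    0: [[1, 0], [0, 1]],
    1: [[0, 1], [1, 0]],
    2: [[0, -1j], [1j, 0]],
    3: [[1, 0], [0, -1]],
}

def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def lincomb(terms):
    """Sum of c * m over (coefficient, 2x2 matrix) pairs."""
    return [[sum(c * m[i][j] for c, m in terms) for j in range(2)]
            for i in range(2)]

def dagger(a):
    """Hermitian conjugate of a 2x2 matrix."""
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]

def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

def levi_civita(i, j, k):
    """Levi-Civita symbol for indices drawn from {1, 2, 3}."""
    return (i - j) * (j - k) * (k - i) // 2

def su2(alpha):
    """exp(-i sum_k alpha_k tau^(k)) via cos(a) tau^(0) - i sin(a) n.tau."""
    a = math.sqrt(sum(x * x for x in alpha))
    if a == 0.0:
        return lincomb([(1, TAU[0])])
    terms = [(math.cos(a), TAU[0])]
    terms += [(-1j * math.sin(a) * alpha[k - 1] / a, TAU[k]) for k in (1, 2, 3)]
    return lincomb(terms)

# Commutation relations (2067): [tau_i, tau_j] = 2 i xi_ijk tau_k.
algebra_ok = all(
    close(
        lincomb([(1, matmul(TAU[i], TAU[j])), (-1, matmul(TAU[j], TAU[i]))]),
        lincomb([(2j * levi_civita(i, j, k), TAU[k]) for k in (1, 2, 3)]),
    )
    for i in (1, 2, 3) for j in (1, 2, 3)
)

U = su2((0.4, -1.3, 0.8))  # arbitrary sample values of alpha_k
unitary_ok = close(matmul(dagger(U), U), TAU[0])
det_U = U[0][0] * U[1][1] - U[0][1] * U[1][0]

print(algebra_ok, unitary_ok, abs(det_U - 1) < 1e-12)
```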
13.2 The Coupling to the Gauge Field
We shall start with a Lagrangian density $\mathcal{L}_{scalar}$ describing the field-free two-component scalar field, given by
$$
\mathcal{L}_{scalar} = \partial_\mu \Phi^\dagger \, \partial^\mu \Phi - V(\Phi^\dagger \Phi) \tag{2071}
$$
where $V(\Phi^\dagger \Phi)$ is an arbitrary scalar potential. For example, in a Klein-Gordon field theory describing particles with mass $m$
$$
V(\Phi^\dagger \Phi) = \left( \frac{m c}{\hbar} \right)^2 \Phi^\dagger \Phi \tag{2072}
$$
The Lagrangian is invariant under the combined gauge transformation if the quantities $\alpha_k$ are independent of $x$. In this case, the Lagrangian is invariant under a transformation which is identical at each point in space, so the Lagrangian is said to have a global gauge invariance.

We shall alter the Lagrangian, such that it is invariant under a gauge transformation which varies from point to point in space. These are local gauge transformations, in which the $\alpha_k(x)$ depend on $x$. If the Lagrangian is to be invariant under local gauge transformations, then one must introduce a coupling to gauge fields $A^\mu$. This coupling compensates for the change of the derivatives under the gauge transformation, so that
$$
\left[ \left( \partial_\mu - i g A^{\prime}_\mu \right) \Phi^{\prime} \right]^\dagger \left[ \left( \partial^\mu - i g A^{\prime \mu} \right) \Phi^{\prime} \right]
= \left[ \left( \partial_\mu - i g A_\mu \right) \Phi \right]^\dagger \left[ \left( \partial^\mu - i g A^\mu \right) \Phi \right] \tag{2073}
$$
[164] We shall not stop and contemplate the question of what restricts our measurements to be quantized along the isospin z-direction, and shall not ponder why there is a superselection rule at work.
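Since
$$
\Phi = \hat{U}^\dagger \, \Phi^{\prime} \tag{2074}
$$
we require that
$$
\left( \partial^\mu - i g A^{\prime \mu} \right) \Phi^{\prime} = \hat{U} \left( \partial^\mu - i g A^\mu \right) \Phi \tag{2075}
$$
so the fields $A^\mu$ must transform as
$$
A^{\prime \mu} = \hat{U} A^\mu \hat{U}^\dagger + \frac{i}{g} \, \hat{U} \left( \partial^\mu \hat{U}^\dagger \right) \tag{2076}
$$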
where the derivative only acts on the unitary transformation. Since the $\hat{U}$ are generated by the three traceless $\tau^{(k)}$, there must be three components of $A^\mu$, i.e. the fields have three components $A^{\mu,k}$. In addition, the U(1) symmetry requires that there is a coupling to a gauge field which has the diagonal form $A^{\mu,(0)} \tau^{(0)}$ and is associated with the overall phase $\alpha^{(0)}(x)$. The components $A^{\mu,(0)}$ are unchanged by SU(2) transformations and shall be considered separately. The matrix form of $A^\mu$ is given by
$$
A^\mu = \sum_{k=1}^{3} A^{\mu,k} \, \tau^{(k)} =
\begin{pmatrix}
A^{\mu,(3)} & A^{\mu,(1)} - i A^{\mu,(2)} \\
A^{\mu,(1)} + i A^{\mu,(2)} & - A^{\mu,(3)}
\end{pmatrix} \tag{2077}
$$
Under a gauge transformation $\hat{U}$, the four-vectors $A^{\mu,k}$ are transformed in isospin space. For a global gauge transformation, the transformation is a rotation in isospin space. The gauge field $A^\mu$ is also required to transform as a four-vector under Lorentz transformations.
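The matrix form (2077) and the statement that a global gauge transformation acts as a rotation in isospin space can be illustrated numerically. The sketch below, which uses only the Python standard library, builds the matrix from three sample components at a fixed $\mu$, applies $\hat{U} A^\mu \hat{U}^\dagger$ (the derivative term of (2076) vanishes for a global transformation), and checks that the transformed components remain real and keep the same length. The helper names and all numerical values are assumptions made for illustration, and the components are extracted with $A^{\mu,k} = \frac{1}{2} \mathrm{Trace} \, ( \tau^{(k)} A^\mu )$.

```python
import math

# Pauli matrices tau^(1..3) and the unit matrix tau^(0).
TAU = {
    0: [[1, 0], [0, 1]],
    1: [[0, 1], [1, 0]],
    2: [[0, -1j], [1j, 0]],
    3: [[1, 0], [0, -1]],
}

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def lincomb(terms):
    return [[sum(c * m[i][j] for c, m in terms) for j in range(2)]
            for i in range(2)]

def dagger(a):
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]

def su2(alpha):
    """Closed form of exp(-i sum_k alpha_k tau^(k)) for nonzero alpha."""
    a = math.sqrt(sum(x * x for x in alpha))
    terms = [(math.cos(a), TAU[0])]
    terms += [(-1j * math.sin(a) * alpha[k - 1] / a, TAU[k]) for k in (1, 2, 3)]
    return lincomb(terms)

def to_matrix(a):
    """Matrix sum_k a_k tau^(k), as in (2077)."""
    return lincomb([(a[k - 1], TAU[k]) for k in (1, 2, 3)])

def components(m):
    """Isospin components a_k = (1/2) Trace(tau^(k) m)."""
    out = []
    for k in (1, 2, 3):
        p = matmul(TAU[k], m)
        out.append((p[0][0] + p[1][1]) / 2)
    return out

a = (0.3, -1.1, 0.4)  # sample components A^{mu,k} at one fixed mu
A = to_matrix(a)
explicit = [[a[2], a[0] - 1j * a[1]], [a[0] + 1j * a[1], -a[2]]]
matrix_form_ok = all(abs(A[i][j] - explicit[i][j]) < 1e-12
                     for i in range(2) for j in range(2))

U = su2((0.4, -1.3, 0.8))  # global transformation: constant alpha_k
a_rot = components(matmul(matmul(U, A), dagger(U)))

max_imag = max(abs(complex(c).imag) for c in a_rot)
norm_before = sum(x * x for x in a)
norm_after = sum(complex(c).real ** 2 for c in a_rot)

print(matrix_form_ok, max_imag < 1e-12, abs(norm_before - norm_after) < 1e-12)
```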
We shall identify the contravariant derivative for the massive scalar particles as[165]
$$
T^\mu = \partial^\mu - i g A^\mu - i g_0 A^{\mu,(0)} \tag{2078}
$$
and one recognizes that this has the same form as the coupling of charged particles to the EM field. In the EM case, the coupling occurs solely via $\tau^{(0)}$, the coupling constant is given by $g_0 = \frac{q}{\hbar c}$ and the field $A^{\mu,(0)} = A^\mu$ is the usual four-vector potential. Since $\tau^{(0)}$ commutes with all isospin operators, it is not necessary to consider $g_0$ to be identical with the $g$ value for the SU(2) gauge fields.

[165] This can be related to the contravariant derivative familiar in the context of general relativity, if one follows the logic adopted by Weyl and considers GR as a gauge field theory.

13.3 The Free Gauge Fields
We have four real four-vector fields consisting of the $A^{\mu,k}$ and $A^{\mu,(0)}$. These are the gauge fields. The free gauge fields exist in the absence of the particles, and have a free Lagrangian. The field strength tensors $F^{\mu,\nu}$ are given by the SU(2) generalized form of the EM field tensor
$$
F^{\mu,\nu} = T^\mu A^\nu - T^\nu A^\mu \tag{2079}
$$
where $T$ is the covariant derivative only involving the SU(2) triplet of gauge fields. It should be noted that since the gauge fields do not commute, this involves terms which are second-order in the field amplitudes. That is
$$
F^{\mu,\nu} = \left( \partial^\mu A^\nu - \partial^\nu A^\mu \right) - i g \left( A^\mu A^\nu - A^\nu A^\mu \right) \tag{2080}
$$
The quadratic terms can be evaluated by using the commutation relations of the isospin operators $\tau^{(k)}$. The $k$-th component of the field tensor for the SU(2) triplet of gauge fields is given by
$$
\begin{aligned}
F^{\mu,\nu}_k &= \partial^\mu A^\nu_k - \partial^\nu A^\mu_k + g \sum_{\{i,j\}=1}^{3} \xi^{i,j,k} \left( A^\mu_i A^\nu_j - A^\nu_i A^\mu_j \right) \\
&= \partial^\mu A^\nu_k - \partial^\nu A^\mu_k + 2 g \sum_{\{i>j\}=1}^{3} \xi^{i,j,k} \left( A^\mu_i A^\nu_j - A^\nu_i A^\mu_j \right)
\end{aligned} \tag{2081}
$$
where the indices $i$ and $j$ are summed over and $\xi^{i,j,k}$ is the Levi-Civita symbol.
In arriving at the above expression, we have used the identity
$$
\tau^{(i)} \tau^{(j)} = \delta^{i,j} \, \tau^{(0)} + i \sum_{k=1}^{3} \xi^{i,j,k} \, \tau^{(k)} \tag{2082}
$$
found by combining the anti-commutation and commutation relations for the Pauli spin matrices. There is no contribution to the last term in the field tensor from the U(1) gauge field $A^\mu_{(0)}$ since $\tau^{(0)}$ commutes with all other matrices. The zeroth component of the field tensor is simply given by
$$
F^{\mu,\nu}_{(0)} = \partial^\mu A^\nu_{(0)} - \partial^\nu A^\mu_{(0)} \tag{2083}
$$
as expected for an electromagnetic field. Since the SU(2) gauge fields don't commute, the field theory is a non-Abelian gauge field theory. Under an SU(2) transformation, the field tensors transform according to
$$
F^{\mu,\nu} \to F^{\prime \mu,\nu} = \hat{U} F^{\mu,\nu} \hat{U}^\dagger \tag{2084}
$$
which is just a local unitary transform in isospin space. The Lagrangian density for all the free gauge fields can be expressed as
$$
\mathcal{L}_{gauge} = - \frac{1}{32 \pi} \, \mathrm{Trace} \; F_{\mu,\nu} F^{\mu,\nu} \tag{2085}
$$
where the Trace is evaluated in isospin space and takes into account that there are a total of four fields. The Lagrangian density can be expressed directly in terms of the contributions from the four components of the field. The result can be expressed as
$$
\mathcal{L}_{gauge} = - \frac{1}{16 \pi} \sum_{k=0}^{3} F^{k}_{\mu,\nu} F^{\mu,\nu,k} \tag{2086}
$$
where we have decomposed the fields as
$$
F_{\mu,\nu} = \sum_k F^{k}_{\mu,\nu} \, \tau^{(k)} \tag{2087}
$$
evaluated the product of the Pauli spin matrices and used the fact that the Pauli spin matrices $\tau^{(k)}$ for $k \neq 0$ are traceless.
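The step from (2085) to (2086) rests on the product identity (2082) and on the tracelessness of the $\tau^{(k)}$ for $k \neq 0$, which together give $\mathrm{Trace} \, \tau^{(a)} \tau^{(b)} = 2 \, \delta^{a,b}$ for $a, b = 0, \ldots, 3$. The following sketch checks these statements with the Python standard library. It considers a single fixed pair of Lorentz indices, so the sum over $\mu, \nu$ and the metric signs are deliberately left out; the helper names and the sample components `f` are assumptions made for illustration.

```python
import math

TAU = {
    0: [[1, 0], [0, 1]],
    1: [[0, 1], [1, 0]],
    2: [[0, -1j], [1j, 0]],
    3: [[1, 0], [0, -1]],
}

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def lincomb(terms):
    return [[sum(c * m[i][j] for c, m in terms) for j in range(2)]
            for i in range(2)]

def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

def trace(a):
    return a[0][0] + a[1][1]

def levi_civita(i, j, k):
    """Levi-Civita symbol for indices drawn from {1, 2, 3}."""
    return (i - j) * (j - k) * (k - i) // 2

# Identity (2082): tau_i tau_j = delta_ij tau_0 + i sum_k xi_ijk tau_k.
identity_ok = all(
    close(
        matmul(TAU[i], TAU[j]),
        lincomb([(1 if i == j else 0, TAU[0])]
                + [(1j * levi_civita(i, j, k), TAU[k]) for k in (1, 2, 3)]),
    )
    for i in (1, 2, 3) for j in (1, 2, 3)
)

# Trace(tau_a tau_b) = 2 delta_ab for a, b = 0..3.
trace_ok = all(
    abs(trace(matmul(TAU[a], TAU[b])) - (2 if a == b else 0)) < 1e-12
    for a in range(4) for b in range(4)
)

# Sample real components F^k (k = 0..3) for one fixed (mu, nu) pair.
f = (0.5, -0.2, 1.4, 0.9)
F = lincomb([(f[k], TAU[k]) for k in range(4)])

trace_form = -trace(matmul(F, F)) / (32 * math.pi)    # isospin part of (2085)
component_form = -sum(x * x for x in f) / (16 * math.pi)  # isospin part of (2086)

print(identity_ok, trace_ok, abs(trace_form - component_form) < 1e-12)
```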
One can consider the $k$-components of the vector potential $A^\mu$ (i.e. the three real components $A^\mu_k$ for fixed $\mu$) as forming three-vectors $\vec{A}^\mu$ in isospin space. These quantities transform as three-vectors under transformations in isospin space, and also the $\vec{A}^\mu$ transform as four-vectors under Lorentz transformations in Minkowski space-time. The three-vector fields are spin-one bosons with isospin one. Hence, we might expect that the isospin triplet should contain two oppositely charged particles and one uncharged particle. These particles are supplemented by the particle corresponding to the single uncharged field $A^\mu_{(0)}$. In terms of this set of isospin vectors, the free gauge field Lagrangian density can be written in the form of a sum of a scalar product in isospin space and an isospin scalar
$$
\begin{aligned}
\mathcal{L}_{gauge} = &- \frac{1}{16 \pi}
\left[ ( \partial_\mu \vec{A}_\nu - \partial_\nu \vec{A}_\mu ) + 2 g \, \vec{A}_\mu \wedge \vec{A}_\nu \right]
\, . \,
\left[ ( \partial^\mu \vec{A}^\nu - \partial^\nu \vec{A}^\mu ) + 2 g \, \vec{A}^\mu \wedge \vec{A}^\nu \right] \\
&- \frac{1}{16 \pi}
\left( \partial_\mu A_\nu^{(0)} - \partial_\nu A_\mu^{(0)} \right)
\left( \partial^\mu A^{\nu,(0)} - \partial^\nu A^{\mu,(0)} \right)
\end{aligned} \tag{2088}
$$
It should be noted that the Lagrangian reduces to the sum of four non-interacting electromagnetic Lagrangians in the limit $g \to 0$. However, at finite values of $g$, the Lagrangian density contains cubic and quartic interactions with coupling strengths that are fixed by gauge invariance in terms of the single gauge parameter $g$.
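The equivalence between the commutator term of (2080), the component sum in (2081) and the term $2 g \, \vec{A}^\mu \wedge \vec{A}^\nu$ of (2088) can be checked numerically. The sketch below assumes that $\wedge$ denotes the ordinary vector cross product in isospin space, and it drops the derivative terms by taking the fields to be constant, so only the quadratic part is tested. The helper names, the coupling value `g` and the sample isospin vectors are assumptions made for illustration.

```python
TAU = {
    1: [[0, 1], [1, 0]],
    2: [[0, -1j], [1j, 0]],
    3: [[1, 0], [0, -1]],
}

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def lincomb(terms):
    return [[sum(c * m[i][j] for c, m in terms) for j in range(2)]
            for i in range(2)]

def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

def levi_civita(i, j, k):
    """Levi-Civita symbol for indices drawn from {1, 2, 3}."""
    return (i - j) * (j - k) * (k - i) // 2

def to_matrix(a):
    """Matrix sum_k a_k tau^(k)."""
    return lincomb([(a[k - 1], TAU[k]) for k in (1, 2, 3)])

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

g = 0.7                    # sample coupling constant
a_mu = (0.3, -1.1, 0.4)    # isospin vector for one fixed mu
a_nu = (0.9, 0.2, -0.5)    # isospin vector for one fixed nu

# Quadratic term of (2080): -i g (A^mu A^nu - A^nu A^mu), as a matrix.
A_mu, A_nu = to_matrix(a_mu), to_matrix(a_nu)
quadratic = lincomb([(-1j * g, matmul(A_mu, A_nu)),
                     (+1j * g, matmul(A_nu, A_mu))])

# Quadratic term of (2081), first line, component by component.
from_2081 = tuple(
    g * sum(levi_civita(i, j, k)
            * (a_mu[i - 1] * a_nu[j - 1] - a_nu[i - 1] * a_mu[j - 1])
            for i in (1, 2, 3) for j in (1, 2, 3))
    for k in (1, 2, 3)
)

# Quadratic term of (2088): 2 g times the isospin cross product.
from_2088 = tuple(2 * g * c for c in cross(a_mu, a_nu))

components_ok = all(abs(x - y) < 1e-12 for x, y in zip(from_2081, from_2088))
matrix_ok = close(quadratic, to_matrix(from_2088))

print(components_ok, matrix_ok)
```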
Exercise:
Determine the equations of motion for the vector gauge fields, in the presence of a source term
$$
\mathcal{L}_{int} = - \frac{1}{c} \, \mathrm{Trace} \, ( \, A_\mu \, . \, j^\mu \, ) \tag{2089}
$$
where the current source $j^\mu$ has also been decomposed in terms of Pauli spin matrices.

Figure 71: The interaction vertices representing the interaction of three and four isospin triplet gauge field bosons.
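It is convenient to introduce the two combinations
$$
A^{\pm}_\mu = \frac{1}{\sqrt{2}} \left( A^{(1)}_\mu \mp i A^{(2)}_\mu \right) \tag{2090}
$$
which appear in the isospin matrix form of $A_\mu$. These combinations are mutually complex conjugate. Likewise, one can introduce the combinations of the field tensors
$$
F^{\pm}_{\mu,\nu} = \frac{1}{\sqrt{2}} \left( F^{(1)}_{\mu,\nu} \mp i F^{(2)}_{\mu,\nu} \right) \tag{2091}
$$
which are evaluated as
$$
F^{\pm}_{\mu,\nu} = \left( \partial_\mu \mp 2 i g A^{(3)}_\mu \right) A^{\pm}_\nu - \left( \partial_\nu \mp 2 i g A^{(3)}_\nu \right) A^{\pm}_\mu \tag{2092}
$$
The third component of the field tensor can be written as
$$
F^{(3)}_{\mu,\nu} = \left( \partial_\mu A^{(3)}_\nu - \partial_\nu A^{(3)}_\mu \right) + 2 i g \left( A^{-}_\mu A^{+}_\nu - A^{-}_\nu A^{+}_\mu \right) \tag{2093}
$$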
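The non-derivative parts of (2092) and (2093) can be checked against the Cartesian components of the field tensor. The sketch below takes the gauge fields to be constant, so that all derivative terms drop out, and uses the quadratic term $2 g \, ( \vec{A}_\mu \wedge \vec{A}_\nu )_k$ for the components $F^{(k)}_{\mu,\nu}$, with $\wedge$ assumed to be the isospin cross product. The function names, the coupling value and the sample field values are assumptions made for illustration.

```python
import math

g = 0.7                    # sample coupling constant
a_mu = (0.3, -1.1, 0.4)    # (A^(1), A^(2), A^(3)) at one fixed mu
a_nu = (0.9, 0.2, -0.5)    # (A^(1), A^(2), A^(3)) at one fixed nu

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def pm(a, sign):
    """Combination (2090): sign=+1 gives A^+, sign=-1 gives A^-."""
    return (a[0] - sign * 1j * a[1]) / math.sqrt(2)

# Cartesian components of the field tensor for constant fields.
f = [2 * g * c for c in cross(a_mu, a_nu)]

# (2091) compared with the non-derivative part of (2092), for both signs.
pm_ok = True
for sign in (+1, -1):
    lhs = (f[0] - sign * 1j * f[1]) / math.sqrt(2)
    rhs = -sign * 2j * g * (a_mu[2] * pm(a_nu, sign) - a_nu[2] * pm(a_mu, sign))
    pm_ok = pm_ok and abs(lhs - rhs) < 1e-12

# Third component compared with the non-derivative part of (2093).
f3_rhs = 2j * g * (pm(a_mu, -1) * pm(a_nu, +1) - pm(a_nu, -1) * pm(a_mu, +1))
f3_ok = abs(f[2] - f3_rhs) < 1e-12

# A^+ and A^- are complex conjugates of each other for real A^(1), A^(2).
conjugate_ok = abs(pm(a_mu, +1).conjugate() - pm(a_mu, -1)) < 1e-12

print(pm_ok, f3_ok, conjugate_ok)
```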
In terms of these new combinations, the free Lagrangian for the gauge fields becomes
$$
\mathcal{L}_{gauge} = - \frac{1}{16 \pi} F^{(0)}_{\mu,\nu} F^{\mu,\nu,(0)} - \frac{1}{16 \pi} F^{(3)}_{\mu,\nu} F^{\mu,\nu,(3)} - \frac{1}{8 \pi} F^{-}_{\mu,\nu} F^{\mu,\nu,+} \tag{2094}
$$
where the first two terms are recognized as being similar to the Lagrangian density for the electromagnetic field. It was first hypothesized by Sheldon Glashow that the electroweak interaction is produced by the massless vector bosons described by the above Lagrangian[166]. Masses for the gauge bosons should not be added by hand, since the resulting theory would not be renormalizable. To retain renormalizability of the theory, and to have massive vector bosons, we need to break the symmetry.
[166] S. L. Glashow, Nuclear Physics 22, 579-588 (1961).
13.4 Breaking the Symmetry
We shall assume that our massive charged scalar boson field has broken symmetry[167]. The small amplitude excitations of the field with broken symmetry will be modified, as will be the excitations of the gauge fields. Due to the symmetry breaking of the scalar field, the U(1) vector gauge field will become coupled to the triplet of SU(2) gauge fields. When the symmetry is broken, the elementary excitations of the coupled system of fields change and these new excitations will represent the observable particles.

The potential for the two-component scalar field is chosen to be given by
$$
V(\Phi) = \left( \frac{m c}{2 \hbar \phi_0} \right)^2 \left( \Phi^\dagger \Phi - \phi_0^2 \right)^2 \tag{2095}
$$
where $\phi_0$ is a fixed real constant. The lowest-energy state described by this potential is given by
$$
\Phi^\dagger \Phi = \phi_0^2 \tag{2096}
$$
This state is degenerate with respect to global rotations in the four-dimensional space of $\Re e \, \Phi_1$, $\Im m \, \Phi_1$, $\Re e \, \Phi_2$, $\Im m \, \Phi_2$ which keep the magnitude of $\Phi$ constant and uniform over space.
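The degeneracy of the minimum of (2095) can be illustrated with a short numerical sketch. It evaluates the potential at several points on the three-sphere $\Phi^\dagger \Phi = \phi_0^2$, at the origin, and slightly away from the minimum along a real direction. The units are an assumption: `mass_scale` stands for $m c / \hbar$ and is set to one, and the value of `phi0`, the angles and the function names are likewise chosen only for illustration.

```python
import math

phi0 = 1.5          # sample value of the constant phi_0
mass_scale = 1.0    # stands for m c / hbar, in assumed units

def norm2(phi):
    """Phi^dagger Phi for a two-component complex field."""
    return sum(abs(c) ** 2 for c in phi)

def V(phi):
    """Potential (2095)."""
    return (mass_scale / (2 * phi0)) ** 2 * (norm2(phi) - phi0 ** 2) ** 2

def on_sphere(t1, t2, t3):
    """Point with Phi^dagger Phi = phi0^2, parametrized by three angles."""
    x = (math.cos(t1),
         math.sin(t1) * math.cos(t2),
         math.sin(t1) * math.sin(t2) * math.cos(t3),
         math.sin(t1) * math.sin(t2) * math.sin(t3))
    return (phi0 * complex(x[0], x[1]), phi0 * complex(x[2], x[3]))

# The potential vanishes everywhere on the sphere of minima.
angles = [(0.0, 0.0, 0.0), (0.7, 1.9, 0.3), (2.1, 0.4, 5.0), (1.0, 1.0, 1.0)]
on_shell_max = max(V(on_sphere(*t)) for t in angles)

# Away from the sphere the potential is positive.
V_origin = V((0j, 0j))

# Small real displacement chi about (phi0, 0): V is approximately
# mass_scale^2 chi^2, i.e. quadratic in the displacement.
chi = 1e-3
curvature_ratio = V((complex(phi0 + chi, 0.0), 0j)) / chi ** 2

print(on_shell_max < 1e-12, V_origin > 0, abs(curvature_ratio - mass_scale ** 2) < 1e-2)
```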
The symmetry is broken by assuming that the physical ground state corresponds to one specific choice of the uniform field $\Phi$. Given the specific ground state which the system chooses spontaneously, one can make use of the global gauge invariance to describe the ground state $\Phi_0$ as a field which has one non-zero component which is real. That is, $\alpha_k$ can be chosen so that
$$
\Phi_0 = \begin{pmatrix} \Re e \, \Phi_1 \\ 0 \end{pmatrix} = \begin{pmatrix} \phi_0 \\ 0 \end{pmatrix} \tag{2097}
$$
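That a global transformation can always bring the ground state to the form (2097) can be made concrete. The sketch below starts from an arbitrary sample field with $\Phi^\dagger \Phi = \phi_0^2$ and builds one explicit matrix that rotates it to $(\phi_0, 0)$. The specific construction of `U` from the field components is an illustrative choice introduced here, not a formula taken from the text; the code checks that this matrix is unitary with determinant $+1$, so it is an SU(2) element.

```python
import math

phi0 = 1.5  # sample value of the constant phi_0

# Arbitrary sample ground state, rescaled so that Phi^dagger Phi = phi0^2.
raw = (complex(0.2, -0.9), complex(1.1, 0.4))
scale = phi0 / math.sqrt(sum(abs(c) ** 2 for c in raw))
Phi = tuple(scale * c for c in raw)
norm = math.sqrt(sum(abs(c) ** 2 for c in Phi))

# Illustrative construction: the first row is the normalized conjugate of Phi,
# the second row is orthogonal to it.
U = [[Phi[0].conjugate() / norm, Phi[1].conjugate() / norm],
     [-Phi[1] / norm, Phi[0] / norm]]

# Rotated field U Phi, expected to equal (phi0, 0).
rotated = [U[i][0] * Phi[0] + U[i][1] * Phi[1] for i in range(2)]

# Check that U is an SU(2) element: U^dagger U = 1 and det U = 1.
UdU = [[sum(U[k][i].conjugate() * U[k][j] for k in range(2)) for j in range(2)]
       for i in range(2)]
unitary_ok = all(abs(UdU[i][j] - (1 if i == j else 0)) < 1e-12
                 for i in range(2) for j in range(2))
det_U = U[0][0] * U[1][1] - U[0][1] * U[1][0]

print(abs(rotated[0] - phi0) < 1e-12, abs(rotated[1]) < 1e-12,
      unitary_ok, abs(det_U - 1) < 1e-12)
```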
The excited states can be expressed as
$$
\Phi = \begin{pmatrix} \phi_0 + \chi_1 \\ 0 \end{pmatrix} \tag{2098}
$$
where the local gauge degrees of freedom have been used to make $\chi_1$ real. This
excited ﬁeld is invariant under the transfor