Mathematical Physical Chemistry
Practical and Intuitive Methodology
Second Edition
Shu Hotta
Takatsuki, Osaka, Japan
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
To my wife Kazue
and
To the memory of Roro
Preface to the Second Edition
Once again, the author wishes to thank many students for valuable discussions
and Dr. Shin’ichi Koizumi at Springer for giving him an opportunity to write
this book.
Preface to the First Edition
The contents of this book are based upon manuscripts prepared for both undergraduate courses of Kyoto Institute of Technology by the author entitled “Polymer Nanomaterials Engineering” and “Photonics Physical Chemistry” and a master’s course lecture of Kyoto Institute of Technology by the author entitled “Solid-State Polymers Engineering.”
This book is intended for graduate and undergraduate students, especially those
who major in chemistry and, at the same time, wish to study mathematical physics.
Readers are supposed to have a basic knowledge of analysis and linear algebra.
However, they are not supposed to be familiar with the theory of analytic functions
(i.e., complex analysis), even though it is desirable to have relevant knowledge
about it.
At first, mathematical physics looks daunting to chemists, as it once did to the author as a chemist himself. This book introduces the basic concepts of mathematical physics to chemists. Unlike other books on mathematical physics, it makes a reasonable selection of material so that students majoring in chemistry can readily and naturally understand the contents. In particular, we stress the importance of practical and intuitive methodology. We also expect engineers and physicists to benefit from reading this book.
In Part I and Part II, the book describes quantum mechanics and electromagnetism, respectively, and the relevance between the two is carefully considered. Although quantum mechanics
covers the broad field of modern physics, in Part I we focus on a harmonic oscillator
and a hydrogen (like) atom. This is because we can study and deal with many of
fundamental concepts of quantum mechanics within these restricted topics. More-
over, knowledge acquired from the study of the topics can readily be extended to
practical investigation of, e.g., electronic states and vibration (or vibronic) states of
molecular systems. We describe these topics by both analytic method (that uses
differential equations) and operator approach (using matrix calculations). We believe
that the basic concepts of quantum mechanics can be best understood by contrasting
the analytical and algebraic approaches. For this reason, we give matrix representations of physical quantities whenever possible. Examples include energy
in many parts along with their tangible applications to individual topics. Hence,
readers are recommended to carefully examine and compare the related contents
throughout the book. We believe that readers, especially chemists, benefit from the
writing style of this book, since it is suited to chemists who are good at intuitive
understanding.
The author would like to thank many students for their valuable suggestions and discussions at the lectures. He also wishes to thank Dr. Shin’ichi Koizumi at Springer for giving him an opportunity to write this book.
Part II Electromagnetism
7 Maxwell’s Equations . . . 269
7.1 Maxwell’s Equations and Their Characteristics . . . 269
7.2 Equation of Wave Motion . . . 276
7.3 Polarized Characteristics of Electromagnetic Waves . . . 280
7.4 Superposition of Two Electromagnetic Waves . . . 285
References . . . 293
Index . . . 899
Part I
Quantum Mechanics
In the last part, we study the theory of analytic functions, one of the most elegant
theories of mathematics. This approach not only helps cultivate a broad view of pure
mathematics, but also leads to the acquisition of practical methodology. The last part
deals with the introductory set theory and topology as well.
Chapter 1
Schrödinger Equation and Its Application
Fig. 1.1 Scattering of an X-ray beam by an electron. (a) θ denotes a scattering angle of the X-ray beam. (b) Conservation of momentum
E = hν = ħω,  (1.1)
where ħ ≡ h/2π and ω = 2πν. The quantity ω is called the angular frequency, with ν being the frequency. The quantity ħ is said to be the reduced Planck constant.
Also, Einstein (1917) concluded that the momentum p of a light quantum is identical to the energy of the light quantum divided by the light velocity in vacuum c. That is, we have
p = ħk,  (1.3)
where k ≡ (2π/λ)n (n: a unit vector in the direction of propagation of light) is said to be a wavenumber vector.
Meanwhile, Arthur Compton (1923) conducted various experiments where he
investigated how an incident X-ray beam was scattered by matter (e.g., graphite,
copper). As a result, Compton found a systematic redshift in the X-ray wavelengths as a function of the scattering angle of the X-ray beam (the Compton effect). Moreover, he found that the shift in wavelength depended only on the scattering angle, regardless of the material of the scatterer. The results can be summarized in a simple equation described as
Δλ = (h/m_e c)(1 − cos θ),  (1.4)
where we define
λ_e ≡ h/m_e c.  (1.5)
In other words, λ_e is equal to the shift in the wavelength of the scattered beam obtained when θ = π/2. The quantity λ_e is called the electron Compton wavelength and has an approximate value of 2.426 × 10⁻¹² [m].
Let us derive (1.4) on the basis of conservation of energy and momentum. To this end, in Fig. 1.1 we assume that an electron is originally at rest. An X-ray beam is incident on the electron. Then, the X-ray is scattered and the electron recoils as shown.
The energy conservation reads as
ħω + m_e c² = ħω′ + √(p²c² + m_e²c⁴),  (1.6)
where ω and ω′ are the initial and final angular frequencies of the X-ray; the second term of the RHS is the energy of the electron, in which p is the magnitude of its momentum after recoil. Meanwhile, conservation of the momentum as a vector quantity reads as
ħk = ħk′ + p,  (1.7)
where k and k′ are the wavenumber vectors of the X-ray before and after being scattered; p is the momentum of the electron after recoil. Note that the initial momentum of the electron is zero since the electron is originally at rest. Here p is defined as
p ≡ mu,  (1.8)
where m is the (relativistic) mass of the electron and u its velocity after recoil. Figure 1.1 shows that ħk, ħk′, and p form a closed triangle.
From (1.6), we have
[m_e c² + ħ(ω − ω′)]² = p²c² + m_e²c⁴.  (1.10)
Hence, we get
2m_e c²ħ(ω − ω′) + ħ²(ω − ω′)² = p²c².  (1.11)
Meanwhile, from (1.7) we have
p² = ħ²(k − k′)² = ħ²(k² + k′² − 2kk′ cos θ) = (ħ²/c²)(ω² + ω′² − 2ωω′ cos θ),  (1.12)
where we used the relations ω = ck and ω′ = ck′ with the third equality. Therefore, we get
p²c² = ħ²(ω² + ω′² − 2ωω′ cos θ).  (1.13)
That is, equating (1.11) and (1.13) and rearranging, we get
(ω − ω′)/ωω′ = 1/ω′ − 1/ω = (1/2πc)(λ′ − λ) = (ħ/m_e c²)(1 − cos θ),  (1.16)
where λ and λ′ are the wavelengths of the initial and final X-ray beams, respectively. Since λ′ − λ = Δλ, we have (1.4) from (1.16) accordingly.
We have to mention another important figure in the development of quantum mechanics, Louis-Victor de Broglie (1924). Encouraged by the success of Einstein and Compton, he propounded the concept of the matter wave, which was referred to as the de Broglie wave afterward. Namely, de Broglie reversed the relationship of (1.1) and (1.2) such that
ω = E/ħ,  (1.17)
and
k = p/ħ or λ = h/p,  (1.18)
where p equals |p| and λ is the wavelength of a corpuscular beam. This is said to be the de Broglie wavelength. In (1.18), de Broglie thought that a particle carrying an energy E and momentum p is accompanied by a wave characterized by an angular frequency ω and wavenumber k (or a wavelength λ = 2π/k). Equation (1.18) implies that if we are able to determine the wavelength of the corpuscular beam experimentally, we can determine the magnitude of its momentum accordingly.
The velocity u of the electron is related to its momentum by
u = p / [m_e √(1 + (p/m_e c)²)],  (1.19)
which follows from p = mu with the relativistic mass m ≡ m_e/√(1 − (u/c)²). The total energy of the electron is
E_e ≡ √(p²c² + m_e²c⁴),  (1.20)
which can also be written as
E_e = mc².  (1.21)
The relation (1.21) is due to Einstein (1905, 1907) and is said to be the equivalence theorem of mass and energy.
If an electron is accompanied by a matter wave, that wave should be propagated with a certain phase velocity vp and a group velocity vg. Thus, using (1.17) and (1.18) we have
vp = ω/k = E_e/p = √(p²c² + m_e²c⁴)/p > c,
vg = ∂ω/∂k = ∂E_e/∂p = c²p/√(p²c² + m_e²c⁴) < c,  (1.22)
vp vg = c².
Notice that in the above expressions we replaced E of (1.17) with E_e of (1.20). The group velocity is thought of as the velocity of a wave packet and, hence, the propagation velocity of a matter wave should be identical to vg. Thus, vg is considered to be the particle velocity as well. In fact, vg given by (1.22) is identical to u expressed in (1.19). Therefore, a particle velocity must not exceed c. As for photons (or light quanta), vp = vg = c and, hence, once again we get vpvg = c². We will encounter the last relation of (1.22) in Part II as well.
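The relations (1.22) can be checked numerically for an arbitrary momentum; the sketch below (with an illustrative electron momentum, not taken from the text) verifies vp > c, vg < c, vp·vg = c², and the equality of vg with u of (1.19):

```python
# Check the phase/group velocity relations (1.22) for a matter wave.
import math

c = 2.99792458e8          # speed of light [m/s]
m_e = 9.1093837015e-31    # electron rest mass [kg]
p = 1.0e-24               # an arbitrary electron momentum [kg m/s]

E_e = math.sqrt(p**2 * c**2 + m_e**2 * c**4)  # total energy (1.20)
v_p = E_e / p                                  # phase velocity
v_g = c**2 * p / E_e                           # group velocity

# u of (1.19) coincides with the group velocity
u = p / (m_e * math.sqrt(1.0 + (p / (m_e * c))**2))
```

The identity u = vg holds for any p, since both equal cp/√(p² + m_e²c²).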
The above discussion is a brief historical outlook of early-stage quantum theory
before Erwin Schrödinger (1926) propounded his equation.
Consider the following equation of wave motion:
∇²ψ = (1/v²) ∂²ψ/∂t²,  (1.23)
where
∇² ≡ ∂²/∂x² + ∂²/∂y² + ∂²/∂z².  (1.24)
One of the special solutions of (1.23), called a plane wave, is well studied and expressed as
ψ = ψ₀ e^{i(k·x − ωt)},  (1.25)
where x = xe₁ + ye₂ + ze₃ and e₁, e₂, and e₃ denote basis vectors of an orthonormal base pointing to the positive directions of the x-, y-, and z-axes. Here we make it a rule to represent basis vectors by a row vector and represent a coordinate or a component of a vector by a column vector; see Sect. 9.1.
The other way around, we now wish to seek a basic equation whose solution is described as (1.25). Taking account of (1.1)–(1.3) as well as (1.17) and (1.18), we rewrite (1.25) as
ψ = ψ₀ e^{i(p·x/ħ − Et/ħ)},  (1.27)
where we redefine p = (e₁ e₂ e₃)(p_x, p_y, p_z)ᵀ and E as quantities associated with those of the matter (electron) wave. Taking partial differentiation of (1.27) with respect to x, we obtain
∂ψ/∂x = (i/ħ) p_x ψ₀ e^{i(p·x/ħ − Et/ħ)} = (i/ħ) p_x ψ.  (1.28)
That is,
(ħ/i) ∂ψ/∂x = p_x ψ.  (1.29)
Similarly, we have
(ħ/i) ∂ψ/∂y = p_y ψ and (ħ/i) ∂ψ/∂z = p_z ψ.  (1.30)
Comparing both sides of (1.29), we notice that we may relate the differential operator (ħ/i)∂/∂x to p_x. From (1.30), similar relationships hold with the y and z components. That is, we have the following correspondences:
(ħ/i) ∂/∂x ↔ p_x, (ħ/i) ∂/∂y ↔ p_y, (ħ/i) ∂/∂z ↔ p_z.  (1.31)
Differentiating (1.28) once more with respect to x, we get
∂²ψ/∂x² = [(i/ħ) p_x]² ψ₀ e^{i(p·x/ħ − Et/ħ)} = −(1/ħ²) p_x² ψ.  (1.32)
Hence,
−ħ² ∂²ψ/∂x² = p_x² ψ.  (1.33)
Similarly, we have
−ħ² ∂²ψ/∂y² = p_y² ψ and −ħ² ∂²ψ/∂z² = p_z² ψ,  (1.34)
along with the correspondences
−ħ² ∂²/∂x² ↔ p_x², −ħ² ∂²/∂y² ↔ p_y², −ħ² ∂²/∂z² ↔ p_z².  (1.35)
Summing both sides of (1.33) and (1.34) and then dividing by 2m, we have
−(ħ²/2m) ∇²ψ = (p²/2m) ψ  (1.36)
and the correspondence
−(ħ²/2m) ∇² ↔ p²/2m.  (1.37)
Meanwhile, taking partial differentiation of (1.27) with respect to t, we obtain
∂ψ/∂t = −(i/ħ) E ψ₀ e^{i(p·x/ħ − Et/ħ)} = −(i/ħ) E ψ.  (1.38)
That is,
iħ ∂ψ/∂t = Eψ,  (1.39)
and we have the correspondence
iħ ∂/∂t ↔ E.  (1.40)
Since the total energy of a particle moving in a potential V is the sum of its kinetic and potential energies, we have
E = p²/2m + V.  (1.43)
Using the correspondences (1.37) and (1.40) in (1.43) and operating both sides on ψ, we get
iħ ∂ψ/∂t + (ħ²/2m) ∇²ψ = Vψ.  (1.44)
That is,
−(ħ²/2m) ∇²ψ + Vψ = iħ ∂ψ/∂t.  (1.45)
Defining the Hamiltonian operator
H ≡ −(ħ²/2m) ∇² + V,  (1.46)
we obtain the Schrödinger equation
Hψ = iħ ∂ψ/∂t.  (1.47)
This treatment is nonrelativistic. Relativistically, the kinetic energy of the electron is
K = m_e c²/√(1 − (u/c)²) − m_e c².
Using the approximation
1/√(1 − x) ≈ 1 + x/2,
valid when x (>0), corresponding to (u/c)², is sufficiently smaller than 1, K reduces to the familiar nonrelativistic kinetic energy ½m_e u². This implies that in the above case the group velocity u of a particle is supposed to be well below the light velocity c. Dirac (1928) formulated an equation that describes relativistic quantum mechanics (the Dirac equation).
In (1.45), ψ varies as a function of x and t. Suppose, however, that a potential V depends only upon x. Then, we have
[−(ħ²/2m) ∇² + V(x)] ψ(x, t) = iħ ∂ψ(x, t)/∂t.  (1.48)
Now, let us assume that separation of variables can be done with (1.48) such that
ψ(x, t) = ϕ(x)ξ(t).  (1.49)
Then, we have
[−(ħ²/2m) ∇² + V(x)] ϕ(x)ξ(t) = iħ ∂[ϕ(x)ξ(t)]/∂t.  (1.50)
Dividing both sides by ϕ(x)ξ(t), we get
[−(ħ²/2m) ∇² + V(x)] ϕ(x)/ϕ(x) = iħ [∂ξ(t)/∂t]/ξ(t).  (1.51)
For (1.51) to hold, we must equate both sides to a constant E. That is, for a certain fixed point x₀ we have
[−(ħ²/2m) ∇² + V(x₀)] ϕ(x₀)/ϕ(x₀) = iħ [∂ξ(t)/∂t]/ξ(t),  (1.52)
where ϕ(x₀) in the numerator should be evaluated after operating ∇², while ϕ(x₀) in the denominator is evaluated by simply replacing x in ϕ(x) with x₀. Now, let us define a function Φ(x) such that
Φ(x) ≡ [−(ħ²/2m) ∇² + V(x)] ϕ(x)/ϕ(x).  (1.53)
Then, we have
Φ(x₀) = iħ [∂ξ(t)/∂t]/ξ(t).  (1.54)
Equation (1.56) can readily be solved. Since (1.56) depends on a sole variable t, we have
dξ(t)/ξ(t) = (E/iħ) dt or d ln ξ(t) = (E/iħ) dt.  (1.57)
Integrating (1.57) from 0 to t, we have
ln [ξ(t)/ξ(0)] = Et/iħ.  (1.58)
That is,
ξ(t) = ξ(0) e^{−iEt/ħ}.  (1.59)
Comparing (1.59) with (1.38), we find that the constant E in (1.55) and (1.56) represents an energy of a particle (electron). Thus, the next task we want to do is to solve the eigenvalue equation (1.55). After solving the problem, we get a solution
ψ(x, t) = ϕ(x) e^{−iEt/ħ},  (1.60)
where the constant ξ(0) has been absorbed in ϕ(x). Normally, ϕ(x) is to be normalized after determining its functional form (vide infra).
The Schrödinger equation has been expressed as (1.48). The equation is a second-
order linear differential equation (SOLDE). In particular, our major interest lies in
solving an eigenvalue problem of (1.55). Eigenvalues consist of points in a complex
plane. Those points sometimes form a continuous domain, but we focus on the
eigenvalues that comprise discrete points in the complex plane. Therefore, in our
studies the eigenvalues are countable and numbered as, e.g., λₙ (n = 1, 2, 3, …). An example is depicted in Fig. 1.2. With this in mind, let us first think of a simple form of SOLDE.
Example 1.1 Let us think of the following differential equation:
d²y(x)/dx² + λy(x) = 0,  (1.61)
with boundary conditions (BCs)
y(L) = y(−L) = 0.  (1.62)
The BCs of (1.62) are called Dirichlet conditions. We define the following differential operator D described as
D ≡ −d²/dx²,  (1.63)
so that (1.61) reads as the eigenvalue equation Dy(x) = λy(x).
The eigenfunctions can be sought in the forms e^{ikx} and e^{−ikx}. This is because these functions do not change their functional form upon differentiation, and so we reduce solving a differential equation to solving an algebraic equation among constants (or parameters). In the present case, λ and k are such constants.
The parameter k could be a complex number, because λ is allowed to take a complex value as well. Linear independence of these functions is ensured by a nonvanishing Wronskian W (for k ≠ 0). Thus, a general solution of (1.61) is
y(x) = a e^{ikx} + b e^{−ikx},  (1.66)
where a and b are (complex) constants. We call the two linearly independent solutions e^{ikx} and e^{−ikx} (k ≠ 0) a fundamental set of solutions of a SOLDE. Inserting (1.66) into (1.61), we have
λ − k² = 0, i.e., λ = k².  (1.68)
Applying the BCs (1.62) to (1.66), a nontrivial pair (a, b) exists only if the determinant of the coefficients vanishes:
| e^{ikL}  e^{−ikL} |
| e^{−ikL}  e^{ikL} | = 0, i.e., e^{2ikL} − e^{−2ikL} = 0.  (1.71)
This is because if the determinant in (1.71) were not zero, we would have a = b = 0 and y(x) ≡ 0. Note that with an eigenvalue problem we must avoid a solution that is identically zero. Rewriting (1.71), we get
e^{ikL}(a + b) = 0  (1.74)
or
e^{ikL}(a − b) = 0.  (1.75)
In the case of (1.75), we have
a = b,  (1.76)
where we used the fact that e^{ikL} is a nonvanishing function for any ikL (either real or complex). Similarly, in the case of (1.74), we have
a = −b.  (1.77)
For a = b, the solution reads as
y(x) = 2a cos kx,  (1.78)
and the BC y(L) = 0 requires
cos kL = 0.  (1.80)
Hence,
kL = π/2 + mπ (m = 0, ±1, ±2, …).  (1.81)
In (1.81), for instance, we have k = π/2L for m = 0 and k = −π/2L for m = −1. Also, we have k = 3π/2L for m = 1 and k = −3π/2L for m = −2. These cases, however, individually give linearly dependent solutions for (1.78). Therefore, to get a set of linearly independent eigenfunctions we may define k as positive. Correspondingly, from (1.68) we get eigenvalues of
λ = (2m + 1)²π²/4L² (m = 0, 1, 2, …).  (1.82)
For a = −b, the solution is proportional to sin kx, and the BC requires
sin kL = 0.  (1.83)
Hence,
kL = nπ (n = 1, 2, 3, …),  (1.84)
where we chose positive numbers n for the same reason as above. The corresponding eigenvalues are
λ = n²π²/L² = (2n)²π²/4L² (n = 1, 2, 3, …).  (1.85)
With the second equality of (1.85), we made the eigenvalues easily comparable to those of (1.82). Figure 1.3 shows the eigenvalues given in both (1.82) and (1.85) in a unit of π²/4L². From (1.82) and (1.85), we find that λ is positive definite (or strictly positive), and so from (1.68) we have
k = √λ.  (1.86)
Fig. 1.3 Eigenvalues of the differential equation (1.61) under the boundary conditions (1.62). The eigenvalues are given in a unit of π²/4L² on a real axis and take the values 1, 4, 9, 16, 25, ⋯
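The eigenvalue pattern of Fig. 1.3 can be reproduced programmatically; the sketch below (with L = 1 as an illustrative choice) collects k from both families — (1.81) with k > 0 and (1.84) — and confirms that λ = k², in units of π²/4L², runs through the perfect squares:

```python
# Collect eigenvalues of y'' + lambda*y = 0 with y(L) = y(-L) = 0.
import math

L = 1.0
unit = math.pi**2 / (4.0 * L**2)

# cosine family: kL = pi/2 + m*pi (k > 0)  ->  lambda = (2m+1)^2 * unit
cos_lams = [((math.pi / 2 + m * math.pi) / L) ** 2 for m in range(3)]
# sine family: kL = n*pi                   ->  lambda = (2n)^2 * unit
sin_lams = [((n * math.pi) / L) ** 2 for n in range(1, 3)]

in_units = sorted(round(lam / unit) for lam in cos_lams + sin_lams)
print(in_units)  # [1, 4, 9, 16, 25]
```

The odd squares come from the cosine solutions and the even squares from the sine solutions, interleaving as in the figure.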
The eigenfunctions are normalized by requiring
I = ∫_{−L}^{L} y*(x) y(x) dx = ∫_{−L}^{L} |y(x)|² dx = 1.  (1.87)
That is, for the cosine solutions y(x) = 2a cos kx,
I = 4|a|² ∫_{−L}^{L} cos² kx dx = 4|a|² ∫_{−L}^{L} ½(1 + cos 2kx) dx
  = 2|a|² [x + (1/2k) sin 2kx]_{−L}^{L} = 4L|a|².  (1.88)
Thus, we have
|a| = 1/2√L,  (1.89)
or, allowing for an arbitrary phase θ (real),
a = (1/2)√(1/L) e^{iθ}.  (1.90)
Choosing θ = 0, we have a = (1/2)√(1/L). Thus, for the normalized cosine eigenfunctions, we get
y(x) = √(1/L) cos kx [kL = π/2 + mπ (m = 0, 1, 2, …)].  (1.91)
Strictly speaking, we should be careful to assure that (1.81) holds on the basis of (1.80), because we have not yet excluded the possibility that k is a complex number. To see this, we examine the zeros of a cosine function defined in a complex domain. Here the zeros are the (complex) numbers at which the function takes the value zero. That is, if f(z₀) = 0, z₀ is called a zero (i.e., one of the zeros) of f(z). Now we have
cos z ≡ (1/2)(e^{iz} + e^{−iz}); z = x + iy (x, y: real).  (1.93)
Expanding this expression, we get
cos z = (1/2)[cos x (e^y + e^{−y}) − i sin x (e^y − e^{−y})].  (1.94)
For cos z to vanish, both its real and imaginary parts must be zero. Since e^y + e^{−y} > 0 for all real numbers y, we must have cos x = 0 for the real part to vanish; i.e.,
x = π/2 + mπ (m = 0, ±1, ±2, …).  (1.95)
Note in this case that sin x = ±1 (≠ 0). Therefore, for the imaginary part to vanish, e^y − e^{−y} = 0; that is, we must have y = 0. Consequently, the zeros of cos z are real numbers. In other words, with respect to z₀ that satisfies cos z₀ = 0 we have
z₀ = π/2 + mπ (m = 0, ±1, ±2, …).  (1.96)
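The conclusion that the zeros of cos z are real can be probed numerically (a sketch, not from the text): cos vanishes at π/2 + mπ on the real axis, while moving off the axis makes |cos z| strictly positive:

```python
# Zeros of cos z in the complex plane lie on the real axis.
import cmath, math

# on the real axis: cos(pi/2 + m*pi) = 0
real_zeros_ok = all(abs(cmath.cos(math.pi / 2 + m * math.pi)) < 1e-12
                    for m in range(-3, 4))

# off the real axis (y != 0): |cos z| > 0; here
# |cos(pi/2 + 0.5i)| = sinh(0.5) ≈ 0.521 by (1.94)
off_axis = abs(cmath.cos(complex(math.pi / 2, 0.5)))
```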
Let us apply the above results to a particle confined in a one-dimensional potential well. Inside the well, the Schrödinger equation reads as
(ħ²/2m) d²ψ(x)/dx² + Eψ(x) = 0,  (1.97)
where the potential V(x) is described by
V(x) = 0 (−L ≤ x ≤ L); V(x) = ∞ (x > L or x < −L).
Rewriting (1.97), we have
d²ψ(x)/dx² + (2mE/ħ²) ψ(x) = 0  (1.98)
with BCs
ψ(L) = ψ(−L) = 0.  (1.99)
Equation (1.98) has the same form as (1.61), and so we have
E = ħ²λ/2m  (1.100)
with λ = k² as in (1.68). For k we use the values of (1.81) and (1.84). Therefore, for the energy eigenvalues we get either
E = (ħ²/2m) (2l + 1)²π²/4L² (l = 0, 1, 2, …),
to which
ψ(x) = √(1/L) cos kx [kL = π/2 + lπ (l = 0, 1, 2, …)]  (1.101)
corresponds, or
E = (ħ²/2m) (2n)²π²/4L² (n = 1, 2, 3, …),
to which
ψ(x) = √(1/L) sin kx [kL = nπ (n = 1, 2, 3, …)]  (1.102)
corresponds.
Since the particle behaves as a free particle within the potential well (−L ≤ x ≤ L) and p = ħk, we obtain
E = p²/2m = (ħ²/2m) k²,
where
k = (2l + 1)π/2L (l = 0, 1, 2, …) or k = 2nπ/2L (n = 1, 2, 3, …).
An observable is represented by an operator acting on a state; e.g., for the momentum operator P we have
PΨ = pΨ.  (1.103)
To see how operators and states are represented by matrices and vectors, let a (2, 2) matrix A and a (2, 1) vector |ψ⟩ be expressed as
A = (a b; c d) (rows separated by semicolons),  (1.104)
|ψ⟩ = (e; f).  (1.105)
Note that operating a (2, 2) matrix on a (2, 1) matrix produces another (2, 1) matrix. Furthermore, we define an adjoint matrix A† (the complex conjugate transpose of A) such that
A† = (a* c*; b* d*).  (1.106)
In this case, |ψ⟩† also denotes the complex conjugate transpose of |ψ⟩, written ⟨ψ|. The notations |ψ⟩ and ⟨ψ| are due to Dirac. He named ⟨ψ| and |φ⟩ a bra vector and a ket vector, respectively. This naming (a play on words) comes from the fact that ⟨ψ| and |φ⟩ combine to ⟨ψ|φ⟩, forming a “bracket.” This is a (1, 2) × (2, 1) = (1, 1) matrix, i.e., a c-number (in general a complex number), and ⟨ψ|φ⟩ represents an inner product. These notations are widely used nowadays in the fields of mathematics and physics.
Taking another vector |ξ⟩ = (g; h) and using the matrix calculation rule, we have
A†|ψ⟩ = |A†ψ⟩ = (a* c*; b* d*)(e; f) = (a*e + c*f; b*e + d*f).  (1.108)
Thus, we get
⟨A†ψ|ξ⟩ = (ae* + cf*  be* + df*)(g; h) = (ag + bh)e* + (cg + dh)f*.  (1.110)
Similarly, we have
⟨ψ|Aξ⟩ = (e* f*)(a b; c d)(g; h) = (ag + bh)e* + (cg + dh)f*.  (1.111)
Comparing (1.110) and (1.111), we get
⟨A†ψ|ξ⟩ = ⟨ψ|Aξ⟩.  (1.112)
Also, we have
(A†)† = A,  (1.115)
where the second equality comes from (1.113) obtained by exchanging ψ and ξ there. Moreover, we have the following relation:
(AB)† = B†A†.  (1.117)
Making an inner product by multiplying |ξ⟩ from the right on the leftmost and rightmost sides of (1.118) and using (1.116), we get
⟨Aψ|ξ⟩ = ⟨ψA†|ξ⟩ = ⟨ψ|A†ξ⟩.
This relation may be regarded as the associative law with regard to the symbol “|” of the inner product. It is equivalent to the associative law of matrix multiplication.
The results obtained above can readily be extended to a general case where (n, n)
matrices are dealt with.
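The identity ⟨A†ψ|ξ⟩ = ⟨ψ|Aξ⟩ and the product rule (AB)† = B†A† can be spot-checked with explicit complex matrices; this is a sketch using randomly chosen entries, not from the text:

```python
# Spot-check <A†ψ|ξ> = <ψ|Aξ> and (AB)† = B†A† for 2x2 complex matrices.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
B = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
psi = rng.normal(size=2) + 1j * rng.normal(size=2)
xi = rng.normal(size=2) + 1j * rng.normal(size=2)

Adag = A.conj().T                      # adjoint: complex conjugate transpose

lhs = np.vdot(Adag @ psi, xi)          # <A†ψ|ξ>  (vdot conjugates 1st arg)
rhs = np.vdot(psi, A @ xi)             # <ψ|Aξ>

prod_rule = np.allclose((A @ B).conj().T, B.conj().T @ A.conj().T)
```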
Now, let us introduce a Hermitian operator (or matrix) H, for which we have
H† = H.  (1.119)
A norm is a natural extension of the notion of the “length” of a vector. The norm ‖ψ‖ is zero if and only if |ψ⟩ = 0 (the zero vector). From (1.105) and (1.107), we have
⟨ψ|ψ⟩ = |e|² + |f|².
Therefore, ⟨ψ|ψ⟩ = 0 ⟺ e = f = 0, i.e., |ψ⟩ = 0.
Let us further consider an eigenvalue problem represented by our newly introduced notation. The eigenvalue equation is symbolically written as
H|ψ⟩ = λ|ψ⟩,  (1.122)
where we assume that |ψ⟩ is normalized, namely ⟨ψ|ψ⟩ = 1 or ‖ψ‖ = 1. Notice that the symbol “|” in an inner product is of secondary importance; we may drop it, just as a product notation “×” is omitted by denoting ab instead of a × b.
Multiplying ⟨ψ| from the left of (1.122), we have
⟨ψ|Hψ⟩ = λ.  (1.123)
Taking a complex conjugate of (1.123), we have
⟨ψ|Hψ⟩* = λ*.  (1.124)
Therefore, we get
λ* = ⟨Hψ|ψ⟩ = ⟨ψ|H†ψ⟩ = ⟨ψ|Hψ⟩ = λ,  (1.125)
where with the third equality we used the definition (1.119). The relation λ* = λ obviously shows that any eigenvalue λ is real if H is Hermitian. The relation (1.125) immediately tells us that even though |ψ⟩ is not an eigenfunction, ⟨ψ|Hψ⟩ is real as well, if H is Hermitian. The quantity ⟨ψ|Hψ⟩ is said to be an expectation value. This value is interpreted as the most probable or averaged value of H obtained as a result of the operation of H on a physical state |ψ⟩. We sometimes denote the expectation value as
⟨H⟩ ≡ ⟨ψ|Hψ⟩,  (1.126)
where |ψ⟩ is normalized. Unless |ψ⟩ is normalized, it can be normalized on the basis of (1.121) by choosing |Φ⟩ such that
|Φ⟩ = |ψ⟩/√⟨ψ|ψ⟩.
Using the above definition, let us calculate ⟨g|Df⟩, where D was defined in (1.63). Then, using integration by parts twice, we have
⟨g|Df⟩ = ∫_a^b g*(x)[−d²f(x)/dx²] dx = −[g*f′]_a^b + ∫_a^b g*′f′ dx
  = −[g*f′]_a^b + [g*′f]_a^b − ∫_a^b g*″f dx
  = [g*′f − g*f′]_a^b + ⟨Dg|f⟩.  (1.130)
If the functions f and g satisfy BCs such that
[g*′f − g*f′]_a^b = 0,  (1.131)
we get
⟨g|Df⟩ = ⟨Dg|f⟩.  (1.132)
In light of (1.120), (1.132) implies that D is Hermitian. In (1.131), notice that the functions f and g satisfy the same BCs. Normally, an operator must have this property to be Hermitian. Thus, the Hermiticity of a differential operator is closely related to the BCs of the differential equation.
Next, we consider the following inner product:
⟨f|Df⟩ = −∫_a^b f* f″ dx = −[f*f′]_a^b + ∫_a^b f*′f′ dx
  = −[f*f′]_a^b + ∫_a^b |f′|² dx.  (1.133)
Note that the definite integral of (1.133) cannot be negative. There are two possibilities for D to be Hermitian according to different BCs.
(i) Dirichlet conditions: f(b) = f(a) = 0. If we had f′ ≡ 0, ⟨f|Df⟩ would be zero. But in that case f would be constant, and so f(x) ≡ 0 according to the BCs. We must exclude this trivial case. Consequently, to avoid this situation we must have
∫_a^b |f′|² dx > 0 or ⟨f|Df⟩ > 0.  (1.134)
In this case, the operator D is said to be positive definite. Suppose that such a positive-definite operator has an eigenvalue λ. Then, for a corresponding eigenfunction y(x) we have
⟨y|Dy⟩ = ⟨y|λy⟩ = λ‖y‖².  (1.136)
Both ⟨y|Dy⟩ and ‖y‖² are positive and, hence, we have λ > 0. Thus, if D has an eigenvalue, it must be positive. In this case, λ is said to be positive definite as well; see Example 1.1.
(ii) Neumann conditions: f′(b) = f′(a) = 0. From (1.130), D is Hermitian as well. Unlike condition (i), however, f may be a nonzero constant in this case. Therefore, we are allowed to have
∫_a^b |f′|² dx = 0 or ⟨f|Df⟩ = 0.  (1.137)
In general, then,
⟨f|Df⟩ ≥ 0.  (1.138)
The position operator q and the momentum operator p satisfy the commutation relation
[q, p] = iħ,  (1.140)
where the presence of a unit matrix E is implied. Explicitly writing it, we have
qp − pq = iħE.  (1.141)
The relations (1.140) and (1.141) are called the canonical commutation relation. On the basis of the relation p = (ħ/i)∂/∂q, a brief proof for this is as follows:
[q, p]|ψ⟩ = (qp − pq)|ψ⟩ = q(ħ/i)(∂/∂q)|ψ⟩ − (ħ/i)(∂/∂q)(q|ψ⟩)
  = q(ħ/i) ∂|ψ⟩/∂q − (ħ/i)(∂q/∂q)|ψ⟩ − q(ħ/i) ∂|ψ⟩/∂q = −(ħ/i)|ψ⟩ = iħ|ψ⟩.  (1.142)
An anti-Hermitian operator G satisfies
G† = −G.  (1.145)
Consider its eigenvalue equation
G|ψ⟩ = λ|ψ⟩,  (1.146)
where G is an anti-Hermitian operator and |ψ⟩ has been normalized. As in the case of (1.123), we have ⟨ψ|Gψ⟩ = λ and, taking its complex conjugate,
⟨ψ|Gψ⟩* = λ*.  (1.148)
Therefore,
λ* = ⟨Gψ|ψ⟩ = ⟨ψ|G†ψ⟩ = −⟨ψ|Gψ⟩ = −λ,  (1.149)
so that the eigenvalue λ of an anti-Hermitian operator is pure imaginary (or zero).
Let us examine whether the momentum operator p = (ħ/i)∂/∂x is Hermitian with respect to the inner product
⟨g|pf⟩ = ∫_a^b g*(x) (ħ/i) ∂f(x)/∂x dx,  (1.150)
where the domain [a, b] depends on a physical system; this can be either bounded or unbounded. Performing integration by parts, we have
⟨g|pf⟩ = (ħ/i)[g*(x)f(x)]_a^b − ∫_a^b [dg*(x)/dx](ħ/i) f(x) dx
  = (ħ/i)[g*(b)f(b) − g*(a)f(a)] + ∫_a^b [(ħ/i) ∂g(x)/∂x]* f(x) dx.  (1.151)
If we require f(b) = f(a) and g(b) = g(a), the first term vanishes and we get
⟨g|pf⟩ = ∫_a^b [(ħ/i) ∂g(x)/∂x]* f(x) dx = ⟨pg|f⟩.  (1.152)
Thus, as in the case of (1.120), the momentum operator p is Hermitian. Note that the position operator q of (1.142) is Hermitian as an a priori assumption.
Meanwhile, the z-component of the angular momentum operator, Lz, is described in polar coordinates as follows:
Lz = (ħ/i) ∂/∂ϕ,  (1.153)
where ϕ is an azimuthal angle varying from 0 to 2π. The notation and implications of Lz will be mentioned in Chap. 3. Similarly as above, we have
⟨g|Lz f⟩ = (ħ/i)[g*(2π)f(2π) − g*(0)f(0)] + ∫₀^{2π} [(ħ/i) ∂g(ϕ)/∂ϕ]* f(ϕ) dϕ.  (1.154)
Requiring f(2π) = f(0) and g(2π) = g(0), the first term vanishes. Note that we must have this BC, because ϕ = 0 and ϕ = 2π are spatially the same point. Thus, we find that Lz is Hermitian as well on this condition.
On the basis of the aforementioned arguments, let us proceed to quantum-mechanical studies of a harmonic oscillator. Regarding the angular momentum, we will study its basic properties in Chap. 3.
Chapter 2
Quantum-Mechanical Harmonic Oscillator
A classical one-dimensional harmonic oscillator obeys the equation of motion
m d²x(t)/dt² = −s x(t),  (2.1)
where m is the mass of an oscillating particle and s is a spring constant. Defining the angular frequency ω ≡ √(s/m), we rewrite (2.1) as
d²x(t)/dt² + ω²x(t) = 0.  (2.2)
A general solution of (2.2) is
x(t) = a e^{iωt} + b e^{−iωt},  (2.4)
where a and b are suitable constants. Let us consider BCs different from those of Examples 1.1 and 1.2 this time. That is, we set BCs such that
x(0) = 0 and dx(0)/dt = v₀ (v₀ > 0).  (2.5)
Notice that (2.5) gives initial conditions (ICs). Mathematically, ICs are included in BCs (see Chap. 10). From (2.4) we have
x(t) = (v₀/2iω)(e^{iωt} − e^{−iωt}) = (v₀/ω) sin ωt.  (2.7)
The total energy of the oscillator is then
E = K + V = ½mv₀².  (2.8)
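For the classical solution (2.7), both the ICs (2.5) and energy conservation (2.8) can be confirmed numerically; this is a sketch with illustrative values m = s = 1 and v₀ = 2, not taken from the text:

```python
# Check x(t) = (v0/omega) sin(omega t): ICs and energy conservation.
import math

m, s, v0 = 1.0, 1.0, 2.0      # illustrative values
omega = math.sqrt(s / m)

def x(t):
    return (v0 / omega) * math.sin(omega * t)

def v(t):
    return v0 * math.cos(omega * t)

def energy(t):
    return 0.5 * m * v(t)**2 + 0.5 * s * x(t)**2   # K + V

# initial conditions: x(0) = 0, dx/dt(0) = v0
ic_ok = (x(0.0) == 0.0) and (abs(v(0.0) - v0) < 1e-15)

# E = K + V = (1/2) m v0^2 at all sampled times
e_dev = max(abs(energy(t) - 0.5 * m * v0**2)
            for t in [0.1 * k for k in range(100)])
```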
In the quantum-mechanical treatment, the potential is
V(q) = ½sq² = ½mω²q²,  (2.9)
and the Hamiltonian is
H = p²/2m + V(q) = p²/2m + ½mω²q².  (2.10)
The Schrödinger equation reads as
Hψ(q) = Eψ(q)
or
−(ħ²/2m) ∇²ψ(q) + ½mω²q²ψ(q) = Eψ(q).  (2.11)
This is a SOLDE and it is well known that the SOLDE can be solved by a power
series expansion method.
In the present studies, however, let us first use an operator method to solve the
eigenvalue equation (2.11) of a one-dimensional oscillator. To this end, we use a
quantum-mechanical Hamiltonian where a momentum operator p is explicitly
represented. Thus, the Hamiltonian reads as
H = p²/2m + ½mω²q².  (2.12)
Equation (2.12) is formally the same as (2.10). Note, however, that in (2.12) p and q are expressed as quantum-mechanical operators.
As in (1.126), we first examine the expectation value ⟨H⟩ of H. It is given by
⟨H⟩ = ⟨ψ|Hψ⟩ = ⟨ψ|(p²/2m)ψ⟩ + ⟨ψ|½mω²q²ψ⟩
  = (1/2m)⟨p†ψ|pψ⟩ + ½mω²⟨q†ψ|qψ⟩ = (1/2m)⟨pψ|pψ⟩ + ½mω²⟨qψ|qψ⟩
  = (1/2m)‖pψ‖² + ½mω²‖qψ‖² ≥ 0,  (2.13)
where again we assumed that |ψ⟩ has been normalized. In (2.13) we used the notation (1.126) and the fact that both q and p are Hermitian. In this situation, ⟨H⟩ takes a non-negative value.
In (2.13), the equality holds if and only if |pψ⟩ = 0 and |qψ⟩ = 0. Let us specify a vector |ψ₀⟩ that satisfies these conditions such that
|pψ₀⟩ = 0 and |qψ₀⟩ = 0.  (2.14)
Multiplying q from the left on the first equation of (2.14) and multiplying p from the left on the second equation, we have
qp|ψ₀⟩ = 0 and pq|ψ₀⟩ = 0.  (2.15)
Subtracting the second equation of (2.15) from the first, we get
(qp − pq)|ψ₀⟩ = iħ|ψ₀⟩ = 0,  (2.16)
where with the first equality we used (1.140). Therefore, we would have |ψ₀(q)⟩ ≡ 0, which leads back to the relations (2.14). That is, ⟨H⟩ = 0 if and only if |ψ₀(q)⟩ ≡ 0. But since |ψ₀(q)⟩ ≡ 0 has no physical meaning, it must be rejected as unsuitable for a solution of (2.11). For a physically acceptable solution of (2.13), ⟨H⟩ must accordingly take a positive definite value. Thus, on the basis of the canonical commutation relation, we have restricted the range of the expectation values.
Instead of directly dealing with (2.12), it is well known to introduce the following operators [1]:
a ≡ √(mω/2ħ) q + [i/√(2mħω)] p  (2.17)
and its adjoint
a† = √(mω/2ħ) q − [i/√(2mħω)] p.  (2.18)
Notice here again that both q and p are Hermitian. Using a matrix representation for (2.17) and (2.18), we have
(a; a†) = (√(mω/2ħ)  i/√(2mħω); √(mω/2ħ)  −i/√(2mħω)) (q; p).  (2.19)
Then we have
a†a = (mω/2ħ)q² + p²/2mħω + (i/2ħ)(qp − pq) = H/ħω − 1/2,  (2.20)
where the second equality comes from (1.140). Rewriting (2.20), we get
H = ħωa†a + ½ħω.  (2.21)
Similarly, we get
H = ħωaa† − ½ħω.  (2.22)
Subtracting (2.21) from (2.22), we find
[a, a†] = 1 or [a, a†] = E,  (2.24)
where E denotes a unit operator.
Similarly to (2.13), for a normalized |ψ⟩ we get
⟨ψ|H|ψ⟩ = ħω⟨aψ|aψ⟩ + ½ħω ≥ ½ħω.  (2.27)
Thus, the expectation value is equal to or larger than ħω/2. This is consistent with the fact that an energy eigenvalue is positive definite, as mentioned above. Equation (2.27) also tells us that if we have
|aψ₀⟩ = 0,  (2.28)
we get
⟨ψ₀|H|ψ₀⟩ = ½ħω.  (2.29)
Equation (2.29) means that the smallest expectation value is ħω/2 on the condition of (2.28). On the same condition, using (2.21) we have
H|ψ₀⟩ = ħωa†a|ψ₀⟩ + ½ħω|ψ₀⟩ = ½ħω|ψ₀⟩.  (2.30)
That is,
H|ψ₀⟩ = ½ħω|ψ₀⟩ = E₀|ψ₀⟩.  (2.31)
Multiplying a† on both sides of (2.31) from the left, we have
a†H|ψ₀⟩ = a†E₀|ψ₀⟩.  (2.32)
Since the commutation relation gives a†H = (H − ħω)a†, this can be rewritten as
H(a†|ψ₀⟩) = (E₀ + ħω)(a†|ψ₀⟩).  (2.34)
This implies that a†|ψ₀⟩ belongs to an eigenvalue (E₀ + ħω), which is larger than E₀ as expected. Again multiplying a† on both sides of (2.34) from the left and using (2.25), we get
Fig. 2.1 Energy eigenvalues of a one-dimensional quantum-mechanical harmonic oscillator, given in a unit of ħω on a real axis (1/2, 3/2, 5/2, 7/2, 9/2, ⋯)
H(a†)²|ψ₀⟩ = (E₀ + 2ħω)(a†)²|ψ₀⟩.  (2.35)
This implies that (a†)²|ψ₀⟩ belongs to an eigenvalue (E₀ + 2ħω). Thus, repeatedly taking the above procedures, we get
H(a†)ⁿ|ψ₀⟩ = (E₀ + nħω)(a†)ⁿ|ψ₀⟩,  (2.36)
so that
Eₙ = E₀ + nħω = (n + ½)ħω (n = 0, 1, 2, …),  (2.37)
where Eₙ denotes an energy eigenvalue of the n-th excited state. The energy eigenvalues are plotted in Fig. 2.1.
Our next task is to seek normalized eigenvectors of the n-th excited state. Let cₙ be a normalization constant of that state. That is, we have
|ψₙ⟩ = cₙ(a†)ⁿ|ψ₀⟩.  (2.38)
In the above procedure, we used [a, a†] = 1. What is implied in (2.39) is that the coefficient of (a†)ⁿ⁻¹ increases one by one as a is transferred toward the right one by one in the second term of the RHS. Notice that in the second term a is sandwiched such that (a†)ᵐ a (a†)ⁿ⁻ᵐ (m = 1, 2, …). Finally, we have
a(a†)ⁿ = n(a†)ⁿ⁻¹ + (a†)ⁿa.  (2.40)
Thus, we get
a|ψₙ⟩ = cₙ a(a†)ⁿ|ψ₀⟩ = cₙ[n(a†)ⁿ⁻¹ + (a†)ⁿa]|ψ₀⟩ = cₙ n (a†)ⁿ⁻¹|ψ₀⟩
  = n (cₙ/cₙ₋₁) cₙ₋₁(a†)ⁿ⁻¹|ψ₀⟩ = n (cₙ/cₙ₋₁)|ψₙ₋₁⟩,  (2.41)
where m < n and f(a, a†) is a polynomial in a† that has a power of a as a coefficient. Further operating ⟨ψ₀| and |ψ₀⟩ from the left and right on both sides of (2.44), respectively, we have
⟨ψ₀|aᵐ(a†)ⁿ|ψ₀⟩ = n(n − 1)(n − 2)⋯(n − m + 1)⟨ψ₀|(a†)ⁿ⁻ᵐ|ψ₀⟩ + ⟨ψ₀|f(a, a†)aψ₀⟩
  = n(n − 1)(n − 2)⋯(n − m + 1)⟨ψ₀|(a†)ⁿ⁻ᵐ|ψ₀⟩,  (2.45)
where the second term vanishes because of (2.28). Taking the adjoint of (2.28), we also have
⟨ψ₀|a† = 0.  (2.46)
Equation (2.48) can be obtained by repeatedly using (1.117). From (2.47) and (2.48), we get
⟨ψ₀|aᵐ(a†)ⁿ|ψ₀⟩ = 0 when m ≠ n.  (2.49)
If we define
|ψₙ⟩ = (1/√n!)(a†)ⁿ|ψ₀⟩,  (2.52)
then we have
⟨ψₙ| = (1/√n!)⟨ψ₀|aⁿ,
and the orthonormal relation
⟨ψₘ|ψₙ⟩ = δₘₙ  (2.53)
holds. Thus, the normalization constant in (2.38) is
cₙ = 1/√n!.  (2.54)
Notice here that an undetermined phase factor e^(iθ) (θ: real) is intended such that
cₙ = (1/√n!) e^(iθ).
But, e^(iθ) is usually omitted for the sake of simplicity. Thus, we have constructed a
series of orthonormal eigenfunctions |ψₙ⟩.
Furthermore, using (2.41) and (2.54) we get
a|ψₙ⟩ = √n |ψₙ₋₁⟩. (2.55)
H|ψₙ⟩ = (E₀ + nħω)|ψₙ⟩. (2.56)
a†a|ψₙ⟩ = n|ψₙ⟩. (2.58)
Thus,
a†|ψₙ₋₁⟩ = √n |ψₙ⟩. (2.61)
Equations (2.55) and (2.62) clearly represent the relationship between an operator
and an eigenfunction (or eigenvector). The relationship is characterized by
Thus, we are now in a position to construct this relation using matrices. From
(2.53), we should be able to construct basis vectors using a column vector such that
|ψ₀⟩ = (1, 0, 0, 0, 0, ⋯)ᵀ, |ψ₁⟩ = (0, 1, 0, 0, 0, ⋯)ᵀ, |ψ₂⟩ = (0, 0, 1, 0, 0, ⋯)ᵀ, ⋯, (2.64)
where ᵀ denotes the transposition; i.e., each |ψₙ⟩ is a column vector.
Notice that these vectors form a vector space of an infinite dimension. The
orthonormal relation (2.53) can easily be checked. We represent a and a† so that
(2.55) and (2.62) can be satisfied. We obtain
a =
⎡ 0  1  0   0   0  ⋯ ⎤
⎢ 0  0  √2  0   0  ⋯ ⎥
⎢ 0  0  0   √3  0  ⋯ ⎥
⎢ 0  0  0   0   2  ⋯ ⎥
⎢ 0  0  0   0   0  ⋯ ⎥
⎣ ⋮  ⋮  ⋮   ⋮   ⋮  ⋱ ⎦ . (2.65)
Similarly,
a† =
⎡ 0  0   0   0  0  ⋯ ⎤
⎢ 1  0   0   0  0  ⋯ ⎥
⎢ 0  √2  0   0  0  ⋯ ⎥
⎢ 0  0   √3  0  0  ⋯ ⎥
⎢ 0  0   0   2  0  ⋯ ⎥
⎣ ⋮  ⋮   ⋮   ⋮  ⋮  ⋱ ⎦ . (2.66)
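As a quick numerical cross-check of the matrix representations (2.65) and (2.66) — this sketch is not part of the text — one can truncate the infinite matrices to an N × N block and confirm the commutation relation [a, a†] = 1 inside the block. The truncation dimension N = 8 is an arbitrary choice.

```python
import math

N = 8  # truncation dimension; the true matrices (2.65) and (2.66) are infinite

# a: sqrt(n) on the superdiagonal (row n-1, column n), cf. (2.65)
a = [[math.sqrt(c) if c == r + 1 else 0.0 for c in range(N)] for r in range(N)]
# a dagger: the transpose of a, i.e., sqrt(n) on the subdiagonal, cf. (2.66)
ad = [[a[c][r] for c in range(N)] for r in range(N)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

aad, ada = matmul(a, ad), matmul(ad, a)
comm = [[aad[i][j] - ada[i][j] for j in range(N)] for i in range(N)]

# [a, a†] equals the identity matrix except in the last diagonal element.
assert abs(comm[0][0] - 1.0) < 1e-12
assert abs(comm[3][3] - 1.0) < 1e-12
assert all(abs(comm[i][j]) < 1e-12 for i in range(N) for j in range(N) if i != j)
```

The last diagonal element of the commutator comes out as −(N − 1) instead of 1; that is purely a truncation artifact, not a property of the infinite matrices.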
As for the inverse matrix, we will deal with it in Part III. Note that q and p are both
Hermitian. Inserting (2.65) and (2.66) into (2.68), we get
q = √(ħ/2mω) ×
⎡ 0   1   0   0   0  ⋯ ⎤
⎢ 1   0   √2  0   0  ⋯ ⎥
⎢ 0   √2  0   √3  0  ⋯ ⎥
⎢ 0   0   √3  0   2  ⋯ ⎥
⎢ 0   0   0   2   0  ⋯ ⎥
⎣ ⋮   ⋮   ⋮   ⋮   ⋮  ⋱ ⎦ , (2.69)
p = i√(mħω/2) ×
⎡ 0   −1   0    0    0   ⋯ ⎤
⎢ 1   0    −√2  0    0   ⋯ ⎥
⎢ 0   √2   0    −√3  0   ⋯ ⎥
⎢ 0   0    √3   0    −2  ⋯ ⎥
⎢ 0   0    0    2    0   ⋯ ⎥
⎣ ⋮   ⋮    ⋮    ⋮    ⋮   ⋱ ⎦ . (2.70)
Equations (2.69) and (2.70) obviously show that q and p are Hermitian. We can
derive various physical quantities from these equations. For instance, following
matrix algebra, the Hamiltonian H can readily be calculated. The result is expressed as
H = p²/2m + (1/2) mω² q² = (ħω/2) ×
⎡ 1  0  0  0  0  ⋯ ⎤
⎢ 0  3  0  0  0  ⋯ ⎥
⎢ 0  0  5  0  0  ⋯ ⎥
⎢ 0  0  0  7  0  ⋯ ⎥
⎢ 0  0  0  0  9  ⋯ ⎥
⎣ ⋮  ⋮  ⋮  ⋮  ⋮  ⋱ ⎦ , (2.71)
where with the second last equality we used (2.24) and the identity matrix E of an
infinite dimension is given by
E =
⎡ 1  0  0  0  0  ⋯ ⎤
⎢ 0  1  0  0  0  ⋯ ⎥
⎢ 0  0  1  0  0  ⋯ ⎥
⎢ 0  0  0  1  0  ⋯ ⎥
⎢ 0  0  0  0  1  ⋯ ⎥
⎣ ⋮  ⋮  ⋮  ⋮  ⋮  ⋱ ⎦ . (2.74)
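The diagonal form (2.71) can likewise be verified numerically. The sketch below (an illustration, not from the text; ħ = m = ω = 1 is assumed) builds q and p from the truncated a and a† and checks that H is diagonal with entries n + 1/2.

```python
import math

N = 8
a = [[math.sqrt(c) if c == r + 1 else 0.0 for c in range(N)] for r in range(N)]
ad = [[a[c][r] for c in range(N)] for r in range(N)]

def mm(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

# With hbar = m = omega = 1: q = (a + a†)/sqrt(2), p = i(a† - a)/sqrt(2),
# hence H = p²/2 + q²/2 = (1/4)[(a + a†)² - (a† - a)²]
S = [[a[i][j] + ad[i][j] for j in range(N)] for i in range(N)]
D = [[ad[i][j] - a[i][j] for j in range(N)] for i in range(N)]
S2, D2 = mm(S, S), mm(D, D)
H = [[0.25 * (S2[i][j] - D2[i][j]) for j in range(N)] for i in range(N)]

# Diagonal entries are n + 1/2, i.e., (ħω/2) times 1, 3, 5, ...;
# the last entry deviates because of the truncation, so it is skipped.
for n in range(N - 1):
    assert abs(H[n][n] - (n + 0.5)) < 1e-12
assert abs(H[0][2]) < 1e-12  # H is strictly diagonal
```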
The Schrödinger equation has been given in (1.55) or (2.11) as a SOLDE form. In
contrast to the matrix representation, (1.55) and (2.11) are said to be the coordinate
representation of the Schrödinger equation. Now, let us derive a coordinate representation of (2.28) to obtain an analytical solution.
On the basis of (1.31) and (2.17), a is expressed as
a = √(mω/2ħ) q + (i/√(2mħω)) p = √(mω/2ħ) q + (i/√(2mħω)) (ħ/i)(∂/∂q)
  = √(mω/2ħ) q + √(ħ/2mω) (∂/∂q). (2.75)
Or
√(mω/2ħ) q ψ₀(q) + √(ħ/2mω) ∂ψ₀(q)/∂q = 0. (2.77)
That is,
∂ψ₀(q)/∂q + (mω/ħ) q ψ₀(q) = 0. (2.78)
ψ₀(q) = N₀ e^(−αq²), (2.79)
Hence, we get
α = mω/2ħ. (2.81)
Thus,
ψ₀(q) = N₀ e^(−mωq²/2ħ). (2.82)
we have
∫₋∞^∞ e^(−mωq²/ħ) dq = √(πħ/mω). (2.85)
To get (2.84), putting I ≡ ∫₋∞^∞ e^(−cq²) dq, we have
I² = ∫₋∞^∞ e^(−cq²) dq ∫₋∞^∞ e^(−cs²) ds = ∫₋∞^∞ ∫₋∞^∞ e^(−c(q²+s²)) dq ds
  = ∫₀^2π dθ ∫₀^∞ e^(−cr²) r dr = (1/2) ∫₀^2π dθ ∫₀^∞ e^(−cR) dR = π/c, (2.86)
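The Gaussian integral (2.86) is easy to confirm numerically; the following sketch (not from the text) approximates I by a midpoint sum over a large finite interval, where the neglected tails are vanishingly small, and compares I with √(π/c).

```python
import math

def gauss_integral(c, L=10.0, n=200001):
    """Midpoint-rule approximation of the integral of exp(-c*q**2)
    over (-L, L); the tails beyond +/-L are negligible for c >= 0.5."""
    h = 2.0 * L / n
    return h * sum(math.exp(-c * (-L + (k + 0.5) * h) ** 2) for k in range(n))

# Check I = sqrt(pi/c), i.e., I**2 = pi/c as in (2.86)
for c in (0.5, 1.0, 2.0):
    assert abs(gauss_integral(c) - math.sqrt(math.pi / c)) < 1e-6
```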
Also, we have
a† = √(mω/2ħ) q − (i/√(2mħω)) p = √(mω/2ħ) q − (i/√(2mħω)) (ħ/i)(∂/∂q)
  = √(mω/2ħ) q − √(ħ/2mω) (∂/∂q). (2.88)
Putting
β ≡ √(mω/ħ) and ξ = βq, (2.90)
we rewrite (2.89) as
ψₙ(q) = ψₙ(ξ/β) = (1/√n!) (mω/2ħβ²)^(n/2) (ξ − ∂/∂ξ)ⁿ ψ₀(q)
  = (1/√(2ⁿ n!)) (ξ − ∂/∂ξ)ⁿ ψ₀(q) = (1/√(2ⁿ n!)) (mω/πħ)^(1/4) (ξ − ∂/∂ξ)ⁿ e^(−ξ²/2), (2.91)
where we used
α = β²/2. (2.92)
Moreover, putting
Nₙ ≡ (1/√(2ⁿ n!)) (mω/πħ)^(1/4) = √(β/(π^(1/2) 2ⁿ n!)), (2.93)
we get
ψₙ(ξ/β) = Nₙ (ξ − ∂/∂ξ)ⁿ e^(−ξ²/2). (2.94)
We have to normalize (2.94) with respect to a variable ξ. Since ψ n(q) has already
been normalized as in (2.53), we have
∫₋∞^∞ |ψₙ(q)|² dq = 1. (2.95)
Comparing (2.96) and (2.97), if we define ψ̃ₙ(ξ) as
ψ̃ₙ(ξ) ≡ √(1/β) ψₙ(ξ/β), (2.98)
then (2.94) reads
ψ̃ₙ(ξ) = Ñₙ (ξ − ∂/∂ξ)ⁿ e^(−ξ²/2) with Ñₙ ≡ √(1/(π^(1/2) 2ⁿ n!)). (2.99)
where Hₙ(ξ) is an n-th order polynomial. We wish to show the following relation on
the basis of mathematical induction:
ψ̃ₙ(ξ) = Ñₙ Hₙ(ξ) e^(−ξ²/2). (2.101)
Comparing (2.87), (2.98), and (2.99), we make sure that (2.101) holds with n = 0.
When n = 1, from (2.99) we have
ψ̃₁(ξ) = Ñ₁ (ξ − ∂/∂ξ) e^(−ξ²/2) = Ñ₁ [ξ e^(−ξ²/2) − (−ξ) e^(−ξ²/2)] = Ñ₁ 2ξ e^(−ξ²/2)
  = Ñ₁ (−1) e^(ξ²/2) (d/dξ) e^(−ξ²) = Ñ₁ H₁(ξ) e^(−ξ²/2). (2.102)
ψ̃ₙ₊₁(ξ) = Ñₙ₊₁ (ξ − ∂/∂ξ)ⁿ⁺¹ e^(−ξ²/2) = [1/√(2(n + 1))] Ñₙ (ξ − ∂/∂ξ) [(ξ − ∂/∂ξ)ⁿ e^(−ξ²/2)]
  = [1/√(2(n + 1))] (ξ − ∂/∂ξ) [Ñₙ Hₙ(ξ) e^(−ξ²/2)]
  = [1/√(2(n + 1))] Ñₙ (−1)ⁿ (ξ − ∂/∂ξ) [e^(ξ²/2) (dⁿ/dξⁿ) e^(−ξ²)]
  = [1/√(2(n + 1))] Ñₙ (−1)ⁿ [ξ e^(ξ²/2) (dⁿ/dξⁿ) e^(−ξ²) − ξ e^(ξ²/2) (dⁿ/dξⁿ) e^(−ξ²) − e^(ξ²/2) (dⁿ⁺¹/dξⁿ⁺¹) e^(−ξ²)]
  = [1/√(2(n + 1))] Ñₙ (−1)ⁿ⁺¹ e^(ξ²/2) (dⁿ⁺¹/dξⁿ⁺¹) e^(−ξ²)
  = Ñₙ₊₁ (−1)ⁿ⁺¹ e^(ξ²/2) (dⁿ⁺¹/dξⁿ⁺¹) e^(−ξ²) = Ñₙ₊₁ Hₙ₊₁(ξ) e^(−ξ²/2). (2.103)
This means that (2.101) holds with n + 1 as well. Thus, it follows that (2.101) is true
for any n that is zero or a positive integer.
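The polynomials Hₙ(ξ) generated by (2.101) can be cross-checked numerically. The sketch below is an illustration, not from the text: it assumes the standard recursion H₍ₖ₊₁₎(ξ) = 2ξHₖ(ξ) − 2kH₍ₖ₋₁₎(ξ), a well-known equivalent of the Rodrigues-type construction above, and verifies the weighted orthogonality ∫Hₘ(ξ)Hₙ(ξ)e^(−ξ²)dξ = 2ⁿn!√π δₘₙ by quadrature.

```python
import math

def hermite(n):
    """Coefficient list (lowest order first) of H_n via the recursion
    H_{k+1}(x) = 2x*H_k(x) - 2k*H_{k-1}(x)."""
    H = [[1.0], [0.0, 2.0]]
    while len(H) <= n:
        k = len(H) - 1
        nxt = [0.0] + [2.0 * c for c in H[-1]]   # 2x * H_k
        for i, c in enumerate(H[-2]):            # - 2k * H_{k-1}
            nxt[i] -= 2.0 * k * c
        H.append(nxt)
    return H[n]

def evalp(p, x):
    return sum(c * x ** i for i, c in enumerate(p))

def weighted_overlap(m, n, L=8.0, steps=4000):
    # midpoint quadrature of H_m * H_n * exp(-x**2) over (-L, L)
    h = 2.0 * L / steps
    pm, pn = hermite(m), hermite(n)
    return h * sum(evalp(pm, x) * evalp(pn, x) * math.exp(-x * x)
                   for x in (-L + (k + 0.5) * h for k in range(steps)))

assert hermite(2) == [-2.0, 0.0, 4.0]           # H2 = 4x^2 - 2
assert abs(weighted_overlap(2, 3)) < 1e-8       # orthogonal
assert abs(weighted_overlap(2, 2) - 8 * math.sqrt(math.pi)) < 1e-6
```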
The orthogonality relation reads as
∫₋∞^∞ ψ̃ₘ(ξ)* ψ̃ₙ(ξ) dξ = δₘₙ. (2.104)
We tabulate the first several Hermite polynomials Hₙ(ξ) in Table 2.1, where the index
n represents the highest order of the polynomials. In Table 2.1 we see that even
functions and odd functions appear alternately (i.e., parity). This is the case with
ψₙ(q) as well, because ψₙ(q) is a product of Hₙ(ξ) and an even function e^(−mωq²/2ħ).
Note that Hₘ(ξ) is a real function, and so Hₘ(ξ)* = Hₘ(ξ). The relation (2.107) is
well known as the orthogonality of Hermite polynomials with e^(−ξ²) taken as a weight
function [3]. Here the weight function is a real and non-negative function within the
domain considered [e.g., (−∞, +∞) in the present case] and independent of the indices
m and n. We will deal with it again in Sect. 8.4.
The relation (2.101) and the orthogonality relationship described as (2.107) can
more explicitly be understood as follows: From (2.11) we have the Schrödinger
equation of a one-dimensional quantum-mechanical harmonic oscillator such that
−(ħ²/2m) d²u(q)/dq² + (1/2) mω² q² u(q) = E u(q). (2.108)
Changing the variable to ξ = βq as before, (2.108) is rewritten as
−d²u(ξ)/dξ² + ξ² u(ξ) = (2E/ħω) u(ξ). (2.109)
Putting
λ ≡ 2E/ħω (2.110)
and defining a differential operator D such that
D ≡ −d²/dξ² + ξ², (2.111)
we seek a solution of the form
u(ξ) = v(ξ) e^(−ξ²/2). (2.113)
Since e^(−ξ²/2) does not vanish for any ξ, we have
−d²v(ξ)/dξ² + 2ξ dv(ξ)/dξ = (λ − 1) v(ξ). (2.115)
If we define another differential operator D̃ such that
D̃ ≡ −d²/dξ² + 2ξ d/dξ, (2.116)
we get
D̃ v(ξ) = (λ − 1) v(ξ). (2.117)
d²Hₙ(ξ)/dξ² − 2ξ dHₙ(ξ)/dξ + 2n Hₙ(ξ) = 0. (2.118)
This equation is said to be the Hermite differential equation. Using (2.116), (2.118) can
be recast as an eigenvalue equation such that
λ ¼ 2n þ 1, ð2:120Þ
we get
where c is an arbitrary constant. Thus, using (2.113), for a solution of (2.109) we get
uₙ(ξ) = c Hₙ(ξ) e^(−ξ²/2), (2.122)
where the solution u(ξ) is indexed with n. From (2.110), as an energy eigenvalue we
have
Eₙ = (n + 1/2) ħω,
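The Hermite differential equation (2.118) can be checked directly for a low order. Below, H₃(ξ) = 8ξ³ − 12ξ (the standard third-order Hermite polynomial) is verified to satisfy it at several integer sample points, where the arithmetic is exact; this is an illustrative sketch, not from the text.

```python
# Check (2.118) for n = 3:  H3'' - 2*x*H3' + 2*3*H3 = 0, with H3(x) = 8x^3 - 12x
def H3(x):   return 8 * x**3 - 12 * x
def dH3(x):  return 24 * x**2 - 12
def d2H3(x): return 48 * x

for x in (-2, -1, 0, 1, 3):
    assert d2H3(x) - 2 * x * dH3(x) + 6 * H3(x) == 0
```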
where we have
ΔA ≡ A − ⟨A⟩. (2.123)
where we used the fact that an expectation value of a Hermitian operator is real.
Then, ⟨(ΔA)²⟩ is non-negative as in the case of (2.13). Moreover, if |ψ⟩ is an
eigenstate of A, ⟨(ΔA)²⟩ = 0. Therefore, ⟨(ΔA)²⟩ represents a measure of how widely
measured values are dispersed when A is measured in reference to a quantum state
|ψ⟩. Also, we define a standard deviation δA as
δA ≡ √⟨(ΔA)²⟩. (2.126)
then we have
δA · δB ≥ |k|/2. (2.128)
In (2.129), we used the fact that ⟨ψ|A|ψ⟩ and ⟨ψ|B|ψ⟩ are just real numbers and
that those commute with any operator. Next, we calculate the following quantity in relation
to a real number λ:
where we used the fact that ΔA and ΔB are Hermitian. For the above quadratic form
to hold with any real number λ, we have
where we used the fact that ΔA is Hermitian. From the definition of the norm of (1.121),
for δA = 0 to hold, we have
where we assumed that |ψ⟩ is an arbitrarily chosen normalized vector. Suppose now
that |ψ₀⟩ is an eigenstate of A that belongs to an eigenvalue a. Then, we have
A|ψ₀⟩ = a|ψ₀⟩, (2.137)
where the last equality comes from the fact that A is Hermitian. From (2.138), we
would have
This would imply that (2.136) does not hold with |ψ₀⟩, in contradiction to (2.127),
where ik ≠ 0. Namely, we conclude that any physical state cannot be an eigenstate of
A on condition that (2.127) holds. Equation (2.127) is rewritten as
q² = (ħ/2mω) (a + a†)² = (ħ/2mω) [a² + E + 2a†a + (a†)²], (2.142)
where E denotes an identity operator and we used (2.24) along with the following
relation:
⟨ψₙ| q² |ψₙ⟩ = (ħ/2mω) [⟨ψₙ|ψₙ⟩ + 2⟨ψₙ| a†a ψₙ⟩] = (ħ/2mω)(2n + 1), (2.144)
⟨ψₙ| p |ψₙ⟩ = 0 and ⟨ψₙ| p² |ψₙ⟩ = (mħω/2)(2n + 1). (2.145)
Thus, we get
⟨(Δp)²⟩ = ⟨ψₙ| p² |ψₙ⟩ = (mħω/2)(2n + 1).
Accordingly, we have
δq · δp = √⟨(Δq)²⟩ √⟨(Δp)²⟩ = (2n + 1) ħ/2. (2.146)
Meanwhile, from (2.128) the canonical commutation relation [q, p] = iħ requires
δq · δp ≥ ħ/2.
This is indeed the case with (2.146) for the quantum-mechanical harmonic oscillator.
This example illustrates the uncertainty principle.
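Relation (2.146) can also be confirmed from the matrix representation; the sketch below (an illustration, with ħ = m = ω = 1 assumed) computes ⟨ψₙ|q²|ψₙ⟩ and ⟨ψₙ|p²|ψₙ⟩ from truncated matrices and checks δq·δp = (2n + 1)/2 ≥ 1/2.

```python
import math

N = 10
a = [[math.sqrt(c) if c == r + 1 else 0.0 for c in range(N)] for r in range(N)]
ad = [[a[c][r] for c in range(N)] for r in range(N)]

def mm(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

# hbar = m = omega = 1: q^2 = (a + a†)^2 / 2 and p^2 = -(a† - a)^2 / 2
S = [[a[i][j] + ad[i][j] for j in range(N)] for i in range(N)]
D = [[ad[i][j] - a[i][j] for j in range(N)] for i in range(N)]
q2 = [[0.5 * x for x in row] for row in mm(S, S)]
p2 = [[-0.5 * x for x in row] for row in mm(D, D)]

# <ψn|q|ψn> = <ψn|p|ψn> = 0, so δq·δp = sqrt(<q²><p²>) = (2n+1)/2 >= 1/2
for n in range(N - 1):               # the truncated last level is skipped
    prod = math.sqrt(q2[n][n]) * math.sqrt(p2[n][n])
    assert abs(prod - (2 * n + 1) / 2) < 1e-12
    assert prod >= 0.5 - 1e-12
```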
In relation to the aforementioned argument, we might well wonder whether Examples
1.1 and 1.2 admit an eigenstate of a fixed momentum. Suppose that we chose for an
eigenstate y(x) = ce^(ikx), where c is a constant. Then, we would have (ħ/i) ∂y(x)/∂x = ħk y(x)
and get an eigenvalue ħk for the momentum. Nonetheless, such y(x) does not satisfy
the proper BCs; i.e., y(−L) = y(L) = 0. This is because e^(ikx) never vanishes for any
real number k or x (or, more generally, any complex numbers k and x). Thus, we
cannot obtain a proper solution that has an eigenstate with a fixed momentum in a
confined physical system.
References
two particles are moving under control only by a force field between the two
particles without other external force fields [1].
In classical mechanics, the equation of motion is separated into two equations
related to the relative coordinates and the center-of-mass coordinates accordingly. Of
these, a term of the potential field is only included in the equation of motion with
respect to the relative coordinates.
The situation is the same in quantum mechanics. Namely, the Schrödinger
equation of motion with the relative coordinates is expressed as an eigenvalue
equation that reads as
[−(ħ²/2μ) ∇² + V(r)] ψ = Eψ, (3.1)
where μ is a reduced mass of two particles [1], i.e., an electron and a proton; V(r) is a
potential with r being a distance between the electron and proton. In (3.1), we
assume the spherically symmetric potential; i.e., the potential is expressed only as
a function of the distance r. Moreover, if the potential is coulombic,
[−(ħ²/2μ) ∇² − e²/4πε₀r] ψ = Eψ, (3.2)
L = (e₁ e₂ e₃) (Lx, Ly, Lz)ᵀ, (3.4)
where x denotes a position vector with respect to the relative coordinates x, y, and z.
That is,
x = (e₁ e₂ e₃) (x, y, z)ᵀ. (3.5)
where we have used canonical commutation relation (1.140) in the second to the last
equality. In the above calculations, we used commutability of, e.g., y and pz; z and py.
For example, we have
[pz, y] |ψ⟩ = (ħ/i) [∂/∂z (y|ψ⟩) − y ∂/∂z |ψ⟩] = (ħ/i) [y ∂|ψ⟩/∂z − y ∂|ψ⟩/∂z] = 0.
Since jψi is arbitrarily chosen, this relation implies that pz and y commute. We
obtain similar relations regarding Ly2 and Lz2 as well. Thus, we have
L² = Lx² + Ly² + Lz²
  = y²pz² + z²py² + z²px² + x²pz² + x²py² + y²px²
   + (x²px² − x²px²) + (y²py² − y²py²) + (z²pz² − z²pz²)
   − (yzpzpy + zypypz + zxpxpz + xzpzpx + xypypx + yxpxpy)
   + iħ(ypy + zpz + zpz + xpx + xpx + ypy)
  = r²p² − r(r·p)·p + 2iħ(r·p),
where the collected cross terms combine into −r(r·p)·p, i.e., r(r·p)·p = Σᵢⱼ xᵢxⱼpⱼpᵢ.
The calculations of r2 p2 [the first term of (3.7)] and r p (in the third term) are
straightforward.
In a spherical coordinate, momentum p is expressed as
p = e(r) pr + e(θ) pθ + e(ϕ) pϕ, (3.8)
where pr, pθ, and pϕ are components of p; e(r), e(θ), and e(ϕ) are orthonormal basis
vectors of ℝ³ in the direction of increasing r, θ, and ϕ, respectively (see Fig. 3.1).
[Fig. 3.1 Spherical coordinate system and orthonormal basis set. (a) Orthonormal basis vectors e(r),
e(θ), and e(ϕ) in ℝ³. (b) The basis vector e(ϕ) is perpendicular to the plane shaped by the z-axis and a
straight line of y = x tan ϕ]
In
Fig. 3.1b, e(ϕ) is perpendicular to the plane shaped by the z-axis and a straight line of
y ¼ x tan ϕ. Notice that the said plane is spanned by e(r) and e(θ). Meanwhile, the
momentum operator is expressed as [2]
p = (ħ/i) ∇ = (ħ/i) [e(r) ∂/∂r + e(θ) (1/r) ∂/∂θ + e(ϕ) (1/(r sin θ)) ∂/∂ϕ]. (3.9)
The vector notation of (3.9) corresponds to (1.31). That is, in the Cartesian
coordinate, we have
p = (ħ/i) ∇ = (ħ/i) (e₁ ∂/∂x + e₂ ∂/∂y + e₃ ∂/∂z),
r = r e(r), (3.10)
r·p = r·(ħ/i) ∇ = (ħ/i) r ∂/∂r. (3.11)
Hence,
r(r·p)·p = −ħ² r² ∂²/∂r². (3.12)
Thus, we have
L² = r²p² + ħ² r² ∂²/∂r² + 2ħ² r ∂/∂r = r²p² + ħ² (∂/∂r)(r² ∂/∂r). (3.13)
Therefore,
p² = −(ħ²/r²) (∂/∂r)(r² ∂/∂r) + L²/r². (3.14)
Notice here that L2 does not contain r (vide infra); i.e., L2 commutes with r2, and
so it can freely be divided by r2. Thus, the Hamiltonian H is represented by
H = p²/2μ + V(r)
  = −(1/2μ) (ħ²/r²) (∂/∂r)(r² ∂/∂r) + L²/2μr² − Ze²/4πε₀r. (3.15)
r = (x² + y² + z²)^(1/2),
θ = tan⁻¹[(x² + y²)^(1/2)/z],   (3.18)
ϕ = tan⁻¹(y/x).
Thus, we have
Lz = x py − y px = −iħ (x ∂/∂y − y ∂/∂x), (3.19)
∂/∂x = (∂r/∂x) ∂/∂r + (∂θ/∂x) ∂/∂θ + (∂ϕ/∂x) ∂/∂ϕ. (3.20)
In turn, we have
∂r/∂x = x/r = sin θ cos ϕ,
∂θ/∂x = [1/(1 + (x² + y²)/z²)] (1/2)(x² + y²)^(−1/2) (2x/z) = zx/[(x² + y² + z²)(x² + y²)^(1/2)] = cos θ cos ϕ/r,   (3.21)
∂ϕ/∂x = [1/(1 + y²/x²)] (−y/x²) = −sin ϕ/(r sin θ).
where we used
(tan⁻¹ x)′ = 1/(1 + x²).
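The partial derivatives in (3.21) are easy to validate against central finite differences; the following sketch (not from the text; the sample point is arbitrary) does so.

```python
import math

def spherical(x, y, z):
    # (3.18): spherical coordinates from Cartesian ones
    r = math.sqrt(x * x + y * y + z * z)
    theta = math.atan2(math.sqrt(x * x + y * y), z)
    phi = math.atan2(y, x)
    return r, theta, phi

x, y, z = 0.8, -0.5, 1.1          # arbitrary test point
r, th, ph = spherical(x, y, z)
h = 1e-6                          # finite-difference step

num = (spherical(x + h, y, z)[0] - spherical(x - h, y, z)[0]) / (2 * h)
assert abs(num - math.sin(th) * math.cos(ph)) < 1e-8        # dr/dx
num = (spherical(x + h, y, z)[1] - spherical(x - h, y, z)[1]) / (2 * h)
assert abs(num - math.cos(th) * math.cos(ph) / r) < 1e-8    # dθ/dx
num = (spherical(x + h, y, z)[2] - spherical(x - h, y, z)[2]) / (2 * h)
assert abs(num - (-math.sin(ph) / (r * math.sin(th)))) < 1e-8  # dϕ/dx
```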
Similarly, we have
Inserting (3.22) and (3.23) together with (3.17) into (3.19), we get
Lz = −iħ ∂/∂ϕ. (3.24)
∂/∂z = (∂r/∂z) ∂/∂r + (∂θ/∂z) ∂/∂θ = cos θ ∂/∂r − (sin θ/r) ∂/∂θ.
Then, we have
L(+) = ħ e^(iϕ) (∂/∂θ + i cot θ ∂/∂ϕ) and L(−) = ħ e^(−iϕ) (−∂/∂θ + i cot θ ∂/∂ϕ). (3.28)
Thus, we get
L(+)L(−) = ħ² e^(iϕ) (∂/∂θ + i cot θ ∂/∂ϕ) e^(−iϕ) (−∂/∂θ + i cot θ ∂/∂ϕ)
  = −ħ² [∂²/∂θ² + cot θ ∂/∂θ + i ∂/∂ϕ + cot²θ ∂²/∂ϕ²], (3.29)
where we used, e.g.,
i ∂/∂θ (cot θ ∂/∂ϕ) = i [−(1/sin²θ) ∂/∂ϕ + cot θ ∂²/∂θ∂ϕ].
Note also that ∂²/∂θ∂ϕ = ∂²/∂ϕ∂θ. This is because we are dealing with continuous and
differentiable functions.
Meanwhile, we have following commutation relations:
[Lx, Ly] = Lx Ly − Ly Lx = (y pz − z py)(z px − x pz) − (z px − x pz)(y pz − z py)
In the above equation, we assumed that the order of differentiation with respect to
x and y can be switched. It is because we are dealing with continuous and differen-
tiable normal functions. Thus, px and py commute.
For other important commutation relations, we have
[Lx, L²] = 0, [Ly, L²] = 0, and [Lz, L²] = 0. (3.31)
The derivation is straightforward and is left to readers. The relations (3.30) and
(3.31) imply that a simultaneous eigenstate exists for L² and one of Lx, Ly, and Lz.
This is because L² commutes with each of them from (3.31), whereas Lz does not commute
with Lx or Ly. The detailed argument about the simultaneous eigenstate can be seen
in Part III.
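The commutation relations (3.30) and (3.31) can be verified concretely for a small case. The sketch below is an illustration, not from the text: it uses the standard 3 × 3 matrices for l = 1 (with ħ = 1) and checks [Lx, Ly] = iLz and L² = l(l + 1)·E.

```python
import math

s2 = math.sqrt(2.0)
# l = 1 representation in the basis |1,1>, |1,0>, |1,-1> (hbar = 1):
Lp = [[0, s2, 0], [0, 0, s2], [0, 0, 0]]   # L(+)
Lm = [[0, 0, 0], [s2, 0, 0], [0, s2, 0]]   # L(-)
Lz = [[1, 0, 0], [0, 0, 0], [0, 0, -1]]

def mm(A, B):
    return [[sum(A[r][k] * B[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]

def lc(s, A, t, B):  # linear combination s*A + t*B
    return [[s * A[r][c] + t * B[r][c] for c in range(3)] for r in range(3)]

Lx = lc(0.5, Lp, 0.5, Lm)        # Lx = (L(+) + L(-))/2
Ly = lc(-0.5j, Lp, 0.5j, Lm)     # Ly = (L(+) - L(-))/2i

comm = lc(1, mm(Lx, Ly), -1, mm(Ly, Lx))
L2 = lc(1, lc(1, mm(Lx, Lx), 1, mm(Ly, Ly)), 1, mm(Lz, Lz))

for r in range(3):
    for c in range(3):
        assert abs(comm[r][c] - 1j * Lz[r][c]) < 1e-12        # [Lx,Ly] = iLz
        assert abs(L2[r][c] - (2 if r == c else 0)) < 1e-12   # L² = l(l+1)·E
```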
Thus, we have
L(+)L(−) = Lx² + Ly² + i(Ly Lx − Lx Ly) = Lx² + Ly² + i[Ly, Lx] = Lx² + Ly² + ħLz. (3.32)
Also, from (3.24) we have
Lz² = −ħ² ∂²/∂ϕ². (3.33)
∂ϕ
Finally we get
L² = −ħ² [∂²/∂θ² + cot θ ∂/∂θ + (1/sin²θ) ∂²/∂ϕ²]
or
L² = −ħ² [(1/sin θ) ∂/∂θ (sin θ ∂/∂θ) + (1/sin²θ) ∂²/∂ϕ²]. (3.34)
{−(ħ²/2μr²) [∂/∂r (r² ∂/∂r) + (1/sin θ) ∂/∂θ (sin θ ∂/∂θ) + (1/sin²θ) ∂²/∂ϕ²] − Ze²/4πε₀r} ψ = Eψ. (3.36)
L2 ¼ Lx 2 þ Ly 2 þ Lz 2 : ð3:38Þ
⟨L²⟩ ≡ ⟨ψ|L²ψ⟩
Notice that the second last equality comes from the fact that Lx, Ly, and Lz are
Hermitian. An operator that satisfies (3.40) is said to be non-negative (see Sects. 1.4
and 2.2, where we saw the calculation routines). Note also that in (3.40) the equality
holds only when the following relations hold:
The eigenfunction that satisfies (3.42) and the next relation (3.43) is a simulta-
neous eigenstate of Lx, Ly, Lz, and L2. This could seem to be in contradiction to the
fact that Lz does not commute with Lx or Ly. However, this is an exceptional case. Let
jψ 0i be the eigenfunction that satisfies both (3.41) and (3.42). Then, we have
As can be seen from (3.24) to (3.26) along with (3.34), the operators Lx, Ly, Lz,
and L2 are differential operators. Therefore, (3.43) implies that jψ 0i is a constant. We
will come back to this point later. In spite of this exceptional situation, it is impossible
that all Lx, Ly, and Lz as well as L2 take a whole set of eigenfunctions as simultaneous
eigenstates. We briefly show this as below.
In Chap. 2, we mentioned that if [A, B] = ik, any physical state cannot be an
eigenstate of A or B. The situation is different, on the other hand, if we have the
following case:
[A, B] = iC, (3.44)
where A, B, and C are Hermitian operators. The relation (3.30) is a typical example
of this. If C|ψ⟩ = 0 in (3.44), |ψ⟩ might well be an eigenstate of A and/or B.
However, if C|ψ⟩ = c|ψ⟩ (c ≠ 0), |ψ⟩ cannot be an eigenstate of A or B. This can
readily be shown in a fashion similar to that described in Sect. 2.5. Let us think of,
e.g., [Lx, Ly] = iħLz. Suppose that for ∃ψ₀ we have Lz|ψ₀⟩ = 0. Taking an inner
product using |ψ₀⟩, from (3.30) we have
⟨ψ₀| (Lx Ly − Ly Lx) |ψ₀⟩ = 0.
γ = ħ²λ (λ ≥ 0). (3.46)
That is,
−(1/2μ)(ħ²/r²) ∂/∂r (r² ∂R(r)/∂r) Y(θ, ϕ) + [L²Y(θ, ϕ)/2μr²] R(r) − (Ze²/4πε₀r) R(r) Y(θ, ϕ) = E R(r) Y(θ, ϕ).
Dividing both sides by Y(θ, ϕ) and using (3.46), we get
−(1/2μ)(ħ²/r²) ∂/∂r (r² ∂R(r)/∂r) + (ħ²λ/2μr²) R(r) − (Ze²/4πε₀r) R(r) = E R(r). (3.51)
Regarding angular components θ and ϕ, using (3.34), (3.37), and (3.46), we have
L² Y(θ, ϕ) = −ħ² [(1/sin θ) ∂/∂θ (sin θ ∂/∂θ) + (1/sin²θ) ∂²/∂ϕ²] Y(θ, ϕ) = ħ²λ Y(θ, ϕ). (3.52)
Notice in (3.53) that the angular part of SOLDE does not depend on a specific
form of the potential.
Now, we further assume that (3.53) can be separated into a zenithal angle part θ
and azimuthal angle part ϕ such that
Then we have
(1/sin θ) ∂/∂θ (sin θ ∂Θ(θ)/∂θ) Φ(ϕ) + (1/sin²θ) Θ(θ) ∂²Φ(ϕ)/∂ϕ² = −λ Θ(θ) Φ(ϕ). (3.55)
Multiplying both sides by sin²θ/Θ(θ)Φ(ϕ) and rearranging both sides, we get
(1/Φ(ϕ)) ∂²Φ(ϕ)/∂ϕ² = −(sin²θ/Θ(θ)) [(1/sin θ) ∂/∂θ (sin θ ∂Θ(θ)/∂θ) + λΘ(θ)]. (3.56)
Since LHS of (3.56) depends only upon ϕ and RHS depends only on θ, we must
have
(1/Φ(ϕ)) d²Φ(ϕ)/dϕ² = −η. (3.58)
Putting D ≡ d²/dϕ², we get
The SOLDEs of (3.58) and (3.59) are formally the same as (1.61) of Sect. 1.3,
where boundary conditions (BCs) are Dirichlet conditions. Unlike (1.61), however,
we have to consider different BCs, i.e., the periodic BCs.
As in Example 1.1, we adopt two linearly independent solutions. That is, we have
Meanwhile, we have
Then,
−m² e^(imϕ) = −η e^(imϕ).
Therefore, we get
η = m² (m = 0, ±1, ±2, ⋯). (3.65)
For the sake of simple notation, let us define J as follows so that we can eliminate
ħ and deal with dimensionless quantities in the present discussion:
J ≡ J̃/ħ = (e₁ e₂ e₃) (J̃x/ħ, J̃y/ħ, J̃z/ħ)ᵀ = (e₁ e₂ e₃) (Jx, Jy, Jz)ᵀ,
J² = Jx² + Jy² + Jz². (3.68)
[Jx, Jy] = iJz, [Jy, Jz] = iJx, and [Jz, Jx] = iJy. (3.69)
[Jx, J²] = 0, [Jy, J²] = 0, and [Jz, J²] = 0. (3.70)
The implication of (3.71) is that |ζ, μ⟩ is the simultaneous eigenstate; μ is an
eigenvalue of Jz to which |ζ, μ⟩ belongs, and ζ is an eigenvalue of J² to which
|ζ, μ⟩ belongs as well.
Since Jz and J² are Hermitian, both μ and ζ are real (see Sect. 1.4). Of these, ζ ≥ 0
as in the case of (3.45). We define the following operators J(+) and J(−) as in the case of
(3.27):
Equation (3.75) indicates that both J (+)jζ, μi and J ()jζ, μi are eigenvectors of J2
that correspond to an eigenvalue ζ.
Meanwhile, from (3.74) we get
J(+)|ζ, μ⟩ = aμ(+)|ζ, μ + 1⟩ and J(−)|ζ, μ⟩ = aμ(−)|ζ, μ − 1⟩. (3.77)
Jx² + Jy² = J² − Jz². (3.78)
Therefore,
(Jx² + Jy²)|ζ, μ⟩ = (J² − Jz²)|ζ, μ⟩ = (ζ − μ²)|ζ, μ⟩. (3.79)
Since Jx² + Jy² is a non-negative operator, we must have
ζ − μ² ≥ 0. (3.80)
j, j − 1, j − 2, ⋯, j′. (3.82)
ζ − j² − j = ζ − j′² + j′ = 0. (3.85)
That is,
ζ = j(j + 1) = j′(j′ − 1). (3.86)
Hence,
j(j + 1) − j′(j′ − 1) = (j + j′)(j − j′ + 1) = 0. (3.87)
Since j ≥ j′, we have
j + j′ = 0, i.e., j′ = −j. (3.88)
Therefore, we get
μ = j, j − 1, j − 2, ⋯, −j + 1, −j. (3.89)
That is, the number of values μ can take is (2j + 1). The relation (3.89) implies that,
taking a positive integer k,
j − k = −j, i.e., j = k/2, (3.90)
where the second equality comes from the fact that |ζ, μ − 1⟩ has been normalized; i.e.,
‖|ζ, μ − 1⟩‖ = 1. Meanwhile, taking the adjoint of both sides of the first equation of
(3.77), we have
⟨ζ, μ| [J(+)]† = [aμ(+)]* ⟨ζ, μ + 1|. (3.92)
But, from (3.72) and the fact that Jx and Jy are Hermitian,
[J(+)]† = J(−). (3.93)
where again |ζ, μ⟩ is assumed to be normalized. Comparing (3.91) and (3.95), we get
aμ(−) = [aμ₋₁(+)]*. (3.96)
Taking an inner product regarding the first equation of (3.77) and its adjoint,
⟨ζ, μ| J(−) J(+) |ζ, μ⟩ = [aμ(+)]* aμ(+) ⟨ζ, μ + 1|ζ, μ + 1⟩ = |aμ(+)|². (3.97)
Once again, the second equality of (3.97) results from the normalization of the
vector.
Using (3.83) as well as (3.71) and (3.86), (3.97) can be rewritten as
|aμ(+)|² = ζ − μ(μ + 1) = j(j + 1) − μ(μ + 1) = (j − μ)(j + μ + 1). (3.98)
Thus, we get
aμ(+) = e^(iδ) √((j − μ)(j + μ + 1)) (δ: an arbitrary real number), (3.99)
In (3.99) and (3.100), we routinely put δ ¼ 0 so that aμ(+) and aμ() can be positive
numbers. Explicitly rewriting (3.77), we get
J(+)|ζ, μ⟩ = √((j − μ)(j + μ + 1)) |ζ, μ + 1⟩,
J(−)|ζ, μ⟩ = √((j − μ + 1)(j + μ)) |ζ, μ − 1⟩, (3.101)
where j is a fixed given number chosen from among zero, positive integers, and
positive half-integers (or half-odd-integers).
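Equation (3.101) fixes the matrix elements of J(±) for any j, including half-odd-integers. The sketch below (an illustration; the basis ordering |j, j⟩, …, |j, −j⟩ and the helper names are my own) builds J(±) and Jz for j = 3/2 and confirms J² = J(+)J(−) + Jz² − Jz = j(j + 1)·E.

```python
import math

def angular_momentum_ops(two_j):
    """Build J(+), J(-), Jz for j = two_j/2 from (3.101), in the (assumed)
    basis ordering |j, j>, |j, j-1>, ..., |j, -j>."""
    j = two_j / 2.0
    dim = two_j + 1
    mus = [j - k for k in range(dim)]
    Jp = [[0.0] * dim for _ in range(dim)]
    for k in range(1, dim):
        mu = mus[k]
        Jp[k - 1][k] = math.sqrt((j - mu) * (j + mu + 1))  # raises mu -> mu+1
    Jm = [[Jp[c][r] for c in range(dim)] for r in range(dim)]  # J(-) = [J(+)]†
    Jz = [[mus[r] if r == c else 0.0 for c in range(dim)] for r in range(dim)]
    return Jp, Jm, Jz

two_j = 3                       # j = 3/2, a half-odd-integer case
Jp, Jm, Jz = angular_momentum_ops(two_j)
dim = two_j + 1

def mm(A, B):
    return [[sum(A[r][k] * B[k][c] for k in range(dim)) for c in range(dim)]
            for r in range(dim)]

# J² = J(+)J(-) + Jz² - Jz should equal j(j+1)·E = 3.75·E
P = mm(Jp, Jm)
for r in range(dim):
    for c in range(dim):
        val = P[r][c] + (Jz[r][r] ** 2 - Jz[r][r] if r == c else 0.0)
        assert abs(val - (3.75 if r == c else 0.0)) < 1e-12
```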
As discussed above, we have derived various properties and relations with respect
to the generalized angular momentum on the basis of (i) the relation (3.30) or (3.69)
and (ii) the fact that Jx, Jy, and Jz are Hermitian operators. This notion is very useful
in dealing with various angular momenta of different origins (e.g., orbital angular
momentum, spin angular momentum) from a unified point of view. In Chap. 20, we
will revisit this issue in more detail.
In Sect. 3.4 we have derived various important results on angular momenta on the
basis of the commutation relations (3.69) and the assumption that Jx, Jy, and Jz are
Hermitian. Now, let us return to the discussion on orbital angular momenta we dealt
with in Sects. 3.2 and 3.3. First, we treat the orbital angular momenta via operator
approach. This approach enables us to understand why a quantity j introduced in
Sect. 3.4 takes a value zero or positive integers with the orbital angular momenta. In
the next section (Sect 3.6) we will deal with the related issues by an analytical
method.
In (3.28) we introduced differential operators L(+) and L(−). According to Sect.
3.4, we define the following operators to eliminate ħ so that we can deal with
dimensionless quantities:
M ≡ L/ħ = (e₁ e₂ e₃) (Mx, My, Mz)ᵀ,
M² = L²/ħ² = Mx² + My² + Mz². (3.102)
Hence, we have
Mx = Lx/ħ, My = Ly/ħ, and Mz = Lz/ħ. (3.103)
Then we have
M² = −[(1/sin θ) ∂/∂θ (sin θ ∂/∂θ) + (1/sin²θ) ∂²/∂ϕ²]. (3.106)
Putting ξ = cos θ, we get
M(+) = e^(iϕ) [−√(1 − ξ²) ∂/∂ξ + i (ξ/√(1 − ξ²)) ∂/∂ϕ],
M(−) = e^(−iϕ) [√(1 − ξ²) ∂/∂ξ + i (ξ/√(1 − ξ²)) ∂/∂ϕ], (3.109)
M² = −{∂/∂ξ [(1 − ξ²) ∂/∂ξ] + (1/(1 − ξ²)) ∂²/∂ϕ²}.
m = j, j − 1, j − 2, ⋯, −j + 1, −j. (3.110)
Also, as in (3.65) we have
Yⱼᵐ(θ, ϕ) ∝ e^(imϕ).
Therefore, we get
M(+) Yⱼᵐ(θ, ϕ) = e^(iϕ) [−√(1 − ξ²) ∂/∂ξ − mξ/√(1 − ξ²)] Yⱼᵐ(θ, ϕ)
  = −e^(iϕ) (√(1 − ξ²))^(m+1) ∂/∂ξ [(√(1 − ξ²))^(−m) Yⱼᵐ(θ, ϕ)], (3.112)
where we used
∂/∂ξ [(√(1 − ξ²))^(−m) Yⱼᵐ(θ, ϕ)] = mξ (√(1 − ξ²))^(−m−2) Yⱼᵐ(θ, ϕ) + (√(1 − ξ²))^(−m) ∂Yⱼᵐ(θ, ϕ)/∂ξ. (3.113)
Similarly, we get
M(−) Yⱼᵐ(θ, ϕ) = e^(−iϕ) [√(1 − ξ²) ∂/∂ξ − mξ/√(1 − ξ²)] Yⱼᵐ(θ, ϕ)
  = e^(−iϕ) (√(1 − ξ²))^(−m+1) ∂/∂ξ [(√(1 − ξ²))^(m) Yⱼᵐ(θ, ϕ)]. (3.114)
[M(+)]ⁿ Yⱼᵐ(θ, ϕ) = (−1)ⁿ e^(inϕ) (√(1 − ξ²))^(m+n) (∂ⁿ/∂ξⁿ) [(√(1 − ξ²))^(−m) Yⱼᵐ(θ, ϕ)]. (3.115)
Operating M(+) on (3.115) once more and writing [⋯] for (√(1 − ξ²))^(−m) Yⱼᵐ(θ, ϕ), we have
[M(+)]ⁿ⁺¹ Yⱼᵐ(θ, ϕ)
 = e^(iϕ) [−√(1 − ξ²) ∂/∂ξ + i (ξ/√(1 − ξ²)) ∂/∂ϕ] (−1)ⁿ e^(inϕ) (√(1 − ξ²))^(m+n) (∂ⁿ/∂ξⁿ)[⋯]
 = (−1)ⁿ e^(i(n+1)ϕ) [−√(1 − ξ²) ∂/∂ξ − (n + m) ξ/√(1 − ξ²)] (√(1 − ξ²))^(m+n) (∂ⁿ/∂ξⁿ)[⋯]
 = (−1)ⁿ e^(i(n+1)ϕ) {(m + n) ξ (√(1 − ξ²))^(m+n−1) (∂ⁿ/∂ξⁿ)[⋯] − (√(1 − ξ²))^(m+n+1) (∂ⁿ⁺¹/∂ξⁿ⁺¹)[⋯] − (n + m) ξ (√(1 − ξ²))^(m+n−1) (∂ⁿ/∂ξⁿ)[⋯]}
 = (−1)ⁿ⁺¹ e^(i(n+1)ϕ) (√(1 − ξ²))^(m+(n+1)) (∂ⁿ⁺¹/∂ξⁿ⁺¹) [(√(1 − ξ²))^(−m) Yⱼᵐ(θ, ϕ)]. (3.116)
Notice that the first and third terms in the second last equality cancelled each
other. Thus, (3.115) certainly holds with (n + 1). Similarly, we have [3]
[M(−)]ⁿ Yⱼᵐ(θ, ϕ) = e^(−inϕ) (√(1 − ξ²))^(−m+n) (∂ⁿ/∂ξⁿ) [(√(1 − ξ²))^(m) Yⱼᵐ(θ, ϕ)]. (3.117)
This implies that (√(1 − ξ²))^(−j) Yⱼʲ(θ, ϕ) is a constant with respect to ξ. We
describe this as
(√(1 − ξ²))^(−j) Yⱼʲ(θ, ϕ) = c (c: constant with respect to ξ). (3.119)
(√(1 − ξ²))^(j) Yⱼʲ(θ, ϕ) = (at most a 2j-degree polynomial in ξ). (3.121)
Replacing Yⱼʲ(θ, ϕ) in (3.121) with that of (3.119), we get
c (√(1 − ξ²))^(j) (√(1 − ξ²))^(j) = c (1 − ξ²)^(j).
Y(θ, ϕ) ≡ Yₗᵐ(θ, ϕ) (l: zero or a positive integer). (3.123)
At the same time, so far as the orbital angular momentum is concerned, from
(3.71) and (3.86) we can identify ζ in (3.71) with l(l + 1). Namely, we have
ζ = l(l + 1).
m = l, l − 1, l − 2, ⋯, 1, 0, −1, ⋯, −l + 1, −l. (3.124)
Pₗᵐ(ξ) ≡ Θ(θ), (3.126)
and considering (3.109) along with (3.54), we arrive at the next SOLDE described as
d/dξ [(1 − ξ²) dPₗᵐ(ξ)/dξ] + [l(l + 1) − m²/(1 − ξ²)] Pₗᵐ(ξ) = 0. (3.127)
where the integration is performed on a unit sphere. Note that an infinitesimal area
element on the unit sphere is represented by sinθdθdϕ.
We evaluate the above integral denoted as
I ≡ ∫₀^π sin^(2l+1) θ dθ. (3.132)
Then,
|κₗ| = (1/(2ˡ l!)) √((2l + 1)!/4π) or κₗ = (e^(iχ)/(2ˡ l!)) √((2l + 1)!/4π) (χ: real), (3.136)
Yₗ^(m−1)(θ, ϕ) = [1/√((l − m + 1)(l + m))] M(−) Yₗᵐ(θ, ϕ). (3.138)
In particular, putting m = l we have
Yₗ^(l−1)(θ, ϕ) = (1/√(2l)) M(−) Yₗˡ(θ, ϕ). (3.139)
Further replacing [M(−)]^(l−m) Yₗˡ(θ, ϕ) in (3.140) with that of (3.141), we get
Yₗᵐ(θ, ϕ) = √((l + m)!/((2l)!(l − m)!)) e^(−i(l−m)ϕ) (√(1 − ξ²))^(−m) (∂^(l−m)/∂ξ^(l−m)) [(√(1 − ξ²))^(l) Yₗˡ(θ, ϕ)], (3.142)
where we put (−1)ˡ on both the numerator and denominator. In the RHS of (3.144),
we find the Rodrigues formula of the Legendre polynomials:
Pₗ(ξ) ≡ [(−1)ˡ/(2ˡ l!)] (∂ˡ/∂ξˡ) (1 − ξ²)ˡ. (3.145)
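The Legendre polynomials defined by (3.145) satisfy Pₗ(1) = 1, the property used just below. A numerical sketch (not from the text; it uses Bonnet's recursion, which generates the same polynomials as the Rodrigues-type formula) confirms this, together with the orthogonality ∫₋₁¹ Pₗ Pₘ dξ = 2δₗₘ/(2l + 1).

```python
import math

def legendre(n):
    """Coefficient list (lowest order first) of P_n via Bonnet's recursion:
    (k+1) P_{k+1}(x) = (2k+1) x P_k(x) - k P_{k-1}(x)."""
    P = [[1.0], [0.0, 1.0]]
    while len(P) <= n:
        k = len(P) - 1
        nxt = [0.0] + [(2 * k + 1) / (k + 1) * c for c in P[-1]]
        for i, c in enumerate(P[-2]):
            nxt[i] -= k / (k + 1) * c
        P.append(nxt)
    return P[n]

def evalp(p, x):
    return sum(c * x ** i for i, c in enumerate(p))

def overlap(l, m, steps=4000):
    # midpoint quadrature of P_l * P_m over (-1, 1)
    h = 2.0 / steps
    pl, pm = legendre(l), legendre(m)
    return h * sum(evalp(pl, x) * evalp(pm, x)
                   for x in (-1.0 + (k + 0.5) * h for k in range(steps)))

for l in range(8):
    assert abs(evalp(legendre(l), 1.0) - 1.0) < 1e-12   # P_l(1) = 1
assert abs(overlap(2, 3)) < 1e-5                         # orthogonal
assert abs(overlap(2, 2) - 2.0 / 5.0) < 1e-5             # 2/(2l+1) for l = 2
```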
Yₗ⁰(0, ϕ) = (e^(iχ)/(−1)ˡ) √((2l + 1)/4π) Pₗ(1) = (e^(iχ)/(−1)ˡ) √((2l + 1)/4π), (3.147)
where we used Pₗ(1) = 1. For this important relation, see Sect. 3.6.1. Also noting that
Yₗ⁰(0, ϕ) is taken to be positive, we must have
e^(iχ)/(−1)ˡ = 1 or e^(iχ) = (−1)ˡ. (3.148)
(1 − ξ²)ˡ. (3.149)
|ψ₀⟩ ≡ Y₀⁰(θ, ϕ).
Now, we know that m takes (2l + 1) different values that correspond to each l.
This implies that the operators can be expressed as (2l + 1) × (2l + 1) matrices. As
implied in (3.151), M(−) takes the following form:
M(−) =
⎡ 0  √(2l·1)                                   ⎤
⎢    0  √((2l − 1)·2)                          ⎥
⎢       0  ⋱                                   ⎥
⎢          ⋱  √((2l − k + 1)·k)                ⎥
⎢              0  ⋱                            ⎥
⎢                 ⋱  √(1·2l)                   ⎥
⎣                     0                        ⎦ , (3.152)
where the diagonal elements are zero and a (k, k + 1) element is √((2l − k + 1)k). That
is, nonzero elements are positioned just above the zero diagonal elements. Correspondingly, we have
M(+) =
⎡ 0                                            ⎤
⎢ √(2l·1)  0                                   ⎥
⎢    √((2l − 1)·2)  0                          ⎥
⎢        ⋱  ⋱                                  ⎥
⎢           √((2l − k + 1)·k)  0               ⎥
⎢                  ⋱  ⋱                        ⎥
⎣                     √(1·2l)  0               ⎦ , (3.153)
where the first number l in |l, −l⟩, |l, −l + 1⟩, etc. denotes the quantum number
associated with λ = l(l + 1) of (3.124) and is kept constant; the latter number denotes
m. Note from (3.154) that the column vector whose k-th row is 1 corresponds to
m such that
m = −l + k − 1. (3.155)
where the second equality is obtained by replacing k with that of (3.155), i.e.,
k = l + m + 1. Changing m to (m − 1), we get the first equation of (3.151). Similarly,
we obtain the second equation of (3.151) as well. That is, we have
$$M^{(+)}|l,m\rangle = \sqrt{(2l-k+1)\,k}\;|l,m+1\rangle = \sqrt{(l-m)(l+m+1)}\;|l,m+1\rangle. \qquad(3.157)$$
$$M^2 = M^{(+)}M^{(-)} + M_z{}^2 - M_z.$$
In the above, M^(+)M^(−) and M_z are diagonal matrices and, hence, M_z² and M² are diagonal matrices as well such that
$$M_z = \begin{pmatrix}
-l & & & & & & \\
 & -l+1 & & & & & \\
 & & -l+2 & & & & \\
 & & & \ddots & & & \\
 & & & & k-l-1 & & \\
 & & & & & \ddots & \\
 & & & & & & l
\end{pmatrix},$$
$$M^{(+)}M^{(-)} = \begin{pmatrix}
0 & & & & & \\
 & 2l\cdot 1 & & & & \\
 & & (2l-1)\cdot 2 & & & \\
 & & & \ddots & & \\
 & & & & (2l-k+1)\,k & \\
 & & & & & \ddots \\
 & & & & & & 1\cdot 2l
\end{pmatrix}, \qquad(3.158)$$
$$M^2 = \begin{pmatrix}
l(l+1) & & & & \\
 & l(l+1) & & & \\
 & & \ddots & & \\
 & & & l(l+1) & \\
 & & & & \ddots \\
 & & & & & l(l+1)
\end{pmatrix}. \qquad(3.159)$$
These expressions are useful for understanding how the vectors of (3.154) constitute simultaneous eigenstates of M² and M_z. In this situation, the matrix representation is said to diagonalize both M² and M_z. In other words, the quantum states represented by (3.154) are simultaneous eigenstates of M² and M_z.
The matrices (3.152) and (3.153) that represent M^(−) and M^(+), respectively, are said to be ladder operators, or raising and lowering operators, because operating on column vectors, those operators convert |m⟩ to |m ∓ 1⟩ as mentioned above. The operators M^(−) and M^(+) correspond to a and a† given in (2.65) and (2.66), respectively. All these operators are characterized by the fact that the corresponding matrices have diagonal elements of zero and that the nonvanishing elements are positioned only "right above" or "right below" the diagonal elements. These matrices are a kind of triangular matrix with all diagonal elements zero, and they are characteristic of nilpotent matrices. That is, if a suitable power of a matrix is zero as a matrix, such a matrix is said to be a nilpotent matrix (see Part III). In the present case, the (2l + 1)-th power of M^(−) and M^(+) becomes zero as a matrix.
The operators M^(−) and M^(+) can be described by the following shorthand representations:
$$\left[M^{(-)}\right]_{kj} = \sqrt{(2l-k+1)\,k}\;\delta_{k+1,j} \quad (1\le k\le 2l). \qquad(3.160)$$
If l = 0, M_z = M^(+)M^(−) = M² = 0. This case corresponds to $Y_0^0(\theta,\phi)=1/\sqrt{4\pi}$, and we do not need the matrix representation. Defining
$$a_k \equiv \sqrt{(2l-k+1)\,k},$$
we have
$$\left[\left(M^{(-)}\right)^2\right]_{kj} = \sum_p a_k\delta_{k+1,p}\,a_p\delta_{p+1,j} = a_ka_{k+1}\delta_{k+2,j} = \sqrt{(2l-k+1)\,k}\,\sqrt{[2l-(k+1)+1]\,(k+1)}\;\delta_{k+2,j}. \qquad(3.161)$$
Notice that although a₀ is not defined, δ_{1,j+1} = 0 for any j, and so this causes no inconvenience. Hence, [M^(+)M^(−)]_{kj} of (3.163) is well defined with 1 ≤ k ≤ 2l + 1.
Important properties of angular momentum operators examined above are based
upon the fact that those operators are ladder operators and represented by nilpotent
matrices. These characteristics will further be studied in Part III.
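The matrix relations above lend themselves to a direct numerical check. The following sketch (not part of the original text; it assumes NumPy is available) builds M^(−), M^(+), and M_z for an arbitrary l and confirms (3.157), (3.158), (3.159), and the nilpotency:

```python
import numpy as np

l = 3                       # any non-negative integer l
dim = 2*l + 1
k = np.arange(1, dim)       # k = 1, ..., 2l
a = np.sqrt((2*l - k + 1)*k)

M_minus = np.diag(a, 1)     # nonzero elements right above the diagonal, (3.152)
M_plus = M_minus.T          # nonzero elements right below the diagonal, (3.153)
M_z = np.diag(np.arange(-l, l + 1).astype(float))

# M^2 = M(+)M(-) + Mz^2 - Mz must equal l(l+1) times the identity, cf. (3.159)
M2 = M_plus @ M_minus + M_z @ M_z - M_z
assert np.allclose(M2, l*(l + 1)*np.eye(dim))

# Nilpotency: the (2l+1)-th power of a ladder matrix vanishes
assert np.allclose(np.linalg.matrix_power(M_minus, dim), 0)

# Ladder action (3.157): M(+)|l,m> = sqrt((l-m)(l+m+1)) |l,m+1>
m = 1
v = np.zeros(dim); v[l + m] = 1.0     # k-th row = 1 with k = l + m + 1
w = M_plus @ v
assert np.isclose(w[l + m + 1], np.sqrt((l - m)*(l + m + 1)))
```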
3.6 Orbital Angular Momentum: Analytic Approach

In this section, our central task is to solve the associated Legendre differential equation expressed by (3.127) by an analytical method. Putting m = 0 in (3.127), we have
$$\frac{d}{dx}\left[\left(1-x^2\right)\frac{dP_l^0(x)}{dx}\right] + l(l+1)P_l^0(x) = 0, \qquad(3.164)$$
$$\left(1-x^2\right)\frac{d}{dx}\left(1-x^2\right)^l = -2lx\left(1-x^2\right)^l. \qquad(3.166)$$
$$d^n(uv) = \sum_{m=0}^n\frac{n!}{m!\,(n-m)!}\,d^mu\,d^{n-m}v, \qquad(3.167)$$
where
$$d^mu/dx^m \equiv d^mu.$$
The above shorthand notation is due to Byron and Fuller [4]. We use this notation
for simplicity from place to place.
Noting that the third-order and higher differentiations of (1 − x²) vanish on the LHS of (3.166), we have
$$\mathrm{LHS} = d^{l+1}\left[\left(1-x^2\right)d\left(1-x^2\right)^l\right] = \left(1-x^2\right)d^{l+2}\left(1-x^2\right)^l - 2(l+1)x\,d^{l+1}\left(1-x^2\right)^l - l(l+1)\,d^l\left(1-x^2\right)^l.$$
Also noting that the second-order and higher differentiations of −2lx vanish on the RHS of (3.166), we have
$$\mathrm{RHS} = d^{l+1}\left[-2lx\left(1-x^2\right)^l\right] = -2lx\,d^{l+1}\left(1-x^2\right)^l - 2l(l+1)\,d^l\left(1-x^2\right)^l.$$
Therefore,
$$\mathrm{LHS} - \mathrm{RHS} = \left(1-x^2\right)d^{l+2}\left(1-x^2\right)^l - 2x\,d^{l+1}\left(1-x^2\right)^l + l(l+1)\,d^l\left(1-x^2\right)^l = 0.$$
We define P_l(x) as
$$P_l(x) \equiv \frac{(-1)^l}{2^l\,l!}\frac{d^l}{dx^l}\left[\left(1-x^2\right)^l\right], \qquad(3.168)$$
where the constant $(-1)^l/(2^l\,l!)$ is multiplied according to the custom so that we can explicitly represent the Rodrigues formula of the Legendre polynomials. Thus, from (3.164), P_l(x) defined above satisfies the Legendre differential equation. Rewriting it, we get
$$\left(1-x^2\right)\frac{d^2P_l(x)}{dx^2} - 2x\frac{dP_l(x)}{dx} + l(l+1)P_l(x) = 0. \qquad(3.169)$$
Or equivalently, we have
$$\frac{d}{dx}\left[\left(1-x^2\right)\frac{dP_l(x)}{dx}\right] + l(l+1)P_l(x) = 0. \qquad(3.170)$$
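The Rodrigues formula (3.168) can be checked symbolically; the sketch below, assuming SymPy, confirms that it satisfies the Legendre equation (3.169) and that P_l(1) = 1:

```python
import sympy as sp

x = sp.symbols('x')
l = 4    # an arbitrary example degree

# Rodrigues formula (3.168): P_l(x) = (-1)^l/(2^l l!) d^l/dx^l (1 - x^2)^l
P_l = sp.expand((-1)**l/(2**l*sp.factorial(l))*sp.diff((1 - x**2)**l, x, l))

# It satisfies the Legendre differential equation (3.169)
ode = (1 - x**2)*sp.diff(P_l, x, 2) - 2*x*sp.diff(P_l, x) + l*(l + 1)*P_l
assert sp.simplify(ode) == 0

# P_l(1) = 1, consistent with the generating-function argument of (3.198)
assert P_l.subs(x, 1) == 1
```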
$$m = -l,\,-l+1,\,-l+2,\,\cdots,\,-1,\,0,\,1,\,\cdots,\,l-1,\,l$$
are of particular importance. We will come back to this point in Sect. 8.3.
Since m can be either positive or negative, from (3.171) we notice that P_l^m(x) and P_l^{−m}(x) must satisfy the same differential equation (3.171). This implies that P_l^m(x) and P_l^{−m}(x) are connected, i.e., linearly dependent. First, let us assume that m ≥ 0; the case of m < 0 will be examined soon afterward.
According to Dennery and Krzywicki [5], we assume
$$P_l^m(x) = \kappa\left(1-x^2\right)^{m/2}C(x), \qquad(3.172)$$
where κ is a constant. Inserting (3.172) into (3.171) and rearranging the terms, we obtain
$$\left(1-x^2\right)\frac{d^2C}{dx^2} - 2(m+1)x\frac{dC}{dx} + (l-m)(l+m+1)C = 0 \quad (0\le m\le l). \qquad(3.173)$$
Meanwhile, differentiating (3.169) m times, we have
$$\left(1-x^2\right)\frac{d^{m+2}P_l}{dx^{m+2}} - 2(m+1)x\frac{d^{m+1}P_l}{dx^{m+1}} + (l-m)(l+m+1)\frac{d^mP_l}{dx^m} = 0, \qquad(3.174)$$
where we used the Leibniz rule about differentiation, (3.167). Comparing (3.173) and (3.174), we find that
$$C(x) = \kappa'\,\frac{d^mP_l}{dx^m},$$
where κ′ is a constant. Inserting this relation into (3.172) and setting κκ′ = 1, we get
$$P_l^m(x) = \left(1-x^2\right)^{m/2}\frac{d^mP_l(x)}{dx^m} \quad (0\le m\le l). \qquad(3.175)$$
Equation (3.175) defines the associated Legendre functions. Note, however, that the function form differs from literature to literature [2, 5, 6].
Among the classical orthogonal polynomials, the Gegenbauer polynomials C_n^λ(x) often appear in the literature. The relevant differential equation is defined by
$$\left(1-x^2\right)\frac{d^2C_n^\lambda(x)}{dx^2} - (2\lambda+1)x\frac{dC_n^\lambda(x)}{dx} + n(n+2\lambda)\,C_n^\lambda(x) = 0 \quad \left(\lambda>-\frac12\right). \qquad(3.177)$$
Putting λ = m + 1/2 and n = l − m in (3.177), we have
$$\left(1-x^2\right)\frac{d^2C_{l-m}^{m+\frac12}(x)}{dx^2} - 2(m+1)x\frac{dC_{l-m}^{m+\frac12}(x)}{dx} + (l-m)(l+m+1)\,C_{l-m}^{m+\frac12}(x) = 0. \qquad(3.178)$$
Comparing (3.173) and (3.178), we find
$$\frac{d^mP_l(x)}{dx^m} = \text{constant}\times C_{l-m}^{m+\frac12}(x) \quad (0\le m\le l). \qquad(3.179)$$
Next, let us determine the constant appearing in (3.179). To this end, we consider the following generating function of the polynomials C_n^λ(x), defined by [7, 8]
$$\left(1-2tx+t^2\right)^{-\lambda} \equiv \sum_{n=0}^\infty C_n^\lambda(x)\,t^n \quad \left(\lambda>-\frac12\right). \qquad(3.180)$$
Recall the generalized binomial theorem
$$(1+x)^\lambda = \sum_{m=0}^\infty\binom\lambda m x^m, \qquad(3.181)$$
where λ is an arbitrary real number and we define $\binom\lambda m$ as
$$\binom\lambda m \equiv \lambda(\lambda-1)(\lambda-2)\cdots(\lambda-m+1)/m! \quad\text{and}\quad \binom\lambda0 \equiv 1. \qquad(3.182)$$
Equivalently,
$$(1+x)^\lambda = \sum_{m=0}^\infty\frac{\Gamma(\lambda+1)}{m!\,\Gamma(\lambda-m+1)}\,x^m, \qquad(3.183)$$
$$\Gamma(z) = 2\int_0^\infty e^{-u^2}u^{2z-1}\,du \quad (\operatorname{Re}z>0). \qquad(3.185)$$
Note that the above expression is associated with the following fundamental feature of the gamma functions:
$$\Gamma(z+1) = z\,\Gamma(z). \qquad(3.186)$$
Applying (3.183), we have
$$\left(1-2tx+t^2\right)^{-\lambda} = \sum_{m=0}^\infty\frac{\Gamma(-\lambda+1)}{m!\,\Gamma(-\lambda-m+1)}\,(-t)^m(2x-t)^m. \qquad(3.187)$$
Assuming that x is a real number belonging to the interval [−1, 1], (3.187) holds with t satisfying |t| < 1 [8]. The discussion is as follows: when x satisfies the above condition, solving 1 − 2tx + t² = 0, we get the solutions
$$t = x \pm i\sqrt{1-x^2}.$$
Defining r as the distance from the origin to the singular point closest to it, (1 − 2tx + t²)^{−λ}, regarded as a function of t, is analytic in the disk |t| < r. But we have
$$|t| = \sqrt{x^2+\left(1-x^2\right)} = 1.$$
Thus, (1 − 2tx + t²)^{−λ} is analytic within the disk |t| < 1 and, hence, it can be expanded in a Taylor series (see Chap. 6).
Continuing the calculation of (3.187), we have
$$\begin{aligned}
\left(1-2tx+t^2\right)^{-\lambda} &= \sum_{m=0}^\infty\frac{\Gamma(-\lambda+1)}{m!\,\Gamma(-\lambda-m+1)}(-1)^mt^m\sum_{k=0}^m\frac{m!}{k!\,(m-k)!}\,2^{m-k}x^{m-k}(-1)^kt^k\\
&= \sum_{m=0}^\infty\sum_{k=0}^m\frac{(-1)^{m+k}}{k!\,(m-k)!}\,\frac{\Gamma(-\lambda+1)}{\Gamma(-\lambda-m+1)}\,2^{m-k}x^{m-k}t^{m+k}\\
&= \sum_{m=0}^\infty\sum_{k=0}^m\frac{(-1)^{m+k}(-1)^m\,\Gamma(\lambda+m)}{k!\,(m-k)!\,\Gamma(\lambda)}\,2^{m-k}x^{m-k}t^{m+k},
\end{aligned} \qquad(3.188)$$
where the last equality results from rewriting the gamma functions using (3.186). Replacing (m + k) with n, we get
$$C_n^\lambda(x) = \sum_{k=0}^{[n/2]}\frac{(-1)^k\,\Gamma(\lambda+n-k)}{\Gamma(\lambda)\,k!\,(n-2k)!}\,(2x)^{n-2k}, \qquad(3.189)$$
where [n/2] represents the greatest integer that does not exceed n/2. This expression comes from the requirement that the order of x must satisfy the following condition:
$$n-2k\ge0 \quad\text{or}\quad k\le n/2. \qquad(3.190)$$
Comparing (3.164) and (3.177) and putting λ = 1/2, we immediately find that the two differential equations are identical [7]. That is,
$$C_n^{1/2}(x) = P_n(x). \qquad(3.192)$$
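The identification (3.192) can be verified by Taylor-expanding the generating function (3.180) with λ = 1/2; a sketch assuming SymPy:

```python
import sympy as sp

x, t = sp.symbols('x t')
N = 6

# Taylor-expand the generating function (3.180) with lambda = 1/2 ...
series = sp.series((1 - 2*t*x + t**2)**sp.Rational(-1, 2), t, 0, N).removeO()

# ... and compare each coefficient of t^n with the Legendre polynomial P_n(x)
for n in range(N):
    assert sp.expand(series.coeff(t, n) - sp.legendre(n, x)) == 0
```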
It is convenient to derive a formula for the gamma function. In (3.193), n − k > 0, and so let us think of Γ(1/2 + m) (m: positive integer). Using (3.186), we have
$$\begin{aligned}
\Gamma\!\left(\frac12+m\right) &= \left(m-\frac12\right)\Gamma\!\left(m-\frac12\right) = \left(m-\frac12\right)\left(m-\frac32\right)\Gamma\!\left(m-\frac32\right) = \cdots\\
&= \left(m-\frac12\right)\left(m-\frac32\right)\cdots\frac12\,\Gamma\!\left(\frac12\right) = 2^{-m}(2m-1)(2m-3)\cdots3\cdot1\,\Gamma\!\left(\frac12\right)\\
&= \frac{(2m-1)!}{2^{2m-1}\,(m-1)!}\,\Gamma\!\left(\frac12\right) = \frac{(2m)!}{2^{2m}\,m!}\,\Gamma\!\left(\frac12\right). \qquad(3.195)
\end{aligned}$$
Here,
$$\Gamma\!\left(\frac12\right) = 2\int_0^\infty e^{-u^2}\,du = \sqrt\pi.$$
For the derivation of the above definite integral, see (2.86) of Sect. 2.4. From (3.184), we also have Γ(1) = 1.
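A quick numerical confirmation of (3.195), assuming only the Python standard library:

```python
from math import gamma, factorial, pi, sqrt, isclose

# Gamma(1/2 + m) = (2m)!/(2^(2m) m!) * sqrt(pi), cf. (3.195)
for m in range(1, 8):
    lhs = gamma(0.5 + m)
    rhs = factorial(2*m)/(2**(2*m)*factorial(m))*sqrt(pi)
    assert isclose(lhs, rhs)
```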
In relation to the discussion of Sect. 3.5, let us derive an important formula about the Legendre polynomials. From (3.180) and (3.192), we get
$$\left(1-2tx+t^2\right)^{-1/2} = \sum_{n=0}^\infty P_n(x)\,t^n. \qquad(3.197)$$
Putting x = 1, we have
$$\left(1-2t+t^2\right)^{-1/2} = \frac1{1-t} = \sum_{n=0}^\infty t^n = \sum_{n=0}^\infty P_n(1)\,t^n. \qquad(3.198)$$
Thus, we get
$$P_n(1) = 1.$$
$$\begin{aligned}
\frac{d^mP_l(x)}{dx^m} &= \sum_{k=0}^{[(l-m)/2]}\frac{(-1)^k(2l-2k)!\,(l-2k)(l-2k-1)\cdots(l-2k-m+1)}{2^l\,k!\,(l-k)!\,(l-2k)!}\,x^{l-2k-m}\\
&= \sum_{k=0}^{[(l-m)/2]}\frac{(-1)^k(2l-2k)!}{2^l\,k!\,(l-k)!\,(l-2k-m)!}\,x^{l-2k-m}. \qquad(3.199)
\end{aligned}$$
Meanwhile, we have
$$C_{l-m}^{m+\frac12}(x) = \sum_{k=0}^{[(l-m)/2]}\frac{(-1)^k\,2^{l-2k-m}\,\Gamma\!\left(l+\frac12-k\right)}{k!\,(l-2k-m)!\,\Gamma\!\left(m+\frac12\right)}\,x^{l-2k-m}. \qquad(3.200)$$
Therefore, using (3.195), we get
$$C_{l-m}^{m+\frac12}(x) = \sum_{k=0}^{[(l-m)/2]}\frac{(-1)^k(2l-2k)!\,2^m\,\Gamma(m+1)}{2^l\,k!\,(l-k)!\,(l-2k-m)!\,\Gamma(2m+1)}\,x^{l-2k-m}, \qquad(3.201)$$
where we used m! = Γ(m + 1) and (2m)! = Γ(2m + 1). Comparing (3.199) and (3.201), we get
$$\frac{d^mP_l(x)}{dx^m} = \frac{\Gamma(2m+1)}{2^m\,\Gamma(m+1)}\,C_{l-m}^{m+\frac12}(x).$$
Hence, we have
$$P_l^m(x) = \frac{(-1)^{l-m}\,(l+m)!}{2^l\,l!\,(l-m)!}\left(1-x^2\right)^{-m/2}\frac{d^{l-m}}{dx^{l-m}}\left[\left(1-x^2\right)^l\right]. \qquad(3.206)$$
When m = 0, we have
$$P_l^0(x) = \frac{(-1)^l}{2^l\,l!}\frac{d^l}{dx^l}\left(1-x^2\right)^l = P_l(x). \qquad(3.207)$$
$$P_l^{-m}(x) = \frac{(-1)^m\,(l-m)!}{(l+m)!}\,P_l^m(x) \quad (-l\le m\le l). \qquad(3.210)$$
Thus, as expected earlier, P_l^m(x) and P_l^{−m}(x) are linearly dependent.
$$Y_l^{-m}(\theta,\phi) = (-1)^m\left[Y_l^m(\theta,\phi)\right]^*. \qquad(3.213)$$
$$\begin{aligned}
P_l^m(x) &= \frac{(-1)^{l-m}(l+m)!}{2^l\,l!\,(l-m)!}\,(1+x)^{-m/2}(1-x)^{-m/2}\sum_{r=0}^{l-m}\frac{(l-m)!}{r!\,(l-m-r)!}\left[\frac{d^r}{dx^r}(1+x)^l\right]\left[\frac{d^{l-m-r}}{dx^{l-m-r}}(1-x)^l\right]\\
&= \frac{(-1)^{l-m}(l+m)!}{2^l\,l!\,(l-m)!}\sum_{r=0}^{l-m}\frac{(l-m)!}{r!\,(l-m-r)!}\cdot\frac{l!\,(1+x)^{l-r-\frac m2}}{(l-r)!}\cdot\frac{(-1)^{l-m-r}\,l!\,(1-x)^{m+r-\frac m2}}{(m+r)!}\\
&= \frac{l!\,(l+m)!}{2^l}\sum_{r=0}^{l-m}\frac{(-1)^r\,(1+x)^{l-r-\frac m2}(1-x)^{r+\frac m2}}{r!\,(l-m-r)!\,(l-r)!\,(m+r)!}. \qquad(3.214)
\end{aligned}$$
$$Y_l^m(\theta,\phi) = \sqrt{\frac{(2l+1)(l+m)!\,(l-m)!}{4\pi}}\;l!\,e^{im\phi}\sum_{r=0}^{l-m}\frac{(-1)^{r+m}\cos^{2l-2r-m}\frac\theta2\,\sin^{2r+m}\frac\theta2}{r!\,(l-m-r)!\,(l-r)!\,(m+r)!}. \qquad(3.216)$$
The minus sign appearing in Y_3^3(θ, φ) is due to the Condon–Shortley phase. For l = 3 and m = 0, moreover, we have
$$\begin{aligned}
Y_3^0(\theta,\phi) &= \sqrt{\frac{7\cdot3!\,3!}{4\pi}}\;3!\sum_{r=0}^3\frac{(-1)^r\cos^{6-2r}\frac\theta2\,\sin^{2r}\frac\theta2}{r!\,(3-r)!\,(3-r)!\,r!}\\
&= 18\sqrt{\frac7\pi}\left[\frac{\cos^6\frac\theta2}{0!3!3!0!} - \frac{\cos^4\frac\theta2\,\sin^2\frac\theta2}{1!2!2!1!} + \frac{\cos^2\frac\theta2\,\sin^4\frac\theta2}{2!1!1!2!} - \frac{\sin^6\frac\theta2}{3!0!0!3!}\right]\\
&= 18\sqrt{\frac7\pi}\left[\frac{\cos^6\frac\theta2-\sin^6\frac\theta2}{36} - \frac{\cos^2\frac\theta2\,\sin^2\frac\theta2\left(\cos^2\frac\theta2-\sin^2\frac\theta2\right)}4\right]\\
&= \sqrt{\frac7\pi}\left(\frac54\cos^3\theta - \frac34\cos\theta\right),
\end{aligned}$$
where in the last equality we used formulae of elementary algebra and trigonometric functions. At the same time, we get
$$Y_3^0(0,\phi) = \sqrt{\frac7{4\pi}}.$$
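The closed form of Y_3^0 obtained above can be spot-checked numerically; a sketch assuming NumPy:

```python
import numpy as np

theta = np.linspace(0.0, np.pi, 7)
c, s = np.cos(theta/2), np.sin(theta/2)

# Middle line of the Y_3^0 computation above ...
lhs = 18*np.sqrt(7/np.pi)*((c**6 - s**6)/36 - c**2*s**2*(c**2 - s**2)/4)
# ... and the closed form obtained from it
rhs = np.sqrt(7/np.pi)*(5/4*np.cos(theta)**3 - 3/4*np.cos(theta))
assert np.allclose(lhs, rhs)

# The value at theta = 0 reproduces Y_3^0(0, phi) = sqrt(7/(4 pi))
assert np.isclose(lhs[0], np.sqrt(7/(4*np.pi)))
```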
Orthogonality relations of functions are important. Here we deal with them regarding the associated Legendre functions.
Replacing m with (m − 1) in (3.174) and using the notation introduced before, we have
$$\left(1-x^2\right)d^{m+1}P_l - 2mx\,d^mP_l + (l+m)(l-m+1)\,d^{m-1}P_l = 0. \qquad(3.219)$$
$$d\left[\left(1-x^2\right)^md^mP_l\right] = -(l+m)(l-m+1)\left(1-x^2\right)^{m-1}d^{m-1}P_l. \qquad(3.220)$$
where with the second equality the first term vanished and with the second last equality we used (3.220). Equation (3.222) gives a recurrence formula regarding f(m). Further performing the calculation, we arrive at an expression involving
$$f(0) = \int_{-1}^1P_l(x)\,P_{l'}(x)\,dx. \qquad(3.224)$$
With the Rodrigues formula (3.168), this reads
$$f(0) = \frac{(-1)^l}{2^l\,l!}\cdot\frac{(-1)^{l'}}{2^{l'}\,l'!}\int_{-1}^1\left[d^l\left(1-x^2\right)^l\right]\left[d^{l'}\left(1-x^2\right)^{l'}\right]dx. \qquad(3.225)$$
To evaluate (3.224), we have two cases, i.e., (i) l ≠ l′ and (ii) l = l′. In the first case, assuming that l > l′ and performing integration by parts, we have
$$\begin{aligned}
I &= \int_{-1}^1\left[d^l\left(1-x^2\right)^l\right]\left[d^{l'}\left(1-x^2\right)^{l'}\right]dx\\
&= \left[\left\{d^{l-1}\left(1-x^2\right)^l\right\}\left\{d^{l'}\left(1-x^2\right)^{l'}\right\}\right]_{-1}^1 - \int_{-1}^1\left[d^{l-1}\left(1-x^2\right)^l\right]\left[d^{l'+1}\left(1-x^2\right)^{l'}\right]dx. \qquad(3.226)
\end{aligned}$$
In the above, we find that the first term vanishes because it contains (1 − x²) as a factor. Integrating (3.226) another l′ times as before, we get
$$I = (-1)^{l'+1}\int_{-1}^1\left[d^{l-l'-1}\left(1-x^2\right)^l\right]\left[d^{2l'+1}\left(1-x^2\right)^{l'}\right]dx. \qquad(3.227)$$
In (3.227), 0 ≤ l − l′ − 1 ≤ 2l, and so $d^{l-l'-1}(1-x^2)^l$ does not vanish, but (1 − x²)^{l′} is an at most 2l′-degree polynomial and, hence, $d^{2l'+1}(1-x^2)^{l'}$ vanishes. Therefore,
$$f(0) = 0. \qquad(3.228)$$
If l < l′, exchanging P_l(x) and P_{l′}(x) in (3.224), we get f(0) = 0 as well.
In the second case of l = l′, we evaluate the following integral:
$$I = \int_{-1}^1\left[d^l\left(1-x^2\right)^l\right]^2dx. \qquad(3.229)$$
$$\int_{-1}^1\left(1-x^2\right)^ldx = \int_0^\pi\sin^{2l+1}\theta\,d\theta. \qquad(3.231)$$
We have already estimated this integral in (3.132) to be $2^{2l+1}(l!)^2/(2l+1)!$. Thus, we get
$$f(m) = \frac{(l+m)!}{(l-m)!}\,f(0) = \frac{(l+m)!}{(l-m)!}\cdot\frac2{2l+1}. \qquad(3.233)$$
Accordingly, putting
$$\widetilde P_l^m(x) \equiv \sqrt{\frac{(2l+1)\,(l-m)!}{2\,(l+m)!}}\;P_l^m(x), \qquad(3.235)$$
we get
$$\int_{-1}^1\widetilde P_l^m(x)\,\widetilde P_{l'}^m(x)\,dx = \delta_{ll'}. \qquad(3.236)$$
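The orthonormality (3.236) can be verified symbolically for concrete indices; a sketch assuming SymPy (the index values l = 3, l′ = 5, m = 2 are arbitrary choices for illustration):

```python
import sympy as sp

x = sp.symbols('x')

def P(l, m):
    """Associated Legendre function as defined in (3.175) (m >= 0, no extra phase)."""
    Pl = (-1)**l/(2**l*sp.factorial(l))*sp.diff((1 - x**2)**l, x, l)
    return sp.expand((1 - x**2)**sp.Rational(m, 2)*sp.diff(Pl, x, m))

def norm(l, m):
    """Normalization factor of (3.235)."""
    return sp.sqrt((2*l + 1)*sp.factorial(l - m)/(2*sp.factorial(l + m)))

l, lp, m = 3, 5, 2

# Orthogonality for l != l' at fixed m, and unit norm, cf. (3.236)
assert sp.integrate(norm(l, m)*P(l, m)*norm(lp, m)*P(lp, m), (x, -1, 1)) == 0
assert sp.integrate((norm(l, m)*P(l, m))**2, (x, -1, 1)) == 1
```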
$$Y_l^m(\theta,\phi) = \sqrt{\frac{(2l+1)\,(l-m)!}{4\pi\,(l+m)!}}\;P_l^m(x)\,e^{im\phi} \quad (x=\cos\theta;\ 0\le\theta\le\pi). \qquad(3.238)$$
Since P_l^m(x) and P_l^{−m}(x) are linearly dependent as noted in (3.210), the set of the associated Legendre functions cannot by itself define a complete orthonormal system. In fact, we have
$$\int_{-1}^1P_l^m(x)\,P_l^{-m}(x)\,dx = \frac{(-1)^m(l-m)!}{(l+m)!}\cdot\frac{(l+m)!}{(l-m)!}\cdot\frac2{2l+1} = \frac{2\,(-1)^m}{2l+1}. \qquad(3.239)$$
This means that P_l^m(x) and P_l^{−m}(x) are not orthogonal. Thus, we need e^{imφ} to constitute the complete orthonormal system. In other words,
$$\int_0^{2\pi}d\phi\int_{-1}^1d(\cos\theta)\left[Y_{l'}^{m'}(\theta,\phi)\right]^*Y_l^m(\theta,\phi) = \delta_{ll'}\,\delta_{mm'}. \qquad(3.240)$$
3.7 Radial Wave Functions of Hydrogen-Like Atoms

In Sect. 3.2, the separation of variables led to the radial part of the Schrödinger equation described as
$$-\frac{\hbar^2}{2\mu}\frac1{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial R(r)}{\partial r}\right) + \frac{\hbar^2\lambda}{2\mu r^2}\,R(r) - \frac{Ze^2}{4\pi\varepsilon_0r}\,R(r) = ER(r). \qquad(3.51)$$
We identified λ with l(l + 1) in (3.124). Thus, rewriting (3.51) and indexing R(r) with l, we have
$$-\frac{\hbar^2}{2\mu r^2}\frac{d}{dr}\left(r^2\frac{dR_l(r)}{dr}\right) + \frac{\hbar^2\,l(l+1)}{2\mu r^2}\,R_l(r) - \frac{Ze^2}{4\pi\varepsilon_0r}\,R_l(r) = ER_l(r), \qquad(3.241)$$
where R_l(r) is a radial wave function parametrized with l; μ, Z, ε₀, and E denote the reduced mass of the hydrogen-like atom, the atomic number, the permittivity of vacuum, and the eigenvalue of energy, respectively. Otherwise we follow conventions.
Now, we are in a position to solve (3.241). As in the case of the quantum-mechanical harmonic oscillator of Chap. 2 and the angular momentum operator of the previous section, we present the operator formalism in dealing with the radial wave functions of hydrogen-like atoms. The essential point rests upon the fact that the radial wave functions can be derived by successively operating lowering operators on a radial wave function having the maximum allowed orbital angular momentum quantum number. The results agree with the conventional coordinate representation method based upon a power series expansion related to the associated Laguerre polynomials.
Sunakawa [3] introduced the following differential equation by suitable transformations of a variable, a parameter, and a function:
$$-\frac{d^2\psi_l(\rho)}{d\rho^2} + \frac{l(l+1)}{\rho^2}\,\psi_l(\rho) - \frac2\rho\,\psi_l(\rho) = \mathcal E\,\psi_l(\rho), \qquad(3.242)$$
where $\rho = Zr/a$, $\mathcal E = \frac{2\mu}{\hbar^2}\left(\frac aZ\right)^2E$, and ψ_l(ρ) = ρR_l(r), with a (≡ 4πε₀ħ²/μe²) being the Bohr radius of a hydrogen-like atom. Note that ρ and 𝓔 are dimensionless quantities. The related calculations are as follows: we have
$$\frac{dR_l}{dr} = \frac{d(\psi_l/\rho)}{d\rho}\frac{d\rho}{dr} = \left(\frac{d\psi_l}{d\rho}\frac1\rho - \frac{\psi_l}{\rho^2}\right)\frac Za.$$
Thus, we get
$$r^2\frac{dR_l}{dr} = \left(\rho\frac{d\psi_l}{d\rho} - \psi_l\right)\frac aZ, \qquad \frac{d}{dr}\left(r^2\frac{dR_l}{dr}\right) = \frac{d^2\psi_l(\rho)}{d\rho^2}\,\rho.$$
$$b_l \equiv \frac{d}{d\rho} + \frac l\rho - \frac1l. \qquad(3.243)$$
Hence,
$$b_l^\dagger = -\frac{d}{d\rho} + \frac l\rho - \frac1l, \qquad(3.244)$$
where the operator b_l^† is the adjoint operator of b_l. Notice that these definitions are different from those of Sunakawa [3]. The operator d/dρ (≡ A) is formally an anti-Hermitian operator. We have mentioned such an operator in Sect. 1.5. The second terms of (3.243) and (3.244) are Hermitian operators, which we define as H. Thus, we foresee that b_l and b_l^† can be denoted as follows:
$$b_l = A + H \quad\text{and}\quad b_l^\dagger = -A + H.$$
$$b_lb_l^\dagger = -\frac{d^2}{d\rho^2} - \frac l{\rho^2} + \frac{l^2}{\rho^2} - \frac2\rho + \frac1{l^2} = -\frac{d^2}{d\rho^2} + \frac{l(l-1)}{\rho^2} - \frac2\rho + \frac1{l^2}. \qquad(3.245)$$
Also, we have
$$b_l^\dagger b_l = -\frac{d^2}{d\rho^2} + \frac{l(l+1)}{\rho^2} - \frac2\rho + \frac1{l^2}. \qquad(3.246)$$
$$H^{(l)} \equiv -\frac{d^2}{d\rho^2} + \frac{l(l+1)}{\rho^2} - \frac2\rho.$$
Then, from (3.243) and (3.244) as well as (3.245) and (3.246), we have
$$\cdots = \left\langle b_n^\dagger\chi\,\middle|\,b_n^\dagger\chi\right\rangle + \varepsilon^{(n-1)}\langle\chi|\chi\rangle = \left\|\,b_n^\dagger|\chi\rangle\right\|^2 + \varepsilon^{(n-1)}\langle\chi|\chi\rangle \ge \varepsilon^{(n-1)}. \qquad(3.250)$$
Here we assume that χ is normalized. On the basis of the variational principle [9], the above expected value must take the minimum ε^{(n−1)} so that χ can be an eigenfunction. To satisfy this condition, we have
$$\left|b_n^\dagger\chi\right\rangle = 0. \qquad(3.251)$$
$$\psi_{n-1}^{(n)} \equiv \chi. \qquad(3.253)$$
$$\psi_{n-s}^{(n)} \equiv b_{n-s+1}b_{n-s+2}\cdots b_{n-1}\,\psi_{n-1}^{(n)} \quad (2\le s\le n). \qquad(3.255)$$
With these functions, (s − 1) operators have been operated on ψ_{n−1}^{(n)}. Note that if s were 1, no operation of b_l would take place. Thus, we find that b_l acts on the l-state to produce the (l − 1)-state. That is, b_l acts as an annihilation operator. For the sake of convenience we express
$$\begin{aligned}
H^{(n,s)}\psi_{n-s}^{(n)} &= H^{(n,s)}b_{n-s+1}b_{n-s+2}\cdots b_{n-1}\,\psi_{n-1}^{(n)}\\
&= b_{n-s+1}H^{(n,s-1)}b_{n-s+2}\cdots b_{n-1}\,\psi_{n-1}^{(n)}\\
&= b_{n-s+1}b_{n-s+2}H^{(n,s-2)}\cdots b_{n-1}\,\psi_{n-1}^{(n)}\\
&\;\;\vdots\\
&= b_{n-s+1}b_{n-s+2}\cdots H^{(n,2)}b_{n-1}\,\psi_{n-1}^{(n)}\\
&= b_{n-s+1}b_{n-s+2}\cdots b_{n-1}H^{(n,1)}\,\psi_{n-1}^{(n)}\\
&= b_{n-s+1}b_{n-s+2}\cdots b_{n-1}\,\varepsilon^{(n-1)}\psi_{n-1}^{(n)}\\
&= \varepsilon^{(n-1)}b_{n-s+1}b_{n-s+2}\cdots b_{n-1}\,\psi_{n-1}^{(n)} = \varepsilon^{(n-1)}\psi_{n-s}^{(n)}. \qquad(3.257)
\end{aligned}$$
$$E_n = -\frac{\hbar^2}{2\mu}\left(\frac Za\right)^2\frac1{n^2}. \qquad(3.258)$$
If we define l ≡ n − s and take account of (3.252), a total of n functions ψ_l^{(n)} (l = 0, 1, 2, ⋯, n − 1) belong to the same eigenvalue ε^{(n−1)}.
The quantum state ψ_l^{(n)} is associated with the operator H^{(l)}. Thus, the solution of (3.242) has been given by the functions ψ_l^{(n)} parametrized with n and l, on condition that (3.251) holds. As explicitly indicated in (3.255) and (3.257), b_l lowers the parameter l by one from l to l − 1 when it operates on ψ_l^{(n)}. The operator b₀ cannot be defined, as indicated in (3.243), and so the lowest value of l should be zero. Operators such as b_l are known as ladder operators (lowering operators, or annihilation operators in the present case). The implication is that successive operations of b_l on ψ_{n−1}^{(n)} produce various parameters l as a subscript down to zero, while retaining the same integer parameter n as a superscript.
$$\frac{d\psi_{n-1}^{(n)}}{d\rho} - \frac n\rho\,\psi_{n-1}^{(n)} + \frac1n\,\psi_{n-1}^{(n)} = 0. \qquad(3.259)$$
$$\psi_{n-1}^{(n)} = c_n\,\rho^ne^{-\rho/n}, \qquad(3.260)$$
Namely,
$$|c_n|^2\int_0^\infty\rho^{2n}e^{-2\rho/n}\,d\rho = 1. \qquad(3.262)$$
$$\int_0^\infty\rho^{2n}e^{-2\rho/n}\,d\rho = (2n)!\left(\frac n2\right)^{2n+1}. \qquad(3.264)$$
Hence,
$$c_n = \left(\frac2n\right)^{n+\frac12}\bigg/\sqrt{(2n)!}. \qquad(3.265)$$
To further normalize the other wave functions, we calculate the following inner product:
$$\left\langle\psi_l^{(n)}\middle|\psi_l^{(n)}\right\rangle = \left\langle\psi_{n-1}^{(n)}\middle|\,b_{n-1}^\dagger\cdots b_{l+2}^\dagger b_{l+1}^\dagger b_{l+1}b_{l+2}\cdots b_{n-1}\,\middle|\psi_{n-1}^{(n)}\right\rangle. \qquad(3.266)$$
To show this, we use mathematical induction. We have already normalized ψ_{n−1}^{(n)} in (3.261). Next, we calculate ⟨ψ_{n−2}^{(n)}|ψ_{n−2}^{(n)}⟩ such that
$$\begin{aligned}
\left\langle\psi_{n-2}^{(n)}\middle|\psi_{n-2}^{(n)}\right\rangle &= \left\langle b_{n-1}\psi_{n-1}^{(n)}\middle|b_{n-1}\psi_{n-1}^{(n)}\right\rangle = \left\langle\psi_{n-1}^{(n)}\middle|\,b_{n-1}^\dagger b_{n-1}\,\middle|\psi_{n-1}^{(n)}\right\rangle\\
&= \left\langle\psi_{n-1}^{(n)}\middle|\left[b_nb_n^\dagger + \varepsilon^{(n-1)} - \varepsilon^{(n-2)}\right]\middle|\psi_{n-1}^{(n)}\right\rangle\\
&= \left\langle\psi_{n-1}^{(n)}\middle|\,b_nb_n^\dagger\,\middle|\psi_{n-1}^{(n)}\right\rangle + \left[\varepsilon^{(n-1)} - \varepsilon^{(n-2)}\right]\left\langle\psi_{n-1}^{(n)}\middle|\psi_{n-1}^{(n)}\right\rangle\\
&= \left[\varepsilon^{(n-1)} - \varepsilon^{(n-2)}\right]\left\langle\psi_{n-1}^{(n)}\middle|\psi_{n-1}^{(n)}\right\rangle. \qquad(3.269)
\end{aligned}$$
With the third equality, we used (3.267) with l = n − 1; with the last equality we used (3.251). Therefore, (3.268) holds with l = n − 2. Then, it suffices to show that, assuming that (3.268) holds with ⟨ψ_{l+1}^{(n)}|ψ_{l+1}^{(n)}⟩, it holds with ⟨ψ_l^{(n)}|ψ_l^{(n)}⟩ as well.
Let us calculate ⟨ψ_l^{(n)}|ψ_l^{(n)}⟩, starting with (3.266) as below:
$$\begin{aligned}
\left\langle\psi_l^{(n)}\middle|\psi_l^{(n)}\right\rangle &= \left\langle\psi_{n-1}^{(n)}\middle|\,b_{n-1}^\dagger\cdots b_{l+2}^\dagger b_{l+1}^\dagger b_{l+1}b_{l+2}\cdots b_{n-1}\,\middle|\psi_{n-1}^{(n)}\right\rangle\\
&= \left\langle\psi_{n-1}^{(n)}\middle|\,b_{n-1}^\dagger\cdots b_{l+2}^\dagger\left[b_{l+2}b_{l+2}^\dagger + \varepsilon^{(l+1)} - \varepsilon^{(l)}\right]b_{l+2}\cdots b_{n-1}\,\middle|\psi_{n-1}^{(n)}\right\rangle\\
&= \left\langle\psi_{n-1}^{(n)}\middle|\,b_{n-1}^\dagger\cdots b_{l+2}^\dagger b_{l+2}b_{l+2}^\dagger b_{l+2}\cdots b_{n-1}\,\middle|\psi_{n-1}^{(n)}\right\rangle\\
&\qquad+\left[\varepsilon^{(l+1)} - \varepsilon^{(l)}\right]\left\langle\psi_{n-1}^{(n)}\middle|\,b_{n-1}^\dagger\cdots b_{l+2}^\dagger b_{l+2}\cdots b_{n-1}\,\middle|\psi_{n-1}^{(n)}\right\rangle\\
&= \left\langle\psi_{n-1}^{(n)}\middle|\,b_{n-1}^\dagger\cdots b_{l+2}^\dagger b_{l+2}b_{l+2}^\dagger b_{l+2}\cdots b_{n-1}\,\middle|\psi_{n-1}^{(n)}\right\rangle + \left[\varepsilon^{(l+1)} - \varepsilon^{(l)}\right]\left\langle\psi_{l+1}^{(n)}\middle|\psi_{l+1}^{(n)}\right\rangle. \qquad(3.270)
\end{aligned}$$
In the next step, using $b_{l+2}^\dagger b_{l+2} = b_{l+3}b_{l+3}^\dagger + \varepsilon^{(l+2)} - \varepsilon^{(l+1)}$, we have
$$\begin{aligned}
\left\langle\psi_l^{(n)}\middle|\psi_l^{(n)}\right\rangle &= \left\langle\psi_{n-1}^{(n)}\middle|\,b_{n-1}^\dagger\cdots b_{l+2}^\dagger b_{l+2}b_{l+3}b_{l+3}^\dagger b_{l+3}\cdots b_{n-1}\,\middle|\psi_{n-1}^{(n)}\right\rangle\\
&\qquad+\left[\varepsilon^{(l+1)} - \varepsilon^{(l)}\right]\left\langle\psi_{l+1}^{(n)}\middle|\psi_{l+1}^{(n)}\right\rangle + \left[\varepsilon^{(l+2)} - \varepsilon^{(l+1)}\right]\left\langle\psi_{l+1}^{(n)}\middle|\psi_{l+1}^{(n)}\right\rangle. \qquad(3.271)
\end{aligned}$$
Thus, we find that in the first term the index number of b_{l+3}^† has been increased by one, with the operator itself transferred toward the right side. On the other hand, we notice that in the second and third terms, ε^{(l+1)}⟨ψ_{l+1}^{(n)}|ψ_{l+1}^{(n)}⟩ cancels out. Repeating the above processes, we reach the following expression:
$$\begin{aligned}
\left\langle\psi_l^{(n)}\middle|\psi_l^{(n)}\right\rangle &= \left\langle\psi_{n-1}^{(n)}\middle|\,b_{n-1}^\dagger\cdots b_{l+2}^\dagger b_{l+2}\cdots b_{n-1}b_nb_n^\dagger\,\middle|\psi_{n-1}^{(n)}\right\rangle\\
&\qquad+\left[\varepsilon^{(n-1)} - \varepsilon^{(n-2)}\right]\left\langle\psi_{l+1}^{(n)}\middle|\psi_{l+1}^{(n)}\right\rangle + \left[\varepsilon^{(n-2)} - \varepsilon^{(n-3)}\right]\left\langle\psi_{l+1}^{(n)}\middle|\psi_{l+1}^{(n)}\right\rangle + \cdots\\
&\qquad+\left[\varepsilon^{(l+2)} - \varepsilon^{(l+1)}\right]\left\langle\psi_{l+1}^{(n)}\middle|\psi_{l+1}^{(n)}\right\rangle + \left[\varepsilon^{(l+1)} - \varepsilon^{(l)}\right]\left\langle\psi_{l+1}^{(n)}\middle|\psi_{l+1}^{(n)}\right\rangle\\
&= \left[\varepsilon^{(n-1)} - \varepsilon^{(l)}\right]\left\langle\psi_{l+1}^{(n)}\middle|\psi_{l+1}^{(n)}\right\rangle. \qquad(3.272)
\end{aligned}$$
By the assumption of the induction, we have
$$\left\langle\psi_{l+1}^{(n)}\middle|\psi_{l+1}^{(n)}\right\rangle = \left[\varepsilon^{(n-1)} - \varepsilon^{(n-2)}\right]\left[\varepsilon^{(n-1)} - \varepsilon^{(n-3)}\right]\cdots\left[\varepsilon^{(n-1)} - \varepsilon^{(l+1)}\right]\left\langle\psi_{n-1}^{(n)}\middle|\psi_{n-1}^{(n)}\right\rangle.$$
Inserting this equation into (3.272), we arrive at (3.268). In other words, we have shown that if (3.268) holds with l + 1, it holds with l as well. This completes the proof that (3.268) is true of ⟨ψ_l^{(n)}|ψ_l^{(n)}⟩ with l down to 0.
The normalized wave functions ψ̃_l^{(n)} are expressed from (3.255) as
$$\widetilde\psi_l^{(n)} = \kappa(n,l)^{-\frac12}\,b_{l+1}b_{l+2}\cdots b_{n-1}\,\widetilde\psi_{n-1}^{(n)}, \qquad(3.273)$$
$$\widetilde\psi_{n-1}^{(n)} = \left(\frac2n\right)^{n+\frac12}\frac1{\sqrt{(2n)!}}\,\rho^ne^{-\rho/n}. \qquad(3.276)$$
$$\widetilde b_l \equiv \left[\varepsilon^{(n-1)} - \varepsilon^{(l-1)}\right]^{-\frac12}b_l. \qquad(3.277)$$
$$\widetilde\psi_l^{(n)} = \widetilde b_{l+1}\widetilde b_{l+2}\cdots\widetilde b_{n-1}\,\widetilde\psi_{n-1}^{(n)}. \qquad(3.278)$$
It will be of great importance to compare the functions ψ̃_l^{(n)} with conventional wave functions that are expressed using associated Laguerre polynomials. For this purpose, we define the following functions Φ_l^{(n)}(ρ) such that
$$\Phi_l^{(n)}(\rho) \equiv \left(\frac2n\right)^{l+\frac32}\sqrt{\frac{(n-l-1)!}{2n\,(n+l)!}}\;e^{-\frac\rho n}\rho^{l+1}\,L_{n-l-1}^{2l+1}\!\left(\frac{2\rho}n\right). \qquad(3.279)$$
$$L_n^\nu(x) = \frac1{n!}\,x^{-\nu}e^x\frac{d^n}{dx^n}\left(x^{n+\nu}e^{-x}\right) \quad (\nu>-1). \qquad(3.280)$$
In the form of a power series expansion, the polynomials are expressed for an integer k ≥ 0 as
$$L_n^k(x) = \sum_{m=0}^n\frac{(-1)^m\,(n+k)!}{(n-m)!\,(k+m)!\,m!}\,x^m. \qquad(3.281)$$
Hence, instead of (3.280) and (3.281), the Rodrigues formula and power series expansion of L_n(x) are given by [2, 5]
$$L_n(x) = \frac1{n!}\,e^x\frac{d^n}{dx^n}\left(x^ne^{-x}\right), \qquad L_n(x) = \sum_{m=0}^n\frac{(-1)^m\,n!}{(n-m)!\,(m!)^2}\,x^m.$$
The function Φ_l^{(n)}(ρ) contains the multiplication factors e^{−ρ/n} and ρ^{l+1}. The function $L_{n-l-1}^{2l+1}\!\left(\frac{2\rho}n\right)$ is a polynomial of ρ with the highest order ρ^{n−l−1}. Therefore, Φ_l^{(n)}(ρ) consists of a summation of terms containing e^{−ρ/n}ρ^t, where t is an integer equal to 1 or larger. Consequently, Φ_l^{(n)}(ρ) → 0 when ρ → 0 and ρ → ∞ (vide supra). Thus, we have confirmed that Φ_l^{(n)}(ρ) certainly satisfies the proper BCs mentioned earlier and, hence, the operator d/dρ is indeed anti-Hermitian. To show this more explicitly, we define D ≡ d/dρ. An inner product between arbitrarily chosen functions f and g is
$$\langle f|Dg\rangle \equiv \int_0^\infty f^*Dg\,d\rho = \left[f^*g\right]_0^\infty - \int_0^\infty(Df)^*g\,d\rho = \left[f^*g\right]_0^\infty - \langle Df|g\rangle, \qquad(3.282)$$
$$\left(b_l^\dagger b_l\right)^\dagger = \left(H^2 - A^2 - AH + HA\right)^\dagger = H^2 - A^2 - H^\dagger A^\dagger + A^\dagger H^\dagger = H^2 - A^2 + HA - AH = b_l^\dagger b_l,$$
where we used A† = −A and H† = H. The Hermiticity is true of b_lb_l^† as well. Thus, the eigenvalue and the eigenstate (or wave function) which belongs to that eigenvalue are physically meaningful.
Next, consider the following operation:
$$\widetilde b_l\,\Phi_l^{(n)}(\rho) = \left[\varepsilon^{(n-1)} - \varepsilon^{(l-1)}\right]^{-\frac12}\left(\frac2n\right)^{l+\frac32}\sqrt{\frac{(n-l-1)!}{2n\,(n+l)!}}\left(\frac{d}{d\rho}+\frac l\rho-\frac1l\right)\left\{e^{-\frac\rho n}\rho^{l+1}\,L_{n-l-1}^{2l+1}\!\left(\frac{2\rho}n\right)\right\}, \qquad(3.286)$$
where
$$\left[\varepsilon^{(n-1)} - \varepsilon^{(l-1)}\right]^{-\frac12} = \frac{nl}{\sqrt{(n+l)(n-l)}}. \qquad(3.287)$$
Rewriting $L_{n-l-1}^{2l+1}\!\left(\frac{2\rho}n\right)$ in a power series expansion form using (3.280) and rearranging the result, we obtain
$$\begin{aligned}
\widetilde b_l\,\Phi_l^{(n)}(\rho) &= \left(\frac2n\right)^{l+\frac32}\frac{nl}{\sqrt{(n+l)(n-l)}}\,\frac1{\sqrt{2n\,(n+l)!\,(n-l-1)!}}\left(\frac{d}{d\rho}+\frac l\rho-\frac1l\right)e^{\frac\rho n}\rho^{-l}\frac{d^{\,n-l-1}}{d\rho^{\,n-l-1}}\left(\rho^{\,n+l}e^{-\frac{2\rho}n}\right)\\
&= \left(\frac2n\right)^{l+\frac32}\frac{nl}{\sqrt{(n+l)(n-l)}}\,\frac1{\sqrt{2n\,(n+l)!\,(n-l-1)!}}\left(\frac{d}{d\rho}+\frac l\rho-\frac1l\right)\\
&\qquad\times\sum_{m=0}^{n-l-1}\frac{(-1)^m(n+l)!}{(2l+m+1)!}\left(\frac2n\right)^m\frac{(n-l-1)!}{m!\,(n-l-m-1)!}\,e^{-\frac\rho n}\rho^{\,l+m+1}, \qquad(3.288)
\end{aligned}$$
$$\begin{aligned}
\widetilde b_l\,\Phi_l^{(n)}(\rho) &= \left(\frac2n\right)^{l+\frac32}\frac{n+l}{2l}\,\frac{nl}{\sqrt{(n+l)(n-l)}}\,\frac1{\sqrt{2n\,(n+l)!\,(n-l-1)!}}\;e^{-\rho/n}\rho^l\\
&\quad\times\Biggl\{\sum_{m=0}^{n-l-1}\frac{(-1)^{m+1}(n+l)!}{(2l+m+1)!}\,\frac{(n-l-1)!}{m!\,(n-l-m-1)!}\left(\frac{2\rho}n\right)^{m+1}\\
&\qquad\quad+2l\sum_{m=0}^{n-l-1}\frac{(-1)^m(n+l-1)!}{(2l+m)!}\,\frac{(n-l-1)!}{m!\,(n-l-m-1)!}\left(\frac{2\rho}n\right)^m\Biggr\}. \qquad(3.289)
\end{aligned}$$
In (3.289), calculation of the part { } next to the multiplication sign on the RHS is somewhat complicated, and so we describe the outline of the calculation procedure below.
{ } of RHS of (3.289)
$$\begin{aligned}
&= \sum_{m=1}^{n-l-1}\frac{(-1)^m(n-l-1)!\,(n+l-1)!}{(2l+m)!}\left(\frac{2\rho}n\right)^m\left[\frac{n+l}{(m-1)!\,(n-l-m)!}+\frac{2l}{m!\,(n-l-m-1)!}\right]\\
&\qquad+(-1)^{n-l}\left(\frac{2\rho}n\right)^{n-l}+\frac{(n+l-1)!}{(2l-1)!}\\
&= \sum_{m=1}^{n-l-1}\frac{(-1)^m(n-l-1)!\,(n+l-1)!}{(2l+m)!}\left(\frac{2\rho}n\right)^m\frac{(n-l)(2l+m)}{m!\,(n-l-m)!}+(-1)^{n-l}\left(\frac{2\rho}n\right)^{n-l}+\frac{(n+l-1)!}{(2l-1)!}\\
&= \sum_{m=1}^{n-l-1}\frac{(-1)^m(n-l)!\,(n+l-1)!}{(2l+m-1)!\,(n-l-m)!\,m!}\left(\frac{2\rho}n\right)^m+(-1)^{n-l}\left(\frac{2\rho}n\right)^{n-l}+\frac{(n+l-1)!}{(2l-1)!}\\
&= (n-l)!\sum_{m=0}^{n-l}\frac{(-1)^m(n+l-1)!}{(2l+m-1)!\,(n-l-m)!\,m!}\left(\frac{2\rho}n\right)^m = (n-l)!\,L_{n-l}^{2l-1}\!\left(\frac{2\rho}n\right). \qquad(3.290)
\end{aligned}$$
Notice that with the second equality of (3.290), the summation is divided into three terms, i.e., 1 ≤ m ≤ n − l − 1, m = n − l (the highest-order term), and m = 0 (the lowest-order term). Note that with the second last equality of (3.290), the highest-order (n − l) term and the lowest-order term (i.e., a constant) have been absorbed into a single expression, namely, an associated Laguerre polynomial. Correspondingly, with the second last equality of (3.290) the summation range is extended to 0 ≤ m ≤ n − l.
Summarizing the above results, we get
$$\begin{aligned}
\widetilde b_l\,\Phi_l^{(n)}(\rho) &= \left(\frac2n\right)^{l+\frac32}\frac{(n+l)\,nl\,(n-l)!}{2l\,\sqrt{(n+l)(n-l)}}\,\frac1{\sqrt{2n\,(n+l)!\,(n-l-1)!}}\;e^{-\rho/n}\rho^l\,L_{n-l}^{2l-1}\!\left(\frac{2\rho}n\right)\\
&= \left(\frac2n\right)^{l+\frac12}\sqrt{\frac{(n-l)!}{2n\,(n+l-1)!}}\;e^{-\rho/n}\rho^l\,L_{n-l}^{2l-1}\!\left(\frac{2\rho}n\right)\\
&= \left(\frac2n\right)^{(l-1)+\frac32}\sqrt{\frac{[n-(l-1)-1]!}{2n\,[n+(l-1)]!}}\;e^{-\frac\rho n}\rho^{(l-1)+1}\,L_{n-(l-1)-1}^{2(l-1)+1}\!\left(\frac{2\rho}n\right) = \Phi_{l-1}^{(n)}(\rho). \qquad(3.291)
\end{aligned}$$
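The ladder relation (3.291) can be confirmed symbolically for a concrete principal quantum number; a sketch assuming SymPy (n = 4 is an arbitrary choice):

```python
import sympy as sp

rho = sp.symbols('rho', positive=True)
n = 4

def Phi(l):
    """Phi_l^(n) of (3.279)."""
    return (sp.Rational(2, n)**(l + sp.Rational(3, 2))
            * sp.sqrt(sp.factorial(n - l - 1)/(2*n*sp.factorial(n + l)))
            * sp.exp(-rho/n)*rho**(l + 1)
            * sp.assoc_laguerre(n - l - 1, 2*l + 1, 2*rho/n))

for l in range(1, n):
    # b_l-tilde of (3.277) with prefactor (3.287), applied to Phi_l, gives Phi_(l-1)
    pref = n*l/sp.sqrt((n + l)*(n - l))
    bl_Phi = pref*(sp.diff(Phi(l), rho) + (sp.S(l)/rho - sp.Rational(1, l))*Phi(l))
    assert sp.simplify(bl_Phi - Phi(l - 1)) == 0
```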
Thus, we find that Φ_l^{(n)}(ρ) behaves exactly like ψ̃_l^{(n)}. Moreover, if we replace l in (3.279) with n − 1, we find
$$\Phi_{n-1}^{(n)}(\rho) = \widetilde\psi_{n-1}^{(n)}. \qquad(3.292)$$
Operating b̃_{n−1} on both sides of (3.292),
$$\Phi_{n-2}^{(n)}(\rho) = \widetilde\psi_{n-2}^{(n)}. \qquad(3.293)$$
$$\Phi_l^{(n)}(\rho) = \widetilde\psi_l^{(n)}(\rho), \qquad(3.294)$$
$$\Phi_l^{(n)}(\rho) \equiv \widetilde\psi_l^{(n)}(\rho). \qquad(3.295)$$
$$R_l^{(n)}(r) = \widetilde\psi_l^{(n)}/\rho. \qquad(3.296)$$
To normalize R_l^{(n)}(r), we have to calculate the following integral:
$$\int_0^\infty\left|R_l^{(n)}(r)\right|^2r^2\,dr = \int_0^\infty\left|\widetilde\psi_l^{(n)}\right|^2\frac1{\rho^2}\left(\frac aZ\right)^2\rho^2\cdot\frac aZ\,d\rho = \left(\frac aZ\right)^3\int_0^\infty\left|\widetilde\psi_l^{(n)}\right|^2d\rho = \left(\frac aZ\right)^3. \qquad(3.297)$$
Accordingly, we choose the following functions R̃_l^{(n)}(r) for the normalized radial wave functions:
$$\widetilde R_l^{(n)}(r) = \sqrt{(Z/a)^3}\,R_l^{(n)}(r). \qquad(3.298)$$
Substituting (3.296) into (3.298) and taking account of (3.279) and (3.280), we obtain
$$\widetilde R_l^{(n)}(r) = \sqrt{\left(\frac{2Z}{an}\right)^3\frac{(n-l-1)!}{2n\,(n+l)!}}\left(\frac{2Zr}{an}\right)^l\exp\left(-\frac{Zr}{an}\right)L_{n-l-1}^{2l+1}\!\left(\frac{2Zr}{an}\right). \qquad(3.299)$$
Equation (3.299) is exactly the same as the normalized radial wave function that is obtained as the solution of (3.241) through the power series expansion. All these functions belong to the same eigenenergy $E_n = -\frac{\hbar^2}{2\mu}\left(\frac Za\right)^2\frac1{n^2}$; see (3.258).
Returning to the quantum states sharing the same eigenenergy, we have n states that belong to the principal quantum number n with varying azimuthal quantum number l (0 ≤ l ≤ n − 1). Meanwhile, from (3.158), (3.159), and (3.171), 2l + 1 quantum states possess an eigenvalue of l(l + 1) with the quantity M². As already mentioned in Sects. 3.5 and 3.6, the integer m takes the 2l + 1 different values of
$$m = -l,\,-l+1,\,-l+2,\,\cdots,\,-1,\,0,\,1,\,\cdots,\,l-1,\,l.$$
Regarding this situation, we say that the quantum states belonging to the principal quantum number n, or the energy eigenvalue $E_n = -\frac{\hbar^2}{2\mu}\left(\frac Za\right)^2\frac1{n^2}$, are degenerate n²-fold. Note that we ignored the freedom of the spin state.
In summary of this section, we have developed the operator formalism in dealing with the radial wave functions of hydrogen-like atoms and seen how the operator formalism features the radial wave functions. The essential point rests upon the fact that the radial wave functions can be derived by successively operating the lowering operators b_l on ψ̃_{n−1}^{(n)}, which is parametrized with a principal quantum number n and an orbital angular momentum quantum number l = n − 1. This is clearly represented by (3.278). The results agree with the conventional coordinate representation method based upon the power series expansion that leads to associated Laguerre polynomials. Thus, the operator formalism is again found to be powerful in explicitly representing the mathematical constitution of quantum-mechanical systems.
Since we have obtained the angular wave functions and the radial wave functions, we describe the normalized total wave functions Λ̃_{l,m}^{(n)} of hydrogen-like atoms as a product of the angular part and the radial part such that
$$\widetilde\Lambda_{l,m}^{(n)} = Y_l^m(\theta,\phi)\,\widetilde R_l^{(n)}(r).$$
For example,
$$\phi(2p_z) \equiv Y_1^0(\theta,\phi)\,\widetilde R_1^{(2)}(r) = \frac1{4\sqrt{2\pi}}\,a^{-\frac32}\,\frac ra\,e^{-\frac r{2a}}\cos\theta = \frac1{4\sqrt{2\pi}}\,a^{-\frac32}\,\frac ra\,e^{-\frac r{2a}}\,\frac zr = \frac1{4\sqrt{2\pi}}\,a^{-\frac52}\,e^{-\frac r{2a}}\,z. \qquad(3.303)$$
$$\phi\!\left(2p_{x+iy}\right) \equiv Y_1^1(\theta,\phi)\,\widetilde R_1^{(2)}(r) = -\frac1{8\sqrt\pi}\,a^{-\frac32}\,\frac ra\,e^{-\frac r{2a}}\sin\theta\,e^{i\phi} = -\frac1{8\sqrt\pi}\,a^{-\frac32}\,\frac ra\,e^{-\frac r{2a}}\,\frac{x+iy}r = -\frac1{8\sqrt\pi}\,a^{-\frac52}\,e^{-\frac r{2a}}\,(x+iy). \qquad(3.304)$$
In (3.304), the minus sign comes from the Condon–Shortley phase. Furthermore, we have
$$\phi\!\left(2p_{x-iy}\right) \equiv Y_1^{-1}(\theta,\phi)\,\widetilde R_1^{(2)}(r) = \frac1{8\sqrt\pi}\,a^{-\frac32}\,\frac ra\,e^{-\frac r{2a}}\sin\theta\,e^{-i\phi} = \frac1{8\sqrt\pi}\,a^{-\frac32}\,\frac ra\,e^{-\frac r{2a}}\,\frac{x-iy}r = \frac1{8\sqrt\pi}\,a^{-\frac52}\,e^{-\frac r{2a}}\,(x-iy). \qquad(3.305)$$
Notice that the above notations ϕ(2p_{x+iy}) and ϕ(2p_{x−iy}) differ from the custom that uses, e.g., ϕ(2p_x) and ϕ(2p_y). We will come back to this point in Sect. 4.3.
References
4.1 Electric Dipole Transition

$$H\psi = i\hbar\frac{\partial\psi}{\partial t}. \qquad(1.47)$$
$$i\hbar\frac{\partial\xi(t)}{\partial t} = E\xi(t). \qquad(1.56)$$
The probability density of the system (i.e., normally a particle such as an electron or a harmonic oscillator) residing at a certain place x at a certain time t is expressed as
$$\psi^*(x,t)\,\psi(x,t).$$
This means that the probability density of the system depends only on the spatial coordinate and is constant in time. Such a state is said to be a stationary state. That is, the system continues residing in a quantum state described by ϕ(x) and remains unchanged independent of time.
Next, we consider a linear combination of functions described by (1.60). That is, we have
$$\psi(x,t) = c_1\phi_1(x)\exp(-iE_1t/\hbar) + c_2\phi_2(x)\exp(-iE_2t/\hbar), \qquad(4.2)$$
where the first term is pertinent to the state 1 and the second term to the state 2; c₁ and c₂ are complex constants with respect to the spatial coordinates but may be weakly time-dependent. The state described by (4.2) is called a coherent state. The probability distribution of that state is described as
$$\psi^*(x,t)\,\psi(x,t) = |c_1|^2|\phi_1|^2 + |c_2|^2|\phi_2|^2 + c_1^*c_2\,\phi_1^*\phi_2\,e^{-i\omega t} + c_2^*c_1\,\phi_2^*\phi_1\,e^{i\omega t}, \qquad(4.3)$$
where ω is expressed as
$$\omega = (E_2 - E_1)/\hbar. \qquad(4.4)$$
This equation shows that the probability density of the system undergoes a sinusoidal oscillation with time. The angular frequency equals the energy difference between the two states divided by the reduced Planck constant. If the system is a charged particle such as an electron or a proton, the sinusoidal oscillation is accompanied by an oscillating electromagnetic field. Thus, the coherent state is associated with the optical transition from one state to another, when the transition involves the charged particle.
The optical transitions result from various causes. Of these, the electric dipole
transition yields the largest transition probability and the dipole approximation is
often chosen to represent the transition probability. From the point of view of optical
measurements, the electric dipole transition gives the strongest absorption or emis-
sion spectral lines. The matrix element of the electric dipole, more specifically a
square of an absolute value of the matrix element, is a measure of the optical
transition probability. Labelling the quantum states as a, b, etc. and describing the corresponding state vectors as |a⟩, |b⟩, etc., the matrix element P_ba is given by

$$P_{ba} = \langle b\,|\,\boldsymbol{\varepsilon}_e\cdot\boldsymbol{P}\,|\,a\rangle \qquad (4.5)$$

with

$$\boldsymbol{P} \equiv e\sum_j \boldsymbol{x}_j, \qquad (4.6)$$
where e is an elementary charge (e < 0) and xj is a position vector of the j-th electron.
Detailed descriptions of εe and P can be seen in Chap. 7. The quantity P_ba is said to be the transition dipole moment, or, more precisely, the transition electric dipole moment with respect to the states |a⟩ and |b⟩. We assume that the optical transition occurs from a
quantum state |a⟩ to another state |b⟩. Since P_ba is generally a complex number, |P_ba|² represents the transition probability.
If we adopt the coordinate representation, (4.5) is expressed by

$$P_{ba} = \int \phi_b^*\,\boldsymbol{\varepsilon}_e\cdot\boldsymbol{P}\,\phi_a\,d\tau, \qquad (4.7)$$

$$\psi(x,t) = \frac{1}{\sqrt{2}}\left[\phi_1(x)\exp(-iE_1t/\hbar) + \phi_2(x)\exp(-iE_2t/\hbar)\right], \qquad (4.8)$$
$$\phi_1(x) = \sqrt{\frac{2}{\pi}}\,\cos x \quad\text{and}\quad \phi_2(x) = \sqrt{\frac{2}{\pi}}\,\sin 2x. \qquad (4.9)$$
$$\psi^*(x,t)\psi(x,t) = \frac{1}{\pi}\left[\cos^2 x + \sin^2 2x + (\sin 3x + \sin x)\cos\omega t\right], \qquad (4.10)$$
$$\omega = 3\hbar/2m, \qquad (4.11)$$
[Fig. 4.1  Probability density ψ*(x, t)ψ(x, t) of a particle confined in a square-well potential. The solid curve and the broken curve represent the density at t = 0 and t = π/ω (i.e., half period), respectively; the abscissa spans −π/2 to π/2.]
$$\int_{-\pi/2}^{0}\psi^*(x,0)\psi(x,0)\,dx = \frac{1}{2} - \frac{4}{3\pi} \approx 0.076. \qquad (4.14)$$
Thus, 92% of the total charge (as a probability density) is concentrated in the positive domain. Differentiation of ψ*(x, 0)ψ(x, 0) gives five extremals including both edges. Of these, the major maximum is located at 0.635 radian, which corresponds to about 40% of π/2. This can be a measure of the transition moment. Figure 4.1 demonstrates these results (see the solid curve). Meanwhile, putting t = π/ω (i.e., half period), we plot ψ*(x, π/ω)ψ(x, π/ω). The resulting graph is obtained by folding back the solid curve of Fig. 4.1 with respect to the ordinate axis. Thus, we find that the charge (or the probability density) undergoes a sinusoidal oscillation with an angular frequency 3ħ/2m along the x-axis around the origin.
Let e₁ be a unit vector in the positive direction of the x-axis. Then, the electric dipole P of the system is

$$\boldsymbol{P} = e\boldsymbol{x} = ex\,\boldsymbol{e}_1, \qquad (4.15)$$

where x is the position vector of the electron. Let us define the matrix element of the electric dipole transition as
Notice that we only have to consider that the polarization of light is parallel to the x-
axis. With the coordinate representation, we have
$$P_{21} = \int_{-\pi/2}^{\pi/2}\phi_2(x)\,ex\,\phi_1(x)\,dx = \int_{-\pi/2}^{\pi/2}\sqrt{\frac{2}{\pi}}(\sin 2x)\,ex\,\sqrt{\frac{2}{\pi}}\cos x\,dx$$
$$= \frac{2e}{\pi}\int_{-\pi/2}^{\pi/2} x\sin 2x\cos x\,dx = \frac{e}{\pi}\int_{-\pi/2}^{\pi/2} x\,(\sin x + \sin 3x)\,dx = \frac{16e}{9\pi}, \qquad (4.17)$$
where we used a trigonometric formula and integration by parts. The factor 16/9π in (4.17) is about 36% of π/2. This number is in pretty good agreement with the 40% estimated above from the major maximum of ψ*(x, 0)ψ(x, 0). Note that the transition
moment vanishes if the two states associated with the transition have the same parity.
In other words, if these are both described by sine functions or cosine functions, the
integral vanishes.
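The numbers quoted in Example 4.1 can be reproduced numerically. The short check below (not part of the original text; it sets e = 1, so moments come out in units of e) evaluates (4.14), locates the major maximum near 0.635 rad, and computes the transition moment (4.17):

```python
# Verify the square-well coherent-state results of Example 4.1 (e = 1 assumed).
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar

# psi* psi at t = 0, from (4.10) with cos(omega t) = 1
rho0 = lambda x: (np.cos(x)**2 + np.sin(2*x)**2 + np.sin(3*x) + np.sin(x)) / np.pi

neg = quad(rho0, -np.pi/2, 0)[0]            # (4.14): 1/2 - 4/(3 pi) ≈ 0.076
print(neg)

xmax = minimize_scalar(lambda x: -rho0(x), bounds=(0, np.pi/2),
                       method="bounded").x  # major maximum, ≈ 0.635 rad
print(xmax)

# Transition moment (4.17), P21 = (2/pi) * integral of x sin(2x) cos(x)
P21 = quad(lambda x: (2/np.pi) * x * np.sin(2*x) * np.cos(x),
           -np.pi/2, np.pi/2)[0]            # = 16/(9 pi) ≈ 0.566
print(P21)
```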
Example 4.2: One-Dimensional Harmonic Oscillator Second, let us think of an
optical transition regarding a harmonic oscillator that we dealt with in Chap. 2. We
denote the state of the oscillator as |n⟩ in place of |ψ_n⟩ (n = 0, 1, 2, …) of Chap. 2. Then, the general expression (4.5) can be written as
$$\boldsymbol{\varepsilon}_e = \boldsymbol{e}_1 \quad\text{and}\quad \boldsymbol{P} = eq\,\boldsymbol{e}_1, \qquad (4.19)$$

$$\boldsymbol{\varepsilon}_e\cdot\boldsymbol{P} = eq. \qquad (4.20)$$

That is,

$$\langle k\,|\,a = \sqrt{k+1}\,\langle k+1\,|. \qquad (4.24)$$
$$P_{kl} = P_{lk}.$$
The matrix element P_kl is symmetric with respect to the indices k and l. Notice that the first term does not vanish only when k + 1 = l. The second term does not vanish only when k = l + 1. Therefore, we get
$$P_{k,k+1} = e\sqrt{\frac{\hbar(k+1)}{2m\omega}} \quad\text{and}\quad P_{l+1,l} = e\sqrt{\frac{\hbar(l+1)}{2m\omega}}, \quad\text{or}\quad P_{k+1,k} = e\sqrt{\frac{\hbar(k+1)}{2m\omega}}, \qquad (4.27)$$
where we used (2.68). Note that a real Hermitian matrix is a symmetric matrix. Practically, a fast way to construct the transition matrix (4.28) is to use (2.65) and (2.66). It is an intuitively obvious and straightforward task. A glance at the matrix form immediately tells us that the transition matrix elements are nonvanishing only at the (k, k + 1) and (k + 1, k) positions. Whereas the (k, k + 1)-element represents the transition from the k-th excited state to the (k − 1)-th excited state accompanied by photoemission, the (k + 1, k)-element implies the transition from the (k − 1)-th excited state to the k-th excited state accompanied by photoabsorption. The two transitions give the same transition moment. Note that the zeroth excited state means the ground state; see (2.64) for basis vector representations.
We should be careful about “addresses” of the matrix accordingly. For example,
P0, 1 in (4.27) represents a (1, 2) element of the matrix (4.28); P2, 1 stands for a (3, 2)
element.
Suppose that we seek the transition dipole moments using coordinate represen-
tation. Then, we need to use (2.106) and perform definite integration. For instance,
we have
$$e\int_{-\infty}^{\infty}\psi_0(q)\,q\,\psi_1(q)\,dq,$$

which corresponds to the (1, 2) element of (4.28). Indeed, the above integral gives $e\sqrt{\hbar/2m\omega}$.
The confirmation is left for the readers. Nonetheless, seeking the definite integrals of products of higher excited-state wave functions becomes increasingly troublesome. In this respect, the operator method described above provides us with much better insight into the complicated calculations.
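The operator-method construction can be sketched numerically. The following check (not part of the original text; it assumes ħ = m = ω = e = 1, so that e√(ħ(k+1)/2mω) reduces to √((k+1)/2)) builds q = (a + a†)/√2 in the number basis and confirms that only the (k, k+1) and (k+1, k) elements survive:

```python
# Transition matrix of the harmonic oscillator via ladder operators
# (assumption: hbar = m = omega = e = 1).
import numpy as np

N = 6
a = np.diag(np.sqrt(np.arange(1, N)), k=1)   # annihilation: <k-1|a|k> = sqrt(k)
q = (a + a.T) / np.sqrt(2)                   # q = (a + a^dagger)/sqrt(2)
P = q                                        # transition matrix with e = 1

print(np.round(P, 3))
# Only P[k, k+1] = P[k+1, k] = sqrt((k+1)/2) are nonzero.
for k in range(N - 1):
    assert abs(P[k, k+1] - np.sqrt((k + 1) / 2)) < 1e-12
```

The matrix is real and symmetric (a real Hermitian matrix), in agreement with P_kl = P_lk.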
Equations (4.26)–(4.28) imply that the electric dipole transition is allowed to
occur only when the quantum number changes by one. Notice also that the transition
takes place between the even function and odd function; see Table 2.1 and (2.101).
Such a condition or restriction on the optical transition is called a selection rule. The
former equation of (4.27) shows that the transition takes place from the upper state to
the lower state accompanied by the photon emission. The latter equation, on the
other hand, shows that the transition takes place from the lower state to the upper
accompanied by the photon absorption.
4.3 Three-Dimensional System

The hydrogen-like atoms give us a typical example. Since we have fully investigated the quantum states of those atoms, we make the most of the related results.

Example 4.3: An Electron in a Hydrogen Atom Unlike the one-dimensional system, we have to take account of an angular momentum in the three-dimensional system. We have already obtained explicit wave functions. Here we focus on the 1s and 2p states of a hydrogen atom. For their normalized states we have

$$\phi(1s) = \sqrt{\frac{1}{\pi a^3}}\,e^{-r/a}, \qquad (4.29)$$

$$\phi(2p_z) = \frac{1}{4}\sqrt{\frac{1}{2\pi a^3}}\,\frac{r}{a}\,e^{-r/2a}\cos\theta, \qquad (4.30)$$

$$\phi(2p_{x+iy}) = -\frac{1}{8}\sqrt{\frac{1}{\pi a^3}}\,\frac{r}{a}\,e^{-r/2a}\sin\theta\,e^{i\phi}, \qquad (4.31)$$

$$\phi(2p_{x-iy}) = \frac{1}{8}\sqrt{\frac{1}{\pi a^3}}\,\frac{r}{a}\,e^{-r/2a}\sin\theta\,e^{-i\phi}, \qquad (4.32)$$
where a denotes the Bohr radius of a hydrogen atom. Note that the minus sign of ϕ(2p_{x+iy}) is due to the Condon–Shortley phase. Even though the transition probability is proportional to the square of the absolute value of the matrix element, so that the phase factor cancels out, we describe the state vector faithfully. The energy eigenvalues are
$$E(1s) = -\frac{\hbar^2}{2\mu a^2}, \qquad E(2p_z) = E(2p_{x+iy}) = E(2p_{x-iy}) = -\frac{\hbar^2}{8\mu a^2}, \qquad (4.33)$$
where μ is a reduced mass of a hydrogen. Note that the latter three states are
degenerate.
First, we consider a transition between ϕ(1s) and ϕ(2pz) states. Suppose that the
normalized coherent state is described as
$$\psi(x,t) = \frac{1}{\sqrt2}\left\{\phi(1s)\exp\left[-iE(1s)t/\hbar\right] + \phi(2p_z)\exp\left[-iE(2p_z)t/\hbar\right]\right\}. \qquad (4.34)$$
As before, we have
where ω is given by

$$\omega = \left[E(2p_z) - E(1s)\right]/\hbar = 3\hbar/8\mu a^2. \qquad (4.36)$$

In virtue of the third term of (4.35), which contains a cos ωt factor, the charge distribution undergoes a sinusoidal oscillation along the z-axis with the angular frequency described by (4.36). For instance, cos ωt gives a factor +1 to (4.35) when t = 0, whereas it gives a factor −1 when ωt = π, i.e., t = 8πμa²/3ħ.
Integrating (4.35), we have

$$\int\psi^*(x,t)\psi(x,t)\,d\tau = \int|\psi(x,t)|^2\,d\tau = \frac{1}{2}\int\left\{\left[\phi(1s)\right]^2 + \left[\phi(2p_z)\right]^2\right\}d\tau + \cos\omega t\int\phi(1s)\phi(2p_z)\,d\tau = \frac{1}{2} + \frac{1}{2} = 1,$$
where we used the normalized functional forms of ϕ(1s) and ϕ(2p_z) together with their orthogonality. Note that both functions are real.
Next, we calculate the matrix element. For simplicity, we denote the matrix element simply as P(εe), designating only the unit polarization vector εe. Then, we have
where
$$\boldsymbol{P} = e\boldsymbol{x} = e\,(\boldsymbol{e}_1\ \boldsymbol{e}_2\ \boldsymbol{e}_3)\begin{pmatrix} x\\ y\\ z\end{pmatrix}. \qquad (4.38)$$
We have three possibilities of choosing εe out of e1, e2, and e3. Choosing e3, we have
$$P_{z,|p_z\rangle}^{(e_3)} = e\,\langle\phi(1s)|z|\phi(2p_z)\rangle = \frac{e}{4\sqrt2\,\pi a^4}\int_0^\infty r^4e^{-3r/2a}\,dr\int_0^\pi\cos^2\theta\sin\theta\,d\theta\int_0^{2\pi}d\phi = \frac{2^7\sqrt2}{3^5}\,ea \approx 0.745\,ea. \qquad (4.39)$$

In (4.39), we express the matrix element as $P_{z,|p_z\rangle}^{(e_3)}$ to indicate the z-component of the position vector and to show explicitly that the ϕ(2p_z) state is responsible for the transition. In (4.39), we used z = r cos θ. We also used a radial integration such that

$$\int_0^\infty r^4 e^{-3r/2a}\,dr = 24\left(\frac{2a}{3}\right)^5.$$
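The value 2⁷√2/3⁵ in (4.39) is easy to confirm numerically (a check of mine, not part of the text; a = e = 1 assumed):

```python
# Verify the 1s -> 2p_z transition dipole (4.39) in units of e*a, with a = 1.
import numpy as np
from scipy.integrate import quad

a = 1.0
radial = quad(lambda r: r**4 * np.exp(-3*r/(2*a)), 0, np.inf)[0]
assert abs(radial - 24*(2*a/3)**5) < 1e-9            # quoted radial integral
theta = quad(lambda t: np.cos(t)**2 * np.sin(t), 0, np.pi)[0]   # = 2/3
Pz = radial * theta * 2*np.pi / (4*np.sqrt(2)*np.pi*a**4)
print(Pz)   # ≈ 0.745 = 2^7 sqrt(2) / 3^5
```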
Equation (4.39) represents the transition from |ϕ(2p_z)⟩ to |ϕ(1s)⟩ that is accompanied by the photon emission. Thus, |p_z⟩ in the notation means that |ϕ(2p_z)⟩ is the initial state. In the notation, in turn, (e₃) denotes the polarization vector and z represents the electric dipole.

In the case of photon absorption, where the transition occurs from |ϕ(1s)⟩ to |ϕ(2p_z)⟩, we use the following notation:

$$P_{z,\langle p_z|}^{(e_3)} = e\,\langle\phi(2p_z)|z|\phi(1s)\rangle. \qquad (4.40)$$

Since all the functions related to the integration are real, we have

$$P_{z,|p_z\rangle}^{(e_3)} = P_{z,\langle p_z|}^{(e_3)}.$$
Meanwhile, choosing e₁, we have

$$P_{x,|p_z\rangle}^{(e_1)} = e\,\langle\phi(1s)|x|\phi(2p_z)\rangle = \frac{e}{4\sqrt2\,\pi a^4}\int_0^\infty r^4e^{-3r/2a}\,dr\int_0^\pi\sin^2\theta\cos\theta\,d\theta\int_0^{2\pi}\cos\phi\,d\phi = 0, \qquad (4.41)$$

where cos ϕ comes from x = r sin θ cos ϕ and the integration of cos ϕ gives zero. In a similar manner, we have

$$P_{y,|p_z\rangle}^{(e_2)} = e\,\langle\phi(1s)|y|\phi(2p_z)\rangle = 0. \qquad (4.42)$$
Next, we estimate the matrix elements associated with 2px and 2py. For this
purpose, it is convenient to introduce the following complex coordinates by a unitary
transformation:
$$(\boldsymbol{e}_1\ \boldsymbol{e}_2\ \boldsymbol{e}_3)\begin{pmatrix} x\\ y\\ z\end{pmatrix} = (\boldsymbol{e}_1\ \boldsymbol{e}_2\ \boldsymbol{e}_3)\begin{pmatrix} \dfrac{1}{\sqrt2} & \dfrac{1}{\sqrt2} & 0\\[4pt] -\dfrac{i}{\sqrt2} & \dfrac{i}{\sqrt2} & 0\\[4pt] 0 & 0 & 1\end{pmatrix}\begin{pmatrix} \dfrac{1}{\sqrt2} & \dfrac{i}{\sqrt2} & 0\\[4pt] \dfrac{1}{\sqrt2} & -\dfrac{i}{\sqrt2} & 0\\[4pt] 0 & 0 & 1\end{pmatrix}\begin{pmatrix} x\\ y\\ z\end{pmatrix}$$
$$= \left(\frac{1}{\sqrt2}(\boldsymbol{e}_1 - i\boldsymbol{e}_2)\ \ \frac{1}{\sqrt2}(\boldsymbol{e}_1 + i\boldsymbol{e}_2)\ \ \boldsymbol{e}_3\right)\begin{pmatrix} \dfrac{1}{\sqrt2}(x+iy)\\[4pt] \dfrac{1}{\sqrt2}(x-iy)\\[4pt] z\end{pmatrix}, \qquad (4.43)$$
$$U^\dagger U = UU^\dagger = E. \qquad (4.44)$$
We will investigate details of the unitary transformation and matrix in Parts III
and IV.
We define e₊ and e₋ as follows [1]:
$$\boldsymbol{e}_+ \equiv \frac{1}{\sqrt2}(\boldsymbol{e}_1 + i\boldsymbol{e}_2) \quad\text{and}\quad \boldsymbol{e}_- \equiv \frac{1}{\sqrt2}(\boldsymbol{e}_1 - i\boldsymbol{e}_2), \qquad (4.45)$$

where the complex vectors e₊ and e₋ represent the left-circularly polarized light and the right-circularly polarized light that carry an angular momentum ħ and −ħ, respectively. We will revisit the characteristics and implications of these complex vectors in Sect. 7.4.
We have

$$(\boldsymbol{e}_1\ \boldsymbol{e}_2\ \boldsymbol{e}_3)\begin{pmatrix} x\\ y\\ z\end{pmatrix} = (\boldsymbol{e}_-\ \boldsymbol{e}_+\ \boldsymbol{e}_3)\begin{pmatrix} \dfrac{1}{\sqrt2}(x+iy)\\[4pt] \dfrac{1}{\sqrt2}(x-iy)\\[4pt] z\end{pmatrix}. \qquad (4.46)$$
In this situation, e₊, e₋, and e₃ are said to form an orthonormal basis in a three-dimensional complex vector space (see Sect. 11.4).
Now, choosing e₊ for εe, we have [2]

$$P_{x-iy,|p_+\rangle}^{(e_+)} \equiv e\left\langle\phi(1s)\left|\tfrac{1}{\sqrt2}(x-iy)\right|\phi(2p_{x+iy})\right\rangle, \qquad (4.48)$$

$$P_{x-iy,|p_+\rangle}^{(e_+)} = -\frac{e}{8\sqrt2\,\pi a^4}\int_0^\infty r^4 e^{-3r/2a}\,dr\int_0^\pi\sin^3\theta\,d\theta\int_0^{2\pi}e^{-i\phi}e^{i\phi}\,d\phi = -\frac{2^7\sqrt2}{3^5}\,ea, \qquad (4.49)$$
where we used

$$x - iy = r\sin\theta\,e^{-i\phi}.$$

In the definite integral of (4.49), e^{−iϕ} comes from x − iy, while e^{iϕ} comes from ϕ(2p_{x+iy}). Note that from (3.24) e^{iϕ} is an eigenfunction corresponding to an angular momentum eigenvalue ħ. Notice that in (4.49) the exponents e^{−iϕ} and e^{iϕ} cancel out, so that the azimuthal integral is nonvanishing.
If we choose e₋ for εe, we have

$$P_{x+iy,|p_+\rangle}^{(e_-)} \equiv e\left\langle\phi(1s)\left|\tfrac{1}{\sqrt2}(x+iy)\right|\phi(2p_{x+iy})\right\rangle \qquad (4.51)$$

$$= -\frac{e}{8\sqrt2\,\pi a^4}\int_0^\infty r^4 e^{-3r/2a}\,dr\int_0^\pi\sin^3\theta\,d\theta\int_0^{2\pi}e^{2i\phi}\,d\phi = 0, \qquad (4.52)$$

where we used

$$x + iy = r\sin\theta\,e^{i\phi}.$$
With (4.52), a factor e^{2iϕ} results from the product (x + iy)ϕ(2p_{x+iy}), which renders the integral (4.51) vanishing. Note that the only difference between (4.49) and (4.52) is the integration over ϕ. For the same reason, if we choose e₃ for εe, the matrix element vanishes. Thus, among the ϕ(2p_{x+iy})-related matrix elements, only $P_{x-iy,|p_+\rangle}^{(e_+)}$ survives. Similarly, among the ϕ(2p_{x−iy})-related matrix elements, only $P_{x+iy,|p_-\rangle}^{(e_-)}$ survives. Notice that |p₋⟩ is a shorthand notation of ϕ(2p_{x−iy}). That is, we have, e.g.,
$$P_{x+iy,|p_-\rangle}^{(e_-)} = e\left\langle\phi(1s)\left|\tfrac{1}{\sqrt2}(x+iy)\right|\phi(2p_{x-iy})\right\rangle = \frac{2^7\sqrt2}{3^5}\,ea,$$

$$P_{x-iy,|p_-\rangle}^{(e_+)} = e\left\langle\phi(1s)\left|\tfrac{1}{\sqrt2}(x-iy)\right|\phi(2p_{x-iy})\right\rangle = 0. \qquad (4.53)$$
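The way the azimuthal factor decides which circular components survive can be checked directly (a numerical sketch of mine, not part of the text; a = e = 1, and only the real part of the azimuthal integral is evaluated since the imaginary part vanishes identically here):

```python
# Circular-polarization matrix elements (4.49) and (4.52), a = e = 1.
import numpy as np
from scipy.integrate import quad

radial = quad(lambda r: r**4 * np.exp(-1.5*r), 0, np.inf)[0]    # 24 (2/3)^5
polar = quad(lambda t: np.sin(t)**3, 0, np.pi)[0]               # = 4/3
pref = radial * polar / (8*np.sqrt(2)*np.pi)

def azim(n):   # real part of the integral of exp(i n phi) over [0, 2 pi]
    return quad(lambda p: np.cos(n*p), 0, 2*np.pi)[0]

P_plus  = -pref * azim(0)   # (4.49): e^{-i phi} e^{+i phi} -> n = 0, survives
P_minus = -pref * azim(2)   # (4.52): e^{2 i phi} -> vanishes
print(P_plus, P_minus)      # ≈ -0.745 and 0.0
```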
For the reverse (absorption) process, in turn, we have

$$P_{x+iy,\langle p_+|}^{(e_-)} = e\left\langle\phi(2p_{x+iy})\left|\tfrac{1}{\sqrt2}(x+iy)\right|\phi(1s)\right\rangle = -\frac{2^7\sqrt2}{3^5}\,ea. \qquad (4.54)$$
Here recall (1.116) and (x − iy)† = x + iy. Also note that since $P_{x-iy,|p_+\rangle}^{(e_+)}$ is real, $\left[P_{x-iy,|p_+\rangle}^{(e_+)}\right]^*$ is real as well, so that we have

$$P_{x-iy,|p_+\rangle}^{(e_+)} = \left[P_{x-iy,|p_+\rangle}^{(e_+)}\right]^* = P_{x+iy,\langle p_+|}^{(e_-)}. \qquad (4.55)$$
Comparing (4.48) and (4.55), we notice that the polarization vector has been switched from e₊ to e₋ for the allowed transition, even though the matrix element remains the same. This can be explained as follows: In (4.48) the photon emission is occurring, while the electron is causing a transition from ϕ(2p_{x+iy}) to ϕ(1s). As a result, the radiation field has gained an angular momentum ħ during the process in which the electron has lost an angular momentum ħ. In other words, ħ is transferred from the electron to the radiation field, and this process results in the generation of left-circularly polarized light in the radiation field.

In (4.54), on the other hand, the reversed process takes place. That is, the photon absorption occurs in such a way that the electron is excited from ϕ(1s) to ϕ(2p_{x+iy}). After this process has been completed, the electron has gained an angular momentum ħ, whereas the radiation field has lost an angular momentum ħ. As a result, the positive angular momentum ħ is transferred to the electron from the radiation field that involves the left-circularly polarized light. This can be translated into the statement that the radiation field has gained an angular momentum −ħ. This is equivalent to the generation of right-circularly polarized light (characterized by e₋) in the radiation field. In other words, the electron gains the angular momentum ħ to compensate the change in the radiation field.
The implication of the first equation of (4.53) can be interpreted in a similar
manner. Also we have
$$P_{x+iy,|p_-\rangle}^{(e_-)} = \left[P_{x+iy,|p_-\rangle}^{(e_-)}\right]^* = P_{x-iy,\langle p_-|}^{(e_+)} = e\left\langle\phi(2p_{x-iy})\left|\tfrac{1}{\sqrt2}(x-iy)\right|\phi(1s)\right\rangle = \frac{2^7\sqrt2}{3^5}\,ea.$$
Notice that the inner products of (4.49) and (4.53) are real, even though the operators x + iy and x − iy are not Hermitian. Also note that $P_{x-iy,|p_+\rangle}^{(e_+)}$ of (4.49) and $P_{x+iy,|p_-\rangle}^{(e_-)}$ of (4.53) have the same absolute value with minus and plus signs, respectively. The minus sign of (4.49) comes from the Condon–Shortley phase. The difference, however, is not essential, because the transition probability is proportional to $\left|P_{x+iy,|p_-\rangle}^{(e_-)}\right|^2$ or $\left|P_{x-iy,|p_+\rangle}^{(e_+)}\right|^2$. Some literature [3, 4] uses −(x + iy) instead of x + iy. This is simply because of the inclusion of the Condon–Shortley phase; see (3.304).
Let us think of the coherent state that is composed of ϕ(1s) and ϕ(2p_{x+iy}) or ϕ(2p_{x−iy}). Choosing ϕ(2p_{x+iy}), the state ψ(x, t) can be given by

$$\psi(x,t) = \frac{1}{\sqrt2}\left\{\phi(1s)\exp\left[-iE(1s)t/\hbar\right] + \phi(2p_{x+iy})\exp\left[-iE(2p_{x+iy})t/\hbar\right]\right\}, \qquad (4.56)$$

where ϕ(1s) is described by (3.301) and ϕ(2p_{x+iy}) is expressed as (3.304). Then we have
That is, R(2p_{x+iy}) represents the real component of ϕ(2p_{x+iy}) that depends only on r and θ. The third term of (4.57) implies that the existence probability density of an electron represented by |ψ(x, t)|² is rotating counterclockwise around the z-axis with an angular frequency of ω. Similarly, in the case of ϕ(2p_{x−iy}), the existence probability density of an electron is rotating clockwise around the z-axis with an angular frequency of ω.
Integrating (4.57), we have

$$\int\psi^*(x,t)\psi(x,t)\,d\tau = \int|\psi(x,t)|^2\,d\tau = \frac{1}{2}\int\left\{|\phi(1s)|^2 + \left|\phi(2p_{x+iy})\right|^2\right\}d\tau$$
$$\quad + \int_0^\infty r^2\,dr\int_0^\pi\sin\theta\,d\theta\,\phi(1s)\,R(2p_{x+iy})\int_0^{2\pi}d\phi\,\cos(\phi-\omega t) = \frac{1}{2} + \frac{1}{2} = 1,$$

where we used the normalized functional forms of ϕ(1s) and ϕ(2p_{x+iy}); the last term vanishes because

$$\int_0^{2\pi}d\phi\,\cos(\phi - \omega t) = 0.$$
where we assume that m is positive so that we can appropriately take into account the Condon–Shortley phase. Alternatively, we describe it via a unitary transformation as follows:

$$\left(\frac{(-1)^m}{\sqrt{2\pi}}e^{im\phi}\ \ \frac{1}{\sqrt{2\pi}}e^{-im\phi}\right) = \left(\frac{1}{\sqrt{\pi}}\cos m\phi\ \ \frac{1}{\sqrt{\pi}}\sin m\phi\right)\begin{pmatrix}\dfrac{(-1)^m}{\sqrt2} & \dfrac{1}{\sqrt2}\\[4pt] \dfrac{(-1)^m i}{\sqrt2} & -\dfrac{i}{\sqrt2}\end{pmatrix}, \qquad (4.60)$$
$$\left(\Lambda_{l,m}^{(n)}\ \ \Lambda_{l,-m}^{(n)}\right) = \left(\tilde{\Lambda}_{l,\cos m\phi}^{(n)}\ \ \tilde{\Lambda}_{l,\sin m\phi}^{(n)}\right)\begin{pmatrix}\dfrac{(-1)^m}{\sqrt2} & \dfrac{1}{\sqrt2}\\[4pt] \dfrac{(-1)^m i}{\sqrt2} & -\dfrac{i}{\sqrt2}\end{pmatrix}, \qquad (4.61)$$
Meanwhile, choosing e₁ for εe, we have

$$P_{x,|p_x\rangle}^{(e_1)} = e\,\langle\phi(1s)|x|\phi(2p_x)\rangle = \frac{e}{4\sqrt2\,\pi a^4}\int_0^\infty r^4e^{-3r/2a}\,dr\int_0^\pi\sin^3\theta\,d\theta\int_0^{2\pi}\cos^2\phi\,d\phi = \frac{2^7\sqrt2}{3^5}\,ea. \qquad (4.63)$$

Thus, we obtained the same result as (4.49) apart from the minus sign. Since the square of the absolute value of the transition moment plays a role, the minus sign is again of secondary importance. With $P_y^{(e_2)}$, similarly we have

$$P_{y,|p_y\rangle}^{(e_2)} = e\,\langle\phi(1s)|y|\phi(2p_y)\rangle = \frac{e}{4\sqrt2\,\pi a^4}\int_0^\infty r^4e^{-3r/2a}\,dr\int_0^\pi\sin^3\theta\,d\theta\int_0^{2\pi}\sin^2\phi\,d\phi = \frac{2^7\sqrt2}{3^5}\,ea. \qquad (4.64)$$

To summarize,

$$P_{z,|p_z\rangle}^{(e_3)} = P_{x,|p_x\rangle}^{(e_1)} = P_{y,|p_y\rangle}^{(e_2)} = \frac{2^7\sqrt2}{3^5}\,ea.$$
In the case of $P_{z,|p_z\rangle}^{(e_3)}$, $P_{x,|p_x\rangle}^{(e_1)}$, and $P_{y,|p_y\rangle}^{(e_2)}$, the optical transition is said to be polarized along the z-, x-, and y-axis, respectively, and so linearly polarized light is relevant. Note moreover that the operators z, x, and y in (4.39), (4.63), and (4.64) are Hermitian and that ϕ(2p_z), ϕ(2p_x), and ϕ(2p_y) are real functions.
Notice that in the upper line the indices change cyclically like (z, x, y), whereas in the lower line they change anti-cyclically like (z, y, x). The proof of (4.65) is left for the reader. Thus, we have, e.g.,
Putting

$$Q_+ \equiv x + iy \quad\text{and}\quad Q_- \equiv x - iy,$$

we have

$$[M_z, Q_+] = Q_+, \qquad [M_z, Q_-] = -Q_-. \qquad (4.67)$$
$$\langle m'|[M_z, Q_+]|m\rangle = \langle m'|M_zQ_+ - Q_+M_z|m\rangle = m'\langle m'|Q_+|m\rangle - m\langle m'|Q_+|m\rangle = \langle m'|Q_+|m\rangle, \qquad (4.68)$$

Therefore, for the matrix element ⟨m'|Q₊|m⟩ not to vanish, we must have

$$m' - m - 1 = 0 \quad\text{or}\quad \Delta m = 1 \quad (\Delta m \equiv m' - m).$$

This represents the selection rule with respect to the coordinate Q₊. Similarly, we get the corresponding relation for Q₋. In this case, for the matrix element ⟨m'|Q₋|m⟩ not to vanish, we must have

$$m' - m + 1 = 0 \quad\text{or}\quad \Delta m = -1.$$
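The Δm rules can be checked directly on the azimuthal integrals (a numerical check of mine, not part of the text; the radial and polar factors are omitted since only the ϕ-integration decides whether the element vanishes):

```python
# <m'|Q_+|m> has azimuthal factor: integral of e^{-i m' phi} e^{+i phi} e^{i m phi};
# it is nonzero only when m' = m + 1.  For Q_- the sign of the middle phase flips.
import numpy as np
from scipy.integrate import quad

def azimuthal(mp, m, q):   # q = +1 for Q_+, q = -1 for Q_-
    n = m - mp + q
    re = quad(lambda p: np.cos(n*p), 0, 2*np.pi)[0]
    im = quad(lambda p: np.sin(n*p), 0, 2*np.pi)[0]
    return complex(re, im)

for m in range(-2, 3):
    for mp in range(-3, 4):
        assert (abs(azimuthal(mp, m, +1)) > 1e-9) == (mp == m + 1)  # Dm = +1
        assert (abs(azimuthal(mp, m, -1)) > 1e-9) == (mp == m - 1)  # Dm = -1
```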
To derive (4.70), we can alternatively take the adjoint of (4.69). Meanwhile, for the z-coordinate we have

$$[M_z, z] = 0. \qquad (4.71)$$

Therefore, for the matrix element ⟨m'|z|m⟩ not to vanish, we must have

$$m' - m = 0 \quad\text{or}\quad \Delta m = 0. \qquad (4.72)$$
These results are fully consistent with Example 4.3 of Sect. 4.3. That is, if circularly polarized light takes part in the optical transition, Δm = ±1. For instance, using the present notation we rewrite (4.48) as

$$e\left\langle\phi(1s)\left|\tfrac{1}{\sqrt2}(x-iy)\right|\phi(2p_{x+iy})\right\rangle = \frac{e}{\sqrt2}\,\langle 0|Q_-|1\rangle = -\frac{2^7\sqrt2}{3^5}\,ea.$$
$$[M^2, z] = M_x[M_x, z] + [M_x, z]M_x + M_y[M_y, z] + [M_y, z]M_y$$
$$= -i(M_xy + yM_x) + i(M_yx + xM_y)$$
$$= i(M_yx + xM_y - M_xy - yM_x)$$
$$= i(2M_yx - 2M_xy + 2iz) = 2i(M_yx - M_xy + iz). \qquad (4.73)$$

In the above calculations, (i) we used [M_z, z] = 0; (ii) the RHS was modified so that the commutation relations can be used; (iii) we used xM_y = M_yx + iz and yM_x = M_xy − iz so that we can use (4.65) (the second last equality). Moreover, using (4.65), (4.73) can be written as

$$[M^2, z] = 2i(xM_y - M_xy) = 2i(M_yx - yM_x).$$
Similar results on the commutator can be obtained with [M², x] and [M², y]. For further use, we give the alternative relations

$$[M^2, x] = 2i(yM_z - M_yz) = 2i(M_zy - zM_y),$$
$$[M^2, y] = 2i(zM_x - M_zx) = 2i(M_xz - xM_z). \qquad (4.74)$$
Next, we evaluate the double commutator:

$$[M^2, [M^2, z]] = 2i\left([M^2, M_yx] - [M^2, M_xy] + i[M^2, z]\right)$$
$$= 2i\left(M_y[M^2, x] - M_x[M^2, y] + i[M^2, z]\right)$$
$$= 2i\left(2iM_y(yM_z - M_yz) - 2iM_x(M_xz - xM_z) + i[M^2, z]\right)$$
$$= -2\left\{2\left(M_xx + M_yy + M_zz\right)M_z - 2\left(M_x^2 + M_y^2 + M_z^2\right)z + M^2z - zM^2\right\} = 2\left(M^2z + zM^2\right). \qquad (4.75)$$

In the above calculations, (i) we used [M², M_y] = [M², M_x] = 0 (with the second equality); (ii) we used (4.74) (the third equality); (iii) the RHS was modified so that we can use the relation M ⊥ x following from the definition of the angular momentum operator, i.e., M_xx + M_yy + M_zz = 0 (the second last equality). We used [M_z, z] = 0 as well.
Similar results are obtained with x and y. That is, we have

$$[M^2, [M^2, x]] = 2(M^2x + xM^2), \qquad (4.76)$$
$$[M^2, [M^2, y]] = 2(M^2y + yM^2). \qquad (4.77)$$

Expanding the LHS of the z-related relation, we have

$$M^4z - 2M^2zM^2 + zM^4 = 2(M^2z + zM^2). \qquad (4.78)$$
Using the relation (4.78) and taking inner products of both sides, we get, e.g.,

$$\langle l'|M^4z - 2M^2zM^2 + zM^4|l\rangle = \langle l'|2(M^2z + zM^2)|l\rangle. \qquad (4.79)$$

That is,

$$\langle l'|M^4z - 2M^2zM^2 + zM^4|l\rangle - \langle l'|2(M^2z + zM^2)|l\rangle = 0.$$
Considering that both terms of the LHS contain a factor ⟨l'|z|l⟩ in common, we have

$$\left\{\left[l'(l'+1) - l(l+1)\right]^2 - 2\left[l'(l'+1) + l(l+1)\right]\right\}\langle l'|z|l\rangle = 0,$$

and the factor in braces can be rearranged as

$$\left[l'(l'+1) - l(l+1)\right]^2 - 2\left[l'(l'+1) + l(l+1)\right]$$
$$= (l'-l)^2(l'+l+1)^2 - \left[(l'+l+1)^2 + (l'-l)^2 - 1\right]$$
$$= \left[(l'+l+1)^2 - 1\right]\left[(l'-l)^2 - 1\right]$$
$$= (l'+l)(l'+l+2)(l'-l+1)(l'-l-1). \qquad (4.81)$$
We have similar relations with respect to ⟨l'|x|l⟩ and ⟨l'|y|l⟩ because of (4.76) and (4.77). For the electric dipole transition to be allowed, at least one of the terms ⟨l'|x|l⟩, ⟨l'|y|l⟩, and ⟨l'|z|l⟩ must be nonvanishing. For this, at least one of the four factors of (4.81) should be zero. Since l' + l + 2 > 0, this factor is excluded. For l' + l to vanish, we should have l' = l = 0; notice that both l' and l are non-negative integers. We must then examine this condition. It is equivalent to the statement that the spherical harmonic related to the angular variables θ and ϕ takes the form of $Y_0^0(\theta,\phi) = 1/\sqrt{4\pi}$, i.e., a constant. Therefore, the θ-related integral for the matrix element ⟨l'|z|l⟩ consists only of the following factor:

$$\int_0^\pi\cos\theta\sin\theta\,d\theta = \frac{1}{2}\int_0^\pi\sin 2\theta\,d\theta = -\frac{1}{4}\left[\cos 2\theta\right]_0^\pi = 0,$$
where cos θ comes from the polar-coordinate expression z = r cos θ; sin θ is due to the infinitesimal volume element of space, i.e., r² sin θ dr dθ dϕ. Thus, we find that ⟨l'|z|l⟩ vanishes on condition that l' = l = 0. In the polar coordinate representation, x = r sin θ cos ϕ and y = r sin θ sin ϕ, and so the ϕ-related integrals of ⟨l'|x|l⟩ and ⟨l'|y|l⟩ vanish as well. That is,

$$\int_0^{2\pi}\cos\phi\,d\phi = \int_0^{2\pi}\sin\phi\,d\phi = 0.$$
Therefore, the matrix elements relevant to l' = l = 0 vanish for all the coordinates. Consequently, for the transition to be allowed, we must have

$$l' - l + 1 = 0 \quad\text{or}\quad l' - l - 1 = 0. \qquad (4.84)$$

Or, defining Δl ≡ l' − l, we get

$$\Delta l = \pm 1. \qquad (4.85)$$

Thus, for the transition to be allowed, the azimuthal quantum number must change by one.
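The Δl = ±1 rule can likewise be verified numerically. The sketch below (a check of mine, not part of the text; it uses only the θ-dependence of the zonal harmonics, dropping normalization constants, which do not affect whether the integral vanishes) evaluates ∫P_{l'}(cos θ) cos θ P_l(cos θ) sin θ dθ:

```python
# The theta-integral behind <l'|z|l> vanishes unless l' = l +/- 1.
import numpy as np
from scipy.integrate import quad
from scipy.special import eval_legendre

def theta_int(lp, l):
    f = lambda t: (eval_legendre(lp, np.cos(t)) * np.cos(t)
                   * eval_legendre(l, np.cos(t)) * np.sin(t))
    return quad(f, 0, np.pi)[0]

for l in range(4):
    for lp in range(4):
        nonzero = abs(theta_int(lp, l)) > 1e-9
        assert nonzero == (abs(lp - l) == 1)   # Delta l = +/- 1
```

This reflects the recurrence x P_l(x) = [(l+1)P_{l+1}(x) + l P_{l−1}(x)]/(2l+1) combined with Legendre orthogonality.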
4.5 Angular Momentum of Radiation

In Sect. 4.3 we mentioned circularly polarized light. If the circularly polarized light acts on an electron, what can we anticipate? Here we deal with this problem within the framework of a semiclassical theory.
Let E and H be the electric and magnetic fields of a left-circularly polarized light, respectively. They are expressed as

$$\boldsymbol{E} = \frac{1}{\sqrt2}E_0(\boldsymbol{e}_1 + i\boldsymbol{e}_2)\exp i(kz - \omega t), \qquad (4.86)$$

$$\boldsymbol{H} = \frac{1}{\sqrt2}H_0(\boldsymbol{e}_2 - i\boldsymbol{e}_1)\exp i(kz - \omega t) = \frac{1}{\sqrt2}\frac{E_0}{\mu v}(\boldsymbol{e}_2 - i\boldsymbol{e}_1)\exp i(kz - \omega t). \qquad (4.87)$$
Here we assume that the light is propagating in the direction of the positive z-axis. The electric and magnetic fields described by (4.86) and (4.87) represent the left-circularly polarized light. A synchronized motion of an electron is expected if the electron exerts a circular motion in such a way that its direction of motion is always perpendicular to the electric field and parallel to the magnetic field (see Fig. 4.2). In this situation, the magnetic Lorentz force does not affect the electron motion.
Here, the Lorentz force F(t) is described by

$$\boldsymbol{F}(t) = e\boldsymbol{E} + e\boldsymbol{v}\times\boldsymbol{B}, \qquad (4.88)$$

where the first term is the electric Lorentz force and the second term represents the magnetic Lorentz force.
[Fig. 4.2  An electron undergoing circular motion around the origin O in the xy-plane, synchronized with the rotating electric field E of the left-circularly polarized light.]
$$\boldsymbol{E} = \frac{1}{\sqrt2}E_0\left[\boldsymbol{e}_1\cos(kz - \omega t) - \boldsymbol{e}_2\sin(kz - \omega t)\right] + \frac{i}{\sqrt2}E_0\left[\boldsymbol{e}_2\cos(kz - \omega t) + \boldsymbol{e}_1\sin(kz - \omega t)\right]. \qquad (4.89)$$
Suppose that the electron exerts the circular motion in a region narrow enough around the origin and that the said electron motion is confined within the xy-plane that is perpendicular to the light propagation direction. Then, we can assume that z ≈ 0 in (4.89). Ignoring kz in (4.89) accordingly and taking the real part, we have
$$\boldsymbol{E} = \frac{1}{\sqrt2}E_0(\boldsymbol{e}_1\cos\omega t + \boldsymbol{e}_2\sin\omega t).$$

The force acting on the electron is then

$$\boldsymbol{F} = e\boldsymbol{E}, \qquad (4.90)$$

so that the equation of motion reads

$$m\ddot{\boldsymbol{x}} = e\boldsymbol{E}. \qquad (4.91)$$
In components, (4.91) reads

$$m\ddot{x} = \frac{1}{\sqrt2}\,eE_0\cos\omega t \quad\text{and}\quad m\ddot{y} = \frac{1}{\sqrt2}\,eE_0\sin\omega t. \qquad (4.92)$$
Integrating (4.92) twice with respect to t, we have

$$mx = -\frac{eE_0}{\sqrt2\,\omega^2}\cos\omega t + Ct + D, \qquad (4.93)$$

$$my = -\frac{eE_0}{\sqrt2\,\omega^2}\sin\omega t + C't + D', \qquad (4.94)$$

where C, D, C', and D' are integration constants. Setting y(0) = 0 and $\dot y(0) = -\frac{eE_0}{\sqrt2\,m\omega}$, we have C' = D' = 0. Thus, making t a parameter, we get

$$x^2 + y^2 = \left(\frac{eE_0}{\sqrt2\,m\omega^2}\right)^2. \qquad (4.95)$$
This implies that the electron is exerting a counterclockwise circular motion with a radius $\frac{eE_0}{\sqrt2\,m\omega^2}$ under the influence of the electric field. This is consistent with the motion of an electron in the coherent state of ϕ(1s) and ϕ(2p_{x+iy}) as expressed in (4.57).
The angular momentum the electron has acquired is

$$L = (\boldsymbol{x}\times\boldsymbol{p})_z = xp_y - yp_x = \frac{eE_0}{\sqrt2\,m\omega^2}\cdot\frac{eE_0}{\sqrt2\,\omega} = \frac{e^2E_0^2}{2m\omega^3}. \qquad (4.96)$$

If this angular momentum equals the ħ transferred from the radiation field, we have

$$\frac{e^2E_0^2}{2m\omega^3} = \hbar. \qquad (4.97)$$

Multiplying both sides by ω, we equivalently have

$$\frac{e^2E_0^2}{2m\omega^2} = \hbar\omega. \qquad (4.98)$$

The radius of the circular motion is

$$\alpha = \frac{eE_0}{\sqrt2\,m\omega^2}. \qquad (4.99)$$
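The semiclassical picture can be simulated directly. The sketch below (mine, not part of the text; it assumes e = m = E₀ = ω = 1) integrates (4.92) with the initial conditions used above and confirms the circular orbit (4.95) and the angular momentum (4.96):

```python
# Electron in a left-circularly polarized field (e = m = E0 = omega = 1).
import numpy as np
from scipy.integrate import solve_ivp

e = m = E0 = w = 1.0
A = e * E0 / np.sqrt(2)                     # force amplitude per (4.92)

def rhs(t, s):
    x, y, vx, vy = s
    return [vx, vy, A*np.cos(w*t)/m, A*np.sin(w*t)/m]

alpha = A / (m * w**2)                      # radius (4.99)
# x(0) = -alpha, x'(0) = 0; y(0) = 0, y'(0) = -alpha*omega
sol = solve_ivp(rhs, (0, 20), [-alpha, 0.0, 0.0, -alpha*w],
                rtol=1e-10, atol=1e-12)

r = np.hypot(sol.y[0], sol.y[1])            # orbit radius, constant = alpha
L = m * (sol.y[0]*sol.y[3] - sol.y[1]*sol.y[2])   # x p_y - y p_x
print(r[-1], L[-1])
```

The counterclockwise sense of the orbit matches the rotation of the probability density in the ϕ(1s)–ϕ(2p_{x+iy}) coherent state.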
References
the quantum state or energy is small. The applied external field causes a change in the Hamiltonian of a quantum system and results in a change in that quantum state, usually accompanied by an energy change of the system. We describe this process as the following eigenvalue equation:
$$H|\Psi_i\rangle = E_i|\Psi_i\rangle, \qquad (5.1)$$

$$H = H_0 + \lambda V, \qquad (5.2)$$
where H0 is the Hamiltonian of the unperturbed system without the external field;
V is an additional Hamiltonian due to the applied external field, or interaction
between the physical system and the external field. The second term includes a
parameter λ that can be modulated according to the change in the strength of the
applied field. The parameter λ should be real so that H of (5.2) can be Hermitian. The
parameter λ can be the applied field itself or can merely be a dimensionless parameter
that can be set at λ = 1 after finishing the calculation. We will come back to this point later in the examples.
In (5.2) we assume that the following equation holds:
$$H_0|i\rangle = E_i^{(0)}|i\rangle, \qquad (5.3)$$
where |i⟩ is the i-th quantum state. We designate |0⟩ as the ground state. We assume the excited states |i⟩ (i = 1, 2, …) to be numbered in order of increasing energy. These functions (or vectors) |i⟩ (including |0⟩) can be any of the quantum states that appeared as eigenfunctions in the previous chapters. Of these, two or more functions may have the same eigenenergy (i.e., be degenerate). If we were to deal with such degenerate states, those states |i⟩ would have to be distinguished by, e.g., |i_d⟩ (d = 1, 2, …, s), where s denotes the degeneracy (or degree of degeneracy). For simplicity, however, in this chapter we are going to examine only the change in energy and physical states with respect to nondegenerate states. We disregard the issue of notation for the degenerate states accordingly. We pay attention to each individual case later, when necessary.
Regarding a one-dimensional physical system, e.g., a particle confined within a potential well or a quantum-mechanical harmonic oscillator, all the quantum states are nondegenerate, as we have already seen in Chaps. 1 and 2. As a special case we also consider the ground state of a hydrogen-like atom. This is because that state is nondegenerate, and so the problem can be dealt with in parallel to the cases of the one-dimensional physical systems. We assume that the quantum states |i⟩ (i = 0, 1, 2, …) constitute the orthonormal eigenvectors such that
$$\langle j|i\rangle = \delta_{ji}. \qquad (5.4)$$
Remember that the notation has already appeared in (2.53); see Chap. 13 as well.
One of the most important applications of the perturbation method is to evaluate the shift in quantum state and energy level caused by the applied external field. To this end, we expand both the quantum state and the energy as power series of λ. That is, in (5.1) we expand |Ψi⟩ and Ei such that

$$|\Psi_i\rangle = |i\rangle + \lambda|\phi_i^{(1)}\rangle + \lambda^2|\phi_i^{(2)}\rangle + \cdots, \qquad (5.5)$$

$$E_i = E_i^{(0)} + \lambda E_i^{(1)} + \lambda^2 E_i^{(2)} + \cdots. \qquad (5.6)$$

We further impose the condition

$$\langle i|\Psi_i\rangle = 1. \qquad (5.7)$$
This condition, however, does not necessarily mean that |Ψi⟩ has been normalized (vide infra). From (5.4) and (5.7) we have

$$\langle i|\phi_i^{(1)}\rangle = \langle i|\phi_i^{(2)}\rangle = \cdots = 0. \qquad (5.8)$$
Let us calculate ⟨Ψi|Ψi⟩ on condition of (5.7) and (5.8). From (5.5) we have

$$\langle\Psi_i|\Psi_i\rangle = \langle i|i\rangle + \lambda\left[\langle i|\phi_i^{(1)}\rangle + \langle\phi_i^{(1)}|i\rangle\right] + \lambda^2\left[\langle i|\phi_i^{(2)}\rangle + \langle\phi_i^{(2)}|i\rangle + \langle\phi_i^{(1)}|\phi_i^{(1)}\rangle\right] + \cdots$$
$$= \langle i|i\rangle + \lambda^2\langle\phi_i^{(1)}|\phi_i^{(1)}\rangle + \cdots \approx 1 + \lambda^2\langle\phi_i^{(1)}|\phi_i^{(1)}\rangle,$$

where we used (5.4) and (5.8). Thus, ⟨Ψi|Ψi⟩ is normalized if we ignore the terms of order λ².
Now, inserting (5.5) and (5.6) into (5.1) and comparing the same power factors with respect to λ, we obtain

$$\left[E_i^{(0)} - H_0\right]|i\rangle = 0, \qquad (5.9)$$

$$\left[E_i^{(0)} - H_0\right]|\phi_i^{(1)}\rangle + E_i^{(1)}|i\rangle = V|i\rangle, \qquad (5.10)$$

$$\left[E_i^{(0)} - H_0\right]|\phi_i^{(2)}\rangle + E_i^{(1)}|\phi_i^{(1)}\rangle + E_i^{(2)}|i\rangle = V|\phi_i^{(1)}\rangle. \qquad (5.11)$$
Equation (5.9) is identical with (5.3). Operating ⟨j| on both sides of (5.10) from the left, we get

$$\langle j|\left[E_i^{(0)} - H_0\right]|\phi_i^{(1)}\rangle + \langle j|E_i^{(1)}|i\rangle = \left[E_i^{(0)} - E_j^{(0)}\right]\langle j|\phi_i^{(1)}\rangle + E_i^{(1)}\delta_{ji} = \langle j|V|i\rangle. \qquad (5.12)$$

Putting j = i in (5.12), we obtain

$$E_i^{(1)} = \langle i|V|i\rangle. \qquad (5.13)$$
This represents the first-order correction term of the energy eigenvalue. Meanwhile, assuming j ≠ i in (5.12), we have

$$\langle j|\phi_i^{(1)}\rangle = \frac{\langle j|V|i\rangle}{E_i^{(0)} - E_j^{(0)}} \quad (j \neq i), \qquad (5.14)$$
where we used $E_i^{(0)} \neq E_j^{(0)}$ on the assumption that the quantum state |i⟩ that belongs to the eigenenergy $E_i^{(0)}$ is nondegenerate. Here, we emphasize that the eigenstates that correspond to $E_j^{(0)}$ may or may not be degenerate. In other words, it is the quantum state that we are making an issue of with respect to the perturbation that is nondegenerate. In this context, we do not question whether or not the other quantum states [represented by |j⟩ in (5.14)] are degenerate.
Using the completeness relation, we then have

$$|\phi_i^{(1)}\rangle = E|\phi_i^{(1)}\rangle = \sum_k |k\rangle\langle k|\phi_i^{(1)}\rangle = \sum_{k\neq i}|k\rangle\langle k|\phi_i^{(1)}\rangle = \sum_{k\neq i}|k\rangle\,\frac{\langle k|V|i\rangle}{E_i^{(0)} - E_k^{(0)}}, \qquad (5.16)$$
where with the third equality we used (5.8) and with the last equality we used (5.14).
Operating hij from the left on (5.16) and using (5.4), we recover (5.8). Hence, using
(5.5), the approximated quantum state to the first order of λ is described by

$$|\Psi_i\rangle \approx |i\rangle + \lambda|\phi_i^{(1)}\rangle = |i\rangle + \lambda\sum_{k\neq i}|k\rangle\,\frac{\langle k|V|i\rangle}{E_i^{(0)} - E_k^{(0)}}. \qquad (5.17)$$
For a while, let us think of a case where we deal with the change in the eigenenergy and corresponding eigenstate with respect to a degenerate quantum state. In that case, regarding the state |i⟩ in question in (5.12), there exists some |j⟩ ( j ≠ i) that satisfies $E_i^{(0)} = E_j^{(0)}$. Therefore, we would need ⟨j|V|i⟩ = 0 from (5.12). Generally, however, this is not the case, and so we need a relation different from (5.12) to deal with the degenerate case. We do not get into details about this issue in this book.
Next let us seek the second-order correction term $E_i^{(2)}$ of the eigenenergy. Operating $\langle j|$ on both sides of (5.11) from the left, we get

$$\left[E_i^{(0)} - E_j^{(0)}\right]\big\langle j\big|\phi_i^{(2)}\big\rangle + E_i^{(1)}\big\langle j\big|\phi_i^{(1)}\big\rangle + E_i^{(2)}\delta_{ji} = \big\langle j\big|V\big|\phi_i^{(1)}\big\rangle.$$

Putting $j = i$, we obtain

$$E_i^{(2)} = \big\langle i\big|V\big|\phi_i^{(1)}\big\rangle. \tag{5.18}$$
Notice that we used (5.8) to derive (5.18). Using (5.16) furthermore, we get

$$E_i^{(2)} = \sum_{k \neq i} \langle i|V|k\rangle \frac{\langle k|V|i\rangle}{E_i^{(0)} - E_k^{(0)}} = \sum_{k \neq i} \big\langle k\big|V^{\dagger}\big|i\big\rangle^{*} \frac{\langle k|V|i\rangle}{E_i^{(0)} - E_k^{(0)}} = \sum_{k \neq i} \langle k|V|i\rangle^{*} \frac{\langle k|V|i\rangle}{E_i^{(0)} - E_k^{(0)}} = \sum_{k \neq i} \frac{|\langle k|V|i\rangle|^2}{E_i^{(0)} - E_k^{(0)}}, \tag{5.19}$$

where with the second equality we used (1.116) in combination with (1.118) and with the third equality we used the fact that $V$ is an Hermitian operator; i.e., $V^{\dagger} = V$. The state $|k\rangle$ is sometimes called an intermediate state.
Considering the expression of (5.16), we find that the approximated state $|i\rangle + \lambda\big|\phi_i^{(1)}\big\rangle$ in (5.5) is not an eigenstate of energy, because the approximated state contains a linear combination of different eigenstates $|k\rangle$ that have eigenenergies different from the eigenenergy $E_i^{(0)}$ of $|i\rangle$. Substituting (5.13) and (5.19) for (5.6), with the energy correction terms up to the second order we have

$$E_i \approx E_i^{(0)} + \lambda\langle i|V|i\rangle + \lambda^2 \sum_{k \neq i} \frac{|\langle k|V|i\rangle|^2}{E_i^{(0)} - E_k^{(0)}}. \tag{5.21}$$

Since $E_0^{(0)} < E_k^{(0)}$ $(k = 1, 2, \cdots)$ for any $|k\rangle$, the second-order correction term for the ground state is always negative. This contributes to the energy stabilization.
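As a quick numerical illustration (added here, not part of the text), the first- and second-order formulas (5.13) and (5.19) can be checked against exact diagonalization of a small matrix model; the four-level $H_0$ and the perturbation $V$ below are arbitrary stand-ins for a nondegenerate system.

```python
import numpy as np

# Hypothetical 4-level model: nondegenerate H0 plus a small Hermitian V.
H0 = np.diag([0.0, 1.0, 2.5, 4.0])
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
V = (A + A.T) / 2            # real symmetric, hence Hermitian
lam = 1e-3                   # small expansion parameter

# Corrections for the lowest level i = 0, per (5.13) and (5.19):
# E1 = <0|V|0>,  E2 = sum_{k!=0} |<k|V|0>|^2 / (E_0^(0) - E_k^(0))
E1 = V[0, 0]
E2 = sum(V[k, 0] ** 2 / (H0[0, 0] - H0[k, k]) for k in range(1, 4))
E_pert = H0[0, 0] + lam * E1 + lam ** 2 * E2

# Exact lowest eigenvalue of H0 + lam*V; the residual is O(lam^3).
E_exact = np.linalg.eigvalsh(H0 + lam * V)[0]
error = abs(E_pert - E_exact)
```

For the lowest level all denominators $E_0^{(0)} - E_k^{(0)}$ are negative, so `E2 < 0`, reproducing the stabilization just noted.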
Example 5.1 Suppose that a charged particle is confined within a one-dimensional potential well. This problem has already appeared in Example 1.2. Now let us consider how the energy of a particle carrying a charge $e$ is shifted by an applied electric field. This situation could experimentally be achieved using a "nano capacitor" where, e.g., an electron is confined within the capacitor while a voltage is applied between the two electrodes.
We adopt the same coordinate system as that of Example 1.2. That is, suppose that we apply an electric field $F$ between the electrodes that are positioned at $x = \pm L$ and that the electron is confined between the two electrodes. Then, the Hamiltonian of the system is described as

$$H = -\frac{\hbar^2}{2m}\frac{d^2}{dx^2} - eFx, \tag{5.22}$$

where $m$ is the mass of the particle. Note that we may choose $-eF$ for the parameter $\lambda$ of Sect. 5.1.1. Then, the coordinate representation of the Schrödinger equation is given by

$$-\frac{\hbar^2}{2m}\frac{d^2\psi_i(x)}{dx^2} - eFx\,\psi_i(x) = E_i\psi_i(x), \tag{5.23}$$
where $E_i$ is the energy of the $i$-th state from the bottom (i.e., the ground state); $\psi_i$ is its corresponding eigenstate. We wish to seek the perturbation energy up to the second order and the quantum state up to the first order. We are particularly interested in the change in the ground state $|0\rangle$. Notice that the ground state is obtained in (1.101) by putting $l = 0$. According to (5.21) we have

$$E_0 \approx E_0^{(0)} + \lambda\langle 0|V|0\rangle + \lambda^2 \sum_{k \neq 0} \frac{|\langle k|V|0\rangle|^2}{E_0^{(0)} - E_k^{(0)}}, \tag{5.24}$$

$$E_0 \approx E_0^{(0)} - eF\langle 0|x|0\rangle + (eF)^2 \sum_{k \neq 0} \frac{|\langle k|x|0\rangle|^2}{E_0^{(0)} - E_k^{(0)}}. \tag{5.25}$$
Considering that $|0\rangle$ is an even function [i.e., a cosine function in (1.101)] with respect to $x$ and that $x$ is an odd function, we find that the second term vanishes. Notice that the explicit coordinate representation of $|0\rangle$ is
158 5 Approximation Methods of Quantum Mechanics
$$|0\rangle = \sqrt{\frac{1}{L}}\cos\frac{\pi}{2L}x, \tag{5.26}$$

which is obtained by putting $l = 0$ in $|l_{\cos}\rangle \equiv \sqrt{\frac{1}{L}}\cos\left(\frac{\pi}{2L} + \frac{l\pi}{L}\right)x$ $(l = 0, 1, 2, \cdots)$ in (1.101).
By the same token, $\langle k|x|0\rangle$ in the third term of RHS of (5.25) vanishes if $|k\rangle$ denotes a cosine function in (1.101). If, on the other hand, $|k\rangle$ denotes a sine function, $\langle k|x|0\rangle$ does not vanish. To distinguish the sine functions from the cosine functions, we denote

$$|n_{\sin}\rangle \equiv \sqrt{\frac{1}{L}}\sin\frac{n\pi}{L}x \quad (n = 1, 2, 3, \cdots), \tag{5.27}$$

which is identical with (1.102); see Fig. 5.1 for the notation of the quantum states.
Now, putting $E_0^{(0)} = \frac{\hbar^2}{2m}\frac{\pi^2}{4L^2}$ and $E_{2n-1}^{(0)} = \frac{\hbar^2}{2m}\frac{(2n)^2\pi^2}{4L^2}$ $(n = 1, 2, 3, \cdots)$ in (5.25), we rewrite it as

$$E_0 \approx E_0^{(0)} - eF\langle 0|x|0\rangle + (eF)^2\sum_{k=1}^{\infty}\frac{|\langle k|x|0\rangle|^2}{E_0^{(0)} - E_k^{(0)}} = E_0^{(0)} - eF\langle 0|x|0\rangle + (eF)^2\sum_{n=1}^{\infty}\frac{|\langle n_{\sin}|x|0\rangle|^2}{E_0^{(0)} - E_{2n-1}^{(0)}}, \tag{5.25}$$

[Fig. 5.1 Energy diagram of a particle confined in a one-dimensional potential well: the states are renumbered in order of increasing energy as $|0_{\cos}\rangle \equiv |0\rangle$, $|1_{\sin}\rangle \equiv |1\rangle$, $|1_{\cos}\rangle \equiv |2\rangle$, $|2_{\sin}\rangle \equiv |3\rangle$, $\cdots$.]
where with the second equality $n$ denotes the number appearing in (5.27) and $|n_{\sin}\rangle$ belongs to $E_{2n-1}^{(0)}$. Referring to, e.g., (4.17) with regard to the calculation of $\langle k|x|0\rangle$, we arrive at the following equation:

$$E_0 \approx \frac{\hbar^2}{2m}\frac{\pi^2}{4L^2} - (eF)^2\,\frac{32^2 \cdot 8\,mL^4}{\pi^6\hbar^2}\sum_{n=1}^{\infty}\frac{n^2}{(4n^2 - 1)^5}. \tag{5.28}$$
The series of the second term in (5.28) rapidly converges. Defining the partial sum of the first $N$ terms of the series as

$$S(N) \equiv \sum_{n=1}^{N}\frac{n^2}{(4n^2 - 1)^5},$$

we obtain $S(1) = 1/3^5 \approx 0.004115$ as well as

$$\frac{S(1000) - S(1)}{S(1000)} = 0.001324653.$$
That is, the first term of $S(N)$ alone accounts for ~99.9% of the infinite series. Thus, the stabilization energy $\lambda^2 E_0^{(2)}$ due to the perturbation is satisfactorily approximated by the $n = 1$ term:

$$\lambda^2 E_0^{(2)} \approx -(eF)^2\,\frac{32^2 \cdot 8\,mL^4}{243\,\pi^6\hbar^2}. \tag{5.29}$$

Meanwhile, from (5.17) the quantum state approximated to the first order is

$$|\Psi_0\rangle \approx |0\rangle + \lambda\big|\phi_0^{(1)}\big\rangle \approx |0\rangle - eF\sum_{k \neq 0}|k\rangle\frac{\langle k|x|0\rangle}{E_0^{(0)} - E_k^{(0)}} = |0\rangle + eF\,\frac{32 \cdot 8\,mL^3}{\pi^4\hbar^2}\sum_{n=1}^{\infty}|n_{\sin}\rangle\frac{(-1)^{n-1}n}{(4n^2 - 1)^3}. \tag{5.30}$$

Retaining only the $n = 1$ term, we get

$$|\Psi_0\rangle \approx |0\rangle + eF\,\frac{32 \cdot 8\,mL^3}{27\pi^4\hbar^2}|1_{\sin}\rangle. \tag{5.31}$$
Thus, we have roughly estimated the stabilization energy and the corresponding
quantum state as in (5.29) and (5.31), respectively. It may well be worth estimating
rough numbers of L and F in the actual situation (or perhaps in an actual “nano
device”). The estimation is left for readers.
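The convergence claim made for the series in (5.28) is easy to verify numerically; the snippet below (a check added here, not part of the text) evaluates the partial sums $S(N)$.

```python
# Partial sums S(N) of the series sum_n n^2/(4n^2 - 1)^5 appearing in (5.28).
def S(N):
    return sum(n ** 2 / (4 * n ** 2 - 1) ** 5 for n in range(1, N + 1))

ratio = (S(1000) - S(1)) / S(1000)   # relative weight of the n >= 2 tail
share = S(1) / S(1000)               # fraction carried by the n = 1 term alone
```

`ratio` reproduces the quoted value 0.001324653; i.e., the $n = 1$ term indeed carries about 99.9% of the sum.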
Example 5.2 [1] In Chap. 2 we dealt with a quantum-mechanical harmonic oscillator. Here let us suppose that a charged particle (with charge $e$) is performing harmonic oscillation under an applied electric field. First we consider the coordinate representation of the Schrödinger equation. Without the external field, the Schrödinger equation as an eigenvalue equation reads as

$$-\frac{\hbar^2}{2m}\frac{d^2u(q)}{dq^2} + \frac{1}{2}m\omega^2q^2u(q) = Eu(q). \tag{2.108}$$

Under the applied electric field, the perturbation is expressed as $-eF(q)q$ and, hence, the equation can be written as

$$-\frac{\hbar^2}{2m}\frac{d^2u(q)}{dq^2} + \frac{1}{2}m\omega^2q^2u(q) - eF(q)qu(q) = Eu(q). \tag{5.32}$$
For simplicity we assume that the electric field is uniform (i.e., independent of $q$) so that we can deal with it as a constant $F$. Then, (5.32) can be rewritten as

$$-\frac{\hbar^2}{2m}\frac{d^2u(q)}{dq^2} + \frac{1}{2}m\omega^2\left(q - \frac{eF}{m\omega^2}\right)^2 u(q) = \left[E + \frac{1}{2}m\omega^2\left(\frac{eF}{m\omega^2}\right)^2\right]u(q). \tag{5.33}$$

Defining $Q$ as

$$Q \equiv q - \frac{eF}{m\omega^2}, \tag{5.34}$$

we have

$$-\frac{\hbar^2}{2m}\frac{d^2u(q)}{dQ^2} + \frac{1}{2}m\omega^2Q^2u(q) = \left[E + \frac{1}{2}m\omega^2\left(\frac{eF}{m\omega^2}\right)^2\right]u(q). \tag{5.35}$$
Taking account of the change in functional form that results from the variable transformation, we rewrite (5.35) as

$$-\frac{\hbar^2}{2m}\frac{d^2\tilde{u}(Q)}{dQ^2} + \frac{1}{2}m\omega^2Q^2\tilde{u}(Q) = \left[E + \frac{1}{2}m\omega^2\left(\frac{eF}{m\omega^2}\right)^2\right]\tilde{u}(Q), \tag{5.36}$$

where

$$u(q) = u\!\left(Q + \frac{eF}{m\omega^2}\right) \equiv \tilde{u}(Q). \tag{5.37}$$
Defining $\tilde{E}$ as

$$\tilde{E} \equiv E + \frac{1}{2}m\omega^2\left(\frac{eF}{m\omega^2}\right)^2 = E + \frac{e^2F^2}{2m\omega^2}, \tag{5.38}$$

we get

$$-\frac{\hbar^2}{2m}\frac{d^2\tilde{u}(Q)}{dQ^2} + \frac{1}{2}m\omega^2Q^2\tilde{u}(Q) = \tilde{E}\tilde{u}(Q). \tag{5.39}$$

Thus, we recover exactly the same form as (2.108). Equation (5.39) can be solved analytically as already shown in Sect. 2.4. From (2.110) and (2.120), we have

$$\lambda \equiv \frac{2\tilde{E}}{\hbar\omega} \quad \text{and} \quad \lambda = 2n + 1.$$

Then, we have

$$\tilde{E} = \frac{\hbar\omega\lambda}{2} = \frac{\hbar\omega(2n + 1)}{2}. \tag{5.40}$$

In the perturbation treatment, the unperturbed eigenenergies are accordingly

$$E_n^{(0)} = \hbar\omega\left(n + \frac{1}{2}\right). \tag{5.43}$$
Taking account of (2.55), (2.68), and (2.62), the second term of (5.42) vanishes. For the third term, using (2.68) as well as (2.55) and (2.61) we have

$$|\langle k|q|n\rangle|^2 = \left|\left\langle k\left|\sqrt{\frac{\hbar}{2m\omega}}\left(a + a^{\dagger}\right)\right|n\right\rangle\right|^2 = \frac{\hbar}{2m\omega}\left|\sqrt{n}\,\langle k|n - 1\rangle + \sqrt{n + 1}\,\langle k|n + 1\rangle\right|^2$$

$$= \frac{\hbar}{2m\omega}\left[\sqrt{n}\,\delta_{k,n-1} + \sqrt{n + 1}\,\delta_{k,n+1}\right]^2 = \frac{\hbar}{2m\omega}\left[n\delta_{k,n-1} + (n + 1)\delta_{k,n+1} + 2\delta_{k,n-1}\delta_{k,n+1}\sqrt{n(n + 1)}\right]$$

$$= \frac{\hbar}{2m\omega}\left[n\delta_{k,n-1} + (n + 1)\delta_{k,n+1}\right]. \tag{5.44}$$

Notice that with the last equality of (5.44) there is no $k$ that satisfies $k = n - 1$ and $k = n + 1$ at once. Also note that $\langle k|q|n\rangle$ is a real number. Hence, as the
third term of (5.42) we get

$$\sum_{k \neq n}\frac{|\langle k|q|n\rangle|^2}{E_n^{(0)} - E_k^{(0)}} = \sum_{k \neq n}\frac{1}{\hbar\omega(n - k)}\cdot\frac{\hbar}{2m\omega}\left[n\delta_{k,n-1} + (n + 1)\delta_{k,n+1}\right]$$

$$= \frac{1}{2m\omega^2}\left[\frac{n}{n - (n - 1)} + \frac{n + 1}{n - (n + 1)}\right] = -\frac{1}{2m\omega^2}. \tag{5.45}$$
$$|\Psi_0\rangle \approx |0\rangle + \lambda\big|\phi_0^{(1)}\big\rangle = |0\rangle + \lambda\sum_{k \neq 0}|k\rangle\frac{\langle k|V|0\rangle}{E_0^{(0)} - E_k^{(0)}} = |0\rangle + \lambda|1\rangle\frac{\langle 1|V|0\rangle}{E_0^{(0)} - E_1^{(0)}}$$

$$= |0\rangle - eF|1\rangle\frac{\langle 1|x|0\rangle}{E_0^{(0)} - E_1^{(0)}} = |0\rangle + \sqrt{\frac{e^2F^2}{2m\omega^3\hbar}}\,|1\rangle. \tag{5.47}$$

Thus, $|\Psi_0\rangle$ is not an eigenstate, because $|\Psi_0\rangle$ contains both $|0\rangle$ and $|1\rangle$. Hence, $|\Psi_0\rangle$ does not possess an eigenenergy. Notice also that in (5.47) the factors $\langle k|x|0\rangle$ vanish except for $\langle 1|x|0\rangle$; see Chaps. 2 and 4.
To think of this point further, let us come back to the coordinate representation of the Schrödinger equation. From (5.39) and (2.106), we have

$$u_n(q) = \tilde{u}_n\!\left(q - \frac{eF}{m\omega^2}\right) = \left(\frac{m\omega}{\pi\hbar}\right)^{\frac{1}{4}}\sqrt{\frac{1}{2^n n!}}\,H_n\!\left(\sqrt{\frac{m\omega}{\hbar}}\left(q - \frac{eF}{m\omega^2}\right)\right)e^{-\frac{m\omega}{2\hbar}\left(q - \frac{eF}{m\omega^2}\right)^2} \quad (n = 0, 1, 2, \cdots). \tag{5.48}$$

Since $\psi_n(q)$ of (2.106) forms the complete orthonormal system (CONS) [2], $u_n(q)$ should be expanded using $\psi_n(q)$. That is, as a solution of (5.32) we get

$$u_n(q) = \sum_k c_{nk}\psi_k(q), \tag{5.49}$$

where the $c_{nk}$ are appropriate coefficients. Since the functions $\psi_k(q)$ are nondegenerate, $u_n(q)$ expressed by (5.49) lacks a definite eigenenergy.
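Because the displaced oscillator (5.33) is exactly solvable, the second-order shift $-e^2F^2/2m\omega^2$ obtained via (5.45) is in fact exact. The sketch below (an added check, in units with $\hbar = m = \omega = 1$) confirms this by diagonalizing $H_0 - eFq$ in a truncated number basis.

```python
import numpy as np

hbar = m = omega = 1.0
eF = 0.1                    # field strength, an arbitrary test value
nmax = 60                   # basis truncation (ample for this field)

n = np.arange(nmax)
H0 = np.diag(hbar * omega * (n + 0.5))      # unperturbed oscillator levels
# q = sqrt(hbar/2 m omega) (a + a^dagger) in the number basis
off = np.sqrt(hbar / (2 * m * omega)) * np.sqrt(n[1:])
q = np.diag(off, 1) + np.diag(off, -1)

E_ground = np.linalg.eigvalsh(H0 - eF * q)[0]
E_exact = 0.5 * hbar * omega - eF ** 2 / (2 * m * omega ** 2)  # from (5.38)-(5.40)
```

Truncation errors are negligible here because the field displaces the ground state only slightly.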
Example 5.3 [1] In the previous examples we studied the perturbation in one-dimensional systems, where all the energy eigenstates are nondegenerate. Here we deal with the change in the energy and quantum states of a hydrogen atom (i.e., a three-dimensional system). Since the energy eigenstates are generally degenerate (see Chap. 3), for simplicity we consider only the ground state, which is nondegenerate. As in the previous two cases, we assume that the perturbation is caused by an applied electric field. As the simplest example, we deal with the properties, including the polarizability, of the hydrogen atom in its ground state. The polarizability of an atom, molecule, etc. is one of the most fundamental properties in materials science.
We assume that the electric field is applied along the direction of the $z$-axis; in this case the perturbation term $V$ is expressed as

$$V = -eFz, \tag{5.50}$$

where $e$ is the charge of an electron ($e < 0$). Then, the total Hamiltonian $\tilde{H}$ is described by

$$\tilde{H} = H_0 + V, \tag{5.51}$$

where

$$H_0 = -\frac{\hbar^2}{2\mu r^2}\left[\frac{\partial}{\partial r}\left(r^2\frac{\partial}{\partial r}\right) + \frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial}{\partial\theta}\right) + \frac{1}{\sin^2\theta}\frac{\partial^2}{\partial\phi^2}\right] - \frac{e^2}{4\pi\varepsilon_0 r}. \tag{5.52}$$
Following (5.17), the ground state perturbed by the field is approximated as

$$|\Psi_0\rangle \approx |0\rangle + \sum_{k \neq 0}|k\rangle\frac{\langle k|V|0\rangle}{E_0^{(0)} - E_k^{(0)}} = |0\rangle - eF\sum_{k \neq 0}|k\rangle\frac{\langle k|z|0\rangle}{E_0^{(0)} - E_k^{(0)}}, \tag{5.53}$$

where $|0\rangle$ denotes the ground state expressed as (3.301). Since the quantum states $|k\rangle$ represent all the quantum states of the hydrogen atom as implied in (5.3), in the second term of RHS in (5.53) we are considering all those states except for $|0\rangle$. The eigenenergy $E_0^{(0)}$ is identical with $E_1 = -\frac{\hbar^2}{2\mu a^2}$ obtained from (3.258), where $n = 1$. As already noted, we simply numbered the states $|k\rangle$ in order of increasing energies. In the case of $k \neq j$ $(k, j \neq 0)$, we may have either $E_k^{(0)} = E_j^{(0)}$ or $E_k^{(0)} \neq E_j^{(0)}$ accordingly.
Using (5.51), let us calculate the polarizability of a hydrogen atom in the ground state. In Sect. 4.1 we gave a definition of the electric dipole moment such that

$$\boldsymbol{P} \equiv \sum_j e_j\boldsymbol{x}_j, \tag{4.6}$$

where $\boldsymbol{x}_j$ is a position vector of the $j$-th charged particle. Placing the proton of hydrogen at the origin, by use of the notation (3.5) $\boldsymbol{P}$ is expressed as

$$\boldsymbol{P} = (\boldsymbol{e}_1\ \boldsymbol{e}_2\ \boldsymbol{e}_3)\begin{pmatrix}P_x \\ P_y \\ P_z\end{pmatrix} = e\boldsymbol{x} = (\boldsymbol{e}_1\ \boldsymbol{e}_2\ \boldsymbol{e}_3)\begin{pmatrix}ex \\ ey \\ ez\end{pmatrix}.$$

In particular,

$$P_z = ez. \tag{5.54}$$
where jΨ0i is taken from (5.53). Substituting (5.53) for (5.55), we get
hΨ0 jzjΨ0 i
* +
X hkjzj0i X hkjzj0i
¼ h0 j eF k6¼0
hk j h ijzjj 0i eF
k6¼0
j ki h i
ð0Þ ð0Þ ð0Þ ð0Þ
E0 Ek E0 Ek
X hkjzj0i X hkjzj0i
¼ h0jzj0i eF k6¼0
h0jzjk i h i eF
k6¼0
0jz{ jk h i
ð0Þ ð0Þ ð0Þ ð0Þ
E0 Ek E0 Ek
X jhkjzj0ij2
þðeF Þ2 k6¼0
hkjzjki h i2 : ð5:56Þ
ð0Þ ð0Þ
E0 Ek
In (5.56) readers are referred to the computation rule of inner product such as
(13.20) and (13.64). Since z is Hermitian (i.e., z{ ¼ z), the second and third terms of
RHS of (5.56) equal. Neglecting the second-order perturbation factor on the fourth
term, we have
X jhkjzj0ij2
hΨ0 jzjΨ0 i h0jzj0i 2eF h i: ð5:57Þ
k6¼0 ð0Þ ð0Þ
E0 Ek
From (5.54) and (5.57), we have

$$\langle P_z\rangle \approx e\langle 0|z|0\rangle - 2e^2F\sum_{k \neq 0}\frac{|\langle k|z|0\rangle|^2}{E_0^{(0)} - E_k^{(0)}}.$$
$$\langle 0|z|0\rangle = \frac{1}{\pi a^3}\int z\,e^{-2r/a}\,dx\,dy\,dz = \frac{1}{\pi a^3}\int_0^{\infty}r^3e^{-2r/a}dr\int_0^{2\pi}d\phi\int_0^{\pi}\cos\theta\sin\theta\,d\theta = \frac{1}{2\pi a^3}\int_0^{\infty}r^3e^{-2r/a}dr\int_0^{2\pi}d\phi\int_0^{\pi}\sin 2\theta\,d\theta = 0, \tag{5.58}$$

where we used $\int_0^{\pi}\sin 2\theta\,d\theta = 0$. Then, we have
$$\langle P_z\rangle \approx -2e^2F\sum_{k \neq 0}\frac{|\langle k|z|0\rangle|^2}{E_0^{(0)} - E_k^{(0)}}. \tag{5.59}$$

Since the polarizability $\alpha$ is defined through $\langle P_z\rangle = \alpha F$, we get

$$\alpha = -2e^2\sum_{k \neq 0}\frac{|\langle k|z|0\rangle|^2}{E_0^{(0)} - E_k^{(0)}}. \tag{5.61}$$
At first glance, evaluating (5.61) appears to be formidable, but making the most of the fact that the total wave functions of a hydrogen atom form the CONS, the evaluation is straightforward. The discussion is as follows.

First, let us seek an operator $G$ that satisfies [1]

$$z|0\rangle = (GH_0 - H_0G)|0\rangle. \tag{5.62}$$

To this end, we take account of all the quantum states $|k\rangle$ of a hydrogen atom (including $|0\rangle$) that are solutions (or eigenstates of energy) of (3.36). In Sect. 3.8 we have obtained the total wave functions described by (3.300),
have obtained the total wave functions described by
ðnÞ
where Λe is expressed as a product of spherical surface harmonics and radial wave
l,m
functions. The latter functions are described by associated Laguerre functions. It is
well known [2] that the spherical surface harmonics constitute the CONS on a unit
sphere and that associated Laguerre functions form the CONS in the real one-
dimensional space. Thus, their product expressed as (3.300), i.e., aforementioned
collection of jki constitutes the CONS in the real three-dimensional space. This is
equivalent to
X
j
j jih j j¼ E, ð5:63Þ
In other words, (5.64) implies that any function $f(\boldsymbol{r})$ can be expanded into a series of $|k\rangle$. The coefficients $f_k$ are defined as (5.65).
Noting that $G$ sought below is a multiplicative function of $r$ and $\theta$, so that it commutes with the Coulomb potential term of $H_0$, and that $|0\rangle$ does not depend on $\theta$ or $\phi$, we have

$$(GH_0 - H_0G)|0\rangle = -\frac{\hbar^2}{2\mu}\left[G\frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial}{\partial r}\right) - \frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial}{\partial r}\right)G - \frac{1}{r^2}\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial}{\partial\theta}\right)G\right]|0\rangle. \tag{5.66}$$

For the second term in the brackets we have

$$\frac{1}{r^2}\frac{\partial}{\partial r}\left[r^2\frac{\partial}{\partial r}\big(G|0\rangle\big)\right] = \frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial G}{\partial r}\right)|0\rangle + 2\frac{\partial G}{\partial r}\frac{\partial|0\rangle}{\partial r} + G\frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial|0\rangle}{\partial r}\right)$$

$$= \frac{2}{r}\frac{\partial G}{\partial r}|0\rangle + \frac{\partial^2 G}{\partial r^2}|0\rangle - \frac{2}{a}\frac{\partial G}{\partial r}|0\rangle + G\frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial|0\rangle}{\partial r}\right), \tag{5.67}$$

where with the last equality we used (3.301), namely $|0\rangle \propto e^{-r/a}$, where $a$ is the Bohr radius (Sect. 3.7.1). The last term of (5.67) cancels the first term of (5.66). Then, we get

$$(GH_0 - H_0G)|0\rangle = \frac{\hbar^2}{2\mu}\left[2\left(\frac{1}{r} - \frac{1}{a}\right)\frac{\partial G}{\partial r}|0\rangle + \frac{\partial^2 G}{\partial r^2}|0\rangle + \frac{1}{r^2\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial G}{\partial\theta}\right)|0\rangle\right]$$

$$= \frac{\hbar^2}{2\mu}\left[2\left(\frac{1}{r} - \frac{1}{a}\right)\frac{\partial G}{\partial r} + \frac{\partial^2 G}{\partial r^2} + \frac{1}{r^2\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial G}{\partial\theta}\right)\right]|0\rangle. \tag{5.68}$$
In the coordinate representation, (5.62) requires

$$\langle\boldsymbol{r}|z|0\rangle = \langle\boldsymbol{r}|(GH_0 - H_0G)|0\rangle, \tag{5.69}$$

where $\phi(1s)$ is given by (3.301). Dividing (5.70) by $\phi(1s)$ and rearranging it, we have

$$\frac{\partial^2 G}{\partial r^2} + 2\left(\frac{1}{r} - \frac{1}{a}\right)\frac{\partial G}{\partial r} + \frac{1}{r^2\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial G}{\partial\theta}\right) = \frac{2\mu}{\hbar^2}r\cos\theta. \tag{5.71}$$

Putting $G(r, \theta) = g(r)\cos\theta$ and carrying out the differentiation with respect to $\theta$, we can separate the variables to obtain

$$\frac{d^2g}{dr^2} + 2\left(\frac{1}{r} - \frac{1}{a}\right)\frac{dg}{dr} - \frac{2g}{r^2} = \frac{2\mu}{\hbar^2}r. \tag{5.73}$$
Equation (5.73) has a regular singular point at the origin and resembles an Euler equation, whose general form of a homogeneous equation is described by [5]

$$\frac{d^2g}{dr^2} + \frac{a}{r}\frac{dg}{dr} + \frac{b}{r^2}g = 0, \tag{5.74}$$

where $a$ and $b$ here denote constants (not the Bohr radius). In the present case the homogeneous part of (5.73) near the origin takes the form

$$\frac{d^2g}{dr^2} + \frac{2}{r}\frac{dg}{dr} - \frac{2}{r^2}g = 0. \tag{5.75}$$

This suggests a polynomial trial solution for (5.73) of the form

$$g = pr^2 + qr. \tag{5.76}$$

Inserting (5.76) into (5.73), we obtain

$$4p - \frac{4p}{a}r - \frac{2q}{a} = \frac{2\mu}{\hbar^2}r, \tag{5.77}$$

where $a$ is again the Bohr radius. Equating the coefficients of like powers of $r$ on both sides, we get

$$p = -\frac{a\mu}{2\hbar^2} \quad \text{and} \quad q = -\frac{a^2\mu}{\hbar^2}. \tag{5.78}$$
Hence, we get

$$g(r) = -\frac{a\mu}{\hbar^2}\left(\frac{r}{2} + a\right)r. \tag{5.79}$$

Accordingly, we have

$$G(r, \theta) = g(r)\cos\theta = -\frac{a\mu}{\hbar^2}\left(\frac{r}{2} + a\right)r\cos\theta = -\frac{a\mu}{\hbar^2}\left(\frac{r}{2} + a\right)z. \tag{5.80}$$
Recall that

$$H_0|0\rangle = E_0^{(0)}|0\rangle \tag{5.82}$$

and

$$\langle j|H_0 = \left[H_0^{\dagger}|j\rangle\right]^{\dagger} = \left[H_0|j\rangle\right]^{\dagger} = \left[E_j^{(0)}|j\rangle\right]^{\dagger} = E_j^{(0)}\langle j|, \tag{5.83}$$

where we used a computation rule of (1.118) and with the second equality we used the fact that $H_0$ is Hermitian, i.e., $H_0^{\dagger} = H_0$. Rewriting (5.81), we have

$$\langle j|G|0\rangle = \frac{\langle j|z|0\rangle}{E_0^{(0)} - E_j^{(0)}}. \tag{5.84}$$
Multiplying both sides of (5.84) by $\langle 0|z|j\rangle$ and summing over $j$ $(\neq 0)$, we obtain (5.85). But, from (5.58), $\langle 0|z|0\rangle = 0$, so that the sum may be extended over all $j$ in (5.86). Moreover, using the completeness of $|j\rangle$ described by (5.63) for LHS of (5.86), we get

$$\langle 0|zG|0\rangle = \sum_{j \neq 0}\langle 0|z|j\rangle\langle j|G|0\rangle = \sum_{j \neq 0}\frac{\langle 0|z|j\rangle\langle j|z|0\rangle}{E_0^{(0)} - E_j^{(0)}} = \sum_{j \neq 0}\frac{|\langle j|z|0\rangle|^2}{E_0^{(0)} - E_j^{(0)}} = -\frac{\alpha}{2e^2}, \tag{5.87}$$
where with the second equality we used (5.84) and with the last equality we used
(5.61). Also, we used the fact that z is Hermitian.
Meanwhile, using (5.80), LHS of (5.87) is expressed as

$$\langle 0|zG|0\rangle = \left\langle 0\left|z\left[-\frac{a\mu}{\hbar^2}\left(\frac{r}{2} + a\right)z\right]\right|0\right\rangle = -\frac{a\mu}{2\hbar^2}\langle 0|zrz|0\rangle - \frac{a^2\mu}{\hbar^2}\big\langle 0\big|z^2\big|0\big\rangle. \tag{5.88}$$

Noting that $|0\rangle$ is spherically symmetric and that $z$ and $r$ commute, so that $\langle 0|z^2|0\rangle = \frac{1}{3}\langle 0|r^2|0\rangle$ and $\langle 0|zrz|0\rangle = \frac{1}{3}\langle 0|r^3|0\rangle$, (5.88) can be rewritten as

$$\langle 0|zG|0\rangle = -\frac{a\mu}{6\hbar^2}\big\langle 0\big|r^3\big|0\big\rangle - \frac{a^2\mu}{3\hbar^2}\big\langle 0\big|r^2\big|0\big\rangle. \tag{5.89}$$
To perform the definite integral calculations of (5.89) and obtain the result of (5.90), modify (3.263) and use the formula described below. That is, differentiate

$$\int_0^{\infty}e^{-r\xi}dr = \frac{1}{\xi}$$

four (or five) times with respect to $\xi$ and replace $\xi$ with $2/a$; this yields $\langle 0|r^2|0\rangle = 3a^2$ and $\langle 0|r^3|0\rangle = \frac{15}{2}a^3$ of (5.90).
Using (5.87) and (5.90), as the polarizability $\alpha$ we finally get

$$\alpha = -2e^2\langle 0|zG|0\rangle = 2e^2\frac{a\mu}{\hbar^2}\cdot\frac{9a^3}{4} = \frac{e^2a\mu}{\hbar^2}\cdot\frac{9a^3}{2} = 4\pi\varepsilon_0\cdot\frac{9a^3}{2} = 18\pi\varepsilon_0a^3, \tag{5.91}$$

where we used $a = 4\pi\varepsilon_0\hbar^2/\mu e^2$; i.e., $e^2a\mu/\hbar^2 = 4\pi\varepsilon_0$. The perturbed ground-state energy is, in turn,

$$E_0 \approx E_0^{(0)} - eF\langle 0|z|0\rangle + (eF)^2\sum_{j \neq 0}\frac{|\langle j|z|0\rangle|^2}{E_0^{(0)} - E_j^{(0)}} = E_0^{(0)} - \frac{1}{2}\alpha F^2 = E_0^{(0)} - 9\pi\varepsilon_0a^3F^2. \tag{5.93}$$

Experimentally, the energy shift of $-9\pi\varepsilon_0a^3F^2$ is well known as the Stark shift.
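The radial moments that enter (5.89), and hence the polarizability of (5.91), can be checked by direct quadrature. The sketch below (added here; units with $a = \mu = \hbar = 1$, in which $e^2a\mu/\hbar^2 = 4\pi\varepsilon_0$) uses a simple trapezoidal rule on the 1s density.

```python
import numpy as np

a = 1.0
r = np.linspace(0.0, 40.0, 400001)
dr = r[1] - r[0]

def moment(n):
    # <r^n> = (4/a^3) * integral_0^inf r^(n+2) e^(-2r/a) dr for the 1s state
    f = r ** (n + 2) * np.exp(-2 * r / a)
    return 4 / a ** 3 * (np.sum(f) - 0.5 * (f[0] + f[-1])) * dr

r2 = moment(2)                           # expect 3 a^2
r3 = moment(3)                           # expect (15/2) a^3
zG = -(a / 6) * r3 - (a ** 2 / 3) * r2   # (5.89) in these units; expect -9/4
alpha_over_4pi_eps0 = -2 * zG            # (5.91): expect 9/2, i.e. alpha = 18 pi eps0 a^3
```

The computed ratio 9/2 corresponds exactly to $\alpha = 18\pi\varepsilon_0 a^3$.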
In the above examples, we have seen how energy levels and related properties are changed by an applied electric field. The energies of the system that result from the applied external field should not be considered as energy eigenvalues, but should be thought of as expectation values.

Perturbation theory has wide application to physical problems. Examples include the evaluation of transition probabilities between quantum states. The theory also deals with the scattering of particles. For the treatment of the degenerate case as well, interested readers are referred to appropriate literature [1, 3].
Another approximation method is the variational method. This method also has a wide range of applications in mathematical physics. A major application of the variational method lies in seeking an eigenvalue that is appropriately approximated. Suppose that we have an Hermitian differential operator $D$ that satisfies an eigenvalue equation (5.95), i.e., $Dy_n = \lambda_n y_n$, where the eigenvalues are ordered such that
5.2 Variational Method 173
$$\lambda_1 \leq \lambda_2 \leq \lambda_3 \leq \cdots. \tag{5.96}$$

For an arbitrary function $u$, we have

$$\langle y_n|Du\rangle = \langle Dy_n|u\rangle = \langle\lambda_n y_n|u\rangle = \lambda_n^{*}\langle y_n|u\rangle = \lambda_n\langle y_n|u\rangle. \tag{5.97}$$
In (5.97) we used (1.132) and the computation rule of the inner product (see Sect. 13.1). Also, we used the fact that any eigenvalue of an Hermitian operator is real (Sect. 1.4). From the assumption that the functions $\{y_n\}$ constitute the CONS, $u$ can be expanded in a series described by

$$u = \sum_j c_j|y_j\rangle \tag{5.98}$$

with

$$c_j = \langle y_j|u\rangle. \tag{5.99}$$
This relation is in parallel with (5.64). Similarly, $Du$ can be expanded such that

$$Du = \sum_j d_j|y_j\rangle = \sum_j\langle y_j|Du\rangle|y_j\rangle = \sum_j\lambda_j\langle y_j|u\rangle|y_j\rangle = \sum_j\lambda_jc_j|y_j\rangle, \tag{5.100}$$

where $d_j$ is an expansion coefficient and with the third equality we used (5.95) and with the last equality we used (5.99). Comparing the individual coefficients of $|y_j\rangle$ in (5.100), we get

$$d_j = \lambda_jc_j. \tag{5.101}$$

Thus, we obtain

$$Du = \sum_j\lambda_jc_j|y_j\rangle. \tag{5.102}$$

Comparing (5.98) and (5.102), we find that we may operate $D$ termwise on (5.98).
Meanwhile, taking the inner product $\langle u|Du\rangle$ again termwise with (5.100), we get

$$\langle u|Du\rangle = \left\langle u\,\Big|\sum_j\lambda_jc_j|y_j\rangle\right\rangle = \sum_j\lambda_jc_j\langle u|y_j\rangle = \sum_j\lambda_jc_jc_j^{*} = \sum_j\lambda_j|c_j|^2 \geq \sum_j\lambda_1|c_j|^2 = \lambda_1\sum_j|c_j|^2, \tag{5.103}$$

where with the third equality we used (5.99) in combination with $\langle u|y_j\rangle = \langle y_j|u\rangle^{*} = c_j^{*}$. For $\langle u|u\rangle$, we have

$$\langle u|u\rangle = \left\langle\sum_jc_jy_j\,\Big|\sum_{j'}c_{j'}y_{j'}\right\rangle = \sum_{j,j'}c_j^{*}c_{j'}\langle y_j|y_{j'}\rangle = \sum_j|c_j|^2, \tag{5.104}$$

where with the last equality we used the fact that the functions $\{y_n\}$ are orthonormal. Comparing once again (5.103) and (5.104), we finally get

$$\langle u|Du\rangle \geq \lambda_1\langle u|u\rangle \quad \text{or} \quad \lambda_1 \leq \frac{\langle u|Du\rangle}{\langle u|u\rangle}. \tag{5.105}$$
$$Dy = \lambda y, \tag{5.106}$$
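The bound (5.105) is easy to demonstrate numerically; in the added sketch below a random real symmetric matrix stands in for the Hermitian operator $D$.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(6, 6))
D = (A + A.T) / 2                      # symmetric matrix playing the role of D
lam1 = np.linalg.eigvalsh(D)[0]        # smallest eigenvalue lambda_1

u = rng.normal(size=6)                 # an arbitrary trial vector
rayleigh = (u @ D @ u) / (u @ u)       # <u|Du>/<u|u>, cf. (5.105)
```

Whatever trial vector is chosen, `rayleigh` never falls below `lam1`; minimizing it over a family of trial states is exactly the variational strategy used in the examples that follow.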
Using this powerful theorem, we are able to find a suitably approximated smallest eigenvalue. We give simple examples next. As before, let us focus on the change in the energies and quantum states caused by an applied electric field for a particle in a one-dimensional potential well and for a harmonic oscillator. The latter problem will be left for readers as an exercise. As a specific case, for simplicity, we consider only the ground state and the first excited state to choose a trial function. First, we deal with the problem common to both cases. Then, we discuss their individual features.

As a general case, suppose that $H$ is the Hamiltonian of the system (either a particle in a one-dimensional potential well or a harmonic oscillator). We calculate the expectation value of energy $\langle H\rangle$ when the system undergoes the influence of the electric field. The Hamiltonian is described by
$$H = H_0 - eFx, \tag{5.107}$$

where $H_0$ is the Hamiltonian without the external field $F$. We suppose that the quantum state $|u\rangle$ is expressed as

$$|u\rangle = |0\rangle + a|1\rangle, \tag{5.108}$$

where $|0\rangle$ and $|1\rangle$ denote the ground state and the first excited state, respectively; $a$ is an unknown parameter to be determined by the variational analysis. In the present case, $|u\rangle$ is a trial function. Then, we have

$$\langle H\rangle = \frac{\langle u|Hu\rangle}{\langle u|u\rangle}. \tag{5.109}$$
Note that in (5.110) four terms vanish because of the symmetry of the potential well and the harmonic oscillator with respect to the origin, as well as because of the orthogonality between the states $|0\rangle$ and $|1\rangle$. Here $x$ is Hermitian and we assume that the coordinate representations of $|0\rangle$ and $|1\rangle$ are real as in (1.101) and (1.102) or in (2.106). We also used the normalization conditions $\langle 0|0\rangle = \langle 1|1\rangle = 1$ and the orthogonality conditions $\langle 0|1\rangle = \langle 1|0\rangle = 0$. Here, defining the ground-state energy and the first excited energy as $E_0$ and $E_1$, respectively, we put $\langle 0|H_0|0\rangle = E_0$ and $\langle 1|H_0|1\rangle = E_1$, where $E_0 \neq E_1$. As already shown in Chaps. 1 and 2, both systems have nondegenerate energy eigenstates. Then, we have

$$\langle H\rangle = \frac{E_0 - 2eFa\langle 0|x|1\rangle + a^2E_1}{1 + a^2}. \tag{5.115}$$
Now we wish to determine the condition under which $\langle H\rangle$ has an extremum. That is, we seek $a$ that satisfies

$$\frac{\partial\langle H\rangle}{\partial a} = 0. \tag{5.116}$$

For $\frac{\partial\langle H\rangle}{\partial a}$ to be zero, we must have
$$\sqrt{1 + \Delta} \approx 1 + \frac{\Delta}{2}, \tag{5.120}$$

$$a \approx \frac{eF\langle 0|x|1\rangle}{E_1 - E_0}, \tag{5.121}$$

$$|u\rangle \approx |0\rangle + \frac{eF\langle 0|x|1\rangle}{E_1 - E_0}|1\rangle. \tag{5.122}$$
If one compares (5.122) with (5.17), one recognizes that these two equations are the same within the first-order approximation. Notice that putting $i = 0$ and $k = 1$ in (5.17) and replacing $\lambda V$ with $-eFx$, we obtain an expression similar to (5.122). Notice once again that $\langle 0|x|1\rangle = \langle 1|x|0\rangle$, as $x$ is an Hermitian operator and we are considering real inner products.
Inserting (5.121) into (5.115) and approximating

$$\frac{1}{1 + a^2} \approx 1 - a^2, \tag{5.123}$$
we can estimate $\langle H\rangle$. The estimate depends upon the nature of the system we choose. The next examples deal with these specific characteristics.
Example 5.4 We adopt the same problem as Example 5.1. That is, we consider how the energy of a particle carrying a charge $e$ confined within a one-dimensional potential well is changed by an applied electric field. Here we deal with it using the variational method. We deal with the same Hamiltonian as in Example 5.1:

$$H = -\frac{\hbar^2}{2m}\frac{d^2}{dx^2} - eFx. \tag{5.22}$$
By putting

$$H_0 \equiv -\frac{\hbar^2}{2m}\frac{d^2}{dx^2}, \tag{5.124}$$

we have

$$H = H_0 - eFx. \tag{5.125}$$

Expressing the ground state as $|0\rangle$ and the first excited state as $|1\rangle$, we adopt the trial function $|u\rangle$ as

$$|u\rangle = |0\rangle + a|1\rangle \tag{5.108}$$
and

$$|1\rangle = \sqrt{\frac{1}{L}}\sin\frac{\pi}{L}x. \tag{5.126}$$

Then we have

$$\langle 0|H_0|0\rangle = E_0 = \frac{\hbar^2}{2m}\frac{\pi^2}{4L^2}, \qquad \langle 1|H_0|1\rangle = E_1 = \frac{\hbar^2}{2m}\frac{4\pi^2}{4L^2} = 4E_0. \tag{5.127}$$

Since $E_1 - E_0 = 3E_0$, (5.121) gives

$$a \approx \frac{eF\langle 0|x|1\rangle}{3E_0}. \tag{5.128}$$

The required matrix element is

$$\langle 0|x|1\rangle = \frac{32L}{9\pi^2}. \tag{5.129}$$

For this calculation, use the coordinate representations of (5.26) and (5.126). Thus, we get

$$a \approx eF\,\frac{32 \cdot 8\,mL^3}{27\pi^4\hbar^2}, \tag{5.130}$$

$$|u\rangle \approx |0\rangle + eF\,\frac{\langle 0|x|1\rangle}{3E_0}|1\rangle. \tag{5.131}$$
The resulting $|u\rangle$ in (5.131) is the same as (5.31), which was obtained by the first-order approximation of (5.30). Inserting (5.130) into (5.113) and approximating the denominator as before such that

$$\frac{1}{1 + a^2} \approx 1 - a^2, \tag{5.123}$$

we obtain

$$\langle H\rangle \approx E_0 - (eF)^2\,\frac{32^2 \cdot 8\,mL^4}{243\pi^6\hbar^2}. \tag{5.132}$$

Once again, this is the same result as that of (5.28) and (5.29), where only the $n = 1$ term is taken into account.
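The minimization behind (5.128) and (5.132) can also be performed numerically on a grid of mixing parameters, as the added sketch below shows (units with $\hbar = m = L = 1$).

```python
import numpy as np

hbar = m = L = 1.0
eF = 1e-3
E0 = hbar ** 2 * np.pi ** 2 / (8 * m * L ** 2)   # ground-state energy, (5.127)
E1 = 4 * E0
X = 32 * L / (9 * np.pi ** 2)                    # <0|x|1> from (5.129)

aa = np.linspace(-0.01, 0.01, 200001)            # grid of mixing parameters a
H = (E0 - 2 * eF * aa * X + aa ** 2 * E1) / (1 + aa ** 2)   # (5.115)
a_min = aa[np.argmin(H)]
a_approx = eF * X / (3 * E0)                     # (5.128)
H_approx = E0 - (eF * X) ** 2 / (3 * E0)         # equivalent to (5.132)
```

The grid minimum lands on the analytic mixing coefficient and the variational energy to within the grid resolution.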
Example 5.5 We adopt the same problem as Example 5.2. The calculation procedures are almost the same as those of Example 5.2. We use

$$\langle 0|q|1\rangle = \sqrt{\frac{\hbar}{2m\omega}} \quad \text{and} \quad E_1 - E_0 = \hbar\omega. \tag{5.133}$$

The detailed procedures are left for readers as an exercise. We should get the same results as those obtained in Example 5.2.
Through the above five simple examples, we have shown the calculation procedures of the perturbation method and the variational method. These methods not only supply us with suitable approximation techniques, but also provide physical and mathematical insight in many fields of natural science.
References
$$z = x + iy, \tag{6.1}$$
where x and y are real numbers and i is an imaginary unit. Graphically, the number
z is indicated as a point in a complex plane where the real axis is drawn as abscissa
and the imaginary axis is depicted as ordinate (see Fig. 6.1). Since the complex plane
has a two-dimensional extension, we can readily imagine the domain of variability of
z and make a graphical object for it on the complex plane. A disk-like diagram that is
enclosed with a closed curve C is frequently dealt with in the theory of analytic
functions. Such a diagram provides a good subject for study of the set theory and
topology. Before developing the theory of complex numbers and analytic functions,
we briefly mention the basic notions as well as notations and meanings of sets and
topology.
where A contains ten elements in the former case, whereas uncountably infinitely many elements are contained in the latter set B. In the latter case (sometimes in the former case as well), the elements are usually called points.
When we speak of the sets, a universal set [1] is implied in the background. This
set contains all the sets in consideration of the study and is usually clearly defined in
the context of the discussion. For example, with the real analysis the universal set is
ℝ and with the complex analysis it is ℂ (an entire complex plane). In the latter case a
two-dimensional complex plane (see Fig. 6.1) is frequently used as the universal set.
We study various characteristics of sets (or subsets) that are contained in the
universal set and use a Venn diagram to represent them. Figure 6.2a illustrates an
example of Venn diagram that shows a subset A. As is often the case, the universal
set U is omitted in Fig. 6.2a. To show a (sub)set A we usually depict it with a closed
curve. If an element a is contained in A, we write
$$a \in A \tag{6.3}$$
6.1 Set and Topology 183
(a) (b)
Fig. 6.2 Venn diagrams. (a) An example of the Venn diagram. (b) A subset A and its complement
Ac. U shows the universal set
and depict it as a point inside the closed curve. To show that b is not contained in A,
we write
$$b \notin A.$$
$$A \subset B. \tag{6.4}$$
The subset A may have different cardinal numbers (or cardinalities) depending on the nature of A. To show this explicitly, elements are sometimes indexed as, e.g., $a_i$ $(i = 1, 2, \cdots)$ or $a_\lambda$ $(\lambda \in \Lambda)$, where $\Lambda$ can be a finite set or an infinite set of different cardinality; (6.2) is an example. A complement (or complementary set) of A is defined as the difference $U - A$ and denoted by $A^c$ $(= U - A)$. That is, the set $A^c$ is said to be the complement of A with respect to U and is indicated with a shaded area in Fig. 6.2b.

Let A and B be two different subsets in U (Fig. 6.3). Figure 6.3 represents a sum (or union, or union of sets) $A \cup B$, an intersection $A \cap B$, and differences $A - B$ and $B - A$. More specifically, $A - B$ is defined as the subset of A from which B is subtracted and is referred to as the set difference of A relative to B. We have

$$A \cup B = (A - B) \cup (B - A) \cup (A \cap B). \tag{6.5}$$
184 6 Theory of Analytic Functions
$$A \cup A = A.$$
The well-known de Morgan's laws can be understood graphically using Fig. 6.3. In fact,

$$u \in (A \cup B)^c \iff u \notin A \cup B \iff u \notin A \text{ and } u \notin B \iff u \in A^c \text{ and } u \in B^c \iff u \in A^c \cap B^c.$$

Furthermore, we have

$$u \in (A \cap B)^c \iff u \notin A \cap B \iff u \notin A \text{ or } u \notin B \iff u \in A^c \text{ or } u \in B^c \iff u \in A^c \cup B^c.$$

Hence,

$$(A \cup B)^c = A^c \cap B^c \quad \text{and} \quad (A \cap B)^c = A^c \cup B^c. \tag{6.6}$$

We also have the distributive laws

$$(A \cup B) \cap C = (A \cap C) \cup (B \cap C), \tag{6.7}$$

$$(A \cap B) \cup C = (A \cup C) \cap (B \cup C). \tag{6.8}$$
The confirmation of (6.7) and (6.8) is left for readers. The Venn diagrams are
intuitively acceptable, but we need more rigorous discussion to characterize and
analyze various aspects of sets (vide infra).
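De Morgan's laws (6.6) and the distributive laws (6.7), (6.8) can also be confirmed by brute force on a small universal set; the added sketch below checks them over all subsets.

```python
# Exhaustive check of the set identities on a 5-element universal set U.
U = frozenset(range(5))
subsets = [frozenset(k for k in U if mask >> k & 1) for mask in range(2 ** len(U))]

de_morgan = all((U - (A | B)) == (U - A) & (U - B) and
                (U - (A & B)) == (U - A) | (U - B)
                for A in subsets for B in subsets)

distributive = all(((A | B) & C) == ((A & C) | (B & C)) and
                   ((A & B) | C) == ((A | C) & (B | C))
                   for A in subsets for B in subsets for C in subsets)
```

Both flags come out true over all $2^5 = 32$ subsets, mirroring the Venn-diagram argument in the text.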
$$\tau' \equiv \{O' \cap O;\ O \in \tau\}.$$

Then, $(O', \tau')$ can be regarded as a topological space. In this case $\tau'$ is called a relative topology for $O'$ [2, 3] and $O'$ is referred to as a subspace of $T$. The topological space $(O', \tau')$ shares the same topological features as $(T, \tau)$. Notice that $O'$ does not necessarily need to be an open set of $T$. But, if $O'$ is an open set of $T$, then any open set belonging to $O'$ is an open set of $T$ as well. This is evident from the definition of the relative topology and Axiom (O2).
The abovementioned Axioms (O1)–(O3) sound somewhat pretentious, and the definition of the open sets may seem handed down from on high. Nonetheless, these axioms and this terminology will soon turn out to be useful. Once a topological space $(T, \tau)$ is given, subsets of various types arise, and a variety of relationships among them ensue as well. Note that the above axioms say nothing about sets other than open sets. Let $S$ be a subset (maybe an open set, maybe not) of $T$ and let us think of the properties of $S$.
Definition 6.1 Let $(T, \tau)$ be a topological space and $S$ be an open set of $\tau$; i.e., $S \in \tau$. Then, the subset defined as the complement of $S$ in $(T, \tau)$, i.e., $S^c = T - S$, is called a closed set in $(T, \tau)$.

Replacing $S$ with $S^c$ in Definition 6.1, we have $(S^c)^c = S = T - S^c$. The above definition may be rephrased as follows: Let $(T, \tau)$ be a topological space and let $A$ be a subset of $T$. Then, if $A^c = T - A \in \tau$, $A$ is a closed set in $(T, \tau)$.

In parallel with Axioms (O1)–(O3), we have the following axioms related to the closed sets of $(T, \tau)$. Let $\tilde{\tau}$ be the collection of all the closed sets of $(T, \tau)$.

(C1) $T \in \tilde{\tau}$ and $\varnothing \in \tilde{\tau}$, where $\varnothing$ is the empty set.
(C2) If $C_1, C_2, \cdots, C_n \in \tilde{\tau}$, then $C_1 \cup C_2 \cup \cdots \cup C_n \in \tilde{\tau}$.
(C3) If $C_\lambda\ (\lambda \in \Lambda) \in \tilde{\tau}$, then $\bigcap_{\lambda\in\Lambda} C_\lambda \in \tilde{\tau}$.
These axioms are obvious from those of (O1)–(O3) along with de Morgan’s law
(6.6).
Next, we classify and characterize a variety of elements (or points) and subsets as
well as relationships among them in terms of the open sets and closed sets. We make
the most of these notions and properties to study various aspects of topological
spaces and their structures. In what follows, we assume the presence of a topological
space (T, τ).
(a) Neighborhoods [4]

Definition 6.2 Let $(T, \tau)$ be a topological space and $S$ be a subset of $T$. If $S$ contains an open set $^{\exists}O$ that contains an element (or point) $a \in T$, i.e.,

$$a \in O \subset S,$$

then $S$ is said to be a neighborhood of $a$. The interior $S^{\circ}$ and the closure $\bar{S}$ of $S$ satisfy

$$S^{\circ} \subset S, \tag{6.9}$$

$$S \subset \bar{S}, \tag{6.10}$$

$$S^{\circ} \subset S \subset \bar{S}. \tag{6.11}$$

From Lemma 6.1 we also have

$$O \subset S^{\circ} \subset S \tag{6.12}$$

with an arbitrary open set $^{\forall}O$ contained in $S$. From Lemma 6.2 we have $x \in S^{\circ} \Rightarrow x \in O$; i.e., $S^{\circ} \subset O$ with some open set $^{\exists}O$ containing $S^{\circ}$. Expressing this specific $O$ as $\tilde{O}$, we have $S^{\circ} \subset \tilde{O}$. Meanwhile, using Lemma 6.1 once again we must get

$$\tilde{O} \subset S^{\circ} \subset S. \tag{6.13}$$

That is, we have $S^{\circ} \subset \tilde{O}$ and $\tilde{O} \subset S^{\circ}$ at once. This implies that with this specific open set $\tilde{O}$, we get $\tilde{O} = S^{\circ}$. That is,

$$\tilde{O} = S^{\circ} \subset S. \tag{6.14}$$
Relation (6.14) obviously shows that $S^{\circ}$ is an open set and moreover that $S^{\circ}$ is the largest open set contained in $S$. Figure 6.5a depicts this relation. In this context we have the following important theorem.
Theorem 6.1 Let $S$ be a subset of $T$. The subset $S$ is open if and only if $S = S^{\circ}$.

Proof From (6.14) $S^{\circ}$ is open. Therefore, if $S = S^{\circ}$, then $S$ is open. Conversely, suppose that $S$ is open. In that event $S$ itself is an open set contained in $S$. Meanwhile, $S^{\circ}$ has been characterized as the largest open set contained in $S$. Consequently, we have $S \subset S^{\circ}$. At the same time, from (6.11) we have $S^{\circ} \subset S$. Thus, we get $S = S^{\circ}$. This completes the proof. $\blacksquare$
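Theorem 6.1 (and, via complements, Theorem 6.2 below) can be made concrete on a small finite topological space; the added sketch computes $S^{\circ}$ as the union of open sets contained in $S$. The particular `tau` is an arbitrary choice satisfying Axioms (O1)–(O3).

```python
# A small topological space (T, tau) and the interior/closure operators.
T = frozenset({0, 1, 2, 3})
tau = [frozenset(), frozenset({0}), frozenset({0, 1}), T]  # closed under unions/intersections

def interior(S):
    # Largest open set contained in S: union of all open O with O <= S
    pts = set()
    for O in tau:
        if O <= S:
            pts |= O
    return frozenset(pts)

def closure(S):
    # Smallest closed set containing S, via the complement relation
    return T - interior(T - S)

subsets = [frozenset(k for k in T if m >> k & 1) for m in range(16)]
thm_6_1 = all((S in tau) == (interior(S) == S) for S in subsets)          # S open  iff S = S interior
thm_6_2 = all(((T - S) in tau) == (closure(S) == S) for S in subsets)     # S closed iff S = S closure
```

Both equivalences hold for every one of the 16 subsets of this four-point space.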
In contrast to Lemmas 6.1 and 6.2, we have the following lemmas with respect to closures.

Lemma 6.3 Let $S$ be a subset of $T$ and $C$ be any closed set containing $S$. Then, we have $\bar{S} \subset C$.

Proof Since $C$ is a closed set, $C^c$ is an open set. Suppose that with a point $x$, $x \notin C$. Then, $x \in C^c$. Since $C \supset S$, $C^c \subset S^c$. This implies $C^c \cap S = \varnothing$. As $x \in C^c \subset C^c$, by Definition 6.2 $C^c$ is a neighborhood of $x$. Then, by Definition 6.4 $x$ is not in $\bar{S}$; i.e., $x \notin \bar{S}$. This statement can be translated into $x \in C^c \Rightarrow x \in (\bar{S})^c$; i.e., $C^c \subset (\bar{S})^c$. That is, we get $\bar{S} \subset C$. $\blacksquare$
Lemma 6.4 Let S be a subset of T and suppose that a point x does not belong to S̄;
i.e., x ∉ S̄. Then, x ∉ C for some closed set ∃C containing S.
Proof If x ∉ S̄, then by Definition 6.4 we have some neighborhood ∃N and some
open set ∃O contained in that N such that x ∈ O ⊂ N and N ∩ S = ∅. Therefore, we
must have O ∩ S = ∅. Let C = Oᶜ. Then C is a closed set with C ⊃ S. Since
x ∈ O, x ∉ C. This completes the proof. ∎
From Lemmas 6.3 and 6.4 we obtain further implications. From Lemma 6.3 and
(6.11) we have

S ⊂ S̄ ⊂ C  (6.15)

and, with a specific closed set C̃ containing S,

S ⊂ S̄ ⊂ C̃.  (6.16)

That is, we have C̃ ⊂ S̄ and S̄ ⊂ C̃ at once. This implies that with this specific
closed set C̃, we get C̃ = S̄. That is,

S ⊂ S̄ = C̃.  (6.17)
Relation (6.17) obviously shows that S̄ is a closed set and moreover that S̄ is the
smallest closed set containing S. Figure 6.5b depicts this relation. In this context we
have the following important theorem.
Theorem 6.2 Let S be a subset of T. The subset S is closed if and only if S = S̄.
Proof The proof can be done analogously to that of Theorem 6.1. From (6.17) S̄ is
closed. Therefore, if S = S̄, then S is closed. Conversely, suppose that S is closed. In
that event S itself is a closed set containing S. Meanwhile, S̄ has been characterized as
the smallest closed set containing S. Then, we have S ⊃ S̄. At the same time, from
(6.11) we have S̄ ⊃ S. Thus, we get S = S̄. This completes the proof. ∎
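Theorem 6.2 admits the same kind of exhaustive check; the finite topology below is again a hypothetical illustration, with the closure computed directly as the smallest closed set containing S:

```python
from itertools import combinations

# The same hypothetical topology on T = {1, 2, 3} as above.
T = frozenset({1, 2, 3})
tau = {frozenset(), frozenset({1}), frozenset({1, 2}), T}
closed = {T - O for O in tau}            # complements of the open sets

def closure(S):
    """S̄ = intersection of all closed sets containing S (the smallest one)."""
    result = set(T)
    for C in closed:
        if S <= C:
            result &= C
    return frozenset(result)

# Theorem 6.2: S is closed  <=>  S = S̄; and S ⊂ S̄ always, cf. (6.10).
for r in range(len(T) + 1):
    for c in combinations(T, r):
        S = frozenset(c)
        assert S <= closure(S)
        assert (S in closed) == (S == closure(S))
```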
(c) Boundary [4]
Definition 6.5 Let (T, τ) be a topological space and S be a subset of T. If a point
x ∈ T is in both the closure of S and the closure of the complement of S (i.e., Sᶜ), x is
said to be in the boundary of S. In this case, x is called a boundary point of S. The
boundary of S is denoted by Sᵇ.
By this definition Sᵇ can be expressed as

Sᵇ = S̄ ∩ (Sᶜ)‾.  (6.18)

Thus, we have

Sᵇ = (Sᶜ)ᵇ.  (6.20)
This means that S and Sᶜ have the same boundary. Since both S̄ and (Sᶜ)‾ are closed
sets and the boundary is defined as their intersection, from Axiom (C3) the boundary
is a closed set. From Definition 6.1 a closed set is defined as a complement of an
open set, and vice versa. Thus, open sets and closed sets are not opposing concepts
but complementary ones. In this respect we have the following lemma and theorem.
Lemma 6.5 Let S be an arbitrary subset of T. Then, we have

(Sᶜ)‾ = (S°)ᶜ.

Proof Since S° ⊂ S, (S°)ᶜ ⊃ Sᶜ. As S° is open, (S°)ᶜ is closed. Hence, (S°)ᶜ ⊃ (Sᶜ)‾,
because (Sᶜ)‾ is the smallest closed set containing Sᶜ. Next let C be an arbitrary closed set
that contains Sᶜ. Then Cᶜ is an open set that is contained in S, and so we have Cᶜ ⊂ S°,
because S° is the largest open set contained in S. Therefore, C ⊃ (S°)ᶜ. Then, if we
choose (Sᶜ)‾ for C, we must have (Sᶜ)‾ ⊃ (S°)ᶜ. Combining this relation with (S°)ᶜ ⊃ (Sᶜ)‾
obtained above, we get (Sᶜ)‾ = (S°)ᶜ. ∎
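Lemma 6.5 and the boundary relations (6.18) and (6.20) can be verified exhaustively on a hypothetical finite topology; a minimal sketch:

```python
from itertools import combinations

# Hypothetical topology on T = {1, 2, 3}; interior, closure, and boundary
# are computed directly from the definitions.
T = frozenset({1, 2, 3})
tau = {frozenset(), frozenset({1}), frozenset({1, 2}), T}
closed = {T - O for O in tau}

def interior(S):
    pts = set()
    for O in tau:
        if O <= S:
            pts |= O
    return frozenset(pts)

def closure(S):
    result = set(T)
    for C in closed:
        if S <= C:
            result &= C
    return frozenset(result)

def boundary(S):                 # (6.18): S^b = closure(S) ∩ closure(S^c)
    return closure(S) & closure(T - S)

for r in range(len(T) + 1):
    for c in combinations(T, r):
        S = frozenset(c)
        assert closure(T - S) == T - interior(S)    # Lemma 6.5
        assert boundary(S) == boundary(T - S)       # (6.20)
```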
Theorem 6.3 [4] A necessary and sufficient condition for S to be both open and
closed at once is Sᵇ = ∅.
Proof
(i) Necessary condition: Let S be open and closed. Then, S = S̄ and S = S°.
Suppose that Sᵇ ≠ ∅. Then, from Definition 6.5 we would have ∃a ∈ S̄ ∩ (Sᶜ)‾
for some a. Meanwhile, from Lemma 6.5 we have (Sᶜ)‾ = (S°)ᶜ. This leads to
a ∈ S̄ ∩ (S°)ᶜ. But, from the assumption we would have a ∈ S ∩ Sᶜ. We have no
such a, however. Thus, using the proof by contradiction we must have Sᵇ = ∅.
(ii) Sufficient condition: Suppose that Sᵇ = ∅. Starting with S ⊃ S° and S ≠ S°, let
us assume that there is a such that a ∈ S and a ∉ S°. That is, we have a ∈
(S°)ᶜ ⇒ a ∈ S ∩ (S°)ᶜ ⊂ S̄ ∩ (S°)ᶜ = S̄ ∩ (Sᶜ)‾ = Sᵇ, in contradiction to the sup-
position. Thus, we must not have such a, implying that S = S°. Next, starting
with S̄ ⊃ S and S̄ ≠ S, we assume that there is a such that a ∈ S̄ and a ∉ S. That
is, we have a ∈ Sᶜ ⇒ a ∈ S̄ ∩ Sᶜ ⊂ S̄ ∩ (Sᶜ)‾ = Sᵇ, in contradiction to the suppo-
sition. Thus, we must not have such a, implying that S̄ = S. Combining this
relation with S = S° obtained above, we get S̄ = S = S°. In other words, S is
both open and closed at once. These complete the proof. ∎
The abovementioned set is sometimes referred to as a closed-open set or, as a
portmanteau word, a clopen set. An interesting example can be seen in Chap. 20 in
relation to topological groups (or continuous groups).
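Theorem 6.3 can be illustrated on a hypothetical finite topology that possesses nontrivial clopen sets; a minimal sketch:

```python
from itertools import combinations

# A hypothetical topology on T = {1, 2, 3} possessing the nontrivial clopen
# sets {1, 2} and {3}.
T = frozenset({1, 2, 3})
tau = {frozenset(), frozenset({1, 2}), frozenset({3}), T}
closed = {T - O for O in tau}

def closure(S):
    result = set(T)
    for C in closed:
        if S <= C:
            result &= C
    return frozenset(result)

def boundary(S):
    return closure(S) & closure(T - S)

# Theorem 6.3: S^b = ∅  <=>  S is both open and closed (clopen).
for r in range(len(T) + 1):
    for c in combinations(T, r):
        S = frozenset(c)
        assert (boundary(S) == frozenset()) == (S in tau and S in closed)
```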
Then, (6.22) means that Sᵈ is divided into two sets consisting of the accumulation
points, i.e., (Sᵈ)⁺ that belongs to S and (Sᵈ)⁻ that does not belong to S.
Thus, as another direct sum we have

S = (Sᵈ ∩ S) ∪ Sᵈˢᶜ = (Sᵈ)⁺ ∪ Sᵈˢᶜ.  (6.23)

Equation (6.23) represents the direct sum consisting of a part of the derived set
and the whole discrete set. Here we have the following theorem.
S̄ ⊃ Sᵈ ∪ Sᵈˢᶜ.

It is natural to ask how to deal with other points that do not belong to Sᵈ ∪ Sᵈˢᶜ.
Taking account of the above discussion on the accumulation points and isolated
points, however, such remainder points should again be classified into accumulation
and isolated points. Since all the isolated points are contained in S, they can be taken
into S.
Suppose that among the accumulation points, some points p are not contained in
S; i.e., p ∉ S. In that case, we have S − {p} = S, namely (S − {p})‾ = S̄. Thus, p ∈
(S − {p})‾ ⇔ p ∈ S̄. From (6.21), in turn, this implies that in case p is not contained in
S, for p to be an accumulation point of S is equivalent to saying that p is an adherent
point of S. Then, those points p can be taken into S̄ as well. Thus, finally we get (6.24).
From Definitions 6.6 and 6.7, obviously we have Sᵈ ∩ Sᵈˢᶜ = ∅. This completes the
proof. ∎
Rewriting (6.24), we get

S̄ = (Sᵈ)⁻ ∪ (Sᵈ)⁺ ∪ Sᵈˢᶜ = (Sᵈ)⁻ ∪ S.  (6.25)

Equivalently, we have

S̄ = Sᵈ ∪ S.  (6.26)
Proof Let us assume that p ∈ S̄. Then, we have the following two cases: (i) if p ∈ S, we
trivially have p ∈ S ⊂ Sᵈ ∪ S. (ii) Suppose p ∉ S. Since p ∈ S̄, from Definition 6.4
any neighborhood of p contains a point of S (other than p). Then, from Definition
6.6, p is an accumulation point of S. Thus, we have p ∈ Sᵈ ⊂ Sᵈ ∪ S and, hence, get
S̄ ⊂ Sᵈ ∪ S. Conversely, suppose that p ∈ Sᵈ ∪ S. Then, (i) if p ∈ S, obviously we
have p ∈ S̄. (ii) Suppose p ∈ Sᵈ. Then, from Definition 6.6 any neighborhood of
p contains a point of S (other than p). Thus, from Definition 6.4, p ∈ S̄. Taking into
account the above two cases, we get Sᵈ ∪ S ⊂ S̄. Combining this with S̄ ⊂ Sᵈ ∪ S
obtained above, we get (6.26). This completes the proof. ∎
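Relation (6.26) can be verified exhaustively on a hypothetical finite topology, with the derived set computed directly from the definition of an accumulation point (every open set containing p must meet S − {p}); a minimal sketch:

```python
from itertools import combinations

# Hypothetical topology on T = {1, 2, 3}; the derived set S^d collects the
# accumulation points of S.
T = frozenset({1, 2, 3})
tau = {frozenset(), frozenset({1}), frozenset({1, 2}), T}
closed = {T - O for O in tau}

def closure(S):
    result = set(T)
    for C in closed:
        if S <= C:
            result &= C
    return frozenset(result)

def derived(S):
    return frozenset(p for p in T
                     if all(O & (S - {p}) for O in tau if p in O))

# (6.26): the closure is the union of the derived set and the set itself.
for r in range(len(T) + 1):
    for c in combinations(T, r):
        S = frozenset(c)
        assert closure(S) == derived(S) | S
```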
S̄ = Sᵈ ∪ S = (Sᵈ)⁻ ∪ S,  Sᵈ = (Sᵈ)⁺ ∪ (Sᵈ)⁻.

This relation is equivalent to the statement that S̄ contains all the accumulation
points of S.
In Fig. 6.6 we show the relationship among S, S̄, Sᵈ, and Sᵈˢᶜ. From Fig. 6.6, we
can readily show that

Sᵈˢᶜ = S − Sᵈ = S̄ − Sᵈ.
Meanwhile, we have

Sᵇ = S̄ ∩ (Sᶜ)‾ = S̄ ∩ (S°)ᶜ = S̄ − S°.  (6.27)

Since S̄ ⊃ S°, we get

S̄ = S° ∪ Sᵇ;  S° ∩ Sᵇ = ∅,

and hence

Sᵇ = ∅ ⟺ S̄ = S° ⟺ S is a clopen set.
Fig. 6.7 Connectedness of subspaces. (a) Connected subspace A. Any two points z1 and z2 of A can
be connected by a continuous line that is contained entirely within A. (b) Simply connected
subspace A. All the points inside any closed path C within A contain only the points that belong
to A. (c) Disconnected subspace A that comprises two disjoint sets B and C
(e) Connectedness
In the preceding several paragraphs we studied various building blocks of the
topological space and their properties, together with the sets contained in the space.
In this paragraph, we briefly mention the relationship between the sets. Connectedness
is an important concept in topology. The concept of connectedness is associated
with the subspaces defined earlier in this section. In terms of connectedness, a
topological space can be categorized into two main types: a subspace of the one type
is connected and that of the other is disconnected.
Let A be a subspace of T. Intuitively, for A to be connected is defined as follows:
Let z1 and z2 be any two points of A. If z1 and z2 can be joined by a continuous line
that is contained entirely within A (see Fig. 6.7a), then A is said to be connected. Of
the connected subspaces, suppose that the subspace A has the property that the inside
of any closed path (i.e., continuous closed line) C within A contains only points that
belong to A (Fig. 6.7b). Then, that subspace A is said to be simply connected.
A subspace that is connected but not simply connected is said to be multiply
connected. A typical example of the latter is a torus. If a subspace is given
as a union of two (or more) disjoint non-empty (open) sets, that subspace is said to be
disconnected. Notice that if two (or more) sets have no element in common, those
sets are said to be disjoint. Figure 6.7c shows an example for which a disconnected
subspace A is a union of two disjoint sets B and C.
Interesting examples can be seen in Chap. 20 in relation to the connectedness.
6.1.3 T1-Space
We can “enter” a variety of topologies τ into a set T to get a topological space (T, τ).
We have two extremes of them. One is an indiscrete topology (or trivial topology)
and the other is a discrete topology [2]. For the former the topology is characterized
as
τ = {∅, T}.  (6.28)

For the latter, the topology comprises all the subsets of T. With this τ all the subsets
are open, and the complements of the individual subsets are closed by Definition 6.1.
But such closed subsets are again contained in τ, and so they are also open. Thus, all
the subsets are clopen sets. Practically speaking, the aforementioned two extremes
are of less interest and value. For this reason, we need some moderate separation
conditions on topological spaces. These conditions are well established as the
separation axioms. We will not get into details about the discussion [2], but we
mention the first separation axiom (Fréchet axiom) that produces the T1-space.
Definition 6.8 Let (T, τ) be a topological space. Suppose that with respect to
∀x, y (x ≠ y) ∈ T there is a neighborhood N of x in such a way that

x ∈ N and y ∉ N.
The topological space that satisfies the above separation condition is called a
T1-space.
We mention two important theorems of the T1-space.
Theorem 6.7 A necessary and sufficient condition for (T, τ) to be a T1-space is that
for each point x ∈ T, {x} is a closed set of T.
Proof
(i) Necessary condition: Let (T, τ) be a T1-space. Choose any pair of elements
x, y (x ≠ y) ∈ T. Then, from Definition 6.8 there is a neighborhood ∃N of y (≠ x)
that does not contain x; y ∈ N and N ∩ {x} = ∅. From Definition 6.4 this implies
that y ∉ {x}‾. Thus, we have y ≠ x ⇒ y ∉ {x}‾. This means that y ∈ {x}‾ ⇒ y = x.
Namely, {x}‾ ⊂ {x}, but from (6.11) we have {x} ⊂ {x}‾. Thus, {x}‾ = {x}.
From Theorem 6.2 this shows that {x} is a closed set of T.
(ii) Sufficient condition: Suppose that {x} is a closed set of T. Choose any x and
y such that y ≠ x. Since {x} is a closed set, N = T − {x} is an open set that
contains y; i.e., y ∈ T − {x} and x ∉ T − {x}. Since y ∈ T − {x} ⊂ T − {x}
z = x + iy  (6.1)

arg z = θ + 2πn;  n = 0, ±1, ±2, ⋯ ,

we get

z = re^{iθ}.  (6.34)

Equation (6.36) is called de Moivre's theorem. This relation holds with n being an
integer (including zero) as well as a rational number (including negative ones).
Euler's formula (6.33) immediately leads to the following important formulae:

cos θ = (e^{iθ} + e^{−iθ})/2,  sin θ = (e^{iθ} − e^{−iθ})/2i.  (6.37)

Notice that although in (6.32) and (6.33) we assumed that θ is a real number,
(6.33) and (6.37) hold with any complex number θ. Note also that (6.33) results from
(6.37) and the following definition of the power series expansion of the exponential
function

e^z ≡ Σ_{k=0}^∞ z^k/k!,  (6.38)

where z is any complex number [5]. In fact, from (6.37) and (6.38) we have the
following familiar power series expansions of the cosine and sine functions:

cos z = Σ_{k=0}^∞ (−1)^k z^{2k}/(2k)!,  sin z = Σ_{k=0}^∞ (−1)^k z^{2k+1}/(2k + 1)!.
z − w = (x − u) + i(y − v).

ρ(z, w) ≡ |z − w|,
ρ(z, w) defines a metric (or distance function); see Chap. 13. Thus, the metric gives a
distance between any arbitrarily chosen pair of elements z, w 2 ℂ and (ℂ, ρ) can be
dealt with as a metric space. This makes it easy to view the complex plane as a
topological space. Discussions about the set theory and topology already developed
earlier in this chapter basically hold.
In the theory of analytic functions, a subset A depicted in Fig. 6.7 can be regarded
as a part of the complex plane. Since the complex plane ℂ itself represents a
topological space, the subset A may be viewed as a subspace of ℂ. A complex
function f (z) of a complex variable z of (6.1) is usually defined in a connected open
set called a region [5]. The said region is of practical use among various subspaces
and can be an entire domain of ℂ or a subset of ℂ. The connectedness of the region
can be considered in a manner similar to that described in Sect. 6.1.2.
Since the complex number and complex plane have been well characterized in the
previous section, we deal with the complex functions of a complex variable in the
complex plane.
Taking the complex conjugate of (6.1), we have

z* = x − iy.  (6.40)
Combining (6.1) and (6.40), we have

(z z*) = (x y) (1 1; i −i).  (6.41)
Fig. 6.10 Ways a number (real or complex) is approached. (a) Two ways a number x0 is
approached in a real number line. (b) Various ways a number z0 is approached in a complex
plane. Here, only four ways are indicated
The matrix (1 1; i −i) is non-singular, and so the inverse matrix exists (see Sect.
11.3) and (x y) can be described as

(x y) = (z z*) · (1/2)(1 −i; 1 i),  (6.42)

i.e.,

x = (1/2)(z + z*) and y = −(i/2)(z − z*),  (6.43)
where both u(x, y) and v(x, y) are real functions of real variables x and y.
f (x, y) = 2x + iy = z + x,  (6.45)

where f (x, y) and h(z, z*) denote different functional forms. The derivative of (6.45)
varies depending on the way z₀ is approached. For instance, think of

df/dz|₀ = lim_{z→0} [f (0 + z) − f (0)]/z = lim_{x→0, y→0} (2x + iy)/(x + iy)
= lim_{x→0, y→0} (2x² + y² − ixy)/(x² + y²).

Suppose that the differentiation is taken along a straight line in the complex plane
represented by iy = (ik)x (k, x, y: real). Then, we have

df/dz|₀ = lim_{x→0} (2x² + k²x² − ikx²)/(x² + k²x²) = (2 + k² − ik)/(1 + k²).  (6.46)

However, this means that df/dz|₀ takes varying values depending upon k. Namely,
df/dz|₀ cannot uniquely be defined but depends on the way we approach the origin
of the complex plane. Thus, we find that the derivative takes different values
depending on the straight lines along which the differentiation is taken. This means
that f (x, y) is not differentiable, or analytic, at z = 0.
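The direction dependence in (6.46) can be seen numerically by evaluating the difference quotient of f (z) = 2x + iy along different rays through the origin; the step size and directions below are arbitrary choices:

```python
import cmath

# Difference quotient of f(z) = 2x + iy at z = 0 along rays z = t e^{iφ}.
# For a differentiable function the limit would be independent of φ;
# here it reproduces the k-dependent value (2 + k² − ik)/(1 + k²) of (6.46).
def f(z):
    return 2 * z.real + 1j * z.imag

def quotient(phi, t=1e-8):
    z = t * cmath.exp(1j * phi)
    return (f(z) - f(0)) / z

assert abs(quotient(0.0) - 2) < 1e-6            # real axis (k = 0): value 2
assert abs(quotient(cmath.pi / 2) - 1) < 1e-6   # imaginary axis: value 1
assert abs(quotient(0.0) - quotient(cmath.pi / 2)) > 0.5   # not differentiable
```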
Meanwhile, think of g(z) expressed as

g(z) = x + iy = z.  (6.47)

In this case, we have

dg/dz|₀ = lim_{z→0} [g(0 + z) − g(0)]/z = lim_{z→0} z/z = 1.

As a result, the derivative takes the same value 1, regardless of which straight line
the differentiation is taken along.
Though simple, the above example gives us a heuristic method. As before, let f (z)
be a function described as (6.44) such that f (z) = u(x, y) + iv(x, y), where both
u(x, y) and v(x, y) possess the first-order partial derivatives with respect to x and y.
Then we have a derivative df (z)/dz. We wish to seek the condition under which
df (z)/dz gives the same result regardless of the order of taking the limits Δx → 0
and Δy → 0. That is, taking the limit Δy → 0 first, we obtain (6.50); taking Δx → 0
first, we obtain (6.51). Consequently, by equating the real and imaginary parts of
(6.50) and (6.51) we must have

∂u(x, y)/∂x = ∂v(x, y)/∂y,  (6.52)

∂v(x, y)/∂x = −∂u(x, y)/∂y.  (6.53)

These relationships (6.52) and (6.53) are called the Cauchy–Riemann conditions.
Differentiating (6.52) with respect to x and (6.53) with respect to y and further
combining one with the other, we get

∂²u(x, y)/∂x² + ∂²u(x, y)/∂y² = 0.  (6.54)

Similarly we have

∂²v(x, y)/∂x² + ∂²v(x, y)/∂y² = 0.  (6.55)

Moreover, in terms of the partial derivatives the derivative can be written as

df (z)/dz = ∂f/∂x  (6.56)

and

df (z)/dz = −i ∂f/∂y.  (6.57)
Meanwhile, we have

∂/∂x = (∂z/∂x)(∂/∂z) + (∂z*/∂x)(∂/∂z*) = ∂/∂z + ∂/∂z*.  (6.58)

Also, we get

∂/∂y = (∂z/∂y)(∂/∂z) + (∂z*/∂y)(∂/∂z*) = i(∂/∂z − ∂/∂z*),  (6.59)

where the change in the functional form is due to the variable transformation.
Hence, from (6.56) and using (6.58) we have

df (z)/dz = ∂F(z, z*)/∂x = (∂/∂z + ∂/∂z*) F(z, z*) = ∂F(z, z*)/∂z + ∂F(z, z*)/∂z*.  (6.61)

Similarly, we get

df (z)/dz = −i ∂F(z, z*)/∂y = (∂/∂z − ∂/∂z*) F(z, z*) = ∂F(z, z*)/∂z − ∂F(z, z*)/∂z*.  (6.62)

Equating (6.61) with (6.62), we obtain

∂F(z, z*)/∂z* = 0.  (6.63)

This clearly shows that if F(z, z*) is differentiable with respect to z, F(z, z*) does
not depend on z*, but only depends upon z. Thus, we may write the differentiable
function as

F(z, z*) ≡ f (z).
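The Cauchy–Riemann conditions (6.52)-(6.53) and the harmonicity relations (6.54)-(6.55) can be checked symbolically; the test functions z² (analytic) and 2x + iy (the non-analytic example above) are arbitrary choices. A sketch using sympy:

```python
import sympy as sp

# Symbolic check of the Cauchy-Riemann conditions (6.52)-(6.53) and the
# harmonicity relations (6.54)-(6.55) for the arbitrary test function z**2.
x, y = sp.symbols('x y', real=True)
z = x + sp.I * y

f = sp.expand(z ** 2)
u, v = sp.re(f), sp.im(f)                       # u = x**2 - y**2, v = 2*x*y
assert sp.simplify(sp.diff(u, x) - sp.diff(v, y)) == 0        # (6.52)
assert sp.simplify(sp.diff(v, x) + sp.diff(u, y)) == 0        # (6.53)
assert sp.simplify(sp.diff(u, x, 2) + sp.diff(u, y, 2)) == 0  # (6.54)
assert sp.simplify(sp.diff(v, x, 2) + sp.diff(v, y, 2)) == 0  # (6.55)

# g = 2x + iy violates (6.52): u_x = 2 while v_y = 1.
u2, v2 = 2 * x, y
assert sp.diff(u2, x) - sp.diff(v2, y) != 0
```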
where u(x, y) and v(x, y) satisfy the Cauchy–Riemann conditions and possess
continuous first-order partial derivatives with respect to x and y in some region of ℂ.
Then, we have

u(x + Δx, y + Δy) − u(x, y) = [∂u(x, y)/∂x]Δx + [∂u(x, y)/∂y]Δy + ε₁Δx + δ₁Δy,  (6.64)

v(x + Δx, y + Δy) − v(x, y) = [∂v(x, y)/∂x]Δx + [∂v(x, y)/∂y]Δy + ε₂Δx + δ₂Δy,  (6.65)
where four positive numbers ε1, ε2, δ1, and δ2 can be made arbitrarily small as Δx and
Δy tend to be zero. The relations (6.64) and (6.65) result from the continuity of the
first-order partial derivatives of u(x, y) and v(x, y). Then, we have
[f (z + Δz) − f (z)]/Δz
= [u(x + Δx, y + Δy) − u(x, y)]/Δz + i[v(x + Δx, y + Δy) − v(x, y)]/Δz
= (Δx/Δz)[∂u(x, y)/∂x + i ∂v(x, y)/∂x] + (Δy/Δz)[∂u(x, y)/∂y + i ∂v(x, y)/∂y]
  + (Δx/Δz)(ε₁ + iε₂) + (Δy/Δz)(δ₁ + iδ₂)
= (Δx/Δz)[∂u(x, y)/∂x + i ∂v(x, y)/∂x] + (Δy/Δz)[−∂v(x, y)/∂x + i ∂u(x, y)/∂x]
  + (Δx/Δz)(ε₁ + iε₂) + (Δy/Δz)(δ₁ + iδ₂)
= [(Δx + iΔy)/Δz] ∂u(x, y)/∂x + i[(Δx + iΔy)/Δz] ∂v(x, y)/∂x
  + (Δx/Δz)(ε₁ + iε₂) + (Δy/Δz)(δ₁ + iδ₂)
= ∂u(x, y)/∂x + i ∂v(x, y)/∂x + (Δx/Δz)(ε₁ + iε₂) + (Δy/Δz)(δ₁ + iδ₂),  (6.66)

where with the third equality we used the Cauchy–Riemann conditions (6.52) and
(6.53). Accordingly, we have
From (6.56) and (6.57), the relations (6.69) and (6.70) imply that for f (z) to be
differentiable requires that the first-order partial derivatives of u(x, y) and v(x, y) with
respect to x and y should exist. Meanwhile, once f (z) is found to be differentiable
(or analytic), its higher order derivatives must be analytic as well (vide infra). This
requires, in turn, that the above first-order partial derivatives should be continuous.
(Note that the analyticity naturally leads to continuity.) Thus, the following theorem
will follow.
Theorem 6.9 Let f (z) be a complex function of a complex variable z such that
f (z) = u(x, y) + iv(x, y), where x and y are real variables. Then, a necessary and
sufficient condition for f (z) to be differentiable is that the first-order partial deriva-
tives of u(x, y) and v(x, y) with respect to x and y exist and are continuous and that
u(x, y) and v(x, y) satisfy the Cauchy–Riemann conditions.
Now, we formally give definitions of the differentiability and analyticity of a
complex function of complex variables.
Definition 6.9 Let z0 be a given point of the complex plane ℂ. Let f (z) be defined in
a neighborhood containing z0. If f (z) is single valued and differentiable at all points
of this neighborhood containing z0, f (z) is said to be analytic at z0. A point at which
f (z) is analytic is called a regular point of f (z). A point at which f (z) is not analytic is
called a singular point of f (z).
We emphasize that the above definition of analyticity at some point z0 requires the
single valuedness and differentiability at all the points of a neighborhood containing
z0. This can be understood from Fig. 6.10b and Example 6.1. The analyticity at a point
is deeply connected to how and in which direction we take the limiting process.
Thus, we need detailed information about the neighborhood of the point in question
to determine whether the function is analytic at that point.
The next definition is associated with a global characteristic of the analytic
function.
Definition 6.10 Let ℛ be a region contained in the complex plane ℂ. Let f (z) be a
complex function defined in ℛ. If f (z) is analytic at all points of ℛ, f (z) is said to be
analytic in ℛ. In this case ℛ is called the domain of analyticity. If the domain of
analyticity is the entire complex plane ℂ, the function is called an entire function.
To understand various characteristics of the analytic functions, it is indispensable
to introduce Cauchy’s integral formula. In the next section we deal with the
integration of complex functions.
where z₀ ≡ za = z(ta) and zn ≡ zb = z(tb) together with zi = z(ti) (0 ≤ i ≤ n). For this
parametrization, we assume that C is subdivided into n pieces designated by z₀, z₁,
⋯, zn. Let f (z) be a complex function defined in a region containing C. In this
situation, let us consider the following summation Sn:

Sn = Σ_{i=1}^{n} f (ζi)(zi − z_{i−1}),  (6.72)
I = ∫_C [u(x, y)(dx/dt) − v(x, y)(dy/dt)] dt + i ∫_C [v(x, y)(dx/dt) + u(x, y)(dy/dt)] dt
= ∫_C [u(x, y) + iv(x, y)](dx/dt) dt + i ∫_C [u(x, y) + iv(x, y)](dy/dt) dt
= ∫_C [u(x, y) + iv(x, y)][(dx/dt) + i(dy/dt)] dt = ∫_C [u(x, y) + iv(x, y)](dz/dt) dt.  (6.76)

That is, we get

I = ∫_C f (z)(dz/dt) dt = ∫_{ta}^{tb} f (z)(dz/dt) dt = ∫_{za}^{zb} f (z) dz.
Notice that the contour integral I of (6.74), (6.75), or (6.76) depends in general on
the paths connecting the points za and zb. If so, (6.76) is merely of secondary
importance. But, if f (z) can be described by the derivative of another function, the
situation becomes totally different. Suppose that we have
f (z) = dF̃(z)/dz.  (6.77)

Then, we have

f (z)(dz/dt) = [dF̃(z)/dz](dz/dt) = dF̃[z(t)]/dt.  (6.78)
This implies that the contour integral I does not depend on the path C connecting
the points za and zb. We must be careful, however, about a situation where there
would be a singular point zs of f (z) in a region ℛ. Then, f (z) is not defined at zs. If one
chooses C″ for a contour of the return path in such a way that zs is on C″ (see
Fig. 6.11), RHS of (6.80) cannot exist, nor can (6.80) hold. To avoid that situation,
we have to "expel" such a singularity from the region ℛ so that ℛ can wholly be
contained in the domain of analyticity of f (z) and that f (z) can be analytic in the
entire region ℛ. Moreover, we must choose ℛ so that the whole region inside any
closed path within ℛ can contain only the points that belong to the domain of
analyticity of f (z). That is, the region ℛ must be simply connected in terms of the
analyticity of f (z) (see Sect. 6.1). For a region ℛ to be simply connected ensures that
(6.80) holds with any pair of points za and zb in ℛ, so far as the integration path with
respect to (6.80) is contained in ℛ.
From (6.80), we further get
∫_{za}^{zb} f (z) dz + ∫_{zb}^{za} f (z) dz = 0.  (6.81)

Since the integral does not depend on the paths, we can take a path from zb to za
along a curve C′ (see Fig. 6.11). Thus, we have

∫_C f (z) dz + ∫_{C′} f (z) dz = ∫_{C+C′} f (z) dz = 0,  (6.82)
Fig. 6.12 Region ℛ and contour C for integration. (a) A singular point at zs is contained within ℛ.
(b) The singular point at zs is not contained within ℛ̃ so that f (z) can be analytic in a simply
connected region ℛ̃. Regarding the symbols and notations, see text
∮_C f (z) dz = 0.  (6.84)
where Γ̃ denotes the clockwise integration (see Fig. 6.12b) and the lines L1 and L2 are
rendered as closely as possible. In (6.85) the integrations along L1 and L2 cancel,
because they are taken in the reverse directions. Thus, we have

∮_{C+Γ̃} f (z) dz = 0 or ∮_C f (z) dz = ∮_Γ f (z) dz,  (6.86)

where Γ stands for the counterclockwise integration. Equation (6.86) is valid as well
when there are more singular points. In that case, instead of (6.86) we have

∮_C f (z) dz = Σᵢ ∮_{Γi} f (z) dz,  (6.87)
where Γi is taken so that it can encircle individual singular points. Reflecting the
nature of singularity, we get different results with (6.87). We will come back to this
point later.
On the basis of Cauchy’s integral theorem, we show an integral representation of
an analytic function that is well known as Cauchy’s integral formula.
Theorem 6.11: Cauchy's Integral Formula Let f (z) be analytic in a simply
connected region ℛ. Let C be an arbitrary closed curve within ℛ that encircles z.
Then, we have

f (z) = (1/2πi) ∮_C [f (ζ)/(ζ − z)] dζ,  (6.88)

ζ = z + ρe^{iθ}.  (6.90)

∮_Γ dζ/(ζ − z) = ∫₀^{2π} i dθ = 2πi.  (6.92)
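Cauchy's integral formula (6.88) and the elementary integral (6.92) can be verified numerically on a circle (6.90); the function f (ζ) = e^ζ, the sample point z, and the radius ρ are arbitrary choices. A minimal sketch:

```python
import numpy as np

# Numerical check of Cauchy's integral formula (6.88) for f(ζ) = e^ζ on the
# circle (6.90), ζ = z + ρ e^{iθ}, so that dζ = iρ e^{iθ} dθ.
z = 0.3 + 0.2j
rho = 1.0
theta = np.linspace(0.0, 2 * np.pi, 20001)
zeta = z + rho * np.exp(1j * theta)
dzeta = 1j * rho * np.exp(1j * theta)          # dζ/dθ

integrand = np.exp(zeta) / (zeta - z) * dzeta
integral = np.sum((integrand[1:] + integrand[:-1]) / 2 * np.diff(theta))
assert abs(integral / (2j * np.pi) - np.exp(z)) < 1e-8     # (6.88)

integrand1 = dzeta / (zeta - z)
integral1 = np.sum((integrand1[1:] + integrand1[:-1]) / 2 * np.diff(theta))
assert abs(integral1 - 2j * np.pi) < 1e-10                 # (6.92)
```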
we have

| ∫_C f (z) dz | ≤ max| f | L,  (6.95)

where L represents the arc length of the curve C for the contour integration.
Proof As discussed earlier in this section, the integral is the limit as n → ∞ of the
sum described by

Sn = Σ_{i=1}^{n} f (ζi)(zi − z_{i−1})  (6.72)

with

I = lim_{n→∞} Sn = lim_{n→∞} Σ_{i=1}^{n} f (ζi)(zi − z_{i−1}).  (6.73)

Then, we have

|Sn| ≤ max| f | L.

As n → ∞, |Sn| = |I| = | ∫_C f (z) dz | ≤ max| f | L.
This completes the proof. ∎
Applying the Darboux inequality to (6.94), we get

| ∮_Γ [f (ζ) − f (z)]/(ζ − z) dζ | = | ∮_Γ [f (ζ) − f (z)]/(ρe^{iθ}) dζ |
≤ max| [f (ζ) − f (z)]/(ρe^{iθ}) | · 2πρ < 2πε.

That is,

| (1/2πi) ∮_Γ [f (ζ) − f (z)]/(ζ − z) dζ | < ε.  (6.97)
Proposition 6.1 [5] Let C be a piecewise continuous curve of finite length. (The
curve C may or may not be closed.) Let f (z) be a continuous function. The contour
integration of f (ζ)/(ζ − z) gives f̃(z) such that

f̃(z) = (1/2πi) ∫_C [f (ζ)/(ζ − z)] dζ.  (6.98)

Putting

Δ ≡ | [f̃(z + Δz) − f̃(z)]/Δz − (1/2πi) ∫_C f (ζ)/(ζ − z)² dζ |,  (6.99)

we have

Δ = (|Δz|/2π) | ∫_C f (ζ)/[(ζ − z − Δz)(ζ − z)²] dζ |.  (6.100)

In fact, we have

Δ = (1/|2πiΔz|) | ∫_C { f (ζ)(ζ − z)² − f (ζ)(ζ − z − Δz)(ζ − z) − f (ζ)Δz(ζ − z − Δz) }
/ [(ζ − z − Δz)(ζ − z)²] dζ |
= (1/|2πiΔz|) | ∫_C f (ζ)(Δz)²/[(ζ − z − Δz)(ζ − z)²] dζ |.

Taking the limit Δz → 0, we have Δ → 0, and hence we get

df̃(z)/dz = (1/2πi) ∫_C f (ζ)/(ζ − z)² dζ.  (6.101)
In particular, if C is a closed contour encircling z, (6.98) reads

f̃(z) = (1/2πi) ∮_C [f (ζ)/(ζ − z)] dζ.  (6.102)

Comparing (6.102) with (6.103) and taking account of the single-valuedness of
f (z), we must have f̃(z) = f (z).
An analogous result holds with the n-th derivative of f (z) such that [5]

dⁿf (z)/dzⁿ = (n!/2πi) ∮_C [f (ζ)/(ζ − z)^{n+1}] dζ  (n: zero or positive integers).  (6.106)

That is, what holds for df (z)/dz is true of dⁿf (z)/dzⁿ with any n (zero or positive
integers).
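Formula (6.106) for the n-th derivative can likewise be checked numerically; the choice f (ζ) = e^ζ with z = 0 (so that every derivative equals 1) is an arbitrary test case:

```python
import numpy as np
from math import factorial, pi

# Numerical check of (6.106) for f(ζ) = e^ζ at z = 0 on the unit circle:
# every derivative of e^z at 0 equals 1, so the integral should return 1.
theta = np.linspace(0.0, 2 * pi, 20001)
zeta = np.exp(1j * theta)                  # circle ζ = z + e^{iθ} with z = 0
dzeta = 1j * np.exp(1j * theta)            # dζ/dθ

for n in range(4):
    integrand = np.exp(zeta) / zeta ** (n + 1) * dzeta
    integral = np.sum((integrand[1:] + integrand[:-1]) / 2 * np.diff(theta))
    deriv = factorial(n) / (2j * pi) * integral
    assert abs(deriv - 1.0) < 1e-8
```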
The following theorem is important and intriguing with regard to the analytic
functions.
Theorem 6.13: Cauchy–Liouville Theorem A bounded entire function must be a
constant.
Proof Using (6.106), we consider the first derivative of an entire function f (z)
described as

df/dz = (1/2πi) ∮_C [f (ζ)/(ζ − z)²] dζ.
Since f (z) is an entire function, we can arbitrarily choose a large enough circle of
radius R centered at z for a closed contour C. On the circle, we have
ζ = z + Re^{iθ},
where θ is a real number changing from 0 to 2π. Then, the above equation can be
rewritten as
df/dz = (1/2πi) ∫₀^{2π} [f (ζ)/(Re^{iθ})²] iRe^{iθ} dθ = (1/2πR) ∫₀^{2π} [f (ζ)/e^{iθ}] dθ.  (6.107)
Using Cauchy’s integral formula, we study Taylor’s series and Laurent’s series in
relation to the power series expansion of analytic function. Taylor’s series and
Laurent’s series are fundamental tools to study various properties of complex
functions in the theory of analytic functions. First, let us examine Taylor’s series
of an analytic function.
Theorem 6.14 [6] Any analytic function f (z) can be expressed by uniformly
convergent power series at an arbitrary regular point of the function in the domain
of analyticity R .
6.4 Taylor’s Series and Laurent’s Series 217
1/(ζ − z) = 1/[(ζ − a) − (z − a)] = [1/(ζ − a)] · 1/[1 − (z − a)/(ζ − a)]
= 1/(ζ − a) + (z − a)/(ζ − a)² + (z − a)²/(ζ − a)³ + ⋯,  (6.110)

where ζ is an arbitrary point on the circle C. To derive (6.110), we used the following
formula:

1/(1 − x) = Σ_{n=0}^∞ xⁿ,

where x = (z − a)/(ζ − a). Since |(z − a)/(ζ − a)| = ρ/r < 1, the geometric series of
(6.110) converges.
Suppose that f (ζ) is analytic on C. Then, we have a finite positive number M on
C such that

| f (ζ)| < M.  (6.111)

Multiplying (6.110) by f (ζ), we have

f (ζ)/(ζ − z) = f (ζ)/(ζ − a) + (z − a) f (ζ)/(ζ − a)² + (z − a)² f (ζ)/(ζ − a)³ + ⋯.  (6.112)

For the remainder terms of this series, we estimate

| Σ_{ν=n}^∞ (z − a)^ν f (ζ)/(ζ − a)^{ν+1} | ≤ (M/r) Σ_{ν=n}^∞ (ρ/r)^ν
= (M/r)(ρ/r)ⁿ · 1/(1 − ρ/r) = [M/(r − ρ)](ρ/r)ⁿ.  (6.113)
we get

f (z) = Σ_{n=0}^∞ An(z − a)ⁿ  (6.116)

with

An = (1/2πi) ∮_C [f (ζ)/(ζ − a)^{n+1}] dζ = (1/n!)[dⁿf (z)/dzⁿ]|_{z=a} = (1/n!) dⁿf (a)/dzⁿ.  (6.118)

That is,

f (z) = Σ_{n=0}^∞ (1/n!)[dⁿf (a)/dzⁿ](z − a)ⁿ.
This is the same form as that obtained with respect to a real variable.
Equation (6.116) is a uniformly convergent power series called a Taylor’s series
or Taylor’s expansion of f (z) with respect to z ¼ a with An being its coefficient. In
Theorem 6.14 we have assumed that a union of a circle C and its inside is simply
connected. This topological environment is virtually the same as that of Fig. 6.13. In
Theorem 6.11 (Cauchy’s integral formula) we have shown the integral representa-
tion of an analytic function. On the basis of Cauchy’s integral formula, Theorem
6.14 demonstrates that any analytic function is given a tangible functional form.
Example 6.2 Let us think of a function f (z) = 1/z. This function is analytic except
for z = 0. Therefore, it can be expanded in a Taylor's series in the region ℂ − {0}.
We consider the expansion around z = a (a ≠ 0). Using (6.115), the coefficients of
the expansion are

An = (1/2πi) ∮_C dz/[z(z − a)^{n+1}],

where C is a closed curve that does not contain z = 0 on C or in its inside. Therefore,
1/z is analytic in the simply connected region encircled by C chosen so that the
point z = 0 may not be contained inside C (Fig. 6.15). Then, from (6.106) we get

(1/2πi) ∮_C dz/[z(z − a)^{n+1}] = (1/n!)[dⁿ(1/z)/dzⁿ]|_{z=a} = (1/n!)(−1)ⁿ n!/a^{n+1} = (−1)ⁿ/a^{n+1}.

Thus, we obtain

f (z) = 1/z = Σ_{n=0}^∞ [(−1)ⁿ/a^{n+1}](z − a)ⁿ = (1/a) Σ_{n=0}^∞ (1 − z/a)ⁿ.  (6.119)

The series converges uniformly within a convergence circle called a "ball" [7]
(vide infra) whose convergence radius r is estimated to be

1/r = lim sup_{n→∞} |(−1)ⁿ/a^{n+1}|^{1/n} = (1/|a|) lim_{n→∞} (1/|a|)^{1/n} = 1/|a|.  (6.120)
The region ℛ is an open set by definition (see Sect. 6.1) and the Taylor's series
defined in ℛ is convergent. On the other hand, the Taylor's series may or may not be
convergent at a point z that is on the convergence circle. In Example 6.2, f (z) = 1/z is
not defined at z = 0, even though z (= 0) is on the convergence circle. At other points
on the convergence circle, however, the Taylor's series is convergent.
Note that a region Bⁿ in ℝⁿ similarly defined as

Bⁿ = {x; x ∈ ℝⁿ, ρ(x, x₀) < r}, with some fixed point ∃x₀ ∈ ℝⁿ,

is sometimes called a ball or open ball. This concept can readily be extended to any
metric space.
Next, we examine Laurent’s series of an analytic function.
Theorem 6.15 [6] Let f (z) be analytic in a region ℛ except for a point a. Then, f (z)
can be described by a uniformly convergent power series within the region ℛ − {a}.
Proof Let us assume that we have a circle C1 of radius r1 centered at a and another
circle C2 of radius r2 centered at a, both within ℛ. Moreover, let another circle Γ be
centered at z (≠ a) within ℛ so that z can be outside of C2; see Fig. 6.16.
In this situation we consider the contour integration along C1, C2, and Γ. Using
(6.87), we have
6.4 Taylor’s Series and Laurent’s Series 221
(1/2πi) ∮_Γ [f (ζ)/(ζ − z)] dζ = (1/2πi) ∮_{C1} [f (ζ)/(ζ − z)] dζ − (1/2πi) ∮_{C2} [f (ζ)/(ζ − z)] dζ.  (6.121)

Notice that the region containing Γ and its inside form a simply connected region
in terms of the analyticity of f (z). With the first term of RHS of (6.121), from
Theorem 6.14 we have

(1/2πi) ∮_{C1} [f (ζ)/(ζ − z)] dζ = Σ_{n=0}^∞ An(z − a)ⁿ  (6.122)

with

An = (1/2πi) ∮_{C1} [f (ζ)/(ζ − a)^{n+1}] dζ  (n = 0, 1, 2, ⋯).  (6.123)
In fact, if ζ lies on the circle C1, the Taylor’s series is uniformly convergent as in
the case of the proof of Theorem 6.14.
With the second term of RHS of (6.121), however, the situation is different in
such a way that z lies outside the circle C2. In this case, from Fig. 6.16 we have
|(ζ − a)/(z − a)| = r2/ρ < 1.  (6.124)

Hence, we have

1/(ζ − z) = −1/[(z − a) − (ζ − a)] = −[1/(z − a)][1 + (ζ − a)/(z − a) + (ζ − a)²/(z − a)² + ⋯],  (6.125)

where

A_{−n} = (1/2πi) ∮_{C2} f (ζ)(ζ − a)^{n−1} dζ  (n = 1, 2, 3, ⋯).  (6.128)

Combining the two terms, we get

f (z) = Σ_{n=−∞}^{∞} An(z − a)ⁿ.  (6.129)
but

[dⁿf (z)/dzⁿ]|_{z=a} ≠ 0.  (6.132)

Then, h(z) is analytic and nonvanishing at z = a. From the analyticity, h(z) must be
continuous at z = a and differ from zero in some finite neighborhood of z = a.
Consequently, it is also the case with f (z). Therefore, if the set of zeros had an
accumulation point at z = a, any neighborhood of z = a would contain another zero,
in contradiction to the above assumption. To avoid this contradiction, the analytic
function must satisfy f (z) ≡ 0 throughout the region ℛ. Taking the contraposition of
the above statement, if an analytic function is not identically zero [i.e., f (z) ≢ 0], the
zeros of that function are isolated; see the relevant discussion of Sect. 6.1.2.
Meanwhile, the Laurent’s series (6.129) can be rewritten as
$$f(z)=\sum_{k=0}^{\infty}A_k(z-a)^k+\sum_{k=1}^{\infty}\frac{A_{-k}}{(z-a)^k} \qquad (6.134)$$
so that the constitution of the Laurent's series can explicitly be recognized. The second term is called a singular part (or principal part) of f(z) at z = a. The singularity is classified as follows:
(1) If (6.134) lacks the singular part (i.e., A₋ₖ = 0 for k = 1, 2, ⋯), the singular point is said to be a removable singularity. In this case, we have
$$f(z)=\sum_{k=0}^{\infty}A_k(z-a)^k. \qquad (6.135)$$
$$f(a)\equiv A_0,$$
f(z) can be regarded as analytic. Examples include the following case where the sine function is expanded as the Taylor's series:
Hence, if we define
$$f(z)\equiv\frac{\sin z}{z} \quad\text{and}\quad f(0)\equiv\lim_{z\to 0}\frac{\sin z}{z}=1, \qquad (6.137)$$
z = 0 is a removable singularity.
(2) Suppose that in (6.134) we have a certain positive integer such that
the function f (z) is said to have a simple pole. This special case is important in
the calculation of various integrals (vide infra). In the above cases, (6.134) can
be rewritten as
$$f(z)=\frac{g(z)}{(z-a)^n}, \qquad (6.140)$$
$$f(z)=\sum_{k=0}^{\infty}A_k(z-a)^k+\sum_{k=1}^{n}\frac{A_{-k}}{(z-a)^k}.$$
(3) If the singular part of (6.134) comprises an infinite series, the point z = a is called an essential singularity. In that case, f(z) exhibits complicated behavior near or at z = a. Interested readers are referred to suitable literature [5].
A function f(z) that is analytic in a region of ℂ except at a set of points of the region where the function has poles is called a meromorphic function in the said region. The above definition of meromorphic functions applies to Cases (1) and (2), but not to Case (3). Henceforth, we will be dealing with meromorphic functions.
In the above discussion of Cases (1) and (2), we often deal with a function f(z) that has a single isolated pole at z = a. This implies that f(z) is analytic within a certain neighborhood N_a of z = a but is not analytic at z = a. More specifically, f(z) is analytic in the region N_a − {a}. Even though we consider more than one isolated pole, the situation is essentially the same. Suppose that there is another isolated pole at z = b. In that case, again take a certain neighborhood N_b of z = b (≠ a) so that f(z) is analytic in the region N_b − {b}. Readers may well wonder why we have to discuss this trifling issue. Nevertheless, think of the situation where a set of poles has an accumulation point. Any neighborhood of the accumulation point contains another pole where the function is not analytic. This contradicts the fact that f(z) is analytic in a region such as N_a − {a}. Thus, if such a function were present, it would be intractable to deal with.
When we discussed the Cauchy's integral formula (Theorem 6.11), we saw that if a function is analytic in a certain region of ℂ and on a curve C that encircles the region, the values of the function within the region are determined once the values of the function on C are given. We have the following theorem for this.
Theorem 6.16 Let f1(z) and f2(z) be two functions that are analytic within a region
R. Suppose that the two functions coincide in a set chosen from among (i) a neighborhood of a point z ∈ R, or (ii) a segment of a curve lying in R, or (iii) a set of points containing an accumulation point belonging to R. Then, the two functions f₁(z) and f₂(z) coincide throughout R.
Proof Suppose that f₁(z) and f₂(z) coincide on the above set chosen from the three. Then, since f₁(z) − f₂(z) = 0 on that set, the set comprises the zeros of f₁(z) − f₂(z). This implies that the zeros are not isolated in R. Thus, as evidenced from the discussion of Sect. 6.5, we conclude that f₁(z) − f₂(z) ≡ 0 throughout R. That is, f₁(z) ≡ f₂(z). This completes the proof. ∎
The theorem can be understood as follows: Two different analytic functions
cannot coincide in the set chosen from the above three (or more succinctly, a set
that contains an accumulation point). In other words, the behavior of an analytic
function in R is uniquely determined by the limited information of the subset
belonging to R .
The above statement is reminiscent of the pre-established harmony and, hence, is occasionally spoken of as though it were mysterious. Instead, we wish to give a simple example.
Example 6.3 In Example 6.2 we described a tangible illustration of the Taylor’s
expansion of a function 1/z as follows:
$$\frac{1}{z}=\sum_{n=0}^{\infty}\frac{(-1)^n}{a^{n+1}}(z-a)^n=\frac{1}{a}\sum_{n=0}^{\infty}\left(1-\frac{z}{a}\right)^n. \qquad (6.119)$$
C2, C3, and C4 (i.e., convergence circles of radius 1). The said region R is colored
pale green in Fig. 6.17. There are four petals as shown.
We can view Fig. 6.17 as follows: For instance, f₁(z) and f₂(z) are defined in C1 and its inside and C2 and its inside, respectively, excluding z = 0. We have f₁(z) = f₂(z) within the petal P1 (overlapped part as shown). Consequently, from the statement of Theorem 6.16, f₁(z) ≡ f₂(z) throughout the region encircled by C1 and C2 (excluding z = 0). Repeating this procedure, we get
Thus, we obtain the same functional entity throughout R except for the origin (z = 0). Further continuing similar procedures, we finally reach the entire complex plane except for the origin, i.e., ℂ − {0}, regarding the domain of analyticity of 1/z.
We further compare the above results with those for an analytic function defined in the real domain. The Taylor's expansion of f(x) = 1/x (x: real with x ≠ 0) around a (a ≠ 0) reads as
This is exactly the same form as that of (6.119) aside from notation of the
argument.
The aforementioned procedure that determines the behavior of an analytic func-
tion outside the region where that function was originally defined is called analytic
continuation or analytic prolongation. Comparing (6.119) and (6.142), we can see
that (6.119) is a consequence of the analytic continuation of 1/x from the real domain
to the complex domain.
The theorem of residues and its application to the calculation of various integrals are among the central themes of the theory of analytic functions.
The Cauchy’s integral theorem (Theorem 6.10) tells us that if f (z) is analytic in a
simply connected region R , we have
$$\oint_{C}f(z)\,dz=0, \qquad (6.84)$$
where C is a closed curve within R. As we have already seen, this is not necessarily the case if f(z) has singular points in R. Meanwhile, if we look at (6.130), we are aware of a simple but important fact. That is, replacing n with −1, we have
$$A_{-1}=\frac{1}{2\pi i}\oint_{C}f(\zeta)\,d\zeta \quad\text{or}\quad \oint_{C}f(\zeta)\,d\zeta=2\pi iA_{-1}. \qquad (6.143)$$
This implies that if we could obtain the Laurent's series of f(z) with respect to a singularity located at z = a and encircled by C, we can immediately estimate ∮_C f(ζ)dζ of (6.143) from the coefficient of the term (z − a)⁻¹, i.e., A₋₁. This fact further implies that even though f(z) has singularities, ∮_C f(ζ)dζ may happen to vanish.
Thus, whether (6.84) vanishes depends on the nature of f(z) and its singularities. To examine it, we define a residue of f(z) at z = a (that is encircled by C) as
$$\mathrm{Res}\,f(a)\equiv\frac{1}{2\pi i}\oint_{C}f(z)\,dz \quad\text{or}\quad \oint_{C}f(z)\,dz=2\pi i\,\mathrm{Res}\,f(a), \qquad (6.144)$$
where Res f(a) denotes the residue of f(z) at z = a. From (6.143) and (6.144), we have
In a formal sense, the point a does not have to be a singular point. If it is a regular point of f(z), we have Res f(a) = 0 trivially. If there is more than one singularity at z = a_j, we have
$$\oint_{C}f(z)\,dz=2\pi i\sum_{j}\mathrm{Res}\,f(a_j). \qquad (6.146)$$
Notice that we assume the isolated singularities with (6.144) and (6.146). These equations are associated with Cases (1) and (2) dealt with in Sect. 6.5. Within this framework, we wish to evaluate the residue of f(z) at a pole of order n located at z = a. Using (6.140)
$$f(z)=\frac{g(z)}{(z-a)^n}, \qquad (6.140)$$
In the special but important case where f(z) has a simple pole at z = a, from (6.147) we get
Then, let us calculate the contour integral of f(z) on a closed circle Γ of radius ρ centered at z = a. We have
$$I\equiv\oint_{\Gamma}f(\zeta)\,d\zeta=\sum_{n=-\infty}^{\infty}A_n\oint_{\Gamma}(\zeta-a)^n\,d\zeta=\sum_{n=-\infty}^{\infty}iA_n\rho^{n+1}\int_{0}^{2\pi}e^{i(n+1)\theta}\,d\theta. \qquad (6.149)$$

Only the n = −1 term survives the angular integration, and we get

$$I=2\pi iA_{-1}.$$
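The relation ∮f(ζ)dζ = 2πi A₋₁ and the residue theorem (6.146) lend themselves to a quick numerical sanity check: evaluate the contour integral with the rectangle rule on the circle parametrization. The helper below is our own sketch (not from the text); we take f(z) = 1/(1 + z²), whose simple poles at ±i carry residues ±1/(2i).

```python
import cmath
import math

def contour_integral(f, center, radius, n=20000):
    # Rectangle-rule approximation of the counterclockwise circle integral
    # with z = center + radius*e^{i*theta}; for smooth periodic integrands
    # this converges extremely fast.
    total = 0j
    for k in range(n):
        theta = 2 * math.pi * k / n
        e = cmath.exp(1j * theta)
        total += f(center + radius * e) * 1j * radius * e
    return total * 2 * math.pi / n

f = lambda z: 1 / (1 + z * z)  # simple poles at +/- i, residues 1/(2i), -1/(2i)

# Circle of radius 2 about 0 encloses both poles: the residues cancel, so ~0.
print(abs(contour_integral(f, 0, 2)))
# Circle of radius 1 about i encloses only z = i: 2*pi*i*(1/(2i)) = pi.
print(contour_integral(f, 1j, 1))
```

The second value comes out close to π + 0i, in line with (6.144).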
To this end, we convert the real variable x to the complex variable z and evaluate a
contour integral IC (Fig. 6.18) described by
$$I_C=\int_{-R}^{R}\frac{dz}{1+z^2}+\int_{\Gamma_R}\frac{dz}{1+z^2}, \qquad (6.151)$$
to be
$$\mathrm{Res}\,f(i)=f(z)(z-i)\big|_{z=i}=\frac{1}{2i}. \qquad (6.153)$$
To estimate the integral (6.150), we must calculate the second term of (6.151). To
this end, we change the variable z such that
$$z=Re^{i\theta}. \qquad (6.154)$$
Taking R → ∞, we have
$$\int_{\Gamma_R}\frac{dz}{1+z^2}\to 0. \qquad (6.155)$$
Consequently, if R → ∞, we have
$$\lim_{R\to\infty}I_C=\int_{-\infty}^{\infty}\frac{dx}{1+x^2}+\lim_{R\to\infty}\int_{\Gamma_R}\frac{dz}{1+z^2}=\int_{-\infty}^{\infty}\frac{dx}{1+x^2}=I=2\pi i\,\mathrm{Res}\,f(i)=\pi, \qquad (6.156)$$
where with the first equality we changed the variable back to x; with the last equality we used (6.152) and (6.153). Equation (6.156) gives the answer to (6.150). Notice in general that if a meromorphic function is given as a quotient of two polynomials, an integral of the type of (6.155) tends to zero as R → ∞ in the case where the degree of the denominator of that function is at least two units higher than the degree of the numerator [8].
We may equally choose another contour C̃ (the closed curve comprising the interval [−R, R] and the lower semicircle Γ̃_R) for the integration (Fig. 6.18). In that case, we have
$$\lim_{R\to\infty}I_{\tilde{C}}=\int_{\infty}^{-\infty}\frac{dx}{1+x^2}+\lim_{R\to\infty}\int_{\tilde{\Gamma}_R}\frac{dz}{1+z^2}=\int_{\infty}^{-\infty}\frac{dx}{1+x^2}=2\pi i\,\mathrm{Res}\,f(-i)=-\pi. \qquad (6.157)$$
Note that in the above equation the contour integral is taken in the counterclockwise direction and, hence, that the real definite integral has been taken from ∞ to −∞. Notice also that
$$\mathrm{Res}\,f(-i)=f(z)(z+i)\big|_{z=-i}=-\frac{1}{2i},$$
because f(z) has a simple pole at z = −i in the lower semicircle (Fig. 6.18) for which the residue has been taken. From (6.157), we get the same result as before such that
$$\int_{-\infty}^{\infty}\frac{dx}{1+x^2}=\pi.$$
$$\frac{1}{1+z^2}=\frac{1}{2i}\left(\frac{1}{z-i}-\frac{1}{z+i}\right). \qquad (6.158)$$
where C stands for the closed curve comprising the interval [−R, R] and the upper semicircle Γ_R (see Fig. 6.18).
The function f(z) ≡ 1/(1 + z²)³ has two isolated poles at z = ±i. In the present case, however, the poles are of order 3. Hence, we use (6.147) to estimate the residue. The inside of the upper semicircle contains the pole of order 3 at z = i. Therefore, we have
$$\mathrm{Res}\,f(i)=\frac{1}{2\pi i}\oint_{C}f(z)\,dz=\frac{1}{2\pi i}\oint_{C}\frac{g(z)}{(z-i)^3}\,dz=\frac{1}{2!}\left.\frac{d^2g(z)}{dz^2}\right|_{z=i}, \qquad (6.161)$$
where
$$g(z)\equiv\frac{1}{(z+i)^3} \quad\text{and}\quad f(z)=\frac{g(z)}{(z-i)^3}.$$
Hence, we get
$$\frac{1}{2!}\left.\frac{d^2g(z)}{dz^2}\right|_{z=i}=\frac{1}{2}(-3)(-4)(z+i)^{-5}\Big|_{z=i}=\frac{1}{2}(-3)(-4)(2i)^{-5}=\frac{3}{16i}=\mathrm{Res}\,f(i). \qquad (6.162)$$
For a reason similar to that of Example 6.4, the second term of (6.160) vanishes when R → ∞. Then, we obtain
$$\lim_{R\to\infty}I_C=\int_{-\infty}^{\infty}\frac{dz}{(1+z^2)^3}+\lim_{R\to\infty}\int_{\Gamma_R}\frac{dz}{(1+z^2)^3}=\lim_{R\to\infty}\oint_{C}\frac{g(z)}{(z-i)^3}\,dz=2\pi i\,\mathrm{Res}\,f(i)=2\pi i\cdot\frac{3}{16i}=\frac{3\pi}{8}.$$
That is,
$$\int_{-\infty}^{\infty}\frac{dz}{(1+z^2)^3}=\int_{-\infty}^{\infty}\frac{dx}{(1+x^2)^3}=I=\frac{3\pi}{8}.$$
$$f(z)=\frac{1}{(1+z^2)^3}=\frac{1}{(z+i)^3(z-i)^3}.$$
With ζ ≡ z − i, we have

$$f(z)=\tilde{f}(\zeta)=\frac{1}{(\zeta+2i)^3\zeta^3}=\frac{1}{\zeta^3}\cdot\frac{1}{(2i)^3}\left(1+\frac{\zeta}{2i}\right)^{-3}$$
$$=\frac{1}{\zeta^3(2i)^3}\left[1+(-3)\frac{\zeta}{2i}+\frac{(-3)(-4)}{2!}\left(\frac{\zeta}{2i}\right)^2+\frac{(-3)(-4)(-5)}{3!}\left(\frac{\zeta}{2i}\right)^3+\cdots\right]$$
$$=-\frac{1}{8i}\left[\frac{1}{\zeta^3}-\frac{3}{2i\zeta^2}-\frac{3}{2\zeta}+\frac{5}{4i}+\cdots\right], \qquad (6.163)$$
Equation (6.164) is a Laurent's expansion of f(z). From (6.164) we see that the coefficient of 1/(z − i) of f(z) is 3/(16i), in agreement with (6.145) and (6.162). To obtain the answer of (6.159), once again we may equally choose the contour C̃ (Fig. 6.18) and get the same result as in the case of Example 6.4. In that case, f(z) can be expanded into a Laurent's series around z = −i. Readers are encouraged to check it.
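One way to check the Laurent coefficient numerically is to extract A₋₁ directly as (1/2πi)∮f(z)dz on a small circle about the pole, per (6.143). The radius and sample count below are arbitrary choices of ours.

```python
import cmath
import math

# A_{-1} of f(z) = 1/(1+z^2)^3 around z = i, via (1/(2*pi*i)) * (closed integral)
# on a circle of radius 0.5 about i (small enough to exclude the pole at -i).
f = lambda z: 1 / (1 + z * z) ** 3
n, rad = 40000, 0.5
total = 0j
for k in range(n):
    t = 2 * math.pi * k / n
    e = cmath.exp(1j * t)
    total += f(1j + rad * e) * 1j * rad * e
a_minus_1 = total * (2 * math.pi / n) / (2j * math.pi)
print(a_minus_1)  # close to 3/(16i) = -0.1875i
```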
We make a few remarks about the (generalized) binomial expansion formula. We described it in (3.181) of Sect. 3.6.1 such that
$$(1+x)^{\lambda}=\sum_{m=0}^{\infty}\binom{\lambda}{m}x^m, \qquad (3.181)$$

$$(1+z)^{\lambda}=\sum_{m=0}^{\infty}\binom{\lambda}{m}z^m, \qquad (6.165)$$
where λ is any complex number and z is a complex number with |z| < 1. We may view (6.165) as a consequence of the analytic continuation, and (6.163) has indeed been dealt with as such.
We put
$$f(x)\equiv\frac{1}{1+x^3}. \qquad (6.167)$$
Figure 6.19 depicts a graphical form of f(x). The modulus of f(x) tends to infinity at x = −1. Inflection points are present at x = 0 and x = (1/2)^{1/3} (≈ 0.7937).
Now, in the former two examples, the isolated singular points (i.e., poles) existed
only in the inside of a closed contour. In the present case, however, a pole is located
on the real axis. Bearing in mind this situation, we wish to estimate the integral
(6.166).
Rewriting (6.166), we have
$$I=\int_{-\infty}^{\infty}\frac{dx}{(1+x)(x^2-x+1)} \quad\text{or}\quad I=\int_{-\infty}^{\infty}\frac{dz}{(1+z)(z^2-z+1)}.$$
$$f(z)\equiv\frac{1}{(1+z)(z^2-z+1)}$$
has three simple poles at z = −1 along with z = e^{±iπ/3}. This time, we have a contour C depicted in Fig. 6.20, where the contour integration I_C is performed in such a way that
$$I_C=\int_{-R}^{-1-r}\frac{dz}{(1+z)(z^2-z+1)}+\int_{\Gamma_{-1}}\frac{dz}{(1+z)(z^2-z+1)}+\int_{-1+r}^{R}\frac{dz}{(1+z)(z^2-z+1)}+\int_{\Gamma_R}\frac{dz}{(1+z)(z^2-z+1)}, \qquad (6.168)$$
where Γ₋₁ stands for a small semicircle of radius r around z = −1 as shown. Of the complex roots z = e^{±iπ/3}, only z = e^{iπ/3} is responsible for the contour integration (see Fig. 6.20).
Here we define a principal value P of the integral such that
$$P\int_{-\infty}^{\infty}\frac{dx}{(1+x)(x^2-x+1)}\equiv\lim_{r\to 0}\left[\int_{-\infty}^{-1-r}\frac{dx}{(1+x)(x^2-x+1)}+\int_{-1+r}^{\infty}\frac{dx}{(1+x)(x^2-x+1)}\right], \qquad (6.169)$$
where r is an arbitrary real positive number. As shown in this example, the principal value is a convenient device to traverse the poles on the contour and gives a correct answer in the contour integral. At the same time, letting R go to infinity, we obtain
$$\lim_{R\to\infty}I_C=P\int_{-\infty}^{\infty}\frac{dx}{(1+x)(x^2-x+1)}+\int_{\Gamma_{-1}}\frac{dz}{(1+z)(z^2-z+1)}+\int_{\Gamma_{\infty}}\frac{dz}{(1+z)(z^2-z+1)}. \qquad (6.170)$$
$$\lim_{R\to\infty}I_C=\oint_{C}f(z)\,dz=2\pi i\sum_{j}\mathrm{Res}\,f(a_j), \qquad (6.146)$$
where in the present case f(z) ≡ 1/[(1 + z)(z² − z + 1)] and we have only one simple pole at a_j = e^{iπ/3} within the contour C.
We are going to get the answer by combining (6.170) and (6.146) such that
$$I\equiv P\int_{-\infty}^{\infty}\frac{dx}{(1+x)(x^2-x+1)}=2\pi i\,\mathrm{Res}\,f\!\left(e^{i\pi/3}\right)-\int_{\Gamma_{-1}}\frac{dz}{(1+z)(z^2-z+1)}-\int_{\Gamma_{\infty}}\frac{dz}{(1+z)(z^2-z+1)}. \qquad (6.171)$$
The third term of RHS of (6.171) tends to zero as in the case of Examples 6.4 and 6.5. Therefore, if we can estimate the second term as well as the residue properly, the integral (6.166) can be adequately determined. Note that in this case the integral is understood as the principal value. To estimate the integral of the second term, we make use of the fact that, defining g(z) as
$$g(z)\equiv\frac{1}{z^2-z+1}, \qquad (6.172)$$
g(z) is analytic at and around z = −1. Hence, g(z) can be expanded in the Taylor's series around z = −1 such that
$$g(z)=A_0+\sum_{n=1}^{\infty}A_n(z+1)^n, \qquad (6.173)$$
where A₀ ≠ 0. In fact,
Then, we have
$$\int_{\Gamma_{-1}}\frac{dz}{(1+z)(z^2-z+1)}=\int_{\Gamma_{-1}}\frac{g(z)\,dz}{1+z}=A_0\int_{\Gamma_{-1}}\frac{dz}{1+z}+\int_{\Gamma_{-1}}\sum_{n=1}^{\infty}A_n(z+1)^{n-1}\,dz. \qquad (6.175)$$
$$z=-1+re^{i\theta}, \qquad (6.176)$$
Note that in (6.177) the contour integration along Γ₋₁ was performed in the decreasing direction of the argument θ from π to 0. Thus, (6.175) is rewritten as
$$\int_{\Gamma_{-1}}\frac{dz}{(1+z)(z^2-z+1)}=-A_0i\pi+\sum_{n=1}^{\infty}A_ni\,r^n\int_{\pi}^{0}e^{in\theta}\,d\theta, \qquad (6.178)$$
where with the last equality we exchanged the order of the summation and integration. The second term of RHS of (6.178) vanishes when r → 0. Thus, (6.171) is further rewritten as
We have only one simple pole of f(z) at z = e^{iπ/3} within the semicircle contour C. Then, using (6.148), we have
$$\mathrm{Res}\,f\!\left(e^{i\pi/3}\right)=\frac{1}{\left(1+e^{i\pi/3}\right)\left(e^{i\pi/3}-e^{-i\pi/3}\right)}=-\frac{1}{6}\left(1+\sqrt{3}i\right). \qquad (6.180)$$
The above calculation is straightforward, but (6.37) can be conveniently used. Thus, finally we get
$$I=-\frac{\pi i}{3}\left(1+\sqrt{3}i\right)+\frac{\pi i}{3}=\frac{\sqrt{3}\pi}{3}=\frac{\pi}{\sqrt{3}}. \qquad (6.181)$$
That is, I ≈ 1.8138. (6.182)
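The principal value (6.181) can be verified numerically by pairing the points −1 ± t, which cancels the 1/t singularities analytically: since 1 + (−1 + t)³ = t(3 − 3t + t²) and 1 + (−1 − t)³ = −t(3 + 3t + t²), the paired integrand is 6/[(3 − 3t + t²)(3 + 3t + t²)]. The truncation and grid below are our own choices.

```python
import math

def simpson(f, a, b, n):
    # Composite Simpson rule; n must be even.
    h = (b - a) / n
    s = f(a) + f(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(a + k * h)
    return s * h / 3

# Regularized principal-value integrand after pairing x = -1 + t with x = -1 - t;
# it is finite at t = 0 (value 2/3) and decays like 6/t^4.
paired = lambda t: 6 / ((3 - 3 * t + t * t) * (3 + 3 * t + t * t))
pv = simpson(paired, 0.0, 200.0, 20000)
print(pv)  # close to pi/sqrt(3) ~ 1.8138
```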
Fig. 6.21 Sector of circle for contour integration of 1/(1 + z³) that appears in Example 6.7.
The integral can also be calculated using a lower semicircle including the simple pole located at z = e^{−iπ/3} = (1/2)(1 − √3 i). That gives the same answer as (6.181). The calculations are left for readers as an exercise.
Example 6.7 [9, 10] It will be interesting to compare the result of Example 6.6 with
that of the following real definite integral:
$$I=\int_{0}^{\infty}\frac{dx}{1+x^3}. \qquad (6.183)$$
In the third integral of (6.184), we take arg z = θ (constant) so that with z = re^{iθ} we may have dz = e^{iθ}dr. Then, that integral is given by
$$\int_{L}\frac{dz}{1+z^3}=\int_{L}\frac{e^{i\theta}\,dr}{1+r^3e^{3i\theta}}. \qquad (6.185)$$
$$\lim_{R\to\infty}I_C=\int_{0}^{\infty}\frac{dx}{1+x^3}+\int_{\Gamma_{\infty}}\frac{dz}{1+z^3}-e^{2\pi i/3}\int_{0}^{\infty}\frac{dr}{1+r^3}. \qquad (6.186)$$
The second term of (6.186) vanishes as before. Changing the variable r ! x and
taking account of (6.180), we obtain
$$2\pi i\,\mathrm{Res}\,f\!\left(e^{i\pi/3}\right)=\left(1-e^{2\pi i/3}\right)\int_{0}^{\infty}\frac{dx}{1+x^3}=\left(1-e^{2\pi i/3}\right)I. \qquad (6.187)$$
Then, we have
To calculate Res f(e^{iπ/3}), we used (6.148) at z = e^{iπ/3}; f(z) has a simple pole at z = e^{iπ/3} within the contour C of Fig. 6.21a. Noting that (1 − e^{2πi/3})(1 + e^{iπ/3}) = 3 and e^{iπ/3} − e^{−iπ/3} = 2i sin(π/3) = √3 i, we get
$$I=2\pi i/\left(3\sqrt{3}i\right)=\frac{2\pi}{3\sqrt{3}}. \qquad (6.188)$$
To evaluate this integral, we define the contour integral I_{C′} as in Fig. 6.21b, which is expressed as
$$I_{C'}=P\int_{-R}^{0}\frac{dz}{1+z^3}+\int_{\Gamma_{-1}}\frac{dz}{(1+z)(z^2-z+1)}+\int_{L'}\frac{dz}{1+z^3}+\int_{\Gamma_R}\frac{dz}{1+z^3}. \qquad (6.190)$$
where we used (6.188) and (6.178) in combination with (6.174). Thus, we get
$$I=P\int_{-\infty}^{0}\frac{dx}{1+x^3}=\frac{\pi}{3\sqrt{3}}. \qquad (6.191)$$
where ε(R) is a certain positive number that depends only on R and tends to zero as |z| → ∞. Therefore, we have
$$|I_R|<\varepsilon(R)R\int_{0}^{\pi}e^{-\alpha R\sin\theta}\,d\theta=2\varepsilon(R)R\int_{0}^{\pi/2}e^{-\alpha R\sin\theta}\,d\theta,$$
where the last equality results from the symmetry of sinθ about π/2. Meanwhile, we
have
Hence, we have
$$|I_R|<2\varepsilon(R)R\int_{0}^{\pi/2}e^{-2\alpha R\theta/\pi}\,d\theta=\frac{\pi\varepsilon(R)}{\alpha}\left(1-e^{-\alpha R}\right).$$
Therefore, we get
$$\lim_{R\to\infty}I_R=0.$$
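The inequality sin θ ≥ 2θ/π on [0, π/2], which drives the bound above, can itself be checked numerically; the parameters α and R below are arbitrary choices of ours.

```python
import math

# Midpoint-rule estimate of the Jordan integral and the analytic bound
# (pi/(2*alpha*R)) * (1 - e^{-alpha*R}); the bound must dominate, because
# sin(theta) >= 2*theta/pi on [0, pi/2].
alpha, R, n = 1.0, 50.0, 100000
h = (math.pi / 2) / n
integral = sum(math.exp(-alpha * R * math.sin((k + 0.5) * h)) for k in range(n)) * h
bound = (math.pi / (2 * alpha * R)) * (1 - math.exp(-alpha * R))
print(integral, bound)  # the integral stays below the bound
```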
Rewriting (6.192) as
$$I=\int_{-\infty}^{\infty}\frac{\sin z}{z}\,dz=\frac{1}{2i}\int_{-\infty}^{\infty}\frac{e^{iz}-e^{-iz}}{z}\,dz, \qquad (6.193)$$
hence, can be expanded in the Taylor’s series around any complex number.
Expanding it around the origin we have
$$e^{iz}=\sum_{n=0}^{\infty}\frac{(iz)^n}{n!}=1+\sum_{n=1}^{\infty}\frac{(iz)^n}{n!}. \qquad (6.195)$$
Notice that in (6.196) the argument is decreasing from π → 0 during the contour integration. Within the contour C, there is no singular point, and so I_C = 0 due to Theorem 6.10 (Cauchy's integral theorem). Thus, from (6.194) we obtain
$$P\int_{-\infty}^{\infty}\frac{e^{iz}}{z}\,dz=-\int_{\Gamma_0}\frac{e^{iz}}{z}\,dz=i\pi. \qquad (6.197)$$
$$P\int_{-\infty}^{\infty}\frac{e^{iz}-e^{-iz}}{z}\,dz=2iP\int_{-\infty}^{\infty}\frac{\sin z}{z}\,dz=2i\pi. \qquad (6.200)$$
$$\sin^2x=\frac{1}{2}(1-\cos 2x). \qquad (6.203)$$
Then, the integrand of (6.202) with the variable changed is rewritten using (6.37)
as
$$e^{2iz}=\sum_{n=0}^{\infty}\frac{(2iz)^n}{n!}=1+2iz+\sum_{n=2}^{\infty}\frac{(2iz)^n}{n!}. \qquad (6.205)$$
Then, we get
$$1-e^{2iz}=-2iz-\sum_{n=2}^{\infty}\frac{(2iz)^n}{n!},$$
$$\frac{1-e^{2iz}}{z^2}=-\frac{2i}{z}-\sum_{n=2}^{\infty}\frac{(2i)^nz^{n-2}}{n!}. \qquad (6.206)$$
As in Example 6.8, we wish to use (6.206) to estimate the second term of the
following contour integral IC:
$$\lim_{R\to\infty}I_C=P\int_{-\infty}^{\infty}\frac{1-e^{2iz}}{z^2}\,dz+\int_{\Gamma_0}\frac{1-e^{2iz}}{z^2}\,dz+\int_{\Gamma_{\infty}}\frac{1-e^{2iz}}{z^2}\,dz. \qquad (6.207)$$
With the second integral of (6.207), as before, only the first term of (6.206) contributes to the integral. The result is

$$\int_{\Gamma_0}\frac{1-e^{2iz}}{z^2}\,dz=-2i\int_{\Gamma_0}\frac{dz}{z}=-2i\int_{\pi}^{0}i\,d\theta=-2\pi. \qquad (6.208)$$
The first term of RHS of (6.209) vanishes for a reason similar to that which we have already seen in Example 6.4. The second term of RHS vanishes as well because of the Jordan's lemma. Within the contour C of (6.207), again there is no singular point, and so lim_{R→∞} I_C = 0. Hence, from (6.207) we have
$$P\int_{-\infty}^{\infty}\frac{1-e^{2iz}}{z^2}\,dz=-\int_{\Gamma_0}\frac{1-e^{2iz}}{z^2}\,dz=2\pi. \qquad (6.210)$$
Summing both sides of (6.210) and (6.211) in combination with (6.204), we get
an answer described by
$$\int_{-\infty}^{\infty}\frac{\sin^2z}{z^2}\,dz=\pi. \qquad (6.212)$$
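Equation (6.212) can be confirmed by direct quadrature, since sin²z/z² decays like 1/z²; the neglected tail beyond the truncation T contributes about 1/T. The truncation and grid are our own choices.

```python
import math

def f(x):
    # sin^2(x)/x^2, with the removable singularity at x = 0 filled in by its limit 1.
    return 1.0 if x == 0.0 else (math.sin(x) / x) ** 2

def simpson(g, a, b, n):
    # Composite Simpson rule; n must be even.
    h = (b - a) / n
    s = g(a) + g(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * g(a + k * h)
    return s * h / 3

T = 2000.0
val = 2 * simpson(f, 0.0, T, 400000)  # even integrand: double the half-line value
print(val)  # close to pi, within the ~1/T tail error
```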
$$\frac{1}{(x-a)(x-b)}=\frac{1}{a-b}\left(\frac{1}{x-a}-\frac{1}{x-b}\right). \qquad (6.214)$$
We wish to consider the integration by dividing the integrand and calculate each
term individually. First, we estimate
$$\int_{-\infty}^{\infty}\frac{e^{iz}}{z-a}\,dz. \qquad (6.216)$$
To apply the Jordan's lemma to the integration of (6.216), we use the upper contour C that consists of the real axis, Γ_a, and Γ_R; see Fig. 6.23a. As usual, we consider the following integral:
$$\lim_{R\to\infty}I_C=P\int_{-\infty}^{\infty}\frac{e^{iz}}{z-a}\,dz+\int_{\Gamma_a}\frac{e^{iz}}{z-a}\,dz+\int_{\Gamma_{\infty}}\frac{e^{iz}}{z-a}\,dz, \qquad (6.217)$$
$$\int_{\Gamma_a}\frac{e^{iz}}{z-a}\,dz=\int_{\Gamma_0}\frac{e^{i(z+a)}}{z}\,dz=e^{ia}\int_{\Gamma_0}\frac{e^{iz}}{z}\,dz=-e^{ia}i\pi, \qquad (6.218)$$
where with the last equality we used (6.196). There is no singular point within the contour C, and so lim_{R→∞} I_C = 0. Then, we have
$$P\int_{-\infty}^{\infty}\frac{e^{iz}}{z-a}\,dz=-\int_{\Gamma_a}\frac{e^{iz}}{z-a}\,dz=e^{ia}i\pi, \qquad (6.219)$$
To apply the Jordan's lemma to the integration of (6.220), we use the lower contour C′ that consists of the real axis, Γ̃_a, and Γ̃_R (Fig. 6.23b). This time, the integral to be evaluated is given by
$$\lim_{R\to\infty}I_{C'}=-P\int_{-\infty}^{\infty}\frac{e^{-iz}}{z-a}\,dz+\int_{\tilde{\Gamma}_a}\frac{e^{-iz}}{z-a}\,dz+\int_{\tilde{\Gamma}_{\infty}}\frac{e^{-iz}}{z-a}\,dz \qquad (6.221)$$
so that we can trace the contour C′ counterclockwise. Notice that the minus sign is present in front of the principal value. The third term of (6.221) vanishes due to the Jordan's lemma. Evaluating the second term of (6.221) similarly to the above, we get
$$\int_{\tilde{\Gamma}_a}\frac{e^{-iz}}{z-a}\,dz=\int_{\tilde{\Gamma}_0}\frac{e^{-i(z+a)}}{z}\,dz=e^{-ia}\int_{\tilde{\Gamma}_0}\frac{e^{-iz}}{z}\,dz=-e^{-ia}i\pi. \qquad (6.222)$$
Hence, we obtain
$$P\int_{-\infty}^{\infty}\frac{e^{-iz}}{z-a}\,dz=\int_{\tilde{\Gamma}_a}\frac{e^{-iz}}{z-a}\,dz=-e^{-ia}i\pi. \qquad (6.223)$$
Notice that in (6.222) the argument is again decreasing, from 0 → −π, during the contour integration. Summing (6.219) and (6.223), we get
$$P\int_{-\infty}^{\infty}\frac{e^{iz}}{z-a}\,dz+P\int_{-\infty}^{\infty}\frac{e^{-iz}}{z-a}\,dz=P\int_{-\infty}^{\infty}\frac{e^{iz}+e^{-iz}}{z-a}\,dz=e^{ia}i\pi-e^{-ia}i\pi$$
$$=i\pi\left(e^{ia}-e^{-ia}\right)=i\pi\cdot 2i\sin a=-2\pi\sin a.$$
$$P\int_{-\infty}^{\infty}\frac{e^{iz}+e^{-iz}}{z-b}\,dz=-2\pi\sin b.$$
The calculation is left for readers. Hints: (i) Use cos x = (e^{ix} + e^{−ix})/2 and the Jordan's lemma. (ii) Use the contour as shown in Fig. 6.18 for the integration. Besides the examples listed above, a variety of integral calculations can be seen in the literature [9–12].
The last topics of the theory of analytic functions are related to multivalued func-
tions. Riemann surfaces play a central role in developing the theory of the
multivalued functions.
$$z=w^n. \qquad (6.226)$$
$$w=z^{1/n}. \qquad (6.227)$$
We have expressly shown this simple thing because, if we think of real z, z^{1/n} is uniquely determined only when n is odd, regardless of whether z is positive or negative. In case we think of even n, (i) if z is positive, we have two real n-th roots z^{1/n}; (ii) if z is negative, on the other hand, we have no real n-th root. Meanwhile, if we think of complex z, we have a totally different situation. That is, regardless of whether n is odd or even, we always have n different values of z^{1/n} for any given complex number z.
As a tangible example, let us think of the n-th roots of ±1 for n = 2 and 3. These are depicted graphically in Fig. 6.24. We can readily understand the statement of the above paragraph. In Fig. 6.24a, if z = −1 for n = 2, we have either w = (−1)^{1/2} = (e^{iπ})^{1/2} = i or w = (−1)^{1/2} = (e^{−iπ})^{1/2} = −i with square roots. In Fig. 6.24b, if z = −1 for n = 3, we have w = (−1)^{1/3} = −1; w = (−1)^{1/3} = (e^{iπ})^{1/3} = e^{iπ/3}; w = (−1)^{1/3} = (e^{−iπ})^{1/3} = e^{−iπ/3} with cubic roots. Moreover, we have 1^{1/2} = 1; −1. Also, we have 1^{1/3} = 1; e^{2iπ/3}; e^{−2iπ/3}.
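These root sets can be generated directly from the polar form w_k = r^{1/n} e^{i(θ+2kπ)/n}; the small helper below is our own illustration using the standard cmath module.

```python
import cmath
import math

def nth_roots(z, n):
    # All n distinct n-th roots of a nonzero complex z:
    # w_k = r^(1/n) * e^{i*(theta + 2*k*pi)/n}, k = 0, ..., n-1.
    r, theta = cmath.polar(z)
    return [r ** (1.0 / n) * cmath.exp(1j * (theta + 2 * math.pi * k) / n)
            for k in range(n)]

print(nth_roots(-1, 2))  # i and -i
print(nth_roots(-1, 3))  # e^{i*pi/3}, -1, e^{-i*pi/3}
print(nth_roots(1, 3))   # 1, e^{2i*pi/3}, e^{-2i*pi/3}
```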
Let us extend the above preliminary discussion to the analytic functions. Suppose we are given the following polar form of z such that
and
Then, we have
where with the last equality we used de Moivre’s theorem (6.36). Comparing the real
and imaginary parts of (6.32) and (6.228), we have
$$r=\rho^n \quad\text{or}\quad \rho=r^{1/n} \qquad (6.229)$$
and
$$\varphi=\frac{1}{n}(\theta-2k\pi) \quad (k=0,\pm 1,\pm 2,\cdots).$$
That is, w_k(θ) is a periodic function of the period 2nπ. The number of the total branches is n; the index k of w_k(θ) denotes these different branches. In particular, when k = 0, we have
$$w_0(\theta)=z^{1/n}=r^{1/n}\left(\cos\frac{\theta}{n}+i\sin\frac{\theta}{n}\right). \qquad (6.233)$$
From (6.234), we find that w_k(θ) is obtained by shifting (or rotating) w₀(θ) by 2kπ toward the positive direction of θ. The branch w₀(θ) is called a principal branch of the n-th root of z. The value of w₀(θ) is called the principal value of that branch. The implication of the presence of the branches is as follows: Suppose that the branches besides w₀(θ) were absent. Then, from (6.233) we have
$$w_0(2\pi)=z^{1/n}=r^{1/n}\left(\cos\frac{2\pi}{n}+i\sin\frac{2\pi}{n}\right)\neq w_0(0)=r^{1/n}.$$
This means that the value of w0(0) is recovered and, hence, the single valuedness
remains intact. After another cycle around the origin, similarly we have
Thus, we have no more new functions. At the same time, the single valuedness of w_k(θ) (k = 0, 1, 2, ⋯, n − 1) remains intact during these processes. To summarize the above discussion, if we have n planes (or sheets) as the complex planes and allocate them to w₀(θ), w₁(θ), ⋯, w_{n−1}(θ) individually, we have a single-valued function w(θ) as a whole throughout these planes. In a word, the following relation represents the situation:
$$w(\theta)=\begin{cases}w_0(\theta) & \text{for Plane } 0,\\ w_1(\theta) & \text{for Plane } 1,\\ \quad\vdots\\ w_{n-1}(\theta) & \text{for Plane } n-1.\end{cases}$$
This is the essence of the Riemann surface. The superposition of these n planes is called a Riemann surface and each plane is said to be a Riemann sheet [of the function w(θ)]. Each single-valued function w₀(θ), w₁(θ), ⋯, w_{n−1}(θ) defined on each Riemann sheet is called a branch of w(θ). In the above discussion, the origin is called a branch point of w(z).
For simplicity, let us think of the function w(z) described by
$$w(z)\equiv z^{1/2}=\sqrt{z}.$$
In this case, for z = re^{iθ} (0 ≤ θ ≤ 2π) we have two different values w₀ and w₁ given by
Then, we have
$$w(\theta)=\begin{cases}w_0(\theta) & \text{for Plane } 0,\\ w_1(\theta) & \text{for Plane } 1.\end{cases} \qquad (6.236)$$
In this case, w(z) is called a "two-valued" function of z and the functions w₀ and w₁ are said to be branches of w(z). Suppose that z makes a counterclockwise circuit around the origin, starting from, e.g., a real positive number z = z₀ to come full circle to the original point z = z₀. Then, the argument of z has been increased by 2π. In this situation the arguments of the individual branches w₀ and w₁ are increased by π. Accordingly, w₀ is switched to w₁ and w₁ is switched to w₀. This situation can be understood more clearly from (6.232) and (6.234). That is, putting n = 2 in (6.232), we have
$$w_1(\theta)=w_0(\theta-2\pi). \qquad (6.238)$$
Fig. 6.25 Simple kit that helps visualize the Riemann surface. To make it, follow the procedures below: (a) Take two sheets of paper (Planes I and II) and make slits (indicated by dashed lines) as shown. Next, put Plane I on top of Plane II so that the two slits (i.e., branch cuts) can fit in line. (b) Tape together the downside of the cut of Plane I and the foreside of the cut of Plane II. Then, also tape together the downside of the cut of Plane II and the foreside of the cut of Plane I
where with the last equality we replaced θ with θ + 2π in (6.237). Rewriting the
above, we get
w1, respectively, so that each branch can be single valued. In other words, w(z) ¼ z1/2
is a single-valued function that is defined on the whole Riemann surface.
There are a variety of Riemann surfaces according to the nature of the complex
functions. Another example of the multivalued function is expressed as
$$w(z)=\sqrt{(z-a)(z-b)},$$
where two branch points are located at z = a and z = b. As before, we assume that
$$w(\theta_a,\theta_b)=\left[(z-a)(z-b)\right]^{1/2}=\sqrt{r_ar_b}\,e^{i(\theta_a+\theta_b)/2}. \qquad (6.239)$$
Fig. 6.27 Geometrical relationship between contour C and two branch points (located at z = a and z = b) for √((z − a)(z − b)). (a) The contour C encircles only z = a. (b) C encircles both z = a and z = b. (c) C encircles neither z = a nor z = b
From (6.242), we find that, on adding 2π to θ_a or θ_b, w₀ and w₁ are switched to each other as before. We also find that after the variable z has come full circle around one of a and b, w₀(θ_a, θ_b) changes sign as in the case of (6.235).
We also have
From (6.243), on the other hand, after the variable z has come full circle around both a and b, both w₀(θ_a, θ_b) and w₁(θ_a, θ_b) keep the original value. This is also the case when the variable z comes full circle without encircling a or b. These behaviors imply that (i) if a contour encircles one of z = a and z = b, w₀(θ_a, θ_b) and w₁(θ_a, θ_b) change sign and switch to each other. Thus, w₀(θ_a, θ_b) and w₁(θ_a, θ_b) form branches. On the other hand, (ii) if a contour encircles both z = a and z = b, w₀(θ_a, θ_b) and w₁(θ_a, θ_b) remain intact. (iii) If a contour encircles neither z = a nor z = b, w₀(θ_a, θ_b) and w₁(θ_a, θ_b) remain intact as well.
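The sign-flip rules (i)–(iii) can be demonstrated numerically by following √((z − a)(z − b)) continuously along a loop, choosing at each step whichever of the two square roots lies closest to the previous value. The function names and parameters below are our own.

```python
import cmath
import math

def track_sqrt(vals):
    # Follow sqrt(v) continuously along a list of values v, picking at each
    # step the square root closer to the previously chosen one (branch tracking).
    w = cmath.sqrt(vals[0])
    for v in vals[1:]:
        c = cmath.sqrt(v)
        w = c if abs(c - w) <= abs(-c - w) else -c
    return w

a, b = -1.0, 1.0
g = lambda z: (z - a) * (z - b)
n = 20000

# (i) Loop of radius 0.5 around z = a only: g winds once around 0, so the
# tracked square root comes back with the opposite sign.
loop_a = [g(a + 0.5 * cmath.exp(2j * math.pi * k / n)) for k in range(n + 1)]
# (ii) Loop of radius 3 around both branch points: g winds twice, so the
# tracked square root returns to its initial value.
loop_ab = [g(3 * cmath.exp(2j * math.pi * k / n)) for k in range(n + 1)]

print(track_sqrt(loop_a) + cmath.sqrt(loop_a[0]))   # ~0: sign flipped
print(track_sqrt(loop_ab) - cmath.sqrt(loop_ab[0])) # ~0: value restored
```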
These three cases (i), (ii), and (iii) are depicted in Fig. 6.27a–c, respectively. In Fig. 6.27c, after z comes full circle, arg z relative to z = a returns to the original value, keeping θ₁ ≤ arg z ≤ θ₂. This is similarly the case with arg z relative to z = b.
Correspondingly, the branch cut(s) are depicted with doubled broken line(s), e.g., as
in Fig. 6.28. Note that in the cases (ii) and (iii) the branch cut can be chosen so that
Fig. 6.28 Branch cut(s) for √((z − a)(z − b)). (a) A branch cut is shown by a line connecting z = a and z = b. (b) Branch cuts are shown by a line connecting z = a and z = −∞ and another line connecting z = b and z = +∞. In both (a, b) the branch cut(s) are depicted with doubled broken line(s)
the circle (or contour) may not cross the branch cut. Although a bit complicated, the
Riemann surface can be shaped accordingly. Other choices of the branch cuts are
shown in Sect. 6.9.2 (vide infra).
When we consider logarithm functions, we have to deal with a rather complicated situation. For the polar coordinate representation, we have z = re^{iθ}. That is,
where
When we deal with a function having a branch point, we must bear in mind that unless the contour crosses the branch cut (either algebraic or logarithmic), the function we are thinking of remains single valued and analytic (if that function is originally analytic) and can be differentiated or integrated in the normal way. We give some examples of this.
We rewrite (6.246) as
$$I=\int_{0}^{\infty}\frac{z^{a-1}}{1+z}\,dz. \qquad (6.247)$$
$$f(z)\equiv\frac{z^{a-1}}{1+z},$$
$$z^{a-1}=e^{(a-1)\ln z}.$$
where Γ_R and Γ₀ denote the outer large circle and the inner small circle of radius R (≫ 1) and r (≪ 1), respectively. Note that with the contour integration Γ_R is traced counterclockwise but Γ₀ is traced clockwise (see Fig. 6.29). Since the simple pole is present at z = −1, the related residue Res f(−1) is
$$\mathrm{Res}\,f(-1)=z^{a-1}\big|_{z=-1}=(-1)^{a-1}=\left(e^{i\pi}\right)^{a-1}=-e^{ai\pi}.$$
Then, we have
Moreover, we have
$$\int_{Q'P'}\frac{z^{a-1}}{1+z}\,dz=\int_{Q'P'}\frac{e^{(a-1)\ln z}}{1+z}\,dz=\int_{Q'P'}\frac{e^{(a-1)(\ln|z|+2\pi i)}}{1+z}\,dz=e^{(a-1)2\pi i}\int_{Q'P'}\frac{e^{(a-1)\ln|z|}}{1+z}\,dz. \qquad (6.252)$$
Notice that the argument underneath the branch cut (i.e., the line Q′P′) is increased by 2π relative to that above the branch cut.
Considering (6.249) through (6.252) and taking the limits R → ∞ and r → 0, (6.248) is rewritten as
$$-2\pi ie^{a\pi i}=\int_{0}^{\infty}\frac{e^{(a-1)\ln|z|}}{1+z}\,dz+e^{(a-1)2\pi i}\int_{\infty}^{0}\frac{e^{(a-1)\ln|z|}}{1+z}\,dz$$
$$=\left[1-e^{(a-1)2\pi i}\right]\int_{0}^{\infty}\frac{e^{(a-1)\ln|z|}}{1+z}\,dz=\left[1-e^{(a-1)2\pi i}\right]\int_{0}^{\infty}\frac{z^{a-1}}{1+z}\,dz. \qquad (6.253)$$
In (6.253) we have assumed z to be real and positive (see Fig. 6.29), and so we have e^{(a−1)ln|z|} = e^{(a−1)ln z} = z^{a−1}. Therefore, from (6.253) we get
$$\int_{0}^{\infty}\frac{z^{a-1}}{1+z}\,dz=\frac{-2\pi ie^{a\pi i}}{1-e^{(a-1)2\pi i}}=\frac{-2\pi ie^{a\pi i}}{1-e^{2\pi ai}}. \qquad (6.254)$$
Using (6.37) for (6.254) and performing simple trigonometric calculations (after multiplying both the numerator and denominator by 1 − e^{−2πai}), we finally get
$$\int_{0}^{\infty}\frac{z^{a-1}}{1+z}\,dz=\int_{0}^{\infty}\frac{x^{a-1}}{1+x}\,dx=I=\pi/\sin a\pi.$$
$$f(z)\equiv\frac{1}{\sqrt{(z-a)(z-b)}}.$$
The total contour C comprises C_a, PQ, C_b, and Q′P′ as depicted in Fig. 6.30. The function f(z) has two branch points at z = a and z = b; otherwise f(z) is analytic. We draw a branch cut with a doubled broken line as shown. Since the contour C does not cross the branch cut but encircles it, we can evaluate the integral in a normal manner.
Starting from the point P′, (6.256) can be expressed by
$$I_C=\int_{C_a}\frac{dz}{\sqrt{(z-a)(z-b)}}+\int_{PQ}\frac{dz}{\sqrt{(z-a)(z-b)}}+\int_{C_b}\frac{dz}{\sqrt{(z-a)(z-b)}}+\int_{Q'P'}\frac{dz}{\sqrt{(z-a)(z-b)}}. \qquad (6.257)$$
We assume that the lines PQ and Q′P′ are illimitably close to the real axis. This time, C_a and C_b are both traced counterclockwise. Putting z − a = r₁e^{iθ₁} and z − b = r₂e^{iθ₂}, we have
260 6 Theory of Analytic Functions
Fig. 6.30 Branch cut (shown with a doubled broken line) and contour for the integration of 1/√((z−a)(z−b)) (a < b). Only the argument θ₁ is depicted. With θ₂ see text
\[ f(z) = \frac{1}{\sqrt{r_1 r_2}}\,e^{-i(\theta_1+\theta_2)/2}. \qquad (6.258) \]
On C_b we have z − b = εe^{iθ₂} and dz = iεe^{iθ₂}dθ₂, so that
\[ \left|\int_{C_b} \frac{dz}{\sqrt{(z-a)(z-b)}}\right| = \left|\int_{-\pi}^{\pi} \frac{i\varepsilon e^{i\theta_2}}{\sqrt{r_1\varepsilon}}\,e^{-i(\theta_1+\theta_2)/2}\,d\theta_2\right| \le \int_{-\pi}^{\pi} \frac{\sqrt{\varepsilon}}{\sqrt{r_1}}\,d\theta_2. \]
Therefore, we get
\[ \lim_{\varepsilon\to 0}\int_{C_b} \frac{dz}{\sqrt{(z-a)(z-b)}} = 0, \quad\text{and likewise}\quad \lim_{\varepsilon\to 0}\int_{C_a} \frac{dz}{\sqrt{(z-a)(z-b)}} = 0. \]
On the line Q′P′ we have r₁ = x − a, θ₁ = 0, r₂ = b − x, and θ₂ = π, so that
\[ \int_{Q'P'} \frac{dz}{\sqrt{(z-a)(z-b)}} = \int_b^a \frac{e^{-i\pi/2}}{\sqrt{(x-a)(b-x)}}\,dx = -i\int_b^a \frac{dx}{\sqrt{(x-a)(b-x)}} = i\int_a^b \frac{dx}{\sqrt{(x-a)(b-x)}}. \qquad (6.260) \]
On the line PQ, in turn, we have r₁ = x − a and θ₁ = 2π. This is because when going from P′ to P, the argument is increased by 2π; see Fig. 6.30 once again. Also, we have r₂ = b − x, θ₂ = π, and dz = dx. Notice that the argument θ₂ remains unchanged. Then, we get
\[ \int_{PQ} \frac{dz}{\sqrt{(z-a)(z-b)}} = \int_a^b \frac{e^{-3i\pi/2}}{\sqrt{(x-a)(b-x)}}\,dx = i\int_a^b \frac{dx}{\sqrt{(x-a)(b-x)}}. \qquad (6.261) \]
\[ z = 1/w, \qquad (6.263) \]
we have
\[ \sqrt{(z-a)(z-b)} = \frac{1}{w}\sqrt{(1-aw)(1-bw)} \quad\text{with}\quad dz = -\frac{dw}{w^2}. \qquad (6.264) \]
Thus, we have
\[ I_C = \oint_C \frac{dz}{\sqrt{(z-a)(z-b)}} = -\oint_{C'} \frac{dw}{w\sqrt{(1-aw)(1-bw)}}, \qquad (6.265) \]
where the closed contour C′ of diameter 1/R comes full circle clockwise around the origin of the w-plane (see Fig. 6.31). This is because as in the z-plane the contour C is
Fig. 6.31 Branch cuts (shown with doubled broken lines) and contour in the w-plane for the integration of 1/[w√((1−aw)(1−bw))]. (a) Case of 0 < a < b. (b) Case of a < 0 < b. Notice that in both cases the contour C′ is traced clockwise (see text)
traced viewing the point z = ∞ on the right, the contour C′ is traced viewing the corresponding point w = 0 on the right. The above clockwise cycle results from such a situation. The integrand of (6.265) has a simple pole at w = 0.
Regarding other singularities of the integrand, there are two branch points at w = 1/a and w = 1/b. Then, we appropriately choose R ≫ 1 (or 1/R ≪ 1) so that C′ does not cut across the branch cut. To meet this condition, we have two choices: (i) the branch cut connects w = 1/a and w = 1/b; (ii) another choice of the branch cut is that it connects w = 1/a and w = ∞ along with w = 1/b and w = ∞ (see Fig. 6.31). If we have 0 < a < b, we can choose a branch cut as that depicted in Fig. 6.31a corresponding to Fig. 6.28a. In case a < 0 < b, the branch cut can be that shown in Fig. 6.31b corresponding to Fig. 6.28b. (When a < b < 0, the branch cut geometry may be similar to that of Fig. 6.31a.)
Consequently, in both the above cases (i) and (ii) there is only a simple pole at w = 0 without any other singularities inside the contour C′. Therefore, according to (6.148) we have a contour integration described by
\[ \oint_{C'} \frac{dw}{w\sqrt{(1-aw)(1-bw)}} = -2\pi i\,\operatorname{Res}\left[\frac{1}{w\sqrt{(1-aw)(1-bw)}}\right]_{w=0} = -2\pi i\left[\frac{1}{\sqrt{(1-aw)(1-bw)}}\right]_{w=0} = -2\pi i, \qquad (6.266) \]
where the above minus sign is due to the clockwise rotation of the contour integration. Hence, from (6.265) we get
\[ I_C = \oint_C \frac{dz}{\sqrt{(z-a)(z-b)}} = 2\pi i. \qquad (6.267) \]
Comparing this with the sum of (6.260) and (6.261), we get
\[ \int_a^b \frac{dx}{\sqrt{(x-a)(b-x)}} = \pi. \qquad (6.268) \]
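The value π, independent of a and b, can be verified numerically. The sketch below (an illustration assumed here, not from the book) uses the substitution x = (a+b)/2 + ((b−a)/2) sin t, which removes the endpoint singularities and turns the weighted integrand into plain dt:

```python
import math

def arc_integral(a, b, n=2000):
    # Midpoint rule after x = (a+b)/2 + ((b-a)/2) sin t; the substitution
    # maps dx/sqrt((x-a)(b-x)) to dt on (-pi/2, pi/2), so the sum -> pi.
    h = math.pi / n
    total = 0.0
    for i in range(n):
        t = -math.pi / 2 + (i + 0.5) * h
        x = (a + b) / 2 + (b - a) / 2 * math.sin(t)
        dx = (b - a) / 2 * math.cos(t) * h
        total += dx / math.sqrt((x - a) * (b - x))
    return total

print(arc_integral(-1.2, 3.4))   # ~ pi, independently of a and b
```

The midpoint rule avoids evaluating the integrand exactly at the singular endpoints x = a and x = b.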
\[ f(t) \equiv 1 - 2tx + t^2. \qquad (6.270) \]
We rewrite (6.270) as
\[ f(t) = \left[t - x + i\sqrt{1-x^2}\right]\left[t - x - i\sqrt{1-x^2}\right]. \qquad (6.271) \]
With f(t) = 0 we have two roots at t = x ± i√(1 − x²), and so |t| = 1 (see Sect. 3.6.1).
When λ = 1/2, we are dealing with essentially the same problem as that of Example 6.13. For instance, when x = 0, the branch points are located at t = ±i. When x = 1/√2, the branch points are positioned at t = (1 ± i)/√2. These values of t form the branch points. In Fig. 6.32 we depict these branch points and corresponding branch cuts. In this situation, the function
\[ h(t) \equiv \left(1 - 2tx + t^2\right)^{-1/2} \qquad (6.272) \]
is analytic within a circle |t| < 1 of the complex plane t [14] (Fig. 6.32). Therefore, we can freely expand h(t) into a Taylor series at any point within that circle. If we expand h(t) around t = 0, we expect to have the following Taylor series:
\[ h(t) = \sum_{n=0}^{\infty} c_n(x)\,t^n. \qquad (6.273) \]
The expansion coefficients are then given by
\[ c_n(x) = \frac{1}{2\pi i}\oint_C \frac{h(t)}{t^{n+1}}\,dt, \qquad (6.274) \]
where the contour C can be chosen so that C encircles t = 0 in a region |t| < 1. Let us make a substitution such that [14]
\[ 1 - ut = \left(1 - 2tx + t^2\right)^{1/2}. \qquad (6.275) \]
Then we have
\[ t = \frac{2(u-x)}{u^2-1} \quad\text{and}\quad dt = \frac{2(1-ut)}{u^2-1}\,du \quad\text{with}\quad u \ne \pm 1. \qquad (6.276) \]
\[ c_n(x) = \frac{1}{2\pi i}\oint_{C'} \frac{(-1)^n\left(1-u^2\right)^n}{2^n (u-x)^{n+1}}\,du = \frac{(-1)^n}{2^n n!}\left[\frac{d^n\left(1-u^2\right)^n}{du^n}\right]_{u=x} = P_n(x). \qquad (6.278) \]
The functions P_n(x) are Legendre polynomials that have already appeared in (3.145) and (3.207) of Sects. 3.5 and 3.6. In this manner, the above argument directly confirms that the Legendre polynomials of (3.194) derived from the generating function (3.180) or defined by the Rodrigues formula of (3.145) are identical in the complex domain. Originally, (3.145) and (3.207) were defined in the real domain. The identical representations in both the real and complex domains are the consequence of analytic continuation.
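The identity of the two representations can be checked numerically. The sketch below (an illustration assumed here, not from the book) computes the Taylor coefficients c_n(x) of h(t) by the Cauchy coefficient formula evaluated on the circle |t| = 1/2, well inside the branch points on |t| = 1, and compares them with P_n(x) obtained from the standard Bonnet recurrence:

```python
import cmath, math

def c(n, x, r=0.5, m=4096):
    # Taylor coefficient of h(t) = (1 - 2*t*x + t*t)**(-1/2) via the Cauchy
    # integral over |t| = r, discretized as an m-point trapezoid sum.
    s = 0.0 + 0.0j
    for k in range(m):
        th = 2 * math.pi * k / m
        t = r * cmath.exp(1j * th)
        s += (1 - 2 * t * x + t * t) ** (-0.5) * cmath.exp(-1j * n * th)
    return (s / m).real / r ** n

def legendre(n, x):
    # Bonnet recurrence: (k+1) P_{k+1} = (2k+1) x P_k - k P_{k-1}
    p0, p1 = 1.0, x
    if n == 0:
        return p0
    for k in range(1, n):
        p0, p1 = p1, ((2 * k + 1) * x * p1 - k * p0) / (k + 1)
    return p1

for n in (0, 1, 2, 5):
    print(n, c(n, 0.3), legendre(n, 0.3))
```

At x = ±1 the same routine reproduces c_n(1) = 1 and c_n(−1) = (−1)ⁿ, in accordance with (6.279).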
Substituting x = ±1 in (6.277), we get
\[ c_n(1) = \frac{1}{2\pi i}\oint_{C'} \frac{(-1)^n\left(1-u^2\right)^n}{2^n (u-1)^{n+1}}\,du = \frac{1}{2^n}\,\frac{1}{2\pi i}\oint_{C'} \frac{(u+1)^n}{u-1}\,du = \frac{1}{2^n}\left[(u+1)^n\right]_{u=1} = 1 \]
and
\[ c_n(-1) = \frac{1}{2\pi i}\oint_{C'} \frac{(-1)^n\left(1-u^2\right)^n}{2^n (u+1)^{n+1}}\,du = \frac{1}{2^n}\,\frac{1}{2\pi i}\oint_{C'} \frac{(u-1)^n}{u+1}\,du = \frac{1}{2^n}\left[(u-1)^n\right]_{u=-1} = (-1)^n. \qquad (6.279) \]
Notice that in (6.279) the integrand has a simple pole at u = ±1 and, hence, we used (6.144) and (6.148) with the contour integration. The contour C′ is taken so that it can encircle u = 1 in the former case and u = −1 in the latter. Hence, we have the important relations P_n(1) = 1 and P_n(−1) = (−1)ⁿ.
Summarizing this chapter, we have explored the theory and applications of analytic functions in the complex domain. We have studied how the theory of analytic functions produces fruitful results.
References
Chapter 7
Maxwell's Equations
In this chapter, we first represent Maxwell's equations in vector form. The equations are represented in a differential form that is consistent with a viewpoint based on "action through medium." The equation of wave motion (or wave equation) is naturally derived from these equations.
Maxwell’s equations of electromagnetism are expressed as follows:
div D ¼ ρ, ð7:1Þ
div B ¼ 0, ð7:2Þ
∂B
rot E þ ¼ 0, ð7:3Þ
∂t
∂D
rot H ¼ i: ð7:4Þ
∂t
In (7.3), RHS denotes a zero vector. Let us take some time to get acquainted with physical quantities and their dimensions as well as basic ideas and concepts along with definitions of electromagnetism.
The quantity D is called electric flux density [A·s/m² = C/m²] (or electric displacement);
\[ \mathbf{V} = (\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3)\begin{pmatrix} V_x \\ V_y \\ V_z \end{pmatrix}. \qquad (7.5) \]
\[ \operatorname{div} \mathbf{V} \equiv \frac{\partial V_x}{\partial x} + \frac{\partial V_y}{\partial y} + \frac{\partial V_z}{\partial z}. \qquad (7.6) \]
Thus, the div operator converts a vector to a scalar. The quantities D and the electric
field E [V/m] are associated with the following expression:
\[ \mathbf{D} = \varepsilon \mathbf{E}, \qquad (7.7) \]
where ε [C²/(N·m²)] is called a dielectric constant (or permittivity) of the dielectric medium.
The dimension can be understood from the following Coulomb's law that describes a force exerted between two charges:
\[ F = \frac{1}{4\pi\varepsilon}\,\frac{QQ'}{r^2}, \qquad (7.8) \]
where F is the force; Q and Q′ are electric charges of the two charges; r is a distance between the two charges. Equation (7.1) represents Gauss' law of electrostatics.
The electric charge of 1 coulomb (1 [C]) is defined as follows: Suppose that two point charges having the same electric charge are placed in vacuum 1 m apart. In that situation, if a force F between the two charges is
\[ F = \tilde{c}^2 \times 10^{-7}\ [\mathrm{N}], \qquad (7.9) \]
where \(\tilde{c}\) is related to the light velocity, then we define the electric charge which each point charge possesses as 1 [C]. Here, note that \(\tilde{c}\) is a dimensionless number related to the light velocity in vacuum that is measured in a unit of [m/s].
7.1 Maxwell's Equations and Their Characteristics 271
Notice also that the light velocity c in vacuum is defined as
\[ c \equiv 299\,792\,458\ [\mathrm{m/s}]. \]
Thus, we have
\[ \tilde{c} = 299\,792\,458. \]
Meanwhile, 1 s (second) has been defined from a certain spectral line emitted from ¹³³Cs. Thus, 1 m (meter) is defined by
(distance along which light is propagated in vacuum during 1 s)/299 792 458.
The quantity B is called magnetic flux density [V·s/m² = Wb/m²]. Equation (7.2) represents Gauss's law of magnetostatics. In contrast to (7.1), RHS of (7.2) is zero. This corresponds to the fact that although a true charge exists, a true "magnetic charge" does not exist. (More precisely, such a charge has not been detected so far.) The quantity B is connected with the magnetic field H by the following relation:
\[ \mathbf{B} = \mu \mathbf{H}. \qquad (7.10) \]
Also we have
\[ \mu_0 \varepsilon_0 = 1/c^2. \qquad (7.12) \]
We will encounter this relation later again. Since the quantities c and ε₀ have a defined magnitude, so does μ₀ from (7.12).
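Relation (7.12) can be checked with a few lines of arithmetic. The sketch below (not from the book) uses the conventional pre-2019 value μ₀ = 4π × 10⁻⁷ and the defined light speed to recover the familiar value of ε₀:

```python
import math

c = 299_792_458.0            # defined light speed in vacuum [m/s]
mu0 = 4 * math.pi * 1e-7     # vacuum permeability [N/A^2] (conventional value)
eps0 = 1 / (mu0 * c ** 2)    # (7.12) solved for eps0

print(eps0)                  # ~ 8.854e-12 [C^2/(N m^2)]
```

The result agrees with the measured CODATA value 8.8541878128 × 10⁻¹² to about nine significant figures.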
We often make an issue of the relative magnitudes of the dielectric constant and magnetic permeability of a dielectric medium. That is, the relative permittivity εr and relative permeability μr are defined as follows:
\[ \varepsilon_r \equiv \varepsilon/\varepsilon_0 \quad\text{and}\quad \mu_r \equiv \mu/\mu_0. \]
272 7 Maxwell’s Equations
Note that both εr and μr are dimensionless quantities. Those magnitudes are equal to
1 (in the case of vacuum) or larger than 1 (with any other dielectric media).
Equations (7.3) and (7.4) deal with the change in electric and magnetic fields with
time. Of these, (7.3) represents Faraday’s law of electromagnetic induction due to
Michael Faraday (1831). He found that when a permanent magnet was thrust into or
out of a closed circuit, the transient current flowed. Moreover, that experiment
implied that even without the closed circuit, an electric field was generated around
the space that changed the position relative to the permanent magnet. Equation (7.3)
is easier to understand if it is rewritten as follows:
\[ \operatorname{rot} \mathbf{E} = -\frac{\partial \mathbf{B}}{\partial t}. \qquad (7.14) \]
That is, the electric field E is generated in such a way that the induced electric field
(or induced current) tends to lessen the change in magnetic flux (or magnetic field)
produced by the permanent magnet (Lenz’s law). The minus sign in RHS indicates
that effect.
The rot operator appearing in (7.3) and (7.4) is defined by
\[ \operatorname{rot} \mathbf{V} = \nabla \times \mathbf{V} = \begin{vmatrix} \mathbf{e}_1 & \mathbf{e}_2 & \mathbf{e}_3 \\ \dfrac{\partial}{\partial x} & \dfrac{\partial}{\partial y} & \dfrac{\partial}{\partial z} \\ V_x & V_y & V_z \end{vmatrix} = \left(\frac{\partial V_z}{\partial y} - \frac{\partial V_y}{\partial z}\right)\mathbf{e}_1 + \left(\frac{\partial V_x}{\partial z} - \frac{\partial V_z}{\partial x}\right)\mathbf{e}_2 + \left(\frac{\partial V_y}{\partial x} - \frac{\partial V_x}{\partial y}\right)\mathbf{e}_3. \qquad (7.15) \]
The operator ∇ has already appeared in (3.9). This operator transforms a vector to a vector. Let us think of the meaning of the rot operator. Suppose that there is a vector field that varies with time and spatial positions. Suppose also at some instant the spatial distribution of the field varies as in Fig. 7.1, where a spiral vector field V is present. For a z-component of rot V around the origin, we have
\[ (\operatorname{rot} \mathbf{V})_z = \frac{\partial V_y}{\partial x} - \frac{\partial V_x}{\partial y} = \lim_{\Delta x \to 0,\,\Delta y \to 0}\left[\frac{(V_1)_y - (V_3)_y}{\Delta x} - \frac{(V_2)_x - (V_4)_x}{\Delta y}\right]. \]
In the case of Fig. 7.1, (V₁)_y − (V₃)_y > 0 and (V₂)_x − (V₄)_x < 0 and, hence, we find that rot V has a positive z-component. If V_z = 0 and ∂V_y/∂z = ∂V_x/∂z = 0, we find from (7.15) that rot V possesses only the z-component. The equation ∂V_y/∂z = ∂V_x/∂z = 0 implies that the vector field V is uniform in the direction of the z-axis. Thus, under the above conditions the spiral vector field V is accompanied by the rot V vector field
7.1 Maxwell’s Equations and Their Characteristics 273
that is directed toward the upper side of the plane of paper (i.e., the positive direction
of the z-axis).
Equation (7.4) can be rewritten as
\[ \operatorname{rot} \mathbf{H} = \mathbf{i} + \frac{\partial \mathbf{D}}{\partial t}. \qquad (7.16) \]
Notice that ∂D/∂t has the same dimension as i [A/m²] and is called displacement current. Without this term, we have
\[ \operatorname{rot} \mathbf{H} = \mathbf{i}. \qquad (7.17) \]
This relation is well known as Ampère's law or Ampère's circuital law (André-Marie Ampère: 1827), which determines a magnetic field yielded by a stationary current. Again with the aid of Fig. 7.1, (7.17) implies that the current given by i produces a spiral magnetic field.
Now, let us think of a change in the amount of charge with time in a part of three-dimensional closed space V surrounded by a closed surface S. It is given by
\[ \frac{d}{dt}\int_V \rho\,dV = \int_V \frac{\partial \rho}{\partial t}\,dV = -\int_S \mathbf{i}\cdot\mathbf{n}\,dS = -\int_V \operatorname{div} \mathbf{i}\,dV, \qquad (7.18) \]
where n is an outward-directed normal unit vector; with the last equality we used Gauss's theorem. Gauss's theorem is described by
\[ \int_V \operatorname{div} \mathbf{i}\,dV = \int_S \mathbf{i}\cdot\mathbf{n}\,dS. \]
Fig. 7.1 Spiral vector field V around the origin O; the vectors V₁–V₄ are taken at (Δx/2, 0), (0, Δy/2), (−Δx/2, 0), and (0, −Δy/2)
Figure 7.2 gives an intuitive diagram that explains Gauss's theorem. The diagram
shows a cross-section of the closed space V surrounded by a surface S. In this case,
imagine a cube or a hexahedron as V. The periphery is the cross-section of the closed
surface accordingly. Arrows in the diagram schematically represent div i on indi-
vidual fragments; only those of the center infinitesimal fragment are shown with
solid lines. The arrows of adjacent fragments cancel out each other and only the
components on the periphery are nonvanishing. Thus, the volume integration of div i
is converted to the surface integration of i. Readers are referred to appropriate literature for the vector integration [1].
Consequently, from (7.18) we have
\[ \int_V \left(\frac{\partial \rho}{\partial t} + \operatorname{div} \mathbf{i}\right) dV = 0. \qquad (7.19) \]
Since the closed space V can be chosen arbitrarily, the integrand itself must vanish:
\[ \frac{\partial \rho}{\partial t} + \operatorname{div} \mathbf{i} = 0. \qquad (7.20) \]
The relation (7.20) is called the current continuity equation. It represents the law of conservation of charge.
Meanwhile, taking div of both sides of (7.17), we have
\[ \operatorname{div} \operatorname{rot} \mathbf{H} = \operatorname{div} \mathbf{i}. \qquad (7.21) \]
At the same time,
\[ \operatorname{div} \operatorname{rot} \mathbf{H} = \frac{\partial}{\partial x}\left(\frac{\partial H_z}{\partial y} - \frac{\partial H_y}{\partial z}\right) + \frac{\partial}{\partial y}\left(\frac{\partial H_x}{\partial z} - \frac{\partial H_z}{\partial x}\right) + \frac{\partial}{\partial z}\left(\frac{\partial H_y}{\partial x} - \frac{\partial H_x}{\partial y}\right) = \left(\frac{\partial^2 H_z}{\partial x\,\partial y} - \frac{\partial^2 H_z}{\partial y\,\partial x}\right) + \left(\frac{\partial^2 H_x}{\partial y\,\partial z} - \frac{\partial^2 H_x}{\partial z\,\partial y}\right) + \left(\frac{\partial^2 H_y}{\partial z\,\partial x} - \frac{\partial^2 H_y}{\partial x\,\partial z}\right) = 0. \qquad (7.22) \]
With the last equality of (7.22), we used the fact that if, e.g., ∂²H_z/∂x∂y and ∂²H_z/∂y∂x are continuous and differentiable in a certain domain (x, y), then ∂²H_z/∂x∂y = ∂²H_z/∂y∂x. That is, we assume "ordinary" functions for H_z, H_x, and H_y. Thus, from (7.21) we have
\[ \operatorname{div} \mathbf{i} = 0. \qquad (7.23) \]
Combining (7.23) with (7.20), we get
\[ \frac{\partial \rho(\mathbf{x}, t)}{\partial t} = 0, \qquad (7.24) \]
where we explicitly show that ρ depends upon both x and t. Note that x is a position
vector described as (3.5). Therefore, (7.24) shows that ρ(x, t) is temporally constant
at a position x, consistent with the stationary current.
Nevertheless, we encounter a problem when ρ(x, t) is temporally varying. In other words, (7.17) goes against the charge conservation law when ρ(x, t) is temporally varying. It was James Clerk Maxwell (1861–1862) who solved the problem by introducing the concept of the displacement current. In fact, taking div of both sides of (7.16), we have
\[ \operatorname{div} \operatorname{rot} \mathbf{H} = \operatorname{div} \mathbf{i} + \frac{\partial\,\operatorname{div} \mathbf{D}}{\partial t} = \operatorname{div} \mathbf{i} + \frac{\partial \rho}{\partial t} = 0, \qquad (7.25) \]
where with the first equality we exchanged the order of differentiations with respect to t and x; with the second equality we used (7.1). The last equality of (7.25) results from (7.20). In other words, by virtue of the term ∂D/∂t, (7.4) is consistent with the charge conservation law. Thus, the set of Maxwell's equations (7.1)–(7.4) supplies us with a well-established basis in natural science up until the present.
Although the set of these equations describes spatial and temporal changes in electric and magnetic fields in vacuum and matter including metal, in Part II we confine ourselves to the changes in the electric and magnetic fields in a uniform dielectric medium.
If we further confine ourselves to the case where neither electric charge nor electric current is present in a uniform dielectric medium, we can readily obtain equations of wave motion regarding the electric and magnetic fields. That is,
\[ \operatorname{div} \mathbf{D} = 0, \qquad (7.26) \]
\[ \operatorname{div} \mathbf{B} = 0, \qquad (7.27) \]
\[ \operatorname{rot} \mathbf{E} + \frac{\partial \mathbf{B}}{\partial t} = \mathbf{0}, \qquad (7.28) \]
\[ \operatorname{rot} \mathbf{H} - \frac{\partial \mathbf{D}}{\partial t} = \mathbf{0}. \qquad (7.29) \]
The relations (7.27) and (7.28) are identical to (7.2) and (7.3), respectively.
Let us start with a formula of vector analysis. First, we introduce the grad operator. We have
\[ \operatorname{grad} f = \nabla f = \frac{\partial f}{\partial x}\mathbf{e}_1 + \frac{\partial f}{\partial y}\mathbf{e}_2 + \frac{\partial f}{\partial z}\mathbf{e}_3. \]
That is, the grad operator transforms a scalar to a vector. We have the following formula:
\[ \operatorname{rot} \operatorname{rot} \mathbf{V} = \operatorname{grad} \operatorname{div} \mathbf{V} - \nabla^2 \mathbf{V}. \qquad (7.30) \]
For the x-component, for instance, we have
\[ [\operatorname{rot} \operatorname{rot} \mathbf{V}]_x = \frac{\partial}{\partial x}\left(\frac{\partial V_y}{\partial y} + \frac{\partial V_z}{\partial z}\right) - \frac{\partial^2 V_x}{\partial y^2} - \frac{\partial^2 V_x}{\partial z^2}. \qquad (7.32) \]
Again assuming that ∂²V_y/∂y∂x = ∂²V_y/∂x∂y and ∂²V_z/∂z∂x = ∂²V_z/∂x∂z, we have
\[ [\operatorname{rot} \operatorname{rot} \mathbf{V}]_x = \left[\operatorname{grad} \operatorname{div} \mathbf{V} - \nabla^2 \mathbf{V}\right]_x. \qquad (7.33) \]
Regarding the y- and z-components, we have similar relations as well. Thus (7.30) holds.
7.2 Equation of Wave Motion 277
\[ \operatorname{rot} \operatorname{rot} \mathbf{E} + \operatorname{rot} \frac{\partial \mathbf{B}}{\partial t} = \operatorname{grad} \operatorname{div} \mathbf{E} - \nabla^2 \mathbf{E} + \frac{\partial\,\operatorname{rot} \mathbf{B}}{\partial t} = -\nabla^2 \mathbf{E} + \mu\varepsilon\frac{\partial^2 \mathbf{E}}{\partial t^2} = \mathbf{0}, \qquad (7.34) \]
where with the first equality we used (7.30) and for the second equality we used (7.7), (7.10), and (7.29). Thus we have
\[ \nabla^2 \mathbf{E} = \mu\varepsilon\frac{\partial^2 \mathbf{E}}{\partial t^2}. \qquad (7.35) \]
Similarly, we have
\[ \nabla^2 \mathbf{H} = \mu\varepsilon\frac{\partial^2 \mathbf{H}}{\partial t^2}. \qquad (7.36) \]
Equations (7.35) and (7.36) are called the equations of wave motion for the electric and magnetic fields.
To consider the implications of these equations, let us think, for simplicity, of the following equation in a one-dimensional space:
\[ \frac{\partial^2 y(x,t)}{\partial x^2} = \frac{1}{v^2}\,\frac{\partial^2 y(x,t)}{\partial t^2}, \qquad (7.37) \]
where y is an arbitrary scalar function that depends on x and t; v is a constant. Let f and g be arbitrarily chosen functions. Then, f(x − vt) and g(x + vt) are two solutions of (7.37). In fact, putting X = x − vt, we have
\[ \frac{\partial f}{\partial x} = \frac{\partial f}{\partial X}\frac{\partial X}{\partial x} = \frac{\partial f}{\partial X}, \qquad \frac{\partial^2 f}{\partial x^2} = \frac{\partial}{\partial X}\left(\frac{\partial f}{\partial X}\right)\frac{\partial X}{\partial x} = \frac{\partial^2 f}{\partial X^2}, \qquad (7.38) \]
\[ \frac{\partial f}{\partial t} = \frac{\partial f}{\partial X}\frac{\partial X}{\partial t} = (-v)\frac{\partial f}{\partial X}, \qquad \frac{\partial^2 f}{\partial t^2} = (-v)\frac{\partial}{\partial X}\left(\frac{\partial f}{\partial X}\right)\frac{\partial X}{\partial t} = (-v)^2\frac{\partial^2 f}{\partial X^2}. \qquad (7.39) \]
Thus, we get
\[ \frac{\partial^2 f}{\partial x^2} = \frac{1}{v^2}\,\frac{\partial^2 f}{\partial t^2}. \qquad (7.40) \]
Similarly, we get
\[ \frac{\partial^2 g}{\partial x^2} = \frac{1}{v^2}\,\frac{\partial^2 g}{\partial t^2}. \qquad (7.41) \]
Therefore, as a general solution we can take a superposition of f(x − vt) and g(x + vt). That is,
\[ y(x,t) = f(x - vt) + g(x + vt). \qquad (7.42) \]
The implication of (7.42) is as follows: (i) The function f(x − vt) can be obtained by parallel translation of f(x) by vt in the positive direction of the x-axis. In other words, f(x − vt) is obtained by translating f(x) by v per unit time in the positive direction of the x-axis; that is, the profile represented by f(x) is translated at a rate v with its form unchanged in time. (ii) The function g(x + vt), on the other hand, is translated at a rate v in the negative direction of the x-axis with its form unchanged in time as well. (iii) Thus, y(x, t) of (7.42) represents two "waves," i.e., a forward wave and a backward wave. The propagation velocity of the two waves is |v| accordingly. Usually we choose a positive number for v, and v is called a phase velocity.
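That any superposition f(x − vt) + g(x + vt) solves (7.37) can be checked numerically. The sketch below (an illustration assumed here, with arbitrary sample profiles f and g) compares central second differences in x and t:

```python
import math

v = 2.0
f = lambda s: math.exp(-s * s)   # arbitrary smooth pulse (forward wave)
g = lambda s: math.sin(s)        # arbitrary smooth profile (backward wave)
y = lambda x, t: f(x - v * t) + g(x + v * t)

def d2(F, a, h=1e-3):
    # central second difference approximating the second derivative
    return (F(a + h) - 2 * F(a) + F(a - h)) / (h * h)

x, t = 0.7, 0.3
lhs = d2(lambda s: y(s, t), x)          # d^2 y / d x^2
rhs = d2(lambda s: y(x, s), t) / v**2   # (1/v^2) d^2 y / d t^2
print(lhs, rhs)
```

The two printed numbers agree up to the finite-difference truncation error, for any smooth choice of f and g.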
Comparing (7.35) and (7.36) with (7.41), we have
\[ \mu\varepsilon = 1/v^2. \qquad (7.43) \]
\[ f(x - vt) = Ae^{ik(x-vt)}, \qquad (7.44) \]
where \(\tilde{f}\) shows the change in the functional form according to the variable transformation. In (7.45), we have
where ν and ω are said to be the frequency and angular frequency, respectively. For a three-dimensional wave f of a scalar function, we have the following form:
\[ \nabla^2 f = \frac{1}{v^2}\,\frac{\partial^2 f}{\partial t^2}. \qquad (7.49) \]
Substituting f = Ae^{i(\mathbf{k}\cdot\mathbf{x}-\omega t)} into (7.49), we get
\[ -A k^2 e^{i(\mathbf{k}\cdot\mathbf{x}-\omega t)} = -\frac{1}{v^2}\,A\omega^2 e^{i(\mathbf{k}\cdot\mathbf{x}-\omega t)}. \qquad (7.50) \]
Therefore,
\[ k^2 v^2 = \omega^2 \quad\text{or}\quad kv = \omega. \qquad (7.51) \]
where n_x, n_y, and n_z define direction cosines. Then (7.47) can be rewritten as
\[ f = Ae^{i(k\mathbf{n}\cdot\mathbf{x}-\omega t)}, \qquad (7.53) \]
where the exponent is called a phase. Suppose that the phase is fixed at zero. That is,
\[ k\mathbf{n}\cdot\mathbf{x} - \omega t = 0. \qquad (7.54) \]
Dividing by k and using (7.51), we have
\[ \mathbf{n}\cdot\mathbf{x} - vt = 0. \qquad (7.55) \]
In a nonmagnetic substance such as glass and polymer materials, we can assume that μr ≈ 1. Thus, we get an approximate expression as follows:
\[ n \approx \sqrt{\varepsilon_r}. \qquad (7.57) \]
In (7.58) and (7.59), E₀ and H₀ are constant vectors and may take complex magnitudes. Substituting (7.58) for (7.28) and using (7.10) as well as (7.43) and (7.46), we have
Comparing the coefficients of the exponential functions of the first and last sides, we get
\[ \mathbf{H}_0 = \frac{\mathbf{n} \times \mathbf{E}_0}{\sqrt{\mu/\varepsilon}}. \qquad (7.60) \]
\[ \mathbf{n}\cdot\mathbf{E}_0 = \mathbf{n}\cdot\mathbf{H}_0 = 0. \qquad (7.62) \]
This indicates that E and H are both perpendicular to n, i.e., the propagation direction of the electromagnetic wave. Thus, the electromagnetic wave is characterized as a transverse wave. The fields E and H have the same phase on P at any given time. Taking account of (7.60)–(7.62), E, H, and n are mutually perpendicular to one another. We depict the geometry of E, H, and n for the electromagnetic plane wave in Fig. 7.4, where a plane P is perpendicular to n.
We find that (7.60) is not independent of (7.61). In fact, taking an outer product with n from the right on both sides of (7.60), we have
\[ \mathbf{H}_0 \times \mathbf{n} = (\mathbf{n} \times \mathbf{E}_0) \times \mathbf{n}\,\big/\sqrt{\mu/\varepsilon} = \mathbf{E}_0\big/\sqrt{\mu/\varepsilon}, \]
where we used
\[ (\mathbf{n} \times \mathbf{E}_0) \times \mathbf{n} = \mathbf{E}_0. \qquad (7.63) \]
In general, we have
\[ \mathbf{A} = \mathbf{n}(\mathbf{n}\cdot\mathbf{A}) + \mathbf{n} \times (\mathbf{A} \times \mathbf{n}). \]
This relation means that A can be decomposed into a component parallel to n and
that perpendicular to n. Equation (7.63) shows that E0 has no component parallel to
n. This is another confirmation that E is perpendicular to n.
In (7.60) and (7.61), √(μ/ε) has a dimension of [Ω]. Make sure that this can be confirmed by (7.9) and (7.11). Hence, √(μ/ε) is said to be the characteristic impedance [2]. We denote it by
\[ Z \equiv \sqrt{\mu/\varepsilon}. \qquad (7.64) \]
Thus, we have
\[ \mathbf{H}_0 = (\mathbf{n} \times \mathbf{E}_0)/Z. \]
In vacuum we have
\[ Z_0 = \sqrt{\mu_0/\varepsilon_0} \approx 376.7\ [\Omega]. \]
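The numerical value of Z₀ follows directly from the vacuum constants. The short sketch below (not from the book; CODATA values are used) reproduces it:

```python
import math

mu0 = 4 * math.pi * 1e-7       # vacuum permeability [N/A^2] (conventional value)
eps0 = 8.8541878128e-12        # vacuum permittivity [C^2/(N m^2)], CODATA value
Z0 = math.sqrt(mu0 / eps0)     # characteristic impedance of vacuum, cf. (7.64)

print(Z0)                      # ~ 376.73 [ohm]
```

The result matches the quoted value 376.7 Ω.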
7.3 Polarized Characteristics of Electromagnetic Waves 283
For the electromagnetic wave to be a transverse wave means that neither E nor H has a component along n. Choosing the positive direction of the z-axis for n and ignoring the components related to partial differentiation with respect to x and y (i.e., the components related to ∂/∂x and ∂/∂y), we rewrite (7.28) and (7.29) for the individual Cartesian coordinates as
\[ \frac{\partial E_y}{\partial z} - \frac{\partial B_x}{\partial t} = 0, \quad \frac{\partial E_x}{\partial z} + \frac{\partial B_y}{\partial t} = 0, \quad \frac{\partial H_y}{\partial z} + \frac{\partial D_x}{\partial t} = 0, \quad \frac{\partial H_x}{\partial z} - \frac{\partial D_y}{\partial t} = 0. \qquad (7.65) \]
Differentiating the first equation of (7.65) with respect to z, we have
\[ \frac{\partial^2 E_y}{\partial z^2} - \frac{\partial^2 B_x}{\partial z\,\partial t} = 0. \]
Also differentiating the fourth equation of (7.65) with respect to t and multiplying both sides by μ, we have
\[ \mu\frac{\partial^2 H_x}{\partial t\,\partial z} - \mu\frac{\partial^2 D_y}{\partial t^2} = 0. \]
Summing both sides of the above equations and arranging terms, we get
\[ \frac{\partial^2 E_y}{\partial z^2} = \mu\varepsilon\frac{\partial^2 E_y}{\partial t^2}. \]
Similarly, we get
\[ \frac{\partial^2 E_x}{\partial z^2} = \mu\varepsilon\frac{\partial^2 E_x}{\partial t^2}, \]
\[ \frac{\partial^2 H_x}{\partial z^2} = \mu\varepsilon\frac{\partial^2 H_x}{\partial t^2} \quad\text{and}\quad \frac{\partial^2 H_y}{\partial z^2} = \mu\varepsilon\frac{\partial^2 H_y}{\partial t^2}. \]
From the above relations, we have two plane electromagnetic waves polarized along either the x-axis or the y-axis.
What is implied in the above description is that as solutions of (7.35) and (7.36)
we have a plane wave characterized by a specific direction defined by E0 and H0.
This implies that if we observe the electromagnetic wave at a fixed point, both E and
H oscillate along the mutually perpendicular directions E0 and H0. Hence, we say
that the “electric wave” is polarized in the direction E0 and that the “magnetic wave”
is polarized in the direction H0.
To characterize the polarization of the electromagnetic wave, we introduce the following unit polarization vectors ε_e and ε_m (with indices e and m related to the electric and magnetic field, respectively) [3]:
\[ \mathbf{E} = \boldsymbol{\varepsilon}_e E_0 e^{i(k\mathbf{n}\cdot\mathbf{x}-\omega t)} \quad\text{and}\quad \mathbf{H} = \boldsymbol{\varepsilon}_m H_0 e^{i(k\mathbf{n}\cdot\mathbf{x}-\omega t)}, \qquad (7.66) \]
where E₀ and H₀ are said to be amplitudes and may again be complex. We have
\[ \boldsymbol{\varepsilon}_e \times \boldsymbol{\varepsilon}_m = \mathbf{n}. \qquad (7.67) \]
We call ε_e and ε_m the unit polarization vectors of the electric field and magnetic field, respectively. As noted above, ε_e, ε_m, and n constitute a right-handed system in this order and are mutually perpendicular to one another.
The phase of E in the plane wave (7.58) and that of H in (7.59) are individually the same at all the points of P. From the wave equations (7.35) and (7.36), however, it is unclear whether E and H have the same phase. Suppose that E and H had different phases such that
\[ \mathbf{E} = \mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{x}-\omega t)} \]
and
\[ \mathbf{H} = \mathbf{H}_0 e^{i(\mathbf{k}\cdot\mathbf{x}-\omega t+\delta)} = \mathbf{H}_0 e^{i\delta} e^{i(\mathbf{k}\cdot\mathbf{x}-\omega t)} \equiv \widetilde{\mathbf{H}}_0 e^{i(\mathbf{k}\cdot\mathbf{x}-\omega t)}, \]
where \(\widetilde{\mathbf{H}}_0\ (= \mathbf{H}_0 e^{i\delta})\) is complex and the phase factor e^{iδ} in the exponent is unknown. This factor, however, can be set at zero. To show this, let us make a qualitative discussion using Fig. 7.5. Figure 7.5 shows the electric field of the plane wave at some instant as a function of the phase Φ. Suppose that Φ is taken in the direction of n in Fig. 7.4. Then, from (7.53) we have
\[ \Phi = k\mathbf{n}\cdot\mathbf{x} - \omega t = k\mathbf{n}\cdot\rho\mathbf{n} - \omega t = k\rho - \omega t. \]
Suppose furthermore that the phase is measured at t = 0 and that the electric field is measured along the ε_e direction. Then, we have
7.4 Superposition of Two Electromagnetic Waves 285
Fig. 7.5 Electric field of the plane wave at some instant as a function of phase Φ. The sinusoidal curve representing the electric field is shifted with time from left to right with its form unchanged
where E₁ (>0) and E₂ (>0) are amplitudes and e₁ and e₂ represent unit polarization vectors in the directions of the positive x-axis and y-axis; we assume that the two waves are being propagated in the direction of the positive z-axis; δ is a phase difference. The total electric field E is described as the superposition of E₁ and E₂ such that
\[ \mathbf{E} = \mathbf{E}_1 + \mathbf{E}_2 = \mathbf{e}_1 E_1 \cos(kz - \omega t) + \mathbf{e}_2 E_2 \cos(kz - \omega t + \delta). \qquad (7.69) \]
Observing the field at z = 0 and noting that the cosine is an even function, we have
\[ E_x = E_1\cos\omega t \quad\text{and}\quad E_y = E_2\cos(\omega t - \delta). \qquad (7.70) \]
If δ = 0, eliminating cos ωt from (7.70), we get
\[ E_y = \frac{E_2}{E_1} E_x. \qquad (7.71) \]
This is an equation of a straight line. The resulting electric field E is called a linearly polarized light accordingly. That is, when we are observing the electric field of the relevant light at the origin, the field is oscillating along the straight line described by (7.71) with the origin centrally located in the oscillating field. If δ = π, we have
\[ E_y = -\frac{E_2}{E_1} E_x. \]
This gives a straight line as well. Therefore, if we wish to seek the relationship between E_x and E_y, it suffices to examine it as a function of δ in the region −π/2 ≤ δ ≤ π/2.
(i) Case I: E₁ ≠ E₂.
Let us consider the case where δ ≠ 0 in (7.70). Rewriting the second equation of (7.70) and inserting the first equation into it so that we can eliminate t, we have
\[ E_y = E_2(\cos\omega t\cos\delta + \sin\omega t\sin\delta) = E_2\left(\cos\delta\,\frac{E_x}{E_1} \pm \sin\delta\sqrt{1 - \frac{E_x^2}{E_1^2}}\right). \]
That is,
\[ \frac{E_y}{E_2} - \cos\delta\,\frac{E_x}{E_1} = \pm\sin\delta\sqrt{1 - \frac{E_x^2}{E_1^2}}. \qquad (7.72) \]
Squaring both sides of (7.72) and arranging terms, we get
\[ \frac{E_x^2}{E_1^2\sin^2\delta} - \frac{2\cos\delta\,E_x E_y}{E_1 E_2\sin^2\delta} + \frac{E_y^2}{E_2^2\sin^2\delta} = 1. \qquad (7.73) \]
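That every point generated by (7.70) indeed lies on the conic (7.73) can be verified numerically. The sketch below (an illustration assumed here, with arbitrary sample values of E₁, E₂, and δ) evaluates the left-hand side of (7.73) along the parametrized trace:

```python
import math

E1, E2, delta = 1.5, 0.8, 0.6          # sample amplitudes and phase difference

def lhs(t):
    # (Ex, Ey) from (7.70) with omega*t written simply as t
    Ex = E1 * math.cos(t)
    Ey = E2 * math.cos(t - delta)
    s2 = math.sin(delta) ** 2
    return (Ex**2 / (E1**2 * s2)
            - 2 * math.cos(delta) * Ex * Ey / (E1 * E2 * s2)
            + Ey**2 / (E2**2 * s2))

for t in (0.0, 0.4, 1.1, 2.9):
    print(lhs(t))                       # equals 1 for every t
```

The constancy follows from the identity cos²A + cos²B − 2 cos A cos B cos(A − B) = sin²(A − B) with A = ωt and B = ωt − δ.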
In a matrix form, (7.73) is expressed as
\[ (E_x\ E_y)\begin{pmatrix} \dfrac{1}{E_1^2\sin^2\delta} & \dfrac{-\cos\delta}{E_1E_2\sin^2\delta} \\[2ex] \dfrac{-\cos\delta}{E_1E_2\sin^2\delta} & \dfrac{1}{E_2^2\sin^2\delta} \end{pmatrix}\begin{pmatrix} E_x \\ E_y \end{pmatrix} = 1. \qquad (7.74) \]
Note that the above matrix is real symmetric. In that case, to examine the properties of the matrix we calculate its determinant along with the principal minors. A principal minor means a minor with respect to a diagonal element. In this case, the two principal minors are 1/(E₁²sin²δ) and 1/(E₂²sin²δ). Also we have
\[ \begin{vmatrix} \dfrac{1}{E_1^2\sin^2\delta} & \dfrac{-\cos\delta}{E_1E_2\sin^2\delta} \\[2ex] \dfrac{-\cos\delta}{E_1E_2\sin^2\delta} & \dfrac{1}{E_2^2\sin^2\delta} \end{vmatrix} = \frac{1}{E_1^2E_2^2\sin^2\delta}. \qquad (7.75) \]
Evidently, the two principal minors as well as the determinant are all positive (δ ≠ 0). In this case, the (2, 2) matrix of (7.74) is said to be positive definite. The related discussion will be given in Part III. The positive definiteness means that in the quadratic form described by (7.74), LHS takes a positive value for any real numbers E_x and E_y except the unique case where E_x = E_y = 0, which renders LHS zero. The positive definiteness of a matrix ensures the existence of positive eigenvalues of the said matrix.
Let us consider a real symmetric (2, 2) matrix that has positive principal minors and a positive determinant in a general case. Let such a matrix M be
\[ M = \begin{pmatrix} a & c \\ c & b \end{pmatrix}, \]
where a, b > 0 and det M > 0; i.e., ab − c² > 0. Let a corresponding quadratic form be Q. Then, we have
\[ Q = (x\ y)\begin{pmatrix} a & c \\ c & b \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = ax^2 + 2cxy + by^2 = a\left(x + \frac{cy}{a}\right)^2 - \frac{c^2y^2}{a} + \frac{aby^2}{a} = a\left(x + \frac{cy}{a}\right)^2 + \frac{y^2}{a}\left(ab - c^2\right). \]
Thus, Q ≥ 0 for any real numbers x and y. We seek a condition under which Q = 0. We readily find that with M that has the above properties, only x = y = 0 makes Q = 0. Thus, M is positive definite. We will deal with this issue from a more general standpoint in Part III.
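The criterion just derived, positive diagonal entries together with a positive determinant, is easy to exercise in code. The sketch below (an illustration assumed here, not from the book) tests the criterion and computes the eigenvalues of a sample 2×2 symmetric matrix from its trace and determinant:

```python
import math

def is_positive_definite(a, b, c):
    # symmetric matrix [[a, c], [c, b]]: positive principal minors
    # plus a positive determinant imply positive definiteness
    return a > 0 and b > 0 and a * b - c * c > 0

def eigenvalues(a, b, c):
    tr, det = a + b, a * b - c * c
    disc = math.sqrt(tr * tr - 4 * det)   # = sqrt((a-b)^2 + 4c^2) >= 0
    return (tr + disc) / 2, (tr - disc) / 2

print(is_positive_definite(2.0, 1.0, 0.9), eigenvalues(2.0, 1.0, 0.9))
```

For the sample matrix both eigenvalues come out positive, matching the claim that positive definiteness guarantees positive eigenvalues.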
In general, it is pretty complicated to seek the eigenvalues and corresponding eigenvectors in the above case. Yet, we can extract important information from (7.74). The eigenvalues λ are estimated as follows:
\[ \lambda = \frac{E_1^2 + E_2^2 \pm \sqrt{\left(E_1^2 + E_2^2\right)^2 - 4E_1^2E_2^2\sin^2\delta}}{2E_1^2E_2^2\sin^2\delta}. \qquad (7.76) \]
Both eigenvalues are positive. In fact, for the discriminant we have
\[ \left(E_1^2 + E_2^2\right)^2 - 4E_1^2E_2^2\sin^2\delta = \left(E_1^2 - E_2^2\right)^2 + 4E_1^2E_2^2\cos^2\delta > 0 \quad (\delta \ne \pi/2). \]
Also we have
\[ E_1^2 + E_2^2 > \sqrt{\left(E_1^2 + E_2^2\right)^2 - 4E_1^2E_2^2\sin^2\delta}. \]
These clearly show that the quadratic form of (7.74) gives an ellipse (i.e., elliptically polarized light). Because of the presence of the second term of LHS of (7.73), both the major and minor axes of the ellipse are tilted and diverted from the x- and y-axes.
Let us inspect the ellipse described by (7.74). Inserting E_x = E_1 (obtained at t = 0 in (7.70)) into (7.73) and solving the quadratic equation with respect to E_y, we get E_y as a double root such that
\[ E_y = E_2\cos\delta. \]
Similarly, inserting E_y = E_2 into (7.73), we get E_x as a double root such that
\[ E_x = E_1\cos\delta. \]
These results show that an ellipse described by (7.73) or (7.74) is internally tangent
to a rectangle as depicted in Fig. 7.6a. Equation (7.69) shows that the electromag-
netic wave is propagated toward the positive direction of the z-axis. Therefore, in
Fig. 7.6a we are peeking into the oncoming wave from the bottom of a plane of paper
Fig. 7.6 Trace of the electric field of an elliptically polarized light. (a) The trace is internally tangent to a rectangle of 2E₁ × 2E₂. In the case of δ > 0, starting from P at t = 0, the coordinate point representing the electric field traces the ellipse counterclockwise with time. (b) The trace of an elliptically polarized light for δ = π/2
If δ = π/2, (7.73) is simplified to
\[ \frac{E_x^2}{E_1^2} + \frac{E_y^2}{E_2^2} = 1. \qquad (7.77) \]
Thus, the principal axes of the ellipse coincide with the x- and y-axes. On the basis of
(7.70), we see from Fig. 7.6b that starting from P at t ¼ 0, again the coordinate point
representing the electric field traces the ellipse counterclockwise with time; see the
curved arrow of Fig. 7.6b. If δ < 0, the coordinate point traces the ellipse clockwise
with time.
(ii) Case II: E₁ = E₂.
Now, let us consider a simple but important case. When E₁ = E₂, (7.73) is simplified to be
\[ (E_x\ E_y)\begin{pmatrix} 1 & -\cos\delta \\ -\cos\delta & 1 \end{pmatrix}\begin{pmatrix} E_x \\ E_y \end{pmatrix} = E_1^2\sin^2\delta. \qquad (7.79) \]
we obtain
\[ P^{-1}AP = \begin{pmatrix} 1+\cos\delta & 0 \\ 0 & 1-\cos\delta \end{pmatrix}. \qquad (7.85) \]
Notice that the eigenvalues (1 + cos δ) and (1 − cos δ) are both positive as expected.
Rewriting (7.79), we have
\[ (E_x\ E_y)PP^{-1}\begin{pmatrix} 1 & -\cos\delta \\ -\cos\delta & 1 \end{pmatrix}PP^{-1}\begin{pmatrix} E_x \\ E_y \end{pmatrix} = (E_x\ E_y)P\begin{pmatrix} 1+\cos\delta & 0 \\ 0 & 1-\cos\delta \end{pmatrix}P^{-1}\begin{pmatrix} E_x \\ E_y \end{pmatrix} = E_1^2\sin^2\delta. \qquad (7.86) \]
The coordinate system is depicted in Fig. 7.7 along with the basis vectors. The
relevant discussion will again appear in Part III.
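The diagonalization in (7.85) can be reproduced numerically. The sketch below (an illustration assumed here, not from the book) takes P as the 45-degree rotation whose columns are the eigenvectors of the matrix in (7.79) and checks that PᵀAP is diagonal with entries 1 ± cos δ:

```python
import math

delta = 0.7
c = math.cos(delta)
A = [[1.0, -c], [-c, 1.0]]

# P is orthogonal; its columns (1,-1)/sqrt(2) and (1,1)/sqrt(2) are the
# eigenvectors of A for the eigenvalues 1 + cos(delta) and 1 - cos(delta)
s = 1 / math.sqrt(2)
P = [[s, s], [-s, s]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

Pt = [[P[0][0], P[1][0]], [P[0][1], P[1][1]]]   # P^T = P^-1 (orthogonal)
D = matmul(Pt, matmul(A, P))
print(D)   # ~ [[1 + cos(delta), 0], [0, 1 - cos(delta)]]
```

Geometrically, P rotates the (E_x, E_y) axes by 45°, which brings the tilted ellipse of Case II onto its principal axes.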
Substituting (7.87) for (7.86) and rearranging terms, we get
\[ \frac{u^2}{E_1^2(1-\cos\delta)} + \frac{v^2}{E_1^2(1+\cos\delta)} = 1. \qquad (7.90) \]
Equation (7.90) indicates that the major axis and minor axis are E₁√(1 + cos δ) and E₁√(1 − cos δ), respectively. When δ = π/2, (7.90) becomes
Fig. 7.8 Polarized feature of light in the case of E₁ = E₂. (a) If δ > 0, the electric field traces an ellipse from A₁ via A₂ to A₃ (see text). (b) If δ = π/2, the electric field traces a circle from C₁ via C₂ to C₃ (left-circularly polarized light)
\[ \frac{u^2}{E_1^2} + \frac{v^2}{E_1^2} = 1. \qquad (7.91) \]
This represents a circle. For this reason, the wave described by (7.91) is called
circularly polarized light. In (7.90) where δ ≠ π/2, the wave is said to be
elliptically polarized light. Thus, we have linearly, elliptically, and circularly polarized
light depending on the magnitude of δ.
Let us closely examine the characteristics of the elliptically and circularly polarized
light in the case of E_1 = E_2. When t = 0, from (7.70) we have
as in the case of Fig. 7.6. If δ takes a negative value, on the other hand, the field traces
the ellipse clockwise.
If δ = π/2, in (7.94) we have

E_x = E_1 \cos ωt \quad \text{and} \quad E_y = E_1 \cos\left(ωt - \frac{π}{2}\right).  (7.95)
e_+ \equiv \frac{1}{\sqrt{2}}(e_1 + i e_2) \quad \text{and} \quad e_- \equiv \frac{1}{\sqrt{2}}(e_1 - i e_2),  (4.45)
In the case of δ = 0, we have linearly polarized light. For this, the points A1, A2,
and A3 coalesce into a point on the straight line E_y = E_x.
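As an illustrative numerical sketch (not code from the book), the trace of the field for E_1 = E_2 can be sampled directly from E_x = E_1 cos ωt and E_y = E_1 cos(ωt − δ); the amplitude below is an arbitrary assumed value.

```python
import numpy as np

# Sketch: sample one period of E_x = E1*cos(wt), E_y = E1*cos(wt - delta)
# and check the polarization state for E1 = E2 (illustrative values).
def trace(E1, delta, n=1000):
    wt = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    return E1 * np.cos(wt), E1 * np.cos(wt - delta)

E1 = 2.0
Ex, Ey = trace(E1, np.pi / 2)             # delta = pi/2: Eq. (7.91)
print(np.allclose(Ex**2 + Ey**2, E1**2))  # a circle: circularly polarized

Ex, Ey = trace(E1, 0.0)                   # delta = 0: linearly polarized
print(np.allclose(Ex, Ey))                # all points lie on Ey = Ex
```

For intermediate 0 < δ < π/2 the same sampling traces the ellipse of (7.90).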
References
1. Arfken GB, Weber HJ, Harris FE (2013) Mathematical methods for physicists, 7th edn. Academic Press, Waltham
2. Pain HJ (2005) The physics of vibrations and waves, 6th edn. Wiley, Chichester
3. Jackson JD (1999) Classical electrodynamics, 3rd edn. Wiley, New York
Chapter 8
Reflection and Transmission
of Electromagnetic Waves in Dielectric
Media
\int_S \mathrm{rot}\, E \cdot n\, dS + \int_S \frac{∂B}{∂t} \cdot n\, dS = 0,  (8.1)
With the line integral of the first term, C is a closed loop surrounding the rectangle
S and dl = t dl, where t is a unit vector directed toward the tangential direction of
C (see t_1 and t_2 in Fig. 8.1). The line integration is performed such that C is followed
counterclockwise in the direction of t.
Figure 8.2 gives an intuitive diagram that explains the Stokes’ theorem. The
diagram shows an overview of a surface S encircled by a closed curve C. Suppose
that we have a spiral vector field E represented by arrowed circles as shown. In that
case, rot E is directed toward the upper side of the plane of paper in the individual
fragments. A summation of rot E ndS forms a surface integral covering S. Mean-
while, the arrows of adjacent fragments cancel out each other and only the compo-
nents on the periphery (i.e., the curve C) are nonvanishing (see Fig. 8.2). Thus, the
surface integral of rot E is equivalent to the line integral of E. Accordingly, we get
Stokes’ theorem described by [1]
Fig. 8.1 A small rectangle S that strides an interface formed by two semi-infinite dielectric media
of D1 and D2. Let a curve C be a closed loop surrounding the rectangle S. A unit vector n is directed
to a normal of S. Unit vectors t1 and t2 are directed to a tangential line of the interface plane
Fig. 8.2 Diagram that intuitively explains the Stokes’ theorem. In the diagram a surface S is
encircled by a closed curve C. An infinitesimal portion of C is denoted by dl. The surface S is
pertinent to the surface integration. Spiral vector field E is present on and near S
\int_S \mathrm{rot}\, E \cdot n\, dS = \oint_C E \cdot dl.  (8.3)
Δl\,(E_1 \cdot t_1 + E_2 \cdot t_2) = 0,
where E_1 and E_2 represent the electric field in the dielectrics D1 and D2 close to the
interface, respectively. Considering t_2 = -t_1 and putting t_1 = t, we get
(E_1 - E_2) \cdot t = 0,  (8.4)
where t represents a unit vector in the direction of a tangential line of the interface
plane. Equation (8.4) means that the tangential components of the electric field are
continuous on both sides of the interface. We obtain a similar result with the
magnetic field. This can be shown by taking a surface integral of both sides of
(7.29) as well. As a result, we get
(H_1 - H_2) \cdot t = 0,  (8.5)
where H1 and H2 represent the magnetic field in D1 and D2 close to the interface,
respectively. Hence, from (8.5) the tangential components of the magnetic field are
continuous on both sides of the interface as well.
where Fi, Fr, and Ft denote an amplitude of the field; εi, εr, and εt represent a unit
vector of polarization direction, i.e., the direction along which the field oscillates; ki,
kr, and kt are wavenumber vectors such that ki ⊥ εi, kr ⊥ εr, and kt ⊥ εt. These
wavenumber vectors represent the propagation directions of individual waves. In
(8.6) to (8.8), indices of i, r, and t stand for incidence, reflection, and transmission,
respectively.
Let xs be an arbitrary position vector at the interface between the dielectrics. Also,
let t be a unit vector paralleling the interface. Thus, tangential components of the
field are described as
Note that F_i^t and F_r^t represent the field in D1 just close to the interface and that F_t^t
denotes the field in D2 just close to the interface. Thus, in light of (8.4) and (8.5), we
have
F_i^t + F_r^t = F_t^t.  (8.12)
Notice that (8.12) holds with any position xs and any time t.
Let us think of elementary calculation of exponential functions or exponential
polynomials and the relationship between individual coefficients and exponents.
With respect to the two functions e^{ikx} and e^{ik′x}, we have two alternatives according to the
value the Wronskian takes. Here, the Wronskian W is expressed as

W = \begin{vmatrix} e^{ikx} & e^{ik′x} \\ ike^{ikx} & ik′e^{ik′x} \end{vmatrix} = i(k′ - k)e^{i(k+k′)x}.  (8.13)
(i) W ≠ 0 if and only if k ≠ k′. In this case, e^{ikx} and e^{ik′x} are said to be linearly
independent. That is, on condition of k ≠ k′, for any x we have

ae^{ikx} + be^{ik′x} = 0 \iff a = b = 0.  (8.14)
where W ≠ 0 if and only if k_1 ≠ k_2, k_2 ≠ k_3, and k_3 ≠ k_1. That is, on this condition, for
any x we have
If the three exponential functions are linearly dependent, at least two of k_1, k_2, and k_3
are equal to each other, and vice versa. On this condition, again consider the following
equation of an exponential polynomial:
a þ b ¼ 0 and c ¼ 0: ð8:18Þ
a þ b þ c ¼ 0: ð8:19Þ
F_i (t \cdot ε_i) e^{i(k_i \cdot x_s - ωt)} + F_r (t \cdot ε_r) e^{i(k_r \cdot x_s - ωt)} - F_t (t \cdot ε_t) e^{i(k_t \cdot x_s - ωt)} = 0.  (8.20)
Again, (8.20) must hold with any position xs and any time t. Meanwhile, for (8.20) to
have a physical meaning, we should have
F_i (t \cdot ε_i) ≠ 0, \quad F_r (t \cdot ε_r) ≠ 0, \quad \text{and} \quad F_t (t \cdot ε_t) ≠ 0.  (8.21)
On the basis of the above consideration, we must have following two relations:
k_i \cdot x_s - ωt = k_r \cdot x_s - ωt = k_t \cdot x_s - ωt
or
k_i \cdot x_s = k_r \cdot x_s = k_t \cdot x_s,  (8.22)
and
F_i (t \cdot ε_i) + F_r (t \cdot ε_r) - F_t (t \cdot ε_t) = 0
or
F_i (t \cdot ε_i) + F_r (t \cdot ε_r) = F_t (t \cdot ε_t).  (8.23)
In this way, we are able to obtain a relation among amplitudes of the fields of
incidence, reflection, and transmission. Notice that we get both the relations between
exponents and coefficients at once.
First, let us consider (8.22). Suppose that the incident light (ki) is propagated in a
dielectric medium D1 in parallel to the zx-plane and that the interface is the xy-plane
(see Fig. 8.3). Also suppose that at the interface the light is reflected partly back to
D1 and transmitted (or refracted) partly into another dielectric medium D2. In
Fig. 8.3, k_i, k_r, and k_t represent the incident, reflected, and transmitted lights that
make angles θ, θ′, and ϕ with the z-axis, respectively. Then we have
Fig. 8.3 Geometry of the incident, reflected, and transmitted lights. We assume that the light is
incident from a dielectric medium D1 toward another medium D2. The wavenumber vectors k_i, k_r,
and k_t represent the incident, reflected, and transmitted (or refracted) lights with angles θ, θ′, and
ϕ, respectively. Note here that we did not assume the equality of θ and θ′ (see text)
k_i = ( e_1 \;\; e_2 \;\; e_3 ) \begin{pmatrix} k_i \sin θ \\ 0 \\ k_i \cos θ \end{pmatrix},  (8.24)

x_s = ( e_1 \;\; e_2 \;\; e_3 ) \begin{pmatrix} x \\ y \\ 0 \end{pmatrix},  (8.25)

k_i \cdot x_s = k_i x \sin θ,  (8.26)

k_r \cdot x_s = k_{rx} x + k_{ry} y,  (8.27)

k_t \cdot x_s = k_{tx} x + k_{ty} y,  (8.28)
where k_i = |k_i|; k_{rx} and k_{ry} are the x and y components of k_r; similarly, k_{tx} and k_{ty} are the
x and y components of k_t.
Since (8.22) holds with any x and y, we have
From (8.30) neither kr nor kt has a y component. This means that ki, kr, and kt are
coplanar. That is, the incident, reflected, and transmitted waves are all parallel to the
zx-plane. Notice that at the beginning we did not assume the coplanarity of those
waves. We did not assume the equality of θ and θ′ either (vide infra). From (8.29)
and Fig. 8.3, however, we have
where k_r = |k_r| and k_t = |k_t|; θ′ and ϕ are said to be a reflection angle and a
refraction angle, respectively. Thus, the end points of k_i, k_r, and k_t are connected on a
straight line that parallels the z-axis. Figure 8.3 clearly shows it.
Now, we suppose that a wavelength of the electromagnetic wave in D1 is λ1 and
that in D2 is λ2. Since the incident light and reflected light are propagated in D1, we
have
k_i = k_r = 2π/λ_1.  (8.32)
θ = θ′.  (8.34)
This implies that the components tangential to the interface of ki, kr, and kt are
the same.
Meanwhile, we have
k_t = 2π/λ_2.  (8.36)
Also we have
c = λ_0 ν, \quad v_1 = λ_1 ν, \quad v_2 = λ_2 ν,  (8.37)
where v1 and v2 are phase velocities of light in D1 and D2, respectively. Since ν is
common to D1 and D2, we have
or
\frac{\sin θ}{\sin ϕ} = \frac{k_t}{k_i} = \frac{λ_1}{λ_2} = \frac{n_2}{n_1} \; (\equiv n),  (8.39)
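A minimal numerical sketch of Snell's law (8.39); the indices and the incidence angle below are illustrative values, not taken from the book.

```python
import numpy as np

# Snell's law (8.39): sin(theta)/sin(phi) = n2/n1 (= n).
def refraction_angle(theta, n1, n2):
    """Refraction angle phi for incidence angle theta (radians)."""
    return np.arcsin(n1 * np.sin(theta) / n2)

n1, n2 = 1.0, 1.5                     # illustrative: e.g., vacuum -> glass
theta = np.deg2rad(30.0)
phi = refraction_angle(theta, n1, n2)
print(np.isclose(np.sin(theta) / np.sin(phi), n2 / n1))  # True
```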
On the basis of the above argument, we are now in the position to determine the
relations among amplitudes of the electromagnetic fields of waves of incidence,
reflection, and transmission. Notice that since we are dealing with non-absorbing
media, the relevant amplitudes are real (i.e., positive or negative). In other words,
when the phase is retained upon reflection, we have a positive amplitude due to
e^{i0} = 1. When the phase is reversed upon reflection, on the other hand, we will be
treating a negative amplitude due to e^{iπ} = -1. Nevertheless, when we consider the
total reflection, we deal with a complex amplitude (vide infra).
We start with the discussion of the vertical incidence of an electromagnetic wave
before the general oblique incidence. In Fig. 8.4a, we depict electric fields E and
magnetic fields H obtained at a certain moment near the interface. We index, e.g., Ei
for the incident field. There we define unit polarization vectors of the electric field ε_i,
ε_r, and ε_t as identical to e_1 (a unit vector in the direction of the x-axis). In (8.6), we
also define F_i (both electric and magnetic fields) as positive.
Fig. 8.4 Geometry of the electromagnetic fields near the interface between dielectric media D1 and
D2 in the case of vertical incidence. (a) All Ei, Er, and Et are directed in the same direction e1 (i.e., a
unit vector in the positive direction of the x-axis). (b) Although Ei and Et are directed in the same
direction, Er is reversed. In this case, we define Er as negative
We have two cases about a geometry of the fields (see Fig. 8.4). The first case is
that all Ei, Er, and Et are directed in the same direction (i.e., the positive direction of
the x-axis); see Fig. 8.4a. Another case is that although Ei and Et are directed in the
same direction, Er is reversed (Fig. 8.4b). In this case, we define Er as negative.
Notice that Ei and Et are always directed in the same direction and that Er is directed
either in the same direction or in the opposite direction according to the nature of the
dielectrics. The situation will be discussed soon.
Meanwhile, unit polarization vectors of the magnetic fields are determined by
(7.67) for the incident, reflected, and transmitted waves. In Fig. 8.4, the magnetic
fields are polarized along the y-axis (i.e., perpendicular to the plane of paper). The
magnetic fields Hi and Ht are always directed to the same direction as in the case of
the electric fields. On the other hand, if the phase of Er is conserved, the direction of
Hr is reversed and vice versa. This converse relationship with respect to the electric
and magnetic fields results solely from the requirement that E, H, and the propaga-
tion unit vector n of light must constitute a right-handed system in this order. Notice
that n is reversed upon reflection.
Next, let us consider an oblique incidence. With the oblique incidence, electro-
magnetic waves are classified into two special categories, i.e., transverse electric
(TE) waves (or modes) or transverse magnetic (TM) waves (or modes). The TE wave
is characterized by the electric field that is perpendicular to the incidence plane,
whereas the TM wave is characterized by the magnetic field that is perpendicular to
the incidence plane. Here the incidence plane is a plane that is formed by the
propagation direction of the incident light and the normal to the interface of the
two dielectrics. Since E, H, and n form a right-handed system, in the TE wave H lies
on the incidence plane. For the same reason, in the TM wave E lies on the incidence
plane.
In a general case where a field is polarized in an arbitrary direction, that field can
be formed by superimposing two fields corresponding to the TE and TM waves. In
other words, if we take an arbitrary field E, it can be decomposed into a component
having a unit polarization vector directed perpendicular to the incidence plane and
another component having the polarization vector that lies on the incidence plane.
These two components are orthogonal to each other.
Example 8.1: TE Wave In Fig. 8.5 we depict the geometry of oblique incidence of
a TE wave. The xy-plane defines the interface of the two dielectrics and t of (8.9) lies
on that plane. The zx-plane defines the incidence plane. In this case, E is polarized
along the y-axis with H polarized in the zx-plane. That is, regarding E, we choose
polarization direction εi, εr, and εt of the electric field as e2 (a unit vector toward the
positive direction of the y-axis that is perpendicular to the plane of paper). In Fig. 8.5,
the polarization direction of the electric field is denoted by a symbol ⨂. Therefore,
we have
e_2 \cdot ε_i = e_2 \cdot ε_r = e_2 \cdot ε_t = 1.  (8.40)
Fig. 8.5 Geometry of the electromagnetic fields near the interface between dielectric media D1 and
D2 in the case of oblique incidence of a TE wave. The electric field E is polarized along the y-axis
(i.e., perpendicular to the plane of paper) with H polarized in the zx-plane. Polarization directions εi,
εr, and εt are given for H. To avoid complication, neither Hi nor Ht is shown
For H we define the direction of unit polarization vectors εi, εr, and εt so that their
direction cosine relative to the x-axis can be positive; see Fig. 8.5. Choosing e1
(a unit vector in the direction of the x-axis) for t of (8.9) with regard to H, we have
E_i + E_r = E_t,  (8.42)

H_i \cos θ + H_r \cos θ = H_t \cos ϕ.  (8.43)
To derive the above equations, we choose t = e2 with E and t = e1 with H for (8.23).
Because of the abovementioned converse relationship with E and H, we have
E r H r < 0: ð8:44Þ
Z_1 = \sqrt{μ_1/ε_1} = E_i/H_i = -E_r/H_r,  (8.45)

Z_2 = \sqrt{μ_2/ε_2} = E_t/H_t,  (8.46)
where ε1 and μ1 are permittivity and permeability of D1, respectively; ε2 and μ2 are
permittivity and permeability of D2, respectively. As an example, we have
where we distinguish polarization vectors of electric and magnetic fields. Note in the
above discussion, however, we did not distinguish these vectors to avoid complica-
tion. Comparing coefficients of the last relation of (8.47), we get
E_i/Z_1 = H_i.  (8.48)
On the basis of (8.42) to (8.46), we are able to decide Er, Et, Hi, Hr, and Ht.
What we wish to determine, however, is a ratio among those quantities. To this
end, dividing (8.42) and (8.43) by E_i (>0), we define the following quantities:

R_E^⊥ \equiv E_r/E_i \quad \text{and} \quad T_E^⊥ \equiv E_t/E_i,  (8.49)

where R_E^⊥ and T_E^⊥ are said to be a reflection coefficient and transmission coefficient
with the electric field, respectively; the symbol ⊥ means a quantity of the TE wave
(i.e., the electric field oscillating vertically with respect to the incidence plane). Thus,
rewriting (8.42) and (8.43) and using R_E^⊥ and T_E^⊥, we have
R_E^⊥ - T_E^⊥ = -1,
R_E^⊥ \frac{\cos θ}{Z_1} + T_E^⊥ \frac{\cos ϕ}{Z_2} = \frac{\cos θ}{Z_1}.  (8.50)
T_E^⊥ = \frac{\dfrac{2\cos θ}{Z_1}}{\dfrac{\cos θ}{Z_1} + \dfrac{\cos ϕ}{Z_2}} = \frac{2Z_2 \cos θ}{Z_2 \cos θ + Z_1 \cos ϕ}.  (8.52)
Similarly, defining

R_H^⊥ \equiv H_r/H_i \quad \text{and} \quad T_H^⊥ \equiv H_t/H_i,  (8.53)

where R_H^⊥ and T_H^⊥ are said to be a reflection coefficient and transmission coefficient
with the magnetic field, respectively, we get
R_H^⊥ = \frac{Z_1 \cos ϕ - Z_2 \cos θ}{Z_2 \cos θ + Z_1 \cos ϕ},  (8.54)

T_H^⊥ = \frac{2Z_1 \cos θ}{Z_2 \cos θ + Z_1 \cos ϕ}.  (8.55)
In this case, rewrite (8.42) as a relation among Hi, Hr, and Ht using (8.45) and (8.46).
Derivation of (8.54) and (8.55) is left for readers. Notice also that
R_H^⊥ = -R_E^⊥.  (8.56)
R_E^∥ = \frac{Z_2 \cos ϕ - Z_1 \cos θ}{Z_1 \cos θ + Z_2 \cos ϕ},  (8.59)

T_E^∥ = \frac{2Z_2 \cos θ}{Z_1 \cos θ + Z_2 \cos ϕ}.  (8.60)
Also, we get
R_H^∥ = \frac{Z_1 \cos θ - Z_2 \cos ϕ}{Z_1 \cos θ + Z_2 \cos ϕ} = -R_E^∥,  (8.61)

T_H^∥ = \frac{2Z_1 \cos θ}{Z_1 \cos θ + Z_2 \cos ϕ}.  (8.62)
In Table 8.1, we list the important coefficients in relation to the reflection and
transmission of electromagnetic waves along with their relationship.
In Examples 8.1 and 8.2, we have examined how the reflection and transmission
coefficients vary as a function of characteristic impedance as well as incidence and
refraction angles. Meanwhile, in a nonmagnetic substance a refractive index n can be
approximated as
n ≈ \sqrt{ε_r},  (7.57)
Using this relation, we can readily rewrite the reflection and transmission coeffi-
cients as a function of refractive indices of the dielectrics. The derivation is left for
the readers.
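As a sketch of the rewriting suggested above (assumed, not the book's code): with Z = √(μ/ε) ∝ 1/n for nonmagnetic media, the TE coefficients can be evaluated from refractive indices alone; R_E^⊥ is taken here from (8.54) together with (8.56). The values of n1, n2, and θ are illustrative.

```python
import numpy as np

# TE-wave coefficients from refractive indices; Z_i is replaced by 1/n_i
# (nonmagnetic media), so Z2/Z1 = n1/n2.  Illustrative values only.
def te_coefficients(n1, n2, theta):
    phi = np.arcsin(n1 * np.sin(theta) / n2)          # Snell's law (8.39)
    Z1, Z2 = 1.0 / n1, 1.0 / n2
    D = Z2 * np.cos(theta) + Z1 * np.cos(phi)
    RE = (Z2 * np.cos(theta) - Z1 * np.cos(phi)) / D  # = -R_H, by (8.54), (8.56)
    TE = 2.0 * Z2 * np.cos(theta) / D                 # (8.52)
    return RE, TE

RE, TE = te_coefficients(1.0, 1.5, np.deg2rad(45.0))
print(np.isclose(TE - RE, 1.0))   # consistent with (8.50): R - T = -1
```

At vertical incidence (θ = 0) the same function reproduces R_E^⊥ = (1 − n)/(1 + n) of (8.87).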
where εe and εm are unit polarization vector; we assume that both E and H are
positive. Notice again that εe, εm, and n constitute a right-handed system in this
order.
The energy transport is characterized by a Poynting vector S that is described by
S = E \times H.  (8.65)
Since E and H have the dimensions [V/m] and [A/m], respectively, S has the dimension [W/m^2]. Hence,
S represents an energy flow per unit time and per unit area with respect to the propagation
direction. For simplicity, let us assume that the electromagnetic wave is propagating
toward the z-direction. Then we have
\cos^2 ωt = \frac{1}{2}(1 + \cos 2ωt),  (8.69)

\bar{S} = \frac{1}{2} EH e_3.  (8.70)
Equivalently, we have

\bar{S} = \frac{1}{2} E \times H.  (8.71)
W = \frac{1}{2}(E \cdot D + H \cdot B),  (8.72)

where the first and second terms are pertinent to the electric and magnetic fields,
respectively. Note in (8.72) that the dimension of E \cdot D is [V/m][C/m^2] = [J/m^3], and the
same is true of H \cdot B. Then we have

W = \frac{1}{2}\left(εE^2 + μH^2\right).  (8.73)
εE^2 = μH^2.  (8.75)
This implies that the energy density resulting from the electric field and that due to
the magnetic field have the same value. Thus, rewriting (8.74) we have
\bar{W} = \frac{1}{2} εE^2 = \frac{1}{2} μH^2.  (8.76)
\bar{S} = \frac{1}{2} vεE^2 e_3 = \frac{1}{2} vμH^2 e_3.  (8.78)
-R_E^⊥ R_H^⊥ + T_E^⊥ T_H^⊥ \frac{\cos ϕ}{\cos θ} = 1,  (8.79)

-R_E^∥ R_H^∥ + T_E^∥ T_H^∥ \frac{\cos ϕ}{\cos θ} = 1.  (8.80)
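The relation (8.79) can be checked numerically under the same impedance substitution Z ∝ 1/n; this is a sketch with illustrative values.

```python
import numpy as np

# Check of (8.79): -R_E*R_H + T_E*T_H*cos(phi)/cos(theta) = 1 (TE wave).
n1, n2 = 1.0, 1.5                     # illustrative refractive indices
theta = np.deg2rad(35.0)
phi = np.arcsin(n1 * np.sin(theta) / n2)
Z1, Z2 = 1.0 / n1, 1.0 / n2           # Z = sqrt(mu/eps) ∝ 1/n (nonmagnetic)
D = Z2 * np.cos(theta) + Z1 * np.cos(phi)
RE = (Z2 * np.cos(theta) - Z1 * np.cos(phi)) / D
RH = -RE                              # (8.56)
TE = 2.0 * Z2 * np.cos(theta) / D     # (8.52)
TH = 2.0 * Z1 * np.cos(theta) / D     # (8.55)
lhs = -RE * RH + TE * TH * np.cos(phi) / np.cos(theta)
print(np.isclose(lhs, 1.0))           # True: energy conservation
```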
In both the TE and TM cases, we define reflectance R and transmittance T such that
where Sr and Si are time-averaged Poynting vectors of the reflected wave and
incident waves, respectively. Also, we have
R þ T ¼ 1: ð8:83Þ
The relation (8.83) represents the energy conservation. The factor cos ϕ/cos θ can be
understood from Fig. 8.6, which depicts a luminous flux near the interface. Suppose
that we have an incident wave with an irradiance I [W/m^2] whose incidence plane is the
zx-plane. Notice that I has the same dimension as a Poynting vector.
Here let us think of the luminous flux that is getting through a unit area (i.e., a unit
length square) perpendicular to the propagation direction of the light. Then, this flux
illuminates an area on the interface of a unit length (in the y-direction) multiplied by
a length of cos ϕ/cos θ (in the x-direction). That is, the luminous flux has been widened
(or squeezed) by cos ϕ/cos θ times after getting through the interface. The irradiance has
been weakened (or strengthened) accordingly (see Fig. 8.6). Thus, to take a balance
of income and outgo with respect to the luminous flux before and after getting
through the interface, the transmission irradiance must be multiplied by a factor cos ϕ/cos θ.
R_E^⊥ = \frac{\cos θ - n \cos ϕ}{\cos θ + n \cos ϕ},  (8.84)

R_E^∥ = \frac{\cos ϕ - n \cos θ}{\cos ϕ + n \cos θ},  (8.85)
[\text{numerator of } (8.84)] = \cos θ - n \cos ϕ = \cos θ - \frac{\sin θ}{\sin ϕ} \cos ϕ
= \frac{\sin ϕ \cos θ - \sin θ \cos ϕ}{\sin ϕ} = \frac{\sin(ϕ - θ)}{\sin ϕ},  (8.86)
where with the second equality we used Snell’s law; the last equality is due to
a trigonometric formula. Since we assume 0 < θ < π/2 and 0 < ϕ < π/2, we have
-π/2 < ϕ - θ < π/2. Therefore, sin(ϕ - θ) = 0 if and only if ϕ - θ = 0. Namely,
only when ϕ = θ could R_E^⊥ vanish. For different dielectrics having different
refractive indices, only if ϕ = θ = 0 (i.e., a vertical incidence) do we have ϕ = θ.
But, in that case we have

\lim_{ϕ→0,\, θ→0} \frac{\sin(ϕ - θ)}{\sin ϕ} = \frac{0}{0}.
R_E^⊥ = \frac{1 - n}{1 + n},  (8.87)

and similarly

R_E^∥ = \frac{1 - n}{1 + n}.
[\text{numerator of } (8.85)] = \cos ϕ - n \cos θ = \cos ϕ - \frac{\sin θ}{\sin ϕ} \cos θ
= \frac{\sin ϕ \cos ϕ - \sin θ \cos θ}{\sin ϕ} = \frac{\sin(ϕ - θ)\cos(ϕ + θ)}{\sin ϕ}.  (8.88)
With the last equality of (8.88), we used a trigonometric formula. From (8.86) we
know that sin(ϕ - θ)/sin ϕ does not vanish. Therefore, for R_E^∥ to vanish, we need cos
(ϕ + θ) = 0. Since 0 < ϕ + θ < π, cos(ϕ + θ) = 0 if and only if

ϕ + θ = π/2.  (8.89)
ϕ_B + θ_B = π/2,  (8.90)

we have R_E^∥ = 0; i.e., we do not observe a reflected wave. The particular angle θ_B is
said to be the Brewster angle. For θ_B we have

\sin ϕ_B = \sin\left(\frac{π}{2} - θ_B\right) = \cos θ_B,

n = \sin θ_B / \sin ϕ_B = \sin θ_B / \cos θ_B = \tan θ_B \quad \text{or} \quad θ_B = \tan^{-1} n,

ϕ_B = \tan^{-1} n^{-1}.  (8.91)
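A short numerical check of (8.90) and (8.91) (the value of n is illustrative):

```python
import numpy as np

# Brewster angle check, (8.90)-(8.91); n = n2/n1 is an illustrative value.
n = 1.5
theta_B = np.arctan(n)                          # theta_B = arctan(n)
phi_B = np.arctan(1.0 / n)                      # phi_B = arctan(1/n)
print(np.isclose(theta_B + phi_B, np.pi / 2))   # (8.90): True

# R_E of (8.85) for the TM wave vanishes at the Brewster angle.
phi = np.arcsin(np.sin(theta_B) / n)            # Snell's law (8.39)
RE_par = (np.cos(phi) - n * np.cos(theta_B)) / (np.cos(phi) + n * np.cos(theta_B))
print(np.isclose(RE_par, 0.0))                  # True: no reflected wave
```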
Fig. 8.7 Diagram that explains the Brewster angle. Suppose that a parallel plate consisting of a
dielectric D2 of a refractive index n2 is sandwiched with another dielectric D1 of a refractive index
n1. The incidence angle θB represents the Brewster angle observed when the TM wave is incident
from D1 to D2. ϕB is another Brewster angle that is observed when the TM wave is getting back
from D2 to D1
\tilde{θ}_B = \tan^{-1} n^{-1}.  (8.92)

\tilde{θ}_B = ϕ_B.  (8.93)
Thus, regarding the TM wave that is propagating in D2 after getting through the
interface and is to get back to D1, \tilde{θ}_B = ϕ_B is again the Brewster angle. In this way,
the said TM wave is propagating from D1 to D2 and then getting back from D2 to D1
without being reflected by the two interfaces. This conspicuous feature is often
utilized for an optical device.
If an electromagnetic wave is incident from a dielectric of a higher refractive
index to that of a lower index, the total reflection takes place. This is equally the case
with both TE and TM waves. For the total reflection to take place, θ should be larger
than a critical angle θc that is defined by
θ_c = \sin^{-1} n.  (8.94)

\frac{\sin θ_c}{\sin(π/2)} = \sin θ_c = \frac{n_2}{n_1} \; (= n).  (8.95)

θ_c > θ_B.  (8.96)
The critical angle is always larger than the Brewster angle with TM waves.
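The inequality (8.96) follows because arcsin x > arctan x for 0 < x < 1; a one-line numerical confirmation (with an illustrative n < 1, i.e., incidence from the denser medium):

```python
import numpy as np

# Critical angle vs. Brewster angle, (8.94) and (8.91); n = n2/n1 < 1.
n = 1.0 / 1.5                  # illustrative: e.g., glass -> vacuum
theta_c = np.arcsin(n)         # critical angle (8.94)
theta_B = np.arctan(n)         # Brewster angle (8.91)
print(theta_c > theta_B)       # (8.96): True, since arcsin(n) > arctan(n)
```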
In Sect. 8.2 we saw that Snell's law results from the kinematical requirement. For
this reason, we may consider it a universal relation that can be extended to
complex refraction angles. In fact, for Snell's law to hold with θ > θ_c, we
must have
must have
π
ϕ¼ þ ia ða : real, a 6¼ 0Þ, ð8:98Þ
2
we have
1 iϕ 1
sin ϕ e eiϕ ¼ ðea þ ea Þ > 1, ð8:99Þ
2i 2
1 i
cos ϕ eiϕ þ eiϕ ¼ ðea ea Þ: ð8:100Þ
2 2
where k_{tx} and k_{tz} are the x and z components of k_t, respectively; k_t = |k_t|. Putting
\cos ϕ = ib (b: real), we have

E_t = E ε_t e^{i(x k_t \sin ϕ + i b z k_t - ωt)} = E ε_t e^{i(x k_t \sin ϕ - ωt)} e^{-b z k_t}.  (8.104)

b > 0.  (8.106)
Meanwhile, we have

\cos^2 ϕ = 1 - \sin^2 ϕ = 1 - \frac{\sin^2 θ}{n^2} = \frac{n^2 - \sin^2 θ}{n^2},  (8.107)

where notice that n < 1, because we are dealing with the incidence of light from a
medium with a higher refractive index into a medium with a lower index. When we consider
the total reflection, the numerator of (8.107) is negative, and so we have two choices
such that

\cos ϕ = \pm i \frac{\sqrt{\sin^2 θ - n^2}}{n}.  (8.108)
Then, we have

R^⊥ = -R_E^⊥ R_H^⊥ = R_E^⊥ \left(R_E^⊥\right)^*
= \frac{\cos θ - i\sqrt{\sin^2 θ - n^2}}{\cos θ + i\sqrt{\sin^2 θ - n^2}} \cdot \frac{\cos θ + i\sqrt{\sin^2 θ - n^2}}{\cos θ - i\sqrt{\sin^2 θ - n^2}} = 1.  (8.111)

R^∥ = \frac{n^2 \cos θ + i\sqrt{\sin^2 θ - n^2}}{n^2 \cos θ - i\sqrt{\sin^2 θ - n^2}} \cdot \frac{n^2 \cos θ - i\sqrt{\sin^2 θ - n^2}}{n^2 \cos θ + i\sqrt{\sin^2 θ - n^2}} = 1.  (8.113)
The relations (8.111) and (8.113) ensure that the energy flow gets back to a higher
refractive index medium.
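A sketch verifying the unit reflectance numerically with complex coefficients; n and θ > θ_c are illustrative, and the overall sign of the TM coefficient is an assumption consistent with (8.123):

```python
import numpy as np

# Above the critical angle cos(phi) is purely imaginary, (8.108), and the
# reflection coefficients become pure phase factors of unit modulus.
n = 1.0 / 1.5                              # illustrative, n < 1
theta = np.deg2rad(60.0)                   # theta > theta_c = arcsin(n)
root = np.sqrt(np.sin(theta)**2 - n**2)

RE_perp = (np.cos(theta) - 1j * root) / (np.cos(theta) + 1j * root)
RE_par = -(n**2 * np.cos(theta) - 1j * root) / (n**2 * np.cos(theta) + 1j * root)
print(np.isclose(abs(RE_perp), 1.0))       # (8.111): reflectance 1
print(np.isclose(abs(RE_par), 1.0))        # (8.113): reflectance 1
```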
Thus, the total reflection is characterized by the complex reflection coefficient
expressed as (8.110) and (8.112) as well as a reflectance of 1. From (8.110) and
(8.112) we can estimate a change in a phase of the electromagnetic wave that takes
place by virtue of the total reflection. For this purpose, we put
R_E^⊥ \equiv e^{-iα} \quad \text{and} \quad R_E^∥ \equiv e^{-iβ}.  (8.114)
\sin θ_c = n.  (8.116)

Therefore, we have

1 - n^2 = \cos^2 θ_c.  (8.117)
The argument α defines a phase shift upon the total reflection. Considering (8.115)
and (8.118), we have
Fig. 8.8 The phase shift α plotted as a function of the incidence angle θ (see text)
α|_{θ=θ_c} = 0.
Since 1 - n^2 > 0 (i.e., n < 1) and in the total reflection region \sin^2 θ - n^2 > 0, the
imaginary part of R_E^⊥ is negative for any θ (i.e., θ_c to π/2). On the other hand, the real
part of R_E^⊥ varies from 1 to -1, as is evidenced from (8.118) and (8.119). At θ_0 that
satisfies the following condition:

\sin θ_0 = \sqrt{\frac{1 + n^2}{2}},  (8.121)
the real part is zero. Thus, the phase shift α varies from 0 to π as indicated in
Fig. 8.8. Comparing (8.121) with (8.116) and taking into account n < 1, we have
Then, we have

R_E^∥ \big|_{θ=θ_c} = -1.  (8.123)

R_E^∥ \big|_{θ=π/2} = 1.  (8.124)
β|_{θ=θ_c} = π.  (8.126)

Therefore, the denominator of (8.122) is positive and, hence, the imaginary part of
R_E^∥ is positive as well for any θ (i.e., θ_c to π/2). From (8.123) and (8.124), on the other
hand, the real part of R_E^∥ in (8.122) varies from -1 to 1. At \tilde{θ}_0 that satisfies the
following condition:
\cos \tilde{θ}_0 = \sqrt{\frac{1 - n^2}{1 + n^4}},  (8.128)
the real part of R_E^∥ is zero. Once again, we have

θ_c < \tilde{θ}_0 < π/2.  (8.129)
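The special angles (8.121) and (8.128) can be checked by evaluating the real parts of the complex reflection coefficients; this is a sketch with an illustrative n and the sign conventions assumed above.

```python
import numpy as np

n = 1.0 / 1.5                                            # illustrative, n < 1

# At theta0 of (8.121) the real part of the TE coefficient vanishes (alpha = pi/2).
theta0 = np.arcsin(np.sqrt((1.0 + n**2) / 2.0))
root = np.sqrt(np.sin(theta0)**2 - n**2)
RE_perp = (np.cos(theta0) - 1j * root) / (np.cos(theta0) + 1j * root)
print(np.isclose(RE_perp.real, 0.0))                     # True

# At theta0~ of (8.128) the real part of the TM coefficient vanishes likewise.
theta0t = np.arccos(np.sqrt((1.0 - n**2) / (1.0 + n**4)))
root = np.sqrt(np.sin(theta0t)**2 - n**2)
RE_par = -(n**2 * np.cos(theta0t) - 1j * root) / (n**2 * np.cos(theta0t) + 1j * root)
print(np.isclose(RE_par.real, 0.0))                      # True
```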
There are many optical devices based upon light propagation. Among them, wave-
guide devices utilize the total reflection. We explain their operation principle.
Suppose that we have a thin plate (usually said to be a slab) comprising a
dielectric medium that infinitely spreads two-dimensionally and that the plate is
sandwiched with another dielectric (or maybe air or vacuum) or metal. In this
situation electromagnetic waves are confined within the slab. Moreover, only under
restricted conditions are those waves allowed to propagate in parallel to the slab
plane. Such electromagnetic waves are usually called propagation modes or simply
“modes.” An optical device thus designed is called a waveguide. These modes are
characterized by repeated total reflection during the propagation. Another mode is an
evanescent mode. Because of the total reflection, the energy transport is not allowed
to take place vertically to the interface of two dielectrics. For this reason, the
evanescent mode is thought to be propagated in parallel to the interface very close
to it.
Fig. 8.10 Cross-section of a slab waveguide comprising a dielectric medium. (a) A waveguide is
sandwiched between a couple of metal layers. (b) A waveguide is sandwiched between a couple of
layers consisting of another dielectric called clad layer. The sandwiched layer is called core layer
other dielectrics. We distinguish this case as the total internal reflection from the
above case (i). We further describe it in Sect. 8.7.2. For the total internal reflection to
take place, the refractive index of the waveguide must be higher than those of other
dielectrics (Fig. 8.10b). The dielectric of the waveguide is called core layer and the
other dielectric is called clad layer. In this case, electromagnetic waves are allowed
to propagate inside of the clad layer, even though the region is confined very close to
the interface between the clad and core layers. Such electromagnetic waves are said
to be an evanescent wave. According to these two cases (i) and (ii), we have different
conditions under which the allowed modes can exist.
Now, let us return to Maxwell’s equations. We have introduced the equations of
wave motion (7.35) and (7.36) from Maxwell’s eqs. (7.28) and (7.29) along with
(7.7) and (7.10). One of their simplest solutions is a plane wave described by (7.53).
The plane wave is characterized by the fact that it has the same phase on an infinitely
spreading plane perpendicular to the propagation direction (characterized by a
wavenumber vector k). In a waveguide, however, the electromagnetic field is
confined with respect to the direction parallel to the normal to the slab plane (i.e.,
the direction of the y-axis in Fig. 8.10). Consequently, the electromagnetic field can
no longer have the same phase in that direction. Yet, as solutions of equations of
wave motion, we can have a solution that has the same phase with the direction of the
x-axis. Bearing in mind such a situation, let us think of Maxwell’s equations in
relation to the equations of wave motion.
Ignoring components related to partial differentiation with respect to x (i.e., the
component related to ∂/∂x) and rewriting (7.28) and (7.29) for individual Cartesian
coordinates, we have [3]:
\frac{∂E_z}{∂y} - \frac{∂E_y}{∂z} + \frac{∂B_x}{∂t} = 0,  (8.130)

\frac{∂E_x}{∂z} + \frac{∂B_y}{∂t} = 0,  (8.131)

-\frac{∂E_x}{∂y} + \frac{∂B_z}{∂t} = 0,  (8.132)

\frac{∂H_z}{∂y} - \frac{∂H_y}{∂z} - \frac{∂D_x}{∂t} = 0,  (8.133)

\frac{∂H_x}{∂z} - \frac{∂D_y}{∂t} = 0,  (8.134)

\frac{∂H_x}{∂y} + \frac{∂D_z}{∂t} = 0.  (8.135)
Differentiating (8.131) and (8.132) with respect to z and y, respectively, and (8.133)
with respect to t, we have

\frac{∂^2 E_x}{∂z^2} + \frac{∂^2 B_y}{∂z ∂t} = 0,  (8.136)

\frac{∂^2 E_x}{∂y^2} - \frac{∂^2 B_z}{∂y ∂t} = 0,  (8.137)

\frac{∂^2 H_z}{∂t ∂y} - \frac{∂^2 H_y}{∂t ∂z} - \frac{∂^2 D_x}{∂t^2} = 0.  (8.138)

Multiplying (8.138) by μ and further adding (8.136) and (8.137) to it and using (7.7),
we get

\frac{∂^2 E_x}{∂y^2} + \frac{∂^2 E_x}{∂z^2} = με \frac{∂^2 E_x}{∂t^2}.  (8.139)

Similarly, we have

\frac{∂^2 H_x}{∂y^2} + \frac{∂^2 H_x}{∂z^2} = με \frac{∂^2 H_x}{∂t^2}.  (8.140)
Equations (8.139) and (8.140) are two-dimensional wave equations with respect
to the y- and z-coordinates. With the direction of the x-axis, a propagating wave has
the same phase. Suppose that we have plane wave solutions for them as in the case of
(7.58) and (7.59). Then, we have
E = E_0 e^{i(k \cdot x - ωt)} = E_0 e^{i(k n \cdot x - ωt)},
H = H_0 e^{i(k \cdot x - ωt)} = H_0 e^{i(k n \cdot x - ωt)}.  (8.141)
Note that a plane wave expressed by (7.58) and (7.59) is propagated uniformly in a
dielectric medium. In a waveguide, on the other hand, the electromagnetic waves
undergo repeated (total) reflections from the two boundaries positioned either side of
the waveguide, while being propagated.
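As a symbolic sketch (not from the book), one can check that the plane-wave form of (8.141), restricted to the y- and z-coordinates, satisfies the two-dimensional wave equation (8.139) once ω obeys the dispersion relation ω^2 = (k_y^2 + k_z^2)/(με):

```python
import sympy as sp

# Verify that Ex = exp(i(ky*y + kz*z - w*t)) solves (8.139) when
# w = sqrt(ky**2 + kz**2)/sqrt(mu*eps) (the dispersion relation).
y, z, t = sp.symbols('y z t', real=True)
ky, kz, mu, eps = sp.symbols('k_y k_z mu epsilon', positive=True)
w = sp.sqrt(ky**2 + kz**2) / sp.sqrt(mu * eps)
Ex = sp.exp(sp.I * (ky * y + kz * z - w * t))

lhs = sp.diff(Ex, y, 2) + sp.diff(Ex, z, 2)     # left side of (8.139)
rhs = mu * eps * sp.diff(Ex, t, 2)              # right side of (8.139)
print(sp.simplify(lhs - rhs) == 0)              # True
```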
In a three-dimensional version, the wavenumber vector has three components kx,
ky, and kz as expressed in (7.48). In (8.141), in turn, k has y- and z-components such
that
k^2 = |k|^2 = k_y^2 + k_z^2.  (8.142)
\frac{∂^2 E_x}{∂(-y)^2} + \frac{∂^2 E_x}{∂(-z)^2} = με \frac{∂^2 E_x}{∂(-t)^2}, \quad \frac{∂^2 H_x}{∂(-y)^2} + \frac{∂^2 H_x}{∂(-z)^2} = με \frac{∂^2 H_x}{∂(-t)^2}.
Figure 8.11 indicates this situation where an electromagnetic wave can be propa-
gated within a slab waveguide in either one direction out of four choices of k. In this
section, we assume that the electromagnetic wave is propagated toward the positive
direction of the z-axis, and so we define kz as positive. On the other hand, ky can be
either positive or negative. Thus, we get
Figure 8.12 shows the geometries of the electromagnetic waves within the slab
waveguide. The slab plane is parallel to the zx-plane. Let the positions of the two
interfaces of the slab waveguide be
y = 0 and y = d. (8.144)
The electric field inside the waveguide is then described as

E(z, y) = ε_e E₊ e^{i(kz sin θ + ky cos θ − ωt)} + ε_e′ E₋ e^{i(kz sin θ − ky cos θ − ωt)}, (8.145)

where E₊ (E₋) and ε_e (ε_e′) represent the amplitude and unit polarization vector of the incident (reflected) wave, respectively. The vector ε_e (or ε_e′) is defined in (7.67).
Equation (8.145) is common to both the cases of TE and TM waves. From now,
we consider the TE mode case. Suppose that the slab waveguide is sandwiched between a pair of metal sheets of high conductance. Since the electric field must be absent inside the metal, the electric field at the interface must be zero owing to the continuity condition of the tangential component of the electric field. Thus, we require that the following condition be met by (8.145):
ᵗε_e E₊ + ᵗε_e′ E₋ = 0, (8.147)

and hence

E₊ + E₋ = 0.

This means that the reflection coefficient of the electric field is −1. Denoting E₊ = −E₋ ≡ E₀ (>0), we have
E = e₁E₀ [e^{i(kz sin θ + ky cos θ − ωt)} − e^{i(kz sin θ − ky cos θ − ωt)}]
  = e₁E₀ [e^{iky cos θ} − e^{−iky cos θ}] e^{i(kz sin θ − ωt)}
  = 2i e₁E₀ sin(ky cos θ) e^{i(kz sin θ − ωt)}. (8.148)
Note that in terms of the boundary conditions we are thinking of Dirichlet conditions (see Sects. 1.3 and 10.3). In this case, we have nodes for the electric field at the interface between the metal and the dielectric. For this condition to be satisfied at y = d as well, we must have

kd cos θ = mπ (m = 1, 2, ⋯). (8.149)

Since cos θ ≤ 1, m can take only values such that

m ≤ kd/π. (8.150)
Meanwhile, we have
k = nk₀, (8.151)
where n is a refractive index of a dielectric that shapes the slab waveguide; the
quantity k0 is a wavenumber of the electromagnetic wave in vacuum. The index n is
given by
n = c/v, (8.152)
where c and v are light velocity in vacuum and the dielectric media, respectively.
Here v is meant as a velocity in an infinitely spreading dielectric. Thus, θ is allowed to take only several (or more) discrete values depending upon k, d, and m.
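As a numerical illustration of (8.149)–(8.152), the sketch below lists the finitely many allowed angles θ_m for a metal-clad slab; the wavelength, refractive index, and thickness used are arbitrary example values, not taken from the text.

```python
import math

def allowed_te_angles(wavelength_vac, n, d):
    """Angles theta (degrees) satisfying k*d*cos(theta) = m*pi, (8.149),
    for a metal-clad slab; k = n*k0 is the wavenumber in the dielectric, (8.151)."""
    k = n * 2 * math.pi / wavelength_vac          # (8.151): k = n k0
    m_max = math.floor(k * d / math.pi)           # (8.150): m <= k d / pi
    return {m: math.degrees(math.acos(m * math.pi / (k * d)))
            for m in range(1, m_max + 1)}

# Illustrative numbers: 1.55 um light, n = 1.5, d = 2 um
angles = allowed_te_angles(1.55e-6, 1.5, 2.0e-6)
print(angles)   # m -> theta_m in degrees
```

The finite dictionary makes the discreteness of θ explicit: only m = 1, …, ⌊kd/π⌋ survive.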
Since in the z-direction no specific boundary conditions are imposed, we have
propagating modes in that direction characterized by a propagation constant (vide
infra). Looking at (8.148), we notice that k sin θ plays the role of a wavenumber in a free space. For this reason, a quantity β defined as

β ≡ k sin θ (8.153)

is called the propagation constant. With (8.149), β is rewritten as

β = (k² − m²π²/d²)^{1/2}. (8.154)
Thus, the allowed TE waves indexed by m are called TE modes and represented as
TEm. The phase velocity vp is given by
v_p = ω/β. (8.155)
The group velocity v_g is given by

v_g = dω/dβ = (dβ/dω)^{−1}. (8.156)

Noting that β² = ω²/v² − m²π²/d² and differentiating it with respect to ω, we get

v_g = v²β/ω. (8.157)
Thus, we have
v_p v_g = v². (8.158)
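The reciprocal relation (8.158) can be checked from (8.154)–(8.156) alone by differentiating β(ω) numerically; the values of v, d, m, and ω below are illustrative assumptions, not from the text.

```python
import math

v = 2.0e8          # phase velocity in the bulk dielectric [m/s] (illustrative)
d = 2.0e-6         # slab thickness [m]
m = 1              # mode index

def beta(omega):
    """Propagation constant (8.154), with k = omega / v."""
    return math.sqrt((omega / v) ** 2 - (m * math.pi / d) ** 2)

omega = 2.0e15                                        # above the mode cutoff
vp = omega / beta(omega)                              # (8.155)
h = omega * 1e-7
vg = (2 * h) / (beta(omega + h) - beta(omega - h))    # v_g = domega/dbeta, (8.156)
print(vp * vg, v ** 2)                                # nearly equal: (8.158)
```

Note that v_p > v while v_g < v, yet their product stays pinned at v².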
Next, suppose that the slab waveguide consists of two dielectrics D1 and D2 having refractive indices n₁ and n₂, respectively, such that

n₁ > n₂ (8.159)
so that the total internal reflection can take place at the interface of D1 and D2. In this
case, the dielectrics D1 and D2 act as a core layer and a clad layer, respectively.
The biggest difference between the present waveguide and the previous one is
that unlike the previous case, the total internal reflection occurs in the present case.
Concomitantly, an evanescent wave is present in the clad layer very close to the
interface.
First, let us estimate the conditions that must be satisfied so that an electromagnetic wave can be propagated within a waveguide. Figure 8.13 depicts a cross-section of
the waveguide where the light is propagated in the direction of k. In Fig. 8.13,
suppose that we have a normal N to the plane of the paper at P. Then, N and a straight line XY form a plane NXY. Also suppose that a dielectric fills a semi-infinite space situated below NXY. Further suppose that there is another virtual plane N′X′Y′ that is parallel with NXY as shown; here N′ is parallel to N. The separation of the two parallel planes is d. We need the virtual plane N′X′Y′ just to estimate an optical path difference (or phase difference, more specifically) between two waves, i.e., a propagating wave and a reflected wave.
Let n be a unit vector in the direction of k; i.e.,
n = k/|k| = k/k. (8.160)
Fig. 8.13 A dielectric fills a semi-infinite space situated below NXY. We suppose another virtual plane N′X′Y′ that is parallel with NXY; the separation of the two planes is d. We need the plane N′X′Y′ to estimate an optical path difference (or phase difference).
Let us express the position vector x as

x = rn + su + tv, (8.162)

where u and v represent unit vectors perpendicular to n. Since n·u = n·v = 0, we have k·x = kn·x = kr, so that (8.161) can be expressed by

E = E₀ e^{i(kr − ωt)}. (8.163)
Suppose that the wave is propagated starting from a point A to P and reflected at
P. Then, the wave is further propagated to B and reflected again to reach Q. The
wave front is originally at AB and finally at PQ. Thus, the Z-shaped optical path length APBQ is equal to the separation between A′B′ and PQ. Notice that the separation between A′B′ and P′Q′ is taken so that it is equal to that between AB and PQ. The geometry of Fig. 8.13 implies that two waves starting simultaneously from AB and A′B′ reach PQ simultaneously as well.
We find that the separation between AB and A′B′ is

2d cos θ.

Let us tentatively call these waves Wave-AB and Wave-A′B′ and describe their electric fields as E_AB and E_A′B′, respectively. Then, we denote

E_AB = E₀ e^{i(kr − ωt)} and E_A′B′ = E₀ e^{i[k(r + 2d cos θ) − ωt]},

where k is a wavenumber in the dielectric. Note that since E_A′B′ lags behind E_AB, a plus sign appears in the first term of the exponent. Therefore, the phase difference between the two waves is

2kd cos θ. (8.166)
Now, let us come back to the actual geometry of the waveguide. That is, the core
layer of thickness d is sandwiched by a couple of clad layers (Fig. 8.10b). In this
situation, the wave EAB experiences the total internal reflection two times, which we
ignored in the above discussion of the metal waveguide. Since the total internal
reflection causes a complex phase shift, we have to take account of this effect. The
phase shift was defined as α of (8.114) for a TE mode and β for a TM mode. Notice
that in Fig. 8.13 the electric field oscillates perpendicularly to the plane of paper with
the TE mode, whereas it oscillates in parallel with the plane of paper with the TM
mode. For both the cases the electric field oscillates perpendicularly to n. Conse-
quently, the phase shift due to these reflections has to be added to (8.166). Thus, for
the phase commensuration to be obtained, the following condition must be satisfied:

2kd cos θ + 2δ = 2lπ (l = 0, 1, 2, ⋯), (8.167)

where δ is either δ_TE or δ_TM defined below according to the case of the TE wave and TM wave, respectively. For a practical purpose, (8.167) is dealt with by a numerical calculation, e.g., to design an optical waveguide.

Unlike (8.149), the most important feature of (8.167) is that the condition l = 0 is permitted because δ < 0 (see just below).
For convenience and according to the custom, we adopt a phase shift notation
other than that defined in (8.114). With the TE mode, the phase is retained upon
reflection at the critical angle, and so we identify α with an additional component
δTE. In the TM case, on the other hand, the phase is reversed upon reflection at the
critical angle (i.e., a π shift occurs). Since this π shift has been incorporated into β, it
suffices to consider only an additional component δTM. That is, we have
We rewrite (8.110) as

R_⊥^E = e^{iα} = e^{iδ_TE} = (a e^{−iσ})/(a e^{iσ}) = e^{−2iσ}, (8.169)

and hence δ_TE = −2σ, where we have

a e^{iσ} = cos θ + i√(sin²θ − n²). (8.170)

Therefore,

tan σ = tan(−δ_TE/2) = √(sin²θ − n²)/cos θ. (8.171)
Similarly, we have

R_∥^E = e^{iβ} = e^{i(δ_TM + π)} = −(b e^{−iτ})/(b e^{iτ}) = −e^{−2iτ} = e^{i(−2τ + π)}, (8.172)

where

b e^{iτ} = n² cos θ + i√(sin²θ − n²). (8.173)

Note that the minus sign with the second last equality in (8.172) is due to the phase reversal upon reflection. From (8.172), we may put

β = −2τ + π.

Consequently, we get

tan τ = tan(−δ_TM/2) = √(sin²θ − n²)/(n² cos θ). (8.175)
Finally, the additional phase changes δ_TE and δ_TM upon the total reflection are given by [4]:

δ_TE = −2 tan⁻¹[√(sin²θ − n²)/cos θ] and δ_TM = −2 tan⁻¹[√(sin²θ − n²)/(n² cos θ)]. (8.176)
We emphasize that in (8.176) both δTE and δTM are negative quantities. This phase
shift has to be included in (8.167) as a negative quantity δ. At first glance, (8.176) seems to differ greatly from (8.120) and (8.125). Nevertheless, noting the trigonometric formula
tan 2x = 2 tan x/(1 − tan²x)
and remembering that δTM in (8.168) includes π arising from the phase reversal, we
find that both the relations are virtually identical.
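As the text notes, (8.167) is handled numerically in waveguide design. The sketch below solves the TE-mode condition by bisection, using δ_TE from (8.176) with n understood as the relative index n₂/n₁; the indices, wavelength, and thickness are illustrative assumptions, not values from the text.

```python
import math

def delta_te(theta, n):
    """Phase shift (8.176) on total internal reflection; n = n2/n1 < 1."""
    return -2 * math.atan(math.sqrt(math.sin(theta) ** 2 - n ** 2) / math.cos(theta))

def te_mode_angle(l, k, d, n, tol=1e-12):
    """Solve k*d*cos(theta) + delta_TE(theta) = l*pi, i.e. (8.167) halved,
    by bisection on (arcsin(n), pi/2)."""
    lo, hi = math.asin(n) + 1e-9, math.pi / 2 - 1e-9
    f = lambda th: k * d * math.cos(th) + delta_te(th, n) - l * math.pi
    if f(lo) < 0:
        return None          # mode l is not guided in this slab
    while hi - lo > tol:     # f decreases monotonically on this interval
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)

# Illustrative numbers: n1 = 1.50 core, n2 = 1.45 clad, 1.55 um light, d = 5 um
k = 1.50 * 2 * math.pi / 1.55e-6
theta0 = te_mode_angle(0, k, 5.0e-6, 1.45 / 1.50)
print(math.degrees(theta0))   # angle of the fundamental (l = 0) TE mode
```

Because δ_TE < 0, the l = 0 root exists, in accord with the remark above about (8.167).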
Evanescent waves are attracting considerable attention in the fields of basic physics and applied device physics. If the total internal reflection is absent, ϕ is real. But, under
8.8 Stationary Waves 331
the total internal reflection, ϕ is pure imaginary. The electric field of the evanescent
wave is described as
E_t = Eε_t e^{i(k_t z sin ϕ + k_t y cos ϕ − ωt)} = Eε_t e^{i(k_t z sin ϕ + ik_t yb − ωt)}
  = Eε_t e^{i(k_t z sin ϕ − ωt)} e^{−k_t yb}, (8.177)

where we put cos ϕ = ib (b > 0). Then, for the phase velocities we have

v₁ < v_p^{(s)} = v₁/sin θ = ω/(k sin θ) = ω/(k_t sin ϕ) = v_p^{(e)} = v₂/sin ϕ < v₂, (8.178)
where v1 and v2 are light velocity in a free space filled by the dielectric D1 and D2,
respectively. For this, we used a relation described as
ω ¼ v1 k ¼ v2 k t : ð8:179Þ
We also used Snell’s law with the third equality. Notice that sin ϕ > 1 in the
evanescent region and that kt sin ϕ is a propagation constant in the clad layer. Also
note that v_p^{(s)} is equal to v_p^{(e)} and that these phase velocities lie between the two velocities of the free space. Thus, the evanescent waves must be present, accompanying propagating waves that undergo the total internal reflections in a slab waveguide.
As remarked in (6.105), the electric field of evanescent waves decays exponentially with increasing distance from the interface. This implies that the evanescent waves exist only in the part of the clad layer very close to the interface between the core and clad layers.
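The 1/e decay length of the evanescent field follows directly from (8.177): the field falls off as e^{−k_t b y} with b determined by Snell's law. The sketch below assumes illustrative core/clad indices and wavelength, not values from the text.

```python
import math

def penetration_depth(wavelength_vac, n1, n2, theta):
    """1/e decay length of the evanescent field in the clad, from (8.177).
    theta must exceed the critical angle arcsin(n2/n1)."""
    kt = n2 * 2 * math.pi / wavelength_vac        # wavenumber in the clad
    sin_phi = (n1 / n2) * math.sin(theta)         # Snell's law; sin(phi) > 1
    b = math.sqrt(sin_phi ** 2 - 1)               # cos(phi) = i*b
    return 1 / (kt * b)

theta_c = math.asin(1.45 / 1.50)                  # critical angle
depth = penetration_depth(1.55e-6, 1.50, 1.45, theta_c + 0.05)
print(depth)   # decay length in meters, of the order of the wavelength
```

Just above the critical angle the depth is comparable to the wavelength and shrinks rapidly as θ increases, which is why the evanescent wave hugs the interface.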
So far, we have been dealing with propagating waves either in a free space or in a
waveguide. If the dielectric shaping the waveguide is confined in another direction,
the propagating waves show specific properties. Examples include optical fibers.
In this section we consider a situation where the electromagnetic wave is prop-
agating in a dielectric medium and reflected by a “wall” formed by metal or another
dielectric. In such a situation, the original wave (i.e., a forward wave) causes
interference with the backward wave and a stationary wave is formed as a conse-
quence of the interference.
To approach this issue, we deal with superposition of two waves that have
different phases and different amplitudes. To generalize the problem, let ψ 1 and
ψ₂ be two cosine functions described as

ψ₁ = a₁ cos b₁ and ψ₂ = a₂ cos b₂. (8.181)

Here we wish to unify (8.181) into a single cosine (or sine) function. To this end, we modify the description of ψ₂ such that

ψ₂ = a₂ cos[b₁ − (b₁ − b₂)] = a₂[cos b₁ cos(b₁ − b₂) + sin b₁ sin(b₁ − b₂)].

Then, we have

ψ = ψ₁ + ψ₂ = [a₁ + a₂ cos(b₁ − b₂)] cos b₁ + a₂ sin(b₁ − b₂) sin b₁. (8.183)
Putting

R ≡ √[a₁² + a₂² + 2a₁a₂ cos(b₁ − b₂)],

we get

ψ = R cos(b₁ − θ),

where θ is expressed by

tan θ = a₂ sin(b₁ − b₂)/[a₁ + a₂ cos(b₁ − b₂)]. (8.186)
[Fig. 8.14: graphical determination of R and θ; the two vectors of lengths a₁ and a₂ subtend an angle b₁ − b₂, and θ is the angle between their resultant and a₁.]
Now, let

ψ₁ = a₁ cos(kx − ωt) and ψ₂ = a₂ cos(kx + ωt),

where the former equation represents a forward wave, whereas the latter represents a backward wave. Then, putting b₁ = kx − ωt and b₂ = −(kx + ωt), we have

b₁ − b₂ = 2kx. (8.188)

Thus, ψ is described as

ψ = R cos(kx − ωt − θ) (8.189)

with
R = √(a₁² + a₂² + 2a₁a₂ cos 2kx) (8.190)
and
tan θ = a₂ sin 2kx/(a₁ + a₂ cos 2kx). (8.191)
Equation (8.189) looks simple, but since both R and θ vary as a function of x, the
situation is somewhat complicated unlike a simple sinusoidal wave. Nonetheless,
when x takes a special value, (8.189) is expressed by a simple function form. For
example, at t = 0, we have

ψ = (a₁ + a₂) cos kx.

This corresponds to (i) of Fig. 8.15. If t = T/2 (where T is a period, i.e., T = 2π/ω), we have

ψ = −(a₁ + a₂) cos kx.

This corresponds to (iii) of Fig. 8.15. But, the waves described by (ii) or (iv) do not have a simple functional form.
We characterize Fig. 8.15 below. The curves (i) and (iii) correspond to the instants t = 0 and t = T/2, respectively. Notice that in Fig. 8.15 we took a₁, a₂ > 0. At another instant t = T/4, we have similarly

ψ = (a₁ − a₂) sin kx.

Thus, the waves that vary with time are characterized by two drum-shaped envelopes that have extremals |a₁ + a₂| and |a₁ − a₂| or those −|a₁ + a₂| and −|a₁ − a₂|. An important implication of Fig. 8.15 is that no node is present in the superposed wave.
In other words, there is no instant t₀ at which ψ(x, t₀) = 0 for any x. From the aspect of energy transport of electromagnetic waves, if |a₁| > |a₂| (where a₁ and a₂ represent the amplitudes of the forward and backward waves, respectively), the net energy flow takes place in the travelling direction of the forward wave. If, on the other hand, |a₁| < |a₂|, the net energy flow takes place in the travelling direction of the backward wave. In this respect, think of Poynting vectors.
In case |a1| ¼ ja2j, the situation is particularly simple. No net energy flow takes
place in this case. Correspondingly, we observe nodes. Such waves are called
stationary waves. Let us consider this simple situation for an electromagnetic wave
that is incident perpendicularly to the interface between two dielectrics (one of them
may be a metal).
Returning back to (7.58) and (7.66), we describe two electromagnetic waves that are propagating in the positive and negative directions of the z-axis such that

E₁ = E₁ε_e e^{i(kz − ωt)} and E₂ = E₂ε_e e^{i(−kz − ωt)}, (8.195)

where ε_e is a unit polarization vector arbitrarily fixed so that it can be parallel to the
interface, i.e., wall (i.e., perpendicular to the z-axis). The situation is depicted in
Fig. 8.16. Notice that in Fig. 8.16 E1 represents the forward wave (i.e., incident
wave) and E2 the backward wave (i.e., wave reflected at the interface). Thus, a
superposed wave is described as
E = E₁ + E₂. (8.196)
(i) Syn-phase: The phase of the electric field is retained upon reflection at the interface. We assume that E₁ = E₂. Then, we have
E = E₁ε_e [e^{i(kz − ωt)} + e^{i(−kz − ωt)}] = E₁ε_e e^{−iωt}(e^{ikz} + e^{−ikz}) = 2E₁ε_e e^{−iωt} cos kz. (8.197)
Taking the real part of (8.197), we have

E = 2E₁ε_e cos kz cos ωt. (8.198)

Note that in (8.198) the variables z and t have been separated. This implies that we have a stationary wave. For this case to be realized, the characteristic impedance of the dielectric on the incident wave side should be sufficiently smaller than that of the other side; see (8.51) and (8.59). In other words, the dielectric constant of the incident side should be large enough. We have nodes at positions that satisfy
kz = −(π/2 + mπ) (m = 0, 1, 2, ⋯) or z = −(λ/4 + mλ/2). (8.199)
Note that we are thinking of the stationary wave in the region of z < 0.
Equation (8.199) indicates that nodes are formed at a quarter wavelength from
the interface and every half wavelength from it. The node means the position
where no electric field is present.
Meanwhile, antinodes are observed at positions

kz = −mπ (m = 0, 1, 2, ⋯) or z = −(m/2)λ.
Thus, the nodes and antinodes alternate with every quarter wavelength.
(ii) Anti-phase:
The phase of the electric field is reversed. We assume that E1 ¼ E2 (>0).
Then, we have
E = E₁ε_e [e^{i(kz − ωt)} − e^{i(−kz − ωt)}] = E₁ε_e e^{−iωt}(e^{ikz} − e^{−ikz}) = 2iE₁ε_e e^{−iωt} sin kz. (8.201)
In (8.201) the variables z and t have been separated as well. For this case to be realized, the characteristic impedance of the dielectric on the incident wave side should be sufficiently larger than that of the other side. In other words, the dielectric constant of the incident side should be small enough. Practically, this situation
can easily be attained choosing a metal of high reflectance for the wall material.
We have nodes at positions that satisfy
kz = −mπ (m = 0, 1, 2, ⋯) or z = −(m/2)λ. (8.202)
The nodes are formed at the interface and every half wavelength from it. As
in the case of the syn-phase, the antinodes take place with the positions shifted
by a quarter wavelength relative to the nodes.
If there is another interface at, say, z = −L (L > 0), the wave goes back and forth many times. If the absolute value of the reflection coefficient at the interface is high enough (i.e., close to unity), attenuation of the wave is negligible. For both the syn-phase and anti-phase cases, we must have

kL = mπ (m = 1, 2, ⋯) or L = (m/2)λ (8.203)
so that stationary waves can stand stable. For a practical purpose, an optical
device having such an interface is said to be a resonator. Various geometries
and constitutions of the resonator are proposed in combination with various
dielectrics including semiconductors. Related discussion can be seen in Chap. 9.
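The resonator condition (8.203) selects a comb of standing-wave frequencies. The sketch below lists the vacuum wavelengths resonant in a simple cavity; the index, length, and spectral window are illustrative assumptions, not from the text.

```python
import math

def resonance_wavelengths(n, L, lam_min, lam_max):
    """Vacuum wavelengths satisfying (8.203), L = (m/2)*lambda, where the
    in-medium wavelength is lambda = lam_vac / n, i.e. lam_vac = 2 n L / m."""
    out = {}
    m = 1
    while True:
        lam = 2 * n * L / m
        if lam < lam_min:
            break
        if lam <= lam_max:
            out[m] = lam
        m += 1
    return out

# Illustrative: semiconductor-like cavity, n = 3.5, L = 1 um, near-infrared window
modes = resonance_wavelengths(3.5, 1.0e-6, 0.8e-6, 1.2e-6)
print(modes)   # m -> resonance wavelength in vacuum
```

Only a handful of integers m land inside the window, which is why a short cavity supports well-separated resonances.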
References
1. Arfken GB, Weber HJ, Harris FE (2013) Mathematical methods for physicists, 7th edn. Aca-
demic Press, Waltham
2. Rudin W (1987) Real and complex analysis, 3rd edn. McGraw-Hill, New York
3. Smith FG, King TA, Wilkins D (2007) Optics and photonics, 2nd edn. Wiley, Chichester
4. Born M, Wolf E (2005) Principles of optics, 7th edn. Cambridge University Press, Cambridge
5. Pain HJ (2005) The physics of vibrations and waves, 6th edn. Wiley, Chichester
Chapter 9
Light Quanta: Radiation and Absorption
Historically, the relevant theory was first propounded by Max Planck and then
Albert Einstein as briefly discussed in Chap. 1. The theory was developed on the
basis of the experiments called cavity radiation or blackbody radiation. Here,
however, we wish to derive Planck’s law of radiation on the assumption of the
existence of quantum harmonic oscillators.
As discussed in Chap. 2, the ground state of a quantum harmonic oscillator has an
energy (1/2)ħω. Therefore, we measure energies of the oscillator in reference to that
state. Let N0 be the number of oscillators (i.e., light quanta) present in the ground
state. Then, according to Boltzmann distribution law the number of oscillators of the
first excited state N1 is
N₁ = N₀ e^{−ħω/k_BT}. (9.1)

More generally, the number of oscillators in the j-th excited state N_j is

N_j = N₀ e^{−jħω/k_BT}. (9.2)

Thus, the total number of oscillators N is

N = N₀ + N₀e^{−ħω/k_BT} + ⋯ + N₀e^{−jħω/k_BT} + ⋯ = N₀ Σ_{j=0}^∞ e^{−jħω/k_BT}. (9.3)
Let E be a total energy of the oscillator system in reference to the ground state. That
is, we put a ground-state energy at zero. Then we have
E = 0·N₀ + N₀ħωe^{−ħω/k_BT} + ⋯ + N₀ jħωe^{−jħω/k_BT} + ⋯ = N₀ Σ_{j=0}^∞ jħωe^{−jħω/k_BT}. (9.4)

Putting x ≡ e^{−ħω/k_BT}, the average energy of an oscillator Ē is given by

Ē = E/N = ħω Σ_{j=0}^∞ jx^j / Σ_{j=0}^∞ x^j.

Here we have

Σ_{j=0}^∞ jx^j = Σ_{j=0}^∞ jx^{j−1}·x = x (d/dx) Σ_{j=0}^∞ x^j = x (d/dx)[1/(1 − x)] = x/(1 − x)². (9.7)

In the above, we used the geometric series

Σ_{j=0}^∞ x^j = 1/(1 − x). (9.8)
Therefore, we get
9.2 Planck’s Law of Radiation and Mode Density of Electromagnetic Waves 341
Ē ≡ E/N = ħωx/(1 − x) = ħωe^{−ħω/k_BT}/(1 − e^{−ħω/k_BT}) = ħω/(e^{ħω/k_BT} − 1), (9.9)

where x = e^{−ħω/k_BT}.
The function

1/(e^{ħω/k_BT} − 1) (9.10)

is the Bose–Einstein distribution function for light quanta. When k_BT ≫ ħω, expanding the exponential in (9.9) we have

Ē ≈ k_BT. (9.11)
Thus, the relation (9.9) asymptotically agrees with a classical theory. In other words,
according to the classical theory related to law of equipartition of energy, energy of
kBT/2 is distributed to each of two degrees of freedom of motion, i.e., a kinetic
energy and a potential energy of a harmonic oscillator.
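The closed form (9.9) and the classical limit (9.11) can both be verified directly against the Boltzmann-weighted sum of (9.3) and (9.4); the frequency and temperature below are arbitrary example values.

```python
import math

HBAR = 1.054571817e-34   # J s
KB = 1.380649e-23        # J / K

def mean_energy_planck(omega, T):
    """Closed form (9.9)."""
    return HBAR * omega / (math.exp(HBAR * omega / (KB * T)) - 1)

def mean_energy_sum(omega, T, jmax=2000):
    """Direct Boltzmann-weighted sum over levels j*h_bar*omega, as in (9.3)-(9.4)."""
    x = math.exp(-HBAR * omega / (KB * T))
    num = sum(j * HBAR * omega * x ** j for j in range(jmax))
    den = sum(x ** j for j in range(jmax))
    return num / den

omega, T = 1.0e14, 300.0
print(mean_energy_planck(omega, T), mean_energy_sum(omega, T))  # agree

# classical limit (9.11): for h_bar*omega << kB*T the energy approaches kB*T
print(mean_energy_planck(1.0e10, 300.0) / (KB * 300.0))         # close to 1
```

The truncated sum converges quickly because x < 1, so a few thousand terms suffice.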
Researchers at the time tried to seek the relationship between the energy density inside the cavity and the (angular) frequency of radiation. To reach the relationship, let
us introduce a concept of mode density of electromagnetic waves related to the
blackbody radiation. We define the mode density D(ω) as the number of modes of
electromagnetic waves per unit volume per unit angular frequency. We refer to the
electromagnetic waves having allowed specific angular frequencies and polarization
as modes. These modes must be described as linearly independent functions.
Determination of the mode density is related to boundary conditions (BCs)
imposed on a physical system. We already dealt with this problem in Chaps. 2, 3,
and 8. These BCs often appear when we find solutions of differential equations. Let us consider the following wave equation:
∂²ψ/∂x² = (1/v²) ∂²ψ/∂t². (9.12)
Putting ψ(x, t) = X(x)T(t) (9.13), substituting it into (9.12), and dividing both sides by X(x)T(t), we have
(1/X) d²X/dx² = (1/v²)(1/T) d²T/dt² = −k², (9.14)

d²X/dx² + k²X = 0. (9.15)
Remember that k is supposed to be a complex number for the moment (see Example 1.1).
Modifying Example 1.1 a little bit such that (9.15) is posed in a domain [0, L] and imposing the Dirichlet conditions such that

X(0) = X(L) = 0, (9.16)

we find a solution of

X(x) = a sin kx, (9.17)

where a is a constant. The constant k is determined so as to satisfy the BCs; i.e.,

k = mπ/L (m = 1, 2, 3, ⋯). (9.18)
This solution has already appeared in Chap. 8 as a stationary solution. The readers
are encouraged to derive these results.
In a three-dimensional case, we have a wave equation
∂²ψ/∂x² + ∂²ψ/∂y² + ∂²ψ/∂z² = (1/v²) ∂²ψ/∂t². (9.21)
Putting ψ(x, y, z, t) = X(x)Y(y)Z(z)T(t), similarly we get
(1/X) d²X/dx² + (1/Y) d²Y/dy² + (1/Z) d²Z/dz² = (1/v²)(1/T) d²T/dt² = −k². (9.23)

Putting

(1/X) d²X/dx² = −k_x², (1/Y) d²Y/dy² = −k_y², (1/Z) d²Z/dz² = −k_z², (9.24)

we have

k_x² + k_y² + k_z² = k². (9.25)
Then, we get a stationary wave solution as in the one-dimensional case such that

X(x)Y(y)Z(z) ∝ sin k_x x sin k_y y sin k_z z with k_x = m_xπ/L, k_y = m_yπ/L, k_z = m_zπ/L,

where m_x, m_y, and m_z are positive integers. Returning to the main issue, let us deal with the mode density. Think of a cube of each side of L that is placed as shown in Fig. 9.1. Calculating k, we have
k = √(k_x² + k_y² + k_z²) = (π/L)√(m_x² + m_y² + m_z²) = ω/c, (9.28)

where we assumed that the inside of the cavity is vacuum and, hence, the propagation velocity of light is c. Rewriting (9.28), we have

√(m_x² + m_y² + m_z²) = Lω/πc. (9.29)
The numbers m_x, m_y, and m_z represent allowable modes in the cavity; the set (m_x, m_y, m_z) specifies individual modes. Note that m_x, m_y, and m_z are all positive integers. If, for instance, −m_x were allowed, this would produce sin(−k_xx) = −sin k_xx; but this function is linearly dependent on sin k_xx. Then, a mode indexed by −m_x should not be regarded as an independent mode. Given ω, a set (m_x, m_y, m_z) that satisfies (9.29) corresponds to each mode. Therefore, the number of modes N that satisfy

√(m_x² + m_y² + m_z²) ≤ Lω/πc (9.30)
is estimated as the number of lattice points contained in one-eighth of a sphere of radius Lω/πc. Taking account of the two independent polarizations of the electromagnetic wave, for a large enough radius we have

N = 2 × (1/8) × (4π/3)(Lω/πc)³ = L³ω³/(3π²c³). (9.31)

Differentiating (9.31) with respect to ω and dividing by the volume L³, we get

D(ω)dω = (1/L³)(dN/dω)dω = [ω²/(π²c³)]dω, (9.32)
where D(ω)dω represents the number of modes per unit volume whose angular frequencies range between ω and ω + dω.
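The counting argument behind (9.30)–(9.32) can be checked by brute force: enumerate the positive-integer triples inside radius R = Lω/πc and compare with the spherical-octant volume estimate; R below is an arbitrary test value.

```python
import math

def count_modes(R):
    """Count positive-integer triples (mx, my, mz) with mx^2+my^2+mz^2 <= R^2,
    times 2 for the two independent polarizations of each mode."""
    n = 0
    for mx in range(1, int(R) + 1):
        for my in range(1, int(math.sqrt(R * R - mx * mx)) + 1):
            n += int(math.sqrt(max(R * R - mx * mx - my * my, 0.0)))
    return 2 * n

R = 100.0                      # R = L*omega/(pi*c)
exact = count_modes(R)
approx = math.pi * R ** 3 / 3  # 2 * (1/8) * (4*pi/3) * R^3
print(exact, approx)           # close for large R
```

The relative discrepancy is a surface term of order 1/R, so the volume estimate becomes exact in the large-cavity limit assumed in the text.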
Now, we introduce a function ρ(ω) as an energy density per unit angular
frequency. Then, combining D(ω) with (9.9), we get
ρ(ω) = D(ω) · ħω/(e^{ħω/k_BT} − 1) = [ħω³/(π²c³)] · 1/(e^{ħω/k_BT} − 1). (9.33)
The relation (9.33) is called Planck's law of radiation. Notice that ρ(ω) has a dimension [J m⁻³ s].
To solve (9.15) under the Dirichlet conditions (9.16) is pertinent to analyzing an
electric field within a cavity surrounded by a metal husk, because the electric field
must be absent at an interface between the cavity and metal. The problem, however,
can equivalently be solved using the magnetic field. This is because at the interface
the reflection coefficient of electric field and magnetic field have a reversed sign (see
Chap. 8). Thus, given an equation for the magnetic field, we may use the Neumann
condition. This condition requires differential coefficients to vanish at the boundary
(i.e., the interface between the cavity and metal). In a similar manner to the above, we get

X(x) = a cos kx.

By imposing the BCs, again we have (9.18), which leads to the same result as the above.
We may also impose the periodic BCs. This type of equation has already been treated in Chap. 3. In that case we have a solution of

X(x) = a e^{ikx} with k = 2πm/L (m = 0, ±1, ±2, ⋯). (9.35)

Notice that e^{ikx} and e^{−ikx} are linearly independent and, hence, a minus sign for m is permitted. Correspondingly, we have
N_L = 2 × (4π/3)(Lω/2πc)³ = L³ω³/(3π²c³). (9.36)
In other words, here we have to consider the whole volume of a sphere with half the radius of the previous case. Thus, we reach the same conclusion as before.
If the average energy of an oscillator were described by (9.11), we would obtain the following description of ρ(ω):

ρ(ω) = D(ω)k_BT = [ω²/(π²c³)]k_BT. (9.37)
This relation is well known as the Rayleigh–Jeans law, but (9.37) disagreed with experimental results in that according to the Rayleigh–Jeans law, ρ(ω) diverges toward infinity as ω goes to infinity. The discrepancy between the theory and the experimental results was referred to as the "ultraviolet catastrophe." Planck's law of radiation
described by (9.33), on the other hand, reproduces the experimental results well.
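The agreement at low frequency and the divergence at high frequency can be made quantitative by comparing (9.33) and (9.37) directly; the temperature and frequency grid are arbitrary example values.

```python
import math

HBAR = 1.054571817e-34   # J s
KB = 1.380649e-23        # J / K
C = 2.99792458e8         # m / s

def rho_planck(omega, T):
    """Planck's law of radiation, (9.33)."""
    return (HBAR * omega ** 3 / (math.pi ** 2 * C ** 3)) / \
           (math.exp(HBAR * omega / (KB * T)) - 1)

def rho_rj(omega, T):
    """Rayleigh-Jeans form, (9.37)."""
    return omega ** 2 * KB * T / (math.pi ** 2 * C ** 3)

T = 300.0
for omega in (1e12, 1e13, 1e14, 1e15):
    print(omega, rho_rj(omega, T) / rho_planck(omega, T))
# ratio ~1 at low frequency, then grows without bound: the ultraviolet catastrophe
```

Analytically the ratio is (e^x − 1)/x with x = ħω/k_BT, which tends to 1 for x → 0 and diverges exponentially for large x.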
[Fig. 9.2: the three optical processes between two levels of energies E₁ and E₂ (E₂ − E₁ = ħω₂₁): stimulated absorption, stimulated emission, and spontaneous emission.]

The Einstein model postulates three kinds of optical processes. (i) By absorbing a light quantum, an electron in the ground state is excited up to a higher level (i.e., the stimulated absorption). (ii) The higher-level
electron may spontaneously lose its energy and return back to the ground state (the
spontaneous emission). (iii) The higher-level electron may also lose its energy and
return back to the ground state. Unlike (ii), however, the excited electron has to be
stimulated by being irradiated by light quanta having an energy corresponding to the
energy difference between the ground state and excited state (the stimulated emis-
sion). Figure 9.2 schematically depicts the optical processes of the Einstein model.
Under those postulates, Einstein dealt with the problem probabilistically. Sup-
pose that the ground state and excited state have energies E1 and E2. Einstein
assumed that light quanta having an energy equaling E₂ − E₁ take part in all the
above three transition processes. He also propounded the idea that the light quanta
have an energy that is proportional to its (angular) frequency. That is, he thought that
the following relation should hold:
ħω₂₁ = E₂ − E₁, (9.38)
where ω21 is an angular frequency of light that takes part in the optical transitions.
For the time being, let us follow Einstein’s postulates.
(i) Stimulated absorption: This process is simply said to be an "absorption." Let W_a [s⁻¹] be the transition probability that the electron absorbs a light quantum to be excited to the excited state. W_a is described as

W_a = N₁B₂₁ρ(ω₂₁), (9.39)

where N₁ is the number of atoms occupying the ground state; B₂₁ is a proportional constant; ρ(ω₂₁) is due to (9.33). Note that in (9.39) we used ω₂₁ instead
of ω in (9.33). The coefficient B21 is called Einstein B coefficient; more
specifically one of Einstein B coefficients. Namely, B21 is pertinent to the
transition from the ground state to excited state.
9.3 Two-Level Atoms 347
(ii) Emission processes: The processes include both the spontaneous and stimulated
emissions. Let W_e [s⁻¹] be the transition probability that the electron emits a light quantum and returns back to the ground state. W_e is described as

W_e = N₂B₁₂ρ(ω₂₁) + N₂A₁₂, (9.40)

where N₂ is the number of atoms occupying the excited state; B₁₂ and A₁₂ are
proportional constants. The coefficient A12 is called Einstein A coefficient
relevant to the spontaneous emission. The coefficient B12 is associated with
the stimulated emission and also called Einstein B coefficient together with B21.
Here, B12 is pertinent to the transition from the excited state to ground state.
Now, we have

B₁₂ = B₂₁. (9.41)
The reasoning for this is as follows: The coefficients B12 and B21 are proportional to
the matrix elements pertinent to the optical transition. Let T be an operator associated
with the transition. Then, a matrix element is described using an inner product
notation of Chap. 1 by
B₂₁ = ⟨ψ₂|T|ψ₁⟩, (9.42)
where ψ 1 and ψ 2 are initial and final states of the system in relation to the optical
transition. As a good approximation, we use er for T (dipole approximation), where
e is an elementary charge and r is a position operator (see Chap. 1). If (9.42)
represents the absorption process (i.e., the transition from the ground state to excited
state), the corresponding emission process should be described as a reversed process
by
B₁₂ = ⟨ψ₁|T|ψ₂⟩. (9.43)

Meanwhile, we have

B₂₁ = ⟨ψ₁|T†|ψ₂⟩*. (9.44)

As discussed in Chap. 1, an Hermitian operator H satisfies

H† = H. (1.119)

Since the transition operator T is Hermitian as well, we have

T† = T. (9.45)

Thus, we get B₂₁ = ⟨ψ₁|T|ψ₂⟩* = B₁₂*. But, as in the cases of Sects. 4.2 and 4.3, ψ₁ and ψ₂ can be represented as real functions. Then, we have

B₁₂ = B₂₁. (9.46)
That is, we assume that the matrix B is real symmetric. In the case of two-level
atoms, as a matrix form we get
B = (0 B₁₂; B₁₂ 0). (9.47)
In the thermal equilibrium, the emission and absorption processes are balanced, so that

W_e = W_a. (9.48)

That is,

N₂B₂₁ρ(ω₂₁) + N₂A₁₂ = N₁B₂₁ρ(ω₂₁), (9.49)

where we used (9.41) for the LHS. Assuming the Boltzmann distribution law, we get
N₂/N₁ = exp[−(E₂ − E₁)/k_BT]. (9.50)

With (9.38), this reads

N₂/N₁ = exp(−ħω₂₁/k_BT). (9.51)

Combining (9.49) and (9.51), we have

exp(−ħω₂₁/k_BT) = B₂₁ρ(ω₂₁)/[B₂₁ρ(ω₂₁) + A₁₂]. (9.52)

Solving (9.52) for ρ(ω₂₁), we get

ρ(ω₂₁) = (A₁₂/B₂₁) · 1/[exp(ħω₂₁/k_BT) − 1]. (9.53)
Assuming that
A₁₂/B₂₁ = ħω₂₁³/(π²c³), (9.54)
we have
ρ(ω₂₁) = [ħω₂₁³/(π²c³)] · 1/[exp(ħω₂₁/k_BT) − 1]. (9.55)
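That the detailed-balance argument of (9.48)–(9.54) reproduces Planck's law exactly can be checked numerically; the frequency, temperature, and the unit choice B₂₁ = 1 below are arbitrary, since only the ratio A₁₂/B₂₁ matters.

```python
import math

HBAR = 1.054571817e-34   # J s
KB = 1.380649e-23        # J / K
C = 2.99792458e8         # m / s

def rho_from_balance(omega, T, B21=1.0):
    """Solve the balance N2*(B21*rho + A12) = N1*B21*rho with
    N2/N1 = exp(-h_bar*omega/(kB*T)) and A12/B21 = h_bar*omega^3/(pi^2 c^3)."""
    A12 = B21 * HBAR * omega ** 3 / (math.pi ** 2 * C ** 3)   # (9.54)
    boltz = math.exp(-HBAR * omega / (KB * T))                # (9.51)
    # (9.52): boltz = B21*rho / (B21*rho + A12); solve for rho
    return A12 * boltz / (B21 * (1 - boltz))

def rho_planck(omega, T):
    """Planck's law (9.33)/(9.55)."""
    return (HBAR * omega ** 3 / (math.pi ** 2 * C ** 3)) / \
           (math.exp(HBAR * omega / (KB * T)) - 1)

omega, T = 3.0e15, 500.0
print(rho_from_balance(omega, T), rho_planck(omega, T))   # identical
```

Algebraically, e^{−x}/(1 − e^{−x}) = 1/(e^x − 1), so the two routes coincide for every ω and T.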
In (9.54) we only know the ratio of A12 to B21. To have a good knowledge of these
Einstein coefficients, we briefly examine a mechanism of the dipole radiation. The
electromagnetic radiation results from an accelerated motion of a dipole.
A dipole moment p(t) is defined as a function of time t by

p(t) = ∫ x′ρ(x′, t)dx′, (9.56)

where ρ(x′, t) is the charge density at the position x′. For a system of point charges, this reduces to

p(t) = Σᵢ qᵢxᵢ, (9.57)

where qᵢ is a charge of each point charge i and xᵢ is a position vector of the point
charge i. From (9.56) and (9.57), we find that p(t) depends on how we set up the
coordinate system. However, if a total charge of the system is zero, p(t) does not
depend on the coordinate system. Let p(t) and p0(t) be a dipole moment viewed from
the frame O and O0, respectively (see Fig. 9.3). Then we have
Fig. 9.3 Dipole moment viewed from the frame O or O′
Fig. 9.4 Electromagnetic radiation from an accelerated motion of a dipole. (a) A dipole placed at
the origin of the coordinate system is executing harmonic oscillation along the z-direction around an
equilibrium position. (b) Electromagnetic radiation from a dipole in a wave zone. εe and εm are unit
polarization vectors of the electric field and magnetic field, respectively. εe, εm, and n form a right-
handed system
p′(t) = Σᵢ qᵢx′ᵢ = Σᵢ qᵢ(x₀ + xᵢ) = (Σᵢ qᵢ)x₀ + Σᵢ qᵢxᵢ = p(t). (9.58)
Notice that with the third equality the first term vanishes because the total charge is zero. The system comprising two point charges that have opposite charges (±q) is particularly simple but very important. In that case we have

p(t) = qx₊ + (−q)x₋ = q(x₊ − x₋) ≡ qx̃. (9.59)

Here we assume that q > 0 according to the custom and, hence, x̃ is a vector directing from the minus point charge to the plus charge.
Figure 9.4 displays geometry of an oscillating dipole and electromagnetic radia-
tion from it. Figure 9.4a depicts the dipole. It is placed at the origin of the coordinate
system and assumed to be of an atomic or molecular scale in extension; we regard a
center of the dipole as the origin. Figure 9.4b represents a large-scale geometry of the
9.4 Dipole Radiation 351
dipole and the surrounding space. For the electromagnetic radiation, an accelerated motion of the dipole is of primary importance. The electromagnetic fields produced by p̈(t) vary as the inverse of r, where r is a macroscopic distance between the dipole and the observation point. Namely, r is much larger than the dipole size.
There are other electromagnetic fields that result from the dipole moment, namely the fields resulting from p(t) and ṗ(t). Strictly speaking, we have to include those quantities that are responsible for the electromagnetic fields associated with the dipole radiation. Nevertheless, the fields produced by p(t) and ṗ(t) vary as the inverse cube and inverse square of r, respectively. Therefore, the surface integral of the square of the fields associated with p(t) and ṗ(t) asymptotically reaches zero for large enough r with respect to a sphere enveloping the dipole. Regarding p̈(t), on the other hand, the surface integral of the square of the fields remains finite even for large enough r. For this reason, we refer to the spatial region where the fields produced by p̈(t) do not vanish as a wave zone.
Suppose that a dipole placed at the origin of the coordinate system is executing
harmonic oscillation along the z-direction around an equilibrium position (see
Fig. 9.4). Motion of the two charges having plus and minus signs is described by

z₊ = (z₀ + a e^{−iωt})e₃ and z₋ = (−z₀ − a e^{−iωt})e₃,

where z₊ and z₋ are position vectors of the plus charge and minus charge, respectively; z₀ and −z₀ are equilibrium positions of each charge; a is an amplitude of the harmonic oscillation; ω is an angular frequency of the oscillation. Then, accelerations of the charges are given by

z̈₊ = −aω²e^{−iωt}e₃ and z̈₋ = aω²e^{−iωt}e₃.

Meanwhile, we have

p(t) = qz₊ − qz₋ = q(z₊ − z₋).

Therefore,

p̈(t) = q(z̈₊ − z̈₋) = −2qaω²e^{−iωt}e₃. (9.65)
The quantity p̈(t), i.e., the second derivative of p(t) with respect to time, produces the electric field described by [2]
E= 2€
p=4πε0 c2 r ¼ 2 qaω2 eiωt e3 =2πε0 c2 r, ð9:66Þ
where r is a distance between the dipole and the observation point. In (9.66) we ignored
terms proportional to the inverse square and cube of r for the aforementioned reason.
As described in (9.66), the strength of the radiation electric field in the wave zone
measured at a point away from the oscillating dipole is proportional to a component
of the vector of the acceleration motion of the dipole [i.e., p̈(t)]. The radiation
electric field lies in the direction perpendicular to a line connecting the observation
point and the point of the dipole (Fig. 9.4). Let εe be a unit polarization vector of the
electric field in that direction and let E⊥ be the radiation electric field. Then, we have
E⊥ = −qaω² e^{−iωt} (e₃ · εe)εe/(2πε₀c²r) = −qaω² e^{−iωt} εe sin θ/(2πε₀c²r).  (9.67)
As shown in Sect. 7.3, (e₃ · εe)εe in (9.67) "extracts" from e₃ a vector component
parallel to εe. Such an operation is said to be a projection of a vector. The related
discussion can be seen in Part III.
It takes a time of r/c for the emitted light from the charge to arrive at the
observation point. Consequently, the acceleration of the charge has to be measured
at the time when the radiation leaves the charge. Let t be the instant when the electric
field is measured at the measuring point. Then, it follows that the radiation leaves the
charge at the time t − r/c. Thus, the electric field relevant to the radiation that can be
observed far enough away from the oscillating charge is described as [2]
E⊥(x, t) = −[qaω² e^{−iω(t−r/c)} sin θ/(2πε₀c²r)] εe.  (9.68)
The corresponding radiation magnetic field is
H(x, t) = −[qaω² e^{−iω(t−r/c)} sin θ/(2πcr)] εm,  (9.69)
where n represents a unit vector in the direction parallel to a line connecting the
observation point and the dipole. The εm is a unit polarization vector of the magnetic
field as defined by (7.67). From the above, we see that the radiation electromagnetic
waves in the wave zone are transverse waves that show the same properties as
those of electromagnetic waves in free space.
Now, let us evaluate a time-averaged energy flux from an oscillating dipole.
Using (8.71), we have
S(θ) = [(ea)² ω⁴ sin²θ/(8π²ε₀c³r²)] n.  (9.71)
Let us relate the above argument to Einstein A and B coefficients. Since we are
dealing with an isolated dipole, we might well suppose that the radiation comes from
the spontaneous emission. Let P be a total power of emission from the oscillating
dipole that gets through a sphere of radius r. Then we have
P = ∮ S(θ)·n dS = ∫₀^{2π} dϕ ∫₀^{π} S(θ) r² sin θ dθ
  = [(ea)² ω⁴/(8π²ε₀c³)] ∫₀^{2π} dϕ ∫₀^{π} sin³θ dθ.  (9.72)
Changing cos θ to t, the integral I ≡ ∫₀^{π} sin³θ dθ can be converted into
I = ∫₋₁^{1} (1 − t²) dt = 4/3.  (9.73)
Thus, we have
P = (ea)² ω⁴/(3πε₀c³).  (9.74)
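The angular integration that carries (9.72) into (9.74) can be verified numerically; the sketch below (not from the text) evaluates the integral by a simple midpoint rule:

```python
import math

# Numerical sketch (not from the text): midpoint-rule check of the angular
# integral I = ∫ sin^3 θ dθ from 0 to π appearing in (9.72)-(9.73).
N = 100_000
dtheta = math.pi / N
I = sum(math.sin((i + 0.5) * dtheta) ** 3 for i in range(N)) * dtheta
print(I)   # ≈ 4/3; with the ϕ integral 2π, P = (ea)^2 ω^4 / (3π ε0 c^3)
```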
A₁₂ = [ω₂₁³/(3πε₀c³ħ)] (ea)².  (9.75)
B₁₂ = [π/(3ε₀ħ²)] (ea)².  (9.76)
A₁₂ = [ω₂₁³e²/(3πε₀c³ħ)] |⟨1|r|2⟩|²  and  B₁₂ = [πe²/(3ε₀ħ²)] |⟨1|r|2⟩|².  (9.77)
B₁₂ = [πe²/(3ε₀ħ²)] ⟨1|r|2⟩*⟨1|r|2⟩ = [πe²/(3ε₀ħ²)] ⟨1|r|2⟩⟨2|r†|1⟩
    = [πe²/(3ε₀ħ²)] ⟨1|r|2⟩⟨2|r|1⟩,  (9.78)
where with the second equality we used (1.116); the last equality comes from the fact that r is
Hermitian. Meanwhile, we have
9.5 Lasers
9.5.1 Brief Outlook
Fig. 9.5 Rectangular parallelepiped of a laser medium with a length L and a cross-section area
S (not shown). I(x) denotes irradiance at a point of x from the origin
N ¼ N1 þ N2, ð9:80Þ
where N1 and N2 represent the number of atoms occupying the ground and excited
states, respectively. Suppose that light propagates from the left of the rectangular
parallelepiped and enters it. Then, we expect that three processes occur simultaneously:
one is stimulated absorption and the others are stimulated emission and
spontaneous emission. After these processes, an increment dE in photon energy of the
total system (i.e., the rectangular parallelepiped) during dt is described by
I = (E/SL) c′.  (9.84)
Here c′ is the light velocity in the laser medium:
c′ = c/n.  (9.85)
dI = (c′/SL) dE = (c′/SL) B₂₁ρ(ω₂₁)(N₂ − N₁)ħω₂₁ dt = B₂₁ρ(ω₂₁)Ñħω₂₁c′ dt,  (9.86)
where Ñ ≡ (N₂ − N₁)/SL denotes a "net" density of atoms that occupy the excited
state.
The energy density ρ(ω21) can be written as
The quantity I(ω21) is an energy flux that gets through per unit area per unit time.
This flux corresponds to an energy contained in a long and thin rectangular paral-
lelepiped of a length c′ and a unit cross-section area. To obtain ρ(ω₂₁), I(ω₂₁) should
be divided by c′ in (9.87) accordingly. Using (9.87) and replacing c′dt with a distance
dx and I(ω₂₁) with I(x) as a function of x, we rewrite (9.86) as
dI(x) = [B₂₁g(ω₂₁)Ñħω₂₁/c′] I(x) dx.  (9.88)
where I0 is an irradiance of light at an instant when the light is entering the laser
medium from the left. Thus, we get
I(x) = I₀ exp[B₂₁g(ω₂₁)Ñħω₂₁ x/c′].  (9.90)
Equation (9.90) shows that an irradiance of the laser light is augmented exponen-
tially along the path of the laser light. In (9.90), denoting an exponent as G
G ≡ B₂₁g(ω₂₁)Ñħω₂₁/c′,  (9.91)
we get
The constant G is said to be a gain constant. This is an index that indicates the laser
performance. Large values of B₂₁, g(ω₂₁), and Ñ yield a high performance of the
laser.
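As a rough illustration with assumed numbers (not taken from the text), the exponential growth of (9.90) can be tabulated as follows:

```python
import math

# Sketch with assumed numbers (not from the text): amplification according to
# I(x) = I0 exp(G x), (9.90), for a hypothetical gain constant G.
G = 50.0    # gain constant, m^-1 (assumed value for illustration)
I0 = 1.0    # input irradiance, arbitrary units
for x_cm in (0.0, 1.0, 2.0, 5.0):
    x = x_cm / 100.0
    print(f"x = {x_cm:3.1f} cm:  I/I0 = {I0 * math.exp(G * x):8.3f}")
```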
In Sects. 8.8 and 9.2, we sought conditions for electromagnetic waves to cause
constructive interference. In a one-dimensional dielectric medium, the condition is
described as
kL = mπ  or  mλ = 2L  (m = 1, 2, ⋯),  (9.92)
It is often the case that if the laser is a long and thin rod, rectangular parallelepiped,
etc., we see that sharply resolved and regularly spaced spectral lines are
ωm n = (πc/L) m.  (9.95)
nδωm + ωmδn = nδωm + ωm(δn/δωm)δωm = [n + ωm(δn/δωm)]δωm = (πc/L)δm.  (9.96)
Therefore, we get
δωm = [n + ωm(δn/δωm)]⁻¹ (πc/L) δm.  (9.97)
ng ≡ n + ωm(δn/δωm)  (9.98)
plays a role of refractive index when the laser material has a wavelength dispersion.
The quantity ng is said to be a group refractive index (or group index). Thus, (9.97) is
rewritten as
δω = [πc/(Lng)] δm,  (9.99)
where we omitted the index m of ωm. When we need to distinguish the refractive
index n clearly from the group refractive index, we refer to n as a phase refractive
index. Rewriting (9.98) as a relation of continuous quantities and using differentiation
instead of variation, we have [2]
ng = n + ω(dn/dω)  or  ng = n − λ(dn/dλ).  (9.100)
ωdλ + λdω = 0  or  dω/ω = −dλ/λ.
where A, B, and C are appropriate constants with A and B being dimensionless and
C having a dimension [m]. In an actual case, it would be difficult to determine
n analytically. However, if we are able to obtain well-resolved spectra, δm can be put
as 1 and δωm can be determined from the free spectral range. Expressing it as δωFSR,
from (9.99) we have
δωFSR = πc/(Lng)  or  ng = πc/(L·δωFSR).  (9.102)
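A short numerical sketch (assumed cavity parameters, not values from the text) shows how (9.102) recovers the group index from a measured free spectral range:

```python
import math

# Sketch (assumed values, not from the text): the free spectral range δω_FSR
# of (9.99) with δm = 1, and the group index recovered from it via (9.102).
c = 2.99792458e8   # speed of light, m/s
L = 0.5e-3         # cavity length, m (assumed)
n_g = 3.0          # group index (assumed)
d_omega_FSR = math.pi * c / (L * n_g)        # (9.99) with δm = 1
print(f"δω_FSR = {d_omega_FSR:.3e} rad/s")
print(f"recovered n_g = {math.pi * c / (L * d_omega_FSR):.3f}")
```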
Fig. 9.6 Broadband emission spectra of an organic semiconductor crystal AC’7. (a) Full spectrum.
(b) Enlarged profile of the spectrum around 530 nm. Reproduced from Yamao T, Okuda Y,
Makino Y, Hotta S (2011) Dispersion of the refractive indices of thiophene/phenylene co-oligomer
single crystals. J Appl Phys 110(5): 053113/7 pages [6], with the permission of AIP Publishing.
https://doi.org/10.1063/1.3634117
Fig. 9.7 Laser oscillation spectrum of an organic semiconductor crystal AC’7. Reproduced from
Yamao T, Okuda Y, Makino Y, Hotta S (2011) Dispersion of the refractive indices of thiophene/
phenylene co-oligomer single crystals. J Appl Phys 110(5): 053113/7 pages [6], with the permission
of AIP Publishing. https://doi.org/10.1063/1.3634117
ng = {A[1 − (C/λ)²]² + B} / {√([1 − (C/λ)²]³) · √(A[1 − (C/λ)²] + B)}.  (9.103)
Fig. 9.8 Wavelength dispersion of (a) the group refractive indices ng and (b) the phase
refractive indices n for several organic semiconductor crystals (BP1T, AC5, AC'7).
Reproduced from Yamao T, Okuda Y, Makino Y, Hotta S (2011) Dispersion of the refractive
indices of thiophene/phenylene co-oligomer single crystals. J Appl Phys 110(5): 053113/7
pages [6], with the permission of AIP Publishing. https://doi.org/10.1063/1.3634117
Table 9.1 Optimized constants of A, B, and C for Sellmeier’s dispersion formula (9.101) with
several organic semiconductor crystalsa
Material A B C (nm)
BP1T 5.7 1.04 397
AC5 3.9 1.44 402
AC’7 6.0 1.06 452
ᵃReproduced from Yamao T, Okuda Y, Makino Y, Hotta S (2011) Dispersion of the refractive
indices of thiophene/phenylene co-oligomer single crystals. J Appl Phys 110(5): 053113/7 pages,
with the permission of AIP Publishing. https://doi.org/10.1063/1.3634117
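The constants of Table 9.1 can be evaluated numerically. The sketch below assumes the single-pole Sellmeier form n(λ) = √(A + B/(1 − (C/λ)²)), an assumption chosen to be consistent with the group-index expression (9.103); the evaluation wavelength of 500 nm is arbitrary:

```python
import math

# Sketch (not from the text): phase and group indices from the Table 9.1
# constants of BP1T.  The Sellmeier form below is an assumption consistent
# with (9.103), not a formula quoted from the text.
def n_phase(lam, A, B, C):
    return math.sqrt(A + B / (1.0 - (C / lam) ** 2))

def n_group(lam, A, B, C, h=1e-4):
    # n_g = n - λ dn/dλ, (9.100), via a central finite difference
    dn = (n_phase(lam + h, A, B, C) - n_phase(lam - h, A, B, C)) / (2 * h)
    return n_phase(lam, A, B, C) - lam * dn

A, B, C = 5.7, 1.04, 397.0   # BP1T (C in nm), Table 9.1
lam = 500.0                  # nm (arbitrary choice)
print(f"n({lam:.0f} nm)   = {n_phase(lam, A, B, C):.3f}")
print(f"n_g({lam:.0f} nm) = {n_group(lam, A, B, C):.3f}")
```

The computed values fall in the ranges plotted for BP1T, and ng exceeds n, as expected for normal dispersion.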
As the phase refractive index n has a dispersion, the propagation constant β has a
dispersion as well. In optics, n sin θ is often referred to as an effective index. We
denote it by
neff ≡ n sin θ.  (9.104)
Fig. 9.10 Construction of an organic light-emitting device. (a) Structural formula of P6T. (b)
Diffraction grating of an organic device. The purple arrow indicates the direction of grating
wavevector K. Reproduced from Yamao T, Higashihara S, Yamashita S, Sano H, Inada Y,
Yamashita K, Ura S, Hotta S (2018) Design principle of high-performance organic single-crystal
light-emitting devices. J Appl Phys 123(23): 235501/13 pages [7], with the permission of AIP
Publishing. https://doi.org/10.1063/1.5030486. (c) Cross-section of P6T crystal/AZO substrate
Fig. 9.11 Geometrical relationship among the light propagation in crystal (indicated with β), the
propagation outside the crystal (k0), and the grating wavevector (K). This geometry represents the
phase matching between the emission inside the device (organic slab crystal) and out-coupled
emission (i.e., the emission outside the device); see text
where β is the propagation vector with |β| ≡ β that appeared in (8.153). The quantity
β is called a propagation constant. Equation (9.105) can be visualized in Fig. 9.11 as
geometrical relationship among the light propagation in crystal (indicated with β),
the propagation outside the crystal (k0), and the grating wavevector (K ). The grating
wavevector K is defined as
|K| ≡ K = 2π/Λ,
where Λ is the grating period and the direction of K is perpendicular to the grating
grooves; that direction is indicated in Fig. 9.10b with a purple arrow. The unit vector
ee is defined as
ee = (β − mK)/|β − mK|.
The out-coupled emission has the vacuum wavenumber k₀ = 2π/λp,
where λp is the emission peak wavelength in vacuum. Note that k₀ is almost identical
to the wavenumber of the emission in air (i.e., the emission to be detected). If we are
dealing with the emission parallel to the substrate plane, i.e., grazing emission,
(9.105) can be reduced to
β − mK = ±k₀.  (9.107)
Equations (9.105) and (9.107) represent the relationship between β and k0, i.e., the
phase matching conditions between the emission inside the device (organic slab
crystal) and the out-coupled emission, i.e., the emission outside the device (in air).
Thus, the boundary conditions can be restated as (i) the tangential-component
continuity of electromagnetic fields at both planes of interface (see Sect. 8.1) and
(ii) the phase matching of the electromagnetic fields inside and outside the
laser medium. We examine these two factors below.
(i) Tangential component continuity of electromagnetic fields:
Let us suppose that we are dealing with a dielectric medium with anisotropic
dielectric constant, but isotropic magnetic permeability. In this situation, Maxwell’s
equations must be formulated so that we can deal with the electromagnetic properties
in an anisotropic medium. Such substances are widely available and organic crystals
are counted as typical examples. Among those crystals, P6T crystallizes in the
monoclinic system as in many other cases of organic crystals [10, 11]. In this case,
the permittivity tensor (or electric permittivity tensor) ε is written as [7]
ε = (εaa 0 εac* ; 0 εbb 0 ; εc*a 0 εc*c*),  (9.108)
where the rows of the matrix are separated by semicolons.
rot E = −μ₀ ∂H/∂t,  (9.110)
rot H = ε₀ (εaa 0 εac* ; 0 εbb 0 ; εc*a 0 εc*c*) ∂E/∂t.  (9.111)
In (9.111) the equation is described in the abc*-system. But it will be desirable to
describe the Maxwell’s equations in the laboratory coordinate system (see Fig. 9.12)
so that we can readily visualize the light propagation in an anisotropic crystal such as
P6T. Figure 9.12 depicts the ξηζ-system, where the ξ-axis is in the direction of the
light propagation within the crystal, namely the ξ-axis is parallel to β in (9.107). We
assume that the ξηζ-system forms an orthogonal coordinate.
Here we define the dielectric constant ellipsoid ε̃ described by
(ξ η ζ) ε̃ (ξ ; η ; ζ) = 1,  (9.112)
which represents an ellipsoid in the ξηζ-system. If one cuts the ellipsoid by a plane
that includes the origin of the coordinate system and perpendicular to the ξ-axis, its
cross-section is an ellipse. In this situation, one can choose the η- and ζ-axes as the
principal axes of the ellipse. This implies that when we put ξ = 0, we must have
(0 η ζ) ε̃ (0 ; η ; ζ) = εηη η² + εζζ ζ² = 1.  (9.113)
ε̃ = (εξξ εξη εξζ ; εηξ εηη 0 ; εζξ 0 εζζ),
where ε̃ is expressed in the ξηζ-system. In terms of the matrix algebra, the principal
submatrix with respect to the η- and ζ-components should be diagonalized (see Sect.
14.5). On this condition, the electric flux density of the electromagnetic wave is
polarized in the η- and ζ-directions. The reciprocal square roots of the
principal values, i.e., 1/√εηη and 1/√εζζ, represent the anisotropic refractive indices.
Suppose that the ξηζ-system is reached by two successive rotations starting from
the abc*-system in such a way that we first perform the rotation by α around the c*-axis,
which is followed by another rotation by δ around the ξ-axis (Fig. 9.12). Then we
have [7]
(εξξ εξη εξζ ; εηξ εηη 0 ; εζξ 0 εζζ)
  = Rξ(δ)⁻¹ Rc*(α)⁻¹ (εaa 0 εac* ; 0 εbb 0 ; εc*a 0 εc*c*) Rc*(α) Rξ(δ),  (9.114)
where Rc*(α) and Rξ(δ) stand for the first and second rotation matrices of the above,
respectively. In (9.114) we have
Rc*(α) = (cos α −sin α 0 ; sin α cos α 0 ; 0 0 1),
Rξ(δ) = (1 0 0 ; 0 cos δ −sin δ ; 0 sin δ cos δ).  (9.115)
With the matrix representations and related coordinate transformations for (9.114)
and (9.115), see Sects. 11.4, 17.1, and 17.4.2. Note that the rotation angle δ cannot
be decided independently of α. In the literature [7], furthermore, the quantity εξη = εηξ
was ignored as small compared with the other components. As a result, (9.111) is
converted into the following equation described by
rot H = ε₀ (εξξ 0 εξζ ; 0 εηη 0 ; εζξ 0 εζζ) ∂E/∂t.  (9.116)
Notice here that the electromagnetic quantities H and E are measured in the ξηζ-
coordinate system.
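The transformation (9.114) with the rotation matrices (9.115) can be carried out numerically. The sketch below (assumed angles and tensor components, not values from the text) confirms that the transformed tensor stays symmetric with an invariant trace:

```python
import math

# Sketch (assumed values, not from the text): carry out (9.114) numerically.
def Rc(a):   # rotation by α about the c*-axis, cf. (9.115)
    return [[math.cos(a), -math.sin(a), 0.0],
            [math.sin(a),  math.cos(a), 0.0],
            [0.0, 0.0, 1.0]]

def Rxi(d):  # rotation by δ about the ξ-axis, cf. (9.115)
    return [[1.0, 0.0, 0.0],
            [0.0, math.cos(d), -math.sin(d)],
            [0.0, math.sin(d),  math.cos(d)]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def transpose(X):
    return [[X[j][i] for j in range(3)] for i in range(3)]

# Assumed permittivity tensor in the crystal frame, with the structure of (9.108)
eps = [[6.0, 0.0, 0.5], [0.0, 5.0, 0.0], [0.5, 0.0, 4.0]]
alpha, delta = math.radians(30.0), math.radians(10.0)   # assumed angles
# (9.114): for rotation matrices, the inverse equals the transpose
eps_p = matmul(transpose(Rxi(delta)),
               matmul(transpose(Rc(alpha)), matmul(eps, matmul(Rc(alpha), Rxi(delta)))))
for row in eps_p:
    print(["%7.4f" % v for v in row])
```

An orthogonal similarity transformation preserves symmetry and the trace of the tensor, which the printed matrix exhibits.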
Meanwhile, the electromagnetic waves that are propagating within the slab
crystal are described as
where Φν stands for either an electric field or a magnetic field with ν chosen from
ν ¼ 1, 2, 3 representing each component of the ξηζ-system. Inserting (9.117) into
(9.116) and separating the resulting equation into ξ, η, and ζ components, we obtain
six equations with respect to six components of H and E. Of these, we are particularly
interested in Eξ and Eζ as well as Hη, because we assume that the electromagnetic
wave is propagated as a TM mode [7].
Using the set of the above six equations, with Hη in the P6T crystal we get the
following second-order linear differential equation (SOLDE):
d²Hη/dζ² + 2i neff k₀ (εξζ/εζζ) dHη/dζ + [εξξ − εξζ²/εζζ − (εξξ/εζζ) neff²] k₀² Hη = 0,  (9.118)
and inserting (9.119) into (9.118), we get a following type of equation described by
In (9.119) Hcryst represents the magnetic field within the P6T crystal. The
constants Ζ and Ω in (9.120) can be described using κ and k together with the
constant coefficients of dHη/dζ and Hη of the SOLDE (9.118). Meanwhile, we have
det( e^{−iκζ} cos(kζ + ϕs)  e^{−iκζ} sin(kζ + ϕs) ;
     [e^{−iκζ} cos(kζ + ϕs)]′  [e^{−iκζ} sin(kζ + ϕs)]′ ) = k e^{−2iκζ} ≠ 0,
where the differentiation is pertinent to ζ. This shows that e^{−iκζ} cos(kζ + ϕs) and
e^{−iκζ} sin(kζ + ϕs) are linearly independent. This in turn implies that
Ζ = Ω = 0.  (9.121)
κ = β εξζ/εζζ = neff k₀ εξζ/εζζ,  k² = (1/εζζ²)(εξξεζζ − εξζ²)(εζζ − neff²) k₀².  (9.122)
Eξ = i [εζζ k/(ωε₀(εξξεζζ − εξζ²))] Hcryst e^{−iκζ} sin(kζ + ϕs).  (9.123)
d²Hη/dζ² + (ñ² − neff²) k₀² Hη = 0,  (9.124)
where ñ is the refractive index of either the substrate or air. Since
ñ < neff < n  (9.125)
for the substrate (i.e., AZO) and air, from (9.124) we express the electromagnetic
fields as
Hη = H̃ e^{±γζ},  (9.127)
where H̃ denotes the magnetic field relevant to the substrate or air. Considering that
ζ < 0 for the substrate and ζ > d/ cos δ in air (see Fig. 9.13), with the magnetic field
we have
Hη = H̃ e^{γζ}  and  Hη = H̃ e^{−γ(ζ − d/cos δ)}  (9.128)
for the substrate and air, respectively. Notice that Hη → 0 when ζ → −∞ for the
substrate and ζ → +∞ for air. Correspondingly, with the electric field we get
Eξ = −i (γ/(ωε₀ñ²)) H̃ e^{γζ}  and  Eξ = i (γ/(ωε₀ñ²)) H̃ e^{−γ(ζ − d/cos δ)},  (9.129)
where the factor of cosδ comes from the requirement of the tangential continuity
condition. (b) The tangential continuity condition for the electric field at the same
interface reads as
i [εζζ k/(ωε₀(εξξεζζ − εξζ²))] Hcryst e^{−iκ·0} sin(k·0 + ϕs) = −i (γ/(ωε₀ñ²)) H̃ e^{γ·0}.  (9.131)
[εζζ/(εξξεζζ − εξζ²)] k tan ϕs = −γ/ñ²  or  [εζζ/(εξξεζζ − εξζ²)] k tan(−ϕs) = γ/ñ²,  (9.132)
where γ and ñ are substrate-related quantities; see (9.125) and (9.126). Another
boundary condition at the air/crystal interface can be obtained in a similar manner.
Notice that in that case ζ ¼ d/ cos δ; see Fig. 9.13. As a result, we have
Fig. 9.13 Cross-section of air/P6T crystal/AZO substrate. The angle δ is identical with that
of Fig. 9.12. The points P and P′ are located at the crystal/substrate interface and air/crystal
interface, respectively. The distance between P and P′ is d/cos δ
[εζζ/(εξξεζζ − εξζ²)] k tan(kd/cos δ + ϕs) = γ/ñ²,  (9.133)
where γ and ñ are air-related quantities. The quantity ϕs can be eliminated by
taking the arctangent of (9.132) and (9.133). In combination with (9.122), we
finally get the following eigenvalue equation for TM modes:
(k₀d/εζζ) √[(εξξεζζ − εξζ²)(εζζ − neff²)] / cos δ
 = lπ + tan⁻¹[(1/nair²) √{(εξξεζζ − εξζ²)(neff² − nair²)/(εζζ − neff²)}]
 + tan⁻¹[(1/nsub²) √{(εξξεζζ − εξζ²)(neff² − nsub²)/(εζζ − neff²)}],  (9.134)
where nair (= 1) is the refractive index of air, nsub is that of the AZO substrate equipped
with a diffraction grating, d is the crystal thickness, and l is the order of the transverse
mode. The refractive index nsub is estimated as the average of the refractive indices
of air and AZO weighted by the volume fractions they occupy in the diffraction grating.
To solve (9.134), iterative numerical computation was needed. The detailed calcu-
lation procedures for this can be seen in the literature [7] and supplementary material
therein.
Note that (9.134) includes the well-known eigenvalue equation for a waveguide
that comprises an isotropic core dielectric medium sandwiched by a couple of clad
layers having the same refractive index (symmetric waveguide) [12]. In that case, in
fact, the second and third terms in RHS of (9.134) are the same and their sum is
identical with δTM of (8.176) in Sect. 8.7.2. To confirm it, in (9.134) use
εξξ ¼ εζζ n2 [see (7.57)] and εξζ ¼ 0 together with the relative refractive index
given by (8.39).
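As a numerical illustration of this isotropic, symmetric limit, the sketch below (assumed indices and thickness, not values from the text) solves the reduced TM eigenvalue equation for neff by bisection:

```python
import math

# Sketch (assumed parameters, not from the text): the symmetric, isotropic
# limit of the TM eigenvalue equation (9.134),
#   k0 d sqrt(n^2 - neff^2) = lπ + 2 atan[(n^2/nc^2) sqrt((neff^2 - nc^2)/(n^2 - neff^2))],
# solved for neff by bisection.
n_core, n_clad = 3.0, 1.6     # assumed core/cladding indices
lam, d = 600e-9, 400e-9       # vacuum wavelength and slab thickness (assumed)
k0 = 2 * math.pi / lam
l = 0                         # transverse-mode order

def f(neff):
    lhs = k0 * d * math.sqrt(n_core**2 - neff**2)
    atan = math.atan((n_core**2 / n_clad**2)
                     * math.sqrt((neff**2 - n_clad**2) / (n_core**2 - neff**2)))
    return lhs - (l * math.pi + 2 * atan)

lo, hi = n_clad + 1e-9, n_core - 1e-9   # f changes sign across this interval
for _ in range(100):                     # plain bisection
    mid = 0.5 * (lo + hi)
    if f(lo) * f(mid) <= 0:
        hi = mid
    else:
        lo = mid
print(f"TM{l} effective index n_eff ≈ {0.5 * (lo + hi):.4f}")
```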
(ii) Phase matching of the electromagnetic fields:
Once we have obtained the eigenvalue equation described by (9.134), we will be
able to solve the problem and, hence, to design a high-performance optical device,
especially the laser. To this end, let us return back to (9.107). Taking inner products
(see Chap. 13) of both sides of (9.107), we have
k₀² = β² − 2mβK cos φ + m²K²,  (9.135)
where φ is an angle between mK and β (see Fig. 9.11). Inserting (9.104) into (9.135)
and solving the quadratic equation, we obtain neff as a function of k0 (or emission
wavelength). This implies that we can determine neff from the experimental data.
Yet, the above process does not mean that we can determine neff uniquely. It is
because there is still room for choice of a positive integer m. Using (9.125) that
restricts the values of neff, however, we can decide the most probable integer m. Thus,
we can compare neff obtained from (9.134) with neff determined from the
experiments.
As described above, we have explained two major factors of the out-coupling of
emission. In the above discussion, we assumed that the TM modes dominate.
Combining (9.107) and (9.134), we successfully assign individual emissions to the
specific TMml modes, in which the indices m and l are the longitudinal and transverse
mode orders with m associated with the grating. Note that m ≥ 1 and l ≥ 0. With the index
l, l = 0 is allowed because the total internal reflection (Sects. 8.5 and 8.6) is
responsible.
Figure 9.14a [7] shows an example of the angle-dependent emission spectra. For
this, the grazing emission was intended. There are two series of progression in which
the emission peak locations are either redshifted or blueshifted with increasing
Fig. 9.14 Angle-dependent emission spectra of a P6T crystal. (a) Emission spectra. Reproduced
from Yamao T, Higashihara S, Yamashita S, Sano H, Inada Y, Yamashita K, Ura S, Hotta S (2018)
Design principle of high-performance organic single-crystal light-emitting devices. J Appl Phys
123(23): 235501/13 pages [7], with the permission of AIP Publishing. https://doi.org/10.1063/1.
5030486. (b) The grazing emissions (k0) were collected at various angles θ made with the grating
wavevector direction (K)
Fig. 9.15 Wavelength dispersion of effective indices neff. The open and closed symbols show the
data pertinent to the blueshifted and redshifted peaks, respectively. These data were calculated from
either (9.105) or (9.107). The colored solid curves represent the dispersions of the effective
refractive indices computed from (9.134). The numbers and characters (m, l, s) denote the diffrac-
tion order (m), order of transverse mode (l ), and shift direction (s) that distinguishes between
blueshift (b) and redshift (r). A black solid curve at the upper right indicates the wavelength
dispersion of the phase refractive indices (n) of the P6T crystal related to the one principal axis.
Reproduced from Yamao T, Higashihara S, Yamashita S, Sano H, Inada Y, Yamashita K, Ura S,
Hotta S (2018) Design principle of high-performance organic single-crystal light-emitting devices. J
Appl Phys 123(23): 235501/13 pages [7], with the permission of AIP Publishing. https://doi.org/10.
1063/1.5030486
angles |θ|. Figure 9.15 [7] compares the effective indices determined by the
experiments and numerical computations based on (9.134). The experimentally observed
emission peaks are indicated with open circles (or open triangles) for the blueshifted
peaks and closed circles (or closed triangles) for the redshifted peaks, respectively. The
colored solid curves represent the results obtained from (9.134).
As clearly recognized in Fig. 9.15 [7], the results of the experiments and computations
agree satisfactorily. In particular, TM20 (blueshifted) and TM10 (redshifted)
modes are seamlessly jointed so as to be an upper group of emission peaks. A lower
group of emission peaks are assigned to TM21 (blueshifted) or TM11 (redshifted)
modes. Notice here that these modes may have multiple values of neff according to
different εξξ, εζζ, and εξζ that vary with the different propagation directions of light
(i.e., the ξ-axis) within the slab crystal waveguide.
From a practical point of view, it is desirable to make a device so that it can strongly
emit light in the direction parallel to the grating wavevector. In that case, (9.107) can
be rewritten as
β = (±k₀ + mK) ee,
where all the vectors β, K, and k0 are parallel to ee. Then, we have two choices such
that
β = k₀ + mK = neff k₀,  k₀ = 2π/λp = mK/(neff − 1);
β = −k₀ + mK = neff k₀,  k₀ = 2π/λp = mK/(neff + 1).  (9.136)
The first and second equations of (9.136) are associated with the spectra of
redshifted and blueshifted progressions, respectively. Further rewriting (9.136) to
combine the two equations, we get
2β ¼ mK ð9:138Þ
or
λp = 2neff Λ/m.  (9.139)
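With assumed device parameters (not values from the text), (9.139) immediately gives the emission peak wavelengths selected by the grating:

```python
# Sketch (assumed parameters, not from the text): emission wavelengths from
# the condition (9.139), λp = 2 n_eff Λ / m.
n_eff = 2.8          # effective index (assumed)
Lam = 430e-9         # grating period Λ, m (assumed)
for m in (1, 2, 3, 4):
    print(f"m = {m}:  λp = {2 * n_eff * Lam / m * 1e9:7.1f} nm")
```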
As discussed above in detail, Examples 9.1 and 9.2 enable us to design a high-performance
laser device using an organic crystal that is characterized by a pretty
complicated crystal structure associated with anisotropic refractive indices. The
abovementioned design principle enables one to construct effective laser devices
consisting of light-emitting materials, either organic or inorganic, more widely. At
the same time, these examples are expected to provide an effective methodology in
interdisciplinary fields encompassing solid-state physics and solid-state chemistry as
well as device physics.
9.6 Mechanical System
where we designated the polarization vector as a direction of the x-axis. At the same
time, we omitted the index from the amplitude. Thus, from the second equation of
(7.65) we have
∂Hy/∂t = −(1/μ) ∂Ex/∂z = −(√2 Ek/μ) sin ωt cos kz.  (9.141)
Note that this equation appears in the second equation of (8.131) as well. Integrating
both sides of (9.141), we get
Hy = (√2 Ek/(μω)) cos ωt cos kz.
Using ω/k = v, this can be rewritten as
Hy = (√2 E/(μv)) cos ωt cos kz + C,
where the integration constant C may be set to zero for the oscillating field. Defining
H ≡ E/(μv),
we have
Hy = √2 H cos ωt cos kz.
Thus, E (∥e₁), H (∥e₂), and n (∥e₃) form the right-handed system in this order. As
noted in Sect. 8.8, at the interface (or wall) the electric field and magnetic field form
nodes and antinodes, respectively. Namely, the two fields are in quadrature.
Let us calculate electromagnetic energy of the dielectric medium within a cavity.
In the present case the cavity means the dielectric sandwiched between a couple of
metal layers. We have
W = We + Wm,  (9.142)
where W is the total electromagnetic energy; We and Wm are electric and magnetic
energies, respectively. Let the length of the cavity be L. Then the energy per unit
cross-section area is described as
We = (ε/2) ∫₀ᴸ Ex² dz  and  Wm = (μ/2) ∫₀ᴸ Hy² dz.  (9.143)
We = (ε/2) L E² sin²ωt,
Wm = (μ/2) L H² cos²ωt = (μ/2) L (E/(μv))² cos²ωt = (ε/2) L E² cos²ωt,  (9.144)
W = (ε/2) L E² = (μ/2) L H².  (9.145)
Representing the energies per unit volume as W̃e, W̃m, and W̃, we have
W̃e = (ε/2) E² sin²ωt,  W̃m = (ε/2) E² cos²ωt,  W̃ = (ε/2) E² = (μ/2) H².  (9.146)
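The quadrature behavior of (9.146) is easy to verify numerically (assumed ε and E, arbitrary units):

```python
import math

# Sketch (assumed values, arbitrary units): the energy densities of (9.146)
# oscillate in quadrature while their sum stays constant at εE^2/2.
eps_r = 8.854e-12    # permittivity ε (assumed)
E = 1.0e3            # field amplitude (assumed)
for wt in (0.0, 0.3, 0.7, 1.2):   # ωt in radians
    We = 0.5 * eps_r * E**2 * math.sin(wt) ** 2
    Wm = 0.5 * eps_r * E**2 * math.cos(wt) ** 2
    print(f"ωt = {wt:3.1f}:  We + Wm = {We + Wm:.6e}")
```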
x(t) = (v₀/ω) sin ωt = x₀ sin ωt,  (2.7)
where
x₀ ≡ v₀/ω.
Defining
p₀ ≡ mωx₀,
we have
p(t) = mv(t) = p₀ cos ωt.
Let the kinetic energy and potential energy of the oscillator be K and V, respectively.
Then, we have
K = p(t)²/(2m) = (p₀²/(2m)) cos²ωt,
V = (1/2) mω² x(t)² = (1/2) mω² x₀² sin²ωt,
W = K + V = p₀²/(2m) = (1/2) mv₀² = (1/2) mω² x₀².  (9.147)
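Likewise, (9.147) can be checked numerically (assumed oscillator parameters):

```python
import math

# Sketch (assumed parameters, not from the text): for the harmonic oscillator
# the total energy K + V equals p0^2/(2m) at every instant, (9.147).
m, omega, x0 = 1.0, 2.0, 0.5      # mass, angular frequency, amplitude (assumed)
p0 = m * omega * x0
for wt in (0.0, 0.5, 1.0, 2.0):   # ωt in radians
    K = (p0**2 / (2 * m)) * math.cos(wt) ** 2
    V = 0.5 * m * omega**2 * x0**2 * math.sin(wt) ** 2
    print(f"ωt = {wt:3.1f}:  K + V = {K + V:.6f}")   # = p0^2/(2m) = 0.5
```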
References

10 Introductory Green's Functions
A general form of n-th order linear differential equations has the following form:
aₙ(x) dⁿu/dxⁿ + aₙ₋₁(x) dⁿ⁻¹u/dxⁿ⁻¹ + ⋯ + a₁(x) du/dx + a₀(x)u = d(x).  (10.1)
a(x) d²u/dx² + b(x) du/dx + c(x)u = d(x).  (10.2)
In (10.2) we assume that the variable x is real. The equation can be solved under
appropriate boundary conditions (BCs). A general form of BCs is described as
B₁(u) = α₁u(a) + β₁(du/dx)|ₓ₌ₐ + γ₁u(b) + δ₁(du/dx)|ₓ₌ᵦ = σ₁,  (10.3)
B₂(u) = α₂u(a) + β₂(du/dx)|ₓ₌ₐ + γ₂u(b) + δ₂(du/dx)|ₓ₌ᵦ = σ₂,  (10.4)
where α₁, β₁, γ₁, δ₁, σ₁, etc. are real constants; u(x) is defined in an interval [a, b],
where a and b can be infinity (i.e., ±∞). The LHSs of B₁(u) and B₂(u) are referred to
as boundary functionals [1, 2]. If σ 1 ¼ σ 2 ¼ 0, the BCs are called homogeneous;
otherwise the BCs are said to be inhomogeneous. In combination with the inhomo-
geneous equation expressed as (10.2), Table 10.1 summarizes characteristics of
SOLDEs. We have four types of SOLDEs according to homogeneity and inhomo-
geneity of equations and BCs.
Even though SOLDEs are mathematically tractable, it is not necessarily easy
to solve them, depending upon the nature of a(x), b(x), and c(x) of (10.2). Nonetheless,
if those functions are constant coefficients, the equation can readily be solved. We will deal with
SOLDEs of that type in great detail later. Suppose that we find two linearly independent
solutions u₁(x) and u₂(x) of the following homogeneous equation:
a(x) d²u/dx² + b(x) du/dx + c(x)u = 0.  (10.5)
dx2 dx
Then, any solution u(x) of (10.5) can be expressed as their linear combination such
that
fₙ(x) = −(a₁/aₙ) f₁(x) − (a₂/aₙ) f₂(x) − ⋯ − (aₙ₋₁/aₙ) fₙ₋₁(x).  (10.8)
If f₁(x), f₂(x), ⋯, and fₙ(x) are not linearly dependent, they are called linearly
independent. In other words, the statement that f₁(x), f₂(x), ⋯, and fₙ(x) are linearly
independent is equivalent to saying that (10.7) holds if and only if a₁ = a₂ = ⋯ = aₙ = 0.
We will have relevant discussion in Part III.
Now suppose that with the above two linearly independent functions u1(x) and
u2(x), we have
Thus, that u1(x) and u2(x) are linearly independent is equivalent to that the following
expression holds:
det( u₁(x)  u₂(x) ; du₁(x)/dx  du₂(x)/dx ) ≡ W(u₁, u₂) ≠ 0,  (10.12)
where W(u1, u2) is called Wronskian of u1(x) and u2(x). In fact, if W(u1, u2) ¼ 0, then
we have
u₁ du₂/dx − u₂ du₁/dx = 0.  (10.13)
This implies that there is a functional relationship between u₁ and u₂. In fact, if we
can express u₂ = u₂(u₁(x)), then du₂/dx = (du₂/du₁)(du₁/dx). That is,
u₁ du₂/dx = u₁ (du₂/du₁)(du₁/dx) = u₂ du₁/dx  or  u₁ du₂/du₁ = u₂  or  du₂/u₂ = du₁/u₁,  (10.14)
where the second equality of the first equation comes from (10.13). The third
equation can easily be integrated to yield
ln(u₂/u₁) = c  or  u₂/u₁ = eᶜ  or  u₂ = eᶜ u₁.  (10.15)
Equation (10.15) shows that u₁(x) and u₂(x) are linearly dependent. It is easy to show
that if u₁(x) and u₂(x) are linearly dependent, then W(u₁, u₂) = 0. Thus, we have the
following statement:
On the other hand, suppose that we have another solution u3(x) for (10.5) besides
u1(x) and u2(x). Then, we have
a(x) d²u₁/dx² + b(x) du₁/dx + c(x)u₁ = 0,
a(x) d²u₂/dx² + b(x) du₂/dx + c(x)u₂ = 0,
a(x) d²u₃/dx² + b(x) du₃/dx + c(x)u₃ = 0.  (10.16)
dx2 dx
det( d²u₁/dx²  du₁/dx  u₁ ; d²u₂/dx²  du₂/dx  u₂ ; d²u₃/dx²  du₃/dx  u₃ ) = 0,  (10.18)
where W(u1, u2, u3) is Wronskian of u1(x), u2(x), and u3(x). In the above relation, we
used the fact that a determinant of a matrix is identical to that of its transposed matrix
and that a determinant of a matrix changes the sign after permutation of row vectors.
To be short, a necessary and sufficient condition to get a nontrivial solution is that W
(u1, u2, u3) vanishes.
This implies that u1(x), u2(x), and u3(x) are linearly dependent. However, we have
assumed that u1(x) and u2(x) are linearly independent, and so (10.18) and (10.19)
mean that u3(x) must be described as a linear combination of u1(x) and u2(x). That is,
we have no third linearly independent solution. Consequently, the general solution
of (10.5) must be given by (10.6). In this sense, u1(x) and u2(x) are said to be a
fundamental set of solutions of (10.5).
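As a concrete check of the Wronskian criterion (10.12) (an example not taken from the text), u₁ = eˣ and u₂ = e²ˣ form a fundamental set for u″ − 3u′ + 2u = 0:

```python
import math

# Sketch (example not from the text): Wronskian (10.12) of u1 = e^x and
# u2 = e^{2x}, two linearly independent solutions of u'' - 3u' + 2u = 0.
def wronskian(x):
    u1, du1 = math.exp(x), math.exp(x)            # u1 and its derivative
    u2, du2 = math.exp(2 * x), 2 * math.exp(2 * x)  # u2 and its derivative
    return u1 * du2 - u2 * du1                    # = e^{3x}, never zero

for x in (-1.0, 0.0, 1.0):
    print(f"W({x:4.1f}) = {wronskian(x):.6f}")    # equals e^{3x}
```

Since W never vanishes, the two solutions are linearly independent on the whole real line.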
Next, let us consider the inhomogeneous equation of (10.2). Suppose that up(x) is
a particular solution of (10.2). Let us think of a following function v(x) such that:
a(x) d²v/dx² + b(x) dv/dx + c(x)v + a(x) d²u_p/dx² + b(x) du_p/dx + c(x)u_p = d(x).  (10.21)
Therefore, we have
$$a(x)\frac{d^2v}{dx^2} + b(x)\frac{dv}{dx} + c(x)v = 0.$$
384 10 Introductory Green’s Functions
But, v(x) can be described by a linear combination of u1(x) and u2(x) as in the case of
(10.6). Hence, the general solution of (10.2) should be expressed as
$$a(x)\frac{du}{dx} + b(x)u = d(x) \quad [a(x) \neq 0], \qquad(10.23)$$

where α, β, and σ are real constants; u(x) is defined in an interval [a, b]. If in (10.23) d(x) ≡ 0, (10.23) can readily be integrated to yield a solution. Let us multiply both sides of (10.23) by w(x). Then we have

$$w(x)a(x)\frac{du}{dx} + w(x)b(x)u = w(x)d(x). \qquad(10.25)$$
We define p(x) as

$$p(x) \equiv a(x)w(x), \qquad(10.26)$$

where w(x) is called a weight function. As mentioned in Sect. 2.3, the weight function is a real and non-negative function within the domain considered. Here we suppose that

$$\frac{dp(x)}{dx} = w(x)b(x). \qquad(10.27)$$

Then, (10.25) can be rewritten as

$$\frac{d}{dx}[p(x)u] = w(x)d(x). \qquad(10.28)$$
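The integrating-factor scheme (10.25)–(10.28) can be sketched symbolically. In the snippet below, the coefficients a(x) = x, b(x) = 1, d(x) = 2x are an arbitrary illustration (not from the text); the weight w is built so that p = aw satisfies p′ = bw:

```python
import sympy as sp

x = sp.symbols('x', positive=True)

# Hypothetical FOLDE (illustration only): a u' + b u = d with
# a = x, b = 1, d = 2x, i.e. x u' + u = 2x.
a, b, d = x, sp.Integer(1), 2 * x

# Weight function chosen so that p = a w obeys p' = b w, cf. (10.26)/(10.27):
# w = (1/a) * exp( Integral(b/a) ).
w = sp.exp(sp.integrate(b / a, x)) / a
p = a * w
assert sp.simplify(sp.diff(p, x) - b * w) == 0   # p' = b w  (10.27)

# (10.28): d/dx [p u] = w d, so a particular solution is u = Integral(w d)/p.
u_part = sp.simplify(sp.integrate(w * d, x) / p)
print(u_part)  # x; indeed x*(x)' + x = 2x
```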
10.2 First-Order Linear Differential Equations (FOLDEs) 385
$$\frac{du}{dx} + xu = x. \qquad(10.32)$$

$$u(a) = \sigma. \qquad(10.33)$$

Also, we have

$$p(x) = w(x) = \exp\left(\int_a^x x'\,dx'\right) = \exp\left[\frac{1}{2}\left(x^2 - a^2\right)\right]. \qquad(10.35)$$
$$\begin{aligned}\int_a^x w(x')d(x')\,dx' &= \int_a^x x'\exp\left[\frac{1}{2}\left(x'^2 - a^2\right)\right]dx' \\ &= \exp\left(-\frac{a^2}{2}\right)\int_{a^2/2}^{x^2/2} e^t\,dt \\ &= \exp\left(-\frac{a^2}{2}\right)\left[\exp\left(\frac{x^2}{2}\right) - \exp\left(\frac{a^2}{2}\right)\right] \\ &= \exp\left(\frac{x^2 - a^2}{2}\right) - 1 = p(x) - 1, \end{aligned}\qquad(10.36)$$
$$u = \frac{1}{p(x)}\left[p(x) - 1 + \sigma\right] = 1 + \frac{\sigma - 1}{p(x)} = 1 + (\sigma - 1)\exp\left(\frac{a^2 - x^2}{2}\right). \qquad(10.37)$$
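As a quick check of (10.37), the following SymPy sketch verifies that the stated u satisfies both the FOLDE (10.32) and the BC (10.33):

```python
import sympy as sp

x, a, sigma = sp.symbols('x a sigma')

# Candidate solution (10.37) of du/dx + x u = x with u(a) = sigma
u = 1 + (sigma - 1) * sp.exp((a**2 - x**2) / 2)

# Residual of the FOLDE (10.32); it should vanish identically.
residual = sp.simplify(sp.diff(u, x) + x * u - x)
print(residual)                   # 0
# BC (10.33): u(a) = sigma
print(sp.simplify(u.subs(x, a)))  # sigma
```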
$$L_x = \frac{d}{dx}. \qquad(10.38)$$
Looking at (10.40), we notice that the LHS comprises a difference between two integrals, while the RHS, referred to as a boundary term (or surface term), does not contain an integral.
Recalling the expression (1.128) and defining
$$\widetilde{L}_x \equiv -\frac{d}{dx}$$
in (10.40), we have
Comparing (10.42) and (10.43) and considering that φ and ψ are arbitrary functions, we have $\widetilde{L}_x = L_x^\dagger$. Thus, as an operator adjoint to Lx we get
$$L_x^\dagger = \widetilde{L}_x = -\frac{d}{dx} = -L_x.$$
Notice that only if the surface term vanishes can the adjoint operator $L_x^\dagger$ appropriately be defined. We will encounter a similar expression again in Part III.
We add that if with an operator A we have a relation described by

$$A^\dagger = A, \qquad(10.44)$$
$$\varphi^*(b)\psi(b) = \varphi^*(a)\psi(a) \quad\text{or}\quad \frac{\varphi^*(b)}{\varphi^*(a)} = \frac{\psi(a)}{\psi(b)}.$$

If ψ(b) = 2ψ(a), then we should have $\varphi^*(b) = \frac{1}{2}\varphi^*(a)$ for the surface term to vanish.
Recalling (10.24), the above conditions are expressed as
The boundary functional B0(φ) is said to be adjoint to B(ψ). The two boundary
functionals are admittedly different. If, however, we set ψ(b) ¼ ψ(a), then we should
have φ(b) ¼ φ(a) for the surface term to vanish. That is
Thus, the two functionals are identical and ψ and φ satisfy homogeneous BCs with
respect to these functionals.
As discussed above, a FOLDE is characterized by its differential operator as well as by a BC (or boundary functional). The same is true of SOLDEs.
Example 10.3 Next, let us consider the following differential operator:

$$L_x = \frac{1}{i}\frac{d}{dx}. \qquad(10.49)$$

$$\langle\varphi|L_x\psi\rangle - \langle L_x\varphi|\psi\rangle = \frac{1}{i}\left[\varphi^*\psi\right]_a^b, \qquad(10.51)$$
Repeating a discussion similar to Example 10.2, when the surface term vanishes, we
get
$$L_x^\dagger = L_x. \qquad(10.54)$$
The second-order differential operators are the most common operators frequently treated in mathematical physics. The general differential operators are described as

$$L_x = a(x)\frac{d^2}{dx^2} + b(x)\frac{d}{dx} + c(x), \qquad(10.55)$$
where a(x), b(x), and c(x) can in general be complex functions of a real variable x.
Let us think of the following identities [1]:

$$v^*a\frac{d^2u}{dx^2} - u\frac{d^2(av^*)}{dx^2} = \frac{d}{dx}\left[av^*\frac{du}{dx} - u\frac{d(av^*)}{dx}\right],$$
$$v^*b\frac{du}{dx} + u\frac{d(bv^*)}{dx} = \frac{d}{dx}\left[bv^*u\right],$$
$$v^*cu - u\,cv^* = 0. \qquad(10.56)$$

Adding these three identities, we obtain

$$v^*\left(a\frac{d^2u}{dx^2} + b\frac{du}{dx} + cu\right) - u\left[\frac{d^2(av^*)}{dx^2} - \frac{d(bv^*)}{dx} + cv^*\right] = \frac{d}{dx}\left[av^*\frac{du}{dx} - u\frac{d(av^*)}{dx}\right] + \frac{d}{dx}\left[bv^*u\right]. \qquad(10.57)$$
Hence, following the expressions of Sect. 10.2, we define $L_x^\dagger$ such that

$$\left(L_x^\dagger v\right)^* \equiv \frac{d^2(av^*)}{dx^2} - \frac{d(bv^*)}{dx} + cv^*. \qquad(10.58)$$

That is, we have

$$L_x^\dagger v = \frac{d^2(a^*v)}{dx^2} - \frac{d(b^*v)}{dx} + c^*v. \qquad(10.59)$$
Assuming that the relevant SOLDE is defined in [r, s] and integrating (10.61) within
that interval, we get
$$\int_r^s dx\left[v^*(L_xu) - \left(L_x^\dagger v\right)^*u\right] = \left[av^*\frac{du}{dx} - u\frac{d(av^*)}{dx} + buv^*\right]_r^s. \qquad(10.62)$$
Using the definition of an inner product described in (1.128) and rewriting (10.62),
we have
$$\langle v|L_xu\rangle - \langle L_x^\dagger v|u\rangle = \left[av^*\frac{du}{dx} - u\frac{d(av^*)}{dx} + buv^*\right]_r^s.$$
Here if RHS of the above (i.e., the surface term of the above expression) vanishes,
we get
$$\langle v|L_xu\rangle = \langle L_x^\dagger v|u\rangle.$$
Bearing in mind this situation, let us seek a condition under which the differential
operator Lx is Hermitian. Suppose here that a(x), b(x), and c(x) are all real and that
$$\frac{da(x)}{dx} = b(x). \qquad(10.63)$$
$$L_x^\dagger = a(x)\frac{d^2}{dx^2} + b(x)\frac{d}{dx} + c(x) = L_x.$$
Thus, we are successful in constituting a self-adjoint operator Lx. In that case, (10.62)
can be rewritten as
$$\int_r^s dx\left[v^*(L_xu) - (L_xv)^*u\right] = \left[a\left(v^*\frac{du}{dx} - u\frac{dv^*}{dx}\right)\right]_r^s. \qquad(10.64)$$
Notice that b(x) is eliminated from (10.64). If RHS of (10.64) vanishes, we get
This notation is consistent with (1.119) and the Hermiticity of Lx becomes well
defined.
If we do not have the condition da(x)/dx = b(x), how can we deal with the problem? The answer is that, following the procedures in Sect. 10.2, we can convert Lx to a self-adjoint operator by multiplying Lx by a weight function w(x) introduced in (10.26), (10.27), and (10.31). Replacing a(x), b(x), and c(x) with w(x)a(x), w(x)b(x), and w(x)c(x), respectively, in the identity (10.57), we rewrite (10.57) as
$$v^*\left(aw\frac{d^2u}{dx^2} + bw\frac{du}{dx} + cwu\right) - u\left\{\frac{d^2(awv^*)}{dx^2} - \frac{d(bwv^*)}{dx} + cwv^*\right\} = \frac{d}{dx}\left[awv^*\frac{du}{dx} - u\frac{d(awv^*)}{dx}\right] + \frac{d}{dx}\left[bwuv^*\right]. \qquad(10.65)$$
Let us calculate the braces { } of the second term on the LHS of (10.65). Using (10.26) and (10.27), we have

$$\begin{aligned}\frac{d^2(awv^*)}{dx^2} - \frac{d(bwv^*)}{dx} + cwv^* &= \left[(aw)'v^* + aw(v^*)'\right]' - \left[(bw)'v^* + bw(v^*)'\right] + cwv^* \\ &= \left[bwv^* + aw(v^*)'\right]' - (bw)'v^* - bw(v^*)' + cwv^* \\ &= w\left[a(v^*)'' + b(v^*)' + cv^*\right] = w\left[a^*(v^*)'' + b^*(v^*)' + c^*v^*\right] \\ &= w\left(av'' + bv' + cv\right)^*. \end{aligned}\qquad(10.66)$$
The second last equality of (10.66) is based on the assumption that a(x), b(x), and
c(x) are real functions. Meanwhile, for RHS of (10.65) we have
$$\begin{aligned}\frac{d}{dx}\left[awv^*\frac{du}{dx} - u\frac{d(awv^*)}{dx}\right] + \frac{d}{dx}\left[bwuv^*\right] &= \frac{d}{dx}\left[awv^*\frac{du}{dx} - u(aw)'v^* - uaw\frac{dv^*}{dx}\right] + \frac{d}{dx}\left[bwuv^*\right] \\ &= \frac{d}{dx}\left[awv^*\frac{du}{dx} - ubwv^* - uaw\frac{dv^*}{dx}\right] + \frac{d}{dx}\left[bwuv^*\right] \\ &= \frac{d}{dx}\left[aw\left(v^*\frac{du}{dx} - u\frac{dv^*}{dx}\right)\right] = \frac{d}{dx}\left[p\left(v^*\frac{du}{dx} - u\frac{dv^*}{dx}\right)\right]. \end{aligned}\qquad(10.67)$$
With the last equality of (10.67), we used (10.26). Using (10.66) and (10.67), we
rewrite (10.65) once again as
$$v^*w\left(a\frac{d^2u}{dx^2} + b\frac{du}{dx} + cu\right) - uw\left(a\frac{d^2v}{dx^2} + b\frac{dv}{dx} + cv\right)^* = \frac{d}{dx}\left[p\left(v^*\frac{du}{dx} - u\frac{dv^*}{dx}\right)\right]. \qquad(10.68)$$
The relations (10.69) along with (10.62) are called the generalized Green’s identity.
We emphasize that as far as the coefficients a(x), b(x), and c(x) in (10.55) are real
functions, the associated differential operator Lx can be converted to a self-adjoint
form following the procedures of (10.66) and (10.67).
In the above, LHS of the original homogeneous differential equation (10.5) is
rewritten as
10.3 Second-Order Differential Operators 393
$$a(x)w(x)\frac{d^2u}{dx^2} + b(x)w(x)\frac{du}{dx} + c(x)w(x)u = \frac{d}{dx}\left[p(x)\frac{du}{dx}\right] + cw(x)u,$$
where we have

$$p(x) = a(x)w(x) \quad\text{and}\quad \frac{dp(x)}{dx} = b(x)w(x). \qquad(10.71)$$
$$B_1(u) = u(r) = \sigma_1. \qquad(10.74)$$

Further putting

$$\sigma_1 = \sigma_2 = 0, \qquad(10.76)$$
For the RHS of (10.69) to vanish, it suffices to define $B_1^\dagger(u)$ and $B_2^\dagger(u)$ such that
In this manner, we can readily construct the homogeneous adjoint BCs the same as
those of (10.77) so that Lx can be Hermitian.
We list several prescriptions of typical BCs below.
Yet, care should be taken when handling the RHS of (10.69), i.e., the surface terms, because conditions (i) to (iii) are sufficient but not necessary for the surface terms to vanish; admissible conditions are not limited to them. Meanwhile, we often have to deal with nonvanishing surface terms. In that case, we have to start with (10.62) instead of (10.69).
In Sect. 10.2, we mentioned the definition of Hermiticity of the differential
operator in such a way that the said operator is self-adjoint and that homogeneous
BCs and homogeneous adjoint BCs are the same. In light of the above argument,
however, we may relax the conditions for a differential operator to be Hermitian.
This is particularly the case when p(x) ¼ a(x)w(x) in (10.69) vanishes at both the
endpoints. We will encounter such a situation in Sect. 10.7.
$$L_xu(x) = d(x) \qquad(10.81)$$
under homogeneous adjoint BCs [1, 2] with an inhomogeneous term h(x) being an
arbitrary function as well.
Let us describe the above relations as
$$GL = LG = E, \qquad(10.84)$$
This implies that (10.81) has been solved and the solution is given by G| di. Since an
inverse operation to differentiation is integration, G is expected to be an integral
operator.
We have
Using a weight function w(x), we generalize an inner product of (1.128) such that
$$\langle g|f\rangle \equiv \int_r^s w(x)g^*(x)f(x)\,dx. \qquad(10.88)$$

$$|f\rangle = \int_r^s dx\,w(x)f(x)|x\rangle. \qquad(10.89)$$
Alternatively, we have
$$f(x') = \int_r^s dx\,f(x)\delta(x - x'). \qquad(10.92)$$
$$w(x)\langle x'|x\rangle = \delta(x - x') \quad\text{or}\quad \langle x'|x\rangle = \frac{\delta(x - x')}{w(x)} = \frac{\delta(x' - x)}{w(x)}. \qquad(10.94)$$
$$L_xG(x, y) = \frac{\delta(x - y)}{w(x)}. \qquad(10.95)$$
$$L_x^\dagger g(x, y) = \frac{\delta(x - y)}{w(x)}. \qquad(10.96)$$
To arrive at (10.96), we start the discussion by assuming an operator $(L_x^\dagger)^{-1}$ such that $(L_x^\dagger)^{-1} \equiv g$ with $gL^\dagger = L^\dagger g = E$.
The function G(x, y) is called a Green’s function and g(x, y) is said to be an adjoint
Green’s function. Handling of Green’s functions and adjoint Green’s functions is
based upon (10.95) and (10.96), respectively. As (10.81) is defined in a domain
10.4 Green’s Functions 397
That is, G(x, y) and g(x, y) satisfy the homogeneous equation with respect to the
variable x. Accordingly, we require G(x, y) and g(x, y) to satisfy the same homoge-
neous BCs with respect to the variable x as those imposed upon u(x) and v(x) of
(10.81) and (10.82), respectively [1].
The relation (10.88) can be obtained as follows: Operating hg| on (10.89), we
have
$$\langle g|f\rangle = \int_r^s dx\,w(x)f(x)\langle g|x\rangle = \int_r^s w(x)g^*(x)f(x)\,dx, \qquad(10.98)$$
For this, see (1.113) where A is replaced with an identity operator E with regard to a
complex conjugate of an inner product of two vectors. Also see (13.2) of Sect. 13.1.
If in (10.69) the surface term (i.e., RHS) vanishes under appropriate conditions,
e.g., (10.80), we have
$$\int_r^s dx\,w(x)\left\{v^*(L_xu) - \left(L_x^\dagger v\right)^*u\right\} = 0, \qquad(10.100)$$
which is called Green’s identity. Since (10.100) is derived from identities (10.56),
(10.100) is an identity as well (as a terminology of Green’s identity shows).
Therefore, (10.100) must hold with any functions u and v so far as they satisfy
homogeneous BCs. Thus, replacing v in (10.100) with g(x, y) and using (10.96)
together with (10.81), we have
$$\begin{aligned}\int_r^s dx\,w(x)&\left\{g^*(x, y)\left[L_xu(x)\right] - \left[L_x^\dagger g(x, y)\right]^*u(x)\right\} \\ &= \int_r^s dx\,w(x)\left\{g^*(x, y)d(x) - \frac{\delta(x - y)}{w(x)}u(x)\right\} \\ &= \int_r^s dx\,w(x)g^*(x, y)d(x) - u(y) = 0, \end{aligned}\qquad(10.101)$$
where with the second to last equality we used a property of the δ functions. Also notice that δ(x − y)/w(x) is a real function. Rewriting (10.101), we get
$$u(y) = \int_r^s dx\,w(x)g^*(x, y)d(x). \qquad(10.102)$$
Similarly, replacing u in (10.100) with G(x, y) and using (10.95) together with
(10.82), we have
$$v(y) = \int_r^s dx\,w(x)G^*(x, y)h(x). \qquad(10.103)$$
Replacing u and v in (10.100) with G(x, q) and g(x, t), respectively, we have
$$\int_r^s dx\,w(x)\left\{g^*(x, t)\left[L_xG(x, q)\right] - \left[L_x^\dagger g(x, t)\right]^*G(x, q)\right\} = 0. \qquad(10.104)$$
Notice that we have chosen q and t for the second argument y in (10.95) and (10.96),
respectively. Inserting (10.95) and (10.96) into the above equation after changing
arguments, we have
$$\int_r^s dx\,w(x)\left[g^*(x, t)\frac{\delta(x - q)}{w(x)} - G(x, q)\frac{\delta(x - t)}{w(x)}\right] = 0. \qquad(10.105)$$
Thus, we get

$$g^*(q, t) = G(t, q). \qquad(10.106)$$
This implies that G(t, q) must satisfy the adjoint BCs with respect to the second
argument q. Inserting (10.106) into (10.102), we get
$$u(y) = \int_r^s dx\,w(x)G(y, x)d(x). \qquad(10.107)$$
Or, we have
10.4 Green’s Functions 399
$$v(x) = \int_r^s dy\,w(y)g(x, y)h(y). \qquad(10.110)$$
In Sect. 10.3 we assumed that the coefficients a(x), b(x), and c(x) are real to assure that Lx is Hermitian [1]. On this condition G(x, y) is real as well (vide infra). Then we have
That is, G(x, y) is real symmetric with respect to the arguments x and y.
To be able to apply Green’s functions to practical use, we will have to estimate a
behavior of the Green’s function near x ¼ y. This is because in light of (10.95) and
(10.96) there is a “jump” at x ¼ y.
When we deal with a case where a self-adjoint operator is relevant, using a
function p(x) of (10.69) we have
$$\frac{a(x)}{p(x)}\frac{\partial}{\partial x}\left[p\frac{\partial G}{\partial x}\right] + c(x)G = \frac{\delta(x - y)}{w(x)}. \qquad(10.114)$$
$$f(x)\delta(x) = f(0)\delta(x) \quad\text{or}\quad f(x)\delta(x - y) = f(y)\delta(x - y), \qquad(10.116)$$
we have
$$\frac{\partial}{\partial x}\left(p\frac{\partial G}{\partial x}\right) = \frac{p(y)}{a(y)}\frac{\delta(x - y)}{w(y)} - \frac{p(x)c(x)}{a(x)}G(x, y). \qquad(10.117)$$
$$\theta(x) = \begin{cases}1 & (x > 0) \\ 0 & (x < 0).\end{cases} \qquad(10.119)$$
$$\frac{d\theta(x)}{dx} = \delta(x). \qquad(10.120)$$
In the RHS of (10.118) the first term has a discontinuity at x = y because of θ(x − y), whereas the second term is continuous with respect to y. Thus, we have
$$\lim_{\varepsilon\to+0}\left[p(y + \varepsilon)\left.\frac{\partial G(x, y)}{\partial x}\right|_{x=y+\varepsilon} - p(y - \varepsilon)\left.\frac{\partial G(x, y)}{\partial x}\right|_{x=y-\varepsilon}\right] = \lim_{\varepsilon\to+0}\frac{p(y)}{a(y)w(y)}\left[\theta(+\varepsilon) - \theta(-\varepsilon)\right] = \frac{p(y)}{a(y)w(y)}. \qquad(10.121)$$
Since p(y) is continuous with respect to the argument y, this factor drops off and we get

$$\lim_{\varepsilon\to+0}\left[\left.\frac{\partial G(x, y)}{\partial x}\right|_{x=y+\varepsilon} - \left.\frac{\partial G(x, y)}{\partial x}\right|_{x=y-\varepsilon}\right] = \frac{1}{a(y)w(y)}. \qquad(10.122)$$
Thus, ∂G(x, y)/∂x is accompanied by a discontinuity at x = y of magnitude 1/[a(y)w(y)].
Since RHS of (10.122) is continuous with respect to the argument y, integrating
(10.122) again with respect to x, we find that G(x, y) is continuous at x ¼ y. These
properties of G(x, y) are useful to calculate Green’s functions in practical use. We
will encounter several examples in next sections.
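The continuity of G together with the 1/[a(y)w(y)] jump of ∂G/∂x can be seen numerically. The sketch below uses an illustrative operator not taken from the text, L = d²/dx² on [0, 1] with Dirichlet BCs (so a = w = 1 and the jump should be 1); the closed form of G used here is the standard one for this operator:

```python
import numpy as np

# Illustrative Green's function of L = d^2/dx^2 on [0, 1] with
# Dirichlet BCs G(0, y) = G(1, y) = 0, where a(x) = w(x) = 1:
#   G(x, y) = x (y - 1) for x < y,   y (x - 1) for x > y.
def G(x, y):
    return np.where(x < y, x * (y - 1.0), y * (x - 1.0))

y, eps, h = 0.3, 1e-6, 1e-6

# G itself is continuous across x = y ...
cont = float(G(y + eps, y) - G(y - eps, y))

# ... while dG/dx jumps there by 1/[a(y) w(y)] = 1, cf. (10.122).
slope_right = float(G(y + eps + h, y) - G(y + eps, y)) / h
slope_left = float(G(y - eps, y) - G(y - eps - h, y)) / h
jump = slope_right - slope_left

print(cont)  # ~0
print(jump)  # ~1
```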
10.5 Construction of Green’s Functions 401
Suppose that there are two Green's functions that satisfy the same homogeneous BCs. Let G(x, y) and $\widetilde{G}(x, y)$ be such functions. Then, we must have
$$L_xG(x, y) = \frac{\delta(x - y)}{w(x)} \quad\text{and}\quad L_x\widetilde{G}(x, y) = \frac{\delta(x - y)}{w(x)}. \qquad(10.123)$$
$$G(x, y) - \widetilde{G}(x, y) \equiv 0 \quad\text{or}\quad G(x, y) \equiv \widetilde{G}(x, y). \qquad(10.125)$$
$$L_xG^*(x, y) = \frac{\delta(x - y)}{w(x)}. \qquad(10.126)$$
Notice here that both δ(x − y) and w(x) are real functions. Subtracting (10.95) from (10.126), we have
Again, from the uniqueness of the Green's function, we get G*(x, y) = G(x, y); i.e., G(x, y) is real accordingly. This is independent of the specific structure of Lx. In other words, so far as we are dealing with real coefficients a(x), b(x), and c(x), G(x, y) is real whether or not Lx is self-adjoint.
$$a(x)\frac{d^2u}{dx^2} + b(x)\frac{du}{dx} + c(x)u = d(x), \qquad(10.2)$$
where the coefficients a(x), b(x), and c(x) are real. In this case, if d(x) ≡ 0 in (10.108), namely, the SOLDE is a homogeneous equation, we have a solution u(x) ≡ 0 on the basis of (10.108). If, on the other hand, we have inhomogeneous boundary conditions (BCs), additional terms appear on the RHS of (10.108) in both the cases of homogeneous and inhomogeneous equations. In this section, we examine how we can deal with this problem.
Following the remarks made in Sect. 10.3, we start with (10.62) or (10.69). If we
deal with a self-adjoint or Hermitian operator, we can apply (10.69) to the problem.
In a more general case where the operator is not self-adjoint, (10.62) is useful. In this respect, Sect. 10.6 will give us a good opportunity to see this.
In Sect. 10.3, we mentioned that we may relax the definition of Hermiticity of the
differential operator in the case where the surface term vanishes. Meanwhile, we
should bear in mind that the Green’s functions and adjoint Green’s functions are
constructed using homogeneous BCs regardless of whether we are concerned with a
homogeneous equation or inhomogeneous equation. Thus, even if the surface terms
do not vanish, we may regard the differential operator as Hermitian. This is because
we deal with essentially the same Green’s function to solve a problem with both the
cases of homogeneous equation and inhomogeneous equations (vide infra). Notice
also that whether or not RHS vanishes, we are to use the same Green’s function
[1]. In this sense, we do not have to be too strict with the definition of Hermiticity.
Now, suppose that for a differential (self-adjoint) operator Lx we are given
$$\int_r^s dx\,w(x)\left\{v^*(L_xu) - \left[L_xv\right]^*u\right\} = \left[p(x)\left(v^*\frac{du}{dx} - u\frac{dv^*}{dx}\right)\right]_r^s, \qquad(10.69)$$
where w(x) > 0 in a domain [r, s] and p(x) is a real function. Note that since (10.69) is
an identity, we may choose G(x, y) for v with an appropriate choice of w(x). Then
from (10.95) and (10.112), we have
$$\int_r^s dx\,w(x)\left[G^*(y, x)d(x) - \frac{\delta(x - y)}{w(x)}u(x)\right] = \int_r^s dx\,w(x)\left[G(y, x)d(x) - \frac{\delta(x - y)}{w(x)}u(x)\right] = \left[p(x)\left(G(y, x)\frac{du(x)}{dx} - u(x)\frac{\partial G(y, x)}{\partial x}\right)\right]_{x=r}^{x=s}. \qquad(10.127)$$
Note in (10.127) we used the fact that both δ(x y) and w(x) are real.
Using a property of the δ function, we get
10.5 Construction of Green’s Functions 403
$$u(y) = \int_r^s dx\,w(x)G(y, x)d(x) - \left[p(x)\left(G(y, x)\frac{du(x)}{dx} - u(x)\frac{\partial G(y, x)}{\partial x}\right)\right]_{x=r}^{x=s}. \qquad(10.128)$$
The function G(x, y) satisfies homogeneous BCs. Hence, if we assume, e.g., the
Dirichlet BCs [see (10.80)], we have
Using the symmetric property of G(x, y) with respect to arguments x and y, from
(10.129) we get
Thus, the first term of the surface terms of (10.128) is eliminated to yield
$$u(y) = \int_r^s dx\,w(x)G(y, x)d(x) + \left[p(x)u(x)\frac{\partial G(y, x)}{\partial x}\right]_{x=r}^{x=s}.$$
Then, (i) substituting surface terms of u(s) and u(r) that are associated with the
inhomogeneous BCs described as
$$L_xG(x, y) = 0, \qquad(10.133)$$
where Lx is given by
$$L_x = a(x)\frac{d^2}{dx^2} + b(x)\frac{d}{dx} + c(x). \qquad(10.55)$$
The differential equation Lxu = d(x) is defined within an interval [r, s], where r may be −∞ and s may be +∞.
From now on, we regard a(x), b(x), and c(x) as real functions. From (10.133), we
expect the Green’s function to be described as a linear combination of a fundamental
set of solutions u1(x) and u2(x). Here the fundamental set of solutions are given by
two linearly independent solutions of a homogeneous equation Lxu ¼ 0. Then we
should be able to express G(x, y) as a combination of F1(x, y) and F2(x, y) that are
described as
where c1, c2, d1, and d2 are arbitrary (complex) constants to be determined later.
These constants are given as a function of y. The combination has to be made such
that
Notice that F1(x, y) and F2(x, y) are “ordinary” functions and that G(x, y) is not,
because G(x, y) contains the θ(x) function.
If we have
Hence, we get
Green’s function from F1(x, y) and F2(x, y), Lx should be Hermitian. However, if we
had, e.g., F1(x, y) ¼ x r and F2(x, y) ¼ y s, G(x, y) 6¼ G(y, x), and so Lx would not
be Hermitian.
The Green’s functions must satisfy the homogeneous BCs. That is,
$$\frac{d^2u}{dx^2} + u = 1. \qquad(10.141)$$
We assume that a domain of the argument x is [0, L]. We set boundary conditions
such that
Thus, if at least one of σ 1 and σ 2 is not zero, we are dealing with an inhomogeneous
differential equation under inhomogeneous BCs.
Next, let us seek conditions that the Green’s function satisfies. We also seek a
fundamental set of solutions of a homogeneous equation described by
$$\frac{d^2u}{dx^2} + u = 0. \qquad(10.143)$$
Then, we have
The functions F1(x, y) and F2(x, y) must satisfy the following BCs such that
Thus, we have
$$F_1(x, y) = c_1\left(e^{ix} - e^{-ix}\right), \quad F_2(x, y) = d_1\left(e^{ix} - e^{2iL}e^{-ix}\right). \qquad(10.146)$$
Therefore, at x ¼ y we have
$$c_1\left(e^{iy} - e^{-iy}\right) = d_1\left(e^{iy} - e^{2iL}e^{-iy}\right). \qquad(10.147)$$
This is because both F1(x, y) and F2(x, y) are ordinary functions and supposed to be
differentiable at any x. The relation (10.148) then reads as
$$id_1\left(e^{iy} + e^{2iL}e^{-iy}\right) - ic_1\left(e^{iy} + e^{-iy}\right) = 1. \qquad(10.149)$$
This can readily be integrated to yield a solution of the inhomogeneous equation such that
Next, let us consider the surface term. This is given by the second term of
(10.131). We get
$$\left.\frac{\partial F_1(x, y)}{\partial y}\right|_{y=L} = \frac{\sin x}{\sin L}, \qquad \left.\frac{\partial F_2(x, y)}{\partial y}\right|_{y=0} = \frac{\cos(x - 2L) - \cos x}{2\sin^2 L}. \qquad(10.157)$$
Therefore, with the inhomogeneous BCs we have the following solution for the
inhomogeneous equation:
$$u(x) \equiv 1.$$
Looking at (10.141), we find that u(x) ≡ 1 is certainly a solution of (10.141) with inhomogeneous BCs of σ1 = σ2 = 1. The uniqueness of the solution then ensures that u(x) ≡ 1 is the sole solution under the said BCs.
From (10.154), we find that G(x, y) has a singularity at L = nπ (n: integer). This is associated with the fact that the homogeneous equation (10.143) has a nontrivial solution, e.g., u(x) = sin x, under homogeneous BCs u(0) = u(L) = 0. The present
situation is essentially the same as that of Example 1.1 of Sect. 1.3. In other words,
when λ ¼ 1 in (1.61), the form of a differential equation is identical to (10.143) with
virtually the same Dirichlet conditions. The point is that (10.143) can be viewed as a
homogeneous equation and, at the same time, as an eigenvalue equation. In such a
case, a Green’s function approach will fail.
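A numerical sketch of Example 10.4 under homogeneous BCs: we assemble G(x, y) from (10.146), with L = 1 chosen to avoid the singular values L = nπ. The explicit closed forms of G and of the resulting u below are our reconstruction (obtained by fixing c1, d1 from (10.147) and (10.149) and carrying out the integral), not expressions quoted from the text:

```python
import numpy as np

L = 1.0  # domain length; avoid L = n*pi, where G is singular

# Green's function assembled from (10.146); this explicit closed form is
# our reconstruction consistent with G(0, y) = G(L, y) = 0 and the unit
# jump of dG/dx at x = y:
def G(x, y):
    lo, hi = np.minimum(x, y), np.maximum(x, y)
    return np.sin(lo) * np.sin(hi - L) / np.sin(L)

def trapezoid(f, x):
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * (x[1:] - x[:-1])))

# u(y) = Integral_0^L G(y, x) d(x) dx with d(x) = 1 (homogeneous BCs)
xs = np.linspace(0.0, L, 200001)
def u(y):
    return trapezoid(G(xs, y), xs)

# Carrying out the integral analytically gives
# u(y) = 1 + [sin(y - L) - sin y]/sin L; compare at a sample point.
y0 = 0.4
u_exact = 1 + (np.sin(y0 - L) - np.sin(y0)) / np.sin(L)
print(abs(u(y0) - u_exact))  # ~0
```

One can also check u(0) = u(L) = 0 directly, confirming the homogeneous Dirichlet BCs.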
The IVPs frequently appear in mathematical physics. The relevant conditions are
dealt with as BCs in the theory of differential equations. With boundary functionals
B1(u) and B2(u) of (10.3) and (10.4), setting α1 ¼ β2 ¼ 1 and other coefficients as
zero, we get
$$B_1(u) = u(p) = \sigma_1 \quad\text{and}\quad B_2(u) = \left.\frac{du}{dx}\right|_{x=p} = \sigma_2. \qquad(10.159)$$
In the above, note that we choose [r, s] for a domain of x. The points r and s can be
infinity as before. Any point p within the domain [r, s] may be designated as a special
point on which the BCs (10.159) are imposed. Conditions of this kind, set at a single point of the argument, are particularly prominent among BCs and are usually called initial conditions. In this section, we investigate fundamental characteristics of IVPs.
Suppose that we have
10.6 Initial Value Problems (IVPs) 409
$$u(p) = \left.\frac{du}{dx}\right|_{x=p} = 0, \qquad(10.160)$$
$$L_x = a(x)\frac{d^2}{dx^2} + b(x)\frac{d}{dx} + c(x), \qquad(10.55)$$
$$L_xu(x) = 0. \qquad(10.161)$$
A general solution u(x) for (10.161) is given by a linear combination of u1(x) and
u2(x) such that
where c1 and c2 are arbitrary (complex) constants. Suppose that we have homoge-
neous BCs expressed by (10.160). Then, we have
Since the matrix represents the Wronskian of a fundamental set of solutions u1(x) and u2(x), its determinant never vanishes at any point p. That is, we have

$$\begin{vmatrix} u_1(p) & u_2(p) \\ u_1{}'(p) & u_2{}'(p) \end{vmatrix} \neq 0. \qquad(10.163)$$

Consequently, c1 = c2 = 0, and hence

$$u(x) \equiv 0.$$
$$\int_r^s dx\left[v^*(L_xu) - \left(L_x^\dagger v\right)^*u\right] = \left[av^*\frac{du}{dx} - u\frac{d(av^*)}{dx} + buv^*\right]_r^s. \qquad(10.62)$$
For the surface term (RHS) to vanish, for homogeneous BCs we have, e.g.,
$$u(s) = \left.\frac{du}{dx}\right|_{x=s} = 0 \quad\text{and}\quad v(r) = \left.\frac{dv}{dx}\right|_{x=r} = 0,$$
for the two sets of BCs adjoint to each other. Obviously, these are not identical
simply because the former is determined at s and the latter is determined at a different
point r. For this reason, the operator Lx is not Hermitian, even though it is formally
self-adjoint. In such a case, we would rather use Lx directly than construct a self-
adjoint operator because we cannot make the operator Hermitian either way.
Hence, unlike the preceding sections we do not need a weight function w(x); or, we may regard w(x) ≡ 1. Then, we reconsider the conditions which the Green's functions should satisfy. On the basis of the general consideration of Sect. 10.4, especially (10.86), (10.87), and (10.94), we have [2]
Therefore, we have
$$\frac{\partial^2 G(x, y)}{\partial x^2} + \frac{b(x)}{a(x)}\frac{\partial G(x, y)}{\partial x} + \frac{c(x)}{a(x)}G(x, y) = \frac{\delta(x - y)}{a(x)}. \qquad(10.164)$$
Noting that the functions other than ∂G(x, y)/∂x and θ(x − y)/a(y) are continuous, as before we have

$$\lim_{\varepsilon\to+0}\left[\left.\frac{\partial G(x, y)}{\partial x}\right|_{x=y+\varepsilon} - \left.\frac{\partial G(x, y)}{\partial x}\right|_{x=y-\varepsilon}\right] = \frac{1}{a(y)}. \qquad(10.165)$$
From a practical point of view, we may set r ¼ 0 in (10.62). Then, we can choose a
domain for [0, s] (for s > 0) or [s, 0] (for s < 0) with (10.2). For simplicity, we use
x instead of s. We consider two cases of x > 0 and x < 0.
(i) Case I (x > 0): Let u1(x) and u2(x) be a fundamental set of solutions. We define
F1(x, y) and F2(x, y) as before such that
As before, we set
Correspondingly, we have
As mentioned above, since u1(x) and u2(x) are a fundamental set of solutions, we
have c1 ¼ c2 ¼ 0. Hence, we get
$$F_1(x, y) = 0.$$
From the continuity and discontinuity conditions (10.165) imposed upon the
Green’s functions, we have
As before, we get

$$d_1 = -\frac{u_2(y)}{a(y)W(u_1(y), u_2(y))} \quad\text{and}\quad d_2 = \frac{u_1(y)}{a(y)W(u_1(y), u_2(y))},$$

where W(u1(y), u2(y)) is the Wronskian of u1(y) and u2(y). Thus, we get
Here notice that the sign is reversed in (10.170) relative to (10.168). This is because for the discontinuity condition, instead of (10.167) we have to have
This results from the fact that magnitude relationship between the arguments x and
y has been reversed in (10.169) relative to (10.166).
Summarizing the above argument, (10.168) is obtained in the domain 0 y < x;
(10.170) is obtained in the domain x < y < 0. Noting this characteristic, we define a
function such that
[Figs. 10.2 and 10.3: domains on which Θ(x, y) and Θ(x − a, y − a) take the values +1 and −1]
Note that Θ(x a, y a) can be obtained by shifting Θ(x, y) toward the positive
direction of the x- and y-axes by a (a can be either positive or negative; in Fig. 10.3
we assume a > 0). Using the Θ(x, y) function, the Green’s function is described as
we have
Notice that
As before, we set r ¼ 0 in (10.62). Also, we classify (10.62) into two cases according
as s > 0 or s < 0.
(i) Case I (x > 0, y > 0): Equation (10.62) reads as

$$\int_0^{\infty} dx\left[v^*(L_xu) - \left(L_x^\dagger v\right)^*u\right] = \left[av^*\frac{du}{dx} - u\frac{d(av^*)}{dx} + buv^*\right]_0^{\infty}. \qquad(10.176)$$
Note that in the above we used Lx†g(x, y) = δ(x − y). In Fig. 10.4, we depict the domain in which G(y, x) does not vanish. Notice that we get the domain by folding back that of Θ(x, y) (see Fig. 10.2) relative to the straight line y = x. Thus, we find that G(y, x) vanishes at x > y, and so does ∂G(y, x)/∂x; see Fig. 10.4. Namely, the second term of the RHS of (10.178) vanishes at x = ∞. In other words, g(x, y) and G(y, x) must satisfy the adjoint BCs; i.e.,
At the same time, the upper limit of integration range of (10.178) can be set at y.
Noting the above, we have
$$(\text{the first term of } (10.178)) = \int_0^y dx\,G(y, x)d(x). \qquad(10.180)$$
$$u(0) = \sigma_1 \quad\text{and}\quad \left.\frac{du}{dx}\right|_{x=0} = \sigma_2, \qquad(10.182)$$
for (10.181) along with other appropriate values, we should be able to get a unique
solution as
$$u(y) = \int_0^y dx\,G(y, x)d(x) + \left[\sigma_2a(0) - \sigma_1\frac{da(0)}{dx} + \sigma_1b(0)\right]G(y, 0) - \sigma_1a(0)\left.\frac{\partial G(y, x)}{\partial x}\right|_{x=0}. \qquad(10.183)$$
Here, we consider that Θ(x, y) ¼ 1 in this region and use (10.174). Meanwhile, from
(10.174), we have
where we used
as well as
and
(ii) Case II (x < 0, y < 0): Similarly as the above, Eq. (10.62) reads as
$$u(y) = \int_{-\infty}^{0} dx\,G(y, x)d(x) - \left[a(x)G(y, x)\frac{du(x)}{dx} - u(x)\frac{da(x)}{dx}G(y, x) - u(x)a(x)\frac{\partial G(y, x)}{\partial x} + b(x)u(x)G(y, x)\right]_{-\infty}^{0}. \qquad(10.191)$$
Comparing (10.191) with (10.183), we recognize that the sign of RHS of (10.191)
has been reversed relative to RHS of (10.183). This is also the case after exchanging
arguments x and y. Note, however, Θ(x, y) ¼ 1 in the present case. As a result, two
minus signs cancel and (10.191) takes exactly the same expression as (10.183).
Proceeding with calculations similarly, for both Cases I and II we arrive at a unified
solution represented by (10.190) throughout a domain (1, +1).
10.6.4 Examples
$$\frac{d^2u}{dx^2} + u = 1. \qquad(10.192)$$
Note that (10.192) is formally the same differential equation of (10.141). We may
encounter (10.192) when we are observing a motion of a charged harmonic oscillator
that is placed under a static electric field. We assume that a domain of the argument
x is a whole range of real numbers. We set boundary conditions such that
$$u(x) \equiv 1. \qquad(10.197)$$
This also ensures that this is a unique solution under the inhomogeneous BCs
described as σ 1 ¼ 1 and σ 2 ¼ 0.
Example 10.6: Damped Oscillator If a harmonic oscillator undergoes friction, the
oscillator exerts damped oscillation. Such an oscillator is said to be a damped
oscillator. The damped oscillator is often dealt with when we think of bound
electrons in a dielectric medium that undergo an effect of a dynamic external field
varying with time. This is the case when the electron is placed in an alternating
electric field or an electromagnetic wave.
An equation of motion of the damped oscillator is described as

$$m\frac{d^2u}{dt^2} + r\frac{du}{dt} + ku = d(t), \qquad(10.198)$$
$$m\frac{d^2u}{dt^2} + r\frac{du}{dt} + ku = 0,$$
putting
$$u = e^{i\rho t} \qquad(10.199)$$
$$-\rho^2 + \frac{ir}{m}\rho + \frac{k}{m} = 0. \qquad(10.200)$$
We call this equation a characteristic quadratic equation. We have three cases for the
solution of a quadratic equation of (10.200). Solving (10.200), we get
$$\rho = \frac{ir}{2m} \pm \sqrt{-\frac{r^2}{4m^2} + \frac{k}{m}}. \qquad(10.201)$$
(i) Equation (10.201) gives two pure imaginary roots, i.e., $-\frac{r^2}{4m^2} + \frac{k}{m} < 0$ (overdamping). (ii) The equation has double roots, i.e., $-\frac{r^2}{4m^2} + \frac{k}{m} = 0$ (critical damping). (iii) The equation has two complex roots, i.e., $-\frac{r^2}{4m^2} + \frac{k}{m} > 0$ (weak damping). Of these,
Case (ii):
The characteristic roots are given by

$$\rho = \frac{ir}{2m}.$$

Hence, one solution is

$$u_1(t) = \exp\left(-\frac{rt}{2m}\right).$$

A second linearly independent solution is obtained as

$$u_2(t) = c\frac{\partial u_1(t)}{\partial(i\rho)} = c't\exp\left(-\frac{rt}{2m}\right),$$

so that the general solution is

$$u(t) = a\exp\left(-\frac{rt}{2m}\right) + bt\exp\left(-\frac{rt}{2m}\right).$$
The most important and interesting feature emerges as a “damped oscillator” in the
next Case (iii) in many fields of natural science. We are particularly interested in
this case.
Case (iii):
Suppose that the damping is relatively weak such that the characteristic equation
has two complex roots. Let us examine further details of this case following the
prescriptions of IVPs. We divide (10.198) by m for ease of handling of the differential equation such that

$$\frac{d^2u}{dt^2} + \frac{r}{m}\frac{du}{dt} + \frac{k}{m}u = \frac{1}{m}d(t).$$
Putting
$$\omega \equiv \sqrt{-\frac{r^2}{4m^2} + \frac{k}{m}}, \qquad(10.202)$$
we have

$$u(t) = \exp\left(-\frac{rt}{2m}\right)\exp(\pm i\omega t). \qquad(10.203)$$
$$\cdots + \frac{1}{\omega}\left[-\frac{r}{2m}e^{-\frac{r}{2m}(t-\tau)}\sin\omega(t - \tau) + \omega e^{-\frac{r}{2m}(t-\tau)}\cos\omega(t - \tau)\right]\left[\delta(t - \tau)\theta(\tau) + \delta(\tau - t)\theta(-\tau)\right]. \qquad(10.206)$$
422 10 Introductory Green’s Functions
In the last term, using the properties of the δ function and the θ function, we get δ(t − τ). Note here that
where we used (10.202) for the last equality. Rearranging (10.207) once again, we
have
$$\frac{d^2G}{dt^2} + \frac{r}{m}\frac{dG}{dt} + \frac{k}{m}G = \delta(t - \tau). \qquad(10.208)$$
$$L_t \equiv \frac{d^2}{dt^2} + \frac{r}{m}\frac{d}{dt} + \frac{k}{m}, \qquad(10.209)$$
we get
Note that this expression is consistent with (10.164). Thus, we find that (10.210)
satisfies the condition (10.123) of the Green’s function, where the weight function is
identified with unity.
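For the weak-damping case, the causal Green's function built from the solutions (10.203) can be checked symbolically. The explicit form below (valid for t > τ) is our reconstruction, not an expression quoted from the text; the sketch verifies that it solves the homogeneous equation for t > τ, is continuous at t = τ, and has the unit jump in dG/dt required by (10.165) with a = w = 1:

```python
import sympy as sp

t, tau = sp.symbols('t tau', real=True)
m, r, k = sp.symbols('m r k', positive=True)

omega = sp.sqrt(k/m - r**2/(4*m**2))  # (10.202); weak damping assumed

# Causal Green's function of Lt = d^2/dt^2 + (r/m) d/dt + k/m for t > tau,
# built from the weak-damping solutions (10.203) (our reconstruction):
Gp = sp.exp(-r*(t - tau)/(2*m)) * sp.sin(omega*(t - tau)) / omega

# For t > tau it solves the homogeneous equation ...
residual = sp.simplify(sp.diff(Gp, t, 2) + (r/m)*sp.diff(Gp, t) + (k/m)*Gp)
print(residual)  # 0

# ... is continuous at t = tau, and dG/dt jumps there by 1:
cont = sp.simplify(Gp.subs(t, tau))
jump = sp.simplify(sp.diff(Gp, t).subs(t, tau))
print(cont, jump)  # 0 1
```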
Now, suppose that a sinusoidally changing external field eiΩt influences the
motion of the damped oscillator. Here we assume that an amplitude of the external
field is unity. Then, we have
$$\frac{d^2u}{dt^2} + \frac{r}{m}\frac{du}{dt} + \frac{k}{m}u = \frac{1}{m}e^{i\Omega t}. \qquad(10.211)$$
where t is an arbitrary positive or negative time. Equation (10.212) shows that with
the real part we have an external field cosΩt and that with the imaginary part we have
an external field sinΩt. To calculate (10.212) we use
$$\sin\omega(t - \tau) = \frac{1}{2i}\left[e^{i\omega(t-\tau)} - e^{-i\omega(t-\tau)}\right]. \qquad(10.213)$$
$$\begin{aligned}Cu(t) = {} & \frac{1}{m}\frac{r}{m}\Omega\sin\Omega t + \frac{1}{m}\left(\frac{r}{2m}\right)^2\cos\Omega t - \frac{1}{m}\left(\Omega^2 - \omega^2\right)\cos\Omega t \\ & + \frac{1}{m}e^{-\frac{r}{2m}t}\bigg[\left(\Omega^2 - \omega^2\right)\cos\omega t - \frac{r}{2m}\left(\Omega^2 + \omega^2\right)\frac{1}{\omega}\sin\omega t \\ & \qquad\qquad - \left(\frac{r}{2m}\right)^2\cos\omega t - \left(\frac{r}{2m}\right)^3\frac{1}{\omega}\sin\omega t\bigg], \end{aligned}\qquad(10.214)$$

where

$$C = \left(\Omega^2 - \omega^2\right)^2 + \frac{r^2}{2m^2}\left(\Omega^2 + \omega^2\right) + \left(\frac{r}{2m}\right)^4. \qquad(10.215)$$
For the imaginary part (i.e., the external field is sin Ωt) we get

$$\begin{aligned}Cu(t) = {} & -\frac{1}{m}\left(\Omega^2 - \omega^2\right)\sin\Omega t - \frac{1}{m}\frac{r}{m}\Omega\cos\Omega t + \frac{1}{m}\left(\frac{r}{2m}\right)^2\sin\Omega t \\ & + \frac{1}{m}e^{-\frac{r}{2m}t}\left[\frac{\Omega}{\omega}\left(\Omega^2 - \omega^2\right)\sin\omega t + \frac{r}{m}\Omega\cos\omega t + \left(\frac{r}{2m}\right)^2\frac{\Omega}{\omega}\sin\omega t\right]. \end{aligned}\qquad(10.216)$$
In Fig. 10.5 we show an example that depicts the position of a damped oscillator as a function of time. In Fig. 10.5a, the amplitude of the envelope gradually diminishes with time. An enlarged diagram near the origin (Fig. 10.5b) clearly reflects the initial conditions $u(0) = \dot{u}(0) = 0$. In Fig. 10.5, we put m = 1 [kg], Ω = 1 [1/s], ω = 0.94 [1/s], and r = 0.006 [kg/s]. In the above calculations, if r/m is small enough (i.e., the damping is small enough), the third and fourth orders of r/m may be ignored and the approximation is precise enough.
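The closed form can be cross-checked against direct numerical integration. In the sketch below, the closed-form expression is our reconstruction of (10.214)/(10.215), the parameters are those quoted for Fig. 10.5, and the hand-rolled RK4 integrator is an illustrative choice; it solves m u″ + r u′ + k u = cos Ωt from u(0) = u′(0) = 0:

```python
import numpy as np

# Parameters quoted for Fig. 10.5 (m in kg, r in kg/s; Omega, omega in 1/s)
m, r, Omega, omega = 1.0, 0.006, 1.0, 0.94
g = r / (2.0 * m)
k = m * (omega**2 + g**2)  # spring constant inverted from (10.202)

# (10.215) and our reconstruction of (10.214): response to cos(Omega t)
# with u(0) = u'(0) = 0.
C = (Omega**2 - omega**2)**2 + (r**2/(2*m**2))*(Omega**2 + omega**2) + g**4

def u_closed(t):
    steady = ((r/m)*Omega*np.sin(Omega*t) + g**2*np.cos(Omega*t)
              - (Omega**2 - omega**2)*np.cos(Omega*t)) / m
    trans = np.exp(-g*t) * ((Omega**2 - omega**2)*np.cos(omega*t)
                            - g*(Omega**2 + omega**2)*np.sin(omega*t)/omega
                            - g**2*np.cos(omega*t)
                            - g**3*np.sin(omega*t)/omega) / m
    return (steady + trans) / C

# Independent check: integrate m u'' + r u' + k u = cos(Omega t) by RK4.
def rhs(t, y):
    u, v = y
    return np.array([v, (np.cos(Omega*t) - r*v - k*u) / m])

y = np.array([0.0, 0.0])
n, h = 10000, 1e-3  # integrate to t = 10 s
for i in range(n):
    t = i * h
    k1 = rhs(t, y)
    k2 = rhs(t + h/2, y + h*k1/2)
    k3 = rhs(t + h/2, y + h*k2/2)
    k4 = rhs(t + h, y + h*k3)
    y = y + (h/6)*(k1 + 2*k2 + 2*k3 + k4)

err = abs(y[0] - u_closed(10.0))
print(err)  # tiny: closed form and numerical integration agree
```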
Fig. 10.5 Example of a damped oscillation as a function of t. The data are taken from (10.214). (a)
Overall profile. (b) Profile enlarged near the origin
In the case of inhomogeneous BCs, given σ1 = u(0) and σ2 = $\dot{u}(0)$, we can determine the additional term S(t) using (10.190) such that
$$S(t) = \left(\sigma_2 + \frac{r}{2m}\sigma_1\right)e^{-\frac{r}{2m}t}\frac{1}{\omega}\sin\omega t + \sigma_1e^{-\frac{r}{2m}t}\cos\omega t. \qquad(10.217)$$
This term arises from (10.190). Thus, from (10.212) and (10.217), u(t) + S(t) gives a
unique solution for the SOLDE with inhomogeneous BCs. Notice that S(t) does not
depend on the external field.
10.7 Eigenvalue Problems 425
$$a(x)\frac{d^2u}{dx^2} + b(x)\frac{du}{dx} + c(x)u = 0. \qquad(10.5)$$
$$L_x = a(x)\frac{d^2}{dx^2} + b(x)\frac{d}{dx} + c(x), \qquad(10.55)$$
$$L_xu(x) = 0. \qquad(10.218)$$
$$a(x)\frac{d^2u}{dx^2} + b(x)\frac{du}{dx} - \lambda u = 0. \qquad(10.219)$$
$$L_x = a(x)\frac{d^2}{dx^2} + b(x)\frac{d}{dx} - \lambda, \qquad(10.220)$$
$$L_xu = 0 \qquad(10.221)$$
or, defining

$$L_x = a(x)\frac{d^2}{dx^2} + b(x)\frac{d}{dx},$$

we may use

$$L_xu = \lambda u \qquad(10.222)$$
to express (10.219).
Equations (10.221) and (10.222) are essentially the same; only the expression differs. The form (10.222) is familiar to us as an eigenvalue equation. The difference between (10.5) and (10.219) is that whereas c(x) in (10.5) is a given fixed function, λ in (10.219) is a constant that may vary according to the solution u(x). One of the most essential properties of the eigenvalue problem posed in the form of (10.222) is that its solution is not uniquely determined, as we have already seen in various cases in Part I. Remember that the methods based upon the Green's function are valid for problems in which the homogeneous differential equation has only the trivial solution (i.e., identically zero) under homogeneous BCs. In contrast to this situation, even though the eigenvalue problem is basically posed as a homogeneous equation under homogeneous BCs, nontrivial solutions are expected to be obtained. In this respect, we have seen that in Part I we rejected the trivial solution (i.e., identically zero) because it has no physical meaning.
As exemplified in Part I, the eigenvalue problems that appear in mathematical
physics are closely connected to the Hermiticity of (differential) operators. This is
because in many cases an eigenvalue is required to be real. We have already
examined how we can convert a differential operator to the self-adjoint form. That
is, if we define p(x) as in (10.26), we have the self-adjoint operator as described in
(10.70). As a symbolic description, we have
\[
w(x)L_x u=\frac{d}{dx}\left[p(x)\frac{du}{dx}\right]+c(x)w(x)u. \tag{10.223}
\]
For instance, the Hermite differential equation that has already appeared as (2.118) in Sect. 2.3 is described as
\[
\frac{d^2u}{dx^2}-2x\frac{du}{dx}+2nu=0. \tag{10.225}
\]
Multiplying both sides by e^{-x²}, we have
\[
\frac{d}{dx}\left[e^{-x^2}\frac{du}{dx}\right]+2n\,e^{-x^2}u=0. \tag{10.226}
\]
Notice that the differential operator has been converted to a self-adjoint form according to (10.31), which defines a real and positive weight function e^{-x²} in the present case. The domain of the Hermite differential equation is (-∞, +∞), at the endpoints (i.e., ±∞) of which the surface term of the RHS of (10.69) approaches zero sufficiently rapidly by virtue of e^{-x²}.
In Sects. 3.4 and 3.5, in turn, we dealt with the (associated) Legendre differential equation (3.127), for which the relevant differential operator is self-adjoint. The surface term corresponding to (10.69) vanishes because (1 − ξ²) vanishes at the endpoints ξ = cos θ = ±1 (i.e., θ = 0 or π) from (3.107). Thus, the Hermiticity is automatically ensured for the (associated) Legendre differential equation as well as for the Hermite differential equation. In those cases, even though the differential equations do not satisfy any particular BCs, the Hermiticity is nonetheless ensured.
In the theory of differential equations, the aforementioned properties of the Hermitian operators have been fully investigated as the so-called Sturm–Liouville system (or problem) in the form of a homogeneous differential equation. The related differential equations are connected to classical orthogonal polynomials bearing such names as Hermite, Laguerre, Jacobi, Gegenbauer, Legendre, and Tchebichef. These equations frequently appear in quantum mechanics and electromagnetism as typical examples of the Sturm–Liouville system. They can be converted to the self-adjoint form by multiplying the original form by a weight function. The resulting equations can be expressed as
\[
\frac{d}{dx}\left[a(x)w(x)\frac{dY_n(x)}{dx}\right]+\lambda_n w(x)Y_n(x)=0, \tag{10.227}
\]
where we put (aw)′ = bw. That is, the differential equation is originally described as
\[
a(x)\frac{d^2Y_n(x)}{dx^2}+b(x)\frac{dY_n(x)}{dx}+\lambda_n Y_n(x)=0. \tag{10.229}
\]
In the case of Hermite polynomials, for instance, a(x) = 1 and w(x) = e^{-x²}. Since we have (aw)′ = -2x e^{-x²} = bw, we can put b = -2x. Examples including this case are tabulated in Table 10.2. The eigenvalues λₙ are associated with real numbers that characterize the individual physical systems. The related fields have wide applications in many branches of natural science.
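As a numerical aside that is not part of the text, the orthogonality of the Hermite polynomials under the weight e^{-x²} can be checked with Gauss–Hermite quadrature; the norm ∫Hₙ² e^{-x²} dx = √π 2ⁿ n! used below is the standard result.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss, hermval

# Gauss-Hermite nodes/weights integrate f(x) * exp(-x**2) exactly
# for any polynomial f up to degree 2*n - 1
x, w = hermgauss(20)

def H(n, x):
    # physicists' Hermite polynomial H_n evaluated at x
    c = np.zeros(n + 1)
    c[n] = 1.0
    return hermval(x, c)

inner_23 = np.sum(w * H(2, x) * H(3, x))   # different n: should vanish
inner_33 = np.sum(w * H(3, x) * H(3, x))   # squared norm of H_3
assert abs(inner_23) < 1e-10
assert abs(inner_33 - np.sqrt(np.pi) * 2**3 * 6) < 1e-8   # sqrt(pi) 2^3 3!
```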
After having converted the operator to the self-adjoint form, i.e., L_x = L_x†, instead of (10.100) we have
\[
\int_r^s dx\,w(x)\left\{v^{*}(L_x u)-\left[L_x v\right]^{*}u\right\}=0. \tag{10.230}
\]
\[
\langle\psi_j\,|\,L_x\psi_i\rangle=\langle\psi_j\,|\,\lambda_i\psi_i\rangle=\lambda_i\langle\psi_j\,|\,\psi_i\rangle;
\qquad
\langle\psi_j\,|\,L_x\psi_i\rangle=\langle L_x\psi_j\,|\,\psi_i\rangle=\langle\lambda_j\psi_j\,|\,\psi_i\rangle=\lambda_j^{*}\langle\psi_j\,|\,\psi_i\rangle. \tag{10.234}
\]
With the second and third equalities, we have used a rule of the inner product (see
Parts I and III). Therefore, we get
\[
\left(\lambda_i-\lambda_j^{*}\right)\langle\psi_j\,|\,\psi_i\rangle=0. \tag{10.235}
\]
Putting j = i and noting that ⟨ψᵢ|ψᵢ⟩ ≠ 0, we get
\[
\lambda_i-\lambda_i^{*}=0,\quad\text{i.e.,}\quad\lambda_i=\lambda_i^{*}. \tag{10.237}
\]
The relation (10.237) obviously indicates that λᵢ is real; i.e., we find that the eigenvalues of an Hermitian operator are real. If λᵢ ≠ λⱼ* = λⱼ, from (10.235) we get
\[
\langle\psi_j\,|\,\psi_i\rangle=0. \tag{10.238}
\]
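A finite-dimensional analogue of this result can be illustrated numerically (our example, not the book's): for a Hermitian matrix, numpy.linalg.eigh returns real eigenvalues and orthonormal eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(0)
# build a random Hermitian matrix M = A + A^dagger
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
M = A + A.conj().T

# eigh is specialized for Hermitian matrices
vals, vecs = np.linalg.eigh(M)

# eigenvalues are real (eigh even returns them as a float array) ...
assert np.all(np.isreal(vals))
# ... and the eigenvectors form an orthonormal set
assert np.allclose(vecs.conj().T @ vecs, np.eye(4))
# sanity: M vecs = vecs diag(vals)
assert np.allclose(M @ vecs, vecs * vals)
```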
In this part we treat vectors and their transformations in linear vector spaces so that
we can address various aspects of mathematical physics systematically but intui-
tively. We outline general principles of linear vector spaces mostly from an algebraic
point of view. Starting with abstract definition and description of vectors, we deal
with their transformation in a vector space using a matrix. An inner product is a
central concept in the theory of a linear vector space so that two vectors can be
associated with each other to yield a scalar. Unlike many of the books of linear
algebra and linear vector spaces, however, we describe canonical forms of matrices
before considering the inner product. This is because we can treat the topics in light
of the abstract theory of matrices and vector space without a concept of the inner
product. Of the canonical forms of matrices, Jordan canonical form is of paramount
importance. We study how it is constructed providing a tangible example.
In relation to the inner product space, normal operators such as Hermitian
operators and unitary operators frequently appear in quantum mechanics and elec-
tromagnetism. From a general aspect, we revisit the theory of Hermitian operators
that often appeared in both Parts I and II.
The last part deals with exponential functions of matrices. The relevant topics can
be counted as one of the important branches of applied mathematics. Also, the
exponential functions of matrices play an essential role in the theory of continuous
groups that will be dealt with in Part IV.
Chapter 11
Vectors and Their Transformation
In this chapter, we deal with the theory of finite-dimensional linear vector spaces.
Such vector spaces are spanned by a finite number of linearly independent vectors,
namely, basis vectors. In conjunction with developing an abstract concept and
theory, we mention a notion of mapping among mathematical elements. A linear
transformation of a vector is a special kind of mapping. In particular, we focus on
endomorphism within an n-dimensional vector space V^n. Here, the endomorphism is defined as a linear transformation V^n → V^n. The endomorphism is represented by an (n, n) square matrix. This is most often the case with physical and chemical
applications, when we deal with matrix algebra. In this book we focus on this type
of transformation.
A non-singular matrix plays an important role in the endomorphism. In this
connection, we consider its inverse matrix and determinant. All these fundamental
concepts supply us with a sufficient basis for better understanding of the theory of
the linear vector spaces. Through these processes, we should be able to get
acquainted with connection between algebraic and analytical approaches and gain
a broad perspective on various aspects of mathematical physics and related fields.
11.1 Vectors
From both fundamental and practical points of view, it is desirable to define linear
vector spaces in an abstract way. Suppose V is a set of elements denoted by a, b, c,
etc. called vectors. The set V is a linear vector space (or simply a vector space), if a
sum a + b 2 V is defined for any pair of vectors a and b and the elements of V satisfy
the following mathematical relations:
(a + b) + c = a + (b + c),   (11.1)
a + b = b + a,   (11.2)
a + 0 = a,   (11.3)
a + (−a) = 0.   (11.4)
In the above, 0 is called the zero vector. Furthermore, for a ∈ V, ca ∈ V is defined (c is a complex number called a scalar) and we assume the following relations among vectors and scalars:
c(a + b) = ca + cb,   (11.5)
(c + d)a = ca + da,   (11.6)
(cd)a = c(da),   (11.7)
1a = a.   (11.8)
On the basis of the above relations, we can construct the following expression called
a linear combination:
c₁a₁ + c₂a₂ + ⋯ + cₙaₙ.
Suppose that such a linear combination equals the zero vector:
c₁a₁ + c₂a₂ + ⋯ + cₙaₙ = 0.   (11.9)
If (11.9) holds only in the case where every cᵢ = 0 (1 ≤ i ≤ n), the vectors a₁, a₂, ⋯, aₙ are said to be linearly independent. In this case the relation represented by (11.9) is said to be trivial. If the relation is nontrivial (i.e., ∃cᵢ ≠ 0), those vectors are said to be linearly dependent.
If in the vector space V the maximum number of linearly independent vectors is
n, V is said to be an n-dimensional vector space and sometimes denoted by V n. In this
case any vector x of V n is expressed uniquely as a linear combination of linearly
independent vectors such that
x = x₁a₁ + x₂a₂ + ⋯ + xₙaₙ.   (11.10)
A vector space that has a finite number of basis vectors is called finite-dimensional; otherwise it is infinite-dimensional.
Alternatively, we express (11.10) as
\[
x=(a_1\;\cdots\;a_n)\begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix}. \tag{11.13}
\]
A set of coordinates $\begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix}$ is called a column vector (or a numerical vector) that indicates an "address" of the vector x with respect to the basis vectors a₁, a₂, ⋯, aₙ.
Any vector in V^n can be expressed as a linear combination of the basis vectors and, hence, we say that V^n is spanned by a₁, a₂, ⋯, aₙ. This is represented as
V^n = Span{a₁, a₂, ⋯, aₙ}.   (11.14)
Let us think of a subset W of V n (i.e., W ⊂ V n). If the following relations hold for
W, W is said to be a (linear) subspace of V n.
a, b ∈ W ⟹ a + b ∈ W,
a ∈ W ⟹ ca ∈ W.   (11.15)
These two relations ensure that the relations of (11.1)–(11.8) hold for W as well. The dimension of W is equal to or smaller than n. For instance, W = Span{a₁, a₂, ⋯, aᵣ} (r ≤ n) is a subspace of V^n. If r = n, W = V^n. Suppose that there are two subspaces W₁ = Span{a₁} and W₂ = Span{a₂}. Note that in this case W₁ ∪ W₂ is not a subspace, because W₁ ∪ W₂ does not contain a₁ + a₂. However, a set U defined by
U = {x = x₁ + x₂ ; ∀x₁ ∈ W₁, ∀x₂ ∈ W₂}   (11.16)
[Fig. 11.1: two panels of xyz coordinate frames with the origin O and points A, A′, B, P]
Fig. 11.1 Decomposition of a vector in a three-dimensional Cartesian space ℝ³ into two subspaces. (a) ℝ³ = W₁ + W₂; W₁ ∩ W₂ = Span{e₂}, where e₂ is a unit vector in the positive direction of the y-axis. (b) ℝ³ = W₁ + W₃; W₁ ∩ W₃ = {0}
W1 + W2. This implies that W1 + W2 is the smallest subspace that contains both W1
and W2.
Example 11.1 Consider a three-dimensional Cartesian space ℝ³ (Fig. 11.1). We regard the xy-plane and the yz-plane as subspaces W₁ and W₂, respectively, with ℝ³ = W₁ + W₂. In Fig. 11.1a, a vector $\overrightarrow{OB}$ (in ℝ³) is expressed as $\overrightarrow{OA}+\overrightarrow{AB}$ (i.e., a sum of a vector in W₁ and one in W₂). Alternatively, the same vector $\overrightarrow{OB}$ can be expressed as $\overrightarrow{OA'}+\overrightarrow{A'B}$. On the other hand, we can designate a subspace in a different way; i.e., in Fig. 11.1b the z-axis is chosen for a subspace W₃ instead of W₂. We have ℝ³ = W₁ + W₃ as well. In this case, however, $\overrightarrow{OB}$ is uniquely expressed as $\overrightarrow{OB}=\overrightarrow{OP}+\overrightarrow{PB}$. Notice that in Fig. 11.1a W₁ ∩ W₂ = Span{e₂}, where e₂ is a unit vector in the positive direction of the y-axis. In Fig. 11.1b, on the other hand, we have W₁ ∩ W₃ = {0}.
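Example 11.1 can be mimicked numerically (an illustration of ours, with an arbitrary sample vector): the split over W₁ and W₃ is unique, while the split over W₁ and W₂ is not.

```python
import numpy as np

v = np.array([3.0, 2.0, 5.0])   # an arbitrary vector OB in R^3

# W1 = xy-plane, W3 = z-axis: W1 ∩ W3 = {0}, so the split is unique
v_w1 = np.array([v[0], v[1], 0.0])   # projection OP onto the xy-plane
v_w3 = np.array([0.0, 0.0, v[2]])   # component PB along the z-axis
assert np.allclose(v_w1 + v_w3, v)

# W1 = xy-plane, W2 = yz-plane: W1 ∩ W2 = Span{e2}, so the split is NOT
# unique; any amount t of e2 can be shuffled between the two parts
for t in (0.0, 1.0, -2.5):
    a = np.array([v[0], t, 0.0])            # lies in W1
    b = np.array([0.0, v[1] - t, v[2]])     # lies in W2
    assert np.allclose(a + b, v)
```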
We can generalize this example to the following theorem.
Theorem 11.1 Let W₁ and W₂ be subspaces of V and V = W₁ + W₂. Then a vector x in V is uniquely expressed as
x = x₁ + x₂ (x₁ ∈ W₁, x₂ ∈ W₂)
if and only if W₁ ∩ W₂ = {0}. In this case, V is said to be the direct sum of W₁ and W₂, and we write
V = W₁ ⊕ W₂.   (11.17)
Moreover, we have
dim V = dim W₁ + dim W₂,   (11.18)
where "dim" stands for the dimension of the vector space considered. To prove (11.18), we suppose that V is an n-dimensional vector space and that W₁ and W₂ are spanned by r₁ and r₂ linearly independent vectors, respectively, such that
\[
W_1=\mathrm{Span}\left\{e_1^{(1)},e_2^{(1)},\cdots,e_{r_1}^{(1)}\right\}\quad\text{and}\quad
W_2=\mathrm{Span}\left\{e_1^{(2)},e_2^{(2)},\cdots,e_{r_2}^{(2)}\right\}. \tag{11.19}
\]
Then, we have n ≤ r₁ + r₂. This is almost trivial. Suppose r₁ + r₂ < n. Then, the (r₁ + r₂) vectors of (11.19) could not span V, and additional vectors would be needed; this contradicts V = W₁ + W₂. Thus, we should have n ≤ r₁ + r₂ accordingly. That is,
Now, let us assume that V = W₁ ⊕ W₂. Then, e_i^{(2)} (1 ≤ i ≤ r₂) must be linearly independent of e₁⁽¹⁾, e₂⁽¹⁾, ⋯, e_{r₁}⁽¹⁾. If not, e_i^{(2)} could be described as a linear combination of e₁⁽¹⁾, e₂⁽¹⁾, ⋯, e_{r₁}⁽¹⁾. But this would imply that e_i^{(2)} ∈ W₁, i.e., e_i^{(2)} ∈ W₁ ∩ W₂, in contradiction to V = W₁ ⊕ W₂, because W₁ ∩ W₂ = {0} by assumption. Likewise, e_j^{(1)} (1 ≤ j ≤ r₁) is linearly independent of e₁⁽²⁾, e₂⁽²⁾, ⋯, e_{r₂}⁽²⁾. Hence, e₁⁽¹⁾, e₂⁽¹⁾, ⋯, e_{r₁}⁽¹⁾, e₁⁽²⁾, e₂⁽²⁾, ⋯, e_{r₂}⁽²⁾ must be linearly independent. Thus,
n ≥ r₁ + r₂, because the vector space V contains these (r₁ + r₂) linearly independent vectors. Meanwhile, n ≤ r₁ + r₂ from the above. Consequently, we must have n = r₁ + r₂. Thus, we have proven that
The vector described by the first term is contained in W₁ and that described by the second term in W₂. Both terms are again expressed uniquely. Therefore, we get V = W₁ ⊕ W₂, which completes the proof.
11.2 Linear Transformations of Vectors

In the previous section we introduced vectors and their calculation rules in a linear vector space. It is natural and convenient to relate a vector to another vector, just as a function f relates a number (either real or complex) to another number such that y = f(x), where x and y are certain two numbers. A linear transformation from the vector space V to another vector space W is a mapping A: V → W that preserves linear combinations, i.e., A(ax + by) = aA(x) + bA(y). We will briefly discuss the concepts of mapping at the end of this section.
It is convenient to define the addition of linear transformations as well. Now, a vector x in the xy-plane is expressed as
\[
x=xe_1+ye_2=(e_1\;e_2)\begin{pmatrix}x\\ y\end{pmatrix}, \tag{11.27}
\]
where e1 and e2 are unit basis vectors in the xy-plane and x and y are coordinates of
the vector x in reference to e1 and e2. The expression (11.27) is consistent with
(11.13). The rotation represented in Fig. 11.2 is an example of a linear transforma-
tion. We call this rotation R. According to the definition
[Fig. 11.2: rotation of the vector x by the angle θ into x′]
\[
(e_1'\;e_2')=(e_1\;e_2)\begin{pmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{pmatrix}. \tag{11.31}
\]
Meanwhile, x′ can be expressed relative to the original basis vectors e₁ and e₂:
\[
x'=x'e_1+y'e_2. \tag{11.33}
\]
\[
x'=x\cos\theta-y\sin\theta,\qquad y'=x\sin\theta+y\cos\theta. \tag{11.34}
\]
\[
\begin{pmatrix}x'\\ y'\end{pmatrix}=\begin{pmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{pmatrix}\begin{pmatrix}x\\ y\end{pmatrix}. \tag{11.35}
\]
\[
x'=R(x)=(e_1\;e_2)\begin{pmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{pmatrix}\begin{pmatrix}x\\ y\end{pmatrix}. \tag{11.36}
\]
The above example demonstrates that the linear transformation R has a (2, 2)
matrix representation shown in (11.36). Moreover, this example obviously shows
that if a vector is expressed as a linear combination of the basis vectors, the
“coordinates” (represented by a column vector) can be transformed as well by the
same matrix.
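The rotation (11.36) can be sketched in code (our illustration): the column vector of coordinates is transformed by the same matrix, and composing rotations adds the angles.

```python
import numpy as np

def rotation(theta):
    # matrix representation of R in (11.36), w.r.t. the basis e1, e2
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s],
                     [s,  c]])

R = rotation(np.pi / 2)             # quarter turn
x = np.array([1.0, 0.0])            # coordinates of x = 1*e1 + 0*e2
xp = R @ x                          # coordinates of x' = R(x)
assert np.allclose(xp, [0.0, 1.0])  # e1 is carried onto e2

# composing two rotations adds the angles: R(a) R(b) = R(a+b)
a, b = 0.3, 1.1
assert np.allclose(rotation(a) @ rotation(b), rotation(a + b))
```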
Regarding an abstract n-dimensional linear vector space V^n, the linear vector transformation A is given by
\[
A(x)=(e_1\;\cdots\;e_n)\begin{pmatrix}a_{11} & \cdots & a_{1n}\\ \vdots & \ddots & \vdots\\ a_{n1} & \cdots & a_{nn}\end{pmatrix}\begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix}, \tag{11.37}
\]
where e₁, e₂, ⋯, eₙ are basis vectors and x₁, x₂, ⋯, xₙ are the corresponding coordinates of a vector x = Σᵢ₌₁ⁿ xᵢeᵢ. We assume that the transformation is a mapping A: V^n → V^n (i.e., an endomorphism). In this case the transformation is represented by an (n, n) matrix. Note that the matrix operates on the basis vectors from the right and that it operates on the coordinates (i.e., a column vector) from the left. In (11.37) we often omit the parentheses and simply write Ax.
Here we mention matrix notation for later convenience. We often identify a linear
vector transformation with its representation matrix and denote both transformation
and matrix by A. On this occasion, we write
\[
A=\begin{pmatrix}a_{11} & \cdots & a_{1n}\\ \vdots & \ddots & \vdots\\ a_{n1} & \cdots & a_{nn}\end{pmatrix},\qquad A=(A)_{ij}=(a_{ij}),\ \text{etc.}, \tag{11.38}
\]
where with the second expression (A)ᵢⱼ and (aᵢⱼ) represent the matrix A itself; for we frequently use indexed matrix notations such as A⁻¹, A†, and Ã. The notation (11.38) can conveniently be used in such cases. Note moreover that (aᵢⱼ) represents the matrix A as well.
Equation (11.37) has a duality such that the matrix A operates either on the basis vectors or on the coordinates. This can explicitly be written as
\[
A(x)=\left[(e_1\;\cdots\;e_n)\begin{pmatrix}a_{11} & \cdots & a_{1n}\\ \vdots & \ddots & \vdots\\ a_{n1} & \cdots & a_{nn}\end{pmatrix}\right]\begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix}
=(e_1\;\cdots\;e_n)\left[\begin{pmatrix}a_{11} & \cdots & a_{1n}\\ \vdots & \ddots & \vdots\\ a_{n1} & \cdots & a_{nn}\end{pmatrix}\begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix}\right].
\]
That is, we assume the associative law with the above expression. Making the summation representation, we have
\[
A(x)=\left(\sum_{k=1}^{n}e_k a_{k1}\;\cdots\;\sum_{k=1}^{n}e_k a_{kn}\right)\begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix}
=(e_1\;\cdots\;e_n)\begin{pmatrix}\sum_{l=1}^{n}a_{1l}x_l\\ \vdots\\ \sum_{l=1}^{n}a_{nl}x_l\end{pmatrix}
=\sum_{k=1}^{n}\sum_{l=1}^{n}e_k a_{kl}x_l=\sum_{l=1}^{n}\sum_{k=1}^{n}e_k a_{kl}x_l=\sum_{k=1}^{n}\left(\sum_{l=1}^{n}a_{kl}x_l\right)e_k. \tag{11.39}
\]
That is, the above equation can be viewed in either of two ways, i.e., as a coordinate transformation with fixed basis vectors or as a vector transformation with fixed coordinates.
where we assumed that the distributive law holds for the operation of A on (e₁ ⋯ eₙ).
Meanwhile, if in (11.37) we put xᵢ = 1 and xⱼ = 0 ( j ≠ i), from (11.39) we get
\[
A(e_i)=\sum_{k=1}^{n}e_k a_{ki}.
\]
On the basis of the linear independence of the basis vectors, Σₖ₌₁ⁿ (aᵢₖ − a′ᵢₖ)xₖ = 0 (1 ≤ i ≤ n). This relationship holds for any arbitrarily and independently chosen complex numbers xᵢ (1 ≤ i ≤ n). Therefore, we must have aᵢₖ = a′ᵢₖ (1 ≤ i, k ≤ n), meaning that the matrix representation of A is unique with regard to fixed basis vectors.
Nonetheless, if a set of vectors e₁, e₂, ⋯, eₙ does not constitute basis vectors (i.e., those vectors are linearly dependent), the aforementioned uniqueness of the matrix representation loses its meaning. For instance, in V² take vectors e₁ and e₂
such that e₁ = e₂ (i.e., the two vectors are linearly dependent) and let the transformation matrix be B = $\begin{pmatrix}1 & 0\\ 0 & 2\end{pmatrix}$. This means that e₁ should remain e₁ after the transformation. At the same time, the vector e₂ (= e₁) should be converted to 2e₂ (= 2e₁). This is impossible except for the case of e₁ = e₂ = 0. The above matrix B in its own right is an object of matrix algebra, of course.
Putting a ¼ b ¼ 0 and c ¼ d ¼ 1 in the definition of (9.25) for the linear
transformation, we obtain
A(0) = 0.   (11.44)
\[
(e_1'\;e_2')=(e_1\;e_2)\begin{pmatrix}1 & 0\\ 0 & 0\end{pmatrix}=(e_1\;0).
\]
The number dim A(V^n) is said to be the rank of the linear transformation A. That is, we write
dim A(V^n) = rank A.
Also the number dim Ker A is said to be the nullity of the linear transformation A. That is, we have
\[
A(e_{\nu+1})+\frac{c_{\nu+2}}{c_{\nu+1}}A(e_{\nu+2})+\cdots+\frac{c_n}{c_{\nu+1}}A(e_n)=0,
\]
\[
A(e_{\nu+1})+A\!\left(\frac{c_{\nu+2}}{c_{\nu+1}}e_{\nu+2}\right)+\cdots+A\!\left(\frac{c_n}{c_{\nu+1}}e_n\right)=0,
\]
\[
A\!\left(e_{\nu+1}+\frac{c_{\nu+2}}{c_{\nu+1}}e_{\nu+2}+\cdots+\frac{c_n}{c_{\nu+1}}e_n\right)=0.
\]
From the above discussion, however, this relation holds if and only if c_{ν+1} = ⋯ = cₙ = 0. This implies that Ker A ∩ V^{n−ν} = {0}. Then, we have
Example 11.2 Let e₁, e₂, e₃, e₄ be basis vectors of V⁴. Let A be an endomorphism of V⁴ described by
\[
A=\begin{pmatrix}1 & 0 & 1 & 0\\ 0 & 1 & 1 & 0\\ 1 & 0 & 1 & 0\\ 0 & 1 & 1 & 0\end{pmatrix}.
\]
We have
\[
(e_1,e_2,e_3,e_4)\begin{pmatrix}1 & 0 & 1 & 0\\ 0 & 1 & 1 & 0\\ 1 & 0 & 1 & 0\\ 0 & 1 & 1 & 0\end{pmatrix}=(e_1+e_3,\;e_2+e_4,\;e_1+e_2+e_3+e_4,\;0).
\]
Then, we find
\[
A(V^4)=\mathrm{Span}\{e_1+e_3,\;e_2+e_4\},\qquad \mathrm{Ker}\,A=\mathrm{Span}\{-e_1-e_2+e_3,\;e_4\}. \tag{11.49}
\]
\[
\begin{aligned}
x &= c_1e_1+c_2e_2+c_3e_3+c_4e_4\\
&= \frac{1}{2}(c_1+c_3)(e_1+e_3)+\frac{1}{2}(2c_2+c_3-c_1)(e_2+e_4)\\
&\quad+\frac{1}{2}(c_3-c_1)(-e_1-e_2+e_3)+\frac{1}{2}(c_1-2c_2-c_3+2c_4)\,e_4. 
\end{aligned}
\tag{11.50}
\]
Thus, x has been uniquely represented as in (11.50) with respect to the basis vectors (e₁ + e₃), (e₂ + e₄), (−e₁ − e₂ + e₃), and e₄. The linear independence of these vectors can easily be checked by equating (11.50) with zero. We also confirm that
V⁴ = A(V⁴) ⊕ Ker A.
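The claims of Example 11.2 can be verified numerically (our sketch, working with coordinate columns with respect to e₁, ⋯, e₄):

```python
import numpy as np

# matrix of Example 11.2
A = np.array([[1, 0, 1, 0],
              [0, 1, 1, 0],
              [1, 0, 1, 0],
              [0, 1, 1, 0]], dtype=float)

# rank A = dim A(V^4) = 2, hence nullity = 4 - 2 = 2
assert np.linalg.matrix_rank(A) == 2

# the claimed kernel vectors -e1 - e2 + e3 and e4 are annihilated by A
k1 = np.array([-1.0, -1.0, 1.0, 0.0])
k2 = np.array([0.0, 0.0, 0.0, 1.0])
assert np.allclose(A @ k1, 0) and np.allclose(A @ k2, 0)

# e1+e3, e2+e4, -e1-e2+e3, e4 together span V^4: stacking their
# coordinate columns gives a non-singular matrix
R = np.column_stack([[1, 0, 1, 0], [0, 1, 0, 1], [-1, -1, 1, 0], [0, 0, 0, 1]])
assert abs(np.linalg.det(R)) > 1e-12
```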
The existence of the inverse transformation plays a particularly important role in the theory of linear vector spaces. The inverse transformation is itself a linear transformation. Let x₁ = A⁻¹(y₁) and x₂ = A⁻¹(y₂). Then we have A(c₁x₁ + c₂x₂) = c₁A(x₁) + c₂A(x₂) = c₁y₁ + c₂y₂. Thus, c₁x₁ + c₂x₂ = A⁻¹(c₁y₁ + c₂y₂) = c₁A⁻¹(y₁) + c₂A⁻¹(y₂), showing that A⁻¹ is a linear transformation. As already mentioned, a matrix that represents a linear transformation A is uniquely determined with respect to fixed basis vectors. This should be the case with A⁻¹ accordingly. We have an important theorem for this.
Theorem 11.5 [1] The necessary and sufficient condition for the matrix A⁻¹ that represents the inverse transformation of A to exist is that det A ≠ 0 ("det" means the determinant). Here the matrix A represents the linear transformation A. The matrix A⁻¹ is uniquely determined and given by
\[
\left(A^{-1}\right)_{ij}=(-1)^{i+j}(M)_{ji}/(\det A), \tag{11.51}
\]
where (M)ⱼᵢ denotes the minor obtained by removing the j-th row and the i-th column of A.
On this condition, suppose that det A = 0. From the properties of determinants this implies that one of the columns of A (let it be the m-th column) can be expressed as a linear combination of the other columns of A such that
\[
A_{km}=\sum_{j\ne m}A_{kj}c_j. \tag{11.53}
\]
If A⁻¹ existed, operating with it on (11.53) would give
\[
\sum_{k=1}^{n}\left(A^{-1}\right)_{mk}\sum_{j\ne m}A_{kj}c_j=\sum_{j\ne m}\delta_{mj}c_j=0. \tag{11.54}
\]
Meanwhile, by (11.53) the LHS of (11.54) equals
\[
\sum_{k=1}^{n}\left(A^{-1}\right)_{mk}A_{km}=\delta_{mm}=1. \tag{11.55}
\]
There is an inconsistency between (11.54) and (11.55). The inconsistency resulted from the supposition that the matrix A⁻¹ exists. Therefore, we conclude that if det A = 0, A⁻¹ does not exist. Taking the contraposition of this statement, we see that if A⁻¹ exists, then det A ≠ 0. Suppose next that det A ≠ 0. In this case, on the basis of the well-established result, a unique A⁻¹ exists and is given by (11.51). This completes the proof. ∎
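Formula (11.51) can be exercised numerically (a sketch of ours; the test matrix is arbitrary):

```python
import numpy as np

def cofactor_inverse(A):
    # inverse via (11.51): (A^{-1})_ij = (-1)^{i+j} M_ji / det A,
    # where M_ji is the minor of A with row j and column i removed
    n = A.shape[0]
    d = np.linalg.det(A)
    if abs(d) < 1e-12:
        raise ValueError("det A = 0: no inverse exists")
    inv = np.empty_like(A, dtype=float)
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, j, axis=0), i, axis=1)
            inv[i, j] = (-1) ** (i + j) * np.linalg.det(minor) / d
    return inv

A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
assert np.allclose(cofactor_inverse(A), np.linalg.inv(A))
assert np.allclose(A @ cofactor_inverse(A), np.eye(3))
```

For numerical work one prefers np.linalg.inv (or better, np.linalg.solve); the cofactor form is mainly of theoretical interest.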
11.3 Inverse Matrices and Determinants

Summarizing the characteristics of the endomorphism within a finite-dimensional vector space, we have the following. Let a matrix A be
\[
A=\begin{pmatrix}a_{11} & \cdots & a_{1n}\\ \vdots & \ddots & \vdots\\ a_{n1} & \cdots & a_{nn}\end{pmatrix}.
\]
Its determinant is defined by
\[
\det A=\sum_{\sigma}\varepsilon(\sigma)\,a_{1i_1}a_{2i_2}\cdots a_{ni_n},
\]
where σ = (i₁ i₂ ⋯ iₙ) means a permutation of 1, 2, ⋯, n and ε(σ) denotes a sign of + (in the case of even permutations) or − (for odd permutations).

We also deal with triangle matrices for future discussion. An upper triangle matrix is denoted by
\[
\begin{pmatrix}a_{11} & & *\\ & \ddots & \\ 0 & & a_{nn}\end{pmatrix}, \tag{11.57}
\]
where an asterisk (*) means that the upper right off-diagonal elements can take any complex numbers (including zero), and a large zero shows that all the lower left off-diagonal elements are zero. Its determinant is given by
\[
\det A=a_{11}a_{22}\cdots a_{nn}. \tag{11.58}
\]
In fact, focusing on a_{n i_n}, we notice that a_{n i_n} does not vanish only if iₙ = n; repeating this argument for the remaining rows yields the product of the diagonal elements.

Consider now the linear transformation
\[
A(x)=y, \tag{11.60}
\]
where we have vectors such that x = Σᵢ₌₁ⁿ xᵢeᵢ and y = Σᵢ₌₁ⁿ yᵢeᵢ. In reference to the same set of basis vectors (e₁ ⋯ eₙ) and using a matrix representation, we have
\[
A(x)=(e_1\;\cdots\;e_n)\begin{pmatrix}a_{11} & \cdots & a_{1n}\\ \vdots & \ddots & \vdots\\ a_{n1} & \cdots & a_{nn}\end{pmatrix}\begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix}=(e_1\;\cdots\;e_n)\begin{pmatrix}y_1\\ \vdots\\ y_n\end{pmatrix}. \tag{11.61}
\]
From the unique representation of a vector in reference to the basis vectors, (11.61) is simply expressed as
\[
\begin{pmatrix}a_{11} & \cdots & a_{1n}\\ \vdots & \ddots & \vdots\\ a_{n1} & \cdots & a_{nn}\end{pmatrix}\begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix}=\begin{pmatrix}y_1\\ \vdots\\ y_n\end{pmatrix}. \tag{11.62}
\]
From the above discussion, for the linear transformation A to be bijective, we must have det A ≠ 0. In terms of the system of linear equations, we say that for (11.62) to have a unique solution (x₁ ⋯ xₙ)ᵀ for a given (y₁ ⋯ yₙ)ᵀ, we must have det A ≠ 0. Conversely, det A = 0 is equivalent to (11.62) having indefinitely many solutions or no solution at all. As far as matrix algebra is concerned, (11.61) is symbolically described, omitting the parentheses, as
Ax = y.   (11.64)
However, when the vector transformation is explicitly taken into account, the full
representation of (11.61) should be borne in mind.
The relations (11.60) and (11.64) can be considered as a set of simultaneous equations. A necessary and sufficient condition for (11.64) to have a unique solution x for a given y is det A ≠ 0. In that case, the solution x of (11.64) can be symbolically expressed as
x = A⁻¹y.   (11.65)
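A minimal numerical illustration (ours) of the criterion det A ≠ 0 for the unique solvability of (11.62):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
y = np.array([5.0, 10.0])

# det A != 0, so the system has the unique solution x = A^{-1} y
assert abs(np.linalg.det(A)) > 1e-12
x = np.linalg.solve(A, y)      # preferred over forming A^{-1} explicitly
assert np.allclose(A @ x, y)

# a singular matrix leaves the system with no solution or indefinite ones
S = np.array([[1.0, 2.0],
              [2.0, 4.0]])
assert abs(np.linalg.det(S)) < 1e-12
```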
This matrix transforms a vector x ¼ xe1 + ye2 + ze3 into y ¼ xe1 + ye2 as follows:
\[
\begin{pmatrix}1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 0\end{pmatrix}\begin{pmatrix}x\\ y\\ z\end{pmatrix}=\begin{pmatrix}x\\ y\\ 0\end{pmatrix}. \tag{11.67}
\]
\[
A=\begin{pmatrix}a_{11} & \cdots & a_{1n}\\ \vdots & \ddots & \vdots\\ a_{n1} & \cdots & a_{nn}\end{pmatrix}.
\]
Notice here that we often denote both a linear transformation and its corresponding matrix by the same character. After the transformation, suppose that the resulting vectors are given by e₁′, e₂′, ⋯, eₙ′. This is explicitly described as
\[
(e_1\;\cdots\;e_n)\begin{pmatrix}a_{11} & \cdots & a_{1n}\\ \vdots & \ddots & \vdots\\ a_{n1} & \cdots & a_{nn}\end{pmatrix}=(e_1'\;\cdots\;e_n'). \tag{11.69}
\]
Care should be taken not to confuse (11.70) with (11.63). Here, the set of vectors e₁′, e₂′, ⋯, eₙ′ may or may not be linearly independent. Let us operate with both sides of (11.70) from the left on a column vector (x₁ ⋯ xₙ)ᵀ and equate both sides to zero. That is,
\[
(e_1\;\cdots\;e_n)\begin{pmatrix}a_{11} & \cdots & a_{1n}\\ \vdots & \ddots & \vdots\\ a_{n1} & \cdots & a_{nn}\end{pmatrix}\begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix}=(e_1'\;\cdots\;e_n')\begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix}=0.
\]
and then e₁′, e₂′, ⋯, eₙ′ are linearly dependent, and vice versa. In case det A ≠ 0, the inverse matrix A⁻¹ exists, and so we have
\[
(e_1\;\cdots\;e_n)=(e_1'\;\cdots\;e_n')A^{-1}. \tag{11.72}
\]
In the previous steps we have seen how a linear transformation (and the corresponding matrix representation) converts a set of basis vectors (e₁ ⋯ eₙ) to another set of basis vectors (e₁′ ⋯ eₙ′). Is it then possible to find a suitable transformation between two sets of arbitrarily chosen basis vectors? The answer is yes. This is because any vector can be expressed uniquely as a linear combination of any set of basis vectors. A whole array of such linear combinations uniquely defines a transformation matrix between the two sets of basis vectors, as expressed in (11.69) and (11.72). The matrix has a nonzero determinant and hence has an inverse matrix. The concept of the transformation between basis vectors is important and very often used in various fields of natural science.
Example 11.5 We revisit Example 11.2. The relation (11.49) tells us that the basis vectors of A(V⁴) and those of Ker A span V⁴ in total. Therefore, in light of the above argument, there should be a linear transformation R between the two sets of vectors, i.e., e₁, e₂, e₃, e₄ and e₁ + e₃, e₂ + e₄, −e₁ − e₂ + e₃, e₄. Moreover, the matrix R associated with the linear transformation must be non-singular (i.e., det R ≠ 0). In fact, we find that R is expressed as
\[
R=\begin{pmatrix}1 & 0 & -1 & 0\\ 0 & 1 & -1 & 0\\ 1 & 0 & 1 & 0\\ 0 & 1 & 0 & 1\end{pmatrix}.
\]
This is because we have the following relation between the two sets of basis vectors:
\[
(e_1,e_2,e_3,e_4)\begin{pmatrix}1 & 0 & -1 & 0\\ 0 & 1 & -1 & 0\\ 1 & 0 & 1 & 0\\ 0 & 1 & 0 & 1\end{pmatrix}=(e_1+e_3,\;e_2+e_4,\;-e_1-e_2+e_3,\;e_4).
\]
\[
P(x)=(e_1\;\cdots\;e_n)\begin{pmatrix}p_{11} & \cdots & p_{1n}\\ \vdots & \ddots & \vdots\\ p_{n1} & \cdots & p_{nn}\end{pmatrix}\begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix}, \tag{11.73}
\]
where the non-singular matrix P represents the transformation P. Notice here that the
transformation and its matrix are represented by the same P. As mentioned in Sect.
11.2, the matrix P can be operated either from the right on the basis vectors or from
the left on the column vector. We explicitly write
\[
P(x)=\left[(e_1\;\cdots\;e_n)\begin{pmatrix}p_{11} & \cdots & p_{1n}\\ \vdots & \ddots & \vdots\\ p_{n1} & \cdots & p_{nn}\end{pmatrix}\right]\begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix}=(e_1'\;\cdots\;e_n')\begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix}, \tag{11.74}
\]
where (e₁′ ⋯ eₙ′) = (e₁ ⋯ eₙ)P [here P is the non-singular matrix defined in (11.73)]. Alternatively, we have
\[
P(x)=(e_1\;\cdots\;e_n)\left[\begin{pmatrix}p_{11} & \cdots & p_{1n}\\ \vdots & \ddots & \vdots\\ p_{n1} & \cdots & p_{nn}\end{pmatrix}\begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix}\right]=(e_1\;\cdots\;e_n)\begin{pmatrix}x_1'\\ \vdots\\ x_n'\end{pmatrix}, \tag{11.75}
\]
where
\[
\begin{pmatrix}x_1'\\ \vdots\\ x_n'\end{pmatrix}=\begin{pmatrix}p_{11} & \cdots & p_{1n}\\ \vdots & \ddots & \vdots\\ p_{n1} & \cdots & p_{nn}\end{pmatrix}\begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix}. \tag{11.76}
\]
Equation (11.75) gives a column vector representation regarding the vector that has
been obtained by the transformation P and is viewed in reference to the basis vectors
(e1 en). Combining (11.74) and (11.75), we get
\[
P(x)=(e_1'\;\cdots\;e_n')\begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix}=(e_1\;\cdots\;e_n)\begin{pmatrix}x_1'\\ \vdots\\ x_n'\end{pmatrix}. \tag{11.77}
\]
11.4 Basis Vectors and Their Transformations

PA′ = A_O P.   (11.80)
That is,
A′ = P⁻¹A_O P.   (11.81)
Note that since the transformation A has been performed in reference to the basis vectors (e₁ ⋯ eₙ), A_O should be used for the matrix representation. This is rewritten as
\[
A(x)=(e_1\;\cdots\;e_n)PP^{-1}A_O PP^{-1}\begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix}. \tag{11.83}
\]
In (11.84) we put
\[
(e_1\;\cdots\;e_n)P=(e_1'\;\cdots\;e_n'),\qquad P^{-1}\begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix}=\begin{pmatrix}\tilde{x}_1\\ \vdots\\ \tilde{x}_n\end{pmatrix}. \tag{11.85}
\]
Equation (11.84) gives a column vector representation of the same vector x viewed in reference to the basis set of vectors e₁, ⋯, eₙ or e₁′, ⋯, eₙ′. Equation (11.85) should not be confused with (11.76). That is, (11.76) relates the two coordinates (x₁ ⋯ xₙ)ᵀ and (x₁′ ⋯ xₙ′)ᵀ of different vectors before and after the transformation viewed in reference to the same set of basis vectors. The relation (11.85), on the other hand, relates the two coordinates (x₁ ⋯ xₙ)ᵀ and (x̃₁ ⋯ x̃ₙ)ᵀ of the same vector viewed in reference to different sets of basis vectors. Thus, from (11.83) we have
\[
A(x)=(e_1'\;\cdots\;e_n')P^{-1}A_O P\begin{pmatrix}\tilde{x}_1\\ \vdots\\ \tilde{x}_n\end{pmatrix}. \tag{11.86}
\]
Meanwhile, viewing the transformation A in reference to the basis vectors (e₁′ ⋯ eₙ′), we have
\[
A(x)=(e_1'\;\cdots\;e_n')A'\begin{pmatrix}\tilde{x}_1\\ \vdots\\ \tilde{x}_n\end{pmatrix}. \tag{11.87}
\]
A′ = P⁻¹A_O P.   (11.88)
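The change of basis (11.85) and the similarity transformation (11.88) can be sketched numerically (our illustration, with randomly chosen A_O and P, which is non-singular with probability one):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A_O = rng.normal(size=(n, n))        # matrix of A w.r.t. the old basis
P = rng.normal(size=(n, n))          # change-of-basis matrix
A_new = np.linalg.inv(P) @ A_O @ P   # A' = P^{-1} A_O P, as in (11.88)

# the same abstract vector in the two coordinate systems, cf. (11.85)
x_old = rng.normal(size=n)           # coordinates w.r.t. the old basis
x_new = np.linalg.solve(P, x_old)    # coordinates w.r.t. the new basis

# transform in either basis, then compare back in the old basis
y_old = A_O @ x_old
y_new = A_new @ x_new
assert np.allclose(P @ y_new, y_old)

# the trace is invariant under the similarity transformation, cf. (12.13)
assert np.isclose(np.trace(A_new), np.trace(A_O))
```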
In Sect. 11.4 we saw that the transformation matrices are altered depending on the basis vectors we choose. Then the following question arises: can we convert a (transformation) matrix to as simple a form as possible by similarity transformation(s)? In Sect. 11.4 we also showed that if we have two sets of basis vectors in
a linear vector space V n we can always find a non-singular transformation matrix
between the two. In conjunction with the transformation of the basis vectors, the
matrix undergoes similarity transformation. It is our task in this chapter to find a
simple form or a specific form (i.e., canonical form) of a matrix as a result of the
similarity transformation. For this purpose, we should first find eigenvalue(s) and
corresponding eigenvector(s) of the matrix. Depending upon the nature of matrices,
we get various canonical forms of matrices such as a triangle matrix and a diagonal
matrix. Regarding any form of matrices, we can treat these matrices under a unified
form called the Jordan canonical form.
Suppose that for a vector x an endomorphism A satisfies A(x) = αx, where α is a certain (complex) number. Then we say that α is an eigenvalue and that x is an eigenvector corresponding to the eigenvalue α. Using the notation of (11.37) of Sect. 11.2, we have
\[
A(x)=(e_1\;\cdots\;e_n)\begin{pmatrix}a_{11} & \cdots & a_{1n}\\ \vdots & \ddots & \vdots\\ a_{n1} & \cdots & a_{nn}\end{pmatrix}\begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix}=\alpha(e_1\;\cdots\;e_n)\begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix}.
\]
Ax = αx.   (12.2)
(A − αE)x = 0,   (12.3)
where E is an (n, n) unit matrix. Equations (12.2) and (12.3) are said to be an eigenvalue equation (or eigenequation). In (12.2) or (12.3), x = 0 always holds as a trivial solution. Consequently, for x ≠ 0 to be a solution we must have
|A − αE| = 0,   (12.4)
we have
aₙ = (−1)ⁿ|A|.   (12.8)
The characteristic equation f_A(x) = 0 has n roots including possible multiple roots. Let those roots be α₁, ⋯, αₙ (some of them may be identical). Then we have
\[
f_A(x)=\prod_{i=1}^{n}(x-\alpha_i). \tag{12.9}
\]
This leads to invariance of the trace under a similarity transformation. That is,
Tr(P⁻¹AP) = Tr A.   (12.13)
Next, consider an upper triangle matrix
\[
T=\begin{pmatrix}a_{11} & & *\\ & \ddots & \\ 0 & & a_{nn}\end{pmatrix}. \tag{12.14}
\]
462 12 Canonical Forms of Matrices
The matrix of this type is thought to be one of the canonical forms of matrices. Its characteristic polynomial f_T(x) is
\[
f_T(x)=|xE-T|=\begin{vmatrix}x-a_{11} & & *\\ & \ddots & \\ 0 & & x-a_{nn}\end{vmatrix}. \tag{12.15}
\]
Therefore, we get
\[
f_T(x)=\prod_{i=1}^{n}(x-a_{ii}), \tag{12.16}
\]
where we used (11.58). From (12.14) and (12.16), we find that the eigenvalues of a triangle matrix are given by its diagonal elements accordingly. Our immediate task will be to examine whether and how a given matrix is converted to a triangle matrix through a similarity transformation. The following theorem is important.
Theorem 12.1 Every (n, n) square matrix can be converted to a triangle matrix by
similarity transformation [1].
Proof A triangle matrix is either an "upper" triangle matrix [in which all the lower left off-diagonal elements are zero; see (12.14)] or a "lower" triangle matrix (in which all the upper right off-diagonal elements are zero). In the present case we show the proof for the upper triangle matrix. Regarding the lower triangle matrix the theorem is proven in a similar manner.
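Theorem 12.1 can be illustrated numerically. A full triangularization is provided by the Schur decomposition (available, for example, as scipy.linalg.schur); the NumPy-only sketch below (ours) treats the generic case of distinct eigenvalues, where the triangle form obtained by the similarity transformation is in fact diagonal.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(4, 4))

# np.linalg.eig gives A P = P diag(vals); for distinct eigenvalues P is
# non-singular, and P^{-1} A P is diagonal, a special case of the triangle
# form promised by Theorem 12.1
vals, P = np.linalg.eig(A)
T = np.linalg.inv(P) @ A @ P

# off-diagonal entries vanish (up to round-off) ...
assert np.allclose(T - np.diag(np.diag(T)), 0, atol=1e-8)
# ... and the diagonal carries the eigenvalues, in order
assert np.allclose(np.diag(T), vals)
```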
The proof is due to mathematical induction. First we show that the theorem is true for a (2, 2) matrix. Suppose that one of the eigenvalues of A₂ is α₁ and that its corresponding eigenvector is x₁. Then we have
A₂x₁ = α₁x₁.   (12.17)
Let
P₁ = (x₁ p₁),   (12.18)
where p₁ is another column vector chosen in such a way that x₁ and p₁ are linearly independent, so that P₁ can be a non-singular matrix. Then, we have
\[
P_1^{-1}A_2P_1=\left(P_1^{-1}A_2x_1\quad P_1^{-1}A_2p_1\right). \tag{12.19}
\]
Meanwhile,
12.1 Eigenvalues and Eigenvectors 463
\[
x_1=(x_1\;p_1)\begin{pmatrix}1\\ 0\end{pmatrix}=P_1\begin{pmatrix}1\\ 0\end{pmatrix}.
\]
Hence, we have
\[
P_1^{-1}A_2x_1=\alpha_1P_1^{-1}x_1=\alpha_1P_1^{-1}P_1\begin{pmatrix}1\\ 0\end{pmatrix}=\alpha_1\begin{pmatrix}1\\ 0\end{pmatrix}=\begin{pmatrix}\alpha_1\\ 0\end{pmatrix}. \tag{12.20}
\]
Setting Ãₙ = P̃⁻¹AₙP̃ with a non-singular matrix P̃ defined below, we can describe Ãₙ as
\[
\tilde{A}_n=\begin{pmatrix}\alpha_n & *\\ 0 & A_{n-1}\end{pmatrix}. \tag{12.23}
\]
Here P̃ = (aₙ p₁ p₂ ⋯ pₙ₋₁), where aₙ is an eigenvector corresponding to an eigenvalue αₙ of Aₙ.
Noting that
\[
a_n=(a_n\;p_1\;p_2\;\cdots\;p_{n-1})\begin{pmatrix}1\\ 0\\ \vdots\\ 0\end{pmatrix}=\tilde{P}\begin{pmatrix}1\\ 0\\ \vdots\\ 0\end{pmatrix},
\]
Therefore, we have
1 0 0 1 0 1
1 1 αn
B 0 C B 0 C B 0 C
B C B C B C
1 1 1 B C B C B C
e e e e B C
P An an ¼ P αn an ¼ P αn PB 0 C ¼ αn B
B 0 C B
C¼B 0 CC:
B C B C B C
@⋮A @ ⋮A @ ⋮A
0 0 0
Thus, from (12.24) we see that one can express a matrix form of $\widetilde{A}_n$ as (12.23).
By the hypothesis of mathematical induction, we assume that there exists an $(n-1, n-1)$ non-singular square matrix $P_{n-1}$ and an upper triangle matrix $\Delta_{n-1}$ such that

$$P_{n-1}^{-1}A_{n-1}P_{n-1} = \Delta_{n-1}. \tag{12.25}$$
Let us define

$$P_n = \begin{pmatrix} 1 & 0 \cdots 0 \\ 0 & P_{n-1} \end{pmatrix}.$$

Then,

$$\widetilde{A}_n P_n = \begin{pmatrix} \alpha_n & x_{n-1}^{T} \\ 0 & A_{n-1} \end{pmatrix}\begin{pmatrix} 1 & 0 \cdots 0 \\ 0 & P_{n-1} \end{pmatrix} = \begin{pmatrix} \alpha_n & x_{n-1}^{T}P_{n-1} \\ 0 & A_{n-1}P_{n-1} \end{pmatrix}, \tag{12.26}$$
where $x_{n-1}^{T}$ is the transpose of a column vector $\begin{pmatrix} x_1 \\ \vdots \\ x_{n-1} \end{pmatrix}$. Therefore, $x_{n-1}^{T}P_{n-1}$ is a $(1, n-1)$ matrix (i.e., a row vector). Meanwhile, we have
$$P_n \Delta_n = \begin{pmatrix} 1 & 0 \cdots 0 \\ 0 & P_{n-1} \end{pmatrix}\begin{pmatrix} \alpha_n & x_{n-1}^{T}P_{n-1} \\ 0 & \Delta_{n-1} \end{pmatrix} = \begin{pmatrix} \alpha_n & x_{n-1}^{T}P_{n-1} \\ 0 & P_{n-1}\Delta_{n-1} \end{pmatrix}. \tag{12.27}$$
That is,

$$\widetilde{A}_n P_n = P_n \Delta_n, \tag{12.29}$$

where

$$\Delta_n = \begin{pmatrix} \alpha_n & x_{n-1}^{T}P_{n-1} \\ 0 & \Delta_{n-1} \end{pmatrix}. \tag{12.30}$$
2. Another eigenvector can be chosen to be $\begin{pmatrix}1\\-1\end{pmatrix}$. This is because for an eigenvalue 1, we obtain

$$\begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}x = 0$$

as an eigenvalue equation $(A - E)x = 0$. Therefore, with an eigenvector $\begin{pmatrix}c_1\\c_2\end{pmatrix}$ corresponding to the eigenvalue 1, we get $c_1 + c_2 = 0$. Therefore, we have $\begin{pmatrix}1\\-1\end{pmatrix}$ as a simple form of the eigenvector. Hence, putting $P = \begin{pmatrix}1 & 1\\ 0 & -1\end{pmatrix}$, the similarity transformation is carried out such that

$$P^{-1}AP = \begin{pmatrix}1 & 1\\ 0 & -1\end{pmatrix}\begin{pmatrix}2 & 1\\ 0 & 1\end{pmatrix}\begin{pmatrix}1 & 1\\ 0 & -1\end{pmatrix} = \begin{pmatrix}2 & 0\\ 0 & 1\end{pmatrix}. \tag{12.33}$$
$$c_1 a_1 + c_2 a_2 = 0. \tag{12.34}$$

Suppose that $a_1$ and $a_2$ are linearly dependent. Then, without loss of generality we can put $c_1 \neq 0$. Accordingly, we get

$$a_1 = -\frac{c_2}{c_1}a_2. \tag{12.35}$$

Hence,

$$\alpha_1 a_1 = -\frac{c_2}{c_1}\alpha_2 a_2 = \alpha_2 a_1. \tag{12.36}$$
With the second equality we have used (12.35). From (12.36) we have
$$0 = \frac{c_1}{c_n}(\alpha_n - \alpha_1)a_1 + \frac{c_2}{c_n}(\alpha_n - \alpha_2)a_2 + \cdots + \frac{c_{n-1}}{c_n}(\alpha_n - \alpha_{n-1})a_{n-1}. \tag{12.43}$$

Since we assume that the eigenvalues are different from one another, $\alpha_n \neq \alpha_1$, $\alpha_n \neq \alpha_2, \cdots, \alpha_n \neq \alpha_{n-1}$. This implies that $c_1 = c_2 = \cdots = c_{n-1} = 0$. From (12.40) we have $a_n = 0$. This is, however, in contradiction to the fact that $a_n$ is an
eigenvector. This means that our original supposition that a1, a2, , and an are
linearly dependent was wrong. Thus, the eigenvectors a1, a2, , and an should
be linearly independent.
(ii) Case II: $\alpha_n = 0$.

Suppose again that $a_1, a_2, \cdots,$ and $a_n$ are linearly dependent. Since $c_n \neq 0$ as before, we get (12.40) and (12.41) once again. Putting $\alpha_n = 0$ in (12.41) we have

$$0 = \frac{c_1}{c_n}\alpha_1 a_1 + \frac{c_2}{c_n}\alpha_2 a_2 + \cdots + \frac{c_{n-1}}{c_n}\alpha_{n-1}a_{n-1}. \tag{12.44}$$
Equations (12.21), (12.30), and (12.31) show that if we adopt an eigenvector as one
of basis vectors, the matrix representation of the linear transformation A in reference
to such basis vectors is obtained so that the leftmost column is zero except for the left
top corner on which an eigenvalue is positioned. (Note that if the said eigenvector is
zero, the leftmost column is a zero vector.) Meanwhile, neither Theorem 12.1 nor Theorem 12.2 tells us about the multiplicity of eigenvalues. If the eigenvalues have multiplicity, we have to consider different aspects. This is a major issue of this section.
Let us start with a discussion of invariant subspaces. Let $A$ be an $(n, n)$ square matrix. If a subspace $W$ in $V^n$ is characterized by

$$x \in W \Longrightarrow Ax \in W,$$

then $W$ is said to be invariant with respect to $A$ (or $A$-invariant).
$$(a_1\ a_2 \cdots a_m\ a_{m+1} \cdots a_n)\,A = (a_1\ a_2 \cdots a_m\ a_{m+1} \cdots a_n)\begin{pmatrix} A_m & * \\ 0 & * \end{pmatrix}, \tag{12.45}$$
$$a,\ Aa,\ A^2 a, \cdots, A^n a.$$

These vectors are linearly dependent, since there are at most $n$ linearly independent vectors in $V^n$. These vectors span a subspace in $V^n$. Let us call this subspace $M$; i.e.,

$$M \equiv \mathrm{Span}\left\{a,\ Aa,\ A^2 a, \cdots, A^n a\right\}.$$
$$c_0 a + c_1 Aa + c_2 A^2 a + \cdots + c_n A^n a = 0. \tag{12.46}$$

Not all $c_i\ (0 \le i \le n)$ are zero, because the vectors are linearly dependent. Suppose that $c_n \neq 0$. Then, from (12.46) we have

$$A^n a = -\frac{1}{c_n}\left(c_0 a + c_1 Aa + c_2 A^2 a + \cdots + c_{n-1}A^{n-1}a\right).$$

Operating $A$ from the left, we have

$$A^{n+1}a = -\frac{1}{c_n}\left(c_0 Aa + c_1 A^2 a + c_2 A^3 a + \cdots + c_{n-1}A^{n}a\right).$$

Similarly, if $c_n = 0$ and $c_{n-1} \neq 0$, we have

$$A^{n-1}a = -\frac{1}{c_{n-1}}\left(c_0 a + c_1 Aa + c_2 A^2 a + \cdots + c_{n-2}A^{n-2}a\right).$$

Operating $A^2$ from the left, we have

$$A^{n+1}a = -\frac{1}{c_{n-1}}\left(c_0 A^2 a + c_1 A^3 a + c_2 A^4 a + \cdots + c_{n-2}A^{n}a\right).$$
$$c_0 a + c_1 Aa = 0.$$

That is,

$$Aa = -\frac{c_0}{c_1}a.$$

Operating $A^n$ once again on the above equation from the left, we have

$$A^{n+1}a = -\frac{c_0}{c_1}A^n a.$$
$$m \equiv \dim M \le \dim V^n = n.$$

Note again that in (12.45) $V^n$ is spanned by the $m$ basis vectors in $M$ together with other $(n - m)$ linearly independent vectors.
We may encounter a situation where two subspaces $W_1$ and $W_2$ are both $A$-invariant. Here we can take basis vectors $a_1, a_2, \cdots,$ and $a_m$ for $W_1$ and $a_{m+1}, a_{m+2}, \cdots,$ and $a_n$ for $W_2$. In reference to such $a_1, a_2, \cdots,$ and $a_n$ as the basis vectors of $V^n$, we have

$$A \cong \begin{pmatrix} A_m & 0 \\ 0 & A_{n-m} \end{pmatrix}, \tag{12.48}$$
12.2 Eigenspaces and Invariant Subspaces 471
That is, $A = A_m \oplus A_{n-m}$.
$$A\widetilde{A}^{T} = \widetilde{A}^{T}A = |A|\,E, \tag{12.50}$$

where $\widetilde{A}^{T}$ is the transposed matrix of the cofactor matrix $\widetilde{A}$ of $A$. We now apply (12.50) to the characteristic polynomial to get the following equation:
$$\left(\widetilde{xE-A}\right)^{T}(xE - A) = (xE - A)\left(\widetilde{xE-A}\right)^{T} = |xE - A|\,E = f_A(x)\,E, \tag{12.51}$$

where $\widetilde{xE-A}$ is the cofactor matrix of $xE - A$. Let the cofactor of the $(i, j)$-element of $(xE - A)$ be $\Delta_{ij}$. Note in this case that $\Delta_{ij}$ is an at most $(n-1)$-th order polynomial in $x$. Let us put accordingly
$$\widetilde{xE-A} = \begin{pmatrix} b_{11,0}x^{n-1} + \cdots + b_{11,n-1} & \cdots & b_{1n,0}x^{n-1} + \cdots + b_{1n,n-1} \\ \vdots & \ddots & \vdots \\ b_{n1,0}x^{n-1} + \cdots + b_{n1,n-1} & \cdots & b_{nn,0}x^{n-1} + \cdots + b_{nn,n-1} \end{pmatrix}$$

$$= x^{n-1}\begin{pmatrix} b_{11,0} & \cdots & b_{1n,0} \\ \vdots & \ddots & \vdots \\ b_{n1,0} & \cdots & b_{nn,0} \end{pmatrix} + \cdots + \begin{pmatrix} b_{11,n-1} & \cdots & b_{1n,n-1} \\ \vdots & \ddots & \vdots \\ b_{n1,n-1} & \cdots & b_{nn,n-1} \end{pmatrix} \equiv x^{n-1}B_0 + x^{n-2}B_1 + \cdots + B_{n-1}.$$
Thus, we get
$$|xE - A|\,E = (xE - A)\left(x^{n-1}B_0^{T} + x^{n-2}B_1^{T} + \cdots + B_{n-1}^{T}\right). \tag{12.54}$$
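Equation (12.54) is the key step toward the Hamilton–Cayley theorem, $f_A(A) = 0$, which is invoked later in this chapter. A direct numerical check for an arbitrarily chosen $3 \times 3$ matrix (not taken from the text):

```python
def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[2, 1, 0],
     [0, 2, 1],
     [1, 0, 3]]

# for a 3x3 matrix, f_A(x) = x^3 - t x^2 + s x - d, with
t = A[0][0] + A[1][1] + A[2][2]                         # trace
s = sum(A[i][i] * A[j][j] - A[i][j] * A[j][i]           # sum of principal
        for i in range(3) for j in range(3) if i < j)   # 2x2 minors
d = (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])  # determinant
     - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
     + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))

A2 = matmul(A, A)
A3 = matmul(A2, A)
E = [[1 if i == j else 0 for j in range(3)] for i in range(3)]

# Hamilton-Cayley: A^3 - t A^2 + s A - d E must be the zero matrix
F = [[A3[i][j] - t * A2[i][j] + s * A[i][j] - d * E[i][j]
      for j in range(3)] for i in range(3)]
print(F == [[0, 0, 0], [0, 0, 0], [0, 0, 0]])   # True
```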
Definition 12.1 Let $f(x)$ be a polynomial of $x$ with scalar coefficients such that $f(A) = 0$, where $A$ is an $(n, n)$ matrix. Among such $f(x)$, a lowest-order polynomial with the highest-order coefficient of one is said to be a minimal polynomial. We denote it by $\varphi_A(x)$; i.e., $\varphi_A(A) = 0$.

We have an important property: a minimal polynomial $\varphi_A(x)$ is a divisor of any $f(x)$ with $f(A) = 0$. In fact, suppose that we have
From the above equation, $h(x)$ should be a polynomial whose order is lower than that of $\varphi_A(x)$. But the presence of such $h(x)$ is in contradiction to the definition of the minimal polynomial. This implies that $h(x) \equiv 0$. Thus, we get
Then, we have

$$\varphi_A(x) = c\,\varphi'_A(x),$$

where $c$ is a constant, because $\varphi_A(x)$ and $\varphi'_A(x)$ are of the same order. But $c$ should be one, since the highest-order coefficient of a minimal polynomial is one. Thus, $\varphi_A(x)$ is uniquely defined.
$$V^n = W_1 \oplus W_2 \oplus \cdots \oplus W_n = \mathrm{Span}\{a_1\} \oplus \mathrm{Span}\{a_2\} \oplus \cdots \oplus \mathrm{Span}\{a_n\}, \tag{12.56}$$
The matrix has a triple root 2. As can easily be seen below, individual eigenvec-
tors a1, a2, and a3 form basis vectors of each invariant subspace, indicating that V3
can be decomposed to a direct sum of the three invariant subspaces as in (12.56).
$$(a_1\ a_2\ a_3)\begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix} = (2a_1\ 2a_2\ 2a_3). \tag{12.58}$$
where $a_1$ and $a_2$ are basis vectors and, for $\forall x \in V^2$, $x = x_1 a_1 + x_2 a_2$. This example has a double root 3. We have

$$(a_1\ a_2)\begin{pmatrix} 3 & 1 \\ 0 & 3 \end{pmatrix} = (3a_1\ \ a_1 + 3a_2). \tag{12.60}$$
Thus, Span {a1} is A-invariant, but this is not the case with Span{a2}. This
implies that a1 is an eigenvector corresponding to an eigenvalue 3 but a2 is not.
Detailed discussion about matrices of this kind follows below.
Nilpotent matrices play an important role in matrix algebra. These matrices are
defined as follows.
Definition 12.2 Let $N$ be a linear transformation in a vector space $V^n$. Suppose that we have

$$N^k = 0, \tag{12.61}$$

where $N$ is an $(n, n)$ square matrix and $k\ (\ge 2)$ is a certain natural number. Then, $N$ is said to be a nilpotent matrix of order $k$ or a $k$-th order nilpotent matrix. If (12.61) holds with $k = 1$, $N$ is a zero matrix.
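Definition 12.2 suggests a simple computational test: raise $N$ to successive powers and look for the first power that vanishes. A sketch (the example matrices are illustrative, not from the text):

```python
def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def nilpotency_order(N):
    # return the smallest k with N^k = 0, or None if N is not nilpotent;
    # for an (n, n) nilpotent matrix, k never exceeds n
    n = len(N)
    zero = [[0] * n for _ in range(n)]
    P = [row[:] for row in N]
    for k in range(1, n + 1):
        if P == zero:
            return k
        P = matmul(P, N)
    return None

print(nilpotency_order([[0, 1, 0], [0, 0, 1], [0, 0, 0]]))  # 3
print(nilpotency_order([[1, 0], [0, 1]]))                   # None
```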
Nilpotent matrices have the following properties:

(i) Eigenvalues of a nilpotent matrix are zero. Let $N$ be a $k$-th order nilpotent matrix. Suppose that

$$Nx = \alpha x \quad (x \neq 0). \tag{12.62}$$

Operating $N$ repeatedly, we have

$$N^k x = \alpha^k x. \tag{12.63}$$

Since $N^k = 0$ and $x \neq 0$, we get $\alpha^k = 0$, i.e., $\alpha = 0$.
By Theorem 12.1, $N$ can be converted to a triangle matrix by a similarity transformation:

$$P^{-1}NP = \widetilde{N}.$$

Since all the eigenvalues of $N$ are zero, its characteristic polynomial is $f_N(x) = x^n$, so that the Hamilton–Cayley theorem gives

$$f_N(N) = N^n = 0.$$

(ii) If a nilpotent matrix $N$ can be diagonalized through a similarity transformation, then

$$P^{-1}NP = 0.$$

The above equation holds, because $N$ only takes eigenvalues of zero. Operating $P$ from the left of the above equation and $P^{-1}$ from the right, we have

$$N = 0.$$
$$A = 0 \iff Ax = 0\ \ \text{for}\ {}^{\forall}x \in V^n. \tag{12.64}$$

$$A \neq 0 \iff Ax \neq 0\ \ \text{for}\ {}^{\exists}x \in V^n.$$
$$x,\ Nx,\ N^2 x, \cdots, N^{k-1}x.$$

$$c_0 N^{k-1}x = 0. \tag{12.66}$$

$$c_1 N^{k-1}x = 0, \tag{12.68}$$
That is, $Ne_2 = e_1$. Then, the two linearly independent vectors $e_2$ and $Ne_2$ correspond to the case of Theorem 12.4.

So far we have shown simple cases where matrices can be diagonalized via similarity transformation. This is equivalent to saying that the relevant vector space is decomposed into a direct sum of (invariant) subspaces. Nonetheless, if a characteristic polynomial of the matrix has multiple root(s), it remains uncertain whether the vector space can be decomposed into such a direct sum. To answer this question, we need the following lemma.
Lemma 12.1 Let $f_1(x), f_2(x), \cdots,$ and $f_s(x)$ be polynomials without a common factor. Then we have $s$ polynomials $M_1(x), M_2(x), \cdots, M_s(x)$ that satisfy the following relation:

$$M_1(x)f_1(x) + M_2(x)f_2(x) + \cdots + M_s(x)f_s(x) = 1. \tag{12.69}$$
Proof Let $M_i(x)\ (1 \le i \le s)$ be arbitrarily chosen polynomials and deal with the set $H$ of polynomials $g(x)$ that can be expressed as

$$g(x) = M_1(x)f_1(x) + M_2(x)f_2(x) + \cdots + M_s(x)f_s(x).$$

Let $g_0(x)$ be a lowest-order nonzero member of $H$ and divide $g(x)$ by $g_0(x)$ so that $g(x) = g_0(x)q(x) + h(x)$, where $h(x)$ is a certain polynomial. Since $g(x), g_0(x) \in H$, we have $h(x) \in H$ from the above properties (i) and (ii). If $h(x) \neq 0$, the order of $h(x)$ is lower than that of $g_0(x)$ from (12.71). This is, however, in contradiction to the definition of $g_0(x)$. Therefore, $h(x) = 0$. Thus, every member of $H$ is divisible by $g_0(x)$; in particular, so are $f_1(x), f_2(x), \cdots,$ and $f_s(x)$. By assumption the greatest common factor of $f_1(x), f_2(x), \cdots,$ and $f_s(x)$ is 1. This implies that $g_0(x) = 1$. Thus, we finally get (12.69) and complete the proof. ∎
$$(A - \alpha E)^{l}x = 0, \quad (A - \alpha E)^{l-1}x \neq 0.$$

$$A(Ax - x) = 0.$$

Then, we have

$$Ax = x \quad \text{or} \quad Ax = 0. \tag{12.74}$$
$$V^n = A(V^n) \oplus \mathrm{Ker}\,A,$$

where $\mathrm{Ker}\,A = \{x \in V^n;\ Ax = 0\}$. Thus, we find that $A$ decomposes $V^n$ into a direct sum of $A(V^n)$ and $\mathrm{Ker}\,A$.
Conversely, we can readily verify that if there exists an operator A that satisfies
(12.75), such A must be an idempotent operator. The verification is left for readers.
Meanwhile, we have
$$(E - A)^2 = E - 2A + A^2 = E - 2A + A = E - A.$$

$$A(E - A) = (E - A)A = 0.$$

$$Bx - x = (E - A)x - x = x - Ax - x = -Ax.$$
we get

$$W_A = AV^n, \quad W_B = (E - A)V^n = BV^n = \mathrm{Ker}\,A.$$

That is,

$$V^n = W_A + W_B,$$

and, moreover,

$$V^n = W_A \oplus W_B.$$
$$A_1 + A_2 + \cdots + A_s = E,$$

$$A_i A_j = A_i\,\delta_{ij}.$$

Then

$$V^n = W_1 \oplus W_2 \oplus \cdots \oplus W_s. \tag{12.78}$$

In fact, suppose that ${}^{\exists}x\,(\in V^n) \in W_i, W_j\ (i \neq j)$. Then $A_i x = x = A_j x$. Operating $A_j\ (j \neq i)$ from the left, $A_j A_i x = A_j x = A_j A_j x$. That is, $0 = A_j x = x$, implying that $W_i \cap W_j = \{0\}$. Meanwhile,

$$V^n = (A_1 + A_2 + \cdots + A_s)V^n = A_1 V^n + A_2 V^n + \cdots + A_s V^n = W_1 + W_2 + \cdots + W_s. \tag{12.79}$$
In the above,
12.4 Idempotent Matrices and Generalized Eigenspaces 481
$$(e_1\ e_2\ e_3\ e_4)(E - A) = (e_1\ e_2\ e_3\ e_4)\begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} = (0\ 0\ e_3\ e_4).$$

Thus, we have

$$W_A = \mathrm{Span}\{e_1, e_2\}, \quad W_B = \mathrm{Span}\{e_3, e_4\}, \quad V^4 = W_A \oplus W_B.$$
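The projector pair of this example is easy to verify directly. The sketch below checks the idempotent relations $A^2 = A$, $(E - A)^2 = E - A$, and $A(E - A) = 0$, and shows how an arbitrary vector splits into its two components:

```python
def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# A projects V^4 onto Span{e1, e2}; E - A projects onto Span{e3, e4}
A = [[1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 0]]
E = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
B = [[E[i][j] - A[i][j] for j in range(4)] for i in range(4)]

assert matmul(A, A) == A                             # A^2 = A
assert matmul(B, B) == B                             # (E - A)^2 = E - A
assert matmul(A, B) == [[0] * 4 for _ in range(4)]   # A(E - A) = 0

x = [3, -1, 5, 2]   # an arbitrary vector
Ax = [sum(A[i][k] * x[k] for k in range(4)) for i in range(4)]
Bx = [sum(B[i][k] * x[k] for k in range(4)) for i in range(4)]
print(Ax, Bx)   # [3, -1, 0, 0] [0, 0, 5, 2]: x = Ax + (E - A)x
```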
The properties of idempotent matrices can easily be checked; this is left for the readers.
Using the aforementioned idempotent operators, let us introduce the following
theorem.
Theorem 12.5 [3, 4] Let $A$ be a linear transformation $V^n \to V^n$. Suppose that a vector $x\ (\in V^n)$ satisfies the following relation:

$$(A - \alpha E)^{l}x = 0. \tag{12.80}$$

Then $V^n$ is decomposed into a direct sum of generalized eigenspaces:

$$V^n = \widetilde{W}_{\alpha_1} \oplus \widetilde{W}_{\alpha_2} \oplus \cdots \oplus \widetilde{W}_{\alpha_s}. \tag{12.81}$$

Here $\widetilde{W}_{\alpha_i}\ (1 \le i \le s)$ is given by

$$\widetilde{W}_{\alpha_i} = \left\{x;\ (A - \alpha_i E)^{l_i}x = 0\right\}, \tag{12.82}$$

where $l_i$ is a sufficiently large natural number. If the multiplicity of $\alpha_i$ is $n_i$, $\dim \widetilde{W}_{\alpha_i} = n_i$.
Proof Let us define the aforementioned $A$-invariant subspaces as $\widetilde{W}_{\alpha_k}\ (1 \le k \le s)$. Let $f_A(x)$ be the characteristic polynomial of $A$. Factorizing $f_A(x)$ into a product of powers of first-degree polynomials, we have

$$f_A(x) = \prod_{i=1}^{s}(x - \alpha_i)^{n_i}. \tag{12.83}$$

Let us put $f_i(x) \equiv f_A(x)/(x - \alpha_i)^{n_i}$. Then $f_1(x), f_2(x), \cdots,$ and $f_s(x)$ do not have a common factor. Consequently, Lemma 12.1 tells us that there are polynomials $M_1(x), M_2(x), \cdots,$ and $M_s(x)$ such that

$$M_1(x)f_1(x) + M_2(x)f_2(x) + \cdots + M_s(x)f_s(x) = 1.$$

Defining $M_i(A)f_i(A) \equiv A_i$, we have

$$A_1 + A_2 + \cdots + A_s = E, \tag{12.87}$$

$$A_i A_j = A_i\,\delta_{ij}. \tag{12.88}$$
In fact, if $i \neq j$,

$$A_i A_j = M_i(A)f_i(A)M_j(A)f_j(A) = M_i(A)M_j(A)f_A(A)\prod_{k \neq i,j}^{s}(A - \alpha_k E)^{n_k} = 0. \tag{12.89}$$

The second equality results from the fact that $M_j(A)$ and $f_i(A)$ are commutable, since both are polynomials of $A$. The last equality follows from the Hamilton–Cayley theorem. On the basis of (12.87) and (12.89),
We have

$$f_i(A)(A - \alpha_i E)^{n_i} = f_A(A) = 0.$$

Operating $M_i(A)$ from the left and using the fact that $M_i(A)$ commutes with $(A - \alpha_i E)^{n_i}$, we get

$$(A - \alpha_i E)^{n_i}A_i = 0, \tag{12.94}$$

i.e.,

$$(A - \alpha_i E)^{n_i}A_i V^n = 0.$$
This means that

$$A_i V^n \subset \widetilde{W}_{\alpha_i} \quad (1 \le i \le s), \tag{12.95}$$

and, hence,

$$x = A_i[N(A)x] \in A_i V^n. \tag{12.98}$$

Thus, we get

$$\widetilde{W}_{\alpha_i} \subset A_i V^n \quad (1 \le i \le s). \tag{12.99}$$

From (12.95) and (12.99),

$$\widetilde{W}_{\alpha_i} = A_i V^n \quad (1 \le i \le s). \tag{12.100}$$
$$V^n = W_1 \oplus W_2 \oplus \cdots \oplus W_s$$

or

$$V^n = \widetilde{W}_{\alpha_1} \oplus \widetilde{W}_{\alpha_2} \oplus \cdots \oplus \widetilde{W}_{\alpha_s}. \tag{12.101}$$
This completes the former half of the proof. The latter half of the proof is as follows.

Suppose that $\dim \widetilde{W}_{\alpha_i} = n'_i$. In parallel to the decomposition of $V^n$ into the direct sum of (12.81), $A$ can be reduced as

$$A \cong \begin{pmatrix} A^{(1)} & & 0 \\ & \ddots & \\ 0 & & A^{(s)} \end{pmatrix}, \tag{12.102}$$

where $A^{(i)}\ (1 \le i \le s)$ is an $(n'_i, n'_i)$ matrix and the symbol $\cong$ indicates that $A$ has been transformed by a suitable similarity transformation. The matrix $A^{(i)}$ represents the linear transformation that $A$ causes to $\widetilde{W}_{\alpha_i}$. We denote the $n'_i$-order identity matrix by $E_{n'_i}$.
Equation (12.82) implies that the matrix represented by

$$N_i \equiv A^{(i)} - \alpha_i E_{n'_i}$$

is a nilpotent matrix. The order of a nilpotent matrix is at most $n'_i$ (vide supra) and, hence, $l_i$ can be taken to be $n'_i$. With $N_i$ we have

$$f_{N_i}(x) = \left|xE_{n'_i} - N_i\right| = \left|xE_{n'_i} - \left[A^{(i)} - \alpha_i E_{n'_i}\right]\right| = \left|(x + \alpha_i)E_{n'_i} - A^{(i)}\right| = x^{n'_i}. \tag{12.104}$$
The last equality is because the eigenvalues of a nilpotent matrix are all zero. Meanwhile,

$$f_{A^{(i)}}(x) = \left|xE_{n'_i} - A^{(i)}\right| = f_{N_i}(x - \alpha_i) = (x - \alpha_i)^{n'_i}. \tag{12.105}$$

Hence,

$$f_A(x) = \prod_{i=1}^{s}f_{A^{(i)}}(x) = \prod_{i=1}^{s}(x - \alpha_i)^{n'_i} = \prod_{i=1}^{s}(x - \alpha_i)^{n_i}. \tag{12.106}$$

The last equality comes from (12.83). Thus, $n'_i = n_i$. These procedures complete the proof. At the same time, we may equate $l_i$ in (12.82) to $n_i$. ∎
Theorem 12.1 shows that any square matrix can be converted to an (upper) triangle matrix by a similarity transformation. Theorem 12.5 demonstrates that the matrix can further be segmented according to individual eigenvalues. Considering Theorem 12.1 again, $A^{(i)}\ (1 \le i \le s)$ can be described as an upper triangle matrix

$$A^{(i)} \cong \begin{pmatrix} \alpha_i & & * \\ & \ddots & \\ 0 & & \alpha_i \end{pmatrix}. \tag{12.107}$$
From (12.108), defining $N^{(i)} \equiv A^{(i)} - \alpha_i E_{n_i}$, we find that $N^{(i)}$ is nilpotent. This is because all the eigenvalues of $N^{(i)}$ are zero. Hence, we have

$$\left[N^{(i)}\right]^{\mu_i} = 0 \quad (\mu_i \le n_i).$$

Readers are encouraged to check this. We have the following important theorem on this matrix decomposition.
Theorem 12.6 Every $(n, n)$ square matrix $A$ can uniquely be decomposed as

$$A = S + N, \tag{12.109}$$

where $S$ is a semi-simple matrix and $N$ is a nilpotent matrix, and $S$ and $N$ are commutable.
Then, Eq. (12.110) is a polynomial of $A$. From Theorems 12.1 and 12.5, $A^{(i)}\ (1 \le i \le s)$ in (12.102) is characterized by the fact that $A^{(i)}$ is a triangle matrix whose eigenvalues $\alpha_i\ (1 \le i \le s)$ are positioned on the diagonal and that the order of $A^{(i)}$ is identical to the multiplicity of $\alpha_i$. Since $A_i\ (1 \le i \le s)$ is an idempotent matrix, it can be diagonalized through a similarity transformation (see Sect. 12.7). In fact, $S$ is transformed, via the same similarity transformation as in (12.102), into
$$S \cong \begin{pmatrix} \alpha_1 E_{n_1} & & 0 \\ & \ddots & \\ 0 & & \alpha_s E_{n_s} \end{pmatrix}, \tag{12.111}$$
Since each $N^{(i)}$ is nilpotent as stated in Sect. 12.4, $N$ is nilpotent as well. Also (12.112) is a polynomial of $A$, as is the case with $S$. Therefore, $S$ and $N$ are commutable. To prove the uniqueness of the decomposition, we show the following:

(i) Let $S$ and $S'$ be commutable semi-simple matrices. Then, those matrices can be simultaneously diagonalized. That is, with a certain non-singular matrix $P$,
12.5 Decomposition of Matrix 487
$$V^n = W_{\alpha_1} \oplus \cdots \oplus W_{\alpha_s}.$$

Thus, we get

$$P^{-1}SP = \begin{pmatrix} \alpha_1 E_{n_1} & & 0 \\ & \ddots & \\ 0 & & \alpha_s E_{n_s} \end{pmatrix}, \qquad P^{-1}S'P = \begin{pmatrix} S'_1 & & 0 \\ & \ddots & \\ 0 & & S'_s \end{pmatrix}.$$

This means that both $P^{-1}SP$ and $P^{-1}S'P$ are diagonal. That is, $P^{-1}SP - P^{-1}S'P = P^{-1}(S - S')P$ is diagonal, indicating that $S - S'$ is semi-simple as well.
(ii) Suppose that $N^{\nu} = 0$ and $N'^{\nu'} = 0$. From the assumption, $N$ and $N'$ are commutable. Consequently, using the binomial theorem we have

$$(N - N')^m = N^m - mN^{m-1}N' + \cdots + (-1)^{i}\frac{m!}{i!(m-i)!}N^{m-i}N'^{\,i} + \cdots + (-1)^{m}N'^{\,m}. \tag{12.113}$$

If we take $m = \nu + \nu'$, every term on the right-hand side contains either $N^{\nu}$ or $N'^{\nu'}$ as a factor and hence vanishes, so that $N - N'$ is nilpotent.
$$A = S + N = S' + N' \quad \text{or} \quad S - S' = N' - N. \tag{12.115}$$

From the assumption, $S'$ and $N'$ are commutable. Moreover, since $S$, $S'$, $N$, and $N'$ are described by polynomials of $A$, they are commutable with one another. Hence, from (i) and (ii) along with the second equation of (12.115), $S - S'$ and $N' - N$ are both semi-simple and nilpotent at once. Consequently, from (iii), $S - S' = N' - N = 0$. Thus, we finally get $S = S'$ and $N = N'$. That is, the decomposition is unique. This completes the proof. ∎
Theorem 12.6 implies that the matrix decomposition of (12.109) is unique. On the
basis of Theorem 12.6, we investigate Jordan canonical forms of matrices in the next
section.
Once the vector space $V^n$ has been decomposed into a direct sum of generalized eigenspaces with the matrix reduced in parallel, we are able to deal with the individual eigenspaces $\widetilde{W}_{\alpha_i}\ (1 \le i \le s)$ and the corresponding $A^{(i)}\ (1 \le i \le s)$ separately.
$$W^{(i)} = \left\{x;\ x \in V^n,\ N^i x = 0\right\}.$$

Then we have

$$W^{(0)} \subset W^{(1)} \subset \cdots \subset W^{(\nu)} = V^n. \tag{12.116}$$

Note that when $\nu = 1$, we have trivially $V^n = W^{(\nu)} \supset W^{(0)} \equiv \{0\}$. Let us put $\dim W^{(i)} = m_i$, $m_i - m_{i-1} = r_i\ (1 \le i \le \nu)$, $m_0 \equiv 0$. Then we can add $r_\nu$ linearly independent vectors $a_1, a_2, \cdots, a_{r_\nu}$ to the basis vectors of $W^{(\nu-1)}$ so as to construct the basis vectors of $W^{(\nu)} = V^n$.
Moreover, these $r_\nu$ vectors $Na_1, Na_2, \cdots,$ and $Na_{r_\nu}$ are linearly independent. Suppose that

$$c_1 Na_1 + c_2 Na_2 + \cdots + c_{r_\nu}Na_{r_\nu} = 0. \tag{12.118}$$

This would imply that $c_1 a_1 + c_2 a_2 + \cdots + c_{r_\nu}a_{r_\nu} \in W^{(\nu-1)}$. On the basis of the above argument, however, we must have $c_1 = c_2 = \cdots = c_{r_\nu} = 0$. In other words, if $c_i\ (1 \le i \le r_\nu)$ were nonzero, we would have $a_i \in W^{(\nu-1)}$, in contradiction. From (12.118) this means the linear independence of $Na_1, Na_2, \cdots,$ and $Na_{r_\nu}$.

As $W^{(\nu-1)} \supset W^{(\nu-2)}$, we may well have additional linearly independent vectors within the basis vectors of $W^{(\nu-1)}$. Let those vectors be $a_{r_\nu+1}, \cdots, a_{r_{\nu-1}}$. Here we assume that the number of such vectors is $r_{\nu-1} - r_\nu$; we have $r_{\nu-1} - r_\nu \ge 0$ accordingly. In this way we can construct the basis vectors of $W^{(\nu-1)}$ by including $a_{r_\nu+1}, \cdots,$ and $a_{r_{\nu-1}}$ along with $Na_1, Na_2, \cdots,$ and $Na_{r_\nu}$. As a result, we get
$$W^{(i)} = \mathrm{Span}\left\{N^{\nu-i}a_1, \cdots, N^{\nu-i}a_{r_\nu},\ N^{\nu-i-1}a_{r_\nu+1}, \cdots, N^{\nu-i-1}a_{r_{\nu-1}}, \cdots,\ a_{r_{i+1}+1}, \cdots, a_{r_i}\right\} \supset W^{(i-1)}. \tag{12.119}$$
Further repeating the procedures, we exhaust all the $n$ basis vectors of $W^{(\nu)} = V^n$. Table 12.1 [3, 4] shows the resulting structure of these basis vectors pertinent to Jordan blocks. In Table 12.1, if we count the basis vectors laterally, from the top we have $r_\nu, r_{\nu-1}, \cdots, r_1$ vectors. Their sum is $n$; this is the same number as that counted vertically. The dimension $n$ of the vector space $V^n$ is thus given by

$$n = \sum_{i=1}^{\nu}r_i = \sum_{i=1}^{\nu}i\,(r_i - r_{i+1}), \tag{12.121}$$

where $r_{\nu+1} \equiv 0$.
Let us examine the structure of Table 12.1 more closely. More specifically, let us inspect the $i$-layered structures of $(r_i - r_{i+1})$ vectors. Picking up a vector from among $a_{r_{i+1}+1}, \cdots, a_{r_i}$, we call it $a_\rho$. Then, we get the following set of vectors $a_\rho, Na_\rho, N^2 a_\rho, \cdots, N^{i-1}a_\rho$ in the $i$-layered structure. These $i$ vectors are displayed "vertically" in Table 12.1. These vectors are linearly independent (see Theorem 12.4) and form an $i$-dimensional $N$-invariant subspace; i.e., $\mathrm{Span}\{N^{i-1}a_\rho, N^{i-2}a_\rho, \cdots, Na_\rho, a_\rho\}$, where $r_{i+1}+1 \le \rho \le r_i$. The matrix representation of the linear transformation $N$ with respect to the set of these $i$ vectors is
12.6 Jordan Canonical Form

Table 12.1 Jordan blocks and structure of basis vectors for a nilpotent matrix^a

$r_\nu$: $a_1, \cdots, a_{r_\nu}$
$r_{\nu-1}$: $Na_1, \cdots, Na_{r_\nu}$; $a_{r_\nu+1}, \cdots, a_{r_{\nu-1}}$
$r_{\nu-2}$: $N^2 a_1, \cdots, N^2 a_{r_\nu}$; $Na_{r_\nu+1}, \cdots, Na_{r_{\nu-1}}$; $\cdots$
$\vdots$
$r_i$: $N^{\nu-i}a_1, \cdots, N^{\nu-i}a_{r_\nu}$; $N^{\nu-i-1}a_{r_\nu+1}, \cdots, N^{\nu-i-1}a_{r_{\nu-1}}$; $\cdots$; $a_{r_{i+1}+1}, \cdots, a_{r_i}$
$\vdots$
$r_2$: $N^{\nu-2}a_1, \cdots, N^{\nu-2}a_{r_\nu}$; $N^{\nu-3}a_{r_\nu+1}, \cdots, N^{\nu-3}a_{r_{\nu-1}}$; $\cdots$; $N^{i-2}a_{r_{i+1}+1}, \cdots, N^{i-2}a_{r_i}$; $\cdots$; $a_{r_3+1}, \cdots, a_{r_2}$
$r_1$: $N^{\nu-1}a_1, \cdots, N^{\nu-1}a_{r_\nu}$; $N^{\nu-2}a_{r_\nu+1}, \cdots, N^{\nu-2}a_{r_{\nu-1}}$; $\cdots$; $N^{i-1}a_{r_{i+1}+1}, \cdots, N^{i-1}a_{r_i}$; $\cdots$; $Na_{r_3+1}, \cdots, Na_{r_2}$; $a_{r_2+1}, \cdots, a_{r_1}$

Counted from the left, the columns contain $r_\nu$, $r_{\nu-1} - r_\nu$, $\cdots$, $r_i - r_{i+1}$, $\cdots$, $r_1 - r_2$ chains of vectors, respectively, and

$$n = \sum_{k=1}^{\nu}r_k = \nu r_\nu + (\nu-1)(r_{\nu-1} - r_\nu) + \cdots + i(r_i - r_{i+1}) + \cdots + 2(r_2 - r_3) + (r_1 - r_2).$$

^a Adapted from Satake I (1974) Linear algebra (Mathematics Library 1: in Japanese) [4], with the permission of Shokabo Co., Ltd., Tokyo
$$N\left(N^{i-1}a_\rho\ \ N^{i-2}a_\rho \cdots Na_\rho\ \ a_\rho\right) = \left(N^{i-1}a_\rho\ \ N^{i-2}a_\rho \cdots Na_\rho\ \ a_\rho\right)\begin{pmatrix} 0 & 1 & & & \\ & 0 & 1 & & \\ & & \ddots & \ddots & \\ & & & 0 & 1 \\ & & & & 0 \end{pmatrix}. \tag{12.122}$$
These $(i, i)$ matrices of (12.122) are called $i$-th order Jordan blocks. Notice that the number of those Jordan blocks is $(r_i - r_{i+1})$. Let us expressly define this number as [3, 4]

$$J_i = r_i - r_{i+1}, \tag{12.123}$$

where $J_i$ is the number of the $i$-th order Jordan blocks. The total number of Jordan blocks within a whole vector space $V^n = W^{(\nu)}$ is

$$\sum_{i=1}^{\nu}J_i = \sum_{i=1}^{\nu}(r_i - r_{i+1}) = r_1. \tag{12.124}$$
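These counting rules can be checked numerically: $r_i = \mathrm{rank}\,N^{i-1} - \mathrm{rank}\,N^{i}$ [see (12.130) below] and $J_i = r_i - r_{i+1}$ as in (12.123). The sketch uses a hypothetical $5 \times 5$ nilpotent matrix built from one third-order and one second-order Jordan block:

```python
from fractions import Fraction

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matpow(M, k):
    n = len(M)
    P = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        P = matmul(P, M)
    return P

def rank(M):
    # rank by Gauss-Jordan elimination with exact rational arithmetic
    A = [[Fraction(v) for v in row] for row in M]
    rows, cols, r = len(A), len(A[0]), 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if A[i][c] != 0), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(rows):
            if i != r and A[i][c] != 0:
                f = A[i][c] / A[r][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[r])]
        r += 1
    return r

# one 3rd-order and one 2nd-order Jordan block, eigenvalue 0
N = [[0, 1, 0, 0, 0],
     [0, 0, 1, 0, 0],
     [0, 0, 0, 0, 0],
     [0, 0, 0, 0, 1],
     [0, 0, 0, 0, 0]]

n = len(N)
r = {i: rank(matpow(N, i - 1)) - rank(matpow(N, i)) for i in range(1, n + 1)}
J = {i: r[i] - r.get(i + 1, 0) for i in range(1, n + 1)}
print(r)   # {1: 2, 2: 2, 3: 1, 4: 0, 5: 0}
print(J)   # {1: 0, 2: 1, 3: 1, 4: 0, 5: 0}: one 2nd- and one 3rd-order block
```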
Meanwhile, since $W^{(i)} = \mathrm{Ker}\,N^{i}$, $\dim W^{(i)} = m_i = \dim \mathrm{Ker}\,N^{i}$. From (12.116), $m_0 \equiv 0$. Then (11.45) now reads

$$\dim \mathrm{Ker}\,N^{i} + \dim N^{i}(V^n) = n.$$

That is,

$$n = m_i + \mathrm{rank}\,N^{i} \tag{12.127}$$

or

$$m_i = n - \mathrm{rank}\,N^{i}. \tag{12.128}$$

Meanwhile,

$$\dim W^{(i)} = m_i = \sum_{k=1}^{i}r_k, \qquad \dim W^{(i-1)} = m_{i-1} = \sum_{k=1}^{i-1}r_k. \tag{12.129}$$

Hence, we have

$$r_i = m_i - m_{i-1} = \mathrm{rank}\,N^{i-1} - \mathrm{rank}\,N^{i}, \tag{12.130}$$

where the last equality arises from the dimension theorem expressed as (11.45). Combining (12.123) with (12.130), we also get

$$J_i = r_i - r_{i+1} = \mathrm{rank}\,N^{i-1} + \mathrm{rank}\,N^{i+1} - 2\,\mathrm{rank}\,N^{i}. \tag{12.131}$$
In Table 12.1 [3, 4], moreover, we have two extreme cases. That is, if $\nu = 1$ in (12.116), i.e., $N = 0$, from (12.132) we have $r_1 = n$ and $r_2 = \cdots = r_n = 0$; see Fig. 12.1a. Also we confirm $n = \sum_{i=1}^{\nu}r_i$ in (12.121). In this case, all the eigenvectors are proper eigenvectors with multiplicity of $n$ and we have $n$ first-order Jordan blocks. The other extreme is the case of $\nu = n$. In that case, we have

$$r_1 = r_2 = \cdots = r_n = 1. \tag{12.133}$$

In the latter case, we also have $n = \sum_{i=1}^{\nu}r_i$ in (12.121). We have only one proper eigenvector and $(n - 1)$ generalized eigenvectors; see Fig. 12.1b. From (12.132), we have this special case, where we have only one $n$-th order Jordan block, when $\mathrm{rank}\,N = n - 1$.
Let us think of (12.81) on the basis of (12.102). Picking up $A^{(i)}$ from (12.102) and considering (12.108), we put

$$A^{(i)} = N_i + \alpha_i E_{n_i}, \tag{12.134}$$
where $E_{n_i}$ denotes the $(n_i, n_i)$ identity matrix. We express a nilpotent matrix as $N_i$ as before. In (12.134) the number $n_i$ corresponds to $n$ in $V^n$ of Sect. 12.6.1. As $N_i$ is an $(n_i, n_i)$ matrix, we have

$$N_i^{\,n_i} = 0.$$
Here we are speaking of $\nu$-th order nilpotent matrices $N_i$ such that $N_i^{\nu-1} \neq 0$ and $N_i^{\nu} = 0\ (1 \le \nu \le n_i)$. We can deal with $N_i$ in a manner fully consistent with the theory we developed in Sect. 12.6.1. Each $A^{(i)}$ comprises one or more Jordan blocks $A^{(\kappa)}$ that are expressed as

$$A^{(\kappa)} = N_{\kappa_i} + \alpha_i E_{\kappa_i} \quad (1 \le \kappa_i \le n_i), \tag{12.135}$$

where $A^{(\kappa)}$ denotes the $\kappa$-th Jordan block in $A^{(i)}$. In $A^{(\kappa)}$, $N_{\kappa_i}$ and $E_{\kappa_i}$ are a nilpotent $(\kappa_i, \kappa_i)$ matrix and the $(\kappa_i, \kappa_i)$ identity matrix, respectively. In $N_{\kappa_i}$, $\kappa_i$ zeros are displayed on the principal diagonal and entries of 1 are positioned on the matrix elements next above the principal diagonal; all other entries are zero; see, e.g., the matrix of (12.122). As in Sect. 12.6.1, the number $\kappa_i$ is called the dimension of the Jordan block. Thus, $A^{(i)}$ of (12.102) can further be reduced to segmented matrices $A^{(\kappa)}$. Our next task is to find out how many Jordan blocks are contained in individual $A^{(i)}$ and what the dimensions of those Jordan blocks are.
Corresponding to (12.122), the matrix representation of the linear transformation
by A(κ) with respect to the set of κ i vectors is
$$\left[A^{(\kappa)} - \alpha_i E_{\kappa_i}\right]\left(\left[A^{(\kappa)} - \alpha_i E_{\kappa_i}\right]^{\kappa_i-1}a_\sigma\ \ \left[A^{(\kappa)} - \alpha_i E_{\kappa_i}\right]^{\kappa_i-2}a_\sigma \cdots a_\sigma\right)$$

$$= \left(\left[A^{(\kappa)} - \alpha_i E_{\kappa_i}\right]^{\kappa_i-1}a_\sigma\ \ \left[A^{(\kappa)} - \alpha_i E_{\kappa_i}\right]^{\kappa_i-2}a_\sigma \cdots a_\sigma\right)\begin{pmatrix} 0 & 1 & & & \\ & 0 & 1 & & \\ & & \ddots & \ddots & \\ & & & 0 & 1 \\ & & & & 0 \end{pmatrix}. \tag{12.136}$$
A vector $a_\sigma$ stands for a vector associated with the $\kappa$-th Jordan block of $A^{(i)}$. From (12.136) we obtain

$$\left[A^{(\kappa)} - \alpha_i E_{\kappa_i}\right]\left[A^{(\kappa)} - \alpha_i E_{\kappa_i}\right]^{\kappa_i-1}a_\sigma = 0. \tag{12.137}$$

Namely,

$$A^{(\kappa)}\left[A^{(\kappa)} - \alpha_i E_{\kappa_i}\right]^{\kappa_i-1}a_\sigma = \alpha_i\left[A^{(\kappa)} - \alpha_i E_{\kappa_i}\right]^{\kappa_i-1}a_\sigma. \tag{12.138}$$

This shows that $\left[A^{(\kappa)} - \alpha_i E_{\kappa_i}\right]^{\kappa_i-1}a_\sigma$ is a proper eigenvector of $A^{(\kappa)}$ that corresponds to an eigenvalue $\alpha_i$. On the other hand, $a_\sigma$ is a generalized eigenvector of rank $\kappa_i$. There are another $(\kappa_i - 2)$ generalized eigenvectors $\left[A^{(\kappa)} - \alpha_i E_{\kappa_i}\right]^{\mu}a_\sigma\ (1 \le \mu \le \kappa_i - 2)$. In total, there are $\kappa_i$ eigenvectors [a proper eigenvector and $(\kappa_i - 1)$ generalized eigenvectors]. Also we see that the sole proper eigenvector can be found for each Jordan block.
In reference to these $\kappa_i$ eigenvectors as the basis vectors, the $(\kappa_i, \kappa_i)$-matrix $A^{(\kappa)}$ (i.e., a Jordan block) is expressed as

$$A^{(\kappa)} = \begin{pmatrix} \alpha_i & 1 & & \\ & \alpha_i & \ddots & \\ & & \ddots & 1 \\ & & & \alpha_i \end{pmatrix}. \tag{12.139}$$
where A(i) is a (10, 10) upper triangle matrix in which αi is displayed on the principal
diagonal with entries 0 or 1 on the matrix element next above the principal diagonal
with all other entries being zero.
Theorem 12.1 shows that every $(n, n)$ square matrix can be converted to a triangle matrix by a suitable similarity transformation. Diagonal elements give eigenvalues. Furthermore, Theorem 12.5 ensures that $A$ can be reduced to generalized eigenspaces $\widetilde{W}_{\alpha_i}\ (1 \le i \le s)$ according to individual eigenvalues. Suppose for example that after a suitable similarity transformation a full matrix $A$ is represented as
$$A \cong \begin{pmatrix} \alpha_1 & & & & & & & & & & & \\ & \alpha_2 & & & & & & & & & & \\ & & \alpha_2 & & & & & & & & & \\ & & & \alpha_2 & & & & & & & & \\ & & & & \ddots & & & & & & & \\ & & & & & \alpha_i & & & & & & \\ & & & & & & \alpha_i & & & & & \\ & & & & & & & \alpha_i & & & & \\ & & & & & & & & \alpha_i & & & \\ & & & & & & & & & \ddots & & \\ & & & & & & & & & & \alpha_s & \\ & & & & & & & & & & & \alpha_s \end{pmatrix}, \tag{12.140}$$
where

$$A \cong \begin{pmatrix} A^{(1)} & & & & & \\ & A^{(2)} & & & & \\ & & \ddots & & & \\ & & & A^{(i)} & & \\ & & & & \ddots & \\ & & & & & A^{(s)} \end{pmatrix}. \tag{12.141}$$
In (12.141), A(1) is a (1, 1) matrix (i.e., simply a number); A(2) is a (3, 3) matrix;
A(i) is a (4, 4) matrix; A(s) is a (2, 2) matrix.
The above matrix form allows us to further deal with the segmented triangle matrices separately. In the case of (12.140) we may use the following matrix for the similarity transformation:

$$\widetilde{P} = \begin{pmatrix} 1 & & & & & & & & \\ & \ddots & & & & & & & \\ & & 1 & & & & & & \\ & & & p_{11} & p_{12} & p_{13} & p_{14} & & \\ & & & p_{21} & p_{22} & p_{23} & p_{24} & & \\ & & & p_{31} & p_{32} & p_{33} & p_{34} & & \\ & & & p_{41} & p_{42} & p_{43} & p_{44} & & \\ & & & & & & & \ddots & \\ & & & & & & & & 1 \end{pmatrix},$$
matrix, however, the following argument will be useful. Let us think of an example
after that.
Using (12.140), we have

$$A - \alpha_i E \cong \begin{pmatrix} \alpha_1 - \alpha_i & & & & & & & & & & & \\ & \alpha_2 - \alpha_i & & & & & & & & & & \\ & & \alpha_2 - \alpha_i & & & & & & & & & \\ & & & \alpha_2 - \alpha_i & & & & & & & & \\ & & & & \ddots & & & & & & & \\ & & & & & 0 & & & & & & \\ & & & & & & 0 & & & & & \\ & & & & & & & 0 & & & & \\ & & & & & & & & 0 & & & \\ & & & & & & & & & \ddots & & \\ & & & & & & & & & & \alpha_s - \alpha_i & \\ & & & & & & & & & & & \alpha_s - \alpha_i \end{pmatrix}.$$
Here we are treating the $(n, n)$ matrix $A$ on $V^n$. Note that the matrix $A - \alpha_i E$ is not nilpotent as a whole. Suppose that the multiplicity of $\alpha_i$ is $n_i$; in (12.140) $n_i = 4$. Since the eigenvalues $\alpha_1, \alpha_2, \cdots,$ and $\alpha_s$ take different values from one another, $\alpha_1 - \alpha_i$, $\alpha_2 - \alpha_i, \cdots,$ and $\alpha_s - \alpha_i \neq 0$. In a triangle matrix the diagonal elements give the eigenvalues and, hence, $\alpha_1 - \alpha_i$, $\alpha_2 - \alpha_i, \cdots,$ and $\alpha_s - \alpha_i$ are nonzero eigenvalues of $A - \alpha_i E$. We rewrite (12.140) as
$$A - \alpha_i E \cong \begin{pmatrix} M^{(1)} & & & & & \\ & M^{(2)} & & & & \\ & & \ddots & & & \\ & & & M^{(i)} & & \\ & & & & \ddots & \\ & & & & & M^{(s)} \end{pmatrix}, \tag{12.142}$$
where

$$M^{(1)} = (\alpha_1 - \alpha_i), \quad M^{(2)} = \begin{pmatrix} \alpha_2 - \alpha_i & & \\ & \alpha_2 - \alpha_i & \\ & & \alpha_2 - \alpha_i \end{pmatrix}, \;\cdots,$$

$$M^{(i)} = \begin{pmatrix} 0 & & & \\ & 0 & & \\ & & 0 & \\ & & & 0 \end{pmatrix}, \;\cdots,\; M^{(s)} = \begin{pmatrix} \alpha_s - \alpha_i & \\ & \alpha_s - \alpha_i \end{pmatrix}.$$
we get

$$\Phi_A(A) \equiv \prod_{i=1}^{s}(A - \alpha_i E)^{\mu_i} = 0.$$
and
$$\dim \widetilde{W}_{\alpha_i} = n_i = \dim \mathrm{Ker}\left[A^{(i)} - \alpha_i E_{n_i}\right]^{k} + \mathrm{rank}\left[A^{(i)} - \alpha_i E_{n_i}\right]^{k}, \tag{12.146}$$

where $\mathrm{rank}\,(A - \alpha_i E)^{k} = \dim (A - \alpha_i E)^{k}(V^n)$ and $\mathrm{rank}\left[A^{(i)} - \alpha_i E_{n_i}\right]^{k} = \dim \left[A^{(i)} - \alpha_i E_{n_i}\right]^{k}\left(\widetilde{W}_{\alpha_i}\right)$.
Noting that

$$\dim \mathrm{Ker}\,(A - \alpha_i E)^{k} = \dim \mathrm{Ker}\left[A^{(i)} - \alpha_i E_{n_i}\right]^{k}, \tag{12.147}$$

we get

$$n - \mathrm{rank}\,(A - \alpha_i E)^{k} = n_i - \mathrm{rank}\left[A^{(i)} - \alpha_i E_{n_i}\right]^{k}. \tag{12.148}$$
$$\mathrm{rank}\,(A - \alpha_i E)^{l} = n - n_i \quad (l \ge \mu_i). \tag{12.149}$$

The value $r_1^{(i)}$ gives the number of Jordan blocks with an eigenvalue $\alpha_i$.
Moreover, we must consider a following situation. We know how the matrix A(i)
in (10.134) is reduced to Jordan blocks of lower dimension. To get detailed infor-
mation about it, however, we have to get the information about (generalized)
eigenvalues corresponding to eigenvalues other than αi. In this context, (12.148) is
useful. Equation (12.131) tells how the number of Jordan blocks in a nilpotent matrix
is determined. If we can get this knowledge before we have found out all the
(generalized) eigenvectors, it will be easier to address the problem. Let us rewrite
(12.131) as
h iq1 h iqþ1
J ðqiÞ ¼ r q r qþ1 ¼ rank AðiÞ αi Eni þ rank AðiÞ αi E ni
h iq
2rank AðiÞ αi E ni , ð12:151Þ
where we define $J_q^{(i)}$ as the number of the $q$-th order Jordan blocks within $A^{(i)}$. Note that these blocks are expressed as $(q, q)$ matrices. Meanwhile, using (12.148), $J_q^{(i)}$ is expressed as

$$J_q^{(i)} = \mathrm{rank}\,(A - \alpha_i E)^{q-1} + \mathrm{rank}\,(A - \alpha_i E)^{q+1} - 2\,\mathrm{rank}\,(A - \alpha_i E)^{q}. \tag{12.152}$$
As an illustrative example, let us consider the following matrix:

$$A = \begin{pmatrix} 1 & -1 & 0 & 0 \\ 1 & 3 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ -3 & -1 & -2 & 4 \end{pmatrix}.$$

The characteristic polynomial is

$$f_A(x) = |xE - A| = (x - 4)(x - 2)^3.$$

(i) $x = 4$: A proper eigenvector is obtained from a secular equation

$$(A - 4E)x = 0.$$

That is,

$$-3c_1 - c_2 = 0, \quad c_1 - c_2 = 0, \quad -2c_3 = 0, \quad -3c_1 - c_2 - 2c_3 = 0.$$
Hence,

$$r_1^{(4)} = 4 - \mathrm{rank}\,(A - 4E) = 1. \tag{12.156}$$

In this case, the Jordan block is naturally one-dimensional. In fact, using (12.152) we have

$$J_1^{(4)} = \mathrm{rank}\,(A - 4E)^{0} + \mathrm{rank}\,(A - 4E)^{2} - 2\,\mathrm{rank}\,(A - 4E) = 4 + 3 - 2 \times 3 = 1. \tag{12.157}$$
In (12.157), $J_1^{(4)}$ gives the number of the first-order Jordan blocks for the eigenvalue 4. We used

$$(A - 4E)^2 = \begin{pmatrix} 8 & 4 & 0 & 0 \\ -4 & 0 & 0 & 0 \\ 0 & 0 & 4 & 0 \\ 8 & 4 & 4 & 0 \end{pmatrix}, \tag{12.158}$$

whose rank is 3.
(ii) $x = 2$:

The eigenvalue 2 is a triple root. Therefore, we must examine how the invariant subspace can further be decomposed into subspaces of lower dimension. To this end we first start with a secular equation expressed as

$$(A - 2E)x = 0. \tag{12.159}$$

That is,

$$c_1 + c_2 = 0, \qquad -3c_1 - c_2 - 2c_3 + 2c_4 = 0.$$

From the above, we can put $c_1 = c_2 = 0$ and $c_3 = c_4\ (= 1)$. The equations also allow the existence of another proper eigenvector; for this we have $c_1 = 1$, $c_2 = -1$, $c_3 = 0$, and $c_4 = 1$. Thus, for the two proper eigenvectors corresponding to the eigenvalue 2, we get
$$e_1^{(2)} = \begin{pmatrix} 0 \\ 0 \\ 1 \\ 1 \end{pmatrix}, \qquad e_2^{(2)} = \begin{pmatrix} 1 \\ -1 \\ 0 \\ 1 \end{pmatrix}.$$
$$(A - 2E)^2 x = 0. \tag{12.160}$$

That is,

$$c_1 + c_3 - c_4 = 0. \tag{12.161}$$
Furthermore, we have

$$(A - 2E)^3 = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ -8 & 0 & -8 & 8 \end{pmatrix}. \tag{12.162}$$
$$r_1^{(2)} = 4 - \mathrm{rank}\,(A - 2E) = 4 - 2 = 2. \tag{12.163}$$

$$J_1^{(2)} = \mathrm{rank}\,(A - 2E)^{0} + \mathrm{rank}\,(A - 2E)^{2} - 2\,\mathrm{rank}\,(A - 2E) = 4 + 1 - 2 \times 2 = 1. \tag{12.164}$$

$$J_2^{(2)} = \mathrm{rank}\,(A - 2E) + \mathrm{rank}\,(A - 2E)^{3} - 2\,\mathrm{rank}\,(A - 2E)^{2} = 2 + 1 - 2 \times 1 = 1. \tag{12.165}$$
In the above, $J_1^{(2)}$ and $J_2^{(2)}$ are obtained from (12.152). Thus, Fig. 12.2 gives the constitution of Jordan blocks for the eigenvalues 4 and 2. The overall number of Jordan blocks is three; the number of the first-order and second-order Jordan blocks is two and one, respectively.
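The rank arithmetic of (12.156)–(12.165) can be reproduced programmatically. The matrix $A$ below is the example matrix with sign conventions as reconstructed in this section; the function implements (12.152) with exact rational arithmetic:

```python
from fractions import Fraction

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matpow(M, k):
    n = len(M)
    P = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        P = matmul(P, M)
    return P

def rank(M):
    # rank by Gauss-Jordan elimination over the rationals
    A = [[Fraction(v) for v in row] for row in M]
    rows, cols, r = len(A), len(A[0]), 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if A[i][c] != 0), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(rows):
            if i != r and A[i][c] != 0:
                f = A[i][c] / A[r][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[r])]
        r += 1
    return r

A = [[1, -1, 0, 0],
     [1, 3, 0, 0],
     [0, 0, 2, 0],
     [-3, -1, -2, 4]]

def jordan_blocks(alpha, q):
    # number of q-th order Jordan blocks for eigenvalue alpha, cf. (12.152)
    n = len(A)
    M = [[A[i][j] - (alpha if i == j else 0) for j in range(n)]
         for i in range(n)]
    return (rank(matpow(M, q - 1)) + rank(matpow(M, q + 1))
            - 2 * rank(matpow(M, q)))

print(jordan_blocks(4, 1), jordan_blocks(2, 1), jordan_blocks(2, 2))  # 1 1 1
```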
The proper eigenvector $e_1^{(2)}$ is related to $J_1^{(2)}$ of (12.164). A set of the proper eigenvector $e_2^{(2)}$ and the corresponding generalized eigenvector $g_2^{(2)}$ is pertinent to $J_2^{(2)}$ of (12.165). We must choose the generalized eigenvector $g_2^{(2)}$ in such a way that

$$(A - 2E)^2 g_2^{(2)} = (A - 2E)\left[(A - 2E)g_2^{(2)}\right] = (A - 2E)e_2^{(2)} = 0. \tag{12.166}$$

We stress here that $e_1^{(2)}$ is not eligible for a proper pair with $g_2^{(2)}$ in $J_2$. This is because from (12.166) we have
Fig. 12.2 Constitution of Jordan blocks for the eigenvalues 4 and 2
$$R^{-1}AR = \begin{pmatrix} -1 & 0 & -1 & 1 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ -1 & -1 & 0 & 0 \end{pmatrix}\begin{pmatrix} 1 & -1 & 0 & 0 \\ 1 & 3 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ -3 & -1 & -2 & 4 \end{pmatrix}\begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & -1 & -1 \\ 0 & 1 & 0 & 0 \\ 1 & 1 & 1 & 0 \end{pmatrix} = \begin{pmatrix} 4 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 2 \end{pmatrix}. \tag{12.170}$$
The structure of Jordan blocks is shown in Fig. 12.2. Notice here that the trace of
A remains unchanged before and after similarity transformation.
Next, we consider column vector representations. According to (11.37), let us
view the matrix A as a linear transformation over V 4. Then A is given by
$$A(x) = (e_1\ e_2\ e_3\ e_4)\,A\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = (e_1\ e_2\ e_3\ e_4)\begin{pmatrix} 1 & -1 & 0 & 0 \\ 1 & 3 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ -3 & -1 & -2 & 4 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}, \tag{12.171}$$
$$A(x) = (e_1\ e_2\ e_3\ e_4)\,RR^{-1}ARR^{-1}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \left(e_1^{(4)}\ e_1^{(2)}\ e_2^{(2)}\ g_2^{(2)}\right)\begin{pmatrix} 4 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 2 \end{pmatrix}\begin{pmatrix} x'_1 \\ x'_2 \\ x'_3 \\ x'_4 \end{pmatrix}, \tag{12.172}$$
where we have

$$\left(e_1^{(4)}\ e_1^{(2)}\ e_2^{(2)}\ g_2^{(2)}\right) = (e_1\ e_2\ e_3\ e_4)\,R, \qquad \begin{pmatrix} x'_1 \\ x'_2 \\ x'_3 \\ x'_4 \end{pmatrix} = R^{-1}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}. \tag{12.173}$$
As for $V^n$ in general, let us put $R = (p_{ij})$ and $R^{-1} = (q_{ij})$. Also, represent the $j$-th (generalized) eigenvectors by column vectors and denote them by $p^{(j)}$. There we display the individual (generalized) eigenvectors in the order of $(e^{(1)}\ e^{(2)} \cdots e^{(j)} \cdots e^{(n)})$, where $e^{(j)}\ (1 \le j \le n)$ denotes either a proper eigenvector or a generalized eigenvector according to (12.174). Each $p^{(j)}$ is represented in reference to the original basis vectors $(e_1 \cdots e_n)$. Then we have
$$\left(R^{-1}p^{(j)}\right)_i = \sum_{k=1}^{n}q_{ik}\,p_k^{(j)} = \delta_i^{(j)}, \tag{12.175}$$

where $\delta^{(j)}$ denotes a column vector in which only the $j$-th row is 1 and the other rows are 0. Thus, the column vector $\delta^{(j)}$ is an "address" of $e^{(j)}$ in reference to $(e^{(1)}\ e^{(2)} \cdots e^{(j)} \cdots e^{(n)})$ taken as basis vectors.
In (12.176), $p^{(1)}$ is a column vector representation of $e_1^{(4)}$; $p^{(2)}$ is a column vector representation of $e_1^{(2)}$, and so on.

The minimal polynomial $\Phi_A(x)$ is expressed as

$$\Phi_A(x) = (x - 4)(x - 2)^2.$$
Instead of $R$, we may choose, for example,

$$R' = \begin{pmatrix} 0 & 0 & -1 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 0 \\ 1 & 1 & -1 & 0 \end{pmatrix}. \tag{12.177}$$

In this case, we also get the same Jordan canonical form as before. That is,

$$R'^{-1}AR' = \begin{pmatrix} 4 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 2 \end{pmatrix}.$$
Note that we get a different disposition of the matrix elements from that of
(12.172).
Next, we decompose A into a semi-simple matrix and a nilpotent matrix. In
(12.172), we had
$$R^{-1}AR = \begin{pmatrix} 4 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 2 \end{pmatrix}.$$
Defining

$$S = \begin{pmatrix} 4 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{pmatrix} \quad\text{and}\quad N = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix},$$
we have

$$A = \widetilde{S} + \widetilde{N} \tag{12.178}$$

with

$$\widetilde{S} = RSR^{-1} = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ -2 & 0 & -2 & 4 \end{pmatrix} \quad\text{and}\quad \widetilde{N} = RNR^{-1} = \begin{pmatrix} -1 & -1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ -1 & -1 & 0 & 0 \end{pmatrix}.$$
Even though the matrix forms of $S$ and $N$ differ depending on the choice of the transformation matrix $R$ (namely, $R'$, $R''$, etc. represented above), the decomposition (12.178) is unique. That is, $\widetilde{S}$ and $\widetilde{N}$ are uniquely determined once a matrix $A$ is given. The confirmation is left for readers as an exercise.
We present another simple example. Let A be a matrix described as
\[ A = \begin{pmatrix} 0 & 4 \\ -1 & 4 \end{pmatrix}. \]
Its characteristic polynomial is \(f_A(x) = (x-2)^2\), so that A has the double eigenvalue 2. A proper eigenvector is
\[ \mathbf{e}_1^{(2)} = \begin{pmatrix} 2 \\ 1 \end{pmatrix}. \]
Another eigenvector is a generalized eigenvector \(\mathbf{g}_1^{(2)}\) of rank 2. This can be decided such that
\[ (A-2E)\mathbf{g}_1^{(2)} = \begin{pmatrix} -2 & 4 \\ -1 & 2 \end{pmatrix}\mathbf{g}_1^{(2)} = \mathbf{e}_1^{(2)}. \]
As an option, we get
\[ \mathbf{g}_1^{(2)} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}. \]
Thus, we can choose R for a transformation matrix together with its inverse \(R^{-1}\) such that
\[ R = \begin{pmatrix} 2&1\\1&1 \end{pmatrix} \quad\text{and}\quad R^{-1} = \begin{pmatrix} 1&-1\\-1&2 \end{pmatrix}, \]
so that
\[ R^{-1}AR = \begin{pmatrix} 2&1\\0&2 \end{pmatrix}. \tag{12.179} \]
As before, putting
\[ S = \begin{pmatrix} 2&0\\0&2 \end{pmatrix} \quad\text{and}\quad N = \begin{pmatrix} 0&1\\0&0 \end{pmatrix}, \]
we have
\[ A = \tilde{S} + \tilde{N} \]
with
\[ \tilde{S} = \begin{pmatrix} 2&0\\0&2 \end{pmatrix} \quad\text{and}\quad \tilde{N} = \begin{pmatrix} -2&4\\-1&2 \end{pmatrix}. \tag{12.180} \]
We may also choose R′ for a transformation matrix together with an inverse matrix \(R'^{-1}\) instead of R and \(R^{-1}\), respectively, such that we have, e.g.,
\[ R' = \begin{pmatrix} -2&3\\-1&1 \end{pmatrix} \quad\text{and}\quad R'^{-1} = \begin{pmatrix} 1&-3\\1&-2 \end{pmatrix}. \]
Using these matrices, we get exactly the same Jordan canonical form and the
matrix decomposition as (12.179) and (12.180). Thus, again we find that the matrix
decomposition is unique.
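The uniqueness claimed here can be checked numerically. Below is a minimal pure-Python sketch (not part of the book) that forms \(\tilde{S} = RSR^{-1}\) and \(\tilde{N} = RNR^{-1}\) for two different choices of transformation matrix and confirms that the same decomposition results; the matrix entries, including signs, are reconstructed assumptions based on the 2×2 example above.

```python
def matmul(X, Y):
    """Multiply two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[0, 4], [-1, 4]]          # the 2x2 example (signs assumed)
S = [[2, 0], [0, 2]]           # diagonal part of the Jordan form
N = [[0, 1], [0, 0]]           # nilpotent part of the Jordan form

R1, R1i = [[2, 1], [1, 1]], [[1, -1], [-1, 2]]      # R and its inverse
R2, R2i = [[-2, 3], [-1, 1]], [[1, -3], [1, -2]]    # R' and its inverse

def similar(R, M, Ri):
    """Return R M R^{-1}."""
    return matmul(matmul(R, M), Ri)

S1, N1 = similar(R1, S, R1i), similar(R1, N, R1i)
S2, N2 = similar(R2, S, R2i), similar(R2, N, R2i)

assert S1 == S2 and N1 == N2                 # decomposition is unique
assert [[S1[i][j] + N1[i][j] for j in range(2)] for i in range(2)] == A
assert matmul(N1, N1) == [[0, 0], [0, 0]]    # N~ is nilpotent
```

The same check with any other valid choice of R would reproduce the identical pair \(\tilde{S}\), \(\tilde{N}\).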
Another simple example is a lower triangle matrix
\[ A = \begin{pmatrix} 2&0&0\\ 2&1&0\\ 0&0&1 \end{pmatrix}. \]
Then, we get
\[ R^{-1}AR = S = \begin{pmatrix} 2&0&0\\ 0&1&0\\ 0&0&1 \end{pmatrix}, \]
so that \(A = RSR^{-1} + 0\), where the first term is a semi-simple matrix and the second is a nilpotent matrix (i.e., the zero matrix). Thus, the decomposition is once again unique.
In this case, let us examine what form a minimal polynomial \(\varphi_A(x)\) for A takes. A characteristic polynomial \(f_A(x)\) for A is invariant under a similarity transformation, and so is \(\varphi_A(x)\). That is,
Certainly, \(\varphi_A(x)\) of (12.185) is a lowest-order polynomial among those satisfying \(\varphi(A) = 0\) and is a divisor of \(f_A(x)\). Also, \(\varphi_A(x)\) has 1 as the coefficient of its highest-order term. Thus, \(\varphi_A(x)\) should be a minimal polynomial of A, and we conclude that \(\varphi_A(x)\) does not have a multiple root.
Then let us think of how \(V^n\) is characterized in the case where \(\varphi_A(x)\) does not have a multiple root. This is equivalent to saying that \(\varphi_A(x)\) is described by (12.185). To see this, suppose that we have two matrices A and B and let \(BV^n = W\). We wish to use the following relation:
\[ \operatorname{rank}(AB) = \dim ABV^n = \dim AW = \dim W - \dim(A^{-1}\{0\} \cap W) \ge \dim W - \dim A^{-1}\{0\} = \dim BV^n - (n - \dim AV^n) = \operatorname{rank} A + \operatorname{rank} B - n. \tag{12.186} \]
In (12.186), the third equality comes from the fact that the domain of A is restricted to W. Concomitantly, \(A^{-1}\{0\}\) is restricted to \(A^{-1}\{0\} \cap W\) as well; notice that \(A^{-1}\{0\} \cap W\) is a subspace. Considering these situations, we use a relation corresponding to that of (11.45). The fourth equality is due to the dimension theorem of (11.45). Applying (12.186) to (12.183) successively, we have
12.7 Diagonalizable Matrices
Finally we get
\[ \sum_{i=1}^s \bigl[n - \operatorname{rank}(A - \alpha_i E)\bigr] \ge n. \tag{12.187} \]
Since \(\dim W_{\alpha_i} = n - \operatorname{rank}(A - \alpha_i E)\), this means
\[ \sum_{i=1}^s \dim W_{\alpha_i} \ge n. \tag{12.188} \]
Meanwhile, we have
\[ V^n \supseteq W_{\alpha_1} \oplus W_{\alpha_2} \oplus \cdots \oplus W_{\alpha_s}, \qquad n \ge \dim(W_{\alpha_1} \oplus W_{\alpha_2} \oplus \cdots \oplus W_{\alpha_s}) = \sum_{i=1}^s \dim W_{\alpha_i}. \tag{12.189} \]
The equality results from the property of a direct sum. From (12.188) and (12.189), we get
\[ \sum_{i=1}^s \dim W_{\alpha_i} = n. \tag{12.190} \]
Hence,
\[ V^n = W_{\alpha_1} \oplus W_{\alpha_2} \oplus \cdots \oplus W_{\alpha_s}. \tag{12.191} \]
Thus, we have proven that if the minimal polynomial does not have a multiple
root, V n is decomposed into direct sum of eigenspaces as in (12.191).
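The rank relation (12.186) used in this derivation is Sylvester's rank inequality, and it is easy to exercise numerically. The following pure-Python sketch (not from the book; the test matrices are arbitrary choices) computes ranks by exact rational Gaussian elimination and checks the inequality.

```python
# Check of rank(AB) >= rank A + rank B - n, cf. (12.186).
from fractions import Fraction

def rank(M):
    """Rank of a matrix (list of rows) via Gauss elimination over Q."""
    M = [[Fraction(x) for x in row] for row in M]
    rows, cols, r = len(M), len(M[0]), 0
    for c in range(cols):
        piv = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(r + 1, rows):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[1, 2, 0], [0, 1, 1], [1, 3, 1]]   # rank 2 (row3 = row1 + row2)
B = [[1, 0, 1], [2, 1, 0], [0, 0, 0]]   # rank 2
n = 3
assert rank(matmul(A, B)) >= rank(A) + rank(B) - n
```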
If in turn V n is decomposed into direct sum of eigenspaces as in (12.191),
A can be diagonalized by a similarity transformation. The proof is as follows:
Suppose that (12.191) holds. Then, we can take only eigenvectors for the basis vectors of \(V^n\). Suppose that \(\dim W_{\alpha_i} = n_i\). Then, we can take vectors \(a_k\) \(\bigl(\sum_{j=1}^{i-1} n_j + 1 \le k \le \sum_{j=1}^{i} n_j\bigr)\) so that these \(a_k\) are the basis vectors of \(W_{\alpha_i}\). In reference to this basis set, we describe a vector \(x \in V^n\) such that
\[ x = (a_1\ a_2\ \cdots\ a_n)\begin{pmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{pmatrix}. \]
Operating A on x, we get
\[ A(x) = (a_1\ a_2\ \cdots\ a_n)\,A\begin{pmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{pmatrix} = (A(a_1)\ A(a_2)\ \cdots\ A(a_n))\begin{pmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{pmatrix} = (\alpha_1 a_1\ \alpha_2 a_2\ \cdots\ \alpha_n a_n)\begin{pmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{pmatrix} = (a_1\ a_2\ \cdots\ a_n)\begin{pmatrix} \alpha_1 & & & \\ & \alpha_2 & & \\ & & \ddots & \\ & & & \alpha_n \end{pmatrix}\begin{pmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{pmatrix}, \tag{12.192} \]
where with the second equality we used the notation (11.40); with the third equality some of \(\alpha_i\ (1 \le i \le n)\) may be identical; \(a_i\) is an eigenvector that corresponds to an eigenvalue \(\alpha_i\). Suppose that \(a_1, a_2, \cdots,\) and \(a_n\) are obtained by transforming an "original" basis set \(e_1, e_2, \cdots,\) and \(e_n\) by R. Then, we have
\[ A(x) = (e_1\ e_2\ \cdots\ e_n)\,R\begin{pmatrix} \alpha_1 & & & \\ & \alpha_2 & & \\ & & \ddots & \\ & & & \alpha_n \end{pmatrix}R^{-1}\begin{pmatrix} x_1'\\ x_2'\\ \vdots\\ x_n' \end{pmatrix}. \]
We denote the matrix of the transformation A with respect to the basis set \(e_1, e_2, \cdots,\) and \(e_n\) by \(A_0\); see (11.82) for the notation. Then, we have
\[ A(x) = (e_1\ e_2\ \cdots\ e_n)\,A_0\begin{pmatrix} x_1'\\ x_2'\\ \vdots\\ x_n' \end{pmatrix}. \]
Therefore, we get
\[ R^{-1}A_0R = \begin{pmatrix} \alpha_1 & & & \\ & \alpha_2 & & \\ & & \ddots & \\ & & & \alpha_n \end{pmatrix}. \tag{12.193} \]
From (12.33), \(f_A(x) = (x-2)(x-1)\). Note that \(f_A(x) = f_{P^{-1}AP}(x)\). Let us treat the problem according to Theorem 12.5. Also, we use the notation of (12.85). Given \(f_1(x) = x-1\) and \(f_2(x) = x-2\), let us decide \(M_1(x)\) and \(M_2(x)\) such that these satisfy
\[ M_1(x)f_1(x) + M_2(x)f_2(x) = 1. \]
We find \(M_1(x) = 1\) and \(M_2(x) = -1\). Thus, using the notation of Theorem 12.5, Sect. 12.4, we have
\[ A_1 = M_1(A)f_1(A) = A - E = \begin{pmatrix} 1&1\\0&0 \end{pmatrix}, \]
\[ A_2 = M_2(A)f_2(A) = -A + 2E = \begin{pmatrix} 0&-1\\0&1 \end{pmatrix}. \]
We also have
\[ A_1 + A_2 = E, \qquad A_iA_j = A_jA_i = A_i\delta_{ij}. \tag{12.195} \]
Thus, we find that \(A_1\) and \(A_2\) are idempotent matrices. As both \(A_1\) and \(A_2\) are expressed as polynomials in A, they commute with A.
We find that A is represented by
\[ A = \alpha_1A_1 + \alpha_2A_2, \tag{12.196} \]
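This spectral decomposition is easy to verify concretely. Since \(A_1 = A - E\) equals the matrix displayed above, the underlying matrix must be \(A = A_1 + E\); the following pure-Python sketch (not from the book) checks (12.195) and (12.196) for it.

```python
# Verifying the idempotent decomposition (12.195)-(12.196) for the
# 2x2 example with eigenvalues alpha1 = 2 and alpha2 = 1.

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[2, 1], [0, 1]]                                   # A = A1 + E
E = [[1, 0], [0, 1]]
A1 = [[A[i][j] - E[i][j] for j in range(2)] for i in range(2)]      # A - E
A2 = [[2 * E[i][j] - A[i][j] for j in range(2)] for i in range(2)]  # -A + 2E

assert A1 == [[1, 1], [0, 0]] and A2 == [[0, -1], [0, 1]]
assert matmul(A1, A1) == A1 and matmul(A2, A2) == A2   # idempotent
assert matmul(A1, A2) == [[0, 0], [0, 0]]              # A1 A2 = 0
assert [[2 * A1[i][j] + 1 * A2[i][j] for j in range(2)]
        for i in range(2)] == A                        # A = 2 A1 + 1 A2
```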
\[ \tilde{A} = P^{-1}AP = \begin{pmatrix} 1&0&-1\\0&1&-1\\0&0&1 \end{pmatrix}\begin{pmatrix} 1&0&-1\\0&1&-1\\0&0&0 \end{pmatrix}\begin{pmatrix} 1&0&1\\0&1&1\\0&0&1 \end{pmatrix} = \begin{pmatrix} 1&0&0\\0&1&0\\0&0&0 \end{pmatrix}. \tag{12.198} \]
As can easily be checked, \(\tilde{A}^2 = \tilde{A}\). We also have
\[ E - \tilde{A} = \begin{pmatrix} 0&0&0\\0&0&0\\0&0&1 \end{pmatrix}, \tag{12.199} \]
where \((E - \tilde{A})^2 = E - \tilde{A}\) holds as well. Moreover, \(\tilde{A}(E - \tilde{A}) = (E - \tilde{A})\tilde{A} = 0\). Thus, \(\tilde{A}\) and \(E - \tilde{A}\) behave like \(A_1\) and \(A_2\) of (12.195).
Next, suppose that x 2 V n is expressed as a linear combination of basis vectors a1,
a2, , and an. Then
Here let us define the following linear transformation \(P^{(k)}\) such that \(P^{(k)}\) "extracts" the k-th component of x. That is,
\[ P^{(k)}(x) = P^{(k)}\Bigl(\sum_{j=1}^n c_ja_j\Bigr) = \sum_{j=1}^n\sum_{i=1}^n p_{ij}^{[k]}c_ja_i = c_ka_k, \tag{12.201} \]
where \(p_{ij}^{[k]}\) is the matrix representation of \(P^{(k)}\). In fact, suppose that there is another arbitrarily chosen vector
\[ y = \sum_{j=1}^n d_ja_j. \tag{12.202} \]
Then we have
\[ P^{(k)}(ax + by) = (ac_k + bd_k)a_k = ac_ka_k + bd_ka_k = aP^{(k)}(x) + bP^{(k)}(y). \tag{12.203} \]
Thus P(k) is a linear transformation. In (12.201), for the third equality to hold, we
should have
\[ p_{ij}^{[k]} = \delta_i^{(k)}\delta_j^{(k)}, \tag{12.204} \]
where \(\delta_i^{(j)}\) has been defined in (12.175). Meanwhile, \(\delta_j^{(k)}\) denotes a row vector in which only the k-th column is 1, the others being 0. Note that \(\delta_i^{(k)}\) represents an (n, 1) matrix and that \(\delta_j^{(k)}\) denotes a (1, n) matrix. Therefore, \(\delta_i^{(k)}\delta_j^{(k)}\) represents an (n, n) matrix whose (k, k) element is 1, all the others being 0.
Thus, \(P^{(k)}(x)\) is denoted by
\[ P^{(k)}(x) = (a_1\ \cdots\ a_n)\begin{pmatrix} 0 & & & & \\ & \ddots & & & \\ & & 1 & & \\ & & & \ddots & \\ & & & & 0 \end{pmatrix}\begin{pmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{pmatrix} = x_ka_k, \tag{12.205} \]
where only the (k, k) element of the matrix is 1, all the others being 0. Then \(P^{(k)}[P^{(k)}(x)] = P^{(k)}(x)\). That is,
\[ \bigl[P^{(k)}\bigr]^2 = P^{(k)}. \tag{12.206} \]
\[ A = \alpha_1A_1 + \cdots + \alpha_nA_n, \tag{12.208} \]
where \(\alpha_1, \cdots,\) and \(\alpha_n\) are eigenvalues (some of which may be identical) and \(A_1, \cdots,\) and \(A_n\) are idempotent matrices such as those represented by (12.205).
Yet, we have to be careful to construct idempotent matrices according to a
formalism described in Theorem 12.5. It is because we often encounter a situation
where different matrices give an identical characteristic polynomial. We briefly
mention this in the next example.
Example 12.8 Let us think about the following two matrices:
\[ A = \begin{pmatrix} 3&0&0\\0&2&0\\0&0&2 \end{pmatrix}, \qquad B = \begin{pmatrix} 3&0&0\\0&2&1\\0&0&2 \end{pmatrix}. \tag{12.209} \]
Therefore, we have
\[ M_1(x)f_1(x) = (x-2)^3, \qquad M_2(x)f_2(x) = -(x-3)\bigl(x^2-3x+3\bigr). \tag{12.211} \]
Hence, we get
\[ A_1 \equiv M_1(A)f_1(A) = (A-2E)^3, \qquad A_2 \equiv M_2(A)f_2(A) = -(A-3E)\bigl(A^2-3A+3E\bigr). \tag{12.212} \]
Notice that we get the same idempotent matrix of (12.213), even though the
matrix forms of A and B differ. Also we have
A1 þ A2 ¼ B1 þ B2 ¼ E:
Then, we have
A ¼ ðA1 þ A2 ÞA ¼ A1 A þ A2 A, B ¼ ðB1 þ B2 ÞB ¼ B1 B þ B2 B:
Nonetheless, although \(A = 3A_1 + 2A_2\) holds, \(B \ne 3B_1 + 2B_2\). That is, the decomposition of the form of (12.208) does not hold for B. The decomposition of this kind is possible only with diagonalizable matrices.
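The contrast between A and B can be confirmed directly. The sketch below (pure Python, not from the book) builds \(A_1 = (A-2E)^3\) and \(A_2 = -(A-3E)(A^2-3A+3E)\) for both matrices and checks that the idempotents coincide while only the diagonalizable A is recovered from them.

```python
# Verifying Example 12.8 numerically.

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def madd(X, Y, sx=1, sy=1):
    """Elementwise sx*X + sy*Y."""
    n = len(X)
    return [[sx * X[i][j] + sy * Y[i][j] for j in range(n)] for i in range(n)]

E = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
A = [[3, 0, 0], [0, 2, 0], [0, 0, 2]]
B = [[3, 0, 0], [0, 2, 1], [0, 0, 2]]

def idempotents(M):
    """Return (M-2E)^3 and -(M-3E)(M^2-3M+3E), cf. (12.212)."""
    D = madd(M, E, 1, -2)                               # M - 2E
    P1 = matmul(matmul(D, D), D)                        # (M - 2E)^3
    Q = madd(madd(matmul(M, M), M, 1, -3), E, 1, 3)     # M^2 - 3M + 3E
    P2 = [[-x for x in row] for row in matmul(madd(M, E, 1, -3), Q)]
    return P1, P2

A1, A2 = idempotents(A)
B1, B2 = idempotents(B)

assert A1 == B1 and A2 == B2                 # same idempotents
assert madd(A1, A2, 3, 2) == A               # A = 3 A1 + 2 A2 holds ...
assert madd(B1, B2, 3, 2) != B               # ... but not for B
```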
In summary, an (n, n) matrix with s \((1 \le s \le n)\) different eigenvalues has at least s proper eigenvectors. (Note that a diagonalizable matrix has n proper eigenvectors.) In the case of \(s < n\), the characteristic polynomial has multiple root(s) and the matrix may have generalized eigenvectors. If the matrix has a generalized eigenvector of rank ν, it is accompanied by (ν − 1) generalized eigenvectors along with a sole proper eigenvector; those vectors form an invariant subspace. In total, such n (generalized) eigenvectors span the whole vector space \(V^n\).
With the eigenvalue equation \(A(x) = \alpha x\), we have an indefinite but nontrivial solution \(x \ne 0\) only for restricted numbers α (i.e., eigenvalues) in the complex plane. However, we have a unique but trivial solution \(x = 0\) for complex numbers α other than the eigenvalues. This is characteristic of the eigenvalue problem.
References
Thus far we have treated the theory of linear vector spaces. The vector spaces,
however, were somewhat “structureless,” and so it will be desirable to introduce a
concept of metric or measure into the linear vector spaces. We call a linear vector
space where the inner product is defined an inner product space. By virtue of the concept of the inner product, the linear vector space is given a variety of structures. For instance, introduction of the inner product to the linear vector space immediately leads to the definition of adjoint operators and Gram matrices.
Above all, the concept of the inner product can readily be extended to a functional space and facilitates understanding of, e.g., orthogonalization of functions, as was exemplified in Parts I and II. Moreover, the definition of the inner product allows us to relate matrix operators and differential operators. In particular, it is a key to understanding the logical structure of quantum mechanics. This can easily be understood from the fact that Paul Dirac, who was known as one of the prominent founders of quantum mechanics, invented bra and ket vectors to represent an inner product.
13.1 Inner Product and Metric

An inner product relates two vectors to a complex number. To do this, we introduce the
notation jai and hbj to represent the vectors. This notation is due to Dirac and widely
used in physics and mathematical physics. Usually jai and hbj are called a “ket”
vector and a “bra” vector, respectively, again due to Dirac. Alternatively, we may
call \(\langle a|\) an adjoint vector of \(|a\rangle\). Or we denote \(\langle a| \equiv |a\rangle^\dagger\). The symbol "†" (dagger) means that for a matrix the transposed matrix should be taken with complex conjugate matrix elements; that is, \((A^\dagger)_{ij} = a_{ji}^*\). If we represent a full matrix, we have
\[ A = \begin{pmatrix} a_{11} & \cdots & a_{1n}\\ \vdots & \ddots & \vdots\\ a_{n1} & \cdots & a_{nn} \end{pmatrix}, \qquad A^\dagger = \begin{pmatrix} a_{11}^* & \cdots & a_{n1}^*\\ \vdots & \ddots & \vdots\\ a_{1n}^* & \cdots & a_{nn}^* \end{pmatrix}. \tag{13.1} \]
In (13.4), equality holds only if \(|a\rangle = 0\). Note here that two vectors are said to be orthogonal to each other if their inner product vanishes, i.e., \(\langle b|a\rangle = \langle a|b\rangle = 0\). In particular, if a vector \(|a\rangle \in V^n\) is orthogonal to all the vectors in \(V^n\), i.e., \(\langle x|a\rangle = 0\) for \(\forall x \in V^n\), then \(|a\rangle = 0\). This is because if we choose \(|a\rangle\) for \(|x\rangle\), we have \(\langle a|a\rangle = 0\), which means that \(|a\rangle = 0\). We call a linear vector space in which the inner product is defined an inner product space.
We can give another structure to a vector space. An example is a metric (or distance function). Suppose that there is an arbitrary set Q. If a real non-negative number ρ(a, b) is defined as follows for any arbitrary elements a, b, c ∈ Q, the set Q is called a metric space [1]:
In our study, a vector space is chosen for the set Q. Here let us define a norm for each vector a. The norm is defined as
\[ \|a\| = \sqrt{\langle a|a\rangle}. \tag{13.8} \]
If we define \(\rho(a, b) \equiv \|a - b\|\), then \(\|a - b\|\) satisfies the definitions of a metric. Equations (13.5) and (13.6) are obvious. For (13.7), let us consider a vector \(|c\rangle\) given by \(|c\rangle = |a\rangle - x\langle b|a\rangle|b\rangle\) with real x. Since \(\langle c|c\rangle \ge 0\), we have
or
The inequality (13.9), related to a quadratic equation in x with real coefficients, requires the discriminant to be non-positive. That is,
\[ \sqrt{\langle a|a\rangle}\,\sqrt{\langle b|b\rangle} \ge |\langle a|b\rangle|. \tag{13.11} \]
Namely,
Thus, (13.16) is equivalent to (13.7). At the same time, the norm defined in (13.8)
may be regarded as a “length” of a vector a.
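The Cauchy–Schwarz inequality (13.11) and the metric property of the norm can be illustrated numerically. Below is a small pure-Python sketch (not from the book; the sample vectors are arbitrary choices) using the convention that the bra component is complex-conjugated.

```python
from math import sqrt

def inner(a, b):
    """<a|b>, conjugating the first (bra) argument."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def norm(a):
    return sqrt(inner(a, a).real)

a = [1 + 2j, 0.5 - 1j, 3j]
b = [2 - 1j, 1j, 1 + 1j]
c = [0.5j, 1, -1 + 1j]

# Cauchy-Schwarz, (13.11):
assert abs(inner(a, b)) <= norm(a) * norm(b) + 1e-12

# rho(a,b) = ||a - b|| obeys the triangle inequality (13.7):
def dist(u, v):
    return norm([x - y for x, y in zip(u, v)])

assert dist(a, c) <= dist(a, b) + dist(b, c) + 1e-12
```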
As βjbi + γjci represents a vector, we use a shorthand notation for it as
That is,
Therefore, when we take a scalar out of a bra vector, it should be complex-conjugated. When the scalar is taken out of a ket vector, on the other hand, it is unaffected, by definition (13.3). To show this, we have \(|\alpha a\rangle = \alpha|a\rangle\). Taking its adjoint, \(\langle\alpha a| = \alpha^*\langle a|\).
We can view (13.17) as a linear transformation in a vector space. In other words, if we regard \(|\cdot\rangle\) as a transformation of a vector \(a \in V^n\) to \(|a\rangle \in \tilde{V}^n\), then \(|\cdot\rangle\) is a linear transformation of \(V^n\) to \(\tilde{V}^n\). On the other hand, \(\langle\cdot|\) cannot be regarded as a linear transformation of \(a \in V^n\) to \(\langle a| \in \tilde{V}^{n\prime}\). Sometimes the said transformation is referred to as "antilinear" or "sesquilinear." From the point of view of formalism, the inner product can be viewed as an operation: \(\tilde{V}^n \times \tilde{V}^{n\prime} \to \mathbb{C}\).
Let us consider \(|x\rangle = |y\rangle\) in an inner product space \(\tilde{V}^n\). Then \(|x\rangle - |y\rangle = 0\); that is, \(|x - y\rangle = 0\). Therefore, we have \(x = y\), or \(|0\rangle = 0\). This means that the linear transformation \(|\cdot\rangle\) converts \(0 \in V^n\) to \(|0\rangle \in \tilde{V}^n\). This is a characteristic of a linear transformation represented in (11.44). Similarly, we have \(\langle 0| = 0\). However, we do not have to get into further details in this book. Also, if we have to specify a vector space, we simply do so by designating it as \(V^n\).
13.2 Gram Matrices

Once we have defined an inner product between any pair of vectors \(|a\rangle\) and \(|b\rangle\) of a vector space \(V^n\), we can define and calculate various quantities related to inner products. As an example, let \(|a_1\rangle, \cdots, |a_n\rangle\) and \(|b_1\rangle, \cdots, |b_n\rangle\) be two sets of vectors in \(V^n\). The vectors \(|a_1\rangle, \cdots, |a_n\rangle\) may or may not be linearly independent. The same is true of \(|b_1\rangle, \cdots, |b_n\rangle\). Let us think of the following matrix M defined below:
\[ M = \begin{pmatrix} \langle a_1|\\ \vdots\\ \langle a_n| \end{pmatrix}(|b_1\rangle\ \cdots\ |b_n\rangle) = \begin{pmatrix} \langle a_1|b_1\rangle & \cdots & \langle a_1|b_n\rangle\\ \vdots & \ddots & \vdots\\ \langle a_n|b_1\rangle & \cdots & \langle a_n|b_n\rangle \end{pmatrix}. \tag{13.21} \]
(i) Suppose that in (13.21) \(|b_1\rangle, \cdots, |b_n\rangle\) are linearly dependent. Then, without loss of generality we can put
\[ |b_1\rangle = c_2|b_2\rangle + c_3|b_3\rangle + \cdots + c_n|b_n\rangle. \tag{13.22} \]
Then we have
\[ M = \begin{pmatrix} c_2\langle a_1|b_2\rangle + \cdots + c_n\langle a_1|b_n\rangle & \langle a_1|b_2\rangle & \cdots & \langle a_1|b_n\rangle\\ \vdots & \vdots & \ddots & \vdots\\ c_2\langle a_n|b_2\rangle + \cdots + c_n\langle a_n|b_n\rangle & \langle a_n|b_2\rangle & \cdots & \langle a_n|b_n\rangle \end{pmatrix}. \tag{13.23} \]
Multiplying the second column, ⋯, and the n-th column by \((-c_2), \cdots,\) and \((-c_n)\), respectively, and adding them to the first column, we find that the first column vanishes (13.24). Hence, \(\det M = 0\).
(ii) Suppose in turn that in (13.21) \(|a_1\rangle, \cdots, |a_n\rangle\) are linearly dependent. In that case, again, without loss of generality we can put
Again, \(\det M = 0\).
Next let us examine the case where \(\det M = 0\). In that case, the n column vectors of M in (13.21) are linearly dependent. Without loss of generality, we suppose that the first column is expressed as a linear combination of the other (n − 1) columns such that
\[ \begin{pmatrix} \langle a_1|b_1\rangle\\ \vdots\\ \langle a_n|b_1\rangle \end{pmatrix} = c_2\begin{pmatrix} \langle a_1|b_2\rangle\\ \vdots\\ \langle a_n|b_2\rangle \end{pmatrix} + \cdots + c_n\begin{pmatrix} \langle a_1|b_n\rangle\\ \vdots\\ \langle a_n|b_n\rangle \end{pmatrix}. \tag{13.27} \]
Multiplying the first row, ⋯, and the n-th row of (13.28) by appropriate complex numbers \(p_1^*, \cdots,\) and \(p_n^*\), respectively, we have
\[ \langle p_1a_1|b_1 - c_2b_2 - \cdots - c_nb_n\rangle = 0,\ \cdots,\ \langle p_na_n|b_1 - c_2b_2 - \cdots - c_nb_n\rangle = 0. \tag{13.29} \]
Now, suppose that \(|a_1\rangle, \cdots, |a_n\rangle\) are the basis vectors. Then, \(|p_1a_1 + \cdots + p_na_n\rangle\) represents any vector in the vector space. This implies that \(|b_1 - c_2b_2 - \cdots - c_nb_n\rangle = 0\); for this, see the remarks after (13.4). That is, \(|b_1\rangle = c_2|b_2\rangle + \cdots + c_n|b_n\rangle\), and hence \(|b_1\rangle, \cdots, |b_n\rangle\) are linearly dependent.
\[ H = H^\dagger. \tag{1.119} \]
Here, putting
\[ U^\dagger\begin{pmatrix} x_1\\ \vdots\\ x_n \end{pmatrix} = \begin{pmatrix} \xi_1\\ \vdots\\ \xi_n \end{pmatrix}, \quad\text{or equivalently}\quad x_i = \sum_{k=1}^n u_{ik}\,\xi_k, \tag{13.38} \]
we have
\[ \langle x|x\rangle = (\xi_1^*\ \cdots\ \xi_n^*)\,U^\dagger\begin{pmatrix} \langle e_1|e_1\rangle & \cdots & \langle e_1|e_n\rangle\\ \vdots & \ddots & \vdots\\ \langle e_n|e_1\rangle & \cdots & \langle e_n|e_n\rangle \end{pmatrix}U\begin{pmatrix} \xi_1\\ \vdots\\ \xi_n \end{pmatrix}. \tag{13.39} \]
Choosing U so that it diagonalizes G, we obtain
\[ U^\dagger GU = G' = \begin{pmatrix} \lambda_1 & & 0\\ & \ddots & \\ 0 & & \lambda_n \end{pmatrix}. \tag{13.40} \]
Thus, we get
\[ \langle x|x\rangle = (\xi_1^*\ \cdots\ \xi_n^*)\begin{pmatrix} \lambda_1 & & 0\\ & \ddots & \\ 0 & & \lambda_n \end{pmatrix}\begin{pmatrix} \xi_1\\ \vdots\\ \xi_n \end{pmatrix} = \lambda_1|\xi_1|^2 + \cdots + \lambda_n|\xi_n|^2. \tag{13.41} \]
From the relation (13.4), \(\langle x|x\rangle \ge 0\). This implies that in (13.41) we have
\[ \lambda_i \ge 0 \quad (1 \le i \le n). \tag{13.42} \]
To show this, suppose that \(\lambda_i < 0\) for some i. Taking \(\xi_i \ne 0\) and setting all the other components to zero, i.e., with the column vector \((0\ \cdots\ \xi_i\ \cdots\ 0)^T\), we would have \(\langle x|x\rangle = \lambda_i|\xi_i|^2 < 0\), in contradiction.
Since we have \(\det G \ne 0\), taking the determinant of (13.40) we have
\[ \det G' = \det(U^\dagger GU) = \det(U^\dagger U)\det G = \det E\,\det G = \det G = \prod_{i=1}^n \lambda_i \ne 0. \tag{13.43} \]
Combining (13.42) and (13.43), all the eigenvalues \(\lambda_i\) are positive; i.e., \(\lambda_i > 0\ (1 \le i \le n)\). (This is not the case with linearly dependent vectors; consider, e.g., \(|e_1\rangle\) and \(|e_2\rangle = -|e_1\rangle\); see Example 13.2 below.) In this case the Hermitian quadratic form is said to be positive definite, and we write
\[ H > 0. \tag{13.46} \]
If \(\phi(x_1, \cdots, x_n) \ge 0\) for any \(x_i\ (1 \le i \le n)\) and \(\phi(x_1, \cdots, x_n) = 0\) for at least one set of \((x_1, \cdots, x_n)\) in which some \(x_i \ne 0\), then \(\phi(x_1, \cdots, x_n)\) is said to be positive semi-definite or non-negative. In that case, we write
\[ H \ge 0. \tag{13.47} \]
Notice here that eigenvalues λi remain unchanged after (unitary) similarity trans-
formation. Namely, the eigenvalues are inherent to H.
We have already encountered several examples of positive definite and non-negative operators. A typical example of the former case is the Hamiltonian of the quantum-mechanical harmonic oscillator (see Chap. 2). In this case, the energy eigenvalues are all positive (i.e., positive definite). The orbital angular momentum operator \(L^2\) of hydrogen-like atoms, on the other hand, is a non-negative operator, and hence an eigenvalue of zero is permitted.
Alternatively, the Gram matrix is defined as \(B^\dagger B\), where B is any (n, n) matrix. If we take an orthonormal basis \(|\eta_1\rangle, |\eta_2\rangle, \cdots, |\eta_n\rangle\), \(|e_i\rangle\) can be expressed as
\[ |e_i\rangle = \sum_{j=1}^n b_{ji}|\eta_j\rangle, \]
so that
\[ \langle e_k|e_i\rangle = \sum_{l=1}^n\sum_{j=1}^n b_{lk}^*\,b_{ji}\,\langle\eta_l|\eta_j\rangle = \sum_{l=1}^n\sum_{j=1}^n b_{lk}^*\,b_{ji}\,\delta_{lj} = \sum_{j=1}^n b_{jk}^*\,b_{ji} = \sum_{j=1}^n (B^\dagger)_{kj}(B)_{ji} = (B^\dagger B)_{ki}. \tag{13.49} \]
For the second equality of (13.49), we used the orthonormality condition \(\langle\eta_l|\eta_j\rangle = \delta_{lj}\). Thus, the Gram matrix G defined in (13.35) can be regarded as identical to \(B^\dagger B\).
In Sect. 11.4 we dealt with a linear transformation of a set of basis vectors \(e_1, e_2, \cdots,\) and \(e_n\) by a matrix A defined in (11.69) and examined whether the transformed vectors \(e_1', e_2', \cdots, e_n'\) are linearly independent. As a result, a necessary and sufficient condition for \(e_1', e_2', \cdots, e_n'\) to be linearly independent (i.e., to be a set of basis vectors) is \(\det A \ne 0\). Thus, we notice that B plays the same role as A of (11.69), and hence \(\det B \ne 0\) if and only if the set of vectors \(|e_1\rangle, |e_2\rangle, \cdots, |e_n\rangle\) defined in (13.32) is linearly independent. By the same token, we conclude that the eigenvalues of \(B^\dagger B\) are all positive if and only if \(\det B \ne 0\) (i.e., B is non-singular). Alternatively, if B is singular, \(\det B^\dagger B = \det B^\dagger \det B = 0\). In that case, at least one of the eigenvalues of \(B^\dagger B\) must be zero. The Gram matrices appearing in (13.35) are frequently dealt with in the field of mathematical physics in conjunction with quadratic forms. Further topics can be seen in the next chapter.
Example 13.1 Let us take two vectors jε1i andjε2i that are expressed as
Here we have \(\langle e_i|e_j\rangle = \delta_{ij}\ (1 \le i, j \le 2)\). Then we have a Gram matrix expressed as
\[ G = \begin{pmatrix} 2 & 1+i\\ 1-i & 2 \end{pmatrix}. \]
A unitary matrix U that diagonalizes G is
\[ U = \begin{pmatrix} \dfrac{1+i}{2} & -\dfrac{1+i}{2}\\[4pt] \dfrac{\sqrt{2}}{2} & \dfrac{\sqrt{2}}{2} \end{pmatrix}. \tag{13.53} \]
Thus, we get
\[ U^\dagger GU = \begin{pmatrix} \dfrac{1-i}{2} & \dfrac{\sqrt{2}}{2}\\[4pt] -\dfrac{1-i}{2} & \dfrac{\sqrt{2}}{2} \end{pmatrix}\begin{pmatrix} 2 & 1+i\\ 1-i & 2 \end{pmatrix}\begin{pmatrix} \dfrac{1+i}{2} & -\dfrac{1+i}{2}\\[4pt] \dfrac{\sqrt{2}}{2} & \dfrac{\sqrt{2}}{2} \end{pmatrix} = \begin{pmatrix} 2+\sqrt{2} & 0\\ 0 & 2-\sqrt{2} \end{pmatrix}. \tag{13.54} \]
The eigenvalues \(2 + \sqrt{2}\) and \(2 - \sqrt{2}\) are real positive as expected. That is, the Gram matrix is positive definite.
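The diagonalization of Example 13.1 can be reproduced numerically. Below is a pure-Python sketch (not from the book); the sign convention of U is one consistent choice reconstructed from the example.

```python
from math import sqrt

G = [[2, 1 + 1j], [1 - 1j, 2]]                         # Gram matrix
s = sqrt(2) / 2
U = [[(1 + 1j) / 2, -(1 + 1j) / 2], [s, s]]            # diagonalizing unitary
Udag = [[U[j][i].conjugate() for j in range(2)] for i in range(2)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

D = matmul(matmul(Udag, G), U)

# eigenvalues 2 +/- sqrt(2), positive definite:
assert abs(D[0][0] - (2 + sqrt(2))) < 1e-12
assert abs(D[1][1] - (2 - sqrt(2))) < 1e-12
assert abs(D[0][1]) < 1e-12 and abs(D[1][0]) < 1e-12   # off-diagonals vanish
```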
Example 13.2 Let \(|e_1\rangle\) and \(|e_2\rangle\ (= -|e_1\rangle)\) be two vectors with \(\langle e_1|e_1\rangle = 1\). Then, we have a Gram matrix expressed as
\[ G = \begin{pmatrix} 1 & -1\\ -1 & 1 \end{pmatrix}. \]
Thus, we get
\[ U^\dagger GU = \begin{pmatrix} \dfrac{1}{\sqrt{2}} & -\dfrac{1}{\sqrt{2}}\\[4pt] \dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} \end{pmatrix}\begin{pmatrix} 1 & -1\\ -1 & 1 \end{pmatrix}\begin{pmatrix} \dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}}\\[4pt] -\dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} \end{pmatrix} = \begin{pmatrix} 2 & 0\\ 0 & 0 \end{pmatrix}. \tag{13.58} \]
13.3 Adjoint Operators 535
Using the above results, the Hermitian quadratic form is computed as
\[ \phi(x_1, x_2) = \sum_{i,j=1}^2 x_i^*(G)_{ij}x_j = (x_1^*\ x_2^*)\,UU^\dagger GUU^\dagger\begin{pmatrix} x_1\\ x_2 \end{pmatrix} = (x_1^*\ x_2^*)\,U\begin{pmatrix} 2&0\\0&0 \end{pmatrix}U^\dagger\begin{pmatrix} x_1\\ x_2 \end{pmatrix} = (\xi_1^*\ \xi_2^*)\begin{pmatrix} 2&0\\0&0 \end{pmatrix}\begin{pmatrix} \xi_1\\ \xi_2 \end{pmatrix} = 2|\xi_1|^2. \]
Then, if we take \(\begin{pmatrix} \xi_1\\ \xi_2 \end{pmatrix} = \begin{pmatrix} 0\\ \xi_2 \end{pmatrix}\) with \(\xi_2 \ne 0\) as a column vector, we get \(\phi(x_1, x_2) = 0\). With this type of column vector, we have
\[ U^\dagger\begin{pmatrix} x_1\\ x_2 \end{pmatrix} = \begin{pmatrix} 0\\ \xi_2 \end{pmatrix}, \quad\text{i.e.,}\quad \begin{pmatrix} x_1\\ x_2 \end{pmatrix} = U\begin{pmatrix} 0\\ \xi_2 \end{pmatrix} = \begin{pmatrix} \dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}}\\[4pt] -\dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} \end{pmatrix}\begin{pmatrix} 0\\ \xi_2 \end{pmatrix} = \frac{1}{\sqrt{2}}\begin{pmatrix} \xi_2\\ \xi_2 \end{pmatrix}, \]
where \(\xi_2\) is an arbitrary complex number. Thus, even for \(\begin{pmatrix} x_1\\ x_2 \end{pmatrix} \ne \begin{pmatrix} 0\\ 0 \end{pmatrix}\) we may have \(\phi(x_1, x_2) = 0\). To be more specific, if we take, e.g., \(\begin{pmatrix} x_1\\ x_2 \end{pmatrix} = \begin{pmatrix} 1\\ 1 \end{pmatrix}\) or \(\begin{pmatrix} 1\\ 2 \end{pmatrix}\), we get
\[ \phi(1, 1) = (1\ 1)\begin{pmatrix} 1&-1\\-1&1 \end{pmatrix}\begin{pmatrix} 1\\1 \end{pmatrix} = (1\ 1)\begin{pmatrix} 0\\0 \end{pmatrix} = 0, \]
\[ \phi(1, 2) = (1\ 2)\begin{pmatrix} 1&-1\\-1&1 \end{pmatrix}\begin{pmatrix} 1\\2 \end{pmatrix} = (1\ 2)\begin{pmatrix} -1\\1 \end{pmatrix} = 1 > 0. \]
13.3 Adjoint Operators

\[ A(|x\rangle) = (|e_1\rangle\ \cdots\ |e_n\rangle)\begin{pmatrix} a_{11}&\cdots&a_{1n}\\ \vdots&\ddots&\vdots\\ a_{n1}&\cdots&a_{nn} \end{pmatrix}\begin{pmatrix} x_1\\ \vdots\\ x_n \end{pmatrix}, \tag{13.59} \]
Meanwhile, we get
\[ \langle y|A(x)\rangle = (y_1^*\ \cdots\ y_n^*)\begin{pmatrix} \langle e_1|\\ \vdots\\ \langle e_n| \end{pmatrix}(|e_1\rangle\ \cdots\ |e_n\rangle)\begin{pmatrix} a_{11}&\cdots&a_{1n}\\ \vdots&\ddots&\vdots\\ a_{n1}&\cdots&a_{nn} \end{pmatrix}\begin{pmatrix} x_1\\ \vdots\\ x_n \end{pmatrix} = (y_1^*\ \cdots\ y_n^*)\,GA\begin{pmatrix} x_1\\ \vdots\\ x_n \end{pmatrix}. \tag{13.62} \]
Hence, we have
\[ \langle y|A(x)\rangle = (y_1^*\ \cdots\ y_n^*)\,GA\begin{pmatrix} x_1\\ \vdots\\ x_n \end{pmatrix} = (x_1\ \cdots\ x_n)\,A^TG^T\begin{pmatrix} y_1^*\\ \vdots\\ y_n^* \end{pmatrix}, \tag{13.63} \]
where
\[ A \equiv \begin{pmatrix} a_{11}&\cdots&a_{1n}\\ \vdots&\ddots&\vdots\\ a_{n1}&\cdots&a_{nn} \end{pmatrix}. \]
Comparing (13.61) and (13.63), we find that one is transposed matrix of the other.
Also note that (AB)T ¼ BTAT, (ABC)T ¼ CTBTAT, etc. Since an inner product can be
viewed as a (1, 1) matrix, two mutually transposed (1, 1) matrices are identical.
Hence, we get
\[ \langle (x)A^\dagger|y\rangle = \langle y|A(x)\rangle^* = \langle A(x)|y\rangle, \tag{13.64} \]
With the third equality of (13.65), we used \((G)_{jk} = (G)_{kj}^*\), i.e., \(G^\dagger = G\) (Hermitian matrix). Thanks to the freedom in the choice of basis vectors as well as \(x_i\) and \(y_i\), we must have
\[ (A^\dagger)_{ik} = (A^*)_{ki}. \tag{13.66} \]
\[ (A^\dagger)^\dagger = A. \tag{13.68} \]
\[ A' = P^{-1}AP. \tag{11.88} \]
\[ (A')^\dagger = P^\dagger A^\dagger (P^{-1})^\dagger = P^\dagger A^\dagger (P^\dagger)^{-1}. \tag{13.69} \]
In (13.69), we denote the adjoint operator before the basis-vector transformation simply by \(A^\dagger\) to avoid complicated notation. We also have
\[ (A^\dagger)' = (A')^\dagger. \tag{13.70} \]
Meanwhile, suppose that \((A^\dagger)' = Q^{-1}A^\dagger Q\). Then, from (13.69) and (13.70) we have
\[ P^\dagger = Q^{-1}, \]
or \(Q = (P^\dagger)^{-1}\).
Equation (13.71) states the equality of two vectors in an inner product space on
both sides, whereas (13.72) states that in a vector space where the inner product is
not defined. In either case, both (13.71) and (13.72) show that A{ is indeed a linear
transformation. In fact, the matrix representation of (13.66) and (13.67) is indepen-
dent of the concept of the inner product.
Suppose that there are two (or more) adjoint operators B and C that correspond to
A. Then, from (13.64) we have
Also we have
unit vector. Hence, a set of vectors je1i and je2i does not constitute the orthonormal
basis. Let jxi be an arbitrary position vector there expressed as
\[ |x\rangle = (|e_1\rangle\ |e_2\rangle)\begin{pmatrix} x_1\\ x_2 \end{pmatrix}. \tag{13.76} \]
Then we have
\[ \langle x|x\rangle = (x_1\ x_2)\begin{pmatrix} \langle e_1|\\ \langle e_2| \end{pmatrix}(|e_1\rangle\ |e_2\rangle)\begin{pmatrix} x_1\\ x_2 \end{pmatrix} = (x_1\ x_2)\begin{pmatrix} \langle e_1|e_1\rangle & \langle e_1|e_2\rangle\\ \langle e_2|e_1\rangle & \langle e_2|e_2\rangle \end{pmatrix}\begin{pmatrix} x_1\\ x_2 \end{pmatrix} = (x_1\ x_2)\begin{pmatrix} 1&0\\0&4 \end{pmatrix}\begin{pmatrix} x_1\\ x_2 \end{pmatrix} = x_1^2 + 4x_2^2. \tag{13.77} \]
\[ R = \begin{pmatrix} \cos\theta & -2\sin\theta\\ (\sin\theta)/2 & \cos\theta \end{pmatrix}. \tag{13.78} \]
\[ R(|x\rangle) = (|e_1\rangle\ |e_2\rangle)\begin{pmatrix} \cos\theta & -2\sin\theta\\ (\sin\theta)/2 & \cos\theta \end{pmatrix}\begin{pmatrix} x_1\\ x_2 \end{pmatrix}. \tag{13.79} \]
As a result of the transformation R, the basis vectors \(|e_1\rangle\) and \(|e_2\rangle\) are transformed into \(|e_1'\rangle\) and \(|e_2'\rangle\), respectively, as in Fig. 13.1, such that
\[ (|e_1'\rangle\ |e_2'\rangle) = (|e_1\rangle\ |e_2\rangle)\begin{pmatrix} \cos\theta & -2\sin\theta\\ (\sin\theta)/2 & \cos\theta \end{pmatrix}. \]
\[ \langle (x)R^\dagger|R(x)\rangle = (x_1\ x_2)\begin{pmatrix} \cos\theta & (\sin\theta)/2\\ -2\sin\theta & \cos\theta \end{pmatrix}\begin{pmatrix} 1&0\\0&4 \end{pmatrix}\begin{pmatrix} \cos\theta & -2\sin\theta\\ (\sin\theta)/2 & \cos\theta \end{pmatrix}\begin{pmatrix} x_1\\ x_2 \end{pmatrix} = (x_1\ x_2)\begin{pmatrix} 1&0\\0&4 \end{pmatrix}\begin{pmatrix} x_1\\ x_2 \end{pmatrix} = x_1^2 + 4x_2^2. \tag{13.80} \]
Putting
\[ G = \begin{pmatrix} 1&0\\0&4 \end{pmatrix}, \tag{13.81} \]
we have \(R^\dagger GR = G\).
Comparing (13.77) and (13.80), we notice that a norm of jxi remains unchanged
after the transformation R. This means that R is virtually a unitary transformation. A
somewhat unfamiliar matrix form of R resulted from the choice of basis vectors other
than an orthonormal basis.
13.4 Orthonormal Basis

Now we introduce an orthonormal basis, the simplest and most important basis set in an inner product space. If we choose the orthonormal basis so that \(\langle e_i|e_j\rangle = \delta_{ij}\), the Gram matrix is \(G = E\). Thus, \(R^\dagger GR = G\) reads as \(R^\dagger R = E\). In that case a linear transformation is represented by a unitary matrix, and it conserves the norm of a vector and the inner product of two arbitrary vectors.
So far we assumed that an adjoint operator A{ operates only on a row vector from
the right, as is evident from (13.61). At the same time, A operates only on the column
vector from the left as in (13.62). To render the notation of (13.61) and (13.62)
consistent with the associative law, we have to examine the commutability of A{
with G. In this context, choice of the orthonormal basis enables us to get through
such a troublesome situation and largely eases matrix calculation. Thus,
\[ \langle (x)A^\dagger|A(x)\rangle = (x_1^*\ \cdots\ x_n^*)\begin{pmatrix} a_{11}^*&\cdots&a_{n1}^*\\ \vdots&\ddots&\vdots\\ a_{1n}^*&\cdots&a_{nn}^* \end{pmatrix}\begin{pmatrix} \langle e_1|\\ \vdots\\ \langle e_n| \end{pmatrix}(|e_1\rangle\ \cdots\ |e_n\rangle)\begin{pmatrix} a_{11}&\cdots&a_{1n}\\ \vdots&\ddots&\vdots\\ a_{n1}&\cdots&a_{nn} \end{pmatrix}\begin{pmatrix} x_1\\ \vdots\\ x_n \end{pmatrix} = (x_1^*\ \cdots\ x_n^*)\,A^\dagger EA\begin{pmatrix} x_1\\ \vdots\\ x_n \end{pmatrix} = (x_1^*\ \cdots\ x_n^*)\,A^\dagger A\begin{pmatrix} x_1\\ \vdots\\ x_n \end{pmatrix}. \tag{13.82} \]
This notation has now become consistent with the associative law. Note that \(A^\dagger\) and A operate on either a column or a row vector. We can also do without the symbol "|" in (13.83) and express it as
\[ \langle xA^\dagger Ax\rangle = \langle xA^\dagger|Ax\rangle. \tag{13.84} \]
Thus, we can freely operate A{ and A from both the left and right. By the same
token, we rewrite (13.62) as
\[ \langle y|Ax\rangle = \langle yAx\rangle = (y_1^*\ \cdots\ y_n^*)\begin{pmatrix} a_{11}&\cdots&a_{1n}\\ \vdots&\ddots&\vdots\\ a_{n1}&\cdots&a_{nn} \end{pmatrix}\begin{pmatrix} x_1\\ \vdots\\ x_n \end{pmatrix} = (y_1^*\ \cdots\ y_n^*)\,A\begin{pmatrix} x_1\\ \vdots\\ x_n \end{pmatrix}. \tag{13.85} \]
Here, notice that a vector \(|x\rangle\) is represented by a column vector with respect to the orthonormal basis. Using (13.64), we have
\[ \langle y|Ax\rangle = \langle xA^\dagger|y\rangle^* = \left[(x_1^*\ \cdots\ x_n^*)\,A^\dagger\begin{pmatrix} y_1\\ \vdots\\ y_n \end{pmatrix}\right]^* = \left[(x_1^*\ \cdots\ x_n^*)\begin{pmatrix} a_{11}^*&\cdots&a_{n1}^*\\ \vdots&\ddots&\vdots\\ a_{1n}^*&\cdots&a_{nn}^* \end{pmatrix}\begin{pmatrix} y_1\\ \vdots\\ y_n \end{pmatrix}\right]^*. \tag{13.86} \]
\[ |e_1\rangle = \frac{|1\rangle}{\sqrt{\langle 1|1\rangle}}, \qquad \langle e_1|e_1\rangle = 1. \tag{13.88} \]
\[ |e_2\rangle = \frac{1}{L_2}\bigl[\,|2\rangle - \langle e_1|2\rangle|e_1\rangle\,\bigr], \tag{13.89} \]
where \(L_2\) is a normalization constant such that \(\langle e_2|e_2\rangle = 1\). Note that \(|e_2\rangle\) cannot be a zero vector. This is because if \(|e_2\rangle\) were a zero vector, \(|2\rangle\) and \(|e_1\rangle\) (or \(|1\rangle\)) would be linearly dependent, in contradiction to the assumption. We have \(\langle e_1|e_2\rangle = 0\). Thus, \(|e_1\rangle\) and \(|e_2\rangle\) are orthonormal.
After this, the proof is based upon mathematical induction. Suppose that the theorem is true of (n − 1) vectors. That is, let \(|e_i\rangle\ (1 \le i \le n-1)\) be such that \(\langle e_i|e_j\rangle = \delta_{ij}\ (1 \le j \le n-1)\) and each vector \(|e_i\rangle\) is a linear combination of the vectors \(|i\rangle\). Meanwhile, let us define
\[ |\tilde{n}\rangle \equiv |n\rangle - \sum_{j=1}^{n-1}\langle e_j|n\rangle|e_j\rangle. \tag{13.90} \]
One can check that \(|\tilde{n}\rangle \ne 0\) and \(\langle e_j|\tilde{n}\rangle = 0\ (1 \le j \le n-1)\). Putting
\[ |e_n\rangle = \frac{|\tilde{n}\rangle}{\sqrt{\langle\tilde{n}|\tilde{n}\rangle}}, \qquad \langle e_n|e_n\rangle = 1, \tag{13.92} \]
the theorem is proven. In (13.92) a phase factor \(e^{i\theta}\) (θ: an arbitrarily chosen real number) can be added such that
\[ |e_n\rangle = \frac{e^{i\theta}|\tilde{n}\rangle}{\sqrt{\langle\tilde{n}|\tilde{n}\rangle}}. \tag{13.93} \]
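The constructive procedure of this proof is exactly the Gram–Schmidt orthonormalization, and it is easy to carry out numerically. Below is a pure-Python sketch (not from the book; the input vectors are arbitrary, linearly independent real vectors).

```python
from math import sqrt, isclose

def inner(a, b):
    """Real inner product <a|b>."""
    return sum(x * y for x, y in zip(a, b))

def gram_schmidt(vectors):
    """Return an orthonormal basis spanning the same space."""
    basis = []
    for v in vectors:
        w = list(v)
        for e in basis:                       # subtract projections, cf. (13.90)
            c = inner(e, w)
            w = [wi - c * ei for wi, ei in zip(w, e)]
        n = sqrt(inner(w, w))
        basis.append([wi / n for wi in w])    # normalize, cf. (13.92)
    return basis

e = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
for i in range(3):
    for j in range(3):
        assert isclose(inner(e[i], e[j]), 1.0 if i == j else 0.0,
                       abs_tol=1e-12)
```

A phase factor (an overall sign for real vectors) may be attached to each basis vector, as (13.93) notes, without affecting orthonormality.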
To prove Theorem 13.2, we have used the following simple but important
theorem. ∎
Theorem 13.3 Let us have any n vectors \(|i\rangle \ne 0\ (1 \le i \le n)\) in \(V^n\) and let these vectors \(|i\rangle\) be orthogonal to one another. Then the vectors \(|i\rangle\) are linearly independent.
Proof Let us think of the following equation:
\[ c_1|1\rangle + c_2|2\rangle + \cdots + c_n|n\rangle = 0. \tag{13.94} \]
Multiplying (13.94) by \(\langle i|\) from the left and considering the orthogonality among the vectors, we have
\[ c_i\langle i|i\rangle = 0. \tag{13.95} \]
Since \(\langle i|i\rangle \ne 0\), we get \(c_i = 0\) for all i; hence the vectors are linearly independent.
References
14 Hermitian Operators and Unitary Operators

Hermitian operators and unitary operators are quite often encountered in mathematical physics and, in particular, quantum physics. In this chapter we investigate their
basic properties. Both Hermitian operators and unitary operators fall under the
category of normal operators. The normal matrices are characterized by an important
fact that those matrices can be diagonalized by a unitary matrix. Moreover,
Hermitian matrices always possess real eigenvalues. This fact largely eases mathe-
matical treatment of quantum mechanics. In relation to these topics, in this chapter
we investigate projection operators systematically. We find their important applica-
tion to physicochemical problems in Part IV. We further investigate Hermitian
quadratic forms and real symmetric quadratic forms as an important branch of matrix
algebra. In connection with this topic, positive definiteness and the non-negative property of a matrix are important concepts. These characteristics are readily applicable to the theory of differential operators, thus rendering this chapter closely related to basic concepts of quantum physics.
14.1 Projection Operators

\[ W^\perp = \bigl\{\,|x\rangle\ ;\ \langle x|y\rangle = 0\ \text{for}\ \forall|y\rangle \in W\,\bigr\}. \]
Then we have
\[ V^n = W \oplus W^\perp. \tag{14.1} \]
Proof Suppose that an orthonormal basis comprising \(|e_1\rangle, |e_2\rangle, \cdots, |e_n\rangle\) spans \(V^n\). Of the orthonormal basis, let \(|e_1\rangle, |e_2\rangle, \cdots, |e_r\rangle\ (r < n)\) span W. Let an arbitrarily chosen vector from \(V^n\) be \(|x\rangle\). Then we have
\[ |x\rangle = x_1|e_1\rangle + x_2|e_2\rangle + \cdots + x_n|e_n\rangle = \sum_{i=1}^n x_i|e_i\rangle. \tag{14.3} \]
Multiplying \(\langle e_i|\) from the left, we have \(x_i = \langle e_i|x\rangle\). That is,
\[ |x\rangle = \sum_{i=1}^n \langle e_i|x\rangle|e_i\rangle. \tag{14.5} \]
Meanwhile, put
\[ |x'\rangle = \sum_{i=1}^r \langle e_i|x\rangle|e_i\rangle \tag{14.6} \]
and \(|x''\rangle \equiv |x\rangle - |x'\rangle\). Then, for \(1 \le i \le r\),
\[ \langle e_i|x''\rangle = \langle e_i|x\rangle - \langle e_i|x'\rangle = \langle e_i|x\rangle - \langle e_i|x\rangle = 0. \tag{14.7} \]
Taking account of \(W = \operatorname{Span}\{|e_1\rangle, |e_2\rangle, \cdots, |e_r\rangle\}\), we get \(|x''\rangle \in W^\perp\). That is, for \(\forall|x\rangle \in V^n\),
Moreover, the contents of the Theorem 14.1 can readily be generalized to more
subspaces such that
Thus, the operator Pi extracts a vector jwii in a subspace Wi. Then we have
P1 þ P2 þ þ Pr ¼ E: ð14:13Þ
Moreover,
\[ P_i^{\,2} = P_i. \tag{14.16} \]
Then, we have
\[ \langle x|P_i(y)\rangle = \langle w_1 + w_2 + \cdots + w_r|P_i(u_1 + u_2 + \cdots + u_r)\rangle = \langle w_1 + w_2 + \cdots + w_r|u_i\rangle = \langle w_i|u_i\rangle. \tag{14.18} \]
With the last equality, we used the mutual orthogonality of the subspaces.
Meanwhile, we have
where we used (13.64) with the second equality. Since jxi and jyi are arbitrarily
chosen, we get
Pi { ¼ Pi : ð14:21Þ
\[ (P_1 + P_2 + \cdots + P_r)(P_1 + P_2 + \cdots + P_r) = \sum_{i=1}^r P_i^2 + \sum_{i \ne j} P_iP_j = \sum_{i=1}^r P_i + \sum_{i \ne j} P_iP_j = E + \sum_{i \ne j} P_iP_j = E. \tag{14.22} \]
In particular, we have
\[ P_iP_j = 0, \qquad P_jP_i = 0 \quad (i \ne j). \tag{14.23} \]
In fact, we have
\[ P_iP_j(|x\rangle) = P_i|w_j\rangle = 0 \quad (i \ne j,\ 1 \le i, j \le n). \tag{14.24} \]
The second equality comes from \(W_i \cap W_j = \{0\}\). Notice that in (14.24), the indices i and j are interchangeable. Again, \(|x\rangle\) is arbitrarily chosen, and so (14.23) holds. Combining (14.16) and (14.23), we write
\[ P_iP_j = \delta_{ij}P_i. \tag{14.25} \]
\[ P_i = \frac{|w_i\rangle}{\|w_i\|}\,\frac{\langle w_i|}{\|w_i\|}. \tag{14.26} \]
Then we have
\[ P_i|x\rangle = \frac{|w_i\rangle\langle w_i|}{\|w_i\|\,\|w_i\|}\,(|w_1\rangle + |w_2\rangle + \cdots + |w_s\rangle) = \bigl[\,|w_i\rangle(\langle w_i|w_1\rangle + \langle w_i|w_2\rangle + \cdots + \langle w_i|w_s\rangle)\,\bigr]/\langle w_i|w_i\rangle = \bigl[\,|w_i\rangle\langle w_i|w_i\rangle\,\bigr]/\langle w_i|w_i\rangle = |w_i\rangle. \]
Furthermore, we have
\[ P_i^{\,2}|x\rangle = P_i|w_i\rangle = |w_i\rangle = P_i|x\rangle. \tag{14.27} \]
\[ (|w_i\rangle\langle w_i|)^\dagger = (\langle w_i|)^\dagger(|w_i\rangle)^\dagger = |w_i\rangle\langle w_i|. \tag{14.28} \]
Hence, we recover
\[ P_i^\dagger = P_i. \tag{14.21} \]
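Properties (14.21), (14.25), and (14.27) are easy to verify for a concrete projector built from (14.26). The pure-Python sketch below (not from the book; the vector w is an arbitrary choice) forms \(P = |w\rangle\langle w|/\langle w|w\rangle\) and checks that it is idempotent and Hermitian.

```python
w = [1 + 1j, 2, -1j]                               # arbitrary |w>
ww = sum(x.conjugate() * x for x in w).real        # <w|w> = 7 here
P = [[w[i] * w[j].conjugate() / ww for j in range(3)] for i in range(3)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

P2 = matmul(P, P)
# P^2 = P (14.27) and P^dagger = P (14.21):
assert all(abs(P2[i][j] - P[i][j]) < 1e-12
           for i in range(3) for j in range(3))
assert all(abs(P[i][j] - P[j][i].conjugate()) < 1e-12
           for i in range(3) for j in range(3))
# P|w> = |w>:
Pw = [sum(P[i][k] * w[k] for k in range(3)) for i in range(3)]
assert all(abs(Pw[i] - w[i]) < 1e-12 for i in range(3))
```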
With the second equality of (14.29), we used (13.86) where A is replaced with
B and jyi is replaced with A{ j yi. Since (14.29) holds for arbitrarily chosen vectors
jxi and jyi, comparing the first and last sides of (14.29) we have
\[ (AB)^\dagger = B^\dagger A^\dagger. \tag{14.30} \]
\[ \langle xB^\dagger A^\dagger|y\rangle = \langle y|(A^\dagger)^\dagger(B^\dagger)^\dagger x\rangle^* = \langle yA|Bx\rangle^* = \langle yABx\rangle^*, \tag{14.31} \]
where with the second equality we used (13.68). Also recall the remarks after (13.83)
with the expressions of (14.29) and (14.31). Other notations can be adopted.
We can view a projection operator under a stricter condition. Related operators can be defined as well. As in (14.26), let us define, with respect to an orthonormal basis \(|e_1\rangle, \cdots, |e_n\rangle\), an operator such that
\[ \tilde{P}_k = |e_k\rangle\langle e_k|. \]
Operating \(\tilde{P}_k\) on \(|x\rangle = x_1|e_1\rangle + x_2|e_2\rangle + \cdots + x_n|e_n\rangle\) from the left, we get
\[ \tilde{P}_k|x\rangle = |e_k\rangle\langle e_k|\sum_{j=1}^n x_j|e_j\rangle = |e_k\rangle\sum_{j=1}^n x_j\langle e_k|e_j\rangle = |e_k\rangle\sum_{j=1}^n x_j\delta_{kj} = x_k|e_k\rangle. \]
Thus, we find that \(\tilde{P}_k\) plays the same role as \(P^{(k)}\) defined in (12.201). Represented by a matrix, \(\tilde{P}_k\) has the same structure as that denoted in (12.205). Evidently,
\[ \tilde{P}_k^{\,2} = \tilde{P}_k, \qquad \tilde{P}_k^\dagger = \tilde{P}_k, \qquad \tilde{P}_1 + \tilde{P}_2 + \cdots + \tilde{P}_n = E. \tag{14.33} \]
Now let us modify \(P^{(k)}\) in (12.201). There, \(P^{(k)} = \delta_i^{(k)}\delta_j^{(k)}\), where only the (k, k) element is 1 and all the others are 0 in the (n, n) matrix. We define a matrix
\[ P^{(k)}_{(m)} = \delta_i^{(k)}\delta_j^{(m)}, \tag{14.34} \]
i.e., the (n, n) matrix in which only the (k, m) element is 1, all the others being 0. In the example of (14.34), \(P^{(k)}_{(m)}\) is an upper triangle matrix (k < m). Therefore, its eigenvalues are all zero, and so \(P^{(k)}_{(m)}\) is a nilpotent matrix. If \(k > m\), the matrix is a lower triangle matrix and nilpotent as well. Such a matrix is not Hermitian (nor a projection operator), as can immediately be seen from the matrix form of (14.34). Because of the properties of nilpotent matrices mentioned in Sect. 12.3, \(P^{(k)}_{(m)}\ (k \ne m)\) is not diagonalizable either.
Various relations can be extracted. As an example, we have
\[ P^{(k)}_{(m)}P^{(l)}_{(n)} = \sum_q \delta_i^{(k)}\delta_q^{(m)}\,\delta_q^{(l)}\delta_j^{(n)} = \delta_i^{(k)}\delta_{ml}\delta_j^{(n)} = \delta_{ml}P^{(k)}_{(n)}. \tag{14.35} \]
Note that \(P^{(k)}_{(k)} \equiv P^{(k)}\) defined in (12.201). From (14.35), moreover, we have, e.g., \(\bigl[P^{(k)}_{(m)}\bigr]^2 = \delta_{mk}P^{(k)}_{(m)}\). Among these operators, only \(P^{(k)}_{(k)}\) is eligible as a projection operator. We will encounter further examples in Part IV.
Using \(P^{(k)}_{(k)}\) in (13.62), we have
\[ \langle y\,P^{(k)}_{(k)}(x)\rangle = (y_1^*\ \cdots\ y_n^*)\,GP^{(k)}_{(k)}\begin{pmatrix} x_1\\ \vdots\\ x_n \end{pmatrix}. \tag{14.36} \]
There is a large group of operators called normal operators that play an important role in mathematical physics, especially quantum physics. A normal operator is defined as an operator on an inner product space that commutes with its adjoint operator. That is, let A be a normal operator. Then, we have

\[
A A^{\dagger} = A^{\dagger} A. \tag{14.38}
\]
The other way around, suppose that \(\|Ax\| = \|A^{\dagger}x\|\). Then, since \(\|Ax\|^2 = \|A^{\dagger}x\|^2\), we have \(\langle x | A^{\dagger}A | x \rangle = \langle x | AA^{\dagger} | x \rangle\). That is, \(\langle x | A^{\dagger}A - AA^{\dagger} | x \rangle = 0\) for an arbitrarily chosen vector \(|x\rangle\). To assert that \(A^{\dagger}A - AA^{\dagger} = 0\), i.e., \(A^{\dagger}A = AA^{\dagger}\), on the assumption that \(\langle x | A^{\dagger}A - AA^{\dagger} | x \rangle = 0\), we need the following theorems:
Theorem 14.2 [2] A linear transformation A on an inner product space is the zero
transformation if and only if hy| Axi ¼ 0 for any vectors jxi and jyi.
Proof If \(A = 0\), then \(\langle y | Ax \rangle = \langle y | 0 \rangle = 0\). This is because, putting \(\beta = 1 = \gamma\) and \(|b\rangle = -|c\rangle\) in (13.3), we get \(\langle a | 0 \rangle = 0\). Conversely, suppose that \(\langle y | Ax \rangle = 0\) for any vectors \(|x\rangle\) and \(|y\rangle\). Then, putting \(|y\rangle = |Ax\rangle\), we have \(\langle xA^{\dagger} | Ax \rangle = 0\), i.e., \(\langle y | y \rangle = 0\). This implies that \(|y\rangle = |Ax\rangle = 0\). For \(|Ax\rangle = 0\) to hold for any \(|x\rangle\), we must have \(A = 0\). Note here that if A is a singular matrix, \(|Ax\rangle = 0\) for some vectors \(|x\rangle\). However, even though A is singular, for \(|Ax\rangle = 0\) to hold for any \(|x\rangle\), \(A = 0\) is required. ∎

We have another important theorem under a further restricted condition.
Theorem 14.3 [2] A linear transformation A on an inner product space is the zero
transformation if and only if hx| Axi ¼ 0 for any vectors jxi.
Proof As in the case of Theorem 14.2, the necessity is trivial. To prove sufficiency, suppose that \(\langle x | Ax \rangle = 0\) for any vector \(|x\rangle\). Applying this assumption to the vectors \(|x + y\rangle\) and \(|x + iy\rangle\) and expanding the inner products, we obtain \(\langle x | Ay \rangle + \langle y | Ax \rangle = 0\) and \(\langle x | Ay \rangle - \langle y | Ax \rangle = 0\), respectively. Adding these two relations, we get

\[
\langle x \,|\, Ay \rangle = 0. \tag{14.44}
\]

Theorem 14.2 then means that \(A = 0\), indicating that the sufficient condition holds.
This completes the proof. ∎
Thus, returning to the beginning, i.e., the remarks made after (14.39), we establish the following theorem:

Theorem 14.4 An operator A on an inner product space is a normal operator if and only if \(\|Ax\| = \|A^{\dagger}x\|\) holds for any vector \(|x\rangle\).

A normal operator has a distinct property. The normal operator can be diagonalized
by a similarity transformation by a unitary matrix. The transformation is said to be a
unitary similarity transformation. Let us prove the following theorem.
Theorem 14.5 [3] A necessary and sufficient condition for a matrix A to be
diagonalized by unitary similarity transformation is that the matrix A is a normal
matrix.
Proof To prove the necessary condition, suppose that A can be diagonalized by a
unitary matrix U. That is,
For the third equality, we used DD{ ¼ D{D (i.e., D and D{ are commutable). This
shows that A is a normal matrix.
To prove the sufficient condition, let us show that a normal matrix can be
diagonalized by unitary similarity transformation. The proof is due to mathematical
induction, as is the case with Theorem 12.1.
First, we show that the theorem is true for a (2, 2) matrix. Suppose that one of the
eigenvalues of \(A_2\) is \(\alpha_1\) and that its corresponding eigenvector is \(|x_1\rangle\). Following
procedures of the proof for Theorem 12.1 and remembering the Gram–Schmidt
orthonormalization theorem, we can construct a unitary matrix U1 such that
where jx1i represents a column vector and jp1i is another arbitrarily determined
column vector. Then we can convert A2 to a triangle matrix such that
\[
\widetilde{A}_2 \equiv U_1^{\dagger} A_2 U_1 = \begin{pmatrix} \alpha_1 & x \\ 0 & y \end{pmatrix}. \tag{14.49}
\]
Then we have
\[
\widetilde{A}_2 \widetilde{A}_2^{\dagger}
= U_1^{\dagger} A_2 U_1 \left( U_1^{\dagger} A_2 U_1 \right)^{\dagger}
= U_1^{\dagger} A_2 U_1 U_1^{\dagger} A_2^{\dagger} U_1
= U_1^{\dagger} A_2 A_2^{\dagger} U_1
= U_1^{\dagger} A_2^{\dagger} A_2 U_1
= \left( U_1^{\dagger} A_2^{\dagger} U_1 \right) \left( U_1^{\dagger} A_2 U_1 \right)
= \widetilde{A}_2^{\dagger} \widetilde{A}_2. \tag{14.50}
\]
With the fourth equality, we used the supposition that \(A_2\) is a normal matrix. Equation (14.50) means that \(\widetilde{A}_2\) defined in (14.49) is a normal operator. Via simple matrix calculations, we have
\[
\widetilde{A}_2 \widetilde{A}_2^{\dagger}
= \begin{pmatrix} |\alpha_1|^2 + |x|^2 & x\bar{y} \\ \bar{x}y & |y|^2 \end{pmatrix}, \qquad
\widetilde{A}_2^{\dagger} \widetilde{A}_2
= \begin{pmatrix} |\alpha_1|^2 & \bar{\alpha}_1 x \\ \alpha_1 \bar{x} & |x|^2 + |y|^2 \end{pmatrix}. \tag{14.51}
\]

Equating the (1, 1) elements of the two matrices in (14.51), we get \(|x|^2 = 0\), i.e., \(x = 0\). Hence,

\[
\widetilde{A}_2 = \begin{pmatrix} \alpha_1 & 0 \\ 0 & y \end{pmatrix}. \tag{14.52}
\]
This implies that a normal matrix A2 has been diagonalized by the unitary
similarity transformation.
Now let us examine the general case of a \((n, n)\) square normal matrix \(A_n\). Let \(\alpha_n\) be one of the eigenvalues of \(A_n\). On the basis of the argument for the (2, 2) matrix case, after a suitable similarity transformation by a unitary matrix \(\widetilde{U}\) we first have

\[
\widetilde{A}_n = \widetilde{U}^{\dagger} A_n \widetilde{U}, \tag{14.53}
\]

\[
\widetilde{A}_n = \begin{pmatrix} \alpha_n & \mathbf{x}^T \\ 0 & B \end{pmatrix}, \tag{14.54}
\]

\[
\left[ \widetilde{A}_n \right]^{\dagger} = \begin{pmatrix} \bar{\alpha}_n & 0 \\ \bar{\mathbf{x}} & B^{\dagger} \end{pmatrix}. \tag{14.55}
\]

For \(\widetilde{A}_n \left[ \widetilde{A}_n \right]^{\dagger} = \left[ \widetilde{A}_n \right]^{\dagger} \widetilde{A}_n\) to hold with (14.56), we must have \(\mathbf{x} = 0\). Thus, we get

\[
\widetilde{A}_n = \begin{pmatrix} \alpha_n & 0 \\ 0 & B \end{pmatrix}. \tag{14.57}
\]

Since \(\widetilde{A}_n\) is a normal matrix, so is B. According to mathematical induction, let us assume that the theorem holds for a \((n-1, n-1)\) matrix, i.e., B. Then, from this assumption, there exist a unitary matrix C and a diagonal matrix D, both of order \((n-1)\), such that \(BC = CD\). Hence,

\[
\begin{pmatrix} \alpha_n & 0 \\ 0 & B \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & C \end{pmatrix}
=
\begin{pmatrix} 1 & 0 \\ 0 & C \end{pmatrix}
\begin{pmatrix} \alpha_n & 0 \\ 0 & D \end{pmatrix}. \tag{14.58}
\]

Here, putting

\[
\widetilde{C}_n = \begin{pmatrix} 1 & 0 \\ 0 & C \end{pmatrix}
\quad \text{and} \quad
\widetilde{D}_n = \begin{pmatrix} \alpha_n & 0 \\ 0 & D \end{pmatrix}, \tag{14.59}
\]

we get

\[
\widetilde{A}_n \widetilde{C}_n = \widetilde{C}_n \widetilde{D}_n. \tag{14.60}
\]

As \(\widetilde{C}_n\) is a \((n, n)\) unitary matrix, \(\left[\widetilde{C}_n\right]^{\dagger} \widetilde{C}_n = \widetilde{C}_n \left[\widetilde{C}_n\right]^{\dagger} = E\). Hence, \(\left[\widetilde{C}_n\right]^{\dagger} \widetilde{A}_n \widetilde{C}_n = \widetilde{D}_n\). Thus, from (14.53) finally we get

\[
\left[ \widetilde{U} \widetilde{C}_n \right]^{\dagger} A_n \widetilde{U} \widetilde{C}_n = \widetilde{D}_n. \tag{14.61}
\]

Putting \(\widetilde{U} \widetilde{C}_n = V\), with V being another unitary operator, we have

\[
V^{\dagger} A_n V = \widetilde{D}_n. \tag{14.62}
\]

This completes the proof. ∎
A direct consequence of Theorem 14.5 is that with any normal matrix we can find
a set of orthonormal eigenvectors corresponding to individual eigenvalues whether
or not those are degenerate.
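Theorem 14.5 can be illustrated constructively: build a normal matrix from a random unitary and a complex diagonal, then verify that the same unitary diagonalizes it. A sketch assuming NumPy is available (the construction is ours, not the book's):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4

# a unitary matrix Q via QR decomposition of a random complex matrix
Q, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
assert np.allclose(Q.conj().T @ Q, np.eye(n))

# A = Q D Q^dagger with a genuinely complex diagonal: A is normal,
# but in general neither Hermitian nor unitary
d = rng.standard_normal(n) + 1j * rng.standard_normal(n)
A = Q @ np.diag(d) @ Q.conj().T

assert np.allclose(A @ A.conj().T, A.conj().T @ A)   # A A^dag = A^dag A
# the unitary similarity transformation recovers the diagonal matrix
assert np.allclose(Q.conj().T @ A @ Q, np.diag(d))
```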
In Sect. 12.2 we dealt with a decomposition of a linear vector space and relevant
reduction of an operator when discussing canonical forms of matrices. In this context
Theorem 14.5 gives a simple and clear criterion for this. Equation (14.57) implies that the \((n-1, n-1)\) submatrix B can further be reduced to matrices having lower dimensions. Considering that a diagonal matrix is a special case of triangle matrices, a normal matrix that has been diagonalized by the unitary similarity transformation gives its eigenvalues as its diagonal elements.
From this point of view, let us consider the characteristics of normal matrices, starting with a discussion of invariant subspaces. We have the following important theorem.
Theorem 14.6 Let A be a normal matrix and let one of its eigenvalues be α. Let Wα
be an eigenspace corresponding to α. Then, Wα is both A-invariant and A{-invariant.
Also Wα⊥ is both A-invariant and A{-invariant.
Proof Theorem 14.5 ensures that a normal matrix is diagonalized by unitary similarity transformation. Therefore, we deal only with "proper" eigenvalues and eigenvectors here. First, we show that if a subspace W is A-invariant, then its orthogonal complement \(W^{\perp}\) is \(A^{\dagger}\)-invariant. In fact, suppose that \(|x\rangle \in W\) and \(|x'\rangle \in W^{\perp}\). Then, from (13.64) and (13.86) we have

\[
\langle x' \,|\, Ax \rangle = 0
= \overline{\left\langle x A^{\dagger} \,\middle|\, x' \right\rangle}
= \overline{\left\langle x \,\middle|\, A^{\dagger} x' \right\rangle}. \tag{14.63}
\]

The first equality comes from the fact that \(|x\rangle \in W \Longrightarrow A|x\rangle \, (= |Ax\rangle) \in W\), as W is A-invariant. From the last equality of (14.63), we have

\[
A^{\dagger} | x' \rangle = | A^{\dagger} x' \rangle \in W^{\perp}. \tag{14.64}
\]
\[
U =
\begin{pmatrix}
1 & & & & & & \\
 & \ddots & & & & & \\
 & & 0 & \cdots & 1 & & \\
 & & \vdots & \ddots & \vdots & & \\
 & & 1 & \cdots & 0 & & \\
 & & & & & \ddots & \\
 & & & & & & 1
\end{pmatrix}, \tag{14.66}
\]

where the \((i, j)\) and \((j, i)\) elements are 1, the \((i, i)\) and \((j, j)\) elements are 0, all the other off-diagonal elements are zero, and the remaining diagonal elements are 1. If operated from the left, U exchanges the i-th and j-th rows of a matrix. If operated from the right, U exchanges the i-th and j-th columns of a matrix. Note that U is at once unitary and Hermitian with eigenvalues 1 or \(-1\), and that \(U^2 = E\). This is because exchanging two columns (or two rows) twice produces the identity transformation. Thus, performing such unitary similarity transformations an appropriate number of times, we get
\[
\widetilde{D}_n =
\begin{pmatrix}
\alpha_1 & & & & & \\
 & \ddots & & & & \\
 & & \alpha_1 & & & \\
 & & & \alpha_2 & & \\
 & & & & \ddots & \\
 & & & & & \alpha_s
\end{pmatrix}, \tag{14.67}
\]

where each distinct eigenvalue \(\alpha_1, \alpha_2, \cdots, \alpha_s\) is repeated along the diagonal according to its multiplicity.
\[
V^n = W_{\alpha_1} \oplus W_{\alpha_2} \oplus \cdots \oplus W_{\alpha_s}. \tag{14.68}
\]
\[
\widetilde{A}_n =
\begin{pmatrix}
A^{(1)} & & & \\
 & A^{(2)} & & \\
 & & \ddots & \\
 & & & A^{(s)}
\end{pmatrix}, \tag{14.69}
\]
where with the fourth equality we used Theorem 14.7. Then we get

\[
(\alpha - \beta) \langle v \,|\, u \rangle = 0.
\]

Since \(\alpha - \beta \neq 0\), \(\langle v | u \rangle = 0\). Namely, the eigenvectors \(|u\rangle\) and \(|v\rangle\) are mutually orthogonal.
In (12.208) we mentioned the decomposition of diagonalizable matrices. As for
the normal matrices, we have a related matrix decomposition. Let A be a normal
operator. Then, according to Theorem 14.5, A can be diagonalized and expressed as
(14.67). This is equivalently expressed as the following succinct relation. That is, if we choose U as a diagonalizing unitary matrix, we have

\[
U^{\dagger} A U = \alpha_1 P_1 + \alpha_2 P_2 + \cdots + \alpha_s P_s, \tag{14.71}
\]

where \(E_{n_1}\) stands for a \((n_1, n_1)\) identity matrix, with \(n_1\) corresponding to the multiplicity of \(\alpha_1\). A matrix represented by \(0_{n_2}\) is a \((n_2, n_2)\) zero matrix, and so forth. This expression is in accordance with (14.69). From the matrix form (14.72), obviously \(P_l\) \((1 \le l \le s)\) is a projection operator. Thus, operating U and \(U^{\dagger}\) on (14.71) from the left and right, respectively, we obtain
\[
A = \alpha_1 \widetilde{P}_1 + \alpha_2 \widetilde{P}_2 + \cdots + \alpha_s \widetilde{P}_s. \tag{14.74}
\]

In (14.74), we can easily check that \(\widetilde{P}_l\) is a projection operator, with \(\alpha_1, \alpha_2, \cdots\), and \(\alpha_s\) being different eigenvalues of A. If \(\alpha_l\) \((1 \le l \le s)\) is degenerate, we express \(\widetilde{P}_l\) as \(\widetilde{P}_l^{\,\mu}\) \((1 \le \mu \le m_l)\), where \(m_l\) is the multiplicity of \(\alpha_l\). In that case, we may write

\[
\widetilde{P}_l = \widetilde{P}_l^{\,1} + \cdots + \widetilde{P}_l^{\,m_l}. \tag{14.75}
\]
Also, we have

\[
\widetilde{P}_l \widetilde{P}_k = 0 \quad (l \neq k).
\]

Thus, we have

\[
\widetilde{P}_l^{\,\mu} \widetilde{P}_l^{\,\nu} = \delta_{\mu\nu} \widetilde{P}_l^{\,\mu} \quad (1 \le \mu, \nu \le m_l).
\]
\[
A^{\dagger} A
= \left( \sum_{i} \alpha_i \widetilde{P}_i \right)^{\dagger} \left( \sum_{j} \alpha_j \widetilde{P}_j \right)
= \sum_{i,j} \bar{\alpha}_i \alpha_j \widetilde{P}_i \widetilde{P}_j
= \sum_{i,j} \bar{\alpha}_i \alpha_j \delta_{ij} \widetilde{P}_i
= \sum_{i} |\alpha_i|^2 \widetilde{P}_i,
\]
\[
A A^{\dagger}
= \left( \sum_{j} \alpha_j \widetilde{P}_j \right) \left( \sum_{i} \alpha_i \widetilde{P}_i \right)^{\dagger}
= \sum_{i,j} \alpha_j \bar{\alpha}_i \widetilde{P}_j \widetilde{P}_i
= \sum_{i,j} \alpha_j \bar{\alpha}_i \delta_{ji} \widetilde{P}_j
= \sum_{j} |\alpha_j|^2 \widetilde{P}_j. \tag{14.76}
\]
Hence, \(A^{\dagger}A = AA^{\dagger}\). If the projection operators are further decomposed as in the case of (14.75), we have an expression related to (14.76). Thus, a necessary and sufficient condition for an operator to be a normal operator is that the said operator is expressed as (14.74). The relation (14.74) is well known as the spectral decomposition theorem. Thus, the spectral decomposition theorem is equivalent to Theorem 14.5.
The relations (12.208) and (14.74) are virtually the same, aside from the fact that
whereas (14.74) premises an inner product space, (12.208) does not premise
it. Correspondingly, whereas the related operators are called projection operators
with the case of (14.74), those operators are said to be idempotent operators for
(12.208).
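The spectral decomposition can be checked numerically for a small Hermitian (hence normal) matrix. A sketch assuming NumPy is available (the example matrix is our own):

```python
import numpy as np

# a Hermitian (hence normal) example matrix
A = np.array([[2.0, 1.0 + 1.0j],
              [1.0 - 1.0j, 3.0]])
w, V = np.linalg.eigh(A)   # orthonormal eigenvectors in the columns of V

# projection operator onto each eigenvector: P_i = |v_i><v_i|
P = [np.outer(V[:, i], V[:, i].conj()) for i in range(len(w))]

for i, Pi in enumerate(P):
    assert np.allclose(Pi @ Pi, Pi)            # P_i^2 = P_i
    assert np.allclose(Pi.conj().T, Pi)        # Hermitian
    for j in range(i):
        assert np.allclose(Pi @ P[j], 0)       # P_i P_j = 0 for i != j
assert np.allclose(sum(P), np.eye(2))          # completeness

# spectral decomposition (14.74): A = sum_i alpha_i P_i
assert np.allclose(sum(wi * Pi for wi, Pi in zip(w, P)), A)
```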
Example 14.1 Let us think of a Gram matrix of Example 13.1, as shown below.
\[
G = \begin{pmatrix} 2 & 1+i \\ 1-i & 2 \end{pmatrix}. \tag{13.51}
\]

Diagonalizing G by a unitary similarity transformation, we get

\[
\widetilde{G} = \begin{pmatrix} 2+\sqrt{2} & 0 \\ 0 & 2-\sqrt{2} \end{pmatrix}
= \left( 2+\sqrt{2} \right) \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}
+ \left( 2-\sqrt{2} \right) \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}.
\]

By a back calculation of \(U \widetilde{G} U^{\dagger} = G\), we get

\[
G = \left( 2+\sqrt{2} \right)
\begin{pmatrix} \dfrac{1}{2} & \dfrac{\sqrt{2}(1+i)}{4} \\[2mm] \dfrac{\sqrt{2}(1-i)}{4} & \dfrac{1}{2} \end{pmatrix}
+ \left( 2-\sqrt{2} \right)
\begin{pmatrix} \dfrac{1}{2} & -\dfrac{\sqrt{2}(1+i)}{4} \\[2mm] -\dfrac{\sqrt{2}(1-i)}{4} & \dfrac{1}{2} \end{pmatrix}. \tag{14.77}
\]

Putting the eigenvalues \(\alpha_1 = 2+\sqrt{2}\) and \(\alpha_2 = 2-\sqrt{2}\) along with

\[
A_1 = \begin{pmatrix} \dfrac{1}{2} & \dfrac{\sqrt{2}(1+i)}{4} \\[2mm] \dfrac{\sqrt{2}(1-i)}{4} & \dfrac{1}{2} \end{pmatrix}, \qquad
A_2 = \begin{pmatrix} \dfrac{1}{2} & -\dfrac{\sqrt{2}(1+i)}{4} \\[2mm] -\dfrac{\sqrt{2}(1-i)}{4} & \dfrac{1}{2} \end{pmatrix}, \tag{14.78}
\]

we get

\[
G = \alpha_1 A_1 + \alpha_2 A_2. \tag{14.79}
\]

Also,

\[
A_1^{\,2} = A_1, \quad A_2^{\,2} = A_2, \quad A_1 A_2 = A_2 A_1 = 0, \quad A_1 + A_2 = E. \tag{14.80}
\]
Moreover, (14.78) obviously shows that both A1 and A2 are Hermitian. Thus,
Eqs. (14.77) and (14.79) are an example of the spectral decomposition. The decom-
position is unique.
Example 14.1 can be dealt with in parallel with Example 12.5. In Example 12.5, however, an inner product space is not implied, and so we used an idempotent matrix instead of a projection operator. Note that, as can be seen in Example 12.5, that idempotent matrix was not Hermitian.
Of normal matrices, Hermitian matrices and unitary matrices play a crucial role both
in fundamental and applied science. Let us think of several topics and examples.
For the third equality we used the fact that \(\widetilde{P}_l\) is Hermitian; for the last equality we used \(\widetilde{P}_l^{\,2} = \widetilde{P}_l\). Summing (14.84) over the index l, we have

\[
\sum_{l} \wp_l
= \sum_{l} \left\langle u \,\middle|\, \widetilde{P}_l \,\middle|\, u \right\rangle
= \left\langle u \,\middle|\, \sum_{l} \widetilde{P}_l \,\middle|\, u \right\rangle
= \langle u \,|\, E \,|\, u \rangle = \langle u \,|\, u \rangle = 1,
\]

where we used the completeness of the \(\widetilde{P}_l\). With (14.74), i.e.,

\[
A = \alpha_1 \widetilde{P}_1 + \alpha_2 \widetilde{P}_2 + \cdots + \alpha_s \widetilde{P}_s, \tag{14.74}
\]

we have

\[
\langle u | A | u \rangle
= \alpha_1 \left\langle u \middle| \widetilde{P}_1 \middle| u \right\rangle
+ \alpha_2 \left\langle u \middle| \widetilde{P}_2 \middle| u \right\rangle
+ \cdots
+ \alpha_s \left\langle u \middle| \widetilde{P}_s \middle| u \right\rangle
= \alpha_1 \wp_1 + \alpha_2 \wp_2 + \cdots + \alpha_s \wp_s. \tag{14.85}
\]
Hence,
\[
\left\langle u \,\middle|\, A - A^{\dagger} \,\middle|\, u \right\rangle = 0. \tag{14.87}
\]

From Theorem 14.3, we get \(A - A^{\dagger} = 0\), i.e., \(A = A^{\dagger}\). This completes the proof. ∎
Theorem 14.10 The eigenvalues of an Hermitian operator A are real.
Proof Let \(\alpha\) be an eigenvalue of A and let \(|u\rangle\) \((\neq 0)\) be a corresponding eigenvector. Then, \(A|u\rangle = \alpha|u\rangle\). Operating \(\langle u|\) from the left, we get \(\langle u | A | u \rangle = \alpha \langle u | u \rangle = \alpha \|u\|^2\). Since A is Hermitian, \(\langle u|A|u\rangle\) is real, and because \(\|u\|^2\) is a positive real number, \(\alpha\) is real as well. ∎
\[
\left( 1 - \bar{\lambda}\lambda \right) | \lambda \rangle = 0. \tag{14.90}
\]

Thus, the eigenvalues of a unitary matrix have unit absolute value. Let those eigenvalues be \(\lambda_k = e^{i\theta_k}\) \((k = 1, 2, \cdots; \ \theta_k: \text{real})\). Then, from (12.11) we have

\[
\det U = \prod_{k} \lambda_k = e^{i(\theta_1 + \theta_2 + \cdots)}
\quad \text{and} \quad
|\det U| = \prod_{k} |\lambda_k| = \prod_{k} \left| e^{i\theta_k} \right| = 1.
\]
That is, any unitary matrix has a determinant of unit absolute value. ∎
Example 14.2 Let us think of the following unitary matrix R:

\[
R = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}. \tag{14.92}
\]

The characteristic equation is

\[
\begin{vmatrix} \cos\theta - \lambda & -\sin\theta \\ \sin\theta & \cos\theta - \lambda \end{vmatrix}
= \lambda^2 - 2\lambda\cos\theta + 1 = 0. \tag{14.93}
\]
(i) \(\theta = 0\): This is a trivial case. The matrix R has automatically been diagonalized to be an identity matrix. Eigenvalues are 1 (double root).
(ii) \(\theta = \pi\): This is a trivial case. Eigenvalues are \(-1\) (double root).
(iii) \(\theta \neq 0, \pi\): Let us think of \(0 < \theta < \pi\). Then \(\lambda = \cos\theta \pm i\sin\theta\). As a diagonalizing unitary matrix U, we get

\[
U = \begin{pmatrix} \dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} \\[2mm] -\dfrac{i}{\sqrt{2}} & \dfrac{i}{\sqrt{2}} \end{pmatrix}, \qquad
U^{\dagger} = \begin{pmatrix} \dfrac{1}{\sqrt{2}} & \dfrac{i}{\sqrt{2}} \\[2mm] \dfrac{1}{\sqrt{2}} & -\dfrac{i}{\sqrt{2}} \end{pmatrix}. \tag{14.95}
\]

Therefore, we have

\[
U^{\dagger} R U =
\begin{pmatrix} \dfrac{1}{\sqrt{2}} & \dfrac{i}{\sqrt{2}} \\[2mm] \dfrac{1}{\sqrt{2}} & -\dfrac{i}{\sqrt{2}} \end{pmatrix}
\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}
\begin{pmatrix} \dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} \\[2mm] -\dfrac{i}{\sqrt{2}} & \dfrac{i}{\sqrt{2}} \end{pmatrix}
= \begin{pmatrix} e^{i\theta} & 0 \\ 0 & e^{-i\theta} \end{pmatrix}. \tag{14.96}
\]
The trace of the resulting matrix is \(2\cos\theta\). In the case of \(\pi < \theta < 2\pi\), we similarly get a diagonal matrix. The confirmation is left to readers.
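The diagonalization of R can be confirmed numerically for any angle. A sketch assuming NumPy is available (θ = 0.7 is an arbitrary choice; the column order of U puts the \(e^{i\theta}\) eigenvector first):

```python
import numpy as np

theta = 0.7   # any angle with 0 < theta < pi
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# one choice of diagonalizing unitary matrix (columns are eigenvectors)
U = np.array([[1.0,  1.0],
              [-1.0j, 1.0j]]) / np.sqrt(2.0)
assert np.allclose(U.conj().T @ U, np.eye(2))        # U is unitary

D = U.conj().T @ R @ U
expected = np.diag([np.exp(1j * theta), np.exp(-1j * theta)])
assert np.allclose(D, expected)                      # diagonal e^{+-i theta}
assert np.allclose(np.trace(D).real, 2 * np.cos(theta))
```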
The Hermitian quadratic forms appeared in, e.g., (13.34) and (13.83) in relation to
Gram matrices in Sect. 13.2. The Hermitian quadratic forms have wide applications
in the field of mathematical physics and materials science.
Let H be an Hermitian operator and \(|x\rangle\) a vector on an inner product space. We define the Hermitian quadratic form in an arbitrary orthonormal basis as follows:

\[
\langle x \,|\, H \,|\, x \rangle
\equiv \left( \bar{x}_1 \ \cdots \ \bar{x}_n \right) H
\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}
= \sum_{i,j} \bar{x}_i (H)_{ij} x_j,
\]
where jxi is represented as a column vector, as already mentioned in Sect. 13.4. Let
us start with unitary diagonalization of (13.40), where a Gram matrix is a kind of
Hermitian matrix. Following similar procedures, as in the case of (13.36) we obtain a
diagonal matrix and an inner product such that
\[
\langle x \,|\, H \,|\, x \rangle
= \left( \bar{\xi}_1 \ \cdots \ \bar{\xi}_n \right)
\begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{pmatrix}
\begin{pmatrix} \xi_1 \\ \vdots \\ \xi_n \end{pmatrix}
= \lambda_1 |\xi_1|^2 + \cdots + \lambda_n |\xi_n|^2. \tag{14.97}
\]
Notice, however, that the Gram matrix comprising basis vectors (that are linearly
independent) is positive definite. Remember that if a Gram matrix is constructed by
A{A (Hermitian as well) according to whether A is non-singular or singular, A{A is
either positive definite or non-negative. The Hermitian matrix we are dealing with
here, in general, does not necessarily possess the positive definiteness or
non-negative feature. Yet, remember that hx| H| xi and eigenvalues λ1, , λn are real.
Positive definiteness of matrices is an important concept in relation to the
Hermitian (and real) quadratic forms (see Sect. 13.2). In particular, in the case
where all the matrix elements are real and \(|x\rangle\) is defined in a real domain, we are dealing with the quadratic form with respect to a real symmetric matrix. In the case
of the real quadratic forms, we sometimes adopt the following notation [4, 5]:
\[
A[\mathbf{x}] \equiv \mathbf{x}^T A \mathbf{x} = \sum_{i,j=1}^{n} a_{ij} x_i x_j; \qquad
\mathbf{x} = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix},
\]
where \(A = (a_{ij})\) is a real symmetric matrix and the \(x_i\) \((1 \le i \le n)\) are real numbers. The positive definiteness is invariant under a transformation \(P^T A P\), where P is non-singular. In fact, suppose \(A > 0\) and put \(A' = P^T A P\). For an arbitrary \(\mathbf{x}'\), setting \(\mathbf{x} = P\mathbf{x}'\), we have

\[
\mathbf{x}'^T A' \mathbf{x}' = \mathbf{x}'^T P^T A P \mathbf{x}' = \mathbf{x}^T A \mathbf{x} > 0.
\]

Hence,

\[
A' = P^T A P > 0; \tag{14.98}
\]

for the notation \(P^T A P > 0\), see (13.46). In particular, using a suitable orthogonal matrix O, we obtain

\[
O^T A O = \begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{pmatrix}.
\]

If \(A > 0\), then every \(\lambda_i > 0\) and, therefore,

\[
\det A = \prod_{i=1}^{n} \lambda_i > 0.
\]
Theorem 14.11 [4, 5] Let A be a n-dimensional real symmetric matrix \(A = (a_{ij})\). Let \(A^{(k)}\) be the k-dimensional principal submatrices described by

\[
A^{(k)} =
\begin{pmatrix}
a_{i_1 i_1} & a_{i_1 i_2} & \cdots & a_{i_1 i_k} \\
a_{i_2 i_1} & a_{i_2 i_2} & \cdots & a_{i_2 i_k} \\
\vdots & \vdots & \ddots & \vdots \\
a_{i_k i_1} & a_{i_k i_2} & \cdots & a_{i_k i_k}
\end{pmatrix}
\quad (1 \le i_1 < i_2 < \cdots < i_k \le n),
\]

where the principal submatrices mean the matrices made by striking out rows and columns that intersect on diagonal elements. Alternatively, a principal submatrix of a matrix A can be defined as a matrix whose diagonal elements are a part of the diagonal elements of A. As a special case, those include \(a_{11}, \cdots,\) or \(a_{nn}\) as a (1, 1) matrix (i.e., merely a number) as well as A itself. Then, we have

\[
A > 0 \iff \det A^{(k)} > 0 \ \text{for every principal submatrix } A^{(k)}. \tag{14.99}
\]
Proof First, suppose that \(A > 0\). Then, in the quadratic form \(A[\mathbf{x}]\), equating the \((n-k)\) variables \(x_l = 0\) \((l \neq i_1, i_2, \cdots, i_k)\), we obtain a quadratic form

\[
\sum_{\mu,\nu=1}^{k} a_{i_\mu i_\nu} x_{i_\mu} x_{i_\nu}.
\]

Since \(A > 0\), this (partial) quadratic form is positive definite as well; i.e., we have

\[
A^{(k)} > 0.
\]

Therefore, we get

\[
\det A^{(k)} > 0.
\]

This is due to (13.48). Notice that \(\det A^{(k)}\) is said to be a principal minor. Thus, we have proven \(\Longrightarrow\) of (14.99).
To prove \(\Longleftarrow\), in turn, we use mathematical induction. If \(n = 1\), we have a trivial case; i.e., A is merely a positive real number. Suppose that \(\Longleftarrow\) is true for \(n-1\). Then, we have \(A^{(n-1)} > 0\) by supposition. Thus, it suffices to show \(A > 0\) on the condition that \(A^{(n-1)} > 0\) and, in addition, \(\det A > 0\). Let A be a n-dimensional real symmetric non-singular matrix such that

\[
A = \begin{pmatrix} A^{(n-1)} & \mathbf{a} \\ \mathbf{a}^T & a_n \end{pmatrix},
\]

where \(A^{(n-1)}\) is a symmetric matrix and non-singular as well. We define P such that

\[
P = \begin{pmatrix} E & -\left[ A^{(n-1)} \right]^{-1} \mathbf{a} \\ 0 & 1 \end{pmatrix}, \qquad
P^T = \begin{pmatrix} E & 0 \\ -\mathbf{a}^T \left[ A^{(n-1)} \right]^{-1} & 1 \end{pmatrix},
\]

where we used

\[
\mathbf{a}^T \left[ A^{(n-1)} \right]^{-1} = \left( \left[ A^{(n-1)} \right]^{-1} \mathbf{a} \right)^T,
\]

which holds because \(A^{(n-1)}\) is symmetric.
By supposition, we have \(\det A^{(n-1)} > 0\) and \(\det A > 0\). Hence, we have

\[
a_n - \left[ A^{(n-1)} \right]^{-1} [\mathbf{a}] > 0.
\]

Putting \(\tilde{a}_n \equiv a_n - \left[ A^{(n-1)} \right]^{-1} [\mathbf{a}]\) and \(\mathbf{x} = \begin{pmatrix} \mathbf{x}^{(n-1)} \\ x_n \end{pmatrix}\), we get

\[
\begin{pmatrix} A^{(n-1)} & 0 \\ 0 & \tilde{a}_n \end{pmatrix} [\mathbf{x}]
= A^{(n-1)} \left[ \mathbf{x}^{(n-1)} \right] + \tilde{a}_n x_n^{\,2}.
\]
Meanwhile, \(\widetilde{A}\) is expressed as

\[
\widetilde{A} \equiv \begin{pmatrix} A^{(n-1)} & 0 \\ 0 & \tilde{a}_n \end{pmatrix} = P^T A P.
\]

Since \(A^{(n-1)} > 0\) and \(\tilde{a}_n > 0\), we have \(\widetilde{A} > 0\) and, by the invariance under \(P^T \cdot P\) established with (14.98), \(A > 0\). This completes the proof. ∎

In the special case where the principal submatrices are chosen as the leading ones, (14.99) reads

\[
A > 0 \iff \det \widetilde{A}^{(k)} > 0 \quad (1 \le k \le n),
\]

where \(\widetilde{A}^{(k)}\) is described as

\[
\widetilde{A}^{(k)} = \begin{pmatrix} a_{11} & \cdots & a_{1k} \\ \vdots & \ddots & \vdots \\ a_{k1} & \cdots & a_{kk} \end{pmatrix}.
\]
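The leading-minor criterion lends itself to a direct numerical test. A sketch assuming NumPy is available; `leading_minors_positive` is our own helper, not a library routine:

```python
import numpy as np

def leading_minors_positive(A):
    """Sylvester's criterion: A > 0 iff every leading principal minor is positive."""
    return all(np.linalg.det(A[:k, :k]) > 0 for k in range(1, A.shape[0] + 1))

A_pos = np.array([[2.0, -1.0, 0.0],
                  [-1.0, 2.0, -1.0],
                  [0.0, -1.0, 2.0]])     # positive definite
A_ind = np.array([[1.0, 2.0],
                  [2.0, 1.0]])           # indefinite (eigenvalues 3 and -1)

assert leading_minors_positive(A_pos)
assert not leading_minors_positive(A_ind)
# cross-check against the eigenvalues
assert np.all(np.linalg.eigvalsh(A_pos) > 0)
assert not np.all(np.linalg.eigvalsh(A_ind) > 0)
```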
As an example, let us take

\[
H = \begin{pmatrix} 5 & 2 \\ 2 & 1 \end{pmatrix}, \qquad
\langle x \,|\, H \,|\, x \rangle
= \left( x_1 \ x_2 \right) \begin{pmatrix} 5 & 2 \\ 2 & 1 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}. \tag{14.101}
\]

Inserting \(UU^{\dagger} = E\), we have

\[
\langle x \,|\, H \,|\, x \rangle
= \left( x_1 \ x_2 \right) U U^{\dagger}
\begin{pmatrix} 5 & 2 \\ 2 & 1 \end{pmatrix}
U U^{\dagger} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
= \left( x_1 \ x_2 \right) U
\begin{pmatrix} 3+2\sqrt{2} & 0 \\ 0 & 3-2\sqrt{2} \end{pmatrix}
U^{\dagger} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}. \tag{14.104}
\]
Making an argument analogous to that of Sect. 13.2 and using similar notation, we have

\[
\langle \tilde{x} \,|\, H' \,|\, \tilde{x} \rangle
= \left( \tilde{x}_1 \ \tilde{x}_2 \right) H'
\begin{pmatrix} \tilde{x}_1 \\ \tilde{x}_2 \end{pmatrix}
= \left( x_1 \ x_2 \right) U U^{\dagger} H U U^{\dagger}
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
= \langle x \,|\, H \,|\, x \rangle, \tag{14.105}
\]

with

\[
\begin{pmatrix} \tilde{x}_1 \\ \tilde{x}_2 \end{pmatrix}
= U^{\dagger} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \tag{14.106}
\]

\[
H' = U^{\dagger} H U. \tag{14.107}
\]

Thus, it follows that \(\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}\) and \(\begin{pmatrix} \tilde{x}_1 \\ \tilde{x}_2 \end{pmatrix}\) are different column vector representations of the identical vector viewed in reference to two different sets of orthonormal bases (i.e., different coordinate systems).

From (14.102), as an approximation we have

\[
U \approx \begin{pmatrix} 0.92 & -0.38 \\ 0.38 & 0.92 \end{pmatrix}
\approx \begin{pmatrix} \cos 22.5^{\circ} & -\sin 22.5^{\circ} \\ \sin 22.5^{\circ} & \cos 22.5^{\circ} \end{pmatrix}. \tag{14.108}
\]

Setting \(\langle x | H | x \rangle = 1\), we obtain an ellipse expressed in the new coordinates as

\[
\frac{\tilde{x}_1^{\,2}}{\left( 1/\sqrt{3+2\sqrt{2}} \right)^2}
+ \frac{\tilde{x}_2^{\,2}}{\left( 1/\sqrt{3-2\sqrt{2}} \right)^2} = 1. \tag{14.109}
\]
of eigenvectors. The question boils down to whether the two operators commute. To address this question, the following theorem is important.
Theorem 14.13 [2] Two Hermitian matrices B and C commute if and only if there
exists a complete orthonormal set of common eigenvectors.
Proof Suppose that there exists a complete orthonormal set of common eigenvectors \(\{ |x_i; m_i\rangle \}\) that span the linear vector space, where i and \(m_i\) are positive integers and \(|x_i\rangle\) corresponds to an eigenvalue \(b_i\) of B and \(c_i\) of C. Note that if \(m_i\) is 1, we say that the spectrum is nondegenerate, and that if \(m_i\) is equal to two or more, the spectrum is said to be degenerate. Then we have \(B(|x_i\rangle) = b_i|x_i\rangle\) and \(C(|x_i\rangle) = c_i|x_i\rangle\). Therefore, \(BC(|x_i\rangle) = B(c_i|x_i\rangle) = c_i B(|x_i\rangle) = c_i b_i |x_i\rangle\). Similarly, \(CB(|x_i\rangle) = c_i b_i |x_i\rangle\). Consequently, \((BC - CB)(|x_i\rangle) = 0\) for any \(|x_i\rangle\). As the set of all the \(|x_i\rangle\) span the vector space, \(BC - CB = 0\), namely \(BC = CB\).
In turn, assume that \(BC = CB\) and that \(B(|x_i\rangle) = b_i|x_i\rangle\), where the \(|x_i\rangle\) are orthonormal. Then, we have

\[
B \left( C | x_i \rangle \right) = C \left( B | x_i \rangle \right) = b_i \left( C | x_i \rangle \right).
\]

This implies that \(C(|x_i\rangle)\) is an eigenvector of B corresponding to the eigenvalue \(b_i\).
We have two cases.
(i) The spectrum is nondegenerate: The spectrum is said to be nondegenerate if only one eigenvector belongs to an eigenvalue. In other words, the multiplicity of \(b_i\) is one. Then, \(C(|x_i\rangle)\) must be equal to some constant times \(|x_i\rangle\); i.e., \(C(|x_i\rangle) = c_i|x_i\rangle\). That is, \(|x_i\rangle\) is an eigenvector of C corresponding to an eigenvalue \(c_i\) and, hence, a common eigenvector of B and C. This completes the proof for this case.
(ii) The spectrum is degenerate: Let the multiplicity of \(b_i\) be m, and let \(|x_i^{(1)}\rangle, \cdots, |x_i^{(m)}\rangle\) be orthonormal eigenvectors that span the eigenspace of \(b_i\). Since each \(C|x_i^{(j)}\rangle\) again belongs to this eigenspace, we can write

\[
C | x_i^{(1)} \rangle = \sum_{j=1}^{m} \alpha_{j1} | x_i^{(j)} \rangle, \quad \cdots, \quad
C | x_i^{(m)} \rangle = \sum_{j=1}^{m} \alpha_{jm} | x_i^{(j)} \rangle, \tag{14.112}
\]
where the third equality comes from the orthonormality of the basis vectors.
Meanwhile,
\[
\left\langle x_i^{(l)} \middle| C x_i^{(k)} \right\rangle
= \overline{\left\langle x_i^{(k)} \middle| C^{\dagger} x_i^{(l)} \right\rangle}
= \overline{\left\langle x_i^{(k)} \middle| C x_i^{(l)} \right\rangle}
= \overline{\alpha_{kl}}, \tag{14.115}
\]
where the second equality comes from the Hermiticity of C. From (14.114) and
(14.115), we get
\[
C \left( \sum_{k=1}^{m} \gamma_k | x_i^{(k)} \rangle \right)
= c \sum_{j=1}^{m} \gamma_j | x_i^{(j)} \rangle. \tag{14.117}
\]
The vectors \(|x_i^{(j)}\rangle\) \((1 \le j \le m)\) span an invariant subspace (i.e., the eigenspace corresponding to an eigenvalue of \(b_i\)). Let us call this subspace \(W_m\). Consequently, in (14.118) we can equate the scalar coefficients of the individual \(|x_i^{(j)}\rangle\) \((1 \le j \le m)\). Then, we get

\[
\begin{pmatrix} \alpha_{11} & \cdots & \alpha_{1m} \\ \vdots & \ddots & \vdots \\ \alpha_{m1} & \cdots & \alpha_{mm} \end{pmatrix}
\begin{pmatrix} \gamma_1 \\ \vdots \\ \gamma_m \end{pmatrix}
= c \begin{pmatrix} \gamma_1 \\ \vdots \\ \gamma_m \end{pmatrix}. \tag{14.119}
\]
\[
\begin{pmatrix} \alpha_{11} & \cdots & \alpha_{1m} \\ \vdots & \ddots & \vdots \\ \alpha_{m1} & \cdots & \alpha_{mm} \end{pmatrix}
\begin{pmatrix} \gamma_1^{(\mu)} \\ \vdots \\ \gamma_m^{(\mu)} \end{pmatrix}
= c_\mu \begin{pmatrix} \gamma_1^{(\mu)} \\ \vdots \\ \gamma_m^{(\mu)} \end{pmatrix}. \tag{14.120}
\]
Equation (14.120) implies that we can construct a (m, m) unitary matrix from the
m orthonormal column vectors γ (μ). Using the said unitary matrix, we will be able to
diagonalize (αij) according to Theorem 14.5.
Having determined the m eigenvectors \(\gamma^{(\mu)}\) \((1 \le \mu \le m)\), we can construct a set of eigenvectors such that

\[
| y_i^{(\mu)} \rangle = \sum_{k=1}^{m} \gamma_k^{(\mu)} | x_i^{(k)} \rangle. \tag{14.121}
\]
Finally, let us confirm that the \(|y_i^{(\mu)}\rangle\) \((1 \le \mu \le m)\) in fact constitute an orthonormal basis. To show this, we have

\[
\begin{aligned}
\left\langle y_i^{(\nu)} \middle| y_i^{(\mu)} \right\rangle
&= \left\langle \sum_{k=1}^{m} \gamma_k^{(\nu)} x_i^{(k)} \,\middle|\, \sum_{l=1}^{m} \gamma_l^{(\mu)} x_i^{(l)} \right\rangle
= \sum_{k=1}^{m} \sum_{l=1}^{m} \overline{\gamma_k^{(\nu)}} \, \gamma_l^{(\mu)} \left\langle x_i^{(k)} \middle| x_i^{(l)} \right\rangle \\
&= \sum_{k=1}^{m} \sum_{l=1}^{m} \overline{\gamma_k^{(\nu)}} \, \gamma_l^{(\mu)} \delta_{kl}
= \sum_{k=1}^{m} \overline{\gamma_k^{(\nu)}} \, \gamma_k^{(\mu)} = \delta_{\nu\mu}. \tag{14.122}
\end{aligned}
\]

The last equality comes from the fact that a matrix comprising the m orthonormal column vectors \(\gamma^{(\mu)}\) \((1 \le \mu \le m)\) forms a unitary matrix. Thus, the \(|y_i^{(\mu)}\rangle\) \((1 \le \mu \le m)\) certainly constitute an orthonormal basis.
The above completes the proof. ∎
Theorem 14.13 can be restated as follows: Two Hermitian matrices B and C can
be simultaneously diagonalized by a unitary similarity transformation. As mentioned
above, we can construct a unitary matrix U such that
\[
U = \begin{pmatrix} \gamma_1^{(1)} & \cdots & \gamma_1^{(m)} \\ \vdots & \ddots & \vdots \\ \gamma_m^{(1)} & \cdots & \gamma_m^{(m)} \end{pmatrix}.
\]

Then we have

\[
U^{\dagger} B U = \begin{pmatrix} b_i & & \\ & \ddots & \\ & & b_i \end{pmatrix}, \qquad
U^{\dagger} C U = \begin{pmatrix} c_1 & & & \\ & c_2 & & \\ & & \ddots & \\ & & & c_m \end{pmatrix}. \tag{14.123}
\]
No complete orthonormal set of common eigenvectors exists for the set of operators \(L_x\), \(L_y\), and \(L_z\). This fact is equivalent to the noncommutativity of these three operators among themselves. In contrast, \(L^2\) and \(L_z\) share a complete orthonormal set of common eigenvectors and, hence, are commutable.
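The simultaneous diagonalizability of commuting Hermitian matrices can be illustrated numerically. A sketch assuming NumPy is available; B is given a nondegenerate spectrum so that its eigenvectors are essentially unique, and C is built on the same eigenbasis:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
Q, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))

# two Hermitian matrices sharing the eigenvectors (the columns of Q)
B = Q @ np.diag([1.0, 2.0, 3.0]) @ Q.conj().T     # nondegenerate spectrum
C = Q @ np.diag([5.0, 5.0, 7.0]) @ Q.conj().T     # degenerate spectrum

assert np.allclose(B @ C, C @ B)                  # B and C commute

# since B is nondegenerate, any unitary diagonalizing B also diagonalizes C
w, V = np.linalg.eigh(B)
CB = V.conj().T @ C @ V
assert np.allclose(CB, np.diag(np.diag(CB)))      # off-diagonals vanish
```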
Notice that \(C|x_i^{(1)}\rangle\), \(C|x_i^{(2)}\rangle\), \(\cdots\), and \(C|x_i^{(m)}\rangle\) are not necessarily linearly independent (see Sect. 9.4). Suppose that among the m eigenvalues \(c_\mu\) \((1 \le \mu \le m)\), some \(c_\mu = 0\). Then, \(\det C = 0\) according to (13.48). This means that C is singular. In that case, \(C|x_i^{(1)}\rangle\), \(C|x_i^{(2)}\rangle\), \(\cdots\), and \(C|x_i^{(m)}\rangle\) are linearly dependent. In Sect. 3.3, in fact, we had \(L_z Y_0^0(\theta, \phi) = L^2 Y_0^0(\theta, \phi) = 0\). But this special situation does not affect the proof of Theorem 14.13.
We know that any matrix A can be decomposed such that

\[
A = \frac{1}{2} \left( A + A^{\dagger} \right) + i \, \frac{1}{2i} \left( A - A^{\dagger} \right), \tag{14.124}
\]

where we put \(B \equiv \frac{1}{2} \left( A + A^{\dagger} \right)\) and \(C \equiv \frac{1}{2i} \left( A - A^{\dagger} \right)\); both B and C are Hermitian. That is, any matrix A can be decomposed into two Hermitian matrices in such a way that

\[
A = B + iC. \tag{14.125}
\]
Note here that B and C commute if and only if A and \(A^{\dagger}\) commute, that is, A is a normal matrix. In fact, from (14.125) we get

\[
A A^{\dagger} - A^{\dagger} A = 2i \, (CB - BC). \tag{14.126}
\]

From (14.126), if and only if B and C commute (i.e., B and C can be diagonalized simultaneously), \(AA^{\dagger} - A^{\dagger}A = 0\), i.e., \(AA^{\dagger} = A^{\dagger}A\). This indicates that A is a normal matrix. Thus, the following theorem follows.
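The decomposition A = B + iC and the commutator identity behind the normality criterion can be verified numerically. A sketch assuming NumPy is available:

```python
import numpy as np

rng = np.random.default_rng(3)
dag = lambda M: M.conj().T

A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
B = (A + dag(A)) / 2          # Hermitian part
C = (A - dag(A)) / (2j)       # also Hermitian

assert np.allclose(dag(B), B) and np.allclose(dag(C), C)
assert np.allclose(B + 1j * C, A)                 # A = B + iC, as in (14.125)

# A A^dag - A^dag A = 2i (CB - BC): A is normal iff B and C commute
lhs = A @ dag(A) - dag(A) @ A
rhs = 2j * (C @ B - B @ C)
assert np.allclose(lhs, rhs)
```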
In Chap. 12, we dealt with a function of matrices. In this chapter we study several
important definitions and characteristics of functions of matrices. If elements of
matrices consist of analytic functions of a real variable, such matrices are of
particular importance. For instance, differentiation can naturally be defined with
the functions of matrices. Of these, exponential functions of matrices are widely used
in various fields of mathematical physics. These functions frequently appear in a
system of differential equations. In Chap. 10, we showed that SOLDEs with suitable
BCs can be solved using Green’s functions. In the present chapter, in parallel, we
show a solving method based on resolvent matrices. The exponential functions of
matrices have broad applications in group theory that we will study in Part IV. In
preparation for it, we study how the collection of matrices forms a linear vector
space. In accordance with Chap. 13, we introduce basic notions of inner product and
norm to the matrices.
\[
A'(t) \equiv \frac{dA(t)}{dt} = \left( a_{ij}{}'(t) \right). \tag{15.2}
\]
We have
\[
\exp A \equiv E + A + \frac{1}{2!} A^2 + \cdots + \frac{1}{\nu!} A^{\nu} + \cdots, \tag{15.7}
\]
where A is a (n, n) square matrix and E is a (n, n) identity matrix. Note that E is used
instead of a number 1 that appears in the power series expansion of exp x described
as
\[
\exp x \equiv 1 + x + \frac{1}{2!} x^2 + \cdots + \frac{1}{\nu!} x^{\nu} + \cdots.
\]
Regarding the matrix notation in Definition 15.1, readers are referred to (11.38).
Now, let us show that the power series of (15.7) converges [1, 2].
Let A be a (n, n) square matrix. Putting \(A = (a_{ij})\) and \(\max_{i,j} |a_{ij}| = M\), and defining \(A^{\nu} = \left( a_{ij}^{(\nu)} \right)\), where \(a_{ij}^{(\nu)}\) is defined as the \((i, j)\) component of \(A^{\nu}\), we get

\[
\left| a_{ij}^{(\nu)} \right| \le n^{\nu} M^{\nu} \quad (1 \le i, j \le n). \tag{15.8}
\]
Note that \(A^0 = E\); hence \(a_{ij}^{(0)} = \delta_{ij}\). Then, if \(\nu = 0\), (15.8) routinely holds. When \(\nu = 1\), (15.8) obviously holds as well from the definition of M, assuming \(n \ge 2\); we are considering (n, n) matrices with \(n \ge 2\). We wish to show that (15.8) holds with any \(\nu\) \((\ge 2)\) using mathematical induction. Suppose that (15.8) holds for \(\nu - 1\). That is,

\[
\left| a_{ij}^{(\nu-1)} \right| \le n^{\nu-1} M^{\nu-1} \quad (1 \le i, j \le n).
\]
Then, we have

\[
a_{ij}^{(\nu)} = \left( A^{\nu-1} A \right)_{ij}
= \sum_{k=1}^{n} \left( A^{\nu-1} \right)_{ik} (A)_{kj}
= \sum_{k=1}^{n} a_{ik}^{(\nu-1)} a_{kj}. \tag{15.9}
\]

Thus, we get

\[
\left| a_{ij}^{(\nu)} \right|
= \left| \sum_{k=1}^{n} a_{ik}^{(\nu-1)} a_{kj} \right|
\le \sum_{k=1}^{n} \left| a_{ik}^{(\nu-1)} \right| \left| a_{kj} \right|
\le n \cdot n^{\nu-1} M^{\nu-1} \cdot M = n^{\nu} M^{\nu}. \tag{15.10}
\]
The series (15.11) certainly converges; note that nM is not a matrix but a number. From (15.10) we obtain

\[
\sum_{\nu=0}^{\infty} \frac{1}{\nu!} \left| a_{ij}^{(\nu)} \right|
\le \sum_{\nu=0}^{\infty} \frac{1}{\nu!} n^{\nu} M^{\nu} = e^{nM}.
\]

This implies that \(\sum_{\nu=0}^{\infty} \frac{1}{\nu!} a_{ij}^{(\nu)}\) converges (i.e., absolute convergence [3]). Thus, from Definition 15.1, (15.7) converges.
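The convergent series (15.7) can be evaluated directly by truncation. The sketch below assumes NumPy; `expm_series` and the truncation length are our own choices, not the book's. It also spot-checks the elementwise bound (15.8):

```python
import numpy as np

def expm_series(A, terms=60):
    """exp A = E + A + A^2/2! + ..., truncated after `terms` terms (15.7)."""
    result = np.eye(A.shape[0], dtype=complex)
    power = np.eye(A.shape[0], dtype=complex)
    for nu in range(1, terms):
        power = power @ A / nu        # builds A^nu / nu! incrementally
        result = result + power
    return result

# for a diagonal matrix the series reduces to scalar exponentials
D = np.diag([0.5, -1.0, 2.0])
assert np.allclose(expm_series(D), np.diag(np.exp([0.5, -1.0, 2.0])))

# crude check of the bound |a_ij^(nu)| <= n^nu M^nu used in the proof
A = np.array([[0.0, 1.0], [1.0, 0.0]])
n, M, nu = 2, 1.0, 5
assert np.max(np.abs(np.linalg.matrix_power(A, nu))) <= n**nu * M**nu
```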
To further examine the exponential functions of matrices, in the following
proposition we show that a collection of matrices forms a linear vector space.
Proposition 15.1 Let M be a collection of all (n, n) square matrices. Then, M forms
a linear vector space.
Proof Let \(A = (a_{ij}) \in M\) and \(B = (b_{ij}) \in M\) be (n, n) square matrices. Then, \(\alpha A = (\alpha a_{ij})\) and \(A + B = (a_{ij} + b_{ij})\) are again (n, n) square matrices, and so we have \(\alpha A \in M\) and \(A + B \in M\). Thus, M forms a linear vector space.
In Proposition 15.1, M forms a vector space \(V^{n^2}\) of dimension \(n^2\). To define an inner product in this space, we introduce \(\left| P^{(k)}_{(l)} \right\rangle\) \((k, l = 1, 2, \ldots, n)\) as a basis set of \(n^2\) vectors. Now, let \(A = (a_{ij})\) be a (n, n) square matrix. To explicitly show that A is a vector in the inner product space, we express it as

\[
| A \rangle = \sum_{k,l=1}^{n} a_{kl} \left| P^{(k)}_{(l)} \right\rangle.
\]

Here, let us define an orthonormal basis set \(\left\{ \left| P^{(k)}_{(l)} \right\rangle \right\}\) and an inner product between vectors described in the form of matrices such that

\[
\left\langle P^{(k)}_{(l)} \middle| P^{(s)}_{(t)} \right\rangle \equiv \delta_{ks} \delta_{lt}.
\]

The relation (15.13) leads to the norm already introduced in (13.8). The norm of a matrix is defined as

\[
\| A \| \equiv \sqrt{ \langle A | A \rangle } = \sqrt{ \sum_{i,j=1}^{n} \left| a_{ij} \right|^2 }. \tag{15.14}
\]
Proof Let \(C = AB\). Then, from the Cauchy–Schwarz inequality (Sect. 13.1) we get

\[
\| C \|^2 = \sum_{i,j} \left| c_{ij} \right|^2
= \sum_{i,j} \left| \sum_{k} a_{ik} b_{kj} \right|^2
\le \sum_{i,j} \left( \sum_{k} |a_{ik}|^2 \right) \left( \sum_{l} |b_{lj}|^2 \right)
= \left( \sum_{i,k} |a_{ik}|^2 \right) \left( \sum_{j,l} |b_{lj}|^2 \right)
= \| A \|^2 \| B \|^2. \tag{15.16}
\]
Proof Since A and B are mutually commutative, \((A+B)^n\) can be expanded through the binomial theorem such that

\[
\frac{1}{n!} (A+B)^n = \sum_{k=0}^{n} \frac{A^k}{k!} \frac{B^{n-k}}{(n-k)!}. \tag{15.18}
\]
\[
\sum_{n=0}^{2m} \frac{1}{n!} (A+B)^n
= \left( \sum_{n=0}^{m} \frac{A^n}{n!} \right) \left( \sum_{n=0}^{m} \frac{B^n}{n!} \right) + R_m, \tag{15.19}
\]

where \(R_m\) has the form \(\sum_{(k,l)} A^k B^l / k!\,l!\), with the summation taken over all combinations of \((k, l)\) that satisfy \(\max(k, l) \ge m+1\) and \(k + l \le 2m\). The number of all such combinations \((k, l)\) is \(m(m+1)\) accordingly; see Fig. 15.1 [4]. We remark that if in (15.19) we put \(m = 0\), then (15.19) trivially holds with \(R_m = 0\) (i.e., the zero matrix).
[Fig. 15.1 The combinations (k, l) contained in the residual term R_m; adapted from [4] (in Japanese), with the permission of Baifukan Co., Ltd., Tokyo]
where \(m(m+1)\) in the numerator comes from the number of combinations \((k, l)\), as pointed out above. From Stirling's interpolation formula [5], we have

\[
\Gamma(z+1) \approx \sqrt{2\pi} \, e^{-z} z^{\,z+\frac{1}{2}}. \tag{15.22}
\]

Then, we have

\[
[\text{RHS of } (15.21)] = \frac{C^{2m}}{(m-1)!}
\approx \frac{e^{\,m-1} C^{2m}}{\sqrt{2\pi} \, (m-1)^{\,m-\frac{1}{2}}}
= \frac{C}{\sqrt{2\pi e}} \left[ \frac{e C^2}{m-1} \right]^{m-\frac{1}{2}} \tag{15.24}
\]

and

\[
\lim_{m \to \infty} \frac{C}{\sqrt{2\pi e}} \left[ \frac{e C^2}{m-1} \right]^{m-\frac{1}{2}} = 0. \tag{15.25}
\]
Remember that \(\| R_m \| = 0\) if and only if \(R_m = 0\) (the zero matrix); see (13.4) and (13.8) in Sect. 13.1. Finally, taking the limit \(m \to \infty\) of both sides of (15.19), we obtain

\[
\lim_{m \to \infty} \sum_{n=0}^{2m} \frac{1}{n!} (A+B)^n
= \lim_{m \to \infty} \left[ \left( \sum_{n=0}^{m} \frac{A^n}{n!} \right) \left( \sum_{n=0}^{m} \frac{B^n}{n!} \right) + R_m \right].
\]

Since \(\lim_{m \to \infty} R_m = 0\), we get

\[
\lim_{m \to \infty} \sum_{n=0}^{2m} \frac{1}{n!} (A+B)^n
= \lim_{m \to \infty} \left( \sum_{n=0}^{m} \frac{A^n}{n!} \right) \left( \sum_{n=0}^{m} \frac{B^n}{n!} \right). \tag{15.27}
\]

This implies that (15.17) holds. It is obvious that if A and B commute, then exp A and exp B commute; that is, \(\exp A \exp B = \exp B \exp A\). This completes the proof. ∎
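Theorem 15.2, and the necessity of commutativity, can both be demonstrated numerically with a truncated series. A sketch assuming NumPy is available (`expm_series` is our own helper):

```python
import numpy as np

def expm_series(A, terms=60):
    """Truncated exponential series for a square matrix, as in (15.7)."""
    result, power = np.eye(A.shape[0]), np.eye(A.shape[0])
    for nu in range(1, terms):
        power = power @ A / nu
        result = result + power
    return result

A = np.array([[1.0, 2.0], [0.0, 1.0]])
B = np.array([[3.0, 4.0], [0.0, 3.0]])          # A and B commute
assert np.allclose(A @ B, B @ A)
assert np.allclose(expm_series(A + B), expm_series(A) @ expm_series(B))

# without commutativity, exp(A + B) = exp A exp B generally fails
C = np.array([[0.0, 1.0], [0.0, 0.0]])
D = np.array([[0.0, 0.0], [1.0, 0.0]])
assert not np.allclose(expm_series(C + D), expm_series(C) @ expm_series(D))
```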
From Theorem 15.2, we immediately get the following property.
This leads to (15.28). Notice here that \(\exp 0 = E\) from (15.7). We further have important properties of exponential functions of matrices.
(2) \(P^{-1} (\exp A) P = \exp \left( P^{-1} A P \right)\). (15.30)
(3) \(\exp A^T = (\exp A)^T\). (15.31)
(4) \(\exp A^{\dagger} = (\exp A)^{\dagger}\). (15.32)
The confirmation of (15.31) and (15.32) is left to readers. For future purposes, we describe another important theorem and further properties of the exponential functions of matrices.
Theorem 15.3 Let \(a_1, a_2, \cdots, a_n\) be the eigenvalues of a (n, n) square matrix A. Then, \(\exp a_1, \exp a_2, \cdots, \exp a_n\) are the eigenvalues of exp A.
Proof From Theorem 12.1 we know that every (n, n) square matrix can be
converted to a triangle matrix by similarity transformation. Also, we know that
eigenvalues of a triangle matrix are given by its diagonal elements (Sect. 12.1).
Let P be a non-singular matrix used for such a similarity transformation. Then,
we have
\[
P^{-1} A P = \begin{pmatrix} a_1 & & * \\ & \ddots & \\ 0 & & a_n \end{pmatrix}. \tag{15.36}
\]

Taking powers of both sides, we have

\[
\left( P^{-1} A P \right)^m
= \begin{pmatrix} a_1 & & * \\ & \ddots & \\ 0 & & a_n \end{pmatrix}^m
= \begin{pmatrix} a_1^{\,m} & & * \\ & \ddots & \\ 0 & & a_n^{\,m} \end{pmatrix} \tag{15.37}
\]

and

\[
\frac{1}{m!} P^{-1} A^m P
= \frac{1}{m!} \left( P^{-1} A P \right)^m
= \frac{1}{m!} \begin{pmatrix} a_1^{\,m} & & * \\ & \ddots & \\ 0 & & a_n^{\,m} \end{pmatrix}, \tag{15.39}
\]

where the last equality is due to (15.33). Summing (15.39) over m following (15.34), we obtain another triangle matrix expressed as

\[
P^{-1} (\exp A) P = \begin{pmatrix} \exp a_1 & & * \\ & \ddots & \\ 0 & & \exp a_n \end{pmatrix}. \tag{15.40}
\]
Once again considering that eigenvalues of a triangle matrix are given by its
diagonal elements and that the eigenvalues are invariant under a similarity transfor-
mation, (15.40) implies that exp a1, exp a2, , exp an are eigenvalues of exp A.
These complete the proof. ∎
A question of whether the eigenvalues are a proper eigenvalue or a generalized
eigenvalue is irrelevant to Theorem 15.3. In other words, the eigenvalue may be a
proper one or a generalized one (Sect. 12.6). That depends solely upon the nature of
A in question.
Theorem 15.3 immediately leads to the next important relation. Taking a determinant of (15.40), we get

\[
\det \left[ P^{-1} (\exp A) P \right]
= \det P^{-1} \det(\exp A) \det P = \det(\exp A)
= (\exp a_1)(\exp a_2) \cdots (\exp a_n)
= \exp(a_1 + a_2 + \cdots + a_n)
= \exp(\mathrm{Tr}\, A). \tag{15.41}
\]
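The relation det(exp A) = exp(Tr A) is easy to verify numerically. A sketch assuming NumPy is available (`expm_series` is our own truncated-series helper):

```python
import numpy as np

def expm_series(A, terms=80):
    """Truncated exponential series for a square matrix, as in (15.7)."""
    result, power = np.eye(A.shape[0]), np.eye(A.shape[0])
    for nu in range(1, terms):
        power = power @ A / nu
        result = result + power
    return result

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4)) * 0.5       # keep the norm modest

# (15.41): det(exp A) = exp(Tr A)
assert np.allclose(np.linalg.det(expm_series(A)), np.exp(np.trace(A)))
```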
So far, we have examined general aspects of the functions of matrices; we have not yet studied the relationship between the exponential functions of matrices and specific types of matrices. Regarding the applications of exponential functions of matrices to mathematical physics, especially the transformations of functions and vectors, we frequently encounter orthogonal matrices and unitary matrices. For this purpose, we wish to examine the following properties.
(6) Let A be a real skew-symmetric matrix. Then, exp A is a real orthogonal matrix.
(7) Let A be an anti-Hermitian matrix. Then, exp A is a unitary matrix.
To show (6), we note that a real skew-symmetric matrix is described as

$$A^T = -A \quad\text{or}\quad A^T + A = 0. \tag{15.42}$$

Since A and A^T = −A are commutative, we have

$$(\exp A)(\exp A)^T = (\exp A)\left(\exp A^T\right) = \exp\left(A + A^T\right) = \exp 0 = E, \tag{15.43}$$

where the first equality comes from Property (3) of (15.31). Equation (15.43) implies that exp A is a real orthogonal matrix.
Similarly, since an anti-Hermitian matrix A is expressed as

$$A^{\dagger} = -A \quad\text{or}\quad A^{\dagger} + A = 0, \tag{15.44}$$

we have

$$(\exp A)(\exp A)^{\dagger} = (\exp A)\left(\exp A^{\dagger}\right) = \exp\left(A + A^{\dagger}\right) = \exp 0 = E, \tag{15.45}$$

where the first equality comes from Property (4) of (15.32). This implies that exp A is a unitary matrix.
Regarding Properties (6) and (7), if A is replaced with tA (t : real), the relations (15.42) through (15.45) remain unchanged. Then, Properties (6) and (7) are rewritten as
(6)′ Let A be a real skew-symmetric matrix. Then, exp tA (t : real) is a real orthogonal matrix.
(7)′ Let A be an anti-Hermitian matrix. Then, exp tA (t : real) is a unitary matrix.
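Properties (6) and (7) can likewise be verified numerically. In the following sketch (our own illustration; the helper `expm_series` and the randomly generated matrices are our assumptions, not the book's), a real skew-symmetric matrix and an anti-Hermitian matrix are generated from arbitrary matrices via A = M − Mᵀ and A = C − C†:

```python
import numpy as np

def expm_series(A, terms=60):
    """exp A via the defining power series (15.7)."""
    out = np.eye(A.shape[0], dtype=complex)
    term = out.copy()
    for k in range(1, terms):
        term = term @ A / k
        out = out + term
    return out

rng = np.random.default_rng(1)
M = rng.standard_normal((3, 3))

A_skew = M - M.T                          # real skew-symmetric: A^T = -A
Q = expm_series(A_skew).real              # series of a real matrix is real
assert np.allclose(Q @ Q.T, np.eye(3))    # Property (6): exp A is orthogonal

C = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
A_anti = C - C.conj().T                   # anti-Hermitian: A† = -A
U = expm_series(A_anti)
assert np.allclose(U @ U.conj().T, np.eye(3))  # Property (7): exp A is unitary
```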
Now, let a function F(t) be expressed as F(t) = exp tx with real numbers t and x. Then, we have the well-known formula

$$\frac{d}{dt}F(t) = xF(t). \tag{15.46}$$
Next, we extend (15.46) to the exponential functions of matrices. Let us define the
differentiation of an exponential function of a matrix with respect to a real parameter
t. Then, in concert with (15.46), we have the following important theorem.

Theorem 15.4 [1, 2] Let F(t) ≡ exp tA with t being a real number and A being a matrix that does not depend on t. Then, we have

$$\frac{d}{dt}F(t) = AF(t). \tag{15.47}$$
Proof Since tA and ΔtA are commutative, from Theorem 15.2 we have

$$F(t + \Delta t) = \exp\left[(t + \Delta t)A\right] = \exp(\Delta t\,A)\exp(tA) = F(\Delta t)F(t). \tag{15.48}$$

Therefore, we get

$$\frac{1}{\Delta t}\left[F(t + \Delta t) - F(t)\right] = \frac{1}{\Delta t}\left[F(\Delta t) - E\right]F(t) = \frac{1}{\Delta t}\left[\sum_{\nu=0}^{\infty}\frac{1}{\nu!}(\Delta t\,A)^{\nu} - E\right]F(t) = \left[\sum_{\nu=1}^{\infty}\frac{1}{\nu!}(\Delta t)^{\nu-1}A^{\nu}\right]F(t). \tag{15.49}$$

Notice that in (15.49) only the first term (i.e., ν = 1) of the RHS is nonvanishing as Δt → 0. Taking that limit, we obtain

$$\lim_{\Delta t \to 0}\frac{1}{\Delta t}\left[F(t + \Delta t) - F(t)\right] \equiv \frac{d}{dt}F(t) = AF(t).$$

Then, (15.47) follows. This completes the proof. ∎
On the basis of the power series expansion of the exponential function of a matrix
defined in (15.7), A and F(t) are commutative. Therefore, instead of (15.47) we may
write
$$\frac{d}{dt}F(t) = F(t)A. \tag{15.50}$$
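Theorem 15.4 and (15.50) admit a simple finite-difference check. The sketch below is our own illustration (the matrix A, the evaluation point, and the step size are arbitrary choices); it compares a central-difference derivative of exp tA with both AF(t) and F(t)A:

```python
import numpy as np

def expm_series(A, terms=60):
    """exp A via the defining power series (15.7)."""
    out = np.eye(A.shape[0])
    term = out.copy()
    for k in range(1, terms):
        term = term @ A / k
        out = out + term
    return out

A = np.array([[0.3, -1.2], [0.7, 0.1]])
t, h = 0.8, 1e-6

F = lambda t: expm_series(t * A)
dF = (F(t + h) - F(t - h)) / (2 * h)     # central-difference derivative

# Theorem 15.4 and (15.50): F'(t) = A F(t) = F(t) A
assert np.allclose(dF, A @ F(t), atol=1e-8)
assert np.allclose(dF, F(t) @ A, atol=1e-8)
```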
Using Theorem 15.4, we show the following important properties. These are
converse propositions to Properties (6) and (7).
(8) Let exp tA be a real orthogonal matrix with any real number t. Then, A is a real
skew-symmetric matrix.
(9) Let exp tA be a unitary matrix with any real number t. Then, A is an anti-
Hermitian matrix.
To show (8), from the assumption we have, with any real number t,

$$\exp tA\,(\exp tA)^T = \exp tA\,\exp tA^T = E. \tag{15.51}$$

Differentiating (15.51) with respect to t by use of Theorem 15.4 and (15.50), we get

$$A\exp tA\,\exp tA^T + \exp tA\,\exp tA^T\,A^T = 0. \tag{15.52}$$

Since (15.52) must hold with any real number t, it must hold with t → 0 as well. On this condition, from (15.52) we get A + A^T = 0. That is, A is a real skew-symmetric matrix.
In the case of (9), similarly we have

$$\exp tA\,(\exp tA)^{\dagger} = \exp tA\,\exp tA^{\dagger} = E. \tag{15.53}$$

Differentiating (15.53) and taking t → 0 in the same way, we get A + A† = 0. That is, A is an anti-Hermitian matrix.
15.3 System of Differential Equations

15.3.1 Introduction
In Chap. 10 we have dealt with SOLDEs using Green’s functions. In the present
section we examine the properties of systems of differential equations. There is a
close relationship between the two topics. First we briefly outline it.
We describe an example of the system of differential equations. First, let us think
of the following general equations:

$$\dot{x}(t) = a(t)x(t) + b(t)y(t), \tag{15.54}$$
$$\dot{y}(t) = c(t)x(t) + d(t)y(t), \tag{15.55}$$

where x(t) and y(t) vary as functions of t; a(t), b(t), c(t), and d(t) are coefficients.
Differentiating (15.54) with respect to t, we get

$$\ddot{x} = \dot{a}x + a\dot{x} + \dot{b}y + b\dot{y} = (\dot{a} + bc)x + a\dot{x} + (\dot{b} + bd)y, \tag{15.56}$$

where with the second equality we used (15.55). To delete y(t) from (15.56), we multiply both sides of (15.56) by b(t) and both sides of (15.54) by $\dot{b}(t) + b(t)d(t)$ and subtract the latter equation from the former. As a result, we obtain

$$b\ddot{x} - \left[\dot{b} + (a + d)b\right]\dot{x} + \left[a\left(\dot{b} + bd\right) - b\left(\dot{a} + bc\right)\right]x = 0. \tag{15.57}$$
Meanwhile, if b(t) ≡ 0, (15.54) reduces to $\dot{x} = ax$, which is solved as

$$x = C\exp\left[\int^t a(t')dt'\right]. \tag{15.58}$$

In this case, differentiating (15.55) with respect to t, we get

$$\ddot{y} = \dot{c}x + c\dot{x} + \dot{d}y + d\dot{y} = \dot{c}x + cax + \dot{d}y + d\dot{y} = (\dot{c} + ca)\,C\exp\left[\int^t a(t')dt'\right] + \dot{d}y + d\dot{y},$$

where we used (15.58) as well as $\dot{x} = ax$. Thus, we are going to solve the following inhomogeneous SOLDE:

$$\ddot{y} - d\dot{y} - \dot{d}y = (\dot{c} + ca)\,C\exp\left[\int^t a(t')dt'\right]. \tag{15.59}$$
In Sect. 10.2 we have shown how we are able to solve an inhomogeneous FOLDE using a weight function. For a later purpose, let us consider the next FOLDE, assuming a(x) ≡ 1 in (10.23):

$$\dot{x}(t) + p(t)x(t) = q(t). \tag{15.60}$$
where we assume that the functional form u(t) remains unchanged in the inhomogeneous equation (15.60) and that instead the "constant" k(t) may change as a function of t. Inserting (15.63) into (15.60), we have

$$\dot{x} + px = \dot{k}u + k\dot{u} + pku = \dot{k}u + k(\dot{u} + pu) = \dot{k}u + k(-pu + pu) = \dot{k}u = q, \tag{15.64}$$

where with the third equality we used (15.61) and (15.62) to get $\dot{u} = -pu$. In this way, (15.64) can easily be integrated to give

$$k(t) = \int^t \frac{q(t')}{u(t')}dt' + C. \tag{15.65}$$
Comparing (15.65) with (10.29), we notice that these two expressions are related.
In other words, the method of variation of constants in (15.65) is essentially the same
as that using the weight function in (10.29).
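The method of variation of constants can also be traced numerically. The sketch below is our own illustration (the choices p = 2 and q(t) = cos t, the quadrature rule, and the comparison with the known closed-form solution are all ours, not the book's); it applies x = k(t)u(t) with k(t) from the integral formula to the concrete FOLDE ẋ + 2x = cos t:

```python
import numpy as np

# Variation of constants for x' + p x = q(t), with constant p = 2, q(t) = cos t.
p = 2.0
q = np.cos
u = lambda t: np.exp(-p * t)            # homogeneous solution: u' + p u = 0

def k(t, n=20001):
    # k(t) = ∫_0^t q(t')/u(t') dt' + C with C = 0 (trapezoidal rule)
    s = np.linspace(0.0, t, n)
    f = q(s) / u(s)
    return (s[1] - s[0]) * (f.sum() - 0.5 * (f[0] + f[-1]))

x = lambda t: k(t) * u(t)               # x = k(t) u(t)

# exact solution of x' + 2x = cos t with x(0) = 0, for comparison
exact = lambda t: (2 * np.cos(t) + np.sin(t)) / 5 - 0.4 * np.exp(-2 * t)
assert np.isclose(x(1.3), exact(1.3), atol=1e-6)
```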
Another important point to bear in mind is that in Chap. 10 we have solved
SOLDEs of constant coefficients using the method of Green’s functions. In that case,
we have separately estimated a homogeneous term and inhomogeneous term (i.e.,
surface term). In the present section we are going to seek a method corresponding to
that using the Green’s functions. In the above discussion we see that a system of
differential equations with two unknowns can be translated into a SOLDE. Then, we expect that such a system of differential equations with constant coefficients can be solved in a similar, systematic manner. We will hereinafter describe this in detail, giving examples.
The equations (15.54) and (15.55) are unified in a single homogeneous matrix equation such that

$$(\dot{x}(t)\ \dot{y}(t)) = (x(t)\ y(t))\begin{pmatrix} a(t) & c(t) \\ b(t) & d(t) \end{pmatrix}. \tag{15.66}$$
In (15.66), x(t) and y(t) and their derivatives represent functions (of t), and so we describe them as row matrices. Regarding this expression of the equation, see Sect. 11.2 and, e.g., (11.37) therein. Another expression is a transposition of (15.66), described by

$$\begin{pmatrix} \dot{x}(t) \\ \dot{y}(t) \end{pmatrix} = \begin{pmatrix} a(t) & b(t) \\ c(t) & d(t) \end{pmatrix}\begin{pmatrix} x(t) \\ y(t) \end{pmatrix}. \tag{15.67}$$
Notice that the coefficient matrix has been transposed in (15.67) accordingly. We are most interested in equations of constant coefficients and, hence, we rewrite (15.66) once again explicitly as

$$(\dot{x}(t)\ \dot{y}(t)) = (x(t)\ y(t))\begin{pmatrix} a & c \\ b & d \end{pmatrix} \tag{15.68}$$

or

$$\frac{d}{dt}(x(t)\ y(t)) = (x(t)\ y(t))\begin{pmatrix} a & c \\ b & d \end{pmatrix}. \tag{15.69}$$
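A minimal numerical check of the row-vector equation (15.69) can be made with a sample constant matrix. In the sketch below (our own illustration; the matrix A, the initial row vector, and the step size are arbitrary), exp tA is built from its defining power series and x(t) = x(0) exp tA is differentiated by central differences:

```python
import numpy as np

def expm_series(A, terms=60):
    """exp A via its defining power series."""
    out = np.eye(A.shape[0])
    term = out.copy()
    for k in range(1, terms):
        term = term @ A / k
        out = out + term
    return out

A = np.array([[1.0, 2.0], [2.0, 1.0]])
x0 = np.array([1.0, -0.5])                 # row vector (x(0) y(0))
x = lambda t: x0 @ expm_series(t * A)

t, h = 0.4, 1e-5
dx = (x(t + h) - x(t - h)) / (2 * h)       # central difference
assert np.allclose(dx, x(t) @ A, atol=1e-6)   # (15.69): d/dt (x y) = (x y) A
```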
Then, we get

$$\exp tA \equiv E + tA + \frac{1}{2!}(tA)^2 + \cdots + \frac{1}{\nu!}(tA)^{\nu} + \cdots. \tag{15.73}$$
Note that if A (or tA) is an (n, n) matrix, F(t) of (15.72) is an (n, n) matrix as well. Since in that general case F(t) is non-singular, it consists of n linearly independent column vectors, i.e., (n, 1) matrices; see Chap. 11. Then, F(t) is symbolically described as
$$(\dot{x}(t)\ \dot{y}(t)) = (x(t)\ y(t))\begin{pmatrix} a & c \\ b & d \end{pmatrix} + (p\ q). \tag{15.77}$$
Using the same solution F(t) that appears in the homogeneous equation, we
assume that the solution of the inhomogeneous equation is described by

$$x(t) = k(t)F(t), \tag{15.78}$$

where k(t) is a variable "constant." Replacing x(t) in (15.76) with x(t) = k(t)F(t) of (15.78), we obtain

$$\dot{x} = \dot{k}F + k\dot{F} = \dot{k}F + kFA = kFA + b, \tag{15.79}$$

where the last equality follows from (15.76). Comparing both sides, we get

$$\dot{k}F = b. \tag{15.80}$$
Here we define

$$k(t) \equiv \int_{t_0}^{t} ds\,b(s)F(s)^{-1} \tag{15.82}$$

and

$$\Re(s, t) \equiv F(s)^{-1}F(t). \tag{15.83}$$

The matrix ℜ(s, t) is said to be a resolvent matrix [6]. This matrix plays an essential role in solving the system of differential equations, both homogeneous and inhomogeneous, with either homogeneous or inhomogeneous boundary conditions (BCs). Rewriting (15.82), we get

$$x(t) = k(t)F(t) = \int_{t_0}^{t} ds\left[b(s)\Re(s, t)\right]. \tag{15.84}$$
(3) $\Re(s, t)\Re(t, u) = F(s)^{-1}F(t)F(t)^{-1}F(u) = F(s)^{-1}F(u) = \Re(s, u)$. (15.87)
$$u(a) = \sigma, \tag{15.88}$$

where u(t) is a solution of a FOLDE. In that case, the BC can be translated into the "initial" condition. In the present case, we describe the BCs as

$$x(t_0) = (x(t_0)\ y(t_0)) = (\sigma\ \tau), \tag{15.89}$$

where σ ≡ x(t₀) and τ ≡ y(t₀). Then, the said term associated with the inhomogeneous BCs is expected to be written as

$$x(t_0)\Re(t_0, t).$$
In fact, since $x(t_0)\Re(t_0, t_0) = x(t_0)E = x(t_0)$, this satisfies the BCs. Then, the full expression of the solution of the inhomogeneous equation that takes account of the BCs is assumed to be expressed as

$$x(t) = x(t_0)\Re(t_0, t) + \int_{t_0}^{t} ds\left[b(s)\Re(s, t)\right]. \tag{15.90}$$
Let us confirm that (15.90) certainly gives a proper solution of (15.76). First we give a formula on the differentiation of

$$Q(t) \equiv \frac{d}{dt}\int_{t_0}^{t} P(t, s)\,ds, \tag{15.91}$$
where P(t, s) stands for a (1, 2) matrix whose general form is described by (q(t, s)
r(t, s)), in which q(t, s) and r(t, s) are functions of t and s. We rewrite (15.91) as
$$\begin{aligned}
Q(t) &= \lim_{\Delta\to 0}\frac{1}{\Delta}\left[\int_{t_0}^{t+\Delta} P(t+\Delta, s)\,ds - \int_{t_0}^{t} P(t, s)\,ds\right] \\
&= \lim_{\Delta\to 0}\frac{1}{\Delta}\left[\int_{t_0}^{t+\Delta} P(t+\Delta, s)\,ds - \int_{t_0}^{t} P(t+\Delta, s)\,ds + \int_{t_0}^{t} P(t+\Delta, s)\,ds - \int_{t_0}^{t} P(t, s)\,ds\right] \\
&= \lim_{\Delta\to 0}\frac{1}{\Delta}\int_{t}^{t+\Delta} P(t+\Delta, s)\,ds + \lim_{\Delta\to 0}\frac{1}{\Delta}\left[\int_{t_0}^{t} P(t+\Delta, s)\,ds - \int_{t_0}^{t} P(t, s)\,ds\right] \\
&= \lim_{\Delta\to 0}\frac{1}{\Delta}P(t, t)\Delta + \lim_{\Delta\to 0}\frac{1}{\Delta}\left[\int_{t_0}^{t} P(t+\Delta, s)\,ds - \int_{t_0}^{t} P(t, s)\,ds\right] \\
&= P(t, t) + \int_{t_0}^{t}\frac{\partial P(t, s)}{\partial t}\,ds, \tag{15.92}
\end{aligned}$$
where with the last equality we used (15.85). Considering (15.83) and (15.93) and differentiating (15.90) with respect to t, we get

$$\begin{aligned}
\dot{x}(t) &= x(t_0)F(t_0)^{-1}\dot{F}(t) + \int_{t_0}^{t} ds\left[b(s)F(s)^{-1}\dot{F}(t)\right] + b(t) \\
&= x(t_0)F(t_0)^{-1}F(t)A + \int_{t_0}^{t} ds\left[b(s)F(s)^{-1}F(t)A\right] + b(t) \\
&= x(t_0)\Re(t_0, t)A + \int_{t_0}^{t} ds\left[b(s)\Re(s, t)\right]A + b(t) \\
&= \left\{x(t_0)\Re(t_0, t) + \int_{t_0}^{t} ds\left[b(s)\Re(s, t)\right]\right\}A + b(t) = x(t)A + b(t),
\end{aligned}$$

where with the second equality we used (15.70) and with the last equality we used (15.90). Thus, we have certainly recovered the original differential equation of (15.76). Consequently, (15.90) is the proper solution for the given inhomogeneous equation with inhomogeneous BCs.
The above discussion and formulation equally apply to the general case of the
differential equation with n unknowns, even though the calculation procedures
become increasingly complicated with the increasing number of unknowns.
As discussed in Sect. 15.3.2, this can readily be solved to produce a solution F(t)
such that
$$(\dot{x}(t)\ \dot{y}(t)) = (x\ y)\,UU^{\dagger}\begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}UU^{\dagger}, \tag{15.99}$$
Or we have

$$A = UDU^{-1}. \tag{15.101}$$

Hence, we get

$$F(t) = \exp tA = \exp\left(tUDU^{-1}\right) = U(\exp tD)U^{-1}, \tag{15.102}$$

where with the last equality we used (15.30). From Theorem 15.3, we have

$$\exp tD = \begin{pmatrix} e^{-t} & 0 \\ 0 & e^{3t} \end{pmatrix}. \tag{15.103}$$
In turn, we get

$$F(t) = \begin{pmatrix} \dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} \\ -\dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} \end{pmatrix}\begin{pmatrix} e^{-t} & 0 \\ 0 & e^{3t} \end{pmatrix}\begin{pmatrix} \dfrac{1}{\sqrt{2}} & -\dfrac{1}{\sqrt{2}} \\ \dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} \end{pmatrix} = \frac{1}{2}\begin{pmatrix} e^{-t} + e^{3t} & -e^{-t} + e^{3t} \\ -e^{-t} + e^{3t} & e^{-t} + e^{3t} \end{pmatrix}. \tag{15.104}$$
where x(0) = (x(0) y(0)) = (c d) represents the BCs and b(s) = (1 2) comes from (15.96). This can easily be calculated so that we have

$$x(0)\Re(0, t) = \left(\frac{1}{2}\left[(c - d)e^{-t} + (c + d)e^{3t}\right]\ \ \frac{1}{2}\left[(d - c)e^{-t} + (c + d)e^{3t}\right]\right) \tag{15.108}$$

and

$$\int_0^t ds\left[b(s)\Re(s, t)\right] = \left(\frac{1}{2}\left[e^{-t} + e^{3t}\right] - 1\ \ \ \frac{1}{2}\left[-e^{-t} + e^{3t}\right]\right). \tag{15.109}$$
Note that the first component of (15.108) and (15.109) represents the
x component of the solution for (15.95) and that the second component represents
the y component. From (15.108), putting t = 0, we have

$$x(0)\Re(0, 0) = (c\ \ d). \tag{15.110}$$

Thus, we find that the BCs are certainly satisfied. Putting t = 0 in (15.109), we have
$$\int_0^0 ds\left[b(s)\Re(s, 0)\right] = 0. \tag{15.111}$$
Summing (15.108) and (15.109), we obtain

$$x(t) = \frac{1}{2}\left[(c - d + 1)e^{-t} + (c + d + 1)e^{3t} - 2\right],$$
$$y(t) = \frac{1}{2}\left[(d - c - 1)e^{-t} + (c + d + 1)e^{3t}\right]. \tag{15.112}$$
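The closed-form solution (15.112) can be verified directly against the system ẋ = x + 2y + 1, ẏ = 2x + y + 2 implied by (15.96). The sketch below is our own check (the numerical values of c and d and the finite-difference step are arbitrary); it confirms both the BCs and the differential equations:

```python
import numpy as np

c, d = 0.7, -0.3    # BCs x(0) = c, y(0) = d

# closed-form solution (15.112)
x = lambda t: 0.5 * ((c - d + 1) * np.exp(-t) + (c + d + 1) * np.exp(3 * t) - 2)
y = lambda t: 0.5 * ((d - c - 1) * np.exp(-t) + (c + d + 1) * np.exp(3 * t))

assert np.isclose(x(0.0), c) and np.isclose(y(0.0), d)   # BCs recovered

t, h = 0.5, 1e-6
dx = (x(t + h) - x(t - h)) / (2 * h)
dy = (y(t + h) - y(t - h)) / (2 * h)
# the system of (15.96): ẋ = x + 2y + 1, ẏ = 2x + y + 2
assert np.isclose(dx, x(t) + 2 * y(t) + 1, atol=1e-6)
assert np.isclose(dy, 2 * x(t) + y(t) + 2, atol=1e-6)
```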
(i) With the constant matrix A, we have

$$\exp tA = E + tA + \frac{1}{2!}t^2A^2 + \cdots + \frac{1}{\nu!}t^{\nu}A^{\nu} + \cdots, \tag{15.113}$$

$$\exp(-sA) = E + (-s)A + \frac{1}{2!}(-s)^2A^2 + \cdots + \frac{1}{\nu!}(-s)^{\nu}A^{\nu} + \cdots. \tag{15.114}$$
Since tA and −sA are commutative, we have

$$\exp(tA - sA) = (\exp tA)\left[\exp(-sA)\right] = \left[\exp(-sA)\right](\exp tA) = \exp(-sA + tA) = \exp\left[(t - s)A\right]. \tag{15.115}$$

Accordingly,

$$\Re(s, t) = F(s)^{-1}F(t) = \left[\exp(sA)\right]^{-1}\exp(tA) = \exp(-sA)\exp(tA) = \exp\left[(t - s)A\right] = F(t - s). \tag{15.116}$$
This implies that once we get exp tA, we can safely obtain the resolvent matrix simply by replacing t of (15.72) with t − s. Also, exp tA and exp(−sA) are commutative. In the present case, therefore, by exchanging the order of the products of F(s)⁻¹ and F(t) in (15.106), we have the same result as (15.106) such that

$$\Re(s, t) = F(t)F(s)^{-1} = \frac{1}{2}\begin{pmatrix} e^{-(t-s)} + e^{3(t-s)} & -e^{-(t-s)} + e^{3(t-s)} \\ -e^{-(t-s)} + e^{3(t-s)} & e^{-(t-s)} + e^{3(t-s)} \end{pmatrix}. \tag{15.117}$$
(ii) The resolvent matrix of the present example is real symmetric. This is because if A is real symmetric, i.e., A^T = A, we have

$$\left(A^n\right)^T = \left(A^T\right)^n = A^n, \tag{15.118}$$
(iii) Let us think of a case where A is not a constant matrix but varies as a function of t. In that case, we have

$$A(t) = \begin{pmatrix} a(t) & b(t) \\ c(t) & d(t) \end{pmatrix}. \tag{15.119}$$

Also, we have

$$A(s) = \begin{pmatrix} a(s) & b(s) \\ c(s) & d(s) \end{pmatrix}; \tag{15.120}$$
namely, A(t) and A(s) are not generally commutative. The matrix tA(t) is not commutative with sA(s), either. In turn, (15.115) does not hold, either. Thus, we need a more elaborate treatment in this case.
Example 15.2 Find the resolvent matrix of the following homogeneous equation:
$$\dot{x}(t) = 0, \quad \dot{y}(t) = x + y. \tag{15.122}$$
Or we have

$$A = PDP^{-1}. \tag{15.126}$$

Hence, we get

$$F(t) = \exp tA = \exp\left(tPDP^{-1}\right) = P(\exp tD)P^{-1}. \tag{15.127}$$

In turn, we get

$$F(t) = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & e^t \end{pmatrix}\begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & -1 + e^t \\ 0 & e^t \end{pmatrix}. \tag{15.129}$$
Using the resolvent matrix, we can include the inhomogeneous term along with the BCs (or initial conditions). Once again, by exchanging the order of the products of F(s)⁻¹ and F(t) in (15.131), we have the same resolvent matrix; this can readily be checked. Moreover, we get ℜ(s, t) = F(t − s).
Example 15.3 Solve the following equation
$$\dot{x}(t) = x + c, \quad \dot{y}(t) = x + y + d, \tag{15.132}$$

under the BCs x(0) = σ and y(0) = τ. In matrix form, (15.132) can be written as

$$(\dot{x}(t)\ \dot{y}(t)) = (x\ y)\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} + (c\ d). \tag{15.133}$$
The matrix of the type $M = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ has been fully investigated in Sect. 12.6 and characterized as a matrix that cannot be diagonalized. In such a case, we directly calculate exp tM. Here, remember that a product of triangle matrices of the same kind (i.e., upper triangle or lower triangle; see Sect. 12.1) is a triangle matrix of the same kind.
We have
$$M^2 = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}, \quad M^3 = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 3 \\ 0 & 1 \end{pmatrix}, \ \cdots.$$
Thus, we have

$$\begin{aligned}
\exp tM = \Re(0, t) &= E + tM + \frac{1}{2!}t^2M^2 + \cdots + \frac{1}{\nu!}t^{\nu}M^{\nu} + \cdots \\
&= \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + t\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} + \frac{1}{2!}t^2\begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix} + \cdots + \frac{1}{\nu!}t^{\nu}\begin{pmatrix} 1 & \nu \\ 0 & 1 \end{pmatrix} + \cdots \\
&= \begin{pmatrix} e^t & t\left[1 + t + \dfrac{1}{2!}t^2 + \cdots + \dfrac{1}{(\nu - 1)!}t^{\nu - 1} + \cdots\right] \\ 0 & e^t \end{pmatrix} = \begin{pmatrix} e^t & te^t \\ 0 & e^t \end{pmatrix}. \tag{15.135}
\end{aligned}$$
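The closed form (15.135) agrees with the power series even though M cannot be diagonalized. A quick numerical check (our own sketch, with a series-based helper in place of a library routine):

```python
import numpy as np

def expm_series(M, terms=60):
    """exp M via its defining power series."""
    out = np.eye(M.shape[0])
    term = out.copy()
    for k in range(1, terms):
        term = term @ M / k
        out = out + term
    return out

t = 0.9
M = np.array([[1.0, 1.0], [0.0, 1.0]])    # the non-diagonalizable matrix
expected = np.array([[np.exp(t), t * np.exp(t)],
                     [0.0,       np.exp(t)]])   # closed form (15.135)
assert np.allclose(expm_series(t * M), expected)
```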
$$x(t) = (c + \sigma)e^t - c, \quad y(t) = (c + \sigma)te^t + (d - c + \tau)e^t + c - d. \tag{15.138}$$
From (15.138), we find that the original inhomogeneous equations (15.132) and
BCs of x(0) ¼ σ and y(0) ¼ τ have been recovered.
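Indeed, (15.138) can be checked mechanically. In the sketch below (our own illustration; the numerical values of σ, τ, c, and d are arbitrary), the BCs and both equations of (15.132) are verified by central differences:

```python
import numpy as np

sig, tau, c, d = 0.4, -0.2, 1.1, 0.5   # BCs x(0)=σ, y(0)=τ and constants c, d

# closed-form solution (15.138)
x = lambda t: (c + sig) * np.exp(t) - c
y = lambda t: (c + sig) * t * np.exp(t) + (d - c + tau) * np.exp(t) + c - d

assert np.isclose(x(0.0), sig) and np.isclose(y(0.0), tau)   # BCs recovered

t, h = 0.8, 1e-6
dx = (x(t + h) - x(t - h)) / (2 * h)
dy = (y(t + h) - y(t - h)) / (2 * h)
assert np.isclose(dx, x(t) + c, atol=1e-6)           # (15.132): ẋ = x + c
assert np.isclose(dy, x(t) + y(t) + d, atol=1e-6)    # (15.132): ẏ = x + y + d
```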
Example 15.4 Find the resolvent matrix of the following homogeneous equation:
$$\dot{x}(t) = 0, \quad \dot{y}(t) = x. \tag{15.139}$$
Using the resolvent matrix, we can include the inhomogeneous term along with
BCs.
Example 15.5 As a trivial case, let us consider a following homogeneous equation:
$$(\dot{x}(t)\ \dot{y}(t)) = (x\ y)\begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix}, \tag{15.143}$$

where one of a and b can be zero, or both of them can be zero. Even though (15.143) merely juxtaposes two FOLDEs, we can deal with such cases in an automatic manner. From Theorem 15.3, as the eigenvalues of exp A with $A = \begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix}$, we have $e^a$ and $e^b$. That is, we have
$$F(t) = \exp tA = \begin{pmatrix} e^{at} & 0 \\ 0 & e^{bt} \end{pmatrix}. \tag{15.144}$$

We only have to perform routine calculations with a counterpart of x(t) [or y(t)] of (x(t) y(t)) using (15.107) to solve an inhomogeneous differential equation under the BCs.
Returning to Example 15.1, we had

$$(\dot{x}(t)\ \dot{y}(t)) = (x\ y)\begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix} + (1\ 2). \tag{15.96}$$
we have

$$(\dot{X}(t)\ \dot{Y}(t)) = (X\ Y)\begin{pmatrix} -1 & 0 \\ 0 & 3 \end{pmatrix} + (1\ 2)U. \tag{15.148}$$
To convert (X Y) to (x y), operating with U⁻¹ on both sides of (15.149) from the right, we obtain

$$X(t)U^{-1} = X(t_0)U^{-1}U\widetilde{\Re}(t_0, t)U^{-1} + \int_{t_0}^{t} ds\left[\widetilde{b}(s)U^{-1}U\widetilde{\Re}(s, t)\right]U^{-1}, \tag{15.151}$$

where in the first and second terms we inserted U⁻¹U = E. Then, we get

$$x(t) = x(t_0)\left[U\widetilde{\Re}(t_0, t)U^{-1}\right] + \int_{t_0}^{t} ds\,b(s)\left[U\widetilde{\Re}(s, t)U^{-1}\right]. \tag{15.152}$$
We calculate $U\widetilde{\Re}(s, t)U^{-1}$ to have

$$U\widetilde{\Re}(s, t)U^{-1} = \frac{1}{2}\begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}\begin{pmatrix} e^{-(t-s)} & 0 \\ 0 & e^{3(t-s)} \end{pmatrix}\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix} = \frac{1}{2}\begin{pmatrix} e^{-(t-s)} + e^{3(t-s)} & -e^{-(t-s)} + e^{3(t-s)} \\ -e^{-(t-s)} + e^{3(t-s)} & e^{-(t-s)} + e^{3(t-s)} \end{pmatrix} = \Re(s, t). \tag{15.153}$$
The calculation procedures will be seen in Chap. 20, and so we only show the result here. That is, we have

$$\exp t\omega D = \begin{pmatrix} \cos\omega t & \sin\omega t \\ -\sin\omega t & \cos\omega t \end{pmatrix}. \tag{15.157}$$
$$x = \frac{a}{\omega}\sin\omega t, \quad y = \frac{a}{\omega}(\cos\omega t - 1). \tag{15.159}$$
In other words, the point mass performs a circular motion, as shown in Fig. 15.2. In Example 15.1 we mentioned that if the matrix A that defines the system of differential equations varies as a function of t, then the solution method based upon exp tA does not work. Yet, we have an important case where A is not a constant matrix. Let us consider the following illustration.
In Sect. 4.5 we dealt with the motion of an electron under circularly polarized light. Here we consider the motion of a charged particle in a linearly polarized electromagnetic wave. Suppose that a charged particle is placed in vacuum. Suppose also that an electromagnetic wave (light) linearly polarized along the x-direction is propagated in the positive direction of the z-axis. Unlike the case of Sect. 4.5, we have to take account of the influence of both the electric field and the magnetic field components of the Lorentz force.
From (7.58) we have

$$E = E_0 e_1 e^{i(kz - \omega t)},$$

where e₁, e₂, and e₃ are the unit vectors of the x-, y-, and z-directions. (The latter two unit vectors appear just below.) If the extent of the motion of the charged particle around the origin is narrow enough compared to the wavelength of the electromagnetic field, we can ignore kz in the exponent (see Sect. 4.5). Then, we have

$$E \approx E_0 e_1 e^{-i\omega t}. \tag{15.162}$$
$$F = eE + e\dot{x}\times B, \tag{15.167}$$

where x (= xe₁ + ye₂ + ze₃) is the position vector of the charged particle having a charge e. In (15.167) the first term represents the electric Lorentz force and the second term the magnetic Lorentz force; see Sect. 4.5. Replacing E and B in (15.167) with those of (15.163) and (15.166), we get

$$F = \left(1 - \frac{\dot{z}}{c}\right)eE_0 e_1\cos\omega t + \frac{eE_0}{c}\dot{x}\,e_3\cos\omega t = m\ddot{x} = m\ddot{x}e_1 + m\ddot{y}e_2 + m\ddot{z}e_3. \tag{15.168}$$
Note that the Lorentz force is not exerted in the direction of the y-axis. Putting a ≡ eE₀/m and b ≡ eE₀/mc, we get
$$\frac{d}{dt}\begin{pmatrix} \cos f(t) & \sin f(t) \\ -\sin f(t) & \cos f(t) \end{pmatrix} = \begin{pmatrix} \cos f(t) & \sin f(t) \\ -\sin f(t) & \cos f(t) \end{pmatrix}\begin{pmatrix} 0 & \dot{f}(t) \\ -\dot{f}(t) & 0 \end{pmatrix}. \tag{15.174}$$
Closely inspecting (15.173) and (15.174), we find that if we can decide f(t) so that f(t) may satisfy

$$\dot{f}(t) = b\cos\omega t, \tag{15.175}$$

we might well be able to solve (15.172). In that case, moreover, we expect the two linearly independent column vectors

$$\begin{pmatrix} \cos f(t) \\ -\sin f(t) \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} \sin f(t) \\ \cos f(t) \end{pmatrix} \tag{15.176}$$

to give a fundamental set of solutions of (15.172). A function f(t) that satisfies (15.175) can be given by

$$f(t) = (b/\omega)\sin\omega t = \widetilde{b}\sin\omega t, \tag{15.177}$$

where $\widetilde{b} \equiv b/\omega$. Notice that at t = 0 the two column vectors of (15.176) give

$$\begin{pmatrix} 1 \\ 0 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 0 \\ 1 \end{pmatrix},$$

if we choose $f(t) = \widetilde{b}\sin\omega t$ for f(t) in (15.176).
$$\frac{dF(t)}{dt} = F(t)A, \quad A \equiv \begin{pmatrix} 0 & b\cos\omega t \\ -b\cos\omega t & 0 \end{pmatrix}. \tag{15.179}$$
Equation (15.179) is formally the same as (15.47), even though F(t) is not given
in the form of exp tA. This enables us to apply the general scheme mentioned in Sect.
15.3.2 to the present case. That is, we are going to address the given problem using the method of variation of constants such that

$$x(t) = k(t)F(t),$$

where we define x(t) ≡ (ξ(t) ζ(t)) and k(t) ≡ (u(t) v(t)) as a variable "constant." Hence, using (15.80)–(15.90), we should be able to find the solution of (15.172) described by
$$x(t) = x(t_0)\Re(t_0, t) + \int_{t_0}^{t} ds\left[b(s)\Re(s, t)\right], \tag{15.90}$$

where b(s) ≡ (a cos ωs  0) is the inhomogeneous term in (15.172). The resolvent matrix ℜ(s, t) is expressed as

$$\Re(s, t) = F(s)^{-1}F(t) = \begin{pmatrix} \cos[f(t) - f(s)] & \sin[f(t) - f(s)] \\ -\sin[f(t) - f(s)] & \cos[f(t) - f(s)] \end{pmatrix}, \tag{15.180}$$
where f(t) is given by (15.177). For F(s)⁻¹, we simply take the transpose of (15.178), because it is an orthogonal matrix. Thus, we obtain

$$F(s)^{-1} = \begin{pmatrix} \cos\left(\widetilde{b}\sin\omega s\right) & -\sin\left(\widetilde{b}\sin\omega s\right) \\ \sin\left(\widetilde{b}\sin\omega s\right) & \cos\left(\widetilde{b}\sin\omega s\right) \end{pmatrix}.$$
Equation (15.178) represents a rotation matrix in ℝ²; see Sect. 11.2. Therefore, F(t) and F(s)⁻¹ are commutative. Notice, however, that we do not have a resolvent matrix of the simple form that appears in (15.116). In fact, from (15.178) we find ℜ(s, t) ≠ F(t − s). This is because the matrix A given in (15.179) is not constant but depends on t.
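These statements can be confirmed numerically. In the sketch below (our own illustration; the values of b, ω, s, t and the overall sign conventions for F(t) and A(t) are our consistent choices), the relation dF/dt = F(t)A(t), the rotation form of the resolvent, and the failure of ℜ(s, t) = F(t − s) are all checked:

```python
import numpy as np

b, w = 1.4, 2.0
f = lambda t: (b / w) * np.sin(w * t)       # f(t) = b~ sin ωt, b~ = b/ω

def F(t):
    c, s = np.cos(f(t)), np.sin(f(t))
    return np.array([[c, s], [-s, c]])      # rotation-type fundamental matrix

def A(t):
    # time-dependent coefficient matrix, cf. (15.179)
    return b * np.cos(w * t) * np.array([[0.0, 1.0], [-1.0, 0.0]])

t, h, s0 = 0.7, 1e-6, 0.3
dF = (F(t + h) - F(t - h)) / (2 * h)
assert np.allclose(dF, F(t) @ A(t), atol=1e-6)   # dF/dt = F(t) A(t)

# R(s,t) = F(s)^{-1} F(t) is a rotation by f(t) - f(s) ...
D = f(t) - f(s0)
R = np.linalg.inv(F(s0)) @ F(t)
assert np.allclose(R, [[np.cos(D), np.sin(D)], [-np.sin(D), np.cos(D)]])
# ... and, since A depends on t, R(s,t) is NOT of the form F(t - s):
assert not np.allclose(R, F(t - s0))
```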
Rewriting (15.90) as

$$x(t) = x(0)\Re(0, t) + \int_0^t ds\left[(a\cos\omega s\ \ 0)\Re(s, t)\right] \tag{15.181}$$
and setting the BCs as x(0) = (σ τ), with the individual components we finally obtain

$$\xi(t) = \sigma\cos\left(\widetilde{b}\sin\omega t\right) - \tau\sin\left(\widetilde{b}\sin\omega t\right) + \frac{a}{b}\sin\left(\widetilde{b}\sin\omega t\right),$$
$$\zeta(t) = \sigma\sin\left(\widetilde{b}\sin\omega t\right) + \tau\cos\left(\widetilde{b}\sin\omega t\right) - \frac{a}{b}\cos\left(\widetilde{b}\sin\omega t\right) + \frac{a}{b}. \tag{15.182}$$
Notice that the above BCs correspond to the case where the charged particle is initially placed (at the origin) at rest (i.e., with zero velocity), namely σ = τ = 0. Deleting the argument t, in that case we get

$$\xi^2 + \left(\zeta - \frac{a}{b}\right)^2 = \left(\frac{a}{b}\right)^2. \tag{15.184}$$
Figure 15.3 depicts the particle velocities ξ and ζ in the x- and z-directions, respectively. It is interesting that although the velocity in the x-direction switches from negative to positive, that in the z-direction is kept non-negative. This implies that the particle oscillates along the x-direction but continually drifts in the z-direction while oscillating. In fact, if we integrate ζ(t), we obtain a continually drifting term $\frac{a}{b}t$.
Figure 15.4 shows the Lorentz force F as a function of time in the case where the charge of the particle is positive. Again, the electric Lorentz force (represented by E) switches from positive to negative in the x-direction, but the magnetic Lorentz force (represented by ẋ × B) remains non-negative in the z-direction all the time.
To precisely analyze the positional change of the particle, we need to integrate
(15.182) once again. This requires the detailed numerical calculation. For this
purpose the following relation is useful [7]:
Fig. 15.3 Velocities ξ and ζ of a charged particle in the x- and z-directions, respectively.
The charged particle exerts a periodic motion under the influence of a linearly polarized
electromagnetic wave
Fig. 15.4 Lorentz force as a function of time. In (a, b), the phase of the electromagnetic fields E and B is reversed. As a result, the electric Lorentz force (represented by E) switches from positive to negative in the x-direction, but the magnetic Lorentz force (represented by ẋ × B) remains non-negative in the z-direction all the time
$$\cos(x\sin t) = \sum_{m=-\infty}^{\infty} J_m(x)\cos mt, \qquad \sin(x\sin t) = \sum_{m=-\infty}^{\infty} J_m(x)\sin mt, \tag{15.185}$$
where the functions J_m(x) are called the Bessel functions; they satisfy the following differential equation:

$$\frac{d^2 J_n(x)}{dx^2} + \frac{1}{x}\frac{dJ_n(x)}{dx} + \left(1 - \frac{n^2}{x^2}\right)J_n(x) = 0. \tag{15.186}$$
Although we do not examine the properties of the Bessel functions in this book, the related topics are dealt with in detail in the literature [5, 7–9]. The Bessel functions are widely investigated as one of the special functions in many branches of mathematical physics.
To conclude this section, we have described an example in which the matrix A of (15.179) that characterizes the system of differential equations is not a constant matrix but depends on the parameter t. Unlike the case of a constant matrix A, it may well be difficult to find a fundamental set of solutions. Once we find a fundamental set of solutions, however, it is always possible to construct the resolvent matrix and solve the problem using it. In this respect, we have already encountered a similar situation in Chap. 10, where the Green's functions were constructed from a fundamental set of solutions.
In this section, a fundamental set of solutions could be found and, as a consequence, we could construct the resolvent matrix. It is, however, not always the case that we can recast the system of differential equations (15.66) in the form of (15.179). Nonetheless, whenever we succeed in finding a fundamental set of solutions, we can construct a resolvent matrix and, hence, solve the problem. This is a generic and powerful tool.
References
1. Satake I-O (1975) Linear algebra (Pure and applied mathematics). Marcel Dekker, New York
2. Satake I (1974) Linear algebra. Shokabo, Tokyo. (in Japanese)
3. Rudin W (1987) Real and complex analysis, 3rd edn. McGraw-Hill, New York
4. Yamanouchi T, Sugiura M (1960) Introduction to continuous groups. Baifukan, Tokyo.
(in Japanese)
5. Dennery P, Krzywicki A (1996) Mathematics for physicists. Dover, New York
6. Inami T (1998) Ordinary differential equations. Iwanami, Tokyo. (in Japanese)
7. Riley KF, Hobson MP, Bence SJ (2006) Mathematical methods for physics and engineering, 3rd
edn. Cambridge University Press, Cambridge
8. Arfken GB, Weber HJ, Harris FE (2013) Mathematical methods for physicists, 7th edn. Academic Press, Waltham
9. Hassani S (2006) Mathematical physics. Springer, New York
Part IV
Group Theory and Its Chemical
Applications
The universe comprises space and matter. These two mutually stipulate their modes of existence. We often comprehend various related aspects as manifestations of symmetry. In this part we deal with symmetry from the point of view of group theory. In this last part of the book, we outline and emphasize chemical applications of the methods of mathematical physics. This part supplies us with an introductory description of group theory, which forms an important field of both pure and applied mathematics. Starting with the definition of groups, we cover a variety of topics related to group theory. Of these, symmetry groups are familiar to chemists, because chemists deal with a variety of matter and molecules that are characterized by different types of symmetry. A symmetry group is a kind of finite group, also called a point group. Meanwhile, we have various infinite groups that include the rotation group as a typical example. We also present an introductory theory of the rotation group SO(3) that deals with important topics such as the Euler angles. We also treat successive coordinate transformations.
Next, we describe the representation theory of groups. Schur's lemmas and the related grand orthogonality theorem underlie the representation theory of groups. In parallel, characters and irreducible representations are important concepts that support the representation theory. We present various representations, e.g., the regular representation, direct-product representations, and symmetric and antisymmetric representations. These have wide applications in the fields of quantum mechanics, quantum chemistry, and so forth. On the basis of the above topics, we highlight quantum chemical applications of group theory in relation to the method of molecular orbitals. As tangible examples, we adopt aromatic molecules and methane.
The last part deals with the theory of continuous groups. The relevant theory has wide applications in many fields of both pure and applied physics and chemistry. We highlight the topics of SU(2) and SO(3) that very often appear there. Tangible examples help us understand the essence.
Chapter 16
Introductory Group Theory
In contrast to a broad range of applications, the definition of the group is simple. Let
ℊ be a set of elements gν, where ν is an index either countable (e.g., integers) or
uncountable (e.g., real numbers) and the number of elements may be finite or
infinite. We denote this by ℊ = {gν}. If a group is a finite group, we express it as

ℊ = {g₁, g₂, ⋯, gₙ}, (16.1)

where n is called the order of the group. The group operation (referred to as "multiplication" below) is denoted by the symbol ⋄. Note that the symbol ⋄ may imply an ordinary multiplication, an ordinary addition, etc.
(A1) If a and b are any elements of the set ℊ, then so is a ⋄ b. (We sometimes say
that the set is “closed” regarding the multiplication.)
(A2) Multiplication is associative; i.e., a ⋄ (b ⋄ c) ¼ (a ⋄ b) ⋄ c.
(A3) The set ℊ contains an element e called the identity element such that we have a ⋄ e = e ⋄ a = a with any element a of ℊ.
(A4) For any a of ℊ, we have an element b such that a ⋄ b ¼ b ⋄ a ¼ e. The element
b is said to be the inverse element of a. We denote b a1.
In the above definitions, we assume that the commutative law does not necessar-
ily hold, that is, a ⋄ b 6¼ b ⋄ a. In that case the group ℊ is said to be a
noncommutative group. However, we have a case where the commutative law
holds, i.e., a ⋄ b ¼ b ⋄ a. If so, the group ℊ is called a commutative group or an
Abelian group.
Let us think of some examples of groups. Henceforth, we follow the convention
and write ab to express a ⋄ b.
Example 16.1 We present several examples of groups below. Examples (i)–(iv) are
simple, but Example (v) is general.
(i) ℊ ¼ {1, 1}. The group ℊ makes a group with respect to the multiplication.
This is an example of a finite group.
(ii) ℊ = {⋯, −3, −2, −1, 0, 1, 2, 3, ⋯}. The set ℊ makes a group with respect to addition. This is an infinite group. For instance, take a (>0) and make a + 1, then (a + 1) + 1, then [(a + 1) + 1] + 1, again and again. Thus, no finite set could be closed under addition and, hence, we must have an infinite group.
(iii) Let us start with the matrix $a = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$. Then, the inverse is $a^{-1} = a^3 = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$. We have $a^2 = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}$; its inverse is a² itself. These four elements make a group; that is, ℊ = {e, a, a², a³}. This is an example of a cyclic group.
(iv) ℊ ¼ {1}. It is a most trivial case, but sometimes the trivial case is very
important as well. We will come back later to this point.
(v) Let us think of a more general case. In Chap. 11, we discussed endomorphism
on a vector space and showed the necessary and sufficient condition for the
existence of an inverse transformation. In this relation, we consider a set that
comprises matrices such that
GL(n, ℂ) ≡ {A = (a_{ij}) | i, j = 1, 2, ⋯, n; a_{ij} ∈ ℂ, det A ≠ 0}.
This may be either a finite group or an infinite group. The former can be a symmetry
group and the latter can be a rotation group. This group is characterized by a set of
invertible and endomorphic linear transformations over a vector space Vn and called
a linear transformation group or a general linear group and denoted by GL(n, ℂ), GL
(Vn), GL(V ), etc. The relevant transformations are bijective. We can readily make
sure that axioms (A1) to (A4) are satisfied with GL(n, ℂ). Here, the vector space can
be ℂn or a function space.
The structure of a finite group is tabulated in a multiplication table. This is made
up such that group elements are arranged in a first row and a first column and that an
intersection of an element gi in the row and gj in the column is designated as a
product g_i ⋄ g_j. Choosing the above (iii) as an example, we make its multiplication table (see Table 16.1). There we define a² ≡ b and a³ ≡ c.
Having a look at Table 16.1, we notice that in the individual rows and columns
each group element appears once and only once. This is well known as a
rearrangement theorem.
Theorem 16.1: Rearrangement Theorem [1] In each row or each column in the
group multiplication table, individual group elements appear once and only once.
From this, each row and each column list merely rearranged group elements.
Proof Let a set ℊ = {g₁ ≡ e, g₂, ⋯, gₙ} be a group. Arbitrarily choosing any element h from ℊ and multiplying the individual elements by h, we obtain a set H ≡ {hg₁, hg₂, ⋯, hgₙ}. Then, all the group elements of ℊ appear in H once and only once. Choosing any group element g_i, let us multiply it by h⁻¹ to get h⁻¹g_i. Since h⁻¹g_i must be a certain element g_k of ℊ, we put h⁻¹g_i = g_k. Multiplying both sides by h, we have g_i = hg_k. Therefore, we are able to find this very element hg_k, i.e., g_i, in H. This implies that the element g_i necessarily appears in H. Suppose in turn that g_i appears more than once. Then, we must have g_i = hg_k = hg_l (k ≠ l). Multiplying the relation by h⁻¹, we would get h⁻¹g_i = g_k = g_l, in contradiction to the supposition. This means that g_i appears in H once and only once. This confirms that the theorem is true of each row of the group multiplication table.
the theorem is true of each row of the group multiplication table.
A similar argument applies with a set H′ ≡ {g₁h, g₂h, ⋯, gₙh}. This confirms in turn that the theorem is true of each column of the group multiplication table. This completes the proof. ∎
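The rearrangement theorem can be observed concretely on the cyclic group of Example 16.1 (iii). The sketch below (our own illustration) builds the multiplication table of {e, a, a², a³} with a = [[0, 1], [−1, 0]] and checks that every row and column is a permutation of the group elements:

```python
import numpy as np

# cyclic group of Example 16.1 (iii): e, a, a^2 (= b), a^3 (= c)
a = np.array([[0, 1], [-1, 0]])
elems = [np.linalg.matrix_power(a, k) for k in range(4)]

def index_of(m):
    # position of the matrix m within the element list
    return next(i for i, g in enumerate(elems) if np.array_equal(g, m))

# multiplication table: entry (i, j) is the index of elems[i] @ elems[j]
table = [[index_of(elems[i] @ elems[j]) for j in range(4)] for i in range(4)]

# rearrangement theorem: each row and each column lists every element once
for row in table:
    assert sorted(row) == [0, 1, 2, 3]
for col in zip(*table):
    assert sorted(list(col)) == [0, 1, 2, 3]
```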
624 16 Introductory Group Theory
16.2 Subgroups
ℊ = g1H + g2H + ⋯ + gkH,  (16.2)

where g1, g2, ⋯, gk are mutually different elements with g1 being the identity e. In that case, (16.2) is said to be the left coset decomposition of ℊ by H. Similarly, the right coset decomposition can be done to give

ℊ = H g1 + H g2 + ⋯ + H gk.  (16.3)
In general, however,

gkH ≠ H gk  or  gkH gk⁻¹ ≠ H.  (16.4)
Taking the case of the left coset as an example, let us examine whether different cosets mutually contain a common element. Suppose that giH and gjH mutually contain a common element. Then, that element would be expressed as gihp = gjhq (1 ≤ i, j ≤ k; 1 ≤ p, q ≤ s). Thus, we have gihphq⁻¹ = gj. Since H is a subgroup of ℊ, hphq⁻¹ ∈ H. This implies that gj ∈ giH, in contradiction to the definition of the left coset. Thus, we conclude that different cosets do not mutually contain a common element.
Suppose that the orders of ℊ and H are n and s, respectively. The k different cosets individually comprise s elements, and different cosets do not mutually possess a common element; hence, we must have

n = sk,  (16.5)
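The coset decomposition and the relation n = sk of (16.5) can be checked numerically on a small example; the sketch below (our illustration, not the book's) takes ℊ = S3 realized as permutation tuples and H = {e, one transposition}:

```python
from itertools import permutations

# Sketch: left coset decomposition of S3 by a subgroup of order s = 2.
# compose(p, q) applies q first, then p.
def compose(p, q):
    return tuple(p[q[i]] for i in range(3))

G = list(permutations(range(3)))
H = [(0, 1, 2), (1, 0, 2)]          # subgroup {e, a transposition}

# Collect the distinct left cosets gH.
cosets = []
for g in G:
    coset = {compose(g, h) for h in H}
    if coset not in cosets:
        cosets.append(coset)

# Different cosets share no element, and n = s * k as in (16.5).
assert all(c1.isdisjoint(c2)
           for i, c1 in enumerate(cosets) for c2 in cosets[i + 1:])
assert len(G) == len(H) * len(cosets)   # 6 = 2 * 3
```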
16.3 Classes
c = g′bg′⁻¹, b = gag⁻¹ ⟹ c = g′bg′⁻¹ = g′gag⁻¹g′⁻¹ = (g′g)a(g′g)⁻¹.  (16.6)

In the above, a set containing a and all the elements conjugate to a is said to be a (conjugate) class of a. Denoting this set by ∁a, we have

∁a = {a, g2ag2⁻¹, g3ag3⁻¹, ⋯, gnagn⁻¹}.  (16.7)

In ∁a the same element may appear repeatedly. It is obvious that in every group the identity element e forms a class by itself. That is,

∁e = {e}.  (16.8)
As in the case of the decomposition of a group into (left or right) cosets, we can decompose a group into classes. If the group elements are not exhausted by the set comprising ∁e and ∁a, let us take b such that b ≠ e and b ∉ ∁a and make ∁b similarly to (16.7). Repeating this procedure, we should be able to decompose a group into classes. In fact, if the group elements have not yet been exhausted after these procedures, take a remaining element z and make a class. If at this moment the only remaining element is z, then z makes a class by itself (as in the case of e). Notice that for an Abelian group every element makes a class by itself. Thus, with a finite group we have a decomposition such that

ℊ = ∁e + ∁a + ∁b + ⋯ + ∁z.  (16.9)
To show that (16.9) is really a decomposition, suppose for instance that a set ∁a ∩ ∁b is not an empty set and that x ∈ ∁a ∩ ∁b. Then we must have α and β that satisfy the following relation: x = αaα⁻¹ = βbβ⁻¹, i.e., b = β⁻¹αaα⁻¹β = (β⁻¹α)a(β⁻¹α)⁻¹. This implies that b has already been included in ∁a, in contradiction to the supposition. Thus, (16.9) is in fact a decomposition of ℊ into a finite number of classes.
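The class decomposition (16.9) can likewise be carried out by brute force; the following sketch (again our own illustration) decomposes S3 into its conjugacy classes:

```python
from itertools import permutations

# Sketch: decomposing S3 into conjugacy classes by computing all g a g^-1.
def compose(p, q):
    return tuple(p[q[i]] for i in range(3))

def inverse(p):
    inv = [0] * 3
    for i, v in enumerate(p):
        inv[v] = i
    return tuple(inv)

G = list(permutations(range(3)))

classes = []
remaining = set(G)
while remaining:
    a = remaining.pop()
    cls = {compose(compose(g, a), inverse(g)) for g in G}   # the class of a
    classes.append(cls)
    remaining -= cls

# S3 splits into the class of e, the class of the two 3-cycles,
# and the class of the three transpositions: sizes 1, 2, 3.
assert sorted(len(c) for c in classes) == [1, 2, 3]
```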
In the above we thought of a class conjugate to a single element. This notion can be extended to a class conjugate to a subgroup. Let H be a subgroup of ℊ and let g be an element of ℊ. Let us now consider the set H′ = gH g⁻¹. The set H′ is a subgroup of ℊ and is called a conjugate subgroup.
In fact, let hi and hj be any two elements of H, that is, let ghig⁻¹ and ghjg⁻¹ be any two elements of H′. Then, we have

(ghig⁻¹)(ghjg⁻¹) = ghihjg⁻¹ = ghkg⁻¹,  (16.10)

where hk = hihj ∈ H. Hence, ghkg⁻¹ ∈ H′. Meanwhile, (ghig⁻¹)⁻¹ = ghi⁻¹g⁻¹ ∈ H′. Thus, conditions (1) and (2) of Sect. 16.2 are satisfied with H′, and therefore H′ is a subgroup of ℊ. The subgroup H′ has the same order as H. This is because for any two different elements hi and hj we have ghig⁻¹ ≠ ghjg⁻¹.
If for ∀g ∈ ℊ and a subgroup H we have the following equality

g⁻¹H g = H,  (16.11)

then H is said to be an invariant subgroup of ℊ. Multiplying both sides of (16.11) by g from the left, we have

gH = H g.  (16.12)
This implies that the left coset is identical to the right coset. Thus, as far as we are
dealing with a coset pertinent to an invariant subgroup, we do not have to distinguish
left and right cosets.
Now let us anew consider the (left) coset decomposition of ℊ by an invariant subgroup H,

ℊ = g1H + g2H + ⋯ + gkH.  (16.13)

A product of two elements taken from two cosets gives

(gihl)(gjhm) = gigj(gj⁻¹hlgj)hm = gigjhphm = gigjhq,  (16.14)
where the second equality comes from (16.11). That is, we should have ∃hp such that gj⁻¹hlgj = hp, and hphm = hq. In (16.14), hα ∈ H (α stands for l, m, p, q, etc., with 1 ≤ α ≤ s). Note that gigjhq ∈ gigjH. Accordingly, a product of elements belonging to giH and gjH belongs to gigjH. We rewrite (16.14) as a relation between the sets

(giH)(gjH) = gigjH.  (16.15)

Viewing the LHS of (16.15) as a product of two cosets, we find that the said product is a coset as well. This implies that the collection of the cosets forms a group. Such a group that possesses cosets as elements is said to be a factor group or quotient group. In this context, the multiplication is the product of cosets. We denote the factor group by

ℊ/H.

The identity element of this factor group is H. This is because in (16.15), putting gi = e, we get H(gjH) = gjH; alternatively, putting gj = e, we have (giH)H = giH. Moreover, in (16.15) putting gj = gi⁻¹, we get

(giH)(gi⁻¹H) = gigi⁻¹H = H.  (16.16)
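The statement that coset products (16.15) close and hence form a factor group can be illustrated with ℊ = S3 and the invariant subgroup H = A3 (our choice of example; the book does not prescribe one):

```python
from itertools import permutations

# Sketch: the factor group G/H for G = S3 and the invariant subgroup H = A3
# (the even permutations). Coset products are computed element-wise as in (16.15).
def compose(p, q):
    return tuple(p[q[i]] for i in range(3))

def is_even(p):
    return tuple(p) in ((0, 1, 2), (1, 2, 0), (2, 0, 1))

G = list(permutations(range(3)))
H = frozenset(p for p in G if is_even(p))          # A3, an invariant subgroup
cosets = {frozenset(compose(g, h) for h in H) for g in G}

# The element-wise product of two cosets is again a coset, so G/H is a group.
for c1 in cosets:
    for c2 in cosets:
        prod = frozenset(compose(x, y) for x in c1 for y in c2)
        assert prod in cosets

assert len(cosets) == 2   # |G/H| = n/s = 6/3 = 2
```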
16.4 Isomorphism and Homomorphism

As in the case of the linear vector space, we consider the mapping between group elements. Of these, the notion of isomorphism and homomorphism is important.
Definition 16.1 Let ℊ = {x, y, ⋯} and ℊ′ = {x′, y′, ⋯} be groups and let a mapping ℊ → ℊ′ exist. Suppose that there is a one-to-one correspondence (i.e., injective mapping)

x ↔ x′, y ↔ y′, ⋯

between the elements such that xy = z implies x′y′ = z′ and vice versa. Meanwhile, any element in ℊ′ must be the image of some element of ℊ. That is, the mapping is surjective as well and, hence, the mapping is bijective. Then, the two groups ℊ and ℊ′ are said to be isomorphic. The relevant mapping is called an isomorphism. We symbolically denote this relation by

ℊ ≅ ℊ′.
Note that the aforementioned groups can be either finite or infinite. We did not designate identity elements. Suppose that x is the identity e. Then, from the relations xy = z and x′y′ = z′ we have

ey = z = y,  x′y′ = z′ = y′.  (16.17)

Then, we get

x′ = e′, i.e., e ↔ e′.  (16.18)

Similarly, putting y = x⁻¹, we get

y′ = x′⁻¹ = (x⁻¹)′.  (16.20)
Then, the two groups ℊ and ℊ′ are said to be homomorphic. The relevant mapping is called a homomorphism. We symbolically denote this relation by

ℊ ∼ ℊ′.

Therefore,

[ρ(x)]⁻¹ = ρ(x⁻¹).
The two groups can be either finite or infinite. Note that in the above the mapping is not necessarily injective. The mapping may or may not be surjective. Regarding the identity and inverse elements, we have the same relations as (16.18) and (16.20). From Definitions 16.1 and 16.2, we say that a bijective homomorphism is an isomorphism.
Let us introduce the important notion of a kernel of a mapping. In this regard, we have the following definition.
Definition 16.3 Let ℊ = {e, x, y, ⋯} and ℊ′ = {e′, x′, y′, ⋯} be groups and let e and e′ be the identity elements. Suppose that there exists a homomorphic mapping ρ: ℊ → ℊ′. Also, let F be a subset of ℊ such that

ρ(F) = e′.  (16.22)
The first and second equalities result from the homomorphism of ρ. Since F = {e}, xy⁻¹ = e, i.e., x = y. Therefore, ρ is injective (i.e., a one-to-one correspondence). As ρ is surjective from the assumption, ρ is bijective. The mapping ρ is accordingly isomorphic.
Conversely, suppose that ρ is isomorphic. Also suppose that for ∃x ∈ ℊ, ρ(x) = e′. From (16.18), ρ(e) = e′. We have ρ(x) = ρ(e) = e′ ⟹ x = e due to the isomorphism of ρ (i.e., one-to-one correspondence). This implies F = {e}. This completes the proof. ∎
We become aware of a close relationship between Theorem 16.1 and the linear transformation versus kernel relationship already mentioned in Sect. 11.2 of Part III; Figure 16.1 depicts it.

[Fig. 16.1: ρ invertible (bijective)]
Then, ki⁻¹ ∈ F. Thus, F is a subgroup of ℊ.
Next, for ∀g ∈ ℊ we have

ρ(gkig⁻¹) = ρ(g)ρ(ki)ρ(g⁻¹) = ρ(g)e′ρ(g⁻¹) = e′.  (16.27)

Hence,

gF g⁻¹ = F.  (16.28)
ρ̃(giF) = ρ(gi).  (16.29)

Then, ρ̃ is an isomorphic mapping.
Proof From (16.15) and (16.21), it is obvious that ρ̃ is homomorphic. The confirmation is left for readers. Let giF and gjF be two different cosets. Suppose here that ρ(gi) = ρ(gj). Then we have

ρ(gi⁻¹gj) = ρ(gi⁻¹)ρ(gj) = [ρ(gi)]⁻¹ρ(gj) = [ρ(gi)]⁻¹ρ(gi) = e′,  (16.30)
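A minimal numerical sketch of Definition 16.3 and the induced mapping ρ̃ of (16.29), assuming the homomorphism ρ: Z6 → Z3, x ↦ x mod 3 (our choice of example, not the book's):

```python
# Sketch: a homomorphism rho: Z6 -> Z3, its kernel F, and the induced
# one-to-one map rho~ on the cosets of F, as in (16.29).
def rho(x):          # rho(x + y) = rho(x) + rho(y) mod 3, so rho is homomorphic
    return x % 3

G = list(range(6))
F = [x for x in G if rho(x) == 0]                # kernel F = {0, 3}

cosets = {frozenset((g + f) % 6 for f in F) for g in G}

# rho is constant on each coset, so rho~(gF) = rho(g) is well defined...
images = {frozenset(rho(x) for x in c) for c in cosets}
assert all(len(im) == 1 for im in images)

# ...and distinct cosets have distinct images: rho~ is injective (isomorphic).
assert len(images) == len(cosets) == 3
```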
where hp = hihk and h′q = h′jh′l. The identity element is ee′ (= e′e); the inverse element is (hih′j)⁻¹ = h′j⁻¹hi⁻¹ = hi⁻¹h′j⁻¹. The associative law is obvious from hih′j = h′jhi. Thus, ℊ forms a group. This is said to be a direct product of groups, or a direct-product group. The groups H and H′ are called direct factors of ℊ. In this case, we succinctly represent

ℊ = H × H′.
In the above, the condition (ii) is equivalent to the statement that ∀g ∈ ℊ is uniquely represented as

g = hh′; h ∈ H, h′ ∈ H′.  (16.33)

In fact, suppose that H ∩ H′ = {e} and that g can be represented in two ways such that

g = h1h′1 = h2h′2.  (16.34)

Then, we have

h2⁻¹h1 = h′2h′1⁻¹;  h2⁻¹h1 ∈ H, h′2h′1⁻¹ ∈ H′.  (16.35)

Since H ∩ H′ = {e}, we get

h2⁻¹h1 = h′2h′1⁻¹ = e.  (16.36)

That is, h2 = h1 and h′2 = h′1. This means that the representation is unique.
Conversely, suppose that the representation is unique and that x ∈ H ∩ H′. Then we must have

x = xe = ex.  (16.37)

The uniqueness of the representation then implies x = e, i.e., H ∩ H′ = {e}. Moreover, H is an invariant subgroup of ℊ. In fact, writing g = hνh′μ and using hh′ = h′h, for any h ∈ H we have

ghg⁻¹ = hνh′μ h h′μ⁻¹hν⁻¹ = hνh(h′μh′μ⁻¹)hν⁻¹ = hνhhν⁻¹ ∈ H.  (16.38)

That is,

gH g⁻¹ = H.  (16.39)
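A direct-product group and the unique representation (16.33) can be illustrated with H = Z2 and H′ = Z3 (our example, chosen for brevity):

```python
# Sketch: the direct product Z2 x Z3; every element has the unique
# representation (h, h') of (16.33), and the group has order 2 * 3 = 6.
H = [0, 1]          # Z2
Hp = [0, 1, 2]      # Z3

Gp = [(h, hp) for h in H for hp in Hp]      # one pair per element: uniqueness

def mul(a, b):      # component-wise composition
    return ((a[0] + b[0]) % 2, (a[1] + b[1]) % 3)

# Group axioms: identity, inverses, closure (associativity is inherited).
e = (0, 0)
assert all(mul(e, g) == g for g in Gp)
assert all(any(mul(g, x) == e for x in Gp) for g in Gp)
assert all(mul(a, b) in Gp for a in Gp for b in Gp)
assert len(Gp) == len(H) * len(Hp)
```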
Reference
P = \begin{pmatrix} x \\ y \\ z \end{pmatrix}.  (17.1)
Note that P may be on or within or outside a geometric object or molecule that we are dealing with. The relevant position vector x for P is expressed as

x = xe1 + ye2 + ze3,  (17.2)

where e1, e2, and e3 denote orthonormal basis vectors pointing to the positive directions of the x-, y-, and z-axes, respectively. Similarly, we denote a linear transformation A by

A(x) = (e1 e2 e3) \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix}.  (17.3)
[Fig. 17.1 Rotation of an object (panels: Disk A, Disk B). (a) Case where we cannot recognize that the object has been moved. (b) Case where we can recognize that the object has been moved because of its attribute (i.e., because the round disk is partly painted black)]

Suppose that we rotate the disk shown in Fig. 17.1. Then, we have two possibilities. The first alternative is that we cannot recognize that the object has
been moved. The second one is that we can yet recognize that the object has been
moved. According to Fig. 17.1a, b, we have the former case and the latter case,
respectively. In the latter case, we have recognized the movement of the object by its attribute, i.e., by the fact that the object is partly painted black.
However, we do not have to be rigorous here. We have a clear intuitive criterion
for a judgment of whether a geometric object has been moved. From now on, we
assume that the geometric character of an object is pertinent to both its positional
relationship and attribute. Thus, we define the equivalent (or indistinguishable)
disposition of an object and the operation that yields such an equivalent disposition
as follows:
Definition 17.1
(i) Symmetry operation: A geometric operation that produces an equivalent
(or indistinguishable) disposition of an object.
(ii) Equivalent (or indistinguishable) disposition: Suppose that regarding a geomet-
ric operation of an object, we cannot recognize that the object has been moved
before and after that geometric operation. In that case, the original disposition of
the object and the resulting disposition reached after the geometric operation are
referred to as an equivalent disposition. The relevant geometric operation is the
symmetry operation.
Here we should clearly distinguish translation (i.e., parallel displacement) from
the abovementioned symmetry operations. This is because for a geometric object to
possess the translation symmetry the object must be infinite in extent, typically an
infinite crystal lattice. The relevant discipline is widely studied as space group and
has a broad class of applications in physics and chemistry. However, we will not deal
with the space group or associated topics, but focus our attention upon symmetry
groups in this book.
In the above example, the rotation is a symmetry operation with Fig. 17.1a, but
the said geometric operation is not a symmetry operation with Fig. 17.1b.
Let us further inspect properties of the symmetry operation. Let us consider a set
H consisting of symmetry operations. Let a and b be any two symmetry operations
of H . Then, (i) a ⋄ b is a symmetry operation as well. (ii) Multiplication of
638 17 Symmetry Groups
This operation represents a π rotation around the z-axis. Let us think of another symmetry operation described by

B = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}.  (17.5)

This produces a mirror symmetry with respect to the xy-plane. Then, we have

C ≡ AB = BA = \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix}.  (17.6)

The operation C shows an inversion about the origin. Thus, A, B, and C, along with an identity element E, form a group. Here E is expressed as

E = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.  (17.7)
The inverse elements of A, B, and C are A, B, and C themselves, respectively. The said group is commutative (Abelian) and is said to be a four-group [1].
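The four-group {E, A, B, C} above can be checked numerically; since all four matrices of (17.4)-(17.7) are diagonal, the sketch below stores only the diagonals:

```python
# Sketch: the four-group {E, A, B, C} realized as diagonal matrices,
# stored as tuples of diagonal entries.
E = (1, 1, 1)        # identity
A = (-1, -1, 1)      # pi rotation around the z-axis
B = (1, 1, -1)       # mirror reflection in the xy-plane
C = (-1, -1, -1)     # inversion about the origin, C = AB = BA

group = [E, A, B, C]

def mul(p, q):       # diagonal matrices multiply entry-wise
    return tuple(a * b for a, b in zip(p, q))

assert mul(A, B) == mul(B, A) == C          # (17.6); the group is Abelian
assert all(mul(g, g) == E for g in group)   # every element is its own inverse
assert all(mul(g, h) in group for g in group for h in group)   # closure
```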
Meanwhile, we have a number of noncommutative groups. From the point of view of matrix structure, noncommutativity comes from off-diagonal elements of the matrix. A typical example is a rotation matrix with a rotation angle different from zero or nπ (n: integer). For later use, let us have a matrix form that expresses a θ rotation around the z-axis. Figure 17.2 depicts a graphical illustration of this. The matrix R has the following form:

R = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}.  (17.8)
Note that R has been reduced. This implies that the three-dimensional Euclidean
space is decomposed into a two-dimensional subspace (xy-plane) and a one-dimen-
sional subspace (z-axis). The xy-coordinates are not mixed with the z-component
after the rotation R. Note, however, that if the rotation axis is oblique against the xy-
plane, this is not the case. We will come back to this point later.
Taking only the xy-coordinates in Fig. 17.3, we make a calculation. Using the addition theorem of trigonometric functions, we get

x′ = r cos(α + θ) = r cos α cos θ − r sin α sin θ = x cos θ − y sin θ,  (17.9)
y′ = r sin(α + θ) = r sin α cos θ + r cos α sin θ = x sin θ + y cos θ,  (17.10)

where we used x = r cos α and y = r sin α. Combining (17.9) and (17.10), as a matrix form we get
[Fig. 17.3: a point in the xy-plane rotated by θ; its position vector makes an angle α with the x-axis]

\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}.  (17.11)
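A small numerical sketch of (17.8) and (17.11): the addition theorem makes successive z-rotations add their angles (helper names are ours):

```python
import math

# Sketch: the 2x2 rotation matrix of (17.11) and a numerical check of the
# composition rule R(theta1) R(theta2) = R(theta1 + theta2).
def R(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

t1, t2 = 0.4, 1.1
lhs, rhs = matmul(R(t1), R(t2)), R(t1 + t2)
assert all(abs(lhs[i][j] - rhs[i][j]) < 1e-12
           for i in range(2) for j in range(2))
```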
Likewise, the rotation matrices around the z-, y-, and x-axes are expressed as

R_{zθ} = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix},  R_{yϕ} = \begin{pmatrix} \cos\phi & 0 & \sin\phi \\ 0 & 1 & 0 \\ -\sin\phi & 0 & \cos\phi \end{pmatrix},
R_{xφ} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\varphi & -\sin\varphi \\ 0 & \sin\varphi & \cos\varphi \end{pmatrix}.  (17.12)
This can easily be visualized in Fig. 17.4; consider that cyclic permutation of (x y z)ᵀ produces (z x y)ᵀ. Shuffling the order of coordinates, we get

\begin{pmatrix} x' \\ y' \\ z' \end{pmatrix} = \begin{pmatrix} \cos\phi & 0 & \sin\phi \\ 0 & 1 & 0 \\ -\sin\phi & 0 & \cos\phi \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix}.  (17.14)
Note that, as is obvious from the matrix form, I_O commutes with any other symmetry operation. Note also that I_O can be expressed as successive symmetry operations, i.e., as a product of symmetry operations. For instance, we have

I_O = R_{zπ}M_{xy} = \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}.  (17.17)

Note that R_{zπ} and M_{xy} commute; i.e., R_{zπ}M_{xy} = M_{xy}R_{zπ}.
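The relation (17.17) can be verified directly with integer matrices (helper names are ours):

```python
# Sketch of (17.17): the inversion I_O as the product of a pi rotation around
# the z-axis (R_z_pi) and a reflection in the xy-plane (M_xy), in either order.
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

R_z_pi = [[-1, 0, 0], [0, -1, 0], [0, 0, 1]]
M_xy   = [[1, 0, 0], [0, 1, 0], [0, 0, -1]]
I_O    = [[-1, 0, 0], [0, -1, 0], [0, 0, -1]]

assert matmul(R_z_pi, M_xy) == I_O
assert matmul(M_xy, R_z_pi) == I_O          # the two operations commute
```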
17.2 Successive Symmetry Operations
Let us now consider successive reflections in different planes and successive rotations about different axes [2]. Figure 17.5a displays two reflections with respect to the planes σ and σ̃, both perpendicular to the xy-plane. The said planes make a dihedral angle θ, with their intersection line identical to the z-axis. Also, the plane σ is identical with the zx-plane. Suppose that an arrow lies on the xy-plane perpendicularly to the zx-plane. As in (17.15), the operation σ is represented as

σ = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.  (17.19)
[Fig. 17.5 Two successive reflections about two planes σ and σ̃ that make an angle θ. (a) Reflections σ and σ̃ with respect to the two planes. (b) Successive operations of σ and σ̃ in this order; the combined operation is denoted by σ̃σ. The operations result in a 2θ rotation around the z-axis. (c) Successive operations of σ̃ and σ in this order; the combined operation is denoted by σσ̃. The operations result in a −2θ rotation around the z-axis]
The reflection σ̃ is obtained by rotating the plane σ by θ around the z-axis; as a matrix we have

σ̃ = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} \cos 2\theta & \sin 2\theta & 0 \\ \sin 2\theta & -\cos 2\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}.  (17.20)

Hence,

σ̃σ = \begin{pmatrix} \cos 2\theta & -\sin 2\theta & 0 \\ \sin 2\theta & \cos 2\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}.  (17.21)

The expression (17.21) means that the multiplication is done first by σ and then by σ̃; see Fig. 17.5b. Note that σ and σ̃ are conjugate to each other (Sect. 16.3). In this case, the combined operations produce a 2θ rotation around the z-axis.
If, on the other hand, the multiplication is made first by σ̃ and then by σ, we have a −2θ rotation around the z-axis; see Fig. 17.5c. As a matrix representation, we have

σσ̃ = \begin{pmatrix} \cos 2\theta & \sin 2\theta & 0 \\ -\sin 2\theta & \cos 2\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}.  (17.22)
Thus, successive operations of reflection by the planes that make a dihedral angle θ yield a rotation 2θ around the z-axis (i.e., the intersection line of σ and σ̃). The operation σσ̃ is an inverse to σ̃σ. That is,

(σσ̃)(σ̃σ) = E.  (17.23)

We have det σ̃σ = det σσ̃ = 1.
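A numerical check of (17.19)-(17.24): two reflections in planes with dihedral angle θ compose to the rotation by 2θ (our sketch; the value θ = 0.7 is arbitrary):

```python
import math

# Sketch: reflections in planes making a dihedral angle theta compose to a
# 2*theta rotation about their intersection line (here the z-axis).
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def close(a, b):
    return all(abs(a[i][j] - b[i][j]) < 1e-12
               for i in range(3) for j in range(3))

theta = 0.7
c, s = math.cos(theta), math.sin(theta)
Rz     = [[c, -s, 0], [s, c, 0], [0, 0, 1]]
Rz_inv = [[c, s, 0], [-s, c, 0], [0, 0, 1]]
sigma = [[1, 0, 0], [0, -1, 0], [0, 0, 1]]          # reflection in the zx-plane
sigma_t = matmul(matmul(Rz, sigma), Rz_inv)          # tilted plane, as in (17.20)

c2, s2 = math.cos(2 * theta), math.sin(2 * theta)
R2 = [[c2, -s2, 0], [s2, c2, 0], [0, 0, 1]]          # R_2theta of (17.24)
assert close(matmul(sigma_t, sigma), R2)             # sigma~ sigma = R_2theta
```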
Meanwhile, putting

R_{2θ} = \begin{pmatrix} \cos 2\theta & -\sin 2\theta & 0 \\ \sin 2\theta & \cos 2\theta & 0 \\ 0 & 0 & 1 \end{pmatrix},  (17.24)

we have

σ̃σ = R_{2θ}  or  σ̃ = R_{2θ}σ.  (17.25)

This implies the following: Suppose a plane σ and a straight line on it. Also suppose that one first makes a reflection about σ and then makes a 2θ rotation around the said straight line. Then, the resulting transformation is equivalent to a reflection about σ̃ that makes an angle θ with σ. At the same time, the said straight line is an intersection line of σ and σ̃. Note that the dihedral angle between the two planes σ and σ̃ is half the angle of the rotation. Thus, any two of the symmetry operations related to (17.25) are mutually dependent; any two of them produce the third symmetry operation.
In the above illustration, we did not take account of the presence of a symmetry axis. If the aforementioned axis is a symmetry axis Cn, we must have

2θ = 2πm/n (m: integer), i.e., θ = πm/n.  (17.26)
Similarly, let Ca be a C2 axis lying in the xy-plane and making an angle θ with the x-axis. The π rotation Caπ about Ca is then represented as

C_{aπ} = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix} \begin{pmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} \cos 2\theta & \sin 2\theta & 0 \\ \sin 2\theta & -\cos 2\theta & 0 \\ 0 & 0 & -1 \end{pmatrix}.  (17.27)
Hence, we get

C_{aπ}C_{xπ} = \begin{pmatrix} \cos 2\theta & -\sin 2\theta & 0 \\ \sin 2\theta & \cos 2\theta & 0 \\ 0 & 0 & 1 \end{pmatrix},
C_{xπ}C_{aπ} = \begin{pmatrix} \cos 2\theta & \sin 2\theta & 0 \\ -\sin 2\theta & \cos 2\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}.  (17.28)
Thus, similarly to (17.25), we have [1, 2]

C_{aπ}C_{xπ} = R_{2θ}, i.e., C_{aπ} = R_{2θ}C_{xπ}.  (17.29)
Note that in the above two illustrations for the successive symmetry operations, both
the relevant operators have been represented in reference to the original xyz-system.
For this reason, the latter operation was done from the left in (17.28).
From a symmetry requirement, once again the aforementioned C2 axes must be
present in combination with the Cn axis. Moreover, those C2 axes should be
perpendicular to the Cn axis. This can be seen in various Dn groups.
Another illustration of successive symmetry operations is an improper rotation. If
the rotation angle is π, this causes an inversion symmetry. In this illustration,
reflection, rotation, and inversion symmetries coexist. A C2h symmetry is a typical
example.
Equations (17.21), (17.22), and (17.29) demonstrate the same relation. Namely,
two successive mirror symmetry operations about a couple of planes and two
successive π-rotations about a couple of C2 axes cause the same effect with regard
to the geometric transformation. In this relation, we emphasize that two successive
reflection operations make a determinant of relevant matrices 1. These aspects cause
an interesting effect and we will briefly discuss it in relation to O and Td groups.
If furthermore the abovementioned mirror symmetry planes and C2 axes coexist,
the symmetry planes coincide with or bisect the C2 axes and vice versa. If these were
not the case, another mirror plane or C2 axis would be generated from the symmetry
requirement and the newly generated plane or axis would be coupled with the
original plane or axis. From the above argument, these processes again produce
another Cn axis. That must be prohibited.
Next, suppose that a Cn axis intersects obliquely with a plane of mirror symmetry.
A rotation of 2π/n around such an axis produces another mirror symmetry plane.
648 17 Symmetry Groups
This newly generated plane intersects with the original mirror plane and produces a
different Cn axis according to the above discussion. Thus, in this situation a mirror
symmetry plane cannot coexist with a sole rotation axis. In a geometric object with
higher symmetry such as Oh, however, several mirror symmetry planes can coexist
with several rotation axes in such a way that the axes intersect with the mirror planes
obliquely. In case the Cn axis intersects perpendicularly to a mirror symmetry plane,
that plane can coexist with a sole rotation axis (see Fig. 17.7). This is actually the
case with a geometric object having a C2h symmetry. The mirror symmetry plane is
denoted by σ h.
Now, let us examine simple examples of molecules and associated symmetry
operations.
Example 17.2 Figure 17.8 shows chemical structural formulae of thiophene,
bithiophene, biphenyl, and naphthalene. These molecules belong to C2v, C2h, D2,
and D2h, respectively. Note that these symbols are normally used to show specific
point groups. Notice also that in biphenyl two benzene rings are twisted relative to
the molecular axis. As an example, a multiplication table is shown in Table 17.1 for a
C2v group. Table 17.1 clearly demonstrates that the group constitution of C2v differs
from that of the group appearing in Example 16.1 (iii), even though the order is four
in both cases. Similar tables are given with C2h and D2. This is left for readers as
an exercise. We will find that the multiplication tables of these groups have the same
structure and that C2v, C2h, and D2 are all isomorphic to one another as a four group.
Table 17.2 gives matrix representation of symmetry operations for C2v. The repre-
sentation is defined as the transformation by the symmetry operations of a set of
basis vectors (x y z) in ℝ3.
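The C2v representation can be sketched numerically as follows, assuming the C2 axis along z and mirror planes zx and yz (our labeling, which may differ in detail from the book's Table 17.2):

```python
# Sketch: 3x3 matrices representing C2v acting on (x, y, z); axis and plane
# assignments are our assumptions for illustration.
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

E      = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]     # identity
C2z    = [[-1, 0, 0], [0, -1, 0], [0, 0, 1]]   # pi rotation about z
sig_zx = [[1, 0, 0], [0, -1, 0], [0, 0, 1]]    # mirror plane zx
sig_yz = [[-1, 0, 0], [0, 1, 0], [0, 0, 1]]    # mirror plane yz

C2v = [E, C2z, sig_zx, sig_yz]
assert all(matmul(g, h) in C2v for g in C2v for h in C2v)  # closure
assert all(matmul(g, g) == E for g in C2v)                 # a four-group
assert matmul(sig_zx, sig_yz) == C2z                       # two mirrors give C2
```

The last assertion is the matrix form of the statement that two perpendicular mirror planes generate a C2 axis along their intersection line.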
Meanwhile, Table 17.3 gives the multiplication table of D2h. We recognize that
the multiplication table of C2v appears on upper left and lower right blocks. If we
suitably rearrange the order of group elements, we can make another multiplication
table so that, e.g., C2h may appear on upper left and lower right blocks. As in the case
of Table 17.2, Table 17.4 summarizes the matrix representation of symmetry
operations for D2h. There are eight group elements, i.e., an identity, an inversion,
three mutually perpendicular C2 axes, and three mutually perpendicular planes of
mirror symmetry (σ). Here, we consider the possibility of constructing subgroups of D2h. The order of a subgroup must be a divisor of eight, and so let us list subgroups whose order is four and examine how many such subgroups exist. We have ₈C₄ = 70 combinations, but those allowed should be restricted by the requirement of forming a group. Because every group must contain the identity element, the number allowed is no greater than ₇C₃ = 35.
(i) In light of the aforementioned discussion, two C2 axes mutually intersecting at π/2 yield another C2 axis around the normal to the plane defined by the intersecting axes. Thus, three C2 axes have been chosen and a D2 symmetry results. In this case, we have only one choice. (ii) In the case of C2v, two planes mutually intersecting at π/2 yield a C2 axis around their line of intersection. There are three possibilities of choosing two planes out of three (i.e., ₃C₂ = 3). (iii) If we choose the inversion i along with, e.g., one of the three C2 axes, a σ necessarily results. This is also the case when we first combine a σ with i to obtain a C2 axis. We have three possibilities (i.e., ₃C₁ = 3) as well. Thus, we have only seven choices to construct subgroups of D2h having an order of four. This is summarized in Table 17.5.
An inverse of any element is that element itself. Therefore, if with any above
subgroup one chooses any element out of the remaining four elements and combines
it with the identity, one can construct a subgroup Cs, C2, or Ci of an order of 2. Since
all those subgroups of an order of 4 and 2 are commutative with D2h, these sub-
groups are invariant subgroups. Thus in terms of a direct-product group, D2h can be
expressed as various direct factors. Conversely, we can construct factor groups from
the coset decomposition.
Example 17.3 Figure 17.9 shows an equilateral triangle placed on the xy-plane of a
three-dimensional Cartesian coordinate system. Orthonormal basis vectors e1 and e2 are designated as shown. As for a chemical species of molecules, we have, e.g., boron
trifluoride (BF3). In the molecule, a boron atom is positioned at a molecular center
with three fluorine atoms located at vertices of the equilateral triangle. The boron
atom and fluorine atoms form a planar molecule (Fig. 17.10).
A symmetry group belonging to D3h comprises twelve symmetry operations such that

D3h = {E, C3, C3′, C2, C2′, C2″, σh, S3, S3′, σv, σv′, σv″},  (17.32)

where symmetry operations of the same species but distinct operations are denoted by a prime or double prime. When we represent these operations by a matrix, it is straightforward in most cases. For instance, a matrix for σv is given by M_yz of (17.15). However, we should make some matrix calculations about C2′, C2″, σv′, and σv″.
To determine a matrix representation of, e.g., σv′ in reference to the xy-coordinate system with orthonormal basis vectors e1 and e2, we consider the x′y′-coordinate system with orthonormal basis vectors e1′ and e2′ (see Fig. 17.9). The transformation matrix between the two sets of basis vectors is represented by

(e1′ e2′ e3′) = (e1 e2 e3)R_{zπ/6} = (e1 e2 e3) \begin{pmatrix} \sqrt{3}/2 & -1/2 & 0 \\ 1/2 & \sqrt{3}/2 & 0 \\ 0 & 0 & 1 \end{pmatrix}.  (17.33)
Thus, we see that in the first equation of (17.35) the order of multiplications is reversed according as the latter operation is expressed in the xyz-system or in the x′y′z′-system. As a full matrix representation, we get

σv′ = \begin{pmatrix} \sqrt{3}/2 & -1/2 & 0 \\ 1/2 & \sqrt{3}/2 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \sqrt{3}/2 & 1/2 & 0 \\ -1/2 & \sqrt{3}/2 & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 1/2 & \sqrt{3}/2 & 0 \\ \sqrt{3}/2 & -1/2 & 0 \\ 0 & 0 & 1 \end{pmatrix}.  (17.36)
This expresses three successive symmetry operations done in the order of (i) −π/6 rotation, (ii) reflection (denoted by Σv), and (iii) π/6 rotation (Fig. 17.9). The associated matrices are multiplied from the left.
Similarly, with C2′ we have

C2′ = \begin{pmatrix} -1/2 & \sqrt{3}/2 & 0 \\ \sqrt{3}/2 & 1/2 & 0 \\ 0 & 0 & -1 \end{pmatrix}.  (17.37)
The matrix form of (17.37) can also be decided in a manner similar to the above according to three successive operations shown in Fig. 17.9. In terms of classes, σv, σv′, and σv″ form a conjugacy class, and C2, C2′, and C2″ form another conjugacy class. With regard to the reflection and rotation, we have det σv′ = −1 and det C2′ = 1, respectively.
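The conjugation route to σv′ in (17.36) can be reproduced numerically (helper names are ours):

```python
import math

# Sketch of (17.36): sigma_v' obtained by conjugating the zx-plane reflection
# with a pi/6 rotation around the z-axis.
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def Rz(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

M_zx = [[1, 0, 0], [0, -1, 0], [0, 0, 1]]
sigma_v1 = matmul(matmul(Rz(math.pi / 6), M_zx), Rz(-math.pi / 6))

# Expected matrix from (17.36): entries 1/2 and sqrt(3)/2.
r3 = math.sqrt(3) / 2
expected = [[0.5, r3, 0], [r3, -0.5, 0], [0, 0, 1]]
assert all(abs(sigma_v1[i][j] - expected[i][j]) < 1e-12
           for i in range(3) for j in range(3))

# A reflection squares to the identity (and its determinant is -1).
sq = matmul(sigma_v1, sigma_v1)
assert all(abs(sq[i][j] - (1 if i == j else 0)) < 1e-12
           for i in range(3) for j in range(3))
```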
17.3 O and Td Groups

[Fig. 17.11 Cube whose individual vertices have three arrows so that the cube does not possess mirror symmetries. This object belongs to a point group O, called a pure rotation group]
[Fig. 17.12 Point group O and its five conjugacy classes; geometrical characteristics of the individual classes are briefly sketched. The classes, with the traces (characters) χ of their 3 × 3 rotation matrices, are: the identity E (χ = 3); six π/2 rotations about the coordinate axes (χ = 1); eight 2π/3 rotations about the body diagonals (χ = 0); three π rotations about the coordinate axes (χ = −1); and six π rotations about the face-diagonal axes (χ = −1)]
Another characteristic is the generation of eight 2π/3 rotations about the axes that trisect the x-, y-, and z-axes, more specifically, the solid angle π/2 formed by the x-, y-, and z-axes. Since such a rotation cyclically permutes the x-, y-, and z-axes, the trace is zero. At the same time, we find that this operation belongs to C3. This operation is generated by two successive π/2 rotations around two mutually orthogonal axes. To inspect this situation more closely, note that

(R_{xπ/2})⁻¹ = R_{x,−π/2}.  (17.39)
Now let us consider the two successive rotations. This is denoted by the multiplication of matrices that represent the related rotations. For instance, the multiplication of R_{xπ/2} and R′_{yπ/2} produces the following:

R_{xyz,2π/3} = R_{xπ/2}R′_{yπ/2} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ -1 & 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix},  (17.40)

where R_{xyz,2π/3} is a 2π/3 counterclockwise rotation around an axis that trisects the x-, y-, and z-axes. Notice that the primed operator is used because the corresponding rotation was performed after the preceding one.
Thus, we notice that there are eight related operations that trisect the eight octants of the coordinate system. These operations are further categorized into four sets in which the two elements are an inverse of each other. For instance, we have

(R_{xyz,2π/3})⁻¹ = R_{x̄ȳz̄,2π/3}.  (17.42)

Notice that a 2π/3 counterclockwise rotation around an axis that trisects the x-, y-, and z-axes is equivalent to a 2π/3 clockwise rotation around an axis that trisects the −x-, −y-, and −z-axes. Also, we have (R_{x̄yz,2π/3})⁻¹ = R_{xȳz̄,2π/3}, etc. Moreover, we have "cyclic" relations such as
R_{xπ/2}R′_{yπ/2} = R_O R_{xπ/2},  i.e.,  R_O = R_{xπ/2}R′_{yπ/2}(R_{xπ/2})⁻¹,  (17.44)

where R_O is viewed in reference to the original (or fixed) coordinate system and is conjugate to R′_{yπ/2}. Thus, we have

R_O = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ -1 & 0 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 0 \end{pmatrix} = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}.  (17.45)

Note that (17.45) is identical to the matrix representation of a π/2 rotation around the z-axis. This is evident from the fact that the y-axis is converted to the original z-axis by R_{xπ/2}; readers, imagine it.
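Equations (17.40), (17.44), and (17.45) can be confirmed with integer matrices (our sketch; the rotation-matrix conventions are those of (17.12)):

```python
# Sketch: successive pi/2 rotations produce a C3 rotation, and conjugation
# by R_x turns R_y into a pi/2 rotation about the z-axis, as in (17.44)-(17.45).
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

Rx     = [[1, 0, 0], [0, 0, -1], [0, 1, 0]]   # pi/2 rotation around x
Ry     = [[0, 0, 1], [0, 1, 0], [-1, 0, 0]]   # pi/2 rotation around y
Rx_inv = [[1, 0, 0], [0, 0, 1], [0, -1, 0]]   # inverse of Rx, cf. (17.39)

# (17.40): Rx Ry cyclically permutes (x, y, z): a 2pi/3 rotation about (1,1,1).
assert matmul(Rx, Ry) == [[0, 0, 1], [1, 0, 0], [0, 1, 0]]

# (17.44)-(17.45): R_O = Rx Ry Rx^-1 is the pi/2 rotation around the z-axis.
R_O = matmul(matmul(Rx, Ry), Rx_inv)
assert R_O == [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
```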
We have two conjugacy classes of π rotations (the C2 symmetry). One of them includes six elements, i.e., R_{xyπ}, R_{yzπ}, R_{zxπ}, R_{x̄yπ}, R_{ȳzπ}, and R_{z̄xπ}. In these notations a subscript, e.g., xy, stands for an axis that bisects the angle formed by the x- and y-axes; a subscript x̄y denotes an axis bisecting the angle formed by the −x- and y-axes. The other class includes three elements, i.e., R_{xπ}, R_{yπ}, and R_{zπ}.
As for Rxπ, Ryπ, and Rzπ, a combination of these operations should yield a C2
rotation axis as discussed in Sect. 17.2. Of these three rotation axes, in fact, any two
produce a C2 rotation around the remaining axis, as is the case with naphthalene
belonging to the D2h symmetry (see Sect. 17.2).
Regarding the class comprising the six π rotation elements, a combination of, e.g., R_{xyπ} and R_{x̄yπ} crossing each other at a right angle causes a related effect. For the other combinations, the two C2 axes intersect each other at π/3; see Fig. 17.6 and put
θ = π/3 there. In this respect, elementary analytic geometry teaches the positional relationship among planes and straight lines. The argument is as follows: A plane determined by three points (x1, y1, z1)ᵀ, (x2, y2, z2)ᵀ, and (x3, y3, z3)ᵀ that do not sit on a line is expressed by the following equation:

\begin{vmatrix} x & y & z & 1 \\ x_1 & y_1 & z_1 & 1 \\ x_2 & y_2 & z_2 & 1 \\ x_3 & y_3 & z_3 & 1 \end{vmatrix} = 0.  (17.46)
Substituting (0, 0, 0)ᵀ, (1, 1, 0)ᵀ, and (0, 1, 1)ᵀ, we have

x − y + z = 0.  (17.47)
Taking account of direction cosines and using Hesse's normal form, we get

(1/√3)(x − y + z) = 0,  (17.48)

where the normal to the plane expressed by (17.48) has direction cosines of 1/√3, −1/√3, and 1/√3 in relation to the x-, y-, and z-axes, respectively.
Therefore, the normal is given by a straight line connecting the origin and (1, −1, 1)ᵀ. In other words, a line connecting the origin and a corner of a cube is the normal to the plane described by (17.48). That plane is formed by two intersecting lines, i.e., the rotation axes of C2 and C2′ (see Fig. 17.13, which depicts a cube with each side of 2). These axes make an angle of π/3; this can easily be checked by taking an inner product between (1/√2, 1/√2, 0)ᵀ and (0, 1/√2, 1/√2)ᵀ, which are the direction cosines of C2 and C2′. On the basis of the discussion of Sect. 17.2, we must have a rotation axis of C3. That is, this axis trisects the solid angle π/2 shaped by three intersecting sides.

[Fig. 17.13 Rotation axes of C2 and C2′ along with another rotation axis C3 in a point group O]

It is sometimes hard to visualize or envisage the positional relationship among planes and straight lines in three-dimensional space. It will therefore be useful to make a simple kit to help visualize it. Figure 17.14 gives an illustration.

[Fig. 17.14 Simple kit that helps visualize the positional relationship among planes and straight lines in three-dimensional space. To make it, follow the next procedures: (i) Take three thick sheets of paper and make slits (dashed lines) as shown. (ii) Insert Sheet 2 into Sheet 1 so that the two sheets can make a right angle. (iii) Insert Sheet 3 into combined Sheets 1 and 2]
Another typical example having 24 group elements is Td. A molecule of methane
belongs to this symmetry. Table 17.6 collects the relevant symmetry operations and
their (3, 3) matrix representations. As in the case of Fig. 17.12, the matrices show
how a set of vectors (x y z) is transformed according to the symmetry operations. Comparing it with Fig. 17.12, we immediately recognize that a close relationship between Td and O exists and that these point groups share notable characteristics. (i) Both Td and O consist of five conjugacy classes, each of which contains the same number of symmetry species. (ii) Both Td and O contain the pure rotation group T as a subgroup. The subgroup T consists of 12 group elements E, 8C3, and 3C2. The remaining twelve group elements of Td are symmetry species related to reflection: S4 and σd. The elements 6S4 and 6σd correspond to 6C4 and 6C2 of O, respectively.
6σd:

$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix} \;
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & -1 & 0 \end{pmatrix} \;
\begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix} \;
\begin{pmatrix} 0 & 0 & -1 \\ 0 & 1 & 0 \\ -1 & 0 & 0 \end{pmatrix} \;
\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} \;
\begin{pmatrix} 0 & -1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
17.3 O and Td Groups 661
$$\frac{1}{\sqrt{2}}(x - y) = 0, \quad (17.49)$$

$$\frac{1}{\sqrt{2}}(y - z) = 0, \quad (17.50)$$

$$\frac{1}{\sqrt{2}}(z - x) = 0. \quad (17.51)$$

$$\cos\alpha = \frac{1}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}} = \frac{1}{2} \quad \text{or} \quad (17.52)$$

$$\cos\alpha = \frac{1}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}} - \frac{1}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}} = 0. \quad (17.53)$$
That is, α = π/3 or α = π/2. On the basis of the discussion of Sect. 17.2, the intersection of the two planes must be a rotation axis of C3 or C2. Once again, in the case of C3 the intersection is a straight line connecting the origin and a vertex of the cube. This can readily be verified as follows: For instance, two planes given by x = y and y = z make an angle π/3 and produce an intersection line x = y = z. This line, in turn, connects the origin and a vertex of the cube.
If we choose, e.g., the two planes x = y and x = −y from the above, these planes make a right angle and their intersection must be a C2 axis. The three C2 axes coincide with the x-, y-, and z-axes. In this light, σd functions similarly to 6C2 of O in that their combinations produce 8C3 or 3C2. Thus, the constitution and operation of Td and O are related.
Let us more closely inspect the structure and constitution of O and Td. First we
construct a mapping ρ between group elements of O and Td such that

ρ : g ∈ O ⟷ g′ ∈ Td, if g ∈ T,
ρ : g ∈ O ⟷ −g′ ∈ Td, if g ∉ T.

In the above relation, the minus sign indicates that the representation matrix R must be replaced with −R. Then, ρ(g) = g′ is an isomorphic mapping. In fact, comparing Fig. 17.12 and Table 17.6, ρ gives identical matrix representations for O and Td. For example, taking the first matrix of S4 of Td, we have
$$-\begin{pmatrix} -1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 0 \end{pmatrix}.$$
The resulting matrix is identical to the first matrix of C4 of O. Thus, we find Td and
O are isomorphic to each other.
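The isomorphism can be spot-checked numerically. In the sketch below, the S4 matrix about the x-axis is our assumed stand-in for a matrix of S4 in Table 17.6 (the table itself is not reproduced here); negating it yields a proper rotation, as the mapping ρ requires:

```python
import numpy as np

# Hypothetical S4 matrix about the x-axis: rotation by pi/2 about x followed
# by reflection in the plane perpendicular to x (an assumption for illustration).
S4x = np.array([[-1, 0, 0],
                [ 0, 0, -1],
                [ 0, 1, 0]])

# The mapping rho sends the improper element to the matrix -R.
C4x = -S4x
```

The negated matrix is orthogonal with determinant +1, i.e., a C4 rotation of O, whereas S4x itself has determinant −1.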
Both O and Td consist of 24 group elements and are isomorphic to the symmetric group S4; do not confuse this with the same symbol S4 used for a group element of Td. The subgroup
T consists of three conjugacy classes E, 8C3, and 3C2. Since T is constructed only by
entire classes, it is an invariant subgroup; in this respect see the discussion of Sect.
16.3. The groups O, Td, and T along with Th and Oh form cubic groups [2]. In
Table 17.7, we list these cubic groups together with their name and order. Of these,
O is a pure rotation subgroup of Oh and T is a pure rotation subgroup of Td and Th.
Symmetric groups are related to permutations of n elements (or objects or numbers). The permutation has already appeared in (11.56) when we defined a determinant of a matrix. It was defined as

$$\sigma = \begin{pmatrix} 1 & 2 & \cdots & n \\ i_1 & i_2 & \cdots & i_n \end{pmatrix},$$
Theorem 17.1: Cayley’s Theorem [1] Every finite group ℊ of order n is isomor-
phic to a subgroup (containing a whole group) of the symmetric group Sn.
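Cayley's theorem can be illustrated with a toy example. The following sketch (our own construction, not from the text) realizes the cyclic group of order 3 as permutations via left translation g ↦ (h ↦ gh):

```python
# The cyclic group Z3 under addition mod 3.
group = [0, 1, 2]
op = lambda a, b: (a + b) % 3

# perm[g] is the permutation induced by left translation by g,
# written as a tuple of images (a function table).
perm = {g: tuple(op(g, h) for h in group) for g in group}

def compose(p, q):
    # (p o q)(i) = p[q[i]]: composition of permutations given as tables.
    return tuple(p[i] for i in q)
```

The map g ↦ perm[g] is injective and respects the group product, so the group is isomorphic to a subgroup of S3, in accordance with Theorem 17.1.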
In Part III and Part IV thus far, we have dealt with a broad class of linear transformations. The related groups have been finite groups. Here we will describe characteristics of the special orthogonal group SO(3), a kind of infinite group. The group SO(3) represents rotations in three-dimensional Euclidean space ℝ3. Rotations are made around an axis (a line through the origin) with the origin fixed. A rotation is defined by an azimuth of direction and a magnitude (angle) of rotation. The azimuth of direction is defined by two parameters and the magnitude is defined by one parameter, a rotation angle. Hence, a rotation is defined by three independent parameters. Since those parameters are continuously variable, SO(3) is one of the continuous groups.
Two rotations result in another rotation with the origin again fixed. A reverse
rotation is unambiguously defined. An identity transformation is naturally defined.
An associative law holds as well. Thus, the relevant rotations form a group, i.e., SO
(3). The rotation is represented by a real (3, 3) matrix whose determinant is 1. A
matrix representation is uniquely determined once an orthonormal basis is set in ℝ3.
Any rotation is represented by a rotation matrix accordingly. The rotation matrix R is
defined by
RT R ¼ RRT ¼ E, ð17:54Þ
where R is a real matrix with det R = 1. Notice that we exclude the case where det R = −1. Matrices that satisfy (17.54) with det R = ±1 are referred to as orthogonal matrices that cause orthogonal transformations. Correspondingly, orthogonal groups
(represented by orthogonal matrices) contain rotation groups as a special case. In
other words, the orthogonal groups contain the rotation groups as a subgroup. An
orthogonal group in ℝ3 denoted O(3) contains SO(3) as a subgroup. By the same
token, orthogonal matrices contain rotation matrices as a special case.
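The defining conditions can be verified numerically. The sketch below (NumPy is our choice, not the book's) checks that a z-rotation satisfies (17.54) with det R = 1, while a reflection is orthogonal but fails the determinant condition and so lies in O(3) but not SO(3):

```python
import numpy as np

theta = 0.7
# A rotation about the z-axis: orthogonal with det +1, hence in SO(3).
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])

# A reflection: orthogonal as well, but det -1, so in O(3) but not SO(3).
S = np.diag([1.0, 1.0, -1.0])

def in_SO3(M):
    # (17.54) with det M = 1
    return np.allclose(M.T @ M, np.eye(3)) and np.isclose(np.linalg.det(M), 1.0)
```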
In Sect. 17.2 we treated reflection and improper rotation, the determinant of whose matrices is −1. In this section these transformations are excluded and only rotations are dealt with. We focus on geometric characteristics of the rotation groups.
In this section we represent a vector such as |x⟩. We start by showing that any rotation has a unique rotation axis. The rotation axis is defined as follows: Suppose that there is a rigid body with some point within the body fixed. Here the said rigid body may have infinite extent. Then the rigid body exerts rotation. The rotation axis is a line on which every point is unmoved during the rotation. As a matter of course, the identity matrix E has three linearly independent rotation axes. (Practically, this represents no rotation.)
Theorem 17.2 Any rotation matrix R is accompanied by at least one rotation axis.
Unless the rotation matrix is identity, the rotation matrix should be accompanied by
one and only one rotation axis.
Proof As R is an orthogonal matrix with determinant 1, so are $R^T$ and $R^{-1}$. Then we have

$$\det(R - E) = \det(R^T - E) = \det(R^{-1} - E) = \det\left[R^{-1}(E - R)\right]$$
$$= \det R^{-1} \det(E - R) = \det(E - R) = -\det(R - E). \quad (17.56)$$

Hence, we get

$$\det(R - E) = 0. \quad (17.57)$$

Accordingly, there exists a vector |x0⟩ (≠ 0) such that

$$(R - E)|x_0\rangle = 0. \quad (17.58)$$

Therefore, we get

$$R(a|x_0\rangle) = a|x_0\rangle. \quad (17.59)$$

Suppose now that the rotation had two rotation axes, spanned by vectors |u⟩ and |v⟩. Then

$$(R - E)|u\rangle = 0, \quad (17.61)$$

$$(R - E)|v\rangle = 0. \quad (17.62)$$
Let us consider an arbitrary vector |y⟩ chosen from Span{|u⟩, |v⟩}, for which we assume that
That is, Span{|u⟩, |v⟩} represents a plane P formed by two mutually intersecting straight lines. Then we have
Similarly, we have
$$R^\dagger = R^T. \quad (17.69)$$

Now we have

$$\langle y|Rw\rangle = \langle Ry|Rw\rangle = \langle y|R^T Rw\rangle = \langle y|Ew\rangle = \langle y|w\rangle = 0. \quad (17.70)$$

In (17.70) the first equality holds because R|y⟩ = |y⟩ for |y⟩ ∈ P; the second equality comes from the associative law; the third is due to (17.54). The last equality comes from the fact that |w⟩ is perpendicular to P.
From (17.70) we have

$$\langle Rw|Rw\rangle = \langle w|R^T Rw\rangle = \langle w|w\rangle = |a|^2\langle w|w\rangle, \quad \text{i.e.,} \quad a = \pm 1,$$

where the first equality comes from the fact that R is an orthogonal matrix. Since det R = 1, a = 1. Thus, from (17.72) this time around we have |Rw⟩ = |w⟩; that is,

$$(R - E)|w\rangle = 0. \quad (17.73)$$
Equations (17.65) and (17.73) imply that for any vector |x⟩ arbitrarily chosen from ℝ3 we have

$$(R - E)|x\rangle = 0. \quad (17.74)$$

Consequently, we get

$$R - E = 0 \quad \text{or} \quad R = E. \quad (17.75)$$

The above procedures represented by (17.61) to (17.75) indicate that the presence of two rotation axes necessarily requires the transformation matrix to be the identity. This implies that all the vectors in ℝ3 are eigenvectors of the rotation matrix.
Taking the contraposition of the above, unless the rotation matrix is the identity, the relevant rotation cannot have two rotation axes. Meanwhile, the proof of the former half ensures the presence of at least one rotation axis. Consequently, any rotation is characterized by a unique rotation axis except for the identity transformation. This completes the proof. ∎
An immediate consequence of Theorem 17.2 is that the rotation matrix has an eigenvalue 1, to which an eigenvector representing the rotation axis corresponds. This statement includes the trivial fact that all the eigenvalues of the identity matrix are 1. In Sect. 14.4, we calculated eigenvalues of a two-dimensional rotation matrix. The eigenvalues were $e^{i\theta}$ and $e^{-i\theta}$, where θ is a rotation angle.
Let us consider rotation matrices that we dealt with in Sect. 17.1. The matrix representing the rotation around the z-axis by a rotation angle θ is expressed by

$$R = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}. \quad (17.8)$$
$$U = \begin{pmatrix} \dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} & 0 \\ -\dfrac{i}{\sqrt{2}} & \dfrac{i}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad U^\dagger = \begin{pmatrix} \dfrac{1}{\sqrt{2}} & \dfrac{i}{\sqrt{2}} & 0 \\ \dfrac{1}{\sqrt{2}} & -\dfrac{i}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{pmatrix}. \quad (17.76)$$
Thus, the eigenvalues are 1, $e^{i\theta}$, and $e^{-i\theta}$. The eigenvalue 1 results from the existence of the unique rotation axis. When θ = 0, (17.77) gives an identity matrix with all the eigenvalues 1 as expected. When θ = π, the eigenvalues are 1, −1, and −1. The eigenvalue 1 is again associated with the unique rotation axis. The (unitary) similarity transformation keeps a trace unchanged; that is, the trace χ is

$$\chi = 1 + 2\cos\theta. \quad (17.78)$$
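Both the eigenvalue spectrum and the trace formula (17.78) are easy to confirm numerically, e.g. with the following NumPy sketch (our own check, not part of the text):

```python
import numpy as np

theta = 0.9
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])

eigvals = np.linalg.eigvals(R)
# Sorting by imaginary part orders them as e^{-i theta}, 1, e^{i theta}.
eigvals = eigvals[np.argsort(eigvals.imag)]

trace = np.trace(R)   # equals 1 + 2 cos(theta) by (17.78)
```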
Euler angles are well known and have long been used in various fields of science. We wish to connect the above discussion with the Euler angles.
In Part III we dealt with successive linear transformations. This can be extended to the case of three or more successive transformations. Suppose that we have three successive transformations R1, R2, and R3 and that the coordinate system (a three-dimensional orthonormal basis) is transformed as O ⟶ I ⟶ II ⟶ III accordingly. The symbol "O" stands for the original coordinate system and I, II, and III represent the successively transformed systems.
In the discussion that follows, let us denote the transformations by R2′, R3′, R3″, etc. in reference to the transformed coordinate systems. For example, R3′ means that the third transformation is viewed from the system I. The transformation R3″ indicates that the third transformation is viewed from the system II. That is, the number of primes equals the number of the coordinate system, distinguishing the systems I and II. Let R2 (without prime) stand for the second transformation viewed from the system O.
Meanwhile, we have

$$R_1 R_2' = R_2 R_1. \quad (17.79)$$

$$R_1 R_2' R_3'' = R_1 R_3' R_2' = R_3 R_1 R_2' = R_3 R_2 R_1. \quad (17.81)$$
Let us call R2′, R3″, etc. transformations on a "moving" coordinate system (i.e., the systems I, II, III, ⋯). On the other hand, we call R1, R2, etc. transformations on a
“fixed” system (i.e., original coordinate system O). Thus, (17.81) shows that the
multiplication order is reversed with respect to the moving system and fixed
system [3].
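Relation (17.79) can be demonstrated numerically. In the sketch below (our own illustration), the moving-frame description R2′ is obtained by conjugating R2 with R1⁻¹, in line with (17.98):

```python
import numpy as np

def Rz(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def Ry(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

R1, R2 = Rz(0.4), Ry(1.1)
# R2 described on the "moving" system I (conjugation by the inverse of R1):
R2_moving = np.linalg.inv(R1) @ R2 @ R1

# (17.79): composing on the moving frame reverses the multiplication order.
lhs = R1 @ R2_moving
rhs = R2 @ R1
```

Note that the order reversal is nontrivial: R1 and R2 themselves do not commute.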
For a practical purpose it would be enough to consider three successive transformations. Let us think, however, of a general case where n successive transformations are involved (n denotes a positive integer). For the purpose of succinct notation, let us define the linear transformations and relevant coordinate systems as those in Fig. 17.17. Also, we define the following orthogonal transformations $R_j^{(i)}$:

[Fig. 17.17: successive transformations of the coordinate system, O ⟶ I ⟶ II ⟶ III ⟶ ⋯]

$$R_j^{(i)} \; (0 \le i < j \le n), \quad \text{and} \quad R_i \equiv R_i^{(0)}, \quad (17.83)$$

where $R_j^{(i)}$ is defined as the transformation Rj described in reference to the coordinate system i; $R_i^{(0)}$ means that Ri is referred to the original coordinate system (i.e., the fixed coordinates). Then we have

Particularly, when i = 3

For i = 2 we have
Particularly, when i ¼ 3
For i ¼ 2 we have
$$R_1 R_k^{(1)} = R_k R_1 \quad (k > 1). \quad (17.86)$$
We define n successive transformations on a moving coordinate system as $\tilde{R}_n$ such that

Applying (17.84) to $R_{n-1}^{(n-2)} R_n^{(n-1)}$ and rewriting (17.87), we have

Applying (17.84) again to $R_{n-2}^{(n-3)} R_n^{(n-2)}$, we get

where with the last equality we used (17.86). In this case, we have

$$R_1 R_n^{(1)} = R_n R_1. \quad (17.91)$$
In total, we have applied the permutation of (17.84) n(n − 1)/2 times. When n is 2, n(n − 1)/2 = 1. This is the case with (17.79). If n is 3, n(n − 1)/2 = 3. This is the case with (17.81). Thus, (17.93) once again confirms that the multiplication order is reversed with respect to the moving system and the fixed system.
Meanwhile, we define $\tilde{P}$ as

Alternately, we describe

$$\tilde{R}_n = \tilde{P} R_n^{(n-1)} = R_n \tilde{P}. \quad (17.96)$$

Equivalently, we have

$$R_n = \tilde{P} R_n^{(n-1)} \tilde{P}^{-1} \quad (17.97)$$

or

$$R_n^{(n-1)} = \tilde{P}^{-1} R_n \tilde{P}. \quad (17.98)$$
Moreover, we have
$$\tilde{P}^T = \left[ R_1 R_2^{(1)} R_3^{(2)} \cdots R_{n-2}^{(n-3)} R_{n-1}^{(n-2)} \right]^T = \left[ R_{n-1}^{(n-2)} \right]^T \left[ R_{n-2}^{(n-3)} \right]^T \cdots \left[ R_2^{(1)} \right]^T \left[ R_1 \right]^T$$
$$= \left[ R_{n-1}^{(n-2)} \right]^{-1} \left[ R_{n-2}^{(n-3)} \right]^{-1} \cdots \left[ R_2^{(1)} \right]^{-1} R_1^{-1} = \left[ R_1 R_2^{(1)} R_3^{(2)} \cdots R_{n-2}^{(n-3)} R_{n-1}^{(n-2)} \right]^{-1} = \tilde{P}^{-1}. \quad (17.99)$$

$$\tilde{P}^T \tilde{P} = \tilde{P} \tilde{P}^T = E. \quad (17.100)$$
$$\tilde{R}_3 = \begin{pmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{pmatrix} \begin{pmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
$$= \begin{pmatrix} \cos\alpha\cos\beta\cos\gamma - \sin\alpha\sin\gamma & -\cos\alpha\cos\beta\sin\gamma - \sin\alpha\cos\gamma & \cos\alpha\sin\beta \\ \sin\alpha\cos\beta\cos\gamma + \cos\alpha\sin\gamma & -\sin\alpha\cos\beta\sin\gamma + \cos\alpha\cos\gamma & \sin\alpha\sin\beta \\ -\sin\beta\cos\gamma & \sin\beta\sin\gamma & \cos\beta \end{pmatrix}. \quad (17.101)$$
$$R_3 = \begin{pmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{pmatrix} \begin{pmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \cos\beta & 0 & -\sin\beta \\ 0 & 1 & 0 \\ \sin\beta & 0 & \cos\beta \end{pmatrix} \begin{pmatrix} \cos\alpha & \sin\alpha & 0 \\ -\sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

$$= \begin{pmatrix}
(\cos^2\alpha\cos^2\beta + \sin^2\alpha)\cos\gamma + \cos^2\alpha\sin^2\beta &
\cos\alpha\sin\alpha\sin^2\beta(1-\cos\gamma) - \cos\beta\sin\gamma &
\cos\alpha\cos\beta\sin\beta(1-\cos\gamma) + \sin\alpha\sin\beta\sin\gamma \\
\cos\alpha\sin\alpha\sin^2\beta(1-\cos\gamma) + \cos\beta\sin\gamma &
(\sin^2\alpha\cos^2\beta + \cos^2\alpha)\cos\gamma + \sin^2\alpha\sin^2\beta &
\sin\alpha\cos\beta\sin\beta(1-\cos\gamma) - \cos\alpha\sin\beta\sin\gamma \\
\cos\alpha\cos\beta\sin\beta(1-\cos\gamma) - \sin\alpha\sin\beta\sin\gamma &
\sin\alpha\cos\beta\sin\beta(1-\cos\gamma) + \cos\alpha\sin\beta\sin\gamma &
\sin^2\beta\cos\gamma + \cos^2\beta
\end{pmatrix}. \quad (17.107)$$

$$\chi = 1 + 2\cos\gamma. \quad (17.108)$$
The trace is the same as that of $R_3^{(2)}$, as expected from (17.106) and (17.97). Equation (17.107) apparently seems complicated but has a simple and well-defined meaning. The rotation represented by $R_3^{(2)}$ is characterized by a rotation by γ around the z″-axis. Figure 17.18 represents the orientation of the z″-axis viewed in reference to the original xyz-system. That is, the z″-axis (identical to the rotation axis A) is designated by an azimuthal angle α and a zenithal angle β as shown. The operation R3 is represented by a rotation by γ around the axis A in the xyz-system. The angles α, β, and γ coincide with the Euler angles designated with the same independent parameters α, β, and γ.
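The geometric meaning just described can be confirmed numerically: conjugating a z-rotation by γ with P̃ = Rz(α)Ry(β) produces a rotation about the tilted axis A, and the trace obeys (17.108). A sketch (NumPy is our choice):

```python
import numpy as np

def Rz(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def Ry(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

alpha, beta, gamma = 0.3, 0.8, 1.2
P = Rz(alpha) @ Ry(beta)      # P~ carries the z-axis onto the axis A
R3 = P @ Rz(gamma) @ P.T      # P~ is orthogonal, so P~^{-1} = P~^T, (17.110)

# The rotation axis A, designated by azimuth alpha and zenith beta.
axis = P @ np.array([0.0, 0.0, 1.0])
trace = np.trace(R3)
```

The axis is left fixed by R3, and the trace equals 1 + 2 cos γ, independent of α and β.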
From (17.77) and (17.107), a diagonalizing matrix for R3 is $\tilde{P}U$. That is,

$$\left(\tilde{P}U\right)^\dagger R_3\, \tilde{P}U = U^\dagger \tilde{P}^{-1} R_3 \tilde{P} U = U^\dagger R_3^{(2)} U = \begin{pmatrix} e^{i\gamma} & 0 & 0 \\ 0 & e^{-i\gamma} & 0 \\ 0 & 0 & 1 \end{pmatrix}. \quad (17.109)$$
$$\tilde{P}^\dagger = \tilde{P}^T = \tilde{P}^{-1} \quad (17.110)$$

and
$$\tilde{P}U = \begin{pmatrix} \cos\alpha\cos\beta & -\sin\alpha & \cos\alpha\sin\beta \\ \sin\alpha\cos\beta & \cos\alpha & \sin\alpha\sin\beta \\ -\sin\beta & 0 & \cos\beta \end{pmatrix} \begin{pmatrix} \dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} & 0 \\ -\dfrac{i}{\sqrt{2}} & \dfrac{i}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

$$= \begin{pmatrix}
\dfrac{1}{\sqrt{2}}\cos\alpha\cos\beta + \dfrac{i}{\sqrt{2}}\sin\alpha & \dfrac{1}{\sqrt{2}}\cos\alpha\cos\beta - \dfrac{i}{\sqrt{2}}\sin\alpha & \cos\alpha\sin\beta \\
\dfrac{1}{\sqrt{2}}\sin\alpha\cos\beta - \dfrac{i}{\sqrt{2}}\cos\alpha & \dfrac{1}{\sqrt{2}}\sin\alpha\cos\beta + \dfrac{i}{\sqrt{2}}\cos\alpha & \sin\alpha\sin\beta \\
-\dfrac{1}{\sqrt{2}}\sin\beta & -\dfrac{1}{\sqrt{2}}\sin\beta & \cos\beta
\end{pmatrix}. \quad (17.111)$$
$$|R_3 - \lambda E| = \begin{vmatrix}
(\cos^2\alpha\cos^2\beta + \sin^2\alpha)\cos\gamma + \cos^2\alpha\sin^2\beta - \lambda &
\cos\alpha\sin\alpha\sin^2\beta(1-\cos\gamma) - \cos\beta\sin\gamma &
\cos\alpha\cos\beta\sin\beta(1-\cos\gamma) + \sin\alpha\sin\beta\sin\gamma \\
\cos\alpha\sin\alpha\sin^2\beta(1-\cos\gamma) + \cos\beta\sin\gamma &
(\sin^2\alpha\cos^2\beta + \cos^2\alpha)\cos\gamma + \sin^2\alpha\sin^2\beta - \lambda &
\sin\alpha\cos\beta\sin\beta(1-\cos\gamma) - \cos\alpha\sin\beta\sin\gamma \\
\cos\alpha\cos\beta\sin\beta(1-\cos\gamma) - \sin\alpha\sin\beta\sin\gamma &
\sin\alpha\cos\beta\sin\beta(1-\cos\gamma) + \cos\alpha\sin\beta\sin\gamma &
\sin^2\beta\cos\gamma + \cos^2\beta - \lambda
\end{vmatrix}. \quad (17.113)$$
The above matrix calculations certainly verify that (17.114) holds; readers are encouraged to check it.
As an application of (17.107) to an illustrative example, let us consider a 2π/3 rotation around an axis trisecting the solid angle π/2 formed by the x-, y-, and z-axes (see Fig. 17.19 and Sect. 17.3). Then we have
$$\cos\alpha = \sin\alpha = 1/\sqrt{2}, \quad \cos\beta = 1/\sqrt{3}, \quad \sin\beta = \sqrt{2}/\sqrt{3},$$
$$\cos\gamma = -1/2, \quad \sin\gamma = \sqrt{3}/2. \quad (17.115)$$
Fig. 17.19 Rotation axis of C3 that permutates basis vectors. (a) The C3 axis is a straight line that connects the origin and each vertex of a cube. (b) Cube viewed along the C3 axis that connects the origin and vertex
$$R_3 = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}. \quad (17.116)$$
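Equation (17.116) can be reproduced from the Euler-angle values (17.115) by the conjugation R3 = P̃ Rz(γ) P̃⁻¹; a numerical sketch (our own check, using NumPy):

```python
import numpy as np

def Rz(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def Ry(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

alpha = np.pi / 4                   # cos(alpha) = sin(alpha) = 1/sqrt(2)
beta = np.arccos(1 / np.sqrt(3))    # zenith of the cube diagonal (1, 1, 1)
gamma = 2 * np.pi / 3

P = Rz(alpha) @ Ry(beta)
R3 = P @ Rz(gamma) @ P.T            # P is orthogonal, so P^{-1} = P^T

# (17.116): the C3 rotation cyclically permutes e1 -> e2 -> e3 -> e1.
expected = np.array([[0.0, 0.0, 1.0],
                     [1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0]])
```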
where je1i, j e2i, and j e3i represent unit basis vectors in the direction of x-, y-, and z-
axes, respectively. We have jxi ¼ x1 j e1i + x2 j e2i + x3 j e3i. Thus, we get
0 1
x1
B C
R3 ðjxiÞ ¼ ðje2 i je3 i je1 iÞ@ x2 A: ð17:118Þ
x3
This implies a cyclic permutation of the basis vectors. This is well expressed in terms
of the column vectors (i.e., coordinate) transformation as
$$R_3(|x\rangle) = (|e_1\rangle \;\; |e_2\rangle \;\; |e_3\rangle) \begin{pmatrix} x_3 \\ x_1 \\ x_2 \end{pmatrix}. \quad (17.119)$$
Care should be taken as to which linear transformation is intended: the transformation of the basis vectors or that of the coordinates.
As mentioned above, we have compared geometric features on the moving
coordinate system and fixed coordinate system. The features apparently seem to
differ at a glance, but are essentially the same.
References
1. Hamermesh M (1989) Group theory and its application to physical problems. Dover, New York
2. Cotton FA (1990) Chemical applications of group theory. Wiley, New York
3. Edmonds AR (1957) Angular momentum in quantum mechanics. Princeton University Press,
Princeton
4. Chen JQ, Ping J, Wang F (2002) Group representation theory for physicists, 2nd edn. World
Scientific, Singapore
5. Arfken GB, Weber HJ, Harris FE (2013) Mathematical methods for physicists, 7th edn. Academic Press, Waltham
Chapter 18
Representation Theory of Groups
In Sect. 16.4 we dealt with various aspects of the mapping between group elements. Of these, we studied fundamental properties of isomorphism and homomorphism. In this chapter we introduce the notion of representation of groups and study it. If we deal with a finite group consisting of n elements, we describe it as ℊ = {g1 ≡ e, g2, ⋯, gn} as in the case of Sect. 16.1.
Definition 18.1 Let ℊ = {gν} be a group comprising elements gν. Suppose that a (d, d) matrix D(gν) is given for each group element gν. Suppose also that, in correspondence with

$$g_\mu g_\nu = g_\rho, \quad (18.1)$$

we have

$$D(g_\mu)D(g_\nu) = D(g_\rho). \quad (18.2)$$

Then the set

$$L = \{D(g_\nu)\}$$

is said to be a representation.
Although the definition seems somewhat daunting, the representation is, as
already seen, merely a homomorphism.
We call the individual matrices D(gν) representation matrices. The dimension d of the matrix is said to be the dimension of the representation as well. In correspondence with gνe = gν, we have

$$D(g_\mu)D(e) = D(g_\mu). \quad (18.3)$$

Therefore, we get

$$D(e) = E. \quad (18.4)$$

That is,

$$D(g_\nu^{-1}) = \left[D(g_\nu)\right]^{-1}. \quad (18.6)$$
where the third equality comes from homomorphism of the representation and the
second last equality is due to rearrangement theorem (see Sect. 16.1).
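Definition 18.1 and the consequences (18.4) and (18.6) can be illustrated with a small concrete example. The sketch below uses the familiar (3, 3) representation of C2v on the basis (x, y, z) (cf. Table 17.2); the dictionary keys are our own labels:

```python
import numpy as np

# A (3, 3) representation of C2v acting on the basis (x, y, z).
D = {
    "E":     np.diag([ 1.0,  1.0, 1.0]),
    "C2":    np.diag([-1.0, -1.0, 1.0]),   # rotation by pi about z
    "sv_zx": np.diag([ 1.0, -1.0, 1.0]),   # reflection in the zx-plane
    "sv_yz": np.diag([-1.0,  1.0, 1.0]),   # reflection in the yz-plane
}

# Homomorphism (18.2): D(C2) D(sigma_v(zx)) = D(sigma_v'(yz)),
# mirroring the group product C2 * sigma_v(zx) = sigma_v'(yz).
product = D["C2"] @ D["sv_zx"]
```

Every element of C2v is its own inverse, so (18.6) reduces here to each matrix squaring to the identity.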
Note that each matrix D(gi)†D(gi) is a Hermitian Gram matrix constructed from a non-singular matrix and that H is a summation of such Gram matrices. Consequently, on the basis of the argument of Sect. 13.2, H is positive definite and all the eigenvalues of H are positive. Then, using an appropriate unitary matrix U, we can get a diagonal matrix Λ such that
U { HU ¼ Λ: ð18:9Þ
$$\Lambda^{1/2} = \begin{pmatrix} \sqrt{\lambda_1} & & & \\ & \sqrt{\lambda_2} & & \\ & & \ddots & \\ & & & \sqrt{\lambda_d} \end{pmatrix}, \quad (18.11)$$

$$\left(\Lambda^{1/2}\right)^{-1} = \begin{pmatrix} \sqrt{\lambda_1^{-1}} & & & \\ & \sqrt{\lambda_2^{-1}} & & \\ & & \ddots & \\ & & & \sqrt{\lambda_d^{-1}} \end{pmatrix}. \quad (18.12)$$
Notice that both $\Lambda^{1/2}$ and $(\Lambda^{1/2})^{-1}$ are non-singular. Furthermore, we define a matrix V such that

$$V = U\left(\Lambda^{1/2}\right)^{-1}. \quad (18.13)$$
Then, multiplying both sides of (18.8) by V1 from the left and by V from the
right and inserting VV1 ¼ E in between, we get
$$V^{-1}\left[D(g_j)\right]^\dagger VV^{-1}HVV^{-1}D(g_j)V = V^{-1}HV. \quad (18.14)$$
Meanwhile, we have

$$V^{-1}HV = \Lambda^{1/2}U^\dagger HU\left(\Lambda^{1/2}\right)^{-1} = \Lambda^{1/2}\Lambda\left(\Lambda^{1/2}\right)^{-1} = \Lambda. \quad (18.15)$$
With the second equality of (18.15), we used (18.9). Inserting (18.15) into
(18.14), we get
$$V^{-1}\left[D(g_j)\right]^\dagger V\Lambda V^{-1}D(g_j)V = \Lambda. \quad (18.16)$$
$$V^\dagger = \left[\left(\Lambda^{1/2}\right)^{-1}\right]^\dagger U^\dagger = \left(\Lambda^{1/2}\right)^{-1}U^\dagger,$$

$$V\Lambda = U\Lambda^{1/2}.$$
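The construction (18.8) through (18.16) can be traced numerically on a toy non-unitary representation (our own example, not the text's) of the two-element group {e, a} with a² = e:

```python
import numpy as np

# A non-unitary but faithful representation of {e, a}: A @ A = E.
A = np.array([[1.0, 1.0],
              [0.0, -1.0]])
reps = [np.eye(2), A]

# H = sum_g D(g)^dagger D(g): a positive-definite Hermitian (Gram) sum.
H = sum(D.conj().T @ D for D in reps)

lam, U = np.linalg.eigh(H)       # U^dagger H U = Lambda, as in (18.9)
V = U @ np.diag(lam ** -0.5)     # V = U (Lambda^{1/2})^{-1}, as in (18.13)

# The similarity transformation V^{-1} D(g) V renders every matrix unitary.
Vinv = np.linalg.inv(V)
new_reps = [Vinv @ D @ V for D in reps]
```

This is the standard unitarization ("Weyl") trick that underlies the statement that every representation of a finite group is equivalent to a unitary one.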
with vectors mostly in ℝ3. Therefore, vectors naturally possess geometric features.
In this section, we extend a notion of vectors so that they can be treated under a wider
scope. More specifically, we include mathematical functions treated in analysis as
vectors.
To this end, let us think of a basis of a representation. According to the dimension
d of the representation, we have d basis vectors. Here we adopt d linearly indepen-
dent basis vectors for the representation.
Let ψ1, ψ2, ⋯, and ψd be linearly independent vectors in a vector space V d. Let ℊ = {g1, g2, ⋯, gn} be a finite group of order n. Here we assume that gi (1 ≤ i ≤ n) is a linear transformation (or operator) such as a symmetry operation dealt with in Chap. 17. Suppose that the following relation holds with gi ∈ ℊ (1 ≤ i ≤ n):
$$g_i(\psi_\nu) = \sum_{\mu=1}^{d} \psi_\mu D_{\mu\nu}(g_i) \quad (1 \le \nu \le d). \quad (18.22)$$
$$g_k(\psi_\nu) = \sum_{\lambda=1}^{d} \psi_\lambda D_{\lambda\nu}(g_k). \quad (18.26)$$
Comparing (18.25) and (18.26) and considering the uniqueness of vector repre-
sentation based on linear independence of the basis vectors (see Sect. 11.1), we get
$$\left[D(g_j)D(g_i)\right]_{\lambda\nu} = D_{\lambda\nu}(g_k) \equiv \left[D(g_k)\right]_{\lambda\nu}, \quad (18.27)$$
where the last identity follows the notation (11.38) of Sect. 11.2. Hence, we have
$$D(g_j)D(g_i) = D(g_k). \quad (18.28)$$
$$B = \{\psi_1, \psi_2, \cdots, \psi_d\} \quad (18.29)$$
$$\psi_\nu' = \sum_{\lambda=1}^{d} \psi_\lambda T_{\lambda\nu}. \quad (18.31)$$

$$\psi_\mu = \sum_{\lambda=1}^{d} \psi_\lambda' \left(T^{-1}\right)_{\lambda\mu}. \quad (18.32)$$
$$\Delta(g_i) = D(g_i) \oplus \tilde{D}(g_i). \quad (18.36)$$

The dimension of Δ(gi) is the sum of that of D(gi) and that of $\tilde{D}(g_i)$.
Definition 18.4 Let D be a representation of ℊ and let D(gi) be a representation
matrix of a group element gi. If we can convert D(gi) to a block matrix such as
(18.35) by its similarity transformation, D is said to be a reducible representation, or
completely reducible. If the representation is not reducible, it is irreducible.
18.2 Basis Functions of Representation 687
$$D : \mathscr{G} \to GL(V).$$
where we used the fact that D(gi) is a unitary matrix; see (18.6). But, since W is D(gi)-invariant from the supposition, $D(g_i^{-1})|b\rangle \in W$. Here notice that $g_i^{-1}$ must be some element gi′ (1 ≤ i′ ≤ n) of ℊ. Therefore, we should have

$$\langle b|D(g_i)|a\rangle = \langle a|D(g_i^{-1})|b\rangle^* = 0.$$
$$V = W \oplus W^\perp,$$

where $D^{(W)}(g_i)$ and $D^{(W^\perp)}(g_i)$ are representation matrices associated with the subspaces W and W⊥, respectively. This is the converse of the additive representations that appeared in Definition 18.3. From (11.17) and (11.18), we have
Fig. 18.1 Allyl radical and its resonance structure. The molecule is placed on the yz-plane. The z-axis is identical with a straight line connecting C1 and H (i.e., the C2 axis)
Let us now operate with a group element of C2v. For example, choosing C2(z), we have

$$C_2(z)(|\psi\rangle) = (|\phi_1\rangle \;\; |\phi_2\rangle \;\; |\phi_3\rangle) \begin{pmatrix} -1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & -1 & 0 \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix}. \quad (18.39)$$
Table 18.1 Matrix representation of symmetry operations given with respect to |ϕ1⟩, |ϕ2⟩, and |ϕ3⟩ (columns: E, C2(z), σv(zx), σv′(yz)):

$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \quad
\begin{pmatrix} -1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & -1 & 0 \end{pmatrix} \quad
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix} \quad
\begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix}$$
$$\begin{vmatrix} -1-\lambda & 0 & 0 \\ 0 & -\lambda & -1 \\ 0 & -1 & -\lambda \end{vmatrix} = 0.$$
The diagonalization of the representation matrix σv(zx) using the same U as above is left for the readers as an exercise. Table 18.2 shows the results of the diagonalized representations with regard to the vectors |ϕ1⟩, $\frac{1}{\sqrt{2}}(|\phi_2\rangle + |\phi_3\rangle)$, and $\frac{1}{\sqrt{2}}(|\phi_2\rangle - |\phi_3\rangle)$. Notice that the traces of the matrices remain unchanged in Tables 18.1 and 18.2, i.e., before and after the unitary similarity transformation. From Table 18.2, we find that the vectors are eigenfunctions of the individual symmetry operations. Each diagonal element is a corresponding eigenvalue of those operations. Of the three vectors, |ϕ1⟩ and $\frac{1}{\sqrt{2}}(|\phi_2\rangle + |\phi_3\rangle)$ have the same eigenvalues with respect to individual symmetry operations. With symmetry
Table 18.2 Matrix representation of symmetry operations given with respect to |ϕ1⟩, $\frac{1}{\sqrt{2}}(|\phi_2\rangle + |\phi_3\rangle)$, and $\frac{1}{\sqrt{2}}(|\phi_2\rangle - |\phi_3\rangle)$
operations C2 and σv(zx), the other vector $\frac{1}{\sqrt{2}}(|\phi_2\rangle - |\phi_3\rangle)$ has an eigenvalue of a sign opposite to that of the former two vectors. Thus, we find that we have arrived at "symmetry-adapted" vectors by taking linear combinations of the original vectors.
Returning to Table 17.2, we see that the representation matrices have already been diagonalized. In terms of the representation space, we constructed the representation matrices with respect to x, y, and z as symmetry-adapted basis vectors. In the next chapter, we make the most of such vectors constructed by symmetry-adapted linear combinations.
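The symmetry-adapted diagonalization of Table 18.2 can be reproduced numerically. In the sketch below, the sign convention of the C2(z) matrix follows our reconstruction of Table 18.1:

```python
import numpy as np

s = 1 / np.sqrt(2)
# Columns: |phi1>, (|phi2> + |phi3>)/sqrt(2), (|phi2> - |phi3>)/sqrt(2),
# expressed in the original {|phi1>, |phi2>, |phi3>} basis.
U = np.array([[1.0, 0.0, 0.0],
              [0.0,   s,   s],
              [0.0,   s,  -s]])

# Representation matrix of C2(z) (sign convention as reconstructed above).
C2z = np.array([[-1.0, 0.0,  0.0],
                [ 0.0, 0.0, -1.0],
                [ 0.0, -1.0, 0.0]])

# Unitary similarity transformation to the symmetry-adapted basis.
D_diag = U.T @ C2z @ U
```

The result is diagonal with eigenvalues −1, −1, +1, and the trace is unchanged by the similarity transformation, as the text notes.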
Meanwhile, by allocating appropriate numbers to the coefficients c1, c2, and c3, we can represent any vector in V3 spanned by |ϕ1⟩, |ϕ2⟩, and |ϕ3⟩. In the next chapter, we
deal with molecular orbital (MO) calculations. Including the present case of the
allyl radical, we solve the energy eigenvalue problem by appropriately determining
those coefficients (i.e., eigenvectors) and corresponding energy eigenvalues in a
representation space. A dimension of the representation space depends on the
number of molecular orbitals. The representation space is decomposed into several
or more (but finite) orthogonal complements according to (18.38). We will come
back to the present example in Sect. 19.4.4 and further investigate the problems
there.
As can be seen in the above example, the representation space accommodates
various types of vectors, e.g., mathematical functions in the present case. If the
representation space is decomposed into invariant subspaces, we can choose appro-
priate basis vectors for each subspace, the number of which is equal to the dimen-
sionality of each irreducible representation [4]. In this context, the situation is related
to that of Part III where we have studied how a linear vector space is decomposed
into direct sum of invariant subspaces that are spanned by associated basis vectors.
In particular, it will be of great importance to construct mutually orthogonal
symmetry-adapted vectors through linear combination of original basis vectors in
each subspace associated with the irreducible representation. We further study these
important subjects in the following several sections.
Pivotal notions of the representation theory of finite groups rest upon Schur’s
lemmas (first lemma and second lemma).
Schur's First Lemma [1, 2] Let D and $\tilde{D}$ be two irreducible representations of ℊ. Let the dimensions of the representations D and $\tilde{D}$ be m and n, respectively. Suppose that with ∀g ∈ ℊ the following relation holds:

$$D(g)M = M\tilde{D}(g), \quad (18.41)$$
Proof
(a) First, suppose that m > n. Let B = {ψ1, ψ2, ⋯, ψm} be a basis set of the representation space related to D. Then we have

$$g(\psi_\nu) = \sum_{\mu=1}^{m} \psi_\mu D_{\mu\nu}(g) \quad (1 \le \nu \le m). \quad (18.43)$$
$$\tilde{D}(g)^\dagger M^\dagger = M^\dagger D(g)^\dagger. \quad (18.44)$$

Therefore, we get

$$D(g^{-1}) = D(g)^{-1} = D(g)^\dagger.$$

Similarly, we have

$$\tilde{D}(g^{-1}) = \tilde{D}(g)^{-1} = \tilde{D}(g)^\dagger,$$
where M is an (m, m) square matrix. If det M = 0, then ϕ1, ϕ2, ⋯, and ϕm are linearly dependent (see Sect. 11.4). With the number of linearly independent vectors p, we have p < m accordingly. As in (18.43) again, this implies that we would have obtained a representation of a smaller dimension p for $\tilde{D}$, in contradiction to the supposition that $\tilde{D}$ is irreducible. To avoid this contradiction, we must have det M ≠ 0. These complete the proof. ∎
Fig. 18.2 Magnitude relationship between dimensions of representation matrices and M. (a) m > n: D(g), of size (m, m), times M, of size (m, n), equals M times D̃(g), of size (n, n). (b) m < n: correspondingly, with the sizes (n, n), (n, m), (n, m), and (m, m). The diagram is based on (18.41) and (18.46) of the text
$$M = cE. \quad (18.50)$$
∎
Schur's lemmas lead to an important orthogonality theorem that plays a fundamental role in many scientific fields. The orthogonality theorem covers both matrices and their traces (or characters).
Theorem 18.3: Grand Orthogonality Theorem (GOT) [5] Let D(1), D(2), ⋯ be all the inequivalent irreducible representations of a group ℊ = {g1, g2, ⋯, gn} of order n. Let D(α) and D(β) be two irreducible representations chosen from among D(1), D(2), ⋯. Then, regarding their matrix representations, we have the following relationship:
18.3 Schur’s Lemmas and Grand Orthogonality Theorem (GOT) 695
$$\sum_g D_{ij}^{(\alpha)}(g)^* D_{kl}^{(\beta)}(g) = \frac{n}{d_\alpha}\,\delta_{\alpha\beta}\,\delta_{ik}\,\delta_{jl}, \quad (18.51)$$

where Σg means that the summation should be taken over all n group elements; dα denotes the dimension of the representation D(α). The symbol δαβ means that δαβ = 1 when D(α) and D(β) are equivalent and that δαβ = 0 when D(α) and D(β) are inequivalent.
Proof First we prove the case where D(α) = D(β). For the sake of simple expression we omit the superscript and denote D(α) simply by D. Let us construct a matrix A such that

$$A = \sum_g D(g)XD(g^{-1}), \quad (18.52)$$
Thanks to the rearrangement theorem, for fixed g′ the element g′g runs through all the group elements as g does. Therefore, we have

$$\sum_g D(g'g)XD\left[(g'g)^{-1}\right] = \sum_g D(g)XD(g^{-1}) = A. \quad (18.54)$$
Thus,

$$A = \lambda E. \quad (18.56)$$

The value of the constant λ depends upon the choice of X. Let X be $\delta_i^{(l)}\delta_j^{(m)}$, where all the matrix elements are zero except for the (l, m)-component, which takes 1 (Sects. 12.5 and 12.6). Thus from (18.53) we have

$$\sum_{g,p,q} D_{ip}(g)\,\delta_p^{(l)}\delta_q^{(m)}\,D_{qj}(g^{-1}) = \sum_g D_{il}(g)D_{mj}(g^{-1}) = \lambda_{lm}\delta_{ij}, \quad (18.57)$$

$$\sum_g D_{il}(g)D_{jm}(g)^* = \lambda_{lm}\delta_{ij}. \quad (18.58)$$
$$\sum_g \sum_i D_{il}(g)D_{mi}(g^{-1}) = \sum_g \left[D(g^{-1})D(g)\right]_{ml} = \sum_g \left[D(e)\right]_{ml} = \sum_g \delta_{ml} = n\delta_{ml}, \quad (18.59)$$

$$\lambda_{lm}\,d = n\delta_{lm} \quad \text{or} \quad \lambda_{lm} = \frac{n}{d}\delta_{lm}. \quad (18.61)$$
$$D^{(\alpha)}(g')B = \sum_g D^{(\alpha)}(g')D^{(\alpha)}(g)XD^{(\beta)}(g^{-1})$$
$$= \sum_g D^{(\alpha)}(g')D^{(\alpha)}(g)XD^{(\beta)}(g^{-1})D^{(\beta)}(g'^{-1})D^{(\beta)}(g')$$
$$= \sum_g D^{(\alpha)}(g'g)XD^{(\beta)}\left[(g'g)^{-1}\right]D^{(\beta)}(g') = BD^{(\beta)}(g'). \quad (18.65)$$

$$B = 0. \quad (18.66)$$

Putting $X = \delta_i^{(l)}\delta_j^{(m)}$ as before and rewriting (18.64), we get

$$\sum_g D_{il}^{(\alpha)}(g)D_{jm}^{(\beta)}(g)^* = 0. \quad (18.67)$$
Combining (18.63) and (18.67), we get (18.51). These procedures complete the
proof. ∎
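The GOT (18.51) can be verified numerically for a small group. The sketch below (our own construction) uses C3v of order n = 6 with its one-dimensional trivial irrep and its two-dimensional irrep E, realized by rotation and reflection matrices:

```python
import numpy as np

def rot(t):
    return np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])

def ref(t):   # reflection about the line at angle t/2
    return np.array([[np.cos(t), np.sin(t)], [np.sin(t), -np.cos(t)]])

# The six elements of C3v in the 2-dimensional irrep E.
E_irrep = [rot(0), rot(2 * np.pi / 3), rot(4 * np.pi / 3),
           ref(0), ref(2 * np.pi / 3), ref(4 * np.pi / 3)]
A1 = [np.eye(1)] * 6    # the trivial irrep
n, dE = 6, 2

# Same irrep, same indices: sum_g |D_00(g)|^2 = n / d_E.
s_same = sum(D[0, 0].conjugate() * D[0, 0] for D in E_irrep)
# Same irrep, mismatched indices: the sum vanishes.
s_off = sum(D[0, 0].conjugate() * D[1, 1] for D in E_irrep)
# Inequivalent irreps: the sum vanishes.
s_diff = sum(a[0, 0].conjugate() * D[0, 0] for a, D in zip(A1, E_irrep))
```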
18.4 Characters
where g stands for the group elements g1, g2, ⋯, and gn; Tr stands for "trace"; d is the dimension of the representation D. Let ∁ be a set defined as

This is because

$$\sum_i (PQ)_{ii} = \sum_i \sum_j P_{ij}Q_{ji} = \sum_j \sum_i Q_{ji}P_{ij} = \sum_j (QP)_{jj}. \quad (18.71)$$
$$\chi(g_i) = \chi(g_j). \quad (18.75)$$
(iv) Any two equivalent representations have the same trace. This immediately
follows from (18.30).
There are several orthogonality theorems about traces. Among them, the following theorem is well known.
Theorem 18.4 A trace of irreducible representations satisfies the following orthog-
onality relation:
\[
\sum_g \chi^{(\alpha)}(g)^{*}\, \chi^{(\beta)}(g) = n\, \delta_{\alpha\beta}, \tag{18.76}
\]
where $\chi^{(\alpha)}$ and $\chi^{(\beta)}$ are traces of the irreducible representations $D^{(\alpha)}$ and $D^{(\beta)}$, respectively.
Proof In (18.51) putting $i = j$ and $k = l$ on both sides and summing over all $i$ and $k$, we have
\[
\sum_g \sum_{i,k} D_{ii}^{(\alpha)}(g)^{*}\, D_{kk}^{(\beta)}(g)
= \sum_{i,k} \frac{n}{d_\alpha}\, \delta_{\alpha\beta}\, \delta_{ik}\, \delta_{ik}
= \frac{n}{d_\alpha}\, \delta_{\alpha\beta} \sum_{i,k} \delta_{ik}
= \frac{n}{d_\alpha}\, \delta_{\alpha\beta}\, d_\alpha = n\, \delta_{\alpha\beta}. \tag{18.77}
\]
From (18.68) and (18.77), we get (18.76). This completes the proof. ∎
Since a character takes the same value for all group elements belonging to the same conjugacy class $K_l$, we may write it as $\chi(K_l)$ and rewrite the summation of (18.76) as a summation over the conjugacy classes. Thus, we have
\[
\sum_{l=1}^{n_c} \chi^{(\alpha)}(K_l)^{*}\, \chi^{(\beta)}(K_l)\, k_l = n\, \delta_{\alpha\beta}, \tag{18.78}
\]
where nc denotes the number of conjugacy classes in a group and kl indicates the
number of group elements contained in a class Kl.
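Relation (18.78) can be checked directly on a character table. The following sketch assumes the standard character table of $C_{3v}$, whose classes $\{e\}$, $\{2C_3\}$, $\{3\sigma_v\}$ have $k_l = 1, 2, 3$ (the table itself is quoted from standard tabulations, not from this text):

```python
import numpy as np

n = 6                      # order of C3v
k = np.array([1, 2, 3])    # class sizes: {e}, {2 C3}, {3 sigma_v}
chars = {                  # standard C3v character table
    "A1": np.array([1,  1,  1]),
    "A2": np.array([1,  1, -1]),
    "E":  np.array([2, -1,  0]),
}
# (18.78): sum_l chi_a(K_l)* chi_b(K_l) k_l = n delta_ab
for a, ca in chars.items():
    for b, cb in chars.items():
        s = int(np.sum(np.conj(ca) * cb * k))
        assert s == (n if a == b else 0)
print("character orthogonality (18.78) holds for C3v")
```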
We have seen a case where a representation matrix can be reduced to two
(or more) block matrices as in (18.35). Also, as already seen in Part III, decomposition into block matrices takes place with normal matrices (including unitary matrices). The character is often used to examine the constitution of a reducible representation or a reducible matrix.
Alternatively, if a unitary matrix (a normal matrix, more widely) is decomposed
into block matrices, we say that the unitary matrix comprises a direct sum of those
block matrices. In physics and chemistry dealing with atoms, molecules, crystals,
etc., we very often encounter such a situation. Extending (18.36), the relation can
generally be summarized as
where $D(g_i)$ is a reducible representation for a group element $g_i$; $D^{(1)}(g_i), D^{(2)}(g_i), \ldots$, and $D^{(\omega)}(g_i)$ are irreducible representations in the group. The notation $D^{(\omega)}(g_i)$ means
that D(ω)(gi) may be equivalent (or identical) to D(1)(gi), D(2)(gi), etc. or may be
inequivalent to them. More specifically, the same irreducible representations may
well appear several times.
To make the above situation clear, we usually use the following equation instead:
\[
D(g_i) = \sum_\alpha q_\alpha\, D^{(\alpha)}(g_i). \tag{18.80}
\]
Taking the trace of both sides,
\[
\chi(g) = \sum_\alpha q_\alpha\, \chi^{(\alpha)}(g), \tag{18.81}
\]
where we omitted the subscript $i$ indicating an element. To find $q_\alpha$, let us multiply both sides of (18.81) by $\chi^{(\alpha)}(g)^{*}$ and take the summation over the group elements. That is,
\[
\sum_g \chi^{(\alpha)}(g)^{*}\, \chi(g) = \sum_\beta q_\beta \sum_g \chi^{(\alpha)}(g)^{*}\, \chi^{(\beta)}(g) = \sum_\beta q_\beta\, n\, \delta_{\alpha\beta} = q_\alpha\, n, \tag{18.82}
\]
that is,
\[
q_\alpha = \frac{1}{n} \sum_g \chi^{(\alpha)}(g)^{*}\, \chi(g). \tag{18.83}
\]
The integer $q_\alpha$ gives the number of times $D^{(\alpha)}(g)$ appears in the reducible representation $D(g)$. The expression in terms of the classes is
\[
q_\alpha = \frac{1}{n} \sum_i \chi^{(\alpha)}(K_i)^{*}\, \chi(K_i)\, k_i, \tag{18.84}
\]
where $K_i$ and $k_i$ denote the $i$-th class of the group and the number of elements belonging to $K_i$, respectively.
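As a worked illustration of (18.84) (our own example, not from the text), take the three-dimensional permutation representation of $C_{3v}$ acting on the vertices of an equilateral triangle; its class characters are $\chi(K_i) = (3, 0, 1)$:

```python
import numpy as np

n = 6
k = np.array([1, 2, 3])                 # class sizes of C3v
chi_red = np.array([3, 0, 1])           # characters of the 3-D permutation rep
table = {"A1": np.array([1, 1, 1]),
         "A2": np.array([1, 1, -1]),
         "E":  np.array([2, -1, 0])}
# (18.84): q_alpha = (1/n) sum_i chi^(alpha)(K_i)* chi(K_i) k_i
q = {a: int(round(float(np.sum(np.conj(c) * chi_red * k) / n)))
     for a, c in table.items()}
assert q == {"A1": 1, "A2": 0, "E": 1}  # the permutation rep reduces to A1 + E
print("q =", q)
```

The result $q_{A_1} = 1$, $q_{A_2} = 0$, $q_E = 1$ says the permutation representation decomposes as $A_1 + E$.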
Now, the reader may wonder how many different irreducible representations exist for a group. To answer this question, let us introduce a special representation called the regular representation.
Definition 18.7 Let $\mathscr{g} = \{g_1, g_2, \ldots, g_n\}$ be a group. Let us define an $(n, n)$ square matrix $D^{(R)}(g_\nu)$ for an arbitrary group element $g_\nu$ $(1 \le \nu \le n)$ such that
\[
\left[ D^{(R)}(g_\nu) \right]_{ij} = \delta\!\left(g_i^{-1}\, g_\nu\, g_j\right) \quad (1 \le i, j \le n), \tag{18.85}
\]
where
\[
\delta(g_\nu) =
\begin{cases}
1 & \text{for } g_\nu = e \ (\text{i.e., identity}), \\
0 & \text{for } g_\nu \ne e.
\end{cases} \tag{18.86}
\]
The $(i, k)$ element of the product $D^{(R)}(g_\nu)\, D^{(R)}(g_\mu)$ is nonvanishing only if
\[
g_\nu\, g_\mu\, g_k = g_i. \tag{18.88}
\]
That is,
\[
g_i^{-1}\, g_\nu\, g_\mu\, g_k = e. \tag{18.89}
\]
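Definition 18.7 can be exercised on a small concrete group. The sketch below (our own illustration) builds the regular representation of the cyclic group $\mathbb{Z}_3$ from (18.85), confirms the homomorphism property implied by (18.88) and (18.89), and checks that $\chi^{(R)}(e) = n$ while $\chi^{(R)}(g_\nu) = 0$ for $g_\nu \ne e$:

```python
import numpy as np

n = 3  # cyclic group Z3 = {0, 1, 2} under addition mod 3

def D_reg(nu):
    # (18.85): [D^R(g_nu)]_ij = 1 iff g_i^{-1} g_nu g_j = e, i.e. i = (nu + j) mod 3
    return np.array([[1 if (-i + nu + j) % n == 0 else 0
                      for j in range(n)] for i in range(n)])

mats = [D_reg(v) for v in range(n)]
assert np.array_equal(mats[0], np.eye(n))       # D^R(e) = E
for a in range(n):
    for b in range(n):                          # homomorphism property
        assert np.array_equal(mats[a] @ mats[b], mats[(a + b) % n])
traces = [int(np.trace(m)) for m in mats]
assert traces == [n, 0, 0]                      # chi^R(e) = n, zero elsewhere
print("regular representation of Z3 verified")
```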
As can be seen from (18.92), the regular representation is reducible because the
matrix is decomposed into block matrices. Therefore, the representation can be
reduced to a direct sum of irreducible representations such that
\[
D^{(R)} = \sum_\alpha q_\alpha\, D^{(\alpha)}, \tag{18.95}
\]
or, taking traces,
\[
\chi^{(R)}(g_\nu) = \sum_\alpha q_\alpha\, \chi^{(\alpha)}(g_\nu), \tag{18.96}
\]
where $\chi^{(\alpha)}$ is the trace of the irreducible representation $D^{(\alpha)}$. Using (18.83) and (18.94),
\[
q_\alpha = \frac{1}{n} \sum_g \chi^{(\alpha)}(g)^{*}\, \chi^{(R)}(g)
= \frac{1}{n}\, \chi^{(\alpha)}(e)^{*}\, \chi^{(R)}(e)
= \frac{1}{n}\, \chi^{(\alpha)}(e)\, n = \chi^{(\alpha)}(e) = d_\alpha. \tag{18.97}
\]
Note that a dimension dα of the representation D(α) is equal to a trace of its identity
matrix. Also notice from (18.95) and (18.97) that D(R) contains every irreducible
representation D(α) dα times. To show it more clearly, Table 18.4 gives a character
table of C2v. The regular representation matrices are given in Example 18.2. For this,
we have
\[
D^{(R)}(C_{2v}) = A_1 + A_2 + B_1 + B_2. \tag{18.98}
\]
This relation obviously indicates that all the irreducible representations of $C_{2v}$ are each contained once (that is, $d_\alpha$ times, where every irreducible representation of $C_{2v}$ has dimension 1).
Returning to (18.94) and (18.96) and replacing qα with dα there, we get
\[
\sum_\alpha d_\alpha\, \chi^{(\alpha)}(g_\nu) =
\begin{cases}
n & \text{for } g_\nu = e, \\
0 & \text{for } g_\nu \ne e.
\end{cases} \tag{18.99}
\]
This is a very important relation in the representation theory of finite groups in that (18.100), namely $\sum_\alpha d_\alpha^{\,2} = n$ (obtained by putting $g_\nu = e$ in (18.99)), sets an upper limit on the number of irreducible representations and their dimensions. That number cannot exceed the order of the group.
Also, for two elements $\aleph = \sum_g a_g\, g$ and $\aleph' = \sum_{g'} a'_{g'}\, g'$ of the group algebra, we get
\[
\aleph\, \aleph' = \sum_g a_g\, g \sum_{g'} a'_{g'}\, g'
= \sum_g \sum_{g'} a_g\, a'_{g'}\, g g'
= \sum_g \sum_{g'} a_{g g'}\, a'_{g'^{-1}}\, g, \tag{18.104}
\]
where with the last equality we replaced $g$ with $g g'$ and $g'$ with $g'^{-1}$, using the fact that both summations run over all group elements.
\[
g K_i\, g^{-1} \subset K_i. \tag{18.106}
\]
Multiplying by $g^{-1}$ from the left and $g$ from the right on both sides, we get $K_i \subset g^{-1} K_i\, g$. Since $g$ is arbitrarily chosen, replacing $g$ with $g^{-1}$ we have
\[
K_i \subset g K_i\, g^{-1}. \tag{18.107}
\]
Therefore, we get
\[
g K_i\, g^{-1} = K_i. \tag{18.108}
\]
Meanwhile, for $A_\alpha^{(i)}, A_\beta^{(i)} \in K_i$ with $A_\alpha^{(i)} \ne A_\beta^{(i)}$ $(1 \le \alpha, \beta \le k_i)$ we have
\[
g A_\alpha^{(i)} g^{-1} \ne g A_\beta^{(i)} g^{-1} \quad (\forall g \in \mathscr{g}). \tag{18.109}
\]
This is because if the equality held in (18.109), we would have $A_\alpha^{(i)} = A_\beta^{(i)}$, in contradiction.
(c) Let $K$ be a set collecting several classes, described as
\[
K = \sum_i a_i\, K_i. \tag{18.110}
\]
\[
g K g^{-1} = K. \tag{18.111}
\]
where the equality before the last comes from (18.111). Thus, from (18.113) we get
\[
g Q g^{-1} = Q \quad (\forall g \in \mathscr{g}). \tag{18.114}
\]
706 18 Representation Theory of Groups
By definition of the classes, this implies that Q must form a “complete” class,
in contradiction to the supposition. Thus, (18.110) holds.
In Sect. 16.3, we described several characteristics of an invariant subgroup.
As readily seen from (16.11) and (18.111), any invariant subgroup consists of
two or more entire classes. Conversely, if a group comprises entire classes, it
must be an invariant subgroup.
(d) Let us think of a product of classes. Let $K_i$ and $K_j$ be sets described as (18.105). We define a product $K_i K_j$ as a set containing the products $A_\alpha^{(i)} A_\beta^{(j)}$ $\left(1 \le \alpha \le k_i,\ 1 \le \beta \le k_j\right)$. That is,
Multiplying (18.115) by $g$ $(\forall g \in \mathscr{g})$ from the left and by $g^{-1}$ from the right, we get
\[
g K_i K_j\, g^{-1} = \left(g K_i\, g^{-1}\right)\left(g K_j\, g^{-1}\right) = K_i K_j. \tag{18.116}
\]
Thus, given a class $K_i$, there exists another class $K_{i'}$ that consists of the inverses of the elements of $K_i$. If $g_\mu \ne g_\nu$, then $g_\mu^{-1} \ne g_\nu^{-1}$. Therefore, $K_i$ and $K_{i'}$ are of the same order. Suppose that the number of elements contained in $K_i$ is $k_i$ and that in $K_{i'}$ is $k_{i'}$. Then, we have
\[
k_i = k_{i'}. \tag{18.120}
\]
18.6 Classes and Irreducible Representations
\[
b = g_\rho^{-1} \quad \text{and} \quad b \in K_{i'}. \tag{18.121}
\]
This would mean that $b \in K_j \cap K_{i'}$, implying that $K_j = K_{i'}$ by the definition of a class. This contradicts $K_j \ne K_{i'}$. Consequently, $K_i K_j$ does not contain $e$.
Taking the product of the classes $K_i$ and $K_{i'}$, we obtain the identity $e$ precisely $k_i$ times. Rewriting (18.117), we have
\[
K_i K_j = c_{ij1}\, K_1 + \sum_{l \ne 1} c_{ijl}\, K_l, \tag{18.122}
\]
where $K_1 = \{e\}$. As mentioned below, we are most interested in the first term in (18.122). In (18.122), if $K_j = K_{i'}$, we have
\[
c_{ij1} = k_i. \tag{18.123}
\]
\[
\widehat{K}_i = \lambda E. \tag{18.130}
\]
To determine $\lambda$, we take the trace of both sides of (18.130). Then, from (18.128) and (18.130) we get
\[
k_i\, \chi^{(\alpha)}(K_i) = \lambda\, d_\alpha. \tag{18.131}
\]
Thus, we get
\[
\widehat{K}_i = \frac{k_i\, \chi^{(\alpha)}(K_i)}{d_\alpha}\, E. \tag{18.132}
\]
\[
k_i\, k_j\, \chi^{(\alpha)}(K_i)\, \chi^{(\alpha)}\!\left(K_j\right) = d_\alpha \sum_l c_{ijl}\, k_l\, \chi^{(\alpha)}(K_l), \tag{18.134}
\]
where again we have $K_1 = \{e\}$. With respect to (18.134), we sum over all the irreducible representations $\alpha$. Then we have
\[
\sum_\alpha k_i\, k_j\, \chi^{(\alpha)}(K_i)\, \chi^{(\alpha)}\!\left(K_j\right)
= \sum_l c_{ijl}\, k_l \Bigl[ \sum_\alpha d_\alpha\, \chi^{(\alpha)}(K_l) \Bigr]
= \sum_l c_{ijl}\, k_l\, n\, \delta_{l1} = c_{ij1}\, n. \tag{18.136}
\]
we get
\[
k_i\, k_j \sum_\alpha \chi^{(\alpha)}(K_i)\, \chi^{(\alpha)}\!\left(K_j\right) = c_{ij1}\, n = k_i\, n\, \delta_{j i'}. \tag{18.139}
\]
Equations (18.78) and (18.126) are well known as orthogonality relations. So far, we have no idea about the mutual relationship in magnitude between the two numbers $n_c$ and $n_r$. In (18.78), let us consider the following set $S$:
\[
S = \left\{ \sqrt{k_1}\, \chi^{(\alpha)}(K_1),\ \sqrt{k_2}\, \chi^{(\alpha)}(K_2),\ \ldots,\ \sqrt{k_{n_c}}\, \chi^{(\alpha)}(K_{n_c}) \right\}. \tag{18.140}
\]
\[
n_r \le n_c, \tag{18.141}
\]
and similarly
\[
n_c \le n_r. \tag{18.143}
\]
Thus, from (18.141) and (18.143) we finally reach a simple but very important
conclusion about the relationship between nc and nr such that
nr ¼ n c : ð18:144Þ
That is, the number (nr) of inequivalent irreducible representations is equal to that
(nc) of conjugacy classes of the group. An immediate and important consequence of
(18.144) together with (18.100) is that every irreducible representation of an Abelian group is one-dimensional. This is because in an Abelian group each group element constitutes a conjugacy class by itself, so that $n_c = n$ accordingly.
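The Abelian case can be made concrete. For $\mathbb{Z}_4$, the $n_r = n_c = 4$ irreducible representations are all one-dimensional; with the generator represented by the imaginary unit, their characters are $\chi_j(k) = i^{jk}$. This numerical sketch (our own example) checks that they satisfy (18.76):

```python
import numpy as np

# Z4: every element is its own class, so n_c = n = 4 and all irreps are 1-D.
n = 4
chars = np.array([[1j**(j * k) for k in range(n)] for j in range(n)])
gram = chars.conj() @ chars.T   # entries: sum_g chi_a(g)* chi_b(g)
assert np.allclose(gram, n * np.eye(n))   # (18.76) with d_alpha = 1
print("Z4 has 4 one-dimensional irreps, n_r = n_c = 4")
```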
In Sect. 18.2 we have described how basis vectors (or functions) are transformed by
a symmetry operation. In that case the symmetry operation is performed by a group
element that belongs to a transformation group. More specifically, given a group $\mathscr{g} = \{g_1, g_2, \ldots, g_n\}$ and a set of basis vectors $\psi_1, \psi_2, \ldots$, and $\psi_d$, the basis vectors are transformed by $g_i \in \mathscr{g}$ $(1 \le i \le n)$ such that
18.7 Projection Operators
\[
g_i(\psi_\nu) = \sum_{\mu=1}^{d} \psi_\mu\, D_{\mu\nu}(g_i). \tag{18.22}
\]
We may well ask then how we can construct such basis vectors. In this section we
address this question. A central concept about this is a projection operator. We have
already studied the definition and basic properties of the projection operators. In this
section, we deal with them bearing in mind that we apply the group theory to
molecular science, especially to quantum chemical calculations.
In Sect. 18.6, we examined the permissible number of irreducible representations.
We have reached a conclusion that the number (nr) of inequivalent irreducible
representations is equal to that (nc) of conjugacy classes of the group. According
to this conclusion, we modify the above equation (18.22) such that
\[
g\left(\psi_i^{(\alpha)}\right) = \sum_{j=1}^{d_\alpha} \psi_j^{(\alpha)}\, D_{ji}^{(\alpha)}(g), \tag{18.145}
\]
where α and dα denote the α-th irreducible representation and its dimension,
respectively; the subscript i is omitted from gi for simplicity. This naturally leads
to the next question of how the basis vectors $\psi_j^{(\alpha)}$ are related to $\psi_j^{(\beta)}$, which are basis vectors as well but belong to a different irreducible representation $\beta$.
Suppose that we have an arbitrarily chosen function $f$. Then, $f$ is assumed to contain various components of different irreducible representations. Thus, let us assume that $f$ is decomposed into these components such that
\[
f = \sum_\alpha \sum_m c_m^{(\alpha)}\, \psi_m^{(\alpha)}, \tag{18.146}
\]
where $c_m^{(\alpha)}$ is a coefficient of the expansion and $\psi_m^{(\alpha)}$ is the $m$-th component of the $\alpha$-th irreducible representation.
Now, let us define the following operator:
\[
P_{l(m)}^{(\alpha)} \equiv \frac{d_\alpha}{n} \sum_g D_{lm}^{(\alpha)}(g)^{*}\, g, \tag{18.147}
\]
where $\sum_g$ means that the summation should be taken over all $n$ group elements $g$; $d_\alpha$ denotes the dimension of the representation $D^{(\alpha)}$. Operating with $P_{l(m)}^{(\alpha)}$ on $f$, we have
\[
P_{l(m)}^{(\alpha)} f = c_m^{(\alpha)}\, \psi_l^{(\alpha)}. \tag{18.148}
\]
\[
P_{l(m)}^{(\alpha)}\, P_{s(t)}^{(\beta)} = \delta_{\alpha\beta}\, \delta_{ms}\, P_{l(t)}^{(\alpha)}. \tag{18.149}
\]
Proof We have
\[
P_{l(m)}^{(\alpha)}\, P_{m(t)}^{(\beta)} = \delta_{\alpha\beta}\, P_{l(t)}^{(\alpha)}. \tag{18.151}
\]
Putting $t = l$ furthermore,
\[
P_{l(m)}^{(\alpha)}\, P_{m(l)}^{(\alpha)} = P_{l(l)}^{(\alpha)}. \tag{18.153}
\]
Hence,
\[
P_{l(l)}^{(\alpha)}\, P_{l(l)}^{(\alpha)} = \left[ P_{l(l)}^{(\alpha)} \right]^{2} = P_{l(l)}^{(\alpha)}. \tag{18.154}
\]
Also,
\[
P_{l(l)}^{(\alpha)} f = c_l^{(\alpha)}\, \psi_l^{(\alpha)}. \tag{18.155}
\]
This means that the term $c_l^{(\alpha)} \psi_l^{(\alpha)}$ has been extracted entirely.
Moreover, in (18.149) putting $\beta = \alpha$, $s = l$, and $t = m$, we have
\[
P_{l(m)}^{(\alpha)}\, P_{l(m)}^{(\alpha)} = \left[ P_{l(m)}^{(\alpha)} \right]^{2} = \delta_{ml}\, P_{l(m)}^{(\alpha)}.
\]
Therefore, for $P_{l(m)}^{(\alpha)}$ to be an idempotent operator, we must have $m = l$. That is, of the various operators $P_{l(m)}^{(\alpha)}$, only $P_{l(l)}^{(\alpha)}$ is eligible as an idempotent operator.
Meanwhile, fully describing $P_{l(l)}^{(\alpha)}$ we have
\[
P_{l(l)}^{(\alpha)} = \frac{d_\alpha}{n} \sum_g D_{ll}^{(\alpha)}(g)^{*}\, g. \tag{18.156}
\]
Taking its adjoint, we obtain
\[
\left[ P_{l(l)}^{(\alpha)} \right]^{\dagger}
= \frac{d_\alpha}{n} \sum_g D_{ll}^{(\alpha)}(g)\, g^{\dagger}
= \frac{d_\alpha}{n} \sum_g D_{ll}^{(\alpha)}(g)\, g^{-1}
= \frac{d_\alpha}{n} \sum_g D_{ll}^{(\alpha)}\!\left(g^{-1}\right) g
= \frac{d_\alpha}{n} \sum_g D_{ll}^{(\alpha)}(g)^{*}\, g = P_{l(l)}^{(\alpha)}, \tag{18.157}
\]
where we used the unitarity of $g$ and the equivalence of summation over $g$ and $g^{-1}$. Notice that the notation $g^{\dagger}$ is less common. It should be interpreted as meaning that $g^{\dagger}$ operates on a vector constituting a representation space. Thus, the notation $g^{\dagger}$ implies that $g^{\dagger}$ is equivalent to the adjoint of its unitary representation matrix $D^{(\alpha)}(g)$. Note also that $D_{ll}^{(\alpha)}$ is not a matrix but a (complex) number. Namely, in (18.156) $D_{ll}^{(\alpha)}(g)^{*}$ is a coefficient of the operator $g$.
Equations (18.154) and (18.157) establish that $P_{l(l)}^{(\alpha)}$ is a projection operator in the rigorous sense. Let us call $P_{l(l)}^{(\alpha)}$ a projection operator sensu stricto accordingly. Also, we notice that $c_m^{(\alpha)} \psi_m^{(\alpha)}$ is entirely extracted from an arbitrary function $f$, including the coefficient. This situation resembles that of (12.205).
Regarding $P_{l(m)}^{(\alpha)}$ $(l \ne m)$, on the other hand, we have
\[
\left[ P_{l(m)}^{(\alpha)} \right]^{\dagger}
= \frac{d_\alpha}{n} \sum_g D_{lm}^{(\alpha)}(g)\, g^{\dagger}
= \frac{d_\alpha}{n} \sum_g D_{lm}^{(\alpha)}\!\left(g^{-1}\right) g
= \frac{d_\alpha}{n} \sum_g D_{ml}^{(\alpha)}(g)^{*}\, g = P_{m(l)}^{(\alpha)}. \tag{18.158}
\]
Hence, $P_{l(m)}^{(\alpha)}$ $(l \ne m)$ is not Hermitian. We have many other related equations. For instance,
\[
\left[ P_{l(m)}^{(\alpha)}\, P_{m(l)}^{(\alpha)} \right]^{\dagger}
= \left[ P_{m(l)}^{(\alpha)} \right]^{\dagger} \left[ P_{l(m)}^{(\alpha)} \right]^{\dagger}
= P_{l(m)}^{(\alpha)}\, P_{m(l)}^{(\alpha)}. \tag{18.159}
\]
Therefore, $P_{l(m)}^{(\alpha)} P_{m(l)}^{(\alpha)}$ is Hermitian, recovering the relation (18.153).
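The identities (18.154), (18.157), and (18.158) can be verified numerically by realizing each group element $g$ by its own irreducible representation matrix, as the remark on $g^{\dagger}$ above suggests. The sketch below (our own illustration) uses the two-dimensional E irrep of $C_{3v}$:

```python
import numpy as np

def rot(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s], [s, c]])

M = np.diag([1.0, -1.0])
group = [rot(2*np.pi*k/3) for k in range(3)] + [M @ rot(2*np.pi*k/3) for k in range(3)]
n, d = len(group), 2

def P(l, m):
    # (18.147): P_{l(m)} = (d/n) sum_g D_lm(g)* g, with g realized by its E-irrep matrix
    return (d / n) * sum(np.conj(D[l, m]) * D for D in group)

P00 = P(0, 0)
assert np.allclose(P00 @ P00, P00)          # idempotent (18.154)
assert np.allclose(P00, P00.conj().T)       # Hermitian (18.157)
P01 = P(0, 1)
assert np.allclose(P01 @ P01, 0)            # not idempotent when l != m
assert np.allclose(P01.conj().T, P(1, 0))   # (18.158): adjoint of P_{l(m)} is P_{m(l)}
print("projection-operator identities verified in the E irrep of C3v")
```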
In (18.149) putting $m = l$ and $t = s$, we get
\[
P_{l(l)}^{(\alpha)}\, P_{s(s)}^{(\beta)} = \delta_{\alpha\beta}\, \delta_{ls}\, P_{l(s)}^{(\alpha)}. \tag{18.160}
\]
Similarly to (18.146), for another arbitrary function $h$ we assume the expansion
\[
h = \sum_\alpha \sum_m d_m^{(\alpha)}\, \phi_m^{(\alpha)}, \tag{18.161}
\]
where $\phi_m^{(\alpha)}$ is transformed in the same manner as $\psi_m^{(\alpha)}$ in (18.146), but is linearly independent of $\psi_m^{(\alpha)}$. Namely, we have
\[
g\left(\phi_i^{(\alpha)}\right) = \sum_{j=1}^{d_\alpha} \phi_j^{(\alpha)}\, D_{ji}^{(\alpha)}(g). \tag{18.162}
\]
\[
\left[ P_{l(l)}^{(\alpha)} \right]^{2} f = P_{l(l)}^{(\alpha)} \left[ c_l^{(\alpha)}\, \psi_l^{(\alpha)} \right] = P_{l(l)}^{(\alpha)} f = c_l^{(\alpha)}\, \psi_l^{(\alpha)}, \tag{18.163}
\]
where with the second equality we used $\left[ P_{l(l)}^{(\alpha)} \right]^{2} = P_{l(l)}^{(\alpha)}$. That is, we have
\[
P_{l(l)}^{(\alpha)} \left[ c_l^{(\alpha)}\, \psi_l^{(\alpha)} \right] = c_l^{(\alpha)}\, \psi_l^{(\alpha)}. \tag{18.164}
\]
This equation means that $c_l^{(\alpha)} \psi_l^{(\alpha)}$ is an eigenfunction corresponding to an eigenvalue 1 of $P_{l(l)}^{(\alpha)}$. In other words, once $c_l^{(\alpha)} \psi_l^{(\alpha)}$ is extracted from $f$, it belongs to the "position" $l$ of the irreducible representation $\alpha$. Furthermore, for some constants $c$ and $d$ as well as the functions $f$ and $h$ that appeared in (18.146) and (18.161), we consider the following equation:
\[
P_{l(l)}^{(\alpha)} \left[ c\, c_l^{(\alpha)}\, \psi_l^{(\alpha)} + d\, d_l^{(\alpha)}\, \phi_l^{(\alpha)} \right]
= c\, P_{l(l)}^{(\alpha)} c_l^{(\alpha)}\, \psi_l^{(\alpha)} + d\, P_{l(l)}^{(\alpha)} d_l^{(\alpha)}\, \phi_l^{(\alpha)}
= c\, c_l^{(\alpha)}\, \psi_l^{(\alpha)} + d\, d_l^{(\alpha)}\, \phi_l^{(\alpha)}, \tag{18.165}
\]
where with the last equality we used (18.164). This means that an arbitrary linear combination of $c_l^{(\alpha)} \psi_l^{(\alpha)}$ and $d_l^{(\alpha)} \phi_l^{(\alpha)}$ again belongs to the position $l$ of the irreducible representation $\alpha$. If $\psi_l^{(\alpha)}$ and $\phi_l^{(\alpha)}$ are linearly independent, we can construct two orthonormal basis vectors following Theorem 13.2 (Gram–Schmidt orthonormalization theorem). If there are other linearly independent vectors $p_l^{(\alpha)} \xi_l^{(\alpha)}, q_l^{(\alpha)} \varphi_l^{(\alpha)}, \ldots$, where $\xi_l^{(\alpha)}, \varphi_l^{(\alpha)}$, etc. belong to the position $l$ of the irreducible representation $\alpha$, then $c\, c_l^{(\alpha)} \psi_l^{(\alpha)} + d\, d_l^{(\alpha)} \phi_l^{(\alpha)} + p\, p_l^{(\alpha)} \xi_l^{(\alpha)} + q\, q_l^{(\alpha)} \varphi_l^{(\alpha)} + \cdots$ again belongs to the position $l$ of the irreducible representation $\alpha$. Thus, we can construct orthonormal basis vectors of the representation space according to Theorem 13.2.
Regarding arbitrary functions $f$ and $h$ as vectors and using (18.160), we form the inner product
\[
\left\langle h \middle| P_{l(l)}^{(\alpha)} P_{s(s)}^{(\beta)} \middle| f \right\rangle
= \left\langle h \middle| P_{l(l)}^{(\alpha)} P_{l(l)}^{(\alpha)} P_{s(s)}^{(\beta)} \middle| f \right\rangle
= \left\langle h \middle| P_{l(l)}^{(\alpha)}\, \delta_{\alpha\beta}\, \delta_{ls}\, P_{l(s)}^{(\alpha)} \middle| f \right\rangle
= \delta_{\alpha\beta}\, \delta_{ls} \left\langle h \middle| P_{l(l)}^{(\alpha)} P_{l(s)}^{(\alpha)} \middle| f \right\rangle, \tag{18.166}
\]
where with the first equality we used $P_{l(l)}^{(\alpha)} = P_{l(l)}^{(\alpha)} P_{l(l)}^{(\alpha)}$ (18.154). Meanwhile, from (18.148) we have
\[
P_{l(s)}^{(\alpha)} | f \rangle = c_s^{(\alpha)}\, \bigl| \psi_l^{(\alpha)} \bigr\rangle. \tag{18.167}
\]
Similarly,
\[
P_{l(l)}^{(\alpha)} | h \rangle = d_l^{(\alpha)}\, \bigl| \phi_l^{(\alpha)} \bigr\rangle. \tag{18.168}
\]
where we used $\left[ P_{l(l)}^{(\alpha)} \right]^{\dagger} = P_{l(l)}^{(\alpha)}$ (18.157). The relation (18.169) is due to the notation of Sect. 13.3. Substituting (18.167) and (18.168) into (18.166), we get
\[
\left\langle h \middle| P_{l(l)}^{(\alpha)} P_{s(s)}^{(\beta)} \middle| f \right\rangle
= \left\langle d_l^{(\alpha)} \phi_l^{(\alpha)} \middle| c_s^{(\beta)} \psi_s^{(\beta)} \right\rangle
= \left[ d_l^{(\alpha)} \right]^{*} c_s^{(\beta)} \left\langle \phi_l^{(\alpha)} \middle| \psi_s^{(\beta)} \right\rangle
= \delta_{\alpha\beta}\, \delta_{ls} \left[ d_l^{(\alpha)} \right]^{*} c_s^{(\alpha)} \left\langle \phi_l^{(\alpha)} \middle| \psi_l^{(\alpha)} \right\rangle. \tag{18.170}
\]
That is, under the condition $\alpha \ne \beta$ or $l \ne s$, $P_{s(s)}^{(\beta)} | f \rangle$ and $P_{l(l)}^{(\alpha)} | h \rangle$ are orthogonal (see Theorem 13.3 of Sect. 13.4). The relation clearly indicates that functions belonging to different irreducible representations $(\alpha \ne \beta)$ are mutually orthogonal. Even though the functions belong to the same irreducible representation, they are orthogonal if they are allocated to different "places" as basis vectors designated by $l$, $s$, etc. Here the place means the index $j$ of $\psi_j^{(\alpha)}$ in (18.145) or $\phi_j^{(\alpha)}$ in (18.162) that designates the "order" of $\psi_j^{(\alpha)}$ within $\psi_1^{(\alpha)}, \psi_2^{(\alpha)}, \ldots, \psi_{d_\alpha}^{(\alpha)}$ or of $\phi_j^{(\alpha)}$ within $\phi_1^{(\alpha)}, \phi_2^{(\alpha)}, \ldots, \phi_{d_\alpha}^{(\alpha)}$. This takes place if the representation is multidimensional (i.e., of dimensionality $d_\alpha$).
In (18.160) putting $\alpha = \beta$ and $l \ne s$, we get
\[
P_{l(l)}^{(\alpha)}\, P_{s(s)}^{(\alpha)} = 0.
\]
Therefore, on the basis of the discussion of Sect. 14.1, $P_{l(l)}^{(\alpha)} + P_{s(s)}^{(\alpha)}$ is a projection operator as well in the case of $l \ne s$. Notice, however, that if $l = s$, $P_{l(l)}^{(\alpha)} + P_{s(s)}^{(\alpha)}$ is not a projection operator. Readers can readily show this. Moreover, let us define $P^{(\alpha)}$ as below:
\[
P^{(\alpha)} \equiv \sum_{l=1}^{d_\alpha} P_{l(l)}^{(\alpha)}. \tag{18.173}
\]
Thus, the operator $P^{(\alpha)}$ has a clear meaning. That is, $P^{(\alpha)}$ plays the role of extracting all the vectors (or functions) belonging to the irreducible representation $\alpha$, including their coefficients. Defining those functions as
\[
\psi^{(\alpha)} \equiv \sum_{l=1}^{d_\alpha} c_l^{(\alpha)}\, \psi_l^{(\alpha)}, \tag{18.176}
\]
In turn, let us calculate $P^{(\beta)} P^{(\alpha)}$. This can be done with (18.174) as follows:
\[
P^{(\beta)} P^{(\alpha)} = \frac{d_\beta}{n} \sum_g \left[ \chi^{(\beta)}(g) \right]^{*} g \;\; \frac{d_\alpha}{n} \sum_{g'} \left[ \chi^{(\alpha)}(g') \right]^{*} g'. \tag{18.178}
\]
To carry out the calculation, (i) we first replace $g'$ with $g^{-1} g'$ and rewrite the summation over $g'$ as that over $g^{-1} g'$. (ii) Using the homomorphism and unitarity of the representation, we rewrite $\left[ \chi^{(\alpha)}\!\left(g^{-1} g'\right) \right]^{*}$ as
\[
\left[ \chi^{(\alpha)}\!\left(g^{-1} g'\right) \right]^{*} = \sum_{i,j} \left[ D^{(\alpha)}(g) \right]_{ji} \left[ D^{(\alpha)}(g') \right]_{ji}^{*}. \tag{18.179}
\]
Then, using the orthogonality relation (18.51), we get
\[
\begin{aligned}
P^{(\beta)} P^{(\alpha)}
&= \frac{d_\beta\, d_\alpha}{n^2} \sum_{k,i,j} \Bigl\{ \sum_g \left[ D^{(\beta)}(g) \right]_{kk}^{*} \left[ D^{(\alpha)}(g) \right]_{ji} \Bigr\} \sum_{g'} \left[ D^{(\alpha)}(g') \right]_{ji}^{*} g' \\
&= \frac{d_\beta\, d_\alpha}{n^2} \sum_{k,i,j} \frac{n}{d_\beta}\, \delta_{\alpha\beta}\, \delta_{kj}\, \delta_{ki} \sum_{g'} \left[ D^{(\alpha)}(g') \right]_{ji}^{*} g'
= \delta_{\alpha\beta}\, \frac{d_\alpha}{n} \sum_k \sum_{g'} \left[ D^{(\alpha)}(g') \right]_{kk}^{*} g' \\
&= \delta_{\alpha\beta}\, \frac{d_\alpha}{n} \sum_{g'} \left[ \chi^{(\alpha)}(g') \right]^{*} g' = \delta_{\alpha\beta}\, P^{(\alpha)}. 
\end{aligned} \tag{18.180}
\]
These relationships are anticipated from the fact that both P(α) and P(β) are
projection operators.
As in the case of (18.160), (18.180) is useful to evaluate an inner product of
functions. Again taking an inner product of (18.180) with arbitrary functions f and g,
we have
\[
\left\langle g \middle| P^{(\beta)} P^{(\alpha)} \middle| f \right\rangle
= \left\langle g \middle| \delta_{\alpha\beta}\, P^{(\alpha)} \middle| f \right\rangle
= \delta_{\alpha\beta} \left\langle g \middle| P^{(\alpha)} \middle| f \right\rangle. \tag{18.181}
\]
then $P$ is once again a projection operator (see Sect. 14.1). Now taking the summation in (18.177) over all the irreducible representations $\alpha$, we have
\[
\sum_{\alpha=1}^{n_r} P^{(\alpha)} f = \sum_{\alpha=1}^{n_r} \psi^{(\alpha)} = f. \tag{18.183}
\]
Since $f$ is arbitrary, we get
\[
P = E. \tag{18.184}
\]
As mentioned in Example 18.1, the concept of the representation space is not only
important but also very useful for addressing various problems of physics and
chemistry. For instance, molecular orbital methods to be dealt with in Chap. 19
consider a representation space whose dimension is equal to the number of electrons
of a molecule about which we wish to know, e.g., energy eigenvalues of those
electrons. In that case, the dimension of representation space is equal to the number
of molecular orbitals. According to a symmetry species of the molecule, the repre-
sentation matrix is decomposed into a direct sum of invariant eigenspaces relevant to
individual irreducible representations. If the basis vectors belong to different irre-
ducible representation, such vectors are orthogonal to each other in virtue of
(18.171). Even though those vectors belong to the same irreducible representation,
they are orthogonal if they are allocated to a different place. However, it is often the
case that the vectors belong to the same place of the same irreducible representation.
Then, it is always possible according to Theorem 13.2 to make them mutually
orthogonal by taking their linear combination. In such a way, we can construct an
orthonormal basis set throughout the representation space. In fact, the method is a
powerful tool for solving an energy eigenvalue problem and for determining asso-
ciated eigenfunctions (or molecular orbitals).
where $\psi_k$ $(1 \le k \le d_\alpha)$ and $\phi_l$ $(1 \le l \le d_\beta)$ are basis functions of $D^{(\alpha)}$ and $D^{(\beta)}$, respectively. We can construct $d_\alpha d_\beta$ new basis vectors $\psi_k \phi_l$. These functions are transformed by $g$ such that
\[
g\left(\psi_i \phi_j\right) = g(\psi_i)\, g\left(\phi_j\right)
= \Bigl[ \sum_k \psi_k\, D_{ki}^{(\alpha)}(g) \Bigr] \Bigl[ \sum_l \phi_l\, D_{lj}^{(\beta)}(g) \Bigr]
= \sum_k \sum_l \psi_k \phi_l\, D_{ki}^{(\alpha)}(g)\, D_{lj}^{(\beta)}(g). \tag{18.187}
\]
Here let us define the following matrix $\left[ D^{(\alpha \otimes \beta)}(g) \right]_{kl,ij}$ such that
\[
\left[ D^{(\alpha \otimes \beta)}(g) \right]_{kl,ij} \equiv D_{ki}^{(\alpha)}(g)\, D_{lj}^{(\beta)}(g). \tag{18.188}
\]
Then we have
\[
g\left(\psi_i \phi_j\right) = \sum_k \sum_l \psi_k \phi_l \left[ D^{(\alpha \otimes \beta)}(g) \right]_{kl,ij}. \tag{18.189}
\]
18.8 Direct-Product Representation
Thus,
\[
D^{(\alpha \otimes \beta)}(g) = D^{(\alpha)}(g) \otimes D^{(\beta)}(g) =
\begin{pmatrix}
d_{1,1}^{(\alpha)}\, D^{(\beta)}(g) & \cdots & d_{1,d_\alpha}^{(\alpha)}\, D^{(\beta)}(g) \\
\vdots & \ddots & \vdots \\
d_{d_\alpha,1}^{(\alpha)}\, D^{(\beta)}(g) & \cdots & d_{d_\alpha,d_\alpha}^{(\alpha)}\, D^{(\beta)}(g)
\end{pmatrix}. \tag{18.191}
\]
For two $(2, 2)$ matrices $A = (a_{ij})$ and $B = (b_{ij})$, for instance, we get
\[
D^{(\alpha \otimes \beta)}(g) = D^{(\alpha)}(g) \otimes D^{(\beta)}(g) =
\begin{pmatrix}
a_{11} b_{11} & a_{11} b_{12} & a_{12} b_{11} & a_{12} b_{12} \\
a_{11} b_{21} & a_{11} b_{22} & a_{12} b_{21} & a_{12} b_{22} \\
a_{21} b_{11} & a_{21} b_{12} & a_{22} b_{11} & a_{22} b_{12} \\
a_{21} b_{21} & a_{21} b_{22} & a_{22} b_{21} & a_{22} b_{22}
\end{pmatrix}. \tag{18.193}
\]
\[
\begin{aligned}
\left[ D^{(\alpha \otimes \beta)}(g g') \right]_{kl,ij}
&= D_{ki}^{(\alpha)}(g g')\, D_{lj}^{(\beta)}(g g') \\
&= \Bigl[ \sum_\mu D_{k\mu}^{(\alpha)}(g)\, D_{\mu i}^{(\alpha)}(g') \Bigr] \Bigl[ \sum_\nu D_{l\nu}^{(\beta)}(g)\, D_{\nu j}^{(\beta)}(g') \Bigr] \\
&= \sum_\mu \sum_\nu D_{k\mu}^{(\alpha)}(g)\, D_{l\nu}^{(\beta)}(g)\, D_{\mu i}^{(\alpha)}(g')\, D_{\nu j}^{(\beta)}(g') \\
&= \sum_\mu \sum_\nu \left[ D^{(\alpha \otimes \beta)}(g) \right]_{kl,\mu\nu} \left[ D^{(\alpha \otimes \beta)}(g') \right]_{\mu\nu,ij}.
\end{aligned} \tag{18.194}
\]
Equation (18.194) shows that the rule of matrix multiplication with respect to the double subscripts is satisfied. Consequently, we get
\[
D^{(\alpha \otimes \beta)}(g g') = D^{(\alpha \otimes \beta)}(g)\, D^{(\alpha \otimes \beta)}(g'). \tag{18.195}
\]
Denoting
\[
\chi^{(\alpha \otimes \beta)}(g) \equiv \sum_{i,j} \left[ D^{(\alpha \otimes \beta)}(g) \right]_{ij,ij}, \tag{18.197}
\]
we have
\[
\chi^{(\alpha \otimes \beta)}(g) = \chi^{(\alpha)}(g)\, \chi^{(\beta)}(g). \tag{18.198}
\]
Even though $D^{(\alpha)}$ and $D^{(\beta)}$ are both irreducible, $D^{(\alpha \otimes \beta)}$ is not necessarily irreducible. Suppose that
\[
D^{(\alpha)}(g) \otimes D^{(\beta)}(g) = \sum_\omega q_\omega\, D^{(\omega)}(g), \tag{18.199}
\]
where $n$ is the order of the group. This relation is often used in quantum mechanical or chemical calculations, especially to evaluate optical transitions in matter including atoms and molecular systems. It is also useful to examine whether a definite integral of a product of functions (or an inner product of vectors) vanishes.

18.9 Symmetric Representation and Antisymmetric Representation

In Sect. 16.5, we investigated the definition and properties of direct-product groups.
Similarly to the case of the direct-product representation, we consider a representation of a direct-product group. Let us consider two groups $\mathscr{g}$ and $\mathscr{H}$ and assume that a direct-product group $\mathscr{g} \otimes \mathscr{H}$ is defined (Sect. 16.5). Also let $D^{(\alpha)}$ and $D^{(\beta)}$ be $d_\alpha$- and $d_\beta$-dimensional representations of the groups $\mathscr{g}$ and $\mathscr{H}$, respectively. Furthermore, let us define a matrix $D^{(\alpha \otimes \beta)}(ab)$ as in (18.188) such that
\[
\left[ D^{(\alpha \otimes \beta)}(ab) \right]_{kl,ij} \equiv D_{ki}^{(\alpha)}(a)\, D_{lj}^{(\beta)}(b), \tag{18.201}
\]
\[
g\left(\psi_i \phi_j\right) = g(\psi_i)\, g\left(\phi_j\right) = \sum_k \sum_l \psi_k \phi_l\, D_{ki}^{(\alpha)}(g)\, D_{lj}^{(\alpha)}(g). \tag{18.203}
\]
Regarding the product function $\psi_j \phi_i$ we get a similar equation such that
\[
g\left(\psi_j \phi_i\right) = g\left(\psi_j\right) g(\phi_i) = \sum_k \sum_l \psi_k \phi_l\, D_{kj}^{(\alpha)}(g)\, D_{li}^{(\alpha)}(g). \tag{18.204}
\]
On the basis of the linearity of the relations (18.203) and (18.204), let us construct a linear combination of the product functions. That is,
\[
\begin{aligned}
g\left(\psi_i \phi_j \pm \psi_j \phi_i\right)
&= g\left(\psi_i \phi_j\right) \pm g\left(\psi_j \phi_i\right) \\
&= \sum_k \sum_l \psi_k \phi_l\, D_{ki}^{(\alpha)}(g)\, D_{lj}^{(\alpha)}(g) \pm \sum_k \sum_l \psi_k \phi_l\, D_{kj}^{(\alpha)}(g)\, D_{li}^{(\alpha)}(g) \\
&= \sum_k \sum_l \left(\psi_k \phi_l \pm \psi_l \phi_k\right) D_{ki}^{(\alpha)}(g)\, D_{lj}^{(\alpha)}(g).
\end{aligned} \tag{18.205}
\]
Here, defining $\Psi_{ij}^{\pm}$ as
\[
\Psi_{ij}^{\pm} = \psi_i \phi_j \pm \psi_j \phi_i, \tag{18.206}
\]
we rewrite (18.205) as
\[
g\, \Psi_{ij}^{\pm} = \sum_k \sum_l \Psi_{kl}^{\pm}\, \frac{1}{2} \left[ D_{ki}^{(\alpha)}(g)\, D_{lj}^{(\alpha)}(g) \pm D_{li}^{(\alpha)}(g)\, D_{kj}^{(\alpha)}(g) \right]. \tag{18.207}
\]
Notice that $\Psi_{ij}^{+}$ and $\Psi_{ij}^{-}$ in (18.206) are symmetric and antisymmetric with respect to the interchange of the subscripts $i$ and $j$, respectively. That is,
\[
\Psi_{ij}^{\pm} = \pm\, \Psi_{ji}^{\pm}. \tag{18.208}
\]
\[
\begin{aligned}
g g'\, \Psi_{ij}^{\pm}
&= \sum_k \sum_l \Psi_{kl}^{\pm}\, \frac{1}{2} \sum_\mu \sum_\nu \left[ D_{k\mu}^{(\alpha)}(g)\, D_{l\nu}^{(\alpha)}(g) \pm D_{l\mu}^{(\alpha)}(g)\, D_{k\nu}^{(\alpha)}(g) \right] D_{\mu i}^{(\alpha)}(g')\, D_{\nu j}^{(\alpha)}(g') \\
&= \sum_k \sum_l \Psi_{kl}^{\pm}\, \frac{1}{2} \sum_\mu \sum_\nu \left\{ \left[ D^{(\alpha \otimes \alpha)}(g) \right]_{kl,\mu\nu} \pm \left[ D^{(\alpha \otimes \alpha)}(g) \right]_{lk,\mu\nu} \right\} D_{\mu i}^{(\alpha)}(g')\, D_{\nu j}^{(\alpha)}(g'),
\end{aligned} \tag{18.209}
\]
where the last equality follows from the definition of the direct-product representation (18.188). We notice that the terms $\left[ D^{(\alpha \otimes \alpha)}(g) \right]_{kl,\mu\nu} \pm \left[ D^{(\alpha \otimes \alpha)}(g) \right]_{lk,\mu\nu}$ in (18.209) are symmetric and antisymmetric with respect to the interchange of the subscripts $k$ and $l$ together with the subscripts $\mu$ and $\nu$, respectively. Comparing both sides of (18.209), we see that this must be the case with $i$ and $j$ as well. Then, the last factor of (18.209) should be rewritten as:
\[
D_{\mu i}^{(\alpha)}(g')\, D_{\nu j}^{(\alpha)}(g') = \frac{1}{2} \left[ D_{\mu i}^{(\alpha)}(g')\, D_{\nu j}^{(\alpha)}(g') \pm D_{\nu i}^{(\alpha)}(g')\, D_{\mu j}^{(\alpha)}(g') \right].
\]
\[
\left[ D^{\{\alpha \otimes \alpha\}}(g) \right]_{kl,\mu\nu}
\equiv \frac{1}{2} \left\{ \left[ D^{(\alpha \otimes \alpha)}(g) \right]_{kl,\mu\nu} - \left[ D^{(\alpha \otimes \alpha)}(g) \right]_{lk,\mu\nu} \right\}
= \frac{1}{2} \left[ D_{k\mu}^{(\alpha)}(g)\, D_{l\nu}^{(\alpha)}(g) - D_{l\mu}^{(\alpha)}(g)\, D_{k\nu}^{(\alpha)}(g) \right]. \tag{18.211}
\]
Meanwhile, using $D_{\mu i}^{(\alpha)}(g')\, D_{\nu j}^{(\alpha)}(g') = \left[ D^{(\alpha \otimes \alpha)}(g') \right]_{\mu\nu,ij}$ and considering the exchange of summation with respect to the subscripts $\mu$ and $\nu$, we can also define $D^{[\alpha \otimes \alpha]}(g')$ and $D^{\{\alpha \otimes \alpha\}}(g')$ according to the symmetric and antisymmetric cases, respectively. Thus, for the symmetric case, we have
\[
g g'\, \Psi_{ij}^{+}
= \sum_k \sum_l \sum_\mu \sum_\nu \Psi_{kl}^{+} \left[ D^{[\alpha \otimes \alpha]}(g) \right]_{kl,\mu\nu} \left[ D^{[\alpha \otimes \alpha]}(g') \right]_{\mu\nu,ij}
= \sum_k \sum_l \Psi_{kl}^{+} \left[ D^{[\alpha \otimes \alpha]}(g)\, D^{[\alpha \otimes \alpha]}(g') \right]_{kl,ij}. \tag{18.212}
\]
Meanwhile,
\[
g\, \Psi_{ij}^{+} = \sum_k \sum_l \Psi_{kl}^{+} \left[ D^{[\alpha \otimes \alpha]}(g) \right]_{kl,ij}. \tag{18.213}
\]
Then we have
\[
g g'\, \Psi_{ij}^{+} = \sum_k \sum_l \Psi_{kl}^{+} \left[ D^{[\alpha \otimes \alpha]}(g g') \right]_{kl,ij}. \tag{18.214}
\]
Comparing (18.212) and (18.214), we get
\[
D^{[\alpha \otimes \alpha]}(g g') = D^{[\alpha \otimes \alpha]}(g)\, D^{[\alpha \otimes \alpha]}(g'). \tag{18.215}
\]
Similarly, for the antisymmetric case we have
\[
D^{\{\alpha \otimes \alpha\}}(g g') = D^{\{\alpha \otimes \alpha\}}(g)\, D^{\{\alpha \otimes \alpha\}}(g'). \tag{18.218}
\]
Thus, both $D^{[\alpha \otimes \alpha]}(g)$ and $D^{\{\alpha \otimes \alpha\}}(g)$ produce well-defined representations. Letting the dimension of the representation $\alpha$ be $d_\alpha$, we have $d_\alpha(d_\alpha + 1)/2$ functions belonging to the symmetric representation and $d_\alpha(d_\alpha - 1)/2$ functions belonging to the antisymmetric representation. With a two-dimensional representation, for instance, the functions belonging to the symmetric representation are
\[
\psi_1 \phi_1, \quad \psi_1 \phi_2 + \psi_2 \phi_1, \quad \text{and} \quad \psi_2 \phi_2,
\]
while the function belonging to the antisymmetric representation is
\[
\psi_1 \phi_2 - \psi_2 \phi_1.
\]
\[
\begin{aligned}
\chi^{[\alpha \otimes \alpha]}(g)
&= \sum_k \sum_l \left[ D^{[\alpha \otimes \alpha]}(g) \right]_{kl,kl}
= \frac{1}{2} \sum_k \sum_l \left[ D_{kk}^{(\alpha)}(g)\, D_{ll}^{(\alpha)}(g) + D_{lk}^{(\alpha)}(g)\, D_{kl}^{(\alpha)}(g) \right] \\
&= \frac{1}{2} \left\{ \left[ \chi^{(\alpha)}(g) \right]^{2} + \chi^{(\alpha)}\!\left(g^{2}\right) \right\}. 
\end{aligned} \tag{18.219}
\]
Similarly we have
\[
\chi^{\{\alpha \otimes \alpha\}}(g) = \frac{1}{2} \left\{ \left[ \chi^{(\alpha)}(g) \right]^{2} - \chi^{(\alpha)}\!\left(g^{2}\right) \right\}. \tag{18.220}
\]
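Formulas (18.219) and (18.220) are easy to check numerically. For the two-dimensional E irrep of $C_{3v}$ (our own example), the symmetric part of $E \otimes E$ should carry the characters of $A_1 + E$, namely $(3, 0, 0, 1, 1, 1)$ per element, and the antisymmetric part those of $A_2$:

```python
import numpy as np

def rot(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s], [s, c]])

M = np.diag([1.0, -1.0])
group = [rot(2*np.pi*k/3) for k in range(3)] + [M @ rot(2*np.pi*k/3) for k in range(3)]

chi = lambda D: np.trace(D)
sym  = [0.5 * (chi(D)**2 + chi(D @ D)) for D in group]  # (18.219)
anti = [0.5 * (chi(D)**2 - chi(D @ D)) for D in group]  # (18.220)
assert np.allclose(sym,  [3, 0, 0, 1, 1, 1])   # characters of A1 + E
assert np.allclose(anti, [1, 1, 1, -1, -1, -1])# characters of A2
print("symmetric part of ExE is A1 + E; antisymmetric part is A2")
```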
References
1. Inui T, Tanabe Y, Onodera Y (1990) Group theory and its applications in physics. Springer,
Berlin
2. Inui T, Tanabe Y, Onodera Y (1980) Group theory and its applications in physics, expanded
ed. Shokabo, Tokyo. (in Japanese)
3. Hassani S (2006) Mathematical physics. Springer, New York
4. Chen JQ, Ping J, Wang F (2002) Group representation theory for physicists, 2nd edn. World
Scientific, Singapore
5. Hamermesh M (1989) Group theory and its application to physical problems. Dover, New York
Chapter 19
Applications of Group Theory to Physical
Chemistry
On the basis of our studies of group theory, in this last chapter we apply that knowledge to molecular orbital (MO) calculations (i.e., quantum chemical calculations). As tangible examples, we adopt unsaturated hydrocarbons (ethylene, cyclopropenyl radical, benzene, and allyl radical) and methane. The approach is based upon the method of linear combination of atomic orbitals (LCAO). To seek an
appropriate LCAO MO, we make the most of the method of symmetry-adapted linear combinations (SALC). Projection operators are a powerful tool for this purpose. For a correct understanding, it is desirable to consider the transformation of functions. To this end, we first show several examples.
In the process of carrying out MO calculations, we encounter a secular equation
as an eigenvalue equation. Using a SALC eases the calculations of the secular
equation. Molecular science relies largely on spectroscopic measurements and
researchers need to assign individual spectral lines to a specific transition between
the relevant molecular states. Representation theory works well in this situation.
Thus, group theory fits perfectly with its applications in molecular science.
rigid body before and after the transformation. Concomitantly, a general point $\mathbf{r}$ (see Sect. 17.1) is transformed to $\mathbf{r}'$ in exactly the same way as $\mathbf{r}_0$.
A new function $f'$ gives a new contour map after the rotation. Let us describe $f'$ as
\[
f' \equiv O_R f. \tag{19.1}
\]
The RHS of (19.1) means that $f'$ is produced as a result of operating the rotation $R$ on $f$. We describe the position vector after the transformation as $\mathbf{r}''$. Following the notation of Sect. 11.1, $\mathbf{r}''$ and $\mathbf{r}_0'$ are expressed as
The matrix representation of $R$ is given by, e.g., (11.35). In (19.2) we have
\[
\mathbf{r}_0' = (\mathbf{e}_1\ \mathbf{e}_2) \begin{pmatrix} x_0' \\ y_0' \end{pmatrix}
\quad \text{and} \quad
\mathbf{r}'' = (\mathbf{e}_1\ \mathbf{e}_2) \begin{pmatrix} x'' \\ y'' \end{pmatrix}, \tag{19.3}
\]
where $\mathbf{e}_1$ and $\mathbf{e}_2$ are orthonormal basis vectors in the $xy$-plane. Also we have
\[
\mathbf{r} = (\mathbf{e}_1\ \mathbf{e}_2) \begin{pmatrix} x \\ y \end{pmatrix}
\quad \text{and} \quad
\mathbf{r}' = (\mathbf{e}_1\ \mathbf{e}_2) \begin{pmatrix} x' \\ y' \end{pmatrix}. \tag{19.4}
\]
where
\[
\mathbf{r} = (\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3) \begin{pmatrix} x \\ y \\ z \end{pmatrix}
\quad \text{and} \quad
\mathbf{r}' = (\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3) \begin{pmatrix} x' \\ y' \\ z' \end{pmatrix}. \tag{19.8}
\]
The last relation of (19.7) comes from (19.1). Using (19.2) and (19.3), we rewrite (19.7) as
That is,
\[
O_R f(\mathbf{r}) = f\!\left(R^{-1}(\mathbf{r})\right). \tag{19.11}
\]
\[
f' \equiv O_R f \equiv R f. \tag{19.13}
\]
A contour is shown in Fig. 19.2. We consider a π/2 rotation around the z-axis.
Then, $f(x, y)$ is transformed into $R f(x, y) = f'(x, y)$ such that
We also have
\[
\mathbf{r}_0 = (\mathbf{e}_1\ \mathbf{e}_2) \begin{pmatrix} a \\ b \end{pmatrix},
\qquad
\mathbf{r}'' = (\mathbf{e}_1\ \mathbf{e}_2) \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix}
= (\mathbf{e}_1\ \mathbf{e}_2) \begin{pmatrix} -b \\ a \end{pmatrix},
\]
where we define $R$ as
\[
R = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}.
\]
Similarly,
\[
\mathbf{r} = (\mathbf{e}_1\ \mathbf{e}_2) \begin{pmatrix} x \\ y \end{pmatrix}
\quad \text{and} \quad
\mathbf{r}'' = (\mathbf{e}_1\ \mathbf{e}_2) \begin{pmatrix} -y \\ x \end{pmatrix}.
\]
This ensures that (19.5) holds. The implication of (19.16) combined with (19.5) is that the view of $f'(x', y')$ from $(-b, a)$ is the same as that of $f(x, y)$ from $(a, b)$. Imagine that if we are standing at $(-b, a)$ of $f'(x', y')$, we are at the bottom of the "valley" of $f'(x', y')$. Exactly in the same manner, if we are standing at $(a, b)$ of $f(x, y)$, we are at the bottom of the valley of $f(x, y)$ as well. Notice that for $f(x, y)$ and $f'(x', y')$, $(a, b)$ and $(-b, a)$ are the lowest points, respectively.

19.1 Transformation of Functions

Fig. 19.3 Plot of $f(\mathbf{r}) = e^{-2\left[(x-a)^2 + y^2 + z^2\right]} + e^{-2\left[(x+a)^2 + y^2 + z^2\right]}$ $(a > 0)$, peaked at $(-a, 0)$ and $(a, 0)$. The function form remains unchanged by a reflection with respect to the $yz$-plane
Meanwhile, we have
\[
R^{-1} = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix},
\qquad
R^{-1}(\mathbf{r}) = (\mathbf{e}_1\ \mathbf{e}_2) \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}
= (\mathbf{e}_1\ \mathbf{e}_2) \begin{pmatrix} y \\ -x \end{pmatrix}.
\]
Then we have
\[
f\!\left(R^{-1}(\mathbf{r})\right) = (y - a)^2 + \left[(-x) - b\right]^2 = (x + b)^2 + (y - a)^2 = R f(\mathbf{r}). \tag{19.17}
\]
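The rule (19.11) and the result (19.17) can be checked numerically. The sketch below (our own illustration) rotates the paraboloid $f(x, y) = (x - a)^2 + (y - b)^2$ by $\pi/2$ about the $z$-axis and confirms that the transformed function is $(x + b)^2 + (y - a)^2$, with its minimum moved from $(a, b)$ to $(-b, a)$:

```python
import numpy as np

a, b = 1.0, 2.0
f = lambda x, y: (x - a)**2 + (y - b)**2      # paraboloid with minimum at (a, b)

R = np.array([[0.0, -1.0],                    # rotation by +pi/2 about the z-axis
              [1.0,  0.0]])
Rinv = np.linalg.inv(R)                       # equals rotation by -pi/2

def Rf(x, y):
    # (19.11): (O_R f)(r) = f(R^{-1} r)
    xp, yp = Rinv @ np.array([x, y])
    return f(xp, yp)

# (19.17): R f(r) = (x + b)^2 + (y - a)^2 at every sampled point
xs = np.linspace(-4, 4, 9)
for x in xs:
    for y in xs:
        assert np.isclose(Rf(x, y), (x + b)**2 + (y - a)**2)
print("rotated function has its minimum at (-b, a)")
```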
Fig. 19.4 Plots of $f(\mathbf{r}) = e^{-2\left[(x-a)^2 + y^2 + z^2\right]} + e^{-2\left[(x+a)^2 + y^2 + z^2\right]}$ and $g(\mathbf{r}) = e^{-2\left[(x-a)^2 + y^2 + z^2\right]} - e^{-2\left[(x+a)^2 + y^2 + z^2\right]}$ $(a > 0)$ as functions of $x$ on the $x$-axis. (a) $f(\mathbf{r})$. (b) $g(\mathbf{r})$
\[
g\!\left(R^{-1}(\mathbf{r})\right) = R g(\mathbf{r}) = -g(\mathbf{r}). \tag{19.22}
\]
Plotting $f(\mathbf{r})$ and $g(\mathbf{r})$ as functions of $x$ on the $x$-axis, we depict the results in Fig. 19.4. Looking at (19.21) and (19.22), we find that $f(\mathbf{r})$ and $g(\mathbf{r})$ are solutions of an eigenvalue equation for the operator $R$. The corresponding eigenvalues are $1$ and $-1$ for $f(\mathbf{r})$ and $g(\mathbf{r})$, respectively. In particular, $f(\mathbf{r})$ keeps its functional form after the transformation $R$. In this case, $f(\mathbf{r})$ is said to be invariant under the transformation $R$. Moreover, $f(\mathbf{r})$ is invariant under the following eight transformations:
\[
R = \begin{pmatrix} \pm 1 & 0 & 0 \\ 0 & \pm 1 & 0 \\ 0 & 0 & \pm 1 \end{pmatrix}. \tag{19.23}
\]
the whole molecule. The electronic state of the molecule is formed by different MOs of various energies that are occupied by electrons. As in the case of an atom, an MO $\psi_i(\mathbf{r})$ occupied by an electron is described by
\[
H \psi_i(\mathbf{r}) \equiv \left[ -\frac{\hbar^2}{2m} \nabla^2 + V(\mathbf{r}) \right] \psi_i(\mathbf{r}) = E_i\, \psi_i(\mathbf{r}), \tag{19.24}
\]
Or we have
Comparing the first and last sides and remembering that $\psi$ is an arbitrary function, we get
\[
R V = V R. \tag{19.29}
\]
Next, let us examine the symmetry of the Laplacian $\nabla^2$. The Laplacian is defined in Sect. 1.2 as
\[
\nabla^2 \equiv \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}, \tag{1.24}
\]
where $x$, $y$, and $z$ denote the Cartesian coordinates. Let $S$ be an orthogonal matrix that transforms the $xyz$-coordinate system into the $x'y'z'$-coordinate system. We suppose that $S$ is expressed as
\[
S = \begin{pmatrix} s_{11} & s_{12} & s_{13} \\ s_{21} & s_{22} & s_{23} \\ s_{31} & s_{32} & s_{33} \end{pmatrix}
\quad \text{and} \quad
\begin{pmatrix} x' \\ y' \\ z' \end{pmatrix} = \begin{pmatrix} s_{11} & s_{12} & s_{13} \\ s_{21} & s_{22} & s_{23} \\ s_{31} & s_{32} & s_{33} \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix}. \tag{19.30}
\]
\[
S S^{T} = S^{T} S = E, \tag{19.32}
\]
\[
\frac{\partial}{\partial x'} = \frac{\partial x}{\partial x'} \frac{\partial}{\partial x} + \frac{\partial y}{\partial x'} \frac{\partial}{\partial y} + \frac{\partial z}{\partial x'} \frac{\partial}{\partial z}
= s_{11} \frac{\partial}{\partial x} + s_{12} \frac{\partial}{\partial y} + s_{13} \frac{\partial}{\partial z}. \tag{19.33}
\]
\[
\begin{aligned}
\frac{\partial^2}{\partial x'^2}
&= \left( s_{11} \frac{\partial}{\partial x} + s_{12} \frac{\partial}{\partial y} + s_{13} \frac{\partial}{\partial z} \right)
   \left( s_{11} \frac{\partial}{\partial x} + s_{12} \frac{\partial}{\partial y} + s_{13} \frac{\partial}{\partial z} \right) \\
&= s_{11}^{\,2} \frac{\partial^2}{\partial x^2} + s_{11} s_{12} \frac{\partial^2}{\partial y\, \partial x} + s_{11} s_{13} \frac{\partial^2}{\partial z\, \partial x} \\
&\quad + s_{12} s_{11} \frac{\partial^2}{\partial x\, \partial y} + s_{12}^{\,2} \frac{\partial^2}{\partial y^2} + s_{12} s_{13} \frac{\partial^2}{\partial z\, \partial y} \\
&\quad + s_{13} s_{11} \frac{\partial^2}{\partial x\, \partial z} + s_{13} s_{12} \frac{\partial^2}{\partial y\, \partial z} + s_{13}^{\,2} \frac{\partial^2}{\partial z^2}.
\end{aligned} \tag{19.34}
\]
19.2 Method of Molecular Orbitals (MOs)
Calculating, e.g., the terms of ∂²/∂y′² and ∂²/∂z′² likewise, we have similar results. Then, collecting all those 27 terms, we get

$$\left( s_{11}^{\,2} + s_{21}^{\,2} + s_{31}^{\,2} \right) \frac{\partial^2}{\partial x^2} = \frac{\partial^2}{\partial x^2}, \tag{19.35}$$

where we used the orthogonality of S. The cross terms of $\frac{\partial^2}{\partial x\,\partial y}$, $\frac{\partial^2}{\partial y\,\partial x}$, etc. all vanish. Consequently, we get

$$\frac{\partial^2}{\partial x'^2} + \frac{\partial^2}{\partial y'^2} + \frac{\partial^2}{\partial z'^2} = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}. \tag{19.36}$$
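The cancellation of the cross terms rests solely on the column orthonormality of S. A minimal numerical sketch (using an arbitrary rotation matrix as an example of an orthogonal S; the angle is hypothetical):

```python
import math

# A sample orthogonal matrix S: rotation by an angle about the z-axis.
t = 0.7  # arbitrary rotation angle (radians), for illustration only
S = [[math.cos(t), -math.sin(t), 0.0],
     [math.sin(t),  math.cos(t), 0.0],
     [0.0,          0.0,         1.0]]

# Column orthonormality: sum_k s_ki * s_kj = delta_ij.
# The i = j case gives the coefficient 1 in (19.35); the i != j case
# is why every cross term of the form d^2/(dx dy) cancels in (19.36).
for i in range(3):
    for j in range(3):
        dot = sum(S[k][i] * S[k][j] for k in range(3))
        expected = 1.0 if i == j else 0.0
        assert abs(dot - expected) < 1e-12
```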
Defining ∇′² as

$$\nabla'^2 \equiv \frac{\partial^2}{\partial x'^2} + \frac{\partial^2}{\partial y'^2} + \frac{\partial^2}{\partial z'^2}, \tag{19.37}$$

we have

$$\nabla'^2 = \nabla^2. \tag{19.38}$$
Notice that (19.38) holds not only for the symmetry operations of the molecule, but also for any rotation operation in ℝ³ (see Sect. 17.4).
As in the case of (19.28), we have

$$R\nabla^2\psi(\mathbf{r}') = R\left[\nabla^2\psi\right](\mathbf{r}') = \nabla'^2\psi'(\mathbf{r}') = \nabla^2\psi'(\mathbf{r}') = \nabla^2 R\psi(\mathbf{r}') = \left[\nabla^2 R\right]\psi(\mathbf{r}'). \tag{19.39}$$

Consequently, we get

$$R\nabla^2 = \nabla^2 R. \tag{19.40}$$

Combining this with (19.29), we obtain

$$R\left( -\frac{\hbar^2}{2m}\nabla^2 + V \right) = \left( -\frac{\hbar^2}{2m}\nabla^2 + V \right) R, \tag{19.41}$$

that is,

$$RH = HR. \tag{19.42}$$
Thus, we confirm that the Hamiltonian H commutes with any symmetry operation R. In other words, H is invariant with the coordinate transformation relevant to the symmetry operation.

Now we consider the matrix representation of (19.42). Let D be a d-dimensional irreducible representation of the symmetry group ℊ. Then, by Schur's lemma, we have

$$H = \lambda E, \tag{19.43}$$

where the above matrix is a (d, d) square diagonal matrix. This implies that H is reduced to

$$H = \lambda D^{(0)} \oplus \cdots \oplus \lambda D^{(0)},$$

where D^{(0)} denotes the totally symmetric representation; notice that it is given by 1 for any symmetry operation. Thus, the commutability of an operator with all the symmetry operations is equivalent to saying that the operator belongs to the totally symmetric representation.
Since H is Hermitian, λ should be real. Operating with both sides of (19.42) on {ψ₁, ψ₂, ⋯, ψ_d} from the right, we have

$$(\psi_1\ \psi_2\ \cdots\ \psi_d) RH = (\psi_1\ \psi_2\ \cdots\ \psi_d) HR = (\psi_1\ \psi_2\ \cdots\ \psi_d) \begin{pmatrix} \lambda & & \\ & \ddots & \\ & & \lambda \end{pmatrix} R = (\lambda\psi_1\ \lambda\psi_2\ \cdots\ \lambda\psi_d) R = \lambda(\psi_1\ \psi_2\ \cdots\ \psi_d) R. \tag{19.45}$$

Hence,

$$(\psi_1\ \psi_2\ \cdots\ \psi_d) H = \lambda(\psi_1\ \psi_2\ \cdots\ \psi_d), \quad \text{i.e.,} \quad \psi_i H = \lambda\psi_i \ (1 \le i \le d). \tag{19.46}$$
Meanwhile, we rewrite (19.45) as

$$(\psi_1\ \psi_2\ \cdots\ \psi_d) RH = (\psi_1 R\ \ \psi_2 R\ \cdots\ \psi_d R) H = \left( \sum_{k=1}^{d} \psi_k D_{k1}(R) \ \ \sum_{k=1}^{d} \psi_k D_{k2}(R) \ \cdots\ \sum_{k=1}^{d} \psi_k D_{kd}(R) \right) H = \lambda \left( \sum_{k=1}^{d} \psi_k D_{k1}(R) \ \ \sum_{k=1}^{d} \psi_k D_{k2}(R) \ \cdots\ \sum_{k=1}^{d} \psi_k D_{kd}(R) \right),$$

where with the second equality we used the relation (11.42), i.e., ψ_i R = R(ψ_i). Thus, we get

$$\sum_{k=1}^{d} \psi_k D_{ki}(R)\, H = \lambda \sum_{k=1}^{d} \psi_k D_{ki}(R) \quad (1 \le i \le d). \tag{19.47}$$
$$\left\langle \sum_{k=1}^{d} \tilde\psi_k D_{ki}(R) \,\middle|\, \sum_{l=1}^{d} \tilde\psi_l D_{lj}(R) \right\rangle = \sum_{k=1}^{d} \sum_{l=1}^{d} D_{ki}(R)^{*} D_{lj}(R) \langle \tilde\psi_k | \tilde\psi_l \rangle = \sum_{k=1}^{d} D_{ki}(R)^{*} D_{kj}(R) = \sum_{k=1}^{d} \left[ D^{\dagger}(R) \right]_{ik} \left[ D(R) \right]_{kj} = \left[ D^{\dagger}(R) D(R) \right]_{ij} = \delta_{ij}.$$

With the last equality, we used the unitarity of D(R). Thus, the orthonormality of the basis set is retained after the unitary transformation of $\{\tilde\psi_1, \tilde\psi_2, \cdots, \tilde\psi_d\}$.
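The preservation of orthonormality under a unitary change of basis can be illustrated numerically; this sketch uses a hypothetical 2 × 2 unitary matrix U and coordinate vectors as stand-ins for the functions:

```python
import math

# A hypothetical 2x2 unitary matrix U (columns are orthonormal).
U = [[1 / math.sqrt(2),  1 / math.sqrt(2)],
     [1j / math.sqrt(2), -1j / math.sqrt(2)]]

psi = [[1, 0], [0, 1]]  # orthonormal basis vectors (as rows)

# Transformed basis: psi~_j = sum_k psi_k U_kj
psi_t = [[sum(psi[k][c] * U[k][j] for k in range(2)) for c in range(2)]
         for j in range(2)]

def inner(u, v):
    """Hermitian inner product <u|v> = sum_c u_c^* v_c."""
    return sum(uc.conjugate() * vc for uc, vc in zip(u, v))

# The Gram matrix of the transformed set is still the identity:
for i in range(2):
    for j in range(2):
        val = inner(psi_t[i], psi_t[j])
        expected = 1.0 if i == j else 0.0
        assert abs(val - expected) < 1e-12
```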
In the above discussion, we have assumed that ψ₁, ψ₂, ⋯, and ψ_d belong to a certain irreducible representation D^{(ν)}. Since $\tilde\psi_1, \tilde\psi_2, \cdots$, and $\tilde\psi_d$ consist of their linear combinations, they belong to D^{(ν)} as well. Also, we have assumed that $\tilde\psi_1, \tilde\psi_2, \cdots$, and $\tilde\psi_d$ form an orthonormal basis and, hence, according to Theorem 11.3, these vectors constitute basis vectors that belong to D^{(ν)}. In particular, functions derived using projection operators share these characteristics. In fact, the above principles underlie the molecular orbital calculations dealt with in Sects. 19.4 and 19.5. We will go into more detail in subsequent sections.
Bearing the aforementioned argument in mind, we make the most of the relation expressed by (18.171) to evaluate inner products of functions (or vectors). In this connection, we often need to calculate matrix elements of an operator. One of the most typical examples is the matrix elements of the Hamiltonian. In the field of molecular science we have to estimate an overlap integral, Coulomb integral, resonance integral, etc. Other examples include transition matrix elements pertinent to, e.g., the electric dipole transition. To this end, we deal with direct-product representations (see Sects. 18.7 and 18.8). Let O^{(γ)} be an Hermitian operator belonging to the γ-th irreducible representation. Let us think of the following inner product:

$$\left\langle \phi_l^{(\beta)} \,\middle|\, O^{(\gamma)} \psi_s^{(\alpha)} \right\rangle, \tag{19.48}$$

where ψ_s^{(α)} and ϕ_l^{(β)} are the s-th component of the α-th irreducible representation and the l-th component of the β-th irreducible representation, respectively; see (18.146) and (18.161).
(i) Let us first think of the case where O^{(γ)} is the Hamiltonian H. Suppose that ψ_s^{(α)} belongs to an eigenvalue λ of H. Then, we have

$$H\,|\psi_l^{(\alpha)}\rangle = \lambda\,|\psi_l^{(\alpha)}\rangle. \tag{19.49}$$

Recall here the relation

$$\left\langle \phi_s^{(\beta)} \,\middle|\, \psi_l^{(\alpha)} \right\rangle = \delta_{\alpha\beta}\,\delta_{ls} \left\langle \phi_s^{(\alpha)} \,\middle|\, \psi_s^{(\alpha)} \right\rangle. \tag{18.171}$$

At first glance this equation seems trivial, but it gives us a powerful tool that saves a lot of troublesome calculations. In fact, if we encounter a series of inner product calculations (i.e., definite integrals), we may ignore many of them. This is because the matrix elements of the Hamiltonian do not vanish only if α = β and l = s. That is, we only have to estimate ⟨ϕ_s^{(α)}|ψ_s^{(α)}⟩; otherwise the inner products vanish. The functional forms of ϕ_s^{(α)} and ψ_s^{(α)} are determined depending upon individual practical problems. We will show tangible examples later (Sect. 19.4).
(ii) Next, let us consider matrix elements of the optical transition. In this case, we are thinking of the transition probability between quantum states. Assuming the dipole approximation, we use ε_e · P^{(γ)} for O^{(γ)} in ⟨ϕ_l^{(β)}|O^{(γ)}ψ_s^{(α)}⟩ of (19.48). The quantity P^{(γ)} represents an electric dipole moment associated with the position vector. Normally, we can readily find it in a character table, in which we examine which irreducible representation γ the position vector components x, y, and z, indicated in the rightmost column, correspond to. Table 18.4 is an example. If we take a unit polarization vector ε_e parallel to the position vector component that the character table designates, ε_e · P^{(γ)} is nonvanishing. Suppose that ψ_s^{(α)} and ϕ_l^{(β)} are an initial state and a final state, respectively. At first glance, we would wish to use (18.200) and count how many times a representation β occurs in the direct-product representation D^{(γ⊗α)}. It is often the case, however, that β is not an irreducible representation.
Even in such a case, if either α or β belongs to a totally symmetric representation, the handling will be easier. This situation corresponds to the case where an initial electronic configuration or a final electronic configuration forms a closed shell, the electronic state of which is totally symmetric [1]. The former occurs when we consider the optical absorption that takes place, e.g., in a molecule in its ground electronic state. The latter corresponds to an optical emission that ends up in a ground state. Let us consider the former case. Since O_j^{(γ)} is Hermitian, we rewrite (19.48) as

$$\left\langle \phi_l^{(\beta)} \,\middle|\, O_j^{(\gamma)} \psi_s^{(\alpha)} \right\rangle = \left\langle O_j^{(\gamma)} \phi_l^{(\beta)} \,\middle|\, \psi_s^{(\alpha)} \right\rangle = \left\langle \psi_s^{(\alpha)} \,\middle|\, O_j^{(\gamma)} \phi_l^{(\beta)} \right\rangle^{*}, \tag{19.51}$$

where we assume that ψ_s^{(α)} is a ground state having a closed-shell electronic configuration. Therefore, ψ_s^{(α)} belongs to the totally symmetric irreducible representation. For (19.48) not to vanish, therefore, we may alternatively state that it is necessary for

$$D^{(\gamma \otimes \beta)} = D^{(\gamma)} \otimes D^{(\beta)}$$

to contain the irreducible representation D^{(α)}. Thus, applying (18.199) and (18.200), we can obtain a direct sum of irreducible representations. After that, we examine whether the irreducible representation α is contained in D^{(γ)} ⊗ D^{(β)}. Since the totally symmetric representation is one-dimensional, s = 1 in (19.51).
where c_{ki} are complex coefficients. We assume that the ϕ_k are normalized. That is,

$$\langle \phi_k | \phi_k \rangle \equiv \int \phi_k^{*} \phi_k \, d\tau = 1, \tag{19.53}$$

where dτ implies that the integration should be taken over ℝ³. The notation ⟨g| f⟩ means an inner product defined in Sect. 13.1. The inner product is usually defined by a definite integral of g*f whose integration range covers a part or all of ℝ³, depending on the constitution of the physical system.
First we try to solve the Schrödinger equation given as an eigenvalue equation. The said equation is described as

$$H\psi = \lambda\psi \quad \text{or} \quad (H - \lambda)\psi = 0. \tag{19.54}$$

Substituting the expansion, we get

$$\sum_{k=1}^{n} c_k (H - \lambda)\phi_k = 0 \quad (1 \le k \le n), \tag{19.55}$$

where the subscript i in (19.52) has been omitted for simplicity. Multiplying by ϕ_j^{*} from the left and integrating over the whole of ℝ³, we have

$$\sum_{k=1}^{n} c_k \int \phi_j^{*} (H - \lambda) \phi_k \, d\tau = 0. \tag{19.56}$$

That is,

$$\sum_{k=1}^{n} c_k \left( \int \phi_j^{*} H \phi_k \, d\tau - \lambda \int \phi_j^{*} \phi_k \, d\tau \right) = 0. \tag{19.57}$$
The above notation has already been introduced in Sect. 1.4. Equation (19.62) certainly satisfies the definition of the inner product described in (13.2)–(13.4); readers, please check it. On the basis of (13.64), we have

$$\langle y \,|\, Hx \rangle = \langle xH^{\dagger} \,|\, y \rangle^{*}. \tag{19.63}$$

When the matrices H̃ and S̃ are diagonal, all the off-diagonal elements are zero and each eigenvalue λ is given by

$$\lambda_i = \tilde H_{ii} / \tilde S_{ii}. \tag{19.67}$$
This means that the eigenvalue equation has automatically been solved.

19.3 Calculation Procedures of Molecular Orbitals (MOs) 745

The best way to achieve this is to choose basis functions (vectors) such that the functions conform to the symmetry to which the molecule belongs. That is, we "shuffle" the atomic orbitals as follows:

$$\xi_i = \sum_{k=1}^{n} d_{ki} \phi_k \quad (1 \le i \le n), \tag{19.68}$$

where ξ_i are new functions chosen instead of ψ_i of (19.52). That is, we construct linear combinations of the atomic orbitals that belong to individual irreducible representations. Such a linear combination is called a symmetry-adapted linear combination (SALC). We remark that not all atomic orbitals are necessarily included in a SALC. Then we have

$$\tilde H_{jk} = \int \xi_j^{*} H \xi_k \, d\tau \quad \text{and} \quad \tilde S_{jk} = \int \xi_j^{*} \xi_k \, d\tau, \tag{19.69}$$

and the secular equation reads

$$\det\left( \tilde H_{jk} - \lambda \tilde S_{jk} \right) = 0. \tag{19.70}$$
atomic orbitals contained in a molecule. (iii) We examine how those atomic orbitals are transformed by the symmetry operations. Since the symmetry operations are represented by (n, n) unitary matrices, we can readily decide the trace of each matrix. Generally, that matrix representation is reducible, and so we reduce the representation according to the procedures of (18.81)–(18.83). Thus, we are able to determine how many irreducible representations are contained in the original reducible representation. (iv) After having determined the irreducible representations, we construct SALCs and constitute a secular equation using them. (v) Solving the secular equation, we determine the molecular orbital energies and decide the functional forms of the corresponding MOs. (vi) We examine physicochemical properties such as optical transitions within the molecule.

In procedure (iii) above, if the same irreducible representation appears more than once, we have to solve a secular equation of order two or more. Even in that case, we can render the set of resulting eigenvectors orthogonal to one another in the process of solving the problem.
To construct the abovementioned SALCs, we make the most of the projection operators defined in Sect. 14.1. In (18.147), putting m = l, we have

$$P_{l(l)}^{(\alpha)} = \frac{d_\alpha}{n} \sum_g \left[ D_{ll}^{(\alpha)}(g) \right]^{*} g. \tag{19.71}$$

In a one-dimensional representation, P_{l(l)}^{(α)} and P^{(α)} are identical. As expressed in (18.155) and (18.175), these projection operators act on an arbitrary function and extract the specific component(s) pertinent to a specific irreducible representation of the point group to which the molecule belongs.
At first glance the definitions of P_{l(l)}^{(α)} and P^{(α)} look daunting, but the use of character tables relieves the calculation task. In particular, all the irreducible representations are one-dimensional (i.e., just a number!) for Abelian groups, as mentioned in Sect. 18.6. For this, notice that each individual group element forms a class by itself. Therefore, the utilization of character tables becomes easier for Abelian groups. Even when we encounter a case where the dimension of a representation is more than one (i.e., the case of noncommutative groups), D_{ll}^{(α)} can be determined without much difficulty.
19.4 MO Calculations Based on π-Electron Approximation 747
On the basis of the general argument developed in Sects. 19.2 and 19.3, we perform molecular orbital calculations for individual molecules. First, we apply group theory to the molecular orbital calculations of unsaturated hydrocarbons such as ethylene, the cyclopropenyl radical and cation, as well as benzene and the allyl radical. In these cases, in addition to adopting molecular orbital theory, we adopt the so-called "π-electron approximation." For the first three molecules, we will not have to "solve" a secular equation. But for the allyl radical we deal with two SALC orbitals belonging to the same irreducible representation and, hence, the final molecular orbitals must be obtained by solving the secular equation.
19.4.1 Ethylene
We start with one of the simplest examples, ethylene. Ethylene is a planar molecule
and belongs to D2h symmetry (see Sect. 17.2). In the molecule two π-electrons
extend vertically to the molecular plane toward upper and lower directions
(Fig. 19.5). The molecular plane forms a node of the atomic 2pz orbitals; that is, those atomic orbitals change sign across the molecular plane (i.e., the xy-plane).
In Fig. 19.5, two pz atomic orbitals of carbon are denoted by ϕ1 and ϕ2. We
should be able to construct basis vectors using ϕ1 and ϕ2. Corresponding to the two
atomic orbitals, we are dealing with a two-dimensional vector space. Let us consider
how ϕ1 and ϕ2 are transformed by a symmetry operation. First we examine an
operation C2(z). This operation exchanges ϕ1 and ϕ2. That is, we have
Fig. 19.5 Ethylene molecule placed on the xyz-coordinate system. Two pz atomic orbitals of carbon are denoted by ϕ1 and ϕ2. The atomic orbitals change sign across the molecular plane (i.e., the xy-plane)
$$C_2(z)\,\phi_2 = \phi_1. \tag{19.73}$$
Whereas (19.76) and (19.77) show the transformation of a position vector in ℝ³, (19.75) represents the transformation of functions in a two-dimensional vector space. A vector space composed of functions may be finite-dimensional or infinite-dimensional. We have already encountered the latter case in Part I, where we dealt with the quantum mechanics of a harmonic oscillator. Such a function space is often referred to as a Hilbert space. An essential feature of (19.75) is that the trace (or character) of the matrix is zero.
Let us consider another transformation, C2(y). In this case the situation differs from the above in that ϕ1 is converted to −ϕ1 by C2(y) and ϕ2 is converted to −ϕ2. Notice again that the molecular plane forms a node of the atomic p-orbitals. Thus, C2(y) is represented by

$$C_2(y) = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}. \tag{19.78}$$

The trace of the matrix C2(y) is −2. In this way, choosing ϕ1 and ϕ2 as the basis functions for the representation of D2h, we can determine the characters for the individual symmetry transformations of the atomic 2pz orbitals in ethylene belonging to D2h. We collect the results in Table 19.1.
Table 19.1 Characters for individual symmetry transformations of atomic 2pz orbitals in ethylene

D2h   E   C2(z)   C2(y)   C2(x)   i    σ(xy)   σ(zx)   σ(yz)
Γ     2    0      −2       0      0    −2       0       2

Next, we examine what kinds of irreducible representations are contained in our present representation of ethylene. To do this, we need the character table of D2h (see Table 19.2). If a specific irreducible representation is contained, we want to examine how many times it occurs. Equation (18.83) is very useful for this purpose. In the present case, n = 8 in (18.83). Also taking into account (18.79) and (18.80), we get

$$\Gamma = B_{1u} \oplus B_{3g}. \tag{19.80}$$
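The reduction can be reproduced with the reduction formula (18.83) in a few lines. The character-table rows below are the standard D2h entries (operation order as in Table 19.1) and should be checked against Table 19.2:

```python
# Character table of D2h (operation order: E, C2z, C2y, C2x, i, s_xy, s_zx, s_yz).
D2h = {
    "Ag":  [1, 1, 1, 1, 1, 1, 1, 1],
    "B1g": [1, 1, -1, -1, 1, 1, -1, -1],
    "B2g": [1, -1, 1, -1, 1, -1, 1, -1],
    "B3g": [1, -1, -1, 1, 1, -1, -1, 1],
    "Au":  [1, 1, 1, 1, -1, -1, -1, -1],
    "B1u": [1, 1, -1, -1, -1, -1, 1, 1],
    "B2u": [1, -1, 1, -1, -1, 1, -1, 1],
    "B3u": [1, -1, -1, 1, -1, 1, 1, -1],
}

# Characters of the reducible representation spanned by the two 2pz
# orbitals of ethylene (Table 19.1).
gamma = [2, 0, -2, 0, 0, -2, 0, 2]

# Reduction formula (18.83): q_alpha = (1/n) * sum_g chi_alpha(g) * chi(g),
# with n = 8 (every element of D2h forms a class by itself).
multiplicities = {name: sum(c * g for c, g in zip(chi, gamma)) // 8
                  for name, chi in D2h.items()}

assert multiplicities["B1u"] == 1
assert multiplicities["B3g"] == 1
assert sum(multiplicities.values()) == 2   # Gamma = B1u + B3g
```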
As a next step, we are going to find appropriate basis functions belonging to the two irreducible representations of (19.80). For this purpose, we use projection operators expressed as

$$P^{(\alpha)} = \frac{d_\alpha}{n} \sum_g \left[ \chi^{(\alpha)}(g) \right]^{*} g. \tag{18.174}$$

For B1u we have

$$P^{(B_{1u})} \phi_1 = \frac{1}{8} \sum_g \left[ \chi^{(B_{1u})}(g) \right]^{*} g\,\phi_1 = \frac{1}{8} \left[ 1\cdot\phi_1 + 1\cdot\phi_2 + (-1)(-\phi_1) + (-1)(-\phi_2) + (-1)(-\phi_2) + (-1)(-\phi_1) + 1\cdot\phi_2 + 1\cdot\phi_1 \right] = \frac{1}{2}(\phi_1 + \phi_2). \tag{19.81}$$

Similarly, for B3g we have

$$P^{(B_{3g})} \phi_1 = \frac{1}{8} \left[ 1\cdot\phi_1 + (-1)\phi_2 + (-1)(-\phi_1) + 1\cdot(-\phi_2) + 1\cdot(-\phi_2) + (-1)(-\phi_1) + (-1)\phi_2 + 1\cdot\phi_1 \right] = \frac{1}{2}(\phi_1 - \phi_2). \tag{19.82}$$
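The projection-operator calculations above can be mechanized. This sketch encodes each group action on ϕ1 as a signed permutation; the geometry (molecular plane = xy, C=C axis = y) is assumed as in Fig. 19.5:

```python
# Action of the eight D2h operations on phi1: each entry gives
# (image orbital index, sign) for g * phi1, with indices 0 -> phi1, 1 -> phi2.
# Operation order: E, C2z, C2y, C2x, i, s_xy, s_zx, s_yz.
action_on_phi1 = [(0, 1), (1, 1), (0, -1), (1, -1),
                  (1, -1), (0, -1), (1, 1), (0, 1)]

chi_B1u = [1, 1, -1, -1, -1, -1, 1, 1]
chi_B3g = [1, -1, -1, 1, 1, -1, -1, 1]

def project(chi):
    """P phi1 = (1/8) sum_g chi(g) g(phi1), as coefficients on (phi1, phi2)."""
    coeff = [0.0, 0.0]
    for c, (idx, sign) in zip(chi, action_on_phi1):
        coeff[idx] += c * sign / 8
    return coeff

assert project(chi_B1u) == [0.5, 0.5]    # (phi1 + phi2)/2, cf. (19.81)
assert project(chi_B3g) == [0.5, -0.5]   # (phi1 - phi2)/2, cf. (19.82)
```

For larger basis sets the same loop structure applies unchanged, which is precisely why the projection operator pays off as the dimension grows.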
2
Thus, after going through routine but sure procedures, we have reached appro-
priate basis functions that belong to each irreducible representation.
As mentioned earlier (see Table 17.5), D2h can be expressed as a direct-product
group and C2 group is contained in D2h as a subgroup. We can use it for the present
analysis. Suppose now that we have a group ℊ and that H is a subgroup of ℊ. Let
g be an arbitrary element of ℊ and let D(g) be a representation of g. Meanwhile, let
h be an arbitrary element of H . Then with 8 h 2 H , a collection of D(h) is a
representation of H . We write this relation as
D#H: ð19:83Þ
Also, we have

$$P^{(B)} \phi_1 = \frac{1}{2} \sum_g \left[ \chi^{(B)}(g) \right]^{*} g\,\phi_1 = \frac{1}{2} \left[ 1\cdot\phi_1 + (-1)\phi_2 \right] = \frac{1}{2}(\phi_1 - \phi_2). \tag{19.86}$$

The relations of (19.85) and (19.86) are essentially the same as (19.81) and (19.82), respectively.
We can easily construct a character table (see Table 19.3). There should be two irreducible representations. Regarding the totally symmetric representation, we allocate 1 to each symmetry operation. For the other representation we allocate 1 to the identity element E and −1 to the element C2, so that the rows and columns of the character table are orthogonal to each other.
Readers might well ask why we bother to take such circuitous approaches to predictable results such as (19.81) and (19.82) or (19.85) and (19.86). This question seems natural when we are dealing with a case where the number of basis vectors (i.e., the dimension of the vector space) is small, typically 2 as in the present case. With increasing dimension of the vector space, however, seeking and determining appropriate SALCs becomes increasingly complicated and difficult. Under such circumstances, a projection operator is an indispensable tool to address the problem.
We have a two-dimensional secular equation to be solved such that

$$\begin{vmatrix} \tilde H_{11} - \lambda \tilde S_{11} & \tilde H_{12} - \lambda \tilde S_{12} \\ \tilde H_{21} - \lambda \tilde S_{21} & \tilde H_{22} - \lambda \tilde S_{22} \end{vmatrix} = 0. \tag{19.87}$$

In the above argument, ½(ϕ1 + ϕ2) belongs to B1u and ½(ϕ1 − ϕ2) belongs to B3g. Since they belong to different irreducible representations, we have

$$\tilde H_{12} = \tilde S_{12} = 0. \tag{19.88}$$

Hence,

$$\lambda_1 = \tilde H_{11} / \tilde S_{11} \quad \text{and} \quad \lambda_2 = \tilde H_{22} / \tilde S_{22}. \tag{19.90}$$
The next step is to determine the energy eigenvalues of the molecule. Note here that the role of the SALCs is to determine a suitable irreducible representation that corresponds to the "direction" of a vector. As a coefficient keeps the direction of a vector unaltered, it is of secondary importance; the final form of the normalized MOs can be decided last. That procedure includes the normalization of a vector. Thus, we tentatively choose the following functions for the SALCs:

$$\xi_1 = \phi_1 + \phi_2 \quad \text{and} \quad \xi_2 = \phi_1 - \phi_2.$$
Then, we have
$$\tilde H_{11} = \int \xi_1^{*} H \xi_1 \, d\tau = \int (\phi_1 + \phi_2)^{*} H (\phi_1 + \phi_2) \, d\tau = \int \phi_1 H \phi_1 \, d\tau + \int \phi_1 H \phi_2 \, d\tau + \int \phi_2 H \phi_1 \, d\tau + \int \phi_2 H \phi_2 \, d\tau = H_{11} + H_{12} + H_{21} + H_{22} = H_{11} + 2H_{12} + H_{22}. \tag{19.91}$$

Similarly, we have

$$\tilde H_{22} = \int \xi_2^{*} H \xi_2 \, d\tau = H_{11} - 2H_{12} + H_{22}. \tag{19.92}$$

The last equality comes from the fact that we have chosen real functions for ϕ1 and ϕ2, as studied in Part I. Moreover, we have

$$H_{11} \equiv \int \phi_1^{*} H \phi_1 \, d\tau = \int \phi_1 H \phi_1 \, d\tau = \int \phi_2 H \phi_2 \, d\tau = H_{22},$$
$$H_{12} \equiv \int \phi_1^{*} H \phi_2 \, d\tau = \int \phi_1 H \phi_2 \, d\tau = \int \phi_2 H \phi_1 \, d\tau = \int \phi_2^{*} H \phi_1 \, d\tau = H_{21}. \tag{19.93}$$

The first equation comes from the fact that both H11 and H22 are calculated using the same 2pz atomic orbital of carbon. The second equation results from the fact that H is Hermitian. Notice that both ϕ1 and ϕ2 are real functions. Following the convention, we denote

$$\alpha \equiv H_{11} = H_{22} \quad \text{and} \quad \beta \equiv H_{12}. \tag{19.94}$$

Then,

$$\tilde H_{11} = 2(\alpha + \beta), \tag{19.95}$$

$$\tilde H_{22} = 2(\alpha - \beta). \tag{19.96}$$
Meanwhile, we have

$$\tilde S_{11} = \langle \xi_1 | \xi_1 \rangle = \int (\phi_1 + \phi_2)^{*} (\phi_1 + \phi_2) \, d\tau = \int (\phi_1 + \phi_2)^2 \, d\tau = \int \phi_1^{\,2} \, d\tau + \int \phi_2^{\,2} \, d\tau + 2 \int \phi_1 \phi_2 \, d\tau = 2 + 2 \int \phi_1 \phi_2 \, d\tau, \tag{19.97}$$

where we used the fact that ϕ1 and ϕ2 have been normalized. Also following the convention, we denote

$$S \equiv \int \phi_1 \phi_2 \, d\tau = S_{12}. \tag{19.98}$$

Then,

$$\tilde S_{11} = 2(1 + S). \tag{19.99}$$

Similarly, we get

$$\tilde S_{22} = 2(1 - S). \tag{19.100}$$

Substituting (19.95) and (19.96) along with (19.99) and (19.100) into (19.90), we get as the energy eigenvalues

$$\lambda_1 = \frac{\alpha + \beta}{1 + S} \quad \text{and} \quad \lambda_2 = \frac{\alpha - \beta}{1 - S}. \tag{19.101}$$
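A quick numerical check of (19.101), with illustrative (purely hypothetical) values of α, β, and S, confirms that the SALC-basis ratios H̃ii/S̃ii reproduce the roots of the original AO-basis secular equation:

```python
# Hypothetical parameter values (the text leaves alpha, beta, S symbolic).
alpha, beta, S = -6.0, -3.0, 0.25  # illustrative numbers only

# In the SALC basis the 2x2 problem is diagonal (19.88), so each
# eigenvalue is H~_ii / S~_ii as in (19.90).
H11t, S11t = 2 * (alpha + beta), 2 * (1 + S)   # (19.95), (19.99)
H22t, S22t = 2 * (alpha - beta), 2 * (1 - S)   # (19.96), (19.100)

lam1, lam2 = H11t / S11t, H22t / S22t
assert abs(lam1 - (alpha + beta) / (1 + S)) < 1e-12
assert abs(lam2 - (alpha - beta) / (1 - S)) < 1e-12

# Cross-check against the AO-basis secular equation det(H - l*Smat) = 0
# with H = [[a, b], [b, a]] and Smat = [[1, S], [S, 1]]:
# (a - l)^2 - (b - l*S)^2 = 0 must hold at both roots.
for lam in (lam1, lam2):
    det = (alpha - lam) ** 2 - (beta - lam * S) ** 2
    assert abs(det) < 1e-9
```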
$$\Psi_1 = \frac{|\xi_1\rangle}{\|\xi_1\|} = \frac{\phi_1 + \phi_2}{\sqrt{2(1 + S)}}. \tag{19.103}$$

Also, we have

$$\|\xi_2\| = \sqrt{\langle \xi_2 | \xi_2 \rangle} = \sqrt{2(1 - S)}. \tag{19.104}$$

Fig. 19.6 HOMO and LUMO energy levels and their assignments of ethylene

$$\Psi_2 = \frac{|\xi_2\rangle}{\|\xi_2\|} = \frac{\phi_1 - \phi_2}{\sqrt{2(1 - S)}}. \tag{19.105}$$
Note that both the normalized MOs and the energy eigenvalues depend upon whether we ignore the overlap integral, as is done in the simplest Hückel approximation. Nonetheless, the MO functional forms (in this case either ϕ1 + ϕ2 or ϕ1 − ϕ2) remain the same regardless of the level of approximation for the overlap integral. Regarding the quantitative evaluation of α, β, and S, we will briefly mention it later.
Once we have decided the symmetry of an MO (or the irreducible representation to which the orbital belongs) and its energy eigenvalue, we are in a position to examine various physicochemical properties of the molecule. One of them is the optical transition within a molecule, particularly the electric dipole transition. In most cases, the most important transition is that occurring between the highest occupied molecular orbital (HOMO) and the lowest unoccupied molecular orbital (LUMO). In the case of ethylene, those levels are depicted in Fig. 19.6. In the ground state (i.e., the most stable state), two electrons are positioned in the B1u state (HOMO). The excited state is assigned to B3g (LUMO). A ground state that consists only of fully occupied MOs belongs to the totally symmetric representation; in the case of ethylene, the ground state accordingly belongs to Ag.

If a photon is absorbed by a molecule, that molecule is excited by the energy of the photon. In ethylene, this process takes place by exciting an electron from B1u to B3g. The resulting electronic state ends up with one electron remaining in B1u and another excited to B3g (a final state). Thus, the representation of the final-state electronic configuration (denoted by Γ_f) is described as
configuration (denoted by Γ f) is described as
That is, the final excited state is expressed as a direct product of the states
associated with the optical transition. To determine the symmetry of the final sate,
we use (18.198) and (18.200). If Γ in (19.106) is reducible, using (18.200) we can
19.4 MO Calculations Based on π-Electron Approximation 755
determine the number of times that individual representations take place. Calculating
χ ðB1u Þ ðgÞχ ðB3g Þ ðgÞ for each group element, we can readily get the result. Using a
character table, we have
Γ f ¼ B2u : ð19:107Þ
Thus, we find that the transition is Ag ⟶ B2u, where Ag is called the initial-state electronic configuration and B2u the final-state electronic configuration. This transition is characterized by the electric dipole transition moment operator P. Here we have an important physical quantity, the transition matrix element. This quantity T_fi is approximated by

$$T_{fi} \approx \left\langle \Theta_f \,\middle|\, \boldsymbol{\varepsilon}_e \cdot \mathbf{P} \,\middle|\, \Theta_i \right\rangle, \tag{19.108}$$

where ε_e is a unit polarization vector of the electric field; Θ_i and Θ_f are the initial-state and final-state electronic configurations, respectively. The description (19.108) is in parallel with (4.5). Notice that in (4.5) we dealt with single-electron systems such as a particle confined in a square-well potential, a sole one-dimensional harmonic oscillator, and an electron in a hydrogen atom.
In the present case, however, we are dealing with a two-electron system, ethylene. Consequently, we cannot describe Θ_f by a simple wave function, but must use a more elaborate function. Nonetheless, when we discuss the optical transition of a molecule, we usually first wish to know whether optical absorption or emission truly takes place at all. In such a case, a qualitative prediction is of great importance. This can be done by judging whether the integral (19.108) vanishes. If the integral does not vanish, the relevant transition is said to be allowed. If, on the other hand, the integral vanishes, the transition is called forbidden. In this context, a systematic approach based on group theory is a powerful tool.

Let us consider the optical absorption of ethylene. For the transition matrix element, we have

$$T_{fi} = \left\langle \Theta_f(B_{2u}) \,\middle|\, \boldsymbol{\varepsilon}_e \cdot \mathbf{P} \,\middle|\, \Theta_i(A_g) \right\rangle. \tag{19.109}$$

More generally, consider

$$M_{fi} = \left\langle \Phi_f^{(\beta)} \,\middle|\, O^{(\gamma)} \,\middle|\, \Phi_i^{(\alpha)} \right\rangle, \tag{19.110}$$
Then, for the number of times q_α that D^{(α)} appears in D^{(γ⊗β)}, we have

$$q_\alpha = \frac{1}{n} \sum_g \chi^{(\alpha)}(g)^{*} \chi^{(\gamma \otimes \beta)}(g) = \frac{1}{n} \sum_g \chi^{(\alpha)}(g) \chi^{(\gamma \otimes \beta)}(g) = \frac{1}{n} \sum_g \chi^{(\alpha)}(g) \chi^{(\gamma)}(g) \chi^{(\beta)}(g) = \frac{1}{n} \sum_g \chi^{(\gamma)}(g) \chi^{(\alpha)}(g) \chi^{(\beta)}(g) = \frac{1}{n} \sum_g \chi^{(\gamma)}(g) \chi^{(\alpha \otimes \beta)}(g) = q_\gamma, \tag{19.111}$$

where the second equality holds because the characters concerned are real. Consequently, the number of times that D^{(γ)} appears in D^{(α⊗β)} is identical to the number of times that D^{(α)} appears in D^{(γ⊗β)}. Thus, it suffices to examine whether D^{(γ⊗β)} contains D^{(α)}. In other words, we only have to examine whether q_α ≠ 0 in (19.111).
Thus, applying (19.111) to (19.109), we examine whether D^{(B2u ⊗ Ag)} contains the irreducible representation D^{(η)} related to a position vector component. From the character table of D2h, y belongs to B2u, and we easily get

$$q_{B_{2u}} = \frac{1}{8} \left[ 1\cdot1 + (-1)(-1) + 1\cdot1 + (-1)(-1) + (-1)(-1) + 1\cdot1 + (-1)(-1) + 1\cdot1 \right] = 1.$$

That is,

$$B_{2u} \otimes B_{2u} = A_g.$$
This means that if light polarized along the y-axis is incident, i.e., ε_e is parallel to the y-axis, the transition is allowed. In that situation, ethylene is said to be polarized along the y-axis, or polarized in the direction of the y-axis. As the molecular axis is parallel to the y-axis, ethylene is polarized in the direction of the molecular axis. This is often the case with conjugated molecules having a well-defined molecular long axis.

Next, we examine whether ethylene is polarized along, e.g., the x-axis. From the character table of D2h (see Table 19.2), x belongs to B3u. In that case, using (19.111) we have
$$q_{B_{3u}} = \frac{1}{8} \left[ 1\cdot1 + (-1)(-1) + 1\cdot(-1) + (-1)\cdot1 + (-1)(-1) + 1\cdot1 + (-1)\cdot1 + 1\cdot(-1) \right] = 0.$$

Thus, the transition is forbidden for light polarized along the x-axis.
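The polarization analysis for all three axes can be summarized in one short script. The coordinate-to-representation assignments (z → B1u, y → B2u, x → B3u) are taken from the standard D2h character table and should be checked against Table 19.2:

```python
# Polarization selection rule for the Ag -> B2u transition of ethylene,
# via (19.111): polarization along a coordinate axis of symmetry eta is
# allowed iff q_eta = (1/8) * sum_g chi_eta(g) * chi_B2u(g) != 0,
# since B2u (x) Ag = B2u.
# Operation order: E, C2z, C2y, C2x, i, s_xy, s_zx, s_yz.
chi = {
    "B1u": [1, 1, -1, -1, -1, -1, 1, 1],   # z
    "B2u": [1, -1, 1, -1, -1, 1, -1, 1],   # y
    "B3u": [1, -1, -1, 1, -1, 1, 1, -1],   # x
}

def q(eta):
    return sum(a * b for a, b in zip(chi[eta], chi["B2u"])) // 8

assert q("B2u") == 1   # y-polarized light: allowed
assert q("B3u") == 0   # x-polarized: forbidden
assert q("B1u") == 0   # z-polarized: forbidden
```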
Let us think of another example, the cyclopropenyl radical, which has three resonance structures (Fig. 19.7). It is a planar molecule and its three carbon atoms form an
equilateral triangle. Hence, the molecule belongs to D3h symmetry (see Sect.
17.2). In the molecule three π-electrons extend vertically to the molecular plane
toward upper and lower directions. Suppose that cyclopropenyl radical is placed on
the xy-plane. Three pz atomic orbitals of carbons located at vertices of an equilateral
triangle are denoted by ϕ1, ϕ2, and ϕ3 in Fig. 19.8. The orbitals are numbered
clockwise so that the calculations can be consistent with the conventional notation of
a character table (vide infra). We assume that these π-orbitals take positive and
negative signs on the upper and lower sides of the plane of paper, respectively, with a
nodal plane lying on the xy-plane. The situation is similar to that of ethylene and the
problem can be treated in parallel to the case of ethylene.
As in the case of ethylene, we can choose ϕ1, ϕ2, and ϕ3 as real functions. We
construct basis vectors using these vectors. What we want to do to address the
problem is as follows: (i) We examine how ϕ1, ϕ2, and ϕ3 are transformed by the
symmetry operations of D3h. According to the analysis, we can determine what
irreducible representations SALCs should be assigned. (ii) On the basis of knowl-
edge obtained in (i), we construct proper MOs.
In Table 19.4 we list the character table of D3h along with the symmetry species. First we examine the traces (characters) of the representation matrices. As in the case of ethylene, a subgroup C3 of D3h plays an essential role (vide infra). This subgroup contains three group elements such that

$$C_3 = \{ E, C_3, C_3^{\,2} \}.$$
In the above, we use the same notation for the group name and group element, and
so we should be careful not to confuse them.
Fig. 19.7 Resonance structures of cyclopropenyl radical
The rotation C3(z) cyclically permutes ϕ1, ϕ2, and ϕ3, so that

$$C_3(z) = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}. \tag{19.117}$$

Therefore,

$$C_2 = \begin{pmatrix} -1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & -1 & 0 \end{pmatrix}. \tag{19.119}$$
In this way we can determine the trace for the individual symmetry transformations of the basis functions ϕ1, ϕ2, and ϕ3. We collect the resulting characters of the reducible representation Γ in Table 19.5. It can be reduced to a sum of irreducible representations according to the procedures given in (18.81)–(18.83) and using the character table of D3h (Table 19.4). As a result, we get

$$\Gamma = A_2'' \oplus E''. \tag{19.121}$$
As in the case of ethylene, we make the best use of the information of the subgroup C3 of D3h. Let us consider the subduced representation of D of D3h to C3. For this, in Table 19.6 we show the character table of the irreducible representations of C3. We can readily construct this character table. There should be three irreducible representations. Regarding the totally symmetric representation, we allocate 1 to each symmetry operation. For the other two representations we allocate 1 to the identity element E and the two other cube roots of 1, i.e., ε and ε* [where ε = exp(i2π/3)], to the elements C3 and C3², as shown, so that the rows and columns of the character table are orthogonal to each other.
Returning to the construction of the subduced representation, we have
Also, we have

$$P^{[E(1)]} \phi_1 = \frac{1}{3} \sum_g \left[ \chi^{[E(1)]}(g) \right]^{*} g\,\phi_1 = \frac{1}{3} \left( \phi_1 + \varepsilon \phi_2 + \varepsilon^{*} \phi_3 \right). \tag{19.124}$$

Also, we get

$$P^{[E(2)]} \phi_1 = \frac{1}{3} \sum_g \left[ \chi^{[E(2)]}(g) \right]^{*} g\,\phi_1 = \frac{1}{3} \left( \phi_1 + \varepsilon^{*} \phi_2 + \varepsilon \phi_3 \right), \tag{19.125}$$
where for the second equality we used the fact that C3 is a linear operator. That is, regarding the SALC ξ1, the eigenvalue of C3 is 1. Similarly, we have

$$C_3(\xi_2) = \varepsilon^{*} \xi_2. \tag{19.128}$$

Furthermore, we get

$$C_3(\xi_3) = \varepsilon \xi_3. \tag{19.129}$$

Thus, we find that regarding the SALCs ξ2 and ξ3, the eigenvalues of C3 are ε* and ε, respectively. These pieces of information imply that if we appropriately choose proper functions for the basis vectors, the character of a symmetry operation in a one-dimensional representation is identical to an eigenvalue of the said symmetry operation (see Table 19.6).

Regarding the last part of the calculations, we follow the procedures described in the case of ethylene. Using the above functions ξ1, ξ2, and ξ3, we construct the secular equation such that

$$\begin{vmatrix} \tilde H_{11} - \lambda \tilde S_{11} & 0 & 0 \\ 0 & \tilde H_{22} - \lambda \tilde S_{22} & 0 \\ 0 & 0 & \tilde H_{33} - \lambda \tilde S_{33} \end{vmatrix} = 0. \tag{19.130}$$

Since we have obtained three SALCs assigned to the individual irreducible representations A, E(1), and E(2), these SALCs span the representation space V³. This makes the off-diagonal elements of the secular equation vanish, so that it is simplified as in (19.130). Here, we have
(19.130). Here, we have
ð ð
e 11 ¼ ξ1 Hξ1 dτ ¼ ðϕ1 þ ϕ2 þ ϕ3 Þ H ðϕ1 þ ϕ2 þ ϕ3 Þdτ
H
where we used the same α and β as defined in (19.94). Strictly speaking, α and β
appearing in (19.131) should be slightly different from those of (19.94), because a
Hamiltonian is different. This approximation, however, would be enough for the
present studies. In a similar manner, we get
$$\tilde H_{22} = \int \xi_2^{*} H \xi_2 \, d\tau = 3(\alpha - \beta) \quad \text{and} \quad \tilde H_{33} = \int \xi_3^{*} H \xi_3 \, d\tau = 3(\alpha - \beta). \tag{19.132}$$

Meanwhile, we have

$$\tilde S_{11} = \langle \xi_1 | \xi_1 \rangle = \int (\phi_1 + \phi_2 + \phi_3)^{*} (\phi_1 + \phi_2 + \phi_3) \, d\tau = \int (\phi_1 + \phi_2 + \phi_3)^2 \, d\tau = 3(1 + 2S). \tag{19.133}$$

Similarly, we get

$$\tilde S_{22} = \tilde S_{33} = 3(1 - S). \tag{19.134}$$

Hence,

$$\lambda_1 = \frac{\alpha + 2\beta}{1 + 2S}, \quad \lambda_2 = \frac{\alpha - \beta}{1 - S}, \quad \text{and} \quad \lambda_3 = \frac{\alpha - \beta}{1 - S}. \tag{19.135}$$
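The eigenvalues (19.135) can be verified numerically by evaluating ⟨ξ|H|ξ⟩/⟨ξ|S|ξ⟩ for the three SALCs, again with hypothetical numeric values for α, β, and S:

```python
import cmath

# Hypothetical parameters (alpha, beta, S are symbolic in the text).
alpha, beta, S = -6.0, -3.0, 0.2
eps = cmath.exp(2j * cmath.pi / 3)   # epsilon = exp(i 2*pi/3)

# AO-basis Hueckel matrices for the equilateral triangle: every pair of
# carbons is adjacent, so all off-diagonal H elements equal beta.
H = [[alpha, beta, beta], [beta, alpha, beta], [beta, beta, alpha]]
Smat = [[1, S, S], [S, 1, S], [S, S, 1]]

def rayleigh(xi):
    """<xi|H|xi> / <xi|S|xi> for a coefficient vector xi on (phi1, phi2, phi3)."""
    num = sum(xi[i].conjugate() * H[i][j] * xi[j]
              for i in range(3) for j in range(3))
    den = sum(xi[i].conjugate() * Smat[i][j] * xi[j]
              for i in range(3) for j in range(3))
    return (num / den).real

lam1 = rayleigh([1, 1, 1])                   # xi1 -> (alpha + 2 beta)/(1 + 2S)
lam2 = rayleigh([1, eps, eps.conjugate()])   # xi2 -> (alpha - beta)/(1 - S)
lam3 = rayleigh([1, eps.conjugate(), eps])   # xi3 -> same value (degenerate)

assert abs(lam1 - (alpha + 2 * beta) / (1 + 2 * S)) < 1e-12
assert abs(lam2 - (alpha - beta) / (1 - S)) < 1e-12
assert abs(lam3 - lam2) < 1e-12
```

Note that the complex SALCs ξ2 and ξ3 give the same real energy, the degeneracy discussed just below.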
Notice that the two MOs belonging to E″ have the same energy. These MOs are said to be energetically degenerate. This situation is characteristic of a two-dimensional representation. Actually, even though the group C3 has only one-dimensional representations (because it is an Abelian group), the two complex conjugate representations labelled E behave as if they were a two-dimensional representation [2]. We will encounter the same situation again in the next example, benzene.
From (19.126), we get

$$\|\xi_1\| = \sqrt{\langle \xi_1 | \xi_1 \rangle} = \sqrt{3(1 + 2S)}. \tag{19.136}$$

Hence,

$$\Psi_1 = \frac{|\xi_1\rangle}{\|\xi_1\|} = \frac{\phi_1 + \phi_2 + \phi_3}{\sqrt{3(1 + 2S)}}. \tag{19.137}$$

Also, we have

$$\|\xi_2\| = \sqrt{\langle \xi_2 | \xi_2 \rangle} = \sqrt{3(1 - S)} \quad \text{and} \quad \|\xi_3\| = \sqrt{\langle \xi_3 | \xi_3 \rangle} = \sqrt{3(1 - S)}. \tag{19.138}$$

Accordingly,

$$\Psi_2 = \frac{|\xi_2\rangle}{\|\xi_2\|} = \frac{\phi_1 + \varepsilon \phi_2 + \varepsilon^{*} \phi_3}{\sqrt{3(1 - S)}}, \tag{19.139}$$

$$\Psi_3 = \frac{|\xi_3\rangle}{\|\xi_3\|} = \frac{\phi_1 + \varepsilon^{*} \phi_2 + \varepsilon \phi_3}{\sqrt{3(1 - S)}}. \tag{19.140}$$
Then we have
(Ψ2 Ψ3)U = ( (1/√2)(Ψ2 + Ψ3)  (i/√2)(−Ψ2 + Ψ3) ).   (19.142)

Thus, defining Ψ̃2 and Ψ̃3 as

Ψ̃2 = (1/√2)(Ψ2 + Ψ3)  and  Ψ̃3 = (i/√2)(−Ψ2 + Ψ3),   (19.143)
we get
Ψ̃2 = (1/√(6(1 − S))) [2ϕ1 + (ε + ε*)ϕ2 + (ε* + ε)ϕ3]
   = (1/√(6(1 − S))) (2ϕ1 − ϕ2 − ϕ3).   (19.144)
Also, we have
Ψ̃3 = (i/√(6(1 − S))) [(ε* − ε)ϕ2 + (ε − ε*)ϕ3] = (i/√(6(1 − S))) (−i√3 ϕ2 + i√3 ϕ3)
   = (1/√(2(1 − S))) (ϕ2 − ϕ3).   (19.145)
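The passage from the complex pair (Ψ2, Ψ3) to the real pair (Ψ̃2, Ψ̃3) by the unitary transformation above can be verified with a few lines of complex arithmetic. The sketch below works with unnormalized coefficient vectors on (ϕ1, ϕ2, ϕ3) and omits the common factor 1/√(3(1 − S)):

```python
import numpy as np

eps = np.exp(2j * np.pi / 3)                 # epsilon = exp(2*pi*i/3)
xi2 = np.array([1, eps, eps.conjugate()])    # coefficients of xi2 on (phi1, phi2, phi3)
xi3 = np.array([1, eps.conjugate(), eps])    # xi3: the complex conjugate SALC

tilde2 = (xi2 + xi3) / np.sqrt(2)            # proportional to 2*phi1 - phi2 - phi3
tilde3 = 1j * (-xi2 + xi3) / np.sqrt(2)      # proportional to phi2 - phi3

print(tilde2.real, tilde3.real)
```

Both results come out purely real, confirming that the unitary recombination (19.143) turns the complex conjugate pair into real orbitals.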
19.4.3 Benzene
The structural formula of benzene is shown in Fig. 19.9. Benzene is a planar
molecule whose six carbon atoms form a regular hexagon. Hence, the molecule
belongs to the D6h symmetry. In the molecule, six π-electrons extend perpendicularly
to the molecular plane, above and below it, as in the cases of ethylene and the
cyclopropenyl radical. This is a standard illustration of quantum chemistry and is
dealt with in many textbooks. As in the cases of ethylene and the cyclopropenyl
radical, the problem can be treated similarly.
As before, six equivalent pz atomic orbitals of carbon are denoted by ϕ1 to ϕ6 in
Fig. 19.10. These vectors or their linear combinations span a six-dimensional
representation space. We construct basis vectors using these vectors. Following
the previous procedures, we construct proper SALC orbitals along with MOs.
Similarly as before, a subgroup C6 of D6h plays an essential role. This subgroup
contains six group elements such that

C6 = {E, C6, C3, C2, C3², C6⁵}.   (19.146)
          ⎛ 0 1 0 0 0 0 ⎞
          ⎜ 0 0 1 0 0 0 ⎟
          ⎜ 0 0 0 1 0 0 ⎟
C6(z) =   ⎜ 0 0 0 0 1 0 ⎟ .   (19.148)
          ⎜ 0 0 0 0 0 1 ⎟
          ⎝ 1 0 0 0 0 0 ⎠
Once again, we can determine the trace for individual symmetry transformations
belonging to D6h. We collect the results in Table 19.7. The representation is
reducible and this is reduced as follows using a character table of D6h
(Table 19.8). As a result, we get

Γ = A2u + B2g + E1g + E2u.
Here, we used Table 19.9 that shows a character table of irreducible representa-
tions of C6. Following the previous procedures, as SALCs we have
Table 19.7 Characters for individual symmetry transformations of 2pz orbitals in benzene

D6h |  E   2C6  2C3  C2  3C2′  3C2″  i   2S3  2S6  σh   3σd  3σv
Γ   |  6    0    0   0   −2     0   0    0    0   −6    0    2
6P(A)ϕ1 = ξ1 = ϕ1 + ϕ2 + ϕ3 + ϕ4 + ϕ5 + ϕ6,
6P(B)ϕ1 = ξ2 = ϕ1 − ϕ2 + ϕ3 − ϕ4 + ϕ5 − ϕ6,
6P[E1(1)]ϕ1 = ξ3 = ϕ1 + εϕ2 − ε*ϕ3 − ϕ4 − εϕ5 + ε*ϕ6,
6P[E1(2)]ϕ1 = ξ4 = ϕ1 + ε*ϕ2 − εϕ3 − ϕ4 − ε*ϕ5 + εϕ6,   (19.151)
6P[E2(1)]ϕ1 = ξ5 = ϕ1 − ε*ϕ2 − εϕ3 + ϕ4 − ε*ϕ5 − εϕ6,
6P[E2(2)]ϕ1 = ξ6 = ϕ1 − εϕ2 − ε*ϕ3 + ϕ4 − εϕ5 − ε*ϕ6.
H̃11 = ∫ ξ1* H ξ1 dτ = ⟨ξ1|Hξ1⟩ = 6(α + 2β + 2β′ + β″).   (19.153)
Meanwhile, we have
S̃11 = ⟨ξ1|ξ1⟩ = 6(1 + 2S + 2S′ + S″),
S̃22 = ⟨ξ2|ξ2⟩ = 6(1 − 2S + 2S′ − S″),
S̃33 = S̃44 = 6(1 + S − S′ − S″),            (19.155)
S̃55 = S̃66 = 6(1 − S − S′ + S″),
where S, S′, and S″ are overlap integrals between the ortho, meta, and para positions,
respectively. Substituting (19.153) through (19.155) into (19.152), the energy eigenvalues
λi = H̃ii/S̃ii are readily obtained as

λ1 = (α + 2β + 2β′ + β″)/(1 + 2S + 2S′ + S″),  λ2 = (α − 2β + 2β′ − β″)/(1 − 2S + 2S′ − S″),
λ3 = λ4 = (α + β − β′ − β″)/(1 + S − S′ − S″),  λ5 = λ6 = (α − β − β′ + β″)/(1 − S − S′ + S″).
Notice that the two MOs ξ3 and ξ4, as well as ξ5 and ξ6, are degenerate. As can be seen,
λ3 and λ4 are doubly degenerate; so are λ5 and λ6.
From (19.151), we get for instance
||ξ1|| = √⟨ξ1|ξ1⟩ = √(6(1 + 2S + 2S′ + S″)).   (19.157)

Ψ1 = |ξ1⟩/||ξ1|| = (ϕ1 + ϕ2 + ϕ3 + ϕ4 + ϕ5 + ϕ6)/√(6(1 + 2S + 2S′ + S″)).   (19.158)
Following the previous examples, we have other normalized MOs. That is, we
have
Ψ2 = |ξ2⟩/||ξ2|| = (ϕ1 − ϕ2 + ϕ3 − ϕ4 + ϕ5 − ϕ6)/√(6(1 − 2S + 2S′ − S″)),
Ψ3 = |ξ3⟩/||ξ3|| = (ϕ1 + εϕ2 − ε*ϕ3 − ϕ4 − εϕ5 + ε*ϕ6)/√(6(1 + S − S′ − S″)),
Ψ4 = |ξ4⟩/||ξ4|| = (ϕ1 + ε*ϕ2 − εϕ3 − ϕ4 − ε*ϕ5 + εϕ6)/√(6(1 + S − S′ − S″)),   (19.159)
Ψ5 = |ξ5⟩/||ξ5|| = (ϕ1 − ε*ϕ2 − εϕ3 + ϕ4 − ε*ϕ5 − εϕ6)/√(6(1 − S − S′ + S″)),
Ψ6 = |ξ6⟩/||ξ6|| = (ϕ1 − εϕ2 − ε*ϕ3 + ϕ4 − εϕ5 − ε*ϕ6)/√(6(1 − S − S′ + S″)).
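That the SALCs (19.151) diagonalize the Hückel-type Hamiltonian can be confirmed directly, since the benzene problem is a circulant one. A minimal sketch, with overlap ignored and purely illustrative values of α, β, β′, β″:

```python
import numpy as np

# Circulant Hückel matrix for benzene: alpha on the diagonal, beta (ortho),
# beta_m (meta) and beta_p (para) couplings.  Overlap is ignored for brevity;
# all parameter values are illustrative.
alpha, beta, beta_m, beta_p = 0.0, -1.0, -0.1, -0.05
first_row = [alpha, beta, beta_m, beta_p, beta_m, beta]
H = np.array([[first_row[(j - i) % 6] for j in range(6)] for i in range(6)])

eps = np.exp(1j * np.pi / 3)            # epsilon = exp(i*pi/3)
# coefficient vectors of xi1, xi2, xi3 on (phi1, ..., phi6), cf. (19.151)
xi = {
    1: np.array([1, 1, 1, 1, 1, 1]),
    2: np.array([1, -1, 1, -1, 1, -1]),
    3: np.array([1, eps, -eps.conjugate(), -1, -eps, eps.conjugate()]),
}

lam = {}
for k, v in xi.items():
    Hv = H @ v
    lam[k] = (Hv[0] / v[0]).real        # candidate eigenvalue
    assert np.allclose(Hv, lam[k] * v)  # each SALC is an eigenvector of H
print(lam)
```

The eigenvalues come out as α + 2β + 2β′ + β″, α − 2β + 2β′ − β″, and α + β − β′ − β″, i.e., the overlap-free limits of the λi above.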
Transforming Ψ3 and Ψ4 as before, we have

Ψ̃3 = (2ϕ1 + ϕ2 − ϕ3 − 2ϕ4 − ϕ5 + ϕ6)/√(12(1 + S − S′ − S″)),
                                                               (19.160)
Ψ̃4 = (ϕ2 + ϕ3 − ϕ5 − ϕ6)/(2√(1 + S − S′ − S″)).
Similarly, transforming Ψ5 and Ψ6 to Ψ̃5 and Ψ̃6, respectively, we have

Ψ̃5 = (2ϕ1 − ϕ2 − ϕ3 + 2ϕ4 − ϕ5 − ϕ6)/√(12(1 − S − S′ + S″)),
                                                               (19.161)
Ψ̃6 = (ϕ2 − ϕ3 + ϕ5 − ϕ6)/(2√(1 − S − S′ + S″)).
where Φ(A1g) stands for the totally symmetric ground-state electronic configuration
and Φ(E1g E2u) denotes an excited-state electronic configuration represented by a
direct-product representation. This representation is reducible and expressed as a
direct sum of irreducible representations such that

Γ = E1g ⊗ E2u = B1u + B2u + E1u.
We revisit the allyl radical and perform its MO calculations. As already noted,
Tables 18.1 and 18.2 of Example 18.1 collected representation matrices of individual
symmetry operations in reference to the basis vectors comprising three atomic
orbitals of allyl radical. As usual, we examine traces (or characters) of those
matrices. Table 19.10 collects them. The representation is readily reduced according
to the character table of C2v (see Table 18.4) so that we have
Γ = A2 + 2B1.   (19.164)
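The reduction (19.164) is an instance of the reduction formula qα = (1/n) Σg χ(α)(g)* χ(g). The sketch below applies it in C2v; the character row (3, −1, 1, −3) for the three pz orbitals is written under an assumed plane convention (molecular plane = yz) chosen so as to reproduce Γ = A2 + 2B1, and may be labelled differently from Table 19.10.

```python
# C2v character table, class order: E, C2, sigma_v(xz), sigma_v'(yz)
irreps = {
    "A1": [1, 1, 1, 1],
    "A2": [1, 1, -1, -1],
    "B1": [1, -1, 1, -1],
    "B2": [1, -1, -1, 1],
}
# characters of the three allyl p_z orbitals (assumed plane convention)
gamma = [3, -1, 1, -3]

order = 4
mult = {name: sum(c * g for c, g in zip(chi, gamma)) // order
        for name, chi in irreps.items()}
print(mult)   # A2 once and B1 twice, cf. (19.164)
```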
We have two SALC orbitals that belong to the same irreducible representation of
B1. As noted in Sect. 19.3, the orbitals obtained by a linear combination of these two
SALCs belong to B1 as well. Such a linear combination is given by a unitary
transformation. In the present case, it is convenient to transform the basis vectors
two times. The first transformation is carried out to get SALCs and the second one
will be done in the process of solving a secular equation using the SALCs. Sche-
matically showing the procedures, we have
{ϕ1, ϕ2, ϕ3} → {Ψ1, Ψ2, Ψ3} → {Φ1, Φ2, Φ3},
where ϕ1, ϕ2, ϕ3 denote the original atomic orbitals; Ψ1, Ψ2, Ψ3 the SALCs; Φ1, Φ2,
Φ3 the final MOs. Thus, the three sets of vectors are connected through unitary
transformations.
Starting with ϕ1 and following the previous cases, we have, e.g.,
P(B1)ϕ1 = (1/4) Σg χ(B1)(g)* gϕ1 = ϕ1.
Meanwhile, we get

P(A2)ϕ1 = 0,
P(A2)ϕ2 = (1/4) Σg χ(A2)(g)* gϕ2 = (1/2)(ϕ2 − ϕ3).
Thus, we recovered the results of Example 18.1. Notice that ϕ1 does not participate
in A2, but takes part in B1 by itself.
Normalized SALCs are given as follows:
Ψ1 = ϕ1,  Ψ2 = (ϕ2 + ϕ3)/√(2(1 + S′)),  Ψ3 = (ϕ2 − ϕ3)/√(2(1 − S′)),

where we define S′ ≡ ∫ ϕ2 ϕ3 dτ. If Ψ1, Ψ2, and Ψ3 belonged to different irreducible
representations (as in the cases of previous three examples of ethylene,
cyclopropenyl radical, and benzene), the secular equation would be fully reduced
to a form of (19.66). In the present case, however, Ψ 1 and Ψ 2 belong to the same
irreducible representation B1, making the situation a bit complicated. Nonetheless,
we can use (19.61) and the secular equation is “partially” reduced.
Defining

Hjk = ∫ Ψj* H Ψk dτ  and  Sjk = ∫ Ψj* Ψk dτ,

we obtain the secular equation

det |Hjk − λSjk| = 0.
where α, β, and S are defined similarly to (19.94) and (19.98). The quantity β′ is
defined as

β′ ≡ ∫ ϕ2* H ϕ3 dτ.

The equation associated with Ψ3, which belongs to A2 by itself, immediately gives

λ = (α − β′)/(1 − S′).
The first equation of (19.165) is somewhat complicated, and so we adopt the next
approximation. That is,
S′ = β′ = 0.   (19.166)
This approximation is justified, because two carbon atoms C2 and C3 are pretty
remote, and so the interaction between them is likely to be weak enough. Thus, we
rewrite the first equation of (19.165) as
| α − λ        √2(β − Sλ) |
| √2(β − Sλ)   α − λ      | = 0.   (19.167)
19.4 MO Calculations Based on π-Electron Approximation 773
Solving (19.167), we obtain

λL = (α + √2β)/(1 + √2S)  and  λH = (α − √2β)/(1 − √2S),
where λL < λH. To determine the corresponding eigenfunctions Φ (i.e., MOs), we use
a linear combination of two SALCs Ψ 1 and Ψ 2. That is, putting
Φ ¼ c1 Ψ 1 þ c2 Ψ 2 ð19:169Þ
λ0 = α  and  Φ0 = (1/√2)(ϕ2 − ϕ3),   (19.172)
where we have λL < λ0 < λH. The eigenfunction Φ0 does not participate in chemical
bonding and, hence, is said to be a nonbonding orbital.
It is worth noting that the eigenfunctions ΦL, Φ0, and ΦH have the same functional
forms as those obtained by the simple Hückel theory that ignores the overlap
integrals S. This is because the interaction between ϕ2 and ϕ3, which construct the
SALCs Ψ2 and Ψ3, is weak.
Notice that, within the framework of the simple Hückel theory, the two sets of basis
vectors |ϕ1⟩, (1/√2)|ϕ2 + ϕ3⟩, (1/√2)|ϕ2 − ϕ3⟩ and |ΦL⟩, |ΦH⟩, |Φ0⟩ are connected by
the following unitary matrix V:
(|ΦL⟩ |ΦH⟩ |Φ0⟩) = ( |ϕ1⟩  (1/√2)|ϕ2 + ϕ3⟩  (1/√2)|ϕ2 − ϕ3⟩ ) V,

where

      ⎛ 1/√2    1/√2   0 ⎞
V  =  ⎜ 1/√2   −1/√2   0 ⎟ .   (19.173)
      ⎝  0       0     1 ⎠
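Within the simple Hückel approximation (S ignored; α = 0 and β = −1 used as illustrative energy units), the allyl levels α ± √2β and α, and the nonbonding character of Φ0, can be checked directly; recall that ϕ1 here is the central carbon:

```python
import numpy as np

# Simple Hückel matrix for allyl with phi1 the central carbon:
# phi1 couples to phi2 and phi3; the remote phi2-phi3 coupling is neglected.
alpha, beta = 0.0, -1.0            # illustrative energy units
H = np.array([[alpha, beta, beta],
              [beta, alpha, 0.0],
              [beta, 0.0, alpha]])

vals, vecs = np.linalg.eigh(H)     # ascending: bonding, nonbonding, antibonding
print(vals)
```

The nonbonding eigenvector carries no weight on ϕ1 and is proportional to ϕ2 − ϕ3, i.e., Φ0; the bonding and antibonding vectors reproduce the first two columns of V.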
The optical transition of the allyl radical represents general features of the optical
transitions of molecules. To keep the story simple, let us consider the case of the allyl
cation. Figure 19.12 shows the electronic configurations together with the symmetry
species of the individual eigenstates and their corresponding energy eigenvalues for
the allyl cation. In Fig. 19.13, we redraw its geometry, where the origin is located at
the center of a line segment connecting C2 and C3.
Fig. 19.12 Electronic configurations and symmetry species of individual eigenstates along with
their corresponding energy eigenvalues for the allyl cation. (a) Ground state. (b) First excited state.
(c) Second excited state
Fig. 19.13 Geometry and position vectors of carbon atoms of the allyl cation. The origin is located
at the center of a line segment connecting C2 and C3 (r2 + r3 = 0)
Γ = B1 ⊗ A2 = B2.

The direct product of the initial electronic configuration Θi(A1) and the final
configuration Θf(B1 A2) is B1 ⊗ A2 ⊗ A1 = B2. Then, if εe·P belongs to the
same irreducible representation B2, the associated optical transition should be
allowed. Consulting a character table of C2v, we find that y belongs to B2. Thus,
the allowed optical transition is polarized along the y-direction.
(ii) ΦL → ΦH: In parallel with the above case, Tfi is described by

Tfi = ⟨Θf(B1 B1)|εe·P|Θi(A1)⟩.   (19.176)
This transition is expressed as A1 → A1, where the former A1 indicates the electronic
ground state and the latter A1 indicates the excited state given by B1 ⊗ B1 = A1. The
direct product of them is simply described as B1 ⊗ B1 ⊗ A1 = A1. Consulting a
character table again, we find that z belongs to A1. This implies that the allowed
transition is polarized along the z-direction.
Next, we investigate the above optical transition in a semiquantitative manner. In
principle, the transition matrix element Tfi should be estimated from (19.175) and
(19.176) that use electronic configurations of two-electron system. Nonetheless, a
formulation of (4.7) of Sect. 4.1 based upon one-electron states well serves our
present purpose. For ΦL ! Φ0 transition, we have
Tfi(ΦL → Φ0) = ∫ Φ0* εe·P ΦL dτ = −e εe·∫ Φ0* r ΦL dτ
 = −e εe·∫ (1/2)(√2ϕ1 + ϕ2 + ϕ3) r (1/√2)(ϕ2 − ϕ3) dτ
 = −(e εe/(2√2))·∫ (√2ϕ1rϕ2 − √2ϕ1rϕ3 + ϕ2rϕ2 + ϕ2rϕ3 − ϕ3rϕ2 − ϕ3rϕ3) dτ
 ≈ −(e εe/(2√2))·∫ (ϕ2rϕ2 − ϕ3rϕ3) dτ
 ≈ −(e εe/(2√2))·( r2 ∫ |ϕ2|² dτ − r3 ∫ |ϕ3|² dτ )
 = −(e εe/(2√2))·(r2 − r3) = −(e εe/√2)·r2,
where in the first near-equality we ignored the integrals ∫ ϕi r ϕj dτ (i ≠ j), and in
the last near-equality we put r ≈ r2 or r3. For these approximations, we assumed that
the electron density is very high near C2 or C3, with negligible density at places
remote from them. Choosing εe along the direction of r2, we get
|Tfi(ΦL → Φ0)| ≈ (e/√2)|r2|.

Similarly, for the ΦL → ΦH transition we get

|Tfi(ΦL → ΦH)| ≈ (e/2)|r1|.
The transition probability is proportional to the square of the absolute value of Tfi.
Using |r2| ≈ √3|r1|, we have

|Tfi(ΦL → Φ0)|² ≈ (e²/2)|r2|² = (3e²/2)|r1|²,
|Tfi(ΦL → ΦH)|² ≈ (e²/4)|r1|².
Thus, we obtain

|Tfi(ΦL → Φ0)|² ≈ 6|Tfi(ΦL → ΦH)|².   (19.177)
Thus, the transition probability of ΦL → Φ0 is about six times that of ΦL → ΦH.
Note that in the above simple estimation we ignored the overlap integral S.
From the above discussion, we conclude that (i) the ΦL → Φ0 transition is
polarized along the r2 direction (i.e., the molecular long axis), whereas the ΦL → ΦH
transition is polarized along the r1 direction (i.e., the molecular short axis), and that
(ii) the transition probability of ΦL → Φ0 is about six times that of ΦL → ΦH. Note
that the polarized characteristics are consistent with those obtained from the discus-
sion based on the group theory. The conclusion reached by the semiquantitative
estimation of a simple molecule of allyl cation well typifies the general optical
features of more complicated molecules having a well-defined molecular long axis
such as polyenes.
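The factor of six in (19.177) can be reproduced numerically from the simple Hückel MOs together with the point-dipole approximation used above. The geometry below assumes idealized 120° bond angles and unit bond length (illustrative values only):

```python
import numpy as np

d = 1.0                                  # C-C bond length (arbitrary units)
# C1 at the apex; origin at the midpoint of C2-C3 (r2 + r3 = 0)
r1 = np.array([0.0, d / 2])
r2 = np.array([d * np.sqrt(3) / 2, 0.0])
r3 = -r2
pos = np.array([r1, r2, r3])

# simple Hückel MO coefficients on (phi1, phi2, phi3), phi1 = central carbon
cL = np.array([1 / np.sqrt(2), 0.5, 0.5])            # bonding Phi_L
c0 = np.array([0.0, 1 / np.sqrt(2), -1 / np.sqrt(2)])  # nonbonding Phi_0
cH = np.array([1 / np.sqrt(2), -0.5, -0.5])          # antibonding Phi_H

def dipole(cf, ci):
    # point-dipole approximation: sum_i cf_i * ci_i * r_i
    return (cf * ci) @ pos

ratio = np.sum(dipole(c0, cL) ** 2) / np.sum(dipole(cH, cL) ** 2)
print(ratio)
```

With |r2| = √3|r1| for this geometry, the squared-moment ratio evaluates to six, matching (19.177).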
For example, with C3xyz (the C3 rotation about the axis connecting the carbon atom and H1) we have

(H1 H2 H3 H4 C2s C2px C2py C2pz) C3xyz = (H1 H3 H4 H2 C2s C2py C2pz C2px),

where by the above notations we denoted atomic species and molecular orbitals.
Hence, as a matrix representation we have
          ⎛ 1 0 0 0 0 0 0 0 ⎞
          ⎜ 0 0 0 1 0 0 0 0 ⎟
          ⎜ 0 1 0 0 0 0 0 0 ⎟
          ⎜ 0 0 1 0 0 0 0 0 ⎟
C3xyz  =  ⎜ 0 0 0 0 1 0 0 0 ⎟ ,   (19.178)
          ⎜ 0 0 0 0 0 0 0 1 ⎟
          ⎜ 0 0 0 0 0 1 0 0 ⎟
          ⎝ 0 0 0 0 0 0 1 0 ⎠
where C3xyz is the same operation as Rxyz2π/3 that appeared in Sect. 17.3. Therefore,
19.5 MO Calculations of Methane 779
χ(C3xyz) = 2.
As another example, with σdyz we have

(H1 H2 H3 H4 C2s C2px C2py C2pz) σdyz = (H1 H2 H4 H3 C2s C2px C2pz C2py),

where σdyz represents a mirror symmetry with respect to the plane that includes the
x-axis and bisects the angle formed by the y- and z-axes. Thus, we have
          ⎛ 1 0 0 0 0 0 0 0 ⎞
          ⎜ 0 1 0 0 0 0 0 0 ⎟
          ⎜ 0 0 0 1 0 0 0 0 ⎟
          ⎜ 0 0 1 0 0 0 0 0 ⎟
σdyz   =  ⎜ 0 0 0 0 1 0 0 0 ⎟ .   (19.179)
          ⎜ 0 0 0 0 0 1 0 0 ⎟
          ⎜ 0 0 0 0 0 0 0 1 ⎟
          ⎝ 0 0 0 0 0 0 1 0 ⎠
Then, we have

χ(σdyz) = 4.
Similarly, with C2z we get

        ⎛ 0 0 0 1 0 0 0 0 ⎞
        ⎜ 0 0 1 0 0 0 0 0 ⎟
        ⎜ 0 1 0 0 0 0 0 0 ⎟
        ⎜ 1 0 0 0 0 0 0 0 ⎟
C2z  =  ⎜ 0 0 0 0 1 0 0 0 ⎟ ,   (19.180)
        ⎜ 0 0 0 0 0 −1 0 0 ⎟
        ⎜ 0 0 0 0 0 0 −1 0 ⎟
        ⎝ 0 0 0 0 0 0 0 1 ⎠

χ(C2z) = 0.
With S4zπ/2 (i.e., an improper rotation by π/2 around the z-axis), we get

(H1 H2 H3 H4 C2s C2px C2py C2pz) S4zπ/2 = (H3 H1 H4 H2 C2s C2py −C2px −C2pz),
           ⎛ 0 1 0 0 0 0 0 0 ⎞
           ⎜ 0 0 0 1 0 0 0 0 ⎟
           ⎜ 1 0 0 0 0 0 0 0 ⎟
           ⎜ 0 0 1 0 0 0 0 0 ⎟
S4zπ/2  =  ⎜ 0 0 0 0 1 0 0 0 ⎟ ,   (19.181)
           ⎜ 0 0 0 0 0 0 −1 0 ⎟
           ⎜ 0 0 0 0 0 1 0 0 ⎟
           ⎝ 0 0 0 0 0 0 0 −1 ⎠

χ(S4zπ/2) = 0.
Needless to say, χ(E) = 8.
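The representation matrices (19.178)–(19.181) and their traces can also be generated programmatically. In this sketch (helper names are our own, not the book's), each operation is encoded as a signed permutation of the basis (H1, H2, H3, H4, C2s, C2px, C2py, C2pz):

```python
import numpy as np

BASIS = ["H1", "H2", "H3", "H4", "C2s", "C2px", "C2py", "C2pz"]

def rep_matrix(images):
    """Build the 8x8 matrix from a list of (name, sign) images, one per slot."""
    M = np.zeros((8, 8))
    for j, (name, sign) in enumerate(images):
        M[BASIS.index(name), j] = sign
    return M

# image of each basis function (slot j holds the image of BASIS[j]),
# transcribed from (19.178)-(19.181)
ops = {
    "C3xyz": [("H1", 1), ("H3", 1), ("H4", 1), ("H2", 1),
              ("C2s", 1), ("C2py", 1), ("C2pz", 1), ("C2px", 1)],
    "sigma_d_yz": [("H1", 1), ("H2", 1), ("H4", 1), ("H3", 1),
                   ("C2s", 1), ("C2px", 1), ("C2pz", 1), ("C2py", 1)],
    "C2z": [("H4", 1), ("H3", 1), ("H2", 1), ("H1", 1),
            ("C2s", 1), ("C2px", -1), ("C2py", -1), ("C2pz", 1)],
    "S4z": [("H3", 1), ("H1", 1), ("H4", 1), ("H2", 1),
            ("C2s", 1), ("C2py", 1), ("C2px", -1), ("C2pz", -1)],
}
traces = {name: np.trace(rep_matrix(im)) for name, im in ops.items()}
print(traces)   # characters 2, 4, 0, 0; chi(E) = 8 is the identity's trace
```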
In other words, V8 is decomposed into the above two R-invariant subspaces (see
Part III); one is the hydrogen-related subspace and the other is the carbon-related
subspace.
Correspondingly, a representation D comprising the above representation matri-
ces should be reduced to a direct sum of irreducible representations D(α). That is, we
should have
D = Σα qα D(α),

where

qα = (1/n) Σg χ(α)(g)* χ(g).   (18.83)
In the present case, we have

qA1 = (1/24) Σg χ(A1)(g)* χ(g) = (1/24)(1·8 + 8·1·2 + 6·1·4) = 2,
qA2 = (1/24)[1·8 + 8·1·2 + 6·(−1)·4] = 0,
qT2 = (1/24)(3·8 + 6·1·4) = 2.

Thus, we get

D = 2A1 + 2T2.   (19.183)
24
Note that this diagonal matrix is identical to the submatrix of (19.180) for
Span{C2s, C2px, C2py, C2pz}. Similarly, S4zπ/2 of (19.181) gives eigenvalues ±1 and
±i for both subspaces. Notice that since S4zπ/2 is unitary, its eigenvalues take complex
values with an absolute value of 1. Since these symmetry operation matrices are
unitary, they can be diagonalized according to Theorem 14.5. Using unitary matrices
whose column vectors are chosen from eigenvectors of the matrices, the diagonal
elements are identical to the eigenvalues, including their multiplicity. Writing the
representation matrices of a symmetry operation for the hydrogen-associated
subspace and the carbon-associated subspace as H and C, we find that H and C have
the same eigenvalues in common. Notice here that different types of transformation
matrices (e.g., C2z, S4zπ/2) give different sets of eigenvalues. Via unitary similarity
transformations using unitary matrices P and Q, we get
P⁻¹HP = Q⁻¹CQ,  or  (PQ⁻¹)⁻¹ H (PQ⁻¹) = C.
Namely, H and C are similar, i.e., the representation is equivalent. Thus, recalling
Schur’s First Lemma, Eq. (19.184) results.
Our next task is to construct SALCs, i.e., proper basis vectors using projection
operators. From the above, we anticipate that the MOs comprise a linear combina-
tion of the hydrogen-associated SALC and carbon-associated SALC that belongs to
the same irreducible representation (i.e., A1 or T2). For this purpose, we first find
proper SALCs using projection operators described as
P(α)l(l) = (dα/n) Σg [D(α)(g)ll]* g.   (18.156)
P(A1)1(1) H1 = (dA1/n) Σg D(A1)(g)11* gH1
 = (1/24)[(1·H1) + (1·H1 + 1·H1 + 1·H3 + 1·H4 + 1·H3 + 1·H2 + 1·H4 + 1·H2)
 + (1·H4 + 1·H3 + 1·H2) + (1·H3 + 1·H2 + 1·H2 + 1·H4 + 1·H4 + 1·H3)
 + (1·H1 + 1·H4 + 1·H1 + 1·H2 + 1·H1 + 1·H3)]
 = (1/24)(6·H1 + 6·H2 + 6·H3 + 6·H4)
 = (1/4)(H1 + H2 + H3 + H4).   (19.185)
The case of C2s is simple, because all the symmetry operations convert C2s to
itself. That is, we have
P(A1)1(1) C2s = (1/24)(24·C2s) = C2s.   (19.186)
Meanwhile, we have

P(A1)1(1) C2px = (1/24) Σg gC2px = 0,

because the 24 images of C2px (namely ±C2px, ±C2py, and ±C2pz) cancel out in the sum.
The calculation is somewhat tedious, but it is natural that, since C2s is spherically
symmetric, it belongs to the totally symmetric representation. Conversely, it is
natural to think that C2px is totally unlikely to contain the totally symmetric
representation. This is also the case with C2py and C2pz.
Table 19.12 shows the character table of Td in which the three-dimensional
irreducible representation T2 is spanned by basis vectors (x y z). Since in
Table 17.6 each (3, 3) matrix is given in reference to the vectors (x y z), it can
directly be utilized to represent T2. More specifically, we can directly choose the
diagonal elements (1, 1), (2, 2), and (3, 3) of the individual (3, 3) matrices for the
D(T2)11, D(T2)22, and D(T2)33 elements of the projection operators, respectively.
Thus, we can
construct SALCs using projection operators explained in Sect. 18.7. For example,
using H1 we obtain SALCs that belong to T2 such that
P(T2)1(1) H1 = (3/24) Σg D(T2)(g)11* gH1
 = (3/24){(1·H1) + [(−1)·H4 + (−1)·H3 + 1·H2]
 + [0·(−H3) + 0·(−H2) + 0·(−H2) + 0·(−H4) + (−1)·H4 + (−1)·H3]
 + (0·H1 + 0·H4 + 1·H1 + 1·H2 + 0·H1 + 0·H3)}
 = (3/24)[2·H1 + 2·H2 + (−2)·H3 + (−2)·H4]
 = (1/4)(H1 + H2 − H3 − H4).   (19.187)
Similarly, we have
P(T2)2(2) H1 = (1/4)(H1 − H2 + H3 − H4),   (19.188)
P(T2)3(3) H1 = (1/4)(H1 − H2 − H3 + H4).   (19.189)
Now we can easily guess that P(T2)1(1) C2px solely contains C2px. Likewise,
P(T2)2(2) C2py and P(T2)3(3) C2pz contain only C2py and C2pz, respectively. In fact,
we obtain what we anticipate. That is,

P(T2)1(1) C2px = C2px,  P(T2)2(2) C2py = C2py,  P(T2)3(3) C2pz = C2pz.   (19.190)

This gives a good example to illustrate the general concept of projection operators
and related calculations of inner products discussed in Sect. 18.7. For example, the
inner product ⟨P(T2)1(1)H1 | P(T2)1(1)C2px⟩ is nonvanishing, whereas an inner product
of, e.g., ⟨P(T2)2(2)H1 | P(T2)1(1)C2px⟩ is zero. This significantly reduces the effort
needed to solve a secular equation (vide infra). Notice that the functions P(T2)1(1)H1,
P(T2)1(1)C2px, etc. are linearly independent of one another. Recall also that
P(T2)1(1)H1 and P(T2)1(1)C2px correspond to ϕl(α) and ψl(α) in (18.171), respectively.
That is, the former functions are linearly independent, while they belong to the same
place “1” of the same three-dimensional irreducible representation T2.
Equations (19.187) to (19.190) seem intuitively obvious. This is because, if we draw
the molecular geometry of methane (see Fig. 19.14), we can immediately recognize
the relationship between the directionality in ℝ3 and the “directionality” of SALCs
represented by the signs of the hydrogen atomic orbitals (or C2px, C2py, and C2pz of
carbon).
As stated above, we have successfully obtained SALCs relevant to methane.
Therefore, our next task is to solve an eigenvalue problem and construct appropriate
MOs. To do this, let us first normalize the SALCs. We assume that carbon-based
SALCs have already been normalized as well-studied atomic orbitals. We suppose
that all the functions are real. For the hydrogen-based SALCs
⟨H1 + H2 + H3 + H4 | H1 + H2 + H3 + H4⟩ = 4⟨H1|H1⟩ + 12⟨H1|H2⟩
 = 4 + 12SHH = 4(1 + 3SHH),   (19.191)

where the second equality comes from the fact that |H1⟩, i.e., a 1s atomic orbital, is
normalized. We define ⟨H1|H2⟩ ≡ SHH, i.e., an overlap integral between two
adjacent hydrogen atoms. Note also that ⟨Hi|Hj⟩ (1 ≤ i, j ≤ 4; i ≠ j) is the same
because of the symmetry requirement.
Thus, as a normalized SALC we have

H(A1) ≡ |H1 + H2 + H3 + H4⟩ / (2√(1 + 3SHH)).   (19.192)
Defining the denominator as c ≡ 2√(1 + 3SHH), we have

H(A1) = |H1 + H2 + H3 + H4⟩/c.
Also, we have
⟨H1 + H2 − H3 − H4 | H1 + H2 − H3 − H4⟩ = 4⟨H1|H1⟩ − 4⟨H1|H2⟩
 = 4 − 4SHH = 4(1 − SHH).   (19.194)

Thus, we have

H1(T2) ≡ |H1 + H2 − H3 − H4⟩ / (2√(1 − SHH)).   (19.195)
Also defining the denominator as d ≡ 2√(1 − SHH), we have

H1(T2) = |H1 + H2 − H3 − H4⟩/d,
H2(T2) = |H1 − H2 + H3 − H4⟩/d,
H3(T2) = |H1 − H2 − H3 + H4⟩/d.
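The normalization factors 4(1 + 3SHH) and 4(1 − SHH) in (19.191) and (19.194) are quadratic forms of the hydrogen overlap matrix, and can be spot-checked numerically (SHH = 0.1 is an arbitrary illustrative value):

```python
import numpy as np

S_HH = 0.1   # illustrative H...H overlap; all pairs equal by symmetry
S = np.full((4, 4), S_HH)
np.fill_diagonal(S, 1.0)

v_a1 = np.array([1, 1, 1, 1])       # A1 SALC coefficients
v_t2 = np.array([1, 1, -1, -1])     # one of the T2 SALCs

norm_a1 = v_a1 @ S @ v_a1           # = 4*(1 + 3*S_HH), cf. (19.191)
norm_t2 = v_t2 @ S @ v_t2           # = 4*(1 - S_HH),  cf. (19.194)
print(norm_a1, norm_t2)
```

The two SALCs are also orthogonal with respect to the overlap metric, as they must be since they belong to different irreducible representations.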
The next step is to construct MOs using the above SALCs. To this end, we make a
linear combination using SALCs belonging to the same irreducible representations.
In the case of A1, we choose H(A1) and C2s. Naturally, we anticipate two linear
combinations of

a1 H(A1) + b1 C2s,   (19.196)

where a1 and b1 are arbitrary constants. On the basis of the discussions of projection
operators in Sect. 18.7, both of the above linear combinations belong to A1 as well.
Similarly, according to the projection operators P(T2)1(1), P(T2)2(2), and P(T2)3(3), we
make three sets of linear combinations

q1 P(T2)1(1)H1 + r1 C2px;  q2 P(T2)2(2)H1 + r2 C2py;  q3 P(T2)3(3)H1 + r3 C2pz,   (19.197)
where q1, r1, etc. are arbitrary constants. These three sets of linear combinations
belong to individual “addresses” 1, 2, and 3 of T2. What we have to do is to
determine coefficients of the above MOs and to normalize them by solving the
secular equations. With two different energy eigenvalues, we get two orthogonal
(i.e., linearly independent) MOs for the individual four sets of linear combinations of
19.5 MO Calculations of Methane 787
(19.196) and (19.197). Thus, total eight linear combinations constitute MOs of
methane.
In light of (19.183), the secular equation can be reduced as follows according to
the representations A1 and T2. There we have changed the order of entries in the
equation so that we can deal with the equation easily. Then we have
|H11−λ     H12−λS12                                               |
|H21−λS21  H22−λ                                                  |
|          G11−λ     G12−λT12                                     |
|          G21−λT21  G22−λ                                        |
|                    F11−λ     F12−λV12                           |  = 0,   (19.198)
|                    F21−λV21  F22−λ                              |
|                              K11−λ     K12−λW12                 |
|                              K21−λW21  K22−λ                    |
where off-diagonal elements are all zero. This is because of the symmetry require-
ment (see Sect. 19.2). Thus, in (19.198) the secular equation is decomposed into four
(2, 2) blocks. The first block is pertinent to A1 of a hydrogen-based component and
carbon-based component from the left, respectively. Lower three blocks are perti-
nent to T2 of hydrogen-based and carbon-based components from the left, respec-
ðT Þ ðT Þ ðT Þ
tively, in order of P1ð12Þ , P2ð22Þ , and P3ð32Þ SALCs from the top. The notations follow
those of (19.59).
We compute these equations. The calculations are equivalent to solving the
following four two-dimensional secular equations:
|H11−λ     H12−λS12|
|H21−λS21  H22−λ   | = 0,

|G11−λ     G12−λT12|
|G21−λT21  G22−λ   | = 0,
                                   (19.199)
|F11−λ     F12−λV12|
|F21−λV21  F22−λ   | = 0,

|K11−λ     K12−λW12|
|K21−λW21  K22−λ   | = 0.
Notice that these four secular equations are the same as a (2, 2) determinant of
(19.59) in the form of a secular equation. Note, at the same time, that while (19.59)
did not assume SALCs, (19.199) takes account of SALCs. That is, (19.199)
expresses a secular equation with respect to two SALCs that belong to the same
irreducible representation.
The first equation of (19.199) is written in terms of quantities that include the overlap
integral

SCHA1 ≡ ⟨H1|C2s⟩.   (19.202)
Similarly, we obtain related solutions for the latter three eigenvalue equations of
(19.199). With the second equation of (19.199), for instance, we have

SCHT2 ≡ ⟨H1|C2px⟩,  βCHT2 ≡ ⟨H1|ℌC2px⟩.   (19.214)
In (19.210), the overlap integral T12 between the four hydrogen atomic orbitals and
C2px is built up identically, again from the symmetry requirement. That is, the
integrals of (19.213) are additive regarding a product of the plus components of a
hydrogen atomic orbital of (19.187) and C2px, as well as another product of the minus
components of a hydrogen atomic orbital of (19.187) and C2px; see Fig. 19.14. Notice
that C2px has a node on the yz-plane.
The third and fourth equations of (19.199) give exactly the same eigenvalues as
those given in (19.209). This is obvious from the fact that all the latter three equations
of (19.199) are associated with an irreducible representation T2. The corresponding
three MOs are triply degenerate.
In (19.207) and (19.209), the plus sign gives a higher orbital energy and the minus
sign gives a lower energy. Equations (19.207) and (19.209), however, look
somewhat complicated. To simplify the situation, (i) in, e.g., (19.207) let us consider
a case where |H11| ≫ |H22| or |H11| ≪ |H22|. In that case, (H11 − H22)²
dominates inside the square root, and so, ignoring the terms 2H12S12 and S12², we
have λ ≈ H11 or λ ≈ H22. Inserting these values into (19.199), we have either Ψ ≈
H(A1) or Ψ ≈ C2s, where Ψ is a resulting MO. This implies that no interaction would
arise between H(A1) and C2s. (ii) If, on the other hand, H11 = H22, we would get

λ = [H11 − H12S12 ± |H12 − H11S12|] / (1 − S12²).   (19.215)
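Equation (19.215), equivalently λ = (H11 ± H12)/(1 ± S12) for H11 = H22, can be verified by solving the corresponding 2 × 2 generalized eigenvalue problem numerically (illustrative parameter values):

```python
import numpy as np

H11, H12, S12 = -10.0, -2.0, 0.1   # illustrative values; H22 = H11
H = np.array([[H11, H12], [H12, H11]])
S = np.array([[1.0, S12], [S12, 1.0]])

lams = np.sort(np.linalg.eigvals(np.linalg.solve(S, H)).real)

lam_b = (H11 + H12) / (1 + S12)    # bonding combination (H12 < 0)
lam_a = (H11 - H12) / (1 - S12)    # antibonding combination
print(lams)
```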
H = −(ħ²/2m)∇² − e²/(4πε0 rA) − e²/(4πε0 rB) + e²/(4πε0 R),   (19.216)
where m is the electron rest mass and the other symbols are defined in Fig. 19.15; the
last term represents the repulsive interaction between the two hydrogen nuclei. To
estimate the quantities in (19.215), it is convenient to use dimensionless ellipsoidal
coordinates (μ, ν, ϕ) such that [3]

μ = (rA + rB)/R  and  ν = (rA − rB)/R.   (19.217)
Fig. 19.15 Configuration of electron (e) and hydrogen nuclei (A and B). rA and rB denote a
separation between the electron and A and that between the electron and B, respectively. R denotes
a separation between A and B
The quantity ϕ is an azimuthal angle around the molecular axis (i.e., the straight
line connecting the two nuclei). Then, we have
H11 = ⟨A|H|A⟩ = ⟨A|−(ħ²/2m)∇² − e²/(4πε0 rA)|A⟩ − ⟨A|e²/(4πε0 rB)|A⟩ + ⟨A|e²/(4πε0 R)|A⟩
 = E1s − (e²/(4πε0)) ⟨A|(1/rB)|A⟩ + e²/(4πε0 R),   (19.218)
where E1s is the same as that given in (3.258) with Z = n = 1 and μ replaced with
m (i.e., E1s = −ħ²/(2ma²)). Using the coordinate representation of (3.301), we have
|A⟩ = (1/√π) a^(−3/2) e^(−rA/a).   (19.219)
Converting to the ellipsoidal coordinates (19.217), whose volume element is
dτ = (R³/8)(μ² − ν²) dμ dν dϕ, we have
⟨A|(1/rB)|A⟩ = 2π ∫₁∞ dμ ∫₋₁¹ dν (μ² − ν²) (R³/(8πa³)) e^(−(μ+ν)R/a) / [(R/2)(μ − ν)]
 = (R²/(2a³)) ∫₁∞ dμ ∫₋₁¹ dν (μ + ν) e^(−(μ+ν)R/a).   (19.220)
Putting I ≡ ∫₁∞ dμ ∫₋₁¹ dν (μ + ν) e^(−(μ+ν)R/a), we obtain

I = ∫₁∞ μe^(−μR/a) dμ ∫₋₁¹ e^(−νR/a) dν + ∫₁∞ e^(−μR/a) dμ ∫₋₁¹ νe^(−νR/a) dν.   (19.221)
The above definite integrals can readily be calculated using the methods
described in Sect. 3.7.2; see, e.g., (3.262) and (3.263). For instance, we have
∫₋₁¹ e^(−cν) dν = [−e^(−cν)/c]₋₁¹ = (1/c)(e^c − e^(−c)).   (19.222)
In the present case, c is to be replaced with R/a. Other calculations of the definite
integrals are left for readers. Thus, we obtain
I = (2a³/R³)[1 − (1 + R/a) e^(−2R/a)].
In turn, we have

⟨A|(1/rB)|A⟩ = (R²/(2a³)) I = (1/R)[1 − (1 + R/a) e^(−2R/a)].
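The closed form I = (2a³/R³)[1 − (1 + R/a)e^(−2R/a)] can be spot-checked by brute-force numerical integration of (19.221) with the μ range truncated (a = 1 and R = 2 are illustrative):

```python
import numpy as np

a, R = 1.0, 2.0
c = R / a
trapz = getattr(np, "trapezoid", None) or np.trapz   # NumPy 2.x / 1.x naming

mu = np.linspace(1.0, 30.0, 200001)   # truncate the infinite mu range
nu = np.linspace(-1.0, 1.0, 20001)

# I from (19.221), written as products of one-dimensional integrals
I_num = (trapz(mu * np.exp(-c * mu), mu) * trapz(np.exp(-c * nu), nu)
         + trapz(np.exp(-c * mu), mu) * trapz(nu * np.exp(-c * nu), nu))

I_exact = (2 * a**3 / R**3) * (1 - (1 + R / a) * np.exp(-2 * R / a))
print(I_num, I_exact)
```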
Defining

j0 ≡ e²/(4πε0),

we finally obtain

H11 = E1s − (j0/R)[1 − (1 + R/a) e^(−2R/a)] + j0/R.   (19.224)
Meanwhile, in computing the overlap integral S12 = ⟨A|B⟩ we encounter

∫₁∞ dμ ∫₋₁¹ dν (μ² − ν²) e^(−μR/a) = ∫₁∞ dμ (2μ² − 2/3) e^(−μR/a).
Note that in (19.227) |B⟩ is an eigenfunction belonging to E1s = −ħ²/(2ma²), which
is an eigenvalue of the operator −(ħ²/2m)∇² − e²/(4πε0 rB). From (3.258), we
estimate E1s to be −13.61 eV. Using (19.217) and converting the Cartesian
coordinates to ellipsoidal coordinates once again, we get
⟨A|(1/rA)|B⟩ = (1/a)(1 + R/a) e^(−R/a).
Thus, we have
H12 = (E1s + j0/R) S12 − (j0/a)(1 + R/a) e^(−R/a).   (19.228)
Then, we have

H12 − H11S12 = j0 [1/R − 2R/(3a²)] e^(−R/a) − (j0/R)(1 + R/a)[1 + R/a + (1/3)(R/a)²] e^(−3R/a)
 = (j0/a)[a/R − 2R/(3a)] e^(−R/a) − (j0/R)(1 + R/a)[1 + R/a + (1/3)(R/a)²] e^(−3R/a).   (19.230)
In (19.230) we notice that, whereas the second term is always negative, the first term
may be negative or positive depending upon R. Accordingly, we cannot tell a priori
whether H12 − H11S12 is negative. If we had R ≪ a, (19.230) would become
positive.
Let us then make a quantitative estimation. The Bohr radius a is about 52.9 pm
(using the electron rest mass). As an experimental result, R is approximately 106 pm
[3]. Hence, for an H2+ ion we have

R/a ≈ 2.0.
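Evaluating (19.230) at R/a = 2.0 (in units where a = j0 = 1) confirms that H12 − H11S12 is negative there, so the bonding combination indeed lies below the antibonding one:

```python
import numpy as np

a, j0 = 1.0, 1.0
R = 2.0 * a   # R/a ~ 2.0 for H2+

term1 = j0 * (1 / R - 2 * R / (3 * a**2)) * np.exp(-R / a)
term2 = -(j0 / R) * (1 + R / a) * (1 + R / a + (R / a)**2 / 3) * np.exp(-3 * R / a)
diff = term1 + term2   # H12 - H11*S12, cf. (19.230)
print(diff)            # negative at R/a = 2
```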
λL = (H11 + H12)/(1 + S12)  and  λH = (H11 − H12)/(1 − S12),   (19.231)

ΨL = (|A⟩ + |B⟩)/√(2(1 + S12))  and  ΨH = (|A⟩ − |B⟩)/√(2(1 − S12)),   (19.232)
where ΨL and ΨH belong to λL and λH, respectively. The results are virtually the
same as those given in (19.101) to (19.105) of Sect. 19.4.1. In (19.101) and
Fig. 19.6, however, we merely assumed that λ1 = (α + β)/(1 + S) is lower than
λ2 = (α − β)/(1 − S). Here, we have confirmed that this is truly the case. A chemical
bond is formed in such a way that an electron is distributed as much as possible along
the molecular axis (see Fig. 19.4 for a schematic) and that in this configuration a
minimized orbital energy is achieved.
In our present case, H11 in (19.203) and H22 in (19.204) should differ. This is also
the case with G11 in (19.211) and G22 in (19.212). Here we return to (19.199).
Suppose that we get an MO by solving, e.g., the first secular equation of (19.199)
such that

Ψ = a1 H(A1) + b1 C2s.

Then, we have

a1 = −[(H12 − λS12)/(H11 − λ)] b1,   (19.233)
where S12 and H12 were defined as in (19.201) and (19.205), respectively. A
normalized MO Ψ̃ is described by

Ψ̃ = (a1 H(A1) + b1 C2s)/√(a1² + b1² + 2a1b1S12).   (19.234)
Thus, according to the two different energy eigenvalues λ, we will get two linearly independent MOs. The other three secular equations are dealt with similarly.
From the energetic consideration of H2+ we infer that a1b1 > 0 for a bonding MO and a1b1 < 0 for an antibonding MO. Meanwhile, since both H(A1) and C2s belong to the irreducible representation A1, so does $\tilde{\Psi}$, on the basis of the discussion on the projection operators of Sect. 18.7. Thus, we should be able to construct proper
MOs that belong to A1. Similarly, we get proper MOs belonging to T2 by solving
the other three secular equations of (19.199). In this case, the three bonding MOs are triply degenerate, and so are the three antibonding MOs. All these six MOs belong to the irreducible representation T2. Thus, we can get a complete set of MOs for methane (CH4).
These eight MOs span the representation space V8.
To precisely determine the energy levels, we need to take more elaborate
approaches to approximate and calculate various parameters that appear in
(19.199). At the same time, we need to perform detailed experiments including
spectroscopic measurements and interpret those results carefully [4]. Taking account of these situations, Fig. 19.16 [5] displays, as an example of MO calculations, a probable energy diagram and the MO symmetry species of methane. The diagram comprises a bonding a1 ground state and its corresponding antibonding a1* state, along with triply degenerate bonding t2 states and their corresponding antibonding t2* states.
We emphasize that these elaborate approaches truly ensue from the “paper-and-pencil” methods based upon group theory. Group theory thus supplies us with a powerful tool and a clear guideline for addressing various quantum chemical problems, a few of which we introduce as examples in this book.
Finally, let us examine the optical transition of methane. In this case, we have to
consider electronic configurations of the initial and final states. If we are dealing with
optical absorption, the initial state is the ground-state A1 that is described by the
totally symmetric representation. The final state, on the other hand, will be an excited
state, which is described by a direct-product representation related to the two states
that are associated with the optical transition. The matrix element is expressed as
$$\left\langle \Theta^{(\alpha\otimes\beta)} \middle|\, \boldsymbol{\varepsilon}_e \cdot \mathbf{P}\, \middle| \Theta^{(A_1)} \right\rangle, \tag{19.235}$$
where Θ^(A1) stands for an electronic configuration of the totally symmetric ground state; Θ^(α⊗β) denotes an electronic configuration of an excited state represented by a direct-product representation pertinent to the irreducible representations α and β; P is an electric dipole operator and ε_e is a unit polarization vector of the electric field. From the character table for Td, we find that ε_e·P belongs to the irreducible representation T2 (see Table 19.12).
The ground-state electronic configuration is A1 (totally symmetric). It is denoted by

$$a_1^2\, t_2^2\, {t_2'}^2\, {t_2''}^2,$$

where the three T2 states are distinguished by a prime and a double prime. For possible configurations of excited states, we have the direct products mentioned below.
Therefore, according to the general theory of Sect. 19.2, we need to examine whether $T_2 \otimes D^{(\alpha)} \otimes D^{(\beta)}$ contains A1 to judge whether the optical transition is allowed. As mentioned above, $D^{(\alpha)} \otimes D^{(\beta)}$ is chosen from among $T_2 \otimes T_2$, $T_2$, $T_2$, and $A_1$. The results are given below:
$$T_2 \otimes T_2 \otimes T_2 = 3T_1 + 4T_2 + 2E + A_2 + A_1, \tag{19.238}$$

$$T_2 \otimes T_2 = T_1 + T_2 + E + A_1, \tag{19.239}$$

$$T_2 \otimes A_1 = T_2. \tag{19.240}$$
Indeed, (19.238) and (19.239) contain A1, and so the corresponding transitions are allowed. As for (19.240), however, the transition is forbidden because it does not contain A1. In light of the character table of Td (Table 19.12), we find that the allowed transitions (i.e., t2 → t2*, t2 → a1*, and a1 → t2*) take place equally in the directions polarized along the x-, y-, and z-axes. This is often the case with molecules having higher symmetries such as methane. The transition a1 → a1*, on the other hand, is forbidden.
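The reductions (19.238)-(19.240) follow from character orthogonality for Td; a minimal sketch that recomputes them, with the Td character table hard-coded in the usual conventions:

```python
import numpy as np

# Character table of Td; classes: E, 8C3, 3C2, 6S4, 6sigma_d
sizes = np.array([1, 8, 3, 6, 6])
chars = {
    "A1": np.array([1,  1,  1,  1,  1]),
    "A2": np.array([1,  1,  1, -1, -1]),
    "E":  np.array([2, -1,  2,  0,  0]),
    "T1": np.array([3,  0, -1,  1, -1]),
    "T2": np.array([3,  0, -1, -1,  1]),
}
order = sizes.sum()  # |Td| = 24

def reduce_product(*irreps):
    """Multiplicity of each irrep in the direct product of the given irreps."""
    prod = np.prod([chars[i] for i in irreps], axis=0)
    return {k: int(sizes @ (prod * v) / order) for k, v in chars.items()}

print(reduce_product("T2", "T2", "T2"))  # {'A1': 1, 'A2': 1, 'E': 2, 'T1': 3, 'T2': 4}
print(reduce_product("T2", "T2"))        # {'A1': 1, 'A2': 0, 'E': 1, 'T1': 1, 'T2': 1}
print(reduce_product("T2", "A1"))        # {'A1': 0, 'A2': 0, 'E': 0, 'T1': 0, 'T2': 1}
```

The first result contains A1, so t2 → t2* is allowed; the last does not, so a1 → a1* is forbidden.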
In the above argument, including the energetic consideration, we could not determine the magnitude relationships of the MO energies or the photon energies associated with the optical transitions. Once again, this requires more accurate calculations and experiments. Yet, the discussion we have developed gives us strong guiding principles for the investigation of molecular science.
References

20 Theory of Continuous Groups
In Chap. 16, we classified groups into finite groups and infinite groups. We focused mostly on finite groups and their representations in the preceding chapters. Of the infinite groups, continuous groups and their properties have been widely investigated. In this chapter we think of the continuous groups as a collective term that includes Lie groups and topological groups. The continuous groups are also viewed as transformation groups. Aside from the strict definition, we will view the continuous group as a natural extension of the rotation group SO(3) that we studied briefly in Chap. 17. Here we reconstruct SO(3) on the basis of the notion of infinitesimal rotation. We make the most of the exponential functions of matrices, the theory of which we explored in Chap. 15. In this context, we study the basic properties and representations of the special unitary group SU(2), which has close relevance to SO(3). Thus, we focus our attention on the representation theory of SU(2) and SO(3). The results are associated with the (generalized) angular momenta that we dealt with in Chap. 3. In particular, we show that the spherical surface harmonics constitute the basis functions of the representation space of SO(3). Finally, we study various important properties of SU(2) and SO(3) within the framework of Lie groups and Lie algebras. The last sections comprise descriptions of abstract ideas, but those are useful for comprehending the constitution of group theory from a higher perspective.
First let us consider how a function Ψ is transformed by the rotation (see Sect.
19.1). Here we express Ψ by
$$\Psi = (\psi_1\ \psi_2\ \cdots\ \psi_d)\begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_d \end{pmatrix}, \tag{20.1}$$
where the index θ indicates the rotation angle in a real space whose dimension is two
or three. Note that the dimensions of the real space and representation space may or
may not be the same. Equation (20.2) is essentially the same as (19.11).
As a three-dimensional matrix representation of rotation, we have, e.g.,

$$R_\theta = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{20.3}$$
The matrix Rθ represents a rotation around the z-axis in ℝ3; see Fig. 11.2 and
(11.31). Borrowing the notation of (17.3) and (17.12), we have

$$R_\theta(\mathbf{x}) = (\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3)\begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix}. \tag{11.36}$$

Therefore, we get

$$R_\theta^{-1}(\mathbf{x}) = (\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3)\begin{pmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix}. \tag{20.4}$$
For an infinitesimal rotation angle θ, to the first order (20.4) becomes

$$R_\theta^{-1}(\mathbf{x}) = (x\mathbf{e}_1 + \theta y\mathbf{e}_1 + y\mathbf{e}_2 - \theta x\mathbf{e}_2 + z\mathbf{e}_3) = ([x+\theta y]\mathbf{e}_1 + [y-\theta x]\mathbf{e}_2 + z\mathbf{e}_3). \tag{20.5}$$
Putting
we have
$$R_\theta \psi(\mathbf{x}) = \psi\left(R_\theta^{-1}(\mathbf{x})\right) \tag{20.2}$$
or
we get
Note that the operator ℳz has already appeared in Sect. 3.5 [also see (3.19), Sect.
3.2], but in this section we denote it by ℳz to distinguish it from the previous
notation Mz. This is because ℳz and Mz are connected through a unitary similarity
transformation (vide infra).
From (20.8), we express an infinitesimal rotation θ around the z-axis as

$$R_\theta = 1 - i\theta\mathcal{M}_z, \tag{20.9}$$
where RHS of (20.10) implies that the rotation Rθ of a finite angle θ is attained by
n successive infinitesimal rotations of Rθ/n with a large enough n.
Taking the limit n → ∞ [1, 2], we get

$$R_\theta = \lim_{n\to\infty}\left(R_{\theta/n}\right)^n = \lim_{n\to\infty}\left(1 - i\frac{\theta}{n}\mathcal{M}_z\right)^n = \exp(-i\theta\mathcal{M}_z). \tag{20.11}$$
Recall the following definition of an exponential function, other than (15.7) [3]:

$$e^x \equiv \lim_{n\to\infty}\left(1 + \frac{x}{n}\right)^n. \tag{20.12}$$
Note that, as in the case of (15.7), (20.12) holds when x represents a matrix. Thus, if the dimension d of the representation space is two or more, (20.9) should be read as

$$R_\theta = E - i\theta\mathcal{M}_z, \tag{20.13}$$

with

$$A_z \equiv -i\mathcal{M}_z. \tag{20.15}$$
Operators of this type play an essential role in continuous groups. This is because in (20.14) the operator Az represents a Lie algebra, which in turn generates a Lie group. The Lie groups are categorized as one of the continuous groups, along with the topological groups. In this chapter we further explore various characteristics of SO(3) and SU(2), both of which are Lie groups and have been fully investigated from various aspects. The Lie algebras frequently appear in the theory of continuous groups in combination with the Lie groups. A brief outline of the theory of Lie groups and Lie algebras will be given at the end of this chapter.
Let us further consider the implications of (20.9) and (20.13). From those equations we have

$$\mathcal{M}_z = (1 - R_\theta)/i\theta \tag{20.16}$$

or

$$\mathcal{M}_z = (E - R_\theta)/i\theta. \tag{20.17}$$
Rθ ðe3 Þ ¼ e3 :
where c1, c2, and c3 are coefficients (or coordinates). Again, this notation is a natural extension of (11.13). We emphasize once more that (x y z) does not represent coordinates but functions. As an example, we depict a function f(x, y) = ay (a > 0) in Fig. 20.1. The function ay (drawn in green in Fig. 20.1) behaves as a vector with respect to a linear transformation, like the basis vector e2 of the Cartesian coordinates. The “directionality” as a vector can easily be visualized with the function ay.
Since ℳz of (20.19) is Hermitian, it can be diagonalized by a unitary similarity transformation (Chap. 14). As in the case of, e.g., (14.96), we find a diagonalizing unitary matrix P with respect to ℳz such that

$$P = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ -\frac{i}{\sqrt{2}} & \frac{i}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{20.22}$$
Then we obtain

$$\mathcal{M}_z(X) = (x\ y\ z)\,P P^{-1}\begin{pmatrix} 0 & -i & 0 \\ i & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}P P^{-1}\begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix} = \left(\frac{x-iy}{\sqrt{2}}\ \ \frac{x+iy}{\sqrt{2}}\ \ z\right)\begin{pmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}\begin{pmatrix} \frac{1}{\sqrt{2}}(c_1 + ic_2) \\ \frac{1}{\sqrt{2}}(c_1 - ic_2) \\ c_3 \end{pmatrix}. \tag{20.23}$$
Here we define $\widetilde{M}_z$ such that

$$\widetilde{M}_z \equiv P^{-1}\mathcal{M}_z P = \begin{pmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}. \tag{20.24}$$
Equation (20.24) was obtained by putting l = 1 and shuffling the rows and columns of (3.158) in Chap. 3. That is, choosing a (real) unitary matrix R3 given by

$$R_3 = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}, \tag{17.116}$$

we have

$$R_3^{-1}\widetilde{M}_z R_3 = R_3^{-1}P^{-1}\mathcal{M}_z P R_3 = (PR_3)^{-1}\mathcal{M}_z(PR_3) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & -1 \end{pmatrix} = M_z.$$
a spherical basis. If we carefully look at the latter basis vectors (i.e., the spherical basis), we notice that these vectors have the same transformation properties as $\left(Y_1^1\ Y_1^{-1}\ Y_1^0\right)$. In fact, using (3.216) and converting the spherical coordinates to Cartesian coordinates in (3.216), we get [4]

$$\left(Y_1^1\ \ Y_1^{-1}\ \ Y_1^0\right) = \sqrt{\frac{3}{4\pi}}\,\frac{1}{r}\left(-\frac{x+iy}{\sqrt{2}}\ \ \frac{x-iy}{\sqrt{2}}\ \ z\right), \tag{20.25}$$
$$M_z Y_0^0 = 0,$$

where $Y_0^0(\theta,\phi) = \sqrt{1/4\pi}$. The above relation obviously shows that $Y_0^0(\theta,\phi)$ belongs to the eigenvalue zero of $M_z$. This is consistent with the earlier comments made on the one-dimensional real space.
$$\exp(\theta A_z) = E + \theta A_z + \frac{1}{2!}(\theta A_z)^2 + \frac{1}{3!}(\theta A_z)^3 + \frac{1}{4!}(\theta A_z)^4 + \frac{1}{5!}(\theta A_z)^5 + \cdots. \tag{20.27}$$
We note that

$$A_z^2 = \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 0 \end{pmatrix} = A_z^6, \quad A_z^4 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix} = A_z^8, \ \text{etc.};$$

$$A_z^3 = \begin{pmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} = A_z^7, \quad A_z^5 = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} = A_z^9, \ \text{etc.}$$
Thus, we get

$$\exp(\theta A_z) = \left[E + \frac{1}{2!}(\theta A_z)^2 + \frac{1}{4!}(\theta A_z)^4 + \cdots\right] + \left[\theta A_z + \frac{1}{3!}(\theta A_z)^3 + \frac{1}{5!}(\theta A_z)^5 + \cdots\right]$$

$$= \begin{pmatrix} 1-\frac{1}{2!}\theta^2+\frac{1}{4!}\theta^4-\cdots & 0 & 0 \\ 0 & 1-\frac{1}{2!}\theta^2+\frac{1}{4!}\theta^4-\cdots & 0 \\ 0 & 0 & 1 \end{pmatrix} + \begin{pmatrix} 0 & -\left(\theta-\frac{1}{3!}\theta^3+\frac{1}{5!}\theta^5-\cdots\right) & 0 \\ \theta-\frac{1}{3!}\theta^3+\frac{1}{5!}\theta^5-\cdots & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$$

$$= \begin{pmatrix} \cos\theta & 0 & 0 \\ 0 & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} + \begin{pmatrix} 0 & -\sin\theta & 0 \\ \sin\theta & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{20.28}$$
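Both the closed form (20.28) and the limit definition (20.11)/(20.12) can be checked numerically; a minimal sketch using SciPy's matrix exponential (the angle value is an arbitrary assumption):

```python
import numpy as np
from scipy.linalg import expm

theta = 0.7
Az = np.array([[0., -1., 0.],
               [1.,  0., 0.],
               [0.,  0., 0.]])

# Matrix exponential of the generator reproduces the rotation matrix (20.28)
R = expm(theta * Az)
R_expected = np.array([[np.cos(theta), -np.sin(theta), 0.],
                       [np.sin(theta),  np.cos(theta), 0.],
                       [0., 0., 1.]])
print(np.allclose(R, R_expected))  # True

# The limit definition (20.11)/(20.12): (E + theta*Az/n)^n -> exp(theta*Az)
n = 200000
R_limit = np.linalg.matrix_power(np.eye(3) + theta * Az / n, n)
print(np.allclose(R_limit, R_expected, atol=1e-4))  # True
```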
Similarly, we have

$$\exp(\varphi A_x) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\varphi & -\sin\varphi \\ 0 & \sin\varphi & \cos\varphi \end{pmatrix}, \tag{20.29}$$

$$\exp(\phi A_y) = \begin{pmatrix} \cos\phi & 0 & \sin\phi \\ 0 & 1 & 0 \\ -\sin\phi & 0 & \cos\phi \end{pmatrix}. \tag{20.30}$$
Of the above operators, using (20.28) and (20.30) we recover (17.101) related to the Euler angles such that

$$\widetilde{R}_3 = \exp(\alpha A_z)\exp(\beta A_y)\exp(\gamma A_z). \tag{20.31}$$
In Sect. 16.1 we mentioned a general linear group GL(n, ℂ). There are many
subgroups of GL(n, ℂ). Among them, SU(n), in particular SU(2), plays a central
role in theoretical physics and chemistry. In this section we examine how to
construct SU(2) matrices.
Definition 20.1 A group consisting of n × n unitary matrices with determinant 1 is called a special unitary group of degree n and is denoted by SU(n).
In Chap. 14 we mentioned that the absolute value of the determinant of a unitary matrix is 1. In Definition 20.1, on the other hand, there is an additional constraint in that we are dealing with unitary matrices whose determinant is exactly 1.
The SU(2) matrices U have the following general form:

$$U = \begin{pmatrix} a & b \\ c & d \end{pmatrix}. \tag{20.32}$$

The unitarity conditions $UU^\dagger = U^\dagger U = E$ read

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} a^* & c^* \\ b^* & d^* \end{pmatrix} = \begin{pmatrix} |a|^2+|b|^2 & ac^*+bd^* \\ a^*c+b^*d & |c|^2+|d|^2 \end{pmatrix} = \begin{pmatrix} a^* & c^* \\ b^* & d^* \end{pmatrix}\begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} |a|^2+|c|^2 & a^*b+c^*d \\ ab^*+cd^* & |b|^2+|d|^2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}. \tag{20.33}$$

Then we have
(i) |a|² + |b|² = 1, (ii) |c|² + |d|² = 1, (iii) |a|² + |c|² = 1,
(iv) |b|² + |d|² = 1, (v) ac* + bd* = 0, (vi) a*b + c*d = 0,
(vii) ad − bc = 1.
Of the above conditions, (vii) comes from det U = 1. Condition (iv) can be eliminated by conditions (i)-(iii). Regarding the complex conjugate of (v) together with (vii) as simultaneous linear equations in c and d, Cramer's rule gives

$$c = \frac{\begin{vmatrix} 0 & b^* \\ 1 & a \end{vmatrix}}{\begin{vmatrix} a^* & b^* \\ -b & a \end{vmatrix}} = \frac{-b^*}{|a|^2+|b|^2} = -b^*, \qquad d = \frac{\begin{vmatrix} a^* & 0 \\ -b & 1 \end{vmatrix}}{\begin{vmatrix} a^* & b^* \\ -b & a \end{vmatrix}} = \frac{a^*}{|a|^2+|b|^2} = a^*. \tag{20.34}$$

From (20.34) we see that (i)-(iv) are equivalent and that (v) and (vi) are redundant. Thus, as the general form of U we get

$$U = \begin{pmatrix} a & b \\ -b^* & a^* \end{pmatrix} \quad \text{with} \quad |a|^2 + |b|^2 = 1. \tag{20.35}$$
Putting a = p + iq and b = r + is (p, q, r, s: real), we have

$$p^2 + q^2 + r^2 + s^2 = 1. \tag{20.36}$$

Also putting

$$U_1 = \begin{pmatrix} a_1 & b_1 \\ -b_1^* & a_1^* \end{pmatrix} \quad \text{and} \quad U_2 = \begin{pmatrix} a_2 & b_2 \\ -b_2^* & a_2^* \end{pmatrix}, \tag{20.37}$$

we get

$$U_1 U_2 = \begin{pmatrix} a_1 a_2 - b_1 b_2^* & a_1 b_2 + b_1 a_2^* \\ -(a_1 b_2 + b_1 a_2^*)^* & (a_1 a_2 - b_1 b_2^*)^* \end{pmatrix}. \tag{20.38}$$

Thus, U1U2 also satisfies the criterion (20.35) and, hence, the unitary matrices having the matrix form of (20.35) constitute a group, i.e., SU(2).
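The closure property and the constraints (20.34)/(20.35) are easy to verify numerically; a sketch with randomly generated SU(2) matrices (the random-number seed is arbitrary):

```python
import numpy as np

def su2(a, b):
    """U of (20.35); assumes |a|^2 + |b|^2 = 1."""
    return np.array([[a, b], [-np.conj(b), np.conj(a)]])

rng = np.random.default_rng(0)

def random_su2():
    # Random point on the unit 3-sphere, cf. (20.36)
    p, q, r, s = rng.normal(size=4)
    norm = np.sqrt(p*p + q*q + r*r + s*s)
    return su2((p + 1j*q) / norm, (r + 1j*s) / norm)

U1, U2 = random_su2(), random_su2()
for U in (U1, U2, U1 @ U2):
    assert np.allclose(U.conj().T @ U, np.eye(2))   # unitary
    assert np.isclose(np.linalg.det(U), 1.0)        # det U = 1
    assert np.isclose(U[1, 1], np.conj(U[0, 0]))    # d = a*
    assert np.isclose(U[1, 0], -np.conj(U[0, 1]))   # c = -b*
print("closure under (20.38) verified")
```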
Next we wish to connect the SU(2) matrices to the matrices that represent the
angular momentum. In Sect. 3.4 we developed the theory of generalized angular
momentum. In the case of j ¼ 1/2 we can obtain accurate information about the spin
states. Using (3.101) as a starting point and defining the states belonging to the spin angular momentum μ = 1/2 and μ = −1/2 as $\binom{1}{0}$ and $\binom{0}{1}$, respectively, we obtain

$$J^{(+)} = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \quad \text{and} \quad J^{(-)} = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}. \tag{20.39}$$

Note that $\binom{1}{0}$ and $\binom{0}{1}$ are in accordance with (2.64) as column vector representations. Using (3.72), we have

$$J_x = \frac{1}{2}\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad J_y = \frac{1}{2}\begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad J_z = \frac{1}{2}\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \tag{20.40}$$

In (20.40), Jz was given so that we may have $J_z\binom{1}{0} = \frac{1}{2}\binom{1}{0}$ and $J_z\binom{0}{1} = -\frac{1}{2}\binom{0}{1}$. Deleting the coefficient $\frac{1}{2}$ from (20.40), we get the Pauli spin matrices such that

$$\sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \sigma_y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad \sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \tag{20.41}$$
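The spin-1/2 matrices above satisfy the angular momentum commutation relations; a short check (the ladder-operator matrices follow the sign conventions assumed in (20.39)):

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)

Jx, Jy, Jz = sx / 2, sy / 2, sz / 2
Jp = np.array([[0, 1], [0, 0]], dtype=complex)   # J(+), raising
Jm = np.array([[0, 0], [1, 0]], dtype=complex)   # J(-), lowering

comm = lambda A, B: A @ B - B @ A

# Angular momentum algebra: [Jx, Jy] = i Jz (and cyclic permutations)
assert np.allclose(comm(Jx, Jy), 1j * Jz)
assert np.allclose(comm(Jy, Jz), 1j * Jx)
assert np.allclose(comm(Jz, Jx), 1j * Jy)

# Jx = (J+ + J-)/2 and Jy = (J+ - J-)/(2i), cf. (3.72)
assert np.allclose(Jx, (Jp + Jm) / 2)
assert np.allclose(Jy, (Jp - Jm) / 2j)

# Jz eigenvalues +-1/2 on the spin-up/-down column vectors
up, down = np.array([1, 0]), np.array([0, 1])
assert np.allclose(Jz @ up, 0.5 * up) and np.allclose(Jz @ down, -0.5 * down)
print("spin-1/2 algebra verified")
```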
1 0 i 0 0 1
Now let us seek rotation matrices in a manner similar to that used in deriving (20.31), but at a more general level. From Property (7)′ of Chap. 15 (see Sect. 15.2), we know the properties of
$$\exp(tA).$$
Readers are recommended to derive this. Using ζy and applying calculation procedures similar to the above, we also get

$$\exp(\beta\zeta_y) = \begin{pmatrix} \cos\frac{\beta}{2} & -\sin\frac{\beta}{2} \\ \sin\frac{\beta}{2} & \cos\frac{\beta}{2} \end{pmatrix}. \tag{20.44}$$
Combining these results, we obtain

$$D^{(1/2)}_{\alpha,\beta,\gamma} = \exp(\alpha\zeta_z)\exp(\beta\zeta_y)\exp(\gamma\zeta_z) = \begin{pmatrix} e^{-\frac{i}{2}(\alpha+\gamma)}\cos\frac{\beta}{2} & -e^{-\frac{i}{2}(\alpha-\gamma)}\sin\frac{\beta}{2} \\ e^{-\frac{i}{2}(\gamma-\alpha)}\sin\frac{\beta}{2} & e^{\frac{i}{2}(\alpha+\gamma)}\cos\frac{\beta}{2} \end{pmatrix}, \tag{20.45}$$

where the superscript 1/2 of $D^{(1/2)}_{\alpha,\beta,\gamma}$ corresponds to the non-negative generalized angular momentum j = 1/2. Equation (20.45) can be obtained by putting $a = e^{-\frac{i}{2}(\alpha+\gamma)}\cos\frac{\beta}{2}$ and $b = -e^{-\frac{i}{2}(\alpha-\gamma)}\sin\frac{\beta}{2}$ in the matrix U of (20.35). Using this general SU(2) representation,
Suppose that we have two functions (or vectors) v and u. We regard v and u as linearly independent vectors that undergo a unitary transformation by a unitary matrix U of (20.35). That is, we have

$$(v'\ u') = (v\ u)U = (v\ u)\begin{pmatrix} a & b \\ -b^* & a^* \end{pmatrix}, \tag{20.46}$$

i.e.,

$$v' = av - b^*u, \qquad u' = bv + a^*u.$$

We define the monomials

$$f_m = \frac{u^{j+m}v^{j-m}}{\sqrt{(j+m)!(j-m)!}} \quad (m = -j, -j+1, \cdots, j-1, j), \tag{20.47}$$
where $D^{(j)}_{m'm}(a,b)$ denotes the matrix elements with respect to the transformation. Replacing u and v with u′ and v′, respectively, we get

$$\Re(a,b)f_m = \frac{1}{\sqrt{(j+m)!(j-m)!}}(bv + a^*u)^{j+m}(av - b^*u)^{j-m}$$

$$= \sum_{\mu,\nu}\frac{1}{\sqrt{(j+m)!(j-m)!}}\,\frac{(j+m)!(j-m)!}{(j+m-\mu)!\mu!\,(j-m-\nu)!\nu!}\,(a^*u)^{j+m-\mu}(bv)^\mu(av)^{j-m-\nu}(-b^*u)^\nu. \tag{20.49}$$

Putting

$$j + m - \mu + \nu = j + m', \quad \text{i.e.,} \quad \nu = m' - m + \mu,$$

we get

$$\Re(a,b)f_m = \sum_{\mu,m'}\frac{\sqrt{(j+m)!(j-m)!}}{(j+m-\mu)!\mu!(m'-m+\mu)!(j-m'-\mu)!}\,(a^*)^{j+m-\mu}b^\mu(-b^*)^{m'-m+\mu}a^{j-m'-\mu}\,u^{j+m'}v^{j-m'}. \tag{20.50}$$
Expressing $f_{m'}$ as

$$f_{m'} = \frac{u^{j+m'}v^{j-m'}}{\sqrt{(j+m')!(j-m')!}} \quad (m' = -j, -j+1, \cdots, j-1, j), \tag{20.51}$$

we obtain

$$\Re(a,b)(f_m) = \sum_{\mu,m'} f_{m'}\,\frac{\sqrt{(j+m)!(j-m)!(j+m')!(j-m')!}}{(j+m-\mu)!\mu!(m'-m+\mu)!(j-m'-\mu)!}\,(a^*)^{j+m-\mu}b^\mu(-b^*)^{m'-m+\mu}a^{j-m'-\mu}. \tag{20.52}$$
Substituting a and b of (20.45), we get

$$D^{(j)}_{m'm}(\alpha,\beta,\gamma) = \sum_\mu(-1)^{m'-m+\mu}\frac{\sqrt{(j+m)!(j-m)!(j+m')!(j-m')!}}{(j+m-\mu)!\mu!(m'-m+\mu)!(j-m'-\mu)!}\,e^{-i(\alpha m'+\gamma m)}\left(\cos\frac{\beta}{2}\right)^{2j+m-m'-2\mu}\left(\sin\frac{\beta}{2}\right)^{m'-m+2\mu}. \tag{20.54}$$
Meanwhile, we have
where with the last equality k was replaced with j m. Comparing the above two
equations, we get
$$\sum_{m=-j}^{j}|f_m|^2 = \frac{1}{(2j)!}\left[|u|^2 + |v|^2\right]^{2j}. \tag{20.55}$$
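The invariance of (20.55) under the transformation (20.46) can be checked numerically for a specific j; a sketch for j = 3/2 (the random values and the seed are arbitrary assumptions):

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(1)
j = 3 / 2   # any integer or half-odd-integer j works

def f_norm_sq(u, v):
    """Sum over m of |f_m|^2 with f_m of (20.47)."""
    ms = [-j + k for k in range(int(2 * j) + 1)]
    return sum(abs(u**(j + m) * v**(j - m))**2
               / (factorial(int(j + m)) * factorial(int(j - m))) for m in ms)

# Random u, v and a random SU(2) element (a, b)
u, v = rng.normal(size=2) + 1j * rng.normal(size=2)
z = rng.normal(size=4)
a, b = z[0] + 1j * z[1], z[2] + 1j * z[3]
n = np.sqrt(abs(a)**2 + abs(b)**2)
a, b = a / n, b / n

# Transformed pair per (20.46): u' = b v + a* u, v' = a v - b* u
u2, v2 = b * v + np.conj(a) * u, a * v - np.conj(b) * u

# Both sides equal (|u|^2 + |v|^2)^(2j) / (2j)!
target = (abs(u)**2 + abs(v)**2)**(2 * j) / factorial(int(2 * j))
print(np.isclose(f_norm_sq(u, v), target), np.isclose(f_norm_sq(u2, v2), target))
```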
Once representation matrices D( j )(α, β, γ) have been obtained, we are able to get
further useful information. We immediately notice that (20.54) is related to (3.216)
Note that (20.58) does not depend on the variable γ, because m = 0 leads to $e^{-im\gamma} = e^0 = 1$ in (20.54). Notice also that the case m = 0 in (20.54) never occurs when j is a half-odd-integer, but does occur when j is an integer l. In fact, D^(l)(α, β, γ) is a (2l + 1)-dimensional representation of SO(3). Later in this section, we will give the full matrix form in the case of l = 1.
Replacing m′ with m in (20.58), we get

$$D^{(j)}_{m0}(\alpha,\beta,\gamma) = \sum_\mu(-1)^{m+\mu}\frac{j!\sqrt{(j+m)!(j-m)!}}{(j-\mu)!\mu!(m+\mu)!(j-m-\mu)!}\,e^{-i\alpha m}\left(\cos\frac{\beta}{2}\right)^{2j-m-2\mu}\left(\sin\frac{\beta}{2}\right)^{m+2\mu}. \tag{20.59}$$
$$\left[D^{(l)}_{m0}(\phi,\theta)\right]^* = \sqrt{\frac{4\pi}{2l+1}}\,Y_l^m(\theta,\phi). \tag{20.61}$$
From a general point of view, let us return to a discussion of SU(2) and introduce the following important theorem in relation to the basis functions of SU(2).
Theorem 20.1 [1] Let {ψ−j, ψ−j+1, ⋯, ψj} be a set of (2j + 1) functions. Let D^(j) be a representation of the special unitary group SU(2). Suppose that we have the following expression for ψm,k (m = −j, ⋯, j) such that

$$\psi_{m,k}(\alpha,\beta,\gamma) = \left[D^{(j)}_{mk}(\alpha,\beta,\gamma)\right]^*, \tag{20.62}$$

where α, β, and γ are the Euler angles that appeared in (17.101). Then, the above functions are basis functions of the representation D^(j).
816 20 Theory of Continuous Groups
Proof In Fig. 20.2 we show coordinate systems O, I, and II. Let P, Q, R be operators of transformation (i.e., rotation) between the coordinate systems as shown. The transformation Q is defined as the coordinate transformation that transforms I into II. Note that their inverse transformations, e.g., Q⁻¹, exist. In Fig. 20.2, P and R are specified as P(α₀, β₀, γ₀) and R(α, β, γ), respectively. Then, we have [1, 2]

$$Q^{-1}R = P. \tag{20.63}$$

Operating with Q on ψm,k, we get

$$Q\psi_{m,k}(\alpha,\beta,\gamma) = \psi_{m,k}\left(Q^{-1}(\alpha,\beta,\gamma)\right) = \psi_{m,k}(\alpha_0,\beta_0,\gamma_0) = \left[D^{(j)}_{mk}\left(Q^{-1}R\right)\right]^*,$$

where the last equality comes from (20.62) and (20.63); (α, β, γ) and (α₀, β₀, γ₀) stand for the transformations R(α, β, γ) and P(α₀, β₀, γ₀), respectively. The matrix element $D^{(j)}_{mk}\left(Q^{-1}R\right)$ can be written as

$$D^{(j)}_{mk}\left(Q^{-1}R\right) = \sum_n D^{(j)}_{mn}\left(Q^{-1}\right)D^{(j)}_{nk}(R) = \sum_n\left[D^{(j)}_{nm}(Q)\right]^* D^{(j)}_{nk}(R),$$

where with the last equality we used the unitarity of the representation matrix D^(j). Taking the complex conjugate of the above expression, we get

$$Q\psi_{m,k}(\alpha,\beta,\gamma) = \psi_{m,k}(\alpha_0,\beta_0,\gamma_0) = \sum_n D^{(j)}_{nm}(Q)\left[D^{(j)}_{nk}(\alpha,\beta,\gamma)\right]^* = \sum_n\psi_{n,k}(\alpha,\beta,\gamma)D^{(j)}_{nm}(Q), \tag{20.64}$$

where with the last equality we used the supposition (20.62). Equation (20.64) implies that ψm,k (m = −j, ⋯, 0, ⋯, j) are basis functions of the representation D^(j). This completes the proof. ∎
If we restrict Theorem 20.1 to SO(3), we get an important result. Replacing j with an integer l and putting k = 0 in (20.62) and (20.64), we get

$$Q\psi_{m,0}(\alpha,\beta,\gamma) = \psi_{m,0}(\alpha_0,\beta_0,\gamma_0) = \sum_n\psi_{n,0}(\alpha,\beta,\gamma)D^{(l)}_{nm}(Q), \qquad \psi_{m,0}(\alpha,\beta,\gamma) = \left[D^{(l)}_{m0}(\alpha,\beta,\gamma)\right]^*. \tag{20.65}$$

This expression shows that the complex conjugates of the individual components of the “center” column vector of the representation matrix give the basis functions of the representation D^(l).
Removing γ, as it is redundant, and by the aid of (20.61), we get

$$\psi_{m,0}(\alpha,\beta) = \sqrt{\frac{4\pi}{2l+1}}\,Y_l^m(\beta,\alpha).$$

Thus, in combination with (20.61), Theorem 20.1 immediately indicates that the spherical surface harmonics $Y_l^m(\theta,\phi)\ (m = -l, \cdots, 0, \cdots, l)$ span the representation space with respect to D^(l). That is, we have

$$QY_l^m(\beta,\alpha) = \sum_n Y_l^n(\beta,\alpha)D^{(l)}_{nm}(Q). \tag{20.66}$$
$$Q = RP^{-1}. \tag{20.67}$$

In this case, we have

$$R(\alpha,\beta,\gamma)f(\mathbf{r}) = f\left[R(\alpha,\beta,\gamma)^{-1}(\mathbf{r})\right].$$

Thus, we have

$$Y_l^m(\beta-\beta,\ \alpha-\gamma-\alpha) = Y_l^m(0,-\gamma),$$

$$D^{(l)}_{m0}(-\gamma,0) = \delta_{m0}.$$

Hence, we obtain

$$Y_l^m(0,\gamma) = \sqrt{\frac{2l+1}{4\pi}}\,\delta_{m0}. \tag{20.71}$$
Notice that this expression is consistent with (3.147) and (3.148). Meanwhile, replacing ϕ and θ in (20.61) with α and β, respectively, multiplying the resulting equation by $D^{(l)}_{mk}(\alpha,\beta,\gamma)$, and further summing over m, we get

$$\sum_m\sqrt{\frac{4\pi}{2l+1}}\,Y_l^m(\beta,\alpha)D^{(l)}_{mk}(\alpha,\beta,\gamma) = \sum_m\left[D^{(l)}_{m0}(\alpha,\beta,\gamma)\right]^* D^{(l)}_{mk}(\alpha,\beta,\gamma) = \delta_{0k} = \sqrt{\frac{4\pi}{2l+1}}\,Y_l^k(0,\gamma), \tag{20.72}$$

where with the second equality we used the unitarity of the matrices $D^{(l)}_{nm}(\alpha,\beta,\gamma)$ (see Sect. 20.2.2), and with the last equality we used (20.70) multiplied by the constant $\sqrt{\frac{4\pi}{2l+1}}$. At the same time, we recovered (20.71) by comparing the last two sides of (20.72).
Rewriting (20.48) in a (2j + 1, 2j + 1) matrix form, we get

$$\Re(a,b)\left(f_{-j},f_{-j+1},\cdots,f_{j-1},f_j\right) = \left(f_{-j},f_{-j+1},\cdots,f_{j-1},f_j\right)\begin{pmatrix} D_{-j,-j} & \cdots & D_{-j,j} \\ \vdots & \ddots & \vdots \\ D_{j,-j} & \cdots & D_{j,j} \end{pmatrix}. \tag{20.73}$$

For j = 0, we simply have

$$D^{(0)}_{00}(\alpha,\beta,\gamma) = 1.$$
For j = l = 1, we get

$$D^{(1)} = \begin{pmatrix} D_{-1,-1} & D_{-1,0} & D_{-1,1} \\ D_{0,-1} & D_{0,0} & D_{0,1} \\ D_{1,-1} & D_{1,0} & D_{1,1} \end{pmatrix} = \begin{pmatrix} e^{i(\alpha+\gamma)}\cos^2\frac{\beta}{2} & \frac{1}{\sqrt{2}}e^{i\alpha}\sin\beta & e^{i(\alpha-\gamma)}\sin^2\frac{\beta}{2} \\ -\frac{1}{\sqrt{2}}e^{i\gamma}\sin\beta & \cos\beta & \frac{1}{\sqrt{2}}e^{-i\gamma}\sin\beta \\ e^{i(\gamma-\alpha)}\sin^2\frac{\beta}{2} & -\frac{1}{\sqrt{2}}e^{-i\alpha}\sin\beta & e^{-i(\alpha+\gamma)}\cos^2\frac{\beta}{2} \end{pmatrix}. \tag{20.74}$$
Note that (20.74) is a unitary matrix; the confirmation is left to readers. The complex conjugate of the m = 0 column vector in (20.61) with l = 1 is described by

$$\left[D^{(1)}_{m0}(\phi,\theta)\right]^* = \begin{pmatrix} \frac{1}{\sqrt{2}}e^{-i\alpha}\sin\beta \\ \cos\beta \\ -\frac{1}{\sqrt{2}}e^{i\alpha}\sin\beta \end{pmatrix}, \tag{20.75}$$

which is related to the spherical surface harmonics (see Sect. 3.6); also see p. 760, Table 15.4 of Ref. [4].
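Both the unitarity of (20.74) and its m = 0 column relation (20.61) can be verified numerically. The sketch below hard-codes D^(1) with the standard Wigner sign conventions assumed in the reconstruction above; other texts may differ by phase conventions:

```python
import numpy as np

def D1(al, be, ga):
    """D^(1)(alpha, beta, gamma), rows/columns ordered m = -1, 0, +1."""
    c2, s2, sb = np.cos(be/2)**2, np.sin(be/2)**2, np.sin(be)
    r2 = np.sqrt(2)
    return np.array([
        [np.exp(1j*(al+ga))*c2,   np.exp(1j*al)*sb/r2,   np.exp(1j*(al-ga))*s2],
        [-np.exp(1j*ga)*sb/r2,    np.cos(be),            np.exp(-1j*ga)*sb/r2],
        [np.exp(1j*(ga-al))*s2,  -np.exp(-1j*al)*sb/r2,  np.exp(-1j*(al+ga))*c2],
    ])

al, be, ga = 0.4, 1.1, -0.8
D = D1(al, be, ga)
assert np.allclose(D.conj().T @ D, np.eye(3))    # unitarity of (20.74)

# (20.61) for l = 1: conj of the m = 0 column vs sqrt(4*pi/3) * Y_1^m(beta, alpha)
Y = np.array([np.sin(be)*np.exp(-1j*al)/np.sqrt(2),   # m = -1
              np.cos(be),                             # m =  0
              -np.sin(be)*np.exp(1j*al)/np.sqrt(2)])  # m = +1 (Condon-Shortley sign)
assert np.allclose(np.conj(D[:, 1]), Y)
print("unitary; middle column matches sqrt(4pi/(2l+1)) * Y_1^m")
```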
Now, (20.25), (20.61), (20.65), and (20.73) provide a clue to relating D^(1) to the rotation matrix of ℝ³, i.e., $\widetilde{R}_3$ of (17.101) that includes the Euler angles. The argument is as follows: As already shown in (20.61) and (20.65), we have related the basis vectors (f−1 f0 f1) of D^(1) to the spherical harmonics $\left(Y_1^{-1}\ Y_1^0\ Y_1^1\right)$. Denoting the spherical basis by

$$\tilde{f}_{-1} \equiv \frac{1}{\sqrt{2}}(x - iy), \quad f_0 \equiv z, \quad \tilde{f}_1 \equiv -\frac{1}{\sqrt{2}}(x + iy), \tag{20.76}$$

we have

$$\left(\tilde{f}_{-1}\ f_0\ \tilde{f}_1\right) = (x\ z\ y)U = (x\ z\ y)\begin{pmatrix} \frac{1}{\sqrt{2}} & 0 & -\frac{1}{\sqrt{2}} \\ 0 & 1 & 0 \\ -\frac{i}{\sqrt{2}} & 0 & -\frac{i}{\sqrt{2}} \end{pmatrix}, \tag{20.77}$$

where

$$U \equiv \begin{pmatrix} \frac{1}{\sqrt{2}} & 0 & -\frac{1}{\sqrt{2}} \\ 0 & 1 & 0 \\ -\frac{i}{\sqrt{2}} & 0 & -\frac{i}{\sqrt{2}} \end{pmatrix}.$$
Under the transformation ℜ(a, b), we have

$$\left(\tilde{f}_{-1}{}'\ \ f_0{}'\ \ \tilde{f}_1{}'\right) \equiv \left(\tilde{f}_{-1}\ f_0\ \tilde{f}_1\right)\Re(a,b) = \left(\tilde{f}_{-1}\ f_0\ \tilde{f}_1\right)D^{(1)}. \tag{20.78}$$

Meanwhile,

$$\left(\tilde{f}_{-1}{}'\ \ f_0{}'\ \ \tilde{f}_1{}'\right) = (x'\ z'\ y')U.$$

Therefore,

$$(x'\ z'\ y') = \left(\tilde{f}_{-1}{}'\ \ f_0{}'\ \ \tilde{f}_1{}'\right)U^{-1} = \left(\tilde{f}_{-1}\ f_0\ \tilde{f}_1\right)D^{(1)}U^{-1} = (x\ z\ y)\,U D^{(1)} U^{-1}. \tag{20.79}$$
Note that (x′ z′ y′)V = (x′ y′ z′) and (x z y)V = (x y z); that is, V exchanges the order of z and y in (x z y).
Meanwhile, regarding (x y z) as the basis vectors in ℝ³, we have

$$(x'\ y'\ z') = (x\ y\ z)R. \tag{20.81}$$

More explicitly, using $a = e^{-\frac{i}{2}(\alpha+\gamma)}\cos\frac{\beta}{2}$ and $b = -e^{-\frac{i}{2}(\alpha-\gamma)}\sin\frac{\beta}{2}$ as in the case of (20.45), we describe R as

$$R = \begin{pmatrix} \mathrm{Re}\left(a^2-b^2\right) & \mathrm{Im}\left(a^2+b^2\right) & -2\,\mathrm{Re}(ab) \\ -\mathrm{Im}\left(a^2-b^2\right) & \mathrm{Re}\left(a^2+b^2\right) & 2\,\mathrm{Im}(ab) \\ 2\,\mathrm{Re}(ab^*) & 2\,\mathrm{Im}(ab^*) & |a|^2-|b|^2 \end{pmatrix}. \tag{20.83}$$
Let us continue the discussion still further. We think of the quantity (x/r y/r z/r), where r is the radial coordinate of (20.25). Operating with (20.82) on (x/r y/r z/r) from the right, we have

$$(x/r\ \ y/r\ \ z/r)\,R = \left(\tilde{f}_{-1}/r\ \ f_0/r\ \ \tilde{f}_1/r\right)D^{(1)}U^{-1}V.$$

Meanwhile, we have

$$\left(\tilde{f}_{-1}/r\ \ f_0/r\ \ \tilde{f}_1/r\right)D^{(1)} = (0\ 1\ 0).$$

Hence,

$$\left(\tilde{f}_{-1}/r\ \ f_0/r\ \ \tilde{f}_1/r\right)D^{(1)}U^{-1}V = (x/r\ \ y/r\ \ z/r)\,R = (0\ 0\ 1). \tag{20.85}$$
For the confirmation of (20.85), use the spherical coordinate representation (3.17) to denote x, y, and z in the above equation. This seems to be merely a confirmation of the unitarity of the representation matrices; see (20.72). Nonetheless, we emphasize that the simple relation (20.85) holds only in the special case where the parameters α and β in D^(1) (or R) are identical with the angular components of the spherical coordinates associated with the Cartesian coordinates x, y, and z. Equation (20.85), however, does not hold in the general case where the parameters α and β differ from those angular components. In other words, (20.68), which describes the special case where Q ≡ R(α, β, γ), leads to (20.85); see Fig. 20.2. Equation (20.66), on the other hand, is used for the general case where Q represents an arbitrary transformation in ℝ³. For a further detailed discussion of these topics, readers are referred to the literature [1, 2, 5, 6].
Since in (20.86) the exponential term never vanishes, $D^{(j)}_{jm}(\alpha,\beta,\gamma)$ does not vanish either, except for special values of β.
Next, we consider $D^{(j)}_{m'm}(\alpha,0,\gamma)$. For this term not to vanish, the power of $\sin\frac{\beta}{2}$ in (20.54) must be zero; i.e., we have

$$m' - m + 2\mu = 0. \tag{20.87}$$

Since μ ≥ 0 and m′ − m + μ ≥ 0, this requires μ = 0 and m′ = m. Hence,

$$D^{(j)}_{m'm}(\alpha,0,\gamma) = \delta_{m'm}\,e^{-i(\alpha m' + \gamma m)}. \tag{20.88}$$
This implies that $D^{(j)}_{m'm}(\alpha,0,\gamma)$ is diagonal. Suppose that we have a matrix A = (A_{m'm}). If A commutes with $D^{(j)}_{m'm}(\alpha,0,\gamma)$, it must be diagonal unless $D^{(j)}_{m'm}(\alpha,0,\gamma) = c\delta_{m'm}$, where c is a constant independent of m′ and m. But $D^{(j)}_{m'm}(\alpha,0,\gamma) \ne c\delta_{m'm}$ because of (20.88). That is, A should have the form $A_{m'm} = a_m\delta_{m'm}$. If A commutes with $D^{(j)}_{m'm}(\alpha,\beta,\gamma)$, then taking the (j, m) component of the product we have

$$\left[AD^{(j)}\right]_{jm} = \sum_k A_{jk}D^{(j)}_{km} = \sum_k a_j\delta_{jk}D^{(j)}_{km} = a_j D^{(j)}_{jm}. \tag{20.89}$$

Meanwhile,

$$\left[D^{(j)}A\right]_{jm} = \sum_k D^{(j)}_{jk}A_{km} = \sum_k D^{(j)}_{jk}a_m\delta_{km} = a_m D^{(j)}_{jm}. \tag{20.90}$$

Comparing (20.89) and (20.90), we get

$$D^{(j)}_{jm}(\alpha,\beta,\gamma)\left(a_j - a_m\right) = 0.$$

As already mentioned above, the matrix elements $D^{(j)}_{jm}(\alpha,\beta,\gamma)$ of (20.86) do not vanish except for special values of β. Therefore, for A to commute with $D^{(j)}_{m'm}(\alpha,\beta,\gamma)$, we must have $a_j = a_m \equiv a$ for all m. This implies A = aE.
Thus, from Schur's second lemma, $D^{(j)}_{m'm}(\alpha,\beta,\gamma)$ is irreducible.
We must be careful, however, about the application of Schur's second lemma. This is because Schur's second lemma described in Sect. 18.3 says the following:
A representation D is irreducible ⟹ The matrix M that is commutative with all D(g) is limited to the form M = cE,
where g is any group element. Suppose, on the other hand, that D is a direct sum of irreducible representations,

$$D(g) = D^{(1)}(g) \oplus \cdots \oplus D^{(\omega)}(g).$$

Then, we can choose the following matrix A for a linear transformation that is commutative with D(g):

$$A = \alpha^{(1)}E_1 \oplus \cdots \oplus \alpha^{(\omega)}E_\omega,$$

where α^(1), ⋯, α^(ω) are complex constants that may differ from one another; E1, ⋯, Eω are unit matrices having the same dimensions as D^(1)(g), ⋯, D^(ω)(g), respectively. Then, even though A is not of the form A = cE, A commutes with D(g) for any g.
Here we rephrase Schur’s Second Lemma as the following theorem.
Theorem 20.2 Let D be a unitary representation of a group G. Suppose that with ∀g ∈ G we have

$$D^{(l)}_{m'm}(\alpha,\beta,\gamma) = \sum_\mu(-1)^{m'-m+\mu}\frac{\sqrt{(l+m)!(l-m)!(l+m')!(l-m')!}}{(l+m-\mu)!\mu!(m'-m+\mu)!(l-m'-\mu)!}\,e^{-i(\alpha m'+\gamma m)}\left(\cos\frac{\beta}{2}\right)^{2l+m-m'-2\mu}\left(\sin\frac{\beta}{2}\right)^{m'-m+2\mu}. \tag{20.94}$$
Meanwhile, as mentioned in Sect. 20.2.3, the transformation $\widetilde{R}_3$ (appearing in Sect. 17.4.2) is described by

$$\widetilde{R}_3 = R_{z\alpha}R'_{y'\beta}R''_{z''\gamma},$$

and correspondingly

$$D^{(l)}(\alpha,\beta,\gamma) \equiv D^{(l)}\left(\widetilde{R}_3\right) = D^{(l)}_\alpha(R_{z\alpha})\,D^{(l)}_\beta\left(R'_{y'\beta}\right)\,D^{(l)}_\gamma\left(R''_{z''\gamma}\right), \tag{20.95}$$
where with each representation two out of the three Euler angles are zero.
In light of (20.94), we estimate each factor of (20.96). By the same token as before, putting β = γ = 0 in (20.94), let us examine on what conditions the factor $\left(\sin\frac{0}{2}\right)^{m'-m+2\mu}$ survives. For this factor to survive, we must have m′ − m + 2μ = 0, as in (20.87). Then, following the argument as before, we should have μ = 0 and m′ = m. Thus, D^(l)(α, 0, 0) must be a diagonal matrix. This is the case with D^(l)(0, 0, γ) as well.
Equation (3.216) implies that the functions $Y_l^m(\beta,\alpha)\ (m = -l, \cdots, 0, \cdots, l)$ are eigenfunctions with regard to the rotation about the z-axis. In other words, we have

$$R_{z\alpha}Y_l^m(\theta,\phi) = Y_l^m(\theta,\phi-\alpha) = e^{-im\alpha}Y_l^m(\theta,\phi).$$

In fact, writing

$$Y_l^m(\theta,\phi) = F(\theta)e^{im\phi},$$

we have

$$Y_l^m(\theta,\phi-\alpha) = F(\theta)e^{im(\phi-\alpha)} = e^{-im\alpha}F(\theta)e^{im\phi} = e^{-im\alpha}Y_l^m(\theta,\phi).$$
In matrix form, this is expressed as

$$\left(Y_l^{-l}\ Y_l^{-l+1}\ \cdots\ Y_l^0\ \cdots\ Y_l^l\right)R_{z\alpha} = \left(Y_l^{-l}\ Y_l^{-l+1}\ \cdots\ Y_l^0\ \cdots\ Y_l^l\right)\begin{pmatrix} e^{-i(-l)\alpha} & & & \\ & e^{-i(-l+1)\alpha} & & \\ & & \ddots & \\ & & & e^{-il\alpha} \end{pmatrix}$$

$$= \left(Y_l^{-l}\ Y_l^{-l+1}\ \cdots\ Y_l^0\ \cdots\ Y_l^l\right)\begin{pmatrix} e^{il\alpha} & & & \\ & e^{i(l-1)\alpha} & & \\ & & \ddots & \\ & & & e^{-il\alpha} \end{pmatrix}. \tag{20.97}$$
From (20.96), for the (m′, m) matrix elements of D^(l)(α, β, γ) we get

$$\left[D^{(l)}(\alpha,\beta,\gamma)\right]_{m'm} = \sum_{s,t}\left[D^{(l)}(\alpha,0,0)\right]_{m's}\left[D^{(l)}(0,\beta,0)\right]_{st}\left[D^{(l)}(0,0,\gamma)\right]_{tm} = \sum_{s,t}e^{-is\alpha}\delta_{m's}\left[D^{(l)}(0,\beta,0)\right]_{st}e^{-im\gamma}\delta_{tm} = e^{-im'\alpha}\left[D^{(l)}(0,\beta,0)\right]_{m'm}e^{-im\gamma}, \tag{20.99}$$

with

$$\left[D^{(l)}(0,\beta,0)\right]_{m'm} = \sum_\mu(-1)^{m'-m+\mu}\frac{\sqrt{(l+m)!(l-m)!(l+m')!(l-m')!}}{(l+m-\mu)!\mu!(m'-m+\mu)!(l-m'-\mu)!}\left(\cos\frac{\beta}{2}\right)^{2l+m-m'-2\mu}\left(\sin\frac{\beta}{2}\right)^{m'-m+2\mu}. \tag{20.100}$$

Thus, we find that $\left[D^{(l)}(\alpha,\beta,\gamma)\right]_{m'm}$ of (20.99) has been factorized into three factors. Equation (20.94) has this implication. The same characteristic is shared, more generally, with the SU(2) representation matrices (20.54). This can readily be understood from the aforementioned discussion.
Taking D^(1) of (20.74) as an example, we have, for a matrix A commuting with it,

$$a_{m'm}\left(e^{-im\alpha} - e^{-im'\alpha}\right) = 0,$$

so that

$$A = a_m\delta_{m'm}. \tag{20.103}$$

$$\cdots = Y_l^m(\beta-\theta,\ 0). \tag{20.105}$$
Taking account of (20.71), the RHS of (20.107) does not vanish only when n = 0. On this condition, we obtain

$$Y_l^m(\theta,0) = Y_l^0(0,0)\,D^{(l)}_{0m}(0,\theta,0). \tag{20.108}$$

Hence,

$$D^{(l)}_{0m}(0,\theta,0) = \sqrt{\frac{4\pi}{2l+1}}\,Y_l^m(\theta,0). \tag{20.111}$$

Compare (20.111) with (20.61) and confirm (20.111) using (20.74) with l = 1. Viewing (20.111) as a “row vector,” (20.111) with l = 1 can be expressed explicitly, and we have

$$\left[D^{(l)}(0,\theta,0)A\right]_{0m} = \sum_k\left[D^{(l)}(0,\theta,0)\right]_{0k}a_m\delta_{km} = \left[D^{(l)}(0,\theta,0)\right]_{0m}a_m,$$

so that

$$a_m = a_0, \qquad A = a_0\delta_{m'm}. \tag{20.112}$$
$$D = \sum_l a_l D^{(l)}(\alpha,\beta,\gamma). \tag{20.113}$$
As already discussed in Sect. 17.4.2, we need three angles α, β, and γ to specify a rotation in ℝ³. Their domains are usually taken as follows:

$$0 \le \alpha \le 2\pi, \quad 0 \le \beta \le \pi, \quad 0 \le \gamma \le 2\pi.$$

Then, we have

$$A\mathbf{u} = 1\cdot\mathbf{u} = \mathbf{u}. \tag{20.114}$$

Operating with Aᵀ from the left, we get

$$A^T A\mathbf{u} = E\mathbf{u} = \mathbf{u} = A^T\mathbf{u}, \tag{20.115}$$

where we used the property of an orthogonal matrix, i.e., AᵀA = E. From (20.114) and (20.115), we have

$$\left(A - A^T\right)\mathbf{u} = 0. \tag{20.116}$$

That is,
0 1 0 1 0 1
1 0 0 1 0 η 1 ζ 0
B C B C B C
Rxξ ¼ @ 0 1 ξ A, Ryη ¼ @ 0 1 0 A, Rzζ ¼ @ ζ 1 0 A: ð20:121Þ
0 ξ 1 η 0 1 0 0 1
These operators represent infinitesimal rotations ξ, η, and ζ around the x-, y-, and
z-axes, respectively (see Fig. 20.3). Note that these operators commute one another
to the first order of infinitesimal quantities ξ, η, and ζ. That is, we have
Rxξ Ryη ¼ Ryη Rxξ , Ryη Rzζ ¼ Rzζ Ryη , Rxξ Ryη Rzζ ¼ Rzζ Ryη Rxξ , etc:
with, e.g.,
0 1
1 ζ η
B C
Rxξ Ryη Rzζ ¼ @ ζ 1 ξ A ð20:122Þ
η ξ 1
to the first order. Notice that, to this order, the order of the operations $R_{x\xi}$, $R_{y\eta}$, and $R_{z\zeta}$ can be disregarded.
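A quick numerical check (with our own helper names) confirms that the two orderings agree with each other and with the first-order matrix (20.122) up to terms quadratic in the infinitesimals:

```python
def matmul(A, B):
    """3x3 matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

xi, eta, zeta = 1e-4, 2e-4, 3e-4  # sample infinitesimal angles
Rx = [[1, 0, 0], [0, 1, -xi], [0, xi, 1]]
Ry = [[1, 0, eta], [0, 1, 0], [-eta, 0, 1]]
Rz = [[1, -zeta, 0], [zeta, 1, 0], [0, 0, 1]]

P = matmul(matmul(Rx, Ry), Rz)   # R_x R_y R_z
Q = matmul(matmul(Rz, Ry), Rx)   # reversed order
first_order = [[1, -zeta, eta], [zeta, 1, -xi], [-eta, xi, 1]]  # (20.122)

# The discrepancies are all products of two infinitesimals (~1e-8 here).
assert all(abs(P[i][j] - Q[i][j]) < 1e-7 for i in range(3) for j in range(3))
assert all(abs(P[i][j] - first_order[i][j]) < 1e-7
           for i in range(3) for j in range(3))
```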
Next, let us consider a successive transformation with a finite rotation angle ω that
follows the infinitesimal rotations. Readers may well ask why and for what purpose
we need to make such elaborate calculations. It is partly because when we were
dealing with finite groups in the previous chapters, group elements were finite and
relevant calculations were straightforward accordingly. But, now we are thinking of
continuous groups that possess an infinite number of group elements and, hence, we
have to consider a “density” of group elements. In this respect, we are now thinking
of how the density of group elements in the parameter space is changed according to
the rotation of a finite angle ω. The results are used for various group calculations
including orthogonalization of characters.
Getting back to our subject, we further operate a finite rotation of an angle ω around the z-axis subsequent to the aforementioned infinitesimal rotations. The discussion developed below is due to Hamermesh [5]. We denote this rotation by $R_\omega$. Since we are dealing with a spherically symmetric system, any rotation axis can equivalently be chosen without loss of generality. Notice also that rotations of the same rotation angle belong to the same conjugacy class [see (17.106) and Fig. 17.18]. Defining $R_\Delta \equiv R_{x\xi}R_{y\eta}R_{z\zeta}$ of (20.122), the rotation R that combines $R_\Delta$ and the subsequently occurring $R_\omega$ is described by
$$R = R_\Delta R_\omega = \begin{pmatrix} 1 & -\zeta & \eta \\ \zeta & 1 & -\xi \\ -\eta & \xi & 1 \end{pmatrix}\begin{pmatrix} \cos\omega & -\sin\omega & 0 \\ \sin\omega & \cos\omega & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} \cos\omega - \zeta\sin\omega & -\sin\omega - \zeta\cos\omega & \eta \\ \sin\omega + \zeta\cos\omega & \cos\omega - \zeta\sin\omega & -\xi \\ \xi\sin\omega - \eta\cos\omega & \xi\cos\omega + \eta\sin\omega & 1 \end{pmatrix}. \tag{20.123}$$
Hence, we have
Since (20.124) gives only a relative directional ratio of the rotation axis, we should normalize a vector whose components are given by (20.124) to obtain the direction cosines of the rotation axis. Note that the direction of the rotation axis related to R of (20.123) should be close to that of $R_\omega$ and that only $R_{21} - R_{12}$ in (20.124) has a term (i.e., $2\sin\omega$) lacking the infinitesimal quantities ξ, η, ζ. Therefore, it suffices to normalize by $R_{21} - R_{12}$ to find the direction cosines to the first order. Hence, dividing (20.124) by $2\sin\omega$, as components of the direction cosines we get
Meanwhile, combining (17.78) and (20.123), we can find the rotation angle ω′ of R in (20.123). The trace χ of (20.123) is written as
20.2 Rotation Groups: SU(2) and SO(3) 835
$$\chi = 1 + 2\cos\omega'. \tag{20.127}$$

For small angles this yields

$$1 - \frac{1}{2}\omega'^2 \approx 1 - \frac{1}{2}\omega^2 - \zeta\omega. \tag{20.129}$$

Hence, we obtain

$$\omega' \approx \omega + \zeta. \tag{20.130}$$
Combining (20.125) and (20.130), we get their product to the first order such that
$$\left[\frac{\xi(1+\cos\omega)}{2\sin\omega} + \frac{\eta}{2}\right]\omega, \quad \left[\frac{\eta(1+\cos\omega)}{2\sin\omega} - \frac{\xi}{2}\right]\omega, \quad \omega + \zeta. \tag{20.131}$$
Since these quantities are products of individual direction cosines and the rotation
angle ω + ζ, we introduce variables ex, ey, and ez as the x-, y-, and z-related quantities,
respectively. Namely, we have
$$e_x = \left[\frac{\xi(1+\cos\omega)}{2\sin\omega} + \frac{\eta}{2}\right]\omega, \quad e_y = \left[\frac{\eta(1+\cos\omega)}{2\sin\omega} - \frac{\xi}{2}\right]\omega, \quad e_z = \omega + \zeta. \tag{20.132}$$
The Jacobian of this transformation is

$$J = \frac{\partial(e_x, e_y, e_z)}{\partial(\xi, \eta, \zeta)} = \frac{\omega^2}{4\sin^2\frac{\omega}{2}}, \tag{20.133}$$

where we used formulae of trigonometric functions with the last equality. Note that (20.133) does not depend on the sign of ω. This is because J represents the relative volume density ratio between the two parameter spaces of the $e_xe_ye_z$-system and the ξηζ-system, and this ratio is solely determined by the modulus of the rotation angle.
Taking ω → 0, we have

$$\lim_{\omega\to 0} J = \lim_{\omega\to 0}\frac{\omega^2}{4\sin^2\frac{\omega}{2}} = \lim_{\omega\to 0}\frac{\left(\omega^2\right)'}{4\left(\sin^2\frac{\omega}{2}\right)'} = \lim_{\omega\to 0}\frac{2\omega}{4\sin\frac{\omega}{2}\cos\frac{\omega}{2}} = \lim_{\omega\to 0}\frac{\omega}{\sin\omega} = \lim_{\omega\to 0}\frac{(\omega)'}{(\sin\omega)'} = \lim_{\omega\to 0}\frac{1}{\cos\omega} = 1.$$
Thus, the relative volume density ratio in the limit ω → 0 is 1, as expected. Let $dV = de_x\,de_y\,de_z$ and $d\Pi = d\xi\,d\eta\,d\zeta$ be the volume elements of each coordinate system. Then we have
$$dV = J\,d\Pi. \tag{20.134}$$
We assume that
$$J = \frac{dV}{d\Pi} = \frac{\rho_{\xi\eta\zeta}}{\rho_{e_xe_ye_z}},$$

where $\rho_{e_xe_ye_z}$ and $\rho_{\xi\eta\zeta}$ are the "number densities" of group elements in each coordinate system. We suppose that the infinitesimal rotations in the ξηζ-coordinates are converted by a finite rotation $R_\omega$ of (20.123) into the $e_xe_ye_z$-coordinates. In this way, J can be viewed as an "expansion" coefficient as a function of the rotation angle |ω|.
Hence, we have
$$dV = \frac{\rho_{\xi\eta\zeta}}{\rho_{e_xe_ye_z}}\,d\Pi \quad\text{or}\quad \rho_{e_xe_ye_z}\,dV = \rho_{\xi\eta\zeta}\,d\Pi. \tag{20.135}$$
ρ ex ey ez
Equation (20.135) implies that the total (infinite) number of group elements contained in the group SO(3) is invariant with respect to the transformation.
Let us calculate the total volume of SO(3). This can be measured such that

$$\Pi[\mathrm{SO}(3)] = \int d\Pi = \int \frac{1}{J}\, dV, \tag{20.136}$$
where we denote the total volume of SO(3) by Π[SO(3)]. Note that ω in (20.133) is replaced with $\tilde{\omega} \equiv |\omega|$ so that the radial coordinate can be positive. As already noted, (20.133) does not depend on the sign of ω. In other words, among the angles α, β, and ω (usually γ is used instead of ω; see Sect. 17.4.2), ω is taken as −π ≤ ω < π (instead of 0 ≤ ω < 2π) so that it conforms to the radial coordinate.
We utilize (12.137) for calculation of various functions. Let f(ϕ, θ, ω) be an
arbitrary function on a parameter space. We can readily estimate a “mean value”
of f(ϕ, θ, ω) on that space. That is, we have
$$\overline{f(\phi,\theta,\omega)} \equiv \frac{\int f(\phi,\theta,\omega)\, d\Pi}{\int d\Pi}, \tag{20.138}$$
To evaluate (20.139), let us get back to the irreducible representations D(l )(α, β, γ) of
Sect. 20.2.3. Also, we recall how successive coordinate transformations produce
changes in the orthogonal matrices of transformation (Sect. 17.4.2). Since the
parameters α, β, and γ uniquely specify individual rotation, different sets of α, β,
and γ (in this section we use ω instead of γ) cause different irreducible
This implies that $R_\omega$ and $R'_\omega$ belong to the same conjugacy class. To view the situation more clearly, let us look at the two rotations $R_\omega$ and $R'_\omega$ from two different coordinate systems, say some xyz-coordinate system and another x′y′z′-coordinate system. Suppose also that A coincides with the z-axis and that A′ coincides with the z′-axis. Meanwhile, if we describe $R_\omega$ in reference to the xyz-system and $R'_\omega$ in reference to the x′y′z′-system, the representations of these rotations must be identical in reference to the individual coordinate systems. Let this representation matrix be $\mathcal{R}_\omega$.
(i) If we describe $R_\omega$ and $R'_\omega$ with respect to the xyz-system, we have

$$R_\omega = \mathcal{R}_\omega, \quad R'_\omega = \left(Q^{-1}\right)^{-1}\mathcal{R}_\omega\, Q^{-1} = Q\,\mathcal{R}_\omega\, Q^{-1}. \tag{20.141}$$
Namely, we reproduce (20.140). That is, the relation (20.140) is independent of the specific choice of the coordinate system.
where with the last equality we used (18.6). Operating $D^{(l)}(Q)$ on (20.143) from the right, we get

$$D^{(l)}\left(R'_\omega\right) D^{(l)}(Q) = D^{(l)}(Q)\, D^{(l)}(R_\omega). \tag{20.144}$$
Since both $D^{(l)}\left(R'_\omega\right)$ and $D^{(l)}(R_\omega)$ are irreducible, from (18.41) of Schur's first lemma $D^{(l)}\left(R'_\omega\right)$ and $D^{(l)}(R_\omega)$ are equivalent. An infinite number of rotations determined by the orientation of the rotation axis A, accompanied by an azimuthal angle α (0 ≤ α < 2π) and a zenithal angle β (0 ≤ β < π) (see Fig. 17.18), form a conjugacy class with each specific ω shared. Thus, we have classified the various representations $D^{(l)}(R_\omega)$ according to ω and l. Meanwhile, we may identify $D^{(l)}(R_\omega)$ with $D^{(l)}(\alpha,\beta,\gamma)$ of Sect. 20.2.4.
In Sect. 20.2.2 we saw that the spherical surface harmonics $Y_l^m(\theta,\phi)$ (m = −l, ⋯, 0, ⋯, l) constitute basis functions of $D^{(l)}$ and span the representation space. The dimension of the matrix (or the representation space) is 2l + 1. Therefore, $D^{(l)}$ and $D^{(l')}$ for different l and l′ have different dimensions. From (18.41) of Schur's first lemma, such $D^{(l)}$ and $D^{(l')}$ are inequivalent.
Returning to (20.139), let us evaluate the mean value $\overline{f(\phi,\theta,\omega)}$. Let the traces of $D^{(l)}\left(R'_\omega\right)$ and $D^{(l)}(R_\omega)$ be $\chi^{(l)}\left(R'_\omega\right)$ and $\chi^{(l)}(R_\omega)$, respectively. Remembering (12.13), the trace is invariant under a similarity transformation, and so $\chi^{(l)}\left(R'_\omega\right) = \chi^{(l)}(R_\omega)$. Then, we put

$$\chi^{(l)}(\omega) \equiv \chi^{(l)}(R_\omega) = \chi^{(l)}\left(R'_\omega\right). \tag{20.145}$$
Consequently, it suffices to evaluate the trace χ (l )(ω) using D(l )(α, 0, 0) of (20.96)
whose representation matrix is given by (20.97). We have
$$\chi^{(l)}(\omega) = \sum_{m=-l}^{l} e^{-im\omega} = \frac{e^{il\omega}\left(1 - e^{-i\omega(2l+1)}\right)}{1 - e^{-i\omega}} = \frac{e^{il\omega}\left(1 - e^{i\omega}\right)\left(1 - e^{-i\omega(2l+1)}\right)}{\left(1 - e^{-i\omega}\right)\left(1 - e^{i\omega}\right)}, \tag{20.146}$$
where

$$[\text{numerator of } (20.146)] = 4\sin\left(l+\frac{1}{2}\right)\omega\,\sin\frac{\omega}{2}$$

and

$$[\text{denominator of } (20.146)] = 2(1 - \cos\omega) = 4\sin^2\frac{\omega}{2}.$$
Thus, we get

$$\chi^{(l)}(\omega) = \frac{\sin\left(l+\frac{1}{2}\right)\omega}{\sin\frac{\omega}{2}}. \tag{20.147}$$
$$\frac{1}{n}\sum_g \chi^{(\alpha)}(g)^*\, \chi^{(\beta)}(g) = \delta_{\alpha\beta}. \tag{20.148}$$
If, moreover, f(α, β, ω) depends only on ω, again as in the case of the character described by (20.147), the calculation is further simplified such that

$$\int f(\omega)\, d\Pi = 16\pi \int_0^\pi d\omega\, f(\omega)\sin^2\frac{\omega}{2}. \tag{20.150}$$
Replacing f(ω) with $\chi^{(l')}(\omega)^*\chi^{(l)}(\omega)$, we have
$$\int \chi^{(l')}(\omega)^*\chi^{(l)}(\omega)\, d\Pi = 16\pi\int_0^\pi d\omega\, \chi^{(l')}(\omega)^*\chi^{(l)}(\omega)\sin^2\frac{\omega}{2}$$
$$= 16\pi\int_0^\pi d\omega\,\frac{\sin\left(l'+\frac12\right)\omega\,\sin\left(l+\frac12\right)\omega}{\sin\frac{\omega}{2}\,\sin\frac{\omega}{2}}\,\sin^2\frac{\omega}{2} = 16\pi\int_0^\pi d\omega\,\sin\left(l'+\frac12\right)\omega\,\sin\left(l+\frac12\right)\omega$$
$$= 8\pi\int_0^\pi d\omega\left[\cos(l'-l)\omega - \cos(l'+l+1)\omega\right]. \tag{20.151}$$
Finally, with $\overline{\chi^{(l')}(\omega)^*\chi^{(l)}(\omega)}$ defined as in (20.139), we get

$$\overline{\chi^{(l')}(\omega)^*\chi^{(l)}(\omega)} = \delta_{l'l}, \tag{20.153}$$
which must vanish as well. Subtracting the RHS of (20.154) from the RHS of (20.155) and taking account of the fact that $\chi^{(l)}(\omega)$ is real [see (20.147)], we have

$$\int_0^\pi d\omega \left[\chi^{(l+1)}(\omega) - \chi^{(l)}(\omega)\right] K(\omega)\sin^2\frac{\omega}{2} = 0. \tag{20.156}$$
$$\sin a - \sin b = 2\cos\frac{a+b}{2}\,\sin\frac{a-b}{2}$$
Putting l = 0 in the LHS of (20.154) and from the assumption that the integral of (20.154) vanishes, we also have

$$\int_0^\pi d\omega\, K(\omega)\sin^2\frac{\omega}{2} = 0, \tag{20.158}$$
where we used $\chi^{(0)}(\omega) = 1$. Then, (20.157) and (20.158) are combined to give

$$\int_0^\pi d\omega\,(\cos l\omega)\, K(\omega)\sin^2\frac{\omega}{2} = 0 \quad (l = 0, 1, 2, \cdots). \tag{20.159}$$
This implies that all the Fourier cosine coefficients of $K(\omega)\sin^2\frac{\omega}{2}$ vanish. Considering that cos lω (l = 0, 1, 2, ⋯) forms a complete orthogonal system on the region [0, π], we must have

$$K(\omega)\sin^2\frac{\omega}{2} \equiv 0. \tag{20.160}$$
In the above discussion, we assumed that K(ω) is an even function of ω, as is the case with $\chi^{(l)}(\omega)$ in (20.147). Hence, $K(\omega)\sin^2\frac{\omega}{2}$ is an even function as well. Therefore, we assumed that $K(\omega)\sin^2\frac{\omega}{2}$ can be expanded in a Fourier cosine series.
20.3 Clebsch–Gordan Coefficients of Rotation Groups 843
In infinite groups, typically the continuous groups, this situation often appears when we deal with the addition of two angular momenta. Examples include the addition of two (or more) orbital angular momenta and that of an orbital angular momentum and a spin angular momentum [6, 8].
Meanwhile, there is a set of basis vectors that spans the representation space with respect to each individual irreducible representation. Here we face the following question: What are the basis vectors that span the total reduced representation space described by (18.199)? The answer is that these vectors must be constructed from the basis vectors relevant to the individual irreducible representations. The Clebsch–Gordan coefficients appear as the coefficients of a linear combination of the basis vectors associated with those irreducible representations.
In a word, to find the Clebsch–Gordan coefficients is equivalent to calculating the proper coefficients of the basis functions that span the representation space of the direct-product groups. The relevant continuous groups in which we are interested are SU(2) and SO(3).
$$\chi^{(j)}(\omega) = \sum_{m=-j}^{j} e^{-im\omega} = \frac{\sin\left(j+\frac12\right)\omega}{\sin\frac{\omega}{2}}, \tag{20.161}$$

$$\chi^{(j_1\otimes j_2)}(\omega) = \chi^{(j_1)}(\omega)\,\chi^{(j_2)}(\omega), \tag{20.163}$$

$$J = |j_1 - j_2|,\ |j_1 - j_2| + 1,\ \cdots,\ j_1 + j_2.$$

$$\chi^{(j_1\otimes j_2)}(\omega) = \chi^{(|j_1-j_2|)}(\omega) + \chi^{(|j_1-j_2|+1)}(\omega) + \cdots + \chi^{(j_1+j_2)}(\omega). \tag{20.165}$$
where k is zero or a positive integer, and with the last equality we used (20.152). Dividing both sides by $\int d\Pi$, we get

$$\overline{\chi^{(k)}(\omega)^*\,\chi^{(j_1\otimes j_2)}(\omega)} = \delta_{k,|j_1-j_2|} + \delta_{k,|j_1-j_2|+1} + \cdots + \delta_{k,\,j_1+j_2}, \tag{20.167}$$

where we used (20.153). Equation (20.166) implies that $\overline{\chi^{(k)}(\omega)^*\,\chi^{(j_1\otimes j_2)}(\omega)}$ does not vanish only if k is identical to one of $|j_1 - j_2|, |j_1 - j_2| + 1, \cdots, j_1 + j_2$. If, for example, we choose $|j_1 - j_2|$ for k in (20.167), we have
This relation corresponds to (18.200), where a finite group is relevant. Note that the volume of SO(3), i.e., $\int d\Pi = 8\pi^2$, corresponds to the order n of a finite group in (20.148). Considering (18.199) and (18.200) once again, it follows that in the above case $D^{(|j_1-j_2|)}$ takes place once and only once in the direct-product representation $D^{(j_1\otimes j_2)}$. Thus, we obtain the following important relation:

$$D^{(j_1\otimes j_2)} = D^{(j_1)}(\omega)\otimes D^{(j_2)}(\omega) = \sum_{J=|j_1-j_2|}^{j_1+j_2} D^{(J)}. \tag{20.169}$$
$$\psi_j^{(\mu)}\ (j = 1, \cdots, n_\mu) \quad\text{and}\quad \phi_l^{(\nu)}\ (l = 1, \cdots, n_\nu).$$
We know from Sect. 18.8 that $n_\mu n_\nu$ new basis functions can be constructed from the product functions $\psi_j^{(\mu)}\phi_l^{(\nu)}$. Our next task will be to classify these $n_\mu n_\nu$ functions into those belonging to different irreducible representations. For this, we need the relation (18.199). According to custom, we rewrite it as follows:

$$D^{(\mu)}(g)\otimes D^{(\nu)}(g) = \sum_\omega (\mu\nu\omega)\, D^{(\omega)}(g), \tag{20.170}$$

where (μνω) is identical with $q_\omega$ on the RHS of (18.199) and shows how many times the same representation ω takes place in the direct-product representation. Naturally, we have

$$(\mu\nu\omega) = (\nu\mu\omega).$$

Then, we get

$$\sum_\omega (\mu\nu\omega)\, n_\omega = n_\mu n_\nu.$$
$$(\mu j, \nu l\,|\,\omega\tau_\omega s)$$
As we shall see later in this chapter, the functions $\Psi_s^{(\omega\tau_\omega)}$ are going to be normalized along with the product functions $\psi_j^{(\mu)}\phi_l^{(\nu)}$. The Clebsch–Gordan coefficients must form a unitary matrix accordingly. Notice that a transformation between two orthonormal basis sets is unitary (see Chap. 14). Since there are $n_\mu n_\nu$ product functions as basis vectors, the Clebsch–Gordan coefficients are associated with $(n_\mu n_\nu, n_\mu n_\nu)$ square matrices. The said coefficients can be defined for any group, either finite or infinite. Rather, difficulty in determining the Clebsch–Gordan coefficients arises when $\tau_\omega$ is more than one. This is because we should take account of linear combinations described by

$$\sum_{\tau_\omega} c_{\tau_\omega}\, \Psi_s^{(\omega\tau_\omega)}$$

that belong to the irreducible representation ω. This would cause arbitrariness and ambiguity [5]. Nevertheless, we do not need to go into further discussion of this problem. From now on, we focus on the Clebsch–Gordan coefficients of SU(2) and SO(3), typical simply reducible groups, for which $\tau_\omega$ is at most one.
That SU(2) and SO(3) are simply reducible saves us a lot of labor. The argument goes as follows: Equation (20.169) shows how the dimension $(2j_1+1)(2j_2+1)$ of the representation space of the direct product of two representations is redistributed among individual irreducible representations, each of which takes place only once. The arithmetic based on this fact is as follows:
$$(2j_1+1)(2j_2+1) = 2(j_1-j_2)(2j_2+1) + 2\left\{\frac12\left[2j_2(2j_2+1)\right]\right\} + (2j_2+1)$$
$$= 2(j_1-j_2)(2j_2+1) + 2(0+1+2+\cdots+2j_2) + (2j_2+1) \tag{20.172}$$
$$= [2(j_1-j_2)+0] + 2[(j_1-j_2)+1] + 2[(j_1-j_2)+2] + \cdots + 2[(j_1-j_2)+2j_2] + (2j_2+1)$$
$$= [2(j_1-j_2)+1] + \{2[(j_1-j_2)+1]+1\} + \cdots + [2(j_1+j_2)+1],$$
where with the second equality we used the formula $1+2+\cdots+n = \frac12 n(n+1)$; the resulting numbers 0, 1, 2, ⋯, 2j₂ are distributed among the 2j₂ + 1 terms of the subsequent RHS of (20.172); the third term 2j₂ + 1 is likewise distributed, one unit to each of the 2j₂ + 1 terms, on the rightmost side. Equation (20.172) is symmetric in j₁ and j₂, but if we assume that j₁ ≥ j₂, each term of the rightmost side of (20.172) is positive. Therefore, for convenience we assume j₁ ≥ j₂ without loss of generality.
To clarify the situation, we list several tables (Tables 20.1, 20.2, and 20.3). Tables 20.1 and 20.2 show specific cases, whereas Table 20.3 represents the general case. In these tables the topmost layer has a diagram corresponding to the case J = |j₁ − j₂|; it contains 2|j₁ − j₂| + 1 different quantum states. The lower layers include diagrams corresponding to the cases J = |j₁ − j₂| + 1, ⋯, j₁ + j₂. For
Table 20.1 Summation of angular momenta for j₁ = j₂ = 1; entries are M = m₁ + m₂

  m₂ \ m₁ |  −1   0   1
  −1      |  −2  −1   0
   0      |  −1   0   1
   1      |   0   1   2

Table 20.2 Summation of angular momenta for j₁ = 2, j₂ = 1; entries are M = m₁ + m₂

  m₂ \ m₁ |  −2  −1   0   1   2
  −1      |  −3  −2  −1   0   1
   0      |  −2  −1   0   1   2
   1      |  −1   0   1   2   3
Table 20.3 Summation of angular momenta in a general case. We suppose j₁ > 2j₂. J = |j₁ − j₂|, ⋯, j₁ + j₂; M = −J, −J + 1, ⋯, J. (The table body lists M = m₁ + m₂ for m₁ = −j₁, −j₁ + 1, ⋯, j₁ across the columns and m₂ = −j₂, −j₂ + 1, ⋯, j₂ down the rows.)
each J, 2J + 1 functions (or quantum states) are included, labelled according to the different numbers M = −J, −J + 1, ⋯, J. The rightmost column displays M = m₁ + m₂ = |j₁ − j₂|, |j₁ − j₂| − 1, ⋯, −(j₁ + j₂) from the topmost row through the bottommost row.
Tables 20.1, 20.2, and 20.3 comprise 2j₂ + 1 layers, in which the total of (2j₁ + 1)(2j₂ + 1) functions that form the basis vectors of $D^{(j_1)}(\omega)\otimes D^{(j_2)}(\omega)$ is redistributed according to the M number. The same numbers appearing from the topmost row through the bottommost row are marked red and connected with dotted aqua lines. These lines form a parallelogram together with the top and bottom horizontal lines as shown. The upper and lower sides of the parallelogram are 2|j₁ − j₂| in width. The parallelogram gets flattened with decreasing |j₁ − j₂|. If j₁ = j₂, the parallelogram coalesces into a line (Table 20.1). Regarding the notations M and J, see Sects. 20.3.2 and 20.3.3 (vide infra).
Bearing in mind the above remarks, we focus on the coupling of two angular
momenta as a typical example. Within this framework we deal with the
$$f_m = \frac{u^{\,j+m}\,v^{\,j-m}}{\sqrt{(j+m)!\,(j-m)!}} \quad (m = -j, -j+1, \cdots, j-1, j). \tag{20.47}$$
In the previous section we described systematically how the quantum states are made up from two angular momenta. We now express those states in terms of linear combinations of the basis functions related to the direct-product representations of (20.169). To determine the coefficients, we follow the calculation procedures due to Hamermesh [5]. The calculation procedures are rather lengthy, so we summarize each item separately.
(1) Invariant quantities AJ and BJ:
We start by seeking quantities that are invariant under the unitary transformation U of (20.35), expressed as

$$U = \begin{pmatrix} a & b \\ -b^* & a^* \end{pmatrix} \quad\text{with}\quad |a|^2 + |b|^2 = 1. \tag{20.35}$$
As in (20.46), let us consider a combination of variables u ≡ (u₂ u₁) and v ≡ (v₂ v₁) that are transformed such that

$$\left(u'_2\ u'_1\right) = \left(u_2\ u_1\right)\begin{pmatrix} a & b \\ -b^* & a^* \end{pmatrix}, \quad \left(v'_2\ v'_1\right) = \left(v_2\ v_1\right)\begin{pmatrix} a & b \\ -b^* & a^* \end{pmatrix}. \tag{20.173}$$
$$\psi_{m_1}^{j_1} = \frac{u_1^{\,j_1+m_1}\, u_2^{\,j_1-m_1}}{\sqrt{(j_1+m_1)!\,(j_1-m_1)!}} \quad (m_1 = -j_1, -j_1+1, \cdots, j_1-1, j_1), \tag{20.176}$$

$$\phi_{m_2}^{j_2} = \frac{v_1^{\,j_2+m_2}\, v_2^{\,j_2-m_2}}{\sqrt{(j_2+m_2)!\,(j_2-m_2)!}} \quad (m_2 = -j_2, -j_2+1, \cdots, j_2-1, j_2).$$
$$x' = x\,m^*, \tag{20.178}$$

where we have

$$x' \equiv \left(x'_2\ x'_1\right), \quad x \equiv \left(x_2\ x_1\right), \quad m = \begin{pmatrix} a & b \\ -b^* & a^* \end{pmatrix}. \tag{20.179}$$
Rewriting (20.181) as

$$u'g = (ug)\,m^*, \tag{20.182}$$

$$x'^T = \left(m^*\right)^T x^T = m^\dagger x^T.$$

Then, we get

$$u'x'^T = (um)\left(m^\dagger x^T\right) = u\left(mm^\dagger\right)x^T = ux^T = u_2x_2 + u_1x_1, \tag{20.183}$$
where with the third equality we used the unitarity of m. This implies that $ux^T$ is an invariant quantity under the unitary transformation by m. Similarly, we have

$$v'x'^T = vx^T. \tag{20.184}$$
$$v(ug)^T = \left(v_2\ v_1\right)\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}\begin{pmatrix} u_2 \\ u_1 \end{pmatrix} = u_1v_2 - u_2v_1. \tag{20.185}$$

Thus, $v(ug)^T$ (or $u_1v_2 - u_2v_1$) is another invariant under the unitary transformation by m. In the above discussion we can view (20.183) to (20.185) as quadratic forms of two variables (see Sect. 14.5).
Using these invariants, we define other important invariants $A_J$ and $B_J$ as follows [5]:

$$A_J \equiv \left(ux^T\right)^{j_1-j_2+J}\left(vx^T\right)^{j_2-j_1+J}\left[v(ug)^T\right]^{j_1+j_2-J}, \qquad B_J \equiv \left(ux^T\right)^{2J}.$$

The polynomial $A_J$ is of degree $2j_1$ in the covariant variables $u_1$ and $u_2$ and of degree $2j_2$ in $v_1$ and $v_2$. The degree of the contravariant variables $x_1$ and $x_2$ is 2J. Expanding $A_J$ in powers of $x_1$ and $x_2$, we get
$$A_J = \sum_{M=-J}^{J} W_M^J\, X_M^J, \tag{20.188}$$

where $X_M^J$ is defined as

$$X_M^J \equiv \frac{x_1^{\,J+M}\, x_2^{\,J-M}}{\sqrt{(J+M)!\,(J-M)!}} \quad (M = -J, -J+1, \cdots, J-1, J) \tag{20.189}$$
and the coefficients $W_M^J$ are polynomials in $u_1$, $u_2$, $v_1$, and $v_2$. The coefficients $W_M^J$ will be determined soon in combination with $X_M^J$. Meanwhile, we have

$$B_J = \sum_{k=0}^{2J}\frac{(2J)!}{k!\,(2J-k)!}\,(u_1x_1)^k (u_2x_2)^{2J-k} = (2J)!\sum_{M=-J}^{J}\frac{1}{(J+M)!\,(J-M)!}\,(u_1x_1)^{J+M}(u_2x_2)^{J-M}$$
$$= (2J)!\sum_{M=-J}^{J}\frac{1}{(J+M)!\,(J-M)!}\, u_1^{\,J+M}u_2^{\,J-M}x_1^{\,J+M}x_2^{\,J-M} = (2J)!\sum_{M=-J}^{J}\Phi_M^J\, X_M^J, \tag{20.190}$$

$$\Phi_M^J \equiv \frac{u_1^{\,J+M}\,u_2^{\,J-M}}{\sqrt{(J+M)!\,(J-M)!}} \quad (M = -J, -J+1, \cdots, J-1, J). \tag{20.191}$$
Therefore, we have

$$A_J = \sum_{\lambda,\mu,\nu}(-1)^\lambda\binom{j_1+j_2-J}{\lambda}\binom{j_1-j_2+J}{\mu}\binom{j_2-j_1+J}{\nu}\, u_1^{\,2j_1-\lambda-\mu}\, u_2^{\,\lambda+\mu}\, v_1^{\,j_2-j_1+J-\nu+\lambda}\, v_2^{\,j_1+j_2-J-\lambda+\nu}\, x_1^{\,2J-\mu-\nu}\, x_2^{\,\mu+\nu} \tag{20.193}$$
$$A_J = \sum_{m_1,m_2,\lambda}(-1)^\lambda\frac{(j_1+j_2-J)!\,(j_1-j_2+J)!\,(j_2-j_1+J)!\left[(j_1+m_1)!\,(j_1-m_1)!\,(j_2+m_2)!\,(j_2-m_2)!\,(J+m_1+m_2)!\,(J-m_1-m_2)!\right]^{1/2}}{\lambda!\,(j_1+j_2-J-\lambda)!\,(j_1-\lambda-m_1)!\,(J-j_2+\lambda+m_1)!\,(j_2-\lambda+m_2)!\,(J-j_1+\lambda-m_2)!}\,\psi_{m_1}^{j_1}\phi_{m_2}^{j_2}X_{m_1+m_2}^J.$$

Setting $m_1 + m_2 = M$, we have

$$A_J = \sum_{m_1,m_2,\lambda}(-1)^\lambda\frac{(j_1+j_2-J)!\,(j_1-j_2+J)!\,(j_2-j_1+J)!\left[(j_1+m_1)!\,(j_1-m_1)!\,(j_2+m_2)!\,(j_2-m_2)!\,(J+M)!\,(J-M)!\right]^{1/2}}{\lambda!\,(j_1+j_2-J-\lambda)!\,(j_1-\lambda-m_1)!\,(J-j_2+\lambda+m_1)!\,(j_2-\lambda+m_2)!\,(J-j_1+\lambda-m_2)!}\,\psi_{m_1}^{j_1}\phi_{m_2}^{j_2}X_M^J. \tag{20.196}$$
$$W_M^J = (j_1+j_2-J)!\,(j_1-j_2+J)!\,(j_2-j_1+J)!\sum_{m_1,m_2,\,m_1+m_2=M} C_{m_1,m_2}^J\,\psi_{m_1}^{j_1}\phi_{m_2}^{j_2}, \tag{20.197}$$

$$C_{m_1,m_2}^J \equiv \sum_\lambda\frac{(-1)^\lambda\left[(j_1+m_1)!\,(j_1-m_1)!\,(j_2+m_2)!\,(j_2-m_2)!\,(J+M)!\,(J-M)!\right]^{1/2}}{\lambda!\,(j_1+j_2-J-\lambda)!\,(j_1-\lambda-m_1)!\,(J-j_2+\lambda+m_1)!\,(j_2-\lambda+m_2)!\,(J-j_1+\lambda-m_2)!}. \tag{20.198}$$
$$\Lambda_M^J = U\,W_M^J,$$

where C(J) is a constant that depends only on J for given numbers j₁ and j₂. An implication of (20.199) is that we can get suitably normalized functions $\Psi_M^J$ from the basis vectors $\Lambda_M^J$.
Introducing a normalization constant $\rho_J$, we get

$$\Psi_M^J = \rho_J\sum_{m_1,m_2,\,m_1+m_2=M} C_{m_1,m_2}^J\,\psi_{m_1}^{j_1}\phi_{m_2}^{j_2}, \tag{20.201}$$

so that the Clebsch–Gordan coefficients take the form $\rho_J\, C_{m_1,m_2}^J$.
$$\left\langle \frac{z_1^{\,l}\, z_2^{\,m-l}}{\sqrt{l!\,(m-l)!}}\,\middle|\,\frac{z_1^{\,k}\, z_2^{\,m-k}}{\sqrt{k!\,(m-k)!}}\right\rangle = \delta_{lk}. \tag{20.204}$$
$$\lambda + m_1 - j_1 \ge 0 \quad\text{or}\quad \lambda \ge j_1 - m_1, \tag{20.205}$$

$$\lambda \le j_1 - m_1. \tag{20.206}$$

From the above equations, we obtain only one choice for λ such that [5]

$$\lambda = j_1 - m_1. \tag{20.207}$$
$$C_{m_1,m_2}^J = (-1)^{j_1-m_1}\frac{\sqrt{(2J)!}}{(J-j_2+j_1)!\,(J-j_1+j_2)!}\sqrt{\frac{(j_1+m_1)!\,(j_2+m_2)!}{(j_1-m_1)!\,(j_2-m_2)!}}
= (-1)^{j_1-m_1}\sqrt{\frac{(2J)!}{(J-j_2+j_1)!\,(J-j_1+j_2)!}}\sqrt{\binom{j_1+m_1}{j_2-m_2}\binom{j_2+m_2}{j_1-m_1}}. \tag{20.208}$$
To confirm (20.208), calculate $\binom{j_1+m_1}{j_2-m_2}$ and $\binom{j_2+m_2}{j_1-m_1}$ and use, e.g.,

$$j_1 - j_2 + m_1 + m_2 = j_1 - j_2 + M = j_1 - j_2 + J. \tag{20.209}$$
Thus, we get

$$\sum_{m_1,m_2,\,m_1+m_2=M}\left(C_{m_1,m_2}^J\right)^2 = \frac{(2J)!}{(J-j_2+j_1)!\,(J-j_1+j_2)!}\sum_{m_1,m_2,\,m_1+m_2=M}\binom{j_1+m_1}{j_2-m_2}\binom{j_2+m_2}{j_1-m_1}. \tag{20.210}$$
where with the last equality we used (20.213). That is, we get

$$\binom{-x}{y} = (-1)^y\binom{x+y-1}{y}. \tag{20.214}$$
$$= \frac{(2J)!}{(J-j_2+j_1)!\,(J-j_1+j_2)!}\sum_{m_1,m_2,\,m_1+m_2=M}(-1)^{j_2-m_2}(-1)^{j_1-m_1}\binom{j_2-m_2-j_1-m_1-1}{j_2-m_2}\binom{j_1-m_1-j_2-m_2-1}{j_1-m_1}$$
$$= \frac{(-1)^{j_1+j_2-J}\,(2J)!}{(J-j_2+j_1)!\,(J-j_1+j_2)!}\sum_{m_1,m_2,\,m_1+m_2=M}\binom{j_2-j_1-J-1}{j_2-m_2}\binom{j_1-j_2-J-1}{j_1-m_1}. \tag{20.215}$$
Comparing the coefficients of the γ-th power monomials on the last three sides, we get

$$\binom{r+s}{\gamma} = \sum_{\alpha,\beta,\,\alpha+\beta=\gamma}\binom{r}{\alpha}\binom{s}{\beta}. \tag{20.217}$$
$$\frac{(2J)!}{(J-j_2+j_1)!\,(J-j_1+j_2)!}\binom{j_1+j_2+J+1}{j_1+j_2-J} = \frac{(2J)!}{(J-j_2+j_1)!\,(J-j_1+j_2)!}\cdot\frac{(j_1+j_2+J+1)!}{(j_1+j_2-J)!\,(2J+1)!}$$
$$= \frac{(j_1+j_2+J+1)!}{(2J+1)\,(J-j_2+j_1)!\,(J-j_1+j_2)!\,(j_1+j_2-J)!}. \tag{20.218}$$
$$\sum_\lambda\frac{(-1)^\lambda\left[(j_1+m_1)!\,(j_1-m_1)!\,(j_2+m_2)!\,(j_2-m_2)!\,(J+M)!\,(J-M)!\right]^{1/2}}{\lambda!\,(j_1+j_2-J-\lambda)!\,(j_1-\lambda-m_1)!\,(J-j_2+\lambda+m_1)!\,(j_2-\lambda+m_2)!\,(J-j_1+\lambda-m_2)!} \tag{20.221}$$
During the course of the above calculations, we encountered, e.g., Γ(x) and factorials involving a negative integer. Under ordinary circumstances, such things must be avoided. Nonetheless, it is convenient for practical use if we properly recognize that we first choose a number x close to (but not identical with) a negative integer and finally take the limit of x at that negative integer after the related calculations have been finished.
Choosing, e.g., M(= m₁ + m₂) = −J in (20.203) instead of setting M = J as above, we see that only the factor of λ = j₂ + m₂ in (20.198) survives. Yet we get the same result as above. The confirmation is left to readers.
discussed in Sect. 18.9. Taking two sets of complex variables u ≡ (u₂ u₁) and v ≡ (v₂ v₁) and rewriting them according to (20.176), we have

so that we can get the basis functions of the direct-product representation described by

$$\left(u_2v_2\quad u_1v_2\quad u_2v_1\quad u_1v_1\right).$$

Notice that we have four functions above; i.e., $(2j_1+1)(2j_2+1) = 4$ with $j_1 = j_2 = 1/2$. Using (20.221) we can determine the proper basis functions with respect to $\Psi_M^J$ of (20.202). For example, for $\Psi_{-1}^1$ we have only one choice, $m_1 = m_2 = -\frac12$, to get $m_1 + m_2 = M = -1$. In (20.221) we have no choice but to take λ = 0. In this way, we have

$$\Psi_{-1}^1 = u_2v_2 = \psi_{-1/2}^{1/2}\,\phi_{-1/2}^{1/2}.$$

Similarly, we get

$$\Psi_1^1 = u_1v_1 = \psi_{1/2}^{1/2}\,\phi_{1/2}^{1/2}.$$
We put the above results in a simple matrix form representing the basis vector transformation such that

$$\left(u_2v_2\quad u_1v_2\quad u_1v_1\quad u_2v_1\right)\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \frac{1}{\sqrt2} & 0 & \frac{1}{\sqrt2} \\ 0 & 0 & 1 & 0 \\ 0 & \frac{1}{\sqrt2} & 0 & -\frac{1}{\sqrt2} \end{pmatrix} = \left(\Psi_{-1}^1\quad \Psi_0^1\quad \Psi_1^1\quad \Psi_0^0\right). \tag{20.222}$$
In this way, (20.222) shows how the basis functions that possess a proper irreducible representation and span the direct-product representation space are constructed from the original product functions. The Clebsch–Gordan coefficients play the role of a linker (i.e., a unitary operator) between those two sets of functions. In this respect these coefficients resemble those appearing in the symmetry-adapted linear combinations (SALCs) mentioned in Sects. 18.2 and 19.3.
Expressing the unitary transformation according to (20.173) and choosing $\left(\Psi_{-1}^1\ \Psi_0^1\ \Psi_1^1\ \Psi_0^0\right)$ as the basis vectors, we describe the transformation in a manner similar to (20.48) such that

$$R(a,b)\left(\Psi_{-1}^1\ \Psi_0^1\ \Psi_1^1\ \Psi_0^0\right) = \left(\Psi_{-1}^1\ \Psi_0^1\ \Psi_1^1\ \Psi_0^0\right)\left[D^{(1/2)}\otimes D^{(1/2)}\right]. \tag{20.223}$$
Notice that (20.224) is a block matrix; namely, (20.224) has been reduced according to the symmetric and antisymmetric representations. Symbolically writing (20.223) and (20.224), we have
The block matrix of (20.224) is unitary, and so the (3, 3) submatrix and the (1, 1) submatrix (i.e., the number 1) are unitary accordingly. To show this, use the conditions (20.35) and (20.173). The confirmation is left to readers. The (3, 3) submatrix is a symmetric representation, whose representation space $\Psi_{-1}^1\ \Psi_0^1\ \Psi_1^1$ span. Note that these functions are symmetric under the exchange $m_1 \leftrightarrow m_2$. Or, we may say that these functions are symmetric under the exchange ψ ↔ φ. Though trivial, the basis function $\Psi_0^0$ spans the antisymmetric representation space. The corresponding submatrix is merely the number 1. This is directly related to the fact that $v(ug)^T$ (or $u_1v_2 - u_2v_1$) of (20.185) is an invariant. The function $\Psi_0^0$ changes sign (i.e., is antisymmetric) under the exchange $m_1 \leftrightarrow m_2$ or ψ ↔ φ.
Once again, readers might well ask why we need to work out such an elaborate means to pick up a small pebble. With increasing dimension of the vector space, however, seeking an appropriate set of proper basis functions becomes as difficult as lifting a huge rock. Though simple, the next example gives us a feel for this. In such situations, a projection operator is an indispensable tool for addressing the problem.
Example 20.2 We examine the direct product $D^{(1)}\otimes D^{(1)}$, where $j_1 = j_2 = 1$. This is another example of the symmetric and antisymmetric representations. Taking two sets of complex variables u ≡ (u₂ u₁) and v ≡ (v₂ v₁) and rewriting them according to (20.176) again, we have nine, i.e., $(2j_1+1)(2j_2+1)$, product functions expressed as

$$\psi_1^1\phi_1^1,\ \psi_0^1\phi_1^1,\ \psi_{-1}^1\phi_1^1,\ \psi_1^1\phi_0^1,\ \psi_0^1\phi_0^1,\ \psi_{-1}^1\phi_0^1,\ \psi_1^1\phi_{-1}^1,\ \psi_0^1\phi_{-1}^1,\ \psi_{-1}^1\phi_{-1}^1. \tag{20.225}$$
These product functions form the basis functions of the direct-product representation. Consequently, we will be able to construct proper eigenfunctions by means of linear combinations of these functions. This method is again related to that based on SALCs discussed in Chap. 19. In the present case, it is accomplished by finding the Clebsch–Gordan coefficients described by (20.221).
As implied in Table 20.1, we need to determine the individual coefficients of the nine functions of (20.225) with respect to the following functions that are constituted by linear combinations of the above nine functions:
(i) With the first five functions, we have $J = j_1 + j_2 = 2$. In this case, the determination procedure for the coefficients is much simplified. Substituting $J = j_1 + j_2$ into the second factor of the denominator of (20.221), we get (−λ)! as well as λ! in the first factor of the denominator. Therefore, for the arguments of the factorials not to be negative we must have λ = 0. Consequently, we have [5]

$$\rho_J = \sqrt{\frac{(2j_1)!\,(2j_2)!}{(2J)!}},$$

$$C_{m_1,m_2}^J = \left[\frac{(J+M)!\,(J-M)!}{(j_1+m_1)!\,(j_1-m_1)!\,(j_2+m_2)!\,(j_2-m_2)!}\right]^{1/2}.$$
With $\Psi_2^2$ we have only one choice for $\rho_J$ and $C_{m_1,m_2}^J$; that is, we have $m_1 = j_1$ and $m_2 = j_2$. Then, $\Psi_2^2 = \psi_1^1\phi_1^1$. In the case of M = 1, we have two choices in (20.221), i.e., $m_1 = 1, m_2 = 0$ and $m_1 = 0, m_2 = 1$. Then, we get

$$\Psi_1^2 = \frac{1}{\sqrt2}\left(\psi_1^1\phi_0^1 + \psi_0^1\phi_1^1\right). \tag{20.226}$$

$$\Psi_{-1}^2 = \frac{1}{\sqrt2}\left(\psi_{-1}^1\phi_0^1 + \psi_0^1\phi_{-1}^1\right). \tag{20.227}$$

$$\Psi_0^2 = \frac{1}{\sqrt6}\left(\psi_1^1\phi_{-1}^1 + 2\psi_0^1\phi_0^1 + \psi_{-1}^1\phi_1^1\right). \tag{20.228}$$
(ii) With J = 1, i.e., $\Psi_1^1$, $\Psi_0^1$, and $\Psi_{-1}^1$, we start with (20.221). Noting the λ! and (1 − λ)! in the first two factors of the denominator, we have λ = 0 or λ = 1. In the case of M = 1, we have only one choice, $m_1 = 0, m_2 = 1$, for λ = 0. Similarly, $m_1 = 1, m_2 = 0$ for λ = 1. Hence, we get

$$\Psi_1^1 = \frac{1}{\sqrt2}\left(\psi_0^1\phi_1^1 - \psi_1^1\phi_0^1\right). \tag{20.229}$$

$$\Psi_0^1 = \frac{1}{\sqrt2}\left(\psi_1^1\phi_{-1}^1 - \psi_{-1}^1\phi_1^1\right), \tag{20.230}$$

$$\Psi_{-1}^1 = \frac{1}{\sqrt2}\left(\psi_{-1}^1\phi_0^1 - \psi_0^1\phi_{-1}^1\right). \tag{20.231}$$

$$\Psi_0^0 = \frac{1}{\sqrt3}\left(\psi_1^1\phi_{-1}^1 - \psi_0^1\phi_0^1 + \psi_{-1}^1\phi_1^1\right).$$
$$\Re(a,b)\left(\Psi_2^2\ \Psi_1^2\ \Psi_0^2\ \Psi_{-1}^2\ \Psi_{-2}^2\ \Psi_1^1\ \Psi_0^1\ \Psi_{-1}^1\ \Psi_0^0\right) = \left(\Psi_2^2\ \Psi_1^2\ \Psi_0^2\ \Psi_{-1}^2\ \Psi_{-2}^2\ \Psi_1^1\ \Psi_0^1\ \Psi_{-1}^1\ \Psi_0^0\right)\left[D^{(1)}\otimes D^{(1)}\right]. \tag{20.233}$$
Notice again that the representation matrix of $D^{(1)}\otimes D^{(1)}$ can be converted (or reduced) to a block unitary matrix according to the symmetric and antisymmetric representations such that
where $D^{(2)}$ and $D^{(0)}$ are the symmetric representations and $D^{(1)}$ is the antisymmetric representation. Recall that SO(3) is simply reducible. To show (20.234), we need somewhat lengthy calculations, but the computational procedures are straightforward.
A set of basis functions Ψ22 Ψ21 Ψ20 Ψ21 Ψ22 spans a five-dimensional
symmetric representation space. In turn, Ψ00 spans a one-dimensional symmetric
representation space. Another set of basis functions Ψ11 Ψ10 Ψ11 spans a
three-dimensional antisymmetric representation space.
We add that some special Clebsch–Gordan coefficients are easy to find. For example, in the case of J = j_1 + j_2 = M or J = j_1 + j_2 = -M, the normalized basis functions are simply

\Psi_{j_1+j_2,\,j_1+j_2} = \psi_{j_1 j_1}\phi_{j_2 j_2}, \quad \Psi_{j_1+j_2,\,-(j_1+j_2)} = \psi_{j_1,-j_1}\phi_{j_2,-j_2}.

That is, only the coefficients

(j_1, m_1 = j_1, j_2, m_2 = j_2 \,|\, J = j_1 + j_2, M = J)

and

(j_1, m_1 = -j_1, j_2, m_2 = -j_2 \,|\, J = j_1 + j_2, M = -J)

survive and take a value of 1; see the corresponding parts of the matrix elements in (20.221). This can readily be seen in Tables 20.1, 20.2, and 20.3.
20.4 Lie Groups and Lie Algebras

In Sect. 20.2 we have developed a practical approach to dealing with various aspects and characteristics of SU(2) and SO(3). In this section we introduce the elegant theory of Lie groups and Lie algebras, which enables us to study SU(2) and SO(3) systematically.
We start with the following discussion. Let A(t) be a non-singular matrix whose elements vary as a function of a real number t. If under such conditions A(t) constitutes a group, A(t) is said to be a one-parameter group. In particular, we are interested in the situation where

A(s)A(t) = A(s + t). \quad (20.235)

Setting s = t = 0, we further get

A(0)A(0) = A(0). \quad (20.237)

Since A(0) is non-singular, A(0)^{-1} must exist. Then, multiplying both sides of (20.237) by A(0)^{-1}, we obtain

A(0) = E, \quad (20.238)

meaning that

A(t)A(-t) = A(0) = E, \quad \text{i.e.,} \quad A(-t) = A(t)^{-1}. \quad (20.239)

Differentiating (20.235) with respect to s and then setting s = 0, we get

A'(t) = A'(0)A(t). \quad (20.240)
Defining

X \equiv A'(0), \quad (20.241)

we obtain

A'(t) = XA(t). \quad (20.242)

Solving this differential equation under the initial condition A(0) = E, we get

A(t) = \exp(tX). \quad (20.243)

Thus, any one-parameter group A(t) that satisfies (20.235) is described by (20.243).
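The relations above can be made concrete with a small numerical experiment. The sketch below is ours (not from the text; NumPy assumed, and the matrix exponential is computed from its power series). It takes the skew-symmetric generator X of SO(2), builds A(t) = exp(tX), and checks the group law, the initial condition, and X = A'(0):

```python
import numpy as np

# One-parameter group A(t) = exp(tX) in SO(2); X = A'(0) is skew-symmetric
X = np.array([[0.0, -1.0], [1.0, 0.0]])

def A(t, terms=40):
    """exp(tX) by its power series; solves A'(t) = X A(t), A(0) = E, cf. (20.242)-(20.243)."""
    out, term = np.eye(2), np.eye(2)
    for n in range(1, terms):
        term = term @ (t * X) / n
        out = out + term
    return out

s, t = 0.3, 1.2
assert np.allclose(A(s) @ A(t), A(s + t))            # the group law (20.235)
assert np.allclose(A(0), np.eye(2))                  # A(0) = E, (20.238)
h = 1e-6
assert np.allclose((A(h) - A(0)) / h, X, atol=1e-5)  # A'(0) = X, (20.241)
print("one-parameter group relations verified")
```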
In Sect. 20.1 we mentioned the Lie algebra in relation to (20.14). In this context, (20.243) shows how the Lie algebra is related to the one-parameter group. The notion of a continuous group is thus deeply connected to that of a one-parameter group. Equation (20.45) of Sect. 20.2 is an example of a product of one-parameter groups. Bearing in mind the above situation, we describe the definitions of the Lie group and Lie algebra.
Definition 20.2 [9] Let G be a subgroup of GL(n, ℂ). Suppose that A_k (k = 1, 2, \cdots) \in G and that \lim_{k\to\infty} A_k = A\ [\in GL(n, ℂ)]. If A \in G, then G is called a linear Lie group of dimension n.
We usually call a linear Lie group simply a Lie group. Note that GL(n, ℂ) has appeared in Sect. 16.1. The definition is again likely to bewilder chemists, in that it may read as though it were written in the air. Nevertheless, the next example should dispel any needless anxiety.
Example 20.3 Let G be a group whose elements A_k (with \det A_k = 1) are given by

A_k = \begin{pmatrix} \cos\theta_k & -\sin\theta_k \\ \sin\theta_k & \cos\theta_k \end{pmatrix} \quad (\theta_k: \text{real number}). \quad (20.244)

If \theta_k \to \theta as k \to \infty, then

\lim_{k\to\infty} A_k = \lim_{k\to\infty}\begin{pmatrix} \cos\theta_k & -\sin\theta_k \\ \sin\theta_k & \cos\theta_k \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \equiv A. \quad (20.245)

Since A again belongs to G, the group G [= SO(2)] is a linear Lie group.
Definition 20.3 [9] Let G be a linear Lie group. If for any real number t we have

\exp(tX) \in G, \quad (20.246)

then all such X are said to form the Lie algebra of the corresponding Lie group G. We denote the Lie algebra of the Lie group G by \mathfrak{g}.
The relation (20.246) includes, as the trivial case, the case where X is a (1, 1) matrix (i.e., merely a complex number). Most importantly, (20.246) is directly connected to (20.243), and Definition 20.3 shows the direct relevance between the Lie group and the Lie algebra. The two theories of Lie groups and Lie algebras underlie continuous groups.
The Lie algebra has the following properties:

(i) If X \in \mathfrak{g}, then aX \in \mathfrak{g} as well, where a is any real number.
(ii) If X, Y \in \mathfrak{g}, then X + Y \in \mathfrak{g} as well.
(iii) If X, Y \in \mathfrak{g}, then [X, Y] \equiv XY - YX \in \mathfrak{g} as well.
In fact, \exp[t(aX)] = \exp[(ta)X] and ta is a real number, so that we have (i). Regarding (ii), we should examine the individual properties of X. As already shown in Property (7)' of Chap. 15 (Sect. 15.2), e.g., a unitary matrix \exp(tX) is associated with an anti-Hermitian matrix X; we are interested in this particular case. In the case of SU(2), X is a traceless anti-Hermitian matrix, and as to SO(3), X is a real skew-symmetric matrix. In the former case, for example, traceless anti-Hermitian matrices X and Y can be expressed as

X = \begin{pmatrix} ic & a + ib \\ -a + ib & -ic \end{pmatrix}, \quad Y = \begin{pmatrix} ih & f + ig \\ -f + ig & -ih \end{pmatrix}, \quad (20.247)

where a, b, c, f, g, and h are real; then X + Y is again a traceless anti-Hermitian matrix, which shows (ii) for this case.
Regarding (iii), expanding the exponential functions and ignoring the terms having s^2, t^2, st^2, s^2t, and s^2t^2 as a coefficient, we have

\exp(sX)\exp(tY)\exp(-sX)\exp(-tY) \approx E + st[X, Y] \approx \exp(st[X, Y]). \quad (20.251)
Thus, \mathfrak{g} constitutes an inner product (vector) space.
Dividing both sides of (20.254) by a real number \Delta and taking the limit \Delta \to 0, we have

\lim_{\Delta\to 0}\frac{1}{\Delta}\left[\langle f(t+\Delta)|g(t+\Delta)\rangle - \langle f(t)|g(t)\rangle\right]
= \lim_{\Delta\to 0}\left[\left\langle \frac{f(t+\Delta)-f(t)}{\Delta}\,\Big|\,g(t+\Delta)\right\rangle + \left\langle f(t)\,\Big|\,\frac{g(t+\Delta)-g(t)}{\Delta}\right\rangle\right], \quad (20.255)
where we used the calculation rules for inner products of (13.3) and (13.20). Then,
we get (20.253). This completes the proof. ∎
Now, let us apply (20.253) to the unitary operator A(t) we dealt with in the previous section. Replacing f(t) and g(t) in (20.253) with A(t)^{\dagger}\psi and A(t)\chi, respectively, we have

\frac{d}{dt}\langle A(t)^{\dagger}\psi \,|\, A(t)\chi\rangle = \left\langle \frac{dA(t)^{\dagger}}{dt}\psi \,\Big|\, A(t)\chi\right\rangle + \left\langle A(t)^{\dagger}\psi \,\Big|\, \frac{dA(t)}{dt}\chi\right\rangle, \quad (20.256)
where ψ and χ are arbitrarily chosen vectors that do not depend on t. Then, we have
\text{LHS of } (20.256) = \frac{d}{dt}\langle\psi|A(t)^{\dagger}A(t)|\chi\rangle = \frac{d}{dt}\langle\psi|E|\chi\rangle = \frac{d}{dt}\langle\psi|\chi\rangle = 0, \quad (20.257)
where with the second equality we used the unitarity of A(t). Taking the limit t \to 0 in (20.256), we have
\lim_{t\to 0}[\text{RHS of } (20.256)] = \left\langle \frac{dA(0)^{\dagger}}{dt}\psi \,\Big|\, A(0)\chi\right\rangle + \left\langle A(0)^{\dagger}\psi \,\Big|\, \frac{dA(0)}{dt}\chi\right\rangle,
where we assumed that the operations of taking the adjoint and the limit t \to 0 are commutable. Then, we get
\lim_{t\to 0}[\text{RHS of } (20.256)] = \langle [A'(0)]^{\dagger}\psi \,|\, A(0)\chi\rangle + \left\langle A(0)^{\dagger}\psi \,\Big|\, \frac{dA(0)}{dt}\chi\right\rangle
= \langle X^{\dagger}\psi|\chi\rangle + \langle\psi|X\chi\rangle = \langle\psi|(X^{\dagger} + X)\chi\rangle, \quad (20.258)
where we used X \equiv A'(0) of (20.241) and A(0) = A(0)^{\dagger} = E. From (20.257) and (20.258), we have
0 = \langle\psi|(X^{\dagger} + X)\chi\rangle.

Since \psi and \chi are arbitrary, we must have X^{\dagger} = -X; that is, X is anti-Hermitian. Conversely, suppose that X is anti-Hermitian. Then we have

[\exp(tX)]^{\dagger}\exp(tX) = \exp(tX^{\dagger})\exp(tX) = \exp(-tX)\exp(tX) = E, \quad (20.261)
where we used (15.32) and Theorem 15.2 as well as the assumption that X is anti-Hermitian. Note that \exp(tX) and \exp(-tX) are commutative; see (15.29) of Sect. 15.2. Equation (20.261) implies that \exp(tX) is unitary, i.e., \exp(tX) \in U(n). The notation U(n) means a unitary group. Hence, X \in \mathfrak{u}(n), where \mathfrak{u}(n) means the Lie algebra corresponding to U(n).
Next, let us seek the Lie algebra that corresponds to the special unitary group SU(n). By Definition 20.3, since U(n) \supset SU(n), it is obvious that \mathfrak{u}(n) \supset \mathfrak{su}(n), where \mathfrak{su}(n) means the Lie algebra corresponding to SU(n). It suffices to find the condition under which \det[\exp(tX)] = 1 holds with any real number t and the elements X \in \mathfrak{u}(n). From (15.41) [9], we have

\det[\exp(tX)] = \exp[t\,\mathrm{Tr}(X)],

where Tr stands for trace (see Chap. 12). This is equivalent to the condition that

t\,\mathrm{Tr}(X) = 2\pi i m \quad (m: \text{integer})

holds with any real t. For this, we must have \mathrm{Tr}(X) = 0 with m = 0. This implies that \mathfrak{su}(n) comprises the anti-Hermitian matrices with trace zero (i.e., traceless). Consequently, the Lie algebra \mathfrak{su}(n) is certainly a subset (or subspace) of \mathfrak{u}(n) that consists of the traceless anti-Hermitian matrices.
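As a numerical illustration of this characterization (our sketch, with NumPy assumed and the matrix exponential computed from its power series), one can generate a random traceless anti-Hermitian X and confirm that exp(tX) is unitary with unit determinant for several real t:

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
X = (M - M.conj().T) / 2                 # anti-Hermitian: X^dagger = -X
X = X - (np.trace(X) / 3) * np.eye(3)    # remove the trace, so X lies in su(3)

def expm(B, terms=60):
    """Matrix exponential by its power series."""
    out = term = np.eye(B.shape[0], dtype=complex)
    for n in range(1, terms):
        term = term @ B / n
        out = out + term
    return out

for t in (0.5, 1.7, -2.3):
    U = expm(t * X)
    assert np.allclose(U.conj().T @ U, np.eye(3))  # unitary: exp(tX) in U(3)
    assert np.isclose(np.linalg.det(U), 1)         # det = exp[t Tr(X)] = 1: exp(tX) in SU(3)
print("su(3) exponentials land in SU(3)")
```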
In relation to (20.256), let us think of a real orthogonal matrix A(t). If A(t) is real and orthogonal, (20.256) can be rewritten as

\frac{d}{dt}\langle A(t)^{T}\psi \,|\, A(t)\chi\rangle = \left\langle \frac{dA(t)^{T}}{dt}\psi \,\Big|\, A(t)\chi\right\rangle + \left\langle A(t)^{T}\psi \,\Big|\, \frac{dA(t)}{dt}\chi\right\rangle. \quad (20.264)

In this case, proceeding as before, we obtain X^{T} = -X; i.e., X is a skew-symmetric matrix. All its diagonal elements are zero and, hence, a skew-symmetric matrix is traceless. Other typical Lie groups are the orthogonal group O(n) and the special orthogonal group SO(n); the corresponding Lie algebras are denoted by \mathfrak{o}(n) and \mathfrak{so}(n), respectively. Note that both \mathfrak{o}(n) and \mathfrak{so}(n) consist of real skew-symmetric matrices.
Conversely, let us consider what type of operator \exp(tX) would be if X is a skew-symmetric matrix. Here we assume that X is an (n, n) real matrix. With an arbitrary real number t we have

[\exp(tX)][\exp(tX)]^{T} = [\exp(tX)][\exp(tX^{T})] = [\exp(tX)][\exp(-tX)] = \exp(tX - tX) = \exp 0 = E, \quad (20.266)

meaning that \exp(tX) is a real orthogonal matrix. Summarizing the above, we have the following theorem.
̬
Theorem 20.4 The n-th order Lie algebra u ðnÞ corresponding to the Lie group U
(n) (i.e., unitary group) consists of all the anti-Hermitian (n, n) matrices. The n-th
̬ ̬
order Lie algebra su ðnÞ corresponding to the Lie group SU(n) (i.e., special unitary
group) comprises all the anti-Hermitian (n, n) matrices with the trace zero. The n-th
̬ ̬ ̬
order Lie algebras o ðnÞ and so ðnÞ corresponding to the Lie groups O(n) and SO(n),
respectively, are all the real skew-symmetric (n, n) matrices.
Notice that if A and B are anti-Hermitian matrices, so are A + B and cA (c: real number). This is true of skew-symmetric matrices as well. Hence, \mathfrak{u}(n), \mathfrak{su}(n), \mathfrak{o}(n), and \mathfrak{so}(n) each form a linear vector space.
In Sect. 20.3.1 we introduced the commutator. Commutators are ubiquitous in quantum mechanics, especially as commutation relations (see Part I). Their major properties are as follows:

(i) [X + Y, Z] = [X, Z] + [Y, Z],
(ii) [aX, Y] = a[X, Y] \quad (a \in \mathbb{R}),
(iii) [X, Y] = -[Y, X],
(iv) [X, [Y, Z]] + [Y, [Z, X]] + [Z, [X, Y]] = 0. \quad (20.267)

The last equation of (20.267) is well known as Jacobi's identity. Readers are encouraged to check it.
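The suggested check is easy to automate. The following sketch is ours (NumPy assumed); it verifies all four properties of (20.267) for randomly drawn complex matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
comm = lambda A, B: A @ B - B @ A   # the commutator [A, B]

# Three random complex matrices; the identities (20.267) hold for any square matrices.
X, Y, Z = (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4)) for _ in range(3))

a = 2.5
assert np.allclose(comm(X + Y, Z), comm(X, Z) + comm(Y, Z))   # (i) linearity
assert np.allclose(comm(a * X, Y), a * comm(X, Y))            # (ii) homogeneity
assert np.allclose(comm(X, Y), -comm(Y, X))                   # (iii) antisymmetry
jacobi = comm(X, comm(Y, Z)) + comm(Y, comm(Z, X)) + comm(Z, comm(X, Y))
assert np.allclose(jacobi, 0)                                 # (iv) Jacobi's identity
print("all commutator identities verified")
```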
Since the Lie algebra forms a linear vector space of a finite dimension, we denote its basis vectors by X_1, X_2, \cdots, X_d. Then, we have

\mathfrak{g}(d) = \mathrm{Span}\{X_1, X_2, \cdots, X_d\}, \quad (20.268)

where \mathfrak{g}(d) stands for a Lie algebra of dimension d. As their commutators belong to \mathfrak{g}(d) as well, with an arbitrary pair of X_i and X_j (i, j = 1, 2, \cdots, d) we have

[X_i, X_j] = \sum_{k=1}^{d} f_{ijk} X_k, \quad (20.269)

where the set of real coefficients f_{ijk} are said to be structure constants, which define the structure of the Lie algebra.
Example 20.4 In (20.42) we write \zeta_1 \equiv \zeta_x, \zeta_2 \equiv \zeta_y, and \zeta_3 \equiv \zeta_z. Rewriting it explicitly, we have

\zeta_1 = \frac{1}{2}\begin{pmatrix} 0 & -i \\ -i & 0 \end{pmatrix}, \quad \zeta_2 = \frac{1}{2}\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}, \quad \zeta_3 = \frac{1}{2}\begin{pmatrix} -i & 0 \\ 0 & i \end{pmatrix}. \quad (20.270)
Then, we get

[\zeta_i, \zeta_j] = \sum_{k=1}^{3} \varepsilon_{ijk}\,\zeta_k, \quad (20.271)

where \varepsilon_{ijk} is the Levi-Civita symbol defined in (20.272). Notice that in (20.272) if (i, j, k) represents an even permutation of (1, 2, 3), \varepsilon_{ijk} = 1, but if (i, j, k) is an odd permutation of (1, 2, 3), \varepsilon_{ijk} = -1. Otherwise (e.g., for i = j, j = k, or k = i), \varepsilon_{ijk} = 0. The relation of (20.271) is essentially the same as (3.30) and (3.69) of Chap. 3. We have

\mathfrak{su}(2) = \mathrm{Span}\{\zeta_1, \zeta_2, \zeta_3\}. \quad (20.273)
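The commutation relation (20.271) can be confirmed directly. In the sketch below (ours; NumPy assumed), the matrices of (20.270) are entered with the minus signs written out, and the Levi-Civita symbol is computed from the product formula \varepsilon_{ijk} = (i-j)(j-k)(k-i)/2, valid for indices in \{1, 2, 3\}:

```python
import numpy as np

# su(2) generators of (20.270)
z1 = np.array([[0, -1j], [-1j, 0]]) / 2
z2 = np.array([[0, -1], [1, 0]], dtype=complex) / 2
z3 = np.array([[-1j, 0], [0, 1j]]) / 2
zeta = [z1, z2, z3]

def eps(i, j, k):
    """Levi-Civita symbol of (20.272) for i, j, k in {1, 2, 3}."""
    return ((i - j) * (j - k) * (k - i)) // 2

# Structure constants (20.271): [zeta_i, zeta_j] = sum_k eps_ijk zeta_k
for i in range(3):
    for j in range(3):
        lhs = zeta[i] @ zeta[j] - zeta[j] @ zeta[i]
        rhs = sum(eps(i + 1, j + 1, k + 1) * zeta[k] for k in range(3))
        assert np.allclose(lhs, rhs)
print("su(2) structure constants verified")
```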
Example 20.5 In (20.26) we write A_1 \equiv A_x, A_2 \equiv A_y, and A_3 \equiv A_z. Rewriting it, we have

A_1 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix}, \quad A_2 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ -1 & 0 & 0 \end{pmatrix}, \quad A_3 = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}. \quad (20.274)

Again, we have

[A_i, A_j] = \sum_{k=1}^{3} \varepsilon_{ijk}\,A_k. \quad (20.275)
Defining

\tilde{J}_x \equiv -iJ_x/\hbar, \quad \tilde{J}_y \equiv -iJ_y/\hbar, \quad \tilde{J}_z \equiv -iJ_z/\hbar,

we get

[\tilde{J}_l, \tilde{J}_m] = \sum_{n=1}^{3} \varepsilon_{lmn}\,\tilde{J}_n. \quad (20.276)
We can readily check that the definition (20.252) satisfies the conditions (13.2), (13.3), and (13.4) of the inner product. In fact, we have

\langle B|A\rangle = \sum_{i,j} b_{ij}^{*} a_{ij} = \left(\sum_{i,j} a_{ij}^{*} b_{ij}\right)^{*} = \langle A|B\rangle^{*}. \quad (20.277)
Equation (13.3) is obvious from the calculation rules of a matrix. With (13.4), we get

\langle A|A\rangle = \mathrm{Tr}(A^{\dagger}A) = \sum_{i,j} |a_{ij}|^{2} \geq 0, \quad (20.278)
In fact, we have

\mathrm{Tr}(A^{\dagger}B) = \sum_{i,j} (A^{\dagger})_{ij}(B)_{ji} = \sum_{i,j} a_{ji}^{*} b_{ji} = \langle A|B\rangle. \quad (20.279)
In another way, we can readily compare (20.279) with (20.252) using, e.g., a
(2, 2) matrix and find that both equations give the same result. It is left for readers as
an exercise.
From Theorem 20.4, \mathfrak{u}(n) corresponding to the unitary group U(n) consists of all the anti-Hermitian (n, n) matrices. With these matrices, we have

\langle B|A\rangle = \mathrm{Tr}(B^{\dagger}A) = \sum_{i,j} b_{ji}^{*} a_{ji} = \sum_{i,j} b_{ij} a_{ij}^{*} = \sum_{i,j} a_{ij}^{*} b_{ij} = \langle A|B\rangle. \quad (20.280)
Combining (20.277) and (20.280), we get

\langle A|B\rangle^{*} = \langle A|B\rangle.

This means that \langle A|B\rangle is real. For example, using \mathfrak{u}(2), let us evaluate an inner product. We denote arbitrary X_1, X_2 \in \mathfrak{u}(2) by

X_1 = \begin{pmatrix} ia & c + id \\ -c + id & ib \end{pmatrix}, \quad X_2 = \begin{pmatrix} ip & r + is \\ -r + is & iq \end{pmatrix},

where a, b, c, d, p, q, r, and s are real.
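That \langle A|B\rangle = \mathrm{Tr}(A^{\dagger}B) is real on anti-Hermitian matrices is also easy to test numerically. The following sketch is ours (NumPy assumed); it draws random elements of \mathfrak{u}(2) and checks (20.278) and (20.280):

```python
import numpy as np

rng = np.random.default_rng(1)

def random_anti_hermitian(n):
    """Random anti-Hermitian matrix: X = (M - M^dagger)/2 satisfies X^dagger = -X."""
    M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (M - M.conj().T) / 2

X1, X2 = random_anti_hermitian(2), random_anti_hermitian(2)
ip = np.trace(X1.conj().T @ X2)               # <X1|X2> = Tr(X1^dagger X2), cf. (20.252)
assert abs(ip.imag) < 1e-12                   # the inner product is real, as (20.280) asserts
assert np.trace(X1.conj().T @ X1).real >= 0   # positive semi-definiteness, cf. (20.278)
print("inner product on u(2) is real")
```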
where with the second last equality we used (12.13), namely invariance of the trace
under the (unitary) similarity transformation. If g is represented by a real orthogonal
matrix, instead of (20.282) we have
\mathrm{Tr}[(gXg^{-1})^{\dagger}(gYg^{-1})] = \mathrm{Tr}[(gXg^{T})^{\dagger}(gYg^{T})] = \mathrm{Tr}[(g^{T})^{\dagger}X^{\dagger}g^{\dagger}gYg^{T}]
= \mathrm{Tr}[gX^{\dagger}g^{T}gYg^{T}] = \mathrm{Tr}[gX^{\dagger}Yg^{T}] = \langle X|Y\rangle, \quad (20.283)
This relation clearly shows that the real inner product of (20.284) remains unchanged by the operation

X \longmapsto gXg^{-1} \equiv \mathrm{Ad}[g](X).

In this connection, we also have

\exp\{t\,\mathrm{Ad}[g](X)\} = \exp(t\,gXg^{-1}) = \exp[g(tX)g^{-1}] = g\exp(tX)g^{-1}. \quad (20.288)
Fig. 20.5 Linear transformation Ad[g]: \mathfrak{g} \to \mathfrak{g} (i.e., endomorphism). Ad[g] is an endomorphism that operates on a vector space \mathfrak{g}
The mapping Ad[g] is a linear transformation \mathfrak{g} \to \mathfrak{g}, namely an endomorphism that operates on the vector space \mathfrak{g} (see Sect. 11.2). Figure 20.5 schematically depicts it. The notation of the adjoint representation Ad is somewhat confusing. That is, (20.288) shows that Ad[g] is a linear transformation \mathfrak{g} \to \mathfrak{g} (i.e., an endomorphism). On the other hand, Ad is thought of as a mapping G \to G', where G and G' mean two groups. Namely, we express it symbolically as [9]

g \longmapsto \mathrm{Ad}[g] \quad \text{or} \quad \mathrm{Ad}: G \to G'. \quad (20.289)
The G and G0 may or may not be identical. Examples can be seen in, e.g., (20.294)
and (20.302) later.
̬ ̬ ̬
As mentioned above, if g is either u ðnÞ or o ðnÞ , Ad[g] is an orthogonal
̬
transformation on g. The representation of Ad[g] is real accordingly.
An immediate implication of this is that the transformation of the basis vectors of
̬
g has a connection with the corresponding Lie groups such as U(n) and O(n).
Moreover, it is well known and studied that there is a close relationship between Ad[SU(2)] and SO(3). First, we wish to seek basis vectors of \mathfrak{su}(2). A general form of X \in \mathfrak{su}(2) is the following traceless anti-Hermitian matrix:

X = \begin{pmatrix} ic & a + ib \\ -a + ib & -ic \end{pmatrix},

where a, b, and c are real. We have encountered this type of matrix in (20.42) and (20.270). Using the inner product described in (20.252), we determine an orthonormal basis set of \mathfrak{su}(2) such that

e_1 = \frac{1}{\sqrt{2}}\begin{pmatrix} 0 & -i \\ -i & 0 \end{pmatrix}, \quad e_2 = \frac{1}{\sqrt{2}}\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}, \quad e_3 = \frac{1}{\sqrt{2}}\begin{pmatrix} -i & 0 \\ 0 & i \end{pmatrix}. \quad (20.290)
\mathrm{Ad}[g_\alpha](e_1) = g_\alpha e_1 g_\alpha^{-1} = \frac{1}{\sqrt{2}}\begin{pmatrix} e^{-i\alpha/2} & 0 \\ 0 & e^{i\alpha/2} \end{pmatrix}\begin{pmatrix} 0 & -i \\ -i & 0 \end{pmatrix}\begin{pmatrix} e^{i\alpha/2} & 0 \\ 0 & e^{-i\alpha/2} \end{pmatrix}
= \frac{1}{\sqrt{2}}\begin{pmatrix} 0 & -\sin\alpha - i\cos\alpha \\ \sin\alpha - i\cos\alpha & 0 \end{pmatrix} = e_1\cos\alpha + e_2\sin\alpha. \quad (20.291)
Similarly, we get

\mathrm{Ad}[g_\alpha](e_2) = \frac{1}{\sqrt{2}}\begin{pmatrix} 0 & -\cos\alpha + i\sin\alpha \\ \cos\alpha + i\sin\alpha & 0 \end{pmatrix} = -e_1\sin\alpha + e_2\cos\alpha, \quad (20.292)

\mathrm{Ad}[g_\alpha](e_3) = \frac{1}{\sqrt{2}}\begin{pmatrix} -i & 0 \\ 0 & i \end{pmatrix} = e_3. \quad (20.293)
Moreover, with

g_\gamma = \begin{pmatrix} e^{-i\gamma/2} & 0 \\ 0 & e^{i\gamma/2} \end{pmatrix}, \quad (20.297)

we have

\mathrm{Ad}[g_\gamma] = \begin{pmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{pmatrix}. \quad (20.298)
where we used the fact that g_\omega \longmapsto \mathrm{Ad}[g_\omega] (\omega = \alpha, \beta, \gamma) is a representation of SU(2) on \mathfrak{su}(2). To obtain an explicit form of the representation matrix of Ad[g] [g \in SU(2)], we use

g_{\alpha\beta\gamma} \equiv g_\alpha g_\beta g_\gamma = \begin{pmatrix} e^{-\frac{i}{2}(\alpha+\gamma)}\cos\dfrac{\beta}{2} & -e^{-\frac{i}{2}(\alpha-\gamma)}\sin\dfrac{\beta}{2} \\ e^{\frac{i}{2}(\alpha-\gamma)}\sin\dfrac{\beta}{2} & e^{\frac{i}{2}(\alpha+\gamma)}\cos\dfrac{\beta}{2} \end{pmatrix}. \quad (20.300)
Then, we have

\mathrm{Ad}[g_{\alpha\beta\gamma}](e_1) = g_{\alpha\beta\gamma}\,e_1\,g_{\alpha\beta\gamma}^{-1}
= \frac{1}{\sqrt{2}}\begin{pmatrix} i\sin\beta\cos\gamma & -i(\cos\alpha\cos\beta\cos\gamma - \sin\alpha\sin\gamma) - (\sin\alpha\cos\beta\cos\gamma + \cos\alpha\sin\gamma) \\ -i(\cos\alpha\cos\beta\cos\gamma - \sin\alpha\sin\gamma) + (\sin\alpha\cos\beta\cos\gamma + \cos\alpha\sin\gamma) & -i\sin\beta\cos\gamma \end{pmatrix}

= (e_1\ e_2\ e_3)\begin{pmatrix} \cos\alpha\cos\beta\cos\gamma - \sin\alpha\sin\gamma \\ \sin\alpha\cos\beta\cos\gamma + \cos\alpha\sin\gamma \\ -\sin\beta\cos\gamma \end{pmatrix}. \quad (20.301)
Note that the matrix of (20.300) is identical with (20.45). Similarly calculating Ad[g_{\alpha\beta\gamma}](e_2) and Ad[g_{\alpha\beta\gamma}](e_3), we obtain

(e_1\ e_2\ e_3)\,\mathrm{Ad}[g_{\alpha\beta\gamma}] = (e_1\ e_2\ e_3)\begin{pmatrix} \cos\alpha\cos\beta\cos\gamma - \sin\alpha\sin\gamma & -\cos\alpha\cos\beta\sin\gamma - \sin\alpha\cos\gamma & \cos\alpha\sin\beta \\ \sin\alpha\cos\beta\cos\gamma + \cos\alpha\sin\gamma & -\sin\alpha\cos\beta\sin\gamma + \cos\alpha\cos\gamma & \sin\alpha\sin\beta \\ -\sin\beta\cos\gamma & \sin\beta\sin\gamma & \cos\beta \end{pmatrix}. \quad (20.302)
The matrix on the RHS of (20.302) is the same as (17.101), which contains the Euler angles. The calculations are somewhat lengthy but straightforward. As mentioned just above and as can be seen, e.g., in (20.294) and (20.302), we may symbolically express the adjoint representation as [9]

\mathrm{Ad}: SU(2) \longrightarrow SO(3).

To characterize this homomorphism mapping from SU(2) to SO(3), we seek its kernel on the basis of Theorem 16.3 (Sect. 16.4). Let h be an element of the kernel, i.e., an element of SU(2) such that Ad[h] = E.
From the above relation, we must have h_{12} = h_{21} = 0. As h \in SU(2), \det h = 1. Hence, for h we choose h = \begin{pmatrix} a & 0 \\ 0 & a^{-1} \end{pmatrix}, where a \neq 0. Moreover, considering h e_2 = e_2 h, we get

\begin{pmatrix} 0 & -a \\ a^{-1} & 0 \end{pmatrix} = \begin{pmatrix} 0 & -a^{-1} \\ a & 0 \end{pmatrix}. \quad (20.306)

Hence, a = a^{-1}, i.e., a = \pm 1, so that h = \pm e, where e is the identity element of SU(2).
\mathrm{Ad}[-e] = E. \quad (20.309)

Suppose that \mathrm{Ad}[g_1] = \mathrm{Ad}[g_2]. Then, we have

\mathrm{Ad}[g_1 g_2^{-1}] = \mathrm{Ad}[g_1]\,\mathrm{Ad}[g_2^{-1}] = \mathrm{Ad}[g_2]\,\mathrm{Ad}[g_2^{-1}] = \mathrm{Ad}[g_2 g_2^{-1}] = \mathrm{Ad}[e] = E. \quad (20.310)

Hence, we get

g_1 g_2^{-1} = \pm e \quad \text{or} \quad g_1 = \pm g_2. \quad (20.311)

Conversely, we have

\mathrm{Ad}[-g] = \mathrm{Ad}[-e]\,\mathrm{Ad}[g] = \mathrm{Ad}[g], \quad (20.312)

where with the first equality we used Theorem 20.6 and with the second equality we used (20.309). The relations (20.311) and (20.312) imply that with any g \in SU(2) there are two elements \pm g that satisfy \mathrm{Ad}[-g] = \mathrm{Ad}[g]. These complete the proof. ∎
In the above proof of Theorem 20.7, (20.309) is a simple but important expression. In view of Theorem 16.4 (homomorphism theorem), we summarize the above discussion as follows: Let Ad be a homomorphism mapping such that

\mathrm{Ad}: SU(2) \longrightarrow SO(3) \quad (20.313)

with a kernel F = \{e, -e\}. The relation is schematically represented in Fig. 20.6. Notice the resemblance between Fig. 20.6 and Fig. 16.1a.
Correspondingly, from Theorem 16.4 of Sect. 16.4 we have an isomorphic mapping \widetilde{\mathrm{Ad}} such that

\widetilde{\mathrm{Ad}}: SU(2)/F \longrightarrow SO(3). \quad (20.314)
Fig. 20.6 Homomorphism mapping Ad: SU(2) \longrightarrow SO(3) with a kernel F = \{e, -e\}. E is the identity transformation of SO(3)
Comparing the general form of SU(2) of (20.300) and the general form of SO(3) of (20.302), we can readily show \mathrm{Ad}[-g_{\alpha\beta\gamma}] = \mathrm{Ad}[g_{\alpha\beta\gamma}]. That is, from (20.300) we have

g_{\alpha+2\pi,\,\beta+2\pi,\,\gamma+2\pi} = -g_{\alpha\beta\gamma}.

Both the two elements g_{\alpha\beta\gamma} and -g_{\alpha\beta\gamma} produce an identical orthogonal matrix given in the RHS of (20.302).
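The two-to-one correspondence can be verified numerically. The sketch below is ours (NumPy assumed, with the conventions of (20.290) and (20.300)); it computes the matrix of Ad[g_{\alpha\beta\gamma}] in the basis (e_1, e_2, e_3) via the inner product (20.252), compares it with the Euler-angle matrix of (20.302), and confirms that \pm g give one and the same element of SO(3):

```python
import numpy as np

def g(alpha, beta, gamma):
    """Euler-angle element of SU(2), cf. (20.300)."""
    c, s = np.cos(beta / 2), np.sin(beta / 2)
    return np.array([[np.exp(-0.5j * (alpha + gamma)) * c, -np.exp(-0.5j * (alpha - gamma)) * s],
                     [np.exp(0.5j * (alpha - gamma)) * s,  np.exp(0.5j * (alpha + gamma)) * c]])

# Orthonormal basis (20.290) of su(2)
e = [np.array([[0, -1j], [-1j, 0]]) / np.sqrt(2),
     np.array([[0, -1], [1, 0]]) / np.sqrt(2),
     np.array([[-1j, 0], [0, 1j]]) / np.sqrt(2)]

def Ad(u):
    """Matrix of Ad[u] in the basis e, via Ad[u](X) = u X u^{-1} and <A|B> = Tr(A^dagger B)."""
    cols = []
    for ek in e:
        img = u @ ek @ u.conj().T   # u^{-1} = u^dagger for u in SU(2)
        cols.append([np.trace(ej.conj().T @ img).real for ej in e])
    return np.array(cols).T

a, b, c = 0.4, 1.1, -0.8
R = Ad(g(a, b, c))
# R coincides with the Euler rotation matrix of (20.302)
R_euler = np.array([
    [np.cos(a)*np.cos(b)*np.cos(c) - np.sin(a)*np.sin(c), -np.cos(a)*np.cos(b)*np.sin(c) - np.sin(a)*np.cos(c), np.cos(a)*np.sin(b)],
    [np.sin(a)*np.cos(b)*np.cos(c) + np.cos(a)*np.sin(c), -np.sin(a)*np.cos(b)*np.sin(c) + np.cos(a)*np.cos(c), np.sin(a)*np.sin(b)],
    [-np.sin(b)*np.cos(c), np.sin(b)*np.sin(c), np.cos(b)]])
assert np.allclose(R, R_euler)
# The kernel of Ad is {e, -e}: both +g and -g map to the same rotation
assert np.allclose(Ad(-g(a, b, c)), R)
assert np.allclose(R @ R.T, np.eye(3)) and np.isclose(np.linalg.det(R), 1)
print("Ad: SU(2) -> SO(3) verified")
```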
To associate the above results of abstract algebra with the tangible matrix forms obtained earlier, we rewrite (20.54) by applying the simple trigonometric formula

\sin\beta = 2\sin\frac{\beta}{2}\cos\frac{\beta}{2}.
From (20.315), we clearly see that (i) if j is zero or a positive integer, so are m and m'. Therefore, (20.315) is a periodic function with period 2\pi with respect to \alpha, \beta, and \gamma. But (ii) if j is a half-odd-integer, so are m and m'. Then, (20.315) is a periodic function with period 4\pi. Note that in the case of (ii) 2j = 2n + 1 (n = 0, 1, 2, \cdots). Therefore, in (20.315) we have

\left(\cos\frac{\beta}{2}\right)^{2j+2(m-m'-2\mu)} = \left(\cos\frac{\beta}{2}\right)^{2(n+m-m'-2\mu)}\cos\frac{\beta}{2}. \quad (20.316)
As studied in Sect. 20.2.3, in the former case (i) the spherical surface harmonics span the representation space of D^{(l)} (l = 0, 1, 2, \cdots). The simplest case is D^{(0)} = 1, a (1, 1) identity matrix, i.e., merely the number 1. The second simplest case was given by (20.74), which describes D^{(1)}. Meanwhile, the simplest case for (ii) is D^{(1/2)}, whose matrix representation was given by D^{(1/2)}_{\alpha,\beta,\gamma} in (20.45) or by g_{\alpha\beta\gamma} in (20.300). With \pm g_{\alpha\beta\gamma}, both \mathrm{Ad}[\pm g_{\alpha\beta\gamma}] produce the real orthogonal (3, 3) matrix expressed as (20.302). This representation matrix certainly belongs to SO(3).
20.5 Connectedness of Lie Groups
Then, f(t) is a continuous matrix function of all real numbers t. We have f(0) = a and f(1) = b and, hence, a and b are connected to each other within SO(2). That a and b are connected within A is denoted by a \sim b. The symbol \sim then satisfies the equivalence relation. That is, we have the following proposition.
Proposition 20.2 We have a, b, c \in A \subset GL(n, ℂ). Then, we have the following three relations: (i) a \sim a. (ii) If a \sim b, then b \sim a. (iii) If a \sim b and b \sim c, then a \sim c.
Proof (i) Let f(t) = a (0 \leq t \leq 1). Then, f(0) = f(1) = a so that we have a \sim a. (ii) Let f(t) be a continuous curve that connects a and b such that f(0) = a and f(1) = b (0 \leq t \leq 1), and so a \sim b. Define g(t) \equiv f(1 - t). Then, g(t) is also a continuous curve within A, with g(0) = b and g(1) = a. This implies b \sim a. (iii) Let f(t) and g(t) be two continuous curves that connect a with b and b with c, respectively. Meanwhile, we define h(t) as
h(t) = \begin{cases} f(2t) & \left(0 \leq t \leq \frac{1}{2}\right) \\ g(2t - 1) & \left(\frac{1}{2} \leq t \leq 1\right) \end{cases}

so that h(t) can be a third continuous curve; note that h(1/2) = f(1) = g(0). From the supposition we have h(0) = f(0) = a, h(1/2) = f(1) = g(0) = b, and h(1) = g(1) = c. This implies that if a \sim b and b \sim c, then we have a \sim c. ∎
Let us give another related definition.
Definition 20.6 Let A be a subset of GL(n, ℂ). Let a point a 2 A. Then, a collection
of points that are connected to a within A is called a connected component of a and
denoted by C(a).
From Proposition 20.2 (i), a \in C(a). Let C(a) and C(b) be two connected components. We then have two alternatives with C(a) and C(b): (I) C(a) \cap C(b) = \varnothing; (II) C(a) = C(b). Indeed, suppose c \in C(a) \cap C(b). Then, it follows that c \sim a and c \sim b. From Proposition 20.2 (ii) and (iii), we have a \sim b. Hence, we get C(a) = C(b). Thus, the
subset A is a direct sum of several connected components such that

A = C(a_1) \cup C(a_2) \cup \cdots, \quad C(a_i) \cap C(a_j) = \varnothing \ (i \neq j). \quad (20.317)

Theorem 20.8 [9] Let A be a connected subset of GL(n, ℂ) and let f be a real continuous function defined on A with a, b \in A, f(a) = p, and f(b) = q (p, q: real), p < q. Then, f(x) takes all real values on the interval [p, q]; in other words, we have f(A) \supset [p, q]. This is well known as the intermediate value theorem.
Example 20.8 [9] Suppose that A is a connected subset of GL(n, ℝ). Then, with any pair of elements a, b \in A \subset GL(n, ℝ), we must have a continuous function g(t) (0 \leq t \leq 1) such that g(0) = a and g(1) = b. Meanwhile, \det g(t) is a real continuous function of t.

Now, suppose that \det a = p < 0 and \det b = q > 0 with p < q. Then, from Theorem 20.8 we would have \det(A) \supset [p, q]. Consequently, we must have some x \in A such that \det(x) = 0. But this contradicts A \subset GL(n, ℝ), because we must have \det(x) \neq 0 for every x \in A. Thus, within a connected set the sign of the determinant must be constant, because the determinant is a real continuous function.
In relation to the discussion of the connectedness, we have the following general
theorem.
Theorem 20.9 [9] Let G0 be a connected component within a linear Lie group
G that contains the identity element e. Then, G0 is an invariant subgroup of G. A
connected component C(g) containing g 2 G is identical with gG0 ¼ G0g.
Proof Let a, b \in G_0. Since G_0 = C(e), from the supposition there are continuous curves f(t) and g(t) within G_0 that connect a with e and b with e, respectively, such that f(0) = a, f(1) = e and g(0) = b, g(1) = e. Meanwhile, let h(t) \equiv f(t)[g(t)]^{-1}. Then, h(t) is another continuous curve. We have h(0) = f(0)[g(0)]^{-1} = ab^{-1} and h(1) = f(1)[g(1)]^{-1} = ee^{-1} = e. That is, ab^{-1} and e are connected, being indicative of ab^{-1} \in G_0. This implies that G_0 is a subgroup of G.
Next, with any g \in G and any a \in G_0, we have f(t) in the same sense as above. Defining k(t) \equiv g f(t) g^{-1}, k(t) is a continuous curve within G with k(0) = gag^{-1} and k(1) = geg^{-1} = e. Hence, we have gag^{-1} \in G_0. Since a is an arbitrary point of G_0, we have gG_0g^{-1} \subset G_0 accordingly. Similarly, g is an arbitrary point of G, and so replacing g with g^{-1} in the above, we get g^{-1}G_0g \subset G_0. Operating with g and g^{-1} from the left and right, respectively, we obtain G_0 \subset gG_0g^{-1}. Combining this with the above relation gG_0g^{-1} \subset G_0, we obtain gG_0g^{-1} = G_0, or gG_0 = G_0g. This means that G_0 is an invariant subgroup of G; see Sect. 16.3.
Taking any a \in G_0 and f(t) in the same sense as above again, l(t) \equiv g f(t) is a continuous curve within G with l(0) = ga and l(1) = ge = g. Then, ga and g are connected within G; that is, ga \in C(g). Since a is an arbitrary point of G_0, gG_0 \subset C(g). Meanwhile, choosing any p \in C(g), we set a continuous curve m(t) (0 \leq t \leq 1) that connects p with g within G. This implies that m(0) = p and m(1) = g. Also setting a continuous curve n(t) \equiv g^{-1}m(t), we have n(0) = g^{-1}m(0) = g^{-1}p and n(1) = g^{-1}m(1) = g^{-1}g = e. This means that g^{-1}p \in G_0. Multiplying both sides by g, we get p \in gG_0. Since p is arbitrarily chosen from C(g), we get C(g) \subset gG_0. Combining this with the above relation gG_0 \subset C(g), we obtain C(g) = gG_0 = G_0g. These complete the proof. ∎
In relation to the above proof, we add the following statement: in Sect. 16.2, as the necessary and sufficient condition for a subset H to be a subgroup, we described

h_i \in H,\ h_j \in H \implies h_i \diamond h_j^{-1} \in H.
In Chap. 17 we dealt with finite groups related to O(3) and SO(3), i.e., their subgroups. In this section, we examine several important properties in terms of Lie groups.

The groups O(3) and SO(3) are characterized by their determinants: \det O(3) = \pm 1 and \det SO(3) = 1. Example 20.8 implies that in O(3) there should be two different connected components according to \det O(3) = \pm 1. Also, Theorem 20.9 tells us that the connected component C(E) is an invariant subgroup G_0 of O(3), where E is the identity element of O(3). Obviously, G_0 is SO(3). In turn, the other connected component is SO(3)^c, where SO(3)^c denotes the complementary set of SO(3) in O(3); with regard to the notation, see Sect. 6.1. Remembering (20.317), we see that O(3) is decomposed into these two connected components.
Note that -E is commutative with any Q (or Q^{-1}). Also, notice that from (20.323) R'_\omega and -R_\omega are again connected via a unitary similarity transformation. From (20.322), we have

-R_\omega = \begin{pmatrix} -\cos\omega & \sin\omega & 0 \\ -\sin\omega & -\cos\omega & 0 \\ 0 & 0 & -1 \end{pmatrix} = \begin{pmatrix} \cos(\omega+\pi) & -\sin(\omega+\pi) & 0 \\ \sin(\omega+\pi) & \cos(\omega+\pi) & 0 \\ 0 & 0 & -1 \end{pmatrix}, \quad (20.324)

where the first component of the LHS belongs to O (octahedral rotation group) and the RHS belongs to T_d (tetrahedral group), which gives a mirror symmetry. Combining (20.324) and (20.325), we get the following schematic representation:
where the last side represents the coset decomposition (Sect. 16.2). This is essentially the same as (20.321). In terms of the isomorphism, we have the direct-product expression (20.328), where E is an (n, n) identity matrix. Note that since -E is commutative with any element of O(n), the direct product of (20.328) is allowed.
A(0) = E. \quad (20.238)
Under this condition, it is important to examine how A(t) evolves with t, starting
from E at t ¼ 0. In this connection we have the following theorems that give good
grounds to further discussion of Sect. 20.2. We describe them without proof.
Interested readers are encouraged to look up literature [9].
Theorem 20.10 Let G be a linear Lie group. Let G_0 be the connected component of the identity element e \in G. Then, G_0 is another linear Lie group, and the Lie algebras of G and G_0 are identical.
Theorem 20.10 shows the significance of the connected component of the identity. From this theorem, the Lie algebras of, e.g., \mathfrak{o}(n) and \mathfrak{so}(n) are identical; namely, both of the Lie algebras are given by the real skew-symmetric matrices. This is inherently related to the properties of Lie algebras. The zero matrix (i.e., the zero vector) as an element of the Lie algebra corresponds to the identity element of the Lie group. This is obvious from the relation

e = \exp 0,

where e is the identity element of the Lie group and 0 represents the zero matrix.
Basically, Lie algebras are suited for describing the local properties associated with infinitesimal transformations around the identity element e. More specifically, the exponential functions of the real skew-symmetric matrices cannot produce -E. Considering that the connected component of O(3) that contains the identity element is SO(3), it is intuitively understandable that \mathfrak{o}(n) and \mathfrak{so}(n) are the same.
Another interesting point is that SO(3) is at once an open set and a closed set (i.e., a clopen set; see Sect. 6.1). This is relevant to Theorem 6.3, which says that a necessary and sufficient condition for a subset S of a topological space to be both open and closed at once is S^b = \varnothing. Since the determinant of any element of O(3) is either +1 or -1, it is natural that SO(3) has no boundary; if there were a boundary, its determinant would be zero, in contradiction to SO(3) \subset GL(3, ℝ); see Example 20.8.
Theorem 20.11 Let G be a connected linear Lie group. (The Lie group G may be G_0 in the sense of Theorem 20.10.) Then, any g \in G can be described by

g = \exp(X_1)\exp(X_2)\cdots\exp(X_k), \quad X_1, \cdots, X_k \in \mathfrak{g}. \quad (20.329)

Next, let I = [0, 1] and let f be a continuous mapping such that

f: I \longrightarrow G \quad \text{or} \quad f(I) \subset G. \quad (20.330)

Then, the function f is said to be a path, in which f(0) and f(1) are an initial point and an end point, respectively. If f(0) = f(1) = x_0, f is called a loop (i.e., a closed path) at x_0 [14]. If f(t) \equiv x_0 (0 \leq t \leq 1), f is said to be a constant loop.
Definition 20.9 Let f and g be two paths. If f and g can continuously be deformed from one to the other, they are said to be homotopic.

To be more specific with Definition 20.9, let us define a function h(s, t) that is continuous with respect to s and t in the region I \times I = \{(s, t);\ 0 \leq s \leq 1,\ 0 \leq t \leq 1\}. If, furthermore, h(0, t) = f(t) and h(1, t) = g(t) hold, f and g are homotopic [9].
Definition 20.10 Let G be a connected Lie group. If all the loops at an arbitrarily chosen point x \in G are homotopic to a constant loop, G is said to be simply connected.

This is a somewhat abstract concept. Rephrasing the statement in a word, for G to be simply connected means that any loop at any x \in G can continuously be contracted to that point x. The next example helps us understand the meaning of being simply connected.
Example 20.9 Let S^2 be the spherical surface in ℝ^3 such that

S^2 = \{x \in \mathbb{R}^3;\ \langle x|x\rangle = 1\}. \quad (20.331)

Any x is equivalent by virtue of the spherical symmetry of S^2. Let us think of any loop at x. This loop can be contracted to that point x. Hence, S^2 is simply connected. By the same token, a spherical hypersurface (or hypersphere) given by S^n = \{x \in \mathbb{R}^{n+1};\ \langle x|x\rangle = 1\} is simply connected. Thus, from (20.36) SU(2) is simply connected. Note that the parameter space of SU(2) is S^3.
Example 20.10 In Sect. 20.4.2 we showed that SO(3) is a connected set. But it is not simply connected. To see this, we return to Sect. 17.4.2, which dealt with the three-dimensional rotation matrices. Rewriting \tilde{R}_3 in (17.101) as \tilde{R}_3(\alpha, \beta, \gamma), we have

\tilde{R}_3(\alpha, \beta, \gamma) = \begin{pmatrix} \cos\alpha\cos\beta\cos\gamma - \sin\alpha\sin\gamma & -\cos\alpha\cos\beta\sin\gamma - \sin\alpha\cos\gamma & \cos\alpha\sin\beta \\ \sin\alpha\cos\beta\cos\gamma + \cos\alpha\sin\gamma & -\sin\alpha\cos\beta\sin\gamma + \cos\alpha\cos\gamma & \sin\alpha\sin\beta \\ -\sin\beta\cos\gamma & \sin\beta\sin\gamma & \cos\beta \end{pmatrix}. \quad (20.332)
Note that in (20.332) we represent the rotation in the moving coordinate system. Putting \beta = 0 in (20.332), we get

\tilde{R}_3(\alpha, 0, \gamma) = \begin{pmatrix} \cos\alpha\cos\gamma - \sin\alpha\sin\gamma & -\cos\alpha\sin\gamma - \sin\alpha\cos\gamma & 0 \\ \sin\alpha\cos\gamma + \cos\alpha\sin\gamma & -\sin\alpha\sin\gamma + \cos\alpha\cos\gamma & 0 \\ 0 & 0 & 1 \end{pmatrix}
= \begin{pmatrix} \cos(\alpha+\gamma) & -\sin(\alpha+\gamma) & 0 \\ \sin(\alpha+\gamma) & \cos(\alpha+\gamma) & 0 \\ 0 & 0 & 1 \end{pmatrix}. \quad (20.333)
Fig. 20.7 Geometry of the rotation axis (A) accompanied by a rotation γ. (a) Geometry viewed in parallel to the equator. (b, c) Geometry viewed from above the north pole (N). If the azimuthal angle α is measured at the antipodal point \tilde{P}, α should be replaced with (b) α + π (in the case of 0 ≤ α ≤ π) or (c) α − π (in the case of π < α ≤ 2π)
R_3 = \begin{pmatrix}
\cos^2\alpha\sin^2\beta + (\cos^2\alpha\cos^2\beta + \sin^2\alpha)\cos\gamma & \cos\alpha\sin\alpha\sin^2\beta(1-\cos\gamma) - \cos\beta\sin\gamma & \cos\alpha\cos\beta\sin\beta(1-\cos\gamma) + \sin\alpha\sin\beta\sin\gamma \\
\cos\alpha\sin\alpha\sin^2\beta(1-\cos\gamma) + \cos\beta\sin\gamma & \sin^2\alpha\sin^2\beta + (\sin^2\alpha\cos^2\beta + \cos^2\alpha)\cos\gamma & \sin\alpha\cos\beta\sin\beta(1-\cos\gamma) - \cos\alpha\sin\beta\sin\gamma \\
\cos\alpha\cos\beta\sin\beta(1-\cos\gamma) - \sin\alpha\sin\beta\sin\gamma & \sin\alpha\cos\beta\sin\beta(1-\cos\gamma) + \cos\alpha\sin\beta\sin\gamma & \sin^2\beta\cos\gamma + \cos^2\beta
\end{pmatrix}. \quad (20.337)
Note that in (20.337) we represent the rotation in the fixed coordinate system. In this case, again we have arbitrariness in the choice of parameters. Figure 20.7 shows the geometry of the rotation axis (A) accompanied by a rotation γ; see Fig. 17.18 once again. In the parameter space specifying SO(3), a rotation is characterized by the azimuthal angle α, the zenithal angle β, and the magnitude of rotation γ. The domains of variability of the parameters (α, β, γ) are

0 \leq \alpha \leq 2\pi, \quad 0 \leq \beta \leq \pi, \quad 0 \leq \gamma \leq 2\pi.
Figure 20.7a shows the geometry viewed in parallel to the equator, which includes the zenithal angle β as well as a point P (i.e., the point of intersection between the rotation axis and the sphere) and its antipodal point \tilde{P}. If β is measured at \tilde{P}, β and γ should be replaced with π − β and 2π − γ, respectively. Meanwhile, Fig. 20.7b, c depict the azimuthal angle α viewed from above the north pole (N). If the azimuthal angle α is measured at the antipodal point \tilde{P}, α should be replaced with either α + π (in the case of 0 ≤ α ≤ π; see Fig. 20.7b) or α − π (in the case of π < α ≤ 2π; see Fig. 20.7c). Consequently, we have the correspondence

(\alpha, \beta, \gamma) \longleftrightarrow (\alpha \pm \pi,\ \pi - \beta,\ 2\pi - \gamma).
Thus, we confirm once again that different sets of parameters give the same rotation. In other words, two different points in the parameter space correspond to the same transformation (i.e., rotation). In fact, using these different sets of parameters, the matrix form of (20.337) is held unchanged. The confirmation is left for readers.
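The confirmation can also be done by machine. The sketch below is ours (NumPy assumed); it builds the rotation matrix of (20.337) from the Rodrigues formula with axis n = (cos α sin β, sin α sin β, cos β) and checks that the parameter sets (α, β, γ) and (α + π, π − β, 2π − γ) give an identical matrix:

```python
import numpy as np

def axis_angle(alpha, beta, gamma):
    """Rotation by gamma about the axis with azimuthal angle alpha and zenithal
    angle beta, cf. (20.337); built from the Rodrigues formula as a cross-check."""
    n = np.array([np.cos(alpha) * np.sin(beta),
                  np.sin(alpha) * np.sin(beta),
                  np.cos(beta)])
    K = np.array([[0, -n[2], n[1]], [n[2], 0, -n[0]], [-n[1], n[0], 0]])
    return np.cos(gamma) * np.eye(3) + (1 - np.cos(gamma)) * np.outer(n, n) + np.sin(gamma) * K

a, b, g = 0.9, 0.6, 2.0   # here 0 <= a <= pi, so the antipodal parameters are (a + pi, pi - b, 2pi - g)
R1 = axis_angle(a, b, g)
R2 = axis_angle(a + np.pi, np.pi - b, 2 * np.pi - g)
assert np.allclose(R1, R2)                     # two distinct parameter points, one rotation
assert np.allclose(R1 @ R1.T, np.eye(3))       # the result is orthogonal
assert np.isclose(np.linalg.det(R1), 1)        # and belongs to SO(3)
print("parameter correspondence verified")
```

This double covering of the parameter space is precisely what obstructs contracting certain loops in SO(3).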
Because of the aforementioned characteristic of the parameter space, a loop that joins two such equivalent parameter points cannot be contracted to a single point in SO(3). For this reason, SO(3) is not simply connected.
In contrast to Example 20.10, the mapping from the parameter space S3 to SU(2)
is bijective in the case of SU(2). That is, any two different points of S3 give different
transformations on SU(2) (i.e., injective). For any element of SU(2) it has a
corresponding point in S3 (i.e., surjective). This can be rephrased such that SU
(2) and S3 are isomorphic.
In this context, let us introduce an important concept. Let (T, τ) and (S, σ) be two topological spaces. Suppose that f: (T, τ) \longrightarrow (S, σ) is a bijective mapping. If, in this case, furthermore both f and f^{-1} are continuous mappings, (T, τ) and (S, σ) are said to be homeomorphic [14]. In our present case, SU(2) and S^3 are homeomorphic.
Apart from being simply connected or not, SU(2) and SO(3) resemble each other in many
respects. The resemblance already showed up in Chap. 3 when we considered the
(orbital) angular momentum and the generalized angular momentum. As shown in
(3.30) and (3.69), the commutation relations were virtually the same. This is obvious
when we compare (20.271) and (20.275). That is, in terms of Lie algebras their
structure constants are the same and, consequently, the structures of the corresponding Lie
groups are related. This became well understood when we considered the adjoint
representation of the Lie group G on its Lie algebra 𝔤. Of the groups whose Lie
algebras are characterized by the same structure constants, the simply connected
group is said to be the universal covering group. For example, among the Lie groups
O(3), SO(3), and SU(2), only SU(2) is simply connected and, hence, is the universal
covering group.
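The two-to-one nature of the covering of SO(3) by SU(2) can also be seen numerically. In the sketch below (the helper so3_from_su2 is our illustrative construction), a rotation matrix is extracted from U ∈ SU(2) through the adjoint action, R_kj = ½ tr(σ_k U σ_j U†); the antipodal elements U and −U then yield one and the same rotation.

```python
import numpy as np

# Pauli matrices sigma_1, sigma_2, sigma_3.
sig = [np.array([[0, 1], [1, 0]], dtype=complex),
       np.array([[0, -1j], [1j, 0]], dtype=complex),
       np.array([[1, 0], [0, -1]], dtype=complex)]

def so3_from_su2(U):
    """Adjoint action of U in SU(2): R[k, j] = (1/2) tr(sig_k U sig_j U^dagger)."""
    R = np.empty((3, 3))
    for k in range(3):
        for j in range(3):
            R[k, j] = 0.5 * np.trace(sig[k] @ U @ sig[j] @ U.conj().T).real
    return R

n = np.array([1.0, 2.0, 2.0]) / 3.0       # unit vector along the rotation axis
gamma = 1.2                               # rotation angle
U = (np.cos(gamma / 2) * np.eye(2)
     - 1j * np.sin(gamma / 2) * (n[0] * sig[0] + n[1] * sig[1] + n[2] * sig[2]))
R_plus, R_minus = so3_from_su2(U), so3_from_su2(-U)
print(np.allclose(R_plus, R_minus))       # U and -U cover the same rotation
```

Since the sign of U drops out of U σ_j U†, the mapping SU(2) ⟶ SO(3) identifies antipodal points of S3, which is precisely why SO(3) fails to be simply connected while its covering group SU(2) is.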
An understanding of the Lie algebra is indispensable for explicitly describing the
elements of the Lie group, especially the connected component of the identity
element e ∈ G. We are then able to understand the global characteristics of the
Lie group. In this chapter, we have focused on SU(2) and SO(3) as typical examples
of linear Lie groups.
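The passage from the Lie algebra to the connected component of the identity proceeds through the exponential mapping. In the minimal sketch below (the truncated-series routine expm_series is an illustrative stand-in for a library matrix exponential), an arbitrary skew-symmetric matrix, i.e., an element of the Lie algebra so(3), is exponentiated, and the result is checked to lie in SO(3):

```python
import numpy as np

def expm_series(A, terms=40):
    """Matrix exponential by a truncated Taylor series
    (adequate for small matrices of moderate norm)."""
    out = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k     # A^k / k!
        out = out + term
    return out

# An arbitrary skew-symmetric matrix: an element of the Lie algebra so(3).
A = np.array([[0.0, -0.3, 0.2],
              [0.3, 0.0, -0.5],
              [-0.2, 0.5, 0.0]])
R = expm_series(A)
print(np.allclose(R @ R.T, np.eye(3)),    # orthogonal
      np.isclose(np.linalg.det(R), 1.0))  # determinant 1, hence in SO(3)
```

That exp maps skew-symmetric matrices into SO(3) reflects the general fact that the exponential mapping carries the Lie algebra onto a neighborhood of the identity in the group.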
Finally, though formal, we describe a definition of a topological group.
Definition 20.11 [15] Let G be a set. If G satisfies the following conditions, G is
called a topological group.
(i) The set G is a group.
(ii) The set G is a T1-space.
(iii) Group operations (or calculations) of G are continuous. That is, let P and Q be
two mappings such that

P : G × G ⟶ G; P(x, y) = xy and Q : G ⟶ G; Q(x) = x⁻¹.

Then, both P and Q are continuous mappings.
References
1. Inui T, Tanabe Y, Onodera Y (1990) Group theory and its applications in physics. Springer,
Berlin
2. Inui T, Tanabe Y, Onodera Y (1980) Group theory and its applications in physics, expanded
edn. Shokabo, Tokyo. (in Japanese)
3. Takagi T (2010) Introduction to analysis, Standard edn. Iwanami, Tokyo. (in Japanese)
4. Arfken GB, Weber HJ, Harris FE (2013) Mathematical methods for physicists, 7th edn.
Academic Press, Waltham
5. Hamermesh M (1989) Group theory and its application to physical problems. Dover, New York
6. Rose ME (1995) Elementary theory of angular momentum. Dover, New York
7. Hassani S (2006) Mathematical physics. Springer, New York
8. Edmonds AR (1957) Angular momentum in quantum mechanics. Princeton University Press,
Princeton
9. Yamanouchi T, Sugiura M (1960) Introduction to continuous groups. Baifukan, Tokyo.
(in Japanese)
10. Dennery P, Krzywicki A (1996) Mathematics for physicists. Dover, New York
11. Satake I-O (1975) Linear algebra (Pure and applied mathematics). Marcel Dekker, New York
12. Satake I (1974) Linear algebra. Shokabo, Tokyo. (in Japanese)
13. Steeb W-H (2007) Continuous symmetries, Lie algebras, differential equations and computer
algebra, 2nd edn. World Scientific, Singapore
14. McCarty G (1988) Topology. Dover, New York
15. Murakami S (2004) Foundation of continuous groups, revived edn. Asakura Publishing, Tokyo.
(in Japanese)
775, 776 451
Transition probability, 127, 133, 138, 172, 346, Uniqueness
347, 741, 777 of decomposition, 486, 488, 512, 564
Translation symmetry, 637 of Green’s function, 401
Transmission of representation, 440, 450, 454, 632, 633
coefficient, 306–308 of solution, 18
of electromagnetic wave, 295–337 Unitarity, 683, 713, 714, 718, 740, 808, 816,
irradiance, 311 819, 823, 851, 869
of light, 297 Unitary
Transmittance, 311 diagonalization, 290, 556–564, 568, 573,
Transpose 667, 804
complex conjugate, 22, 537 group, 799, 870–872, 875
Transposed matrix, 22, 383, 471, 523, 537 matrix, 135, 290, 530, 533, 534, 541, 547,
Transverse electric (TE) mode, 304, 324, 326, 556–559, 561, 564–568, 573, 578, 590,
329 592, 601, 604, 667, 679–681, 687, 690,
Transverse electric (TE) wave, 295, 303–312, 699, 701, 714, 746, 763, 768, 774, 782,
314, 316, 320–331 804, 805, 808, 810–812, 814, 820, 821,
Transverse magnetic (TM) mode, 304, 368, 847, 850, 854, 863, 867, 875, 876
371, 372 operator, 547–580, 860, 869, 870
Transverse magnetic (TM) wave, 295, representation, 679–681, 683, 687, 695,
303–308, 310, 312–314, 316, 318, 714, 825, 830, 843
320–331 transformation, 135, 140, 541, 566, 740,
Transverse waves, 281, 282, 352 763, 764, 770, 771, 774, 812, 849, 851,
Trial function, 174, 175, 177 852, 854, 860, 863
Triangle inequality, 525 Unitary similarity transformation, 532, 554,
Triangle matrix 556, 557, 559, 560, 563, 578, 580, 600,
lower, 462, 512, 553, 606 609, 668, 689, 782, 801, 804, 805, 822,
upper, 462, 464, 476, 485, 496, 553, 606 863, 875, 889
Trigonometric formula, 101, 130, 309, 312, Unit polarization vector
313, 330, 842, 884 of electric field, 127, 284, 303, 350, 352,
Trigonometric functions, 103, 140, 241, 319, 755, 796
639, 815, 836 of magnetic field, 304, 352
Triple root, 474, 501, 503, 504, 759 Unit set, 196
Triply degenerate, 789, 795, 796 Unit vectors, 4, 129, 130, 273, 279, 280,
Trivial, 21, 26, 184, 192, 299, 381, 434, 437, 296–298, 303–305, 325, 327, 352, 364,
475, 480, 489, 555, 567, 570, 585, 607, 436, 540, 611, 803
622, 664, 667, 693, 741, 803, 819, 860, Universal covering group, 896
867 Universal set, 182–185
Trivial solution, 399, 401, 409, 426, 453, 460, Unknowns, 153, 284, 305, 596, 598, 600
522 Unperturbed
Trivial topology, 195 state, 153, 158
T1-space, 195–196, 896 system, 152, 153
Two-level atoms, 339, 345–349, 354, 355, 375 Upper right off-diagonal elements, 450, 462
Upper triangle matrix, 462, 464, 476, 485, 496,
553, 606
U
Ultraviolet catastrophe, 339, 345
Ultraviolet light, 363
Unbounded, 25, 29, 216 V
Uncertainty principle, 29, 51–55 Variable transformation, 46, 47, 78, 140, 160,
Uncountable, 182, 621 234, 278
Underlying set, 185 Variance, 51–55
Undetermined constants, 74, 75, 84 Variance operator, 51
Uniformly convergent, 216, 218–222 Variational method, 151, 172–179
916 Index