
Shu Hotta

Mathematical Physical Chemistry
Practical and Intuitive Methodology

Second Edition

Takatsuki, Osaka, Japan

ISBN 978-981-15-2224-6 ISBN 978-981-15-2225-3 (eBook)


https://doi.org/10.1007/978-981-15-2225-3

© Springer Nature Singapore Pte Ltd. 2020


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the
material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology
now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
To my wife Kazue
and
To the memory of Roro
Preface to the Second Edition

This book is the second edition of Mathematical Physical Chemistry. Mathematics is
a common language of natural science including physics, chemistry, and biology.
Although the words mathematical physics and physical chemistry (or chemical
physics) are commonly used, mathematical physical chemistry sounds rather uncom-
mon. Therefore, it might well be reworded as the mathematical physics for chemists.
The book title could have been, for instance, “The Mathematics of Physics and
Chemistry” accordingly, in tribute to the famous book that was written three-quarters
of a century ago by H. Margenau and G. M. Murphy. Yet, the word mathematical
physical chemistry is expected to be granted citizenship, considering that chemistry
and related interdisciplinary fields such as materials science and molecular science
are becoming increasingly mathematical.
The main concept and main theme remain unchanged, but this second edition
contains the theory of analytic functions and the theory of continuous groups. Both
theories are counted among the most elegant theories of mathematics. The
mathematics of these topics is of a somewhat advanced level and something like a
“sufficient condition” for chemists, whereas that of the first edition may be a
prerequisite (or a necessary condition) for them. Therefore, chemists (and maybe
physicists as well) can creatively use the two editions. In association with these
major additions to the second edition, the author has placed the mathematical
topics (the theory of analytic functions, Green’s functions, exponential functions of
matrices, and the theory of continuous groups) in the last chapter of the individual
parts (Part I through Part IV).
At the same time, the author has also made several specific revisions including the
introductory discussion on the perturbation method and variational method, both of
which can be effectively used for gaining approximate solutions of various quantum-
mechanical problems. As another topic, the author has presented the recent progress
on organic lasers. This topic is expected to help develop high-performance light-
emitting devices, one of the important fields of materials science. As in the case of
the first edition, readers benefit from going freely back and forth across the whole
range of topics covered in this book.


Once again, the author wishes to thank many students for valuable discussions
and Dr. Shin’ichi Koizumi at Springer for giving him an opportunity to write
this book.

Takatsuki, Japan Shu Hotta


October 2019
Preface to the First Edition

The contents of this book are based upon manuscripts prepared by the author for
two undergraduate courses of Kyoto Institute of Technology entitled “Polymer
Nanomaterials Engineering” and “Photonics Physical Chemistry” and for a master’s
course lecture of Kyoto Institute of Technology entitled “Solid-State Polymers
Engineering.”
This book is intended for graduate and undergraduate students, especially those
who major in chemistry and, at the same time, wish to study mathematical physics.
Readers are supposed to have a basic knowledge of analysis and linear algebra.
However, they are not supposed to be familiar with the theory of analytic functions
(i.e., complex analysis), even though it is desirable to have relevant knowledge
about it.
At the beginning, mathematical physics looks daunting to chemists, as used to be
the case with myself as a chemist. The book introduces the basic concepts of
mathematical physics to chemists. Unlike other books related to mathematical
physics, this book makes a reasonable selection of material so that students majoring
in chemistry can readily understand the contents. In particular, we
stress the importance of practical and intuitive methodology. We also expect engi-
neers and physicists to benefit from reading this book.
In Part I and Part II, the book describes quantum mechanics and electromagne-
tism. Relevance between the two is well considered. Although quantum mechanics
covers the broad field of modern physics, in Part I we focus on a harmonic oscillator
and a hydrogen (like) atom. This is because we can study and deal with many of
fundamental concepts of quantum mechanics within these restricted topics. More-
over, knowledge acquired from the study of the topics can readily be extended to
practical investigation of, e.g., electronic states and vibration (or vibronic) states of
molecular systems. We describe these topics by both analytic method (that uses
differential equations) and operator approach (using matrix calculations). We believe
that the basic concepts of quantum mechanics can be best understood by contrasting
the analytical and algebraic approaches. For this reason, we give matrix representa-
tions of physical quantities whenever possible. Examples include energy


eigenvalues of a quantum-mechanical harmonic oscillator and angular momenta of a
hydrogen-like atom. At the same time, these two physical systems supply us with a
good opportunity to study classical polynomials, e.g., Hermite polynomials, (asso-
ciated) Legendre polynomials, Laguerre polynomials, Gegenbauer polynomials, and
special functions, more generally. These topics constitute one of the important
branches of mathematical physics. One of the basic concepts of quantum mechanics
is that a physical quantity is represented by a Hermitian operator or matrix. In this
respect, the algebraic approach gives a good opportunity to get familiar with this
concept. We present tangible examples for this. We also emphasize the importance
of the notion of Hermiticity of a differential operator. We often encounter a unitary
operator or unitary transformation alongside the notion of Hermitian operators. We
show several examples of unitary operators in connection with transformation of
vectors and coordinates.
Part II describes Maxwell equations and their applications to various phenomena
of electromagnetic waves. These include their propagation, reflection, and transmis-
sion in dielectric media. We restrict ourselves to treating those phenomena in
dielectrics without charge. Yet, we cover a wide range of important topics. In
particular, when two (or more) dielectrics are in contact with each other at a plane
interface, reflection and transmission of light are characterized by various important
parameters such as reflection and transmission coefficients, Brewster angles, and
critical angles. We should have a proper understanding not only from the point of
view of basic study but also to make use of relevant knowledge in optical device
applications such as a waveguide. In contrast to a concept of electromagnetic waves,
light possesses a characteristic of light quanta. We present semiclassical and statis-
tical approaches to blackbody radiation occurring in a simplified system in relation
to Part I. The physical processes are well characterized by a notion of two-level
atoms. In this context, we outline the dipole radiation within the framework of the
classical theory. We briefly describe how the optical processes occurring in a
confined dielectric medium are related to a laser that is of great importance in
fundamental science and its applications. Many of the basic equations of physics are
described as second-order linear differential equations (SOLDEs). Different methods
were developed and proposed to seek their solutions. One of the most important
methods is that of Green’s functions. We present the introductory theory of Green’s
functions accordingly. In this connection, we rethink the Hermiticity of a differential
operator.
In Part III and Part IV, we describe algebraic structures of mathematical physics.
Their understanding is useful to studies of quantum mechanics and electromagne-
tism whose topics are presented in Part I and Part II. Part III deals with theories of
linear vector spaces. We focus on the discussion of vectors and their transformations
in finite-dimensional vector spaces. Generally, we consider the vector transforma-
tions among the vector spaces of different dimensions. In this book, however, we
restrict ourselves to the case of the transformation between the vector spaces of same
dimension, i.e., endomorphism of the space (V_n → V_n). This is not only because this
is most often the case with many of physical applications, but because the relevant
operator is represented by a square matrix. Canonical forms of square matrices hold

an important position in algebra. These include a triangular matrix and a diagonalizable
matrix as well as a nilpotent matrix and an idempotent matrix. The most general form
will be Jordan canonical form. We present its essential parts in detail taking a
tangible example. Next to the general discussion, we deal with an inner product
space. Once an inner product is defined between any couple of vectors, the vector
space is given a fruitful structure. An example is a norm (i.e., “length”) of a vector.
Also, we gain a clear relationship between Part III and Part I. We define various
operators or matrices that are important in physical applications. Examples include
normal operators (or matrices) such as Hermitian operators, projection operators, and
unitary operators. Once again, we emphasize the importance of Hermitian operators.
In particular, two commutable Hermitian matrices share simultaneous eigenvectors
(or eigenstates) and, in this respect, such two matrices occupy a special position in
quantum mechanics.
Finally, Part IV describes the essence of group theory and its chemical applica-
tions. Group theory has a broad range of applications in solid-state physics, solid-
state chemistry, molecular science, etc. Nonetheless, the knowledge of group theory
does not seem to have fully prevailed among chemists. We can discover an adequate
reason for this in a preface to the first edition of Chemical Applications of Group
Theory written by F. A. Cotton. It might well be natural that definition and statement
of abstract algebra, especially group theory, sound somewhat pretentious for chem-
ists, even though the definition of group is quite simple. Therefore, we present
various examples for readers to get used to notions of group theory. The notion of
mapping is important as in the case of the linear vector spaces. Aside from the
calculation being additive for a vector space and multiplicative for a group, the
fundamental rules of calculation are pretty much the same for the vector space and
the group. We describe the characteristics of symmetry groups in detail
partly because related knowledge is useful for molecular orbital (MO) calculations
that are presented in the last section of the book. Representation theory is probably
one of the most daunting notions for chemists. Practically, however, the representa-
tion is just homomorphism that corresponds to a linear transformation in a vector
space. In this context, the representation is merely denoted by a number or a matrix.
Basis functions of representation correspond to basis vectors in a vector space.
Grand orthogonality theorem (GOT) is a “nursery bed” of the representation theory.
Therefore, readers are encouraged to understand its essence apart from the rigorous
proof of the theorem. In conjunction with Part III, we present a variety of projection
operators. These are very useful to practical applications in, e.g., quantum mechanics
and molecular science. The final parts of the book are devoted to applications of
group theory to problems of physical chemistry, especially those of quantum
chemistry, more specifically molecular orbital calculations. We see how symmetry
consideration, particularly the use of projection operators, saves us a lot of labor.
Examples include aromatic hydrocarbons and methane.
The previous sections sum up the contents of this book. Readers may start with
any part and go freely back and forth. This is because contents of many parts are
interrelated. For example, we emphasize the importance of Hermiticity of differen-
tial operators and matrices. Also projection operators and nilpotent matrices appear

in many parts along with their tangible applications to individual topics. Hence,
readers are recommended to carefully examine and compare the related contents
throughout the book. We believe that readers, especially chemists, benefit from the
writing style of this book, since it is suited to chemists who are good at intuitive
understanding.
The author would like to thank many students for their valuable suggestions and
discussions at the lectures. The author also wishes to thank Dr. Shin’ichi Koizumi
at Springer for giving him an opportunity to write this book.

Kyoto, Japan Shu Hotta


October 2017
Contents

Part I Quantum Mechanics


1 Schrödinger Equation and Its Application . . . . . . . . . . . . . . . . . . . 3
1.1 Early-Stage Quantum Theory . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Schrödinger Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3 Simple Applications of Schrödinger Equation . . . . . . . . . . . . . . 14
1.4 Quantum-Mechanical Operators and Matrices . . . . . . . . . . . . . . 21
1.5 Commutator and Canonical Commutation Relation . . . . . . . . . . 27
Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2 Quantum-Mechanical Harmonic Oscillator . . . . . . . . . . . . . . . . . . . 31
2.1 Classical Harmonic Oscillator . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.2 Formulation Based on an Operator Method . . . . . . . . . . . . . . . . 33
2.3 Matrix Representation of Physical Quantities . . . . . . . . . . . . . . 41
2.4 Coordinate Representation of Schrödinger Equation . . . . . . . . . 44
2.5 Variance and Uncertainty Principle . . . . . . . . . . . . . . . . . . . . . 51
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3 Hydrogen-Like Atoms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.1 Introductory Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.2 Constitution of Hamiltonian . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.3 Separation of Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
3.4 Generalized Angular Momentum . . . . . . . . . . . . . . . . . . . . . . . 72
3.5 Orbital Angular Momentum: Operator Approach . . . . . . . . . . . . 77
3.6 Orbital Angular Momentum: Analytic Approach . . . . . . . . . . . . 91
3.6.1 Spherical Surface Harmonics and Associated
Legendre Differential Equation . . . . . . . . . . . . . . . . . . 92
3.6.2 Orthogonality of Associated Legendre Functions . . . . . 103
3.7 Radial Wave Functions of Hydrogen-Like Atoms . . . . . . . . . . . 107
3.7.1 Operator Approach to Radial Wave Functions . . . . . . . 107
3.7.2 Normalization of Radial Wave Functions . . . . . . . . . . . 112
3.7.3 Associated Laguerre Polynomials . . . . . . . . . . . . . . . . 116

3.8 Total Wave Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122


References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
4 Optical Transition and Selection Rules . . . . . . . . . . . . . . . . . . . . . . 125
4.1 Electric Dipole Transition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
4.2 One-Dimensional System . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
4.3 Three-Dimensional System . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
4.4 Selection Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
4.5 Angular Momentum of Radiation . . . . . . . . . . . . . . . . . . . . . . . 147
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
5 Approximation Methods of Quantum Mechanics . . . . . . . . . . . . . . 151
5.1 Perturbation Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
5.1.1 Quantum State and Energy Level Shift Caused by
Perturbation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
5.1.2 Several Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
5.2 Variational Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
6 Theory of Analytic Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
6.1 Set and Topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
6.1.1 Basic Notions and Notations . . . . . . . . . . . . . . . . . . . . 182
6.1.2 Topological Spaces and Their Building Blocks . . . . . . . 185
6.1.3 T1-Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
6.1.4 Complex Numbers and Complex Plane . . . . . . . . . . . . 197
6.2 Analytic Functions of a Complex Variable . . . . . . . . . . . . . . . . 199
6.3 Integration of Analytic Functions: Cauchy’s Integral Formula . . . . 207
6.4 Taylor’s Series and Laurent’s Series . . . . . . . . . . . . . . . . . . . . . 216
6.5 Zeros and Singular Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
6.6 Analytic Continuation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
6.7 Calculus of Residues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
6.8 Examples of Real Definite Integrals . . . . . . . . . . . . . . . . . . . . . 230
6.9 Multivalued Functions and Riemann Surfaces . . . . . . . . . . . . . . 248
6.9.1 Brief Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248
6.9.2 Examples of Multivalued Functions . . . . . . . . . . . . . . . 256
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265

Part II Electromagnetism
7 Maxwell’s Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
7.1 Maxwell’s Equations and Their Characteristics . . . . . . . . . . . . . 269
7.2 Equation of Wave Motion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 276
7.3 Polarized Characteristics of Electromagnetic Waves . . . . . . . . . 280
7.4 Superposition of Two Electromagnetic Waves . . . . . . . . . . . . . 285
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293

8 Reflection and Transmission of Electromagnetic Waves


in Dielectric Media . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
8.1 Electromagnetic Fields at an Interface . . . . . . . . . . . . . . . . . . . 295
8.2 Basic Concepts Underlying Phenomena . . . . . . . . . . . . . . . . . . 297
8.3 Transverse Electric (TE) Waves and Transverse
Magnetic (TM) Waves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
8.4 Energy Transport by Electromagnetic Waves . . . . . . . . . . . . . . 308
8.5 Brewster Angles and Critical Angles . . . . . . . . . . . . . . . . . . . . 312
8.6 Total Reflection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315
8.7 Waveguide Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319
8.7.1 TE and TM Waves in a Waveguide . . . . . . . . . . . . . . . 320
8.7.2 Total Internal Reflection and Evanescent Waves . . . . . . 327
8.8 Stationary Waves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337
9 Light Quanta: Radiation and Absorption . . . . . . . . . . . . . . . . . . . . 339
9.1 Blackbody Radiation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339
9.2 Planck’s Law of Radiation and Mode Density of
Electromagnetic Waves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 341
9.3 Two-Level Atoms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345
9.4 Dipole Radiation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349
9.5 Lasers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 354
9.5.1 Brief Outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 354
9.5.2 Organic Lasers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 359
9.6 Mechanical System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 375
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 378
10 Introductory Green’s Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . 379
10.1 Second-Order Linear Differential Equations (SOLDEs) . . . . . . . 379
10.2 First-Order Linear Differential Equations (FOLDEs) . . . . . . . . . 384
10.3 Second-Order Differential Operators . . . . . . . . . . . . . . . . . . . . 389
10.4 Green’s Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 394
10.5 Construction of Green’s Functions . . . . . . . . . . . . . . . . . . . . . . 401
10.6 Initial Value Problems (IVPs) . . . . . . . . . . . . . . . . . . . . . . . . . 408
10.6.1 General Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . 408
10.6.2 Green’s Functions for IVPs . . . . . . . . . . . . . . . . . . . . . 411
10.6.3 Estimation of Surface Terms . . . . . . . . . . . . . . . . . . . . 414
10.6.4 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 418
10.7 Eigenvalue Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 425
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 430

Part III Linear Vector Spaces


11 Vectors and Their Transformation . . . . . . . . . . . . . . . . . . . . . . . . . 433
11.1 Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 433
11.2 Linear Transformations of Vectors . . . . . . . . . . . . . . . . . . . . . . 438
11.3 Inverse Matrices and Determinants . . . . . . . . . . . . . . . . . . . . . . 448

11.4 Basis Vectors and Their Transformations . . . . . . . . . . . . . . . . . 452


Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 458
12 Canonical Forms of Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 459
12.1 Eigenvalues and Eigenvectors . . . . . . . . . . . . . . . . . . . . . . . . . 459
12.2 Eigenspaces and Invariant Subspaces . . . . . . . . . . . . . . . . . . . . 468
12.3 Generalized Eigenvectors and Nilpotent Matrices . . . . . . . . . . . 473
12.4 Idempotent Matrices and Generalized Eigenspaces . . . . . . . . . . 478
12.5 Decomposition of Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 485
12.6 Jordan Canonical Form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 488
12.6.1 Canonical Form of Nilpotent Matrix . . . . . . . . . . . . . . 488
12.6.2 Jordan Blocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 493
12.6.3 Example of Jordan Canonical Form . . . . . . . . . . . . . . . 501
12.7 Diagonalizable Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 512
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 522
13 Inner Product Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 523
13.1 Inner Product and Metric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 523
13.2 Gram Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 526
13.3 Adjoint Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 535
13.4 Orthonormal Basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 541
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 545
14 Hermitian Operators and Unitary Operators . . . . . . . . . . . . . . . . . 547
14.1 Projection Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 547
14.2 Normal Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 554
14.3 Unitary Diagonalization of Matrices . . . . . . . . . . . . . . . . . . . . . 556
14.4 Hermitian Matrices and Unitary Matrices . . . . . . . . . . . . . . . . . 564
14.5 Hermitian Quadratic Forms . . . . . . . . . . . . . . . . . . . . . . . . . . . 568
14.6 Simultaneous Eigenstates and Diagonalization . . . . . . . . . . . . . 574
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 580
15 Exponential Functions of Matrices . . . . . . . . . . . . . . . . . . . . . . . . . 581
15.1 Functions of Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 581
15.2 Exponential Functions of Matrices and Their
Manipulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 585
15.3 System of Differential Equations . . . . . . . . . . . . . . . . . . . . . . . 592
15.3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 592
15.3.2 System of Differential Equations in a Matrix
Form: Resolvent Matrix . . . . . . . . . . . . . . . . . . . . . . . 595
15.3.3 Several Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . 600
15.4 Motion of a Charged Particle in Polarized Electromagnetic Wave 611
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 617

Part IV Group Theory and Its Chemical Applications


16 Introductory Group Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 621
16.1 Definition of Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 621
16.2 Subgroups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 624
16.3 Classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 625
16.4 Isomorphism and Homomorphism . . . . . . . . . . . . . . . . . . . . . . 627
16.5 Direct-Product Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 631
Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 633
17 Symmetry Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 635
17.1 A Variety of Symmetry Operations . . . . . . . . . . . . . . . . . . . . . 635
17.2 Successive Symmetry Operations . . . . . . . . . . . . . . . . . . . . . . . 643
17.3 O and Td Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 654
17.4 Special Orthogonal Group SO(3) . . . . . . . . . . . . . . . . . . . . . . . 663
17.4.1 Rotation Axis and Rotation Matrix . . . . . . . . . . . . . . . 664
17.4.2 Euler Angles and Related Topics . . . . . . . . . . . . . . . . . 669
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 678
18 Representation Theory of Groups . . . . . . . . . . . . . . . . . . . . . . . . . . 679
18.1 Definition of Representation . . . . . . . . . . . . . . . . . . . . . . . . . . 679
18.2 Basis Functions of Representation . . . . . . . . . . . . . . . . . . . . . . 683
18.3 Schur’s Lemmas and Grand Orthogonality Theorem (GOT) . . . . 692
18.4 Characters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 697
18.5 Regular Representation and Group Algebra . . . . . . . . . . . . . . . 700
18.6 Classes and Irreducible Representations . . . . . . . . . . . . . . . . . . 707
18.7 Projection Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 710
18.8 Direct-Product Representation . . . . . . . . . . . . . . . . . . . . . . . . . 720
18.9 Symmetric Representation and Antisymmetric
Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 723
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 727
19 Applications of Group Theory to Physical Chemistry . . . . . . . . . . . 729
19.1 Transformation of Functions . . . . . . . . . . . . . . . . . . . . . . . . . . 729
19.2 Method of Molecular Orbitals (MOs) . . . . . . . . . . . . . . . . . . . . 734
19.3 Calculation Procedures of Molecular Orbitals (MOs) . . . . . . . . . 742
19.4 MO Calculations Based on π-Electron Approximation . . . . . . . . 747
19.4.1 Ethylene . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 747
19.4.2 Cyclopropenyl Radical . . . . . . . . . . . . . . . . . . . . . . . . 757
19.4.3 Benzene . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 764
19.4.4 Allyl Radical . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 770
19.5 MO Calculations of Methane . . . . . . . . . . . . . . . . . . . . . . . . . . 777
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 798
20 Theory of Continuous Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 799
20.1 Introduction: Operators of Rotation and Infinitesimal
Rotation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 799

20.2 Rotation Groups: SU(2) and SO(3) . . . . . . . . . . . . . . . . . . . . . . 806


20.2.1 Construction of SU(2) Matrices . . . . . . . . . . . . . . . . . . 808
20.2.2 SU(2) Representation Matrices: Wigner Formula . . . . . 812
20.2.3 SO(3) Representation Matrices and Spherical
Surface Harmonics . . . . . . . . . . . . . . . . . . . . . . . . . . . 814
20.2.4 Irreducible Representations of SU(2) and SO(3) . . . . . . 823
20.2.5 Parameter Space of SO(3) . . . . . . . . . . . . . . . . . . . . . . 831
20.2.6 Irreducible Characters of SO(3) and Their Orthogonality . . . 837
20.3 Clebsch–Gordan Coefficients of Rotation Groups . . . . . . . . . . . 843
20.3.1 Direct-Product of SU(2) and Clebsch–Gordan
Coefficients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 843
20.3.2 Calculation Procedures of Clebsch–Gordan Coefficients . . . 849
20.3.3 Examples of Calculation of Clebsch–Gordan
Coefficients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 858
20.4 Lie Groups and Lie Algebras . . . . . . . . . . . . . . . . . . . . . . . . . . 864
20.4.1 Definition of Lie Groups and Lie Algebras:
One-Parameter Groups . . . . . . . . . . . . . . . . . . . . . . . . 865
20.4.2 Properties of Lie Algebras . . . . . . . . . . . . . . . . . . . . . 868
20.4.3 Adjoint Representation of Lie Groups . . . . . . . . . . . . . 874
20.5 Connectedness of Lie Groups . . . . . . . . . . . . . . . . . . . . . . . . . 885
20.5.1 Several Definitions and Examples . . . . . . . . . . . . . . . . 885
20.5.2 O(3) and SO(3) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 888
20.5.3 Simply Connected Lie Groups: Local Properties
and Global Properties . . . . . . . . . . . . . . . . . . . . . . . . . 890
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 896

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 899
Part I
Quantum Mechanics

Quantum mechanics is clearly distinguished from classical physics, whose major
pillars are Newtonian mechanics and the electromagnetism established by Maxwell.
Quantum mechanics was first established as a theory of atomic physics that handled
the microscopic world. Later on, quantum mechanics was applied to the macroscopic
world, i.e., the cosmos. The question of how exactly quantum mechanics describes the
natural world and of how far the theory can go remains problematic and is in dispute
to this day.
Such an ultimate question is irrelevant to this monograph. Our major aim is to
study a standard approach to applying the Schrödinger equation to selected topics. The
topics include a particle confined within a potential well, a harmonic oscillator, and a
hydrogen-like atom. Our major task rests on solving the eigenvalue problems of these
topics. To this end, we describe both an analytical method and an algebraic (or operator)
method. Focusing on these topics, we will be able to acquire various methods to
tackle a wide range of quantum-mechanical problems. These problems are usually
posed as an analytical equation (i.e., a differential equation) or an algebraic equation.
A Hamiltonian is constructed analytically or algebraically accordingly. Besides the
Hamiltonian, physical quantities are expressed as a differential operator or a matrix
operator. In both the analytical and algebraic approaches, the Hermitian property
(or Hermiticity) of an operator and matrix is of crucial importance. This feature
will, therefore, be highlighted not only in this part but also throughout this book,
along with unitary operators and matrices.
Optical transitions and the associated selection rules are dealt with in relation to the
above topics. Those subjects are closely related to the electromagnetic phenomena
that are considered in Part II.
Unlike the eigenvalue problems of the abovementioned topics, it is difficult to get
exact analytical solutions in most cases of quantum-mechanical problems. For this
reason, we need appropriate methods to obtain approximate solutions with respect to
various problems including the eigenvalue problems. In this context, we deal with
approximation techniques of a perturbation method and variational method.

In the last part, we study the theory of analytic functions, one of the most elegant
theories of mathematics. This approach not only helps cultivate a broad view of pure
mathematics, but also leads to the acquisition of practical methodology. The last part
deals with introductory set theory and topology as well.
Chapter 1
Schrödinger Equation and Its Application

Quantum mechanics is an indispensable research tool of modern natural science that
covers cosmology, atomic physics, molecular science, materials science, and so
forth. The basic concept underlying quantum mechanics rests upon Schrödinger
equation. The Schrödinger equation is described as a second-order linear differential
equation (SOLDE). The equation is analytically solved accordingly. Alternatively,
equations of the quantum mechanics are often described in terms of operators and
matrices and physical quantities are represented by those operators and matrices.
Normally, they are noncommutative. In particular, the quantum-mechanical formal-
ism requires the canonical commutation relation between position and momentum
operators. One of the great characteristics of quantum mechanics is that physical
quantities must be Hermitian. This aspect is deeply related to the requirement that
these quantities should be described by real numbers. We deal with the Hermiticity
from both an analytical point of view (or coordinate representation) relevant to the
differential equations and an algebraic viewpoint (or matrix representation) associ-
ated with the operators and matrices. Including these topics, we briefly survey the
origin of Schrödinger equation and consider its implications. To get acquainted
with the quantum-mechanical formalism, we deal with simple examples of the
Schrödinger equation.

1.1 Early-Stage Quantum Theory

The Schrödinger equation is a direct consequence of the discovery of quanta. It
stemmed from the hypothesis of energy quanta propounded by Max Planck (1900).
This hypothesis was further followed by the photon (light quantum) hypothesis
propounded by Albert Einstein (1905). He claimed that light is an aggregation of
light quanta and that individual quanta carry an energy E expressed as the Planck
constant h multiplied by the frequency of light ν, i.e.,


[Figure: (a) incident X-ray (ħk) on an electron at rest, scattered X-ray (ħk′) at angle θ, recoiled electron (p); (b) the momentum triangle ħk = ħk′ + p]
Fig. 1.1 Scattering of an X-ray beam by an electron. (a) θ denotes a scattering angle of the X-ray
beam. (b) Conservation of momentum

E = hν = ħω,    (1.1)

where ħ ≡ h/2π and ω = 2πν. The quantity ω is called the angular frequency, with ν
being the frequency. The quantity ħ is said to be the reduced Planck constant.
Also Einstein (1917) concluded that the momentum of a light quantum p is identical
to the energy of the light quantum divided by the light velocity in vacuum c. That is,
we have

p = E/c = ħω/c = ħk,    (1.2)

where k ≡ 2π/λ (λ is the wavelength of light in vacuum) and k is called the wavenumber.


Using vector notation, we have

p = ħk,    (1.3)

where k ≡ (2π/λ)n (n: a unit vector in the direction of propagation of light) is said to be the
wavenumber vector.
Meanwhile, Arthur Compton (1923) conducted various experiments in which he
investigated how an incident X-ray beam was scattered by matter (e.g., graphite,
copper). As a result, Compton found a systematic redshift in the X-ray wavelengths
as a function of the scattering angle of the X-ray beam (the Compton effect).
Moreover, he found that the shift in wavelength depended only on the scattering
angle, regardless of the material of the scatterer. The results can be summarized
in a simple equation described as

Δλ = [h/(m_e c)](1 − cos θ),    (1.4)

where Δλ denotes the shift in wavelength of the scattered beam; m_e is the rest mass of an
electron; θ is the scattering angle of the X-ray beam (see Fig. 1.1). The quantity h/(m_e c)
has a dimension of length and is denoted by λ_e. That is,

λ_e ≡ h/(m_e c).    (1.5)

In other words, λ_e is equal to the shift in the wavelength of the scattered
beam obtained when θ = π/2. The quantity λ_e is called the electron
Compton wavelength and has an approximate value of 2.426 × 10⁻¹² m.
Let us derive (1.4) on the basis of conservation of energy and momentum. To this
end, in Fig. 1.1 we assume that an electron is originally at rest. An X-ray beam is
incident on the electron. Then the X-ray is scattered and the electron recoils as shown.
The energy conservation reads as

ħω + m_e c² = ħω′ + √(p²c² + m_e²c⁴),    (1.6)

where ω and ω′ are initial and final angular frequencies of the X-ray; the second term
of the RHS is the energy of the electron, in which p is the magnitude of its momentum after
recoil. Meanwhile, conservation of the momentum as a vector quantity reads as

ħk = ħk′ + p,    (1.7)

where k and k′ are the wavenumber vectors of the X-ray before and after being scattered;
p is the momentum of the electron after recoil. Note that the initial momentum of the
electron is zero since the electron is originally at rest. Here p is defined as

p ≡ mu,    (1.8)

where u is the velocity of the electron and m is given by [1]

m = m_e/√(1 − |u|²/c²).    (1.9)

Figure 1.1 shows that ħk, ħk′, and p form a closed triangle.
From (1.6), we have

[m_e c² + ħ(ω − ω′)]² = p²c² + m_e²c⁴.    (1.10)

Hence, we get

2m_e c²ħ(ω − ω′) + ħ²(ω − ω′)² = p²c².    (1.11)

From (1.7), we have

p² = ħ²(k − k′)² = ħ²(k² + k′² − 2kk′ cos θ)
   = (ħ²/c²)(ω² + ω′² − 2ωω′ cos θ),    (1.12)

where we used the relations ω = ck and ω′ = ck′ with the third equality. Therefore, we
get

p²c² = ħ²(ω² + ω′² − 2ωω′ cos θ).    (1.13)

From (1.11) and (1.13), we have

2m_e c²ħ(ω − ω′) + ħ²(ω − ω′)² = ħ²(ω² + ω′² − 2ωω′ cos θ).    (1.14)

Equation (1.14) is simplified to the following:

2m_e c²ħ(ω − ω′) − 2ħ²ωω′ = −2ħ²ωω′ cos θ.

That is,

m_e c²(ω − ω′) = ħωω′(1 − cos θ).    (1.15)

Thus, we get

(ω − ω′)/ωω′ = 1/ω′ − 1/ω = (1/2πc)(λ′ − λ) = [ħ/(m_e c²)](1 − cos θ),    (1.16)

where λ and λ′ are the wavelengths of the initial and final X-ray beams, respectively.
Since λ′ − λ = Δλ, we have (1.4) from (1.16) accordingly.
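As a quick numerical cross-check of (1.4) and (1.5), the short Python sketch below (purely illustrative; the constants and variable names are not part of the original text) evaluates λ_e and the Compton shift Δλ for a few scattering angles:

    import math

    h = 6.62607015e-34      # Planck constant [J s]
    m_e = 9.1093837015e-31  # electron rest mass [kg]
    c = 2.99792458e8        # light velocity in vacuum [m/s]

    lambda_e = h / (m_e * c)  # electron Compton wavelength, (1.5)
    print(f"lambda_e = {lambda_e:.4e} m")  # approx 2.426e-12 m

    for theta_deg in (30, 90, 180):
        theta = math.radians(theta_deg)
        dlam = lambda_e * (1.0 - math.cos(theta))  # Compton shift, (1.4)
        print(f"theta = {theta_deg:3d} deg: dlam = {dlam:.4e} m")
    # At theta = 90 deg the shift equals lambda_e; at theta = 180 deg it is 2*lambda_e.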
We have to mention another important person, Louis-Victor de Broglie (1924), in
the development of quantum mechanics. Encouraged by the success of Einstein and
Compton, he propounded the concept of the matter wave, which was referred to as
the de Broglie wave afterward. Namely, de Broglie reversed the relationship of (1.1) and
(1.2) such that

ω = E/ħ,    (1.17)

and

k = p/ħ or λ = h/p,    (1.18)

where p equals |p| and λ is the wavelength of a corpuscular beam. This is said to be the
de Broglie wavelength. In (1.18), de Broglie thought that a particle carrying an
energy E and momentum p is accompanied by a wave that is characterized by an
angular frequency ω and wavenumber k (or a wavelength λ = 2π/k). Equation (1.18)
implies that if we are able to determine the wavelength of the corpuscular beam
experimentally, we can decide the magnitude of the momentum accordingly.
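For instance, (1.18) lets us estimate the de Broglie wavelength of an electron beam of known momentum. The sketch below (illustrative only; the accelerating voltage of 100 V is an assumed value) uses the nonrelativistic relation eV = p²/2m_e:

    import math

    h = 6.62607015e-34      # Planck constant [J s]
    m_e = 9.1093837015e-31  # electron rest mass [kg]
    e = 1.602176634e-19     # elementary charge [C]

    V_acc = 100.0  # assumed accelerating voltage [V]
    p = math.sqrt(2.0 * m_e * e * V_acc)  # nonrelativistic momentum
    lam = h / p                           # de Broglie wavelength, (1.18)
    print(f"lambda = {lam:.4e} m")        # approx 1.23e-10 m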
In turn, from the squares of both sides of (1.8) and (1.9) we get

u = p/[m_e √(1 + (p/m_e c)²)].    (1.19)

This relation represents the velocity of the particles of the corpuscular beam. If we are
dealing with an electron beam, (1.19) gives the velocity of the electron beam. As a
nonrelativistic approximation (i.e., p/m_e c ≪ 1), we have

p ≈ m_e u.

We used a relativistic relation in the second term of the RHS of (1.6), where the
energy of an electron E_e is expressed by

E_e = √(p²c² + m_e²c⁴).    (1.20)

In the meantime, deleting u² from (1.8) and (1.9) we have

mc² = √(p²c² + m_e²c⁴).

Namely, we get [1]

E_e = mc².    (1.21)

The relation (1.21) is due to Einstein (1905, 1907) and is said to be the equivalence
theorem of mass and energy.
If an electron is accompanied by a matter wave, that wave should be propagated
with a certain phase velocity v_p and a group velocity v_g. Thus, using (1.17) and (1.18)
we have

v_p = ω/k = E_e/p = √(p²c² + m_e²c⁴)/p > c,
v_g = ∂ω/∂k = ∂E_e/∂p = c²p/√(p²c² + m_e²c⁴) < c,    (1.22)
v_p v_g = c².

Notice that in the above expressions, we replaced E of (1.17) with E_e of (1.20). The
group velocity is thought to be the velocity of a wave packet and, hence, the propagation
velocity of a matter wave should be identical to v_g. Thus, v_g is considered as a
particle velocity as well. In fact, v_g given by (1.22) is identical to u expressed in
(1.19). Therefore, a particle velocity must not exceed c. As for photons (or light
quanta), v_p = v_g = c and, hence, once again we get v_p v_g = c². We will encounter the
last relation of (1.22) in Part II as well.
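These relations are easy to confirm numerically. The sketch below (an illustration with an arbitrarily chosen, mildly relativistic electron momentum) checks that v_p > c, v_g < c, v_p v_g = c², and that v_g coincides with u of (1.19):

    import math

    m_e = 9.1093837015e-31  # electron rest mass [kg]
    c = 2.99792458e8        # light velocity in vacuum [m/s]

    p = 1.0e-22  # assumed electron momentum [kg m/s]
    E_e = math.sqrt(p**2 * c**2 + m_e**2 * c**4)  # relativistic energy, (1.20)

    v_p = E_e / p                                        # phase velocity, (1.22)
    v_g = c**2 * p / E_e                                 # group velocity, (1.22)
    u = (p / m_e) / math.sqrt(1.0 + (p / (m_e * c))**2)  # particle velocity, (1.19)

    print(v_p > c, v_g < c)              # True True
    print(abs(v_p * v_g - c**2) / c**2)  # ~0 within round-off
    print(abs(v_g - u) / u)              # ~0: v_g is identical to u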
The above discussion is a brief historical outlook of early-stage quantum theory
before Erwin Schrödinger (1926) propounded his equation.

1.2 Schrödinger Equation

First we introduce a wave equation expressed by

∇²ψ = (1/v²) ∂²ψ/∂t²,    (1.23)

where ψ is an arbitrary function of a physical quantity relevant to the propagation of a
wave; v is a phase velocity of the wave; ∇², called the Laplacian, is defined below:

∇² ≡ ∂²/∂x² + ∂²/∂y² + ∂²/∂z².    (1.24)

One of the special solutions of (1.23), called a plane wave, is well studied and expressed
as

ψ = ψ₀ e^{i(k·x − ωt)}.    (1.25)

In (1.25), x denotes a position vector of a three-dimensional Cartesian coordinate
and is described as

x = (e₁ e₂ e₃)(x y z)ᵀ,    (1.26)

where the superscript T denotes transposition, and e₁, e₂, and e₃ denote basis vectors
of an orthonormal base pointing to the positive directions of the x-, y-, and z-axes.
Here we make it a rule to represent basis vectors by a row vector and to represent a
coordinate or a component of a vector by a column vector; see Sect. 11.1.
The other way around, now we wish to seek a basic equation whose solution is
described as (1.25). Taking account of (1.1)–(1.3) as well as (1.17) and (1.18), we
rewrite (1.25) as

ψ = ψ₀ e^{(i/ħ)(p·x − Et)},    (1.27)

where we redefine p = (e₁ e₂ e₃)(p_x p_y p_z)ᵀ and E as quantities associated with those
of the matter (electron) wave. Taking partial differentiation of (1.27) with respect to x, we
obtain
∂ψ/∂x = (i/ħ)p_x ψ₀ e^{(i/ħ)(p·x − Et)} = (i/ħ)p_x ψ.    (1.28)

Rewriting (1.28), we have

(ħ/i) ∂ψ/∂x = p_x ψ.    (1.29)

Similarly we have

(ħ/i) ∂ψ/∂y = p_y ψ and (ħ/i) ∂ψ/∂z = p_z ψ.    (1.30)

Comparing both sides of (1.29), we notice that we may relate the differential operator
(ħ/i) ∂/∂x to p_x. From (1.30), a similar relationship holds with the y and z components.
That is, we have the following relations:

(ħ/i) ∂/∂x ↔ p_x,  (ħ/i) ∂/∂y ↔ p_y,  (ħ/i) ∂/∂z ↔ p_z.    (1.31)

Taking partial differentiation of (1.28) once more,

∂²ψ/∂x² = [(i/ħ)p_x]² ψ₀ e^{(i/ħ)(p·x − Et)} = −(1/ħ²) p_x² ψ.    (1.32)

Hence,

−ħ² ∂²ψ/∂x² = p_x² ψ.    (1.33)

Similarly we have

−ħ² ∂²ψ/∂y² = p_y² ψ and −ħ² ∂²ψ/∂z² = p_z² ψ.    (1.34)

As in the above cases, we have

−ħ² ∂²/∂x² ↔ p_x²,  −ħ² ∂²/∂y² ↔ p_y²,  −ħ² ∂²/∂z² ↔ p_z².    (1.35)

Summing both sides of (1.33) and (1.34) and then dividing by 2m, we have

−(ħ²/2m) ∇²ψ = (p²/2m) ψ    (1.36)

and the following correspondence

−(ħ²/2m) ∇² ↔ p²/2m,    (1.37)

where m is the mass of a particle.


Meanwhile, taking partial differentiation of (1.27) with respect to t, we obtain

∂ψ/∂t = −(i/ħ)E ψ₀ e^{(i/ħ)(p·x − Et)} = −(i/ħ)E ψ.    (1.38)

That is,

iħ ∂ψ/∂t = Eψ.    (1.39)

As the above, we get the following relationship:

iħ ∂/∂t ↔ E.    (1.40)

Thus, we have relationships between c-numbers (classical numbers) and q-numbers
(quantum numbers, namely, operators) in (1.35) and (1.40). Subtracting (1.36) from
(1.39), we get

iħ ∂ψ/∂t + (ħ²/2m) ∇²ψ = (E − p²/2m) ψ.    (1.41)

Invoking the relationship on energy

(Total energy) = (Kinetic energy) + (Potential energy),    (1.42)

we have

E = p²/2m + V,    (1.43)

where V is a potential energy. Thus, (1.41) reads as

iħ ∂ψ/∂t + (ħ²/2m) ∇²ψ = Vψ.    (1.44)

Rearranging (1.44), we finally get

[−(ħ²/2m) ∇² + V] ψ = iħ ∂ψ/∂t.    (1.45)

This is the Schrödinger equation, a fundamental equation of quantum mechanics. In
(1.45), we define the following Hamiltonian operator H as

H ≡ −(ħ²/2m) ∇² + V.    (1.46)

Then we have a shorthand representation such that

Hψ = iħ ∂ψ/∂t.    (1.47)
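For readers who wish to verify such operator manipulations symbolically, the following sketch (assuming the sympy library is available; it is not part of the original text) checks in one dimension that the plane wave (1.27) satisfies (1.45) with V = 0, provided E = p²/2m as in (1.43):

    import sympy as sp

    x, t = sp.symbols('x t', real=True)
    p, m, hbar = sp.symbols('p m hbar', positive=True)

    E = p**2 / (2 * m)                           # free particle, (1.43) with V = 0
    psi = sp.exp(sp.I * (p * x - E * t) / hbar)  # 1D plane wave, cf. (1.27)

    lhs = -hbar**2 / (2 * m) * sp.diff(psi, x, 2)  # H psi with V = 0, cf. (1.46)
    rhs = sp.I * hbar * sp.diff(psi, t)            # i*hbar dpsi/dt, cf. (1.47)
    print(sp.simplify(lhs - rhs))  # 0, so (1.45) holds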

On going from (1.25) to (1.27), we realize that the quantities k and ω pertinent to a
field have been converted to the quantities p and E related to a particle. At the same time,
whereas x and t represent a whole space-time in (1.25), those in (1.27) are
characterized as localized quantities.
From a historical point of view, we have to mention a great achievement
accomplished by Werner Heisenberg (1925), who propounded matrix mechanics.
The matrix mechanics is often contrasted with the wave mechanics Schrödinger
initiated. Schrödinger and Paul Dirac (1926) demonstrated that wave mechanics and
matrix mechanics are mathematically equivalent. Note that the Schrödinger equation
is described as a nonrelativistic expression based on (1.43). In fact, the kinetic energy
K of a particle is given by [1]

K = m_e c²/√(1 − (u/c)²) − m_e c².

As a nonrelativistic approximation, we get

K ≈ m_e c² [1 + (1/2)(u/c)²] − m_e c² = (1/2) m_e u² ≈ p²/2m_e,

where we used p ≈ m_e u again as a nonrelativistic approximation; also, we used

1/√(1 − x) ≈ 1 + (1/2)x

when x (>0), corresponding to (u/c)², is small enough compared with 1. This implies
that in the above case the group velocity u of a particle is supposed to be well below the
light velocity c. Dirac (1928) formulated an equation that describes relativistic quantum
mechanics (the Dirac equation).
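A one-line numerical comparison (illustrative only, with u = 0.01c as an assumed example) shows how good this approximation is:

    import math

    m_e, c = 9.1093837015e-31, 2.99792458e8  # electron mass [kg], light velocity [m/s]
    u = 0.01 * c                             # assumed particle velocity
    K_exact = m_e * c**2 * (1.0 / math.sqrt(1.0 - (u / c)**2) - 1.0)
    K_approx = 0.5 * m_e * u**2
    print((K_exact - K_approx) / K_exact)  # ~7.5e-5, of order (3/4)(u/c)**2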
In (1.45), ψ varies as a function of x and t. Suppose, however, that a potential
V depends only upon x. Then we have

[−(ħ²/2m) ∇² + V(x)] ψ(x, t) = iħ ∂ψ(x, t)/∂t.    (1.48)

Now, let us assume that separation of variables can be done with (1.48) such that

ψ(x, t) = ϕ(x)ξ(t).    (1.49)

Then, we have

[−(ħ²/2m) ∇² + V(x)] ϕ(x)ξ(t) = iħ ∂[ϕ(x)ξ(t)]/∂t.    (1.50)

Accordingly, (1.50) can be recast as

{[−(ħ²/2m) ∇² + V(x)] ϕ(x)}/ϕ(x) = iħ [∂ξ(t)/∂t]/ξ(t).    (1.51)

For (1.51) to hold, we must equate both sides to a constant E. That is, for a certain
fixed point x₀ we have

{[−(ħ²/2m) ∇² + V(x₀)] ϕ(x₀)}/ϕ(x₀) = iħ [∂ξ(t)/∂t]/ξ(t),    (1.52)

where ϕ(x₀) in the numerator should be evaluated after operating with ∇², while
ϕ(x₀) in the denominator is evaluated simply by replacing x in ϕ(x) with x₀. Now, let us
define a function Φ(x) such that

Φ(x) ≡ {[−(ħ²/2m) ∇² + V(x)] ϕ(x)}/ϕ(x).    (1.53)

Then, we have

Φ(x₀) = iħ [∂ξ(t)/∂t]/ξ(t).    (1.54)

If the RHS of (1.54) varied depending on t, Φ(x₀) would be allowed to have various
values, but this must not be the case with our present investigation. Thus, the RHS of
(1.54) should take a constant value E. For the same reason, the LHS of (1.51) should
take a constant value as well.
Thus, (1.48) or (1.51) should be separated into the following equations:

Hϕ(x) = Eϕ(x),    (1.55)

iħ ∂ξ(t)/∂t = Eξ(t).    (1.56)

Equation (1.56) can readily be solved. Since (1.56) depends on a sole variable t, we
have

dξ(t)/ξ(t) = (E/iħ) dt or d ln ξ(t) = (E/iħ) dt.    (1.57)

Integrating (1.57) from zero to t, we get

ln [ξ(t)/ξ(0)] = Et/iħ.    (1.58)

That is,

ξ(t) = ξ(0) exp(−iEt/ħ).    (1.59)

Comparing (1.59) with (1.38), we find that the constant E in (1.55) and (1.56)
represents an energy of a particle (electron).
Thus, the next task we want to do is to solve the eigenvalue equation (1.55). After
solving the problem, we get a solution

ψ(x, t) = ϕ(x) exp(−iEt/ħ),    (1.60)

where the constant ξ(0) has been absorbed in ϕ(x). Normally, ϕ(x) is to be normal-
ized after determining the functional form (vide infra).
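Equation (1.56) is simple enough that a computer algebra system reproduces (1.59) directly; the sympy sketch below (illustrative only) cross-checks the integration:

    import sympy as sp

    t = sp.symbols('t', real=True)
    E, hbar = sp.symbols('E hbar', positive=True)
    xi = sp.Function('xi')

    # i*hbar d(xi)/dt = E xi, i.e., Eq. (1.56)
    sol = sp.dsolve(sp.Eq(sp.I * hbar * xi(t).diff(t), E * xi(t)), xi(t))
    print(sol)  # xi(t) = C1*exp(-I*E*t/hbar), in accordance with (1.59)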

1.3 Simple Applications of Schrödinger Equation

The Schrödinger equation has been expressed as (1.48). The equation is a second-order
linear differential equation (SOLDE). In particular, our major interest lies in
solving the eigenvalue problem of (1.55). Eigenvalues consist of points in a complex
plane. Those points sometimes form a continuous domain, but we focus on the
eigenvalues that comprise discrete points in the complex plane. Therefore, in our
studies the eigenvalues are countable and numbered as, e.g., λ_n (n = 1, 2, 3, …). An
example is depicted in Fig. 1.2. Having this common belief as a background, let us
first think of a simple form of SOLDE.
Example 1.1 Let us think of the following differential equation:

d²y(x)/dx² + λy(x) = 0,    (1.61)

where x is a real variable; y may be a complex function of x, with λ possibly being a
complex constant as well. Suppose that y(x) is defined within a domain [−L, L]
(L > 0). We set boundary conditions (BCs) for (1.61) such that

y(−L) = 0 and y(L) = 0 (L > 0).    (1.62)

The BCs of (1.62) are called the Dirichlet conditions. We define the following
differential operator D described as

D ≡ −d²/dx².    (1.63)

Then, rewriting (1.61), we have

Fig. 1.2 Eigenvalues λ_n (n = 1, 2, 3, …) on a complex plane

Dy(x) = λy(x).    (1.64)

According to a general principle of SOLDEs, (1.61) has two linearly independent
solutions. In the case of (1.61), we choose exponential functions for those solutions,
described by

e^{ikx} and e^{−ikx} (k ≠ 0).

This is because the above functions do not change their functional form with respect
to differentiation, and so we ascribe solving a differential equation to solving an
algebraic equation among constants (or parameters). In the present case, λ and
k are such constants.
The parameter k could be a complex number, because λ is allowed to take a
complex value as well. Linear independence of these functions is ensured by a
nonvanishing Wronskian, W. That is,

W = e^{ikx}(e^{−ikx})′ − e^{−ikx}(e^{ikx})′ = e^{ikx}(−ik e^{−ikx}) − e^{−ikx}(ik e^{ikx})
  = −ik − ik = −2ik.    (1.65)
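The Wronskian in (1.65) is quickly confirmed with sympy (an illustrative cross-check, not part of the original text):

    import sympy as sp

    x = sp.symbols('x', real=True)
    k = sp.symbols('k', nonzero=True)

    f = sp.exp(sp.I * k * x)
    g = sp.exp(-sp.I * k * x)
    W = sp.simplify(f * g.diff(x) - g * f.diff(x))  # Wronskian of the pair
    print(W)  # -2*I*k, nonvanishing for k != 0, as in (1.65)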

If k ≠ 0, W ≠ 0. Therefore, as a general solution, we get

y(x) = a e^{ikx} + b e^{−ikx} (k ≠ 0),    (1.66)

where a and b are (complex) constants. We call the two linearly independent solutions
e^{ikx} and e^{−ikx} (k ≠ 0) a fundamental set of solutions of a SOLDE. Inserting (1.66)
into (1.61), we have

(λ − k²)(a e^{ikx} + b e^{−ikx}) = 0.    (1.67)

For (1.67) to hold with any x, we must have

λ − k² = 0, i.e., λ = k².    (1.68)

Using BCs (1.62), we have

a e^{−ikL} + b e^{ikL} = 0 and a e^{ikL} + b e^{−ikL} = 0.    (1.69)

Rewriting (1.69) in a matrix form, we have

( e^{−ikL}  e^{ikL}  ) ( a )   ( 0 )
( e^{ikL}   e^{−ikL} ) ( b ) = ( 0 ).    (1.70)

For a and b in (1.70) to have nonvanishing solutions, we must have


| e^{−ikL}  e^{ikL}  |
| e^{ikL}   e^{−ikL} | = 0, i.e., e^{−2ikL} − e^{2ikL} = 0.    (1.71)

This is because if the determinant in (1.71) were not zero, we would have a = b = 0 and
y(x) ≡ 0. Note that with an eigenvalue problem we must avoid having a solution that is
identically zero. Rewriting (1.71), we get

(e^{ikL} + e^{−ikL})(e^{ikL} − e^{−ikL}) = 0.    (1.72)

That is, we have either

e^{ikL} + e^{−ikL} = 0    (1.73)

or

e^{ikL} − e^{−ikL} = 0.    (1.74)

In the case of (1.73), inserting this into (1.69) we have

e^{−ikL}(a − b) = 0.    (1.75)

Therefore,

a = b,    (1.76)

where we used the fact that e^{−ikL} is a nonvanishing function for any ikL (either real or
complex). Similarly, in the case of (1.74), we have

a = −b.    (1.77)

For (1.76), from (1.66) we have

y(x) = a(e^{ikx} + e^{−ikx}) = 2a cos kx.    (1.78)

With (1.77), in turn, we get

y(x) = a(e^{ikx} − e^{−ikx}) = 2ia sin kx.    (1.79)

Thus, we get the two linearly independent solutions (1.78) and (1.79).
Inserting BCs (1.62) into (1.78), we have
cos kL = 0.    (1.80)

Hence,

kL = π/2 + mπ (m = 0, ±1, ±2, …).    (1.81)

In (1.81), for instance, we have k = π/2L for m = 0 and k = −π/2L for m = −1. Also,
we have k = 3π/2L for m = 1 and k = −3π/2L for m = −2. These cases, however,
individually give linearly dependent solutions for (1.78). Therefore, to get a set of
linearly independent eigenfunctions we may define k as positive. Correspondingly, from
(1.68) we get eigenvalues of

λ = (2m + 1)²π²/4L² (m = 0, 1, 2, …).    (1.82)

Also, inserting BCs (1.62) into (1.79), we have

sin kL = 0.    (1.83)

Hence,

kL = nπ (n = 1, 2, 3, …).    (1.84)

From (1.68) we get

λ = n²π²/L² = (2n)²π²/4L² (n = 1, 2, 3, …),    (1.85)

where we chose positive numbers n for the same reason as the above. With the
second equality of (1.85), we made the eigenvalues easily comparable to those of (1.82).
Figure 1.3 shows the eigenvalues given in both (1.82) and (1.85) in a unit of π 2/4L2.
From (1.82) and (1.85), we find that λ is positive definite (or strictly positive), and
so from (1.68) we have

k = √λ.    (1.86)

The next step is to normalize eigenfunctions. This step corresponds to appropriate


choice of a constant a in (1.78) and (1.79) so that we can have

[Figure: the eigenvalues lie at 1, 4, 9, 16, 25, … on the real axis]
Fig. 1.3 Eigenvalues of the differential equation (1.61) under boundary conditions given by (1.62).
The eigenvalues are given in a unit of π²/4L² on a real axis

Z L Z L
I¼ yðxÞ yðxÞdx ¼ jyðxÞj2 dx ¼ 1: ð1:87Þ
L L

That is,

I = 4|a|² ∫_{−L}^{L} cos² kx dx = 4|a|² ∫_{−L}^{L} (1/2)(1 + cos 2kx) dx = 2|a|² [x + (1/2k) sin 2kx]_{−L}^{L} = 4L|a|².   (1.88)

Combining (1.87) and (1.88), we get

|a| = (1/2)√(1/L).   (1.89)

Thus, we have

a = (1/2)√(1/L) e^{iθ},   (1.90)

where θ is any real number and e^{iθ} is said to be a phase factor. We usually set e^{iθ} ≡ 1. Then, we have a = (1/2)√(1/L). Thus, for the normalized cosine eigenfunctions we get

y(x) = √(1/L) cos kx  [kL = π/2 + mπ  (m = 0, 1, 2, ⋯)]   (1.91)

that corresponds to an eigenvalue λ = (2m + 1)²π²/4L² (m = 0, 1, 2, ⋯). For the other series of normalized sine functions, similarly we get

y(x) = √(1/L) sin kx  [kL = nπ  (n = 1, 2, 3, ⋯)]   (1.92)

that corresponds to an eigenvalue λ = (2n)²π²/4L² (n = 1, 2, 3, ⋯).
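As a numerical check of (1.91) and (1.92) (a sketch we add here; SciPy assumed available and L set to 1 for concreteness), one can verify normalization and mutual orthogonality of the lowest eigenfunctions on [−L, L]:

import numpy as np
from scipy.integrate import quad

L = 1.0
k_cos = np.pi / (2 * L)                 # kL = pi/2 (m = 0 in (1.91))
k_sin = np.pi / L                       # kL = pi   (n = 1 in (1.92))
y_cos = lambda x: np.sqrt(1 / L) * np.cos(k_cos * x)
y_sin = lambda x: np.sqrt(1 / L) * np.sin(k_sin * x)

print(quad(lambda x: y_cos(x) ** 2, -L, L)[0])        # ~1.0 (normalized)
print(quad(lambda x: y_sin(x) ** 2, -L, L)[0])        # ~1.0 (normalized)
print(quad(lambda x: y_cos(x) * y_sin(x), -L, L)[0])  # ~0.0 (orthogonal)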


Notice that, arranging λ in ascending order, we have even functions and odd functions alternately as the eigenfunctions corresponding to λ. Such a property is said to be parity. We often encounter it in quantum mechanics and related fields. From (1.61) we find that if y(x) is an eigenfunction, so is cy(x). That is, we should bear in mind that an eigenvalue problem is always accompanied by an indeterminate constant and that normalization of an eigenfunction does not mean uniqueness of the solution (see Chap. 10).

Strictly speaking, we should be careful to ensure that (1.81) holds on the basis of (1.80). This is because we have not yet excluded the possibility that k is a complex number. To see it, we examine the zeros of a cosine function defined in a complex domain. Here the zeros are the (complex) numbers at which the function takes the value zero. That is, if f(z₀) = 0, z₀ is called a zero (i.e., one of the zeros) of f(z). Now we have

cos z ≡ (1/2)(e^{iz} + e^{-iz});  z = x + iy  (x, y: real).   (1.93)

Inserting z = x + iy in cos z and rearranging terms, we get

cos z = (1/2)[cos x (e^{y} + e^{-y}) − i sin x (e^{y} − e^{-y})].   (1.94)

For cos z to vanish, both its real and imaginary parts must be zero. Since e^{y} + e^{-y} > 0 for all real numbers y, we must have cos x = 0 for the real part to vanish; i.e.,

x = π/2 + mπ  (m = 0, ±1, ±2, ⋯).   (1.95)

Note in this case that sin x = ±1 (≠ 0). Therefore, for the imaginary part to vanish, e^{y} − e^{-y} = 0. That is, we must have y = 0. Consequently, the zeros of cos z are real numbers. In other words, with respect to z₀ that satisfies cos z₀ = 0 we have

z₀ = π/2 + mπ  (m = 0, ±1, ±2, ⋯).   (1.96)

The above discussion equally applies to a sine function as well.
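This can also be seen numerically (an illustration of ours using Python's standard cmath module): moving the would-be zero x = π/2 off the real axis by iy gives |cos(π/2 + iy)| = sinh y, which vanishes only at y = 0.

import cmath, math

for y in (0.0, 0.5, 1.0):
    z = cmath.pi / 2 + 1j * y
    print(y, abs(cmath.cos(z)), math.sinh(y))   # the two values agree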


Thus, we ensure that k is a nonzero real number, and the eigenvalues λ are accordingly positive definite from (1.68). This conclusion is not fortuitous but a direct consequence of the form of the differential equation we have dealt with, in combination with the BCs we imposed, i.e., the Dirichlet conditions. Detailed discussion will follow in Sects. 1.4, 8.3, and 8.4 in relation to the Hermiticity of a differential operator.
Example 1.2: A Particle Confined Within a Potential Well  The results obtained in Example 1.1 can immediately be applied to dealing with a particle (electron) in a one-dimensional infinite potential well. In this case, (1.55) reads as

(ħ²/2m) d²ψ(x)/dx² + Eψ(x) = 0,   (1.97)

where m is the mass of the particle and E is its energy. The potential V is expressed as

V(x) = \begin{cases} 0 & (-L \le x \le L), \\ \infty & (x < -L;\ x > L). \end{cases}

Rewriting (1.97), we have

d²ψ(x)/dx² + (2mE/ħ²)ψ(x) = 0   (1.98)

with BCs

ψ(L) = ψ(−L) = 0.   (1.99)

If we replace λ of (1.61) with 2mE/ħ², we can follow the procedures of Example 1.1. That is, we put

E = ħ²λ/2m   (1.100)

with λ = k² in (1.68). For k we use the values of (1.81) and (1.84). Therefore, for the energy eigenvalues we get either

E = (ħ²/2m) ⋅ (2l + 1)²π²/4L²  (l = 0, 1, 2, ⋯),

to which

ψ(x) = √(1/L) cos kx  [kL = π/2 + lπ  (l = 0, 1, 2, ⋯)]   (1.101)

corresponds, or

E = (ħ²/2m) ⋅ (2n)²π²/4L²  (n = 1, 2, 3, ⋯),

to which

ψ(x) = √(1/L) sin kx  [kL = nπ  (n = 1, 2, 3, ⋯)]   (1.102)

corresponds.
Since the particle behaves as a free particle within the potential well (−L ≤ x ≤ L) and p = ħk, we obtain
E = p²/2m = (ħ²/2m)k²,

where

k = \begin{cases} (2l + 1)\pi/2L & (l = 0, 1, 2, \cdots), \\ 2n\pi/2L & (n = 1, 2, 3, \cdots). \end{cases}

The energy E is the kinetic energy of the particle.
Although in (1.97) ψ(x) ≡ 0 trivially holds, such a function may not be regarded as a solution of the eigenvalue problem. In fact, considering that |ψ(x)|² represents the existence probability of a particle, ψ(x) ≡ 0 corresponds to a situation where the particle in question does not exist. Consequently, such a trivial case has no physical meaning.
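To attach a concrete energy scale to these formulas, the following sketch (ours; SI constants taken from scipy.constants) evaluates E = ħ²k²/2m for an electron in a well of half-width L = 0.5 nm, i.e., total width 1 nm:

import numpy as np
from scipy.constants import hbar, m_e, eV

L = 0.5e-9                                  # half-width of the well [m]
kL = np.pi / 2 * np.arange(1, 6)            # kL = pi/2, pi, 3pi/2, 2pi, 5pi/2
E = (hbar * kL / L) ** 2 / (2 * m_e) / eV   # energies in eV
print(E)   # ~0.376, 1.50, 3.39, 6.02, 9.40 eV, scaling as 1 : 4 : 9 : 16 : 25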

1.4 Quantum-Mechanical Operators and Matrices

As represented by (1.55), a quantum-mechanical operator corresponds to a physical quantity. In (1.55), we connect a Hamiltonian operator to an energy (eigenvalue). Let us rephrase the situation as follows:

PΨ = pΨ.   (1.103)

In (1.103), we are viewing P as an operation or measurement on a physical system that is characterized by the quantum state Ψ. Operating with P on the physical system (or state), we obtain a physical quantity p relevant to P as a result of the operation (or measurement).
A way to effectively achieve the above is to use a matrix and a vector to represent the operation and the physical state, respectively. Let us glance at a little bit of matrix calculation to get used to the quantum-mechanical concepts and, hence, to obtain a clear understanding of them. In Part III, we will deal with matrix calculation in detail from the point of view of a general principle. At present, a (2, 2) matrix suffices. Let A be a (2, 2) matrix expressed as
 
A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}.   (1.104)

Let |ψ⟩ be a (2, 1) matrix, i.e., a column vector such that

|ψ⟩ = \begin{pmatrix} e \\ f \end{pmatrix}.   (1.105)

Note that operating a (2, 2) matrix on a (2, 1) matrix produces another (2, 1) matrix. Furthermore, we define an adjoint matrix A† such that

A† = \begin{pmatrix} a^{*} & c^{*} \\ b^{*} & d^{*} \end{pmatrix},   (1.106)

where a* is the complex conjugate of a. That is, A† is the complex conjugate transposed matrix of A. Also, we define an adjoint vector ⟨ψ| or |ψ⟩† such that

⟨ψ| ≡ |ψ⟩† = (e*  f*).   (1.107)

In this case, |ψ⟩† also denotes the complex conjugate transpose of |ψ⟩. The notations |ψ⟩ and ⟨ψ| are due to Dirac. He named ⟨ψ| and |φ⟩ a bra vector and a ket vector, respectively. This naming or equivoque comes from the fact that ⟨ψ| ⋅ |φ⟩ = ⟨ψ|φ⟩ forms a bracket. This is a (1, 2) × (2, 1) = (1, 1) matrix, i.e., a c-number (including a complex number), and ⟨ψ|φ⟩ represents an inner product. These notations are widely used nowadays in the fields of mathematics and physics.
Taking another vector |ξ⟩ = \begin{pmatrix} g \\ h \end{pmatrix} and using the matrix calculation rule, we have

A†|ψ⟩ = |A†ψ⟩ = \begin{pmatrix} a^{*} & c^{*} \\ b^{*} & d^{*} \end{pmatrix}\begin{pmatrix} e \\ f \end{pmatrix} = \begin{pmatrix} a^{*}e + c^{*}f \\ b^{*}e + d^{*}f \end{pmatrix}.   (1.108)

According to the definition (1.107), we have

|A†ψ⟩† = ⟨A†ψ| = (ae* + cf*  be* + df*).   (1.109)

Thus, we get

⟨A†ψ|ξ⟩ = (ae* + cf*  be* + df*)\begin{pmatrix} g \\ h \end{pmatrix} = (ag + bh)e* + (cg + dh)f*.   (1.110)

Similarly, we have

⟨ψ|Aξ⟩ = (e*  f*)\begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} g \\ h \end{pmatrix} = (ag + bh)e* + (cg + dh)f*.   (1.111)

Comparing (1.110) and (1.111), we get

⟨A†ψ|ξ⟩ = ⟨ψ|Aξ⟩.   (1.112)

Also, we have

⟨ψ|Aξ⟩ = ⟨Aξ|ψ⟩*.   (1.113)

Replacing A with A† in (1.112), we get

⟨(A†)†ψ|ξ⟩ = ⟨ψ|A†ξ⟩.   (1.114)

From (1.104) and (1.106), obviously we have

(A†)† = A.   (1.115)

Then, from (1.114) and (1.115) we have

⟨Aψ|ξ⟩ = ⟨ψ|A†ξ⟩ = ⟨ξ|Aψ⟩*,   (1.116)

where the second equality comes from (1.113) with ψ and ξ exchanged there. Moreover, we have the following relation:

(AB)† = B†A†.   (1.117)

The proof is left for readers. Using this relation, we have

⟨Aψ| = |Aψ⟩† = [A|ψ⟩]† = |ψ⟩†A† = ⟨ψ|A† = ⟨ψA†|.   (1.118)

Making an inner product by multiplying |ξ⟩ from the right of the leftmost and rightmost sides of (1.118) and using (1.116), we get

⟨Aψ|ξ⟩ = ⟨ψA†|ξ⟩ = ⟨ψ|A†ξ⟩.

This relation may be regarded as the associative law with regard to the symbol "|" of the inner product. It is equivalent to the associative law of matrix multiplication.
The results obtained above can readily be extended to a general case where (n, n) matrices are dealt with.
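These rules are easy to verify numerically. The following sketch (ours; random complex (2, 2) matrices with NumPy) checks (1.112), (1.115), and (1.117), with the adjoint realized as the conjugate transpose:

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
B = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
psi = rng.normal(size=2) + 1j * rng.normal(size=2)
xi = rng.normal(size=2) + 1j * rng.normal(size=2)

dag = lambda M: M.conj().T                 # adjoint = conjugate transpose

# (1.112): <A^dag psi|xi> = <psi|A xi>  (np.vdot conjugates its first argument)
print(np.vdot(dag(A) @ psi, xi), np.vdot(psi, A @ xi))
# (1.115) and (1.117)
print(np.allclose(dag(dag(A)), A), np.allclose(dag(A @ B), dag(B) @ dag(A)))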
Now, let us introduce an Hermitian operator (or matrix) H. When we have

H† = H,   (1.119)

H is called an Hermitian matrix. Then, applying (1.112) to the Hermitian matrix H, we have

⟨H†ψ|ξ⟩ = ⟨ψ|Hξ⟩ = ⟨ψ|H†ξ⟩  or  ⟨Hψ|ξ⟩ = ⟨ψ|H†ξ⟩ = ⟨ψ|Hξ⟩.   (1.120)

Also, let us introduce a norm of a vector |ψ⟩ such that

‖ψ‖ = √⟨ψ|ψ⟩.   (1.121)

A norm is a natural extension of the notion of the "length" of a vector. The norm ‖ψ‖ is zero if and only if |ψ⟩ = 0 (zero vector). Then, from (1.105) and (1.107), we have

⟨ψ|ψ⟩ = |e|² + |f|².

Therefore, ⟨ψ|ψ⟩ = 0 ⟺ e = f = 0, i.e., |ψ⟩ = 0.
Let us further consider an eigenvalue problem represented by our newly introduced notation. The eigenvalue equation is symbolically written as

H|ψ⟩ = λ|ψ⟩,   (1.122)

where H represents an Hermitian operator and |ψ⟩ is an eigenfunction that belongs to an eigenvalue λ. Operating with ⟨ψ| on (1.122) from the left, we have

⟨ψ|H|ψ⟩ = ⟨ψ|λ|ψ⟩ = λ⟨ψ|ψ⟩ = λ,   (1.123)

where we assume that |ψ⟩ is normalized, namely ⟨ψ|ψ⟩ = 1 or ‖ψ‖ = 1. Notice that the symbol "|" in an inner product is of secondary importance. We may disregard this notation, as in the case where a product notation "⋅" is omitted by writing ab instead of a ⋅ b.
Taking a complex conjugate of (1.123), we have

⟨ψ|Hψ⟩* = λ*.   (1.124)

Using (1.116) and (1.124), we have

λ* = ⟨ψ|Hψ⟩* = ⟨ψ|H†ψ⟩ = ⟨ψ|Hψ⟩ = λ,   (1.125)

where with the third equality we used the definition (1.119). The relation λ* = λ obviously shows that any eigenvalue λ is real if H is Hermitian. The relation (1.125) immediately tells us that even though |ψ⟩ is not an eigenfunction, ⟨ψ|Hψ⟩ is real as well if H is Hermitian. The quantity ⟨ψ|Hψ⟩ is said to be an expectation value. This value is interpreted as the most probable or averaged value of H obtained as a result of operating H on a physical state |ψ⟩. We sometimes denote the expectation value as

⟨H⟩ ≡ ⟨ψ|Hψ⟩,   (1.126)

where |ψ⟩ is normalized. Unless |ψ⟩ is normalized, it can be normalized on the basis of (1.121) by choosing |Φ⟩ such that

|Φ⟩ = |ψ⟩/‖ψ‖.   (1.127)

Thus, we have an important consequence: if an Hermitian operator has an eigenvalue, it must be real. An expectation value of an Hermitian operator is real as well. A real eigenvalue and expectation value are a prerequisite for a physical quantity.
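A small numerical check of this consequence (our sketch; a random (2, 2) Hermitian matrix built with NumPy): the eigenvalues come out real, and so does any expectation value ⟨ψ|Hψ⟩.

import numpy as np

rng = np.random.default_rng(1)
M = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
H = M + M.conj().T                       # H^dag = H by construction

print(np.linalg.eigvalsh(H))             # two real eigenvalues
psi = rng.normal(size=2) + 1j * rng.normal(size=2)
psi = psi / np.linalg.norm(psi)          # normalize
print(np.vdot(psi, H @ psi))             # expectation value: imaginary part ~ 0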
As discussed above, Hermitian matrices play a central role in quantum physics. Taking a further step, let us extend the notion of Hermiticity to a function space.
In Example 1.1, we remarked that we finally reached a solution where λ is a real (and positive) number, even though at the beginning we set no restriction on λ. This is because the SOLDE form (1.61) accompanied by BCs (1.62) is Hermitian, and so the eigenvalues λ are real.
In this context, we give a little further consideration. We define an inner product between two functions as follows:

⟨g|f⟩ ≡ ∫_{a}^{b} g(x)* f(x) dx,   (1.128)

where g(x)* is the complex conjugate of g(x); x is a real variable and the integration range can be either bounded or unbounded. If a and b are real definite numbers, [a, b] is the bounded case. With the unbounded case, we have, e.g., (−∞, +∞), (−∞, c), and (c, +∞), etc., where c is a definite number. This notation will appear again in Chap. 10. In (1.128) we view the functions f and g as vectors in a function space, often referred to as a Hilbert space. We assume that any function f is square-integrable; i.e., the integral of |f|² is finite. That is,

∫_{a}^{b} |f(x)|² dx < ∞.   (1.129)

Using the above definition, let us calculate ⟨g|Df⟩, where D was defined in (1.63). Then, using integration by parts, we have

⟨g|Df⟩ = ∫_{a}^{b} g(x)* (−d²f(x)/dx²) dx = −[g*f′]_{a}^{b} + ∫_{a}^{b} g*′f′ dx
 = −[g*f′]_{a}^{b} + [g*′f]_{a}^{b} − ∫_{a}^{b} g*″f dx = [g*′f − g*f′]_{a}^{b} + ⟨Dg|f⟩.   (1.130)

If we have BCs such that

f(b) = f(a) = 0  and  g(b) = g(a) = 0, i.e., g(b)* = g(a)* = 0,   (1.131)

we get

⟨g|Df⟩ = ⟨Dg|f⟩.   (1.132)

In light of (1.120), (1.132) implies that D is Hermitian. In (1.131), notice that the functions f and g satisfy the same BCs; normally, an operator must have this property in order to be Hermitian. Thus, the Hermiticity of a differential operator is closely related to the BCs of the differential equation.
Next, we consider the following inner product:

⟨f|Df⟩ = −∫_{a}^{b} f* f″ dx = −[f*f′]_{a}^{b} + ∫_{a}^{b} f*′f′ dx = −[f*f′]_{a}^{b} + ∫_{a}^{b} |f′|² dx.   (1.133)

Note that the definite integral in (1.133) cannot be negative. There are two possibilities for D to be Hermitian according to different BCs.
(i) Dirichlet conditions: f(b) = f(a) = 0. If we could have f′ ≡ 0, ⟨f|Df⟩ would be zero. But in that case f should be constant. If so, f(x) ≡ 0 according to the BCs. We must exclude this trivial case. Consequently, to avoid this situation we must have

∫_{a}^{b} |f′|² dx > 0  or  ⟨f|Df⟩ > 0.   (1.134)

In this case, the operator D is said to be positive definite. Suppose that such a positive-definite operator has an eigenvalue λ. Then, for a corresponding eigenfunction y(x) we have

Dy(x) = λy(x).   (1.135)

In this case, we state that y(x) is an eigenfunction or eigenvector that corresponds (or belongs) to an eigenvalue λ. Taking an inner product of both sides, we have
⟨y|Dy⟩ = ⟨y|λy⟩ = λ⟨y|y⟩ = λ‖y‖²  or  λ = ⟨y|Dy⟩/‖y‖².   (1.136)

Both ⟨y|Dy⟩ and ‖y‖² are positive and, hence, we have λ > 0. Thus, if D has an eigenvalue, it must be positive. In this case, λ is said to be positive definite as well; see Example 1.1.
(ii) Neumann conditions: f′(b) = f′(a) = 0. From (1.130), D is Hermitian as well. Unlike the condition (i), however, f may be a nonzero constant in this case. Therefore, we are allowed to have

∫_{a}^{b} |f′|² dx = 0  or  ⟨f|Df⟩ = 0.   (1.137)

For any function, we have

⟨f|Df⟩ ≥ 0.   (1.138)

In this case, the operator D is said to be non-negative (or positive semi-definite). The eigenvalue may be zero from (1.136) and, hence, is called non-negative accordingly.
(iii) Periodic conditions: f(b) = f(a) and f′(b) = f′(a). We are allowed to have ⟨f|Df⟩ ≥ 0 as in the case of the condition (ii). Then, the operator and eigenvalues are non-negative.
Thus, in spite of being formally the same operator, that operator behaves differently according to the different BCs. In particular, a differential operator associated with an eigenvalue of zero is of special interest. We will encounter another illustration in Chap. 3.

1.5 Commutator and Canonical Commutation Relation

In quantum mechanics it is important whether two operators A and B are commutable. In this context, a commutator between A and B is defined such that

[A, B] ≡ AB − BA.   (1.139)

If [A, B] = 0 (zero matrix), A and B are said to be commutable (or commutative). If [A, B] ≠ 0, A and B are noncommutative. Such relationships between two operators are called commutation relations.
We have the canonical commutation relation as an underlying concept of quantum mechanics. It is defined between a (canonical) coordinate q and a (canonical) momentum p such that
[q, p] = iħ,   (1.140)

where the presence of a unit matrix E is implied. Explicitly writing it, we have

[q, p] = iħE.   (1.141)

The relations (1.140) and (1.141) are called the canonical commutation relation. On the basis of the relation p = (ħ/i)∂/∂q, a brief proof of this is as follows:

[q, p]|ψ⟩ = (qp − pq)|ψ⟩ = [q(ħ/i)∂/∂q − (ħ/i)(∂/∂q)q]|ψ⟩ = q(ħ/i)∂|ψ⟩/∂q − (ħ/i)∂(q|ψ⟩)/∂q
 = q(ħ/i)∂|ψ⟩/∂q − (ħ/i)(∂q/∂q)|ψ⟩ − q(ħ/i)∂|ψ⟩/∂q = −(ħ/i)|ψ⟩ = iħ|ψ⟩.   (1.142)

Since |ψ⟩ is an arbitrarily chosen vector, we have (1.140).
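The computation (1.142) can be mirrored symbolically. In the sketch below (ours, using SymPy), q acts as multiplication and p as (ħ/i)∂/∂q on an arbitrary test function, and the commutator indeed returns iħ times that function:

import sympy as sp

q, hbar = sp.symbols('q hbar', positive=True)
f = sp.Function('f')(q)                       # arbitrary test function

p = lambda g: (hbar / sp.I) * sp.diff(g, q)   # p = (hbar/i) d/dq
commutator = q * p(f) - p(q * f)              # (qp - pq) f

print(sp.simplify(commutator))                # I*hbar*f(q), i.e., [q, p] = i*hbar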


Using (1.117), we have

[A, B]† = (AB − BA)† = B†A† − A†B†.   (1.143)

If in (1.143) A and B are both Hermitian, we have

[A, B]† = BA − AB = −[A, B].   (1.144)

If we have an operator G such that

G† = −G,   (1.145)

G is said to be anti-Hermitian. Therefore, [A, B] is anti-Hermitian if both A and B are Hermitian. If an anti-Hermitian operator has an eigenvalue, that eigenvalue is zero or pure imaginary. To show this, suppose that

G|ψ⟩ = λ|ψ⟩,   (1.146)

where G is an anti-Hermitian operator and |ψ⟩ has been normalized. As in the case of (1.123) we have

⟨ψ|G|ψ⟩ = λ⟨ψ|ψ⟩ = λ.   (1.147)

Taking a complex conjugate of (1.147), we have

⟨ψ|Gψ⟩* = λ*.   (1.148)

Using (1.116) and (1.145) again, we have

λ* = ⟨ψ|Gψ⟩* = ⟨ψ|G†ψ⟩ = −⟨ψ|Gψ⟩ = −λ.   (1.149)

This shows that λ is zero or pure imaginary.
Therefore, (1.142) can be viewed as an eigenvalue equation for which any physical state |ψ⟩ has a pure imaginary eigenvalue iħ with respect to [q, p]. Note that both q and p are Hermitian (see Sect. 10.2, Example 10.3), and so [q, p] is anti-Hermitian, as mentioned above. The canonical commutation relation given by (1.140) is believed to underpin the uncertainty principle.
In quantum mechanics, it is of great importance whether a quantum operator is Hermitian or not. A position operator and a momentum operator, along with an angular momentum operator, are particularly important when we constitute a Hamiltonian. Let f and g be arbitrary functions. Let us consider, e.g., the following inner product with the momentum operator:

⟨g|pf⟩ = ∫_{a}^{b} g(x)* (ħ/i)(∂/∂x)[f(x)] dx,   (1.150)

where the domain [a, b] depends on the physical system; this can be either bounded or unbounded. Performing integration by parts, we have

⟨g|pf⟩ = (ħ/i)[g(x)* f(x)]_{a}^{b} − ∫_{a}^{b} (ħ/i)(∂g(x)*/∂x) f(x) dx
 = (ħ/i)[g(b)* f(b) − g(a)* f(a)] + ∫_{a}^{b} [(ħ/i)∂g(x)/∂x]* f(x) dx.   (1.151)

If we require f(b) = f(a) and g(b) = g(a), the first term vanishes and we get

⟨g|pf⟩ = ∫_{a}^{b} [(ħ/i)∂g(x)/∂x]* f(x) dx = ⟨pg|f⟩.   (1.152)

Thus, as in the case of (1.120), the momentum operator p is Hermitian. Note that the position operator q of (1.142) is Hermitian as an a priori assumption.
Meanwhile, the z-component of the angular momentum operator, Lz, is described in polar coordinates as follows:

Lz = (ħ/i) ∂/∂ϕ,   (1.153)

where ϕ is an azimuthal angle varying from 0 to 2π. The notation and implication of Lz will be mentioned in Chap. 3. Similarly to the above, we have

⟨g|Lz f⟩ = (ħ/i)[g(2π)* f(2π) − g(0)* f(0)] + ∫_{0}^{2π} [(ħ/i)∂g(ϕ)/∂ϕ]* f(ϕ) dϕ.   (1.154)

Requiring an arbitrary function f to satisfy the BC f(2π) = f(0), we reach

⟨g|Lz f⟩ = ⟨Lz g|f⟩.   (1.155)

Note that we must have the above BC, because ϕ = 0 and ϕ = 2π are spatially the same point. Thus, we find that Lz is Hermitian as well under this condition.
On the basis of the aforementioned argument, let us proceed to quantum-mechanical studies of a harmonic oscillator. Regarding the angular momentum, we will study its basic properties in Chap. 3.



Chapter 2
Quantum-Mechanical Harmonic Oscillator

Quantum-mechanical treatment of a harmonic oscillator has been a well-studied


topic from the beginning of the history of quantum mechanics. This topic is a
standard subject in classical mechanics as well. In this chapter, first we briefly
survey characteristics of a classical harmonic oscillator. From a quantum-mechanical
point of view, we deal with features of a harmonic oscillator through matrix
representation. We define creation and annihilation operators using position and
momentum operators. A Hamiltonian of the oscillator is described in terms of the
creation and annihilation operators. This enables us to easily determine energy
eigenvalues of the oscillator. As a result, energy eigenvalues are found to be positive
definite. Meanwhile, we express the Schrödinger equation by the coordinate repre-
sentation. We compare the results with those of the matrix representation and show
that the two representations are mathematically equivalent. Thus, the treatment of the
quantum-mechanical harmonic oscillator supplies us with a firm ground for studying
basic concepts of the quantum mechanics.

2.1 Classical Harmonic Oscillator

The classical Newtonian equation of motion of a one-dimensional harmonic oscillator is expressed as

m d²x(t)/dt² = −sx(t),   (2.1)

where m is the mass of the oscillator and s is a spring constant. Putting s/m = ω², we have


d²x(t)/dt² + ω²x(t) = 0.   (2.2)

In (2.2) we set ω positive, namely,

ω = √(s/m),   (2.3)

where ω is called the angular frequency of the oscillator.
If we replace ω² with λ, we have formally the same equation as (1.61). Two linearly independent solutions of (2.2) are the same as before (see Example 1.1); we have e^{iωt} and e^{-iωt} (ω ≠ 0) as such. Note, however, that in Example 1.2 we were dealing with a quantum state related to the existence probability of a particle in a potential well. In (2.2), on the other hand, we are examining the position of a harmonic oscillator subjected to the force of a spring. We are thus considering a different situation.
As a general solution we have

x(t) = ae^{iωt} + be^{-iωt},   (2.4)

where a and b are suitable constants. Let us consider BCs different from those of Examples 1.1 or 1.2 this time. That is, we set BCs such that

x(0) = 0  and  x′(0) = v₀  (v₀ > 0).   (2.5)

Notice that (2.5) gives initial conditions (ICs). Mathematically, ICs are included in BCs (see Chap. 10). From (2.4) we have

x(0) = a + b = 0  and  x′(0) = iω(a − b) = v₀.   (2.6)

Then, we get a = −b = v₀/2iω. Thus, we get a simple harmonic motion as a solution expressed as

x(t) = (v₀/2iω)(e^{iωt} − e^{-iωt}) = (v₀/ω) sin ωt.   (2.7)

From this, we have

E = K + V = (1/2)mv₀².   (2.8)

In particular, if v₀ = 0, then x(t) ≡ 0. This is a solution of (2.1) meaning that the particle is eternally at rest; it is physically acceptable as well. Notice also that, unlike Examples 1.1 and 1.2, the solution has been determined uniquely. This is due to the different BCs.
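The unique determination of the solution by the ICs (2.5) can be confirmed with SymPy's ODE solver (a sketch we add; ω and v₀ are treated as positive symbols):

import sympy as sp

t, omega, v0 = sp.symbols('t omega v0', positive=True)
x = sp.Function('x')

ode = sp.Eq(x(t).diff(t, 2) + omega**2 * x(t), 0)   # Eq. (2.2)
sol = sp.dsolve(ode, x(t), ics={x(0): 0, x(t).diff(t).subs(t, 0): v0})
print(sol)   # Eq(x(t), v0*sin(omega*t)/omega), reproducing (2.7)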

From the point of view of a mechanical system, the mathematical formulation of the classical harmonic oscillator resembles that of electromagnetic fields confined within a cavity. We return to this point later in Sect. 9.6.

2.2 Formulation Based on an Operator Method

Now let us return to our task of finding quantum-mechanical solutions of a harmonic oscillator. The potential V is given by

V(q) = (1/2)sq² = (1/2)mω²q²,   (2.9)

where q is used for a one-dimensional position coordinate. Then, we have a classical Hamiltonian H expressed as

H = p²/2m + V(q) = p²/2m + (1/2)mω²q².   (2.10)

Following the formulation of Sect. 1.2, the Schrödinger equation as an eigenvalue equation related to the energy E is described as

Hψ(q) = Eψ(q)

or

−(ħ²/2m)∇²ψ(q) + (1/2)mω²q²ψ(q) = Eψ(q).   (2.11)

This is a SOLDE, and it is well known that a SOLDE can be solved by a power series expansion method.
In the present studies, however, let us first use an operator method to solve the eigenvalue equation (2.11) of a one-dimensional oscillator. To this end, we use a quantum-mechanical Hamiltonian in which the momentum operator p is explicitly represented. Thus, the Hamiltonian reads as

H = p²/2m + (1/2)mω²q².   (2.12)

Equation (2.12) is formally the same as (2.10). Note, however, that in (2.12) p and q are expressed as quantum-mechanical operators.
As in (1.126), we first examine an expectation value ⟨H⟩ of H. It is given by

⟨H⟩ = ⟨ψ|Hψ⟩ = ⟨ψ|(p²/2m)ψ⟩ + ⟨ψ|(1/2)mω²q²ψ⟩ = (1/2m)⟨p†ψ|pψ⟩ + (1/2)mω²⟨q†ψ|qψ⟩
 = (1/2m)⟨pψ|pψ⟩ + (1/2)mω²⟨qψ|qψ⟩ = (1/2m)‖pψ‖² + (1/2)mω²‖qψ‖² ≥ 0,   (2.13)

where again we assumed that |ψ⟩ has been normalized. In (2.13) we used the notation (1.126) and the fact that both q and p are Hermitian. In this situation, ⟨H⟩ takes a non-negative value.
In (2.13), the equality holds if and only if |pψ⟩ = 0 and |qψ⟩ = 0. Let us specify a vector |ψ₀⟩ that satisfies these conditions such that

|pψ₀⟩ = 0  and  |qψ₀⟩ = 0.   (2.14)

Multiplying by q from the left on the first equation of (2.14) and by p from the left on the second equation, we have

qp|ψ₀⟩ = 0  and  pq|ψ₀⟩ = 0.   (2.15)

Subtracting the second equation of (2.15) from the first, we get

(qp − pq)|ψ₀⟩ = iħ|ψ₀⟩ = 0,   (2.16)

where with the first equality we used (1.140). Therefore, we would have |ψ₀(q)⟩ ≡ 0. This leads back to the relations (2.14). That is, ⟨H⟩ = 0 if and only if |ψ₀(q)⟩ ≡ 0. But since that state has no physical meaning, |ψ₀(q)⟩ ≡ 0 must be rejected as unsuitable for a solution of (2.11). For a physically acceptable solution of (2.13), ⟨H⟩ must accordingly take a positive definite value. Thus, on the basis of the canonical commutation relation, we restrict the range of the expectation values.
Instead of directly dealing with (2.12), it is well known to introduce the following operators [1]:
a ≡ √(mω/2ħ) q + (i/√(2mħω)) p   (2.17)

and its adjoint (complex conjugate) operator

a† = √(mω/2ħ) q − (i/√(2mħω)) p.   (2.18)

Notice here again that both q and p are Hermitian. Using a matrix representation for (2.17) and (2.18), we have
\begin{pmatrix} a \\ a^{\dagger} \end{pmatrix} = \begin{pmatrix} \sqrt{m\omega/2\hbar} & i/\sqrt{2m\hbar\omega} \\ \sqrt{m\omega/2\hbar} & -i/\sqrt{2m\hbar\omega} \end{pmatrix}\begin{pmatrix} q \\ p \end{pmatrix}.   (2.19)

Then we have

a†a = (2mħω)^{-1}(mωq − ip)(mωq + ip) = (2mħω)^{-1}[m²ω²q² + p² + imω(qp − pq)]
 = (ħω)^{-1}[(1/2)mω²q² + (1/2m)p² + (1/2)iω ⋅ iħ] = (ħω)^{-1}[H − (1/2)ħω],   (2.20)

where the second last equality comes from (1.140). Rewriting (2.20), we get

H = ħωa†a + (1/2)ħω.   (2.21)

Similarly, we get

H = ħωaa† − (1/2)ħω.   (2.22)

Subtracting (2.22) from (2.21), we have

0 = ħωa†a − ħωaa† + ħω.   (2.23)

That is,

[a, a†] = 1  or  [a, a†] = E,   (2.24)

where E represents an identity operator.
Furthermore, using (2.21) we have

[H, a†] = ħω[a†a + 1/2, a†] = ħω(a†aa† − a†a†a) = ħωa†[a, a†] = ħωa†.   (2.25)

Similarly, we get

[H, a] = −ħωa.   (2.26)

Next, let us calculate an expectation value of H. Using a normalized function |ψ⟩, from (2.21) we have

⟨ψ|H|ψ⟩ = ⟨ψ|(ħωa†a + (1/2)ħω)ψ⟩ = ħω⟨ψ|a†aψ⟩ + (1/2)ħω⟨ψ|ψ⟩
 = ħω⟨aψ|aψ⟩ + (1/2)ħω = ħω‖aψ‖² + (1/2)ħω ≥ (1/2)ħω.   (2.27)

Thus, the expectation value is equal to or larger than (1/2)ħω. This is consistent with the fact that an energy eigenvalue is positive definite, as mentioned above. Equation (2.27) also tells us that if we have

|aψ₀⟩ = 0,   (2.28)

we get

⟨ψ₀|H|ψ₀⟩ = (1/2)ħω.   (2.29)

Equation (2.29) means that the smallest expectation value is (1/2)ħω under the condition of (2.28). Under the same condition, using (2.21) we have

H|ψ₀⟩ = ħωa†a|ψ₀⟩ + (1/2)ħω|ψ₀⟩ = (1/2)ħω|ψ₀⟩.   (2.30)

Thus, |ψ₀⟩ is an eigenfunction corresponding to the eigenvalue (1/2)ħω ≡ E₀, which is identical with the smallest expectation value of (2.29). Since this is the lowest eigenvalue, |ψ₀⟩ is said to be the ground state. We ensure later that |ψ₀⟩ is certainly an eligible function for a ground state.
The above method is consistent with the variational principle [2], which stipulates that under appropriate BCs an expectation value of the Hamiltonian estimated with an arbitrary function is always larger than or equal to the smallest eigenvalue, which corresponds to the ground state.
Next, let us evaluate the energy eigenvalues of the oscillator. First we have

H|ψ₀⟩ = (1/2)ħω|ψ₀⟩ = E₀|ψ₀⟩.   (2.31)

Operating with a† on both sides of (2.31), we have

a†H|ψ₀⟩ = a†E₀|ψ₀⟩.   (2.32)

Meanwhile, using (2.25), we have

a†H|ψ₀⟩ = (Ha† − ħωa†)|ψ₀⟩.   (2.33)

Equating the RHSs of (2.32) and (2.33), we get

Ha†|ψ₀⟩ = (E₀ + ħω)a†|ψ₀⟩.   (2.34)

This implies that a†|ψ₀⟩ belongs to the eigenvalue (E₀ + ħω), which is larger than E₀ as expected. Again multiplying both sides of (2.34) by a† from the left and using (2.25), we get
Fig. 2.1 Energy eigenvalues of a quantum-mechanical harmonic oscillator on a real axis: 1/2, 3/2, 5/2, 7/2, 9/2, ⋯ in units of ħω.

H(a†)²|ψ₀⟩ = (E₀ + 2ħω)(a†)²|ψ₀⟩.   (2.35)

This implies that (a†)²|ψ₀⟩ belongs to the eigenvalue (E₀ + 2ħω). Thus, repeatedly taking the above procedures, we get

H(a†)ⁿ|ψ₀⟩ = (E₀ + nħω)(a†)ⁿ|ψ₀⟩.   (2.36)

Thus, (a†)ⁿ|ψ₀⟩ belongs to the eigenvalue

Eₙ ≡ (E₀ + nħω) = (n + 1/2)ħω,   (2.37)

where Eₙ denotes an energy eigenvalue of the n-th excited state. The energy eigenvalues are plotted in Fig. 2.1.
Our next task is to seek normalized eigenvectors of the n-th excited state. Let cₙ be a normalization constant of that state. That is, we have

|ψₙ⟩ = cₙ(a†)ⁿ|ψ₀⟩,   (2.38)

where |ψₙ⟩ is a normalized eigenfunction of the n-th excited state. To determine cₙ, let us calculate a|ψₙ⟩. This includes a factor a(a†)ⁿ. We have

a(a†)ⁿ = (aa† − a†a)(a†)^{n−1} + a†a(a†)^{n−1}
 = [a, a†](a†)^{n−1} + a†a(a†)^{n−1} = (a†)^{n−1} + a†a(a†)^{n−1}
 = (a†)^{n−1} + a†[a, a†](a†)^{n−2} + (a†)²a(a†)^{n−2}
 = 2(a†)^{n−1} + (a†)²a(a†)^{n−2}
 = 2(a†)^{n−1} + (a†)²[a, a†](a†)^{n−3} + (a†)³a(a†)^{n−3}
 = 3(a†)^{n−1} + (a†)³a(a†)^{n−3}
 = ⋯ .   (2.39)

In the above procedures, we used [a, a†] = 1. What is implied in (2.39) is that the coefficient of (a†)^{n−1} increases one by one as a is transferred toward the right one by one in the second term of the RHS. Notice that in the second term a is sandwiched such that (a†)^{m}a(a†)^{n−m} (m = 1, 2, ⋯). Finally, we have
a(a†)ⁿ = n(a†)^{n−1} + (a†)ⁿa.   (2.40)

Thus, we get

a|ψₙ⟩ = cₙa(a†)ⁿ|ψ₀⟩ = cₙ[n(a†)^{n−1} + (a†)ⁿa]|ψ₀⟩ = cₙn(a†)^{n−1}|ψ₀⟩
 = (cₙ/c_{n−1}) n c_{n−1}(a†)^{n−1}|ψ₀⟩ = (cₙ/c_{n−1}) n |ψ_{n−1}⟩,   (2.41)

where the third equality comes from (2.28).
Next, operating with a on (2.40) we get

a²(a†)ⁿ = na(a†)^{n−1} + a(a†)ⁿa = n[(n−1)(a†)^{n−2} + (a†)^{n−1}a] + a(a†)ⁿa
 = n(n−1)(a†)^{n−2} + n(a†)^{n−1}a + a(a†)ⁿa.   (2.42)

Operating with another a on (2.42), we get

a³(a†)ⁿ = n(n−1)a(a†)^{n−2} + na(a†)^{n−1}a + a²(a†)ⁿa
 = n(n−1)[(n−2)(a†)^{n−3} + (a†)^{n−2}a] + na(a†)^{n−1}a + a²(a†)ⁿa
 = n(n−1)(n−2)(a†)^{n−3} + n(n−1)(a†)^{n−2}a + na(a†)^{n−1}a + a²(a†)ⁿa
 = ⋯ .   (2.43)

To generalize the above procedures, operating with a on (2.40) m (<n) times we get

aᵐ(a†)ⁿ = n(n−1)(n−2)⋯(n−m+1)(a†)^{n−m} + f(a, a†)a,   (2.44)

where m < n and f(a, a†) is a polynomial in a† that has powers of a as coefficients. Further operating with ⟨ψ₀| and |ψ₀⟩ from the left and right on both sides of (2.44), respectively, we have

⟨ψ₀|aᵐ(a†)ⁿ|ψ₀⟩ = n(n−1)(n−2)⋯(n−m+1)⟨ψ₀|(a†)^{n−m}|ψ₀⟩ + ⟨ψ₀|f(a, a†)aψ₀⟩
 = n(n−1)(n−2)⋯(n−m+1)⟨ψ₀|(a†)^{n−m}|ψ₀⟩.   (2.45)

Note that in (2.45) ⟨ψ₀|f(a, a†)a|ψ₀⟩ vanishes because of (2.28).
Meanwhile, taking the adjoint of (2.28), we have
⟨ψ₀|a† = 0.   (2.46)

Operating with a† (n − m − 1) times from the right on the LHS of (2.46), we have

⟨ψ₀|(a†)^{n−m} = 0.

Further operating with |ψ₀⟩ from the right on the above equation, we get

⟨ψ₀|(a†)^{n−m}|ψ₀⟩ = 0.

Therefore, from (2.45) we get

⟨ψ₀|aᵐ(a†)ⁿ|ψ₀⟩ = 0.   (2.47)

Taking the adjoint of (2.47) once again, we have

⟨ψ₀|aⁿ(a†)ᵐ|ψ₀⟩ = 0.   (2.48)

Equation (2.48) can be obtained by repeatedly using (1.117). From (2.47) and (2.48), we get

⟨ψ₀|aᵐ(a†)ⁿ|ψ₀⟩ = 0  when m ≠ n.   (2.49)

If m = n, from (2.45) we get

⟨ψ₀|aⁿ(a†)ⁿ|ψ₀⟩ = n!⟨ψ₀|ψ₀⟩.   (2.50)

If we assume that |ψ₀⟩ is normalized, i.e., ⟨ψ₀|ψ₀⟩ = 1, (2.50) is expressed as

⟨ψ₀|aⁿ(a†)ⁿ|ψ₀⟩ = n!.   (2.51)

From (2.51), if we put

|ψₙ⟩ = (1/√n!)(a†)ⁿ|ψ₀⟩,   (2.52)

then we have

⟨ψₙ| = (1/√n!)⟨ψ₀|aⁿ.

Thus, from (2.49) and (2.52) we get
⟨ψₘ|ψₙ⟩ = δₘₙ.   (2.53)

At the same time, for cₙ of (2.38) we get

cₙ = 1/√n!.   (2.54)

Notice here that an undetermined phase factor e^{iθ} (θ: real) is intended such that

cₙ = (1/√n!)e^{iθ}.

But e^{iθ} is usually omitted for the sake of simplicity. Thus, we have constructed a series of orthonormal eigenfunctions |ψₙ⟩.
Furthermore, using (2.41) and (2.54) we get

a|ψₙ⟩ = √n |ψ_{n−1}⟩.   (2.55)

From (2.36), we get

H|ψₙ⟩ = (E₀ + nħω)|ψₙ⟩.   (2.56)

Meanwhile, from (2.21) we have

H|ψₙ⟩ = (ħωa†a + E₀)|ψₙ⟩.   (2.57)

Equating the RHSs of (2.56) and (2.57), we get

a†a|ψₙ⟩ = n|ψₙ⟩.   (2.58)

Thus, we find that the integer n is an eigenvalue of a†a when it is evaluated with respect to |ψₙ⟩. For this reason a†a is called the number operator. Notice that a†a is Hermitian, because we have

(a†a)† = a†(a†)† = a†a,   (2.59)

where we used (1.115) and (1.117).
Moreover, from (2.55) and (2.58),

a†a|ψₙ⟩ = √n a†|ψ_{n−1}⟩ = n|ψₙ⟩.   (2.60)

Thus,

a†|ψ_{n−1}⟩ = √n |ψₙ⟩.   (2.61)

Or, replacing n with n + 1, we get

a†|ψₙ⟩ = √(n + 1) |ψ_{n+1}⟩.   (2.62)

As implied in (2.55) and (2.62), we find that operating with a on |ψₙ⟩ lowers the energy level by one, and that operating with a† on |ψₙ⟩ raises the energy level by one. For this reason, a and a† are said to be an annihilation operator and a creation operator, respectively.

2.3 Matrix Representation of Physical Quantities

Equations (2.55) and (2.62) clearly represent the relationship between an operator and an eigenfunction (or eigenvector). The relationship is characterized by

(Matrix) × (Vector) = (Vector).   (2.63)

Thus, we are now in a position to construct this relation using matrices. From (2.53), we should be able to construct basis vectors using column vectors such that

|ψ₀⟩ = \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \\ \vdots \end{pmatrix},  |ψ₁⟩ = \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \\ \vdots \end{pmatrix},  |ψ₂⟩ = \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \\ \vdots \end{pmatrix},  ⋯.   (2.64)

Notice that these vectors form a vector space of infinite dimension. The orthonormal relation (2.53) can easily be checked. We represent a and a† so that (2.55) and (2.62) are satisfied. We obtain

a = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 & \cdots \\ 0 & 0 & \sqrt{2} & 0 & 0 & \cdots \\ 0 & 0 & 0 & \sqrt{3} & 0 & \cdots \\ 0 & 0 & 0 & 0 & 2 & \cdots \\ 0 & 0 & 0 & 0 & 0 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix}.   (2.65)

Similarly,

a^{\dagger} = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 & \cdots \\ 1 & 0 & 0 & 0 & 0 & \cdots \\ 0 & \sqrt{2} & 0 & 0 & 0 & \cdots \\ 0 & 0 & \sqrt{3} & 0 & 0 & \cdots \\ 0 & 0 & 0 & 2 & 0 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix}.   (2.66)

Note that neither a nor a† is Hermitian.
Since the determinant of the matrix of (2.19) is −i/ħ (≠ 0), using its inverse matrix we get

\begin{pmatrix} q \\ p \end{pmatrix} = \begin{pmatrix} \sqrt{\hbar/2m\omega} & \sqrt{\hbar/2m\omega} \\ (1/i)\sqrt{m\hbar\omega/2} & -(1/i)\sqrt{m\hbar\omega/2} \end{pmatrix}\begin{pmatrix} a \\ a^{\dagger} \end{pmatrix}.   (2.67)

That is, we have

q = √(ħ/2mω)(a + a†)  and  p = (1/i)√(mħω/2)(a − a†).   (2.68)

We will deal with inverse matrices in Part III. Note that q and p are both Hermitian. Inserting (2.65) and (2.66) into (2.68), we get

q = \sqrt{\frac{\hbar}{2m\omega}}\begin{pmatrix} 0 & 1 & 0 & 0 & 0 & \cdots \\ 1 & 0 & \sqrt{2} & 0 & 0 & \cdots \\ 0 & \sqrt{2} & 0 & \sqrt{3} & 0 & \cdots \\ 0 & 0 & \sqrt{3} & 0 & 2 & \cdots \\ 0 & 0 & 0 & 2 & 0 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix},   (2.69)

p = i\sqrt{\frac{m\hbar\omega}{2}}\begin{pmatrix} 0 & -1 & 0 & 0 & 0 & \cdots \\ 1 & 0 & -\sqrt{2} & 0 & 0 & \cdots \\ 0 & \sqrt{2} & 0 & -\sqrt{3} & 0 & \cdots \\ 0 & 0 & \sqrt{3} & 0 & -2 & \cdots \\ 0 & 0 & 0 & 2 & 0 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix}.   (2.70)
Equations (2.69) and (2.70) obviously show that q and p are Hermitian. We can derive various physical quantities from these equations. For instance, following matrix algebra, the Hamiltonian H can readily be calculated. The result is expressed as

H = \frac{p^{2}}{2m} + \frac{1}{2}m\omega^{2}q^{2} = \frac{\hbar\omega}{2}\begin{pmatrix} 1 & 0 & 0 & 0 & 0 & \cdots \\ 0 & 3 & 0 & 0 & 0 & \cdots \\ 0 & 0 & 5 & 0 & 0 & \cdots \\ 0 & 0 & 0 & 7 & 0 & \cdots \\ 0 & 0 & 0 & 0 & 9 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix}.   (2.71)

Looking at (2.69)–(2.71), we immediately realize that although neither q nor p is diagonalized, H is diagonalized. The matrix representation of (2.71) is said to be a representation that diagonalizes H. This representation is of great practical use. In fact, using (2.64) we get, e.g.,

H|ψ₂⟩ = \frac{\hbar\omega}{2}\begin{pmatrix} 1 & 0 & 0 & 0 & 0 & \cdots \\ 0 & 3 & 0 & 0 & 0 & \cdots \\ 0 & 0 & 5 & 0 & 0 & \cdots \\ 0 & 0 & 0 & 7 & 0 & \cdots \\ 0 & 0 & 0 & 0 & 9 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix}\begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \\ 0 \\ \vdots \end{pmatrix} = \frac{\hbar\omega}{2}\begin{pmatrix} 0 \\ 0 \\ 5 \\ 0 \\ 0 \\ \vdots \end{pmatrix} = \frac{5\hbar\omega}{2}\begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \\ 0 \\ \vdots \end{pmatrix} = \frac{5\hbar\omega}{2}|ψ₂⟩ = \left(\frac{1}{2} + 2\right)\hbar\omega|ψ₂⟩.   (2.72)

This clearly means that the second-excited state |ψ₂⟩ has an eigenenergy (1/2 + 2)ħω. More generally, we find that |ψₙ⟩ has an eigenenergy (1/2 + n)ħω, as already shown in (2.36) and (2.56).
Furthermore, let us confirm the canonical commutation relation of (1.140). Using (2.68), we have

qp − pq = (1/i)√(ħ/2mω)√(mħω/2)[(a + a†)(a − a†) − (a − a†)(a + a†)]
 = (ħ/2i) ⋅ (−2)(aa† − a†a) = iħ[a, a†] = iħ = iħE,   (2.73)

where with the second last equality we used (2.24) and the identity matrix E of infinite dimension is given by
E = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & \cdots \\ 0 & 1 & 0 & 0 & 0 & \cdots \\ 0 & 0 & 1 & 0 & 0 & \cdots \\ 0 & 0 & 0 & 1 & 0 & \cdots \\ 0 & 0 & 0 & 0 & 1 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix}.   (2.74)

Thus, the canonical commutation relation holds for the quantum-mechanical harmonic oscillator. This can be confirmed directly from (2.69) and (2.70); the proof is left for readers as an exercise (a numerical sketch follows below).
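As a numerical counterpart of this exercise (our sketch, not the book's worked solution; units ħ = m = ω = 1), one can build truncated N × N versions of (2.65)–(2.70) and check (2.71) and (2.73). In a finite truncation the relations fail only near the bottom-right corner, an artifact of cutting off the infinite matrices:

import numpy as np

N = 8
hbar = m = omega = 1.0
a = np.diag(np.sqrt(np.arange(1, N)), k=1)      # annihilation operator, (2.65)
ad = a.conj().T                                 # creation operator, (2.66)

q = np.sqrt(hbar / (2 * m * omega)) * (a + ad)            # (2.68)
p = np.sqrt(m * hbar * omega / 2) * (a - ad) / 1j         # (2.68)

H = p @ p / (2 * m) + 0.5 * m * omega**2 * q @ q
print(np.round(np.diag(H).real[:-1], 6))        # 0.5, 1.5, 2.5, ... as in (2.71)

C = q @ p - p @ q                               # should equal i*hbar*E, (2.73)
print(np.round(np.diag(C / (1j * hbar)).real, 6))  # 1, ..., 1, -(N-1) at the corner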

2.4 Coordinate Representation of Schrödinger Equation

The Schrödinger equation has been given in (1.55) or (2.11) in a SOLDE form. In contrast to the matrix representation, (1.55) and (2.11) are said to be the coordinate representation of the Schrödinger equation. Now, let us derive a coordinate representation of (2.28) to obtain an analytical solution.
On the basis of (1.31) and (2.17), a is expressed as

a = √(mω/2ħ) q + (i/√(2mħω)) p = √(mω/2ħ) q + (i/√(2mħω))(ħ/i)∂/∂q = √(mω/2ħ) q + √(ħ/2mω) ∂/∂q.   (2.75)

Thus, (2.28) reads as the following first-order linear differential equation (FOLDE):

[√(mω/2ħ) q + √(ħ/2mω) ∂/∂q]ψ₀(q) = 0,   (2.76)

or

√(mω/2ħ) qψ₀(q) + √(ħ/2mω) ∂ψ₀(q)/∂q = 0.   (2.77)

This is further reduced to

∂ψ₀(q)/∂q + (mω/ħ)qψ₀(q) = 0.   (2.78)

From this FOLDE form, we anticipate the following solution:

ψ₀(q) = N₀e^{−αq²},   (2.79)

where N₀ is a normalization constant and α is a constant coefficient. Putting (2.79) into (2.78), we have

[−2αq + (mω/ħ)q]N₀e^{−αq²} = 0.   (2.80)

Hence, we get

α = mω/2ħ.   (2.81)

Thus,

ψ₀(q) = N₀e^{−(mω/2ħ)q²}.   (2.82)

The constant N₀ can be determined from the normalization condition given by

∫_{−∞}^{∞} |ψ₀(q)|² dq = 1  or  N₀² ∫_{−∞}^{∞} e^{−(mω/ħ)q²} dq = 1.   (2.83)

Recalling the following formula:

∫_{−∞}^{∞} e^{−cq²} dq = √(π/c)  (c > 0),   (2.84)

we have

∫_{−∞}^{∞} e^{−(mω/ħ)q²} dq = √(πħ/mω).   (2.85)
To get (2.84), putting I ≡ ∫_{−∞}^{∞} e^{−cq²} dq, we have

I² = ∫_{−∞}^{∞} e^{−cq²} dq ∫_{−∞}^{∞} e^{−cs²} ds = ∫∫ e^{−c(q²+s²)} dq ds
 = ∫_{0}^{2π} dθ ∫_{0}^{∞} e^{−cr²} r dr = (1/2) ∫_{0}^{2π} dθ ∫_{0}^{∞} e^{−cR} dR = π/c,   (2.86)

where with the third equality we converted the two-dimensional Cartesian coordinates to polar coordinates; take q = r cos θ, s = r sin θ and convert the infinitesimal area element dq ds to dr ⋅ r dθ. With the second last equality of (2.86), we used the variable transformation r² → R. Hence, we get I = √(π/c).
Thus, we get

N₀ = (mω/πħ)^{1/4}  and  ψ₀(q) = (mω/πħ)^{1/4} e^{−(mω/2ħ)q²}.   (2.87)

Also, we have

a† = √(mω/2ħ) q − (i/√(2mħω)) p = √(mω/2ħ) q − (i/√(2mħω))(ħ/i)∂/∂q = √(mω/2ħ) q − √(ħ/2mω) ∂/∂q.   (2.88)

From (2.52), we get

ψₙ(q) = (1/√n!)(a†)ⁿ|ψ₀⟩ = (1/√n!)[√(mω/2ħ) q − √(ħ/2mω) ∂/∂q]ⁿ ψ₀(q)
 = (1/√n!)(mω/2ħ)^{n/2}[q − (ħ/mω)∂/∂q]ⁿ ψ₀(q).   (2.89)

Putting

β ≡ √(mω/ħ)  and  ξ = βq,   (2.90)

we rewrite (2.89) as

ψₙ(q) = ψₙ(ξ/β) = (1/√n!)(mω/2ħβ²)^{n/2}(ξ − ∂/∂ξ)ⁿ ψ₀(q)
 = (1/√n!)(1/2)^{n/2}(ξ − ∂/∂ξ)ⁿ ψ₀(q) = √(1/2ⁿn!) (mω/πħ)^{1/4}(ξ − ∂/∂ξ)ⁿ e^{−ξ²/2}.   (2.91)

Comparing (2.81) and (2.90), we have

α = β²/2.   (2.92)

Moreover, putting

Nₙ ≡ √(1/2ⁿn!) (mω/πħ)^{1/4} = √(β/(π^{1/2}2ⁿn!)),   (2.93)

we get

ψₙ(ξ/β) = Nₙ(ξ − ∂/∂ξ)ⁿ e^{−ξ²/2}.   (2.94)

We have to normalize (2.94) with respect to the variable ξ. Since ψₙ(q) has already been normalized as in (2.53), we have

∫_{−∞}^{∞} |ψₙ(q)|² dq = 1.   (2.95)

Changing the variable q to ξ, we have

(1/β) ∫_{−∞}^{∞} |ψₙ(ξ/β)|² dξ = 1.   (2.96)

Let us define ψ̃ₙ(ξ) as being normalized with respect to ξ. In other words, ψₙ(q) is converted to ψ̃ₙ(ξ) by means of the variable transformation and a concomitant change in the normalization condition. Then, we have

∫_{−∞}^{∞} |ψ̃ₙ(ξ)|² dξ = 1.   (2.97)

Comparing (2.96) and (2.97), if we define ψ̃ₙ(ξ) as

ψ̃ₙ(ξ) ≡ √(1/β) ψₙ(ξ/β),   (2.98)

then ψ̃ₙ(ξ) should be a properly normalized function. Thus, we get

ψ̃ₙ(ξ) = Ñₙ(ξ − ∂/∂ξ)ⁿ e^{−ξ²/2}  with  Ñₙ ≡ √(1/(π^{1/2}2ⁿn!)).   (2.99)

Meanwhile, according to the theory of classical orthogonal polynomials, the Hermite polynomials Hₙ(x) are defined as [3]

Hₙ(x) ≡ (−1)ⁿ e^{x²} (dⁿ/dxⁿ) e^{−x²}  (n ≥ 0),   (2.100)

where Hₙ(x) is an n-th order polynomial. We wish to show the following relation on the basis of mathematical induction:

ψ̃ₙ(ξ) = ÑₙHₙ(ξ)e^{−ξ²/2}.   (2.101)

Comparing (2.87), (2.98), and (2.99), we make sure that (2.101) holds with n = 0. When n = 1, from (2.99) we have

ψ̃₁(ξ) = Ñ₁(ξ − ∂/∂ξ)e^{−ξ²/2} = Ñ₁[ξe^{−ξ²/2} − (−ξ)e^{−ξ²/2}] = Ñ₁ ⋅ 2ξe^{−ξ²/2}
 = Ñ₁(−1)e^{ξ²}(d e^{−ξ²}/dξ)e^{−ξ²/2} = Ñ₁H₁(ξ)e^{−ξ²/2}.   (2.102)

Then, (2.101) holds with n = 1 as well.
Next, from the supposition of mathematical induction we assume that (2.101) holds with n. Then, we have

ψ̃_{n+1}(ξ) = Ñ_{n+1}(ξ − ∂/∂ξ)^{n+1} e^{−ξ²/2} = [1/√(2(n+1))] Ñₙ(ξ − ∂/∂ξ)(ξ − ∂/∂ξ)ⁿ e^{−ξ²/2}
 = [1/√(2(n+1))] (ξ − ∂/∂ξ)[ÑₙHₙ(ξ)e^{−ξ²/2}]
 = [1/√(2(n+1))] Ñₙ(−1)ⁿ(ξ − ∂/∂ξ)[e^{ξ²/2}(dⁿ/dξⁿ)e^{−ξ²}]
 = [1/√(2(n+1))] Ñₙ(−1)ⁿ[ξe^{ξ²/2}(dⁿ/dξⁿ)e^{−ξ²} − ξe^{ξ²/2}(dⁿ/dξⁿ)e^{−ξ²} − e^{ξ²/2}(d^{n+1}/dξ^{n+1})e^{−ξ²}]
 = [1/√(2(n+1))] Ñₙ(−1)^{n+1} e^{ξ²/2}(d^{n+1}/dξ^{n+1})e^{−ξ²}
 = Ñ_{n+1}(−1)^{n+1} e^{ξ²}[(d^{n+1}/dξ^{n+1})e^{−ξ²}] e^{−ξ²/2} = Ñ_{n+1}H_{n+1}(ξ)e^{−ξ²/2},   (2.103)

where we used Ñ_{n+1} = Ñₙ/√(2(n+1)) and, in the third equality, the induction hypothesis together with Hₙ(ξ)e^{−ξ²/2} = (−1)ⁿe^{ξ²/2}(dⁿ/dξⁿ)e^{−ξ²}.
This means that (2.101) holds with n + 1 as well. Thus, it follows that (2.101) is true for n being zero or any positive integer.
The orthogonality relation reads as
Table 2.1 First six Hermite polynomials

H₀(x) = 1
H₁(x) = 2x
H₂(x) = 4x² − 2
H₃(x) = 8x³ − 12x
H₄(x) = 16x⁴ − 48x² + 12
H₅(x) = 32x⁵ − 160x³ + 120x

∫_{−∞}^{∞} ψ̃ₘ(ξ)* ψ̃ₙ(ξ) dξ = δₘₙ.   (2.104)

Placing (2.98) back into the function form ψₙ(q), we have

ψₙ(q) = √β ψ̃ₙ(βq).   (2.105)

Using (2.101) and explicitly rewriting (2.105), we get

ψₙ(q) = (mω/πħ)^{1/4} √(1/2ⁿn!) Hₙ(√(mω/ħ) q) e^{−(mω/2ħ)q²}  (n = 0, 1, 2, ⋯).   (2.106)

We tabulate the first six Hermite polynomials Hₙ(x) in Table 2.1, where the index n represents the highest order of the polynomial. In Table 2.1 we see that even functions and odd functions appear alternately (i.e., parity). This is the case with ψₙ(q) as well, because ψₙ(q) is a product of Hₙ(√(mω/ħ) q) and an even function e^{−(mω/2ħ)q²} (see the short sketch below).

Combining (2.101) and (2.104), the orthogonality relation between the ψ̃ₙ(ξ) (n = 0, 1, 2, ⋯) can be described alternatively as [3]

∫_{−∞}^{∞} e^{−ξ²} Hₘ(ξ)Hₙ(ξ) dξ = √π 2ⁿn! δₘₙ.   (2.107)

Note that Hₘ(ξ) is a real function, and so Hₘ(ξ)* = Hₘ(ξ). The relation (2.107) is well known as the orthogonality of the Hermite polynomials with e^{−ξ²} taken as a weight function [3]. Here the weight function is a real and non-negative function within the domain considered [e.g., (−∞, +∞) in the present case] and is independent of the indices m and n. We will deal with it again in Sect. 8.4.
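A direct numerical check of (2.107) (our sketch; Gauss–Hermite quadrature from NumPy integrates polynomial integrands against the weight e^{−ξ²} exactly):

import math
import numpy as np
from numpy.polynomial.hermite import hermgauss, Hermite

xs, ws = hermgauss(30)                 # nodes and weights for the weight e^{-xi^2}
H = lambda n, x: Hermite.basis(n)(x)   # physicists' Hermite polynomial H_n

for (m, n) in [(2, 2), (3, 3), (2, 3)]:
    integral = np.sum(ws * H(m, xs) * H(n, xs))
    expected = math.sqrt(math.pi) * 2**n * math.factorial(n) * (m == n)
    print(m, n, integral, expected)    # the two columns agree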
The relation (2.101) and the orthogonality relation (2.107) can be understood more explicitly as follows. From (2.11) we have the Schrödinger equation of a one-dimensional quantum-mechanical harmonic oscillator such that

−(ħ²/2m) d²u(q)/dq² + (1/2)mω²q²u(q) = Eu(q).   (2.108)

Changing the variable as in (2.90), we have

−d²u(ξ)/dξ² + ξ²u(ξ) = (2E/ħω)u(ξ).   (2.109)

Defining a dimensionless parameter

λ ≡ 2E/ħω   (2.110)

and also defining a differential operator D such that

D ≡ −d²/dξ² + ξ²,   (2.111)

we have the following eigenvalue equation:

Du(ξ) = λu(ξ).   (2.112)

We further consider a function v(ξ) such that

u(ξ) = v(ξ)e^{−ξ²/2}.   (2.113)

Then, (2.109) is converted as follows:

[−d²v(ξ)/dξ² + 2ξ dv(ξ)/dξ] e^{−ξ²/2} = (λ − 1)v(ξ)e^{−ξ²/2}.   (2.114)

Since e^{−ξ²/2} does not vanish for any ξ, we have

−d²v(ξ)/dξ² + 2ξ dv(ξ)/dξ = (λ − 1)v(ξ).   (2.115)

If we define another differential operator D̃ such that

D̃ ≡ −d²/dξ² + 2ξ d/dξ,   (2.116)

we have another eigenvalue equation

D̃v(ξ) = (λ − 1)v(ξ).   (2.117)

Meanwhile, we have the following well-known differential equation:

d²Hₙ(ξ)/dξ² − 2ξ dHₙ(ξ)/dξ + 2nHₙ(ξ) = 0.   (2.118)

This equation is called the Hermite differential equation. Using (2.116), (2.118) can be recast as an eigenvalue equation such that

D̃Hₙ(ξ) = 2nHₙ(ξ).   (2.119)

Therefore, comparing (2.115) and (2.118) and putting

λ = 2n + 1,   (2.120)

we get

v(ξ) = cHₙ(ξ),   (2.121)

where c is an arbitrary constant. Thus, using (2.113), for a solution of (2.109) we get

uₙ(ξ) = cHₙ(ξ)e^{−ξ²/2},   (2.122)

where the solution u(ξ) is indexed with n. From (2.110), as an energy eigenvalue we have

Eₙ = (n + 1/2)ħω.

Thus, (2.37) is recovered. The normalization constant c of (2.122) can be decided as in (2.106).
As discussed above, the operator representation and the coordinate representation are fully consistent.

2.5 Variance and Uncertainty Principle

The uncertainty principle is one of the most fundamental concepts of quantum mechanics. To think about this conception on the basis of a quantum harmonic oscillator, let us introduce a variance operator [4]. Let A be a physical quantity and let ⟨A⟩ be its expectation value as defined in (1.126). We define a variance operator as

⟨(ΔA)²⟩,

where we have

ΔA ≡ A − ⟨A⟩.   (2.123)

In (2.123), we assume that ⟨A⟩ is obtained by operating with A on a certain physical state |ψ⟩. Then, we have

⟨(ΔA)²⟩ = ⟨(A − ⟨A⟩)²⟩ = ⟨A² − 2⟨A⟩A + ⟨A⟩²⟩ = ⟨A²⟩ − ⟨A⟩².   (2.124)

If A is Hermitian, ΔA is Hermitian as well. This is because

(ΔA)† = A† − ⟨A⟩* = A − ⟨A⟩ = ΔA,   (2.125)

where we used the fact that an expectation value of an Hermitian operator is real. Then, ⟨(ΔA)²⟩ is non-negative, as in the case of (2.13). Moreover, if |ψ⟩ is an eigenstate of A, ⟨(ΔA)²⟩ = 0. Therefore, ⟨(ΔA)²⟩ represents a measure of how widely measured values are dispersed when A is measured in reference to a quantum state |ψ⟩. Also, we define a standard deviation δA as

δA ≡ √⟨(ΔA)²⟩.   (2.126)
We have the following important theorem on the standard deviation δA [4].

Theorem 2.1 Let A and B be Hermitian operators. If A and B satisfy

[A, B] = ik  (k: nonzero real number),   (2.127)

then we have

δA ⋅ δB ≥ |k|/2   (2.128)

in reference to any quantum state |ψ⟩.

Proof We have

[ΔA, ΔB] = [A − ⟨ψ|A|ψ⟩, B − ⟨ψ|B|ψ⟩] = [A, B] = ik.   (2.129)

In (2.129), we used the fact that ⟨ψ|A|ψ⟩ and ⟨ψ|B|ψ⟩ are just real numbers, which commute with any operator. Next, we calculate the following quantity in relation to a real number λ:

‖(ΔA + iλΔB)|ψ⟩‖² = ⟨ψ|(ΔA − iλΔB)(ΔA + iλΔB)|ψ⟩ = ⟨ψ|(ΔA)²|ψ⟩ − kλ + ⟨ψ|(ΔB)²|ψ⟩λ².   (2.130)

where we used the fact that ΔA and ΔB are Hermitian. For the above quadratic form in λ to be non-negative for any real number λ, we must have

(−k)² − 4⟨ψ|(ΔA)²|ψ⟩⟨ψ|(ΔB)²|ψ⟩ ≤ 0.   (2.131)

Thus, (2.128) follows. ∎


On the basis of Theorem 2.1, we find that both δA and δB are positive on
condition that (2.127) holds. We have another important theorem.
Theorem 2.2 Let A be an Hermitian operator. The necessary and sufficient condi-
tion for a physical state jψ 0i to be an eigenstate of A is δA ¼ 0.
Proof Suppose that |ψ₀⟩ is a normalized eigenstate of A that belongs to an eigenvalue a. Then, we have

⟨ψ₀|A²ψ₀⟩ = a⟨ψ₀|Aψ₀⟩ = a²⟨ψ₀|ψ₀⟩ = a²,  ⟨ψ₀|Aψ₀⟩² = a²⟨ψ₀|ψ₀⟩² = a².   (2.132)

From (2.124) and (2.126), we have

⟨ψ₀|(ΔA)²ψ₀⟩ = 0,  i.e.,  δA = 0.   (2.133)

Note that δA is measured in reference to |ψ₀⟩. Conversely, suppose that δA = 0. Then,

δA = √⟨ψ₀|(ΔA)²ψ₀⟩ = √⟨ΔAψ₀|ΔAψ₀⟩ = ‖ΔAψ₀‖,   (2.134)

where we used the fact that ΔA is Hermitian. From the definition of the norm in (1.121), for δA = 0 to hold, we have

ΔAψ₀ = (A − ⟨A⟩)ψ₀ = 0,  i.e.,  Aψ₀ = ⟨A⟩ψ₀.   (2.135)

This indicates that ψ₀ is an eigenstate of A that belongs to the eigenvalue ⟨A⟩. This completes the proof. ∎
Theorem 2.1 implies that (2.127) holds for any physical state |ψ⟩. That is, we must have δA > 0 and δB > 0 if δA and δB are evaluated in reference to any |ψ⟩ under the condition that (2.127) holds. From Theorem 2.2, in turn, it follows that eigenstates cannot exist for A or B under the condition of (2.127).
To show this explicitly, we take an inner product of (2.127). That is, with Hermitian operators A and B, consider the following inner product:

⟨ψ|[A, B]|ψ⟩ = ⟨ψ|ik|ψ⟩,  i.e.,  ⟨ψ|AB − BA|ψ⟩ = ik,   (2.136)


where we assumed that |ψ⟩ is an arbitrarily chosen normalized vector. Suppose now that |ψ₀⟩ is an eigenstate of A that belongs to an eigenvalue a. Then, we have

A|ψ₀⟩ = a|ψ₀⟩.   (2.137)

Taking the adjoint of (2.137), we get

⟨ψ₀|A† = ⟨ψ₀|A = a*⟨ψ₀| = a⟨ψ₀|,   (2.138)

where the last equality comes from the fact that A is Hermitian (so that a is real). From (2.138), we would have

⟨ψ₀|AB − BA|ψ₀⟩ = ⟨ψ₀|AB|ψ₀⟩ − ⟨ψ₀|BA|ψ₀⟩
 = a⟨ψ₀|B|ψ₀⟩ − a⟨ψ₀|B|ψ₀⟩ = 0.

This would imply that (2.136) does not hold with |ψ₀⟩, in contradiction to (2.127), where ik ≠ 0. Namely, we conclude that no physical state can be an eigenstate of A under the condition that (2.127) holds. Equation (2.127) is rewritten as

⟨ψ|BA − AB|ψ⟩ = −ik.   (2.139)

Suppose now that |φ₀⟩ is an eigenstate of B that belongs to an eigenvalue b. Then, we can similarly show that no physical state can be an eigenstate of B.
Summarizing the above, we restate that once we have a relation [A, B] = ik (k ≠ 0), a representation matrix cannot diagonalize A or B. Or, once we postulate [A, B] = ik (k ≠ 0), we must abandon any effort to have a representation matrix that diagonalizes A and B. In the quantum-mechanical formulation of a harmonic oscillator, we have introduced the canonical commutation relation (see Sect. 2.3) described by [q, p] = iħ (1.140). Indeed, neither q nor p is diagonalized, as shown in (2.69) or (2.70).
Example 2.1 Taking a quantum harmonic oscillator as an example, we consider the variances of q and p in reference to |ψₙ⟩ (n = 0, 1, ⋯). We have

⟨(Δq)²⟩ = ⟨ψₙ|q²|ψₙ⟩ − ⟨ψₙ|q|ψₙ⟩².   (2.140)

Using (2.55) and (2.62) as well as (2.68), we get

⟨ψₙ|q|ψₙ⟩ = √(ħ/2mω) ⟨ψₙ|(a + a†)ψₙ⟩ = √(ħn/2mω) ⟨ψₙ|ψ_{n−1}⟩ + √(ħ(n+1)/2mω) ⟨ψₙ|ψ_{n+1}⟩ = 0,   (2.141)

where the last equality comes from (2.53). We have

q² = (ħ/2mω)(a + a†)² = (ħ/2mω)[a² + E + 2a†a + (a†)²],   (2.142)

where E denotes an identity operator and we used (2.24) along with the following relation:

aa† = aa† − a†a + a†a = [a, a†] + a†a = E + a†a.   (2.143)

Using (2.55) and (2.62), we have

⟨ψₙ|q²|ψₙ⟩ = (ħ/2mω)[⟨ψₙ|ψₙ⟩ + 2⟨ψₙ|a†aψₙ⟩] = (ħ/2mω)(2n + 1),   (2.144)

where we used (2.60) with the last equality. Thus, we get

⟨(Δq)²⟩ = ⟨ψₙ|q²|ψₙ⟩ − ⟨ψₙ|q|ψₙ⟩² = (ħ/2mω)(2n + 1).

Following procedures similar to those mentioned above, we get

⟨ψₙ|p|ψₙ⟩ = 0  and  ⟨ψₙ|p²|ψₙ⟩ = (mħω/2)(2n + 1).   (2.145)

Thus, we get

⟨(Δp)²⟩ = ⟨ψₙ|p²|ψₙ⟩ = (mħω/2)(2n + 1).

Accordingly, we have

δq ⋅ δp = √⟨(Δq)²⟩ √⟨(Δp)²⟩ = (ħ/2)(2n + 1) ≥ ħ/2.   (2.146)

The quantity δq ⋅ δp is equal to ħ/2 for n = 0 and becomes larger with increasing n.
The above example gives a good illustration of Theorem 2.1. Note that, putting A = q and B = p along with k = ħ in Theorem 2.1, we should have from (1.140)

δq ⋅ δp ≥ ħ/2.

This is indeed the case with (2.146) for the quantum-mechanical harmonic oscillator. This example represents the uncertainty principle more generally.
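The relation (2.146) can also be read off numerically from the truncated matrices of Sect. 2.3 (our sketch; units ħ = m = ω = 1; only states far below the truncation edge are trustworthy):

import numpy as np

N = 30
a = np.diag(np.sqrt(np.arange(1, N)), k=1)
ad = a.conj().T
q = (a + ad) / np.sqrt(2)               # (2.68) with hbar = m = omega = 1
p = (a - ad) / (1j * np.sqrt(2))

for n in range(4):
    psi = np.zeros(N, dtype=complex); psi[n] = 1.0    # |psi_n>
    var_q = np.vdot(psi, q @ q @ psi).real - np.vdot(psi, q @ psi).real ** 2
    var_p = np.vdot(psi, p @ p @ psi).real - np.vdot(psi, p @ psi).real ** 2
    print(n, np.sqrt(var_q * var_p))    # (2n + 1)/2 = 0.5, 1.5, 2.5, 3.5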
In relation to the aforementioned argument, we might well wonder whether Examples 1.1 and 1.2 have an eigenstate of a fixed momentum. Suppose that we chose for an eigenstate y(x) = ce^{ikx}, where c is a constant. Then, we would have (ħ/i)∂y(x)/∂x = ħky(x) and would get an eigenvalue ħk for the momentum. Nonetheless, such a y(x) does not satisfy the proper BCs; i.e., y(L) = y(−L) = 0. This is because e^{ikx} never vanishes for any real number k or x (or any complex number k or x, more generally). Thus, we cannot obtain a proper solution that has an eigenstate with a fixed momentum in a confined physical system.

References

1. Messiah A (1999) Quantum mechanics. Dover, New York


2. Stakgold I (1998) Green’s functions and boundary value problems, 2nd edn. Wiley, New York
3. Lebedev NN (1972) Special functions and their applications. Dover, New York
4. Shimizu A (2003) Upgrade foundation of quantum theory (in Japanese). Saiensu-sha, Tokyo
Chapter 3
Hydrogen-Like Atoms

In the history of quantum mechanics, the theory was first successfully applied to the motion of an electron in a hydrogen atom, along with the harmonic oscillator. Unlike the case of the one-dimensional harmonic oscillator we dealt with in Chap. 2, however, with a hydrogen atom we have to consider the three-dimensional motion of an electron. Accordingly, it takes somewhat elaborate calculations to constitute the Hamiltonian.
The calculation procedures themselves, however, are worth following to understand
underlying basic concepts of the quantum mechanics. At the same time, this chapter
is a treasure of special functions. In Chap. 2, we have already encountered one of
them, i.e., Hermite polynomials. Here we will deal with Legendre polynomials,
associated Legendre polynomials, etc. These special functions arise when we deal
with a physical system having, e.g., the spherical symmetry. In a hydrogen atom, an
electron is moving in a spherically symmetric Coulomb potential field produced by a
proton. This topic provides us with a good opportunity to study various special
functions. The related Schrödinger equation can be separated into an angular part
and a radial part. The solutions of angular parts are characterized by spherical
(surface) harmonics. The (associated) Legendre functions are correlated to them.
The solutions of the radial part are connected to the (associated) Laguerre poly-
nomials. The exact solutions are obtained by the product of the (associated) Legen-
dre functions and (associated) Laguerre polynomials accordingly. Thus, to study the
characteristics of hydrogen-like atoms from the quantum-mechanical perspective is
of fundamental importance.

3.1 Introductory Remarks

The motion of the electron in hydrogen is well known as a two-particle problem


(or two-body problem) in a central force field. In that case, the coordinate system of
the physical system is separated into the relative coordinates and center-of-mass
coordinates. To be more specific, the coordinate separation is true of the case where

© Springer Nature Singapore Pte Ltd. 2020 57


S. Hotta, Mathematical Physical Chemistry,
https://doi.org/10.1007/978-981-15-2225-3_3
58 3 Hydrogen-Like Atoms

two particles are moving under control only by a force field between the two
particles without other external force fields [1].
In the classical mechanics, equation of motion is separated into two equations
related to the relative coordinates and center-of-mass coordinates accordingly. Of
these, a term of the potential field is only included in the equation of motion with
respect to the relative coordinates.
The situation is the same with the quantum mechanics. Namely, the Schrödinger
equation of motion with the relative coordinates is expressed as an eigenvalue
equation that reads as
 
ħ2
 ∇2 þ V ðr Þ ψ ¼ Eψ, ð3:1Þ

where μ is a reduced mass of two particles [1], i.e., an electron and a proton; V(r) is a
potential with r being a distance between the electron and proton. In (3.1), we
assume the spherically symmetric potential; i.e., the potential is expressed only as
a function of the distance r. Moreover, if the potential is coulombic,
 
ħ2 2 e2
 ∇  ψ ¼ Eψ, ð3:2Þ
2μ 4πε0 r

where ε0 is permittivity of vacuum and e is an elementary charge.


If we think of hydrogen-like atoms such as He+, Li2+ and Be3+, we have an
equation described as
 
ħ2 Ze2
 ∇2  ψ ¼ Eψ, ð3:3Þ
2μ 4πε0 r

where Z is an atomic number and μ is a reduced mass of an electron and a nucleus


pertinent to the atomic (or ionic) species. We start with (3.3) in this chapter.

3.2 Constitution of Hamiltonian

As explicitly described in (3.3), the coulombic potential has a spherical symmetry. In


such a case, it will be convenient to recast (3.3) in a spherical coordinate (or polar
coordinate). As the physical system is of three-dimensional, we have to consider
orbital angular momentum L in Hamiltonian.
We have
3.2 Constitution of Hamiltonian 59

0 1
Lx
B C
L ¼ ðe1 e2 e3 Þ@ Ly A, ð3:4Þ
Lz

where e1, e2, and e3 denote an orthonormal basis vectors in a three-dimensional


Cartesian space (ℝ3); Lx, Ly, and Lz represent each component of L. The angular
momentum L is expressed in a form of determinant as
 
 e1 e2 e3 

 z ,
L¼xp¼ x y
 
 px py pz 

where x denotes a position vector with respect to the relative coordinates x, y, and z.
That is,
0 1
x
B C
x ¼ ð e1 e2 e3 Þ @ y A: ð3:5Þ
z

The quantity p denotes a momentum of an electron (as a particle carrying a


reduced mass μ) with px, py, and pz being their components; p is denoted similarly to
the above.
As for each component of L, we have, e.g.,

Lx ¼ ypz  zpy : ð3:6Þ

To calculate L2, we estimate Lx2, Ly2, and Lz2 separately. We have


   
Lx 2 ¼ ypz  zpy  ypz  zpy
¼ ypz ypz  ypz zpy  zpy ypz  zpy zpy
¼ y2 pz 2  ypz zpy  zpy ypz þ z2 py 2
   
¼ y2 pz 2  y zpz  iħ py  z ypy  iħ pz þ z2 py 2
 
¼ y2 pz 2 þ z2 py 2  yzpz py  zypy pz þ iħ ypy þ zpz ,

where we have used canonical commutation relation (1.140) in the second to the last
equality. In the above calculations, we used commutability of, e.g., y and pz; z and py.
For example, we have
60 3 Hydrogen-Like Atoms

   
ħ ∂ ∂ ħ ∂jψi ∂jψi
pz , y j ψi ¼ yy j ψi ¼ y y ¼ 0:
i ∂z ∂z i ∂z ∂z

Since jψi is arbitrarily chosen, this relation implies that pz and y commute. We
obtain similar relations regarding Ly2 and Lz2 as well. Thus, we have

L2 ¼ Lx 2 þ Ly 2 þ Lz 2
 
¼ y2 pz 2 þ z2 py 2 þ z2 px 2 þ x2 pz 2 þ x2 py 2 þ y2 px 2
 
þ x 2 px 2  x 2 px 2 þ y 2 py 2  y 2 py 2 þ z 2 pz 2  z 2 pz 2
 
 yzpz py þ zypy pz þ zxpx pz þ xzpz px þ xypy px þ yxpx py
 
þiħ ypy þ zpz þ zpz þ xpx þ xpx þ ypy

¼ y 2 pz 2 þ z 2 py 2 þ z 2 px 2 þ x 2 pz 2 þ x 2 py 2 þ y 2 px 2

þx2 px 2 þ y2 py 2 þ z2 pz 2 Þ  x2 px 2 þ y2 py 2 þ z2 pz 2

þyzpz py þ zypy pz þ zxpx pz þ xzpz px þ xypy px þ yxpx py Þ


 
þiħ ypy þ zpz þ zpz þ xpx þ xpx þ ypy

¼ r2  p2  rðr  pÞ  p þ 2iħðr  pÞ: ð3:7Þ

In (3.7), we are able to ease the calculations by virtue of putting a term


(x2px2  x2px2 + y2py2  y2py2 + z2pz2  z2pz2). As a result, for the second term
after the second to the last equality we have

 x 2 px 2 þ y 2 py 2 þ z 2 pz 2

þyzpz py þ zypy pz þ zxpx pz þ xzpz px þ xypy px þ yxpx py Þ


     
¼  x xpx þ ypy þ zpz px þ y xpx þ ypy þ zpz py þ z xpx þ ypy þ zpz pz

¼ rðr  pÞ  p:

The calculations of r2  p2 [the first term of (3.7)] and r  p (in the third term) are
straightforward.
In a spherical coordinate, momentum p is expressed as

p = pr eðrÞ þ pθ eðθÞ þ pϕ eðϕÞ , ð3:8Þ

where pr, pθ, and pϕ are components of p; e(r), e(θ), and e(ϕ) are orthonormal basis
vectors of ℝ3 in the direction of increasing r, θ, and ϕ, respectively (see Fig. 3.1). In
3.2 Constitution of Hamiltonian 61

(a) z (b)

e(r) z
e(r)

e(φ ) e(θ )
e(φ )
θ e(θ ) θ
O y
O
φ

Fig. 3.1 Spherical coordinate system and orthonormal basis set. (a) Orthonormal basis vectors e(r),
e(θ), and e(ϕ) in ℝ3. (b) The basis vector e(ϕ) is perpendicular to the plane shaped by the z-axis and a
straight line of y ¼ x tan ϕ

Fig. 3.1b, e(ϕ) is perpendicular to the plane shaped by the z-axis and a straight line of
y ¼ x tan ϕ. Notice that the said plane is spanned by e(r) and e(θ). Meanwhile, the
momentum operator is expressed as [2]

ħ
p¼ —
i
 
ħ ðrÞ ∂ ðθ Þ 1 ∂ ðϕÞ 1 ∂
¼ e þe þe : ð3:9Þ
i ∂r r ∂θ r sin θ ∂ϕ

The vector notation of (3.9) corresponds to (1.31). That is, in the Cartesian
coordinate, we have
 
ħ ħ ∂ ∂ ∂
p= —= e1 þ e2 þ e3 ,
i i ∂x ∂y ∂z

where — is said to be nabla (or del), a kind of differential vector operator.


Noting that

r = reðrÞ , ð3:10Þ

and using (3.9), we have


62 3 Hydrogen-Like Atoms

 
ħ ħ ∂
r  p=r  — =r : ð3:11Þ
i i ∂r

Hence,
    2
ħ ∂ ħ ∂ ∂
rðr  pÞ  p = r r ¼ ħ2 r 2 2 : ð3:12Þ
i ∂r i ∂r ∂r

Thus, we have

2  
∂ 2 ∂ 2 ∂ 2 ∂
L 2 ¼ r 2 p2 þ ħ2 r 2 þ 2ħ r ¼ r 2 2
p þ ħ r : ð3:13Þ
∂r
2 ∂r ∂r ∂r

Therefore,
 
ħ2 ∂ 2 ∂ L2
p ¼ 2
2
r þ 2: ð3:14Þ
r ∂r ∂r r

Notice here that L2 does not contain r (vide infra); i.e., L2 commutes with r2, and
so it can freely be divided by r2. Thus, the Hamiltonian H is represented by

p2
H¼ þ V ðr Þ

 2   
1 ħ ∂ ∂ L2 Ze2
¼  2 r2 þ 2  : ð3:15Þ
2μ r ∂r ∂r r 4πε0 r

Thus, the Schrödinger equation can be expressed as


 2   
1 ħ ∂ 2 ∂ L2 Ze2
 2 r þ 2  ψ ¼ Eψ: ð3:16Þ
2μ r ∂r ∂r r 4πε0 r

Now, let us describe L2 in a polar coordinate. The calculation procedures are


somewhat lengthy, but straightforward. First we have
9
x ¼ r sin θ cos ϕ, >
=
y ¼ r sin θ sin ϕ, ð3:17Þ
>
;
z ¼ r cos θ,

where we have 0  θ  π and 0  ϕ  2π. Rewriting (3.17) with respect to r, θ, and


ϕ, we get
3.2 Constitution of Hamiltonian 63

1 9
r ¼ ð x2 þ y2 þ z 2 Þ 2 , >
>
>
>
1=2 =
ð x2 þ y2 Þ ð3:18Þ
θ ¼ tan 1 , >
z >
>
1 y
>
;
ϕ ¼ tan :
x

Thus, we have
 
∂ ∂
Lz ¼ xpy  ypx ¼ iħ x  y , ð3:19Þ
∂y ∂x
∂ ∂r ∂ ∂θ ∂ ∂ϕ ∂
¼ þ þ : ð3:20Þ
∂x ∂x ∂r ∂x ∂θ ∂x ∂ϕ

In turn, we have

∂r x
¼ ¼ sin θ cos ϕ,
∂x r
1
∂θ 1 ðx2 þ y2 Þ 2  2x z x cos θ cos ϕ
¼  ¼ 2  ¼ ,
∂x 1 þ ðx2 þ y2 Þ=z2 2z x þ y2 þ z2 ðx2 þ y2 Þ12 r

∂ϕ 1 1 sin ϕ
¼  y  ¼ :
∂x 1 þ ðy2 =x2 Þ x2 r sin θ
ð3:21Þ

In calculating the last two equations of (3.21), we used the differentiation of an


arc tangent function along with a composite function. Namely,

 0 1
tan 1 x ¼ :
1 þ x2

Inserting (3.21) into (3.20), we get

∂ ∂ cos θ cos ϕ ∂ sin ϕ ∂


¼ sin θ cos ϕ þ  : ð3:22Þ
∂x ∂r r ∂θ r sin θ ∂ϕ

Similarly, we have

∂ ∂ cos θ sin ϕ ∂ cos ϕ ∂


¼ sin θ sin ϕ þ þ : ð3:23Þ
∂y ∂r r ∂θ r sin θ ∂ϕ

Inserting (3.22) and (3.23) together with (3.17) into (3.19), we get
64 3 Hydrogen-Like Atoms


Lz ¼ iħ : ð3:24Þ
∂ϕ

In a similar manner, we have

∂ ∂r ∂ ∂θ ∂ ∂ sin θ ∂
¼ þ ¼ cos θ  :
∂z ∂z ∂r ∂z ∂θ ∂r r ∂θ

Combining this relation with either (3.23) or (3.22), we get


 
∂ ∂
Lx ¼ iħ sin ϕ þ cot θ cos ϕ , ð3:25Þ
∂θ ∂ϕ
 
∂ ∂
Ly ¼ iħ cos ϕ  cot θ sin ϕ : ð3:26Þ
∂θ ∂ϕ

Now, we introduce following operators:

LðþÞ  Lx þ iLy and LðÞ  Lx  iLy : ð3:27Þ

Then, we have
   
∂ ∂ ∂ ∂
LðþÞ ¼ ħeiϕ þ i cot θ and LðÞ ¼ ħeiϕ  þ i cot θ : ð3:28Þ
∂θ ∂ϕ ∂θ ∂ϕ

Thus, we get
   
ðþÞ ðÞ ∂ ∂ iϕ ∂ ∂
L L ¼ħ e 2 iϕ
þ i cot θ e  þ i cot θ
∂θ ∂ϕ ∂θ ∂ϕ
 2   2 
∂ 1 ∂ ∂
¼ ħ2 eiϕ eiϕ  2 þ i  þ i cot θ
∂θ sin 2 θ ∂ϕ ∂θ∂ϕ
   2 2 
∂ ∂ ∂ ∂
þeiϕ cot θ  þ i cot θ þ ieiϕ cot θ  þ i cot θ 2
∂θ ∂ϕ ∂ϕ∂θ ∂ϕ
 2 2 
∂ ∂ ∂ 2 ∂
¼ ħ2 þ cot θ þ i þ cot θ ð3:29Þ
∂θ2 ∂θ ∂ϕ ∂ϕ2

In the above calculation procedure, we used differentiation of a product function.


For instance, we have
3.2 Constitution of Hamiltonian 65

   2    2 
∂ ∂ ∂ cot θ ∂ ∂ 1 ∂ ∂
i cot θ ¼i þ cot θ ¼i  þ cot θ :
∂θ ∂ϕ ∂θ ∂ϕ ∂θ∂ϕ sin 2 θ ∂ϕ ∂θ∂ϕ
2 2
∂ ∂
Note also that ∂θ∂ϕ ¼ ∂ϕ∂θ . This is because we are dealing with continuous and
differentiable functions.
Meanwhile, we have following commutation relations:

Lx , Ly ¼ iħLz , Ly , Lz ¼ iħLx , and ½Lz , Lx  ¼ iħLy : ð3:30Þ

This can easily be confirmed by requiring canonical commutation relations. The


derivation can routinely be performed, but we show it because the procedures
include several important points. For instance, we have

Lx , Ly ¼ Lx Ly  Ly Lx
     
¼ ypz  zpy zpx  xpz  zpx  xpz ypz  zpy

¼ ypz zpx  ypz xpz  zpy zpx þ zpy xpz

zpx ypz þ zpx zpy þ xpz ypz  xpz zpy


   
¼ ypx pz z  zpx ypz þ zpy xpz  xpz zpy
   
þ xpz ypz  ypz xpz þ zpx zpy  zpy zpx
     
¼ ypx zpz  pz z þ xpy zpz  pz z ¼ iħ xpy  ypx ¼ iħLz

In the above calculations, we used the canonical commutation relation as well as


commutability of, e.g., y and px; y and z; px and py. For example, we get
  !
2 2
∂ ∂ ∂ ∂ ∂ jψi ∂ jψi
px , py j ψi ¼ ħ 2
 j ψi ¼ ħ 2
 ¼ 0:
∂x ∂y ∂y ∂x ∂x∂y ∂y∂x

In the above equation, we assumed that the order of differentiation with respect to
x and y can be switched. It is because we are dealing with continuous and differen-
tiable normal functions. Thus, px and py commute.
For other important commutation relations, we have

Lx , L2 ¼ 0, Ly , L2 ¼ 0, and Lz , L2 ¼ 0: ð3:31Þ

With the derivation, use


66 3 Hydrogen-Like Atoms

½A, B þ C ¼ ½A, B þ ½A, C :

The derivation is straightforward and it is left for readers. The relations (3.30) and
(3.31) imply that a simultaneous eigenstate exists for L2 and one of Lx, Ly, and Lz.
This is because L2 commute with them from (3.31), whereas Lz does not commute
with Lx or Ly. The detailed argument about the simultaneous eigenstate can be seen
in Part III.
Thus, we have
 
LðþÞ LðÞ ¼ Lx 2 þ Ly 2 þ i Ly Lx  Lx Ly ¼ Lx 2 þ Ly 2 þ i Ly , Lx

¼ Lx 2 þ Ly 2 þ ħLz

Notice here that [Ly, Lx] ¼  [Lx, Ly] ¼  iħLz. Hence,

L2 ¼ LðþÞ LðÞ þ Lz 2  ħLz : ð3:32Þ

From (3.24), we have

2

Lz 2 ¼ ħ2 2
: ð3:33Þ
∂ϕ

Finally we get
!
2 2
∂ ∂ 1 ∂
L2 ¼ ħ2 þ cot θ þ
∂θ
2 ∂θ sin θ ∂ϕ2
2

or
"   #
2
1 ∂ ∂ 1 ∂
L2 ¼ ħ2 sin θ þ : ð3:34Þ
sin θ ∂θ ∂θ sin 2 θ ∂ϕ2

Replacing L2 in (3.15) with that of (3.34), we have


"     #
2
ħ2 ∂ ∂ 1 ∂ ∂ 1 ∂ Ze2
H¼ r2 þ sin θ þ  : ð3:35Þ
2μr ∂r
2 ∂r sin θ ∂θ ∂θ sin θ ∂ϕ
2 2 4πε0 r

Thus, the Schrödinger equation of (3.3) takes a following form:


3.3 Separation of Variables 67

( "     # )
2
ħ2 ∂ 2 ∂ 1 ∂ ∂ 1 ∂ Ze2
 2 r þ sinθ þ 2  ψ ¼Eψ: ð3:36Þ
2μr ∂r ∂r sinθ ∂θ ∂θ sin θ ∂ϕ2 4πε0 r

3.3 Separation of Variables

If the potential is spherically symmetric (e.g., a Coulomb potential), it is well known


that the Schrödinger equations of (3.1)–(3.3) can be solved by a method of separa-
tion of variables. More specifically, (3.36) can be separated into two differential
equations one of which only depends on a radial component r and the other of which
depends only upon angular components θ and ϕ.
To apply the method of separation of variables to (3.36), let us first return to
(3.15). Considering that L2 is expressed as (3.34), we assume that L2 has eigenvalues
γ (at any rate if any) and takes eigenfunctions Y(θ, ϕ) (again, if any as well)
corresponding to γ. That is,

L2 Y ðθ, ϕÞ ¼ γY ðθ, ϕÞ, ð3:37Þ

where Y(θ, ϕ) is assumed to be normalized. Meanwhile,

L2 ¼ Lx 2 þ Ly 2 þ Lz 2 : ð3:38Þ

From (3.6), we have


 {
Lx { ¼ ypz  zpy ¼ pz { y{  py { z{ ¼ pz y  py z ¼ ypz  zpy ¼ Lx : ð3:39Þ

Note that pz and y commute, so do py and z. Therefore, Lx is Hermitian, so is Lx2.


More generally if an operator A is Hermitian, so is An (n: a positive integer); readers,
please show it. Likewise, Ly and Lz are Hermitian as well. Thus, L2 is Hermitian, too.
Next, we consider an expectation value of L2, i.e., hL2i. Let jψi be an arbitrary
normalized nonzero vector (or function). Then,

hL2 i  hψ jL2 ψi

¼ hψ jLx 2 ψi þ hψ jLy 2 ψi þ hψ jLz 2 ψi

¼ hLx { ψ jLx ψi þ hLy { ψ jLy ψi þ hLz { ψ jLz ψi


68 3 Hydrogen-Like Atoms

¼ hLx ψ jLx ψi þ hLy ψ jLy ψi þ hLz ψ jLz ψi


2
¼jjLx ψ jj2 þjjLy ψ jj þ jjLz ψ jj2  0: ð3:40Þ

Notice that the second last equality comes from that Lx, Ly, and Lz are Hermitian.
An operator that satisfies (3.40) is said to be non-negative (see Sects. 1.4, 2.2, etc.
where we saw the calculation routines). Note also that in (3.40) the equality holds
only when the following relations hold:

jLx ψi ¼jLy ψi ¼jLz ψi ¼ 0: ð3:41Þ

On this condition, we have


      
jL2 ψ ¼ j Lx 2 þ Ly 2 þ Lz 2 ψ ¼ jLx 2 ψ þ jLy 2 ψ þ jLz 2 ψ
 
¼ Lx Lx ψi þ Ly Ly ψi þ Lz jLz ψi ¼ 0: ð3:42Þ

The eigenfunction that satisfies (3.42) and the next relation (3.43) is a simulta-
neous eigenstate of Lx, Ly, Lz, and L2. This could seem to be in contradiction to the
fact that Lz does not commute with Lx or Ly. However, this is an exceptional case. Let
jψ 0i be the eigenfunction that satisfies both (3.41) and (3.42). Then, we have

jLx ψ 0 i ¼jLy ψ 0 i ¼jLz ψ 0 i ¼jL2 ψ 0 i ¼ 0: ð3:43Þ

As can be seen from (3.24) to (3.26) along with (3.34), the operators Lx, Ly, Lz,
and L2 are differential operators. Therefore, (3.43) implies that jψ 0i is a constant. We
will come back this point later. In spite of this exceptional situation, it is impossible
that all Lx, Ly, and Lz as well as L2 take a whole set of eigenfunctions as simultaneous
eigenstates. We briefly show this as below.
In Chap. 2, we mention that if [A, B] ¼ ik, any physical state cannot be an
eigenstate of A or B. The situation is different, on the other hand, if we have a
following case

½A, B ¼ iC, ð3:44Þ

where A, B, and C are Hermitian operators. The relation (3.30) is a typical example
for this. If Cjψi ¼ 0 in (3.44), jψi might well be an eigenstate of A and/or B.
However, if Cjψi ¼ cjψi (c 6¼ 0), jψi cannot be an eigenstate of A or B. This can
readily be shown in a fashion similar to that described in Sect. 2.5. Let us think of,
e.g., [Lx, Ly] ¼ iħLz. Suppose that for ∃ψ 0 we have Lzj ψ 0i ¼ 0. Taking an inner
product using jψ 0i, from (3.30) we have
3.3 Separation of Variables 69

   
ψ 0 j Lx Ly  Ly Lx ψ 0 ¼ 0:

In this case, moreover, even if we have jLxψ 0i ¼ 0 and jLyψ 0i ¼ 0, we have no


inconsistency. If, on the other hand, Lzjψi ¼ mjψi (m 6¼ 0), jψi cannot be an
eigenstate of Lx or Ly as mentioned above. Thus, we should be careful to deal with
a general situation where we have [A, B] ¼ iC.
In the case where [A, B] ¼ 0; AB ¼ BA, namely A and B commute, we have a
different situation. This relation is equivalent to that an operator AB  BA has an
eigenvalue zero for any physical state jψi. Yet, this statement is of less practical use.
Again, regarding details we wish to make a discussion in Sect. 12.6 of Part III.
Returning to (3.40), let us replace ψ with a particular eigenfunction Y(θ, ϕ). Then,
we have
 
YjL2 Y ¼ hYjγYi ¼ γ hYjYi ¼ γ  0: ð3:45Þ

Again, if L2 has an eigenvalue, the eigenvalue should be non-negative. Taking


account of the coefficient ħ2 in (3.34), it is convenient to put

γ ¼ ħ2 λ ðλ  0Þ: ð3:46Þ

On ground that the solution of (3.36) can be described as

ψ ðr, θ, ϕÞ ¼ Rðr ÞY ðθ, ϕÞ, ð3:47Þ

the Schrödinger equation (3.16) can be rewritten as


 2   
1 ħ ∂ 2 ∂ L2 Ze2
 2 r þ 2  Rðr ÞY ðθ, ϕÞ ¼ ERðr ÞY ðθ, ϕÞ: ð3:48Þ
2μ r ∂r ∂r r 4πε0 r

That is,
 2   
1 ħ ∂ ∂Rðr Þ L2 Y ðθ, ϕÞ Ze2
 2 r2 Y ðθ, ϕÞ þ R ð r Þ  Rðr ÞY ðθ, ϕÞ
2μ r ∂r ∂r r 2 4πε0 r

¼ ERðr ÞY ðθ, ϕÞ: ð3:49Þ

Recalling (3.37) and (3.46), we have


 2   
1 ħ ∂ 2 ∂Rðr Þ ħ2 λY ðθ, ϕÞ Ze2
 2 r Y ðθ, ϕÞ þ Rð r Þ  Rðr ÞY ðθ, ϕÞ
2μ r ∂r ∂r r2 4πε0 r
¼ ERðr ÞY ðθ, ϕÞ: ð3:50Þ

Dividing both sides by Y(θ, ϕ), we get a SOLDE of a radial component as


70 3 Hydrogen-Like Atoms

 2   
1 ħ ∂ 2 ∂Rðr Þ ħ2 λ Ze2
 2 r þ 2 R ðr Þ  Rðr Þ ¼ ERðr Þ: ð3:51Þ
2μ r ∂r ∂r r 4πε0 r

Regarding angular components θ and ϕ, using (3.34), (3.37), and (3.46), we have
"   #
2
1 ∂ ∂ 1 ∂
L Y ðθ, ϕÞ ¼ ħ
2 2
sin θ þ Y ðθ, ϕÞ
sin θ ∂θ ∂θ sin 2 θ ∂ϕ2
¼ ħ2 λY ðθ, ϕÞ : ð3:52Þ

Dividing both sides by ħ2, we get


"   #
2
1 ∂ ∂ 1 ∂
 sin θ þ Y ðθ, ϕÞ ¼ λY ðθ, ϕÞ: ð3:53Þ
sin θ ∂θ ∂θ sin 2 θ ∂ϕ2

Notice in (3.53) that the angular part of SOLDE does not depend on a specific
form of the potential.
Now, we further assume that (3.53) can be separated into a zenithal angle part θ
and azimuthal angle part ϕ such that

Y ðθ, ϕÞ ¼ ΘðθÞΦðϕÞ: ð3:54Þ

Then we have
"   #
2
1 ∂ ∂ΘðθÞ 1 ∂ Φ ð ϕÞ
 sin θ ΦðϕÞ þ ΘðθÞ ¼ λΘðθÞΦðϕÞ: ð3:55Þ
sin θ ∂θ ∂θ sin 2 θ ∂ϕ2

Multiplying both sides by sin2θ/Θ(θ)Φ(ϕ) and arranging both the sides, we get

2  
1 ∂ ΦðϕÞ sin 2 θ 1 ∂ ∂ΘðθÞ
 ¼ sin θ þ λΘðθÞ : ð3:56Þ
ΦðϕÞ ∂ϕ2 ΘðθÞ sin θ ∂θ ∂θ

Since LHS of (3.56) depends only upon ϕ and RHS depends only on θ, we must
have

LHS of ð3:56Þ ¼ RHS of ð3:56Þ ¼ η ðconstantÞ: ð3:57Þ

Thus, we have a following relation of LHS of (3.56):


3.3 Separation of Variables 71

1 d 2 Φð ϕÞ
 ¼ η: ð3:58Þ
ΦðϕÞ dϕ2
2
Putting D   dϕ
d
2 , we get

DΦðϕÞ ¼ ηΦðϕÞ: ð3:59Þ

The SOLDEs of (3.58) and (3.59) are formally the same as (1.61) of Sect. 1.3,
where boundary conditions (BCs) are Dirichlet conditions. Unlike (1.61), however,
we have to consider different BCs, i.e., the periodic BCs.
As in Example 1.1, we adopt two linearly independent solutions. That is, we have

eimϕ and eimϕ ðm 6¼ 0Þ:

As their linear combination, we have

ΦðϕÞ ¼ aeimϕ þ beimϕ : ð3:60Þ

As BCs, we consider Φ(0) ¼ Φ(2π) and Φ0(0) ¼ Φ0(2π); i.e., we have

a þ b ¼ aei2πm þ bei2πm : ð3:61Þ

Meanwhile, we have

Φ0 ðϕÞ ¼ aimeimϕ  bimeimϕ : ð3:62Þ

Therefore, from BCs we have

aim  bim ¼ aimei2πm  bimei2πm :

Then,

a  b ¼ aei2πm  bei2πm : ð3:63Þ

From (3.61) and (3.63), we have


   
2a 1  ei2πm ¼ 0 and 2b 1  ei2πm ¼ 0:

If a 6¼ 0, we must have m ¼ 0, 1, 2,   . If a ¼ 0, we must have b 6¼ 0 to avoid


having Φ(ϕ)  0 as a solution. In that case, we have m ¼ 0, 1, 2,    as well.
Thus, it suffices to put Φ(ϕ) ¼ ceimϕ (m ¼ 0, 1, 2,   ). Therefore, as a normalized
function Φ e ðϕÞ, we get
72 3 Hydrogen-Like Atoms

e ðϕÞ ¼ p1ffiffiffiffiffi eimϕ ðm ¼ 0, 1, 2,   Þ:


Φ ð3:64Þ

Inserting it into (3.58), we have

m2 eimϕ ¼ ηeimϕ :

Therefore, we get

η ¼ m2 ðm ¼ 0, 1, 2,   Þ: ð3:65Þ

From (3.56) and (3.65), we have


 
1 d dΘðθÞ m 2 Θ ðθ Þ
 sin θ þ ¼ λΘðθÞ ðm ¼ 0, 1, 2,   Þ: ð3:66Þ
sin θ dθ dθ sin 2 θ
pffiffiffiffiffi
In (3.64) putting m ¼ 0 as an eigenvalue, we have ΦðϕÞ ¼ 1= 2π as a
corresponding eigenfunction. Unlike Examples 1.1 and 1.2, this reflects that the
d2
differential operator  dϕ 2 accompanied by the periodic BCs is a non-negative

operator that allows an eigenvalue of zero. Yet, we are uncertain of a range of m.


To clarify this point, we consider generalized angular momentum in the next section.

3.4 Generalized Angular Momentum

We obtained commutation relations of (3.30) among individual angular momentum


components Lx, Ly, and Lz. In an opposite way, we may start with (3.30) to define
angular momentum. Such a quantity is called generalized angular momentum.
Let e
J be a generalized angular momentum as in the case of (3.4) such that
0 1
Jex
e B C
J ¼ ðe1 e2 e3 Þ@ Jey A: ð3:67Þ
Jez

For the sake of simple notation, let us define J as follows so that we can eliminate
ħ and deal with dimensionless quantities in the present discussion:
3.4 Generalized Angular Momentum 73

0
1 0 1
Jex =ħ Jx
e Be C B C
J  J=ħ ¼ ðe1 e2 e3 Þ@ J y =ħ A ¼ ðe1 e2 e3 Þ@ J y A,
Jez =ħ Jz

J2 ¼ J x 2 þ J y 2 þ J z 2 : ð3:68Þ

Then, we require following commutation relations:

J x , J y ¼ iJ z , J y , J z ¼ iJ x , and ½J z , J x  ¼ iJ y : ð3:69Þ

Also, we require Jx, Jy, and Jz to be Hermitian. The operator J2 is Hermitian


accordingly. The relations (3.69) lead to

J x , J2 ¼ 0, J y , J2 ¼ 0, and J z , J2 ¼ 0: ð3:70Þ

This can be confirmed as in the case of (3.30).


As noted above, again a simultaneous eigenstate exists for J2 and one of Jx, Jy,
and Jz. According to the convention, we choose J2 and Jz for the simultaneous
eigenstate. Then, designating the eigenstate by jζ, μi, we have

J2 jζ, μi ¼ ζjζ, μi and J z jζ, μi ¼ μjζ, μi: ð3:71Þ

The implication of (3.71) is that jζ, μi is the simultaneous eigenstate and that μ is
an eigenvalue of Jz which jζ, μi belongs to with ζ being an eigenvalue of J2 which
jζ, μi belongs to as well.
Since Jz and J2 are Hermitian, both μ and ζ are real (see Sect. 1.4). Of these, ζ  0
as in the case of (3.45). We define following operators J (+) and J () as in the case of
(3.27):

J ðþÞ  J x þ iJ y and J ðÞ  J x  iJ y : ð3:72Þ

Then, from (3.69) and (3.70), we get


h i h i
J ðþÞ , J2 ¼ J ðÞ , J2 ¼ 0: ð3:73Þ

Also, we obtain following commutation relations:


h i h i h i
J z , J ðþÞ ¼ J ðþÞ ; J z , J ðÞ ¼ J ðÞ ; J ðþÞ , J ðÞ ¼ 2J z : ð3:74Þ

From (3.70) to (3.72), we get


74 3 Hydrogen-Like Atoms

J2 J ðþÞ jζ, μi ¼ J ðþÞ J2 jμi ¼ ζJ ðþÞ jζ, μi,


J2 J ðÞ jζ, μi ¼ J ðÞ J2 jζ, μi ¼ ζJ ðÞ jζ, μi: ð3:75Þ

Equation (3.75) indicates that both J (+)jζ, μi and J ()jζ, μi are eigenvectors of J2
that correspond to an eigenvalue ζ.
Meanwhile, from (3.74) we get

J z J ðþÞ jζ, μi ¼ J ðþÞ ðJ z þ 1Þjζ, μi ¼ ðμ þ 1ÞJ ðþÞ jζ, μi,


J z J ðÞ jζ, μi ¼ J ðÞ ðJ z  1Þjζ, μi ¼ ðμ  1ÞJ ðÞ jζ, μi: ð3:76Þ

The relation (3.76) means that J (+)jζ, μi is an eigenvector of Jz corresponding to


an eigenvalue (μ + 1), while J ()jζ, μi is an eigenvector of Jz corresponding to an
eigenvalue (μ  1). This implies that J (+) and J () function as raising and lowering
operators (or ladder operators) that have been introduced in this chapter. Thus, using
undetermined constants (or phase factors) aμ(+) and aμ(), we describe

J ðþÞ jζ, μi ¼ aμ ðþÞ jζ, μ þ 1i and J ðÞ jζ, μi ¼ aμ ðÞ jζ, μ  1i: ð3:77Þ

Next, let us characterize eigenvalues μ. We have

J x 2 þ J y 2 ¼ J2  J z 2 : ð3:78Þ

Therefore,
    
J x 2 þ J y 2 Þjζ, μi ¼ J2  J z 2 jζ, μi ¼ ζ  μ2 jζ, μi: ð3:79Þ

Since (Jx2 + Jy2) is a non-negative operator, its eigenvalues are non-negative as


well, as can be seen from (3.40) and (3.45). Then, we have

ζ  μ2  0: ð3:80Þ

Thus, for a fixed value of non-negative ζ, μ is bounded both upwards and


downwards. We define then a maximum of μ as j and a minimum of μ as j0.
Consequently, on the basis of (3.77), we have

J ðþÞ jζ, ji ¼ 0 and J ðÞ jζ, j0 i ¼ 0: ð3:81Þ

This is because we have no quantum state corresponding to jζ, j + 1i or jζ, j0  1i.


From (3.75) and (3.81), possible numbers of μ are
3.4 Generalized Angular Momentum 75

j, j  1, j  2,   , j0 : ð3:82Þ

From (3.69) and (3.72), we get

J ðÞ J ðþÞ ¼ J2  J z 2  J z , J ðþÞ J ðÞ ¼ J 2  J z 2 þ J z : ð3:83Þ

Operating these operators on j ζ, ji or j ζ, j0i and using (3.81) we get


   
J ðÞ J ðþÞ jζ, ji ¼ J2  J z 2  J z jζ, ji ¼ ζ  j2  j jζ, ji ¼ 0,
  
J ðþÞ J ðÞ jζ, j0 i ¼ J2  J z 2 þ J z jζ, j0 i ¼ ζ  j0 þ j0 jζ, j0 i ¼ 0:
2
ð3:84Þ

Since jζ, ji 6¼ 0 and jζ, j0i 6¼ 0, we have

ζ  j2  j ¼ ζ  j0 þ j0 ¼ 0:
2
ð3:85Þ

This means that

ζ ¼ jð j þ 1Þ ¼ j0 ð j0  1Þ: ð3:86Þ

Moreover, from (3.86) we get

jð j þ 1Þ  j0 ð j0  1Þ ¼ ð j þ j0 Þð j  j0 þ 1Þ ¼ 0: ð3:87Þ

As j  j0, j  j0 + 1 > 0. From (3.87), therefore, we get

j þ j0 ¼ 0 or j ¼ j0 : ð3:88Þ

Then, we conclude that the minimum of μ is –j. Accordingly, possible values of μ


are

μ ¼ j, j  1, j  2,   ,  j  1,  j: ð3:89Þ

That is, the number μ can take is (2j + 1). The relation (3.89) implies that taking a
positive integer k,

j  k ¼ j or j ¼ k=2: ð3:90Þ

In other words, j is permitted to take a number zero, a positive integer, or a


positive half-integer (or more precisely, half-odd-integer). For instance, if j ¼ 1/2, μ
can be 1/2 or 1/2. When j ¼ 1, μ can be 1, 0, or  1.
Finally, we have to decide undetermined constants aμ(+) and aμ(). To this end,
multiplying hζ, μ  1| on both sides of the second equation of (3.77) from the left, we
have
76 3 Hydrogen-Like Atoms

hζ, μ  1jJ ðÞ jζ, μi ¼ aμ ðÞ hζ, μ  1jζ, μ  1i ¼ aμ ðÞ , ð3:91Þ

where the second equality comes from that jζ, μ  1i has been normalized; i.e.,
jjj ζ, μ  1ijj ¼ 1. Meanwhile, taking adjoint of both sides of the first equation of
(3.77), we have
h i{ h i
hζ, μj J ðþÞ ¼ aμ ðþÞ hζ, μ þ 1j: ð3:92Þ

But, from (3.72) and the fact that Jx and Jy are Hermitian,
h i{
J ðþÞ ¼ J ð Þ : ð3:93Þ

Using (3.93) and replacing μ in (3.92) with μ  1, we get


h i
hζ, μ  1jJ ðÞ ¼ aμ1 ðþÞ hζ, μj: ð3:94Þ

Furthermore, multiplying jζ, μi on (3.94) from the right, we have


h i h i
hζ, μ  1jJ ðÞ jζ, μi ¼ aμ1 ðþÞ hζ, μjζ, μi ¼ aμ1 ðþÞ , ð3:95Þ

where again jζ, μi is assumed to be normalized. Comparing (3.91) and (3.95), we get
h i
aμ ðÞ ¼ aμ1 ðþÞ : ð3:96Þ

Taking an inner product regarding the first equation of (3.77) and its adjoint,
h i  2
hζ, μjJ ðÞ J ðþÞ jζ, μi ¼ aμ ðþÞ aμ ðþÞ hζ, μ þ 1jζ, μ þ 1i ¼aμ ðþÞ  : ð3:97Þ

Once again, the second equality of (3.97) results from the normalization of the
vector.
Using (3.83) as well as (3.71) and (3.86), (3.97) can be rewritten as

hζ, μjJ2  J z 2  J z jζ, μi ¼ hζ, μj jð j þ 1Þ  μ2  μjζ, μi


 2
¼ hζ, μjζ, μið j  μÞð j þ μ þ 1Þ ¼ aμ ðþÞ  : ð3:98Þ

Thus, we get
3.5 Orbital Angular Momentum: Operator Approach 77

pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
aμ ðþÞ ¼ eiδ ð j  μÞð j þ μ þ 1Þ ðδ : an arbitrary real numberÞ, ð3:99Þ

where eiδ is a phase factor. From (3.96) we also get


pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
aμ ðÞ ¼ eiδ ð j  μ þ 1Þ ð j þ μ Þ : ð3:100Þ

In (3.99) and (3.100), we routinely put δ ¼ 0 so that aμ(+) and aμ() can be positive
numbers. Explicitly rewriting (3.77), we get
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
J ðþÞ jζ, μi ¼ ð j  μÞð j þ μ þ 1Þjζ, μ þ 1i,
p ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
J ðÞ jζ, μi ¼ ð j  μ þ 1Þð j þ μÞjζ, μ  1i, ð3:101Þ

where j is a fixed given number chosen from among zero, positive integers, and
positive half-integers (or half-odd-integers).
As discussed above, we have derived various properties and relations with respect
to the generalized angular momentum on the basis of (i) the relation (3.30) or (3.69)
and (ii) the fact that Jx, Jy, and Jz are Hermitian operators. This notion is very useful
in dealing with various angular momenta of different origins (e.g., orbital angular
momentum, spin angular momentum) from a unified point of view. In Chap. 20, we
will revisit this issue in more detail.

3.5 Orbital Angular Momentum: Operator Approach

In Sect. 3.4 we have derived various important results on angular momenta on the
basis of the commutation relations (3.69) and the assumption that Jx, Jy, and Jz are
Hermitian. Now, let us return to the discussion on orbital angular momenta we dealt
with in Sects. 3.2 and 3.3. First, we treat the orbital angular momenta via operator
approach. This approach enables us to understand why a quantity j introduced in
Sect. 3.4 takes a value zero or positive integers with the orbital angular momenta. In
the next section (Sect 3.6) we will deal with the related issues by an analytical
method.
In (3.28) we introduced differential operators L(+) and L(). According to Sect.
3.4, we define following operators to eliminate ħ so that we can deal with dimen-
sionless quantities:
0 1
Mx
B C
M  L=ħ ¼ ðe1 e2 e3 Þ@ M y A,
Mz
78 3 Hydrogen-Like Atoms

M 2 ¼ L2 =ħ2 ¼ M x 2 þ M y 2 þ M z 2 : ð3:102Þ

Hence, we have

Lx Ly
Mx ¼ , M y ¼ , and M z ¼ Lz =ħ: ð3:103Þ
ħ ħ

Moreover, we define following operators:


 
∂ ∂
M ðþÞ  M x þ iM y ¼ LðþÞ =ħ ¼ eiϕ þ i cot θ , ð3:104Þ
∂θ ∂ϕ
 
∂ ∂
M ð Þ  LðÞ =ħ ¼ eiϕ  þ i cot θ : ð3:105Þ
∂θ ∂ϕ

Then we have
"   #
2
1 ∂ ∂ 1 ∂
M ¼ 2
sin θ þ : ð3:106Þ
sin θ ∂θ ∂θ sin 2 θ ∂ϕ2

Here we execute variable transformation such that


qffiffiffiffiffiffiffiffiffiffiffiffiffi
ξ ¼ cos θ ð0  θ  πÞ or sin θ ¼ 1  ξ2 : ð3:107Þ

Noting, e.g., that


qffiffiffiffiffiffiffiffiffiffiffiffiffi
∂ ∂ξ ∂ ∂ ∂ ∂ ∂
¼ ¼  sin θ ¼ 1  ξ2 , sin θ ¼  sin 2 θ
∂θ ∂θ ∂ξ ∂ξ ∂ξ ∂θ ∂ξ
 ∂
¼  1  ξ2 , ð3:108Þ
∂ξ

we get

qffiffiffiffiffiffiffiffiffiffiffiffiffi !
ð þÞ 2 ∂ ξ ∂
M ¼e  1ξ iϕ
þ i pffiffiffiffiffiffiffiffiffiffiffiffiffi
∂ξ 1  ξ2 ∂ϕ
qffiffiffiffiffiffiffiffiffiffiffiffiffi !
ð Þ iϕ 2 ∂ ξ ∂ ð3:109Þ
M ¼e 1ξ þ i pffiffiffiffiffiffiffiffiffiffiffiffiffi
∂ξ 1  ξ2 ∂ϕ
 
∂  ∂ 2
1 ∂
M2 ¼  1  ξ2  :
∂ξ ∂ξ 1  ξ2 ∂ϕ2
3.5 Orbital Angular Momentum: Operator Approach 79

Although we showed in Sect. 3.3 that m ¼ 0, 1, 2,   , the range of m was


unclear. The relationship between m and λ in (3.66) remains unclear so far as well.
On the basis of a general approach developed in Sect. 3.4, however, we have known
that the eigenvalue μ of the dimensionless z-component angular momentum Jz is
bounded with its maximum and minimum being j and –j, respectively [see (3.89)],
where j can be zero, a positive integer, or a positive half-odd-integer. Concomitantly,
the eigenvalue ζ of J2 equals j( j + 1).
In the present section, let us reconsider the relationship between m and λ in (3.66)
in light of the knowledge obtained in Sect. 3.4. According to the custom, we replace
μ in (3.89) with m to have

m ¼ j, j  1, j  2,   , j  1,  j: ð3:110Þ

At the moment, we assume that m can be a half-odd-integer besides zero or an


integer [3].
Now, let us define notation of Y(θ, ϕ) that appeared in (3.37). This function is
eligible for a simultaneous eigenstate of M2 and Mz and can be indexed with j and
m as in (3.110). Then, let Y(θ, ϕ) be described accordingly as

j ðθ, ϕÞ  Y ðθ, ϕÞ:


Ym ð3:111Þ

From (3.54) and (3.64), we have

j ðθ, ϕÞ / e
Ym :
imϕ

Therefore, we get

qffiffiffiffiffiffiffiffiffiffiffiffiffi !
2 ∂ mξ
M ð þÞ Y m
j ðθ, ϕÞ ¼e  1ξ

 pffiffiffiffiffiffiffiffiffiffiffiffiffi Y m
j ðθ, ϕÞ
∂ξ 1  ξ2
qffiffiffiffiffiffiffiffiffiffiffiffiffimþ1 qffiffiffiffiffiffiffiffiffiffiffiffiffim 

¼ e iϕ
1ξ 2
1ξ 2
Y j ðθ, ϕÞ ,
m
ð3:112Þ
∂ξ

where we used the following equation:


qffiffiffiffiffiffiffiffiffiffiffiffiffim  qffiffiffiffiffiffiffiffiffiffiffiffiffim1 qffiffiffiffiffiffiffiffiffiffiffiffiffi1
∂ 1
1ξ 2
¼ ðmÞ 1  ξ2  1  ξ2 ð2ξÞ
∂ξ 2
qffiffiffiffiffiffiffiffiffiffiffiffiffim2
¼ mξ 1  ξ2 ,
80 3 Hydrogen-Like Atoms

qffiffiffiffiffiffiffiffiffiffiffiffiffim  qffiffiffiffiffiffiffiffiffiffiffiffiffim2

1  ξ2 Ym ð θ, ϕ Þ ¼ mξ 1  ξ2 j ðθ, ϕÞ
Ym
∂ξ j

qffiffiffiffiffiffiffiffiffiffiffiffiffim
∂Y m
j ðθ, ϕÞ
þ 1  ξ2 : ð3:113Þ
∂ξ

Similarly, we get

qffiffiffiffiffiffiffiffiffiffiffiffiffi !
2 ∂ mξ
M ð Þ Y m
j ðθ, ϕÞ ¼e iϕ
1ξ  pffiffiffiffiffiffiffiffiffiffiffiffiffi Y m
j ðθ, ϕÞ
∂ξ 1  ξ2
qffiffiffiffiffiffiffiffiffiffiffiffiffimþ1 qffiffiffiffiffiffiffiffiffiffiffiffiffim 
iϕ ∂
¼e 1ξ 2
1ξ 2
Y j ðθ, ϕÞ :
m
ð3:114Þ
∂ξ

Let us derive the relations where M (+) or M () is successively operated on


j ðθ, ϕÞ. In the case of M , using (3.109) we have
(+)
Ym

h in qffiffiffiffiffiffiffiffiffiffiffiffiffimþn n
ð þÞ n inϕ ∂
M Y j ðθ, ϕÞ ¼ ð1Þ e
m
1  ξ2 n
∂ξ
qffiffiffiffiffiffiffiffiffiffiffiffiffim 
 1  ξ2 Ym j ð θ, ϕ Þ : ð3:115Þ

We confirm this relation by mathematical induction. We have (3.112) by


replacing n with 1 in (3.115). Namely, (3.115) holds when n ¼ 1. Next, suppose
that (3.115) holds with n. Then, using the first equation of (3.109) and noting (3.64),
we have
h inþ1 nh in o
M ðþÞ j ðθ, ϕÞ ¼ M
Ym ð þÞ
M ðþÞ Y m
j ðθ, ϕÞ

qffiffiffiffiffiffiffiffiffiffiffiffiffi !
2 ∂ ξ ∂
¼e iϕ
 1ξ þ i pffiffiffiffiffiffiffiffiffiffiffiffiffi
∂ξ 1  ξ2 ∂ϕ
qffiffiffiffiffiffiffiffiffiffiffiffiffimþn n qffiffiffiffiffiffiffiffiffiffiffiffiffi 
∂ 2 m m
ð1Þn einϕ 1  ξ2 1  ξ Y ð θ, ϕ Þ
∂ξn j

" qffiffiffiffiffiffiffiffiffiffiffiffiffi #
n iðnþ1Þϕ 2 ∂ ξðn þ mÞ
¼ ð1Þ e  1ξ  pffiffiffiffiffiffiffiffiffiffiffiffiffi
∂ξ 1  ξ2
qffiffiffiffiffiffiffiffiffiffiffiffiffimþn n qffiffiffiffiffiffiffiffiffiffiffiffiffi 
∂ 2 m m
 1ξ 2
1ξ Y j ðθ, ϕÞ
∂ξn
3.5 Orbital Angular Momentum: Operator Approach 81

( qffiffiffiffiffiffiffiffiffiffiffiffiffi" qffiffiffiffiffiffiffiffiffiffiffiffiffimþn1
n iðnþ1Þϕ
¼ ð1Þ e  1  ξ 2 ð m þ nÞ 1  ξ2

qffiffiffiffiffiffiffiffiffiffiffiffiffi  n qffiffiffiffiffiffiffiffiffiffiffiffiffi 
1 2 1 ∂ 2 m m
 1ξ ð2ξÞ 1ξ Y j ðθ, ϕÞ
2 ∂ξn
qffiffiffiffiffiffiffiffiffiffiffiffiffi qffiffiffiffiffiffiffiffiffiffiffiffiffimþn nþ1 qffiffiffiffiffiffiffiffiffiffiffiffiffi 
∂ 2 m m
 1  ξ2 1  ξ2 1  ξ Y j ð θ, ϕ Þ
∂ξnþ1
qffiffiffiffiffiffiffiffiffiffiffiffiffimþn n "qffiffiffiffiffiffiffiffiffiffiffiffiffi )
ξðn þ mÞ ∂ 2 m m
 pffiffiffiffiffiffiffiffiffiffiffiffiffi 1ξ 2
1ξ Y j ðθ, ϕÞ
1  ξ2 ∂ξn
qffiffiffiffiffiffiffiffiffiffiffiffiffimþðnþ1Þ nþ1 qffiffiffiffiffiffiffiffiffiffiffiffiffi 
nþ1 iðnþ1Þϕ ∂ 2 m m
¼ ð1Þ e 1ξ 2
1ξ Y j ðθ, ϕÞ : ð3:116Þ
∂ξnþ1

Notice that the first and third terms in the second last equality cancelled each
other. Thus, (3.115) certainly holds with (n + 1). Similarly, we have [3]

h in qffiffiffiffiffiffiffiffiffiffiffiffiffimþn n qffiffiffiffiffiffiffiffiffiffiffiffiffi 
ðÞ inϕ ∂
M Y j ðθ, ϕÞ ¼ e
m
1ξ 2
n 1ξ 2 m m
Y j ðθ, ϕÞ :
∂ξ
ð3:117Þ

Proof of (3.117) is left for readers.


From the second equation of (3.81) and (3.114) where m is replaced with j, we
have
qffiffiffiffiffiffiffiffiffiffiffiffiffi jþ1 qffiffiffiffiffiffiffiffiffiffiffiffiffi 

M ðÞ Y j
j ðθ, ϕÞ ¼ e
iϕ
1  ξ2 1  ξ2 j Y j ð θ, ϕ Þ ¼ 0: ð3:118Þ
∂ξ j

pffiffiffiffiffiffiffiffiffiffiffiffiffi j
This implies that 1  ξ2 j Y j ðθ, ϕÞ is a constant with respect to ξ. We
describe this as
qffiffiffiffiffiffiffiffiffiffiffiffiffij
1  ξ2 Y j
j ðθ, ϕÞ ¼ c ðc : constant with respect to ξÞ: ð3:119Þ

Meanwhile, putting m ¼  j and n ¼ 2j + 1 in (3.115) and taking account of the


first equation of (3.77) and the first equation of (3.81), we get
h i2jþ1
M ðþÞ Y j
j ðθ, ϕÞ

qffiffiffiffiffiffiffiffiffiffiffiffiffi jþ1 2jþ1 qffiffiffiffiffiffiffiffiffiffiffiffiffi 


∂ 2 j j
¼ ð1Þ2jþ1 eið2jþ1Þϕ 1  ξ2 2jþ1
1  ξ Y j ð θ, ϕ Þ ¼ 0: ð3:120Þ
∂ξ

This means that


82 3 Hydrogen-Like Atoms

qffiffiffiffiffiffiffiffiffiffiffiffiffij
1  ξ2 Y j j ðθ, ϕÞ ¼ ðat most a 2j  degree polynomial with ξÞ: ð3:121Þ

Replacing Y j
j ðθ, ϕÞ in (3.121) with that of (3.119), we get

qffiffiffiffiffiffiffiffiffiffiffiffiffij qffiffiffiffiffiffiffiffiffiffiffiffiffij
 j
c 1  ξ2 1  ξ2 ¼ c 1  ξ2

¼ ðat most a 2j  degree polynomial with ξÞ:


ð3:122Þ

Here, if j is a half-odd-integer, c(1  ξ2) j of (3.122) cannot be a polynomial. If, on


the other hand, j is zero or a positive integer, c(1  ξ2) j is certainly a polynomial and,
pffiffiffiffiffiffiffiffiffiffiffiffiffij j
to top it all, a 2j-degree polynomial with respect to ξ; so is 1  ξ2 Y j ðθ, ϕÞ.
According to the custom, henceforth we use l as zero or a positive integer instead
of j. That is,

Y ðθ, ϕÞ  Y m
l ðθ, ϕÞ ðl : zero or a positive integer Þ: ð3:123Þ

At the same time, so far as the orbital angular momentum is concerned, from
(3.71) and (3.86) we can identify ζ in (3.71) with l(l + 1). Namely, we have

ζ ¼ lðl þ 1Þ:

Concomitantly, m in (3.110) is determined as

m ¼ l, l  1, l  2,   1, 0,  1,     l þ 1,  l: ð3:124Þ

Thus, as expected m is zero or a positive or negative integer. Considering (3.37)


and (3.46), ζ is identical with λ in (3.46). Finally, we rewrite (3.66) such that
 
1 d dΘðθÞ m2 ΘðθÞ
 sin θ þ ¼ lðl þ 1ÞΘðθÞ, ð3:125Þ
sin θ dθ dθ sin 2 θ

where, l is equal to zero or positive integers and m is given by (3.124).


On condition of ξ ¼ cos θ (3.107), defining the following function

l ðξÞ  ΘðθÞ,
Pm ð3:126Þ

and considering (3.109) along with (3.54), we arrive at the next SOLDE described as
3.5 Orbital Angular Momentum: Operator Approach 83

   
d   m
2 dPl ðξÞ m2
1ξ þ lðl þ 1Þ  Pm ðξÞ ¼ 0: ð3:127Þ
dξ dξ 1  ξ2 l

The SOLDE of (3.127) is well known as the associated Legendre differential


equation. The solutions Pm l ðξÞ are called associated Legendre functions.
In the next section, we characterize the said equation and functions by an
analytical method. Before going into details, however, we further seek characteris-
l ðξÞ by the operator approach.
tics of Pm
Adopting the notation of (3.123) and putting m ¼ l in (3.112), we have
qffiffiffiffiffiffiffiffiffiffiffiffiffilþ1 "qffiffiffiffiffiffiffiffiffiffiffiffiffil #

M ðþÞ Y ll ðθ, ϕÞ ¼ e 1ξ iϕ 2
1ξ 2
Y l ðθ, ϕÞ :
l
ð3:128Þ
∂ξ

Corresponding to (3.81), we have M ðþÞ Y ll ðθ, ϕÞ ¼ 0. This implies that


qffiffiffiffiffiffiffiffiffiffiffiffiffil
1  ξ2 Y ll ðθ, ϕÞ ¼ c ðc : constant with respect to ξÞ: ð3:129Þ

From (3.107) and (3.64), we get

Y ll ðθ, ϕÞ ¼ κ l sin l θeilϕ , ð3:130Þ

where κ l is another constant that depends on l, but is independent of θ and ϕ. Let us


seek κ l by normalization condition. That is,
Z Z Z
2π π  2 π
dϕ sin θdθY ll ðθ, ϕÞ ¼ 2π  jκl j2 sin 2lþ1 θdθ ¼ 1, ð3:131Þ
0 0 0

where the integration is performed on a unit sphere. Note that an infinitesimal area
element on the unit sphere is represented by sinθdθdϕ.
We evaluate the above integral denoted as
Z π
I sin 2lþ1 θdθ: ð3:132Þ
0

Using integration by parts,


Z π
I¼ ð cos θÞ0 sin 2l θdθ
0
Z π
π
¼ ð cos θÞ sin θ 2l
0
þ ð cos θÞ  2l  sin 2l1 θ cos θdθ
0
Z π Z π
¼ 2l sin 2l1 θdθ  2l sin 2lþ1 θdθ: ð3:133Þ
0 0
84 3 Hydrogen-Like Atoms

Thus, we get a recurrence relation with respect to I (3.132) such that


Z π
2l
I¼ sin 2l1 θdθ: ð3:134Þ
2l þ 1 0

Repeating the above process, we get


Z π
2l 2l  2 2 22lþ1 ðl!Þ2
I¼   sin θdθ ¼ : ð3:135Þ
2l þ 1 2l  1 3 0 ð2l þ 1Þ!

Then,
rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
1 ð2l þ 1Þ! eiχ ð2l þ 1Þ!
jκ l j ¼ l or κl ¼ l ðχ : realÞ, ð3:136Þ
2 l! 4π 2 l! 4π

where eiχ is an undetermined constant (phase factor) that is to be determined below.


Thus we get
rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
eiχ ð2l þ 1Þ! l ilϕ
Y ll ðθ, ϕÞ ¼ l sin θe : ð3:137Þ
2 l! 4π

Meanwhile, in the second equation of (3.101) replacing J (), j, and μ in (3.101)


l ðθ, ϕÞ instead of jζ, μi, we get
with M (), l, and m, respectively, and using Y m

1
Y m1
l ðθ, ϕÞ ¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi M ðÞ Y m
l ðθ, ϕÞ: ð3:138Þ
ðl  m þ 1Þðl þ mÞ

Replacing m with l in (3.138), we have

1 ðÞ l
l ðθ, ϕÞ ¼ pffiffiffiffi M
Y l1 Y l ðθ, ϕÞ: ð3:139Þ
2l

Operating M () (l  m) times in total on Y ll ðθ, ϕÞ of (3.139), we have


h ilm
1 ð Þ
Ym
l ð θ, ϕ Þ ¼ p ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
ffi p ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
ffi M Y ll ðθ, ϕÞ
2lð2l  1Þ  ðl þ m þ 1Þ 1  2  ðl  mÞ
sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
ðl þ mÞ! h ðÞ ilm l
¼ M Y l ðθ, ϕÞ:
ð2lÞ!ðl  mÞ!
ð3:140Þ

Meanwhile, putting m ¼ l, n ¼ l  m, and j ¼ l in (3.117), we have


3.5 Orbital Angular Momentum: Operator Approach 85

h ilm qffiffiffiffiffiffiffiffiffiffiffiffiffim lm



M ð Þ Y ll ðθ, ϕÞ ¼ eiðlmÞϕ 1  ξ2 lm
∂ξ
"qffiffiffiffiffiffiffiffiffiffiffiffiffi #
l
 1  ξ2 Y ll ðθ, ϕÞ : ð3:141Þ

lm l
Further replacing M ðÞ Y l ðθ, ϕÞ in (3.140) with that of (3.141), we get
sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi qffiffiffiffiffiffiffiffiffiffiffiffiffim lm
ðl þ mÞ! iðlmÞϕ ∂
l ðθ, ϕÞ
Ym ¼ e 1  ξ2
ð2lÞ!ðl  mÞ! ∂ξ
lm

"qffiffiffiffiffiffiffiffiffiffiffiffiffi #
l
 1  ξ Y l ðθ, ϕÞ :
2 l
ð3:142Þ

Finally, replacing Y ll ðθ, ϕÞ in (3.142) with that of (3.137) and converting θ to ξ,


we arrive at the following equation:
sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
eiχ ð2l þ 1Þðl þ mÞ! imϕ  m=2 ∂lm h l i
l ðθ, ϕÞ ¼ l
Ym e 1  ξ2 1  ξ2 : ð3:143Þ
2 l! 4π ðl  mÞ! ∂ξ
lm

Now, let us decide eiχ . Putting m ¼ 0 in (3.143), we have


rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
eiχ ð1Þl ð2l þ 1Þ ∂l h i
2 l
Y 0l ðθ, ϕÞ ¼ l 1  ξ , ð3:144Þ
2 l!ð1Þl 4π ∂ξl

where we put (1)l on both the numerator and denominator. In RHS of (3.144),

ð1Þl ∂l 
1  ξ2 Þl  Pl ðξÞ: ð3:145Þ
2l l! ∂ξl

Equation (3.145) is well known as Rodrigues formula of Legendre polynomials.


We mention characteristics of Legendre polynomials in the next section. Thus,
rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
eiχ ð2l þ 1Þ
Y 0l ðθ, ϕÞ ¼ Pl ðξÞ: ð3:146Þ
ð1Þl 4π

According to the custom [2], we require Y 0l ð0, ϕÞ to be positive. Noting that θ ¼ 0


corresponds to ξ ¼ 1, we have
86 3 Hydrogen-Like Atoms

rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
eiχ ð2l þ 1Þ eiχ ð2l þ 1Þ
Y 0l ð0, ϕÞ ¼ Pl ð1Þ ¼ , ð3:147Þ
ð1Þl 4π ð1Þl 4π

where we used Pl(1) ¼ 1. For this important relation, see Sect. 3.6.1. Also noting that
 eiχ 
ð1Þl  ¼ 1, we must have

eiχ
¼ 1 or eiχ ¼ ð1Þl ð3:148Þ
ð1Þl

so that Y 0l ð0, ϕÞ can be positive. Thus, (3.143) is rewritten as


sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
ð 1 Þl
ð2l þ 1Þðl þ mÞ! imϕ  
2 m=2 ∂
lm
l ðθ, ϕÞ ¼
Ym e 1  ξ
2l l! 4π ðl  mÞ! ∂ξ
lm

h l i
 1  ξ2 : ð3:149Þ

In Sect. 3.3, we mentioned that jψ 0i in (3.43) is a constant. In fact, putting


l ¼ m ¼ 0 in (3.149), we have
pffiffiffiffiffiffiffiffiffiffi
Y 00 ðθ, ϕÞ ¼ 1=4π : ð3:150Þ

Thus, as a simultaneous eigenstate of all Lx, Ly, Lz, and L2 corresponding to l ¼ 0


and m ¼ 0, we have

jψ 0 i  Y 00 ðθ, ϕÞ:

The normalized functions Y m l ðθ, ϕÞ described as (3.149) define simultaneous


eigenfunctions of L2 (or M2) and Lz (or Mz). Those functions are called spherical
surface harmonics and frequently appear in various fields of mathematical physics.
As in the case of Sect. 2.3, matrix representation enables us to intuitively grasp
the relationship between angular momentum operators and their eigenfunctions
(or eigenvectors). Rewriting the relations of (3.101) so that they can meet the present
purpose, we have
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
M ðÞ jl, mi ¼ ðl  m þ 1Þðl þ mÞ jl, m  1i,
p ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
M ðþÞ jl, mi ¼ ðl  mÞðl þ m þ 1Þ jl, m þ 1i, ð3:151Þ

where we used l instead of ζ to designate the eigenstate.


3.5 Orbital Angular Momentum: Operator Approach 87

Now, we know that m takes (2l + 1) different values that correspond to each l.
This implies that the operators can be expressed with (2l + 1, 2l + 1) matrices. As
implied in (3.151), M () takes the following form:

M ð Þ ¼
0 pffiffiffiffiffiffiffiffiffiffi 1
0 2l  1
B pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi C
B ð2l  1Þ  2 C
B 0 C
B C
B C
B 0 ⋱ C
B C
B C
B ⋱ C
B pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi C
B ð2l  k þ 1Þ  k C,
B 0 C
B C
B 0 C
B C
B C
B ⋱ ⋱ C
B pffiffiffiffiffiffiffiffiffiffi C
B C
B 0 1  2l C
@ A
0
ð3:152Þ
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
where diagonal elements are zero and a (k, k + 1) element is ð2l  k þ 1Þ  k. That
is, nonzero elements are positioned just above the zero diagonal elements. Corre-
spondingly, we have

M ð þÞ ¼
0 1
0
B pffiffiffiffiffiffiffiffiffi C
B 2l  1 C
B 0 C
B pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi C
B C
B ð2l  1Þ  2 0 C
B C
B C
B ⋱ 0 C
B pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi C
B C,
B ð2l  k þ 1Þ  k 0 C
B C
B ⋱ C
B C
B ⋱ C
B C
B 0 C
B C
B ⋱ 0 C
@ A
pffiffiffiffiffiffiffiffiffi
1  2l 0
ð3:153Þ
88 3 Hydrogen-Like Atoms

where again diagonal


pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
ffi elements are zero and a (k + 1, k) element is
ð2l  k þ 1Þ  k . In this case, nonzero elements are positioned just below the
zero diagonal elements. Notice also that M () and M (+) are adjoint to each other
and that these notations correspond to (2.65) and (2.66).
Basis functions Y m l ðθ, ϕÞ can be represented by a column vector, as in the case of
Sect. 2.3. These are denoted as follows:
0 1 0 1
1 0
B C B C
B C0 B C 1
B C B C
B C0 B C 0
B C B C
jl, li ¼ B ⋮ C, jl, l þ 1i ¼ B ⋮ C,   , jl, l  1i
B C B C
B C B C
B 0 C B 0 C
@ A @ A
0 0
0 1 0 1
0 0
B C B C
B 0 C B 0 C
B ⋮ C B ⋮ C
B C B C
B C B C
¼B C , jl, li ¼ B 0 C , ð3:154Þ
B 0 C B C
B C B C
B 1 C B 0 C
@ A @ A
0 1

where the first number l in jl, li, jl, l + 1i, etc. denotes the quantum number
associated with λ ¼ l(l + 1) of (3.124) and is kept constant; the latter number denotes
m. Note from (3.154) that the column vector whose k-th row is 1 corresponds to
m such that

m ¼ l þ k  1: ð3:155Þ

For instance, if k ¼ 1, m ¼  l; if k ¼ 2l + 1, m ¼ l, etc.


The operator M () converts the column vector whose (k + 1)-th row is 1 to that
whose k-th row is 1. The former column vector corresponds to jl, m + 1i and the latter
corresponding to jl, mi. Therefore, using (3.152), we get the following
representation:
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
M ðÞ jl, m þ 1i ¼ ð2l  k þ 1Þ  kjl, mi ¼ ðl  mÞðl þ m þ 1Þ jl, mi, ð3:156Þ

where the second equality is obtained by replacing k with that of (3.155), i.e.,
k ¼ l + m + 1. Changing m to (m  1), we get the first equation of (3.151). Similarly,
we obtain the second equation of (3.151) as well. That is, we have
3.5 Orbital Angular Momentum: Operator Approach 89

pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
M ðþÞ jl, mi ¼ð2l  k þ 1Þ  k jl, m þ 1i
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
¼ ðl  mÞðl þ m þ 1Þ jl, m þ 1i: ð3:157Þ

From (3.32) we have

M 2 ¼ M ðþÞ M ðÞ þ M z 2  M z :

In the above, M (+)M () and Mz are diagonal matrices and, hence, Mz2 and M2 are
diagonal matrices as well such that
0 1
l
B C
B l þ 1 C
B C
B C
B l þ 2 C
B C
B ⋱ C
Mz ¼ B C,
B C
B kl C
B C
B C
B C
@ A

l
0 1
0
B C
B 2l  1 C
B C
B C
B ð2l  1Þ  2 C
B C
ð þÞ ðÞ B C
M M ¼B C,
B ⋱ C
B ð2l  k þ 1Þ  k C
B C
B C
B C
@ A

1  2l
ð3:158Þ

where k  l and (2l  k + 1)  k represent (k + 1, k + 1) elements of Mz and


M (+)M (), respectively. Therefore, (k + 1, k + 1) element of M2 is calculated as

ð2l  k þ 1Þ  k þ ðk  lÞ2  ðk  lÞ ¼ lðl þ 1Þ:

As expected, M2 takes a constant value l(l + 1). A matrix representation is shown


in (3.159) such that
90 3 Hydrogen-Like Atoms

0 1
l ð l þ 1Þ
B C
B lðl þ 1Þ C
B C
B C
B ⋱ C
B C
B lðl þ 1Þ C
M ¼B
2
C:
B C
B ⋱ C
B C
B C
B C
@ A
l ð l þ 1Þ
lðl þ 1Þ
ð3:159Þ

These expressions are useful to understand how the vectors of (3.154) constitute
simultaneous eigenstates of M2 and Mz. In this situation, the matrix representation is
said to diagonalize both M2 and Mz. In other words, the quantum states represented
by (3.154) are simultaneous eigenstates of M2 and Mz.
The matrices (3.152) and (3.153) that represent M () and M (+), respectively, are
said to be ladder operators or raising and lowering operators, because operating
column vectors those operators convert jmi to jm 1i as mentioned above. The
operators M () and M (+) correspond to a and a{ given in (2.65) and (2.66), respec-
tively. All these operators are characterized by that the corresponding matrices have
diagonal elements of zero and that nonvanishing elements are only positioned on
“right above” or “right below” relative to the diagonal elements. These matrices are a
kind of triangle matrices and all their diagonal elements are zero. The matrices are
characteristic of nilpotent matrices. That is, if a suitable power of a matrix is zero as a
matrix, such a matrix is said to be a nilpotent matrix (see Part III). In the present case,
(2l + 1)-th power of M () and M (+) becomes zero as a matrix.
The operator M () and M (+) can be described by the following shorthand
representations:
h i pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
M ðÞ ¼ ð2l  k þ 1Þ  kδkþ1,j ð1  k  2lÞ: ð3:160Þ
kj

pffiffiffiffiffiffiffiffiffiffi
If l ¼ 0, Mz ¼ M (+)M () ¼ M2 ¼ 0. This case corresponds to Y 00 ðθ, ϕÞ ¼ 1=4π
and we do not need the matrix representation. Defining
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
ak  ð2l  k þ 1Þ  k,

we have for instance


3.6 Orbital Angular Momentum: Analytic Approach 91

h i2 X
M ð Þ ¼ aδ aδ
p k kþ1,p p pþ1,j
¼ ak akþ1 δkþ2,j
kj
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffipffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
¼ ð2l  k þ 1Þ  k ½2l  ðk þ 1Þ þ 1  ðk þ 1Þδkþ2,j , ð3:161Þ

where the summation is nonvanishing only if p ¼ k + 1. The factor δk + 2, j implies


that the elements are shifted by one toward upper right by being squared. Similarly
we have
h i
M ð þÞ ¼ ak1 δk,jþ1 ð1  k  2l þ 1Þ: ð3:162Þ
kj

In (3.158), M (+)M () is represented as follows:


h i X
M ðþÞ M ðÞ ¼ a δ a δ
p k1 k,pþ1 p pþ1,j
¼ ak1 a j1 δk,j ¼ ðak1 Þ2 δk,j
kj

¼ ½2l  ðk  1Þ þ 1  ðk  1Þδk,j ¼ ð2l  k þ 2Þðk  1Þδk,j : ð3:163Þ

Notice that although a0 is not defined, δ1, j + 1 ¼ 0 for any j, and so this causes no
inconvenience. Hence, [M (+)M ()]kj of (3.163) is well defined with 1  k  2l + 1.
Important properties of angular momentum operators examined above are based
upon the fact that those operators are ladder operators and represented by nilpotent
matrices. These characteristics will further be studied in Part III.

3.6 Orbital Angular Momentum: Analytic Approach

In this section, our central task is to solve the associated Legendre differential
equation expressed by (3.127) by an analytical method. Putting m ¼ 0 in (3.127),
we have
 
d   dP0l ðxÞ
1  x2 þ lðl þ 1ÞP0l ðxÞ ¼ 0, ð3:164Þ
dx dx

where we use a variable x instead of ξ. Equation (3.164) is called Legendre


differential equation and its characteristics and solutions have been widely investi-
gated. Hence, we put

P0l ðxÞ  Pl ðxÞ, ð3:165Þ

where Pl(x) is said to be Legendre polynomials. We first start with Legendre


differential equation and Legendre polynomials.
92 3 Hydrogen-Like Atoms

3.6.1 Spherical Surface Harmonics and Associated Legendre


Differential Equation

Let us think of a following identity according to Byron and Fuller [4]:

 d  l  l
1  x2 1  x2 ¼ 2lx 1  x2 , ð3:166Þ
dx

where l is a positive integer. We differentiate both sides of (3.166) (l + 1) times. Here


we use the Leibniz rule about differentiation of a product function that is described
by

Xn n!
d n ðuvÞ ¼ d m udnm v, ð3:167Þ
m¼0 m!ðn  mÞ!

where

dm u=dxm  dm u:

The above shorthand notation is due to Byron and Fuller [4]. We use this notation
for simplicity from place to place.
Noting that the third order and higher differentiations of (1  x2) vanish in LHS of
(3.166), we have
h   l i
LHS ¼ dlþ1 1  x2 d 1  x2
   l  l
¼ 1  x2 dlþ2 1  x2  2ðl þ 1Þxd lþ1 1  x2
 l
lðl þ 1Þdl 1  x2 :

Also noting that the second order and higher differentiations of 2lx vanish in LHS
of (3.166), we have
h  l i
RHS ¼ d lþ1 2lx 1  x2
 l  l
¼ 2lxd lþ1 1  x2  2lðl þ 1Þd l 1  x2 :

Therefore,

LHS  RHS
   l  l  l
¼ 1  x2 dlþ2 1  x2  2xd lþ1 1  x2 þ lðl þ 1Þdl 1  x2 ¼ 0:

We define Pl(x) as
3.6 Orbital Angular Momentum: Analytic Approach 93

ð1Þl d l h i
2 l
Pl ðxÞ  1  x , ð3:168Þ
2l l! dxl
l
where a constant ð1 Þ
2l l!
is multiplied according to the custom so that we can explicitly
represent Rodrigues formula of Legendre polynomials. Thus, from (3.164) Pl(x)
defined above satisfies Legendre differential equation. Rewriting it, we get

  d 2 P l ð xÞ dP ðxÞ
1  x2  2x l þ lðl þ 1ÞPl ðxÞ ¼ 0: ð3:169Þ
dx2 dx

Or equivalently, we have
 
d  
2 dPl ðxÞ
1x þ lðl þ 1ÞPl ðxÞ ¼ 0: ð3:170Þ
dx dx

Returning to (3.127) and using x as a variable, we rewrite (3.127) as


   
d   dPm
l ð xÞ m2
1  x2 þ l ð l þ 1Þ  Pm ðxÞ ¼ 0, ð3:171Þ
dx dx 1  x2 l

where l is a non-negative integer and m is an integer that takes following values:

m ¼ l, l  1, l  2,   1, 0,  1,     l þ 1,  l:

Deferential equations expressed as


 
d dyðxÞ
pð x Þ þ cðxÞyðxÞ ¼ 0
dx dx

are of particular importance. We will come back to this point in Sect. 8.3.
Since m can be either positive or negative, from (3.171) we notice that Pm l ðxÞ and
Pm
l ð x Þ must satisfy the same differential equation (3.171). This implies that Pml ð xÞ
m
and Pl ðxÞ are connected, i.e., linearly dependent. First, let us assume that m  0. In
the case of m < 0, we will examine it later soon.
According to Dennery and Krzywicki [5], we assume
 
2 m=2
l ðxÞ ¼ κ 1  x
Pm CðxÞ, ð3:172Þ

where κ is a constant. Inserting (3.172) into (3.171) and rearranging the terms, we
obtain
94 3 Hydrogen-Like Atoms

  d2 C dC
1  x2 2
 2ðm þ 1Þx þ ðl  mÞðl þ m þ 1ÞC ¼ 0 ð0  m  lÞ: ð3:173Þ
dx dx

Recall once again that if m ¼ 0, the associated Legendre differential equation


given by (3.127) and (3.171) is exactly identical to Legendre differential equation of
(3.170). Differentiating (3.170) m times, we get
   
  d 2 d m Pl d d m Pl
1x 2
m  2ðm þ 1Þx
dx2 dx dx dxm

d m Pl
þ ðl  m Þðl þ m þ 1 Þ ¼ 0, ð3:174Þ
dxm

where we used the Leibniz rule about differentiation of (3.167). Comparing (3.173)
and (3.174), we find that

d m Pl
C ð xÞ ¼ κ 0 ,
dxm

where κ0 is a constant. Inserting this relation into (3.172) and setting κκ 0 ¼ 1, we get

 
2 m=2 d Pl ðxÞ
m
l ðxÞ ¼ 1  x
Pm
dxm
ð0  m  lÞ: ð3:175Þ

Using Rodrigues formula of (3.168), we have

ð1Þl  m=2 dlþm h l i


l ð xÞ 
Pm l
1  x2 lþm
1  x2 : ð3:176Þ
2 l! dx

Equation (3.175) defines the associated Legendre functions. Note, however, that
the function form differs from literature to literature [2, 5, 6].
Among classical orthogonal polynomials, Gegenbauer polynomials C λn ðxÞ often
appear in the literature. The relevant differential equation is defined by

  d2 λ d
1  x2 C ðxÞ  ð2λ þ 1Þx C λn ðxÞ þ nðn þ 2λÞC λn ðxÞ
dx2 n dx

1
¼0 λ> : ð3:177Þ
2

Setting n ¼ l  m and λ ¼ m þ 12 in (3.177) [5], we have

  d2 mþ12 d mþ1
1  x2 C ðxÞ  2ðm þ 1Þx Clm2 ðxÞ
2 lm
dx dx
3.6 Orbital Angular Momentum: Analytic Approach 95

mþ1
þðl  mÞðl þ m þ 1ÞC lm2 ðxÞ ¼ 0: ð3:178Þ

Once again comparing (3.174) and (3.178), we obtain

d m P l ð xÞ mþ1
¼ constant  C lm2 ðxÞ ð0  m  lÞ: ð3:179Þ
dxm

Next, let us determine the constant appearing in (3.179). To this end, we consider
a following generating function of the polynomials C λn ðxÞ defined by [7, 8]

 λ X1 
1
1  2tx þ t 2  C λ ðxÞt n
n¼0 n
λ> : ð3:180Þ
2

To calculate (3.180), let us think of a following expression for x and λ:


X1  λ 
ð 1 þ xÞ ¼ m¼0
xm , ð3:181Þ
m
 

where λ is an arbitrary real number and we define as
m
   
λ λ
 λðλ  1Þðλ  2Þ  λ  m þ 1=m! and  1: ð3:182Þ
m 0

Notice that (3.181) with (3.182) is an extension of binomial theorem (generalized


binomial theorem). Putting λ ¼ n, we have
   
n n! n
¼ and ¼ 1:
m ðn  mÞ!m! 0

We rewrite (3.181) using gamma functions Γ(z) such that

X1 Γðλ þ 1Þ
ð1 þ xÞλ ¼ xm , ð3:183Þ
m¼0 m!Γðλ  m þ 1Þ

where Γ(z) is defined by integral representation as


Z 1
Γ ðzÞ ¼ et t z1 dt ð Re z > 0Þ: ð3:184Þ
0

Changing variables such that t ¼ u2, we have


96 3 Hydrogen-Like Atoms

Z 1
eu u2z1 du ð Re z > 0Þ:
2
ΓðzÞ ¼ 2 ð3:185Þ
0

Note that the above expression is associated with the following fundamental
feature of the gamma functions:

Γðz þ 1Þ ¼ zΓðzÞ, ð3:186Þ

where z is any complex number.


Replacing x with -t(2x - t) and rewriting (3.183), we have

\[
\left(1-2tx+t^2\right)^{-\lambda} = \sum_{m=0}^{\infty}\frac{\Gamma(-\lambda+1)}{m!\,\Gamma(-\lambda-m+1)}(-t)^m(2x-t)^m. \tag{3.187}
\]

Assuming that x is a real number belonging to the interval [-1, 1], (3.187) holds with t satisfying |t| < 1 [8]. The discussion is as follows: When x satisfies the above condition, solving 1 - 2tx + t² = 0 we get the solutions

\[
t_{\pm} = x \pm i\sqrt{1-x^2}.
\]

Defining r as

\[
r \equiv \min\{|t_+|,\,|t_-|\},
\]

(1 - 2tx + t²)^{-λ}, regarded as a function of t, is analytic in the disk |t| < r. But we have

\[
|t_{\pm}| = 1.
\]

Thus, (1 - 2tx + t²)^{-λ} is analytic within the disk |t| < 1 and, hence, it can be expanded in a Taylor series (see Chap. 6). Continuing the calculation of (3.187), we have

\[
\begin{aligned}
\left(1-2tx+t^2\right)^{-\lambda}
&= \sum_{m=0}^{\infty}\frac{\Gamma(-\lambda+1)}{m!\,\Gamma(-\lambda-m+1)}(-1)^m t^m\sum_{k=0}^{m}\frac{m!}{k!(m-k)!}2^{m-k}x^{m-k}(-1)^k t^k \\
&= \sum_{m=0}^{\infty}\sum_{k=0}^{m}\frac{(-1)^{m+k}}{k!(m-k)!}\frac{\Gamma(-\lambda+1)}{\Gamma(-\lambda-m+1)}2^{m-k}x^{m-k}t^{m+k} \\
&= \sum_{m=0}^{\infty}\sum_{k=0}^{m}\frac{(-1)^{m+k}(-1)^m}{k!(m-k)!}\frac{\Gamma(\lambda+m)}{\Gamma(\lambda)}2^{m-k}x^{m-k}t^{m+k},
\end{aligned} \tag{3.188}
\]
where the last equality results from rewriting the gamma functions using (3.186). Replacing (m + k) with n, we get

\[
\left(1-2tx+t^2\right)^{-\lambda} = \sum_{n=0}^{\infty}\sum_{k=0}^{[n/2]}\frac{(-1)^k 2^{n-2k}}{k!(n-2k)!}\frac{\Gamma(\lambda+n-k)}{\Gamma(\lambda)}x^{n-2k}t^n, \tag{3.189}
\]

where [n/2] represents the largest integer that does not exceed n/2. This expression comes from the requirement that the order of x must satisfy the condition

\[
n - 2k \ge 0 \quad \text{or} \quad k \le n/2. \tag{3.190}
\]

That is, if n is even, the maximum of k is n/2; if n is odd, the maximum of k is (n - 1)/2. Comparing (3.180) and (3.189), we get [8]

\[
C_n^{\lambda}(x) = \sum_{k=0}^{[n/2]}\frac{(-1)^k 2^{n-2k}}{k!(n-2k)!}\frac{\Gamma(\lambda+n-k)}{\Gamma(\lambda)}x^{n-2k}. \tag{3.191}
\]
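The series (3.191) is easy to check numerically. The following is a minimal sketch, assuming SciPy is available, that compares it with the library's Gegenbauer polynomials at a sample point.

```python
# Sketch (assumes scipy): compare the explicit series (3.191) for C_n^lambda(x)
# with SciPy's built-in Gegenbauer polynomials at a sample point.
from math import gamma, factorial
import numpy as np
from scipy.special import eval_gegenbauer

def C_series(n, lam, x):
    # (3.191): the k-sum runs over 0 <= k <= [n/2]
    return sum((-1)**k * 2**(n - 2*k) * gamma(lam + n - k)
               / (gamma(lam) * factorial(k) * factorial(n - 2*k)) * x**(n - 2*k)
               for k in range(n // 2 + 1))

x = 0.3
for n in range(6):
    for lam in (0.5, 1.0, 2.5):
        assert np.isclose(C_series(n, lam, x), eval_gegenbauer(n, lam, x))
print("(3.191) agrees with scipy's Gegenbauer polynomials")
```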

Comparing (3.164) and (3.177) and putting λ = 1/2, we immediately find that the two differential equations are identical [7]. That is,

\[
C_n^{1/2}(x) = P_n(x). \tag{3.192}
\]

Hence, we further have

\[
P_n(x) = \sum_{k=0}^{[n/2]}\frac{(-1)^k 2^{n-2k}\,\Gamma\!\left(\frac{1}{2}+n-k\right)}{k!(n-2k)!\,\Gamma\!\left(\frac{1}{2}\right)}x^{n-2k}. \tag{3.193}
\]

Using (3.186) once again, we get [8]

\[
P_n(x) = \sum_{k=0}^{[n/2]}\frac{(-1)^k(2n-2k)!}{2^n\,k!(n-k)!(n-2k)!}x^{n-2k}. \tag{3.194}
\]

It is convenient to make a formula for the gamma function. In (3.193), n - k > 0, and so let us think of Γ(1/2 + m) (m: positive integer). Using (3.186), we have

\[
\begin{aligned}
\Gamma\!\left(\frac{1}{2}+m\right) &= \left(m-\frac{1}{2}\right)\Gamma\!\left(m-\frac{1}{2}\right) = \left(m-\frac{1}{2}\right)\left(m-\frac{3}{2}\right)\Gamma\!\left(m-\frac{3}{2}\right) = \cdots \\
&= \left(m-\frac{1}{2}\right)\left(m-\frac{3}{2}\right)\cdots\frac{1}{2}\,\Gamma\!\left(\frac{1}{2}\right) = 2^{-m}(2m-1)(2m-3)\cdots 3\cdot 1\cdot\Gamma\!\left(\frac{1}{2}\right) \\
&= 2^{-m}\frac{(2m-1)!}{2^{m-1}(m-1)!}\,\Gamma\!\left(\frac{1}{2}\right) = 2^{-2m}\frac{(2m)!}{m!}\,\Gamma\!\left(\frac{1}{2}\right).
\end{aligned} \tag{3.195}
\]

Notice that (3.195) still holds even if m = 0. Inserting n - k into m of (3.195), we get

\[
\Gamma\!\left(\frac{1}{2}+n-k\right) = 2^{-2(n-k)}\frac{(2n-2k)!}{(n-k)!}\,\Gamma\!\left(\frac{1}{2}\right). \tag{3.196}
\]

Replacing Γ(1/2 + n - k) of (3.193) with the RHS of the above equation, (3.194) follows. The gamma function Γ(1/2) often appears in mathematical physics. According to (3.185), we have

\[
\Gamma\!\left(\frac{1}{2}\right) = 2\int_0^{\infty}e^{-u^2}\,du = \sqrt{\pi}.
\]

For the derivation of the above definite integral, see (2.86) of Sect. 2.4. From (3.184), we also have

\[
\Gamma(1) = 1.
\]
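These gamma-function identities lend themselves to quick numerical confirmation; the following minimal sketch checks (3.195) for the first several m.

```python
# Sketch: check the half-integer gamma formula (3.195),
# Gamma(1/2 + m) = 2^(-2m) (2m)!/m! * Gamma(1/2), with Gamma(1/2) = sqrt(pi).
from math import gamma, factorial, sqrt, pi, isclose

for m in range(10):
    lhs = gamma(0.5 + m)
    rhs = 2.0**(-2*m) * factorial(2*m) / factorial(m) * sqrt(pi)
    assert isclose(lhs, rhs, rel_tol=1e-12)
print("(3.195) holds numerically for m = 0, ..., 9")
```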

In relation to the discussion of Sect. 3.5, let us derive an important formula for the Legendre polynomials. From (3.180) and (3.192), we get

\[
\left(1-2tx+t^2\right)^{-1/2} \equiv \sum_{n=0}^{\infty}P_n(x)\,t^n. \tag{3.197}
\]

Assuming |t| < 1, when we put x = 1 in (3.197), we have

\[
\left(1-2t+t^2\right)^{-\frac{1}{2}} = \frac{1}{1-t} = \sum_{n=0}^{\infty}t^n = \sum_{n=0}^{\infty}P_n(1)\,t^n. \tag{3.198}
\]

Comparing the individual coefficients of t^n in (3.198), we get

\[
P_n(1) = 1.
\]

See the related parts of Sect. 3.5.


Now we are in a position to determine the constant in (3.179). Differentiating (3.194) m times, we have

\[
\begin{aligned}
\frac{d^m P_l(x)}{dx^m} &= \sum_{k=0}^{[(l-m)/2]}\frac{(-1)^k(2l-2k)!\,(l-2k)(l-2k-1)\cdots(l-2k-m+1)}{2^l\,k!(l-k)!(l-2k)!}x^{l-2k-m} \\
&= \sum_{k=0}^{[(l-m)/2]}\frac{(-1)^k(2l-2k)!}{2^l\,k!(l-k)!(l-2k-m)!}x^{l-2k-m}.
\end{aligned} \tag{3.199}
\]

Meanwhile, we have

\[
C_{l-m}^{m+\frac{1}{2}}(x) = \sum_{k=0}^{[(l-m)/2]}\frac{(-1)^k 2^{l-2k-m}\,\Gamma\!\left(l+\frac{1}{2}-k\right)}{k!(l-2k-m)!\,\Gamma\!\left(m+\frac{1}{2}\right)}x^{l-2k-m}. \tag{3.200}
\]

Using (3.195) and (3.196), we have

\[
\frac{\Gamma\!\left(l+\frac{1}{2}-k\right)}{\Gamma\!\left(m+\frac{1}{2}\right)} = 2^{-2(l-k-m)}\frac{(2l-2k)!}{(l-k)!}\frac{m!}{(2m)!}.
\]

Therefore, we get

\[
C_{l-m}^{m+\frac{1}{2}}(x) = \sum_{k=0}^{[(l-m)/2]}\frac{(-1)^k(2l-2k)!}{2^l\,k!(l-k)!(l-2k-m)!}\frac{2^m\,\Gamma(m+1)}{\Gamma(2m+1)}x^{l-2k-m}, \tag{3.201}
\]

where we used m! = Γ(m + 1) and (2m)! = Γ(2m + 1). Comparing (3.199) and (3.201), we get

\[
\frac{d^m P_l(x)}{dx^m} = \frac{\Gamma(2m+1)}{2^m\,\Gamma(m+1)}\,C_{l-m}^{m+\frac{1}{2}}(x). \tag{3.202}
\]

Thus, we find that the constant appearing in (3.179) is Γ(2m + 1)/[2^m Γ(m + 1)]. Putting m = 0 in (3.202), we have P_l(x) = C_l^{1/2}(x); therefore, (3.192) is certainly recovered. This gives an easy check of (3.202).
Meanwhile, the Rodrigues formula for the Gegenbauer polynomials [5] is given by

\[
C_n^{\lambda}(x) = \frac{(-1)^n\,\Gamma(n+2\lambda)\,\Gamma\!\left(\lambda+\frac{1}{2}\right)}{2^n\,n!\,\Gamma\!\left(n+\lambda+\frac{1}{2}\right)\Gamma(2\lambda)}\left(1-x^2\right)^{-\lambda+\frac{1}{2}}\frac{d^n}{dx^n}\left[\left(1-x^2\right)^{n+\lambda-\frac{1}{2}}\right]. \tag{3.203}
\]

Hence, we have

\[
C_{l-m}^{m+\frac{1}{2}}(x) = \frac{(-1)^{l-m}\,\Gamma(l+m+1)\,\Gamma(m+1)}{2^{l-m}(l-m)!\,\Gamma(l+1)\,\Gamma(2m+1)}\left(1-x^2\right)^{-m}\frac{d^{l-m}}{dx^{l-m}}\left[\left(1-x^2\right)^l\right]. \tag{3.204}
\]

Inserting (3.204) into (3.202), we have

\[
\frac{d^m P_l(x)}{dx^m} = \frac{(-1)^{l-m}\,\Gamma(l+m+1)}{2^l(l-m)!\,\Gamma(l+1)}\left(1-x^2\right)^{-m}\frac{d^{l-m}}{dx^{l-m}}\left[\left(1-x^2\right)^l\right]
= \frac{(-1)^{l-m}(l+m)!}{2^l\,l!(l-m)!}\left(1-x^2\right)^{-m}\frac{d^{l-m}}{dx^{l-m}}\left[\left(1-x^2\right)^l\right]. \tag{3.205}
\]

Further inserting this into (3.175), we finally get

\[
P_l^m(x) = \frac{(-1)^{l-m}(l+m)!}{2^l\,l!(l-m)!}\left(1-x^2\right)^{-m/2}\frac{d^{l-m}}{dx^{l-m}}\left[\left(1-x^2\right)^l\right]. \tag{3.206}
\]

When m = 0, we have

\[
P_l^0(x) = \frac{(-1)^l}{2^l\,l!}\frac{d^l}{dx^l}\left[\left(1-x^2\right)^l\right] = P_l(x). \tag{3.207}
\]

Thus, we recover the functional form of the Legendre polynomials. The expression (3.206) is also meaningful for negative m, provided |m| ≤ l, and permits an extension of the definition of P_l^m(x) given by (3.175) to negative numbers of m [5]. Changing m to -m in (3.206), we have

\[
P_l^{-m}(x) = \frac{(-1)^{l+m}(l-m)!}{2^l\,l!(l+m)!}\left(1-x^2\right)^{m/2}\frac{d^{l+m}}{dx^{l+m}}\left[\left(1-x^2\right)^l\right]. \tag{3.208}
\]

Meanwhile, from (3.168) and (3.175),

\[
P_l^m(x) = \frac{(-1)^l}{2^l\,l!}\left(1-x^2\right)^{m/2}\frac{d^{l+m}}{dx^{l+m}}\left[\left(1-x^2\right)^l\right] \quad (0 \le m \le l). \tag{3.209}
\]

Comparing (3.208) and (3.209), we get

\[
P_l^{-m}(x) = \frac{(-1)^m(l-m)!}{(l+m)!}\,P_l^m(x) \quad (-l \le m \le l). \tag{3.210}
\]

Thus, as expected earlier, P_l^{-m}(x) and P_l^m(x) are linearly dependent.
Now we return to (3.149). From (3.206) we have

\[
\left(1-x^2\right)^{-m/2}\frac{d^{l-m}}{dx^{l-m}}\left[\left(1-x^2\right)^l\right] = \frac{(-1)^{l-m}\,2^l\,l!(l-m)!}{(l+m)!}\,P_l^m(x). \tag{3.211}
\]

Inserting (3.211) into (3.149) and changing the variable x to ξ, we have

\[
Y_l^m(\theta,\phi) = (-1)^m\sqrt{\frac{(2l+1)(l-m)!}{4\pi\,(l+m)!}}\,P_l^m(\xi)\,e^{im\phi} \quad (\xi = \cos\theta;\ 0 \le \theta \le \pi). \tag{3.212}
\]

The coefficient (-1)^m appearing in (3.212) is well known as the Condon–Shortley phase [7]. Another important expression, obtained from (3.210) and (3.212), is

\[
Y_l^{-m}(\theta,\phi) = (-1)^m\,Y_l^m(\theta,\phi)^*. \tag{3.213}
\]

Since (3.208) and (3.209) involve higher-order differentiations, it would be somewhat inconvenient to find their functional forms. Here we seek a convenient representation of the spherical harmonics using the familiar cosine and sine functions. Starting with (3.206) and applying the Leibniz rule there, we have

\[
\begin{aligned}
P_l^m(x) &= \frac{(-1)^{l-m}(l+m)!}{2^l\,l!(l-m)!}(1+x)^{-m/2}(1-x)^{-m/2}\sum_{r=0}^{l-m}\frac{(l-m)!}{r!(l-m-r)!}\frac{d^r}{dx^r}\left[(1+x)^l\right]\frac{d^{l-m-r}}{dx^{l-m-r}}\left[(1-x)^l\right] \\
&= \frac{(-1)^{l-m}(l+m)!}{2^l\,l!(l-m)!}\sum_{r=0}^{l-m}\frac{(l-m)!}{r!(l-m-r)!}\frac{l!\,(1+x)^{l-r-\frac{m}{2}}}{(l-r)!}\frac{(-1)^{l-m-r}\,l!\,(1-x)^{m+r-\frac{m}{2}}}{(m+r)!} \\
&= \frac{l!\,(l+m)!}{2^l}\sum_{r=0}^{l-m}\frac{(-1)^r}{r!(l-m-r)!}\frac{(1+x)^{l-r-\frac{m}{2}}}{(l-r)!}\frac{(1-x)^{r+\frac{m}{2}}}{(m+r)!}.
\end{aligned} \tag{3.214}
\]

Putting x = cos θ in (3.214) and using a trigonometric formula, we have

\[
P_l^m(\cos\theta) = l!\,(l+m)!\sum_{r=0}^{l-m}\frac{(-1)^r\cos^{2l-2r-m}\frac{\theta}{2}\,\sin^{2r+m}\frac{\theta}{2}}{r!(l-m-r)!\,(l-r)!\,(m+r)!}. \tag{3.215}
\]

Inserting this into (3.212), we get

\[
Y_l^m(\theta,\phi) = l!\sqrt{\frac{(2l+1)(l+m)!(l-m)!}{4\pi}}\,e^{im\phi}\sum_{r=0}^{l-m}\frac{(-1)^{r+m}\cos^{2l-2r-m}\frac{\theta}{2}\,\sin^{2r+m}\frac{\theta}{2}}{r!(l-m-r)!(l-r)!(m+r)!}. \tag{3.216}
\]

The summation domain of r must be determined so that factorials of negative integers are avoided [6]. That is,

(i) If m ≥ 0: 0 ≤ r ≤ l - m, i.e., (l - m + 1) terms.
(ii) If m < 0: |m| ≤ r ≤ l, i.e., (l - |m| + 1) terms.

For example, if we choose l for m, putting r = 0 in (3.216) we have

\[
Y_l^l(\theta,\phi) = \sqrt{\frac{(2l+1)!}{4\pi}}\,e^{il\phi}\frac{(-1)^l\cos^l\frac{\theta}{2}\,\sin^l\frac{\theta}{2}}{l!} = \sqrt{\frac{(2l+1)!}{4\pi}}\,e^{il\phi}\frac{(-1)^l\sin^l\theta}{2^l\,l!}. \tag{3.217}
\]

In particular, we have Y_0^0(θ, φ) = √(1/4π), which recovers (3.150). When m = -l, putting r = l in (3.216) we get

\[
Y_l^{-l}(\theta,\phi) = \sqrt{\frac{(2l+1)!}{4\pi}}\,e^{-il\phi}\frac{\cos^l\frac{\theta}{2}\,\sin^l\frac{\theta}{2}}{l!} = \sqrt{\frac{(2l+1)!}{4\pi}}\,e^{-il\phi}\frac{\sin^l\theta}{2^l\,l!}. \tag{3.218}
\]

For instance, choosing l = 3 and m = ±3 and using (3.217) or (3.218), we have

\[
Y_3^3(\theta,\phi) = -\sqrt{\frac{35}{64\pi}}\,e^{i3\phi}\sin^3\theta, \qquad Y_3^{-3}(\theta,\phi) = \sqrt{\frac{35}{64\pi}}\,e^{-i3\phi}\sin^3\theta.
\]

The minus sign appearing in Y_3^3(θ, φ) is due to the Condon–Shortley phase. For l = 3 and m = 0, moreover, we have

\[
\begin{aligned}
Y_3^0(\theta,\phi) &= 3!\sqrt{\frac{7\cdot 3!\cdot 3!}{4\pi}}\sum_{r=0}^{3}\frac{(-1)^r\cos^{6-2r}\frac{\theta}{2}\,\sin^{2r}\frac{\theta}{2}}{r!(3-r)!\,(3-r)!\,r!} \\
&= 18\sqrt{\frac{7}{\pi}}\left[\frac{\cos^6\frac{\theta}{2}}{0!3!3!0!} - \frac{\cos^4\frac{\theta}{2}\sin^2\frac{\theta}{2}}{1!2!2!1!} + \frac{\cos^2\frac{\theta}{2}\sin^4\frac{\theta}{2}}{2!1!1!2!} - \frac{\sin^6\frac{\theta}{2}}{3!0!0!3!}\right] \\
&= 18\sqrt{\frac{7}{\pi}}\left[\frac{\cos^6\frac{\theta}{2}-\sin^6\frac{\theta}{2}}{36} - \frac{\cos^2\frac{\theta}{2}\sin^2\frac{\theta}{2}\left(\cos^2\frac{\theta}{2}-\sin^2\frac{\theta}{2}\right)}{4}\right] \\
&= \sqrt{\frac{7}{\pi}}\left(\frac{5}{4}\cos^3\theta - \frac{3}{4}\cos\theta\right),
\end{aligned}
\]

where in the last equality we used formulae of elementary algebra and of trigonometric functions. At the same time, we get

\[
Y_3^0(0,\phi) = \sqrt{\frac{7}{4\pi}}.
\]

This is consistent with (3.147) in that Y_3^0(0, φ) is positive.
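These closed forms can be checked against a standard library. The sketch below assumes SciPy, whose sph_harm carries the same Condon–Shortley convention as (3.212); note that scipy.special.sph_harm(m, l, φ, θ) takes the azimuthal angle before the polar one.

```python
# Sketch (assumes scipy): compare the closed forms of Y_3^{+-3} and Y_3^0
# derived above with scipy.special.sph_harm, which uses the same
# Condon-Shortley convention as (3.212).
import numpy as np
from scipy.special import sph_harm

theta, phi = 0.7, 1.1                    # polar, azimuthal angles (radians)
Y3p3 = -np.sqrt(35/(64*np.pi)) * np.exp(3j*phi) * np.sin(theta)**3
Y3m3 = np.sqrt(35/(64*np.pi)) * np.exp(-3j*phi) * np.sin(theta)**3
Y30 = np.sqrt(7/np.pi) * (1.25*np.cos(theta)**3 - 0.75*np.cos(theta))

assert np.isclose(Y3p3, sph_harm(3, 3, phi, theta))
assert np.isclose(Y3m3, sph_harm(-3, 3, phi, theta))
assert np.isclose(Y30, sph_harm(0, 3, phi, theta))
print("closed forms of Y_3^m agree with scipy")
```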

3.6.2 Orthogonality of Associated Legendre Functions

The orthogonality relations of functions are important. Here we deal with them for the associated Legendre functions. Replacing m with (m - 1) in (3.174) and using the abbreviated notation d^m ≡ d^m/dx^m introduced before, we have

\[
\left(1-x^2\right)d^{m+1}P_l - 2mx\,d^m P_l + (l+m)(l-m+1)\,d^{m-1}P_l = 0. \tag{3.219}
\]

Multiplying both sides by (1 - x²)^{m-1}, we have

\[
\left(1-x^2\right)^m d^{m+1}P_l - 2mx\left(1-x^2\right)^{m-1}d^m P_l + (l+m)(l-m+1)\left(1-x^2\right)^{m-1}d^{m-1}P_l = 0.
\]

Rewriting the above equation, we get

\[
d\!\left[\left(1-x^2\right)^m d^m P_l\right] = -(l+m)(l-m+1)\left(1-x^2\right)^{m-1}d^{m-1}P_l. \tag{3.220}
\]

Now, let us define f(m) as follows:

\[
f(m) \equiv \int_{-1}^{1}\left(1-x^2\right)^m\left(d^m P_l\right)\left(d^m P_{l'}\right)dx \quad (0 \le m \le l, l'). \tag{3.221}
\]

Rewriting (3.221) as follows and integrating by parts, we have

\[
\begin{aligned}
f(m) &= \int_{-1}^{1}\left(1-x^2\right)^m\left[d\!\left(d^{m-1}P_l\right)\right]d^m P_{l'}\,dx \\
&= \left[\left(d^{m-1}P_l\right)\left(1-x^2\right)^m d^m P_{l'}\right]_{-1}^{1} - \int_{-1}^{1}\left(d^{m-1}P_l\right)d\!\left[\left(1-x^2\right)^m d^m P_{l'}\right]dx \\
&= -\int_{-1}^{1}\left(d^{m-1}P_l\right)d\!\left[\left(1-x^2\right)^m d^m P_{l'}\right]dx \\
&= \int_{-1}^{1}\left(d^{m-1}P_l\right)(l'+m)(l'-m+1)\left(1-x^2\right)^{m-1}d^{m-1}P_{l'}\,dx \\
&= (l'+m)(l'-m+1)\,f(m-1),
\end{aligned} \tag{3.222}
\]

where in the second equality the first term vanishes, and in the second-to-last equality we used (3.220). Equation (3.222) gives a recurrence formula for f(m). Performing the calculation repeatedly, we get

\[
\begin{aligned}
f(m) &= (l'+m)(l'+m-1)\cdots(l'-m+2)(l'-m+1)\,f(m-2) \\
&= \cdots \\
&= (l'+m)(l'+m-1)\cdots(l'+1)\cdot l'\cdots(l'-m+2)(l'-m+1)\,f(0) = \frac{(l'+m)!}{(l'-m)!}\,f(0),
\end{aligned} \tag{3.223}
\]

where

\[
f(0) = \int_{-1}^{1}P_l(x)P_{l'}(x)\,dx. \tag{3.224}
\]

Note that in (3.223) the coefficient of f(0) comprises 2m factors.


In (3.224), P_l(x) and P_{l'}(x) are the Legendre polynomials defined in (3.165). Then, using (3.168) we have

\[
f(0) = \frac{(-1)^l}{2^l\,l!}\frac{(-1)^{l'}}{2^{l'}\,l'!}\int_{-1}^{1}\left[d^l\!\left(1-x^2\right)^l\right]\left[d^{l'}\!\left(1-x^2\right)^{l'}\right]dx. \tag{3.225}
\]

To evaluate (3.224), we have two cases: (i) l ≠ l' and (ii) l = l'. In the first case, assuming that l > l' and integrating by parts, we have

\[
\begin{aligned}
I &= \int_{-1}^{1}\left[d^l\!\left(1-x^2\right)^l\right]\left[d^{l'}\!\left(1-x^2\right)^{l'}\right]dx \\
&= \left[\left\{d^{l-1}\!\left(1-x^2\right)^l\right\}\left\{d^{l'}\!\left(1-x^2\right)^{l'}\right\}\right]_{-1}^{1} - \int_{-1}^{1}\left[d^{l-1}\!\left(1-x^2\right)^l\right]\left[d^{l'+1}\!\left(1-x^2\right)^{l'}\right]dx.
\end{aligned} \tag{3.226}
\]

In the above, the first term vanishes because it contains (1 - x²) as a factor. Integrating (3.226) by parts another l' times as before, we get

\[
I = (-1)^{l'+1}\int_{-1}^{1}\left[d^{l-l'-1}\!\left(1-x^2\right)^l\right]\left[d^{2l'+1}\!\left(1-x^2\right)^{l'}\right]dx. \tag{3.227}
\]

In (3.227), 0 ≤ l - l' - 1 ≤ 2l, and so d^{l-l'-1}(1 - x²)^l does not vanish; but (1 - x²)^{l'} is an at most 2l'-degree polynomial and, hence, d^{2l'+1}(1 - x²)^{l'} vanishes. Therefore,

\[
f(0) = 0. \tag{3.228}
\]

If l < l', exchanging the roles of P_l(x) and P_{l'}(x) in (3.224), we get f(0) = 0 as well.
In the second case of l = l', we evaluate the following integral:

\[
I = \int_{-1}^{1}\left[d^l\!\left(1-x^2\right)^l\right]^2 dx. \tag{3.229}
\]

Similarly, integrating (3.229) by parts l times, we have

\[
I = (-1)^l\int_{-1}^{1}\left(1-x^2\right)^l d^{2l}\!\left[\left(1-x^2\right)^l\right]dx = (-1)^{2l}(2l)!\int_{-1}^{1}\left(1-x^2\right)^l dx. \tag{3.230}
\]

In (3.230), changing x to cos θ, we have

\[
\int_{-1}^{1}\left(1-x^2\right)^l dx = \int_0^{\pi}\sin^{2l+1}\theta\,d\theta. \tag{3.231}
\]

We have already estimated this integral in (3.132) to be 2^{2l+1}(l!)²/(2l + 1)!. Therefore,

\[
f(0) = \frac{(-1)^{2l}(2l)!}{2^{2l}(l!)^2}\cdot\frac{2^{2l+1}(l!)^2}{(2l+1)!} = \frac{2}{2l+1}. \tag{3.232}
\]

Thus, we get

\[
f(m) = \frac{(l+m)!}{(l-m)!}\,f(0) = \frac{(l+m)!}{(l-m)!}\cdot\frac{2}{2l+1}. \tag{3.233}
\]

From (3.228) and (3.233), we have

\[
\int_{-1}^{1}P_l^m(x)P_{l'}^m(x)\,dx = \frac{(l+m)!}{(l-m)!}\cdot\frac{2}{2l+1}\,\delta_{ll'}. \tag{3.234}
\]

Accordingly, putting

\[
\widetilde{P}_l^m(x) \equiv \sqrt{\frac{(2l+1)(l-m)!}{2(l+m)!}}\,P_l^m(x), \tag{3.235}
\]

we get

\[
\int_{-1}^{1}\widetilde{P}_l^m(x)\widetilde{P}_{l'}^m(x)\,dx = \delta_{ll'}. \tag{3.236}
\]

The normalized Legendre polynomials immediately follow. They are given by

\[
\widetilde{P}_l(x) \equiv \sqrt{\frac{2l+1}{2}}\,P_l(x) = \frac{(-1)^l}{2^l\,l!}\sqrt{\frac{2l+1}{2}}\frac{d^l}{dx^l}\left[\left(1-x^2\right)^l\right]. \tag{3.237}
\]

Combining the normalized function (3.235) with (1/√2π)e^{imφ}, we recover

\[
Y_l^m(\theta,\phi) = \sqrt{\frac{(2l+1)(l-m)!}{4\pi(l+m)!}}\,P_l^m(x)\,e^{im\phi} \quad (x = \cos\theta;\ 0 \le \theta \le \pi). \tag{3.238}
\]

Notice in (3.238), however, that we could not determine the Condon–Shortley phase (-1)^m in this way; see (3.212).
Since P_l^m(x) and P_l^{-m}(x) are linearly dependent, as noted in (3.210), the set of the associated Legendre functions alone cannot define a complete orthonormal system. In fact, we have

\[
\int_{-1}^{1}P_l^m(x)P_l^{-m}(x)\,dx = \frac{(-1)^m(l-m)!}{(l+m)!}\cdot\frac{(l+m)!}{(l-m)!}\cdot\frac{2}{2l+1} = \frac{2(-1)^m}{2l+1}. \tag{3.239}
\]

This means that P_l^m(x) and P_l^{-m}(x) are not orthogonal. Thus, we need e^{imφ} to constitute the complete orthonormal system. In other words,

\[
\int_0^{2\pi}d\phi\int_{-1}^{1}d(\cos\theta)\left[Y_{l'}^{m'}(\theta,\phi)\right]^*Y_l^m(\theta,\phi) = \delta_{ll'}\,\delta_{mm'}. \tag{3.240}
\]
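The orthogonality relation (3.234) is easy to confirm by quadrature. A minimal sketch follows, assuming SciPy; its lpmv carries the Condon–Shortley phase, which cancels in the product of two functions of the same m and so does not affect (3.234).

```python
# Sketch (assumes scipy): verify the orthogonality relation (3.234) by
# Gauss-Legendre quadrature. scipy.special.lpmv carries the Condon-Shortley
# phase (-1)^m, which cancels in the product P_l^m P_l'^m.
import numpy as np
from math import factorial
from scipy.special import lpmv

x, w = np.polynomial.legendre.leggauss(60)
m = 2
for l in range(m, 6):
    for lp in range(m, 6):
        integral = np.sum(w * lpmv(m, l, x) * lpmv(m, lp, x))
        expected = 2*factorial(l + m)/((2*l + 1)*factorial(l - m)) if l == lp else 0.0
        assert np.isclose(integral, expected, atol=1e-10)
print("(3.234) verified for m = 2 and l, l' <= 5")
```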

3.7 Radial Wave Functions of Hydrogen-Like Atoms

In Sect. 3.1 we constructed the Hamiltonian of hydrogen-like atoms. If the physical system is characterized by a central force field, the method of separation of variables into the angular part (θ, φ) and the radial (r) part is successfully applied to the problem, and that method allows us to deal with the Schrödinger equation separately. The spherical surface harmonics play a central role in dealing with the differential equations related to the angular part. We studied important properties of the special functions, such as the Legendre polynomials and associated Legendre functions, independent of the nature of the specific central force field, be it the Coulomb potential or the Yukawa potential. With the Schrödinger equation pertinent to the radial part, on the other hand, the characteristics differ depending on the nature of the individual force fields. Of these, the differential equation associated with the Coulomb potential gives exact (or analytical) solutions. It is well known that second-order differential equations can often be solved by an operator representation method. Examples include its application to a quantum-mechanical harmonic oscillator and to the angular momentum of a particle placed in a central force field. Nonetheless, the corresponding approach to the radial equation for the electron has been less popular to date. The initial approach, however, was made by Sunakawa [3]. The purpose of this section rests upon further improvement of that approach.

3.7.1 Operator Approach to Radial Wave Functions

In Sect. 3.2, the separation of variables led to the radial part of the Schrödinger equation, described as

\[
-\frac{1}{2\mu}\frac{\hbar^2}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial R(r)}{\partial r}\right) + \frac{\hbar^2\lambda}{2\mu r^2}R(r) - \frac{Ze^2}{4\pi\varepsilon_0 r}R(r) = ER(r). \tag{3.51}
\]

We identified λ with l(l + 1) in (3.124). Thus, rewriting (3.51) and indexing R(r) with l, we have

\[
-\frac{\hbar^2}{2\mu r^2}\frac{d}{dr}\left(r^2\frac{dR_l(r)}{dr}\right) + \left[\frac{\hbar^2 l(l+1)}{2\mu r^2} - \frac{Ze^2}{4\pi\varepsilon_0 r}\right]R_l(r) = ER_l(r), \tag{3.241}
\]

where R_l(r) is a radial wave function parametrized with l; μ, Z, ε₀, and E denote the reduced mass of the hydrogen-like atom, the atomic number, the permittivity of vacuum, and the energy eigenvalue, respectively. Otherwise we follow the usual conventions.

Now we are in a position to solve (3.241). As in the cases of the quantum-mechanical harmonic oscillator of Chap. 2 and the angular momentum operator of the previous section, we present the operator formalism for dealing with the radial wave functions of hydrogen-like atoms. The essential point rests upon the fact that the radial wave functions can be derived by successively operating with lowering operators on the radial wave function having the maximum allowed orbital angular momentum quantum number. The results agree with the conventional coordinate representation method based upon a power series expansion related to the associated Laguerre polynomials.

Sunakawa [3] introduced the following differential equation by suitable transformations of variable, parameter, and function:

\[
-\frac{d^2\psi_l(\rho)}{d\rho^2} + \left[\frac{l(l+1)}{\rho^2} - \frac{2}{\rho}\right]\psi_l(\rho) = \mathcal{E}\,\psi_l(\rho), \tag{3.242}
\]

where ρ = Zr/a, 𝓔 = (2μ/ħ²)(a/Z)²E, and ψ_l(ρ) = ρR_l(r), with a (= 4πε₀ħ²/μe²) being the Bohr radius of the hydrogen-like atom. Note that ρ and 𝓔 are dimensionless quantities. The related calculations are as follows: We have

\[
\frac{dR_l}{dr} = \frac{d(\psi_l/\rho)}{d\rho}\frac{d\rho}{dr} = \left(\frac{d\psi_l}{d\rho}\frac{1}{\rho} - \frac{\psi_l}{\rho^2}\right)\frac{Z}{a}.
\]

Thus, we get

\[
r^2\frac{dR_l}{dr} = \left(\frac{d\psi_l}{d\rho}\rho - \psi_l\right)\frac{a}{Z}, \qquad \frac{d}{dr}\left(r^2\frac{dR_l}{dr}\right) = \frac{d^2\psi_l(\rho)}{d\rho^2}\rho.
\]

Using the above relations, we arrive at (3.242).


Here we define the following operators:

\[
b_l \equiv \frac{d}{d\rho} + \frac{l}{\rho} - \frac{1}{l}. \tag{3.243}
\]

Hence,

\[
b_l^{\dagger} = -\frac{d}{d\rho} + \frac{l}{\rho} - \frac{1}{l}, \tag{3.244}
\]

where the operator b_l^† is the adjoint operator of b_l. Notice that these definitions are different from those of Sunakawa [3]. The operator d/dρ (≡ A) is formally an anti-Hermitian operator. We have mentioned such operators in Sect. 1.5. The remaining part of (3.243) and (3.244), l/ρ - 1/l, is a Hermitian operator, which we define as H. Thus, we foresee that b_l and b_l^† can be denoted as follows:

\[
b_l = A + H \quad \text{and} \quad b_l^{\dagger} = -A + H.
\]

These representations are analogous to those appearing in the operator formalism of a quantum-mechanical harmonic oscillator. Special care, however, should be taken in dealing with the operators b_l and b_l^†. First, we should carefully examine whether d/dρ is in fact an anti-Hermitian operator. This is because, for d/dρ to be anti-Hermitian, the solution ψ_l(ρ) must satisfy boundary conditions such that ψ_l(ρ) vanishes or takes the same value at the endpoints ρ → 0 and ρ → ∞. Second, the coordinate system we have chosen is not a Cartesian but the polar (spherical) coordinate system, and so ρ is defined only on the domain ρ > 0. We will come back to this point later.

Let us proceed with the calculations. We have

\[
\begin{aligned}
b_l b_l^{\dagger} &= \left(\frac{d}{d\rho} + \frac{l}{\rho} - \frac{1}{l}\right)\left(-\frac{d}{d\rho} + \frac{l}{\rho} - \frac{1}{l}\right) \\
&= -\frac{d^2}{d\rho^2} - \frac{l}{\rho^2} + \frac{l^2}{\rho^2} - \frac{2}{\rho} + \frac{1}{l^2} = -\frac{d^2}{d\rho^2} + \frac{l(l-1)}{\rho^2} - \frac{2}{\rho} + \frac{1}{l^2}.
\end{aligned} \tag{3.245}
\]

Also, we have

\[
b_l^{\dagger}b_l = -\frac{d^2}{d\rho^2} + \frac{l(l+1)}{\rho^2} - \frac{2}{\rho} + \frac{1}{l^2}. \tag{3.246}
\]

We further define an operator H^{(l)} as follows:

\[
H^{(l)} \equiv -\frac{d^2}{d\rho^2} + \frac{l(l+1)}{\rho^2} - \frac{2}{\rho}.
\]

Then, from (3.243) and (3.244) as well as (3.245) and (3.246), we have

\[
H^{(l)} = b_{l+1}b_{l+1}^{\dagger} + \varepsilon^{(l)} \quad (l \ge 0), \tag{3.247}
\]

where ε^{(l)} ≡ -1/(l + 1)². Alternatively,

\[
H^{(l)} = b_l^{\dagger}b_l + \varepsilon^{(l-1)} \quad (l \ge 1). \tag{3.248}
\]
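Both factorizations can be confirmed symbolically. The following sketch, assuming sympy is available, applies each side to a generic function f(ρ) and checks that the residues vanish identically.

```python
# Sketch (assumes sympy): apply both sides of the factorizations (3.247) and
# (3.248) to a generic f(rho) and check that the residues vanish identically.
import sympy as sp

rho, l = sp.symbols('rho l', positive=True)
f = sp.Function('f')(rho)

def b(k, g):       # b_k g = g' + (k/rho - 1/k) g, cf. (3.243)
    return sp.diff(g, rho) + (k/rho - 1/k)*g

def bdag(k, g):    # adjoint b_k^dagger g = -g' + (k/rho - 1/k) g, cf. (3.244)
    return -sp.diff(g, rho) + (k/rho - 1/k)*g

Hf = -sp.diff(f, rho, 2) + (l*(l + 1)/rho**2 - 2/rho)*f       # H^(l) f
rhs1 = b(l + 1, bdag(l + 1, f)) - f/(l + 1)**2                # (3.247), eps^(l)
rhs2 = bdag(l, b(l, f)) - f/l**2                              # (3.248), eps^(l-1)
assert sp.simplify(rhs1 - Hf) == 0 and sp.simplify(rhs2 - Hf) == 0
print("the factorizations (3.247) and (3.248) hold identically")
```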

If we put l = n - 1 in (3.247), with n being a fixed given integer larger than l, we obtain

\[
H^{(n-1)} = b_n b_n^{\dagger} + \varepsilon^{(n-1)}. \tag{3.249}
\]

We evaluate the following inner product of both sides of (3.249):

\[
\left\langle\chi\middle|H^{(n-1)}\middle|\chi\right\rangle = \left\langle\chi\middle|b_n b_n^{\dagger}\middle|\chi\right\rangle + \varepsilon^{(n-1)}\langle\chi|\chi\rangle
= \left\langle b_n^{\dagger}\chi\middle|b_n^{\dagger}\chi\right\rangle + \varepsilon^{(n-1)}\langle\chi|\chi\rangle
= \left\|\,b_n^{\dagger}|\chi\rangle\right\|^2 + \varepsilon^{(n-1)}\langle\chi|\chi\rangle \ge \varepsilon^{(n-1)}. \tag{3.250}
\]

Here we assume that χ is normalized. On the basis of the variational principle [9], the above expectation value must take the minimum ε^{(n-1)} in order for χ to be an eigenfunction. To satisfy this condition, we must have

\[
\left|b_n^{\dagger}\chi\right\rangle = 0. \tag{3.251}
\]

In fact, if (3.251) holds, from (3.249) we have

\[
H^{(n-1)}\chi = \varepsilon^{(n-1)}\chi. \tag{3.252}
\]

We define such a function as

\[
\psi_{n-1}^{(n)} \equiv \chi. \tag{3.253}
\]

From (3.247) and (3.248), we have the following relationship:

\[
H^{(l)}b_{l+1} = b_{l+1}H^{(l+1)} \quad (l \ge 0). \tag{3.254}
\]

Meanwhile, we define the functions

\[
\psi_{n-s}^{(n)} \equiv b_{n-s+1}b_{n-s+2}\cdots b_{n-1}\psi_{n-1}^{(n)} \quad (2 \le s \le n). \tag{3.255}
\]

With these functions, (s - 1) operators have been operated on ψ_{n-1}^{(n)}. Note that if s took 1, no operation of b_l would take place. Thus, we find that b_l acts upon the l-state to produce the (l - 1)-state; that is, b_l acts as an annihilation operator. For the sake of convenience we write

\[
H^{(n,s)} \equiv H^{(n-s)}. \tag{3.256}
\]

Using this notation and (3.254), we have

\[
\begin{aligned}
H^{(n,s)}\psi_{n-s}^{(n)} &= H^{(n,s)}b_{n-s+1}b_{n-s+2}\cdots b_{n-1}\psi_{n-1}^{(n)} \\
&= b_{n-s+1}H^{(n,s-1)}b_{n-s+2}\cdots b_{n-1}\psi_{n-1}^{(n)} \\
&= b_{n-s+1}b_{n-s+2}H^{(n,s-2)}\cdots b_{n-1}\psi_{n-1}^{(n)} \\
&= \cdots \\
&= b_{n-s+1}b_{n-s+2}\cdots H^{(n,2)}b_{n-1}\psi_{n-1}^{(n)} \\
&= b_{n-s+1}b_{n-s+2}\cdots b_{n-1}H^{(n,1)}\psi_{n-1}^{(n)} \\
&= b_{n-s+1}b_{n-s+2}\cdots b_{n-1}\varepsilon^{(n-1)}\psi_{n-1}^{(n)} \\
&= \varepsilon^{(n-1)}b_{n-s+1}b_{n-s+2}\cdots b_{n-1}\psi_{n-1}^{(n)} = \varepsilon^{(n-1)}\psi_{n-s}^{(n)}.
\end{aligned} \tag{3.257}
\]

Thus, in total, the n functions ψ_{n-s}^{(n)} (1 ≤ s ≤ n) belong to the same eigenvalue ε^{(n-1)}. Notice that the eigenenergy E_n corresponding to ε^{(n-1)} is given by

\[
E_n = -\frac{\hbar^2}{2\mu}\left(\frac{Z}{a}\right)^2\frac{1}{n^2}. \tag{3.258}
\]

If we define l ≡ n - s and take account of (3.252), the n functions ψ_l^{(n)} (l = 0, 1, 2, ···, n - 1) belong to the same eigenvalue ε^{(n-1)}.
The quantum state ψ_l^{(n)} is associated with the operator H^{(l)}. Thus, the solutions of (3.242) are given by the functions ψ_l^{(n)} parametrized with n and l, on condition that (3.251) holds. As explicitly indicated in (3.255) and (3.257), b_l lowers the parameter l by one, from l to l - 1, when it operates on ψ_l^{(n)}. The operator b₀ cannot be defined, as indicated in (3.243), and so the lowest value of l is zero. Operators such as b_l are known as ladder operators (lowering operators or, in the present case, annihilation operators). The implication is that successive operations of b_l on ψ_{n-1}^{(n)} produce various parameters l as a subscript, down to zero, while retaining the same integer parameter n as a superscript.

3.7.2 Normalization of Radial Wave Functions

Next we seek normalized eigenfunctions. The coordinate representation of (3.251) reads

\[
-\frac{d\psi_{n-1}^{(n)}}{d\rho} + \left(\frac{n}{\rho} - \frac{1}{n}\right)\psi_{n-1}^{(n)} = 0. \tag{3.259}
\]

The solution is obtained as

\[
\psi_{n-1}^{(n)} = c_n\,\rho^n e^{-\rho/n}, \tag{3.260}
\]

where c_n is a normalization constant. This can be determined as follows:

\[
\int_0^{\infty}\left|\psi_{n-1}^{(n)}\right|^2 d\rho = 1. \tag{3.261}
\]

Namely,

\[
|c_n|^2\int_0^{\infty}\rho^{2n}e^{-2\rho/n}\,d\rho = 1. \tag{3.262}
\]

Consider the following definite integral:

\[
\int_0^{\infty}e^{-2\rho\xi}\,d\rho = \frac{1}{2\xi}.
\]

Differentiating the above integral 2n times with respect to ξ gives

\[
\int_0^{\infty}\rho^{2n}e^{-2\rho\xi}\,d\rho = \left(\frac{1}{2}\right)^{2n+1}(2n)!\,\xi^{-(2n+1)}. \tag{3.263}
\]

Substituting 1/n for ξ, we obtain

\[
\int_0^{\infty}\rho^{2n}e^{-2\rho/n}\,d\rho = \left(\frac{1}{2}\right)^{2n+1}(2n)!\,n^{2n+1}. \tag{3.264}
\]

Hence,

\[
c_n = \left(\frac{2}{n}\right)^{n+\frac{1}{2}}\Big/\sqrt{(2n)!}. \tag{3.265}
\]
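The integral (3.264), which fixes the constant (3.265), is readily checked numerically; a minimal sketch assuming SciPy:

```python
# Sketch (assumes scipy): verify the integral (3.264),
# int_0^inf rho^(2n) exp(-2 rho/n) d rho = (1/2)^(2n+1) (2n)! n^(2n+1),
# which fixes the normalization constant c_n of (3.265).
from math import factorial, isclose
import numpy as np
from scipy.integrate import quad

for n in range(1, 6):
    integral, _ = quad(lambda r, n=n: r**(2*n) * np.exp(-2*r/n), 0, np.inf)
    assert isclose(integral, 0.5**(2*n + 1) * factorial(2*n) * n**(2*n + 1),
                   rel_tol=1e-8)
print("(3.264) verified for n = 1, ..., 5")
```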

To further normalize the other wave functions, we calculate the following inner product:

\[
\left\langle\psi_l^{(n)}\middle|\psi_l^{(n)}\right\rangle = \left\langle\psi_{n-1}^{(n)}\middle|b_{n-1}^{\dagger}\cdots b_{l+2}^{\dagger}b_{l+1}^{\dagger}b_{l+1}b_{l+2}\cdots b_{n-1}\middle|\psi_{n-1}^{(n)}\right\rangle. \tag{3.266}
\]

From (3.247) and (3.248), we have

\[
b_l^{\dagger}b_l + \varepsilon^{(l-1)} = b_{l+1}b_{l+1}^{\dagger} + \varepsilon^{(l)} \quad (l \ge 1). \tag{3.267}
\]

Applying (3.267) to (3.266) repeatedly and considering (3.251), we reach the following relationship:

\[
\left\langle\psi_l^{(n)}\middle|\psi_l^{(n)}\right\rangle = \left[\varepsilon^{(n-1)}-\varepsilon^{(n-2)}\right]\left[\varepsilon^{(n-1)}-\varepsilon^{(n-3)}\right]\cdots\left[\varepsilon^{(n-1)}-\varepsilon^{(l)}\right]\left\langle\psi_{n-1}^{(n)}\middle|\psi_{n-1}^{(n)}\right\rangle. \tag{3.268}
\]

To show this, we use mathematical induction. We have already normalized ψ_{n-1}^{(n)} in (3.261). Next, we calculate ⟨ψ_{n-2}^{(n)}|ψ_{n-2}^{(n)}⟩ such that

\[
\begin{aligned}
\left\langle\psi_{n-2}^{(n)}\middle|\psi_{n-2}^{(n)}\right\rangle &= \left\langle b_{n-1}\psi_{n-1}^{(n)}\middle|b_{n-1}\psi_{n-1}^{(n)}\right\rangle = \left\langle\psi_{n-1}^{(n)}\middle|b_{n-1}^{\dagger}b_{n-1}\middle|\psi_{n-1}^{(n)}\right\rangle \\
&= \left\langle\psi_{n-1}^{(n)}\middle|b_n b_n^{\dagger}+\varepsilon^{(n-1)}-\varepsilon^{(n-2)}\middle|\psi_{n-1}^{(n)}\right\rangle \\
&= \left\langle\psi_{n-1}^{(n)}\middle|b_n b_n^{\dagger}\middle|\psi_{n-1}^{(n)}\right\rangle + \left[\varepsilon^{(n-1)}-\varepsilon^{(n-2)}\right]\left\langle\psi_{n-1}^{(n)}\middle|\psi_{n-1}^{(n)}\right\rangle \\
&= \left[\varepsilon^{(n-1)}-\varepsilon^{(n-2)}\right]\left\langle\psi_{n-1}^{(n)}\middle|\psi_{n-1}^{(n)}\right\rangle.
\end{aligned} \tag{3.269}
\]

With the third equality, we used (3.267) with l = n - 1; with the last equality we used (3.251). Therefore, (3.268) holds with l = n - 2. Then it suffices to show that, assuming (3.268) holds for ⟨ψ_{l+1}^{(n)}|ψ_{l+1}^{(n)}⟩, it holds for ⟨ψ_l^{(n)}|ψ_l^{(n)}⟩ as well.
Let us calculate ⟨ψ_l^{(n)}|ψ_l^{(n)}⟩, starting with (3.266):

\[
\begin{aligned}
\left\langle\psi_l^{(n)}\middle|\psi_l^{(n)}\right\rangle &= \left\langle\psi_{n-1}^{(n)}\middle|b_{n-1}^{\dagger}\cdots b_{l+2}^{\dagger}b_{l+1}^{\dagger}b_{l+1}b_{l+2}\cdots b_{n-1}\middle|\psi_{n-1}^{(n)}\right\rangle \\
&= \left\langle\psi_{n-1}^{(n)}\middle|b_{n-1}^{\dagger}\cdots b_{l+2}^{\dagger}\left[b_{l+2}b_{l+2}^{\dagger}+\varepsilon^{(l+1)}-\varepsilon^{(l)}\right]b_{l+2}\cdots b_{n-1}\middle|\psi_{n-1}^{(n)}\right\rangle \\
&= \left\langle\psi_{n-1}^{(n)}\middle|b_{n-1}^{\dagger}\cdots b_{l+2}^{\dagger}b_{l+2}b_{l+2}^{\dagger}b_{l+2}\cdots b_{n-1}\middle|\psi_{n-1}^{(n)}\right\rangle \\
&\quad + \left[\varepsilon^{(l+1)}-\varepsilon^{(l)}\right]\left\langle\psi_{n-1}^{(n)}\middle|b_{n-1}^{\dagger}\cdots b_{l+2}^{\dagger}b_{l+2}\cdots b_{n-1}\middle|\psi_{n-1}^{(n)}\right\rangle \\
&= \left\langle\psi_{n-1}^{(n)}\middle|b_{n-1}^{\dagger}\cdots b_{l+2}^{\dagger}b_{l+2}b_{l+2}^{\dagger}b_{l+2}\cdots b_{n-1}\middle|\psi_{n-1}^{(n)}\right\rangle + \left[\varepsilon^{(l+1)}-\varepsilon^{(l)}\right]\left\langle\psi_{l+1}^{(n)}\middle|\psi_{l+1}^{(n)}\right\rangle.
\end{aligned} \tag{3.270}
\]

In the next step, using b_{l+2}^† b_{l+2} = b_{l+3}b_{l+3}^† + ε^{(l+2)} - ε^{(l+1)}, we have

\[
\begin{aligned}
\left\langle\psi_l^{(n)}\middle|\psi_l^{(n)}\right\rangle &= \left\langle\psi_{n-1}^{(n)}\middle|b_{n-1}^{\dagger}\cdots b_{l+2}^{\dagger}b_{l+2}b_{l+3}b_{l+3}^{\dagger}b_{l+3}\cdots b_{n-1}\middle|\psi_{n-1}^{(n)}\right\rangle \\
&\quad + \left[\varepsilon^{(l+1)}-\varepsilon^{(l)}\right]\left\langle\psi_{l+1}^{(n)}\middle|\psi_{l+1}^{(n)}\right\rangle + \left[\varepsilon^{(l+2)}-\varepsilon^{(l+1)}\right]\left\langle\psi_{l+1}^{(n)}\middle|\psi_{l+1}^{(n)}\right\rangle.
\end{aligned} \tag{3.271}
\]

Thus, we find that in the first term the index number of b_{l+3}^† has been increased by one, with the operator itself transferred toward the right side. On the other hand, we notice that between the second and third terms, ε^{(l+1)}⟨ψ_{l+1}^{(n)}|ψ_{l+1}^{(n)}⟩ cancels out. Repeating the above processes, we reach the following expression:

\[
\begin{aligned}
\left\langle\psi_l^{(n)}\middle|\psi_l^{(n)}\right\rangle &= \left\langle\psi_{n-1}^{(n)}\middle|b_{n-1}^{\dagger}\cdots b_{l+2}^{\dagger}b_{l+2}\cdots b_{n-1}b_n b_n^{\dagger}\middle|\psi_{n-1}^{(n)}\right\rangle \\
&\quad + \left[\varepsilon^{(n-1)}-\varepsilon^{(n-2)}\right]\left\langle\psi_{l+1}^{(n)}\middle|\psi_{l+1}^{(n)}\right\rangle + \left[\varepsilon^{(n-2)}-\varepsilon^{(n-3)}\right]\left\langle\psi_{l+1}^{(n)}\middle|\psi_{l+1}^{(n)}\right\rangle + \cdots \\
&\quad + \left[\varepsilon^{(l+2)}-\varepsilon^{(l+1)}\right]\left\langle\psi_{l+1}^{(n)}\middle|\psi_{l+1}^{(n)}\right\rangle + \left[\varepsilon^{(l+1)}-\varepsilon^{(l)}\right]\left\langle\psi_{l+1}^{(n)}\middle|\psi_{l+1}^{(n)}\right\rangle \\
&= \left[\varepsilon^{(n-1)}-\varepsilon^{(l)}\right]\left\langle\psi_{l+1}^{(n)}\middle|\psi_{l+1}^{(n)}\right\rangle.
\end{aligned} \tag{3.272}
\]

In (3.272), the first term of the RHS vanishes because of (3.251); the subsequent terms produce ⟨ψ_{l+1}^{(n)}|ψ_{l+1}^{(n)}⟩, whose coefficients cancel one another out except for [ε^{(n-1)} - ε^{(l)}]. Meanwhile, from the assumption of the mathematical induction we have

\[
\left\langle\psi_{l+1}^{(n)}\middle|\psi_{l+1}^{(n)}\right\rangle = \left[\varepsilon^{(n-1)}-\varepsilon^{(n-2)}\right]\left[\varepsilon^{(n-1)}-\varepsilon^{(n-3)}\right]\cdots\left[\varepsilon^{(n-1)}-\varepsilon^{(l+1)}\right]\left\langle\psi_{n-1}^{(n)}\middle|\psi_{n-1}^{(n)}\right\rangle.
\]

Inserting this equation into (3.272), we arrive at (3.268). In other words, we have shown that if (3.268) holds for l + 1, it holds for l as well. This completes the proof that (3.268) is true of ⟨ψ_l^{(n)}|ψ_l^{(n)}⟩ with l down to 0.
The normalized wave functions ψ̃_l^{(n)} are expressed from (3.255) as

\[
\widetilde{\psi}_l^{(n)} = \kappa(n,l)^{-\frac{1}{2}}\,b_{l+1}b_{l+2}\cdots b_{n-1}\widetilde{\psi}_{n-1}^{(n)}, \tag{3.273}
\]

where κ(n, l) is defined such that

\[
\kappa(n,l) \equiv \left[\varepsilon^{(n-1)}-\varepsilon^{(n-2)}\right]\cdot\left[\varepsilon^{(n-1)}-\varepsilon^{(n-3)}\right]\cdot\cdots\cdot\left[\varepsilon^{(n-1)}-\varepsilon^{(l)}\right], \tag{3.274}
\]

with l ≤ n - 2. More explicitly, we get

\[
\kappa(n,l) = \frac{(2n-1)!\,(n-l-1)!\,(l!)^2}{(n+l)!\,(n!)^2\,\left(n^{n-l-2}\right)^2}. \tag{3.275}
\]

In particular, from (3.265) we have

\[
\widetilde{\psi}_{n-1}^{(n)} = \left(\frac{2}{n}\right)^{n+\frac{1}{2}}\frac{1}{\sqrt{(2n)!}}\,\rho^n e^{-\rho/n}. \tag{3.276}
\]

From (3.272), we define the following operator:

\[
\widetilde{b}_l \equiv \left[\varepsilon^{(n-1)}-\varepsilon^{(l-1)}\right]^{-\frac{1}{2}}b_l. \tag{3.277}
\]

Then (3.273) becomes

\[
\widetilde{\psi}_l^{(n)} = \widetilde{b}_{l+1}\widetilde{b}_{l+2}\cdots\widetilde{b}_{n-1}\widetilde{\psi}_{n-1}^{(n)}. \tag{3.278}
\]

3.7.3 Associated Laguerre Polynomials

It will be of great importance to compare the functions ψ̃_l^{(n)} with the conventional wave functions that are expressed using associated Laguerre polynomials. For this purpose we define the following functions Φ_l^{(n)}(ρ):

\[
\Phi_l^{(n)}(\rho) \equiv \left(\frac{2}{n}\right)^{l+\frac{3}{2}}\sqrt{\frac{(n-l-1)!}{2n(n+l)!}}\,e^{-\frac{\rho}{n}}\rho^{l+1}L_{n-l-1}^{2l+1}\!\left(\frac{2\rho}{n}\right). \tag{3.279}
\]

The associated Laguerre polynomials are described as

\[
L_n^{\nu}(x) = \frac{1}{n!}x^{-\nu}e^x\frac{d^n}{dx^n}\left(x^{n+\nu}e^{-x}\right) \quad (\nu > -1). \tag{3.280}
\]

In the form of a power series expansion, the polynomials are expressed for integer k ≥ 0 as

\[
L_n^k(x) = \sum_{m=0}^{n}\frac{(-1)^m(n+k)!}{(n-m)!(k+m)!\,m!}x^m. \tag{3.281}
\]

Notice that the "Laguerre polynomials" L_n(x) are defined as

\[
L_n(x) \equiv L_n^0(x).
\]

Hence, instead of (3.280) and (3.281), the Rodrigues formula and power series expansion of L_n(x) are given by [2, 5]

\[
L_n(x) = \frac{1}{n!}e^x\frac{d^n}{dx^n}\left(x^n e^{-x}\right), \qquad L_n(x) = \sum_{m=0}^{n}\frac{(-1)^m\,n!}{(n-m)!\,(m!)^2}x^m.
\]
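The definitions (3.280) and (3.281) agree with the modern convention implemented in standard libraries; a quick sketch, assuming SciPy:

```python
# Sketch (assumes scipy): check the power-series form (3.281) of the
# associated Laguerre polynomials against scipy.special.eval_genlaguerre.
from math import factorial
import numpy as np
from scipy.special import eval_genlaguerre

def L_series(n, k, x):
    # (3.281)
    return sum((-1)**m * factorial(n + k)
               / (factorial(n - m) * factorial(k + m) * factorial(m)) * x**m
               for m in range(n + 1))

x = 0.8
for n in range(6):
    for k in range(4):
        assert np.isclose(L_series(n, k, x), eval_genlaguerre(n, k, x))
print("(3.281) matches scipy's generalized Laguerre polynomials")
```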

The function Φ_l^{(n)}(ρ) contains the multiplication factors e^{-ρ/n} and ρ^{l+1}. The function L_{n-l-1}^{2l+1}(2ρ/n) is a polynomial in ρ whose highest order is ρ^{n-l-1}. Therefore, Φ_l^{(n)}(ρ) consists of a sum of terms containing e^{-ρ/n}ρ^t, where t is an integer equal to 1 or larger. Consequently, Φ_l^{(n)}(ρ) → 0 both when ρ → 0 and when ρ → ∞ (vide supra). Thus, we have confirmed that Φ_l^{(n)}(ρ) certainly satisfies the proper boundary conditions mentioned earlier and, hence, the operator d/dρ is indeed anti-Hermitian. To show this more explicitly, we define D ≡ d/dρ. The inner product between arbitrarily chosen functions f and g is

\[
\langle f|Dg\rangle \equiv \int_0^{\infty}f^*Dg\,d\rho = \left[f^*g\right]_0^{\infty} - \int_0^{\infty}(Df)^*g\,d\rho = \left[f^*g\right]_0^{\infty} + \langle -Df|g\rangle, \tag{3.282}
\]

where f* is the complex conjugate of f. Meanwhile, from (1.112) we have

\[
\langle f|Dg\rangle = \left\langle D^{\dagger}f\middle|g\right\rangle. \tag{3.283}
\]

Therefore, if the functions f and g vanish at ρ → 0 and ρ → ∞, we get D† = -D by equating (3.282) and (3.283). This means that D is anti-Hermitian. The functions Φ_l^{(n)}(ρ) we are dealing with certainly satisfy the required boundary conditions. The operator H^{(l)} appearing in (3.247) and (3.248) is Hermitian accordingly. This is because

\[
b_l^{\dagger}b_l = (-A+H)(A+H) = H^2 - A^2 - AH + HA, \tag{3.284}
\]

\[
\begin{aligned}
\left(b_l^{\dagger}b_l\right)^{\dagger} &= \left[H^2 - A^2 - AH + HA\right]^{\dagger} = \left(H^{\dagger}\right)^2 - \left(A^{\dagger}\right)^2 - H^{\dagger}A^{\dagger} + A^{\dagger}H^{\dagger} \\
&= H^2 - (-A)^2 - H(-A) + (-A)H = H^2 - A^2 + HA - AH = b_l^{\dagger}b_l.
\end{aligned} \tag{3.285}
\]

The Hermiticity is true of b_l b_l^{\dagger} as well. Thus, the eigenvalue and the eigenstate (or wave function) which belongs to that eigenvalue are physically meaningful.
Next, consider the following operation:

\[
\widetilde{b}_l\Phi_l^{(n)}(\rho) = \left[\varepsilon^{(n-1)}-\varepsilon^{(l-1)}\right]^{-\frac{1}{2}}\left(\frac{2}{n}\right)^{l+\frac{3}{2}}\sqrt{\frac{(n-l-1)!}{2n(n+l)!}}\left(\frac{d}{d\rho}+\frac{l}{\rho}-\frac{1}{l}\right)\left\{e^{-\frac{\rho}{n}}\rho^{l+1}L_{n-l-1}^{2l+1}\!\left(\frac{2\rho}{n}\right)\right\}, \tag{3.286}
\]

where

\[
\left[\varepsilon^{(n-1)}-\varepsilon^{(l-1)}\right]^{-\frac{1}{2}} = \frac{nl}{\sqrt{(n+l)(n-l)}}. \tag{3.287}
\]

Rewriting L_{n-l-1}^{2l+1}(2ρ/n) using the Rodrigues form (3.280), and then in power-series form, we obtain

\[
\begin{aligned}
\widetilde{b}_l\Phi_l^{(n)}(\rho) &= \left(\frac{2}{n}\right)^{l+\frac{3}{2}}\frac{nl}{\sqrt{(n+l)(n-l)}}\frac{1}{\sqrt{2n(n+l)!(n-l-1)!}}\left(\frac{d}{d\rho}+\frac{l}{\rho}-\frac{1}{l}\right)\left[e^{\frac{\rho}{n}}\rho^{-l}\frac{d^{n-l-1}}{d\rho^{n-l-1}}\left(\rho^{n+l}e^{-\frac{2\rho}{n}}\right)\right] \\
&= \left(\frac{2}{n}\right)^{l+\frac{3}{2}}\frac{nl}{\sqrt{(n+l)(n-l)}}\frac{1}{\sqrt{2n(n+l)!(n-l-1)!}}\left(\frac{d}{d\rho}+\frac{l}{\rho}-\frac{1}{l}\right) \\
&\qquad\times\sum_{m=0}^{n-l-1}\frac{(-1)^m(n+l)!}{(2l+m+1)!}\left(\frac{2}{n}\right)^m\frac{(n-l-1)!}{m!\,(n-l-m-1)!}\,e^{-\frac{\rho}{n}}\rho^{l+m+1},
\end{aligned} \tag{3.288}
\]

where we used the well-known Leibniz rule for the higher-order differentiation of a product function, i.e., for d^{n-l-1}/dρ^{n-l-1}(ρ^{n+l}e^{-2ρ/n}). To perform the further calculation, notice that d/dρ does not change the functional form of e^{-ρ/n}, whereas it lowers the order of ρ^{l+m+1} by one. Meanwhile, the operation of l/ρ lowers the order of ρ^{l+m+1} by one as well. The factor (n + l)/2l appearing in the following equation (3.289) results from these calculation processes. Considering these characteristics of the operators, we get

\[
\begin{aligned}
\widetilde{b}_l\Phi_l^{(n)}(\rho) &= \left(\frac{2}{n}\right)^{l+\frac{3}{2}}\frac{n+l}{2l}\frac{nl}{\sqrt{(n+l)(n-l)}}\frac{1}{\sqrt{2n(n+l)!(n-l-1)!}}\,e^{-\rho/n}\rho^l \\
&\quad\times\Bigg\{\sum_{m=0}^{n-l-1}\frac{(-1)^{m+1}(n+l)!}{(2l+m+1)!}\frac{(n-l-1)!}{m!\,(n-l-m-1)!}\left(\frac{2}{n}\right)^{m+1}\rho^{m+1} \\
&\qquad\quad + 2l\sum_{m=0}^{n-l-1}\frac{(-1)^m(n+l-1)!}{(2l+m)!}\frac{(n-l-1)!}{m!\,(n-l-m-1)!}\left(\frac{2}{n}\right)^m\rho^m\Bigg\}.
\end{aligned} \tag{3.289}
\]

In (3.289), the calculation of the braces {···} on the RHS is somewhat complicated, so we describe the outline of the calculation procedure below:

\[
\begin{aligned}
\{\cdots\} &= \sum_{m=1}^{n-l}\frac{(-1)^m(n+l)!}{(2l+m)!}\frac{(n-l-1)!}{(m-1)!\,(n-l-m)!}\left(\frac{2\rho}{n}\right)^m \\
&\quad + 2l\sum_{m=0}^{n-l-1}\frac{(-1)^m(n+l-1)!}{(2l+m)!}\frac{(n-l-1)!}{m!\,(n-l-m-1)!}\left(\frac{2\rho}{n}\right)^m \\
&= \sum_{m=1}^{n-l-1}\frac{(-1)^m(n-l-1)!(n+l-1)!}{(2l+m)!}\left(\frac{2\rho}{n}\right)^m\left[\frac{n+l}{(m-1)!(n-l-m)!}+\frac{2l}{m!\,(n-l-m-1)!}\right] \\
&\quad + (-1)^{n-l}\left(\frac{2\rho}{n}\right)^{n-l} + \frac{(n+l-1)!}{(2l-1)!} \\
&= \sum_{m=1}^{n-l-1}\frac{(-1)^m(n-l)!\,(n+l-1)!}{(2l+m-1)!\,(n-l-m)!\,m!}\left(\frac{2\rho}{n}\right)^m + (-1)^{n-l}\left(\frac{2\rho}{n}\right)^{n-l} + \frac{(n+l-1)!}{(2l-1)!} \\
&= (n-l)!\left[\sum_{m=1}^{n-l-1}\frac{(-1)^m(n+l-1)!}{(2l+m-1)!\,(n-l-m)!\,m!}\left(\frac{2\rho}{n}\right)^m + \frac{(-1)^{n-l}}{(n-l)!}\left(\frac{2\rho}{n}\right)^{n-l} + \frac{(n+l-1)!}{(n-l)!\,(2l-1)!}\right] \\
&= (n-l)!\sum_{m=0}^{n-l}\frac{(-1)^m(n+l-1)!}{(2l+m-1)!\,(n-l-m)!\,m!}\left(\frac{2\rho}{n}\right)^m = (n-l)!\,L_{n-l}^{2l-1}\!\left(\frac{2\rho}{n}\right). \tag{3.290}
\end{aligned}
\]

Notice that in the second equality of (3.290), the summation is divided into three parts: 1 ≤ m ≤ n - l - 1, m = n - l (the highest-order term), and m = 0 (the lowest-order term). Note that in the second-to-last equality of (3.290), the highest-order term (m = n - l) and the lowest-order term (a constant) have been absorbed into a single expression, namely an associated Laguerre polynomial. Correspondingly, in the last equality of (3.290) the summation range is extended to 0 ≤ m ≤ n - l.

Summarizing the above results, we get
\[
\begin{aligned}
\widetilde{b}_l\Phi_l^{(n)}(\rho) &= \left(\frac{2}{n}\right)^{l+\frac{3}{2}}\frac{n+l}{2l}\frac{nl\,(n-l)!}{\sqrt{(n+l)(n-l)}}\frac{1}{\sqrt{2n(n+l)!(n-l-1)!}}\,e^{-\rho/n}\rho^l\,L_{n-l}^{2l-1}\!\left(\frac{2\rho}{n}\right) \\
&= \left(\frac{2}{n}\right)^{l+\frac{1}{2}}\sqrt{\frac{(n-l)!}{2n(n+l-1)!}}\,e^{-\rho/n}\rho^l\,L_{n-l}^{2l-1}\!\left(\frac{2\rho}{n}\right) \\
&= \left(\frac{2}{n}\right)^{(l-1)+\frac{3}{2}}\sqrt{\frac{[n-(l-1)-1]!}{2n[n+(l-1)]!}}\,e^{-\frac{\rho}{n}}\rho^{(l-1)+1}\,L_{n-(l-1)-1}^{2(l-1)+1}\!\left(\frac{2\rho}{n}\right) \equiv \Phi_{l-1}^{(n)}(\rho).
\end{aligned} \tag{3.291}
\]

Thus, we find that Φ_l^{(n)}(ρ) behaves exactly like ψ̃_l^{(n)}. Moreover, if we replace l in (3.279) with n - 1, we find

\[
\Phi_{n-1}^{(n)}(\rho) = \widetilde{\psi}_{n-1}^{(n)}. \tag{3.292}
\]

Operating with b̃_{n-1} on both sides of (3.292),

\[
\Phi_{n-2}^{(n)}(\rho) = \widetilde{\psi}_{n-2}^{(n)}. \tag{3.293}
\]

Likewise, successively operating with b̃_l (1 ≤ l ≤ n - 1),

\[
\Phi_l^{(n)}(\rho) = \widetilde{\psi}_l^{(n)}(\rho), \tag{3.294}
\]

with all allowed numbers of l (i.e., 0 ≤ l ≤ n - 1). This permits us to identify

\[
\Phi_l^{(n)}(\rho) \equiv \widetilde{\psi}_l^{(n)}(\rho). \tag{3.295}
\]

Consequently, it is clear that the parameter n introduced in (3.249) is identical to the principal quantum number and that the parameter l (0 ≤ l ≤ n - 1) is the orbital angular momentum quantum number (or azimuthal quantum number). The functions Φ_l^{(n)}(ρ) and ψ̃_l^{(n)}(ρ) are identical up to the constant c_n expressed in (3.265). Note, however, that a complex constant with an absolute value of 1 (a phase factor) remains undetermined, as is always the case with an eigenvalue problem.

The radial wave functions are derived from the following relationship, as described earlier:

\[
R_l^{(n)}(r) = \widetilde{\psi}_l^{(n)}/\rho. \tag{3.296}
\]

To normalize R_l^{(n)}(r), we have to calculate the following integral:
\[
\int_0^{\infty}\left|R_l^{(n)}(r)\right|^2 r^2\,dr = \int_0^{\infty}\left|\widetilde{\psi}_l^{(n)}\right|^2\frac{1}{\rho^2}\,\rho^2\left(\frac{a}{Z}\right)^2\frac{a}{Z}\,d\rho = \left(\frac{a}{Z}\right)^3\int_0^{\infty}\left|\widetilde{\psi}_l^{(n)}\right|^2 d\rho = \left(\frac{a}{Z}\right)^3. \tag{3.297}
\]

Accordingly, we choose the following functions R̃_l^{(n)}(r) for the normalized radial wave functions:

\[
\widetilde{R}_l^{(n)}(r) = \sqrt{(Z/a)^3}\,R_l^{(n)}(r). \tag{3.298}
\]

Substituting (3.296) into (3.298) and taking account of (3.279) and (3.280), we obtain

\[
\widetilde{R}_l^{(n)}(r) = \sqrt{\left(\frac{2Z}{an}\right)^3\frac{(n-l-1)!}{2n(n+l)!}}\left(\frac{2Zr}{an}\right)^l\exp\!\left(-\frac{Zr}{an}\right)L_{n-l-1}^{2l+1}\!\left(\frac{2Zr}{an}\right). \tag{3.299}
\]

Equation (3.299) is exactly the same as the normalized radial wave function obtained as the solution of (3.241) through the power series expansion. All these functions belong to the same eigenenergy E_n = -(ħ²/2μ)(Z/a)²(1/n²); see (3.258).
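The normalization of (3.299) and the n² degeneracy count discussed below can both be confirmed numerically; a sketch, assuming SciPy and setting Z = a = 1 for brevity:

```python
# Sketch (assumes scipy): verify that the radial functions (3.299) satisfy
# int_0^inf |R_l^(n)(r)|^2 r^2 dr = 1, here with Z = a = 1 for brevity.
from math import factorial, sqrt
import numpy as np
from scipy.integrate import quad
from scipy.special import eval_genlaguerre

def R(n, l, r):
    # (3.299) with Z = a = 1; x = 2r/n
    pref = sqrt((2/n)**3 * factorial(n - l - 1) / (2*n*factorial(n + l)))
    x = 2*r/n
    return pref * x**l * np.exp(-x/2) * eval_genlaguerre(n - l - 1, 2*l + 1, x)

for n in range(1, 5):
    for l in range(n):
        norm, _ = quad(lambda r: R(n, l, r)**2 * r**2, 0, np.inf)
        assert np.isclose(norm, 1.0)
print("(3.299) is normalized;",
      [sum(2*l + 1 for l in range(n)) for n in range(1, 5)], "= n^2 degeneracies")
```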
Returning to the quantum states sharing the same eigenenergy, we have n states that belong to the principal quantum number n, with the azimuthal quantum number varying as l = 0, 1, ···, n - 1. Meanwhile, from (3.158), (3.159), and (3.171), 2l + 1 quantum states possess the eigenvalue l(l + 1) of the quantity M². As already mentioned in Sects. 3.5 and 3.6, the integer m takes the 2l + 1 different values

\[
m = l,\ l-1,\ l-2,\ \cdots,\ 1,\ 0,\ -1,\ \cdots,\ -l+1,\ -l.
\]

The quantum states of hydrogen-like atoms are thus characterized by a set of integers (m, l, n). On the basis of the above discussion, the number of states sharing the same energy (i.e., the same principal quantum number n) is

\[
\sum_{l=0}^{n-1}(2l+1) = n^2.
\]

Regarding this situation, we say that the quantum states belonging to the principal quantum number n, or the energy eigenvalue E_n = -(ħ²/2μ)(Z/a)²(1/n²), are n²-fold degenerate. Note that we have ignored the freedom of the spin state.
In summary of this section, we have developed the operator formalism for dealing with the radial wave functions of hydrogen-like atoms and have seen how the operator formalism features these functions. The essential point rests upon the fact that the radial wave functions can be derived by successively operating with the lowering operators b̃_l on ψ̃_{n-1}^{(n)}, which is parametrized with a principal quantum number n and an orbital angular momentum quantum number l = n - 1. This is clearly represented by (3.278). The results agree with the conventional coordinate representation method based upon the power series expansion that leads to the associated Laguerre polynomials. Thus, the operator formalism is again found to be powerful in explicitly representing the mathematical constitution of quantum-mechanical systems.

3.8 Total Wave Functions

Since we have obtained the angular wave functions and the radial wave functions, we can describe the normalized total wave functions Λ̃_{l,m}^{(n)} of hydrogen-like atoms as a product of the angular part and the radial part such that

\[
\widetilde{\Lambda}_{l,m}^{(n)} = Y_l^m(\theta,\phi)\,\widetilde{R}_l^{(n)}(r). \tag{3.300}
\]

Let us seek several tangible functional forms for hydrogen (Z = 1), including the angular and radial parts. For example, we have

\[
\phi(1s) \equiv Y_0^0(\theta,\phi)\widetilde{R}_0^{(1)}(r) = \sqrt{\frac{1}{4\pi}}\,a^{-3/2}\,\frac{\widetilde{\psi}_{n-1}^{(n)}}{\rho}\bigg|_{n=1} = \sqrt{\frac{1}{\pi}}\,a^{-3/2}e^{-r/a}, \tag{3.301}
\]

where we used (3.276) and (3.295). For φ(2s), using (3.277) and (3.278) we have

\[
\phi(2s) \equiv Y_0^0(\theta,\phi)\widetilde{R}_0^{(2)}(r) = \frac{1}{4\sqrt{2\pi}}\,a^{-\frac{3}{2}}e^{-\frac{r}{2a}}\left(2-\frac{r}{a}\right). \tag{3.302}
\]

For φ(2p_z), in turn, we express it as

\[
\begin{aligned}
\phi(2p_z) \equiv Y_1^0(\theta,\phi)\widetilde{R}_1^{(2)}(r) &= \sqrt{\frac{3}{4\pi}}(\cos\theta)\frac{1}{2\sqrt{6}}\,a^{-\frac{3}{2}}\frac{r}{a}e^{-\frac{r}{2a}} \\
&= \frac{1}{4\sqrt{2\pi}}\,a^{-\frac{3}{2}}\frac{r}{a}e^{-\frac{r}{2a}}\cos\theta = \frac{1}{4\sqrt{2\pi}}\,a^{-\frac{3}{2}}\frac{r}{a}e^{-\frac{r}{2a}}\frac{z}{r} = \frac{1}{4\sqrt{2\pi}}\,a^{-\frac{5}{2}}e^{-\frac{r}{2a}}z.
\end{aligned} \tag{3.303}
\]

For φ(2p_{x+iy}), using (3.217) we get

\[
\begin{aligned}
\phi(2p_{x+iy}) \equiv Y_1^1(\theta,\phi)\widetilde{R}_1^{(2)}(r) &= -\frac{1}{8\sqrt{\pi}}\,a^{-\frac{3}{2}}\frac{r}{a}e^{-\frac{r}{2a}}\sin\theta\,e^{i\phi} \\
&= -\frac{1}{8\sqrt{\pi}}\,a^{-\frac{3}{2}}\frac{r}{a}e^{-\frac{r}{2a}}\frac{x+iy}{r} = -\frac{1}{8\sqrt{\pi}}\,a^{-\frac{5}{2}}e^{-\frac{r}{2a}}(x+iy). \tag{3.304}
\end{aligned}
\]

In (3.304), the minus sign comes from the Condon–Shortley phase. Furthermore, we have

\[
\begin{aligned}
\phi(2p_{x-iy}) \equiv Y_1^{-1}(\theta,\phi)\widetilde{R}_1^{(2)}(r) &= \frac{1}{8\sqrt{\pi}}\,a^{-\frac{3}{2}}\frac{r}{a}e^{-\frac{r}{2a}}\sin\theta\,e^{-i\phi} \\
&= \frac{1}{8\sqrt{\pi}}\,a^{-\frac{3}{2}}\frac{r}{a}e^{-\frac{r}{2a}}\frac{x-iy}{r} = \frac{1}{8\sqrt{\pi}}\,a^{-\frac{5}{2}}e^{-\frac{r}{2a}}(x-iy). \tag{3.305}
\end{aligned}
\]

Notice that the above notations φ(2p_{x+iy}) and φ(2p_{x-iy}) differ from the custom that uses, e.g., φ(2p_x) and φ(2p_y). We will come back to this point in Sect. 4.3.

References

1. Schiff LI (1955) Quantum mechanics, 2nd edn. McGraw-Hill, New York
2. Arfken GB (1970) Mathematical methods for physicists, 2nd edn. Academic Press, Waltham
3. Sunakawa S (1991) Quantum mechanics. Iwanami, Tokyo (in Japanese)
4. Byron FW Jr, Fuller RW (1992) Mathematics of classical and quantum physics. Dover, New York
5. Dennery P, Krzywicki A (1996) Mathematics for physicists. Dover, New York
6. Riley KF, Hobson MP, Bence SJ (2006) Mathematical methods for physics and engineering, 3rd edn. Cambridge University Press, Cambridge
7. Arfken GB, Weber HJ, Harris FE (2013) Mathematical methods for physicists, 7th edn. Academic Press, Waltham
8. Lebedev NN (1972) Special functions and their applications. Dover, New York
9. Stakgold I (1998) Green's functions and boundary value problems, 2nd edn. Wiley, New York
Chapter 4
Optical Transition and Selection Rules

In Sect. 1.2 we showed the Schrödinger equation as a function of the space coordinates and time. In subsequent sections, we dealt with the time-independent eigenvalue problems of a harmonic oscillator and a hydrogen-like atom. This implies that the physical system is isolated from the outside world and that there is no interaction between the outside world and the physical system we are considering. By virtue of such an interaction, however, the system may acquire or lose energy, momentum, angular momentum, etc. As a consequence of the interaction, the system changes its quantum state as well. Such a change is said to be a transition. If the interaction takes place as an optical process, we are to deal with an optical transition. Of the various optical transitions, the electric dipole transition is common and the most important. In this chapter, we study the optical transitions of a particle confined in a potential well, of a harmonic oscillator, and of a hydrogen atom using a semiclassical approach. The question of whether a transition is allowed or forbidden is of great importance. We have selection rules to judge this.

4.1 Electric Dipole Transition

We have a time-dependent Schrödinger equation described as

\[
H\psi = i\hbar\frac{\partial\psi}{\partial t}. \tag{1.47}
\]

Using the method of separation of variables, we obtained the two equations expressed below:

\[
H\phi(x) = E\phi(x), \tag{1.55}
\]


\[
i\hbar\frac{\partial\xi(t)}{\partial t} = E\xi(t). \tag{1.56}
\]

Equation (1.55) is an eigenvalue equation of energy, and (1.56) is an equation in time. So far we have focused our attention upon (1.55), taking a one-dimensional harmonic oscillator and hydrogen-like atoms as examples. In this chapter we deal with the time-evolved Schrödinger equation and its relevance to optical transitions. Optical transitions take place according to selection rules; we mention their significance as well.

We showed that after solving the eigenvalue equation, the Schrödinger equation is expressed as

\[
\psi(x,t) = \phi(x)\exp(-iEt/\hbar). \tag{1.60}
\]

The probability density of the system (i.e., normally a particle such as an electron or a harmonic oscillator) residing at a certain place x at a certain time t is expressed as

\[
\psi^*(x,t)\,\psi(x,t).
\]

If the Schrödinger equation is described in the form of separated variables as in the case of (1.60), the exponential factors including t cancel out and we have

\[
\psi^*(x,t)\psi(x,t) = \phi^*(x)\phi(x). \tag{4.1}
\]

This means that the probability density of the system depends only on the spatial coordinate and is constant in time. Such a state is said to be a stationary state. That is, the system continues residing in the quantum state described by φ(x) and remains unchanged independent of time.

Next, we consider a linear combination of functions described by (1.60). That is, we have

\[
\psi(x,t) = c_1\phi_1(x)\exp(-iE_1 t/\hbar) + c_2\phi_2(x)\exp(-iE_2 t/\hbar), \tag{4.2}
\]

where the first term pertains to state 1 and the second term to state 2; c₁ and c₂ are complex constants with respect to the spatial coordinates but may be weakly time-dependent. The state described by (4.2) is called a coherent state. The probability distribution of that state is described as

\[
\psi^*(x,t)\psi(x,t) = |c_1|^2|\phi_1|^2 + |c_2|^2|\phi_2|^2 + c_1^*c_2\,\phi_1^*\phi_2\,e^{-i\omega t} + c_2^*c_1\,\phi_2^*\phi_1\,e^{i\omega t}, \tag{4.3}
\]

where ω is expressed as

\[
\omega = (E_2 - E_1)/\hbar. \tag{4.4}
\]

This equation shows that the probability density of the system undergoes a sinusoidal oscillation with time. The angular frequency equals the energy difference between the two states divided by the reduced Planck constant. If the system is a charged particle such as an electron or proton, the sinusoidal oscillation is accompanied by an oscillating electromagnetic field. Thus, the coherent state is associated with the optical transition from one state to another when the transition involves the charged particle.
Optical transitions result from various causes. Of these, the electric dipole transition yields the largest transition probability, and the dipole approximation is often chosen to represent the transition probability. From the point of view of optical measurements, the electric dipole transition gives the strongest absorption or emission spectral lines. The matrix element of the electric dipole, more specifically the square of the absolute value of the matrix element, is a measure of the optical transition probability. Labelling the quantum states as a, b, etc. and describing the corresponding state vectors as |a⟩, |b⟩, etc., the matrix element P_{ba} is given by

\[
P_{ba} \equiv \langle b|\boldsymbol{\varepsilon}_e\cdot\boldsymbol{P}|a\rangle, \tag{4.5}
\]

where ε_e is a unit polarization vector of the electric field of the electromagnetic wave (i.e., light). Equation (4.5) describes the optical transition that takes place as a result of the interaction between the electrons and the radiation field, in such a way that the interaction causes the electrons in the system to change state from |a⟩ to |b⟩. That interaction is represented by ε_e · P. The quantum states |a⟩ and |b⟩ are referred to as the initial state and the final state, respectively.

The quantity P is the electric dipole moment of the system, which is defined as

\[
\boldsymbol{P} \equiv e\sum_j \boldsymbol{x}_j, \tag{4.6}
\]

where e is the elementary charge (e < 0) and x_j is the position vector of the j-th electron. A detailed description of ε_e and P can be seen in Chap. 7. The quantity P_{ba} is called the transition dipole moment or, more precisely, the transition electric dipole moment with respect to the states |a⟩ and |b⟩. We assume that the optical transition occurs from the quantum state |a⟩ to another state |b⟩. Since P_{ba} is generally a complex number, |P_{ba}|² represents the transition probability.

If we adopt the coordinate representation, (4.5) is expressed by

\[
P_{ba} = \int\phi_b^*\,\boldsymbol{\varepsilon}_e\cdot\boldsymbol{P}\,\phi_a\,d\tau, \tag{4.7}
\]

where τ denotes the integration range over space.

4.2 One-Dimensional System

Let us apply the aforementioned general description to individual cases from Chaps. 1–3.

Example 4.1: A Particle Confined in a Square-Well Potential This example was treated in Chap. 1. As before, we assume that a particle (i.e., an electron) is confined in a one-dimensional system [-L ≤ x ≤ L (L > 0)]. We consider the optical transition from the ground state φ₁(x) to the first excited state φ₂(x). Here, we put L = π/2 for convenience. Then, the normalized coherent state ψ(x, t) is described as

\[
\psi(x,t) = \frac{1}{\sqrt{2}}\left[\phi_1(x)\exp(-iE_1 t/\hbar) + \phi_2(x)\exp(-iE_2 t/\hbar)\right], \tag{4.8}
\]

where we put c₁ = c₂ = 1/√2 in (4.2). In (4.8), we have

\[
\phi_1(x) = \sqrt{\frac{2}{\pi}}\cos x \quad \text{and} \quad \phi_2(x) = \sqrt{\frac{2}{\pi}}\sin 2x. \tag{4.9}
\]

Following (4.3), we have the following real function, called a probability distribution density:

\[
\psi^*(x,t)\psi(x,t) = \frac{1}{\pi}\left[\cos^2 x + \sin^2 2x + (\sin 3x + \sin x)\cos\omega t\right], \tag{4.10}
\]

where ω is given by (4.4) as

\[
\omega = 3\hbar/2m, \tag{4.11}
\]

where m is the mass of an electron. Rewriting (4.10), we have

\[
\psi^*(x,t)\psi(x,t) = \frac{1}{\pi}\left[1 + \frac{1}{2}(\cos 2x - \cos 4x) + (\sin 3x + \sin x)\cos\omega t\right]. \tag{4.12}
\]

Integrating (4.12) over [-π/2, π/2], only the first term makes a nonvanishing contribution, which gives 1, as anticipated (because of the normalization). Putting t = 0 and integrating (4.12) over the positive domain [0, π/2], we have

\[
\int_0^{\pi/2}\psi^*(x,0)\psi(x,0)\,dx = \frac{1}{2} + \frac{4}{3\pi} \approx 0.924. \tag{4.13}
\]

Similarly, integrating (4.12) over the negative domain [-π/2, 0], we have
\[
\int_{-\pi/2}^{0}\psi^*(x,0)\psi(x,0)\,dx = \frac{1}{2} - \frac{4}{3\pi} \approx 0.076. \tag{4.14}
\]

Fig. 4.1 Probability distribution density ψ*(x, t)ψ(x, t) of a particle confined in a square-well potential, plotted over -π/2 ≤ x ≤ π/2. The solid curve and the broken curve represent the density at t = 0 and at t = π/ω (i.e., a half period), respectively.

Thus, 92% of the total charge (as a probability density) is concentrated in the positive domain. Differentiation of ψ*(x, 0)ψ(x, 0) gives five extremals, including both edges. Of these, the major maximum is located at 0.635 radian, which corresponds to about 40% of π/2. This can serve as a measure of the transition moment. Figure 4.1 demonstrates these results (see the solid curve). Meanwhile, putting t = π/ω (i.e., a half period), we plot ψ*(x, π/ω)ψ(x, π/ω). The result shows that this graph is obtained by folding back the solid curve of Fig. 4.1 with respect to the ordinate axis. Thus, we find that the charge (or the probability density) exerts a sinusoidal oscillation with angular frequency 3ħ/2m along the x-axis around the origin.

Let e₁ be a unit vector in the positive direction of the x-axis. Then, the electric dipole P of the system is

\[
\boldsymbol{P} = e\boldsymbol{x} = ex\,\boldsymbol{e}_1, \tag{4.15}
\]

where x is the position vector of the electron. Let us define the matrix element of the electric dipole transition as

\[
P_{21} \equiv \langle\phi_2(x)|\boldsymbol{e}_1\cdot\boldsymbol{P}|\phi_1(x)\rangle = \langle\phi_2(x)|ex|\phi_1(x)\rangle. \tag{4.16}
\]

Notice that we only have to consider the polarization of light parallel to the x-axis. With the coordinate representation, we have

\[
\begin{aligned}
P_{21} &= \int_{-\pi/2}^{\pi/2}\phi_2(x)\,ex\,\phi_1(x)\,dx = \int_{-\pi/2}^{\pi/2}\sqrt{\frac{2}{\pi}}(\sin 2x)\,ex\,\sqrt{\frac{2}{\pi}}\cos x\,dx \\
&= \frac{2e}{\pi}\int_{-\pi/2}^{\pi/2}x\cos x\sin 2x\,dx = \frac{e}{\pi}\int_{-\pi/2}^{\pi/2}x(\sin x + \sin 3x)\,dx \\
&= \frac{e}{\pi}\int_{-\pi/2}^{\pi/2}x\left[(-\cos x)' + \left(-\frac{1}{3}\cos 3x\right)'\right]dx = \frac{16e}{9\pi},
\end{aligned} \tag{4.17}
\]

where we used a trigonometric formula and integration by parts. The factor 16/9π in (4.17) is about 36% of π/2. This number is in pretty good agreement with the 40% estimated above from the major maximum of ψ*(x, 0)ψ(x, 0). Note that the transition moment vanishes if the two states associated with the transition have the same parity. In other words, if both states are described by sine functions, or both by cosine functions, the integral vanishes.
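The numerical values quoted in this example are easily reproduced; the sketch below, assuming SciPy and setting e = 1, checks (4.13) and (4.17).

```python
# Sketch (assumes scipy): reproduce the numbers of Example 4.1 with e = 1:
# the probability (4.13) in the positive domain and the moment (4.17).
import numpy as np
from scipy.integrate import quad

phi1 = lambda x: np.sqrt(2/np.pi) * np.cos(x)       # ground state (4.9)
phi2 = lambda x: np.sqrt(2/np.pi) * np.sin(2*x)     # first excited state (4.9)

# probability in [0, pi/2] at t = 0 for psi = (phi1 + phi2)/sqrt(2)
p_pos, _ = quad(lambda x: 0.5*(phi1(x) + phi2(x))**2, 0, np.pi/2)
# transition moment <phi2| x |phi1>
p21, _ = quad(lambda x: phi2(x) * x * phi1(x), -np.pi/2, np.pi/2)

assert np.isclose(p_pos, 0.5 + 4/(3*np.pi))         # ~0.924, cf. (4.13)
assert np.isclose(p21, 16/(9*np.pi))                # ~0.566 = 36% of pi/2, (4.17)
print(p_pos, p21)
```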
Example 4.2: One-Dimensional Harmonic Oscillator Second, let us think of an optical transition for the harmonic oscillator that we dealt with in Chap. 2. We denote the state of the oscillator by |n⟩ in place of |ψ_n⟩ (n = 0, 1, 2, ···) of Chap. 2. Then, the general expression (4.5) can be written as

\[
P_{kl} = \langle k|\boldsymbol{\varepsilon}_e\cdot\boldsymbol{P}|l\rangle. \tag{4.18}
\]

Since we are considering a single one-dimensional oscillator,

\[
\boldsymbol{\varepsilon}_e = \boldsymbol{e}_q \quad \text{and} \quad \boldsymbol{P} = eq\,\boldsymbol{e}_q, \tag{4.19}
\]

where e_q is a unit vector in the positive direction of the coordinate q. Therefore, similarly to the above we have

\[
\boldsymbol{\varepsilon}_e\cdot\boldsymbol{P} = eq. \tag{4.20}
\]

That is,

\[
P_{kl} = e\langle k|q|l\rangle. \tag{4.21}
\]

Since q is an Hermitian operator, we have

\[
P_{kl}^* = e\left\langle l\middle|q^{\dagger}\middle|k\right\rangle = e\langle l|q|k\rangle = P_{lk}, \tag{4.22}
\]

where we used (1.116). Using (2.68), we have

\[
P_{kl} = e\sqrt{\frac{\hbar}{2m\omega}}\left\langle k\middle|a+a^{\dagger}\middle|l\right\rangle = e\sqrt{\frac{\hbar}{2m\omega}}\left[\langle k|a|l\rangle + \left\langle k\middle|a^{\dagger}\middle|l\right\rangle\right]. \tag{4.23}
\]

Taking the adjoint of (2.62) and modifying the notation, we have

\[
\langle k|a = \sqrt{k+1}\,\langle k+1|. \tag{4.24}
\]

Using (2.62) once again, we get

\[
P_{kl} = e\sqrt{\frac{\hbar}{2m\omega}}\left[\sqrt{k+1}\,\langle k+1|l\rangle + \sqrt{l+1}\,\langle k|l+1\rangle\right]. \tag{4.25}
\]

Using the orthonormal conditions between the state vectors, we have

\[
P_{kl} = e\sqrt{\frac{\hbar}{2m\omega}}\left[\sqrt{k+1}\,\delta_{k+1,l} + \sqrt{l+1}\,\delta_{k,l+1}\right]. \tag{4.26}
\]

Exchanging k and l in the above, we get P_{kl} = P_{lk}; the matrix element P_{kl} is symmetric with respect to the indices k and l. Notice that the first term does not vanish only when k + 1 = l, and the second term does not vanish only when k = l + 1. Therefore we get

\[
P_{k,k+1} = e\sqrt{\frac{\hbar(k+1)}{2m\omega}} \quad \text{and} \quad P_{l+1,l} = e\sqrt{\frac{\hbar(l+1)}{2m\omega}}, \quad \text{i.e.,} \quad P_{k+1,k} = e\sqrt{\frac{\hbar(k+1)}{2m\omega}}. \tag{4.27}
\]

Meanwhile, we find that the transition matrix P is expressed as

\[
P = eq = e\sqrt{\frac{\hbar}{2m\omega}}\left(a+a^{\dagger}\right) = e\sqrt{\frac{\hbar}{2m\omega}}
\begin{pmatrix}
0 & 1 & 0 & 0 & 0 & \cdots \\
1 & 0 & \sqrt{2} & 0 & 0 & \cdots \\
0 & \sqrt{2} & 0 & \sqrt{3} & 0 & \cdots \\
0 & 0 & \sqrt{3} & 0 & 2 & \cdots \\
0 & 0 & 0 & 2 & 0 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix}, \tag{4.28}
\]

where we used (2.68). Note that a real Hermitian matrix is a symmetric matrix. Practically, the fastest way to construct the transition matrix (4.28) is to use (2.65) and (2.66); it is an intuitively obvious and straightforward task. A glance at the matrix form immediately tells us that the transition matrix elements are nonvanishing only at the (k, k + 1) and (k + 1, k) positions. Whereas the (k, k + 1) element represents the transition from the k-th excited state to the (k - 1)-th excited state accompanied by photoemission, the (k + 1, k) element implies the transition from the (k - 1)-th excited state to the k-th excited state accompanied by photoabsorption. The two transitions give the same transition moment. Note that the zeroth excited state means the ground state; see (2.64) for the basis vector representations.

We should be careful about the "addresses" of the matrix accordingly. For example, P₀,₁ in (4.27) represents the (1, 2) element of the matrix (4.28); P₂,₁ stands for the (3, 2) element.

Suppose that we seek the transition dipole moments using the coordinate representation. Then, we need to use (2.106) and perform definite integrations. For instance, we have

\[
e\int_{-\infty}^{\infty}\psi_0(q)\,q\,\psi_1(q)\,dq,
\]

which corresponds to the (1, 2) element of (4.28). Indeed, the above integral gives e√(ħ/2mω); the confirmation is left to the reader. Nonetheless, seeking a definite integral of a product of higher excited-state wave functions becomes increasingly troublesome. In this respect, the operator method described above provides us with much better insight into complicated calculations.

Equations (4.26)–(4.28) imply that the electric dipole transition is allowed to occur only when the quantum number changes by one. Notice also that the transition takes place between an even function and an odd function; see Table 2.1 and (2.101). Such a condition or restriction on the optical transition is called a selection rule. The former equation of (4.27) shows that the transition takes place from the upper state to the lower state accompanied by photon emission. The latter equation, on the other hand, shows that the transition takes place from the lower state to the upper state accompanied by photon absorption.
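The operator construction of (4.28) can be mirrored in a few lines of code. The sketch below, assuming NumPy/SciPy and setting ħ = m = ω = e = 1, also cross-checks one element against the coordinate-space integral.

```python
# Sketch (assumes numpy/scipy): build the transition matrix (4.28) from the
# ladder-operator matrix and cross-check the (1, 2) element against the
# coordinate integral <psi_0| q |psi_1> (units hbar = m = omega = e = 1).
from math import factorial
import numpy as np
from scipy.integrate import quad
from numpy.polynomial.hermite import hermval

N = 6
a = np.diag(np.sqrt(np.arange(1, N)), k=1)        # annihilation operator a
q = (a + a.T) / np.sqrt(2)                        # q = sqrt(1/2) (a + a†)

def psi(n, x):
    # normalized harmonic-oscillator eigenfunction
    c = np.zeros(n + 1); c[n] = 1.0
    return hermval(x, c) * np.exp(-x**2/2) / np.sqrt(2**n * factorial(n) * np.sqrt(np.pi))

integral, _ = quad(lambda x: psi(0, x) * x * psi(1, x), -10, 10)
assert np.isclose(q[0, 1], integral)              # both equal 1/sqrt(2)
print(np.round(q, 3))
```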

4.3 Three-Dimensional System

The hydrogen-like atoms give us a typical example. Since we have fully investigated the quantum states of these atoms, we make the most of the related results.

Example 4.3: An Electron in a Hydrogen Atom Unlike the one-dimensional system, we have to take account of the angular momentum in a three-dimensional system. We have already obtained the explicit wave functions. Here we focus on the 1s and 2p states of hydrogen. For the normalized states we have

\[
\phi(1s) = \sqrt{\frac{1}{\pi a^3}}\,e^{-r/a}, \tag{4.29}
\]

\[
\phi(2p_z) = \frac{1}{4}\sqrt{\frac{1}{2\pi a^3}}\,\frac{r}{a}\,e^{-r/2a}\cos\theta, \tag{4.30}
\]

\[
\phi(2p_{x+iy}) = -\frac{1}{8}\sqrt{\frac{1}{\pi a^3}}\,\frac{r}{a}\,e^{-r/2a}\sin\theta\,e^{i\phi}, \tag{4.31}
\]

\[
\phi(2p_{x-iy}) = \frac{1}{8}\sqrt{\frac{1}{\pi a^3}}\,\frac{r}{a}\,e^{-r/2a}\sin\theta\,e^{-i\phi}, \tag{4.32}
\]

where a denotes the Bohr radius of hydrogen. Note that the minus sign of φ(2p_{x+iy}) is due to the Condon–Shortley phase. Even though the transition probability is proportional to the square of the matrix element, so that the phase factor cancels out, we describe the state vector faithfully. The energy eigenvalues are

\[
E(1s) = -\frac{\hbar^2}{2\mu a^2}, \qquad E(2p_z) = E(2p_{x+iy}) = E(2p_{x-iy}) = -\frac{\hbar^2}{8\mu a^2}, \tag{4.33}
\]

where μ is the reduced mass of hydrogen. Note that the latter three states are degenerate.

First, we consider a transition between the φ(1s) and φ(2p_z) states. Suppose that the normalized coherent state is described as

\[
\psi(x,t) = \frac{1}{\sqrt{2}}\left\{\phi(1s)\exp[-iE(1s)t/\hbar] + \phi(2p_z)\exp[-iE(2p_z)t/\hbar]\right\}. \tag{4.34}
\]

As before, we have

\[
\psi^*(x,t)\psi(x,t) = |\psi(x,t)|^2 = \frac{1}{2}\left\{[\phi(1s)]^2 + [\phi(2p_z)]^2 + 2\phi(1s)\phi(2p_z)\cos\omega t\right\}, \tag{4.35}
\]

where ω is given by

\[
\omega = \left[E(2p_z) - E(1s)\right]/\hbar = 3\hbar/8\mu a^2. \tag{4.36}
\]

In virtue of the third term of (4.35), which contains the cos ωt factor, the charge distribution undergoes a sinusoidal oscillation along the z-axis with the angular frequency described by (4.36). For instance, the factor cos ωt is +1 when t = 0, whereas it is -1 when ωt = π, i.e., t = 8πμa²/3ħ.

Integrating (4.35), we have
\[
\int\psi^*(x,t)\psi(x,t)\,d\tau = \frac{1}{2}\int\left\{[\phi(1s)]^2 + [\phi(2p_z)]^2\right\}d\tau + \cos\omega t\int\phi(1s)\phi(2p_z)\,d\tau = \frac{1}{2} + \frac{1}{2} = 1,
\]

where we used the normalized functional forms of φ(1s) and φ(2p_z) together with their orthogonality. Note that both functions are real.

Next, we calculate the matrix element. For simplicity, we denote the matrix element simply as P(ε_e), designating only the unit polarization vector ε_e. Then, we have

\[
P(\boldsymbol{\varepsilon}_e) = \langle\phi(1s)|\boldsymbol{\varepsilon}_e\cdot\boldsymbol{P}|\phi(2p_z)\rangle, \tag{4.37}
\]

where

\[
\boldsymbol{P} = e\boldsymbol{x} = e(\boldsymbol{e}_1\ \boldsymbol{e}_2\ \boldsymbol{e}_3)\begin{pmatrix}x\\ y\\ z\end{pmatrix}. \tag{4.38}
\]

We have three possibilities for choosing ε_e out of e₁, e₂, and e₃. Choosing e₃, we have

\[
P_{z,|p_z\rangle}^{(e_3)} = e\langle\phi(1s)|z|\phi(2p_z)\rangle
= \frac{e}{4\sqrt{2}\,\pi a^4}\int_0^{\infty}r^4 e^{-3r/2a}\,dr\int_0^{\pi}\cos^2\theta\sin\theta\,d\theta\int_0^{2\pi}d\phi = \frac{2^7\sqrt{2}}{3^5}\,ea \approx 0.745\,ea. \tag{4.39}
\]

In (4.39), we express the matrix element as P_{z,|p_z⟩}^{(e_3)} to indicate the z-component of the position vector and to show explicitly that the φ(2p_z) state is responsible for the transition. In (4.39), we used z = r cos θ. We also used the radial integration

\[
\int_0^{\infty}r^4 e^{-3r/2a}\,dr = 24\left(\frac{2a}{3}\right)^5.
\]

Also, we changed the variable cos θ ⟶ t to perform the integration with respect to θ. We see that the "leverage" length of the transition moment is comparable to the Bohr radius a.
ðe Þ
With the notation Pz,jp3 we need some explanation for consistency with the latter
zi

description. Equation (4.39) represents the transition from jϕ(2pz)i to jϕ(1s)i that is
accompanied by the photon emission. Thus, j pzi in the notation means that jϕ(2pz)i
4.3 Three-Dimensional System 135

is the initial state. In the notation, in turn, (e3) denotes the polarization vector and
z represents the electric dipole.
In the case of photon absorption where the transition occurs from jϕ(1s)i to
jϕ(2pz)i, we use the following notation:

ðe Þ
Pz, 3p j ¼ e ϕ 2pz jzjϕð1sÞ : ð4:40Þ
hz

Since all the functions related to the integration are real, we have

ðe Þ ðe Þ
Pz,jp3 ¼ Pz, 3p j :
zi hz

Meanwhile, if we choose e1 for εe to evaluate the matrix element Px, we have

ðe Þ
Px,jp1 ¼ e ϕð1sÞjxjϕ 2pz
zi

Z 1 Z π Z 2π
e
¼ pffiffiffi r 4 e3r=2a dr sin 2 θ cos θdθ cos ϕdϕ ¼ 0, ð4:41Þ
4 2πa4 0 0 0

where cos ϕ comes from x ¼ r sin θ cos ϕ and an integration of cos ϕ gives zero. In a
similar manner, we have

ðe Þ
Py, 2j p ¼ e ϕð1sÞjyjϕ 2pz ¼ 0: ð4:42Þ
zi

Next, we estimate the matrix elements associated with 2px and 2py. For this
purpose, it is convenient to introduce the following complex coordinates by a unitary
transformation:
0 10 1
1 1 1 i
0 1 p ffiffi
ffi p ffiffi
ffi 0 CB pffiffiffi pffiffiffi 0 C0 1
x B 2 C x
B 2 2 CB 2
B C B CB CB C
ðe1 e2 e3 Þ@ y A ¼ ðe1 e2 e3 ÞB  piffiffiffi pi ffiffiffi 0 CB p1ffiffiffi  pffiffiffi 0 C
i @yA
B CB C
z @ 2 2 A@ 2 2 A z
0 0 1 0 0 1
0 1
1
pffiffiffi ðx þ iyÞ
B 2 C
1 1 B C
¼ pffiffiffi ðe1  ie2 Þ pffiffiffi ðe1 þ ie2 Þ e3 B B 1 C,
C ð4:43Þ
2 2 @ pffiffiffi ðx  iyÞ A
2
z

where a unitary transformation is represented by a unitary matrix defined as


136 4 Optical Transition and Selection Rules

U { U ¼ UU { ¼ E: ð4:44Þ

We will investigate details of the unitary transformation and matrix in Parts III
and IV.
We define e+ and e as follows [1]:

1 1
eþ  pffiffiffi ðe1 þ ie2 Þ and e  pffiffiffi ðe1  ie2 Þ, ð4:45Þ
2 2

where complex vectors e+ and e represent the left-circularly polarized light and
right-circularly polarized light that carry an angular momentum h and h, respec-
tively. We will revisit the characteristics and implication of these complex vectors in
Sect. 7.4.
We have
0 1
0 1 p
1
ffiffi
ffi ð x þ iy Þ
x B 2 C
B C B C
ð e1 e2 e3 Þ @ y A ¼ ð e eþ e3 Þ B
B pffiffiffi ðx  iyÞ C:
1 C ð4:46Þ
@ A
z 2
z

Note that e+, e, and e3 are orthonormal. That is,

heþ jeþ i ¼ 1, heþ je i ¼ 0, etc: ð4:47Þ

In this situation, e+, e, and e3 are said to form an orthonormal basis in a three-
dimensional complex vector space (see Sect. 11.4).
Now, choosing e+ for εe, we have [2]
 
ðeþ Þ 1
Pxiy,  e ϕð1sÞjpffiffiffi ðx  iyÞjϕ 2pxþiy , ð4:48Þ
j pþ i 2

where j p+i is a shorthand notation of ϕ(2px + iy); x  iy represents a complex electric


dipole. Equation (4.48) represents an optical process in which an electron causes
transition from ϕ(2px + iy) to ϕ(1s) to lose an angular momentum h, whereas the
radiation field gains that angular momentum to conserve a total angular momentum
ðeþ Þ
h. The notation Pxiy, reflects this situation. Using the coordinate representation,
j pþ i
we rewrite (4.48) as
4.3 Three-Dimensional System 137

Z 1 Z π Z 2π
ðe Þ e
þ
Pxiy,jp ¼  pffiffiffi r 4 e3r=2a dr sin 3 θdθ eiϕ eiϕ dϕ
þi 8 2πa4 0 0 0
pffiffiffi
27 2
¼  5 ea, ð4:49Þ
3

where we used

x  iy ¼ r sin θeiϕ : ð4:50Þ

In the definite integral of (4.49), eiϕ comes from x  iy, while eiϕ comes from
ϕ(2px + iy). Note that from (3.24) eiϕ is an eigenfunction corresponding to an angular
momentum eigenvalue h. Notice that in (4.49) exponents eiϕ and eiϕ cancel out and
that an azimuthal integral is nonvanishing.
If we choose e for εe, we have
 
ðe Þ 1
Pxþiy,jp ¼ e ϕð1sÞjpffiffiffi ðx þ iyÞjϕ 2pxþiy ð4:51Þ
þi 2
Z 1 Z π Z 2π
e
¼ pffiffiffi r 4 e3r=2a dr sin 3 θdθ e2iϕ dϕ ¼ 0, ð4:52Þ
8 2πa4 0 0 0

where we used

x þ iy ¼ r sin θeiϕ :

With (4.52), a factor e2iϕ results from the product ϕ(2px + iy)(x + iy) which renders the
integral (4.51) vanishing. Note that the only difference between (4.49) and (4.52) is
about the integration of ϕ factor. For the same reason, if we choose e3 for εe, the
matrix element vanishes. Thus, with the ϕ(2px + iy)-related matrix element, only
ð eþ Þ
Pxiy, survives. Similarly, with the ϕ(2px  iy)-related matrix element, only
j pþ i
ð e Þ
Pxþiy,jp survives. Notice that j pi is a shorthand notation of ϕ(2px  iy). That is,
i
we have, e.g.,
  pffiffiffi
ðe Þ 1 27 2
Pxþiy,jp ¼ e ϕð1sÞjpffiffiffi ðx þ iyÞjϕ 2pxiy ¼ 5 ea,
i
2 3
 
ðe Þ 1
þ
Pxiy, jp ¼ e ϕð1sÞjpffiffiffi ðx  iyÞjϕ 2pxiy ¼ 0: ð4:53Þ
i
2

Taking complex conjugate of (4.48), we have


138 4 Optical Transition and Selection Rules

    pffiffiffi
ðeþ Þ 1 27 2
Pxiy, ¼ e ϕ 2pxþiy jpffiffiffi ðx þ iyÞjϕð1sÞ ¼  5 ea ð4:54Þ
j pþ i 2 3
ðe Þ
Here recall (1.116) and (x  iy){ ¼ x + iy. Also note that since Pxiy,
þ
is real,
j pþ i
ð eþ Þ
[Pxiy,  is real as well so that we have
j pþ i
 
ð eþ Þ ðe Þ ðe Þ
Pxiy, ¼ Pxiy,
þ
¼ Pxþiy,

: ð4:55Þ
j pþ i j pþ i hpþ j

Comparing (4.48) and (4.55), we notice that the polarization vector has been
switched from e+ to e with the allowed transition, even though the matrix element
remains the same. This can be explained as follows: In (4.48) the photon emission is
occurring, while the electron is causing a transition from ϕ(2px + iy) to ϕ(1s). As a
result, the radiation field has gained an angular momentum by h during the process in
which the electron has lost an angular momentum h. In other words, h is transferred
from the electron to the radiation field and this process results in the generation of
left-circularly polarized light in the radiation field.
In (4.54), on the other hand, the reversed process takes place. That is, the photon
absorption is occurring in such a way that the electron is excited from ϕ(1s) to
ϕ(2px + iy). After this process has been completed, the electron has gained an angular
momentum by h, whereas the radiation field has lost an angular momentum by h. As
a result, the positive angular momentum h is transferred to the electron from the
radiation field that involves left-circularly polarized light. This can be translated into
the statement that the radiation field has gained an angular momentum by h. This is
equivalent to the generation of right-circularly polarized light (characterized by e)
in the radiation field. In other words, the electron gains the angular momentum by h
to compensate the change in the radiation field.
The implication of the first equation of (4.53) can be interpreted in a similar
manner. Also we have
h i  
ðe Þ ðe Þ ð eþ Þ 1

Pxþiy, jp ¼ Pxþiy,

¼ Pxiy, ¼ e ϕ 2p j p ffiffi
ffi ð x  iy Þjϕ ð 1s Þ
i jp i hp j xiy
2
pffiffiffi
27 2
¼ ea:
35

Notice that the inner products of (4.49) and (4.53) are real, even though operators
ðeþ Þ ð e Þ
x + iy and x  iy are not Hermitian. Also note that Pxiy, of (4.49) and Pxþiy,
j pþ i j p i

of (4.53) have the same absolute value with minus and plus signs, respectively. The
minus sign of (4.49) comes from the Condon–Shortley phase. The difference,
however, is not essential, because the transition probability is proportional to
4.3 Three-Dimensional System 139

 2  2

 ð e Þ 
Pxþiy,j p i  or Pxiy,  . Some literature [3, 4] uses (x + iy) instead of x + iy.
ðeþ Þ
j pþ i 
This is because simply of the inclusion of the Condon–Shortley phase; see (3.304).
Let us think of the coherent state that is composed of ϕ(1s) and ϕ(2px + iy) or
ϕ(2px  iy). Choosing ϕ(2px + iy), the state ψ(x, t) can be given by

1
ψ ðx, t Þ ¼ pffiffiffi
2
 
ϕð1sÞ exp ðiE ð1sÞt=hÞ þ ϕ 2pxþiy exp iE 2pxþiy t=h , ð4:56Þ

where ϕ(1s) is described by (3.301) and ϕ(2px + iy) is expressed as (3.304). Then we
have

ψ  ðx, t Þψ ðx, t Þ ¼ jψ ðx, t Þj2


n  2 h io
1
¼ jϕð1sÞj2 þ ϕ 2pxþiy  þ ϕð1sÞR 2pxþiy eiðϕωtÞ þ eiðϕωtÞ
2
n  2 o
1
¼ jϕð1sÞj2 þ ϕ 2pxþiy  þ 2ϕð1sÞR 2pxþiy cos ðϕ  ωt Þ , ð4:57Þ
2

where using R 2pxþiy , we denote ϕ(2px+iy) as follows:

ϕ 2pxþiy  R 2pxþiy eiϕ : ð4:58Þ

That is, R 2pxþiy represents a real component of ϕ(2px+iy) that depends only on
r and θ. The third term of (4.57) implies that the existence probability density of an
electron represented by jψ(x, t)j2 is rotating counterclockwise around the z-axis with
an angular frequency of ω. Similarly, in the case of ϕ(2pxiy), the existence prob-
ability density of an electron is rotating clockwise around the z-axis with an angular
frequency of ω.
Integrating (4.57), we have
Z Z
ψ  ðx, t Þψ ðx, t Þdτ ¼ jψ ðx, t Þj2 dτ
Z n
1  2 o
¼ jϕð1sÞj2 þ ϕ 2pxþiy  dτ
2
Z 1 Z π Z 2π
1 1
þ 2
r dr sin θ dθ ϕð1sÞR 2pxþiy dϕ cos ðϕ  ωt Þ ¼ þ ¼ 1,
0 0 0 2 2

where we used normalized functional forms of ϕ(1s) and ϕ(2px+iy); the last term
vanishes because
140 4 Optical Transition and Selection Rules

Z 2π
dϕ cos ðϕ  ωt Þ ¼ 0:
0

This is easily shown by suitable variable transformation.


In relation to the above discussion, we often use real numbers to describe wave
functions. For this purpose, we use the following unitary transformation to transform
the orthonormal basis of e imϕ to cos mϕ and sin mϕ. That is, we have
0 1
ð1Þm ð1Þm i
  pffiffiffi  pffiffiffi C
1 1 ð1Þm imϕ 1 imϕ B
pffiffiffi cos mϕ pffiffiffi sin mϕ ¼ pffiffiffiffiffi e pffiffiffiffiffi e B 2 2 C,
π π 2π 2π @ 1 i A
pffiffiffi pffiffiffi
2 2
ð4:59Þ

where we assume that m is positive so that we can appropriately take into account the
Condon–Shortley phase. Alternatively, we describe it via unitary transformation as
follows:
 
ð1Þm imϕ 1 imϕ 1 1
pffiffiffiffiffi e pffiffiffiffiffi e ¼ pffiffiffi cos mϕ pffiffiffi sin mϕ
2π 2π π π
0 m 1
ð1Þ 1
pffiffiffi pffiffiffi
B 2 2 C
B C, ð4:60Þ
@ ð1Þm i i A
pffiffiffi  pffiffiffi
2 2

In this regard, we have to be careful about normalization constants; for trigonometric


functions the constant should be p1ffiffiπ , whereas for the exponential representation the
constant is p1ffiffiffiffi

. At the same time, trigonometric functions are expressed as a linear
combination of eimϕ and eimϕ, and so if we use the trigonometric functions,
information of a magnetic quantum number is lost.
In Sect. 3.7, we showed normalized functions Λ eðnÞ ¼ Y m ðθ, ϕÞR
eðl nÞ ðr Þ of the
l,m l
imϕ e ðnÞ
hydrogen-like atom. Noting that Y m l ðθ, ϕÞ is proportional to e , Λl,m can be
described using cosmϕ and sinmϕ for the basis vectors. We denote two linearly
eðnÞ
independent vectors by Λ eðnÞ
l, cos mϕ and Λl, sin mϕ . Then, these vectors are expressed as
4.3 Three-Dimensional System 141

0 1
ð1Þm ð1Þm i
   ðnÞ  pffiffiffi  pffiffiffi C
B
eðnÞ
Λ eðnÞ e eðnÞ B 2 2 C,
l, cos mϕ l, sin mϕ ¼ Λl,m
Λ Λl,m @ A ð4:61Þ
1 i
pffiffiffi pffiffiffi
2 2

where we again assume that m is positive. In chemistry and materials science, we


normally use real functions of Λ eðnÞ eðnÞ
l, cos mϕ and Λl, sin mϕ . In particular, we use the
eð2Þ
notations of, e.g., ϕ(2px) and ϕ(2py) instead of Λ and Λeð2Þ , respectively. In
1, cos ϕ 1, sin ϕ
that case, we explicitly have a following form:
0 1
1 i
 ð2Þ ð2Þ B  p ffiffi
ffi p ffiffi

e Λ e B 2 2C C
ϕð2px Þ ϕ 2py ¼ Λ 1,1 @
1,1 1 i A
pffiffiffi pffiffiffi
2 2
0 1
1 i
 pffiffiffi pffiffiffi
B 2 2C
¼ ϕ 2pxþiy ϕ 2pxiy B @ 1
C
i A
pffiffiffi pffiffiffi
2 2

1 3 r 1 32 r 2a
pffiffiffiffiffi a2 e2a sin θ cos ϕ
r r
¼ pffiffiffiffiffi a e sin θ sin ϕ : ð4:62Þ
4 2π a 4 2π a

Thus, the Condon–Shortley phase factor has been removed.


Using this expression, we calculate matrix elements of the electric dipole transi-
tion. We have

ðe Þ
Px,jp1 i ¼ ehϕð1sÞjxjϕð2px Þi
x

Z 1 Z π Z 2π pffiffiffi
e 27 2
¼ pffiffiffi r 4 e3r=2a dr sin 3 θdθ cos 2 ϕdϕ ¼ ea: ð4:63Þ
4 2πa4 0 0 0 35

Thus, we obtained the same result as (4.49) apart from the minus sign. Since a square
of an absolute value of the transition moment plays a role, the minus sign is again of
secondary importance. With Pðye2 Þ , similarly we have

ðe Þ
Py,jp2 ¼ e ϕð1sÞjyjϕ 2py
yi

Z 1 Z π Z 2π pffiffiffi
e 4 3r=2a 27 2
¼ pffiffiffi r e dr sin θdθ
3
sin ϕdϕ ¼ 5 ea:
2
ð4:64Þ
4 2πa4 0 0 0 3

Comparing (4.39), (4.63), and (4.64), we have


142 4 Optical Transition and Selection Rules

pffiffiffi
ðe Þ ðe Þ ðe Þ 27 2
Pz, 3 p ¼ Px,j1 p i ¼ Py, 2 p ¼ 5 ea:
j zi x j yi 3
ðe Þ ðe Þ ðe Þ
In the case of Pz,jp3 , Px,j1 p i , and Py,jp2 , the optical transition is said to be polarized
zi x yi

along the z-, x-, and y-axis, respectively, and so linearly polarized lights are relevant.
Note moreover that operators z, x, and y in (4.39), (4.63), and (4.64) are Hermitian
and that ϕ(2pz), ϕ(2px), and ϕ(2py) are real functions.

4.4 Selection Rules

In a three-dimensional system such as hydrogen-like atoms, quantum states of


particles (i.e., electrons) are characterized by three quantum numbers: principal
quantum numbers, orbital angular momentum quantum numbers (or azimuthal
quantum numbers), and magnetic quantum numbers. In this section, we examine
the selection rules for the electric dipole approximation.
Of the three quantum numbers mentioned above, angular momentum quantum
numbers are denoted by l and magnetic quantum numbers by m. First, we examine
the conditions on m. With the angular momentum operator L and its corresponding
operator M, we get the following commutation relations:
 
½M z , x ¼ iy, M y , z ¼ ix, ½M x , y ¼ iz;
 
½M z , iy ¼ x, M y , ix ¼ z, ½M x , iz ¼ y; etc: ð4:65Þ

Notice that in the upper line the indices change cyclic like (z, x, y), whereas in the
lower line they change anti-cyclic such as (z, y, x). The proof of (4.65) is left for the
reader. Thus, we have, e.g.,

½M z , x þ iy ¼ x þ iy, ½M z , x  iy ¼ ðx  iyÞ, etc: ð4:66Þ

Putting

Qþ  x þ iy and Q  x  iy,

we have

½M z , Qþ  ¼ Qþ , ½M z , Q  ¼ Q : ð4:67Þ

Taking an inner product of both sides of (4.67), we have


4.4 Selection Rules 143

hm0 j½M z , Qþ jmi ¼ hm0 jM z Qþ  Qþ M z jmi ¼ m0 hm0 jQþ jmi  mhm0 jQþ jmi
¼ hm0 jQþ jmi, ð4:68Þ

where the quantum state j mi is identical to j l, mi in (3.151). Here we need no


information about l, and so it is omitted. Thus, we have, e.g., Mzj mi ¼ mj mi. Taking
its adjoint, we have hm j Mz{ ¼ hm j Mz ¼ mhmj, where Mz is Hermitian. These results
lead to (4.68). From (4.68), we get

ðm0  m  1Þhm0 jQþ jmi ¼ 0: ð4:69Þ

Therefore, for the matrix element hm0j Q+j mi not to vanish, we must have

m0  m  1 ¼ 0 or Δm ¼ 1 ðΔm  m0  mÞ:

This represents the selection rule with respect to the coordinate Q+.
Similarly, we get

ðm0  m þ 1Þhm0 jQ jmi ¼ 0: ð4:70Þ

In this case, for the matrix element hm0j Qj mi not to vanish we have

m0  m þ 1 ¼ 0 or Δm ¼ 1:

To derive (4.70), we can alternatively use the following: Taking the adjoint of (4.69),
we have

ðm0  m  1ÞhmjQ jm0 i ¼ 0:

Exchanging m0 and m, we have

ðm  m0  1Þhm0 jQ jmi ¼ 0 or ðm0  m þ 1Þhm0 jQ jmi ¼ 0:

Thus, (4.70) is recovered.


Meanwhile, we have a commutation relation

½M z , z ¼ 0: ð4:71Þ

Similarly, taking an inner product of both sides of (4.71), we have

ðm0  mÞhm0 jzjmi ¼ 0:

Therefore, for the matrix element hm0j zj mi not to vanish, we must have
144 4 Optical Transition and Selection Rules

m0  m ¼ 0 or Δm ¼ 0: ð4:72Þ

These results are fully consistent with Example 4.3 of Sect. 4.3. That is, if
circularly polarized light takes part in the optical transition, Δm ¼ 1. For instance,
using the present notation we rewrite (4.48) as
  pffiffiffi
1 1  27 2
ϕð1sÞjpffiffiffi ðx  iyÞjϕ 2pxþiy ¼ pffiffiffi h0jQ j1i ¼  5 a:
2 2 3

If linearly polarized light is related to the optical transition, we have Δm ¼ 0.


Next, we examine the conditions on l. To this end, we calculate a following
commutator [5]:
       
M2 , z ¼ M x 2 þ M y2 þ M z 2 , z ¼ M x 2 , z þ M y 2 , z

¼ M x ðM x z  zM x Þ þ M x zM x  zM x 2 þ M y M y z  zM y

þ M y zM y  zM y 2
   
¼ M x ½M x , z þ ½M x , zM x þ M y M y , z þ M y , z M y

¼ i M y x þ xM y  M x y  yM x

¼ i M x y  yM x  M y x þ xM y þ 2M y x  2M x y

¼ i 2iz þ 2M y x  2M x y ¼ 2i M y x  M x y þ iz : ð4:73Þ

In the above calculations, (i) we used [Mz, z] ¼ 0 (with the second equality); (ii) RHS
was modified so that the commutation relations can be used (the third equality); (iii)
we used Mxy ¼ Mxy  2Mxy and Myx ¼  Myx + 2Myx so that we can use (4.65)
(the second last equality). Moreover, using (4.65), (4.73) can be written as
 2 
M , z ¼ 2i xM y  M x y ¼ 2i M y x  yM x :

Similar results on the commutator can be obtained with [M2, x] and [M2, y]. For
further use, we give alternative relations such that
 
M 2 , x ¼ 2i yM z  M y z ¼ 2i M z y  zM y ,
 2 
M , y ¼ 2iðzM x  M z xÞ ¼ 2iðM x z  xM z Þ: ð4:74Þ

Using (4.73), we calculate another commutator such that


4.4 Selection Rules 145

 2  2       
M , M , z ¼ 2i M 2 , M y x  M 2 , M x y þ i M 2 , z
      
¼ 2i M y M 2 , x  M x M 2 , y þ i M 2 , z
  
¼ 2i 2iM y yM z  M y z  2iM x ðM x z  xM z Þ þ i M 2 , z

¼ 2 2 M x x þ M y y þ M z z M z  2 M x 2 þ M y 2 þ M z 2 z
þM 2 z  z M 2 g ¼ 2 M 2 z þ z M 2 : ð4:75Þ

In the above calculations, (i) we used [M2, My] ¼ [M2, Mx] ¼ 0 (with the second
equality); (ii) we used (4.74) (the third equality); (iii) RHS was modified so that we
can use the relation M ⊥ x from the definition of the angular momentum operator,
i.e., Mxx + Myy + Mzz ¼ 0 (the second last equality). We used [Mz, z] ¼ 0 as well.
Similar results are obtained with x and y. That is, we have
  
M 2 , M 2 , x ¼ 2 M 2 x þ xM 2 , ð4:76Þ
 2  2 
M , M , y ¼ 2 M 2 y þ yM 2 : ð4:77Þ

Rewriting, e.g., (4.75), we have

M 4 z  2M 2 z M 2 þ z M 4 ¼ 2 M 2 z þ z M 2 : ð4:78Þ

Using the relation (4.78) and taking inner products of both sides, we get, e.g.,

l0 jM 4 z  2M 2 z M 2 þ z M 4 jl ¼ l0 j2 M 2 z þ z M 2 jl : ð4:79Þ

That is,

l0 jM 4 z  2M 2 z M 2 þ z M 4 jl  l0 j2 M 2 z þ z M 2 jl ¼ 0:

Considering that both terms of LHS contain a factor hl0j zj li in common, we have
h i
l0 ðl0 þ 1Þ  2l0 lðl0 þ 1Þðl þ 1Þ þ l2 ðl þ 1Þ2  2l0 ðl0 þ 1Þ  2lðl þ 1Þ
2 2

hl0 jzjli ¼ 0, ð4:80Þ

where the quantum state j li is identical to j l, mi in (3.151) with m omitted.


To factorize the first factor of LHS of (4.80), we view it as a quartic equation with
respect to l0. Replacing l0 with l, we find that the first factor vanishes, and so the first
factor should have a factor (l0 + l). Then, we factorize the first factor of LHS of (4.80)
such that
146 4 Optical Transition and Selection Rules

½the first factor of LHS of ð4:80Þ


 
¼ ðl0 þ lÞ ðl0  lÞ þ 2ðl0 þ lÞ l0  l0 l þ l2  2l0 lðl0 þ lÞ  2ðl0 þ lÞ  ðl0 þ lÞ
2 2 2 2

h   i
¼ ðl0 þ lÞ ðl0 þ lÞðl0  lÞ þ 2 l0  l0 l þ l2  2l0 l  2  ðl0 þ lÞ
2 2

h   i
¼ ðl0 þ lÞ ðl0 þ lÞðl0  lÞ þ 2 l0  2l0 l þ l2  ðl0 þ l þ 2Þ
2 2

h i
¼ ðl0 þ lÞ ðl0 þ lÞðl0  lÞ þ 2ðl0  lÞ  ðl0 þ l þ 2Þ
2 2

n o
¼ ðl0 þ lÞ ðl0  lÞ ½ðl0 þ lÞ þ 2  ðl0 þ l þ 2Þ
2

¼ ðl0 þ lÞðl0 þ l þ 2Þðl0  l þ 1Þðl0  l  1Þ: ð4:81Þ

Thus rewriting (4.80), we get

ðl0 þ lÞðl0 þ l þ 2Þðl0  l þ 1Þðl0  l  1Þhl0 jzjli ¼ 0: ð4:82Þ

We have similar relations with respect to hl0j xj li and hl0j yj li because of (4.76) and
(4.77). For the electric dipole transition to be allowed, among hl0j xj li, hl0j yj li, and
hl0j zj li at least one term must be nonvanishing. For this, at least one of the four
factors of (4.81) should be zero. Since l0 + l + 2 > 0, this factor is excluded.
For l0 + l to vanish, we should have l0 ¼ l ¼ 0; notice that both l0 and l are non-
negative integers. We must then examine this condition. This condition is equivalent
to that the spherical harmonics related to the angular variables θ and ϕ take the form
pffiffiffiffiffi
of Y00 ðθ, ϕÞ ¼ 1= 4π, i.e., a constant. Therefore, the θ-related integral for the matrix
element hl0j zj li only consists of a following factor:
Z π Z π
1 1
cos θ sin θdθ ¼ sin 2θdθ ¼  ½ cos 2θπ0 ¼ 0,
0 2 0 4

where cosθ comes from a polar coordinate z ¼ r cos θ; sinθ is due to an infinitesimal
volume of space, i.e., r2 sin θdrdθdϕ. Thus, we find that hl0j zj li vanishes on
condition that l0 ¼ l ¼ 0. As a polar coordinate representation, x ¼ r sin θ cos ϕ
and y ¼ r sin θ sin ϕ, and so the ϕ-related integral hl0j xj li and hl0j yj li vanishes as
well. That is,
Z 2π Z 2π
cos ϕdϕ ¼ sin ϕdϕ ¼ 0:
0 0

Therefore, the matrix elements relevant to l0 ¼ l ¼ 0 vanish with all the coordinates;
i.e., we have
4.5 Angular Momentum of Radiation 147

hl0 jxjli ¼ hl0 jyjli ¼ hl0 jzjli ¼ 0: ð4:83Þ

Consequently, we exclude (l0 + l)-factor as well, when we consider a condition of the


allowed transition. Thus, regarding the condition that should be satisfied with the
allowed transition, from (4.82) we get

l0  l þ 1 ¼ 0 or l0  l  1 ¼ 0: ð4:84Þ

Or defining Δl  l0  l, we get

Δl ¼ 1: ð4:85Þ

Thus, for the transition to be allowed, the azimuthal quantum number must change
by one.

4.5 Angular Momentum of Radiation [6]

In Sect. 4.3 we mentioned circularly polarized light. If the circularly polarized light
acts on an electron, what can we anticipate? Here we deal with this problem within a
framework of a semiclassical theory.
Let E and H be an electric and magnetic field of a left-circularly polarized light,
respectively. They are expressed as

1
E = pffiffiffi E 0 ðe1 þ ie2 Þ exp iðkz  ωt Þ, ð4:86Þ
2
1 1 E
H = pffiffiffi H 0 ðe2 2 ie1 Þ exp iðkz  ωt Þ ¼ pffiffiffi 0 ðe2 2 ie1 Þ exp iðkz  ωt Þ: ð4:87Þ
2 2 μv

Here we assume that the light is propagating in the direction of the positive z-axis.
The electric and magnetic fields described by (4.86) and (4.87) represent the left-
circularly polarized light. A synchronized motion of an electron is expected, if the
electron exerts a circular motion in such a way that the motion direction of the
electron is always perpendicular to the electric field and parallel to the magnetic field
(see Fig. 4.2). In this situation, magnetic Lorentz force does not affect the electron
motion.
Here, the Lorentz force F(t) is described by

Fðt Þ ¼ eEðxðt ÞÞ þ exð_t Þ Bðxðt ÞÞ, ð4:88Þ

where the first term is electric Lorentz force and the second term represents the
magnetic Lorentz force.
148 4 Optical Transition and Selection Rules

Fig. 4.2 Synchronized y


motion of an electron under
a left-circularly polarized
light electron motion

electron
O E x

The quantity B called magnetic flux density is related to H as B = μ0H, where


μ0 is permeability of vacuum; see (7.10) and (7.11) of Sect. 7.1. In (4.88) E and B are
measured at a position where the electron is situated at a certain time t. Equation
(4.88) universally describes the motion of a charged particle in the presence of
electromagnetic fields. We consider another related example in Chap. 15.
Equation (4.86) can be rewritten as

1
E = pffiffiffi E 0 ½e1 cos ðkz  ωt Þ 2 e2 sin ðkz  ωt Þ
2
1
þi pffiffiffi E 0 ½e2 cos ðkz  ωt Þ þ e1 sin ðkz  ωt Þ: ð4:89Þ
2

Suppose that the electron exerts the circular motion in a region narrow enough
around the origin and that the said electron motion is confined within the xy-plane
that is perpendicular to the light propagation direction. Then, we can assume that
z  0 in (4.89). Ignoring kz in (4.89) accordingly and taking a real part, we have

1
E = pffiffiffi E0 ðe1 cos ωt þ e2 sin ωt Þ:
2

Thus, a force F exerting the electron is described by

F = eE, ð4:90Þ

where e is an elementary charge (e < 0). Accordingly, an equation of motion of the


electron is approximated such that

x = eE,
m€ ð4:91Þ
4.5 Angular Momentum of Radiation 149

where m is a mass of an electron and x is a position vector of the electron. With


individual components of the coordinate, we have

1 1
m€x ¼ pffiffiffi eE0 cos ωt and m€y ¼ pffiffiffi eE 0 sin ωt: ð4:92Þ
2 2

Integrating (4.92) two times, we get

eE
mx ¼  pffiffiffi 0 cos ωt þ Ct þ D, ð4:93Þ
2 ω2

where C and D are integration constants. Setting xð0Þ ¼  pffiffieE


2mω2
0
and x 0 (0) ¼ 0, we
have C ¼ D ¼ 0. Similarly, we have

eE
my ¼  pffiffiffi 0 sin ωt þ C 0 t þ D0 , ð4:94Þ
2 ω2

ffiffi 0 , we
where C0 and D0 are integration constants. Setting y(0) ¼ 0 and y0 ð0Þ ¼  peE
2mω
have C0 ¼ D0 ¼ 0. Thus, making t a parameter, we get
2
eE
x2 þ y2 ¼ pffiffiffi 0 : ð4:95Þ
2mω2

This implies that the electron is exerting a counterclockwise circular motion with
a radius  pffiffieE
2mω2
0
under the influence of the electric field. This is consistent with a
motion of an electron in the coherent state of ϕ(1s) and ϕ(2px + iy) as expressed in
(4.57).
An angular momentum the electron has acquired is
 
eE meE e2 E 0 2
L¼x p = xpy  ypx ¼  pffiffiffi 0  pffiffiffi 0 ¼ : ð4:96Þ
2mω2 2mω 2mω3

Identifying this with h, we have

e2 E 0 2
¼ h: ð4:97Þ
2mω3

In terms of energy, we have

e2 E 0 2
¼ hω: ð4:98Þ
2mω2

Assuming a wavelength of the light is 600 nm, we need a left-circularly polarized


light whose electric field is about 1.5 1010 [V/m].
150 4 Optical Transition and Selection Rules

A radius α of a circular motion of the electron is given by

eE
α ¼ pffiffiffi 0 : ð4:99Þ
2mω2

Under the same condition as the above, α is estimated to be 2 Å.

References

1. Jackson JD (1999) Classical electrodynamics, 3rd edn. Wiley, New York


2. Fowles GR (1989) Introduction to modern optics, 2nd edn. Dover, New York
3. Arfken GB, Weber HJ, Harris FE (2013) Mathematical methods for physicists, 7th edn. Aca-
demic Press, Waltham
4. Chen JQ, Ping J, Wang F (2002) Group representation theory for physicists, 2nd edn. World
Scientific, Singapore
5. Sunakawa S (1991) Quantum mechanics. Iwanami, Tokyo. (in Japanese)
6. Rossi B (1957) Optics. Addison-Wesley, Reading, Massachusetts
Chapter 5
Approximation Methods of Quantum
Mechanics

In Chaps. 1–3 we have focused on solving eigenvalue equations with respect to a


particle confined within a one-dimensional potential well or a harmonic oscillator
along with an electron of a hydrogen-like atom. In each example we obtained exact
analytical solutions with the quantum-mechanical states and corresponding eigen-
values (energy, angular momentum, etc.). In most cases of quantum-mechanical
problems, however, we are not able to get such analytical solutions or accurately
determine the corresponding eigenvalues. Under these circumstances, we need
appropriate approximation methods of those problems. Among those methods, the
perturbation method and variational method are widely used.
In terms of usefulness, we provide several examples concerning physical systems
that have already appeared in Chaps. 1–3. In this chapter, we examine how these
physical systems change their quantum states and corresponding energy eigenvalues
as a result of undergoing influence from the external field. We assume that the
change results from the application of external electric field. For simplicity, we focus
on the change in eigenenergy and corresponding eigenstate with respect to the
nondegenerate quantum state. As specific cases in these examples, we happen to
be able to get perturbed physical quantities accurately. Including such cases, for later
purposes we take in advance important concepts of a complete orthonormal system
(CONS) and projection operator.

5.1 Perturbation Method

In Chaps. 1–3, we considered a situation where no external field is exerted on a


physical system. In Chap. 4, on the other hand, we studied the optical transition that
takes place as a consequence of the interaction between the physical system and
electromagnetic wave. In this chapter, we wish to examine how the physical system
changes its quantum state, when an external electric field is applied to the system.
We usually assume that the external field is weak and the corresponding change in

© Springer Nature Singapore Pte Ltd. 2020 151


S. Hotta, Mathematical Physical Chemistry,
https://doi.org/10.1007/978-981-15-2225-3_5
152 5 Approximation Methods of Quantum Mechanics

the quantum state or energy is small. The applied external field causes the change in
Hamiltonian of a quantum system and results in the change in that quantum state
usually accompanied by the energy change of the system. We describe this process
as a following eigenvalue equation:

H j Ψi i ¼ Ei j Ψi i, ð5:1Þ

where H represents the total Hamiltonian; Ei is an energy eigenvalue and Ψi is an


eigenstate corresponding to Ei. Equation (5.1) implies that there should be a series of
eigenvalues and corresponding eigenvectors belonging to those eigenvalues. Strictly
speaking, however, we can get such combinations of the eigenstate and eigenvalue
only by solving (5.1) analytically. This is some kind of circular argument. Yet, at
present we do not have to go into detail about that issue. Instead we advance to
discussion of the approximation methods from a practical point of view.
We assume that H is divided into two terms such that

H ¼ H 0 þ λV, ð5:2Þ

where H0 is the Hamiltonian of the unperturbed system without the external field;
V is an additional Hamiltonian due to the applied external field, or interaction
between the physical system and the external field. The second term includes a
parameter λ that can be modulated according to the change in the strength of the
applied field. The parameter λ should be real so that H of (5.2) can be Hermitian. The
parameter λ can be the applied field itself or can merely be a dimensionless parameter
that can be set at λ ¼ 1 after finishing the calculation. We will come back to this point
later in Examples.
In (5.2) we assume that the following equation holds:

ð0Þ
H 0 j ii ¼ E i j ii, ð5:3Þ

where jii is the i-th quantum state. We designate j0i as the ground state. We assume
the excited states ji i (i ¼ 1, 2,   ) to be numbered in order of increasing energies.
These functions (or vectors) jii (including j0i) can be a quantum state that appeared
as various eigenfunctions in the previous chapters. Of these, two or more functions
may have the same eigenenergy (i.e., degenerate). If we are to deal with such
degenerate states, those states jii have to be distinguished by, e.g., jid i
(d ¼ 1, 2,   , s), where s denotes the degeneracy (or degree of degeneracy). For
simplicity, however, in this chapter we are going to examine only the change in
energy and physical states with respect to nondegenerate states. We disregard the
issue on notation of the degenerate states accordingly. We pay attention to each
individual case later, when necessary.
Regarding a one-dimensional physical system, e.g., a particle confined within
potential well and quantum-mechanical harmonic oscillator, for example, all the
quantum states are nondegenerate as we have already seen in Chaps. 1 and 2. As a
5.1 Perturbation Method 153

special case we also consider the ground state of a hydrogen-like atom. This is
because that state is nondegenerate, and so the problem can be dealt with in parallel
to the cases of the one-dimensional physical systems. We assume that the quantum
sates ji i (i ¼ 0, 1, 2,   ) constitute the orthonormal eigenvectors such that

h j j ii ¼ δji : ð5:4Þ

Remember that the notation has already appeared in (2.53); see Chap. 13 as well.

5.1.1 Quantum State and Energy Level Shift Caused by


Perturbation

One of the most important applications of the perturbation method is to evaluate the
shift in quantum state and energy level caused by the applied external field. To this
end, we expand both the quantum sate and energy as a power series of λ. That is, in
(5.1) we expand jΨii and Ei such that

ð1Þ ð2Þ ð3Þ


j Ψi i ¼j ii þ λ j ϕi i þ λ2 j ϕi i þ λ3 j ϕi i þ   , ð5:5Þ
ð0Þ ð1Þ ð2Þ ð3Þ
Ei ¼ Ei þ λE i þ λ2 Ei þ λ3 Ei þ   , ð5:6Þ

ð1Þ ð2Þ ð3Þ


where j ϕi i, j ϕi i, j ϕi i, etc. are chosen as correction terms for the state jii that
is associated with the unperturbed system. Once again, we assume that the state ji i
(i ¼ 0, 1, 2,   ) represents the nondegenerate normalized eigenvector that belongs to
ð0Þ ð1Þ ð2Þ
the eigenenergy E i of the unperturbed state. Unknown state vectors j ϕi i, j ϕi i,
ð3Þ ð1Þ ð2Þ ð3Þ
j ϕi i, etc. as well as E i , E i , Ei , etc. are to be determined after the calculation
procedures carried out from now. These states jΨii and energies Ei result from the
ð0Þ
perturbation term λV and represent the deviation from jii and E i of the unperturbed
system.
With the normalization condition we impose the following condition upon jΨii
such that

hijΨi i ¼ 1: ð5:7Þ

This condition, however, not necessarily means that jΨii has been normalized
(vide infra). From (5.4) and (5.7) we have
D E D E
ð1Þ ð2Þ
ijϕi ¼ ijϕi ¼    ¼ 0: ð5:8Þ

Let us calculate hΨij Ψii on condition of (5.7) and (5.8). From (5.5) we have
154 5 Approximation Methods of Quantum Mechanics

hD E D Ei
ð1Þ ð1Þ
hΨi jΨi i ¼ hijii þ λ ijϕi þ ϕi ji
hD E D E D Ei
ð2Þ ð2Þ ð1Þ ð1Þ
þλ2 ijϕi þ ϕi ji þ ϕi jϕi þ 
D E D E
ð1Þ ð1Þ ð1Þ ð1Þ
¼ hijii þ λ2 ϕi jϕi þ     1 þ λ2 ϕi jϕi ,

where we used (5.4) and (5.8). Thus, hΨij Ψii is normalized if we ignore the factor of
λ 2.
Now, inserting (5.5) and (5.6) into (5.1) and comparing the same power factors
with respect to λ, we obtain
h i
ð0Þ
E i  H 0 j ii ¼ 0, ð5:9Þ
h i
ð0Þ ð1Þ ð1Þ
E i  H 0 j ϕi i þ E i j ii ¼ V j ii, ð5:10Þ
h i
ð0Þ ð2Þ ð1Þ ð1Þ ð2Þ ð1Þ
Ei  H 0 j ϕi i þ E i j ϕi i þ Ei j ii ¼ V j ϕi i: ð5:11Þ

Equation (5.9) is identical with (5.3). Operating h jj on both sides of (5.10) from
the left, we get
h i h i
ð0Þ ð1Þ ð1Þ ð0Þ ð0Þ ð1Þ ð1Þ
h j j Ei  H 0 j ϕi i þ h j j E i j ii ¼ h j j E i  E j j ϕi i þ E i h jjii
h i
ð0Þ ð0Þ ð1Þ ð1Þ
¼ E i  E j h j j ϕi i þ E i δji ¼ h jjVjii: ð5:12Þ

Putting j ¼ i on (5.12), we have

ð1Þ
Ei ¼ hijVjii: ð5:13Þ

This represents the first-order correction term with the energy eigenvalue. Mean-
while, assuming j 6¼ i on (5.12), we have

ð1Þ h jjVjii
h j j ϕi i ¼ h i ð j 6¼ iÞ, ð5:14Þ
ð0Þ ð0Þ
Ei  Ej

ð 0Þ ð0Þ
where we used Ei 6¼ E j on the assumption that the quantum state jii that belongs
ð0Þ
to eigenenergy E i is nondegenerate. Here, we emphasize that the eigenstates that
ð0Þ
correspond to Ej may or may not be degenerate. In other words, the quantum state
that we are making an issue of with respect to the perturbation is nondegenerate. In
this context, we do not question whether or not other quantum states [represented by
jji in (5.14)] are degenerate.
5.1 Perturbation Method 155

Here we postulate that the eigenvectors, i.e., ji i (i ¼ 0, 1, 2,   ) of H0 form a


complete orthonormal system (CONS) such that
X
P k
jkihkj ¼ E, ð5:15Þ

where jk ih kj is said to be a projection operator and E is an identity operator. As


remarked just above, (5.15) holds regardless of whether the eigenstates are degen-
erate. The word of complete system implies that any vector (or function) can be
expanded by a linear combination of the basis vectors of the said system. The formal
definition of the projection operator can be seen in Chaps. 14 and 18.
Thus, we have

ð1Þ ð1Þ
X ð1Þ
X ð1Þ
j ϕi i ¼ E j ϕi i ¼ k
jkihk jϕi i ¼ k6¼i
jkihkjϕi i
X hkjVjii
¼ k6¼i
j ki h i, ð5:16Þ
ð0Þ ð0Þ
Ei  Ek

where with the third equality we used (5.8) and with the last equality we used (5.14).
Operating hij from the left on (5.16) and using (5.4), we recover (5.8). Hence, using
(5.5) the approximated quantum state to the first order of λ is described by

ð1Þ
X hkjVjii
j Ψi i j ii þ λ j ϕi i ¼j ii þ λ k6¼i
j ki h i: ð5:17Þ
ð0Þ ð0Þ
Ei  Ek

For a while, let us think of a case where we deal with the change in the
eigenenergy and corresponding eigenstate with respect to the degenerate quantum
state. In that case, regarding the state jii in question on (5.12) we have ∃j ji ( j 6¼ i)
ð 0Þ ð0Þ
that satisfies Ei ¼ E j . Therefore, we would have h jj Vj ii ¼ 0 from (5.12).
Generally, it is not the case, however, and so we need a relation different from
(5.12) to deal with the degenerate case. However, we do not get into details about
this issue in this book.
ð2Þ
Next let us seek the second-order correction term E i of the eigenenergy.
Operating h jj on both sides of (5.11) from the left, we get
h i
ð0Þ ð0Þ ð2Þ ð1Þ ð1Þ ð2Þ ð1Þ
Ei  Ej h j j ϕi i þ E i h j j ϕi i þ E i δji ¼ h j j V j ϕi i:

Putting j ¼ i in the above relation as before, we have


156 5 Approximation Methods of Quantum Mechanics

ð2Þ ð1Þ
Ei ¼ hi j V j ϕi i: ð5:18Þ

Notice that we used (5.8) to derive (5.18). Using (5.16) furthermore, we get

ð2Þ
X hkjVjii X hkjVjii
Ei ¼ k6¼i
hi j V j ki h i¼
k6¼i
hk j V { j ii h i
ð0Þ ð0Þ ð0Þ ð0Þ
Ei  Ek Ei  Ek
X hkjVjii X 1
¼ k6¼i
hk j V j ii h i¼
k6¼i
h i jhkjVjiij2 , ð5:19Þ
ð0Þ ð0Þ ð0Þ ð 0Þ
Ei  Ek Ei  Ek

where with the second equality we used (1.116) in combination with (1.118) and
with the third equality we used the fact that V is an Hermitian operator; i.e., V{ ¼ V.
The state jki is sometimes called an intermediate state.
Considering the expression of (5.16), we find that the approximated state of
ð1Þ
j ii þ λ j ϕi i in (5.5) is not an eigenstate of energy, because the approximated state
contains a linear combination of different eigenstates jki that have eigenenergies
ð0Þ
different from Ei of jii. Substituting (5.13) and (5.19) for (5.6), with the energy
correction terms up to the second order we have

ð0Þ ð1Þ ð2Þ


E i  Ei þ λE i þ λ2 E i
ð0Þ
X 1
¼ Ei þ λhijVjii þ λ2 h i jhkjVjiij2 : ð5:20Þ
k6¼i ð0Þ ð0Þ
Ei  Ek

If we think of the ground statej0i forjii, we get

ð0Þ ð1Þ ð2Þ


E 0  E 0 þ λE 0 þ λ2 E 0
ð0Þ
X 1
¼ E0 þ λh0jVj0i þ λ2 h i jhkjVj0ij2 : ð5:21Þ
k6¼0 ð0Þ ð0Þ
E0  Ek

ð0Þ ð0Þ
Since E 0 < Ek ðk ¼ 1, 2,   Þ for anyjki, the second-order correction term is
always negative. This contributes to the energy stabilization.

5.1.2 Several Examples

To deepen the understanding of the perturbation method, we show several examples.


We focus on the change in the eigenenergy and corresponding eigenstate with
respect to specific quantum states. We assume that the change is caused by the
applied electric field as a perturbation.
5.1 Perturbation Method 157

Example 5.1 Suppose that the charged particle is confined within a one-dimen-
sional potential well. This problem has already appeared in Example 1.2. Now let us
consider how an energy of a particle carrying a charge e is shifted by the applied
electric field. This situation could experimentally be achieved using a “nano capac-
itor” where, e.g., an electron is confined within the capacitor, while applying a
voltage between the two electrodes.
We adopt the coordinate system the same as that of Example 1.2. That is, suppose
that we apply an electric field F between the electrodes that are positioned at x ¼  L
and that the electron is confined between the two electrodes. Then, the Hamiltonian
of the system is described as

h2 d 2
H¼  eFx, ð5:22Þ
2m dx2

where m is a mass of the particle. Note that we may choose eF for the parameter λ
of Sect. 5.1.1. Then, the coordinate representation of the Schrödinger equation is
given by

h2 d 2 ψ i ð x Þ
  eFxψ i ðxÞ ¼ E i ψ i ðxÞ, ð5:23Þ
2m dx2

where Ei is the energy of the i-th state from the bottom (i.e., the ground state); ψ i is its
corresponding eigenstate. We wish to seek the perturbation energy up to the second
order and the quantum state up to the first order. We are particularly interested in the
change in the ground state j0i. Notice that the ground state is obtained in (1.101) by
putting l ¼ 0. According to (5.21) we have

ð0Þ
X 1
E 0  E0 þ λh0jVj0i þ λ2 h i jhkjVj0ij2 , ð5:24Þ
k6¼0 ð0Þ ð0Þ
E0  Ek

where V ¼  eFx. Since the field F is thought to be an adjustable parameter, in


(5.24) we may replace λ with eF (vide supra). Then, in (5.24) V is replaced by x in
turn. In this way, (5.24) can be rewritten as

ð0Þ
X 1
E 0  E0  eF h0jxj0i þ ðeF Þ2 h i jhkjxj0ij2 : ð5:25Þ
k6¼0 ð0Þ ð0Þ
E0  Ek

Considering that j0i is an even function [i.e., a cosine function in (1.101)] with
respect to x and that x is an odd function, we find that the second term vanishes.
Notice that the explicit coordinate representation of j0i is
158 5 Approximation Methods of Quantum Mechanics

rffiffiffi
1 π
j 0i ¼ cos x, ð5:26Þ
L 2L
qffiffi  
π
which is obtained by putting l ¼ 0 in j lcos i  1
L cos 2L þ lπL x ðl ¼ 0, 1, 2,   Þ in
(1.101).
By the same token, hkj xj 0i in the third term of RHS of (5.25) vanishes if jki
denotes a cosine function in (1.101). If, on the other hand, jki denotes a sine
function, hkj xj 0i does not vanish. To distinguish the sine functions from cosine
functions, we denote
rffiffiffi
1 nπ
j nsin i  sin x ðn ¼ 1, 2, 3,   Þ ð5:27Þ
L L

that is identical with (1.102); see Fig. 5.1 with the notation of the quantum states.
h ð0Þ π 2 2
h ð0Þ
ð2nÞ π 2 2 2
Now, putting E0 ¼ 2m  4L 2 and E 2n1 ¼ 2m 
4L2
ðn ¼ 1, 2, 3,   Þ in (5.25),
we rewrite it as

ð0Þ
X1 1
E0  E 0  eF h0jxj0i þ ðeF Þ2 h i jhkjxj0ij2
k¼1 ð0Þ ð0Þ
E0  Ek

Fig. 5.1 Notation of the


quantum states that
( ) ℏ
distinguish cosine and sine
functions.
= |2 ⟩ ≡ |4⟩
ð0Þ
E k ðk ¼ 0, 1, 2,   Þ
represents energy
eigenvalues of the
unperturbed states

( ) ℏ
= |2 ⟩ ≡ |3⟩

( ) ℏ
= |1 ⟩ ≡ |2⟩

( ) ℏ
= |1 ⟩ ≡ |1⟩
( ) ℏ
= |0 ⟩ ≡ | 0⟩
5.1 Perturbation Method 159

ð0Þ
X1 1
¼ E 0  eF h0jxj0i þ ðeF Þ2 h i jhnsin jxj0ij2 , ð5:25Þ
n¼1 ð0Þ ð0Þ
E 0  E 2n1

where with the second equality n denotes the number appearing in (5.27) and jnsini
ð0Þ
belongs to E2n1. Referring to, e.g., (4.17) with regard to the calculation of hkj xj 0i,
we have arrived at the following equation described by

h2 π 2 322  8  mL4 X1 n2
E0   2  ðeF Þ2 : ð5:28Þ
2m 4L π h
6 2 n¼1
ð4n  1Þ5
2

The series of the second term in (5.28) rapidly converges. Defining the first
N partial sum of the series as

XN n2
Sð N Þ  ,
n¼1
ð4n2  1Þ5

we obtain

Sð1Þ ¼ 1=243  0:004115226 and Sð6Þ  Sð1000Þ  0:004120685

as well as

½Sð1000Þ  Sð1Þ
¼ 0:001324653:
Sð1000Þ

That is, only the first term of S(N ) occupies ~99.9% of the infinite series. Thus,
ð2Þ
the stabilization energy λ2 E0 due to the perturbation is satisfactorily given by

ð2Þ 322  8  mL4


λ2 E 0  ðeF Þ2 , ð5:29Þ
243π 6 h2
ð2Þ
where the notation λ2 E 0 is due to (5.6). Meanwhile, the corrected term of the
quantum state is given by

ð1Þ
X hkjxj0i
j Ψ0 i j 0i þ λ j ϕ0 i j 0i  eF k6¼0
j ki h i
ð0Þ ð0Þ
E0  Ek

32  8  mL3 X1 ð1Þn1 n
¼j 0i þ eF j n sin i : ð5:30Þ
π 4 h2 n¼1
ð4n2  1Þ3

For a reason similar to the above, the approximation is satisfactory, if we adopt


only n ¼ 1 in the second term of RHS of (5.30). That is, we obtain
160 5 Approximation Methods of Quantum Mechanics

32  8  mL3
j Ψ0 i j 0i þ eF j 1sin i: ð5:31Þ
27π 4 h2

Thus, we have roughly estimated the stabilization energy and the corresponding
quantum state as in (5.29) and (5.31), respectively. It may well be worth estimating
rough numbers of L and F in the actual situation (or perhaps in an actual “nano
device”). The estimation is left for readers.
Example 5.2 [1] In Chap. 2 we dealt with a quantum-mechanical harmonic oscil-
lator. Here let us suppose that a charged particle (with its charge e) is performing
harmonic oscillation under an applied electric field. First we consider the coordinate
representation of Schrödinger equation. Without the external field, the Schrödinger
equation as an eigenvalue equation reads as

h2 d 2 uð qÞ 1
 þ mω2 q2 uðqÞ ¼ EuðqÞ: ð2:108Þ
2m dq2 2

Under the applied electric field, the perturbation is expressed as eFq and, hence,
the equation can be written as

h2 d 2 uð qÞ 1
 þ mω2 q2 uðqÞ  eF ðqÞquðqÞ ¼ EuðqÞ: ð5:32Þ
2m dq2 2

For simplicity we assume that the electric field is uniform independent of q so that
we can deal with it as a constant. Then, (5.32) can be rewritten as

    2 
h2 d2 uðqÞ 1 eF 2 1 2 eF
 þ mω q 
2
uðqÞ ¼ E þ mω uðqÞ: ð5:33Þ
2m dq2 2 mω2 2 mω2

Changing the variable such that

eF
Qq , ð5:34Þ
mω2

we have
  2 
h2 d 2 uð qÞ 1 1 2 eF
 þ mω Q uðqÞ ¼ E þ mω
2 2
uðqÞ: ð5:35Þ
2m dQ2 2 2 mω2

Taking account of the change in functional form that results from the variable
transformation, we rewrite (5.35) as
5.1 Perturbation Method 161

  2 
h2 d 2 e
uð Q Þ 1 1 2 eF
 þ mω Q e
2 2
uðQÞ ¼ E þ mω e
uðQÞ, ð5:36Þ
2m dQ2 2 2 mω2

where
 
eF
uð qÞ ¼ u Q þ e
uðQÞ: ð5:37Þ
mω2

e as
Defining E
 2
e  E þ 1 mω2 eF e2 F 2
E ¼ E þ , ð5:38Þ
2 mω2 2mω2

we get

h2 d 2 e
uðQÞ 1 ee
 þ mω2 Q2e
uð Q Þ ¼ E uðQÞ: ð5:39Þ
2m dQ2 2

Thus, we recover exactly the same form as (2.108). Equation (5.39) can be solved
analytically as already shown in Sect. 2.4. From (2.110) and (2.120), we have

e
2E
λ and λ ¼ 2n þ 1:

Then, we have

e ¼ hω λ ¼ hωð2n þ 1Þ :
E ð5:40Þ
2 2

From (5.38), we get


 
e e2 F 2 1 e2 F 2
E¼E 2
¼ hω n þ  : ð5:41Þ
2mω 2 2mω2
2 2
This implies that the energy stabilization due to the applied electric field is  2mω
e F
2.

Note that the system is always stabilized regardless of the sign of e.


Next, we wish to consider the problem on the basis of the matrix (or operator)
representation. Let us evaluate the perturbation energy using (5.20). Conforming the
notation of (5.20) to the present case, we have
X 1
E n  Eðn0Þ  eF hnjqjni þ ðeF Þ2 k6¼n
h i jhkjqjnij2 , ð5:42Þ
ð0Þ
Eðn0Þ  Ek

where Eðn0Þ is given by


162 5 Approximation Methods of Quantum Mechanics

 
1
Eðn0Þ ¼ hω n þ : ð5:43Þ
2

Taking account of (2.55), (2.68), and (2.62), the second term of (5.42) vanishes.
With the third term, using (2.68) as well as (2.55) and (2.61) we have
rffiffiffiffiffiffiffiffiffiffi rffiffiffiffiffiffiffiffiffiffi
h { h 
jhkjqjnij ¼ 2
kja þ a jn kja þ a{ jn
2mω 2mω
h pffiffiffi pffiffiffiffiffiffiffiffiffiffiffi 2
¼ nhkjn  1i þ n þ 1hkjn þ 1i
2mω
h pffiffiffi pffiffiffiffiffiffiffiffiffiffiffi 2
¼ nδk,n1 þ n þ 1δk,nþ1
2mω
h pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffii
h
¼ nδk,n1 þ ðn þ 1Þδk,nþ1 þ 2δk,n1 δk,nþ1 nðn þ 1Þ
2mω
h
¼ ½nδ þ ðn þ 1Þδk,nþ1 : ð5:44Þ
2mω k,n1

Notice that with the last equality of (5.44) there is no k that satisfies
k ¼ n  1 ¼ n + 1 at once. Also note that hkj qj ni is a real number. Hence, as the
third term of (5.42) we get
X 1
h i jhkjqjnij2
k6¼n ð0Þ
Eðn0Þ  Ek
X 1 h
¼  ½nδ þ ðn þ 1Þδk,nþ1 
hωðn  kÞ 2mω k,n1
k6¼n
 
1 n nþ1 1
¼ þ ¼ : ð5:45Þ
2mω n  ðn  1Þ n  ðn þ 1Þ
2 2mω2

Thus, from (5.42) we obtain


 
1 e2 F 2
En  hω n þ  : ð5:46Þ
2 2mω2

Consequently, we find that (5.46) obtained by the perturbation method is consis-


tent with (5.41) that was obtained as the exact solution. As already pointed out in
Sect. 5.1.1, however, (5.46) does not represent an eigenenergy. In fact, using (5.17)
together with (4.26) and (4.28), we have
5.1 Perturbation Method 163

ð1Þ
X hkjVj0i h1jVj0i
j Ψ0 i j 0i þ λ j ϕ0 i ¼j 0i þ λ k6¼0
j ki h i ¼j 0i þ λ j 1i h i
ð0Þ ð0Þ ð0Þ ð0Þ
E0  Ek E0  E1
rffiffiffiffiffiffiffiffiffiffiffiffiffiffi
h1jxj0i e2 F 2
¼ j0i  eF j1i h i ¼j 0i þ j 1i: ð5:47Þ
ð0Þ
E E
ð0Þ 2mω3 h
0 1

Thus, jΨ0i is not an eigenstate because jΨ0i contains both j0i and j1i. Hence,
jΨ0i does not possess an eigenenergy. Notice also that in (5.47) the factors hkj xj 0i
vanish except for h1j xj 0i; see Chaps. 2 and 4.
To think of this point further, let us come back to the coordinate representation of
Schrödinger equation.
From (5.39) and (2.106), we have

 1=4 rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi rffiffiffiffiffiffiffi 


mω 1 mω
Q e 2h Q ðn ¼ 0, 1, 2,   Þ,
mω 2
e
un ðQÞ ¼ Hn
h π 1=2 2n n! h
pffiffiffiffiffi 

where H n h Q is the Hermite polynomial of the n-th order. Using (5.34) and
(5.37), we replace Q with q  mω
eF
2 to obtain the following form described by

 
eF
un ð qÞ ¼ eun q  2

 14 rffiffiffiffiffiffiffiffiffiffiffiffiffi rffiffiffiffiffiffiffi  mω eF 2
mω 1 mω eF
¼ Hn q e 2h qmω2 ðn ¼ 0, 1, 2,   Þ: ð5:48Þ
h π 2 2n n!
1
h mω2

Since ψ n(q) of (2.106) forms the (CONS) [2], un(q) should be expanded using
ψ n(q). That is, as a solution of (5.32) we get
X
un ð qÞ ¼ c ψ ðqÞ,
k nk k
ð5:49Þ

where a set of cnk are appropriate coefficients. Since the functions ψ k(q) are
nondegenerate, un(q) expressed by (5.49) lacks a definite eigenenergy.
Example 5.3 [1] In the previous examples we studied the perturbation in
one-dimensional systems, where all the energy eigenstates are nondegenerate.
Here we deal with the change in the energy and quantum states of a hydrogen
atom (i.e., a three-dimensional system). Since the energy eigenstates are generally
degenerate (see Chap. 3), for simplicity we consider only the ground state that is
nondegenerate. As in the previous two cases, we assume that the perturbation is
caused by the applied electric field. As the simplest example, we deal with the
properties including the polarizability of the hydrogen atom in its ground state. The
polarizability of atom, molecule, etc. is one of the most fundamental properties in
materials science.
164 5 Approximation Methods of Quantum Mechanics

We assume that the electric field is applied along the direction of the z-axis; in this
case the perturbation term V is expressed as

V ¼ eFz, ð5:50Þ

where e is a charge of an electron (e < 0). Then, the total Hamiltonian H is described
by

e ¼ H 0 þ V,
H ð5:51Þ
"   #
2
h2 ∂ 2 ∂ 1 ∂ ∂ 1 ∂
H0 ¼  r þ sin θ þ
2μr 2 ∂r ∂r sin θ ∂θ ∂θ sin 2 θ ∂ϕ2
e2
 , ð5:52Þ
4πε0 r

where H0 is identical with the Hamiltonian H of (3.35), in which Z ¼ 1. In this


example, we discuss topics on the polarization and energy shift (Stark effect) caused
by the applied electric field [1, 3].
(1) Polarizability of a hydrogen atom in the ground state
As in (5.5) and (5.16), the first-order perturbed state is described by

X hkjVj0i X hkjzj0i
j Ψ0 i j 0i þ k6¼0
j ki h i ¼j 0i  eF
k6¼0
j ki h i, ð5:53Þ
ð0Þ ð0Þ ð0Þ ð0Þ
E0  Ek E0  Ek

where j0i denotes the ground state expressed as (3.301). Since the quantum states jki
represent all the quantum states of the hydrogen atom as implied in (5.3), in the
second term of RHS in (5.53) we are considering all those states except for j0i. The
ð 0Þ h2
eigenenergy E0 is identical with E 1 ¼  2μa 2 obtained from (3.258), where n ¼ 1.

As already noted, we simply numbered the states jki in order of increasing energies.
ð0Þ ð0Þ ð0Þ ð0Þ
In the case of k 6¼ j (k, j 6¼ 0), we may have either E k ¼ Ej or E k 6¼ Ej
accordingly.
Using (5.51), let us calculate the polarizability of a hydrogen atom in a ground
state. In Sect. 4.1 we gave a definition of the electric dipole moment such that
X
Pe x,
j j
ð4:6Þ

where xj is a position vector of the j-th charged particle. Placing the proton of
hydrogen at the origin, by use of the notation (3.5) P is expressed as
5.1 Perturbation Method 165

1 0 0 1
Px ex
B C B C
P ¼ ðe1 e2 e3 Þ@ Py A ¼ ex ¼ ðe1 e2 e3 Þ@ ey A:
Pz ez

Hence, the z-component of the electric dipole moment Pz of electron is given by

Pz ¼ ez: ð5:54Þ

The quantum-mechanical analog of (5.54) is given by

hPz i ¼ ehΨ0 jzjΨ0 i, ð5:55Þ

where jΨ0i is taken from (5.53). Substituting (5.53) for (5.55), we get

hΨ0 jzjΨ0 i
* +
X hkjzj0i X hkjzj0i
¼ h0 j eF k6¼0
hk j h ijzjj 0i  eF
k6¼0
j ki h i
ð0Þ ð0Þ ð0Þ ð0Þ
E0  Ek E0  Ek
X hkjzj0i X hkjzj0i
¼ h0jzj0i  eF k6¼0
h0jzjk i h i  eF
k6¼0
0jz{ jk h i
ð0Þ ð0Þ ð0Þ ð0Þ
E0  Ek E0  Ek

X jhkjzj0ij2
þðeF Þ2 k6¼0
hkjzjki h i2 : ð5:56Þ
ð0Þ ð0Þ
E0  Ek

In (5.56) readers are referred to the computation rule of inner product such as
(13.20) and (13.64). Since z is Hermitian (i.e., z{ ¼ z), the second and third terms of
RHS of (5.56) equal. Neglecting the second-order perturbation factor on the fourth
term, we have

X jhkjzj0ij2
hΨ0 jzjΨ0 i  h0jzj0i  2eF h i: ð5:57Þ
k6¼0 ð0Þ ð0Þ
E0  Ek

Thus, from (5.55) we obtain

X jhkjzj0ij2
hPz i  eh0jzj0i  2e2 F h i:
k6¼0 ð0Þ ð0Þ
E0  Ek

Using (3.301), the coordinate representation of h0j zj 0i is described by


166 5 Approximation Methods of Quantum Mechanics

Z 1
a3
h0jzj0i ¼ ze2r=a dxdydz
π 1
Z 1 Z Z π
a3 3 2r=a

¼ r e dr dϕ cos θ sin θdθ
π 0 0 0
Z 1 Z Z π
a3 3 2r=a

¼ r e dr dϕ sin 2θdθ ¼ 0, ð5:58Þ
2π 0 0 0

where we used 0 sin 2θdθ ¼ 0. Then, we have

X jhkjzj0ij2
hPz i  2e2 F h i: ð5:59Þ
k6¼0 ð0Þ ð0Þ
E0  Ek

On the basis of classical electromagnetism, we define the polarizability α as [4]

α  hPz i=F: ð5:60Þ

Here we have an important relationship between the polarizability and electric


dipole moment. That is,

ðpolarizabilityÞ ðelectric fieldÞ ¼ ðelectric dipole momentÞ:

From (5.59) and (5.60) we have

X jhkjzj0ij2
α ¼ 2e2 h i: ð5:61Þ
k6¼0 ð0Þ ð 0Þ
E0  Ek

At the first glance, to evaluate (5.61) appears to be formidable, but making the
most of the fact that the total wave functions of a hydrogen atom form the CONS, the
evaluation is straightforward. The discussion is as follows:
First, let us seek an operator G that satisfies [1]

z j 0i ¼ ðGH 0  H 0 GÞ j 0i: ð5:62Þ

To this end, we take account of all the quantum states jki of a hydrogen atom
(including j0i) that are solutions (or eigenstates of energy) of (3.36). In Sect 3.8 we
have obtained the total wave functions described by

eðnÞ ¼ Y m ðθ, ϕÞR


Λ eðl nÞ ðr Þ, ð3:300Þ
l,m l
5.1 Perturbation Method 167

ðnÞ
where Λe is expressed as a product of spherical surface harmonics and radial wave
l,m
functions. The latter functions are described by associated Laguerre functions. It is
well known [2] that the spherical surface harmonics constitute the CONS on a unit
sphere and that associated Laguerre functions form the CONS in the real one-
dimensional space. Thus, their product expressed as (3.300), i.e., aforementioned
collection of jki constitutes the CONS in the real three-dimensional space. This is
equivalent to
X
j
j jih j j¼ E, ð5:63Þ

where E is an identity operator.


The implications of (5.63) are as follows: Take any function jf(r)i and operate
(5.63) from the left. Then, we get
X X
Ef ðrÞ ¼ f ðrÞ ¼ k
j kihkj f ðrÞi ¼ k
f k j ki: ð5:64Þ

In other words, (5.64) implies that any function f(r) can be expanded into a series
of jki. The coefficients fk are defined as

f k  hkj f ðrÞi: ð5:65Þ

Those are so-called “Fourier coefficients.” Related discussion is given in Sect.


10.4 as well as Chaps. 14, 18, and 20.
We further impose the conditions on the operator G defined in (5.62).
(i) G commutes with r (¼j rj ). (ii) G has a functional form described by G ¼ G
(r, θ). (iii) G does not contain a differential operator. On those conditions, we have
∂G/∂ϕ ¼ 0. From (3.301), we have ∂ j 0 i /∂ϕ ¼ 0. Thus, we get

ðGH 0  H 0 GÞ j 0i
    
h2 1 ∂ 2∂ 1 ∂ 2∂ 1 1 ∂ ∂
¼ G 2 r  2 r G 2 sinθ G j 0i: ð5:66Þ
2μ r ∂r ∂r r ∂r ∂r r sinθ ∂θ ∂θ

This calculation is somewhat complicated, and so we calculate (5.66) termwise.


With the first term of (5.66), we have
168 5 Approximation Methods of Quantum Mechanics

  
1 ∂ 2 ∂ 1 ∂ 2 ∂G 1 ∂ 2 ∂ j 0i
 2 r G j 0i ¼  2 r j 0i  2 r G
r ∂r ∂r r ∂r ∂r r ∂r ∂r
2 
1 ∂G ∂ G ∂G ∂ j 0i G ∂ ∂ j 0i ∂G ∂ j 0i
¼  2  2r j 0i  2 j 0i   2 r2 
r ∂r ∂r ∂r ∂r r ∂r ∂r ∂r ∂r
2 
2 ∂G ∂ G ∂G ∂ j 0i G ∂ ∂ j 0i
¼ j 0i  2 j 0i  2  2 r2
r ∂r ∂r ∂r ∂r r ∂r ∂r
2 
2 ∂G ∂ G 2 ∂G G ∂ 2 ∂ j 0i
¼ j 0i  2 j 0i þ j 0i  2 r , ð5:67Þ
r ∂r ∂r a ∂r r ∂r ∂r

where with the last equality we used (3.301), namely j0 i / er/a, where a is Bohr
radius (Sect. 3.7.1). The last term of (5.67) cancels the first term of (5.66). Then, we
get

ðGH 0  H 0 GÞ j 0i
   2  
h2 1 1 ∂G ∂ G 1 1 ∂ ∂G
¼ 2  j0i þ 2 j0i þ 2 sin θ j0i
2μ r a ∂r ∂r r sin θ ∂θ ∂θ
   2 
h2 1 1 ∂G ∂ G 1 1 ∂ ∂G
¼ 2  þ 2 þ 2 sin θ j 0i: ð5:68Þ
2μ r a ∂r ∂r r sin θ ∂θ ∂θ

From (5.62) we obtain

hr j z j 0i ¼ hr j ðGH 0  H 0 GÞ j 0i ð5:69Þ

so that we can have the coordinate representation. As a result, we get


   2 
h2 1 1 ∂G ∂ G 1 1 ∂ ∂G
r cos θϕð1sÞ ¼ 2  þ 2þ 2 sin θ ϕð1sÞ, ð5:70Þ
2μ r a ∂r ∂r r sin θ ∂θ ∂θ

where ϕ(1s) is given by (3.301). Dividing (5.70) by ϕ(1s) and rearranging it, we
have
5.1 Perturbation Method 169

2   
∂ G 1 1 ∂G 1 1 ∂ ∂G 2μ
þ 2  þ sin θ ¼ r cos θ: ð5:71Þ
∂r 2 r a ∂r r 2 sin θ ∂θ ∂θ h2

Now, we assume that G has a functional form of

Gðr, θÞ ¼ gðr Þ cos θ: ð5:72Þ

Inserting (5.72) into (5.71), we have

 
d2 g 1 1 dg 2g 2μ
þ2   ¼ 2 r: ð5:73Þ
dr 2 r a dr r 2 h

Equation (5.73) has a regular singular point at the origin and resembles an Euler
equation whose general from of a homogeneous equation is described by [5]

d2 g a dg b
þ þ g ¼ 0, ð5:74Þ
dr 2 r dr r 2

where a and b are arbitrary constants. In particular, if we have a following differen-


tial equation described by

d2 g a dg a
þ  g ¼ 0, ð5:75Þ
dr 2 r dr r 2

we immediately see that one of particular solutions is g ¼ r. However, (5.73) differs


from the general Euler equation by the presence of  2a dg
dr. Then, let us assume that a
particular solution g has a form of

g ¼ pr 2 þ qr: ð5:76Þ

Inserting (5.76) into (5.73), we get

4pr 2q 2μ
4p   ¼ 2 r: ð5:77Þ
a a h

Comparing coefficient of r and constant term, we obtain

aμ a2 μ
p¼ and q ¼  2 : ð5:78Þ
2h 2
h

Hence, we get
170 5 Approximation Methods of Quantum Mechanics

 
aμ r
gð r Þ ¼  þ a r: ð5:79Þ
h2 2

Accordingly, we have
   
aμ r aμ r
Gðr, θÞ ¼ gðr Þ cos θ ¼  þ a r cos θ ¼  þ a z: ð5:80Þ
h2 2 h2 2

This is a coordinate representation of G(r, θ).


Returning back to (5.62) and operating h jj from the left on its both sides, we have

h jjzj0i ¼ h jjðGH 0  H 0 GÞj0i ¼ h jjGH 0 j0i  h jjH 0 Gj0i


h i
ð0Þ ð0Þ
¼ E 0  E j h jjGj0i: ð5:81Þ

Notice here that

ð0Þ
H 0 j 0i ¼ E 0 j 0i ð5:82Þ

and
  h i{
ð0Þ ð0Þ
h j j H 0 ¼ H 0 { j ji{ ¼ H 0 j ji{ ¼ Ej j ji ¼ E j h j j , ð5:83Þ

where we used a computation rule of (1.118) and with the second equality we used
the fact that H0 is Hermitian, i.e., H0{ ¼ H0. Rewriting (5.81), we have

h jjzj0i
h jjGj0i ¼ ð0Þ ð0Þ
: ð5:84Þ
E0  Ej

Multiplying by ⟨0|z|j⟩ on both sides of (5.84) and summing over j (≠0), we obtain

$$\sum_{j\neq 0}\langle 0|z|j\rangle\langle j|G|0\rangle = \sum_{j\neq 0}\frac{\langle 0|z|j\rangle\langle j|z|0\rangle}{E_0^{(0)} - E_j^{(0)}}. \tag{5.85}$$

Adding ⟨0|z|0⟩⟨0|G|0⟩ on both sides of (5.85), we have

$$\sum_{j}\langle 0|z|j\rangle\langle j|G|0\rangle = \sum_{j\neq 0}\frac{\langle 0|z|j\rangle\langle j|z|0\rangle}{E_0^{(0)} - E_j^{(0)}} + \langle 0|z|0\rangle\langle 0|G|0\rangle. \tag{5.86}$$

But, from (5.58) ⟨0|z|0⟩ = 0. Moreover, using the completeness of |j⟩ described
by (5.63) for the LHS of (5.86), we get

$$\langle 0|zG|0\rangle = \sum_{j}\langle 0|z|j\rangle\langle j|G|0\rangle = \sum_{j\neq 0}\frac{\langle 0|z|j\rangle\langle j|z|0\rangle}{E_0^{(0)} - E_j^{(0)}} = \sum_{j\neq 0}\frac{|\langle j|z|0\rangle|^2}{E_0^{(0)} - E_j^{(0)}} = -\frac{\alpha}{2e^2}, \tag{5.87}$$

where with the second equality we used (5.84) and with the last equality we used
(5.61). Also, we used the fact that z is Hermitian.
Meanwhile, using (5.80), the LHS of (5.87) is expressed as

$$\langle 0|zG|0\rangle = -\frac{a\mu}{\hbar^2}\left\langle 0\left|\,z\left(\frac{r}{2} + a\right)z\,\right|0\right\rangle = -\frac{a\mu}{2\hbar^2}\langle 0|zrz|0\rangle - \frac{a^2\mu}{\hbar^2}\langle 0|z^2|0\rangle. \tag{5.88}$$

Noting that |0⟩ is spherically symmetric and that z and r are commutative, (5.88)
can be rewritten as

$$\langle 0|zG|0\rangle = -\frac{a\mu}{6\hbar^2}\langle 0|r^3|0\rangle - \frac{a^2\mu}{3\hbar^2}\langle 0|r^2|0\rangle, \tag{5.89}$$

where we used ⟨0|x²|0⟩ = ⟨0|y²|0⟩ = ⟨0|z²|0⟩ = ⟨0|r²|0⟩/3.


Returning back to the coordinate representation and taking account of the variables
change from the Cartesian coordinates to the polar coordinates as in (5.58), we get

$$\langle 0|zG|0\rangle = -\frac{a\mu}{6\hbar^2}\cdot\frac{4\pi}{\pi a^3}\cdot\frac{120\,a^6}{64} - \frac{a^2\mu}{3\hbar^2}\cdot\frac{4\pi}{\pi a^3}\cdot\frac{3a^5}{4} = -\frac{a\mu}{\hbar^2}\cdot\frac{5a^3}{4} - \frac{a\mu}{\hbar^2}\cdot\frac{4a^3}{4} = -\frac{a\mu}{\hbar^2}\cdot\frac{9a^3}{4}. \tag{5.90}$$

To perform the definite integral calculations of (5.89) and obtain the result of
(5.90), modify (3.263) and use the formula described below. That is, differentiate

$$\int_0^{\infty} e^{-r\xi}\,dr = \frac{1}{\xi}$$

four (or five) times with respect to ξ and replace ξ with 2/a to get the result of (5.90)
appropriately.
Using (5.87) and (5.90), as the polarizability α we finally get

$$\alpha = -2e^2\langle 0|zG|0\rangle = 2e^2\cdot\frac{a\mu}{\hbar^2}\cdot\frac{9a^3}{4} = \frac{e^2 a\mu}{\hbar^2}\cdot\frac{9a^3}{2} = 4\pi\varepsilon_0\cdot\frac{9a^3}{2} = 18\pi\varepsilon_0 a^3, \tag{5.91}$$

where we used

$$a = 4\pi\varepsilon_0\hbar^2/\mu e^2. \tag{5.92}$$

See Sect. 3.7.1 for this relation.
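The radial integrals behind (5.89)–(5.91) are also easy to verify symbolically. The
following sketch (ours, not from the text; sympy assumed) reproduces ⟨r³⟩ = (15/2)a³,
⟨r²⟩ = 3a², and hence ⟨0|zG|0⟩ = −(aμ/ħ²)(9a³/4):

    # Verify the 1s expectation values used in (5.89)-(5.90).
    import sympy as sp

    r, a, mu, hbar = sp.symbols('r a mu hbar', positive=True)
    phi = sp.exp(-r/a)/sp.sqrt(sp.pi*a**3)          # phi(1s), cf. (3.301)

    def expect(n):
        # <r^n> = integral of |phi|^2 r^n (4 pi r^2) dr over 0..infinity
        return sp.integrate(phi**2 * r**n * 4*sp.pi*r**2, (r, 0, sp.oo))

    print(expect(3), expect(2))                     # 15*a**3/2, 3*a**2
    zG = -(a*mu/(6*hbar**2))*expect(3) - (a**2*mu/(3*hbar**2))*expect(2)
    print(sp.simplify(zG))                          # -9*a**4*mu/(4*hbar**2), cf. (5.90)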


(2) Stark effect of a hydrogen atom in a ground state
The energy shift of the ground state of a hydrogen atom up to the second order
is given by

$$E_0 \approx E_0^{(0)} - eF\langle 0|z|0\rangle + (eF)^2\sum_{j\neq 0}\frac{|\langle j|z|0\rangle|^2}{E_0^{(0)} - E_j^{(0)}}. \tag{5.93}$$

Using (5.58) and (5.87) obtained above, we readily get

$$E_0 \approx E_0^{(0)} - \frac{\alpha e^2 F^2}{2e^2} = E_0^{(0)} - \frac{\alpha F^2}{2} = E_0^{(0)} - 9\pi\varepsilon_0 a^3 F^2. \tag{5.94}$$

Experimentally, the energy shift of −9πε₀a³F² is well known as the Stark shift.
In the above examples, we have seen how energy levels and related properties are
changed by the applied electric field. The energies of the system that result from the
applied external field should not be regarded as energy eigenvalues, but rather as
expectation values.
The perturbation theory has wide application in physical problems. Examples
include the evaluation of transition probabilities between quantum states. The theory
also deals with the scattering of particles. For the treatment of the degenerate case and
related topics, interested readers are referred to appropriate literature [1, 3].

5.2 Variational Method

Another approximation method is the variational method. This method also has a wide
range of applications in mathematical physics.
A major application of the variational method lies in seeking a suitably approximated
eigenvalue. Suppose that we have an Hermitian differential opera-
tor D that satisfies an eigenvalue equation

$$Dy_n = \lambda_n y_n \quad (n = 1, 2, 3, \cdots), \tag{5.95}$$

where λₙ is a real eigenvalue and yₙ is a corresponding eigenfunction. We have
already encountered one of the typical differential operators and eigenvalue equations in
(1.63) and (1.64). In (5.95) we assume that

$$\lambda_1 \le \lambda_2 \le \lambda_3 \le \cdots, \tag{5.96}$$

where corresponding eigenstates may be degenerate.
We also assume that the collection {yₙ; n = 1, 2, 3, ⋯} constitutes the CONS. Now
suppose that we have a function u that satisfies the same boundary conditions (BCs)
as those for yₙ that is a solution of (5.95). Then, we have

$$\langle y_n|Du\rangle = \langle Dy_n|u\rangle = \langle \lambda_n y_n|u\rangle = \lambda_n^{\,*}\langle y_n|u\rangle = \lambda_n\langle y_n|u\rangle. \tag{5.97}$$

In (5.97) we used (1.132) and the computation rule of the inner product (see Sect.
13.1). Also, we used the fact that any eigenvalue of an Hermitian operator is real
(Sect. 1.4). From the assumption that the functions {yₙ} constitute the CONS, u can
be expanded in a series described by

$$u = \sum_j c_j\,|y_j\rangle \tag{5.98}$$

with

$$c_j \equiv \langle y_j|u\rangle. \tag{5.99}$$

This relation is in parallel with (5.64). Similarly, Du can be expanded such that

$$Du = \sum_j d_j\,|y_j\rangle = \sum_j \langle y_j|Du\rangle\,|y_j\rangle = \sum_j \lambda_j\langle y_j|u\rangle\,|y_j\rangle = \sum_j \lambda_j c_j\,|y_j\rangle, \tag{5.100}$$

where d_j is an expansion coefficient; with the third equality we used (5.95) and
with the last equality we used (5.99). Comparing the individual coefficients of |y_j⟩ in
(5.100), we get

$$d_j = \lambda_j c_j. \tag{5.101}$$

Thus, we obtain

$$Du = \sum_j \lambda_j c_j\,|y_j\rangle. \tag{5.102}$$

Comparing (5.98) and (5.102), we find that we may termwise operate D on (5.98).

Meanwhile, taking the inner product ⟨u|Du⟩ again termwise with (5.100), we get

$$\langle u|Du\rangle = \left\langle u\,\middle|\,\sum_j \lambda_j c_j\,y_j\right\rangle = \sum_j \lambda_j c_j\langle u|y_j\rangle = \sum_j \lambda_j c_j c_j^{\,*} = \sum_j \lambda_j |c_j|^2 \ge \lambda_1 \sum_j |c_j|^2, \tag{5.103}$$

where with the third equality we used (5.99) in combination with ⟨u|y_j⟩ =
⟨y_j|u⟩* = c_j*. With ⟨u|u⟩, we have

$$\langle u|u\rangle = \left\langle \sum_j c_j y_j\,\middle|\,\sum_j c_j y_j\right\rangle = \sum_j c_j^{\,*} c_j\,\langle y_j|y_j\rangle = \sum_j |c_j|^2, \tag{5.104}$$

where with the last equality we used the fact that the functions {yₙ} are normalized.
Comparing once again (5.103) and (5.104), we finally get

$$\langle u|Du\rangle \ge \lambda_1\langle u|u\rangle \quad\text{or}\quad \frac{\langle u|Du\rangle}{\langle u|u\rangle} \ge \lambda_1. \tag{5.105}$$

If we choose y₁ for u, we have an equality in (5.105). Thus, we reach the
following important theorem.
Theorem 5.1 Suppose that we have a linear differential equation described by

$$Dy = \lambda y, \tag{5.106}$$

where D is a suitable Hermitian differential operator. Suppose also that under
appropriate boundary conditions (BCs), (5.106) has a series of (real) eigenvalues.
Then, the smallest eigenvalue is equal to the minimum of ⟨u|Du⟩/⟨u|u⟩.
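Theorem 5.1 can be illustrated numerically. The sketch below (ours, not from the
text; numpy assumed) discretizes D = −d²/dx² on [0, π] with Dirichlet boundary
conditions, for which the smallest eigenvalue is λ₁ = 1, and checks that the Rayleigh
quotient ⟨u|Du⟩/⟨u|u⟩ never falls below λ₁ and attains it at u = y₁ = sin x:

    # Rayleigh-quotient demonstration of Theorem 5.1 on a finite-difference grid.
    import numpy as np

    n = 400
    x = np.linspace(0.0, np.pi, n + 2)[1:-1]         # interior grid points
    h = x[1] - x[0]
    D = (np.diag(2.0*np.ones(n)) + np.diag(-np.ones(n - 1), 1)
         + np.diag(-np.ones(n - 1), -1)) / h**2      # -d^2/dx^2, Dirichlet BCs

    lam1 = np.linalg.eigvalsh(D)[0]                  # ~ 1 (smallest eigenvalue)
    rq = lambda u: (u @ D @ u) / (u @ u)             # Rayleigh quotient

    rng = np.random.default_rng(0)
    print(lam1)
    print(min(rq(rng.standard_normal(n)) for _ in range(100)) >= lam1 - 1e-9)  # True
    print(rq(np.sin(x)))                             # ~ 1, equality at u = y1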

Using this powerful theorem, we are able to find a suitably approximated smallest
eigenvalue. We give simple examples next. As before, let us focus on the change in
the energies and quantum states caused by the applied electric field for a particle in a
one-dimensional potential well and a harmonic oscillator. The latter problem will be
left for readers as an exercise. As a specific case, for simplicity, we consider only the
ground state and the first excited state to choose a trial function. First, we deal with the
problem in common with both the cases. Then, we discuss the features individually.
As a general case, suppose that H is the Hamiltonian of the system (either a
particle in a one-dimensional potential well or a harmonic oscillator). We calculate
an expectation value of energy ⟨H⟩ when the system undergoes an influence of the
electric field. The Hamiltonian is described by

$$H = H_0 - eFx, \tag{5.107}$$

where H₀ is the Hamiltonian without the external field F. We suppose that the
quantum state |u⟩ is expressed as

$$|u\rangle = |0\rangle + a\,|1\rangle, \tag{5.108}$$

where |0⟩ and |1⟩ denote the ground state and the first excited state, respectively; a is an
unknown parameter to be decided after the analysis based on the variational method.
In the present case, |u⟩ is a trial function. Then, we have

$$\langle H\rangle \equiv \frac{\langle u|Hu\rangle}{\langle u|u\rangle}. \tag{5.109}$$

With the numerator, we have

$$\begin{aligned}
\langle u|Hu\rangle &= (\langle 0| + a^*\langle 1|)\,(H_0 - eFx)\,(|0\rangle + a|1\rangle) \\
&= \langle 0|H_0|0\rangle + a\langle 0|H_0|1\rangle - eF\langle 0|x|0\rangle - eFa\langle 0|x|1\rangle \\
&\quad + a^*\langle 1|H_0|0\rangle + a^*a\langle 1|H_0|1\rangle - eFa^*\langle 1|x|0\rangle - eFa^*a\langle 1|x|1\rangle \\
&= \langle 0|H_0|0\rangle - eFa\langle 0|x|1\rangle + a^*a\langle 1|H_0|1\rangle - eFa^*\langle 1|x|0\rangle. \tag{5.110}
\end{aligned}$$

Note that in (5.110) four terms vanish because of the symmetry requirement of the
symmetric potential well and harmonic oscillator with respect to the origin as well as
because of the orthogonality between the states |0⟩ and |1⟩. Here x is Hermitian and
we assume that the coordinate representation of |0⟩ and |1⟩ is real as in (1.101) and
(1.102) or in (2.106). Then, we have

$$\langle 1|x|0\rangle = \langle 1|x|0\rangle^* = \langle 0|x^{\dagger}|1\rangle = \langle 0|x|1\rangle.$$

Also, assuming that a is a real number, we rewrite (5.110) as

$$\langle u|Hu\rangle = \langle 0|H_0|0\rangle - 2eFa\langle 0|x|1\rangle + a^2\langle 1|H_0|1\rangle. \tag{5.111}$$

As for the denominator of (5.109), we have

$$\langle u|u\rangle = (\langle 0| + a\langle 1|)\,(|0\rangle + a|1\rangle) = 1 + a^2, \tag{5.112}$$

where we used the normalization conditions ⟨0|0⟩ = ⟨1|1⟩ = 1 and the orthogonality
conditions ⟨0|1⟩ = ⟨1|0⟩ = 0. Thus, we get

$$\langle H\rangle = \frac{\langle 0|H_0|0\rangle - 2eFa\langle 0|x|1\rangle + a^2\langle 1|H_0|1\rangle}{1 + a^2}. \tag{5.113}$$

Here, defining the ground-state energy and the first excited-state energy as E₀ and E₁,
respectively, we put

$$H_0|0\rangle = E_0|0\rangle \quad\text{and}\quad H_0|1\rangle = E_1|1\rangle, \tag{5.114}$$

where E₀ ≠ E₁. As already shown in Chaps. 1 and 2, both the systems have
nondegenerate energy eigenstates. Then, we have

$$\langle H\rangle = \frac{E_0 - 2eFa\langle 0|x|1\rangle + a^2 E_1}{1 + a^2}. \tag{5.115}$$

Now we wish to determine the condition under which ⟨H⟩ has an extremum. That is, we
are seeking a that satisfies

$$\frac{\partial\langle H\rangle}{\partial a} = 0. \tag{5.116}$$

Calculating the LHS of (5.116) by use of (5.115), we have

$$\frac{\partial\langle H\rangle}{\partial a} = \frac{2eF\langle 0|x|1\rangle a^2 + 2(E_1 - E_0)a - 2eF\langle 0|x|1\rangle}{(1 + a^2)^2}. \tag{5.117}$$

For ∂⟨H⟩/∂a to be zero, we must have

$$eF\langle 0|x|1\rangle a^2 + (E_1 - E_0)a - eF\langle 0|x|1\rangle = 0. \tag{5.118}$$

Then, solving the quadratic equation (5.118), we obtain

$$a = \frac{E_0 - E_1 \pm (E_1 - E_0)\sqrt{1 + \dfrac{4[eF\langle 0|x|1\rangle]^2}{(E_1 - E_0)^2}}}{2eF\langle 0|x|1\rangle}. \tag{5.119}$$

Taking the plus sign in the numerator of (5.119) and using

$$\sqrt{1 + \Delta} \approx 1 + \frac{\Delta}{2} \tag{5.120}$$

for small Δ, we obtain



$$a \approx \frac{eF\langle 0|x|1\rangle}{E_1 - E_0}. \tag{5.121}$$

Thus, with the optimized state of (5.108) we get

$$|u\rangle \approx |0\rangle + \frac{eF\langle 0|x|1\rangle}{E_1 - E_0}\,|1\rangle. \tag{5.122}$$

If one compares (5.122) with (5.17), one should recognize that these two equa-
tions are the same within the first-order approximation. Notice that putting i = 0 and
k = 1 in (5.17) and replacing V with −eFx, we have an expression similar to (5.122).
Notice once again that ⟨0|x|1⟩ = ⟨1|x|0⟩ as x is an Hermitian operator and that we
are considering real inner products.
Inserting (5.121) into (5.115) and approximating

$$\frac{1}{1 + a^2} \approx 1 - a^2, \tag{5.123}$$

we can estimate ⟨H⟩. The estimation depends upon the nature of the system we
choose. The next examples deal with these specific characteristics.
Example 5.4 We adopt the same problem as Example 5.1. That is, we consider how
the energy of a particle carrying a charge e confined within a one-dimensional
potential well is changed by the applied electric field. Here we deal with it using
the variational method.
We deal with the same Hamiltonian as in the case of Example 5.1. It is described
as

$$H = -\frac{\hbar^2}{2m}\frac{d^2}{dx^2} - eFx. \tag{5.22}$$

By putting

$$H_0 \equiv -\frac{\hbar^2}{2m}\frac{d^2}{dx^2}, \tag{5.124}$$

we rewrite the Hamiltonian as

$$H = H_0 - eFx. \tag{5.125}$$

Expressing the ground state as |0⟩ and the first excited state as |1⟩, we adopt a trial
function |u⟩ as

$$|u\rangle = |0\rangle + a\,|1\rangle. \tag{5.108}$$

In the coordinate representation, we have

$$|0\rangle = \sqrt{\frac{1}{L}}\,\cos\frac{\pi x}{2L} \tag{5.26}$$

and

$$|1\rangle = \sqrt{\frac{1}{L}}\,\sin\frac{\pi x}{L}. \tag{5.126}$$

From Example 1.2 of Sect. 1.3, we have

$$\langle 0|H_0|0\rangle = E_0 = \frac{\hbar^2}{2m}\cdot\frac{\pi^2}{4L^2}, \qquad \langle 1|H_0|1\rangle = E_1 = \frac{\hbar^2}{2m}\cdot\frac{4\pi^2}{4L^2} = 4E_0. \tag{5.127}$$

Inserting these results into (5.121), we immediately get

$$a \approx eF\,\frac{\langle 0|x|1\rangle}{3E_0}. \tag{5.128}$$

As in Example 5.1 of Sect. 5.1, we have

$$\langle 0|x|1\rangle = \frac{32L}{9\pi^2}. \tag{5.129}$$

For this calculation, use the coordinate representation of (5.26) and (5.126). Thus,
we get

$$a \approx eF\,\frac{32\cdot 8\cdot mL^3}{27\pi^4\hbar^2}. \tag{5.130}$$

Meanwhile, from (5.108) we obtain

$$|u\rangle \approx |0\rangle + eF\,\frac{\langle 0|x|1\rangle}{3E_0}\,|1\rangle. \tag{5.131}$$

The resulting |u⟩ in (5.131) is the same as (5.31), which was obtained by the first-
order approximation of (5.30). Inserting (5.130) into (5.113) and approximating the
denominator as before such that

$$\frac{1}{1 + a^2} \approx 1 - a^2, \tag{5.123}$$

and further using (5.130) for a, we get

$$\langle H\rangle \approx E_0 - (eF)^2\,\frac{32^2\cdot 8\cdot mL^4}{243\pi^6\hbar^2}. \tag{5.132}$$

Once again, this is the same result as that of (5.28) and (5.29) where only n ¼ 1 is
taken into account.
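The matrix element (5.129) that drives both (5.130) and (5.132) can be confirmed by
a one-line symbolic integration (our sketch, not part of the text; sympy assumed):

    # Check <0|x|1> = 32 L / (9 pi^2) for the well of width 2L, states (5.26), (5.126).
    import sympy as sp

    x, L = sp.symbols('x L', positive=True)
    psi0 = sp.sqrt(1/L)*sp.cos(sp.pi*x/(2*L))
    psi1 = sp.sqrt(1/L)*sp.sin(sp.pi*x/L)
    elem = sp.integrate(psi0*x*psi1, (x, -L, L))
    print(sp.simplify(elem - 32*L/(9*sp.pi**2)))    # prints 0, confirming (5.129)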
Example 5.5 We adopt the same problem as Example 5.2. The calculation pro-
cedures are almost the same as those of Example 5.2. We use

$$\langle 0|q|1\rangle = \sqrt{\frac{\hbar}{2m\omega}} \quad\text{and}\quad E_1 - E_0 = \hbar\omega. \tag{5.133}$$

The detailed procedures are left for readers as an exercise. We should get the same
results as those obtained in Example 5.2.
As discussed in the above five simple examples, we have shown the calculation
procedures of the perturbation method and the variational method. These methods not
only supply us with suitable approximation techniques, but also provide physical
and mathematical insight into many fields of natural science.

References

1. Sunakawa S (1991) Quantum mechanics. Iwanami, Tokyo. (in Japanese)


2. Byron FW Jr, Fuller RW (1992) Mathematics of classical and quantum physics. Dover,
New York
3. Schiff LI (1955) Quantum mechanics, 2nd edn. McGraw-Hill, New York
4. Jackson JD (1999) Classical electrodynamics, 3rd edn. Wiley, New York
5. Coddington EA (1989) An introduction to ordinary differential equations. Dover, New York
Chapter 6
Theory of Analytic Functions

Theory of analytic functions is one of the major fields of modern mathematics. Its
application covers a broad range of topics in natural science. A complex function
f (z), or a function that takes a complex number z as a variable, has various properties
that often differ from those of functions that take a real number x as a variable. In
particular, the analytic functions hold a paramount position in complex analysis.
In this chapter we explore various features of the analytic functions accordingly.
From a practical point of view, the theory of analytic functions is very frequently
utilized for the calculation of real definite integrals. For this reason, we describe the
related topics together with tangible examples.
The complex plane (or Gaussian plane) can be dealt with as a topological space
where the metric (or distance function) is defined. Since the complex plane has a
two-dimensional extension, we can readily imagine and investigate its topological
feature. Set theory allows us to make an axiomatic approach along with the topology.
Therefore, we introduce basic notions and building blocks of the set theory and
topology.

6.1 Set and Topology

A complex number z is usually expressed as

z = x + iy,                                                    (6.1)

where x and y are real numbers and i is an imaginary unit. Graphically, the number
z is indicated as a point in a complex plane where the real axis is drawn as abscissa
and the imaginary axis is depicted as ordinate (see Fig. 6.1). Since the complex plane
has a two-dimensional extension, we can readily imagine the domain of variability of
z and make a graphical object for it on the complex plane. A disk-like diagram that is
enclosed with a closed curve C is frequently dealt with in the theory of analytic

[Fig. 6.1 Complex plane where a complex number z is depicted]
functions. Such a diagram provides a good subject for study of the set theory and
topology. Before developing the theory of complex numbers and analytic functions,
we briefly mention the basic notions as well as notations and meanings of sets and
topology.

6.1.1 Basic Notions and Notations

A set comprises numbers (either real or complex) or mathematical objects, more


generally. The latter include, e.g., vectors, their transformation, etc., as we shall see
various illustrations in Parts III and IV. The set A is described such that

A = {1, 2, 3, ⋯, 10} or B = {x; a &lt; x &lt; b, a, b ∈ ℝ},          (6.2)

where A contains ten elements in the former case and uncountably infinite elements
are contained in the latter set B. In the latter case (sometimes in the former case as
well), the elements are usually called points.
When we speak of the sets, a universal set [1] is implied in the background. This
set contains all the sets in consideration of the study and is usually clearly defined in
the context of the discussion. For example, with the real analysis the universal set is
ℝ and with the complex analysis it is ℂ (an entire complex plane). In the latter case a
two-dimensional complex plane (see Fig. 6.1) is frequently used as the universal set.
We study various characteristics of sets (or subsets) that are contained in the
universal set and use a Venn diagram to represent them. Figure 6.2a illustrates an
example of Venn diagram that shows a subset A. As is often the case, the universal
set U is omitted in Fig. 6.2a. To show a (sub)set A we usually depict it with a closed
curve. If an element a is contained in A, we write

a ∈ A                                                          (6.3)

[Fig. 6.2 Venn diagrams. (a) An example of the Venn diagram. (b) A subset A and its complement Aᶜ. U shows the universal set]

and depict it as a point inside the closed curve. To show that b is not contained in A,
we write

b ∉ A.

In that case, we depict b outside the closed curve of Fig. 6.2a.


To show that a set A is contained in another set B, we write

A ⊂ B.                                                         (6.4)

If (6.4) holds, A is called a subset of B. Naturally, every set contained in the


universal set is its subset.
We need to explicitly show a set that has no element. The relevant set is called an
empty set and denoted by ∅. Examples are given below.
 
{x; x² &lt; 0, x ∈ ℝ} = ∅,  {x; x ≠ x} = ∅,  x ∉ ∅,  etc.

The subset A may have different cardinal numbers (or cardinalities) depending on
the nature of A. To show this explicitly, elements are sometimes indexed as, e.g.,
aᵢ (i = 1, 2, ⋯) or a_λ (λ ∈ Λ), where Λ can be a finite set or an infinite set with different
cardinalities; (6.2) is an example. A complement (or complementary set) of A is
defined as the difference U − A and denoted by Aᶜ (≡ U − A). That is, the set Aᶜ is said
to be the complement of A with respect to U and is indicated with a shaded area in
Fig. 6.2b.
Let A and B be two different subsets in U (Fig. 6.3). Figure 6.3 represents a sum
(or union, or union of sets) A ∪ B, an intersection A ∩ B, and differences A − B and
B − A. More specifically, A − B is defined as the subset of A obtained by subtracting B
from A and is referred to as the set difference of A relative to B. We have

A ∪ B = (A − B) ∪ (B − A) ∪ (A ∩ B).                           (6.5)
[Fig. 6.3 Two different subsets A and B in a universal set U. The diagram shows the sum (A ∪ B), intersection (A ∩ B), and differences (A − B and B − A)]

If A has no element in common with B, then A − B = A and B − A = B with A ∩ B = ∅.
Then, (6.5) trivially holds. Notice also that A ∪ B is the smallest set that contains both
A and B. In (6.5) the elements must not be doubly counted. Thus, we have

A ∪ A = A.

The well-known de Morgan's law can be understood graphically using Fig. 6.3:

(A ∪ B)ᶜ = Aᶜ ∩ Bᶜ and (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ.                     (6.6)

This can be shown as follows: With ∀u ∈ U we have

u ∈ (A ∪ B)ᶜ ⟺ u ∉ A ∪ B ⟺ u ∉ A and u ∉ B ⟺ u ∈ Aᶜ and u ∈ Bᶜ ⟺ u ∈ Aᶜ ∩ Bᶜ.

Furthermore, we have

u ∈ (A ∩ B)ᶜ ⟺ u ∉ A ∩ B ⟺ u ∉ A or u ∉ B ⟺ u ∈ Aᶜ or u ∈ Bᶜ ⟺ u ∈ Aᶜ ∪ Bᶜ.

We have other important relations such as the distributive laws described by

(A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C),                               (6.7)
(A ∩ B) ∪ C = (A ∪ C) ∩ (B ∪ C).                               (6.8)

The confirmation of (6.7) and (6.8) is left for readers. The Venn diagrams are
intuitively acceptable, but we need more rigorous discussion to characterize and
analyze various aspects of sets (vide infra).
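For readers who prefer to confirm such identities mechanically, the following
finite-set spot check (ours, not part of the text) exercises (6.5)–(6.8) on small
subsets of U = {0, ..., 9}; the particular sets A, B, and C are arbitrary choices:

    # Spot check of de Morgan's law (6.6) and the distributive laws (6.7)-(6.8).
    U = frozenset(range(10))
    A = frozenset({0, 1, 2, 3})
    B = frozenset({2, 3, 4, 5})
    C = frozenset({3, 5, 7})
    comp = lambda S: U - S                           # complement with respect to U

    assert comp(A | B) == comp(A) & comp(B)          # (A ∪ B)^c = A^c ∩ B^c
    assert comp(A & B) == comp(A) | comp(B)          # (A ∩ B)^c = A^c ∪ B^c
    assert (A | B) & C == (A & C) | (B & C)          # (6.7)
    assert (A & B) | C == (A | C) & (B | C)          # (6.8)
    assert (A | B) == (A - B) | (B - A) | (A & B)    # (6.5)
    print("all identities hold")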

6.1.2 Topological Spaces and Their Building Blocks

On the basis of the aforementioned discussion, we define a topology and a topological
space next. Let T be a universal set. Let τ be a collection of subsets of T. If the
collection τ satisfies the following axioms, τ is called a topology on T. The relevant
axioms are as follows:
(O1) T ∈ τ and ∅ ∈ τ, where ∅ is an empty set.
(O2) If O₁, O₂, ⋯, Oₙ ∈ τ, then O₁ ∩ O₂ ∩ ⋯ ∩ Oₙ ∈ τ.
(O3) If O_λ (λ ∈ Λ) ∈ τ, then ∪_{λ∈Λ} O_λ ∈ τ.
In the above axioms, T is called an underlying set. The coupled object of T and τ
is said to be a topological space and denoted by (T, τ). The members (i.e., subsets) of
τ are defined as open sets. (If we do not assume the topological space, the underlying
set is equivalent to the universal set. Henceforth, however, we do not have to be too
strict with the difference in this kind of terminology.)
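For a finite underlying set, the axioms (O1)–(O3) can be tested exhaustively;
for a finite collection, checking pairwise intersections and unions suffices. The
following sketch (ours; the sets chosen are arbitrary examples) accepts one
collection as a topology and rejects another:

    # A tiny checker for the open-set axioms (O1)-(O3) on a finite set T.
    from itertools import combinations

    T = frozenset({1, 2, 3})
    tau = {frozenset(), frozenset({1}), frozenset({1, 2}), T}   # a topology on T

    def is_topology(T, tau):
        if frozenset() not in tau or T not in tau:              # (O1)
            return False
        for A, B in combinations(tau, 2):
            if A & B not in tau:                                # (O2), pairwise suffices
                return False
            if A | B not in tau:                                # (O3), finite case
                return False
        return True

    print(is_topology(T, tau))                                  # True
    print(is_topology(T, {frozenset(), frozenset({1}), frozenset({2}), T}))
    # False: the union {1} ∪ {2} = {1, 2} is missing, violating (O3)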
Let (T, τ) be a topological space with a subset O′ ⊂ T. Let τ′ be defined such that

τ′ ≡ {O′ ∩ O; O ∈ τ}.

Then, (O′, τ′) can be regarded as a topological space. In this case τ′ is called a
relative topology for O′ [2, 3] and O′ is referred to as a subspace of T. The topological
space (O′, τ′) shares the same topological feature as that of (T, τ). Notice that O′ does
not necessarily need to be an open set of T. But, if O′ is an open set of T, then any
open set belonging to O′ is an open set of T as well. This is evident from the
definition of the relative topology and Axiom (O2).
The abovementioned Axioms (O1)–(O3) sound somewhat pretentious and the
definition of the open sets seems to be “descent from heaven.” Nonetheless, these
axioms and terminology turn out useful soon. Once a topological space (T, τ) is
given, subsets of various types arise and a variety of relationships among them ensue
as well. Note that the above axioms mention nothing about a set that is different from
an open set. Let S be a subset (maybe an open set or maybe not) of T and let us think
of the properties of S.
Definition 6.1 Let (T, τ) be a topological space and S be an open set of τ, i.e., S ∈ τ.
Then, the subset defined as the complement of S in (T, τ), i.e., Sᶜ = T − S, is called a
closed set in (T, τ).
Replacing S with Sᶜ in Definition 6.1, we have (Sᶜ)ᶜ = S = T − Sᶜ. The above
definition may be rephrased as follows: Let (T, τ) be a topological space and let A be
a subset of T. Then, if Aᶜ = T − A ∈ τ, then A is a closed set in (T, τ).
In parallel with the axioms (O1) to (O3), we have the following axioms related to
the closed sets of (T, τ): Let τ̃ be the collection of all the closed sets of (T, τ).
(C1) T ∈ τ̃ and ∅ ∈ τ̃, where ∅ is an empty set.
(C2) If C₁, C₂, ⋯, Cₙ ∈ τ̃, then C₁ ∪ C₂ ∪ ⋯ ∪ Cₙ ∈ τ̃.
(C3) If C_λ (λ ∈ Λ) ∈ τ̃, then ∩_{λ∈Λ} C_λ ∈ τ̃.
[Fig. 6.4 Venn diagram that represents a neighborhood S of a. S contains an open set O that contains a]

These axioms are obvious from those of (O1)–(O3) along with de Morgan’s law
(6.6).
Next, we classify and characterize a variety of elements (or points) and subsets as
well as relationships among them in terms of the open sets and closed sets. We make
the most of these notions and properties to study various aspects of topological
spaces and their structures. In what follows, we assume the presence of a topological
space (T, τ).
(a) Neighborhoods [4]
Definition 6.2 Let (T, τ) be a topological space and S be a subset of T. If S contains
an open set ∃O that contains an element (or point) a 2 T, i.e.,

a 2 O ⊂ S,

then S is called a neighborhood of a.


Figure 6.4 gives a Venn diagram that represents a neighborhood of a. This simple
definition occupies an important position in the study of the topological spaces and
the theory of analytic functions.
(b) Interior and Closure [4]
With the notion of neighborhoods at the core, we shall see further characterization
of sets and elements of the topological spaces.
Definition 6.3 Let x be a point of T and S be a subset of T. If S is a neighborhood of
x, x is said to be in the interior of S. In this case, x is called an interior point of S. The
interior of S is denoted by S°.
The interior as a technical term can be understood as follows: Suppose that x ∉ S.
Then, by Definition 6.2 S is not a neighborhood of x. By Definition 6.3, this leads to
x ∉ S°. This statement can be translated into x ∈ Sᶜ ⇒ x ∈ (S°)ᶜ. That is, Sᶜ ⊂ (S°)ᶜ.
This is equivalent to

S° ⊂ S.                                                        (6.9)

Definition 6.4 Let x be a point of T and S be a subset of T. If for any neighborhood
∀N of x we have N ∩ S ≠ ∅, x is said to be in the closure of S. In this case, x is called
an adherent point of S. The closure of S is denoted by S̄.

According to the above definition, with the adherent point x of S we explicitly
write x ∈ S̄. Think of the negation of the statement of Definition 6.4. That is, with some
neighborhood ∃N of x we have (i) N ∩ S = ∅ ⟺ x ∉ S̄. Meanwhile,
(ii) N ∩ S = ∅ ⇒ x ∉ S. Combining (i) and (ii), we have x ∉ S̄ ⇒ x ∉ S. Similarly
to the above argument about the interior, we get

S ⊂ S̄.                                                         (6.10)

Combining (6.9) and (6.10), we get

S° ⊂ S ⊂ S̄.                                                    (6.11)

In relation to the interior and closure, we have the following important lemmas.
Lemma 6.1 Let S be a subset of T and O be any open set contained in S. Then, we
have O ⊂ S°.
Proof Suppose that with an arbitrary point x, x ∈ O. Since we have x ∈ O ⊂ S, S is
a neighborhood of x (due to Definition 6.2). Then, from Definition 6.3 we have
x ∈ S°. Then, we have x ∈ O ⇒ x ∈ S°. This implies O ⊂ S°. This completes the
proof. ∎
Lemma 6.2 Let S be a subset of T. If x ∈ S°, then there is some open set ∃O that
satisfies x ∈ O ⊂ S.
Proof Suppose that with a point x, x ∈ S°. Then by Definition 6.3, S is a neighbor-
hood of x. Meanwhile, by Definition 6.2 there is some open set ∃O that satisfies
x ∈ O ⊂ S. This completes the proof. ∎
Lemmas 6.1 and 6.2 teach us further implications. From Lemma 6.1 and (6.11)
we have

O ⊂ S° ⊂ S                                                     (6.12)

with an arbitrary open set ∀O contained in S. From Lemma 6.2 we have
x ∈ S° ⇒ x ∈ O; i.e., S° ⊂ O for some open set ∃O contained in S. Expressing
this specific O as Õ, we have S° ⊂ Õ. Meanwhile, using Lemma 6.1 once again we
must get

Õ ⊂ S° ⊂ S.                                                    (6.13)

That is, we have S° ⊂ Õ and Õ ⊂ S° at once. This implies that with this specific
open set Õ, we get Õ = S°. That is,
[Fig. 6.5 (a) Largest open set S° contained in S; O denotes an open set. (b) Smallest closed set S̄ containing S; C denotes a closed set]

Õ = S° ⊂ S.                                                    (6.14)

Relation (6.14) obviously shows that S° is an open set and moreover that S° is the
largest open set contained in S. Figure 6.5a depicts this relation. In this context we
have the following important theorem.
Theorem 6.1 Let S be a subset of T. The subset S is open if and only if S = S°.
Proof From (6.14) S° is open. Therefore, if S = S°, then S is open. Conversely,
suppose that S is open. In that event S itself is an open set contained in S. Meanwhile,
S° has been characterized as the largest open set contained in S. Consequently, we
have S ⊂ S°. At the same time, from (6.11) we have S° ⊂ S. Thus, we get S = S°.
This completes the proof. ∎
In contrast to Lemmas 6.1 and 6.2, we have the following lemmas with respect to
closures.
Lemma 6.3 Let S be a subset of T and C be any closed set containing S. Then, we
have S̄ ⊂ C.
Proof Since C is a closed set, Cᶜ is an open set. Suppose that with a point x, x ∉ C.
Then, x ∈ Cᶜ. Since C ⊃ S, Cᶜ ⊂ Sᶜ. This implies Cᶜ ∩ S = ∅. As x ∈ Cᶜ = (Cᶜ)° ⊂ Cᶜ,
by Definition 6.2 Cᶜ is a neighborhood of x. Then, by Definition 6.4 x is not in S̄; i.e.,
x ∉ S̄. This statement can be translated into x ∈ Cᶜ ⇒ x ∈ (S̄)ᶜ; i.e., Cᶜ ⊂ (S̄)ᶜ. That
is, we get S̄ ⊂ C. ∎
Lemma 6.4 Let S be a subset of T and suppose that a point x does not belong to S̄,
i.e., x ∉ S̄. Then, x ∉ C for some closed set ∃C containing S.
Proof If x ∉ S̄, then by Definition 6.4 we have some neighborhood ∃N and some
open set O contained in that N such that x ∈ O ⊂ N and N ∩ S = ∅. Therefore, we
must have O ∩ S = ∅. Let C = Oᶜ. Then C is a closed set with C ⊃ S. Since
x ∈ O, x ∉ C. This completes the proof. ∎
From Lemmas 6.3 and 6.4 we obtain further implications. From Lemma 6.3 and
(6.11) we have

S ⊂ S̄ ⊂ C                                                     (6.15)

with an arbitrary closed set ∀C containing S. From Lemma 6.4 we have x ∉
S̄ ⇒ x ∈ Cᶜ; i.e., C ⊂ S̄ with some closed set ∃C containing S. Expressing this
specific C as C̃, we have C̃ ⊂ S̄. Meanwhile, using Lemma 6.3 once again we must get

S ⊂ S̄ ⊂ C̃.                                                    (6.16)

That is, we have C̃ ⊂ S̄ and S̄ ⊂ C̃ at once. This implies that with this specific
closed set C̃, we get C̃ = S̄. That is,

S ⊂ S̄ = C̃.                                                    (6.17)

Relation (6.17) obviously shows that S̄ is a closed set and moreover that S̄ is the
smallest closed set containing S. Figure 6.5b depicts this relation. In this context we
have the following important theorem.
Theorem 6.2 Let S be a subset of T. The subset S is closed if and only if S = S̄.
Proof The proof can be done analogously to that of Theorem 6.1. From (6.17) S̄ is
closed. Therefore, if S = S̄, then S is closed. Conversely, suppose that S is closed. In
that event S itself is a closed set containing S. Meanwhile, S̄ has been characterized as
the smallest closed set containing S. Then, we have S ⊃ S̄. At the same time, from
(6.11) we have S̄ ⊃ S. Thus, we get S = S̄. This completes the proof. ∎
(c) Boundary [4]
Definition 6.5 Let (T, τ) be a topological space and S be a subset of T. If a point
x ∈ T is in both the closure of S and the closure of the complement of S, i.e., Sᶜ, x is
said to be in the boundary of S. In this case, x is called a boundary point of S. The
boundary of S is denoted by Sᵇ.
By this definition Sᵇ can be expressed as

$$S^b = \overline{S} \cap \overline{S^c}. \tag{6.18}$$

Replacing S with Sᶜ, we get

$$(S^c)^b = \overline{S^c} \cap \overline{(S^c)^c} = \overline{S^c} \cap \overline{S}. \tag{6.19}$$

Thus, we have

$$S^b = (S^c)^b. \tag{6.20}$$

This means that S and Sᶜ have the same boundary. Since both S̄ and the closure of Sᶜ
are closed sets and the boundary is defined as their intersection, from Axiom (C3) the
boundary is a closed set. From Definition 6.1, a closed set is defined as the complement
of an open set, and vice versa. Thus, the open set and the closed set do not form a
confrontational concept but a complementary concept. In this respect we have the
following lemma and theorem.
Lemma 6.5 Let S be an arbitrary subset of T. Then, we have

$$\overline{S^c} = (S^{\circ})^c.$$

Proof Since S° ⊂ S, (S°)ᶜ ⊃ Sᶜ. As S° is open, (S°)ᶜ is closed. Hence, $(S^{\circ})^c \supset \overline{S^c}$,
because $\overline{S^c}$ is the smallest closed set containing Sᶜ. Next let C be an arbitrary closed set
that contains Sᶜ. Then Cᶜ is an open set that is contained in S, and so we have Cᶜ ⊂ S°,
because S° is the largest open set contained in S. Therefore, C ⊃ (S°)ᶜ. Then, if we
choose $\overline{S^c}$ for C, we must have $\overline{S^c} \supset (S^{\circ})^c$. Combining this relation with
$(S^{\circ})^c \supset \overline{S^c}$ obtained above, we get $\overline{S^c} = (S^{\circ})^c$. ∎
Theorem 6.3 [4] A necessary and sufficient condition for S to be both open and
closed at once is Sᵇ = ∅.
Proof
(i) Necessary condition: Let S be open and closed. Then, S = S̄ and S = S°.
Suppose that Sᵇ ≠ ∅. Then, from Definition 6.5 we would have ∃a ∈ $\overline{S} \cap \overline{S^c}$
for some a. Meanwhile, from Lemma 6.5 we have $\overline{S^c} = (S^{\circ})^c$. This leads to
a ∈ $\overline{S} \cap (S^{\circ})^c$. But, from the assumption this would mean a ∈ S ∩ Sᶜ. We have no
such a, however. Thus, using the proof by contradiction we must have Sᵇ = ∅.
(ii) Sufficient condition: Suppose that Sᵇ = ∅. Starting with S ⊃ S° and S ≠ S°, let
us assume that there is a such that a ∈ S and a ∉ S°. That is, we have a ∈
(S°)ᶜ ⇒ a ∈ S ∩ (S°)ᶜ ⊂ $\overline{S} \cap (S^{\circ})^c = \overline{S} \cap \overline{S^c}$ = Sᵇ, in contradiction to the sup-
position. Thus, we must not have such a, implying that S = S°. Next, starting
with S̄ ⊃ S and S̄ ≠ S, we assume that there is a such that a ∈ S̄ and a ∉ S. That
is, we have a ∈ Sᶜ ⇒ a ∈ $\overline{S} \cap S^c \subset \overline{S} \cap \overline{S^c}$ = Sᵇ, in contradiction to the suppo-
sition. Thus, we must not have such a, implying that S̄ = S. Combining this
relation with S = S° obtained above, we get S̄ = S = S°. In other words, S is
both open and closed at once. These complete the proof. ∎
The abovementioned set is sometimes referred to as a closed-open set or a clopen
set as a portmanteau word. An interesting example can be seen in Chap. 20 in
relation to topological groups (or continuous groups).
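On a finite topological space the interior, closure, and boundary introduced above
can be computed directly from their characterizations (6.14), (6.17), and (6.27), as
the sketch below shows (ours, not from the text). With the chain topology τ = {∅,
{1}, {1, 2}, T} on T = {1, 2, 3} and S = {2}, it yields S° = ∅, S̄ = {2, 3}, and
Sᵇ = {2, 3}:

    # Interior, closure, and boundary on a finite topological space.
    T = frozenset({1, 2, 3})
    tau = [frozenset(), frozenset({1}), frozenset({1, 2}), T]   # open sets
    closed = [T - O for O in tau]                               # closed sets

    def interior(S):   # largest open set contained in S, cf. (6.14)
        return max((O for O in tau if O <= S), key=len)

    def closure(S):    # smallest closed set containing S, cf. (6.17)
        return min((C for C in closed if C >= S), key=len)

    S = frozenset({2})
    print(interior(S))                 # frozenset() : S has empty interior
    print(closure(S))                  # frozenset({2, 3})
    print(closure(S) - interior(S))    # frozenset({2, 3}), the boundary S^b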

(d) Accumulation Points and Isolated Points
We have another important class of elements and sets that includes accumulation
points and isolated points.
Definition 6.6 Let (T, τ) be a topological space and S be a subset of T. Suppose that
we have a point p ∈ T. If for any neighborhood N of p we have N ∩ (S − {p}) ≠ ∅,
then p is called an accumulation point of S. A set comprising all the accumulation
points of S is said to be a derived set of S and denoted by Sᵈ.
Note that from Definition 6.4 we have

$$N \cap (S - \{p\}) \neq \varnothing \;\Longleftrightarrow\; p \in \overline{S - \{p\}}, \tag{6.21}$$

where N is an arbitrary neighborhood of p.
Definition 6.7 Let S be a subset of T. Suppose that we have a point p ∈ S. If for
some neighborhood N of p we have N ∩ (S − {p}) = ∅, then p is called an isolated
point of S. A set comprising only isolated points of S is said to be a discrete set
(or subset) of S and we denote it by Sᵈˢᶜ.
Note that if p is an isolated point, N ∩ (S − {p}) = N ∩ S − N ∩ {p} = N ∩ S −
{p} = ∅. That is, {p} = N ∩ S. Therefore, any isolated points of S are contained in S.
Contrary to this, the accumulation points are not necessarily contained in S.
Comparing Definitions 6.6 and 6.7, we notice that the latter definition is obtained
by negation of the statement of the former definition. This implies that any points of
a set are classified into two mutually exclusive alternatives, i.e., the accumulation
points or the isolated points.
We divide Sᵈ into a direct sum of two sets such that

$$S^d = (S \cup S^c) \cap S^d = \left(S^d \cap S\right) \cup \left(S^d \cap S^c\right), \tag{6.22}$$

where with the second equality we used (6.7).
Now, we define

$$\left(S^d\right)^+ \equiv S^d \cap S \quad\text{and}\quad \left(S^d\right)^- \equiv S^d \cap S^c.$$

Then, (6.22) means that Sᵈ is divided into two sets consisting of the accumulation
points, i.e., (Sᵈ)⁺ that belong to S and (Sᵈ)⁻ that do not belong to S.
Thus, as another direct sum we have

$$S = \left(S^d \cap S\right) \cup S^{dsc} = \left(S^d\right)^+ \cup S^{dsc}. \tag{6.23}$$

Equation (6.23) represents the direct sum consisting of a part of the derived set
and the whole discrete set. Here we have the following theorem.

Theorem 6.4 Let S be a subset of T. Then we have the following equation:

$$\overline{S} = S^d \cup S^{dsc} \quad \left(S^d \cap S^{dsc} = \varnothing\right). \tag{6.24}$$

Proof Let us assume that p is an accumulation point of S. Then, we have p ∈
$\overline{S - \{p\}} \subset \overline{S}$. Meanwhile, Sᵈˢᶜ ⊂ S ⊂ S̄. This implies that S̄ contains both Sᵈ and
Sᵈˢᶜ. That is, we have

$$\overline{S} \supset S^d \cup S^{dsc}.$$

It is natural to ask how to deal with other points that do not belong to Sᵈ ∪ Sᵈˢᶜ.
Taking account of the above discussion on the accumulation points and isolated
points, however, such remainder points should again be classified into accumulation
and isolated points. Since all the isolated points are contained in S, they can be taken
into S̄.
Suppose that among the accumulation points, some points p are not contained in
S; i.e., p ∉ S. In that case, we have S − {p} = S, namely $\overline{S - \{p\}} = \overline{S}$. Thus, p ∈
$\overline{S - \{p\}}$ ⟺ p ∈ S̄. From (6.21), in turn, this implies that in case p is not contained in
S, for p to be an accumulation point of S is equivalent to that p is an adherent point of
S. Then, those points p can be taken into S̄ as well. Thus, finally we get (6.24). From
Definitions 6.6 and 6.7, obviously we have Sᵈ ∩ Sᵈˢᶜ = ∅. This completes the
proof. ∎
Rewriting (6.24), we get

$$\overline{S} = \left(S^d\right)^- \cup \left[\left(S^d\right)^+ \cup S^{dsc}\right] = \left(S^d\right)^- \cup S. \tag{6.25}$$

We have other important theorems.
Theorem 6.5 Let S be a subset of T. Then we have the following relation:

$$\overline{S} = S^d \cup S. \tag{6.26}$$

Proof Let us assume that p ∈ S̄. Then, we have the following two cases: (i) if p ∈ S, we
have trivially p ∈ S ⊂ Sᵈ ∪ S. (ii) Suppose p ∉ S. Since p ∈ S̄, from Definition 6.4
any neighborhood of p contains a point of S (other than p). Then, from Definition
6.6, p is an accumulation point of S. Thus, we have p ∈ Sᵈ ⊂ Sᵈ ∪ S and, hence, get
S̄ ⊂ Sᵈ ∪ S. Conversely, suppose that p ∈ Sᵈ ∪ S. Then, (i) if p ∈ S, obviously we
have p ∈ S̄. (ii) Suppose p ∈ Sᵈ. Then, from Definition 6.6 any neighborhood of
p contains a point of S (other than p). Thus, from Definition 6.4, p ∈ S̄. Taking into
account the above two cases, we get Sᵈ ∪ S ⊂ S̄. Combining this with S̄ ⊂ Sᵈ ∪ S
obtained above, we get (6.26). This completes the proof. ∎
[Fig. 6.6 Relationship among S, S̄, Sᵈ, and Sᵈˢᶜ: S̄ = Sᵈ ∪ Sᵈˢᶜ = Sᵈ ∪ S and S = (Sᵈ)⁺ ∪ Sᵈˢᶜ; regarding the symbols and notations, see text]

Theorem 6.6 Let S be a subset of T. A necessary and sufficient condition for S to be a
closed set is that S contains all the accumulation points of S.
Proof From Theorem 6.2, that S is a closed set is equivalent to S = S̄. From
Theorem 6.5 this is equivalent to S = Sᵈ ∪ S. Obviously, this is equivalent to Sᵈ ⊂ S.
In other words, S contains all the accumulation points of S. This completes the
proof. ∎
As another proof, taking account of (6.25), we have

$$\left(S^d\right)^- = \varnothing \;\Longleftrightarrow\; \overline{S} = S \;\Longleftrightarrow\; S \text{ is a closed set (from Theorem 6.2)}.$$

This relation is equivalent to the statement that S contains all the accumulation
points of S.
In Fig. 6.6 we show the relationship among S, S̄, Sᵈ, and Sᵈˢᶜ. From Fig. 6.6, we
can readily show that

$$S^{dsc} = S - S^d = \overline{S} - S^d.$$

We have a further example of direct sum decomposition. Using Lemma 6.5, we
have

$$S^b = \overline{S} \cap \overline{S^c} = \overline{S} \cap (S^{\circ})^c = \overline{S} - S^{\circ}. \tag{6.27}$$

Since S̄ ⊃ S°, we get

$$\overline{S} = S^{\circ} \cup S^b; \quad S^{\circ} \cap S^b = \varnothing.$$

Notice that (6.27) is a succinct expression of Theorem 6.3. That is,

$$S^b = \varnothing \;\Longleftrightarrow\; \overline{S} = S^{\circ} \;\Longleftrightarrow\; S \text{ is a clopen set}.$$
[Fig. 6.7 Connectedness of subspaces. (a) Connected subspace A: any two points z₁ and z₂ of A can be connected by a continuous line that is contained entirely within A. (b) Simply connected subspace A: all the points inside any closed path C within A contain only the points that belong to A. (c) Disconnected subspace A that comprises two disjoint sets B and C]

(e) Connectedness
In the preceding several paragraphs we studied various building blocks of the
topological space and of the sets contained in the space, together with their properties.
In this paragraph, we briefly mention the relationship between the sets. Connectedness
is an important concept in topology. The concept of connectedness is associated
with the subspaces defined earlier in this section. In terms of connectedness, a
topological space can be categorized into two main types: a subspace of the one type
is connected and that of the other is disconnected.
Let A be a subspace of T. Intuitively, for A to be connected is defined as follows:
Let z1 and z2 be any two points of A. If z1 and z2 can be joined by a continuous line
that is contained entirely within A (see Fig. 6.7a), then A is said to be connected. Of
the connected subspaces, suppose that the subspace A has the property that all the
points inside any closed path (i.e., continuous closed line) C within A contain only
the points that belong to A (Fig. 6.7b). Then, that subspace A is said to be simply
connected. A subspace that is not simply connected but connected is said to be
multiply connected. A typical example for the latter is a torus. If a subspace is given
as a union of two (or more) disjoint non-empty (open) sets, that subspace is said to be
disconnected. Notice that if two (or more) sets have no element in common, those
sets are said to be disjoint. Figure 6.7c shows an example for which a disconnected
subspace A is a union of two disjoint sets B and C.
Interesting examples can be seen in Chap. 20 in relation to the connectedness.

6.1.3 T1-Space

We can “enter” a variety of topologies τ into a set T to get a topological space (T, τ).
We have two extremes of them. One is an indiscrete topology (or trivial topology)
and the other is a discrete topology [2]. For the former the topology is characterized
as

$$\tau = \{\varnothing, T\} \tag{6.28}$$

and the latter case is

$$\tau = \{\varnothing, \text{all the subsets of } T, T\}. \tag{6.29}$$

With the discrete topology τ all the subsets are open and the complements of the
individual subsets are closed by Definition 6.1. But such closed subsets are contained
in τ, and so they are again open. Thus, all the subsets are clopen sets. Practically
speaking, the aforementioned two extremes are of less interest and value. For this
reason, we need some moderate separation conditions with topological spaces. These
conditions are well established as the separation axioms. We will not get into details
about the discussion [2], but we mention the first separation axiom (Fréchet axiom)
that produces the T₁-space.
Definition 6.8 Let (T, τ) be a topological space. Suppose that with respect to
∀x, y (x ≠ y) ∈ T there is a neighborhood N in such a way that

x ∈ N and y ∉ N.

The topological space that satisfies the above separation condition is called a
T₁-space.
We mention two important theorems of the T1-space.
Theorem 6.7 A necessary and sufficient condition for (T, τ) to be a T₁-space is that
for each point x ∈ T, {x} is a closed set of T.
Proof
(i) Necessary condition: Let (T, τ) be a T₁-space. Choose any pair of elements
x, y (x ≠ y) ∈ T. Then, from Definition 6.8 there is a neighborhood ∃N of y (≠x)
that does not contain x; y ∈ N and N ∩ {x} = ∅. From Definition 6.4 this implies
that y ∉ $\overline{\{x\}}$. Thus, we have y ≠ x ⇒ y ∉ $\overline{\{x\}}$. This means that
y ∈ $\overline{\{x\}}$ ⇒ y = x. Namely, $\overline{\{x\}}$ ⊂ {x}, but from (6.11) we have
{x} ⊂ $\overline{\{x\}}$. Thus, {x} = $\overline{\{x\}}$. From Theorem 6.2 this shows that {x} is a
closed set of T.
(ii) Sufficient condition: Suppose that {x} is a closed set of T. Choose any x and
y such that y ≠ x. Since {x} is a closed set, N = T − {x} is an open set that
contains y; i.e., y ∈ T − {x} and x ∉ T − {x}. Since y ∈ (T − {x})° ⊂ T − {x},
where T − {x} is an open set, from Definition 6.2 N = T − {x} is a neighbor-
hood of y, and moreover N does not contain x. Then, from Definition 6.8 (T, τ) is
a T₁-space. These complete the proof. ∎
A set comprising a single element is called a singleton or a unit set.
Theorem 6.8 Let S be a subset of a T₁-space. Then Sᵈ is a closed set in the T₁-space.
Proof Let p be a point of T. Suppose p ∈ $\overline{S^d}$. Suppose, furthermore, p ∉ Sᵈ. Then,
from Definition 6.6 there is some neighborhood N of p such that N ∩ (S − {p}) = ∅.
Since p ∈ $\overline{S^d}$, from Definition 6.4 we must have N ∩ Sᵈ ≠ ∅ at once for this special
neighborhood N of p. Then, as we have p ∉ Sᵈ, we can take a point q such that
q ∈ N ∩ Sᵈ and q ≠ p. Meanwhile, we may choose an open set N for the above
neighborhood of p (i.e., an open neighborhood), because p ∈ N (= N°) ⊂ N. Since
N ∩ (T − {p}) = N − {p} is an open set from Theorem 6.7 as well as Axiom
(O2) and Definition 6.1, N − {p} is an open neighborhood of q which does not
contain p. Namely, we have q ∈ N − {p} [= (N − {p})°] ⊂ N − {p}. Note that
q ∈ N − {p} and p ∉ N − {p}, in agreement with Definition 6.8. As q ∈ Sᵈ, again
from Definition 6.6 we have

$$(N - \{p\}) \cap (S - \{q\}) \neq \varnothing. \tag{6.30}$$

This implies that N ∩ (S − {p}) ≠ ∅ as well. But, this is in contradiction to the
original relation of N ∩ (S − {p}) = ∅, which is obtained from p ∉ Sᵈ. Then, we
must have p ∈ Sᵈ. Thus, from the original supposition we get $\overline{S^d}$ ⊂ Sᵈ. Meanwhile,
we have $\overline{S^d}$ ⊃ Sᵈ from (6.11). We have $\overline{S^d}$ = Sᵈ accordingly. From Theorem 6.2,
this means that Sᵈ is a closed set. Hence, we have proven the theorem. ∎
The above two theorems are intuitively acceptable. For, in a one-dimensional
Euclidean space ℝ we express a singleton {x} as [x, x] and a closed interval as [x, y]
(x ≠ y). With the latter, [x, y] might well be expressed as (x, y)ᵈ. Such subsets are well
known as closed sets. According to the separation axioms, various topological
spaces can be obtained by imposing stronger constraints upon the T₁-space [2]. A
typical example of this is a metric space. In this context, the T₁-space has acquired
primitive notions of metric that can be shared with the metric space. Definitions of
the metric (or distance function) and metric space will briefly be summarized in
Chap. 13 in reference to an inner product space.
In the above discussion, we have described a brief outline of the set theory and
topology. We use the results in this chapter and in Part IV as well.
[Fig. 6.8 Complex plane where z is expressed in a polar form using a non-negative radial coordinate r and an angular coordinate θ]

6.1.4 Complex Numbers and Complex Plane

Returning to (6.1), we further investigate how to represent a complex number. In
Fig. 6.8 we redraw a complex plane and a complex number on it. The complex plane
is a natural extension of a real two-dimensional orthogonal coordinate plane (Car-
tesian coordinate plane) that graphically represents a pair of real numbers x and y as
(x, y). In the complex plane we represent a complex number as

$$z = x + iy \tag{6.1}$$

by designating x as an abscissa on the real axis and y as an ordinate on the imaginary
axis. The two axes are orthogonal to each other.
An absolute value (or modulus) of z is defined as a real non-negative number

$$|z| = \sqrt{x^2 + y^2}. \tag{6.31}$$

Analogously to a real two-dimensional polar coordinate, we can introduce in the
complex plane a non-negative radial coordinate r and an angular coordinate θ
(Fig. 6.8). In the complex analysis the angular coordinate is called an argument
and denoted by arg z so as to be

$$\arg z = \theta + 2\pi n; \quad n = 0, \pm 1, \pm 2, \cdots.$$

In Fig. 6.8, x and y are given by

$$x = r\cos\theta \quad\text{and}\quad y = r\sin\theta.$$

Thus, we have a polar form of z expressed as



$$z = r(\cos\theta + i\sin\theta). \tag{6.32}$$

Using the well-known Euler's formula (or Euler's identity)

$$e^{i\theta} = \cos\theta + i\sin\theta, \tag{6.33}$$

we get

$$z = re^{i\theta}. \tag{6.34}$$

From (6.32) and (6.34), we have

$$z^n = r^n e^{in\theta} = r^n(\cos\theta + i\sin\theta)^n = r^n(\cos n\theta + i\sin n\theta), \tag{6.35}$$

where the last equality comes from replacing θ with nθ in (6.33).
Comparing both sides of the last equality of (6.35), we have

$$(\cos\theta + i\sin\theta)^n = \cos n\theta + i\sin n\theta. \tag{6.36}$$

Equation (6.36) is called de Moivre's theorem. This relation holds with
n being integers including zero along with rational numbers including negative
numbers. Euler's formula (6.33) immediately leads to the following important
formulae:

$$\cos\theta = \frac{1}{2}\left(e^{i\theta} + e^{-i\theta}\right), \quad \sin\theta = \frac{1}{2i}\left(e^{i\theta} - e^{-i\theta}\right). \tag{6.37}$$

Notice that although in (6.32) and (6.33) we assumed that θ is a real number,
(6.33) and (6.37) hold with any complex number θ. Note also that (6.33) results from
(6.37) and the following definition of the power series expansion of the exponential
function

$$e^z \equiv \sum_{k=0}^{\infty} \frac{z^k}{k!}, \tag{6.38}$$

where z is any complex number [5]. In fact, from (6.37) and (6.38) we have the
following familiar expressions of the power series expansions of the cosine and sine
functions:

$$\cos z = \sum_{k=0}^{\infty} (-1)^k \frac{z^{2k}}{(2k)!}, \quad \sin z = \sum_{k=0}^{\infty} (-1)^k \frac{z^{2k+1}}{(2k+1)!}.$$

These expressions frequently appear in this chapter.
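These relations are easy to exercise numerically. The following sketch (ours, not
from the text; the values of θ, n, and z are arbitrary choices) spot-checks de Moivre's
theorem (6.36) and the exponential series (6.38) truncated at k = 40:

    # Numerical spot check of (6.36) and (6.38).
    import cmath, math

    theta, n = 0.7, 5
    lhs = (math.cos(theta) + 1j*math.sin(theta))**n
    rhs = math.cos(n*theta) + 1j*math.sin(n*theta)
    print(abs(lhs - rhs))                            # ~ 1e-16, de Moivre holds

    z = 0.3 + 0.4j
    series = sum(z**k / math.factorial(k) for k in range(41))
    print(abs(series - cmath.exp(z)))                # ~ 1e-16, (6.38) converges fast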



Next, let us think of a pair of complex numbers z and w with w expressed as
w = u + iv. Then, we have

$$z - w = (x - u) + i(y - v).$$

Using (6.31), we get

$$|z - w| = \sqrt{(x - u)^2 + (y - v)^2}. \tag{6.39}$$

Equation (6.39) represents the “distance” between z and w. If we define a
function ρ(z, w) such that

$$\rho(z, w) \equiv |z - w|,$$

ρ(z, w) defines a metric (or distance function); see Chap. 13. Thus, the metric gives a
distance between any arbitrarily chosen pair of elements z, w ∈ ℂ and (ℂ, ρ) can be
dealt with as a metric space. This makes it easy to view the complex plane as a
topological space. Discussions about the set theory and topology already developed
earlier in this chapter basically hold.
In the theory of analytic functions, a subset A depicted in Fig. 6.7 can be regarded
as a part of the complex plane. Since the complex plane ℂ itself represents a
topological space, the subset A may be viewed as a subspace of ℂ. A complex
function f (z) of a complex variable z of (6.1) is usually defined in a connected open
set called a region [5]. The said region is of practical use among various subspaces
and can be an entire domain of ℂ or a subset of ℂ. The connectedness of the region
can be considered in a manner similar to that described in Sect. 6.1.2.

6.2 Analytic Functions of a Complex Variable

Since the complex number and complex plane have been well characterized in the
previous section, we deal with the complex functions of a complex variable in the
complex plane.
Taking the complex conjugate of (6.1), we have

$$z^* = x - iy. \tag{6.40}$$

Combining (6.1) and (6.40) in a matrix form, we get


 
$$(z\ \ z^*) = (x\ \ y)\begin{pmatrix} 1 & 1 \\ i & -i \end{pmatrix}. \tag{6.41}$$
[Fig. 6.9 Vector synthesis of z and z*]

[Fig. 6.10 Ways a number (real or complex) is approached. (a) Two ways a number x₀ is approached on the real number line. (b) Various ways a number z₀ is approached in a complex plane; only four ways are indicated]

 
The matrix $\begin{pmatrix} 1 & 1 \\ i & -i \end{pmatrix}$ is non-singular, and so the inverse matrix exists (see Sect.
11.3) and (x y) can be described as

$$(x\ \ y) = (z\ \ z^*)\,\frac{1}{2}\begin{pmatrix} 1 & -i \\ 1 & i \end{pmatrix} \tag{6.42}$$

or with individual components we have

$$x = \frac{1}{2}(z + z^*) \quad\text{and}\quad y = -\frac{i}{2}(z - z^*), \tag{6.43}$$

which can be understood as the “vector synthesis” (see Fig. 6.9).


Equations (6.41) and (6.42) imply that the two sets of variables (x y) and (z z*) can
be dealt with on an equal basis. According to the custom, however, we prioritize the
notation using (x y) over that of (z z*). Hence, the complex function f is described as

$$f(x, y) = u(x, y) + iv(x, y), \tag{6.44}$$

where both u(x, y) and v(x, y) are real functions of the real variables x and y.

Here let us pause for a second to consider the differentiation of a function.


Suppose that a mathematical function is defined on a real domain (i.e., a real number
line). Consider whether that function is differentiable at a certain point x0 of the real
number line. On this occasion, we can approach x0 only from two directions, i.e.,
from the side of x < x0 (from the left) or from the side of x0 < x (from the right); see
Fig. 6.10a. Meanwhile, suppose that a mathematical function is defined on a
complex domain (i.e., a complex plane). Also consider whether the function is
differentiable at a certain point z0 of the complex plane. In this case, we can approach
z0 from continuously varying directions; see Fig. 6.10b where only four directions
are depicted.
In this context let us think of a simple example.
Example 6.1 Let f (x, y) be a function described by

$$f(x, y) = 2x + iy = z + x. \tag{6.45}$$

As remarked above, substituting (6.43) for (6.45), we could have

$$h(z, z^*) = 2\cdot\frac{1}{2}(z + z^*) + i\cdot\left[-\frac{i}{2}(z - z^*)\right] = z + z^* + \frac{1}{2}(z - z^*) = \frac{1}{2}(3z + z^*),$$

where f (x, y) and h(z, z*) denote different functional forms. The derivative of (6.45)
varies depending on the way z₀ is approached. For instance, think of

$$\left.\frac{df}{dz}\right|_0 = \lim_{z\to 0}\frac{f(0 + z) - f(0)}{z} = \lim_{x\to 0,\,y\to 0}\frac{2x + iy}{x + iy} = \lim_{x\to 0,\,y\to 0}\frac{2x^2 + y^2 - ixy}{x^2 + y^2}.$$

Suppose that the differentiation is taken along a straight line in the complex plane
represented by iy = (ik)x (k, x, y : real). Then, we have

$$\left.\frac{df}{dz}\right|_0 = \lim_{x\to 0,\,y\to 0}\frac{2x^2 + k^2x^2 - ikx^2}{x^2 + k^2x^2} = \lim_{x\to 0,\,y\to 0}\frac{2 + k^2 - ik}{1 + k^2} = \frac{2 + k^2 - ik}{1 + k^2}. \tag{6.46}$$

However, this means that df/dz|₀ takes varying values depending upon k. Namely,
df/dz|₀ cannot uniquely be defined but depends on different ways to approach the origin
of the complex plane. Thus, we find that the derivative takes different values
depending on straight lines along which the differentiation is taken. This means
that f (x, y) is not differentiable or analytic at z = 0.
Meanwhile, think of g(z) expressed as

$$g(z) = x + iy = z. \tag{6.47}$$

In this case, we get

$$\left.\frac{dg}{dz}\right|_0 = \lim_{z\to 0}\frac{g(0 + z) - g(0)}{z} = \lim_{z\to 0}\frac{z - 0}{z} = 1.$$

As a result, the derivative takes the same value 1, regardless of what straight lines
the differentiation is taken along.
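Example 6.1 is easy to replay numerically. In the sketch below (ours, not from the
text), the difference quotient of f = 2x + iy at the origin is evaluated along the lines
y = kx for a few slopes k and compared with (6.46), while the quotient for g(z) = z
stays at 1 in every direction:

    # Direction dependence of the difference quotient at z = 0, cf. (6.46).
    f = lambda z: 2*z.real + 1j*z.imag       # f = 2x + iy, not differentiable at 0
    g = lambda z: z                          # g = z, differentiable everywhere

    for k in (0.0, 1.0, 2.0):
        dz = 1e-8*(1 + 1j*k)                 # approach the origin along y = kx
        print(k, f(dz)/dz, (2 + k**2 - 1j*k)/(1 + k**2), g(dz)/dz)
    # f's quotient changes with k and matches (6.46); g's quotient is always 1.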
Though simple, the above example gives us a heuristic method. As before, let f (z)
be a function described as (6.44) such that

$$f(z) = f(x, y) = u(x, y) + iv(x, y), \tag{6.48}$$

where both u(x, y) and v(x, y) possess the first-order partial derivatives with respect to
x and y. Then we have a derivative df (z)/dz such that

$$\frac{df(z)}{dz} = \lim_{\Delta z\to 0}\frac{f(z + \Delta z) - f(z)}{\Delta z} = \lim_{\Delta x\to 0,\,\Delta y\to 0}\frac{u(x + \Delta x, y + \Delta y) - u(x, y) + i[v(x + \Delta x, y + \Delta y) - v(x, y)]}{\Delta x + i\Delta y}. \tag{6.49}$$

We wish to seek the condition under which df (z)/dz gives the same result regardless
of the order of taking the limits Δx → 0 and Δy → 0. That is, taking the limit Δy → 0
first, we have

$$\frac{df(z)}{dz} = \lim_{\Delta x\to 0}\frac{u(x + \Delta x, y) - u(x, y) + i[v(x + \Delta x, y) - v(x, y)]}{\Delta x} = \frac{\partial u(x, y)}{\partial x} + i\,\frac{\partial v(x, y)}{\partial x}. \tag{6.50}$$

Next, taking the limit Δx → 0 first, we get

$$\frac{df(z)}{dz} = \lim_{\Delta y\to 0}\frac{u(x, y + \Delta y) - u(x, y) + i[v(x, y + \Delta y) - v(x, y)]}{i\Delta y} = -i\,\frac{\partial u(x, y)}{\partial y} + \frac{\partial v(x, y)}{\partial y}. \tag{6.51}$$

Consequently, by equating the real and imaginary parts of (6.50) and (6.51) we
must have

$$\frac{\partial u(x, y)}{\partial x} = \frac{\partial v(x, y)}{\partial y}, \tag{6.52}$$

$$\frac{\partial v(x, y)}{\partial x} = -\frac{\partial u(x, y)}{\partial y}. \tag{6.53}$$

These relationships (6.52) and (6.53) are called the Cauchy–Riemann conditions.
Differentiating (6.52) with respect to x and (6.53) with respect to y and further
subtracting one from the other, we get

$$\frac{\partial^2 u(x, y)}{\partial x^2} + \frac{\partial^2 u(x, y)}{\partial y^2} = 0. \tag{6.54}$$

Similarly we have

$$\frac{\partial^2 v(x, y)}{\partial x^2} + \frac{\partial^2 v(x, y)}{\partial y^2} = 0. \tag{6.55}$$

From the above discussion, we draw several important implications.
(1) From (6.50), we have

$$\frac{df(z)}{dz} = \frac{\partial f}{\partial x}. \tag{6.56}$$

(2) Also from (6.51) we obtain

$$\frac{df(z)}{dz} = -i\,\frac{\partial f}{\partial y}. \tag{6.57}$$

Meanwhile, we have

$$\frac{\partial}{\partial x} = \frac{\partial z}{\partial x}\frac{\partial}{\partial z} + \frac{\partial z^*}{\partial x}\frac{\partial}{\partial z^*} = \frac{\partial}{\partial z} + \frac{\partial}{\partial z^*}. \tag{6.58}$$

Also, we get

$$\frac{\partial}{\partial y} = \frac{\partial z}{\partial y}\frac{\partial}{\partial z} + \frac{\partial z^*}{\partial y}\frac{\partial}{\partial z^*} = i\left(\frac{\partial}{\partial z} - \frac{\partial}{\partial z^*}\right). \tag{6.59}$$

As in the case of Example 6.1, we rewrite f (z) as

$$f(z) \equiv F(z, z^*), \tag{6.60}$$

where the change in the functional form is due to the variables transformation.
Hence, from (6.56) and using (6.58) we have

$$\frac{df(z)}{dz} = \frac{\partial F(z, z^*)}{\partial x} = \left(\frac{\partial}{\partial z} + \frac{\partial}{\partial z^*}\right)F(z, z^*) = \frac{\partial F(z, z^*)}{\partial z} + \frac{\partial F(z, z^*)}{\partial z^*}. \tag{6.61}$$

Similarly, we get

$$\frac{df(z)}{dz} = -i\,\frac{\partial F(z, z^*)}{\partial y} = \left(\frac{\partial}{\partial z} - \frac{\partial}{\partial z^*}\right)F(z, z^*) = \frac{\partial F(z, z^*)}{\partial z} - \frac{\partial F(z, z^*)}{\partial z^*}. \tag{6.62}$$

Equating (6.61) and (6.62), we obtain

$$\frac{\partial F(z, z^*)}{\partial z^*} = 0. \tag{6.63}$$

This clearly shows that if F(z, z*) is differentiable with respect to z, F(z, z*) does
not depend on z*, but only depends upon z. Thus, we may write the differentiable
function as

$$F(z, z^*) \equiv f(z).$$

This is an outstanding characteristic of a complex function that is differentiable
with respect to the complex variable z.
Taking the contraposition of the above statement, we say that if F(z, z*) depends
on z*, it is not differentiable with respect to z. Example 6.1 is one of the illustrations.
Conversely, we consider what happens if the Cauchy–Riemann
conditions are satisfied [5]. Let f (z) be a complex function described by

$$f(z) = u(x, y) + iv(x, y), \tag{6.48}$$

where u(x, y) and v(x, y) satisfy the Cauchy–Riemann conditions and possess con-
tinuous first-order partial derivatives with respect to x and y in some region of ℂ.
Then, we have

$$u(x + \Delta x, y + \Delta y) - u(x, y) = \frac{\partial u(x, y)}{\partial x}\Delta x + \frac{\partial u(x, y)}{\partial y}\Delta y + \varepsilon_1\Delta x + \delta_1\Delta y, \tag{6.64}$$

$$v(x + \Delta x, y + \Delta y) - v(x, y) = \frac{\partial v(x, y)}{\partial x}\Delta x + \frac{\partial v(x, y)}{\partial y}\Delta y + \varepsilon_2\Delta x + \delta_2\Delta y, \tag{6.65}$$

where the four numbers ε₁, ε₂, δ₁, and δ₂ can be made arbitrarily small in magnitude
as Δx and Δy tend to zero. The relations (6.64) and (6.65) result from the continuity
of the first-order partial derivatives of u(x, y) and v(x, y). Then, we have

$$\begin{aligned}
\frac{f(z + \Delta z) - f(z)}{\Delta z}
&= \frac{u(x + \Delta x, y + \Delta y) - u(x, y)}{\Delta z} + \frac{i[v(x + \Delta x, y + \Delta y) - v(x, y)]}{\Delta z} \\
&= \frac{\Delta x}{\Delta z}\left[\frac{\partial u(x, y)}{\partial x} + i\,\frac{\partial v(x, y)}{\partial x}\right] + \frac{\Delta y}{\Delta z}\left[\frac{\partial u(x, y)}{\partial y} + i\,\frac{\partial v(x, y)}{\partial y}\right] \\
&\quad + \frac{\Delta x}{\Delta z}(\varepsilon_1 + i\varepsilon_2) + \frac{\Delta y}{\Delta z}(\delta_1 + i\delta_2) \\
&= \frac{\Delta x}{\Delta z}\left[\frac{\partial u(x, y)}{\partial x} + i\,\frac{\partial v(x, y)}{\partial x}\right] + \frac{\Delta y}{\Delta z}\left[-\frac{\partial v(x, y)}{\partial x} + i\,\frac{\partial u(x, y)}{\partial x}\right] \\
&\quad + \frac{\Delta x}{\Delta z}(\varepsilon_1 + i\varepsilon_2) + \frac{\Delta y}{\Delta z}(\delta_1 + i\delta_2) \\
&= \frac{\Delta x + i\Delta y}{\Delta z}\,\frac{\partial u(x, y)}{\partial x} + \frac{\Delta x + i\Delta y}{\Delta z}\,i\,\frac{\partial v(x, y)}{\partial x} + \frac{\Delta x}{\Delta z}(\varepsilon_1 + i\varepsilon_2) + \frac{\Delta y}{\Delta z}(\delta_1 + i\delta_2) \\
&= \frac{\partial u(x, y)}{\partial x} + i\,\frac{\partial v(x, y)}{\partial x} + \frac{\Delta x}{\Delta z}(\varepsilon_1 + i\varepsilon_2) + \frac{\Delta y}{\Delta z}(\delta_1 + i\delta_2), \tag{6.66}
\end{aligned}$$

where with the third equality we used the Cauchy–Riemann conditions (6.52) and
(6.53). Accordingly, we have

$$\frac{f(z + \Delta z) - f(z)}{\Delta z} - \left[\frac{\partial u(x, y)}{\partial x} + i\,\frac{\partial v(x, y)}{\partial x}\right] = \frac{\Delta x}{\Delta z}(\varepsilon_1 + i\varepsilon_2) + \frac{\Delta y}{\Delta z}(\delta_1 + i\delta_2). \tag{6.67}$$

Taking the absolute values of both sides of (6.67), we get

$$\left|\frac{f(z + \Delta z) - f(z)}{\Delta z} - \left[\frac{\partial u(x, y)}{\partial x} + i\,\frac{\partial v(x, y)}{\partial x}\right]\right| \le \left|\frac{\Delta x}{\Delta z}\right||\varepsilon_1 + i\varepsilon_2| + \left|\frac{\Delta y}{\Delta z}\right||\delta_1 + i\delta_2|, \tag{6.68}$$

where $\left|\frac{\Delta x}{\Delta z}\right| = \frac{|\Delta x|}{\sqrt{(\Delta x)^2 + (\Delta y)^2}} \le 1$ and $\left|\frac{\Delta y}{\Delta z}\right| = \frac{|\Delta y|}{\sqrt{(\Delta x)^2 + (\Delta y)^2}} \le 1$. Taking the limit Δz → 0,
by assumption both the terms of the RHS of (6.68) approach zero. Thus,



$$\lim_{\Delta z\to 0}\frac{f(z + \Delta z) - f(z)}{\Delta z} = \frac{\partial u(x, y)}{\partial x} + i\,\frac{\partial v(x, y)}{\partial x} = \frac{\partial f}{\partial x}. \tag{6.69}$$

Alternatively, using the partial derivatives of u(x, y) and v(x, y) with respect to y we
can rewrite (6.66) as

$$\lim_{\Delta z\to 0}\frac{f(z + \Delta z) - f(z)}{\Delta z} = -i\,\frac{\partial u(x, y)}{\partial y} + \frac{\partial v(x, y)}{\partial y} = -i\,\frac{\partial f}{\partial y}. \tag{6.70}$$

From (6.56) and (6.57), the relations (6.69) and (6.70) imply that for f (z) to be
differentiable requires that the first-order partial derivatives of u(x, y) and v(x, y) with
respect to x and y should exist. Meanwhile, once f (z) is found to be differentiable
(or analytic), its higher order derivatives must be analytic as well (vide infra). This
requires, in turn, that the above first-order partial derivatives should be continuous.
(Note that the analyticity naturally leads to continuity.) Thus, the following theorem
will follow.
Theorem 6.9 Let f (z) be a complex function of a complex variable z such that f (z) = u(x, y) + iv(x, y), where x and y are real variables. Then, a necessary and sufficient condition for f (z) to be differentiable is that the first-order partial derivatives of u(x, y) and v(x, y) with respect to x and y exist and are continuous and that u(x, y) and v(x, y) satisfy the Cauchy–Riemann conditions.
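The Cauchy–Riemann conditions are easy to test symbolically. The following is a minimal sketch (not part of the original text; it assumes a Python environment with sympy installed) that verifies the conditions for the entire function z² and shows that they fail for z z̄ = |z|², which depends on z̄:

```python
# Check the Cauchy-Riemann conditions u_x = v_y and u_y = -v_x for two test
# functions: f = z**2 (analytic) and F = z*conj(z) = |z|**2 (not analytic).
import sympy as sp

x, y = sp.symbols('x y', real=True)
z = x + sp.I*y

def cauchy_riemann_holds(expr):
    u, v = sp.re(expr), sp.im(expr)
    return (sp.simplify(sp.diff(u, x) - sp.diff(v, y)) == 0 and
            sp.simplify(sp.diff(u, y) + sp.diff(v, x)) == 0)

print(cauchy_riemann_holds(sp.expand(z**2)))                # True
print(cauchy_riemann_holds(sp.expand(z*sp.conjugate(z))))   # False: depends on z-bar
```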
Now, we formally give definitions of the differentiability and analyticity of a complex function of a complex variable.

Definition 6.9 Let z_0 be a given point of the complex plane ℂ. Let f (z) be defined in a neighborhood containing z_0. If f (z) is single-valued and differentiable at all points of this neighborhood containing z_0, f (z) is said to be analytic at z_0. A point at which f (z) is analytic is called a regular point of f (z). A point at which f (z) is not analytic is called a singular point of f (z).

We emphasize that the above definition of analyticity at some point z_0 requires the single-valuedness and differentiability at all points of a neighborhood containing z_0. This can be understood from Fig. 6.10b and Example 6.1. The analyticity at a point is deeply connected to how and in which direction we take the limiting process. Thus, we need detailed information about the neighborhood of the point in question to determine whether the function is analytic at that point.

The next definition is associated with a global characteristic of the analytic function.

Definition 6.10 Let R be a region contained in the complex plane ℂ. Let f (z) be a complex function defined in R. If f (z) is analytic at all points of R, f (z) is said to be analytic in R. In this case R is called a domain of analyticity. If the domain of analyticity is the entire complex plane ℂ, the function is called an entire function.

To understand various characteristics of the analytic functions, it is indispensable to introduce Cauchy's integral formula. In the next section we deal with the integration of complex functions.
6.3 Integration of Analytic Functions: Cauchy's Integral Formula

We can define complex integration as a natural extension of the Riemann integral with respect to a real variable. Suppose that there is a curve C in the complex plane. Both ends of the curve are fixed and located at z_a and z_b. Suppose also that individual points of the curve C are described using a parameter t such that

z = z(t) = x(t) + iy(t)   (t_0 ≡ t_a ≤ t ≤ t_n ≡ t_b),   (6.71)

where z_0 ≡ z_a = z(t_a) and z_n ≡ z_b = z(t_b) together with z_i = z(t_i) (0 ≤ i ≤ n). For this parametrization, we assume that C is subdivided into n pieces designated by z_0, z_1, ⋯, z_n. Let f (z) be a complex function defined in a region containing C. In this situation, let us consider the following summation S_n:

S_n = Σ_{i=1}^{n} f(ζ_i)(z_i − z_{i−1}),   (6.72)

where ζ_i lies between z_{i−1} and z_i (1 ≤ i ≤ n). Taking the limit n → ∞ and concomitantly |z_i − z_{i−1}| → 0, we have

lim_{n→∞} S_n = lim_{n→∞} Σ_{i=1}^{n} f(ζ_i)(z_i − z_{i−1}).   (6.73)

If the constant limit exists independent of the choice of z_i (1 ≤ i ≤ n − 1) and ζ_i (1 ≤ i ≤ n), this limit is called a contour integral of f (z) along the curve C. We write it as

I = lim_{n→∞} S_n ≡ ∫_C f(z)dz = ∫_{z_a}^{z_b} f(z)dz.   (6.74)

Using (6.48) and z_i = x_i + iy_i, we rewrite (6.72) as

S_n = Σ_{i=1}^{n} [u(ζ_i)(x_i − x_{i−1}) − v(ζ_i)(y_i − y_{i−1})]
  + Σ_{i=1}^{n} i[v(ζ_i)(x_i − x_{i−1}) + u(ζ_i)(y_i − y_{i−1})].

Taking the limit of n → ∞ as well as |x_i − x_{i−1}| → 0 and |y_i − y_{i−1}| → 0 (0 ≤ i ≤ n), we have

I = ∫_C [u(x, y)dx − v(x, y)dy] + i∫_C [v(x, y)dx + u(x, y)dy].   (6.75)

Further rewriting (6.75), we get

I = ∫_C [u(x, y)(dx/dt) − v(x, y)(dy/dt)]dt + i∫_C [v(x, y)(dx/dt) + u(x, y)(dy/dt)]dt
  = ∫_C [u(x, y) + iv(x, y)](dx/dt)dt + i∫_C [u(x, y) + iv(x, y)](dy/dt)dt
  = ∫_C [u(x, y) + iv(x, y)][(dx/dt) + i(dy/dt)]dt = ∫_C [u(x, y) + iv(x, y)](dz/dt)dt   (6.76)
  = ∫_C f(z)(dz/dt)dt = ∫_{t_a}^{t_b} f(z)(dz/dt)dt = ∫_{z_a}^{z_b} f(z)dz.

Notice that the contour integral I of (6.74), (6.75), or (6.76) depends in general on the path connecting the points z_a and z_b. If so, (6.76) is merely of secondary importance. But if f (z) can be described by the derivative of another function, the situation becomes totally different. Suppose that we have

f(z) = dF̃(z)/dz.   (6.77)

If F̃(z) is analytic, f (z) is analytic as well, because a derivative of an analytic function is again analytic (vide infra). Then, we have

f(z)(dz/dt) = [dF̃(z)/dz](dz/dt) = dF̃[z(t)]/dt.   (6.78)

Inserting (6.78) into (6.76), we get

∫_C f(z)(dz/dt)dt = ∫_{z_a}^{z_b} f(z)dz = ∫_{t_a}^{t_b} {dF̃[z(t)]/dt}dt = F̃(z_b) − F̃(z_a).   (6.79)

In this case F̃(z) is called a primitive function of f (z). It is obvious from (6.79) that

∫_{z_a}^{z_b} f(z)dz = −∫_{z_b}^{z_a} f(z)dz.   (6.80)

This implies that the contour integral I does not depend on the path C connecting the points z_a and z_b. We must be careful, however, about a situation where there is a singular point z_s of f (z) in a region R. Then, f (z) is not defined at z_s. If one chooses C″ for a contour of the return path in such a way that z_s is on C″ (see Fig. 6.11), the RHS of (6.80) cannot exist, nor can (6.80) hold. To avoid that situation, we have to "expel" such a singularity from the region R so that R can wholly be contained in the domain of analyticity of f (z) and so that f (z) can be analytic in the entire region R. Moreover, we must choose R so that the whole region inside any closed path within R contains only points that belong to the domain of analyticity of f (z). That is, the region R must be simply connected in terms of the analyticity of f (z) (see Sect. 6.1). For a region R to be simply connected ensures that (6.80) holds for any pair of points z_a and z_b in R, so far as the integration path with respect to (6.80) is contained in R.

[Fig. 6.11 Region R and several contours for integration. Regarding the symbols and notations, see text]
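Path independence, (6.79) and (6.80), is easy to observe numerically. Here is a minimal sketch (not from the original text; it assumes Python with numpy) that integrates the entire function f(z) = z² from z_a = 0 to z_b = 1 + i along two different paths and compares the results with F̃(z_b) − F̃(z_a) for the primitive F̃(z) = z³/3:

```python
# Contour integrals of f(z) = z**2 along two paths from 0 to 1+1j; both agree
# with the primitive-function value (1+1j)**3/3, cf. (6.79)-(6.80).
import numpy as np

def contour_integral(f, path, n=200001):
    # Riemann sum over a parametrized path z(t), t in [0, 1]
    t = np.linspace(0.0, 1.0, n)
    z = path(t)
    mid = 0.5*(z[1:] + z[:-1])
    return np.sum(f(mid) * np.diff(z))

f = lambda z: z**2
straight = lambda t: (1 + 1j)*t                               # straight segment
bent = lambda t: np.where(t < 0.5, 2*t + 0j, 1 + 1j*(2*t - 1))  # via z = 1

print(contour_integral(f, straight))   # ~ (-2+2j)/3
print(contour_integral(f, bent))       # same value
print((1 + 1j)**3/3)
```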
From (6.80), we further get

∫_{z_a}^{z_b} f(z)dz + ∫_{z_b}^{z_a} f(z)dz = 0.   (6.81)

Since the integral does not depend on the paths, we can take a path from z_b to z_a along a curve C′ (see Fig. 6.11). Thus, we have

∫_C f(z)dz + ∫_{C′} f(z)dz = ∫_{C+C′} f(z)dz = 0,   (6.82)

where C + C′ constitutes a closed curve C̃ as shown. In this way, in accordance with (6.81) we get

∮_{C̃} f(z)dz = 0,   (6.83)

where C̃ is followed counterclockwise according to the custom.

In (6.83) any closed path C̃ can be chosen for the contour within the simply connected region R. Hence, we reach the following important theorem.

Theorem 6.10: Cauchy's Integral Theorem Let R be a simply connected region and let C be an arbitrary closed curve within R. Let f (z) be analytic there. Then, we have

∮_C f(z)dz = 0.   (6.84)

[Fig. 6.12 Region R and contour C for integration. (a) A singular point at z_s is contained within R. (b) The singular point at z_s is not contained within R̃ so that f(z) can be analytic in a simply connected region R̃. Regarding the symbols and notations, see text]

There is a variation in the description of Cauchy's integral theorem. For instance, let f (z) be analytic in R except for a singular point at z_s; see Fig. 6.12a. Then, the domain of analyticity of f (z) is not identical to R, but is defined as R − {z_s}. That is, the domain of analyticity of f (z) is no longer simply connected and, hence, (6.84) is not generally true. In such a case, we may deform the integration path of f (z) so that its domain of analyticity can be simply connected and (6.84) can hold. For example, we take R̃ so that it is surrounded by the curves C and Γ̃ as well as the lines L_1 and L_2 (Fig. 6.12b). As a result, z_s has been expelled from R̃ and, at the same time, R̃ becomes simply connected. Then, as a tangible form of (6.84) we get

∮_{C+Γ̃+L_1+L_2} f(z)dz = 0,   (6.85)

where Γ̃ denotes the clockwise integration (see Fig. 6.12b) and the lines L_1 and L_2 are rendered as close to each other as possible. In (6.85) the integrations along L_1 and L_2 cancel, because they are taken in reverse directions. Thus, we have

∮_{C+Γ̃} f(z)dz = 0  or  ∮_C f(z)dz = ∮_Γ f(z)dz,   (6.86)

where Γ stands for the counterclockwise integration. Equation (6.86) is valid as well when there are more singular points. In that case, instead of (6.86) we have

∮_C f(z)dz = Σ_i ∮_{Γ_i} f(z)dz,   (6.87)
where Γ_i is taken so that it encircles an individual singular point. Reflecting the nature of the singularities, we get different results with (6.87). We will come back to this point later.

On the basis of Cauchy's integral theorem, we show an integral representation of an analytic function that is well known as Cauchy's integral formula.

[Fig. 6.13 Simply connected region R encircled by a closed contour C in a complex plane ζ. A circle Γ of radius ρ centered at z is contained within R. The contour integration is taken in the counterclockwise direction along C or Γ]

Theorem 6.11: Cauchy's Integral Formula Let f (z) be analytic in a simply connected region R. Let C be an arbitrary closed curve within R that encircles z. Then, we have

f(z) = (1/2πi)∮_C [f(ζ)/(ζ − z)]dζ,   (6.88)

where the contour integration along C is taken in the counterclockwise direction.

Proof Let Γ be a circle of radius ρ in the complex plane so that z is the center of the circle (see Fig. 6.13). Then, f (ζ)/(ζ − z) has a singularity at ζ = z. Therefore, we must evaluate (6.88) using (6.86). From (6.86), we have

∮_C [f(ζ)/(ζ − z)]dζ = ∮_Γ [f(ζ)/(ζ − z)]dζ = f(z)∮_Γ dζ/(ζ − z) + ∮_Γ {[f(ζ) − f(z)]/(ζ − z)}dζ.   (6.89)

An arbitrary point ζ on the circle Γ is described as

ζ = z + ρe^{iθ}.   (6.90)

Then, taking infinitesimal quantities of (6.90), we have

dζ = ρe^{iθ}i dθ = (ζ − z)i dθ.   (6.91)

Inserting (6.91) into (6.89), we get
∮_Γ dζ/(ζ − z) = ∫_0^{2π} i dθ = 2πi.   (6.92)

Rewriting (6.89), we obtain

∮_Γ [f(ζ)/(ζ − z)]dζ = 2πi f(z) + ∮_Γ {[f(ζ) − f(z)]/(ζ − z)}dζ.   (6.93)

Since f (z) is analytic, f (z) is uniformly continuous [6]. Therefore, if we make ρ small enough, then for an arbitrary positive number ε we have | f(ζ) − f(z)| < ε when |ζ − z| = ρ. Considering this situation, we have

|(1/2πi)∮_C [f(ζ)/(ζ − z)]dζ − f(z)| = |(1/2πi)∮_Γ [f(ζ)/(ζ − z)]dζ − f(z)|
  = |(1/2πi)∮_Γ {[f(ζ) − f(z)]/(ζ − z)}dζ| ≤ (1/2π)∮_Γ [| f(ζ) − f(z)|/|ζ − z|]|dζ|
  < (ε/2π)∮_Γ |dζ|/|ζ − z| = (ε/2π)∫_0^{2π} dθ = ε.   (6.94)

Therefore, we get (6.88) in the limit of ρ → 0. This completes the proof. ∎


In the above proof, the domain of analyticity R′ of f (ζ)/(ζ − z) (with respect to ζ) is given by R′ = R ∩ [ℂ − {z}] = R − {z}. Since both R and ℂ − {z} are open sets (i.e., ℂ − {z} = {z}^c is the complementary set of the closed set {z} and, hence, an open set), R′ is an open set as well; see Sect. 6.1. Notice that R′ is not simply connected. For this reason, we had to evaluate (6.88) using (6.89).

Theorem 6.10 (Cauchy's integral theorem) and Theorem 6.11 (Cauchy's integral formula) play a crucial role in the theory of analytic functions.
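Cauchy's integral formula lends itself to a direct numerical check. The following sketch (not from the original text; it assumes Python with numpy) evaluates (6.88) on the unit circle for the entire function e^z and recovers the value of the function at an interior point:

```python
# Numerical check of Cauchy's integral formula (6.88) for f(z) = exp(z):
# (1/2*pi*i) * contour sum of f(zeta)/(zeta - z) over the unit circle.
import numpy as np

def cauchy_formula(f, z, center=0.0, radius=1.0, n=4000):
    theta = np.linspace(0.0, 2*np.pi, n, endpoint=False)
    zeta = center + radius*np.exp(1j*theta)
    dzeta = 1j*radius*np.exp(1j*theta)*(2*np.pi/n)   # d(zeta) per step
    return np.sum(f(zeta)/(zeta - z)*dzeta)/(2j*np.pi)

z0 = 0.3 - 0.2j
print(cauchy_formula(np.exp, z0))   # ~ exp(0.3 - 0.2j)
print(np.exp(z0))
```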
To derive (6.94), we can equally use the Darboux inequality [5]. It is intuitively obvious and frequently used in the theory of analytic functions.
Theorem 6.12: Darboux Inequality Let f (z) be a function for which | f (z)| is bounded on C. Here C is a piecewise continuous path in the complex plane. Then, for the integral I described by

I = ∫_C f(z)dz,

we have

|∫_C f(z)dz| ≤ max| f | · L,   (6.95)

where L represents the arc length of the curve C for the contour integration.
Proof As discussed earlier in this section, the integral is the limit as n → ∞ of the sum described by

S_n = Σ_{i=1}^{n} f(ζ_i)(z_i − z_{i−1})   (6.72)

with

I = lim_{n→∞} S_n = lim_{n→∞} Σ_{i=1}^{n} f(ζ_i)(z_i − z_{i−1}).   (6.73)

Denoting the maximum modulus of f (z) on C by max| f |, we have

|S_n| ≤ Σ_{i=1}^{n} | f(ζ_i)| · |z_i − z_{i−1}| ≤ max| f | Σ_{i=1}^{n} |z_i − z_{i−1}|.   (6.96)

The sum Σ_{i=1}^{n} |z_i − z_{i−1}| on the RHS of inequality (6.96) is the length of a polygon inscribed in the curve C. It is shorter than the arc length L of the curve C. Hence, for all n we have

|S_n| ≤ max| f | · L.

As n → ∞, |S_n| = |I| = |∫_C f(z)dz| ≤ max| f | · L. This completes the proof. ∎
Applying the Darboux inequality to (6.94), we get

|∮_Γ {[f(ζ) − f(z)]/(ζ − z)}dζ| = |∮_Γ {[f(ζ) − f(z)]/ρe^{iθ}}dζ| ≤ max|[f(ζ) − f(z)]/ρe^{iθ}| · 2πρ < 2πε.

That is,

|(1/2πi)∮_Γ {[f(ζ) − f(z)]/(ζ − z)}dζ| < ε.   (6.97)

So far, we have dealt with differentiation and integration as different mathematical manipulations. But Cauchy's integral formula (or Cauchy's integral expression) enables us to relate and unify these two manipulations. We have the following proposition for this.
Proposition 6.1 [5] Let C be a piecewise continuous curve of finite length. (The curve C may or may not be closed.) Let f (z) be a continuous function. The contour integration of f (ζ)/(ζ − z) gives f̃(z) such that

f̃(z) = (1/2πi)∫_C [f(ζ)/(ζ − z)]dζ.   (6.98)

Then, f̃(z) is analytic at any point z that does not lie on C.

Proof We consider the following expression:

Δ ≡ |[f̃(z + Δz) − f̃(z)]/Δz − (1/2πi)∫_C [f(ζ)/(ζ − z)²]dζ|,   (6.99)

where Δ is a real non-negative number. Describing (6.99) by use of (6.98) for f̃(z + Δz) and f̃(z), we get

Δ = (|Δz|/2π)|∫_C {f(ζ)/[(ζ − z − Δz)(ζ − z)²]}dζ|.   (6.100)

To obtain (6.100), in combination with (6.98) we have calculated (6.99) as follows:

Δ = (1/2π|Δz|)|∫_C {[f(ζ)(ζ − z)² − f(ζ)(ζ − z − Δz)(ζ − z) − f(ζ)Δz(ζ − z − Δz)]/[(ζ − z − Δz)(ζ − z)²]}dζ|
  = (1/2π|Δz|)|∫_C {f(ζ)(Δz)²/[(ζ − z − Δz)(ζ − z)²]}dζ|.

Since z is not on C, the integrand of (6.100) is bounded. Then, as Δz → 0, |Δz| → 0 and Δ → 0 accordingly. From (6.99), this ensures the differentiability of f̃(z). Then, by definition of the differentiation, df̃(z)/dz is given by

df̃(z)/dz = (1/2πi)∫_C [f(ζ)/(ζ − z)²]dζ.   (6.101)

As f (ζ) is continuous, f̃(z) is single-valued. Thus, f̃(z) is found to be analytic at any point z that does not lie on C. This completes the proof. ∎
If in (6.101) we take C as a closed contour that encircles z, from (6.98) we have

f̃(z) = (1/2πi)∮_C [f(ζ)/(ζ − z)]dζ.   (6.102)

Moreover, if f (z) is analytic in a simply connected region that contains C, from Theorem 6.11 we must have

f(z) = (1/2πi)∮_C [f(ζ)/(ζ − z)]dζ.   (6.103)

Comparing (6.102) with (6.103) and taking account of the single-valuedness of f (z), we must have

f(z) ≡ f̃(z).   (6.104)

Then, from (6.101) we get

df(z)/dz = (1/2πi)∮_C [f(ζ)/(ζ − z)²]dζ.   (6.105)

An analogous result holds for the n-th derivative of f (z) such that [5]

d^n f(z)/dz^n = (n!/2πi)∮_C [f(ζ)/(ζ − z)^{n+1}]dζ   (n: zero or positive integers).   (6.106)

Equation (6.106) implies that an analytic function is infinitely differentiable and that the derivatives of all orders of an analytic function are again analytic. These prominent properties arise partly from the aforementioned stringent requirement on the differentiability of a function of a complex variable.

So far, we have not assumed the continuity of df(z)/dz. However, once we have established (6.106), it assures the existence of, e.g., d²f(z)/dz² and, hence, the continuity of df(z)/dz. The same is true of d^n f(z)/dz^n for any n (zero or positive integers).
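The derivative formula (6.106) can also be checked numerically. Here is a minimal sketch (not from the original text; Python with numpy assumed) that evaluates the contour integral for n = 1 and n = 2 with f(z) = sin z on a small circle around z:

```python
# Numerical check of (6.106): the n-th derivative of f(z) = sin(z) from a
# contour integral (n!/2*pi*i) * sum of f(zeta)/(zeta - z)**(n+1) d(zeta).
import numpy as np
from math import factorial

def nth_derivative(f, z, n, radius=0.5, m=4000):
    theta = np.linspace(0.0, 2*np.pi, m, endpoint=False)
    zeta = z + radius*np.exp(1j*theta)
    dzeta = 1j*radius*np.exp(1j*theta)*(2*np.pi/m)
    return factorial(n)/(2j*np.pi)*np.sum(f(zeta)/(zeta - z)**(n + 1)*dzeta)

z0 = 0.7 + 0.1j
print(nth_derivative(np.sin, z0, 1), np.cos(z0))    # f'(z)  = cos z
print(nth_derivative(np.sin, z0, 2), -np.sin(z0))   # f''(z) = -sin z
```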
The following theorem is important and intriguing in connection with analytic functions.
Theorem 6.13: Cauchy–Liouville Theorem A bounded entire function must be a constant.

Proof Using (6.106), we consider the first derivative of an entire function f (z) described as

df/dz = (1/2πi)∮_C [f(ζ)/(ζ − z)²]dζ.

Since f (z) is an entire function, we can choose an arbitrarily large circle of radius R centered at z for the closed contour C. On the circle, we have
ζ = z + Re^{iθ},

where θ is a real number changing from 0 to 2π. Then, the above equation can be rewritten as

df/dz = (1/2πi)∫_0^{2π} [f(ζ)/(Re^{iθ})²]iRe^{iθ}dθ = (1/2πR)∫_0^{2π} [f(ζ)/e^{iθ}]dθ.   (6.107)

Taking the absolute value of both sides, we have

|df/dz| ≤ (1/2πR)∫_0^{2π} | f(ζ)|dθ ≤ (M/2πR)∫_0^{2π} dθ = M/R,   (6.108)

where M is the maximum of | f (ζ)|. As R → ∞, |df/dz| tends to zero. This implies that df/dz vanishes and, hence, that f (z) is constant. This completes the proof. ∎
At first glance, the Cauchy–Liouville theorem looks astonishing in terms of the theory of real analysis. It is because we are too familiar with, e.g., −1 ≤ sin x ≤ 1 for any real number x. Note that sin x is bounded in the real domain. In fact, in Sect. 8.6 we will show from (8.98) and (8.99) that, when a → ±∞ (a: real, a ≠ 0), sin(π/2 + ia) takes real values with sin(π/2 + ia) → +∞. This simple example clearly shows that sin ϕ is an unbounded entire function in the complex domain. As a very familiar example of a bounded entire function, we show

f(z) = cos²z + sin²z ≡ 1,   (6.109)

which is defined in the entire complex plane.
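The unboundedness of sin z off the real axis is immediately visible numerically. A minimal sketch (not from the original text; Python with numpy assumed):

```python
# sin(pi/2 + i*a) = cosh(a) is real and grows without bound, even though
# |sin x| <= 1 on the real axis.
import numpy as np

for a in [0.0, 1.0, 5.0, 10.0]:
    val = np.sin(np.pi/2 + 1j*a)
    print(a, val, np.cosh(a))   # val is (numerically) real and equals cosh(a)
```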

6.4 Taylor’s Series and Laurent’s Series

Using Cauchy’s integral formula, we study Taylor’s series and Laurent’s series in
relation to the power series expansion of analytic function. Taylor’s series and
Laurent’s series are fundamental tools to study various properties of complex
functions in the theory of analytic functions. First, let us examine Taylor’s series
of an analytic function.
Theorem 6.14 [6] Any analytic function f (z) can be expressed by uniformly
convergent power series at an arbitrary regular point of the function in the domain
of analyticity R .
6.4 Taylor’s Series and Laurent’s Series 217

Fig. 6.14 Diagram to


explain Taylor’s expansion
that is assumed to be ℛ
performed around a. A
circle C of radius r centered
at a is inside R . Regarding
other symbols and notations,
see text

Proof Let us assume that we have a circle C of radius r centered at a within R. Choose z inside C with |z − a| = ρ (ρ < r) and consider the contour integration of f (z) along C (Fig. 6.14). Then, we have

1/(ζ − z) = 1/[ζ − a − (z − a)] = [1/(ζ − a)] · 1/[1 − (z − a)/(ζ − a)]
  = 1/(ζ − a) + (z − a)/(ζ − a)² + (z − a)²/(ζ − a)³ + ⋯,   (6.110)

where ζ is an arbitrary point on the circle C. To derive (6.110), we used the following formula:

1/(1 − x) = Σ_{n=0}^{∞} x^n,

where x = (z − a)/(ζ − a). Since |(z − a)/(ζ − a)| = ρ/r < 1, the geometric series of (6.110) converges. Suppose that f (ζ) is analytic on C. Then, we have a finite positive number M on C such that

| f(ζ)| < M.   (6.111)

Using (6.110), we have

f(ζ)/(ζ − z) = f(ζ)/(ζ − a) + (z − a)f(ζ)/(ζ − a)² + (z − a)²f(ζ)/(ζ − a)³ + ⋯.   (6.112)

Hence, combining (6.111) and (6.112) we get
|Σ_{ν=n}^{∞} (z − a)^ν f(ζ)/(ζ − a)^{ν+1}| ≤ (M/r)Σ_{ν=n}^{∞} (ρ/r)^ν
  = (M/r)(ρ/r)^n/(1 − ρ/r) = [M/(r − ρ)](ρ/r)^n.   (6.113)

The RHS of (6.113) tends to zero as n → ∞. Therefore, (6.112) is uniformly convergent with respect to ζ on C. In the above calculations, when we express the infinite series of the RHS in (6.112) as Σ and the partial sum of its first n terms as Σ_n, the LHS of (6.113) can be expressed as Σ − Σ_n (≡ P_n). Since |P_n| → 0 as n → ∞, this certainly shows that the series Σ is uniformly convergent [6].

Consequently, we can perform termwise integration [6] of (6.112) on C and subsequently divide the result by 2πi to get

(1/2πi)∮_C [f(ζ)/(ζ − z)]dζ = Σ_{n=0}^{∞} [(z − a)^n/2πi]∮_C [f(ζ)/(ζ − a)^{n+1}]dζ.   (6.114)

From Theorem 6.11, the LHS of (6.114) equals f (z). Putting

A_n ≡ (1/2πi)∮_C [f(ζ)/(ζ − a)^{n+1}]dζ,   (6.115)

we get

f(z) = Σ_{n=0}^{∞} A_n(z − a)^n.   (6.116)

This completes the proof. ∎


In the above proof, from (6.106) we have

(1/2πi)∮_C [f(ζ)/(ζ − z)^{n+1}]dζ = (1/n!)d^n f(z)/dz^n.   (6.117)

Hence, combining (6.117) with (6.115) we get

A_n = (1/2πi)∮_C [f(ζ)/(ζ − a)^{n+1}]dζ = (1/n!)[d^n f(z)/dz^n]|_{z=a} = (1/n!)d^n f(a)/dz^n.   (6.118)
[Fig. 6.15 Diagram used for calculating the Taylor's series of f(z) = 1/z around z = a (a ≠ 0). A contour C encircles the point a]

Thus, (6.116) can be rewritten as

f(z) = Σ_{n=0}^{∞} (1/n!)[d^n f(a)/dz^n](z − a)^n.

This is the same form as that obtained with respect to a real variable.

Equation (6.116) is a uniformly convergent power series called a Taylor's series or Taylor's expansion of f (z) with respect to z = a, with A_n being its coefficients. In Theorem 6.14 we have assumed that the union of a circle C and its inside is simply connected. This topological environment is virtually the same as that of Fig. 6.13. In Theorem 6.11 (Cauchy's integral formula) we have shown the integral representation of an analytic function. On the basis of Cauchy's integral formula, Theorem 6.14 demonstrates that any analytic function is given a tangible functional form.

Example 6.2 Let us think of a function f (z) = 1/z. This function is analytic except for z = 0. Therefore, it can be expanded in a Taylor's series in the region ℂ − {0}. We consider the expansion around z = a (a ≠ 0). Using (6.115), the coefficients of the expansion are

A_n = (1/2πi)∮_C {1/[z(z − a)^{n+1}]}dz,

where C is a closed curve that does not contain z = 0 on C or in its inside. Therefore, 1/z is analytic in the simply connected region encircled by C, chosen so that the point z = 0 is not contained inside C (Fig. 6.15). Then, from (6.106) we get

(1/2πi)∮_C {1/[z(z − a)^{n+1}]}dz = (1/n!)[d^n(1/z)/dz^n]|_{z=a} = (1/n!)(−1)^n n!/a^{n+1} = (−1)^n/a^{n+1}.

Hence, from (6.116) we have
f(z) = 1/z = Σ_{n=0}^{∞} [(−1)^n/a^{n+1}](z − a)^n = (1/a)Σ_{n=0}^{∞} (1 − z/a)^n.   (6.119)

The series converges uniformly within a convergence circle, called a "ball" [7] (vide infra), whose convergence radius r is estimated to be

1/r = lim sup_{n→∞} |(−1)^n/a^{n+1}|^{1/n} = lim_{n→∞} (1/|a|)^{(n+1)/n} = 1/|a|.   (6.120)

That is, r = |a|. In (6.120) "lim sup_{n→∞}" stands for the superior limit and is sometimes called limes superior [6]. The estimation is due to the Cauchy–Hadamard theorem. Readers interested in the theorem and related concepts are referred to appropriate literature [6].
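The coefficients of (6.119) are easy to reproduce symbolically. A minimal sketch (not from the original text; Python with sympy assumed) computes A_n = f^(n)(a)/n! for f(z) = 1/z, cf. (6.118):

```python
# Taylor coefficients of 1/z about z = a: expect (-1)**n / a**(n+1).
import sympy as sp

z, a = sp.symbols('z a', nonzero=True)
f = 1/z
for n in range(4):
    A_n = sp.diff(f, z, n).subs(z, a)/sp.factorial(n)
    print(n, sp.simplify(A_n))   # 1/a, -1/a**2, 1/a**3, -1/a**4
```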
The concept of the convergence circle and radius is very important for defining the analyticity of a function. Formally, the convergence radius is defined as the radius of a convergence circle in a metric space, typically ℝ² and ℂ. In ℂ a region R within the convergence circle of convergence radius r (i.e., a real positive number) centered at z_0 is denoted by

R = {z; z, ∃z_0 ∈ ℂ, ρ(z, z_0) < r}.

The region R is an open set by definition (see Sect. 6.1) and the Taylor's series defined in R is convergent. On the other hand, the Taylor's series may or may not be convergent at a point z that lies on the convergence circle. In Example 6.2, f (z) = 1/z is not defined at z = 0, even though z (= 0) is on the convergence circle. At other points on the convergence circle, however, the Taylor's series is convergent.

Note that a region B_n in ℝ^n similarly defined as

B_n = {x; x, ∃x_0 ∈ ℝ^n, ρ(x, x_0) < r}

is sometimes called a ball or open ball. This concept can readily be extended to any metric space.
Next, we examine Laurent's series of an analytic function.

Theorem 6.15 [6] Let f (z) be analytic in a region R except for a point a. Then, f (z) can be described by a uniformly convergent power series within the region R − {a}.

Proof Let us assume that we have a circle C_1 of radius r_1 centered at a and another circle C_2 of radius r_2 centered at a, both within R. Moreover, let another circle Γ be centered at z (≠ a) within R so that z is outside of C_2; see Fig. 6.16.

[Fig. 6.16 Diagram to explain Laurent's expansion that is performed around a. The circle Γ containing the point z lies in the annular region bounded by the circles C_1 and C_2]

In this situation we consider the contour integration along C_1, C_2, and Γ. Using (6.87), we have

(1/2πi)∮_Γ [f(ζ)/(ζ − z)]dζ = (1/2πi)∮_{C_1} [f(ζ)/(ζ − z)]dζ − (1/2πi)∮_{C_2} [f(ζ)/(ζ − z)]dζ.   (6.121)

From Theorem 6.11, we have

LHS of (6.121) = f(z).

Notice that the region containing Γ and its inside forms a simply connected region in terms of the analyticity of f (z). For the first term of the RHS of (6.121), from Theorem 6.14 we have

(1/2πi)∮_{C_1} [f(ζ)/(ζ − z)]dζ = Σ_{n=0}^{∞} A_n(z − a)^n   (6.122)

with

A_n = (1/2πi)∮_{C_1} [f(ζ)/(ζ − a)^{n+1}]dζ   (n = 0, 1, 2, ⋯).   (6.123)

In fact, if ζ lies on the circle C_1, the Taylor's series is uniformly convergent as in the case of the proof of Theorem 6.14.

For the second term of the RHS of (6.121), however, the situation is different in that z lies outside the circle C_2. In this case, from Fig. 6.16 we have

|(ζ − a)/(z − a)| = r_2/ρ < 1.   (6.124)

Then, a geometric series described by
1/(ζ − z) = −1/[z − a − (ζ − a)] = −[1/(z − a)][1 + (ζ − a)/(z − a) + (ζ − a)²/(z − a)² + ⋯]   (6.125)

is uniformly convergent with respect to ζ on C_2. Accordingly, again we can perform termwise integration [6] of (6.125) on C_2. Hence, we get

−∮_{C_2} [f(ζ)/(ζ − z)]dζ = [1/(z − a)]∮_{C_2} f(ζ)dζ + [1/(z − a)²]∮_{C_2} f(ζ)(ζ − a)dζ
  + [1/(z − a)³]∮_{C_2} f(ζ)(ζ − a)²dζ + ⋯.   (6.126)

That is, we have

−(1/2πi)∮_{C_2} [f(ζ)/(ζ − z)]dζ = Σ_{n=−1}^{−∞} A_n(z − a)^n,   (6.127)

where

A_{−n} = (1/2πi)∮_{C_2} f(ζ)(ζ − a)^{n−1}dζ   (n = 1, 2, 3, ⋯).   (6.128)

Consequently, from (6.121), with z lying in the annular region bounded by C_1 and C_2, we get

f(z) = Σ_{n=−∞}^{∞} A_n(z − a)^n.   (6.129)

In (6.129), the coefficients A_n are given by (6.123) or (6.128) according to the plus (including zero) or minus sign of n. These complete the proof. ∎
Equation (6.129) is a uniformly convergent power series called a Laurent's series or Laurent's expansion of f (z) with respect to z = a, with A_n being its coefficients. Note that the above annular region bounded by C_1 and C_2 is not simply connected, but multiply connected.

We add that if the integration path C̃ is taken in the annular region formed by C_1 and C_2 so that it is sandwiched in between them, the values of the integrals expressed as (6.123) and (6.128) remain unchanged in virtue of (6.87). That is, instead of (6.123) and (6.128) we may have the following unified formula

A_n = (1/2πi)∮_{C̃} [f(ζ)/(ζ − a)^{n+1}]dζ   (n = 0, ±1, ±2, ⋯)   (6.130)

that represents the coefficients A_n of the power series

f(z) = Σ_{n=−∞}^{∞} A_n(z − a)^n.   (6.129)
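Laurent expansions can be produced symbolically. A minimal sketch (not from the original text; Python with sympy assumed) expands a function with a second-order pole at z = 0 and exposes the singular part of (6.129):

```python
# Laurent expansion of f(z) = 1/(z**2*(1 - z)) about z = 0:
# the series contains the singular part 1/z**2 + 1/z plus a Taylor part.
import sympy as sp

z = sp.symbols('z')
f = 1/(z**2*(1 - z))
print(sp.series(f, z, 0, 4))
# expected terms: z**(-2) + 1/z + 1 + z + z**2 + z**3 + O(z**4)
```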

6.5 Zeros and Singular Points

We have described the fundamental theorems of Taylor's expansion and Laurent's expansion. Next, we explain the important concepts of zeros and singular points.

Definition 6.11 Let f (z) be analytic in a region R. If f (z) vanishes at a point z = a, the point is called a zero of f (z). When we have

f(a) = [df(z)/dz]|_{z=a} = [d²f(z)/dz²]|_{z=a} = ⋯ = [d^{n−1}f(z)/dz^{n−1}]|_{z=a} = 0,   (6.131)

but

[d^n f(z)/dz^n]|_{z=a} ≠ 0,   (6.132)

the function is said to have a zero of order n at z = a.

If (6.131) and (6.132) hold, the first n coefficients in the Taylor's series of the analytic function f (z) at z = a vanish. Hence, we have

f(z) = A_n(z − a)^n + A_{n+1}(z − a)^{n+1} + ⋯ = (z − a)^n Σ_{k=0}^{∞} A_{n+k}(z − a)^k = (z − a)^n h(z),

where we define h(z) as

h(z) ≡ Σ_{k=0}^{∞} A_{n+k}(z − a)^k.   (6.133)

Then, h(z) is analytic and nonvanishing at z = a. From the analyticity, h(z) must be continuous at z = a and differ from zero in some finite neighborhood of z = a. Consequently, it is also the case with f (z). Therefore, if the set of zeros had an accumulation point at z = a, any neighborhood of z = a would contain another zero, in contradiction to the above statement. To avoid this contradiction, the analytic function must satisfy f (z) ≡ 0 throughout the region R. Taking the contraposition of the above statement, if an analytic function is not identically zero [i.e., f (z) ≢ 0], the zeros of that function are isolated; see the relevant discussion of Sect. 6.1.2.
Meanwhile, the Laurent's series (6.129) can be rewritten as

f(z) = Σ_{k=0}^{∞} A_k(z − a)^k + Σ_{k=1}^{∞} A_{−k}/(z − a)^k   (6.134)

so that the constitution of the Laurent's series can explicitly be recognized. The second term is called a singular part (or principal part) of f (z) at z = a. The singularity is classified as follows:

(1) If (6.134) lacks the singular part (i.e., A_{−k} = 0 for k = 1, 2, ⋯), the singular point is said to be a removable singularity. In this case, we have

f(z) = Σ_{k=0}^{∞} A_k(z − a)^k.   (6.135)

If in (6.135) we suitably define

f(a) ≡ A_0,

f (z) can be regarded as analytic. Examples include the following case, where the sine function is expanded in a Taylor's series:

sin z = z − z³/3! + z⁵/5! − ⋯ + (−1)^{n−1}z^{2n−1}/(2n − 1)! + ⋯ = Σ_{n=1}^{∞} (−1)^{n−1}z^{2n−1}/(2n − 1)!.   (6.136)

Hence, if we define

f(z) ≡ sin z/z  and  f(0) ≡ lim_{z→0} (sin z/z) = 1,   (6.137)

z = 0 is a removable singularity.

(2) Suppose that in (6.134) we have a certain positive integer n such that

A_{−n} ≠ 0  but  A_{−(n+1)} = A_{−(n+2)} = ⋯ = 0;   (6.138)

then the function f (z) is said to have a pole of order n at z = a. If, in particular, n = 1 in (6.138), i.e.,

A_{−1} ≠ 0  but  A_{−2} = A_{−3} = ⋯ = 0,   (6.139)

the function f (z) is said to have a simple pole. This special case is important in the calculation of various integrals (vide infra; see also the sketch following this classification). In the above cases, (6.134) can be rewritten as
f(z) = g(z)/(z − a)^n,   (6.140)

where g(z) defined as

g(z) ≡ Σ_{k=0}^{∞} A_{k−n}(z − a)^k   (6.141)

is analytic and nonvanishing at z = a, i.e., A_{−n} ≠ 0. Explicitly writing (6.140), we have

f(z) = Σ_{k=0}^{∞} A_k(z − a)^k + Σ_{k=1}^{n} A_{−k}/(z − a)^k.

(3) If the singular part of (6.134) comprises an infinite series, the point z = a is called an essential singularity. In that case, f (z) exhibits complicated behavior near or at z = a. Interested readers are referred to suitable literature [5].

A function f (z) that is analytic in a region of ℂ except at a set of points of the region where the function has poles is called a meromorphic function in the said region. The above definition of meromorphic functions covers Cases (1) and (2), but not Case (3). Henceforth, we will be dealing with meromorphic functions.

In the above discussion of Cases (1) and (2), we often deal with a function f (z) that has a single isolated pole at z = a. This implies that f (z) is analytic within a certain neighborhood N_a of z = a but is not analytic at z = a. More specifically, f (z) is analytic in the region N_a − {a}. Even though we consider more than one isolated pole, the situation is essentially the same. Suppose that there is another isolated pole at z = b. In that case, we again take a certain neighborhood N_b of z = b (≠ a) and f (z) is analytic in the region N_b − {b}. Readers may well wonder why we have to discuss this seemingly trifling issue. Nevertheless, think of the situation where a set of poles has an accumulation point. Any neighborhood of the accumulation point contains another pole, where the function is not analytic. This contradicts the statement that f (z) is analytic in a region such as N_a − {a}. Thus, if such a function were present, it would be intractable to deal with.
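The classification into removable singularities and poles can be read off directly from symbolic Laurent expansions. A minimal sketch (not from the original text; Python with sympy assumed):

```python
# sin(z)/z has a removable singularity at z = 0 (no singular part), whereas
# sin(z)/z**4 has a pole of order 3: its Laurent series starts at z**(-3).
import sympy as sp

z = sp.symbols('z')
print(sp.series(sp.sin(z)/z, z, 0, 6))      # 1 - z**2/6 + z**4/120 + O(z**6)
print(sp.series(sp.sin(z)/z**4, z, 0, 2))   # z**(-3) - 1/(6*z) + z/120 + O(z**2)
```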

6.6 Analytic Continuation

When we discussed Cauchy's integral formula (Theorem 6.11), we found that if a function is analytic in a certain region of ℂ and on a curve C that encircles the region, the values of the function within the region are determined once the values of the function on C are given. We have the following theorem related to this.
[Fig. 6.17 Four convergence circles C_1, C_2, C_3, and C_4 for 1/z. These convergence circles are centered at 1, i, −1, or −i]

Theorem 6.16 Let f_1(z) and f_2(z) be two functions that are analytic within a region R. Suppose that the two functions coincide in a set chosen from among (i) a neighborhood of a point z ∈ R, or (ii) a segment of a curve lying in R, or (iii) a set of points containing an accumulation point belonging to R. Then, the two functions f_1(z) and f_2(z) coincide throughout R.

Proof Suppose that f_1(z) and f_2(z) coincide on the above set chosen from the three. Then, since f_1(z) − f_2(z) = 0 on that set, the set comprises the zeros of f_1(z) − f_2(z). This implies that the zeros are not isolated in R. Thus, as evidenced from the discussion of Sect. 6.5, we conclude that f_1(z) − f_2(z) ≡ 0 throughout R. That is, f_1(z) ≡ f_2(z). This completes the proof. ∎

The theorem can be understood as follows: Two different analytic functions cannot coincide in a set chosen from the above three (or, more succinctly, a set that contains an accumulation point). In other words, the behavior of an analytic function in R is uniquely determined by the limited information of a subset belonging to R.

The above statement is reminiscent of the pre-established harmony and, hence, is occasionally talked about mysteriously. Instead, we wish to give a simple example.
Example 6.3 In Example 6.2 we described a tangible illustration of the Taylor's expansion of the function 1/z as follows:

1/z = Σ_{n=0}^{∞} [(−1)^n/a^{n+1}](z − a)^n = (1/a)Σ_{n=0}^{∞} (1 − z/a)^n.   (6.119)

If we choose 1, i, −1, or −i for a, we can draw four convergence circles as depicted in Fig. 6.17. Let each function described by (6.119) be f_1(z), f_i(z), f_{−1}(z), or f_{−i}(z), respectively. Let R be the region that consists of the outer periphery and the inside of the four circles C_1,
C_2, C_3, and C_4 (i.e., convergence circles of radius 1). The said region R is colored pale green in Fig. 6.17. There are four petals as shown.

We can view Fig. 6.17 as follows: For instance, f_1(z) and f_i(z) are defined in C_1 and its inside and in C_2 and its inside, respectively, excluding z = 0. We have f_1(z) = f_i(z) within the petal P_1 (the overlapped part as shown). Consequently, from the statement of Theorem 6.16, f_1(z) ≡ f_i(z) throughout the region encircled by C_1 and C_2 (excluding z = 0). Repeating this procedure, we get

f_1(z) ≡ f_i(z) ≡ f_{−1}(z) ≡ f_{−i}(z).

Thus, we obtain the same functional entity throughout R except for the origin (z = 0). Further continuing similar procedures, we finally reach the entire complex plane except for the origin, i.e., ℂ − {0}, regarding the domain of analyticity of 1/z.
We further compare the above results with those for an analytic function defined in the real domain. The Taylor's expansion of f (x) = 1/x (x: real with x ≠ 0) around a (a ≠ 0) reads as

f(x) = 1/x = f(a) + f′(a)(x − a) + ⋯ + [f^{(n)}(a)/n!](x − a)^n + ⋯ = Σ_{n=0}^{∞} [(−1)^n/a^{n+1}](x − a)^n.   (6.142)

This is exactly the same form as that of (6.119) aside from the notation of the argument.

The aforementioned procedure that determines the behavior of an analytic function outside the region where that function was originally defined is called analytic continuation or analytic prolongation. Comparing (6.119) and (6.142), we can see that (6.119) is a consequence of the analytic continuation of 1/x from the real domain to the complex domain.

6.7 Calculus of Residues

The theorem of residues and its application to the calculus of various integrals are among the central themes of the theory of analytic functions.

Cauchy's integral theorem (Theorem 6.10) tells us that if f (z) is analytic in a simply connected region R, we have

∮_C f(z)dz = 0,   (6.84)

where C is a closed curve within R. As we have already seen, this is not necessarily the case if f (z) has singular points in R. Meanwhile, if we look at (6.130), we become aware of a simple but important fact. That is, replacing n with −1, we have
A_{−1} = (1/2πi)∮_C f(ζ)dζ  or  ∮_C f(ζ)dζ = 2πiA_{−1}.   (6.143)

This implies that if we can obtain the Laurent's series of f (z) with respect to a singularity located at z = a and encircled by C, we can immediately estimate ∮_C f(ζ)dζ of (6.143) from the coefficient of the term (z − a)^{−1}, i.e., A_{−1}. This fact further implies that even though f (z) has singularities, ∮_C f(ζ)dζ may happen to vanish. Thus, whether (6.84) vanishes depends on the nature of f (z) and its singularities. To examine it, we define a residue of f (z) at z = a (that is encircled by C) as

Res f(a) ≡ (1/2πi)∮_C f(z)dz  or  ∮_C f(z)dz = 2πi Res f(a),   (6.144)

where Res f (a) denotes the residue of f (z) at z = a. From (6.143) and (6.144), we have

A_{−1} = Res f(a).   (6.145)

In a formal sense, the point a does not have to be a singular point. If it is a regular point of f (z), we trivially have Res f (a) = 0. If there is more than one singularity, at points z = a_j, we have

∮_C f(z)dz = 2πi Σ_j Res f(a_j).   (6.146)

Notice that we assume isolated singularities in (6.144) and (6.146). These equations are associated with Cases (1) and (2) dealt with in Sect. 6.5. Within this framework, we wish to evaluate the residue of f (z) at a pole of order n located at z = a. We use (6.140):

f(z) = g(z)/(z − a)^n,   (6.140)

where g(z) is analytic and nonvanishing at z = a. Inserting (6.140) into (6.144), we have

Res f(a) = (1/2πi)∮_C [g(ζ)/(ζ − a)^n]dζ = [1/(n − 1)!][d^{n−1}g(z)/dz^{n−1}]|_{z=a}
  = [1/(n − 1)!]{d^{n−1}[f(z)(z − a)^n]/dz^{n−1}}|_{z=a},   (6.147)

where with the second equality we used (6.106).
In the special but important case where f (z) has a simple pole at z = a, from (6.147) we get

Res f(a) = [f(z)(z − a)]|_{z=a}.   (6.148)

Setting n = 1 in (6.147) in combination with (6.140), we obtain

Res f(a) = (1/2πi)∮_C f(z)dz = (1/2πi)∮_C [g(z)/(z − a)]dz = [f(z)(z − a)]|_{z=a} = g(z)|_{z=a}.

This is nothing but Cauchy's integral formula (6.88) applied to g(z).

Summarizing the above discussion, we can calculate the residue of f (z) at a pole of order n located at z = a using one of the following alternatives: (i) using (6.147) or (ii) picking up the coefficient A_{−1} in the Laurent's series described by (6.129).

In Sect. 6.3 we mentioned that the closed contour integral (6.86) or (6.87) depends on the nature of the singularity. Here we have reached a simple criterion for evaluating that integral. In fact, suppose that we have a Laurent's expansion such that

f(z) = Σ_{n=−∞}^{∞} A_n(z − a)^n.   (6.129)

Then, let us calculate the contour integral of f (z) on a closed circle Γ of radius ρ centered at z = a. We have

I ≡ ∮_Γ f(ζ)dζ = Σ_{n=−∞}^{∞} A_n∮_Γ (ζ − a)^n dζ = Σ_{n=−∞}^{∞} iA_nρ^{n+1}∫_0^{2π} e^{i(n+1)θ}dθ.   (6.149)

The above integral vanishes except for n = −1. With n = −1 we get

I = 2πiA_{−1}.

Thus, we recover (6.145) in combination with (6.144). To get a Laurent's series of f (z), however, is not necessarily easy. In that case, we estimate a residue by (6.147). Tangible examples can be seen in the next section.
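Both residue recipes can be exercised symbolically. A minimal sketch (not from the original text; Python with sympy assumed) applies them to f(z) = 1/(1 + z²)³, which reappears in Example 6.5 below, at the third-order pole z = i:

```python
# Residue of 1/(1 + z**2)**3 at z = i by two routes.
import sympy as sp

z = sp.symbols('z')
f = 1/(1 + z**2)**3

# (i) residue as the Laurent coefficient A_{-1}, cf. (6.145)
print(sp.residue(f, z, sp.I))          # -3*I/16, i.e., 3/(16*i)

# (ii) derivative formula (6.147) with pole order n = 3
n = 3
g = sp.cancel(f*(z - sp.I)**n)         # g(z) = 1/(z + i)**3, analytic at z = i
res = sp.diff(g, z, n - 1).subs(z, sp.I)/sp.factorial(n - 1)
print(sp.simplify(res))                # -3*I/16 again
```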
6.8 Examples of Real Definite Integrals

The calculus of residues is of great practical importance. For instance, it is directly associated with the calculation of definite integrals. In particular, even if one finds a real integration hard to perform in order to solve a related problem, it is often easy to solve it using complex integration, especially the calculus of residues. Here we study several examples.

Example 6.4 Let us consider the following real definite integral:

I = ∫_{−∞}^{∞} dx/(1 + x²).   (6.150)

To this end, we convert the real variable x to the complex variable z and evaluate a contour integral I_C (Fig. 6.18) described by

I_C = ∫_{−R}^{R} dz/(1 + z²) + ∫_{Γ_R} dz/(1 + z²),   (6.151)

where R is a sufficiently large real positive number (R ≫ 1); Γ_R denotes the upper semicircle; I_C stands for the contour integral along the closed curve C that comprises the interval [−R, R] and the upper semicircle Γ_R. Simple poles (of order 1) are located at z = ±i as shown. There is only one simple pole at z = i within the upper semicircle of Fig. 6.18. From (6.146), therefore, we have

I_C = ∮_C f(z)dz = 2πi Res f(i),   (6.152)

[Fig. 6.18 Contour for the integration of 1/(1 + z²) that appears in Example 6.4. One may equally choose the upper semicircle (denoted by Γ_R) or the lower semicircle (denoted by Γ̃_R) for the contour integration]

where f (z) ≡ 1/(1 + z²). Using (6.148), the residue of f (z) at z = i can readily be estimated to be
Res f(i) = [f(z)(z − i)]|_{z=i} = 1/2i.   (6.153)

To estimate the integral (6.150), we must calculate the second term of (6.151). To this end, we change the variable z such that

z = Re^{iθ},   (6.154)

where θ is an argument. Then, using the Darboux inequality (6.95), we get

|∫_{Γ_R} dz/(1 + z²)| = |∫_0^{π} iRe^{iθ}dθ/(R²e^{2iθ} + 1)| ≤ ∫_0^{π} [R/|R²e^{2iθ} + 1|]dθ
  = R∫_0^{π} dθ/√(R⁴ + 2R²cos 2θ + 1) ≤ R · π/[R²(1 − 1/R²)] = π/[R(1 − 1/R²)].

Taking R → ∞, we have

∫_{Γ_R} dz/(1 + z²) → 0.   (6.155)

Consequently, if R → ∞, we have

lim_{R→∞} I_C = ∫_{−∞}^{∞} dx/(1 + x²) + lim_{R→∞} ∫_{Γ_R} dz/(1 + z²) = ∫_{−∞}^{∞} dx/(1 + x²) = I
  = 2πi Res f(i) = π,   (6.156)

where with the first equality we placed the variable back to x; with the last equality we used (6.152) and (6.153). Equation (6.156) gives the answer to (6.150). Notice in general that if a meromorphic function is given as a quotient of two polynomials, an integral of the type of (6.155) tends to zero as R → ∞ in the case where the degree of the denominator of that function is at least two units higher than the degree of the numerator [8].

We may equally choose another contour C̃ (the closed curve comprising the interval [−R, R] and the lower semicircle Γ̃_R) for the integration (Fig. 6.18). In that case, we have
lim_{R→∞} I_{C̃} = ∫_{∞}^{−∞} dx/(1 + x²) + lim_{R→∞} ∫_{Γ̃_R} dz/(1 + z²) = −∫_{−∞}^{∞} dx/(1 + x²)
  = 2πi Res f(−i) = −π.   (6.157)

Note that in the above equation the contour integral is taken in the counterclockwise direction and, hence, the real definite integral has been taken from ∞ to −∞. Notice also that

Res f(−i) = [f(z)(z + i)]|_{z=−i} = −1/2i,

because f (z) has a simple pole at z = −i in the lower semicircle (Fig. 6.18), for which the residue has been taken. From (6.157), we get the same result as before, namely

∫_{−∞}^{∞} dx/(1 + x²) = π.

Alternatively, we can use the method of partial fraction decomposition. In that case, as the integrand we have

1/(1 + z²) = (1/2i)[1/(z − i) − 1/(z + i)].   (6.158)

The residue of 1/(1 + z²) at z = i is 1/2i [i.e., the coefficient of 1/(z − i) in (6.158)], giving the same result as (6.153).
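As a quick numerical cross-check of Example 6.4 (not from the original text; Python with numpy and scipy assumed), direct quadrature over the real line reproduces the residue result π:

```python
# Quadrature check of (6.156): int dx/(1 + x**2) over the real line equals pi.
import numpy as np
from scipy.integrate import quad

val, err = quad(lambda x: 1.0/(1.0 + x*x), -np.inf, np.inf)
print(val, np.pi)   # both ~ 3.14159...
```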
Example 6.5 Evaluate the following real definite integral:

I = ∫_{−∞}^{∞} dx/(1 + x²)³.   (6.159)

As in the case of Example 6.4, we evaluate a contour integration described by

I_C = ∮_C dz/(1 + z²)³ = ∫_{−R}^{R} dz/(1 + z²)³ + ∫_{Γ_R} dz/(1 + z²)³,   (6.160)

where C stands for the closed curve comprising the interval [−R, R] and the upper semicircle Γ_R (see Fig. 6.18).

The function f (z) ≡ 1/(1 + z²)³ has two isolated poles at z = ±i. In the present case, however, the poles are of order 3. Hence, we use (6.147) to estimate the residue. The inside of the upper semicircle contains the pole of order 3 at z = i. Therefore, we have
Res f(i) = (1/2πi)∮_C f(z)dz = (1/2πi)∮_C [g(z)/(z − i)³]dz = (1/2!)[d²g(z)/dz²]|_{z=i},   (6.161)

where

g(z) ≡ 1/(z + i)³  and  f(z) = g(z)/(z − i)³.

Hence, we get

(1/2!)[d²g(z)/dz²]|_{z=i} = (1/2)(−3)(−4)(z + i)^{−5}|_{z=i} = (1/2)(−3)(−4)(2i)^{−5}
  = 3/16i = Res f(i).   (6.162)

For a reason similar to that of Example 6.4, the second term of (6.160) vanishes when R → ∞. Then, we obtain

lim_{R→∞} I_C = ∫_{−∞}^{∞} dz/(1 + z²)³ + lim_{R→∞} ∫_{Γ_R} dz/(1 + z²)³ = lim_{R→∞} ∮_C [g(z)/(z − i)³]dz
  = 2πi Res f(i) = 2πi · 3/16i = 3π/8.

That is,

∫_{−∞}^{∞} dz/(1 + z²)³ = ∫_{−∞}^{∞} dx/(1 + x²)³ = I = 3π/8.

If we are able to perform the Laurent's expansion of f (z), we can immediately evaluate the residue using (6.145). To this end, let us utilize the binomial expansion formula (or generalized binomial theorem). We have

f(z) = 1/(1 + z²)³ = 1/[(z + i)³(z − i)³].

Changing the variable as z − i = ζ, we have

f(z) = f̃(ζ) = 1/[ζ³(ζ + 2i)³] = (1/ζ³) · [1/(2i)³] · (1 + ζ/2i)^{−3}
  = (1/ζ³) · [1/(2i)³] · [1 + (−3)(ζ/2i) + [(−3)(−4)/2!](ζ/2i)² + [(−3)(−4)(−5)/3!](ζ/2i)³ + ⋯]
  = −(1/8i)[1/ζ³ − 3/(2iζ²) − 3/(2ζ) + 5/(4i) + ⋯],   (6.163)

where ⋯ of the above equation denotes a power series in ζ. Accompanying the variable transformation from z to ζ, the functional form f has been changed to f̃. Getting back to the original functional form, we have

f(z) = −(1/8i){1/(z − i)³ − 3/[2i(z − i)²] − 3/[2(z − i)] + 5/(4i) + ⋯}.   (6.164)

Equation (6.164) is a Laurent's expansion of f (z). From (6.164) we see that the coefficient of 1/(z − i) in f (z) is 3/16i, in agreement with (6.145) and (6.162). To obtain the answer of (6.159), once again we may equally choose the contour C̃ (Fig. 6.18) and get the same result as in the case of Example 6.4. In that case, f (z) is to be expanded into a Laurent's series around z = −i. Readers are encouraged to check it.
We make a few remarks about the (generalized) binomial expansion formula. We described it in (3.181) of Sect. 3.6.1 such that

(1 + x)^λ = Σ_{m=0}^{∞} \binom{λ}{m} x^m,   (3.181)

where λ is an arbitrary real number. In (3.181), we assumed that x is a real number with |x| < 1. But (6.163) suggests that (3.181) holds for complex x with |x| < 1. Moreover, λ in (3.181) is allowed to be any complex number. Then, on those conditions (1 + x)^λ is analytic because 1 + x ≠ 0, and so (1 + x)^λ must have a convergent Taylor's series. By the same token, the factor (1 + ζ/2i)^{−3} of (6.163) yields a convergent Taylor's series around ζ = 0. In virtue of the factor 1/ζ³, (6.163) gives a convergent Laurent's series around ζ = 0.

Returning to the present issue, we rewrite (3.181) as a Taylor's series such that

(1 + z)^λ = Σ_{m=0}^{∞} \binom{λ}{m} z^m,   (6.165)

where λ is any complex number, with z also being a complex number of |z| < 1. We may view (6.165) as a consequence of analytic continuation, and (6.163) has indeed been dealt with as such.
[Fig. 6.19 Graphical form of f(x) ≡ 1/(1 + x³). The modulus of f(x) tends to infinity at x = −1. Inflection points are present at x = 0 and x = ∛(1/2) (≈ 0.7937)]

Example 6.6 Evaluate the following real definite integral:

I = ∫_{−∞}^{∞} dx/(1 + x³).   (6.166)

We put

f(x) ≡ 1/(1 + x³).   (6.167)

Figure 6.19 depicts the graphical form of f (x). The modulus of f (x) tends to infinity at x = −1. Inflection points are present at x = 0 and x = ∛(1/2) (≈ 0.7937). Now, in the former two examples, the isolated singular points (i.e., poles) existed only in the inside of a closed contour. In the present case, however, a pole is located on the real axis. Bearing in mind this situation, we wish to estimate the integral (6.166).

Rewriting (6.166), we have

I = ∫_{−∞}^{∞} dx/[(1 + x)(x² − x + 1)]  or  I = ∫_{−∞}^{∞} dz/[(1 + z)(z² − z + 1)].

The polynomial of the denominator, i.e., (1 + z)(z² − z + 1), has a real root at z = −1 and complex roots at z = e^{±iπ/3} = (1 ± √3 i)/2. Therefore,

f(z) ≡ 1/[(1 + z)(z² − z + 1)]
has three simple poles, at z = −1 along with z = e^{±iπ/3}. This time, we take the contour C depicted in Fig. 6.20, where the contour integration I_C is performed in such a way that

I_C = ∫_{−R}^{−1−r} dz/[(1 + z)(z² − z + 1)] + ∫_{Γ_{−1}} dz/[(1 + z)(z² − z + 1)]
  + ∫_{−1+r}^{R} dz/[(1 + z)(z² − z + 1)] + ∫_{Γ_R} dz/[(1 + z)(z² − z + 1)],   (6.168)

where Γ_{−1} stands for a small semicircle of radius r around z = −1 as shown. Of the complex roots z = e^{±iπ/3}, only z = e^{iπ/3} is responsible for the contour integration (see Fig. 6.20).

[Fig. 6.20 Contour for the integration of 1/(1 + z³) that appears in Example 6.6]

Here we define a principal value P of the integral such that

P∫_{−∞}^{∞} dx/[(1 + x)(x² − x + 1)]
  ≡ lim_{r→0} {∫_{−∞}^{−1−r} dx/[(1 + x)(x² − x + 1)] + ∫_{−1+r}^{∞} dx/[(1 + x)(x² − x + 1)]},   (6.169)

where r is an arbitrary real positive number. As shown in this example, the principal value is a convenient device to traverse poles on the contour and gives a correct answer for the contour integral. At the same time, letting R go to infinity, we obtain

lim_{R→∞} I_C = P∫_{−∞}^{∞} dx/[(1 + x)(x² − x + 1)] + ∫_{Γ_{−1}} dz/[(1 + z)(z² − z + 1)]
  + ∫_{Γ_∞} dz/[(1 + z)(z² − z + 1)].   (6.170)

Meanwhile, the contour integral I_C is given by (6.146) such that
lim_{R→∞} I_C = ∮_C f(z)dz = 2πi Σ_j Res f(a_j),   (6.146)

where in the present case f (z) ≡ 1/[(1 + z)(z² − z + 1)] and we have only one simple pole, at a_j = e^{iπ/3}, within the contour C.

We are going to get the answer by combining (6.170) and (6.146) such that

I ≡ P∫_{−∞}^{∞} dx/[(1 + x)(x² − x + 1)]
  = 2πi Res f(e^{iπ/3}) − ∫_{Γ_{−1}} dz/[(1 + z)(z² − z + 1)] − ∫_{Γ_∞} dz/[(1 + z)(z² − z + 1)].   (6.171)

The third term of the RHS of (6.171) tends to zero as in the case of Examples 6.4 and 6.5. Therefore, if we can properly estimate the second term as well as the residue, the integral (6.166) can be adequately determined. Note that in this case the integral is meant as the principal value. To estimate the integral of the second term, we make use of the fact that, defining g(z) as

g(z) ≡ 1/(z² − z + 1),   (6.172)

g(z) is analytic at and around z = −1. Hence, g(z) can be expanded in a Taylor's series around z = −1 such that

g(z) = A_0 + Σ_{n=1}^{∞} A_n(z + 1)^n,   (6.173)

where A_0 ≠ 0. In fact,

A_0 = g(−1) = 1/3.   (6.174)

Then, we have

∫_{Γ_{−1}} dz/[(1 + z)(z² − z + 1)] = ∫_{Γ_{−1}} [g(z)/(1 + z)]dz
  = A_0∫_{Γ_{−1}} dz/(1 + z) + ∫_{Γ_{−1}} Σ_{n=1}^{∞} A_n(z + 1)^{n−1}dz.   (6.175)

Changing the variable z to the polar form in the RHS such that
z = −1 + re^{iθ},   (6.176)

for the first term of the RHS we have

A_0∫_{Γ_{−1}} ire^{iθ}dθ/(re^{iθ}) = A_0∫_{Γ_{−1}} i dθ = A_0∫_{π}^{0} i dθ = −A_0iπ.   (6.177)

Note that in (6.177) the contour integration along Γ_{−1} was performed in the decreasing direction of the argument θ, from π to 0. Thus, (6.175) is rewritten as

∫_{Γ_{−1}} dz/[(1 + z)(z² − z + 1)] = −A_0iπ + Σ_{n=1}^{∞} A_n i∫_{π}^{0} r^n e^{inθ}dθ,   (6.178)

where with the last equality we exchanged the order of the summation and integration. The second term of the RHS of (6.178) vanishes when r → 0. Thus, (6.171) is further rewritten as

I = lim_{r→0} I = 2πi Res f(e^{iπ/3}) + A_0iπ.   (6.179)

We have only one simple pole of f (z), at z = e^{iπ/3}, within the semicircular contour C. Then, using (6.148), we have

Res f(e^{iπ/3}) = [f(z)(z − e^{iπ/3})]|_{z=e^{iπ/3}}
  = 1/[(1 + e^{iπ/3})(e^{iπ/3} − e^{−iπ/3})] = −(1/6)(1 + √3 i).   (6.180)

The above calculus is straightforward, but (6.37) can be conveniently used. Thus, finally we get

I = −(πi/3)(1 + √3 i) + πi/3 = √3π/3 = π/√3.   (6.181)

That is, I ≈ 1.8138.   (6.182)

To avoid any ambiguity in the principal value, we properly define it as [8]

P∫_{−∞}^{∞} f(x)dx ≡ lim_{R→∞} ∫_{−R}^{R} f(x)dx.
[Fig. 6.21 Sector of a circle for the contour integration of 1/(1 + z³) that appears in Example 6.7. (a) The integration range is [0, ∞). (b) The integration range is (−∞, 0]]

The integral can also be calculated using a lower semicircle including the simple pole located at z = e^{−iπ/3} = (1 − √3 i)/2. That gives the same answer as (6.181). The calculations are left to readers as an exercise.
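The principal value of Example 6.6 can be checked numerically by folding the integral symmetrically about the pole, which removes the singularity. A minimal sketch (not from the original text; Python with numpy and scipy assumed):

```python
# P int dx/(1 + x**3) = int_0^inf [f(-1+t) + f(-1-t)] dt, and algebra reduces
# the bracket to the regular function 6/((t**2 - 3t + 3)*(t**2 + 3t + 3)).
import numpy as np
from scipy.integrate import quad

folded = lambda t: 6.0/((t*t - 3*t + 3)*(t*t + 3*t + 3))
val, _ = quad(folded, 0.0, np.inf)
print(val, np.pi/np.sqrt(3))   # both ~ 1.8138, cf. (6.181)-(6.182)
```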
Example 6.7 [9, 10] It will be interesting to compare the result of Example 6.6 with that of the following real definite integral:

I = ∫_{0}^{∞} dx/(1 + x³).   (6.183)

To evaluate (6.183), it is convenient to use a sector of a circle for the contour (see Fig. 6.21a). In this case, the contour integral I_C estimated along the sector is described by

I_C = ∫_{0}^{R} dz/(1 + z³) + ∫_{Γ_R} dz/(1 + z³) + ∫_{L} dz/(1 + z³).   (6.184)

In the third integral of (6.184), we take arg z = θ (constant) so that with z = re^{iθ} we may have dz = dr e^{iθ}. Then, that integral is given by

∫_{L} dz/(1 + z³) = ∫_{L} e^{iθ}dr/(1 + r³e^{3iθ}).   (6.185)

Setting 3iθ = 2iπ, namely θ = 2π/3, we get

∫_{L} e^{iθ}dr/(1 + r³e^{3iθ}) = e^{2πi/3}∫_{R}^{0} dr/(1 + r³) = −e^{2πi/3}∫_{0}^{R} dr/(1 + r³).

Thus, taking R → ∞ in (6.184) we have
lim_{R→∞} I_C = ∫_{0}^{∞} dx/(1 + x³) + ∫_{Γ_∞} dz/(1 + z³) − e^{2πi/3}∫_{0}^{∞} dr/(1 + r³).   (6.186)

The second term of (6.186) vanishes as before. Changing the variable r → x and taking account of (6.180), we obtain

2πi Res f(e^{iπ/3}) = (1 − e^{2πi/3})∫_{0}^{∞} dx/(1 + x³) = (1 − e^{2πi/3})I.   (6.187)

Then, we have

I = 2πi Res f(e^{iπ/3})/(1 − e^{2πi/3}) = 2πi/[(1 − e^{2πi/3})(1 + e^{iπ/3})(e^{iπ/3} − e^{−iπ/3})].

To calculate Res f (e^{iπ/3}), we used (6.148) at z = e^{iπ/3}; f (z) has a simple pole at z = e^{iπ/3} within the contour C of Fig. 6.21a. Noting that (1 − e^{2πi/3})(1 + e^{iπ/3}) = 3 and e^{iπ/3} − e^{−iπ/3} = 2i sin(π/3) = √3 i, we get

I = 2πi/(3√3 i) = 2π/(3√3).   (6.188)

That is, I ≈ 1.2092. This number is two-thirds of (6.182).
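A quick numerical cross-check (not from the original text; Python with numpy and scipy assumed) confirms (6.188):

```python
# Quadrature check of Example 6.7: int_0^inf dx/(1 + x**3) = 2*pi/(3*sqrt(3)).
import numpy as np
from scipy.integrate import quad

val, _ = quad(lambda x: 1.0/(1.0 + x**3), 0.0, np.inf)
print(val, 2*np.pi/(3*np.sqrt(3)))   # both ~ 1.2092
```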


We further extend the above calculation method to the evaluation of real definite integrals. For instance, we have the following integral described by

I = P∫_{−∞}^{0} dx/(1 + x³).   (6.189)

To evaluate this integral, we define the contour integral I_{C′} as in Fig. 6.21b, which is expressed as

I_{C′} = P∫_{−R}^{0} dz/(1 + z³) + ∫_{Γ_{−1}} dz/[(1 + z)(z² − z + 1)] + ∫_{L′} dz/(1 + z³) + ∫_{Γ_R} dz/(1 + z³).   (6.190)

Notice that combining I_{C′} of (6.190) with I_C of (6.184) makes up the contour integration of (6.168). Proceeding as before and noting that there is no singularity inside the contour C′, we have

lim_{R→∞} I_{C′} = 0 = P∫_{−∞}^{0} dz/(1 + z³) − πi/3 + e^{2πi/3}∫_{0}^{∞} dr/(1 + r³)
  = P∫_{−∞}^{0} dz/(1 + z³) − π/(3√3),

where we used (6.188) and (6.178) in combination with (6.174). Thus, we get
I = ∫_{−∞}^{0} dx/(1 + x³) = π/(3√3).   (6.191)

As a matter of course, the result of (6.191) can immediately be obtained by subtracting (6.188) from (6.181).

We frequently encounter real definite integrals including trigonometric functions. In such cases, to make a valid estimation of the integrals it is desirable to use complex exponential functions. Jordan's lemma is a powerful tool for this. The proof is given below, following the literature [5].

Lemma 6.6: Jordan's Lemma [5] Let Γ_R be a semicircle of radius R centered at the origin in the upper half of the complex plane. Let f (z) be an analytic function that tends uniformly to zero as |z| → ∞ when arg z lies in the interval 0 ≤ arg z ≤ π. Then, with a non-negative real number α, we have

lim_{R→∞} ∫_{Γ_R} e^{iαζ}f(ζ)dζ = 0.

Proof Using polar coordinates, ζ is expressed as

ζ = Re^{iθ} = R(cos θ + i sin θ).

Then, the integral is denoted by

I_R ≡ ∫_{Γ_R} e^{iαζ}f(ζ)dζ = iR∫_0^{π} f(Re^{iθ})e^{iαR(cos θ + i sin θ)}e^{iθ}dθ
  = iR∫_0^{π} f(Re^{iθ})e^{iαR cos θ − αR sin θ + iθ}dθ.

By assumption f (z) tends uniformly to zero as |z| → ∞, and so we have

| f(Re^{iθ})| < ε(R),

where ε(R) is a certain positive number that depends only on R and tends to zero as |z| → ∞. Therefore, we have

|I_R| < ε(R)R∫_0^{π} e^{−αR sin θ}dθ = 2ε(R)R∫_0^{π/2} e^{−αR sin θ}dθ,

where the last equality results from the symmetry of sin θ about π/2. Meanwhile, we have
sin θ ≥ 2θ/π  when 0 ≤ θ ≤ π/2.

Hence, we have

|I_R| < 2ε(R)R∫_0^{π/2} e^{−2αRθ/π}dθ = [πε(R)/α](1 − e^{−αR}).

Therefore, we get

lim_{R→∞} I_R = 0.

These complete the proof. ∎
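The key estimate is that R∫_0^{π} e^{−αR sin θ}dθ stays bounded (by π/α) for all R, so the prefactor ε(R) → 0 forces I_R → 0. A minimal numerical illustration (not from the original text; Python with numpy and scipy assumed):

```python
# For alpha = 1, R * int_0^pi exp(-R*sin(theta)) d(theta) stays below the
# bound pi*(1 - exp(-R)) from the proof; it actually approaches 2.
import numpy as np
from scipy.integrate import quad

for R in [1.0, 10.0, 100.0]:
    integral, _ = quad(lambda t: np.exp(-R*np.sin(t)), 0.0, np.pi)
    print(R, R*integral, np.pi*(1 - np.exp(-R)))
```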


Alternatively, for f (z) that is analytic and tends uniformly to zero as |z| → ∞ when arg z lies in the interval π ≤ arg z ≤ 2π, we similarly have

lim_{R→∞} ∫_{Γ_R} e^{−iαζ}f(ζ)dζ = 0,

where α is a certain non-negative real number. In this case, we assume that the semicircle Γ_R of radius R centered at the origin lies in the lower half of the complex plane.

Using Jordan's lemma, we study several examples.
Example 6.8 Evaluate the following real definite integral:

I = ∫_{−∞}^{∞} (sin x/x)dx.   (6.192)

Rewriting (6.192) as

I = ∫_{−∞}^{∞} (sin z/z)dz = (1/2i)∫_{−∞}^{∞} [(e^{iz} − e^{−iz})/z]dz,   (6.193)

we consider the following contour integral:

I_C = P∫_{−R}^{R} (e^{iz}/z)dz + ∫_{Γ_0} (e^{iz}/z)dz + ∫_{Γ_R} (e^{iz}/z)dz,   (6.194)

where Γ_0 represents an infinitesimally small semicircle around the origin and Γ_R shows the outer semicircle of radius R centered at the origin (see Fig. 6.22). The contour C consists of the real axis, Γ_0, and Γ_R as shown.

[Fig. 6.22 Contour for the integration of sin z/z that appears in Example 6.8]

hence, can be expanded in the Taylor’s series around any complex number.
Expanding it around the origin we have

X1 ðizÞn X1 ðizÞn
eiz ¼ n¼0 n!
¼ 1 þ n¼1 n!
: ð6:195Þ

As already mentioned in (6.173) through (6.178), among terms of RHS in (6.195)


only the first term, i.e., 1, contributes to the integral of the second term of RHS in
(6.194). Thus, as in (6.178) we get
Z
eiz
dz ¼ 1 iπ ¼ iπ: ð6:196Þ
Γ0 z

Notice that in (6.196) the argument is decreasing from π ! 0 during the contour
integration. Within the contour C, there is no singular point, and so IC ¼ 0 due to
Theorem 6.10 (Cauchy’s integral theorem). Thus, from (6.194) we obtain
Z 1 Z
eiz eiz
P dz ¼  dz ¼ iπ: ð6:197Þ
1 z Γ0 z

Exchanging the variable z !  z in (6.197), we have


Z 1 iz Z 1
e eiz
P  dz ¼ P dz ¼ iπ: ð6:198Þ
1 z 1 z

Summing both sides of (6.197) and (6.198), we get


Z 1 Z 1 iz
eiz e
P dz þ P dz ¼ 2iπ: ð6:199Þ
1 z 1 z

Rewriting (6.199), we have


244 6 Theory of Analytic Functions

Z 1 Z 1
ðeiz  eiz Þ sin z
P dz ¼ 2iP dz ¼ 2iπ: ð6:200Þ
1 z 1 z

That is, the answer is


Z 1 Z 1
sin z sin x
P dz ¼ P dx ¼ I ¼ π: ð6:201Þ
1 z 1 x

Notice that as mentioned in (6.137) z ¼ 0 is a removable singularity of sin z


z , and so
the symbol P is superfluous in (6.201).
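As a practical aside, (6.201) is easy to confirm numerically through the sine integral $\mathrm{Si}(x) = \int_0^x (\sin t/t)\,dt$, which tends to $\pi/2$; a minimal sketch, assuming SciPy is available:

```python
# Quick numerical cross-check of (6.201), a sketch assuming SciPy:
# int_{-inf}^{inf} sin x / x dx = 2 Si(inf) = pi, with Si from scipy.special.
import numpy as np
from scipy.special import sici

si_large, _ = sici(1.0e8)      # Si(x) -> pi/2 as x -> infinity
print(2.0 * si_large, np.pi)   # both ~ 3.14159...
```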
Example 6.9 Evaluate the following real definite integral:

$$I = \int_{-\infty}^{\infty} \frac{\sin^2 x}{x^2}\, dx. \quad (6.202)$$

From the trigonometric theorem, we have

$$\sin^2 x = \frac{1}{2}(1 - \cos 2x). \quad (6.203)$$

Then, the integrand of (6.202) with the variable changed is rewritten using (6.37) as

$$\frac{\sin^2 z}{z^2} = \frac{1}{4}\left[\frac{1 - e^{2iz}}{z^2} + \frac{1 - e^{-2iz}}{z^2}\right]. \quad (6.204)$$

Expanding $e^{2iz}$ as before, we have

$$e^{2iz} = \sum_{n=0}^{\infty} \frac{(2iz)^n}{n!} = 1 + 2iz + \sum_{n=2}^{\infty} \frac{(2iz)^n}{n!}. \quad (6.205)$$

Then, we get

$$1 - e^{2iz} = -2iz - \sum_{n=2}^{\infty} \frac{(2iz)^n}{n!}, \qquad \frac{1 - e^{2iz}}{z^2} = -\frac{2i}{z} - \sum_{n=2}^{\infty} \frac{(2i)^n z^{n-2}}{n!}. \quad (6.206)$$

As in Example 6.8, we wish to use (6.206) to estimate the second term of the following contour integral $I_C$:

$$\lim_{R \to \infty} I_C = \mathrm{P}\!\int_{-\infty}^{\infty} \frac{1 - e^{2iz}}{z^2}\, dz + \int_{\Gamma_0} \frac{1 - e^{2iz}}{z^2}\, dz + \int_{\Gamma_\infty} \frac{1 - e^{2iz}}{z^2}\, dz. \quad (6.207)$$

With the second integral of (6.207), as before, only the first term of (6.206) contributes to the integral. The result is

$$\int_{\Gamma_0} \frac{1 - e^{2iz}}{z^2}\, dz = -2i\int_{\Gamma_0} \frac{1}{z}\, dz = -2i\int_{\pi}^{0} i\, d\theta = -2\pi. \quad (6.208)$$

With the third integral of (6.207), we have

$$\int_{\Gamma_\infty} \frac{1 - e^{2iz}}{z^2}\, dz = \int_{\Gamma_\infty} \frac{1}{z^2}\, dz - \int_{\Gamma_\infty} \frac{e^{2iz}}{z^2}\, dz. \quad (6.209)$$

The first term of the RHS of (6.209) vanishes for a reason similar to that we have already seen in Example 6.4. The second term of the RHS vanishes as well because of Jordan's lemma. Within the contour $C$ of (6.207), again there is no singular point, and so $\lim_{R \to \infty} I_C = 0$. Hence, from (6.207) we have

$$\mathrm{P}\!\int_{-\infty}^{\infty} \frac{1 - e^{2iz}}{z^2}\, dz = -\int_{\Gamma_0} \frac{1 - e^{2iz}}{z^2}\, dz = 2\pi. \quad (6.210)$$

Exchanging the variable $z \to -z$ in (6.210) as before, we have

$$\mathrm{P}\!\left[-\int_{\infty}^{-\infty} \frac{1 - e^{-2iz}}{(-z)^2}\, dz\right] = \mathrm{P}\!\int_{-\infty}^{\infty} \frac{1 - e^{-2iz}}{z^2}\, dz = 2\pi. \quad (6.211)$$

Summing both sides of (6.210) and (6.211) in combination with (6.204), we get the answer described by

$$\int_{-\infty}^{\infty} \frac{\sin^2 z}{z^2}\, dz = \pi. \quad (6.212)$$

Again, the symbol P is superfluous in (6.210) and (6.211).
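The result (6.212) can likewise be checked numerically; the sketch below (ours, assuming NumPy) truncates the range at $X = 10^4$ and adds the tail estimate $2\int_X^{\infty} \sin^2 x/x^2\,dx \approx 1/X$, which follows because $\sin^2 x$ averages to $1/2$:

```python
# Numerical sanity check of (6.212), a sketch assuming NumPy.  The range is
# truncated at X = 1e4; since sin^2 averages to 1/2, the tail beyond X
# contributes about 2 * (1/2)/X.
import numpy as np

x = np.linspace(1.0e-8, 1.0e4, 4_000_001)
f = (np.sin(x) / x) ** 2
body = 2.0 * np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x))  # trapezoidal rule on [0, X]
tail = 1.0 / x[-1]                                        # tail estimate 2*(1/2)/X
print(body + tail, np.pi)                                 # both ~ 3.14159...
```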


Example 6.10 Evaluate the following real definite integral:

$$I = \int_{-\infty}^{\infty} \frac{\cos x}{(x-a)(x-b)}\, dx \quad (a \ne b). \quad (6.213)$$

We rewrite the denominator of (6.213) using the method of partial fraction decomposition such that

$$\frac{1}{(x-a)(x-b)} = \frac{1}{a-b}\left[\frac{1}{x-a} - \frac{1}{x-b}\right]. \quad (6.214)$$

Then, (6.213) is described by

$$I = \frac{1}{a-b}\int_{-\infty}^{\infty} \left[\frac{\cos z}{z-a} - \frac{\cos z}{z-b}\right] dz = \frac{1}{2(a-b)}\int_{-\infty}^{\infty} \left[\frac{e^{iz} + e^{-iz}}{z-a} - \frac{e^{iz} + e^{-iz}}{z-b}\right] dz. \quad (6.215)$$

Fig. 6.23 Contour for the integration of $\frac{\cos z}{(z-a)(z-b)}$ that appears in Example 6.10. (a) Along the upper contour $C$. (b) Along the lower contour $C'$

We wish to consider the integration by dividing the integrand and calculating each term individually. First, we estimate

$$\int_{-\infty}^{\infty} \frac{e^{iz}}{z-a}\, dz. \quad (6.216)$$

To apply Jordan's lemma to the integration of (6.216), we use the upper contour $C$ that consists of the real axis, $\Gamma_a$, and $\Gamma_R$; see Fig. 6.23a. As usual, we consider the following integral:

$$\lim_{R \to \infty} I_C = \mathrm{P}\!\int_{-\infty}^{\infty} \frac{e^{iz}}{z-a}\, dz + \int_{\Gamma_a} \frac{e^{iz}}{z-a}\, dz + \int_{\Gamma_\infty} \frac{e^{iz}}{z-a}\, dz, \quad (6.217)$$

where $\Gamma_a$ represents an infinitesimally small semicircle around $z = a$. Notice that (6.217) is obtained when we consider $R \to \infty$ in Fig. 6.23a. The third term of (6.217) vanishes due to Jordan's lemma. With the second term of (6.217), we change the variable $z - a \to z$ and rewrite it as

$$\int_{\Gamma_a} \frac{e^{iz}}{z-a}\, dz = \int_{\Gamma_0} \frac{e^{i(z+a)}}{z}\, dz = e^{ia}\int_{\Gamma_0} \frac{e^{iz}}{z}\, dz = -e^{ia}\, i\pi, \quad (6.218)$$

where with the last equality we used (6.196). There is no singular point within the contour $C$, and so $\lim_{R \to \infty} I_C = 0$. Then, we have

$$\mathrm{P}\!\int_{-\infty}^{\infty} \frac{e^{iz}}{z-a}\, dz = -\int_{\Gamma_a} \frac{e^{iz}}{z-a}\, dz = e^{ia}\, i\pi. \quad (6.219)$$

Next, we consider the following integral that appears in (6.215):

$$\int_{-\infty}^{\infty} \frac{e^{-iz}}{z-a}\, dz. \quad (6.220)$$

To apply Jordan's lemma to the integration of (6.220), we use the lower contour $C'$ that consists of the real axis, $\tilde{\Gamma}_a$, and $\tilde{\Gamma}_R$ (Fig. 6.23b). This time, the integral to be evaluated is given by

$$\lim_{R \to \infty} I_{C'} = -\mathrm{P}\!\int_{-\infty}^{\infty} \frac{e^{-iz}}{z-a}\, dz + \int_{\tilde{\Gamma}_a} \frac{e^{-iz}}{z-a}\, dz + \int_{\tilde{\Gamma}_\infty} \frac{e^{-iz}}{z-a}\, dz \quad (6.221)$$

so that we can trace the contour $C'$ counterclockwise. Notice that the minus sign is present in front of the principal value. The third term of (6.221) vanishes due to Jordan's lemma. Evaluating the second term of (6.221) similarly as just above, we get

$$\int_{\tilde{\Gamma}_a} \frac{e^{-iz}}{z-a}\, dz = \int_{\tilde{\Gamma}_0} \frac{e^{-i(z+a)}}{z}\, dz = e^{-ia}\int_{\tilde{\Gamma}_0} \frac{e^{-iz}}{z}\, dz = -e^{-ia}\, i\pi. \quad (6.222)$$

Hence, we obtain

$$\mathrm{P}\!\int_{-\infty}^{\infty} \frac{e^{-iz}}{z-a}\, dz = \int_{\tilde{\Gamma}_a} \frac{e^{-iz}}{z-a}\, dz = -e^{-ia}\, i\pi. \quad (6.223)$$

Notice that in (6.222) the argument is again decreasing from $0 \to -\pi$ during the contour integration. Summing (6.219) and (6.223), we get

$$\mathrm{P}\!\int_{-\infty}^{\infty} \frac{e^{iz}}{z-a}\, dz + \mathrm{P}\!\int_{-\infty}^{\infty} \frac{e^{-iz}}{z-a}\, dz = \mathrm{P}\!\int_{-\infty}^{\infty} \frac{e^{iz} + e^{-iz}}{z-a}\, dz = e^{ia}\, i\pi - e^{-ia}\, i\pi = i\pi\left(e^{ia} - e^{-ia}\right) = i\pi \cdot 2i\sin a = -2\pi\sin a.$$

In a similar manner, we also get

$$\mathrm{P}\!\int_{-\infty}^{\infty} \frac{e^{iz} + e^{-iz}}{z-b}\, dz = -2\pi\sin b.$$

Thus, the final answer is

$$I = \frac{1}{2(a-b)}\int_{-\infty}^{\infty} \left[\frac{e^{iz} + e^{-iz}}{z-a} - \frac{e^{iz} + e^{-iz}}{z-b}\right] dz = \frac{2\pi(\sin b - \sin a)}{2(a-b)} = \frac{\pi(\sin b - \sin a)}{a-b}. \quad (6.224)$$
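The closed form (6.224) can be verified numerically; the sketch below (ours, assuming NumPy, with illustrative values $a = 0.7$ and $b = -1.3$) evaluates the principal values via the symmetric combination $[\cos(p+u) - \cos(p-u)]/u$, which is regular at $u = 0$:

```python
# Numerical cross-check of (6.224), a sketch assuming NumPy; a = 0.7 and
# b = -1.3 are illustrative values.  The principal value around each pole is
# computed from the symmetric combination [cos(p+u) - cos(p-u)]/u, which is
# regular at u = 0.
import numpy as np

def pv_cos_over(pole, L=4000.0, n=4_000_001):
    # P int_{-L}^{L} cos(x)/(x - pole) dx; the tail beyond L is of order 1/L
    u = np.linspace(1.0e-9, L, n)
    g = (np.cos(pole + u) - np.cos(pole - u)) / u
    return np.sum(0.5 * (g[1:] + g[:-1]) * np.diff(u))

a, b = 0.7, -1.3
I = (pv_cos_over(a) - pv_cos_over(b)) / (a - b)      # partial fractions (6.214)
print(I, np.pi * (np.sin(b) - np.sin(a)) / (a - b))  # both ~ -2.5255
```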

Example 6.11 Evaluate the following real definite integral:

$$I = \int_{-\infty}^{\infty} \frac{\cos x}{1+x^2}\, dx. \quad (6.225)$$

The calculation is left for readers. Hints: (i) Use $\cos x = (e^{ix} + e^{-ix})/2$ and Jordan's lemma. (ii) Use the contour shown in Fig. 6.18 for the integration. Besides the examples listed above, a variety of integral calculations can be seen in the literature [9–12].
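For readers who wish to check their answer to Example 6.11: the hinted contour calculation yields $\pi/e$, which a short numerical sketch (ours, assuming SciPy; with `weight='cos'` and an infinite upper limit, `quad` computes the Fourier integral $\int_0^\infty f(x)\cos\omega x\,dx$) reproduces:

```python
# Numeric check for Example 6.11, a sketch assuming SciPy; the hinted
# contour evaluation gives pi/e.
import numpy as np
from scipy.integrate import quad

val, _ = quad(lambda x: 1.0 / (1.0 + x * x), 0.0, np.inf, weight='cos', wvar=1.0)
print(2.0 * val, np.pi / np.e)   # both ~ 1.15573 (the integrand is even)
```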

6.9 Multivalued Functions and Riemann Surfaces

The last topics of the theory of analytic functions are related to multivalued functions. Riemann surfaces play a central role in developing the theory of multivalued functions.

6.9.1 Brief Outline

We give a brief outline of the notions of multivalued functions and Riemann surfaces. These notions are indispensable for investigating the properties of irrational functions and logarithmic functions and for performing related calculations, especially integrations with respect to those functions.

Definition 6.12 Let $n$ be a positive integer and let $z$ be an arbitrary complex number. Suppose we have

$$z = w^n. \quad (6.226)$$

Then, $w$ is said to be an $n$-th root of $z$ and we denote

$$w = z^{1/n}. \quad (6.227)$$

We have expressly shown this simple thing. It is because, if we think of real $z$, $z^{1/n}$ is uniquely determined only when $n$ is odd, regardless of whether $z$ is positive or negative. In case we think of even $n$, (i) if $z$ is positive, we have two $n$-th roots $z^{1/n}$; (ii) if $z$ is negative, on the other hand, we have no real $n$-th root. Meanwhile, if we think of complex $z$, we have a totally different situation. That is, regardless of whether $n$ is odd or even, we always have $n$ different values of $z^{1/n}$ for any given complex number $z$.

As a tangible example, let us think of the $n$-th roots of $\pm 1$ for $n = 2$ and 3. These are depicted graphically in Fig. 6.24. We can readily understand the statement of the above paragraph. In Fig. 6.24a, if $z = -1$ for $n = 2$, we have either $w = (-1)^{1/2} = (e^{i\pi})^{1/2} = i$ or $w = (-1)^{1/2} = (e^{-i\pi})^{1/2} = -i$ with the square roots. In Fig. 6.24b, if $z = -1$ for $n = 3$, we have $w = (-1)^{1/3} = -1$; $w = (-1)^{1/3} = (e^{i\pi})^{1/3} = e^{i\pi/3}$; $w = (-1)^{1/3} = (e^{-i\pi})^{1/3} = e^{-i\pi/3}$ with the cubic roots. Moreover, we have $1^{1/2} = 1, -1$. Also, we have $1^{1/3} = 1, e^{2i\pi/3}, e^{-2i\pi/3}$.

Fig. 6.24 Multiple roots in the complex plane. (a) Square roots of 1 and $-1$. (b) Cubic roots of 1 and $-1$
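These $n$ distinct roots are straightforward to enumerate numerically; a minimal sketch (ours, assuming NumPy):

```python
# Numerical enumeration of the n-th roots, a sketch assuming NumPy:
# for z = r e^{i theta}, the n roots are r^{1/n} e^{i(theta + 2 pi k)/n}.
import numpy as np

def nth_roots(z, n):
    r, theta = abs(z), np.angle(z)
    k = np.arange(n)
    return r ** (1.0 / n) * np.exp(1j * (theta + 2.0 * np.pi * k) / n)

print(nth_roots(-1.0, 2))   # approximately [i, -i]
print(nth_roots(-1.0, 3))   # e^{i pi/3}, -1, e^{-i pi/3} (up to ordering)
print(nth_roots(1.0, 3))    # 1, e^{2 i pi/3}, e^{-2 i pi/3} (up to ordering)
```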
Let us extend the above preliminary discussion to the analytic functions. Suppose we are given the following polar form of $z$ such that

$$z = r(\cos\theta + i\sin\theta) \quad (6.32)$$

and

$$w = \rho(\cos\varphi + i\sin\varphi).$$

Then, we have

$$z = w^n = \rho^n(\cos\varphi + i\sin\varphi)^n = \rho^n(\cos n\varphi + i\sin n\varphi), \quad (6.228)$$

where with the last equality we used de Moivre's theorem (6.36). Comparing the real and imaginary parts of (6.32) and (6.228), we have

$$r = \rho^n \quad \text{or} \quad \rho = r^{1/n} \quad (6.229)$$

and

$$\theta = n\varphi + 2k\pi \quad (k = 0, \pm 1, \pm 2, \cdots). \quad (6.230)$$

In (6.229), we define both $r$ and $\rho$ as being positive so that a positive $\rho$ can be uniquely determined for any positive integer $n$ (i.e., either even or odd). Recall once again that if $n$ is even, we have two $n$-th roots for $\rho$ (i.e., both positive and negative) with given positive $r$. Of these two roots, we discard the negative $\rho$. Rewriting (6.230), we have

$$\varphi = \frac{1}{n}(\theta - 2k\pi) \quad (k = 0, \pm 1, \pm 2, \cdots).$$

Further rewriting (6.227), we get

$$w_k(\theta) = z^{1/n} = r^{1/n}\left[\cos\frac{1}{n}(\theta - 2k\pi) + i\sin\frac{1}{n}(\theta - 2k\pi)\right] \quad (k = 0, 1, 2, \cdots, n-1), \quad (6.231)$$

where $w_k(\theta)$ is said to be a branch. Note that we have

$$w_k(\theta - 2n\pi) = w_k(\theta). \quad (6.232)$$

That is, $w_k(\theta)$ is a periodic function of period $2n\pi$. The number of total branches is $n$; the index $k$ of $w_k(\theta)$ denotes these different branches. In particular, when $k = 0$, we have

$$w_0(\theta) = z^{1/n} = r^{1/n}\left(\cos\frac{\theta}{n} + i\sin\frac{\theta}{n}\right). \quad (6.233)$$

Comparing (6.231) and (6.233), we obtain


$$w_k(\theta) = w_0(\theta - 2k\pi). \quad (6.234)$$

From (6.234), we find that $w_k(\theta)$ is obtained by shifting (or rotating) $w_0(\theta)$ by $2k\pi$ toward the positive direction of $\theta$. The branch $w_0(\theta)$ is called the principal branch of the $n$-th root of $z$. The value of $w_0(\theta)$ is called the principal value of that branch. The implication of the presence of the branches is as follows: Suppose that the branches besides $w_0(\theta)$ were absent. Then, from (6.233) we have

$$w_0(2\pi) = z^{1/n} = r^{1/n}\left(\cos\frac{2\pi}{n} + i\sin\frac{2\pi}{n}\right) \ne w_0(0) = r^{1/n}.$$

This causes inconvenience, because $\theta = 0$ and $\theta = 2\pi$ are assumed to be identical in the complex plane. Hence, the single-valuedness would be broken with $w_0(\theta)$. But, in virtue of the presence of $w_1(\theta)$, we have

$$w_1(2\pi) = w_0(2\pi - 2\pi) = w_0(0).$$

This means that the value of $w_0(0)$ is recovered and, hence, the single-valuedness remains intact. After another cycle around the origin, similarly we have

$$w_2(4\pi) = w_0(4\pi - 4\pi) = w_0(0).$$

Thus, in succession we get

$$w_k(2k\pi) = w_0(2k\pi - 2k\pi) = w_0(0).$$

For $k = n$, from (6.231) we have

$$w_n(\theta) = z^{1/n} = r^{1/n}\left[\cos\frac{1}{n}(\theta - 2n\pi) + i\sin\frac{1}{n}(\theta - 2n\pi)\right] = r^{1/n}\left[\cos\left(\frac{\theta}{n} - 2\pi\right) + i\sin\left(\frac{\theta}{n} - 2\pi\right)\right] = r^{1/n}\left(\cos\frac{\theta}{n} + i\sin\frac{\theta}{n}\right) = w_0(\theta).$$

Thus, we have no more new functions. At the same time, the single-valuedness of $w_k(\theta)$ $(k = 0, 1, 2, \cdots, n-1)$ remains intact during these processes. To summarize the above discussion, if we have $n$ planes (or sheets) as the complex planes and allocate them to $w_0(\theta), w_1(\theta), \cdots, w_{n-1}(\theta)$ individually, we have a single-valued function $w(\theta)$ as a whole throughout these planes. In a word, the following relation represents the situation:
$$w(\theta) = \begin{cases} w_0(\theta) & \text{for Plane } 0, \\ w_1(\theta) & \text{for Plane } 1, \\ \quad\cdots & \\ w_{n-1}(\theta) & \text{for Plane } n-1. \end{cases}$$

This is the essence of the Riemann surface. The superposition of these $n$ planes is called a Riemann surface and each plane is said to be a Riemann sheet [of the function $w(\theta)$]. Each single-valued function $w_0(\theta), w_1(\theta), \cdots, w_{n-1}(\theta)$ defined on each Riemann sheet is called a branch of $w(\theta)$. In the above discussion, the origin is called a branch point of $w(z)$.

For simplicity, let us think of the function $w(z)$ described by

$$w(z) \equiv z^{1/2} = \sqrt{z}.$$

In this case, for $z = re^{i\theta}$ $(0 \le \theta \le 2\pi)$ we have two different values $w_0$ and $w_1$ given by

$$w_0(\theta) = r^{1/2} e^{i\theta/2}, \qquad w_1(\theta) = r^{1/2} e^{i(\theta - 2\pi)/2} = w_0(\theta - 2\pi) = -w_0(\theta). \quad (6.235)$$

Then, we have

$$w(\theta) = \begin{cases} w_0(\theta) & \text{for Plane } 0, \\ w_1(\theta) & \text{for Plane } 1. \end{cases} \quad (6.236)$$

In this case, $w(z)$ is called a "two-valued" function of $z$ and the functions $w_0$ and $w_1$ are said to be branches of $w(z)$. Suppose that $z$ makes a counterclockwise circuit around the origin, starting from, e.g., a real positive number $z = z_0$, and comes full circle back to the original point $z = z_0$. Then, the argument of $z$ has been increased by $2\pi$. In this situation the arguments of the individual branches $w_0$ and $w_1$ are increased by $\pi$. Accordingly, $w_0$ is switched to $w_1$ and $w_1$ is switched to $w_0$. This situation can be understood more clearly from (6.232) and (6.234). That is, putting $n = 2$ in (6.232), we have

$$w_k(\theta - 4\pi) = w_k(\theta) \quad (k = 0, 1). \quad (6.237)$$

Meanwhile, putting $k = 1$ in (6.234) we get

$$w_1(\theta) = w_0(\theta - 2\pi). \quad (6.238)$$

Replacing $\theta$ with $\theta + 2\pi$ in (6.238), we have $w_1(\theta + 2\pi) = w_0(\theta)$. Also, replacing $\theta$ with $\theta + 4\pi$ in (6.237), we have $w_1(\theta + 4\pi) = w_1(\theta) = w_0(\theta - 2\pi) = w_0(\theta + 2\pi)$, where with the last equality we replaced $\theta$ with $\theta + 2\pi$ in (6.237). Rewriting the above, we get

$$w_1(\theta + 2\pi) = w_0(\theta) \quad \text{and} \quad w_0(\theta + 2\pi) = w_1(\theta).$$

That is, adding $2\pi$ to $\theta$, $w_0$ and $w_1$ are switched to each other.
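This branch switching can be traced numerically; the following sketch (ours, assuming NumPy) follows $w_0(\theta) = e^{i\theta/2}$ along the unit circle $r = 1$ and shows the sign flip after one full circuit:

```python
# Numerical trace of the branch switching of w(z) = z^{1/2}, a sketch
# assuming NumPy.  Following w_0(theta) = e^{i theta/2} (r = 1) continuously
# around the origin flips its sign after one full circuit.
import numpy as np

theta = np.linspace(0.0, 2.0 * np.pi, 7)
w0 = np.exp(1j * theta / 2.0)
print(w0[0], w0[-1])                 # (1+0j) versus (-1+0j)
print(np.allclose(w0[-1], -w0[0]))   # True: w_0(theta + 2 pi) = w_1(theta) = -w_0(theta)
```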


Let us further think of the two-valued function $w(z) = z^{1/2}$. Strictly speaking, an analytic function cannot be a two-valued function. This is because, if it were, the continuity and differentiability would be lost from the function and, hence, the analyticity would be broken. Then, we must make a suitable device to avoid this. Such a device is called a Riemann surface.

Let us make a kit of the Riemann surface following Fig. 6.25. (i) Take a sheet of paper so that it can represent a complex plane and cut it with scissors along the real axis starting from the origin so that the cut (or slit) is made toward the real positive direction. This cut is called a branch cut, with the origin being a branch point. Let us call this sheet Plane I. (ii) Take another sheet of paper and call it Plane II. Also make a cut in Plane II in exactly the same way as that for Plane I; see Fig. 6.25a for the processes (i) and (ii). (iii) Next, put Plane I on top of Plane II so that the two branch cuts fit in line. (iv) Tape together the downside of the cut of Plane I and the foreside of the cut of Plane II (Fig. 6.25b). (v) Then, also tape together the downside of the cut of Plane II and the foreside of the cut of Plane I (see Fig. 6.25b once again).

Fig. 6.25 Simple kit that helps visualize the Riemann surface. To make it, follow the next procedures: (a) Take two sheets of paper (Planes I and II) and make slits (indicated by dashed lines) as shown. Next, put Plane I on top of Plane II so that the two slits (i.e., branch cuts) can fit in line. (b) Tape together the downside of the cut of Plane I and the foreside of the cut of Plane II. Then, also tape together the downside of the cut of Plane II and the foreside of the cut of Plane I

Thus, what we see is that, starting from, e.g., a real positive number $z = z_0$ of Plane I and coming full circle to the original point $z = z_0$, we cross the cut to enter Plane II located underneath Plane I. After another cycle within Plane II, we come back to Plane I again by crossing the cut. After all, we come back to the original Plane I after two cycles on the combined planes. This combined plane is called the Riemann surface. In this way, Planes I and II correspond to the different branches $w_0$ and $w_1$, respectively, so that each branch can be single-valued. In other words, $w(z) = z^{1/2}$ is a single-valued function that is defined on the whole Riemann surface.
There are a variety of Riemann surfaces according to the nature of the complex functions. Another example of a multivalued function is expressed as

$$w(z) = \sqrt{(z-a)(z-b)},$$

where the two branch points are located at $z = a$ and $z = b$. As before, we assume that

$$z - a = r_a e^{i\theta_a} \quad \text{and} \quad z - b = r_b e^{i\theta_b}.$$

Then, we have

$$w(\theta_a, \theta_b) = [(z-a)(z-b)]^{1/2} = \sqrt{r_a r_b}\, e^{i(\theta_a + \theta_b)/2}. \quad (6.239)$$

In the above, e.g., the argument $\theta_a$ can be decided graphically as shown in Fig. 6.26, where $\theta_a$ is the angle between a line connecting $z$ and $a$ and another line drawn in parallel with the real axis.

Fig. 6.26 Graphically decided argument $\theta_a$, which is the angle between a line connecting $z$ and $a$ and another line drawn in parallel with the real axis

We can choose $w(\theta_a, \theta_b)$ of (6.239) for the principal branch and define it as

$$w_0(\theta_a, \theta_b) \equiv \sqrt{r_a r_b}\, e^{i(\theta_a + \theta_b)/2}. \quad (6.240)$$

Also, we define $w_1(\theta_a, \theta_b)$ as

$$w_1(\theta_a, \theta_b) \equiv \sqrt{r_a r_b}\, e^{i(\theta_a + \theta_b - 2\pi)/2}. \quad (6.241)$$

Then, as in (6.235) we have

$$w_1(\theta_a, \theta_b) = w_0(\theta_a - 2\pi, \theta_b) = w_0(\theta_a, \theta_b - 2\pi) = -w_0(\theta_a, \theta_b). \quad (6.242)$$


Fig. 6.27 Geometrical relationship between the contour $C$ and the two branch points (located at $z = a$ and $z = b$) for $\sqrt{(z-a)(z-b)}$. (a) The contour $C$ encircles only $z = a$. (b) $C$ encircles both $z = a$ and $z = b$. (c) $C$ encircles neither $z = a$ nor $z = b$

From (6.242), we find that adding $2\pi$ to $\theta_a$ or $\theta_b$, $w_0$ and $w_1$ are switched to each other as before. We also find that after the variable $z$ has come full circle around one of $a$ and $b$, $w_0(\theta_a, \theta_b)$ changes sign as in the case of (6.235).

We also have

$$w_0(\theta_a - 2\pi, \theta_b - 2\pi) = w_0(\theta_a, \theta_b), \qquad w_1(\theta_a - 2\pi, \theta_b - 2\pi) = w_1(\theta_a, \theta_b). \quad (6.243)$$

From (6.243), on the other hand, after the variable $z$ has come full circle around both $a$ and $b$, both $w_0(\theta_a, \theta_b)$ and $w_1(\theta_a, \theta_b)$ keep their original values. This is also the case where the variable $z$ comes full circle without encircling $a$ or $b$. These behaviors imply that (i) if a contour encircles one of $z = a$ and $z = b$, $w_0(\theta_a, \theta_b)$ and $w_1(\theta_a, \theta_b)$ change sign and switch to each other. Thus, $w_0(\theta_a, \theta_b)$ and $w_1(\theta_a, \theta_b)$ form branches. On the other hand, (ii) if a contour encircles both $z = a$ and $z = b$, $w_0(\theta_a, \theta_b)$ and $w_1(\theta_a, \theta_b)$ remain intact. (iii) If a contour encircles neither $z = a$ nor $z = b$, $w_0(\theta_a, \theta_b)$ and $w_1(\theta_a, \theta_b)$ remain intact as well.

These three cases (i), (ii), and (iii) are depicted in Fig. 6.27a–c, respectively. In Fig. 6.27c, after $z$ comes full circle, $\arg z$ relative to $z = a$ returns to the original value, keeping $\theta_1 \le \arg z \le \theta_2$. This is similarly the case with $\arg z$ relative to $z = b$.
Correspondingly, the branch cut(s) are depicted with doubled broken line(s), e.g., as in Fig. 6.28. Note that in the cases (ii) and (iii) the branch cut can be chosen so that the circle (or contour) does not cross the branch cut. Although a bit complicated, the Riemann surface can be shaped accordingly. Other choices of the branch cuts are shown in Sect. 6.9.2 (vide infra).

Fig. 6.28 Branch cut(s) for $\sqrt{(z-a)(z-b)}$. (a) A branch cut is shown by a line connecting $z = a$ and $z = b$. (b) Branch cuts are shown by a line connecting $z = a$ and $z = -\infty$ and another line connecting $z = b$ and $z = +\infty$. In both (a, b) the branch cut(s) are depicted with doubled broken line(s)

When we consider logarithmic functions, we have to deal with a rather complicated situation. For the polar coordinate representation, we have $z = re^{i\theta}$. That is,

$$\ln z = \ln r + i\arg z = \ln|z| + i\arg z, \quad (6.244)$$

where

$$\arg z = \theta + 2\pi n \quad (n = 0, \pm 1, \pm 2, \cdots). \quad (6.245)$$

If $n = 0$ is chosen, $\ln z$ is said to be the principal value. With the logarithmic functions, the Riemann surface comprises infinitely many planes, each of which corresponds to an individual branch whose argument is given by (6.245). The point $z = 0$ is called a logarithmic branch point. Meanwhile, the branch point for, e.g., $w(z) = \sqrt{z}$ is said to be an algebraic branch point.

6.9.2 Examples of Multivalued Functions

When we deal with a function having a branch point, we must bear in mind that unless the contour crosses the branch cut (either algebraic or logarithmic), the function we are thinking of remains single-valued and analytic (if that function is originally analytic) and can be differentiated or integrated in the normal way. We give some examples of this.

Example 6.12 [6] Evaluate the following real definite integral:

$$I = \int_0^{\infty} \frac{x^{a-1}}{1+x}\, dx \quad (0 < a < 1). \quad (6.246)$$

We rewrite (6.246) as

$$I = \int_0^{\infty} \frac{z^{a-1}}{1+z}\, dz. \quad (6.247)$$

We define the integrand of (6.247) as

$$f(z) \equiv \frac{z^{a-1}}{1+z},$$

where $z^{a-1}$ can be further rewritten as

$$z^{a-1} = e^{(a-1)\ln z}.$$

The function $f(z)$ has a branch point at $z = 0$ and a simple pole at $z = -1$. Bearing in mind this situation, we consider a contour for the integration (see Fig. 6.29). In Fig. 6.29 we depict the branch cut with a doubled broken line. Lines PQ and Q′P′ are contour lines located over and under the branch cut, respectively. We assume that the lines PQ and Q′P′ are illimitably close to the real axis. Thus, starting from the point P the contour integration $I_C$ is described by

$$I_C = \int_{PQ} \frac{z^{a-1}}{1+z}\, dz + \int_{\Gamma_R} \frac{z^{a-1}}{1+z}\, dz + \int_{Q'P'} \frac{z^{a-1}}{1+z}\, dz + \int_{\Gamma_0} \frac{z^{a-1}}{1+z}\, dz, \quad (6.248)$$

where $\Gamma_R$ and $\Gamma_0$ denote the outer large circle and the inner small circle of radius $R$ $(\gg 1)$ and $r$ $(\ll 1)$, respectively. Note that with the contour integration $\Gamma_R$ is traced counterclockwise but $\Gamma_0$ is traced clockwise (see Fig. 6.29).

Fig. 6.29 Branch cut (shown with a doubled broken line) and contour for the integration of $\frac{z^{a-1}}{1+z}$

Since the simple pole is present at $z = -1$, the related residue $\operatorname{Res} f(-1)$ is

$$\operatorname{Res} f(-1) = \left[z^{a-1}\right]_{z=-1} = (-1)^{a-1} = \left(e^{i\pi}\right)^{a-1} = -e^{ai\pi}.$$

Then, we have

$$I_C = 2\pi i \operatorname{Res} f(-1) = -2\pi i\, e^{ai\pi}. \quad (6.249)$$

When $R \to \infty$ and $r \to 0$, we have

$$\left|\int_{\Gamma_R} \frac{z^{a-1}}{1+z}\, dz\right| < \frac{R^{a-1}}{R-1} \cdot 2\pi R \to 0 \quad \text{and} \quad \left|\int_{\Gamma_0} \frac{z^{a-1}}{1+z}\, dz\right| < \frac{r^{a-1}}{1-r} \cdot 2\pi r \to 0. \quad (6.250)$$

Thus, the second and fourth terms of (6.248) vanish.

Meanwhile, we take the principal value of $\ln z$ of (6.244). Since the lines PQ and Q′P′ are very close to the real axis, using (6.245) with $n = 0$ we can put $\theta = 0$ on PQ and $\theta = 2\pi$ on Q′P′. Then, we have

$$\int_{PQ} \frac{z^{a-1}}{1+z}\, dz = \int_{PQ} \frac{e^{(a-1)\ln z}}{1+z}\, dz = \int_{PQ} \frac{e^{(a-1)\ln|z|}}{1+z}\, dz. \quad (6.251)$$

Moreover, we have

$$\int_{Q'P'} \frac{z^{a-1}}{1+z}\, dz = \int_{Q'P'} \frac{e^{(a-1)\ln z}}{1+z}\, dz = \int_{Q'P'} \frac{e^{(a-1)(\ln|z| + 2\pi i)}}{1+z}\, dz = e^{(a-1)2\pi i}\int_{Q'P'} \frac{e^{(a-1)\ln|z|}}{1+z}\, dz. \quad (6.252)$$

Notice that the argument underneath the branch cut (i.e., on the line Q′P′) is increased by $2\pi$ relative to that over the branch cut.

Considering (6.249) through (6.252) and taking the limit of $R \to \infty$ and $r \to 0$, (6.248) is rewritten as

$$-2\pi i\, e^{a\pi i} = \int_0^{\infty} \frac{e^{(a-1)\ln|z|}}{1+z}\, dz + e^{(a-1)2\pi i}\int_{\infty}^{0} \frac{e^{(a-1)\ln|z|}}{1+z}\, dz = \left[1 - e^{(a-1)2\pi i}\right]\int_0^{\infty} \frac{e^{(a-1)\ln|z|}}{1+z}\, dz = \left[1 - e^{(a-1)2\pi i}\right]\int_0^{\infty} \frac{z^{a-1}}{1+z}\, dz. \quad (6.253)$$

In (6.253) we have assumed $z$ to be real and positive (see Fig. 6.29), and so we have $e^{(a-1)\ln|z|} = e^{(a-1)\ln z} = z^{a-1}$. Therefore, from (6.253) we get

$$\int_0^{\infty} \frac{z^{a-1}}{1+z}\, dz = -\frac{2\pi i\, e^{a\pi i}}{1 - e^{(a-1)2\pi i}} = -\frac{2\pi i\, e^{a\pi i}}{1 - e^{2\pi a i}}. \quad (6.254)$$

Using (6.37) for (6.254) and performing simple trigonometric calculations (after multiplying both the numerator and denominator by $1 - e^{-2\pi a i}$), we finally get

$$\int_0^{\infty} \frac{z^{a-1}}{1+z}\, dz = \int_0^{\infty} \frac{x^{a-1}}{1+x}\, dx = I = \pi/(\sin a\pi).$$
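The result can be confirmed numerically; a minimal sketch (ours, assuming SciPy), e.g., with the illustrative value $a = 0.3$:

```python
# Numerical cross-check of Example 6.12, a sketch assuming SciPy;
# a = 0.3 is an illustrative value.
import numpy as np
from scipy.integrate import quad

a = 0.3
val, _ = quad(lambda x: x ** (a - 1.0) / (1.0 + x), 0.0, np.inf)
print(val, np.pi / np.sin(a * np.pi))   # both ~ 3.8833
```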

Example 6.13 [13] Evaluate the following real definite integral:

$$I = \int_a^b \frac{1}{\sqrt{(x-a)(b-x)}}\, dx \quad (a, b:\ \text{real with } a < b). \quad (6.255)$$

To this end, we wish to evaluate the following integral described by

$$I_C = \oint_C \frac{1}{\sqrt{(z-a)(z-b)}}\, dz, \quad (6.256)$$

where we define the integrand of (6.256) as

$$f(z) \equiv \frac{1}{\sqrt{(z-a)(z-b)}}.$$

The total contour $C$ comprises $C_a$, PQ, $C_b$, and Q′P′ as depicted in Fig. 6.30. The function $f(z)$ has two branch points at $z = a$ and $z = b$; otherwise $f(z)$ is analytic. We draw a branch cut with a doubled broken line as shown. Since the contour $C$ does not cross the branch cut but encircles it, we can evaluate the integral in the normal manner. Starting from the point P′, (6.256) can be expressed by

$$I_C = \int_{C_a} \frac{dz}{\sqrt{(z-a)(z-b)}} + \int_{PQ} \frac{dz}{\sqrt{(z-a)(z-b)}} + \int_{C_b} \frac{dz}{\sqrt{(z-a)(z-b)}} + \int_{Q'P'} \frac{dz}{\sqrt{(z-a)(z-b)}}. \quad (6.257)$$

We assume that the lines PQ and Q′P′ are illimitably close to the real axis. This time, $C_a$ and $C_b$ are both traced counterclockwise. Putting

$$z - a = r_1 e^{i\theta_1} \quad \text{and} \quad z - b = r_2 e^{i\theta_2},$$

we have

$$f(z) = \frac{1}{\sqrt{r_1 r_2}}\, e^{-i(\theta_1 + \theta_2)/2}. \quad (6.258)$$

Fig. 6.30 Branch cut (shown with a doubled broken line) and contour for the integration of $\frac{1}{\sqrt{(z-a)(z-b)}}$ $(a < b)$. Only the argument $\theta_1$ is depicted. With $\theta_2$ see text

Let $C_a$ and $C_b$ be small circles of radius $\varepsilon$ centered at $z = a$ and $z = b$, respectively. When we are evaluating the integral on $C_b$, we can put

$$z - b = \varepsilon e^{i\theta_2} \quad (-\pi \le \theta_2 \le \pi); \quad \text{i.e., } r_2 = \varepsilon.$$

We also have $r_1 \approx b - a$; in Fig. 6.30 we assume $b > a$. Hence, we have

$$\int_{C_b} \frac{dz}{\sqrt{(z-a)(z-b)}} = \int_{-\pi}^{\pi} \frac{i\varepsilon\, e^{i\theta_2}}{\sqrt{r_1\varepsilon}}\, e^{-i(\theta_1 + \theta_2)/2}\, d\theta_2, \qquad \left|\int_{C_b} \frac{dz}{\sqrt{(z-a)(z-b)}}\right| \le \int_{-\pi}^{\pi} \frac{\sqrt{\varepsilon}}{\sqrt{r_1}}\, d\theta_2.$$

Therefore, we get

$$\lim_{\varepsilon \to 0} \left|\int_{C_b} \frac{dz}{\sqrt{(z-a)(z-b)}}\right| = 0 \quad \text{or} \quad \lim_{\varepsilon \to 0} \int_{C_b} \frac{dz}{\sqrt{(z-a)(z-b)}} = 0.$$

In a similar manner, we have

$$\lim_{\varepsilon \to 0} \int_{C_a} \frac{dz}{\sqrt{(z-a)(z-b)}} = 0.$$

Meanwhile, on the line Q′P′ we have

$$z - a = r_1 e^{i\theta_1} \quad \text{and} \quad z - b = r_2 e^{i\theta_2}. \quad (6.259)$$

In (6.259) we have $r_1 = x - a$, $\theta_1 = 0$ and $r_2 = b - x$, $\theta_2 = \pi$; see Fig. 6.30. Also, we have $dz = dx$. Hence, we have
$$\int_{Q'P'} \frac{dz}{\sqrt{(z-a)(z-b)}} = \int_b^a \frac{e^{-i\pi/2}}{\sqrt{(x-a)(b-x)}}\, dx = -i\int_b^a \frac{dx}{\sqrt{(x-a)(b-x)}} = i\int_a^b \frac{dx}{\sqrt{(x-a)(b-x)}}. \quad (6.260)$$

On the line PQ, in turn, we have $r_1 = x - a$ and $\theta_1 = 2\pi$. This is because when going from P′ to P, the argument is increased by $2\pi$; see Fig. 6.30 once again. Also, we have $r_2 = b - x$, $\theta_2 = \pi$, and $dz = dx$. Notice that the argument $\theta_2$ remains unchanged. Then, we get

$$\int_{PQ} \frac{dz}{\sqrt{(z-a)(z-b)}} = \int_a^b \frac{e^{-3i\pi/2}}{\sqrt{(x-a)(b-x)}}\, dx = i\int_a^b \frac{dx}{\sqrt{(x-a)(b-x)}}. \quad (6.261)$$

Inserting these results into (6.257), we obtain

$$I_C = \oint_C \frac{dz}{\sqrt{(z-a)(z-b)}} = 2i\int_a^b \frac{dx}{\sqrt{(x-a)(b-x)}}. \quad (6.262)$$

The calculation is not yet finished, however, because (6.262) is not a real definite integral. Then, let us try another contour integration. We choose a contour $C$ of radius $R$ centered at the origin. We assume that $R$ is large enough this time. Meanwhile, putting

$$z = 1/w, \quad (6.263)$$

we have

$$\sqrt{(z-a)(z-b)} = \frac{1}{w}\sqrt{(1-aw)(1-bw)} \quad \text{with} \quad dz = -\frac{dw}{w^2}. \quad (6.264)$$

Thus, we have

$$I_C = \oint_C \frac{dz}{\sqrt{(z-a)(z-b)}} = -\oint_{C'} \frac{dw}{w\sqrt{(1-aw)(1-bw)}}, \quad (6.265)$$

where the closed contour $C'$ of radius $1/R$ comes full circle clockwise around the origin of the $w$-plane (see Fig. 6.31).

Fig. 6.31 Branch cuts (shown with doubled broken lines) and contour in the $w$-plane for the integration of $\frac{1}{w\sqrt{(1-aw)(1-bw)}}$. (a) Case of $0 < a < b$. (b) Case of $a < 0 < b$. Notice that in both cases the contour $C'$ is traced clockwise (see text)

This is because, just as in the $z$-plane the contour $C$ is traced viewing the point $z = \infty$ on the right, the contour $C'$ is traced viewing the corresponding point $w = 0$ on the right. The above clockwise cycle results from this situation. The integrand of (6.265) has a simple pole at $w = 0$.

Regarding the other singularities of the integrand, there are two branch points at $w = 1/a$ and $w = 1/b$. Then, we appropriately choose $R \gg 1$ (or $1/R \ll 1$) so that $C'$ does not cut across the branch cut. To meet this condition, we have two choices. (i) The branch cut connects $w = 1/a$ and $w = 1/b$. (ii) Another choice of the branch cut is that it connects $w = 1/a$ and $w = -\infty$ along with $w = 1/b$ and $w = \infty$ (see Fig. 6.31). If we have $0 < a < b$, we can choose a branch cut as that depicted in Fig. 6.31a corresponding to Fig. 6.28a. In case $a < 0 < b$, the branch cut can be that shown in Fig. 6.31b corresponding to Fig. 6.28b. (When $a < b < 0$, the branch cut geometry may be similar to that of Fig. 6.31a.)

Consequently, in both the above cases (i) and (ii) there is only a simple pole at $w = 0$ without any other singularities inside the contour $C'$. Therefore, according to (6.148) we have a contour integration described by

$$\oint_{C'} \frac{dw}{w\sqrt{(1-aw)(1-bw)}} = -2\pi i \operatorname{Res}\left[\frac{1}{w\sqrt{(1-aw)(1-bw)}}\right]_{w=0} = -2\pi i \left.\frac{1}{\sqrt{(1-aw)(1-bw)}}\right|_{w=0} = -2\pi i, \quad (6.266)$$

where the above minus sign is due to the clockwise rotation of the contour integration. Hence, from (6.265) we get

$$I_C = \oint_C \frac{dz}{\sqrt{(z-a)(z-b)}} = 2\pi i. \quad (6.267)$$

Equating (6.262) and (6.267), we finally obtain the answer described by

$$\int_a^b \frac{dx}{\sqrt{(x-a)(b-x)}} = \pi. \quad (6.268)$$
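Again the answer admits a quick numerical confirmation. SciPy's quad happens to support exactly the algebraic end-point weight $(x-a)^{\alpha}(b-x)^{\beta}$ appearing here; a minimal sketch (ours), with illustrative $a = -2$, $b = 5$:

```python
# Numerical cross-check of (6.268), a sketch assuming SciPy; a = -2 and
# b = 5 are illustrative.  quad's weight='alg' supplies the end-point factor
# (x-a)^alpha (b-x)^beta, here with alpha = beta = -1/2.
import numpy as np
from scipy.integrate import quad

a, b = -2.0, 5.0
val, _ = quad(lambda x: 1.0, a, b, weight='alg', wvar=(-0.5, -0.5))
print(val, np.pi)   # both ~ 3.14159..., independently of a and b
```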

Example 6.14 [14] In Chap. 3 we mentioned the analyticity of the function

$$\left(1 - 2tx + t^2\right)^{-\lambda}. \quad (6.269)$$

In Chap. 3, we assumed that $x$ is a real number belonging to the interval $[-1, 1]$. Under this condition, we define $f(t)$ in a complex domain as

$$f(t) \equiv 1 - 2tx + t^2. \quad (6.270)$$

We rewrite (6.270) as

$$f(t) = \left[t - \left(x + i\sqrt{1-x^2}\right)\right]\left[t - \left(x - i\sqrt{1-x^2}\right)\right]. \quad (6.271)$$

With $f(t) = 0$ we have two roots at $t_{\pm} = x \pm i\sqrt{1-x^2}$, and so $|t_{\pm}| = 1$ (see Sect. 3.6.1).

When $\lambda = 1/2$, we are dealing with essentially the same problem as that of Example 6.13. For instance, when $x = 0$, the branch points are located at $t = \pm i$. When $x = 1/\sqrt{2}$, the branch points are positioned at $t = (1 \pm i)/\sqrt{2}$. These values of $t$ form the branch points. In Fig. 6.32 we depict these branch points and the corresponding branch cuts. In this situation, the function

$$h(t) \equiv \left(1 - 2tx + t^2\right)^{-1/2} \quad (6.272)$$

is analytic within the circle $|t| < 1$ of the complex $t$-plane [14] (Fig. 6.32). Therefore, we can freely expand $h(t)$ into a Taylor's series at any point within that circle. If we expand $h(t)$ around $t = 0$, we expect to have the following Taylor's series:

$$h(t) = \sum_{n=0}^{\infty} c_n(x)\, t^n. \quad (6.273)$$

We can readily decide $c_n(x)$ using (6.115) such that

$$c_n(x) = \frac{1}{2\pi i}\oint_C \frac{h(t)}{t^{n+1}}\, dt, \quad (6.274)$$

where the contour $C$ can be chosen so that $C$ encircles $t = 0$ in the region $|t| < 1$.

Fig. 6.32 Branch points and corresponding branch cuts for $(1 - 2tx + t^2)^{-1/2}$. For instance, when $x = 0$, the branch points are located at $t = \pm i$. When $x = 1/\sqrt{2}$, the branch points are positioned at $t = (1 \pm i)/\sqrt{2}$

Let us make a substitution such that [14]

$$1 - ut = \left(1 - 2tx + t^2\right)^{1/2}. \quad (6.275)$$

Then we have

$$t = \frac{2(u-x)}{u^2 - 1} \quad \text{and} \quad dt = \frac{2(1 - ut)\, du}{u^2 - 1} \quad \text{with } u \ne \pm 1. \quad (6.276)$$

Using these relations, we rewrite (6.274) as

$$c_n(x) = \frac{1}{2\pi i}\oint_{C'} \frac{(u^2 - 1)^n}{2^n (u-x)^{n+1}}\, du = \frac{1}{2\pi i}\oint_{C'} \frac{(-1)^n (1-u^2)^n}{2^n (u-x)^{n+1}}\, du, \quad (6.277)$$

where the contour $C'$ encircles $u = x$. Here, using (6.118), we readily get

$$c_n(x) = \frac{1}{2\pi i}\oint_{C'} \frac{(-1)^n (1-u^2)^n}{2^n (u-x)^{n+1}}\, du = \frac{(-1)^n}{2^n n!}\left.\frac{d^n (1-u^2)^n}{du^n}\right|_{u=x} \equiv P_n(x). \quad (6.278)$$

The functions $P_n(x)$ are the Legendre polynomials that have already appeared in (3.145) and (3.207) of Sects. 3.5 and 3.6. In this manner, the above argument directly confirms that the Legendre polynomials of (3.194) derived from the generating function (3.180) and those defined by the Rodrigues formula of (3.145) are identical in the complex domain. Originally, (3.145) and (3.207) were defined in the real domain. The identical representations in both domains, real and complex, are a consequence of analytic continuation.

Substituting $x = \pm 1$ into (6.277), we get
$$c_n(1) = \frac{1}{2\pi i}\oint_{C'} \frac{(-1)^n (1-u^2)^n}{2^n (u-1)^{n+1}}\, du = \frac{1}{2^n}\cdot\frac{1}{2\pi i}\oint_{C'} \frac{(u+1)^n}{u-1}\, du = \frac{1}{2^n}\left[(u+1)^n\right]_{u=1} = 1$$

and

$$c_n(-1) = \frac{1}{2\pi i}\oint_{C'} \frac{(-1)^n (1-u^2)^n}{2^n (u+1)^{n+1}}\, du = \frac{1}{2^n}\cdot\frac{1}{2\pi i}\oint_{C'} \frac{(u-1)^n}{u+1}\, du = \frac{1}{2^n}\left[(u-1)^n\right]_{u=-1} = (-1)^n. \quad (6.279)$$

Notice that in (6.279) the integrands have a simple pole at $u = \pm 1$ and, hence, we used (6.144) and (6.148) with the contour integration. The contour $C'$ is taken so that it encircles $u = 1$ in the former case and $u = -1$ in the latter. Hence, we have the important relations

$$P_n(1) = 1 \quad \text{and} \quad P_n(-1) = (-1)^n. \quad (6.280)$$
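The identity $c_n(x) = P_n(x)$, together with (6.280), can be confirmed numerically by comparing partial sums of the generating-function series with $(1 - 2tx + t^2)^{-1/2}$; a minimal sketch (ours, assuming SciPy), with illustrative $x = 0.4$ and $t = 0.3$:

```python
# Numerical confirmation of (6.278) and (6.280), a sketch assuming SciPy;
# x = 0.4 and t = 0.3 are illustrative values inside |t| < 1.
import numpy as np
from scipy.special import eval_legendre

x, t, N = 0.4, 0.3, 60
series = sum(eval_legendre(n, x) * t ** n for n in range(N))
print(series, (1.0 - 2.0 * t * x + t * t) ** -0.5)    # partial sum vs. closed form
print(eval_legendre(7, 1.0), eval_legendre(7, -1.0))  # 1.0 and -1.0, cf. (6.280)
```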

Summarizing this chapter, we have explored the theory and applications of analytic functions in the complex domain. We have studied how the theory of analytic functions produces fruitful results.

References

1. Stoll RR (1979) Set theory and logic. Dover, New York


2. Willard S (2004) General topology. Dover, New York
3. McCarty G (1988) Topology. Dover, New York
4. Mendelson B (1990) Introduction to topology, 3rd edn. Dover, New York
5. Dennery P, Krzywicki A (1996) Mathematics for physicists. Dover, New York
6. Takagi T (2010) Introduction to analysis. Iwanami, Tokyo. (in Japanese)
7. Rudin W (1987) Real and complex analysis, 3rd edn. McGraw-Hill, New York
8. Byron FW Jr, Fuller RW (1992) Mathematics of classical and quantum physics. Dover,
New York
9. Arfken GB, Weber HJ, Harris FE (2013) Mathematical methods for physicists, 7th edn.
Academic Press, Waltham
10. Hassani S (2006) Mathematical physics. Springer, New York
11. Riley KF, Hobson MP, Bence SJ (2006) Mathematical methods for physics and engineering,
3rd edn. Cambridge University Press, Cambridge
12. Boas ML (2006) Mathematical methods in the physical sciences, 3rd edn. Wiley, New York
13. Omote M (1988) Complex functions. Iwanami, Tokyo. (in Japanese)
14. Lebedev NN (1972) Special functions and their applications. Dover, New York
Part II
Electromagnetism

Electromagnetism is one of the pillars of modern physics, even though it belongs to classical physics along with Newtonian mechanics. Maxwell's equations describe and interpret almost all electromagnetic phenomena and, together with the Newton equation, constitute the basis equations of classical physics. Although after the discovery of the theory of relativity the Newton equation needed to be modified, Maxwell's equations virtually did not have to be changed. In this part we treat characteristics of electromagnetic waves that are directly derived from Maxwell's equations.

Electromagnetism becomes connected to quantum mechanics, especially when we deal with the emission and absorption of light. In fact, the experiments performed in connection with the blackbody radiation led to the discovery of light quanta and the establishment of quantum mechanics. These accounts are not only of particular interest from a historical point of view but also of great importance in understanding modern physics. To understand the propagation of electromagnetic waves in a dielectric medium is important from a basic aspect of electromagnetism. Moreover, it is deeply connected to optical applications including optical devices such as waveguides and lasers. In light of the recent progress of laser devices, we provide a tangible example of organic lasers as newly occurring laser devices.

The motion of particles as well as the spatial and temporal changes in, e.g., electromagnetic fields are very often described in terms of differential equations. We describe introductory methods of Green's functions in order to solve those differential equations.
Chapter 7
Maxwell's Equations

Maxwell's equations consist of four first-order partial differential equations. First we deal with basic properties of Maxwell's equations. Next we show how equations of electromagnetic wave motion are derived from Maxwell's equations along vector analysis. It is important to realize that the generation of the electromagnetic wave is a direct consequence of the interplay between the electric field and the magnetic field that both change with time. We deal with behaviors of electromagnetic waves in dielectric media where no true charge exists. At first glance, this restriction seems to narrow the range of application of the principles of electromagnetism. In practice, however, such a situation is universal; the topics cover a wide range of electromagnetic phenomena, e.g., light propagation in dielectrics including water, glass, polymers, etc. Polarization properties characterize the electromagnetic waves. These include linear, circular, and elliptic polarizations. These characteristics are important both from a fundamental aspect and from the point of view of optical applications.

7.1 Maxwell's Equations and Their Characteristics

In this chapter, we first represent Maxwell's equations in vector forms. The equations are represented in a differential form that is consistent with a viewpoint based on "action through medium." The equation of wave motion (or wave equation) is naturally derived from these equations.

Maxwell's equations of electromagnetism are expressed as follows:

$$\operatorname{div} \boldsymbol{D} = \rho, \quad (7.1)$$

$$\operatorname{div} \boldsymbol{B} = 0, \quad (7.2)$$
$$\operatorname{rot} \boldsymbol{E} + \frac{\partial \boldsymbol{B}}{\partial t} = 0, \quad (7.3)$$

$$\operatorname{rot} \boldsymbol{H} - \frac{\partial \boldsymbol{D}}{\partial t} = \boldsymbol{i}. \quad (7.4)$$

In (7.3), the RHS denotes a zero vector. Let us take some time to get acquainted with the physical quantities and their dimensions as well as the basic ideas and concepts along with the definitions of electromagnetism.

The quantity $\boldsymbol{D}$ is called electric flux density $[\mathrm{A\,s/m^2} = \mathrm{C/m^2}]$ (or electric displacement); $\rho$ is electric charge density $[\mathrm{C/m^3}]$. We describe vector quantities $\boldsymbol{V}$ as in (3.4):

$$\boldsymbol{V} = (\boldsymbol{e}_1\ \boldsymbol{e}_2\ \boldsymbol{e}_3)\begin{pmatrix} V_x \\ V_y \\ V_z \end{pmatrix}. \quad (7.5)$$

The notation div denotes a differential operator such that

$$\operatorname{div} \boldsymbol{V} \equiv \frac{\partial V_x}{\partial x} + \frac{\partial V_y}{\partial y} + \frac{\partial V_z}{\partial z}. \quad (7.6)$$

Thus, the div operator converts a vector to a scalar. The quantity $\boldsymbol{D}$ and the electric field $\boldsymbol{E}$ [V/m] are associated through the following expression:

$$\boldsymbol{D} = \varepsilon\boldsymbol{E}, \quad (7.7)$$

where $\varepsilon$ $[\mathrm{C^2/(N\,m^2)}]$ is called the dielectric constant (or permittivity) of the dielectric medium. The dimension can be understood from the following Coulomb's law that describes the force exerted between two charges:

$$F = \frac{1}{4\pi\varepsilon}\frac{QQ'}{r^2}, \quad (7.8)$$

where $F$ is the force; $Q$ and $Q'$ are the electric charges of the two charges; $r$ is the distance between the two charges. Equation (7.1) represents Gauss' law of electrostatics.

The electric charge of 1 Coulomb (1 [C]) is defined as follows: Suppose that two point charges having the same electric charge are placed in vacuum 1 m apart. In that situation, if the force $F$ between the two charges is

$$F = \tilde{c}^2/10^7\ [\mathrm{N}],$$

where $\tilde{c}$ is the light velocity, then we define the electric charge which each point charge possesses as 1 [C]. Here, note that $\tilde{c}$ is a dimensionless number related to the light
7.1 Maxwell’s Equations and Their Characteristics 271

velocity in vacuum that is measured in a unit of [m/s]. Notice also that the light
velocity c in vacuum is defined as

c ¼ 299 792 458 ½m=s ðexact numberÞ:

Thus, we have

ec ¼ 299 792 458 ðdimensionless numberÞ:

Vacuum is a kind of dielectric media. From (7.8), its dielectric constant ε0 is


defined by
   2 
107 C2 12 C
ε0 ¼  8:854  10 : ð7:9Þ
4πec2 Nm2 Nm2

Meanwhile, 1 s (second) has been defined from a certain spectral line emitted from
133
Cs. Thus, 1 m (meter) is defined by

ðdistance along which light is propagated in vacuum during 1 sÞ=299 792 458:

The quantity B is called magnetic flux density [Vsm2 ¼ m2 ]. Equation (7.2) repre-
Wb

sents Gauss’s law of magnetostatics. In contrast to (7.1), RHS of (7.2) is zero. This
corresponds to the fact that although a true charge exists, a true “magnetic charge”
does not exist. (More precisely, such charge has not been detected so far.) The
quantity B is connected with magnetic field H by the following relation:

B = μH, ð7:10Þ

where μ [AN2 ] is said to be permeability (or magnetic permeability). Thus, H has a


dimension [m A
]. Permeability of vacuum μ0 is defined by
 
N
μ0 ¼ 4π=10 7
: ð7:11Þ
A2

Also we have

μ0 ε0 ¼ 1=c2 : ð7:12Þ

We will encounter this relation later again. Since the quantities c and ε0 have a
defined magnitude, so does μ0 from (7.12).
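These numerical values are easy to reproduce; a minimal sketch (ours, assuming SciPy, whose scipy.constants module carries the CODATA values):

```python
# Numerical check of (7.9), (7.11), and (7.12), a sketch assuming SciPy.
from scipy.constants import c, epsilon_0, mu_0

print(epsilon_0)                 # ~ 8.854e-12 [C^2 N^-1 m^-2]
print(mu_0)                      # ~ 4 pi / 10^7 ~ 1.2566e-6 [N A^-2]
print(mu_0 * epsilon_0 * c**2)   # ~ 1, i.e., mu_0 epsilon_0 = 1/c^2
```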
We often make an issue of the relative magnitudes of the dielectric constant and magnetic permeability of a dielectric medium. That is, the relative permittivity $\varepsilon_r$ and relative permeability $\mu_r$ are defined as follows:

$$\varepsilon_r \equiv \varepsilon/\varepsilon_0 \quad \text{and} \quad \mu_r \equiv \mu/\mu_0. \quad (7.13)$$

Note that both $\varepsilon_r$ and $\mu_r$ are dimensionless quantities. Their magnitudes are equal to 1 (in the case of vacuum) or larger than 1 (with any other dielectric media).

Equations (7.3) and (7.4) deal with the change in electric and magnetic fields with time. Of these, (7.3) represents Faraday's law of electromagnetic induction due to Michael Faraday (1831). He found that when a permanent magnet was thrust into or out of a closed circuit, a transient current flowed. Moreover, that experiment implied that even without the closed circuit, an electric field was generated around the space that changed its position relative to the permanent magnet. Equation (7.3) is easier to understand if it is rewritten as follows:

$$\operatorname{rot} \boldsymbol{E} = -\frac{\partial \boldsymbol{B}}{\partial t}. \quad (7.14)$$

That is, the electric field $\boldsymbol{E}$ is generated in such a way that the induced electric field (or induced current) tends to lessen the change in the magnetic flux (or magnetic field) produced by the permanent magnet (Lenz's law). The minus sign in the RHS indicates that effect.
The rot operator appearing in (7.3) and (7.4) is defined by

$$\operatorname{rot} \boldsymbol{V} = \nabla\times\boldsymbol{V} = \begin{vmatrix} \boldsymbol{e}_1 & \boldsymbol{e}_2 & \boldsymbol{e}_3 \\ \dfrac{\partial}{\partial x} & \dfrac{\partial}{\partial y} & \dfrac{\partial}{\partial z} \\ V_x & V_y & V_z \end{vmatrix} = \left(\frac{\partial V_z}{\partial y} - \frac{\partial V_y}{\partial z}\right)\boldsymbol{e}_1 + \left(\frac{\partial V_x}{\partial z} - \frac{\partial V_z}{\partial x}\right)\boldsymbol{e}_2 + \left(\frac{\partial V_y}{\partial x} - \frac{\partial V_x}{\partial y}\right)\boldsymbol{e}_3. \quad (7.15)$$

The operator $\nabla$ has already appeared in (3.9). This operator transforms a vector to a vector. Let us think of the meaning of the rot operator. Suppose that there is a vector field that varies with time and spatial position. Suppose also that at some instant the spatial distribution of the field varies as in Fig. 7.1, where a spiral vector field $\boldsymbol{V}$ is present. For the $z$-component of $\operatorname{rot} \boldsymbol{V}$ around the origin, we have

$$(\operatorname{rot} \boldsymbol{V})_z = \frac{\partial V_y}{\partial x} - \frac{\partial V_x}{\partial y} = \lim_{\Delta x \to 0,\, \Delta y \to 0}\left[\frac{(V_1)_y - (V_3)_y}{\Delta x} - \frac{(V_2)_x - (V_4)_x}{\Delta y}\right].$$

In the case of Fig. 7.1, $(V_1)_y - (V_3)_y > 0$ and $(V_2)_x - (V_4)_x < 0$ and, hence, we find that $\operatorname{rot} \boldsymbol{V}$ has a positive $z$-component. If $V_z = 0$ and $\partial V_y/\partial z = \partial V_x/\partial z = 0$, we find from (7.15) that $\operatorname{rot} \boldsymbol{V}$ possesses only the $z$-component. The equation $\partial V_y/\partial z = \partial V_x/\partial z = 0$ implies that the vector field $\boldsymbol{V}$ is uniform in the direction of the $z$-axis. Thus, under the above conditions the spiral vector field $\boldsymbol{V}$ is accompanied by the $\operatorname{rot} \boldsymbol{V}$ vector field that is directed toward the upper side of the plane of the paper (i.e., the positive direction of the $z$-axis).
Equation (7.4) can be rewritten as

$$\operatorname{rot} \boldsymbol{H} = \boldsymbol{i} + \frac{\partial \boldsymbol{D}}{\partial t}. \quad (7.16)$$

Notice that $\frac{\partial \boldsymbol{D}}{\partial t}$ has the same dimension as $\boldsymbol{i}$ $[\mathrm{A/m^2}]$ and is called the displacement current. Without this term, we have

$$\operatorname{rot} \boldsymbol{H} = \boldsymbol{i}. \quad (7.17)$$

This relation is well known as Ampère's law or Ampère's circuital law (André-Marie Ampère: 1827), which determines the magnetic field yielded by a stationary current. Again with the aid of Fig. 7.1, (7.17) implies that the current given by $\boldsymbol{i}$ produces a spiral magnetic field.
produces spiral magnetic field.
Now, let us think of a change in amount of charges with time in a part of three-
dimensional closed space V surrounded by a closed surface S. It is given by
Z Z Z Z
d ∂ρ
ρdV ¼ dV ¼  i  ndS ¼  div idV, ð7:18Þ
dt V V ∂t S V

where n is an outward-directed normal unit vector; with the last equality we used
Gauss’s theorem. The Gauss’s theorem is described by
Z Z
div idV ¼ i  ndS:
V S

Fig. 7.1 Schematic y


representation of a spiral
vector field V that yields
rot V
(0, Δy/2)
V2 V1

(‒Δx/2, 0) O x
(Δx/2, 0)
z

V3 V4
(0, ‒Δy/2)
Fig. 7.2 Intuitive diagram that explains Gauss's theorem

Figure 7.2 gives an intuitive diagram that explains Gauss's theorem. The diagram shows a cross-section of the closed space $V$ surrounded by a surface $S$. In this case, imagine a cube or a hexahedron as $V$. The periphery is the cross-section of the closed surface accordingly. Arrows in the diagram schematically represent $\operatorname{div} \boldsymbol{i}$ on individual fragments; only those of the center infinitesimal fragment are shown with solid lines. The arrows of adjacent fragments cancel each other out and only the components on the periphery are nonvanishing. Thus, the volume integration of $\operatorname{div} \boldsymbol{i}$ is converted to the surface integration of $\boldsymbol{i}$. Readers are referred to appropriate literature on vector integration [1].

Consequently, from (7.18) we have

$$\int_V \left(\frac{\partial \rho}{\partial t} + \operatorname{div} \boldsymbol{i}\right) dV = 0. \quad (7.19)$$

Since $V$ is arbitrarily chosen, we get

$$\frac{\partial \rho}{\partial t} + \operatorname{div} \boldsymbol{i} = 0. \quad (7.20)$$

The relation (7.20) is called the current continuity equation. This relation represents the law of conservation of charge.

Meanwhile, taking div of both sides of (7.17), we have

$$\operatorname{div} \operatorname{rot} \boldsymbol{H} = \operatorname{div} \boldsymbol{i}. \quad (7.21)$$

The LHS of (7.21) reads
7.1 Maxwell’s Equations and Their Characteristics 275

     
∂ ∂H z ∂H y ∂ ∂H x ∂H z ∂ ∂H y ∂H x
div rot H ¼ 2 þ 2 þ 2
∂x ∂y ∂z ∂y ∂z ∂x ∂z ∂x ∂y
 2 2   2 2   2 2 
∂ ∂ ∂ ∂ ∂ ∂
¼  Hz þ  Hx þ  Hy
∂x∂y ∂y∂x ∂y∂z ∂z∂y ∂z∂x ∂x∂z
¼ 0:
ð7:22Þ
2 2
∂ Hz
With the last equality of (7.22), we used the fact that if, e.g., ∂x∂y
and ∂∂y∂x
Hz
are
2 2
∂ Hz
continuous and differentiable in a certain domain (x, y), ∂x∂y
¼ ∂∂y∂x
Hz
. That is, we
assume “ordinary” functions for Hz, Hx, and Hy. Thus from (7.21), we have

div i = 0: ð7:23Þ

From (7.20), we also have

∂ρðx, t Þ
¼ 0, ð7:24Þ
∂t

where we explicitly show that ρ depends upon both x and t. Note that x is a position
vector described as (3.5). Therefore, (7.24) shows that ρ(x, t) is temporally constant
at a position x, consistent with the stationary current.
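The identity div rot H = 0 of (7.22) can also be observed numerically with centered finite differences; a minimal sketch (ours, assuming NumPy; the sample field is an arbitrary smooth choice):

```python
# Numerical illustration of (7.22), div rot H = 0, a sketch assuming NumPy;
# the sample field H is an arbitrary smooth choice.
import numpy as np

h = 1.0e-3
ax, ay, az = np.eye(3) * h   # displacement vectors along x, y, z

def H(p):
    x, y, z = p
    return np.array([np.sin(y * z), x * np.exp(z), y ** 2 * np.cos(x)])

def rot(F, p):
    d = lambda i, e: (F(p + e)[i] - F(p - e)[i]) / (2.0 * h)   # centered difference
    return np.array([d(2, ay) - d(1, az), d(0, az) - d(2, ax), d(1, ax) - d(0, ay)])

def div(F, p):
    return sum((F(p + e)[i] - F(p - e)[i]) / (2.0 * h) for i, e in enumerate((ax, ay, az)))

p = np.array([0.3, -0.7, 1.1])
print(div(lambda q: rot(H, q), p))   # ~ 0 up to floating-point error
```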
Nevertheless, we encounter a problem when $\rho(\boldsymbol{x}, t)$ is temporally varying. In other words, (7.17) goes against the charge conservation law when $\rho(\boldsymbol{x}, t)$ is temporally varying. It was James Clerk Maxwell (1861–1862) who solved the problem by introducing the concept of the displacement current. In fact, taking div of both sides of (7.16), we have

$$\operatorname{div} \operatorname{rot} \boldsymbol{H} = \operatorname{div} \boldsymbol{i} + \frac{\partial\, \operatorname{div} \boldsymbol{D}}{\partial t} = \operatorname{div} \boldsymbol{i} + \frac{\partial \rho}{\partial t} = 0, \quad (7.25)$$

where with the first equality we exchanged the order of differentiations with respect to $t$ and $\boldsymbol{x}$; with the second equality we used (7.1). The last equality of (7.25) results from (7.20). In other words, in virtue of the term $\frac{\partial \boldsymbol{D}}{\partial t}$, (7.4) is consistent with the charge conservation law. Thus, the set of Maxwell's equations (7.1)–(7.4) supplies us with a well-established basis in natural science up until the present.

Although the set of these equations describes spatial and temporal changes in electric and magnetic fields in vacuum and matter including metal, in Part II we confine ourselves to the changes in the electric and magnetic fields in a uniform dielectric medium.
7.2 Equation of Wave Motion

If we further confine ourselves to the case where neither electric charge nor electric current is present in a uniform dielectric medium, we can readily obtain equations of wave motion for the electric and magnetic fields. That is,

$$\operatorname{div} \boldsymbol{D} = 0, \quad (7.26)$$

$$\operatorname{div} \boldsymbol{B} = 0, \quad (7.27)$$

$$\operatorname{rot} \boldsymbol{E} + \frac{\partial \boldsymbol{B}}{\partial t} = 0, \quad (7.28)$$

$$\operatorname{rot} \boldsymbol{H} - \frac{\partial \boldsymbol{D}}{\partial t} = 0. \quad (7.29)$$

The relations (7.27) and (7.28) are identical to (7.2) and (7.3), respectively.

Let us start with a formula of vector analysis. First, we introduce the grad operator. We have

$$\operatorname{grad} f = \nabla f = \boldsymbol{e}_1\frac{\partial f}{\partial x} + \boldsymbol{e}_2\frac{\partial f}{\partial y} + \boldsymbol{e}_3\frac{\partial f}{\partial z}.$$

That is, the grad operator transforms a scalar to a vector. We have the following formula:

$$\operatorname{rot} \operatorname{rot} \boldsymbol{V} = \operatorname{grad} \operatorname{div} \boldsymbol{V} - \nabla^2\boldsymbol{V}. \quad (7.30)$$

The operator $\nabla^2$ has already appeared in (1.24). To show (7.30), we compare the $x$-component of both sides of (7.30). That is,

$$[\operatorname{rot} \operatorname{rot} \boldsymbol{V}]_x = \frac{\partial}{\partial y}\left(\frac{\partial V_y}{\partial x} - \frac{\partial V_x}{\partial y}\right) - \frac{\partial}{\partial z}\left(\frac{\partial V_x}{\partial z} - \frac{\partial V_z}{\partial x}\right), \quad (7.31)$$

$$\left[\operatorname{grad} \operatorname{div} \boldsymbol{V} - \nabla^2\boldsymbol{V}\right]_x = \frac{\partial}{\partial x}\left(\frac{\partial V_x}{\partial x} + \frac{\partial V_y}{\partial y} + \frac{\partial V_z}{\partial z}\right) - \frac{\partial^2 V_x}{\partial x^2} - \frac{\partial^2 V_x}{\partial y^2} - \frac{\partial^2 V_x}{\partial z^2} = \frac{\partial}{\partial x}\left(\frac{\partial V_y}{\partial y} + \frac{\partial V_z}{\partial z}\right) - \frac{\partial^2 V_x}{\partial y^2} - \frac{\partial^2 V_x}{\partial z^2}. \quad (7.32)$$

Again assuming that $\frac{\partial^2 V_y}{\partial y\partial x} = \frac{\partial^2 V_y}{\partial x\partial y}$ and $\frac{\partial^2 V_z}{\partial z\partial x} = \frac{\partial^2 V_z}{\partial x\partial z}$, we have

$$[\operatorname{rot} \operatorname{rot} \boldsymbol{V}]_x = \left[\operatorname{grad} \operatorname{div} \boldsymbol{V} - \nabla^2\boldsymbol{V}\right]_x. \quad (7.33)$$

Regarding the $y$- and $z$-components, we have similar relations as well. Thus (7.30) holds.
Taking rot of both sides of (7.28), we have

$$\operatorname{rot} \operatorname{rot} \boldsymbol{E} + \operatorname{rot}\frac{\partial \boldsymbol{B}}{\partial t} = \operatorname{grad} \operatorname{div} \boldsymbol{E} - \nabla^2\boldsymbol{E} + \frac{\partial\, \operatorname{rot} \boldsymbol{B}}{\partial t} = -\nabla^2\boldsymbol{E} + \mu\varepsilon\frac{\partial^2\boldsymbol{E}}{\partial t^2} = 0, \quad (7.34)$$

where with the first equality we used (7.30) and with the second equality we used (7.7), (7.10), (7.26), and (7.29). Thus we have

$$\nabla^2\boldsymbol{E} = \mu\varepsilon\frac{\partial^2\boldsymbol{E}}{\partial t^2}. \quad (7.35)$$

Similarly, from (7.29) we get

$$\nabla^2\boldsymbol{H} = \mu\varepsilon\frac{\partial^2\boldsymbol{H}}{\partial t^2}. \quad (7.36)$$

Equations (7.35) and (7.36) are called the equations of wave motion for the electric and magnetic fields.

To consider the implications of these equations, let us think for simplicity of the following equation in a one-dimensional space:

$$\frac{\partial^2 y(x,t)}{\partial x^2} = \frac{1}{v^2}\frac{\partial^2 y(x,t)}{\partial t^2}, \quad (7.37)$$

where $y$ is an arbitrary scalar function that depends on $x$ and $t$; $v$ is a constant. Let $f(x, t)$ and $g(x, t)$ be arbitrarily chosen functions. Then, $f(x - vt)$ and $g(x + vt)$ are two solutions of (7.37). In fact, putting $X = x - vt$, we have

$$\frac{\partial f}{\partial x} = \frac{\partial f}{\partial X}\frac{\partial X}{\partial x} = \frac{\partial f}{\partial X}, \qquad \frac{\partial^2 f}{\partial x^2} = \frac{\partial}{\partial X}\left(\frac{\partial f}{\partial X}\right)\frac{\partial X}{\partial x} = \frac{\partial^2 f}{\partial X^2}, \quad (7.38)$$

$$\frac{\partial f}{\partial t} = \frac{\partial f}{\partial X}\frac{\partial X}{\partial t} = (-v)\frac{\partial f}{\partial X}, \qquad \frac{\partial^2 f}{\partial t^2} = (-v)\frac{\partial}{\partial X}\left(\frac{\partial f}{\partial X}\right)\frac{\partial X}{\partial t} = (-v)^2\frac{\partial^2 f}{\partial X^2}. \quad (7.39)$$

From the second equations of (7.38) and (7.39), we recover

$$\frac{\partial^2 f}{\partial x^2} = \frac{1}{v^2}\frac{\partial^2 f}{\partial t^2}. \quad (7.40)$$

Similarly, we get
$$\frac{\partial^2 g}{\partial x^2} = \frac{1}{v^2}\frac{\partial^2 g}{\partial t^2}. \quad (7.41)$$

Therefore, as a general solution we can take a superposition of $f(x, t)$ and $g(x, t)$. That is,

$$y(x, t) = f(x - vt) + g(x + vt). \quad (7.42)$$

The implication of (7.42) is as follows: (i) The function $f(x - vt)$ can be obtained by a parallel translation of $f(x)$ by $vt$ in the positive direction of the $x$-axis. In other words, $f(x - vt)$ is obtained by translating $f(x)$ by $v$ per unit time in the positive direction of the $x$-axis; i.e., the function represented by $f(x)$ is translated at a rate $v$ with its form unchanged in time. (ii) The function $g(x + vt)$, on the other hand, is translated at a rate $-v$ (i.e., in the negative direction of the $x$-axis) with its form unchanged in time as well. (iii) Thus, $y(x, t)$ of (7.42) represents two "waves," i.e., a forward wave and a backward wave. The propagation velocity of the two waves is $|v|$ accordingly. Usually we choose a positive number for $v$, and $v$ is called the phase velocity.
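That $y(x,t) = f(x - vt) + g(x + vt)$ satisfies (7.37) for arbitrary smooth $f$ and $g$ is easy to confirm numerically; a minimal sketch (ours, assuming NumPy; the Gaussian and Lorentzian pulses are illustrative choices):

```python
# Numerical check that y = f(x - vt) + g(x + vt) satisfies (7.37), a sketch
# assuming NumPy; the Gaussian and Lorentzian pulses are illustrative.
import numpy as np

v, h = 2.0, 1.0e-3
f = lambda s: np.exp(-s ** 2)
g = lambda s: 1.0 / (1.0 + s ** 2)
y = lambda x, t: f(x - v * t) + g(x + v * t)

x, t = 0.4, 1.3
y_xx = (y(x + h, t) - 2.0 * y(x, t) + y(x - h, t)) / h ** 2
y_tt = (y(x, t + h) - 2.0 * y(x, t) + y(x, t - h)) / h ** 2
print(y_xx - y_tt / v ** 2)   # ~ 0 up to O(h^2) discretization error
```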
Comparing (7.35) and (7.36) with (7.41), we have

$$\mu\varepsilon = 1/v^2. \quad (7.43)$$

In particular, in vacuum we recover (7.12).

Notice that $f$ and $g$ can take any functional form and, hence, they are not necessarily periodic waves. Yet, what we are mostly concerned with is a periodic wave such as a sinusoidal wave. Thus, we arrive at the following functional form:

$$f(x - vt) = Ae^{i(x - vt)}, \quad (7.44)$$

where $A$ is said to be the amplitude of the wave. The constant $A$ usually takes a positive number, but it may take a complex number including a negative number. The exponent of (7.44) contains a number having the dimension [m]. To make it a dimensionless number, we multiply $(x - vt)$ by a wavenumber $k$ that has been introduced in (1.2) and (1.3). That is, we have

$$\tilde{f}[k(x - vt)] = Ae^{ik(x - vt)} = Ae^{i(kx - kvt)} = Ae^{i(kx - \omega t)}, \quad (7.45)$$

where $\tilde{f}$ shows the change in the functional form according to the variable transformation. In (7.45), we have

$$kv = k\lambda\nu = (2\pi/\lambda)\lambda\nu = 2\pi\nu = \omega, \quad (7.46)$$

where $\nu$ and $\omega$ are said to be the frequency and angular frequency, respectively. For a three-dimensional wave $f$ of a scalar function, we have the following form:
$$f = Ae^{i(\boldsymbol{k}\cdot\boldsymbol{x} - \omega t)} = A[\cos(\boldsymbol{k}\cdot\boldsymbol{x} - \omega t) + i\sin(\boldsymbol{k}\cdot\boldsymbol{x} - \omega t)], \quad (7.47)$$

where $\boldsymbol{k}$ is said to be the wavenumber vector. In (7.47),

$$\boldsymbol{k}^2 = k^2 = k_x^2 + k_y^2 + k_z^2. \quad (7.48)$$

Equation (7.47) is virtually identical to (1.25). When we deal with a problem of classical electromagnetism, we usually take the real part of the results after the relevant calculations.

Suppose that (7.47) is a solution of the following wave equation:

$$\nabla^2 f = \frac{1}{v^2}\frac{\partial^2 f}{\partial t^2}. \quad (7.49)$$

Substituting the LHS of (7.47) for $f$ of (7.49), we have

$$-Ak^2 e^{i(\boldsymbol{k}\cdot\boldsymbol{x} - \omega t)} = -\frac{1}{v^2}A\omega^2 e^{i(\boldsymbol{k}\cdot\boldsymbol{x} - \omega t)}. \quad (7.50)$$

Comparing both sides of (7.50), we get

$$k^2 v^2 = \omega^2 \quad \text{or} \quad kv = \omega. \quad (7.51)$$

Thus, we recover (7.46).

Here we introduce a unit vector $\boldsymbol{n}$ as in (1.3) whose direction parallels that of the propagation of the wave such that

$$\boldsymbol{k} = k\boldsymbol{n} = \frac{2\pi}{\lambda}\boldsymbol{n}, \qquad \boldsymbol{n} = (\boldsymbol{e}_1\ \boldsymbol{e}_2\ \boldsymbol{e}_3)\begin{pmatrix} n_x \\ n_y \\ n_z \end{pmatrix}, \quad (7.52)$$

where $n_x$, $n_y$, and $n_z$ define direction cosines. Then (7.47) can be rewritten as

$$f = Ae^{i(k\boldsymbol{n}\cdot\boldsymbol{x} - \omega t)}, \quad (7.53)$$

where the exponent is called the phase. Suppose that the phase is fixed at zero. That is,

$$k\boldsymbol{n}\cdot\boldsymbol{x} - \omega t = 0. \quad (7.54)$$

Since $\omega = kv$, we have


$$\boldsymbol{n}\cdot\boldsymbol{x} - vt = 0. \quad (7.55)$$

Fig. 7.3 Schematic representation of a plane wave. The field has the same phase on a plane P. A solid arrow $\boldsymbol{x}$ represents an arbitrary position vector on the plane and $\boldsymbol{n}$ is a unit vector perpendicular to the plane P (i.e., parallel to a normal of the plane P). The quantity $v$ is the phase velocity of the plane wave

Equation (7.55) defines a plane in a three-dimensional space and is called Hesse's normal form. Figure 7.3 schematically represents a plane wave of the field $f$. The field has the same phase on a plane P. A solid arrow $\boldsymbol{x}$ represents an arbitrary position vector on the plane and $\boldsymbol{n}$ is a unit vector perpendicular to the plane P (i.e., parallel to a normal of the plane P). The quantity $vt$ defines the length of the perpendicular that connects the origin O and the plane P (i.e., the length from the origin to the foot of the perpendicular) at a given time $t$.

In other words, (7.54) determines a plane in such a way that the wave $f$ has the same phase (zero) at a given time $t$ at position vectors $\boldsymbol{x}$ on the plane determined by (7.54) or (7.55). That plane is moving in the direction of $\boldsymbol{n}$ at the phase velocity $v$. From this situation, a wave $f$ described by (7.53) is called a plane wave.

The refractive index $n$ of a dielectric medium is an important index that characterizes its dielectric properties. It is defined as

$$n \equiv c/v = \sqrt{\mu\varepsilon/\mu_0\varepsilon_0} = \sqrt{\mu_r\varepsilon_r}. \quad (7.56)$$

In a nonmagnetic substance such as glass and polymer materials, we can assume that $\mu_r \approx 1$. Thus, we get an approximate expression as follows:

$$n \approx \sqrt{\varepsilon_r}. \quad (7.57)$$

7.3 Polarized Characteristics of Electromagnetic Waves

As in (7.47), we assume a similar form for a solution of (7.35) such that


7.3 Polarized Characteristics of Electromagnetic Waves 281

E = E0 eiðkxωtÞ ¼ E0 eiðknxωtÞ , ð7:58Þ

where with the second equality we used (7.52). Similarly, we have

H = H0 eiðkxωtÞ ¼ H0 eiðknxωtÞ : ð7:59Þ

In (7.58) and (7.59), E0 and H0 are constant vectors and may take a complex
magnitude. Substituting (7.58) for (7.28) and using (7.10) as well as (7.43) and
(7.46), we have

ikn  E0 eiðknxωtÞ ¼ ðiωÞμH 0 eiðknxωtÞ ¼ ivkμH 0 eiðknxωtÞ


pffiffiffiffiffiffiffiffi
¼ ik μ=εH 0 eiðknxωtÞ :

Comparing coefficients of the exponential functions of the first and last sides, we get
pffiffiffiffiffiffiffiffi
H 0 = n  E0 = μ=ε : ð7:60Þ

Similarly, substituting (7.59) for (7.29) and using (7.7), we get


pffiffiffiffiffiffiffiffi
E0 = μ=ε H 0  n: ð7:61Þ

From (7.26) and (7.27), we have

n  E0 ¼ n  H 0 ¼ 0: ð7:62Þ

This indicates that E and H are both perpendicular to n, i.e., the propagation
direction of the electromagnetic wave. Thus, the electromagnetic wave is character-
ized by a transverse wave. The fields E and H have the same phase on P at an
arbitrary given time. Taking account of (7.60)–(7.62), E, H, and n are mutually
perpendicular to one another. We depict a geometry of E, H, and n for the
electromagnetic plane wave in Fig. 7.4, where a plane P is perpendicular to n.
We find that (7.60) is not independent of (7.61). In fact, taking an outer product
from the right with respect to both sides of (7.60), we have

\[ \mathbf{H}_0\times\mathbf{n} = \left(\mathbf{n}\times\mathbf{E}_0\right)\times\mathbf{n}\big/\sqrt{\mu/\varepsilon} = \mathbf{E}_0\big/\sqrt{\mu/\varepsilon}, \]

where we used (7.61) with the second equality. Thus, we get

\[ \mathbf{n}\times\mathbf{E}_0\times\mathbf{n} = \mathbf{E}_0. \tag{7.63} \]

Meanwhile, vector analysis tells us that [1]



Fig. 7.4 Mutual geometry of E and H for an electromagnetic plane wave in P. E and
H have the same phase on P at an arbitrary given time. The unit vector n is
perpendicular to P

\[ \mathbf{C}\times(\mathbf{A}\times\mathbf{B}) = \mathbf{A}(\mathbf{B}\cdot\mathbf{C}) - \mathbf{B}(\mathbf{C}\cdot\mathbf{A}). \]

In the above, putting B = C = n, we have

\[ \mathbf{n}\times(\mathbf{A}\times\mathbf{n}) = \mathbf{A}(\mathbf{n}\cdot\mathbf{n}) - \mathbf{n}(\mathbf{n}\cdot\mathbf{A}). \]

That is, we have

\[ \mathbf{A} = \mathbf{n}(\mathbf{n}\cdot\mathbf{A}) + \mathbf{n}\times(\mathbf{A}\times\mathbf{n}). \]

This relation means that A can be decomposed into a component parallel to n and
one perpendicular to n. Equation (7.63) shows that E_0 has no component parallel to
n. This is another confirmation that E is perpendicular to n.

In (7.60) and (7.61), √(μ/ε) has a dimension of [Ω]. Make sure that this can be
confirmed by (7.9) and (7.11). Hence, √(μ/ε) is said to be the characteristic impedance
[2]. We denote it by

\[ Z \equiv \sqrt{\mu/\varepsilon}. \tag{7.64} \]

Thus, we have

\[ \mathbf{H}_0 = (\mathbf{n}\times\mathbf{E}_0)/Z. \]

In vacuum we have

\[ Z_0 = \sqrt{\mu_0/\varepsilon_0} \approx 376.7\ [\Omega]. \]
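As a quick numerical illustration of (7.56) and (7.64) (a sketch, not part of the original text; the glass-like value ε_r = 2.25 is an assumed example), one may evaluate the vacuum impedance and the impedance of a nonmagnetic dielectric in Python:

```python
import math

# Vacuum constants (standard values, accurate to the digits shown).
EPS0 = 8.8541878128e-12   # vacuum permittivity [F/m]
MU0 = 4.0e-7 * math.pi    # vacuum permeability [H/m]

def impedance(eps_r: float, mu_r: float = 1.0) -> float:
    """Characteristic impedance Z = sqrt(mu/eps), cf. (7.64)."""
    return math.sqrt(mu_r * MU0 / (eps_r * EPS0))

def refractive_index(eps_r: float, mu_r: float = 1.0) -> float:
    """Refractive index n = sqrt(mu_r * eps_r), cf. (7.56)."""
    return math.sqrt(mu_r * eps_r)

print(impedance(1.0))           # vacuum: ~376.7 ohm, cf. Z0 above
print(refractive_index(2.25))   # assumed glass-like eps_r -> n = 1.5
print(impedance(2.25))          # ~Z0/n = 251.2 ohm
```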

For the electromagnetic wave to be a transverse wave means that neither E nor
H has a component along n. Choosing the positive direction of the z-axis for n and

ignoring components related to partial differentiation with respect to x and y (i.e., the
components related to ∂/∂x and ∂/∂y), we rewrite (7.28) and (7.29) for the individual
Cartesian coordinates as

\[
\begin{aligned}
-\frac{\partial E_y}{\partial z} + \frac{\partial B_x}{\partial t} &= 0 \quad\text{or}\quad -\frac{\partial D_y}{\partial z} + \mu\varepsilon\frac{\partial H_x}{\partial t} = 0,\\
\frac{\partial E_x}{\partial z} + \frac{\partial B_y}{\partial t} &= 0 \quad\text{or}\quad \frac{\partial D_x}{\partial z} + \mu\varepsilon\frac{\partial H_y}{\partial t} = 0,\\
-\frac{\partial H_y}{\partial z} - \frac{\partial D_x}{\partial t} &= 0 \quad\text{or}\quad -\frac{\partial B_y}{\partial z} - \mu\varepsilon\frac{\partial E_x}{\partial t} = 0,\\
\frac{\partial H_x}{\partial z} - \frac{\partial D_y}{\partial t} &= 0 \quad\text{or}\quad \frac{\partial B_x}{\partial z} - \mu\varepsilon\frac{\partial E_y}{\partial t} = 0.
\end{aligned} \tag{7.65}
\]

We differentiate the first equation of (7.65) with respect to z to get

\[ -\frac{\partial^2 E_y}{\partial z^2} + \frac{\partial^2 B_x}{\partial z\,\partial t} = 0. \]

Also, differentiating the fourth equation of (7.65) with respect to t and multiplying
both sides by −μ, we have

\[ -\mu\frac{\partial^2 H_x}{\partial t\,\partial z} + \mu\frac{\partial^2 D_y}{\partial t^2} = 0. \]

Summing both sides of the above equations and arranging terms, we get

\[ \frac{\partial^2 E_y}{\partial z^2} = \mu\varepsilon\frac{\partial^2 E_y}{\partial t^2}. \]

In a similar manner, we have

\[ \frac{\partial^2 E_x}{\partial z^2} = \mu\varepsilon\frac{\partial^2 E_x}{\partial t^2}. \]

Similarly, for the magnetic field, we also get

\[ \frac{\partial^2 H_x}{\partial z^2} = \mu\varepsilon\frac{\partial^2 H_x}{\partial t^2} \quad\text{and}\quad \frac{\partial^2 H_y}{\partial z^2} = \mu\varepsilon\frac{\partial^2 H_y}{\partial t^2}. \]

From the above relations, we have two plane electromagnetic waves polarized along either
the x-axis or the y-axis.
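The following finite-difference sketch (not from the text; the phase velocity and wavenumber are illustrative assumptions) checks numerically that a plane wave E_y = cos(kz − ωt) with ω = kv satisfies ∂²E_y/∂z² = με ∂²E_y/∂t²:

```python
import math

v = 2.0e8                 # assumed phase velocity [m/s]; mu*eps = 1/v**2
k = 1.0e7                 # assumed wavenumber [1/m]
omega = k * v
mu_eps = 1.0 / v ** 2

def Ey(z, t):
    """Plane-wave field polarized along y, cf. (7.47)."""
    return math.cos(k * z - omega * t)

z, t = 0.3e-6, 1.0e-15    # arbitrary evaluation point
h = 1.0e-10               # step in z; the time step is scaled by 1/v
ht = h / v

d2_dz2 = (Ey(z + h, t) - 2 * Ey(z, t) + Ey(z - h, t)) / h ** 2
d2_dt2 = (Ey(z, t + ht) - 2 * Ey(z, t) + Ey(z, t - ht)) / ht ** 2
print(d2_dz2 / (mu_eps * d2_dt2))   # -> ~1.0, i.e., the wave equation holds
```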

What is implied in the above description is that as solutions of (7.35) and (7.36)
we have a plane wave characterized by a specific direction defined by E0 and H0.
This implies that if we observe the electromagnetic wave at a fixed point, both E and
H oscillate along the mutually perpendicular directions E0 and H0. Hence, we say
that the “electric wave” is polarized in the direction E0 and that the “magnetic wave”
is polarized in the direction H0.
To characterize the polarization of the electromagnetic wave, we introduce the
following unit polarization vectors ε_e and ε_m (with indices e and m related to the
electric and magnetic field, respectively) [3]:

\[ \boldsymbol{\varepsilon}_e = \mathbf{E}_0/E_0 \quad\text{and}\quad \boldsymbol{\varepsilon}_m = \mathbf{H}_0/H_0; \qquad H_0 = E_0/Z, \tag{7.66} \]

where E_0 and H_0 are said to be amplitudes and may again be complex. We have

\[ \boldsymbol{\varepsilon}_e \times \boldsymbol{\varepsilon}_m = \mathbf{n}. \tag{7.67} \]

We call ε_e and ε_m the unit polarization vectors of the electric field and magnetic field,
respectively. As noted above, ε_e, ε_m, and n constitute a right-handed system in this
order and are mutually perpendicular to one another.
The phase of E in the plane wave (7.58) and that of H in (7.59) are individually
the same on all the points of P. From the wave equations of (7.35) and (7.36),
however, it is unclear whether E and H have the same phase. Suppose that E and H
had different phases such that

\[ \mathbf{E} = \mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{x} - \omega t)} \]

and

\[ \mathbf{H} = \mathbf{H}_0 e^{i(\mathbf{k}\cdot\mathbf{x} - \omega t + \delta)} = \mathbf{H}_0 e^{i\delta} e^{i(\mathbf{k}\cdot\mathbf{x} - \omega t)} = \widetilde{\mathbf{H}}_0 e^{i(\mathbf{k}\cdot\mathbf{x} - \omega t)}, \]

where H̃_0 (= H_0 e^{iδ}) is complex and the phase factor e^{iδ} in the exponent is unknown.
This factor, however, can be set at zero. To show this, let us make a qualitative
discussion using Fig. 7.5. Figure 7.5 shows the electric field of the plane wave at
some instant as a function of the phase Φ. Suppose that Φ is taken in the direction of n
in Fig. 7.4. Then, from (7.53) we have

\[ \Phi = k\mathbf{n}\cdot\mathbf{x} - \omega t = k\mathbf{n}\cdot\rho\mathbf{n} - \omega t = k\rho - \omega t, \]

where ρ is the distance from the origin. Also we have

\[ \mathbf{E} = \mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{x} - \omega t)} = E_0 \boldsymbol{\varepsilon}_e e^{i(k\rho - \omega t)} \quad (E_0 > 0). \]

Suppose furthermore that the phase is measured at t = 0 and that the electric field is
measured along the ε_e direction. Then, we have

Fig. 7.5 Electric field of a plane wave at some instant as a function of phase. The
sinusoidal curve representing the electric field is shifted with time from left to right
with its form unchanged

\[ \Phi = k\rho \quad\text{and}\quad E = |\mathbf{E}| = E_0 e^{ik\rho}. \tag{7.68} \]

In Fig. 7.5, the real part of E is shown with E = E_0 at ρ = 0. The sinusoidal curve
representing the electric field in Fig. 7.5 is shifted with time from left to right with its
form unchanged.
In this situation, the electric field is to be strengthened with a negative value at a
phase P1 with time according to (7.68), whereas at another phase P2 the electric field
is to be strengthened with a positive value with time. As a result, in a region tucked
between P1 and P2 a spiral magnetic field is generated toward the εm direction, i.e.,
upper side of a plane of paper. The magnitude of the magnetic field is expected to be
maximized at a center point of P1P2 (i.e., the origin of the coordinate system) where
the electric field is maximized as well. Thus, we conclude that E and H have the
same phase. It is important to realize that the generation of the electromagnetic wave
is a direct consequence of the interplay between the electric field and magnetic field
that both change with time. It is truly based upon the nature of the electric and
magnetic fields that are clearly represented in Maxwell’s equations.
Next, we consider the situation where two electromagnetic waves are propagated
in the same direction but with a different phase. Notice again that we are considering
the electromagnetic wave that is propagated in a uniform and infinite dielectric
medium without BCs.

7.4 Superposition of Two Electromagnetic Waves

Let E1 and E2 be two electric waves described such that

\[ \mathbf{E}_1 = E_1 \mathbf{e}_1 e^{i(kz - \omega t)} \quad\text{and}\quad \mathbf{E}_2 = E_2 \mathbf{e}_2 e^{i(kz - \omega t + \delta)}, \]

where E_1 (>0) and E_2 (>0) are amplitudes and e_1 and e_2 represent unit polarization
vectors in the directions of the positive x-axis and y-axis; we assume that the two waves
are being propagated in the direction of the positive z-axis; δ is a phase difference. The
total electric field E is described as the superposition of E_1 and E_2 such that

\[ \mathbf{E} = \mathbf{E}_1 + \mathbf{E}_2 = E_1 \mathbf{e}_1 e^{i(kz-\omega t)} + E_2 \mathbf{e}_2 e^{i(kz-\omega t+\delta)}. \tag{7.69} \]

Note that we usually discuss the polarization characteristics of an electromagnetic
wave by considering only the electric wave. We emphasize that an electric wave and the
concomitant magnetic wave share the same phase in a uniform and infinite dielectric
medium. One reason why the electric wave is taken to represent the electromagnetic
wave is that optical applications are mostly made in nonmagnetic substances such as
glass, water, plastics, and most semiconductors.

Let us view the temporal change of E at a fixed point x = 0; i.e., x = y = z = 0. Then,
taking the real part of (7.69), the x- and y-components of E, i.e., E_x and E_y, are
expressed as

\[ E_x = E_1\cos(\omega t) \quad\text{and}\quad E_y = E_2\cos(\omega t - \delta). \tag{7.70} \]

First, let us briefly think of the case where δ = 0. Eliminating t, we have

\[ E_y = \frac{E_2}{E_1} E_x. \tag{7.71} \]

This is an equation of a straight line. The resulting electric field E is accordingly called
linearly polarized light. That is, when we are observing the electric field of the
relevant light at the origin, the field is oscillating along the straight line described by
(7.71) with the origin centrally located in the oscillating field. If δ = π, we have

\[ E_y = -\frac{E_2}{E_1} E_x. \]

This gives a straight line as well. Therefore, if we wish to seek the relationship
between E_x and E_y, it suffices to examine it as a function of δ in the region
−π/2 ≤ δ ≤ π/2.
(i) Case I: E_1 ≠ E_2.

Let us consider the case where δ ≠ 0 in (7.70). Rewriting the second equation of
(7.70) and inserting the first equation into it so that we can eliminate t, we have

\[ E_y = E_2(\cos\omega t\cos\delta + \sin\omega t\sin\delta) = E_2\left(\cos\delta\,\frac{E_x}{E_1} \pm \sin\delta\sqrt{1 - \frac{E_x^2}{E_1^2}}\right). \]

Rearranging terms of the above equation, we have



\[ \frac{E_y}{E_2} - (\cos\delta)\frac{E_x}{E_1} = \pm(\sin\delta)\sqrt{1 - \frac{E_x^2}{E_1^2}}. \tag{7.72} \]

Squaring both sides of (7.72) and arranging the equation, we get

\[ \frac{E_x^2}{E_1^2\sin^2\delta} - \frac{2(\cos\delta)E_x E_y}{E_1 E_2\sin^2\delta} + \frac{E_y^2}{E_2^2\sin^2\delta} = 1. \tag{7.73} \]

Using a matrix form, we have

\[ (E_x\ E_y)\begin{pmatrix} \dfrac{1}{E_1^2\sin^2\delta} & -\dfrac{\cos\delta}{E_1 E_2\sin^2\delta} \\[2mm] -\dfrac{\cos\delta}{E_1 E_2\sin^2\delta} & \dfrac{1}{E_2^2\sin^2\delta} \end{pmatrix}\begin{pmatrix} E_x \\ E_y \end{pmatrix} = 1. \tag{7.74} \]

Note that the above matrix is real symmetric. In that case, to examine the properties
of the matrix we calculate its determinant along with the principal minors. A principal
minor means a minor with respect to a diagonal element. In this case, the two principal
minors are \( \frac{1}{E_1^2\sin^2\delta} \) and \( \frac{1}{E_2^2\sin^2\delta} \). Also we have

\[ \begin{vmatrix} \dfrac{1}{E_1^2\sin^2\delta} & -\dfrac{\cos\delta}{E_1 E_2\sin^2\delta} \\[2mm] -\dfrac{\cos\delta}{E_1 E_2\sin^2\delta} & \dfrac{1}{E_2^2\sin^2\delta} \end{vmatrix} = \frac{1}{E_1^2 E_2^2\sin^2\delta}. \tag{7.75} \]

Evidently, the two principal minors as well as the determinant are all positive (δ ≠ 0). In
this case, the (2, 2) matrix of (7.74) is said to be positive definite. The related
discussion will be given in Part III. The positive definiteness means that in the
quadratic form described by (7.74), the LHS takes a positive value for any real numbers
E_x and E_y except the unique case of E_x = E_y = 0, which renders the LHS zero. The
positive definiteness of a matrix ensures the existence of positive eigenvalues of
the said matrix.

Let us consider a real symmetric (2, 2) matrix that has positive principal minors
and a positive determinant in a general case. Let such a matrix M be

\[ M = \begin{pmatrix} a & c \\ c & b \end{pmatrix}, \]

where a, b > 0 and det M > 0; i.e., ab − c² > 0. Let the corresponding quadratic form
be Q. Then, we have

    
\[ Q = (x\ y)\begin{pmatrix} a & c \\ c & b \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = ax^2 + 2cxy + by^2 = a\left(x + \frac{cy}{a}\right)^2 - \frac{c^2 y^2}{a} + \frac{aby^2}{a} = a\left(x + \frac{cy}{a}\right)^2 + \frac{y^2}{a}\left(ab - c^2\right). \]

Thus, Q ≥ 0 for any real numbers x and y. We seek a condition under which Q = 0.
We readily find that with M having the above properties, only x = y = 0 makes
Q = 0. Thus, M is positive definite. We will deal with this issue from a more general
standpoint in Part III.
In general, it is pretty complicated to seek the eigenvalues and corresponding eigen-
vectors in the above case. Yet, we can extract important information from (7.74).
The eigenvalues λ are estimated as follows:

\[ \lambda = \frac{E_1^2 + E_2^2 \pm \sqrt{\left(E_1^2 + E_2^2\right)^2 - 4E_1^2 E_2^2\sin^2\delta}}{2E_1^2 E_2^2\sin^2\delta}. \tag{7.76} \]

Notice that λ in (7.76) represents two different positive eigenvalues. It is because the
inside of the square root can be rewritten as

\[ \left(E_1^2 - E_2^2\right)^2 + 4E_1^2 E_2^2\cos^2\delta > 0 \quad (\delta \neq \pi/2). \]

Also we have

\[ E_1^2 + E_2^2 > \sqrt{\left(E_1^2 + E_2^2\right)^2 - 4E_1^2 E_2^2\sin^2\delta}. \]
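These properties are easy to confirm numerically. The sketch below (illustrative amplitudes and phase assumed) builds the matrix of (7.74) and checks that its eigenvalues are positive and agree with the closed form (7.76):

```python
import numpy as np

E1, E2, delta = 2.0, 1.0, 0.8     # assumed illustrative values
s2 = np.sin(delta) ** 2

# Real symmetric matrix of the quadratic form (7.74).
M = np.array([[1.0 / (E1**2 * s2), -np.cos(delta) / (E1 * E2 * s2)],
              [-np.cos(delta) / (E1 * E2 * s2), 1.0 / (E2**2 * s2)]])
print(np.linalg.eigvalsh(M))      # both positive: M is positive definite

# Closed-form eigenvalues (7.76).
T, D = E1**2 + E2**2, 4.0 * E1**2 * E2**2 * s2
lam = (T + np.array([-1.0, 1.0]) * np.sqrt(T**2 - D)) / (2.0 * E1**2 * E2**2 * s2)
print(np.sort(lam))               # matches the eigvalsh output
```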

These relations clearly show that the quadratic form of (7.74) gives an ellipse (i.e.,
elliptically polarized light). Because of the presence of the second term of the LHS of
(7.73), both the major and minor axes of the ellipse are tilted and diverted from the
x- and y-axes.

Let us inspect the ellipse described by (7.74). Inserting E_x = E_1, obtained at t = 0
in (7.70), into (7.73) and solving the quadratic equation with respect to E_y, we get E_y
as a double root such that

\[ E_y = E_2\cos\delta. \]

Similarly, putting E_y = E_2 in (7.73), we have

\[ E_x = E_1\cos\delta. \]

These results show that the ellipse described by (7.73) or (7.74) is internally tangent
to a rectangle as depicted in Fig. 7.6a. Equation (7.69) shows that the electromag-
netic wave is propagated toward the positive direction of the z-axis. Therefore, in
Fig. 7.6a we are peeking into the oncoming wave from the bottom of the plane of the
paper

Fig. 7.6 Trace of the electric field of an elliptically polarized light. (a) The trace is
internally tangent to a rectangle of 2E_1 × 2E_2. In the case of δ > 0, starting from P at
t = 0, the coordinate point representing the electric field traces the ellipse
counterclockwise with time. (b) The trace of an elliptically polarized light for δ = π/2

at a certain position of z = constant. We set the constant = 0. Then, we find that at
t = 0 the electric field is represented by the point P (E_x = E_1, E_y = E_2 cos δ); see
Fig. 7.6a. From (7.70), if δ > 0, P traces the ellipse counterclockwise. It reaches the
maximum point of E_y = E_2 at t = δ/ω. Since the trace of the electric field forms an
ellipse as in Fig. 7.6, the associated light is said to be elliptically polarized light. If
δ < 0 in (7.70), on the other hand, P traces the ellipse clockwise.

In a special case of δ = π/2, the second term of (7.73) vanishes and we have a
simple form described as

\[ \frac{E_x^2}{E_1^2} + \frac{E_y^2}{E_2^2} = 1. \tag{7.77} \]

Thus, the principal axes of the ellipse coincide with the x- and y-axes. On the basis of
(7.70), we see from Fig. 7.6b that starting from P at t = 0, again the coordinate point
representing the electric field traces the ellipse counterclockwise with time; see the
curved arrow of Fig. 7.6b. If δ < 0, the coordinate point traces the ellipse clockwise
with time.
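Before turning to the special case E_1 = E_2, the following sketch (with assumed values E_1 = 2, E_2 = 1, δ = π/3) traces (7.70) over one period and confirms that every point satisfies the ellipse equation (7.73):

```python
import numpy as np

E1, E2, delta = 2.0, 1.0, np.pi / 3.0   # assumed illustrative values
wt = np.linspace(0.0, 2.0 * np.pi, 13)  # omega*t samples over one period

Ex = E1 * np.cos(wt)                    # (7.70)
Ey = E2 * np.cos(wt - delta)

s2 = np.sin(delta) ** 2
lhs = (Ex**2 / (E1**2 * s2)
       - 2.0 * np.cos(delta) * Ex * Ey / (E1 * E2 * s2)
       + Ey**2 / (E2**2 * s2))
print(np.allclose(lhs, 1.0))            # -> True: the trace lies on (7.73)
```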
(ii) Case II: E_1 = E_2.

Now, let us consider a simple but important case. When E_1 = E_2, (7.73) is simplified
to

\[ E_x^2 - 2\cos\delta\,E_x E_y + E_y^2 = E_1^2\sin^2\delta. \tag{7.78} \]

Using a matrix form, we have

\[ (E_x\ E_y)\begin{pmatrix} 1 & -\cos\delta \\ -\cos\delta & 1 \end{pmatrix}\begin{pmatrix} E_x \\ E_y \end{pmatrix} = E_1^2\sin^2\delta. \tag{7.79} \]

We obtain the eigenvalues λ of the matrix of (7.79) such that

\[ \lambda = 1 \pm |\cos\delta|. \tag{7.80} \]

Setting −π/2 ≤ δ ≤ π/2, we have

\[ \lambda = 1 \pm \cos\delta. \tag{7.81} \]

The corresponding normalized eigenvectors v_1 and v_2 (as column vectors) are

\[ \mathbf{v}_1 = \begin{pmatrix} \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} \end{pmatrix} \quad\text{and}\quad \mathbf{v}_2 = \begin{pmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{pmatrix}. \tag{7.82} \]

Thus, we have a diagonalizing unitary matrix P such that

\[ P = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix}. \tag{7.83} \]

Defining the matrix appearing in (7.79) as A such that

\[ A = \begin{pmatrix} 1 & -\cos\delta \\ -\cos\delta & 1 \end{pmatrix}, \tag{7.84} \]

we obtain

\[ P^{-1}AP = \begin{pmatrix} 1 + \cos\delta & 0 \\ 0 & 1 - \cos\delta \end{pmatrix}. \tag{7.85} \]
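A one-line numerical check of the diagonalization (7.85) (a sketch with an assumed value of δ):

```python
import numpy as np

delta = 0.8                                  # assumed phase difference
A = np.array([[1.0, -np.cos(delta)],
              [-np.cos(delta), 1.0]])        # matrix (7.84)
P = np.array([[1.0, 1.0],
              [-1.0, 1.0]]) / np.sqrt(2.0)   # eigenvectors (7.82) as columns

# Should print diag(1 + cos(delta), 1 - cos(delta)), cf. (7.85).
print(np.round(np.linalg.inv(P) @ A @ P, 12))
```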

Notice that the eigenvalues (1 + cos δ) and (1 − cos δ) are both positive as expected.
Rewriting (7.79), we have

\[ (E_x\ E_y)PP^{-1}\begin{pmatrix} 1 & -\cos\delta \\ -\cos\delta & 1 \end{pmatrix}PP^{-1}\begin{pmatrix} E_x \\ E_y \end{pmatrix} = (E_x\ E_y)P\begin{pmatrix} 1+\cos\delta & 0 \\ 0 & 1-\cos\delta \end{pmatrix}P^{-1}\begin{pmatrix} E_x \\ E_y \end{pmatrix} = E_1^2\sin^2\delta. \tag{7.86} \]

Here, let us define new coordinates such that

\[ \begin{pmatrix} u \\ v \end{pmatrix} \equiv P^{-1}\begin{pmatrix} E_x \\ E_y \end{pmatrix}. \tag{7.87} \]

This coordinate transformation corresponds to the transformation of the basis vectors
(e_1 e_2) such that

\[ (\mathbf{e}_1\ \mathbf{e}_2)\begin{pmatrix} E_x \\ E_y \end{pmatrix} = (\mathbf{e}_1\ \mathbf{e}_2)PP^{-1}\begin{pmatrix} E_x \\ E_y \end{pmatrix} = (\mathbf{e}_1'\ \mathbf{e}_2')\begin{pmatrix} u \\ v \end{pmatrix}, \tag{7.88} \]

where the new basis vectors (e_1′ e_2′) are given by

\[ (\mathbf{e}_1'\ \mathbf{e}_2') = (\mathbf{e}_1\ \mathbf{e}_2)P = \left(\frac{1}{\sqrt{2}}\mathbf{e}_1 - \frac{1}{\sqrt{2}}\mathbf{e}_2 \quad \frac{1}{\sqrt{2}}\mathbf{e}_1 + \frac{1}{\sqrt{2}}\mathbf{e}_2\right). \tag{7.89} \]

The coordinate system is depicted in Fig. 7.7 along with the basis vectors. The
relevant discussion will again appear in Part III.

Substituting (7.87) for (7.86) and rearranging terms, we get

\[ \frac{u^2}{E_1^2(1-\cos\delta)} + \frac{v^2}{E_1^2(1+\cos\delta)} = 1. \tag{7.90} \]

Equation (7.90) indicates that the major axis and minor axis are \(E_1\sqrt{1+\cos\delta}\) and
\(E_1\sqrt{1-\cos\delta}\), respectively. When δ = π/2, (7.90) becomes

Fig. 7.7 Relationship between the basis vectors (e_1 e_2) and (e_1′ e_2′) in the case of
E_1 = E_2; see text

Fig. 7.8 Polarized feature of light in the case of E_1 = E_2. (a) If δ > 0, the electric
field traces an ellipse from A_1 via A_2 to A_3 (see text). (b) If δ = π/2, the electric field
traces a circle from C_1 via C_2 to C_3 (left-circularly polarized light)

\[ \frac{u^2}{E_1^2} + \frac{v^2}{E_1^2} = 1. \tag{7.91} \]

This represents a circle. For this reason, the wave described by (7.91) is called
circularly polarized light. In (7.90) where δ ≠ π/2, the wave is said to be
elliptically polarized light. Thus, we have linearly, elliptically, and circularly polar-
ized lights depending on the magnitude of δ.

Let us closely examine the characteristics of the elliptically and circularly polarized
lights in the case of E_1 = E_2. When t = 0, from (7.70) we have

\[ E_x = E_1 \quad\text{and}\quad E_y = E_1\cos\delta. \tag{7.92} \]

This coordinate point corresponds to A_1, whose E_x coordinate is E_1 (see Fig. 7.8a). In
the case of Δt = δ/2ω, E_x = E_y = E_1 cos(δ/2). This point corresponds to A_2 in
Fig. 7.8a. We have

\[ \sqrt{E_x^2 + E_y^2} = \sqrt{2}\,E_1\cos(\delta/2) = E_1\sqrt{1+\cos\delta}. \tag{7.93} \]

This is equal to the major axis as anticipated. With t = Δt,

\[ E_x = E_1\cos(\omega\Delta t) \quad\text{and}\quad E_y = E_1\cos(\omega\Delta t - \delta). \tag{7.94} \]

Notice that E_y takes the maximum E_1 when Δt = δ/ω. Consequently, if δ takes a
positive value, E_y takes the maximum E_1 for a positive Δt, as is similarly the case with
Fig. 7.6a. At that time, E_x = E_1 cos δ < E_1. This point corresponds to A_3 in
Fig. 7.8a. As a result, the electric field traces the ellipse counterclockwise with time,

as in the case of Fig. 7.6. If δ takes a negative value, on the other hand, the field traces
the ellipse clockwise.

If δ = π/2, in (7.94) we have

\[ E_x = E_1\cos(\omega t) \quad\text{and}\quad E_y = E_1\cos\left(\omega t - \frac{\pi}{2}\right) = E_1\sin(\omega t). \tag{7.95} \]

We examine the case of δ = π/2 first. In this case, when t = 0, E_x = E_1 and E_y = 0
(point C_1 in Fig. 7.8b). If ωt = π/4, E_x = E_y = E_1/√2 (point C_2 in Fig. 7.8b). In turn,
if ωt = π/2, E_x = 0 and E_y = E_1 (point C_3). Again the electric field traces the circle
counterclockwise. In this situation, we see the light from above the z-axis. In other
words, we are viewing the light against the direction of its propagation. The wave is
said to be left-circularly polarized and to have positive helicity. In contrast, when
δ = −π/2, starting from point C_1 the electric field traces the circle clockwise.
That light is said to be right-circularly polarized and to have negative helicity.
With the left-circularly polarized light, (7.69) can be rewritten as

E = E1 þ E2 = E1 ðe1 þ ie2 ÞeiðkzωtÞ : ð7:96Þ

Therefore, a complex vector (e1 + ie2) characterizes the left-circular polarization. On


the other hand, (e1  ie2) characterizes the right-circular polarization. To normalize
them, it is convenient to use the following vectors as in the case of Sect. 4.3 [3].

1 1
eþ  pffiffiffi ðe1 þ ie2 Þ and e  pffiffiffi ðe1  ie2 Þ, ð4:45Þ
2 2

In the case of δ ¼ 0, we have a linearly polarized light. For this, the points A1, A2,
and A3 coalesce to be a point on a straight line of Ey ¼ Ex.
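In component form with respect to (e_1 e_2), the circular basis (4.45) makes such decompositions immediate; the sketch below (an illustration, not from the text) resolves a linearly x-polarized field into equal left- and right-circular parts:

```python
import numpy as np

e_plus = np.array([1.0, 1.0j]) / np.sqrt(2.0)    # e+ of (4.45)
e_minus = np.array([1.0, -1.0j]) / np.sqrt(2.0)  # e-

e1 = np.array([1.0, 0.0])        # linear polarization along x
c_plus = np.vdot(e_plus, e1)     # projection onto e+
c_minus = np.vdot(e_minus, e1)   # projection onto e-
print(c_plus, c_minus)           # both 1/sqrt(2): equal circular components
print(np.allclose(c_plus * e_plus + c_minus * e_minus, e1))   # -> True
```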

References

1. Arfken GB, Weber HJ, Harris FE (2013) Mathematical methods for physicists, 7th edn. Academic Press, Waltham
2. Pain HJ (2005) The physics of vibrations and waves, 6th edn. Wiley, Chichester
3. Jackson JD (1999) Classical electrodynamics, 3rd edn. Wiley, New York
Chapter 8
Reflection and Transmission of Electromagnetic Waves in Dielectric Media

In Chap. 7, we considered the propagation of electromagnetic waves in an infinite


uniform dielectric medium. In this chapter, we think of a situation where two
(or more) dielectrics are in contact with each other at a plane interface. When two
dielectric media adjoin each other with an interface, propagating electromagnetic
waves are partly reflected by the interface and partly transmitted beyond the inter-
face. We deal with these phenomena in terms of characteristic impedance of the
dielectric media. In the case of an oblique incidence of a wave, we categorize it into a
transverse electric (TE) wave and transverse magnetic (TM) wave. If a thin plate of a
dielectric is sandwiched by a couple of metal sheets, the electromagnetic wave is
confined within the dielectric. In this case, the propagating mode of the wave differs
from that of a wave propagating in a free space (i.e., a space filled by a three-
dimensionally infinite dielectric medium). If a thin plate of a dielectric having a large
refractive index is sandwiched by a couple of dielectrics with a smaller refractive
index, the electromagnetic wave is also confined within the dielectric with a larger
index. In this case, we have to take account of the total reflection that causes a phase
change upon the reflection. We deal with such specific modes of the electromagnetic
wave propagation. These phenomena are treated both from a basic aspect and from a
point of view of device application. The relevant devices are called waveguides in
optics.

8.1 Electromagnetic Fields at an Interface

We start with examining a condition of an electromagnetic field at the plane


interface. Suppose that two semi-infinite dielectric media D1 and D2 are in contact
with each other at a plane interface. Let us take a small rectangle S that strides the
interface (see Fig. 8.1). Taking a surface integral of both sides of (7.28) over the
strip, we have


\[ \int_S \operatorname{rot}\mathbf{E}\cdot\mathbf{n}\,dS + \int_S \frac{\partial\mathbf{B}}{\partial t}\cdot\mathbf{n}\,dS = 0, \tag{8.1} \]

where n is a unit vector directed along a normal of S as shown. Applying Stokes'
theorem to the first term of (8.1), we get

\[ \oint_C \mathbf{E}\cdot d\mathbf{l} + \frac{\partial\mathbf{B}}{\partial t}\cdot\mathbf{n}\,\Delta l\,\Delta h = 0. \tag{8.2} \]

With the line integral of the first term, C is a closed loop surrounding the rectangle
S and dl = tdl, where t is a unit vector directed toward the tangential direction of
C (see t1 and t2 in Fig. 8.1). The line integration is performed such that C is followed
counter-clockwise in the direction of t.
Figure 8.2 gives an intuitive diagram that explains the Stokes’ theorem. The
diagram shows an overview of a surface S encircled by a closed curve C. Suppose
that we have a spiral vector field E represented by arrowed circles as shown. In that
case, rot E is directed toward the upper side of the plane of paper in the individual
fragments. A summation of rot E  ndS forms a surface integral covering S. Mean-
while, the arrows of adjacent fragments cancel out each other and only the compo-
nents on the periphery (i.e., the curve C) are nonvanishing (see Fig. 8.2). Thus, the
surface integral of rot E is equivalent to the line integral of E. Accordingly, we get
Stokes’ theorem described by [1]

Fig. 8.1 A small rectangle S that strides an interface formed by two semi-infinite
dielectric media D1 and D2. Let a curve C be a closed loop surrounding the rectangle
S. A unit vector n is directed along a normal of S. Unit vectors t_1 and t_2 are directed
along a tangential line of the interface plane

Fig. 8.2 Diagram that intuitively explains Stokes' theorem. In the diagram a surface
S is encircled by a closed curve C. An infinitesimal portion of C is denoted by dl. The
surface S is pertinent to the surface integration. A spiral vector field E is present on
and near S

\[ \int_S \operatorname{rot}\mathbf{E}\cdot\mathbf{n}\,dS = \oint_C \mathbf{E}\cdot d\mathbf{l}. \tag{8.3} \]

Returning back to Fig. 8.1 and taking Δh ⟶ 0, we have (∂B/∂t)·n Δl Δh ⟶ 0. Then,
the second term of (8.2) vanishes and we get

\[ \oint_C \mathbf{E}\cdot d\mathbf{l} = 0. \]

This implies that

\[ \Delta l\,(\mathbf{E}_1\cdot\mathbf{t}_1 + \mathbf{E}_2\cdot\mathbf{t}_2) = 0, \]

where E_1 and E_2 represent the electric fields in the dielectrics D1 and D2 close to the
interface, respectively. Considering t_2 = −t_1 and putting t_1 = t, we get

\[ (\mathbf{E}_1 - \mathbf{E}_2)\cdot\mathbf{t} = 0, \tag{8.4} \]

where t represents a unit vector in the direction of a tangential line of the interface
plane. Equation (8.4) means that the tangential components of the electric field are
continuous on both sides of the interface. We obtain a similar result with the
magnetic field. This can be shown by taking a surface integral of both sides of
(7.29) as well. As a result, we get

\[ (\mathbf{H}_1 - \mathbf{H}_2)\cdot\mathbf{t} = 0, \tag{8.5} \]

where H1 and H2 represent the magnetic field in D1 and D2 close to the interface,
respectively. Hence, from (8.5) the tangential components of the magnetic field are
continuous on both sides of the interface as well.

8.2 Basic Concepts Underlying Phenomena

When an electromagnetic wave is incident upon an interface of dielectrics, its


reflection and transmission (refraction) take place at the interface. We address a
question of how the nature of the dielectrics and the conditions dealt with in the
previous section are associated with the optical phenomena. When we deal with the
problem, we assume non-absorbing media. Notice that the complex wavenumber
vector is responsible for an absorbing medium along with a complex index of
refraction. Nonetheless, our approach is useful to discuss related problems in the
absorbing media. Characteristic impedance plays a key role in the reflection and
transmission of light.

We represent a field (either electric or magnetic) of the incident, reflected, and


transmitted (or refracted) waves by Fi, Fr, and Ft, respectively. We call a dielectric
of the incidence side (and, hence, reflection side) D1 and another dielectric of the
transmission side D2. The fields are described by

\[ \mathbf{F}_i = F_i\boldsymbol{\varepsilon}_i e^{i(\mathbf{k}_i\cdot\mathbf{x} - \omega t)}, \tag{8.6} \]
\[ \mathbf{F}_r = F_r\boldsymbol{\varepsilon}_r e^{i(\mathbf{k}_r\cdot\mathbf{x} - \omega t)}, \tag{8.7} \]
\[ \mathbf{F}_t = F_t\boldsymbol{\varepsilon}_t e^{i(\mathbf{k}_t\cdot\mathbf{x} - \omega t)}, \tag{8.8} \]

where Fi, Fr, and Ft denote an amplitude of the field; εi, εr, and εt represent a unit
vector of polarization direction, i.e., the direction along which the field oscillates; ki,
kr, and kt are wavenumber vectors such that ki ⊥ εi, kr ⊥ εr, and kt ⊥ εt. These
wavenumber vectors represent the propagation directions of individual waves. In
(8.6) to (8.8), indices of i, r, and t stand for incidence, reflection, and transmission,
respectively.
Let xs be an arbitrary position vector at the interface between the dielectrics. Also,
let t be a unit vector paralleling the interface. Thus, tangential components of the
field are described as

\[ F_{it} = F_i(\mathbf{t}\cdot\boldsymbol{\varepsilon}_i)e^{i(\mathbf{k}_i\cdot\mathbf{x}_s - \omega t)}, \tag{8.9} \]
\[ F_{rt} = F_r(\mathbf{t}\cdot\boldsymbol{\varepsilon}_r)e^{i(\mathbf{k}_r\cdot\mathbf{x}_s - \omega t)}, \tag{8.10} \]
\[ F_{tt} = F_t(\mathbf{t}\cdot\boldsymbol{\varepsilon}_t)e^{i(\mathbf{k}_t\cdot\mathbf{x}_s - \omega t)}. \tag{8.11} \]

Note that F_{it} and F_{rt} represent the fields in D1 just close to the interface and that
F_{tt} denotes the field in D2 just close to the interface. Thus, in light of (8.4) and (8.5),
we have

\[ F_{it} + F_{rt} = F_{tt}. \tag{8.12} \]

Notice that (8.12) holds with any position xs and any time t.
Let us think of elementary calculations of exponential functions or exponential
polynomials and the relationship between the individual coefficients and exponents.
With respect to the two functions e^{ikx} and e^{ik′x}, we have two alternatives according
to the value the Wronskian takes. Here, the Wronskian W is expressed as

\[ W = \begin{vmatrix} e^{ikx} & e^{ik'x} \\ \left(e^{ikx}\right)' & \left(e^{ik'x}\right)' \end{vmatrix} = i(k'-k)\,e^{i(k+k')x}. \tag{8.13} \]

(i) W ≠ 0 if and only if k ≠ k′. In this case, e^{ikx} and e^{ik′x} are said to be linearly
independent. That is, on the condition of k ≠ k′, for any x we have

\[ ae^{ikx} + be^{ik'x} = 0 \iff a = b = 0. \tag{8.14} \]

(ii) W = 0 if k = k′. In that case, we have

\[ ae^{ikx} + be^{ik'x} = (a+b)e^{ikx} = 0 \iff a + b = 0. \]

Notice that e^{ikx} never vanishes for any x. To conclude, if we think of an
equation of an exponential polynomial

\[ ae^{ikx} + be^{ik'x} = 0, \]

we have two alternatives regarding the coefficients. One is the trivial case of
a = b = 0 and the other is a + b = 0 (with k = k′).
Next, with respect to e^{ik_1x}, e^{ik_2x}, and e^{ik_3x}, similarly we have

\[ W = \begin{vmatrix} e^{ik_1 x} & e^{ik_2 x} & e^{ik_3 x} \\ \left(e^{ik_1 x}\right)' & \left(e^{ik_2 x}\right)' & \left(e^{ik_3 x}\right)' \\ \left(e^{ik_1 x}\right)'' & \left(e^{ik_2 x}\right)'' & \left(e^{ik_3 x}\right)'' \end{vmatrix} = i(k_1-k_2)(k_2-k_3)(k_3-k_1)\,e^{i(k_1+k_2+k_3)x}, \tag{8.15} \]

where W ≠ 0 if and only if k_1 ≠ k_2, k_2 ≠ k_3, and k_3 ≠ k_1. That is, on this condition,
for any x we have

\[ ae^{ik_1 x} + be^{ik_2 x} + ce^{ik_3 x} = 0 \iff a = b = c = 0. \tag{8.16} \]
If the three exponential functions are linearly dependent, at least two of k_1, k_2, and k_3
are equal to each other, and vice versa. On this condition, again consider the following
equation of an exponential polynomial:

\[ ae^{ik_1 x} + be^{ik_2 x} + ce^{ik_3 x} = 0. \tag{8.17} \]

Without loss of generality, we assume that k_1 = k_2. Then, we have

\[ ae^{ik_1 x} + be^{ik_2 x} + ce^{ik_3 x} = (a+b)e^{ik_1 x} + ce^{ik_3 x} = 0. \]

If k_1 ≠ k_3, we must have

\[ a + b = 0 \quad\text{and}\quad c = 0. \tag{8.18} \]

If, on the other hand, k_1 = k_3, i.e., k_1 = k_2 = k_3, we have


\[ ae^{ik_1 x} + be^{ik_2 x} + ce^{ik_3 x} = (a+b+c)e^{ik_1 x} = 0. \]

That is, we have

\[ a + b + c = 0. \tag{8.19} \]

Consequently, we must have k_1 = k_2 = k_3 so that we can get three nonzero
coefficients a, b, and c.
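The Wronskian criterion itself can be checked symbolically. The following sketch (an illustration using sympy) reproduces (8.13) up to the ordering of k and k′ and shows that it vanishes when the two exponents coincide:

```python
import sympy as sp

x, k1, k2 = sp.symbols('x k1 k2')
f = sp.exp(sp.I * k1 * x)
g = sp.exp(sp.I * k2 * x)

# Wronskian of e^{i k1 x} and e^{i k2 x}, cf. (8.13).
W = sp.simplify(sp.Matrix([[f, g],
                           [sp.diff(f, x), sp.diff(g, x)]]).det())
print(W)               # I*(k2 - k1)*exp(I*x*(k1 + k2)), up to the sign convention
print(W.subs(k2, k1))  # 0: the two functions become linearly dependent
```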
Returning to (8.12), its full description is

\[ F_i(\mathbf{t}\cdot\boldsymbol{\varepsilon}_i)e^{i(\mathbf{k}_i\cdot\mathbf{x}_s-\omega t)} + F_r(\mathbf{t}\cdot\boldsymbol{\varepsilon}_r)e^{i(\mathbf{k}_r\cdot\mathbf{x}_s-\omega t)} - F_t(\mathbf{t}\cdot\boldsymbol{\varepsilon}_t)e^{i(\mathbf{k}_t\cdot\mathbf{x}_s-\omega t)} = 0. \tag{8.20} \]

Again, (8.20) must hold for any position x_s and any time t. Meanwhile, for (8.20) to
have a physical meaning, we should have

\[ F_i(\mathbf{t}\cdot\boldsymbol{\varepsilon}_i) \neq 0, \quad F_r(\mathbf{t}\cdot\boldsymbol{\varepsilon}_r) \neq 0, \quad\text{and}\quad F_t(\mathbf{t}\cdot\boldsymbol{\varepsilon}_t) \neq 0. \tag{8.21} \]

On the basis of the above consideration, we must have the following two relations:

\[ \mathbf{k}_i\cdot\mathbf{x}_s - \omega t = \mathbf{k}_r\cdot\mathbf{x}_s - \omega t = \mathbf{k}_t\cdot\mathbf{x}_s - \omega t \]

or

\[ \mathbf{k}_i\cdot\mathbf{x}_s = \mathbf{k}_r\cdot\mathbf{x}_s = \mathbf{k}_t\cdot\mathbf{x}_s, \tag{8.22} \]

and

\[ F_i(\mathbf{t}\cdot\boldsymbol{\varepsilon}_i) + F_r(\mathbf{t}\cdot\boldsymbol{\varepsilon}_r) - F_t(\mathbf{t}\cdot\boldsymbol{\varepsilon}_t) = 0 \]

or

\[ F_i(\mathbf{t}\cdot\boldsymbol{\varepsilon}_i) + F_r(\mathbf{t}\cdot\boldsymbol{\varepsilon}_r) = F_t(\mathbf{t}\cdot\boldsymbol{\varepsilon}_t). \tag{8.23} \]

In this way, we are able to obtain a relation among the amplitudes of the fields of
incidence, reflection, and transmission. Notice that we get both the relations between
exponents and coefficients at once.

First, let us consider (8.22). Suppose that the incident light (k_i) is propagated in a
dielectric medium D1 parallel to the zx-plane and that the interface is the xy-plane
(see Fig. 8.3). Also suppose that at the interface the light is partly reflected back to
D1 and partly transmitted (or refracted) into another dielectric medium D2. In
Fig. 8.3, k_i, k_r, and k_t represent the incident, reflected, and transmitted lights that
make angles θ, θ′, and ϕ with the z-axis, respectively. Then we have

Fig. 8.3 Geometry of the incident, reflected, and transmitted lights. We assume that
the light is incident from a dielectric medium D1 toward another medium D2. The
wavenumber vectors k_i, k_r, and k_t represent the incident, reflected, and transmitted
(or refracted) lights with angles θ, θ′, and ϕ, respectively. Note here that we did not
assume the equality of θ and θ′ (see text)

\[ \mathbf{k}_i = (\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3)\begin{pmatrix} k_i\sin\theta \\ 0 \\ -k_i\cos\theta \end{pmatrix}, \tag{8.24} \]

\[ \mathbf{x}_s = (\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3)\begin{pmatrix} x \\ y \\ 0 \end{pmatrix}, \tag{8.25} \]

where θ is said to be the incidence angle. A plane formed by k_i and a normal to the
interface is called a plane of incidence (or incidence plane). In Fig. 8.3, the zx-plane
forms the incidence plane. From (8.24) and (8.25), we have

\[ \mathbf{k}_i\cdot\mathbf{x}_s = k_i x\sin\theta, \tag{8.26} \]
\[ \mathbf{k}_r\cdot\mathbf{x}_s = k_{rx}x + k_{ry}y, \tag{8.27} \]
\[ \mathbf{k}_t\cdot\mathbf{x}_s = k_{tx}x + k_{ty}y, \tag{8.28} \]

where k_i = |k_i|; k_{rx} and k_{ry} are the x and y components of k_r; similarly, k_{tx} and
k_{ty} are the x and y components of k_t.

Since (8.22) holds for any x and y, we have

\[ k_i\sin\theta = k_{rx} = k_{tx}, \tag{8.29} \]
\[ k_{ry} = k_{ty} = 0. \tag{8.30} \]

From (8.30) neither k_r nor k_t has a y component. This means that k_i, k_r, and k_t are
coplanar. That is, the incident, reflected, and transmitted waves are all parallel to the
zx-plane. Notice that at the beginning we did not assume the coplanarity of those
waves. We did not assume the equality of θ and θ′ either (vide infra). From (8.29)
and Fig. 8.3, however, we have

\[ k_i\sin\theta = k_r\sin\theta' = k_t\sin\phi, \tag{8.31} \]

where k_r = |k_r| and k_t = |k_t|; θ′ and ϕ are said to be the reflection angle and the
refraction angle, respectively. Thus, the end points of k_i, k_r, and k_t are connected on a
straight line that parallels the z-axis. Figure 8.3 clearly shows it.

Now, we suppose that the wavelength of the electromagnetic wave in D1 is λ_1 and
that in D2 is λ_2. Since the incident light and reflected light are propagated in D1, we
have

\[ k_i = k_r = 2\pi/\lambda_1. \tag{8.32} \]

From (8.31) and (8.32), we get

\[ \sin\theta = \sin\theta'. \tag{8.33} \]

Therefore, we have either θ = θ′ or θ′ = π − θ. Since 0 < θ, θ′ < π/2, we have

\[ \theta = \theta'. \tag{8.34} \]

Then, returning back to (8.31), we have

\[ k_i\sin\theta = k_r\sin\theta = k_t\sin\phi. \tag{8.35} \]

This implies that the components of k_i, k_r, and k_t tangential to the interface are
the same.

Meanwhile, we have

\[ k_t = 2\pi/\lambda_2. \tag{8.36} \]

Also we have

\[ c = \lambda_0\nu, \quad v_1 = \lambda_1\nu, \quad v_2 = \lambda_2\nu, \tag{8.37} \]

where v_1 and v_2 are the phase velocities of light in D1 and D2, respectively. Since ν is
common to D1 and D2, we have

\[ c/\lambda_0 = v_1/\lambda_1 = v_2/\lambda_2 \]

or

\[ c/v_1 = \lambda_0/\lambda_1 = n_1, \qquad c/v_2 = \lambda_0/\lambda_2 = n_2, \tag{8.38} \]


where λ_0 is the wavelength in vacuum; n_1 and n_2 are the refractive indices of D1 and
D2, respectively. Combining (8.35) with (8.32), (8.36), and (8.38), we have several
relations such that

\[ \frac{\sin\theta}{\sin\phi} = \frac{k_t}{k_i} = \frac{\lambda_1}{\lambda_2} = \frac{n_2}{n_1}\ (\equiv n), \tag{8.39} \]

where n is said to be the relative refractive index of D2 relative to D1. The relation
(8.39) is called Snell's law. Notice that (8.39) reflects the kinematic aspect of light
and that this characteristic comes from the exponents of (8.20).
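As a simple numerical illustration of Snell's law (8.39) (a sketch; the air-to-glass indices n_1 = 1.0 and n_2 = 1.5 are assumed values):

```python
import math

def refraction_angle(theta: float, n1: float, n2: float) -> float:
    """Refraction angle phi from Snell's law (8.39): n1 sin(theta) = n2 sin(phi)."""
    s = n1 * math.sin(theta) / n2
    if abs(s) > 1.0:
        raise ValueError("total reflection: no real refraction angle")
    return math.asin(s)

phi = refraction_angle(math.radians(30.0), 1.0, 1.5)
print(math.degrees(phi))   # ~19.47 degrees
```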

8.3 Transverse Electric (TE) Waves and Transverse Magnetic (TM) Waves

On the basis of the above argument, we are now in the position to determine the
relations among amplitudes of the electromagnetic fields of waves of incidence,
reflection, and transmission. Notice that since we are dealing with non-absorbing
media, the relevant amplitudes are real (i.e., positive or negative). In other words,
when the phase is retained upon reflection, we have a positive amplitude due to
ei0 ¼ 1. When the phase is reversed upon reflection, on the other hand, we will be
treating a negative amplitude due to eiπ ¼  1. Nevertheless, when we consider the
total reflection, we deal with a complex amplitude (vide infra).
We start with the discussion of the vertical incidence of an electromagnetic wave
before the general oblique incidence. In Fig. 8.4a, we depict electric fields E and
magnetic fields H obtained at a certain moment near the interface. We index, e.g., Ei
for the incident field. There we define unit polarization vectors of the electric field εi,
εr, and εt as identical to be e1 (a unit vector in the direction of the x-axis). In (8.6), we
also define Fi (both electric and magnetic fields) as positive.

Fig. 8.4 Geometry of the electromagnetic fields near the interface between dielectric
media D1 and D2 in the case of vertical incidence. (a) All E_i, E_r, and E_t are directed
in the same direction e_1 (i.e., a unit vector in the positive direction of the x-axis).
(b) Although E_i and E_t are directed in the same direction, E_r is reversed. In this case,
we define E_r as negative

We have two cases about a geometry of the fields (see Fig. 8.4). The first case is
that all Ei, Er, and Et are directed in the same direction (i.e., the positive direction of
the x-axis); see Fig. 8.4a. Another case is that although Ei and Et are directed in the
same direction, Er is reversed (Fig. 8.4b). In this case, we define Er as negative.
Notice that Ei and Et are always directed in the same direction and that Er is directed
either in the same direction or in the opposite direction according to the nature of the
dielectrics. The situation will be discussed soon.
Meanwhile, unit polarization vectors of the magnetic fields are determined by
(7.67) for the incident, reflected, and transmitted waves. In Fig. 8.4, the magnetic
fields are polarized along the y-axis (i.e., perpendicular to the plane of paper). The
magnetic fields Hi and Ht are always directed to the same direction as in the case of
the electric fields. On the other hand, if the phase of Er is conserved, the direction of
Hr is reversed and vice versa. This converse relationship with respect to the electric
and magnetic fields results solely from the requirement that E, H, and the propaga-
tion unit vector n of light must constitute a right-handed system in this order. Notice
that n is reversed upon reflection.
Next, let us consider an oblique incidence. With the oblique incidence, electro-
magnetic waves are classified into two special categories, i.e., transverse electric
(TE) waves (or modes) or transverse magnetic (TM) waves (or modes). The TE wave
is characterized by the electric field that is perpendicular to the incidence plane,
whereas the TM wave is characterized by the magnetic field that is perpendicular to
the incidence plane. Here the incidence plane is a plane that is formed by the
propagation direction of the incident light and the normal to the interface of the
two dielectrics. Since E, H, and n form a right-handed system, in the TE wave H lies
on the incidence plane. For the same reason, in the TM wave E lies on the incidence
plane.
In a general case where a field is polarized in an arbitrary direction, that field can
be formed by superimposing two fields corresponding to the TE and TM waves. In
other words, if we take an arbitrary field E, it can be decomposed into a component
having a unit polarization vector directed perpendicular to the incidence plane and
another component having the polarization vector that lies on the incidence plane.
These two components are orthogonal to each other.
Example 8.1: TE Wave In Fig. 8.5 we depict the geometry of oblique incidence of
a TE wave. The xy-plane defines the interface of the two dielectrics and t of (8.9) lies
on that plane. The zx-plane defines the incidence plane. In this case, E is polarized
along the y-axis with H polarized in the zx-plane. That is, regarding E, we choose
polarization direction εi, εr, and εt of the electric field as e2 (a unit vector toward the
positive direction of the y-axis that is perpendicular to the plane of paper). In Fig. 8.5,
the polarization direction of the electric field is denoted by a symbol ⨂. Therefore,
we have

\[ \mathbf{e}_2\cdot\boldsymbol{\varepsilon}_i = \mathbf{e}_2\cdot\boldsymbol{\varepsilon}_r = \mathbf{e}_2\cdot\boldsymbol{\varepsilon}_t = 1. \tag{8.40} \]

Fig. 8.5 Geometry of the electromagnetic fields near the interface between dielectric
media D1 and D2 in the case of oblique incidence of a TE wave. The electric field E is
polarized along the y-axis (i.e., perpendicular to the plane of paper) with H polarized
in the zx-plane. Polarization directions ε_i, ε_r, and ε_t are given for H. To avoid
complication, neither H_i nor H_t is shown

For H we define the direction of unit polarization vectors εi, εr, and εt so that their
direction cosine relative to the x-axis can be positive; see Fig. 8.5. Choosing e1
(a unit vector in the direction of the x-axis) for t of (8.9) with regard to H, we have

\[ \mathbf{e}_1\cdot\boldsymbol{\varepsilon}_i = \cos\theta, \quad \mathbf{e}_1\cdot\boldsymbol{\varepsilon}_r = \cos\theta, \quad\text{and}\quad \mathbf{e}_1\cdot\boldsymbol{\varepsilon}_t = \cos\phi. \tag{8.41} \]

According as Hr is directed to the same direction as εr or the opposite direction to


εr, the amplitude is defined as positive or negative, as in the case of the vertical
incidence. In Fig. 8.5, we depict the case where the amplitude Hr is negative. That is,
the phase of the magnetic field is reversed upon reflection and, hence, Hr is in an
opposite direction to εr in Fig. 8.5. Note that Hi and Ht are in the same direction as εi
and εt, respectively. To avoid complication, neither Hi nor Ht is shown in Fig. 8.5.
Applying (8.23) to both E and H, we have

\[ E_i + E_r = E_t, \tag{8.42} \]
\[ H_i\cos\theta + H_r\cos\theta = H_t\cos\phi. \tag{8.43} \]

To derive the above equations, we choose t = e_2 for E and t = e_1 for H in (8.23).
Because of the abovementioned converse relationship between E and H, we have

\[ E_r H_r < 0. \tag{8.44} \]

Suppose that we carry out an experiment to determine six amplitudes in (8.42)


and (8.43). Out of those quantities, we can freely choose and fix Ei. Then, we have
five unknown amplitudes, i.e., Er, Et, Hi, Hr, and Ht. Thus, we need three more
relations to determine them. Here information about the characteristic impedance
Z is useful. It was defined as (7.64). From (8.6) to (8.8) as well as (8.44), we get

\[ Z_1 = \sqrt{\mu_1/\varepsilon_1} = E_i/H_i = -E_r/H_r, \tag{8.45} \]
\[ Z_2 = \sqrt{\mu_2/\varepsilon_2} = E_t/H_t, \tag{8.46} \]

where ε_1 and μ_1 are the permittivity and permeability of D1, respectively; ε_2 and μ_2
are the permittivity and permeability of D2, respectively. As an example, we have

\[ \mathbf{H}_i = \mathbf{n}\times\mathbf{E}_i/Z_1 = \mathbf{n}\times E_i\boldsymbol{\varepsilon}_{i,e}e^{i(\mathbf{k}_i\cdot\mathbf{x}-\omega t)}/Z_1 = E_i\boldsymbol{\varepsilon}_{i,m}e^{i(\mathbf{k}_i\cdot\mathbf{x}-\omega t)}/Z_1 = H_i\boldsymbol{\varepsilon}_{i,m}e^{i(\mathbf{k}_i\cdot\mathbf{x}-\omega t)}, \tag{8.47} \]

where we distinguish the polarization vectors of the electric and magnetic fields. Note,
however, that in the above discussion we did not distinguish these vectors, to avoid
complication. Comparing the coefficients of the last relation of (8.47), we get

\[ E_i/Z_1 = H_i. \tag{8.48} \]

On the basis of (8.42) to (8.46), we are able to determine E_r, E_t, H_i, H_r, and H_t.
What we wish to determine, however, is the ratio among those quantities. To this
end, dividing (8.42) and (8.43) by E_i (>0), we define the following quantities:

\[ R_E^{\perp} \equiv E_r/E_i \quad\text{and}\quad T_E^{\perp} \equiv E_t/E_i, \tag{8.49} \]

where R_E^⊥ and T_E^⊥ are said to be the reflection coefficient and transmission
coefficient of the electric field, respectively; the symbol ⊥ denotes a quantity of the
TE wave (i.e., the electric field oscillating vertically with respect to the incidence
plane). Thus, rewriting (8.42) and (8.43) using R_E^⊥ and T_E^⊥, we have

\[ R_E^{\perp} - T_E^{\perp} = -1, \qquad R_E^{\perp}\frac{\cos\theta}{Z_1} + T_E^{\perp}\frac{\cos\phi}{Z_2} = \frac{\cos\theta}{Z_1}. \tag{8.50} \]

Using Cramer's rule of matrix algebra, we have the solution such that

\[ R_E^{\perp} = \frac{\begin{vmatrix} -1 & -1 \\ \dfrac{\cos\theta}{Z_1} & \dfrac{\cos\phi}{Z_2} \end{vmatrix}}{\begin{vmatrix} 1 & -1 \\ \dfrac{\cos\theta}{Z_1} & \dfrac{\cos\phi}{Z_2} \end{vmatrix}} = \frac{Z_2\cos\theta - Z_1\cos\phi}{Z_2\cos\theta + Z_1\cos\phi}, \tag{8.51} \]

\[ T_E^{\perp} = \frac{\begin{vmatrix} 1 & -1 \\ \dfrac{\cos\theta}{Z_1} & \dfrac{\cos\theta}{Z_1} \end{vmatrix}}{\begin{vmatrix} 1 & -1 \\ \dfrac{\cos\theta}{Z_1} & \dfrac{\cos\phi}{Z_2} \end{vmatrix}} = \frac{2Z_2\cos\theta}{Z_2\cos\theta + Z_1\cos\phi}. \tag{8.52} \]

Similarly, defining

\[ R_H^{\perp} \equiv H_r/H_i \quad\text{and}\quad T_H^{\perp} \equiv H_t/H_i, \tag{8.53} \]

where R_H^⊥ and T_H^⊥ are said to be the reflection coefficient and transmission
coefficient of the magnetic field, respectively, we get

\[ R_H^{\perp} = \frac{Z_1\cos\phi - Z_2\cos\theta}{Z_2\cos\theta + Z_1\cos\phi}, \tag{8.54} \]

\[ T_H^{\perp} = \frac{2Z_1\cos\theta}{Z_2\cos\theta + Z_1\cos\phi}. \tag{8.55} \]

In this case, rewrite (8.42) as a relation among H_i, H_r, and H_t using (8.45) and (8.46).
Derivation of (8.54) and (8.55) is left for the readers. Notice also that

\[ R_H^{\perp} = -R_E^{\perp}. \tag{8.56} \]

This relation can easily be derived from (8.45).


Example 8.2: TM Wave In a manner similar to that described above, we obtain
information about the TM wave. Switching a role of E and H, we assume that H is
polarized along the y-axis with E polarized in the zx-plane. Following the aforemen-
tioned procedures, we have

E i cos θ þ E r cos θ ¼ E t cos ϕ, ð8:57Þ


Hi þ Hr ¼ Ht : ð8:58Þ

From (8.57) and (8.58), similarly we get

k Z 2 cos ϕ  Z 1 cos θ
RE ¼ , ð8:59Þ
Z 1 cos θ þ Z 2 cos ϕ
k 2Z 2 cos θ
TE ¼ : ð8:60Þ
Z 1 cos θ þ Z 2 cos ϕ

Also, we get

Table 8.1 Reflection and transmission coefficients of TE and TM waves

⊥ Incidence (TE):
  R_E^⊥ = (Z_2 cos θ − Z_1 cos ϕ) / (Z_2 cos θ + Z_1 cos ϕ)
  R_H^⊥ = (Z_1 cos ϕ − Z_2 cos θ) / (Z_2 cos θ + Z_1 cos ϕ) = −R_E^⊥
  T_E^⊥ = 2Z_2 cos θ / (Z_2 cos θ + Z_1 cos ϕ)
  T_H^⊥ = 2Z_1 cos θ / (Z_2 cos θ + Z_1 cos ϕ)
  −R_E^⊥ R_H^⊥ + T_E^⊥ T_H^⊥ (cos ϕ / cos θ) = 1   (R^⊥ + T^⊥ = 1)

∥ Incidence (TM):
  R_E^∥ = (Z_2 cos ϕ − Z_1 cos θ) / (Z_1 cos θ + Z_2 cos ϕ)
  R_H^∥ = (Z_1 cos θ − Z_2 cos ϕ) / (Z_1 cos θ + Z_2 cos ϕ) = −R_E^∥
  T_E^∥ = 2Z_2 cos θ / (Z_1 cos θ + Z_2 cos ϕ)
  T_H^∥ = 2Z_1 cos θ / (Z_1 cos θ + Z_2 cos ϕ)
  −R_E^∥ R_H^∥ + T_E^∥ T_H^∥ (cos ϕ / cos θ) = 1   (R^∥ + T^∥ = 1)

\[ R_H^{\parallel} = \frac{Z_1\cos\theta - Z_2\cos\phi}{Z_1\cos\theta + Z_2\cos\phi} = -R_E^{\parallel}, \tag{8.61} \]

\[ T_H^{\parallel} = \frac{2Z_1\cos\theta}{Z_1\cos\theta + Z_2\cos\phi}. \tag{8.62} \]

In Table 8.1, we list the important coefficients in relation to the reflection and
transmission of electromagnetic waves along with their relationship.
In Examples 8.1 and 8.2, we have examined how the reflection and transmission
coefficients vary as a function of characteristic impedance as well as incidence and
refraction angles. Meanwhile, in a nonmagnetic substance a refractive index n can be
approximated as
\[ n \approx \sqrt{\varepsilon_r}, \tag{7.57} \]

assuming that μ_r ≈ 1. In this case, we have

\[ Z = \sqrt{\mu/\varepsilon} = \sqrt{\mu_r\mu_0/\varepsilon_r\varepsilon_0} \approx Z_0/\sqrt{\varepsilon_r} \approx Z_0/n. \]

Using this relation, we can readily rewrite the reflection and transmission coeffi-
cients as a function of refractive indices of the dielectrics. The derivation is left for
the readers.
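Along those lines, the sketch below (illustrative indices and incidence angle assumed) evaluates the TE coefficients of Table 8.1 with Z ≈ Z_0/n and confirms the energy relation −R_E^⊥R_H^⊥ + T_E^⊥T_H^⊥ cos ϕ/cos θ = 1 listed there:

```python
import numpy as np

n1, n2, theta = 1.0, 1.5, np.radians(45.0)   # assumed air-to-glass incidence
phi = np.arcsin(n1 * np.sin(theta) / n2)     # Snell's law (8.39)
Z1, Z2 = 1.0 / n1, 1.0 / n2                  # Z ~ Z0/n; Z0 cancels in the ratios

den = Z2 * np.cos(theta) + Z1 * np.cos(phi)
RE = (Z2 * np.cos(theta) - Z1 * np.cos(phi)) / den   # (8.51)
RH = (Z1 * np.cos(phi) - Z2 * np.cos(theta)) / den   # (8.54) = -RE
TE = 2.0 * Z2 * np.cos(theta) / den                  # (8.52)
TH = 2.0 * Z1 * np.cos(theta) / den                  # (8.55)

print(-RE * RH + TE * TH * np.cos(phi) / np.cos(theta))   # -> 1.0
```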

8.4 Energy Transport by Electromagnetic Waves

Returning to (7.58) and (7.59), let us consider energy transport in a dielectric


medium by electromagnetic waves. Let us describe their electric (E) and magnetic
(H) fields of the electromagnetic waves in a uniform and infinite dielectric medium
such that

\[ \mathbf{E} = E\boldsymbol{\varepsilon}_e e^{i(k\mathbf{n}\cdot\mathbf{x} - \omega t)}, \tag{8.63} \]
\[ \mathbf{H} = H\boldsymbol{\varepsilon}_m e^{i(k\mathbf{n}\cdot\mathbf{x} - \omega t)}, \tag{8.64} \]

where ε_e and ε_m are unit polarization vectors; we assume that both E and H are
positive. Notice again that ε_e, ε_m, and n constitute a right-handed system in this
order.

The energy transport is characterized by the Poynting vector S described by

\[ \mathbf{S} = \mathbf{E}\times\mathbf{H}. \tag{8.65} \]

Since E and H have the dimensions [V/m] and [A/m], respectively, S has the dimension
[W/m²]. Hence, S represents an energy flow per unit time and per unit area with
respect to the propagation direction. For simplicity, let us assume that the
electromagnetic wave is propagating in the z-direction. Then we have

\[ \mathbf{E} = E\boldsymbol{\varepsilon}_e e^{i(kz - \omega t)}, \tag{8.66} \]
\[ \mathbf{H} = H\boldsymbol{\varepsilon}_m e^{i(kz - \omega t)}. \tag{8.67} \]

To seek the time-averaged energy flow in the z-direction, it suffices to multiply the
real parts of (8.66) and (8.67) and integrate the product over a period T at the point
z = 0. Thus, the time-averaged Poynting vector S̄ is given by

\[ \bar{\mathbf{S}} = \mathbf{e}_3\frac{EH}{T}\int_0^T \cos^2\omega t\,dt, \tag{8.68} \]

where T = 1/ν = 2π/ω. Using the trigonometric formula

\[ \cos^2\omega t = \frac{1}{2}(1 + \cos 2\omega t), \tag{8.69} \]

the integration can easily be performed. Thus, we get

\[ \bar{\mathbf{S}} = \frac{1}{2}EH\mathbf{e}_3. \tag{8.70} \]

Equivalently, we have

\[ \bar{\mathbf{S}} = \frac{1}{2}\mathbf{E}\times\mathbf{H}^*. \tag{8.71} \]
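The factor 1/2 in (8.70) is just the time average of cos² ωt over a period; a quick numerical check (with an assumed frequency) is:

```python
import numpy as np

omega = 2.0 * np.pi * 1.0e9        # assumed angular frequency [rad/s]
T = 2.0 * np.pi / omega
t = np.linspace(0.0, T, 100001)

avg = np.trapz(np.cos(omega * t) ** 2, t) / T
print(avg)                         # -> 0.5, reproducing the 1/2 of (8.70)
```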

Meanwhile, an energy density W is given by



\[ W = \frac{1}{2}(\mathbf{E}\cdot\mathbf{D} + \mathbf{H}\cdot\mathbf{B}), \tag{8.72} \]

where the first and second terms are pertinent to the electric and magnetic fields,
respectively. Note in (8.72) that the dimension of E·D is [V/m · C/m²] = [J/m³] and
that the dimension of H·B is [A/m · Vs/m²] = [Ws/m³] = [J/m³]. Using (7.7) and
(7.10), we have

\[ W = \frac{1}{2}\left(\varepsilon E^2 + \mu H^2\right). \tag{8.73} \]

As in the above case, estimating the time-averaged energy density W̄, we get

\[ \bar{W} = \frac{1}{2}\left(\frac{1}{2}\varepsilon E^2 + \frac{1}{2}\mu H^2\right) = \frac{1}{4}\varepsilon E^2 + \frac{1}{4}\mu H^2. \tag{8.74} \]

We also get this relation by integrating (8.73) over a wavelength λ at the time t = 0.
Using (7.60) and (7.61), we have

\[ \varepsilon E^2 = \mu H^2. \tag{8.75} \]

This implies that the energy density resulting from the electric field and that due to
the magnetic field have the same value. Thus, rewriting (8.74) we have

\[ \bar{W} = \frac{1}{2}\varepsilon E^2 = \frac{1}{2}\mu H^2. \tag{8.76} \]

Moreover, using (7.43), we have for the impedance

\[ Z = E/H = \sqrt{\mu/\varepsilon} = \mu v \quad\text{or}\quad E = \mu vH. \tag{8.77} \]

Using this relation along with (8.75), we get

\[ \bar{\mathbf{S}} = \frac{1}{2}v\varepsilon E^2\mathbf{e}_3 = \frac{1}{2}v\mu H^2\mathbf{e}_3. \tag{8.78} \]

Thus, we have various relations among the amplitudes of electromagnetic waves and
related physical quantities together with the constants of the dielectrics.

Returning to Examples 8.1 and 8.2, let us further investigate the reflection and
transmission properties of the electromagnetic waves. From (8.51) to (8.55) as well
as (8.59) to (8.62), we get in both the cases of TE and TM waves

\[ -R_E^{\perp}R_H^{\perp} + T_E^{\perp}T_H^{\perp}\frac{\cos\phi}{\cos\theta} = 1, \tag{8.79} \]
cos θ

\[ -R_E^{\parallel}R_H^{\parallel} + T_E^{\parallel}T_H^{\parallel}\frac{\cos\phi}{\cos\theta} = 1. \tag{8.80} \]

In both the TE and TM cases, we define the reflectance R and transmittance T such that

\[ R \equiv -R_E R_H = R_E^2 = \frac{2|\bar{\mathbf{S}}_r|}{2|\bar{\mathbf{S}}_i|} = \frac{|\bar{\mathbf{S}}_r|}{|\bar{\mathbf{S}}_i|}, \tag{8.81} \]

where S̄_r and S̄_i are the time-averaged Poynting vectors of the reflected and incident
waves, respectively. Also, we have

\[ T \equiv T_E T_H\frac{\cos\phi}{\cos\theta} = \frac{2|\bar{\mathbf{S}}_t|\cos\phi}{2|\bar{\mathbf{S}}_i|\cos\theta} = \frac{|\bar{\mathbf{S}}_t|\cos\phi}{|\bar{\mathbf{S}}_i|\cos\theta}, \tag{8.82} \]

where S̄_t is the time-averaged Poynting vector of the transmitted wave. Thus, we have

\[ R + T = 1. \tag{8.83} \]
The relation (8.83) represents the energy conservation. The factor cos ϕ/cos θ can be
understood from Fig. 8.6, which depicts a luminous flux near the interface. Suppose
that we have an incident wave with an irradiance I [W/m²] whose incidence plane is
the zx-plane. Notice that I has the same dimension as a Poynting vector.

Here let us think of the luminous flux that is getting through a unit area (i.e., a unit
length square) perpendicular to the propagation direction of the light. Then, this flux
illuminates an area on the interface of a unit length (in the y-direction) multiplied by
a length of 1/cos θ (in the x-direction). That is, the luminous flux has been widened
(or squeezed) by cos ϕ/cos θ times after getting through the interface. The irradiance
has been weakened (or strengthened) accordingly (see Fig. 8.6). Thus, to take a
balance of income and outgo with respect to the luminous flux before and after
getting through the interface, the transmission irradiance must be multiplied by the
factor cos ϕ/cos θ.

Fig. 8.6 Luminous flux near the interface

8.5 Brewster Angles and Critical Angles

In this section and subsequent sections, we deal with nonmagnetic substances as
dielectrics; namely, we assume μ_r ≈ 1. In that case, as mentioned in Sect. 8.3 we
rewrite, e.g., (8.51) and (8.59) as

\[ R_E^{\perp} = \frac{\cos\theta - n\cos\phi}{\cos\theta + n\cos\phi}, \tag{8.84} \]

\[ R_E^{\parallel} = \frac{\cos\phi - n\cos\theta}{\cos\phi + n\cos\theta}, \tag{8.85} \]

where n (= n_2/n_1) is the relative refractive index of D2 relative to D1. Let us think of
the condition on which R_E^⊥ = 0 or R_E^∥ = 0.
First, we consider (8.84). We have

\[ [\text{numerator of } (8.84)] = \cos\theta - n\cos\phi = \cos\theta - \frac{\sin\theta}{\sin\phi}\cos\phi = \frac{\sin\phi\cos\theta - \sin\theta\cos\phi}{\sin\phi} = \frac{\sin(\phi-\theta)}{\sin\phi}, \tag{8.86} \]

where with the second equality we used Snell's law; the last equality is due to a
trigonometric formula. Since we assume 0 < θ < π/2 and 0 < ϕ < π/2, we have
−π/2 < ϕ − θ < π/2. Therefore, sin(ϕ − θ) = 0 if and only if ϕ − θ = 0. Namely,
only when ϕ = θ could R_E^⊥ vanish. For different dielectrics having different
refractive indices, we have ϕ = θ only if ϕ = θ = 0 (i.e., vertical incidence).
But in that case we have

\[ \lim_{\phi\to 0,\,\theta\to 0}\frac{\sin(\phi-\theta)}{\sin\phi} = \frac{0}{0}. \]

This is a limit of indeterminate form. From (8.84), however, we have

\[ R_E^{\perp} = \frac{1-n}{1+n} \tag{8.87} \]

for ϕ = θ = 0. This implies that R_E^⊥ does not vanish at ϕ = θ = 0. Thus, R_E^⊥ never
vanishes for any θ or ϕ. Note that under this condition we naturally have

\[ R_E^{\parallel} = \frac{1-n}{1+n}. \]

This is because with ϕ = θ = 0 there is no physical difference between the TE and TM
waves.

In turn, let us examine (8.85) similarly in the case of the TM wave. We have
In turn, let us examine (8.85) similarly with the case of TM wave. We have

\[ [\text{numerator of } (8.85)] = \cos\phi - n\cos\theta = \cos\phi - \frac{\sin\theta}{\sin\phi}\cos\theta = \frac{\sin\phi\cos\phi - \sin\theta\cos\theta}{\sin\phi} = \frac{\sin(\phi-\theta)\cos(\phi+\theta)}{\sin\phi}. \tag{8.88} \]

With the last equality of (8.88), we used a trigonometric formula. From (8.86) we
know that sin(ϕ − θ)/sin ϕ does not vanish. Therefore, for R_E^∥ to vanish, we need
cos(ϕ + θ) = 0. Since 0 < ϕ + θ < π, cos(ϕ + θ) = 0 if and only if

\[ \phi + \theta = \pi/2. \tag{8.89} \]

In other words, for the particular angles θ = θ_B and ϕ = ϕ_B that satisfy

\[ \phi_B + \theta_B = \pi/2, \tag{8.90} \]

we have R_E^∥ = 0; i.e., we do not observe a reflected wave. The particular angle θ_B is
said to be the Brewster angle. For θ_B we have

\[ \sin\phi_B = \sin\left(\frac{\pi}{2} - \theta_B\right) = \cos\theta_B, \]
\[ n = \sin\theta_B/\sin\phi_B = \sin\theta_B/\cos\theta_B = \tan\theta_B \quad\text{or}\quad \theta_B = \tan^{-1}n, \]
\[ \phi_B = \tan^{-1}n^{-1}. \tag{8.91} \]

Suppose that we have a parallel plate consisting of a dielectric D2 of refractive
index n_2 sandwiched by another dielectric D1 of refractive index n_1 (Fig. 8.7).
Let θ_B be the Brewster angle when the TM wave is incident from D1 to D2.

Fig. 8.7 Diagram that explains the Brewster angle. Suppose that a parallel plate
consisting of a dielectric D2 of refractive index n_2 is sandwiched by another
dielectric D1 of refractive index n_1. The incidence angle θ_B represents the Brewster
angle observed when the TM wave is incident from D1 to D2; ϕ_B is another Brewster
angle that is observed when the TM wave is getting back from D2 to D1

In the

above discussion, we defined the relative refractive index n of D2 relative to D1
as n = n_2/n_1; recall (8.39). The other way around, suppose that the TM wave
is incident from D2 to D1. Then, the relative refractive index of D1 relative to D2
is n_1/n_2 = n^{-1}. In this situation, the other Brewster angle (from D2 to D1), defined
as θ̃_B, is given by

\[ \tilde{\theta}_B = \tan^{-1}n^{-1}. \tag{8.92} \]

This number is, however, identical to ϕ_B in (8.91). Thus, we have

\[ \tilde{\theta}_B = \phi_B. \tag{8.93} \]

Thus, regarding the TM wave that is propagating in D2 after getting through the
interface and is to get back to D1, θ̃_B = ϕ_B is again the Brewster angle. In this way,
the said TM wave propagates from D1 to D2 and then gets back from D2 to D1
without being reflected by the two interfaces. This conspicuous feature is often
utilized for optical devices.

If an electromagnetic wave is incident from a dielectric of a higher refractive
index to that of a lower index, total reflection can take place. This is equally the case
with both TE and TM waves. For the total reflection to take place, θ should be larger
than the critical angle θ_c that is defined by

\[ \theta_c = \sin^{-1}n. \tag{8.94} \]

This is because at θ_c, from Snell's law, we have

\[ \frac{\sin\theta_c}{\sin\frac{\pi}{2}} = \sin\theta_c = \frac{n_2}{n_1}\ (\equiv n). \tag{8.95} \]

From (8.95), we have

\[ \tan\theta_c = n/\sqrt{1-n^2} > n = \tan\theta_B. \]

In the case of the TM wave, therefore, we find that

\[ \theta_c > \theta_B. \tag{8.96} \]

The critical angle is always larger than the Brewster angle with TM waves.
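Both angles follow directly from (8.91) and (8.94); the sketch below (glass-to-air indices assumed for illustration) confirms θ_c > θ_B:

```python
import math

def brewster_angle(n1: float, n2: float) -> float:
    """Brewster angle theta_B = arctan(n2/n1), cf. (8.91)."""
    return math.atan(n2 / n1)

def critical_angle(n1: float, n2: float) -> float:
    """Critical angle theta_c = arcsin(n2/n1), cf. (8.94); requires n2 < n1."""
    return math.asin(n2 / n1)

n1, n2 = 1.5, 1.0   # assumed glass-to-air interface
print(math.degrees(brewster_angle(n1, n2)))   # ~33.7 degrees
print(math.degrees(critical_angle(n1, n2)))   # ~41.8 degrees: theta_c > theta_B
```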

8.6 Total Reflection

In Sect. 8.2 we saw that Snell's law results from a kinematical requirement. For
this reason, we may consider it as a universal relation that can be extended to
complex refraction angles. In fact, for Snell's law to hold with θ > θ_c, we
must have

\[ \sin\phi > 1. \tag{8.97} \]

This requires us to extend ϕ to a complex domain. Putting

\[ \phi = \frac{\pi}{2} + ia \quad (a:\ \text{real},\ a \neq 0), \tag{8.98} \]

we have

\[ \sin\phi \equiv \frac{1}{2i}\left(e^{i\phi} - e^{-i\phi}\right) = \frac{1}{2}\left(e^{a} + e^{-a}\right) > 1, \tag{8.99} \]

\[ \cos\phi \equiv \frac{1}{2}\left(e^{i\phi} + e^{-i\phi}\right) = -\frac{i}{2}\left(e^{a} - e^{-a}\right). \tag{8.100} \]

Thus, cos ϕ is pure imaginary.

Now, let us consider a transmitted wave whose electric field is described as

\[ \mathbf{E}_t = E\boldsymbol{\varepsilon}_t e^{i(\mathbf{k}_t\cdot\mathbf{x} - \omega t)}, \tag{8.101} \]

where ε_t is the unit polarization vector and k_t is the wavenumber vector of the
transmitted wave. Suppose that the incidence plane is the zx-plane. Then, we have

\[ \mathbf{k}_t\cdot\mathbf{x} = k_{tx}x + k_{tz}z = xk_t\sin\phi + zk_t\cos\phi, \tag{8.102} \]

where k_{tx} and k_{tz} are the x and z components of k_t, respectively; k_t = |k_t|. Putting

\[ \cos\phi = ib \quad (b:\ \text{real},\ b \neq 0), \tag{8.103} \]

we have

\[ \mathbf{E}_t = E\boldsymbol{\varepsilon}_t e^{i(xk_t\sin\phi + ibzk_t - \omega t)} = E\boldsymbol{\varepsilon}_t e^{i(xk_t\sin\phi - \omega t)}e^{-bzk_t}. \tag{8.104} \]

With the total reflection, we must have

\[ z \longrightarrow \infty \implies e^{-bzk_t} \longrightarrow 0. \tag{8.105} \]



To meet this requirement, we have

\[ b > 0. \tag{8.106} \]

Meanwhile, we have

\[ \cos^2\phi = 1 - \sin^2\phi = 1 - \frac{\sin^2\theta}{n^2} = \frac{n^2 - \sin^2\theta}{n^2}, \tag{8.107} \]

where notice that n < 1, because we are dealing with the incidence of light from a
medium with a higher refractive index to one with a lower index. When we consider
the total reflection, the numerator of (8.107) is negative, and so we have two choices
such that

\[ \cos\phi = \pm i\,\frac{\sqrt{\sin^2\theta - n^2}}{n}. \tag{8.108} \]

From (8.103) and (8.106), we get

\[ \cos\phi = i\,\frac{\sqrt{\sin^2\theta - n^2}}{n}. \tag{8.109} \]

Hence, inserting (8.109) into (8.84), we have for the TE wave

\[ R_E^{\perp} = \frac{\cos\theta - i\sqrt{\sin^2\theta - n^2}}{\cos\theta + i\sqrt{\sin^2\theta - n^2}}. \tag{8.110} \]

Then, we have

\[ R^{\perp} = -R_E^{\perp}R_H^{\perp} = R_E^{\perp}\left(R_E^{\perp}\right)^{*} = \frac{\cos\theta - i\sqrt{\sin^2\theta - n^2}}{\cos\theta + i\sqrt{\sin^2\theta - n^2}} \cdot \frac{\cos\theta + i\sqrt{\sin^2\theta - n^2}}{\cos\theta - i\sqrt{\sin^2\theta - n^2}} = 1. \tag{8.111} \]

As for the TM wave, substituting (8.109) for (8.85) we have


pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
k n2 cos θ þ i sin2 θ  n2
RE ¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi : ð8:112Þ
n2 cos θ þ i sin2 θ  n2

In this case, we also get



$$R^{\parallel} = \frac{-n^2\cos\theta + i\sqrt{\sin^2\theta - n^2}}{n^2\cos\theta + i\sqrt{\sin^2\theta - n^2}}\cdot
\frac{-n^2\cos\theta - i\sqrt{\sin^2\theta - n^2}}{n^2\cos\theta - i\sqrt{\sin^2\theta - n^2}} = 1. \tag{8.113}$$

The relations (8.111) and (8.113) ensure that the energy flow gets back to a higher
refractive index medium.
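This unit reflectance is readily verified numerically. The following sketch (an illustration with an assumed n = 1/1.5 and an assumed angle above θc) evaluates (8.110) and (8.112) as complex numbers:

```python
import math

n = 1.0 / 1.5                  # assumed relative index; theta_c is then about 41.8 deg
theta = math.radians(55.0)     # assumed angle in the total reflection region
s = math.sqrt(math.sin(theta)**2 - n**2)

R_TE = (math.cos(theta) - 1j * s) / (math.cos(theta) + 1j * s)                  # (8.110)
R_TM = (-n**2 * math.cos(theta) + 1j * s) / (n**2 * math.cos(theta) + 1j * s)   # (8.112)

# Both coefficients are complex, yet their squared moduli equal 1; cf. (8.111), (8.113).
print(abs(R_TE)**2, abs(R_TM)**2)
```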
Thus, the total reflection is characterized by the complex reflection coefficients
expressed as (8.110) and (8.112) as well as a reflectance of 1. From (8.110) and
(8.112) we can estimate the change in phase of the electromagnetic wave that takes
place by virtue of the total reflection. For this purpose, we put

$$R_E^{\perp} \equiv e^{i\alpha} \quad \text{and} \quad R_E^{\parallel} \equiv e^{i\beta}. \tag{8.114}$$

Rewriting (8.110), we have

$$R_E^{\perp} = \frac{\cos^2\theta - \left(\sin^2\theta - n^2\right) - 2i\cos\theta\sqrt{\sin^2\theta - n^2}}{1 - n^2}. \tag{8.115}$$

At the critical angle θc, from (8.95) we have

$$\sin\theta_c = n. \tag{8.116}$$

Therefore, we have

$$1 - n^2 = \cos^2\theta_c. \tag{8.117}$$

Then, as expected, we get

$$\left. R_E^{\perp}\right|_{\theta=\theta_c} = 1. \tag{8.118}$$

Note, however, that at θ = π/2 (i.e., grazing incidence) we have

$$\left. R_E^{\perp}\right|_{\theta=\pi/2} = -1. \tag{8.119}$$

From (8.115), the argument α in the complex plane is given by

$$\tan\alpha = \frac{-2\cos\theta\sqrt{\sin^2\theta - n^2}}{\cos^2\theta - \left(\sin^2\theta - n^2\right)}. \tag{8.120}$$

The argument α defines the phase shift upon the total reflection. Considering (8.115)
and (8.118), we have
$$\left.\alpha\right|_{\theta=\theta_c} = 0.$$

Fig. 8.8 Phase shift α defined in a complex plane for the total reflection of the TE wave. The
number n denotes a relative refractive index of D2 relative to D1. At the critical angle θc, α = 0

Since 1 − n² > 0 (i.e., n < 1) and in the total reflection region sin²θ − n² > 0, the
imaginary part of $R_E^{\perp}$ is negative for any θ in that region (i.e., θc to π/2). On the other hand, the real
part of $R_E^{\perp}$ varies from 1 to −1, as is evidenced from (8.118) and (8.119). At θ0 that
satisfies the following condition:

$$\sin\theta_0 = \sqrt{\frac{1 + n^2}{2}}, \tag{8.121}$$

the real part is zero. Thus, the phase shift α varies from 0 to −π as indicated in
Fig. 8.8. Comparing (8.121) with (8.116) and taking into account n < 1, we have

$$\theta_c < \theta_0 < \pi/2.$$

Similarly, we estimate the phase change for a TM wave. Rewriting (8.112), we
have

$$R_E^{\parallel} = \frac{-n^4\cos^2\theta + \sin^2\theta - n^2 + 2in^2\cos\theta\sqrt{\sin^2\theta - n^2}}{n^4\cos^2\theta + \sin^2\theta - n^2}
= \frac{-n^4\cos^2\theta + \sin^2\theta - n^2 + 2in^2\cos\theta\sqrt{\sin^2\theta - n^2}}{\left(1 - n^2\right)\left(\sin^2\theta - n^2\cos^2\theta\right)}. \tag{8.122}$$

Then, we have

$$\left. R_E^{\parallel}\right|_{\theta=\theta_c} = -1. \tag{8.123}$$

Also at θ = π/2 (i.e., grazing incidence) we have

$$\left. R_E^{\parallel}\right|_{\theta=\pi/2} = 1. \tag{8.124}$$

From (8.122), the argument β is given by

$$\tan\beta = \frac{2n^2\cos\theta\sqrt{\sin^2\theta - n^2}}{-n^4\cos^2\theta + \sin^2\theta - n^2}. \tag{8.125}$$

Considering (8.122) and (8.123), we have

$$\left.\beta\right|_{\theta=\theta_c} = \pi. \tag{8.126}$$

In the total reflection region we have

$$\sin^2\theta - n^2\cos^2\theta > n^2 - n^2\cos^2\theta = n^2\left(1 - \cos^2\theta\right) > 0. \tag{8.127}$$

Therefore, the denominator of (8.122) is positive and, hence, the imaginary part of
$R_E^{\parallel}$ is positive as well for any θ in that region (i.e., θc to π/2). From (8.123) and (8.124), on the other
hand, the real part of $R_E^{\parallel}$ in (8.122) varies from −1 to 1. At $\tilde{\theta}_0$ that satisfies the
following condition:

$$\cos\tilde{\theta}_0 = \sqrt{\frac{1 - n^2}{1 + n^4}}, \tag{8.128}$$

the real part of $R_E^{\parallel}$ is zero. Once again, we have

$$\theta_c < \tilde{\theta}_0 < \pi/2. \tag{8.129}$$

Thus, the phase β varies from π to 0 as depicted in Fig. 8.9.
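The behavior of the two phase shifts can also be traced numerically. The sketch below (an illustration with an assumed n = 1/1.5) evaluates the arguments of (8.110) and (8.112) across the total reflection region; α runs from 0 to −π and β from π to 0, in accordance with Figs. 8.8 and 8.9:

```python
import cmath, math

n = 1.0 / 1.5                          # assumed relative index
theta_c = math.asin(n)

for deg in (math.degrees(theta_c), 50.0, 60.0, 75.0, 89.9):
    th = math.radians(deg)
    s = math.sqrt(max(math.sin(th)**2 - n**2, 0.0))
    R_TE = (math.cos(th) - 1j * s) / (math.cos(th) + 1j * s)
    R_TM = (-n**2 * math.cos(th) + 1j * s) / (n**2 * math.cos(th) + 1j * s)
    # cmath.phase returns the argument of a complex number in (-pi, pi]
    print(f"theta = {deg:5.1f} deg: alpha = {cmath.phase(R_TE):+.3f} rad,"
          f" beta = {cmath.phase(R_TM):+.3f} rad")
```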


In this section, we mentioned somewhat peculiar features of complex trigono-
metric functions, such as sin ϕ > 1, viewed in light of real functions. As a matter of
course, the complex angle ϕ should be determined experimentally from (8.109). In this
context, readers are referred to Chap. 6, which dealt with the theory of analytic functions [2].

Fig. 8.9 Phase shift β for the total reflection of the TM wave. At the critical angle θc, β = π

8.7 Waveguide Applications

There are many optical devices based upon light propagation. Among them, wave-
guide devices utilize the total reflection. We explain their operation principle.
Suppose that we have a thin plate (usually called a slab) comprising a
dielectric medium that spreads infinitely in two dimensions and that the plate is
sandwiched by another dielectric (or maybe air or vacuum) or metal. In this
2

situation electromagnetic waves are confined within the slab. Moreover, only under
restricted conditions are those waves allowed to propagate in parallel to the slab
plane. Such electromagnetic waves are usually called propagation modes or simply
"modes." An optical device thus designed is called a waveguide. These modes are
characterized by repeated total reflection during the propagation. Another mode is the
evanescent mode. Because of the total reflection, the energy transport is not allowed
to take place vertically to the interface of two dielectrics. For this reason, the
evanescent mode is thought to be propagated in parallel to the interface, very close
to it.

8.7.1 TE and TM Waves in a Waveguide

In a waveguide configuration, propagating waves are classified into TE and TM


modes. Quality of materials that constitute a waveguide largely governs the propa-
gation modes within the waveguide.
Figure 8.10 depicts a cross-section of a slab waveguide. We assume that the
electromagnetic wave is propagated toward the positive direction of the z-axis and
that a waveguide infinitely spreads toward the z- and x-axes. Suppose that the said
waveguide is spatially confined toward the y-axis. Let the thickness of the wave-
guide be d. From a point of view of material that shapes a waveguide, waveguides
are classified into two types. (i) Electromagnetic waves are completely confined
within the waveguide layer. This case typically happens when a dielectric forming
the waveguide is sandwiched between a couple of metal layers (Fig. 8.10a). This is
because the electromagnetic wave is not allowed to exist or propagate inside the
metal.
Fig. 8.10 Cross-section of a slab waveguide comprising a dielectric medium. (a) A waveguide is
sandwiched between a couple of metal layers. (b) A waveguide is sandwiched between a couple of
layers consisting of another dielectric called the clad layer. The sandwiched layer is called the core
layer

(ii) Electromagnetic waves are not completely confined within the waveguide.
This case happens when the dielectric of the waveguide is sandwiched by a couple of
other dielectrics. We distinguish this case, the total internal reflection, from the
above case (i). We further describe it in Sect. 8.7.2. For the total internal reflection to
take place, the refractive index of the waveguide must be higher than those of other
dielectrics (Fig. 8.10b). The dielectric of the waveguide is called core layer and the
other dielectric is called clad layer. In this case, electromagnetic waves are allowed
to propagate inside of the clad layer, even though the region is confined very close to
the interface between the clad and core layers. Such electromagnetic waves are said
to be evanescent waves. According to these two cases (i) and (ii), we have different
conditions under which the allowed modes can exist.
Now, let us return to Maxwell's equations. We have introduced the equations of
wave motion (7.35) and (7.36) from Maxwell's equations (7.28) and (7.29) along with
(7.7) and (7.10). One of their simplest solutions is a plane wave described by (7.53).
The plane wave is characterized by the fact that the wave has the same phase on an infinitely
spreading plane perpendicular to the propagation direction (characterized by a
wavenumber vector k). In a waveguide, however, the electromagnetic field is
confined with respect to the direction parallel to the normal to the slab plane (i.e.,
the direction of the y-axis in Fig. 8.10). Consequently, the electromagnetic field can
no longer have the same phase in that direction. Yet, as solutions of the equations of
wave motion, we can have a solution that has the same phase in the direction of the
x-axis. Bearing in mind such a situation, let us think of Maxwell's equations in
relation to the equations of wave motion.
Ignoring components related to partial differentiation with respect to x (i.e., the
components related to ∂/∂x) and rewriting (7.28) and (7.29) for the individual Cartesian
coordinates, we have [3]

$$\frac{\partial E_z}{\partial y} - \frac{\partial E_y}{\partial z} + \frac{\partial B_x}{\partial t} = 0, \tag{8.130}$$

$$\frac{\partial E_x}{\partial z} + \frac{\partial B_y}{\partial t} = 0, \tag{8.131}$$

$$-\frac{\partial E_x}{\partial y} + \frac{\partial B_z}{\partial t} = 0, \tag{8.132}$$

$$\frac{\partial H_z}{\partial y} - \frac{\partial H_y}{\partial z} - \frac{\partial D_x}{\partial t} = 0, \tag{8.133}$$

$$\frac{\partial H_x}{\partial z} - \frac{\partial D_y}{\partial t} = 0, \tag{8.134}$$

$$-\frac{\partial H_x}{\partial y} - \frac{\partial D_z}{\partial t} = 0. \tag{8.135}$$

Of the above equations, we collect those pertinent to Ex and differentiate (8.131),
(8.132), and (8.133) with respect to z, y, and t, respectively, to get

$$\frac{\partial^2 E_x}{\partial z^2} + \frac{\partial^2 B_y}{\partial z\,\partial t} = 0, \tag{8.136}$$

$$-\frac{\partial^2 E_x}{\partial y^2} + \frac{\partial^2 B_z}{\partial y\,\partial t} = 0, \tag{8.137}$$

$$\frac{\partial^2 H_z}{\partial t\,\partial y} - \frac{\partial^2 H_y}{\partial t\,\partial z} - \frac{\partial^2 D_x}{\partial t^2} = 0. \tag{8.138}$$

Multiplying (8.138) by μ, further combining it with (8.136) and (8.137), and using (7.7),
we get

$$\frac{\partial^2 E_x}{\partial y^2} + \frac{\partial^2 E_x}{\partial z^2} = \mu\varepsilon\,\frac{\partial^2 E_x}{\partial t^2}. \tag{8.139}$$

This is a two-dimensional equation of wave motion. In a similar manner, from
(8.130), (8.134), and (8.135), we have for the magnetic field

$$\frac{\partial^2 H_x}{\partial y^2} + \frac{\partial^2 H_x}{\partial z^2} = \mu\varepsilon\,\frac{\partial^2 H_x}{\partial t^2}. \tag{8.140}$$

Equations (8.139) and (8.140) are two-dimensional wave equations with respect
to the y- and z-coordinates. In the direction of the x-axis, a propagating wave has
the same phase. Suppose that we have plane wave solutions for them as in the case of
(7.58) and (7.59). Then, we have

$$\boldsymbol{E} = \boldsymbol{E}_0 e^{i(\boldsymbol{k}\cdot\boldsymbol{x} - \omega t)} = \boldsymbol{E}_0 e^{i(k\boldsymbol{n}\cdot\boldsymbol{x} - \omega t)},\quad
\boldsymbol{H} = \boldsymbol{H}_0 e^{i(\boldsymbol{k}\cdot\boldsymbol{x} - \omega t)} = \boldsymbol{H}_0 e^{i(k\boldsymbol{n}\cdot\boldsymbol{x} - \omega t)}. \tag{8.141}$$

Note that a plane wave expressed by (7.58) and (7.59) is propagated uniformly in a
dielectric medium. In a waveguide, on the other hand, the electromagnetic waves
undergo repeated (total) reflections from the two boundaries positioned on either side of
the waveguide, while being propagated.

In a three-dimensional version, the wavenumber vector has three components kx,
ky, and kz as expressed in (7.48). In (8.141), in turn, k has y- and z-components such
that

$$k^2 \equiv |\boldsymbol{k}|^2 = k_y^2 + k_z^2. \tag{8.142}$$

Equations (8.139) and (8.140) can be rewritten as

$$\frac{\partial^2 E_x}{\partial(\pm y)^2} + \frac{\partial^2 E_x}{\partial(\pm z)^2} = \mu\varepsilon\,\frac{\partial^2 E_x}{\partial(\pm t)^2},\quad
\frac{\partial^2 H_x}{\partial(\pm y)^2} + \frac{\partial^2 H_x}{\partial(\pm z)^2} = \mu\varepsilon\,\frac{\partial^2 H_x}{\partial(\pm t)^2}.$$

Accordingly, we have four wavenumber vector components

$$k_y = \pm|k_y| \quad \text{and} \quad k_z = \pm|k_z|.$$

Figure 8.11 indicates this situation, where an electromagnetic wave can be propa-
gated within a slab waveguide in any one direction out of four choices of k. In this
section, we assume that the electromagnetic wave is propagated toward the positive
direction of the z-axis, and so we define kz as positive. On the other hand, ky can be
either positive or negative. Thus, we get

$$k_z = k\sin\theta \quad \text{and} \quad k_y = \pm k\cos\theta. \tag{8.143}$$

Fig. 8.11 Four possible propagation directions k of an electromagnetic wave in a slab waveguide

Fig. 8.12 Geometries and propagation of the electromagnetic waves in a slab waveguide

Figure 8.12 shows the geometries of the electromagnetic waves within the slab
waveguide. The slab plane is parallel to the zx-plane. Let the positions of the two
interfaces of the slab waveguide be

$$y = 0 \quad \text{and} \quad y = d. \tag{8.144}$$

That is, we assume that the thickness of the waveguide is d.


Since (8.139) describes a wave equation for only one component Ex, (8.139) is
suited for representing a TE wave. In (8.140), in turn, a wave equation is given only
for Hx, and hence, it is suited for representing a TM wave. With the TE wave the
electric field oscillates parallel to the slab plane and vertical to the propagation
direction. With the TM wave, in turn, the magnetic field oscillates parallel to the slab
plane and vertical to the propagation direction. In a general case, electromagnetic
waves in a slab waveguide are formed by superposition of TE and TM waves. Notice
that Fig. 8.12 is applicable to both TE and TM waves.
Let us further proceed with the waveguide analysis. The electric field E within the
waveguide is described by superposition of the incident and reflected waves. Using
the first equation of (8.141) and (8.143), we have

$$\boldsymbol{E}(z, y) = \boldsymbol{\varepsilon}_e E_+ e^{i(kz\sin\theta + ky\cos\theta - \omega t)} + \boldsymbol{\varepsilon}_e{}' E_- e^{i(kz\sin\theta - ky\cos\theta - \omega t)}, \tag{8.145}$$

where E₊ (E₋) and ε_e (ε_e′) represent the amplitude and unit polarization vector of the
incident (reflected) wave, respectively. The vector ε_e (or ε_e′) is defined in (7.67).
Equation (8.145) is common to both the cases of TE and TM waves. From now,
we consider the TE mode case. Suppose that the slab waveguide is sandwiched between
a couple of metal sheets of high conductance. Since the electric field must be absent
inside the metal, the electric field at the interface must be zero owing to the
continuity condition for the tangential component of the electric field. Thus, we require
that the following condition be met with (8.145):
$$\boldsymbol{t}\cdot\boldsymbol{E}(z, 0) = 0 = \boldsymbol{t}\cdot\boldsymbol{\varepsilon}_e E_+ e^{i(kz\sin\theta - \omega t)} + \boldsymbol{t}\cdot\boldsymbol{\varepsilon}_e{}' E_- e^{i(kz\sin\theta - \omega t)}
= \left(\boldsymbol{t}\cdot\boldsymbol{\varepsilon}_e E_+ + \boldsymbol{t}\cdot\boldsymbol{\varepsilon}_e{}' E_-\right)e^{i(kz\sin\theta - \omega t)}. \tag{8.146}$$

Therefore, since $e^{i(kz\sin\theta - \omega t)}$ never vanishes, we have

$$\boldsymbol{t}\cdot\boldsymbol{\varepsilon}_e E_+ + \boldsymbol{t}\cdot\boldsymbol{\varepsilon}_e{}' E_- = 0, \tag{8.147}$$

where t is a tangential unit vector at the interface.


Since E is polarized along the x-axis, setting ε_e = ε_e′ = e₁ and taking t as e₁, we
get

$$E_+ + E_- = 0.$$

This means that the reflection coefficient of the electric field is −1. Denoting
E₊ = −E₋ ≡ E₀ (>0), we have

$$\begin{aligned}
\boldsymbol{E} &= \boldsymbol{e}_1 E_0\left[e^{i(kz\sin\theta + ky\cos\theta - \omega t)} - e^{i(kz\sin\theta - ky\cos\theta - \omega t)}\right]\\
&= \boldsymbol{e}_1 E_0\left(e^{iky\cos\theta} - e^{-iky\cos\theta}\right)e^{i(kz\sin\theta - \omega t)}\\
&= \boldsymbol{e}_1 2iE_0\sin(ky\cos\theta)\,e^{i(kz\sin\theta - \omega t)}. \tag{8.148}
\end{aligned}$$

Requiring the electric field to vanish at the other interface y = d, we have

$$\boldsymbol{E}(z, d) = 0 = \boldsymbol{e}_1 2iE_0\sin(kd\cos\theta)\,e^{i(kz\sin\theta - \omega t)}.$$

Note that in terms of the boundary conditions we are thinking of the Dirichlet conditions
(see Sects. 1.3 and 10.3). In this case, we have nodes for the electric field at the
interface between the metal and the dielectric. For this condition to be satisfied, we must
have

$$kd\cos\theta = m\pi \quad (m = 1, 2, \cdots). \tag{8.149}$$

From (8.149), we have the following condition for m:

$$m \le kd/\pi. \tag{8.150}$$

Meanwhile, we have

$$k = nk_0, \tag{8.151}$$

where n is the refractive index of the dielectric that shapes the slab waveguide; the
quantity k0 is the wavenumber of the electromagnetic wave in vacuum. The index n is
given by

$$n = c/v, \tag{8.152}$$

where c and v are the light velocities in vacuum and in the dielectric medium, respectively.
Here v is meant as a velocity in an infinitely spreading dielectric. Thus, θ is allowed
to take several (or more) values depending upon k, d, and m.
Since in the z-direction no specific boundary conditions are imposed, we have
propagating modes in that direction characterized by a propagation constant (vide
infra). Looking at (8.148), we notice that k sin θ plays the role of a wavenumber in a
free space. For this reason, the quantity β defined as

$$\beta = k\sin\theta = nk_0\sin\theta \tag{8.153}$$

is said to be a propagation constant. In (8.153), k0 is the wavenumber in vacuum. From
(8.149) and (8.153), we get

$$\beta = \left(k^2 - \frac{m^2\pi^2}{d^2}\right)^{1/2}. \tag{8.154}$$

Thus, the allowed TE waves indexed by m are called TE modes and represented as
TEm. The phase velocity vp is given by

$$v_p = \omega/\beta. \tag{8.155}$$

Meanwhile, the group velocity vg is given by

$$v_g = \frac{d\omega}{d\beta} = \left(\frac{d\beta}{d\omega}\right)^{-1}. \tag{8.156}$$

Using (8.154) and noting that k² = ω²/v², we get

$$v_g = v^2\beta/\omega. \tag{8.157}$$

Thus, we have

$$v_p v_g = v^2. \tag{8.158}$$

Note that in (1.22) of Sect. 1.1 we saw a relationship similar to (8.158).
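A small numerical sketch makes the mode structure of (8.149)-(8.158) concrete; the wavelength, index, and thickness below are assumed example values. It lists the allowed TEm modes of a metal-clad slab and confirms the relation vp·vg = v²:

```python
import math

# Assumed example values: vacuum wavelength 1.0 um, core index 1.5,
# slab thickness 1.8 um, metal cladding on both sides.
lam0, n, d = 1.0e-6, 1.5, 1.8e-6
c0 = 2.99792458e8
k = n * 2 * math.pi / lam0        # wavenumber in the dielectric, (8.151)
v = c0 / n                        # light velocity in the dielectric, (8.152)
omega = v * k

m_max = int(k * d / math.pi)      # bound on the mode number, (8.150)
for m in range(1, m_max + 1):
    beta = math.sqrt(k**2 - (m * math.pi / d)**2)   # propagation constant, (8.154)
    vp, vg = omega / beta, v**2 * beta / omega      # (8.155), (8.157)
    print(f"TE{m}: beta = {beta:.4e} m^-1, vp*vg/v^2 = {vp * vg / v**2:.6f}")
```

With these assumed parameters five TE modes are allowed, and the printed ratio equals 1 for every mode, as (8.158) requires.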


The characteristics of TM waves can be analyzed in a similar manner by exam-
ining the magnetic field Hx. In that case, the reflection coefficient of the magnetic
field is +1 and we have antinodes for the magnetic field at the interface.
Concomitantly, we adopt the Neumann conditions as the boundary conditions (see
Sects. 1.3 and 8.3). Regardless of the difference in the boundary conditions, how-
ever, the discussion from (8.149) to (8.158) applies to the analysis of TM waves.
Once Hx is determined, Ey and Ez can be determined as well from (8.134) and
(8.135).

8.7.2 Total Internal Reflection and Evanescent Waves

If a slab waveguide shaped by a dielectric is sandwiched by a couple of layers of another
dielectric (Fig. 8.10b), the situation differs from the metal waveguide (Fig. 8.10a) we
encountered in Sect. 8.7.1. Suppose in Fig. 8.10b that the former dielectric D1 of a
refractive index n1 is sandwiched by the latter dielectric D2 of a refractive index n2.
Suppose that an electromagnetic wave is being propagated from D1 toward D2.
Then, we must have

$$n_1 > n_2 \tag{8.159}$$

so that the total internal reflection can take place at the interface of D1 and D2. In this
case, the dielectrics D1 and D2 act as a core layer and a clad layer, respectively.
The biggest difference between the present waveguide and the previous one is
that unlike the previous case, the total internal reflection occurs in the present case.
Concomitantly, an evanescent wave is present in the clad layer very close to the
interface.
First, let us estimate the conditions that must be satisfied so that an electromagnetic
wave can be propagated within a waveguide. Figure 8.13 depicts a cross-section of
the waveguide where the light is propagated in the direction of k. In Fig. 8.13,
suppose that we have a normal N to the plane of paper at P. Then, N and a straight
line XY shape a plane NXY. Also suppose that a dielectric fills a semi-infinite space
situated below NXY. Further suppose that there is another virtual plane N′X′Y′ that
is parallel with NXY as shown. Here N′ is parallel to N. The separation of the two
parallel planes is d. We need the virtual plane N′X′Y′ just to estimate an optical path
difference (or phase difference, more specifically) between two waves, i.e., a prop-
agating wave and a reflected wave.
Let n be a unit vector in the direction of k; i.e.,

$$\boldsymbol{n} = \boldsymbol{k}/|\boldsymbol{k}| = \boldsymbol{k}/k. \tag{8.160}$$

Then the electromagnetic wave is described as

$$\boldsymbol{E} = \boldsymbol{E}_0 e^{i(\boldsymbol{k}\cdot\boldsymbol{x} - \omega t)} = \boldsymbol{E}_0 e^{i(k\boldsymbol{n}\cdot\boldsymbol{x} - \omega t)}. \tag{8.161}$$

Fig. 8.13 Cross-section of the waveguide where the light is propagated in the direction of k. A
dielectric fills a semi-infinite space situated below NXY. We suppose another virtual plane N′X′Y′
that is parallel with NXY. We need the plane N′X′Y′ to estimate an optical path difference (or phase
difference)

Suppose that we take a coordinate system such that

$$\boldsymbol{x} = r\boldsymbol{n} + s\boldsymbol{u} + t\boldsymbol{v}, \tag{8.162}$$

where u and v represent unit vectors in directions perpendicular to n. Then (8.161)
can be expressed by

$$\boldsymbol{E} = \boldsymbol{E}_0 e^{i(kr - \omega t)}. \tag{8.163}$$

Suppose that the wave is propagated starting from a point A to P and reflected at
P. Then, the wave is further propagated to B and reflected again to reach Q. The
wave front is originally at AB and finally at PQ. Thus, the Z-shaped optical path
length APBQ is equal to the separation between A′B′ and PQ. Notice that the
separation between A′B′ and P′Q′ is taken so that it is equal to that between AB
and PQ. The geometry of Fig. 8.13 implies that two waves that start from AB and
A′B′ at the same time reach PQ at the same time.
We find the separation between AB and A′B′ is

$$2d\cos\theta.$$

Let us tentatively call these waves Wave-AB and Wave-A′B′ and describe their
electric fields as $\boldsymbol{E}_{AB}$ and $\boldsymbol{E}_{A'B'}$, respectively. Then, we denote

$$\boldsymbol{E}_{AB} = \boldsymbol{E}_0 e^{i(kr - \omega t)}, \tag{8.164}$$

$$\boldsymbol{E}_{A'B'} = \boldsymbol{E}_0 e^{i[k(r + 2d\cos\theta) - \omega t]}, \tag{8.165}$$

where k is the wavenumber in the dielectric. Note that since $\boldsymbol{E}_{A'B'}$ lags behind $\boldsymbol{E}_{AB}$, a
plus sign appears in the first term of the exponent. Therefore, the phase difference
between the two waves is

$$2kd\cos\theta. \tag{8.166}$$



Now, let us come back to the actual geometry of the waveguide. That is, the core
layer of thickness d is sandwiched by a couple of clad layers (Fig. 8.10b). In this
situation, the wave $\boldsymbol{E}_{AB}$ experiences the total internal reflection two times, which we
ignored in the above discussion of the metal waveguide. Since the total internal
reflection causes a complex phase shift, we have to take account of this effect. The
phase shift was defined as α of (8.114) for a TE mode and β for a TM mode. Notice
that in Fig. 8.13 the electric field oscillates perpendicularly to the plane of paper with
the TE mode, whereas it oscillates in parallel with the plane of paper with the TM
mode. In both cases the electric field oscillates perpendicularly to n. Conse-
quently, the phase shift due to these reflections has to be added to (8.166). Thus, for
the phase commensuration to be obtained, the following condition must be satisfied:

$$2kd\cos\theta + 2\delta = 2l\pi \quad (l = 0, 1, 2, \cdots), \tag{8.167}$$

where δ is either δTE or δTM defined below, according to the case of the TE wave and
TM wave, respectively. For a practical purpose, (8.167) is dealt with by a numerical
calculation, e.g., to design an optical waveguide.
Unlike (8.149), what is most important with (8.167) is that the condition l = 0
is permitted because of δ < 0 (see just below).
For convenience and according to custom, we adopt a phase shift notation
other than that defined in (8.114). With the TE mode, the phase is retained upon
reflection at the critical angle, and so we identify α with an additional component
δTE. In the TM case, on the other hand, the phase is reversed upon reflection at the
critical angle (i.e., a π shift occurs). Since this π shift has been incorporated into β, it
suffices to consider only an additional component δTM. That is, we have

$$\delta_{TE} \equiv \alpha \quad \text{and} \quad \delta_{TM} \equiv \beta - \pi. \tag{8.168}$$

We rewrite (8.110) as

$$R_E^{\perp} = e^{i\alpha} = e^{i\delta_{TE}} = \frac{ae^{-i\sigma}}{ae^{i\sigma}} = e^{-2i\sigma}$$

and

$$\delta_{TE} = -2\sigma \quad (\sigma > 0), \tag{8.169}$$

where we have

$$ae^{i\sigma} = \cos\theta + i\sqrt{\sin^2\theta - n^2}. \tag{8.170}$$

Therefore,

$$\tan\sigma = -\tan\frac{\delta_{TE}}{2} = \frac{\sqrt{\sin^2\theta - n^2}}{\cos\theta}. \tag{8.171}$$

Meanwhile, rewriting (8.112) we have

$$R_E^{\parallel} = e^{i\beta} = e^{i(\delta_{TM} + \pi)} = -\frac{be^{-i\tau}}{be^{i\tau}} = -e^{-2i\tau} = e^{i(-2\tau + \pi)}, \tag{8.172}$$

where

$$be^{i\tau} = n^2\cos\theta + i\sqrt{\sin^2\theta - n^2}. \tag{8.173}$$

Note that the minus sign with the second last equality in (8.172) is due to the phase
reversal upon reflection. From (8.172), we may put

$$\beta = -2\tau + \pi.$$

Comparing this with the second equation of (8.168), we get

$$\delta_{TM} = -2\tau \quad (\tau > 0). \tag{8.174}$$

Consequently, we get

$$\tan\tau = -\tan\frac{\delta_{TM}}{2} = \frac{\sqrt{\sin^2\theta - n^2}}{n^2\cos\theta}. \tag{8.175}$$

Finally, the additional phase changes δTE and δTM upon the total reflection are given
by [4]

$$\delta_{TE} = -2\tan^{-1}\frac{\sqrt{\sin^2\theta - n^2}}{\cos\theta} \quad \text{and} \quad
\delta_{TM} = -2\tan^{-1}\frac{\sqrt{\sin^2\theta - n^2}}{n^2\cos\theta}. \tag{8.176}$$

We emphasize that in (8.176) both δTE and δTM are negative quantities. This phase
shift has to be included in (8.167) as a negative quantity δ. At first glance, (8.176)
seems to differ largely from (8.120) and (8.125). Nevertheless, noting the trigono-
metric formula

$$\tan 2x = \frac{2\tan x}{1 - \tan^2 x}$$

and remembering that δTM in (8.168) includes π arising from the phase reversal, we
find that both the relations are virtually identical.
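For a concrete feel of how (8.167) is "dealt with by a numerical calculation," the following sketch solves 2kd cos θ + 2δTE = 2lπ for the guided TE modes by bisection, using δTE of (8.176); the indices, wavelength, and thickness are assumed example values:

```python
import math

# Assumed example: core index 1.50, clad index 1.45, vacuum wavelength 1.0 um,
# core thickness 3.0 um.
n1, n2, lam0, d = 1.50, 1.45, 1.0e-6, 3.0e-6
n = n2 / n1
k = n1 * 2 * math.pi / lam0
theta_c = math.asin(n)

def delta_TE(th):                  # phase shift upon total reflection, (8.176)
    return -2.0 * math.atan(math.sqrt(math.sin(th)**2 - n**2) / math.cos(th))

def f(th, l):                      # left side minus right side of (8.167)
    return 2 * k * d * math.cos(th) + 2 * delta_TE(th) - 2 * l * math.pi

l = 0
while True:
    lo, hi = theta_c + 1e-9, math.pi / 2 - 1e-9
    if f(lo, l) * f(hi, l) > 0:    # no sign change: no further guided mode
        break
    for _ in range(60):            # bisection; f decreases monotonically in theta
        mid = 0.5 * (lo + hi)
        if f(lo, l) * f(mid, l) <= 0:
            hi = mid
        else:
            lo = mid
    print(f"TE_{l}: theta = {math.degrees(lo):.3f} deg")
    l += 1
```

With these assumed parameters, three guided TE modes (l = 0, 1, 2) are found. Note that, unlike (8.149), the l = 0 solution exists precisely because δTE < 0.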
Evanescent waves are drawing large attention in the fields of basic physics and
applied device physics. If the total internal reflection is absent, ϕ is real. But, under
the total internal reflection, ϕ is complex and cos ϕ is pure imaginary. The electric field of the evanescent
wave is described as

$$\boldsymbol{E}_t = E\boldsymbol{\varepsilon}_t e^{i(k_t z\sin\phi + k_t y\cos\phi - \omega t)} = E\boldsymbol{\varepsilon}_t e^{i(k_t z\sin\phi + k_t y\,ib - \omega t)}
= E\boldsymbol{\varepsilon}_t e^{i(k_t z\sin\phi - \omega t)}e^{-k_t yb}. \tag{8.177}$$

In (8.177), the unit polarization vector ε_t is either perpendicular to the plane of paper of
Fig. 8.13 (the TE case) or parallel to it (the TM case). Notice that the coordinate
system is different from that of (8.104). The quantity k_t sin ϕ is the propagation
constant. Let $v_p^{(s)}$ and $v_p^{(e)}$ be the phase velocity of the electromagnetic wave in the slab
waveguide (i.e., core layer) and of the evanescent wave in the clad layer, respectively.
Then, in virtue of Snell's law we have

$$v_1 < v_p^{(s)} = \frac{v_1}{\sin\theta} = \frac{\omega}{k\sin\theta} = \frac{\omega}{k_t\sin\phi} = v_p^{(e)} = \frac{v_2}{\sin\phi} < v_2, \tag{8.178}$$

where v1 and v2 are the light velocities in a free space filled by the dielectric D1 and D2,
respectively. For this, we used the relation described as

$$\omega = v_1 k = v_2 k_t. \tag{8.179}$$

We also used Snell's law with the third equality. Notice that sin ϕ > 1 in the
evanescent region and that k_t sin ϕ is a propagation constant in the clad layer. Also
note that $v_p^{(s)}$ is equal to $v_p^{(e)}$ and that these phase velocities are in between the two
velocities of the free space. Thus, the evanescent waves must be present, accompa-
nying the propagating waves that undergo the total internal reflections in a slab
waveguide.
As remarked in (6.105), the electric field of evanescent waves decays exponen-
tially with increasing distance from the core-clad interface. This implies that the evanescent waves exist only in the clad
layer very close to the interface of the core and clad layers.
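The chain of equalities and inequalities in (8.178) can be checked with a few lines of Python; the indices and the angle θ > θc below are assumed example values:

```python
import math

c0 = 2.99792458e8
n1, n2 = 1.50, 1.45              # assumed core and clad indices
v1, v2 = c0 / n1, c0 / n2
theta = math.radians(80.0)       # assumed angle above theta_c = asin(n2/n1), ~75.2 deg

vp_core = v1 / math.sin(theta)            # phase velocity along z in the core layer
sin_phi = (n1 / n2) * math.sin(theta)     # Snell's law continued into sin(phi) > 1
vp_evan = v2 / sin_phi                    # phase velocity of the evanescent wave

print(f"v1 = {v1:.4e}  <  vp = {vp_core:.4e}  <  v2 = {v2:.4e}")
assert v1 < vp_core < v2
assert math.isclose(vp_core, vp_evan)     # the two phase velocities coincide, (8.178)
```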

8.8 Stationary Waves

So far, we have been dealing with propagating waves either in a free space or in a
waveguide. If the dielectric shaping the waveguide is confined in another direction,
the propagating waves show specific properties. Examples include optical fibers.
In this section we consider a situation where the electromagnetic wave is prop-
agating in a dielectric medium and reflected by a “wall” formed by metal or another
dielectric. In such a situation, the original wave (i.e., a forward wave) causes
interference with the backward wave and a stationary wave is formed as a conse-
quence of the interference.

To approach this issue, we deal with the superposition of two waves that have
different phases and different amplitudes. To generalize the problem, let ψ1 and
ψ2 be two cosine functions described as

$$\psi_1 = a_1\cos b_1 \quad \text{and} \quad \psi_2 = a_2\cos b_2. \tag{8.180}$$

Their addition is expressed as

$$\psi = \psi_1 + \psi_2 = a_1\cos b_1 + a_2\cos b_2. \tag{8.181}$$

Here we wish to unify (8.181) into a single cosine (or sine) function. To this end, we
modify the description of ψ2 such that

$$\begin{aligned}
\psi_2 &= a_2\cos[b_1 + (b_2 - b_1)]\\
&= a_2[\cos b_1\cos(b_2 - b_1) - \sin b_1\sin(b_2 - b_1)]\\
&= a_2[\cos b_1\cos(b_1 - b_2) + \sin b_1\sin(b_1 - b_2)]. \tag{8.182}
\end{aligned}$$

Then, the addition is described by

$$\psi = \psi_1 + \psi_2 = [a_1 + a_2\cos(b_1 - b_2)]\cos b_1 + a_2\sin(b_1 - b_2)\sin b_1. \tag{8.183}$$

Putting R such that

$$R = \sqrt{[a_1 + a_2\cos(b_1 - b_2)]^2 + a_2{}^2\sin^2(b_1 - b_2)}
= \sqrt{a_1{}^2 + a_2{}^2 + 2a_1a_2\cos(b_1 - b_2)}, \tag{8.184}$$

we get

$$\psi = R\cos(b_1 - \theta), \tag{8.185}$$

where θ is expressed by

$$\tan\theta = \frac{a_2\sin(b_1 - b_2)}{a_1 + a_2\cos(b_1 - b_2)}. \tag{8.186}$$

Figure 8.14 represents a geometrical diagram in relation to the superposition of two


waves having different amplitudes (a1 and a2) and different phases (b1 and b2) [5].
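The reduction (8.184)-(8.186) to a single cosine is purely trigonometric and can be spot-checked numerically; the following sketch compares both sides of (8.185) for randomly chosen amplitudes and phases (the ranges are arbitrary assumptions):

```python
import math, random

random.seed(1)
for _ in range(5):
    a1, a2 = random.uniform(0.5, 2.0), random.uniform(0.1, 1.5)
    b1, b2 = random.uniform(0.0, 2 * math.pi), random.uniform(0.0, 2 * math.pi)
    R = math.sqrt(a1**2 + a2**2 + 2 * a1 * a2 * math.cos(b1 - b2))              # (8.184)
    theta = math.atan2(a2 * math.sin(b1 - b2), a1 + a2 * math.cos(b1 - b2))     # (8.186)
    lhs = a1 * math.cos(b1) + a2 * math.cos(b2)                                 # (8.181)
    rhs = R * math.cos(b1 - theta)                                              # (8.185)
    assert abs(lhs - rhs) < 1e-12
print("(8.185) reproduces psi1 + psi2 in all trials")
```

Here atan2 is used instead of inverting tan θ so that θ lands in the quadrant implied by (8.183).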
To apply (8.185) to the superposition of two electromagnetic waves that are
propagating forward and backward to collide head-on with each other, we change
the variables such that

$$b_1 = kx - \omega t \quad \text{and} \quad b_2 = -kx - \omega t, \tag{8.187}$$

where the former equation represents a forward wave, whereas the latter a backward
wave. Then we have

$$b_1 - b_2 = 2kx. \tag{8.188}$$

Equation (8.185) is rewritten as

$$\psi(x, t) = R\cos(kx - \omega t - \theta), \tag{8.189}$$

with

$$R = \sqrt{a_1{}^2 + a_2{}^2 + 2a_1a_2\cos 2kx} \tag{8.190}$$

and

$$\tan\theta = \frac{a_2\sin 2kx}{a_1 + a_2\cos 2kx}. \tag{8.191}$$

Fig. 8.14 Geometrical diagram in relation to the superposition of two waves having different
amplitudes (a1 and a2) and different phases (b1 and b2)

Equation (8.189) looks simple, but since both R and θ vary as functions of x, the
situation is somewhat complicated, unlike a simple sinusoidal wave. Nonetheless,
when x takes a special value, (8.189) is expressed by a simple function form. For
example, at t = 0,

$$\psi(x, 0) = (a_1 + a_2)\cos kx.$$


Fig. 8.15 Superposition of two sinusoidal waves. In (8.189) and (8.190), we put a1 = 1, a2 = 0.5,
with (i) t = 0; (ii) t = T/4; (iii) t = T/2; (iv) t = 3T/4. ψ(x, t) is plotted as a function of the phase kx

This corresponds to (i) of Fig. 8.15. If t = T/2 (where T is a period, i.e., T = 2π/ω),
we have

$$\psi(x, T/2) = -(a_1 + a_2)\cos kx.$$

This corresponds to (iii) of Fig. 8.15. But the waves described by (ii) or (iv) do not
have a simple function form.
We characterize Fig. 8.15 below. If we have

$$2kx = n\pi \quad \text{or} \quad x = n\lambda/4 \tag{8.192}$$

with λ being a wavelength, then θ = 0 or π, and so θ can be eliminated. This situation
occurs at every quarter of a wavelength. Let us put t = 0 and examine how
the superposed wave looks. For instance, putting x = 0, x = λ/4, and x = λ/2 we
have

$$\psi(0, 0) = |a_1 + a_2|,\quad \psi(\lambda/4, 0) = \psi(3\lambda/4, 0) = 0,\quad \psi(\lambda/2, 0) = -|a_1 + a_2|, \tag{8.193}$$

respectively. Notice that in Fig. 8.15 we took a1, a2 > 0. At another instant t = T/4,
we similarly have

$$\psi(0, T/4) = 0,\quad \psi(\lambda/4, T/4) = |a_1 - a_2|,\quad \psi(\lambda/2, T/4) = 0,\quad
\psi(-\lambda/4, T/4) = -|a_1 - a_2|. \tag{8.194}$$
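These special values are immediately reproduced by evaluating the superposed wave directly; the sketch below uses the amplitudes a1 = 1, a2 = 0.5 of Fig. 8.15:

```python
import math

a1, a2 = 1.0, 0.5   # the amplitudes used in Fig. 8.15

def psi(kx, wt):
    # forward wave a1*cos(kx - wt) plus backward wave a2*cos(-kx - wt); cf. (8.187)
    return a1 * math.cos(kx - wt) + a2 * math.cos(-kx - wt)

# t = 0 at kx = 0, pi/2, pi (i.e., x = 0, lambda/4, lambda/2); cf. (8.193)
print([round(psi(kx, 0.0), 6) for kx in (0.0, math.pi / 2, math.pi)])
# t = T/4, i.e., wt = pi/2; cf. (8.194)
print([round(psi(kx, math.pi / 2), 6) for kx in (0.0, math.pi / 2, math.pi)])
```

The printed lists, [1.5, 0.0, -1.5] and [0.0, 0.5, 0.0], match (8.193) and (8.194).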

Thus, the waves that vary with time are characterized by two drum-shaped envelopes
that have extremals |a1 + a2| and |a1 − a2|, or those −|a1 + a2| and −|a1 − a2|. An
important implication of Fig. 8.15 is that no node is present in the superposed wave.
In other words, there is no instant t0 when ψ(x, t0) = 0 for any x. From the aspect of
energy transport of electromagnetic waves, if |a1| > |a2| (where a1 and a2 represent
the amplitudes of the forward and backward waves, respectively), the net energy flow
takes place in the travelling direction of the forward wave. If, on the other hand,
|a1| < |a2|, the net energy flow takes place in the travelling direction of the backward
wave. In this respect, think of Poynting vectors.
In the case |a1| = |a2|, the situation is particularly simple. No net energy flow takes
place in this case. Correspondingly, we observe nodes. Such waves are called
stationary waves. Let us consider this simple situation for an electromagnetic wave
that is incident perpendicularly on the interface between two dielectrics (one of them
may be a metal).
Returning to (7.58) and (7.66), we describe two electromagnetic waves that
are propagating in the positive and negative directions of the z-axis such that

$$\boldsymbol{E}_1 = E_1\boldsymbol{\varepsilon}_e e^{i(kz - \omega t)} \quad \text{and} \quad \boldsymbol{E}_2 = E_2\boldsymbol{\varepsilon}_e e^{i(-kz - \omega t)}, \tag{8.195}$$

where ε_e is a unit polarization vector arbitrarily fixed so that it can be parallel to the
interface, i.e., the wall (i.e., perpendicular to the z-axis). The situation is depicted in
Fig. 8.16. Notice that in Fig. 8.16 E1 represents the forward wave (i.e., the incident
wave) and E2 the backward wave (i.e., the wave reflected at the interface). Thus, the
superposed wave is described as

$$\boldsymbol{E} = \boldsymbol{E}_1 + \boldsymbol{E}_2. \tag{8.196}$$

Fig. 8.16 Superposition of electric fields of the forward (or incident) wave E1 and the backward
(or reflected) wave E2

Taking account of the reflection of an electromagnetic wave perpendicularly incident
on a wall, let us consider the following two cases:
(i) Syn-phase:
The phase of the electric field is retained upon reflection. We assume that
E1 = E2 (>0). Then, we have

$$\boldsymbol{E} = E_1\boldsymbol{\varepsilon}_e\left[e^{i(kz - \omega t)} + e^{i(-kz - \omega t)}\right]
= E_1\boldsymbol{\varepsilon}_e e^{-i\omega t}\left(e^{ikz} + e^{-ikz}\right) = 2E_1\boldsymbol{\varepsilon}_e e^{-i\omega t}\cos kz. \tag{8.197}$$

In (8.197), we put z = 0 at the interface for convenience. Taking the real part of
(8.197), we have

$$\boldsymbol{E} = 2E_1\boldsymbol{\varepsilon}_e\cos\omega t\cos kz. \tag{8.198}$$

Note that in (8.198) the variables z and t have been separated. This implies that
we have a stationary wave. For this case to be realized, the characteristic
impedance of the dielectric on the incident wave side should be sufficiently smaller
than that of the other side; see (8.51) and (8.59). In other words, the dielectric
constant of the incident side should be large enough. We have nodes at positions
that satisfy

$$kz = -\frac{\pi}{2} - m\pi \quad (m = 0, 1, 2, \cdots) \quad \text{or} \quad -z = \frac{1}{4}\lambda + \frac{m}{2}\lambda. \tag{8.199}$$

Note that we are thinking of the stationary wave in the region of z < 0.
Equation (8.199) indicates that nodes are formed at a quarter wavelength from
the interface and at every half wavelength from it. The node means a position
where no electric field is present.
Meanwhile, antinodes are observed at positions

$$kz = -m\pi \quad (m = 0, 1, 2, \cdots) \quad \text{or} \quad -z = \frac{m}{2}\lambda.$$

Thus, the nodes and antinodes alternate with every quarter wavelength.
(ii) Anti-phase:
The phase of the electric field is reversed. We assume that E1 = −E2 (>0).
Then, we have

$$\boldsymbol{E} = E_1\boldsymbol{\varepsilon}_e\left[e^{i(kz - \omega t)} - e^{i(-kz - \omega t)}\right]
= E_1\boldsymbol{\varepsilon}_e e^{-i\omega t}\left(e^{ikz} - e^{-ikz}\right) = 2iE_1\boldsymbol{\varepsilon}_e e^{-i\omega t}\sin kz. \tag{8.200}$$

Taking the real part of (8.200), we have

$$\boldsymbol{E} = 2E_1\boldsymbol{\varepsilon}_e\sin\omega t\sin kz. \tag{8.201}$$

In (8.201) the variables z and t have been separated as well. For this case to be
realized, the characteristic impedance of the dielectric on the incident wave side
should be sufficiently larger than that of the other side. In other words, the dielectric
constant of the incident side should be small enough. Practically, this situation
can easily be attained by choosing a metal of high reflectance for the wall material.
We have nodes at positions that satisfy

$$kz = -m\pi \quad (m = 0, 1, 2, \cdots) \quad \text{or} \quad -z = \frac{m}{2}\lambda. \tag{8.202}$$

The nodes are formed at the interface and at every half wavelength from it. As
in the case of the syn-phase, the antinodes take place at positions shifted
by a quarter wavelength relative to the nodes.
If there is another interface at, say, z = −L (L > 0), the wave goes back and forth
many times. If the absolute value of the reflection coefficient at the interface is
high enough (i.e., close to unity), attenuation of the wave is ignorable. For
both the syn-phase and anti-phase cases, we must have

$$kL = m\pi \quad (m = 1, 2, \cdots) \quad \text{or} \quad L = \frac{m}{2}\lambda \tag{8.203}$$

so that stationary waves can stand stable. For a practical purpose, an optical
device having such interfaces is said to be a resonator. Various geometries
and constitutions of the resonator have been proposed in combination with various
dielectrics including semiconductors. Related discussion can be seen in Chap. 9.

References

1. Arfken GB, Weber HJ, Harris FE (2013) Mathematical methods for physicists, 7th edn. Academic Press, Waltham
2. Rudin W (1987) Real and complex analysis, 3rd edn. McGraw-Hill, New York
3. Smith FG, King TA, Wilkins D (2007) Optics and photonics, 2nd edn. Wiley, Chichester
4. Born M, Wolf E (2005) Principles of optics, 7th edn. Cambridge University Press, Cambridge
5. Pain HJ (2005) The physics of vibrations and waves, 6th edn. Wiley, Chichester
Chapter 9
Light Quanta: Radiation and Absorption

So far we have discussed the propagation of light and its reflection and transmission
(or refraction) at an interface of dielectric media. We described the characteristics of
light from the point of view of an electromagnetic wave. In this chapter, we describe
properties of light in relation to quantum mechanics. To this end, we start with
Planck's law of radiation that successfully reproduced experimental results related to
blackbody radiation. Before this law had been established, the Rayleigh-Jeans law
failed to explain the experimental results in the high frequency region of radiation (the
ultraviolet catastrophe). Planck's law of radiation led to the discovery of light
quanta. Einstein interpreted Planck's law of radiation on the basis of a model of two-
level atoms. This model includes the so-called Einstein A and B coefficients that are
important in optics applications, especially lasers. We derive these coefficients from
a classical point of view based on a dipole oscillation. We also consider a close
relationship between electromagnetic waves confined in a cavity and the motion of a
harmonic oscillator.

9.1 Blackbody Radiation

Historically, the relevant theory was first propounded by Max Planck and then
Albert Einstein, as briefly discussed in Chap. 1. The theory was developed on the
basis of the experiments called cavity radiation or blackbody radiation. Here,
however, we wish to derive Planck's law of radiation on the assumption of the
existence of quantum harmonic oscillators.
As discussed in Chap. 2, the ground state of a quantum harmonic oscillator has an
energy $\frac{1}{2}\hbar\omega$. Therefore, we measure energies of the oscillator in reference to that
state. Let N0 be the number of oscillators (i.e., light quanta) present in the ground
state. Then, according to the Boltzmann distribution law, the number of oscillators in the
first excited state, N1, is


$$N_1 = N_0 e^{-\hbar\omega/k_B T}, \tag{9.1}$$

where kB is the Boltzmann constant and T is the absolute temperature. Let Nj be the number
of oscillators in the j-th excited state. Then we have

$$N_j = N_0 e^{-j\hbar\omega/k_B T}. \tag{9.2}$$

Let N be the total number of oscillators in the system. Then, we get

$$N = N_0 + N_0 e^{-\hbar\omega/k_B T} + \cdots + N_0 e^{-j\hbar\omega/k_B T} + \cdots
= N_0\sum\nolimits_{j=0}^{\infty} e^{-j\hbar\omega/k_B T}. \tag{9.3}$$

Let E be the total energy of the oscillator system in reference to the ground state. That
is, we put the ground-state energy at zero. Then we have

$$E = 0\cdot N_0 + N_0\hbar\omega e^{-\hbar\omega/k_B T} + \cdots + N_0 j\hbar\omega e^{-j\hbar\omega/k_B T} + \cdots
= N_0\sum\nolimits_{j=0}^{\infty} j\hbar\omega e^{-j\hbar\omega/k_B T}. \tag{9.4}$$

Therefore, the average energy of the oscillators $\bar{E}$ is

$$\bar{E} = \frac{E}{N} = \hbar\omega\,\frac{\sum_{j=0}^{\infty} je^{-j\hbar\omega/k_B T}}{\sum_{j=0}^{\infty} e^{-j\hbar\omega/k_B T}}. \tag{9.5}$$

Putting $x \equiv e^{-\hbar\omega/k_B T}$ [1], we have

$$\bar{E} = \hbar\omega\,\frac{\sum_{j=0}^{\infty} jx^j}{\sum_{j=0}^{\infty} x^j}. \tag{9.6}$$

Since x < 1, we have

$$\sum\nolimits_{j=0}^{\infty} jx^j = \sum\nolimits_{j=0}^{\infty} jx^{j-1}\cdot x
= x\,\frac{d}{dx}\left(\sum\nolimits_{j=0}^{\infty} x^j\right)
= x\,\frac{d}{dx}\left(\frac{1}{1-x}\right) = \frac{x}{(1-x)^2}, \tag{9.7}$$

$$\sum\nolimits_{j=0}^{\infty} x^j = \frac{1}{1-x}. \tag{9.8}$$

Therefore, we get

$$\bar{E} = \frac{\hbar\omega x}{1-x} = \frac{\hbar\omega e^{-\hbar\omega/k_B T}}{1 - e^{-\hbar\omega/k_B T}} = \frac{\hbar\omega}{e^{\hbar\omega/k_B T} - 1}. \tag{9.9}$$

The function

$$\frac{1}{e^{\hbar\omega/k_B T} - 1} \tag{9.10}$$

is a form of the Bose-Einstein distribution functions; more specifically, it is today called the
Bose-Einstein distribution function for photons.
If $(\hbar\omega/k_B T) \ll 1$, then $e^{\hbar\omega/k_B T} \approx 1 + (\hbar\omega/k_B T)$. Therefore, we have

$$\bar{E} \approx k_B T. \tag{9.11}$$

Thus, the relation (9.9) asymptotically agrees with the classical theory. In other words,
according to the classical theory related to the law of equipartition of energy, an energy of
kBT/2 is distributed to each of the two degrees of freedom of motion, i.e., the kinetic
energy and the potential energy of a harmonic oscillator.
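The crossover between the quantum result (9.9) and the classical limit (9.11) is easy to see numerically; in the sketch below, the temperature and the set of frequencies are assumed example values:

```python
import math

hbar = 1.054571817e-34   # J s
kB = 1.380649e-23        # J/K
T = 300.0                # assumed temperature

for omega in (1e12, 1e13, 1e14, 1e15):             # assumed angular frequencies, rad/s
    x = hbar * omega / (kB * T)
    E_mean = hbar * omega / (math.exp(x) - 1.0)    # (9.9)
    print(f"hbar*omega/(kB*T) = {x:8.3f}:  E_mean/(kB*T) = {E_mean / (kB * T):.4f}")
```

For ħω ≪ kBT the printed ratio approaches 1, the classical equipartition value, whereas for ħω ≫ kBT it is exponentially suppressed.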

9.2 Planck's Law of Radiation and Mode Density of Electromagnetic Waves

Researchers at the time tried to find the relationship between the energy density
inside the cavity and the (angular) frequency of radiation. To reach the relationship, let
us introduce the concept of the mode density of electromagnetic waves related to the
blackbody radiation. We define the mode density D(ω) as the number of modes of
electromagnetic waves per unit volume per unit angular frequency. We refer to the
electromagnetic waves having allowed specific angular frequencies and polarizations
as modes. These modes must be described as linearly independent functions.
Determination of the mode density is related to the boundary conditions (BCs)
imposed on a physical system. We already dealt with this problem in Chaps. 2, 3,
and 8. These BCs often appear when we find solutions of differential equations. Let
us consider the following wave equation:

$$\frac{\partial^2\psi}{\partial x^2} = \frac{1}{v^2}\frac{\partial^2\psi}{\partial t^2}. \tag{9.12}$$

According to the method of separation of variables, we put

$$\psi(x, t) = X(x)T(t). \tag{9.13}$$

Substituting (9.13) into (9.12) and dividing both sides by X(x)T(t), we have
$$\frac{1}{X}\frac{d^2X}{dx^2} = \frac{1}{v^2}\frac{1}{T}\frac{d^2T}{dt^2} = -k^2, \tag{9.14}$$

where k is an undetermined (possibly complex) constant. For the x component, we
get

$$\frac{d^2X}{dx^2} + k^2X = 0. \tag{9.15}$$

Remember that k is supposed to be a complex number for the moment (see Example 1.1).
Modifying Example 1.1 a little so that (9.15) is posed in a domain [0, L] and
imposing the Dirichlet conditions such that

$$X(0) = X(L) = 0, \tag{9.16}$$

we find a solution of

$$X(x) = a\sin kx, \tag{9.17}$$

where a is a constant. The constant k can be determined so as to satisfy the BCs; i.e.,

$$kL = m\pi \quad \text{or} \quad k = m\pi/L \quad (m = 1, 2, \cdots). \tag{9.18}$$

Thus, we get real numbers for k. Then, we have a solution

$$T(t) = b\sin kvt = b\sin\omega t. \tag{9.19}$$

The overall solution is then

$$\psi(x, t) = c\sin kx\sin\omega t. \tag{9.20}$$

This solution has already appeared in Chap. 8 as a stationary solution. The readers
are encouraged to derive these results.
In a three-dimensional case, we have the wave equation

$$\frac{\partial^2\psi}{\partial x^2} + \frac{\partial^2\psi}{\partial y^2} + \frac{\partial^2\psi}{\partial z^2} = \frac{1}{v^2}\frac{\partial^2\psi}{\partial t^2}. \tag{9.21}$$

In this case, we also assume

$$\psi(x, t) = X(x)Y(y)Z(z)T(t). \tag{9.22}$$

Similarly we get

$$\frac{1}{X}\frac{d^2X}{dx^2} + \frac{1}{Y}\frac{d^2Y}{dy^2} + \frac{1}{Z}\frac{d^2Z}{dz^2} = \frac{1}{v^2}\frac{1}{T}\frac{d^2T}{dt^2} = -k^2. \tag{9.23}$$

Putting

$$\frac{1}{X}\frac{d^2X}{dx^2} = -k_x^2,\quad \frac{1}{Y}\frac{d^2Y}{dy^2} = -k_y^2,\quad \frac{1}{Z}\frac{d^2Z}{dz^2} = -k_z^2, \tag{9.24}$$

we have

$$k_x^2 + k_y^2 + k_z^2 = k^2. \tag{9.25}$$

Then, we get a stationary wave solution as in the one-dimensional case such that

$$\psi(x, t) = c\sin k_x x\sin k_y y\sin k_z z\sin\omega t. \tag{9.26}$$

The BCs to be satisfied by X(x), Y(y), and Z(z) are

$$k_x L = m_x\pi,\quad k_y L = m_y\pi,\quad k_z L = m_z\pi \quad (m_x, m_y, m_z = 1, 2, \cdots). \tag{9.27}$$

Returning to the main issue, let us deal with the mode density. Think of a cube of
side L that is placed as shown in Fig. 9.1. Calculating k, we have

$$k = \sqrt{k_x^2 + k_y^2 + k_z^2} = \frac{\pi}{L}\sqrt{m_x^2 + m_y^2 + m_z^2} = \frac{\omega}{c}, \tag{9.28}$$

where we assumed that the inside of the cavity is vacuum and, hence, the propagation
velocity of light is c. Rewriting (9.28), we have

$$\sqrt{m_x^2 + m_y^2 + m_z^2} = \frac{L\omega}{\pi c}. \tag{9.29}$$

Fig. 9.1 Cube of side L. We use this simple model to estimate the mode density
The numbers mx, my, and mz represent allowable modes in the cavity; the set (mx,
my, mz) specifies individual modes. Note that mx, my, and mz are all positive integers.
If, for instance, −mx were allowed, this would produce sin(−kx x) = −sin kx x; but
this function is linearly dependent on sin kx x. Then, a mode indexed by −mx should
not be regarded as an independent mode. Given ω, a set (mx, my, mz) that satisfies
(9.29) corresponds to each mode. Therefore, the number of modes that satisfy the
following expression

$$\sqrt{m_x^2 + m_y^2 + m_z^2} \le \frac{L\omega}{\pi c} \tag{9.30}$$

represents those corresponding to angular frequencies equal to or less than the given ω.
Each mode has a one-to-one correspondence with a lattice point indexed by (mx, my,
mz). Accordingly, if mx, my, mz ≫ 1, the number of allowed modes approximately
equals one-eighth of the volume of a sphere having a radius of $\frac{L\omega}{\pi c}$. Let NL be the
number of modes whose angular frequencies are equal to or less than ω. Recalling that
there are two independent modes having the same index (mx, my, mz) but mutually
orthogonal polarizations, we have

$$N_L = \frac{4\pi}{3}\left(\frac{L\omega}{\pi c}\right)^3\cdot\frac{1}{8}\cdot 2 = \frac{L^3\omega^3}{3\pi^2 c^3}. \tag{9.31}$$

Consequently, the mode density D(ω) is expressed as

$$D(\omega)d\omega = \frac{1}{L^3}\frac{dN_L}{d\omega}d\omega = \frac{\omega^2}{\pi^2 c^3}d\omega, \tag{9.32}$$

where D(ω)dω represents the number of modes per unit volume whose angular
frequencies range between ω and ω + dω.
Now, we introduce a function ρ(ω) as an energy density per unit angular
frequency. Then, combining D(ω) with (9.9), we get

$$\rho(\omega) = D(\omega)\,\frac{\hbar\omega}{e^{\hbar\omega/k_B T} - 1} = \frac{\hbar\omega^3}{\pi^2 c^3}\,\frac{1}{e^{\hbar\omega/k_B T} - 1}. \tag{9.33}$$

The relation (9.33) is called Planck's law of radiation. Notice that ρ(ω) has the
dimension [J m⁻³ s].
To solve (9.15) under the Dirichlet conditions (9.16) is pertinent to analyzing the
electric field within a cavity surrounded by a metal husk, because the electric field
must be absent at the interface between the cavity and the metal. The problem, however,
can equivalently be solved using the magnetic field. This is because at the interface
the reflection coefficients of the electric field and magnetic field have reversed signs (see
Chap. 8). Thus, given an equation for the magnetic field, we may use the Neumann
conditions. These conditions require the differential coefficients to vanish at the boundary
(i.e., the interface between the cavity and the metal). In a similar manner to the above,
we get

$$\psi(x, t) = c\cos kx\cos\omega t. \tag{9.34}$$

By imposing the BCs, again we have (9.18), which leads to the same result as the above.
We may also impose the periodic BCs. This type of equation has already been
treated in Chap. 3. In that case we have solutions of

$$e^{ikx} \quad \text{and} \quad e^{-ikx}.$$

The BCs demand that $e^0 = 1 = e^{ikL}$. That is,

$$kL = 2\pi m \quad (m = 0, \pm 1, \pm 2, \cdots). \tag{9.35}$$

Notice that $e^{ikx}$ and $e^{-ikx}$ are linearly independent and, hence, a minus sign for m is
permitted. Correspondingly, we have

$$N_L = \frac{4\pi}{3}\left(\frac{L\omega}{2\pi c}\right)^3\cdot 2 = \frac{L^3\omega^3}{3\pi^2 c^3}. \tag{9.36}$$

In other words, here we have to consider the whole volume of a sphere with half the radius
of the previous case. Thus, we reach the same conclusion as before.
If the average energy of an oscillator were described by (9.11), we would obtain the
following description of ρ(ω) such that

$$\rho(\omega) = D(\omega)k_B T = \frac{\omega^2}{\pi^2 c^3}k_B T. \tag{9.37}$$

This relation is well known as the Rayleigh-Jeans law, but (9.37) disagreed with
experimental results in that, according to the Rayleigh-Jeans law, ρ(ω) diverges toward
infinity as ω goes to infinity. The discrepancy between the theory and the experimental
results was referred to as the "ultraviolet catastrophe." Planck's law of radiation
described by (9.33), on the other hand, reproduces the experimental results well.
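The agreement at low frequency and the divergence of (9.37) at high frequency are visible in a direct comparison; in the sketch below, T = 5000 K is an assumed example value:

```python
import math

hbar, kB, c = 1.054571817e-34, 1.380649e-23, 2.99792458e8
T = 5000.0                                     # assumed temperature

def rho_planck(w):                             # (9.33)
    return (hbar * w**3 / (math.pi**2 * c**3)) / (math.exp(hbar * w / (kB * T)) - 1.0)

def rho_rayleigh_jeans(w):                     # (9.37)
    return w**2 * kB * T / (math.pi**2 * c**3)

for w in (1e13, 1e14, 1e15, 1e16):             # angular frequencies, rad/s
    print(f"omega = {w:.0e}: Planck = {rho_planck(w):.3e},"
          f"  Rayleigh-Jeans = {rho_rayleigh_jeans(w):.3e}  [J m^-3 s]")
```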

9.3 Two-Level Atoms

Although Planck established Planck's law of radiation, researchers at that time
hesitated to profess the existence of light quanta. It was Einstein who derived
Planck's law by assuming two-level atoms in which light quanta play a role.
His assumption comprises the following three postulates: (i) The physical system
to be addressed comprises so-called hypothetical "two-level" atoms that have only
two energy levels. If two-level atoms absorb a light quantum, a ground-state electron
is excited up to a higher level (i.e., the stimulated absorption). (ii) The higher-level
electron may spontaneously lose its energy and return back to the ground state (the
spontaneous emission). (iii) The higher-level electron may also lose its energy and
return back to the ground state. Unlike (ii), however, the excited electron has to be
stimulated by being irradiated by light quanta having an energy corresponding to the
energy difference between the ground state and the excited state (the stimulated emis-
sion). Figure 9.2 schematically depicts the optical processes of the Einstein model.

Fig. 9.2 Optical processes of the Einstein two-level atom model (stimulated absorption, stimulated
emission, and spontaneous emission)
Under those postulates, Einstein dealt with the problem probabilistically. Sup-
pose that the ground state and the excited state have energies E1 and E2. Einstein
assumed that light quanta having an energy equaling E2 − E1 take part in all
three of the above transition processes. He also propounded the idea that a light quantum
has an energy that is proportional to its (angular) frequency. That is, he thought that
the following relation should hold:

$$\hbar\omega_{21} = E_2 - E_1, \tag{9.38}$$

where ω21 is the angular frequency of light that takes part in the optical transitions.
For the time being, let us follow Einstein's postulates.
(i) Stimulated absorption: This process is simply said to be an "absorption." Let Wa
[s⁻¹] be the transition probability that the electron absorbs a light quantum to be
excited to the excited state. Wa is described as

$$W_a = N_1 B_{21}\rho(\omega_{21}), \tag{9.39}$$

where N1 is the number of atoms occupying the ground state; B21 is a propor-
tionality constant; ρ(ω21) is due to (9.33). Note that in (9.39) we used ω21 instead
of ω in (9.33). The coefficient B21 is called the Einstein B coefficient; more
specifically, one of the Einstein B coefficients. Namely, B21 is pertinent to the
transition from the ground state to the excited state.
(ii) Emission processes: The processes include both the spontaneous and stimulated
emissions. Let We [s⁻¹] be the transition probability that the electron emits a
light quantum and returns back to the ground state. We is described as

$$W_e = N_2 B_{12}\rho(\omega_{21}) + N_2 A_{12}, \tag{9.40}$$

where N2 is the number of atoms occupying the excited state; B12 and A12 are
proportionality constants. The coefficient A12 is called the Einstein A coefficient,
relevant to the spontaneous emission. The coefficient B12 is associated with
the stimulated emission and is also called the Einstein B coefficient together with B21.
Here, B12 is pertinent to the transition from the excited state to the ground state.
Now, we have

$$B_{12} = B_{21}. \tag{9.41}$$

The reasoning for this is as follows: The coefficients B12 and B21 are proportional to
the matrix elements pertinent to the optical transition. Let T be an operator associated
with the transition. Then, a matrix element is described, using the inner product
notation of Chap. 1, by

$$B_{21} = \langle\psi_2|T|\psi_1\rangle, \tag{9.42}$$

where ψ1 and ψ2 are the initial and final states of the system in relation to the optical
transition. As a good approximation, we use $e\boldsymbol{r}$ for T (dipole approximation), where
e is an elementary charge and r is a position operator (see Chap. 1). If (9.42)
represents the absorption process (i.e., the transition from the ground state to the excited
state), the corresponding emission process should be described as a reversed process
by

$$B_{12} = \langle\psi_1|T|\psi_2\rangle. \tag{9.43}$$

Notice that in (9.43) ψ2 and ψ1 are the initial and final states.
Taking the complex conjugate of (9.42), we have

$$B_{21}{}^{*} = \langle\psi_1|T^{\dagger}|\psi_2\rangle, \tag{9.44}$$

where T† is an operator adjoint to T (see Chap. 1). With an Hermitian operator H,
from Sect. 1.4 we have

$$H^{\dagger} = H. \tag{1.119}$$

Since T is also Hermitian, we have

$$T^{\dagger} = T. \tag{9.45}$$

Thus, we get

$$B_{21}{}^{*} = B_{12}. \tag{9.46}$$

But, as in the cases of Sects. 4.2 and 4.3, ψ1 and ψ2 can be represented as real
functions. Then, we have

$$B_{21} = B_{21}{}^{*} = B_{12}.$$

That is, we assume that the matrix B is real symmetric. In the case of two-level
atoms, as a matrix form we get

$$B = \begin{pmatrix} 0 & B_{12} \\ B_{12} & 0 \end{pmatrix}. \tag{9.47}$$

Compare (9.47) with (4.28).
Now, in the thermal equilibrium, we have

$$W_e = W_a. \tag{9.48}$$

That is,

$$N_2 B_{21}\rho(\omega_{21}) + N_2 A_{12} = N_1 B_{21}\rho(\omega_{21}), \tag{9.49}$$

where we used (9.41) for the LHS. Assuming the Boltzmann distribution law, we get

$$\frac{N_2}{N_1} = \exp[-(E_2 - E_1)/k_B T]. \tag{9.50}$$

Here if, moreover, we assume (9.38), we get

$$\frac{N_2}{N_1} = \exp(-\hbar\omega_{21}/k_B T). \tag{9.51}$$

Combining (9.49) and (9.51), we have

$$\exp(-\hbar\omega_{21}/k_B T) = \frac{B_{21}\rho(\omega_{21})}{B_{21}\rho(\omega_{21}) + A_{12}}. \tag{9.52}$$

Solving (9.52) with respect to ρ(ω21), we finally get

$$\rho(\omega_{21}) = \frac{A_{12}}{B_{21}}\cdot\frac{\exp(-\hbar\omega_{21}/k_B T)}{1 - \exp(-\hbar\omega_{21}/k_B T)}
= \frac{A_{12}}{B_{21}}\cdot\frac{1}{\exp(\hbar\omega_{21}/k_B T) - 1}. \tag{9.53}$$

Assuming that

$$\frac{A_{12}}{B_{21}} = \frac{\hbar\omega_{21}{}^3}{\pi^2 c^3}, \tag{9.54}$$

we have

$$\rho(\omega_{21}) = \frac{\hbar\omega_{21}{}^3}{\pi^2 c^3}\cdot\frac{1}{\exp(\hbar\omega_{21}/k_B T) - 1}. \tag{9.55}$$

This is none other than Planck's law of radiation.
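The algebra of (9.49)-(9.55) can be retraced numerically: given the ratio (9.54) and the Boltzmann populations (9.51), the steady-state balance reproduces Planck's law exactly. The sketch below uses assumed example values for T and ω21:

```python
import math

hbar, kB, c = 1.054571817e-34, 1.380649e-23, 2.99792458e8
T, w21 = 1000.0, 2.0e14            # assumed temperature and transition frequency

A_over_B = hbar * w21**3 / (math.pi**2 * c**3)     # A12/B21, (9.54)
boltz = math.exp(-hbar * w21 / (kB * T))           # N2/N1, (9.51)

# Solve N1*B21*rho = N2*B21*rho + N2*A12 for rho, i.e., (9.53):
rho = A_over_B * boltz / (1.0 - boltz)

planck = A_over_B / (math.exp(hbar * w21 / (kB * T)) - 1.0)   # (9.55)
print(rho, planck)
assert math.isclose(rho, planck, rel_tol=1e-12)
```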

9.4 Dipole Radiation

In (9.54) we only know the ratio of A12 to B21. To gain a good knowledge of these
Einstein coefficients, we briefly examine the mechanism of the dipole radiation. The
electromagnetic radiation results from an accelerated motion of a dipole.
A dipole moment p(t) is defined as a function of time t by

$$\boldsymbol{p}(t) = \int \boldsymbol{x}'\rho(\boldsymbol{x}', t)\,d\boldsymbol{x}', \tag{9.56}$$

where x′ is a position vector in a Cartesian coordinate system; the integral is taken over the
whole three-dimensional space; ρ is the charge density appearing in (7.1). If we
consider a system comprising point charges, the integration can immediately be carried
out to yield

$$\boldsymbol{p}(t) = \sum\nolimits_i q_i\boldsymbol{x}_i, \tag{9.57}$$

where qi is the charge of each point charge i and xi is the position vector of the point
charge i. From (9.56) and (9.57), we find that p(t) depends on how we set up the
coordinate system. However, if the total charge of the system is zero, p(t) does not
depend on the coordinate system. Let p(t) and p′(t) be the dipole moments viewed from
the frames O and O′, respectively (see Fig. 9.3).

Fig. 9.3 Dipole moment viewed from the frame O or O′

Fig. 9.4 Electromagnetic radiation from an accelerated motion of a dipole. (a) A dipole placed at
the origin of the coordinate system is executing harmonic oscillation along the z-direction around an
equilibrium position. (b) Electromagnetic radiation from a dipole in a wave zone. εe and εm are unit
polarization vectors of the electric field and magnetic field, respectively. εe, εm, and n form a right-
handed system

Then we have

$$\boldsymbol{p}'(t) = \sum\nolimits_i q_i\boldsymbol{x}_i{}' = \sum\nolimits_i q_i(\boldsymbol{x}_0 + \boldsymbol{x}_i)
= \sum\nolimits_i q_i\boldsymbol{x}_0 + \sum\nolimits_i q_i\boldsymbol{x}_i = \sum\nolimits_i q_i\boldsymbol{x}_i = \boldsymbol{p}(t). \tag{9.58}$$

Notice that with the third equality the first term vanishes because the total charge is
zero. The system comprising two point charges that have opposite charges (±q) is
particularly simple but very important. In that case we have

$$\boldsymbol{p}(t) = q\boldsymbol{x}_1 + (-q)\boldsymbol{x}_2 = q(\boldsymbol{x}_1 - \boldsymbol{x}_2) = q\tilde{\boldsymbol{x}}. \tag{9.59}$$

Here we assume that q > 0 according to the custom and, hence, $\tilde{\boldsymbol{x}}$ is a vector directed
from the minus point charge to the plus charge.
Figure 9.4 displays the geometry of an oscillating dipole and the electromagnetic radia-
tion from it. Figure 9.4a depicts the dipole. It is placed at the origin of the coordinate
system and assumed to be of an atomic or molecular scale in extension; we regard the
center of the dipole as the origin. Figure 9.4b represents a large-scale geometry of the
dipole and the surrounding space of it. For the electromagnetic radiation, an accelerated
motion of the dipole is of primary importance. The electromagnetic fields produced
by $\ddot{\boldsymbol{p}}(t)$ vary as the inverse of r, where r is a macroscopic distance between the dipole
and the observation point. Namely, r is much larger compared to the dipole size.
There are other electromagnetic fields that result from the dipole moment. These
fields result from p(t) and $\dot{\boldsymbol{p}}(t)$. Strictly speaking, we have to include those quantities
that are responsible for the electromagnetic fields associated with the dipole radia-
tion. Nevertheless, the fields produced by p(t) and $\dot{\boldsymbol{p}}(t)$ vary as functions of the
inverse cube and inverse square of r, respectively. Therefore, the surface integral of
the square of the fields associated with p(t) and $\dot{\boldsymbol{p}}(t)$ asymptotically reaches zero for
large enough r with respect to a sphere enveloping the dipole. Regarding $\ddot{\boldsymbol{p}}(t)$, on the
other hand, the surface integral of the square of the fields remains finite even for
large enough r. For this reason, we refer to the spatial region where $\ddot{\boldsymbol{p}}(t)$ does not
vanish as a wave zone.
Suppose that a dipole placed at the origin of the coordinate system is executing
harmonic oscillation along the z-direction around an equilibrium position (see
Fig. 9.4). The motion of the two charges having plus and minus signs is described by

$$\boldsymbol{z}_+ = z_0\boldsymbol{e}_3 + ae^{-i\omega t}\boldsymbol{e}_3 \quad (z_0, a > 0), \tag{9.60}$$

$$\boldsymbol{z}_- = -z_0\boldsymbol{e}_3 - ae^{-i\omega t}\boldsymbol{e}_3, \tag{9.61}$$

where z₊ and z₋ are the position vectors of the plus charge and minus charge, respectively;
z0 and −z0 are the equilibrium positions of each charge; a is the amplitude of the
harmonic oscillation; ω is the angular frequency of the oscillation. Then, the accelera-
tions of the charges are given by

$$\boldsymbol{a}_+ \equiv \ddot{\boldsymbol{z}}_+ = -a\omega^2 e^{-i\omega t}\boldsymbol{e}_3, \tag{9.62}$$

$$\boldsymbol{a}_- \equiv \ddot{\boldsymbol{z}}_- = a\omega^2 e^{-i\omega t}\boldsymbol{e}_3. \tag{9.63}$$

Meanwhile, we have

$$\boldsymbol{p}(t) = q\boldsymbol{z}_+ + (-q)\boldsymbol{z}_- = q(\boldsymbol{z}_+ - \boldsymbol{z}_-) \quad (q > 0). \tag{9.64}$$

Therefore,

$$\ddot{\boldsymbol{p}}(t) = q(\ddot{\boldsymbol{z}}_+ - \ddot{\boldsymbol{z}}_-) = -2qa\omega^2 e^{-i\omega t}\boldsymbol{e}_3. \tag{9.65}$$

The quantity pð€t Þ, i.e., the second derivative of p(t) with respect to time, produces
the electric field described by [2]
352 9 Light Quanta: Radiation and Absorption

E= 2€
p=4πε0 c2 r ¼ 2 qaω2 eiωt e3 =2πε0 c2 r, ð9:66Þ

where r is a distance between the dipole and observation point. In (9.66) we ignored
a term proportional to inverse square and cube of r for the aforementioned reason.
As described in (9.66), the strength of the radiation electric field in the wave zone
measured at a point away from the oscillating dipole is proportional to a component
of the vector of the acceleration motion of the dipole [i.e., pð€t Þ ]. The radiation
electric field lies in the direction perpendicular to a line connecting the observation
point and the point of the dipole (Fig. 9.4). Let εe be a unit polarization vector of the
electric field in that direction and let E⊥ be the radiation electric field. Then, we have

E⊥ = 2 qaω2 eiωt ðe3  εe Þεe =2πε0 c2 r ¼ 2 qaω2 eiωt εe sin θ=2πε0 c2 r : ð9:67Þ

As shown in Sect. 7.3, (εe  e3)εe in (9.67) “extracts” from e3 a vector component
parallel to εe. Such an operation is said to be a projection of a vector. The related
discussion can be seen in Part III.
It takes a time of r/c for the emitted light from the charge to arrive at the
observation point. Consequently, the acceleration of the charge has to be measured
at the time when the radiation leaves the charge. Let t be the instant when the electric
field is measured at the measuring point. Then, it follows that the radiation leaves the
charge at a time of t  r/c. Thus, the electric field relevant to the radiation that can be
observed far enough away from the oscillating charge is described as [2]

qaω2 eiωðtcÞ sin θ


r

E⊥ ðx, t Þ = 2 εe : ð9:68Þ
2πε0 c2 r

The radiation electric field must necessarily be accompanied by a magnetic field.


Writing the radiation magnetic field as H⊥(x, t), we have [3]

1 qaω2 eiωðtcÞ sin θ qaω2 eiωðtcÞ sin θ


r r

H ðx, t Þ = 2  n εe = 2 n εe
cμ0 2πε0 c2 r 2πcr

qaω2 eiωðtcÞ sin θ


r

=2 εm , ð9:69Þ
2πcr

where n represents a unit vector in the direction parallel to a line connecting the
observation point and the dipole. The εm is a unit polarization vector of the magnetic
field as defined by (7.67). From the above, we see that the radiation electromagnetic
waves in the wave zone are transverse waves that show the properties the same as
those of electromagnetic waves in a free space.
Now, let us evaluate a time-averaged energy flux from an oscillating dipole.
Using (8.71), we have
9.4 Dipole Radiation 353

1 qaω2 eiωðtcÞ sin θ qaω2 eiωðtcÞ sin θ


r r
1
SðθÞ ¼ E H =   n
2 2 2πε0 c2 r 2πcr
ω4 sin 2 θ
= ðqaÞ2 n: ð9:70Þ
8π 2 ε0 c3 r 2

If we are thinking of an atom or a molecule in which the dipole consists of an


electron and a positive charge that compensates it, q is replaced with e (e < 0).
Then (9.70) reads as

ω4 sin 2 θ
SðθÞ = ðeaÞ2 n: ð9:71Þ
8π 2 ε0 c3 r 2

Let us relate the above argument to Einstein A and B coefficients. Since we are
dealing with an isolated dipole, we might well suppose that the radiation comes from
the spontaneous emission. Let P be a total power of emission from the oscillating
dipole that gets through a sphere of radius r. Then we have
Z Z 2π Z π
P¼ SðθÞ  ndS ¼ dϕ SðθÞr 2 sin θdθ
0 0
Z 2π Z π
ω 4
¼ ðeaÞ2 dϕ sin 3 θdθ: ð9:72Þ
8π 2 ε0 c3 0 0

Changing cosθ to t, the integral I  0 sin 3 θdθ can be converted into
Z 1 
I¼ 1  t 2 dt ¼ 4=3: ð9:73Þ
1

Thus, we have

ω4
P¼ ðeaÞ2 : ð9:74Þ
3πε0 c3

A probability of the spontaneous emission is given by N2 A12. Since we are


dealing with a single dipole, N2 can be put 1. Accordingly, an expected power of
emission is A12ħω21. Replacing ω in (9.74) with ω21 in (9.55) and equating A12ħω21
to P, we get

ω21 3
A12 ¼ ðeaÞ2 : ð9:75Þ
3πε0 c3 ħ

From (9.54), we also get


354 9 Light Quanta: Radiation and Absorption

π
B12 ¼ ðeaÞ2 : ð9:76Þ
3ε0 ħ 2

In order to relate these results to those of quantum mechanics, we may replace a2


in the above expressions with a square of an absolute value of the matrix elements of
the position operator r. That is, representing | 1i and | 2i as the quantum states of the
ground and excited states of a two-level atom, we define h1| r| 2i as the matrix
element of the position operator. Relating |h1| r| 2i|2 to a2, we get

ω21 3 e2 πe2
A12 ¼ jh1jrj2ij2 and B12 ¼ jh1jrj2ij2 : ð9:77Þ
3πε0 c ħ
3
3ε0 ħ2

From (9.77), we have

πe2 πe2
B12 ¼ h1jrj2ih1jrj2i ¼ h1jrj2ih2jr{ j1
3ε0 ħ 2
3ε0 ħ2
πe2
¼ h1jrj2ih2jrj1i, ð9:78Þ
3ε0 ħ2

where with the second equality we used (1.116); the last equality comes from that r is
Hermitian. Meanwhile, we have

πe2 2 πe2 { πe2


B21 ¼ j h 2 jrj1i j ¼ h 2 jrj1i h 1 jr j2 ¼ h2jrj1ih1jrj2i: ð9:79Þ
3ε0 ħ2 3ε0 ħ2 3ε0 ħ2

Hence, we recover (9.41).

9.5 Lasers
9.5.1 Brief Outlook

A concept of the two-level atoms proposed by Einstein is universal and independent


of materials and can be utilized for some particular purposes. Actually, in later years
many researchers tried to validate that concept and verified its validity. After basic
researches of 1950s and 1960s, fundamentals were put into practical use as various
optical devices. Typical examples are masers and lasers, abbreviations for “micro-
wave amplification by stimulated emission of radiation” and “light amplification by
stimulated emission of radiation.” Of these, lasers are common and particularly
important nowadays. On the basis of universality of the concept, a lot of materials
including semiconductors and dyes are used in gaseous, liquid, and solid states.
9.5 Lasers 355

t t+dt

I(x) I(x+dx)

0 x x+dx L

Fig. 9.5 Rectangular parallelepiped of a laser medium with a length L and a cross-section area
S (not shown). I(x) denotes irradiance at a point of x from the origin

Let us consider a rectangular parallelepiped of a laser medium with a length L and


cross-section area S (Fig. 9.5). Suppose there are N two-level atoms in the rectan-
gular parallelepiped such that

N ¼ N1 þ N2, ð9:80Þ

where N1 and N2 represent the number of atoms occupying the ground and excited
states, respectively. Suppose that a light is propagated from the left of the rectangular
parallelepiped and entering it. Then, we expect that three processes occur simulta-
neously. One is a stimulated absorption and others are stimulated emission and
spontaneous emission. After these process, an increment dE in photon energy of the
total system (i.e., the rectangular parallelepiped) during dt is described by

dE ¼ fN 2 ½B21 ρðω21 Þ þ A12  N 1 B21 ρðω21 Þgħω21 dt: ð9:81Þ

In light of (9.39) and (9.40), a dimensionless quantity dE/ħω21 represents a number


of effective events of photons emission that have occurred during dt. Since in lasers
the stimulated emission is dominant, we shall forget about the spontaneous emission
and rewrite (9.81) as

dE ¼ fN 2 B21 ρðω21 Þ  N 1 B21 ρðω21 Þgħω21 dt


¼ B21 ρðω21 ÞðN 2  N 1 Þħω21 dt: ð9:82Þ

Under a thermal equilibrium, we have N2 < N1 on the basis of Boltzmann distribu-


tion law, and so dE < 0. In this occasion, therefore, the photon energy decreases. For
the light amplification to take place, therefore, we must have a following condition:

N2 > N1: ð9:83Þ

This energy distribution is called inverted distribution or population inversion. Thus,


the laser oscillation is a typical nonequilibrium phenomenon. To produce the
population inversion, we need an external exciting source using an electrical or
optical device.
The essence of lasers rests upon the fact that stimulated emission produces a
photon that possesses a wavenumber vector (k) and a polarization (ε) both the same
as those of an original photon. For this reason, the laser light is monochromatic and
356 9 Light Quanta: Radiation and Absorption

highly directional. To understand the fundamental mechanism underlying the related


phenomena, interested readers are encouraged to seek appropriate literature of
quantum theory of light for further reading [4].
To make a discussion simple and straightforward, let us assume that the light is
incident parallel to the long axis of the rectangular parallelepiped. Then, the stimu-
lated emission produces light to be propagated in the same direction. As a result, an
irradiance I measured in that direction is described as

E 0
I¼ c: ð9:84Þ
SL

Note that the light velocity in the laser medium c0 is given by

c0 ¼ c=n, ð9:85Þ

where n is a refractive index of the laser medium. Taking an infinitesimal of both


sides of (9.84), we have

c0 c0
dI ¼ dE ¼ B21 ρðω21 ÞðN 2  N 1 Þħω21 dt
SL SL
e 21 c0 dt,
¼ B21 ρðω21 ÞNħω ð9:86Þ

where Ne ¼ ðN 2  N 1 Þ=SL denotes a “net” density of atoms that occupy the excited
state.
The energy density ρ(ω21) can be written as

ρðω21 Þ ¼ I ðω21 Þgðω21 Þ=c0 , ð9:87Þ

where I(ω12) [Js1m2] represents an intensity of radiation; g(ω21) is a gain function


[s]. The gain function is a measure that shows how favorably (or unfavorably) the
transition takes place at the said angular frequency ω12. This is normalized in the
emission range such that
Z 1
gðωÞdω ¼ 1:
0

The quantity I(ω21) is an energy flux that gets through per unit area per unit time.
This flux corresponds to an energy contained in a long and thin rectangular paral-
lelepiped of a length c0 and a unit cross-section area. To obtain ρ(ω21), I(ω12) should
be divided by c0 in (9.87) accordingly. Using (9.87) and replacing c0dt with a distance
dx and I(ω12) with I(x) as a function of x, we rewrite (9.86) as
9.5 Lasers 357

e 21
B21 gðω21 ÞNħω
dI ðxÞ ¼ 0 I ðxÞdx: ð9:88Þ
c

Dividing (9.88) by I(x) and integrating both sides, we have


Z Z Z
I
dI ðxÞ I e 21
B21 gðω21 ÞNħω x
¼ d ln I ðxÞ ¼ dx, ð9:89Þ
I0 I ðxÞ I0 c0 0

where I0 is an irradiance of light at an instant when the light is entering the laser
medium from the left. Thus, we get
 
e 21
B21 gðω21 ÞNħω
I ðxÞ ¼ I 0 exp x : ð9:90Þ
c0

Equation (9.90) shows that an irradiance of the laser light is augmented exponen-
tially along the path of the laser light. In (9.90), denoting an exponent as G

e 21
B21 gðω21 ÞNħω
G 0 , ð9:91Þ
c

we get

I ðxÞ ¼ I 0 exp Gx:

The constant G is said to be a gain constant. This is an index that indicates the laser
performance. Large numbers B21, g(ω21), and N e yield a high performance of the
laser.
In Sects. 8.8 and 9.2, we sought conditions for electromagnetic waves to cause
constructive interference. In a one-dimensional dielectric medium, the condition is
described as

kL ¼ mπ or mλ ¼ 2L ðm ¼ 1, 2,   Þ, ð9:92Þ

where k and λ denote a wavenumber and wavelength in the dielectric medium,


respectively. Indexing k and λ with m that represents a mode, we have

k m L ¼ mπ or mλm ¼ 2L ðm ¼ 1, 2,   Þ: ð9:93Þ

This condition can be expressed by different manners such that

ωm ¼ 2πνm ¼ 2πc0 =λm ¼ 2πc=nλm ¼ mπc=nL: ð9:94Þ

It is often the case that if the laser is a long and thin rod, rectangular parallele-
piped, etc., we see that sharply resolved and regularly spaced spectral lines are
358 9 Light Quanta: Radiation and Absorption

observed in emission spectrum. These lines are referred to as a longitudinal


multimode. The separation between two neighboring emission lines is referred to
as the free spectral range [2]. If adjacent emission lines are clearly resolved so that
the free spectral range can easily be recognized, we can derive useful information
from the laser oscillation spectra (vide infra).
Rewriting (9.94) as, e.g.,

πc
ωm n ¼ m, ð9:95Þ
L

and taking differential (or variation) of both sides, we get

δn δn πc
nδωm þ ωm δn ¼ nδωm þ ωm δω ¼ n þ ωm δωm ¼ δm: ð9:96Þ
δωm m δωm L

Therefore, we get

1
πc δn
δωm ¼ n þ ωm δm: ð9:97Þ
L δωm

Equation (9.97) premises the wavelength dispersion of a refractive index of a laser


medium. Here, the wavelength dispersion means that the refractive index varies as a
function of wavelengths of light in a matter. The laser materials often have a
considerably large dispersion and relevant information is indispensable.
From (9.97), we find that

δn
n g  n þ ωm ð9:98Þ
δωm

plays a role of refractive index when the laser material has a wavelength dispersion.
The quantity ng is said to be a group refractive index (or group index). Thus, (9.97) is
rewritten as

πc
δω ¼ δm, ð9:99Þ
Lng

where we omitted the index m of ωm. When we need to distinguish the refractive
index n clearly from the group refractive index, we refer to n as a phase refractive
index. Rewriting (9.98) as a relation of continuous quantities and using differenti-
ation instead of variation, we have [2]

dn dn
ng ¼ n þ ω or ng ¼ n  λ : ð9:100Þ
dω dλ
9.5 Lasers 359

To derive the second equation of (9.100), we used following relations: Namely,


taking a variation of λω ¼ 2πc, we have

ω λ
ωdλ þ λdω ¼ 0 or ¼ :
dω dλ

Several formulae or relation equations were proposed to describe the wavelength


dispersion. One of famous and useful formula among them is Sellmeier’s dispersion
formula [5]. As an example, the Sellmeier’s dispersion formula can be described as
sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
B
n¼ Aþ  2 , ð9:101Þ
1  λc

where A, B, and C are appropriate constants with A and B being dimensionless and
C having a dimension [m]. In an actual case, it would be difficult to determine
n analytically. However, if we are able to obtain well-resolved spectra, δm can be put
as 1 and δωm can be determined from the free spectral range. Expressing it as δωFSR
from (9.99) we have

πc πc
δωFSR ¼ or ng ¼ : ð9:102Þ
Lng LðδωFSR Þ

Thus, one can determine ng as a function of wavelengths.

9.5.2 Organic Lasers

By virtue of prominent features of lasers and related phenomena, researchers and


engineers have been developing and proposing up to now various device structures
and their operation principles in the research field of device physics. Of these, we
have organic lasers as newly occurring laser devices. Organic crystals possess the
well-defined structureproperty relationship and some of them exhibit peculiar
light-emitting features. Therefore, those crystals are suited for studying their lasing
property. In this section we study the light-emitting properties (especially the lasing
property) of the organic crystals in relation to the wavelength dispersion of refractive
index of materials. From the point of view of device physics, another key issue lies in
designing an efficient diffraction grating or resonator.
In the following tangible examples, we further investigate specific aspects of the
light-emitting properties of the organic crystals in the slab waveguide configurations
(see Sect. 8.8) to pursue fundamental properties of the organic light-emitting mate-
rials and incorporate them into high-performance devices.
Example 9.1 [6] Figure 9.6 [6] displays a broadband emission spectra of a crystal
consisting of an organic semiconductor AC’7. As another example, Fig. 9.7 [6]
360 9 Light Quanta: Radiation and Absorption

(a) (b)

10 11
Intensity (103 counts)

Intensity (103 counts)


10
5

9
0
500 520 540 560 580 600 620 640 529 530 531
Wavelength (nm) Wavelength (nm)

Fig. 9.6 Broadband emission spectra of an organic semiconductor crystal AC’7. (a) Full spectrum.
(b) Enlarged profile of the spectrum around 530 nm. Reproduced from Yamao T, Okuda Y,
Makino Y, Hotta S (2011) Dispersion of the refractive indices of thiophene/phenylene co-oligomer
single crystals. J Appl Phys 110(5): 053113/7 pages [6], with the permission of AIP Publishing.
https://doi.org/10.1063/1.3634117

4
Intensity (103 counts)

0
522 524 526 528
Wavelength (nm)

Fig. 9.7 Laser oscillation spectrum of an organic semiconductor crystal AC’7. Reproduced from
Yamao T, Okuda Y, Makino Y, Hotta S (2011) Dispersion of the refractive indices of thiophene/
phenylene co-oligomer single crystals. J Appl Phys 110(5): 053113/7 pages [6], with the permission
of AIP Publishing. https://doi.org/10.1063/1.3634117

displays a laser oscillation spectrum of an AC’7 crystal. The structural formula of


AC’7 is shown in Fig. 9.8 together with other related organic semiconductors. Once
we choose an empirical formula of the wavelength dispersion [e.g., (9.101)], we can
determine constants of that empirical formula by comparing it with experimentally
decided data. For such data, laser oscillation spectra (Fig. 9.7) were used in addition
to the broadband emission spectra. It is because the laser oscillation spectra are
essentially identical to Fig. 9.6 in a sense that both the broadband and laser emission
lines gave the same free spectral range. Inserting (9.101) into (9.100) and expressing
ng as a function of λ, Yamao et al. got a following expression [6]:
9.5 Lasers 361

Fig. 9.8 Structural


formulae of several organic
semiconductors BP1T,
AC5, and AC’7

h  2 i2
A 1  λc þB
ng ¼ rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
h i ffirffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
h ffi: ð9:103Þ
 c 2 3  c 2 i
1 λ A 1 λ þB

Determining optimum constants A, B, and C, a set of these constants yields a reliable


dispersion formula in (9.101). Numerical calculations can be utilized effectively.
The procedures are as follows: (i) Tentatively choosing probable numbers for A, B,
and C for (9.103), ng can be expressed as a function of λ. (ii) The resulting fitting
curve is then compared with ng data experimentally decided from (9.102). After this
procedure, one can choose another set of A, and B, and C and again compare the
fitting curve with the experimental data. (iii) This procedure can be repeated many
times through iterative numerical computations of (9.103) using different sets of
A, B, and C.
Thus, we should be able to adjust and determine better and better combination of
A, B, and C so that the refined function (9.103) can reproduce the experimental
results as precise as one pleases. At the same time, we can determine the most
reliable combination of A, B, and C with the dispersion formula of (9.101). Figure 9.9
[6] shows several examples of the wavelength dispersion for organic semiconductor
crystals. Optimized constants A, B, and C of (9.101) are listed in Table 9.1 [6].
The formulae (9.101) and (9.103) along with associated procedures to determine
the constants A, B, and C are expected to apply to various laser and light-emitting
materials consisting of semiconducting inorganic and organic materials.
Example 9.2 [7] If we wish to construct a laser device, it will be highly desired to
equip the laser material with a suitable diffraction grating or resonator [8, 9]. In that
case, besides the information about the dispersion of phase refractive index, we need
numerical data of the propagation constant. The propagation constant has appeared
in Sect. 8.7.1 and is defined as
362 9 Light Quanta: Radiation and Absorption

Fig. 9.9 Examples of the


wavelength dispersion of (a) 7 (a)
group indices and (b)

Group index ng
6 BP1T AC'7
refractive indices for several
organic semiconductor AC5
crystals. Reproduced from 5
Yamao T, Okuda Y,
4
Makino Y, Hotta S (2011)
Dispersion of the refractive
3
indices of thiophene/
phenylene co-oligomer
single crystals. J Appl Phys 3.3
110(5): 053113/7 pages [6], 3.2 AC'7 (b)

Refractive index n
with the permission of AIP 3.1 BP1T
Publishing. https://doi.org/ 3
10.1063/1.3634117
2.9
2.8
2.7
2.6 AC5
2.5
500 600
Wavelength (nm)

Table 9.1 Optimized constants of A, B, and C for Sellmeier’s dispersion formula (9.101) with
several organic semiconductor crystalsa
Material A B C (nm)
BP1T 5.7 1.04 397
AC5 3.9 1.44 402
AC’7 6.0 1.06 452
a
Reproduced from Yamao T, Okuda Y, Makino Y, Hotta S (2011) Dispersion of the refractive
indices of thiophene/phenylene co-oligomer single crystals. J Appl Phys 110(5): 053113/7 pages,
with the permission of AIP Publishing. https://doi.org/10.1063/1.3634117

β ¼ k sin θ ¼ nk 0 sin θ: ð8:153Þ

As the phase refractive index n has a dispersion, the propagation constant β has a
dispersion as well. In optics, n sin θ is often referred to as an effective index. We
denote it by

neff  n sin θ or β ¼ k0 neff : ð9:104Þ

As 0  θ  π/2, we have n sin θ  n. In general, it is going to be difficult to obtain


analytical description or solution for the dispersion of both the phase refractive index
and effective index. As in Example 9.1, we usually obtain the relevant data by the
numerical calculations.
9.5 Lasers 363

(a)

P6T
(b) (c)
P6T crystal
air

optical device

AZO grating
2 µm

Fig. 9.10 Construction of an organic light-emitting device. (a) Structural formula of P6T. (b)
Diffraction grating of an organic device. The purple arrow indicates the direction of grating
wavevector K. Reproduced from Yamao T, Higashihara S, Yamashita S, Sano H, Inada Y,
Yamashita K, Ura S, Hotta S (2018) Design principle of high-performance organic single-crystal
light-emitting devices. J Appl Phys 123(23): 235501/13 pages [7], with the permission of AIP
Publishing. https://doi.org/10.1063/1.5030486. (c) Cross-section of P6T crystal/AZO substrate

Under these circumstances, to establish a design principle of high-performance


organic light-emitting devices Yamao et al. [7] have developed a light-emitting
device where an organic crystal of P6T (see structural formula of Fig. 9.10a) was
placed onto a diffraction grating (Fig. 9.10b [7]). The compound P6T is known as
one of a family of thiophene/phenylene co-oligomers (TPCOs) together with BP1T,
AC5, AC’7, etc. that appear in Fig. 9.8 [10, 11]. The diffraction grating was
engraved using a focused ion beam (FIB) apparatus on an aluminum-doped zinc
oxide (AZO) substrate. The resulting substrate was then laminated with the thin
crystal of P6T (Fig. 9.10c). Using those devices, Yamao et al. excited the P6T crystal
with ultraviolet light of a mercury lamp and collected the emissions from the crystal
and analyzed those emission spectra (vide infra). Detailed analysis of those spectra
has revealed that there is a close relationship between the peak location of emission
lines and emission direction. Thus, to analyze the angle-dependent emission spectra
turns out to be a powerful tool to accurately determine the dispersion of propagation
constant of the laser material.
In a light-emitting device, we must consider a problem of out-coupling of light.
Seeking efficient out-coupling of light is equivalent to solving equations of electro-
magnetic wave motion inside and outside the slab crystal under appropriate bound-
ary conditions. The boundary conditions are given such that tangential components
of electromagnetic fields are continuous across the interface formed by the slab
crystal and air or a device substrate. Moreover, if a diffraction grating is present in an
optical device (including the laser), the diffraction grating influences emission
364 9 Light Quanta: Radiation and Absorption

Fig. 9.11 Geometrical relationship among the light propagation in crystal (indicated with β), the
propagation outside the crystal (k0), and the grating wavevector (K). This geometry represents the
phase matching between the emission inside the device (organic slab crystal) and out-coupled
emission (i.e., the emission outside the device); see text

characteristics of the device. In other words, regarding the out-coupling of light we


must impose the following condition on the device such that [7]

β 2 mK ¼ ðk0  eeÞee, ð9:105Þ

where β is the propagation vector with |β|  β that appeared in (8.153). The quantity
β is called a propagation constant. Equation (9.105) can be visualized in Fig. 9.11 as
geometrical relationship among the light propagation in crystal (indicated with β),
the propagation outside the crystal (k0), and the grating wavevector (K ). The grating
wavevector K is defined as

jK j  K ¼ 2π=Λ,

where Λ is the grating period and the direction of K is perpendicular to the grating
grooves; that direction is indicated in Fig. 9.10b with a purple arrow. The unit vector
ee is defined as

β  mK
ee ¼ :
j β  mK j

Namely, ee is oriented in the direction parallel to β 2 mK. In (9.105), furthermore,


m is the diffraction order with a natural number (i.e., a positive integer); k0 denotes
the wavenumber vector of an emission in vacuum with

jk0 j  k0 ¼ 2π=λp , ð9:106Þ

where λp is the emission peak wavelength in vacuum. Note that k0 is almost identical
to the wavenumber of the emission in air (i.e., the emission to be detected). If we are
dealing with the emission parallel to the substrate plane, i.e., grazing emission,
(9.105) can be reduced to
9.5 Lasers 365

β 2 mK ¼ k0 : ð9:107Þ

Equations (9.105) and (9.107) represent the relationship between β and k0, i.e., the
phase matching conditions between the emission inside the device (organic slab
crystal) and the out-coupled emission, i.e., the emission outside the device (in air).
Thus, the boundary conditions can be restated as (i) the tangential component
continuity of electromagnetic fields at both the planes of interface (see Sect. 8.1) and
(ii) the phase matching of the electromagnetic fields both inside and outsides the
laser medium. We examine these two factors below.
(i) Tangential component continuity of electromagnetic fields:
Let us suppose that we are dealing with a dielectric medium with anisotropic
dielectric constant, but isotropic magnetic permeability. In this situation, Maxwell’s
equations must be formulated so that we can deal with the electromagnetic properties
in an anisotropic medium. Such substances are widely available and organic crystals
are counted as typical examples. Among those crystals, P6T crystallizes in the
monoclinic system as in many other cases of organic crystals [10, 11]. In this case,
the permittivity tensor (or electric permittivity tensor) ε is written as [7]
0 1
εaa 0 εac
B C
ε¼@ 0 εbb 0 A: ð9:108Þ
εc a 0 εc c

Notice that in (9.108) the permittivity tensor is described in the orthogonal


coordinate of abc -system, where a and b coincide with those of the crystallographic
a- and b-axes of P6T with the c -axis being perpendicular to the ab-plane. Note that
in many of organic crystals the c-axis is not perpendicular to the ab-plane. Mean-
while, we assume that the magnetic permeability μ of P6T is isotropic and identical
to that of vacuum. That is, in a tensor form we have
0 1
μ0 0 0
B C
μ¼@ 0 μ0 0 A, ð9:109Þ
0 0 μ0

where μ0 is the magnetic permeability of vacuum.


Then, the Maxwell’s equations in a matrix form read as

∂H
rot E ¼ 2 μ0 , ð9:110Þ
∂t
366 9 Light Quanta: Radiation and Absorption

Fig. 9.12 Laboratory ∗


coordinate system (the ξηζ-
system) for the device
experiments. Regarding the
symbols and notations,
see text

0 1
εaa 0 εac
B C ∂E
rot H ¼ ε0 @ 0 εbb 0 A : ð9:111Þ
∂t
εc a 0 εc c

In (9.111) the equation is described in the abc -system. But, it will be desired to
describe the Maxwell’s equations in the laboratory coordinate system (see Fig. 9.12)
so that we can readily visualize the light propagation in an anisotropic crystal such as
P6T. Figure 9.12 depicts the ξηζ-system, where the ξ-axis is in the direction of the
light propagation within the crystal, namely the ξ-axis parallels β in (9.107). We
assume that the ξηζ-system forms an orthogonal coordinate.
Here we define the dielectric constant ellipsoid e ε described by
0 1
ξ
B C
ðξ η ζ Þe
ε@ η A ¼ 1, ð9:112Þ
ζ

which represents an ellipsoid in the ξηζ-system. If one cuts the ellipsoid by a plane
that includes the origin of the coordinate system and perpendicular to the ξ-axis, its
cross-section is an ellipse. In this situation, one can choose the η- and ζ- axes for the
principal axis of the ellipse. This implies that when we put ξ ¼ 0, we must have
0 1
0
B C
ð0 η ζ Þe
ε@ η A ¼ εηη η2 þ εζζ ζ 2 ¼ 1: ð9:113Þ
ζ

In other words, we must have e


ε in the form of
9.5 Lasers 367

0 1
εξξ εξη εξζ
B C
e
ε ¼ @ εηξ εηη 0 A,
εζξ 0 εζζ

where eε is expressed in the ξηζ-system. In terms of the matrix algebra, the principal
submatrix with respect to the η- and ζ- components should be diagonalized (see Sect.
14.5). On this condition, the electric flux density of the electromagnetic wave is
polarized in the η- and ζ-direction. A half of the reciprocal square root of the
pffiffiffiffiffiffi pffiffiffiffiffiffi
principal axes, i.e., 1= εηη or 1= εζζ , represents the anisotropic refractive indices.
Suppose that the ξηζ-system is reached by two successive rotations starting from
the abc -system in such a way that we first perform the rotation by α around the c -
axis that is followed by another rotation by δ around the ξ-axis (Fig. 9.12). Then we
have [7]
0 1 0 1
εξξ εξη εξζ εaa 0 εac
B C 1 1 B C
@ εηξ εηη 0 A ¼ Rξ ðδÞRc ðαÞ@ 0 εbb 0 ARc ðαÞRξ ðδÞ, ð9:114Þ
εζξ 0 εζζ εc a 0 εc c

where Rc ðαÞ and Rξ(δ) stand for the first and second rotation matrices of the above,
respectively. In (9.114) we have
0 1 0 1
cos α  sin α 0 1 0 0
B C B C
Rc ðαÞ ¼ @ sin α cos α 0 A, Rξ ðδÞ ¼ @ 0 cos δ  sin δ A: ð9:115Þ
0 0 1 0 sin δ cos δ

With the matrix presentations and related coordinate transformations for (9.114)
and (9.115), see Sects. 11.4, 17.1, and 17.4.2. Note that the rotation angle δ cannot
be decided independent of α. In the literature [7], furthermore, the quantity εξη ¼ εηξ
was ignored as small compared to other components. As a result, (9.111) is
converted into the following equation described by
0 1
εξξ 0 εξζ
B C ∂E
rot H ¼ ε0 @ 0 εηη 0 A : ð9:116Þ
∂t
εζξ 0 εζζ

Notice here that the electromagnetic quantities H and E are measured in the ξηζ-
coordinate system.
368 9 Light Quanta: Radiation and Absorption

Meanwhile, the electromagnetic waves that are propagating within the slab
crystal are described as

Φν exp ½iðβξ  ωt Þ , ð9:117Þ

where Φν stands for either an electric field or a magnetic field with ν chosen from
ν ¼ 1, 2, 3 representing each component of the ξηζ-system. Inserting (9.117) into
(9.116) and separating the resulting equation into ξ, η, and ζ components, we obtain
six equations with respect to six components of H and E. Of these, we are partic-
ularly interested in Eξ and Eζ as well as Hη, because we assume that the electromag-
netic wave is propagated as a TM mode [7].
Using the set of the above six equations, with Hη in the P6T crystal we get the
following second-order linear differential equation (SOLDE):
 
d2 H η εξζ dH η εξζ 2 εξξ
þ 2ineff k 0 þ εξξ   n 2
k 2 H ¼ 0, ð9:118Þ
dζ 2 εζζ dζ εζζ εζζ eff 0 η

where neff ¼ n sin θ in (8.153) is the effective index. Assuming as a solution of


(9.118)
 
H η ¼ H cryst eiκζ cos ðkζ þ ϕs Þ κ 6¼ 0, k 6¼ 0, H cryst , ϕs : constant ð9:119Þ

and inserting (9.119) into (9.118), we get a following type of equation described by

Ζeiκζ cos ðkζ þ ϕs Þ þ Ωeiκζ sin ðkζ þ ϕs Þ ¼ 0: ð9:120Þ

In (9.119) Hcryst represents the magnetic field within the P6T crystal. The
constants Ζ and Ω in (9.120) can be described using κ and k together with the
dH
constant coefficients of dζη and Hη of SOLDE (9.118). Meanwhile, we have
 
 eiκζ cos ðkζ þ ϕ Þ eiκζ sin ðkζ þ ϕs Þ 
 s 2iκ
 iκζ 0 0  ¼ ke 6¼ 0,
 e cos ðkζ þ ϕs Þ eiκζ sin ðkζ þ ϕs Þ 

where the differentiation is pertinent to ζ. This shows that eiκζ cos (kζ + ϕs) and
eiκζ sin (kζ + ϕs) are linearly independent. This in turn implies that

ΖΩ0 ð9:121Þ

in (9.120). Thus, from (9.121) we can determine κ and k such that


9.5 Lasers 369

εξζ εξζ 1   
κ¼β ¼ neff k0 , k 2 ¼ 2 εξξ εζζ  εξζ 2 εζζ  neff 2 k0 2 : ð9:122Þ
εζζ εζζ εζζ

Further using the aforementioned six equations to eliminate Eζ, we obtain

εζζ
Eξ ¼ i kH eiκζ sin ðkζ þ ϕs Þ: ð9:123Þ
ωε0 ðεξξ εζζ  εξζ 2 Þ cryst

Equations (9.119) and (9.123) define the electromagnetic fields as a function of ζ


within the P6T slab crystal.
Meanwhile, the fields in the AZO substrate and air can readily be determined.
Assuming that both the substances are isotropic, we have εξζ ¼ 0 and εξξ ¼ εζζ  e
n2
and, hence, we get

d2 H η  2 
2
þ e
n  neff 2 k0 2 H η ¼ 0, ð9:124Þ

where e
n is the refractive index of either the substrate or air. Since

e
n  neff  n, ð9:125Þ

where n is the phase refractive index of the P6T crystal, we have e


n2  neff 2 < 0.
Defining a quantity
qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
γ  neff 2  e n2 k 0 ð9:126Þ

for the substrate (i.e., AZO) and air, from (9.124) we express the electromagnetic
fields as

e γζ ,
H η ¼ He ð9:127Þ

where He denotes the magnetic field relevant to the substrate or air. Considering that
ζ < 0 for the substrate and ζ > d/ cos δ in air (see Fig. 9.13), with the magnetic field
we have

H η ¼ He e γðζ cosd δÞ
e γζ and H η ¼ He ð9:128Þ

for the substrate and air, respectively. Notice that Hη ! 0, when ζ !  1 with the
substrate and ζ ! 1 for air. Correspondingly, with the electric field we get
370 9 Light Quanta: Radiation and Absorption

γ e γζ γ e γðζ cosd δÞ
E ξ ¼ i He and E ξ ¼ i He ð9:129Þ
ωε0e
n 2
ωε0e
n2

for the substrate and air, respectively.


The device geometry is characterized by that the P6T slab crystal is sandwiched
by air and the device substrate so as to form the three-layered (air/crystal/substrate)
structure (Fig. 9.10c). Therefore, we impose boundary conditions on Hη in (9.119)
and (9.128) along with Eξ in (9.123) and (9.129) in such a way that their tangential
components are continuous across the interfaces between the crystal and the sub-
strate and between the crystal and air.
Figure 9.13 represents the geometry of the cross-section of air/P6T crystal/AZO
substrate. From (9.119) and (9.128), (a) the tangential continuity condition for the
magnetic field at the crystal/substrate interface is described as

e γ0 ð cos δÞ,


H cryst eiκ0 cos ðk  0 þ ϕs Þð cos δÞ ¼ He ð9:130Þ

where the factor of cosδ comes from the requirement of the tangential continuity
condition. (b) The tangential continuity condition for the electric field at the same
interface reads as

εζζ γ e γ0
i kH eiκ0 sin ðk  0 þ ϕs Þ ¼ i He : ð9:131Þ
ωε0 ðεξξ εζζ  εξζ 2 Þ cryst ωε0e
n2

Thus, dividing both sides of (9.131) by both sides of (9.130), we get

εζζ γ εζζ γ
k tan ϕs ¼  2 or k tan ðϕs Þ ¼ 2 , ð9:132Þ
εξξ εζζ  εξζ 2 e
n ε ξξ ε ζζ  ε ξζ
2
e
n

where γ and e n are substrate related quantities; see (9.125) and (9.126). Another
boundary condition at the air/crystal interface can be obtained in a similar manner.
Notice that in that case ζ ¼ d/ cos δ; see Fig. 9.13. As a result, we have

Fig. 9.13 Geometry of the ∗

cross-section of air/P6T
crystal/AZO substrate. The
angle δ is identical with that
of Fig. 9.12. The points air
P and P0 are located at the
crystal/substrate interface
and air/crystal interface, P6T crystal
respectively. The distance
between P and P0 is d/ cos δ

AZO substrate
9.5 Lasers 371

εζζ kd γ
k tan þ ϕs ¼ , ð9:133Þ
εξξ εζζ  εξζ 2 cos δ e
n2

where γ and e n are air related quantities. The quantity ϕs can be eliminated
considering arc tangent of (9.132) and (9.133). In combination with (9.122), we
finally get the following eigenvalue equation for TM modes:
rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
1
k0 d ðε ε  εξζ 2 Þðεζζ  neff 2 Þ=ð cos δÞ
εζζ 2 ξξ ζζ
" sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi#
1 1 n 2  nair 2
¼ lπ þ tan ðεξξ εζζ  εξζ 2 Þ eff
nair 2 εζζ  neff 2
" sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi#
1 1 n 2  nsub 2
þ tan ðεξξ εζζ  εξζ 2 Þ eff , ð9:134Þ
nsub 2 εζζ  neff 2

where nair (¼ 1) is the refractive index of air, nsub is that of AZO substrate equipped
with a diffraction grating, d is the crystal thickness, and l is the order of the transverse
mode. The refractive index nsub is estimated as the average of the refractive indices
of air and AZO weighted by volume fraction they occupy in the diffraction grating.
To solve (9.134), iterative numerical computation was needed. The detailed calcu-
lation procedures for this can be seen in the literature [7] and supplementary material
therein.
Note that (9.134) includes the well-known eigenvalue equation for a waveguide
that comprises an isotropic core dielectric medium sandwiched by a couple of clad
layers having the same refractive index (symmetric waveguide) [12]. In that case, in
fact, the second and third terms in RHS of (9.134) are the same and their sum is
identical with δTM of (8.176) in Sect. 8.7.2. To confirm it, in (9.134) use
εξξ ¼ εζζ  n2 [see (7.57)] and εξζ ¼ 0 together with the relative refractive index
given by (8.39).
(ii) Phase matching of the electromagnetic fields:
Once we have obtained the eigenvalue equation described by (9.134), we will be
able to solve the problem and, hence, to design a high-performance optical device,
especially the laser. To this end, let us return back to (9.107). Taking inner products
(see Chap. 13) of both sides of (9.107), we have

hβ 2 mKjβ 2 mK i ¼ hk0 jk0 i:

That is, we get


372 9 Light Quanta: Radiation and Absorption

β2 þ m2 K 2  2mKβ cos φ ¼ k0 2 , ð9:135Þ

where φ is an angle between mK and β (see Fig. 9.11). Inserting (9.104) into (9.135)
and solving the quadratic equation, we obtain neff as a function of k0 (or emission
wavelength). This implies that we can determine neff from the experimental data.
Yet, the above process does not mean that we can determine neff uniquely. It is
because there is still room for choice of a positive integer m. Using (9.125) that
restricts the values neff, however, we can decide the most probable integer m. Thus,
we can compare neff obtained from (9.134) with neff determined from the
experiments.
As described above, we have explained two major factors of the out-coupling of
emission. In the above discussion, we assumed that the TM modes dominate.
Combining (9.107) and (9.134), we successfully assign individual emissions to the
specific TMml modes, in which the indices m and l are longitudinal and transverse
modes with m associated with the grating. Note that m 1 and l 0. With the index
l, l ¼ 0 is allowed because the total internal reflection (Sects. 8.5 and 8.6) is
responsible.
Figure 9.14a [7] shows an example of the angle-dependent emission spectra. For
this, the grazing emission was intended. There are two series of progression in which
the emission peak locations are either redshifted or blueshifted with increasing

(a) (b)
4
(×10 counts)
Intensity

2
3

90
ion angle (degree)

60

30
RotatAngle

−30

−60

−90
600 700 800
Wavelength (nm)

Fig. 9.14 Angle-dependent emission spectra of a P6T crystal. (a) Emission spectra. Reproduced
from Yamao T, Higashihara S, Yamashita S, Sano H, Inada Y, Yamashita K, Ura S, Hotta S (2018)
Design principle of high-performance organic single-crystal light-emitting devices. J Appl Phys
123(23): 235501/13 pages [7], with the permission of AIP Publishing. https://doi.org/10.1063/1.
5030486. (b) The grazing emissions (k0) were collected at various angles θ made with the grating
wavevector direction (K)
9.5 Lasers 373

Refractive index, effective index


(2, 0, b)

(2, 1, b) (m, l, s) = (1, 0, r)

2
(1, 1, r)

600 700 800


Peak wavelength (nm)

Fig. 9.15 Wavelength dispersion of effective indices neff. The open and closed symbols show the
data pertinent to the blueshifted and redshifted peaks, respectively. These data were calculated from
either (9.105) or (9.107). The colored solid curves represent the dispersions of the effective
refractive indices computed from (9.134). The numbers and characters (m, l, s) denote the diffrac-
tion order (m), order of transverse mode (l ), and shift direction (s) that distinguishes between
blueshift (b) and redshift (r). A black solid curve at the upper right indicates the wavelength
dispersion of the phase refractive indices (n) of the P6T crystal related to the one principal axis.
Reproduced from Yamao T, Higashihara S, Yamashita S, Sano H, Inada Y, Yamashita K, Ura S,
Hotta S (2018) Design principle of high-performance organic single-crystal light-emitting devices. J
Appl Phys 123(23): 235501/13 pages [7], with the permission of AIP Publishing. https://doi.org/10.
1063/1.5030486

angles jθj. Figure 9.15 [7] compares the effective indices determined by the exper-
iments and numerical computations based on (9.134). The experimentally observed
emission peaks are indicated with and open circles (or open triangles) or closed
circles (or closed triangles) with blueshifted or redshifted peaks, respectively. The
color solid curves represent the results obtained from (9.134).
As clearly recognized in Fig. 9.15 [7], the results of the experiments and com-
putations are satisfactory. In particular, TM20 (blueshifted) and TM10 (redshifted)
modes are seamlessly jointed so as to be an upper group of emission peaks. A lower
group of emission peaks are assigned to TM21 (blueshifted) or TM11 (redshifted)
modes. Notice here that these modes may have multiple values of neff according to
different εξξ, εζζ, and εξζ that vary with the different propagation directions of light
(i.e., the ξ-axis) within the slab crystal waveguide.
From a practical point of view, it is desired to make a device so that it can strongly
emit light in the direction parallel to the grating wavevector. In that case, (9.107) can
be rewritten as

β ¼ ðk0 þ mK Þee,
374 9 Light Quanta: Radiation and Absorption

where all the vectors β, K, and k0 are parallel to ee. Then, we have two choices such
that

2π mK
β ¼ k0 þ mK ¼ neff k 0 , k 0 ¼ ¼ ;
λp neff  1
2π mK
β ¼ k0  mK ¼ neff k0 , k0 ¼ ¼ : ð9:136Þ
λp neff þ 1

The first and second equations of (9.136) are associated with the spectra of
redshifted and blueshifted progressions, respectively. Further rewriting (9.136) to
combine the two equations, we get

2π ðneff  1Þ ðneff  1ÞΛ


λp ¼ ¼ , ð9:137Þ
mK m

where we used the relation of K ¼ 2π/Λ.


From (9.134) we find that neff implicitly depends on l and d (crystal thickness).
Thus, (9.137) tells us that if we choose m, l, and d appropriately, we can determine
the most probable neff. Then, choosing the grating period Λ appropriately in turn, we
may predict the emission peak wavelength λp. The other way around, designating λp
properly, we can decide Λ. In fact, λp can experimentally be determined from an
emission color (or maximum of emission gain) inherent to the chemical species of
organic crystals. For example, a maximum of emission gain of P6T is located around
660 nm, as can be seen from Fig. 9.14a [7]. In combination with (9.134), (9.137)
finally allows us to choose the optimum combination of the device parameters m, l,
d, Λ, and λp.
When the laser oscillation is taking place, (9.136) and (9.137) should be replaced
with the following expressions that represent the Bragg’s condition [13, 14]:

2β ¼ mK ð9:138Þ

or

2neff Λ
λp ¼ : ð9:139Þ
m

As discussed above in detail, Examples 9.1 and 9.2 enable us to design a high-
performance laser device using an organic crystal that is characterized by a pretty
complicated crystal structure associated with anisotropic refractive indices. The
abovementioned design principle enables one to construct effective laser devices
that consist of light-emitting materials either organic or inorganic more widely. At
the same time, these examples are expected to provide an effective methodology in
interdisciplinary fields encompassing solid-state physics and solid-state chemistry as
well as device physics.
9.6 Mechanical System 375

9.6 Mechanical System

As outlined above, two-level atoms have distinct characteristics in connection with


lasers. Electromagnetic waves similarly confined within a one-dimensional cavity
exhibit related properties and above all have many features in common with a
harmonic oscillator [15].
We have already described several features and properties of the harmonic
oscillator (Chap. 2). Meanwhile, we have briefly discussed formation of electromag-
netic stationary waves (Chap. 8). There are several resemblance and correspondence
between the harmonic oscillator and electromagnetic stationary waves when we
view them as mechanical systems. The point is that in a harmonic oscillator the
position and momentum are in-quadrature relationship; i.e., their phase difference is
π/2. For the electromagnetic stationary waves, electric field and magnetic field are
in-quadrature relationship as well.
In Chap. 8, we examine the conditions under which stationary waves are formed.
In a dielectric medium both sides of which are equipped with metal layers
(or mirrors), the electric field is described as

E ¼ 2E 1 εe sin ωt sin kz: ð8:201Þ

In (8.201), we assumed that forward and backward electromagnetic waves are


propagating in the direction of the z-axis. Here, we assume that the interfaces
(or walls) are positioned at z ¼ 0 and z ¼ L. Within a domain [0, L], the two
waves form a stationary wave. Since this expression assumed two waves, the
electromagnetic energy was doubled. To normalize the energy sopthat ffiffiffi a single
wave is contained, the amplitude E1 of (8.201) should be divides by 2. Therefore,
we think of a following description for E:
pffiffiffi pffiffiffi
E¼ 2Ee1 sin ωt sin kz or Ex ¼ 2E sin ωt sin kz, ð9:140Þ

where we designated the polarization vector as a direction of the x-axis. At the same
time, we omitted the index from the amplitude. Thus, from the second equation of
(7.65) we have
pffiffiffi
∂H y 1 ∂E x 2Ek
¼ ¼ sin ωt cos kz: ð9:141Þ
∂t μ ∂z μ

Note that this equation appears in the second equation of (8.131) as well. Integrating
both sides of (9.141), we get
pffiffiffi
2Ek
Hy ¼ cos ωt cos kz:
μω

Using a relation ω ¼ vk, we have


376 9 Light Quanta: Radiation and Absorption

pffiffiffi
2E
Hy ¼ cos ωt cos kz þ C,
μv

where v is a light velocity in the dielectric medium and C is an integration constant.


Removing C and putting

E
H ,
μv

we have
pffiffiffi
Hy ¼ 2H cos ωt cos kz:

Using a vector expression, we have


pffiffiffi
H ¼ e2 2H cos ωt cos kz:

Thus, E (ke1), H (ke2), and n (ke3) form the right-handed system in this order. As
noted in Sect. 8.8, at the interface (or wall) the electric field and magnetic field form
nodes and antinodes, respectively. Namely, the two fields are in-quadrature.
Let us calculate electromagnetic energy of the dielectric medium within a cavity.
In the present case the cavity is meant as the dielectric sandwiched by a couple of
metal layer. We have

W ¼ W e þ W m, ð9:142Þ

where W is the total electromagnetic energy; We and Wm are electric and magnetic
energies, respectively. Let the length of the cavity be L. Then the energy per unit
cross-section area is described as
Z L Z L
ε μ
We ¼ E dz and W m ¼
2
H 2 dz: ð9:143Þ
2 0 2 0

Performing integration, we get

ε
W e ¼ LE 2 sin2 ωt,
2
2
μ μ E ε
W m ¼ LH 2 cos 2 ωt ¼ L cos 2 ωt ¼ LE 2 cos 2 ωt, ð9:144Þ
2 2 μv 2

where we used 1/v2 ¼ εμ with the last equality. Thus, we have


9.6 Mechanical System 377

ε μ
W ¼ LE 2 ¼ LH 2 : ð9:145Þ
2 2
fe , W
Representing an energy per unit volume as W g e
m , and W, we have

fe ¼ ε E 2 sin2 ωt, W
W g ε 2 2 e ε 2 μ 2
m ¼ E cos ωt, W ¼ E ¼ H : ð9:146Þ
2 2 2 2

In Chap. 2, we treated motion of a harmonic oscillator. There, we had

v0
xð t Þ ¼ sin ωt ¼ x0 sin ωt: ð2:7Þ
ω

Here, we have defined an amplitude of the harmonic oscillation as x0 (>0)

v0
x0  :
ω

Then, momentum is described as

pðt Þ ¼ mxð_t Þ ¼ mωx0 cos ωt:

Defining

p0  mωx0 ,

we have

pðt Þ ¼ p0 cos ωt:

Let a kinetic energy and potential energy of the oscillator be K and V, respec-
tively. Then, we have

1 1 2
K¼ pðt Þ2 ¼ p cos 2 ωtxðt Þ,
2m 2m 0
1 1
V ¼ mω2 xðt Þ2 ¼ mω2 x0 2 sin2 ωtxðt Þ,
2 2
1 2 1 1
W ¼KþV ¼ p ¼ mv0 2 ¼ mω2 x0 2 : ð9:147Þ
2m 0 2 2

Comparing (9.146) and (9.147), we recognize the following relationship in energy


between the electromagnetic fields and harmonic oscillator motion [15]:
pffiffiffiffi pffiffiffi pffiffiffiffi pffiffiffi
mωx0 ⟷ εE and p0 = m⟷ μH: ð9:148Þ
378 9 Light Quanta: Radiation and Absorption

Thus, there is an elegant contradistinction between the dynamics of electromagnetic


fields in cavity and motion of a harmonic oscillator. In fact, quantum electromag-
netism is based upon the treatment of a quantum harmonic oscillator introduced in
Chap. 2.

References

1. Moore WJ (1955) Physical chemistry, 3rd edn. Prentice-Hall, Englewood Cliffs


2. Smith FG, King TA, Wilkins D (2007) Optics and photonics, 2nd edn. Wiley, Chichester
3. Sunakawa S (1965) Theoretical electromagnetism. Kinokuniya, Tokyo. (in Japanese)
4. Loudon R (2000) The quantum theory of light, 3rd edn. Oxford University Press, Oxford
5. Born M, Wolf E (2005) Principles of optics, 7th edn. Cambridge University Press, Cambridge
6. Yamao T, Okuda Y, Makino Y, Hotta S (2011) Dispersion of the refractive indices of
thiophene/phenylene co-oligomer single crystals. J Appl Phys 110(5):053113
7. Yamao T, Higashihara S, Yamashita S, Sano H, Inada Y, Yamashita K, Ura S, Hotta S (2018)
Design principle of high-performance organic single-crystal light-emitting devices. J Appl Phys
123(23):235501
8. Siegman AE (1986) Lasers. University Science Books, Sausalito
9. Palmer C (2005) Diffraction grating handbook, 6th edn. Newport Corporation, New York
10. Hotta S, Yamao T (2011) The thiophene/phenylene co-oligomers: exotic molecular semicon-
ductors integrating high-performance electronic and optical functionalities. J Mater Chem 21
(5):1295–1304
11. Hotta S, Goto M, Azumi R, Inoue M, Ichikawa M, Taniguchi Y (2004) Crystal structures of
thiophene/phenylene co-oligomers with different molecular shapes. Chem Mater 16
(2):237–241
12. Yariv A, Yeh P (2003) Optical waves in crystals: propagation and control of laser radiation.
Wiley, New York
13. Pain HJ (2005) The physics of vibrations and waves, 6th edn. Wiley, Chichester
14. Inada Y, Kawata Y, Kawai T, Hotta S, Yamao T (2020) Laser oscillation at an intended
wavelength from a distributed feedback structure on anisotropic organic crystals. J Appl Phys
127(5):053102
15. Fox M (2006) Quantum Optics. Oxford University Press, Oxford
Chapter 10
Introductory Green’s Functions

In this chapter, we deal with various properties and characteristics of differential


equations, especially first-order linear differential equations (FOLDEs) and second-
order linear differential equations (SOLDEs). These differential equations are char-
acterized by differential operators and boundary conditions (BCs). Of these, differ-
ential operators appearing in SOLDEs are particularly important. Under appropriate
conditions, the said operators can be converted to Hermitian operators. The SOLDEs
associated to classical orthogonal polynomials play a central role in many fields of
mathematical physics including quantum mechanics and electromagnetism. We
study the general principle of SOLDEs in relation to several specific SOLDEs we
have studied in Part I and examine general features of an eigenvalue problem and an
initial-value problem (IVP). In this context, Green’s functions provide a powerful
tool for solving SOLDEs. For a practical purpose, we deal with actual construction
of Green’s functions. In Sect. 8.8, we dealt with steady-state characteristics of
electromagnetic waves in dielectrics in terms of propagation, reflection, and trans-
mission. When we consider transient characteristics of electromagnetic and optical
phenomena, we often need to deal with SOLDEs having constant coefficients. This
is well known in connection with a motion of a damped harmonic oscillator. In the
latter part of this chapter, we treat the initial value problem of a SOLDE of this type.

10.1 Second-Order Linear Differential Equations


(SOLDEs)

A general form of n-th order linear differential equations has the following form:

dn u dn1 u du
a n ð xÞ n þ an1 ðxÞ n1 þ    þ a1 ðxÞ þ a0 ðxÞu ¼ d ðxÞ: ð10:1Þ
dx dx dx

© Springer Nature Singapore Pte Ltd. 2020 379


S. Hotta, Mathematical Physical Chemistry,
https://doi.org/10.1007/978-981-15-2225-3_10
380 10 Introductory Green’s Functions

Table 10.1 Characteristics of SOLDEs


Type I Type II Type III Type IV
Equation Homogeneous Homogeneous Inhomogeneous Inhomogeneous
Boundary conditions Homogeneous Inhomogeneous Homogeneous Inhomogeneous

If d(x) ¼ 0, the differential equation is said to be homogeneous; otherwise it is called


inhomogeneous. Equation (10.1) is a linear function of u and its derivatives.
Likewise, we have a SOLDE such that

d2 u du
að x Þ þ bðxÞ þ cðxÞu ¼ dðxÞ: ð10:2Þ
dx2 dx

In (10.2) we assume that the variable x is real. The equation can be solved under
appropriate boundary conditions (BCs). A general form of BCs is described as
 
du  du
B1 ðuÞ ¼ α1 uðaÞ þ β1 x¼a þ γ 1 uðbÞ þ δ1  ¼ σ1, ð10:3Þ
dx dx x¼b
 
du  du
B2 ðuÞ ¼ α2 uðaÞ þ β2 x¼a þ γ 2 uðbÞ þ δ2  ¼ σ2, ð10:4Þ
dx dx x¼b

where α1, β1, γ 1, δ1 σ 1, etc. are real constants; u(x) is defined in an interval [a, b],
where a and b can be infinity (i.e., 1). The LHS of B1(u) and B2(u) are referred to
as boundary functionals [1, 2]. If σ 1 ¼ σ 2 ¼ 0, the BCs are called homogeneous;
otherwise the BCs are said to be inhomogeneous. In combination with the inhomo-
geneous equation expressed as (10.2), Table 10.1 summarizes characteristics of
SOLDEs. We have four types of SOLDEs according to homogeneity and inhomo-
geneity of equations and BCs.
Even though SOLDEs are mathematically tractable, yet it is not easy necessarily
to solve them depending upon the nature of a(x), b(x), and c(x) of (8.2). Nonetheless,
if those functions are constant coefficients, it can readily be solved. We will deal with
SOLDEs of that type in great deal later. Suppose that we find two linearly indepen-
dent solutions u1(x) and u2(x) of a following homogeneous equation:

d2 u du
að x Þ þ bðxÞ þ cðxÞu ¼ 0: ð10:5Þ
dx2 dx

Then, any solution u(x) of (10.5) can be expressed as their linear combination such
that

uðxÞ ¼ c1 u1 ðxÞ þ c2 u2 ðxÞ, ð10:6Þ

where c1 and c2 are some arbitrary constants.


In general, suppose that there are arbitrarily chosen n functions; i.e., f1(x), f2(x),   ,
fn(x). Suppose a following equation with those functions:
10.1 Second-Order Linear Differential Equations (SOLDEs) 381

a1 f 1 ðxÞ þ a2 f 2 ðxÞ þ    þ an f n ðxÞ ¼ 0, ð10:7Þ

where a1, a2,   , and an are constants. If a1 ¼ a2 ¼    ¼ an ¼ 0, (10.7) always


holds. In this case, (10.7) is said to be a trivial linear relation. If f1(x), f2(x),   , and
fn(x) satisfy a nontrivial linear relation, f1(x), f2(x),   , and fn(x) are said to be linearly
dependent. That is, the nontrivial expression means that in (10.7) at least one of a1,
a2,   , and an is nonzero. Suppose that an 6¼ 0. Then, from (10.7), fn(x) is expressed
as

a1 a a
f n ð xÞ ¼  f ðxÞ  2 f 2 ðxÞ      n1 f n1 ðxÞ: ð10:8Þ
an 1 an an

If f1(x), f2(x),   , and fn(x) are not linearly dependent, they are called linearly
independent. In other words, the statement that f1(x), f2(x),   , and fn(x) are linearly
independent is equivalent to that (10.7) holds if and only if a1 ¼ a2 ¼    ¼ an ¼ 0.
We will have relevant discussion in Part III.
Now suppose that with the above two linearly independent functions u1(x) and
u2(x), we have

a1 u1 ðxÞ þ a2 u2 ðxÞ ¼ 0: ð10:9Þ

Differentiating (10.9), we have

du1 ðxÞ du ðxÞ


a1 þ a2 2 ¼ 0: ð10:10Þ
dx dx

Expressing (10.9) and (10.10) in a matrix form, we get


0 1
u1 ð x Þ u2 ðxÞ  a 
@ du ðxÞ du ðxÞ A 1 ¼ 0: ð10:11Þ
1 2
a2
dx dx

Thus, that u1(x) and u2(x) are linearly independent is equivalent to that the following
expression holds:
 
 u1 ð x Þ u2 ðxÞ 

 du ðxÞ du ðxÞ   W ðu1 , u2 Þ 6¼ 0, ð10:12Þ
 1 2 
 
dx dx

where W(u1, u2) is called Wronskian of u1(x) and u2(x). In fact, if W(u1, u2) ¼ 0, then
we have

du2 du
u1  u2 1 ¼ 0: ð10:13Þ
dx dx
382 10 Introductory Green’s Functions

This implies that there is a functional relationship between u1 and u2. In fact, if we
can express as u2 ¼ u2(u1(x)), then du
dx ¼ du1 dx . That is,
2 du2 du1

du2 du du du du du du
u1 ¼ u1 2 1 ¼ u2 1 or u1 2 ¼ u2 or 2 ¼ 1 , ð10:14Þ
dx du1 dx dx du1 u2 u1

where the second equality of the first equation comes from (10.13). The third
equation can easily be integrated to yield

u2 u
ln ¼ c or 2 ¼ ec or u2 ¼ ec u1 : ð10:15Þ
u1 u1

Equation (10.15) shows that u1(x) and u2(x) are linearly dependent. It is easy to show
if u1(x) and u2(x) are linearly dependent, W(u1, u2) ¼ 0. Thus, we have a following
statement:

Two functions are linearly dependent: , W ðu1 , u2 Þ ¼ 0:

Then, as the contraposition of this statement we have

Two functions are linearly independent: , W ðu1 , u2 Þ 6¼ 0:

On the other hand, suppose that we have another solution u3(x) of (10.5) besides u1(x) and u2(x). Then, we have

a(x) d²u1/dx² + b(x) du1/dx + c(x)u1 = 0,
a(x) d²u2/dx² + b(x) du2/dx + c(x)u2 = 0,
a(x) d²u3/dx² + b(x) du3/dx + c(x)u3 = 0.   (10.16)

Again rewriting (10.16) in matrix form, we have

( d²u1/dx²   du1/dx   u1 ) ( a )
( d²u2/dx²   du2/dx   u2 ) ( b ) = 0.   (10.17)
( d²u3/dx²   du3/dx   u3 ) ( c )

A necessary and sufficient condition to obtain a nontrivial solution (i.e., a solution besides a = b = c = 0) is that [1]

| d²u1/dx²   du1/dx   u1 |
| d²u2/dx²   du2/dx   u2 | = 0.   (10.18)
| d²u3/dx²   du3/dx   u3 |

Note here that

| d²u1/dx²   du1/dx   u1 |       | u1         u2         u3        |
| d²u2/dx²   du2/dx   u2 |  = −  | du1/dx     du2/dx     du3/dx    |  ≡ −W(u1, u2, u3),   (10.19)
| d²u3/dx²   du3/dx   u3 |       | d²u1/dx²   d²u2/dx²   d²u3/dx²  |

where W(u1, u2, u3) is the Wronskian of u1(x), u2(x), and u3(x). In the above relation, we used the facts that the determinant of a matrix is identical to that of its transpose and that the determinant of a matrix changes sign upon permutation of row vectors. In short, a necessary and sufficient condition for a nontrivial solution is that W(u1, u2, u3) vanishes.
This implies that u1(x), u2(x), and u3(x) are linearly dependent. However, we have assumed that u1(x) and u2(x) are linearly independent, and so (10.18) and (10.19) mean that u3(x) must be described as a linear combination of u1(x) and u2(x). That is, there is no third linearly independent solution. Consequently, the general solution of (10.5) must be given by (10.6). In this sense, u1(x) and u2(x) are said to be a fundamental set of solutions of (10.5).
Next, let us consider the inhomogeneous equation (10.2). Suppose that up(x) is a particular solution of (10.2), and introduce a function v(x) such that

u(x) = v(x) + up(x).   (10.20)

Substituting (10.20) into (10.2), we have

a(x) d²v/dx² + b(x) dv/dx + c(x)v + a(x) d²up/dx² + b(x) dup/dx + c(x)up = d(x).   (10.21)

Therefore, we have

a(x) d²v/dx² + b(x) dv/dx + c(x)v = 0.

But v(x) can be described by a linear combination of u1(x) and u2(x) as in the case of (10.6). Hence, the general solution of (10.2) should be expressed as

u(x) = c1u1(x) + c2u2(x) + up(x),   (10.22)

where c1 and c2 are arbitrary (complex) constants.
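
The structure of (10.22) can be reproduced with a computer algebra system. Below is a minimal sketch, assuming SymPy; the sample equation u″ + u = 1 is an arbitrary choice for illustration.

import sympy as sp

x = sp.symbols('x')
u = sp.Function('u')

# General solution of u'' + u = 1: homogeneous part + particular solution, cf. (10.22)
sol = sp.dsolve(sp.Eq(u(x).diff(x, 2) + u(x), 1), u(x))
print(sol)  # Eq(u(x), C1*sin(x) + C2*cos(x) + 1)

Here u1 = sin x and u2 = cos x form a fundamental set, while up(x) = 1 is a particular solution.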

10.2 First-Order Linear Differential Equations (FOLDEs)

In the discussion that follows, first-order linear differential equations (FOLDEs) supply us with useful information. A general form of FOLDEs is expressed as

a(x) du/dx + b(x)u = d(x)   [a(x) ≠ 0].   (10.23)

An associated boundary condition is given by a boundary functional B(u) such that

B(u) = αu(a) + βu(b) = σ,   (10.24)

where α, β, and σ are real constants; u(x) is defined in an interval [a, b]. If in (10.23) d(x) ≡ 0, (10.23) can readily be integrated to yield a solution. In the general case, let us multiply both sides of (10.23) by w(x). Then we have

w(x)a(x) du/dx + w(x)b(x)u = w(x)d(x).   (10.25)

We define p(x) as

p(x) ≡ w(x)a(x),   (10.26)

where w(x) is called a weight function. As mentioned in Sect. 2.3, the weight function is a real and non-negative function within the domain considered. Here we suppose that

dp(x)/dx = w(x)b(x).   (10.27)

Then, (10.25) can be rewritten as

d/dx [p(x)u] = w(x)d(x).   (10.28)

Thus, we can immediately integrate (10.28) to obtain the solution

u = (1/p(x)) [ ∫^x w(x′)d(x′)dx′ + C ],   (10.29)

where C is an arbitrary integration constant.
To seek w(x), from (10.26) and (10.27) we have

p′ = (wa)′ = wb = wa (b/a).   (10.30)

This can easily be integrated for wa to be expressed as

wa = C′ exp( ∫ (b/a) dx ) or w = (C′/a) exp( ∫ (b/a) dx ),   (10.31)

where C′ is an arbitrary integration constant. The quantity C′/a must be non-negative so that w can be non-negative.
Example 10.1 Let us think of the following FOLDE within an interval [a, b]; i.e., a ≤ x ≤ b:

du/dx + xu = x.   (10.32)

A boundary condition is set such that

u(a) = σ.   (10.33)

Notice that (10.33) is obtained by setting α = 1 and β = 0 in (10.24). Following the above argument, we obtain a solution described as

u = (1/p(x)) [ ∫_a^x w(x′)d(x′)dx′ + u(a)p(a) ].   (10.34)

Also, we have

p(x) = w(x) = exp( ∫_a^x x′ dx′ ) = exp[ (x² − a²)/2 ].   (10.35)

The integration on RHS of (10.34) can be performed as follows:

∫_a^x w(x′)d(x′)dx′ = ∫_a^x x′ exp[ (x′² − a²)/2 ] dx′
  = exp(−a²/2) ∫_{a²/2}^{x²/2} e^t dt
  = exp(−a²/2) [ exp(x²/2) − exp(a²/2) ]
  = exp(−a²/2) exp(x²/2) − 1 = p(x) − 1,   (10.36)

where with the second equality we used integration by the substitution x′²/2 ⟶ t. Considering (10.33) and noting that p(a) = 1, (10.34) is rewritten as

u = (1/p(x)) [ p(x) − 1 + σ ] = 1 + (σ − 1)/p(x)
  = 1 + (σ − 1) exp[ (a² − x²)/2 ],   (10.37)

where with the last equality we used (10.35).
Notice that when σ = 1, u(x) ≡ 1. This is because u(x) ≡ 1 is certainly a solution of (10.32) and satisfies the BC (10.33). Uniqueness of the solution imposes this strict condition upon (10.37).
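
The closed form (10.37) is easy to cross-check numerically. Below is a minimal sketch, assuming NumPy and SciPy; the values a = 0 and σ = 2 are arbitrary test choices, not taken from the text.

import numpy as np
from scipy.integrate import solve_ivp

a, sigma = 0.0, 2.0
xs = np.linspace(a, a + 2.0, 201)

# Direct integration of (10.32): du/dx = x - x u, with u(a) = sigma
num = solve_ivp(lambda t, u: t - t * u, (xs[0], xs[-1]), [sigma],
                t_eval=xs, rtol=1e-10, atol=1e-12).y[0]

# Closed form (10.37): u = 1 + (sigma - 1) exp[(a^2 - x^2)/2]
exact = 1.0 + (sigma - 1.0) * np.exp((a**2 - xs**2) / 2.0)

print(np.max(np.abs(num - exact)))  # small; agreement within the solver tolerance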
In the next two examples, we discuss more general aspects.
Example 10.2 Let us consider the following differential operator:

Lx = d/dx.   (10.38)

We think of the following identity obtained by integration by parts:

∫_a^b dx φ* (dψ/dx) + ∫_a^b dx (dφ/dx)* ψ = [φ*ψ]_a^b.   (10.39)

Rewriting this, we get

∫_a^b dx φ* (dψ/dx) − ∫_a^b dx (−dφ/dx)* ψ = [φ*ψ]_a^b.   (10.40)

Looking at (10.40), we notice that LHS comprises a difference between two integrals, while RHS, referred to as a boundary term (or surface term), does not contain an integral.
Recalling the expression (1.128) and defining

L̃x ≡ −d/dx

in (10.40), we have

⟨φ|Lxψ⟩ − ⟨L̃xφ|ψ⟩ = [φ*ψ]_a^b.   (10.41)

Here, RHS of (10.41) needs to vanish so that we can have

⟨φ|Lxψ⟩ = ⟨L̃xφ|ψ⟩.   (10.42)

Meanwhile, adopting the expression (1.112) with respect to an adjoint operator, we have the following expression:

⟨φ|Lxψ⟩ = ⟨Lx†φ|ψ⟩.   (10.43)

Comparing (10.42) and (10.43) and considering that φ and ψ are arbitrary functions, we have L̃x = Lx†. Thus, as an operator adjoint to Lx we get

Lx† = L̃x = −d/dx = −Lx.

Notice that only if the surface term vanishes can the adjoint operator Lx† be appropriately defined. We will encounter a similar expression again in Part III.
We add that if for an operator A we have a relation described by

A† = −A,   (10.44)

the operator A is said to be anti-Hermitian. We have already encountered such an operator in Sect. 1.5.
Let us then examine on what condition the surface term vanishes. The RHS of (10.40) and (10.41) is given by

φ*(b)ψ(b) − φ*(a)ψ(a).

For this term to vanish, we should have

φ*(b)ψ(b) = φ*(a)ψ(a) or φ*(b)/φ*(a) = ψ(a)/ψ(b).

If ψ(b) = 2ψ(a), then we should have φ(b) = (1/2)φ(a) for the surface term to vanish. Recalling (10.24), the above conditions are expressed as

B(ψ) = 2ψ(a) − ψ(b) = 0,   (10.45)

B′(φ) = (1/2)φ(a) − φ(b) = 0.   (10.46)

The boundary functional B′(φ) is said to be adjoint to B(ψ). The two boundary functionals are admittedly different. If, however, we set ψ(b) = ψ(a), then we should have φ(b) = φ(a) for the surface term to vanish. That is,

B(ψ) = ψ(a) − ψ(b) = 0,   (10.47)

B′(φ) = φ(a) − φ(b) = 0.   (10.48)

Thus, the two functionals are identical, and ψ and φ satisfy the same homogeneous BCs with respect to these functionals.
As discussed above, a FOLDE is characterized by its differential operator as well as by a BC (or boundary functional). The same is true of SOLDEs.
Example 10.3 Next, let us consider the following differential operator:

Lx = (1/i) d/dx.   (10.49)

As in the case of Example 10.2, we have

∫_a^b dx φ* [(1/i) dψ/dx] − ∫_a^b dx [(1/i) dφ/dx]* ψ = (1/i)[φ*ψ]_a^b.   (10.50)

Also rewriting (10.50) using the inner product notation, we get

⟨φ|Lxψ⟩ − ⟨Lxφ|ψ⟩ = (1/i)[φ*ψ]_a^b.   (10.51)

Apart from the factor 1/i, RHS of (10.51) is again given by

φ*(b)ψ(b) − φ*(a)ψ(a).

Repeating a discussion similar to that of Example 10.2, when the surface term vanishes we get

⟨φ|Lxψ⟩ = ⟨Lxφ|ψ⟩.   (10.52)

Comparing (10.43) and (10.52), we have

⟨Lxφ|ψ⟩ = ⟨Lx†φ|ψ⟩.   (10.53)

Again considering that φ and ψ are arbitrary functions, we get

Lx† = Lx.   (10.54)

As in (10.54), if a differential operator is identical to its adjoint operator, the operator is called self-adjoint. On the basis of (1.119), Lx would apparently be Hermitian. However, we have to be careful in asserting that Lx is Hermitian. For a differential operator to be Hermitian, (i) the operator must be self-adjoint, and (ii) the two boundary functionals adjoint to each other must be identical. In other words, ψ and φ must satisfy the same homogeneous BCs with respect to these functionals. In this example, we must have the same boundary functionals as those described by (10.47) and (10.48). If and only if conditions (i) and (ii) are satisfied is the operator said to be Hermitian. This may seem a somewhat formal statement. Nonetheless, satisfaction of these conditions is also required of second-order differential operators for them to be Hermitian. In fact, the SOLDEs we studied in Part I are essentially dealt with within the framework of the aforementioned formalism.

10.3 Second-Order Differential Operators

Second-order differential operators are the most common operators and are frequently treated in mathematical physics. The general differential operators are described as

Lx = a(x) d²/dx² + b(x) d/dx + c(x),   (10.55)

where a(x), b(x), and c(x) can in general be complex functions of a real variable x. Let us think of the following identities [1]:

v* (a d²u/dx²) − u d²(av*)/dx² = d/dx [ av* du/dx − u d(av*)/dx ],
v* (b du/dx) + u d(bv*)/dx = d/dx [ b u v* ],
v* cu − u cv* = 0.   (10.56)

Summing both sides of (10.56), we have the identity

v* [ a d²u/dx² + b du/dx + cu ] − u [ d²(av*)/dx² − d(bv*)/dx + cv* ]
  = d/dx [ av* du/dx − u d(av*)/dx ] + d/dx [ b u v* ].   (10.57)

Hence, following the expressions of Sect. 10.2, we define Lx† such that

(Lx†v)* ≡ d²(av*)/dx² − d(bv*)/dx + cv*.   (10.58)

Taking the complex conjugate of both sides, we get

Lx†v = d²(a*v)/dx² − d(b*v)/dx + c*v.   (10.59)

Considering the differentiation of product functions, we have as Lx†

Lx† = a* d²/dx² + ( 2 da*/dx − b* ) d/dx + ( d²a*/dx² − db*/dx + c* ).   (10.60)

Rewriting (10.57) by means of (10.55) and (10.59), we have

v*(Lxu) − (Lx†v)*u = d/dx [ av* du/dx − u d(av*)/dx + b u v* ].   (10.61)

Assuming that the relevant SOLDE is defined in [r, s] and integrating (10.61) over that interval, we get

∫_r^s dx [ v*(Lxu) − (Lx†v)*u ] = [ av* du/dx − u d(av*)/dx + b u v* ]_r^s.   (10.62)

Using the definition of an inner product described in (1.128) and rewriting (10.62), we have

⟨v|Lxu⟩ − ⟨Lx†v|u⟩ = [ av* du/dx − u d(av*)/dx + b u v* ]_r^s.

Here, if RHS of the above (i.e., the surface term of the above expression) vanishes, we get

⟨v|Lxu⟩ = ⟨Lx†v|u⟩.

We find that this notation is consistent with (1.112).

Bearing in mind this situation, let us seek a condition under which the differential operator Lx is Hermitian. Suppose here that a(x), b(x), and c(x) are all real and that

da(x)/dx = b(x).   (10.63)

Then, instead of (10.60), we have

Lx† = a(x) d²/dx² + b(x) d/dx + c(x) = Lx.

Thus, we have succeeded in constituting a self-adjoint operator Lx. In that case, (10.62) can be rewritten as

∫_r^s dx [ v*(Lxu) − (Lxv)*u ] = [ a ( v* du/dx − u dv*/dx ) ]_r^s.   (10.64)

Notice that b(x) has been eliminated from (10.64). If RHS of (10.64) vanishes, we get

⟨v|Lxu⟩ = ⟨Lxv|u⟩.

This notation is consistent with (1.119), and the Hermiticity of Lx becomes well defined.
If we do not have the condition da(x)/dx = b(x), how can we deal with the problem? The answer is that, following the procedures of Sect. 10.2, we can convert Lx to a self-adjoint operator by multiplying Lx by the weight function w(x) introduced in (10.26), (10.27), and (10.31). Replacing a(x), b(x), and c(x) with w(x)a(x), w(x)b(x), and w(x)c(x), respectively, in the identity (10.57), we rewrite (10.57) as

v* [ aw d²u/dx² + bw du/dx + cwu ] − u [ d²(awv*)/dx² − d(bwv*)/dx + cwv* ]
  = d/dx [ awv* du/dx − u d(awv*)/dx ] + d/dx [ bw u v* ].   (10.65)

Let us calculate the braces of the second term of LHS of (10.65). Using (10.26) and (10.27), we have

d²(awv*)/dx² − d(bwv*)/dx + cwv*
  = [ (aw)′v* + aw v*′ ]′ − [ (bw)′v* + bw v*′ ] + cwv*
  = [ bw v* + aw v*′ ]′ − [ (bw)′v* + bw v*′ ] + cwv*
  = (bw)′v* + bw v*′ + (aw)′v*′ + aw v*″ − (bw)′v* − bw v*′ + cwv*
  = (aw)′v*′ + aw v*″ + cwv* = bw v*′ + aw v*″ + cwv*
  = w ( a v*″ + b v*′ + c v* ) = w ( a*v″ + b*v′ + c*v )*
  = w ( av″ + bv′ + cv )*.   (10.66)

The second to last equality of (10.66) is based on the assumption that a(x), b(x), and c(x) are real functions. Meanwhile, for RHS of (10.65) we have

d/dx [ awv* du/dx − u d(awv*)/dx ] + d/dx [ bw u v* ]
  = d/dx [ awv* du/dx − u (aw)′v* − u aw dv*/dx ] + d/dx [ bw u v* ]
  = d/dx [ awv* du/dx − u bw v* − u aw dv*/dx ] + d/dx [ bw u v* ]
  = d/dx [ aw ( v* du/dx − u dv*/dx ) ] = d/dx [ p ( v* du/dx − u dv*/dx ) ].   (10.67)

With the last equality of (10.67), we used (10.26). Using (10.66) and (10.67), we rewrite (10.65) once again as

v* w ( a d²u/dx² + b du/dx + cu ) − u w ( a d²v/dx² + b dv/dx + cv )*
  = d/dx [ p ( v* du/dx − u dv*/dx ) ].   (10.68)

Then, integrating (10.68) from r to s, we finally get

∫_r^s dx w(x) { v*(Lxu) − (Lxv)*u } = [ p ( v* du/dx − u dv*/dx ) ]_r^s.   (10.69)

The relations (10.69) and (10.62) are called the generalized Green's identity. We emphasize that as far as the coefficients a(x), b(x), and c(x) in (10.55) are real functions, the associated differential operator Lx can be converted to a self-adjoint form following the procedures of (10.66) and (10.67).
In the above, LHS of the original homogeneous differential equation (10.5) is rewritten as

a(x)w(x) d²u/dx² + b(x)w(x) du/dx + c(x)w(x)u = d/dx [ p(x) du/dx ] + c w(x) u.

Rewriting this, we have

Lxu = (1/w(x)) d/dx [ p(x) du/dx ] + cu   [w(x) > 0],   (10.70)

where we have

p(x) = a(x)w(x) and dp(x)/dx = b(x)w(x).   (10.71)

The latter equation of (10.71) corresponds to (10.63) if we assume w(x) ≡ 1. When the differential operator Lx is defined as in (10.70), Lx is said to be self-adjoint with respect to the weight function w(x).
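
The prescription (10.31) together with the check (10.71) is straightforward to carry out symbolically. The following is a minimal sketch, assuming SymPy; the choice a = 1, b = −2x (the operator appearing in the Hermite differential equation) and C′ = 1 are sample assumptions for illustration.

import sympy as sp

x = sp.symbols('x')
a_coef = sp.Integer(1)
b_coef = -2 * x

# (10.31) with C' = 1: w = (1/a) exp( integral of b/a dx )
w = sp.exp(sp.integrate(b_coef / a_coef, x)) / a_coef   # exp(-x**2)
p = sp.simplify(w * a_coef)                             # (10.26): p = w a

# (10.71): dp/dx must equal b w for self-adjointness
print(sp.simplify(sp.diff(p, x) - b_coef * w))          # 0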
Now we examine boundary functionals. The homogeneous adjoint boundary functionals are described as follows:

B1†(v) = α1 v*(r) + β1 (dv*/dx)|_{x=r} + γ1 v*(s) + δ1 (dv*/dx)|_{x=s} = 0,   (10.72)

B2†(v) = α2 v*(r) + β2 (dv*/dx)|_{x=r} + γ2 v*(s) + δ2 (dv*/dx)|_{x=s} = 0.   (10.73)

In (10.3), putting α1 = 1 and β1 = γ1 = δ1 = 0, we have

B1(u) = u(r) = σ1.   (10.74)

Also, putting γ2 = 1 and α2 = β2 = δ2 = 0, we have

B2(u) = u(s) = σ2.   (10.75)

Further putting

σ1 = σ2 = 0,   (10.76)

we also get homogeneous BCs of

B1(u) = B2(u) = 0; i.e., u(r) = u(s) = 0.   (10.77)

For RHS of (10.69) to vanish, it suffices to define B1†(v) and B2†(v) such that

B1†(v) = v*(r) and B2†(v) = v*(s).   (10.78)

Then, the homogeneous adjoint BCs read as

v*(r) = v*(s) = 0, i.e., v(r) = v(s) = 0.   (10.79)

In this manner, we can readily construct homogeneous adjoint BCs the same as those of (10.77) so that Lx can be Hermitian.
We list several prescriptions of typical BCs below:

(i) u(r) = u(s) = 0 (Dirichlet conditions),
(ii) du/dx|_{x=r} = du/dx|_{x=s} = 0 (Neumann conditions),
(iii) u(r) = u(s) and du/dx|_{x=r} = du/dx|_{x=s} (periodic conditions).   (10.80)

Yet care should be taken when handling RHS of (10.69), i.e., the surface terms, because conditions (i) to (iii) are sufficient but not necessary for the surface terms to vanish; such conditions are not limited to these. Meanwhile, we often have to deal with nonvanishing surface terms. In that case, we have to start with (10.62) instead of (10.69).
In Sect. 10.2, we defined the Hermiticity of a differential operator in such a way that the operator is self-adjoint and that the homogeneous BCs and homogeneous adjoint BCs are the same. In light of the above argument, however, we may relax the conditions for a differential operator to be Hermitian. This is particularly the case when p(x) = a(x)w(x) in (10.69) vanishes at both endpoints. We will encounter such a situation in Sect. 10.7.

10.4 Green’s Functions

Having made the above preparations, let us proceed to the study of Green's functions for SOLDEs. Though we keep it to a minimum, we have to mention a bit of formalism.
Given Lx defined by (10.55), let us assume

Lxu(x) = d(x)   (10.81)

under homogeneous BCs, with the inhomogeneous term d(x) being an arbitrary function. We also assume that (10.81) is well defined in a domain [r, s]. The numbers r and s may be infinite. Suppose simultaneously that we have

Lx†v(x) = h(x)   (10.82)

under homogeneous adjoint BCs [1, 2], with the inhomogeneous term h(x) being an arbitrary function as well.
Let us describe the above relations as

L|u⟩ = |d⟩ and L†|v⟩ = |h⟩.   (10.83)

Suppose that there is an inverse operator L⁻¹ ≡ G such that

GL = LG = E,   (10.84)

where E is an identity operator. Operating with G on (10.83), we have

GL|u⟩ = E|u⟩ = |u⟩ = G|d⟩.   (10.85)

This implies that (10.81) has been solved and the solution is given by G|d⟩. Since the inverse operation of differentiation is integration, G is expected to be an integral operator.
We have

⟨x|LG|y⟩ = Lx⟨x|G|y⟩ = LxG(x, y).   (10.86)

Meanwhile, using (10.84) we get

⟨x|LG|y⟩ = ⟨x|E|y⟩ = ⟨x|y⟩.   (10.87)

Using a weight function w(x), we generalize the inner product of (1.128) such that

⟨g|f⟩ ≡ ∫_r^s w(x) g*(x) f(x) dx.   (10.88)

As we expand an arbitrary vector using basis vectors, we "expand" an arbitrary function |f⟩ using basis vectors |x⟩. Here, we are treating the real numbers as if they formed continuous, innumerable basis vectors on the real number line (see Fig. 10.1).

[Fig. 10.1 Function |f⟩ and its coordinate representation f(x)]

Thus, we can expand |f⟩ in terms of |x⟩ such that

|f⟩ = ∫_r^s dx w(x) f(x) |x⟩.   (10.89)

In (10.89) we considered f(x) as if it were an expansion coefficient. The following notation is reasonable accordingly:

f(x) ≡ ⟨x|f⟩.   (10.90)

In (10.90), f(x) can be viewed as the coordinate representation of |f⟩. Thus, from (10.89) we get

⟨x′|f⟩ = f(x′) = ∫_r^s dx w(x) f(x) ⟨x′|x⟩.   (10.91)

Alternatively, we have

f(x′) = ∫_r^s dx f(x) δ(x − x′).   (10.92)

This comes from a property of the δ function [1] described as

∫_r^s dx f(x) δ(x) = f(0).   (10.93)

Comparing (10.91) and (10.92), we have

w(x)⟨x′|x⟩ = δ(x − x′) or ⟨x′|x⟩ = δ(x − x′)/w(x) = δ(x′ − x)/w(x).   (10.94)

Thus, comparing (10.86) and (10.87) and using (10.94), we get

LxG(x, y) = δ(x − y)/w(x).   (10.95)

In a similar manner, we also have

Lx†g(x, y) = δ(x − y)/w(x).   (10.96)

To arrive at (10.96), we start the discussion by assuming an operator (Lx†)⁻¹ such that (Lx†)⁻¹ ≡ g with gL† = L†g = E.
The function G(x, y) is called a Green's function and g(x, y) is said to be an adjoint Green's function. The handling of Green's functions and adjoint Green's functions is based upon (10.95) and (10.96), respectively. As (10.81) is defined in a domain r ≤ x ≤ s, (10.95) and (10.96) are defined in a domain r ≤ x ≤ s and r ≤ y ≤ s. Notice that except for the point x = y we have

LxG(x, y) = 0 and Lx†g(x, y) = 0.   (10.97)

That is, G(x, y) and g(x, y) satisfy the homogeneous equation with respect to the variable x. Accordingly, we require G(x, y) and g(x, y) to satisfy the same homogeneous BCs with respect to the variable x as those imposed upon u(x) and v(x) of (10.81) and (10.82), respectively [1].
The relation (10.88) can be obtained as follows. Operating with ⟨g| on (10.89), we have

⟨g|f⟩ = ∫_r^s dx w(x) f(x) ⟨g|x⟩ = ∫_r^s w(x) g*(x) f(x) dx,   (10.98)

where for the last equality we used

⟨g|x⟩ = ⟨x|g⟩* = g(x)*.   (10.99)

For this, see (1.113), where A is replaced with the identity operator E, with regard to the complex conjugate of an inner product of two vectors. Also see (13.2) of Sect. 13.1.
If in (10.69) the surface term (i.e., RHS) vanishes under appropriate conditions, e.g., (10.80), we have

∫_r^s dx w(x) { v*(Lxu) − (Lx†v)*u } = 0,   (10.100)

which is called Green's identity. Since (10.100) is derived from the identities (10.56), (10.100) is an identity as well (as the terminology "Green's identity" indicates). Therefore, (10.100) must hold for any functions u and v so far as they satisfy homogeneous BCs. Thus, replacing v in (10.100) with g(x, y) and using (10.96) together with (10.81), we have

∫_r^s dx w(x) { g*(x, y)[Lxu(x)] − [Lx†g(x, y)]*u(x) }
  = ∫_r^s dx w(x) { g*(x, y)d(x) − [δ(x − y)/w(x)] u(x) }
  = ∫_r^s dx w(x) g*(x, y) d(x) − u(y) = 0,   (10.101)

where with the second to last equality we used a property of the δ function. Also notice that δ(x − y)/w(x) is a real function. Rewriting (10.101), we get

u(y) = ∫_r^s dx w(x) g*(x, y) d(x).   (10.102)

Similarly, replacing u in (10.100) with G(x, y) and using (10.95) together with (10.82), we have

v(y) = ∫_r^s dx w(x) G*(x, y) h(x).   (10.103)

Replacing u and v in (10.100) with G(x, q) and g(x, t), respectively, we have

∫_r^s dx w(x) { g*(x, t)[LxG(x, q)] − [Lx†g(x, t)]*G(x, q) } = 0.   (10.104)

Notice that we have chosen q and t for the second argument y in (10.95) and (10.96), respectively. Inserting (10.95) and (10.96) into the above equation after changing arguments, we have

∫_r^s dx w(x) { g*(x, t) δ(x − q)/w(x) − [δ(x − t)/w(x)] G(x, q) } = 0.   (10.105)

Thus, we get

g*(q, t) = G(t, q) or g(q, t) = G*(t, q).   (10.106)

This implies that G(t, q) must satisfy the adjoint BCs with respect to the second argument q. Inserting (10.106) into (10.102), we get

u(y) = ∫_r^s dx w(x) G(y, x) d(x).   (10.107)

Or, exchanging the arguments x and y, we have

u(x) = ∫_r^s dy w(y) G(x, y) d(y).   (10.108)

Similarly, inserting (10.106) into (10.103), we get

v(y) = ∫_r^s dx w(x) g(y, x) h(x).   (10.109)

Or, we have

v(x) = ∫_r^s dy w(y) g(x, y) h(y).   (10.110)

Equations (10.107) to (10.110) clearly show that the homogeneous equations [given by putting d(x) ≡ h(x) ≡ 0] have only the trivial solutions u(x) ≡ 0 and v(x) ≡ 0 under homogeneous BCs. Note that this is always the case when we are able to construct a Green's function. This in turn implies that we can construct a Green's function if the differential operator is accompanied by initial conditions. Conversely, if the homogeneous equation has a nontrivial solution under homogeneous BCs, Eqs. (10.107) to (10.110) will not work.
If the differential operator L in (10.81) is Hermitian, according to the associated remarks of Sect. 10.2 we must have Lx = Lx†, and u(x) and v(x) of (10.81) and (10.82) must satisfy the same homogeneous BCs. Consequently, in the case of an Hermitian operator we should have

G(x, y) = g(x, y).   (10.111)

From (10.106) and (10.111), if the operator is Hermitian we get

G(x, y) = G*(y, x).   (10.112)

In Sect. 10.3 we assumed that the coefficients a(x), b(x), and c(x) are real to assure that Lx is Hermitian [1]. On this condition, G(x, y) is real as well (vide infra). Then we have

G(x, y) = G(y, x).   (10.113)

That is, G(x, y) is real symmetric with respect to the arguments x and y.
To apply Green's functions in practice, we have to estimate the behavior of the Green's function near x = y, because in light of (10.95) and (10.96) there is a "jump" at x = y. When we deal with a case where a self-adjoint operator is relevant, using the function p(x) of (10.69) we have

[a(x)/p(x)] ∂/∂x [ p ∂G/∂x ] + c(x)G = δ(x − y)/w(x).   (10.114)

Multiplying both sides by p(x)/a(x), we have

∂/∂x [ p ∂G/∂x ] = [p(x)/a(x)] δ(x − y)/w(x) − [p(x)c(x)/a(x)] G(x, y).   (10.115)

Using a property of the δ function expressed by

f(x)δ(x) = f(0)δ(x) or f(x)δ(x − y) = f(y)δ(x − y),   (10.116)

we have

∂/∂x [ p ∂G/∂x ] = [p(y)/a(y)] δ(x − y)/w(y) − [p(x)c(x)/a(x)] G(x, y).   (10.117)

Integrating (10.117) with respect to x, we get

p ∂G(x, y)/∂x = [p(y)/(a(y)w(y))] θ(x − y) − ∫_r^x dt [p(t)c(t)/a(t)] G(t, y) + C,   (10.118)

where C is a constant. The function θ(x − y) is defined by

θ(x) = 1 (x > 0), θ(x) = 0 (x < 0).   (10.119)

Note that we have

dθ(x)/dx = δ(x).   (10.120)

On RHS of (10.118), the first term has a discontinuity at x = y because of θ(x − y), whereas the second term is continuous with respect to x. Thus, we have

lim_{ε→+0} [ p(y + ε) ∂G(x, y)/∂x |_{x=y+ε} − p(y − ε) ∂G(x, y)/∂x |_{x=y−ε} ]
  = lim_{ε→+0} [p(y)/(a(y)w(y))] [θ(+ε) − θ(−ε)] = p(y)/(a(y)w(y)).   (10.121)

Since p(y) is continuous with respect to the argument y, this factor drops out and we get

lim_{ε→+0} [ ∂G(x, y)/∂x |_{x=y+ε} − ∂G(x, y)/∂x |_{x=y−ε} ] = 1/(a(y)w(y)).   (10.122)

Thus, ∂G(x, y)/∂x is accompanied by a discontinuity at x = y of magnitude 1/(a(y)w(y)). Since this jump is finite, integrating once more with respect to x we find that G(x, y) itself is continuous at x = y. These properties of G(x, y) are useful for calculating Green's functions in practice. We will encounter several examples in the following sections.
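
Both properties can be verified symbolically for any explicitly known Green's function. Below is a minimal sketch, assuming SymPy; the operator Lx = d²/dx² on [0, 1] with Dirichlet BCs (so that a = w = 1), together with its well-known Green's function written in the code, is a standard illustration and is not an example taken from the text above.

import sympy as sp

x, y = sp.symbols('x y')

# Green's function of u'' = f on [0, 1] with u(0) = u(1) = 0 (assumed example):
F1 = x * (y - 1)   # branch for x < y
F2 = y * (x - 1)   # branch for x > y

# Continuity of G at x = y:
print(sp.simplify((F1 - F2).subs(x, y)))                          # 0
# Jump of dG/dx at x = y; (10.122) requires 1/(a(y) w(y)) = 1 here:
print(sp.simplify((sp.diff(F2, x) - sp.diff(F1, x)).subs(x, y)))  # 1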

Suppose that there are two Green's functions that satisfy the same homogeneous BCs. Let G(x, y) and G̃(x, y) be such functions. Then, we must have

LxG(x, y) = δ(x − y)/w(x) and LxG̃(x, y) = δ(x − y)/w(x).   (10.123)

Subtracting both sides of (10.123), we have

Lx [ G̃(x, y) − G(x, y) ] = 0.   (10.124)

In virtue of the linearity of the BCs, G(x, y) − G̃(x, y) must satisfy the same homogeneous BCs as well. But (10.124) is a homogeneous equation, and so we must have a trivial solution from the aforementioned constructability of the Green's function, such that

G(x, y) − G̃(x, y) ≡ 0 or G(x, y) ≡ G̃(x, y).   (10.125)

This obviously indicates that a Green's function must be unique.
We have assumed in Sect. 10.3 that the coefficients a(x), b(x), and c(x) are real. Therefore, taking the complex conjugate of (10.95) we have

LxG*(x, y) = δ(x − y)/w(x).   (10.126)

Notice here that both δ(x − y) and w(x) are real functions. Subtracting (10.95) from (10.126), we have

Lx [ G*(x, y) − G(x, y) ] = 0.

Again, from the uniqueness of the Green's function, we get G*(x, y) = G(x, y); i.e., G(x, y) is real, accordingly. This is independent of the specific structure of Lx. In other words, so far as we are dealing with real coefficients a(x), b(x), and c(x), G(x, y) is real whether or not Lx is self-adjoint.

10.5 Construction of Green’s Functions

So far we have dealt with homogeneous boundary conditions (BCs) with respect to the differential equation

a(x) d²u/dx² + b(x) du/dx + c(x)u = d(x),   (10.2)

where the coefficients a(x), b(x), and c(x) are real. In this case, if d(x) ≡ 0 in (10.108), namely the SOLDE is a homogeneous equation, we have the solution u(x) ≡ 0 on the basis of (10.108). If, on the other hand, we have inhomogeneous boundary conditions (BCs), additional terms appear on RHS of (10.108) for both homogeneous and inhomogeneous equations. In this section, we examine how we can deal with this problem.
Following the remarks made in Sect. 10.3, we start with (10.62) or (10.69). If we deal with a self-adjoint or Hermitian operator, we can apply (10.69) to the problem. In the more general case where the operator is not self-adjoint, (10.62) is useful; we will have a good opportunity to use it in Sect. 10.6.
In Sect. 10.3, we mentioned that we may relax the definition of the Hermiticity of a differential operator in the case where the surface term vanishes. Meanwhile, we should bear in mind that Green's functions and adjoint Green's functions are constructed using homogeneous BCs, regardless of whether we are concerned with a homogeneous or an inhomogeneous equation. Thus, even if the surface terms do not vanish, we may regard the differential operator as Hermitian. This is because we deal with essentially the same Green's function in solving a problem for both the homogeneous and inhomogeneous equations (vide infra). Notice also that whether or not RHS vanishes, we are to use the same Green's function [1]. In this sense, we do not have to be too strict with the definition of Hermiticity.
Now, suppose that for a (self-adjoint) differential operator Lx we are given

∫_r^s dx w(x) { v*(Lxu) − (Lxv)*u } = [ p(x) ( v* du/dx − u dv*/dx ) ]_r^s,   (10.69)

where w(x) > 0 in the domain [r, s] and p(x) is a real function. Note that since (10.69) is an identity, we may choose G(x, y) for v with an appropriate choice of w(x). Then from (10.95) and (10.112), we have

∫_r^s dx w(x) { G(y, x)d(x) − [δ(x − y)/w(x)] u(x) }
  = [ p(x) ( G(y, x) du(x)/dx − u(x) ∂G(y, x)/∂x ) ]_{x=r}^{x=s}.   (10.127)

Note that in (10.127) we used the fact that both δ(x − y) and w(x) are real. Using a property of the δ function, we get

u(y) = ∫_r^s dx w(x) G(y, x) d(x) − [ p(x) ( G(y, x) du(x)/dx − u(x) ∂G(y, x)/∂x ) ]_{x=r}^{x=s}.   (10.128)

When the differential operator Lx can be made Hermitian under the appropriate condition (10.71), we have as a real Green's function

G(x, y) = G(y, x).   (10.113)

The function G(x, y) satisfies homogeneous BCs. Hence, if we assume, e.g., the Dirichlet BCs [see (10.80)], we have

G(r, y) = G(s, y) = 0.   (10.129)

Using the symmetry of G(x, y) with respect to the arguments x and y, from (10.129) we get

G(y, r) = G(y, s) = 0.   (10.130)

Thus, the first term of the surface terms of (10.128) is eliminated to yield

u(y) = ∫_r^s dx w(x) G(y, x) d(x) + [ p(x) u(x) ∂G(y, x)/∂x ]_{x=r}^{x=s}.

Exchanging the arguments x and y, we get

u(x) = ∫_r^s dy w(y) G(x, y) d(y) + [ p(y) u(y) ∂G(x, y)/∂y ]_{y=r}^{y=s}.   (10.131)

Then, (i) substituting the values u(s) and u(r) associated with the inhomogeneous BCs described as

B1(u) = σ1 and B2(u) = σ2,   (10.132)

and (ii) calculating ∂G(x, y)/∂y |_{y=r} and ∂G(x, y)/∂y |_{y=s}, we will be able to obtain a unique solution. Notice that even though we may have formally the same differential operator, we get different Green's functions depending upon the different BCs. We will see tangible examples later.
On the basis of the general discussion of Sect. 10.4 and this section, we are in a position to construct the Green's functions. Except for the points x = y, the Green's function G(x, y) must satisfy the following differential equation:

LxG(x, y) = 0,   (10.133)

where Lx is given by

Lx = a(x) d²/dx² + b(x) d/dx + c(x).   (10.55)

The differential equation Lxu = d(x) is defined within an interval [r, s], where r may be −∞ and s may be +∞.
From now on, we regard a(x), b(x), and c(x) as real functions. From (10.133), we expect the Green's function to be described as a linear combination of a fundamental set of solutions u1(x) and u2(x). Here, the fundamental set of solutions is given by two linearly independent solutions of the homogeneous equation Lxu = 0. Then we should be able to express G(x, y) as a combination of F1(x, y) and F2(x, y) described as

F1(x, y) = c1u1(x) + c2u2(x) for r ≤ x < y,
F2(x, y) = d1u1(x) + d2u2(x) for y < x ≤ s,   (10.134)

where c1, c2, d1, and d2 are arbitrary (complex) constants to be determined later. These constants are given as functions of y. The combination has to be made such that

G(x, y) = F1(x, y) for r ≤ x < y,
G(x, y) = F2(x, y) for y < x ≤ s.   (10.135)

Thus, using the θ(x) function defined in (10.119), we describe G(x, y) as

G(x, y) = F1(x, y)θ(y − x) + F2(x, y)θ(x − y).   (10.136)

Notice that F1(x, y) and F2(x, y) are "ordinary" functions, whereas G(x, y) is not, because G(x, y) contains the θ(x) function.
If we have

F2(x, y) = F1(y, x),   (10.137)

then

G(x, y) = F1(x, y)θ(y − x) + F1(y, x)θ(x − y).   (10.138)

Hence, we get

G(x, y) = G(y, x).   (10.139)

From (10.113), Lx is then Hermitian. Suppose, for instance, that F1(x, y) = (x − r)(y − s) and F2(x, y) = (x − s)(y − r). Then, (10.137) is satisfied and, hence, if we can construct the Green's function from these F1(x, y) and F2(x, y), Lx should be Hermitian. However, if we had, e.g., F1(x, y) = x − r and F2(x, y) = y − s, then G(x, y) ≠ G(y, x), and so Lx would not be Hermitian.
The Green's functions must satisfy the homogeneous BCs. That is,

B1(G) = B2(G) = 0.   (10.140)

Also, we require the continuity condition of G(x, y) at x = y and the discontinuity condition of ∂G(x, y)/∂x at x = y described by (10.122). Thus, we have four conditions to be satisfied by G(x, y): the two BCs together with the continuity and discontinuity conditions. These four conditions determine the four constants c1, c2, d1, and d2.
Now, let us inspect further details of the Green's functions through an example.
Example 10.4 Let us consider the following differential equation:

d²u/dx² + u = 1.   (10.141)

We assume that the domain of the argument x is [0, L]. We set boundary conditions such that

u(0) = σ1 and u(L) = σ2.   (10.142)

Thus, if at least one of σ1 and σ2 is not zero, we are dealing with an inhomogeneous differential equation under inhomogeneous BCs.
Next, let us seek the conditions that the Green's function satisfies. We also seek a fundamental set of solutions of the homogeneous equation described by

d²u/dx² + u = 0.   (10.143)

This is obtained by putting a = c = 1 and b = 0 in the general form (10.5), with the weight function being unity. The differential equation (10.143) is therefore self-adjoint according to the argument of Sect. 10.3. A fundamental set of solutions is given by

e^{ix} and e^{−ix}.

Then, we have

F1(x, y) = c1e^{ix} + c2e^{−ix} for 0 ≤ x < y,
F2(x, y) = d1e^{ix} + d2e^{−ix} for y < x ≤ L.   (10.144)

The functions F1(x, y) and F2(x, y) must satisfy the following BCs:

F1(0, y) = c1 + c2 = 0 and F2(L, y) = d1e^{iL} + d2e^{−iL} = 0.   (10.145)

Thus, we have

F1(x, y) = c1 ( e^{ix} − e^{−ix} ), F2(x, y) = d1 ( e^{ix} − e^{2iL}e^{−ix} ).   (10.146)

Therefore, at x = y the continuity condition gives

c1 ( e^{iy} − e^{−iy} ) = d1 ( e^{iy} − e^{2iL}e^{−iy} ).   (10.147)

The discontinuity condition of (10.122) is equivalent to

∂F2(x, y)/∂x |_{x=y} − ∂F1(x, y)/∂x |_{x=y} = 1.   (10.148)

This is because both F1(x, y) and F2(x, y) are ordinary functions supposed to be differentiable at any x. The relation (10.148) then reads as

i d1 ( e^{iy} + e^{2iL}e^{−iy} ) − i c1 ( e^{iy} + e^{−iy} ) = 1.   (10.149)

From (10.147) and (10.149), using Cramer's rule we have

c1 = (1/D) | 0, −(e^{iy} − e^{2iL}e^{−iy}); 1, i(e^{iy} + e^{2iL}e^{−iy}) |
   = i ( e^{iy} − e^{2iL}e^{−iy} ) / [ 2 ( 1 − e^{2iL} ) ],   (10.150)

d1 = (1/D) | e^{iy} − e^{−iy}, 0; −i(e^{iy} + e^{−iy}), 1 |
   = i ( e^{iy} − e^{−iy} ) / [ 2 ( 1 − e^{2iL} ) ],   (10.151)

where the rows of each 2 × 2 determinant are separated by semicolons and

D = | e^{iy} − e^{−iy}, −(e^{iy} − e^{2iL}e^{−iy}); −i(e^{iy} + e^{−iy}), i(e^{iy} + e^{2iL}e^{−iy}) | = −2i ( 1 − e^{2iL} )

is the determinant of the coefficient matrix of (10.147) and (10.149). Substituting these parameters into (10.146), we get

F1(x, y) = sin x ( e^{2iL}e^{−iy} − e^{iy} ) / ( 1 − e^{2iL} ), F2(x, y) = sin y ( e^{2iL}e^{−ix} − e^{ix} ) / ( 1 − e^{2iL} ).   (10.152)

Making the denominator real, we have

F1(x, y) = sin x [ cos(y − 2L) − cos y ] / (2 sin²L),
F2(x, y) = sin y [ cos(x − 2L) − cos x ] / (2 sin²L).   (10.153)

Using the θ(x) function, we get

G(x, y) = { sin x [ cos(y − 2L) − cos y ] / (2 sin²L) } θ(y − x)
  + { sin y [ cos(x − 2L) − cos x ] / (2 sin²L) } θ(x − y).   (10.154)

Thus, G(x, y) = G(y, x) as expected. Notice, however, that if L = nπ (n = 1, 2, ⋯), the Green's function cannot be defined as an ordinary function even for x ≠ y. We return to this point later.
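
The algebra of (10.150)–(10.153) can be delegated to a computer algebra system. Below is a minimal verification sketch, assuming SymPy; the numerical spot-check values x = 0.3, y = 0.7, L = 2.0 are arbitrary.

import sympy as sp

x, y, L = sp.symbols('x y L', real=True)
c1, d1 = sp.symbols('c1 d1')

# Continuity (10.147) and derivative jump (10.149) at x = y
eq1 = sp.Eq(c1*(sp.exp(sp.I*y) - sp.exp(-sp.I*y)),
            d1*(sp.exp(sp.I*y) - sp.exp(2*sp.I*L)*sp.exp(-sp.I*y)))
eq2 = sp.Eq(sp.I*d1*(sp.exp(sp.I*y) + sp.exp(2*sp.I*L)*sp.exp(-sp.I*y))
            - sp.I*c1*(sp.exp(sp.I*y) + sp.exp(-sp.I*y)), 1)
sol = sp.solve([eq1, eq2], [c1, d1])

# Compare F1 = c1 (e^{ix} - e^{-ix}) with the closed form (10.153)
F1 = sol[c1]*(sp.exp(sp.I*x) - sp.exp(-sp.I*x))
closed = sp.sin(x)*(sp.cos(y - 2*L) - sp.cos(y))/(2*sp.sin(L)**2)
diff = (F1 - closed).subs({x: 0.3, y: 0.7, L: 2.0})
print(abs(complex(diff)))  # ~1e-16: the two expressions agree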
The solution of (10.141) under the homogeneous BCs is then described as

u(x) = ∫_0^L dy G(x, y)
  = { [ cos(x − 2L) − cos x ] / (2 sin²L) } ∫_0^x sin y dy + { sin x / (2 sin²L) } ∫_x^L [ cos(y − 2L) − cos y ] dy.   (10.155)

This can readily be integrated to yield the solution of the inhomogeneous equation such that

u(x) = [ cos(x − 2L) − cos x − 2 sin L sin x − cos 2L + 1 ] / (2 sin²L)
  = [ cos(x − 2L) − cos x − 2 sin L sin x + 2 sin²L ] / (2 sin²L).   (10.156)

Next, let us consider the surface term. This is given by the second term of (10.131). We get

∂F1(x, y)/∂y |_{y=L} = sin x / sin L, ∂F2(x, y)/∂y |_{y=0} = [ cos(x − 2L) − cos x ] / (2 sin²L).   (10.157)

Therefore, with the inhomogeneous BCs we have the following solution of the inhomogeneous equation:

u(x) = [ cos(x − 2L) − cos x − 2 sin L sin x + 2 sin²L ] / (2 sin²L)
  + [ 2σ2 sin L sin x + σ1 ( cos x − cos(x − 2L) ) ] / (2 sin²L),   (10.158)

where the second term is the surface term. If σ1 = σ2 = 1, we have

u(x) ≡ 1.

Looking at (10.141), we find that u(x) ≡ 1 is certainly a solution of (10.141) with the inhomogeneous BCs σ1 = σ2 = 1. The uniqueness of the solution then ensures that u(x) ≡ 1 is the sole solution under the said BCs.
From (10.154), we find that G(x, y) has a singularity at L = nπ (n: integers). This is associated with the fact that the homogeneous equation (10.143) has a nontrivial solution, e.g., u(x) = sin x, under the homogeneous BCs u(0) = u(L) = 0. The present situation is essentially the same as that of Example 1.1 of Sect. 1.3. In other words, when λ = 1 in (1.61), the form of the differential equation is identical to (10.143), with virtually the same Dirichlet conditions. The point is that (10.143) can be viewed as a homogeneous equation and, at the same time, as an eigenvalue equation. In such a case, the Green's function approach fails.
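
As a cross-check, one can confirm symbolically that (10.158) satisfies both (10.141) and the inhomogeneous BCs (10.142). A minimal sketch, assuming SymPy:

import sympy as sp

x, L, s1, s2 = sp.symbols('x L sigma1 sigma2', real=True)

# The solution (10.158), surface term included
u = (sp.cos(x - 2*L) - sp.cos(x) - 2*sp.sin(L)*sp.sin(x) + 2*sp.sin(L)**2
     + 2*s2*sp.sin(L)*sp.sin(x) + s1*(sp.cos(x) - sp.cos(x - 2*L))) / (2*sp.sin(L)**2)

print(sp.simplify(sp.diff(u, x, 2) + u - 1))  # 0: the ODE (10.141) holds
print(sp.simplify(u.subs(x, 0) - s1))         # 0: u(0) = sigma1
print(sp.simplify(u.subs(x, L) - s2))         # 0: u(L) = sigma2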

10.6 Initial Value Problems (IVPs)

10.6.1 General Remarks

IVPs frequently appear in mathematical physics. The relevant conditions are dealt with as BCs in the theory of differential equations. With the boundary functionals B1(u) and B2(u) of (10.3) and (10.4), setting α1 = β2 = 1 and the other coefficients to zero, we get

B1(u) = u(p) = σ1 and B2(u) = du/dx |_{x=p} = σ2.   (10.159)

In the above, note that we choose [r, s] for the domain of x. The points r and s can be infinite as before. Any point p within the domain [r, s] may be designated as the special point on which the BCs (10.159) are imposed. Conditions of this kind are particularly prominent among BCs because they are set at a single point of the argument; such conditions are usually called initial conditions. In this section, we investigate the fundamental characteristics of IVPs.
Suppose that we have the homogeneous BCs

u(p) = du/dx |_{x=p} = 0.   (10.160)

Given the differential operator Lx defined in (10.55), i.e.,

Lx = a(x) d²/dx² + b(x) d/dx + c(x),   (10.55)

let a fundamental set of solutions be u1(x) and u2(x) for

Lxu(x) = 0.   (10.161)

A general solution u(x) of (10.161) is given by a linear combination of u1(x) and u2(x) such that

u(x) = c1u1(x) + c2u2(x),   (10.162)

where c1 and c2 are arbitrary (complex) constants. Suppose that we have the homogeneous BCs expressed by (10.160). Then, we have

u(p) = c1u1(p) + c2u2(p) = 0,
u′(p) = c1u1′(p) + c2u2′(p) = 0.

Rewriting this in matrix form, we have

( u1(p)    u2(p)   ) ( c1 )
( u1′(p)   u2′(p)  ) ( c2 ) = 0.

Since the matrix comprises the Wronskian of the fundamental set of solutions u1(x) and u2(x), its determinant never vanishes at any point p. That is, we have

| u1(p)    u2(p)   |
| u1′(p)   u2′(p)  | ≠ 0.   (10.163)

Then, we necessarily have c1 = c2 = 0. From (10.162), we have the trivial solution

u(x) ≡ 0

under the initial conditions as homogeneous BCs. Thus, as already discussed, a Green's function can always be constructed for IVPs.
To seek the Green's functions for IVPs, we return to the generalized Green's identity described as
∫_r^s dx [ v*(Lxu) − (Lx†v)*u ] = [ av* du/dx − u d(av*)/dx + b u v* ]_r^s.   (10.62)

For the surface term (RHS) to vanish, for homogeneous BCs we have, e.g.,

u(s) = du/dx |_{x=s} = 0 and v(r) = dv/dx |_{x=r} = 0,

for the two sets of BCs adjoint to each other. Obviously, these are not identical, simply because the former is determined at s and the latter at the different point r. For this reason, the operator Lx is not Hermitian, even though it is formally self-adjoint. In such a case, we would rather use Lx directly than construct a self-adjoint operator, because we cannot make the operator Hermitian either way.
Hence, unlike the preceding sections, we do not need a weight function w(x); or, we may regard w(x) ≡ 1. Then, we reconsider the conditions which the Green's functions should satisfy. On the basis of the general considerations of Sect. 10.4, especially (10.86), (10.87), and (10.94), we have [2]

LxG(x, y) = ⟨x|y⟩ = δ(x − y).   (10.164)

Therefore, we have

∂²G(x, y)/∂x² + [b(x)/a(x)] ∂G(x, y)/∂x + [c(x)/a(x)] G(x, y) = δ(x − y)/a(x).

Integrating the above equation with respect to x (with integration by parts for the second term), we get

∂G(x, y)/∂x + [b(x)/a(x)] G(x, y) − ∫_{x0}^{x} [b(ξ)/a(ξ)]′ G(ξ, y) dξ + ∫_{x0}^{x} [c(ξ)/a(ξ)] G(ξ, y) dξ = θ(x − y)/a(y) + C,

where C is a constant collecting the terms evaluated at the lower limit x0. Noting that the functions other than ∂G(x, y)/∂x and θ(x − y)/a(y) are continuous, as before we have

lim_{ε→+0} [ ∂G(x, y)/∂x |_{x=y+ε} − ∂G(x, y)/∂x |_{x=y−ε} ] = 1/a(y).   (10.165)

10.6.2 Green’s Functions for IVPs

From a practical point of view, we may set r = 0 in (10.62). Then, with (10.2) we can choose the domain [0, s] (for s > 0) or [s, 0] (for s < 0). For simplicity, we use x instead of s. We consider the two cases x > 0 and x < 0.
(i) Case I (x > 0): Let u1(x) and u2(x) be a fundamental set of solutions. We define F1(x, y) and F2(x, y) as before such that

F1(x, y) = c1u1(x) + c2u2(x) for 0 ≤ x < y,
F2(x, y) = d1u1(x) + d2u2(x) for 0 < y < x.   (10.166)

As before, we set

G(x, y) = F1(x, y) for 0 ≤ x < y,
G(x, y) = F2(x, y) for 0 < y < x.

The homogeneous BCs are defined as

u(0) = 0 and u′(0) = 0.

Correspondingly, we have

F1(0, y) = 0 and F1′(0, y) = 0.

This is translated into

c1u1(0) + c2u2(0) = 0 and c1u1′(0) + c2u2′(0) = 0.

In matrix form, we get

( u1(0)    u2(0)   ) ( c1 )
( u1′(0)   u2′(0)  ) ( c2 ) = 0.

As mentioned above, since u1(x) and u2(x) are a fundamental set of solutions, we have c1 = c2 = 0. Hence, we get

F1(x, y) = 0.

From the continuity and discontinuity conditions (10.165) imposed upon the Green's functions, we have

d1u1(y) + d2u2(y) = 0 and d1u1′(y) + d2u2′(y) = 1/a(y).   (10.167)

As before, we get

d1 = −u2(y) / [ a(y)W(u1(y), u2(y)) ] and d2 = u1(y) / [ a(y)W(u1(y), u2(y)) ],

where W(u1(y), u2(y)) is the Wronskian of u1(y) and u2(y). Thus, we get

F2(x, y) = [ u2(x)u1(y) − u1(x)u2(y) ] / [ a(y)W(u1(y), u2(y)) ].   (10.168)

(ii) Case II (x < 0): Next, we think of the case described by

F1(x, y) = c1u1(x) + c2u2(x) for y < x ≤ 0,
F2(x, y) = d1u1(x) + d2u2(x) for x < y < 0.   (10.169)

Proceeding similarly as above, we have c1 = c2 = 0. Also, we get

F2(x, y) = [ u1(x)u2(y) − u2(x)u1(y) ] / [ a(y)W(u1(y), u2(y)) ].   (10.170)

Here, notice that the sign in (10.170) is reversed relative to (10.168). This is because for the discontinuity condition, instead of (10.167) we must have

d1u1′(y) + d2u2′(y) = −1/a(y).

This results from the fact that the magnitude relationship between the arguments x and y has been reversed in (10.169) relative to (10.166).
Summarizing the above argument, (10.168) is obtained in the domain 0 ≤ y < x and (10.170) in the domain x < y < 0. Noting this characteristic, we define a function such that

Θ(x, y) ≡ θ(x − y)θ(y) − θ(y − x)θ(−y).   (10.171)

Notice that

Θ(−x, −y) = −Θ(x, y).

That is, Θ(x, y) is antisymmetric with respect to the origin.
Figure 10.2 shows the feature of Θ(x, y). If the "initial point" is taken at x = a, we can use Θ(x − a, y − a) instead; see Fig. 10.3. The function is described as

[Fig. 10.2 Graph of the function Θ(x, y). Θ(x, y) = 1 or −1 in the hatched areas; otherwise Θ(x, y) = 0]

[Fig. 10.3 Graph of the function Θ(x − a, y − a). We assume a > 0]

Θ(x − a, y − a) = θ(x − y)θ(y − a) − θ(y − x)θ(a − y).

Note that Θ(x − a, y − a) can be obtained by shifting Θ(x, y) in the positive direction of the x- and y-axes by a (a can be either positive or negative; in Fig. 10.3 we assume a > 0). Using the Θ(x, y) function, the Green's function is described as

G(x, y) = { [ u2(x)u1(y) − u1(x)u2(y) ] / [ a(y)W(u1(y), u2(y)) ] } Θ(x, y).   (10.172)

Defining a function F such that

F(x, y) ≡ [ u2(x)u1(y) − u1(x)u2(y) ] / [ a(y)W(u1(y), u2(y)) ],   (10.173)

we have

G(x, y) = F(x, y)Θ(x, y).   (10.174)

Notice that

Θ(x, y) ≠ Θ(y, x) and G(x, y) ≠ G(y, x).   (10.175)

It is therefore obvious that the differential operator is not Hermitian.

10.6.3 Estimation of Surface Terms

To include the surface term in the inhomogeneous case, we use (10.62):

∫_r^s dx [ v*(Lxu) − (Lx†v)*u ] = [ av* du/dx − u d(av*)/dx + b u v* ]_r^s.   (10.62)

As before, we set r = 0 in (10.62). Also, we classify (10.62) into two cases according as s > 0 or s < 0.
(i) Case I (x > 0, y > 0): Equation (10.62) reads as

∫_0^∞ dx [ v*(Lxu) − (Lx†v)*u ] = [ av* du/dx − u d(av*)/dx + b u v* ]_0^∞.   (10.176)

This time, inserting

g(x, y) = G*(y, x) = G(y, x)   (10.177)

into v of (10.176) and arranging terms, we have

u(y) = ∫_0^∞ dx G(y, x) d(x)
  − [ a(x)G(y, x) du(x)/dx − u(x) (da(x)/dx) G(y, x) − u(x)a(x) ∂G(y, x)/∂x + b(x)u(x)G(y, x) ]_{x=0}^{x=∞}.   (10.178)
[Fig. 10.4 Domain of G(y, x). The areas in which G(y, x) does not vanish are hatched]

Note that in the above we used Lx†g(x, y) = δ(x − y). In Fig. 10.4, we depict the domain in which G(y, x) does not vanish. Notice that we get this domain by folding back that of Θ(x, y) (see Fig. 10.2) relative to the straight line y = x. Thus, we find that G(y, x) vanishes for x > y, and so does ∂G(y, x)/∂x; see Fig. 10.4. Namely, the second term of RHS of (10.178) vanishes at x = ∞. In other words, g(x, y) and G(y, x) must satisfy the adjoint BCs; i.e.,

g(∞, y) = G(y, ∞) = 0.   (10.179)

At the same time, the upper limit of the integration range of (10.178) can be set at y. Noting the above, we have

(the first term) of (10.178) = ∫_0^y dx G(y, x) d(x).   (10.180)

Also, with the second term of (10.178) we get

(the second term) of (10.178)
  = [ a(x)G(y, x) du(x)/dx − u(x) (da(x)/dx) G(y, x) − u(x)a(x) ∂G(y, x)/∂x + b(x)u(x)G(y, x) ]_{x=0}
  = a(0)G(y, 0) du(0)/dx − u(0) (da(0)/dx) G(y, 0) − u(0)a(0) ∂G(y, x)/∂x |_{x=0} + b(0)u(0)G(y, 0).   (10.181)

If we substitute the inhomogeneous BCs


u(0) = σ1 and du/dx |_{x=0} = σ2,   (10.182)

into (10.181) along with the other appropriate values, we should be able to get a unique solution as

u(y) = ∫_0^y dx G(y, x) d(x)
  + [ σ2 a(0) − σ1 (da(0)/dx) + σ1 b(0) ] G(y, 0) − σ1 a(0) ∂G(y, x)/∂x |_{x=0}.   (10.183)

Exchanging the arguments x and y, we get

u(x) = ∫_0^x dy G(x, y) d(y)
  + [ σ2 a(0) − σ1 (da(0)/dx) + σ1 b(0) ] G(x, 0) − σ1 a(0) ∂G(x, y)/∂y |_{y=0}.   (10.184)

Here, we consider that Θ(x, y) = 1 in this region and use (10.174). Meanwhile, from (10.174), we have

∂G(x, y)/∂y = [∂F(x, y)/∂y] Θ(x, y) + F(x, y) [∂Θ(x, y)/∂y].   (10.185)

In the second term,

∂Θ(x, y)/∂y = [∂θ(x − y)/∂y] θ(y) + θ(x − y) [∂θ(y)/∂y] − [∂θ(y − x)/∂y] θ(−y) − θ(y − x) [∂θ(−y)/∂y]
  = −δ(x − y)θ(y) + θ(x − y)δ(y) − δ(y − x)θ(−y) + θ(y − x)δ(y)
  = −δ(x − y) [θ(y) + θ(−y)] + [θ(x − y) + θ(y − x)] δ(y)
  = −δ(x − y) + [θ(x) + θ(−x)] δ(y)
  = −δ(x − y) + δ(y),   (10.186)

where we used

θ(x) + θ(−x) ≡ 1   (10.187)

as well as

f(y)δ(y) = f(0)δ(y)   (10.188)

and

δ(−y) = δ(y).   (10.189)

However, the function −δ(x − y) + δ(y) is of secondary importance. This is because in (10.184) we may choose [ε, x − ε] (ε > 0) for the domain of y and take ε → +0 after the integration and the other calculations related to the surface terms. Therefore, −δ(x − y) + δ(y) in (10.186) virtually vanishes.
Thus, we can express (10.185) as

∂G(x, y)/∂y = [∂F(x, y)/∂y] Θ(x, y) = ∂F(x, y)/∂y.

Then, finally we reach

u(x) = ∫_0^x dy F(x, y) d(y)
  + [ σ2 a(0) − σ1 (da(0)/dx) + σ1 b(0) ] F(x, 0) − σ1 a(0) ∂F(x, y)/∂y |_{y=0}.   (10.190)

(ii) Case II (x < 0, y < 0): Similarly to the above, Eq. (10.62) reads as

u(y) = ∫_{−∞}^{0} dx G(y, x) d(x)
  − [ a(x)G(y, x) du(x)/dx − u(x) (da(x)/dx) G(y, x) − u(x)a(x) ∂G(y, x)/∂x + b(x)u(x)G(y, x) ]_{x=−∞}^{x=0}.

As mentioned above, the lower limit of the integration range is y. Considering that both G(y, x) and ∂G(y, x)/∂x vanish at x < y (see Fig. 10.4), we have

u(y) = ∫_y^0 dx G(y, x) d(x)
  − [ a(x)G(y, x) du(x)/dx − u(x) (da(x)/dx) G(y, x) − u(x)a(x) ∂G(y, x)/∂x + b(x)u(x)G(y, x) ]_{x=0}
  = −∫_0^y dx G(y, x) d(x) − [ σ2 a(0) − σ1 (da(0)/dx) + σ1 b(0) ] G(y, 0) + σ1 a(0) ∂G(y, x)/∂x |_{x=0}.   (10.191)

Comparing (10.191) with (10.183), we recognize that the sign of RHS of (10.191) is reversed relative to RHS of (10.183). This also holds after exchanging the arguments x and y. Note, however, that Θ(x, y) = −1 in the present case. As a result, the two minus signs cancel, and (10.191) takes exactly the same expression as (10.183). Proceeding with the calculations similarly, for both Cases I and II we arrive at the unified solution (10.190), valid throughout the domain (−∞, +∞).

10.6.4 Examples

To deepen our understanding of Green's functions, we deal with tangible examples of IVPs below.
Example 10.5 Let us consider the following inhomogeneous differential equation:

d²u/dx² + u = 1.   (10.192)

Note that (10.192) is formally the same differential equation as (10.141). We may encounter (10.192) when observing the motion of a charged harmonic oscillator placed in a static electric field. We assume that the domain of the argument x is the whole range of real numbers. We set boundary conditions such that

u(0) = σ1 and u′(0) = σ2.   (10.193)

As in the case of Example 10.4, a fundamental set of solutions is given by

e^{ix} and e^{−ix}.   (10.194)

Therefore, following (10.173), we get

F(x, y) = sin(x − y).   (10.195)

Also, following (10.190) we have

u(x) = ∫_0^x dy sin(x − y) + σ2 sin x − σ1 [ ∂ sin(x − y)/∂y ] |_{y=0}
  = 1 − cos x + σ2 sin x + σ1 cos x.   (10.196)

In particular, if we choose σ1 = 1 and σ2 = 0, we have

u(x) ≡ 1.   (10.197)

The uniqueness of the solution ensures that this is the sole solution under the inhomogeneous BCs described by σ1 = 1 and σ2 = 0.
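
The IVP solution (10.196) can be cross-checked against direct numerical integration. Below is a minimal sketch, assuming NumPy and SciPy; the test values σ1 = 0.5 and σ2 = −1 are arbitrary.

import numpy as np
from scipy.integrate import solve_ivp

s1, s2 = 0.5, -1.0
ts = np.linspace(0.0, 10.0, 401)

# (10.192) as a first-order system, with u(0) = s1, u'(0) = s2
num = solve_ivp(lambda t, z: [z[1], 1.0 - z[0]], (ts[0], ts[-1]),
                [s1, s2], t_eval=ts, rtol=1e-10, atol=1e-12).y[0]

exact = 1.0 - np.cos(ts) + s2 * np.sin(ts) + s1 * np.cos(ts)  # (10.196)
print(np.max(np.abs(num - exact)))  # small; agreement within the solver tolerance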
Example 10.6: Damped Oscillator If a harmonic oscillator undergoes friction, the oscillator exhibits damped oscillation; such an oscillator is said to be a damped oscillator. The damped oscillator is often dealt with when we consider bound electrons in a dielectric medium under the influence of a dynamic external field varying with time. This is the case when the electron is placed in an alternating electric field or an electromagnetic wave.
The equation of motion of the damped oscillator is described as

m d²u/dt² + r du/dt + ku = d(t),   (10.198)

where m is the mass of an electron, r is a damping constant, and k is a spring constant of the damped oscillator. To seek a fundamental set of solutions of the homogeneous equation described as

m d²u/dt² + r du/dt + ku = 0,

putting

u = e^{iρt}   (10.199)

and inserting it into (10.198), we have

( −ρ² + iρ r/m + k/m ) e^{iρt} = 0.

Since e^{iρt} does not vanish, we have

−ρ² + iρ r/m + k/m = 0.   (10.200)

We call this equation the characteristic quadratic equation. There are three cases for the solutions of the quadratic equation (10.200). Solving (10.200), we get

ρ = ir/2m ± √( −r²/4m² + k/m ).   (10.201)

(i) Equation (10.201) gives two pure imaginary roots, i.e., −r²/4m² + k/m < 0 (overdamping). (ii) The equation has double roots, −r²/4m² + k/m = 0 (critical damping). (iii) The equation has two complex roots, −r²/4m² + k/m > 0 (weak damping). Of these, Case (iii) is characterized by an oscillating solution and has many applications in mathematical physics. For Cases (i) and (ii), on the other hand, we do not have an oscillating solution.
Case (i):
The characteristic roots are given by

ρ = ir/2m ± i √( r²/4m² − k/m ).

Therefore, we have a fundamental set of solutions described by

u(t) = exp(−rt/2m) exp( ± √( r²/4m² − k/m ) t ).

Then, a general solution is given by

u(t) = exp(−rt/2m) [ a exp( √( r²/4m² − k/m ) t ) + b exp( −√( r²/4m² − k/m ) t ) ].

Case (ii):
The characteristic roots are given by

ρ = ir/2m.

Therefore, one of the solutions is

u1(t) = exp(−rt/2m).

Another solution u2(t) is given by

u2(t) = c ∂u1(t)/∂(iρ) = c′ t exp(−rt/2m),

where c and c′ are appropriate constants. Thus, the general solution is given by

u(t) = a exp(−rt/2m) + b t exp(−rt/2m).

The most important and interesting feature, the "damped oscillator," emerges in the following Case (iii) and appears in many fields of natural science. We are particularly interested in this case.
Case (iii):
Suppose that the damping is relatively weak, so that the characteristic equation has two complex roots. Let us examine further details of this case following the prescriptions for IVPs. We divide (10.198) by m for ease of handling of the differential equation:

d²u/dt² + (r/m) du/dt + (k/m) u = (1/m) d(t).

Putting

ω ≡ √( −r²/4m² + k/m ),   (10.202)

we get a fundamental set of solutions described as

u(t) = exp(−rt/2m) exp(±iωt).   (10.203)

Given the BCs, following (10.172) we get as a Green's function

G(t, τ) = { [ u2(t)u1(τ) − u1(t)u2(τ) ] / W(u1(τ), u2(τ)) } Θ(t, τ)
  = (1/ω) e^{−(r/2m)(t−τ)} sin ω(t − τ) Θ(t, τ),   (10.204)

where u1(t) = exp(−rt/2m) exp(iωt) and u2(t) = exp(−rt/2m) exp(−iωt).
We examine whether G(t, τ) is eligible as the Green's function as follows:

dG/dt = (1/ω) [ −(r/2m) e^{−(r/2m)(t−τ)} sin ω(t − τ) + ω e^{−(r/2m)(t−τ)} cos ω(t − τ) ] Θ(t, τ)
  + (1/ω) e^{−(r/2m)(t−τ)} sin ω(t − τ) [ δ(t − τ)θ(τ) + δ(τ − t)θ(−τ) ].   (10.205)

The second term of (10.205) vanishes because sin ω(t − τ) δ(t − τ) = 0 · δ(t − τ) = 0. Thus,

d²G/dt² = (1/ω) [ (r/2m)² e^{−(r/2m)(t−τ)} sin ω(t − τ) − (r/m) ω e^{−(r/2m)(t−τ)} cos ω(t − τ)
  − ω² e^{−(r/2m)(t−τ)} sin ω(t − τ) ] Θ(t, τ)
  + (1/ω) [ −(r/2m) e^{−(r/2m)(t−τ)} sin ω(t − τ) + ω e^{−(r/2m)(t−τ)} cos ω(t − τ) ]
  × [ δ(t − τ)θ(τ) + δ(τ − t)θ(−τ) ].   (10.206)

In the last term, using the properties of the δ and θ functions, we get δ(t − τ). Note here that

e^{−(r/2m)(t−τ)} cos ω(t − τ) [ δ(t − τ)θ(τ) + δ(τ − t)θ(−τ) ]
  = e^{−(r/2m)·0} cos(ω · 0) { δ(t − τ) [ θ(τ) + θ(−τ) ] }
  = δ(t − τ) [ θ(τ) + θ(−τ) ] = δ(t − τ),

while the corresponding sine term vanishes. Thus, rearranging (10.206), we get

d²G/dt² = δ(t − τ) + { −(k/m)(1/ω) e^{−(r/2m)(t−τ)} sin ω(t − τ)
  − (r/m)(1/ω) [ −(r/2m) e^{−(r/2m)(t−τ)} sin ω(t − τ) + ω e^{−(r/2m)(t−τ)} cos ω(t − τ) ] } Θ(t, τ)
  = δ(t − τ) − (k/m) G − (r/m) dG/dt,   (10.207)

where we used (10.202) [i.e., k/m = ω² + (r/2m)²] for the last equality. Rearranging (10.207) once again, we have

d²G/dt² + (r/m) dG/dt + (k/m) G = δ(t − τ).   (10.208)

Defining the operator

Lt ≡ d²/dt² + (r/m) d/dt + k/m,   (10.209)

we get

LtG = δ(t − τ).   (10.210)

Note that this expression is consistent with (10.164). Thus, we find that (10.210) satisfies the condition (10.123) of the Green's function, with the weight function identified with unity.
Now, suppose that a sinusoidally changing external field eiΩt influences the
motion of the damped oscillator. Here we assume that an amplitude of the external
field is unity. Then, we have

d2 u r du k 1
þ þ u ¼ eiΩt : ð10:211Þ
dt 2 m dx m m
10.6 Initial Value Problems (IVPs) 423

Thus, as a solution of the homogeneous boundary conditions [i.e., uð0Þ ¼ uð_0Þ ¼ 0]


we get
Z t
1
e2mðtτÞ eiΩt sin ωðt  τÞdτ,
r
uð t Þ ¼ ð10:212Þ
mω 0

where t is an arbitrary positive or negative time. Equation (10.212) shows that with
the real part we have an external field cosΩt and that with the imaginary part we have
an external field sinΩt. To calculate (10.212) we use
h i
1 iωðtτÞ
sin ωðt  τÞ ¼ e  eiωðtτÞ : ð10:213Þ
2i

Then, the equation can readily be solved by integration of exponential functions,


even though we have to do somewhat lengthy (but straightforward) calculations.
Thus for the real part (i.e., the external field is cosΩt), we get a solution

1 r 1 r 2 1 2 
Cuðt Þ ¼ ΩsinΩt þ cos Ωt  Ω  ω2 cos Ωt
m m m 2m m
h
1 r   r 1 2 
þ e2mt Ω2  ω2 cos ωt  Ω þ ω2 sin ωt
m 2m ω
r 2 r 31
 cos ωt  sin ωt, ð10:214Þ
2m 2m ω

where C is a constant [i.e., a constant denominator of u(t)] expressed as

 2 r2   r 4
C ¼ Ω 2  ω 2 þ 2 Ω 2 þ ω2 þ : ð10:215Þ
2m 2m

For the imaginary part (i.e., the external field is sinΩt) we get

1 2  1 r 1 r 2
Cuðt Þ ¼  Ω  ω2 sin Ωt  ΩcosΩt þ sin Ωt
m m m m 2m
 
1 2mr t Ω  2  r r 2Ω
þ e Ω  ω sin ωt þ Ωcosωt þ
2
sin ωt :
m ω m 2m ω
ð10:216Þ

In Fig. 10.5 we show an example that depicts the positions of a damped oscillator as a function of time. In Fig. 10.5a, the amplitude of the envelope gradually diminishes with time. An enlarged diagram near the origin (Fig. 10.5b) clearly reflects the initial conditions $u(0) = \dot{u}(0) = 0$. In Fig. 10.5, we put m = 1 [kg], Ω = 1 [1/s], ω = 0.94 [1/s], and r = 0.006 [kg/s]. In the above calculations, if r/m is small enough (i.e., the damping is small enough), the third and fourth orders of r/m may be ignored and the approximation is precise enough.
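The closed form (10.214) can also be cross-checked against a direct numerical integration of (10.211) with a cosine drive. A sketch assuming NumPy/SciPy, using the parameter values of Fig. 10.5 (variable names ours):

```python
import numpy as np
from scipy.integrate import solve_ivp

m, r, Omega = 1.0, 0.006, 1.0
omega = 0.94                       # as in Fig. 10.5
k = m*omega**2 + r**2/(4*m)        # from omega^2 = k/m - r^2/(4 m^2)

def u_closed(t):
    """Closed-form u(t) of (10.214), external field cos(Omega t)."""
    a = r/(2*m)
    C = (Omega**2-omega**2)**2 + (r**2/(2*m**2))*(Omega**2+omega**2) + a**4
    steady = (r/m)*Omega*np.sin(Omega*t) + (a**2-(Omega**2-omega**2))*np.cos(Omega*t)
    trans = np.exp(-a*t)*((Omega**2-omega**2-a**2)*np.cos(omega*t)
                          - (a/omega)*(Omega**2+omega**2+a**2)*np.sin(omega*t))
    return (steady + trans)/(m*C)

# Direct integration of m u'' + r u' + k u = cos(Omega t), u(0) = u'(0) = 0
rhs = lambda t, y: [y[1], (np.cos(Omega*t) - r*y[1] - k*y[0])/m]
t = np.linspace(0.0, 50.0, 2001)
sol = solve_ivp(rhs, (0.0, 50.0), [0.0, 0.0], t_eval=t, rtol=1e-10, atol=1e-12)
print(np.max(np.abs(sol.y[0] - u_closed(t))))  # tiny: agreement to solver tolerance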

[Fig. 10.5 Example of a damped oscillation as a function of t; the data are taken from (10.214). (a) Overall profile (position [m] versus time [s]). (b) Profile enlarged near the origin (phase 0).]

In the case of inhomogeneous BCs, given $\sigma_1 = u(0)$ and $\sigma_2 = \dot{u}(0)$ we can decide the additional term S(t) using (10.190) such that

$$S(t) = \left(\sigma_2 + \sigma_1\frac{r}{2m}\right)e^{-\frac{r}{2m}t}\,\frac{1}{\omega}\sin\omega t + \sigma_1 e^{-\frac{r}{2m}t}\cos\omega t. \tag{10.217}$$

This term arises from (10.190). Thus, from (10.212) and (10.217), u(t) + S(t) gives a unique solution for the SOLDE with inhomogeneous BCs. Notice that S(t) does not depend on the external field.

10.7 Eigenvalue Problems

We often encounter eigenvalue problems in mathematical physics. Of these, those


related to Hermitian differential operators have particularly interesting and important
features. The eigenvalue problems we have considered in Part I are typical illustra-
tions. Here we investigate general properties of the eigenvalue problems.
Returning to the case of homogeneous BCs, we consider a following homoge-
neous SOLDE:

$$a(x)\frac{d^{2}u}{dx^{2}} + b(x)\frac{du}{dx} + c(x)u = 0. \tag{10.5}$$

Defining a following differential operator $L_x$ such that

$$L_x = a(x)\frac{d^{2}}{dx^{2}} + b(x)\frac{d}{dx} + c(x), \tag{10.55}$$

we have a homogeneous equation

$$L_x u(x) = 0. \tag{10.218}$$

Putting a constant λ instead of c(x), we have

$$a(x)\frac{d^{2}u}{dx^{2}} + b(x)\frac{du}{dx} - \lambda u = 0. \tag{10.219}$$

If we define a differential operator $L_x$ such that

$$L_x = a(x)\frac{d^{2}}{dx^{2}} + b(x)\frac{d}{dx} - \lambda, \tag{10.220}$$

we have a homogeneous equation

$$L_x u = 0 \tag{10.221}$$

to express (10.219). Instead, if we define a differential operator $L_x$ such that

$$L_x = a(x)\frac{d^{2}}{dx^{2}} + b(x)\frac{d}{dx},$$

we have the same homogeneous equation


$$L_x u = \lambda u \tag{10.222}$$

to express (10.219).
Equations (10.221) and (10.222) are essentially the same except that the expres-
sion is different. The expression using (10.222) is familiar to us as an eigenvalue
equation. The difference between (10.5) and (10.219) is that whereas c(x) in (10.5) is
a given fixed function, λ in (10.219) is constant, but may be varied according to the
solution of u(x). One of the most essential properties of the eigenvalue problem that
is posed in the form of (10.222) is that its solution is not uniquely determined as
already studied in various cases of Part I. Remember that the methods based upon the
Green’s function are valid for a problem to which a homogeneous differential
equation has a trivial solution (i.e., identically zero) under homogeneous BCs. In
contrast to this situation, even though the eigenvalue problem is basically posed as a
homogeneous equation under homogeneous BCs, nontrivial solutions are expected
to be obtained. In this respect, we have seen that in Part I we rejected a trivial
solution (i.e., identically zero) because of no physical meaning.
As exemplified in Part I, the eigenvalue problems that appear in mathematical
physics are closely connected to the Hermiticity of (differential) operators. This is
because in many cases an eigenvalue is required to be real. We have already
examined how we can convert a differential operator to the self-adjoint form. That
is, if we define p(x) as in (10.26), we have the self-adjoint operator as described in
(10.70). As a symbolic description, we have

$$w(x)L_x u = \frac{d}{dx}\left[p(x)\frac{du}{dx}\right] + c(x)w(x)u. \tag{10.223}$$

In the same way, multiplying both sides of (10.222) by w(x), we get

$$w(x)L_x u = \lambda w(x)u. \tag{10.224}$$

For instance, the Hermite differential equation that has already appeared as (2.118) in Sect. 2.3 is described as

$$\frac{d^{2}u}{dx^{2}} - 2x\frac{du}{dx} + 2nu = 0. \tag{10.225}$$

If we express (10.225) as in (10.224), multiplying $e^{-x^2}$ on both sides of (10.225) we have

$$\frac{d}{dx}\left(e^{-x^2}\frac{du}{dx}\right) + 2n\,e^{-x^2}u = 0. \tag{10.226}$$

Notice that the differential operator has been converted to a self-adjoint form according to (10.31) that defines a real and positive weight function $e^{-x^2}$ in the

present case. The domain of the Hermite differential equation is $(-\infty, +\infty)$, at the endpoints (i.e., $\pm\infty$) of which the surface term of the RHS of (10.69) approaches zero sufficiently rapidly in virtue of $e^{-x^2}$.
In Sects. 3.4 and 3.5, in turn, we dealt with the (associated) Legendre differential equation (3.127) for which the relevant differential operator is self-adjoint. The surface term corresponding to (10.69) vanishes. It is because $(1-\xi^2)$ vanishes at the endpoints $\xi = \cos\theta = \pm 1$ (i.e., θ = 0 or π) from (3.107). Thus, the Hermiticity is automatically ensured for the (associated) Legendre differential equation as well as the Hermite differential equation. In those cases, even though the differential equations do not satisfy any particular BCs, the Hermiticity is yet ensured.
In the theory of differential equations, the aforementioned properties of the Hermitian operators have been fully investigated as the so-called Sturm–Liouville system (or problem) in the form of a homogeneous differential equation. The related differential equations are connected to classical orthogonal polynomials bearing personal names such as Hermite, Laguerre, Jacobi, Gegenbauer, Legendre, and Tchebichef. These equations frequently appear in quantum mechanics and electromagnetism as typical examples of the Sturm–Liouville system. They can be converted to self-adjoint differential equations by multiplying an original form by a weight function. The resulting equations can be expressed as

$$\frac{d}{dx}\left[a(x)w(x)\frac{dY_n(x)}{dx}\right] + \lambda_n w(x)Y_n(x) = 0, \tag{10.227}$$

where Yn(x) is a collective representation of classical orthogonal polynomials.


Equation (10.226) is an example. Conventionally, a following form is adopted
instead of (10.227):
 
$$\frac{1}{w(x)}\frac{d}{dx}\left[a(x)w(x)\frac{dY_n(x)}{dx}\right] + \lambda_n Y_n(x) = 0, \tag{10.228}$$

where we put (aw)′ = bw. That is, the differential equation is originally described as

$$a(x)\frac{d^{2}Y_n(x)}{dx^{2}} + b(x)\frac{dY_n(x)}{dx} + \lambda_n Y_n(x) = 0. \tag{10.229}$$

In the case of Hermite polynomials, for instance, a(x) = 1 and $w(x) = e^{-x^2}$. Since we have $(aw)' = -2x\,e^{-x^2} = bw$, we can put b = −2x. Examples including this case are tabulated in Table 10.2. The eigenvalues λn are associated with real numbers that characterize the individual physical systems. The related fields have wide applications in many branches of natural science.
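The SOLDEs tabulated in Table 10.2 below are easy to verify numerically. A minimal sketch for the Hermite entry, assuming NumPy (whose numpy.polynomial.hermite module uses the physicists' convention appearing in (10.225)):

```python
import numpy as np
from numpy.polynomial.hermite import Hermite

n = 5
Hn = Hermite.basis(n)          # physicists' Hermite polynomial H_5
x = np.linspace(-3.0, 3.0, 7)

# Residual of the Hermite SOLDE: H_n'' - 2x H_n' + 2n H_n = 0
res = Hn.deriv(2)(x) - 2*x*Hn.deriv(1)(x) + 2*n*Hn(x)
print(np.allclose(res, 0.0))   # -> True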
Table 10.2 Classical polynomials and their related SOLDEs

Hermite $H_n(x)$: $\frac{d^2}{dx^2}H_n - 2x\frac{d}{dx}H_n + 2nH_n = 0$; weight $w(x) = e^{-x^2}$; domain $(-\infty, +\infty)$.
Laguerre $L_n^{\nu}(x)$: $x\frac{d^2}{dx^2}L_n^{\nu} + (\nu+1-x)\frac{d}{dx}L_n^{\nu} + nL_n^{\nu} = 0$; weight $w(x) = x^{\nu}e^{-x}$ ($\nu > -1$); domain $[0, +\infty)$.
Gegenbauer $C_n^{\lambda}(x)$: $(1-x^2)\frac{d^2}{dx^2}C_n^{\lambda} - (2\lambda+1)x\frac{d}{dx}C_n^{\lambda} + n(n+2\lambda)C_n^{\lambda} = 0$; weight $w(x) = (1-x^2)^{\lambda-\frac{1}{2}}$ ($\lambda > -\frac{1}{2}$); domain $[-1, +1]$.
Legendre $P_n(x)$: $(1-x^2)\frac{d^2}{dx^2}P_n - 2x\frac{d}{dx}P_n + n(n+1)P_n = 0$; weight $w(x) = 1$; domain $[-1, +1]$.

After having converted the operator to the self-adjoint form, i.e., $L_x = L_x^{\dagger}$, instead of (10.100) we have

$$\int_r^s dx\,w(x)\{v^{*}(L_x u) - [L_x v]^{*}u\} = 0. \tag{10.230}$$

Rewriting it, we get

$$\int_r^s dx\,v^{*}[w(x)L_x u] = \int_r^s dx\,[w(x)L_x v]^{*}u. \tag{10.231}$$

If we use an inner product notation described by (10.88), we get

$$\langle v \mid L_x u\rangle = \langle L_x v \mid u\rangle. \tag{10.232}$$

Here let us think of two eigenfunctions $\psi_i$ and $\psi_j$ that belong to eigenvalues $\lambda_i$ and $\lambda_j$, respectively. That is,

$$w(x)L_x\psi_i = \lambda_i w(x)\psi_i \quad\text{and}\quad w(x)L_x\psi_j = \lambda_j w(x)\psi_j. \tag{10.233}$$

Inserting $\psi_i$ and $\psi_j$ into u and v, respectively, in (10.232), we have

$$\langle\psi_j \mid L_x\psi_i\rangle = \langle\psi_j \mid \lambda_i\psi_i\rangle = \lambda_i\langle\psi_j \mid \psi_i\rangle = \langle L_x\psi_j \mid \psi_i\rangle = \langle\lambda_j\psi_j \mid \psi_i\rangle = \lambda_j^{*}\langle\psi_j \mid \psi_i\rangle. \tag{10.234}$$

With the second and third equalities, we have used a rule of the inner product (see Parts I and III). Therefore, we get

$$\left(\lambda_i - \lambda_j^{*}\right)\langle\psi_j \mid \psi_i\rangle = 0. \tag{10.235}$$

Putting i = j in (10.235), we get

$$\left(\lambda_i - \lambda_i^{*}\right)\langle\psi_i \mid \psi_i\rangle = 0. \tag{10.236}$$

An inner product $\langle\psi_i \mid \psi_i\rangle$ vanishes if and only if $|\psi_i\rangle \equiv 0$; see the inner product calculation rules of Sect. 13.1. However, $|\psi_i\rangle \equiv 0$ is not acceptable as a physical state. Therefore, we must have $\langle\psi_i \mid \psi_i\rangle \neq 0$. Thus, we get

$$\lambda_i - \lambda_i^{*} = 0 \quad\text{or}\quad \lambda_i = \lambda_i^{*}. \tag{10.237}$$

The relation (10.237) obviously indicates that λi is real; i.e., we find that the eigenvalues of an Hermitian operator are real. If $\lambda_i \neq \lambda_j = \lambda_j^{*}$, from (10.235) we get

$$\langle\psi_j \mid \psi_i\rangle = 0. \tag{10.238}$$

That is, | ψ ji and | ψ ii are orthogonal to each other.



We often encounter related orthogonality relationships between vectors and functions. We saw several cases in Part I and will see other cases in Part III.
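For the Hermite case, the orthogonality (10.238) under the weight $e^{-x^2}$ can be checked directly by Gauss–Hermite quadrature. A sketch assuming NumPy:

```python
import numpy as np
from numpy.polynomial.hermite import Hermite, hermgauss

x, w = hermgauss(40)                    # nodes/weights for the weight e^{-x^2}
H = [Hermite.basis(n) for n in range(4)]

# <H_i | H_j> with weight e^{-x^2}: vanishes for i != j
for i in range(4):
    for j in range(4):
        print(i, j, round(float(np.sum(w * H[i](x) * H[j](x))), 6))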

Part III
Linear Vector Spaces

In this part we treat vectors and their transformations in linear vector spaces so that
we can address various aspects of mathematical physics systematically but intui-
tively. We outline general principles of linear vector spaces mostly from an algebraic
point of view. Starting with abstract definition and description of vectors, we deal
with their transformation in a vector space using a matrix. An inner product is a
central concept in the theory of a linear vector space so that two vectors can be
associated with each other to yield a scalar. Unlike many of the books of linear
algebra and linear vector spaces, however, we describe canonical forms of matrices
before considering the inner product. This is because we can treat the topics in light
of the abstract theory of matrices and vector space without a concept of the inner
product. Of the canonical forms of matrices, Jordan canonical form is of paramount
importance. We study how it is constructed providing a tangible example.
In relation to the inner product space, normal operators such as Hermitian
operators and unitary operators frequently appear in quantum mechanics and elec-
tromagnetism. From a general aspect, we revisit the theory of Hermitian operators
that often appeared in both Parts I and II.
The last part deals with exponential functions of matrices. The relevant topics can
be counted as one of the important branches of applied mathematics. Also, the
exponential functions of matrices play an essential role in the theory of continuous
groups that will be dealt with in Part IV.
Chapter 11
Vectors and Their Transformation

In this chapter, we deal with the theory of finite-dimensional linear vector spaces.
Such vector spaces are spanned by a finite number of linearly independent vectors,
namely, basis vectors. In conjunction with developing an abstract concept and
theory, we mention a notion of mapping among mathematical elements. A linear
transformation of a vector is a special kind of mapping. In particular, we focus on
endomorphism within an n-dimensional vector space Vⁿ. Here, the endomorphism is defined as a linear transformation: Vⁿ → Vⁿ. The endomorphism is represented by an (n, n) square matrix. This is most often the case with physical and chemical
applications, when we deal with matrix algebra. In this book we focus on this type
of transformation.
A non-singular matrix plays an important role in the endomorphism. In this
connection, we consider its inverse matrix and determinant. All these fundamental
concepts supply us with a sufficient basis for better understanding of the theory of
the linear vector spaces. Through these processes, we should be able to get
acquainted with connection between algebraic and analytical approaches and gain
a broad perspective on various aspects of mathematical physics and related fields.

11.1 Vectors

From both fundamental and practical points of view, it is desirable to define linear
vector spaces in an abstract way. Suppose V is a set of elements denoted by a, b, c,
etc., called vectors. The set V is a linear vector space (or simply a vector space) if a sum a + b ∈ V is defined for any pair of vectors a and b and the elements of V satisfy the following mathematical relations:

$$(a + b) + c = a + (b + c), \tag{11.1}$$


$$a + b = b + a, \tag{11.2}$$
$$a + 0 = a, \tag{11.3}$$
$$a + (-a) = 0. \tag{11.4}$$

For the above, 0 is called the zero vector. Furthermore, for a ∈ V, ca ∈ V is defined (c is a complex number called a scalar) and we assume the following relations among vectors and scalars:

$$(cd)a = c(da), \tag{11.5}$$
$$1a = a, \tag{11.6}$$
$$c(a + b) = ca + cb, \tag{11.7}$$
$$(c + d)a = ca + da. \tag{11.8}$$

On the basis of the above relations, we can construct the following expression called a linear combination:

$$c_1 a_1 + c_2 a_2 + \cdots + c_n a_n.$$

If this linear combination is equated to zero, we obtain

$$c_1 a_1 + c_2 a_2 + \cdots + c_n a_n = 0. \tag{11.9}$$

If (11.9) holds only in the case where every $c_i = 0$ $(1 \le i \le n)$, the vectors $a_1, a_2, \cdots, a_n$ are said to be linearly independent. In this case the relation represented by (11.9) is said to be trivial. If the relation is nontrivial (i.e., $\exists c_i \neq 0$), those vectors are said to be linearly dependent.
If in the vector space V the maximum number of linearly independent vectors is n, V is said to be an n-dimensional vector space and is sometimes denoted by Vⁿ. In this case any vector x of Vⁿ is expressed uniquely as a linear combination of linearly independent vectors such that

$$x = x_1 a_1 + x_2 a_2 + \cdots + x_n a_n. \tag{11.10}$$

Suppose x is denoted by

$$x = x_1' a_1 + x_2' a_2 + \cdots + x_n' a_n. \tag{11.11}$$

Subtracting both sides of (11.11) from (11.10), we obtain

$$0 = \left(x_1 - x_1'\right)a_1 + \left(x_2 - x_2'\right)a_2 + \cdots + \left(x_n - x_n'\right)a_n. \tag{11.12}$$

Linear independence of the vectors $a_1, a_2, \cdots, a_n$ implies $x_i - x_i' = 0$, i.e., $x_i = x_i'$ $(1 \le i \le n)$. These n linearly independent vectors are referred to as basis vectors.

A vector space that has a finite number of basis vectors is called finite-dimensional;
otherwise it is infinite-dimensional.
Alternatively, we express (11.10) as

$$x = (a_1 \cdots a_n)\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \tag{11.13}$$

A set of coordinates $\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$ is called a column vector (or a numerical vector) that indicates an “address” of the vector x with respect to the basis vectors $a_1, a_2, \cdots, a_n$. Any vector in Vⁿ can be expressed as a linear combination of the basis vectors and, hence, we say that Vⁿ is spanned by $a_1, a_2, \cdots, a_n$. This is represented as

$$V^n = \mathrm{Span}\{a_1, a_2, \cdots, a_n\}. \tag{11.14}$$

Let us think of a subset W of Vⁿ (i.e., W ⊂ Vⁿ). If the following relations hold for W, W is said to be a (linear) subspace of Vⁿ:

$$a, b \in W \Longrightarrow a + b \in W, \qquad a \in W \Longrightarrow ca \in W. \tag{11.15}$$

These two relations ensure that the relations of (11.1)–(11.8) hold for W as well. The dimension of W is equal to or smaller than n. For instance W = Span{a₁, a₂, ⋯, a_r} (r ≤ n) is a subspace of Vⁿ. If r = n, W = Vⁿ. Suppose that there are two subspaces W₁ = Span{a₁} and W₂ = Span{a₂}. Note that in this case W₁ ∪ W₂ is not a subspace, because W₁ ∪ W₂ does not contain a₁ + a₂. However, a set U defined by

$$U = \{x = x_1 + x_2;\ \forall x_1 \in W_1,\ \forall x_2 \in W_2\} \tag{11.16}$$

is a subspace of Vⁿ. We denote this subspace by W₁ + W₂.
To show this is in fact a subspace, suppose that x, y ∈ W₁ + W₂. Then, we may express x = x₁ + x₂ and y = y₁ + y₂, where x₁, y₁ ∈ W₁; x₂, y₂ ∈ W₂. We have x + y = (x₁ + y₁) + (x₂ + y₂), where x₁ + y₁ ∈ W₁ and x₂ + y₂ ∈ W₂ because both W₁ and W₂ are subspaces. Therefore, x + y ∈ W₁ + W₂. Meanwhile, with any scalar c, cx = cx₁ + cx₂ ∈ W₁ + W₂. By definition (11.15), W₁ + W₂ is a subspace accordingly. Suppose here x₁ ∈ W₁. Then, x₁ = x₁ + 0 ∈ W₁ + W₂. Then, W₁ ⊂ W₁ + W₂. Similarly, we have W₂ ⊂ W₁ + W₂. Thus, W₁ + W₂ contains both W₁ and W₂. Conversely, let W be an arbitrary subspace that contains both W₁ and W₂. Then, we have ∀x₁ ∈ W₁ ⊂ W and ∀x₂ ∈ W₂ ⊂ W and, hence, we have x₁ + x₂ ∈ W by definition (11.15). But, from (11.16) W₁ + W₂ = {x₁ + x₂; ∀x₁ ∈ W₁, ∀x₂ ∈ W₂}. Hence, W₁ + W₂ ⊂ W. Consequently, any subspace necessarily contains

[Fig. 11.1 Decomposition of a vector in a three-dimensional Cartesian space ℝ³ into two subspaces. (a) ℝ³ = W₁ + W₂; W₁ ∩ W₂ = Span{e₂}, where e₂ is a unit vector in the positive direction of the y-axis. (b) ℝ³ = W₁ + W₃; W₁ ∩ W₃ = {0}.]

W₁ + W₂. This implies that W₁ + W₂ is the smallest subspace that contains both W₁ and W₂.

Example 11.1 Consider a three-dimensional Cartesian space ℝ³ (Fig. 11.1). We regard the xy-plane and yz-plane as subspaces W₁ and W₂, respectively, and ℝ³ = W₁ + W₂. In Fig. 11.1a, a vector $\overrightarrow{OB}$ (in ℝ³) is expressed as $\overrightarrow{OA} + \overrightarrow{AB}$ (i.e., a sum of a vector in W₁ and one in W₂). Alternatively, the same vector $\overrightarrow{OB}$ can be expressed as $\overrightarrow{OA'} + \overrightarrow{A'B}$. On the other hand, we can designate a subspace in a different way; i.e., in Fig. 11.1b the z-axis is chosen for a subspace W₃ instead of W₂. We have ℝ³ = W₁ + W₃ as well. In this case, however, $\overrightarrow{OB}$ is uniquely expressed as $\overrightarrow{OB} = \overrightarrow{OP} + \overrightarrow{PB}$. Notice that in Fig. 11.1a W₁ ∩ W₂ = Span{e₂}, where e₂ is a unit vector in the positive direction of the y-axis. In Fig. 11.1b, on the other hand, we have W₁ ∩ W₃ = {0}.
We can generalize this example to the following theorem.
Theorem 11.1 Let W₁ and W₂ be subspaces of V and V = W₁ + W₂. Then a vector x in V is uniquely expressed as

$$x = x_1 + x_2, \quad x_1 \in W_1,\ x_2 \in W_2,$$

if and only if W₁ ∩ W₂ = {0}.


Proof Suppose W₁ ∩ W₂ = {0} and x = x₁ + x₂ = x₁′ + x₂′; x₁, x₁′ ∈ W₁, x₂, x₂′ ∈ W₂. Then x₁ − x₁′ = x₂′ − x₂. The LHS belongs to W₁ and the RHS belongs to W₂. Both sides belong to W₁ ∩ W₂ accordingly. Hence, from the supposition both sides should be equal to the zero vector. Therefore, x₁ = x₁′, x₂ = x₂′. This implies that x is expressed uniquely as x = x₁ + x₂. Conversely, suppose the vector representation (x = x₁ + x₂) is unique and x ∈ W₁ ∩ W₂. Then, x = x + 0 = 0 + x; x, 0 ∈ W₁ and x, 0 ∈ W₂. Uniqueness of the representation implies that x = 0. Consequently, W₁ ∩ W₂ = {0} follows.
In case W₁ ∩ W₂ = {0}, V = W₁ + W₂ is said to be a direct sum of W₁ and W₂, or we say that V is decomposed into a direct sum of W₁ and W₂. We symbolically denote this by

$$V = W_1 \oplus W_2. \tag{11.17}$$

In this case, the following equality holds:

$$\dim V = \dim W_1 + \dim W_2, \tag{11.18}$$

where “dim” stands for the dimension of the vector space considered. To prove (11.18), we suppose that V is an n-dimensional vector space and that W₁ and W₂ are spanned by r₁ and r₂ linearly independent vectors, respectively, such that

$$W_1 = \mathrm{Span}\left\{e_1^{(1)}, e_2^{(1)}, \cdots, e_{r_1}^{(1)}\right\} \quad\text{and}\quad W_2 = \mathrm{Span}\left\{e_1^{(2)}, e_2^{(2)}, \cdots, e_{r_2}^{(2)}\right\}. \tag{11.19}$$

This is equivalent to the dimension of W₁ and W₂ being r₁ and r₂, respectively. If V = W₁ + W₂ (here we do not assume that the summation is a direct sum), we have

$$V = \mathrm{Span}\left\{e_1^{(1)}, e_2^{(1)}, \cdots, e_{r_1}^{(1)}, e_1^{(2)}, e_2^{(2)}, \cdots, e_{r_2}^{(2)}\right\}. \tag{11.20}$$

Then, we have n ≤ r₁ + r₂. This is almost trivial: if we had r₁ + r₂ < n, these (r₁ + r₂) vectors could not span V, and we would need additional vectors so that all the vectors together span V. Thus, we have n ≤ r₁ + r₂ accordingly. That is,

$$\dim V \le \dim W_1 + \dim W_2. \tag{11.21}$$

Now, let us assume that V = W₁ ⊕ W₂. Then, $e_i^{(2)}$ (1 ≤ i ≤ r₂) must be linearly independent of $e_1^{(1)}, e_2^{(1)}, \cdots, e_{r_1}^{(1)}$. If not, $e_i^{(2)}$ could be described as a linear combination of $e_1^{(1)}, e_2^{(1)}, \cdots, e_{r_1}^{(1)}$. But this would imply that $e_i^{(2)} \in W_1$, i.e., $e_i^{(2)} \in W_1 \cap W_2$, in contradiction to V = W₁ ⊕ W₂; it is because W₁ ∩ W₂ = {0} by assumption. Likewise, $e_j^{(1)}$ (1 ≤ j ≤ r₁) is linearly independent of $e_1^{(2)}, e_2^{(2)}, \cdots, e_{r_2}^{(2)}$. Hence, $e_1^{(1)}, e_2^{(1)}, \cdots, e_{r_1}^{(1)}, e_1^{(2)}, e_2^{(2)}, \cdots, e_{r_2}^{(2)}$ must be linearly independent. Thus, n ≥ r₁ + r₂; this is because in the vector space V we may well have additional vector(s) that are independent of the above (r₁ + r₂) vectors. Meanwhile, n ≤ r₁ + r₂ from the above. Consequently, we must have n = r₁ + r₂. Thus, we have proven that

$$V = W_1 \oplus W_2 \Longrightarrow \dim V = \dim W_1 + \dim W_2.$$

Conversely, suppose n = r₁ + r₂. Then, any vector x in V is expressed uniquely as

$$x = \left(a_1 e_1^{(1)} + \cdots + a_{r_1}e_{r_1}^{(1)}\right) + \left(b_1 e_1^{(2)} + \cdots + b_{r_2}e_{r_2}^{(2)}\right). \tag{11.22}$$

The vector described by the first term is contained in W₁ and that described by the second term in W₂. Both terms are again expressed uniquely. Therefore, we get V = W₁ ⊕ W₂. This is a proof of

$$\dim V = \dim W_1 + \dim W_2 \Longrightarrow V = W_1 \oplus W_2.$$

The above statements are summarized as the following theorem: ∎


Theorem 11.2 Let V be a vector space and let W₁ and W₂ be subspaces of V. Also suppose V = W₁ + W₂. Then, the following relation holds:

$$\dim V \le \dim W_1 + \dim W_2. \tag{11.21}$$

Furthermore, we have

$$\dim V = \dim W_1 + \dim W_2, \tag{11.23}$$

if and only if V = W₁ ⊕ W₂.


Theorem 11.2 is readily extended to the case where there are three or more subspaces. That is, having W₁, W₂, ⋯, Wm so that V = W₁ + W₂ + ⋯ + Wm, we obtain the following relation:

$$\dim V \le \dim W_1 + \dim W_2 + \cdots + \dim W_m. \tag{11.24}$$

The equality of (11.24) holds if and only if V = W₁ ⊕ W₂ ⊕ ⋯ ⊕ Wm.


In light of Theorem 11.2, Example 11.1 says that 3 = dim ℝ³ < 2 + 2 = dim W₁ + dim W₂. But dim ℝ³ = 2 + 1 = dim W₁ + dim W₃. Therefore, we have ℝ³ = W₁ ⊕ W₃.

11.2 Linear Transformations of Vectors

In the previous section we introduced vectors and their calculation rules in a linear
vector space. It is natural and convenient to relate a vector to another vector, as a
function f relates a number (either real or complex) to another number such that y = f(x), where x and y are certain two numbers. A linear transformation from the vector space V to another vector space W is a mapping A: V → W such that

$$A(ca + db) = cA(a) + dA(b). \tag{11.25}$$

We will briefly discuss the concepts of mapping at the end of this section.
It is convenient to define addition of linear transformations. It is defined as

$$(A + B)a = Aa + Ba, \tag{11.26}$$

where a is any vector in V.


Since (11.25) is a broad but abstract definition, we begin with a well-known simple example of rotation of a vector within the xy-plane (Fig. 11.2). We denote an arbitrary position vector x in the xy-plane by

$$x = x e_1 + y e_2 = (e_1\ e_2)\begin{pmatrix} x \\ y \end{pmatrix}, \tag{11.27}$$

where e₁ and e₂ are unit basis vectors in the xy-plane and x and y are coordinates of the vector x in reference to e₁ and e₂. The expression (11.27) is consistent with (11.13). The rotation represented in Fig. 11.2 is an example of a linear transformation. We call this rotation R. According to the definition,

$$R(x e_1 + y e_2) = R(x) = x R(e_1) + y R(e_2). \tag{11.28}$$

Putting $R(x e_1 + y e_2) = x'$, $R(e_1) = e_1'$, and $R(e_2) = e_2'$,

$$x' = x e_1' + y e_2' = (e_1'\ e_2')\begin{pmatrix} x \\ y \end{pmatrix}. \tag{11.29}$$

From Fig. 11.2 we readily obtain

[Fig. 11.2 Rotation of a vector x by an angle θ within the xy-plane.]

$$e_1' = e_1\cos\theta + e_2\sin\theta, \qquad e_2' = -e_1\sin\theta + e_2\cos\theta. \tag{11.30}$$

Using a matrix representation,

$$(e_1'\ e_2') = (e_1\ e_2)\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}. \tag{11.31}$$

Substituting (11.30) into (11.29), we obtain

$$x' = (x\cos\theta - y\sin\theta)e_1 + (x\sin\theta + y\cos\theta)e_2. \tag{11.32}$$

Meanwhile, x′ can be expressed relative to the original basis vectors e₁ and e₂:

$$x' = x' e_1 + y' e_2. \tag{11.33}$$

Comparing (11.32) and (11.33), uniqueness of the representation ensures that

$$x' = x\cos\theta - y\sin\theta, \qquad y' = x\sin\theta + y\cos\theta. \tag{11.34}$$

Using a matrix representation once again,

$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}. \tag{11.35}$$

Further combining (11.29) and (11.31), we get

$$x' = R(x) = (e_1\ e_2)\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}. \tag{11.36}$$

The above example demonstrates that the linear transformation R has a (2, 2)
matrix representation shown in (11.36). Moreover, this example obviously shows
that if a vector is expressed as a linear combination of the basis vectors, the
“coordinates” (represented by a column vector) can be transformed as well by the
same matrix.
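A quick numerical illustration of (11.31) and (11.35) — a sketch assuming NumPy, variable names ours:

```python
import numpy as np

theta = np.pi/6
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

xy = np.array([1.0, 0.0])   # coordinates of x with respect to (e1, e2)
print(R @ xy)               # coordinates of x' = R(x), as in (11.35)

# The same R also acts on the basis row (e1 e2) from the right, as in (11.31)
E = np.eye(2)               # columns: e1, e2
print(E @ R)                # columns: e1', e2'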
Regarding an abstract n-dimensional linear vector space Vⁿ, the linear vector transformation A is given by

$$A(x) = (e_1 \cdots e_n)\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \tag{11.37}$$

where $e_1, e_2, \cdots, e_n$ are basis vectors and $x_1, x_2, \cdots, x_n$ are the corresponding coordinates of a vector $x = \sum_{i=1}^n x_i e_i$. We assume that the transformation is a mapping A: Vⁿ → Vⁿ (i.e., an endomorphism). In this case the transformation is represented by an (n, n) matrix. Note that the matrix operates on the basis vectors from the right and on the coordinates (i.e., a column vector) from the left. In (11.37) we often omit the parenthesis to simply write Ax.
Here we mention matrix notation for later convenience. We often identify a linear vector transformation with its representation matrix and denote both transformation and matrix by A. On this occasion, we write

$$A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}, \quad A = (A)_{ij} = (a_{ij}), \ \text{etc.}, \tag{11.38}$$

where with the second expression $(A)_{ij}$ and $(a_{ij})$ represent the matrix A itself, for we frequently use indexed matrix notations such as $A^{-1}$, $A^{\dagger}$, and $\widetilde{A}$. The notation (11.38) can conveniently be used in such cases. Note moreover that $a_{ij}$ represents the matrix A as well.
Equation (11.37) has a duality such that the matrix A operates either on the basis vectors or on the coordinates. This can explicitly be written as

$$A(x) = \left[(e_1 \cdots e_n)\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}\right]\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (e_1 \cdots e_n)\left[\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}\right].$$

That is, we assume the associative law with the above expression. Making a summation representation, we have

$$A(x) = \left(\sum\nolimits_{k=1}^n e_k a_{k1} \cdots \sum\nolimits_{k=1}^n e_k a_{kn}\right)\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (e_1 \cdots e_n)\begin{pmatrix} \sum_{l=1}^n a_{1l}x_l \\ \vdots \\ \sum_{l=1}^n a_{nl}x_l \end{pmatrix} = \sum_{k=1}^n\sum_{l=1}^n e_k a_{kl}x_l = \sum_{l=1}^n\left(\sum_{k=1}^n e_k a_{kl}\right)x_l = \sum_{k=1}^n\left(\sum_{l=1}^n a_{kl}x_l\right)e_k. \tag{11.39}$$

That is, the above equation can be viewed in either of two ways, i.e., coordinate transformation with fixed basis vectors or vector transformation with fixed coordinates.

Also, Eq. (11.37) can formally be written as

$$A(x) = (e_1 \cdots e_n)A\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (e_1 A \cdots e_n A)\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \tag{11.40}$$

where we assumed that the distributive law holds with operation of A on $(e_1 \cdots e_n)$. Meanwhile, if in (11.37) we put $x_i = 1$, $x_j = 0$ $(j \neq i)$, from (11.39) we get

$$A(e_i) = \sum_{k=1}^n e_k a_{ki}.$$

Therefore, (11.39) can be rewritten as

$$A(x) = (A(e_1) \cdots A(e_n))\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \tag{11.41}$$

Since $x_i$ $(1 \le i \le n)$ can arbitrarily be chosen, comparing (11.40) and (11.41) we have

$$e_i A = A(e_i) \quad (1 \le i \le n). \tag{11.42}$$

The matrix representation is unique in reference to the same basis vectors. Suppose that there is another matrix representation of the transformation A such that

$$A(x) = (e_1 \cdots e_n)\begin{pmatrix} a_{11}' & \cdots & a_{1n}' \\ \vdots & \ddots & \vdots \\ a_{n1}' & \cdots & a_{nn}' \end{pmatrix}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \tag{11.43}$$

Subtracting (11.43) from (11.37), we obtain

$$(e_1 \cdots e_n)\left[\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} - \begin{pmatrix} a_{11}' & \cdots & a_{1n}' \\ \vdots & \ddots & \vdots \\ a_{n1}' & \cdots & a_{nn}' \end{pmatrix}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}\right] = (e_1 \cdots e_n)\begin{pmatrix} \sum_{k=1}^n\left(a_{1k}-a_{1k}'\right)x_k \\ \vdots \\ \sum_{k=1}^n\left(a_{nk}-a_{nk}'\right)x_k \end{pmatrix} = 0.$$

 
On the basis of the linear independence of the basis vectors, $\sum_{k=1}^n\left(a_{ik} - a_{ik}'\right)x_k = 0$ $(1 \le i \le n)$. This relationship holds for any arbitrarily and independently chosen complex numbers $x_i$ $(1 \le i \le n)$. Therefore, we must have $a_{ik} = a_{ik}'$ $(1 \le i, k \le n)$, meaning that the matrix representation of A is unique with regard to fixed basis vectors.
Nonetheless, if a set of vectors $e_1, e_2, \cdots, e_n$ does not constitute basis vectors (i.e., those vectors are linearly dependent), the aforementioned uniqueness of the matrix representation loses its meaning. For instance, in V² take vectors e₁ and e₂ such that e₁ = e₂ (i.e., the two vectors are linearly dependent) and let the transformation matrix be $B = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}$. This means that e₁ should remain e₁ after the transformation. At the same time, the vector e₂ (= e₁) should be converted to 2e₂ (= 2e₁). This is impossible except for the case of e₁ = e₂ = 0. The above matrix B in its own right is an object of matrix algebra, of course.
Putting a = b = 0 and c = d = 1 in the definition (11.25) of the linear transformation, we obtain

$$A(\mathbf{0}) = A(\mathbf{0}) + A(\mathbf{0}).$$

Combining this relation with (11.4) gives

$$A(\mathbf{0}) = \mathbf{0}. \tag{11.44}$$

Then do we have a vector u ≠ 0 for which A(u) = 0? The answer is yes. This is because if the (2, 2) matrix $\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$ is chosen for R, we get a linear transformation such that

$$(e_1'\ e_2') = (e_1\ e_2)\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} = (e_1\ \mathbf{0}).$$

That is, we have $R(e_2) = 0$.
In general, vectors x (∈ V) satisfying A(x) = 0 form a subspace in a vector space V. This is because A(x) = 0 and A(y) = 0 ⟹ A(x + y) = A(x) + A(y) = 0 and A(cx) = cA(x) = 0. We call this subspace of maximum dimension a null-space and represent it as Ker A, where Ker stands for “kernel.” In other words, Ker A = A⁻¹(0). Note that this symbolic notation does not ensure the existence of the inverse transformation A⁻¹ (vide infra), but represents the set comprising the elements x that satisfy A(x) = 0. In the above example Ker R = Span{e₂}.
A(Vⁿ) (here Vⁿ is the vector space considered and we assume that A is an endomorphism of Vⁿ) also forms a subspace in Vⁿ. In fact, for any x, y ∈ Vⁿ, we have A(x), A(y) ∈ A(Vⁿ) ⟹ A(x) + A(y) = A(x + y) ∈ A(Vⁿ) and cA(x) = A(cx) ∈ A(Vⁿ). Obviously, A(Vⁿ) ⊂ Vⁿ. The subspace A(Vⁿ) is said to be an image of the

transformation A and is sometimes denoted by Im A. We have a so-called “dimension theorem” expressed as follows:

Theorem 11.3: Dimension Theorem Let Vⁿ be a linear vector space of dimension n. Also let A be an endomorphism: Vⁿ → Vⁿ. Then we have

$$\dim V^n = \dim A(V^n) + \dim\mathrm{Ker}\,A. \tag{11.45}$$

The number dim A(Vⁿ) is said to be the rank of the linear transformation A. That is, we write

$$\dim A(V^n) = \mathrm{rank}\,A.$$

Also the number dim Ker A is said to be the nullity of the linear transformation A. That is, we have

$$\dim\mathrm{Ker}\,A = \mathrm{nullity}\,A.$$

Thus, (11.45) can be written succinctly as

$$\dim V^n = \mathrm{rank}\,A + \mathrm{nullity}\,A.$$

Proof Let $e_1, e_2, \cdots, e_n$ be basis vectors of Vⁿ. First, assume

$$A(e_1) = A(e_2) = \cdots = A(e_n) = 0.$$

This implies that nullity A = n. Then, $A\left(\sum_{i=1}^n x_i e_i\right) = \sum_{i=1}^n x_i A(e_i) = 0$. Since the $x_i$ are arbitrarily chosen, the expression means that A(x) = 0 for ∀x ∈ Vⁿ. This implies A = 0. That is, rank A = 0. Thus, (11.45) certainly holds.
To proceed with the proof of the theorem, we think of a linear combination $\sum_{i=1}^n c_i e_i$. Next, assume that Ker A = Span{e₁, e₂, ⋯, e_ν} (ν < n); dim Ker A = ν. After A is operated on the above linear combination, we are left with $\sum_{i=\nu+1}^n c_i A(e_i)$. We put

$$\sum_{i=1}^n c_i A(e_i) = \sum_{i=\nu+1}^n c_i A(e_i) = 0. \tag{11.46}$$

Suppose that the (n − ν) vectors A(e_i) (ν + 1 ≤ i ≤ n) were linearly dependent. Then, without loss of generality we can assume $c_{\nu+1} \neq 0$. Dividing (11.46) by $c_{\nu+1}$ we obtain

$$A(e_{\nu+1}) + \frac{c_{\nu+2}}{c_{\nu+1}}A(e_{\nu+2}) + \cdots + \frac{c_n}{c_{\nu+1}}A(e_n) = 0, \quad\text{i.e.,}\quad A\left(e_{\nu+1} + \frac{c_{\nu+2}}{c_{\nu+1}}e_{\nu+2} + \cdots + \frac{c_n}{c_{\nu+1}}e_n\right) = 0.$$

Meanwhile, the (ν + 1) vectors $e_1, e_2, \cdots, e_\nu$ and $e_{\nu+1} + \frac{c_{\nu+2}}{c_{\nu+1}}e_{\nu+2} + \cdots + \frac{c_n}{c_{\nu+1}}e_n$ are linearly independent, because $e_1, e_2, \cdots, e_n$ are basis vectors of Vⁿ. This would imply that the dimension of Ker A is ν + 1, in contradiction to Ker A = Span{e₁, e₂, ⋯, e_ν}. Thus, the (n − ν) vectors A(e_i) (ν + 1 ≤ i ≤ n) must be linearly independent.
Let $V^{n-\nu}$ be described as

$$V^{n-\nu} = \mathrm{Span}\{A(e_{\nu+1}), A(e_{\nu+2}), \cdots, A(e_n)\}. \tag{11.47}$$

Then, $V^{n-\nu}$ is a subspace of Vⁿ, and so dim A(Vⁿ) ≥ n − ν = dim $V^{n-\nu}$. Meanwhile, from (11.46) we have

$$A\left(\sum_{i=\nu+1}^n c_i e_i\right) = 0.$$

From the above discussion, however, this relation holds if and only if $c_{\nu+1} = \cdots = c_n = 0$. This implies that Ker A ∩ $V^{n-\nu}$ = {0}. Then, we have

$$V^{n-\nu} + \mathrm{Ker}\,A = V^{n-\nu} \oplus \mathrm{Ker}\,A.$$

Meanwhile, from Theorem 11.2 we have

$$\dim\left[V^{n-\nu} \oplus \mathrm{Ker}\,A\right] = \dim V^{n-\nu} + \dim\mathrm{Ker}\,A = (n - \nu) + \nu = n = \dim V^n.$$

Thus, we must have dim A(Vⁿ) = n − ν = dim $V^{n-\nu}$. Since $V^{n-\nu}$ is a subspace of Vⁿ and $V^{n-\nu} \subset A(V^n)$ from (11.47), the equality of dimensions gives $V^{n-\nu} = A(V^n)$.
To conclude, we get

$$\dim A(V^n) + \dim\mathrm{Ker}\,A = \dim V^n, \qquad V^n = A(V^n) \oplus \mathrm{Ker}\,A. \tag{11.48}$$

This completes the proof. ∎


Comparing Theorem 11.3 with Theorem 11.2, we find that Theorem 11.3 is a
special case of Theorem 11.2. Equations (11.45) and (11.48) play an important role
in the theory of linear vector space.
As an exercise, we have a following example:

Example 11.2 Let $e_1, e_2, e_3, e_4$ be basis vectors of V⁴. Let A be an endomorphism of V⁴ described by

$$A = \begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 \end{pmatrix}.$$

We have

$$(e_1, e_2, e_3, e_4)\begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 \end{pmatrix} = (e_1 + e_3,\ e_2 + e_4,\ e_1 + e_2 + e_3 + e_4,\ \mathbf{0}).$$

That is,

$$A(e_1) = e_1 + e_3, \quad A(e_2) = e_2 + e_4, \quad A(e_3) = e_1 + e_2 + e_3 + e_4, \quad A(e_4) = 0.$$

We have

$$A(-e_1 - e_2 + e_3) = -A(e_1) - A(e_2) + A(e_3) = 0.$$

Then, we find

$$A(V^4) = \mathrm{Span}\{e_1 + e_3,\ e_2 + e_4\}, \quad \mathrm{Ker}\,A = \mathrm{Span}\{-e_1 - e_2 + e_3,\ e_4\}. \tag{11.49}$$

For any x ∈ V⁴, using scalars $c_i$ (1 ≤ i ≤ 4), we have

$$x = c_1 e_1 + c_2 e_2 + c_3 e_3 + c_4 e_4 = \frac{1}{2}(c_1 + c_3)(e_1 + e_3) + \frac{1}{2}(2c_2 + c_3 - c_1)(e_2 + e_4) + \frac{1}{2}(c_3 - c_1)(-e_1 - e_2 + e_3) + \frac{1}{2}(c_1 - 2c_2 - c_3 + 2c_4)e_4. \tag{11.50}$$

Thus, x has been uniquely represented as (11.50) with respect to the basis vectors $(e_1 + e_3)$, $(e_2 + e_4)$, $(-e_1 - e_2 + e_3)$, and $e_4$. The linear independence of these vectors can easily be checked by equating (11.50) with zero. We also confirm that

[Fig. 11.3 Concept of mapping from a set X to another set Y: injective, surjective, and bijective (= injective + surjective).]
 
$$V^4 = A(V^4) \oplus \mathrm{Ker}\,A.$$
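Example 11.2 and the dimension theorem can be confirmed numerically. A sketch assuming NumPy/SciPy; scipy.linalg.null_space returns an orthonormal basis of Ker A, which spans the same kernel as the vectors quoted in (11.49):

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1, 0, 1, 0],
              [0, 1, 1, 0],
              [1, 0, 1, 0],
              [0, 1, 1, 0]], dtype=float)

rank = np.linalg.matrix_rank(A)              # dim A(V^4) = 2
N = null_space(A)                            # N.shape[1] = nullity = 2
print(rank, N.shape[1], rank + N.shape[1])   # 2 2 4 = dim V^4

# -e1 - e2 + e3 and e4 indeed lie in Ker A
print(A @ np.array([-1, -1, 1, 0.]), A @ np.array([0, 0, 0, 1.]))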

Linear transformation is a kind of mapping. Figure 11.3 depicts the concept of mapping. Suppose two sets X and Y. The mapping f is a correspondence between an element x (∈ X) and y (∈ Y). The set f(X) (⊂ Y) is said to be the range of f. (i) The mapping f is injective: if x₁ ≠ x₂ ⟹ f(x₁) ≠ f(x₂). (ii) The mapping is surjective: f(X) = Y; for ∀y ∈ Y corresponding element(s) x ∈ X exist(s). (iii) The mapping is bijective: if the mapping f is both injective and surjective, it is said to be bijective (or a reversible or invertible mapping). A mapping that is not invertible is said to be non-invertible.
If the mapping f is bijective, a unique element x ∈ X exists for ∀y ∈ Y such that f(x) = y. In terms of solving an equation, we say that with any given y we can find a unique solution x to the equation f(x) = y. In this case x is said to be the inverse element to y, and this is denoted by x = f⁻¹(y). The mapping f⁻¹ is called the inverse mapping. If a linear transformation is relevant, the mapping is said to be an inverse transformation.
Here we focus on a case where both X and Y form a vector space and the mapping is an endomorphism. Regarding the linear transformation A: Vⁿ → Vⁿ (i.e., an endomorphism of Vⁿ), we have the following important theorem:

Theorem 11.4 Let A: Vⁿ → Vⁿ be an endomorphism of Vⁿ. A necessary and sufficient condition for the existence of an inverse transformation to A (i.e., A⁻¹) is A⁻¹(0) = {0}.

Proof Suppose A⁻¹(0) = {0}. Then A(x₁) = A(x₂) ⟺ A(x₁ − x₂) = 0 ⟺ x₁ − x₂ = 0, i.e., x₁ = x₂. This implies that the transformation A is injective. The other way round, suppose that A is injective. If A⁻¹(0) ≠ {0}, there should be b (≠ 0) with which A(b) = 0. This is, however, in contradiction to A being injective. Then we must have A⁻¹(0) = {0}. Thus, A⁻¹(0) = {0} ⟺ A is injective.
Meanwhile, A⁻¹(0) = {0} ⟺ dim A⁻¹(0) = 0 ⟺ dim A(Vⁿ) = n (due to Theorem 11.3), i.e., A(Vⁿ) = Vⁿ. Then A⁻¹(0) = {0} ⟺ A is surjective. Combining this with the abovementioned statement, we have A⁻¹(0) = {0} ⟺ A is bijective. This statement is equivalent to the existence of an inverse transformation.
In the proof of Theorem 11.4, to show that A⁻¹(0) = {0} ⟺ A is surjective, we have used the dimension theorem (Theorem 11.3), for which the relevant vector

space is finite (i.e., n-dimensional). In other words, that A is surjective is equivalent


to that A is injective with a finite-dimensional vector space, and vice versa. To
conclude, so far as we are thinking of the endomorphism of a finite-dimensional
vector space, if we can show it is either injective or surjective, the other necessarily
follows and, hence, the mapping is bijective. ∎

11.3 Inverse Matrices and Determinants

The existence of the inverse transformation plays a particularly important role in the theory of linear vector spaces. The inverse transformation is a linear transformation. Let $x_1 = A^{-1}(y_1)$ and $x_2 = A^{-1}(y_2)$. Also, we have $A(c_1x_1 + c_2x_2) = c_1A(x_1) + c_2A(x_2) = c_1y_1 + c_2y_2$. Thus, $c_1x_1 + c_2x_2 = A^{-1}(c_1y_1 + c_2y_2) = c_1A^{-1}(y_1) + c_2A^{-1}(y_2)$, showing that $A^{-1}$ is a linear transformation. As already mentioned, a matrix that represents a linear transformation A is uniquely determined with respect to fixed basis vectors. This should be the case with $A^{-1}$ accordingly. We have an important theorem for this.

Theorem 11.5 [1] The necessary and sufficient condition for the matrix $A^{-1}$ that represents the inverse transformation to A to exist is that det A ≠ 0 (“det” means a determinant). Here the matrix A represents the linear transformation A. The matrix $A^{-1}$ is uniquely determined and given by

$$\left(A^{-1}\right)_{ij} = (-1)^{i+j}(M)_{ji}/(\det A), \tag{11.51}$$

where $(M)_{ij}$ is the minor of det A corresponding to the element $A_{ij}$.


Proof First, we suppose that the matrix $A^{-1}$ exists so that it satisfies the following relation:

$$\sum_{k=1}^n\left(A^{-1}\right)_{ik}(A)_{kj} = \delta_{ij} \quad (1 \le i, j \le n). \tag{11.52}$$

On this condition, suppose that det A = 0. From the properties of determinants this implies that one of the columns of A (let it be the m-th column) can be expressed as a linear combination of the other columns of A such that

$$A_{km} = \sum_{j \neq m} A_{kj}c_j. \tag{11.53}$$

Putting i = m in (11.52), multiplying by $c_j$, and summing over j ≠ m, we get



 
$$\sum_{k=1}^n\left(A^{-1}\right)_{mk}\sum_{j \neq m} A_{kj}c_j = \sum_{j \neq m}\delta_{mj}c_j = 0. \tag{11.54}$$

From (11.52) and (11.53), on the other hand, we obtain

$$\sum_{k=1}^n\left(A^{-1}\right)_{mk}\sum_{j \neq m} A_{kj}c_j = \left\{\sum_{k=1}^n\left(A^{-1}\right)_{ik}(A)_{km}\right\}_{i=m} = 1. \tag{11.55}$$

There is an inconsistency between (11.54) and (11.55). The inconsistency resulted from the supposition that the matrix $A^{-1}$ exists. Therefore, we conclude that if det A = 0, $A^{-1}$ does not exist. Taking the contraposition of the above statement, we say that if $A^{-1}$ exists, then det A ≠ 0. Suppose next that det A ≠ 0. In this case, on the basis of the well-established result, a unique $A^{-1}$ exists and it is given by (11.51). This completes the proof. ∎
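Formula (11.51) translates directly into code. A brute-force sketch assuming NumPy (fine for small n; the function name is ours), compared against numpy.linalg.inv:

```python
import numpy as np

def inv_by_cofactors(A):
    """(A^{-1})_{ij} = (-1)^{i+j} (M)_{ji} / det A, as in (11.51)."""
    n = A.shape[0]
    det = np.linalg.det(A)
    inv = np.empty_like(A, dtype=float)
    for i in range(n):
        for j in range(n):
            # (M)_{ji}: delete row j and column i
            minor = np.delete(np.delete(A, j, axis=0), i, axis=1)
            inv[i, j] = (-1)**(i + j) * np.linalg.det(minor) / det
    return inv

A = np.array([[2., 1., 0.], [1., 3., 1.], [0., 1., 2.]])
print(np.allclose(inv_by_cofactors(A), np.linalg.inv(A)))  # -> True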
Summarizing the characteristics of the endomorphism within a finite-dimensional vector space, we have

injective ⟺ surjective ⟺ bijective ⟺ det A ≠ 0.

Let a matrix A be

$$A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}.$$

The determinant of the matrix A is denoted by det A or by

$$\begin{vmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{vmatrix}.$$

The determinant is defined as

$$\det A \equiv \sum_{\sigma=\left(\begin{smallmatrix} 1 & 2 & \cdots & n \\ i_1 & i_2 & \cdots & i_n \end{smallmatrix}\right)}\varepsilon(\sigma)\,a_{1i_1}a_{2i_2}\cdots a_{ni_n}, \tag{11.56}$$

where σ runs over the permutations of 1, 2, …, n and ε(σ) denotes a sign of + (in the case of even permutations) or − (in the case of odd permutations).
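Definition (11.56) can be transcribed almost literally. A sketch assuming NumPy and itertools — exponential in n, so for illustration only:

```python
import numpy as np
from itertools import permutations

def det_by_permutations(A):
    """det A = sum over sigma of eps(sigma) * a_{1,i1} ... a_{n,in}, cf. (11.56)."""
    n = A.shape[0]
    total = 0.0
    for sigma in permutations(range(n)):
        # parity via inversion count gives eps(sigma)
        inv = sum(sigma[a] > sigma[b] for a in range(n) for b in range(a+1, n))
        term = (-1.0)**inv
        for row, col in enumerate(sigma):
            term *= A[row, col]
        total += term
    return total

A = np.random.rand(5, 5)
print(np.isclose(det_by_permutations(A), np.linalg.det(A)))  # -> True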
We deal with triangle matrices for future discussion. An upper triangle matrix is denoted by

$$T = \begin{pmatrix} a_{11} & & * \\ & \ddots & \\ \text{\Large 0} & & a_{nn} \end{pmatrix}, \tag{11.57}$$

where the asterisk (*) means that the upper right off-diagonal elements can take any complex numbers (including zero), and the large zero shows that all the lower left off-diagonal elements are zero. Its determinant is given by

$$\det T = a_{11}a_{22}\cdots a_{nn}. \tag{11.58}$$

In fact, focusing on $a_{ni_n}$ we notice that only if $i_n = n$ does $a_{ni_n}$ not vanish. Then we get

$$\det T = \sum_{\sigma=\left(\begin{smallmatrix} 1 & 2 & \cdots & n-1 \\ i_1 & i_2 & \cdots & i_{n-1} \end{smallmatrix}\right)}\varepsilon(\sigma)\,a_{1i_1}a_{2i_2}\cdots a_{n-1\,i_{n-1}}\,a_{nn}. \tag{11.59}$$

Repeating this process, we finally obtain (11.58).


The endomorphic linear transformation can be described succinctly as

AðxÞ ¼ y, ð11:60Þ

where we have vectors such that x = ∑ni¼1 xi ei and y = ∑ni¼1 yi ei . In reference to the
same set of basis vectors (e1  en) and using a matrix representation, we have
0 10 1 0 1
a11    a1n x1 y1
B CB C B C
AðxÞ ¼ ðe1   en Þ@ ⋮ ⋱ ⋮ A@ ⋮ A ¼ ðe1   en Þ@ ⋮ A: ð11:61Þ
an1    ann xn yn

From the unique representation of a vector in reference to the basis vectors, (11.61) is
simply expressed as
0 10 1 0 1
a11    a1n x1 y1
B CB C B C
@⋮ ⋱ ⋮ A@ ⋮ A ¼ @ ⋮ A : ð11:62Þ
an1    ann xn yn

With a shorthand notation, we have


Xn
yi ¼ a x
k¼1 ik k
ð1  i  nÞ: ð11:63Þ

From the above discussion, for the linear transformation A to be bijective, det A ≠ 0. In terms of the system of linear equations, we say that for (11.62) to have a unique solution $\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$ for a given $\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}$, we must have det A ≠ 0. Conversely, det A = 0 is equivalent to (11.62) having indefinite solutions or no solution. As far as the matrix algebra is concerned, (11.61) is symbolically described by omitting the parenthesis as

$$Ax = y. \tag{11.64}$$

However, when the vector transformation is explicitly taken into account, the full representation of (11.61) should be borne in mind.
The relations (11.60) and (11.64) can be considered as a set of simultaneous equations. A necessary and sufficient condition for (11.64) to have a unique solution x for a given y is det A ≠ 0. In that case, the solution x of (11.64) can be symbolically expressed as

$$x = A^{-1}y, \tag{11.65}$$

where $A^{-1}$ represents the inverse matrix of A.


Example 11.3 Think of a three-dimensional rotation by θ in ℝ³ around the z-axis. The relevant transformation matrix is

$$R = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

As det R = 1 ≠ 0, the transformation is bijective. This means that for ∀y ∈ ℝ³ there is always a corresponding x ∈ ℝ³. This x can be found by solving Rx = y, i.e., x = R⁻¹y. Putting x = xe₁ + ye₂ + ze₃ and y = x′e₁ + y′e₂ + z′e₃, a matrix representation is given by

$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = R^{-1}\begin{pmatrix} x' \\ y' \\ z' \end{pmatrix} = \begin{pmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} x' \\ y' \\ z' \end{pmatrix}.$$

Thus, x can be obtained by rotating y by −θ.


Example 11.4 Think of the following matrix that represents a linear transformation P:

$$P = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}. \tag{11.66}$$

This matrix transforms a vector x = xe₁ + ye₂ + ze₃ into y = xe₁ + ye₂ as follows:

[Fig. 11.4 Example of an endomorphism P: ℝ³ → ℝ³.]

$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x \\ y \\ 0 \end{pmatrix}. \tag{11.67}$$

In this example we are thinking of an endomorphism P: ℝ³ → ℝ³. Geometrically, it can be viewed as in Fig. 11.4. Let us think of (11.67) from the point of view of solving a system of linear equations and newly consider the next equation. In other words, we are thinking of finding x, y, and z with given a, b, and c in (11.68):

$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} a \\ b \\ c \end{pmatrix}. \tag{11.68}$$

If c = 0, we can readily find a solution x = a, y = b, but z can be any (complex) number; we thus have indefinite solutions. If c ≠ 0, we have no solution. The former situation reflects the fact that the transformation represented by P is not injective. Meanwhile, the latter reflects the fact that the transformation is not surjective. Remember that as det P = 0, the transformation is neither injective nor surjective.

11.4 Basis Vectors and Their Transformations

In the previous sections we showed that a vector is uniquely represented as a column vector in reference to a set of fixed basis vectors. The representation, however, will be changed under a different set of basis vectors.
First let us think of a linear transformation of a set of basis vectors $e_1, e_2, \cdots, e_n$. The transformation matrix A representing a linear transformation A is defined as follows:

$$A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}.$$

Notice here that we often denote both a linear transformation and its corresponding matrix by the same character. After the transformation, suppose that the resulting vectors are given by $e_1', e_2', \cdots, e_n'$. This is explicitly described as

$$(e_1 \cdots e_n)\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix} = (e_1' \cdots e_n'). \tag{11.69}$$

With a shorthand notation, we have

$$e_i' = \sum_{k=1}^n a_{ki}e_k \quad (1 \le i \le n). \tag{11.70}$$

Care should be taken not to confuse (11.70) with (11.63). Here, the set of vectors $e_1', e_2', \cdots, e_n'$ may or may not be linearly independent. Let us operate with both sides of (11.70) from the left on $\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$ and equate both sides to zero. That is,

$$(e_1 \cdots e_n)\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (e_1' \cdots e_n')\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = 0.$$

Since $e_1, e_2, \cdots, e_n$ are the basis vectors, we get

$$\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = 0. \tag{11.71}$$

Meanwhile, we must have $\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = 0$ so that $e_1', e_2', \cdots, e_n'$ can be linearly independent (i.e., so as to be a set of basis vectors). But this means that (11.71) has such a unique (and trivial) solution and, hence, det A ≠ 0. If conversely det A ≠ 0, (11.71) has a unique trivial solution and $e_1', e_2', \cdots, e_n'$ are linearly independent. Thus, a necessary and sufficient condition for $e_1', e_2', \cdots, e_n'$ to be a set of basis vectors is det A ≠ 0. If det A = 0, (11.71) has indefinite solutions (including a trivial solution)

and $e_1', e_2', \cdots, e_n'$ are linearly dependent, and vice versa. In case det A ≠ 0, an inverse matrix $A^{-1}$ exists, and so we have

$$(e_1 \cdots e_n) = (e_1' \cdots e_n')A^{-1}. \tag{11.72}$$

In the previous steps we saw how the linear transformation (and the corresponding matrix representation) converts a set of basis vectors $(e_1 \cdots e_n)$ to another set of basis vectors $(e_1' \cdots e_n')$. Is it possible, then, to find a suitable transformation between two sets of arbitrarily chosen basis vectors? The answer is yes. This is because any vector can be expressed uniquely by a linear combination of any set of basis vectors. A whole array of such linear combinations uniquely defines a transformation matrix between the two sets of basis vectors as expressed in (11.69) and (11.72). The matrix has a nonzero determinant and has an inverse matrix. The concept of the transformation between basis vectors is important and very often used in various fields of natural science.
Example 11.5 We revisit Example 11.2. The relation (11.49) tells us that the basis vectors of A(V⁴) and those of Ker A span V⁴ in total. Therefore, in light of the above argument, there should be a linear transformation R between the two sets of vectors, i.e., $e_1, e_2, e_3, e_4$ and $e_1 + e_3,\ e_2 + e_4,\ -e_1 - e_2 + e_3,\ e_4$. Moreover, the matrix R associated with the linear transformation must be non-singular (i.e., det R ≠ 0). In fact, we find that R is expressed as

$$R = \begin{pmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & -1 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{pmatrix}.$$

This is because we have the following relation between the two sets of basis vectors:

$$(e_1, e_2, e_3, e_4)\begin{pmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & -1 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{pmatrix} = (e_1 + e_3,\ e_2 + e_4,\ -e_1 - e_2 + e_3,\ e_4).$$

We have det R = 2 ≠ 0 as expected.
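This change of basis is easy to check numerically: the columns of R are the coordinates of the new basis vectors with respect to (e₁ e₂ e₃ e₄). A sketch assuming NumPy:

```python
import numpy as np

R = np.array([[1, 0, -1, 0],
              [0, 1, -1, 0],
              [1, 0,  1, 0],
              [0, 1,  0, 1]], dtype=float)

E = np.eye(4)            # columns: e1, e2, e3, e4
print(E @ R)             # columns: e1+e3, e2+e4, -e1-e2+e3, e4
print(np.linalg.det(R))  # -> 2.0, non-singular as expected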


Next, let us consider successive linear transformations of vectors. Again we
assume that the transformations are endomorphism:V n ! V n. We have to take into
account transformations of basis vectors along with the targeted vectors. First we
choose a transformation by a non-singular matrix (having a nonzero determinant) for
the subsequent transformation to have a unique matrix representation (vide supra).
The vector transformation by P is expressed as

$$P(x) = (e_1 \cdots e_n)\begin{pmatrix} p_{11} & \cdots & p_{1n} \\ \vdots & \ddots & \vdots \\ p_{n1} & \cdots & p_{nn} \end{pmatrix}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \tag{11.73}$$

where the non-singular matrix P represents the transformation P. Notice here that the transformation and its matrix are represented by the same P. As mentioned in Sect. 11.2, the matrix P can be operated either from the right on the basis vectors or from the left on the column vector. We explicitly write

$$P(x) = \left[(e_1 \cdots e_n)\begin{pmatrix} p_{11} & \cdots & p_{1n} \\ \vdots & \ddots & \vdots \\ p_{n1} & \cdots & p_{nn} \end{pmatrix}\right]\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (e_1' \cdots e_n')\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \tag{11.74}$$

where $(e_1' \cdots e_n') = (e_1 \cdots e_n)P$ [here P is the non-singular matrix defined in (11.73)].
Alternatively, we have
$$P(x) = (e_1 \cdots e_n)\left[\begin{pmatrix} p_{11} & \cdots & p_{1n} \\ \vdots & \ddots & \vdots \\ p_{n1} & \cdots & p_{nn} \end{pmatrix}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}\right] = (e_1 \cdots e_n)\begin{pmatrix} x_1' \\ \vdots \\ x_n' \end{pmatrix}, \tag{11.75}$$

where

$$\begin{pmatrix} x_1' \\ \vdots \\ x_n' \end{pmatrix} = \begin{pmatrix} p_{11} & \cdots & p_{1n} \\ \vdots & \ddots & \vdots \\ p_{n1} & \cdots & p_{nn} \end{pmatrix}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \tag{11.76}$$

Equation (11.75) gives a column vector representation of the vector obtained by the transformation P, viewed in reference to the basis vectors $(e_1 \cdots e_n)$. Combining (11.74) and (11.75), we get

$$P(x) = (e_1' \cdots e_n')\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (e_1 \cdots e_n)\begin{pmatrix} x_1' \\ \vdots \\ x_n' \end{pmatrix}. \tag{11.77}$$

We further make another linear transformation A: Vⁿ → Vⁿ. In this case the corresponding matrix A may be non-singular (i.e., det A ≠ 0) or singular (det A = 0). We have to distinguish the matrix representations of the two cases, because the matrix representations are uniquely defined in reference to an individual set of basis vectors; see (11.37) and (11.43). Let us denote the matrices $A_O$ and A′ with respect to the basis vectors $(e_1 \cdots e_n)$ and $(e_1' \cdots e_n')$, respectively. Then, A[P(x)] can be described in two different ways as follows:

$$A[P(x)] = (e_1' \cdots e_n')A'\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (e_1 \cdots e_n)A_O\begin{pmatrix} x_1' \\ \vdots \\ x_n' \end{pmatrix}. \tag{11.78}$$

This can be rewritten in reference to the linearly independent set of vectors $e_1, \cdots, e_n$ as

$$A[P(x)] = [(e_1 \cdots e_n)P]A'\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (e_1 \cdots e_n)\left[A_O P\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}\right]. \tag{11.79}$$

As (11.79) is described for a vector $x = \sum_{i=1}^n x_i e_i$ arbitrarily chosen in Vⁿ, we get

$$PA' = A_O P. \tag{11.80}$$

Since P is non-singular, we finally obtain

$$A' = P^{-1}A_O P. \tag{11.81}$$

We can see (11.79) from the point of view of successive linear transformations. When the subsequent operation is viewed in reference to the basis vectors $e_1', \cdots, e_n'$ newly reached by the precedent transformation, we make it a rule to write the relevant subsequent operator A′ from the right. In the case where the subsequent operation is viewed in reference to the original basis vectors, on the other hand, we write the subsequent operator $A_O$ from the left. Further discussion and examples can be seen in Part IV.
We see (11.81) in a different manner. Suppose we have

$$A(x) = (e_1 \cdots e_n)A_O\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \tag{11.82}$$

Note that since the transformation A has been performed in reference to the basis vectors $(e_1 \cdots e_n)$, $A_O$ should be used for the matrix representation. This is rewritten as

$$A(x) = (e_1 \cdots e_n)PP^{-1}A_O PP^{-1}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \tag{11.83}$$

Meanwhile, any vector x in Vⁿ can be written as

$$x = (e_1 \cdots e_n)\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (e_1 \cdots e_n)PP^{-1}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (e_1' \cdots e_n')P^{-1}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (e_1' \cdots e_n')\begin{pmatrix} \tilde{x}_1 \\ \vdots \\ \tilde{x}_n \end{pmatrix}. \tag{11.84}$$

In (11.84) we put

$$(e_1 \cdots e_n)P = (e_1' \cdots e_n'), \qquad P^{-1}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} \tilde{x}_1 \\ \vdots \\ \tilde{x}_n \end{pmatrix}. \tag{11.85}$$

Equation (11.84) gives column vector representations of the same vector x viewed in reference to the basis set $e_1, \cdots, e_n$ or $e_1', \cdots, e_n'$. Equation (11.85) should not be confused with (11.76). That is, (11.76) relates the two coordinates $\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$ and $\begin{pmatrix} x_1' \\ \vdots \\ x_n' \end{pmatrix}$ of different vectors, before and after the transformation, viewed in reference to the same set of basis vectors. The relation (11.85), on the other hand, relates the two coordinates $\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$ and $\begin{pmatrix} \tilde{x}_1 \\ \vdots \\ \tilde{x}_n \end{pmatrix}$ of the same vector viewed in reference to different sets of basis vectors. Thus, from (11.83) we have

$$A(x) = (e_1' \cdots e_n')P^{-1}A_O P\begin{pmatrix} \tilde{x}_1 \\ \vdots \\ \tilde{x}_n \end{pmatrix}. \tag{11.86}$$

Meanwhile, viewing the transformation A in reference to the basis vectors $(e_1' \cdots e_n')$, we have

$$A(x) = (e_1' \cdots e_n')A'\begin{pmatrix} \tilde{x}_1 \\ \vdots \\ \tilde{x}_n \end{pmatrix}. \tag{11.87}$$

Equating (11.86) and (11.87), we get

$$A' = P^{-1}A_O P. \tag{11.88}$$

Thus, (11.81) is recovered.


The relations expressed by (11.81) and (11.88) are called a similarity transformation on $A_O$. The matrices $A_O$ and A′ are said to be similar to each other. As mentioned earlier, if A′ (and hence $A_O$) is non-singular, the linear transformation A produces a set of basis vectors other than $e_1', \cdots, e_n'$, say $e_1'', \cdots, e_n''$. We write this symbolically as

$$(e_1 \cdots e_n)PA = (e_1' \cdots e_n')A = (e_1'' \cdots e_n''). \tag{11.89}$$

Therefore, such A defines successive transformations of the basis vectors in conjunction with P defined in (11.73). The successive transformations and resulting basis vectors supply us with important applications in the field of group theory. The topics will be dealt with in Part IV.

Reference

1. Dennery P, Krzywicki A (1996) Mathematics for physicists. Dover, New York


Chapter 12
Canonical Forms of Matrices

In Sect. 11.4 we saw that the transformation matrices are altered depending on the
basis vectors we choose. Then a following question arises. Can we convert a
(transformation) matrix to as simple a form as possible by similarity transforma-
tion(s)? In Sect. 11.4 we have also shown that if we have two sets of basis vectors in
a linear vector space V n we can always find a non-singular transformation matrix
between the two. In conjunction with the transformation of the basis vectors, the
matrix undergoes similarity transformation. It is our task in this chapter to find a
simple form or a specific form (i.e., canonical form) of a matrix as a result of the
similarity transformation. For this purpose, we should first find eigenvalue(s) and
corresponding eigenvector(s) of the matrix. Depending upon the nature of matrices,
we get various canonical forms of matrices such as a triangle matrix and a diagonal
matrix. Regarding any form of matrices, we can treat these matrices under a unified
form called the Jordan canonical form.

12.1 Eigenvalues and Eigenvectors

An eigenvalue problem is one of the important subjects of the theory of linear vector spaces. Let A be a linear transformation on Vⁿ. The resulting matrix can be brought to one of several different canonical forms. A typical example is a diagonal matrix. To reach a satisfactory answer, we start with the so-called eigenvalue problem.
Suppose that after the transformation of x we have

$$A(x) = \alpha x, \tag{12.1}$$

where α is a certain (complex) number. Then we say that α is an eigenvalue and that
x is an eigenvector that corresponds to the eigenvalue α. Using a notation of (11.37)
of Sect. 11.2, we have


$$A(x) = (e_1 \cdots e_n)\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \alpha(e_1 \cdots e_n)\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.$$

From linear independence of $e_1, e_2, \cdots, e_n$, we simply write

$$\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \alpha\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.$$

If we identify x with $\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$ at fixed basis vectors $(e_1 \cdots e_n)$, we may naturally rewrite (12.1) as

$$Ax = \alpha x. \tag{12.2}$$

If $x_1$ and $x_2$ belong to the eigenvalue α, so do $x_1 + x_2$ and $cx_1$ (c is an appropriate complex number). Therefore, all the eigenvectors belonging to the eigenvalue α along with 0 (a zero vector) form a subspace (within Vⁿ) corresponding to the eigenvalue α.
Strictly speaking, we should use terminologies such as a “proper” (or ordinary) eigenvalue, eigenvector, and eigenspace to distinguish them from a “generalized” eigenvalue, eigenvector, and eigenspace. We will return to this point later. Further rewriting (12.2) we have

$$(A - \alpha E)x = 0, \tag{12.3}$$

where E is an (n, n) unit matrix. Equations (12.2) and (12.3) are said to be an eigenvalue equation (or eigenequation). In (12.2) or (12.3), x = 0 always holds (as a trivial solution). Consequently, for x ≠ 0 to be a solution we must have

$$|A - \alpha E| = 0. \tag{12.4}$$

In (12.4), |A − αE| stands for det(A − αE).
Now let us define the following polynomial:

$$f_A(x) = |xE - A| = \begin{vmatrix} x - a_{11} & \cdots & -a_{1n} \\ \vdots & \ddots & \vdots \\ -a_{n1} & \cdots & x - a_{nn} \end{vmatrix}. \tag{12.5}$$

A necessary and sufficient condition for α to be an eigenvalue is that α is a root of $f_A(x) = 0$. The function $f_A(x)$ is said to be a characteristic polynomial and we call $f_A(x) = 0$ a characteristic equation. This is an n-th order polynomial. Putting

$$f_A(x) = x^n + a_1 x^{n-1} + \cdots + a_n, \tag{12.6}$$

we have

$$a_1 = -(a_{11} + a_{22} + \cdots + a_{nn}) \equiv -\mathrm{Tr}\,A, \tag{12.7}$$

where Tr stands for “trace,” that is, a summation of the diagonal elements. Moreover,

$$a_n = (-1)^n|A|. \tag{12.8}$$

The characteristic equation $f_A(x) = 0$ has n roots including possible multiple roots. Let those roots be $\alpha_1, \cdots, \alpha_n$ (some of them may be identical). Then we have

$$f_A(x) = \prod_{i=1}^n(x - \alpha_i). \tag{12.9}$$

Furthermore, according to the relations between roots and coefficients we get

$$\alpha_1 + \cdots + \alpha_n = -a_1 = \mathrm{Tr}\,A, \tag{12.10}$$

$$\alpha_1\cdots\alpha_n = (-1)^n a_n = |A|. \tag{12.11}$$

The characteristic polynomial $f_A(x)$ is invariant under a similarity transformation. In fact,

$$f_{P^{-1}AP}(x) = \left|xE - P^{-1}AP\right| = \left|P^{-1}(xE - A)P\right| = |P|^{-1}|xE - A||P| = |xE - A| = f_A(x). \tag{12.12}$$

This leads to invariance of the trace under a similarity transformation. That is,

$$\mathrm{Tr}\left(P^{-1}AP\right) = \mathrm{Tr}\,A. \tag{12.13}$$

Let us think of the following triangle matrix:

T = \begin{pmatrix} a_{11} & & * \\ & \ddots & \\ 0 & & a_{nn} \end{pmatrix}.  (12.14)

A matrix of this type is regarded as one of the canonical forms of matrices. Its characteristic polynomial f_T(x) is

f_T(x) = |xE − T| = \begin{vmatrix} x - a_{11} & & * \\ 0 & \ddots & \\ 0 & 0 & x - a_{nn} \end{vmatrix}.  (12.15)

Therefore, we get

f_T(x) = ∏_{i=1}^{n} (x − a_{ii}),  (12.16)

where we used (11.58). From (12.14) and (12.16), we find accordingly that the eigenvalues of a triangle matrix are given by its diagonal elements. Our immediate task will be to examine whether and how a given matrix is converted to a triangle matrix through a similarity transformation. The following theorem is important.

Theorem 12.1 Every (n, n) square matrix can be converted to a triangle matrix by a similarity transformation [1].
Proof A triangle matrix is either an "upper" triangle matrix [in which all the lower left off-diagonal elements are zero; see (12.14)] or a "lower" triangle matrix (in which all the upper right off-diagonal elements are zero). In the present case we show the proof for the upper triangle matrix; for the lower triangle matrix the theorem is proven in a similar manner.

The proof proceeds by mathematical induction. First we show that the theorem is true for a (2, 2) matrix. Suppose that one of the eigenvalues of A_2 is α_1 and that its corresponding eigenvector is x_1. Then we have

A_2 x_1 = α_1 x_1,  (12.17)

where we assume that x_1 represents a column vector. Let a non-singular matrix be

P_1 = (x_1 p_1),  (12.18)

where p_1 is another column vector chosen in such a way that x_1 and p_1 are linearly independent, so that P_1 can be a non-singular matrix. Then, we have

P_1^{-1} A_2 P_1 = (P_1^{-1} A_2 x_1  P_1^{-1} A_2 p_1).  (12.19)

Meanwhile,

x_1 = (x_1 p_1) \begin{pmatrix} 1 \\ 0 \end{pmatrix} = P_1 \begin{pmatrix} 1 \\ 0 \end{pmatrix}.

Hence, we have

P_1^{-1} A_2 x_1 = α_1 P_1^{-1} x_1 = α_1 P_1^{-1} P_1 \begin{pmatrix} 1 \\ 0 \end{pmatrix} = α_1 \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} \alpha_1 \\ 0 \end{pmatrix}.  (12.20)

Thus (12.19) can be rewritten as

P_1^{-1} A_2 P_1 = \begin{pmatrix} \alpha_1 & * \\ 0 & * \end{pmatrix}.  (12.21)

This shows that Theorem 12.1 is true of a (2, 2) matrix A2.


Now let us show that Theorem 12.1 holds in the general case, i.e., for a (n, n) square matrix A_n. Let α_n be one of the eigenvalues of A_n. On the basis of the argument for the (2, 2) matrix case, suppose that after a suitable similarity transformation by a non-singular matrix P̃ we have

Ã_n = (P̃)^{-1} A_n P̃.  (12.22)

Then, we can describe Ã_n as

Ã_n = \begin{pmatrix} \alpha_n & * \\ 0 & A_{n-1} \end{pmatrix},  (12.23)

where A_{n−1} is a (n − 1, n − 1) matrix.

In (12.23), α_n is one of the eigenvalues of A_n. To show that (12.23) is valid, we use a similar argument as in the case of the (2, 2) matrix. That is, we set P̃ such that

P̃ = (a_n p_1 p_2 ⋯ p_{n−1}),

where a_n is an eigenvector corresponding to α_n and P̃ is a non-singular matrix formed by the n linearly independent column vectors a_n, p_1, p_2, ⋯, and p_{n−1}. Then we have

Ã_n = (P̃)^{-1} A_n P̃ = ((P̃)^{-1} A_n a_n  (P̃)^{-1} A_n p_1  (P̃)^{-1} A_n p_2 ⋯ (P̃)^{-1} A_n p_{n−1}).  (12.24)

The vector a_n can be expressed as

a_n = (a_n p_1 p_2 ⋯ p_{n−1}) \begin{pmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix} = P̃ \begin{pmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}.

Therefore, we have

(P̃)^{-1} A_n a_n = (P̃)^{-1} α_n a_n = (P̃)^{-1} α_n P̃ \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} = α_n \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} = \begin{pmatrix} \alpha_n \\ 0 \\ \vdots \\ 0 \end{pmatrix}.

Thus, from (12.24) we see that one can express the matrix form of Ã_n as (12.23). By the hypothesis of the mathematical induction, we assume that there exist a (n − 1, n − 1) non-singular square matrix P_{n−1} and an upper triangle matrix Δ_{n−1} such that

P_{n−1}^{-1} A_{n−1} P_{n−1} = Δ_{n−1}.  (12.25)

Let us define the following matrix:

P_n = \begin{pmatrix} d & 0 \cdots 0 \\ 0 & \\ \vdots & P_{n-1} \\ 0 & \end{pmatrix},

where d ≠ 0. The matrix P_n is a (n, n) non-singular square matrix; remember that det P_n = d (det P_{n−1}) ≠ 0. Operating with P_n on Ã_n from the right, we have

Ã_n P_n = \begin{pmatrix} \alpha_n & x_{n-1}^T \\ 0 & A_{n-1} \end{pmatrix} \begin{pmatrix} d & 0 \\ 0 & P_{n-1} \end{pmatrix} = \begin{pmatrix} \alpha_n d & x_{n-1}^T P_{n-1} \\ 0 & A_{n-1} P_{n-1} \end{pmatrix},  (12.26)

where x_{n−1}^T is the transpose of a column vector (x_1 ⋯ x_{n−1})^T. Therefore, x_{n−1}^T P_{n−1} is a (1, n − 1) matrix (i.e., a row vector). Meanwhile, we have

P_n Δ_n = \begin{pmatrix} d & 0 \\ 0 & P_{n-1} \end{pmatrix} \begin{pmatrix} \alpha_n & * \\ 0 & \Delta_{n-1} \end{pmatrix} = \begin{pmatrix} \alpha_n d & d\,* \\ 0 & P_{n-1}\Delta_{n-1} \end{pmatrix}.  (12.27)

From the assumption (12.25), A_{n−1}P_{n−1} = P_{n−1}Δ_{n−1}; choosing the first-row entries * of Δ_n as d^{-1} x_{n−1}^T P_{n−1}, the two right-hand sides coincide, and hence we have

LHS of (12.26) = LHS of (12.27).  (12.28)

That is,

Ã_n P_n = P_n Δ_n.  (12.29)

In (12.29) we define Δ_n such that

Δ_n = \begin{pmatrix} \alpha_n & * \\ 0 & \Delta_{n-1} \end{pmatrix},  (12.30)

which has appeared in the LHS of (12.27). As Δ_{n−1} is a triangle matrix by the induction hypothesis, Δ_n is a triangle matrix as well. Combining (12.22) and (12.29), we finally get

Δ_n = (P̃P_n)^{-1} A_n (P̃P_n).  (12.31)

Notice that P̃P_n is a non-singular matrix, and so (P̃P_n)(P_n^{-1}P̃^{-1}) = E. Hence, P_n^{-1}P̃^{-1} = (P̃P_n)^{-1}. Equation (12.31) obviously shows that A_n has been converted to a triangle matrix Δ_n. This completes the proof. ∎

Equation (12.31) implies that the eigenvalues are disposed on the diagonal positions of the triangle matrix. Triangle matrices can further undergo similarity transformations.
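Before moving to an example, it may help to see Theorem 12.1 realized numerically. The following Python sketch (an illustration assuming NumPy and SciPy are installed; not part of the formal development) uses the complex Schur decomposition, which is precisely a triangularization by a unitary similarity transformation:

```python
import numpy as np
from scipy.linalg import schur

# any (n, n) matrix is similar (here even unitarily similar) to an
# upper triangle matrix; its diagonal then carries the eigenvalues
A = np.array([[0., -1.],
              [1.,  0.]])            # rotation matrix; eigenvalues +i, -i
T, Z = schur(A, output='complex')    # A = Z T Z^H with Z unitary, T triangular
print(np.allclose(A, Z @ T @ Z.conj().T))   # True
print(np.diag(T))                    # the eigenvalues appear on the diagonal
```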
Example 12.1 Let us think of the following triangle matrix A:

A = \begin{pmatrix} 2 & 1 \\ 0 & 1 \end{pmatrix}.  (12.32)

The eigenvalues of A are 2 and 1. Remember that the diagonal elements of a triangle matrix give its eigenvalues. According to (12.20), the vector \begin{pmatrix} 1 \\ 0 \end{pmatrix} can be chosen as an eigenvector (in column vector representation) corresponding to the eigenvalue 2. Another eigenvector can be chosen to be \begin{pmatrix} 1 \\ -1 \end{pmatrix}. This is because for the eigenvalue 1 we obtain

\begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix} x = 0

as the eigenvalue equation (A − E)x = 0. Therefore, with an eigenvector \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} corresponding to the eigenvalue 1, we get c_1 + c_2 = 0, and so we have \begin{pmatrix} 1 \\ -1 \end{pmatrix} as a simple form of the eigenvector. Hence, putting P = \begin{pmatrix} 1 & 1 \\ 0 & -1 \end{pmatrix}, the similarity transformation is carried out such that

P^{-1}AP = \begin{pmatrix} 1 & 1 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} 2 & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 0 & -1 \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}.  (12.33)

This is a simple example of matrix diagonalization.
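The computation of (12.33) can be checked mechanically; a minimal SymPy sketch (assuming SymPy is available):

```python
import sympy as sp

A = sp.Matrix([[2, 1],
               [0, 1]])
P = sp.Matrix([[1,  1],
               [0, -1]])    # columns: the two eigenvectors found above
print(P.inv() * A * P)      # Matrix([[2, 0], [0, 1]]), reproducing (12.33)
```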


Regarding eigenvalue/eigenvector problems, we have another important theorem.

Theorem 12.2 Eigenvectors corresponding to different eigenvalues of A are linearly independent.

Proof We prove the theorem by mathematical induction. Let α_1 and α_2 be two different eigenvalues of a matrix A and let a_1 and a_2 be eigenvectors corresponding to α_1 and α_2, respectively.
Let us think of the following equation:

c_1 a_1 + c_2 a_2 = 0.  (12.34)

Suppose that a_1 and a_2 are linearly dependent. Then, without loss of generality we can put c_1 ≠ 0. Accordingly, we get

a_1 = −(c_2/c_1) a_2.  (12.35)

Operating A from the left on (12.35), we have

α_1 a_1 = −(c_2/c_1) α_2 a_2 = α_2 a_1.  (12.36)

In the second equality we have used (12.35). From (12.36) we have

(α_1 − α_2) a_1 = 0.  (12.37)

As α_1 ≠ α_2, α_1 − α_2 ≠ 0. This implies a_1 = 0, in contradiction to the fact that a_1 is an eigenvector. Thus, a_1 and a_2 must be linearly independent.
Next we assume that Theorem 12.2 is true in the case where we have (n − 1) eigenvalues α_1, α_2, ⋯, α_{n−1} that are different from one another and the corresponding eigenvectors a_1, a_2, ⋯, a_{n−1} are linearly independent. Let us think of the following equation:

c_1 a_1 + c_2 a_2 + ⋯ + c_{n−1} a_{n−1} + c_n a_n = 0,  (12.38)

where a_n is an eigenvector corresponding to an eigenvalue α_n. Suppose here that a_1, a_2, ⋯, a_n are linearly dependent. If c_n = 0, we have

c_1 a_1 + c_2 a_2 + ⋯ + c_{n−1} a_{n−1} = 0.  (12.39)

But, from the linear independence of a_1, a_2, ⋯, a_{n−1} we have c_1 = c_2 = ⋯ = c_{n−1} = 0. It would then follow that a_1, a_2, ⋯, a_n are linearly independent, in contradiction to the assumption. We should therefore have c_n ≠ 0. Accordingly, we get

a_n = −(1/c_n)(c_1 a_1 + c_2 a_2 + ⋯ + c_{n−1} a_{n−1}).  (12.40)

Operating A from the left on (12.38) again, we have

α_n a_n = −(1/c_n)(c_1 α_1 a_1 + c_2 α_2 a_2 + ⋯ + c_{n−1} α_{n−1} a_{n−1}).  (12.41)

Here we think of two cases: (i) α_n ≠ 0 and (ii) α_n = 0.

(i) Case I: α_n ≠ 0.
Multiplying both sides of (12.40) by α_n, we have

α_n a_n = −(1/c_n)(c_1 α_n a_1 + c_2 α_n a_2 + ⋯ + c_{n−1} α_n a_{n−1}).  (12.42)

Subtracting (12.42) from (12.41), we get

0 = (c_1/c_n)(α_n − α_1)a_1 + (c_2/c_n)(α_n − α_2)a_2 + ⋯ + (c_{n−1}/c_n)(α_n − α_{n−1})a_{n−1}.  (12.43)

Since we assume that the eigenvalues are different from one another, α_n ≠ α_1, α_n ≠ α_2, ⋯, α_n ≠ α_{n−1}. This implies that c_1 = c_2 = ⋯ = c_{n−1} = 0. From (12.40) we then have a_n = 0. This is, however, in contradiction to the fact that a_n is an eigenvector. This means that our original supposition that a_1, a_2, ⋯, a_n are linearly dependent was wrong. Thus, the eigenvectors a_1, a_2, ⋯, a_n must be linearly independent.

(ii) Case II: α_n = 0.
Suppose again that a_1, a_2, ⋯, a_n are linearly dependent. Since, as before, c_n ≠ 0, we get (12.40) and (12.41) once again. Putting α_n = 0 in (12.41), we have

0 = −(c_1/c_n) α_1 a_1 − (c_2/c_n) α_2 a_2 − ⋯ − (c_{n−1}/c_n) α_{n−1} a_{n−1}.  (12.44)

Since the eigenvalues are different, we should have α_1 ≠ 0, α_2 ≠ 0, ⋯, and α_{n−1} ≠ 0. Then, considering that a_1, a_2, ⋯, a_{n−1} are linearly independent, for (12.44) to hold we must have c_1 = c_2 = ⋯ = c_{n−1} = 0. But, from (12.40) we then have a_n = 0, again in contradiction to the fact that a_n is an eigenvector. Thus, the eigenvectors a_1, a_2, ⋯, a_n must be linearly independent. These complete the proof. ∎
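Theorem 12.2 is easy to probe numerically: for a matrix with distinct eigenvalues, the matrix whose columns are the eigenvectors has full rank. A small NumPy sketch with a hypothetical example matrix (assuming NumPy is available):

```python
import numpy as np

A = np.array([[1., 2., 0.],
              [0., 3., 1.],
              [0., 0., 5.]])      # triangle matrix: eigenvalues 1, 3, 5
w, V = np.linalg.eig(A)           # columns of V are eigenvectors
print(w)                          # three different eigenvalues
print(np.linalg.matrix_rank(V))   # 3: the eigenvectors are linearly independent
```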

12.2 Eigenspaces and Invariant Subspaces

Equations (12.21), (12.30), and (12.31) show that if we adopt an eigenvector as one of the basis vectors, the matrix representation of the linear transformation A in reference to such basis vectors is obtained so that the leftmost column is zero except for the top-left corner, on which an eigenvalue is positioned. (Note that if the said eigenvalue is zero, the leftmost column is a zero vector.) Meanwhile, neither Theorem 12.1 nor Theorem 12.2 tells us about the multiplicity of eigenvalues. If the eigenvalues have multiplicity, we have to think about different aspects. This is a major issue of this section.

Let us start with a discussion of invariant subspaces. Let A be a (n, n) square matrix. If a subspace W in V^n is characterized by

x ∈ W ⟹ Ax ∈ W,

W is said to be invariant with respect to A (or simply A-invariant), i.e., an invariant subspace in V^n = Span{a_1, a_2, ⋯, a_n}. Suppose that x is an eigenvector of A and that its corresponding eigenvalue is α. Then, Span{x} is an invariant subspace of V^n. This is because A(cx) = cAx = cαx = α(cx), so that cx is again an eigenvector belonging to α. Suppose that dim W = m (m ≤ n) and that W = Span{a_1, a_2, ⋯, a_m}. If W is A-invariant, A causes a linear transformation within W. In that case, expressing A in reference to a_1, a_2, ⋯, a_m, a_{m+1}, ⋯, a_n, we have

 
(a_1 a_2 ⋯ a_m a_{m+1} ⋯ a_n) A = (a_1 a_2 ⋯ a_m a_{m+1} ⋯ a_n) \begin{pmatrix} A_m & * \\ 0 & * \end{pmatrix},  (12.45)

where A_m is a (m, m) square matrix and the "zero" denotes a (n − m, m) zero matrix. Notice that the transformation A converts the remaining (n − m) basis vectors a_{m+1}, a_{m+2}, ⋯, a_n of V^n into linear combinations of a_1, a_2, ⋯, and a_n. The triangle matrix Δ_n given in (12.30) and (12.31) is an example in which A_m is a (1, 1) matrix (i.e., simply a complex number).
Let us examine the properties of A-invariant subspaces still further. Let a be any vector in V^n and think of the following (n + 1) vectors [2]:

a, Aa, A²a, ⋯, Aⁿa.

These vectors are linearly dependent, since there are at most n linearly independent vectors in V^n. These vectors span a subspace in V^n. Let us call this subspace M; i.e.,

M ≡ Span{a, Aa, A²a, ⋯, Aⁿa}.

Consider the following equation:

c_0 a + c_1 Aa + c_2 A²a + ⋯ + c_n Aⁿa = 0.  (12.46)

Not all c_i (0 ≤ i ≤ n) are zero, because the vectors are linearly dependent. Suppose that c_n ≠ 0. Then, from (12.46) we have

Aⁿa = −(1/c_n)(c_0 a + c_1 Aa + c_2 A²a + ⋯ + c_{n−1} A^{n−1}a).

Operating A on the above equation from the left, we have

A^{n+1}a = −(1/c_n)(c_0 Aa + c_1 A²a + c_2 A³a + ⋯ + c_{n−1} Aⁿa).

Thus, A^{n+1}a is contained in M. That is, we have A^{n+1}a ∈ Span{a, Aa, A²a, ⋯, Aⁿa}. Next, suppose that c_n = 0. Then, at least one of the c_i (0 ≤ i ≤ n − 1) is not zero. Suppose that c_{n−1} ≠ 0. From (12.46), we have

A^{n−1}a = −(1/c_{n−1})(c_0 a + c_1 Aa + c_2 A²a + ⋯ + c_{n−2} A^{n−2}a).

Operating A² on the above equation from the left, we have

A^{n+1}a = −(1/c_{n−1})(c_0 A²a + c_1 A³a + c_2 A⁴a + ⋯ + c_{n−2} Aⁿa).

Again, A^{n+1}a is contained in M. Repeating similar procedures, we reach

c_0 a + c_1 Aa = 0.

If c_1 = 0, then we must have c_0 ≠ 0. If so, a = 0. This is impossible, however, because we should have a ≠ 0. Then, we have c_1 ≠ 0 and, hence,

Aa = −(c_0/c_1) a.

Operating Aⁿ once again on the above equation from the left, we have

A^{n+1}a = −(c_0/c_1) Aⁿa.

Thus, once again A^{n+1}a is contained in M.


In the above discussion, we get AM ⊂ M. Operating A further, A²M ⊂ AM ⊂ M, A³M ⊂ A²M ⊂ AM ⊂ M, ⋯. Then we have

a, Aa, A²a, ⋯, Aⁿa, A^{n+1}a, ⋯ ∈ Span{a, Aa, A²a, ⋯, Aⁿa}.

That is, M is an A-invariant subspace. We also have

m ≡ dim M ≤ dim V^n = n.

There are m basis vectors in M, and so, representing A in matrix form in reference to the n basis vectors of V^n that include these m vectors, we have

A = \begin{pmatrix} A_m & * \\ 0 & * \end{pmatrix}.  (12.47)

Note again that, as in (12.45), V^n is spanned by the m basis vectors in M together with the other (n − m) linearly independent vectors.
We sometimes encounter a situation where two subspaces W_1 and W_2 are both A-invariant. Here we can take basis vectors a_1, a_2, ⋯, a_m for W_1 and a_{m+1}, a_{m+2}, ⋯, a_n for W_2. In reference to such a_1, a_2, ⋯, a_n as basis vectors of V^n, we have

A = \begin{pmatrix} A_m & 0 \\ 0 & A_{n-m} \end{pmatrix},  (12.48)

where A_{n−m} is a (n − m, n − m) square matrix and "zero" denotes either a (n − m, m) or a (m, n − m) zero matrix. Alternatively, we denote

A = A_m ⊕ A_{n−m}.

In this situation, the matrix A is said to be reducible.

As stated above, A causes a linear transformation within both W_1 and W_2. In other words, A_m and A_{n−m} cause linear transformations within W_1 and W_2 in reference to a_1, a_2, ⋯, a_m and a_{m+1}, a_{m+2}, ⋯, a_n, respectively. In this case V^n can be represented as a direct sum of W_1 and W_2 such that

V^n = W_1 ⊕ W_2 = Span{a_1, a_2, ⋯, a_m} ⊕ Span{a_{m+1}, a_{m+2}, ⋯, a_n}.  (12.49)

This is because Span{a_1, a_2, ⋯, a_m} ∩ Span{a_{m+1}, a_{m+2}, ⋯, a_n} = {0}. In fact, if the two subspaces possessed some x (≠ 0) in common, a_1, a_2, ⋯, a_n would become linearly dependent, in contradiction.
The vector space V^n may well be decomposed further into subspaces of lower dimension. For further discussion we need the following theorem together with the concepts of a generalized eigenvector and a generalized eigenspace.

Theorem 12.3: Hamilton–Cayley Theorem [3, 4] Let f_A(x) be the characteristic polynomial pertinent to a linear transformation A: V^n → V^n. Then f_A(A)x = 0 for ∀x ∈ V^n. That is, Ker f_A(A) = V^n.
Proof To prove the theorem we use the following well-known property of determinants. Namely, let A be a (n, n) square matrix expressed as

A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}.

Let Ã be the cofactor matrix of A, namely

Ã = \begin{pmatrix} \Delta_{11} & \cdots & \Delta_{1n} \\ \vdots & \ddots & \vdots \\ \Delta_{n1} & \cdots & \Delta_{nn} \end{pmatrix},

where Δ_{ij} is the cofactor of a_{ij}. Then

Ã^T A = A Ã^T = |A| E,  (12.50)

where Ã^T is the transposed matrix of Ã. We now apply (12.50) to the characteristic polynomial to get the following equation:

\widetilde{(xE − A)}^T (xE − A) = (xE − A) \widetilde{(xE − A)}^T = |xE − A| E = f_A(x) E,  (12.51)

where \widetilde{(xE − A)} is the cofactor matrix of xE − A. Let the cofactor of the (i, j)-element of (xE − A) be Δ_{ij}. Note in this case that Δ_{ij} is an at most (n − 1)-th order polynomial of x. Let us put accordingly

Δ_{ij} = b_{ij,0} x^{n−1} + b_{ij,1} x^{n−2} + ⋯ + b_{ij,n−1}.  (12.52)

Also, we put B_k = (b_{ij,k}).


Then we have

\widetilde{(xE − A)} = \begin{pmatrix} \Delta_{11} & \cdots & \Delta_{1n} \\ \vdots & \ddots & \vdots \\ \Delta_{n1} & \cdots & \Delta_{nn} \end{pmatrix}
= \begin{pmatrix} b_{11,0}x^{n-1} + \cdots + b_{11,n-1} & \cdots & b_{1n,0}x^{n-1} + \cdots + b_{1n,n-1} \\ \vdots & \ddots & \vdots \\ b_{n1,0}x^{n-1} + \cdots + b_{n1,n-1} & \cdots & b_{nn,0}x^{n-1} + \cdots + b_{nn,n-1} \end{pmatrix}
= x^{n−1} B_0 + x^{n−2} B_1 + ⋯ + B_{n−1}.  (12.53)

Thus, we get

|xE − A| E = (xE − A)(x^{n−1} B_0^T + x^{n−2} B_1^T + ⋯ + B_{n−1}^T).  (12.54)

Replacing x with A and considering that B_l^T (l = 0, ⋯, n − 1) is commutative with A, we have

f_A(A) = (A − A)(A^{n−1} B_0^T + A^{n−2} B_1^T + ⋯ + B_{n−1}^T) = 0.  (12.55)

This means that f_A(A)x = 0 for ∀x ∈ V^n. These complete the proof. ∎
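The Hamilton–Cayley theorem can be verified directly for any concrete matrix. The sketch below (SymPy assumed; the matrix is an arbitrary example) evaluates f_A(A) by Horner's scheme:

```python
import sympy as sp

x = sp.symbols('x')
A = sp.Matrix([[1, 2, 0],
               [0, 1, 1],
               [0, 0, 3]])
n = A.shape[0]
p = A.charpoly(x)                # characteristic polynomial f_A(x)
F = sp.zeros(n, n)
for c in p.all_coeffs():         # highest-order coefficient first
    F = F * A + c * sp.eye(n)    # Horner's scheme with a matrix argument
print(F == sp.zeros(n, n))       # True: f_A(A) = 0
```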


In relation to the Hamilton–Cayley theorem, we consider the important concept of a minimal polynomial.

Definition 12.1 Let f(x) be a polynomial of x with scalar coefficients such that f(A) = 0, where A is a (n, n) matrix. Among such f(x), the lowest-order polynomial with a highest-order coefficient of one is said to be the minimal polynomial. We denote it by φ_A(x); i.e., φ_A(A) = 0.

We have an important property here. Namely, the minimal polynomial φ_A(x) is a divisor of any polynomial f(x) with f(A) = 0. In fact, suppose that we have

f(x) = g(x)φ_A(x) + h(x).

Inserting A into x, we have

f(A) = g(A)φ_A(A) + h(A) = h(A) = 0.

From the above equation, h(x) would be a polynomial whose order is lower than that of φ_A(x) and which nonetheless satisfies h(A) = 0. But the presence of such a nonzero h(x) is in contradiction to the definition of the minimal polynomial. This implies that h(x) ≡ 0. Thus, we get

f(x) = g(x)φ_A(x).

That is, φ_A(x) must be a divisor of f(x).

Suppose that φ′_A(x) is another minimal polynomial. If the order of φ′_A(x) were lower than that of φ_A(x), we could have chosen φ′_A(x) for the minimal polynomial from the beginning. Thus we assume that φ_A(x) and φ′_A(x) are of the same order. We have

f(x) = g(x)φ_A(x) = g′(x)φ′_A(x).

Then, we have

φ_A(x)/φ′_A(x) = g′(x)/g(x) = c,

where c is a constant, because φ_A(x) and φ′_A(x) are of the same order. But c should be one, since the highest-order coefficient of a minimal polynomial is one. Thus, φ_A(x) is uniquely defined.
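For a matrix with a single eigenvalue α, the minimal polynomial is necessarily (x − α)^k for some k, so it can be found by testing successive powers. A minimal sketch (SymPy assumed; the matrix is a hypothetical example):

```python
import sympy as sp

x = sp.symbols('x')
A = sp.Matrix([[2, 1, 0],
               [0, 2, 0],
               [0, 0, 2]])         # single eigenvalue 2, multiplicity 3
n = A.shape[0]
print(sp.factor(A.charpoly(x).as_expr()))   # (x - 2)**3
alpha = 2
for k in range(1, n + 1):
    if (A - alpha * sp.eye(n))**k == sp.zeros(n, n):
        print(f"minimal polynomial: (x - {alpha})**{k}")   # here k = 2 < 3
        break
```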

12.3 Generalized Eigenvectors and Nilpotent Matrices

Equation (12.4) ensures that an eigenvalue is accompanied by an eigenvector. Therefore, if a matrix A : V^n → V^n has n different eigenvalues without multiple roots, the vector space V^n is decomposed into a direct sum of one-dimensional subspaces spanned by those individual linearly independent eigenvectors (see the discussion of Sect. 12.1). Thus, we have

V^n = W_1 ⊕ W_2 ⊕ ⋯ ⊕ W_n = Span{a_1} ⊕ Span{a_2} ⊕ ⋯ ⊕ Span{a_n},  (12.56)

where the a_i (1 ≤ i ≤ n) are eigenvectors corresponding to the n different eigenvalues. The situation, however, is not always this simple. Even when a matrix has eigenvalues with multiple roots, we may still have a simple case, as shown in the next example.

Example 12.2 Let us think of the following matrix A : V³ → V³:

A = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix}.  (12.57)

The matrix has a triple root 2. As can easily be seen below, the individual eigenvectors a_1, a_2, and a_3 form basis vectors of each invariant subspace, indicating that V³ can be decomposed into a direct sum of the three invariant subspaces as in (12.56):

(a_1 a_2 a_3) \begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix} = (2a_1 2a_2 2a_3).  (12.58)

Let us think of another simple example.


Example 12.3 Let us think of a linear transformation A : V² → V² expressed as

A(x) = (a_1 a_2) \begin{pmatrix} 3 & 1 \\ 0 & 3 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix},  (12.59)

where a_1 and a_2 are basis vectors and, for ∀x ∈ V², x = x_1 a_1 + x_2 a_2. This example has a double root 3. We have

(a_1 a_2) \begin{pmatrix} 3 & 1 \\ 0 & 3 \end{pmatrix} = (3a_1  a_1 + 3a_2).  (12.60)

Thus, Span{a_1} is A-invariant, but this is not the case with Span{a_2}. This implies that a_1 is an eigenvector corresponding to the eigenvalue 3 but a_2 is not. A detailed discussion of matrices of this kind follows below.
Nilpotent matrices play an important role in matrix algebra. These matrices are defined as follows.

Definition 12.2 Let N be a linear transformation in a vector space V^n. Suppose that we have

N^k = 0 and N^{k−1} ≠ 0,  (12.61)

where N is a (n, n) square matrix and k (≥ 2) is a certain natural number. Then, N is said to be a nilpotent matrix of order k, or a k-th order nilpotent matrix. If (12.61) holds with k = 1, N is a zero matrix.
Nilpotent matrices have the following properties:

(i) The eigenvalues of a nilpotent matrix are all zero. Let N be a k-th order nilpotent matrix. Suppose that

Nx = αx,  (12.62)

where α is an eigenvalue and x (≠ 0) is its corresponding eigenvector. Operating N (k − 1) more times from the left on both sides of (12.62), we have

N^k x = α^k x.  (12.63)

Meanwhile, N^k = 0 by definition, and so α^k = 0, namely α = 0.

Conversely, suppose that all the eigenvalues of a (n, n) matrix N are zero. From Theorem 12.1, via a suitable similarity transformation with P we have

P^{-1} N P = Ñ,

where Ñ is a triangle matrix. Then, using (12.12) and (12.16) we have

f_{P^{-1}NP}(x) = f_{Ñ}(x) = f_N(x) = xⁿ.

From Theorem 12.3, we have

f_N(N) = Nⁿ = 0.

Namely, N is a nilpotent matrix. In a trivial case, we have N = 0 (a zero matrix).

By Definition 12.2, we have N^k = 0 for a k-th order nilpotent (n, n) matrix N. Then, its minimal polynomial is φ_N(x) = x^k (k ≤ n).
(ii) A nilpotent matrix N is not diagonalizable (except for a zero matrix). Suppose that N is diagonalizable. Then N can be diagonalized by a non-singular matrix P such that

P^{-1} N P = 0.

The above equation holds because N only takes eigenvalues of zero. Operating P from the left on the above equation and P^{-1} from the right, we have

N = 0.

This means that N would be a zero matrix, in contradiction. Thus, a nilpotent matrix N is not diagonalizable.
Example 12.4 Let N be a matrix of the following form:

N = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}.

Then, we can easily check that N² = 0. Therefore, N is a nilpotent matrix of the second order. Note that N is an upper triangle matrix, and so its eigenvalues are given by the diagonal elements. In the present case the eigenvalue is zero (as a double root), consistent with the aforementioned property.

With a nilpotent matrix of order k, we have at least one vector x such that N^{k−1}x ≠ 0. Here we add that a zero transformation A (or matrix) can be defined by

A = 0 ⟺ Ax = 0 for ∀x ∈ V^n.  (12.64)

Taking the contraposition of this, we have

A ≠ 0 ⟺ Ax ≠ 0 for ∃x ∈ V^n.

In relation to nilpotent matrices, we have the following important theorem.

Theorem 12.4 If N is a k-th order nilpotent matrix, then for ∃x ∈ V^n the following k vectors are linearly independent:

x, Nx, N²x, ⋯, N^{k−1}x.

Proof Take x such that N^{k−1}x ≠ 0 and think of the following equation:

∑_{i=0}^{k−1} c_i N^i x = 0.  (12.65)

Multiplying (12.65) by N^{k−1} from the left and using N^k = 0, we get

c_0 N^{k−1} x = 0.  (12.66)

This implies that c_0 = 0. Thus, we are left with

∑_{i=1}^{k−1} c_i N^i x = 0.  (12.67)

Next, multiplying (12.67) by N^{k−2} from the left, we similarly get

c_1 N^{k−1} x = 0,  (12.68)

implying that c_1 = 0. Continuing this procedure, we finally get c_i = 0 (0 ≤ i ≤ k − 1). This completes the proof. ∎
This immediately tells us that for a k-th order nilpotent matrix N : V^n → V^n, we should have k ≤ n. This is because the number of linearly independent vectors is at most n.

In Example 12.4, N causes a transformation of the basis vectors in V² such that

(e_1 e_2) N = (e_1 e_2) \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} = (0 e_1).

That is, Ne_2 = e_1. Then, the two linearly independent vectors e_2 and Ne_2 correspond to the case of Theorem 12.4.
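Theorem 12.4 is likewise easy to check numerically. A sketch with a hypothetical third-order nilpotent matrix (NumPy assumed):

```python
import numpy as np

N = np.array([[0., 1., 0.],
              [0., 0., 1.],
              [0., 0., 0.]])       # N^2 != 0, N^3 = 0 (third order)
x = np.array([0., 0., 1.])         # chosen so that N^2 x != 0
chain = np.column_stack([x, N @ x, N @ N @ x])
print(np.linalg.matrix_rank(chain))                  # 3: x, Nx, N^2 x independent
print(np.allclose(np.linalg.matrix_power(N, 3), 0))  # True: N^3 = 0
```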
So far we have shown simple cases where matrices can be diagonalized via a similarity transformation. This is equivalent to saying that the relevant vector space is decomposed into a direct sum of (invariant) subspaces. Nonetheless, if the characteristic polynomial of the matrix has multiple root(s), it remains uncertain whether the vector space can be decomposed into such a direct sum. To answer this question, we need the following lemma.
Lemma 12.1 Let f_1(x), f_2(x), ⋯, and f_s(x) be polynomials without a common factor. Then there exist s polynomials M_1(x), M_2(x), ⋯, M_s(x) that satisfy the relation

M_1(x)f_1(x) + M_2(x)f_2(x) + ⋯ + M_s(x)f_s(x) = 1.  (12.69)

Proof Let M_i(x) (1 ≤ i ≤ s) be arbitrarily chosen polynomials and deal with the set of g(x) that can be expressed as

g(x) = M_1(x)f_1(x) + M_2(x)f_2(x) + ⋯ + M_s(x)f_s(x).  (12.70)

Let the whole set of such g(x) be H. Then H has the following two properties:

(i) g_1(x), g_2(x) ∈ H ⟹ g_1(x) + g_2(x) ∈ H,
(ii) g(x) ∈ H, M(x) an arbitrarily chosen polynomial ⟹ M(x)g(x) ∈ H.

Now let the lowest-order polynomial in the set H be g_0(x). Then all g(x) (∈ H) are multiples of g_0(x). Suppose that, dividing g(x) by g_0(x), we have

g(x) = M(x)g_0(x) + h(x),  (12.71)

where h(x) is a certain polynomial. Since g(x), g_0(x) ∈ H, we have h(x) ∈ H from the above properties (i) and (ii). If h(x) ≠ 0, the order of h(x) is lower than that of g_0(x) from (12.71). This is, however, in contradiction to the definition of g_0(x). Therefore, h(x) = 0. Thus,

g(x) = M(x)g_0(x).  (12.72)

This implies that H is identical to the whole set of polynomials comprising multiples of g_0(x). In particular, the polynomials f_i(x) (1 ≤ i ≤ s) belong to H; to show this, put M_i(x) = 1 with all the other M_j(x) = 0 ( j ≠ i). Hence, the polynomial g_0(x) should be a common factor of the f_i(x). Meanwhile, g_0(x) ∈ H, and so by virtue of (12.70) we have

g_0(x) = M_1(x)f_1(x) + M_2(x)f_2(x) + ⋯ + M_s(x)f_s(x).  (12.73)

By assumption the greatest common factor of f_1(x), f_2(x), ⋯, and f_s(x) is 1. This implies that g_0(x) = 1. Thus, we finally get (12.69) and complete the proof. ∎
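For two polynomials, the identity (12.69) is the Bézout relation produced by the extended Euclidean algorithm. A short SymPy sketch (assuming SymPy is available; f_1 and f_2 are hypothetical examples):

```python
import sympy as sp

x = sp.symbols('x')
f1 = (x - 1)**2
f2 = x - 2
M1, M2, g = sp.gcdex(f1, f2, x)     # M1*f1 + M2*f2 = gcd(f1, f2)
print(g)                            # 1: f1 and f2 have no common factor
print(sp.simplify(M1*f1 + M2*f2))   # 1, i.e., relation (12.69) with s = 2
```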

12.4 Idempotent Matrices and Generalized Eigenspaces

In Sect. 12.1 we showed that eigenvectors corresponding to different eigenvalues of A are linearly independent. Also, if those eigenvalues do not possess multiple roots, the vector space comprises a direct sum of the subspaces spanned by the corresponding eigenvectors. However, how do we treat a situation where the eigenvalues possess multiple roots? Even in this case there is at least one eigenvector that corresponds to each such eigenvalue. To adequately address the question, we need the concepts of generalized eigenvectors and generalized eigenspaces.

For this purpose we extend and generalize the concept of eigenvectors. For a certain natural number l, if a vector x (∈ V^n) satisfies the following relations, x is said to be a generalized eigenvector of rank l that corresponds to an eigenvalue α:

(A − αE)^l x = 0,  (A − αE)^{l−1} x ≠ 0.

Thus, an eigenvector as defined in (12.3) may be said to be a generalized eigenvector of rank 1. When we need to distinguish it from generalized eigenvectors of rank l (l ≥ 2), we call it a "proper" eigenvector. Thus far we have only been concerned with proper eigenvectors. We will present an important theorem related to the generalized eigenvectors below; in it, idempotent operators play a role. Their definition is simple.
Definition 12.3 An operator A is said to be idempotent if A² = A.

From this simple definition, we can draw several important pieces of information. Let A be an idempotent operator that operates on V^n and let x be an arbitrary vector in V^n. Then, A²x = A(Ax) = Ax. That is,

A(Ax − x) = 0.

Then, we have

Ax = x or Ax = 0.  (12.74)

From Theorem 11.3 (the dimension theorem), we have

V^n = A(V^n) ⊕ Ker A,

where

A(V^n) = {x; x ∈ V^n, Ax = x},  Ker A = {x; x ∈ V^n, Ax = 0}.  (12.75)

Thus, we find that A decomposes V^n into a direct sum of A(V^n) and Ker A. Conversely, we can readily verify that if there exists an operator A that satisfies (12.75), such an A must be an idempotent operator. The verification is left to readers. Meanwhile, we have

(E − A)² = E − 2A + A² = E − 2A + A = E − A.

Hence, E − A is an idempotent matrix as well. Moreover, we have

A(E − A) = (E − A)A = 0.

Putting E − A ≡ B and following a procedure similar to the above, we see that Bx = x is equivalent to (E − A)x = x, i.e., to x − Ax = x, i.e., to Ax = 0.

Therefore, B(V^n) = {x; x ∈ V^n, Bx = x} is identical to Ker A. Writing

W_A = {x; x ∈ V^n, Ax = x},  W̄_A = {x; x ∈ V^n, Ax = 0},  (12.76)

we get

W_A = A V^n,  W̄_A = (E − A)V^n = B V^n = Ker A.

That is,

V^n = W_A + W̄_A.

Suppose that ∃u ∈ W_A ∩ W̄_A. Then, from (12.76), Au = u and Au = 0, namely u = 0, and so V^n is a direct sum of W_A and W̄_A. That is,

V^n = W_A ⊕ W̄_A.

Notice that if we consider the identity operator E as an idempotent operator, we are thinking of a trivial case; that is, V^n = V^n ⊕ {0}.

The result can immediately be extended to the case where more idempotent operators take part in the vector space. Let us define operators such that

A_1 + A_2 + ⋯ + A_s = E,

where E is the (n, n) identity matrix. Also,

A_i A_j = A_i δ_{ij}.

Moreover, we define W_i such that

W_i = A_i V^n = {x; x ∈ V^n, A_i x = x}.  (12.77)

Then

V^n = W_1 ⊕ W_2 ⊕ ⋯ ⊕ W_s.  (12.78)

In fact, suppose that ∃x (∈ V^n) ∈ W_i, W_j (i ≠ j). Then A_i x = x = A_j x. Operating A_j ( j ≠ i) from the left, A_j A_i x = A_j x = A_j A_j x. That is, 0 = x = A_j x, implying that W_i ∩ W_j = {0}. Meanwhile,

V^n = (A_1 + A_2 + ⋯ + A_s)V^n = A_1 V^n + A_2 V^n + ⋯ + A_s V^n = W_1 + W_2 + ⋯ + W_s.  (12.79)

As W_i ∩ W_j = {0} (i ≠ j), (12.79) is a direct sum. Thus, (12.78) follows.


Example 12.5 Think of the following transformation A:

(e_1 e_2 e_3 e_4) A = (e_1 e_2 e_3 e_4) \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} = (e_1 e_2 0 0).

Put x = ∑_{i=1}^{4} x_i e_i. Then, we have

A(x) = ∑_{i=1}^{4} x_i A(e_i) = ∑_{i=1}^{2} x_i e_i ∈ W_A,

(E − A)(x) = x − A(x) = ∑_{i=3}^{4} x_i e_i ∈ W̄_A.

In the above,

(e_1 e_2 e_3 e_4)(E − A) = (e_1 e_2 e_3 e_4) \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} = (0 0 e_3 e_4).

Thus, we have

W_A = Span{e_1, e_2},  W̄_A = Span{e_3, e_4},  V⁴ = W_A ⊕ W̄_A.

The properties of idempotent matrices can easily be checked; this is left to readers.
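Checking those properties takes only a few lines. A NumPy sketch (assuming NumPy is available):

```python
import numpy as np

A = np.diag([1., 1., 0., 0.])    # the idempotent matrix of Example 12.5
E = np.eye(4)
print(np.allclose(A @ A, A))                   # A^2 = A
print(np.allclose((E - A) @ (E - A), E - A))   # E - A is idempotent as well
print(np.allclose(A @ (E - A), 0))             # A(E - A) = 0
xv = np.array([3., -1., 2., 5.])
print(A @ xv, (E - A) @ xv)      # the components in W_A and in Ker A
```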
Using the aforementioned idempotent operators, let us introduce the following theorem.

Theorem 12.5 [3, 4] Let A be a linear transformation V^n → V^n. Suppose that a vector x (∈ V^n) satisfies the following relation:

(A − αE)^l x = 0,  (12.80)

where l is a sufficiently large natural number. Then the set comprising all such x forms an A-invariant subspace that corresponds to the eigenvalue α. Let α_1, α_2, ⋯, α_s be the eigenvalues of A that are different from one another. Then V^n is decomposed into a direct sum of the A-invariant subspaces that correspond individually to α_1, α_2, ⋯, α_s. This is succinctly expressed as follows:

V^n = W̃_{α_1} ⊕ W̃_{α_2} ⊕ ⋯ ⊕ W̃_{α_s}.  (12.81)

Here W̃_{α_i} (1 ≤ i ≤ s) is given by

W̃_{α_i} = {x; x ∈ V^n, (A − α_i E)^{l_i} x = 0},  (12.82)

where l_i is a sufficiently large natural number. If the multiplicity of α_i is n_i, then dim W̃_{α_i} = n_i.
Proof Let us define the aforementioned A-invariant subspaces as W̃_{α_k} (1 ≤ k ≤ s). Let f_A(x) be the characteristic polynomial of A. Factorizing f_A(x) into a product of powers of first-degree polynomials, we have

f_A(x) = ∏_{i=1}^{s} (x − α_i)^{n_i},  (12.83)

where n_i is the multiplicity of α_i. Let us put

f_i(x) = f_A(x)/(x − α_i)^{n_i} = ∏_{j≠i}^{s} (x − α_j)^{n_j}.  (12.84)

Then f_1(x), f_2(x), ⋯, and f_s(x) do not have a common factor. Consequently, Lemma 12.1 tells us that there are polynomials M_1(x), M_2(x), ⋯, and M_s(x) such that

M_1(x)f_1(x) + ⋯ + M_s(x)f_s(x) = 1.  (12.85)

Replacing x with the matrix A, we get

M_1(A)f_1(A) + ⋯ + M_s(A)f_s(A) = E.  (12.86)

Or, defining M_i(A)f_i(A) ≡ A_i,

A_1 + A_2 + ⋯ + A_s = E,  (12.87)

where E is the (n, n) identity matrix. Moreover, we have

A_i A_j = A_i δ_{ij}.  (12.88)

In fact, if i ≠ j,

A_i A_j = M_i(A)f_i(A)M_j(A)f_j(A) = M_i(A)M_j(A)f_i(A)f_j(A)
  = M_i(A)M_j(A) ∏_{k≠i}^{s} (A − α_k E)^{n_k} ∏_{l≠j}^{s} (A − α_l E)^{n_l}
  = M_i(A)M_j(A) f_A(A) ∏_{k≠i,j}^{s} (A − α_k E)^{n_k} = 0.  (12.89)

The second equality results from the fact that M_j(A) and f_i(A) are commutative, since both are polynomials of A. The last equality follows from the Hamilton–Cayley theorem. On the basis of (12.87) and (12.89),

A_i = A_i E = A_i(A_1 + A_2 + ⋯ + A_s) = A_i².  (12.90)

Thus, we find that A_i is an idempotent matrix.


Next, let us show that, with the A_i determined above, A_i V^n is identical to W̃_{α_i} (1 ≤ i ≤ s). To this end, we define W_i such that

W_i = A_i V^n = {x; x ∈ V^n, A_i x = x}.  (12.91)

We have

(x − α_i)^{n_i} f_i(x) = f_A(x).  (12.92)

Therefore, from the Hamilton–Cayley theorem we have

(A − α_i E)^{n_i} f_i(A) = 0.  (12.93)

Operating with M_i(A) from the left and using the fact that M_i(A) commutes with (A − α_i E)^{n_i}, we get

(A − α_i E)^{n_i} A_i = 0,  (12.94)

where we used M_i(A)f_i(A) = A_i. Operating with both sides of this equation on V^n, furthermore, we have

(A − α_i E)^{n_i} A_i V^n = 0.

This means that

A_i V^n ⊂ W̃_{α_i} (1 ≤ i ≤ s).  (12.95)

Conversely, suppose that x ∈ W̃_{α_i}. Then (A − α_i E)^l x = 0 holds for a certain natural number l. If M_i(x)f_i(x) were divisible by x − α_i, the LHS of (12.85) would be divisible by x − α_i as well, leading to a contradiction. Thus, it follows that (x − α_i)^l and M_i(x)f_i(x) do not have a common factor. Consequently, Lemma 12.1 ensures that we have polynomials M(x) and N(x) such that

M(x)(x − α_i)^l + N(x)M_i(x)f_i(x) = 1

and, hence,

M(A)(A − α_i E)^l + N(A)M_i(A)f_i(A) = E.  (12.96)

Operating with both sides of (12.96) on x, we get

M(A)(A − α_i E)^l x + N(A)M_i(A)f_i(A)x = N(A)A_i x = x.  (12.97)

Notice that the first term of (12.97) vanishes by (12.82). As A_i is a polynomial of A, it commutes with N(A). Hence, we have

x = A_i[N(A)x] ∈ A_i V^n.  (12.98)

Thus, we get

W̃_{α_i} ⊂ A_i V^n (1 ≤ i ≤ s).  (12.99)

From (12.95) and (12.99), we conclude that

W̃_{α_i} = A_i V^n (1 ≤ i ≤ s).  (12.100)

In other words, W_i defined as (12.91) is identical to W̃_{α_i} defined as (12.82). Thus, we have

V^n = W_1 ⊕ W_2 ⊕ ⋯ ⊕ W_s

or

V^n = W̃_{α_1} ⊕ W̃_{α_2} ⊕ ⋯ ⊕ W̃_{α_s}.  (12.101)

This completes the former half of the proof. For the latter half, the proof is as follows.

Suppose that dim W̃_{α_i} = n′_i. In parallel with the decomposition of V^n into the direct sum of (12.81), A can be reduced as

A ~ \begin{pmatrix} A^{(1)} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & A^{(s)} \end{pmatrix},  (12.102)

where A^{(i)} (1 ≤ i ≤ s) is a (n′_i, n′_i) matrix and the symbol ~ indicates that A has been transformed by a suitable similarity transformation. The matrix A^{(i)} represents the linear transformation that A causes within W̃_{α_i}. We denote the identity matrix of order n′_i by E_{n′_i}. Equation (12.82) implies that the matrix represented by

N_i = A^{(i)} − α_i E_{n′_i}  (12.103)

is a nilpotent matrix. The order of a nilpotent matrix is at most n′_i (vide supra) and, hence, l_i can be taken as n′_i. With N_i we have

f_{N_i}(x) = |xE_{n′_i} − N_i| = |xE_{n′_i} − (A^{(i)} − α_i E_{n′_i})| = |(x + α_i)E_{n′_i} − A^{(i)}| = x^{n′_i}.  (12.104)

The last equality holds because the eigenvalues of a nilpotent matrix are all zero. Meanwhile,

f_{A^{(i)}}(x) = |xE_{n′_i} − A^{(i)}| = f_{N_i}(x − α_i) = (x − α_i)^{n′_i}.  (12.105)

Equation (12.105) implies that

f_A(x) = ∏_{i=1}^{s} f_{A^{(i)}}(x) = ∏_{i=1}^{s} (x − α_i)^{n′_i} = ∏_{i=1}^{s} (x − α_i)^{n_i}.  (12.106)

The last equality comes from (12.83). Thus, n′_i = n_i. These procedures complete the proof. At the same time, we may equate l_i in (12.82) to n_i. ∎
Theorem 12.1 shows that any square matrix can be converted to an (upper) triangle matrix by a similarity transformation. Theorem 12.5 demonstrates that the matrix can further be segmented according to the individual eigenvalues. Considering Theorem 12.1 again, A^{(i)} (1 ≤ i ≤ s) can be described as an upper triangle matrix by

A^{(i)} ~ \begin{pmatrix} \alpha_i & \cdots & * \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \alpha_i \end{pmatrix}.  (12.107)

Therefore, denoting N^{(i)} such that

N^{(i)} = A^{(i)} − α_i E_{n_i} = \begin{pmatrix} 0 & \cdots & * \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 0 \end{pmatrix},  (12.108)

we find that N^{(i)} is nilpotent. This is because all the eigenvalues of N^{(i)} are zero. From (12.108), we have

[N^{(i)}]^{μ_i} = 0 (μ_i ≤ n_i).
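The decomposition (12.81) can be made concrete with SymPy's nullspace, since W̃_{α_i} = Ker (A − α_iE)^{n_i}. A sketch with a hypothetical matrix (SymPy assumed):

```python
import sympy as sp

A = sp.Matrix([[3, 1, 0],
               [0, 3, 0],
               [0, 0, 5]])       # eigenvalue 3 (double), eigenvalue 5 (simple)
E = sp.eye(3)
W3 = ((A - 3*E)**2).nullspace()  # generalized eigenspace for alpha = 3
W5 = (A - 5*E).nullspace()       # (proper) eigenspace for alpha = 5
print(len(W3), len(W5))          # 2 and 1: the dimensions add up to n = 3
print(sp.Matrix.hstack(*W3, *W5).rank())   # 3: the direct sum spans V^3
```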

12.5 Decomposition of Matrix

To investigate the canonical forms of matrices, it is convenient if a matrix can be decomposed into appropriate forms. To this end, the following definition is important.

Definition 12.4 A matrix similar to a diagonal matrix is said to be semi-simple.

In the above definition, if a matrix is related to another matrix by a similarity transformation, those matrices are said to be similar to each other. When we have two matrices A and A′, we express this by A ~ A′ as stated above. This relation satisfies the equivalence law. That is,

(i) A ~ A, (ii) A ~ A′ ⟹ A′ ~ A, (iii) A ~ A′, A′ ~ A″ ⟹ A ~ A″.

Readers may check this. We have the following important theorem on matrix decomposition.

Theorem 12.6 [3, 4] Any (n, n) square matrix A is expressed uniquely as

A = S + N,  (12.109)

where S is semi-simple and N is nilpotent; S and N are commutable, i.e., SN = NS. Furthermore, S and N are polynomials of A with scalar coefficients.

Proof Using (12.86) and (12.87), we write

S = α_1 A_1 + ⋯ + α_s A_s = ∑_{i=1}^{s} α_i M_i(A)f_i(A).  (12.110)

Then, Eq. (12.110) is a polynomial of A. From Theorems 12.1 and 12.5, A^{(i)} (1 ≤ i ≤ s) in (12.102) is characterized by the facts that A^{(i)} is a triangle matrix whose eigenvalue α_i is positioned on the diagonal and that the order of A^{(i)} is identical to the multiplicity of α_i. Since A_i (1 ≤ i ≤ s) is an idempotent matrix, it can be diagonalized through a similarity transformation (see Sect. 12.7). In fact, S is transformed, via the same similarity transformation as in (12.102), into

S ~ \begin{pmatrix} \alpha_1 E_{n_1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \alpha_s E_{n_s} \end{pmatrix},  (12.111)

where E_{n_i} (1 ≤ i ≤ s) is an identity matrix of order n_i, identical to the multiplicity of α_i. This expression is equivalent to, e.g.,

A_1 ~ \begin{pmatrix} E_{n_1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 0 \end{pmatrix}

in (12.110). Thus, S is obviously semi-simple. Putting N = A − S, N is described after the above transformation as

N ~ \begin{pmatrix} N^{(1)} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & N^{(s)} \end{pmatrix},  N^{(i)} = A^{(i)} − α_i E_{n_i}.  (12.112)

Since each N^{(i)} is nilpotent as stated in Sect. 12.4, N is nilpotent as well. Also, N of (12.112) is a polynomial of A, as is S. Therefore, S and N are commutable.
To prove the uniqueness of the decomposition, we show the following:

(i) Let S and S′ be commutable semi-simple matrices. Then, those matrices are simultaneously diagonalized. That is, with a certain non-singular matrix P, P^{-1}SP and P^{-1}S′P are diagonalized at once. Hence, S ± S′ is semi-simple as well.
(ii) Let N and N′ be commutable nilpotent matrices. Then, N ± N′ is nilpotent as well.
(iii) A matrix both semi-simple and nilpotent is a zero matrix.
(i) Let the different eigenvalues of S be α_1, ⋯, α_s. Then, since S is semi-simple, the vector space V^n is decomposed into a direct sum of eigenspaces W_{α_i} (1 ≤ i ≤ s). That is, we have

V^n = W_{α_1} ⊕ ⋯ ⊕ W_{α_s}.

Since S and S′ are commutable, for ∃x ∈ W_{α_i} we have SS′x = S′Sx = S′(α_i x) = α_i S′x. Hence, we have S′x ∈ W_{α_i}. Namely, W_{α_i} is S′-invariant. Therefore, if we adopt basis vectors {a_1, ⋯, a_n} with respect to the direct sum decomposition, we get

S ~ \begin{pmatrix} \alpha_1 E_{n_1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \alpha_s E_{n_s} \end{pmatrix},  S′ ~ \begin{pmatrix} S'_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & S'_s \end{pmatrix}.

Since S′ is semi-simple, S′_i (1 ≤ i ≤ s) must be semi-simple as well. Here, let {e_1, ⋯, e_n} be the original basis vectors before the basis vector transformation and let P be the representation matrix of the said transformation. Then, we have

(e_1 ⋯ e_n)P = (a_1 ⋯ a_n).

Thus, we get

P^{-1}SP = \begin{pmatrix} \alpha_1 E_{n_1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \alpha_s E_{n_s} \end{pmatrix},  P^{-1}S′P = \begin{pmatrix} S'_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & S'_s \end{pmatrix}.

This means that both P^{-1}SP and P^{-1}S′P are diagonal (each semi-simple block S′_i can be diagonalized by a further similarity transformation within its own block, which leaves α_i E_{n_i} unchanged). That is, P^{-1}SP ± P^{-1}S′P = P^{-1}(S ± S′)P is diagonal, indicating that S ± S′ is semi-simple as well.
(ii) Suppose that N^ν = 0 and N′^{ν′} = 0. From the assumption, N and N′ are commutable. Consequently, using the binomial theorem we have

(N ± N′)^m = N^m ± mN^{m−1}N′ + ⋯ + (±1)^i \frac{m!}{i!(m−i)!} N^{m−i}N′^i + ⋯ + (±1)^m N′^m.  (12.113)

Putting m = ν + ν′ − 1: if i ≥ ν′, then N′^i = 0 from the supposition; if i < ν′, we have m − i > m − ν′ = ν + ν′ − 1 − ν′ = ν − 1, i.e., m − i ≥ ν, and therefore N^{m−i} = 0. Consequently, we have N^{m−i}N′^i = 0 for every i in (12.113). Thus, we get (N ± N′)^m = 0, indicating that N ± N′ is nilpotent.
(iii) Let S be a semi-simple and nilpotent matrix. We describe S as

S ~ \begin{pmatrix} \alpha_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \alpha_n \end{pmatrix},  (12.114)

where some of the α_i (1 ≤ i ≤ n) may be identical. Since S is nilpotent, all the α_i (1 ≤ i ≤ n) are zero. We then have S ~ 0; i.e., S = 0 accordingly.

Now, suppose that a matrix A is decomposed differently from (12.109). That is, we have

A = S + N = S′ + N′ or S − S′ = N′ − N.  (12.115)

From the assumption, S′ and N′ are commutable. Moreover, since S, S′, N, and N′ are described by polynomials of A, they are commutable with one another. Hence, from (i) and (ii) along with the second equation of (12.115), S − S′ and N′ − N are both semi-simple and nilpotent at once. Consequently, from (iii), S − S′ = N′ − N = 0. Thus, we finally get S = S′ and N = N′. That is, the decomposition is unique. These complete the proof. ∎

Theorem 12.6 implies that the matrix decomposition of (12.109) is unique. On the basis of Theorem 12.6, we investigate the Jordan canonical forms of matrices in the next section.

12.6 Jordan Canonical Form

Once the vector space V^n has been decomposed into a direct sum of generalized eigenspaces with the matrix reduced in parallel, we are able to deal with the individual eigenspaces W̃_{α_i} (1 ≤ i ≤ s) and the corresponding A^{(i)} (1 ≤ i ≤ s) separately.

12.6.1 Canonical Form of Nilpotent Matrix

To avoid complicating the notation, we think of the following situation, where we assume a (n, n) matrix that operates on V^n. Let the nilpotent matrix be N. Suppose that N^{ν−1} ≠ 0 and N^ν = 0 (1 ≤ ν ≤ n). If ν = 1, the nilpotent matrix N is a zero matrix. Notice that, since the characteristic polynomial is described by f_N(x) = xⁿ, we have f_N(N) = Nⁿ = 0 from the Hamilton–Cayley theorem. Let W^{(i)} be given such that

W^{(i)} = {x; x ∈ V^n, N^i x = 0}.

Then we have

V^n = W^{(ν)} ⊃ W^{(ν−1)} ⊃ ⋯ ⊃ W^{(1)} ⊃ W^{(0)} ≡ {0}.  (12.116)

Note that when ν = 1, we trivially have V^n = W^{(ν)} ⊃ W^{(0)} ≡ {0}. Let us put dim W^{(i)} = m_i, m_i − m_{i−1} = r_i (1 ≤ i ≤ ν), m_0 ≡ 0. Then we can add r_ν linearly independent vectors a_1, a_2, ⋯, a_{r_ν} to the basis vectors of W^{(ν−1)} so that those r_ν vectors can be basis vectors of W^{(ν)}. Unless N = 0 (i.e., a zero matrix), we must have at least one such vector; from the supposition, with some x ≠ 0 we have N^{ν−1}x ≠ 0, and so x ∉ W^{(ν−1)}. At least one such vector x is present and it is eligible as a basis vector of W^{(ν)}. Hence, r_ν ≥ 1 and we have

W^{(ν)} = Span{a_1, a_2, ⋯, a_{r_ν}} ⊕ W^{(ν−1)}.  (12.117)

Note that (12.117) is expressed as a direct sum. Meanwhile, Na_1, Na_2, ⋯, Na_{r_ν} ∈ W^{(ν−1)}. In fact, suppose that x ∈ W^{(ν)}; then N^ν x = N^{ν−1}(Nx) = 0, that is, Nx ∈ W^{(ν−1)}.

According to a reasoning similar to that made above, we have

Span{Na_1, Na_2, ⋯, Na_{r_ν}} ∩ W^{(ν−2)} = {0}.

Moreover, these r_ν vectors Na_1, Na_2, ⋯, Na_{r_ν} are linearly independent. Suppose that

c_1 Na_1 + c_2 Na_2 + ⋯ + c_{r_ν} Na_{r_ν} = 0.  (12.118)

Operating N^{ν−2} from the left, we have

N^{ν−1}(c_1 a_1 + c_2 a_2 + ⋯ + c_{r_ν} a_{r_ν}) = 0.

This would imply that c_1 a_1 + c_2 a_2 + ⋯ + c_{r_ν} a_{r_ν} ∈ W^{(ν−1)}. On the basis of the above argument, however, we must then have c_1 = c_2 = ⋯ = c_{r_ν} = 0; in other words, if the c_i (1 ≤ i ≤ r_ν) were nonzero, we would have a_i ∈ W^{(ν−1)}, in contradiction. From (12.118) this means the linear independence of Na_1, Na_2, ⋯, Na_{r_ν}.

As W^{(ν−1)} ⊃ W^{(ν−2)}, we may well have additional linearly independent vectors among the basis vectors of W^{(ν−1)}. Let those vectors be a_{r_ν+1}, ⋯, a_{r_{ν−1}}. Here we assume that the number of such vectors is r_{ν−1} − r_ν; we have r_{ν−1} − r_ν ≥ 0 accordingly. In this way we can construct the basis vectors of W^{(ν−1)} by including a_{r_ν+1}, ⋯, a_{r_{ν−1}} along with Na_1, Na_2, ⋯, Na_{r_ν}. As a result, we get
W^{(ν−1)} = Span{Na_1, ⋯, Na_{r_ν}, a_{r_ν+1}, ⋯, a_{r_{ν−1}}} ⊕ W^{(ν−2)}.

We can repeat these processes to construct W^{(ν−2)} such that

W^{(ν−2)} = Span{N²a_1, ⋯, N²a_{r_ν}, Na_{r_ν+1}, ⋯, Na_{r_{ν−1}}, a_{r_{ν−1}+1}, ⋯, a_{r_{ν−2}}} ⊕ W^{(ν−3)}.

For W^{(i)}, furthermore, we have

W^{(i)} = Span{N^{ν−i}a_1, ⋯, N^{ν−i}a_{r_ν}, N^{ν−i−1}a_{r_ν+1}, ⋯, N^{ν−i−1}a_{r_{ν−1}}, ⋯, a_{r_{i+1}+1}, ⋯, a_{r_i}} ⊕ W^{(i−1)}.  (12.119)

Further repeating the procedures, we exhaust all the n basis vectors of W^{(ν)} = V^n. These vectors are given as follows:

N^k a_{r_{i+1}+1}, ⋯, N^k a_{r_i} (1 ≤ i ≤ ν; 0 ≤ k ≤ i − 1).

At the same time we have

0 ≤ r_{ν+1} < 1 ≤ r_ν ≤ r_{ν−1} ≤ ⋯ ≤ r_1.  (12.120)

Table 12.1 [3, 4] shows the resulting structure of these basis vectors pertinent to Jordan blocks. In Table 12.1, counting the basis vectors laterally, from the top we have r_ν, r_{ν−1}, ⋯, r_1 vectors; their sum is n. This is the same number as that counted vertically. The dimension n of the vector space V^n is thus given by

n = ∑_{i=1}^{ν} r_i = ∑_{i=1}^{ν} i(r_i − r_{i+1}).  (12.121)

Let us examine the structure of Table 12.1 more closely. More specifically, let us inspect the i-layered structures of (r_i − r_{i+1}) vectors. Picking up a vector from among a_{r_{i+1}+1}, ⋯, a_{r_i}, we call it a_ρ. Then, we get the following set of vectors a_ρ, Na_ρ, N²a_ρ, ⋯, N^{i−1}a_ρ in the i-layered structure. These i vectors are displayed "vertically" in Table 12.1. They are linearly independent (see Theorem 12.4) and form an i-dimensional N-invariant subspace, i.e., Span{N^{i−1}a_ρ, N^{i−2}a_ρ, ⋯, Na_ρ, a_ρ}, where r_{i+1} + 1 ≤ ρ ≤ r_i. The matrix representation of the linear transformation N with respect to the set of these i vectors is
Table 12.1 Jordan blocks and structure of basis vectors for a nilpotent matrix^a

r_ν:     a_1, ⋯, a_{r_ν}
r_{ν−1}: Na_1, ⋯, Na_{r_ν};  a_{r_ν+1}, ⋯, a_{r_{ν−1}}
⋮        N²a_1, ⋯, N²a_{r_ν};  Na_{r_ν+1}, ⋯, Na_{r_{ν−1}};  ⋯
r_i:     N^{ν−i}a_1, ⋯, N^{ν−i}a_{r_ν};  N^{ν−i−1}a_{r_ν+1}, ⋯, N^{ν−i−1}a_{r_{ν−1}};  ⋯;  a_{r_{i+1}+1}, ⋯, a_{r_i}
r_{i−1}: N^{ν−i+1}a_1, ⋯;  N^{ν−i}a_{r_ν+1}, ⋯;  ⋯;  Na_{r_{i+1}+1}, ⋯, Na_{r_i}
⋮
r_2:     N^{ν−2}a_1, ⋯;  N^{ν−3}a_{r_ν+1}, ⋯;  ⋯;  N^{i−2}a_{r_{i+1}+1}, ⋯;  a_{r_3+1}, ⋯, a_{r_2}
r_1:     N^{ν−1}a_1, ⋯, N^{ν−1}a_{r_ν};  N^{ν−2}a_{r_ν+1}, ⋯, N^{ν−2}a_{r_{ν−1}};  ⋯;  N^{i−1}a_{r_{i+1}+1}, ⋯, N^{i−1}a_{r_i};  Na_{r_3+1}, ⋯, Na_{r_2};  a_{r_2+1}, ⋯, a_{r_1}

The column groups contain, from left to right, r_ν, r_{ν−1} − r_ν, ⋯, r_i − r_{i+1}, ⋯, r_2 − r_3, and r_1 − r_2 chains; the corresponding numbers of vectors are νr_ν, (ν − 1)(r_{ν−1} − r_ν), ⋯, i(r_i − r_{i+1}), ⋯, 2(r_2 − r_3), and r_1 − r_2, so that n = ∑_{k=1}^{ν} r_k.

^a Adapted from Satake I (1974) Linear algebra (Mathematics Library 1: in Japanese) [4], with the permission of Shokabo Co., Ltd., Tokyo

 
(N^{i−1}a_ρ  N^{i−2}a_ρ ⋯ Na_ρ  a_ρ) N
= (N^{i−1}a_ρ  N^{i−2}a_ρ ⋯ Na_ρ  a_ρ) \begin{pmatrix} 0 & 1 & & & \\ & 0 & 1 & & \\ & & \ddots & \ddots & \\ & & & 0 & 1 \\ & & & & 0 \end{pmatrix}.  (12.122)

These (i, i) matrices of (12.122) are called i-th order Jordan blocks. Notice that the number of those Jordan blocks is (r_i − r_{i+1}). Let us expressly define this number as [3, 4]

J_i = r_i − r_{i+1},  (12.123)

where J_i is the number of the i-th order Jordan blocks. The total number of Jordan blocks within the whole vector space V^n = W^{(ν)} is

∑_{i=1}^{ν} J_i = ∑_{i=1}^{ν} (r_i − r_{i+1}) = r_1.  (12.124)

Recalling the dimension theorem mentioned in (11.45), we have

dim V^n = dim Ker N^i + dim N^i(V^n) = dim Ker N^i + rank N^i.  (12.125)

Meanwhile, since W^{(i)} = Ker N^i, dim W^{(i)} = m_i = dim Ker N^i. From (12.116), m_0 ≡ 0. Then (11.45) now reads

dim V^n = m_i + rank N^i.  (12.126)

That is,

n = m_i + rank N^i  (12.127)

or

m_i = n − rank N^i.  (12.128)

Meanwhile, from Table 12.1 [3, 4] we have

dim W^{(i)} = m_i = ∑_{k=1}^{i} r_k,  dim W^{(i−1)} = m_{i−1} = ∑_{k=1}^{i−1} r_k.  (12.129)

Hence, we have

r_i = m_i − m_{i−1}.  (12.130)

Then we get [3, 4]

J_i = r_i − r_{i+1} = (m_i − m_{i−1}) − (m_{i+1} − m_i) = 2m_i − m_{i−1} − m_{i+1}
  = 2(n − rank N^i) − (n − rank N^{i−1}) − (n − rank N^{i+1})
  = rank N^{i−1} + rank N^{i+1} − 2 rank N^i.  (12.131)

The number J_i is therefore determined uniquely by N. The total number of Jordan blocks, r_1, is also computed using (12.128) and (12.130) as

r_1 = m_1 − m_0 = m_1 = n − rank N = dim Ker N,  (12.132)

where the last equality arises from the dimension theorem expressed as (11.45).
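Relation (12.131) lets one read off the Jordan structure of a nilpotent matrix from ranks alone. A NumPy sketch with a hypothetical N containing one second-order and one first-order block (NumPy assumed):

```python
import numpy as np

N = np.array([[0., 1., 0.],
              [0., 0., 0.],
              [0., 0., 0.]])
n = N.shape[0]
rank = lambda k: np.linalg.matrix_rank(np.linalg.matrix_power(N, k))
for i in range(1, n + 1):
    J_i = rank(i - 1) + rank(i + 1) - 2 * rank(i)   # Eq. (12.131)
    print(f"J_{i} = {J_i}")    # J_1 = 1, J_2 = 1, J_3 = 0
```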
In Table 12.1 [3, 4], moreover, we have two extreme cases. First, if ν = 1 in (12.116), i.e., N = 0, from (12.132) we have r_1 = n and r_2 = ⋯ = r_n = 0; see Fig. 12.1a. We also confirm n = ∑_{i=1}^{ν} r_i as in (12.121). In this case, all the eigenvectors are proper eigenvectors with multiplicity n and we have n first-order Jordan blocks. The other extreme is the case ν = n. In that case, we have

r_1 = r_2 = ⋯ = r_n = 1.  (12.133)

In the latter case, we also have n = ∑_{i=1}^{ν} r_i as in (12.121). We have only one proper eigenvector and (n − 1) generalized eigenvectors; see Fig. 12.1b. From (12.132), this special case, where we have only one n-th order Jordan block, occurs when rank N = n − 1.

12.6.2 Jordan Blocks

Let us think of (12.81) on the basis of (12.102). Picking up A^{(i)} from (12.102) and considering (12.108), we put

N_i = A^{(i)} − α_i E_{n_i} (1 ≤ i ≤ s),  (12.134)



[Fig. 12.1 Examples of the structure of Jordan blocks: (a) r_1 = n; (b) r_1 = r_2 = ⋯ = r_n = 1.]

where E_{n_i} denotes the (n_i, n_i) identity matrix. We express the nilpotent matrix as N_i as before. In (12.134) the number n_i corresponds to n in V^n of Sect. 12.6.1. As N_i is a (n_i, n_i) matrix, we have

N_i^{n_i} = 0.

Here we are speaking of ν-th order nilpotent matrices N_i such that N_i^{ν−1} ≠ 0 and N_i^ν = 0 (1 ≤ ν ≤ n_i). We can deal with N_i in a manner fully consistent with the theory we developed in Sect. 12.6.1. Each A^{(i)} comprises one or more Jordan blocks A^{(κ)} expressed as
A^{(κ)} = N_{κ_i} + α_i E_{κ_i} (1 ≤ κ_i ≤ n_i),  (12.135)

where A^{(κ)} denotes the κ-th Jordan block in A^{(i)}. In A^{(κ)}, N_{κ_i} and E_{κ_i} are a nilpotent (κ_i, κ_i) matrix and the (κ_i, κ_i) identity matrix, respectively. In N_{κ_i}, κ_i zeros are displayed on the principal diagonal and entries of 1 are positioned on the matrix elements next above the principal diagonal; all other entries are zero; see, e.g., the matrix of (12.122). As in Sect. 12.6.1, the number κ_i is called the dimension of the Jordan block. Thus, A^{(i)} of (12.102) can further be reduced to segmented matrices A^{(κ)}. Our next task is to find out how many Jordan blocks are contained in an individual A^{(i)} and what the dimensions of those Jordan blocks are.
Corresponding to (12.122), the matrix representation of the linear transformation by A^{(κ)} with respect to the set of κ_i vectors is

([A^{(κ)} − α_i E_{κ_i}]^{κ_i−1} a_σ  [A^{(κ)} − α_i E_{κ_i}]^{κ_i−2} a_σ ⋯ a_σ)[A^{(κ)} − α_i E_{κ_i}]
= ([A^{(κ)} − α_i E_{κ_i}]^{κ_i−1} a_σ  [A^{(κ)} − α_i E_{κ_i}]^{κ_i−2} a_σ ⋯ a_σ) \begin{pmatrix} 0 & 1 & & & \\ & 0 & 1 & & \\ & & \ddots & \ddots & \\ & & & 0 & 1 \\ & & & & 0 \end{pmatrix}.  (12.136)

The vector a_σ stands for a vector associated with the κ-th Jordan block of A^{(i)}. From (12.136) we obtain

[A^{(κ)} − α_i E_{κ_i}][A^{(κ)} − α_i E_{κ_i}]^{κ_i−1} a_σ = 0.  (12.137)

Namely,

A^{(κ)} [A^{(κ)} − α_i E_{κ_i}]^{κ_i−1} a_σ = α_i [A^{(κ)} − α_i E_{κ_i}]^{κ_i−1} a_σ.  (12.138)

This shows that [A^{(κ)} − α_i E_{κ_i}]^{κ_i−1} a_σ is a proper eigenvector of A^{(κ)} that corresponds to the eigenvalue α_i. On the other hand, a_σ is a generalized eigenvector of rank κ_i. There are another (κ_i − 2) generalized eigenvectors [A^{(κ)} − α_i E_{κ_i}]^μ a_σ (1 ≤ μ ≤ κ_i − 2). In total, there are κ_i eigenvectors [one proper eigenvector and (κ_i − 1) generalized eigenvectors]. Also we see that a sole proper eigenvector is found for each Jordan block.

In reference to these κ_i eigenvectors as the basis vectors, the (κ_i, κ_i) matrix A^{(κ)} (i.e., a Jordan block) is expressed as

A^{(κ)} = \begin{pmatrix} \alpha_i & 1 & & & \\ & \alpha_i & 1 & & \\ & & \ddots & \ddots & \\ & & & \alpha_i & 1 \\ & & & & \alpha_i \end{pmatrix}.  (12.139)

A (n_i, n_i) matrix A^{(i)} of (12.102) pertinent to an eigenvalue α_i contains a direct sum of Jordan blocks whose dimensions range from 1 to n_i. The largest possible number of Jordan blocks of dimension d (with [n_i/2] + 1 ≤ d ≤ n_i, where [μ] denotes the largest integer that does not exceed μ) is at most one.

An example depicted below is a matrix A^{(i)} that explicitly includes two one-dimensional Jordan blocks, a (3, 3) three-dimensional Jordan block, and a (5, 5) five-dimensional Jordan block:

A^{(i)} ~ \begin{pmatrix}
\alpha_i & & & & & & & & & \\
 & \alpha_i & & & & & & & & \\
 & & \alpha_i & 1 & & & & & & \\
 & & & \alpha_i & 1 & & & & & \\
 & & & & \alpha_i & & & & & \\
 & & & & & \alpha_i & 1 & & & \\
 & & & & & & \alpha_i & 1 & & \\
 & & & & & & & \alpha_i & 1 & \\
 & & & & & & & & \alpha_i & 1 \\
 & & & & & & & & & \alpha_i
\end{pmatrix},

where A^{(i)} is a (10, 10) upper triangle matrix in which α_i is displayed on the principal diagonal with entries 0 or 1 on the matrix elements next above the principal diagonal and with all other entries zero.
Theorem 12.1 shows that every (n, n) square matrix can be converted to a triangle matrix by a suitable similarity transformation, and the diagonal elements then give the eigenvalues. Furthermore, Theorem 12.5 ensures that A can be reduced to the generalized eigenspaces W̃_{α_i} (1 ≤ i ≤ s) according to the individual eigenvalues. Suppose, for example, that after a suitable similarity transformation a full matrix A is represented as

A ~ \begin{pmatrix}
\alpha_1 & & & & & & & & & & & \\
 & \alpha_2 & * & * & & & & & & & & \\
 & & \alpha_2 & * & & & & & & & & \\
 & & & \alpha_2 & & & & & & & & \\
 & & & & \ddots & & & & & & & \\
 & & & & & \alpha_i & * & * & * & & & \\
 & & & & & & \alpha_i & * & * & & & \\
 & & & & & & & \alpha_i & * & & & \\
 & & & & & & & & \alpha_i & & & \\
 & & & & & & & & & \ddots & & \\
 & & & & & & & & & & \alpha_s & * \\
 & & & & & & & & & & & \alpha_s
\end{pmatrix},  (12.140)

where

A = A^{(1)} ⊕ A^{(2)} ⊕ ⋯ ⊕ A^{(i)} ⊕ ⋯ ⊕ A^{(s)}.  (12.141)

In (12.141), A^{(1)} is a (1, 1) matrix (i.e., simply a number), A^{(2)} is a (3, 3) matrix, ⋯, A^{(i)} is a (4, 4) matrix, ⋯, and A^{(s)} is a (2, 2) matrix.
The above matrix form allows us to deal with the segmented triangle matrices separately. In the case of (12.140) we may use the following matrix for the similarity transformation:

P̃ = \begin{pmatrix}
1 & & & & & & & & \\
 & 1 & & & & & & & \\
 & & 1 & & & & & & \\
 & & & 1 & & & & & \\
 & & & & \ddots & & & & \\
 & & & & & P & & & \\
 & & & & & & \ddots & & \\
 & & & & & & & 1 & \\
 & & & & & & & & 1
\end{pmatrix},

where the (4, 4) matrix

P = \begin{pmatrix} p_{11} & p_{12} & p_{13} & p_{14} \\ p_{21} & p_{22} & p_{23} & p_{24} \\ p_{31} & p_{32} & p_{33} & p_{34} \\ p_{41} & p_{42} & p_{43} & p_{44} \end{pmatrix}

is a non-singular matrix occupying the rows and columns of the α_i-block. The matrix P is to be operated on A^{(i)} so that we can separately perform the similarity transformation with respect to the (4, 4) nilpotent matrix A^{(i)} − α_i E_4, following the procedures mentioned in Sect. 12.6.1.
Thus, only the α_i-associated segment can be treated, with the other segments left unchanged. In a similar fashion, we can consecutively deal with the matrix segments related to the other eigenvalues. In a practical case, however, it is more convenient to seek the different eigenvalues and the corresponding (generalized) eigenvectors at once and convert the matrix to the Jordan canonical form. To make a guess about the structure of a matrix, however, the following argument will be useful. Let us think of an example after that.
Using (12.140), we have

A − α_i E ~ \begin{pmatrix}
\alpha_1-\alpha_i & & & & & & & & & & & \\
 & \alpha_2-\alpha_i & * & * & & & & & & & & \\
 & & \alpha_2-\alpha_i & * & & & & & & & & \\
 & & & \alpha_2-\alpha_i & & & & & & & & \\
 & & & & \ddots & & & & & & & \\
 & & & & & 0 & * & * & * & & & \\
 & & & & & & 0 & * & * & & & \\
 & & & & & & & 0 & * & & & \\
 & & & & & & & & 0 & & & \\
 & & & & & & & & & \ddots & & \\
 & & & & & & & & & & \alpha_s-\alpha_i & * \\
 & & & & & & & & & & & \alpha_s-\alpha_i
\end{pmatrix}.

Here we are treating the (n, n) matrix A on V^n. Note that the matrix A − α_i E is not nilpotent as a whole. Suppose that the multiplicity of α_i is n_i; in (12.140) n_i = 4. Since the eigenvalues α_1, α_2, ⋯, and α_s take different values from one another, α_1 − α_i, α_2 − α_i, ⋯, and α_s − α_i ≠ 0. In a triangle matrix the diagonal elements give the eigenvalues and, hence, α_1 − α_i, α_2 − α_i, ⋯, and α_s − α_i are nonzero eigenvalues of A − α_i E. We rewrite (12.140) as

A − α_i E ~ \begin{pmatrix} M^{(1)} & & & & & \\ & M^{(2)} & & & & \\ & & \ddots & & & \\ & & & M^{(i)} & & \\ & & & & \ddots & \\ & & & & & M^{(s)} \end{pmatrix},  (12.142)

where

M^{(1)} = (α_1 − α_i),  M^{(2)} = \begin{pmatrix} \alpha_2-\alpha_i & * & * \\ & \alpha_2-\alpha_i & * \\ & & \alpha_2-\alpha_i \end{pmatrix}, ⋯,
M^{(i)} = \begin{pmatrix} 0 & * & * & * \\ & 0 & * & * \\ & & 0 & * \\ & & & 0 \end{pmatrix}, ⋯,  M^{(s)} = \begin{pmatrix} \alpha_s-\alpha_i & * \\ & \alpha_s-\alpha_i \end{pmatrix}.

Thus, M^{(p)} (p ≠ i) is a non-singular matrix and M^{(i)} is a nilpotent matrix. Note that if we can find μ_i such that [M^{(i)}]^{μ_i−1} ≠ 0 and [M^{(i)}]^{μ_i} = 0, then the minimal polynomial φ_{M^{(i)}}(x) for M^{(i)} is φ_{M^{(i)}}(x) = x^{μ_i}. Consequently, we get

(A − α_i E)^{μ_i} ~ \begin{pmatrix}
[M^{(1)}]^{\mu_i} & & & & & \\
 & [M^{(2)}]^{\mu_i} & & & & \\
 & & \ddots & & & \\
 & & & 0 & & \\
 & & & & \ddots & \\
 & & & & & [M^{(s)}]^{\mu_i}
\end{pmatrix}.  (12.143)

In (12.143) the diagonal elements of the non-singular triangle matrices [M^{(p)}]^{μ_i} (p ≠ i) are (α_p − α_i)^{μ_i} (≠ 0). Thus, we have a "perforated" matrix (A − α_i E)^{μ_i} in which [M^{(i)}]^{μ_i} = 0. Putting

Φ_A(x) ≡ ∏_{i=1}^{s} (x − α_i)^{μ_i},

we get

Φ_A(A) ≡ ∏_{i=1}^{s} (A − α_i E)^{μ_i} = 0.

The polynomial Φ_A(x) gives the minimal polynomial of A.


From the above argument, we can choose μi for li in (12.82). Meanwhile, M (i) in
(12.142) is identical to Ni in (12.134). Rewriting ((12.134)), we get

AðiÞ ¼ M ðiÞ þ αi E ni : ð12:144Þ


k
Let us think of matrices AðiÞ  αi E ni and (A  αiE)k (k  μi). From (11.45), we
find

dim V n ¼ n ¼ dim Ker ðA  αi E Þk þ rank ðA  αi E Þk , ð12:145Þ

and

dim W̃_{α_i} = n_i = dim Ker [A^{(i)} − α_i E_{n_i}]^k + rank [A^{(i)} − α_i E_{n_i}]^k,  (12.146)

where rank (A − α_i E)^k = dim (A − α_i E)^k (V^n) and rank [A^{(i)} − α_i E_{n_i}]^k = dim [A^{(i)} − α_i E_{n_i}]^k (W̃_{α_i}).

Noting that

dim Ker (A − α_i E)^k = dim Ker [A^{(i)} − α_i E_{n_i}]^k,  (12.147)

we get

n − rank (A − α_i E)^k = n_i − rank [A^{(i)} − α_i E_{n_i}]^k.  (12.148)

This notable property comes from the non-singularity of [M^{(p)}]^k (p ≠ i; k a positive integer); i.e., all the eigenvalues of [M^{(p)}]^k are nonzero. In particular, as rank [A^{(i)} − α_i E_{n_i}]^{μ_i} = 0, from (12.148) we have

rank (A − α_i E)^l = n − n_i (l ≥ μ_i).  (12.149)

Meanwhile, putting k = 1 in (12.148) and using (12.132), we get

dim Ker [A^{(i)} − α_i E_{n_i}] = n_i − rank [A^{(i)} − α_i E_{n_i}] = n − rank (A − α_i E) = dim Ker (A − α_i E) = r_1^{(i)}.  (12.150)

The value r_1^{(i)} gives the number of Jordan blocks with the eigenvalue α_i.
Moreover, we must consider a following situation. We know how the matrix A(i)
in (10.134) is reduced to Jordan blocks of lower dimension. To get detailed infor-
mation about it, however, we have to get the information about (generalized)
eigenvalues corresponding to eigenvalues other than αi. In this context, (12.148) is
useful. Equation (12.131) tells how the number of Jordan blocks in a nilpotent matrix
is determined. If we can get this knowledge before we have found out all the
(generalized) eigenvectors, it will be easier to address the problem. Let us rewrite
(12.131) as
h iq1 h iqþ1
J ðqiÞ ¼ r q  r qþ1 ¼ rank AðiÞ  αi Eni þ rank AðiÞ  αi E ni
h iq
 2rank AðiÞ  αi E ni , ð12:151Þ
12.6 Jordan Canonical Form 501

where we define J ðqiÞ as the number of the q-th order Jordan blocks within A(i). Note
that these blocks are expressed as (q, q) matrices. Meanwhile, using (12.148) J ðqiÞ is
expressed as

J ðqiÞ ¼ rank ðA  αi EÞq1 þ rank ðA  αi E Þqþ1  2rank ðA  αi E Þq : ð12:152Þ

This relation can be obtained by replacing $k$ in (12.148) with $q-1$, $q+1$, and $q$, respectively, and deleting $n$ and $n_i$ from these three relations. This enables us to gain access to the whole structure of the linear transformation represented by the $(n, n)$ matrix $A$ without reducing it to subspaces.
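Equation (12.152) lends itself directly to numerical computation. The following is a minimal sketch (assuming NumPy; the function name is ours) that counts the $q$-th order Jordan blocks of a matrix for a given eigenvalue from matrix ranks alone, without computing any eigenvectors:

```python
import numpy as np

def jordan_block_count(A, alpha, q):
    """Number of q-th order Jordan blocks of A for the eigenvalue alpha,
    evaluated via (12.152):
        J_q = rank(B^(q-1)) + rank(B^(q+1)) - 2*rank(B^q),  B = A - alpha*E."""
    B = A - alpha * np.eye(A.shape[0])
    rank = lambda k: np.linalg.matrix_rank(np.linalg.matrix_power(B, k))
    return rank(q - 1) + rank(q + 1) - 2 * rank(q)
```

For the matrix $A$ of (12.153) below, `jordan_block_count(A, 2, 1)` and `jordan_block_count(A, 2, 2)` both return 1, in agreement with (12.164) and (12.165).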
To enrich our understanding of Jordan canonical forms, the following tangible example will be beneficial.
12.6.3 Example of Jordan Canonical Form

Let us think of the following matrix $A$:

$$A = \begin{pmatrix} 1 & -1 & 0 & 0 \\ 1 & 3 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ -3 & -1 & -2 & 4 \end{pmatrix}. \qquad (12.153)$$

The characteristic polynomial $f_A(x)$ is given by

$$
f_A(x) = \begin{vmatrix} x-1 & 1 & 0 & 0 \\ -1 & x-3 & 0 & 0 \\ 0 & 0 & x-2 & 0 \\ 3 & 1 & 2 & x-4 \end{vmatrix} = (x-4)(x-2)^3. \qquad (12.154)
$$

Equating (12.154) to zero, we get an eigenvalue 4 as a simple root and an eigenvalue 2 as a triple root. The vector space $V^4$ is then decomposed into two invariant subspaces. The first is a one-dimensional kernel (or null-space) of the transformation $(A-4E)$ and the other is a three-dimensional kernel of the transformation $(A-2E)^3$. We have to seek eigenvectors that span these invariant subspaces.
(i) $x = 4$:

An eigenvector belonging to the first invariant subspace must satisfy a proper eigenvalue equation, since the eigenvalue 4 is a simple root. This equation is expressed as

$$(A-4E)\mathbf{x} = 0.$$

This reads in a matrix form as

$$
\begin{pmatrix} -3 & -1 & 0 & 0 \\ 1 & -1 & 0 & 0 \\ 0 & 0 & -2 & 0 \\ -3 & -1 & -2 & 0 \end{pmatrix}
\begin{pmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \end{pmatrix} = 0. \qquad (12.155)
$$

This is equivalent to the following set of four equations:

$$-3c_1 - c_2 = 0, \quad c_1 - c_2 = 0, \quad -2c_3 = 0, \quad -3c_1 - c_2 - 2c_3 = 0.$$

These are equivalent to $c_3 = 0$ and $c_1 = c_2 = -3c_1$. Therefore, $c_1 = c_2 = c_3 = 0$ with an arbitrarily chosen number $c_4$, which is chosen as 1 as usual. Hence, designating the proper eigenvector as $e_1^{(4)}$, its column vector representation is

$$e_1^{(4)} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}.$$

The $(4, 4)$ matrix in (12.155) representing $(A-4E)$ has rank 3. The number of Jordan blocks for the eigenvalue 4 is given by (12.150) as

$$r_1^{(4)} = 4 - \operatorname{rank}\,(A-4E) = 1. \qquad (12.156)$$

In this case, the Jordan block is naturally one-dimensional. In fact, using (12.152) we have

$$
J_1^{(4)} = \operatorname{rank}\,(A-4E)^0 + \operatorname{rank}\,(A-4E)^2 - 2\,\operatorname{rank}\,(A-4E) = 4 + 3 - 2 \cdot 3 = 1. \qquad (12.157)
$$
In (12.157), $J_1^{(4)}$ gives the number of the first-order Jordan blocks for the eigenvalue 4. We used

$$
(A-4E)^2 = \begin{pmatrix} 8 & 4 & 0 & 0 \\ -4 & 0 & 0 & 0 \\ 0 & 0 & 4 & 0 \\ 8 & 4 & 4 & 0 \end{pmatrix} \qquad (12.158)
$$

and confirmed that $\operatorname{rank}\,(A-4E)^2 = 3$.

(ii) $x = 2$:

The eigenvalue 2 is a triple root. Therefore, we must examine how the invariant subspace can further be decomposed into subspaces of lower dimension. To this end we first start with a secular equation expressed as

$$(A-2E)\mathbf{x} = 0. \qquad (12.159)$$

The matrix representation is

$$
\begin{pmatrix} -1 & -1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ -3 & -1 & -2 & 2 \end{pmatrix}
\begin{pmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \end{pmatrix} = 0.
$$

This is equivalent to the following set of two equations:

$$c_1 + c_2 = 0, \quad -3c_1 - c_2 - 2c_3 + 2c_4 = 0.$$

From the above, we can put $c_1 = c_2 = 0$ and $c_3 = c_4$ ($= 1$). The equations also allow the existence of another proper eigenvector. For this we have $c_1 = -c_2 = 1$, $c_3 = 0$, and $c_4 = 1$. Thus, for the two proper eigenvectors corresponding to the eigenvalue 2, we get

$$
e_1^{(2)} = \begin{pmatrix} 0 \\ 0 \\ 1 \\ 1 \end{pmatrix}, \qquad
e_2^{(2)} = \begin{pmatrix} 1 \\ -1 \\ 0 \\ 1 \end{pmatrix}.
$$
The dimension of the invariant subspace corresponding to the eigenvalue 2 is three (due to the triple root) and, hence, there should be one generalized eigenvector. To determine it, we examine the following matrix equation:

$$(A-2E)^2 \mathbf{x} = 0. \qquad (12.160)$$

The matrix representation is

$$
\begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ -4 & 0 & -4 & 4 \end{pmatrix}
\begin{pmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \end{pmatrix} = 0.
$$

That is,

$$-c_1 - c_3 + c_4 = 0. \qquad (12.161)$$

Furthermore, we have

$$
(A-2E)^3 = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ -8 & 0 & -8 & 8 \end{pmatrix}. \qquad (12.162)
$$

Moreover, $\operatorname{rank}\,(A-2E)^l$ ($= 1$) remains unchanged for $l \ge 2$, as expected from (12.149).
It will be convenient to examine the structure of the invariant subspace. For this purpose, we seek the number of Jordan blocks $r_1^{(2)}$ and their order. Using (12.150), we have

$$r_1^{(2)} = 4 - \operatorname{rank}\,(A-2E) = 4 - 2 = 2. \qquad (12.163)$$

The number of first-order Jordan blocks is

$$
J_1^{(2)} = \operatorname{rank}\,(A-2E)^0 + \operatorname{rank}\,(A-2E)^2 - 2\,\operatorname{rank}\,(A-2E) = 4 + 1 - 2 \cdot 2 = 1. \qquad (12.164)
$$

In turn, the number of second-order Jordan blocks is

$$
J_2^{(2)} = \operatorname{rank}\,(A-2E) + \operatorname{rank}\,(A-2E)^3 - 2\,\operatorname{rank}\,(A-2E)^2 = 2 + 1 - 2 \cdot 1 = 1. \qquad (12.165)
$$
In the above, $J_1^{(2)}$ and $J_2^{(2)}$ are obtained from (12.152). Thus, Fig. 12.2 gives the constitution of Jordan blocks for the eigenvalues 4 and 2. The overall number of Jordan blocks is three; the number of the first-order and second-order Jordan blocks is two and one, respectively.

The proper eigenvector $e_1^{(2)}$ is related to $J_1^{(2)}$ of (12.164). The set of the proper eigenvector $e_2^{(2)}$ and the corresponding generalized eigenvector $g_2^{(2)}$ is pertinent to $J_2^{(2)}$ of (12.165). We must have the generalized eigenvector $g_2^{(2)}$ in such a way that

$$
(A-2E)^2 g_2^{(2)} = (A-2E)\bigl[(A-2E) g_2^{(2)}\bigr] = (A-2E)\, e_2^{(2)} = 0. \qquad (12.166)
$$

From (12.161), we can put $c_1 = c_3 = c_4 = 0$ and $c_2 = -1$. Thus, the matrix representation of the generalized eigenvector $g_2^{(2)}$ is

$$g_2^{(2)} = \begin{pmatrix} 0 \\ -1 \\ 0 \\ 0 \end{pmatrix}. \qquad (12.167)$$

We stress here that $e_1^{(2)}$ is not eligible for a proper pair with $g_2^{(2)}$ in $J_2$. It is because from (12.166) we have

$$(A-2E)\, g_2^{(2)} = e_2^{(2)}, \qquad (A-2E)\, g_2^{(2)} \ne e_1^{(2)}. \qquad (12.168)$$

Thus, we have determined a set of (generalized) eigenvectors. The matrix representation $R$ for the basis vectors transformation is given by

$$
R = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & -1 & -1 \\ 0 & 1 & 0 & 0 \\ 1 & 1 & 1 & 0 \end{pmatrix}
= \Bigl(\, \widetilde{e_1^{(4)}}\ \widetilde{e_1^{(2)}}\ \widetilde{e_2^{(2)}}\ \widetilde{g_2^{(2)}} \,\Bigr), \qquad (12.169)
$$

where the symbol $\sim$ denotes the column vector representation; $e_1^{(4)}$, $e_1^{(2)}$, and $e_2^{(2)}$ represent proper eigenvectors and $g_2^{(2)}$ is a generalized eigenvector. Performing a similarity transformation using this $R$, we get the following Jordan canonical form:
Fig. 12.2 Structure of Jordan blocks of the matrix shown in (12.170): for the eigenvalue 4 there is a single first-order Jordan block; for the eigenvalue 2 there are one first-order and one second-order Jordan block.
$$
R^{-1} A R =
\begin{pmatrix} -1 & 0 & -1 & 1 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ -1 & -1 & 0 & 0 \end{pmatrix}
\begin{pmatrix} 1 & -1 & 0 & 0 \\ 1 & 3 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ -3 & -1 & -2 & 4 \end{pmatrix}
\begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & -1 & -1 \\ 0 & 1 & 0 & 0 \\ 1 & 1 & 1 & 0 \end{pmatrix}
= \begin{pmatrix} 4 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 2 \end{pmatrix}. \qquad (12.170)
$$

The structure of Jordan blocks is shown in Fig. 12.2. Notice here that the trace of $A$ remains unchanged before and after the similarity transformation.
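The arithmetic of (12.170) is easy to check numerically; a minimal sketch (NumPy assumed, variable names ours):

```python
import numpy as np

# Matrices A and R of (12.153) and (12.169).
A = np.array([[ 1, -1,  0, 0],
              [ 1,  3,  0, 0],
              [ 0,  0,  2, 0],
              [-3, -1, -2, 4]], dtype=float)
R = np.array([[0, 0,  1,  0],
              [0, 0, -1, -1],
              [0, 1,  0,  0],
              [1, 1,  1,  0]], dtype=float)

J = np.linalg.inv(R) @ A @ R
print(np.round(J))                # the Jordan form of (12.170)
print(np.trace(A), np.trace(J))   # both 10: the trace is invariant
```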
Next, we consider column vector representations. According to (11.37), let us view the matrix $A$ as a linear transformation over $V^4$. Then $A$ is given by

$$
A(\mathbf{x}) = (e_1\ e_2\ e_3\ e_4)\, A \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}
= (e_1\ e_2\ e_3\ e_4)
\begin{pmatrix} 1 & -1 & 0 & 0 \\ 1 & 3 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ -3 & -1 & -2 & 4 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}, \qquad (12.171)
$$

where $e_1, e_2, e_3,$ and $e_4$ are basis vectors and $x_1, x_2, x_3,$ and $x_4$ are the corresponding coordinates of a vector $\mathbf{x} = \sum_{i=1}^{n} x_i e_i \in V^4$. We rewrite (12.171) as
$$
A(\mathbf{x}) = (e_1\ e_2\ e_3\ e_4)\, R R^{-1} A R R^{-1}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}
= \Bigl( e_1^{(4)}\ e_1^{(2)}\ e_2^{(2)}\ g_2^{(2)} \Bigr)
\begin{pmatrix} 4 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 2 \end{pmatrix}
\begin{pmatrix} x_1' \\ x_2' \\ x_3' \\ x_4' \end{pmatrix}, \qquad (12.172)
$$

where we have

$$
\Bigl( e_1^{(4)}\ e_1^{(2)}\ e_2^{(2)}\ g_2^{(2)} \Bigr) = (e_1\ e_2\ e_3\ e_4)\, R, \qquad
\begin{pmatrix} x_1' \\ x_2' \\ x_3' \\ x_4' \end{pmatrix} = R^{-1}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}. \qquad (12.173)
$$
After (11.84), we have

$$
\mathbf{x} = (e_1\ e_2\ e_3\ e_4)
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}
= (e_1\ e_2\ e_3\ e_4)\, R R^{-1}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}
= \Bigl( e_1^{(4)}\ e_1^{(2)}\ e_2^{(2)}\ g_2^{(2)} \Bigr)
\begin{pmatrix} x_1' \\ x_2' \\ x_3' \\ x_4' \end{pmatrix}. \qquad (12.174)
$$
As for $V^n$ in general, let us put $R = (p_{ij})$ and $R^{-1} = (q_{ij})$. Also let us represent the $j$-th (generalized) eigenvector by a column vector and denote it by $p^{(j)}$. There we display the individual (generalized) eigenvectors in the order $(e^{(1)}\ e^{(2)} \cdots e^{(j)} \cdots e^{(n)})$, where $e^{(j)}$ ($1 \le j \le n$) denotes either a proper eigenvector or a generalized eigenvector according to (12.174). Each $p^{(j)}$ is represented in reference to the original basis vectors $(e_1 \cdots e_n)$. Then we have

$$
R^{-1} p^{(j)} = \Bigl( \textstyle\sum_{k=1}^{n} q_{ik}\, p_k^{(j)} \Bigr) = \delta_i^{(j)}, \qquad (12.175)
$$

where $\delta_i^{(j)}$ denotes a column vector in which only the $j$-th row is 1, all other entries being 0. Thus, the column vector $\delta_i^{(j)}$ is an "address" of $e^{(j)}$ in reference to $(e^{(1)}\ e^{(2)} \cdots e^{(j)} \cdots e^{(n)})$ taken as basis vectors.

In our present case, in fact, we have

$$
R^{-1} p^{(1)} =
\begin{pmatrix} -1 & 0 & -1 & 1 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ -1 & -1 & 0 & 0 \end{pmatrix}
\begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}
= \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \qquad
R^{-1} p^{(2)} =
\begin{pmatrix} -1 & 0 & -1 & 1 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ -1 & -1 & 0 & 0 \end{pmatrix}
\begin{pmatrix} 0 \\ 0 \\ 1 \\ 1 \end{pmatrix}
= \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix},
$$

$$
R^{-1} p^{(3)} =
\begin{pmatrix} -1 & 0 & -1 & 1 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ -1 & -1 & 0 & 0 \end{pmatrix}
\begin{pmatrix} 1 \\ -1 \\ 0 \\ 1 \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}, \qquad
R^{-1} p^{(4)} =
\begin{pmatrix} -1 & 0 & -1 & 1 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ -1 & -1 & 0 & 0 \end{pmatrix}
\begin{pmatrix} 0 \\ -1 \\ 0 \\ 0 \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}. \qquad (12.176)
$$

In (12.176), $p^{(1)}$ is the column vector representation of $e_1^{(4)}$; $p^{(2)}$ is the column vector representation of $e_1^{(2)}$, and so on.

The minimal polynomial $\Phi_A(x)$ is expressed as

$$\Phi_A(x) = (x-4)(x-2)^2.$$

Readers can easily make sure of it.


We remark that a non-singular matrix $R$ pertinent to the similarity transformation is not uniquely determined; rather, we have some arbitrariness. In (12.169), for example, if we adopt $-g_2^{(2)}$ instead of $g_2^{(2)}$, we should adopt $-e_2^{(2)}$ instead of $e_2^{(2)}$ accordingly. Thus, instead of $R$ in (12.169) we may choose

$$
R' = \begin{pmatrix} 0 & 0 & -1 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 0 \\ 1 & 1 & -1 & 0 \end{pmatrix}. \qquad (12.177)
$$

In this case, we also get the same Jordan canonical form as before. That is,

$$
R'^{-1} A R' = \begin{pmatrix} 4 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 2 \end{pmatrix}.
$$

Suppose that we choose $R''$ such that

$$
(e_1\ e_2\ e_3\ e_4)\, R'' = (e_1\ e_2\ e_3\ e_4)
\begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & -1 & -1 & 0 \\ 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 1 \end{pmatrix}
= \Bigl( e_1^{(2)}\ e_2^{(2)}\ g_2^{(2)}\ e_1^{(4)} \Bigr).
$$

In this case, we have

$$
R''^{-1} A R'' = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 4 \end{pmatrix}.
$$

Note that we get a different disposition of the matrix elements from that of (12.172).

Next, we decompose $A$ into a semi-simple matrix and a nilpotent matrix. In (12.172), we had

$$
R^{-1} A R = \begin{pmatrix} 4 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 2 \end{pmatrix}.
$$

Defining

$$
S = \begin{pmatrix} 4 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{pmatrix} \quad \text{and} \quad
N = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix},
$$

we have

$$R^{-1} A R = S + N, \quad \text{i.e.,} \quad A = R(S+N)R^{-1} = R S R^{-1} + R N R^{-1}.$$

Performing the above matrix calculations and putting $\widetilde{S} = R S R^{-1}$ and $\widetilde{N} = R N R^{-1}$, we get

$$A = \widetilde{S} + \widetilde{N}$$

with

$$
\widetilde{S} = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ -2 & 0 & -2 & 4 \end{pmatrix} \quad \text{and} \quad
\widetilde{N} = \begin{pmatrix} -1 & -1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ -1 & -1 & 0 & 0 \end{pmatrix}.
$$

That is, we have

$$
A = \begin{pmatrix} 1 & -1 & 0 & 0 \\ 1 & 3 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ -3 & -1 & -2 & 4 \end{pmatrix}
= \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ -2 & 0 & -2 & 4 \end{pmatrix}
+ \begin{pmatrix} -1 & -1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ -1 & -1 & 0 & 0 \end{pmatrix}. \qquad (12.178)
$$

Even though the matrix forms of $S$ and $N$ differ depending on the choice of the similarity transformation $R$ (namely, $R'$, $R''$, etc. represented above), the decomposition (12.178) is unique. That is, $\widetilde{S}$ and $\widetilde{N}$ are uniquely determined once a matrix $A$ is given. The confirmation is left for readers as an exercise.
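As a numerical sketch of that exercise (NumPy assumed), one can confirm that $\widetilde{S}$ and $\widetilde{N}$ of (12.178) add up to $A$, that $\widetilde{N}$ is nilpotent, and that the two parts commute, which is the hallmark of the semi-simple/nilpotent decomposition:

```python
import numpy as np

A = np.array([[ 1, -1,  0, 0],
              [ 1,  3,  0, 0],
              [ 0,  0,  2, 0],
              [-3, -1, -2, 4]], dtype=float)
S = np.array([[ 2, 0,  0, 0],
              [ 0, 2,  0, 0],
              [ 0, 0,  2, 0],
              [-2, 0, -2, 4]], dtype=float)    # semi-simple part of (12.178)
N = A - S                                      # nilpotent part of (12.178)

assert np.allclose(np.linalg.matrix_power(N, 2), 0)  # N is nilpotent (N^2 = 0)
assert np.allclose(S @ N, N @ S)                     # the two parts commute
```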
We present another simple example. Let $A$ be a matrix described as

$$A = \begin{pmatrix} 0 & -4 \\ 1 & 4 \end{pmatrix}.$$

The eigenvalue of $A$ is 2 as a double root. According to routine, we have an eigenvector $e_1^{(2)}$ as a column vector, e.g.,

$$e_1^{(2)} = \begin{pmatrix} 2 \\ -1 \end{pmatrix}.$$

Another eigenvector is a generalized eigenvector $g_1^{(2)}$ of rank 2. This can be decided such that

$$(A-2E)\, g_1^{(2)} = \begin{pmatrix} -2 & -4 \\ 1 & 2 \end{pmatrix} g_1^{(2)} = e_1^{(2)}.$$

As an option, we get

$$g_1^{(2)} = \begin{pmatrix} 1 \\ -1 \end{pmatrix}.$$

Thus, we can choose $R$ for the transformation matrix together with its inverse matrix $R^{-1}$ such that

$$
R = \begin{pmatrix} 2 & 1 \\ -1 & -1 \end{pmatrix} \quad \text{and} \quad
R^{-1} = \begin{pmatrix} 1 & 1 \\ -1 & -2 \end{pmatrix}.
$$

Therefore, with a Jordan canonical form we have

$$R^{-1} A R = \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}. \qquad (12.179)$$

As before, putting

$$S = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} \quad \text{and} \quad N = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix},$$

we have

$$R^{-1} A R = S + N, \quad \text{i.e.,} \quad A = R(S+N)R^{-1} = R S R^{-1} + R N R^{-1}.$$

Putting $\widetilde{S} = R S R^{-1}$ and $\widetilde{N} = R N R^{-1}$, we get

$$A = \widetilde{S} + \widetilde{N}$$

with

$$
\widetilde{S} = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} \quad \text{and} \quad
\widetilde{N} = \begin{pmatrix} -2 & -4 \\ 1 & 2 \end{pmatrix}. \qquad (12.180)
$$
We may also choose $R'$ for the transformation matrix together with its inverse $R'^{-1}$ instead of $R$ and $R^{-1}$, respectively, such that we have, e.g.,

$$
R' = \begin{pmatrix} -2 & 3 \\ 1 & -1 \end{pmatrix} \quad \text{and} \quad
R'^{-1} = \begin{pmatrix} 1 & 3 \\ 1 & 2 \end{pmatrix}.
$$

Using these matrices, we get exactly the same Jordan canonical form and the same matrix decomposition as (12.179) and (12.180). Thus, again we find that the matrix decomposition is unique.
Another simple example is the lower triangular matrix

$$A = \begin{pmatrix} 2 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

Following the now familiar procedures, as a diagonalizing matrix we have, e.g.,

$$
R = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \quad \text{and} \quad
R^{-1} = \begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
$$

Then, we get

$$R^{-1} A R = S = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

Therefore, the "decomposition" is

$$
A = R S R^{-1} = \begin{pmatrix} 2 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} + \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},
$$

where the first term is a semi-simple matrix and the second is a nilpotent matrix (i.e., the zero matrix). Thus, the decomposition is once again unique.

12.7 Diagonalizable Matrices

Among canonical forms of matrices, the simplest form is a diagonalizable matrix. Here we define a diagonalizable matrix as a matrix that can be converted to one whose off-diagonal elements are all zero. In Sect. 12.5 we investigated various properties of matrices. In this section we examine the basic properties of diagonalizable matrices.

In Sect. 12.6.1 we have shown that $\operatorname{Span}\{N^{i-1} a_\rho,\ N^{i-2} a_\rho, \cdots, N a_\rho,\ a_\rho\}$ forms an $N$-invariant subspace of dimension $i$, where $a_\rho$ satisfies the relations $N^i a_\rho = 0$ and $N^{i-1} a_\rho \ne 0$ ($r_{i+1} + 1 \le j \le r_i$) as in (12.122). Of these vectors, only $N^{i-1} a_\rho$ is a proper eigenvector; it is accompanied by $(i-1)$ generalized eigenvectors. Note that only the proper eigenvector can construct a one-dimensional $N$-invariant subspace by itself. This is because for any other generalized eigenvector $g$ (here $g$ stands for all the generalized eigenvectors), $Ng$ ($\ne 0$) and $g$ are linearly independent. Note that with a proper eigenvector $e$, we have $Ne = 0$. The corresponding Jordan block is represented by a matrix as given in (12.122) in reference to the basis vectors comprising these $i$ eigenvectors. Therefore, if an $(n, n)$ matrix $A$ has only proper eigenvectors, all the Jordan blocks are one-dimensional. This means that $A$ is diagonalizable. That $A$ has only proper eigenvectors is equivalent to saying that each of those eigenvectors forms a one-dimensional subspace and that $V^n$ is a direct sum of the subspaces spanned by the individual proper eigenvectors. In other words, if $V^n$ is a direct sum of subspaces (i.e., eigenspaces) spanned by individual proper eigenvectors of $A$, then $A$ is diagonalizable.
Next, suppose that $A$ is diagonalizable. Then, after an appropriate similarity transformation with a non-singular matrix $P$, $A$ has the following form:

$$
P^{-1} A P = \operatorname{diag}(\alpha_1, \cdots, \alpha_1,\ \alpha_2, \cdots, \alpha_2,\ \cdots,\ \alpha_s, \cdots, \alpha_s). \qquad (12.181)
$$

In this case, let us examine what form a minimal polynomial $\varphi_A(x)$ for $A$ takes. The characteristic polynomial $f_A(x)$ for $A$ is invariant under a similarity transformation, and so is $\varphi_A(x)$. That is,

$$\varphi_{P^{-1}AP}(x) = \varphi_A(x). \qquad (12.182)$$

From (12.181), we find that $A - \alpha_i E$ ($1 \le i \le s$) has a "perforated" form such as (12.143), with the diagonalized form unchanged. Then we have
$$(A-\alpha_1 E)(A-\alpha_2 E) \cdots (A-\alpha_s E) = 0. \qquad (12.183)$$

This is because a product of matrices having only diagonal elements is merely a product of the individual diagonal elements. Meanwhile, in virtue of the Hamilton–Cayley theorem, we have

$$f_A(A) = \prod_{i=1}^{s} (A - \alpha_i E)^{n_i} = 0.$$

Rewriting this expression, we have

$$(A-\alpha_1 E)^{n_1} (A-\alpha_2 E)^{n_2} \cdots (A-\alpha_s E)^{n_s} = 0. \qquad (12.184)$$

In light of (12.183), this implies that the minimal polynomial $\varphi_A(x)$ is expressed as

$$\varphi_A(x) = (x-\alpha_1)(x-\alpha_2) \cdots (x-\alpha_s). \qquad (12.185)$$

Surely $\varphi_A(x)$ of (12.185) is a lowest-order polynomial among those satisfying $f(A) = 0$ and is a divisor of $f_A(x)$. Also $\varphi_A(x)$ has a highest-order coefficient of 1. Thus, $\varphi_A(x)$ should be a minimal polynomial of $A$ and we conclude that $\varphi_A(x)$ does not have a multiple root.

Then let us think how $V^n$ is characterized in case $\varphi_A(x)$ does not have a multiple root. This is equivalent to $\varphi_A(x)$ being described by (12.185). To see this, suppose that we have two matrices $A$ and $B$ and let $BV^n = W$. We wish to use the following relation:

$$
\operatorname{rank}\,(AB) = \dim ABV^n = \dim AW = \dim W - \dim\bigl(A^{-1}\{0\} \cap W\bigr)
\ge \dim W - \dim A^{-1}\{0\} = \dim BV^n - (n - \dim AV^n) = \operatorname{rank} A + \operatorname{rank} B - n. \qquad (12.186)
$$

In (12.186), the third equality comes from the fact that the domain of $A$ is restricted to $W$. Concomitantly, $A^{-1}\{0\}$ is restricted to $A^{-1}\{0\} \cap W$ as well; notice that $A^{-1}\{0\} \cap W$ is a subspace. Considering these situations, we use a relation corresponding to that of (11.45). The fourth equality is due to the dimension theorem of (11.45). Applying (12.186) to (12.183) successively, we have
$$
\begin{aligned}
0 &= \operatorname{rank}\,[(A-\alpha_1 E)(A-\alpha_2 E) \cdots (A-\alpha_s E)] \\
&\ge \operatorname{rank}\,(A-\alpha_1 E) + \operatorname{rank}\,[(A-\alpha_2 E) \cdots (A-\alpha_s E)] - n \\
&\ge \operatorname{rank}\,(A-\alpha_1 E) + \operatorname{rank}\,(A-\alpha_2 E) + \operatorname{rank}\,[(A-\alpha_3 E) \cdots (A-\alpha_s E)] - 2n \\
&\;\;\vdots \\
&\ge \operatorname{rank}\,(A-\alpha_1 E) + \cdots + \operatorname{rank}\,(A-\alpha_s E) - (s-1)n
= \sum_{i=1}^{s} \bigl[\operatorname{rank}\,(A-\alpha_i E) - n\bigr] + n.
\end{aligned}
$$

Finally we get

$$\sum_{i=1}^{s} \bigl[n - \operatorname{rank}\,(A-\alpha_i E)\bigr] \ge n. \qquad (12.187)$$

As $n - \operatorname{rank}\,(A-\alpha_i E) = \dim W_{\alpha_i}$, we have

$$\sum_{i=1}^{s} \dim W_{\alpha_i} \ge n. \qquad (12.188)$$

Meanwhile, we have

$$
V^n \supset W_{\alpha_1} \oplus W_{\alpha_2} \oplus \cdots \oplus W_{\alpha_s}, \qquad
n \ge \dim\bigl(W_{\alpha_1} \oplus W_{\alpha_2} \oplus \cdots \oplus W_{\alpha_s}\bigr) = \sum_{i=1}^{s} \dim W_{\alpha_i}. \qquad (12.189)
$$

The equality results from the property of a direct sum. From (12.188) and (12.189), we get

$$\sum_{i=1}^{s} \dim W_{\alpha_i} = n. \qquad (12.190)$$

Hence,

$$V^n = W_{\alpha_1} \oplus W_{\alpha_2} \oplus \cdots \oplus W_{\alpha_s}. \qquad (12.191)$$

Thus, we have proven that if the minimal polynomial does not have a multiple root, $V^n$ is decomposed into a direct sum of eigenspaces as in (12.191).

If in turn $V^n$ is decomposed into a direct sum of eigenspaces as in (12.191), $A$ can be diagonalized by a similarity transformation. The proof is as follows. Suppose that (12.191) holds. Then, we can take only eigenvectors for the basis vectors of $V^n$. Suppose that $\dim W_{\alpha_i} = n_i$. Then, we can take vectors $a_k$ $\bigl(\sum_{j=1}^{i-1} n_j + 1 \le k \le \sum_{j=1}^{i} n_j\bigr)$ so that the $a_k$ can be the basis vectors of $W_{\alpha_i}$. In reference to this basis set, we describe a vector $\mathbf{x} \in V^n$ such that

$$\mathbf{x} = (a_1\ a_2 \cdots a_n) \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}.$$

Operating $A$ on $\mathbf{x}$, we get

$$
A(\mathbf{x}) = (a_1\ a_2 \cdots a_n)\, A \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
= \bigl(A(a_1)\ A(a_2) \cdots A(a_n)\bigr) \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
= (\alpha_1 a_1\ \alpha_2 a_2 \cdots \alpha_n a_n) \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
= (a_1\ a_2 \cdots a_n)
\begin{pmatrix} \alpha_1 & & & \\ & \alpha_2 & & \\ & & \ddots & \\ & & & \alpha_n \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \qquad (12.192)
$$

where with the second equality we used the notation of (11.40); with the third equality some of the $\alpha_i$ ($1 \le i \le n$) may be identical; $a_i$ is an eigenvector that corresponds to the eigenvalue $\alpha_i$. Suppose that $a_1, a_2, \cdots,$ and $a_n$ are obtained by transforming an "original" basis set $e_1, e_2, \cdots,$ and $e_n$ by $R$. Then, we have

$$
A(\mathbf{x}) = (e_1\ e_2 \cdots e_n)\, R
\begin{pmatrix} \alpha_1 & & & \\ & \alpha_2 & & \\ & & \ddots & \\ & & & \alpha_n \end{pmatrix}
R^{-1}
\begin{pmatrix} x_1^{(0)} \\ x_2^{(0)} \\ \vdots \\ x_n^{(0)} \end{pmatrix}.
$$

We denote the transformation $A$ with respect to the basis set $e_1, e_2, \cdots,$ and $e_n$ by $A'$; see (11.82) for the notation. Then, we have

$$
A(\mathbf{x}) = (e_1\ e_2 \cdots e_n)\, A' \begin{pmatrix} x_1^{(0)} \\ x_2^{(0)} \\ \vdots \\ x_n^{(0)} \end{pmatrix}.
$$

Therefore, we get

$$
R^{-1} A' R = \begin{pmatrix} \alpha_1 & & & \\ & \alpha_2 & & \\ & & \ddots & \\ & & & \alpha_n \end{pmatrix}. \qquad (12.193)
$$

Thus, $A$ is similar to a diagonal matrix as represented in (12.192) and (12.193).

It is straightforward to show that a minimal polynomial of a diagonalizable matrix has no multiple root. The proof is left for readers.

Summarizing the above arguments, we have the following theorem:

Theorem 12.7 [3, 4] The following three statements related to $A$ are equivalent:
(i) The matrix $A$ is similar to a diagonal matrix.
(ii) The minimal polynomial $\varphi_A(x)$ does not have a multiple root.
(iii) The vector space $V^n$ is decomposed into a direct sum of eigenspaces.

In Example 12.1 we showed the diagonalization of a matrix. There $A$ has two different eigenvalues. Since for an $(n, n)$ matrix having $n$ different eigenvalues the characteristic polynomial does not have a multiple root, the minimal polynomial necessarily has no multiple root either. The above theorem therefore ensures that a matrix whose characteristic polynomial has no multiple root must be diagonalizable.

Another consequence of this theorem is that an idempotent matrix is diagonalizable. Such a matrix is characterized by $A^2 = A$. Then $A(A-E) = 0$. Taking its determinant, $(\det A)[\det(A-E)] = 0$. Therefore, we have either $\det A = 0$ or $\det(A-E) = 0$. Hence, the eigenvalues of $A$ are zero or 1. Think of $f(x) = x(x-1)$. As $f(A) = 0$, the minimal polynomial is a divisor of $f(x)$. It has no multiple root, and so the matrix is diagonalizable.
Example 12.6 Let us revisit Example 12.1, where we dealt with

$$A = \begin{pmatrix} 2 & 1 \\ 0 & 1 \end{pmatrix}. \qquad (12.32)$$

From (12.33), $f_A(x) = (x-2)(x-1)$. Note that $f_A(x) = f_{P^{-1}AP}(x)$. Let us treat the problem according to Theorem 12.5. Also we use the notation of (12.85). Given $f_1(x) = x-1$ and $f_2(x) = x-2$, let us decide $M_1(x)$ and $M_2(x)$ such that they satisfy

$$M_1(x) f_1(x) + M_2(x) f_2(x) = 1. \qquad (12.194)$$

We find $M_1(x) = 1$ and $M_2(x) = -1$. Thus, using the notation of Theorem 12.5, Sect. 12.4, we have

$$
A_1 = M_1(A) f_1(A) = A - E = \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}, \qquad
A_2 = M_2(A) f_2(A) = -A + 2E = \begin{pmatrix} 0 & -1 \\ 0 & 1 \end{pmatrix}.
$$

We also have

$$A_1 + A_2 = E, \qquad A_i A_j = A_j A_i = A_i \delta_{ij}. \qquad (12.195)$$

Thus, we find that $A_1$ and $A_2$ are idempotent matrices. As both $A_1$ and $A_2$ are expressed by a polynomial in $A$, they commute with $A$.
We find that $A$ is represented by

$$A = \alpha_1 A_1 + \alpha_2 A_2, \qquad (12.196)$$

where $\alpha_1$ and $\alpha_2$ denote the eigenvalues 2 and 1, respectively. Thus, choosing proper eigenvectors for the basis vectors, we have decomposed the vector space $V^n$ into a direct sum of invariant subspaces comprising the proper eigenvectors. Concomitantly, $A$ is represented as in (12.196). The relevant decomposition is always possible for a diagonalizable matrix.

Thus, idempotent matrices play an important role in the theory of linear vector spaces.
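A small numerical sketch of Example 12.6 (NumPy assumed) confirms the projector properties of (12.195) and the spectral decomposition (12.196):

```python
import numpy as np

A  = np.array([[2, 1], [0, 1]], dtype=float)
E  = np.eye(2)
A1 = A - E          # M1(A) f1(A) with M1 = 1,  f1 = x - 1
A2 = -A + 2 * E     # M2(A) f2(A) with M2 = -1, f2 = x - 2

assert np.allclose(A1 @ A1, A1) and np.allclose(A2 @ A2, A2)  # idempotent
assert np.allclose(A1 @ A2, 0) and np.allclose(A1 + A2, E)    # (12.195)
assert np.allclose(2 * A1 + 1 * A2, A)                        # (12.196)
```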
Example 12.7 Let us think of the following matrix:

$$A = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{pmatrix}. \qquad (12.197)$$

This is a triangular matrix, and so the diagonal elements give the eigenvalues. We have an eigenvalue 1 as a double root and an eigenvalue 0 as a simple root. The matrix can be diagonalized using $P$ such that

$$
\widetilde{A} = P^{-1} A P =
\begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & -1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & -1 \end{pmatrix}
= \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}. \qquad (12.198)
$$

As can be checked easily, $\widetilde{A}^2 = \widetilde{A}$. We also have

$$
E - \widetilde{A} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad (12.199)
$$

where $\bigl(E - \widetilde{A}\bigr)^2 = E - \widetilde{A}$ holds as well. Moreover, $\widetilde{A}\bigl(E - \widetilde{A}\bigr) = \bigl(E - \widetilde{A}\bigr)\widetilde{A} = 0$. Thus $\widetilde{A}$ and $E - \widetilde{A}$ behave like $A_1$ and $A_2$ of (12.195).
Next, suppose that $\mathbf{x} \in V^n$ is expressed as a linear combination of basis vectors $a_1, a_2, \cdots,$ and $a_n$. Then

$$\mathbf{x} = c_1 a_1 + c_2 a_2 + \cdots + c_{n-1} a_{n-1} + c_n a_n. \qquad (12.200)$$

Here let us define the following linear transformation $P^{(k)}$ such that $P^{(k)}$ "extracts" the $k$-th component of $\mathbf{x}$. That is,

$$
P^{(k)}(\mathbf{x}) = P^{(k)}\Bigl( \sum_{j=1}^{n} c_j a_j \Bigr) = \sum_{j=1}^{n} \sum_{i=1}^{n} p_{ij}^{[k]} c_j a_i = c_k a_k, \qquad (12.201)
$$

where $p_{ij}^{[k]}$ is the matrix representation of $P^{(k)}$. In fact, suppose that there is another arbitrarily chosen vector $\mathbf{y}$ such that

$$\mathbf{y} = d_1 a_1 + d_2 a_2 + \cdots + d_{n-1} a_{n-1} + d_n a_n. \qquad (12.202)$$

Then we have

$$
P^{(k)}(a\mathbf{x} + b\mathbf{y}) = (a c_k + b d_k)\, a_k = a c_k a_k + b d_k a_k = a P^{(k)}(\mathbf{x}) + b P^{(k)}(\mathbf{y}). \qquad (12.203)
$$

Thus $P^{(k)}$ is a linear transformation. In (12.201), for the third equality to hold, we should have

$$p_{ij}^{[k]} = \delta_i^{(k)} \delta_j^{(k)}, \qquad (12.204)$$

where $\delta_i^{(j)}$ has been defined in (12.175). Meanwhile, $\delta_j^{(k)}$ denotes a row vector in which only the $k$-th column is 1, all other entries being 0. Note that $\delta_i^{(k)}$ represents an $(n, 1)$ matrix and that $\delta_j^{(k)}$ denotes a $(1, n)$ matrix. Therefore, $\delta_i^{(k)} \delta_j^{(k)}$ represents an $(n, n)$ matrix whose $(k, k)$ element is 1, with all other elements 0. Thus, $P^{(k)}(\mathbf{x})$ is denoted by

$$
P^{(k)}(\mathbf{x}) = (a_1 \cdots a_n)
\begin{pmatrix}
0 & & & & \\
 & \ddots & & & \\
 & & 1 & & \\
 & & & \ddots & \\
 & & & & 0
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = x_k a_k, \qquad (12.205)
$$

where only the $(k, k)$ element is 1, all other elements being 0. Then $P^{(k)}[P^{(k)}(\mathbf{x})] = P^{(k)}(\mathbf{x})$. That is,

$$\bigl[P^{(k)}\bigr]^2 = P^{(k)}. \qquad (12.206)$$

Also $P^{(k)}[P^{(l)}(\mathbf{x})] = 0$ if $k \ne l$. Meanwhile, we have $P^{(1)}(\mathbf{x}) + \cdots + P^{(n)}(\mathbf{x}) = \mathbf{x}$. Hence, $P^{(1)}(\mathbf{x}) + \cdots + P^{(n)}(\mathbf{x}) = [P^{(1)} + \cdots + P^{(n)}](\mathbf{x}) = \mathbf{x}$. Since this relation holds with any $\mathbf{x} \in V^n$, we get

$$P^{(1)} + \cdots + P^{(n)} = E. \qquad (12.207)$$

As shown above, an idempotent matrix such as $P^{(k)}$ always exists. In particular, if the basis vectors comprise only proper eigenvectors, the decomposition as expressed in (12.196) is possible. In that case, it is described as

$$A = \alpha_1 A_1 + \cdots + \alpha_n A_n, \qquad (12.208)$$

where $\alpha_1, \cdots,$ and $\alpha_n$ are eigenvalues (some of which may be identical) and $A_1, \cdots,$ and $A_n$ are idempotent matrices such as those represented by (12.205).

Yet, we have to be careful when constructing idempotent matrices according to the formalism described in Theorem 12.5. It is because we often encounter a situation where different matrices give an identical characteristic polynomial. We briefly mention this in the next example.
Example 12.8 Let us think about the following two matrices:

$$
A = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix}, \qquad
B = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{pmatrix}. \qquad (12.209)
$$

Then, following Theorem 12.5, Sect. 12.4, we have

$$f_A(x) = f_B(x) = (x-3)(x-2)^2, \qquad (12.210)$$

with eigenvalues $\alpha_1 = 3$ and $\alpha_2 = 2$. Also we have $f_1(x) = (x-2)^2$ and $f_2(x) = x-3$. Following the procedures of (12.85) and (12.86), we obtain

$$M_1(x) = x - 2 \quad \text{and} \quad M_2(x) = -x^2 + 3x - 3.$$

Therefore, we have

$$M_1(x) f_1(x) = (x-2)^3, \qquad M_2(x) f_2(x) = (x-3)\bigl(-x^2 + 3x - 3\bigr). \qquad (12.211)$$

Hence, we get

$$
A_1 \equiv M_1(A) f_1(A) = (A - 2E)^3, \qquad
A_2 \equiv M_2(A) f_2(A) = (A - 3E)\bigl(-A^2 + 3A - 3E\bigr). \qquad (12.212)
$$

Similarly, we get $B_1$ and $B_2$ by replacing $A$ with $B$ in (12.212). Thus, we have

$$
A_1 = B_1 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \qquad
A_2 = B_2 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \qquad (12.213)
$$

Notice that we get the same idempotent matrices of (12.213), even though the matrix forms of $A$ and $B$ differ. Also we have

$$A_1 + A_2 = B_1 + B_2 = E.$$

Then, we have

$$A = (A_1 + A_2)A = A_1 A + A_2 A, \qquad B = (B_1 + B_2)B = B_1 B + B_2 B.$$

Nonetheless, although $A = 3A_1 + 2A_2$ holds, $B \ne 3B_1 + 2B_2$. That is, the decomposition of the form of (12.208) is not true of $B$. A decomposition of this kind is possible only with diagonalizable matrices.

In summary, an $(n, n)$ matrix with $s$ ($1 \le s \le n$) different eigenvalues has at least $s$ proper eigenvectors. (Note that a diagonalizable matrix has $n$ proper eigenvectors.) In the case of $s < n$, the matrix has multiple root(s) and may have generalized eigenvectors. If the matrix has a generalized eigenvector of rank $\nu$, that vector is accompanied by $(\nu - 1)$ further generalized eigenvectors along with a sole proper eigenvector. Those vectors form an invariant subspace together with the proper eigenvector(s). In total, such $n$ (generalized) eigenvectors span the whole vector space $V^n$.
With the eigenvalue equation $A(\mathbf{x}) = \alpha \mathbf{x}$, we have an indefinite but nontrivial solution $\mathbf{x} \ne 0$ only for restricted numbers $\alpha$ (i.e., the eigenvalues) in the complex plane. However, we have a unique but trivial solution $\mathbf{x} = 0$ for complex numbers $\alpha$ other than the eigenvalues. This is characteristic of the eigenvalue problem.

References

1. Mirsky L (1990) An introduction to linear algebra. Dover, New York
2. Hassani S (2006) Mathematical physics. Springer, New York
3. Satake I-O (1975) Linear algebra (Pure and applied mathematics). Marcel Dekker, New York
4. Satake I (1974) Linear algebra. Shokabo, Tokyo (in Japanese)
Chapter 13
Inner Product Space

Thus far we have treated the theory of linear vector spaces. The vector spaces, however, were somewhat "structureless," and so it is desirable to introduce a concept of metric or measure into the linear vector spaces. We call a linear vector space where an inner product is defined an inner product space. In virtue of the concept of the inner product, the linear vector space is given a variety of structures. For instance, introduction of the inner product to the linear vector space immediately leads to the definition of adjoint operators and Gram matrices. Above all, the concept of the inner product can readily be extended to a functional space and facilitates understanding of, e.g., orthogonalization of functions, as was exemplified in Parts I and II. Moreover, the definition of the inner product allows us to relate matrix operators and differential operators. In particular, it is a key issue for understanding the logical structure of quantum mechanics. This can easily be understood from the fact that Paul Dirac, who was known as one of the prominent founders of quantum mechanics, invented bra and ket vectors to represent an inner product.

13.1 Inner Product and Metric

The inner product relates two vectors to a complex number. To do this, we introduce the notation $|a\rangle$ and $\langle b|$ to represent the vectors. This notation is due to Dirac and widely used in physics and mathematical physics. Usually $|a\rangle$ and $\langle b|$ are called a "ket" vector and a "bra" vector, respectively, again due to Dirac. Alternatively, we may call $\langle a|$ an adjoint vector of $|a\rangle$. Or we denote $\langle a| \equiv |a\rangle^\dagger$. The symbol "$\dagger$" (dagger) means that for a matrix its transposed matrix should be taken with complex conjugate matrix elements. That is, $\bigl(a_{ij}\bigr)^\dagger = \bigl(a_{ji}^*\bigr)$. If we represent a full matrix, we have
$$
A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}, \qquad
A^\dagger = \begin{pmatrix} a_{11}^* & \cdots & a_{n1}^* \\ \vdots & \ddots & \vdots \\ a_{1n}^* & \cdots & a_{nn}^* \end{pmatrix}. \qquad (13.1)
$$

We call $A^\dagger$ an adjoint matrix or adjoint to $A$; see (1.106). The operations of transposition and complex conjugation are commutable. A further remark will be made below after showing the definition of the inner product. The symbols $|a\rangle$ and $\langle b|$ represent vectors and, hence, we do not need to use bold characters to show that they are vectors.

The definition of the inner product is as follows:

$$\langle b|a\rangle = \langle a|b\rangle^*, \qquad (13.2)$$

$$\langle a|(\beta|b\rangle + \gamma|c\rangle) = \beta\langle a|b\rangle + \gamma\langle a|c\rangle, \qquad (13.3)$$

$$\langle a|a\rangle \ge 0. \qquad (13.4)$$

In (13.4), equality holds only if $|a\rangle = 0$. Note here that two vectors are said to be orthogonal to each other if their inner product vanishes, i.e., $\langle b|a\rangle = \langle a|b\rangle = 0$. In particular, if a vector $|a\rangle \in V^n$ is orthogonal to all the vectors in $V^n$, i.e., $\langle x|a\rangle = 0$ for $\forall \mathbf{x} \in V^n$, then $|a\rangle = 0$. This is because if we choose $|a\rangle$ for $|x\rangle$, we have $\langle a|a\rangle = 0$. This means that $|a\rangle = 0$. We call a linear vector space in which the inner product is defined an inner product space.

We can give another structure to a vector space. An example is a metric (or distance function). Suppose that there is an arbitrary set $Q$. If a real non-negative number $\rho(a, b)$ is defined as follows with any arbitrary elements $a, b, c \in Q$, the set $Q$ is called a metric space [1]:

$$\rho(a, b) = \rho(b, a), \qquad (13.5)$$

$$\rho(a, b) \ge 0 \ \text{for}\ \forall a, b; \quad \rho(a, b) = 0\ \text{if and only if}\ a = b, \qquad (13.6)$$

$$\rho(a, b) + \rho(b, c) \ge \rho(a, c). \qquad (13.7)$$

In our study a vector space is chosen for the set $Q$. Here let us define a norm for each vector $\mathbf{a}$. The norm is defined as

$$\|a\| = \sqrt{\langle a|a\rangle}. \qquad (13.8)$$

If we define $\rho(a, b) \equiv \|a - b\|$, then $\|a - b\|$ satisfies the definitions of a metric. Equations (13.5) and (13.6) are obvious. For (13.7) let us consider a vector $|c\rangle$ given by $|c\rangle = |a\rangle - x\langle b|a\rangle |b\rangle$ with real $x$. Since $\langle c|c\rangle \ge 0$, we have
$$x^2 \langle a|b\rangle \langle b|a\rangle \langle b|b\rangle - 2x \langle a|b\rangle \langle b|a\rangle + \langle a|a\rangle \ge 0$$

or

$$x^2 |\langle a|b\rangle|^2 \langle b|b\rangle - 2x |\langle a|b\rangle|^2 + \langle a|a\rangle \ge 0. \qquad (13.9)$$

The inequality (13.9), related to a quadratic expression in $x$ with real coefficients, requires that

$$\langle a|a\rangle \langle b|b\rangle \ge \langle a|b\rangle \langle b|a\rangle = |\langle a|b\rangle|^2. \qquad (13.10)$$

That is,

$$\sqrt{\langle a|a\rangle} \cdot \sqrt{\langle b|b\rangle} \ge |\langle a|b\rangle|. \qquad (13.11)$$

Namely,

$$\|a\| \cdot \|b\| \ge |\langle a|b\rangle| \ge \operatorname{Re}\langle a|b\rangle. \qquad (13.12)$$

The relations (13.11) and (13.12) are known as the Cauchy–Schwarz inequality.


Meanwhile, we have

$$\|a+b\|^2 = \langle a+b|a+b\rangle = \|a\|^2 + \|b\|^2 + 2\operatorname{Re}\langle a|b\rangle, \qquad (13.13)$$

$$(\|a\| + \|b\|)^2 = \|a\|^2 + \|b\|^2 + 2\|a\| \cdot \|b\|. \qquad (13.14)$$

Comparing (13.13) and (13.14) and using (13.12), we have

$$\|a\| + \|b\| \ge \|a+b\|. \qquad (13.15)$$

The inequality (13.15) is known as the triangle inequality. In (13.15), replacing $a \to a-b$ and $b \to b-c$, we get

$$\|a-b\| + \|b-c\| \ge \|a-c\|. \qquad (13.16)$$

Thus, (13.16) is equivalent to (13.7). At the same time, the norm defined in (13.8) may be regarded as a "length" of a vector $\mathbf{a}$.

As $\beta|b\rangle + \gamma|c\rangle$ represents a vector, we use a shorthand notation for it:

$$|\beta b + \gamma c\rangle \equiv \beta|b\rangle + \gamma|c\rangle. \qquad (13.17)$$

According to the definition (13.3),

$$\langle a|\beta b + \gamma c\rangle = \beta\langle a|b\rangle + \gamma\langle a|c\rangle. \qquad (13.18)$$

Also from (13.2),

$$\langle \beta b + \gamma c|a\rangle = \langle a|\beta b + \gamma c\rangle^* = [\beta\langle a|b\rangle + \gamma\langle a|c\rangle]^* = \beta^*\langle b|a\rangle + \gamma^*\langle c|a\rangle. \qquad (13.19)$$

That is,

$$\langle \beta b + \gamma c| = \beta^*\langle b| + \gamma^*\langle c|. \qquad (13.20)$$

Therefore, when we take out a scalar from a bra vector, it should be complex-conjugated. When the scalar is taken out from a ket vector, on the other hand, it is unaffected, by definition (13.3). To show this, we have $|\alpha a\rangle = \alpha|a\rangle$. Taking its adjoint, $\langle \alpha a| = \alpha^*\langle a|$.

We can view (13.17) as a linear transformation in a vector space. In other words, if we regard $|\cdot\rangle$ as a transformation of a vector $\mathbf{a} \in V^n$ to $|a\rangle \in \widetilde{V}^n$, then $|\cdot\rangle$ is a linear transformation of $V^n$ to $\widetilde{V}^n$. On the other hand, $\langle \cdot|$ cannot be regarded as a linear transformation of $\mathbf{a} \in V^n$ to $\langle a| \in \widetilde{V}^{n\prime}$. Sometimes the latter transformation is referred to as "antilinear" or "sesquilinear." From the point of view of formalism, the inner product can be viewed as an operation: $\widetilde{V}^n \times \widetilde{V}^{n\prime} \to \mathbb{C}$.

Let us consider $|x\rangle = |y\rangle$ in an inner product space $\widetilde{V}^n$. Then $|x\rangle - |y\rangle = 0$. That is, $|x - y\rangle = 0$. Therefore, we have $\mathbf{x} = \mathbf{y}$, or $|0\rangle = 0$. This means that the linear transformation $|\cdot\rangle$ converts $0 \in V^n$ to $|0\rangle \in \widetilde{V}^n$. This is a characteristic of a linear transformation represented in (11.44). Similarly we have $\langle 0| = 0$. However, we do not have to get into further details in this book. Also if we have to specify a vector space, we simply do so by designating it as $V^n$.

13.2 Gram Matrices

Once we have defined an inner product between any pair of vectors $|a\rangle$ and $|b\rangle$ of a vector space $V^n$, we can define and calculate various quantities related to inner products. As an example, let $|a_1\rangle, \cdots, |a_n\rangle$ and $|b_1\rangle, \cdots, |b_n\rangle$ be two sets of vectors in $V^n$. The vectors $|a_1\rangle, \cdots, |a_n\rangle$ may or may not be linearly independent. The same is true of $|b_1\rangle, \cdots, |b_n\rangle$. Let us think of the following matrix $M$:

$$
M = \begin{pmatrix} \langle a_1| \\ \vdots \\ \langle a_n| \end{pmatrix} (|b_1\rangle \cdots |b_n\rangle)
= \begin{pmatrix} \langle a_1|b_1\rangle & \cdots & \langle a_1|b_n\rangle \\ \vdots & \ddots & \vdots \\ \langle a_n|b_1\rangle & \cdots & \langle a_n|b_n\rangle \end{pmatrix}. \qquad (13.21)
$$

We assume the following cases.
(i) Suppose that in (13.21) $|b_1\rangle, \cdots, |b_n\rangle$ are linearly dependent. Then, without loss of generality we can put

$$|b_1\rangle = c_2|b_2\rangle + c_3|b_3\rangle + \cdots + c_n|b_n\rangle. \qquad (13.22)$$

Then we have

$$
M = \begin{pmatrix}
c_2\langle a_1|b_2\rangle + \cdots + c_n\langle a_1|b_n\rangle & \langle a_1|b_2\rangle & \cdots & \langle a_1|b_n\rangle \\
\vdots & \vdots & \ddots & \vdots \\
c_2\langle a_n|b_2\rangle + \cdots + c_n\langle a_n|b_n\rangle & \langle a_n|b_2\rangle & \cdots & \langle a_n|b_n\rangle
\end{pmatrix}. \qquad (13.23)
$$

Multiplying the second column, $\cdots$, and the $n$-th column by $(-c_2)$, $\cdots$, and $(-c_n)$, respectively, and adding them to the first column, we get a matrix whose first column vanishes:

$$
M \to \begin{pmatrix}
0 & \langle a_1|b_2\rangle & \cdots & \langle a_1|b_n\rangle \\
\vdots & \vdots & \ddots & \vdots \\
0 & \langle a_n|b_2\rangle & \cdots & \langle a_n|b_n\rangle
\end{pmatrix}. \qquad (13.24)
$$

Hence, $\det M = 0$.

(ii) Suppose in turn that in (13.21) $|a_1\rangle, \cdots, |a_n\rangle$ are linearly dependent. In that case, again, without loss of generality we can put

$$|a_1\rangle = d_2|a_2\rangle + d_3|a_3\rangle + \cdots + d_n|a_n\rangle. \qquad (13.25)$$

Focusing attention on the individual rows and taking a procedure similar to that described above, we have

$$
M \to \begin{pmatrix}
0 & \cdots & 0 \\
\langle a_2|b_1\rangle & \cdots & \langle a_2|b_n\rangle \\
\vdots & \ddots & \vdots \\
\langle a_n|b_1\rangle & \cdots & \langle a_n|b_n\rangle
\end{pmatrix}. \qquad (13.26)
$$

Again, $\det M = 0$.

Next let us examine the case where $\det M = 0$. In that case, the $n$ column vectors of $M$ in (13.21) are linearly dependent. Without loss of generality, we suppose that the first column is expressed as a linear combination of the other $(n-1)$ columns such that
$$
\begin{pmatrix} \langle a_1|b_1\rangle \\ \vdots \\ \langle a_n|b_1\rangle \end{pmatrix}
= c_2 \begin{pmatrix} \langle a_1|b_2\rangle \\ \vdots \\ \langle a_n|b_2\rangle \end{pmatrix}
+ \cdots
+ c_n \begin{pmatrix} \langle a_1|b_n\rangle \\ \vdots \\ \langle a_n|b_n\rangle \end{pmatrix}. \qquad (13.27)
$$

Rewriting this, we have

$$
\begin{pmatrix} \langle a_1|b_1 - c_2 b_2 - \cdots - c_n b_n\rangle \\ \vdots \\ \langle a_n|b_1 - c_2 b_2 - \cdots - c_n b_n\rangle \end{pmatrix}
= \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix}. \qquad (13.28)
$$

Multiplying the first row, $\cdots$, and the $n$-th row of (13.28) by appropriate complex numbers $p_1^*, \cdots,$ and $p_n^*$, respectively, we have

$$
\langle p_1 a_1 | b_1 - c_2 b_2 - \cdots - c_n b_n \rangle = 0, \ \cdots, \
\langle p_n a_n | b_1 - c_2 b_2 - \cdots - c_n b_n \rangle = 0. \qquad (13.29)
$$

Adding all the above, we get

$$\langle p_1 a_1 + \cdots + p_n a_n | b_1 - c_2 b_2 - \cdots - c_n b_n \rangle = 0. \qquad (13.30)$$

Now, suppose that $|a_1\rangle, \cdots, |a_n\rangle$ are basis vectors. Then, $|p_1 a_1 + \cdots + p_n a_n\rangle$ represents any vector in the vector space. This implies that $|b_1 - c_2 b_2 - \cdots - c_n b_n\rangle = 0$; for this see the remarks after (13.4). That is,

$$|b_1\rangle = c_2|b_2\rangle + \cdots + c_n|b_n\rangle. \qquad (13.31)$$

Thus, $|b_1\rangle, \cdots, |b_n\rangle$ are linearly dependent.

Meanwhile, $\det M = 0$ also implies that the $n$ row vectors of $M$ in (13.21) are linearly dependent. In that case, performing calculations similar to the above, we can readily show that if $|b_1\rangle, \cdots, |b_n\rangle$ are basis vectors, then $|a_1\rangle, \cdots, |a_n\rangle$ are linearly dependent.

We summarize the above discussion by the following statement. Suppose that we have two sets of vectors $|a_1\rangle, \cdots, |a_n\rangle$ and $|b_1\rangle, \cdots, |b_n\rangle$. Then:

At least one set of vectors is linearly dependent. $\Longleftrightarrow$ $\det M = 0$.
Both sets of vectors are linearly independent. $\Longleftrightarrow$ $\det M \ne 0$.

The latter statement is obtained by taking the contraposition of the former statement.

We restate the above in the following theorem:

Theorem 13.1 Let $|a_1\rangle, \cdots, |a_n\rangle$ and $|b_1\rangle, \cdots, |b_n\rangle$ be two sets of vectors defined in a vector space $V^n$. A necessary and sufficient condition for both these sets of vectors to be linearly independent is that $\det M \ne 0$ for the matrix $M$ defined below:
$$
M = \begin{pmatrix} \langle a_1| \\ \vdots \\ \langle a_n| \end{pmatrix} (|b_1\rangle \cdots |b_n\rangle)
= \begin{pmatrix} \langle a_1|b_1\rangle & \cdots & \langle a_1|b_n\rangle \\ \vdots & \ddots & \vdots \\ \langle a_n|b_1\rangle & \cdots & \langle a_n|b_n\rangle \end{pmatrix}.
$$

Next we consider the norm of a vector expressed in reference to a set of basis vectors $|e_1\rangle, \cdots, |e_n\rangle$ of $V^n$. Let us express a vector $|x\rangle$ in an inner product space as follows, as in the case of (11.10) and (11.13):

$$
|x\rangle = x_1|e_1\rangle + x_2|e_2\rangle + \cdots + x_n|e_n\rangle = |x_1 e_1 + x_2 e_2 + \cdots + x_n e_n\rangle
= (|e_1\rangle \cdots |e_n\rangle) \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \qquad (13.32)
$$

The bra vector $\langle x|$ is then denoted by

$$
\langle x| = (x_1^* \cdots x_n^*) \begin{pmatrix} \langle e_1| \\ \vdots \\ \langle e_n| \end{pmatrix}. \qquad (13.33)
$$
Thus, we have an inner product described as

$$
\langle x|x\rangle = (x_1^* \cdots x_n^*) \begin{pmatrix} \langle e_1| \\ \vdots \\ \langle e_n| \end{pmatrix} (|e_1\rangle \cdots |e_n\rangle) \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}
= (x_1^* \cdots x_n^*) \begin{pmatrix} \langle e_1|e_1\rangle & \cdots & \langle e_1|e_n\rangle \\ \vdots & \ddots & \vdots \\ \langle e_n|e_1\rangle & \cdots & \langle e_n|e_n\rangle \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \qquad (13.34)
$$

Here the matrix expressed as follows is called a Gram matrix [2–4]:

$$
G = \begin{pmatrix} \langle e_1|e_1\rangle & \cdots & \langle e_1|e_n\rangle \\ \vdots & \ddots & \vdots \\ \langle e_n|e_1\rangle & \cdots & \langle e_n|e_n\rangle \end{pmatrix}. \qquad (13.35)
$$

As $\langle e_j|e_i\rangle = \langle e_i|e_j\rangle^*$, we have $G = G^\dagger$. From Theorem 13.1, $\det G \ne 0$. With a shorthand notation, we write $(G)_{ij} = (g_{ij}) = (\langle e_i|e_j\rangle)$. As already mentioned in Sect. 1.4, if for a matrix $H$ we have a relation described by
$$H = H^\dagger, \qquad (1.119)$$

it is said to be an Hermitian matrix or a self-adjoint matrix. We often simply say that such a matrix is Hermitian.

Since Gram matrices frequently appear in matrix algebra and play an important role there, their properties are worth examining. Since $G$ is an Hermitian matrix, it can be diagonalized through a similarity transformation using a unitary matrix. We will give the proof later (see Sect. 14.3).

Let us deal with (13.34) further. We have

$$
\langle x|x\rangle = (x_1^* \cdots x_n^*)\, U U^\dagger
\begin{pmatrix} \langle e_1|e_1\rangle & \cdots & \langle e_1|e_n\rangle \\ \vdots & \ddots & \vdots \\ \langle e_n|e_1\rangle & \cdots & \langle e_n|e_n\rangle \end{pmatrix}
U U^\dagger \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \qquad (13.36)
$$

where $U$ is defined by $U U^\dagger = U^\dagger U = E$. Such a matrix $U$ is called a unitary matrix. We represent the matrix form of $U$ as

$$
U = \begin{pmatrix} u_{11} & \cdots & u_{1n} \\ \vdots & \ddots & \vdots \\ u_{n1} & \cdots & u_{nn} \end{pmatrix}, \qquad
U^\dagger = \begin{pmatrix} u_{11}^* & \cdots & u_{n1}^* \\ \vdots & \ddots & \vdots \\ u_{1n}^* & \cdots & u_{nn}^* \end{pmatrix}. \qquad (13.37)
$$

Here, putting

$$
U^\dagger \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} \xi_1 \\ \vdots \\ \xi_n \end{pmatrix}
\quad \text{or equivalently} \quad \sum_{k=1}^{n} x_k u_{ki}^* \equiv \xi_i, \qquad (13.38)
$$

and taking its adjoint such that

$$(x_1^* \cdots x_n^*)\, U = (\xi_1^* \cdots \xi_n^*) \quad \text{or equivalently} \quad \sum_{k=1}^{n} x_k^* u_{ki} = \xi_i^*,$$

we have

$$
\langle x|x\rangle = (\xi_1^* \cdots \xi_n^*)\, U^\dagger
\begin{pmatrix} \langle e_1|e_1\rangle & \cdots & \langle e_1|e_n\rangle \\ \vdots & \ddots & \vdots \\ \langle e_n|e_1\rangle & \cdots & \langle e_n|e_n\rangle \end{pmatrix}
U \begin{pmatrix} \xi_1 \\ \vdots \\ \xi_n \end{pmatrix}. \qquad (13.39)
$$

We assume that the Gram matrix is diagonalized by the similarity transformation by $U$. After being diagonalized, similarly to (12.192) and (12.193), the Gram matrix has the following form $G'$:
$$
U^\dagger G U = G' = \begin{pmatrix} \lambda_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda_n \end{pmatrix}. \qquad (13.40)
$$

Thus, we get

$$
\langle x|x\rangle = (\xi_1^* \cdots \xi_n^*)
\begin{pmatrix} \lambda_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda_n \end{pmatrix}
\begin{pmatrix} \xi_1 \\ \vdots \\ \xi_n \end{pmatrix}
= \lambda_1 |\xi_1|^2 + \cdots + \lambda_n |\xi_n|^2. \qquad (13.41)
$$

From the relation (13.4), $\langle x|x\rangle \ge 0$. This implies that in (13.41) we have

$$\lambda_i \ge 0 \quad (1 \le i \le n). \qquad (13.42)$$

To show this, suppose that for some $\lambda_i$ we had $\lambda_i < 0$, and take $\xi_i \ne 0$ for that index. Then, with the column vector $(0\ \cdots\ \xi_i\ \cdots\ 0)^T$ we would have $\langle x|x\rangle = \lambda_i |\xi_i|^2 < 0$, in contradiction.

Since we have $\det G \ne 0$, taking the determinant of (13.40) we have

$$
\det G' = \det\bigl(U^\dagger G U\bigr) = \det\bigl(U^\dagger U G\bigr) = \det E \cdot \det G = \det G = \prod_{i=1}^{n} \lambda_i \ne 0. \qquad (13.43)
$$

Combining (13.42) and (13.43), all the eigenvalues $\lambda_i$ are positive; i.e.,

$$\lambda_i > 0 \quad (1 \le i \le n). \qquad (13.44)$$
The norm $\langle x|x\rangle = 0$ if and only if $\xi_1 = \xi_2 = \cdots = \xi_n = 0$, which corresponds to $x_1 = x_2 = \cdots = x_n = 0$ from (13.38).

For further study, we generalize the aforementioned feature a little. That is, if $|e_1\rangle, \cdots,$ and $|e_n\rangle$ are linearly dependent, from Theorem 13.1 we get $\det G = 0$. This implies that there is at least one eigenvalue $\lambda_i = 0$ ($1 \le i \le n$) and that for a vector $(0\ \cdots\ \xi_i\ \cdots\ 0)^T$ with $\xi_i \ne 0$ we have $\langle x|x\rangle = 0$. (Suppose that in $V^2$ we have the linearly dependent vectors $|e_1\rangle$ and $|e_2\rangle = |-e_1\rangle$; see Example 13.2 below.)
Let $H$ be an Hermitian matrix [i.e., $(H)_{ij} = (H)_{ji}^*$]. If we have $\phi$ as a function of complex variables $x_1, \cdots, x_n$ such that

$$\phi(x_1, \cdots, x_n) = \sum_{i,j=1}^{n} x_i^* (H)_{ij}\, x_j, \qquad (13.45)$$

where $H_{ij}$ is a matrix element of an Hermitian matrix, then $\phi(x_1, \cdots, x_n)$ is said to be an Hermitian quadratic form. Suppose that $\phi(x_1, \cdots, x_n) = 0$ if and only if $x_1 = x_2 = \cdots = x_n = 0$, and otherwise $\phi(x_1, \cdots, x_n) > 0$ with any other set of $x_i$ ($1 \le i \le n$). Then the said Hermitian quadratic form is called positive definite and we write

$$H > 0. \qquad (13.46)$$

If $\phi(x_1, \cdots, x_n) \ge 0$ for any $x_i$ ($1 \le i \le n$) and $\phi(x_1, \cdots, x_n) = 0$ for at least one set $(x_1, \cdots, x_n)$ in which some $x_i \ne 0$, then $\phi(x_1, \cdots, x_n)$ is said to be positive semi-definite or non-negative. In that case, we write

$$H \ge 0. \qquad (13.47)$$

From the above argument, a Gram matrix comprising linearly independent vectors is positive definite, whereas one comprising linearly dependent vectors is non-negative. On the basis of the above argument, including (13.36) to (13.44), we have

$$
H > 0 \Longleftrightarrow \lambda_i > 0\ (1 \le i \le n),\ \det H > 0;
$$
$$
H \ge 0 \Longleftrightarrow \lambda_i \ge 0\ \text{with some}\ \lambda_i = 0\ (1 \le i \le n),\ \det H = 0. \qquad (13.48)
$$

Notice here that the eigenvalues $\lambda_i$ remain unchanged under a (unitary) similarity transformation. Namely, the eigenvalues are inherent to $H$.

We have already encountered several examples of positive definite and non-negative operators. A typical example of the former case is the Hamiltonian of the quantum-mechanical harmonic oscillator (see Chap. 2). In this case, the energy eigenvalues are all positive (i.e., positive definite). The orbital angular momenta $L^2$ of hydrogen-like atoms, on the other hand, are non-negative operators and, hence, an eigenvalue of zero is permitted.
Alternatively, the Gram matrix can be defined as $B^\dagger B$, where $B$ is any $(n, n)$ matrix. If we take an orthonormal basis $|\eta_1\rangle, |\eta_2\rangle, \cdots, |\eta_n\rangle$, then $|e_i\rangle$ can be expressed as

$$
|e_i\rangle = \sum_{j=1}^{n} b_{ji}\, |\eta_j\rangle, \qquad
\langle e_k|e_i\rangle = \sum_{j=1}^{n} \sum_{l=1}^{n} b_{lk}^*\, b_{ji}\, \langle \eta_l|\eta_j\rangle
= \sum_{j=1}^{n} \sum_{l=1}^{n} b_{lk}^*\, b_{ji}\, \delta_{lj}
= \sum_{j=1}^{n} b_{jk}^*\, b_{ji}
= \sum_{j=1}^{n} \bigl(B^\dagger\bigr)_{kj} (B)_{ji}
= \bigl(B^\dagger B\bigr)_{ki}. \qquad (13.49)
$$

For the second equality of (13.49), we used the orthonormality condition $\langle \eta_i|\eta_j\rangle = \delta_{ij}$. Thus, the Gram matrix $G$ defined in (13.35) can be regarded as identical to $B^\dagger B$.

In Sect. 11.4 we dealt with a linear transformation of a set of basis vectors $e_1, e_2, \cdots,$ and $e_n$ by a matrix $A$ defined in (11.69) and examined whether the transformed vectors $e_1', e_2', \cdots,$ and $e_n'$ are linearly independent. As a result, a necessary and sufficient condition for $e_1', e_2', \cdots,$ and $e_n'$ to be linearly independent (i.e., to be a set of basis vectors) is $\det A \ne 0$. Thus, we notice that $B$ plays the same role as $A$ of (11.69) and, hence, $\det B \ne 0$ if and only if the set of vectors $|e_1\rangle, |e_2\rangle, \cdots,$ and $|e_n\rangle$ defined in (13.32) is linearly independent. By the same token as the above, we conclude that the eigenvalues of $B^\dagger B$ are all positive if and only if $\det B \ne 0$ (i.e., $B$ is non-singular). Alternatively, if $B$ is singular, $\det B^\dagger B = \det B^\dagger \det B = 0$. In that case at least one of the eigenvalues of $B^\dagger B$ must be zero. The Gram matrices appearing in (13.35) are frequently dealt with in the field of mathematical physics in conjunction with quadratic forms. Further topics can be seen in the next chapter.
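The statement that $G = B^\dagger B$ is positive definite exactly when $B$ is non-singular is easy to probe numerically. A minimal sketch (NumPy assumed; the matrices below are arbitrary illustrations, not taken from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
B_full = rng.normal(size=(4, 4))   # generically non-singular
B_sing = B_full.copy()
B_sing[:, 0] = B_sing[:, 1]        # force linear dependence of the columns

for B in (B_full, B_sing):
    G = B.conj().T @ B             # Gram matrix B†B, cf. (13.49)
    lam = np.linalg.eigvalsh(G)    # real eigenvalues of an Hermitian matrix
    print(np.round(lam, 10))       # all > 0 for B_full; one zero for B_sing
```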
Example 13.1 Let us take two vectors $|\varepsilon_1\rangle$ and $|\varepsilon_2\rangle$ that are expressed as

$$|\varepsilon_1\rangle = |e_1\rangle + |e_2\rangle, \qquad |\varepsilon_2\rangle = |e_1\rangle + i|e_2\rangle. \qquad (13.50)$$

Here we have $\langle e_i|e_j\rangle = \delta_{ij}$ ($1 \le i, j \le 2$). Then we have a Gram matrix expressed as

$$
G = \begin{pmatrix} \langle \varepsilon_1|\varepsilon_1\rangle & \langle \varepsilon_1|\varepsilon_2\rangle \\ \langle \varepsilon_2|\varepsilon_1\rangle & \langle \varepsilon_2|\varepsilon_2\rangle \end{pmatrix}
= \begin{pmatrix} 2 & 1+i \\ 1-i & 2 \end{pmatrix}. \qquad (13.51)
$$

The principal minors of $G$ are $|2| = 2 > 0$ and $\begin{vmatrix} 2 & 1+i \\ 1-i & 2 \end{vmatrix} = 4 - (1+i)(1-i) = 2 > 0$. Therefore, according to Theorem 14.11, $G > 0$ (vide infra).

Let us diagonalize the matrix $G$. To this end, we find the roots of the characteristic equation. That is,

$$\det\,(G - \lambda E) = \begin{vmatrix} 2-\lambda & 1+i \\ 1-i & 2-\lambda \end{vmatrix} = 0, \qquad \lambda^2 - 4\lambda + 2 = 0. \qquad (13.52)$$

We have $\lambda = 2 \pm \sqrt{2}$. Then as a diagonalizing unitary matrix $U$ we get
$$
U = \begin{pmatrix} \dfrac{1+i}{2} & \dfrac{1+i}{2} \\[4pt] \dfrac{\sqrt{2}}{2} & -\dfrac{\sqrt{2}}{2} \end{pmatrix}. \qquad (13.53)
$$

Thus, we get

$$
U^\dagger G U =
\begin{pmatrix} \dfrac{1-i}{2} & \dfrac{\sqrt{2}}{2} \\[4pt] \dfrac{1-i}{2} & -\dfrac{\sqrt{2}}{2} \end{pmatrix}
\begin{pmatrix} 2 & 1+i \\ 1-i & 2 \end{pmatrix}
\begin{pmatrix} \dfrac{1+i}{2} & \dfrac{1+i}{2} \\[4pt] \dfrac{\sqrt{2}}{2} & -\dfrac{\sqrt{2}}{2} \end{pmatrix}
= \begin{pmatrix} 2+\sqrt{2} & 0 \\ 0 & 2-\sqrt{2} \end{pmatrix}. \qquad (13.54)
$$

The eigenvalues $2+\sqrt{2}$ and $2-\sqrt{2}$ are real and positive, as expected. That is, the Gram matrix is positive definite.
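A quick numerical cross-check of (13.53) and (13.54) can be done as follows (a sketch, NumPy assumed):

```python
import numpy as np

G = np.array([[2, 1 + 1j], [1 - 1j, 2]])
U = np.array([[(1 + 1j) / 2,   (1 + 1j) / 2],
              [np.sqrt(2) / 2, -np.sqrt(2) / 2]])

assert np.allclose(U.conj().T @ U, np.eye(2))  # U is unitary
D = U.conj().T @ G @ U
print(np.round(D, 6))                          # diag(2 + sqrt 2, 2 - sqrt 2)
```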
Example 13.2 Let $|e_1\rangle$ and $|e_2\rangle$ ($= -|e_1\rangle$) be two vectors. Then, we have a Gram matrix expressed as

$$
G = \begin{pmatrix} \langle e_1|e_1\rangle & \langle e_1|e_2\rangle \\ \langle e_2|e_1\rangle & \langle e_2|e_2\rangle \end{pmatrix}
= \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}. \qquad (13.55)
$$

Similarly to the case of Example 13.1, we have as an eigenvalue equation

$$\det\,(G - \lambda E) = \begin{vmatrix} 1-\lambda & -1 \\ -1 & 1-\lambda \end{vmatrix} = 0, \qquad \lambda^2 - 2\lambda = 0. \qquad (13.56)$$

We have $\lambda = 2$ or $0$. As a diagonalizing unitary matrix $U$ we get

$$
U = \begin{pmatrix} \dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} \\[4pt] -\dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} \end{pmatrix}. \qquad (13.57)
$$

Thus, we get

$$
U^\dagger G U =
\begin{pmatrix} \dfrac{1}{\sqrt{2}} & -\dfrac{1}{\sqrt{2}} \\[4pt] \dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} \end{pmatrix}
\begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}
\begin{pmatrix} \dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} \\[4pt] -\dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} \end{pmatrix}
= \begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix}. \qquad (13.58)
$$
As expected, we have a diagonal matrix, one of whose eigenvalues is zero. That is, the Gram matrix is non-negative.

In the present case, let us think of the following Hermitian quadratic form:

$$
\phi(x_1, x_2) = \sum_{i,j=1}^{2} x_i^* (G)_{ij}\, x_j
= (x_1^*\ x_2^*)\, U U^\dagger G U U^\dagger \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
= (x_1^*\ x_2^*)\, U \begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix} U^\dagger \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
= (\xi_1^*\ \xi_2^*) \begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} \xi_1 \\ \xi_2 \end{pmatrix}
= 2|\xi_1|^2.
$$

Then, if we take $\begin{pmatrix} \xi_1 \\ \xi_2 \end{pmatrix} = \begin{pmatrix} 0 \\ \xi_2 \end{pmatrix}$ with $\xi_2 \ne 0$ as a column vector, we get $\phi(x_1, x_2) = 0$. With this type of column vector, we have

$$
U^\dagger \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 0 \\ \xi_2 \end{pmatrix}, \quad \text{i.e.,} \quad
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = U \begin{pmatrix} 0 \\ \xi_2 \end{pmatrix}
= \begin{pmatrix} \dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} \\[4pt] -\dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} \end{pmatrix} \begin{pmatrix} 0 \\ \xi_2 \end{pmatrix}
= \frac{1}{\sqrt{2}} \begin{pmatrix} \xi_2 \\ \xi_2 \end{pmatrix},
$$

where $\xi_2$ is an arbitrary complex number. Thus, even for $\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \ne \begin{pmatrix} 0 \\ 0 \end{pmatrix}$ we may have $\phi(x_1, x_2) = 0$. To be more specific, if we take, e.g., $\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$ or $\begin{pmatrix} 1 \\ 2 \end{pmatrix}$, we get

$$
\phi(1, 1) = (1\ 1) \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = (1\ 1) \begin{pmatrix} 0 \\ 0 \end{pmatrix} = 0,
$$

$$
\phi(1, 2) = (1\ 2) \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \end{pmatrix} = (1\ 2) \begin{pmatrix} -1 \\ 1 \end{pmatrix} = 1 > 0.
$$

13.3 Adjoint Operators

A linear transformation $A$ is defined similarly as before, and $A$ transforms $|x\rangle$ of (13.32) such that

$$
A(|x\rangle) = (|e_1\rangle \cdots |e_n\rangle)
\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}
\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \qquad (13.59)
$$

where $(a_{ij})$ is a matrix representation of $A$. Defining $A(|x\rangle) = |A(x)\rangle$ and $\langle (x)A^\dagger| = (\langle x|)A^\dagger = [A(|x\rangle)]^\dagger$ to be consistent with (1.117), we have

$$
\langle (x)A^\dagger| = (x_1^* \cdots x_n^*)
\begin{pmatrix} a_{11}^* & \cdots & a_{n1}^* \\ \vdots & \ddots & \vdots \\ a_{1n}^* & \cdots & a_{nn}^* \end{pmatrix}
\begin{pmatrix} \langle e_1| \\ \vdots \\ \langle e_n| \end{pmatrix}. \qquad (13.60)
$$
Therefore, putting $|y\rangle = \sum_{i=1}^{n} y_i |e_i\rangle$, we have

$$
\langle (x)A^\dagger|y\rangle = (x_1^* \cdots x_n^*)
\begin{pmatrix} a_{11}^* & \cdots & a_{n1}^* \\ \vdots & \ddots & \vdots \\ a_{1n}^* & \cdots & a_{nn}^* \end{pmatrix}
\begin{pmatrix} \langle e_1| \\ \vdots \\ \langle e_n| \end{pmatrix}
(|e_1\rangle \cdots |e_n\rangle)
\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}
= (x_1^* \cdots x_n^*)\, A^\dagger G \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}. \qquad (13.61)
$$

Meanwhile, we get

$$
\langle y|A(x)\rangle = (y_1^* \cdots y_n^*)
\begin{pmatrix} \langle e_1| \\ \vdots \\ \langle e_n| \end{pmatrix}
(|e_1\rangle \cdots |e_n\rangle)
\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}
\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}
= (y_1^* \cdots y_n^*)\, G A \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \qquad (13.62)
$$

Hence, we have

$$
\langle y|A(x)\rangle^* = (y_1 \cdots y_n)\, G^* A^* \begin{pmatrix} x_1^* \\ \vdots \\ x_n^* \end{pmatrix}
= (y_1 \cdots y_n)\, G^T \bigl(A^\dagger\bigr)^T \begin{pmatrix} x_1^* \\ \vdots \\ x_n^* \end{pmatrix}. \qquad (13.63)
$$

With the second equality, we used $G^* = (G^\dagger)^T = G^T$ (note that $G$ is Hermitian) and $A^* = (A^\dagger)^T$. The complex conjugate matrix $A^*$ is defined as
$$
A^* \equiv \begin{pmatrix} a_{11}^* & \cdots & a_{1n}^* \\ \vdots & \ddots & \vdots \\ a_{n1}^* & \cdots & a_{nn}^* \end{pmatrix}.
$$

Comparing (13.61) and (13.63), we find that one is the transposed matrix of the other. Also note that $(AB)^T = B^T A^T$, $(ABC)^T = C^T B^T A^T$, etc. Since an inner product can be viewed as a $(1, 1)$ matrix, two mutually transposed $(1, 1)$ matrices are identical. Hence, we get

$$\langle (x)A^\dagger|y\rangle = \langle y|A(x)\rangle^* = \langle A(x)|y\rangle, \qquad (13.64)$$

where the second equality is due to (13.2).

The other way around, we may use (13.64) for the definition of the adjoint operator of a linear transformation $A$. In fact, on the basis of (13.64) we have

$$
\begin{aligned}
\sum_{i,j,k} x_i^* \bigl(A^\dagger\bigr)_{ik} (G)_{kj}\, y_j &= \sum_{i,j,k} y_j\, (G^*)_{jk} (A^*)_{ki}\, x_i^*, \\
\sum_{i,j,k} x_i^*\, y_j \Bigl[\bigl(A^\dagger\bigr)_{ik} (G)_{kj} - (G^*)_{jk} (A^*)_{ki}\Bigr] &= 0, \\
\sum_{i,j,k} x_i^*\, y_j \Bigl[\bigl(A^\dagger\bigr)_{ik} - (A^*)_{ki}\Bigr] (G)_{kj} &= 0. \qquad (13.65)
\end{aligned}
$$

With the third equality of (13.65), we used $(G^*)_{jk} = (G)_{kj}$, i.e., $G^\dagger = G$ (Hermitian matrix). Thanks to the freedom in the choice of basis vectors as well as of $x_i$ and $y_i$, we must have

$$\bigl(A^\dagger\bigr)_{ik} = (A^*)_{ki}. \qquad (13.66)$$
Adopting the matrix representation of (13.59) for $A$, we get [1]

$$\bigl(A^\dagger\bigr)_{ik} = \bigl(a_{ki}^*\bigr). \qquad (13.67)$$

Thus, we confirm that the adjoint operator $A^\dagger$ is represented by the complex conjugate transposed matrix of $A$, in accordance with (13.1).

Taking the complex conjugate of (13.64), we have

$$\langle (x)A^\dagger|y\rangle^* = \Bigl\langle y \Bigl| \bigl(A^\dagger\bigr)^\dagger(x) \Bigr\rangle = \langle y|A(x)\rangle.$$

Comparing both sides of the second equality, we get
$$\bigl(A^\dagger\bigr)^\dagger = A. \qquad (13.68)$$

In Chap. 11, we have seen how the matrix representation of a linear transformation $A$ is changed from $A$ to $A'$ by a transformation of the basis vectors. We have

$$A' = P^{-1} A P. \qquad (11.88)$$

In a similar manner, taking the adjoint of (11.88) we get

$$(A')^\dagger = P^\dagger A^\dagger \bigl(P^{-1}\bigr)^\dagger = P^\dagger A^\dagger \bigl(P^\dagger\bigr)^{-1}. \qquad (13.69)$$

In (13.69), we denote the adjoint operator before the basis vectors transformation simply by $A^\dagger$ to avoid complicated notation. We also have

$$\bigl(A^\dagger\bigr)' = (A')^\dagger. \qquad (13.70)$$

Meanwhile, suppose that $\bigl(A^\dagger\bigr)' = Q^{-1} A^\dagger Q$. Then, from (13.69) and (13.70) we have

$$P^\dagger = Q^{-1}.$$

Next let us perform a calculation as below:

$$
\begin{aligned}
\langle (\alpha u + \beta v)A^\dagger|y\rangle &= \langle y|A(\alpha u + \beta v)\rangle^* = \alpha^*\langle y|A(u)\rangle^* + \beta^*\langle y|A(v)\rangle^* \\
&= \alpha^*\langle (u)A^\dagger|y\rangle + \beta^*\langle (v)A^\dagger|y\rangle = \langle \alpha(u)A^\dagger + \beta(v)A^\dagger|y\rangle.
\end{aligned}
$$

As $\mathbf{y}$ is an element arbitrarily chosen from the relevant vector space, we have

$$\langle (\alpha u + \beta v)A^\dagger| = \langle \alpha(u)A^\dagger + \beta(v)A^\dagger|, \qquad (13.71)$$

or

$$(\alpha u + \beta v)A^\dagger = \alpha(u)A^\dagger + \beta(v)A^\dagger. \qquad (13.72)$$

Equation (13.71) states the equality of two vectors in an inner product space on both sides, whereas (13.72) states it in a vector space where the inner product is not defined. In either case, both (13.71) and (13.72) show that $A^\dagger$ is indeed a linear transformation. In fact, the matrix representation of (13.66) and (13.67) is independent of the concept of the inner product.

Suppose that there are two (or more) adjoint operators $B$ and $C$ that correspond to $A$. Then, from (13.64) we have
$$\langle (x)B|y\rangle = \langle (x)C|y\rangle = \langle y|A(x)\rangle^*. \qquad (13.73)$$

Also we have

$$\langle (x)B - (x)C|y\rangle = \langle (x)(B - C)|y\rangle = 0. \qquad (13.74)$$

As $\mathbf{x}$ and $\mathbf{y}$ are arbitrarily chosen elements, we get $B = C$, indicating the uniqueness of the adjoint operator.

It is of importance to examine how the norm of a vector is changed by a linear transformation. To this end, let us perform the calculation below:

$$
\langle (x)A^\dagger|A(x)\rangle = (x_1^* \cdots x_n^*)
\begin{pmatrix} a_{11}^* & \cdots & a_{n1}^* \\ \vdots & \ddots & \vdots \\ a_{1n}^* & \cdots & a_{nn}^* \end{pmatrix}
\begin{pmatrix} \langle e_1| \\ \vdots \\ \langle e_n| \end{pmatrix}
(|e_1\rangle \cdots |e_n\rangle)
\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}
\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}
= (x_1^* \cdots x_n^*)\, A^\dagger G A \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \qquad (13.75)
$$

Equation (13.75) gives the norm of the vector after its transformation. We may have a case where the norm is conserved before and after the transformation. Actually, comparing (13.34) and (13.75), we notice that if $A^\dagger G A = G$, then $\langle x|x\rangle = \langle (x)A^\dagger|A(x)\rangle$. Let us have the following example of this.

Example 13.3 Let us take two mutually orthogonal vectors $|e_1\rangle$ and $|e_2\rangle$ with $\|e_2\| = 2\|e_1\|$ as basis vectors in the $xy$-plane (Fig. 13.1). We assume that $|e_1\rangle$ is a

Fig. 13.1 Basis vectors $|e_1\rangle$ and $|e_2\rangle$ in the $xy$-plane and their linear transformation by $R$.
unit vector. Hence, the set of vectors $|e_1\rangle$ and $|e_2\rangle$ does not constitute an orthonormal basis. Let $|x\rangle$ be an arbitrary position vector there expressed as

$$|x\rangle = (|e_1\rangle\ |e_2\rangle) \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}. \qquad (13.76)$$

Then we have

$$
\langle x|x\rangle = (x_1\ x_2) \begin{pmatrix} \langle e_1| \\ \langle e_2| \end{pmatrix} (|e_1\rangle\ |e_2\rangle) \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
= (x_1\ x_2) \begin{pmatrix} \langle e_1|e_1\rangle & \langle e_1|e_2\rangle \\ \langle e_2|e_1\rangle & \langle e_2|e_2\rangle \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
= (x_1\ x_2) \begin{pmatrix} 1 & 0 \\ 0 & 4 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
= x_1^2 + 4x_2^2. \qquad (13.77)
$$

Next let us think of the following linear transformation $R$ whose matrix representation is given by

$$
R = \begin{pmatrix} \cos\theta & 2\sin\theta \\ -(\sin\theta)/2 & \cos\theta \end{pmatrix}. \qquad (13.78)
$$

The transformation matrix $R$ is geometrically represented in Fig. 13.1. Following (11.36) we have

$$
R(|x\rangle) = (|e_1\rangle\ |e_2\rangle) \begin{pmatrix} \cos\theta & 2\sin\theta \\ -(\sin\theta)/2 & \cos\theta \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}. \qquad (13.79)
$$

As a result of the transformation $R$, the basis vectors $|e_1\rangle$ and $|e_2\rangle$ are transformed into $|e_1'\rangle$ and $|e_2'\rangle$, respectively, as in Fig. 13.1, such that

$$
(|e_1'\rangle\ |e_2'\rangle) = (|e_1\rangle\ |e_2\rangle) \begin{pmatrix} \cos\theta & 2\sin\theta \\ -(\sin\theta)/2 & \cos\theta \end{pmatrix}.
$$

Taking the inner product of (13.79), we have

$$
\langle (x)R^\dagger|R(x)\rangle
= (x_1\ x_2) \begin{pmatrix} \cos\theta & -(\sin\theta)/2 \\ 2\sin\theta & \cos\theta \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & 4 \end{pmatrix}
\begin{pmatrix} \cos\theta & 2\sin\theta \\ -(\sin\theta)/2 & \cos\theta \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
= (x_1\ x_2) \begin{pmatrix} 1 & 0 \\ 0 & 4 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
= x_1^2 + 4x_2^2. \qquad (13.80)
$$

Putting

$$G = \begin{pmatrix} 1 & 0 \\ 0 & 4 \end{pmatrix}, \qquad (13.81)$$

we have $R^\dagger G R = G$.

Comparing (13.77) and (13.80), we notice that the norm of $|x\rangle$ remains unchanged after the transformation $R$. This means that $R$ is virtually a unitary transformation. The somewhat unfamiliar matrix form of $R$ resulted from the choice of basis vectors other than an orthonormal basis.
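A quick numerical check of $R^\dagger G R = G$ in Example 13.3 (a sketch, NumPy assumed; the angle is an arbitrary test value):

```python
import numpy as np

theta = 0.7                              # arbitrary angle
c, s = np.cos(theta), np.sin(theta)
R = np.array([[c, 2 * s], [-s / 2, c]])  # transformation (13.78)
G = np.array([[1.0, 0.0], [0.0, 4.0]])   # Gram matrix (13.81)

assert np.allclose(R.T @ G @ R, G)       # the norm is conserved, cf. (13.80)
```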

13.4 Orthonormal Basis

Now we introduce an orthonormal basis, the simplest and most important basis set in an inner product space. If we choose the orthonormal basis so that $\langle e_i|e_j\rangle = \delta_{ij}$, the Gram matrix is $G = E$. Thus, $R^\dagger G R = G$ reads as $R^\dagger R = E$. In that case a linear transformation is represented by a unitary matrix, and it conserves the norm of a vector as well as the inner product of two arbitrary vectors.

So far we have assumed that an adjoint operator $A^\dagger$ operates only on a row vector from the right, as is evident from (13.61). At the same time, $A$ operates only on a column vector from the left as in (13.62). To render the notation of (13.61) and (13.62) consistent with the associative law, we have to examine the commutability of $A^\dagger$ with $G$. In this context, the choice of an orthonormal basis enables us to get through this troublesome situation and largely eases matrix calculation. Thus,
$$
\langle (x)A^\dagger|A(x)\rangle
= (x_1^* \cdots x_n^*)
\begin{pmatrix} a_{11}^* & \cdots & a_{n1}^* \\ \vdots & \ddots & \vdots \\ a_{1n}^* & \cdots & a_{nn}^* \end{pmatrix}
\begin{pmatrix} \langle e_1| \\ \vdots \\ \langle e_n| \end{pmatrix}
(|e_1\rangle \cdots |e_n\rangle)
\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}
\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}
= (x_1^* \cdots x_n^*)\, A^\dagger E A \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}
= (x_1^* \cdots x_n^*)\, A^\dagger A \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \qquad (13.82)
$$

At the same time, we adopt a simple notation as below instead of (13.75):

$$
\langle xA^\dagger|Ax\rangle = (x_1^* \cdots x_n^*)\, A^\dagger A \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \qquad (13.83)
$$

This notation is now consistent with the associative law. Note that $A^\dagger$ and $A$ may operate on either a column or a row vector. We can also do without the symbol "$|$" in (13.83) and express it as

$$\langle xA^\dagger Ax\rangle = \langle xA^\dagger|Ax\rangle. \qquad (13.84)$$

Thus, we can freely operate $A^\dagger$ and $A$ from both the left and the right. By the same token, we rewrite (13.62) as

$$
\langle y|Ax\rangle = \langle yAx\rangle = (y_1^* \cdots y_n^*)
\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}
\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}
= (y_1^* \cdots y_n^*)\, A \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \qquad (13.85)
$$

Here, notice that a vector $|x\rangle$ is represented by a column vector with respect to the orthonormal basis. Using (13.64), we have

$$
\langle y|Ax\rangle^* = \langle xA^\dagger|y\rangle = (x_1^* \cdots x_n^*)\, A^\dagger \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}
= (x_1^* \cdots x_n^*)
\begin{pmatrix} a_{11}^* & \cdots & a_{n1}^* \\ \vdots & \ddots & \vdots \\ a_{1n}^* & \cdots & a_{nn}^* \end{pmatrix}
\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}. \qquad (13.86)
$$

If in (13.83) we put $|y\rangle = |Ax\rangle$, then $\langle xA^\dagger|Ax\rangle = \langle y|y\rangle \ge 0$. Thus, we define the norm of $|Ax\rangle$ as

$$\|Ax\| = \sqrt{\langle xA^\dagger|Ax\rangle}. \qquad (13.87)$$
Now we are in a position to construct an orthonormal basis in $V^n$ using $n$ linearly independent vectors $|i\rangle$ ($1 \le i \le n$). The following theorem is well known as the Gram–Schmidt orthonormalization.

Theorem 13.2: Gram–Schmidt Orthonormalization Theorem [5] Suppose that there is a set of linearly independent vectors $|i\rangle$ ($1 \le i \le n$) in $V^n$. Then one can construct an orthonormal basis $|e_i\rangle$ ($1 \le i \le n$) so that $\langle e_i|e_j\rangle = \delta_{ij}$ ($1 \le i, j \le n$) and each vector $|e_i\rangle$ is a linear combination of the vectors $|i\rangle$.

Proof First let us take $|1\rangle$. This can be normalized such that

$$|e_1\rangle = \frac{|1\rangle}{\sqrt{\langle 1|1\rangle}}, \qquad \langle e_1|e_1\rangle = 1. \qquad (13.88)$$

Next let us take $|2\rangle$ and then make the following vector:

$$|e_2\rangle = \frac{1}{L_2}\bigl[|2\rangle - \langle e_1|2\rangle|e_1\rangle\bigr], \qquad (13.89)$$

where $L_2$ is a normalization constant such that $\langle e_2|e_2\rangle = 1$. Note that $|e_2\rangle$ cannot be a zero vector. This is because if $|e_2\rangle$ were a zero vector, $|2\rangle$ and $|e_1\rangle$ (or $|1\rangle$) would be linearly dependent, in contradiction to the assumption. We have $\langle e_1|e_2\rangle = 0$. Thus, $|e_1\rangle$ and $|e_2\rangle$ are orthonormal.

After this, the proof is based upon mathematical induction. Suppose that the theorem is true of $(n-1)$ vectors. That is, let $|e_i\rangle$ ($1 \le i \le n-1$) be such that $\langle e_i|e_j\rangle = \delta_{ij}$ ($1 \le i, j \le n-1$) and each vector $|e_i\rangle$ is a linear combination of the vectors $|i\rangle$. Meanwhile, let us define
544 13 Inner Product Space

Xn1 
jf
ni  jni  j¼1
ej jn jej i: ð13:90Þ

Again, the vector j f


ni cannot be a zero vector as asserted above. We have
Xn1   Xn1 
hek j f
ni ¼ hek jni  j¼1
ej jn ek jej ¼ hek jni  e jn δkj ¼ 0, ð13:91Þ
j¼1 j

where 1 k n  1. The second equality comes from the assumption of the


induction. The vector j f
ni can always be normalized such that

f
j ni
jen i ¼ qffiffiffiffiffiffiffiffiffiffi , hen jen i ¼ 1: ð13:92Þ
fni
hnj f

Thus, the theorem is proven. In (13.92) a phase factor eiθ (θ: an arbitrarily chosen
real number) can be added such that

f
eiθ jni
jen i ¼ qffiffiffiffiffiffiffiffiffiffi : ð13:93Þ
fni
hnj f

To prove Theorem 13.2, we have used the following simple but important
theorem. ∎
Theorem 13.3 Let us have any n vectors jii ¼
6 0 (1 i n) in V n and let these
vectors jii be orthogonal to one another. Then the vectors jii are linearly
independent.
Proof Let us think of the following equation:

c1 j1i þ c2 j2i þ    þ cn jni ¼ 0: ð13:94Þ

Multiplying (13.94) by hij from the left and considering the orthogonality among
the vectors, we have

ci hijii ¼ 0: ð13:95Þ

Since hijii 6¼ 0, ci ¼ 0. The above is true of any ci and jii. Then c1 ¼ c2 ¼    ¼ cn ¼ 0.


Thus, (13.94) implies that j1i,j2i,    , jni are linearly independent. ∎
References 545

References

1. Hassani S (2006) Mathematical physics. Springer, New York


2. Satake I-O (1975) Linear algebra (Pure and applied mathematics). Marcel Dekker, New York
3. Satake I (1974) Linear algebra. Shokabo, Tokyo. (in Japanese)
4. Riley KF, Hobson MP, Bence SJ (2006) Mathematical methods for physics and engineering, 3rd
edn. Cambridge University Press, Cambridge
5. Dennery P, Krzywicki A (1996) Mathematics for physicists. Dover, New York
Chapter 14
Hermitian Operators and Unitary
Operators

Hermitian operators and unitary operators are quite often encountered in mathemat-
ical physics and, in particular, quantum physics. In this chapter we investigate their
basic properties. Both Hermitian operators and unitary operators fall under the
category of normal operators. The normal matrices are characterized by an important
fact that those matrices can be diagonalized by a unitary matrix. Moreover,
Hermitian matrices always possess real eigenvalues. This fact largely eases mathe-
matical treatment of quantum mechanics. In relation to these topics, in this chapter
we investigate projection operators systematically. We find their important applica-
tion to physicochemical problems in Part IV. We further investigate Hermitian
quadratic forms and real symmetric quadratic forms as an important branch of matrix
algebra. In connection with this topic, positive definiteness and non-negative prop-
erty of a matrix are an important concept. This characteristic is readily applicable to
theory of differential operators, thus rendering this chapter closely related to basic
concepts of quantum physics.

14.1 Projection Operators

In Chap. 12 we considered the decomposition of a vector space to direct sum of


invariant subspaces. We also mentioned properties of idempotent operators. More-
over, we have shown how an orthonormal basis can be constructed from a set of
linearly independent vectors. In this section an orthonormal basis set is implied as
basis vectors in an inner product space V n.
Let us start with a concept of an orthogonal complement. Let W be a subspace in
V n. Let us think of a set of vectors jxi such that

© Springer Nature Singapore Pte Ltd. 2020 547


S. Hotta, Mathematical Physical Chemistry,
https://doi.org/10.1007/978-981-15-2225-3_14
548 14 Hermitian Operators and Unitary Operators

n o
8
jxi; hxjyi ¼ 0 for j yi 2 W :

We name this set W⊥ and call it an orthogonal complement of W. The set W⊥


forms a subspace of V n. In fact, if jai, j bi 2 W⊥, ha j yi ¼ 0, hb| yi ¼ 0. Since
(ha j + hb| )| yi ¼ ha| yi + hb| yi ¼ 0. Therefore, jai + j bi 2 W⊥ and hαa| yi ¼ αha|
yi ¼ 0. Hence, jαai ¼ α j ai 2 W⊥. Then, W⊥ is a subspace of V n.
Theorem 14.1 Let W be a subspace and W⊥ be its orthogonal complement in V n.
Then,

V n ¼ W⨁W⊥ : ð14:1Þ

Proof Suppose that an orthonormal basis comprising je1i, j e2i, and j eni spans
V n;

V n ¼ Spanfje1 i, je2 i, . . . , jen ig: ð14:2Þ

Of the orthonormal basis, let |e1i, |e2i, and |eri (r < n) span W. Let an
arbitrarily chosen vector from V n be |xi. Then we have
Xn
jxi ¼ x1 je1 i þ x2 je2 i þ    þ xn jen i ¼ x
i¼1 i
j ei i: ð14:3Þ

Multiplying hejj on (14.3) from the left, we have


 Xn  Xn
ej j xi ¼ i¼1
x i e j j e i i ¼ x δ ¼ xj :
i¼1 i ij
ð14:4Þ

That is,
Xn
j xi ¼ i¼1
hei jxijei i: ð14:5Þ

Meanwhile, put
Xr
j x0 i ¼ i¼1
hei jxijei i: ð14:6Þ

Then we have jx0i 2 W. Also putting jx00i ¼ j xi  j x0i and multiplying


hei j (1  i  r) on it from the left, we get
14.1 Projection Operators 549

hei j x00 i ¼ hei jxi  hei jx0 i ¼ hei jxi  hei jxi ¼ 0: ð14:7Þ

Taking account of W ¼ Span{| e1i, | e2i,   , | eri}, we get jx00i 2 W⊥. That is,
for 8jxi 2 V n

jxi ¼jx0 i þ jx00 i: ð14:8Þ

This means that V n ¼ W + W⊥. Meanwhile, we have W \ W⊥ ¼ {0}. In fact,


suppose that |xi 2 W \ W⊥. Then hx j xi ¼ 0 because of the orthogonality.
However, this implies that |xi ¼ 0. Consequently, we have V n ¼ W ⨁ W⊥. This
completes the proof.
The consequence of the Theorem 14.1 is that the dimension of W⊥ is (n  r). In
other words, we have

dimV n ¼ n ¼ dimW þ dimW ⊥ :

Moreover, the contents of the Theorem 14.1 can readily be generalized to more
subspaces such that

V n ¼ W 1 ⨁W2 ⨁  ⨁Wr , ð14:9Þ

where W1, W2,   , and Wr (r  n) are mutually orthogonal complements. In this


case 8jxi 2 V n can be expressed uniquely as the sum of individual vectors jw1i, j w2i,
  , and j wri of each subspace; i.e.,

jxi ¼jw1 i þ jw2 i þ   þjwr i ¼ jw1 þ w2 þ    þ wr i: ð14:10Þ

Let us define the following operators similarly to the case of (12.201):

Pi ðjxiÞ ¼ jwi i ð1  i  r Þ: ð14:11Þ

Thus, the operator Pi extracts a vector jwii in a subspace Wi. Then we have

ðP1 þ P2 þ    þ Pr ÞðjxiÞ ¼ P1 jxi þ P2 jxi þ    þ Pr jxi


¼ jw1 iþjw2 i þ    þ jwr i ¼jxi: ð14:12Þ

Since jxi is an arbitrarily chosen vector, we get

P1 þ P2 þ    þ Pr ¼ E: ð14:13Þ

Moreover,
550 14 Hermitian Operators and Unitary Operators

Pi ½Pi ðjxiÞ ¼ Pi ðjwi iÞ ¼ jwi i ð1  i  r Þ: ð14:14Þ

Therefore, from (14.11) and (14.14) we have

Pi ½Pi ðjxiÞ ¼ Pi ðjxiÞ: ð14:15Þ

The vector jxi is arbitrarily chosen, and so we get

Pi 2 ¼ Pi : ð14:16Þ

Choose another arbitrary vector jyi 2 V n such that

jyi ¼ju1 i þ ju2 i þ   þjur i ¼ ju1 þ u2 þ    þ ur i: ð14:17Þ

Then, we have
 
hxPi yi ¼ hw1 þ w2 þ    þ wr Pi j u1 þ u2 þ    þ ur i
: ð14:18Þ
¼ hw1 þ w2 þ    þ wr j ui i ¼ hwi jui i

With the last equality, we used the mutual orthogonality of the subspaces.
Meanwhile, we have

hyjPi xi ¼ hu1 þ u2 þ    þ ur jPi jw1 þ w2 þ    þ wr i


: ð14:19Þ
¼ hu1 þ u2 þ    þ ur jwi i ¼ hui jwi i ¼ hwi jui i

Comparing (14.18) and (14.19), we get


 
hxjPi yi ¼ hyjPi xi ¼ hxPi { y , ð14:20Þ

where we used (13.64) with the second equality. Since jxi and jyi are arbitrarily
chosen, we get

Pi { ¼ Pi : ð14:21Þ

Equation (14.21) shows that Pi is Hermitian.


The above discussion parallels that made in Sect. 12.4 with an idempotent
operator. We have a following definition about a projection operator.
Definition 14.1 An operator P is said to be a projection operator if P2 ¼ P and
P{ ¼ P. That is, an idempotent and Hermitian operator is a projection operator.
As described above, a projection operator is characterized by (14.16) and (14.21).
An idempotent operator does not premise the presence of an inner product space, but
we only need a direct sum of subspaces. In contrast, if we deal with the projection
14.1 Projection Operators 551

operator, we are thinking of orthogonal compliments as subspaces and their direct


sum. The projection operator can adequately be defined in an inner product vector
space having an orthonormal basis.
From (14.13) we have

ðP1 þ P2 þ    þ Pr ÞðP1 þ P2 þ    þ Pr Þ
Xr X Xr X X
¼ P2þ
i¼1 i
PP ¼
i6¼j i j i¼1 i
P þ PP ¼Eþ
i6¼j i j
P P ¼ E:
i6¼j i j
ð14:22Þ

In (14.22) we used (14.13) and (14.16). Therefore, we get


X
PP
i6¼j i j
¼ 0:

In particular, we have

Pi Pj ¼ 0, Pj Pi ¼ 0 ði 6¼ jÞ: ð14:23Þ

In fact, we have
   
Pi Pj ðjxiÞ ¼ Pi jwj i ¼ 0 ði 6¼ j, 1  i, j  nÞ: ð14:24Þ

The second equality comes from Wi \ Wj ¼ {0}. Notice that in (14.24), indices
i and j are interchangeable. Again, |xi is arbitrarily chosen, and so (14.23) holds.
Combining (14.16) and (14.23), we write

Pi Pj ¼ δij : ð14:25Þ

In virtue of the relation (14.23), Pi + Pj (i 6¼ j) is a projection operator as well


[1]. In fact, we have
 2
Pi þ Pj ¼ Pi 2 þ Pi Pj þ Pj Pi þ Pj 2 ¼ Pi 2 þ Pj 2 ¼ Pi þ Pj ,

where the second equality comes from (14.23). Also we have


 {
Pi þ Pj ¼ Pi { þ Pj { ¼ Pi þ Pj :

The following notation is often used:

j wi ihwi j
Pi ¼ : ð14:26Þ
j jwi j j  j jwi j j

Then we have
552 14 Hermitian Operators and Unitary Operators

jwi ihwi j
Pi j xi ¼ j ðj w1 iþ j w2 i þ    þ jws iÞ
jjwi jj  jjwi jj
¼ ½jwi iðhwi jw1 i þ hwi jw2 i þ    þ hwi jws iÞ=hwi jwi i
¼ ½jwi ihwi jwi i=hwi jwi i ¼j wi i:

Furthermore, we have

Pi 2 j xi ¼ Pi j wi i ¼j wi i ¼ Pi j xi: ð14:27Þ

Equation (14.16) is recovered accordingly. Meanwhile,

ðjwi ihwi jÞ{ ¼ ðhwi jÞ{ ðjwi iÞ{ ¼ jwi ihwi j: ð14:28Þ

Hence, we recover

Pi { ¼ Pi : ð14:21Þ

In (14.28) we used (AB){ ¼ B{A{. In fact, we have


  { {   { {  D  E

xB A y ¼ xB A y ¼ hyAjBxi ¼ hyjABjxi ¼ xðABÞ{ jy : ð14:29Þ

With the second equality of (14.29), we used (13.86) where A is replaced with
B and jyi is replaced with A{ j yi. Since (14.29) holds for arbitrarily chosen vectors
jxi and jyi, comparing the first and last sides of (14.29) we have

ðABÞ{ ¼ B{ A{ : ð14:30Þ

We can express (14.29) alternatively as follows:

   D  {  { E
xB{ A{ y ¼ y A{  B{ x ¼ hyAjBxi ¼ hyABxi , ð14:31Þ

where with the second equality we used (13.68). Also recall the remarks after (13.83)
with the expressions of (14.29) and (14.31). Other notations can be adopted.
We can view a projection operator under a more strict condition. Related oper-
ators can be defined as well. As in (14.26), let us define an operator such that

Pek ¼j ek ihek j : ð14:32Þ

Operating Pei on jxi ¼ x1 j e1i + x2 j e2i +    + xn j eni from the left, we get
14.1 Projection Operators 553

 Xn    Xn     Xn 
 ej ¼ek i  ej ¼  ek i 
Pek jxi¼jek ihek  j¼1
x j j¼1
x j e k j¼1
x j δ kj ¼x k ek i:

Thus, we find that Pek plays the same role as P(k) defined in (12.201). Represented
by a matrix, Pek has the same structure as that denoted in (12.205). Evidently,
 2 {
Pek ¼ Pek , Pek ¼ Pek , P
f1 þ f
P2 þ    þ f
Pn ¼ E: ð14:33Þ

ðk Þ
Now let us modify P(k) in (12.201). There PðkÞ ¼ δi δjðkÞ , where only the (k,k)
ðk Þ ðk Þ
element is 1, otherwise 0, in the (n, n) matrix. We define a matrix PðmÞ ¼ δi δjðmÞ. A
full matrix representation for it is
0 1
0
B C
B ⋱ 1 C
B C
B C
B 0 C
B C
ðk Þ B C
PðmÞ ¼B 0 C, ð14:34Þ
B C
B C
B 0 C
B C
B C
@ A

0

ðk Þ
where only the (k, m) element is 1, otherwise 0. In an example of (14.34), PðmÞ is an
ðk Þ
upper triangle matrix (k < m). Therefore, its eigenvalues are all zero, and so PðmÞ is a
nilpotent matrix. If k > m, the matrix is a lower triangle matrix and nilpotent as well.
Such a matrix is not Hermitian (nor a projection operator), as can be immediately
seen from the matrix form of (14.34). Because of the properties of nilpotent matrices
ðk Þ
mentioned in Sect. 12.3, PðmÞ ðk 6¼ mÞ is not diagonalizable either.
Various relations can be extracted. As an example, we have

ðk Þ ðlÞ
X ðk Þ ðk Þ ðk Þ
PðmÞ PðnÞ ¼ δ δqðmÞ δðqlÞ δjðnÞ
q i
¼ δi δml δjðnÞ ¼ δml PðnÞ : ð14:35Þ

ðk Þ
Note that PðkÞ  PðkÞ defined in (12.201). From (14.35), moreover, we have

ðk Þ ðmÞ ðk Þ ðnÞ ðmÞ ðnÞ ðmÞ ðnÞ ðmÞ


PðmÞ PðnÞ ¼ PðnÞ , PðmÞ PðnÞ ¼ PðnÞ , PðnÞ PðmÞ ¼ PðmÞ ,
h i2
ðkÞ ðk Þ ðk Þ ðmÞ ðmÞ ðmÞ ðmÞ
PðmÞ PðmÞ ¼ δmk PðmÞ , PðmÞ PðmÞ ¼ PðmÞ ¼ PðmÞ , etc:
554 14 Hermitian Operators and Unitary Operators

These relations remain unchanged after the unitary similarity transformation by


U. For instance, taking the first equation of the above, we have
h ih i
ðkÞ ðmÞ ðk Þ ðmÞ ðk Þ
U { PðmÞ PðnÞ U ¼ U { PðmÞ U U { PðnÞ U ¼ U { PðnÞ U:

ðk Þ
Among these operators, only PðkÞ is eligible for a projection operator. We will
encounter further examples in Part IV.
ðk Þ
Using PðkÞ in (13.62), we have
0 1
D  E  x1
 ðkÞ  ðk Þ B C
yPðkÞ ðxÞ ¼ y1   yn GPðkÞ @ ⋮ A: ð14:36Þ
xn

Within a framework of an orthonormal basis where G ¼ E, the representation is


largely simplified to be
0 1
D  E  x1
 ðkÞ  ðkÞ B C
yPðkÞ x ¼ y1   yn PðkÞ @ ⋮ A ¼ yk xk :
 
ð14:37Þ
xn

14.2 Normal Operators

There are a large group of operators called normal operators that play an important
role in mathematical physics, especially quantum physics. A normal operator is
defined as an operator on an inner product space that commutes with its adjoint
operator. That is, let A be a normal operator. Then, we have

AA{ ¼ A{ A: ð14:38Þ

The normal operators include an Hermitian operator H defined as H{ ¼ H as well


as a unitary operator U defined as UU{ ¼ U{U ¼ E.
In this condition let us estimate the norm of jA{xi together with jAxi defined by
(13.87). If A is a normal operator,
rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
D  { Effi qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
kA{ xk ¼ x A{ jA{ x ¼ xAA{ x ¼ xA{ Ax ¼j jAxj j : ð14:39Þ
14.2 Normal Operators 555

The other way around suppose that j|Ax| j ¼ kA{xk. Then, since ||Ax||2 ¼ kA{xk2,
hxA{Axi ¼ hxAA{xi. That is, hx| A{A  AA{| xi ¼ 0 for an arbitrarily chosen vector
jxi. To assert A{A  AA{ ¼ 0, i.e., A{A ¼ AA{ on the assumption that hx| A{A  AA{|
xi ¼ 0, we need the following theorems:
Theorem 14.2 [2] A linear transformation A on an inner product space is the zero
transformation if and only if hy| Axi ¼ 0 for any vectors jxi and jyi.
Proof If A ¼ 0, then hy| Axi ¼ hy| 0i ¼ 0. This is because in (13.3) putting β ¼ 1 ¼  γ
and jbi ¼ j ci, we get ha| 0i ¼ 0. Conversely, suppose that hy| Axi ¼ 0 for any
vectors jxi and jyi. Then, putting jyi ¼ j Axi, hxA{| Axi ¼ 0 and hy| yi ¼ 0. This
implies that jyi ¼ j Axi ¼ 0. For jAxi ¼ 0 to hold for any jxi we must have A ¼ 0.
Note here that if A is a singular matrix, for some vectors jxi, j Axi ¼ 0. However,
even though A is singular, for jAxi ¼ 0 to hold for any jxi, A ¼ 0.
We have another important theorem under a further restricted condition. ∎
Theorem 14.3 [2] A linear transformation A on an inner product space is the zero
transformation if and only if hx| Axi ¼ 0 for any vectors jxi.
Proof As in the case of Theorem 14.2, a necessary condition is trivial. To prove a
sufficient condition, let us consider the following:

hx þ yjAðx þ yÞi ¼ hxjAxi þ hyjAyi þ hxjAyi þ hyjAxi, hxjAyi þ hyjAxi


¼ hx þ yjAðx þ yÞi  hxjAxi  hyjAyi: ð14:40Þ

From the assumption that hx| Axi ¼ 0 with any vectors jxi, we have

hxjAyi þ hyjAxi ¼ 0: ð14:41Þ

Meanwhile, replacing jyi by jiyi in (14.41), we get

hxjAiyi þ hiyjAxi ¼ i½hxjAyi  hyjAxi ¼ 0: ð14:42Þ

That is,

hxjAyi  hyjAxi ¼ 0: ð14:43Þ

Combining (14.41) and (14.43), we get

hxjAyi ¼ 0: ð14:44Þ

Theorem 14.2 means that A ¼ 0, indicating that the sufficient condition holds.
This completes the proof. ∎
Thus, returning to the beginning, i.e., remarks made after (14.39), we establish the
following theorem:
556 14 Hermitian Operators and Unitary Operators

Theorem 14.4 A necessary and sufficient condition for a linear transformation A on


an inner product space to be a normal operator is

kA{ xk ¼ kAxk: ð14:45Þ

14.3 Unitary Diagonalization of Matrices

A normal operator has a distinct property. The normal operator can be diagonalized
by a similarity transformation by a unitary matrix. The transformation is said to be a
unitary similarity transformation. Let us prove the following theorem.
Theorem 14.5 [3] A necessary and sufficient condition for a matrix A to be
diagonalized by unitary similarity transformation is that the matrix A is a normal
matrix.
Proof To prove the necessary condition, suppose that A can be diagonalized by a
unitary matrix U. That is,

U { AU ¼ D, i:e:, A ¼ UDU { and A{ ¼ UD{ U { , ð14:46Þ

where D is a diagonal matrix. Then


  
AA{ ¼ UDU { UD{ U { ¼ UDD{ U { ¼ UD{ DU {
  
¼ UD{ U { UDU { ¼ A{ A: ð14:47Þ

For the third equality, we used DD{ ¼ D{D (i.e., D and D{ are commutable). This
shows that A is a normal matrix.
To prove the sufficient condition, let us show that a normal matrix can be
diagonalized by unitary similarity transformation. The proof is due to mathematical
induction, as is the case with Theorem 12.1.
First we show that Theorem is true of a (2, 2) matrix. Suppose that one of
eigenvalues of A2 is α1 and that its corresponding eigenvector is jx1i. Following
procedures of the proof for Theorem 12.1 and remembering the Gram–Schmidt
orthonormalization theorem, we can construct a unitary matrix U1 such that

U 1 ¼ ðjx1 i jp1 iÞ, ð14:48Þ

where jx1i represents a column vector and jp1i is another arbitrarily determined
column vector. Then we can convert A2 to a triangle matrix such that
14.3 Unitary Diagonalization of Matrices 557

α1 x
f
A2  U 1 { A2 U 1 ¼ : ð14:49Þ
0 y

Then we have
 {  {    
f2 A
A f2  ¼ U 1 A2 U 1 U 1 { A2 U 1 Þ{ ¼ U 1 { A2 U 1 U 1 { A2 { U 1
   h i{
¼ U 1 { A2 A2 { U 1 ¼ U 1 { A2 { A2 U 1 ¼ U 1 { A2 { U 1 U 1 { A2 U 1 ¼ f
A2 Af2 :

ð14:50Þ

With the fourth equality, we used the supposition that A2 is a normal matrix.
Equation (14.50) means that fA2 defined in (14.49) is a normal operator. Via simple
matrix calculations, we have
! !
h i{ jα1 j2 þ jxj2 xy h i{ jα1 j2 α1 x
A2 f
f A2 ¼ , A2 f
f A2 ¼ : ð14:51Þ
x y jyj2 α1 x jxj2 þ jyj2

For (14.50) to hold, we must have x ¼ 0 in (14.51). Accordingly, we get

α1 0
f
A2 ¼ : ð14:52Þ
0 y

This implies that a normal matrix A2 has been diagonalized by the unitary
similarity transformation.
Now let us examine a general case where we consider a (n, n) square normal
matrix An. Let αn be one of eigenvalues of An. On the basis of the argument of the
e we
(2, 2) matrix case, after a suitable similarity transformation by a unitary matrix U
first have
 {
fn ¼ U
A e
e An U, ð14:53Þ

where we can put

fn ¼ αn xT
A , ð14:54Þ
0 B

where x is a column vector of order (n  1), 0 is a zero column vector of order


(n  1), and B is a (n  1, n  1) matrix. Then we have
558 14 Hermitian Operators and Unitary Operators

fn { ¼ αn 0
½A , ð14:55Þ
x B{

where x is a complex column vector. Performing matrix calculations, we have


! !
fn { ¼ jαn j2 þ xT x xT B{ fn  ¼
fn { ½A jαn j2 αn xT
f
An ½A , ½A :
Bx BB{ α n x x xT þ B { B
ð14:56Þ

fn ½A
For A fn { ½g
fn { ¼ ½A An  to hold with (14.56), we must have x = 0. Thus we get

αn 0
fn ¼
A : ð14:57Þ
0 B

Since f
An is a normal matrix, so is B. According to mathematical induction, let us
assume that the theorem holds with a (n  1, n  1) matrix, i.e., B. Then, also from
the assumption there exists a unitary matrix C and a diagonal matrix D, both of order
(n  1), such that BC ¼ CD. Hence,

αn 0 1 0 1 0 αn 0
¼ : ð14:58Þ
0 B 0 C 0 C 0 D

Here putting

1 0 αn 0
fn ¼
C and fn ¼
D , ð14:59Þ
0 C 0 D

we get

f fn ¼ C
An C fn D
fn : ð14:60Þ
h i{
fn is a (n, n) unitary matrix, [ C
As C fn ¼ C
fn { C fn ¼ E . Hence,
fn C
h i{
fn A
C fn C
fn ¼ D
fn . Thus, from (14.53) finally we get

h i{  {
fn
C e An U
U eCfn ¼ D
fn : ð14:61Þ

eC
Putting U fn ¼ V, V being another unitary operator,

fn :
V { An V ¼ D ð14:62Þ

This completes the proof. ∎


14.3 Unitary Diagonalization of Matrices 559

A direct consequence of Theorem 14.5 is that with any normal matrix we can find
a set of orthonormal eigenvectors corresponding to individual eigenvalues whether
or not those are degenerate.
In Sect. 12.2 we dealt with a decomposition of a linear vector space and relevant
reduction of an operator when discussing canonical forms of matrices. In this context
Theorem 14.5 gives a simple and clear criterion for this. Equation (14.57) implies
that a (n  1, n  1) submatrix B can further be reduced to matrices having lower
dimensions. Considering that a diagonal matrix is a special case of triangle matrices,
a normal matrix that has been diagonalized by the unitary similarity transformation
gives eigenvalues by its diagonal elements.
From a point of view of the aforementioned aspect, let us consider the character-
istics of normal matrices, starting with the discussion about the invariant subspaces.
We have a following important theorem.
Theorem 14.6 Let A be a normal matrix and let one of its eigenvalues be α. Let Wα
be an eigenspace corresponding to α. Then, Wα is both A-invariant and A{-invariant.
Also Wα⊥ is both A-invariant and A{-invariant.
Proof Theorem 14.5 ensures that a normal matrix is diagonalized by unitary
similarity transformation. Therefore, we deal with only “proper” eigenvalues and
eigenvectors here. First we show if a subspace W is A-invariant, then its orthogonal
complements W⊥ is A{-invariant. In fact, suppose that jxi 2 W and jx0i 2 W⊥. Then,
from (13.64) and (13.86) we have
   
hx0 jAxi ¼ 0 ¼ xA{ jx0 ¼ xjA{ x0 : ð14:63Þ

The first equality comes from the fact that jxi 2 W ⟹ A j xi(¼| Axi) 2 W as W is
A-invariant. From the last equality of (14.63), we have
 
A{ j x0 i ¼ jA{ x0 i 2 W ⊥ : ð14:64Þ

That is, W⊥ is A{-invariant.


Next suppose that jxi 2 Wα. Then we have

AA{ j xi ¼ A{ A j xi ¼ A{ ðαjxiÞ ¼ αA{ j xi: ð14:65Þ

Therefore, A{ j xi 2 Wα. This means that Wα is A{-invariant. From the above


remark, Wα⊥ is (A{){-invariant, i.e., A-invariant accordingly. This completes the
proof. ∎
From Theorem 14.5, we know that the resulting diagonal matrix D fn in (14.62) has
a form with n eigenvalues (αn) some of which may be multiple roots arranged in
diagonal elements. After diagonalizing the matrix, those eigenvalues can be sorted
out according to different eigenvalues α1, α2,   , and αs. This can also be done by
unitary similarity transformation. The relevant unitary matrix U is represented as
560 14 Hermitian Operators and Unitary Operators

0 1
1
B ⋱ C
B C
B C
B 1 C
B C
B C
B 0  1 C
B C
B 1 C
B C
B C
U¼B ⋮ ⋱ ⋮ C, ð14:66Þ
B C
B 1 C
B C
B C
B 1  0 C
B C
B 1 C
B C
B C
@ ⋱ A
1

where except (i, j) and ( j, i) elements equal to 1, all the off-diagonal elements are
zero. If operated from the left, U exchanges the i-th and j-th rows of the matrix. If
operated from the right, U exchanges the i-th and j-th columns of the matrix. Note
that U is at once unitary and Hermitian with eigenvalue 1 or 1. Note that U2 ¼ E.
This is because exchanging two columns (or two rows) two times produces identity
transformation. Thus performing such unitary similarity transformations appropriate
times, we get
0 1
α1
B ⋱ C
B C
B C
B α1 C
B C
B α2 C
B C
B C
B ⋱ C
Dn B
~
B
C:
C ð14:67Þ
B α2 C
B C
B ⋱ C
B C
B αs C
B C
B C
@ ⋱ A
αs

The matrix is identical to that represented in (12.181).


In parallel, V n is decomposed to mutually orthogonal subspaces associated with
different eigenvalues α1, α2,   , and αs such that

V n ¼ W α1  W α2      W αs : ð14:68Þ

This expression is formally identical to that represented in (12.191). Note,


however, that in (12.181) orthogonal subspaces are not implied. At the same time,
An is reduced to
14.3 Unitary Diagonalization of Matrices 561

0 1
Að1Þ
B C
B  C
B C
An B
~
B Að2Þ C,
C ð14:69Þ
B C
@ ⋮ ⋱ ⋮A
   AðsÞ

according to the different eigenvalues.


A normal operator has other distinct properties. Following theorems are good
examples.
Theorem 14.7 Let A be a normal operator on V n. Then jxi is an eigenvector of
A with an eigenvalue α, if and only if jxi is an eigenvector of A{ with an eigenvalue
α.
Proof We apply (14.45) for the proof. Both (A  αE){ ¼ A{  αE and (A  αE) are
normal, since A is normal. Consequently, we have j|(A  αE)x| j ¼ 0 if and only if
k(A{  αE)xk ¼ 0. Since only the zero vector has a zero norm, we get
 
ðA  αE Þ j xi ¼ 0 if and only if A{  α E j xi ¼ 0:

This completes the proof. ∎


n
Theorem 14.8 Let A be a normal operator on V . Then, eigenvectors corresponding
to different eigenvalues are mutually orthogonal.
Proof Let A be a normal operator on V n. Let jui be an eigenvector corresponding to
an eigenvalue α and jvi be an eigenvector corresponding to an eigenvalue β with
α 6¼ β. Then we have
 
αhvjui ¼ hvjαui ¼ vjAui ¼ hujA{ v ¼ hujβ vi ¼ hβ vjui ¼ βhvjui, ð14:70Þ

where with the fourth equality we used Theorem 14.7. Then we get

ðα  βÞhvjui ¼ 0:

Since α  β 6¼ 0, hv| ui ¼ 0. Namely, the eigenvectors jui and jvi are mutually
orthogonal.
In (12.208) we mentioned the decomposition of diagonalizable matrices. As for
the normal matrices, we have a related matrix decomposition. Let A be a normal
operator. Then, according to Theorem 14.5, A can be diagonalized and expressed as
(14.67). This is equivalently expressed as a following succinct relation. That is, if we
choose U for a diagonalizing unitary matrix, we have
562 14 Hermitian Operators and Unitary Operators

U { AU ¼ α1 P1 þ α2 P2 þ    þ αs Ps , ð14:71Þ

where α1, α2,   , and αs are different eigenvalues of A; Pl (1  l  s) is described


such that, e.g.,
0 1
E n1
B C
B  C
B C
P1 ¼ B
B 0n 2 C,
C ð14:72Þ
B C
@ ⋮ ⋱ ⋮A
   0n s

where E n1 stands for a (n1, n1) identity matrix with n1 corresponding to multiplicity
of α1. A matrix represented by 0n2 is a (n2, n2) zero matrix, and so forth. This
expression is in accordance with (14.69). From a matrix form (14.72), obviously
Pl (1  l  s) is a projection operator. Thus, operating U and U{ on both sides
(14.71) from the left and right of (14.71), respectively, we obtain

A ¼ α1 UP1 U { þ α2 UP2 U { þ    þ αs UPs U { : ð14:73Þ

Defining Pel  UPl U { ð1  l  sÞ, we have

A ¼ α1 f
P1 þ α1 f
P2 þ    þ αs Pes : ð14:74Þ

In (14.74), we can easily check that Pel is a projection operator with α1, α2,   ,
and αs being different eigenvalues of A. If αl (1  l  s) is degenerate, we express Pel
fμ ð1  μ  m Þ, where m is multiplicity of α . In that case, we may write
as P l l l l

Pel ¼ f
P1l      Pf
l :
ml
ð14:75Þ

Also we have

Pek Pel ¼ UPk U { UPl U { ¼ UPk EPl U { ¼ UPk Pl U { ¼ 0 ð1  k, l  sÞ:

The last equality comes from (14.23). Similarly, we have

Pel Pek ¼ 0:

Thus, we have
14.3 Unitary Diagonalization of Matrices 563

Pek Pel ¼ δkl :

If the operator is decomposed as in the case of (14.75), we can express

fμ Peν ¼ δ ð1  μ, ν  m Þ:
P l l μν l

Conversely, if an operator A is expressed by (14.74), that operator is normal


operator. In fact, we have

X { X X X X
A{ A ¼ αPe αPe ¼ eP
α α P e ¼ e ¼
α α δ P ei ,
jαi j2 P
i i i j j j i,j i j i j i,j i j ij i i
X X { X X X
AA{ ¼ αPe αPe ¼ eP
α α P e ¼ e ¼
α α δ P ei
jαi j2 P
j j j i i i i,j i j j i i,j i j ji j i

ð14:76Þ

Hence, A{A ¼ AA{. If projection operators are further decomposed as in the case
of (14.75), we have a related expression to (14.76). Thus, a necessary and sufficient
condition for an operator to be a normal operator is that the said operator is expressed
as (14.74). The relation (14.74) is well known as a spectral decomposition theorem.
Thus, the spectral decomposition theorem is equivalent to Theorem 14.5.
The relations (12.208) and (14.74) are virtually the same, aside from the fact that
whereas (14.74) premises an inner product space, (12.208) does not premise
it. Correspondingly, whereas the related operators are called projection operators
with the case of (14.74), those operators are said to be idempotent operators for
(12.208).
Example 14.1 Let us think of a Gram matrix of Example 13.1, as shown below.

2 1þi
G¼ : ð13:51Þ
1i 2

After a unitary similarity transformation, we got


pffiffiffi !
{ 2þ 2 0
U GU ¼ pffiffiffi : ð13:54Þ
0 2 2

e ¼ U { GU and rewriting (13.54), we have


Putting G
564 14 Hermitian Operators and Unitary Operators

pffiffiffi !
2þ 2 0 pffiffiffi 1 0 pffiffiffi 0 0

G pffiffiffi ¼ 2 þ 2 þ 2 2 :
0 2 2 0 0 0 1

e { ¼ G, we get
By a back calculation of U GU
0 pffiffiffi 1
1 2ð1 þ iÞ
pffiffiffi B C pffiffiffi
G¼ 2þ 2 B @ pffiffiffi
2 4 Cþ 2 2
A
2ð 1  i Þ 1
4 2
0 pffiffiffi 1
1 2ð 1 þ i Þ
B  C
B pffiffiffi2 4 C: ð14:77Þ
@ A
2ð 1  i Þ 1

4 2
pffiffiffi pffiffiffi
Putting eigenvalues α1 ¼ 2 þ 2 and α2 ¼ 2  2 along with
0 pffiffiffi 1 0 pffiffiffi 1
1 2ð1 þ iÞ 1 2ð1 þ iÞ
B C B  C
A1 ¼ B
@ p ffiffi
ffi 2 4 C,
A A2 ¼ B
@ p ffiffi
ffi 2 4 C,
A ð14:78Þ
2ð1  iÞ 1 2ð1  iÞ 1

4 2 4 2

we get

G ¼ α1 A1 þ α2 A2 : ð14:79Þ

In the above, A1 and A2 are projection operators. In fact, as anticipated we have

A1 2 ¼ A1 , A2 2 ¼ A2 , A1 A2 ¼ A2 A1 ¼ 0, A1 þ A2 ¼ E: ð14:80Þ

Moreover, (14.78) obviously shows that both A1 and A2 are Hermitian. Thus,
Eqs. (14.77) and (14.79) are an example of the spectral decomposition. The decom-
position is unique.
The Example 14.1 can be dealt with in parallel to Example 12.5. In Example 12.5,
however, an inner product space is not implied, and so we used an idempotent matrix
instead of a projection operator. Note that as can be seen in Example 12.5 that
idempotent matrix was not Hermitian.

14.4 Hermitian Matrices and Unitary Matrices

Of normal matrices, Hermitian matrices and unitary matrices play a crucial role both
in fundamental and applied science. Let us think of several topics and examples.
14.4 Hermitian Matrices and Unitary Matrices 565

In quantum physics, one frequently treats expectation value of an operator. In


general, such an operator is Hermitian, more strictly an observable. Moreover, a
vector on an inner product space is interpreted as a state on a Hilbert space. Suppose
that there is a linear operator (or observable that represents a physical quantity)
O that has discrete (or countable) eigenvalues α1, α2,   . The number of the
eigenvalues may be a finite number or an infinite number, but here we assume the
finite number; i.e., let us suppose that we have eigenvalues α1, α2,   , and αs, in
consistent with our previous discussion.
In quantum physics, we have a well-known Born probability rule. The rule says
the following: Suppose that we carry out a physical measurement on A with respect
to a physical state jui. Here we assume that jui has been normalized. Then, the
probability ℘l that A takes αl (1  l  s) is given by
 2
℘l ¼ Pel u , ð14:81Þ

where Pel is a projection operator that projects jui to an eigenspace W αl spanned by


jαl, ki. Here k (1  k  nl) reflects the multiplicity nl of an eigenvalue αl. Hence, we
express the nl-dimensional eigenspace W αl as

W αl ¼ Spanfjαl , 1i, jαl , 2i,   , jαl , nl ig: ð14:82Þ

Now, we define an expectation value hAi of A such that


Xs
hAi  α℘:
l¼1 l l
ð14:83Þ

From (14.81) we have


 2 D { E      
℘l ¼ Pel u ¼ uPel jPel u ¼ uPel jPel u ¼ uPel Pel u ¼ uPel u : ð14:84Þ

For the third equality we used the fact that Pel is Hermitian; for the last equality we
used Pel ¼ Pel . Summing (14.84) with the index l, we have
2

X X  D X E
℘ ¼
l l l
uPel u ¼ u l
el u ¼ huEui ¼ hu j ui ¼ 1,
P

where with the third equality we used (14.33).


Meanwhile, from the spectral decomposition theorem, we have

A ¼ α1 f
P1 þ α2 f
P2 þ    þ αs Pes : ð14:74Þ

Operating huj and jui on both sides of (14.74), we get


566 14 Hermitian Operators and Unitary Operators

D   E D   E    
 f  f e
hujAjui ¼ α1 uP 1 u þ α2 uP2 u þ    þ αs u Ps u

¼ α1 ℘1 þ α2 ℘2 þ    þ αs ℘s : ð14:85Þ

Equating (14.83) and (14.85), we have

hAi ¼ hujAjui: ð14:86Þ

In quantum physics, a real number is required for an expectation value of an


observable (i.e., a physical quantity). To warrant this, we have following theorems.
Theorem 14.9 A linear transformation A on an inner product space is Hermitian if
and only if hu|A|ui is real for all jui of that inner product space.
Proof If A ¼ A{, then hu|A|ui ¼ hu| A{| ui ¼ hu| A| ui. Therefore, hu| A| ui is real.
Conversely, if hu|A|ui is real for all jui, we have
 
hujAjui ¼ hujAjui ¼ ujA{ ju :

Hence,
 
ujA  A{ ju ¼ 0: ð14:87Þ

From Theorem 14.3, we get A  A{ ¼ 0, i.e., A ¼ A{. This completes the proof.∎
Theorem 14.10 The eigenvalues of an Hermitian operator A are real.
Proof Let α be an eigenvalue of A and let jui be a corresponding eigenvector. Then,
A j ui ¼ α j ui. Operating huj from the left, hu|A|ui ¼ αhu| ui ¼ α j |u|j. Thus,

α ¼ hujAjui= j juj j : ð14:88Þ

Since A is Hermitian, hu|A|ui is real. Then, α is real as well.


Unitary matrices have a following conspicuous features: (i) An inner product is
held invariant under unitary transformation: Suppose that jx0i ¼ U j xi and
jy0i ¼ U j yi, where U is a unitary operator. Then hy0| x0i ¼ hyU{| Uxi ¼ hy| xi. A
norm of any vector is held invariant under unitary transformation as well. This is
easily checked by replacing jyi with jxi in the above. (ii) Let U be a unitary matrix
and suppose that λ be an eigenvalue with jλi being its corresponding eigenvector of
that matrix. Then we have

j λi ¼ U { U j λi ¼ U { ðλjλiÞ ¼ λU { j λi ¼ λλ j λi, ð14:89Þ

where with the last equality we used Theorem 14.7. Thus


14.4 Hermitian Matrices and Unitary Matrices 567

ð1  λλ Þ j λi ¼ 0: ð14:90Þ

As jλi 6¼ 0 is assumed, 1  λλ ¼ 0. That is

λλ ¼ jλj2 ¼ 1: ð14:91Þ

Thus, eigenvalues of a unitary matrix have unit absolute value. Let those eigen-
values be λk ¼ eiθk ðk ¼ 1, 2,   ; θk : realÞ. Then, from (12.11) we have
Y Y Y 
detU ¼ λk ¼ eiðθ1 þθ2 þÞ and jdetU j ¼ jλk j ¼ eiθk  ¼ 1:
k k k

That is, any unitary matrix has a determinant of unit absolute value. ∎
Example 14.2 Let us think of a following unitary matrix R:

cos θ  sin θ
R¼ : ð14:92Þ
sin θ cos θ

A characteristic equation is
 
 cos θ  λ  sin θ 
 ¼ λ2  2λ cos θ þ 1: ð14:93Þ
 sin θ cos θ  λ 

Solving (14.93), we have


pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
λ ¼ cos θ cos 2 θ  1 ¼ cos θ i j sin θ j : ð14:94Þ

(i) θ ¼ 0: This is a trivial case. The matrix R has automatically been diagonalized
to be an identity matrix. Eigenvalues are 1 (double root).
(ii) θ ¼ π: This is a trivial case. Eigenvalues are 1 (double root).
(iii) θ 6¼ 0, π: Let us think of 0 < θ < π. Then λ ¼ cos θ i sin θ. As a diagonalizing
unitary matrix U, we get

0 1 0 1
1 1 1 i
pffiffiffi pffiffiffi pffiffiffi pffiffiffi
B 2 2C B 2 2 C
U¼B
@
C, U{ ¼ B C: ð14:95Þ
i i A @ 1 i A
 pffiffiffi pffiffiffi pffiffiffi  pffiffiffi
2 2 2 2

Therefore, we have
568 14 Hermitian Operators and Unitary Operators

0 1 0 1
1 i 1 1
pffiffiffi pffiffiffi pffiffiffi pffiffiffi
B 2 2 C cos θ  sin θ B 2 2C
U { RU ¼ B
@ 1
C B C
i A sin θ cos θ @  piffiffiffi i A
pffiffiffi  pffiffiffi pffiffiffi
2 2 2 2
eiθ 0
¼ : ð14:96Þ
0 eiθ

A trace of the resulting matrix is 2 cos θ. In the case of π < θ < 2π, we get a
diagonal matrix similarly. The conformation is left for readers.

14.5 Hermitian Quadratic Forms

The Hermitian quadratic forms appeared in, e.g., (13.34) and (13.83) in relation to
Gram matrices in Sect. 13.2. The Hermitian quadratic forms have wide applications
in the field of mathematical physics and materials science.
Let H be an Hermitian operator and jxi on an inner vector space. We define the
Hermitian quadratic form in an arbitrary orthonormal basis as follows:
0 1
x1
  B C X 
hxjH jxi  x1   xn H @ ⋮ A ¼ x ðH Þij xj ,
i,j i
xn

where jxi is represented as a column vector, as already mentioned in Sect. 13.4. Let
us start with unitary diagonalization of (13.40), where a Gram matrix is a kind of
Hermitian matrix. Following similar procedures, as in the case of (13.36) we obtain a
diagonal matrix and an inner product such that
0 10 1 2
λ1  0 ξ1 
 B   
CB C 
hxjH jxi ¼ ξ1   ξn @ ⋮ ⋱ ⋮ A@ ⋮ A ¼ λ1 ξ1 j2 þ    þ λn ξn  : ð14:97Þ

0    λn ξn 

Notice, however, that the Gram matrix comprising basis vectors (that are linearly
independent) is positive definite. Remember that if a Gram matrix is constructed by
A{A (Hermitian as well) according to whether A is non-singular or singular, A{A is
either positive definite or non-negative. The Hermitian matrix we are dealing with
here, in general, does not necessarily possess the positive definiteness or
non-negative feature. Yet, remember that hx| H| xi and eigenvalues λ1,   , λn are real.
Positive definiteness of matrices is an important concept in relation to the
Hermitian (and real) quadratic forms (see Sect. 13.2). In particular, in the case
where all the matrix elements are real and | xi is defined in a real domain, we are
14.5 Hermitian Quadratic Forms 569

dealing with the quadratic form with respect to a real symmetric matrix. In the case
of the real quadratic forms, we sometimes adopt the following notation [4, 5]:
0 1
x1
Xn B C
A½x  xT Ax ¼ a x x ;x
i,j¼1 ij i j
¼ @ ⋮ A,
xn

where A ¼ (aij) is a real symmetric matrix and xi (1  i  n) are real numbers. The
positive definiteness is invariant under a transformation PTAP, where P is
non-singular. In fact, if A > 0, for x0T ¼ xTP and A0 ¼ PTAP we have
 
xT Ax = xT P PT AP PT x = x0T A0 x0 :

Since P is non-singular, PTx = x0 represents any arbitrary vector. Hence,

A0 ¼ PT AP > 0, ð14:98Þ

where with the notation PTAP > 0; see (13.46). In particular, using a suitable
orthogonal matrix O, we obtain
0 1
λ1  0
B C
OT AO ¼ @ ⋮ ⋱ ⋮ A:
0  λn

From the above argument, OTAO > 0. We have

detOT AO ¼ detOT detAdetO ¼ ð 1ÞdetAð 1Þ ¼ detA:

Therefore, from (14.97), we have λi > 0 (1  i  n). Thus, we get

Y
n
detA ¼ λi > 0:
i¼1

Notice that in the above discussion, PTAP is said to be an equivalent transforma-


tion of A by P and that we distinguish the equivalent transformation from the
similarity transformation P1A P. Nonetheless, if we choose an orthogonal matrix
O for P, the two transformations are the same because OT ¼ O1.
We often encounter real quadratic forms in the field of electromagnetism. Typical
example is a trace of electromagnetic fields observed with an elliptically or circularly
polarized light (see Sect. 7.3). A permittivity tensor of an anisotropic media such as
crystals (either inorganic or organic) is another example. A tangible example for this
appeared in Sect. 9.5.2 with an anisotropic organic crystal.
Regarding the real quadratic forms, we have a following important theorem.
570 14 Hermitian Operators and Unitary Operators

Theorem 14.11 [4, 5] Let A be a n-dimensional real symmetric matrix A ¼ (aij). Let
A(k) be k-dimensional principal submatrices described by
0 1
ai1 i1 ai1 i2    ai1 ik
Ba    ai2 ik C
B i2 i1 ai2 i2 C
AðkÞ ¼ B C ð1  i1 < i2 <    < ik  nÞ,
@ ⋮ ⋮ ⋱ ⋮ A
aik i1 aik i2  aik ik

where the principal submatrices mean a matrix made by striking out rows and
columns on diagonal elements. Alternatively, a principal submatrix of a matrix
A can be defined as a matrix whose diagonal elements are a part of the diagonal
elements of A. As a special case, those include a11,   , or ann as a (1, 1) matrix (i.e.,
merely a number) as well as A itself. Then, we have

A > 0⟺detAðkÞ > 0 ð1  k  nÞ: ð14:99Þ

Proof First, suppose that A > 0. Then, in a quadratic form A[x] equating (n  k)
variables xl ¼ 0 (l 6¼ i1, i2,   , ik), we obtain a quadratic form of
Xk
a x x :
μ,ν¼1 iμ iν iμ iν

Since A > 0, this (partial) quadratic form is positive definite as well; i.e., we have

AðkÞ > 0:

Therefore, we get

detAðkÞ > 0:

This is due to (13.48). Notice that detA(k) is said to be a principal minor. Thus, we
have proven ⟹ of (14.99).
To prove ⟸, in turn, we use mathematical induction. If n ¼ 1, we have a trivial
case; i.e., A is merely a real positive number. Suppose that ⟸ is true of n  1. Then,
we have A(n  1) > 0 by supposition. Thus, it follows that it will suffice to show
A > 0 on condition that A(n  1) > 0 and detA > 0 in addition. Let A be a n-
dimensional real symmetric non-singular matrix such that
14.5 Hermitian Quadratic Forms 571

!
Aðn1Þ a
A¼ ,
aT an

where A(n  1) is a symmetric matrix and non-singular as well. We define P such that

1
!
E Aðn1Þ a
P¼ ,
0 1

where E is a (n  1, n  1) unit matrix. Notice that detP ¼ det E  1 ¼ 1 6¼ 0,


indicating that P is non-singular. We have

E 0
PT ¼ 1 :
aT Aðn1Þ 1

For this expression, consider a non-singular matrix S. Then, we have SS1 ¼ E.


Taking its transposition, we have (S1)TST ¼ E. Therefore, if S is a symmetric matrix
(S1)TS ¼ E; i.e., (S1)T ¼ S1. Hence, an inverse matrix of a symmetric matrix is
symmetric as well. Then, for a symmetric matrix A(n  1) we have

1 T 1 T  T T 1
aT Aðn1Þ ¼ Aðn1Þ a ¼ Aðn1Þ a:

Therefore, A can be expressed as


!
Aðn1Þ 0
A¼P T
1
P
0 an  Aðn1Þ ½a
! ! 1 !
E 0 Aðn1Þ 0 E Aðn1Þ a
¼ 1 1
:
aT Aðn1Þ 1 0 an  Aðn1Þ ½a 0 1
ð14:100Þ

Now, taking a determinant of (14.100), we have


h in 1
o
det A ¼ det PT det Aðn1Þ an  Aðn1Þ ½a det P
h in 1
o
¼ det Aðn1Þ an  Aðn1Þ ½a :

By supposition, we have det A(n  1) > 0 and det A > 0. Hence, we have
572 14 Hermitian Operators and Unitary Operators

1
an  Aðn1Þ ½a > 0:
!
ðn1Þ 1 xðn1Þ
Putting aen  an  A ½a and x  , we get
xn
!
Aðn1Þ 0 h i
½x ¼ Aðn1Þ xðn1Þ þ aen xn 2 :
0 aen

Since A(n  1) > 0 and aen > 0, we have


!
e Aðn1Þ 0
A > 0:
0 aen

Meanwhile, A is expressed as

e
A ¼ PT AP:

From (14.98), A > 0. These complete the proof. ∎


We also have a related theorem (the following Theorem 14.12) for an Hermitian
quadratic form.
g
Theorem 14.12 Let A ¼ (aij) be a n-dimensional Hermitian matrix. Let A ðk Þ
be k-
dimensional principal submatrices. Then, we have

g
A > 0⟺detAðk Þ
> 0 ð1  k  nÞ,

g
where Aðk Þ
is described as
0 1
a11    a1k
gðk Þ B C
A ¼@⋮ ⋱ ⋮ A:
ak1  akk

The proof is left for readers.


Example 14.3 Let us consider a following Hermitian (real symmetric) matrix and
corresponding Hermitian quadratic form.

5 2 5 2 x1
H¼ , hxjH jxi ¼ ðx1 x2 Þ : ð14:101Þ
2 1 2 1 x2
14.5 Hermitian Quadratic Forms 573

Principal minors of H are 5 (>0) and 1 (>0) and detH ¼ 5  4 ¼ 1 (>0).


Therefore, from Theorem 14.11 we have H > 0. A characteristic equation gives
following eigenvalues; i.e.,
pffiffiffi pffiffiffi
λ 1 ¼ 3 þ 2 2, λ 2 ¼ 3  2 2:

Both the eigenvalues are positive as anticipated. As a diagonalizing matrix R, we


get
pffiffiffi pffiffiffi !
1þ 2 1 2
R¼ :
1 1

To obtain a unitary matrix, we have to seek norms of column vectors.


pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
pffiffiffi
Corresponding to λ1 and λ2, we estimate their norms to be 4 þ 2 2 and
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
pffiffiffi
4  2 2, respectively. Using them, as a unitary matrix U we get
0 1
1 1
B pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
pffiffiffi  pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
pffiffiffi C
B 42 2 4 þ 2 2C
U¼B C: ð14:102Þ
@ 1 1 A
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
pffiffiffi pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
pffiffiffi
4þ2 2 42 2

Thus, performing the matrix diagonalization, we obtain a diagonal matrix D such


that
pffiffiffi !
{ 3þ2 2 0
D ¼ U HU ¼ pffiffiffi : ð14:103Þ
0 32 2

Let us view the above unitary diagonalization in terms of coordinate transforma-


tion. Using the above matrix U and changing (14.101) as in (13.36),

5 2 x1
hxjH jxi ¼ ðx1 x2 ÞUU { UU {
21 x2
pffiffiffi !
3þ2 2 0 x1
¼ ðx1 x2 ÞU pffiffiffi U { : ð14:104Þ
0 32 2 x2

Making an argument analogous to that of Sect. 13.2 and using similar notation,
we have
574 14 Hermitian Operators and Unitary Operators

f
x1 x1
hexjH 0 jexi ¼ ð f x2 ÞH 0
x1 f ¼ ðx1 x2 ÞUU { HUU { ¼ hxjH jxi: ð14:105Þ
f
x2 x2

That is, we have

f
x1 x1
¼ U{ : ð14:106Þ
f
x2 x2

Or, taking an adjoint of (14.106), we have equivalently ð fx1 fx2 Þ ¼ ðx1 x2 ÞU .


f
x1 x1
Notice here that and are real and that U is real (i.e., orthogonal
f
x2 x2
matrix). Likewise,

H 0 ¼ U { HU: ð14:107Þ

x1 f
x1
Thus, it follows that and are different column vector representa-
x2 f
x2
tions of the identical vector that is viewed in reference to two different sets of
orthonormal bases (i.e., different coordinate systems).
From (14.102), as an approximation we have
!
0:92 0:38 cos 22:5  sin 22:5
Uffi ffi : ð14:108Þ
0:38 0:92 sin 22:5 cos 22:5

Equating hx|H|xi to a constant, we get a hypersurface in a plane. Choosing 1 for a


constant, we get an equation of hypersurface (i.e., an ellipse) as a function of f
x1 and
fx2 such that

fx1 2 fx2 2
qffiffiffiffiffiffiffiffiffiffiffi 2
þ qffiffiffiffiffiffiffiffiffiffiffi 2
¼ 1: ð14:109Þ
1pffiffi 1pffiffi
3þ2 2 32 2

Figure 14.1 depicts the ellipse.

14.6 Simultaneous Eigenstates and Diagonalization

In quantum physics, a concept of simultaneous eigenstate is important and has been


briefly mentioned in Sect. 3.3. To rephrase this concept, suppose that there are two
operators B and C and ask whether the two operators (or more) possess a common set
14.6 Simultaneous Eigenstates and Diagonalization 575

Fig. 14.1 Ellipse obtained


through diagonalization of a
Hermitian quadratic form.
The angle θ is about 22.5

of eigenvectors. The question is boiled down to whether the two operators commute.
To address this question, the following theorem is important.
Theorem 14.13 [2] Two Hermitian matrices B and C commute if and only if there
exists a complete orthonormal set of common eigenvectors.
Proof Suppose that there exists a complete orthonormal set of common eigenvec-
tors {jxi; mii} that span the linear vector space, where i and mi are a positive integer
and that jxii corresponds to an eigenvalue bi of B and ci of C. Note that if mi is 1, we
say that the spectrum is nondegenerate and that if mi is equal to two or more, the
spectrum is said to be degenerate. Then we have

Bðjxi iÞ ¼ bi j xi i, C ðjxi iÞ ¼ ci j xi i: ð14:110Þ

Therefore, BC(| xii) ¼ B(ci| xii) ¼ ciB(| xii) ¼ cibi j xii. Similarly, CB(|
xii) ¼ cibi j xii. Consequently, (BC  CB)(| xii) ¼ 0 for any jxii. As all the set of
jxii span the vector space, BC  CB ¼ 0, namely BC ¼ CB.
In turn, assume that BC ¼ CB and that B(| xii) ¼ bi j xii, where jxii are
orthonormal. Then, we have

CBðjxi iÞ ¼ bi C ðjxi iÞ⟹B½Cðjxi iÞ ¼ bi C ðjxi iÞ: ð14:111Þ

This implies that C(| xii) is an eigenvector of B corresponding to the eigenvalue bi.
We have two cases.
(i) The spectrum is nondegenerate: The spectrum is said to be nondegenerate if
only one eigenvector belongs to an eigenvalue. In other words, multiplicity of bi
is one. Then, C(| xii) must be equal to some constant times jxii; i.e., C(|
xii) ¼ ci j xii. That is, jxii is an eigenvector of C corresponding to an eigenvalue
ci. That is, jxii is a common eigenvector to B and C. This completes the proof.
576 14 Hermitian Operators and Unitary Operators

(ii) The spectrum is degenerate: The spectrum is said to be degenerate if two or


more eigenvectors belong to an eigenvalue. Multiplicity
 of bi is two or more;
 ð1Þ  ð2Þ ðmÞ
here suppose that the multiplicity is m (m 2). Let xi i, xi i,   , and j xi i
be linearly independent vectors and belong to the eigenvalue bi of B. Then, from
the assumption we have m eigenvectors

ð1Þ ð2Þ ðmÞ


C jxi i , C jxi i ,   , C jxi i

that belong to an eigenvalue bi of B. This means that individual


ð μÞ ð1Þ
C jxi i ð1  μ  mÞ are described by linear combination of j xi i,
ð2Þ ðmÞ
j xi i,   , and j xi i. What we want to prove is to show that suitable linear
combination of these m vectors constitutes an eigenvector corresponding to an
eigenvalue cμ of C. Here, to avoid complexity we denote the multiplicity by
m instead of the abovementioned mi.
ð μÞ
The vectors C jxi i ð1  μ  mÞ can be described as

ð1Þ
Xm ð jÞ ðmÞ
Xm ð jÞ
C jxi i ¼ j¼1
α j1 j xi i,   , C jxi i ¼ α
j¼1 jm
j xi i: ð14:112Þ

Using full matrix representation, we have


0 1
  γ1
Xm ðk Þ  ð1Þ  ðmÞ B C
C γ jx i ¼ xi i  xi i C @ ⋮ A
k¼1 k i
γm
0 10 1
  α11    α1m γ1
 ð1Þ  ðmÞ B CB C
¼ xi i  xi i @ ⋮ ⋱ ⋮ A@ ⋮ A: ð14:113Þ
αm1    αmm γm

In (14.113), we adopt the notation of (11.37). Since (αij) is a matrix representation


of an Hermitian operator C, (αij) is Hermitian as well. More specifically, if we take an
ðlÞ
inner product of a vector expressed in (14.112) with j xi i, then we have
D  E D Xm E Xm D  E Xm
ðlÞ  ðk Þ ðlÞ  ð jÞ ðlÞ  ð jÞ
xi Cxi ¼ xi  j¼1
αjk x i ¼ j¼1
αjk xi  xi ¼ α δ
j¼1 jk lj

¼ αlk ð1  k,l  mÞ, ð14:114Þ

where the third equality comes from the orthonormality of the basis vectors.
Meanwhile,
14.6 Simultaneous Eigenstates and Diagonalization 577

D E D E D E
ðlÞ ðk Þ ðk Þ ðlÞ ðk Þ ðlÞ
xi jCxi ¼ xi jC { xi ¼ xi jCxi ¼ αkl  , ð14:115Þ

where the second equality comes from the Hermiticity of C. From (14.114) and
(14.115), we get

αlk ¼ αkl  ð1  k, l  mÞ: ð14:116Þ

This indicates that (αij) is in fact Hermitian.


We are seeking the condition under which linear combinations of the eigenvec-
ðk Þ
tors j xi i ð1  k  mÞ for B are simultaneously eigenvectors of C. If the linear
P ðk Þ
combination m k¼1 γ k j xi i is to be an eigenvector of C, we must have

Xm ðk Þ
Xm ð jÞ
C γ jx i ¼ c
k¼1 k i
γ jx i
j¼1 j i
: ð14:117Þ

Considering (14.113), we have


0 10 1 0 1
α11  α1m γ1 γ1
   
 ð1Þ  ðmÞ B
B
CB C  ð 1Þ  ðmÞ B C
xi i  xi i @ ⋮ ⋱ ⋮ A@ ⋮ A ¼ xi i  xi i c@ ⋮ C
C B C B
A:
αm1    αmm γm γm
ð14:118Þ

ð jÞ
The vectors j xi i ð1  k  mÞ span an invariant subspace (i.e., an eigenspace
corresponding to an eigenvalue of bi). Let us call this subspace Wm. Consequently, in
ð jÞ
(14.118) we can equate the scalar coefficients of individual j xi i ð1  k  mÞ .
Then, we get
0 10 1 0 1
α11  α1m γ1 γ1
B CB C B C
@⋮ ⋱ ⋮ A@ ⋮ A ¼ c@ ⋮ A: ð14:119Þ
αm1    αmm γm γm

This is nothing other than an eigenvalue equation. Since (αij) is an Hermitian


matrix, there should be m eigenvalues cμ some of which may be identical (the
degenerate case). Moreover, we can always decide m orthonormal column vectors
by solving (14.119). We denote them by γ (μ) (1  μ  m) that belong to cμ. Rewriting
(14.119), we get
578 14 Hermitian Operators and Unitary Operators

0 10 ðμÞ
1 0 ðμÞ 1
α11  α1m γ1 γ1
B CB C B C
@⋮ ⋱ ⋮ A@ ⋮ A ¼ cμ @ ⋮ A: ð14:120Þ
αm1    αmm γ ðmμÞ γ ðmμÞ

Equation (14.120) implies that we can construct a (m, m) unitary matrix from the
m orthonormal column vectors γ (μ). Using the said unitary matrix, we will be able to
diagonalize (αij) according to Theorem 14.5.
Having determined m eigenvectors γ (μ) (1  μ  m), we can construct a set of
eigenvectors such that
 Xm ðμÞ  ðkÞ
 ðμÞ
 yi i ¼ γ xi i:
k¼1 k
ð14:121Þ

ðμÞ
Finally let us confirm that j yi i ð1  μ  mÞ in fact constitute an orthonormal
basis. To show this, we have
D  E DXm Xm E
ðνÞ  ðμÞ ðνÞ ðk Þ  ðμÞ ðlÞ
yi  yi ¼ γ
k¼1 k
x i  γ
l¼1 l
x i
Xm Xm h ðνÞ i ðμÞ D ðkÞ ðlÞ E
¼ k¼1 l¼1 k
γ γ l xi jxi
Xm Xm h i ð14:122Þ
ðνÞ  ðμÞ
¼ k¼1 l¼1 k
γ γ l δkl
Xm h ðνÞ i ðμÞ
¼ k¼1 k
γ γ k ¼ δνμ

The last equality comes from the fact that a matrix comprising m orthonormal
ðμÞ
column vectors γ (μ) (1  μ  m) forms a unitary matrix. Thus, j yi i ð1  μ  mÞ
certainly constitute an orthonormal basis.
The above completes the proof. ∎
Theorem 14.13 can be restated as follows: Two Hermitian matrices B and C can
be simultaneously diagonalized by a unitary similarity transformation. As mentioned
above, we can construct a unitary matrix U such that
0 ð1Þ ðmÞ
1
γ1  γ1
B C
U¼@⋮ ⋱ ⋮ A:
γ ðm1Þ  γ ðmmÞ

Then, using U, matrices B and C are diagonalized such that


14.6 Simultaneous Eigenstates and Diagonalization 579

0 1 0 1
bi c1
B bi C B c2 C
B C B C
U { BU ¼ B
B
C,
C U { CU ¼ B
B
C: ð14:123Þ
C
@ ⋱ A @ ⋱ A
bi cm

Note that both B and C are represented in an invariant subspace Wm.


As we have already seen in Part I that dealt with an eigenvalue problem of a
hydrogen-like atom, squared angular momentum (L2) and z-component of angular
momentum (Lz) possess a mutual set of eigenvectors and, hence, their eigenvalues
are determined at once. Related matrix representations to (14.123) were given in
(3.159). On the other hand, this was not the case with a set of operators Lx, and Ly,
and Lz; see (3.30). Yet, we pointed out the exceptional case where these three
operators along with L2 take an eigenvalue zero in common which an eigenstate
pffiffiffiffiffiffiffiffiffiffi
Y 0 ðθ, ϕÞ  1=4π corresponds to. Nonetheless, no complete orthonormal set of
0

common eigenvectors exists with the set of operators Lx, and Ly, and Lz. This fact is
equivalent to that these three operators are noncommutative among them. In con-
trast, L2 and Lz share a complete orthonormal set of common eigenvectors and,
hence, are commutable.
ð1Þ ð2Þ ðmÞ
Notice that C jxi i , C jxi i ,   , and Cjxi i are not necessarily linearly
independent (see Sect. 9.4). Suppose that among m eigenvalues cμ (1  μ  m),
some cμ ¼ 0. Then, detC ¼ 0 according to (13.48). This means that C is singular. In
ð1Þ ð2Þ ðmÞ
that case, C jxi i , C jxi i ,   , and Cjxi i are linearly dependent. In Sect.
 
3.3, in fact, we had j Lz Y 00 ðθ, ϕÞ ¼j L2 Y 00 ðθ, ϕÞ ¼ 0 . But, this special situation
does not affect the proof of Theorem 14.13.
We know that any matrix A can be decomposed such that

1  h1  i
A¼A þ A{ þ i A  A{ , ð14:124Þ
2 2i
   
where we put B  12 A þ A{ and C  2i1 A  A{ ; both B and C are Hermitian.
That is any matrix A can be decomposed to two Hermitian matrices in such a way
that

A ¼ B þ iC: ð14:125Þ

Note here that B and C commute if and only if A and A{ commute, that is A is a
normal matrix. In fact, from (14.125) we get

AA{  A{ A ¼ 2iðCB  BC Þ: ð14:126Þ

From (14.126), if and only if B and C commute (i.e., B and C can be diagonalized
simultaneously), AA{  A{A ¼ 0, i.e., AA{ ¼ A{A. This indicates that A is a normal
matrix. Thus, the following theorem will follow.
580 14 Hermitian Operators and Unitary Operators

Theorem 14.14 A matrix can be diagonalized by a unitary similarity transforma-


tion, if and only if it is a normal matrix.
Thus, Theorem 14.13 is naturally generalized so that it can be stated as follows:
Two normal matrices B and C commute if and only if there exists a complete
orthonormal set of common eigenvectors.
In Sects. 14.1 and 14.3, we mentioned the spectral decomposition. There, we
showed a special case where projection operators commute with one another; see
(14.23). Thus, in light of Theorem 14.13, those projection operators can be diago-
nalized at once to be expressed as, e.g., (14.72). This is a conspicuous feature of the
projection operators.

References

1. Hassani S (2006) Mathematical physics. Springer, New York


2. Byron FW Jr, Fuller RW (1992) Mathematics of classical and quantum physics. Dover,
New York
3. Mirsky L (1990) An introduction to linear algebra. Dover, New York
4. Satake I-O (1975) Linear algebra (Pure and applied mathematics). Marcel Dekker, New York
5. Satake I (1974) Linear algebra. Shokabo, Tokyo. (in Japanese)
Chapter 15
Exponential Functions of Matrices

In Chap. 12, we dealt with a function of matrices. In this chapter we study several
important definitions and characteristics of functions of matrices. If elements of
matrices consist of analytic functions of a real variable, such matrices are of
particular importance. For instance, differentiation can naturally be defined with
the functions of matrices. Of these, exponential functions of matrices are widely used
in various fields of mathematical physics. These functions frequently appear in a
system of differential equations. In Chap. 10, we showed that SOLDEs with suitable
BCs can be solved using Green’s functions. In the present chapter, in parallel, we
show a solving method based on resolvent matrices. The exponential functions of
matrices have broad applications in group theory that we will study in Part IV. In
preparation for it, we study how the collection of matrices forms a linear vector
space. In accordance with Chap. 13, we introduce basic notions of inner product and
norm to the matrices.

15.1 Functions of Matrices

We consider a matrix in which individual components are differentiable with respect


to a real variable such that
 
Aðt Þ ¼ aij ðt Þ : ð15:1Þ

We define the differentiation as

dAðt Þ  0 
A0 ðt Þ ¼  aij ðt Þ : ð15:2Þ
dt

We have

© Springer Nature Singapore Pte Ltd. 2020 581


S. Hotta, Mathematical Physical Chemistry,
https://doi.org/10.1007/978-981-15-2225-3_15
582 15 Exponential Functions of Matrices

½Aðt Þ þ Bðt Þ0 ¼ A0 ðt Þ þ B0 ðt Þ, ð15:3Þ


½Aðt ÞBðt Þ0 ¼ A0 ðt ÞBðt Þ þ Aðt ÞB0 ðt Þ: ð15:4Þ

To get (15.4), putting A(t)B(t)  C(t) we have


X
cij ðt Þ ¼ a ðt Þbkj ðt Þ,
k ik
ð15:5Þ

where C(t) ¼ (cij(t)), B(t) ¼ (bij(t)). Differentiating (15.5), we get


X X
c0ij ðt Þ ¼ a0 ðt Þbkj ðt Þ þ
k ik
a ðt Þb0kj ðt Þ,
k ik
ð15:6Þ

namely, we have C0(t) ¼ A0(t)B(t) + A(t)B0(t), i.e., (15.4) holds.


Analytic functions are usually expanded as an infinite power series. As an
analogue, a function of a matrix is expected to be expanded as such. Among various
functions of matrices, exponential functions are particularly important and have a
wide field of applications. As in the case of a real or complex number x, the
exponential function of matrix A can be defined as

1 2 1
exp A  E þ A þ A þ    þ Aν þ   , ð15:7Þ
2! ν!

where A is a (n, n) square matrix and E is a (n, n) identity matrix. Note that E is used
instead of a number 1 that appears in the power series expansion of exp x described
as

1 2 1
exp x  1 þ x þ x þ    þ xν þ   :
2! ν!

Here we define the convergence of a series of matrices as in the following definition:

Definition 15.1 Let $A_0, A_1, \ldots, A_\nu, \ldots$ $[A_\nu \equiv (a_{ij,\nu})\ (1 \le i, j \le n)]$ be a sequence of matrices. Let $a_{ij,0}, a_{ij,1}, \ldots, a_{ij,\nu}, \ldots$ be the corresponding sequence of each matrix component. Let us define a series of matrices as $\tilde{A} \equiv (\tilde{a}_{ij}) = A_0 + A_1 + \cdots + A_\nu + \cdots$ and a series of individual matrix components as $\tilde{a}_{ij} \equiv a_{ij,0} + a_{ij,1} + \cdots + a_{ij,\nu} + \cdots$. Then, if each $\tilde{a}_{ij}$ converges, we say that $\tilde{A}$ converges.

Regarding the matrix notation in Definition 15.1, readers are referred to (11.38).
Now, let us show that the power series of (15.7) converges [1, 2]. Let A be an (n, n) square matrix. Putting $A = (a_{ij})$ and $\max_{i,j}|a_{ij}| = M$, and defining $A^\nu \equiv \left(a_{ij}^{(\nu)}\right)$, where $a_{ij}^{(\nu)}$ is the (i, j) component of $A^\nu$, we get

$$\left|a_{ij}^{(\nu)}\right| \le n^\nu M^\nu \quad (1 \le i, j \le n). \tag{15.8}$$

Note that $A^0 = E$; hence, $a_{ij}^{(0)} = \delta_{ij}$. Then, if ν = 0, (15.8) routinely holds. When ν = 1, (15.8) obviously holds as well from the definition of M, assuming n ≥ 2; we are considering (n, n) matrices with n ≥ 2. We wish to show that (15.8) holds for any ν (≥2) using mathematical induction. Suppose that (15.8) holds with ν − 1. That is,

$$\left|a_{ij}^{(\nu-1)}\right| \le n^{\nu-1} M^{\nu-1} \quad (1 \le i, j \le n).$$

Then, we have

$$\left(A^\nu\right)_{ij} \equiv a_{ij}^{(\nu)} = \left(A^{\nu-1}A\right)_{ij} = \sum_{k=1}^n \left(A^{\nu-1}\right)_{ik}(A)_{kj} = \sum_{k=1}^n a_{ik}^{(\nu-1)}\, a_{kj}. \tag{15.9}$$

Thus, we get

$$\left|a_{ij}^{(\nu)}\right| = \left|\sum_{k=1}^n a_{ik}^{(\nu-1)}\, a_{kj}\right| \le \sum_{k=1}^n \left|a_{ik}^{(\nu-1)}\right|\left|a_{kj}\right| \le n \cdot n^{\nu-1} M^{\nu-1} \cdot M = n^\nu M^\nu. \tag{15.10}$$

Consequently, (15.8) holds with ν.


Meanwhile, we have a next relation such that
X1 1
enM ¼ ν¼0 ν!
nν M ν : ð15:11Þ

The series (15.11) is certainly converges; note that nM is not a matrix but a
number. From (15.10) we obtain
X1  1 ðνÞ  X1 1
 a 
ν¼0 ν! ij ν¼0 ν!
nν M ν ¼ enM :

P 1 ðνÞ
This implies that 1 ν¼0 ν! aij converges (i.e., absolute convergence [3]). Thus,
from Definition 15.1, (15.7) converges.
To further examine the exponential functions of matrices, in the following proposition we show that a collection of matrices forms a linear vector space.

Proposition 15.1 Let $\mathcal{M}$ be the collection of all (n, n) square matrices. Then, $\mathcal{M}$ forms a linear vector space.

Proof Let $A = (a_{ij}) \in \mathcal{M}$ and $B = (b_{ij}) \in \mathcal{M}$ be (n, n) square matrices. Then, $\alpha A = (\alpha a_{ij})$ and $A + B = (a_{ij} + b_{ij})$ are again (n, n) square matrices, and so we have $\alpha A \in \mathcal{M}$ and $A + B \in \mathcal{M}$. Thus, $\mathcal{M}$ forms a linear vector space. ∎
In Proposition 15.1, $\mathcal{M}$ forms an $n^2$-th order vector space $V^{n^2}$. To define an inner product in this space, we introduce $\left|P_{(l)}^{(k)}\right\rangle$ (k, l = 1, 2, ..., n) as a basis set of $n^2$ vectors. Now, let $A = (a_{ij})$ be an (n, n) square matrix. To explicitly show that A is a vector in the inner product space, we express it as

$$|A\rangle = \sum_{k,l=1}^n a_{kl} \left|P_{(l)}^{(k)}\right\rangle.$$

Then, an adjoint matrix of $|A\rangle$ is written as

$$\langle A| \equiv |A\rangle^\dagger = \sum_{k,l=1}^n a_{kl}^{\ast} \left\langle P_{(l)}^{(k)}\right|, \tag{15.12}$$

where $\left\langle P_{(l)}^{(k)}\right|$ is an adjoint matrix of $\left|P_{(l)}^{(k)}\right\rangle$. With the notations in (15.12), see Sects. 1.4 and 13.3; for $P_{(l)}^{(k)}$ see Sect. 14.1. Meanwhile, let $B = (b_{ij})$ be another (n, n) square matrix denoted by

$$|B\rangle = \sum_{s,t=1}^n b_{st} \left|P_{(t)}^{(s)}\right\rangle.$$

Here, let us define $\left\{\left|P_{(l)}^{(k)}\right\rangle\right\}$ as an orthonormal basis set, with the inner product between vectors described in the form of matrices such that

$$\left\langle P_{(l)}^{(k)} \middle| P_{(t)}^{(s)} \right\rangle \equiv \delta_{ks}\delta_{lt}.$$

Then, from (15.12) the inner product between A and B is given by

$$\langle A|B\rangle = \sum_{k,l,s,t=1}^n a_{kl}^{\ast}\, b_{st} \left\langle P_{(l)}^{(k)} \middle| P_{(t)}^{(s)} \right\rangle = \sum_{k,l,s,t=1}^n a_{kl}^{\ast}\, b_{st}\, \delta_{ks}\delta_{lt} = \sum_{k,l=1}^n a_{kl}^{\ast}\, b_{kl}. \tag{15.13}$$

The relation (15.13) leads to the norm already introduced in (13.8). The norm of a matrix is defined as

$$\|A\| \equiv \sqrt{\langle A|A\rangle} = \sqrt{\sum_{i,j=1}^n \left|a_{ij}\right|^2}. \tag{15.14}$$

We have the following theorem regarding two matrices.

Theorem 15.1 Let $A = (a_{ij})$ and $B = (b_{ij})$ be two square matrices. Then, we have the following inequality:

$$\|AB\| \le \|A\| \cdot \|B\|. \tag{15.15}$$

Proof Let C = AB. Then, from the Cauchy–Schwarz inequality (Sect. 13.1) we get

$$\|C\|^2 = \sum_{i,j} \left|c_{ij}\right|^2 = \sum_{i,j}\left|\sum\nolimits_k a_{ik}b_{kj}\right|^2 \le \sum_{i,j}\left(\sum\nolimits_k |a_{ik}|^2\right)\left(\sum\nolimits_l |b_{lj}|^2\right) = \sum_{i,k}|a_{ik}|^2 \sum_{j,l}|b_{lj}|^2 = \|A\|^2 \cdot \|B\|^2. \tag{15.16}$$

This leads to (15.15). ∎
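As a quick numerical illustration of the norm (15.14) and the inequality (15.15), the following sketch (assuming NumPy; the random matrices are arbitrary) may be run; note that (15.14) coincides with the Frobenius norm used by NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

def norm(M):
    """The matrix norm of (15.14), i.e., the Frobenius norm."""
    return np.sqrt(np.sum(np.abs(M) ** 2))

print(norm(A @ B) <= norm(A) * norm(B))        # True, illustrating (15.15)
print(np.isclose(norm(A), np.linalg.norm(A)))  # True: same as NumPy's default
```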

15.2 Exponential Functions of Matrices and Their Manipulations

Using Theorem 15.1, we explore important characteristics of exponential functions of matrices. For this we have the next theorem.

Theorem 15.2 [4] Let A and B be mutually commutative matrices; i.e., AB = BA. Then, we have

$$\exp(A+B) = \exp A \exp B = \exp B \exp A. \tag{15.17}$$

Proof Since A and B are mutually commutative, $(A+B)^n$ can be expanded through the binomial theorem such that

$$\frac{1}{n!}(A+B)^n = \sum_{k=0}^n \frac{A^k}{k!}\,\frac{B^{n-k}}{(n-k)!}. \tag{15.18}$$
n!

Meanwhile, with an arbitrary integer m, we have

X2m 1 X X
n m An m Bn
n¼0 n!
ð A þ B Þ ¼ n¼0 n! n¼0 n!
þ Rm , ð15:19Þ

P 
where Rm has a form of ðk,lÞ A k l
B =k!l!, where the summation is taken over all
combinations of (k, l) that satisfy max(k, l)  m + 1 and k + l  2m. The number of
all the combinations (k, l) is m(m + 1) accordingly; see Fig. 15.1 [4]. We remark that
if in (15.19) we put m ¼ 0, (15.19) trivially holds with Rm ¼ 0 (i.e., zero matrix). The
[Fig. 15.1 Number of possible combinations (k, l), indicated with green points. The number of green points in the upper left triangle is m(m + 1)/2; that in the lower right triangle is m(m + 1)/2 as well, so the total number of green points is m(m + 1). Adapted from Yamanouchi T, Sugiura M (1960) Introduction to continuous groups (New Mathematics Series 18; in Japanese) [4], with the permission of Baifukan Co., Ltd., Tokyo]

The matrix $R_m$, however, changes with increasing m, and so we must evaluate it as m → ∞. Putting max(‖A‖, ‖B‖, 1) = C and using Theorem 15.1, we get

$$\|R_m\| \le \sum_{(k,l)} \frac{\|A\|^k\,\|B\|^l}{k!\,l!}. \tag{15.20}$$

In (15.20), max(k, l) ≥ m + 1 and min(k, l) ≥ 0, and so $k!\,l! \ge (m+1)!$. Since $\|A\|^k\|B\|^l \le C^{k+l} \le C^{2m}$, we have

$$\sum_{(k,l)} \frac{\|A\|^k\,\|B\|^l}{k!\,l!} \le \frac{m(m+1)\,C^{2m}}{(m+1)!} = \frac{C^{2m}}{(m-1)!}, \tag{15.21}$$

where m(m + 1) in the numerator comes from the number of combinations (k, l) as pointed out above. From Stirling's interpolation formula [5], we have

$$\Gamma(z+1) \approx \sqrt{2\pi}\, e^{-z}\, z^{z+\frac{1}{2}}. \tag{15.22}$$

Replacing z with an integer m − 1, we get

$$\Gamma(m) = (m-1)! \approx \sqrt{2\pi}\, e^{-(m-1)}\,(m-1)^{m-\frac{1}{2}}. \tag{15.23}$$

Then, we have

$$[\text{RHS of }(15.21)] = \frac{C^{2m}}{(m-1)!} \approx \frac{C^{2m}\, e^{m-1}}{\sqrt{2\pi}\,(m-1)^{m-\frac{1}{2}}} = \frac{C}{\sqrt{2\pi e}}\left(\frac{eC^2}{m-1}\right)^{m-\frac{1}{2}} \tag{15.24}$$

and

$$\lim_{m\to\infty} \frac{C}{\sqrt{2\pi e}}\left(\frac{eC^2}{m-1}\right)^{m-\frac{1}{2}} = 0. \tag{15.25}$$

Thus, from (15.20) and (15.21) we get

$$\lim_{m\to\infty} \|R_m\| = 0. \tag{15.26}$$

Remember that $\|R_m\| = 0$ if and only if $R_m = 0$ (the zero matrix); see (13.4) and (13.8) in Sect. 13.1. Finally, taking the limit (m → ∞) of both sides of (15.19), we obtain

$$\lim_{m\to\infty}\sum_{n=0}^{2m}\frac{1}{n!}(A+B)^n = \lim_{m\to\infty}\left[\sum_{n=0}^m \frac{A^n}{n!}\,\sum_{n=0}^m \frac{B^n}{n!} + R_m\right].$$

Considering (15.26), we get

$$\lim_{m\to\infty}\sum_{n=0}^{2m}\frac{1}{n!}(A+B)^n = \lim_{m\to\infty}\sum_{n=0}^m \frac{A^n}{n!}\,\sum_{n=0}^m \frac{B^n}{n!}. \tag{15.27}$$

This implies that (15.17) holds. It is obvious that if A and B commute, then exp A and exp B commute; that is, exp A exp B = exp B exp A. These complete the proof. ∎
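Theorem 15.2 is easy to probe numerically. In the sketch below (assuming NumPy and SciPy; the matrices are arbitrary illustrations), B is chosen as a polynomial in A so that AB = BA, whereas C does not commute with A; the factorization (15.17) then holds in the first case but fails in the second.

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[1.0, 2.0], [2.0, 1.0]])
B = 3.0 * A - np.eye(2)          # a polynomial in A, hence AB = BA
print(np.allclose(expm(A + B), expm(A) @ expm(B)))   # True (commuting case)

C = np.array([[0.0, 1.0], [0.0, 0.0]])               # AC != CA
print(np.allclose(expm(A + C), expm(A) @ expm(C)))   # False in general
```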
From Theorem 15.2, we immediately get the following property:

$$\text{(1)} \quad (\exp A)^{-1} = \exp(-A). \tag{15.28}$$

To show this, it suffices to note that A and −A commute; that is, $A(-A) = -A^2 = (-A)A$. Replacing B with −A in (15.17), we have

$$\exp(A - A) = \exp 0 = E = \exp A \exp(-A) = \exp(-A)\exp A. \tag{15.29}$$

This leads to (15.28). Notice here that exp 0 = E from (15.7). We further have important properties of exponential functions of matrices:

$$\text{(2)} \quad P^{-1}(\exp A)P = \exp\left(P^{-1}AP\right). \tag{15.30}$$
$$\text{(3)} \quad \exp\left(A^T\right) = (\exp A)^T. \tag{15.31}$$

$$\text{(4)} \quad \exp\left(A^\dagger\right) = (\exp A)^\dagger. \tag{15.32}$$

To show (2) we have

$$\left(P^{-1}AP\right)^n = \left(P^{-1}AP\right)\left(P^{-1}AP\right)\cdots\left(P^{-1}AP\right) = P^{-1}A\left(PP^{-1}\right)A\left(PP^{-1}\right)\cdots\left(PP^{-1}\right)AP = P^{-1}AEAE\cdots EAP = P^{-1}A^n P. \tag{15.33}$$

Applying (15.33) to (15.7), we have

$$\exp\left(P^{-1}AP\right) = E + P^{-1}AP + \frac{\left(P^{-1}AP\right)^2}{2!} + \cdots + \frac{\left(P^{-1}AP\right)^n}{n!} + \cdots = P^{-1}EP + P^{-1}AP + P^{-1}\frac{A^2}{2!}P + \cdots + P^{-1}\frac{A^n}{n!}P + \cdots = P^{-1}\left(E + A + \frac{A^2}{2!} + \cdots + \frac{A^n}{n!} + \cdots\right)P = P^{-1}(\exp A)P. \tag{15.34}$$

With (3) and (4), use the following equations:

$$\left(A^T\right)^n = \left(A^n\right)^T \quad\text{and}\quad \left(A^\dagger\right)^n = \left(A^n\right)^\dagger. \tag{15.35}$$

The confirmation of (15.31) and (15.32) is left for readers. For future purposes, we describe another important theorem and further properties of the exponential functions of matrices.
Theorem 15.3 Let $a_1, a_2, \ldots, a_n$ be the eigenvalues of an (n, n) square matrix A. Then, $\exp a_1, \exp a_2, \ldots, \exp a_n$ are the eigenvalues of exp A.

Proof From Theorem 12.1 we know that every (n, n) square matrix can be converted to a triangle matrix by a similarity transformation. Also, we know that the eigenvalues of a triangle matrix are given by its diagonal elements (Sect. 12.1). Let P be a non-singular matrix used for such a similarity transformation. Then, we have

$$P^{-1}AP = \begin{pmatrix} a_1 & & \ast \\ & \ddots & \\ 0 & & a_n \end{pmatrix}. \tag{15.36}$$

Meanwhile, let T be a triangle matrix described by

$$T = \begin{pmatrix} t_1 & & \ast \\ & \ddots & \\ 0 & & t_n \end{pmatrix}. \tag{15.37}$$

Then, from a property of the triangle matrix we have

$$T^m = \begin{pmatrix} t_1^{\,m} & & \ast \\ & \ddots & \\ 0 & & t_n^{\,m} \end{pmatrix}. \tag{15.38}$$

Applying this property to (15.36), we get

$$\left(P^{-1}AP\right)^m = P^{-1}A^m P = \begin{pmatrix} a_1^{\,m} & & \ast \\ & \ddots & \\ 0 & & a_n^{\,m} \end{pmatrix}, \tag{15.39}$$

where the last equality is due to (15.33). Summing (15.39) over m following (15.34), we obtain another triangle matrix expressed as

$$\exp\left(P^{-1}AP\right) = P^{-1}(\exp A)P = \begin{pmatrix} \exp a_1 & & \ast \\ & \ddots & \\ 0 & & \exp a_n \end{pmatrix}. \tag{15.40}$$

Once again considering that the eigenvalues of a triangle matrix are given by its diagonal elements and that eigenvalues are invariant under a similarity transformation, (15.40) implies that $\exp a_1, \exp a_2, \ldots, \exp a_n$ are the eigenvalues of exp A. These complete the proof. ∎
The question of whether the eigenvalues are proper eigenvalues or generalized eigenvalues is irrelevant to Theorem 15.3. In other words, each eigenvalue may be either a proper one or a generalized one (Sect. 12.6); that depends solely upon the nature of the matrix A in question.
Theorem 15.3 immediately leads to the next important relation. Taking the determinant of (15.40), we get

$$\det\left[P^{-1}(\exp A)P\right] = \det P^{-1}\,\det(\exp A)\,\det P = \det(\exp A) = (\exp a_1)(\exp a_2)\cdots(\exp a_n) = \exp(a_1 + a_2 + \cdots + a_n) = \exp(\mathrm{Tr}\,A). \tag{15.41}$$

To derive (15.41), refer to (11.58) and (12.10) as well. We list (15.41) as an important property of the exponential functions of matrices:

$$\text{(5)} \quad \det(\exp A) = \exp(\mathrm{Tr}\,A). \tag{15.41}$$
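Both Theorem 15.3 and Property (5) lend themselves to a quick numerical check. The sketch below (assuming NumPy and SciPy; the random matrix is an arbitrary example) compares the eigenvalues of exp A with the exponentials of the eigenvalues of A, and the determinant of exp A with exp(Tr A).

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))        # generic, possibly non-symmetric

eig_A = np.linalg.eigvals(A)
eig_expA = np.linalg.eigvals(expm(A))
# Theorem 15.3: exp(a_i) are the eigenvalues of exp A (compare as sorted sets)
print(np.allclose(np.sort_complex(np.exp(eig_A)), np.sort_complex(eig_expA)))
# Property (5): det(exp A) = exp(Tr A)
print(np.isclose(np.linalg.det(expm(A)), np.exp(np.trace(A))))
```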

We have examined general aspects of the exponential functions of matrices until now. Nonetheless, we have not yet studied their relationship with specific types of matrices. Regarding the applications of exponential functions of matrices to mathematical physics, especially with the transformations of functions and vectors, we frequently encounter orthogonal matrices and unitary matrices. For this purpose, we wish to examine the following properties.
(6) Let A be a real skew-symmetric matrix. Then, exp A is a real orthogonal matrix.
(7) Let A be an anti-Hermitian matrix. Then, exp A is a unitary matrix.
To show (6), we note that a skew-symmetric matrix is described as

$$A^T = -A \quad\text{or}\quad A^T + A = 0. \tag{15.42}$$

Then, we have $A^T A = -A^2 = A A^T$ and, hence, A and $A^T$ commute. Therefore, from Theorem 15.2, we have

$$\exp\left(A + A^T\right) = \exp 0 = E = \exp A \exp A^T = \exp A\,(\exp A)^T, \tag{15.43}$$

where the last equality comes from Property (3) of (15.31). Equation (15.43) implies that exp A is a real orthogonal matrix.
Since an anti-Hermitian matrix A is expressed as

$$A^\dagger = -A \quad\text{or}\quad A^\dagger + A = 0, \tag{15.44}$$

we have $A^\dagger A = -A^2 = A A^\dagger$, and so A and $A^\dagger$ commute. Using (15.32), we have

$$\exp\left(A + A^\dagger\right) = \exp 0 = E = \exp A \exp A^\dagger = \exp A\,(\exp A)^\dagger, \tag{15.45}$$

where the last equality comes from Property (4) of (15.32). This implies that exp A is a unitary matrix.
Regarding Properties (6) and (7), if A is replaced with tA (t : real), the relations (15.42) through (15.45) hold unchanged. Then, Properties (6) and (7) are rewritten as:
(6)′ Let A be a real skew-symmetric matrix. Then, exp tA (t : real) is a real orthogonal matrix.
(7)′ Let A be an anti-Hermitian matrix. Then, exp tA (t : real) is a unitary matrix.
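Properties (6)′ and (7)′ can be illustrated in a few lines of code (the sketch assumes NumPy and SciPy; the generating matrices and the value of t are arbitrary): exponentiating a real skew-symmetric matrix yields an orthogonal matrix, and exponentiating an anti-Hermitian matrix yields a unitary one.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)
X = rng.standard_normal((3, 3))
A = X - X.T                                # real skew-symmetric: A^T = -A
R = expm(2.5 * A)                          # exp tA with t = 2.5
print(np.allclose(R @ R.T, np.eye(3)))     # True: R is real orthogonal

Y = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
H = Y - Y.conj().T                         # anti-Hermitian: H^dagger = -H
U = expm(2.5 * H)
print(np.allclose(U @ U.conj().T, np.eye(3)))  # True: U is unitary
```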
Now, let a function F(t) be expressed as F(t) = exp tx with real numbers t and x. Then, we have the well-known formula

$$\frac{d}{dt}F(t) = xF(t). \tag{15.46}$$

Next, we extend (15.46) to the exponential functions of matrices. Let us define the differentiation of an exponential function of a matrix with respect to a real parameter t. Then, in concert with (15.46), we have the following important theorem.

Theorem 15.4 [1, 2] Let F(t) ≡ exp tA, with t being a real number and A being a matrix that does not depend on t. Then, we have

$$\frac{d}{dt}F(t) = AF(t). \tag{15.47}$$

In (15.47) we assume that the individual matrix components of F(t) are differentiable with respect to t and that the differentiation of a matrix is defined as in (15.2).
Proof We have

$$F(t+\Delta t) = \exp[(t+\Delta t)A] = \exp(tA + \Delta t\,A) = (\exp tA)(\exp \Delta t\,A) = F(t)F(\Delta t) = (\exp \Delta t\,A)(\exp tA) = F(\Delta t)F(t), \tag{15.48}$$

where we considered that tA and ΔtA are commutative and used Theorem 15.2. Therefore, we get

$$\frac{1}{\Delta t}[F(t+\Delta t) - F(t)] = \frac{1}{\Delta t}[F(\Delta t) - E]\,F(t) = \frac{1}{\Delta t}\left[\sum_{\nu=0}^{\infty}\frac{1}{\nu!}(\Delta t\,A)^{\nu} - E\right]F(t) = \left[\sum_{\nu=1}^{\infty}\frac{1}{\nu!}(\Delta t)^{\nu-1}A^{\nu}\right]F(t). \tag{15.49}$$

Taking the limit of both sides of (15.49), we get

$$\lim_{\Delta t\to 0}\frac{1}{\Delta t}[F(t+\Delta t)-F(t)] \equiv \frac{d}{dt}F(t) = AF(t).$$

Notice that in (15.49) only the first term (i.e., ν = 1) of the RHS is nonvanishing as Δt → 0. Then, (15.47) follows. This completes the proof. ∎

On the basis of the power series expansion of the exponential function of a matrix defined in (15.7), A and F(t) are commutative. Therefore, instead of (15.47) we may write

$$\frac{d}{dt}F(t) = F(t)A. \tag{15.50}$$
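A finite-difference check of (15.47) and (15.50) is straightforward (the sketch assumes NumPy and SciPy; the matrix A, the value of t, and the step Δt are arbitrary illustrative choices).

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [-1.0, 0.0]])
t, dt = 0.7, 1e-6
F = lambda s: expm(s * A)

dF_numeric = (F(t + dt) - F(t - dt)) / (2 * dt)       # central difference
print(np.allclose(dF_numeric, A @ F(t), atol=1e-8))   # True: (15.47)
print(np.allclose(A @ F(t), F(t) @ A))                # True: (15.50)
```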

Using Theorem 15.4, we show the following important properties. These are converse propositions to Properties (6) and (7).
(8) Let exp tA be a real orthogonal matrix for any real number t. Then, A is a real skew-symmetric matrix.
(9) Let exp tA be a unitary matrix for any real number t. Then, A is an anti-Hermitian matrix.
To show (8), from the assumption, for any real number t we have

$$\exp tA\,(\exp tA)^T = \exp tA \exp\left(tA^T\right) = E. \tag{15.51}$$

Differentiating (15.51) with respect to t by use of (15.4), we have

$$(A\exp tA)\exp\left(tA^T\right) + (\exp tA)\,A^T\exp\left(tA^T\right) = 0. \tag{15.52}$$

Since (15.52) must hold for any real number t, it must hold with t → 0 as well. On this condition, from (15.52) we get $A + A^T = 0$. That is, A is a real skew-symmetric matrix.
In the case of (9), similarly we have

$$\exp tA\,(\exp tA)^\dagger = \exp tA \exp\left(tA^\dagger\right) = E. \tag{15.53}$$

Differentiating (15.53) with respect to t and taking t → 0, we get $A + A^\dagger = 0$. That is, A is an anti-Hermitian matrix.
Properties (6)–(9), including Properties (6)′ and (7)′, are frequently used later in Chap. 20 in relation to Lie groups and Lie algebras.

15.3 System of Differential Equations

In this section we describe important applications of the exponential functions of matrices to systems of differential equations.

15.3.1 Introduction

In Chap. 10 we dealt with SOLDEs using Green's functions. In the present section we examine the properties of systems of differential equations. There is a close relationship between the two topics. First we briefly outline it.
We describe an example of a system of differential equations. First, let us think of the following general equations:

$$\dot{x}(t) = a(t)x(t) + b(t)y(t), \tag{15.54}$$

$$\dot{y}(t) = c(t)x(t) + d(t)y(t), \tag{15.55}$$

where x(t) and y(t) vary as functions of t; a(t), b(t), c(t), and d(t) are coefficients.
Differentiating (15.54) with respect to t, we get

$$\ddot{x}(t) = \dot{a}(t)x(t) + a(t)\dot{x}(t) + \dot{b}(t)y(t) + b(t)\dot{y}(t) = \dot{a}(t)x(t) + a(t)\dot{x}(t) + \dot{b}(t)y(t) + b(t)[c(t)x(t) + d(t)y(t)] = [\dot{a}(t) + b(t)c(t)]\,x(t) + a(t)\dot{x}(t) + [\dot{b}(t) + b(t)d(t)]\,y(t), \tag{15.56}$$

where with the second equality we used (15.55). To delete y(t) from (15.56), we multiply both sides of (15.56) by b(t), multiply both sides of (15.54) by $\dot{b}(t) + b(t)d(t)$, and subtract the latter equation from the former. As a result, we obtain

$$b\ddot{x} - \left[\dot{b} + (a+d)b\right]\dot{x} + \left[a\left(\dot{b} + bd\right) - b(\dot{a} + bc)\right]x = 0. \tag{15.57}$$

Upon the above derivation, we assume b ≠ 0. Solving (15.57) and substituting the resulting solution into (15.54), we can get a solution for y(t). If, on the other hand, b = 0, we have $\dot{x}(t) = a(t)x(t)$. This is a simple FOLDE and can readily be integrated to yield

$$x = C\exp\left[\int^t a(t')\,dt'\right], \tag{15.58}$$

where C is an integration constant.


Meanwhile, differentiating (15.55) with respect to t, we have

$$\ddot{y} = \dot{c}x + c\dot{x} + \dot{d}y + d\dot{y} = \dot{c}x + cax + \dot{d}y + d\dot{y} = (\dot{c} + ca)\,C\exp\left[\int^t a(t')\,dt'\right] + \dot{d}y + d\dot{y},$$

where we used (15.58) as well as $\dot{x} = ax$. Thus, we are going to solve the following inhomogeneous SOLDE:

$$\ddot{y} - d\dot{y} - \dot{d}y = (\dot{c} + ca)\,C\exp\left[\int^t a(t')\,dt'\right]. \tag{15.59}$$

In Sect. 10.2 we have shown how to solve an inhomogeneous FOLDE using a weight function. For a later purpose, let us consider the next FOLDE, assuming a(x) ≡ 1 in (10.23):

$$\dot{x}(t) + p(t)x = q(t). \tag{15.60}$$

As another method to solve (15.60), we examine the method of variation of constants. The homogeneous equation corresponding to (15.60) is

$$\dot{x}(t) + p(t)x = 0. \tag{15.61}$$

It can be solved as before such that

$$x(t) = C\exp\left[-\int^t p(t')\,dt'\right] \equiv u(t). \tag{15.62}$$

Now we assume that a solution of (15.60) can be sought by putting

$$x(t) = k(t)u(t), \tag{15.63}$$

where we assume that the functional form u(t) remains unchanged in the inhomogeneous equation (15.60) and that instead the "constant" k(t) may change as a function of t. Inserting (15.63) into (15.60), we have

$$\dot{x} + px = \dot{k}u + k\dot{u} + pku = \dot{k}u + k(\dot{u} + pu) = \dot{k}u + k(-pu + pu) = \dot{k}u = q, \tag{15.64}$$

where with the third equality we used (15.61) and (15.62) to get $\dot{u} = -pu$. In this way, (15.64) can easily be integrated to give

$$k(t) = \int^t \frac{q(t')}{u(t')}\,dt' + C,$$

where C is an integration constant. Thus, as a solution of (15.60) we have

$$x(t) = k(t)u(t) = u(t)\int^t \frac{q(t')}{u(t')}\,dt' + Cu(t). \tag{15.65}$$

Comparing (15.65) with (10.29), we notice that these two expressions are related. In other words, the method of variation of constants in (15.65) is essentially the same as the method using the weight function in (10.29).
Another important point to bear in mind is that in Chap. 10 we have solved
SOLDEs of constant coefficients using the method of Green’s functions. In that case,
we have separately estimated a homogeneous term and inhomogeneous term (i.e.,
surface term). In the present section we are going to seek a method corresponding to
that using the Green’s functions. In the above discussion we see that a system of
differential equations with two unknowns can be translated into a SOLDE. Then, we
expect such a system of differential equations of constant coefficients to be solved
somehow or other. We will hereinafter describe it in detail giving examples.
15.3.2 System of Differential Equations in a Matrix Form: Resolvent Matrix

The equations (15.54) and (15.55) are unified in a single homogeneous matrix equation such that

$$(\dot{x}(t)\ \ \dot{y}(t)) = (x(t)\ \ y(t))\begin{pmatrix} a(t) & c(t) \\ b(t) & d(t) \end{pmatrix}. \tag{15.66}$$

In (15.66), x(t) and y(t) and their derivatives represent the functions (of t), and so we describe them as row matrices. With this expression of the equation, see Sect. 11.2 and, e.g., (11.37) therein. Another expression is a transposition of (15.66) described by

$$\begin{pmatrix} \dot{x}(t) \\ \dot{y}(t) \end{pmatrix} = \begin{pmatrix} a(t) & b(t) \\ c(t) & d(t) \end{pmatrix}\begin{pmatrix} x(t) \\ y(t) \end{pmatrix}. \tag{15.67}$$

Notice that the coefficient matrix has been transposed in (15.67) accordingly. We are most interested in equations of constant coefficients and, hence, we rewrite (15.66) once again explicitly as

$$(\dot{x}(t)\ \ \dot{y}(t)) = (x(t)\ \ y(t))\begin{pmatrix} a & c \\ b & d \end{pmatrix} \tag{15.68}$$

or

$$\frac{d}{dt}(x(t)\ \ y(t)) = (x(t)\ \ y(t))\begin{pmatrix} a & c \\ b & d \end{pmatrix}. \tag{15.69}$$

Putting F(t) ≡ (x(t) y(t)), we have

$$\frac{dF(t)}{dt} = F(t)\begin{pmatrix} a & c \\ b & d \end{pmatrix}. \tag{15.70}$$

This is exactly the same as (15.50) if we put

$$A = \begin{pmatrix} a & c \\ b & d \end{pmatrix}. \tag{15.71}$$

Then, we get

$$F(t) = \exp tA. \tag{15.72}$$

As already shown in (15.7) of Sect. 15.1, the exponential function of a matrix, i.e., exp tA, can be defined as

$$\exp tA \equiv E + tA + \frac{1}{2!}(tA)^2 + \cdots + \frac{1}{\nu!}(tA)^\nu + \cdots. \tag{15.73}$$

From Property (5) of (15.41), we have det(exp A) ≠ 0. This is because no matter what number (either real or complex) Tr A may take, exp t(Tr A) never vanishes. That is, F(t) = exp A is non-singular, and so an inverse matrix of exp A must exist. This is the case with exp tA as well; see the equation below:

$$\text{(5)}' \quad \det(\exp tA) = \exp \mathrm{Tr}(tA) = \exp t(\mathrm{Tr}\,A). \tag{15.74}$$

Note that if A (or tA) is an (n, n) matrix, F(t) of (15.72) is an (n, n) matrix as well. Since in that general case F(t) is non-singular, it consists of n linearly independent column vectors, i.e., (n, 1) matrices; see Chap. 11. Then, F(t) is symbolically described as

$$F(t) = (\mathbf{x}_1(t)\ \ \mathbf{x}_2(t)\ \cdots\ \mathbf{x}_n(t)),$$

where $\mathbf{x}_i(t)$ (1 ≤ i ≤ n) represents the i-th column vector. Note that since F(0) = E, $\mathbf{x}_i(0)$ has its i-th row component equal to 1 and the other components equal to 0. Since F(t) is non-singular, the n column vectors $\mathbf{x}_i(t)$ (1 ≤ i ≤ n) are linearly independent. In the case of, e.g., (15.66), A is a (2, 2) matrix, which implies that x(t) and y(t) represent two linearly independent column vectors, i.e., (2, 1) matrices. Furthermore, if we look at the constitution of (15.70), x(t) and y(t) represent two linearly independent solutions of (15.70). We will come back to this point later.
On the basis of the method of variation of constants, we deal with inhomogeneous equations and describe below the procedures for solving the system of equations with two unknowns. They can readily be conformed to the general case where equations with n unknowns are dealt with.
Now, we rewrite (15.68) symbolically such that

$$\dot{\mathbf{x}}(t) = \mathbf{x}(t)A, \tag{15.75}$$

where $\mathbf{x}(t) \equiv (x(t)\ \ y(t))$, $\dot{\mathbf{x}}(t) \equiv (\dot{x}(t)\ \ \dot{y}(t))$, and $A \equiv \begin{pmatrix} a & c \\ b & d \end{pmatrix}$. Then, the inhomogeneous equation can be described as

$$\dot{\mathbf{x}}(t) = \mathbf{x}(t)A + \mathbf{b}(t), \tag{15.76}$$

where we define the inhomogeneous term as $\mathbf{b} \equiv (p\ \ q)$. This is equivalent to

$$(\dot{x}(t)\ \ \dot{y}(t)) = (x(t)\ \ y(t))\begin{pmatrix} a & c \\ b & d \end{pmatrix} + (p\ \ q). \tag{15.77}$$

Using the same solution F(t) that appears in the homogeneous equation, we assume that the solution of the inhomogeneous equation is described by

$$\mathbf{x}(t) = \mathbf{k}(t)F(t), \tag{15.78}$$

where k(t) is a variable "constant." Replacing x(t) in (15.76) with x(t) = k(t)F(t) of (15.78), we obtain

$$\dot{\mathbf{x}} = \dot{\mathbf{k}}F + \mathbf{k}\dot{F} = \dot{\mathbf{k}}F + \mathbf{k}FA = \mathbf{k}FA + \mathbf{b}, \tag{15.79}$$

where with the second equality we used $\dot{F} = FA$ of (15.50). Then, we have

$$\dot{\mathbf{k}}F = \mathbf{b}. \tag{15.80}$$

This can be integrated so that we may have

$$\mathbf{k} = \int^t ds\,\left[\mathbf{b}(s)F(s)^{-1}\right]. \tag{15.81}$$

As previously noted, $F(s)^{-1}$ exists because F(s) is non-singular. Thus, as a solution we get

$$\mathbf{x}(t) = \mathbf{k}(t)F(t) = \int_{t_0}^t ds\,\left[\mathbf{b}(s)F(s)^{-1}\right]F(t). \tag{15.82}$$

Here we define

$$R(s,t) \equiv F(s)^{-1}F(t). \tag{15.83}$$

The matrix R(s, t) is said to be a resolvent matrix [6]. This matrix plays an essential role in solving systems of differential equations, both homogeneous and inhomogeneous, with either homogeneous or inhomogeneous boundary conditions (BCs). Rewriting (15.82), we get

$$\mathbf{x}(t) = \mathbf{k}(t)F(t) = \int_{t_0}^t ds\,[\mathbf{b}(s)R(s,t)]. \tag{15.84}$$

We summarize the principal characteristics of the resolvent matrix:

$$\text{(1)} \quad R(s,s) = F(s)^{-1}F(s) = E, \tag{15.85}$$

where E is the (2, 2) unit matrix, i.e., $E = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$, for an equation having two unknowns.

$$\text{(2)} \quad R(s,t)^{-1} = \left[F(s)^{-1}F(t)\right]^{-1} = F(t)^{-1}F(s) = R(t,s). \tag{15.86}$$

$$\text{(3)} \quad R(s,t)R(t,u) = F(s)^{-1}F(t)\,F(t)^{-1}F(u) = F(s)^{-1}F(u) = R(s,u). \tag{15.87}$$

It is easy to include a term in the solution related to inhomogeneous BCs. As already seen in (10.33) of Sect. 10.2, a boundary condition for the FOLDEs is set such that, e.g.,

$$u(a) = \sigma, \tag{15.88}$$

where u(t) is a solution of a FOLDE. In that case, the BC can be translated into the "initial" condition. In the present case, we describe the BCs as

$$\mathbf{x}(t_0) \equiv \mathbf{x}_0 \equiv (\sigma\ \ \tau), \tag{15.89}$$

where σ ≡ x(t₀) and τ ≡ y(t₀). Then, the said term associated with the inhomogeneous BCs is expected to be written as

$$\mathbf{x}(t_0)R(t_0,t).$$

In fact, since $\mathbf{x}(t_0)R(t_0,t_0) = \mathbf{x}(t_0)E = \mathbf{x}(t_0)$, this satisfies the BCs. Then, the full expression of the solution of the inhomogeneous equation that takes account of the BCs is assumed to be expressed as

$$\mathbf{x}(t) = \mathbf{x}(t_0)R(t_0,t) + \int_{t_0}^t ds\,[\mathbf{b}(s)R(s,t)]. \tag{15.90}$$

Let us confirm that (15.90) certainly gives a proper solution of (15.76). First we give a formula on the differentiation

$$Q(t) = \frac{d}{dt}\int_{t_0}^t P(t,s)\,ds, \tag{15.91}$$

where P(t, s) stands for a (1, 2) matrix whose general form is described by (q(t, s)  r(t, s)), in which q(t, s) and r(t, s) are functions of t and s. We rewrite (15.91) as

$$\begin{aligned} Q(t) &= \lim_{\Delta\to 0}\frac{1}{\Delta}\left[\int_{t_0}^{t+\Delta}P(t+\Delta,s)\,ds - \int_{t_0}^{t}P(t,s)\,ds\right] \\ &= \lim_{\Delta\to 0}\frac{1}{\Delta}\left[\int_{t}^{t+\Delta}P(t+\Delta,s)\,ds + \int_{t_0}^{t}P(t+\Delta,s)\,ds - \int_{t_0}^{t}P(t,s)\,ds\right] \\ &\approx \lim_{\Delta\to 0}\frac{1}{\Delta}\,P(t,t)\Delta + \lim_{\Delta\to 0}\frac{1}{\Delta}\left[\int_{t_0}^{t}P(t+\Delta,s)\,ds - \int_{t_0}^{t}P(t,s)\,ds\right] \\ &= P(t,t) + \int_{t_0}^{t}\frac{\partial P(t,s)}{\partial t}\,ds. \end{aligned} \tag{15.92}$$

Replacing P(t, s) of (15.92) with b(s)R(s, t), we have

$$\frac{d}{dt}\int_{t_0}^t \mathbf{b}(s)R(s,t)\,ds = \mathbf{b}(t)R(t,t) + \int_{t_0}^t \mathbf{b}(s)\frac{\partial R(s,t)}{\partial t}\,ds = \mathbf{b}(t) + \int_{t_0}^t \mathbf{b}(s)\frac{\partial R(s,t)}{\partial t}\,ds, \tag{15.93}$$

where with the last equality we used (15.85). Considering (15.83) and (15.93) and differentiating (15.90) with respect to t, we get

$$\begin{aligned} \dot{\mathbf{x}}(t) &= \mathbf{x}(t_0)F(t_0)^{-1}\dot{F}(t) + \int_{t_0}^t ds\,\left[\mathbf{b}(s)F(s)^{-1}\dot{F}(t)\right] + \mathbf{b}(t) \\ &= \mathbf{x}(t_0)F(t_0)^{-1}F(t)A + \int_{t_0}^t ds\,\left[\mathbf{b}(s)F(s)^{-1}F(t)A\right] + \mathbf{b}(t) \\ &= \mathbf{x}(t_0)R(t_0,t)A + \int_{t_0}^t ds\,[\mathbf{b}(s)R(s,t)]\,A + \mathbf{b}(t) \\ &= \left\{\mathbf{x}(t_0)R(t_0,t) + \int_{t_0}^t ds\,[\mathbf{b}(s)R(s,t)]\right\}A + \mathbf{b}(t) \\ &= \mathbf{x}(t)A + \mathbf{b}(t), \end{aligned} \tag{15.94}$$

where with the second equality we used (15.70) and with the last equality we used (15.90). Thus, we have certainly recovered the original differential equation (15.76). Consequently, (15.90) is the proper solution of the given inhomogeneous equation with inhomogeneous BCs.

The above discussion and formulation equally apply to the general case of the
differential equation with n unknowns, even though the calculation procedures
become increasingly complicated with the increasing number of unknowns.
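For a constant matrix A, the whole scheme can be condensed into a few lines of numerical code. The sketch below (assuming NumPy and SciPy; the quadrature is a plain trapezoidal rule and the step count is arbitrary) implements (15.90) with R(s, t) = exp[(t − s)A]; as a consistency check it reproduces the closed-form solution (15.112) obtained in Example 15.1 below.

```python
import numpy as np
from scipy.linalg import expm

def solve_inhomogeneous(A, b, x0, t0, t, steps=2000):
    """Row-vector convention of (15.76): dx/dt = x A + b(t), x(t0) = x0."""
    s = np.linspace(t0, t, steps)
    # b(s) R(s, t) with R(s, t) = exp[(t - s)A], cf. (15.90)
    integrand = np.array([b(si) @ expm((t - si) * A) for si in s])
    integral = np.trapz(integrand, s, axis=0)
    return x0 @ expm((t - t0) * A) + integral

A = np.array([[1.0, 2.0], [2.0, 1.0]])
b = lambda s: np.array([1.0, 2.0])
c, d, t = 0.5, -0.3, 1.0
x_num = solve_inhomogeneous(A, b, np.array([c, d]), 0.0, t)
x_ref = 0.5 * ((c - d + 1) * np.exp(-t) + (c + d + 1) * np.exp(3 * t) - 2)
y_ref = 0.5 * ((d - c - 1) * np.exp(-t) + (c + d + 1) * np.exp(3 * t))
print(np.allclose(x_num, [x_ref, y_ref], atol=1e-4))   # True
```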

15.3.3 Several Examples

To deepen our understanding of the essence and characteristics of systems of differential equations, we deal with several examples regarding the equations of two unknowns with constant coefficients.
unknowns with constant coefficients. The constant matrices A of specific types (e.g.,
anti-Hermitian matrices and skew-symmetric matrices) have a wide field of appli-
cations in mathematical physics, especially in Lie groups and Lie algebras (see
Chap. 20). In the subsequent examples, however, we deal with various types of
matrices.
Example 15.1 Solve the following equation

$$\dot{x}(t) = x + 2y + 1, \quad \dot{y}(t) = 2x + y + 2, \tag{15.95}$$

under the BCs x(0) = c and y(0) = d.
In a matrix form, (15.95) can be written as

$$(\dot{x}(t)\ \ \dot{y}(t)) = (x\ \ y)\begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix} + (1\ \ 2). \tag{15.96}$$

The homogeneous equation corresponding to (15.96) is expressed as

$$(\dot{x}(t)\ \ \dot{y}(t)) = (x\ \ y)\begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}. \tag{15.97}$$

As discussed in Sect. 15.3.2, this can readily be solved to produce a solution F(t) such that

$$F(t) = \exp tA, \tag{15.98}$$

where $A = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}$. Since A is a real symmetric (i.e., Hermitian) matrix, it can be diagonalized by a unitary similarity transformation (see Sect. 14.3). For the purpose of getting an exponential function of a matrix, it is easier to estimate exp tD (where D is a diagonal matrix) than to directly calculate exp tA. That is, we rewrite (15.97) as

$$(\dot{x}(t)\ \ \dot{y}(t)) = (x\ \ y)\,UU^\dagger\begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}UU^\dagger, \tag{15.99}$$

where U is a unitary matrix; i.e., $U^\dagger = U^{-1}$.


Following routine calculation procedures (as in, e.g., Example 14.3), we readily
get a following diagonal matrix and the corresponding unitary matrix for the
diagonalization. That is, we obtain
0 1
1 1
pffiffiffi pffiffiffi 
B 2 2C 1 2
U¼B@
C and D ¼ U 1 U ¼ U 1 AU
1 1 A 2 1
 pffiffiffi pffiffiffi
2 2

1 0
¼ : ð15:100Þ
0 3

Or we have

A ¼ UDU 1 : ð15:101Þ

Hence, we get
 
F ðt Þ ¼ exp tA ¼ exp tUDU 1 ¼ U ð exp tDÞU 1 , ð15:102Þ

where with the last equality we used (15.30). From Theorem 15.3, we have

et 0
exp tD ¼ : ð15:103Þ
0 e3t

In turn, we get

$$F(t) = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix}\begin{pmatrix} e^{-t} & 0 \\ 0 & e^{3t} \end{pmatrix}\begin{pmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix} = \frac{1}{2}\begin{pmatrix} e^{-t}+e^{3t} & -e^{-t}+e^{3t} \\ -e^{-t}+e^{3t} & e^{-t}+e^{3t} \end{pmatrix}. \tag{15.104}$$

With an inverse matrix, we have

$$F(t)^{-1} = \frac{1}{2}\begin{pmatrix} e^{t}+e^{-3t} & -e^{t}+e^{-3t} \\ -e^{t}+e^{-3t} & e^{t}+e^{-3t} \end{pmatrix}. \tag{15.105}$$
Therefore, as a resolvent matrix we get

$$R(s,t) = F(s)^{-1}F(t) = \frac{1}{2}\begin{pmatrix} e^{-(t-s)}+e^{3(t-s)} & -e^{-(t-s)}+e^{3(t-s)} \\ -e^{-(t-s)}+e^{3(t-s)} & e^{-(t-s)}+e^{3(t-s)} \end{pmatrix}. \tag{15.106}$$

Thus, as a solution of (15.95) we obtain

$$\mathbf{x}(t) = \mathbf{x}(0)R(0,t) + \int_0^t ds\,[\mathbf{b}(s)R(s,t)], \tag{15.107}$$

where x(0) = (x(0) y(0)) = (c d) represents the BCs and b(s) = (1 2) comes from (15.96). This can easily be calculated so that we have

$$\mathbf{x}(0)R(0,t) = \left(\frac{1}{2}\left[(c-d)e^{-t}+(c+d)e^{3t}\right]\quad \frac{1}{2}\left[(d-c)e^{-t}+(c+d)e^{3t}\right]\right) \tag{15.108}$$

and

$$\int_0^t ds\,[\mathbf{b}(s)R(s,t)] = \left(\frac{1}{2}\left(e^{-t}+e^{3t}\right)-1\quad \frac{1}{2}\left(e^{3t}-e^{-t}\right)\right). \tag{15.109}$$

Note that the first component of (15.108) and (15.109) represents the x component of the solution of (15.95) and the second component represents the y component. From (15.108), we have

$$\mathbf{x}(0)R(0,0) = \mathbf{x}(0)E = \mathbf{x}(0) = (c\ \ d). \tag{15.110}$$

Thus, we find that the BCs are certainly satisfied. Putting t = 0 in (15.109), we have

$$\int_0^0 ds\,[\mathbf{b}(s)R(s,0)] = 0, \tag{15.111}$$

confirming that the second term of (15.107) vanishes at t = 0.
The summation of (15.108) and (15.109) gives the overall solution of (15.107). Separating the individual components of the solution, we describe them as

$$x(t) = \frac{1}{2}\left[(c-d+1)e^{-t}+(c+d+1)e^{3t}-2\right], \quad y(t) = \frac{1}{2}\left[(d-c-1)e^{-t}+(c+d+1)e^{3t}\right]. \tag{15.112}$$
Moreover, differentiating (15.112) with respect to t, we recover the original form of (15.95). The confirmation is left for readers.
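An independent numerical confirmation of (15.112) is also easy (the sketch assumes SciPy's general-purpose integrator solve_ivp; the values of c and d are arbitrary).

```python
import numpy as np
from scipy.integrate import solve_ivp

c, d = 1.0, 2.0

def rhs(t, u):            # (15.95): x' = x + 2y + 1, y' = 2x + y + 2
    x, y = u
    return [x + 2 * y + 1, 2 * x + y + 2]

sol = solve_ivp(rhs, (0.0, 1.0), [c, d], rtol=1e-10, atol=1e-12)
t = sol.t[-1]
x_ref = 0.5 * ((c - d + 1) * np.exp(-t) + (c + d + 1) * np.exp(3 * t) - 2)
y_ref = 0.5 * ((d - c - 1) * np.exp(-t) + (c + d + 1) * np.exp(3 * t))
print(np.allclose(sol.y[:, -1], [x_ref, y_ref]))   # True
```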
Remarks on Example 15.1: We become aware of several points in Example 15.1.
(i) The resolvent matrix given by (15.106) can be obtained by replacing t of (15.104) with t − s. This is one of the important characteristics of a resolvent matrix derived from an exponential function of a constant matrix. To see this, note that if A is a constant matrix, tA and sA commute. By the definition (15.7) of the exponential function we have

$$\exp tA = E + tA + \frac{1}{2!}t^2A^2 + \cdots + \frac{1}{\nu!}t^\nu A^\nu + \cdots, \tag{15.113}$$

$$\exp(-sA) = E + (-s)A + \frac{1}{2!}(-s)^2A^2 + \cdots + \frac{1}{\nu!}(-s)^\nu A^\nu + \cdots. \tag{15.114}$$

Both (15.113) and (15.114) are polynomials of a constant matrix A and, hence, exp tA and exp(−sA) commute as well. Then, from Theorem 15.2 and Property (1) of (15.28), we get

$$\exp(tA - sA) = (\exp tA)[\exp(-sA)] = [\exp(-sA)](\exp tA) = \exp(-sA + tA) = \exp[(t-s)A]. \tag{15.115}$$

That is, using (15.28), (15.72), and (15.83), we get

$$R(s,t) = F(s)^{-1}F(t) = [\exp(sA)]^{-1}\exp(tA) = \exp(-sA)\exp(tA) = \exp[(t-s)A] = F(t-s). \tag{15.116}$$

This implies that once we get exp tA, we can safely obtain the resolvent matrix by simply replacing t of (15.72) with t − s. Also, exp tA and exp(−sA) are commutative. In the present case, therefore, by exchanging the order of the products of $F(s)^{-1}$ and F(t) in (15.106), we have the same result as (15.106) such that

$$R(s,t) = F(t)F(s)^{-1} = \frac{1}{2}\begin{pmatrix} e^{-(t-s)}+e^{3(t-s)} & -e^{-(t-s)}+e^{3(t-s)} \\ -e^{-(t-s)}+e^{3(t-s)} & e^{-(t-s)}+e^{3(t-s)} \end{pmatrix}. \tag{15.117}$$

(ii) The resolvent matrix of the present example is real symmetric. This is because if A is real symmetric, i.e., $A^T = A$, we have

$$\left(A^n\right)^T = \left(A^T\right)^n = A^n, \tag{15.118}$$

that is, $A^n$ is real symmetric as well. From (15.7), exp A is real symmetric accordingly. In a similar manner, if A is Hermitian, exp A is also Hermitian.
(iii) Let us think of a case where A is not a constant matrix but varies as a function of t. In that case, we have

$$A(t) = \begin{pmatrix} a(t) & b(t) \\ c(t) & d(t) \end{pmatrix}. \tag{15.119}$$

Also, we have

$$A(s) = \begin{pmatrix} a(s) & b(s) \\ c(s) & d(s) \end{pmatrix}. \tag{15.120}$$

We can easily check that in a general case

$$A(t)A(s) \ne A(s)A(t), \tag{15.121}$$

namely, A(t) and A(s) are not generally commutative. The matrix tA(t) is not commutative with sA(s), either. In turn, (15.115) does not hold either. Thus, we need a more elaborate treatment in such a case.
Example 15.2 Find the resolvent matrix of the following homogeneous equation:

$$\dot{x}(t) = 0, \quad \dot{y}(t) = x + y. \tag{15.122}$$

In a matrix form, (15.122) can be written as

$$(\dot{x}(t)\ \ \dot{y}(t)) = (x\ \ y)\begin{pmatrix} 0 & 1 \\ 0 & 1 \end{pmatrix}. \tag{15.123}$$

The matrix $A = \begin{pmatrix} 0 & 1 \\ 0 & 1 \end{pmatrix}$ is characterized as an idempotent matrix (see Sect. 12.4). As in the case of Example 15.1, we get

$$(\dot{x}(t)\ \ \dot{y}(t)) = (x\ \ y)\,PP^{-1}\begin{pmatrix} 0 & 1 \\ 0 & 1 \end{pmatrix}PP^{-1}, \tag{15.124}$$

where we choose $P = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ and, hence, $P^{-1} = \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix}$. Notice that in this example, since A is not a symmetric matrix, we did not use a unitary matrix for the similarity transformation. As a diagonal matrix D, we have

$$D = P^{-1}\begin{pmatrix} 0 & 1 \\ 0 & 1 \end{pmatrix}P = P^{-1}AP = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}. \tag{15.125}$$
Or we have

$$A = PDP^{-1}. \tag{15.126}$$

Hence, we get

$$F(t) = \exp tA = \exp\left(tPDP^{-1}\right) = P(\exp tD)\,P^{-1}, \tag{15.127}$$

where we used (15.30) again. From Theorem 15.3, we have

$$\exp tD = \begin{pmatrix} 1 & 0 \\ 0 & e^{t} \end{pmatrix}. \tag{15.128}$$

In turn, we get

$$F(t) = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & e^{t} \end{pmatrix}\begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & -1+e^{t} \\ 0 & e^{t} \end{pmatrix}. \tag{15.129}$$

With an inverse matrix, we have

$$F(t)^{-1} = \begin{pmatrix} 1 & e^{-t}-1 \\ 0 & e^{-t} \end{pmatrix}. \tag{15.130}$$

Therefore, as a resolvent matrix we get

$$R(s,t) = F(s)^{-1}F(t) = \begin{pmatrix} 1 & -1+e^{t-s} \\ 0 & e^{t-s} \end{pmatrix}. \tag{15.131}$$

Using the resolvent matrix, we can include the inhomogeneous term along with BCs (or initial conditions). Once again, by exchanging the order of the products of $F(s)^{-1}$ and F(t) in (15.131), we obtain the same resolvent matrix. This can readily be checked. Moreover, we get R(s, t) = F(t − s).
Example 15.3 Solve the following equation

$$\dot{x}(t) = x + c, \quad \dot{y}(t) = x + y + d, \tag{15.132}$$

under the BCs x(0) = σ and y(0) = τ. In a matrix form, (15.132) can be written as

$$(\dot{x}(t)\ \ \dot{y}(t)) = (x\ \ y)\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} + (c\ \ d). \tag{15.133}$$

The matrix of the type $M = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ was fully investigated in Sect. 12.6 and characterized as a matrix that cannot be diagonalized. In such a case, we directly calculate exp tM. Here, remember that a product of triangle matrices of the same kind (i.e., upper triangle matrices or lower triangle matrices; see Sect. 12.1) is a triangle matrix of the same kind. We have

$$M^2 = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}, \quad M^3 = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 3 \\ 0 & 1 \end{pmatrix}, \ \cdots.$$

Repeating this, we get

$$M^n = \begin{pmatrix} 1 & n \\ 0 & 1 \end{pmatrix}. \tag{15.134}$$

Thus, we have

$$\exp tM = R(0,t) = E + tM + \frac{1}{2!}t^2M^2 + \cdots + \frac{1}{\nu!}t^\nu M^\nu + \cdots = \begin{pmatrix} e^t & t\left[1 + t + \frac{1}{2!}t^2 + \cdots + \frac{1}{(\nu-1)!}t^{\nu-1} + \cdots\right] \\ 0 & e^t \end{pmatrix} = \begin{pmatrix} e^t & te^t \\ 0 & e^t \end{pmatrix}. \tag{15.135}$$

With the resolvent matrix we get

$$R(s,t) = \exp(-sM)\exp tM = \begin{pmatrix} e^{t-s} & (t-s)e^{t-s} \\ 0 & e^{t-s} \end{pmatrix}. \tag{15.136}$$

Using (15.107), we get

$$\mathbf{x}(t) = (\sigma\ \ \tau)R(0,t) + \int_0^t ds\,[(c\ \ d)R(s,t)]. \tag{15.137}$$

Thus, we obtain the solution described by

$$x(t) = (c+\sigma)e^t - c, \quad y(t) = (c+\sigma)te^t + (d-c+\tau)e^t + c - d. \tag{15.138}$$

From (15.138), we find that the original inhomogeneous equation (15.132) and the BCs x(0) = σ and y(0) = τ are recovered.
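The non-diagonalizable case is also handled transparently by a numerical matrix exponential. The following check (assuming NumPy and SciPy; the value of t is arbitrary) compares SciPy's expm of tM with the closed form (15.135).

```python
import numpy as np
from scipy.linalg import expm

M = np.array([[1.0, 1.0], [0.0, 1.0]])    # non-diagonalizable matrix of (15.133)
t = 0.8
closed_form = np.array([[np.exp(t), t * np.exp(t)],
                        [0.0,       np.exp(t)]])   # (15.135)
print(np.allclose(expm(t * M), closed_form))       # True
```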
Example 15.4 Find the resolvent matrix of the following homogeneous equation:

$$\dot{x}(t) = 0, \quad \dot{y}(t) = x. \tag{15.139}$$

In a matrix form, (15.139) can be written as

$$(\dot{x}(t)\ \ \dot{y}(t)) = (x\ \ y)\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}. \tag{15.140}$$

The matrix $N = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$ is characterized as a nilpotent matrix (Sect. 12.3). Since $N^2$ and higher powers of N vanish, we have

$$\exp tN = E + tN = \begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix}. \tag{15.141}$$

We have a corresponding resolvent matrix such that

$$R(s,t) = \begin{pmatrix} 1 & t-s \\ 0 & 1 \end{pmatrix}. \tag{15.142}$$

Using the resolvent matrix, we can include the inhomogeneous term along with BCs.
Example 15.5 As a trivial case, let us consider the following homogeneous equation:

$$(\dot{x}(t)\ \ \dot{y}(t)) = (x\ \ y)\begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix}, \tag{15.143}$$

where one of a and b can be zero, or both of them can be zero. Even though (15.143) merely apposes two FOLDEs, we can deal with such cases in an automatic manner. From Theorem 15.3, as eigenvalues of exp A, where $A = \begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix}$, we have $e^a$ and $e^b$. That is, we have

$$F(t) = \exp tA = \begin{pmatrix} e^{at} & 0 \\ 0 & e^{bt} \end{pmatrix}. \tag{15.144}$$

In particular, if a = b (including a = b = 0), we merely have the same FOLDE twice, with the roles of x and y interchanged.
Readers may well wonder why we have to argue such a trivial case expressly. That is because the theory we developed in this chapter holds widely and enables us to make a clear forecast about the constitution of the solution of first-order differential equations. For instance, (15.144) immediately tells us that a fundamental solution for

$$\dot{x}(t) = ax(t) \tag{15.145}$$

is $e^{at}$. We only have to perform routine calculations with a counterpart of x(t) [or y(t)] of (x(t) y(t)) using (15.107) to solve an inhomogeneous differential equation under BCs.
Returning to Example 15.1, we had

$$(\dot{x}(t)\ \ \dot{y}(t)) = (x\ \ y)\begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix} + (1\ \ 2). \tag{15.96}$$

This equation can be converted to

$$(\dot{x}(t)\ \ \dot{y}(t))\,U = (x\ \ y)\,UU^{-1}\begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}U + (1\ \ 2)\,U, \tag{15.146}$$

where $U = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix}$. Defining (X Y) as

$$(X\ \ Y) \equiv (x\ \ y)\,U = (x\ \ y)\begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix}, \tag{15.147}$$

we have

$$(\dot{X}(t)\ \ \dot{Y}(t)) = (X\ \ Y)\begin{pmatrix} -1 & 0 \\ 0 & 3 \end{pmatrix} + (1\ \ 2)\,U. \tag{15.148}$$

Then, according to (15.90), we get a solution

$$\mathbf{X}(t) = \mathbf{X}(t_0)\tilde{R}(t_0,t) + \int_{t_0}^t ds\,\bigl[\tilde{\mathbf{b}}(s)\tilde{R}(s,t)\bigr], \tag{15.149}$$

where $\mathbf{X}(t_0) = (X(t_0)\ \ Y(t_0))$ and $\tilde{\mathbf{b}}(s) = \mathbf{b}(s)U$. The resolvent matrix $\tilde{R}(s,t)$ is given by

$$\tilde{R}(s,t) = \begin{pmatrix} e^{-(t-s)} & 0 \\ 0 & e^{3(t-s)} \end{pmatrix}. \tag{15.150}$$

To convert (X Y) to (x y), operating with $U^{-1}$ on both sides of (15.149) from the right, we obtain

$$\mathbf{X}(t)U^{-1} = \mathbf{X}(t_0)U^{-1}U\tilde{R}(t_0,t)U^{-1} + \int_{t_0}^t ds\,\bigl[\tilde{\mathbf{b}}(s)U^{-1}U\tilde{R}(s,t)U^{-1}\bigr], \tag{15.151}$$

where in the first and second terms we insert $U^{-1}U = E$. Then, we get

$$\mathbf{x}(t) = \mathbf{x}(t_0)\bigl[U\tilde{R}(t_0,t)U^{-1}\bigr] + \int_{t_0}^t ds\,\mathbf{b}(s)\bigl[U\tilde{R}(s,t)U^{-1}\bigr]. \tag{15.152}$$

We calculate $U\tilde{R}(s,t)U^{-1}$ to have

$$U\tilde{R}(s,t)U^{-1} = \frac{1}{2}\begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}\begin{pmatrix} e^{-(t-s)} & 0 \\ 0 & e^{3(t-s)} \end{pmatrix}\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix} = \frac{1}{2}\begin{pmatrix} e^{-(t-s)}+e^{3(t-s)} & -e^{-(t-s)}+e^{3(t-s)} \\ -e^{-(t-s)}+e^{3(t-s)} & e^{-(t-s)}+e^{3(t-s)} \end{pmatrix} = R(s,t). \tag{15.153}$$

Thus, we recover (15.90), i.e., we have

$$\mathbf{x}(t) = \mathbf{x}(t_0)R(t_0,t) + \int_{t_0}^t ds\,[\mathbf{b}(s)R(s,t)]. \tag{15.90}$$

Equation (15.153) shows that R(s, t) and $\tilde{R}(s,t)$ are connected to each other through the unitary similarity transformation such that

$$\tilde{R}(s,t) = U^{-1}R(s,t)U = U^{\dagger}R(s,t)U. \tag{15.154}$$
Example 15.6 Let us consider the following system of differential equations:

$$(\dot{x}(t)\ \ \dot{y}(t)) = (x\ \ y)\begin{pmatrix} 0 & -\omega \\ \omega & 0 \end{pmatrix} + (a\ \ 0). \tag{15.155}$$

The matrix $\begin{pmatrix} 0 & -\omega \\ \omega & 0 \end{pmatrix}$ is characterized as an anti-Hermitian operator. This type of operator frequently appears in the Lie groups and Lie algebras that we will deal with in Chap. 20. Here we define an operator D as

$$D = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \quad\text{or}\quad \omega D = \begin{pmatrix} 0 & -\omega \\ \omega & 0 \end{pmatrix}. \tag{15.156}$$

The calculation procedures will be seen in Chap. 20, and so we only show the result here. That is, we have

$$\exp t\omega D = \begin{pmatrix} \cos\omega t & -\sin\omega t \\ \sin\omega t & \cos\omega t \end{pmatrix}. \tag{15.157}$$

We seek the resolvent matrix R(s, t) such that

$$R(s,t) = \begin{pmatrix} \cos\omega(t-s) & -\sin\omega(t-s) \\ \sin\omega(t-s) & \cos\omega(t-s) \end{pmatrix}. \tag{15.158}$$

The implication of (15.158) is simple; the synthesis of a rotation of ωt and the inverse rotation of ωs is described by ω(t − s). Physically, (15.155) represents the motion of a point mass under an external field.
We routinely obtained the above solution using (15.90). If, for example, we solve (15.155) under the BCs for which the point mass is placed at rest at the origin of the xy-plane at t = 0, we get a solution described by

$$x = \frac{a}{\omega}\sin\omega t, \quad y = \frac{a}{\omega}(\cos\omega t - 1). \tag{15.159}$$

Deleting t from (15.159), we get

$$x^2 + \left(y + \frac{a}{\omega}\right)^2 = \left(\frac{a}{\omega}\right)^2. \tag{15.160}$$

In other words, the point mass performs a circular motion as shown in Fig. 15.2.
In Example 15.1 we mentioned that if the matrix A that defines the system of differential equations varies as a function of t, then the solution method based upon exp tA does not work. Yet, we have an important case where A is not a constant matrix. Let us consider the next illustration.
[Fig. 15.2 Circular motion of a point mass]

15.4 Motion of a Charged Particle in Polarized Electromagnetic Wave

In Sect. 4.5 we dealt with an electron motion under a circularly polarized light. Here we consider the motion of a charged particle in a linearly polarized electromagnetic wave.
Suppose that a charged particle is placed in vacuum. Suppose also that an electromagnetic wave (light) linearly polarized along the x-direction is propagated in the positive direction of the z-axis. Unlike the case of Sect. 4.5, we have to take account of the influence from both the electric field and the magnetic field components of the Lorentz force.
From (7.58) we have

$$\mathbf{E} = \mathbf{E}_0\, e^{i(\mathbf{k}\cdot\mathbf{x}-\omega t)} = E_0\mathbf{e}_1\, e^{i(kz-\omega t)}, \tag{15.161}$$

where $\mathbf{e}_1$, $\mathbf{e}_2$, and $\mathbf{e}_3$ are the unit vectors of the x-, y-, and z-directions, respectively. (The latter two unit vectors appear just below.) If the extent of motion of the charged particle is narrow enough around the origin compared to the wavelength of the electromagnetic field, we can ignore kz in the exponent (see Sect. 4.5). Then, we have

$$\mathbf{E} \approx E_0\mathbf{e}_1\, e^{-i\omega t}. \tag{15.162}$$

Taking the real part of (15.162), we get

$$\mathbf{E} = E_0\mathbf{e}_1\cos\omega t. \tag{15.163}$$

Meanwhile, from (7.59) we have

$$\mathbf{H} = \mathbf{H}_0\, e^{i(kz-\omega t)} \approx \mathbf{H}_0\cos\omega t. \tag{15.164}$$

From (7.60), furthermore, we get

$$\mathbf{H}_0 = \mathbf{e}_3 \times \mathbf{E}_0\big/\sqrt{\mu_0/\varepsilon_0} = \mathbf{e}_3 \times E_0\mathbf{e}_1/\mu_0 c = E_0\mathbf{e}_2/\mu_0 c. \tag{15.165}$$

Thus, as the magnetic flux density we have

$$\mathbf{B} = \mu_0\mathbf{H} \approx \mu_0\mathbf{H}_0\cos\omega t = E_0\mathbf{e}_2\cos\omega t/c. \tag{15.166}$$

The Lorentz force F exerted on the charged particle is then described by

$$\mathbf{F} = e\mathbf{E} + e\,\dot{\mathbf{x}}\times\mathbf{B}, \tag{15.167}$$

where $\mathbf{x}$ (= x$\mathbf{e}_1$ + y$\mathbf{e}_2$ + z$\mathbf{e}_3$) is the position vector of the charged particle having a charge e. In (15.167) the first term represents the electric Lorentz force and the second term the magnetic Lorentz force; see Sect. 4.5. Replacing E and B in (15.167) with those in (15.163) and (15.166), we get

$$\mathbf{F} = \left(1 - \frac{\dot{z}}{c}\right)eE_0\mathbf{e}_1\cos\omega t + \frac{eE_0}{c}\dot{x}\,\mathbf{e}_3\cos\omega t = m\ddot{\mathbf{x}} = m\ddot{x}\,\mathbf{e}_1 + m\ddot{y}\,\mathbf{e}_2 + m\ddot{z}\,\mathbf{e}_3, \tag{15.168}$$

where m is the mass of the charged particle. Comparing the vector components, we have

$$m\ddot{x} = \left(1 - \frac{\dot{z}}{c}\right)eE_0\cos\omega t, \quad m\ddot{y} = 0, \quad m\ddot{z} = \frac{eE_0}{c}\dot{x}\cos\omega t. \tag{15.169}$$

Note that the Lorentz force is not exerted in the direction of the y-axis. Putting $a \equiv eE_0/m$ and $b \equiv eE_0/mc$, we get

$$\ddot{x} = a\cos\omega t - b\dot{z}\cos\omega t, \quad \ddot{z} = b\dot{x}\cos\omega t. \tag{15.170}$$

Further putting $\dot{x} = \xi$ and $\dot{z} = \zeta$, we obtain

$$\dot{\xi} = -b\zeta\cos\omega t + a\cos\omega t, \quad \dot{\zeta} = b\xi\cos\omega t. \tag{15.171}$$


Writing (15.171) in a matrix form, we obtain the following system of inhomogeneous differential equations:

$$(\dot{\xi}(t)\ \ \dot{\zeta}(t)) = (\xi\ \ \zeta)\begin{pmatrix} 0 & b\cos\omega t \\ -b\cos\omega t & 0 \end{pmatrix} + (a\cos\omega t\ \ 0), \tag{15.172}$$

where the inhomogeneous term is given by (a cos ωt  0). First, we think of the homogeneous equation described by

$$(\dot{\xi}(t)\ \ \dot{\zeta}(t)) = (\xi\ \ \zeta)\begin{pmatrix} 0 & b\cos\omega t \\ -b\cos\omega t & 0 \end{pmatrix}. \tag{15.173}$$

Meanwhile, we consider the following equation:

$$\frac{d}{dt}\begin{pmatrix} \cos f(t) & \sin f(t) \\ -\sin f(t) & \cos f(t) \end{pmatrix} = \begin{pmatrix} -\dot{f}(t)\sin f(t) & \dot{f}(t)\cos f(t) \\ -\dot{f}(t)\cos f(t) & -\dot{f}(t)\sin f(t) \end{pmatrix} = \begin{pmatrix} \cos f(t) & \sin f(t) \\ -\sin f(t) & \cos f(t) \end{pmatrix}\begin{pmatrix} 0 & \dot{f}(t) \\ -\dot{f}(t) & 0 \end{pmatrix}. \tag{15.174}$$

Closely inspecting (15.173) and (15.174), we find that if we can choose f(t) so that it satisfies

$$\dot{f}(t) = b\cos\omega t, \tag{15.175}$$

we might well be able to solve (15.172). In that case, moreover, we expect that the two linearly independent column vectors

$$\begin{pmatrix} \cos f(t) \\ -\sin f(t) \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} \sin f(t) \\ \cos f(t) \end{pmatrix} \tag{15.176}$$

give a fundamental set of solutions of (15.172). A function f(t) that satisfies (15.175) can be given by

$$f(t) = (b/\omega)\sin\omega t = \tilde{b}\sin\omega t, \tag{15.177}$$

where $\tilde{b} \equiv b/\omega$. Notice that at t = 0 the two column vectors of (15.176) give

$$\begin{pmatrix} 1 \\ 0 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 0 \\ 1 \end{pmatrix},$$

if we choose $f(t) = \tilde{b}\sin\omega t$ for f(t) in (15.176).
Thus, defining F(t) as

$$F(t) \equiv \begin{pmatrix} \cos\left(\tilde{b}\sin\omega t\right) & \sin\left(\tilde{b}\sin\omega t\right) \\ -\sin\left(\tilde{b}\sin\omega t\right) & \cos\left(\tilde{b}\sin\omega t\right) \end{pmatrix} \tag{15.178}$$

and taking account of (15.174), we recast the homogeneous equation (15.173) as

$$\frac{dF(t)}{dt} = F(t)A, \quad A \equiv \begin{pmatrix} 0 & b\cos\omega t \\ -b\cos\omega t & 0 \end{pmatrix}. \tag{15.179}$$

Equation (15.179) is formally the same as (15.47), even though F(t) is not given in the form of exp tA. This enables us to apply the general scheme mentioned in Sect. 15.3.2 to the present case. That is, we are going to address the given problem using the method of variation of constants such that

$$\mathbf{x}(t) = \mathbf{k}(t)F(t), \tag{15.78}$$

where we define $\mathbf{x}(t) \equiv (\xi(t)\ \ \zeta(t))$ and $\mathbf{k}(t) \equiv (u(t)\ \ v(t))$ as a variable constant. Hence, using (15.80)–(15.90) we should be able to find the solution of (15.172) described by

$$\mathbf{x}(t) = \mathbf{x}(t_0)R(t_0,t) + \int_{t_0}^t ds\,[\mathbf{b}(s)R(s,t)], \tag{15.90}$$

where $\mathbf{b}(s) \equiv (a\cos\omega s\ \ 0)$, i.e., the inhomogeneous term in (15.172). The resolvent matrix R(s, t) is expressed as

$$R(s,t) = F(s)^{-1}F(t) = \begin{pmatrix} \cos[f(t)-f(s)] & \sin[f(t)-f(s)] \\ -\sin[f(t)-f(s)] & \cos[f(t)-f(s)] \end{pmatrix}, \tag{15.180}$$

where f(t) is given by (15.177). For $F(s)^{-1}$, we simply take the transpose of (15.178), because F is an orthogonal matrix. Thus, we obtain

$$F(s)^{-1} = \begin{pmatrix} \cos\left(\tilde{b}\sin\omega s\right) & -\sin\left(\tilde{b}\sin\omega s\right) \\ \sin\left(\tilde{b}\sin\omega s\right) & \cos\left(\tilde{b}\sin\omega s\right) \end{pmatrix}.$$

The resolvent matrix (15.180) possesses the properties described as (15.85)–(15.87). These can readily be checked. Also, we become aware that F(t) of (15.178) represents a rotation matrix in ℝ²; see Sect. 11.2. Therefore, F(t) and $F(s)^{-1}$ are commutative. Notice, however, that we do not have a resolvent matrix of the simple form that appears in (15.116). In fact, from (15.178) we find R(s, t) ≠ F(t − s). This is because the matrix A given in (15.179) is not constant but depends on t.
Rewriting (15.90) as

$$\mathbf{x}(t) = \mathbf{x}(0)R(0,t) + \int_0^t ds\,[(a\cos\omega s\ \ 0)\,R(s,t)] \tag{15.181}$$

and setting the BCs as x(0) = (σ τ), with the individual components we finally obtain

$$\xi(t) = \sigma\cos\left(\tilde{b}\sin\omega t\right) - \tau\sin\left(\tilde{b}\sin\omega t\right) + \frac{a}{b}\sin\left(\tilde{b}\sin\omega t\right),$$

$$\zeta(t) = \sigma\sin\left(\tilde{b}\sin\omega t\right) + \tau\cos\left(\tilde{b}\sin\omega t\right) - \frac{a}{b}\cos\left(\tilde{b}\sin\omega t\right) + \frac{a}{b}. \tag{15.182}$$

Note that differentiating both sides of (15.182) with respect to t, we recover the original equation (15.172) to be solved. Further setting ξ(0) = σ = ζ(0) = τ = 0 as the BCs, we have

$$\xi(t) = \frac{a}{b}\sin\left(\tilde{b}\sin\omega t\right), \quad \zeta(t) = \frac{a}{b}\left[1 - \cos\left(\tilde{b}\sin\omega t\right)\right]. \tag{15.183}$$

Notice that the above BCs correspond to the charged particle being initially placed at the origin at rest (i.e., with zero velocity). Deleting the argument t, we get

$$\xi^2 + \left(\zeta - \frac{a}{b}\right)^2 = \left(\frac{a}{b}\right)^2. \tag{15.184}$$
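A direct numerical integration of (15.171) confirms the closed-form velocities (15.183); the sketch below assumes SciPy's solve_ivp, and the parameter values a, b, and ω are arbitrary.

```python
import numpy as np
from scipy.integrate import solve_ivp

a, b, omega = 1.0, 0.6, 2.0
bt = b / omega                           # \tilde{b} = b / omega of (15.177)

def rhs(t, u):                           # (15.171)
    xi, zeta = u
    return [-b * zeta * np.cos(omega * t) + a * np.cos(omega * t),
            b * xi * np.cos(omega * t)]

ts = np.linspace(0.0, 10.0, 200)
sol = solve_ivp(rhs, (0.0, 10.0), [0.0, 0.0], t_eval=ts,
                rtol=1e-10, atol=1e-12)

xi_ref = (a / b) * np.sin(bt * np.sin(omega * ts))            # (15.183)
zeta_ref = (a / b) * (1.0 - np.cos(bt * np.sin(omega * ts)))
print(np.allclose(sol.y[0], xi_ref), np.allclose(sol.y[1], zeta_ref))
# True True; the velocities trace the circle (15.184), with zeta >= 0 throughout.
```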

Figure 15.3 depicts the particle velocities ξ and ζ in the x- and z-directions, respectively. It is interesting that although the velocity in the x-direction switches between negative and positive, that in the z-direction is kept non-negative. This implies that the particle is oscillating along the x-direction, but continually drifting toward the z-direction while oscillating. In fact, if we integrate ζ(t), its constant part yields a continually drifting term (a/b)t.
Figure 15.4 shows the Lorentz force F as a function of time in the case where the charge of the particle is positive. Again, the electric Lorentz force (represented by E) switches from positive to negative in the x-direction, but the magnetic Lorentz force (represented by $\dot{\mathbf{x}}\times\mathbf{B}$) remains non-negative in the z-direction all the time.
To precisely analyze the positional change of the particle, we need to integrate (15.182) once again. This requires a detailed numerical calculation. For this purpose the following relation is useful [7]:
[Fig. 15.3 Velocities ξ and ζ of a charged particle in the x- and z-directions, respectively. The charged particle exerts a periodic motion under the influence of a linearly polarized electromagnetic wave]

[Fig. 15.4 Lorentz force as a function of time. In (a, b), the phase of the electromagnetic fields E and B is reversed. As a result, the electric Lorentz force (represented by E) switches from positive to negative in the x-direction, but the magnetic Lorentz force (represented by $\dot{\mathbf{x}}\times\mathbf{B}$) remains non-negative all the time in the z-direction]

$$\cos(x\sin t) = \sum_{m=-\infty}^{\infty} J_m(x)\cos mt, \qquad \sin(x\sin t) = \sum_{m=-\infty}^{\infty} J_m(x)\sin mt, \tag{15.185}$$

where the functions $J_m(x)$ are called the Bessel functions, which satisfy the following equation:

$$\frac{d^2 J_n(x)}{dx^2} + \frac{1}{x}\frac{dJ_n(x)}{dx} + \left(1 - \frac{n^2}{x^2}\right)J_n(x) = 0. \tag{15.186}$$

Although we do not examine the properties of the Bessel functions in this book, the related topics are dealt with in detail in the literature [5, 7–9]. The Bessel functions are widely investigated as one of the special functions in many branches of mathematical physics.
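Incidentally, the expansion (15.185) is easy to verify numerically (the sketch assumes SciPy's Bessel function scipy.special.jv; the values of x and t and the truncation order are arbitrary).

```python
import numpy as np
from scipy.special import jv

x, t = 1.3, 0.7
m = np.arange(-40, 41)                   # truncated summation range
lhs_cos, lhs_sin = np.cos(x * np.sin(t)), np.sin(x * np.sin(t))
rhs_cos = np.sum(jv(m, x) * np.cos(m * t))
rhs_sin = np.sum(jv(m, x) * np.sin(m * t))
print(np.isclose(lhs_cos, rhs_cos), np.isclose(lhs_sin, rhs_sin))  # True True
```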
To conclude this section, we have described an example in which the matrix A of (15.179) characterizing the system of differential equations is not a constant matrix but depends on the parameter t.
may well be difficult to find a fundamental set of solutions. Once we can find the
fundamental set of solutions, however, it is always possible to construct the resolvent
matrix and solve the problem using it. In this respect, we have already encountered a
similar situation in Chap. 10 where the Green’s functions were constructed by a
fundamental set of solutions.
In this section, a fundamental set of solutions could be found and, as a conse-
quence, we could construct the resolvent matrix. It is, however, not always the case
that we can recast the system of differential equations (15.66) as the form of
(15.179). Nonetheless, whenever we are successful in finding the fundamental set
of solutions, we can construct a resolvent matrix and, hence, solve the problems.
This is a generic and powerful tool.

References

1. Satake I-O (1975) Linear algebra (Pure and applied mathematics). Marcel Dekker, New York
2. Satake I (1974) Linear algebra. Shokabo, Tokyo. (in Japanese)
3. Rudin W (1987) Real and complex analysis, 3rd edn. McGraw-Hill, New York
4. Yamanouchi T, Sugiura M (1960) Introduction to continuous groups. Baifukan, Tokyo.
(in Japanese)
5. Dennery P, Krzywicki A (1996) Mathematics for physicists. Dover, New York
6. Inami T (1998) Ordinary differential equations. Iwanami, Tokyo. (in Japanese)
7. Riley KF, Hobson MP, Bence SJ (2006) Mathematical methods for physics and engineering, 3rd
edn. Cambridge University Press, Cambridge
8. Arfken GB, Weber HJ, Harris FE (2013) Mathematical methods for physicists, 7th edn. Academic Press, Waltham
9. Hassani S (2006) Mathematical physics. Springer, New York
Part IV
Group Theory and Its Chemical
Applications

Universe comprises space and matter. These two mutually stipulate their modality of
existence. We often comprehend related various aspects as manifestation of sym-
metry. In this part we deal with the symmetry from a point of view of group theory.
In this last part of the book, we outline and emphasize chemical applications of the
methods of mathematical physics. This part supplies us with introductory description
of group theory. The group theory forms an important field of both pure and applied
mathematics. Starting with the definition of groups, we cover a variety of topics
related to group theory. Of these, symmetry groups are familiar to chemists, because
they deal with a variety of matter and molecules that are characterized by different
types of symmetry. The symmetry group is a kind of finite group and called a point
group as well. Meanwhile, we have various infinite groups that include rotation
group as a typical example. We also mention an introductory theory of the rotation group SO(3), which deals with an important topic of, e.g., Euler angles. We also treat successive coordinate transformations.
Next, we describe representation theory of groups. Schur’s lemmas and related
grand orthogonality theorem underlie the representation theory of groups. In parallel,
characters and irreducible representations are important concepts that support the
representation theory. We present various representations, e.g., regular representa-
tion, direct-product representation, and symmetric and antisymmetric representa-
tions. These have wide applications in the field of quantum mechanics and quantum
chemistry, and so forth.
On the basis of the above topics, we highlight quantum chemical applications of
group theory in relation to a method of molecular orbitals. As tangible examples, we
adopt aromatic molecules and methane.
The last part deals with the theory of continuous groups. The relevant theory has wide applications in many fields of both pure and applied physics and chemistry. We highlight the topics of SU(2) and SO(3) that very often appear there. Tangible examples help understand the essence.
Chapter 16
Introductory Group Theory

A group comprises mathematical elements that satisfy four simple definitions. A bunch of groups exists under these simple definitions. This makes the group theory a
discriminating field of mathematics. To get familiar with various concepts of groups,
we first show several tangible examples. Group elements can be numbers (both real
and complex) and matrices. More abstract mathematical elements can be included as
well. Examples include transformation, operation, etc. as already studied in previous
parts. Once those mathematical elements form a group, they share several common
notions such as classes, subgroups, and direct-product groups. In this context,
readers are encouraged to conceive different kinds of groups close to their heart.
Mapping is an important concept as in the case of vector spaces. In particular,
isomorphism and homomorphism frequently appear in the group theory. These
concepts are closely related to the representation theory that is an important pillar
of the group theory.

16.1 Definition of Groups

In contrast to a broad range of applications, the definition of the group is simple. Let
ℊ be a set of elements gν, where ν is an index either countable (e.g., integers) or
uncountable (e.g., real numbers) and the number of elements may be finite or
infinite. We denote this by ℊ ¼ {gν}. If a group is a finite group, we express it as

$$\mathscr{G} = \{g_1, g_2, \cdots, g_n\}, \tag{16.1}$$

where n is said to be an order of the group.


Definition of the group comprises the following four axioms with respect to a
well-defined “multiplication” rule between any pair of elements. The multiplication


is denoted by a symbol " ⋄ " below. Note that the symbol ⋄ implies an ordinary
multiplication, an ordinary addition, etc.
(A1) If a and b are any elements of the set ℊ, then so is a ⋄ b. (We sometimes say that the set is "closed" regarding the multiplication.)
(A2) Multiplication is associative; i.e., a ⋄ (b ⋄ c) = (a ⋄ b) ⋄ c.
(A3) The set ℊ contains an element e called the identity element such that we have a ⋄ e = e ⋄ a = a for any element a of ℊ.
(A4) For any a of ℊ, we have an element b such that a ⋄ b = b ⋄ a = e. The element b is said to be the inverse element of a. We denote b ≡ a⁻¹.
In the above definitions, we assume that the commutative law does not necessarily hold, that is, a ⋄ b ≠ b ⋄ a. In that case the group ℊ is said to be a noncommutative group. However, we have cases where the commutative law holds, i.e., a ⋄ b = b ⋄ a. If so, the group ℊ is called a commutative group or an Abelian group.
Let us think of some examples of groups. Henceforth, we follow the convention
and write ab to express a ⋄ b.
Example 16.1 We present several examples of groups below. Examples (i)–(iv) are
simple, but Example (v) is general.
(i) ℊ = {1, −1}. The set ℊ makes a group with respect to multiplication. This is an example of a finite group.
(ii) ℊ = {⋯, −3, −2, −1, 0, 1, 2, 3, ⋯}. The set ℊ makes a group with respect to addition. This is an infinite group. For instance, take a (>0) and form a + 1, then (a + 1) + 1, [(a + 1) + 1] + 1, ⋯ again and again. No finite set could thus be closed under the addition and, hence, we must have an infinite group.
 
(iii) Let us start with a matrix $a = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$. Then, the inverse is $a^{-1} = a^3 = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$. We have $a^2 = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}$; its inverse is $a^2$ itself. These four elements make a group. That is, ℊ = {e, a, a², a³}. This is an example of cyclic groups.
(iv) ℊ ¼ {1}. It is a most trivial case, but sometimes the trivial case is very
important as well. We will come back later to this point.
(v) Let us think of a more general case. In Chap. 11, we discussed endomorphisms on a vector space and showed the necessary and sufficient condition for the existence of an inverse transformation. In this relation, we consider a set that comprises matrices such that

GL(n, ℂ) ≡ {A = (aij) | i, j = 1, 2, ⋯, n; aij ∈ ℂ, det A ≠ 0}.

This may be either a finite group or an infinite group. The former can be a symmetry group and the latter can be a rotation group. This group is characterized by a set of invertible and endomorphic linear transformations over a vector space Vⁿ; it is called a linear transformation group or a general linear group and is denoted by GL(n, ℂ), GL(Vⁿ), GL(V), etc. The relevant transformations are bijective. We can readily make sure that the axioms (A1) to (A4) are satisfied with GL(n, ℂ). Here, the vector space can be ℂⁿ or a function space.
The structure of a finite group is tabulated in a multiplication table. This is made up such that the group elements are arranged in the first row and the first column and the intersection of an element gi in the row and gj in the column is designated as the product gi ⋄ gj. Choosing the above (iii) as an example, we make its multiplication table (see Table 16.1), where we define a² ≡ b and a³ ≡ c.

Table 16.1 Multiplication table of ℊ = {e, a, a², a³}

ℊ | e | a | b ≡ a² | c ≡ a³
e | e | a | b | c
a | a | b | c | e
b | b | c | e | a
c | c | e | a | b
Having a look at Table 16.1, we notice that in the individual rows and columns
each group element appears once and only once. This is well known as the rearrangement theorem.
Theorem 16.1: Rearrangement Theorem [1] In each row or each column in the
group multiplication table, individual group elements appear once and only once.
From this, each row and each column list merely rearranged group elements.
Proof Let a set ℊ = {g1 ≡ e, g2, ⋯, gn} be a group. Arbitrarily choosing any element h from ℊ and multiplying the individual elements by h, we obtain a set H ≡ {hg1, hg2, ⋯, hgn}. Then, all the group elements of ℊ appear in H once and only once. Choosing any group element gi, let us multiply gi by h⁻¹ to get h⁻¹gi. Since h⁻¹gi must be a certain element gk of ℊ, we put h⁻¹gi = gk. Multiplying both sides by h, we have gi = hgk. Therefore, we are able to find this very element hgk in H, i.e., gi in H. This implies that the element gi necessarily appears in H. Suppose in turn that gi appears more than once. Then, we must have gi = hgk = hgl (k ≠ l). Multiplying the relation by h⁻¹, we would get h⁻¹gi = gk = gl, in contradiction to the supposition. This means that gi appears in H once and only once. This confirms that the theorem is true of each row of the group multiplication table.
A similar argument applies with a set H′ ≡ {g1h, g2h, ⋯, gnh}. This confirms in turn that the theorem is true of each column of the group multiplication table. These complete the proof. ∎

16.2 Subgroups

As we think of subspaces in a linear vector space, we have subgroups in a group. The definition of a subgroup is that a subset H of a group forms a group with respect to the multiplication ⋄ that is defined for the group ℊ. The identity element makes a group by itself. Both {e} and ℊ are subgroups as well. We often call subgroups other than {e} and ℊ “proper” subgroups.
A necessary and sufficient condition for the subset H to be a subgroup is the following:
(1) hi, hj ∈ H ⟹ hi ⋄ hj ∈ H.
(2) h ∈ H ⟹ h⁻¹ ∈ H.
If H is a subgroup of ℊ, it is obvious that the relations (1) and (2) hold. Conversely, if (1) and (2) hold, H is a subgroup. In fact, (1) ensures the aforementioned relation (A1). Since H is a subset of ℊ, this guarantees the associative law (A2). The relation (2) ensures (A4). Finally, by virtue of (1) and (2), h ⋄ h⁻¹ = e is contained in H; this implies that (A3) is satisfied. Thus, H is a subgroup, because H satisfies the axioms (A1) to (A4). Of the above examples, (iii) has a subgroup H = {e, a²}.
It is important to decompose a set into subsets that do not mutually share an element (except for a special element). We saw this in Part III when we decomposed a linear vector space into subspaces. In that case, the said special element was the zero vector. Here, let us consider a related question in a similar aspect.
Let H = {h1 ≡ e, h2, ⋯, hs} be a subgroup of ℊ. Also, let us consider aH, where a ∈ ℊ and a ∉ H. Suppose that aH is a subset of ℊ such that aH = {ah1, ah2, ⋯, ahs}. Then, we have another subset H + aH. If H contains s elements, so does aH. In fact, if it were not the case, namely, if ahi = ahj (i ≠ j), multiplying both sides by a⁻¹ we would have hi = hj, in contradiction. Next, let us take b such that b ∉ H and b ∉ aH and make up bH and H + aH + bH successively. Our question is whether these procedures decompose ℊ into subsets that are mutually exclusive and collectively exhaustive.
Suppose that we can succeed in such a decomposition and get

ℊ = g1H + g2H + ⋯ + gkH,  (16.2)

where g1, g2, ⋯, gk are mutually different elements with g1 being the identity e. In that case, (16.2) is said to be the left coset decomposition of ℊ by H. Similarly, the right coset decomposition can be done to give

ℊ = Hg1 + Hg2 + ⋯ + Hgk.  (16.3)

In general, however,

gkH ≠ Hgk or gkHgk⁻¹ ≠ H.  (16.4)

Taking the case of the left cosets as an example, let us examine whether different cosets mutually contain a common element. Suppose that giH and gjH mutually contain a common element. Then, that element would be expressed as gihp = gjhq (1 ≤ i, j ≤ n; 1 ≤ p, q ≤ s). Thus, we have gihphq⁻¹ = gj. Since H is a subgroup of ℊ, hphq⁻¹ ∈ H. This implies that gj ∈ giH. It is in contradiction to the definition of the left cosets. Thus, we conclude that different cosets do not mutually contain a common element.
Suppose that the order of ℊ and H is n and s, respectively. The k different cosets comprise s elements individually, and different cosets do not mutually possess a common element; hence, we must have

n = sk,  (16.5)

where k is called the index of H. We will see many examples afterward.
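Equation (16.5) can be confirmed by a direct computation. Below is a minimal Python sketch (again assuming NumPy; the helper name contains is ours) that decomposes the cyclic group ℊ = {e, a, a², a³} of Example 16.1 (iii) into left cosets of its subgroup H = {e, a²}:

```python
import numpy as np

a = np.array([[0, -1], [1, 0]])
G = [np.linalg.matrix_power(a, k) for k in range(4)]   # {e, a, a^2, a^3}
H = [np.linalg.matrix_power(a, k) for k in (0, 2)]     # subgroup {e, a^2}

def contains(subset, m):
    return any(np.array_equal(m, g) for g in subset)

# Form left cosets gH, skipping any g already covered by an earlier coset.
cosets = []
for g in G:
    if not any(contains(c, g) for c in cosets):
        cosets.append([g @ h for h in H])

n, s, k = len(G), len(H), len(cosets)
print(n, s, k)       # 4 2 2
assert n == s * k    # Eq. (16.5): order = (order of H) x (index of H)
```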

16.3 Classes

Another way to decompose a group into subsets is the decomposition into conjugacy classes. A conjugate element is defined as follows: Let a be an element arbitrarily chosen from a group. Then, an element gag⁻¹ is called a conjugate element or conjugate to a. If c is conjugate to b and b is conjugate to a, then c is conjugate to a. It is because

c = g′bg′⁻¹, b = gag⁻¹ ⟹ c = g′bg′⁻¹ = g′gag⁻¹g′⁻¹ = (g′g)a(g′g)⁻¹.  (16.6)

In the above, a set containing a and all the elements conjugate to a is said to be a (conjugate) class of a. Denoting this set by ∁a, we have

∁a = {a, g2ag2⁻¹, g3ag3⁻¹, ⋯, gnagn⁻¹}.  (16.7)

In ∁a the same element may appear repeatedly. It is obvious that in every group the identity element e forms a class by itself. That is,

∁e = {e}.  (16.8)

As in the case of the decomposition of a group into (left or right) cosets, we can decompose a group into classes. If the group elements are not exhausted by a set comprising ∁e and ∁a, let us take b such that b ≠ e and b ∉ ∁a and make ∁b similarly to (16.7). Repeating this procedure, we should be able to decompose a group into classes. In fact, if the group elements have not yet been exhausted after these procedures, take a remaining element z and make a class. If only z remains at this moment, z makes a class by itself (as in the case of e). Notice that for an Abelian group every element makes a class by itself. Thus, with a finite group we have a decomposition such that

ℊ = ∁e + ∁a + ∁b + ⋯ + ∁z.  (16.9)

To show that (16.9) is really a decomposition, suppose, for instance, that a set ∁a ∩ ∁b is not an empty set and that x ∈ ∁a ∩ ∁b. Then, we must have α and β that satisfy the following relation: x = αaα⁻¹ = βbβ⁻¹, i.e., b = β⁻¹αaα⁻¹β = (β⁻¹α)a(β⁻¹α)⁻¹. This implies that b has already been included in ∁a, in contradiction to the supposition. Thus, (16.9) is in fact a decomposition of ℊ into a finite number of classes.
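The class decomposition (16.9) is easy to carry out numerically. The sketch below (Python with NumPy; the group used — the 3×3 permutation matrices, i.e., the smallest noncommutative group — is our own illustration, not an example from the text, and the helper names are ours) decomposes that group into its three classes:

```python
import itertools
import numpy as np

# All 3x3 permutation matrices: a noncommutative group of order 6.
G = [np.eye(3, dtype=int)[list(p)] for p in itertools.permutations(range(3))]

def index_of(m):
    return next(i for i, g in enumerate(G) if np.array_equal(m, g))

def conj_class(a):
    # The class of a: all g a g^{-1}; for these orthogonal matrices g^{-1} = g^T.
    return sorted({index_of(g @ a @ g.T) for g in G})

classes = []
for i, a in enumerate(G):
    if not any(i in c for c in classes):
        classes.append(conj_class(a))

print([len(c) for c in classes])  # [1, 3, 2]: identity, transpositions, 3-cycles
```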
In the above, we thought of a class conjugate to a single element. This notion can be extended to a class conjugate to a subgroup. Let H be a subgroup of ℊ and let g be an element of ℊ. Let us now consider a set H′ = gHg⁻¹. The set H′ is a subgroup of ℊ and is called a conjugate subgroup.
In fact, let hi and hj be any two elements of H, that is, let ghig⁻¹ and ghjg⁻¹ be any two elements of H′. Then, we have

(ghig⁻¹)(ghjg⁻¹) = ghihjg⁻¹ = ghkg⁻¹,  (16.10)

where hk = hihj ∈ H. Hence, ghkg⁻¹ ∈ H′. Meanwhile, (ghig⁻¹)⁻¹ = ghi⁻¹g⁻¹ ∈ H′. Thus, the conditions (1) and (2) of Sect. 16.2 are satisfied with H′. Therefore, H′ is a subgroup of ℊ. The subgroup H′ has the same order as H. This is because, for any two different elements hi and hj, ghig⁻¹ ≠ ghjg⁻¹.
If for ∀g ∈ ℊ and a subgroup H we have the following equality

g⁻¹Hg = H,  (16.11)

such a subgroup H is said to be an invariant subgroup. If (16.11) holds, H should be a sum of classes (readers, please show this). A set comprising only the identity, i.e., {e}, forms a class. Therefore, if H is a proper subgroup, H must contain two or more classes. The relation (16.11) can be rewritten as

gH = Hg.  (16.12)

This implies that the left coset is identical to the right coset. Thus, as far as we are dealing with a coset pertinent to an invariant subgroup, we do not have to distinguish left and right cosets.
Now let us consider anew the (left) coset decomposition of ℊ by an invariant subgroup H:

ℊ = g1H + g2H + ⋯ + gkH,  (16.13)

where we have H = {h1 ≡ e, h2, ⋯, hs}. Then, the multiplication of two elements that belong to the cosets giH and gjH is expressed as

(gihl)(gjhm) = gi(gjgj⁻¹)hlgjhm = gigj(gj⁻¹hlgj)hm = gigjhphm = gigjhq,  (16.14)

where the third equality comes from (16.11). That is, we should have ∃hp such that gj⁻¹hlgj = hp, and hphm = hq. In (16.14), hα ∈ H (α stands for l, m, p, q, etc. with 1 ≤ α ≤ s). Note that gigjhq ∈ gigjH. Accordingly, a product of elements belonging to giH and gjH belongs to gigjH. We rewrite (16.14) as a relation between the sets:

(giH)(gjH) = gigjH.  (16.15)

Viewing the LHS of (16.15) as a product of two cosets, we find that the said product is a coset as well. This implies that a collection of the cosets forms a group. Such a group that possesses the cosets as elements is said to be a factor group or quotient group. In this context, the multiplication is a product of cosets. We denote the factor group by

ℊ/H.

The identity element of this factor group is H. This is because, putting gi = e in (16.15), we get H(gjH) = gjH. Alternatively, putting gj = e, we have (giH)H = giH. Moreover, putting gj = gi⁻¹ in (16.15), we get

(giH)(gi⁻¹H) = gigi⁻¹H = H.  (16.16)

Hence, (giH)⁻¹ = gi⁻¹H. That is, the inverse element of giH is gi⁻¹H.

16.4 Isomorphism and Homomorphism

As in the case of the linear vector spaces, we consider mappings between group elements. Of these, the notions of isomorphism and homomorphism are important.
Definition 16.1 Let ℊ = {x, y, ⋯} and ℊ′ = {x′, y′, ⋯} be groups and let a mapping ℊ → ℊ′ exist. Suppose that there is a one-to-one correspondence (i.e., an injective mapping)

x ↔ x′, y ↔ y′, ⋯

between the elements such that xy = z implies x′y′ = z′ and vice versa. Meanwhile, any element in ℊ′ must be the image of some element of ℊ; that is, the mapping is surjective as well and, hence, the mapping is bijective. Then, the two groups ℊ and ℊ′ are said to be isomorphic. The relevant mapping is called an isomorphism. We symbolically denote this relation by

ℊ ≅ ℊ′.

Note that the aforementioned groups can be either finite or infinite. We did not designate the identity elements. Suppose that x is the identity e. Then, from the relations xy = z and x′y′ = z′ we have

ey = z = y, x′y′ = z′ = y′.  (16.17)

Then, we get

x′ = e′, i.e., e ↔ e′.  (16.18)

Also, let us put y = x⁻¹. Then,

xx⁻¹ = z = e, x′(x⁻¹)′ = e′, x′y′ = z′ = e′.  (16.19)

Comparing the second and third equations of (16.19), we get

y′ = (x′)⁻¹ = (x⁻¹)′.  (16.20)

The bijective character mentioned above can be loosened somewhat in such a way that the one-to-one correspondence is replaced with an n-to-one correspondence. We have the following definition.
Definition 16.2 Let ℊ = {x, y, ⋯} and ℊ′ = {x′, y′, ⋯} be groups. Also, let a mapping ρ: ℊ → ℊ′ exist such that, with any two arbitrarily chosen elements, the following relation holds:

ρ(x)ρ(y) = ρ(xy).  (16.21)

Then, the two groups ℊ and ℊ′ are said to be homomorphic. The relevant mapping is called a homomorphism. We symbolically denote this relation by

ℊ ∼ ℊ′.

In this case, we have

ρ(e)ρ(e) = ρ(ee) = ρ(e), i.e., ρ(e) = e′,

where e′ is the identity element of ℊ′. Also, we have

ρ(x)ρ(x⁻¹) = ρ(xx⁻¹) = ρ(e) = e′.

Therefore,

[ρ(x)]⁻¹ = ρ(x⁻¹).

The two groups can be either finite or infinite. Note that in the above the mapping is not necessarily injective. The mapping may or may not be surjective. Regarding the identity and inverse elements, we have the same relations as (16.18) and (16.20). From Definitions 16.1 and 16.2, we say that a bijective homomorphism is an isomorphism.
Let us introduce the important notion of a kernel of a mapping. In this regard, we have the following definition.
Definition 16.3 Let ℊ = {e, x, y, ⋯} and ℊ′ = {e′, x′, y′, ⋯} be groups and let e and e′ be their identity elements. Suppose that there exists a homomorphic mapping ρ: ℊ → ℊ′. Also, let F be a subset of ℊ such that

ρ(F) = e′.  (16.22)

Then, F is said to be a kernel of ρ.


Regarding the kernel, we have the following important theorems.
Theorem 16.2 Let ℊ = {e, x, y, ⋯} and ℊ′ = {e′, x′, y′, ⋯} be groups, where e and e′ are the identity elements. A necessary and sufficient condition for a surjective and homomorphic mapping ρ: ℊ → ℊ′ to be isomorphic is that the kernel F = {e}.
Proof We assume that F = {e}. Suppose that ρ(x) = ρ(y). Then, we have

ρ(x)[ρ(y)]⁻¹ = ρ(x)ρ(y⁻¹) = ρ(xy⁻¹) = e′.  (16.23)

The first and second equalities result from the homomorphism of ρ. Since F = {e}, xy⁻¹ = e, i.e., x = y. Therefore, ρ is injective (i.e., a one-to-one correspondence). As ρ is surjective from the assumption, ρ is bijective. The mapping ρ is isomorphic accordingly.
Conversely, suppose that ρ is isomorphic. Also, suppose that ρ(x) = e′ for some x ∈ ℊ. From (16.18), ρ(e) = e′. We have ρ(x) = ρ(e) = e′ ⟹ x = e due to the isomorphism of ρ (i.e., the one-to-one correspondence). This implies F = {e}. This completes the proof. ∎
We become aware of a close relationship between Theorem 16.2 and the linear transformation versus its kernel already mentioned in Sect. 11.2 of Part III. Figure 16.1 shows this relationship. Figure 16.1a represents the homomorphic mapping ρ: ℊ → ℊ′ between two groups, whereas Fig. 16.1b shows the linear transformation (endomorphism) A: Vⁿ → Vⁿ in a vector space Vⁿ.

Fig. 16.1 Mapping in a group and vector space. (a) Homomorphic mapping ρ: ℊ → ℊ′ between two groups. (b) Linear transformation (endomorphism) A: Vⁿ → Vⁿ in a vector space Vⁿ; such a transformation is invertible (bijective) when its kernel is trivial
Theorem 16.3 Suppose that there exists a homomorphic mapping ρ: ℊ → ℊ′, where ℊ and ℊ′ are groups. Then, the kernel F of ρ is an invariant subgroup of ℊ.
Proof Let ki and kj be any two arbitrarily chosen elements of F. Then,

ρ(ki) = ρ(kj) = e′,  (16.24)

where e′ is the identity element of ℊ′. From (16.21) we have

ρ(kikj) = ρ(ki)ρ(kj) = e′e′ = e′.  (16.25)

Therefore, kikj ∈ F. Meanwhile, from (16.20) we have

ρ(ki⁻¹) = [ρ(ki)]⁻¹ = e′⁻¹ = e′.  (16.26)

Then, ki⁻¹ ∈ F. Thus, F is a subgroup of ℊ.
Next, for ∀g ∈ ℊ we have

ρ(gkig⁻¹) = ρ(g)ρ(ki)ρ(g⁻¹) = ρ(g)e′ρ(g⁻¹) = e′.  (16.27)

Accordingly, we have gkig⁻¹ ∈ F. Thus, gFg⁻¹ ⊂ F. Since g is chosen arbitrarily, replacing it with g⁻¹ we have g⁻¹Fg ⊂ F. Multiplying by g and g⁻¹ on both sides from the left and the right, respectively, we get F ⊂ gFg⁻¹. Consequently, we get

gFg⁻¹ = F.  (16.28)

This implies that the kernel F of ρ is an invariant subgroup of ℊ. ∎



Theorem 16.4: Homomorphism Theorem Let ℊ = {x, y, ⋯} and ℊ′ = {x′, y′, ⋯} be groups and let a homomorphic (and surjective) mapping ρ: ℊ → ℊ′ exist. Also, let F be the kernel of ρ. Let us define a surjective mapping ρ̃: ℊ/F → ℊ′ such that

ρ̃(giF) = ρ(gi).  (16.29)

Then, ρ̃ is an isomorphic mapping.
Proof From (16.15) and (16.21), it is obvious that ρ̃ is homomorphic. The confirmation is left for readers. Let giF and gjF be two different cosets. Suppose here that ρ(gi) = ρ(gj). Then, we have

ρ(gi⁻¹gj) = ρ(gi⁻¹)ρ(gj) = [ρ(gi)]⁻¹ρ(gj) = [ρ(gi)]⁻¹ρ(gi) = e′.  (16.30)

This implies that gi⁻¹gj ∈ F. That is, we would have gj ∈ giF. This is in contradiction to the definition of a coset. Thus, we should have ρ(gi) ≠ ρ(gj). In other words, the different cosets giF and gjF are mapped into the different elements ρ(gi) and ρ(gj) in ℊ′. That is, ρ̃ is isomorphic; i.e., ℊ/F ≅ ℊ′. ∎

16.5 Direct-Product Groups

So far, we have investigated basic properties of groups. In Sect. 16.4, we examined factor groups. The homomorphism theorem shows that the factor group is characterized by division; in the case of a finite group, the order of the group is reduced. In this section, we study the opposite operation, i.e., the properties of a direct product of groups, or direct-product groups.
Let H = {h1 ≡ e, h2, ⋯, hm} and H′ = {h′1 ≡ e, h′2, ⋯, h′n} be groups of the order m and n, respectively. Suppose that (i) ∀hi (1 ≤ i ≤ m) and ∀h′j (1 ≤ j ≤ n) commute, i.e., hih′j = h′jhi, and that (ii) H ∩ H′ = {e}. Under these conditions, let us construct a set ℊ such that

ℊ = {h1h′1 ≡ e, hih′j (1 ≤ i ≤ m, 1 ≤ j ≤ n)}.  (16.31)

In other words, ℊ is a set comprising the mn elements hih′j. A product of elements is defined as

(hih′j)(hkh′l) = hihkh′jh′l = hph′q,  (16.32)

where hp = hihk and h′q = h′jh′l. The identity element is ee = e. The inverse element is (hih′j)⁻¹ = h′j⁻¹hi⁻¹ = hi⁻¹h′j⁻¹. The associative law is obvious from hih′j = h′jhi. Thus, ℊ forms a group. This is said to be a direct product of groups, or a direct-product group. The groups H and H′ are called direct factors of ℊ. In this case, we succinctly represent

ℊ = H × H′.

In the above, the condition (ii) is equivalent to the statement that ∀g ∈ ℊ is uniquely represented as

g = hh′; h ∈ H, h′ ∈ H′.  (16.33)

In fact, suppose that H ∩ H′ = {e} and that g can be represented in two ways such that

g = h1h′1 = h2h′2; h1, h2 ∈ H, h′1, h′2 ∈ H′.  (16.34)

Then, we have

h2⁻¹h1 = h′2h′1⁻¹; h2⁻¹h1 ∈ H, h′2h′1⁻¹ ∈ H′.  (16.35)

From the supposition, we get

h2⁻¹h1 = h′2h′1⁻¹ = e.  (16.36)

That is, h2 = h1 and h′2 = h′1. This means that the representation is unique. Conversely, suppose that the representation is unique and that x ∈ H ∩ H′. Then, we must have

x = xe = ex.  (16.37)

Thanks to the uniqueness of the representation, x = e. This implies H ∩ H′ = {e}.


Now suppose h 2 H . Then for 8 g2ℊ putting g ¼ hν h0μ , we have

1 1
ghg1 ¼ hν h0μ hh0μ hν 1 ¼ hν hh0μ h0μ hν 1 ¼ hν hhν 1 2 H : ð16:38Þ

Then we have gH g1 ⊂ H . Similarly to the proof of Theorem 16.3, we get

gH g1 ¼ H : ð16:39Þ

This shows that H is an invariant subgroup of ℊ. Similarly, H 0 is an invariant


subgroup as well.
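A direct-product group is equally easy to assemble numerically. The sketch below (Python with NumPy; the two commuting order-2 matrix groups are our own illustration) checks the closure rule (16.32) and the uniqueness of the representation (16.33):

```python
import numpy as np

# Two commuting matrix groups of order 2 whose intersection is only the identity.
H  = [np.diag([1, 1, 1]), np.diag([-1, -1, 1])]   # {e, h}
Hp = [np.diag([1, 1, 1]), np.diag([1, 1, -1])]    # {e, h'}

# The direct product: all mn products h h', Eq. (16.31).
G = [h @ hp for h in H for hp in Hp]

# Unique representation g = h h', Eq. (16.33): the mn products are all distinct.
keys = {tuple(g.ravel()) for g in G}
assert len(keys) == len(H) * len(Hp)

# Closure under the product rule of Eq. (16.32): G is itself a group of order 4.
assert all(tuple((x @ y).ravel()) in keys for x in G for y in G)
```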

Regarding the unique representation of the group element of a direct-product


group, we become again aware of the close relationship between the direct product
and direct sum that was mentioned earlier in Part III.

Reference

1. Cotton FA (1990) Chemical applications of group theory. Wiley, New York


Chapter 17
Symmetry Groups

We have many opportunities to observe symmetry, both macroscopic and microscopic, in the natural world. First, we need to formulate the symmetry appropriately. For this purpose, we must regard various symmetry operations as mathematical elements and classify these operations under several categories. In Part III we examined various properties of vectors and their transformations. We also showed that a vector transformation can be viewed as a coordinate transformation. On these topics, we focused upon abstract concepts in various ways. On another front, however, we have not paid attention to specific geometric objects, especially molecules. In this chapter, we study the symmetry of these concrete objects. For this, it will be indispensable to correctly understand a variety of symmetry operations. At the same time, we deal with the vector and coordinate transformations as group elements. Among such transformations, rotations occupy a central place in group theory and related fields of mathematics. Regarding the three-dimensional Euclidean space, SO(3) is particularly important. This is characterized by an infinite group, in contrast to the various symmetry groups (or point groups) we investigate in the earlier parts of this chapter.

17.1 A Variety of Symmetry Operations

To understand various aspects of symmetry operations, it is convenient and essential


to consider a general point that is fixed in a three-dimensional Euclidean space and to
examine how this point is transformed in the space. In parallel to the description in
Part III, we express the coordinate of the general point P as


P = \begin{pmatrix} x \\ y \\ z \end{pmatrix}.  (17.1)

Note that P may be on, within, or outside a geometric object or molecule that we are dealing with. The relevant position vector x for P is expressed as

x = xe1 + ye2 + ze3 = (e1 e2 e3)\begin{pmatrix} x \\ y \\ z \end{pmatrix},  (17.2)

where e1, e2, and e3 denote the orthonormal basis vectors pointing to the positive directions of the x-, y-, and z-axes, respectively. Similarly, we denote a linear transformation A by

A(x) = (e1 e2 e3)\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix}.  (17.3)

Among various linear transformations that are represented by matrices, orthogonal


transformations are the simplest and most widely used. We use orthogonal matrices
to represent the orthogonal transformations accordingly.
Let us think of a movement or translation of a geometric object and an operation that causes such a movement. First, suppose that the geometric object is fixed on a coordinate system. Then, the object is moved (or translated) to another place. If before and after such a movement (or translation) one cannot tell whether the object has been moved, we say that the object possesses a “symmetry” in a certain sense. Thus, we have to specify this symmetry. In that context, group theory deals with the symmetry and defines it clearly.
To tell whether the object has been moved (to another place), we usually distinguish it by a change in (i) positional relationship and (ii) attribute or property. To make the situation simple, let us consider the following example:
Example 17.1 We have two round disks, i.e., Disk A and Disk B. Suppose that
Disk A is a solid white disk, whereas Disk B is partly painted black (see Fig. 17.1). In
Fig. 17.1 we are thinking of a rotation of an object (e.g., round disk) around an axis
standing on its center and stretching perpendicularly to the object plane.
If an arbitrarily chosen positional vector fixed on the object before the rotation is
moved to another position that was not originally occupied by the object, then we
recognize that the object has certainly been moved. For instance, imagine that a
round disk having a through-hole located aside from center is rotating. What about
the case where that position was originally occupied by the object, then? We have
Fig. 17.1 Rotation of an object. (a) Case where we cannot recognize that the object has been moved. (b) Case where we can recognize that the object has been moved because of its attribute (i.e., because the round disk is partly painted black)

two possibilities. The first alternative is that we cannot recognize that the object has
been moved. The second one is that we can yet recognize that the object has been
moved. According to Fig. 17.1a, b, we have the former case and the latter case, respectively. In the latter case, we have recognized the movement of the object by its attribute, i.e., by the fact that the object is partly painted black.
However, we do not have to be rigorous here. We have a clear intuitive criterion
for a judgment of whether a geometric object has been moved. From now on, we
assume that the geometric character of an object is pertinent to both its positional
relationship and attribute. Thus, we define the equivalent (or indistinguishable)
disposition of an object and the operation that yields such an equivalent disposition
as follows:
Definition 17.1
(i) Symmetry operation: A geometric operation that produces an equivalent
(or indistinguishable) disposition of an object.
(ii) Equivalent (or indistinguishable) disposition: Suppose that regarding a geomet-
ric operation of an object, we cannot recognize that the object has been moved
before and after that geometric operation. In that case, the original disposition of
the object and the resulting disposition reached after the geometric operation are
referred to as an equivalent disposition. The relevant geometric operation is the symmetry operation.
Here we should clearly distinguish translation (i.e., parallel displacement) from
the abovementioned symmetry operations. This is because for a geometric object to
possess the translation symmetry the object must be infinite in extent, typically an
infinite crystal lattice. The relevant discipline is widely studied as space group and
has a broad class of applications in physics and chemistry. However, we will not deal
with the space group or associated topics, but focus our attention upon symmetry
groups in this book.
In the above example, the rotation is a symmetry operation with Fig. 17.1a, but the said geometric operation is not a symmetry operation with Fig. 17.1b.
Let us further inspect properties of the symmetry operations. Let us consider a set H consisting of symmetry operations, and let a and b be any two symmetry operations of H. Then, (i) a ⋄ b is a symmetry operation as well. (ii) The multiplication of successive symmetry operations a, b, c is associative; i.e., a ⋄ (b ⋄ c) = (a ⋄ b) ⋄ c. (iii) The set H contains an element e called the identity element such that we have a ⋄ e = e ⋄ a = a with any element a of H. Operating “nothing” should be e. If a rotation is relevant, a 2π rotation is thought to be e. These are intuitively acceptable. (iv) For any a of H, we have an element b such that a ⋄ b = b ⋄ a = e. The element b is said to be the inverse element of a. We denote it by b ≡ a⁻¹. The inverse element corresponds to an operation that brings the disposition of a geometric object back to the original disposition. Thus, H forms a group. We call H satisfying the above criteria a symmetry group. A symmetry group is called a point group as well. This is because a point group comprises the symmetry operations of geometric objects as group elements and those objects have at least one fixed point after the relevant symmetry operation. The name of a point group comes from this fact.
As mentioned above, a symmetry operation is best characterized by a (3, 3) orthogonal matrix. In Example 17.1, e.g., the π rotation is represented by an orthogonal matrix A such that

A = \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.  (17.4)

This operation represents a π rotation around the z-axis. Let us think of another symmetry operation described by

B = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}.  (17.5)

This produces a mirror symmetry with respect to the xy-plane. Then, we have

C ≡ AB = BA = \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix}.  (17.6)

The operation C shows an inversion about the origin. Thus, A, B, and C along with an identity element E form a group. Here, E is expressed as

E = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.  (17.7)

The above group is represented by four three-dimensional diagonal matrices whose elements are 1 or −1. Therefore, it is evident that the inverse element of A, B, and C is A, B, and C itself, respectively. The said group is commutative (Abelian) and is said to be a four-group [1].
Meanwhile, we have a number of noncommutative groups. From the point of view of the matrix structure, the noncommutativity comes from the off-diagonal elements of the matrix. A typical example is a rotation matrix with a rotation angle different from zero or nπ (n: integer). For later use, let us have a matrix form that expresses a θ rotation around the z-axis. Figure 17.2 depicts a graphical illustration for this.

Fig. 17.2 Rotation by θ around the z-axis

A matrix R has the following form:

R = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}.  (17.8)

Note that R has been reduced. This implies that the three-dimensional Euclidean space is decomposed into a two-dimensional subspace (the xy-plane) and a one-dimensional subspace (the z-axis). The xy-coordinates are not mixed with the z-component after the rotation R. Note, however, that if the rotation axis is oblique to the xy-plane, this is not the case. We will come back to this point later.
Taking only the xy-coordinates in Fig. 17.3, we make a calculation. Using the addition theorem of trigonometric functions, we get

x′ = r cos(θ + α) = r(cos α cos θ − sin α sin θ) = x cos θ − y sin θ,  (17.9)

y′ = r sin(θ + α) = r(sin α cos θ + cos α sin θ) = y cos θ + x sin θ,  (17.10)

where we used x = r cos α and y = r sin α. Combining (17.9) and (17.10), in matrix form we get
Fig. 17.3 Transformation of the xy-coordinates by a θ rotation

\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}.  (17.11)

Equation (17.11) is the same as (11.31) and represents the transformation matrix of a rotation angle θ within the xy-plane. Whereas in Chap. 11 we considered this from the point of view of the transformation of basis vectors, here we deal with the transformation of coordinates in the fixed coordinate system. If θ ≠ nπ (n: integer), the off-diagonal elements do not vanish. Including the z-component, we get (17.8).
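For readers who wish to experiment, the rotation matrix (17.8) can be checked numerically. A minimal Python sketch (assuming the NumPy library) verifies its orthogonality and the additivity of rotation angles about a fixed axis:

```python
import numpy as np

def Rz(theta):
    """Rotation by theta around the z-axis, Eq. (17.8)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

t1, t2 = 0.3, 1.1
R = Rz(t1)
assert np.allclose(R.T @ R, np.eye(3))            # orthogonal: R^T R = E
assert np.isclose(np.linalg.det(R), 1.0)          # proper rotation: det = +1
assert np.allclose(Rz(t1) @ Rz(t2), Rz(t1 + t2))  # angles about one axis add up
```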
Let us summarize the symmetry operations and their (3, 3) matrix representations. The coordinates before and after a symmetry operation are expressed as

\begin{pmatrix} x \\ y \\ z \end{pmatrix} and \begin{pmatrix} x' \\ y' \\ z' \end{pmatrix}, respectively.

(i) Identity transformation:


To leave a geometric object or a coordinate system unchanged (or unmoved).
By convention, we denote it by a capital letter E. It is represented by a (3, 3)
identity matrix.
(ii) Rotation symmetry around a rotation axis:
Here a “proper” rotation is intended. We denote a rotation by a rotation axis and
its magnitude (i.e., rotation angle). Thus, we have
R_{zθ} = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}, R_{yϕ} = \begin{pmatrix} \cos\phi & 0 & \sin\phi \\ 0 & 1 & 0 \\ -\sin\phi & 0 & \cos\phi \end{pmatrix}, R_{xφ} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\varphi & -\sin\varphi \\ 0 & \sin\varphi & \cos\varphi \end{pmatrix}.  (17.12)

With R_{yϕ} we first consider the following coordinate transformation:

\begin{pmatrix} z' \\ x' \\ y' \end{pmatrix} = \begin{pmatrix} \cos\phi & -\sin\phi & 0 \\ \sin\phi & \cos\phi & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} z \\ x \\ y \end{pmatrix}.  (17.13)

This can easily be visualized in Fig. 17.4; consider that the cyclic permutation of (x y z)ᵀ produces (z x y)ᵀ. Shuffling the order of the coordinates, we get

\begin{pmatrix} x' \\ y' \\ z' \end{pmatrix} = \begin{pmatrix} \cos\phi & 0 & \sin\phi \\ 0 & 1 & 0 \\ -\sin\phi & 0 & \cos\phi \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix}.  (17.14)

Fig. 17.4 Rotation by ϕ around the y-axis

By convention of the symmetry groups, the following notation Cn is used to denote a rotation. A subscript n of Cn represents the order of the rotation axis. The order means the largest number n such that the rotation through 2π/n gives an equivalent configuration. Successive rotations performed m times (m < n) are denoted by

Cnᵐ.

If m = n, the successive rotations produce an equivalent configuration the same as the beginning; i.e., Cnⁿ = E. The rotation angles θ, φ, etc. used above are restricted to 2πm/n accordingly.
(iii) Mirror symmetry with respect to a plane of mirror symmetry:
We denote a mirror symmetry by a mirror symmetry plane (e.g., the xy-plane, yz-plane). We have

M_{xy} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}, M_{yz} = \begin{pmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, M_{zx} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.  (17.15)

The mirror symmetry is usually denoted by σ v, σ h, and σ d, whose subscripts


stand for “vertical,” “horizontal,” and “dihedral,” respectively. Among these
symmetry operations, σ v and σ d include a rotation axis in the symmetry plane,
while σ h is perpendicular to the rotation axis if such an axis exists. Notice that a
group belonging to Cs symmetry possesses only E and σ h. Although σ h can
exist by itself as a mirror symmetry, neither σ v nor σ d can exist as a mirror
symmetry by itself. We will come back to this point later.
(iv) Inversion symmetry with respect to a center of inversion:
We specify an inversion center if necessary, e.g., the origin O of a coordinate system. The operation of inversion is denoted by

I_O = \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix}.  (17.16)

Note that, as is obvious from the matrix form, I_O is commutable with any other symmetry operation. Note also that I_O can be expressed as successive symmetry operations, or a product of symmetry operations. For instance, we have

I_O = R_{zπ}M_{xy} = \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}.  (17.17)

Note that R_{zπ} and M_{xy} are commutable; i.e., R_{zπ}M_{xy} = M_{xy}R_{zπ}.

(v) Improper rotation:
This is a combination of a proper rotation and a reflection by a mirror symmetry plane. That is, a rotation around an axis is performed first and then a reflection is carried out by a mirror plane that is perpendicular to the rotation axis. For instance, an improper rotation is expressed as

M_{xy}R_{zθ} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}\begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & -1 \end{pmatrix}.  (17.18)

As mentioned just above, the inversion symmetry I can be viewed as an improper rotation. Note that in this case the reflection and rotation operations are commutable. However, we will follow the conventional custom that considers the inversion symmetry as an independent symmetry operation. Readers may well wonder why we need to consider the improper rotation. The answer is simple; it rests solely upon the axiom (A1) of the group theory: a group must be closed with respect to the multiplication.
The improper rotations are usually denoted by Sn. A subscript n again stands for the order of rotation.
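The relations (17.17) and (17.18) lend themselves to a quick numerical check. The following Python sketch (assuming NumPy) confirms that the inversion factorizes as R_{zπ}M_{xy} with the two factors commuting, and that an improper rotation has determinant −1:

```python
import numpy as np

def Rz(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

Mxy = np.diag([1.0, 1.0, -1.0])     # mirror in the xy-plane, Eq. (17.15)
IO  = np.diag([-1.0, -1.0, -1.0])   # inversion, Eq. (17.16)

# Eq. (17.17): the inversion equals a pi rotation about z times the xy mirror,
# and the two factors commute.
assert np.allclose(Rz(np.pi) @ Mxy, IO)
assert np.allclose(Mxy @ Rz(np.pi), IO)

# Eq. (17.18): an improper rotation S(theta) = Mxy Rz(theta) has det = -1,
# so it can never be produced by proper rotations alone.
S = Mxy @ Rz(0.7)
assert np.isclose(np.linalg.det(S), -1.0)
```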

17.2 Successive Symmetry Operations

Let us now consider successive reflections in different planes and successive rotations about different axes [2]. Figure 17.5a displays two reflections with respect to the planes σ and σ̃, both perpendicular to the xy-plane. The said planes make a dihedral angle θ, with their intersection line identical to the z-axis. Also, the plane σ is identical with the zx-plane. Suppose that an arrow lies on the xy-plane perpendicularly to the zx-plane. As in (17.15), the operation σ is represented as

σ = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.  (17.19)

Fig. 17.5 Successive two reflections about two planes σ and σ̃ that make an angle θ. (a) Reflections σ and σ̃ with respect to the two planes. (b) Successive operations of σ and σ̃ in this order; the combined operation is denoted by σ̃σ. The operations result in a 2θ rotation around the z-axis. (c) Successive operations of σ̃ and σ in this order; the combined operation is denoted by σσ̃. The operations result in a −2θ rotation around the z-axis

To determine a matrix representation of σ̃, we calculate the matrix again as in the above case; namely, σ̃ is obtained by transforming σ with the rotation R_{zθ}. As a result, we have

σ̃ = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} \cos 2\theta & \sin 2\theta & 0 \\ \sin 2\theta & -\cos 2\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}.  (17.20)

Notice that this matrix representation is referred to the original xyz-coordinate system; see the discussion of Sect. 11.4. Hence, we describe the successive transformations σ followed by σ̃ as

σ̃σ = \begin{pmatrix} \cos 2\theta & -\sin 2\theta & 0 \\ \sin 2\theta & \cos 2\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}.  (17.21)

The expression (17.21) means that the multiplication is done first by σ and then by σ̃; see Fig. 17.5b. Note that σ and σ̃ are conjugate to each other (Sect. 16.3). In this case, the combined operations produce a 2θ rotation around the z-axis.
If, on the other hand, the multiplication is made first by σ̃ and then by σ, we have a −2θ rotation around the z-axis; see Fig. 17.5c. As a matrix representation, we have

σσ̃ = \begin{pmatrix} \cos 2\theta & \sin 2\theta & 0 \\ -\sin 2\theta & \cos 2\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}.  (17.22)

Thus, successive operations of reflection about two planes that make a dihedral angle θ yield a rotation of 2θ around the z-axis (i.e., the intersection line of σ and σ̃). The operation σσ̃ is an inverse to σ̃σ. That is,

(σσ̃)(σ̃σ) = E.  (17.23)

We have det σ̃σ = det σσ̃ = 1.
Meanwhile, putting

R_{2θ} = \begin{pmatrix} \cos 2\theta & -\sin 2\theta & 0 \\ \sin 2\theta & \cos 2\theta & 0 \\ 0 & 0 & 1 \end{pmatrix},  (17.24)

we have

σ̃σ = R_{2θ} or σ̃ = R_{2θ}σ.  (17.25)

This implies the following: Suppose a plane σ and a straight line lying on it. Also, suppose that one first makes a reflection about σ and then makes a 2θ rotation around the said straight line. Then, the resulting transformation is equivalent to a reflection about σ̃ that makes an angle θ with σ. At the same time, the said straight line is the intersection line of σ and σ̃. Note that the dihedral angle between the two planes σ and σ̃ is half the angle of the rotation. Thus, any two of the symmetry operations related to (17.25) are mutually dependent; any two of them produce the third symmetry operation.
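The relations (17.20)–(17.25) can also be verified symbolically. The following sketch (Python with the SymPy library) checks that the two successive reflections produce rotations by ±2θ and that the two products are mutually inverse:

```python
from sympy import Matrix, cos, sin, simplify, symbols

t = symbols('theta', real=True)

def Rz(a):
    """Rotation by a around the z-axis, Eq. (17.8)."""
    return Matrix([[cos(a), -sin(a), 0],
                   [sin(a),  cos(a), 0],
                   [0, 0, 1]])

sigma = Matrix([[1, 0, 0], [0, -1, 0], [0, 0, 1]])  # Eq. (17.19): zx-plane mirror
sigma_tilde = Rz(t) * sigma * Rz(-t)                # Eq. (17.20); Rz(t)**-1 = Rz(-t)

zero = Matrix.zeros(3, 3)
# Eq. (17.21): sigma-tilde after sigma is a rotation by +2*theta ...
assert simplify(sigma_tilde * sigma - Rz(2 * t)) == zero
# Eq. (17.22): ... and sigma after sigma-tilde is a rotation by -2*theta.
assert simplify(sigma * sigma_tilde - Rz(-2 * t)) == zero
# Eq. (17.23): the two products are mutually inverse.
assert simplify((sigma * sigma_tilde) * (sigma_tilde * sigma) - Matrix.eye(3)) == zero
```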
In the above illustration, we did not take account of the presence of a symmetry axis. If the aforementioned axis is a symmetry axis Cn, we must have

2θ = 2π/n or n = π/θ.  (17.26)

Fig. 17.6 Successive two π rotations around two C2 axes, C_{xπ} and C_{aπ}, that intersect at an angle θ. Of these, C_{xπ} is identical to the x-axis

From a symmetry requirement, there should be n planes of mirror symmetry in


combination with the Cn axis. Moreover, an intersection line of these n mirror
symmetry planes should coincide with that Cn axis. This can be seen as various
Cnv groups.
Next, we consider other successive symmetry operations. Suppose that there are two C2 axes (C_{xπ} and C_{aπ} as shown) that intersect at an angle θ (Fig. 17.6). There, C_{xπ} is identical to the x-axis and the other (C_{aπ}) lies on the xy-plane making an angle θ with C_{xπ}. Following procedures similar to the above, we have matrix representations of the successive C2 operations in reference to the xyz-system such that

C_{xπ} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix},

C_{aπ} = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix}\begin{pmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} \cos 2\theta & \sin 2\theta & 0 \\ \sin 2\theta & -\cos 2\theta & 0 \\ 0 & 0 & -1 \end{pmatrix},  (17.27)

where C_{aπ} can be calculated similarly to (17.20). Again, we get


C_{aπ}C_{xπ} = \begin{pmatrix} \cos 2\theta & -\sin 2\theta & 0 \\ \sin 2\theta & \cos 2\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}, C_{xπ}C_{aπ} = \begin{pmatrix} \cos 2\theta & \sin 2\theta & 0 \\ -\sin 2\theta & \cos 2\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}.  (17.28)

Notice that (C_{xπ}C_{aπ})(C_{aπ}C_{xπ}) = E. Once again, putting

R_{2θ} = \begin{pmatrix} \cos 2\theta & -\sin 2\theta & 0 \\ \sin 2\theta & \cos 2\theta & 0 \\ 0 & 0 & 1 \end{pmatrix},  (17.29)

we have [1, 2]

C_{aπ}C_{xπ} = R_{2θ} or C_{aπ} = R_{2θ}C_{xπ}.  (17.30)

Note that in the above two illustrations for the successive symmetry operations, both
the relevant operators have been represented in reference to the original xyz-system.
For this reason, the latter operation was done from the left in (17.28).
From a symmetry requirement, once again the aforementioned C2 axes must be
present in combination with the Cn axis. Moreover, those C2 axes should be
perpendicular to the Cn axis. This can be seen in various Dn groups.
Another illustration of successive symmetry operations is an improper rotation. If
the rotation angle is π, this causes an inversion symmetry. In this illustration,
reflection, rotation, and inversion symmetries coexist. A C2h symmetry is a typical
example.
Equations (17.21), (17.22), and (17.29) demonstrate the same relation. Namely,
two successive mirror symmetry operations about a couple of planes and two
successive π-rotations about a couple of C2 axes cause the same effect with regard
to the geometric transformation. In this relation, we emphasize that two successive reflection operations make the determinant of the relevant matrix product 1. These aspects cause an interesting effect, and we will briefly discuss it in relation to the O and Td groups.
If furthermore the abovementioned mirror symmetry planes and C2 axes coexist,
the symmetry planes coincide with or bisect the C2 axes and vice versa. If these were
not the case, another mirror plane or C2 axis would be generated from the symmetry
requirement and the newly generated plane or axis would be coupled with the
original plane or axis. From the above argument, these processes again produce
another Cn axis. That must be prohibited.
Next, suppose that a Cn axis intersects obliquely with a plane of mirror symmetry.
A rotation of 2π/n around such an axis produces another mirror symmetry plane.
Fig. 17.7 Mirror symmetry plane σh and a sole rotation axis perpendicular to it

This newly generated plane intersects with the original mirror plane and produces a
different Cn axis according to the above discussion. Thus, in this situation a mirror
symmetry plane cannot coexist with a sole rotation axis. In a geometric object with
higher symmetry such as Oh, however, several mirror symmetry planes can coexist
with several rotation axes in such a way that the axes intersect with the mirror planes
obliquely. In case the Cn axis intersects perpendicularly to a mirror symmetry plane,
that plane can coexist with a sole rotation axis (see Fig. 17.7). This is actually the
case with a geometric object having a C2h symmetry. The mirror symmetry plane is
denoted by σ h.
Now, let us examine simple examples of molecules and associated symmetry
operations.
Example 17.2 Figure 17.8 shows chemical structural formulae of thiophene,
bithiophene, biphenyl, and naphthalene. These molecules belong to C2v, C2h, D2,
and D2h, respectively. Note that these symbols are normally used to show specific
point groups. Notice also that in biphenyl two benzene rings are twisted relative to
the molecular axis. As an example, a multiplication table is shown in Table 17.1 for a
C2v group. Table 17.1 clearly demonstrates that the group constitution of C2v differs
from that of the group appearing in Example 16.1 (iii), even though the order is four in both cases. Similar tables can be made for C2h and D2; this is left for readers as an
an exercise. We will find that the multiplication tables of these groups have the same
structure and that C2v, C2h, and D2 are all isomorphic to one another as a four group.
Table 17.2 gives matrix representation of symmetry operations for C2v. The repre-
sentation is defined as the transformation by the symmetry operations of a set of
basis vectors (x y z) in ℝ3.
Meanwhile, Table 17.3 gives the multiplication table of D2h. We recognize that the multiplication table of C2v appears in the upper-left and lower-right blocks. If we suitably rearrange the order of the group elements, we can make another multiplication table so that, e.g., C2h may appear in the upper-left and lower-right blocks. As in the case of Table 17.2, Table 17.4 summarizes the matrix representation of the symmetry operations for D2h. There are eight group elements, i.e., an identity, an inversion, three mutually perpendicular C2 axes, and three mutually perpendicular planes of mirror symmetry (σ). Here, we consider the possibility of constructing subgroups of D2h. The order of the subgroups must be a divisor of eight, and so let us list the subgroups whose order is four and examine how many such subgroups exist. We have ₈C₄ = 70 combinations, but those allowed should be restricted by the requirement
Fig. 17.8 Chemical structural formulae and point groups of (a) thiophene, (b) bithiophene, (c) biphenyl, and (d) naphthalene

Table 17.1 Multiplication table of C2v

C2v | E | C2(z) | σv(zx) | σv′(yz)
E | E | C2 | σv | σv′
C2(z) | C2 | E | σv′ | σv
σv(zx) | σv | σv′ | E | C2
σv′(yz) | σv′ | σv | C2 | E

Table 17.2 Matrix representation of symmetry operations for C2v

E: diag(1, 1, 1); C2 (around the z-axis): diag(−1, −1, 1); σv(zx): diag(1, −1, 1); σv′(yz): diag(−1, 1, 1)

of forming a group. This is because all the groups must contain the identity element, and so the number allowed is equal to or no greater than ₇C₃ = 35.
(i) In light of the aforementioned discussion, two C2 axes mutually intersecting at π/2 yield another C2 axis around the normal to the plane defined by the intersecting axes. Thus, three C2 axes have been chosen and a D2 symmetry results. In this case, we have only one choice. (ii) In the case of C2v, two planes mutually intersecting at π/2 yield a C2 axis around their line of intersection. There are three possibilities of choosing two planes out of three (i.e., ₃C₂ = 3). (iii) If we choose the inversion i along with, e.g., one of the three C2 axes, a σ necessarily results. This is also the case when we first combine a σ with i to obtain a C2 axis. We have three possibilities (i.e., ₃C₁ = 3) as well. Thus, we have only seven choices to construct subgroups of D2h having an order of four. This is summarized in Table 17.5.
An inverse of any element is that element itself. Therefore, if with any of the above subgroups one chooses any element out of the remaining four elements and combines it with the identity, one can construct a subgroup Cs, C2, or Ci of an order of 2. Since all those subgroups of an order of 4 and 2 are commutative with D2h, these subgroups are invariant subgroups. Thus, in terms of a direct-product group, D2h can be expressed with various direct factors. Conversely, we can construct factor groups from the coset decomposition.

Table 17.3 Multiplication table of D2h

D2h | E | C2(z) | σv(zx) | σv′(yz) | i | σv″(xy) | C2′(y) | C2″(x)
E | E | C2 | σv | σv′ | i | σv″ | C2′ | C2″
C2(z) | C2 | E | σv′ | σv | σv″ | i | C2″ | C2′
σv(zx) | σv | σv′ | E | C2 | C2′ | C2″ | i | σv″
σv′(yz) | σv′ | σv | C2 | E | C2″ | C2′ | σv″ | i
i | i | σv″ | C2′ | C2″ | E | C2 | σv | σv′
σv″(xy) | σv″ | i | C2″ | C2′ | C2 | E | σv′ | σv
C2′(y) | C2′ | C2″ | i | σv″ | σv | σv′ | E | C2
C2″(x) | C2″ | C2′ | σv″ | i | σv′ | σv | C2 | E

For instance, we have

D2h/C2v ≅ Cs, D2h/C2v ≅ C2, D2h/C2v ≅ Ci.  (17.31)

In turn, we express direct-product groups as, e.g.,

D2h = C2v × Cs, D2h = C2v × C2, D2h = C2v × Ci.

Example 17.3 Figure 17.9 shows an equilateral triangle placed on the xy-plane of a three-dimensional Cartesian coordinate system. The orthonormal basis vectors e1 and e2 are designated as shown. As a chemical species, we have, e.g., boron trifluoride (BF3). In the molecule, a boron atom is positioned at the molecular center with three fluorine atoms located at the vertices of the equilateral triangle. The boron atom and fluorine atoms form a planar molecule (Fig. 17.10).
A symmetry group belonging to D3h comprises twelve symmetry operations such that

D3h = {E, C3, C3′, C2, C2′, C2″, σh, S3, S3′, σv, σv′, σv″},  (17.32)

where symmetry operations of the same species but distinct operation are denoted by a “prime” or “double prime.” When we represent these operations by a matrix, it is straightforward in most cases. For instance, a matrix for σv is given by M_{yz} of (17.15). However, we should make some matrix calculations for C2′, C2″, σv′, and σv″.
To determine a matrix representation of, e.g., σv′ in reference to the xy-coordinate system with the orthonormal basis vectors e1 and e2, we consider the x′y′-coordinate system with the orthonormal basis vectors e1′ and e2′ (see Fig. 17.9). A transformation matrix between the two sets of basis vectors is represented by
Table 17.4 Matrix representation of symmetry operations for D2h

E: diag(1, 1, 1); C2(z): diag(−1, −1, 1); C2(y): diag(−1, 1, −1); C2(x): diag(1, −1, −1); i: diag(−1, −1, −1); σ(xy): diag(1, 1, −1); σ(zx): diag(1, −1, 1); σ(yz): diag(−1, 1, 1)
Table 17.5 Choice of symmetry operations for construction of subgroups of D2h

Subgroup | E | C2 | σ | i | Choices
D2 | 1 | 3 | 0 | 0 | 1
C2v | 1 | 1 | 2 | 0 | 3
C2h | 1 | 1 | 1 | 1 | 3

(e1′ e2′ e3′) = (e1 e2 e3)R_{zπ/6} = (e1 e2 e3)\begin{pmatrix} \sqrt{3}/2 & -1/2 & 0 \\ 1/2 & \sqrt{3}/2 & 0 \\ 0 & 0 & 1 \end{pmatrix}.  (17.33)

This representation corresponds to (11.69). Let Σv be a reflection with respect to the z′x′-plane. This is the same operation as σv′; however, the matrix representation is different. This is because Σv is represented in reference to the x′y′z′-system, while σv′ is in reference to the xyz-system. The matrix representation of Σv is simple and expressed as

Σv = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.  (17.34)

Referring to (11.80), we have

R_{zπ/6}Σv = σv′R_{zπ/6} or σv′ = R_{zπ/6}Σv[R_{zπ/6}]⁻¹.  (17.35)

Thus, we see that in the first equation of (17.35) the order of the multiplications is reversed according as the latter operation is expressed in the xyz-system or in the x′y′z′-system. As a full matrix representation, we get
σv′ = \begin{pmatrix} \sqrt{3}/2 & -1/2 & 0 \\ 1/2 & \sqrt{3}/2 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} \sqrt{3}/2 & 1/2 & 0 \\ -1/2 & \sqrt{3}/2 & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 1/2 & \sqrt{3}/2 & 0 \\ \sqrt{3}/2 & -1/2 & 0 \\ 0 & 0 & 1 \end{pmatrix}.  (17.36)

Notice that this matrix representation is referred to the original xyz-coordinate system as before. Graphically, (17.35) corresponds to the multiplication of the symmetry operations done in the order of (i) a −π/6 rotation, (ii) a reflection (denoted by Σv), and (iii) a π/6 rotation (Fig. 17.9). The associated matrices are multiplied from the left.

Fig. 17.9 Equilateral triangle placed on the xy-plane. Several symmetry operations of D3h are shown. We consider the successive operations (i), (ii), and (iii) to represent σv′ in reference to the xy-coordinate system (see text)

Fig. 17.10 Boron trifluoride (BF3) belonging to the D3h point group
Similarly, with C2′ we have

C2′ = \begin{pmatrix} -1/2 & \sqrt{3}/2 & 0 \\ \sqrt{3}/2 & 1/2 & 0 \\ 0 & 0 & -1 \end{pmatrix}.  (17.37)

The matrix form of (17.37) can also be decided in a manner similar to the above according to the three successive operations shown in Fig. 17.9. In terms of classes, σv, σv′, and σv″ form a conjugacy class, and C2, C2′, and C2″ form another conjugacy class. With regard to the reflection and rotation, we have det σv′ = −1 and det C2 = 1, respectively.
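Equations (17.36) and (17.37) are conveniently checked by symbolic computation. In the sketch below (Python with SymPy), σv′ is obtained by conjugating Σv with R_{zπ/6}; the last lines reproduce the matrix of (17.37) by conjugating C_{xπ} = diag(1, −1, −1) with R_{zπ/3} — one possible route, our assumption about the choice of the C2′ axis:

```python
from sympy import Matrix, Rational, cos, sin, pi, simplify, sqrt

Rz = lambda a: Matrix([[cos(a), -sin(a), 0],
                       [sin(a),  cos(a), 0],
                       [0, 0, 1]])
Sigma_v = Matrix([[1, 0, 0], [0, -1, 0], [0, 0, 1]])   # Eq. (17.34)

# Eq. (17.35)/(17.36): sigma_v' = Rz(pi/6) Sigma_v Rz(pi/6)^{-1} in the xyz-system.
sigma_v_prime = simplify(Rz(pi / 6) * Sigma_v * Rz(-pi / 6))
expected = Matrix([[Rational(1, 2), sqrt(3) / 2, 0],
                   [sqrt(3) / 2, -Rational(1, 2), 0],
                   [0, 0, 1]])
assert simplify(sigma_v_prime - expected) == Matrix.zeros(3, 3)

# Eq. (17.37): conjugating Cx(pi) = diag(1, -1, -1) by Rz(pi/3) gives a C2 axis
# rotated by 60 degrees from the x-axis and reproduces the matrix of (17.37).
Cx_pi = Matrix([[1, 0, 0], [0, -1, 0], [0, 0, -1]])
print(simplify(Rz(pi / 3) * Cx_pi * Rz(-pi / 3)))
```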

17.3 O and Td Groups

As a geometric object (or a molecule) has higher symmetry, we have to deal with more symmetry operations and the relationships between them. As an example, we consider the O and Td groups. Both groups have 24 symmetry operations and are isomorphic to each other.
Let us think of the group O first. We start with considering the rotations by π/2 around the x-, y-, and z-axes. The matrices representing these rotations are obtained from (17.12) to give
R_{xπ/2} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix}, R_{yπ/2} = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ -1 & 0 & 0 \end{pmatrix}, R_{zπ/2} = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}.  (17.38)

We continue multiplications of these matrices so that the matrices can make up a complete set (i.e., a closure). Counting over those matrices, we have 24 of them, and they form a group termed O. The group O is a pure rotation group. Here, a pure rotation group is defined as a group whose elements comprise only proper rotations (with their determinant of 1). An example is shown in Fig. 17.11, where the individual vertices of the cube have three arrows so that the cube does not possess mirror symmetries or an inversion symmetry. The group O has five conjugacy classes. Figure 17.12 summarizes them; geometrical characteristics of the individual classes are sketched as well. These classes are categorized by the trace of the matrix. This is because the trace is kept unchanged by a similarity transformation. (Remember that elements of the same conjugacy class are connected with a similarity transformation.) Having a look at the sketches, we notice that each operation switches the basis vectors e1, e2, and e3, i.e., the x-, y-, and z-axes. Therefore, the presence of diagonal elements (either 1 or −1) implies that the matrix takes the corresponding basis vector(s) as an eigenvector with respect to the rotation. The corresponding eigenvalue(s) are either 1 or −1 accordingly. This is expected from the fact that the matrix is an orthogonal matrix (i.e., unitary). The trace, namely the summation of the diagonal elements, is closely related to the geometrical feature of the operation.
The operations of a π rotation around the x-, y-, and z-axes and those of a π rotation around an axis bisecting any two of the three axes have a trace of −1. The former operations take all the basis vectors as eigenvectors; that is, all the diagonal elements are nonvanishing. With the latter operations, however, only one diagonal element is nonvanishing, and it is −1. This feature comes from the fact that the bisected axes are switched by the rotation, whereas the remaining axis is reversed by the rotation.
Fig. 17.11 Cube whose individual vertices have three arrows for the cube not to possess mirror symmetries. This object belongs to the point group O, called a pure rotation group

Fig. 17.12 Point group O and its five conjugacy classes: χ = 3 (the identity E), χ = 1 (the six π/2 rotations), χ = 0 (the eight 2π/3 rotations), χ = −1 (the three π rotations around the coordinate axes), and χ = −1 (the six π rotations around the bisecting axes). Geometrical characteristics of the individual classes are briefly sketched
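The 24 elements of O and the trace tally of Fig. 17.12 can be generated by computer from the three matrices of (17.38) alone. A minimal Python sketch (assuming NumPy) follows:

```python
import numpy as np

Rx = np.array([[1, 0, 0], [0, 0, -1], [0, 1, 0]])    # Eq. (17.38)
Ry = np.array([[0, 0, 1], [0, 1, 0], [-1, 0, 0]])
Rz = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]])

# Closure of {Rx, Ry, Rz} under matrix multiplication: the pure rotation group O.
group = {tuple(np.eye(3, dtype=int).ravel())}
frontier = [Rx, Ry, Rz]
while frontier:
    g = frontier.pop()
    key = tuple(g.ravel())
    if key not in group:
        group.add(key)
        frontier.extend([g @ Rx, g @ Ry, g @ Rz])

print(len(group))    # 24

# Tally the traces chi; they separate the five conjugacy classes of Fig. 17.12.
tally = {}
for key in group:
    chi = key[0] + key[4] + key[8]      # diagonal elements of the flattened matrix
    tally[chi] = tally.get(chi, 0) + 1
print(tally)         # {3: 1, 1: 6, 0: 8, -1: 9}  (9 = 3 + 6: two classes share chi = -1)
```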

Another characteristic is the generation of eight rotation axes that trisect the x-, y-, and z-axes, more specifically, that trisect the solid angle π/2 formed by the x-, y-, and z-axes. Since such a rotation switches all the x-, y-, and z-axes, the trace is zero. At the same time, we find that this operation belongs to C3. This operation is generated by two successive π/2 rotations around two mutually orthogonal axes. To inspect this situation more closely, we consider a conjugacy class of π/2 rotations that belongs to the C4 symmetry and includes six elements, i.e., R_{xπ/2}, R_{yπ/2}, R_{zπ/2}, R_{x̄π/2}, R_{ȳπ/2}, and R_{z̄π/2}. With these notations, e.g., R_{xπ/2} stands for a π/2 counterclockwise rotation around the x-axis; R_{x̄π/2} denotes a π/2 counterclockwise rotation around the x̄-axis (i.e., the −x direction). Consequently, R_{x̄π/2} implies a π/2 clockwise rotation around the x-axis and, hence, is the inverse element of R_{xπ/2}. Namely, we have

R_{x̄π/2} = [R_{xπ/2}]⁻¹.  (17.39)

Now let us consider the successive two rotations. This is denoted by the multiplication of the matrices that represent the related rotations. For instance, the multiplication of R_{xπ/2} and R′_{yπ/2} produces the following:

R_{xyz2π/3} = R_{xπ/2}R′_{yπ/2} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix}\begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ -1 & 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}.  (17.40)

In (17.40), we define Rxyz2π3 as a 2π/3 counterclockwise rotation around an axis that


trisects the x-, y-, and z-axes. The prime “ 0 ” of R0yπ means that the operation is carried
2
out in reference to the new coordinate system reached by the previous operation Rxπ2.
For this reason, R0yπ is operated (i.e., multiplied) from the right in (17.40). Compare
2
this with the remark made just after (17.30). Changing the order of Rxπ2 and R0yπ , we
2
have
0 10 1 0 1
0 0 1 1 0 0 0 1 0
0 B CB C B C
Rxyz2π3 ¼ Ry2 Rxπ ¼ @ 0
π 1 0 A@ 0 0 1 A ¼ @ 0 0 1 A, ð17:41Þ
2
1 0 0 0 1 0 1 0 0

where $R_{xy\bar{z}\frac{2\pi}{3}}$ is a 2π/3 counterclockwise rotation around an axis that trisects the x-, y-,
and $\bar{z}$-axes. Notice that we used $R'_{x\frac{\pi}{2}}$ this time, because it was performed after $R_{y\frac{\pi}{2}}$.
Thus, we notice that there are eight related operations that trisect the eight octants of the
coordinate system. These operations are further categorized into four sets, in each of which
the two elements are an inverse element of each other. For instance, we have

$$R_{\bar{x}\bar{y}\bar{z}\frac{2\pi}{3}} = \left(R_{xyz\frac{2\pi}{3}}\right)^{-1}. \quad (17.42)$$

Notice that a 2π/3 counterclockwise rotation around the axis that trisects the $\bar{x}$-,
$\bar{y}$-, and $\bar{z}$-axes is equivalent to a 2π/3 clockwise rotation around the axis that trisects the
x-, y-, and z-axes. Also, we have $R_{\bar{x}\bar{y}z\frac{2\pi}{3}} = \left(R_{xy\bar{z}\frac{2\pi}{3}}\right)^{-1}$, etc. Moreover, we have "cyclic"
relations such as

$$R_{x\frac{\pi}{2}} R'_{y\frac{\pi}{2}} = R_{y\frac{\pi}{2}} R'_{z\frac{\pi}{2}} = R_{z\frac{\pi}{2}} R'_{x\frac{\pi}{2}} = R_{xyz\frac{2\pi}{3}}. \quad (17.43)$$
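The products (17.40)–(17.43) are easy to confirm numerically. The following minimal sketch (ours, assuming NumPy) encodes the three π/2 rotation matrices and checks the cyclic relations and the inverse relation (17.42):

```python
import numpy as np

# pi/2 counterclockwise rotations about the x-, y-, and z-axes
Rx = np.array([[1, 0, 0], [0, 0, -1], [0, 1, 0]])
Ry = np.array([[0, 0, 1], [0, 1, 0], [-1, 0, 0]])
Rz = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]])

# (17.40): R_x(pi/2) R_y(pi/2) is the 2*pi/3 rotation about (1, 1, 1)
R_xyz = Rx @ Ry
print(R_xyz)              # [[0 0 1], [1 0 0], [0 1 0]]
print(np.trace(R_xyz))    # 0, as required for a C3 element

# (17.43): the "cyclic" relations give the same C3 element
assert np.array_equal(Rx @ Ry, Ry @ Rz)
assert np.array_equal(Rx @ Ry, Rz @ Rx)

# (17.42): the inverse equals the transpose for an orthogonal matrix,
# i.e., the 2*pi/3 rotation about (-1, -1, -1)
assert np.array_equal(np.linalg.inv(R_xyz).round().astype(int), R_xyz.T)
```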

Returning to Sect. 11.4, we had

$$A[P(\mathbf{x})] = \left[(\mathbf{e}_1 \cdots \mathbf{e}_n)P\right] A' \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (\mathbf{e}_1 \cdots \mathbf{e}_n)\left[A_O P \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}\right]. \quad (11.79)$$

The implication of (11.79) is that the LHS is related to the transformation of basis vectors
while retaining the coordinates $\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$ and that transformation matrices should be
operated on the basis vectors from the right. Meanwhile, the RHS describes the transformation of coordinates while retaining the basis vectors. In that case, transformation
matrices should be operated on the coordinates from the left. Thus, the order of
operator multiplication is reversed. Following (11.80), we describe

$$R_{x\frac{\pi}{2}} R'_{y\frac{\pi}{2}} = R_O R_{x\frac{\pi}{2}}, \quad \text{i.e.,} \quad R_O = R_{x\frac{\pi}{2}} R'_{y\frac{\pi}{2}} \left(R_{x\frac{\pi}{2}}\right)^{-1}, \quad (17.44)$$

where $R_O$ is viewed in reference to the original (or fixed) coordinate system and is
conjugate to $R'_{y\frac{\pi}{2}}$. Thus, we have

$$R_O = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ -1 & 0 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 0 \end{pmatrix} = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \quad (17.45)$$

Note that (17.45) is identical to the matrix representation of a π/2 rotation around the z-axis.
This is evident from the fact that the y-axis is converted to the original z-axis by
$R_{x\frac{\pi}{2}}$; readers, imagine it.
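The conjugation (17.44) and the result (17.45) can likewise be checked in a few lines (again a sketch of ours, assuming NumPy):

```python
import numpy as np

Rx = np.array([[1, 0, 0], [0, 0, -1], [0, 1, 0]])   # R_x(pi/2)
Ry = np.array([[0, 0, 1], [0, 1, 0], [-1, 0, 0]])   # R_y(pi/2), playing R'_y(pi/2)
Rz = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]])   # R_z(pi/2)

# (17.44)-(17.45): viewing R'_y(pi/2) from the fixed frame amounts to
# conjugation by R_x(pi/2); the result is a pi/2 rotation about the z-axis
R_O = Rx @ Ry @ np.linalg.inv(Rx)
assert np.array_equal(R_O.round().astype(int), Rz)
```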
We have two conjugacy classes of π rotations (the C2 symmetry). One of them
includes six elements, i.e., $R_{xy\pi}$, $R_{yz\pi}$, $R_{zx\pi}$, $R_{\bar{x}y\pi}$, $R_{\bar{y}z\pi}$, and $R_{\bar{z}x\pi}$. In these notations a
subscript, e.g., xy, stands for an axis that bisects the angle formed by the x- and y-axes; a
subscript $\bar{x}y$ denotes an axis bisecting the angle formed by the $\bar{x}$- and y-axes. Another
class includes three elements, i.e., $R_{x\pi}$, $R_{y\pi}$, and $R_{z\pi}$.
As for $R_{x\pi}$, $R_{y\pi}$, and $R_{z\pi}$, a combination of these operations should yield a C2
rotation axis, as discussed in Sect. 17.2. Of these three rotation axes, in fact, any two

produce a C2 rotation around the remaining axis, as is the case with naphthalene
belonging to the D2h symmetry (see Sect. 17.2).
Regarding the class comprising the six π rotation elements, a combination of, e.g.,
$R_{xy\pi}$ and $R_{\bar{x}y\pi}$, whose axes cross each other at a right angle, causes a related effect. For the other
combinations, the two C2 axes intersect each other at π/3; see Fig. 17.6 and put
θ = π/3 there. In this respect, elementary analytic geometry teaches the positional
relationship among planes and straight lines. The argument is as follows: A plane
determined by three points $\begin{pmatrix} x_1 \\ y_1 \\ z_1 \end{pmatrix}$, $\begin{pmatrix} x_2 \\ y_2 \\ z_2 \end{pmatrix}$, and $\begin{pmatrix} x_3 \\ y_3 \\ z_3 \end{pmatrix}$ that do not sit on a line is
expressed by the following equation:

$$\begin{vmatrix} x & y & z & 1 \\ x_1 & y_1 & z_1 & 1 \\ x_2 & y_2 & z_2 & 1 \\ x_3 & y_3 & z_3 & 1 \end{vmatrix} = 0. \quad (17.46)$$
Substituting $\begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$, $\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}$, and $\begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}$, we have

$$x - y + z = 0. \quad (17.47)$$

Taking account of direction cosines and using Hesse's normal form, we get

$$\frac{1}{\sqrt{3}}(x - y + z) = 0, \quad (17.48)$$

where the normal to the plane expressed in (17.48) has direction cosines of $\frac{1}{\sqrt{3}}$, $-\frac{1}{\sqrt{3}}$,
and $\frac{1}{\sqrt{3}}$ in relation to the x-, y-, and z-axes, respectively.
Therefore, the normal is given by a straight line connecting the origin and
$\begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix}$. In other words, a line connecting the origin and a corner of a cube is the
normal to the plane described by (17.48). That plane is formed by two intersecting
lines, i.e., rotation axes of C2 and C′2 (see Fig. 17.13, which depicts a cube whose sides
have a length of 2). These axes make an angle of π/3; this can easily be checked by taking an inner
product between $\begin{pmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \\ 0 \end{pmatrix}$ and $\begin{pmatrix} 0 \\ 1/\sqrt{2} \\ 1/\sqrt{2} \end{pmatrix}$. These column vectors are the direction
cosines of C2 and C′2, respectively. On the basis of the discussion of Sect. 17.2, we must have a

[Fig. 17.13 body: the labeled points are (0, 1, 1), (1, −1, 1), and (1, 1, 0); the C2 and C′2 axes make an angle of π/3.]

Fig. 17.13 Rotation axes of C2 and C′2 along with another rotation axis C3 in a point group O

Fig. 17.14 Simple kit (Sheets 1–3) that helps visualize the positional relationship among planes and straight
lines in three-dimensional space. To make it, follow these procedures: (i) Take three thick sheets of
paper and make slits (dashed lines) as shown. (ii) Insert Sheet 2 into Sheet 1 so that the two sheets
make a right angle. (iii) Insert Sheet 3 into the combined Sheets 1 and 2

rotation axis of C3. That is, this axis trisects a solid angle π/2 shaped by three
intersecting sides.
It is sometimes hard to visualize or envisage the positional relationship among
planes and straight lines in three-dimensional space. It will therefore be useful to
make a simple kit to help visualize it. Figure 17.14 gives an illustration.
Another typical example having 24 group elements is Td. A molecule of methane
belongs to this symmetry. Table 17.6 collects the relevant symmetry operations and
their (3, 3) matrix representations. As in the case of Fig. 17.12, the matrices show
how a set of vectors (x y z) is transformed according to the symmetry operations.
Comparing it with Fig. 17.12, we immediately recognize that a close relationship
exists between Td and O and that these point groups share notable characteristics.
(i) Both Td and O consist of five conjugacy classes, each of which contains the
same number of symmetry species. (ii) Both Td and O contain a pure rotation group
T as a subgroup. The subgroup T consists of 12 group elements: E, 8C3, and 3C2.
The remaining twelve group elements of Td are symmetry species related to
reflection: S4 and σd. The elements 6S4 and 6σd correspond to 6C4 and 6C2 of O,
Table 17.6 Symmetry operations and their matrix representations of Td

E:
$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$

8C3:
$\begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}$ $\begin{pmatrix} 0 & 0 & 1 \\ -1 & 0 & 0 \\ 0 & -1 & 0 \end{pmatrix}$ $\begin{pmatrix} 0 & 0 & -1 \\ 1 & 0 & 0 \\ 0 & -1 & 0 \end{pmatrix}$ $\begin{pmatrix} 0 & 0 & -1 \\ -1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}$ $\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}$ $\begin{pmatrix} 0 & -1 & 0 \\ 0 & 0 & -1 \\ 1 & 0 & 0 \end{pmatrix}$ $\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & -1 \\ -1 & 0 & 0 \end{pmatrix}$ $\begin{pmatrix} 0 & -1 & 0 \\ 0 & 0 & 1 \\ -1 & 0 & 0 \end{pmatrix}$

3C2:
$\begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix}$ $\begin{pmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}$ $\begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$

6S4:
$\begin{pmatrix} -1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix}$ $\begin{pmatrix} -1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 0 \end{pmatrix}$ $\begin{pmatrix} 0 & 0 & 1 \\ 0 & -1 & 0 \\ -1 & 0 & 0 \end{pmatrix}$ $\begin{pmatrix} 0 & 0 & -1 \\ 0 & -1 & 0 \\ 1 & 0 & 0 \end{pmatrix}$ $\begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & -1 \end{pmatrix}$ $\begin{pmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & -1 \end{pmatrix}$

6σd:
$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}$ $\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & -1 & 0 \end{pmatrix}$ $\begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}$ $\begin{pmatrix} 0 & 0 & -1 \\ 0 & 1 & 0 \\ -1 & 0 & 0 \end{pmatrix}$ $\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$ $\begin{pmatrix} 0 & -1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$

respectively. That is, successive operations of S4 cause effects similar to those of C4
of O. Meanwhile, successive operations of σd are related to those of 6C2 of O.
Let us imagine in Fig. 17.15 that a regular tetrahedron is inscribed in a cube. As
an example of symmetry operations, suppose that the three pairs of planes of σd (six
planes in total) are given by the equations x = ±y, y = ±z, and z = ±x. Their
Hesse normal forms are represented as

$$\frac{1}{\sqrt{2}}(x \mp y) = 0, \quad (17.49)$$
$$\frac{1}{\sqrt{2}}(y \mp z) = 0, \quad (17.50)$$
$$\frac{1}{\sqrt{2}}(z \mp x) = 0. \quad (17.51)$$

Then, a dihedral angle α of two of the planes is given by

$$\cos\alpha = \frac{1}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}} = \frac{1}{2} \quad \text{or} \quad (17.52)$$
$$\cos\alpha = \frac{1}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}} - \frac{1}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}} = 0. \quad (17.53)$$

That is, α = π/3 or α = π/2. On the basis of the discussion of Sect. 17.2, the
intersection of the two planes must be a rotation axis of C3 or C2. Once again, in the
case of C3 the intersection is a straight line connecting the origin and a vertex of the
cube. This can readily be verified as follows: For instance, the two planes given by x = y
and y = z make an angle π/3 and produce the intersection line x = y = z. This line, in
turn, connects the origin and a vertex of the cube.
If we choose, e.g., the two planes x = ±y from the above, these planes make a right
angle and their intersection must be a C2 axis. The three such C2 axes coincide with the x-,
y-, and z-axes. In this light, σd functions similarly to 6C2 of O in that their
combinations produce 8C3 or 3C2. Thus, the constitution and operation of Td and
O are closely related.
Let us more closely inspect the structure and constitution of O and Td. First we
construct a mapping ρ between group elements of O and Td such that

$$\rho : g \in O \longleftrightarrow g' \in T_d, \quad \text{if } g \in T,$$
$$\rho : g \in O \longleftrightarrow -(g')^{-1} \in T_d, \quad \text{if } g \notin T.$$

In the above relation, the minus sign indicates that, in taking the inverse, the representation
matrix R must be replaced with −R. Then, ρ(g) = g′ is an isomorphic mapping. In
fact, comparing Fig. 17.12 and Table 17.6, ρ gives identical matrix representations for
O and Td. For example, taking the first matrix of S4 of Td, we have

Fig. 17.15 Regular tetrahedron inscribed in a cube. As an example of symmetry operations, we
can choose three pairs of planes of σd (six planes in total) given by the equations x = ±y, y = ±z, and z = ±x

$$-\begin{pmatrix} -1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix}^{-1} = -\begin{pmatrix} -1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix}.$$

The resulting matrix is identical to the first matrix of C4 of O. Thus, we find that Td and
O are isomorphic to each other.
Both O and Td consist of 24 group elements and are isomorphic to the symmetric group
S4; do not confuse this with the same symbol S4 used for a group element of Td. The subgroup
T consists of three conjugacy classes: E, 8C3, and 3C2. Since T is constructed only of
entire classes, it is an invariant subgroup; in this respect see the discussion of Sect.
16.3. The groups O, Td, and T along with Th and Oh form the cubic groups [2]. In
Table 17.7, we list these cubic groups together with their names and orders. Of these,
O is a pure rotation subgroup of Oh, and T is a pure rotation subgroup of Td and Th.
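The mapping ρ can be spot-checked numerically. The following sketch (ours, assuming NumPy) takes the first S4 matrix of Table 17.6 and confirms that −M⁻¹ reproduces the first C4 matrix of O, i.e., Rx(π/2):

```python
import numpy as np

M = np.array([[-1, 0, 0], [0, 0, -1], [0, 1, 0]])   # first S4 matrix of Td
Rx = np.array([[1, 0, 0], [0, 0, -1], [0, 1, 0]])   # first C4 matrix of O

image = -np.linalg.inv(M)
assert np.array_equal(image.round().astype(int), Rx)
print(np.linalg.det(M), np.linalg.det(Rx))   # -1.0 (improper) vs +1.0 (proper)
```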
Symmetric groups are related to permutations of n elements (or objects or
numbers). The permutation has already appeared in (11.56) when we defined a
determinant of a matrix. There it was defined as

$$\sigma = \begin{pmatrix} 1 & 2 & \cdots & n \\ i_1 & i_2 & \cdots & i_n \end{pmatrix},$$

where σ means a permutation of the numbers 1, 2, ⋯, n. Therefore, the symmetric
group of degree n has n! group elements (i.e., different ways of rearrangement).
Although we do not dwell on symmetric groups much, we state the following
important theorem related to finite groups without proof. Interested readers are
referred to the literature [1].

Table 17.7 Several cubic groups and their characteristics

Notation   Group name             Order   Remark
T          Tetrahedral rotation   12      Subgroup of Th, Td
Th, Td     Tetrahedral            24
O          Octahedral rotation    24      Subgroup of Oh
Oh         Octahedral             48

Theorem 17.1: Cayley's Theorem [1] Every finite group ℊ of order n is isomorphic to a subgroup (possibly the whole group) of the symmetric group Sn.

17.4 Special Orthogonal Group SO(3)

In Part III and Part IV thus far, we have dealt with a broad class of linear transformations. The related groups are finite groups. Here we will describe characteristics of
the special orthogonal group SO(3), a kind of infinite group. The group SO(3) represents
rotations in three-dimensional Euclidean space ℝ3. Rotations are made around an
axis (a line through the origin) with the origin fixed. A rotation is defined by an
azimuth of direction and a magnitude (angle) of rotation. The azimuth of direction is
defined by two parameters and the magnitude is defined by one parameter, a rotation
angle. Hence, a rotation is defined by three independent parameters. Since those
parameters are continuously variable, SO(3) is one of the continuous groups.
Two rotations result in another rotation with the origin again fixed. A reverse
rotation is unambiguously defined. An identity transformation is naturally defined.
An associative law holds as well. Thus, the relevant rotations form a group, i.e.,
SO(3). A rotation is represented by a real (3, 3) matrix whose determinant is 1. A
matrix representation is uniquely determined once an orthonormal basis is set in ℝ3.
Any rotation is represented by a rotation matrix accordingly. The rotation matrix R is
defined by

$$R^T R = R R^T = E, \quad (17.54)$$

where R is a real matrix with det R = 1. Notice that we exclude the case where det
R = −1. Matrices that satisfy (17.54) with det R = ±1 are referred to as orthogonal
matrices, which cause orthogonal transformations. Correspondingly, orthogonal groups
(represented by orthogonal matrices) contain rotation groups as a special case. In
other words, the orthogonal groups contain the rotation groups as a subgroup. The
orthogonal group in ℝ3, denoted O(3), contains SO(3) as a subgroup. By the same
token, orthogonal matrices contain rotation matrices as a special case.
In Sect. 17.2 we treated reflection and improper rotation, with the determinant of
their matrices being −1. In this section these transformations are excluded and only
rotations are dealt with. We focus on geometric characteristics of the rotation groups.

Readers are referred to a more detailed representation theory of SO(3) in the appropriate
literature [1].

17.4.1 Rotation Axis and Rotation Matrix

In this section we denote a vector by, e.g., |x⟩. We start with showing that any
rotation is accompanied by a unique rotation axis. The rotation axis is defined as
follows: Suppose that there is a rigid body with some point within the body fixed.
Here the said rigid body may have infinite extent. Then the rigid body exerts
rotation. The rotation axis is a line on which every point is unmoved during the
rotation. As a matter of course, the identity matrix E has three linearly independent
rotation axes. (Practically, this represents no rotation.)
Theorem 17.2 Any rotation matrix R is accompanied by at least one rotation axis.
Unless the rotation matrix is identity, the rotation matrix should be accompanied by
one and only one rotation axis.
Proof As R is an orthogonal matrix of determinant 1, so are $R^T$ and $R^{-1}$. Then we
have

$$(R - E)^T = R^T - E = R^{-1} - E. \quad (17.55)$$

Hence, we get

$$\det(R - E) = \det(R^T - E) = \det(R^{-1} - E) = \det\left[R^{-1}(E - R)\right] = \det R^{-1} \det(E - R) = \det(E - R) = -\det(R - E). \quad (17.56)$$

Note here that for any (3, 3) matrix A of ℝ3,

$$\det(-A) = (-1)^3 \det A = -\det A.$$

This equality holds in ℝn with n odd, but in ℝn with n even we have
det(−A) = det A; then, (17.56) would result in the trivial equation 0 = 0.
Therefore, the discussion made below only applies to ℝn with n odd. Thus, from
(17.56) we have

$$\det(R - E) = 0. \quad (17.57)$$

This implies that for $\exists |x_0\rangle \neq 0$

$$(R - E)|x_0\rangle = 0. \quad (17.58)$$

Therefore, we get

$$R(a|x_0\rangle) = a|x_0\rangle, \quad (17.59)$$

where a is an arbitrarily chosen real number. In this case an eigenvalue of R is
1, to which an eigenvector $a|x_0\rangle$ corresponds. Thus, as a rotation axis, we have a
straight line expressed as

$$l = \mathrm{Span}\{a|x_0\rangle;\ a \in \mathbb{R}\}. \quad (17.60)$$
This proves the presence of a rotation axis.


Next suppose that there are two (or more) rotation axes. The presence of two
rotation axes naturally implies that there are two linearly independent vectors (i.e.,
two straight lines that mutually intersect at the fixed point). Suppose that such
vectors are |u⟩ and |v⟩. Then we have

$$(R - E)|u\rangle = 0, \quad (17.61)$$
$$(R - E)|v\rangle = 0. \quad (17.62)$$

Let us consider a vector ∀|y⟩ chosen from Span{|u⟩, |v⟩}, where we put

$$\mathrm{Span}\{|u\rangle, |v\rangle\} \equiv \{a|u\rangle + b|v\rangle;\ a, b \in \mathbb{R}\}. \quad (17.63)$$

That is, Span{|u⟩, |v⟩} represents a plane P formed by the two mutually intersecting
straight lines. Then we have

$$|y\rangle = s|u\rangle + t|v\rangle, \quad (17.64)$$

where s and t are some real numbers. Operating R − E on (17.64), we have

$$(R - E)|y\rangle = (R - E)(s|u\rangle + t|v\rangle) = s(R - E)|u\rangle + t(R - E)|v\rangle = 0. \quad (17.65)$$

This indicates that any vector in P can be an eigenvector of R, implying that an
infinite number of rotation axes exist.
Now, take another vector |w⟩ that is perpendicular to the plane P (see Fig. 17.16).
Let us consider an inner product ⟨y|Rw⟩. Since

$$R|u\rangle = |u\rangle, \quad \langle u|R^\dagger = \langle u|R^T = \langle u|. \quad (17.66)$$

Similarly, we have

Fig. 17.16 Plane P formed by two mutually intersecting straight lines represented by |u⟩ and |v⟩. Another vector |w⟩ is perpendicular to the plane P

$$\langle v|R^T = \langle v|. \quad (17.67)$$

Therefore, using the relation (17.64), we get

$$\langle y|R^T = \langle y|. \quad (17.68)$$

Here we are dealing with real numbers and, hence, we have

$$R^\dagger = R^T. \quad (17.69)$$

Now we have

$$\langle y|Rw\rangle = \langle yR^T|Rw\rangle = \langle y|R^T R w\rangle = \langle y|Ew\rangle = \langle y|w\rangle = 0. \quad (17.70)$$

In (17.70) the second equality comes from the associative law; the third is due to
(17.54). The last equality comes from the fact that |w⟩ is perpendicular to P.
From (17.70) we have

$$\langle y|(R - E)w\rangle = 0. \quad (17.71)$$

However, we should be careful not to conclude immediately from (17.71) that
(R − E)|w⟩ = 0, i.e., R|w⟩ = |w⟩. This is because in (17.70) |y⟩ does not represent
all vectors in ℝ3, but merely represents all the vectors in Span{|u⟩, |v⟩}. Nonetheless, both |w⟩ and |Rw⟩ are perpendicular to P, and so

$$|Rw\rangle = a|w\rangle, \quad (17.72)$$

where a is some real number. From (17.72), we have



$$\langle wR^\dagger|Rw\rangle = \langle wR^T|Rw\rangle = \langle w|w\rangle = |a|^2\langle w|w\rangle, \quad \text{i.e., } a = \pm 1,$$

where the first equality comes from the fact that R is an orthogonal (real) matrix. Since det R = 1,
a = 1. Thus, from (17.72) this time around we have |Rw⟩ = |w⟩; that is,

$$(R - E)|w\rangle = 0. \quad (17.73)$$

Equations (17.65) and (17.73) imply that for any vector |x⟩ arbitrarily chosen from
ℝ3 we have

$$(R - E)|x\rangle = 0. \quad (17.74)$$

Consequently, we get

$$R - E = 0 \quad \text{or} \quad R = E. \quad (17.75)$$

The above procedures represented by (17.61) to (17.75) indicate that the presence
of two rotation axes necessarily requires the transformation matrix to be identity. This
implies that all the vectors in ℝ3 are eigenvectors of such a rotation matrix.
Taking the contraposition of the above, unless the rotation matrix is identity, the
relevant rotation cannot have two rotation axes. Meanwhile, the proof of the former
half ensures the presence of at least one rotation axis. Consequently, any rotation is
characterized by a unique rotation axis except for the identity transformation. This
completes the proof. ∎
An immediate consequence of Theorem 17.2 is that a rotation matrix has
an eigenvalue 1, to which an eigenvector representing the rotation axis corresponds. This statement includes the trivial fact that all the eigenvalues of the identity
matrix are 1. In Sect. 14.4, we calculated the eigenvalues of a two-dimensional rotation
matrix. The eigenvalues were $e^{i\theta}$ and $e^{-i\theta}$, where θ is a rotation angle.
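Theorem 17.2 suggests a practical recipe: the rotation axis of any R ∈ SO(3) is an eigenvector belonging to the eigenvalue 1. A minimal numerical sketch (ours, assuming NumPy; the helper name `rotation_axis` is our own) reads:

```python
import numpy as np

def rotation_axis(R):
    """Return a unit eigenvector of R with eigenvalue 1 (the rotation axis)."""
    w, v = np.linalg.eig(R)
    i = np.argmin(np.abs(w - 1.0))     # pick the eigenvalue closest to 1
    axis = np.real(v[:, i])
    return axis / np.linalg.norm(axis)

# Example: 2*pi/3 rotation about the body diagonal (1, 1, 1)
R = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
print(rotation_axis(R))                       # ~ (1, 1, 1)/sqrt(3), up to sign
print(np.round(np.linalg.eigvals(R), 6))      # 1 and exp(+/- 2*pi*i/3)
```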
Let us consider rotation matrices that we dealt with in Sect. 17.1. The matrix
representing the rotation around the z-axis by a rotation angle θ is expressed by

$$R = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}. \quad (17.8)$$

In Sect. 14.4 we treated the diagonalization of a rotation matrix. As R is already in reduced (block-diagonal) form, the
diagonalization can be performed in a manner essentially the same as (14.92). That
is, as a diagonalizing unitary matrix, we have

$$U = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ -\frac{i}{\sqrt{2}} & \frac{i}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad U^\dagger = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{i}{\sqrt{2}} & 0 \\ \frac{1}{\sqrt{2}} & -\frac{i}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{pmatrix}. \quad (17.76)$$

As a result of the unitary similarity transformation, we get

$$U^\dagger R U = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{i}{\sqrt{2}} & 0 \\ \frac{1}{\sqrt{2}} & -\frac{i}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ -\frac{i}{\sqrt{2}} & \frac{i}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} e^{i\theta} & 0 & 0 \\ 0 & e^{-i\theta} & 0 \\ 0 & 0 & 1 \end{pmatrix}. \quad (17.77)$$

Thus, the eigenvalues are 1, $e^{i\theta}$, and $e^{-i\theta}$. The eigenvalue 1 results from the existence of
the unique rotation axis. When θ = 0, (17.77) gives an identity matrix with all the
eigenvalues 1, as expected. When θ = π, the eigenvalues are 1, −1, and −1. The
eigenvalue 1 is again associated with the unique rotation axis. The (unitary) similarity transformation keeps the trace unchanged; that is, the trace χ is

$$\chi = 1 + 2\cos\theta. \quad (17.78)$$

As R is a normal operator, spectral decomposition can be done as in the case of
Example 14.1. Here we only show the result below:

$$R = e^{i\theta}\begin{pmatrix} \frac{1}{2} & \frac{i}{2} & 0 \\ -\frac{i}{2} & \frac{1}{2} & 0 \\ 0 & 0 & 0 \end{pmatrix} + e^{-i\theta}\begin{pmatrix} \frac{1}{2} & -\frac{i}{2} & 0 \\ \frac{i}{2} & \frac{1}{2} & 0 \\ 0 & 0 & 0 \end{pmatrix} + \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

The three matrices in the above equation are projection operators.
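Both the diagonalization (17.77) and the spectral decomposition can be verified numerically for a sample angle; the sketch below is our own illustration, assuming NumPy:

```python
import numpy as np

theta = 0.7
c, s = np.cos(theta), np.sin(theta)
R = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

U = np.array([[1 / np.sqrt(2), 1 / np.sqrt(2), 0],
              [-1j / np.sqrt(2), 1j / np.sqrt(2), 0],
              [0, 0, 1]])

# (17.77): the unitary similarity transformation diagonalizes R
D = U.conj().T @ R @ U
assert np.allclose(D, np.diag([np.exp(1j * theta), np.exp(-1j * theta), 1]))

# Spectral decomposition: R = e^{i*theta} P1 + e^{-i*theta} P2 + P3
P1 = np.array([[0.5, 0.5j, 0], [-0.5j, 0.5, 0], [0, 0, 0]])
P2 = P1.conj()
P3 = np.diag([0, 0, 1.0])
assert np.allclose(R, np.exp(1j * theta) * P1 + np.exp(-1j * theta) * P2 + P3)
assert np.allclose(P1 @ P1, P1) and np.allclose(P1 @ P2, 0)   # projectors
```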



17.4.2 Euler Angles and Related Topics

Euler angles are well known and have long been used in various fields of science.
We wish to connect the above discussion with the Euler angles.
In Part III we dealt with successive linear transformations. This can be extended
to the case of three or more successive transformations. Suppose that we have three
successive transformations R1, R2, and R3 and that the coordinate system (a three-dimensional orthonormal basis) is transformed as O ⟶ I ⟶ II ⟶ III accordingly.
The symbol "O" stands for the original coordinate system, and I, II, and III represent
the successively transformed systems.
In the discussion that follows, let us denote the transformations by R2′, R3′, R3″,
etc. in reference to the transformed coordinate systems. For example, R3′ means that the third
transformation is viewed from the system I; R3″ indicates that the
third transformation is viewed from the system II. That is, the number of primes "′"
indicates the coordinate system (I or II) from which the transformation is viewed. Let
R2 (without prime) stand for the second transformation viewed from the system O.
Meanwhile, we have

$$R_1 R_2' = R_2 R_1. \quad (17.79)$$

This notation is in parallel with (11.80). Similarly, we have

$$R_2' R_3'' = R_3' R_2' \quad \text{and} \quad R_1 R_3' = R_3 R_1. \quad (17.80)$$

Therefore, we get [3]

$$R_1 R_2' R_3'' = R_1 R_3' R_2' = R_3 R_1 R_2' = R_3 R_2 R_1. \quad (17.81)$$

Also, combining (17.79) and (17.80), we have

$$R_3'' = (R_2 R_1)^{-1} R_3 (R_2 R_1). \quad (17.82)$$

Let us call R2′, R3″, etc. transformations on a "moving" coordinate system (i.e., the
systems I, II, III, ⋯). On the other hand, we call R1, R2, etc. transformations on a
"fixed" system (i.e., the original coordinate system O). Thus, (17.81) shows that the
multiplication order is reversed with respect to the moving system and the fixed
system [3].
For practical purposes it would be enough to consider three successive transformations. Let us think, however, of a general case where n successive transformations are involved (n denotes a positive integer). For the purpose of succinct
notation, let us define the linear transformations and relevant coordinate systems
as those in Fig. 17.17. Also, we define the following orthogonal transformation $R_j^{(i)}$:

O ⟶ I ⟶ II ⟶ III ⟶ ⋯

Fig. 17.17 Successive orthogonal transformations and relevant coordinate systems

$$R_j^{(i)} \quad (0 \le i < j \le n), \quad \text{and} \quad R_i \equiv R_i^{(0)}, \quad (17.83)$$

where $R_j^{(i)}$ is defined as the transformation Rj described in reference to the coordinate
system i; $R_i^{(0)}$ means that Ri is referred to the original coordinate system (i.e., the
fixed coordinate system). Then we have

$$R_{i-1}^{(i-2)} R_k^{(i-1)} = R_k^{(i-2)} R_{i-1}^{(i-2)} \quad (k > i-1). \quad (17.84)$$

Particularly, when i = 3,

$$R_2^{(1)} R_k^{(2)} = R_k^{(1)} R_2^{(1)} \quad (k > 2). \quad (17.85)$$

For i = 2 we have

$$R_1 R_k^{(1)} = R_k R_1 \quad (k > 1). \quad (17.86)$$

We define the n successive transformations on a moving coordinate system as $\widetilde{R}_n$
such that

$$\widetilde{R}_n \equiv R_1 R_2^{(1)} R_3^{(2)} \cdots R_{n-2}^{(n-3)} R_{n-1}^{(n-2)} R_n^{(n-1)}. \quad (17.87)$$

Applying (17.84) to $R_{n-1}^{(n-2)} R_n^{(n-1)}$ and rewriting (17.87), we have

$$\widetilde{R}_n = R_1 R_2^{(1)} R_3^{(2)} \cdots R_{n-2}^{(n-3)} R_n^{(n-2)} R_{n-1}^{(n-2)}. \quad (17.88)$$

Applying (17.84) again to $R_{n-2}^{(n-3)} R_n^{(n-2)}$, we get

$$\widetilde{R}_n = R_1 R_2^{(1)} R_3^{(2)} \cdots R_n^{(n-3)} R_{n-2}^{(n-3)} R_{n-1}^{(n-2)}. \quad (17.89)$$

Proceeding similarly, we have

$$\widetilde{R}_n = R_1 R_n^{(1)} R_2^{(1)} R_3^{(2)} \cdots R_{n-2}^{(n-3)} R_{n-1}^{(n-2)} = R_n R_1 R_2^{(1)} R_3^{(2)} \cdots R_{n-2}^{(n-3)} R_{n-1}^{(n-2)}, \quad (17.90)$$

where with the last equality we used (17.86). In this case, we have

$$R_1 R_n^{(1)} = R_n R_1. \quad (17.91)$$

To reach the RHS of (17.90), we applied (17.84) (n − 1) times in total. Then we
repeat the above procedures with respect to $R_{n-1}^{(n-2)}$ another (n − 2) times to get

$$\widetilde{R}_n = R_n R_{n-1} R_1 R_2^{(1)} R_3^{(2)} \cdots R_{n-2}^{(n-3)}. \quad (17.92)$$

Further proceeding similarly, we finally get

$$\widetilde{R}_n = R_n R_{n-1} R_{n-2} \cdots R_3 R_2 R_1. \quad (17.93)$$

In total, we have applied the permutation of (17.84) n(n − 1)/2 times. When n is 2,
n(n − 1)/2 = 1; this is the case with (17.79). If n is 3, n(n − 1)/2 = 3; this is the case
with (17.81). Thus, (17.93) once again confirms that the multiplication order is
reversed with respect to the moving system and the fixed system; a numerical check of this order reversal is sketched below.
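The order reversal is easy to check numerically for n = 3: construct the moving-frame versions of R2 and R3 from (17.79) and (17.82) and compare the two products. The sketch below is our own (assuming NumPy; `random_rotation` is our own helper):

```python
import numpy as np

rng = np.random.default_rng(1)

def random_rotation():
    """Random proper rotation via QR decomposition of a Gaussian matrix."""
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(Q) < 0:        # enforce det = +1
        Q[:, 0] = -Q[:, 0]
    return Q

R1, R2, R3 = (random_rotation() for _ in range(3))

# Moving-frame transformations, per (17.79) and (17.82):
R2p  = np.linalg.inv(R1) @ R2 @ R1               # R2'  (viewed from system I)
R3pp = np.linalg.inv(R2 @ R1) @ R3 @ (R2 @ R1)   # R3'' (viewed from system II)

# (17.81)/(17.93): product order is reversed between the two descriptions
assert np.allclose(R1 @ R2p @ R3pp, R3 @ R2 @ R1)
```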
Meanwhile, we define $\widetilde{P}$ as

$$\widetilde{P} \equiv R_1 R_2^{(1)} R_3^{(2)} \cdots R_{n-2}^{(n-3)} R_{n-1}^{(n-2)}. \quad (17.94)$$

Alternatively, we describe

$$\widetilde{P} = R_{n-1} R_{n-2} \cdots R_3 R_2 R_1. \quad (17.95)$$

Then, from (17.87) and (17.90) we get

$$\widetilde{R}_n = \widetilde{P} R_n^{(n-1)} = R_n \widetilde{P}. \quad (17.96)$$

Equivalently, we have

$$R_n = \widetilde{P} R_n^{(n-1)} \widetilde{P}^{-1} \quad \text{or} \quad (17.97)$$
$$R_n^{(n-1)} = \widetilde{P}^{-1} R_n \widetilde{P}. \quad (17.98)$$

Moreover, we have

$$\widetilde{P}^T = \left[R_1 R_2^{(1)} R_3^{(2)} \cdots R_{n-2}^{(n-3)} R_{n-1}^{(n-2)}\right]^T = \left[R_{n-1}^{(n-2)}\right]^T \left[R_{n-2}^{(n-3)}\right]^T \cdots \left[R_2^{(1)}\right]^T [R_1]^T$$
$$= \left[R_{n-1}^{(n-2)}\right]^{-1} \left[R_{n-2}^{(n-3)}\right]^{-1} \cdots \left[R_2^{(1)}\right]^{-1} R_1^{-1} = \left[R_1 R_2^{(1)} R_3^{(2)} \cdots R_{n-2}^{(n-3)} R_{n-1}^{(n-2)}\right]^{-1} = \widetilde{P}^{-1}. \quad (17.99)$$

The third equality of (17.99) comes from the fact that the matrices $R_{n-1}^{(n-2)}, R_{n-2}^{(n-3)}, \cdots, R_2^{(1)}$,
and R1 are orthogonal matrices. Then, we get

$$\widetilde{P}^T \widetilde{P} = \widetilde{P} \widetilde{P}^T = E. \quad (17.100)$$

Thus, $\widetilde{P}$ is an orthogonal matrix.


From the point of view of practical applications, (17.97) and (17.98) are very useful.
This is because $R_n$ and $R_n^{(n-1)}$ are conjugate to each other. Consequently, $R_n$ has the
same eigenvalues and trace as $R_n^{(n-1)}$. In light of (11.81), we see that (17.97) and
(17.98) relate $R_n$ (i.e., viewed in reference to the original coordinate system) to
$R_n^{(n-1)}$ [i.e., the same transformation viewed in reference to the coordinate system
reached after the (n − 1) transformations]. Since the transformation $R_n^{(n-1)}$ is usually
described in a simple form, matrix calculations to compute $R_n$ can readily be done.
Now let us consider an example.
Example 17.4: Successive Rotations A typical illustration of three successive
transformations in moving coordinate systems is well known in association with
Euler angles. It contains the following three steps:
(i) Rotation by α around the z-axis in the original coordinate system (O).
(ii) Rotation by β around the y′-axis in the transferred coordinate system (I).
(iii) Rotation by γ around the z″-axis (the same as the z′-axis) in the transferred
coordinate system (II).
The above three steps are represented by the matrices $R_{z\alpha}$, $R'_{y'\beta}$, and $R''_{z''\gamma}$ in
(17.12). That is, as a total transformation $\widetilde{R}_3$ we have

$$\widetilde{R}_3 = \begin{pmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{pmatrix} \begin{pmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
$$= \begin{pmatrix} \cos\alpha\cos\beta\cos\gamma - \sin\alpha\sin\gamma & -\cos\alpha\cos\beta\sin\gamma - \sin\alpha\cos\gamma & \cos\alpha\sin\beta \\ \sin\alpha\cos\beta\cos\gamma + \cos\alpha\sin\gamma & -\sin\alpha\cos\beta\sin\gamma + \cos\alpha\cos\gamma & \sin\alpha\sin\beta \\ -\sin\beta\cos\gamma & \sin\beta\sin\gamma & \cos\beta \end{pmatrix}. \quad (17.101)$$

This matrix corresponds to (17.87) with n = 3. The angles α, β, and γ in (17.101)
are called Euler angles, and their domains are usually taken as 0 ≤ α ≤ 2π, 0 ≤ β ≤ π, and
0 ≤ γ ≤ 2π. The matrix (17.101) is widely used in quantum mechanics and related
fields of natural science. The matrix notation, however, differs from literature to
literature, and so care should be taken [3–5]. Using the notation of (17.87), we have
$$R_1 = \begin{pmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad (17.102)$$
$$R_2^{(1)} = \begin{pmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{pmatrix}, \quad (17.103)$$
$$R_3^{(2)} = \begin{pmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{pmatrix}. \quad (17.104)$$

From (17.94) we also get

$$\widetilde{P} \equiv R_1 R_2^{(1)} = \begin{pmatrix} \cos\alpha\cos\beta & -\sin\alpha & \cos\alpha\sin\beta \\ \sin\alpha\cos\beta & \cos\alpha & \sin\alpha\sin\beta \\ -\sin\beta & 0 & \cos\beta \end{pmatrix}. \quad (17.105)$$

Corresponding to (17.97), we have

$$R_3 = \widetilde{P} R_3^{(2)} \widetilde{P}^{-1} = R_1 R_2^{(1)} R_3^{(2)} \left[R_2^{(1)}\right]^{-1} R_1^{-1}. \quad (17.106)$$

Now the matrix calculations are readily performed such that

$$R_3 = \begin{pmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{pmatrix} \begin{pmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \cos\beta & 0 & -\sin\beta \\ 0 & 1 & 0 \\ \sin\beta & 0 & \cos\beta \end{pmatrix} \begin{pmatrix} \cos\alpha & \sin\alpha & 0 \\ -\sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
$$= \begin{pmatrix} (\cos^2\alpha\cos^2\beta + \sin^2\alpha)\cos\gamma + \cos^2\alpha\sin^2\beta & \cos\alpha\sin\alpha\sin^2\beta(1-\cos\gamma) - \cos\beta\sin\gamma & \cos\alpha\cos\beta\sin\beta(1-\cos\gamma) + \sin\alpha\sin\beta\sin\gamma \\ \cos\alpha\sin\alpha\sin^2\beta(1-\cos\gamma) + \cos\beta\sin\gamma & (\sin^2\alpha\cos^2\beta + \cos^2\alpha)\cos\gamma + \sin^2\alpha\sin^2\beta & \sin\alpha\cos\beta\sin\beta(1-\cos\gamma) - \cos\alpha\sin\beta\sin\gamma \\ \cos\alpha\cos\beta\sin\beta(1-\cos\gamma) - \sin\alpha\sin\beta\sin\gamma & \sin\alpha\cos\beta\sin\beta(1-\cos\gamma) + \cos\alpha\sin\beta\sin\gamma & \sin^2\beta\cos\gamma + \cos^2\beta \end{pmatrix}. \quad (17.107)$$

Notice that in (17.107) we have a trace χ described as

$$\chi = 1 + 2\cos\gamma. \quad (17.108)$$

The trace is the same as that of $R_3^{(2)}$, as expected from (17.106) and (17.97).
Equation (17.107) apparently seems complicated but has a simple and well-defined
meaning. The rotation represented by $R_3^{(2)}$ is characterized by a rotation by γ around
the z″-axis. Figure 17.18 represents the orientation of the z″-axis viewed in reference
to the original xyz-system. That is, the z″-axis (identical to the rotation axis A) is
designated by an azimuthal angle α and a zenithal angle β as shown. The operation
R3 is represented by a rotation by γ around the axis A in the xyz-system. The angles α,
β, and γ coincide with the Euler angles designated with the same independent
parameters α, β, and γ.
From (17.77) and (17.107), a diagonalizing matrix for R3 is $\widetilde{P}U$. That is,

$$\left(\widetilde{P}U\right)^\dagger R_3 \left(\widetilde{P}U\right) = U^\dagger \widetilde{P}^{-1} R_3 \widetilde{P} U = U^\dagger R_3^{(2)} U = \begin{pmatrix} e^{i\gamma} & 0 & 0 \\ 0 & e^{-i\gamma} & 0 \\ 0 & 0 & 1 \end{pmatrix}. \quad (17.109)$$

Note that as $\widetilde{P}$ is a real matrix, we have

Fig. 17.18 Rotation γ around the rotation axis A. The orientation of A is defined by the angles α and β as shown

$$\widetilde{P}^\dagger = \widetilde{P}^T = \widetilde{P}^{-1} \quad (17.110)$$

and

$$\widetilde{P}U = \begin{pmatrix} \cos\alpha\cos\beta & -\sin\alpha & \cos\alpha\sin\beta \\ \sin\alpha\cos\beta & \cos\alpha & \sin\alpha\sin\beta \\ -\sin\beta & 0 & \cos\beta \end{pmatrix} \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ -\frac{i}{\sqrt{2}} & \frac{i}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
$$= \begin{pmatrix} \frac{1}{\sqrt{2}}\cos\alpha\cos\beta + \frac{i}{\sqrt{2}}\sin\alpha & \frac{1}{\sqrt{2}}\cos\alpha\cos\beta - \frac{i}{\sqrt{2}}\sin\alpha & \cos\alpha\sin\beta \\ \frac{1}{\sqrt{2}}\sin\alpha\cos\beta - \frac{i}{\sqrt{2}}\cos\alpha & \frac{1}{\sqrt{2}}\sin\alpha\cos\beta + \frac{i}{\sqrt{2}}\cos\alpha & \sin\alpha\sin\beta \\ -\frac{1}{\sqrt{2}}\sin\beta & -\frac{1}{\sqrt{2}}\sin\beta & \cos\beta \end{pmatrix}. \quad (17.111)$$

The vector representing the rotation axis corresponds to the eigenvalue 1. The direction
cosines of the x-, y-, and z-components of the rotation axis A are cosα sinβ, sinα sinβ,
and cosβ (see Fig. 3.1), respectively, when viewed in reference to the original xyz-coordinate system. This can directly be shown as follows: The characteristic equation of R3 is expressed as

$$|R_3 - \lambda E| = 0. \quad (17.112)$$

Using (17.107) we have

$$|R_3 - \lambda E| = \begin{vmatrix} (\cos^2\alpha\cos^2\beta + \sin^2\alpha)\cos\gamma + \cos^2\alpha\sin^2\beta - \lambda & \cos\alpha\sin\alpha\sin^2\beta(1-\cos\gamma) - \cos\beta\sin\gamma & \cos\alpha\cos\beta\sin\beta(1-\cos\gamma) + \sin\alpha\sin\beta\sin\gamma \\ \cos\alpha\sin\alpha\sin^2\beta(1-\cos\gamma) + \cos\beta\sin\gamma & (\sin^2\alpha\cos^2\beta + \cos^2\alpha)\cos\gamma + \sin^2\alpha\sin^2\beta - \lambda & \sin\alpha\cos\beta\sin\beta(1-\cos\gamma) - \cos\alpha\sin\beta\sin\gamma \\ \cos\alpha\cos\beta\sin\beta(1-\cos\gamma) - \sin\alpha\sin\beta\sin\gamma & \sin\alpha\cos\beta\sin\beta(1-\cos\gamma) + \cos\alpha\sin\beta\sin\gamma & \sin^2\beta\cos\gamma + \cos^2\beta - \lambda \end{vmatrix}. \quad (17.113)$$

When λ = 1, we must have the direction cosines of the rotation axis as an
eigenvector. That is, we get

$$\begin{pmatrix} (\cos^2\alpha\cos^2\beta + \sin^2\alpha)\cos\gamma + \cos^2\alpha\sin^2\beta - 1 & \cos\alpha\sin\alpha\sin^2\beta(1-\cos\gamma) - \cos\beta\sin\gamma & \cos\alpha\cos\beta\sin\beta(1-\cos\gamma) + \sin\alpha\sin\beta\sin\gamma \\ \cos\alpha\sin\alpha\sin^2\beta(1-\cos\gamma) + \cos\beta\sin\gamma & (\sin^2\alpha\cos^2\beta + \cos^2\alpha)\cos\gamma + \sin^2\alpha\sin^2\beta - 1 & \sin\alpha\cos\beta\sin\beta(1-\cos\gamma) - \cos\alpha\sin\beta\sin\gamma \\ \cos\alpha\cos\beta\sin\beta(1-\cos\gamma) - \sin\alpha\sin\beta\sin\gamma & \sin\alpha\cos\beta\sin\beta(1-\cos\gamma) + \cos\alpha\sin\beta\sin\gamma & \sin^2\beta\cos\gamma + \cos^2\beta - 1 \end{pmatrix} \begin{pmatrix} \cos\alpha\sin\beta \\ \sin\alpha\sin\beta \\ \cos\beta \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}. \quad (17.114)$$
The above matrix calculations certainly verify that (17.114) holds; readers, check it (a numerical check is sketched below).
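The following sketch (ours, assuming NumPy) builds R3 from (17.106) for arbitrary Euler angles and verifies the trace (17.108) and the eigenvector of (17.114):

```python
import numpy as np

def Rz(t):
    return np.array([[np.cos(t), -np.sin(t), 0],
                     [np.sin(t), np.cos(t), 0],
                     [0, 0, 1]])

def Ry(t):
    return np.array([[np.cos(t), 0, np.sin(t)],
                     [0, 1, 0],
                     [-np.sin(t), 0, np.cos(t)]])

alpha, beta, gamma = 0.4, 1.1, 2.0          # arbitrary Euler angles
P = Rz(alpha) @ Ry(beta)                    # P~ of (17.105)
R3 = P @ Rz(gamma) @ P.T                    # (17.106); P is orthogonal

# (17.108): trace of R3 equals 1 + 2 cos(gamma)
assert np.isclose(np.trace(R3), 1 + 2 * np.cos(gamma))

# (17.114): the axis has direction cosines (cos a sin b, sin a sin b, cos b)
axis = np.array([np.cos(alpha) * np.sin(beta),
                 np.sin(alpha) * np.sin(beta),
                 np.cos(beta)])
assert np.allclose(R3 @ axis, axis)         # eigenvector with eigenvalue 1
```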
As an application of (17.107) to an illustrative example, let us consider a 2π/3
rotation around an axis trisecting a solid angle π/2 formed by the x-, y-, and z-axes
(see Fig. 17.19 and Sect. 17.3). Then we have

$$\cos\alpha = \sin\alpha = 1/\sqrt{2}, \quad \cos\beta = 1/\sqrt{3}, \quad \sin\beta = \sqrt{2}/\sqrt{3},$$
$$\cos\gamma = -1/2, \quad \sin\gamma = \sqrt{3}/2. \quad (17.115)$$

Substituting (17.115) into (17.107), we get


Fig. 17.19 Rotation axis of C3 that permutates basis vectors. (a) The C3 axis is a straight line that
connects the origin and each vertex of a cube. (b) Cube viewed along the C3 axis that connects the
origin and a vertex

$$R_3 = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}. \quad (17.116)$$

If we write the linear transformation R3 following (11.37), we get

$$R_3(|x\rangle) = (|e_1\rangle\ |e_2\rangle\ |e_3\rangle)\begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}, \quad (17.117)$$

where |e1⟩, |e2⟩, and |e3⟩ represent the unit basis vectors in the direction of the x-, y-, and z-axes, respectively. We have |x⟩ = x1|e1⟩ + x2|e2⟩ + x3|e3⟩. Thus, we get

$$R_3(|x\rangle) = (|e_2\rangle\ |e_3\rangle\ |e_1\rangle)\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}. \quad (17.118)$$

This implies a cyclic permutation of the basis vectors. This is equally well expressed in terms
of the column vector (i.e., coordinate) transformation as

$$R_3(|x\rangle) = (|e_1\rangle\ |e_2\rangle\ |e_3\rangle)\begin{pmatrix} x_3 \\ x_1 \\ x_2 \end{pmatrix}. \quad (17.119)$$

Care should be taken as to which linear transformation is intended: the transformation of basis
vectors or the transformation of coordinates. The two readings are sketched below.
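A two-line numerical illustration (ours, assuming NumPy) of the two readings:

```python
import numpy as np

M = np.array([[0, 0, 1], [1, 0, 0], [0, 1, 0]])   # R3 of (17.116)

# Columns of M are the images of the basis vectors: e1 -> e2, e2 -> e3, e3 -> e1
for i in range(3):
    print(f"e{i+1} ->", M[:, i])

x = np.array([1, 2, 3])
print(M @ x)      # [3 1 2]: the coordinates transform as in (17.119)
```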
As mentioned above, we have compared geometric features on the moving
coordinate system and the fixed coordinate system. The features apparently seem to
differ at a glance, but are essentially the same.

References

1. Hamermesh M (1989) Group theory and its application to physical problems. Dover, New York
2. Cotton FA (1990) Chemical applications of group theory. Wiley, New York
3. Edmonds AR (1957) Angular momentum in quantum mechanics. Princeton University Press,
Princeton
4. Chen JQ, Ping J, Wang F (2002) Group representation theory for physicists, 2nd edn. World
Scientific, Singapore
5. Arfken GB, Weber HJ, Harris FE (2013) Mathematical methods for physicists, 7th edn. Academic Press, Waltham
Chapter 18
Representation Theory of Groups

Representation theory is an important pillar of group theory. As we shall see
soon, the word "representation" and its definition sound a bit daunting. In short,
however, we need "numbers" or "matrices" to do a mathematical calculation.
Therefore, we may think of a representation as merely numbers and matrices.
Individual representations have their dimension. If that dimension is one, we treat a
representation as a number (real or complex). If the dimension is two or more, we are
going to deal with matrices; in the n-dimensional case, it is an (n, n) square
matrix. In this chapter we focus on the representation theory of finite groups. In this
case, we have an important theorem stating that a representation of any finite group
can be converted to a unitary representation by a similarity transformation. That is,
the group elements of a finite group are represented by unitary matrices. According to the
dimension of the representation, we have the same number of basis vectors. Bearing
these things firmly in mind, we can pretty easily understand this important notion of
representation.

18.1 Definition of Representation

In Sect. 16.4 we dealt with various aspects of mappings between group elements.
Of these, we studied fundamental properties of isomorphism and homomorphism. In
this section we introduce the notion of representation of groups and study it. If we
deal with a finite group consisting of n elements, we describe it as ℊ = {g1 ≡ e,
g2, ⋯, gn} as in the case of Sect. 16.1.
Definition 18.1 Let ℊ = {gν} be a group comprising elements gν. Suppose that a
(d, d) matrix D(gν) is given for each group element gν. Suppose also that in
correspondence with


$$g_\mu g_\nu = g_\rho, \quad (18.1)$$
$$D(g_\mu) D(g_\nu) = D(g_\rho) \quad (18.2)$$

holds. Then the set L consisting of the D(gν), that is,

$$L = \{D(g_\nu)\},$$

is said to be a representation.
Although the definition seems somewhat daunting, the representation is, as
already seen, merely a homomorphism.
We call the individual matrices D(gν) representation matrices. The dimension d of the
matrices is said to be the dimension of the representation as well. In correspondence with
gνe = gν, we have

$$D(g_\mu) D(e) = D(g_\mu). \quad (18.3)$$

Therefore, we get

$$D(e) = E, \quad (18.4)$$

where E is an identity matrix of dimension d. Also, in correspondence with
$g_\nu g_\nu^{-1} = g_\nu^{-1} g_\nu = e$, we have

$$D(g_\nu) D(g_\nu^{-1}) = D(g_\nu^{-1}) D(g_\nu) = D(e) = E. \quad (18.5)$$

That is,

$$D(g_\nu^{-1}) = [D(g_\nu)]^{-1}. \quad (18.6)$$

Namely, an inverse matrix corresponds to an inverse element. If the representation has one-to-one correspondence, the representation is said to be faithful. In the
case of n-to-one correspondence, the representation is homomorphic. In particular,
representing all group elements by 1 [as a (1, 1) matrix] is called an "identity
representation."
Illustrative examples of the representation have already appeared in Sects. 17.2
and 17.3. In those cases the representation was faithful. For instance, in Tables 17.2,
17.4, and 17.6 as well as Fig. 17.12, the number of representation matrices is the
same as that of group elements. In Sects. 17.2 and 17.3, in most cases we used
real orthogonal matrices. Such matrices are included among unitary matrices. A representation using unitary matrices is said to be a "unitary representation." In fact, a
representation of a finite group can be converted to a unitary representation.

Theorem 18.1 [1, 2] A representation of any finite group can be converted to a
unitary representation by a similarity transformation.
Proof Let ℊ = {g1, g2, ⋯, gn} be a finite group of order n. Let D(gν) be a
representation matrix of gν of dimension d. Here we suppose that D(gν) is not
unitary, but non-singular. Using the D(gi), we construct the following matrix H such
that

$$H = \sum_{i=1}^{n} D(g_i)^\dagger D(g_i). \quad (18.7)$$

Then, for an arbitrarily chosen group element gj we have

$$D(g_j)^\dagger H D(g_j) = \sum_{i=1}^{n} D(g_j)^\dagger D(g_i)^\dagger D(g_i) D(g_j) = \sum_{i=1}^{n} \left[D(g_i)D(g_j)\right]^\dagger \left[D(g_i)D(g_j)\right]$$
$$= \sum_{i=1}^{n} D(g_i g_j)^\dagger D(g_i g_j) = \sum_{k=1}^{n} D(g_k)^\dagger D(g_k) = H, \quad (18.8)$$

where the third equality comes from the homomorphism of the representation and the
second last equality is due to the rearrangement theorem (see Sect. 16.1).
Note that each matrix D(gi)†D(gi) is an Hermitian Gram matrix constructed from a
non-singular matrix and that H is a summation of such Gram matrices. Consequently, on the basis of the argument of Sect. 13.2, H is positive definite and all
the eigenvalues of H are positive. Then, using an appropriate unitary matrix U, we
can get a diagonal matrix Λ such that

$$U^\dagger H U = \Lambda. \quad (18.9)$$

Here, the diagonal matrix Λ is given by

$$\Lambda = \begin{pmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_d \end{pmatrix}, \quad (18.10)$$

with λi > 0 (1 ≤ i ≤ d). We define $\Lambda^{1/2}$ such that



$$\Lambda^{1/2} = \begin{pmatrix} \sqrt{\lambda_1} & & & \\ & \sqrt{\lambda_2} & & \\ & & \ddots & \\ & & & \sqrt{\lambda_d} \end{pmatrix}, \quad (18.11)$$

$$\left(\Lambda^{1/2}\right)^{-1} = \begin{pmatrix} \sqrt{\lambda_1^{-1}} & & & \\ & \sqrt{\lambda_2^{-1}} & & \\ & & \ddots & \\ & & & \sqrt{\lambda_d^{-1}} \end{pmatrix}. \quad (18.12)$$

Notice that both $\Lambda^{1/2}$ and $(\Lambda^{1/2})^{-1}$ are non-singular. Furthermore, we define a
matrix V such that

$$V = U\left(\Lambda^{1/2}\right)^{-1}. \quad (18.13)$$

Then, multiplying both sides of (18.8) by $V^{-1}$ from the left and by V from the
right and inserting $VV^{-1} = E$ in between, we get

$$V^{-1} D(g_j)^\dagger V V^{-1} H V V^{-1} D(g_j) V = V^{-1} H V. \quad (18.14)$$

Meanwhile, we have

$$V^{-1} H V = \Lambda^{1/2} U^\dagger H U \left(\Lambda^{1/2}\right)^{-1} = \Lambda^{1/2} \Lambda \left(\Lambda^{1/2}\right)^{-1} = \Lambda. \quad (18.15)$$

With the second equality of (18.15), we used (18.9). Inserting (18.15) into
(18.14), we get

$$V^{-1} D(g_j)^\dagger V \Lambda V^{-1} D(g_j) V = \Lambda. \quad (18.16)$$

Multiplying both sides of (18.16) by $\Lambda^{-1}$ from the left, we get

$$\Lambda^{-1} V^{-1} D(g_j)^\dagger (V\Lambda) \cdot V^{-1} D(g_j) V = E. \quad (18.17)$$

Using (18.13), we have

$$\Lambda^{-1} V^{-1} = (V\Lambda)^{-1} = \left[U\Lambda^{1/2}\right]^{-1} = \left(\Lambda^{1/2}\right)^{-1} U^{-1} = \left(\Lambda^{1/2}\right)^{-1} U^\dagger.$$

Meanwhile, taking the adjoint of (18.13) and noting (18.12), we get

$$V^\dagger = \left[\left(\Lambda^{1/2}\right)^{-1}\right]^\dagger U^\dagger = \left(\Lambda^{1/2}\right)^{-1} U^\dagger.$$

Using (18.13) once again, we have

$$V\Lambda = U\Lambda^{1/2}.$$

Also using (18.13) we have

$$\left(V^{-1}\right)^\dagger = \left\{\left[U\left(\Lambda^{1/2}\right)^{-1}\right]^{-1}\right\}^\dagger = \left[\Lambda^{1/2} U^\dagger\right]^\dagger = U\left(\Lambda^{1/2}\right)^\dagger = U\Lambda^{1/2},$$

where we used the unitarity of U and (18.11).


Using the above relations and rewriting (18.17), finally we get

$$V^\dagger D(g_j)^\dagger \left(V^{-1}\right)^\dagger \cdot V^{-1} D(g_j) V = E. \quad (18.18)$$

Defining $\widetilde{D}(g_j)$ as follows,

$$\widetilde{D}(g_j) \equiv V^{-1} D(g_j) V, \quad (18.19)$$

and taking the adjoint of both sides of (18.19), we get

$$V^\dagger D(g_j)^\dagger \left(V^{-1}\right)^\dagger = \widetilde{D}(g_j)^\dagger. \quad (18.20)$$

Then, from (18.18) we have

$$\widetilde{D}(g_j)^\dagger \widetilde{D}(g_j) = E. \quad (18.21)$$

Equation (18.21) implies that a representation of any finite group can be
converted to a unitary representation by the similarity transformation of (18.19).
This completes the proof. ∎
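The construction used in the proof can be traced numerically. The sketch below (our own illustration, assuming NumPy) starts from a deliberately non-unitary two-dimensional representation of the two-element group {e, g} with g² = e and recovers a unitary representation via (18.7)–(18.19):

```python
import numpy as np

# A non-unitary but faithful representation of the two-element group:
# conjugate the unitary representation diag(1, -1) by a non-unitary T.
T = np.array([[1.0, 2.0], [0.0, 1.0]])
D_e = np.eye(2)
D_g = T @ np.diag([1.0, -1.0]) @ np.linalg.inv(T)
assert not np.allclose(D_g.conj().T @ D_g, np.eye(2))   # D_g is not unitary

# (18.7): H = sum_i D(g_i)^dagger D(g_i) is positive definite
H = D_e.conj().T @ D_e + D_g.conj().T @ D_g

# (18.9)-(18.13): diagonalize H and set V = U Lambda^{-1/2}
lam, U = np.linalg.eigh(H)
V = U @ np.diag(lam ** -0.5)

# (18.19): the transformed representation is unitary
for D in (D_e, D_g):
    Dt = np.linalg.inv(V) @ D @ V
    assert np.allclose(Dt.conj().T @ Dt, np.eye(2))
```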

18.2 Basis Functions of Representation

In Part III we considered a linear transformation of a vector. In that case we
defined vectors as abstract elements for which the operation laws (11.1)–(11.8)
hold. We assumed that the operation is addition. In this part, so far we have dealt
with vectors mostly in ℝ3. Therefore, the vectors naturally possess geometric features.
In this section, we extend the notion of vectors so that they can be treated under a wider
scope. More specifically, we include mathematical functions treated in analysis as
vectors.
To this end, let us think of a basis of a representation. According to the dimension
d of the representation, we have d basis vectors. Here we adopt d linearly independent basis vectors for the representation.
Let ψ1, ψ2, ⋯, and ψd be linearly independent vectors in a vector space Vd. Let
ℊ = {g1, g2, ⋯, gn} be a finite group of order n. Here we assume that gi (1 ≤ i ≤ n)
is a linear transformation (or operator) such as a symmetry operation dealt with in
Chap. 17. Suppose that the following relation holds with gi ∈ ℊ (1 ≤ i ≤ n):

$$g_i(\psi_\nu) = \sum_{\mu=1}^{d} \psi_\mu D_{\mu\nu}(g_i) \quad (1 \le \nu \le d). \quad (18.22)$$

Here we followed the notation of (11.37) that represented a linear transformation
of a vector. Rewriting (18.22) more explicitly, we have

$$g_i(\psi_\nu) = (\psi_1\ \psi_2\ \cdots\ \psi_d)\begin{pmatrix} D_{11}(g_i) & \cdots & D_{1d}(g_i) \\ \vdots & \ddots & \vdots \\ D_{d1}(g_i) & \cdots & D_{dd}(g_i) \end{pmatrix}. \quad (18.23)$$

Comparing (18.23) with (11.37), we notice that ψ1, ψ2, ⋯, and ψd act as vectors.
The corresponding coordinates of the vectors (i.e., a column vector with entries x1, x2, ⋯,
xd) have been omitted.
Now let us make sure that a set L consisting of {D(g1), D(g2), ⋯, D(gn)} forms a
representation. Operating gj on (18.22), we have

$$g_j[g_i(\psi_\nu)] = [g_j g_i](\psi_\nu) = g_j\left[\sum_{\mu=1}^{d}\psi_\mu D_{\mu\nu}(g_i)\right] = \sum_{\mu=1}^{d} g_j(\psi_\mu)D_{\mu\nu}(g_i)$$
$$= \sum_{\mu=1}^{d}\sum_{\lambda=1}^{d}\psi_\lambda D_{\lambda\mu}(g_j)D_{\mu\nu}(g_i) = \sum_{\lambda=1}^{d}\psi_\lambda\left[D(g_j)D(g_i)\right]_{\lambda\nu}. \quad (18.24)$$

Putting gjgi = gk according to the multiplication of group elements, we get

$$g_k(\psi_\nu) = \sum_{\lambda=1}^{d} \psi_\lambda \left[D(g_j) D(g_i)\right]_{\lambda\nu}. \quad (18.25)$$

Meanwhile, replacing gi in (18.22) with gk, we have

$$g_k(\psi_\nu) = \sum_{\lambda=1}^{d} \psi_\lambda D_{\lambda\nu}(g_k). \quad (18.26)$$

Comparing (18.25) and (18.26) and considering the uniqueness of the vector representation based on the linear independence of the basis vectors (see Sect. 11.1), we get

$$\left[D(g_j) D(g_i)\right]_{\lambda\nu} = D_{\lambda\nu}(g_k) \equiv \left[D(g_k)\right]_{\lambda\nu}, \quad (18.27)$$

where the last identity follows the notation (11.38) of Sect. 11.2. Hence, we have

$$D(g_j) D(g_i) = D(g_k). \quad (18.28)$$

Thus, the set L consisting of {D(g1), D(g2), ⋯, D(gn)} is certainly a representation of the group ℊ = {g1, g2, ⋯, gn}. In such a case, the set B consisting of d linearly
independent functions (i.e., vectors)

$$B = \{\psi_1, \psi_2, \cdots, \psi_d\} \quad (18.29)$$

is said to be the basis functions of the representation D. The number d equals the
dimension of the representation. Correspondingly, the representation matrix is a (d, d)
square matrix. As remarked above, the correspondence between the elements
of L and ℊ is not necessarily one-to-one (isomorphic), but may be n-to-one
(homomorphic).
Definition 18.2 Let D and D′ be two representations of ℊ = {g1, g2, ⋯, gn}.
Suppose that these representations are related to each other by a similarity transformation such that

$$D'(g_i) = T^{-1} D(g_i) T \quad (1 \le i \le n), \quad (18.30)$$

where T is a non-singular matrix. Then, D and D′ are said to be equivalent representations, or simply equivalent. If the representations are not equivalent, they are
called inequivalent.
Suppose that $B = \{\psi_1, \psi_2, \cdots, \psi_d\}$ is a basis of a representation D of
ℊ = {g1, g2, ⋯, gn}. Then we have (18.22). Let T be a non-singular matrix. Using
T, we want to transform the basis of the representation D from B to a new set
$B' = \{\psi'_1, \psi'_2, \cdots, \psi'_d\}$. The individual elements of the new basis are expressed as

$$\psi'_\nu = \sum_{\lambda=1}^{d} \psi_\lambda T_{\lambda\nu}. \quad (18.31)$$

Since T is non-singular, this ensures that $\psi'_1, \psi'_2, \cdots$, and $\psi'_d$ are linearly independent and, hence, that B′ forms another basis set of D (see the discussion of Sect.
11.4). Thus, we can describe ψμ (1 ≤ μ ≤ d) in terms of the $\psi'_\nu$. That is,

$$\psi_\mu = \sum_{\lambda=1}^{d} \psi'_\lambda \left(T^{-1}\right)_{\lambda\mu}. \quad (18.32)$$

Operating gi on both sides of (18.31), we have

$$g_i(\psi'_\nu) = \sum_{\lambda=1}^{d} g_i(\psi_\lambda) T_{\lambda\nu} = \sum_{\lambda=1}^{d} \sum_{\mu=1}^{d} \psi_\mu D_{\mu\lambda}(g_i) T_{\lambda\nu}$$
$$= \sum_{\lambda=1}^{d} \sum_{\mu=1}^{d} \sum_{\kappa=1}^{d} \psi'_\kappa \left(T^{-1}\right)_{\kappa\mu} D_{\mu\lambda}(g_i) T_{\lambda\nu} = \sum_{\kappa=1}^{d} \psi'_\kappa \left[T^{-1} D(g_i) T\right]_{\kappa\nu}. \quad (18.33)$$

Let D′ be the representation of ℊ in reference to $B' = \{\psi'_1, \psi'_2, \cdots, \psi'_d\}$. Then we
have

$$g_i(\psi'_\nu) = \sum_{\kappa=1}^{d} \psi'_\kappa D'_{\kappa\nu}. \quad (18.34)$$

Hence, (18.30) follows in virtue of the linear independence of
$\psi'_1, \psi'_2, \cdots$, and $\psi'_d$. Thus, we see that the transformation of basis vectors via
(18.31) causes a similarity transformation between representation matrices. This is
in parallel with (11.81) and (11.88).
Below we show several important notions as definitions. We assume that the group
is a finite group.
Definition 18.3 Let D and $\widetilde{D}$ be two representations of ℊ = {g1, g2, ⋯, gn}. Let
D(gi) and $\widetilde{D}(g_i)$ be two representation matrices of gi (1 ≤ i ≤ n). If we construct Δ(gi)
such that

$$\Delta(g_i) = \begin{pmatrix} D(g_i) & \\ & \widetilde{D}(g_i) \end{pmatrix}, \quad (18.35)$$

then Δ(gi) is a representation as well. The representation Δ(gi) is said to be a direct
sum of D(gi) and $\widetilde{D}(g_i)$. We denote it by

$$\Delta(g_i) = D(g_i) \oplus \widetilde{D}(g_i). \quad (18.36)$$

The dimension of Δ(gi) is the sum of that of D(gi) and that of $\widetilde{D}(g_i)$.
Definition 18.4 Let D be a representation of ℊ and let D(gi) be a representation
matrix of a group element gi. If we can convert D(gi) to a block matrix such as
(18.35) by its similarity transformation, D is said to be a reducible representation, or
completely reducible. If the representation is not reducible, it is irreducible.

A reducible representation can be decomposed (or reduced) into a direct sum
of plural irreducible representations. Such an operation (or procedure) is called
reduction.
Definition 18.5 Let ℊ = {g1, g2, ⋯, gn} be a group and let V be a vector space.
Then, we denote a representation D of ℊ operating on V by

$$D : \mathscr{G} \to GL(V).$$

We call V a representation space (or carrier space) of D [3, 4].
From Definition 18.5, the dimension of the representation is identical with the
dimension of V. Suppose that there is a subspace W in V. If W is D(gi)-invariant
(see Sect. 12.2) for all gi ∈ ℊ, W is said to be an invariant subspace of V. Here, we
say that W is D(gi)-invariant if we have |x⟩ ∈ W ⟹ D(gi)|x⟩ ∈ W; see Sect. 12.2.
Notice that |x⟩ may well represent a function ψν of (18.22). In this context, we have the
following important theorem.
Theorem 18.2 [3] Let D: ℊ → GL(V) be a unitary representation over V. Let W be a
D(gi)-invariant subspace of V, where gi (1 ≤ i ≤ n) ∈ ℊ = {g1, g2, ⋯, gn}. Then, W⊥
is a D(gi)-invariant subspace of V as well.
Proof Suppose that |a⟩ ∈ W⊥ and let |b⟩ ∈ W. Then, from (13.86) we have

$$\langle b|D(g_i)|a\rangle = \langle a|D(g_i)^\dagger|b\rangle^* = \langle a|[D(g_i)]^{-1}|b\rangle^* = \langle a|D(g_i^{-1})|b\rangle^*,$$

where we used the fact that D(gi) is a unitary matrix; see (18.6). But, since W is
D(gi)-invariant from the supposition, $D(g_i^{-1})|b\rangle \in W$. Here notice that $g_i^{-1}$ must be
some gi (1 ≤ i ≤ n) ∈ ℊ. Therefore, we should have

$$\langle b|D(g_i)|a\rangle = \langle a|D(g_i^{-1})|b\rangle^* = 0.$$

This implies that D(gi)|a⟩ ∈ W⊥. This means that W⊥ is a D(gi)-invariant
subspace of V. This completes the proof. ∎
From Theorem 14.1, we have

$$V = W \oplus W^\perp.$$

Correspondingly, as in the case of (12.102), D(gi) is reduced as follows:

$$D(g_i) = \begin{pmatrix} D^{(W)}(g_i) & \\ & D^{(W^\perp)}(g_i) \end{pmatrix},$$


where $D^{(W)}(g_i)$ and $D^{(W^\perp)}(g_i)$ are representation matrices associated with the subspaces
W and W⊥, respectively. This is the converse of the additive representations that
appeared in Definition 18.3. From (11.17) and (11.18), we have

$$\dim V = \dim W + \dim W^\perp. \quad (18.37)$$

If V is a d-dimensional vector space, V is spanned by d linearly independent basis
vectors (or functions) ψμ (1 ≤ μ ≤ d). Suppose that the dimensions of W and W⊥ are $d^{(W)}$
and $d^{(W^\perp)}$, respectively. Then, W and W⊥ are spanned by $d^{(W)}$ linearly independent
vectors and $d^{(W^\perp)}$ linearly independent vectors, respectively. The subspaces W and
W⊥ may well be further decomposed into orthogonal complements [i.e., D(gi)-invariant subspaces of V]. In this situation, in general, we write

$$D(g_i) = D^{(1)}(g_i) \oplus D^{(2)}(g_i) \oplus \cdots \oplus D^{(\omega)}(g_i). \quad (18.38)$$

We will develop further detailed discussion in Sect. 18.4.
If the aforementioned decomposition cannot be done, D is irreducible. Therefore,
it will be of great importance to examine the dimensions of the irreducible representations
contained in (18.38). This is closely related to the choice of basis vectors and the
properties of the invariant subspaces. Notice that the same irreducible representation
may occur repeatedly in (18.38).
Before advancing to the next section, however, let us think of an example to get
used to these abstract concepts. This is at the same time for the purpose of anticipating the
contents of Chap. 19.
Example 18.1 Figure 18.1 shows structural formulae including resonance structures for the allyl radical. Thanks to the resonance structures, the allyl radical belongs to
C2v; see the multiplication table in Table 17.1. The molecule lies on the yz-plane, and
the line connecting C1 and a bonded H is the C2 axis (see Fig. 18.1). The C2 axis is
identical to the z-axis. The molecule has mirror symmetries with respect to the yz-
and zx-planes. We denote the π-orbitals of C1, C2, and C3 by ϕ1, ϕ2, and ϕ3, respectively. We suppose that these orbitals extend toward the x-direction with a positive
sign on the upper side and a negative sign on the lower side relative to the plane of the
paper. Notice that we follow the custom of the group theory notation with the coordinate
setting in Fig. 18.1.
We consider an inner vector space V3 spanned by ϕ1, ϕ2, and ϕ3. To show explicitly
that these are vectors of the inner vector space, we express them as |ϕ1⟩, |ϕ2⟩,
and |ϕ3⟩ in this example. Then, according to (11.19) of Sect. 11.1, we write

$$V^3 = \mathrm{Span}\{|\phi_1\rangle, |\phi_2\rangle, |\phi_3\rangle\}.$$

The vector space V3 is the representation space pertinent to the representation D of the
present example. Also, in parallel with (13.32) of Sect. 13.2, we express an arbitrary
vector |ψ⟩ ∈ V3 as

Fig. 18.1 Allyl radical and its resonance structure. The molecule is placed on the yz-plane. The z-axis is identical with a straight line connecting C1 and H (i.e., the C2 axis)

$$|\psi\rangle = c_1|\phi_1\rangle + c_2|\phi_2\rangle + c_3|\phi_3\rangle = |c_1\phi_1 + c_2\phi_2 + c_3\phi_3\rangle = (|\phi_1\rangle\ |\phi_2\rangle\ |\phi_3\rangle)\begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix}.$$

Let us now operate a group element of C2v. For example, choosing C2(z), we have

$$C_2(z)(|\psi\rangle) = (|\phi_1\rangle\ |\phi_2\rangle\ |\phi_3\rangle)\begin{pmatrix} -1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & -1 & 0 \end{pmatrix}\begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix}. \quad (18.39)$$

Thus, C2(z) is represented by a (3, 3) matrix. Other group elements are
represented similarly. These results are collected in Table 18.1, where each matrix
is given with respect to |ϕ1⟩, |ϕ2⟩, and |ϕ3⟩ as basis vectors. Notice that the matrix
representations differ from those of Table 17.2, where we chose |x⟩, |y⟩, and |z⟩ for
the basis vectors. From Table 18.1, we immediately see that the representation
matrices are reduced to an upper (1, 1) diagonal block (i.e., just a number) and a
lower (2, 2) diagonal block.
In Table 18.1, we find that all the matrices are Hermitian (as well as unitary).
Since C2v is an Abelian group (i.e., commutative), in light of Theorem 14.13, we
should be able to diagonalize these matrices by a single unitary similarity transformation all at once. In fact, E and σ′v(yz) are invariant with respect to a unitary similarity
transformation, and so we only have to diagonalize C2(z) and σv(zx) at once. As the
characteristic equation of the above (3, 3) matrix of (18.39), we have

Table 18.1 Matrix representation of symmetry operations given with respect to |ϕ1⟩, |ϕ2⟩, and |ϕ3⟩

E: $\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$  C2(z): $\begin{pmatrix} -1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & -1 & 0 \end{pmatrix}$  σv(zx): $\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}$  σ′v(yz): $\begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix}$

$$\begin{vmatrix} -\lambda - 1 & 0 & 0 \\ 0 & -\lambda & -1 \\ 0 & -1 & -\lambda \end{vmatrix} = 0.$$

Solving the above equation, we get λ = 1 or λ = −1 (as a double root). Also, as a
diagonalizing unitary matrix U, we get
$$U = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ 0 & \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{pmatrix} = U^\dagger. \quad (18.40)$$

Thus, (18.39) can be rewritten as

$$C_2(z)(|\psi\rangle) = (|\phi_1\rangle\ |\phi_2\rangle\ |\phi_3\rangle) U U^\dagger \begin{pmatrix} -1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & -1 & 0 \end{pmatrix} U U^\dagger \begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix}$$
$$= \left(|\phi_1\rangle\ \tfrac{1}{\sqrt{2}}(|\phi_2 + \phi_3\rangle)\ \tfrac{1}{\sqrt{2}}(|\phi_2 - \phi_3\rangle)\right) \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} c_1 \\ \tfrac{1}{\sqrt{2}}(c_2 + c_3) \\ \tfrac{1}{\sqrt{2}}(c_2 - c_3) \end{pmatrix}.$$

The diagonalization of the representation matrix σv(zx) using the same U as
above is left to the readers as an exercise (a numerical check is sketched at the end of this example). Table 18.2 shows the results of the
diagonalized representations with regard to the vectors |ϕ1⟩, $\tfrac{1}{\sqrt{2}}(|\phi_2 + \phi_3\rangle)$,
and $\tfrac{1}{\sqrt{2}}(|\phi_2 - \phi_3\rangle)$. Notice that the traces of the matrices remain unchanged between
Tables 18.1 and 18.2, i.e., before and after the unitary similarity transformation.
From Table 18.2, we find that these vectors are eigenfunctions of the individual
symmetry operations. Each diagonal element is a corresponding eigenvalue of
those operations. Of the three vectors, |ϕ1⟩ and $\tfrac{1}{\sqrt{2}}(|\phi_2 + \phi_3\rangle)$ have the same
eigenvalues with respect to the individual symmetry operations. With symmetry

Table 18.2 Matrix representation of symmetry operations given with respect to |ϕ1⟩, $\tfrac{1}{\sqrt{2}}(|\phi_2 + \phi_3\rangle)$, and $\tfrac{1}{\sqrt{2}}(|\phi_2 - \phi_3\rangle)$

E: $\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$  C2(z): $\begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$  σv(zx): $\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}$  σ′v(yz): $\begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix}$

operations C2 and σv(zx), the other vector $\tfrac{1}{\sqrt{2}}(|\phi_2 - \phi_3\rangle)$ has an eigenvalue of the
opposite sign to that of the former two vectors. Thus, we find that we have arrived at
"symmetry-adapted" vectors by taking linear combinations of the original vectors.
Returning to Table 17.2, we see that the representation matrices there had already
been diagonalized. In terms of the representation space, we constructed the representation matrices with respect to x, y, and z as symmetry-adapted basis vectors. In
the next chapter, we make the most of such vectors constructed by symmetry-adapted linear combination.
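The exercise mentioned above, and indeed the whole of Table 18.2, can be verified numerically. The following sketch (ours, assuming NumPy) applies the single unitary matrix U of (18.40) to all four matrices of Table 18.1 at once:

```python
import numpy as np

s = 1 / np.sqrt(2)
U = np.array([[1, 0, 0], [0, s, s], [0, s, -s]])   # (18.40); U = U^dagger

ops = {
    "E":            np.eye(3),
    "C2(z)":        np.array([[-1, 0, 0], [0, 0, -1], [0, -1, 0]]),
    "sigma_v(zx)":  np.array([[1, 0, 0], [0, 0, 1], [0, 1, 0]]),
    "sigma_v'(yz)": -np.eye(3),
}
for name, D in ops.items():
    Dt = U.T @ D @ U                                 # U is real, so U^dagger = U^T
    assert np.allclose(Dt, np.diag(np.diag(Dt)))     # diagonalized all at once
    assert np.isclose(np.trace(Dt), np.trace(D))     # trace is invariant
    print(name, np.diag(Dt).round())
```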
Meanwhile, by allocating appropriate numbers to the coefficients c1, c2, and c3, we can
represent any vector in V3 spanned by |ϕ1⟩, |ϕ2⟩, and |ϕ3⟩. In the next chapter, we
deal with molecular orbital (MO) calculations. Including the present case of the
allyl radical, we solve the energy eigenvalue problem by appropriately determining
those coefficients (i.e., eigenvectors) and the corresponding energy eigenvalues in a
representation space. The dimension of the representation space depends on the
number of molecular orbitals. The representation space is decomposed into several
or more (but finitely many) orthogonal complements according to (18.38). We will come
back to the present example in Sect. 19.4.4 and further investigate the problems
there.
As can be seen in the above example, the representation space accommodates
various types of vectors, e.g., mathematical functions in the present case. If the
representation space is decomposed into invariant subspaces, we can choose appropriate basis vectors for each subspace, the number of which is equal to the dimensionality of each irreducible representation [4]. In this context, the situation is related
to that of Part III, where we studied how a linear vector space is decomposed
into a direct sum of invariant subspaces that are spanned by associated basis vectors.
In particular, it will be of great importance to construct mutually orthogonal
symmetry-adapted vectors through linear combination of the original basis vectors in
each subspace associated with an irreducible representation. We further study these
important subjects in the following several sections.

18.3 Schur’s Lemmas and Grand Orthogonality Theorem (GOT)

Pivotal notions of the representation theory of finite groups rest upon Schur's
lemmas (the first lemma and the second lemma).
Schur's First Lemma [1, 2] Let D and $\widetilde{D}$ be two irreducible representations of ℊ.
Let the dimensions of the representations D and $\widetilde{D}$ be m and n, respectively. Suppose that
for ∀g ∈ ℊ the following relation holds:

$$D(g)M = M\widetilde{D}(g), \quad (18.41)$$

where M is an (m, n) matrix. Then we must have
Case (i): M = 0, or
Case (ii): M is a square matrix (i.e., m = n) with detM ≠ 0.
In Case (i), the representations D and $\widetilde{D}$ are inequivalent. In Case (ii), on the
other hand, D and $\widetilde{D}$ are equivalent.

Proof
(a) First, suppose that $m > n$. Let $B = \{\psi_1, \psi_2, \cdots, \psi_m\}$ be a basis set of the representation space related to $D$. Then we have

$$g(\psi_\nu) = \sum_{\mu=1}^{m} \psi_\mu D_{\mu\nu}(g) \quad (1 \le \nu \le m).$$

Next we form linear combinations of $\psi_1, \psi_2, \cdots,$ and $\psi_m$ such that

$$\phi_\nu = \sum_{\mu=1}^{m} \psi_\mu M_{\mu\nu} \quad (1 \le \nu \le n). \qquad (18.42)$$

Operating with $g$ on both sides of (18.42), we have

$$g(\phi_\nu) = \sum_{\mu=1}^{m} g(\psi_\mu) M_{\mu\nu} = \sum_{\mu=1}^{m}\Big[\sum_{\lambda=1}^{m} \psi_\lambda D_{\lambda\mu}(g)\Big] M_{\mu\nu} = \sum_{\lambda=1}^{m} \psi_\lambda \Big[\sum_{\mu=1}^{m} D_{\lambda\mu}(g) M_{\mu\nu}\Big] = \sum_{\lambda=1}^{m} \psi_\lambda \Big[\sum_{\mu=1}^{n} M_{\lambda\mu} \tilde{D}_{\mu\nu}(g)\Big] = \sum_{\mu=1}^{n} \Big[\sum_{\lambda=1}^{m} \psi_\lambda M_{\lambda\mu}\Big] \tilde{D}_{\mu\nu}(g) = \sum_{\mu=1}^{n} \phi_\mu \tilde{D}_{\mu\nu}(g). \qquad (18.43)$$

With the fourth equality of (18.43), we used (18.41). Therefore, $\tilde{B} = \{\phi_1, \phi_2, \cdots, \phi_n\}$ forms a basis of a representation of $\tilde{D}$.
If $m > n$, we would thus have been able to construct a basis using the $n$ ($< m$) functions $\phi_\nu$ ($1 \le \nu \le n$), each a linear combination of the $\psi_\mu$ ($1 \le \mu \le m$). In that case, we would have obtained a representation of smaller dimension $n$ for $\tilde{D}$ by taking linear combinations of $\psi_1, \psi_2, \cdots,$ and $\psi_m$. As $n < m$ by supposition, this implies that, as in (18.35), $D$ would be decomposed into representations whose dimensions are smaller than $m$, in contradiction to the supposition that $D$ is irreducible. To avoid this contradiction, we must have $M = 0$. This makes (18.41) hold trivially.
(b) Next we consider the case $m < n$. Taking the adjoint (complex conjugate transpose) of (18.41) and exchanging both sides, we get

$$\tilde{D}(g)^{\dagger} M^{\dagger} = M^{\dagger} D(g)^{\dagger}. \qquad (18.44)$$

Then, from (18.21) and (18.28) we have

$$D(g)^{\dagger} D(g) = E \quad \text{and} \quad D(g)D(g^{-1}) = D(e). \qquad (18.45)$$

Therefore, we get

$$D(g^{-1}) = D(g)^{-1} = D(g)^{\dagger}.$$

Similarly, we have

$$\tilde{D}(g^{-1}) = \tilde{D}(g)^{-1} = \tilde{D}(g)^{\dagger}.$$

Thus, (18.44) is rewritten as

$$\tilde{D}(g^{-1}) M^{\dagger} = M^{\dagger} D(g^{-1}). \qquad (18.46)$$

The number of rows of $M^{\dagger}$ is larger than the number of its columns; this time, the dimension $n$ of $\tilde{D}(g^{-1})$ is larger than the dimension $m$ of $D(g^{-1})$. A group element $g$ designates an arbitrary element of ℊ, and so does $g^{-1}$. Thus, (18.46) can be treated in parallel with (18.41), where $m > n$ with an $(m, n)$ matrix $M$. Figure 18.2 graphically shows the magnitude relationship between the dimensions of the representation matrices and $M$. Thus, in parallel with the argument of (a), we exclude the case $m < n$ as well.
(c) We consider the third case, $m = n$. As before, we make linear combinations of the vectors contained in the basis set $B = \{\psi_1, \psi_2, \cdots, \psi_m\}$ such that

$$\phi_\nu = \sum_{\mu=1}^{m} \psi_\mu M_{\mu\nu} \quad (1 \le \nu \le m), \qquad (18.47)$$

where $M$ is an $(m, m)$ square matrix. If $\det M = 0$, then $\phi_1, \phi_2, \cdots,$ and $\phi_m$ are linearly dependent (see Sect. 11.4), so the number $p$ of linearly independent vectors satisfies $p < m$. As in (18.43) again, this implies that we would have obtained a representation of smaller dimension $p$ for $\tilde{D}$, in contradiction to the supposition that $\tilde{D}$ is irreducible. To avoid this contradiction, we must have $\det M \neq 0$. These complete the proof. ∎

(a): >

( ) × = ( )
×
(m, m) (m, n) (m, n) (n, n)

(b): <

( ) × = × ( )
(n, n) (n, m) (n, m) (m, m)

Fig. 18.2 Magnitude relationship between dimensions of representation matrices and M. (a) m > n.
(b) m < n. The diagram is based on (18.41) and (18.46) of the text

Schur's Second Lemma [1, 2] Let $D$ be a representation of ℊ. Suppose that for all $g \in$ ℊ we have

$$D(g)M = MD(g). \qquad (18.48)$$

Then, if $D$ is irreducible, $M = cE$ with $c$ an arbitrary complex number, where $E$ is the identity matrix.
Proof Let $c$ be an arbitrarily chosen complex number. From (18.48) we have

$$D(g)(M - cE) = (M - cE)D(g). \qquad (18.49)$$

If $D$ is irreducible, Schur's First Lemma implies that we must have either (i) $M - cE = 0$ or (ii) $\det(M - cE) \neq 0$. A matrix $M$ has at least one eigenvalue $\lambda$ (Sect. 12.1), and so, choosing $\lambda$ for $c$ in (18.49), we have $\det(M - \lambda E) = 0$. Consequently, only the former case is allowed. That is, we have

$$M = cE. \qquad (18.50)$$
Schur's lemmas lead to an important orthogonality theorem that plays a fundamental role in many scientific fields. The orthogonality theorem covers both the matrices themselves and their traces (or characters).

Theorem 18.3: Grand Orthogonality Theorem (GOT) [5] Let $D^{(1)}, D^{(2)}, \cdots$ be all the inequivalent irreducible representations of a group ℊ = {g_1, g_2, ⋯, g_n} of order $n$. Let $D^{(\alpha)}$ and $D^{(\beta)}$ be two irreducible representations chosen from among $D^{(1)}, D^{(2)}, \cdots$. Then, regarding their matrix representations, we have the following relationship:
$$\sum_{g} D_{ij}^{(\alpha)}(g)^{*}\, D_{kl}^{(\beta)}(g) = \frac{n}{d_\alpha}\, \delta_{\alpha\beta}\, \delta_{ik}\, \delta_{jl}, \qquad (18.51)$$

where $\Sigma_g$ means that the summation is taken over all $n$ group elements, and $d_\alpha$ denotes the dimension of the representation $D^{(\alpha)}$. The symbol $\delta_{\alpha\beta}$ means that $\delta_{\alpha\beta} = 1$ when $D^{(\alpha)}$ and $D^{(\beta)}$ are equivalent and $\delta_{\alpha\beta} = 0$ when they are inequivalent.
Proof First we prove the case where $D^{(\alpha)} = D^{(\beta)}$. For simplicity of expression we omit the superscript and denote $D^{(\alpha)}$ simply by $D$. Let us construct a matrix $A$ such that

$$A = \sum_{g} D(g)\, X\, D(g^{-1}), \qquad (18.52)$$

where $X$ is an arbitrary matrix. Hence,

$$D(g')A = \sum_{g} D(g')D(g)XD(g^{-1}) = \sum_{g} D(g')D(g)XD(g^{-1})D(g'^{-1})D(g') = \sum_{g} D(g'g)\,X\,D[(g'g)^{-1}]\, D(g'). \qquad (18.53)$$

Thanks to the rearrangement theorem, for fixed $g'$ the element $g'g$ runs through all the group elements as $g$ does. Therefore, we have

$$\sum_{g} D(g'g)\,X\,D[(g'g)^{-1}] = \sum_{g} D(g)\,X\,D(g^{-1}) = A. \qquad (18.54)$$

Thus,

$$D(g)A = AD(g). \qquad (18.55)$$

According to Schur's Second Lemma, we have

$$A = \lambda E. \qquad (18.56)$$

The value of the constant $\lambda$ depends upon the choice of $X$. Let $X$ be $\delta_i^{(l)}\delta_j^{(m)}$, the matrix whose elements are all zero except for the $(l, m)$-component, which is 1 (Sects. 12.5 and 12.6). Thus, from (18.52) and (18.56) we have

$$\sum_{g,p,q} D_{ip}(g)\, \delta_p^{(l)} \delta_q^{(m)}\, D_{qj}(g^{-1}) = \sum_{g} D_{il}(g)\, D_{mj}(g^{-1}) = \lambda_{lm}\, \delta_{ij}, \qquad (18.57)$$

where $\lambda_{lm}$ is a constant to be determined. Using the unitarity of the representation, we have
$$\sum_{g} D_{il}(g)\, D_{jm}(g)^{*} = \lambda_{lm}\, \delta_{ij}. \qquad (18.58)$$
Next, we wish to determine the coefficients $\lambda_{lm}$. To this end, setting $i = j$ and summing over $i$ in (18.57), we get for the LHS

$$\sum_{g}\sum_{i} D_{il}(g)\, D_{mi}(g^{-1}) = \sum_{g} \big[D(g^{-1})D(g)\big]_{ml} = \sum_{g} \big[D(e)\big]_{ml} = \sum_{g} \delta_{ml} = n\,\delta_{ml}, \qquad (18.59)$$

where $n$ is the order of the group. As for the RHS, we have

$$\sum_{i} \lambda_{lm}\, \delta_{ii} = \lambda_{lm}\, d, \qquad (18.60)$$

where $d$ is the dimension of $D$. From (18.59) and (18.60), we get

$$\lambda_{lm}\, d = n\,\delta_{lm} \quad \text{or} \quad \lambda_{lm} = \frac{n}{d}\, \delta_{lm}. \qquad (18.61)$$

Therefore, from (18.58),

$$\sum_{g} D_{il}(g)\, D_{jm}(g)^{*} = \frac{n}{d}\, \delta_{lm}\, \delta_{ij}. \qquad (18.62)$$

Specifying the species of the irreducible representation, we get

$$\sum_{g} D_{il}^{(\alpha)}(g)\, D_{jm}^{(\alpha)}(g)^{*} = \frac{n}{d_\alpha}\, \delta_{lm}\, \delta_{ij}, \qquad (18.63)$$

where $d_\alpha$ is the dimension of $D^{(\alpha)}$.
Next, we examine the relationship between two inequivalent irreducible representations. Let $D^{(\alpha)}$ and $D^{(\beta)}$ be such representations with dimensions $d_\alpha$ and $d_\beta$, respectively. Let us construct a matrix $B$ such that

$$B = \sum_{g} D^{(\alpha)}(g)\, X\, D^{(\beta)}(g^{-1}), \qquad (18.64)$$

where $X$ is again an arbitrary matrix. Hence,

$$D^{(\alpha)}(g')B = \sum_{g} D^{(\alpha)}(g')D^{(\alpha)}(g)\,X\,D^{(\beta)}(g^{-1}) = \sum_{g} D^{(\alpha)}(g')D^{(\alpha)}(g)\,X\,D^{(\beta)}(g^{-1})D^{(\beta)}(g'^{-1})D^{(\beta)}(g') = \sum_{g} D^{(\alpha)}(g'g)\,X\,D^{(\beta)}[(g'g)^{-1}]\, D^{(\beta)}(g') = B\,D^{(\beta)}(g'). \qquad (18.65)$$

According to Schur's First Lemma, we have

$$B = 0. \qquad (18.66)$$

Putting $X = \delta_i^{(l)}\delta_j^{(m)}$ as before and rewriting (18.64), we get

$$\sum_{g} D_{il}^{(\alpha)}(g)\, D_{jm}^{(\beta)}(g)^{*} = 0. \qquad (18.67)$$

Combining (18.63) and (18.67), we get (18.51). These procedures complete the proof. ∎
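GOT can also be checked numerically for a group with a multidimensional irreducible representation. The following Python/NumPy sketch is a non-authoritative illustration using C3v (order n = 6): A1 and A2 are its one-dimensional irreducible representations, and the two-dimensional representation E is realized by the standard rotation and reflection matrices (an assumed but conventional choice).

import numpy as np

def rot(t):  # proper rotation through angle t
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s], [s, c]])

def ref(t):  # reflection; with the rotations this closes into C3v
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, s], [s, -c]])

n = 6
E2 = [rot(0), rot(2*np.pi/3), rot(4*np.pi/3),
      ref(0), ref(2*np.pi/3), ref(4*np.pi/3)]
irreps = {"A1": [np.eye(1)]*6,
          "A2": [np.eye(1)]*3 + [-np.eye(1)]*3,
          "E":  E2}

for a, Da in irreps.items():
    for b, Db in irreps.items():
        da = Da[0].shape[0]
        # S[i,j,k,l] = sum_g D^(a)_ij(g)* D^(b)_kl(g), cf. (18.51)
        S = sum(np.einsum("ij,kl->ijkl", M.conj(), N) for M, N in zip(Da, Db))
        if a == b:
            T = (n/da) * np.einsum("ik,jl->ijkl", np.eye(da), np.eye(da))
        else:
            T = np.zeros_like(S)
        print(a, b, np.allclose(S, T))  # True in every case

Each printed comparison confirms (18.51): the sum vanishes for inequivalent pairs and equals (n/dα) δ_ik δ_jl for α = β.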

18.4 Characters

Representation matrices of a group are square matrices. In Part III we examined the properties of the trace, i.e., the sum of the diagonal elements of a square matrix. In group theory the trace is called a character.

Definition 18.6 Let $D$ be a (matrix) representation of a group ℊ = {g_1, g_2, ⋯, g_n}. The sum of the diagonal elements, $\chi(g)$, is defined as follows:

$$\chi(g) \equiv \mathrm{Tr}\,D(g) = \sum_{i=1}^{d} D_{ii}(g), \qquad (18.68)$$

where $g$ stands for the group elements $g_1, g_2, \cdots,$ and $g_n$; Tr stands for "trace"; and $d$ is the dimension of the representation $D$. Let ∁ be the set defined as

$$∁ = \{\chi(g_1), \chi(g_2), \cdots, \chi(g_n)\}. \qquad (18.69)$$

Then the set ∁ is called the character of $D$. The character of an irreducible representation is said to be an irreducible character.

Let us describe several properties of the character or trace.

(i) The character of the identity element, $\chi(e)$, is equal to the dimension $d$ of the representation. This is because the identity element is represented by a unit matrix.
(ii) Let $P$ and $Q$ be two square matrices. Then we have

$$\mathrm{Tr}(PQ) = \mathrm{Tr}(QP). \qquad (18.70)$$

This is because

$$\sum_{i} (PQ)_{ii} = \sum_{i}\sum_{j} P_{ij} Q_{ji} = \sum_{j}\sum_{i} Q_{ji} P_{ij} = \sum_{j} (QP)_{jj}. \qquad (18.71)$$

Putting $Q = SP^{-1}$ in (18.70), we get

$$\mathrm{Tr}(PSP^{-1}) = \mathrm{Tr}(SP^{-1}P) = \mathrm{Tr}(S). \qquad (18.72)$$

Therefore, we have the following property:

(iii) Characters of group elements that are conjugate to each other are equal. If $g_i$ and $g_j$ are conjugate, these elements are connected by means of a suitable element $g$ such that

$$g\, g_i\, g^{-1} = g_j. \qquad (18.73)$$

Accordingly, the representation matrices satisfy

$$D(g)D(g_i)D(g^{-1}) = D(g)D(g_i)[D(g)]^{-1} = D(g_j). \qquad (18.74)$$

Taking the trace of both sides of (18.74), we have

$$\chi(g_i) = \chi(g_j). \qquad (18.75)$$

(iv) Any two equivalent representations have the same trace. This immediately follows from (18.30).
There are several orthogonality theorems about traces. Among them, the following theorem is well known.

Theorem 18.4 The traces of irreducible representations satisfy the following orthogonality relation:

$$\sum_{g} \chi^{(\alpha)}(g)^{*}\, \chi^{(\beta)}(g) = n\,\delta_{\alpha\beta}, \qquad (18.76)$$

where $\chi^{(\alpha)}$ and $\chi^{(\beta)}$ are the traces of the irreducible representations $D^{(\alpha)}$ and $D^{(\beta)}$, respectively.

Proof In (18.51), putting $i = j$ and $k = l$ on both sides and summing over all $i$ and $k$, we have
$$\sum_{g}\sum_{i,k} D_{ii}^{(\alpha)}(g)^{*}\, D_{kk}^{(\beta)}(g) = \sum_{i,k} \frac{n}{d_\alpha}\, \delta_{\alpha\beta}\, \delta_{ik}\, \delta_{ik} = \sum_{i,k} \frac{n}{d_\alpha}\, \delta_{\alpha\beta}\, \delta_{ik} = \frac{n}{d_\alpha}\, \delta_{\alpha\beta}\, d_\alpha = n\,\delta_{\alpha\beta}. \qquad (18.77)$$

From (18.68) and (18.77), we get (18.76). This completes the proof. ∎
Since the character takes the same value for all group elements belonging to the same conjugacy class $K_l$, we may write it as $\chi(K_l)$ and rewrite the summation of (18.76) as a summation over the conjugacy classes. Thus, we have

$$\sum_{l=1}^{n_c} \chi^{(\alpha)}(K_l)^{*}\, \chi^{(\beta)}(K_l)\, k_l = n\,\delta_{\alpha\beta}, \qquad (18.78)$$

where $n_c$ denotes the number of conjugacy classes in the group and $k_l$ indicates the number of group elements contained in the class $K_l$.
We have seen a case where a representation matrix can be reduced to two (or more) block matrices, as in (18.35). As already seen in Part III, such decomposition into block matrices takes place with normal matrices (including unitary matrices). The character is often used to examine the constitution of a reducible representation or a reducible matrix.

Alternatively, if a unitary matrix (or, more widely, a normal matrix) is decomposed into block matrices, we say that the unitary matrix comprises a direct sum of those block matrices. In physics and chemistry dealing with atoms, molecules, crystals, etc., we very often encounter such a situation. Extending (18.36), the relation can generally be summarized as

$$D(g_i) = D^{(1)}(g_i) \oplus D^{(2)}(g_i) \oplus \cdots \oplus D^{(\omega)}(g_i), \qquad (18.79)$$

where $D(g_i)$ is a reducible representation for a group element $g_i$, and $D^{(1)}(g_i), D^{(2)}(g_i), \cdots,$ and $D^{(\omega)}(g_i)$ are irreducible representations of the group. The notation $D^{(\omega)}(g_i)$ means that $D^{(\omega)}(g_i)$ may be equivalent (or identical) to $D^{(1)}(g_i), D^{(2)}(g_i)$, etc., or may be inequivalent to them. More specifically, the same irreducible representation may well appear several times.
To make the above situation clear, we usually use the following equation instead:

$$D(g_i) = \sum_{\alpha} q_\alpha D^{(\alpha)}(g_i), \qquad (18.80)$$

where $q_\alpha$ is zero or a positive integer and the $D^{(\alpha)}$ are the different types of irreducible representations. If the same $D^{(\alpha)}$ appears repeatedly in the direct sum, then $q_\alpha$ specifies how many times $D^{(\alpha)}$ appears there; unless $D^{(\alpha)}$ appears, $q_\alpha$ is zero.
Bearing the above in mind, we take the trace of (18.80). Then we have

$$\chi(g) = \sum_{\alpha} q_\alpha\, \chi^{(\alpha)}(g), \qquad (18.81)$$

where we omitted the subscript $i$ indicating the element. To find $q_\alpha$, let us multiply both sides of (18.81) by $\chi^{(\alpha)}(g)^{*}$ and take the summation over the group elements. That is,

$$\sum_{g} \chi^{(\alpha)}(g)^{*}\, \chi(g) = \sum_{\beta} q_\beta \sum_{g} \chi^{(\alpha)}(g)^{*}\, \chi^{(\beta)}(g) = \sum_{\beta} q_\beta\, n\,\delta_{\alpha\beta} = q_\alpha\, n, \qquad (18.82)$$

where we used (18.76) in the second equality. Thus, we get

$$q_\alpha = \frac{1}{n} \sum_{g} \chi^{(\alpha)}(g)^{*}\, \chi(g). \qquad (18.83)$$

The integer $q_\alpha$ explicitly gives the number of times $D^{(\alpha)}(g)$ appears in the reducible representation $D(g)$. The expression pertinent to the classes is

$$q_\alpha = \frac{1}{n} \sum_{i} \chi^{(\alpha)}(K_i)^{*}\, \chi(K_i)\, k_i, \qquad (18.84)$$

where $K_i$ and $k_i$ denote the $i$-th class of the group and the number of elements belonging to $K_i$, respectively.
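Equation (18.83) is easy to apply in practice. The following Python/NumPy sketch is an assumed example consistent with the allyl-radical discussion of Sect. 18.2: it reduces the character χ = (3, −1, 1, −3) of the three-orbital representation over C2v (these traces are our assumption for Table 18.1).

import numpy as np

# Character table of C2v (cf. Table 18.4); each class has one element, n = 4.
chars = {"A1": np.array([1,  1,  1,  1]),
         "A2": np.array([1,  1, -1, -1]),
         "B1": np.array([1, -1,  1, -1]),
         "B2": np.array([1, -1, -1,  1])}
n = 4

# Assumed reducible character of the (phi1, phi2, phi3) representation.
chi_red = np.array([3, -1, 1, -3])

for name, chi in chars.items():
    q = np.sum(chi * chi_red) / n   # (18.83); the characters here are real
    print(name, int(q))             # A2 appears once, B1 twice

The output A2 once and B1 twice agrees with the symmetry-adapted vectors found in Sect. 18.2: (1/√2)(|φ2⟩ − |φ3⟩) spans A2, while |φ1⟩ and (1/√2)(|φ2⟩ + |φ3⟩) span two copies of B1.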

18.5 Regular Representation and Group Algebra

Now, the reader may wonder how many different irreducible representations exist for a group. To answer this question, let us introduce a special representation called the regular representation.

Definition 18.7 Let ℊ = {g_1, g_2, ⋯, g_n} be a group. Let us define an $(n, n)$ square matrix $D^{(R)}(g_\nu)$ for an arbitrary group element $g_\nu$ ($1 \le \nu \le n$) such that

$$\big[D^{(R)}(g_\nu)\big]_{ij} = \delta\big(g_i^{-1} g_\nu g_j\big) \quad (1 \le i, j \le n), \qquad (18.85)$$

$$\text{where} \quad \delta(g_\nu) = \begin{cases} 1 & \text{for } g_\nu = e \ (\text{i.e., the identity}), \\ 0 & \text{for } g_\nu \neq e. \end{cases} \qquad (18.86)$$

Let us consider the set

$$R = \big\{D^{(R)}(g_1), D^{(R)}(g_2), \cdots, D^{(R)}(g_n)\big\}. \qquad (18.87)$$

Then the set $R$ is said to be the regular representation of the group ℊ.
In fact, $R$ is a representation. This is confirmed as follows. In (18.85), if $g_i^{-1} g_\nu g_j = e$, then $\delta(g_i^{-1} g_\nu g_j) = 1$; this occurs when $g_\nu g_j = g_i$ (A). Meanwhile, consider the situation where $g_j^{-1} g_\mu g_k = e$; this occurs when $g_\mu g_k = g_j$ (B). Replacing $g_j$ in (A) with that in (B), we have

$$g_\nu g_\mu g_k = g_i. \qquad (18.88)$$

That is,

$$g_i^{-1} g_\nu g_\mu g_k = e. \qquad (18.89)$$

If we choose $g_i$ and $g_\nu$, then $g_j$ is uniquely decided from (A). If $g_\mu$ is chosen separately, then $g_k$ is uniquely decided from (B) as well, because $g_j$ has already been uniquely decided. Thus, performing the following matrix calculation, we get

$$\sum_{j} \big[D^{(R)}(g_\nu)\big]_{ij} \big[D^{(R)}(g_\mu)\big]_{jk} = \sum_{j} \delta\big(g_i^{-1} g_\nu g_j\big)\, \delta\big(g_j^{-1} g_\mu g_k\big) = \delta\big(g_i^{-1} g_\nu g_\mu g_k\big) = \big[D^{(R)}(g_\nu g_\mu)\big]_{ik}. \qquad (18.90)$$

Rewriting (18.90) as a matrix product, we get

$$D^{(R)}(g_\nu)\, D^{(R)}(g_\mu) = D^{(R)}(g_\nu g_\mu). \qquad (18.91)$$

Thus, $D^{(R)}$ is certainly a representation.
To further confirm this, let us think of an example.

Example 18.2 Let us consider a thiophene molecule, which we have already examined in Sect. 17.2. In the multiplication table, we arrange E, C_2, σ_v, σ_v′ in the first column and their inverse elements E, C_2, σ_v, σ_v′ in the first row; in this case, each inverse element coincides with the element itself. Paying attention, e.g., to C_2, we allocate the number 1 to each place where C_2 appears and the number 0 otherwise. The resulting matrix is the regular representation of C_2; see Table 18.3. Thus, as $D^{(R)}(C_2)$ we get

$$D^{(R)}(C_2) = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}. \qquad (18.92)$$

As evidenced by (18.92), the rearrangement theorem ensures that the number 1 appears once and only once in each column and each row, in such a way that the individual column and row vectors are linearly independent. Thus, at the same time, we confirm that the matrix is unitary.
Another characteristic of the regular representation is that the identity is represented by an identity matrix. In this example, we have

$$D^{(R)}(E) = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}. \qquad (18.93)$$

For the other symmetry operations, we have

$$D^{(R)}(\sigma_v) = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}, \quad D^{(R)}(\sigma_v') = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix}.$$
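Definition (18.85) translates directly into code. The following Python/NumPy sketch builds the regular representation of C2v from its multiplication table (Table 18.3); the encoding of the table as an index array is our assumption, not the book's notation.

import numpy as np

# Multiplication table of C2v in the order (E, C2, sv, sv'):
# table[i, j] = index of g_i * g_j (cf. Table 18.3).
table = np.array([[0, 1, 2, 3],
                  [1, 0, 3, 2],
                  [2, 3, 0, 1],
                  [3, 2, 1, 0]])

def D_reg(nu):
    """Regular representation (18.85): [D(g_nu)]_ij = delta(g_i^{-1} g_nu g_j)."""
    n = len(table)
    inv = [int(np.where(table[i] == 0)[0][0]) for i in range(n)]  # g_i^{-1}
    M = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(n):
            M[i, j] = int(table[inv[i], table[nu, j]] == 0)
    return M

for nu, name in enumerate(["E", "C2", "sv", "sv'"]):
    print(name, D_reg(nu).tolist())  # D_reg(1) reproduces (18.92)

The traces of the printed matrices are (4, 0, 0, 0), anticipating (18.94) below.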

Let $\chi^{(R)}(g_\nu)$ be the character of the regular representation. Then, according to the definition (18.85),

$$\chi^{(R)}(g_\nu) = \sum_{i=1}^{n} \delta\big(g_i^{-1} g_\nu g_i\big) = \begin{cases} n & \text{for } g_\nu = e, \\ 0 & \text{for } g_\nu \neq e. \end{cases} \qquad (18.94)$$

As can be seen from (18.92), the regular representation is reducible, because the matrix can be decomposed into block matrices. Therefore, the representation can be reduced to a direct sum of irreducible representations such that

$$D^{(R)} = \sum_{\alpha} q_\alpha D^{(\alpha)}, \qquad (18.95)$$

where $q_\alpha$ is a positive integer or zero and $D^{(\alpha)}$ is an irreducible representation. Then, from (18.81), we have

Table 18.3 How to make a regular representation of C2v

  C2v      | E⁻¹  | C2(z)⁻¹ | σv(zx)⁻¹ | σv′(yz)⁻¹
  E        | E    | C2      | σv       | σv′
  C2(z)    | C2   | E       | σv′      | σv
  σv(zx)   | σv   | σv′     | E        | C2
  σv′(yz)  | σv′  | σv      | C2       | E
$$\chi^{(R)}(g_\nu) = \sum_{\alpha} q_\alpha\, \chi^{(\alpha)}(g_\nu), \qquad (18.96)$$

where $\chi^{(\alpha)}$ is the trace of the irreducible representation $D^{(\alpha)}$. Using (18.83) and (18.94),

$$q_\alpha = \frac{1}{n} \sum_{g} \chi^{(\alpha)}(g)^{*}\, \chi^{(R)}(g) = \frac{1}{n}\, \chi^{(\alpha)}(e)^{*}\, \chi^{(R)}(e) = \frac{1}{n}\, \chi^{(\alpha)}(e)^{*}\, n = \chi^{(\alpha)}(e)^{*} = d_\alpha. \qquad (18.97)$$

Note that the dimension $d_\alpha$ of the representation $D^{(\alpha)}$ is equal to the trace of its identity matrix. Notice also from (18.95) and (18.97) that $D^{(R)}$ contains every irreducible representation $D^{(\alpha)}$ exactly $d_\alpha$ times. To show this more clearly, Table 18.4 gives the character table of C2v. The regular representation matrices are given in Example 18.2. For these, we have

$$D^{(R)}(C_{2v}) = A_1 + A_2 + B_1 + B_2. \qquad (18.98)$$

This relation indicates that each irreducible representation of C2v is contained exactly once, which equals its dimension (every irreducible representation of C2v being one-dimensional). Returning to (18.94) and (18.96) and replacing $q_\alpha$ with $d_\alpha$ there, we get

$$\sum_{\alpha} d_\alpha\, \chi^{(\alpha)}(g_\nu) = \begin{cases} n & \text{for } g_\nu = e, \\ 0 & \text{for } g_\nu \neq e. \end{cases} \qquad (18.99)$$
In particular, when $g_\nu = e$, we again have $\chi^{(\alpha)}(e) = d_\alpha$ (note that $d_\alpha$ is a real number!). That is,

$$\sum_{\alpha} d_\alpha^{\,2} = n. \qquad (18.100)$$

This is a very important relation in the representation theory of finite groups in that (18.100) sets an upper limit on the number of irreducible representations and their dimensions: that number cannot exceed the order of the group.
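Relations (18.97) and (18.100) are readily confirmed numerically. A short sketch for C3v, whose regular representation contains A1 and A2 once and the two-dimensional E twice (the per-element character lists are assumptions consistent with the standard C3v table):

import numpy as np

# Per-element characters of the C3v irreps (elements ordered e, 2C3, 3sv).
chars = {"A1": [1]*6, "A2": [1]*3 + [-1]*3, "E": [2, -1, -1, 0, 0, 0]}
n = 6
chi_reg = [n] + [0]*(n - 1)   # chi_R(e) = n and 0 otherwise, cf. (18.94)

dims = {}
for a, chi in chars.items():
    q = sum(c*r for c, r in zip(chi, chi_reg)) / n  # (18.97): q_alpha = d_alpha
    dims[a] = int(q)
print(dims)                                  # {'A1': 1, 'A2': 1, 'E': 2}
print(sum(d**2 for d in dims.values()) == n) # (18.100): 1 + 1 + 4 = 6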

Table 18.4 Character table of C2v

       | E | C2(z) | σv(zx) | σv′(yz) |
  A1   | 1 |  1    |  1     |  1      | z; x², y², z²
  A2   | 1 |  1    | −1     | −1      | xy
  B1   | 1 | −1    |  1     | −1      | x; zx
  B2   | 1 | −1    | −1     |  1      | y; yz
In (18.76) and (18.78), we showed the orthogonality relationship between traces. There is another important orthogonality relationship between them. To prove it, we need the notion of a group algebra [5]. The argument is as follows:
(a) Let us think of a set comprising group elements, expressed as

$$\aleph = \sum_{g} a_g\, g, \qquad (18.101)$$

where $g$ is a group element of a group ℊ = {g_1, g_2, ⋯, g_n}; $a_g$ is an arbitrarily chosen complex number; and $\Sigma_g$ means that the summation is taken over the group elements. Let $\aleph'$ be another set defined similarly to (18.101). That is,

$$\aleph' = \sum_{g'} a'_{g'}\, g'. \qquad (18.102)$$

Then we can define the following sum:

$$\aleph + \aleph' = \sum_{g} a_g\, g + \sum_{g'} a'_{g'}\, g' = \sum_{g} \big(a_g + a'_g\big)\, g. \qquad (18.103)$$

Also, we get

$$\aleph\,\aleph' = \Big(\sum_{g} a_g\, g\Big)\Big(\sum_{g'} a'_{g'}\, g'\Big) = \sum_{g}\sum_{g'} a_g\, a'_{g'}\, g g' = \sum_{g}\Big(\sum_{g'} a_{g g'^{-1}}\, a'_{g'}\Big)\, g, \qquad (18.104)$$

where we used the rearrangement theorem and a suitable relabeling of the group elements. Thus, we see that the above-defined set ℵ is closed under summation (i.e., linear combination) and multiplication. A set closed under summation and multiplication is said to be an algebra; when the underlying set forms a group, the algebra is called a group algebra.

So far we have treated the calculation as a multiplication between two elements $g$ and $g'$, i.e., $g \diamond g'$ (see Sect. 16.1). Now we regard the calculation as involving summation as well. In that case, the group elements act as basis vectors of a vector space. Bearing this in mind, let us further define specific group algebras.
(b) Let $K_i$ be the $i$-th conjugacy class of the group ℊ = {g_1, g_2, ⋯, g_n}, and let $K_i$ be written as

$$K_i = \big\{A_1^{(i)}, A_2^{(i)}, \cdots, A_{k_i}^{(i)}\big\}, \qquad (18.105)$$

where $k_i$ is the number of elements belonging to $K_i$. Now think of the set $gK_ig^{-1}$ ($\forall g \in$ ℊ). Then, by the definition of a class, we have

$$gK_ig^{-1} \subset K_i. \qquad (18.106)$$

Multiplying by $g^{-1}$ from the left and by $g$ from the right on both sides, we get $K_i \subset g^{-1}K_ig$. Since $g$ is arbitrarily chosen, replacing $g$ with $g^{-1}$ we have

$$K_i \subset gK_ig^{-1}. \qquad (18.107)$$

Therefore, we get

$$gK_ig^{-1} = K_i. \qquad (18.108)$$

Meanwhile, for $A_\alpha^{(i)}, A_\beta^{(i)} \in K_i$ with $A_\alpha^{(i)} \neq A_\beta^{(i)}$ ($1 \le \alpha, \beta \le k_i$) we have

$$gA_\alpha^{(i)}g^{-1} \neq gA_\beta^{(i)}g^{-1} \quad (\forall g \in ℊ). \qquad (18.109)$$

This is because, if equality held in (18.109), we would have $A_\alpha^{(i)} = A_\beta^{(i)}$, a contradiction.
(c) Let $K$ be a set collecting several classes, described as

$$K = \sum_{i} a_i K_i, \qquad (18.110)$$

where $a_i$ is a positive integer or zero. Thanks to (18.108) and (18.110), we have

$$gKg^{-1} = K. \qquad (18.111)$$

Conversely, if a group algebra $K$ satisfies (18.111), then $K$ can be expressed as a sum of classes as in (18.110). To see this, suppose that $K$ is not expressed by (18.110) but is instead described by

$$K = \sum_{i} a_i K_i + Q, \qquad (18.112)$$

where $Q$ is an "incomplete" set that does not form a class. Then,

$$gKg^{-1} = g\Big(\sum_{i} a_i K_i + Q\Big)g^{-1} = \sum_{i} a_i\, gK_ig^{-1} + gQg^{-1} = \sum_{i} a_i K_i + gQg^{-1} = K = \sum_{i} a_i K_i + Q, \qquad (18.113)$$

where the second-to-last equality comes from (18.111). Thus, from (18.113) we get

$$gQg^{-1} = Q \quad (\forall g \in ℊ). \qquad (18.114)$$

By the definition of classes, this implies that $Q$ must form a "complete" class, in contradiction to the supposition. Thus, (18.110) holds.

In Sect. 16.3 we described several characteristics of an invariant subgroup. As readily seen from (16.11) and (18.111), any invariant subgroup consists of two or more entire classes. Conversely, if a subgroup comprises entire classes, it must be an invariant subgroup.
(d) Let us think of a product of classes. Let $K_i$ and $K_j$ be sets described as in (18.105). We define the product $K_iK_j$ as the set containing all products $A_\alpha^{(i)}A_\beta^{(j)}$ ($1 \le \alpha \le k_i$, $1 \le \beta \le k_j$). That is,

$$K_iK_j = \sum_{l=1}^{k_i} \sum_{m=1}^{k_j} A_l^{(i)} A_m^{(j)}. \qquad (18.115)$$

Multiplying (18.115) by $g$ ($\forall g \in$ ℊ) from the left and by $g^{-1}$ from the right, we get

$$gK_iK_jg^{-1} = \big(gK_ig^{-1}\big)\big(gK_jg^{-1}\big) = K_iK_j. \qquad (18.116)$$

From the above discussion, we get

$$K_iK_j = \sum_{l} c_{ijl} K_l, \qquad (18.117)$$

where $c_{ijl}$ is a positive integer or zero. In fact, when we take $gK_iK_jg^{-1}$, we merely permute the terms in (18.117).
(e) If two elements of the group ℊ = {g_1, g_2, ⋯, g_n} are conjugate to each other, their inverse elements are conjugate to each other as well. In fact, suppose that for $g_\mu, g_\nu \in K_i$ we have

$$g_\mu = g\, g_\nu\, g^{-1}. \qquad (18.118)$$

Then, taking the inverse of (18.118), we get

$$g_\mu^{-1} = g\, g_\nu^{-1}\, g^{-1}. \qquad (18.119)$$

Thus, given a class $K_i$, there exists another class $K_{i'}$ that consists of the inverses of the elements of $K_i$. If $g_\mu \neq g_\nu$, then $g_\mu^{-1} \neq g_\nu^{-1}$. Therefore, $K_i$ and $K_{i'}$ are of the same order. With $k_i$ the number of elements contained in $K_i$ and $k_{i'}$ that in $K_{i'}$, we have

$$k_i = k_{i'}. \qquad (18.120)$$
If $K_j \neq K_{i'}$ (i.e., $K_j$ is not identical with $K_{i'}$ as a set), then $K_iK_j$ does not contain $e$. In fact, suppose that for $g_\rho \in K_i$ there were a group element $b \in K_j$ such that $bg_\rho = e$. Then we would have

$$b = g_\rho^{-1} \quad \text{and} \quad b \in K_{i'}. \qquad (18.121)$$

This would mean that $b \in K_j \cap K_{i'}$, implying that $K_j = K_{i'}$ by the definition of a class, in contradiction to $K_j \neq K_{i'}$. Consequently, $K_iK_j$ does not contain $e$. Taking the product of the classes $K_i$ and $K_{i'}$, on the other hand, we obtain the identity $e$ precisely $k_i$ times. Rewriting (18.117), we have

$$K_iK_j = c_{ij1}K_1 + \sum_{l \neq 1} c_{ijl} K_l, \qquad (18.122)$$

where $K_1 = \{e\}$. As mentioned below, we are most interested in the first term of (18.122). In (18.122), if $K_j = K_{i'}$, we have

$$c_{ij1} = k_i. \qquad (18.123)$$

On the other hand, if $K_j \neq K_{i'}$, then $c_{ij1} = 0$. Summarizing the above arguments, we get

$$c_{ij1} = \begin{cases} k_i & \text{for } K_j = K_{i'}, \\ 0 & \text{for } K_j \neq K_{i'}. \end{cases} \qquad (18.124)$$

Or, symbolically, we write

$$c_{ij1} = k_i\, \delta_{ji'}. \qquad (18.125)$$

18.6 Classes and Irreducible Representations

After the aforementioned considerations, we have the following theorem:

Theorem 18.5 Let ℊ = {g_1, g_2, ⋯, g_n} be a group. The traces of its irreducible representations satisfy the following orthogonality relation:

$$\sum_{\alpha=1}^{n_r} \chi^{(\alpha)}(K_i)\, \chi^{(\alpha)}(K_j)^{*} = \frac{n}{k_i}\, \delta_{ij}, \qquad (18.126)$$

where the summation over $\alpha$ is taken over all $n_r$ inequivalent irreducible representations; $K_i$ and $K_j$ indicate conjugacy classes; and $k_i$ denotes the number of elements contained in the $i$-th class $K_i$.
Proof Rewriting (18.108), we have

$$gK_i = K_ig \quad (\forall g \in ℊ). \qquad (18.127)$$

Since a homomorphic correspondence holds between a group element and its representation matrix, a similar correspondence holds with (18.127) as well. Let $\hat{K}_i$ be the sum of the $k_i$ matrices of the $\alpha$-th irreducible representation $D^{(\alpha)}$ over the class $K_i$, expressed as

$$\hat{K}_i = \sum_{g \in K_i} D^{(\alpha)}(g). \qquad (18.128)$$

Note that in (18.128) $D^{(\alpha)}$ functions as a linear transformation with respect to the group algebra. From (18.127), we have

$$D^{(\alpha)}(g)\,\hat{K}_i = \hat{K}_i\, D^{(\alpha)}(g) \quad (\forall g \in ℊ). \qquad (18.129)$$

Since $D^{(\alpha)}$ is an irreducible representation, $\hat{K}_i$ must, on the basis of Schur's Second Lemma, be expressed as

$$\hat{K}_i = \lambda E. \qquad (18.130)$$

To determine $\lambda$, we take the trace of both sides of (18.130). Then, from (18.128) and (18.130) we get

$$k_i\, \chi^{(\alpha)}(K_i) = \lambda\, d_\alpha. \qquad (18.131)$$

Thus, we get

$$\hat{K}_i = \frac{k_i}{d_\alpha}\, \chi^{(\alpha)}(K_i)\, E. \qquad (18.132)$$

Next, corresponding to (18.122), we have

$$\hat{K}_i \hat{K}_j = \sum_{l} c_{ijl}\, \hat{K}_l. \qquad (18.133)$$

Replacing $\hat{K}_l$ in (18.133) with the expression (18.132), we get
$$k_i k_j\, \chi^{(\alpha)}(K_i)\, \chi^{(\alpha)}(K_j) = d_\alpha \sum_{l} c_{ijl}\, k_l\, \chi^{(\alpha)}(K_l). \qquad (18.134)$$

Returning to (18.99) and rewriting it, we have

$$\sum_{\alpha} d_\alpha\, \chi^{(\alpha)}(K_i) = n\,\delta_{i1}, \qquad (18.135)$$

where again $K_1 = \{e\}$. With respect to (18.134), we sum over all the irreducible representations $\alpha$. Then we have

$$\sum_{\alpha} k_i k_j\, \chi^{(\alpha)}(K_i)\, \chi^{(\alpha)}(K_j) = \sum_{l} c_{ijl}\, k_l \Big[\sum_{\alpha} d_\alpha\, \chi^{(\alpha)}(K_l)\Big] = \sum_{l} c_{ijl}\, k_l\, n\,\delta_{l1} = c_{ij1}\, n. \qquad (18.136)$$

In (18.136) we remark that $k_1 = 1$, meaning that the number of group elements contained in $K_1 = \{e\}$ is 1. Rewriting (18.136), we get

$$k_i k_j \sum_{\alpha} \chi^{(\alpha)}(K_i)\, \chi^{(\alpha)}(K_j) = c_{ij1}\, n = k_i\, n\,\delta_{ji'}, \qquad (18.137)$$

where we used (18.125) in the last equality. Moreover, using

$$\chi^{(\alpha)}(K_{i'}) = \chi^{(\alpha)}(K_i)^{*}, \qquad (18.138)$$

we get

$$k_i k_j \sum_{\alpha} \chi^{(\alpha)}(K_i)\, \chi^{(\alpha)}(K_j)^{*} = k_i\, n\,\delta_{ji}. \qquad (18.139)$$

Rewriting (18.139), we finally get

$$\sum_{\alpha=1}^{n_r} \chi^{(\alpha)}(K_i)\, \chi^{(\alpha)}(K_j)^{*} = \frac{n}{k_i}\, \delta_{ij}. \qquad (18.126)$$

∎

Equations (18.78) and (18.126) are well known as orthogonality relations. So far, however, we have had no idea about the mutual relationship in magnitude between the two numbers $n_c$ and $n_r$. In (18.78), let us consider the following set $S$:

$$S = \Big\{\sqrt{k_1}\, \chi^{(\alpha)}(K_1),\ \sqrt{k_2}\, \chi^{(\alpha)}(K_2),\ \cdots,\ \sqrt{k_{n_c}}\, \chi^{(\alpha)}(K_{n_c})\Big\}. \qquad (18.140)$$
Viewing the individual components of $S$ as the coordinates of an $n_c$-dimensional vector, (18.78) can be considered an inner product expressed using the (complex) coordinates of two such vectors. At the same time, (18.78) represents an orthogonality relationship between these vectors. Since we can obtain at most $n_c$ mutually orthogonal (i.e., linearly independent) vectors in an $n_c$-dimensional space, for the number $n_r$ of such vectors we have

$$n_r \le n_c. \qquad (18.141)$$

Here $n_r$ is equal to the number of different $\alpha$, i.e., the number of irreducible representations.

Meanwhile, in (18.126) we consider the following set $S'$:

$$S' = \big\{\chi^{(1)}(K_i),\ \chi^{(2)}(K_i),\ \cdots,\ \chi^{(n_r)}(K_i)\big\}. \qquad (18.142)$$

Similarly, the individual components of $S'$ can be considered the coordinates of an $n_r$-dimensional vector. Again, (18.126) implies an orthogonality relation among such vectors. Therefore, for the number $n_c$ of mutually orthogonal vectors we have

$$n_c \le n_r. \qquad (18.143)$$

Thus, from (18.141) and (18.143) we finally reach a simple but very important conclusion about the relationship between $n_c$ and $n_r$:

$$n_r = n_c. \qquad (18.144)$$

That is, the number $n_r$ of inequivalent irreducible representations is equal to the number $n_c$ of conjugacy classes of the group. An immediate and important consequence of (18.144), together with (18.100), is that every representation of an Abelian group is one-dimensional. This is because in an Abelian group each group element constitutes a conjugacy class by itself; we have $n_c = n$ accordingly.
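Because of (18.144) the character table is square, and (18.78) and (18.126) become its row and column orthogonality. A brief numerical sketch for C3v (the class sizes and table entries are the standard ones, stated here as an assumption):

import numpy as np

# C3v character table over classes K1={e}, K2={2C3}, K3={3sv};
# class sizes k_l, order n = 6; rows: A1, A2, E.
k = np.array([1, 2, 3])
X = np.array([[1,  1,  1],
              [1,  1, -1],
              [2, -1,  0]])
n = 6

# Row orthogonality (18.78): sum_l chi_a(K_l)* chi_b(K_l) k_l = n delta_ab
print(np.allclose((X * k) @ X.T, n * np.eye(3)))
# Column orthogonality (18.126): sum_a chi_a(K_i) chi_a(K_j)* = (n/k_i) delta_ij
print(np.allclose(X.T @ X, np.diag(n / k)))

Both checks print True; the characters of C3v are real, so complex conjugation is omitted.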

18.7 Projection Operators

In Sect. 18.2 we described how basis vectors (or functions) are transformed by a symmetry operation. In that case the symmetry operation is performed by a group element that belongs to a transformation group. More specifically, given a group ℊ = {g_1, g_2, ⋯, g_n} and a set of basis vectors $\psi_1, \psi_2, \cdots,$ and $\psi_d$, the basis vectors are transformed by $g_i \in$ ℊ ($1 \le i \le n$) such that
$$g_i(\psi_\nu) = \sum_{\mu=1}^{d} \psi_\mu D_{\mu\nu}(g_i). \qquad (18.22)$$

We may well ask, then, how we can construct such basis vectors. In this section we address this question. A central concept here is the projection operator. We have already studied the definition and basic properties of projection operators. In this section we deal with them bearing in mind that we apply group theory to molecular science, especially to quantum chemical calculations.

In Sect. 18.6 we examined the permissible number of irreducible representations and reached the conclusion that the number $n_r$ of inequivalent irreducible representations is equal to the number $n_c$ of conjugacy classes of the group. According to this conclusion, we modify (18.22) such that

$$g\big(\psi_i^{(\alpha)}\big) = \sum_{j=1}^{d_\alpha} \psi_j^{(\alpha)}\, D_{ji}^{(\alpha)}(g), \qquad (18.145)$$

where $\alpha$ and $d_\alpha$ denote the $\alpha$-th irreducible representation and its dimension, respectively; the subscript $i$ is omitted from $g_i$ for simplicity. This naturally leads to the next question: how are the basis vectors $\psi_j^{(\alpha)}$ related to the basis vectors $\psi_j^{(\beta)}$ belonging to a different irreducible representation $\beta$?

Suppose that we have an arbitrarily chosen function $f$. Then $f$ may be assumed to contain components of various irreducible representations. Thus, let us assume that $f$ is decomposed into these components such that

$$f = \sum_{\alpha} \sum_{m} c_m^{(\alpha)}\, \psi_m^{(\alpha)}, \qquad (18.146)$$

where $c_m^{(\alpha)}$ is an expansion coefficient and $\psi_m^{(\alpha)}$ is the $m$-th component of the $\alpha$-th irreducible representation.

Now, let us define the following operator:

$$P_{l(m)}^{(\alpha)} \equiv \frac{d_\alpha}{n} \sum_{g} D_{lm}^{(\alpha)}(g)^{*}\, g, \qquad (18.147)$$

where $\Sigma_g$ means that the summation is taken over all $n$ group elements $g$, and $d_\alpha$ denotes the dimension of the representation $D^{(\alpha)}$. Operating with $P_{l(m)}^{(\alpha)}$ on $f$, we have
$$P_{l(m)}^{(\alpha)} f = \frac{d_\alpha}{n} \sum_{g} D_{lm}^{(\alpha)}(g)^{*}\, gf = \frac{d_\alpha}{n} \sum_{g} D_{lm}^{(\alpha)}(g)^{*} \sum_{\nu}\sum_{k} c_k^{(\nu)}\, g\psi_k^{(\nu)} = \frac{d_\alpha}{n} \sum_{\nu}\sum_{k} c_k^{(\nu)} \sum_{j=1}^{d_\nu} \psi_j^{(\nu)} \Big[\sum_{g} D_{lm}^{(\alpha)}(g)^{*}\, D_{jk}^{(\nu)}(g)\Big] = \frac{d_\alpha}{n} \sum_{\nu}\sum_{k} c_k^{(\nu)} \sum_{j=1}^{d_\nu} \psi_j^{(\nu)}\, \frac{n}{d_\alpha}\, \delta_{\alpha\nu}\, \delta_{lj}\, \delta_{mk} = c_m^{(\alpha)}\, \psi_l^{(\alpha)}, \qquad (18.148)$$

where we used (18.51) in the second-to-last equality.

Thus, the implication of the operator $P_{l(m)}^{(\alpha)}$ is that, if $f$ contains the component $\psi_m^{(\alpha)}$, i.e., $c_m^{(\alpha)} \neq 0$, then $P_{l(m)}^{(\alpha)}$ extracts the component $c_m^{(\alpha)} \psi_m^{(\alpha)}$ from $f$ and converts it to $c_m^{(\alpha)} \psi_l^{(\alpha)}$. If the $\psi_m^{(\alpha)}$ component is not contained in $f$, then (18.148) gives $P_{l(m)}^{(\alpha)} f = 0$; in that case we choose a more suitable function for $f$. In this context, we will investigate an example of a quantum chemical calculation later.
In the above case, let us call $P_{l(m)}^{(\alpha)}$ a projection operator sensu lato. In Sect. 14.1 we dealt with several aspects of projection operators; there we mentioned that a projection operator should, in the rigorous sense, be idempotent and Hermitian (Definition 14.1). To address another important aspect of projection operators, let us prove an important relation in the following theorem.

Theorem 18.6 [1, 2] Let $P_{l(m)}^{(\alpha)}$ and $P_{s(t)}^{(\beta)}$ be projection operators defined as in (18.147). Then the following equation holds:

$$P_{l(m)}^{(\alpha)}\, P_{s(t)}^{(\beta)} = \delta_{\alpha\beta}\, \delta_{ms}\, P_{l(t)}^{(\alpha)}. \qquad (18.149)$$

Proof We have
$$\begin{aligned} P_{l(m)}^{(\alpha)}\, P_{s(t)}^{(\beta)} &= \frac{d_\alpha}{n}\sum_{g} D_{lm}^{(\alpha)}(g)^{*}\, g\;\frac{d_\beta}{n}\sum_{g'} D_{st}^{(\beta)}(g')^{*}\, g' = \frac{d_\alpha d_\beta}{n^2}\sum_{g}\sum_{g'} D_{lm}^{(\alpha)}(g)^{*}\, D_{st}^{(\beta)}(g^{-1}g')^{*}\; g\,g^{-1}g' \\ &= \frac{d_\alpha d_\beta}{n^2}\sum_{g'}\sum_{k}\Big[\sum_{g} D_{lm}^{(\alpha)}(g)^{*}\, D_{ks}^{(\beta)}(g)\Big]\, D_{kt}^{(\beta)}(g')^{*}\, g' = \frac{d_\alpha d_\beta}{n^2}\sum_{g'}\sum_{k}\frac{n}{d_\alpha}\,\delta_{\alpha\beta}\,\delta_{lk}\,\delta_{ms}\, D_{kt}^{(\beta)}(g')^{*}\, g' \\ &= \delta_{\alpha\beta}\,\delta_{ms}\,\frac{d_\beta}{n}\sum_{g'} D_{lt}^{(\beta)}(g')^{*}\, g' = \delta_{\alpha\beta}\,\delta_{ms}\, P_{l(t)}^{(\alpha)}, \end{aligned} \qquad (18.150)$$

where we replaced $g'$ by $g^{-1}g'$ (rearrangement theorem), used $D_{st}^{(\beta)}(g^{-1}g')^{*} = \sum_{k} D_{ks}^{(\beta)}(g)\, D_{kt}^{(\beta)}(g')^{*}$ (homomorphism and unitarity), and applied (18.51). This completes the proof. ∎


In the above proof, we used the rearrangement theorem and grand orthogonality
theorem (GOT) as well as the homomorphism and unitarity of the representation
matrices.
Comparing (18.146) and (18.148), we notice that the term $c_m^{(\alpha)} \psi_m^{(\alpha)}$ is not extracted as it stands; rather, $c_m^{(\alpha)} \psi_l^{(\alpha)}$ is obtained instead. This is due to the linearity of $P_{l(m)}^{(\alpha)}$. Nonetheless, this is somewhat inconvenient for practical purposes. To overcome this inconvenience, we modify (18.149). Putting $s = m$ in (18.149), we have

$$P_{l(m)}^{(\alpha)}\, P_{m(t)}^{(\beta)} = \delta_{\alpha\beta}\, P_{l(t)}^{(\alpha)}. \qquad (18.151)$$

We further modify the relation. Putting $\beta = \alpha$, we have

$$P_{l(m)}^{(\alpha)}\, P_{m(t)}^{(\alpha)} = P_{l(t)}^{(\alpha)}. \qquad (18.152)$$

Putting $t = l$ furthermore,

$$P_{l(m)}^{(\alpha)}\, P_{m(l)}^{(\alpha)} = P_{l(l)}^{(\alpha)}. \qquad (18.153)$$

In particular, putting $m = l$ moreover, we get
$$P_{l(l)}^{(\alpha)}\, P_{l(l)}^{(\alpha)} = \big[P_{l(l)}^{(\alpha)}\big]^2 = P_{l(l)}^{(\alpha)}. \qquad (18.154)$$

In fact, putting $m = l$ in (18.148), we get

$$P_{l(l)}^{(\alpha)} f = c_l^{(\alpha)}\, \psi_l^{(\alpha)}. \qquad (18.155)$$

This means that the term $c_l^{(\alpha)} \psi_l^{(\alpha)}$ has been extracted entirely.

Moreover, putting $\beta = \alpha$, $s = l$, and $t = m$ in (18.149), we have

$$P_{l(m)}^{(\alpha)}\, P_{l(m)}^{(\alpha)} = \big[P_{l(m)}^{(\alpha)}\big]^2 = \delta_{ml}\, P_{l(m)}^{(\alpha)}.$$

Therefore, for $P_{l(m)}^{(\alpha)}$ to be an idempotent operator, we must have $m = l$. That is, of the various operators $P_{l(m)}^{(\alpha)}$, only $P_{l(l)}^{(\alpha)}$ is eligible as an idempotent operator.

Meanwhile, writing $P_{l(l)}^{(\alpha)}$ out fully, we have

$$P_{l(l)}^{(\alpha)} = \frac{d_\alpha}{n} \sum_{g} D_{ll}^{(\alpha)}(g)^{*}\, g. \qquad (18.156)$$

Taking the complex conjugate transposition (i.e., the adjoint) of (18.156), we have

$$\big[P_{l(l)}^{(\alpha)}\big]^{\dagger} = \frac{d_\alpha}{n} \sum_{g} D_{ll}^{(\alpha)}(g)\, g^{\dagger} = \frac{d_\alpha}{n} \sum_{g^{-1}} D_{ll}^{(\alpha)}(g^{-1})\, \big(g^{-1}\big)^{\dagger} = \frac{d_\alpha}{n} \sum_{g} D_{ll}^{(\alpha)}(g)^{*}\, g = P_{l(l)}^{(\alpha)}, \qquad (18.157)$$

where we used the unitarity of $g$ (in the third equality) and the equivalence of summation over $g$ and over $g^{-1}$. Notice that the notation $g^{\dagger}$ is less common; it should be interpreted as meaning that $g^{\dagger}$ operates on a vector constituting a representation space. Thus, the notation $g^{\dagger}$ implies that $g^{\dagger}$ is equivalent to the adjoint of its unitary representation matrix, $D^{(\alpha)}(g)^{\dagger}$. Note also that $D_{ll}^{(\alpha)}$ is not a matrix but a (complex) number; namely, in (18.156) $D_{ll}^{(\alpha)}(g)^{*}$ is a coefficient of the operator $g$.

Equations (18.154) and (18.157) establish that $P_{l(l)}^{(\alpha)}$ is a projection operator in the rigorous sense. Let us accordingly call $P_{l(l)}^{(\alpha)}$ a projection operator sensu stricto. Also, we notice that $c_m^{(\alpha)} \psi_m^{(\alpha)}$ is extracted entirely from an arbitrary function $f$, including its coefficient. This situation resembles that of (12.205).
Regarding $P_{l(m)}^{(\alpha)}$ ($l \neq m$), on the other hand, we have

$$\big[P_{l(m)}^{(\alpha)}\big]^{\dagger} = \frac{d_\alpha}{n} \sum_{g} D_{lm}^{(\alpha)}(g)\, g^{\dagger} = \frac{d_\alpha}{n} \sum_{g^{-1}} D_{lm}^{(\alpha)}(g^{-1})\, \big(g^{-1}\big)^{\dagger} = \frac{d_\alpha}{n} \sum_{g} D_{ml}^{(\alpha)}(g)^{*}\, g = P_{m(l)}^{(\alpha)}. \qquad (18.158)$$

Hence, $P_{l(m)}^{(\alpha)}$ is not Hermitian. We have many other related equations; for instance,

$$\big[P_{l(m)}^{(\alpha)}\, P_{m(l)}^{(\alpha)}\big]^{\dagger} = \big[P_{m(l)}^{(\alpha)}\big]^{\dagger} \big[P_{l(m)}^{(\alpha)}\big]^{\dagger} = P_{l(m)}^{(\alpha)}\, P_{m(l)}^{(\alpha)}. \qquad (18.159)$$

Therefore, $P_{l(m)}^{(\alpha)} P_{m(l)}^{(\alpha)}$ is Hermitian, recovering the relation (18.153).
Putting $m = l$ and $t = s$ in (18.149), we get

$$P_{l(l)}^{(\alpha)}\, P_{s(s)}^{(\beta)} = \delta_{\alpha\beta}\, \delta_{ls}\, P_{l(s)}^{(\alpha)}. \qquad (18.160)$$

As in the case of (18.146), we assume that a function $h$ is described as

$$h = \sum_{\alpha} \sum_{m} d_m^{(\alpha)}\, \phi_m^{(\alpha)}, \qquad (18.161)$$

where $\phi_m^{(\alpha)}$ is transformed in the same manner as $\psi_m^{(\alpha)}$ in (18.146), but is linearly independent of $\psi_m^{(\alpha)}$. Namely, we have

$$g\big(\phi_i^{(\alpha)}\big) = \sum_{j=1}^{d_\alpha} \phi_j^{(\alpha)}\, D_{ji}^{(\alpha)}(g). \qquad (18.162)$$

Tangible examples can be seen in Chap. 19.


ð αÞ
Operating PlðlÞ on both sides of (18.155), we have

h i h i
ðαÞ 2 ðαÞ ðαÞ ðαÞ ð αÞ ðαÞ ðαÞ
PlðlÞ f ¼ PlðlÞ cl ψ l ¼ PlðlÞ f ¼ cl ψ l , ð18:163Þ

h i
ð αÞ 2 ð αÞ
where with the second equality we used PlðlÞ ¼ PlðlÞ . That is, we have

h i
ðαÞ ðαÞ ðαÞ ðαÞ ðαÞ
PlðlÞ cl ψ l ¼ cl ψ l : ð18:164Þ

ðαÞ ðαÞ
This equation means that cl ψ l is an eigenfunction corresponding to an
ð αÞ ðαÞ ðαÞ
eigenvalue 1 of PlðlÞ . In other words, once cl ψ l is extracted from f, it belongs to
the “position” l of an irreducible representation α. Furthermore, for some constants
c and d as well as the functions f and h that appeared in (18.146) and (18.161), we
consider a following equation:
$$P_{l(l)}^{(\alpha)} \big[c\, c_l^{(\alpha)} \psi_l^{(\alpha)} + d\, d_l^{(\alpha)} \phi_l^{(\alpha)}\big] = c\, P_{l(l)}^{(\alpha)} c_l^{(\alpha)} \psi_l^{(\alpha)} + d\, P_{l(l)}^{(\alpha)} d_l^{(\alpha)} \phi_l^{(\alpha)} = c\, c_l^{(\alpha)} \psi_l^{(\alpha)} + d\, d_l^{(\alpha)} \phi_l^{(\alpha)}, \qquad (18.165)$$

where in the last equality we used (18.164). This means that an arbitrary linear combination of $c_l^{(\alpha)} \psi_l^{(\alpha)}$ and $d_l^{(\alpha)} \phi_l^{(\alpha)}$ again belongs to the position $l$ of the irreducible representation $\alpha$. If $\psi_l^{(\alpha)}$ and $\phi_l^{(\alpha)}$ are linearly independent, we can construct two orthonormal basis vectors following Theorem 13.2 (the Gram–Schmidt orthonormalization theorem). If there are other linearly independent vectors $p_l^{(\alpha)} \xi_l^{(\alpha)}, q_l^{(\alpha)} \varphi_l^{(\alpha)}, \cdots$, where $\xi_l^{(\alpha)}, \varphi_l^{(\alpha)}$, etc. belong to the position $l$ of the irreducible representation $\alpha$, then $c\, c_l^{(\alpha)} \psi_l^{(\alpha)} + d\, d_l^{(\alpha)} \phi_l^{(\alpha)} + p\, p_l^{(\alpha)} \xi_l^{(\alpha)} + q\, q_l^{(\alpha)} \varphi_l^{(\alpha)} + \cdots$ again belongs to the position $l$ of the irreducible representation $\alpha$. Thus, we can construct orthonormal basis vectors of the representation space according to Theorem 13.2.
Regarding arbitrary functions $f$ and $h$ as vectors and using (18.160), we form the inner product

$$\langle h | P_{l(l)}^{(\alpha)} P_{s(s)}^{(\beta)} | f \rangle = \langle h | P_{l(l)}^{(\alpha)} P_{l(l)}^{(\alpha)} P_{s(s)}^{(\beta)} | f \rangle = \langle h | P_{l(l)}^{(\alpha)}\, \delta_{\alpha\beta}\, \delta_{ls}\, P_{l(s)}^{(\alpha)} | f \rangle = \delta_{\alpha\beta}\, \delta_{ls}\, \langle h | P_{l(l)}^{(\alpha)} P_{l(s)}^{(\alpha)} | f \rangle, \qquad (18.166)$$

where in the first equality we used $P_{l(l)}^{(\alpha)} = P_{l(l)}^{(\alpha)} P_{l(l)}^{(\alpha)}$ from (18.154). Meanwhile, from (18.148) we have

$$P_{l(s)}^{(\alpha)} | f \rangle = c_s^{(\alpha)}\, | \psi_l^{(\alpha)} \rangle. \qquad (18.167)$$

Also, using (18.155), we get

$$P_{l(l)}^{(\alpha)} | h \rangle = d_l^{(\alpha)}\, | \phi_l^{(\alpha)} \rangle. \qquad (18.168)$$

Taking the adjoint of (18.168), we have

$$\langle h | \big[P_{l(l)}^{(\alpha)}\big]^{\dagger} = \langle h | P_{l(l)}^{(\alpha)} = d_l^{(\alpha)*}\, \langle \phi_l^{(\alpha)} |, \qquad (18.169)$$

where we used $\big[P_{l(l)}^{(\alpha)}\big]^{\dagger} = P_{l(l)}^{(\alpha)}$ from (18.157). The relation (18.169) follows the notation of Sect. 13.3. Substituting (18.167) and (18.168) into (18.166),

$$\langle h | P_{l(l)}^{(\alpha)} P_{s(s)}^{(\beta)} | f \rangle = \delta_{\alpha\beta}\, \delta_{ls}\, d_l^{(\alpha)*} c_s^{(\alpha)}\, \langle \phi_l^{(\alpha)} | \psi_l^{(\alpha)} \rangle.$$

Meanwhile, (18.166) can also be described as
$$\langle h | P_{l(l)}^{(\alpha)} P_{s(s)}^{(\beta)} | f \rangle = d_l^{(\alpha)*}\, \langle \phi_l^{(\alpha)} | c_s^{(\beta)} \psi_s^{(\beta)} \rangle = d_l^{(\alpha)*} c_s^{(\beta)}\, \langle \phi_l^{(\alpha)} | \psi_s^{(\beta)} \rangle. \qquad (18.170)$$

For (18.166) and (18.170) to be identical, we must have

$$\delta_{\alpha\beta}\, \delta_{ls}\, d_l^{(\alpha)*} c_s^{(\alpha)}\, \langle \phi_l^{(\alpha)} | \psi_l^{(\alpha)} \rangle = d_l^{(\alpha)*} c_s^{(\beta)}\, \langle \phi_l^{(\alpha)} | \psi_s^{(\beta)} \rangle.$$

Deleting the coefficients, we get

$$\langle \phi_l^{(\alpha)} | \psi_s^{(\beta)} \rangle = \delta_{\alpha\beta}\, \delta_{ls}\, \langle \phi_l^{(\alpha)} | \psi_l^{(\alpha)} \rangle. \qquad (18.171)$$

The relation (18.171) is frequently used to judge whether definite integrals vanish. The functional forms depend upon the actual problems we encounter in various situations. We will deal with this problem in Chap. 19 in relation to, e.g., the discussion of optical transitions and the evaluation of overlap integrals.

To evaluate (18.170): if $\alpha \neq \beta$ or $l \neq s$, we simply get

$$\langle h | P_{l(l)}^{(\alpha)} P_{s(s)}^{(\beta)} | f \rangle = 0 \quad (\alpha \neq \beta \ \text{or} \ l \neq s). \qquad (18.172)$$

That is, under the given condition $\alpha \neq \beta$ or $l \neq s$, $P_{s(s)}^{(\beta)} | f \rangle$ and $P_{l(l)}^{(\alpha)} | h \rangle$ are orthogonal (see Theorem 13.3 of Sect. 13.4). The relation clearly indicates that functions belonging to different irreducible representations ($\alpha \neq \beta$) are mutually orthogonal. Even when the functions belong to the same irreducible representation, they are orthogonal if they are allocated to different "places" as basis vectors, designated by $l$, $s$, etc. Here the place means the index $j$ of $\psi_j^{(\alpha)}$ in (18.145), or of $\phi_j^{(\alpha)}$ in (18.162), that designates the "order" of $\psi_j^{(\alpha)}$ within $\psi_1^{(\alpha)}, \psi_2^{(\alpha)}, \cdots, \psi_{d_\alpha}^{(\alpha)}$, or of $\phi_j^{(\alpha)}$ within $\phi_1^{(\alpha)}, \phi_2^{(\alpha)}, \cdots, \phi_{d_\alpha}^{(\alpha)}$. This occurs if the representation is multidimensional (i.e., of dimensionality $d_\alpha$).
Putting $\alpha = \beta$ in (18.160), we get

$$P_{l(l)}^{(\alpha)}\, P_{s(s)}^{(\alpha)} = \delta_{ls}\, P_{l(s)}^{(\alpha)}.$$

In the above relation, furthermore, unless $l = s$ we have

$$P_{l(l)}^{(\alpha)}\, P_{s(s)}^{(\alpha)} = 0.$$

Therefore, on the basis of the discussion of Sect. 14.1, $P_{l(l)}^{(\alpha)} + P_{s(s)}^{(\alpha)}$ is also a projection operator in the case $l \neq s$. Notice, however, that if $l = s$, then $P_{l(l)}^{(\alpha)} + P_{s(s)}^{(\alpha)}$ is not a projection operator; the reader can readily show this. Moreover, let us define $P^{(\alpha)}$ as below:
$$P^{(\alpha)} \equiv \sum_{l=1}^{d_\alpha} P_{l(l)}^{(\alpha)}. \qquad (18.173)$$

Then $P^{(\alpha)}$ is once again a projection operator. From (18.156), $P^{(\alpha)}$ in (18.173) can be rewritten as

$$P^{(\alpha)} = \frac{d_\alpha}{n} \sum_{g} \Big[\sum_{l=1}^{d_\alpha} D_{ll}^{(\alpha)}(g)^{*}\Big]\, g = \frac{d_\alpha}{n} \sum_{g} \chi^{(\alpha)}(g)^{*}\, g. \qquad (18.174)$$

Returning to (18.155) and taking the summation over $l$ there, we have

$$\sum_{l=1}^{d_\alpha} P_{l(l)}^{(\alpha)} f = \sum_{l=1}^{d_\alpha} c_l^{(\alpha)}\, \psi_l^{(\alpha)}.$$

Using (18.173), we get

$$P^{(\alpha)} f = \sum_{l=1}^{d_\alpha} c_l^{(\alpha)}\, \psi_l^{(\alpha)}. \qquad (18.175)$$

Thus, the operator $P^{(\alpha)}$ has a clear meaning: it extracts from $f$ all the vectors (or functions) belonging to the irreducible representation $\alpha$, including their coefficients. Defining those functions as

$$\psi^{(\alpha)} \equiv \sum_{l=1}^{d_\alpha} c_l^{(\alpha)}\, \psi_l^{(\alpha)}, \qquad (18.176)$$

we succinctly rewrite (18.175) as

$$P^{(\alpha)} f = \psi^{(\alpha)}. \qquad (18.177)$$
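Acting with (18.174) on coefficient vectors yields symmetry-adapted components directly. The following Python/NumPy sketch applies the character projectors of C2v to the three-orbital representation of the allyl radical assumed in the earlier examples (the matrices D and the trial vector f are our illustrative assumptions):

import numpy as np

# Assumed C2v representation on the allyl basis (phi1, phi2, phi3).
D = {"E":   np.eye(3),
     "C2":  np.array([[-1, 0, 0], [0, 0, -1], [0, -1, 0]]),
     "sv":  np.array([[ 1, 0, 0], [0, 0,  1], [0,  1, 0]]),
     "sv'": -np.eye(3)}
chars = {"A1": [1, 1, 1, 1], "A2": [1, 1, -1, -1],
         "B1": [1, -1, 1, -1], "B2": [1, -1, -1, 1]}
n = 4

def P(alpha):
    """Character projector (18.174); d_alpha = 1 for every irrep of C2v."""
    return sum(c * D[g] for c, g in zip(chars[alpha], D)) / n

f = np.array([1.0, 1.0, 0.0])   # an arbitrary trial vector (c1, c2, c3)
for a in chars:
    print(a, P(a) @ f)
# B1 yields (1, 1/2, 1/2), i.e., the phi1 and (phi2 + phi3) components;
# A2 yields (0, 1/2, -1/2); A1 and B2 annihilate f, cf. (18.175).

The two nonvanishing projections reproduce the symmetry-adapted combinations found in Sect. 18.2, coefficients included.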

In turn, let us calculate $P^{(\beta)}P^{(\alpha)}$. This can be done with (18.174) as follows:

$$P^{(\beta)}P^{(\alpha)} = \frac{d_\beta}{n} \sum_{g} \chi^{(\beta)}(g)^{*}\, g\;\frac{d_\alpha}{n} \sum_{g'} \chi^{(\alpha)}(g')^{*}\, g'. \qquad (18.178)$$

To carry out the calculation, (i) we first replace $g'$ with $g^{-1}g'$ and rewrite the summation over $g'$ as that over $g^{-1}g'$; (ii) using the homomorphism and unitarity of the representation, we rewrite $\chi^{(\alpha)}(g^{-1}g')^{*}$ as

$$\chi^{(\alpha)}(g^{-1}g')^{*} = \sum_{i,j} D_{ji}^{(\alpha)}(g)\, D_{ji}^{(\alpha)}(g')^{*}. \qquad (18.179)$$

Rewriting (18.178), we have
$$P^{(\beta)}P^{(\alpha)} = \frac{d_\beta d_\alpha}{n^2} \sum_{g}\sum_{g'} \sum_{k,i,j} D_{kk}^{(\beta)}(g)^{*}\, D_{ji}^{(\alpha)}(g)\, D_{ji}^{(\alpha)}(g')^{*}\, g' = \frac{d_\beta d_\alpha}{n^2} \sum_{g'} \sum_{k,i,j} \frac{n}{d_\beta}\, \delta_{\alpha\beta}\, \delta_{kj}\, \delta_{ki}\, D_{ji}^{(\alpha)}(g')^{*}\, g' = \delta_{\alpha\beta}\, \frac{d_\alpha}{n} \sum_{g'} \sum_{k} D_{kk}^{(\alpha)}(g')^{*}\, g' = \delta_{\alpha\beta}\, \frac{d_\alpha}{n} \sum_{g'} \chi^{(\alpha)}(g')^{*}\, g' = \delta_{\alpha\beta}\, P^{(\alpha)}. \qquad (18.180)$$

These relationships are anticipated from the fact that both $P^{(\alpha)}$ and $P^{(\beta)}$ are projection operators.
As in the case of (18.160), (18.180) is useful for evaluating inner products of functions. Again taking the inner product of (18.180) with arbitrary functions $f$ and $g$, we have

$$\langle g | P^{(\beta)}P^{(\alpha)} | f \rangle = \langle g | \delta_{\alpha\beta}\, P^{(\alpha)} | f \rangle = \delta_{\alpha\beta}\, \langle g | P^{(\alpha)} | f \rangle. \qquad (18.181)$$

If we define $P$ such that

$$P \equiv \sum_{\alpha=1}^{n_r} P^{(\alpha)}, \qquad (18.182)$$

then $P$ is once again a projection operator (see Sect. 14.1). Now, taking the summation in (18.177) over all the irreducible representations $\alpha$, we have

$$\sum_{\alpha=1}^{n_r} P^{(\alpha)} f = \sum_{\alpha=1}^{n_r} \psi^{(\alpha)} = f. \qquad (18.183)$$

The function $f$ was arbitrarily chosen; hence, we get

$$P = E. \qquad (18.184)$$

As mentioned in Example 18.1, the concept of the representation space is not only important but also very useful for addressing various problems of physics and chemistry. For instance, the molecular orbital methods dealt with in Chap. 19 consider a representation space for the molecule whose energy eigenvalues we wish to know; the dimension of this representation space is equal to the number of molecular orbitals. According to the symmetry species of the molecule, the representation matrix is decomposed into a direct sum of invariant eigenspaces relevant to the individual irreducible representations. If basis vectors belong to different irreducible representations, such vectors are orthogonal to each other by virtue of (18.171). Even when vectors belong to the same irreducible representation, they are orthogonal if they are allocated to different places. It is often the case, however, that vectors belong to the same place of the same irreducible representation; then it is always possible, according to Theorem 13.2, to make them mutually orthogonal by taking their linear combinations. In such a way, we can construct an orthonormal basis set throughout the representation space. The method is in fact a powerful tool for solving energy eigenvalue problems and for determining the associated eigenfunctions (or molecular orbitals).

18.8 Direct-Product Representation

In Sect. 16.5 we studied the basic properties of direct-product groups. Correspondingly, in this section we examine the properties of direct-product representations. This notion is very useful for investigating optical transitions in molecular systems and the selection rules relevant to those transitions.

Let $D^{(\alpha)}$ and $D^{(\beta)}$ be two different irreducible representations whose dimensions are $d_\alpha$ and $d_\beta$, respectively. Then, operating with a group element $g$ on the basis functions, we have

$$g(\psi_i) = \sum_{k=1}^{d_\alpha} \psi_k\, D_{ki}^{(\alpha)}(g) \quad (1 \le i \le d_\alpha), \qquad (18.185)$$

$$g(\phi_j) = \sum_{l=1}^{d_\beta} \phi_l\, D_{lj}^{(\beta)}(g) \quad (1 \le j \le d_\beta), \qquad (18.186)$$

where $\psi_k$ ($1 \le k \le d_\alpha$) and $\phi_l$ ($1 \le l \le d_\beta$) are basis functions of $D^{(\alpha)}$ and $D^{(\beta)}$, respectively. We can construct $d_\alpha d_\beta$ new basis vectors from the products $\psi_k\phi_l$. These functions are transformed by $g$ such that

$$g(\psi_i\phi_j) = g(\psi_i)\, g(\phi_j) = \Big[\sum_{k} \psi_k\, D_{ki}^{(\alpha)}(g)\Big]\Big[\sum_{l} \phi_l\, D_{lj}^{(\beta)}(g)\Big] = \sum_{k}\sum_{l} \psi_k\phi_l\, D_{ki}^{(\alpha)}(g)\, D_{lj}^{(\beta)}(g). \qquad (18.187)$$

Here let us define the matrix $\big[D^{(\alpha \otimes \beta)}(g)\big]_{kl,ij}$ such that

$$\big[D^{(\alpha \otimes \beta)}(g)\big]_{kl,ij} \equiv D_{ki}^{(\alpha)}(g)\, D_{lj}^{(\beta)}(g). \qquad (18.188)$$

Then we have

$$g(\psi_i\phi_j) = \sum_{k}\sum_{l} \psi_k\phi_l\, \big[D^{(\alpha \otimes \beta)}(g)\big]_{kl,ij}. \qquad (18.189)$$
The notation using double subscripts is somewhat complicated. We notice, however, that in (18.188) the order of the subscripts $kilj$ is converted to $kl, ij$; i.e., the subscripts $i$ and $l$ have been interchanged.

We write $D^{(\alpha)}$ and $D^{(\beta)}$ in explicit form as follows:

$$D^{(\alpha)}(g) = \begin{pmatrix} d_{1,1}^{(\alpha)} & \cdots & d_{1,d_\alpha}^{(\alpha)} \\ \vdots & \ddots & \vdots \\ d_{d_\alpha,1}^{(\alpha)} & \cdots & d_{d_\alpha,d_\alpha}^{(\alpha)} \end{pmatrix}, \quad D^{(\beta)}(g) = \begin{pmatrix} d_{1,1}^{(\beta)} & \cdots & d_{1,d_\beta}^{(\beta)} \\ \vdots & \ddots & \vdots \\ d_{d_\beta,1}^{(\beta)} & \cdots & d_{d_\beta,d_\beta}^{(\beta)} \end{pmatrix}. \qquad (18.190)$$

Thus,

$$D^{(\alpha \otimes \beta)}(g) = D^{(\alpha)}(g) \otimes D^{(\beta)}(g) = \begin{pmatrix} d_{1,1}^{(\alpha)} D^{(\beta)}(g) & \cdots & d_{1,d_\alpha}^{(\alpha)} D^{(\beta)}(g) \\ \vdots & \ddots & \vdots \\ d_{d_\alpha,1}^{(\alpha)} D^{(\beta)}(g) & \cdots & d_{d_\alpha,d_\alpha}^{(\alpha)} D^{(\beta)}(g) \end{pmatrix}. \qquad (18.191)$$

To get familiar with the double-subscript notation, we describe the case of $(2, 2)$ matrices. Denoting

$$D^{(\alpha)}(g) = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \quad \text{and} \quad D^{(\beta)}(g) = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix}, \qquad (18.192)$$

we get

$$D^{(\alpha \otimes \beta)}(g) = D^{(\alpha)}(g) \otimes D^{(\beta)}(g) = \begin{pmatrix} a_{11}b_{11} & a_{11}b_{12} & a_{12}b_{11} & a_{12}b_{12} \\ a_{11}b_{21} & a_{11}b_{22} & a_{12}b_{21} & a_{12}b_{22} \\ a_{21}b_{11} & a_{21}b_{12} & a_{22}b_{11} & a_{22}b_{12} \\ a_{21}b_{21} & a_{21}b_{22} & a_{22}b_{21} & a_{22}b_{22} \end{pmatrix}. \qquad (18.193)$$

Corresponding to (18.22), (18.189) describes the transformation of $\psi_i\phi_j$ in terms of the double subscripts. At the same time, the set $\{\psi_i\phi_j;\ 1 \le i \le d_\alpha,\ 1 \le j \le d_\beta\}$ is a basis of $D^{(\alpha \otimes \beta)}$. In fact,
$$\big[D^{(\alpha \otimes \beta)}(gg')\big]_{kl,ij} = D_{ki}^{(\alpha)}(gg')\, D_{lj}^{(\beta)}(gg') = \Big[\sum_{\mu} D_{k\mu}^{(\alpha)}(g)\, D_{\mu i}^{(\alpha)}(g')\Big]\Big[\sum_{\nu} D_{l\nu}^{(\beta)}(g)\, D_{\nu j}^{(\beta)}(g')\Big] = \sum_{\mu}\sum_{\nu} \big[D^{(\alpha \otimes \beta)}(g)\big]_{kl,\mu\nu}\, \big[D^{(\alpha \otimes \beta)}(g')\big]_{\mu\nu,ij}. \qquad (18.194)$$

Equation (18.194) shows that the rule of matrix multiplication with respect to the double subscripts is satisfied. Consequently, we get

$$D^{(\alpha \otimes \beta)}(gg') = D^{(\alpha \otimes \beta)}(g)\, D^{(\alpha \otimes \beta)}(g'). \qquad (18.195)$$

The relation (18.195) indicates that $D^{(\alpha \otimes \beta)}$ is certainly a representation of the group. This representation is said to be a direct-product representation.

The character of the direct-product representation is given by putting $k = i$ and $l = j$ in (18.188). That is,

$$\big[D^{(\alpha \otimes \beta)}(g)\big]_{ij,ij} = D_{ii}^{(\alpha)}(g)\, D_{jj}^{(\beta)}(g). \qquad (18.196)$$

Denoting

$$\chi^{(\alpha \otimes \beta)}(g) \equiv \sum_{i,j} \big[D^{(\alpha \otimes \beta)}(g)\big]_{ij,ij}, \qquad (18.197)$$

we have

$$\chi^{(\alpha \otimes \beta)}(g) = \chi^{(\alpha)}(g)\, \chi^{(\beta)}(g). \qquad (18.198)$$
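In matrix terms, (18.188) is the Kronecker product, and (18.198) follows from it. The following Python/NumPy sketch uses the two-dimensional irreducible representation E of C3v (an assumed but conventional realization) to verify (18.198), and then applies the reduction formula (18.83) to E ⊗ E:

import numpy as np

def rot(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s], [s, c]])

def ref(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, s], [s, -c]])

# The 2-dimensional irrep E of C3v (order n = 6).
E2 = [rot(0), rot(2*np.pi/3), rot(4*np.pi/3),
      ref(0), ref(2*np.pi/3), ref(4*np.pi/3)]
n = 6

# (18.198): the trace of the Kronecker product is the product of the traces.
for M in E2:
    assert np.isclose(np.trace(np.kron(M, M)), np.trace(M)**2)

# Reducing E (x) E with (18.83): q_alpha = (1/n) sum_g chi_a(g)* chi(g).
chi = {"A1": [1]*6, "A2": [1]*3 + [-1]*3, "E": [np.trace(M) for M in E2]}
chi_prod = [np.trace(M)**2 for M in E2]
for a in chi:
    q = sum(np.conj(x)*y for x, y in zip(chi[a], chi_prod)) / n
    print(a, round(float(np.real(q))))  # 1, 1, 1: E (x) E = A1 + A2 + E

The printed multiplicities anticipate the general statement made next: the direct product of irreducible representations need not be irreducible.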

Even though $D^{(\alpha)}$ and $D^{(\beta)}$ are both irreducible, $D^{(\alpha \otimes \beta)}$ is not necessarily irreducible. Suppose that

$$D^{(\alpha)}(g) \otimes D^{(\beta)}(g) = \sum_{\omega} q_\omega\, D^{(\omega)}(g), \qquad (18.199)$$

where $q_\omega$ is given by (18.83). Then we have

$$q_\gamma = \frac{1}{n} \sum_{g} \chi^{(\gamma)}(g)^{*}\, \chi^{(\alpha \otimes \beta)}(g) = \frac{1}{n} \sum_{g} \chi^{(\gamma)}(g)^{*}\, \chi^{(\alpha)}(g)\, \chi^{(\beta)}(g), \qquad (18.200)$$

where $n$ is the order of the group. This relation is often used in quantum mechanical and quantum chemical calculations, especially to evaluate optical transitions in matter, including atoms and molecular systems. It is also useful for examining whether a definite integral of a product of functions (or an inner product of vectors) vanishes.
In Sect. 16.5 we investigated the definition and properties of direct-product groups. Similarly to the case of the direct-product representation, we can consider representations of direct-product groups. Let us consider two groups ℊ and H and assume that the direct-product group ℊ ⊗ H is defined (Sect. 16.5). Also let $D^{(\alpha)}$ and $D^{(\beta)}$ be $d_\alpha$- and $d_\beta$-dimensional representations of the groups ℊ and H, respectively. Furthermore, let us define a matrix $D^{(\alpha \otimes \beta)}(ab)$ as in (18.188) such that

$$\big[D^{(\alpha \otimes \beta)}(ab)\big]_{kl,ij} \equiv D_{ki}^{(\alpha)}(a)\, D_{lj}^{(\beta)}(b), \qquad (18.201)$$

where $a$ and $b$ are arbitrarily chosen from ℊ and H, respectively, and $ab \in$ ℊ ⊗ H. Then the set comprising the $D^{(\alpha \otimes \beta)}(ab)$ forms a representation of ℊ ⊗ H. (The reader is invited to verify this.) The dimension of $D^{(\alpha \otimes \beta)}$ is $d_\alpha d_\beta$. The character is given by (18.69), and so in the present case, putting $i = k$ and $j = l$ in (18.201), we get

$$\chi^{(\alpha \otimes \beta)}(ab) = \sum_{k}\sum_{l} \big[D^{(\alpha \otimes \beta)}(ab)\big]_{kl,kl} = \sum_{k}\sum_{l} D_{kk}^{(\alpha)}(a)\, D_{ll}^{(\beta)}(b) = \chi^{(\alpha)}(a)\, \chi^{(\beta)}(b). \qquad (18.202)$$

Equation (18.202) resembles (18.198); hence, we should be careful not to confuse them. In (18.198) we were thinking of a direct-product representation within a single group ℊ. In (18.202), however, we are considering a representation of a direct-product group comprising two different groups. In fact, whereas the character in (18.198) is computed with respect to a single group element $g$, in (18.202) we evaluate the character with respect to two group elements $a$ and $b$ chosen from the different groups ℊ and H, respectively.

18.9 Symmetric Representation and Antisymmetric Representation

As mentioned in Sect. 18.2, we have viewed a group element $g$ as a linear transformation over a vector space $V$; there we dealt with widely chosen functions as vectors. In this section we introduce further useful ideas so that we can apply them to molecular science and atomic physics.

In the previous section we defined the direct-product representation. In $D^{(\alpha \otimes \beta)}(g) = D^{(\alpha)}(g) \otimes D^{(\beta)}(g)$ we may freely consider the case $D^{(\alpha)}(g) = D^{(\beta)}(g)$. Then we have
$$g(\psi_i\phi_j) = g(\psi_i)\,g(\phi_j) = \sum_{k}\sum_{l} \psi_k\phi_l\, D_{ki}^{(\alpha)}(g)\, D_{lj}^{(\alpha)}(g). \qquad (18.203)$$

For the product function $\psi_j\phi_i$ we get a similar equation:

$$g(\psi_j\phi_i) = g(\psi_j)\,g(\phi_i) = \sum_{k}\sum_{l} \psi_k\phi_l\, D_{kj}^{(\alpha)}(g)\, D_{li}^{(\alpha)}(g). \qquad (18.204)$$

On the basis of the linearity of the relations (18.203) and (18.204), let us construct linear combinations of the product functions. That is,

$$g(\psi_i\phi_j \pm \psi_j\phi_i) = g(\psi_i\phi_j) \pm g(\psi_j\phi_i) = \sum_{k}\sum_{l} \psi_k\phi_l\, D_{ki}^{(\alpha)}(g)\, D_{lj}^{(\alpha)}(g) \pm \sum_{k}\sum_{l} \psi_k\phi_l\, D_{kj}^{(\alpha)}(g)\, D_{li}^{(\alpha)}(g) = \sum_{k}\sum_{l} (\psi_k\phi_l \pm \psi_l\phi_k)\, D_{ki}^{(\alpha)}(g)\, D_{lj}^{(\alpha)}(g). \qquad (18.205)$$

Here, defining $\Psi_{ij}^{\pm}$ as

$$\Psi_{ij}^{\pm} = \psi_i\phi_j \pm \psi_j\phi_i, \qquad (18.206)$$

we rewrite (18.205) as

$$g\Psi_{ij}^{\pm} = \sum_{k}\sum_{l} \Psi_{kl}^{\pm}\, \frac{1}{2}\Big[D_{ki}^{(\alpha)}(g)\, D_{lj}^{(\alpha)}(g) \pm D_{li}^{(\alpha)}(g)\, D_{kj}^{(\alpha)}(g)\Big]. \qquad (18.207)$$

Notice that $\Psi_{ij}^{+}$ and $\Psi_{ij}^{-}$ in (18.206) are symmetric and antisymmetric, respectively, with respect to the interchange of the subscripts $i$ and $j$. That is,

$$\Psi_{ij}^{\pm} = \pm\,\Psi_{ji}^{\pm}. \qquad (18.208)$$

We may naturally ask how we can constitute representation matrices from (18.207). To answer this question, we carry out the calculation by replacing $g$ with $gg'$ in (18.207) and following the procedure of Sect. 18.2. Thus, we have

$$gg'\Psi_{ij}^{\pm} = \sum_{k}\sum_{l} \Psi_{kl}^{\pm}\, \frac{1}{2}\Big[D_{ki}^{(\alpha)}(gg')\, D_{lj}^{(\alpha)}(gg') \pm D_{li}^{(\alpha)}(gg')\, D_{kj}^{(\alpha)}(gg')\Big] = \sum_{k}\sum_{l} \Psi_{kl}^{\pm}\, \frac{1}{2} \sum_{\mu}\sum_{\nu} \Big[D_{k\mu}^{(\alpha)}(g)\, D_{l\nu}^{(\alpha)}(g) \pm D_{l\mu}^{(\alpha)}(g)\, D_{k\nu}^{(\alpha)}(g)\Big]\, D_{\mu i}^{(\alpha)}(g')\, D_{\nu j}^{(\alpha)}(g') = \sum_{k}\sum_{l} \Psi_{kl}^{\pm}\, \frac{1}{2} \sum_{\mu}\sum_{\nu} \Big\{\big[D^{(\alpha \otimes \alpha)}(g)\big]_{kl,\mu\nu} \pm \big[D^{(\alpha \otimes \alpha)}(g)\big]_{lk,\mu\nu}\Big\}\, D_{\mu i}^{(\alpha)}(g')\, D_{\nu j}^{(\alpha)}(g'), \qquad (18.209)$$
ð18:209Þ

where the last equality follows from the definition of a direct-product representation
(18.188). We notice that the terms of [D(α α)(gg0)]kl, μν [D(α α)(gg0)]lk, μν in
(18.209) are symmetric and antisymmetric with respect to the interchange of sub-
scripts k and l together with subscripts μ and ν, respectively. Comparing both sides of
(18.209), we see that this must be the case with i and j as well. Then, the last factor of
(18.209) should be rewritten as:
h i
ðαÞ ðαÞ 1 ðαÞ 0 ðαÞ 0 ðαÞ ðαÞ
Dμi ðg0 ÞDνj ðg0 Þ ¼ D ðg ÞDνj ðg Þ Dνi ðg0 ÞDμj ðg0 Þ :
2 μi

Now, we define the following notations accordingly:


n o h i h o
1
D½α α
ð gÞ  f Dðα αÞ ðgÞ þ Dðα αÞ ðgÞlk,μν
kl,μν 2 kl,μν
h i
1 ðαÞ ðαÞ ðαÞ ð αÞ
¼ Dkμ ðgÞDlν ðgÞ þ Dlμ ðgÞDkν ðgÞ : ð18:210Þ
2

nh i h i o
1
Dfα αg
ð gÞ kl,μν
 Dðα αÞ ðgÞ  Dðα αÞ ðgÞ lk,μν
2 kl,μν
h i
1 ðαÞ ð αÞ ðαÞ ðαÞ
¼ Dkμ ðgÞDlν ðgÞ  Dlμ ðgÞDkν ðgÞ : ð18:211Þ
2
Meanwhile, using $D_{\mu i}^{(\alpha)}(g')\, D_{\nu j}^{(\alpha)}(g') = \big[D^{(\alpha \otimes \alpha)}(g')\big]_{\mu\nu,ij}$ and considering the exchange of summation with respect to the subscripts $\mu$ and $\nu$, we can also define $D^{[\alpha \otimes \alpha]}(g')$ and $D^{\{\alpha \otimes \alpha\}}(g')$ for the symmetric and antisymmetric cases, respectively. Thus, for the symmetric case, we have

$$gg'\Psi_{ij}^{+} = \sum_{k}\sum_{l}\sum_{\mu}\sum_{\nu} \Psi_{kl}^{+}\, \Big\{D^{[\alpha \otimes \alpha]}(g)\Big\}_{kl,\mu\nu} \Big\{D^{[\alpha \otimes \alpha]}(g')\Big\}_{\mu\nu,ij} = \sum_{k}\sum_{l} \Psi_{kl}^{+}\, \Big\{D^{[\alpha \otimes \alpha]}(g)\, D^{[\alpha \otimes \alpha]}(g')\Big\}_{kl,ij}. \qquad (18.212)$$

Using (18.210), (18.207) can be rewritten as
$$g\Psi_{ij}^{+} = \sum_{k}\sum_{l} \Psi_{kl}^{+}\, \Big\{D^{[\alpha \otimes \alpha]}(g)\Big\}_{kl,ij}. \qquad (18.213)$$

Then we have

$$gg'\Psi_{ij}^{+} = \sum_{k}\sum_{l} \Psi_{kl}^{+}\, \Big\{D^{[\alpha \otimes \alpha]}(gg')\Big\}_{kl,ij}. \qquad (18.214)$$

Comparing (18.212) and (18.214), we finally get

$$D^{[\alpha \otimes \alpha]}(gg') = D^{[\alpha \otimes \alpha]}(g)\, D^{[\alpha \otimes \alpha]}(g'). \qquad (18.215)$$

Similarly, for the antisymmetric case, we have

$$gg'\Psi_{ij}^{-} = \sum_{k}\sum_{l} \Psi_{kl}^{-}\, \Big\{D^{\{\alpha \otimes \alpha\}}(g)\, D^{\{\alpha \otimes \alpha\}}(g')\Big\}_{kl,ij}, \qquad (18.216)$$

$$g\Psi_{ij}^{-} = \sum_{k}\sum_{l} \Psi_{kl}^{-}\, \Big\{D^{\{\alpha \otimes \alpha\}}(g)\Big\}_{kl,ij}, \qquad (18.217)$$

$$D^{\{\alpha \otimes \alpha\}}(gg') = D^{\{\alpha \otimes \alpha\}}(g)\, D^{\{\alpha \otimes \alpha\}}(g'). \qquad (18.218)$$

Thus, both $D^{[\alpha \otimes \alpha]}(g)$ and $D^{\{\alpha \otimes \alpha\}}(g)$ produce well-defined representations. Letting the dimension of the representation $\alpha$ be $d_\alpha$, we have $d_\alpha(d_\alpha + 1)/2$ functions belonging to the symmetric representation and $d_\alpha(d_\alpha - 1)/2$ functions belonging to the antisymmetric representation. With a two-dimensional representation, for instance, the functions belonging to the symmetric representation are

$$\psi_1\phi_1, \quad \psi_1\phi_2 + \psi_2\phi_1, \quad \text{and} \quad \psi_2\phi_2.$$

The function belonging to the antisymmetric representation is

$$\psi_1\phi_2 - \psi_2\phi_1.$$

Note that these vectors have not yet been normalized.


From (18.210) and (18.211), we can readily get useful expressions with charac-
ters of symmetric and antisymmetric representations. In (18.210) putting μ ¼ k and
ν ¼ l and summing over k and l,
References 727

X Xn o
χ ½α α
ð gÞ ¼ k l
D ½α α
ð g Þ
kl,kl
X X h i
1 ð αÞ ðαÞ ðαÞ ðαÞ
¼ D kk ð g ÞD ll ð g Þ þ D lk ð g ÞD kl ð g Þ ð18:219Þ
2 k l
h i 2 h  i 
1
¼ χ ðαÞ ðgÞ þ χ ðαÞ g2 :
2

Similarly we have
h i2 h 
fα αg 1 ð αÞ ðαÞ
 2 i
χ ðgÞ ¼ χ ð gÞ  χ g : ð18:220Þ
2
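Formulas (18.219) and (18.220) require only χ(g) and χ(g²). The following Python/NumPy sketch evaluates them for the two-dimensional representation E of C3v (an assumed example, with g² obtained by squaring the representation matrices):

import numpy as np

def rot(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s], [s, c]])

def ref(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, s], [s, -c]])

# Two-dimensional irrep E of C3v, elements ordered (e, 2C3, 3sv).
E2 = [rot(0), rot(2*np.pi/3), rot(4*np.pi/3),
      ref(0), ref(2*np.pi/3), ref(4*np.pi/3)]

chi_sym  = [(np.trace(M)**2 + np.trace(M @ M)) / 2 for M in E2]   # (18.219)
chi_anti = [(np.trace(M)**2 - np.trace(M @ M)) / 2 for M in E2]   # (18.220)
print(np.round(chi_sym, 6))   # (3, 0, 0, 1, 1, 1): symmetric square = A1 + E
print(np.round(chi_anti, 6))  # (1, 1, 1, -1, -1, -1): antisymmetric = A2

The dimensions check out: d(d+1)/2 = 3 symmetric and d(d−1)/2 = 1 antisymmetric functions for d = 2, as stated above.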

References

1. Inui T, Tanabe Y, Onodera Y (1990) Group theory and its applications in physics. Springer,
Berlin
2. Inui T, Tanabe Y, Onodera Y (1980) Group theory and its applications in physics, expanded
ed. Shokabo, Tokyo. (in Japanese)
3. Hassani S (2006) Mathematical physics. Springer, New York
4. Chen JQ, Ping J, Wang F (2002) Group representation theory for physicists, 2nd edn. World
Scientific, Singapore
5. Hamermesh M (1989) Group theory and its application to physical problems. Dover, New York
Chapter 19
Applications of Group Theory to Physical Chemistry

On the basis of our studies of group theory, in this final chapter we apply the knowledge to molecular orbital (MO) calculations (or quantum chemical calculations). As tangible examples, we adopt aromatic hydrocarbons (ethylene, cyclopropenyl radical, benzene, and allyl radical) and methane. The approach is based upon the method of linear combination of atomic orbitals (LCAO). To seek an appropriate LCAO MO, we make the most of a method based on the symmetry-adapted linear combination (SALC); the use of projection operators is a powerful tool for this purpose. For a correct understanding, it is desirable to consider the transformation of functions. To this end, we first show several examples.

In the process of carrying out MO calculations, we encounter a secular equation as an eigenvalue equation. Using a SALC eases the solution of the secular equation. Molecular science relies largely on spectroscopic measurements, and researchers need to assign individual spectral lines to specific transitions between the relevant molecular states. Representation theory works well in this situation. Thus, group theory finds a perfect fit in its applications to molecular science.

19.1 Transformation of Functions

Before showing individual examples, let us consider the transformation of functions (or vectors) by a symmetry operation. Here we consider scalar functions. For example, suppose an arbitrary function f(x, y) on Cartesian xy-coordinates. Figure 19.1 shows a contour map of f(x, y) = constant. Then suppose that the map is rotated around the origin. More specifically, the position vector r_0 fixed on a "summit" [i.e., a point that gives a maximal value of f(x, y)] undergoes a symmetry operation, namely a rotation around the origin. As a result, r_0 is transformed to r_0′. Here, we assume that the "mountain" represented by f(x, y) is a

© Springer Nature Singapore Pte Ltd. 2020 729


S. Hotta, Mathematical Physical Chemistry,
https://doi.org/10.1007/978-981-15-2225-3_19
730 19 Applications of Group Theory to Physical Chemistry

Fig. 19.1 Contour map of f y


(x, y) and f 0(x0, y0). The
function f 0(x0, y0) is obtained
by rotating a map of f(x, y)
around the z-axis
′ ′, ′
,

O x

Concomitantly, a general point r (see Sect. 17.1) is transformed to r′ in exactly the same way as r0.

A new function f′ gives a new contour map after the rotation. Let us describe f′ as

f′ ≡ O_R f.  (19.1)

The RHS of (19.1) means that f′ is produced as a result of operating the rotation R on f. We describe the position vector after the transformation as r0′. Following the notation of Sect. 11.1, r0′ and r′ are expressed as

r0′ = R(r0) and r′ = R(r).  (19.2)

The matrix representation of R is given by, e.g., (11.35). In (19.2) we have

r_0 = (\mathbf{e}_1\ \mathbf{e}_2) \begin{pmatrix} x_0 \\ y_0 \end{pmatrix} and r_0' = (\mathbf{e}_1\ \mathbf{e}_2) \begin{pmatrix} x_0' \\ y_0' \end{pmatrix},  (19.3)

where e1 and e2 are orthonormal basis vectors in the xy-plane. Also we have

r = (\mathbf{e}_1\ \mathbf{e}_2) \begin{pmatrix} x \\ y \end{pmatrix} and r' = (\mathbf{e}_1\ \mathbf{e}_2) \begin{pmatrix} x' \\ y' \end{pmatrix}.  (19.4)

Meanwhile, the following equation must hold:

f′(x′, y′) = f(x, y).  (19.5)

Or, using (19.1) we have

O_R f(x′, y′) = f(x, y).  (19.6)



The above argument can be extended to a three-dimensional (or higher-dimensional) space. In that case, we similarly have

O_R f(x′, y′, z′) = f(x, y, z),  O_R f(r′) = f(r),  or  f′(r′) = f(r),  (19.7)

where

r = (\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3) \begin{pmatrix} x \\ y \\ z \end{pmatrix} and r' = (\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3) \begin{pmatrix} x' \\ y' \\ z' \end{pmatrix}.  (19.8)

The last relation of (19.7) comes from (19.1). Using (19.2) and (19.3), we rewrite (19.7) as

O_R f[R(r)] = f(r).  (19.9)

Replacing r with R⁻¹(r), we get

O_R f{R[R⁻¹(r)]} = f[R⁻¹(r)].  (19.10)

That is,

O_R f(r) = f[R⁻¹(r)].  (19.11)

More succinctly, (19.11) may be rewritten as

Rf(r) = f[R⁻¹(r)].  (19.12)

Comparing (19.11) with (19.12) and considering (19.1), we have

f′ ≡ O_R f ≡ Rf.  (19.13)

To gain a good understanding of the function transformation, let us think of some examples.

Example 19.1 Let f(x, y) be a function described by

f(x, y) = (x − a)² + (y − b)²;  a, b > 0.  (19.14)

A contour is shown in Fig. 19.2. We consider a π/2 rotation around the z-axis. Then, f(x, y) is transformed into Rf(x, y) = f′(x, y) such that

Fig. 19.2 Contour maps of f(x, y) = (x − a)² + (y − b)² and f′(x′, y′) = (x′ + b)² + (y′ − a)² before and after a π/2 rotation around the z-axis

Rf(x, y) = f′(x, y) = (x + b)² + (y − a)².  (19.15)

We also have

r_0 = (\mathbf{e}_1\ \mathbf{e}_2) \begin{pmatrix} a \\ b \end{pmatrix},
r_0' = (\mathbf{e}_1\ \mathbf{e}_2) \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = (\mathbf{e}_1\ \mathbf{e}_2) \begin{pmatrix} -b \\ a \end{pmatrix},

where we define R as

R = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}.

Similarly,

r = (\mathbf{e}_1\ \mathbf{e}_2) \begin{pmatrix} x \\ y \end{pmatrix} and r' = (\mathbf{e}_1\ \mathbf{e}_2) \begin{pmatrix} -y \\ x \end{pmatrix}.

From (19.15), we have

f′(x′, y′) = (x′ + b)² + (y′ − a)² = (−y + b)² + (x − a)² = f(x, y).  (19.16)

This ensures that (19.5) holds. The implication of (19.16) combined with (19.5) is that the view of f′(x′, y′) from (−b, a) is the same as that of f(x, y) from (a, b). Imagine that if we are standing at (−b, a) of f′(x′, y′), we are at the bottom of the "valley" of f′(x′, y′). Exactly in the same manner, if we are standing at (a, b) of f(x, y), we are at the bottom of the valley of f(x, y) as well. Notice that for both f(x, y) and f′(x′, y′), (a, b) and (−b, a) are the lowest points, respectively.

Fig. 19.3 Contour map of f(r) = e^{−2[(x−a)²+y²+z²]} + e^{−2[(x+a)²+y²+z²]} (a > 0). The functional form remains unchanged by a reflection with respect to the yz-plane

Meanwhile, we have

R^{-1} = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix},  R^{-1}(\mathbf{r}) = (\mathbf{e}_1\ \mathbf{e}_2) \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = (\mathbf{e}_1\ \mathbf{e}_2) \begin{pmatrix} y \\ -x \end{pmatrix}.

Then we have

f[R⁻¹(r)] = (y − a)² + [(−x) − b]² = (x + b)² + (y − a)² = Rf(r).  (19.17)

Thus, (19.12) certainly holds.
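For readers who wish to check (19.17) numerically, here is a minimal Python sketch (assuming NumPy and the arbitrary illustrative values a = 1, b = 2, which are not prescribed by the text):

```python
# A short numerical check of (19.12) for Example 19.1, assuming a = 1, b = 2.
import numpy as np

a, b = 1.0, 2.0
f = lambda x, y: (x - a) ** 2 + (y - b) ** 2          # (19.14)
f_prime = lambda x, y: (x + b) ** 2 + (y - a) ** 2    # (19.15)

R = np.array([[0.0, -1.0],
              [1.0,  0.0]])        # pi/2 rotation around the z-axis
R_inv = R.T                        # R is orthogonal, so R^{-1} = R^T

rng = np.random.default_rng(0)
for _ in range(5):
    r = rng.normal(size=2)
    x1, y1 = R_inv @ r             # R^{-1}(r) = (y, -x)
    assert np.isclose(f(x1, y1), f_prime(*r))   # f(R^{-1}(r)) = (Rf)(r)
print("Rf(r) = f(R^{-1}(r)) verified at random points.")
```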


Example 19.2 Let f(r) and g(r) be functions described by

f(r) = e^{−2[(x−a)²+y²+z²]} + e^{−2[(x+a)²+y²+z²]}  (a > 0),  (19.18)

g(r) = e^{−2[(x−a)²+y²+z²]} − e^{−2[(x+a)²+y²+z²]}  (a > 0).  (19.19)

Figure 19.3 shows an outline of the contour of f(r). We consider the following symmetry operation in a three-dimensional coordinate system:

R = \begin{pmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} and R^{-1} = \begin{pmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.  (19.20)

This represents a reflection with respect to the yz-plane. Then we have

f[R⁻¹(r)] = Rf(r) = f(r),  (19.21)

Fig. 19.4 Plots of f(r) = e^{−2[(x−a)²+y²+z²]} + e^{−2[(x+a)²+y²+z²]} and g(r) = e^{−2[(x−a)²+y²+z²]} − e^{−2[(x+a)²+y²+z²]} (a > 0) as functions of x on the x-axis. (a) f(r). (b) g(r)

g[R⁻¹(r)] = Rg(r) = −g(r).  (19.22)

Plotting f(r) and g(r) as functions of x on the x-axis, we depict the results in Fig. 19.4. Looking at (19.21) and (19.22), we find that f(r) and g(r) are solutions of an eigenvalue equation for the operator R. The corresponding eigenvalues are 1 and −1 for f(r) and g(r), respectively. In particular, f(r) keeps its functional form after the transformation R. In this case, f(r) is said to be invariant with the transformation R. Moreover, f(r) is invariant with the following eight transformations:

R = \begin{pmatrix} \pm 1 & 0 & 0 \\ 0 & \pm 1 & 0 \\ 0 & 0 & \pm 1 \end{pmatrix}.  (19.23)

These transformations form a group that is isomorphic to D2h. Therefore, f(r) is eligible as a basis function of the totally symmetric representation Ag of D2h. Notice that f(r) is invariant as well with a rotation of an arbitrary angle around the x-axis. On the other hand, g(r) belongs to B3u.
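A similar numerical sketch (again assuming Python/NumPy and an illustrative a = 1) confirms the eigenvalue relations (19.21) and (19.22):

```python
# A sketch verifying (19.21) and (19.22), assuming a = 1 (an arbitrary value).
import numpy as np

a = 1.0
g1 = lambda x, y, z: np.exp(-2 * ((x - a) ** 2 + y ** 2 + z ** 2))
g2 = lambda x, y, z: np.exp(-2 * ((x + a) ** 2 + y ** 2 + z ** 2))
f = lambda x, y, z: g1(x, y, z) + g2(x, y, z)   # (19.18), even under x -> -x
g = lambda x, y, z: g1(x, y, z) - g2(x, y, z)   # (19.19), odd under x -> -x

rng = np.random.default_rng(1)
for _ in range(5):
    x, y, z = rng.normal(size=3)
    # R is the reflection (19.20); R^{-1}(r) = (-x, y, z).
    assert np.isclose(f(-x, y, z), f(x, y, z))    # eigenvalue +1
    assert np.isclose(g(-x, y, z), -g(x, y, z))   # eigenvalue -1
print("f and g are eigenfunctions of R with eigenvalues +1 and -1.")
```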

19.2 Method of Molecular Orbitals (MOs)

Bearing in mind these arguments, we examine several examples of quantum chemical calculations. Our approach is based upon the molecular orbital theory. The theory assumes the existence of molecular orbitals (MOs) in a molecule, just as the notion of atomic orbitals has been well established in atomic physics. Furthermore, we assume that the molecular orbitals comprise a linear combination of atomic orbitals (LCAO).

This notion is equivalent to assuming that individual electrons in a molecule move independently in a potential field produced by the nuclei and the other electrons. In other words, we assume that each electron is moving in an MO that extends over

the whole molecule. The electronic state of the molecule is formed by different MOs of various energies that are occupied by electrons. As in the case of an atom, an MO ψi(r) occupied by an electron is described as

H\psi_i(\mathbf{r}) \equiv \left[ -\frac{\hbar^2}{2m}\nabla^2 + V(\mathbf{r}) \right] \psi_i(\mathbf{r}) = E_i \psi_i(\mathbf{r}),  (19.24)

where H is the Hamiltonian of the molecule; m is the mass of an electron (note that we do not use a reduced mass μ here); r is the position vector of the electron; ∇² is the Laplacian (Laplace operator); V(r) is the potential of the molecule at r; Ei is the energy of the electron occupying ψi, called a molecular orbital energy. We assume that V(r) possesses the same symmetry as the molecule.

Let ℊ be a symmetry group and let R be a group element arbitrarily chosen from ℊ. Suppose that an arbitrary position vector r is moved to another position r′. This transformation is expressed as (19.2). Since V(r) has the same symmetry as the molecule, an electron "feels" the same potential field at r′ as that at r. That is,

V(r) = V′(r′) = V(r′).  (19.25)

Or we have

V(r)ψ(r) = V(r′)ψ′(r′),  (19.26)

where ψ is an arbitrary function. Defining

V(r)ψ(r) ≡ [Vψ](r) = Vψ(r),  (19.27)

and recalling (19.1) and (19.7), we get

[RVψ](r′) = R[Vψ](r′) = V′ψ′(r′) = V′(r′)ψ′(r′) = V(r′)ψ′(r′) = V(r′)Rψ(r′) = VRψ(r′) = [VRψ](r′).  (19.28)

Comparing the first and last sides and remembering that ψ is an arbitrary function, we get

RV = VR.  (19.29)

Next, let us examine the symmetry of the Laplacian ∇². The Laplacian is defined in Sect. 1.2 as

\nabla^2 \equiv \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2},  (1.24)

where x, y, and z denote the Cartesian coordinates. Let S be an orthogonal matrix that transforms the xyz-coordinate system into the x′y′z′-coordinate system. We suppose that S is expressed as

S = \begin{pmatrix} s_{11} & s_{12} & s_{13} \\ s_{21} & s_{22} & s_{23} \\ s_{31} & s_{32} & s_{33} \end{pmatrix} and \begin{pmatrix} x' \\ y' \\ z' \end{pmatrix} = \begin{pmatrix} s_{11} & s_{12} & s_{13} \\ s_{21} & s_{22} & s_{23} \\ s_{31} & s_{32} & s_{33} \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix}.  (19.30)

Since S is an orthogonal matrix, we have

\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} s_{11} & s_{21} & s_{31} \\ s_{12} & s_{22} & s_{32} \\ s_{13} & s_{23} & s_{33} \end{pmatrix} \begin{pmatrix} x' \\ y' \\ z' \end{pmatrix}.  (19.31)

This equation is due to

SSᵀ = SᵀS = E,  (19.32)

where E is a unit matrix. Then we have

\frac{\partial}{\partial x'} = \frac{\partial x}{\partial x'}\frac{\partial}{\partial x} + \frac{\partial y}{\partial x'}\frac{\partial}{\partial y} + \frac{\partial z}{\partial x'}\frac{\partial}{\partial z} = s_{11}\frac{\partial}{\partial x} + s_{12}\frac{\partial}{\partial y} + s_{13}\frac{\partial}{\partial z}.  (19.33)

Partially differentiating (19.33) once more, we have

\frac{\partial^2}{\partial x'^2} = s_{11}\frac{\partial}{\partial x'}\frac{\partial}{\partial x} + s_{12}\frac{\partial}{\partial x'}\frac{\partial}{\partial y} + s_{13}\frac{\partial}{\partial x'}\frac{\partial}{\partial z}
= s_{11}\left(s_{11}\frac{\partial}{\partial x} + s_{12}\frac{\partial}{\partial y} + s_{13}\frac{\partial}{\partial z}\right)\frac{\partial}{\partial x} + s_{12}\left(s_{11}\frac{\partial}{\partial x} + s_{12}\frac{\partial}{\partial y} + s_{13}\frac{\partial}{\partial z}\right)\frac{\partial}{\partial y} + s_{13}\left(s_{11}\frac{\partial}{\partial x} + s_{12}\frac{\partial}{\partial y} + s_{13}\frac{\partial}{\partial z}\right)\frac{\partial}{\partial z}
= s_{11}^2\frac{\partial^2}{\partial x^2} + s_{11}s_{12}\frac{\partial^2}{\partial y\,\partial x} + s_{11}s_{13}\frac{\partial^2}{\partial z\,\partial x} + s_{12}s_{11}\frac{\partial^2}{\partial x\,\partial y} + s_{12}^2\frac{\partial^2}{\partial y^2} + s_{12}s_{13}\frac{\partial^2}{\partial z\,\partial y} + s_{13}s_{11}\frac{\partial^2}{\partial x\,\partial z} + s_{13}s_{12}\frac{\partial^2}{\partial y\,\partial z} + s_{13}^2\frac{\partial^2}{\partial z^2}.  (19.34)

Calculating, e.g., the terms ∂²/∂y′² and ∂²/∂z′², we have similar results. Then, collecting all those 27 terms, we get, e.g.,

\left( s_{11}^2 + s_{21}^2 + s_{31}^2 \right) \frac{\partial^2}{\partial x^2} = \frac{\partial^2}{\partial x^2},  (19.35)

where we used the orthogonality relation of S. The cross terms such as ∂²/∂x∂y and ∂²/∂y∂x all vanish. Consequently, we get

\frac{\partial^2}{\partial x'^2} + \frac{\partial^2}{\partial y'^2} + \frac{\partial^2}{\partial z'^2} = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}.  (19.36)

Defining ∇′² as

\nabla'^2 \equiv \frac{\partial^2}{\partial x'^2} + \frac{\partial^2}{\partial y'^2} + \frac{\partial^2}{\partial z'^2},  (19.37)

we have

\nabla'^2 = \nabla^2.  (19.38)

Notice that (19.38) holds not only for the symmetry operations of the molecule, but also for any rotation operation in ℝ³ (see Sect. 17.4).
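The invariance (19.36)/(19.38) can also be checked symbolically. The following sketch assumes SymPy and, for concreteness, a rotation about the z-axis as the orthogonal matrix S of (19.30); the test function ψ is an arbitrary illustrative choice:

```python
# A symbolic check of (19.36)/(19.38), assuming SymPy and a concrete rotation
# about the z-axis as the orthogonal matrix S of (19.30).
import sympy as sp

x, y, z = sp.symbols("x y z", real=True)
t = sp.pi / 3                                # an arbitrary fixed rotation angle
S = sp.Matrix([[sp.cos(t), -sp.sin(t), 0],
               [sp.sin(t),  sp.cos(t), 0],
               [0,          0,         1]])
xp, yp, zp = S * sp.Matrix([x, y, z])        # primed coordinates r' = S r

def laplacian(f):
    return sp.diff(f, x, 2) + sp.diff(f, y, 2) + sp.diff(f, z, 2)

# The Laplacian of psi(r') taken in the unprimed coordinates must equal
# (Laplacian psi) evaluated at r', i.e., the chain-rule terms of (19.34)
# collapse by the orthogonality of S.
psi = lambda u, v, w: sp.exp(-(u**2 + v**2 + w**2)) * u * v
lhs = laplacian(psi(xp, yp, zp))
rhs = laplacian(psi(x, y, z)).subs({x: xp, y: yp, z: zp}, simultaneous=True)
print(sp.simplify(lhs - rhs))                # prints 0
```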
As in the case of (19.28), we have

[R∇²ψ](r′) = R[∇²ψ](r′) = ∇′²ψ′(r′) = ∇²ψ′(r′) = ∇²Rψ(r′) = [∇²Rψ](r′).  (19.39)

Consequently, we get

R∇² = ∇²R.  (19.40)

Adding both sides of (19.29) and (19.40), we get

R(∇² + V) = (∇² + V)R.  (19.41)

From (19.24), we have

RH = HR.  (19.42)

Thus, we confirm that the Hamiltonian H commutes with any symmetry operation R. In other words, H is invariant with the coordinate transformation relevant to the symmetry operation.

Now we consider the matrix representation of (19.42). Let D be an irreducible representation of the symmetry group ℊ. Then, we have

D(g)H = HD(g)  (g ∈ ℊ).

Let {ψ1, ψ2, ⋯, ψd} be a set of basis vectors that span a representation space associated with D. Then, on the basis of Schur's Second Lemma of Sect. 18.3, (19.42) immediately leads to an important conclusion: if H is represented by a matrix, we must have

H = λE,  (19.43)

where λ is an arbitrary complex constant. That is, H is represented such that

H = \begin{pmatrix} \lambda & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda \end{pmatrix},  (19.44)

where the above matrix is a (d, d) square diagonal matrix. This implies that H is reduced to

H = λD^(0) ⊕ ⋯ ⊕ λD^(0),

where D^(0) denotes the totally symmetric representation; notice that it is given by 1 for any symmetry operation. Thus, the commutability of an operator with all the symmetry operations is equivalent to the statement that the operator belongs to the totally symmetric representation.
Since H is Hermitian, λ should be real. Operating both sides of (19.42) on (ψ1 ψ2 ⋯ ψd) from the right, we have

(\psi_1\ \psi_2\ \cdots\ \psi_d)RH = (\psi_1\ \psi_2\ \cdots\ \psi_d)HR = (\psi_1\ \psi_2\ \cdots\ \psi_d) \begin{pmatrix} \lambda & & \\ & \ddots & \\ & & \lambda \end{pmatrix} R = (\lambda\psi_1\ \lambda\psi_2\ \cdots\ \lambda\psi_d)R = \lambda(\psi_1\ \psi_2\ \cdots\ \psi_d)R.  (19.45)

In particular, putting R = E, we get

(ψ1 ψ2 ⋯ ψd)H = λ(ψ1 ψ2 ⋯ ψd).

Simplifying this equation, we have

ψiH = λψi  (1 ≤ i ≤ d).  (19.46)

Thus, λ is found to be an energy eigenvalue of H, and the ψi (1 ≤ i ≤ d) are eigenfunctions belonging to the eigenvalue λ.
Meanwhile, using

R(\psi_i) = \sum_{k=1}^{d} \psi_k D_{ki}(R),

we rewrite (19.45) as

(\psi_1\ \psi_2\ \cdots\ \psi_d)RH = (\psi_1 R\ \psi_2 R\ \cdots\ \psi_d R)H = \left( \sum_{k=1}^{d}\psi_k D_{k1}(R)\ \ \sum_{k=1}^{d}\psi_k D_{k2}(R)\ \cdots\ \sum_{k=1}^{d}\psi_k D_{kd}(R) \right) H = \lambda \left( \sum_{k=1}^{d}\psi_k D_{k1}(R)\ \ \sum_{k=1}^{d}\psi_k D_{k2}(R)\ \cdots\ \sum_{k=1}^{d}\psi_k D_{kd}(R) \right),

where with the second equality we used the relation (11.42); i.e.,

ψiR = R(ψi).

Thus, we get

\sum_{k=1}^{d} \psi_k D_{ki}(R) H = \lambda \sum_{k=1}^{d} \psi_k D_{ki}(R)  (1 ≤ i ≤ d).  (19.47)

The relations (19.45) to (19.47) imply that ψ1, ψ2, ⋯, and ψd, as well as their linear combinations formed with a representation matrix D(R), are eigenfunctions belonging to the same eigenvalue λ. That is, ψ1, ψ2, ⋯, and ψd are said to be degenerate with multiplicity d.

Following the remarks after Theorem 14.5, we can construct an orthonormal basis set {ψ̃1, ψ̃2, ⋯, ψ̃d} of eigenfunctions. These vectors can be constructed via linear combinations of ψ1, ψ2, ⋯, and ψd. Transforming the vectors ψ̃i (1 ≤ i ≤ d) by R, we get

\left\langle \sum_{k=1}^{d} \tilde{\psi}_k D_{ki}(R) \,\Big|\, \sum_{l=1}^{d} \tilde{\psi}_l D_{lj}(R) \right\rangle = \sum_{k=1}^{d} \sum_{l=1}^{d} D_{ki}(R)^{*} D_{lj}(R) \langle \tilde{\psi}_k | \tilde{\psi}_l \rangle = \sum_{k=1}^{d} D_{ki}(R)^{*} D_{kj}(R) = \sum_{k=1}^{d} \left[ D^{\dagger}(R) \right]_{ik} [D(R)]_{kj} = \left[ D^{\dagger}(R) D(R) \right]_{ij} = \delta_{ij}.

With the last equality, we used the unitarity of D(R). Thus, the orthonormality of the basis set {ψ̃1, ψ̃2, ⋯, ψ̃d} is retained after the unitary transformation.
In the above discussion, we have assumed that ψ1, ψ2, ⋯, and ψd belong to a certain irreducible representation D^(ν). Since ψ̃1, ψ̃2, ⋯, and ψ̃d consist of their linear combinations, ψ̃1, ψ̃2, ⋯, and ψ̃d belong to D^(ν) as well. Also, we have assumed that ψ̃1, ψ̃2, ⋯, and ψ̃d form an orthonormal basis; hence, according to Theorem 11.3 these vectors constitute basis vectors that belong to D^(ν). In particular, functions derived using projection operators share these characteristics. In fact, the above principles underlie the molecular orbital calculations dealt with in Sects. 19.4 and 19.5. We will go into more detail in subsequent sections.
Bearing the aforementioned argument in mind, we make the most of the relation expressed by (18.171) to evaluate inner products of functions (or vectors). In this connection, we often need to calculate matrix elements of an operator. One of the most typical examples is the matrix elements of the Hamiltonian. In the field of molecular science we have to estimate overlap integrals, Coulomb integrals, resonance integrals, etc. Other examples include transition matrix elements pertinent to, e.g., the electric dipole transition. To this end, we deal with direct-product representations (see Sects. 18.7 and 18.8). Let O^(γ) be an Hermitian operator belonging to the γ-th irreducible representation. Let us think of the following inner product:

⟨ϕl^(β)|O^(γ)ψs^(α)⟩,  (19.48)

where ψs^(α) and ϕl^(β) are the s-th component of the α-th irreducible representation and the l-th component of the β-th irreducible representation, respectively; see (18.146) and (18.161).
(i) Let us first think of the case where O^(γ) is the Hamiltonian H. Suppose that ψs^(α) belongs to an eigenvalue λ of H. Then, we have

H|ψl^(α)⟩ = λ|ψl^(α)⟩.  (19.49)

In that case, with (19.48) we get

⟨ϕl^(β)|Hψs^(α)⟩ = λ⟨ϕl^(β)|ψs^(α)⟩.  (19.50)

This equation is essentially the same as (18.171). That is,

⟨ϕs^(β)|ψl^(α)⟩ = δαβ δls ⟨ϕs^(β)|ψl^(α)⟩.  (18.171)

At first glance this equation seems trivial, but it gives us a powerful tool that saves us a lot of troublesome calculations. In fact, if we encounter a series of inner product calculations (i.e., definite integrals), we can ignore many of them, because the matrix elements of the Hamiltonian do not vanish unless α = β and l = s. That is, we only have to estimate ⟨ϕs^(α)|ψs^(α)⟩; otherwise the inner products vanish. The functional forms of ϕs^(α) and ψs^(α) are determined depending upon the individual practical problems. We will show tangible examples later (Sect. 19.4).
(ii) Next, let us consider matrix elements of the optical transition. In this case, we are thinking of the transition probability between quantum states. Assuming the dipole approximation, we use εe · P^(γ) for O^(γ) in ⟨ϕl^(β)|O^(γ)ψs^(α)⟩ of (19.48). The quantity P^(γ) represents the electric dipole moment associated with the position vector. Normally, we can readily find it in a character table, in which we examine which irreducible representation γ the position vector components x, y, and z indicated in the rightmost column correspond to. Table 18.4 is an example. If we take a unit polarization vector εe parallel to the position vector component that the character table designates, εe · P^(γ) is nonvanishing. Suppose that ψs^(α) and ϕl^(β) are an initial state and a final state, respectively. At first glance, we would wish to use (18.200) and count how many times a representation β occurs in the direct-product representation D^(γ⊗α). It is often the case, however, that β is not an irreducible representation.

Even in such a case, if either α or β belongs to a totally symmetric representation, the handling will be easier. This situation corresponds to the case where the initial or the final electronic configuration forms a closed shell, whose electronic state is totally symmetric [1]. The former occurs when we consider the optical absorption that takes place, e.g., in a molecule in its ground electronic state. The latter corresponds to an optical emission that ends up in the ground state. Let us consider the former case. Since Oj^(γ) is Hermitian, we rewrite (19.48) as

⟨ϕl^(β)|Oj^(γ)ψs^(α)⟩ = ⟨Oj^(γ)ϕl^(β)|ψs^(α)⟩ = ⟨ψs^(α)|Oj^(γ)ϕl^(β)⟩*,  (19.51)

where we assume that ψs^(α) is a ground state having a closed-shell electronic configuration. Therefore, ψs^(α) belongs to a totally symmetric irreducible representation. For (19.48) not to vanish, therefore, we may alternatively state that it is necessary for

D^(γ⊗β) = D^(γ) ⊗ D^(β)

to contain D^(α) belonging to a totally symmetric representation. Note that in group theory we usually write D^(γ) ⊗ D^(β) instead of D^(γ)D^(β); see (18.191).

If ϕl^(β) belongs to a reducible representation, from (18.80) we have

D^{(\beta)} = \sum_{\omega} q_{\omega} D^{(\omega)},

where D^(ω) belongs to an irreducible representation ω. Then, we get

D^{(\gamma)} \otimes D^{(\beta)} = \sum_{\omega} q_{\omega} D^{(\gamma)} \otimes D^{(\omega)}.

Thus, applying (18.199) and (18.200), we can obtain a direct sum of irreducible representations. After that, we examine whether the irreducible representation α is contained in D^(γ) ⊗ D^(β). Since the totally symmetric representation is one-dimensional, s = 1 in (19.51).

19.3 Calculation Procedures of Molecular Orbitals (MOs)

We describe a brief outline of the MO method based on the LCAO (LCAO-MO). Suppose that a molecule contains n atomic orbitals and that each MO ψi (1 ≤ i ≤ n) comprises a linear combination of those n atomic orbitals ϕk (1 ≤ k ≤ n). That is, we have

\psi_i = \sum_{k=1}^{n} c_{ki} \phi_k  (1 ≤ i ≤ n),  (19.52)

where the cki are complex coefficients. We assume that the ϕk are normalized. That is,

⟨ϕk|ϕk⟩ ≡ ∫ ϕk* ϕk dτ = 1,  (19.53)

where dτ implies that the integration is taken over ℝ³. The notation ⟨g|f⟩ means an inner product defined in Sect. 13.1. The inner product is usually defined by a definite integral of g*f whose integration range covers a part or all of ℝ³, depending on the constitution of the physical system.

First we try to solve the Schrödinger equation given as an eigenvalue equation. The said equation is described as

Hψ = λψ  or  (H − λ)ψ = 0.  (19.54)

Replacing ψ with (19.52), we have

\sum_{k=1}^{n} c_k (H - \lambda)\phi_k = 0,  (19.55)

where the subscript i in (19.52) has been omitted for simplicity. Multiplying by ϕj* from the left and integrating over the whole of ℝ³, we have

\sum_{k=1}^{n} c_k \int \phi_j^{*} (H - \lambda) \phi_k \, d\tau = 0.  (19.56)

Rewriting (19.56), we get

\sum_{k=1}^{n} c_k \int \left( \phi_j^{*} H \phi_k - \lambda \phi_j^{*} \phi_k \right) d\tau = 0.  (19.57)

Here let us define the following quantities:

H_{jk} = \int \phi_j^{*} H \phi_k \, d\tau and S_{jk} = \int \phi_j^{*} \phi_k \, d\tau,  (19.58)

where Sii = 1 (1 ≤ i ≤ n) because the ϕi are normalized. Then we get

\sum_{k=1}^{n} c_k \left( H_{jk} - \lambda S_{jk} \right) = 0.  (19.59)

Rewriting (19.59) in matrix form, we get

\begin{pmatrix} H_{11} - \lambda & \cdots & H_{1n} - \lambda S_{1n} \\ \vdots & \ddots & \vdots \\ H_{n1} - \lambda S_{n1} & \cdots & H_{nn} - \lambda \end{pmatrix} \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix} = 0,  (19.60)

where note that Sii = 1 (1 ≤ i ≤ n) because of the normalization of ϕi. Suppose that by solving (19.59) or (19.60) we get λi (1 ≤ i ≤ n), some of which may be identical (i.e., the degenerate case), and obtain n different column eigenvectors corresponding to the n eigenvalues λi. In light of (12.4) of Sect. 12.1, the following condition must be satisfied in order to get eigenvectors for which not all ck are zero:

\begin{vmatrix} H_{11} - \lambda & \cdots & H_{1n} - \lambda S_{1n} \\ \vdots & \ddots & \vdots \\ H_{n1} - \lambda S_{n1} & \cdots & H_{nn} - \lambda \end{vmatrix} = 0.  (19.61)

Equation (19.61) is called a secular equation. This equation involves a determinant of order n, and so we are expected to get n roots, some of which may be identical (i.e., a degenerate case).
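In numerical practice, (19.60) and (19.61) constitute a generalized eigenvalue problem Hc = λSc, which standard linear algebra libraries solve directly. The following Python sketch assumes SciPy and purely illustrative two-orbital values α = −10, β = −1, S₁₂ = 0.25 (these numbers anticipate the notation of Sect. 19.4 and are not prescribed here):

```python
# A minimal numerical sketch for the secular equation (19.61), assuming SciPy
# and illustrative two-orbital values alpha = -10, beta = -1, S12 = 0.25.
import numpy as np
from scipy.linalg import eigh

alpha, beta, S12 = -10.0, -1.0, 0.25
H = np.array([[alpha, beta],
              [beta,  alpha]])            # H_jk of (19.58)
S = np.array([[1.0, S12],
              [S12, 1.0]])                # S_jk of (19.58), S_ii = 1

# eigh with two arguments solves H c = lambda S c, i.e., det(H - lambda S) = 0.
lam, c = eigh(H, S)
print(lam)                                # [-12.  -8.8]
print((alpha - beta) / (1 - S12), (alpha + beta) / (1 + S12))  # -12.0 -8.8
```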
In the above discussion, it is useful to introduce the following notation:

⟨ϕj|Hϕk⟩ ≡ Hjk = ∫ ϕj* H ϕk dτ and ⟨ϕj|ϕk⟩ ≡ Sjk = ∫ ϕj* ϕk dτ.  (19.62)

This notation has already been introduced in Sect. 1.4. Equation (19.62) certainly satisfies the definition of the inner product described in (13.2)–(13.4); readers, please check it. On the basis of (13.64), we have

⟨y|Hx⟩ = ⟨xH†|y⟩.  (19.63)

Since H is Hermitian, H† = H. That is,

⟨y|Hx⟩ = ⟨xH|y⟩.  (19.64)

To solve (19.61) for a sufficiently large number n is usually formidable. However, if we can find appropriate conditions, (19.61) can be solved fairly easily. The essential point rests upon how we deal with the off-diagonal elements Hjk and Sjk. If we are able to choose the basis vectors appropriately so that we get

Hjk = 0 and Sjk = 0 for j ≠ k,  (19.65)

the secular equation is reduced to the simple form

\begin{vmatrix} \tilde{H}_{11} - \lambda\tilde{S}_{11} & & & \\ & \tilde{H}_{22} - \lambda\tilde{S}_{22} & & \\ & & \ddots & \\ & & & \tilde{H}_{nn} - \lambda\tilde{S}_{nn} \end{vmatrix} = 0,  (19.66)

where all the off-diagonal elements are zero and the eigenvalues λ are given by

λi = H̃ii/S̃ii.  (19.67)

This means that the eigenvalue equation has automatically been solved.

The best way to achieve this is to choose basis functions (vectors) that conform to the symmetry of the molecule. That is, we "shuffle" the atomic orbitals as follows:

\xi_i = \sum_{k=1}^{n} d_{ki} \phi_k  (1 ≤ i ≤ n),  (19.68)

where the ξi are new functions chosen instead of the ψi of (19.52). That is, we construct linear combinations of the atomic orbitals that belong to individual irreducible representations. Such a linear combination is called a symmetry-adapted linear combination (SALC). We remark that not all atomic orbitals are necessarily included in a SALC. Then we have

\tilde{H}_{jk} = \int \xi_j^{*} H \xi_k \, d\tau and \tilde{S}_{jk} = \int \xi_j^{*} \xi_k \, d\tau.  (19.69)

Thus, instead of (19.61), we have the new secular equation

\det\left( \tilde{H}_{jk} - \lambda \tilde{S}_{jk} \right) = 0.  (19.70)

Since the Hamiltonian H is totally symmetric, in terms of the direct-product representation Hξk in (19.69) belongs to the same irreducible representation as ξk. This can be understood intuitively; but to assert it, use (18.76) and (18.200) along with the fact that the characters of the totally symmetric representation are all 1.

In light of (18.171) and (18.181), if ξk and ξj belong to different irreducible representations, H̃jk and S̃jk both vanish at once. From (18.172), at the same time, ξk and ξj are orthogonal to each other. The most ideal situation is to get (19.66) with all the off-diagonal elements vanishing. Regarding the diagonal elements of (19.70), we always have the direct product of the same representation and, hence, the integrals (19.69) do not vanish. Thus, we get a powerful guideline for casting (19.70) in as simple a form as possible.

If the SALC orbitals ξj and ξk belong to the same irreducible representation, the relevant matrix elements are generally nonvanishing. Even in that case, however, according to Theorem 13.2 we can construct a set of orthonormal vectors by taking appropriate linear combinations of ξj and ξk. The resulting vectors naturally belong to the same irreducible representation. This can be done by solving the secular equation as described below. On top of that, the Hermiticity of the Hamiltonian ensures that those vectors can be rendered orthogonal (see Theorem 14.5). If n electrons are present in a molecule, we deal with n SALC orbitals, which we accordingly view as vectors. In terms of representation theory, these vectors span a representation space where the vectors undergo symmetry operations. Thus, we can construct orthonormal basis vectors belonging to various irreducible representations throughout the representation space.
To address our problems, we take the following procedures: (i) First we have to determine the symmetry species (i.e., point group) of the molecule. (ii) Next, we pick up

the atomic orbitals contained in the molecule. (iii) We examine how those atomic orbitals are transformed by the symmetry operations. Since the symmetry operations are represented by (n, n) unitary matrices, we can readily decide the trace of each matrix. Generally, that matrix representation is reducible, and so we reduce the representation according to the procedures of (18.81)–(18.83). Thus, we are able to determine how many irreducible representations are contained in the original reducible representation. (iv) After having determined the irreducible representations, we construct SALCs and constitute a secular equation using them. (v) Solving the secular equation, we determine the molecular orbital energies and decide the functional forms of the corresponding MOs. (vi) We examine physicochemical properties such as optical transitions within the molecule.

In the procedure (iii) above, if the same irreducible representation appears more than once, we have to solve a secular equation of order two or more. Even in that case, we can render the set of resulting eigenvectors orthogonal to one another during the process of solving the problem.
To construct the abovementioned SALCs, we make the most of the projection operators defined in Sect. 14.1. In (18.147), putting m = l, we have

P_{l(l)}^{(\alpha)} = \frac{d_\alpha}{n} \sum_{g} D_{ll}^{(\alpha)}(g)^{*} \, g.  (19.71)

Or we can choose a projection operator P^(α) described as

P^{(\alpha)} = \frac{d_\alpha}{n} \sum_{g} \left[ \sum_{l=1}^{d_\alpha} D_{ll}^{(\alpha)}(g)^{*} \right] g = \frac{d_\alpha}{n} \sum_{g} \chi^{(\alpha)}(g)^{*} \, g.  (18.174)

In a one-dimensional representation, P_{l(l)}^(α) and P^(α) are identical. As expressed in (18.155) and (18.175), these projection operators act on an arbitrary function and extract the specific component(s) pertinent to a specific irreducible representation of the point group to which the molecule belongs.

At first glance the definitions of P_{l(l)}^(α) and P^(α) look daunting, but the use of character tables relieves the calculation task. In particular, all the irreducible representations are one-dimensional (i.e., just a number!) for Abelian groups, as mentioned in Sect. 18.6. For this, notice that the individual group elements form a class by themselves. Therefore, the utilization of character tables becomes easier for Abelian groups. Even when we encounter a case where the dimension of a representation is more than one (i.e., the case of noncommutative groups), D_{ll}^(α) can be determined without much difficulty.

19.4 MO Calculations Based on π-Electron Approximation

On the basis of the general argument developed in Sects. 19.2 and 19.3, we perform molecular orbital calculations on individual molecules. First, we apply group theory to the molecular orbital calculations of aromatic hydrocarbons such as ethylene, the cyclopropenyl radical and cation, benzene, and the allyl radical. In these cases, in addition to adopting the molecular orbital theory, we adopt the so-called "π-electron approximation." For the first three molecules, we will not have to "solve" a secular equation. For the allyl radical, however, we deal with two SALC orbitals belonging to the same irreducible representation and, hence, the final molecular orbitals must be obtained by solving the secular equation.

19.4.1 Ethylene

We start with one of the simplest examples, ethylene. Ethylene is a planar molecule and belongs to the D2h symmetry (see Sect. 17.2). In the molecule two π-electrons extend vertically to the molecular plane toward the upper and lower directions (Fig. 19.5). The molecular plane forms a node of the atomic 2pz orbitals; that is, those atomic orbitals change sign across the molecular plane (i.e., the xy-plane).

Fig. 19.5 Ethylene molecule placed on the xyz-coordinate system. Two pz atomic orbitals of carbon are denoted by ϕ1 and ϕ2. The atomic orbitals change sign across the molecular plane (i.e., the xy-plane)

In Fig. 19.5, the two pz atomic orbitals of carbon are denoted by ϕ1 and ϕ2. We should be able to construct basis vectors using ϕ1 and ϕ2. Corresponding to the two atomic orbitals, we are dealing with a two-dimensional vector space. Let us consider how ϕ1 and ϕ2 are transformed by a symmetry operation. First we examine the operation C2(z). This operation exchanges ϕ1 and ϕ2. That is, we have

C2(z)(ϕ1) = ϕ2  (19.72)

and

C2(z)(ϕ2) = ϕ1.  (19.73)

Equations (19.72) and (19.73) can be combined into the following equation:

(ϕ1 ϕ2) C2(z) = (ϕ2 ϕ1).  (19.74)

Thus, using a matrix representation, we have

C_2(z) = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.  (19.75)

In Sect. 17.2, we had

R_{z\theta} = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}  (19.76)

or

C_2(z) = \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.  (19.77)

Whereas (19.76) and (19.77) show the transformation of a position vector in ℝ³, (19.75) represents the transformation of functions in a two-dimensional vector space. A vector space composed of functions may be finite-dimensional or infinite-dimensional. We have already encountered the latter case in Part I, where we dealt with the quantum mechanics of a harmonic oscillator. Such a function space is often referred to as a Hilbert space. An essential feature of (19.75) is that the trace (or character) of the matrix is zero.
Let us consider another transformation, C2(y). In this case the situation differs from the above in that ϕ1 is converted to −ϕ1 by C2(y), and ϕ2 to −ϕ2. Notice again that the molecular plane forms a node of the atomic p-orbitals. Thus, C2(y) is represented by

C_2(y) = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}.  (19.78)

The trace of the matrix C2(y) is −2. In this way, choosing ϕ1 and ϕ2 as the basis functions for the representation of D2h, we can determine the characters of the individual symmetry transformations of the atomic 2pz orbitals in ethylene belonging to D2h. We collect the results in Table 19.1.

Next, we examine what kinds of irreducible representations are contained in our present representation of ethylene. To do this, we need the character table of D2h (see Table 19.2).

Table 19.1 Characters of the individual symmetry transformations of the atomic 2pz orbitals in ethylene

D2h | E | C2(z) | C2(y) | C2(x) | i  | σ(xy) | σ(zx) | σ(yz)
Γ   | 2 | 0     | −2    | 0     | 0  | −2    | 0     | 2

Table 19.2 Character table of D2h

D2h | E | C2(z) | C2(y) | C2(x) | i  | σ(xy) | σ(zx) | σ(yz) |
Ag  | 1 |  1    |  1    |  1    |  1 |  1    |  1    |  1    | x², y², z²
B1g | 1 |  1    | −1    | −1    |  1 |  1    | −1    | −1    | xy
B2g | 1 | −1    |  1    | −1    |  1 | −1    |  1    | −1    | zx
B3g | 1 | −1    | −1    |  1    |  1 | −1    | −1    |  1    | yz
Au  | 1 |  1    |  1    |  1    | −1 | −1    | −1    | −1    |
B1u | 1 |  1    | −1    | −1    | −1 | −1    |  1    |  1    | z
B2u | 1 | −1    |  1    | −1    | −1 |  1    | −1    |  1    | y
B3u | 1 | −1    | −1    |  1    | −1 |  1    |  1    | −1    | x

If a specific kind of irreducible representation is contained, we want to examine how many times that specific representation occurs. Equation (18.83) is very useful for this purpose. In the present case, n = 8 in (18.83). Also taking into account (18.79) and (18.80), we get

Γ = B1u ⊕ B3g,  (19.79)

where Γ is the reducible representation for the set consisting of the two pz atomic orbitals of ethylene. Equation (19.79) clearly shows that Γ is a direct sum of the two irreducible representations B1u and B3g that belong to D2h. Instead of (19.79), we usually express the direct sum simply in the quantum chemical notation

Γ = B1u + B3g.  (19.80)
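The reduction (19.79) is easy to reproduce by machine. The following Python sketch assumes the character table of Table 19.2 entered by hand and applies (18.83), qα = (1/n) Σg χ^(α)(g) χ(g) with n = 8, to the characters Γ of Table 19.1:

```python
# A sketch of the reduction (19.79), assuming the hand-entered D2h character
# table of Table 19.2 and the reduction formula (18.83).
D2h = {  # class order: E, C2(z), C2(y), C2(x), i, s(xy), s(zx), s(yz)
    "Ag":  [1,  1,  1,  1,  1,  1,  1,  1],
    "B1g": [1,  1, -1, -1,  1,  1, -1, -1],
    "B2g": [1, -1,  1, -1,  1, -1,  1, -1],
    "B3g": [1, -1, -1,  1,  1, -1, -1,  1],
    "Au":  [1,  1,  1,  1, -1, -1, -1, -1],
    "B1u": [1,  1, -1, -1, -1, -1,  1,  1],
    "B2u": [1, -1,  1, -1, -1,  1, -1,  1],
    "B3u": [1, -1, -1,  1, -1,  1,  1, -1],
}
Gamma = [2, 0, -2, 0, 0, -2, 0, 2]          # Table 19.1 (the two 2pz orbitals)

for name, chi in D2h.items():
    q = sum(a * b for a, b in zip(chi, Gamma)) / 8
    if q:
        print(name, int(q))                  # B1u 1 and B3g 1, i.e., (19.80)
```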

As a next step, we are going to find appropriate basis functions belonging to the two irreducible representations of (19.80). For this purpose, we use the projection operators expressed as

P^{(\alpha)} = \frac{d_\alpha}{n} \sum_{g} \chi^{(\alpha)}(g)^{*} \, g.  (18.174)

Taking ϕ1 for instance, we apply (18.174) to ϕ1. That is,

Table 19.3 Character table of C2

C2 | E | C2 |
A  | 1 |  1 | z; x², y², z², xy
B  | 1 | −1 | x, y; yz, zx

P^{(B_{1u})}\phi_1 = \frac{1}{8} \sum_{g} \chi^{(B_{1u})}(g) \, g\phi_1 = \frac{1}{8} \left[ 1\cdot\phi_1 + 1\cdot\phi_2 + (-1)(-\phi_1) + (-1)(-\phi_2) + (-1)(-\phi_2) + (-1)(-\phi_1) + 1\cdot\phi_2 + 1\cdot\phi_1 \right] = \frac{1}{2}(\phi_1 + \phi_2).  (19.81)

Also for B3g, we apply (18.174) to ϕ1 and get

P^{(B_{3g})}\phi_1 = \frac{1}{8} \sum_{g} \chi^{(B_{3g})}(g) \, g\phi_1 = \frac{1}{8} \left[ 1\cdot\phi_1 + (-1)\phi_2 + (-1)(-\phi_1) + 1\cdot(-\phi_2) + 1\cdot(-\phi_2) + (-1)(-\phi_1) + (-1)\phi_2 + 1\cdot\phi_1 \right] = \frac{1}{2}(\phi_1 - \phi_2).  (19.82)

Thus, after going through routine but sure procedures, we have reached appropriate basis functions that belong to each irreducible representation.
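The projections (19.81) and (19.82) can likewise be checked in matrix form. In the sketch below (a Python illustration, not part of the text's derivation), each of the eight group elements acts on the basis (ϕ1, ϕ2) by the 2 × 2 matrix implied by the transformation rules above, e.g., (19.75):

```python
# A sketch of (18.174) in matrix form for ethylene: each D2h element acts on
# the basis (phi1, phi2) by a 2x2 matrix, and the projector extracts the SALC.
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])   # exchanges phi1 and phi2, cf. (19.75)

# Action of the eight D2h elements on (phi1, phi2), in the class order
# E, C2(z), C2(y), C2(x), i, s(xy), s(zx), s(yz):
ops = [I2, X, -I2, -X, -X, -I2, X, I2]
chi_B1u = [1, 1, -1, -1, -1, -1, 1, 1]
chi_B3g = [1, -1, -1, 1, 1, -1, -1, 1]

P_B1u = sum(c * D for c, D in zip(chi_B1u, ops)) / 8
P_B3g = sum(c * D for c, D in zip(chi_B3g, ops)) / 8
phi1 = np.array([1.0, 0.0])
print(P_B1u @ phi1)    # [0.5  0.5] ->  (phi1 + phi2)/2, cf. (19.81)
print(P_B3g @ phi1)    # [0.5 -0.5] ->  (phi1 - phi2)/2, cf. (19.82)
```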
As mentioned earlier (see Table 17.5), D2h can be expressed as a direct-product group, and the C2 group is contained in D2h as a subgroup. We can use this for the present analysis. Suppose now that we have a group ℊ and that H is a subgroup of ℊ. Let g be an arbitrary element of ℊ and let D(g) be a representation of g. Meanwhile, let h be an arbitrary element of H. Then, with ∀h ∈ H, the collection of D(h) is a representation of H. We write this relation as

D ↓ H.  (19.83)

This representation is called a subduced representation of D onto H. Table 19.3 shows the character table of the irreducible representations of C2. In the present case, we are thinking of C2(z) as C2; see Table 17.5 and Fig. 19.5. Then we have

B1u ↓ C2 = A and B3g ↓ C2 = B.  (19.84)

The expression (19.84) is called a compatibility relation. Note that in (19.84) C2 is not a symmetry operation but a subgroup of D2h. Thus, (19.81) is reduced to

P^{(A)}\phi_1 = \frac{1}{2} \sum_{g} \chi^{(A)}(g) \, g\phi_1 = \frac{1}{2} \left[ 1\cdot\phi_1 + 1\cdot\phi_2 \right] = \frac{1}{2}(\phi_1 + \phi_2).  (19.85)

Also, we have

P^{(B)}\phi_1 = \frac{1}{2} \sum_{g} \chi^{(B)}(g) \, g\phi_1 = \frac{1}{2} \left[ 1\cdot\phi_1 + (-1)\phi_2 \right] = \frac{1}{2}(\phi_1 - \phi_2).  (19.86)

The relations (19.85) and (19.86) are essentially the same as (19.81) and (19.82), respectively.

We can easily construct the character table of C2 (see Table 19.3). There should be two irreducible representations. For the totally symmetric representation, we allocate 1 to each symmetry operation. For the other representation we allocate 1 to the identity element E and −1 to the element C2, so that the rows and columns of the character table are orthogonal to each other.

Readers might well ask why we bother to take such circuitous approaches to reach predictable results such as (19.81) and (19.82) or (19.85) and (19.86). This question seems natural when we are dealing with a case where the number of basis vectors (i.e., the dimension of the vector space) is small, typically 2 as in the present case. With increasing dimension of the vector space, however, seeking and determining the appropriate SALCs becomes increasingly complicated and difficult. Under such circumstances, the projection operator is an indispensable tool to address the problem.
We have a two-dimensional secular equation to solve:

\begin{vmatrix} \tilde{H}_{11} - \lambda\tilde{S}_{11} & \tilde{H}_{12} - \lambda\tilde{S}_{12} \\ \tilde{H}_{21} - \lambda\tilde{S}_{21} & \tilde{H}_{22} - \lambda\tilde{S}_{22} \end{vmatrix} = 0.  (19.87)

In the above argument, (1/2)(ϕ1 + ϕ2) belongs to B1u and (1/2)(ϕ1 − ϕ2) belongs to B3g. Since they belong to different irreducible representations, we have

H̃12 = S̃12 = 0.  (19.88)

Thus, the secular equation (19.87) is reduced to

\begin{vmatrix} \tilde{H}_{11} - \lambda\tilde{S}_{11} & 0 \\ 0 & \tilde{H}_{22} - \lambda\tilde{S}_{22} \end{vmatrix} = 0.  (19.89)

As expected, (19.89) has automatically been solved to give the solutions

λ1 = H̃11/S̃11 and λ2 = H̃22/S̃22.  (19.90)

The next step is to determine the energy eigenvalues of the molecule. Note here that the role of the SALCs is to determine a suitable irreducible representation that corresponds to the "direction" of a vector. As a coefficient keeps the direction of a vector unaltered, it is of secondary importance. The final forms of the normalized MOs can be decided last; that procedure includes the normalization of the vectors. Thus, we tentatively choose the following functions for the SALCs:

ξ1 = ϕ1 + ϕ2 and ξ2 = ϕ1 − ϕ2.

Then, we have

\tilde{H}_{11} = \int \xi_1^{*} H \xi_1 \, d\tau = \int (\phi_1 + \phi_2)^{*} H (\phi_1 + \phi_2) \, d\tau = \int \phi_1 H \phi_1 \, d\tau + \int \phi_1 H \phi_2 \, d\tau + \int \phi_2 H \phi_1 \, d\tau + \int \phi_2 H \phi_2 \, d\tau = H_{11} + H_{12} + H_{21} + H_{22} = H_{11} + 2H_{12} + H_{22}.  (19.91)

Similarly, we have

\tilde{H}_{22} = \int \xi_2^{*} H \xi_2 \, d\tau = H_{11} - 2H_{12} + H_{22}.  (19.92)

The last equality comes from the fact that we have chosen real functions for ϕ1 and ϕ2, as studied in Part I. Moreover, we have

H_{11} \equiv \int \phi_1^{*} H \phi_1 \, d\tau = \int \phi_1 H \phi_1 \, d\tau = \int \phi_2 H \phi_2 \, d\tau = H_{22},
H_{12} \equiv \int \phi_1^{*} H \phi_2 \, d\tau = \int \phi_1 H \phi_2 \, d\tau = \int \phi_2 H \phi_1 \, d\tau = \int \phi_2^{*} H \phi_1 \, d\tau = H_{21}.  (19.93)

The first equation comes from the fact that both H11 and H22 are calculated using the same 2pz atomic orbital of carbon. The second equation results from the fact that H is Hermitian. Notice that both ϕ1 and ϕ2 are real functions. Following convention, we denote

α ≡ H11 = H22 and β ≡ H12,  (19.94)

where α is called the Coulomb integral and β the resonance integral. Then, we have

H̃11 = 2(α + β).  (19.95)

In a similar manner, we get

H̃22 = 2(α − β).  (19.96)

Meanwhile, we have

\tilde{S}_{11} = \langle \xi_1 | \xi_1 \rangle = \int (\phi_1 + \phi_2)^{*} (\phi_1 + \phi_2) \, d\tau = \int (\phi_1 + \phi_2)^2 \, d\tau = \int \phi_1^2 \, d\tau + \int \phi_2^2 \, d\tau + 2\int \phi_1 \phi_2 \, d\tau = 2 + 2\int \phi_1 \phi_2 \, d\tau,  (19.97)

where we used the fact that ϕ1 and ϕ2 have been normalized. Also following convention, we denote

S \equiv \int \phi_1 \phi_2 \, d\tau = S_{12},  (19.98)

where S is called the overlap integral. Thus, we have

S̃11 = 2(1 + S).  (19.99)

Similarly, we get

S̃22 = 2(1 − S).  (19.100)

Substituting (19.95) and (19.96) along with (19.99) and (19.100) into (19.90), we get the energy eigenvalues

λ1 = (α + β)/(1 + S) and λ2 = (α − β)/(1 − S).  (19.101)

From (19.97), we get

||ξ1|| = √⟨ξ1|ξ1⟩ = √(2(1 + S)).  (19.102)

Thus, for one of the MOs, corresponding to the energy eigenvalue λ1, we get

Ψ1 = |ξ1⟩ / ||ξ1|| = (ϕ1 + ϕ2)/√(2(1 + S)).  (19.103)

Also, we have

||ξ2|| = √⟨ξ2|ξ2⟩ = √(2(1 − S)).  (19.104)

For the other MO, corresponding to the energy eigenvalue λ2, we get



Fig. 19.6 HOMO and LUMO energy levels of ethylene and their assignments

Ψ2 = |ξ2⟩ / ||ξ2|| = (ϕ1 − ϕ2)/√(2(1 − S)).  (19.105)

Note that both the normalized MOs and the energy eigenvalues depend upon whether we ignore the overlap integral, as is the case with the simplest Hückel approximation. Nonetheless, the MO functional forms (in this case either ϕ1 + ϕ2 or ϕ1 − ϕ2) remain the same regardless of the approximation level for the overlap integral. Regarding the quantitative evaluation of α, β, and S, we will briefly mention it later.
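As a small consistency check (again in Python, with the same illustrative overlap S = 0.25 assumed in the sketch of Sect. 19.3), the normalized MOs (19.103) and (19.105) are orthonormal with respect to the overlap metric:

```python
# A quick check, assuming an illustrative overlap S = 0.25: the normalized MOs
# (19.103) and (19.105) satisfy <Psi_i|Psi_j> = delta_ij under the metric S.
import numpy as np

S12 = 0.25
S = np.array([[1.0, S12], [S12, 1.0]])                  # overlap of (phi1, phi2)
Psi1 = np.array([1.0,  1.0]) / np.sqrt(2 * (1 + S12))   # (19.103)
Psi2 = np.array([1.0, -1.0]) / np.sqrt(2 * (1 - S12))   # (19.105)
C = np.array([Psi1, Psi2])
print(np.round(C @ S @ C.T, 12))                        # identity matrix
```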
Once we have decided the symmetry of an MO (or the irreducible representation to which the orbital belongs) and its energy eigenvalue, we are in a position to examine various physicochemical properties of the molecule. One of them is the optical transition within a molecule, particularly the electric dipole transition. In most cases, the most important transition is that occurring between the highest occupied molecular orbital (HOMO) and the lowest unoccupied molecular orbital (LUMO). In the case of ethylene, those levels are depicted in Fig. 19.6. In the ground state (i.e., the most stable state), two electrons occupy the B1u state (HOMO). The excited state is assigned to B3g (LUMO). A ground state that consists only of fully occupied MOs belongs to the totally symmetric representation; in the case of ethylene, the ground state belongs to Ag accordingly.

If a photon is absorbed by a molecule, the molecule is excited by the energy of the photon. In ethylene, this process takes place by exciting an electron from B1u to B3g. The resulting electronic state ends up with one electron remaining in B1u and another excited to B3g (the final state). Thus, the representation of the final-state electronic configuration (denoted by Γf) is described as

Γf = B1u ⊗ B3g.  (19.106)

That is, the final excited state is expressed as a direct product of the states associated with the optical transition. To determine the symmetry of the final state, we use (18.198) and (18.200). If Γf in (19.106) is reducible, using (18.200) we can

determine the number of times that the individual representations occur. Calculating χ^(B1u)(g) χ^(B3g)(g) for each group element, we readily get the result. Using the character table, we have

Γf = B2u.  (19.107)

Thus, we find that the transition is Ag ⟶ B2u, where Ag is called the initial-state electronic configuration and B2u the final-state electronic configuration. This transition is characterized by the electric dipole transition moment operator P. Here we have an important physical quantity, the transition matrix element. This quantity Tfi is approximated by

Tfi ≈ ⟨Θf | εe · P | Θi⟩,  (19.108)

where εe is a unit polarization vector of the electric field; Θi and Θf are the initial-state and final-state electronic configurations, respectively. The description (19.108) parallels (4.5). Notice that in (4.5) we dealt with single-electron systems such as a particle confined in a square-well potential, a single one-dimensional harmonic oscillator, and an electron in a hydrogen atom.
In the present case, however, we are dealing with a two-electron system, ethylene. Consequently, we cannot describe Θf by a simple wave function but must use a more elaborate function. Nonetheless, when we discuss the optical transition of a molecule, it is often the case that, when studying optical absorption or emission, we first wish to know whether such a phenomenon truly takes place. In such a case, a qualitative prediction is of great importance. This can be done by judging whether the integral (19.108) vanishes. If the integral does not vanish, the relevant transition is said to be allowed. If, on the other hand, the integral vanishes, the transition is called forbidden. In this context, a systematic approach based on group theory is a powerful tool.

Let us consider the optical absorption of ethylene. For the transition matrix element, we have

Tfi = ⟨Θf(B2u) | εe · P | Θi(Ag)⟩.  (19.109)

Again, note that a closed-shell electronic configuration belongs to the totally symmetric representation [1]. Suppose that the position vector x belongs to an irreducible representation η. A necessary condition for (19.109) not to vanish is that D^(η) ⊗ D^(Ag) contains the irreducible representation D^(B2u). In the present case, all the representations are one-dimensional, and so we can use χ^(ω) instead of D^(ω), where ω denotes an arbitrary irreducible representation of D2h. This procedure is straightforward, as seen in Sect. 19.2.

Moreover, if the characters are real (as is the case with many symmetry groups, including the point group D2h), the situation is easier. Suppose in general that we are examining whether the following matrix element vanishes:

M_{fi} = \left\langle \Phi_f^{(\beta)} \middle| O^{(\gamma)} \middle| \Phi_i^{(\alpha)} \right\rangle,  (19.110)

where α, β, and γ stand for irreducible representations and O is an appropriate operator. In this case, (18.200) can be rewritten as

q_\alpha = \frac{1}{n} \sum_g \chi^{(\alpha)}(g)^{*} \chi^{(\gamma\otimes\beta)}(g) = \frac{1}{n} \sum_g \chi^{(\alpha)}(g) \chi^{(\gamma\otimes\beta)}(g) = \frac{1}{n} \sum_g \chi^{(\alpha)}(g) \chi^{(\gamma)}(g) \chi^{(\beta)}(g) = \frac{1}{n} \sum_g \chi^{(\gamma)}(g) \chi^{(\alpha)}(g) \chi^{(\beta)}(g) = \frac{1}{n} \sum_g \chi^{(\gamma)}(g) \chi^{(\alpha\otimes\beta)}(g) = \frac{1}{n} \sum_g \chi^{(\gamma)}(g)^{*} \chi^{(\alpha\otimes\beta)}(g) = q_\gamma.  (19.111)

Consequently, the number of times that D^(γ) appears in D^(α⊗β) is identical to the number of times that D^(α) appears in D^(γ⊗β). Thus, it suffices to examine whether D^(γ⊗β) contains D^(α). In other words, we only have to examine whether qα ≠ 0 in (19.111).
Thus, applying (19.111) to (19.109), we examine whether D^(B2u⊗Ag) contains the irreducible representation D^(η) that is related to x. We easily get

B2u = B2u ⊗ Ag.  (19.112)

Therefore, if εe · P (or x) belongs to B2u, the transition is allowed. Consulting the character table, we find that y belongs to B2u. In this case, (19.111) in fact reads

q_{B_{2u}} = \frac{1}{8} \left[ 1\cdot 1 + (-1)(-1) + 1\cdot 1 + (-1)(-1) + (-1)(-1) + 1\cdot 1 + (-1)(-1) + 1\cdot 1 \right] = 1.

Equivalently, we simply write

B2u ⊗ B2u = Ag.

This means that if light polarized along the y-axis is incident, i.e., εe is parallel to the y-axis, the transition is allowed. In that situation, ethylene is said to be polarized along the y-axis, or polarized in the direction of the y-axis. As the molecular axis is parallel to the y-axis, ethylene is polarized in the direction of the molecular axis. This is often the case with aromatic molecules having a well-defined molecular long axis, such as ethylene.

Next we examine whether ethylene is polarized along, e.g., the x-axis. From the character table of D2h (see Table 19.2), x belongs to B3u. In that case, using (19.111) we have

q_{B_{3u}} = \frac{1}{8} \left[ 1\cdot 1 + (-1)(-1) + (-1)\cdot 1 + 1\cdot(-1) + (-1)(-1) + 1\cdot 1 + 1\cdot(-1) + (-1)\cdot 1 \right] = 0.

This implies that B3u is not contained in B1u ⊗ B3g (= B2u).

The above results on the optical transitions are quite obvious. Once we get used to using a character table, such estimations can be made quickly.
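The polarization test just carried out by hand lends itself to the same tabular treatment. A sketch (assuming the same hand-entered D2h characters as in the earlier reduction example) runs through the three polarization directions at once:

```python
# A sketch of the selection-rule test (19.111) for the ethylene transition
# B1u -> B3g, assuming the hand-entered D2h character table. The transition is
# allowed for polarization eta iff
# q = (1/n) sum_g chi^(eta)(g) chi^(B1u)(g) chi^(B3g)(g) is nonzero.
D2h = {  # class order: E, C2(z), C2(y), C2(x), i, s(xy), s(zx), s(yz)
    "B1u": [1,  1, -1, -1, -1, -1,  1,  1],
    "B2u": [1, -1,  1, -1, -1,  1, -1,  1],
    "B3u": [1, -1, -1,  1, -1,  1,  1, -1],
    "B3g": [1, -1, -1,  1,  1, -1, -1,  1],
}
n = 8
for eta, axis in [("B1u", "z"), ("B2u", "y"), ("B3u", "x")]:
    q = sum(a * b * c for a, b, c in
            zip(D2h[eta], D2h["B1u"], D2h["B3g"])) / n
    print(axis, "allowed" if q else "forbidden")   # only y is allowed
```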

19.4.2 Cyclopropenyl Radical [1]

Let us think of another example, the cyclopropenyl radical, which has three resonance structures (Fig. 19.7). It is a planar molecule and its three carbon atoms form an equilateral triangle. Hence, the molecule belongs to the D3h symmetry (see Sect. 17.2). In the molecule three π-electrons extend vertically to the molecular plane toward the upper and lower directions. Suppose that the cyclopropenyl radical is placed on the xy-plane. The three pz atomic orbitals of the carbons located at the vertices of the equilateral triangle are denoted by ϕ1, ϕ2, and ϕ3 in Fig. 19.8. The orbitals are numbered clockwise so that the calculations are consistent with the conventional notation of the character table (vide infra). We assume that these π-orbitals take positive and negative signs on the upper and lower sides of the plane of the paper, respectively, with a nodal plane lying in the xy-plane. The situation is similar to that of ethylene, and the problem can be treated in parallel with the case of ethylene.

As in the case of ethylene, we can choose ϕ1, ϕ2, and ϕ3 as real functions. We construct basis vectors using these vectors. What we want to do to address the problem is as follows: (i) We examine how ϕ1, ϕ2, and ϕ3 are transformed by the symmetry operations of D3h. From this analysis, we can determine to which irreducible representations the SALCs should be assigned. (ii) On the basis of the knowledge obtained in (i), we construct proper MOs.

In Table 19.4 we list the character table of D3h along with the symmetry species. First we examine the traces (characters) of the representation matrices. As in the case of ethylene, a subgroup C3 of D3h plays an essential role (vide infra). This subgroup contains three group elements such that

C3 = {E, C3, C3²}.

In the above, we use the same notation for the group name and the group element, so we should be careful not to confuse them.

Fig. 19.7 Three resonance structures of the cyclopropenyl radical

Fig. 19.8 Three pz atomic orbitals of the carbons of the cyclopropenyl radical placed on the xy-plane. The carbon atoms are located at the vertices of an equilateral triangle. The atomic orbitals are denoted by ϕ1, ϕ2, and ϕ3

Table 19.4 Character table of D3h

D3h | E | 2C3 | 3C2 | σh | 2S3 | 3σv |
A1′ | 1 |  1  |  1  |  1 |  1  |  1  | x² + y², z²
A2′ | 1 |  1  | −1  |  1 |  1  | −1  |
E′  | 2 | −1  |  0  |  2 | −1  |  0  | (x, y); (x² − y², xy)
A1″ | 1 |  1  |  1  | −1 | −1  | −1  |
A2″ | 1 |  1  | −1  | −1 | −1  |  1  | z
E″  | 2 | −1  |  0  | −2 |  1  |  0  | (yz, zx)

They are transformed as follows:

C3(z)(ϕ1) = ϕ3,  C3(z)(ϕ2) = ϕ1,  and  C3(z)(ϕ3) = ϕ2;  (19.113)

C3²(z)(ϕ1) = ϕ2,  C3²(z)(ϕ2) = ϕ3,  and  C3²(z)(ϕ3) = ϕ1.  (19.114)

Equation (19.113) can be combined into the following form:

(ϕ1 ϕ2 ϕ3) C3(z) = (ϕ3 ϕ1 ϕ2).  (19.115)

Using a matrix representation, we have

C_3(z) = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}.  (19.116)

In turn, (19.114) is expressed as



C_3^2(z) = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}.  (19.117)

Both traces of (19.116) and (19.117) are zero.

Similarly, let us check the representation matrices of the other symmetry species. Of these, e.g., for the C2 related to the y-axis (see Fig. 19.8) we have

(ϕ1 ϕ2 ϕ3) C2 = (−ϕ1 −ϕ3 −ϕ2).  (19.118)

Therefore,

C_2 = \begin{pmatrix} -1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & -1 & 0 \end{pmatrix},  (19.119)

and we have a trace of −1 accordingly. Regarding σh, we have

(ϕ1 ϕ2 ϕ3) σh = (−ϕ1 −ϕ2 −ϕ3) and \sigma_h = \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix}.  (19.120)

In this way we can determine the trace of each symmetry transformation of the basis functions ϕ1, ϕ2, and ϕ3. We collect the resulting characters of the reducible representation Γ in Table 19.5. It can be reduced to a direct sum of irreducible representations according to the procedures given in (18.81)–(18.83) and using the character table of D3h (Table 19.4). As a result, we get

Γ = A2″ + E″.  (19.121)

As in the case of ethylene, we make the best use of the information on the subgroup C3 of D3h. Let us consider the subduced representation of D of D3h onto C3. For this, in Table 19.6 we show the character table of the irreducible representations of C3. We can readily construct this character table: there should be three irreducible representations. For the totally symmetric representation, we allocate 1 to each symmetry operation. For the other representations, we allocate 1 to the identity element E and the two other cube roots of 1, i.e., ε and ε* [where ε = exp(i2π/3)], to the elements C3 and C3², as shown, so that the rows and columns of the character table are orthogonal to each other.
Returning to the construction of the subduced representation, we have

Table 19.5 Characters of the individual symmetry transformations of the 2pz orbitals in the cyclopropenyl radical

D3h | E | 2C3 | 3C2 | σh | 2S3 | 3σv
Γ   | 3 |  0  | −1  | −3 |  0  |  1

Table 19.6 Character table of C3 [ε = exp(i2π/3)]

C3 | E | C3 | C3² |
A  | 1 |  1 |  1  | z; x² + y², z²
E  | 1 |  ε |  ε* | (x, y); (x² − y², xy), (yz, zx)
   | 1 | ε* |  ε  |

A2″ ↓ C3 = A and E″ ↓ C3 = E⁽¹⁾ + E⁽²⁾.  (19.122)

Then, corresponding to (18.147) we have

P^{(A)}\phi_1 = \frac{1}{3} \sum_g \chi^{(A)}(g)^{*} \, g\phi_1 = \frac{1}{3} \left[ 1\cdot\phi_1 + 1\cdot\phi_2 + 1\cdot\phi_3 \right] = \frac{1}{3}(\phi_1 + \phi_2 + \phi_3).  (19.123)

Also, we have

P^{[E^{(1)}]}\phi_1 = \frac{1}{3} \sum_g \chi^{[E^{(1)}]}(g)^{*} \, g\phi_1 = \frac{1}{3} \left[ 1\cdot\phi_1 + \varepsilon^{*}\phi_3 + (\varepsilon^{*})^{*}\phi_2 \right] = \frac{1}{3}(\phi_1 + \varepsilon\phi_2 + \varepsilon^{*}\phi_3).  (19.124)

Also, we get

P^{[E^{(2)}]}\phi_1 = \frac{1}{3} \sum_g \chi^{[E^{(2)}]}(g)^{*} \, g\phi_1 = \frac{1}{3} \left[ 1\cdot\phi_1 + \varepsilon\phi_3 + \varepsilon^{*}\phi_2 \right] = \frac{1}{3}(\phi_1 + \varepsilon^{*}\phi_2 + \varepsilon\phi_3).  (19.125)

Here is the best place to mention the eigenvalue of a symmetry operator. Let us designate the SALCs as follows:

ξ1 = ϕ1 + ϕ2 + ϕ3,  ξ2 = ϕ1 + εϕ2 + ε*ϕ3,  and  ξ3 = ϕ1 + ε*ϕ2 + εϕ3.  (19.126)

Let us choose C3 as a symmetry operator. Then we have


C3(ξ1) = C3(ϕ1 + ϕ2 + ϕ3) = C3ϕ1 + C3ϕ2 + C3ϕ3 = ϕ3 + ϕ1 + ϕ2 = ξ1,  (19.127)

where for the second equality we used the fact that C3 is a linear operator. That is, regarding the SALC ξ1, the eigenvalue of C3 is 1.

Similarly, we have

C3(ξ2) = C3(ϕ1 + εϕ2 + ε*ϕ3) = C3ϕ1 + εC3ϕ2 + ε*C3ϕ3 = ϕ3 + εϕ1 + ε*ϕ2 = ε(ϕ1 + εϕ2 + ε*ϕ3) = εξ2.  (19.128)

Furthermore, we get

C3(ξ3) = ε*ξ3.  (19.129)

Thus, we find that for the SALCs ξ2 and ξ3, the eigenvalues of C3 are ε and ε*, respectively. These pieces of information imply that if we appropriately choose proper functions as basis vectors, the character of a symmetry operation in a one-dimensional representation is identical to an eigenvalue of that symmetry operation (see Table 19.6).

For the last part of the calculations, we follow the procedures described in the case of ethylene. Using the above functions ξ1, ξ2, and ξ3, we construct the secular equation such that

\begin{vmatrix} \tilde{H}_{11} - \lambda\tilde{S}_{11} & & \\ & \tilde{H}_{22} - \lambda\tilde{S}_{22} & \\ & & \tilde{H}_{33} - \lambda\tilde{S}_{33} \end{vmatrix} = 0.  (19.130)

Since we have obtained three SALCs that are assigned to the individual irreducible representations A, E⁽¹⁾, and E⁽²⁾, these SALCs span the representation space V³. This makes the off-diagonal elements of the secular equation vanish, and it is simplified as in (19.130). Here, we have
\tilde{H}_{11} = \int \xi_1^{*} H \xi_1 \, d\tau = \int (\phi_1 + \phi_2 + \phi_3)^{*} H (\phi_1 + \phi_2 + \phi_3) \, d\tau = 3(\alpha + 2\beta),  (19.131)

where we used the same α and β as defined in (19.94). Strictly speaking, the α and β appearing in (19.131) should be slightly different from those of (19.94), because the Hamiltonian is different. This approximation, however, is sufficient for the present studies. In a similar manner, we get

\tilde{H}_{22} = \int \xi_2^{*} H \xi_2 \, d\tau = 3(\alpha - \beta) and \tilde{H}_{33} = \int \xi_3^{*} H \xi_3 \, d\tau = 3(\alpha - \beta).  (19.132)

Meanwhile, we have

\tilde{S}_{11} = \langle \xi_1 | \xi_1 \rangle = \int (\phi_1 + \phi_2 + \phi_3)^{*} (\phi_1 + \phi_2 + \phi_3) \, d\tau = \int (\phi_1 + \phi_2 + \phi_3)^2 \, d\tau = 3(1 + 2S).  (19.133)

Similarly, we get

S̃22 = S̃33 = 3(1 − S).  (19.134)

Readers are urged to verify (19.133) and (19.134).

Substituting (19.131) through (19.134) into (19.130), we get the energy eigenvalues

λ1 = (α + 2β)/(1 + 2S),  λ2 = (α − β)/(1 − S),  and  λ3 = (α − β)/(1 − S).  (19.135)

Notice that the two MOs belonging to E″ have the same energy. These MOs are said to be energetically degenerate. This situation is characteristic of a two-dimensional representation. Actually, even though the group C3 has only one-dimensional representations (because it is an Abelian group), the two complex-conjugate representations labeled E behave as if they were a two-dimensional representation [2]. We will encounter the same situation again in the next example, benzene.
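The degeneracy in (19.135) is also visible numerically. Here is a Python sketch in the simple Hückel limit S = 0 (so that the eigenvalues reduce to α + 2β and a doubly degenerate α − β), assuming the illustrative values α = −10, β = −1 used earlier:

```python
# A numerical sketch of (19.135) in the simple Hueckel limit S = 0, assuming
# illustrative values alpha = -10, beta = -1 (not prescribed by the text).
import numpy as np

alpha, beta = -10.0, -1.0
# All three carbons of the ring are mutually adjacent:
H = alpha * np.eye(3) + beta * (np.ones((3, 3)) - np.eye(3))
print(np.linalg.eigvalsh(H))   # [-12.  -9.  -9.]: alpha + 2 beta, then a
                               # doubly degenerate pair at alpha - beta
```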
From (19.126), we get

||ξ1|| = √⟨ξ1|ξ1⟩ = √(3(1 + 2S)).  (19.136)

Thus, as the MO corresponding to the energy eigenvalue λ1, i.e., Ψ1, we get

Ψ1 = |ξ1⟩ / ||ξ1|| = (ϕ1 + ϕ2 + ϕ3)/√(3(1 + 2S)).  (19.137)

Also, we have

||ξ2|| = √⟨ξ2|ξ2⟩ = √(3(1 − S)) and ||ξ3|| = √⟨ξ3|ξ3⟩ = √(3(1 − S)).  (19.138)

Thus, for another MO, corresponding to the energy eigenvalue λ2, we get



Ψ2 = |ξ2⟩ / ||ξ2|| = (ϕ1 + εϕ2 + ε*ϕ3)/√(3(1 − S)).  (19.139)

Also, for the MO corresponding to λ3 (= λ2), we have

Ψ3 = |ξ3⟩ / ||ξ3|| = (ϕ1 + ε*ϕ2 + εϕ3)/√(3(1 − S)).  (19.140)

Equations (19.139) and (19.140) include complex numbers, and so they are inconvenient for computer analysis. In that case, we can convert them to real numbers. In Part III, we examined the properties of unitary transformations. Since a unitary transformation keeps the norm of a vector unchanged, it is suited to our present purpose. The conversion can be done using the following unitary matrix U:

U = \begin{pmatrix} \dfrac{1}{\sqrt{2}} & -\dfrac{i}{\sqrt{2}} \\ \dfrac{1}{\sqrt{2}} & \dfrac{i}{\sqrt{2}} \end{pmatrix}.  (19.141)

Then we have

(\Psi_2\ \Psi_3) U = \left( \frac{1}{\sqrt{2}}(\Psi_2 + \Psi_3)\ \ \frac{i}{\sqrt{2}}(-\Psi_2 + \Psi_3) \right).  (19.142)

Thus, defining Ψ̃2 and Ψ̃3 as

\tilde{\Psi}_2 = \frac{1}{\sqrt{2}}(\Psi_2 + \Psi_3) and \tilde{\Psi}_3 = \frac{i}{\sqrt{2}}(-\Psi_2 + \Psi_3),  (19.143)

we get

\tilde{\Psi}_2 = \frac{1}{\sqrt{6(1-S)}} \left[ 2\phi_1 + (\varepsilon + \varepsilon^{*})\phi_2 + (\varepsilon^{*} + \varepsilon)\phi_3 \right] = \frac{1}{\sqrt{6(1-S)}}(2\phi_1 - \phi_2 - \phi_3).  (19.144)

Also, we have

\tilde{\Psi}_3 = \frac{i}{\sqrt{6(1-S)}} \left[ (\varepsilon^{*} - \varepsilon)\phi_2 + (\varepsilon - \varepsilon^{*})\phi_3 \right] = \frac{i}{\sqrt{6(1-S)}} \left( -i\sqrt{3}\phi_2 + i\sqrt{3}\phi_3 \right) = \frac{1}{\sqrt{2(1-S)}}(\phi_2 - \phi_3).  (19.145)

Thus, we have successfully converted the complex functions to real functions. In the above unitary transformation, notice that the norms of the vectors remain unchanged before and after the transformation.

As the cyclopropenyl radical has three π electrons, two of them occupy the lowest energy level of A2″. The remaining electron occupies a level of E″. Since this level possesses an energy higher than α, the electron occupying it is anticipated to be unstable. Under such a circumstance, the molecule tends to lose said electron so as to become a cation. Following the argument given in the previous case of ethylene, it is easy to make sure that the allowed transition of the cyclopropenyl radical takes place when the light is polarized parallel to the molecular plane (i.e., the xy-plane in Fig. 19.8). The proof is left to readers as an exercise. This polarization feature is typical of planar molecules with high molecular symmetry.

19.4.3 Benzene

Benzene has structural formula which is shown in Fig. 19.9. It is a planar molecule
and six carbon atoms form a regular hexagon. Hence, the molecule belongs to D6h
symmetry. In the molecule six π-electrons extend vertically to the molecular plane
toward upper and lower directions as in the case of ethylene and cyclopropenyl
radical. This is a standard illustration of quantum chemistry and dealt with in many
textbooks. As in the case of ethylene and cyclopropenyl radical, the problem can be
treated similarly.
As before, six equivalent pz atomic orbitals of carbon are denoted by ϕ1 to ϕ6 in
Fig. 19.10. These vectors or their linear combinations span a six-dimensional
representation space. We construct basis vectors using these vectors. Following
the previous procedures, we construct proper SALC orbitals along with MOs.
Similarly as before, a subgroup C6 of D6h plays an essential role. This subgroup
contains six group elements such that
 
C6 ¼ E, C6 , C 3 , C 2 , C 23 , C56 : ð19:146Þ

Taking C6(z) as an example, we have

ðϕ1 ϕ2 ϕ3 ϕ4 ϕ5 ϕ6 ÞC 6 ðzÞ ¼ ðϕ6 ϕ1 ϕ2 ϕ3 ϕ4 ϕ5 Þ: ð19:147Þ

Using a matrix representation, we have

Fig. 19.9 Structural


formula of benzene. It
belongs to D6h symmetry
19.4 MO Calculations Based on π-Electron Approximation 765

Fig. 19.10 Six equivalent


pz atomic orbitals of carbon
y
of benzene

O x
z

0 1
0 1 0 0 0 0
B0 0C
B 0 1 0 0 C
B C
B0 0 0 1 0 0C
C 6 ðzÞ ¼ B
B0
C: ð19:148Þ
B 0 0 0 1 0C
C
B C
@0 0 0 0 0 1A
1 0 0 0 0 0

Once again, we can determine the trace for individual symmetry transformations
belonging to D6h. We collect the results in Table 19.7. The representation is
reducible and this is reduced as follows using a character table of D6h
(Table 19.8). As a result, we get

Γ ¼ A2u þ B2g þ E 1g þ E2u : ð19:149Þ

As a subduced representation of D of D6h to C6, we have

A2u # C 6 ¼ A, B2g # C 6 ¼ B, E 1g # C 6 ¼ 2E1 , E 2u # C 6 ¼ 2E2 : ð19:150Þ

Here, we used Table 19.9 that shows a character table of irreducible representa-
tions of C6. Following the previous procedures, as SALCs we have
766 19 Applications of Group Theory to Physical Chemistry

Table 19.7 Characters for individual symmetry transformations of 2pz orbitals in benzene
D6h E 2C6 2C3 C2 3C 02 3C 002 i 2S3 2S6 σh 3σ d 3σ v
Γ 6 0 0 0 2 0 0 0 0 6 0 2

Table 19.8 Character table of D6h


D6h E 2C6 2C3 C2 3C 02 3C 002 i 2S3 2S6 σh 3σ d 3σ v
A1g 1 1 1 1 1 1 1 1 1 1 1 1 x2 + y2, z2
A2g 1 1 1 1 1 1 1 1 1 1 1 1
B1g 1 1 1 1 1 1 1 1 1 1 1 1
B2g 1 1 1 1 1 1 1 1 1 1 1 1
E1g 2 1 1 2 0 0 2 1 1 2 0 0 (yz, zx)
E2g 2 1 1 2 0 0 2 1 1 2 0 0 (x2  y2, xy)
A1u 1 1 1 1 1 1 1 1 1 1 1 1
A2u 1 1 1 1 1 1 1 1 1 1 1 1 z
B1u 1 1 1 1 1 1 1 1 1 1 1 1
B2u 1 1 1 1 1 1 1 1 1 1 1 1
E1u 2 1 1 2 0 0 2 1 1 2 0 0 (x, y)
E2u 2 1 1 2 0 0 2 1 1 2 0 0

6PðAÞ ϕ1  ξ1 ¼ ϕ1 þ ϕ2 þ ϕ3 þ ϕ4 þ ϕ5 þ ϕ6 ,
6PðBÞ ϕ1  ξ2 ¼ ϕ1  ϕ2 þ ϕ3  ϕ4 þ ϕ5  ϕ6 ,
P½E1  ϕ1  ξ3 ¼ ϕ1 þ εϕ2  ε ϕ3  ϕ4  εϕ5 þ ε ϕ6 ,
ð1Þ

ð19:151Þ
P½E1  ϕ  ξ ¼ ϕ þ ε ϕ  εϕ  ϕ  ε ϕ þ εϕ ,
ð2Þ

1 4 1 2 3 4 5 6

P½E2  ϕ1  ξ5 ¼ ϕ1  ε ϕ2  εϕ3 þ ϕ4  ε ϕ5  εϕ6 ,


ð1Þ

P½E2  ϕ1  ξ6 ¼ ϕ1  εϕ2  ε ϕ3 þ ϕ4  εϕ5  ε ϕ6 ,


ð2Þ

where ε ¼ exp (iπ/3).


Correspondingly, we have a diagonal secular equation of a sixth-order such that
 
He e 
 11  λS11 
 e 22  λe 
 H S22 
 
 e 33  λe 
 H S33 
 
  ¼ 0: ð19:152Þ
 
 e 44  λe 
 H S44 
 
 e 55  λe 
 H S55 
 H 66  λS66 
e e

Here, we have, for example,


19.4 MO Calculations Based on π-Electron Approximation 767

Table 19.9 Character table of C6


C6 E C6 C3 C2 C 23 C 56 ε ¼ exp (iπ/3)
A 1 1 1 1 1 1 z; x2 + y2, z2
B 1 1 1 1 1 1
 
E1 1 ε ε 1 ε ε (x, y); (yz, zx)
1 ε ε 1 ε ε
 
E2 1 ε ε 1 ε ε (x2  y2, xy)
1 ε ε 1 ε ε

ð
e 11 ¼ ξ1 Hξ1 dτ ¼ hξ1 jHξ1 i ¼ 6ðα þ 2β þ 2β0 þ β00 Þ:
H ð19:153Þ

In (19.153), we used the same α and β defined in (19.94) as in the case of


cyclopropenyl radical. That is, α is a Coulomb integral and β is a resonance integral
between two adjacent 2pz orbital of carbon. Meanwhile, β0 is a resonance integral
between orbitals of “meta” positions such as ϕ1 and ϕ3. A quantity β00 is a resonance
integral between orbitals of “para” positions such as ϕ1 and ϕ4. It is unfamiliar to
include such kind resonance integrals of β0 and β00 at a simple π-electron approxi-
mation level. To ignore such resonance integrals is because of a practical purpose to
simplify the calculations. However, we have no reason to exclude them. Or rather,
the use of appropriate SALCs makes it feasible to include β0 and β00.
In a similar manner, we get
ð
e 22 ¼ ξ2 Hξ2 dτ ¼ hξ2 jHξ2 i ¼ 6ðα  2β þ 2β0  β00 Þ,
H
e 33 ¼ H
e 44 ¼ 6ðα þ β  β0  β00 Þ, ð19:154Þ
H
e 55 ¼ H
H e 66 ¼ 6ðα  β  β0 þ β00 Þ:

Meanwhile, we have

e
S11 ¼ hξ1 jξ1 i ¼ 6ð1 þ 2S þ 2S0 þ S00 Þ,
e
S22 ¼ hξ2 jξ2 i ¼ 6ð1  2S þ 2S0  S00 Þ,
ð19:155Þ
e
S33 ¼ e
S44 ¼ 6ð1 þ S  S0  S00 Þ,
e
S55 ¼ e
S66 ¼ 6ð1  S  S0 þ S00 Þ,

where S, S0, and S00 are overlap integrals between the ortho, meta, and para positions,
respectively. Substituting (19.153) through (19.155) for (19.152), the energy eigen-
values are readily obtained as
768 19 Applications of Group Theory to Physical Chemistry

α þ 2β þ 2β0 þ β00 α  2β þ 2β0  β00 α þ β  β0  β00


λ1 ¼ 0 00 , λ2 ¼ , λ3 ¼ λ4 ¼ ,
1 þ 2S þ 2S þ S 1  2S þ 2S0  S00 1 þ S  S0  S00
0 00
αββ þβ
λ5 ¼ λ6 ¼ :
1  S  S0 þ S00
ð19:156Þ

Notice that two MOs ξ3 and ξ4 as well as ξ5 and ξ6 are degenerate. As can be seen,
λ3 and λ4 are doubly degenerate. So are λ5 and λ6.
From (19.151), we get for instance

pffiffiffiffiffiffiffiffiffiffiffiffiffiffi qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
kξ1 k ¼ hξ1 jξ1 i ¼ 6ð1 þ 2S þ 2S0 þ S00 Þ: ð19:157Þ

Thus, for one of normalized MOs corresponding to an energy eigenvalue λ1, we


get

j ξ1 i ϕ1 þ ϕ2 þ ϕ3 þ ϕ4 þ ϕ5 þ ϕ6
Ψ1 ¼ ¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
ffi : ð19:158Þ
kξ 1 k 6ð1 þ 2S þ 2S0 þ S00 Þ

Following the previous examples, we have other normalized MOs. That is, we
have

j ξ2 i ϕ1  ϕ2 þ ϕ3  ϕ4 þ ϕ5  ϕ6
Ψ2 ¼ ¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
ffi ,
kξ 2 k 6ð1  2S þ 2S0  S00 Þ
j ξ3 i ϕ1 þ εϕ2  ε ϕ3  ϕ4  εϕ5 þ ε ϕ6
Ψ3 ¼ ¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
ffi ,
kξ 3 k 6ð1 þ S  S0  S00 Þ
j ξ4 i ϕ1 þ ε ϕ2  εϕ3  ϕ4  ε ϕ5 þ εϕ6
Ψ4 ¼ ¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
ffi , ð19:159Þ
kξ 4 k 6ð1 þ S  S0  S00 Þ
j ξ i ϕ  ε ϕ2  εϕ3 þ ϕ4  ε ϕ5  εϕ6
Ψ5 ¼ 5 ¼ 1 pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
ffi ,
kξ 5 k 6ð1  S  S0 þ S00 Þ
j ξ i ϕ  εϕ2  ε ϕ3 þ ϕ4  εϕ5  ε ϕ6
Ψ6 ¼ 6 ¼ 1 pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
ffi :
kξ 6 k 6ð1  S  S0 þ S00 Þ

The eigenfunction Ψ i (1  i  6) corresponds to the eigenvalue λi. As in the case


f3 and Ψ
of cyclopropenyl radical, Ψ 3 and Ψ 4 can be transformed to Ψ f4, respectively,
through a unitary matrix of (19.141). Thus, we get
19.4 MO Calculations Based on π-Electron Approximation 769

Fig. 19.11 Energy diagram ( )


and MO assignments of
benzene. Energy
eigenvalues λ1 to λ6 are
= ( )
given in (19.156)

= ( )

f3 ¼ 2ϕ1 þpϕffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
Ψ 2  ϕ3  2ϕ4  ϕ5 þ ϕ6
,
12ð1 þ S  S0  S00 Þ
ð19:160Þ
f4 ¼ ϕp
Ψ 2 þ ϕ3  ϕ5  ϕ6
ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi :
2 1 þ S  S0  S00

f5 and Ψ
Similarly, transforming Ψ 5 and Ψ 6 to Ψ f6 , respectively, we have

f5 ¼ 2ϕ1 pϕffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
Ψ 2  ϕ3 þ 2ϕ4  ϕ5  ϕ6
,
12ð1  S  S0 þ S00 Þ
ð19:161Þ
f6 ¼ ϕp
Ψ 2  ϕ3 þ ϕ5  ϕ6
ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi :
2 1  S  S0 þ S00

Figure 19.11 shows an energy diagram and MO assignments of benzene.


A major optical transition takes place among HOMO (E1g) and LUMO (E2u)
levels. In the case of optical absorption, an initial electronic configuration is assigned
to the totally symmetric representation A1g and the symmetry of the final electronic
configuration is described as

Γ ¼ E 1g E 2u :

Therefore, a transition matrix element is expressed as


D  E

ΦðE1g E 2u Þ
jεe  PΦðA1g Þ , ð19:162Þ

where ΦðA1g Þ stands for the totally symmetric ground-state electronic configuration;
ΦðE1g E2u Þ denotes an excited state electronic configuration represented by a direct-
product representation. This representation is reducible and expressed as a direct
sum of irreducible representations such that
770 19 Applications of Group Theory to Physical Chemistry

Γ ¼ B1u þ B2u þ E 1u : ð19:163Þ

Notice that unlike ethylene a direct-product representation associated with the


final state is reducible.
To examine whether (19.162) is nonvanishing, as in the case of (19.109) we
estimate whether a direct-product representation A1g E1g E2u ¼ E1g E2u ¼
B1u + B2u + E1u contains an irreducible representation which εe  P belongs
to. Consulting a character table of D6h, we find that x and y belong to an irreducible
representation E1u and that z belongs to A2u. Since the direct sum Γ in (19.163)
contains E1u, benzene is expected to be polarized along both x- and y-axes (see
Fig. 19.10). Since (19.163) does not contain A2u, the transition along the z-axis is
forbidden. Accordingly, the transition takes place when the light is polarized parallel
to the molecular plane (i.e., the xy-plane). This is a common feature among planar
aromatic molecules including benzene and cyclopropenyl radical. On the other hand,
A2u is not contained in Γ, and so we do not expect the optical transition to occur in
the direction of the z-axis.

19.4.4 Allyl Radical [1]

We revisit the allyl radical and perform its MO calculations. As already noted,
Tables 18.1 and 18.2 of Example 18.1 collected representation matrices of individual
symmetry operations in reference to the basis vectors comprising three atomic
orbitals of allyl radical. As usual, we examine traces (or characters) of those
matrices. Table 19.10 collects them. The representation is readily reduced according
to the character table of C2v (see Table 18.4) so that we have

Γ ¼ A2 þ 2B1 : ð19:164Þ

We have two SALC orbitals that belong to the same irreducible representation of
B1. As noted in Sect. 19.3, the orbitals obtained by a linear combination of these two
SALCs belong to B1 as well. Such a linear combination is given by a unitary
transformation. In the present case, it is convenient to transform the basis vectors
two times. The first transformation is carried out to get SALCs and the second one
will be done in the process of solving a secular equation using the SALCs. Sche-
matically showing the procedures, we have

Table 19.10 Characters for C2v E C2(z) σ v(zx) σ 0v ðyzÞ


individual symmetry
Γ 3 1 1 3
transformations of 2pz orbitals
in allyl radical
19.4 MO Calculations Based on π-Electron Approximation 771

8 8 8
< ϕ1
> < Ψ1
> < Φ1
>
ϕ2 ! Ψ 2 ! Φ2 ,
>
: >
: >
:
ϕ3 Ψ3 Φ3

where ϕ1, ϕ2, ϕ3 show the original atomic orbitals; Ψ 1, Ψ 2, Ψ 3 the SALCs; Φ1, Φ2,
Φ3 the final MOs. Thus, the three sets of vectors are connected through unitary
transformations.
Starting with ϕ1 and following the previous cases, we have, e.g.,
h i
1 X ðB1 Þ
PðB1 Þ ϕ1 ¼ χ ð g Þ gϕ1 ¼ ϕ1 :
4 g

Also starting with ϕ2, we have


h i
1 X ðB1 Þ 1
PðB1 Þ ϕ2 ¼ χ ðgÞ gϕ2 ¼ ðϕ2 þ ϕ3 Þ:
4 g 2

Meanwhile, we get

PðA2 Þ ϕ1 ¼ 0,
h i
1 X ðA2 Þ 1
PðA2 Þ ϕ2 ¼ χ ð g Þ gϕ2 ¼ ðϕ2  ϕ3 Þ:
4 g 2

Thus, we recovered the results of Example 18.1. Notice that ϕ1 does not partic-
ipate in A2, but take part in B1 by itself.
Normalized SALCs are given as follows:
qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
Ψ 1 ¼ ϕ1 , Ψ 2 ¼ ðϕ2 þ ϕ3 Þ= 2ð1 þ S0 Þ, Ψ 3 ¼ ðϕ2  ϕ3 Þ= 2ð1  S0 Þ,
Ð
where we define S0  ϕ2ϕ3dτ. If Ψ 1, Ψ 2, and Ψ 3 belonged to different irreducible
representations (as in the cases of previous three examples of ethylene,
cyclopropenyl radical, and benzene), the secular equation would be fully reduced
to a form of (19.66). In the present case, however, Ψ 1 and Ψ 2 belong to the same
irreducible representation B1, making the situation a bit complicated. Nonetheless,
we can use (19.61) and the secular equation is “partially” reduced.
Defining
ð ð
H jk ¼ Ψ j HΨ k dτ and Sjk ¼ Ψ j Ψ k dτ,

we have a following secular equation the same form as (19.61):


772 19 Applications of Group Theory to Physical Chemistry

det H jk  λSjk ¼ 0:

More specifically, we have


 rffiffiffiffiffiffiffiffiffiffiffiffi 
 
 αλ
2
ðβ  SλÞ 
 1 þ S0
0 
 
 rffiffiffiffiffiffiffiffiffiffiffiffi 
 2 α þ β0 
  ¼ 0,
 1 þ S0 ðβ  SλÞ 1 þ S0
λ 0 
 
 
 α  β0 
 0 0  λ 
1  S0

where α, β, and S are similarly defined as (19.94) and (19.98). The quantity β0 is
defined as
ð
0
β  ϕ2 Hϕ3 dτ:

Thus, the secular equation is separated into the following two:


 rffiffiffiffiffiffiffiffiffiffiffiffi 
 
 αλ
2 
 0 ðβ  SλÞ 
 1þS  α  β0
 rffiffiffiffiffiffiffiffiffiffiffiffi  ¼ 0 and  λ ¼ 0: ð19:165Þ
 2 αþβ 0  1  S0
 ð β  Sλ Þ  λ 
 1 þ S0 1þS 0 

The second equation immediately gives

α  β0
λ¼ :
1  S0

The first equation of (19.165) is somewhat complicated, and so we adopt the next
approximation. That is,

S0 ¼ β0 ¼ 0: ð19:166Þ

This approximation is justified, because two carbon atoms C2 and C3 are pretty
remote, and so the interaction between them is likely to be weak enough. Thus, we
rewrite the first equation of (19.165) as
 pffiffiffi 
 αλ 2ðβ  SλÞ 

 pffiffiffi  ¼ 0: ð19:167Þ
 2ðβ  SλÞ αλ 
19.4 MO Calculations Based on π-Electron Approximation 773

Moreover, we assume that since S is a small quantity compared to 1, a square term


of S2 1. Hence, we ignore S2.
Using the approximation of (19.166) and assuming S2 0, from (19.167) we
have a following quadratic equation:

λ2  2ðα  2βSÞλ þ α2  2β2 ¼ 0:

Solving this equation, we have


rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi  
pffiffiffi 2αS pffiffiffi αS
λ ¼ α  2βS  2β 1 α  2βS  2β 1  ,
β β

where the last approximation is based on

pffiffiffiffiffiffiffiffiffiffiffi 1
1x 1 x
2

with a small quantity x. Thus, we get


pffiffiffi pffiffiffi pffiffiffi pffiffiffi
λL αþ 2β 1 2S and λH α 2β 1þ 2S , ð19:168Þ

where λL < λH. To determine the corresponding eigenfunctions Φ (i.e., MOs), we use
a linear combination of two SALCs Ψ 1 and Ψ 2. That is, putting

Φ ¼ c1 Ψ 1 þ c2 Ψ 2 ð19:169Þ

and from (19.167), we obtain


pffiffiffi
ðα  λÞc1 þ 2ðβ  SλÞc2 ¼ 0:

Thus, with λ ¼ λL we get c1 ¼ c2 and for λ ¼ λH we get c1 ¼  c2. Consequently,


as a normalized eigenfunction ΦL corresponding to λL we get

1 1 pffiffiffi 1=2 pffiffiffi


ΦL ¼ pffiffiffi ðΨ 1 þ Ψ 2 Þ ¼ 1 þ 2S 2ϕ1 þ ϕ2 þ ϕ3 : ð19:170Þ
2 2

Likewise, as a normalized eigenfunction ΦH corresponding to λH we get

1 1 pffiffiffi 1=2 pffiffiffi


ΦH ¼ pffiffiffi ðΨ 1 þ Ψ 2 Þ ¼ 1  2S  2ϕ1 þ ϕ2 þ ϕ3 : ð19:171Þ
2 2

As another eigenvalue λ0 and corresponding eigenfunction Φ0, we have


774 19 Applications of Group Theory to Physical Chemistry

1
λ0 α and Φ0 ¼ pffiffiffi ðϕ2  ϕ3 Þ, ð19:172Þ
2

where we have λL < λ0 < λH. The eigenfunction Φ0 does not participate in chemical
bonding and, hence, is said to be a nonbonding orbital.
It is worth noting that eigenfunctions ΦL, Φ0, and ΦH have the same function
forms as those obtained by the simple Hückel theory that ignores the overlap
integrals S. It is because the interaction between ϕ2 and ϕ3 that construct SALCs
of Ψ 2 and Ψ 3 is weak.
Notice that within the framework of the simple Hückel theory, two sets of basis
vectors jϕ1 i, p1ffiffi2 jϕ2 þ ϕ3 i, p1ffiffi2 jϕ2  ϕ3 i and |ΦLi, |ΦHi, |Φ0i are connected by a
following unitary matrix V:
 
1 1
ðjΦL i jΦH i jΦ0 iÞ ¼ jϕ1 i pffiffiffi jϕ2 þ ϕ3 i pffiffiffijϕ2  ϕ3 i V
2 2
0 1
1 1
pffiffiffi  pffiffiffi 0
 B
B 2 2 C
C
1 1 B C
¼ jϕ1 i pffiffiffi jϕ2 þ ϕ3 i pffiffiffijϕ2  ϕ3 i B 1
pffiffiffi
1
pffiffiffi C: ð19:173Þ
2 2 B 0 C
@ 2 2 A
0 0 1

Although both SALCs Ψ 1 and Ψ 2 belong to the same irreducible representation


B1, they are not orthogonal. As can be seen from (19.170) and (19.171), however, we
find that ΦL and ΦH that are sought by solving the secular equation (19.167) have
been mutually orthogonal. Starting from jϕ1i, jϕ2i, and jϕ3i of Example 18.1, we
reached jΦLi, jΦHi, and jΦ0i via two-step unitary transformations (18.40) and
(19.173). The combined unitary transformations W ¼ UV are unitary again. That
is, we have
0 1 1 1
pffiffiffi  pffiffiffi 0
B 2 2 C
B C
B 1 1 1 C
ðjϕ1 ijϕ2 ijϕ3 iÞUV ¼ ðjϕ1 ijϕ2 ijϕ3 iÞB
B 2 pffiffiffi C
B 2 2 C C
@ 1 1 1 A
 pffiffiffi
2 2 2
¼ ðjΦL i jΦH i jΦ0 iÞ: ð19:174Þ

The optical transition of allyl radical represents general features of the optical
transitions of molecules. To make a story simple, let us consider a case of allyl
cation. Figure 19.12 shows electronic configurations together with symmetry species
of individual eigenstates and their corresponding energy eigenvalues for the allyl
cation. In Fig. 19.13, we redraw its geometry where the origin is located at the center
19.4 MO Calculations Based on π-Electron Approximation 775

(a) (b) (c)


( ) − 2 )(1 + 2

( )

( ) + 2 )(1 − 2

Fig. 19.12 Electronic configurations and symmetry species of individual eigenstates along with
their corresponding energy eigenvalues for the allyl cation. (a) Ground state. (b) First excited state.
(c) Second excited state

C1

r1
C3 C2
r3 O r2

Fig. 19.13 Geometry and position vectors of carbon atoms of the allyl cation. The origin is located
at the center of a line segment connecting C2 and C3 (r2 + r3 = 0)

of a line segment connecting C2 and C3 (r2 + r3 = 0). Major optical transitions


(or optical absorption) are ΦL ! Φ0 and ΦL ! ΦH.
(i) ΦL ! Φ0: In this case, following (19.109) the transition matrix element Tfi is
described by
D E
T fi ¼ Θf ðB1 A2 Þ
jεe  PjΘi ðA1 Þ : ð19:175Þ

In the above equation, we designate the irreducible representation of eigenstates


according to (19.164). Therefore, the symmetry of the final electronic configu-
ration is described as

Γ ¼ B1 A2 ¼ B2 :

Since a direct product of the initial electronic configuration [Θi ðA1 Þ ] and the
final configuration Θf ðB1 A2 Þ is B1 A2 A1 ¼ B2. Then, if εe  P belongs to the
same irreducible representation B2, the associated optical transition should be
allowed. Consulting a character table of C2v, we find that y belongs to B2. Thus,
the allowed optical transition is polarized along the y-direction.
(ii) ΦL ! ΦH: In parallel with the above case, Tfi is described by
D E
T fi ¼ Θf ðB1 B1 Þ
jεe  PjΘi ðA1 Þ : ð19:176Þ

Thus, the transition is characterized by


776 19 Applications of Group Theory to Physical Chemistry

A1 ! A1 ,

where the former A1 indicates the electronic ground state and the latter A1
indicates the excited state given by B1 B1 ¼ A1. The direct product of them is
simply described as B1 B1 A1 ¼ A1. Consulting a character table again, we
find that z belongs to A1. This implies that the allowed transition is polarized
along the z-direction.
Next, we investigate the above optical transition in a semiquantitative manner. In
principle, the transition matrix element Tfi should be estimated from (19.175) and
(19.176) that use electronic configurations of two-electron system. Nonetheless, a
formulation of (4.7) of Sect. 4.1 based upon one-electron states well serves our
present purpose. For ΦL ! Φ0 transition, we have
ð ð
T fi ðΦL ! Φ0 Þ ¼ Φ0 εe  PΦL dτ ¼ eεe  Φ0 rΦL dτ
ð
1 pffiffiffi 1
¼ eεe  2ϕ1 þ ϕ2 þ ϕ3 r pffiffiffi ðϕ2  ϕ3 Þ dτ
2 2
ð pffiffiffi pffiffiffi

¼ peffiffiffi  2ϕ1 rϕ2  2ϕ1 rϕ3 þ ϕ2 rϕ2 þ ϕ3 rϕ2  ϕ3 rϕ3 dτ
2 2
ð
eεe
pffiffiffi  ðϕ2 rϕ2  ϕ3 rϕ3 Þ dτ
2 2
 ð ð
eεe
pffiffiffi  r2 jϕ2 j2 dτ  r3 jϕ3 j2 dτ
2 2
eε eε
¼ peffiffiffi  ðr2  r3 Þ ¼ pffiffieffi  r2
2 2 2
Ð
where with the first near equality we ignored integrals ϕirϕjdτ (i 6¼ j); with the last
near equality r r2 or r3. For these approximations, we assumed that an electron
density is very high near C2 or C3 with ignorable density at a place remote from
them. Choosing εe for the direction of r2, we get

e
T fi ðΦL ! Φ0 Þ pffiffiffi j r2 j :
2

With ΦL ! ΦH transition, similarly we have


ð
eεe eε
T fi ðΦL ! ΦH Þ ¼ ΦH εe  PΦL dτ  ðr3 2 2r1 þ r2 Þ = 2 e  r1 :
4 2

Choosing εe for the direction of r1, we get


19.5 MO Calculations of Methane 777

e
T fi ðΦL ! ΦH Þ jr j:
2 1

Transition
pffiffiffi probability is proportional to a square of an absolute value of Tfi. Using
j r2 j 3 j r1 j, we have

  e2 3e2
T fi ðΦL ! Φ0 Þ2 j r2 j 2 ¼ jr j2 ,
2 2 1
  e2
T fi ðΦL ! ΦH Þ2 jr j2 :
4 1

Thus, we obtain
   2
T fi ðΦL ! Φ0 Þ2 6T fi ðΦL ! ΦH Þ : ð19:177Þ

Thus, the transition probability of ΦL ! Φ0 is about six times that for ΦL ! ΦH.
Note that in the above simple estimation we ignore an overlap integral S.
From the above discussion, we conclude that (i) the ΦL ! Φ0 transition is
polarized along the r2 direction (i.e., the molecular long axis) and that the ΦL ! ΦH
transition is polarized along the r1 direction (i.e., the molecular short axis).
(ii) Transition probability of ΦL ! Φ0 is about six times that of ΦL ! ΦH. Note
that the polarized characteristics are consistent with those obtained from the discus-
sion based on the group theory. The conclusion reached by the semiquantitative
estimation of a simple molecule of allyl cation well typifies the general optical
features of more complicated molecules having a well-defined molecular long axis
such as polyenes.

19.5 MO Calculations of Methane

So far, we investigated MO calculations of aromatic molecules based upon π-


electron approximation. These are a homogeneous system that has the same quality
of electrons. Here we deal with methane that includes a carbon and surrounding four
hydrogens. These hydrogen atoms form a regular tetrahedron with the carbon atom
positioned at a center of the tetrahedron. It is therefore considered as a heterogeneous
system. The calculation principle, however, is consistent, namely, we make the most
of projection operators and construct appropriate SALCs of methane.
We deal with four 1s electrons of hydrogen along with two 2s electrons and two
2p electrons of carbon. Regarding basis functions of carbon, however, we consider
2s atomic orbital and three 2p orbitals (i.e., 2px, 2py, 2pz orbitals). That is, we deal
with eight atomic orbitals all together. These are depicted in Fig. 19.14. The
dimension of the vector space (i.e., representation space) is eight accordingly.
778 19 Applications of Group Theory to Physical Chemistry

Fig. 19.14 Four 1s atomic z


orbitals of hydrogen and a
2pz orbital of carbon. The
former orbitals are
represented by H1 to H4. 2px
and 2py orbitals of carbon
are omitted for simplicity

O
y

As before, we wish to determine irreducible representations which individual


MOs belong to. As already mentioned in Sect. 17.3, there are 24 symmetry opera-
tions in a point group Td which methane belongs to (see Table 17.6). According to
the symmetry operations, we decide transformation matrices related to each opera-
tion. For example, Cxyz
3 transforms basis functions as follows:

H 1 H 2 H 3 H 4 C2s C2px C2py C2pz C xyz


3

¼ H 1 H 3 H 4 H 2 C2s C2py C2pz C2px ,

where by the above notations we denoted atomic species and molecular orbitals.
Hence, as a matrix representation we have
0 1
1 0 0 0 0 0 0 0
B0 0C
B 0 0 1 0 0 0 C
B C
B0 1 0 0 0 0 0 0C
B C
B0 0C
B 0 1 0 0 0 0 C
Cxyz ¼B C, ð19:178Þ
3
B0 0 0 0 1 0 0 0C
B C
B0 1C
B 0 0 0 0 0 0 C
B C
@0 0 0 0 0 1 0 0A
0 0 0 0 0 0 1 0

where Cxyz
3 is the same operation as Rxyz2π
3
appeared in Sect. 17.3. Therefore,
19.5 MO Calculations of Methane 779

χ C xyz
3 ¼ 2:

As another example σ yz
d , we have

H 1 H 2 H 3 H 4 C2s C2px C2py C2pz σ yz


d

¼ H 1 H 2 H 4 H 3 C2s C2px C2pz C2py ,

where σ yz
d represents a mirror symmetry with respect to the plane that includes the x-
axis and bisects the angle formed by the y- and z-axes. Thus, we have
0 1
1 0 0 0 0 0 0 0
B0 0C
B 1 0 0 0 0 0 C
B C
B0 0 0 1 0 0 0 0C
B C
B0 0C
yz B 0 1 0 0 0 0 C
σd ¼ B C: ð19:179Þ
B0 0 0 0 1 0 0 0C
B C
B0 0C
B 0 0 0 0 1 0 C
B C
@0 0 0 0 0 0 0 1A
0 0 0 0 0 0 1 0

Then, we have

χ σ yz
d ¼ 4:

Taking some more examples, for Cz2 we get

H 1 H 2 H 3 H 4 C2s C2px C2py C2pz C z2


¼ H 4 H 3 H 2 H 1 C2s  C2px  C2py C2pz ,

where Cz2 means a rotation by π around the z-axis. Also, we have


780 19 Applications of Group Theory to Physical Chemistry

0 1
0 0 0 1 0 0 0 0
B0 0C
B 0 1 0 0 0 0 C
B C
B0 1 0 0 0 0 0 0C
B C
B1 0 0C
B 0 0 0 0 0 C
C2 ¼ B
z
C, ð19:180Þ
B0 0 0 0 1 0 0 0C
B C
B0 0 1 0 0C
B 0 0 0 C
B C
@0 0 0 0 0 0 1 0 A
0 0 0 0 0 0 0 1
χ Cz2 ¼ 0:


With S42 (i.e., an improper rotation by π2 around the z-axis), we get


H 1 H 2 H 3 H 4 C2s C2px C2py C2pz S42
¼ H 3 H 1 H 4 H 2 C2s C2py  C2px  C2pz ,

0 1
0 1 0 0 0 0 0 0
B0 0 C
B 0 0 1 0 0 0 C
B C
B1 0 0 0 0 0 0 0 C
B C
B0 0 C
zπ2 B 0 1 0 0 0 0 C
S4 ¼ B C, ð19:181Þ
B0 0 0 0 1 0 0 0 C
B C
B0 1 0 C
B 0 0 0 0 0 C
B C
@0 0 0 0 0 1 0 0 A
0 0 0 0 0 0 0 1
zπ2
χ S4 ¼ 0:

As for the identity matrix E, we have

χ ðEÞ ¼ 8:

Thus, Table 19.11 collects characters of individual symmetry transformations


with respect to hydrogen 1s and carbon 2s and 2p orbitals in methane. From the
above examples, we notice that all the symmetry operators R reduce an eight-
dimensional representation space V8 to subspaces of Span{H1 H2 H3 H4} and
Span{C2s C2px C2py C2pz}. In terms of a notation of Sect. 12.2, we have
 
V 8 ¼ SpanfH 1 H 2 H 3 H 4 g  Span C2s C2px C2py C2pz : ð19:182Þ
19.5 MO Calculations of Methane 781

Table 19.11 Characters for Td E 8C3 3C2 6S4 6σ d


individual symmetry
Γ 8 2 0 0 4
transformations of hydrogen
1s orbitals and carbon 2s and
2p orbitals in methane

In other words, V8 is decomposed into the above two R-invariant subspaces (see
Part III); one is the hydrogen-related subspace and the other is the carbon-related
subspace.
Correspondingly, a representation D comprising the above representation matri-
ces should be reduced to a direct sum of irreducible representations D(α). That is, we
should have
X
D¼ q DðαÞ ,
α α

where qα is a positive integer or zero. Here we are thinking of decomposition of (8, 8)


matrices such as (19.178) into submatrices. We estimate qα using (18.83). That is,

1 X ðαÞ
qα ¼ χ ðgÞ χ ðgÞ: ð18:83Þ
n g

With an irreducible representation A1 of Td for instance, we have

1 X ðA1 Þ 1
qA1 ¼ χ ðgÞ χ ðgÞ ¼ ð1  8 þ 1  8  2 þ 1  6  4Þ ¼ 2:
24 g 24

As for A2, we have

1
qA 2 ¼ ½1  8 þ 1  8  2 þ ð1Þ  6  4 ¼ 0:
24

Regarding T2, we get

1
qT 2 ¼ ð3  8 þ 1  6  4Þ ¼ 2:
24

For other irreducible representations of Td, we get qα ¼ 0. Consequently, we have

D ¼ 2DðA1 Þ þ 2DðT 2 Þ : ð19:183Þ

Evidently, both for the hydrogen-related representation D(H )and carbon-related


representation D(C), we have individually
782 19 Applications of Group Theory to Physical Chemistry

DðH Þ ¼ DðA1 Þ þ DðT 2 Þ and DðCÞ ¼ DðA1 Þ þ DðT 2 Þ , ð19:184Þ

where D ¼ D(H ) + D(C).


In fact, (19.180) for example, in a subspace Span{H1 H2 H3 H4}, C z2 is expressed
as
0 1
0 0 0 1
B0 0 0C
B 1 C
C z2 ¼ B C:
@0 1 0 0A
1 0 0 0

Following routine procedures based on a characteristic polynomials, we get


eigenvalues +1 (as a double root) and 1 (a double root as well). Unitary similarity
transformation using a unitary matrix P expressed as
01 1 1 1 1
B2 2 2 2 C
B1 1C
B 1

1
 C
B2 2C
P¼B
B1
2 2 C
B 1 1 1C
B2   C
@ 2 2 2C
A
1 1 1 1
 
2 2 2 2

yields a diagonal matrix such that


0 1
1 0 0 0
B0 1 0C
B 0 C
P1 C z2 P ¼ B C:
@0 0 1 0A
0 0 0 1

Note that this diagonal matrix is identical to submatrix of (19.180) for Span

{C2s C2px C2py C2pz}. Similarly, S42 of (19.181) gives eigenvalues 1 and i for

both the subspaces. Notice that since S42 is unitary, its eigenvalues take a complex
number with an absolute value of 1. Since these symmetry operation matrices are
unitary, these matrices must be diagonalized according to Theorem 14.5. Using
unitary matrices whose column vectors are chosen from eigenvectors of the matrices,
diagonal elements are identical to eigenvalues including their multiplicity. Writing
representation matrices of symmetry operations for the hydrogen-associated sub-
space and carbon-associated subspace as H and C, we find that H and C have the
same eigenvalues in common. Notice here that different types of transformation

matrices (e.g., C z2 , S42 ) give a different set of eigenvalues. Via unitary similarity
transformation using unitary matrices P and Q, we get
19.5 MO Calculations of Methane 783

1
P1 HP ¼ Q1 CQ or PQ1 H PQ1 ¼ C:

Namely, H and C are similar, i.e., the representation is equivalent. Thus, recalling
Schur’s First Lemma, Eq. (19.184) results.
Our next task is to construct SALCs, i.e., proper basis vectors using projection
operators. From the above, we anticipate that the MOs comprise a linear combina-
tion of the hydrogen-associated SALC and carbon-associated SALC that belongs to
the same irreducible representation (i.e., A1 or T2). For this purpose, we first find
proper SALCs using projection operators described as

ðαÞ dα X ðαÞ
PlðlÞ ¼ D ðgÞ g:
g ll
ð18:156Þ
n

We apply this operator to Span{H1 H2 H3 H4}. Taking, e.g., H1 and operating


both sides of (18.156) on H1, as a basis vector corresponding to a one-dimensional
representation of A1 we have

ðA Þ dA1 X ðA1 Þ 1
P1ð11Þ H 1 ¼ D ðgÞ gH 1 ¼ ½ð1  H 1 Þ þ ð1  H 1 þ 1  H 1 þ 1  H 3
g 11
n 24
þ1  H 4 þ 1  H 3 þ 1  H 2 þ 1  H 4 þ 1  H 2 Þ þ ð1  H 4 þ 1  H 3 þ 1  H 2 Þ
þð1  H 3 þ 1  H 2 þ 1  H 2 þ 1  H 4 þ 1  H 4 þ 1  H 3 Þ
þð1  H 1 þ 1  H 4 þ 1  H 1 þ 1  H 2 þ 1  H 1 þ 1  H 3 Þ
1
¼ ð6  H 1 þ 6  H 2 þ 6  H 3 þ 6  H 4 Þ
24
1
¼ ðH 1 þ H 2 þ H 3 þ H 4 Þ: ð19:185Þ
4

The case of C2s is simple, because all the symmetry operations convert C2s to
itself. That is, we have

ðA Þ 1
P1ð11Þ C2s ¼ ð24  C2sÞ ¼ C2s: ð19:186Þ
24

Regarding C2p, taking C2px for instance, we have

ðA Þ 1 
P1ð11Þ C2px ¼ fð1  C2px Þ þ 1  C2py þ 1  C2pz þ 1  C2py
24
þ1  C2pz þ 1  C2py þ 1  C2pz þ 1  C2py þ 1  C2pz 

þ½ð1  C2px þ 1  ðC2px Þ þ 1  ðC2px Þþ½1  ðC2px Þ

þ1  ðC2px Þ þ 1  C2pz þ 1  C2pz þ 1  C2py þ 1  C2py 


784 19 Applications of Group Theory to Physical Chemistry

Table 19.12 Character table of Td


Td E 8C3 3C2 6S4 6σ d
A1 1 1 1 1 1 x2 + y2 + z2
A2 1 1 1 1 1
E 2 1 2 0 0 (2z2  x2  y2, x2  y2)
T1 3 0 1 1 1
T2 3 0 1 1 1 (x, y, z); (xy, yz, zx)


þ 1  C2py þ 1  C2px þ 1  C2pz þ 1  C2py þ 1  C2px þ 1  C2pz

¼ 0:

The calculation is somewhat tedious, but it is natural that since C2s is spherically
symmetric, it belongs to the totally symmetric representation. Conversely, it is
natural to think that C2px is totally unlikely to contain a totally symmetric represen-
tation. This is also the case with C2py and C2pz.
Table 19.12 shows the character table of Td in which the three-dimensional
irreducible representation T2 is spanned by basis vectors (x y z). Since in
Table 17.6 each (3, 3) matrix is given in reference to the vectors (x y z), it can
directly be utilized to represent T2. More specifically, we can directly choose the
ðT Þ
diagonal elements (1, 1), (2, 2), and (3, 3) of the individual (3, 3) matrices for D112 ,
ðT Þ ðT Þ
D222 , and D332 elements of the projection operators, respectively. Thus, we can
construct SALCs using projection operators explained in Sect. 18.7. For example,
using H1 we obtain SALCs that belong to T2 such that

ðT Þ 3 X ðT 2 Þ 3
P1ð12Þ H 1 ¼ D ðgÞ gH 1 ¼ fð1  H 1 Þ
g 11
24 24
þ½ð1Þ  H 4 þ ð1Þ  H 3 þ 1  H 2 
þ½0  ðH 3 Þ þ 0  ðH 2 Þ þ 0  ðH 2 Þ þ 0  ðH 4 Þ þ ð1Þ  H 4 þ ð1Þ  H 3 
þð0  H 1 þ 0  H 4 þ 1  H 1 þ 1  H 2 þ 0  H 1 þ 0  H 3 Þg
3
¼ ½2  H 1 þ 2  H 2 þ ð2Þ  H 3 þ ð2Þ  H 4 
24
1
¼ ðH 1 þ H 2  H 3  H 4 Þ: ð19:187Þ
4

Similarly, we have

ðT Þ 1
P2ð22Þ H 1 ¼ ðH 1  H 2 þ H 3  H 4 Þ, ð19:188Þ
4
ðT Þ 1
P3ð32Þ H 1 ¼ ðH 1  H 2  H 3 þ H 4 Þ: ð19:189Þ
4
19.5 MO Calculations of Methane 785

ðT Þ
Now we can easily guess that P1ð12Þ C2px solely contains C2px. Likewise,
ðT Þ ðT Þ
P2ð22Þ C2py and P3ð32Þ C2pz contain only C2py and C2pz, respectively. In fact, we
obtain what we anticipate. That is,

ðT Þ ðT Þ ðT Þ
P1ð12Þ C2px ¼ C2px , P2ð22Þ C2py ¼ C2py , P3ð32Þ C2pz ¼ C2pz : ð19:190Þ

This gives a good example to illustrate the general concept of projection operators
and related calculations of inner products
D discussed in Sect.E 18.7. For example, we
ðT 2 Þ ðT 2 Þ
have a nonvanishing inner product of P1ð1Þ H 1 jP1ð1Þ C2px and an inner product of,
D E
ðT Þ ðT Þ
e.g., P2ð22Þ H 1 jP1ð12Þ C2px is zero. This significantly reduces efforts to solve a
ðT Þ ðT Þ
secular equation (vide infra). Notice that functions P1ð12Þ H 1 , P1ð12Þ C2px , etc. are
ðT Þ ðT Þ
linearly independent of one another. Recall also that P1ð12Þ H 1 and P1ð12Þ C2px
ð αÞ ðαÞ
correspond to ϕl and ψ l in (18.171), respectively. That is, the former functions
are linearly independent, while they belong to the same place “1” of the same three-
dimensional irreducible representation T2.
Equations (19.187) to (19.190) seem intuitively obvious. This is because if we
draw a molecular geometry of methane (see Fig. 19.14), we can immediately
recognize the relationship between the directionality in ℝ3 and “directionality” of
SALCs represented by sings of hydrogen atomic orbitals (or C2px, C2py, and C2pz of
carbon).
As stated above, we have successfully obtained SALCs relevant to methane.
Therefore, our next task is to solve an eigenvalue problem and construct appropriate
MOs. To do this, let us first normalize the SALCs. We assume that carbon-based
SALCs have already been normalized as well-studied atomic orbitals. We suppose
that all the functions are real. For the hydrogen-based SALCs

hH 1 þ H 2 þ H 3 þ H 4 jH 1 þ H 2 þ H 3 þ H 4 i ¼ 4hH 1 jH 1 i þ 12hH 1 jH 2 i
¼ 4 þ 12SHH ¼ 4 1 þ 3SHH , ð19:191Þ

where the second equality comes from the fact that jH1i, i.e., a 1s atomic orbital is
normalized. We define hH1| H2i ¼ SHH, i.e., an overlap integral between two
adjacent hydrogen atoms. Note also that hHi| Hji (1  i, j  4; i 6¼ j) is the same
because of the symmetry requirement.
Thus, as a normalized SALC we have

j H1 þ H2 þ H3 þ H4i
H ðA1 Þ  pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi : ð19:192Þ
2 1 þ 3SHH
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
Defining a denominator as c ¼ 2 1 þ 3SHH , we have
786 19 Applications of Group Theory to Physical Chemistry

H ðA1 Þ ¼j H 1 þ H 2 þ H 3 þ H 4 i=c: ð19:193Þ

Also, we have

hH 1 þ H 2  H 3  H 4 jH 1 þ H 2  H 3  H 4 i ¼ 4hH 1 jH 1 i  4hH 1 jH 2 i
¼ 4  4SHH ¼ 4 1  SHH : ð19:194Þ

Thus, we have

ðT 2 Þ j H1 þ H2  H3  H4i
H1  pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi : ð19:195Þ
2 1  SHH
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
Also defining a denominator as d ¼ 2 1  SHH , we have

ðT 2 Þ
H1 j H 1 þ H 2  H 3  H 4 i=d:

Similarly, we define other hydrogen-based SALCs as

ðT 2 Þ
H2 j H 1  H 2 þ H 3  H 4 i=d,
ðT Þ
,
H3 2 j H 1  H 2  H 3 þ H 4 i=d:

The next step is to construct MOs using the above SALCs. To this end, we make a
linear combination using SALCs belonging to the same irreducible representations.
In the case of A1, we choose H ðA1 Þ and C2s. Naturally, we anticipate two linear
combinations of

a1 H ðA1 Þ þ b1 C2s, ð19:196Þ

where a1 and b1 are arbitrary constants. On the basis of the discussions of projection
operators in Sect. 18.7, both the above two linear combinations belong to A1 as well.
ðT Þ ðT Þ ðT Þ
Similarly, according to the projection operators P1ð12Þ , P2ð22Þ , and P3ð32Þ , we make three
sets of linear combinations

ðT Þ ðT Þ ðT Þ
q1 P1ð12Þ H 1 þ r 1 C2px ; q2 P2ð22Þ H 1 þ r 2 C2py ; q3 P3ð32Þ H 1 þ r 3 C2pz , ð19:197Þ

where q1, r1, etc. are arbitrary constants. These three sets of linear combinations
belong to individual “addresses” 1, 2, and 3 of T2. What we have to do is to
determine coefficients of the above MOs and to normalize them by solving the
secular equations. With two different energy eigenvalues, we get two orthogonal
(i.e., linearly independent) MOs for the individual four sets of linear combinations of
19.5 MO Calculations of Methane 787

(19.196) and (19.197). Thus, total eight linear combinations constitute MOs of
methane.
In light of (19.183), the secular equation can be reduced as follows according to
the representations A1 and T2. There we have changed the order of entries in the
equation so that we can deal with the equation easily. Then we have
 
 H 11  λ H 12  λS12 
 
 H  λS H  λ 
 21 21 22 
 
 G11  λ G12  λT 12 
 
 G21  λT 21 G22  λ 
 
 
 F 11  λ F 12  λV 12 
 
 F 21  λV 21 F 22  λ 
 
 
 K  λ K  λW 
 11 12 12 
 K 21  λW 21 K 22  λ 
¼ 0,
ð19:198Þ

where off-diagonal elements are all zero. This is because of the symmetry require-
ment (see Sect. 19.2). Thus, in (19.198) the secular equation is decomposed into four
(2, 2) blocks. The first block is pertinent to A1 of a hydrogen-based component and
carbon-based component from the left, respectively. Lower three blocks are perti-
nent to T2 of hydrogen-based and carbon-based components from the left, respec-
ðT Þ ðT Þ ðT Þ
tively, in order of P1ð12Þ , P2ð22Þ , and P3ð32Þ SALCs from the top. The notations follow
those of (19.59).
We compute these equations. The calculations are equivalent to solving the
following four two-dimensional secular equations:
 
 H 11  λ H 12  λS12 

  ¼ 0,
 H  λS H 22  λ 
21 21
 
 G11  λ G12  λT 12 

  ¼ 0,
 G  λT G22  λ 
21 21
  ð19:199Þ
 F 11  λ F 12  λV 12 

  ¼ 0,
 F  λV F 22  λ 
21 21
 
 K 11  λ K 12  λW 12 

  ¼ 0:
 K  λW K λ 
21 21 22

Notice that these four secular equations are the same as a (2, 2) determinant of
(19.59) in the form of a secular equation. Note, at the same time, that while (19.59)
did not assume SALCs, (19.199) takes account of SALCs. That is, (19.199)
788 19 Applications of Group Theory to Physical Chemistry

expresses a secular equation with respect to two SALCs that belong to the same
irreducible representation.
The first equation of (19.199) reads as

1  S12 2 λ2  ðH 11 þ H 22  2H 12 S12 Þλ þ H 11 H 22  H 12 2 ¼ 0: ð19:200Þ

In (19.200) we define quantities as follows:


ð
hH 1 þ H 2 þ H 3 þ H 4 jC2si 2hH 1 jC2si
S12  H ðA1 Þ C2sdτ  hH ðA1 Þ jC2si ¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
2 1 þ 3SHH 1 þ 3SHH
2SCH
¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
A1
ffi: ð19:201Þ
1 þ 3SHH

In (19.201), an overlap integral between hydrogen atomic orbitals and C2s is


identical from the symmetry requirement and it is defined as

A1  hH 1 jC2si:
SCH ð19:202Þ

Also in (19.200), other quantities are defined as follows:


D E
αH þ 3βHH
H 11  H ðA1 Þ jHH ðA1 Þ ¼ , ð19:203Þ
1 þ 3SHH
H 22  hC2sjHC2si, ð19:204Þ
D E hH þ H þ H þ H jHC2si 2hH jHC2si
H 12  H ðA1 Þ jHC2s ¼ 1 pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
2 3
ffi4 ¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
1

2 1 þ 3S HH
1 þ 3SHH
2βCH
¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
A1
ffi, ð19:205Þ
1 þ 3SHH

where H is a Hamiltonian of a methane molecule. In (19.203) and (19.205),


moreover, we define the quantities as
   
αH  H 1 HH 1 i, βHH  hH 1 HH 2 , βCH
A1  hH 1 jHC2si: ð19:206Þ

The quantity of H11 is a “Coulomb” integral of the hydrogen-based SALC that


involves four hydrogen atoms. Solving the first equation of (19.199), we get
qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
 
H 11 þH 22 2H 12 S12  ðH 11 H 22 Þ2 þ4 H 12 2 þH 11 H 22 S12 2 H 12 S12 ðH 11 þH 22 Þ
λ¼ :
2 1S12 2
ð19:207Þ
19.5 MO Calculations of Methane 789

Similarly, we obtain related solutions for the latter three eigenvalue equations of
(19.199). With the second equation of (19.199), for instance, we have

1  T 12 2 λ2  ðG11 þ G22  2G12 T 12 Þλ þ G11 G22  G12 2 ¼ 0: ð19:208Þ

Solving this, we get


qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
 
G11 þG22 2G12 T 12  ðG11 G22 Þ2 þ4 G12 2 þG11 G22 T 12 2 G12 T 12 ðG11 þG22 Þ
λ¼ :
2 1T 12 2
ð19:209Þ

In (19.208) and (19.209) we define these quantities as follows:


ð D E hH þH H H jC2p i 2hH jC2p i
ðT Þ ðT Þ
T 12  H 1 2 C2px dτ H 1 2 j2Cpx ¼ 1 2
pffiffiffiffiffiffiffiffiffiffiffiffiffiffi
3
ffi4 x
¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffi
1

x
2 1SHH 1SHH
2SCH
¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffi
T2
ffi, ð19:210Þ
1SHH
D E
ðT Þ ðT Þ αH  βHH
G11  H 1 2 jHH 1 2 ¼ , ð19:211Þ
1  SHH
G22  hC2px jHC2px i, ð19:212Þ
D E hH þ H  H  H jHC2p i 2hH jHC2p i
ðT Þ
G12  H 1 2 jHC2px ¼ 1 2
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
3

4 x
¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
1
ffix
2 1  SHH 1  SHH
2βCH
¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
T2
ffi: ð19:213Þ
1  SHH

In the above equations, we further define integrals such that

T 2  hH 1 jC2px i, βT 2  hH 1 jℌC2px i:
SCH ð19:214Þ
CH

In (19.210), an overlap integral T12 between four hydrogen atomic orbitals and
C2px is identical again from the symmetry requirement. That is, the integrals of
(19.213) are additive regarding a product of plus components of a hydrogen atomic
orbital of (19.187) and C2px as well as another product of minus components of a
hydrogen atomic orbital of (19.187) and C2px; see Fig. 19.14. Notice that C2px has a
node on yz-plane.
The third and fourth equations of (19.199) give exactly the same eigenvalues as
that given in (19.209). This is obvious from the fact that all the latter three equations
of (19.199) are associated with a irreducible representation T2. The corresponding
three MOs are triply degenerate.
In (19.207) and (19.209), the plus sign gives a higher orbital energy and the minus
sign gives a lower energy. Equations (19.207) and (19.209), however, look
790 19 Applications of Group Theory to Physical Chemistry

somewhat complicated. To simplify the situation, (i) in, e.g., (19.207) let us consider
a case where jH11 j  j H22j or jH11 j j H22j. In that case, (H11  H22)2
dominates inside a square root, and so ignoring 2H12S12 and S122 we have
λ H11 or λ H22. Inserting these values into (19.199), we have either Ψ
H ðA1 Þ or Ψ C2s, where Ψ is a resulting MO. This implies that no interaction would
arise between H ðA1 Þ and C2s. (ii) If, on the other hand, H11 ¼ H22, we would get

H 11  H 12 S12  j H 12  H 11 S12 j
λ¼ : ð19:215Þ
1  S12 2

As H12  H11S12 is positive or negative, we have a following alternative:


11 þH 12 11 H 12
Case I: H12  H11S12 > 0. We have λH ¼ H1þS 12
and λL ¼ H1S 12
, where
λ H > λ L.
11 H 12 11 þH 12
Case II: H12  H11S12 < 0. We have λH ¼ H1S 12
and λL ¼ H1þS 12
, where
λ H > λ L.
In this case, we would anticipate maximum orbital mixing between H ðA1 Þ and
C2s. (iii) If H11 and H22 are moderately different in between the above (i) and (ii), the
orbital mixing is likely to be moderate. This is an actual situation. With (19.209) we
have eigenvalues expressed similarly to those of (19.207). Therefore, classifications
related to the above Cases I and II hold with G12  G11T12, F12  F11V12, and
K12  K11W12 as well.
In spite of simplicity of (19.215), the quantities H11, H12, and S12 are hard to
calculate. In general cases including the present example (i.e., methane), the diffi-
culty in calculating these quantities results essentially from the fact that we are
dealing with a many-particle interactions that include electron repulsion. Nonethe-
less, for a simplest case of a hydrogen molecular ion Hþ 2 the estimation is feasible
[3]. Let us estimate H12  H11S12 quantitatively according to Atkins and
Friedman [3].
The Hamiltonian of Hþ 2 is described as

ħ2 2 e2 e2 e2
H¼ ∇   þ , ð19:216Þ
2m 4πε0 r A 4πε0 r B 4πε0 R

where m is an electron rest mass and other symbols are defined in Fig. 19.15; the last
term represents the repulsive interaction between the two hydrogen nuclei. To
estimate the 0
quantities
1 in (19.215), it is convenient to use dimensionless ellipsoidal
μ
B C
coordinates @ ν A such that [3]
ϕ

rA þ rB rA  rB
μ¼ and ν¼ : ð19:217Þ
R R
19.5 MO Calculations of Methane 791

A B

Fig. 19.15 Configuration of electron (e) and hydrogen nuclei (A and B). rA and rB denote a
separation between the electron and A and that between the electron and B, respectively. R denotes
a separation between A and B

The quantity ϕ is an azimuthal angle around the molecular axis (i.e., the straight
line connecting the two nuclei). Then, we have
          2  
 ħ2 2 e2   e2   e 

H 11 ¼ hAjHjAi ¼ A ∇  A þ A  A þ A A
2m 4πε0 r A  4πε0 r B  4πε0 R
   
e2 1 e2
¼ E 1s  A A þ , ð19:218Þ
4πε0 rB 4πε0 R

where E1s is the same as that given in (3.258) with Z ¼ n ¼ 1 and μ replaced with
ħ2
m (i.e., E1s ¼  2ma 2 ). Using a coordinate representation of (3.301), we have

rffiffiffi
1 3=2 rA =a
j Ai ¼ a e : ð19:219Þ
π

Moreover considering (19.217), we have


    ð
1 1 1
A A ¼ 3 dτe2rA =a :
rB πa rB

Converting Cartesian coordinates to ellipsoidal coordinates [3] such that


ð ð
3 2π
ð1 ð1
R
dτ ¼ dϕ dμ dν μ2  ν2 ,
2 0 1 1

we have
792 19 Applications of Group Theory to Physical Chemistry

    ð1 ð1
1 R3 eðμþνÞR=a
A A ¼  2π dμ dν μ2  ν2
rB 8πa 3 R
1 1 ðμ  νÞ
2
ð ð1
R2 1
¼ 3 dμ dνðμ þ νÞeðμþνÞR=a : ð19:220Þ
2a 1 1

Ð1 Ð1
Putting I ¼ 1 dμ 1 dνðμ þ νÞeðμþνÞR=a , we obtain
ð1 ð1 ð1 ð1
μR=a νR=a μR=a
I¼ μe dμ e dν þ e dμ νeνR=a dν: ð19:221Þ
1 1 1 1

The above definite integrals can readily be calculated using the methods
described in Sect. 3.7.2; see, e.g., (3.262) and (3.263). For instance, we have
ð1  1
ecx 1
ecν dν ¼ ¼ ðec  ec Þ: ð19:222Þ
1 c 1
c

Differentiating (19.222) with respect to the parameter c, we get


ð1
1
νecν dν ¼ ½ð1  cÞec  ð1 þ cÞec : ð19:223Þ
1 c2

In the present case, c is to be replaced with R/a. Other calculations of the definite
integrals are left for readers. Thus, we obtain
h i
a3 R
I¼ 3
2  2 1 þ e2R=a :
R a

In turn, we have
    h i
1 R2 1 R
A A ¼ 3  I ¼ 1  1 þ e2R=a :
rB 2a R a

Introducing a symbol according to Atkins and Friedman [3]


19.5 MO Calculations of Methane 793

e2
j0  ,
4πε0

we finally obtain

j0 h R
i j
H 11 ¼ E 1s  1  1 þ e2R=a þ 0 : ð19:224Þ
R a R

The quantity S12 can be obtained as follows:


ð 2π ð1 ð1
R3
S12 ¼ hB j Ai ¼ dϕ dμ dν μ2  ν2 eμR=a , ð19:225Þ
8πa3 0 1 1
qffiffi
1 3=2 rB =a
where we used j Bi ¼ πa e and (19.217). Noting that

ð1 ð1 ð1
2 μR=a
dμ dν μ2  ν2 eμR=a ¼ dμ 2μ2  e
1 1 1 3

and following procedures similar to those described above, we get


 2
R 1 R
S12 ¼ 1 þ þ eR=a : ð19:226Þ
a 3 a

In turn, for H12 we have


          2  
 ħ2 e2   2   
H 12 ¼ hAjHjBi ¼ A ∇2  B þ A  e B þ A e B
2m 4πε0 r B   4πε0 r A   4πε0 R
   
e2 1 e2
¼ E1s hAjBi A B þ hAjBi: ð19:227Þ
4πε0 r A  4πε0 R

ħ 2
Note that in (19.227) jBi is an eigenfunction belonging to E1s ¼  2ma 2 that is an
ħ2 e2
eigenvalue of an operator  2m ∇  4πε0 rB . From (3.258), we estimate E1s to be
2

13.61 eV. Using (19.217) and converting Cartesian coordinates to ellipsoidal coor-
dinates once again, we get
794 19 Applications of Group Theory to Physical Chemistry

   
1 1 R R=a
A B ¼ 1þ e :
rA a a

Thus, we have
 
j0 j R R=a
H 12 ¼ E1s þ S  0 1þ e : ð19:228Þ
R 12 a a

We are now in the position to evaluate H12  H11S12 in (19.213). According to


Atkins and Friedman [3], we define following notations:
       
1 1
j  j0 A A
0
and k  j0 A B :
0
rB rA

Then, we have

H 12  H 11 S12 ¼ k0 þ j0 S12 : ð19:229Þ

The calculation of this quantity is straightforward. The result is

H 12  H 11 S12
h i
1 2R R=a j0 R R 1 R 2 3R=a
¼ j0  2 e  1þ 1þ þ Þ e
R 3a R a a 3 a

j0 a 2R R=a j0 h i
a R 1 R 2 3R=a
¼  e  1þ 1þ þ Þ e : ð19:230Þ
a R 3a a R a 3 a

In (19.230) we notice that whereas the second term is always negative, the first term
may be negative or positive depending upon R. We could not tell a priori whether
H12  H11S12 is negative accordingly. If we had R a, (19.230) would become
positive.
Let us then make a quantitative estimation. The Bohr radius a is about 52.9 pm
(using an electron rest mass). As an experimental result, R is approximately 106 pm
[3]. Hence, for a Hþ2 ion we have

R=a 2:0:

Using this number, we get

H 12  H 11 S12 0:13j0 =a < 0:

We estimate H12  H11S12 to be ~  3.5 eV. Therefore, from (19.215) we get


19.5 MO Calculations of Methane 795

H 11 þ H 12 H 11  H 12
λL ¼ and λH ¼ , ð19:231Þ
1 þ S12 1  S12

where λL and λH indicate lower and higher energy eigenvalues, respectively.


Namely, Case II in the above is more likely. Correspondingly, for MOs we have

j Aiþ j Bi j Ai j Bi
Ψ L ¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi and Ψ H ¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi , ð19:232Þ
2ð1 þ S12 Þ 2ð1  S12 Þ

where Ψ L and Ψ H belong to λL and λH, respectively. The results are virtually the
same as those given in (19.101) to (19.105) of Sect. 19.4.1. In (19.101) and
Fig. 19.6, however, we merely assumed that λ1 ¼ αþβ αβ
1þS is lower than λ2 ¼ 1S .
Here, we have confirmed that this is truly the case. A chemical bond is formed in
such a way that an electron is distributed as much as possible along the molecular
axis (see Fig. 19.4 for a schematic) and that in this configuration a minimized orbital
energy is achieved.
In our present case, H11 in (19.203) and H22 in (19.204) should differ. This is also
the case with G11 in (19.211) and G22 in (19.212). Here we return back to (19.199).
Suppose that we get a MO by solving, e.g., the first secular equation of (19.199) such
that

Ψ ¼ a1 H ðA1 Þ þ b1 C2s:

From the secular equation, we get

H 12  λS12
a1 ¼  b , ð19:233Þ
H 11  λ 1

where S12 and H12 were defined as in (19.201) and (19.205), respectively. A
e is described by
normalized MO Ψ

a1 H ðA1 Þ þ b1 C2s
e ¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
Ψ ffi: ð19:234Þ
a1 2 þ b1 2 þ 2a1 b1 S12

Thus, according to two different energy eigenvalues λ we will get two linearly
independent MOs. Other three secular equations are dealt with similarly.
From the energetical consideration of Hþ 2 we infer that a1b1 > 0 with a bonding
MO and a1b1 < 0 with an antibonding MO. Meanwhile, since both H ðA1 Þ and C2s
belong to the irreducible representation A1, so does Ψe on the basis of the discussion
on the projection operators of Sect. 18.7. Thus, we should be able to construct proper
MOs that belong to A1. Similarly, we get proper MOs belonging to T2 by solving
other three secular equations of (19.199). In this case, three bonding MOs are triply
degenerate, so are three antibonding MOs. All these six MOs belong to the
796 19 Applications of Group Theory to Physical Chemistry

Fig. 19.16 Probable energy ∗


diagram and MO symmetry
species of methane. Adapted
from http://www.science.
oregonstate.edu/~gablek/
CH334/Chapter1/methane_ ∗
MOs.htm with kind
permission of Professor
Kevin P. Gable

CH4

irreducible representation T2. Thus, we can get a complete set of MOs for methane.
These eight MOs span the representation space V8.
To precisely determine the energy levels, we need to take more elaborate
approaches to approximate and calculate various parameters that appear in
(19.199). At the same time, we need to perform detailed experiments including
spectroscopic measurements and interpret those results carefully [4]. Taking account
of these situations, Fig. 19.16 [5] displays as an example of MO calculations that
give a probable energy diagram and MO symmetry species of methane. The diagram
comprises a ground-state bonding a1 state and its corresponding antibonding state a1
along with triply degenerate bonding t2 states, and their corresponding antibonding
state t 2 .
We emphasize that the said elaborate approaches ensue truly from the “paper-
and-pencil” methods based upon group theory. The group theory thus supplies us
with a powerful tool and clear guideline for addressing various quantum chemical
problems, a few of which we are introducing as an example in this book.
Finally, let us examine the optical transition of methane. In this case, we have to
consider electronic configurations of the initial and final states. If we are dealing with
optical absorption, the initial state is the ground-state A1 that is described by the
totally symmetric representation. The final state, on the other hand, will be an excited
state, which is described by a direct-product representation related to the two states
that are associated with the optical transition. The matrix element is expressed as
D E
Θðα βÞ
jεe  PjΘðA1 Þ , ð19:235Þ

where ΘðA1 Þ stands for an electronic configuration of the totally symmetric ground
state; Θ(α β) denotes an electronic configuration of an excited state represented by a
direct-product representation pertinent to irreducible representations α and β; P is an
electric dipole operator and εe is a unit polarization vector of the electric field. From a
19.5 MO Calculations of Methane 797

character table for Td, we find that εe  P belongs to the irreducible representation T2
(see Table 19.12).
The ground-state electronic configuration is A1 (totally symmetric). It is denoted
by

a21 t 22 t 0 2 t 00 2 ,
2 2

where three T2 states are distinguished by a prime and double prime. For possible
configuration of excited states, we have.

a21 t 2 t 0 2 t 00 2 t 2 ðA1 ! T 2 T 2 Þ, a21 t 2 t 0 2 t 00 2 a1 ðA1 ! T 2


2 2 2 2
A1 ¼ T 2 Þ,
a1 t 22 t 0 2 t 00 2 t 2 a1 t 22 t 0 2 t 00 2 a1
2 2 2 2
ðA1 ! A1 T 2 ¼ T 2 Þ, ðA1 ! A1 A1 ¼ A1 Þ:
ð19:236Þ

In (19.236), we have optical excitations of t 2 ! t 2 , t 2 ! a1 , a1 ! t 2 , and a1 ! a1 ,


respectively. Consequently, the excited states are denoted by T2 T2, T2, T2, and A1,
respectively. Since εe  P is Hermitian as seen in (19.51) of Sect. 19.2, we have
D E D E
Θðα βÞ
jεe  PjΘðA1 Þ ¼ ΘðA1 Þ jεe  PjΘðα βÞ
: ð19:237Þ

Therefore, according to the general theory of Sect. 19.2, we need to examine whether
T2 D(α) D(β) contains A1 to judge whether the optical transition is allowed. As
mentioned above, D(α) D(β) is chosen from among T2 T2, T2, T2, and A1. The
results are given as below:

T2 T2 T 2 ¼ 3T 1 þ 4T 2 þ 2E þ A2 þ A1 , ð19:238Þ
T2 T 2 ¼ T 1 þ T 2 þ E þ A1 , ð19:239Þ
T2 A1 ¼ T 2 : ð19:240Þ

Admittedly, (19.238) and (19.239) contain A1, and so the transition is allowed. As
for (19.240), however, the transition is forbidden because it does not contain A1. In
light of the character table of Td (Table 19.12), we find that the allowed transitions
(i.e., t 2 ! t 2 , t 2 ! a1 , and a1 ! t 2 ) equally take place in the direction polarized
along the x-, y-, z-axes. This is often the case with molecules having higher
symmetries such as methane. The transition a1 ! a1 , on the other hand, is
forbidden.
In the above argument including the energetic consideration, we could not tell
magnitude relationship of MO energies or photon energies associated with the
optical transition. Once again, this requires more accurate calculations and experi-
ments. Yet, the discussion we have developed gives us strong guiding principles in
the investigation of molecular science.
798 19 Applications of Group Theory to Physical Chemistry

References

1. Cotton FA (1990) Chemical applications of group theory. Wiley, New York


2. Hamermesh M (1989) Group theory and its application to physical problems. Dover, New York
3. Atkins P, Friedman R (2005) Molecular quantum mechanics, 4th edn. Oxford University Press,
Oxford
4. Anslyn EV, Dougherty DA (2006) Modern physical organic chemistry. University Science
Books, Sausalito
5. Gable KP (2014) Molecular orbitals: Methane, http://www.science.oregonstate.edu/~gablek/
CH334/Chapter1/methane_MOs.htm
Chapter 20
Theory of Continuous Groups

In Chap. 16, we classified the groups into finite groups and infinite groups. We
focused mostly on finite groups and their representations in the precedent chapters.
Of the infinite groups, continuous groups and their properties have been widely
investigated. In this chapter we think of the continuous groups as a collective term
that include Lie groups and topological groups. The continuous groups are also
viewed as the transformation group. Aside from the strict definition, we will see the
continuous group as a natural extension of the rotation group or SO(3) that we
studied briefly in Chap. 17. Here we reconstruct SO(3) on the basis of the notion of
infinitesimal rotation. We make the most of the exponential functions of matrices the
theory of which we have explored in Chap. 15. In this context, we study the basic
properties and representations of the special unitary group of SU(2) that has close
relevance to SO(3). Thus, we focus our attention on the representation theory of SU
(2) and SO(3). The results are associated with the (generalized) angular momenta
which we dealt with in Chap. 3. In particular, we show that the spherical surface
harmonics constitute the basis functions of the representation space of SO(3).
Finally, we study various important properties of SU(2) and SO(3) within a
framework of Lie groups and Lie algebras. The last sections comprise the description
of abstract ideas, but those are useful for us to comprehend the constitution of the
group theory from a higher perspective.

20.1 Introduction: Operators of Rotation and Infinitesimal


Rotation

To characterize the continuous transformation appropriately, we have to describe an


infinitesimal transformation of rotation. This is because a rotation with a finite angle
is attained by an infinite number of infinitesimal rotations.

© Springer Nature Singapore Pte Ltd. 2020 799


S. Hotta, Mathematical Physical Chemistry,
https://doi.org/10.1007/978-981-15-2225-3_20
800 20 Theory of Continuous Groups

First let us consider how a function Ψ is transformed by the rotation (see Sect.
19.1). Here we express Ψ by
0 1
c1
B c2 C
B C
Ψ ¼ ðψ 1 ψ 2   ψ d ÞB C, ð20:1Þ
@⋮A
cd

where {ψ 1,   , ψ d} span the representation space relevant to the rotation; d is a


dimension of the representation space; c1, c2,    cd are coefficients (or coordinates).
Henceforth, we make it a rule to denote a vector in a representation space as in
(20.1). This is a natural extension of (11.13). We assume ψ v (v ¼ 1, 2,   , d ) as a
vector component and ψ v is normally expressed as a function of a position vector x in
a real three-dimensional space (see Chap. 18). Considering for simplicity a
one-dimensional representation space (i.e., d ¼ 1) and omitting the index v, as a
transformation of ψ we have
 
Rθ ψ ðxÞ ¼ ψ R1
θ ð xÞ , ð20:2Þ

where the index θ indicates the rotation angle in a real space whose dimension is two
or three. Note that the dimensions of the real space and representation space may or
may not be the same. Equation (20.2) is essentially the same as (19.11).
As a three-dimensional matrix representation of rotation, we have, e.g.,
0 1
cos θ  sin θ 0
B C
Rθ ¼ @ sin θ cos θ 0 A: ð20:3Þ
0 0 1

The matrix Rθ represents a rotation around the z-axis in ℝ3; see Fig. 11.2 and
(11.31). Borrowing the notation of (17.3) and (17.12), we have
0 10 1
cos θ  sin θ 0 x
B CB C
Rθ ðxÞ ¼ ðe1 e2 e3 Þ@ sin θ cos θ 0 A@ y A: ð11:36Þ
0 0 1 z

Therefore, we get
0 10 1
cos θ sin θ 0 x
B CB C
R1
θ ðxÞ ¼ ðe1 e2 e3 Þ@  sin θ cos θ 0 A@ y A: ð20:4Þ
0 0 1 z
20.1 Introduction: Operators of Rotation and Infinitesimal Rotation 801

Let us think of an infinitesimal rotation. Taking a first-order quantity of an


infinitesimal angle θ, (20.4) can be approximated as
0 10 1 0 1
1 θ 0 x x þ θy
B CB C B C
R1 B CB C B C
θ ðxÞ  ðe1 e2 e3 Þ@ θ 1 0 A@ y A ¼ ðe1 e2 e3 Þ@ y  θx A

0 0 1 z z ð20:5Þ
¼ ðxe1 þ θye1 ye2  θxe2 ze3 Þ
¼ ð½x þ θye1 ½y  θxe2 ze3 Þ:

Putting

ψ ðxÞ = ψ ðxe1 þ ye2 þ ze3 Þ  ψ ðx, y, zÞ,

we have
 
Rθ ψ ðxÞ ¼ ψ R1
θ ð xÞ ð20:2Þ

or

Rθ ψ ðx, y, zÞ  ψ ðx þ θy, y  θx, zÞ


∂ψ ∂ψ
¼ ψ ðx, y, zÞ þ θy þ ðθxÞ
∂x ∂y ð20:6Þ
 
∂ ∂
¼ ψ ðx, y, zÞ  iθðiÞ x  y ψ ðx, y, zÞ:
∂y ∂x

Now, using a dimensionless operator ℳz (¼Lz/ħ) such that


 
∂ ∂
ℳz  i x  y , ð20:7Þ
∂y ∂x

we get

Rθ ψ ðx, y, zÞ  ψ ðx, y, zÞ  iθℳz ψ ðx, y, zÞ ¼ ð1  iθℳz Þψ ðx, y, zÞ: ð20:8Þ

Note that the operator ℳz has already appeared in Sect. 3.5 [also see (3.19), Sect.
3.2], but in this section we denote it by ℳz to distinguish it from the previous
notation Mz. This is because ℳz and Mz are connected through a unitary similarity
transformation (vide infra).
From (20.8), we express an infinitesimal rotation θ around the z-axis as
802 20 Theory of Continuous Groups

Rθ = 1  iθℳz : ð20:9Þ

Next, let us think of a rotation of a finite angle θ. We have


 n
Rθ ¼ Rθ=n , ð20:10Þ

where RHS of (20.10) implies that the rotation Rθ of a finite angle θ is attained by
n successive infinitesimal rotations of Rθ/n with a large enough n.
Taking a limit n ! 1 [1, 2], we get
 n
 n θ
Rθ ¼ lim Rθ=n ¼ lim 1  i ℳz ¼ exp ðiθℳz Þ: ð20:11Þ
n!1 n!1 n

Recall the following definition of an exponential function other than (15.7) [3]:

x n
ex  n!1
lim 1 þ : ð20:12Þ
n

Note that as in the case of (15.7), (20.12) holds when x represents a matrix. Thus,
if a dimension of the representation space d is two or more, (20.9) should be read as

Rθ ¼ E  iθℳz , ð20:13Þ

where E is a (d, d) identity matrix; ℳz is represented by a (d, d) square matrix


accordingly. As already shown in Chap. 15, an exponential function of a matrix is
well defined. Since ℳz is Hermitian, iℳz is anti-Hermitian (see Chaps. 2 and 3).
Thus, using (20.11) Rθ can be expressed as

Rθ ¼ exp ðθAz Þ, ð20:14Þ

with

Az  iℳz : ð20:15Þ

Operators of this type play an essential role in continuous groups. This is because
in (20.14) the operator Az represents a Lie algebra, which in turn generates a Lie
group. The Lie group is categorized as one of continuous groups along with a
topological group. In this chapter we further explore various characteristics of SO
(3) and SU(2), both of which are Lie groups and have been fully investigated from
various aspects. The Lie algebras frequently appear in the theory of continuous
groups in combination with the Lie groups. Brief outline of the theory of Lie groups
and Lie algebras will be given at the end of this chapter.
20.1 Introduction: Operators of Rotation and Infinitesimal Rotation 803

Let us further consider implications of (20.9) and (20.13). From those equations
we have

ℳz ¼ ð1  Rθ Þ=iθ ð20:16Þ
or

ℳz ¼ ðE  Rθ Þ=iθ: ð20:17Þ

As a very trivial case, we might well be able to think of a one-dimensional real


space. Let the space extend along the z-axis and a unit vector of that space be e3.
Then, with any rotation Rθ around the z-axis makes e3 unchanged (or invariant). That
is, we have

Rθ ðe3 Þ ¼ e3 :

This means Rθ  E. Putting this into (20.17), we have ℳz ¼ 0. From (20.15), we


have Az ¼ 0 as well. In fact, (20.14) shows that this is the case.
With the three-dimensional case, we have
0 1 0 1
1 0 0 1 θ 0
B C B C
E ¼ @0 1 0A and Rθ ¼ @ θ 1 0 A: ð20:18Þ
0 0 1 0 0 1

Therefore, from (20.17) we get


0 1
0 i 0
B C
ℳz ¼ @ i 0 0 A: ð20:19Þ
0 0 0

Next, viewing x, y, and z as vector components (i.e., functions), we think of the


following equation:
0 10 1
0 i 0 c1
B CB C
ℳz ðΧÞ ¼ ðx y zÞ@ i 0 0 A@ c2 A: ð20:20Þ
0 0 0 c3

The notation of (20.20) denotes the vector transformation in the representation


space, in accordance with (11.37). In (20.20) we consider (x y z) as basis vectors as in
the case of (e1 e2 e3) of (17.2) and (17.3). In other words, we are thinking of a vector
transformation in a three-dimensional representation space spanned by a set of basis
804 20 Theory of Continuous Groups

Fig. 20.1 Function


f(x, y) ¼ ay (a > 0). The = , = ( > 0)
function ay is drawn green.
Only a part of f (x, y) > 0
is shown

vectors (x y z). In this case, ℳz represented by (20.19) operates on a vector (i.e., a


function) Χ expressed by
0 1
c1
B C
Χ ¼ ð x y z Þ @ c2 A, ð20:21Þ
c3

where c1, c2, and c3 are coefficients (or coordinates). Again, this notation is a natural
extension of (11.13). We emphasize once more that (x y z) does not represents
coordinates but functions. As an example, we depict a function f(x, y) ¼ ay (a > 0) in
Fig. 20.1. The function ay (drawn green in Fig. 20.1) behaves as a vector
with respect to a linear transformation like a basis vector e2 of the Cartesian
coordinate. The “directionality” as a vector can easily be visualized with the function
ay.
Since ℳz of (20.19) is Hermitian, it must be diagonalized by unitary similarity
transformation (Chap. 14). As in the case of, e.g., (14.96), we find a diagonalizing
unitary matrix P with respect to ℳz such that
0 1
1 1
 pffiffiffi pffiffiffi 0
B 2 2 C
B C
P¼B
B  pffiffiffi  piffiffiffi
i C: ð20:22Þ
@ 0C
A
2 2
0 0 1

Using (20.22), we rewrite (20.20) as


20.1 Introduction: Operators of Rotation and Infinitesimal Rotation 805

0 1 0 1
0 i 0 c1
B C B C
ℳz ðΧÞ ¼ ðx y zÞP  P1 B
@i 0 0C 1 B C
AP  P @ c2 A
0 0 0 c3
0 1 1
0
1 1 1 i
 pffiffiffi pffiffiffi 0 0 1
 pffiffiffi pffiffiffi 0 0 1
B 2 2 C 1 0 0 B 2 2 C c1
B CB CB CB C
B CB B 1 CB C
¼ ðx y zÞB 1 0 C
 pffiffiffi 0 C AB pffiffiffi pffiffiffi 0 C
B
i i 0 i c2
B  pffiffiffi C@ C@ A
@ 2 2 A @ 2 2 A
0 0 0 c3
0 0 1 0 0 1
0 1
0 1  p1ffiffiffi c þ piffiffiffi c
 1 0 0 B 2 C
1 2
 2
1 i 1 i B CB
B
C
C
¼  pffiffiffi x  pffiffiffi y pffiffiffi x  pffiffiffi y z B
@ 0 1 0 CB 1
AB pffiffiffi c1 þ pffiffiffi c2 C
i :
2 2 2 2 C
@ 2 2 A
0 0 0
c3
ð20:23Þ

fz such that
Here we define M
0 1
1 0 0
fz  P1 ℳz P ¼ B
M @0
C
1 0 A: ð20:24Þ
0 0 0

Equation (20.24) was obtained by putting l ¼ 1 and shuffling the rows and
columns of (3.158) in Chap. 3. That is, choosing a (real) unitary matrix R3 given by
0 1
0 0 1
B C
R3 ¼ @ 1 0 0 A, ð17:116Þ
0 1 0

we have
0 1
1 0 0
f 1 B C
R1 1 1
3 M z R3 ¼ R3 P ℳz PR3 ¼ ðPR3 Þ ℳz ðPR3 Þ ¼ @ 0 0 0 A ¼ Mz:
0 0 1

Hence, Mz that appeared as a diagonal matrix in (3.158) has been obtained by a


unitary similarity transformation of ℳz using a unitary operator PR3.
Comparing (20.20) and (20.23), we obtain useful information. If we choose (x y z)
for basis vectors, we could not have a representation that diagonalizes ℳz. If, on the
other hand, we choose  p1ffiffi2 x  piffiffi2 y p1ffiffi2 x  piffiffi2 y z for the basis vectors, we get a
representation that diagonalizes ℳz. These basis vectors are sometimes referred to as
806 20 Theory of Continuous Groups

a spherical basis. If we carefully look at the latter basis vectors (i.e., spherical basis),
we
 1 notice that these vectors have the same transformation properties as
Y 1 Y 1
1 Y 01 . In fact, using (3.216) and converting the spherical coordinates to
Cartesian coordinates in (3.216), we get [4].
rffiffiffiffiffi  
 3 1 1 1
Y 11 Y 1
1 Y 01 ¼  pffiffiffi ðx þ iyÞ pffiffiffi ðx  iyÞ z , ð20:25Þ
4π r 2 2

where r is a radial coordinate (i.e., a positive number)


 that appears in the polar
coordinate (r, θ). Note that the factor r contained in Y 11 Y 1
1 Y 01 is invariant with
respect to the rotation.
We can generalize the results and conform them to the discussion of related
characteristics in representation spaces of higher dimensions. The spherical surface
harmonics possess an important position for this (vide infra).
From Sect. 3.5, (3.150) we have

M z Y 00 ¼ 0,
pffiffiffiffiffiffiffiffiffiffi
where Y 00 ðθ, ϕÞ ¼ 1=4π . The above relation obviously shows that Y 00 ðθ, ϕÞ
belongs to an eigenvalue of zero of Mz. This is consistent with the earlier comments
made on the one-dimensional real space.

20.2 Rotation Groups: SU(2) and SO(3)

Bearing in mind the aforementioned background knowledge, we further develop the


theory of the rotation groups SU(2) and SO(3) and explore in detail various proper-
ties of them. First, we deal with the topics through conventional approaches. Later
we will describe them systematically.
Following the argument developed above, we wish to relate an anti-Hermitian
operator to a rotation operator of a finite angle θ represented by (17.12). We express
Hermitian operators related to the x- and y-axes as
0 1 0 1
0 0 0 0 0 i
B C B C
ℳx ¼ @ 0 0 i A and ℳy ¼ @ 0 0 0 A:
0 i 0 i 0 0

Then, anti-Hermitian operators corresponding to (20.15) are described by


0 1 0 1 0 1
0 1 0 0 0 0 0 0 1
B C B C B C
Az ¼ @ 1 0 0 A, A x ¼ @ 0 0 1 A, Ay ¼ @ 0 0 0 A, ð20:26Þ
0 0 0 0 1 0 1 0 0
20.2 Rotation Groups: SU(2) and SO(3) 807

where Ax  iℳx and Ay  iℳy. These are characterized by real skew-symmetric


matrices.
Using (15.7), we calculate exp(θAz) such that

1 1 1 1
exp ðθAz Þ ¼ E þ θAz þ ðθAz Þ2 þ ðθAz Þ3 þ ðθAz Þ4 þ ðθAz Þ5 þ   : ð20:27Þ
2! 3! 4! 5!

We note that
0 1 0 1
1 0 0 1 0 0
B C B C
Az 2 ¼@ 0 1 0 A ¼ Az , Az ¼ @ 0
6 4
1 0 A ¼ Az 8 , etc:;
0 0 0 0 0 0
0 1 0 1
0 1 0 0 1 0
B C B C
Az 3 ¼ @ 1 0 0 A ¼ Az 7 , Az 5 ¼ @ 1 0 0 A ¼ Az 9 , etc:
0 0 0 0 0 0

Thus, we get

1 1 1 1
exp ðθAz Þ ¼ E þ ðθAz Þ2 þ ðθAz Þ4 þ    þ θAz þ ðθAz Þ3 þ ðθAz Þ5 þ   
2! 4! 3! 5!
0 1
1 1
1  θ2 þ θ4     0 0
B 2! 4! C
B C
¼BB 1 2 1 4 C
0 1  θ þ θ     0 C
@ 2! 4! A
0 0 1
0 1
1 3 1 5
0 θ þ θ  θ þ    0
B 3! 5! C
B C
þBB 1 3 1 5 C
θ  θ þ θ þ    0 0 C :
@ 3! 5! A
0 0 0
0 1 0 1
cos θ 0 0 0  sin θ 0
B C B C
¼B@ 0 cos θ 0 C B
A þ @ sin θ 0 0CA
0 0 1 0 0 0
0 1
cos θ  sin θ 0
B C
B
¼ @ sin θ cos θ 0 C A:
0 0 1
ð20:28Þ

Hence, we recover the matrix form of (17.12). Similarly, we get


808 20 Theory of Continuous Groups

0 1
1 0 0
B C
exp ðφAx Þ ¼ @ 0 cos φ  sin φ A, ð20:29Þ
0 sin φ cos φ
0 1
cos ϕ 0 sin ϕ
 B C
exp ϕAy ¼ @ 0 1 0 A: ð20:30Þ
 sin ϕ 0 cos ϕ

Of the above operators, using (20.28) and (20.30) we recover (17.101) related to
the Euler angles such that

f
R3 ¼ exp ðαAz Þ exp βAy exp ðγAz Þ: ð20:31Þ

Using exponential functions of matrices, (20.31) gives a transformation matrix


containing the Euler angles and supplies us with a direct method to be able to make a
matrix representation of SO(3).
Although (20.31) is useful for the matrix representation in the real space, its
extension to abstract representation spaces would somewhat be limited. In this
respect, the method developed in SU(2) can widely be used to produce representa-
tion matrices for a representation space of an arbitrary dimension. Let us start with
constructing those representation matrices using (2, 2) complex matrices.

20.2.1 Construction of SU(2) Matrices

In Sect. 16.1 we mentioned a general linear group GL(n, ℂ). There are many
subgroups of GL(n, ℂ). Among them, SU(n), in particular SU(2), plays a central
role in theoretical physics and chemistry. In this section we examine how to
construct SU(2) matrices.
Definition 20.1 A group consisting of (n, n) unitary matrices with a determinant 1 is
called a special unitary group of degree n and denoted by SU(n).
In Chap. 14 we mentioned that the “absolute value” of determinant of a unitary
matrix is 1. In Definition 20.1, on the other hand, there is an additional constraint in
that we are dealing with unitary matrices whose determinant is 1.
The SU(2) matrices U have the following general form:
 
a b
U¼ , ð20:32Þ
c d

where a, b, c, and d are complex numbers. According to Definition 20.1 we try to


seek conditions for these numbers. The unitarity condition UU{ ¼ U{U ¼ E reads as
20.2 Rotation Groups: SU(2) and SO(3) 809

! ! !
a b a c jaj2 þ jbj2 ac þ bd 
¼
c d b d  a c þ b d jcj2 þ jdj2
! ! ! ! ð20:33Þ
a c a b jaj2 þ jcj2 a b þ c  d 1 0
¼ ¼ ¼ :
b d c d ab þ cd  jbj2 þ jdj2 0 1

Then we have
(i) |a|2 + |b|2 ¼ 1, (ii) |c|2 + |d|2 ¼ 1, (iii) |a|2 + |c|2 ¼ 1,
(iv) |b|2 + |d|2 ¼ 1, (v) ac + bd ¼ 0, (vi) ab + cd ¼ 0,
(vii) ad  bc ¼ 1.
Of the above conditions, (vii) comes from detU ¼ 1. Condition (iv) can be
eliminated by conditions (i)(iii). From (v) and (vii), using Cramer’s rule we have

0 b a 0

1 a b b 1 a
c¼   ¼ ¼ b , d ¼   ¼ ¼ a : ð20:34Þ
a b 2
jaj þ jbj 2 a b jaj þ jbj2
2

b a b a

From (20.34) we see that (i)–(iv) are equivalent and that (v) and (vi) are redun-
dant. Thus, as a general form of U we get
 
a b
U¼ with jaj2 þ jbj2 ¼ 1: ð20:35Þ
b a

As a result, we have four freedoms with a and b with a restricting condition of


|a|2 + |b|2 ¼ 1. That is, we finally get three freedoms with choice of parameters.
Putting

a ¼ p þ iq and b ¼ r þ is ðp, q, r, s : realÞ,

we have

p2 þ q2 þ r 2 þ s2 ¼ 1: ð20:36Þ

Therefore, the restricted condition of (20.35) is equivalent to that p, q, r, and s are


regarded as coordinates that are positioned on a unit sphere of ℝ4.
From (20.35), we have
 
1 { a b
U ¼U ¼ :
b a

Also putting
810 20 Theory of Continuous Groups

   
a1 b1 a2 b2
U1 ¼ and U2 ¼ , ð20:37Þ
b1 a1 b2 a2

we get
 
a1 a2  b1 b2 a1 b2 þ b1 a2
U1U2 ¼ : ð20:38Þ
a2 b1  a1 b2 b1 b2 þ a1 a2

Thus, both U1 and U1U2 satisfy criteria (20.35) and, hence, the unitary matrices
having a matrix form of (20.35) constitute a group; i.e., SU(2).
Next we wish to connect the SU(2) matrices to the matrices that represent the
angular momentum. In Sect. 3.4 we developed the theory of generalized angular
momentum. In the case of j ¼ 1/2 we can obtain accurate information about the spin
states. Using (3.101) as a starting point and defining the state belonging to the spin
   
1 0
angular momentum μ ¼  1/2 and μ ¼ 1/2 as and , respectively, we
0 1
obtain
   
ðþÞ 0 0 ð Þ 0 1
J ¼ and J ¼ : ð20:39Þ
1 0 0 0
   
1 0
Note that and are in accordance with (2.64) as a column vector
0 1
representation. Using (3.72), we have
     
1 0 1 1 0 i 1 1 0
Jx ¼ , Jy ¼ , Jz ¼ : ð20:40Þ
2 1 0 2 i 0 2 0 1
   
1 1 1
In (20.40) Jz was given so that we may have J z ¼ and
0 2 0
   
0 1 0 1
Jz ¼ . Deleting the coefficient from (20.40), we get Pauli spin
1 2 1 2
matrices such that
     
0 1 0 i 1 0
σx ¼ , σy ¼ , σz ¼ : ð20:41Þ
1 0 i 0 0 1

Multiplying i on (20.40), we have anti-Hermitian operators described by


     
1 0 i 1 0 1 1 i 0
ζx ¼ , ζy ¼ , ζz ¼ : ð20:42Þ
2 i 0 2 1 0 2 0 i

Notice that these are traceless anti-Hermitian matrices.


20.2 Rotation Groups: SU(2) and SO(3) 811

Now let us seek rotation matrices in a manner related to that deriving (20.31) but
at a more general level. From Property (7)0 of Chap. 15 (see Sect. 15.2), we know
that

exp ðtAÞ

with an anti-Hermitian operator A combined with a real number t produces a unitary


matrix. Let us apply this to the anti-Hermitian matrices described by (20.42).
Using ζ z of (20.42), we calculate exp(aζ z) as in (20.27). We show only the results
described by
0 α α 1
α α @ cos 2 þ i sin 2 0
A
exp ðαζ z Þ ¼ E  cos  iσ z sin ¼ α α
2 2 0 cos  i sin
2 2

!
e2 0
¼ : ð20:43Þ
0 e 2

Readers are recommended to derive this. Using ζ y and applying similar calcula-
tion procedures to the above, we also get
0 1
β β
 B
cos
2
sin
2 C:
exp βζ y ¼ @ A ð20:44Þ
β β
 sin cos
2 2

The representation matrices describe “complex rotations” and (20.43) and


ð12Þ
(20.44) correspond to (20.28) and (20.30), respectively. We define Dα,β,γ as the
representation matrix that describes the successive rotations α, β, and γ and corre-
sponds to f
R3 in (17.101) and (20.31) including Euler angles. As a result, we have

ð12Þ 
Dα,β,γ ¼ exp ðαζ z Þ exp βζ y exp ðγζ z Þ
0 1
β β
e2ðαþγ Þ cos e2ðαγ Þ sin
i i

B 2 2 C,
¼@ A ð20:45Þ
β β
e2ðγαÞ sin e2ðαþγÞ cos
i i

2 2
ð12Þ
where the superscript 12 of Dα,β,γ corresponds to the non-negative generalized angular
 1
momentum j ¼ 2 . Equation (20.45) can be obtained by putting a ¼ e2ðαþγÞ cos β2
i

and b ¼ e2ðαγÞ sin β2 in the matrix U of (20.35). Using this general SU(2) represen-
i

tation matrix, we construct representation matrices of higher orders.


812 20 Theory of Continuous Groups

20.2.2 SU(2) Representation Matrices: Wigner Formula

Suppose that we have two functions (or vectors) v and u. We regard v and u are
linearly independent vectors that undergo a unitary transformation by a unitary
matrix U of (20.35). That is, we have
 
0 0 a b
ðv u Þ ¼ ðv uÞU ¼ ðv uÞ , ð20:46Þ
b a

where (v u) are transformed into (v0 u0). Namely, we have

v0 ¼ av  b u, u0 ¼ bv þ a u:

Let us further define according to Hamermesh [5] an orthonormal set of 2j + 1


functions such that

u jþm v jm
f m ¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ðm ¼ j, j þ 1,   , j  1, jÞ, ð20:47Þ
ð j þ mÞ!ð j  mÞ!

where j is an integer or a half-odd-integer. Equation (20.47) implies that 2j + 1


monomials fm of 2j-degree with respect to v and u constitute the orthonormal basis.
Related discussion can be seen in Sect. 20.3.2. The description that follows includes
contents related to generalized angular momenta (see Sect. 3.4).
We examine how these functions fm (or vectors) are transformed by (20.45). We
denote this transformation by Rða, bÞ. That is, we have
X ð jÞ
Rða, bÞð f m Þ  m0
f m0 Dm0 m ða, bÞ, ð20:48Þ

ð jÞ
where Dm0 m ða, bÞ denotes matrix elements with respect to the transformation.
Replacing u and v with u0 and v0, respectively, we get

1
ℜða,bÞf m ¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ðbv þ a uÞ jþm ðav  b uÞ jm
ð j þ mÞ!ð j  mÞ!
X 1
¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ðbv þ a uÞ jþm ðav  b uÞ jm
μ,v ð j þ mÞ!ð j  mÞ!
X 1 ð j þ mÞ!ð j  mÞ!
¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
μ,v ð j þ mÞ!ð j  mÞ! ð j þ m  μÞ!μ!ð j  m  vÞ!v!

ða uÞ jþmμ ðbvÞμ ðb uÞ jmv ðavÞv


pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
X ð j þ mÞ!ð j  mÞ!
¼ μ,v ð j þ m  μÞ!μ!ð j  m  vÞ!v!
ða Þ jþmμ bμ ðb Þ jmv av u2jμv vμþv :

ð20:49Þ
20.2 Rotation Groups: SU(2) and SO(3) 813

Here, to eliminate v, putting

2j  μ  v ¼ j þ m0 , μ þ v ¼ j  m0 ,

we get
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
X ð j þ mÞ!ð j  mÞ!
Rða, bÞf m ¼ μ,m0 ð j þ m  μÞ!μ!ðm0  m þ μÞ!ð j  m0  μÞ!
0
ða Þ jþmμ bμ ðb Þm mþμ a jm μ u jþm v jm :
0 0 0
ð20:50Þ

Expressing f m0 as
0 0
u jþm v jm
f m0 ¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ðm0 ¼ j, j þ 1,   , j  1, jÞ, ð20:51Þ
ð j þ m0 Þ!ð j  m0 Þ!

we obtain
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
X ð j þ mÞ!ð j  mÞ!ð j þ m0 Þ!ð j  m0 Þ!
Rða, bÞð f m Þ ¼ f 0
μ,m 0 m
ð j þ m  μÞ!μ!ðm0  m þ μÞ!ð j  m0  μÞ!
0
ða Þ jþmμ bμ ðb Þm mþμ a jm μ :
0
ð20:52Þ

Equating (20.52) with (20.48), we get


pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
ð jÞ
X ð j þ mÞ!ð j  mÞ!ð j þ m0 Þ!ð j  m0 Þ!
Dm0 m ða, bÞ ¼
μ ð j þ m  μÞ!μ!ðm0  m þ μÞ!ð j  m0  μÞ!
0
ða Þ jþmμ bμ ðb Þm mþμ a jm μ :
0
ð20:53Þ

In (20.53), factorials of negative integers must be avoided; see (3.216). Finally,


replacing a (or a) and b (or b) with e2ðαþγÞ cos β2 [or e2ðαþγÞ cos β2] and e2ðαγÞ sin β2
i i i

[or e2ðγαÞ sin β2] by use of (20.45), respectively, we get


i

pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
ð jÞ
X ð1Þm0 mþμ ð j þ mÞ!ð j  mÞ!ð j þ m0 Þ!ð j  m0 Þ!
Dm0 m ðα, β, γ Þ ¼
μ ð j þ m  μÞ!μ!ðm0  m þ μÞ!ð j  m0  μÞ!
 2jþmm0 2μ  m0 mþ2μ
iðαm0 þγmÞ β β
e cos sin : ð20:54Þ
2 2

Equation (20.54) is called Wigner formula [5–7].


ð jÞ
To confirm that Dm0 m is indeed unitary, we carry out following calculations. From
(20.47) we have
814 20 Theory of Continuous Groups

Xj Xj ju jþm v jm j Xj juj2ð jþmÞ jvj2ð jmÞ


2
j f j2 ¼
m¼j m m¼j ð j þ mÞ!ð j  mÞ!
¼ m¼j ð j þ mÞ!ð j  mÞ!
:

Meanwhile, we have

h i2j X2j ð2jÞ!juj2ð2jkÞ jvj2k Xj ð2jÞ!juj2ð jþmÞ jvj


2ð jmÞ
2
juj þ jvj 2
¼ ¼ ,
k¼0 ð2j  k Þ!k! m¼j ð j þ mÞ!ð j  mÞ!

where with the last equality k was replaced with j  m. Comparing the above two
equations, we get

Xj h i2j
1
j f j2 ¼ juj2 þ jvj2 : ð20:55Þ
m¼j m ð2jÞ!

Viewing (20.46) as variables transformation and taking adjoint of (20.46), we


have
   
v0 { v
¼U , ð20:56Þ
u0 u
 
a b
where we defined U ¼ (i.e., a unitary matrix). Operating (20.56) on
b a
both sides of (20.46) from the right, we get
     
v0 v v
ðv0 u0 Þ ¼ ðv uÞUU { ¼ ð v uÞ :
u0 u u

That is, we have a quadratic form described by

ju0 j þ jv0 j ¼ juj2 þ jvj2 :


2 2
ð20:57Þ
Pj 2
From (20.55) and (20.57), we conclude that m¼j j f m j (i.e., the length of
ð jÞ
vector) is invariant under the transformation Dm0 m. This implies that the transforma-
ð jÞ
tion is unitary and, hence, that Dm0 m is a unitary matrix [5].

20.2.3 SO(3) Representation Matrices and Spherical Surface


Harmonics

Once representation matrices D( j )(α, β, γ) have been obtained, we are able to get
further useful information. We immediately notice that (20.54) is related to (3.216)
20.2 Rotation Groups: SU(2) and SO(3) 815

that is explicitly described by trigonometric functions. Putting m ¼ 0 in (20.54) we


have
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
ð jÞ ð jÞ
X ð1Þm0 þμ j! ð j þ m0 Þ!ð j  m0 Þ!
Dm0 0 ðα, β, γ Þ ¼ Dm0 0 ðα, βÞ
¼ μ ð j  μÞ!μ!ðm0 þ μÞ!ð j  m0  μÞ!
 2jm0 2μ  m0 þ2μ
0 β β
eiαm cos sin : ð20:58Þ
2 2

Note that (20.58) does not depend on the variable γ, because m ¼ 0 leads to
eimγ ¼ ei  0  γ ¼ e0 ¼ 1 in (20.54). Notice also that the event of m ¼ 0 in (20.54)
never happens with the case where j is a half-odd-integer, but happens with the case
where j is an integer l. In fact, D(l )(α, β, γ) is a (2l + 1)-degree representation of SO
(3). Later in this section, we will give a full matrix form in the case of l ¼ 1.
Replacing m0 with m in (20.58) we get
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi  2jm2μ  mþ2μ
ð jÞ
X ð1Þmþμ j! ð j þ mÞ!ð j  mÞ! β β
iαm
Dm0 ðα,β,γ Þ ¼ μ ð j  μÞ!μ!ðm þ μÞ!ð j  m  μÞ!
e cos sin :
2 2
ð20:59Þ

For (20.59) to be consistent with (3.216), by changing notations of variables and


replacing j with an integer l we further rewrite (20.59) to get
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi  2l2rm  2rþm
ðlÞ
X ð1Þmþr l! ðl þ mÞ!ðl  mÞ! θ θ
imϕ
Dm0 ðϕ, θÞ ¼ r r!ðl  m  r Þ!ðl  r Þ!ðm þ r Þ!
e cos sin :
2 2
ð20:60Þ

Notice that in (20.60) we deleted the variable γ as it was redundant. Comparing


(20.60) with (3.216), we get [5].

h i rffiffiffiffiffiffiffiffiffiffiffiffi
ðlÞ 4π m
Dm0 ðϕ, θÞ ¼ Y ðθ, ϕÞ: ð20:61Þ
2l þ 1 l

From a general point of view, let us return to a discussion of SU(2) and introduce
a following important theorem in relation to the basis functions of SU(2).
Theorem 20.1 [1] Let {ψ j, ψ j + 1,   , ψ j} be a set of (2j + 1) functions. Let D( j )
be a representation of the special unitary group SU(2). Suppose that we have a
following expression for ψ m, k (m ¼  j,   , j) such that
h i
ð jÞ
ψ m,k ðα, β, γ Þ ¼ Dmk ðα, β, γ Þ , ð20:62Þ

where α, β, and γ are Euler angles that appeared in (17.101). Then, the above
functions are basis functions of the representation of D( j ).
816 20 Theory of Continuous Groups

Fig. 20.2 Coordinate I


systems O, I, and II and their ( , , )
transformation. Regarding
the symbols and notations,
see text
O

Proof In Fig. 20.2 we show coordinate systems O, I, and II. Let P, Q, R be operators
of transformation (i.e., rotation) between the coordinate system as shown. The
transformation Q is defined as the coordinate transformation that transforms I to
II. Note that their inverse transformations, e.g., Q1 exists. In Fig. 20.2, P and R are
specified as P(α0, β0, γ 0) and R(α, β, γ), respectively. Then, we have [1, 2]

Q1 R ¼ P: ð20:63Þ

According to the expression (19.12), we have

  h i
ð jÞ 
Qψ m,k ðα, β, γ Þ ¼ ψ m,k Q1 ðα, β, γ Þ ¼ ψ m,k ðα0 , β0 , γ 0 Þ ¼ Dmk Q1 R ,

where the last equality comes from (20.62) and (20.63); (α, β, γ) and (α0, β0, γ 0) stand
for the transformations R(α, β, γ) and P(α0, β0, γ 0), respectively. The matrix element
ð jÞ 
Dmk Q1 R can be written as

ð jÞ 
X  ð jÞ
Xh i
ð jÞ
Dmk Q1 R ¼ DðmnjÞ Q1 Dnk ðRÞ ¼ DðnmjÞ ðQÞ Dnk ðRÞ,
n n

where with the last equality we used the unitarity of the representation matrix D( j ).
Taking complex conjugate of the above expression, we get
X h i
ð jÞ
Qψ m,k ðα, β, γ Þ ¼ ψ m,k ðα0 , β0 , γ 0 Þ ¼ DðnmjÞ ðQÞ Dnk ðRÞ
n
X h i X
ð jÞ ð jÞ
¼ n
D nm ð Q Þ D nk ð α, β, γ Þ ¼ ψ n,k ðα, β, γ ÞDðnmjÞ ðQÞ, ð20:64Þ
n

where with the last equality we used the supposition (20.62). Equation (20.64)
implies that ψ m, k (m ¼  j,   , 0,   , j) are basis functions of the representation
of D( j ). This completes the proof. ∎
If we restrict Theorem 20.1 to SO(3), we get an important result. Replacing j with
an integer l and putting k ¼ 0 in (20.62) and (20.64), we get
20.2 Rotation Groups: SU(2) and SO(3) 817

X
Qψ m,0 ðα, β, γ Þ ¼ ψ m,0 ðα0 , β0 , γ 0 Þ ¼ n
ψ n,0 ðα, β, γ ÞDðnm

ðQÞ,
h i ð20:65Þ
ðlÞ
ψ m,0 ðα, β, γ Þ ¼ Dm0 ðα, β, γ Þ :

This expression shows that the complex conjugates of individual components for
the “center” column vector of the representation matrix give the basis functions of
the representation of D(l ).
Removing γ as it is redundant and by the aid of (20.61), we get
rffiffiffiffiffiffiffiffiffiffiffiffi
4π m
ψ m,0 ðα, βÞ  Y ðβ, αÞ:
2l þ 1 l

Thus, in combination with (20.61), Theorem 20.1 immediately indicates that the
spherical surface harmonics Y m l ðθ, ϕÞ ðm ¼ l,   , 0,   , lÞ span the representation
space with respect to D(l ). That is, we have
X
l ðβ, αÞ ¼
QY m Y n ðβ, αÞDðnm
n l

ðQÞ: ð20:66Þ

Now, let us get back to (20.63). From (20.63), we have

Q ¼ RP1 : ð20:67Þ

Since P represents an arbitrary transformation, we may choose any (orthogonal)


coordinate system I in ℝ3. Meanwhile, a transformation which transforms I into II is
necessarily present and given by Q that is expressed as (20.67); see (11.72) and the
relevant argument of Sect. 11.4. Consequently, Q represents an arbitrary transfor-
mation in ℝ3 in turn.
Now, in (20.67) putting P ¼ E we get Q  R(α, β, γ). Then, replacing Q with R in
(20.66) as a special case we have
X
Rðα, β, γ ÞY m
l ðβ, αÞ ¼ Y n ðβ, αÞDðnm
n l

ðα, β, γ Þ: ð20:68Þ

The LHS of (20.68) is associated with three successive transformations. To


describe it appropriately, let us recall (19.12) again. We have
 
Rf ðrÞ ¼ f R1 ðrÞ : ð19:12Þ

Now, we are thinking of the case where R in (19.12) is described by R(α, β, γ) or


f
R3 ¼ Rzα R0y0 β R00 z00 γ in (17.101). That is, we have
818 20 Theory of Continuous Groups

h i
Rðα, β, γ Þf ðrÞ ¼ f Rðα, β, γ Þ1 ðrÞ :

f3 of (17.101), we readily get [5].


Considering the constitution of R

Rðα, β, γ Þ1 ¼ Rðγ, β, αÞ:

Thus, we have

Rðα, β, γ Þf ðrÞ ¼ f ½Rðγ, β, αÞðrÞ: ð20:69Þ

Applying (20.69) to (20.68), we get the successive (inverse) transformations of


the functions such that

l ðβ, αÞ ! Y l ðβ, α  γ Þ ! Y l ðβ  β, α  γ Þ ! Y l ðβ  β, α  γ  αÞ:


Ym m m m

The last entry in the above is [5]

l ðβ  β, α  γ  αÞ ¼ Y l ð0, γ Þ:
Ym m

Then, from (20.68) we get


X
l ð0, γ Þ ¼
Ym Y nl ðβ, αÞDðnm

ðα, β, γ Þ: ð20:70Þ
n

Meanwhile, from (20.61) we have


rffiffiffiffiffiffiffiffiffiffiffiffih i
2l þ 1 ðlÞ
l ð0, γ Þ ¼
Ym

Dm0 ðγ, 0Þ :

Returning to (20.60), if we would have sin 2θ factor in (20.60) or in (3.216), the


ðlÞ
relevant terms vanish with Dm0 ðγ, 0Þ except for special cases where 2r + m ¼ 0 in
(3.216) or (20.60). But, to avoid having a negative integer inside the factorial in the
denominator of (20.60), we must have r + m 0. Since r 0, we must have
2r + m ¼ (r + m) + r 0 as well. In other words, only if r ¼ m ¼ 0 in (20.60), we
ðlÞ
have 2r + m ¼ 0, for which Dm0 ðγ, 0Þ does not vanish. Using these conditions in
(20.60), (20.60) is greatly simplified to be

ðlÞ
Dm0 ðγ, 0Þ ¼ δm0 :

From (20.61) we get [5]


20.2 Rotation Groups: SU(2) and SO(3) 819

rffiffiffiffiffiffiffiffiffiffiffiffi
2l þ 1
l ð0, γ Þ
Ym ¼
4π 0m
δ : ð20:71Þ

Notice that this expression is consistent with (3.147) and (3.148). Meanwhile,
replacing ϕ and θ in (20.61) with α and β, respectively, multiplying the resulting
ðlÞ
equation by Dmk ðα, β, γ Þ, and further summing over m, we get
rffiffiffiffiffiffiffiffiffiffiffiffi
X 4π m ðlÞ
X h ðlÞ i
ðlÞ
Y ð β, αÞD ðα, β, γ Þ ¼ D ð α, β, γ Þ Dmk ðα, β, γ Þ
m 2l þ 1 l mk m m0
rffiffiffiffiffiffiffiffiffiffiffiffi
4π k
¼ δ0k ¼ Y ð0, γ Þ, ð20:72Þ
2l þ 1 l

where with the second equality we used the unitarity of the matrices Dðnm

ðα, β, γ Þ (see
Sect. 20.2.2)
qffiffiffiffiffiffiffi and with the last equality we used (20.70) multiplied by the constant

2lþ1 . At the same time, we recovered (20.71) comparing the last two sides of
(20.72).
Rewriting (20.48) in a (2j + 1, 2j + 1) matrix form, we get

f j , f jþ1 ,   , f ℜða, bÞ
j1 , f j
0 1
Dj,j  Dj,j
 B C ð20:73Þ
¼ f j , f jþ1 ,   , f j1 , f j B
@ ⋮ ⋱ ⋮ C A,
D j,j  D j,j

where fm (m ¼  j, j + 1,   , j  1, j) is described as in (20.47). If j is an integer l,


we can choose Y m l ðβ, αÞ for individual fm (i.e., basis functions).
To get used to the abstract representation theory in continuous groups, we wish to
think of a following trivial case: Putting j ¼ 0 in (20.54), (20.54) becomes trivial, but
still valid. In that case, we have m0 ¼ m ¼ 0 so that the inside of the factorial can be
non-negative. In turn, we must have μ ¼ 0 as well. Thus, we have

ð0Þ
D00 ðα, β, γ Þ ¼ 1:

Or, inserting l ¼ m ¼ 0 into (20.61), we have


h i pffiffiffiffiffi 0 pffiffiffiffiffiffiffiffiffiffi
ð 0Þ
D00 ðϕ, θÞ ¼1¼ 4π Y 0 ðθ, ϕÞ, i:e:, Y 00 ðθ, ϕÞ ¼ 1=4π :

Thus, we recover (3.150).


Next, let us reproduce three-dimensional representation D(1) using (20.54). A full
matrix form is expressed as
820 20 Theory of Continuous Groups

0 1
D1,1 D1,0 D1,1
B C
Dð1Þ ¼ B
@ D0,1 D0,0 D0,1 C A
D1,1 D1,0 D1,1
0 β 1 β 1
eiðαþγ Þ cos 2 pffiffiffi eiα sin β eiðαγÞ sin 2 ð20:74Þ
B 2 2 2 C
B C
B C
B 1
pffiffiffi e sin β C
1 iγ
¼ B  pffiffiffi eiγ sin β cos β C:
B 2 2 C
B C
@ β 1 βA
eiðγαÞ sin 2  pffiffiffi eiα sin β eiðαþγÞ cos 2
2 2 2

Note that (20.74) is a unitary matrix. The confirmation is left for readers. The
complex conjugate of the column vector m ¼ 0 in (20.61) with l ¼ 1 is described by
0 1
1
pffiffiffi eiα sin β
h i B
B 2
C
C
Dm0 ðϕ, θÞ ¼ B
ð1Þ C
B cos β C ð20:75Þ
@ 1 iα A
 pffiffiffi e sin β
2

that is related to the spherical surface harmonics (see Sect. 3.6); also see p. 760,
Table 15.4 of Ref. [4].
Now, (20.25), (20.61), (20.65), and (20.73) provide a clue to relating D(1) to the
f3 of (17.101) that includes Euler angles. The argument is
rotation matrix of ℝ3, i.e., R
as follows: As already shown in (20.61) and (20.65), we have related the basis
vectors ( f1 f0 f1) of D(1) to the spherical harmonics Y 1 1 Y 01 Y 11 . Denoting the
spherical basis by

1 1
ff e e
1  pffiffiffi ðx  iyÞ, f 0  z, f 1  pffiffiffi ðx  iyÞ, ð20:76Þ
2 2

we have
0 1
1 1
pffiffiffi  pffiffiffi C
B 2 0 2C
B
ff e e
1 f 0 f 1 ¼ ðx z yÞU ¼ ðx z yÞB
B 0 1 0 C C, ð20:77Þ
@ i i A
 pffiffiffi 0  pffiffiffi
2 2

where we define a (unitary) matrix U as


20.2 Rotation Groups: SU(2) and SO(3) 821

0 1
1 1
pffiffiffi 0  pffiffiffi C
B 2
B 2C
UB
B 0 1 0 C C:
@ i i A
 pffiffiffi 0  pffiffiffi
2 2

Equation (20.77) describes the relationship among two set of functions


f
f 1 fe0 fe1 and (x z y). We should not regard (x z y) as coordinate in ℝ3 as
mentioned in Sect. 20.1. The (3, 3) matrix of (20.77) is essentially the same as P of
0
(20.22). Following (20.73) and taking account of (20.25), we express ff e0 e0
1 f 0 f 1
as

0
ff e 0 e 0  ff
1 f 0 f 1
e e f e e ð1Þ
1 f 0 f 1 Rða, bÞ ¼ f 1 f 0 f 1 D : ð20:78Þ

This means that a set of functions ff e e


1 f 0 f 1 forms the basis vectors of D
(1)
as
well. Meanwhile, the relation (20.77) holds after the transformation, because (20.77)
is independent of the choice of specific coordinate systems. That is, we have

0
ff e 0 e 0 ¼ ðx0 z0 y0 Þ U:
1 f 0 f 1

Then, from (20.78) we get

0
ðx0 z0 y0 Þ ¼ ff e 0 e 0 1 ¼ ff
1 f 0 f 1 U
e e ð1Þ 1
1 f 0 f 1 D U

¼ ðx z yÞUDð1Þ U 1 : ð20:79Þ

Defining a unitary matrix V as


0 1
1 0 0
B C
V ¼@0 0 1A
0 1 0

and operating V on both sides of (20.79) from the right, we get

ðx0 z0 y0 ÞV ¼ ðx z yÞV  V 1 UDð1Þ U 1 V:

That is, we have


 1
ðx0 y0 z0 Þ ¼ ðx y zÞ U 1 V Dð1Þ U 1 V: ð20:80Þ
822 20 Theory of Continuous Groups

Note that (x0 z0 y0)V ¼ (x0 y0 z0) and (x z y)V ¼ (x y z); that is, V exchanges the
order of z and y in (x z y).
Meanwhile, regarding (x y z) as the basis vectors in ℝ3 we have

ð x0 y0 z 0 Þ ¼ ð x y z Þ R , ð20:81Þ

where R represents a (3, 3) orthogonal matrix.


Equation (20.81) represents a rotation in SO(3). Since, x, y, and z are linearly
independent, comparing (20.80) and (20.81) we get
 1  {
R  U 1 V Dð1Þ U 1 V ¼ U 1 V Dð1Þ U 1 V: ð20:82Þ

β β
More explicitly, using a ¼ e2ðαþγÞ cos and b ¼ e2ðαγÞ sin
i i
2 2 as in the case of
(20.45), we describe R as
0   1
Re a2  b2 Im a2 þ b2 2 Re ðabÞ
B   C
R ¼ @ Im a2  b2 Re a2 þ b2 2ImðabÞ A: ð20:83Þ
2 Re ðab Þ 2Imðab Þ 2
j aj  j bj 2

Comparing (20.83) with (17.101), we find out that R is identical to f R3 of


f3 are
(17.101). Equations (20.82) and (20.83) clearly show that D(1) and R  R
connected to each other through the unitary similarity transformation, even though
these matrices differ in that the former matrix is unitary and the latter is a real
orthogonal matrix. That is, D(1) and f
R3 are equivalent in terms of representation; see
Schur’s First Lemma of Sect. 18.3. Thus, D(1) is certainly a representation matrix of
f3 is given by
SO(3). The trace of D(1) and R

f3 ¼ cos ðα þ γ Þð1 þ cos βÞ þ cos β:


TrDð1Þ ¼ Tr R

Let us continue the discussion still further. We think of the quantity (x/r y/r z/r),
where r is radial coordinate of (20.25). Operating (20.82) on (x/r y/r z/r) from the
right, we have

ðx=r y=r z=r ÞR ¼ ðx=r y=r z=r ÞV 1 UDð1Þ U 1 V

¼ ff e e ð1Þ 1
1 =r f 0 =r f 1 =r D U V:

From (20.25), (20.61), (20.75), and (20.76), we get


 
1 1
ff e e
1 =r f 0 =r f 1 =r ¼ pffiffiffi eiα sin β cos β  pffiffiffi eiα sin β : ð20:84Þ
2 2

Using (20.74) and (20.84), we obtain


20.2 Rotation Groups: SU(2) and SO(3) 823

ff e e
1 =r f 0 =r f 1 =r D
ð1Þ
¼ ð0 1 0Þ:

This is a tangible example of (20.72). Then, we have

ff e e ð1Þ 1
1 =r f 0 =r f 1 =r D U V ¼ ðx=r y=r z=r ÞR ¼ ð0 0 1Þ: ð20:85Þ

For the confirmation of (20.85), use the spherical coordinate representation (3.17)
to denote x, y, and z in the above equation. This seems to be merely a confirmation of
the unitarity of representation matrices; see (20.72). Nonetheless, we emphasize that
the simple relation (20.85) holds only with a special case where the parameters α and
β in D(1) (or R ) are identical with the angular components of spherical coordinates
associated with the Cartesian coordinates x, y, and z. Equation (20.85), however,
does not hold in a general case where the parameters α and β are different from those
angular components. In other words, (20.68) that describes the special case where
Q  R(α, β, γ) leads to (20.85); see Fig. 20.2. Equation (20.66), on the other hand, is
used for the general case where Q represents an arbitrary transformation in ℝ3.
Regarding further detailed discussion on the topics, readers are referred to the
literature [1, 2, 5, 6].

20.2.4 Irreducible Representations of SU(2) and SO(3)

In this section we further explore characteristics of the representation of SU(2) and


SO(3) and show the irreducibility of the representation matrices of SU(2) and SO(3).
ð jÞ
To prove the irreducibility of the SU(2) representation matrices Dm0 m , we use
Schur’s lemmas already explained in Sect. 18.3. For this purpose we use special
types of matrices of [5]. In (20.54) we consider a special case where m0 ¼ j. Then, the
factor ( j  m0  μ) ! ¼ (μ)! in the denominator survives only when μ ¼ 0. Hence,
sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi   jþm   jm
ð jÞ jm ð2jÞ! iðαjþγmÞ β β
Djm ðα, β, γ Þ ¼ ð1Þ e cos sin :
ð j þ mÞ!ð j  mÞ! 2 2
ð20:86Þ

ð jÞ
Since in (20.86) the exponential term never vanishes, Djm ðα, β, γ Þ does not vanish
either except for special values of β.
ð jÞ
Next, we consider Dm0 m ðα, 0, γ Þ. For this term not to vanish, the power of sin β2 in
(20.54) must be zero; i.e., we have

m0  m þ 2μ ¼ 0: ð20:87Þ
824 20 Theory of Continuous Groups

Meanwhile, no condition is imposed on the power of cos β2 in (20.54); note that


 2jþmm0 2μ
cos 02 ¼ 1. In the denominator of (20.54), to avoid a negative number in
factorials we must have m0  m + μ 0 and μ 0. If μ > 0,
m0  m + 2μ ¼ (m0  m + μ) + μ > 0 as well. Therefore, for (20.87) to hold, we
must have μ ¼ 0. From (20.87), in turn, this means m0  m ¼ 0, i.e., m0 ¼ m. Then,
we have a greatly simplified equation

ð jÞ 0
Dm0 m ðα, 0, γ Þ ¼ δm0 m eiðαm þγmÞ : ð20:88Þ

ð jÞ
This implies that Dm0 m ðα, 0, γ Þ is diagonal. Suppose that we have a matrix
ð jÞ
A ¼ ðAm0 m Þ. If A commutes with Dm0 m ðα, 0, γ Þ, it must be diagonal unless
ð jÞ
Dm0 m ðα, 0, γ Þ ¼ cδm0 m, where c is a constant independent of m0 and m. But,
ð jÞ
Dm0 m ðα, 0, γ Þ 6¼ cδm0 m because of (20.88). That is, A should have a form Am0 m ¼
ð jÞ
am δm0 m . If A commutes with Dm0 m ðα, β, γ Þ, then taking the ( j, m) component of the
product we have
X ð jÞ
X ð jÞ ð jÞ
ADð jÞ ¼ A D ¼
k jk km
aδ D
k k jk km
¼ aj Djm : ð20:89Þ
jm

Meanwhile,
X ð jÞ
X ð jÞ ð jÞ
Dð jÞ A ¼ k
Djk Akm ¼ k
Djk am δkm ¼ am Djm : ð20:90Þ
jm

From (20.89) and (20.90), we have

ð jÞ 
Djm ðα, β, γ Þ aj  am ¼ 0:

ð jÞ
As already mentioned above, the matrix elements Djm ðα, β, γ Þ of (20.86) do not
ð jÞ
vanish except for special values of β. Therefore, for A to commute with Dm0 m ðα, β, γ Þ,
we must have aj ¼ am  a for all m. This implies

Am0 m ¼ aδm0 m : ð20:91Þ

ð jÞ
Thus, from Schur’s Second Lemma Dm0 m ðα, β, γ Þ is irreducible.
We must be careful, however, about the application of Schur’s Second Lemma.
This is because Schur’s Second Lemma described in Sect. 18.3 says the
following:
A representation D is irreducible ⟹ The matrix M that is commutative with all
D(g) is limited to a form of M ¼ cE.
20.2 Rotation Groups: SU(2) and SO(3) 825

Nonetheless, we are uncertain of truth or falsehood of the converse proposition.


Fortunately, however, the converse proposition is true if the representation is unitary.
We prove the converse proposition by its contraposition such that.
A representation D is not irreducible (i.e., reducible) ⟹ Of the matrices that are
commutative with all D(g), we can find at least one matrix M of the type other than
M ¼ cE.
In this context, Theorem 18.2 (Sect. 18.2) as well as (18.38) teaches us that the
unitary representation D(g) is completely reducible so that we can describe it as a
direct sum such that

DðgÞ ¼ Dð1Þ ðgÞ Dð2Þ ðgÞ  DðωÞ ðgÞ, ð20:92Þ

where g is any group element. Then, we can choose a following matrix A for a linear
transformation that is commutative with D(g):

A ¼ αð1Þ E 1 αð2Þ E 2  αðωÞ E ω , ð20:93Þ

where α(1),   , α(ω) are complex constants that may be different from one another;
E1,   , Eω are unit matrices having the same dimension as D(1)(g),   , D(ω)(g),
respectively. Then, even though A is not a type of A ¼ αδm0 m, A commutes with D(g)
for any g.
Here we rephrase Schur’s Second Lemma as the following theorem.
Theorem 20.2 Let D be a unitary representation of a group G. Suppose that with
8
g 2 G we have

DðgÞM ¼ MDðgÞ: ð18:48Þ

A necessary and sufficient condition for the said unitary representation to be


irreducible is that linear transformations that are commutative with D(g) (8g 2 G)
are limited to those of a form described by A ¼ αδm0 m , where α is a (complex)
constant.
Next, we examine the irreducibility of the SO(3) representation matrices.
We develop the discussion in conjunction with explanation about the important
properties of the representation matrices of SU(2) as well as SO(3). We rewrite
(20.54) when j ¼ l (l : zero or positive integers) to relate it to the rotation in ℝ3.

h i pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
X ð1Þm0 mþμ ðl þ mÞ!ðl  mÞ!ðl þ m0 Þ!ðl  m0 Þ!
ðlÞ
D ðα, β, γ Þ ¼
m0 m μ ðl þ m  μÞ!μ!ðm0  m þ μÞ!ðl  m0  μÞ!
 2lþmm0 2μ  m0 mþ2μ
0 β β
 eiðαm þγmÞ cos sin :
2 2
ð20:94Þ
826 20 Theory of Continuous Groups

f3 (appearing in
Meanwhile, as mentioned in Sect. 20.2.3 the transformation R
Sect. 17.4.2) described by

f
R3 ¼ Rzα R0 y0 β R00 z00 γ

may be regarded as R(α, β, γ) of (20.63). Consequently, the representation matrix


D(l )(α, β, γ) is described by

ðlÞ 
DðlÞ ðα, β, γ Þ  DðlÞ f
R3 ¼ DðαlÞ ðRzα ÞDβ R0y0 β DðγlÞ R00 z00 γ : ð20:95Þ

Equation (20.95) is based upon the definition of representation (18.2). This


ð12Þ
notation is in parallel with (20.45) in which Dα,β,γ is described by the product of
three exponential functions of matrices each of which was associated with the
rotation characterized by Euler angles of (17.101). Denoting DðαlÞ ðRzα Þ ¼
ðlÞ 
DðlÞ ðα, 0, 0Þ, Dβ R0y0 β ¼ DðlÞ ð0, β, 0Þ, and DðγlÞ R00 z00 γ ¼ DðlÞ ð0, 0, γ Þ, we have

DðlÞ ðα, β, γ Þ ¼ DðlÞ ðα, 0, 0ÞDðlÞ ð0, β, 0ÞDðlÞ ð0, 0, γ Þ, ð20:96Þ

where with each representation two out of three Euler angles are zero.
In light of (20.94), we estimate each factor of (20.96). By the same token as
before, putting β ¼ γ ¼ 0 in (20.94) let us examine on what conditions
 m0 mþ2μ
sin 02 survives. For this factor to survive, we must have m0  m + 2μ ¼ 0
as in (20.87). Then, following the argument as before, we should have μ ¼ 0 and
m0 ¼ m. Thus, D(l )(α, 0, 0) must be a diagonal matrix. This is the case with
D(l )(0, 0, γ) as well.
Equation (3.216) implies that the functions Y m l ðβ, αÞ ðm ¼ l,   , 0,   , lÞ are
eigenfunctions with regard to the rotation about the z-axis. In other words, we have

imα m
l ðθ, ϕÞ ¼ Y l ðθ, ϕ  αÞ ¼ e
Rzα Y m Y l ðθ, ϕÞ:
m

l ðθ, ϕÞ can be described as


This is because from (3.216) Y m

l ðθ, ϕÞ ¼ F ðθ Þe
Ym imϕ
,

where F(θ) is a function of θ. Hence, we have

imðϕαÞ
l ðθ, ϕ  αÞ ¼ F ðθ Þe
Ym ¼ eimα F ðθÞeimϕ ¼ eimα Y m
l ðθ, ϕÞ:

Therefore, with a full matrix representation we get


20.2 Rotation Groups: SU(2) and SO(3) 827


Y l lþ1
l Yl   Y 0l   Y ll Rzα
0 1
eiðlÞα
B C
B C
 l lþ1 B
l B
eiðlþ1Þα C
¼ Y l Y l   Y l   Y l B
0 C
C
B ⋱ C
@ A
eilα ð20:97Þ
0 ilα
1
e
B C
B C
 l lþ1 B eiðl1Þα C
¼ Y l Y l   Y 0l   Y ll B
B
C:
C
B ⋱ C
@ A
eilα

With a shorthand notation, we have

DðlÞ ðα, 0, 0Þ ¼ eimα δm0 m ðm ¼ l,   , lÞ: ð20:98Þ

From (20.96), with the (m0, m) matrix elements of D(l )(α, β, γ) we get
h i X h h h i
DðlÞ ðα, β, γ Þ ¼ D ðlÞ
ðα, 0, 0 Þ 0
ms D ðlÞ
ð 0, β, 0 Þst D ðlÞ
ð 0, 0, γ Þ
m0 m s,t
X h i tm
isα ðlÞ imγ
¼ s,t
e δm0 s D ð0, β, 0Þ e δtm ð20:99Þ
0
h i st

¼ eim α DðlÞ ð0, β, 0Þ 0 eimγ


mm

Meanwhile, from (20.94) we have

h i pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
X ð1Þm0 mþμ ðl þ mÞ!ðl  mÞ!ðl þ m0 Þ!ðl  m0 Þ!
ðlÞ
D ð0, β, 0Þ ¼
m0 m μ ðl þ m  μÞ!μ!ðm0  m þ μÞ!ðl  m0  μÞ!
 2lþmm0 2μ  m0 mþ2μ:
β β
cos sin ð20:100Þ
2 2

Thus, we find that [DðlÞ ðα, β, γ Þm0 m of (20.99) has been factorized into three
factors. Equation (20.94) has such an implication. The same characteristic is shared
with the SU(2) representation matrices (20.54) more generally. This can readily be
understood from the aforementioned discussion.
Taking D(1) of (20.74) as an example, we have
828 20 Theory of Continuous Groups

Dð1Þ ðα, β, γ Þ ¼ Dð1Þ ðα, 0, 0ÞDð1Þ ð0, β, 0ÞDð1Þ ð0, 0, γ Þ


0 β 1
2β 1
pffiffiffi sin β
0 iα 1B cos 2 sin 2
2 2 C0 iγ 1
e 0 0 B C e 0 0
B CB CB C
¼B 0 1 0 CBB  p
1
ffiffi
ffi sin β cos β p
1
ffiffi
ffi sin β CB
C@ 0 1 0 C
@ AB 2 2 C A
B C
0 0 eiα @ β 1 β A 0 0 e iγ
sin 2  pffiffiffi sin β cos2
2 2 2
0 β 1 β 1
eiðαþγÞ cos 2 pffiffiffi eiα sin β eiðαγÞ sin 2
B 2 2 2 C
B C
B 1 C
B  pffiffiffi eiγ sin β cos β pffiffiffi e sin β C
1 iγ
¼B C:
B 2 2 C
B C
@ βA
iðγαÞ 2β 1 iα i ðαþγ Þ
e sin  pffiffiffi e sin β e cos 2
2 2 2
ð20:74Þ

In this way, we have reproduced the result of (20.74).


Once D(l )(α, β, γ) has been factorized as in (20.94), we can readily examine
whether the representation is irreducible. To know what kinds of matrices commute
with D(l )(α, β, γ), it suffices to examine whether the matrices commute with individ-
ual factors. The argument is due to Hamermesh [5].
(i) Let A be a matrix having a dimension the same as that of D(l )(α, β, γ), namely
A is a (2l + 1, 2l + 1) square matrix with a form of A ¼ (aij). First, we examine the
commutativity of D(l )(α, 0, 0) and A. With the matrix elements we have
h i X h i X
ADðlÞ ðα, 0, 0Þ ¼ a m0k D
ðlÞ
ðα, 0, 0 Þ ¼ am0 k eimα δkm
m0 m k km
k

¼ am0 m eimα , ð20:101Þ

where the second equality comes from (20.98). Also, we have


h i Xh i 0
DðlÞ ðα, 0, 0ÞA ¼ DðlÞ ðα, 0, 0Þ akm ¼ am0 m eim α : ð20:102Þ
m0 m k m0 k

From (20.101) and (20.102), for a condition of commutativity we have

0
am0 m eimα  eim α ¼ 0:

If m0 6¼ m, we should have am0 m ¼ 0. If m0 ¼ m, am0 m does not have to vanish,


but may take an arbitrary complex number. Hence, A is a diagonal matrix,
namely
20.2 Rotation Groups: SU(2) and SO(3) 829

A ¼ am δ m 0 m , ð20:103Þ

where am is an arbitrary complex number.


(ii) Next we examine the commutativity of D(l )(0, β, 0) and A of (20.103). Using
(20.66), we have
X
l ðβ, αÞ ¼
QY m Y n ðβ, αÞDðnm
n l

ðQÞ: ð20:66Þ

Choosing R(0, θ, 0) for Q and putting α ¼ 0 in (20.66), we have


X
Rð0, θ, 0ÞY m
l ðβ, 0Þ ¼ Y n ðβ, 0ÞDðnm
n l

ð0, θ, 0Þ: ð20:104Þ

Meanwhile, from (20.69) we get


h i
1
Rð0, θ, 0ÞY m
l ð β, 0 Þ ¼ Y m
l Rð 0, θ, 0 Þ ð β, 0 Þ ¼ Ym
l ½Rð0, θ, 0Þðβ, 0Þ

l ðβ  θ, 0Þ:
¼ Ym ð20:105Þ

Combining (20.104) and (20.105) we get


X
l ðβ  θ, 0Þ ¼
Ym Y n ðβ, 0ÞDðnm
n l

ð0, θ, 0Þ: ð20:106Þ

Further putting β ¼ 0 in (20.106), we have


X
l ðθ, 0Þ ¼
Ym Y n ð0, 0ÞDðnm
n l

ð0, θ, 0Þ: ð20:107Þ

Taking account of (20.71), RHS does not vanish in (20.107) only when
n ¼ 0. On this condition, we obtain

ðlÞ
l ðθ, 0Þ ¼ Y l ð0, 0ÞD0m ð0, θ, 0Þ:
Ym ð20:108Þ
0

Meanwhile, from (20.71) with m ¼ 0, we have


rffiffiffiffiffiffiffiffiffiffiffiffi
2l þ 1
Y 0l ð0, 0Þ ¼ : ð20:109Þ

Inserting this into (20.108), we have


rffiffiffiffiffiffiffiffiffiffiffiffi
2l þ 1 ðlÞ
l ðθ, 0Þ
Ym ¼
4π 0m
D ð0, θ, 0Þ: ð20:110Þ

Replacing θ with θ in (20.110), we get


830 20 Theory of Continuous Groups

rffiffiffiffiffiffiffiffiffiffiffiffi
ðlÞ 4π m
D0m ð0, θ, 0Þ ¼ Y ðθ, 0Þ: ð20:111Þ
2l þ 1 l

Compare (20.111) with (20.61) and confirm (20.111) using (20.74) with
l ¼ 1. Viewing (20.111) as a “row vector,” (20.111) with l ¼ 1 can be expressed
as

ð1Þ ð1Þ ð1Þ


D0,1 ð0, θ, 0Þ D0,0 ð0, θ, 0Þ D0,1 ð0, θ, 0Þ
rffiffiffiffiffi  
4π  1 1 1
¼ Y ðθ, 0Þ Y l ðθ, 0Þ Y l ðθ, 0Þ ¼ pffiffiffi sin θ cos θ  pffiffiffi sin θ :
0 1
3 l 2 2

Getting back to our topic, with the matrix elements we have


h i X h i h i
ADðlÞ ð0, θ, 0Þ ¼ k
a k δ 0k D ðlÞ
ð 0, θ, 0 Þ ¼ a0 DðlÞ ð0, θ, 0Þ
0m km 0m

and
h i Xh i h i
DðlÞ ð0, θ, 0ÞA ¼ k
DðlÞ ð0, θ, 0Þ am δkm ¼ DðlÞ ð0, θ, 0Þ am ,
0m 0k 0m

where we assume A ¼ am δm0 m in (20.103). Since from (20.108)


[D (0, θ, 0)]0m does not vanish other than specific values of θ, for A and
(l )

D(l )(0, θ, 0) to commute we must have

am ¼ a0

for all m ¼  l,  l + 1,   , 0,   , l  1, l. Then, from (20.103) we get

A ¼ a0 δm0 m , ð20:112Þ

where a0 is an arbitrary complex number. Thus, a matrix that commutes with


both D(l )(α, 0, 0) and D(l )(0, β, 0) must be of a form of (20.112). The same
discussion holds with D(l )(α, β, γ) of (20.94) as a whole with regard to the
commutativity. On the basis of Schur’s Second Lemma (Sect. 18.3), we con-
clude that the representation D(l )(α, β, γ) is again irreducible. This is one of the
prominent features of SO(3) as well as SU(2).
As can be seen clearly in (20.97), the dimension of representation matrix
D(l )(α, β, γ) is 2l + 1. Suppose that there is another representation matrix
0 0
Dðl Þ ðα, β, γ Þ ðl0 6¼ lÞ. Then, D(l )(α, β, γ) and Dðl Þ ðα, β, γ Þ are inequivalent (see
Sect. 18.3). Meanwhile, the unitary representation of SO(3) is completely
reducible and, hence, any reducible unitary representation D can be described by
20.2 Rotation Groups: SU(2) and SO(3) 831

X
D¼ a DðlÞ ðα, β, γ Þ,
l l
ð20:113Þ

where al is zero or a positive integer. If al 2, this means that the same


representations D(l )(α, β, γ) repeatedly appear. Equation (20.113) provides a
tangible example of (20.92) expressed as

DðgÞ ¼ Dð1Þ ðgÞ Dð2Þ ðgÞ  DðωÞ ðgÞ ð20:92Þ

and applies to the representation matrices of SU(2) more generally. We will


encounter related discussion in Sect. 20.3 in connection with the direct-product
representation.

20.2.5 Parameter Space of SO(3)

As already discussed in Sect. 17.4.2, we need three angles α, β, and γ to specify the
rotation in ℝ3. Their domains are usually taken as follows:

0 α 2π, 0 β π, 0 γ 2π:

The domains defined above are referred to as a parameter space.


Yet, there are different implications depending upon choosing different coordinate
systems. The first choice is a moving coordinate system where α, β, and γ are Euler
angles (see Sect. 17.4.2). The second choice is a fixed coordinate system where α is an
azimuthal angle, β is a zenithal angle, and γ is a rotation angle. Notice that in the
former case α, β, and γ represent equivalent rotation angles. In the latter case, however,
although α and β define the orientation of a rotation axis, γ is designated as a rotation
angle. In the present section we further discuss characteristics of the parameter space.
First, we study several characteristics of (3, 3) real orthogonal matrices. (i) The (3, 3)
real orthogonal matrices have eigenvalues 1, eiγ , and eiγ (0 γ π). This is a direct
consequence of (17.77) and invariance of the characteristic equation of the matrix. The
rotation axis is associated with an eigenvector that belongs to the eigenvalue 1.
Let A ¼ (aij) (1 i, j 3) be a (3, 3) real orthogonal matrix. Let u be an
eigenvector belonging to the eigenvalue 1. We suppose that when we are thinking of
the rotation on the fixed coordinate system, u is given by a column vector such that
0 1
u1
B C
u ¼ @ u2 A :
u3

Then, we have
832 20 Theory of Continuous Groups

A u = 1u = u: ð20:114Þ

Operating AT on both sides of (20.114), we get

AT A u = Eu = u = AT u, ð20:115Þ

where we used the property of an orthogonal matrix; i.e., ATA ¼ E. From (20.114)
and (20.115), we have

A  AT u ¼ 0: ð20:116Þ

Writing (20.116) in matrix components, we have [5].


20 1 0 13 0 1
a11 a12 a13 a11 a21 a31 u1
6B C B C7 B C
4@ a21 a22 a23 A  @ a12 a22 a32 A5@ u2 A ¼ 0: ð20:117Þ
a31 a32 a33 a13 a23 a33 u3

That is,

ða12  a21 Þu2 þ ða13  a31 Þu3 ¼ 0,


ða21  a12 Þu1 þ ða23  a32 Þu3 ¼ 0,
ða31  a13 Þu1 þ ða32  a23 Þu2 ¼ 0: ð20:118Þ

Solving (20.118), we get

u1 : u2 : u3 ¼ ða32  a23 Þ : ða13  a31 Þ : ða21  a12 Þ: ð20:119Þ


1 0
u1
B C
If we normalize u, i.e., u12 + u22 + u32 ¼ 1, @ u2 A gives direction cosines of the
u3
eigenvector, namely the rotation axis. Equation (20.119) applies to any orthogonal
matrix. A simple example is (20.3); a more complicated example can be seen in
(17.107). Confirmation is left for readers. We use this property soon below.
Let us consider infinitesimal transformations of rotation. As already implied in
(20.5), an infinitesimal rotation θ around the z-axis is described by
0 1
1 θ 0
B C
Rzθ ¼ @ θ 1 0 A: ð20:120Þ
0 0 1

Now, we newly introduce infinitesimal rotation operators as below:


20.2 Rotation Groups: SU(2) and SO(3) 833

0 1 0 1 0 1
1 0 0 1 0 η 1 ζ 0
B C B C B C
Rxξ ¼ @ 0 1 ξ A, Ryη ¼ @ 0 1 0 A, Rzζ ¼ @ ζ 1 0 A: ð20:121Þ
0 ξ 1 η 0 1 0 0 1

These operators represent infinitesimal rotations ξ, η, and ζ around the x-, y-, and
z-axes, respectively (see Fig. 20.3). Note that these operators commute one another
to the first order of infinitesimal quantities ξ, η, and ζ. That is, we have

Rxξ Ryη ¼ Ryη Rxξ , Ryη Rzζ ¼ Rzζ Ryη , Rxξ Ryη Rzζ ¼ Rzζ Ryη Rxξ , etc:

with, e.g.,
0 1
1 ζ η
B C
Rxξ Ryη Rzζ ¼ @ ζ 1 ξ A ð20:122Þ
η ξ 1

to the first order. Notice that the order of operations Rxξ, Ryη, and Rzζ is disregarded.
Next, let us consider a successive transformation with a finite rotation angle ω that
follows the infinitesimal rotations. Readers may well ask why and for what purpose
we need to make such elaborate calculations. It is partly because when we were
dealing with finite groups in the previous chapters, group elements were finite and
relevant calculations were straightforward accordingly. But, now we are thinking of
continuous groups that possess an infinite number of group elements and, hence, we
have to consider a “density” of group elements. In this respect, we are now thinking
of how the density of group elements in the parameter space is changed according to

Fig. 20.3 Infinitesimal rotations ξ, η, and ζ around the x-, y-, and z-axes, respectively

the rotation of a finite angle ω. The results are used for various group calculations including the orthogonalization of characters.
Getting back to our subject, we further operate a finite rotation of an angle ω around the z-axis subsequent to the aforementioned infinitesimal rotations. The discussion developed below is due to Hamermesh [5]. We denote this rotation by R_ω. Since we are dealing with a spherically symmetric system, any rotation axis can equivalently be chosen without loss of generality. Notice also that rotations of the same rotation angle belong to the same conjugacy class [see (17.106) and Fig. 17.18]. Defining R_{xξ}R_{yη}R_{zζ} ≡ R_Δ from (20.122), the rotation R that combines R_Δ and the subsequently occurring R_ω is described by
\[
\begin{aligned}
R = R_\Delta R_\omega &= \begin{pmatrix} 1 & -\zeta & \eta \\ \zeta & 1 & -\xi \\ -\eta & \xi & 1 \end{pmatrix}\begin{pmatrix} \cos\omega & -\sin\omega & 0 \\ \sin\omega & \cos\omega & 0 \\ 0 & 0 & 1 \end{pmatrix} \\
&= \begin{pmatrix} \cos\omega - \zeta\sin\omega & -\sin\omega - \zeta\cos\omega & \eta \\ \sin\omega + \zeta\cos\omega & \cos\omega - \zeta\sin\omega & -\xi \\ \xi\sin\omega - \eta\cos\omega & \xi\cos\omega + \eta\sin\omega & 1 \end{pmatrix}. \tag{20.123}
\end{aligned}
\]

Hence, we have
\[
\begin{aligned}
R_{32} - R_{23} &= \xi\cos\omega + \eta\sin\omega - (-\xi) = \xi(1 + \cos\omega) + \eta\sin\omega, \\
R_{13} - R_{31} &= \eta - (\xi\sin\omega - \eta\cos\omega) = \eta(1 + \cos\omega) - \xi\sin\omega, \\
R_{21} - R_{12} &= \sin\omega + \zeta\cos\omega - (-\sin\omega - \zeta\cos\omega) = 2(\sin\omega + \zeta\cos\omega).
\end{aligned} \tag{20.124}
\]

Since (20.124) gives a relative directional ratio of the rotation axis, we should normalize a vector whose components are given by (20.124) to get the direction cosines of the rotation axis. Note that the direction of the rotation axis related to R of (20.123) should be close to that of R_ω and that only R_{21} − R_{12} in (20.124) has a term (i.e., 2 sin ω) free of the infinitesimal quantities ξ, η, ζ. Therefore, it suffices to normalize by R_{21} − R_{12} to seek the direction cosines to the first order. Hence, dividing (20.124) by 2 sin ω, as components of the direction cosines we get
\[
\frac{\xi(1+\cos\omega)}{2\sin\omega} + \frac{\eta}{2}, \qquad \frac{\eta(1+\cos\omega)}{2\sin\omega} - \frac{\xi}{2}, \qquad 1. \tag{20.125}
\]

Meanwhile, combining (17.78) and (20.123), we can find a rotation angle ω′ for R in (20.123). The trace χ of (20.123) is written as

\[
\chi = 1 + 2(\cos\omega - \zeta\sin\omega). \tag{20.126}
\]

From (17.78) we have
\[
\chi = 1 + 2\cos\omega'. \tag{20.127}
\]
Equating (20.126) and (20.127), we obtain
\[
\cos\omega' = \cos\omega - \zeta\sin\omega. \tag{20.128}
\]
Approximating (20.128), we get
\[
1 - \frac{1}{2}\omega'^2 \approx 1 - \frac{1}{2}\omega^2 - \zeta\omega. \tag{20.129}
\]
From (20.129), we have the following approximate expression:
\[
(\omega' + \omega)(\omega' - \omega) \approx 2\omega(\omega' - \omega) \approx 2\zeta\omega.
\]
Hence, we obtain
\[
\omega' \approx \omega + \zeta. \tag{20.130}
\]
Combining (20.125) and (20.130), we get their product to the first order such that
\[
\omega\left[\frac{\xi(1+\cos\omega)}{2\sin\omega} + \frac{\eta}{2}\right], \qquad \omega\left[\frac{\eta(1+\cos\omega)}{2\sin\omega} - \frac{\xi}{2}\right], \qquad \omega + \zeta. \tag{20.131}
\]

Since these quantities are products of the individual direction cosines and the rotation angle ω + ζ, we introduce variables e_x, e_y, and e_z as the x-, y-, and z-related quantities, respectively. Namely, we have
\[
\begin{aligned}
e_x &= \omega\left[\frac{\xi(1+\cos\omega)}{2\sin\omega} + \frac{\eta}{2}\right], \\
e_y &= \omega\left[\frac{\eta(1+\cos\omega)}{2\sin\omega} - \frac{\xi}{2}\right], \\
e_z &= \omega + \zeta.
\end{aligned} \tag{20.132}
\]

To evaluate how the density of group elements is changed as a function of the rotation angle ω, we are interested in the variations of e_x, e_y, and e_z that depend on ξ, η, and ζ. To this end, we calculate the Jacobian J as

\[
J = \frac{\partial(e_x, e_y, e_z)}{\partial(\xi, \eta, \zeta)} =
\begin{vmatrix}
\dfrac{\partial e_x}{\partial \xi} & \dfrac{\partial e_x}{\partial \eta} & \dfrac{\partial e_x}{\partial \zeta} \\[1ex]
\dfrac{\partial e_y}{\partial \xi} & \dfrac{\partial e_y}{\partial \eta} & \dfrac{\partial e_y}{\partial \zeta} \\[1ex]
\dfrac{\partial e_z}{\partial \xi} & \dfrac{\partial e_z}{\partial \eta} & \dfrac{\partial e_z}{\partial \zeta}
\end{vmatrix} =
\begin{vmatrix}
\dfrac{\omega(1+\cos\omega)}{2\sin\omega} & \dfrac{\omega}{2} & 0 \\[1ex]
-\dfrac{\omega}{2} & \dfrac{\omega(1+\cos\omega)}{2\sin\omega} & 0 \\[1ex]
0 & 0 & 1
\end{vmatrix}
= \frac{\omega^2(1+\cos\omega)^2}{4\sin^2\omega} + \frac{\omega^2}{4} = \frac{\omega^2}{4\sin^2\frac{\omega}{2}}, \tag{20.133}
\]

where we used formulae of trigonometric functions with the last equality. Note that (20.133) does not depend on the sign of ω. This is because J represents the relative volume density ratio between the two parameter spaces of the e_x e_y e_z-system and the ξηζ-system, and this ratio is solely determined by the modulus of the rotation angle.
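The determinant manipulation and the trigonometric reduction in (20.133) can be verified symbolically. The following short sketch, assuming the SymPy library is available, rebuilds the matrix of partial derivatives and simplifies its determinant:

import sympy as sp

w = sp.symbols('omega', positive=True)
c = w * (1 + sp.cos(w)) / (2 * sp.sin(w))   # the diagonal entry in (20.133)
M = sp.Matrix([[c, w / 2, 0],
               [-w / 2, c, 0],
               [0, 0, 1]])                  # matrix of partial derivatives
J = sp.simplify(M.det())
print(sp.simplify(J - w**2 / (4 * sp.sin(w / 2)**2)))  # should print 0
print(sp.limit(J, w, 0))                               # 1, the omega -> 0 limit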
Taking ω → 0, we have
\[
\lim_{\omega \to 0} J = \lim_{\omega \to 0} \frac{\omega^2}{4\sin^2\frac{\omega}{2}} = \lim_{\omega \to 0} \frac{(\omega^2)'}{\left(4\sin^2\frac{\omega}{2}\right)'} = \lim_{\omega \to 0} \frac{2\omega}{4\sin\frac{\omega}{2}\cos\frac{\omega}{2}} = \lim_{\omega \to 0} \frac{\omega}{\sin\omega} = \lim_{\omega \to 0} \frac{(\omega)'}{(\sin\omega)'} = \lim_{\omega \to 0} \frac{1}{\cos\omega} = 1.
\]

Thus, the relative volume density ratio in the limit of ω → 0 is 1, as expected. Let dV = de_x de_y de_z and dΠ = dξ dη dζ be the volume elements of each coordinate system. Then we have
\[
dV = J\, d\Pi. \tag{20.134}
\]
We assume that
\[
J = \frac{dV}{d\Pi} = \frac{\rho_{\xi\eta\zeta}}{\rho_{e_x e_y e_z}},
\]

where ρ_{e_x e_y e_z} and ρ_{ξηζ} are the "number densities" of group elements in each coordinate system. We suppose that the infinitesimal rotations in the ξηζ-coordinates are converted by a finite rotation R_ω of (20.123) into the e_x e_y e_z-coordinates. In this way, J can be viewed as an "expansion" coefficient as a function of the rotation angle |ω|. Hence, we have
\[
dV = \frac{\rho_{\xi\eta\zeta}}{\rho_{e_x e_y e_z}}\, d\Pi \quad \text{or} \quad \rho_{e_x e_y e_z}\, dV = \rho_{\xi\eta\zeta}\, d\Pi. \tag{20.135}
\]

Equation (20.135) implies that the total (infinite) number of group elements contained in the group SO(3) is invariant with respect to the transformation.
Let us calculate the total volume of SO(3). This can be measured such that
\[
\int d\Pi = \int \frac{1}{J}\, dV. \tag{20.136}
\]

Converting dV to the polar coordinate representation, we get
\[
\Pi[\mathrm{SO}(3)] = \int d\Pi = \int_0^\pi d\tilde{\omega} \int_0^\pi d\theta \int_0^{2\pi} d\phi\, \frac{4\sin^2\frac{\tilde{\omega}}{2}}{\tilde{\omega}^2}\, \tilde{\omega}^2 \sin\theta = 8\pi^2, \tag{20.137}
\]
where we denote the total volume of SO(3) by Π[SO(3)]. Note that ω in (20.133) has been replaced with ω̃ ≡ |ω| so that the radial coordinate can be positive. As already noted, (20.133) does not depend on the sign of ω. In other words, among the angles α, β, and ω (usually γ is used instead of ω; see Sect. 17.4.2), ω is taken as −π ≤ ω < π (instead of 0 ≤ ω < 2π) so that it can be conformed to the radial coordinate.
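A crude numerical quadrature confirms the value 8π² of (20.137); the sketch below assumes NumPy, and the helper trap is our own:

import numpy as np

def trap(y, x):
    """Composite trapezoidal rule for samples y on the grid x."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

# Integrand of (20.137): the weight 4 sin^2(w/2) sin(theta) in polar form
w = np.linspace(0.0, np.pi, 2001)       # omega-tilde = |omega|
t = np.linspace(0.0, np.pi, 2001)       # zenithal angle theta
volume = trap(4.0 * np.sin(w / 2.0)**2, w) * trap(np.sin(t), t) * 2.0 * np.pi
print(volume, 8.0 * np.pi**2)           # both ~ 78.9568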
We utilize (20.137) for the calculation of various functions. Let f(ϕ, θ, ω) be an arbitrary function on the parameter space. We can readily estimate a "mean value" of f(ϕ, θ, ω) on that space. That is, we have
\[
\overline{f(\phi,\theta,\omega)} \equiv \frac{\int f(\phi,\theta,\omega)\, d\Pi}{\int d\Pi}, \tag{20.138}
\]
where \(\overline{f(\phi,\theta,\omega)}\) represents the mean value of the function and is given by
\[
\overline{f(\phi,\theta,\omega)} = \left[ 4 \int_0^\pi d\tilde{\omega} \int_0^\pi d\theta \int_0^{2\pi} d\phi\, f(\phi,\theta,\omega) \sin^2\frac{\tilde{\omega}}{2} \sin\theta \right] \Big/ \int d\Pi. \tag{20.139}
\]
There would be some inconvenience in using (20.139) because of the mixture of the variables ω and ω̃. Yet, this causes no problem if f(ϕ, θ, ω) is an even function with respect to ω (see Sect. 20.2.6).

20.2.6 Irreducible Characters of SO(3) and Their Orthogonality

To evaluate (20.139), let us get back to the irreducible representations D^{(l)}(α, β, γ) of Sect. 20.2.3. Also, we recall how successive coordinate transformations produce changes in the orthogonal matrices of transformation (Sect. 17.4.2). Since the parameters α, β, and γ uniquely specify an individual rotation, different sets of α, β, and γ (in this section we use ω instead of γ) cause different irreducible

Fig. 20.4 Geometrical arrangement of two rotation axes A and A′ accompanied by the same rotation angle ω
representations. A corresponding rotation matrix is given by (17.101). In fact, the said matrix is a representation of the rotation. Keeping these points in mind, we discuss irreducible representations and their characters, particularly in relation to the orthogonality of the irreducible characters.
Figure 20.4 depicts a geometrical arrangement of two rotation axes accompanied by the same rotation angle ω. Let R_ω and R′_ω be two such rotations around the rotation axes A and A′, respectively. Let Q be another rotation that transforms the rotation axis A to A′. Then we have [1, 2]
\[
R'_\omega = Q R_\omega Q^{-1}. \tag{20.140}
\]

This implies that R_ω and R′_ω belong to the same conjugacy class. To consider the situation more clearly, let us view the two rotations R_ω and R′_ω from two different coordinate systems, say some xyz-coordinate system and another x′y′z′-coordinate system. Suppose also that A coincides with the z-axis and that A′ coincides with the z′-axis. Meanwhile, if we describe R_ω in reference to the xyz-system and R′_ω in reference to the x′y′z′-system, the representation of these rotations must be identical in reference to the individual coordinate systems. Let this representation matrix be R̄_ω.
(i) If we describe R_ω and R′_ω with respect to the xyz-system, we have
\[
R_\omega = \bar{R}_\omega, \qquad R'_\omega = \left(Q^{-1}\right)^{-1} \bar{R}_\omega Q^{-1} = Q \bar{R}_\omega Q^{-1}. \tag{20.141}
\]
Namely, we reproduce
\[
R'_\omega = Q R_\omega Q^{-1}. \tag{20.140}
\]
(ii) If we describe R_ω and R′_ω with respect to the x′y′z′-system, we have
\[
R'_\omega = \bar{R}_\omega, \qquad R_\omega = Q^{-1} \bar{R}_\omega Q. \tag{20.142}
\]
Hence, again we reproduce (20.140). That is, the relation (20.140) is independent of specific choices of the coordinate system.

Taking the representation indexed by l of Sect. 20.2.4 with respect to (20.140), we have
\[
D^{(l)}\left(R'_\omega\right) = D^{(l)}\left(Q R_\omega Q^{-1}\right) = D^{(l)}(Q)\, D^{(l)}(R_\omega)\, D^{(l)}\left(Q^{-1}\right) = D^{(l)}(Q)\, D^{(l)}(R_\omega)\left[D^{(l)}(Q)\right]^{-1}, \tag{20.143}
\]
where with the last equality we used (18.6). Operating D^{(l)}(Q) on (20.143) from the right, we get
\[
D^{(l)}\left(R'_\omega\right) D^{(l)}(Q) = D^{(l)}(Q)\, D^{(l)}(R_\omega). \tag{20.144}
\]
Since both D^{(l)}(R′_ω) and D^{(l)}(R_ω) are irreducible, from (18.41) of Schur's First Lemma D^{(l)}(R′_ω) and D^{(l)}(R_ω) are equivalent. An infinite number of rotations determined by the orientation of the rotation axis A that is accompanied by an azimuthal angle α (0 ≤ α < 2π) and a zenithal angle β (0 ≤ β < π) (see Fig. 17.18) form a conjugacy class with each specific ω shared. Thus, we have classified the various representations D^{(l)}(R_ω) according to ω and l. Meanwhile, we may identify D^{(l)}(R_ω) with D^{(l)}(α, β, γ) of Sect. 20.2.4.
In Sect. 20.2.2 we know that the spherical surface harmonics Y_l^m(θ, ϕ) (m = −l, ..., 0, ..., l) constitute the basis functions of D^{(l)} and span the representation space. The dimension of the matrix (or the representation space) is 2l + 1. Therefore, D^{(l)} and D^{(l′)} for different l and l′ have a different dimension. From (18.41) of Schur's First Lemma, such D^{(l)} and D^{(l′)} are inequivalent.
Returning to (20.139), let us evaluate a mean value \(\overline{f(\phi,\theta,\omega)}\). Let the traces of D^{(l)}(R′_ω) and D^{(l)}(R_ω) be χ^{(l)}(R′_ω) and χ^{(l)}(R_ω), respectively. Remembering (12.13), the trace is invariant under a similarity transformation, and so χ^{(l)}(R′_ω) = χ^{(l)}(R_ω). Then, we put
\[
\chi^{(l)}(\omega) \equiv \chi^{(l)}(R_\omega) = \chi^{(l)}\left(R'_\omega\right). \tag{20.145}
\]

Consequently, it suffices to evaluate the trace χ^{(l)}(ω) using D^{(l)}(α, 0, 0) of (20.96), whose representation matrix is given by (20.97). We have
\[
\chi^{(l)}(\omega) = \sum_{m=-l}^{l} e^{im\omega} = \frac{e^{-il\omega}\left(1 - e^{i\omega(2l+1)}\right)}{1 - e^{i\omega}} = \frac{e^{-il\omega}\left(1 - e^{-i\omega}\right)\left(1 - e^{i\omega(2l+1)}\right)}{(1 - e^{i\omega})(1 - e^{-i\omega})}, \tag{20.146}
\]
where
\[
[\text{numerator of } (20.146)] = e^{-il\omega} + e^{il\omega} - e^{i\omega(l+1)} - e^{-i\omega(l+1)} = 2[\cos l\omega - \cos(l+1)\omega] = 4\sin\left(l + \frac{1}{2}\right)\omega \sin\frac{\omega}{2}
\]
and
\[
[\text{denominator of } (20.146)] = 2(1 - \cos\omega) = 4\sin^2\frac{\omega}{2}.
\]
Thus, we get
\[
\chi^{(l)}(\omega) = \frac{\sin\left(l + \frac{1}{2}\right)\omega}{\sin\frac{\omega}{2}}. \tag{20.147}
\]
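As a quick sanity check of (20.147), one may compare the geometric-series sum of (20.146) with the closed form; a minimal sketch assuming NumPy:

import numpy as np

def chi_sum(l, w):
    """chi^(l)(omega) computed as the sum of e^{i m omega}, m = -l, ..., l."""
    m = np.arange(-l, l + 1)
    return np.exp(1j * m * w).sum().real

def chi_closed(l, w):
    """The closed form (20.147)."""
    return np.sin((l + 0.5) * w) / np.sin(w / 2.0)

w = 0.9                                        # an arbitrary test angle
for l in range(5):
    print(l, chi_sum(l, w), chi_closed(l, w))  # the two columns coincide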

To adapt (18.76) to the present formulation, we rewrite it as
\[
\frac{1}{n} \sum_g \left[\chi^{(\alpha)}(g)\right]^* \chi^{(\beta)}(g) = \delta_{\alpha\beta}. \tag{20.148}
\]

The summation over the n group elements in (20.148) should be read as an integration in the case of the continuous groups. Instead of a finite number of irreducible representations as in a finite group, we are thinking of an infinite number of irreducible representations with continuous groups. In (20.139) the denominator ∫dΠ (= 8π²) corresponds to n of (20.148). If f(ϕ, θ, ω) or f(α, β, ω) is an even function with respect to ω (−π ≤ ω < π), as in the case of (20.147), the numerator of (20.139) can be expressed in the form
\[
\int f(\alpha,\beta,\omega)\, d\Pi = \int_0^\pi d\omega \int_0^\pi d\beta \int_0^{2\pi} d\alpha\, f(\alpha,\beta,\omega)\, \frac{4\sin^2\frac{\omega}{2}}{\omega^2}\, \omega^2 \sin\beta = 4\int_0^\pi d\omega \int_0^\pi d\beta \int_0^{2\pi} d\alpha\, f(\alpha,\beta,\omega) \sin^2\frac{\omega}{2} \sin\beta. \tag{20.149}
\]

If, moreover, f(α, β, ω) depends only on ω, again as in the case of the character described by (20.147), the calculation is further simplified such that
\[
\int f(\omega)\, d\Pi = 16\pi \int_0^\pi d\omega\, f(\omega) \sin^2\frac{\omega}{2}. \tag{20.150}
\]
Replacing f(ω) with [χ^{(l′)}(ω)]^* χ^{(l)}(ω), we have
\[
\begin{aligned}
\int \left[\chi^{(l')}(\omega)\right]^* \chi^{(l)}(\omega)\, d\Pi &= 16\pi \int_0^\pi d\omega \left[\chi^{(l')}(\omega)\right]^* \chi^{(l)}(\omega) \sin^2\frac{\omega}{2} \\
&= 16\pi \int_0^\pi d\omega\, \frac{\sin\left(l' + \frac{1}{2}\right)\omega\, \sin\left(l + \frac{1}{2}\right)\omega}{\sin\frac{\omega}{2}\,\sin\frac{\omega}{2}} \sin^2\frac{\omega}{2} = 16\pi \int_0^\pi d\omega\, \sin\left(l' + \frac{1}{2}\right)\omega\, \sin\left(l + \frac{1}{2}\right)\omega \\
&= 8\pi \int_0^\pi d\omega\, [\cos(l' - l)\omega - \cos(l' + l + 1)\omega]. \tag{20.151}
\end{aligned}
\]

Since l′ + l + 1 > 0, the integral of the cos(l′ + l + 1)ω term vanishes. Only if l′ − l = 0 does the integral of the cos(l′ − l)ω term not vanish; it then takes a value of π. Therefore, we have
\[
\int \left[\chi^{(l')}(\omega)\right]^* \chi^{(l)}(\omega)\, d\Pi = 8\pi^2 \delta_{l'l}. \tag{20.152}
\]
Finally, with \(\overline{[\chi^{(l')}(\omega)]^* \chi^{(l)}(\omega)}\) defined as in (20.139), we get
\[
\overline{\left[\chi^{(l')}(\omega)\right]^* \chi^{(l)}(\omega)} = \delta_{l'l}. \tag{20.153}
\]

This relation gives the orthogonalization condition in concert with (20.148) obtained in the case of a finite group.
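The orthogonality relation (20.152) can also be checked directly. In the sketch below (assuming SymPy), the integral (20.150) is evaluated with f(ω) = [χ^{(l′)}(ω)]^* χ^{(l)}(ω) and divided by 8π², reproducing δ_{l′l}; note that, as in (20.151), the weight sin²(ω/2) cancels the character denominators, so the integrand reduces to a product of sines:

import sympy as sp

w = sp.symbols('omega')

for lp in range(3):
    for l in range(3):
        integrand = sp.sin((2 * lp + 1) * w / 2) * sp.sin((2 * l + 1) * w / 2)
        val = 16 * sp.pi * sp.integrate(integrand, (w, 0, sp.pi))
        print(lp, l, sp.simplify(val / (8 * sp.pi**2)))   # prints delta_{l' l}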
In Sect. 18.6 we have shown that the number of inequivalent irreducible representations of a finite group is well defined and given by the number of conjugacy classes of that group. This is clearly demonstrated in (18.144). In the case of the continuous groups, however, the situation is somewhat complicated and we are uncertain of the number of the inequivalent irreducible representations. We address this problem as follows: Suppose that Ω(α, β, ω) were an irreducible representation that is inequivalent to every D^{(l)}(α, β, ω) (l = 0, 1, 2, ⋯). Let K(ω) be the character of Ω(α, β, ω). Defining f(ω) ≡ [χ^{(l)}(ω)]^* K(ω), (20.150) reads as
\[
\int \left[\chi^{(l)}(\omega)\right]^* K(\omega)\, d\Pi = 16\pi \int_0^\pi d\omega \left[\chi^{(l)}(\omega)\right]^* K(\omega) \sin^2\frac{\omega}{2}. \tag{20.154}
\]

The orthogonalization condition (20.153) demands that the integral of (20.154) vanish. Similarly, we would have
\[
\int \left[\chi^{(l+1)}(\omega)\right]^* K(\omega)\, d\Pi = 16\pi \int_0^\pi d\omega \left[\chi^{(l+1)}(\omega)\right]^* K(\omega) \sin^2\frac{\omega}{2}, \tag{20.155}
\]
which must vanish as well. Subtracting the RHS of (20.154) from the RHS of (20.155) and taking account of the fact that χ^{(l)}(ω) is real [see (20.147)], we have
\[
\int_0^\pi d\omega \left[\chi^{(l+1)}(\omega) - \chi^{(l)}(\omega)\right] K(\omega) \sin^2\frac{\omega}{2} = 0. \tag{20.156}
\]

Invoking the trigonometric formula
\[
\sin a - \sin b = 2\cos\frac{a+b}{2}\sin\frac{a-b}{2}
\]
and applying it to (20.147), (20.156) can be rewritten as
\[
2\int_0^\pi d\omega\, [\cos(l+1)\omega]\, K(\omega) \sin^2\frac{\omega}{2} = 0 \quad (l = 0, 1, 2, \cdots). \tag{20.157}
\]

Putting l = 0 in the LHS of (20.154) and from the assumption that the integral of (20.154) vanishes, we also have
\[
\int_0^\pi d\omega\, K(\omega) \sin^2\frac{\omega}{2} = 0, \tag{20.158}
\]
where we used χ^{(0)}(ω) = 1. Then, (20.157) and (20.158) are combined to give
\[
\int_0^\pi d\omega\, (\cos l\omega)\, K(\omega) \sin^2\frac{\omega}{2} = 0 \quad (l = 0, 1, 2, \cdots). \tag{20.159}
\]

This implies that all the Fourier cosine coefficients of K(ω) sin²(ω/2) vanish. Considering that cos lω (l = 0, 1, 2, ⋯) forms a complete orthogonal system in the region [0, π], we must have
\[
K(\omega) \sin^2\frac{\omega}{2} \equiv 0. \tag{20.160}
\]

Requiring K(ω) to be continuous, (20.160) is equivalent to K(ω) ≡ 0. This implies that there is no inequivalent irreducible representation other than D^{(l)}(α, β, ω) (l = 0, 1, 2, ⋯). In other words, the representations D^{(l)}(α, β, ω) (l = 0, 1, 2, ⋯) constitute a complete set of irreducible representations. The spherical surface harmonics Y_l^m(θ, ϕ) (m = −l, ..., 0, ..., l) constitute the basis functions of D^{(l)} and span the representation space of D^{(l)}(α, β, ω) accordingly.
In the above discussion, we assumed that K(ω) is an even function with respect to ω, as is the case for χ^{(l)}(ω) in (20.147). Hence, K(ω) sin²(ω/2) is an even function as well. Therefore, we assumed that K(ω) sin²(ω/2) can be expanded into the Fourier cosine series.

20.3 Clebsch–Gordan Coefficients of Rotation Groups

In Sect. 18.8, we studied direct-product representations. We know that even though D^{(α)} and D^{(β)} are both irreducible, D^{(α⊗β)} is not necessarily irreducible. This is a common feature of both finite and infinite groups. Moreover, the reducible unitary representation can be expressed as a direct sum of irreducible representations such that
\[
D^{(\alpha)}(g) \otimes D^{(\beta)}(g) = \sum_\omega q_\omega D^{(\omega)}(g). \tag{18.199}
\]

In the infinite groups, typically the continuous groups, this situation often appears when we deal with the addition of two angular momenta. Examples include the addition of two (or more) orbital angular momenta and that of an orbital angular momentum and a spin angular momentum [6, 8].
Meanwhile, there is a set of basis vectors that spans the representation space with respect to each individual irreducible representation. Here we have the following question: What are the basis vectors that span the total reduced representation space described by (18.199)? The answer to this question is that these vectors must be constructed from the basis vectors relevant to the individual irreducible representations. The Clebsch–Gordan coefficients appear as the coefficients of a linear combination of the basis vectors associated with those irreducible representations.
In a word, to find the Clebsch–Gordan coefficients is equivalent to calculating the proper coefficients of the basis functions that span the representation space of the direct-product groups. The relevant continuous groups in which we are interested are SU(2) and SO(3).

20.3.1 Direct-Product of SU(2) and Clebsch–Gordan Coefficients

In Sect. 20.2.6 we explained the orthogonality of the irreducible characters of SO(3) by deriving (20.153). It takes advanced theory to show the orthogonality of the irreducible characters of SU(2). Fortunately, however, we have an expression similar to (20.153) with SU(2) as well. Readers are encouraged to look up appropriate literature for this [9]. On the basis of this important expression, we further develop the representation theory of the continuous groups. Within this framework, we calculate the Clebsch–Gordan coefficients.
As in the case of the previous section, the character of the irreducible representation D^{(j)}(α, β, γ) of SU(2) is described only as a function of a rotation angle. Similarly to (20.147), in SU(2) we get


\[
\chi^{(j)}(\omega) = \sum_{m=-j}^{j} e^{im\omega} = \frac{\sin\left(j + \frac{1}{2}\right)\omega}{\sin\frac{\omega}{2}}, \tag{20.161}
\]

where χ^{(j)}(ω) is the irreducible character of D^{(j)}(α, β, γ). This is because we can align the rotation axis in the direction of, e.g., the z-axis, so that the representation matrix of a rotation angle ω is typified by D^{(j)}(ω, 0, 0) or D^{(j)}(0, 0, ω). Or, we can equally choose any direction of the rotation axis at will.
We are interested in calculating the angular momenta of a coupled system, i.e., the addition of the angular momenta of that system [5, 6]. The term "coupled system" needs some explanation. This is because, on the one hand, we may deal with, e.g., a sum of an orbital angular momentum and a spin angular momentum of a "single" electron, but, on the other hand, we might calculate, e.g., a sum of two angular momenta of "two" electrons. In either case, we will deal with the total angular momenta of the coupled system.
Now, let us consider a direct-product group for this kind of problem. Suppose that one system is characterized by a quantum number j1 and the other by j2. Here these numbers are supposed to be those of (3.89), namely the highest positive number of the generalized angular momentum in the z-direction. That is, the representation matrices of rotation for systems 1 and 2 are assumed to be D^{(j1)}(ω, 0, 0) and D^{(j2)}(ω, 0, 0), respectively. Then, we are to deal with the direct-product representation D^{(j1)} ⊗ D^{(j2)} (see Sect. 18.8). As in (18.193), we describe
\[
D^{(j_1 \otimes j_2)}(\omega) = D^{(j_1)}(\omega) \otimes D^{(j_2)}(\omega). \tag{20.162}
\]

In (20.162), the group element is represented by a rotation angle ω, or more strictly by the rotation operation of an angle ω. The quantum numbers j1 and j2 label the irreducible representations of the rotations for systems 1 and 2, respectively. According to (20.161), we have
\[
\chi^{(j_1 \otimes j_2)}(\omega) = \chi^{(j_1)}(\omega)\, \chi^{(j_2)}(\omega), \tag{20.163}
\]

where χ^{(j1⊗j2)}(ω) gives the character of the direct-product representation; see (18.198). Furthermore, we have [5]
\[
\chi^{(j_1 \otimes j_2)}(\omega) = \sum_{m_1=-j_1}^{j_1} e^{im_1\omega} \sum_{m_2=-j_2}^{j_2} e^{im_2\omega} = \sum_{m_1=-j_1}^{j_1} \sum_{m_2=-j_2}^{j_2} e^{i(m_1+m_2)\omega} = \sum_{J=|j_1-j_2|}^{j_1+j_2} \sum_{M=-J}^{J} e^{iM\omega} = \sum_{J=|j_1-j_2|}^{j_1+j_2} \chi^{(J)}(\omega), \tag{20.164}
\]
where the positive number J can be chosen from among
\[
J = |j_1 - j_2|,\ |j_1 - j_2| + 1,\ \cdots,\ j_1 + j_2.
\]

Rewriting (20.164) more explicitly, we have
\[
\chi^{(j_1 \otimes j_2)}(\omega) = \chi^{(|j_1-j_2|)}(\omega) + \chi^{(|j_1-j_2|+1)}(\omega) + \cdots + \chi^{(j_1+j_2)}(\omega). \tag{20.165}
\]

If both j1 and j2 are integers, from (20.165) we can immediately derive an important result. In light of (18.199) and (18.200), we multiply [χ^{(k)}(ω)]^* on both sides of (20.165) and then integrate to get
\[
\begin{aligned}
\int \left[\chi^{(k)}(\omega)\right]^* \chi^{(j_1 \otimes j_2)}(\omega)\, d\Pi &= \int \left[\chi^{(k)}(\omega)\right]^* \chi^{(|j_1-j_2|)}(\omega)\, d\Pi + \int \left[\chi^{(k)}(\omega)\right]^* \chi^{(|j_1-j_2|+1)}(\omega)\, d\Pi + \cdots \\
&\quad + \int \left[\chi^{(k)}(\omega)\right]^* \chi^{(j_1+j_2)}(\omega)\, d\Pi \\
&= 8\pi^2 \left(\delta_{k,|j_1-j_2|} + \delta_{k,|j_1-j_2|+1} + \cdots + \delta_{k,j_1+j_2}\right), \tag{20.166}
\end{aligned}
\]
where k is zero or an integer and with the last equality we used (20.152). Dividing both sides by ∫dΠ, we get
\[
\overline{\left[\chi^{(k)}(\omega)\right]^* \chi^{(j_1 \otimes j_2)}(\omega)} = \delta_{k,|j_1-j_2|} + \delta_{k,|j_1-j_2|+1} + \cdots + \delta_{k,j_1+j_2}, \tag{20.167}
\]
where we used (20.153). Equation (20.166) implies that the mean value does not vanish only if k is identical to one of |j1 − j2|, |j1 − j2| + 1, ⋯, j1 + j2. If, for example, we choose |j1 − j2| for k in (20.167), we have
\[
\overline{\left[\chi^{(|j_1-j_2|)}(\omega)\right]^* \chi^{(j_1 \otimes j_2)}(\omega)} = \delta_{|j_1-j_2|,|j_1-j_2|} = 1. \tag{20.168}
\]

This relation corresponds to (18.200), where a finite group is relevant. Note that the volume of SO(3), i.e., ∫dΠ = 8π², corresponds to the order n of a finite group in (20.148). Considering (18.199) and (18.200) once again, it follows that in the above case D^{(|j1−j2|)} takes place once and only once in the direct-product representation D^{(j1⊗j2)}. Thus, we obtain the following important relation:
\[
D^{(j_1 \otimes j_2)} = D^{(j_1)}(\omega) \otimes D^{(j_2)}(\omega) = \sum_{J=|j_1-j_2|}^{j_1+j_2} \oplus\, D^{(J)}. \tag{20.169}
\]

Any group (finite or infinite) is said to be simply reducible if each irreducible representation takes place at most once when the direct-product representation of the group in question is reduced. Equation (20.169) clearly shows that SO(3) is simply reducible. With respect to SU(2) we also have the same expression (20.169) [9].
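The decomposition (20.169) can be confirmed at the level of characters: by (20.163)–(20.165), χ^{(j1)}χ^{(j2)} must equal the sum of the χ^{(J)}. A minimal sketch assuming SymPy (a half-odd-integer case is chosen to illustrate SU(2) as well):

import sympy as sp

w = sp.symbols('omega')

def chi(j):
    """Character (20.161); j may be an integer or a half-odd-integer."""
    return sp.sin((j + sp.Rational(1, 2)) * w) / sp.sin(w / 2)

j1, j2 = sp.Rational(3, 2), sp.Integer(1)
product = chi(j1) * chi(j2)
series = sum(chi(J) for J in (sp.Rational(1, 2), sp.Rational(3, 2),
                              sp.Rational(5, 2)))   # J = |j1-j2|, ..., j1+j2
for x in (0.3, 0.7, 2.1):                           # a few test angles
    print((product - series).subs(w, x).evalf())    # ~ 0 in every case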

In Sect. 18.8 we examined the direct-product representation of two (irreducible) representations. Let D^{(μ)}(g) and D^{(ν)}(g) be two different irreducible representations of the group G. Here G may be either a finite or an infinite group. Note that in Sect. 18.8 we focused on the finite groups and that now we are thinking of an infinite group, typically SU(2) and SO(3). Let n_μ and n_ν be the dimensions of the representation spaces of the irreducible representations μ and ν, respectively. We assume that each representation space is spanned by the following basis functions:
\[
\psi_j^{(\mu)}\ (j = 1, \cdots, n_\mu) \quad \text{and} \quad \phi_l^{(\nu)}\ (l = 1, \cdots, n_\nu).
\]
We know from Sect. 18.8 that n_μ n_ν new basis functions can be constructed using the functions described by ψ_j^{(μ)} φ_l^{(ν)}. Our next task will be to classify these n_μ n_ν functions into those belonging to different irreducible representations. For this, we need the relation (18.199). According to custom, we rewrite it as follows:
\[
D^{(\mu)}(g) \otimes D^{(\nu)}(g) = \sum_\omega (\mu\nu\omega)\, D^{(\omega)}(g), \tag{20.170}
\]
where (μνω) is identical with q_ω of the RHS of (18.199) and shows how many times the same representation ω takes place in the direct-product representation. Naturally, we have
\[
(\mu\nu\omega) = (\nu\mu\omega).
\]
Then, we get
\[
\sum_\omega (\mu\nu\omega)\, n_\omega = n_\mu n_\nu.
\]

Meanwhile, we should be able to constitute functions that have the same transformation properties as those of the irreducible representation ω. In other words, these functions should make up the basis functions belonging to ω. Let these functions be Ψ_s^{(ωτ_ω)}. Then, we have
\[
\Psi_s^{(\omega\tau_\omega)} = \sum_{j,l} (\mu j, \nu l\, |\, \omega\tau_\omega s)\, \psi_j^{(\mu)} \phi_l^{(\nu)}, \tag{20.171}
\]
where τ_ω = 1, ⋯, (μνω) and s designates the functions contained in the irreducible representation ω; i.e., s = 1, ⋯, n_ω. If ω takes place more than once, we label ω as ωτ_ω. The coefficients with respect to the above linear combination,
\[
(\mu j, \nu l\, |\, \omega\tau_\omega s),
\]
are called Clebsch–Gordan coefficients. Readers might well be bewildered by the notations of abstract algebra, but the Examples of Sect. 20.3.3 should greatly relieve them of anxiety about it.

As we shall see later in this chapter, the functions Ψ_s^{(ωτ_ω)} are going to be normalized along with the product functions ψ_j^{(μ)} φ_l^{(ν)}. The Clebsch–Gordan coefficients must form a unitary matrix accordingly. Notice that a transformation between two orthonormal basis sets is unitary (see Chap. 14). Since there are n_μ n_ν product functions as the basis vectors, the Clebsch–Gordan coefficients are associated with (n_μ n_ν, n_μ n_ν) square matrices. The said coefficients can be defined for any group, either finite or infinite. Or rather, difficulty in determining the Clebsch–Gordan coefficients arises when τ_ω is more than one. This is because we should take account of linear combinations described by
\[
\sum_{\tau_\omega} c_{\tau_\omega} \Psi_s^{(\omega\tau_\omega)}
\]
that belong to the irreducible representation ω. This would cause arbitrariness and ambiguity [5]. Nevertheless, we do not need to get into further discussion of this problem. From now on, we focus on the Clebsch–Gordan coefficients of SU(2) and SO(3), typical simply reducible groups, for which τ_ω is at most one.
That SU(2) and SO(3) are simply reducible saves us a lot of labor. The argument for this is as follows: Equation (20.169) shows how the dimension (2j1 + 1)(2j2 + 1) of the representation space for the direct product of the two representations is redistributed to the individual irreducible representations, each of which takes place only once. The arithmetic based on this fact is as follows:
\[
\begin{aligned}
(2j_1+1)(2j_2+1) &= 2(j_1-j_2)(2j_2+1) + 2\left[\frac{1}{2} \cdot 2j_2(2j_2+1)\right] + (2j_2+1) \\
&= 2(j_1-j_2)(2j_2+1) + 2(0+1+2+\cdots+2j_2) + (2j_2+1) \\
&= [2(j_1-j_2)+0] + 2[(j_1-j_2)+1] + 2[(j_1-j_2)+2] + \cdots + 2[(j_1-j_2)+2j_2] + (2j_2+1) \\
&= [2(j_1-j_2)+1] + \{2[(j_1-j_2)+1]+1\} + \cdots + [2(j_1+j_2)+1], \tag{20.172}
\end{aligned}
\]
where with the second equality we used the formula 1 + 2 + ⋯ + n = n(n+1)/2; the resulting numbers 0, 1, 2, ⋯, and 2j2 are distributed among the 2j2 + 1 terms of the subsequent RHS of (20.172), one each; the third term 2j2 + 1 is likewise distributed among the 2j2 + 1 terms, one each, on the rightmost-hand side. Equation (20.172) is symmetric in j1 and j2, but if we assume that j1 ≥ j2, each term of the rightmost-hand side of (20.172) is positive. Therefore, for convenience we assume j1 ≥ j2 without loss of generality.
To clarify the situation, we list several tables (Tables 20.1, 20.2, and 20.3). Tables 20.1 and 20.2 show specific cases, whereas Table 20.3 represents the general case. In these tables the topmost layer has a diagram that corresponds to the case of J = |j1 − j2| and contains 2|j1 − j2| + 1 different quantum states. The lower layers include diagrams that correspond to the cases J = |j1 − j2| + 1, ⋯, j1 + j2.

Table 20.1 Summation of angular momenta. J = 0, 1, or 2; M = −J, −J + 1, ⋯, J. Entries show M = m1 + m2. Regarding the dotted aqua line, see text

  m2 \ m1    −1    0    1 (= j1)
     −1      −2   −1    0
      0      −1    0    1
  1 (= j2)    0    1    2
Table 20.2 Summation of angular momenta. J = 1, 2, or 3; M = −J, −J + 1, ⋯, J. Entries show M = m1 + m2

  m2 \ m1    −2    −1    0    1    2 (= j1)
     −1      −3    −2   −1    0    1
      0      −2    −1    0    1    2
  1 (= j2)   −1     0    1    2    3

Table 20.3 Summation of angular momenta in a general case. We suppose j1 > 2j2. J = |j1 − j2|, ⋯, j1 + j2; M = −J, −J + 1, ⋯, J. Entries show M = m1 + m2

  m2 \ m1    −j1          −j1 + 1        ⋯    j1 − 1        j1
    −j2      −j1 − j2     −j1 − j2 + 1   ⋯    j1 − j2 − 1   j1 − j2
  −j2 + 1    −j1 − j2 + 1 −j1 − j2 + 2   ⋯    j1 − j2       j1 − j2 + 1
     ⋯          ⋯             ⋯          ⋯       ⋯             ⋯
     j2      −j1 + j2     −j1 + j2 + 1   ⋯    j1 + j2 − 1   j1 + j2

For each J, 2J + 1 functions (or quantum states) are included, labelled according to the different numbers M = −J, −J + 1, ⋯, J. The rightmost column displays M ≡ m1 + m2 = |j1 − j2|, |j1 − j2| + 1, ⋯, j1 + j2 from the topmost row through the bottommost row.
Tables 20.1, 20.2, and 20.3 comprise 2j2 + 1 layers, in which the total (2j1 + 1)(2j2 + 1) functions that form the basis vectors of D^{(j1)}(ω) ⊗ D^{(j2)}(ω) are redistributed according to the M number. The same numbers that appear from the topmost row through the bottommost row are marked red and connected with dotted aqua lines. These lines form a parallelogram together with the top and bottom horizontal lines as shown. The upper and lower sides of the parallelogram are 2|j1 − j2| in width. The parallelogram gets flattened with decreasing |j1 − j2|. If j1 = j2, the parallelogram coalesces into a line (Table 20.1). Regarding the notations M and J, see Sects. 20.3.2 and 20.3.3 (vide infra).
Bearing in mind the above remarks, we focus on the coupling of two angular
momenta as a typical example. Within this framework we deal with the

Clebsch–Gordan coefficients with regard to SU(2). We develop the discussion according to the literature [5]. The relevant approach helps address more complicated situations where the coupling of three or more angular momenta, including j–j coupling and L–S coupling, is responsible [6].
From (20.47) we know that f_m forms a basis set that spans the (2j + 1)-dimensional representation space associated with D^{(j)} such that
\[
f_m = \frac{u^{j+m} v^{j-m}}{\sqrt{(j+m)!(j-m)!}} \quad (m = -j, -j+1, \cdots, j-1, j). \tag{20.47}
\]
The functions f_m are transformed according to R(a, b) so that we have
\[
R(a,b)(f_m) = \sum_{m'} f_{m'}\, D^{(j)}_{m'm}(a, b), \tag{20.48}
\]
where j can be either an integer or a half-odd-integer. We henceforth work on f_m of (20.47).

20.3.2 Calculation Procedures of Clebsch–Gordan Coefficients

In the previous section we described how the quantum states are systematically made up of two angular momenta. We express those states in terms of a linear combination of the basis functions related to the direct-product representations of (20.169). To determine those coefficients, we follow the calculation procedures due to Hamermesh [5]. The calculation procedures are rather lengthy, and so we summarize each item separately.
(1) Invariant quantities A_J and B_J:
We start with seeking quantities invariant under the unitary transformation U of (20.35) expressed as
\[
U = \begin{pmatrix} a & b \\ -b^* & a^* \end{pmatrix} \quad \text{with} \quad |a|^2 + |b|^2 = 1. \tag{20.35}
\]

As in (20.46), let us consider combinations of variables u ≡ (u2 u1) and v ≡ (v2 v1) that are transformed such that
\[
\left(u_2'\ u_1'\right) = (u_2\ u_1)\begin{pmatrix} a & b \\ -b^* & a^* \end{pmatrix}, \qquad \left(v_2'\ v_1'\right) = (v_2\ v_1)\begin{pmatrix} a & b \\ -b^* & a^* \end{pmatrix}. \tag{20.173}
\]
As a shorthand notation we have
\[
u' = um, \qquad v' = vm, \tag{20.174}
\]

where we define u′, v′, and m as
\[
u' \equiv \left(u_2'\ u_1'\right), \qquad v' \equiv \left(v_2'\ v_1'\right), \qquad m \equiv \begin{pmatrix} a & b \\ -b^* & a^* \end{pmatrix}. \tag{20.175}
\]
As in (20.47) we also define the following functions:
\[
\begin{aligned}
\psi_{m_1}^{j_1} &= \frac{u_1^{j_1+m_1} u_2^{j_1-m_1}}{\sqrt{(j_1+m_1)!(j_1-m_1)!}} \quad (m_1 = -j_1, -j_1+1, \cdots, j_1-1, j_1), \\
\phi_{m_2}^{j_2} &= \frac{v_1^{j_2+m_2} v_2^{j_2-m_2}}{\sqrt{(j_2+m_2)!(j_2-m_2)!}} \quad (m_2 = -j_2, -j_2+1, \cdots, j_2-1, j_2).
\end{aligned} \tag{20.176}
\]

Meanwhile, we also consider a combination of variables x ≡ (x2 x1) that is transformed as
\[
\left(x_2'\ x_1'\right) = (x_2\ x_1)\begin{pmatrix} a^* & b^* \\ -b & a \end{pmatrix}. \tag{20.177}
\]
As a shorthand notation we have
\[
x' = x m^*, \tag{20.178}
\]
where we have
\[
x' \equiv \left(x_2'\ x_1'\right), \qquad x \equiv (x_2\ x_1), \qquad m^* = \begin{pmatrix} a^* & b^* \\ -b & a \end{pmatrix}. \tag{20.179}
\]

The unitary matrix m^* is said to be the complex conjugate matrix of m. We say that in (20.174) u and v are transformed covariantly and that in (20.178) x is transformed contravariantly. Defining a unitary operator g as
\[
g \equiv \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \tag{20.180}
\]
and operating g on both sides of (20.174), we have
\[
u'g = u\, g g^\dagger m g = (ug)\left(g^\dagger m g\right) = (ug)\, m^*. \tag{20.181}
\]

Rewriting (20.181) as
\[
u'g = (ug)\, m^*, \tag{20.182}
\]
we find that ug is transformed contravariantly. Taking the transposition of (20.178), we have
\[
x'^{\mathrm{T}} = (m^*)^{\mathrm{T}} x^{\mathrm{T}} = m^\dagger x^{\mathrm{T}}.
\]

Then, we get
\[
u' x'^{\mathrm{T}} = (um)\left(m^\dagger x^{\mathrm{T}}\right) = u\left(m m^\dagger\right) x^{\mathrm{T}} = u x^{\mathrm{T}} = u_2 x_2 + u_1 x_1, \tag{20.183}
\]
where with the third equality we used the unitarity of m. This implies that ux^T is an invariant quantity under the unitary transformation by m. We have
\[
v' x'^{\mathrm{T}} = v x^{\mathrm{T}}, \tag{20.184}
\]
meaning that vx^T is an invariant as well.
In a similar manner, we have

\[
u_1' v_2' - u_2' v_1' = v'(u'g)^{\mathrm{T}} = v' g^{\mathrm{T}} u'^{\mathrm{T}} = v m g^{\mathrm{T}} m^{\mathrm{T}} u^{\mathrm{T}} = v g^{\mathrm{T}} u^{\mathrm{T}} = v (ug)^{\mathrm{T}} = (v_2\ v_1)\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}\begin{pmatrix} u_2 \\ u_1 \end{pmatrix} = u_1 v_2 - u_2 v_1. \tag{20.185}
\]
Thus, v(ug)^T (or u1v2 − u2v1) is another invariant under the unitary transformation by m. In the above discussion we can view (20.183) to (20.185) as quadratic forms of two variables (see Sect. 14.5).
Using these invariants, we define other important invariants A_J and B_J as follows [5]:
\[
A_J \equiv (u_1 v_2 - u_2 v_1)^{j_1+j_2-J} (u_2 x_2 + u_1 x_1)^{j_1-j_2+J} (v_2 x_2 + v_1 x_1)^{j_2-j_1+J}, \tag{20.186}
\]
\[
B_J \equiv (u_2 x_2 + u_1 x_1)^{2J}. \tag{20.187}
\]

The polynomial A_J is of degree 2j1 with respect to the covariant variables u1 and u2 and of degree 2j2 with respect to v1 and v2. The degree with respect to the contravariant variables x1 and x2 is 2J. Expanding A_J in powers of x1 and x2, we get
\[
A_J = \sum_{M=-J}^{J} W_M^J X_M^J, \tag{20.188}
\]
where X_M^J is defined as
\[
X_M^J \equiv \frac{x_1^{J+M} x_2^{J-M}}{\sqrt{(J+M)!(J-M)!}} \quad (M = -J, -J+1, \cdots, J-1, J) \tag{20.189}
\]

and the coefficients W_M^J are polynomials in u1, u2, v1, and v2. The coefficients W_M^J will be determined soon in combination with X_M^J. Meanwhile, we have
\[
\begin{aligned}
B_J &= \sum_{k=0}^{2J} \frac{(2J)!}{k!(2J-k)!} (u_1 x_1)^k (u_2 x_2)^{2J-k} = (2J)! \sum_{M=-J}^{J} \frac{1}{(J+M)!(J-M)!} (u_1 x_1)^{J+M} (u_2 x_2)^{J-M} \\
&= (2J)! \sum_{M=-J}^{J} \frac{1}{(J+M)!(J-M)!}\, u_1^{J+M} u_2^{J-M} x_1^{J+M} x_2^{J-M} = (2J)! \sum_{M=-J}^{J} \Phi_M^J X_M^J, \tag{20.190}
\end{aligned}
\]
where Φ_M^J is defined as
\[
\Phi_M^J \equiv \frac{u_1^{J+M} u_2^{J-M}}{\sqrt{(J+M)!(J-M)!}} \quad (M = -J, -J+1, \cdots, J-1, J). \tag{20.191}
\]

The implications of the above calculations of abstract algebra are as follows: In Sect. 20.2.2 we determined the representation matrices D^{(j)}_{m'm} of SU(2) using (20.47) and (20.48). We see that the functions f_m (m = −j, −j+1, ⋯, j−1, j) of (20.47) are transformed as basis vectors by the unitary transformation D^{(j)}. Meanwhile, the functions Φ_M^J (M = −J, −J+1, ⋯, J−1, J) of (20.191) are transformed as basis vectors by the unitary transformation D^{(J)}. Here let us compare (20.188) and (20.190). We see that whereas Φ_M^J associates X_M^J with the invariant B_J, W_M^J associates X_M^J with the invariant A_J. Thus, W_M^J plays the same role as Φ_M^J and, hence, we expect that W_M^J is eligible as a set of basis vectors for D^{(J)} as well.
(2) Binomial expansion of A_J:
To calculate W_M^J, we expand A_J using the binomial theorem such that [5]
\[
\begin{aligned}
(u_1 v_2 - u_2 v_1)^{j_1+j_2-J} &= \sum_{\lambda=0}^{j_1+j_2-J} (-1)^\lambda \binom{j_1+j_2-J}{\lambda} (u_1 v_2)^{j_1+j_2-J-\lambda} (u_2 v_1)^\lambda, \\
(u_2 x_2 + u_1 x_1)^{j_1-j_2+J} &= \sum_{\mu=0}^{j_1-j_2+J} \binom{j_1-j_2+J}{\mu} (u_1 x_1)^{j_1-j_2+J-\mu} (u_2 x_2)^\mu, \\
(v_2 x_2 + v_1 x_1)^{j_2-j_1+J} &= \sum_{\nu=0}^{j_2-j_1+J} \binom{j_2-j_1+J}{\nu} (v_1 x_1)^{j_2-j_1+J-\nu} (v_2 x_2)^\nu.
\end{aligned} \tag{20.192}
\]

Therefore, we have
\[
\begin{aligned}
A_J = \sum_{\lambda,\mu,\nu} (-1)^\lambda &\binom{j_1+j_2-J}{\lambda}\binom{j_1-j_2+J}{\mu}\binom{j_2-j_1+J}{\nu} \\
&\times u_1^{2j_1-\lambda-\mu} u_2^{\lambda+\mu} v_1^{j_2-j_1+J-\nu+\lambda} v_2^{j_1+j_2-J-\lambda+\nu} x_1^{2J-\mu-\nu} x_2^{\mu+\nu}. \tag{20.193}
\end{aligned}
\]
Introducing new summation variables m1 = j1 − λ − μ and m2 = J − j1 + λ − ν, we obtain
\[
\begin{aligned}
A_J = \sum_{\lambda,m_1,m_2} (-1)^\lambda &\binom{j_1+j_2-J}{\lambda}\binom{j_1-j_2+J}{j_1-\lambda-m_1}\binom{j_2-j_1+J}{j_2-\lambda+m_2} \\
&\times u_1^{j_1+m_1} u_2^{j_1-m_1} v_1^{j_2+m_2} v_2^{j_2-m_2} x_1^{J+m_1+m_2} x_2^{J-m_1-m_2}, \tag{20.194}
\end{aligned}
\]
where to derive \(\binom{j_2-j_1+J}{j_2-\lambda+m_2}\) in the RHS we used
\[
\binom{a}{b} = \binom{a}{a-b}. \tag{20.195}
\]

Further using (20.176) and (20.189), we modify A_J such that
\[
A_J = \sum_{m_1,m_2,\lambda} (-1)^\lambda (j_1+j_2-J)!(j_1-j_2+J)!(j_2-j_1+J)!\, \frac{[(j_1+m_1)!(j_1-m_1)!(j_2+m_2)!(j_2-m_2)!(J+m_1+m_2)!(J-m_1-m_2)!]^{1/2}}{\lambda!(j_1+j_2-J-\lambda)!(j_1-\lambda-m_1)!(J-j_2+\lambda+m_1)!(j_2-\lambda+m_2)!(J-j_1+\lambda-m_2)!}\, \psi_{m_1}^{j_1} \phi_{m_2}^{j_2} X_{m_1+m_2}^J.
\]
Setting m1 + m2 = M, we have
\[
A_J = \sum_{m_1,m_2,\lambda} (-1)^\lambda (j_1+j_2-J)!(j_1-j_2+J)!(j_2-j_1+J)!\, \frac{[(j_1+m_1)!(j_1-m_1)!(j_2+m_2)!(j_2-m_2)!(J+M)!(J-M)!]^{1/2}}{\lambda!(j_1+j_2-J-\lambda)!(j_1-\lambda-m_1)!(J-j_2+\lambda+m_1)!(j_2-\lambda+m_2)!(J-j_1+\lambda-m_2)!}\, \psi_{m_1}^{j_1} \phi_{m_2}^{j_2} X_M^J. \tag{20.196}
\]

In this way, we get W_M^J of (20.188) expressed as
\[
W_M^J = (j_1+j_2-J)!(j_1-j_2+J)!(j_2-j_1+J)! \sum_{m_1,m_2;\, m_1+m_2=M} C_{m_1,m_2}^J\, \psi_{m_1}^{j_1} \phi_{m_2}^{j_2}, \tag{20.197}
\]
where we define C_{m_1,m_2}^J as
\[
C_{m_1,m_2}^J \equiv \sum_\lambda \frac{(-1)^\lambda [(j_1+m_1)!(j_1-m_1)!(j_2+m_2)!(j_2-m_2)!(J+M)!(J-M)!]^{1/2}}{\lambda!(j_1+j_2-J-\lambda)!(j_1-\lambda-m_1)!(J-j_2+\lambda+m_1)!(j_2-\lambda+m_2)!(J-j_1+\lambda-m_2)!}. \tag{20.198}
\]

From (20.197), we assume that the functions W_M^J (M = −J, −J+1, ⋯, J−1, J) are suited to forming basis vectors of D^{(J)}. Then, any basis set Λ_M^J should be connected to W_M^J via a unitary transformation such that
\[
\Lambda_M^J = U W_M^J,
\]
where U is a unitary matrix. Then, we get W_M^J = U^† Λ_M^J. What we have to do is only to normalize W_M^J. If we look carefully at the functional form of (20.197), we become aware that we only have to adjust the first factor of (20.197), which is a function of J only. Hence, as the properly normalized functions we expect to have
\[
\Psi_M^J = C(J)\, W_M^J = C(J)\, U^\dagger \Lambda_M^J, \tag{20.199}
\]
where C(J) is a constant that depends only on J for given numbers j1 and j2. An implication of (20.199) is that we can get suitably normalized functions Ψ_M^J using arbitrary basis vectors Λ_M^J.
Putting
\[
\rho_J \equiv C(J)\, (j_1+j_2-J)!(j_1-j_2+J)!(j_2-j_1+J)!, \tag{20.200}
\]
we get
\[
\Psi_M^J = \rho_J \sum_{m_1,m_2;\, m_1+m_2=M} C_{m_1,m_2}^J\, \psi_{m_1}^{j_1} \phi_{m_2}^{j_2}. \tag{20.201}
\]

The combined states are completely decided by J and M. If we fix J at a certain number, (20.201) is expressed as a linear combination of different functions ψ_{m1}^{j1} φ_{m2}^{j2} with varying m1 and m2 but fixed m1 + m2 = M. Returning to Tables 20.1, 20.2, and 20.3, the same M can be found in at most 2j2 + 1 places. This implies that Ψ_M^J of (20.201) has at most 2j2 + 1 terms on condition that j1 ≥ j2. In (20.201) the Clebsch–Gordan coefficients are given by
\[
\rho_J\, C_{m_1,m_2}^J.
\]
The descriptions of the Clebsch–Gordan coefficients differ from literature to literature [5, 6]. We adopt the description due to Hamermesh [5] such that

\[
(j_1 m_1 j_2 m_2 | JM) \equiv \rho_J\, C_{m_1,m_2}^J, \qquad \Psi_M^J = \sum_{m_1,m_2;\, m_1+m_2=M} (j_1 m_1 j_2 m_2 | JM)\, \psi_{m_1}^{j_1} \phi_{m_2}^{j_2}. \tag{20.202}
\]

(3) Normalization of Ψ_M^J:
The normalization condition for Ψ_M^J of (20.201) or (20.202) is given by
\[
|\rho_J|^2 \sum_{m_1,m_2;\, m_1+m_2=M} \left(C_{m_1,m_2}^J\right)^2 = 1. \tag{20.203}
\]

To obtain (20.203), we assume the following normalization condition for the basis vectors ψ_{m1}^{j1} φ_{m2}^{j2} that span the representation space. That is, for an appropriate pair of complex variables z1 and z2, we define their inner product as follows [9]:
\[
\left\langle z_1^l z_2^{m-l} \,\middle|\, z_1^k z_2^{m-k} \right\rangle = \sqrt{l!(m-l)!}\,\sqrt{k!(m-k)!}\; \delta_{lk} \quad \text{or} \quad \left\langle \frac{z_1^l z_2^{m-l}}{\sqrt{l!(m-l)!}} \,\middle|\, \frac{z_1^k z_2^{m-k}}{\sqrt{k!(m-k)!}} \right\rangle = \delta_{lk}. \tag{20.204}
\]
Equation (20.204) implies that the m + 1 monomials of degree m with respect to z1 and z2 constitute an orthonormal basis.
Since ρ_J defined in (20.200) is independent of M, we can evaluate it by choosing M conveniently in (20.203). Setting M (= m1 + m2) = J and substituting it in J − j1 + λ − m2 of (20.198), we obtain (m1 + m2) − j1 + λ − m2 = λ + m1 − j1. So far as we are dealing with integers, the inside of a factorial must be non-negative, and so we have
\[
\lambda + m_1 - j_1 \ge 0 \quad \text{or} \quad \lambda \ge j_1 - m_1. \tag{20.205}
\]
Meanwhile, looking at another factorial (j1 − λ − m1)!, we get
\[
\lambda \le j_1 - m_1. \tag{20.206}
\]
From the above equations, we obtain only one choice for λ such that [5]
\[
\lambda = j_1 - m_1. \tag{20.207}
\]

Notice that j1 − m1 is an integer whether j1 is an integer or a half-odd-integer. Then, we get

\[
\begin{aligned}
C_{m_1,m_2}^J &= (-1)^{j_1-m_1}\, \frac{\sqrt{(2J)!}}{(J-j_2+j_1)!(J-j_1+j_2)!} \sqrt{\frac{(j_1+m_1)!(j_2+m_2)!}{(j_1-m_1)!(j_2-m_2)!}} \\
&= (-1)^{j_1-m_1} \sqrt{\frac{(2J)!}{(J-j_2+j_1)!(J-j_1+j_2)!}}\, \sqrt{\binom{j_1+m_1}{j_2-m_2}\binom{j_2+m_2}{j_1-m_1}}. \tag{20.208}
\end{aligned}
\]
To confirm (20.208), calculate \(\binom{j_1+m_1}{j_2-m_2}\binom{j_2+m_2}{j_1-m_1}\) and use, e.g.,
\[
j_1 - j_2 + m_1 + m_2 = j_1 - j_2 + M = j_1 - j_2 + J. \tag{20.209}
\]

Thus, we get
\[
\sum_{m_1,m_2;\, m_1+m_2=M} \left(C_{m_1,m_2}^J\right)^2 = \frac{(2J)!}{(J-j_2+j_1)!(J-j_1+j_2)!} \sum_{m_1,m_2;\, m_1+m_2=M} \binom{j_1+m_1}{j_2-m_2}\binom{j_2+m_2}{j_1-m_1}. \tag{20.210}
\]

To evaluate the sum, we use the following relations [4, 5, 10]:
\[
\Gamma(z)\Gamma(1-z) = \pi/\sin \pi z. \tag{20.211}
\]
Replacing z with x + 1 in (20.211), we have
\[
\Gamma(x+1)\Gamma(-x) = \pi/\sin \pi(x+1). \tag{20.212}
\]

Meanwhile, using gamma functions, we have
\[
\begin{aligned}
\binom{x}{y} &= \frac{x!}{y!(x-y)!} = \frac{\Gamma(x+1)}{y!\,\Gamma(x-y+1)} = \frac{\Gamma(x+1)\Gamma(-x)}{\Gamma(x-y+1)\Gamma(y-x)} \cdot \frac{\Gamma(y-x)}{y!\,\Gamma(-x)} \\
&= \frac{\pi}{\sin \pi(x+1)} \cdot \frac{\sin \pi(x-y+1)}{\pi} \cdot \frac{\Gamma(y-x)}{y!\,\Gamma(-x)} \\
&= \frac{\sin \pi(x-y+1)}{\sin \pi(x+1)} \cdot \frac{\Gamma(y-x)}{y!\,\Gamma(-x)} = (-1)^y\, \frac{\Gamma(y-x)}{y!\,\Gamma(-x)}. \tag{20.213}
\end{aligned}
\]

Replacing x in (20.213) with y − x − 1, we have
\[
\binom{y-x-1}{y} = (-1)^y\, \frac{\Gamma(x+1)}{y!\,\Gamma(x-y+1)} = (-1)^y \binom{x}{y},
\]
where with the last equality we used (20.213). That is, we get
\[
\binom{x}{y} = (-1)^y \binom{y-x-1}{y}. \tag{20.214}
\]

Applying (20.214) to (20.210), we get
\[
\begin{aligned}
\sum_{m_1,m_2;\, m_1+m_2=M} \left(C_{m_1,m_2}^J\right)^2 &= \frac{(2J)!}{(J-j_2+j_1)!(J-j_1+j_2)!} \sum_{m_1,m_2;\, m_1+m_2=M} (-1)^{j_2-m_2}\binom{j_2-j_1-J-1}{j_2-m_2}\,(-1)^{j_1-m_1}\binom{j_1-j_2-J-1}{j_1-m_1} \\
&= \frac{(-1)^{j_1+j_2-J}(2J)!}{(J-j_2+j_1)!(J-j_1+j_2)!} \sum_{m_1,m_2;\, m_1+m_2=M} \binom{j_2-j_1-J-1}{j_2-m_2}\binom{j_1-j_2-J-1}{j_1-m_1}. \tag{20.215}
\end{aligned}
\]

Meanwhile, from the binomial theorem, we have
\[
(1+x)^r (1+x)^s = \left[\sum_\alpha \binom{r}{\alpha} x^\alpha\right]\left[\sum_\beta \binom{s}{\beta} x^\beta\right] = \sum_{\alpha,\beta} \binom{r}{\alpha}\binom{s}{\beta} x^{\alpha+\beta} = (1+x)^{r+s} = \sum_\gamma \binom{r+s}{\gamma} x^\gamma. \tag{20.216}
\]

Comparing the coefficients of the γ-th power monomials of the last three sides, we get
\[
\binom{r+s}{\gamma} = \sum_{\alpha,\beta;\, \alpha+\beta=\gamma} \binom{r}{\alpha}\binom{s}{\beta}. \tag{20.217}
\]

Applying (20.217) to (20.215) and employing (20.214) once again, we obtain
\[
\begin{aligned}
\sum_{m_1,m_2;\, m_1+m_2=M} \left(C_{m_1,m_2}^J\right)^2 &= \frac{(-1)^{j_1+j_2-J}(2J)!}{(J-j_2+j_1)!(J-j_1+j_2)!} \binom{-2J-2}{j_1+j_2-J} = \frac{(-1)^{j_1+j_2-J}(-1)^{j_1+j_2-J}(2J)!}{(J-j_2+j_1)!(J-j_1+j_2)!} \binom{j_1+j_2+J+1}{j_1+j_2-J} \\
&= \frac{(2J)!}{(J-j_2+j_1)!(J-j_1+j_2)!} \cdot \frac{(j_1+j_2+J+1)!}{(j_1+j_2-J)!(2J+1)!} \\
&= \frac{(j_1+j_2+J+1)!}{(2J+1)(J-j_2+j_1)!(J-j_1+j_2)!(j_1+j_2-J)!}. \tag{20.218}
\end{aligned}
\]

Inserting (20.218) into (20.203), as a positive number ρ_J we have
\[
\rho_J = \sqrt{\frac{(2J+1)(J-j_2+j_1)!(J-j_1+j_2)!(j_1+j_2-J)!}{(j_1+j_2+J+1)!}}. \tag{20.219}
\]

From (20.200), we find
\[
C(J) = \sqrt{\frac{2J+1}{(J-j_2+j_1)!(J-j_1+j_2)!(j_1+j_2-J)!(j_1+j_2+J+1)!}}. \tag{20.220}
\]

Thus, as the Clebsch–Gordan coefficients (j1m1j2m2|JM), at last we get
\[
\begin{aligned}
(j_1 m_1 j_2 m_2 | JM) &\equiv \rho_J\, C_{m_1,m_2}^J \\
&= \sqrt{\frac{(2J+1)(J-j_2+j_1)!(J-j_1+j_2)!(j_1+j_2-J)!}{(j_1+j_2+J+1)!}} \\
&\quad \times \sum_\lambda \frac{(-1)^\lambda [(j_1+m_1)!(j_1-m_1)!(j_2+m_2)!(j_2-m_2)!(J+M)!(J-M)!]^{1/2}}{\lambda!(j_1+j_2-J-\lambda)!(j_1-\lambda-m_1)!(J-j_2+\lambda+m_1)!(j_2-\lambda+m_2)!(J-j_1+\lambda-m_2)!}. \tag{20.221}
\end{aligned}
\]

During the course of the above calculations, we encountered, e.g., Γ(−x) and factorials involving a negative integer. Under ordinary circumstances, however, such things must be avoided. Nonetheless, it is convenient for practical use if we properly recognize that we first choose a number, e.g., x close to (but not identical with) a negative integer and finally take the limit of x at the negative integer after the related calculations have been finished.
Choosing, e.g., M (= m1 + m2) = −J in (20.203) instead of setting M = J as above, we see that only the term with λ = j2 + m2 in (20.198) survives. Yet we get the same result as above. The confirmation is left for readers.
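Formula (20.221) is easy to transcribe into a small program. The following Python sketch (the helper names fact and cg are ours; exact rational arithmetic via the standard fractions module keeps the factorials exact) reproduces, e.g., the coefficients obtained in the Examples of Sect. 20.3.3 below:

from fractions import Fraction
from math import factorial, sqrt

def fact(x):
    """x! for an integer-valued non-negative x; None flags a negative
    integer (such terms drop out of the lambda sum, cf. the text)."""
    x = Fraction(x)
    if x < 0:
        return None
    assert x.denominator == 1
    return factorial(x.numerator)

def cg(j1, m1, j2, m2, J, M):
    """(j1 m1 j2 m2 | J M) evaluated literally from (20.221)."""
    if m1 + m2 != M:
        return 0.0
    j1, m1, j2, m2, J, M = (Fraction(x) for x in (j1, m1, j2, m2, J, M))
    pref = ((2 * J + 1) * fact(J - j2 + j1) * fact(J - j1 + j2)
            * fact(j1 + j2 - J) / fact(j1 + j2 + J + 1))
    num = (fact(j1 + m1) * fact(j1 - m1) * fact(j2 + m2) * fact(j2 - m2)
           * fact(J + M) * fact(J - M))
    s = Fraction(0)
    for lam in range(int(j1 + j2 - J) + 1):
        dens = [fact(lam), fact(j1 + j2 - J - lam), fact(j1 - lam - m1),
                fact(J - j2 + lam + m1), fact(j2 - lam + m2),
                fact(J - j1 + lam - m2)]
        if None not in dens:
            d = 1
            for f in dens:
                d *= f
            s += Fraction((-1) ** lam, d)
    return float(s) * sqrt(float(pref) * float(num))

# (20.229): (1,0,1,-1|1,-1) = 1/sqrt(2) and (1,-1,1,0|1,-1) = -1/sqrt(2)
print(cg(1, 0, 1, -1, 1, -1), cg(1, -1, 1, 0, 1, -1))
print(cg(1, 0, 1, 0, 2, 0))   # 2/sqrt(6), the middle coefficient of (20.228)

Half-odd-integer quantum numbers such as j = 1/2 may be passed as 0.5; all factorial arguments in (20.221) are then still integers, so the same routine covers SU(2).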

20.3.3 Examples of Calculation of Clebsch–Gordan Coefficients

The simple examples mentioned below help us understand the calculation procedures of the Clebsch–Gordan coefficients and their implications.
Example 20.1 As the simplest example, we examine the case of D^{(1/2)} ⊗ D^{(1/2)}. At the same time, it is an example of the symmetric and antisymmetric representations

discussed in Sect. 18.9. Taking two sets of complex variables u ≡ (u2 u1) and v ≡ (v2 v1) and rewriting them according to (20.176), we have
\[
\psi_{-1/2}^{1/2} = u_2, \quad \psi_{1/2}^{1/2} = u_1, \quad \phi_{-1/2}^{1/2} = v_2, \quad \phi_{1/2}^{1/2} = v_1,
\]
so that we can get the basis functions of the direct-product representation described by
\[
(u_2 v_2\ \ u_1 v_2\ \ u_2 v_1\ \ u_1 v_1).
\]
Notice that we have four functions above; i.e., (2j1 + 1)(2j2 + 1) = 4 with j1 = j2 = 1/2. Using (20.221) we can determine the proper basis functions with respect to Ψ_M^J of (20.202). For example, for Ψ_{-1}^1 we have only one choice, m1 = m2 = −1/2, to get m1 + m2 = M = −1. In (20.221) we have no other choice but to take λ = 0. In this way, we have
\[
\Psi_{-1}^1 = u_2 v_2 = \psi_{-1/2}^{1/2}\, \phi_{-1/2}^{1/2}.
\]

Similarly, we get
\[
\Psi_1^1 = u_1 v_1 = \psi_{1/2}^{1/2}\, \phi_{1/2}^{1/2}.
\]
For the two other functions, using (20.221) we obtain
\[
\begin{aligned}
\Psi_0^1 &= \frac{1}{\sqrt{2}}(u_2 v_1 + u_1 v_2) = \frac{1}{\sqrt{2}}\left(\psi_{-1/2}^{1/2}\phi_{1/2}^{1/2} + \psi_{1/2}^{1/2}\phi_{-1/2}^{1/2}\right), \\
\Psi_0^0 &= \frac{1}{\sqrt{2}}(u_1 v_2 - u_2 v_1) = \frac{1}{\sqrt{2}}\left(\psi_{1/2}^{1/2}\phi_{-1/2}^{1/2} - \psi_{-1/2}^{1/2}\phi_{1/2}^{1/2}\right).
\end{aligned}
\]

We put the above results in a simple matrix form representing the basis vector transformation such that
\[
(u_2 v_2\ \ u_1 v_2\ \ u_1 v_1\ \ u_2 v_1)
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & \frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \\
0 & 0 & 1 & 0 \\
0 & \frac{1}{\sqrt{2}} & 0 & -\frac{1}{\sqrt{2}}
\end{pmatrix}
= \left(\Psi_{-1}^1\ \Psi_0^1\ \Psi_1^1\ \Psi_0^0\right). \tag{20.222}
\]

In this way, (20.222) shows how the basis functions that possess a proper irreducible representation and span the direct-product representation space are constructed using the original product functions. The Clebsch–Gordan coefficients play the role of a linker (i.e., a unitary operator) between those two sets of functions. In this respect these coefficients resemble those appearing in the symmetry-adapted linear combinations (SALCs) mentioned in Sects. 18.2 and 19.3.
Expressing the unitary transformation according to (20.173) and choosing (Ψ_{-1}^1 Ψ_0^1 Ψ_1^1 Ψ_0^0) as the basis vectors, we describe the transformation in a manner similar to (20.48) such that
\[
R(a,b)\left(\Psi_{-1}^1\ \Psi_0^1\ \Psi_1^1\ \Psi_0^0\right) = \left(\Psi_{-1}^1\ \Psi_0^1\ \Psi_1^1\ \Psi_0^0\right)\left[D^{(1/2)} \otimes D^{(1/2)}\right]. \tag{20.223}
\]

Then, as the matrix representation of D^{(1/2)} ⊗ D^{(1/2)} we get
\[
D^{(1/2)} \otimes D^{(1/2)} =
\begin{pmatrix}
a^2 & \sqrt{2}\,ab & b^2 & 0 \\
-\sqrt{2}\,ab^* & aa^* - bb^* & \sqrt{2}\,a^*b & 0 \\
(b^*)^2 & -\sqrt{2}\,a^*b^* & (a^*)^2 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}. \tag{20.224}
\]

Notice that (20.224) is a block matrix; namely, (20.224) has been reduced according to the symmetric and antisymmetric representations. Symbolically writing (20.223) and (20.224), we have
\[
D^{(1/2)} \otimes D^{(1/2)} = D^{(1)} \oplus D^{(0)}.
\]
The block matrix of (20.224) is unitary, and so the (3, 3) submatrix and the (1, 1) submatrix (i.e., the number 1) are unitary accordingly. To show this, use the conditions (20.35) and (20.173). The confirmation is left for readers. The (3, 3) submatrix is a symmetric representation, whose representation space (Ψ_{-1}^1 Ψ_0^1 Ψ_1^1) spans. Note that these functions are symmetric under the exchange m1 ↔ m2. Or, we may think of these functions as symmetric under the exchange ψ ↔ ϕ. Though trivial, the basis function Ψ_0^0 spans the antisymmetric representation space. The corresponding submatrix is merely the number 1. This is directly related to the fact that v(ug)^T (or u1v2 − u2v1) of (20.185) is an invariant. The function Ψ_0^0 changes sign (i.e., is antisymmetric) under the exchange m1 ↔ m2 or ψ ↔ ϕ.
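The reduction just described can also be observed numerically: conjugating the Kronecker product of a random SU(2) matrix with itself by the Clebsch–Gordan matrix produces the block form of (20.224). A sketch assuming NumPy; note that, for convenience, the product basis is ordered here as (u2v2, u2v1, u1v2, u1v1) so that the direct-product representation is literally the Kronecker product, and the matrix C below is (20.222) rewritten in that ordering:

import numpy as np

rng = np.random.default_rng(1)
z = rng.normal(size=4)
a, b = z[0] + 1j * z[1], z[2] + 1j * z[3]
n = np.sqrt(abs(a)**2 + abs(b)**2)
a, b = a / n, b / n                      # |a|^2 + |b|^2 = 1 as in (20.35)
m = np.array([[a, b], [-b.conjugate(), a.conjugate()]])

D = np.kron(m, m)                        # direct-product representation

s = 1.0 / np.sqrt(2.0)
# Columns: Psi^1_{-1}, Psi^1_0, Psi^1_1, Psi^0_0 (cf. (20.222))
C = np.array([[1, 0, 0, 0],
              [0, s, 0, -s],
              [0, s, 0, s],
              [0, 0, 1, 0]])

B = C.conj().T @ D @ C                   # representation on the Psi basis
print(np.round(B, 3))                    # a 3x3 block (D^(1)) plus the 1 of D^(0)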
Once again, readers might well ask why we need to work out such an elaborate means to pick up a small pebble. With increasing dimension of the vector space, however, seeking an appropriate set of proper basis functions becomes increasingly as difficult as lifting a huge rock. Though simple, the next example gives us a feel for it. Under such situations, a projection operator is an indispensable tool for addressing the problems.
Example 20.2 We examine the direct product D^{(1)} ⊗ D^{(1)}, where j1 = j2 = 1. This is another example of the symmetric and antisymmetric representations. Taking two sets of complex variables u ≡ (u2 u1) and v ≡ (v2 v1) and rewriting them

according to (20.176) again, we have nine, i.e., (2j1 + 1)(2j2 + 1), product functions expressed as
\[
\psi_{-1}^1\phi_{-1}^1,\ \psi_0^1\phi_{-1}^1,\ \psi_1^1\phi_{-1}^1,\ \psi_{-1}^1\phi_0^1,\ \psi_0^1\phi_0^1,\ \psi_1^1\phi_0^1,\ \psi_{-1}^1\phi_1^1,\ \psi_0^1\phi_1^1,\ \psi_1^1\phi_1^1. \tag{20.225}
\]

These product functions form the basis functions of the direct-product representation. Consequently, we will be able to construct proper eigenfunctions by means of linear combinations of these functions. This method is again related to that based on SALCs discussed in Chap. 19. In the present case, it is accomplished by finding the Clebsch–Gordan coefficients described by (20.221).
As implied in Table 20.1, we need to determine the individual coefficients of the nine functions of (20.225) with respect to the following functions, which are constituted by linear combinations of the above nine functions:
\[
\Psi_{-2}^2,\ \Psi_{-1}^2,\ \Psi_0^2,\ \Psi_1^2,\ \Psi_2^2,\ \Psi_{-1}^1,\ \Psi_0^1,\ \Psi_1^1,\ \Psi_0^0.
\]

(i) With the first five functions, we have J = j1 + j2 = 2. In this case, the determination procedures of the coefficients are pretty much simplified. Substituting J = j1 + j2 into the second factor of the denominator of (20.221), we get (−λ)! as well as λ! in the first factor of the denominator. Therefore, for the insides of the factorials not to be negative we must have λ = 0. Consequently, we have [5]
\[
\rho_J = \sqrt{\frac{(2j_1)!(2j_2)!}{(2J)!}}, \qquad
C_{m_1,m_2}^J = \left[\frac{(J+M)!(J-M)!}{(j_1+m_1)!(j_1-m_1)!(j_2+m_2)!(j_2-m_2)!}\right]^{1/2}.
\]
With Ψ_{-2}^2 we have only one choice for ρ_J and C_{m_1,m_2}^J. That is, we have m1 = −j1 and m2 = −j2. Then, Ψ_{-2}^2 = ψ_{-1}^1 φ_{-1}^1. In the case of M = −1, we have two choices in (20.221); i.e., m1 = −1, m2 = 0 and m1 = 0, m2 = −1. Then, we get
\[
\Psi_{-1}^2 = \frac{1}{\sqrt{2}}\left(\psi_{-1}^1\phi_0^1 + \psi_0^1\phi_{-1}^1\right). \tag{20.226}
\]

In the case of M = 1, we similarly have two choices in (20.221); i.e., m1 = 1, m2 = 0 and m1 = 0, m2 = 1. We get
\[
\Psi_1^2 = \frac{1}{\sqrt{2}}\left(\psi_1^1\phi_0^1 + \psi_0^1\phi_1^1\right). \tag{20.227}
\]
Moreover, in the case of M = 0, we have three choices in (20.221); i.e., m1 = −1, m2 = 1 and m1 = 0, m2 = 0 along with m1 = 1, m2 = −1 (see Table 20.1). As a result, we get
\[
\Psi_0^2 = \frac{1}{\sqrt{6}}\left(\psi_{-1}^1\phi_1^1 + 2\psi_0^1\phi_0^1 + \psi_1^1\phi_{-1}^1\right). \tag{20.228}
\]

(ii) With J = 1, i.e., Ψ_{-1}^1, Ψ_0^1, and Ψ_1^1, we start with (20.221). Noting the λ! and (1 − λ)! in the first two factors of the denominator, we have λ = 0 or λ = 1. In the case of M = −1, we have only one choice, m1 = 0, m2 = −1, for λ = 0. Similarly, m1 = −1, m2 = 0 for λ = 1. Hence, we get
\[
\Psi_{-1}^1 = \frac{1}{\sqrt{2}}\left(\psi_0^1\phi_{-1}^1 - \psi_{-1}^1\phi_0^1\right). \tag{20.229}
\]
In a similar manner, for M = 0 and M = 1, respectively, we have
\[
\Psi_0^1 = \frac{1}{\sqrt{2}}\left(\psi_1^1\phi_{-1}^1 - \psi_{-1}^1\phi_1^1\right), \tag{20.230}
\]
\[
\Psi_1^1 = \frac{1}{\sqrt{2}}\left(\psi_1^1\phi_0^1 - \psi_0^1\phi_1^1\right). \tag{20.231}
\]

(iii) With J = 0 (i.e., Ψ_0^0), we have J = j1 − j2 (= 0). In this case, we have J − j1 + λ − m2 = −j2 + λ − m2 in the denominator of (20.221) [5]. Since this factor is inside a factorial, we have −j2 + λ − m2 ≥ 0. Also, we have j2 − λ + m2 ≥ 0 from the denominator of (20.221). Hence, we have only one choice, λ = j2 + m2. Therefore, we get
\[
\rho_J = \sqrt{\frac{(2J+1)!(2j_2)!}{(2j_1+1)!}}, \qquad
C_{m_1,m_2}^J = (-1)^{j_2+m_2}\left[\frac{(j_1+m_1)!(j_1-m_1)!}{(J+M)!(J-M)!(j_2+m_2)!(j_2-m_2)!}\right]^{1/2}.
\]
In the above, we have three choices: m1 = −1, m2 = 1; m1 = m2 = 0; m1 = 1, m2 = −1. As a result, we get
\[
\Psi_0^0 = \frac{1}{\sqrt{3}}\left(\psi_{-1}^1\phi_1^1 - \psi_0^1\phi_0^1 + \psi_1^1\phi_{-1}^1\right).
\]

Summarizing the above results, we have constructed the proper eigenfunctions with respect to the combined angular momenta using the basis functions of the direct-product representation. An example of the relevant matrix representations is described as follows:
\[
\left(\psi_{-1}^1\phi_{-1}^1\ \ \psi_0^1\phi_{-1}^1\ \ \psi_1^1\phi_{-1}^1\ \ \psi_0^1\phi_1^1\ \ \psi_1^1\phi_1^1\ \ \psi_{-1}^1\phi_0^1\ \ \psi_{-1}^1\phi_1^1\ \ \psi_1^1\phi_0^1\ \ \psi_0^1\phi_0^1\right)
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & \frac{1}{\sqrt{2}} & 0 & 0 & 0 & \frac{1}{\sqrt{2}} & 0 & 0 & 0 \\
0 & 0 & \frac{1}{\sqrt{6}} & 0 & 0 & 0 & \frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{3}} \\
0 & 0 & 0 & \frac{1}{\sqrt{2}} & 0 & 0 & 0 & -\frac{1}{\sqrt{2}} & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & \frac{1}{\sqrt{2}} & 0 & 0 & 0 & -\frac{1}{\sqrt{2}} & 0 & 0 & 0 \\
0 & 0 & \frac{1}{\sqrt{6}} & 0 & 0 & 0 & -\frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{3}} \\
0 & 0 & 0 & \frac{1}{\sqrt{2}} & 0 & 0 & 0 & \frac{1}{\sqrt{2}} & 0 \\
0 & 0 & \frac{2}{\sqrt{6}} & 0 & 0 & 0 & 0 & 0 & -\frac{1}{\sqrt{3}}
\end{pmatrix}
= \left(\Psi_{-2}^2\ \Psi_{-1}^2\ \Psi_0^2\ \Psi_1^2\ \Psi_2^2\ \Psi_{-1}^1\ \Psi_0^1\ \Psi_1^1\ \Psi_0^0\right). \tag{20.232}
\]

In (20.232) the matrix elements represent the Clebsch–Gordan coefficients, and their combination in (20.232) forms a unitary matrix. In (20.232) we have arbitrariness in the disposition of the elements of both the row and column vectors. In other words, we have arbitrariness with respect to the unitary similarity transformation of the matrix, but a unitary matrix that has undergone a unitary similarity transformation is again a unitary matrix (see Chap. 14). To show that the determinant of (20.232) is 1, use the expansion of the determinant and the fact that when (a multiple of) a certain column (or row) is added to another column (or row), the determinant is unchanged (see Sect. 11.3).
We find that the functions belonging to J = 2 and J = 0 are the basis functions of the symmetric representations. Meanwhile, those belonging to J = 1 are the basis functions of the antisymmetric representations (see Sect. 18.9). Expressing the unitary transformation of this example according to (20.173) and choosing (Ψ_{-2}^2 Ψ_{-1}^2 Ψ_0^2 Ψ_1^2 Ψ_2^2 Ψ_{-1}^1 Ψ_0^1 Ψ_1^1 Ψ_0^0) as the basis vectors, we describe the transformation as


\[
\Re(a,b)\left(\Psi_{-2}^2\ \Psi_{-1}^2\ \Psi_0^2\ \Psi_1^2\ \Psi_2^2\ \Psi_{-1}^1\ \Psi_0^1\ \Psi_1^1\ \Psi_0^0\right) = \left(\Psi_{-2}^2\ \Psi_{-1}^2\ \Psi_0^2\ \Psi_1^2\ \Psi_2^2\ \Psi_{-1}^1\ \Psi_0^1\ \Psi_1^1\ \Psi_0^0\right)\left[D^{(1)} \otimes D^{(1)}\right]. \tag{20.233}
\]
Notice again that the representation matrix of D^{(1)} ⊗ D^{(1)} can be converted (or reduced) to a block unitary matrix according to the symmetric and antisymmetric representations such that
\[
D^{(1)} \otimes D^{(1)} = D^{(2)} \oplus D^{(1)} \oplus D^{(0)}, \tag{20.234}
\]

where D^{(2)} and D^{(0)} are the symmetric representations and D^{(1)} is the antisymmetric representation. Recall that SO(3) is simply reducible. To show (20.234), we need somewhat lengthy calculations, but the computation procedures are straightforward.
The set of basis functions (Ψ_{-2}^2 Ψ_{-1}^2 Ψ_0^2 Ψ_1^2 Ψ_2^2) spans a five-dimensional symmetric representation space. In turn, Ψ_0^0 spans a one-dimensional symmetric representation space. The other set of basis functions (Ψ_{-1}^1 Ψ_0^1 Ψ_1^1) spans a three-dimensional antisymmetric representation space.
We add that seeking some special Clebsch–Gordan coefficients is easy. For example, we can readily find them in the cases of J = j1 + j2 = M or J = j1 + j2 = −M. The final results for the normalized basis functions are
\[
\Psi_{j_1+j_2}^{j_1+j_2} = \psi_{j_1}^{j_1}\phi_{j_2}^{j_2}, \qquad \Psi_{-(j_1+j_2)}^{j_1+j_2} = \psi_{-j_1}^{j_1}\phi_{-j_2}^{j_2}.
\]
That is, among the related coefficients, only
\[
(j_1, m_1 = j_1, j_2, m_2 = j_2\, |\, J = j_1+j_2, M = J)
\]
and
\[
(j_1, m_1 = -j_1, j_2, m_2 = -j_2\, |\, J = j_1+j_2, M = -J)
\]
survive, each taking a value of 1; see the corresponding parts of the matrix elements in (20.221). This can readily be seen in Tables 20.1, 20.2, and 20.3.

20.4 Lie Groups and Lie Algebras

In Sect. 20.2 we developed a practical approach to dealing with various aspects and characteristics of SU(2) and SO(3). In this section we introduce the elegant theory of Lie groups and Lie algebras, which enables us to study SU(2) and SO(3) systematically.

20.4.1 Definition of Lie Groups and Lie Algebras: One-Parameter Groups

We start with the following discussion. Let A(t) be a non-singular matrix whose elements vary as functions of a real number t; i.e., we think of the matrix as a function of the real number t. If under such conditions A(t) constitutes a group, A(t) is said to be a one-parameter group. In particular, we are interested in the situation where we have
\[
A(s+t) = A(s)A(t). \tag{20.235}
\]
We further get
\[
A(s)A(t) = A(s+t) = A(t+s) = A(t)A(s). \tag{20.236}
\]

Therefore, any pair A(t) and A(s) is commutative. Putting t = s = 0 in (20.236), we have
\[
A(0) = A(0)A(0). \tag{20.237}
\]
Since A(0) is non-singular, A(0)^{−1} must exist. Then, multiplying both sides of (20.237) by A(0)^{−1}, we obtain
\[
A(0) = E, \tag{20.238}
\]
where E is an identity matrix. Also, we have
\[
A(t)A(-t) = A(0) = E, \tag{20.239}
\]
meaning that
\[
A(-t) = A(t)^{-1}.
\]

Differentiating both sides of (20.235) with respect to s at s = 0, we get

A′(t) = A′(0)A(t).   (20.240)

Defining

X ≡ A′(0),   (20.241)

we obtain

A′(t) = XA(t).   (20.242)

Integrating (20.242), we have

A(t) = exp(tX)   (20.243)

under the initial condition A(0) = E. Thus, any one-parameter group A(t) is described
by (20.243) on condition that A(t) is represented by (20.235).
In Sect. 20.1 we mentioned the Lie algebra in relation to (20.14). In this context,
(20.243) shows how the Lie algebra is related to the one-parameter group. That a
group is a continuous group is thus deeply connected to the one-parameter groups it
contains. Equation (20.45) of Sect. 20.2 is an example of a product of one-parameter
groups. Bearing in mind the above situation, we describe the definitions of the Lie
group and the Lie algebra.
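As a quick numerical illustration of (20.235), (20.238), and (20.243) (not part of the formal development), one may verify the one-parameter group law with the matrix exponential. The sketch below assumes NumPy and SciPy; the matrix X is an arbitrary example.

```python
import numpy as np
from scipy.linalg import expm

X = np.array([[0.0, -1.3], [1.3, 0.0]])   # any square matrix will do
s, t = 0.7, -0.4
lhs = expm((s + t) * X)                   # A(s + t)
rhs = expm(s * X) @ expm(t * X)           # A(s)A(t)
print(np.allclose(lhs, rhs))              # True: the group law (20.235)
print(np.allclose(expm(0 * X), np.eye(2)))  # True: A(0) = E, (20.238)
```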
Definition 20.2 [9] Let G be a subgroup of GL(n, ℂ). Suppose that
Aₖ (k = 1, 2, ⋯) ∈ G and that lim_{k→∞} Aₖ = A [∈ GL(n, ℂ)]. If A ∈ G, then G is called
a linear Lie group of dimension n.

We usually call a linear Lie group simply a Lie group. Note that GL(n, ℂ) has
appeared in Sect. 16.1. The definition again is likely to bewilder chemists in
that it reads as though the definition itself, including the term Lie group, were
written in the air. Nevertheless, the next example is again expected to relieve the
chemists of needless anxiety.
Example 20.3 Let G be a group and let Aₖ ∈ G with det Aₖ = 1 such that

A_k = \begin{pmatrix} \cos θ_k & -\sin θ_k \\ \sin θ_k & \cos θ_k \end{pmatrix}   (θₖ: real number).   (20.244)

Suppose that lim_{k→∞} θₖ = θ. Then we have

lim_{k→∞} A_k = \begin{pmatrix} \cos θ & -\sin θ \\ \sin θ & \cos θ \end{pmatrix} ≡ A.   (20.245)

Certainly, we have A ∈ G, because det Aₖ = det A = 1. Such a group is called
SO(2).

In a word, a Lie group is a group whose elements are continuous and
differentiable (or analytic) functions of the parameters. In that sense, the groups
such as SU(2) and SO(3) that we have already encountered are Lie groups.
Next, the following is in turn a definition of a Lie algebra.
Definition 20.3 [9] Let G be a linear Lie group. The set of all matrices X such that,
for every real number t,

exp(tX) ∈ G,   (20.246)

is called the Lie algebra of the corresponding Lie group G. We denote the Lie
algebra of the Lie group G by 𝔤.

The relation (20.246) includes the case where X is a (1, 1) matrix (i.e., merely a
complex number) as the trivial case. Most importantly, (20.246) is directly
connected to (20.243), and Definition 20.3 shows the direct relevance between the
Lie group and the Lie algebra. The two theories of Lie groups and Lie algebras
underlie continuous groups.

The Lie algebra 𝔤 has the following properties:

(i) If X ∈ 𝔤, then aX ∈ 𝔤 as well, where a is any real number.
(ii) If X, Y ∈ 𝔤, then X + Y ∈ 𝔤 as well.
(iii) If X, Y ∈ 𝔤, then [X, Y] (≡ XY − YX) ∈ 𝔤 as well.
In fact, exp[t(aX)] = exp[(ta)X] and ta is a real number, so that we have (i).
Regarding (ii) we should examine the individual properties of X. As already shown in
Property (7′) of Chap. 15 (Sect. 15.2), e.g., a unitary matrix exp(tX) is associated
with an anti-Hermitian matrix X; we are interested in this particular case. In the
case of SU(2), X is a traceless anti-Hermitian matrix, and as to SO(3), X is a real
skew-symmetric matrix. In the former case, for example, traceless anti-Hermitian
matrices X and Y can be expressed as

X = \begin{pmatrix} ic & a + ib \\ -a + ib & -ic \end{pmatrix},   Y = \begin{pmatrix} ih & f + ig \\ -f + ig & -ih \end{pmatrix},   (20.247)

where a, b, c, etc. are real. Then, we have

X + Y = \begin{pmatrix} i(c + h) & (a + f) + i(b + g) \\ -(a + f) + i(b + g) & -i(c + h) \end{pmatrix}.   (20.248)

Thus, X + Y is a traceless anti-Hermitian matrix as well. With the property (iii), we
need some explanation. Suppose X, Y ∈ 𝔤. Then, with any real numbers s and t, exp
(sX), exp(tY), exp(−sX), exp(−tY) ∈ G by Definition 20.3. Then, their product exp
(sX) exp(tY) exp(−sX) exp(−tY) ∈ G as well. Taking an infinitesimal transforma-
tion of these four factors within the first order of s and t, we have

exp(sX) exp(tY) exp(−sX) exp(−tY)
  ≈ (1 + sX)(1 + tY)(1 − sX)(1 − tY)
  = (1 + tY + sX + stXY)(1 − tY − sX + stXY)
  ≈ (1 − tY − sX + stXY) + (tY − stYX) + (sX − stXY) + stXY
  = 1 + st(XY − YX).   (20.249)

Notice that we ignored the terms having s², t², st², s²t, and s²t² as a coefficient.
Defining

[X, Y] ≡ XY − YX,   (20.250)

we have

exp(sX) exp(tY) exp(−sX) exp(−tY) ≈ 1 + st[X, Y] ≈ exp(st[X, Y]).   (20.251)

Since the LHS represents an element of G and st is a real number, by Definition 20.3
[X, Y] ∈ 𝔤. The quantity [X, Y] is called a commutator. The commutator and its
definition appeared in (1.139).
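The heuristic estimate (20.249)–(20.251) is easily checked numerically: for small s and t, the group commutator exp(sX) exp(tY) exp(−sX) exp(−tY) approaches exp(st[X, Y]). A minimal sketch follows (NumPy/SciPy assumed; the matrices X and Y are arbitrary anti-Hermitian examples).

```python
import numpy as np
from scipy.linalg import expm

X = np.array([[1j, 2.0], [-2.0, -1j]])    # traceless anti-Hermitian
Y = np.array([[0.5j, 1j], [1j, -0.5j]])   # traceless anti-Hermitian
s = t = 1.0e-3
g = expm(s * X) @ expm(t * Y) @ expm(-s * X) @ expm(-t * Y)
comm = X @ Y - Y @ X                      # [X, Y] of (20.250)
# Residual should be of third order (s^2 t, s t^2), here ~1e-9:
print(np.max(np.abs(g - expm(s * t * comm))))
```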
From properties (i) and (ii), the Lie algebra 𝔤 forms a vector space (see Chap. 11).
An element of 𝔤 is usually expressed by a matrix; the zero vector corresponds to the
zero matrix (see Proposition 15.1). As already seen in Chap. 15, we define an inner
product between any A, B ∈ 𝔤 such that

⟨A|B⟩ ≡ Σ_{i,j} a*_{ij} b_{ij}.   (20.252)

Thus, 𝔤 constitutes an inner product (vector) space.
20.4.2 Properties of Lie Algebras

Let us further continue our discussion of Lie algebras by thinking of the following
proposition about the inner product.

Proposition 20.1 Let f(t) and g(t) be continuous and differentiable functions with
respect to a real number t. Then, we have

d/dt ⟨f(t)|g(t)⟩ = ⟨df(t)/dt | g(t)⟩ + ⟨f(t) | dg(t)/dt⟩.   (20.253)
Proof We calculate the following equation:

⟨f(t + Δ)|g(t + Δ)⟩ − ⟨f(t)|g(t)⟩
  = ⟨f(t + Δ)|g(t + Δ)⟩ − ⟨f(t)|g(t + Δ)⟩ + ⟨f(t)|g(t + Δ)⟩ − ⟨f(t)|g(t)⟩
  = ⟨f(t + Δ) − f(t)|g(t + Δ)⟩ + ⟨f(t)|g(t + Δ) − g(t)⟩.   (20.254)

Dividing both sides of (20.254) by a real number Δ and taking the limit as below,
we have

lim_{Δ→0} (1/Δ)[⟨f(t + Δ)|g(t + Δ)⟩ − ⟨f(t)|g(t)⟩]
  = lim_{Δ→0} [⟨(f(t + Δ) − f(t))/Δ | g(t + Δ)⟩ + ⟨f(t) | (g(t + Δ) − g(t))/Δ⟩],   (20.255)

where we used the calculation rules for inner products of (13.3) and (13.20). Then,
we get (20.253). This completes the proof. ∎
Now, let us apply (20.253) to a unitary operator A(t) that we dealt with in the previous
section. Replacing f(t) and g(t) in (20.253) with ψA(t)† and A(t)χ, respectively, we
have

d/dt ⟨ψA(t)† | A(t)χ⟩ = ⟨ψ dA(t)†/dt | A(t)χ⟩ + ⟨ψA(t)† | dA(t)/dt χ⟩,   (20.256)

where ψ and χ are arbitrarily chosen vectors that do not depend on t. Then, we have

LHS of (20.256) = d/dt ⟨ψ|A(t)†A(t)|χ⟩ = d/dt ⟨ψ|E|χ⟩ = d/dt ⟨ψ|χ⟩ = 0,   (20.257)

where with the second equality we used the unitarity of A(t). Taking the limit t → 0 in
(20.256), we have

lim_{t→0} [RHS of (20.256)] = ⟨ψ dA(0)†/dt | A(0)χ⟩ + ⟨ψA(0)† | dA(0)/dt χ⟩.

Meanwhile, taking the limit t → 0 in the following relation, we have

[dA(t)†/dt]_{t→0} = [(dA(t)/dt)†]_{t→0} = [A′(0)]†,

where we assumed that the operations of taking the adjoint and t → 0 are commutable. Then,
we get
lim_{t→0} [RHS of (20.256)] = ⟨ψ[A′(0)]† | A(0)χ⟩ + ⟨ψA(0)† | dA(0)/dt χ⟩
  = ⟨ψX†|χ⟩ + ⟨ψ|Xχ⟩ = ⟨ψ|(X† + X)χ⟩,   (20.258)

where we used X ≡ A′(0) of (20.241) and A(0) = A(0)† = E. From (20.257) and
(20.258), we have

0 = ⟨ψ|(X† + X)χ⟩.

Since ψ and χ are arbitrary, from Theorem 14.2 we get

X† + X ≡ 0,  i.e.,  X† = −X.   (20.259)

This indicates that X is an anti-Hermitian operator. Let X be expressed as (x_{ij}).
Then, from (20.259) x_{ij} = −x*_{ji}. As for the diagonal elements, x_{ii} = −x*_{ii}; i.e., x_{ii} +
x*_{ii} = 0. Hence, x_{ii} is a pure imaginary number or zero. We have the following
theorem accordingly.

Theorem 20.3 Let A(t) be a one-parameter unitary group with A(0) = E that is
described by A(t) = exp(tX) and satisfies (20.235). Then, X ≡ A′(0) is an anti-
Hermitian operator.
Differentiating both sides of A(t) = exp(tX) with respect to t and using (15.47) of
Theorem 15.4 [11, 12], we have

A′(t) = X exp(tX) = XA(t) = A(t)X.   (20.260)

Putting t = 0 in (20.260), we restore X = A′(0). If we require A(t) to be unitary,
from Theorem 20.3 again X should be anti-Hermitian. Thus, once we have a one-
parameter group in the form of a unitary operator A(t) = exp(tX), the exponent can be
separated into a product of a parameter t and a t-independent constant anti-Hermitian
operator X. In fact, all the one-parameter groups that have appeared in Sect. 20.2 are
of the type exp(tX).

Conversely, let us consider what type of operator exp(tX) would be if X is anti-
Hermitian. Here we assume that X is an (n, n) matrix. With an arbitrary real number
t we have

[exp(tX)][exp(tX)]† = [exp(tX)][exp(tX†)] = [exp(tX)][exp(−tX)]
  = exp(tX − tX) = exp 0 = E,   (20.261)

where we used (15.32) and Theorem 15.2 as well as the assumption that X is anti-
Hermitian. Note that exp(tX) and exp(−tX) are commutative; see (15.29) of Sect.
15.2. Equation (20.261) implies that exp(tX) is unitary, i.e., exp(tX) ∈ U(n). The
notation U(n) means a unitary group. Hence, X ∈ 𝔲(n), where 𝔲(n) means the Lie
algebra corresponding to U(n).
Next, let us seek the Lie algebra that corresponds to the special unitary group SU(n).
By Definition 20.3, it is obvious that since U(n) ⊃ SU(n), we have 𝔲(n) ⊃ 𝔰𝔲(n), where
𝔰𝔲(n) means the Lie algebra corresponding to SU(n). It suffices to find the condition
under which det[exp(tX)] = 1 holds with any real number t and the elements X ∈ 𝔲(n).
From (15.41) [9], we have

det[exp(tX)] = exp Tr(tX) = exp t[Tr(X)] = 1,   (20.262)

where Tr stands for the trace (see Chap. 12). This is equivalent to saying that

t[Tr(X)] = 2mπi   (m: zero or integers)   (20.263)

holds with any real t. For this, we must have Tr(X) = 0 with m = 0. This implies that
𝔰𝔲(n) comprises anti-Hermitian matrices with trace zero (i.e., traceless).
Consequently, the Lie algebra 𝔰𝔲(n) is certainly a subset (or subspace) of 𝔲(n), which
consists of anti-Hermitian matrices.
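These claims lend themselves to quick numerical checks. A minimal sketch (NumPy/SciPy assumed) draws a random traceless anti-Hermitian X ∈ 𝔰𝔲(3) and verifies that exp(tX) is unitary with determinant 1 for several real t, in line with (20.261) and (20.262).

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
M = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
X = (M - M.conj().T) / 2                  # anti-Hermitian part of M
X -= np.trace(X) / 3 * np.eye(3)          # remove the trace: X is now in su(3)
for t in (0.3, -1.7, 5.0):
    A = expm(t * X)
    print(np.allclose(A.conj().T @ A, np.eye(3)),   # A is unitary
          np.isclose(np.linalg.det(A), 1.0))        # det A = exp(t Tr X) = 1
```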
In relation to (20.256), let us think of a real orthogonal matrix A(t). If A(t) is real
and orthogonal, (20.256) can be rewritten as

d/dt ⟨ψA(t)ᵀ | A(t)χ⟩ = ⟨ψ dA(t)ᵀ/dt | A(t)χ⟩ + ⟨ψA(t)ᵀ | dA(t)/dt χ⟩.   (20.264)

Following a procedure similar to the case of (20.259), we get

Xᵀ + X ≡ 0,  i.e.,  Xᵀ = −X.   (20.265)

In this case we obtain a skew-symmetric matrix. All its diagonal elements are zero
and, hence, the skew-symmetric matrix is traceless. Other typical Lie groups are the
orthogonal group O(n) and the special orthogonal group SO(n), and the corresponding
Lie algebras are denoted by 𝔬(n) and 𝔰𝔬(n), respectively. Note that both 𝔬(n) and
𝔰𝔬(n) consist of skew-symmetric matrices.

Conversely, let us consider what type of operator exp(tX) would be if X is a skew-
symmetric matrix. Here we assume that X is an (n, n) matrix. With an arbitrary real
number t we have

[exp(tX)][exp(tX)]ᵀ = [exp(tX)][exp(tXᵀ)] = [exp(tX)][exp(−tX)]
  = exp(tX − tX) = exp 0 = E,   (20.266)

where we used (15.31). This implies that exp(tX) is an orthogonal matrix, i.e., exp
(tX) ∈ O(n). Hence, X ∈ 𝔬(n). Notice that exp(tX) and exp(−tX) are commutative.
Summarizing the above, we have the following theorem.
Theorem 20.4 The n-th order Lie algebra 𝔲(n) corresponding to the Lie group U(n)
(i.e., the unitary group) consists of all the anti-Hermitian (n, n) matrices. The n-th
order Lie algebra 𝔰𝔲(n) corresponding to the Lie group SU(n) (i.e., the special unitary
group) comprises all the anti-Hermitian (n, n) matrices with trace zero. The n-th
order Lie algebras 𝔬(n) and 𝔰𝔬(n) corresponding to the Lie groups O(n) and SO(n),
respectively, comprise all the real skew-symmetric (n, n) matrices.

Notice that if A and B are anti-Hermitian matrices, so are A + B and cA (c: real
number). This is true of skew-symmetric matrices as well. Hence, 𝔲(n), 𝔰𝔲(n), 𝔬(n),
and 𝔰𝔬(n) form linear vector spaces.

In Sect. 20.4.1 we introduced the commutator. Commutators are ubiquitous in
quantum mechanics, especially as commutation relations (see Part I). Their major
properties are as follows:

(i) [X + Y, Z] = [X, Z] + [Y, Z],
(ii) [aX, Y] = a[X, Y]   (a ∈ ℝ),
(iii) [X, Y] = −[Y, X],
(iv) [X, [Y, Z]] + [Y, [Z, X]] + [Z, [X, Y]] = 0.   (20.267)

The last equation of (20.267) is well known as Jacobi's identity. Readers are
encouraged to check it.
Since the Lie algebra forms a linear vector space of finite dimension, we denote
its basis vectors by X₁, X₂, ⋯, X_d. Then, we have

𝔤(d) = Span{X₁, X₂, ⋯, X_d},   (20.268)

where 𝔤(d) stands for a Lie algebra of dimension d. As their commutators belong to
𝔤(d) as well, with an arbitrary pair of X_i and X_j (i, j = 1, 2, ⋯, d) we have

[X_i, X_j] = Σ_{k=1}^{d} f_{ijk} X_k,   (20.269)

where the set of real coefficients f_{ijk} is called the structure constants, which define
the structure of the Lie algebra.
Example 20.4 In (20.42) we write ζ₁ ≡ ζ_x, ζ₂ ≡ ζ_y, and ζ₃ ≡ ζ_z. Rewriting it
explicitly, we have

ζ₁ = (1/2)\begin{pmatrix} 0 & -i \\ -i & 0 \end{pmatrix},  ζ₂ = (1/2)\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix},  ζ₃ = (1/2)\begin{pmatrix} -i & 0 \\ 0 & i \end{pmatrix}.   (20.270)

Then, we get

[ζ_i, ζ_j] = Σ_{k=1}^{3} ε_{ijk} ζ_k,   (20.271)

where ε_{ijk} is called the Levi-Civita symbol [4] and is defined by

ε_{ijk} = +1 for (i, j, k) = (1, 2, 3), (2, 3, 1), (3, 1, 2);
ε_{ijk} = −1 for (i, j, k) = (3, 2, 1), (1, 3, 2), (2, 1, 3);
ε_{ijk} = 0 otherwise.   (20.272)

Notice that in (20.272), if (i, j, k) represents an even permutation of (1, 2, 3), ε_{ijk} = 1,
but if (i, j, k) is an odd permutation of (1, 2, 3), ε_{ijk} = −1. Otherwise (e.g., for
i = j, j = k, or k = i), ε_{ijk} = 0. The relation of (20.271) is essentially the same as
(3.30) and (3.69) of Chap. 3. We have

𝔰𝔲(2) = Span{ζ₁, ζ₂, ζ₃}.   (20.273)
Example 20.5 In (20.26) we write A₁ ≡ A_x, A₂ ≡ A_y, and A₃ ≡ A_z. Rewriting it, we
have

A₁ = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix},  A₂ = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ -1 & 0 & 0 \end{pmatrix},  A₃ = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.   (20.274)

Again, we have

[A_i, A_j] = Σ_{k=1}^{3} ε_{ijk} A_k.   (20.275)

This is of the same form as (20.271). Namely,

𝔰𝔬(3) = 𝔬(3) = Span{A₁, A₂, A₃}.

Equation (20.274) clearly shows that the diagonal elements of A_i (i = 1, 2, 3)
are zero.
From Examples 20.4 and 20.5, 𝔰𝔲(2) and 𝔰𝔬(3) [or 𝔬(3)] structurally resemble
each other. This fact is directly related to the similarity between SU(2) and SO(3). Note
also that the dimensionality of 𝔰𝔲(2) and 𝔰𝔬(3) is the same in terms of linear vector
spaces.

In Chap. 3, we dealt with the generalized angular momentum using the relation
(3.30). Using (20.15), we can readily obtain relations of the same form as (20.271)
and (20.275). That is, from (3.30), defining the following anti-Hermitian operators
as

J̃_x ≡ −iJ_x/ħ,  J̃_y ≡ −iJ_y/ħ,  J̃_z ≡ −iJ_z/ħ,

we get

[J̃_l, J̃_m] = Σ_n ε_{lmn} J̃_n,   (20.276)

where l, m, n stand for x, y, z. The derivation is left for readers.
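A direct numerical verification of the structure constants (20.271) and (20.275) may also be instructive. The sketch below (NumPy assumed) checks [X_i, X_j] = Σ_k ε_{ijk} X_k for both bases, with the sign conventions of (20.270) and (20.274) used above.

```python
import numpy as np

zeta = [0.5 * np.array(m) for m in ([[0, -1j], [-1j, 0]],
                                    [[0, -1], [1, 0]],
                                    [[-1j, 0], [0, 1j]])]
A = [np.array(m, dtype=complex) for m in ([[0, 0, 0], [0, 0, -1], [0, 1, 0]],
                                          [[0, 0, 1], [0, 0, 0], [-1, 0, 0]],
                                          [[0, -1, 0], [1, 0, 0], [0, 0, 0]])]

def eps(i, j, k):
    """Levi-Civita symbol of (20.272), with indices 0..2; returns +1, -1, or 0."""
    return ((i - j) * (j - k) * (k - i)) // 2

for basis in (zeta, A):
    ok = all(np.allclose(basis[i] @ basis[j] - basis[j] @ basis[i],
                         sum(eps(i, j, k) * basis[k] for k in range(3)))
             for i in range(3) for j in range(3))
    print(ok)   # True for both the su(2) and the so(3) basis
```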
20.4.3 Adjoint Representation of Lie Groups

In (20.252) of Sect. 20.4.1 we defined an inner product on 𝔤 as

⟨A|B⟩ ≡ Σ_{i,j} a*_{ij} b_{ij}.   (20.252)

We can readily check that the definition (20.252) satisfies the defining properties of
the inner product, (13.2), (13.3), and (13.4). In fact, we have

⟨B|A⟩ = Σ_{i,j} b*_{ij} a_{ij} = (Σ_{i,j} a*_{ij} b_{ij})* = ⟨A|B⟩*.   (20.277)
Equation (13.3) is obvious from the calculation rules for matrices. With (13.4),
we get

⟨A|A⟩ = Tr(A†A) = Σ_{i,j} |a_{ij}|² ≥ 0,   (20.278)

where we have A = (a_{ij}), and the equality holds if and only if all a_{ij} = 0, i.e., A = 0.
Thus, ⟨A|A⟩ gives a positive definite inner product on 𝔤.

From (20.278), we may equally define an inner product on 𝔤 as

⟨A|B⟩ = Tr(A†B).   (20.279)

In fact, we have

Tr(A†B) = Σ_{i,j} (A†)_{ij} (B)_{ji} = Σ_{i,j} a*_{ji} b_{ji} ≡ ⟨A|B⟩.

In another way, we can readily compare (20.279) with (20.252) using, e.g., a
(2, 2) matrix and find that both equations give the same result. It is left for readers as
an exercise.
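For readers who prefer a numerical check to the hand comparison, the following sketch (NumPy assumed; the matrices are random examples) confirms that (20.252) and (20.279) give the same value.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
B = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
by_components = np.sum(A.conj() * B)        # sum_ij a_ij^* b_ij, as in (20.252)
by_trace = np.trace(A.conj().T @ B)         # Tr(A^dagger B), as in (20.279)
print(np.isclose(by_components, by_trace))  # True
```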
From Theorem 20.4, 𝔲(n) corresponding to the unitary group U(n) consists of all
the anti-Hermitian (n, n) matrices. With these matrices, we have

⟨B|A⟩ = Tr(B†A) = Σ_{i,j} b*_{ji} a_{ji} = Σ_{i,j} (−b_{ij})(−a*_{ij}) = Σ_{i,j} a*_{ij} b_{ij} = ⟨A|B⟩.   (20.280)

Then, comparing (20.277) and (20.280), we have

⟨A|B⟩* = ⟨A|B⟩.

This means that ⟨A|B⟩ is real. For example, using 𝔲(2), let us evaluate an inner
product. We denote arbitrary X₁, X₂ ∈ 𝔲(2) by

X₁ = \begin{pmatrix} ia & c + id \\ -c + id & ib \end{pmatrix},  X₂ = \begin{pmatrix} ip & r + is \\ -r + is & iq \end{pmatrix},

where a, b, c, d; p, q, r, s are real. Then, according to the calculation rule for the inner
product of the Lie algebra, we have

⟨X₁|X₂⟩ = ap + 2(cr + ds) + bq,

which certainly gives a real inner product. Notice that this is the case with 𝔰𝔲(2) as well.
Next, let us think of the following inner product:

⟨gXg⁻¹ | gYg⁻¹⟩ = Tr[(gXg⁻¹)†(gYg⁻¹)],   (20.281)

where g is any non-singular matrix.

Bearing in mind that we are particularly interested in the continuous groups U(n)
[or its subgroup SU(n)] and O(n) [or its subgroup SO(n)], we assume that g is an
element of those groups and is represented by a unitary matrix (including an orthog-
onal matrix); i.e., g⁻¹ = g† (or gᵀ). Then, if g is a unitary matrix, we have

Tr[(gXg⁻¹)†(gYg⁻¹)] = Tr[(gXg†)†(gYg†)] = Tr(gX†g†gYg†)
  = Tr(gX†Yg†) = Tr(X†Y) = ⟨X|Y⟩,   (20.282)

where with the second last equality we used (12.13), namely the invariance of the trace
under the (unitary) similarity transformation. If g is represented by a real orthogonal
matrix, instead of (20.282) we have
Tr[(gXg⁻¹)†(gYg⁻¹)] = Tr[(gXgᵀ)†(gYgᵀ)] = Tr[(gᵀ)†X†g†gYgᵀ]
  = Tr(ḡX†gᵀgYgᵀ) = Tr(gX†Ygᵀ) = Tr(X†Y) = ⟨X|Y⟩,   (20.283)

where ḡ is the complex conjugate matrix of g (for a real matrix, ḡ = g). Combining
(20.281) with (20.282) or (20.283), we have

⟨gXg⁻¹ | gYg⁻¹⟩ = ⟨X|Y⟩.   (20.284)

This relation clearly shows that the real inner product of (20.284) remains
unchanged under the operation

X ⟼ gXg⁻¹,

where X is an anti-Hermitian (or skew-symmetric) matrix and g is a unitary
(or orthogonal) matrix.
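The invariance (20.284) can likewise be verified numerically. A minimal sketch follows (NumPy assumed; X and Y are random anti-Hermitian matrices, and g is a random unitary matrix obtained from a QR decomposition).

```python
import numpy as np

rng = np.random.default_rng(1)
def anti_hermitian(n):
    M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (M - M.conj().T) / 2

X, Y = anti_hermitian(3), anti_hermitian(3)
g, _ = np.linalg.qr(rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3)))
ip = lambda A, B: np.trace(A.conj().T @ B)     # inner product (20.279)
print(np.isclose(ip(g @ X @ g.conj().T, g @ Y @ g.conj().T), ip(X, Y)))  # True
```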
Now, we give the following definition and related theorems concerning both
Lie groups and Lie algebras.

Definition 20.4 Let G be a Lie group chosen from U(n) [including SU(n)] and O(n)
[including SO(n)]. Let 𝔤 be the Lie algebra corresponding to G. Let g ∈ G and X ∈ 𝔤.
We define the following transformation on 𝔤 such that

Ad[g](X) ≡ gXg⁻¹.   (20.285)

Then, Ad[g] is said to be an adjoint representation of G on 𝔤. We write the relation
(20.285) as

Ad[g]: 𝔤 → 𝔤.

That is, Ad[g](X) ∈ 𝔤. The operator Ad[g] is a kind of mapping (i.e., an endomor-
phism) discussed in Sect. 11.2. Notice that both g and X are represented by (n, n)
matrices. The matrix g is either a unitary matrix or an orthogonal matrix. The matrix
X is either an anti-Hermitian matrix or a skew-symmetric matrix.
Theorem 20.5 Let g be an element of a Lie group G and let 𝔤 be the Lie algebra of G.
Then, Ad[g] is a linear transformation on 𝔤.

Proof Let X, Y ∈ 𝔤. Then, we have

Ad[g](aX + bY) ≡ g(aX + bY)g⁻¹ = a(gXg⁻¹) + b(gYg⁻¹)
  = aAd[g](X) + bAd[g](Y).   (20.286)

Thus, we find that Ad[g] is a linear transformation. By Definition 20.3, exp
(tX) ∈ G with an arbitrary real number t. Meanwhile, we have

exp{tAd[g](X)} = exp(tgXg⁻¹) = exp(gtXg⁻¹) = g exp(tX)g⁻¹,

where with the last equality we used (15.30). Then, if g ∈ G, g exp(tX)g⁻¹ ∈ G.
Again from Definition 20.3, Ad[g](X) ∈ 𝔤. That is, Ad[g] is a linear transformation
on 𝔤. This completes the proof. ∎

Notice that in Theorem 20.5, G may be any linear Lie group contained in
GL(n, ℂ). The Lie algebra corresponding to GL(n, ℂ) is denoted by 𝔤𝔩(n, ℂ). The
following theorem shows that Ad[g] is a representation.
Theorem 20.6 Let g be an element of a Lie group G. Then, g ⟼ Ad[g] is a
representation of G on 𝔤.

Proof From (20.285), we have

Ad[g₁g₂](X) ≡ (g₁g₂)X(g₁g₂)⁻¹ = g₁g₂Xg₂⁻¹g₁⁻¹ = g₁Ad[g₂](X)g₁⁻¹
  = Ad[g₁](Ad[g₂](X)) = (Ad[g₁]Ad[g₂])(X).   (20.287)

Comparing the first and last sides of (20.287), we get

Ad[g₁g₂] = Ad[g₁]Ad[g₂].

That is, g ⟼ Ad[g] is a representation of G on 𝔤. ∎

By virtue of Theorem 20.6, we call g ⟼ Ad[g] an adjoint representation of G on
𝔤. Once again, we remark that

X ⟼ Ad[g](X) ≡ X′ or Ad[g]: 𝔤 → 𝔤   (20.288)

Fig. 20.5 Linear transformation Ad[g]: 𝔤 → 𝔤 (i.e., endomorphism). Ad[g] is an endomorphism that
operates on the vector space 𝔤
is a linear transformation 𝔤 → 𝔤, namely an endomorphism that operates on the vector
space 𝔤 (see Sect. 11.2). Figure 20.5 schematically depicts it. The notation of the
adjoint representation Ad is somewhat confusing. That is, (20.288) shows that Ad[g]
is a linear transformation 𝔤 → 𝔤 (i.e., an endomorphism). On the other hand, Ad is
thought of as a mapping G → G′, where G and G′ mean two groups. Namely, we
express it symbolically as [9]

g ⟼ Ad[g] or Ad: G → G′.   (20.289)

The groups G and G′ may or may not be identical. Examples can be seen in, e.g., (20.294)
and (20.302) later.
As mentioned above, if 𝔤 is either 𝔲(n) or 𝔬(n), Ad[g] is an orthogonal
transformation on 𝔤. The representation of Ad[g] is real accordingly.
An immediate implication of this is that the transformation of the basis vectors of
𝔤 has a connection with the corresponding Lie groups such as U(n) and O(n).
Moreover, it is well known and well studied that there is a close relationship between
Ad[SU(2)] and SO(3). First, we wish to seek basis vectors of 𝔰𝔲(2). A general form
of X ∈ 𝔰𝔲(2) is the following traceless anti-Hermitian matrix:

X = \begin{pmatrix} ic & a + ib \\ -a + ib & -ic \end{pmatrix},

where a, b, and c are real. We have encountered this type of matrix in (20.42) and
(20.270). Using the inner product described in (20.252), we determine an orthonor-
mal basis set of 𝔰𝔲(2) such that

e₁ = (1/√2)\begin{pmatrix} 0 & -i \\ -i & 0 \end{pmatrix},  e₂ = (1/√2)\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix},  e₃ = (1/√2)\begin{pmatrix} -i & 0 \\ 0 & i \end{pmatrix},   (20.290)

where ⟨e_i|e_j⟩ = δ_{ij} (i, j = 1, 2, 3). Note that e_k = √2 ζ_k (k = 1, 2, 3), with ζ_k
given in (20.270).
Choosing the SU(2) elements

g_α ≡ \begin{pmatrix} e^{-iα/2} & 0 \\ 0 & e^{iα/2} \end{pmatrix}  and  g_β ≡ \begin{pmatrix} \cos(β/2) & -\sin(β/2) \\ \sin(β/2) & \cos(β/2) \end{pmatrix}

that appeared in (20.43) and (20.44), we have [9]
Ad[g_α](e₁) = g_α e₁ g_α⁻¹
  = (1/√2)\begin{pmatrix} e^{-iα/2} & 0 \\ 0 & e^{iα/2} \end{pmatrix}\begin{pmatrix} 0 & -i \\ -i & 0 \end{pmatrix}\begin{pmatrix} e^{iα/2} & 0 \\ 0 & e^{-iα/2} \end{pmatrix}
  = (1/√2)\begin{pmatrix} 0 & -\sin α - i\cos α \\ \sin α - i\cos α & 0 \end{pmatrix} = e₁ cos α + e₂ sin α.   (20.291)

Similarly, we get

Ad[g_α](e₂) = (1/√2)\begin{pmatrix} 0 & -\cos α + i\sin α \\ \cos α + i\sin α & 0 \end{pmatrix} = −e₁ sin α + e₂ cos α,   (20.292)

Ad[g_α](e₃) = (1/√2)\begin{pmatrix} -i & 0 \\ 0 & i \end{pmatrix} = e₃.   (20.293)
Notice that each of (20.291)–(20.293) was obtained as a product of three simple
(2, 2) matrices. Thus, we obtain

(e₁ e₂ e₃) Ad[g_α] = (e₁ e₂ e₃)\begin{pmatrix} \cos α & -\sin α & 0 \\ \sin α & \cos α & 0 \\ 0 & 0 & 1 \end{pmatrix}.   (20.294)

From (20.294), as the real representation we get

Ad[g_α] = \begin{pmatrix} \cos α & -\sin α & 0 \\ \sin α & \cos α & 0 \\ 0 & 0 & 1 \end{pmatrix}.   (20.295)

Calculating (e₁ e₂ e₃) Ad[g_β] likewise, we get

Ad[g_β] = \begin{pmatrix} \cos β & 0 & \sin β \\ 0 & 1 & 0 \\ -\sin β & 0 & \cos β \end{pmatrix}.   (20.296)
Moreover, with

g_γ = \begin{pmatrix} e^{-iγ/2} & 0 \\ 0 & e^{iγ/2} \end{pmatrix},   (20.297)

we have

Ad[g_γ] = \begin{pmatrix} \cos γ & -\sin γ & 0 \\ \sin γ & \cos γ & 0 \\ 0 & 0 & 1 \end{pmatrix}.   (20.298)
Finally, let us reproduce (17.101) by calculating

Ad[g_α] · Ad[g_β] · Ad[g_γ] = Ad[g_α g_β g_γ],   (20.299)

where we used the fact that g_ω ⟼ Ad[g_ω] (ω = α, β, γ) is a representation of SU(2)
on 𝔰𝔲(2). To obtain an explicit form of the representation matrix of Ad[g]
[g ∈ SU(2)], we use

g_{αβγ} ≡ g_α g_β g_γ = \begin{pmatrix} e^{-i(α+γ)/2}\cos(β/2) & -e^{-i(α-γ)/2}\sin(β/2) \\ e^{i(α-γ)/2}\sin(β/2) & e^{i(α+γ)/2}\cos(β/2) \end{pmatrix}   (20.300)

and calculate Ad[g_{αβγ}](e₁) such that

Ad[g_{αβγ}](e₁) = g_{αβγ} e₁ g_{αβγ}⁻¹.

Carrying out the (2, 2) matrix multiplications, this becomes

(1/√2)\begin{pmatrix} i\sinβ\cosγ & -i(\cosα\cosβ\cosγ - \sinα\sinγ) - (\sinα\cosβ\cosγ + \cosα\sinγ) \\ -i(\cosα\cosβ\cosγ - \sinα\sinγ) + \sinα\cosβ\cosγ + \cosα\sinγ & -i\sinβ\cosγ \end{pmatrix}

  = (e₁ e₂ e₃)\begin{pmatrix} \cosα\cosβ\cosγ - \sinα\sinγ \\ \sinα\cosβ\cosγ + \cosα\sinγ \\ -\sinβ\cosγ \end{pmatrix}.   (20.301)

Note that the matrix of (20.300) is identical with (20.45). Similarly calculating Ad
[g_{αβγ}](e₂) and Ad[g_{αβγ}](e₃), we obtain
(e₁ e₂ e₃) Ad[g_{αβγ}] =
(e₁ e₂ e₃)\begin{pmatrix} \cosα\cosβ\cosγ - \sinα\sinγ & -\cosα\cosβ\sinγ - \sinα\cosγ & \cosα\sinβ \\ \sinα\cosβ\cosγ + \cosα\sinγ & -\sinα\cosβ\sinγ + \cosα\cosγ & \sinα\sinβ \\ -\sinβ\cosγ & \sinβ\sinγ & \cosβ \end{pmatrix}.   (20.302)

The matrix on the RHS of (20.302) is the same as (17.101), which contains the Euler
angles. The calculations are somewhat lengthy but straightforward. As mentioned just above
and as can be seen, e.g., in (20.294) and (20.302), we may symbolically express the
adjoint representation as [9]

Ad: SU(2) → SO(3)   (20.303)

in parallel to (20.289). Accordingly, from (20.302) we may identify Ad[g_{αβγ}] with
the SO(3) matrix (17.101) and write

Ad[g_{αβγ}] = \begin{pmatrix} \cosα\cosβ\cosγ - \sinα\sinγ & -\cosα\cosβ\sinγ - \sinα\cosγ & \cosα\sinβ \\ \sinα\cosβ\cosγ + \cosα\sinγ & -\sinα\cosβ\sinγ + \cosα\cosγ & \sinα\sinβ \\ -\sinβ\cosγ & \sinβ\sinγ & \cosβ \end{pmatrix}.   (20.304)
In the above discussion, we make several remarks. (i) g ∈ SU(2) is described by
(2, 2) complex matrices, but Ad[SU(2)] is expressed by (3, 3) real orthogonal
matrices. Thus, the dimension of the matrices is different. (ii) Notice again that
(e₁ e₂ e₃) of (20.302) is an orthonormal basis set of 𝔰𝔲(2). Hence, (20.302) can be
viewed as an orthogonal transformation of (e₁ e₂ e₃). That is, we may regard Ad[SU(2)]
as comprising all the matrix representations of SO(3). From (12.64) and (20.302), we write
[9]

Ad[SU(2)] − SO(3) = ∅ or Ad[SU(2)] = SO(3).

Regarding the adjoint representation of G, we have the following important
theorem.

Theorem 20.7 The adjoint representation Ad[SU(2)] is a surjective homomor-
phism of SU(2) onto SO(3). The kernel F of the representation is {e, −e} ⊂ SU(2).
With any rotation R ∈ SO(3), exactly two elements ±g ∈ SU(2) satisfy Ad[g] = Ad
[−g] = R.
Proof We have shown by (20.302) that the adjoint representation Ad[SU(2)] is a
surjective mapping of SU(2) onto SO(3). Since Ad[g] is a representation, namely a
homomorphism mapping from SU(2) to SO(3), we seek its kernel on the basis of
Theorem 16.3 (Sect. 16.4). Let h be an element of the kernel of SU(2) such that

Ad[h] = E ∈ SO(3),   (20.305)

where E is the identity transformation of SO(3). Operating (20.305) on e₃ ∈ 𝔰𝔲(2),
from (20.305) we have the following expression:

Ad[h](e₃) = he₃h⁻¹ = E(e₃) = e₃ or he₃ = e₃h,

where e₃ is given by (20.290). Expressing h as \begin{pmatrix} h₁₁ & h₁₂ \\ h₂₁ & h₂₂ \end{pmatrix}, we get

\begin{pmatrix} -ih₁₁ & ih₁₂ \\ -ih₂₁ & ih₂₂ \end{pmatrix} = \begin{pmatrix} -ih₁₁ & -ih₁₂ \\ ih₂₁ & ih₂₂ \end{pmatrix},

where the common factor 1/√2 has been dropped. From the above relation, we must
have h₁₂ = h₂₁ = 0. As h ∈ SU(2), det h = 1. Hence, for h we choose
h = \begin{pmatrix} a & 0 \\ 0 & a⁻¹ \end{pmatrix}, where a ≠ 0. Moreover, considering he₂ = e₂h, we get

\begin{pmatrix} 0 & -a \\ a⁻¹ & 0 \end{pmatrix} = \begin{pmatrix} 0 & -a⁻¹ \\ a & 0 \end{pmatrix}.   (20.306)

This implies that a = a⁻¹, or a = ±1. That is, h = ±e, where e denotes the
identity element of SU(2). (Note that we get no further conditions from he₁ = e₁h.)
Thus, we obtain

F = {e, −e}.   (20.307)

In fact, with respect to any X ∈ 𝔰𝔲(2) we have

Ad[−e](X) = (−e)X(−e)⁻¹ = (−e)X(−e) = X.   (20.308)

Therefore, certainly we get

Ad[−e] = E.   (20.309)
Meanwhile, suppose that with g₁, g₂ ∈ SU(2) we have Ad[g₁] = Ad[g₂] [∈ SO(3)].
Then, we have

Ad[g₁g₂⁻¹] = Ad[g₁]Ad[g₂⁻¹] = Ad[g₂]Ad[g₂⁻¹] = Ad[g₂g₂⁻¹]
  = Ad[e] = E.   (20.310)

Therefore, from (20.307) and (20.309) we get

g₁g₂⁻¹ = ±e or g₁ = ±g₂.   (20.311)

Conversely, we have

Ad[−g] = Ad[−e]Ad[g] = Ad[g],   (20.312)

where with the first equality we used Theorem 20.6 and with the second equality we
used (20.309). The relations (20.311) and (20.312) imply that with any g ∈ SU(2)
there are exactly two elements ±g that satisfy Ad[g] = Ad[−g]. These complete the
proof. ∎
In the above proof of Theorem 20.7, (20.309) is a simple but important expres-
sion. In view of Theorem 16.4 (the homomorphism theorem), we summarize the above
discussion as follows: Let Ad be a homomorphism mapping such that

Ad: SU(2) ⟶ SO(3)   (20.313)

with the kernel F = {e, −e}. The relation is schematically represented in Fig. 20.6.
Notice the resemblance between Fig. 20.6 and Fig. 16.1a.

Fig. 20.6 Homomorphism mapping Ad: SU(2) ⟶ SO(3) with the kernel F = {e, −e}. E is the
identity transformation of SO(3)

Correspondingly, from Theorem 16.4 of Sect. 16.4 we have an isomorphic
mapping Ãd such that

Ãd: SU(2)/F ⟶ SO(3),   (20.314)

where F = {e, −e}. Symbolically writing (20.314) as in Sect. 16.4, we have
SU(2)/F ≅ SO(3). Notice that F is an invariant subgroup of SU(2). Using the general
form (20.300) of SU(2) and the general form (20.302) of SO(3), we can readily
show Ad[g_{αβγ}] = Ad[−g_{αβγ}]. That is, from (20.300) we have

g_{α+2π, β+2π, γ+2π} = −g_{αβγ}.

Both the two elements ±g_{αβγ} produce the identical orthogonal
matrix given on the RHS of (20.302).
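The 2:1 correspondence can be made concrete numerically. The sketch below (NumPy assumed) computes the matrix of Ad[g] by expanding g e_k g⁻¹ in the basis (20.290) (with the sign conventions adopted above), and confirms that the result lies in SO(3) and that g and −g give the same matrix.

```python
import numpy as np

s = 1 / np.sqrt(2)
e = [s * np.array(m) for m in ([[0, -1j], [-1j, 0]],
                               [[0, -1], [1, 0]],
                               [[-1j, 0], [0, 1j]])]
ip = lambda A, B: np.trace(A.conj().T @ B)     # inner product (20.279)

def Ad(g):
    """Matrix of X -> g X g^{-1} in the orthonormal basis (e1, e2, e3)."""
    cols = [[ip(e[i], g @ e[k] @ g.conj().T).real for i in range(3)]
            for k in range(3)]
    return np.array(cols).T

a, b, c = 0.4, 1.1, -0.7                       # Euler angles (alpha, beta, gamma)
g = (np.diag([np.exp(-1j * a / 2), np.exp(1j * a / 2)])
     @ np.array([[np.cos(b / 2), -np.sin(b / 2)],
                 [np.sin(b / 2),  np.cos(b / 2)]])
     @ np.diag([np.exp(-1j * c / 2), np.exp(1j * c / 2)]))   # g_{abc} of (20.300)
R = Ad(g)
print(np.allclose(R.T @ R, np.eye(3)), np.isclose(np.linalg.det(R), 1.0))
print(np.allclose(Ad(-g), R))                  # Ad[g] = Ad[-g], Theorem 20.7
```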
To associate the above results of abstract algebra with the tangible matrix form
obtained earlier, we rewrite (20.54) by applying the simple trigonometric formula

sin β = 2 sin(β/2) cos(β/2).

That is, we get

D^{(j)}_{m'm}(\alpha,\beta,\gamma)
= \sum_{\mu}\frac{(-1)^{m'-m+\mu}\sqrt{(j+m)!\,(j-m)!\,(j+m')!\,(j-m')!}}{(j+m-\mu)!\,\mu!\,(m'-m+\mu)!\,(j-m'-\mu)!}\,
e^{-i(\alpha m'+\gamma m)}\left(\cos\frac{\beta}{2}\right)^{2j}
\left[\left(\cos\frac{\beta}{2}\right)^{m-m'-2\mu}\left(\sin\frac{\beta}{2}\right)^{m'-m+2\mu}\right]

= \sum_{\mu}\frac{(-1)^{m'-m+\mu}\sqrt{(j+m)!\,(j-m)!\,(j+m')!\,(j-m')!}}{(j+m-\mu)!\,\mu!\,(m'-m+\mu)!\,(j-m'-\mu)!}\,
2^{m-m'-2\mu}\, e^{-i(\alpha m'+\gamma m)}\,(\sin\beta)^{m'-m+2\mu}
\left(\cos\frac{\beta}{2}\right)^{2j+2(m-m'-2\mu)}.   (20.315)
From (20.315), we clearly see that (i) if j is zero or a positive integer, so are m and
m′. Therefore, (20.315) is a periodic function of period 2π with respect to α, β, and γ. But
(ii) if j is a half-odd-integer, so are m and m′. Then, (20.315) is a periodic function
of period 4π. Note that in the case of (ii), 2j = 2n + 1 (n = 0, 1, 2, ⋯). Therefore, in
(20.315) we have

(cos β/2)^{2j+2(m−m′−2μ)} = (cos β/2) · (cos β/2)^{2(n+m−m′−2μ)}.   (20.316)

As studied in Sect. 20.2.3, in the former case (i) the spherical surface harmonics
span the representation space of D^{(l)} (l = 0, 1, 2, ⋯). The simplest case for it is
D^{(0)} = 1, a (1, 1) identity matrix, i.e., merely the number 1. The second simplest case
was given by (20.74), which describes D^{(1)}. Meanwhile, the simplest case for (ii) is
D^{(1/2)}, whose matrix representation was given by D^{(1/2)}(α, β, γ) in (20.45) or by g_{αβγ} in
(20.300). With g_{αβγ}, both Ad[±g_{αβγ}] produce the real orthogonal (3, 3) matrix
expressed as (20.302). This representation matrix naturally belongs to SO(3).
20.5 Connectedness of Lie Groups

As already discussed in Chap. 6, connectedness, reflecting topological characteristics,
is an important concept for Lie groups. When we thought of the analytic
functions on the complex plane ℂ, the connectedness was easily envisaged geo-
metrically. To deal with the connectedness of general Lie groups, on the other hand,
we need abstract concepts. In this section we take a down-to-earth approach as much
as possible.
20.5.1 Several Definitions and Examples

To discuss this topic, let us give several definitions and examples.

Definition 20.5 Suppose that A is a subset of GL(n, ℂ) and that we have a pair of
elements a, b ∈ A ⊂ GL(n, ℂ). Meanwhile, let f(t) be a continuous function defined
on the interval 0 ≤ t ≤ 1 that takes values within A; that is, f(t) ∈ A for all t. If f(0) = a
and f(1) = b, then a and b are said to be connected within A.

Major parts of the discussion that follows are based mostly upon the literature [9]. A
simple example for the above is given below.
   
1 0 1 0
Example 20.6 Let a ¼ and b ¼ be elements of SO(2). Let f
0 1 0 1
(x) be described by
 
cos πt  sin πt
f ðt Þ ¼ :
sin πt cos πt

Then, f(t) is a continuous matrix function of all real numbers t. We have f(0) ¼ a
and f(1) ¼ b and, hence, a and b are connected to each other within SO(2).
For a and b to be connected within A is denoted by a ~ b. The symbol ~ then
satisfies the equivalence relation. That is, we have the following proposition.

Proposition 20.2 Let a, b, c ∈ A ⊂ GL(n, ℂ). Then, we have the following three
relations: (i) a ~ a. (ii) If a ~ b, then b ~ a. (iii) If a ~ b and b ~ c, then a ~ c.

Proof (i) Let f(t) = a (0 ≤ t ≤ 1). Then, f(0) = f(1) = a, so that we have a ~ a. (ii) Let
f(t) be a continuous curve that connects a and b such that f(0) = a and f(1) = b
(0 ≤ t ≤ 1), and so a ~ b. Then, g(t) ≡ f(1 − t) is another continuous curve within A,
with g(0) = b and g(1) = a. This implies b ~ a. (iii) Let f(t) and g(t) be two continuous
curves that connect a with b and b with c, respectively. Meanwhile, we define h(t) as

h(t) = f(2t)   (0 ≤ t ≤ 1/2);   h(t) = g(2t − 1)   (1/2 ≤ t ≤ 1),

so that h(t) is a third continuous curve; indeed, h(1/2) = f(1) = g(0). From the
supposition we have h(0) = f(0) = a, h(1/2) = f(1) = g(0) = b, and h(1) = g(1) = c.
This implies that if a ~ b and b ~ c, then we have a ~ c. ∎
Let us give another related definition.

Definition 20.6 Let A be a subset of GL(n, ℂ) and let a point a ∈ A. Then, the collection
of points that are connected to a within A is called the connected component of a and
denoted by C(a).

From Proposition 20.2 (i), a ∈ C(a). Let C(a) and C(b) be two connected
components. We have two alternatives with C(a) and C(b): (I) C(a) ∩ C(b) = ∅;
(II) C(a) = C(b). Suppose c ∈ C(a) ∩ C(b). Then, it follows that c ~ a and c ~ b. From
Proposition 20.2 (ii) and (iii), we have a ~ b. Hence, we get C(a) = C(b). Thus, the
subset A is a direct sum of several connected components such that

A = C(a) ∪ C(b) ∪ ⋯ ∪ C(z) with any C(p) ∩ C(q) = ∅ (p ≠ q),   (20.317)

where p, q are taken from among a, b, ⋯, z. In particular, the connected component
containing e (i.e., the identity element) is of major importance.

Definition 20.7 Let A be a subset of GL(n, ℂ). If any pair of elements a, b (∈ A) is
connected to each other within A, then A is connected, or A is called a connected set.
A connected set is nothing but a set that consists of a single connected component.
For connected sets, we have the following important theorem.

Theorem 20.8 An image of a connected set A obtained by a continuous mapping
f is a connected set as well.

Proof The proof is almost self-evident. Let f: A ⟶ B be a continuous mapping from
a connected set A to B and suppose f(A) = B. Take any b, b′ ∈ B; from the supposition
f(A) = B, there exist a, a′ ∈ A that satisfy f(a) = b and f(a′) = b′. Then, from
Definition 20.5 there is a continuous function g(t) (0 ≤ t ≤ 1) that satisfies g(0) = a
and g(1) = a′. Now, define a function h(t) ≡ f[g(t)]. Then, h(t) is continuous, with
h(0) = f[g(0)] = f(a) = b and h(1) = f[g(1)] = f(a′) = b′. Again from Definition 20.5,
h(t) is a continuous curve that connects b and b′ within B. Consequently, from
Definition 20.7, B is a connected set as well. This completes the proof. ∎
Applications of Theorem 20.8 are given below as examples.

Example 20.7 Let f(x) be a real continuous function defined on a subset A ⊂ GL(n, ℝ),
where x ∈ A with A being a connected set. Suppose that for a pair of
a, b ∈ A, f(a) = p and f(b) = q (p, q: real) with p < q. Then, f(x) takes all real
numbers on the interval [p, q] as its values. In other words, we have
f(A) ⊃ [p, q]. This is well known as the intermediate value theorem.

Example 20.8 [9] Suppose that A is a connected subset of GL(n, ℝ). Then, with any
pair of elements a, b ∈ A ⊂ GL(n, ℝ), we must have a continuous function g(t)
(0 ≤ t ≤ 1) such that g(0) = a and g(1) = b. Meanwhile, the determinant is defined for
every element of A as a real continuous function.

Now, suppose that det a = p < 0 and det b = q > 0 with p < q. Then, from
Theorem 20.8 we would have det(A) ⊃ [p, q]. Consequently, we would have some
x ∈ A such that det(x) = 0. But this contradicts A ⊂ GL(n, ℝ), because we
must have det x ≠ 0 for every x ∈ A. Thus, within a connected set the sign of the
determinant must be constant, provided the determinant is a real continuous function.
In relation to the discussion of connectedness, we have the following general
theorem.

Theorem 20.9 [9] Let G₀ be the connected component within a linear Lie group
G that contains the identity element e. Then, G₀ is an invariant subgroup of G. The
connected component C(g) containing g ∈ G is identical with gG₀ = G₀g.

Proof Let a, b ∈ G₀. Since G₀ = C(e), from the supposition there are continuous
curves f(t) and g(t) within G₀ that connect a with e and b with e, respectively, such
that f(0) = a, f(1) = e and g(0) = b, g(1) = e. Meanwhile, let h(t) ≡ f(t)[g(t)]⁻¹. Then,
h(t) is another continuous curve. We have h(0) = f(0)[g(0)]⁻¹ = ab⁻¹ and h(1) = f(1)
[g(1)]⁻¹ = e·e⁻¹ = e. That is, ab⁻¹ and e are connected, which is indicative of
ab⁻¹ ∈ G₀. This implies that G₀ is a subgroup of G.

Next, with any g ∈ G and any a ∈ G₀ we have f(t) in the same sense as above.
Defining k(t) as k(t) = g f(t)g⁻¹, k(t) is a continuous curve within G with
k(0) = gag⁻¹ and k(1) = geg⁻¹ = e. Hence, we have gag⁻¹ ∈ G₀. Since a is an
arbitrary point of G₀, we have gG₀g⁻¹ ⊂ G₀ accordingly. Similarly, g is an
arbitrary element of G, and so replacing g with g⁻¹ in the above, we get g⁻¹G₀g ⊂ G₀.
Operating with g and g⁻¹ from the left and right, respectively, we obtain G₀ ⊂ gG₀g⁻¹.
Combining this with the above relation gG₀g⁻¹ ⊂ G₀, we obtain gG₀g⁻¹ = G₀ or
gG₀ = G₀g. This means that G₀ is an invariant subgroup of G; see Sect. 16.3.

Taking any a ∈ G₀ and f(t) in the same sense as above again, l(t) = g f(t) is a
continuous curve within G with l(0) = ga and l(1) = ge = g. Then, ga and g are
connected within G. That is, ga ∈ C(g). Since a is an arbitrary point of G₀, gG₀ ⊂ C(g).
Meanwhile, choosing any p ∈ C(g), we set a continuous curve m(t) (0 ≤ t ≤ 1)
that connects p with g within G. This implies that m(0) = p and m(1) = g. Also
setting a continuous curve n(t) = g⁻¹m(t), we have n(0) = g⁻¹m(0) = g⁻¹p and
n(1) = g⁻¹m(1) = g⁻¹g = e. This means that g⁻¹p ∈ G₀. Multiplying by g on both
sides, we get p ∈ gG₀. Since p is arbitrarily chosen from C(g), we get C(g) ⊂ gG₀.
Combining this with the above relation gG₀ ⊂ C(g), we obtain C(g) = gG₀ = G₀g.
These complete the proof. ∎
In the above proof, we add the following statement: In Sect. 16.2, the necessary
and sufficient condition for a subset H to be a subgroup was given as (1) h_i, h_j ∈ H ⟹
h_i ⋄ h_j ∈ H and (2) h ∈ H ⟹ h⁻¹ ∈ H. The conditions (1) and (2) can be combined
into the single condition

h_i ∈ H, h_j ∈ H ⟹ h_i ⋄ h_j⁻¹ ∈ H.

In fact, in this combined condition, if we replace h_j with h_i, we get
h_i ⋄ h_i⁻¹ = e ∈ H. In turn, if we replace h_i with e, we get e ⋄ h_j⁻¹ = h_j⁻¹ ∈ H.
Finally, if we replace h_j with h_j⁻¹, we get h_i ⋄ (h_j⁻¹)⁻¹ = h_i ⋄ h_j ∈ H.
Thus, H satisfies the axioms (A1), (A3), and (A4) of Sect. 16.1 and, hence, forms
a (sub)group.

The above general theorem can immediately be applied to an important class of
Lie groups, SO(n). We discuss this topic below.
20.5.2 O(3) and SO(3)

In Chap. 17 we dealt with finite groups related to O(3) and SO(3), i.e., their
subgroups. In this section, we examine several of their important properties in terms of Lie
groups.

The groups O(3) and SO(3) are characterized by their determinants: det O(3) = ±1
and det SO(3) = 1. Example 20.8 implies that in O(3) there should be two different
connected components according to det O(3) = ±1. Also, Theorem 20.9 tells us that
the connected component C(E) is an invariant subgroup G₀ of O(3), where E is the
identity element of O(3). Obviously, G₀ is SO(3). In turn, the other connected
component is SO(3)ᶜ, where SO(3)ᶜ denotes the complementary set of SO(3); for
the notation, see Sect. 6.1. Remembering (20.317), we have

O(3) = SO(3) ∪ SO(3)ᶜ.   (20.318)
This can be rewritten as a direct sum such that

O(3) = C(E) ∪ C(−E) with C(E) ∩ C(−E) = ∅,   (20.319)

where E denotes the (3, 3) unit matrix and

−E = \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix}.

In the above, SO(3) is identical to C(E), with SO(3) itself being a connected set;
SO(3)ᶜ is identical with C(−E), which is another connected set.

Another description of O(3) is [13]

O(3) = SO(3) ⊗ {E, −E},   (20.320)

where the symbol ⊗ denotes a direct-product group (Sect. 16.5). An alternative
description for this is
O(3) = SO(3) ∪ {(−E)SO(3)},   (20.321)

where the direct sum is implied.

In Sect. 20.2.6, we discussed two rotations around different rotation axes with the
same rotation angle. These rotations belong to the same conjugacy class; see
(20.140). With Q as another rotation that transforms the rotation axis of R_ω to that
of R′_ω, the two rotations R_ω and R′_ω having the same rotation angle ω are associated
with each other through the following relation:

R′_ω = QR_ωQ⁻¹.   (20.140)

Notice that this relation is independent of the choice of a specific coordinate
system. In Fig. 20.4, let us choose the z-axis as the rotation axis A with respect to
the rotation R_ω. Then, the rotation matrix R_ω is expressed in reference to the
Cartesian coordinate system as

R_ω = \begin{pmatrix} \cos ω & -\sin ω & 0 \\ \sin ω & \cos ω & 0 \\ 0 & 0 & 1 \end{pmatrix}.   (20.322)
Meanwhile, multiplying both sides of (20.140) by −E, we have

−R′_ω = Q(−R_ω)Q⁻¹.   (20.323)

Note that −E is commutative with any Q (or Q⁻¹). Also, notice that from
(20.323), −R′_ω and −R_ω are again connected via a unitary similarity transformation.
From (20.322), we have

−R_ω = \begin{pmatrix} -\cos ω & \sin ω & 0 \\ -\sin ω & -\cos ω & 0 \\ 0 & 0 & -1 \end{pmatrix} = \begin{pmatrix} \cos(ω+π) & -\sin(ω+π) & 0 \\ \sin(ω+π) & \cos(ω+π) & 0 \\ 0 & 0 & -1 \end{pmatrix},   (20.324)

where −R_ω represents an improper rotation of an angle ω + π (see Sect. 17.1).
Referring to Fig. 17.12 and Table 17.6 (Sect. 17.3), as another tangible example
we further get

\begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix} = \begin{pmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & -1 \end{pmatrix},   (20.325)
where the first factor of the LHS belongs to O (the octahedral rotation group) and the
RHS belongs to T_d (the tetrahedral group); the RHS gives a mirror symmetry. Combining
(20.324) and (20.325), we get the following schematic representation:

(Rotation) × (Inversion) = (Improper rotation) or (Mirror symmetry).

This is a finite-group version of (20.320).
20.5.3 Simply Connected Lie Groups: Local Properties and Global Properties

As a final topic of Lie groups, we outline the characteristics of simply connected
groups.

In (20.318), (20.319), and (20.321) we showed three different decompositions of
O(3). This can be generalized to O(n). That is, instead of −E we may take
F described by [9]

F = \begin{pmatrix} 1 & & & & \\ & 1 & & & \\ & & \ddots & & \\ & & & 1 & \\ & & & & -1 \end{pmatrix},

where det F = −1. Then, we have

O(n) = SO(n) ∪ SO(n)ᶜ = C(E) ∪ C(F) = SO(n) + F·SO(n),   (20.326)

where the last side represents the coset decomposition (Sect. 16.2). This is essen-
tially the same as (20.321). In terms of the isomorphism, we have

O(n)/SO(n) ≅ {E, F}.   (20.327)

If in particular n is odd (n ≥ 3), we can choose −E instead of F, because
det(−E) = −1. Then, similarly to (20.320) we have

O(n) = SO(n) ⊗ {E, −E}   (n: odd),   (20.328)

where E is the (n, n) identity matrix. Note that since −E is commutative with any
element of O(n), the direct product of (20.328) is allowed.
Of the connected components, the one containing the identity element is of particular
importance. This is because, as already mentioned in Sect. 20.4.1, the "initial"
condition of the one-parameter group A(t) is set at t = 0 such that

A(0) = E.   (20.238)

Under this condition, it is important to examine how A(t) evolves with t, starting
from E at t = 0. In this connection we have the following theorems that give good
grounds to the discussion of Sect. 20.2. We describe them without proof.
Interested readers are encouraged to consult the literature [9].
Theorem 20.10 Let G be a linear Lie group. Let G₀ be the connected component of
the identity element e ∈ G. Then, G₀ is another linear Lie group, and the Lie algebras
of G and G₀ are identical.

Theorem 20.10 shows the significance of the connected component of the
identity. From this theorem, the Lie algebras of, e.g., O(n) and SO(n) are identical;
namely, both 𝔬(n) and 𝔰𝔬(n) are given by the real skew-symmetric matrices. This is
inherently related to the properties of Lie algebras. The zero matrix (i.e., the zero vector)
as an element of the Lie algebra corresponds to the identity element of the Lie group.
This is obvious from the relation

e = exp 0,

where e is the identity element of the Lie group and 0 represents the zero matrix.
Basically, Lie algebras are suited to describing local properties associated with
infinitesimal transformations around the identity element e. More specifically, the
exponential functions of the real skew-symmetric matrices cannot produce −E;
indeed, det[exp(tX)] = e^{t Tr(X)} = 1 for skew-symmetric X, whereas det(−E) = −1
for O(3). Considering that one of the connected components of O(3) is SO(3), which contains
the identity element, it is intuitively understandable that 𝔬(n) and 𝔰𝔬(n) are
the same.

Another interesting point is that SO(3) is at once an open set and a closed set (i.e.,
a clopen set; see Sect. 6.1). This is relevant to Theorem 6.3, which says that a necessary
and sufficient condition for a subset S of a topological space to be both open and
closed at once is Sᵇ = ∅ (i.e., S has no boundary). Since the determinant of any element of O(3) is
either +1 or −1, it is natural that SO(3) has no boundary; if there were a boundary,
its determinant would be zero, in contradiction to SO(3) ⊂ GL(n, ℝ); see Example
20.8.
Theorem 20.11 Let G be a connected linear Lie group. (The Lie group G may be G₀
in the sense of Theorem 20.10.) Then, any g ∈ G can be described by

g = (exp t₁X₁)(exp t₂X₂)⋯(exp t_dX_d),   (20.329)

where X_i (i = 1, 2, ⋯, d) is a basis set of the Lie algebra corresponding to G and
t_i (i = 1, 2, ⋯, d) is an arbitrary real number.

Originally, the notion of the Lie algebra was introduced to deal with infinites-
imal transformations near the identity element. Yet, Theorem 20.11 says that any
transformation (or element) of a connected linear Lie group can be described by a
finite number of elements of the corresponding Lie algebra. That is, the Lie algebra
determines the global characteristics of the connected Lie group.
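For G = SO(3), Theorem 20.11 is familiar: the Euler-angle factorization is precisely a product of three one-parameter subgroups generated by elements of 𝔰𝔬(3). A numerical sketch (NumPy/SciPy assumed; generators taken from (20.274)):

```python
import numpy as np
from scipy.linalg import expm

A2 = np.array([[0, 0, 1], [0, 0, 0], [-1, 0, 0]], dtype=float)
A3 = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 0]], dtype=float)
a, b, c = 0.4, 1.1, -0.7
R = expm(a * A3) @ expm(b * A2) @ expm(c * A3)   # product of one-parameter groups
print(np.allclose(R.T @ R, np.eye(3)), np.isclose(np.linalg.det(R), 1.0))
# First column agrees with that of the Euler matrix (20.302)/(20.332):
print(R[:, 0], [np.cos(a)*np.cos(b)*np.cos(c) - np.sin(a)*np.sin(c),
                np.sin(a)*np.cos(b)*np.cos(c) + np.cos(a)*np.sin(c),
                -np.sin(b)*np.cos(c)])
```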
Strengthening the condition of connectedness, we now deal with simply
connected groups. In this context we have the following definitions.

Definition 20.8 Let G be a linear Lie group and let I be the interval [0, 1]. Let f be a
continuous function such that

f: I ⟶ G or f(I) ⊂ G.   (20.330)

Then, the function f is said to be a path, in which f(0) and f(1) are the initial point
and the end point, respectively. If f(0) = f(1) = x₀, f is called a loop (i.e., a closed
path) at x₀ [14]. If f(t) ≡ x₀ (0 ≤ t ≤ 1), f is said to be a constant loop.

Definition 20.9 Let f and g be two paths. If f and g can be continuously deformed
into each other, they are said to be homotopic.

To be more specific with Definition 20.9, let us define a function h(s, t) that is
continuous with respect to s and t in the region I × I = {(s, t); 0 ≤ s ≤ 1, 0 ≤ t ≤ 1}. If,
furthermore, h(0, t) = f(t) and h(1, t) = g(t) hold, f and g are homotopic [9].

Definition 20.10 Let G be a connected Lie group. If all the loops at an arbitrarily
chosen point x ∈ G are homotopic to a constant loop, G is said to be simply
connected.

This may be a somewhat abstract concept. Rephrasing the statement in a word:
for G to be simply connected means that loops at any x ∈ G can be continuously
contracted to that point x. The next example helps one understand the meaning of being
simply connected.
Example 20.9 Let S² be the spherical surface in ℝ³ such that

S² = {x ∈ ℝ³; ⟨x|x⟩ = 1}.   (20.331)

Any x is equivalent by virtue of the spherical symmetry of S². Let us think of any
loop at x. This loop can be contracted to the point x. Hence, S² is simply connected.
By the same token, a spherical hypersurface (or hypersphere) given by
Sⁿ = {x ∈ ℝⁿ⁺¹; ⟨x|x⟩ = 1} is simply connected. Thus, from (20.36), SU(2) is
simply connected. Note that the parameter space of SU(2) is S³.

Example 20.10 In Sect. 20.5.2 we showed that SO(3) is a connected set. But it is
not simply connected. To see this, we return to Sect. 17.4.2, which dealt with the three-
dimensional rotation matrices. Rewriting R̃₃ of (17.101) as R̃₃(α, β, γ), we have
R̃₃(α, β, γ) = \begin{pmatrix} \cosα\cosβ\cosγ - \sinα\sinγ & -\cosα\cosβ\sinγ - \sinα\cosγ & \cosα\sinβ \\ \sinα\cosβ\cosγ + \cosα\sinγ & -\sinα\cosβ\sinγ + \cosα\cosγ & \sinα\sinβ \\ -\sinβ\cosγ & \sinβ\sinγ & \cosβ \end{pmatrix}.   (20.332)

Note that in (20.332) we represent the rotation in the moving coordinate system.
Putting β = 0 in (20.332), we get

R̃₃(α, 0, γ) = \begin{pmatrix} \cos(α+γ) & -\sin(α+γ) & 0 \\ \sin(α+γ) & \cos(α+γ) & 0 \\ 0 & 0 & 1 \end{pmatrix}.   (20.333)

If in (20.333) we replace α with α + ϕ₀ and γ with γ − ϕ₀, we obtain
the same result as (20.333). That is, different sets of parameters give the same result
(i.e., the same rotation) such that

(α, 0, γ) ⟷ (α + ϕ₀, 0, γ − ϕ₀).   (20.334)
Putting β = π in (20.332), we get

R̃₃(α, π, γ) = \begin{pmatrix} -\cos(α-γ) & -\sin(α-γ) & 0 \\ -\sin(α-γ) & \cos(α-γ) & 0 \\ 0 & 0 & -1 \end{pmatrix}.   (20.335)

If, in turn, we replace α with α + ϕ₀ and γ with γ + ϕ₀ in (20.335), we obtain the
same result as (20.335). Again, different sets of parameters give the same result (i.e.,
the same rotation) such that

(α, π, γ) ⟷ (α + ϕ₀, π, γ + ϕ₀).   (20.336)
Meanwhile, R₃ of (17.107) is expressed as

R₃ = \begin{pmatrix}
\cos^2α\cos^2β\cosγ + \sin^2α\cosγ + \cos^2α\sin^2β & \cosα\sinα\sin^2β(1-\cosγ) - \cosβ\sinγ & \cosα\cosβ\sinβ(1-\cosγ) + \sinα\sinβ\sinγ \\
\cosα\sinα\sin^2β(1-\cosγ) + \cosβ\sinγ & \sin^2α\cos^2β\cosγ + \cos^2α\cosγ + \sin^2α\sin^2β & \sinα\cosβ\sinβ(1-\cosγ) - \cosα\sinβ\sinγ \\
\cosβ\sinβ\cosα(1-\cosγ) - \sinα\sinβ\sinγ & \sinα\cosβ\sinβ(1-\cosγ) + \cosα\sinβ\sinγ & \sin^2β\cosγ + \cos^2β
\end{pmatrix}.   (20.337)

Fig. 20.7 Geometry of the rotation axis (A) accompanied by a rotation γ. (a) Geometry viewed in
parallel to the equator. (b, c) Geometry viewed from above the north pole (N). If the azimuthal angle
α is measured at the antipodal point P̃, α should be replaced with (b) α + π (in the case of 0 ≤ α ≤ π)
or (c) α − π (in the case of π < α ≤ 2π)
Note that in (20.337) we represent the rotation in the fixed coordinate system. In this
case, again we have arbitrariness in the choice of parameters. Figure 20.7 shows the
geometry of the rotation axis (A) accompanied by a rotation γ; see Fig. 17.18 once
again. In this parameter space specifying SO(3), the rotation is characterized by the
azimuthal angle α, the zenithal angle β, and the magnitude of rotation γ. The domains of
variability of the parameters (α, β, γ) are

0 ≤ α ≤ 2π, 0 ≤ β ≤ π, 0 ≤ γ ≤ 2π.

Figure 20.7a shows the geometry viewed in parallel to the equator; it includes the
zenithal angle β as well as a point P (i.e., the point of intersection between the rotation
axis and the geosphere) and its antipodal point P̃. If β is measured at P̃, β and γ should be
replaced with π − β and 2π − γ, respectively. Meanwhile, Fig. 20.7b, c depict the
azimuthal angle α viewed from above the north pole (N). If the azimuthal angle α is
measured at the antipodal point P̃, α should be replaced with either α + π (in the case
of 0 ≤ α ≤ π; see Fig. 20.7b) or α − π (in the case of π < α ≤ 2π; see Fig. 20.7c).
Consequently, we have the following correspondence:

(α, β, γ) ⟷ (α ± π, π − β, 2π − γ).   (20.338)

Thus, we confirm once again that different sets of parameters give the same
rotation. In other words, two different points in the parameter space correspond to
the same transformation (i.e., rotation). In fact, using these different sets of param-
eters, the matrix form of (20.337) is held unchanged. The confirmation is left for
readers.
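A numerical confirmation of the parameter redundancies (20.334) and (20.336) is immediate with the Euler factorization used in the preceding sketch (NumPy/SciPy assumed; A2 and A3 are the generators of (20.274)).

```python
import numpy as np
from scipy.linalg import expm

A2 = np.array([[0, 0, 1], [0, 0, 0], [-1, 0, 0]], dtype=float)
A3 = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 0]], dtype=float)
R = lambda a, b, c: expm(a * A3) @ expm(b * A2) @ expm(c * A3)

a, c, phi0 = 0.9, -0.3, 0.5
print(np.allclose(R(a, 0, c), R(a + phi0, 0, c - phi0)))          # (20.334): True
print(np.allclose(R(a, np.pi, c), R(a + phi0, np.pi, c + phi0)))  # (20.336): True
```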
Because of the aforementioned characteristic of the parameter space, a loop
cannot always be contracted to a single point in SO(3). For this reason, SO(3) is not simply
connected.

In contrast to Example 20.10, in the case of SU(2) the mapping from the parameter space S³
to SU(2) is bijective. That is, any two different points of S³ give different
transformations of SU(2) (i.e., the mapping is injective), and any element of SU(2) has a
corresponding point in S³ (i.e., the mapping is surjective). This can be rephrased by saying that
SU(2) and S³ are isomorphic.

In this context, let us give important concepts. Let (T, τ) and (S, σ) be two
topological spaces. Suppose that f: (T, τ) ⟶ (S, σ) is a bijective mapping. If, in this
case, furthermore, both f and f⁻¹ are continuous mappings, (T, τ) and (S, σ) are said to
be homeomorphic [14]. In our present case, SU(2) and S³ are homeomorphic.

Apart from being simply connected or not, SU(2) and SO(3) resemble each other in many
aspects. The resemblance already showed up in Chap. 3 when we considered the
(orbital) angular momentum and the generalized angular momentum. As shown in
(3.30) and (3.69), the commutation relations were virtually the same. It is obvious
when we compare (20.271) and (20.275). That is, in terms of Lie algebras their
structure constants are the same and, eventually, the structures of the corresponding Lie
groups are related. This became well understood when we considered the adjoint
representation of the Lie group G on its Lie algebra 𝔤. Of the groups whose Lie
algebras are characterized by the same structure constants, the simply connected
group is said to be the universal covering group. For example, among the Lie groups
O(3), SO(3), and SU(2), only SU(2) is simply connected and, hence, is a universal
covering group.

The understanding of the Lie algebra is indispensable for explicitly describing the
elements of the Lie group, especially the connected component of the identity
element e ∈ G. Then, we are able to understand the global characteristics of the
Lie group. In this chapter, we have focused on SU(2) and SO(3) as typical examples
of linear Lie groups.
Finally, though formal, we describe the definition of a topological group.

Definition 20.11 [15] Let G be a set. If G satisfies the following conditions, G is
called a topological group:

(i) The set G is a group.
(ii) The set G is a T₁-space.
(iii) The group operations (or calculations) of G are continuous. That is, let P and Q be
two mappings such that

P: G × G ⟶ G; P(g, h) = gh (g, h ∈ G),   (20.339)

Q: G ⟶ G; Q(g) = g⁻¹ (g ∈ G).   (20.340)

Then, both the mappings P and Q are continuous.

By Definition 20.11, a topological group combines the structures of a group and a
topological space. Usually, "continuous group" is a collective term for Lie groups
and topological groups. Continuous groups are very frequently dealt with in
various fields of natural science and mathematical physics.

References

1. Inui T, Tanabe Y, Onodera Y (1990) Group theory and its applications in physics. Springer,
Berlin
2. Inui T, Tanabe Y, Onodera Y (1980) Group theory and its applications in physics, expanded
edn. Shokabo, Tokyo. (in Japanese)
3. Takagi T (2010) Introduction to analysis, Standard edn. Iwanami, Tokyo. (in Japanese)
4. Arfken GB, Weber HJ, Harris FE (2013) Mathematical methods for physicists, 7th edn.
Academic Press, Waltham
5. Hamermesh M (1989) Group theory and its application to physical problems. Dover, New York
6. Rose ME (1995) Elementary theory of angular momentum. Dover, New York
7. Hassani S (2006) Mathematical physics. Springer, New York
8. Edmonds AR (1957) Angular momentum in quantum mechanics. Princeton University Press,
Princeton
9. Yamanouchi T, Sugiura M (1960) Introduction to continuous groups. Baifukan, Tokyo.
(in Japanese)
10. Dennery P, Krzywicki A (1996) Mathematics for physicists. Dover, New York
11. Satake I-O (1975) Linear algebra (Pure and applied mathematics). Marcel Dekker, New York
12. Satake I (1974) Linear algebra. Shokabo, Tokyo. (in Japanese)
13. Steeb W-H (2007) Continuous symmetries, Lie algebras, differential equations and computer
algebra, 2nd edn. World Scientific, Singapore
14. McCarty G (1988) Topology. Dover, New York
15. Murakami S (2004) Foundation of continuous groups, revived edn. Asakura Publishing, Tokyo.
(in Japanese)
Index

A
Abelian group, 622, 626, 689, 710, 746, 762
Abscissa axis, 181, 197
Absolute convergence, 583
Absolute temperature, 340
Absolute value, 127, 138, 141, 197, 205, 216, 337, 354, 567, 777, 782, 808
Abstract algebra, 846, 852, 884
Abstract concept, 433, 635, 688, 885, 892
AC5, 361–363
AC'7, 359–363
Accumulation points, 191–193, 223, 225, 226
Adherent point, 186, 192
Adjoint boundary functionals, 393
Adjoint Green's functions, 396, 402
Adjoint matrix, 22, 524, 584
Adjoint operator, 109, 387, 389, 523, 535–539, 541, 554
Adjoint representation, 874–884, 895
Algebraic branch point, 256
Algebraic equation, 15
Algebraic method, 1
Allowed transition, 138, 147, 764, 776
Allyl radical, 688, 689, 691, 747, 770–777
Aluminum-doped zinc oxide (AZO), 363, 369–371
Ampère, A.-M., 273
Ampère's circuital law, 273
Amplitude, 278, 284, 285, 298, 300, 303, 305, 310, 324, 332, 333, 335, 351, 375, 377, 422, 423
Analytical equation, 1
Analytical method, 77, 83, 91
Analytical solutions, 44, 107, 151
Analytic continuation, 225–227, 234, 264
Analytic functions, 181–265, 319, 581, 582, 866, 885
Analytic prolongation, 227
Angular coordinate, 197
Angular frequency, 4, 6, 32, 127, 129, 133, 139, 278, 341, 344, 346, 351, 356
Angular momentum operator, 29, 77–91, 108, 142, 145
Anisotropic
  crystal, 366
  dielectric constant, 365, 366
  media, 569
  medium, 365
Annihilation operators, 31, 41, 111, 112
Anti-Hermitian, 28, 29, 109, 116, 117, 387, 590, 592, 600, 610, 802, 806, 810, 867, 870–873, 875, 876, 878
Anti-Hermitian operators, 28, 109, 610, 806, 810, 870, 873
Antilinear, 526
Anti-phase, 336, 337
Antipodal point, 895
Antisymmetric, 412, 723–726, 858, 860, 863, 864
Antisymmetric representation, 723–726, 858, 860, 863, 864
Approximation methods, 151–179
Arc tangent, 63, 371
Arguments, 30, 55, 66, 152, 187, 197, 227, 231, 238, 243, 247, 252, 254, 256, 258, 260, 261, 264, 303, 317, 319, 353, 385, 394, 398–400, 403, 405, 408, 412, 416, 418, 454, 463, 489, 498, 499, 517, 532, 557, 569, 573, 608, 615, 623, 647, 658, 681, 693, 704, 707, 731, 734, 740, 747, 751, 764, 797, 806, 817, 820, 826, 828, 847
Aromatic hydrocarbons, 729, 747
Aromatic molecules, 756, 770, 777
Associated Laguerre polynomials, 57, 108, 116–122
Associated Legendre differential equation, 83, 91–103, 427
Associated Legendre functions, 57, 83, 94, 103–107
Associative law, 23, 441, 541, 624, 631, 663, 666
Atomic orbitals, 729, 734, 742, 744, 745, 747, 749, 752, 757, 764, 770, 771, 777, 785, 788, 789
Attribute, 636, 637
Azimuthal angle, 29, 70, 674, 791, 831, 839, 895
Azimuthal quantum numbers, 120, 142, 147
Azimuth of direction, 663

B
Backward wave, 278, 331, 333, 335
Ball, 220
Basis
  functions, 88, 683–691, 720, 734, 744, 748–750, 777, 799, 815–817, 819, 839, 842, 843, 846, 849, 859, 860, 863, 864
  set, 457, 515, 516, 541, 547, 584, 685, 692, 693, 720, 739, 847, 849, 854, 878, 881,
Bithiophene, 648
Blackbody, 339–341
Blackbody radiation, 339–341
Block matrix, 686, 860
Bohr radius
  of hydrogen, 133
  of hydrogen-like atoms, 108
Boltzmann distribution law, 339, 348, 355
Born probability rule, 565
Bose–Einstein distribution functions, 341
Boron trifluoride (BF₃), 650
Boundary, 14, 71, 109, 117, 173, 174, 189–190, 325–327, 341, 344, 363, 365, 370, 379, 380, 384, 385, 388, 389, 393, 401, 405, 408, 418, 423, 597, 891
  functionals, 380, 384, 388, 389, 393, 408
  term, 386
Boundary conditions (BCs), 14–17, 19, 20, 25–27, 30, 32, 36, 56, 71, 72, 109, 116, 117, 173, 174, 285, 325–327, 341–343, 345, 363, 365, 370, 379, 380, 384–386, 388, 389, 393–395, 397–399, 401–403, 405, 407–411, 415, 418, 419, 421, 423–427, 581, 597–600, 602, 605, 607, 608, 610, 615
  adjoint, 410, 415
  homogeneous, 380, 388, 389, 393, 394, 397, 399, 401–403, 405, 407–409, 411, 423, 426, 427, 597, 600
  homogeneous adjoint, 393–395
  inhomogeneous, 380, 394, 395, 402, 403, 405, 407, 415, 418, 419, 424, 597–599,
892 605, 607, 608
vectors, 8, 41, 59, 60, 132, 140, 155, 291, Bounded, 25, 29, 74, 79, 212, 214–216, 221,
395, 433, 435, 439–446, 448, 450, 222
452–460, 468–470, 474, 477, 487, BP1T, 361–363
489–491, 495, 505–507, 513, 515, Bragg’s condition, 374
518–520, 528, 529, 533, 537–540, 547, Branch, 250, 252, 253, 255, 256, 427, 547, 617
568, 576, 636, 640, 648, 650, 654, 657, cut, 253, 255–259, 262, 263
677, 679, 684–686, 688, 689, 691, 704, point, 252–257, 259, 262–264
710, 711, 716, 717, 719, 730, 738, 740, Bra vector, 22, 523, 526, 529
744, 745, 747, 751, 757, 761, 764, 770, Brewster angles, 312–314
783, 784, 803, 805, 820, 822, 843, 847,
848, 852, 854, 855, 859, 860, 863, 872,
878 C
Benzene, 648, 729, 747, 762, 764–771 Canonical commutation relation, 3, 27–30, 34,
Bijective, 447, 450, 451, 623, 628, 629, 895 43, 44, 54, 59, 65
Bijective mapping, 895 Canonical coordinate, 27
Binomial expansion, 233, 234, 852 Canonical forms of matrices, 459–522, 559
Binomial theorem Canonical momentum, 27
generalized, 95, 233 Cardinalities, 183
Biphenyl, 648 Cardinal numbers, 183
Index 901

Carrier space, 687 Column vectors, 8, 21, 41, 88, 90, 290, 435,
Cartesian coordinates, 45, 61, 109, 171, 197, 440, 441, 452, 455, 457, 462–465, 502,
283, 321, 349, 650, 736, 791, 793, 804, 505, 507, 508, 510, 527, 535, 541, 542,
806, 823, 889 556, 557, 568, 573, 574, 577, 596, 613,
Cartesian space, 59, 436 658, 677, 684, 759, 782, 810, 817, 820,
Cauchy–Hadamard theorem, 220 831, 863
Cauchy–Liouville theorem, 215–216 Common factor, 477, 478, 482, 483
Cauchy Riemann conditions, 203–206 Commutable, 27, 482, 486–488, 524, 556, 579,
Cauchy–Schwarz inequality, 525, 585 642, 643, 869
Cauchy’s integral formula, 206–216, 219, 225, Commutation relation, 3, 27–30, 34, 43, 44, 54,
229 59, 65, 73, 77, 142–144, 872, 895
Cauchy’s integral theorem, 209–212, 227, 243 Commutative, 27, 171, 472, 518, 585, 591, 603,
Cavity, 33, 339, 341, 343–345, 375, 376, 378 604, 615, 622, 639, 649, 689, 824, 825,
Cavity radiation, 339 865, 870, 871, 889, 890
Center of inversion, 642 Commutative groups, 622
Center-of-mass coordinates, 57, 58 Commutative law, 622
Central force fields, 57, 107 Commutator, 27–30, 144, 868, 872
Characteristic equation, 421, 461, 462, 501, Compatibility relation, 750
533, 567, 573, 675, 689, 831 Complementary set, 183, 212, 888
Characteristic impedance Complements, 183, 185, 189, 190, 195, 212,
of dielectric media, 295, 336 547–549, 559, 688, 691, 888
Characteristic polynomial, 461, 471, 477, 481, Complete orthonormal system (CONS), 151,
489, 513, 517, 520, 782 155, 163, 166, 167, 173, 842
Characters, 373, 453, 524, 628, 631, 637, 694, Complete system, 155
697–700, 702, 703, 722, 723, 726, 741, Completely reducible, 825, 830
745, 746, 748, 750, 751, 755–757, 759, Complex amplitude, 303
761, 765, 770, 775, 776, 780, 784, 796, Complex analysis, 181, 182, 197
797, 834, 837–844 Complex conjugate, 22, 24, 25, 28, 34, 117,
Chemical species, 374, 650 137, 199, 347, 390, 397, 401, 523, 524,
Circularly polarized light 526, 536, 537, 693, 714, 762, 816, 817,
left-, 136, 138, 147–149, 293 820, 850, 876
right-, 136, 138 Complex conjugate representation, 762, 816,
Clad layers, 321, 327, 331, 371 817
Classes, 191, 625–627, 637, 653, 654, 656, 657, Complex conjugate transposed matrix, 22, 537
659, 662, 663, 699, 700, 704–711, 746, Complex domain, 19, 201, 216, 227, 263–265,
834, 838, 839, 841, 888, 889 315
Classical Hamiltonian, 33 Complex function, 14, 181, 199, 200, 204, 206,
Classical orthogonal polynomials, 47, 94, 379, 207, 216, 254, 389, 764
427 Complex phase shift, 329
Clebsch Gordan Coefficients, 843 Complex plane, 14, 181, 182, 197–199, 201,
Clopen sets, 190, 193, 195, 891 206, 207, 211, 212, 216, 227, 241, 242,
Closed interval, 196 249, 251, 253, 263, 317, 318, 521, 885
Closed-open set, 190 Complex roots, 235, 420, 421
Closed paths, 194, 209, 892 Complex variables, 15, 199–206, 215, 230,
Closed sets, 185, 186, 188–190, 193, 195, 196, 532, 855
212, 891 Composite function, 63
Closed shell, 741, 755 Compton, A., 4, 6
Closed surface, 273, 274 Compton effect, 4
Closures, 186–189, 654 Compton wavelength, 5
C-numbers, 10, 22 Condon–Shortley phase, 101, 102, 106, 123,
Cofactor, 471, 472 138, 140, 141
Cofactor matrix, 471, 472 Conjugacy classes, 625, 654, 657, 659, 662,
Coherent state, 126–128, 133, 139, 149 699, 707, 710, 711, 841
902 Index

Conjugate element, 625 Cross-section area, 355, 356, 376


Conjugate representation, 762 Crystal
Connected anisotropic, 366, 569
components, 886–888, 891, 896 organic, 359, 360, 362, 363, 365, 372–374,
set, 886–888, 892 569
Connectedness, 194, 199, 887, 892, 896 Cubic roots, 249
Conservation of charge, 274 Current continuity equation, 274
Conservation of energy, 5 Cyclic groups, 622
Constant loop, 892 Cyclic permutation, 641, 677
Constant matrix, 603, 604, 610, 617 Cyclopropenyl radical, 729, 747, 757–764, 767,
Constant term, 169 768, 770, 771
Constructability, 401
Constructive interference, 357
Continuous function, 214, 885–887, 892 D
Continuous groups, 190, 663, 804, 816, 833, Damped oscillator, 419–424
838, 848, 877, 883, 894, 896 Damping
Continuous mapping, 886, 895 critical, 419
Contour integral, 207, 208, 229, 230, 232, 236, over, 419
239, 240, 242, 244 weak, 420
Contour integration, 211, 213, 214, 217, 220, Damping constant, 419
230, 232, 236, 238–240, 243, 247, 257, Darboux inequality, 212–213, 231
261, 262, 265 de Broglie, L.-V., 6
Contour map, 729 de Broglie wave, 6
Contraposition, 204, 223, 382, 449, 476, 528, Definite integrals, 26, 98, 112, 132, 137, 171,
667, 825 181, 230–248, 257, 259, 261, 717, 723,
Contravariant, 850, 851 741, 742, 792
Convergence circle, 220, 226 de Moivre’s theorem, 198, 250
Convergence radius, 220 de Morgan’s law, 184, 186
Coordinate point, 289, 292 Degeneracy, 152
Coordinate representation, 3, 31, 44–51, 112, Degenerate
122, 129, 132, 136, 146, 157, 160, 165, doubly, 768
168, 170, 171, 175, 178, 256, 395, 396, triply, 789, 795, 796
791, 823, 837 Derived set, 191
Coordinate transformation, 291, 367, 441, 573, Determinants, 42, 59, 287, 383, 409, 433,
635, 641, 677, 738, 816, 837 448–452, 454, 471, 517, 531, 567, 571,
Core layer, 321, 327, 329, 331 589, 647, 654, 662–664, 744, 787, 808,
Corpuscular beam, 6, 7 863, 887, 888, 891
Correction terms, 153–156 Device physics, 330, 359, 374
Coset Device substrate, 363, 370
decomposition, 624, 626, 649, 890 Diagonal elements, 87, 88, 90, 287, 450, 461,
left, 624–626 462, 465, 476, 496, 498, 499, 512, 514,
right, 624–626 518, 559, 560, 570, 588, 589, 639, 640,
Coulomb integral, 740, 752, 767, 788 654, 690, 697, 744, 745, 761, 782, 784,
Coulomb potential, 57, 67, 107 787, 870, 871, 873
Coulomb’s law, 270 Diagonalizable, 475, 553
Countable, 14, 565, 621 Diagonalizable matrices, 512–522, 561
Coupling, 849 Diagonalization, 466, 517, 573–580, 601, 667,
Covariant, 850, 851 690
Covering group, 896 Diagonalized, 43, 54, 367, 475, 477, 486, 487,
Cramer’s rule, 306, 406, 809 513, 515, 530, 547, 556, 557, 559, 561,
Creation operator, 41 567, 578–580, 600, 606, 690, 691, 782,
Critical angles, 312–314, 317, 318, 320, 329 804
Critical damping, 419
Index 903

Diagonal matrices, 89, 459, 485, 517, 535, 556, Dispersion of refractive index, 358, 359, 361,
558, 559, 568, 573, 600, 601, 604, 638, 362
681, 689, 738, 782, 805, 826, 828, 879 Displacement current, 273, 275
Diagonal positions, 465, 486 Distance function, 181, 196, 199, 524
Dielectric constant, 270, 271, 336, 365 Distributive law, 184, 442
Dielectric constant ellipsoid, 366 Divisor, 473, 514, 648
Dielectric medium, 269–272, 275, 276, 280, Domain
285, 286, 295–337, 339, 357, 365, 371, of analyticity, 206, 209, 210, 212, 216, 227
375, 376, 419 complex, 19, 227, 263–265, 315
Differentiable, 65, 201, 204, 206, 215, 275, of variability, 181, 895
406, 581, 591, 866, 868
Differential equations, 3, 14, 15, 17, 19, 26, 44,
50, 67, 83, 91–103, 107, 108, 169, 174, E
269, 341, 368, 379–389, 392, 401, 403, Effective index, 362, 368
405, 408, 418, 421, 426, 427, 581, Eigenenergy, 43, 111, 121, 151–156, 162–164
592–600, 613, 617 Eigenequation, 460
Differential operators, 9, 14, 19, 26, 27, 50, 68, Eigenfunctions, 17, 18, 24, 26, 36, 37, 40, 41,
72, 77, 167, 172–174, 270, 379, 386, 67–69, 72, 86, 110, 112, 137, 152, 173,
388–394, 399, 402, 403, 409, 414, 425, 429, 690, 715, 720, 739, 768, 773, 774,
426, 523, 547 793, 826, 861, 863
Diffraction grating, 359, 361, 363, 371 Eigenspace, 460, 468–473, 478–485, 487, 488,
Diffraction order, 364, 373 496, 513, 515, 517, 559, 565, 577, 719
Dimensionless, 72, 77, 79, 108, 270–272, 278, Eigenstate
355, 359, 790, 801 energy, 156, 157, 163, 166, 176, 774
Dimensionless parameter, 50, 152 Eigenvalue
Dimension theorem, 444, 445, 447, 479, 492, equation, 13, 24, 29, 33, 50, 51, 58, 126,
493, 514 151, 152, 160, 172, 371, 408, 426, 460,
Dipole 466, 502, 521, 534, 577, 729, 734, 742,
approximation, 127, 142, 347 744, 789
isolated, 353 problems, 14, 16, 18, 21, 24, 120, 125, 379,
oscillation, 339 425–430, 459, 522, 579, 691, 720, 785
radiation, 349–354 Eigenvectors, 26, 41, 74, 152, 153, 155, 288,
Dirac equation, 12 290, 459–468, 471, 473–478, 493, 495,
Dirac, P., 11, 12, 22, 523–526 497, 500–503, 505, 507, 510, 513, 515,
Direct factors, 632, 649 516, 518, 520, 521, 556, 559, 561, 566,
Direct sum, 191, 193, 437, 471, 473, 474, 575–580, 654, 665, 667, 676, 691, 743,
477–481, 484, 487–489, 496, 513, 515, 746, 782, 831, 832
517, 518, 547, 550, 633, 686, 687, 691, Einstein, A., 3, 4, 6, 7, 339, 341, 345–347, 349,
699, 702, 719, 742, 749, 769, 770, 781, 353, 354
825, 843, 886, 888, 889 Einstein coefficients, 349
Direction cosines, 279, 305, 658, 675, 676, 832, Electric dipole
834, 835 approximation, 142
Direct-product groups, 621, 631–633, 649, 720, moment, 127, 164, 166, 741, 755
723, 750, 843, 844, 888 operator, 796
Direct-product representations, 720–723, 725, transition, 125–127, 129, 132, 141, 146,
740, 741, 745, 769, 770, 796, 831, 740, 754, 755
843–846, 849, 859, 861, 863 Electric displacement, 270
Dirichlet conditions, 14, 19, 26, 71, 325, 342, Electric field, 127, 147, 149, 151, 156, 160,
344, 408 161, 163, 164, 172, 174, 177, 269, 270,
Disconnected, 194 272, 284–286, 289, 292, 293, 297,
Discrete set, 191 303–306, 310, 315, 324, 325, 328, 329,
Discrete topology, 195 331, 335, 336, 344, 350–352, 368–370,
Disjoint sets, 194 375, 376, 418, 419, 611, 755, 796
904 Index

Electric flux density, 270, 367 Evanescent mode, 320


Electric permittivity, 365 Evanescent wave, 321, 327–331
Electric permittivity tensor, 365 Even function, 18, 49, 132, 157, 837, 840, 842
Electric waves, 284–286 Even permutations, 449, 873
Electromagnetic fields, 33, 127, 148, 295–297, Excited state, 37, 128, 132, 152, 174, 175, 177,
303, 305, 321, 351, 363, 365, 369, 371, 339, 346, 347, 354–356, 754, 769, 776,
377, 569, 611 796, 797
Electromagnetic induction, 272 Existence probability, 21, 32, 139
Electromagnetic plane wave, 281 Existence probability density, 139
Electromagnetic waves, 127, 151, 269, Expansion coefficient, 173, 396, 836
280–293, 295–337, 339, 341–345, 352, Expectation value, 24, 25, 33–36, 51, 52, 67,
357, 363, 367, 368, 375, 379, 419, 172, 174, 565, 566
595–600 Exponential functions, 15, 198, 241, 281, 298,
Electromagnetism, 166, 269, 270, 279, 378, 299, 423, 581–617, 799, 802, 808, 826,
379, 427, 569 891
Electronic configuration, 741, 754, 755, 769, Exponential polynomials, 299
774–776, 796, 797 External force fields, 58
Element, 45, 83, 127, 182, 287, 347, 433, 447, Extremals, 129, 334
523, 553, 581, 621, 635, 679, 735, 812 Extremum, 176
Elementary charge, 58, 127, 148, 347
Ellipse, 288, 289, 292, 366, 574, 575
Ellipsoidal coordinates, 790, 791, 793 F
Elliptic polarizations, 269 Factor group, 627, 631, 649
Elliptically polarized light, 288, 289, 292 Faraday, M., 272
Emission color, 374 Faraday’s law of electromagnetic induction,
Emission gain, 374 272
Empty set, 183, 185, 626 Finite group, 621–623, 626, 628, 629, 631, 662,
Endomorphism, 433, 441, 443, 444, 446–449, 663, 679–681, 683, 684, 686, 692, 703,
452, 454, 622, 630, 876, 878 799, 833, 840, 841, 845, 846, 888, 890
Energy conservation, 5, 311 Finite set, 183
Energy eigenvalues, 20, 31, 36, 37, 51, 133, First-order approximation, 177, 178
151, 152, 154, 158, 172, 532, 691, 719, First-order linear differential equation
739, 751, 753, 754, 762, 767, 768, 774, (FOLDE), 44, 379, 384–389, 593, 598,
786, 795 607, 608
Energy level, 41, 153–156, 172, 345, 764, 796 First-order partial differential equations, 269
Energy transport, 308–311, 320, 335 Fixed coordinate system, 640, 657, 678, 831,
Entire function, 206, 215, 216, 242 895
Equation of motion, 58, 148, 419 Flux, 148, 271, 272, 311, 352, 356, 612
Equation of wave motion, 269, 276–280, 322 Focused ion beam (FIB), 363
Equilateral triangle, 650, 757 Forbidden, 125, 755, 770, 797
Equivalence law, 485 Forward wave, 278, 331, 333, 335
Equivalence relation, 885 Four group, 639, 648
Equivalent transformation, 569 Fourier coefficients, 167
Essential singularity, 225 Fourier cosine coefficients, 842
Ethylene, 729, 747–757, 759, 761, 764, 770, Fourier cosine series, 842
771 Fréchet axiom, 195
Euclidean space, 196, 635, 639, 663 Free space, 295, 326, 331, 352
Euler angles, 669–678, 808, 811, 815, 820, 826, Functionals, 13, 15, 100, 101, 118, 122, 134,
831, 881 139, 160, 167, 169, 201, 203, 219, 227,
Euler equation, 169 234, 278, 380, 382, 384, 388, 389, 393,
Euler’s formula, 198 408, 523, 594, 717, 734, 741, 746, 754,
Euler’s identity, 198 854
Index 905

Function space, 25, 623, 728 H


Fundamental set of solutions, 15, 383, 404, 405, Half-integer, 75, 77
409, 411, 418–421, 613, 617 Half-odd-integers, 75, 77, 79, 82, 812, 815,
Fundamental solution, 608 849, 855, 884
Hamilton–Cayley theorem, 471–472, 482, 489,
514
G Hamiltonian, 11, 21, 29, 31, 33, 36, 43, 57–67,
Gain function, 356 107, 152, 157, 164, 174, 177, 532, 735,
Gamma functions, 95, 97, 98, 856 738, 740, 741, 745, 761, 788, 790
Gaussian plane, 181 Harmonic oscillator, 30–52, 57, 107–109, 125,
Gauss’ law, 270, 271 126, 130–132, 151, 152, 160, 174, 175,
Gegenbauer polynomials, 94, 99 339, 341, 375, 377, 379, 418, 419, 532,
Generalized angular momentum, 72–77, 810, 748, 755
811, 844, 873, 895 Heisenberg, W., 11
Generalized binomial theorem, 95, 233 Hermite differential equation, 51, 426, 427
Generalized eigenspaces, 471, 478–485, 488, Hermite polynomials, 47, 49, 57, 163, 427
496 Hermitian, 23, 52, 67, 130, 152, 347, 379, 530,
Generalized eigenvalues, 460, 500, 589 547, 600, 681, 738, 802
Generalized eigenvectors, 471, 473–478, 493, Hermitian differential operators, 172, 174, 425
495, 497, 500, 504, 505, 507, 511, 513, Hermitian matrix, 24, 131, 530, 532, 537, 568,
521 572, 577, 600
Generalized Green’s identity, 392, 409 Hermitian operator, 23–25, 52, 53, 68, 77, 156,
General linear group, 623, 808 173, 177, 347, 379, 399, 402, 427, 429,
General point, 635, 730, 815 547–580, 740, 806
General solution, 15, 32, 278, 384, 409, 420 Hermitian quadratic form, 532, 535, 547,
Generating function, 95, 264 568–575
Geometric object, 635–638, 640, 648, 654 Hermiticity, 3, 19, 25, 26, 117, 391, 394, 402,
Geometric series, 217, 221 426, 427, 577, 745
Glass, 269, 280, 286 Hesse’s normal forms, 280, 658, 661
Global properties, 890–896 Highest-occupied molecular orbital (HOMO),
Gram matrix, 529, 530, 532–535, 541, 563, 754, 769
568, 681 Hilbert space, 25, 565, 748
Gram–Schmidt orthonormalization, 543, 556 Homeomorphic, 895
Grand orthogonality theorem (GOT), 692–697, Homogeneous BCs, 388, 389, 393, 394, 397,
713 399, 401–403, 405, 407–409, 411, 423,
Grating period, 364, 374 425, 426
Grating wavevector, 363, 364, 372, 373 Homogeneous equation, 169, 380, 397, 399,
Grazing emissions, 364, 372 401, 402, 404, 405, 408, 419, 425, 426,
Grazing incidence, 317, 318 594, 597, 600, 604, 607, 613, 614
Greatest common factor, 478 Homomorphic, 628–631, 680, 685, 708
Green’s functions, 379–430, 581, 592, 594, 617 Homomorphic mapping, 629, 630
Green’s identity, 392, 397 Homomorphism, 621, 627–631, 679–681, 713,
Ground state, 36, 128, 132, 152, 153, 156, 157, 718, 881, 883
163, 164, 172, 174, 176, 177, 339, 340, Homomorphism theorem, 631, 883
345–347, 741, 754, 769, 776, 796, 797 Homotopic, 892
Group Hückel approximation, 754
algebra, 700–708 Hydrogen, 57, 122, 125, 132–142, 163–172,
representation, 679–726 755, 777, 780, 785, 788–790
theory, 581, 621–633, 635, 636, 643, Hydrogen-like atom, 57–123, 125, 126, 132,
679–726, 729–797 140, 142, 151, 153, 532, 579
Group refractive index, 358 Hypersphere, 892
Group velocity, 7, 12, 326 Hypersurface, 574, 892
906 Index

I contour, 211, 213, 214, 217, 220, 230, 232,


Idempotent matrix, 479, 482, 486, 517, 520, 236, 238, 239, 243, 247, 257, 261, 262,
521, 564, 604 265
Identity counter-clockwise, 210, 211, 232, 257
element, 622, 624, 625, 627–631, 638, 649, definite, 132
697, 751, 759, 882, 886–888, 891, 892, by parts, 25, 29, 83, 130, 386
896 path, 209, 210, 222
matrix, 43, 480, 482, 484, 486, 494, 562, range, 25, 239, 415, 417, 742
567, 582, 640, 664, 667, 668, 680, 694, real, 135
702, 703, 780, 802, 865, 884, 890 surface, 274
operator, 35, 55, 155, 167, 395, 397, 480 termwise, 218, 222
transformation, 640, 663, 667, 882 volume, 274
Image Interface, 295–298, 300–305, 311, 314, 320,
of transformation, 443 321, 324–327, 331, 335–337, 339, 344,
Imaginary axis, 181, 197 363, 365, 370, 375, 376
Imaginary unit, 181 Interference, 331, 357
Improper rotation, 643, 647, 663, 780, 889, 890 Interior point, 186
Incidence angle, 301, 313 Intermediate state, 156
Incidence plane, 301, 304, 306, 311, 315 Intermediate value theorem, 887
Incident light, 300, 302, 304 Intersection, 183, 184, 190, 623, 643, 645, 646,
Indeterminate constant, 18 649, 661, 895
Indiscrete topology, 195 Invariant, 461, 468–474, 481, 487, 501, 513,
Inequivalent, 685, 692, 694–696, 699, 707, 559, 566, 569, 589, 626, 632, 687, 689,
710, 711, 830, 839, 841, 842 691, 706, 719, 734, 738, 803, 806, 814,
Infinite dielectric media, 285, 286 837, 839, 849, 851, 852, 860
Infinite-dimensional, 41, 43, 435, 748 Invariant subgroups, 626, 630, 632, 649, 662,
Infinite group, 622, 628, 629, 635, 663, 799, 706, 883, 887, 888
843, 846 Invariant subspace, 468–474, 477, 481, 490,
Infinite power series, 582 501, 503, 504, 513, 518, 521, 547, 559,
Infinite series, 159, 218, 225 577, 579, 687, 688, 691, 781
Infinite set, 182 Inverse element, 447, 622, 627, 629, 631, 638,
Inflection points, 235 656, 680, 701, 706
Inhomogeneous boundary conditions (BCs), Inverse mapping, 447
402, 403, 405, 407, 415, 419, 424, Inverse matrix, 42, 200, 433, 448–452, 454,
597–599 511, 512, 571, 596, 601, 605, 680
Inhomogeneous equation, 380, 383, 402, 407, Inverse transformations, 443, 447, 448, 622,
594, 596–599, 607 816, 818
Initial value problem (IVP), 379, 408–424 Inversion symmetry, 642, 643, 647, 654
Injective, 447, 448, 452, 627, 629, 895 Inverted distribution, 355
Injective mapping, 627 Invertible, 447, 622
Inner product Invertible mapping, 447
of functions, 719, 740 Irradiance, 311, 355–357
space, 196, 523–544, 547, 550, 554–556, Irrational functions, 248
563–566, 584 Irreducible, 686, 688, 691–694, 696–700, 703,
of vectors, 397, 523, 576 707–711, 715–717, 719, 720, 722, 738,
vector space, 523, 551, 868 740–742, 745–751, 754–757, 759, 761,
In-quadrature, 375, 376 765, 769–771, 774, 775, 778, 781,
Integrand, 214, 232, 244, 246, 257, 259, 262, 783–786, 788, 789, 795–797, 823–831,
265 837–847, 859
Integration Irreducible character, 697, 842–844
clockwise, 210, 257, 262 Irreducible representations, 687, 688, 692, 694,
complex, 206 696, 698, 699, 702, 703, 707–711,
constant, 149, 376, 385, 593, 594 715–717, 719, 720, 738, 740–742,
Index 907

745–748, 750, 751, 754–757, 759, 761, Light-emitting device, 363, 372, 373
765, 769–771, 774, 775, 778, 781, Light-emitting material
783–786, 788, 789, 795–797, 823–831, Light propagation, 148, 269, 319, 364, 366
837, 840–847, 859 Light quantum, 3, 4, 345–347
Isolated point, 191–193 organic, 359, 361, 374
Isomorphic, 629, 631, 648, 654, 661–663, 685, Limes superior, 220
734, 883, 895 Limit, 202, 205, 207, 212, 213, 220, 258, 312,
Isomorphism, 621, 627–631, 679, 890 415, 417, 587, 591, 703, 802, 836, 858,
869
Linear algebra, 491
J Linear combination
Jacobi’s identity, 872 of basis vectors, 155, 435, 440, 519, 691,
j j coupling, 849 843
Jordan blocks, 490–502, 504–506, 513 of functions, 126, 849, 854, 861
Jordan canonical forms, 459, 488–512 of a fundamental set of solutions, 404
LCAO, 734
Linear combination of atomic orbitals (LCAO),
K 729, 734, 742
Kernel, 443, 501, 629–631, 881, 883 Linearly dependent, 17, 93, 100, 107, 299, 344,
Ket vector, 22, 523, 526 381–383, 434, 443, 444, 454, 466, 468,
469, 471, 527, 528, 531, 532, 543, 693
Linearly independent, 15–17, 32, 71, 298, 341,
L 345, 368, 380–383, 404, 433, 434, 437,
Laboratory coordinate system, 366 445, 453, 456, 462, 463, 466–470, 473,
Ladder operators, 74, 90, 91, 112 476–478, 489, 490, 513, 526, 528, 532,
Laguerre polynomials, 57, 108, 116–122 533, 543, 544, 547, 568, 576, 579, 596,
Laser 613, 664, 665, 684, 685, 688, 693, 701,
material, 358, 361, 363 710, 715, 716, 785, 786, 795, 812, 822
medium, 355–358, 365 Linearly polarized light, 142, 144, 286, 293
organic, 359–374 Linear transformation group, 623
oscillation, 355, 358, 360, 374 Linear vector space
Lasing property, 359 finite-dimensional, 872
Laurent’s expansion, 221–223, 229, 233, 234 Line integral, 296
Laurent’s series, 216–224, 228, 229, 234 Local properties, 890–896
Law of conservation of charge, 274 Logarithmic branch point, 256
Law of electromagnetic induction, 272 Logarithmic functions, 248
Law of equipartition of energy, 341 Longitudinal multimode, 358
Left-circularly polarized light, 136, 138, Loop, 296, 892, 895
147–149, 293 Lorentz force, 147, 611, 612, 615
Left-circular polarization, 293 Lowering operators, 74, 90, 108, 112, 122
Legendre differential equation, 83, 91–103, 427 Lower left off-diagonal elements, 450, 462
Legendre polynomials, 57, 85, 91, 93, 98, 100, Lower triangle matrix, 462, 512, 553, 606
104, 106, 107, 264 Lowest-unoccupied molecular orbital (LUMO),
Leibniz rule, 92, 94, 101, 118 754, 769
Leibniz rule about differentiation, 92, 94 L S coupling, 849
Levi-Civita symbol, 873 Luminous flux, 311
Lie algebra, 592, 600, 610, 799, 802, 864, 891,
892, 895, 896
Lie group, 891 M
linear, 866, 867, 877, 887, 891, 892 Magnetic field, 147, 269, 271–273, 275,
Light amplification, 354, 355 283–285, 297, 303–307, 310, 322, 324,
Light amplification by stimulated emission of 326, 344, 350, 352, 368–370, 375, 376,
radiation (laser), 354 611
908 Index

Magnetic flux, 272 Molecular orbitals (MOs), 691, 719, 720, 729,
Magnetic flux density, 148, 271, 612 734–797
Magnetic permeability, 271, 365 Molecular science, 3, 711, 723, 729, 740, 797
Magnetic quantum number, 140, 142 Molecular short axis, 777
Magnetic wave, 284, 286 Molecular states, 729
Magnetostatics, 271 Molecular systems, 720, 723
Magnitude relationship, 412, 693, 694, 797 Momentum, 4–6, 27, 55, 59, 60, 72–77, 79, 82,
Major axis, 291, 292 86, 91–107, 125, 132, 138, 147–150,
Mapping 375, 377, 810, 843, 844, 873, 895
inverse, 447 Momentum operators, 3, 29, 31, 33, 61, 77–91,
invertible, 447 108, 142, 145
non-invertible, 447 Monoclinic, 365
reversible, 447 Monomials, 812, 855, 857
Mathematical induction, 48, 80, 113, 114, 462, Moving coordinate systems, 669, 670, 672,
464, 466, 543, 556, 558, 570, 583 678, 831, 893
Matrix Multiple roots, 249, 461, 473, 474, 477, 478,
algebra, 43, 306, 367, 433, 443, 451, 474, 514, 515, 517, 521, 559
530, 547 Multiplication table, 623, 648, 688, 701
decomposition, 485, 488, 512, 561 Multiplicity, 468, 481, 486, 493, 498, 562, 565,
representation, 3, 31, 41–44, 86, 89, 90, 575, 576, 782
440, 442, 443, 450, 451, 454, 456, 468, Multiply connected, 194, 222
490, 494, 503–505, 519, 536–538, 540, Multivalued function, 248–265
553, 576, 579, 640, 643–646, 648, 650,
652, 657, 659, 661, 663, 689–691, 694,
697, 730, 738, 746, 748, 758, 764, 778, N
800, 808, 826, 860, 863, 881, 884 Nabla, 61
Matrix element Nano capacitor, 157
of electric dipole, 127, 129 Nano device, 160
of Hamiltonian, 131, 532, 740, 741 Naphthalene, 648, 658
Matrix equation, 504 Necessary and sufficient condition, 53, 190,
Matrix function, 885 193, 195, 206, 382, 383, 447, 448, 451,
Matter wave, 6, 7 453, 461, 528, 533, 556, 563, 622, 624,
Maximum number, 434 629, 825, 887, 891
Maxwell, J.C., 275 Negation, 187, 191
Maxwell’s equations, 269–293, 321, 365, 366 Negative helicity, 293
Mechanical system, 33, 375–378 Neumann conditions, 27, 327, 344, 394
Meromorphic, 225, 231 Newtonian equation, 31
Meta, 767 Nilpotent, 475, 485–488, 498, 553
Methane, 659, 729, 777–797 Nilpotent matrices, 90, 91, 473–478, 484,
Method of variation of constants, 594, 596, 614 487–494, 497, 499, 500, 509, 512, 553,
Metric, 181, 196, 199, 523 607
Metric space, 196, 199, 220, 524 Nodes, 325, 335–337, 376, 747, 748, 789
Minimal polynomial, 472, 473, 475, 499, 508, Non-commutative, 3, 27, 579
513–515, 517 Non-commutative group, 622, 639, 746
Minor axis, 291 Non-degenerate, 151–154, 163, 176, 575
Mirror symmetry Non-equilibrium phenomenon, 355
plane of, 642, 643, 646–648 Non-invertible mapping, 447
Mode density, 341–345 Non-magnetic substance, 280, 286, 308, 312
Modulus, 197, 213, 235, 836 Non-negative, 27, 34, 49, 52, 68, 69, 74, 93,
Molecular axis, 648, 756, 791, 795 146, 197, 214, 241, 242, 384, 385, 524,
Molecular long axis, 756, 777 532, 535, 547, 568, 615, 811, 819, 855
Molecular orbital energy, 735, 746 Non-negative operator, 72, 74, 532
Index 909

Non-relativistic approximation, 7, 11 473, 496, 501, 502, 513, 639, 710, 742,
Non-singular matrix, 433, 454, 455, 462, 463, 746, 755, 803, 806, 864
465, 475, 486, 497, 499, 508, 513, 570, case, 343
571, 588, 681, 685, 865, 875 harmonic oscillator, 31, 57, 126, 130–132,
Non-trivial, 381, 434 755
Non-trivial solution, 382, 383, 399, 408, 426, infinite potential well, 19
521 position coordinate, 33
Non-vanishing, 15, 16, 90, 91, 128, 131, 137, representation, 746, 761, 762, 783, 800
146, 223, 225, 228, 274, 296, 394, 591, system, 128–132, 163
654, 741, 745, 770, 785 One-parameter group, 865, 870, 891
Non-zero Open ball, 220
coefficient, 300 Open neighborhood, 196
determinant, 454 Operator, 31, 61, 130, 151, 270, 347, 379, 456,
eigenvalue, 498 478, 523, 547, 610, 647, 684, 729, 801
element, 87, 88 Operator method, 33–41, 132
vector, 67 Optical devices, 314, 319, 320, 337, 354, 355,
Norm, 24, 53, 524, 525, 529, 531, 539, 541, 363, 371
543, 554, 561, 566, 573, 581, 584, 763, Optical path, 327, 328
764 Optical process, 125, 136, 346
Normalization constant, 37, 45, 51, 112, 140, Optical transition, 125–151, 346, 347, 717, 720,
543 722, 741, 746, 754, 755, 757, 769, 770,
Normalized, 13, 18, 24, 25, 28, 34, 35, 37, 39, 774–776, 796, 797
47, 53, 54, 67, 71, 76, 86, 106, 110, 113, Optical waveguide, 329
115, 121, 122, 128, 132–134, 139, 140, Optics, 295, 339, 362
153, 154, 174, 175, 356, 543, 544, 565, Orbital angular momentum, 58, 77–107, 843,
726, 742, 743, 751, 753, 754, 768, 771, 844, 895
785, 795, 847, 854, 864 Orbital angular momentum quantum numbers,
Normalized eigenfunction, 112, 773 108, 120, 122, 142
Normalized eigenstate, 53 Orbital energy, 789, 795
Normalized eigenvectors, 37, 153, 290 Order of group, 648, 696
Normal operator, 547, 554–557, 561, 563, 668 Ordinate axis, 129
n-th roots, 248–251 Organic laser, 359–374
Nullity, 444 Organic materials, 361
Null-space, 443, 501 Orthogonal, 47–49, 94, 107, 175, 304, 344,
Number operator, 40 379, 427, 429, 524, 539, 544, 548, 551,
Numerical vector, 435 560, 561, 636, 655, 691, 710, 717, 719,
737, 745, 746, 751, 759, 774, 786, 871,
875, 884
O complements, 547–549, 559, 688, 691
O(3), 280, 642, 647, 654–663, 888–889 coordinate, 197, 365, 366, 817
Oblique coordinate, 639 group, 663–678, 871
Oblique incidence matrix, 569, 574, 590, 592, 614, 636, 638,
of wave, 295 654, 663, 664, 667, 672, 680, 736, 822,
Observable, 565, 566 831, 832, 837, 871, 875, 876, 881, 884
Octants, 656 transformation, 636, 663, 669, 878, 881
Odd functions, 18, 49, 132, 157 Orthonormal base, 8, 532, 540–543
Odd permutations, 449, 873 Orthonormal basis set, 547, 584, 720, 739, 740,
Off-diagonal elements, 450, 512, 560, 639, 640, 847, 878, 881
744, 745, 761, 787 Orthonormal basis vectors, 59, 60, 636, 650,
O(n), 875 716, 730, 745
One-dimensional, 33, 49, 128, 130, 151, 152, Orthonormal eigenfunctions, 40
157, 167, 174, 177, 196, 277, 357, 375, Orthonormal eigenvectors, 153, 559
910 Index

Orthonormal system, 107, 151, 155, 842 Physicochemical properties, 746, 754
Oscillating electromagnetic field, 127 Planar molecule, 650, 747, 757, 764
Out-coupling Planck, M., 3, 127, 339, 341–345, 349
of emission, 372 Planck’s law of radiation, 339, 341–345, 349
of light, 363, 364 Plane electromagnetic waves, 283
Over damping, 419 Plane figure, 643
Overlap integrals, 717, 740, 753, 754, 767, 774, Plane of incidence, 301
777, 785, 788, 789 Plane of mirror symmetry, 642, 647
Plastics, 286
Point group, 635, 638, 648, 659, 745, 746, 755,
P 778
Paper-and-pencil, 796 Polar coordinate, 29, 45, 58, 62, 146, 171, 197,
Para, 767 241, 806, 837
Parallelepiped, 355–357 Polar coordinate representation, 146, 256, 837
Parallelogram, 848 Polar form, 197, 237, 249
Parallel translation, 278 Polarizability, 163–171
Parameter space, 831–837, 892, 895 Polarization vector, 127, 134, 135, 138, 284,
Parity, 18, 49, 130 285, 303–306, 309, 315, 324, 331, 335,
Partial differentiation, 8–10, 283, 321 350, 352, 375, 741, 755, 796
Partial fraction decomposition Polyenes, 777
method of, 232, 245 Polymers, 269, 280
Partial sum, 159, 218 Polynomials, 38, 47–49, 57, 85, 91, 93–95,
Particular solutions, 169, 383 98–100, 104–107, 116–122, 163, 231,
Path, 194, 208–210, 212, 222, 327, 328, 357, 235, 264, 298, 299, 379, 427, 428, 460,
892 461, 471–473, 475, 477, 478, 481–483,
Pauli spin matrices, 810 486, 488, 489, 499, 508, 513–515, 517,
Periodic conditions, 27, 394 518, 520, 603, 782, 851, 852
Periodic function, 250, 884 Population inversion, 355
Periodic wave, 278 Portmanteau word, 190
Permeability, 148, 271, 306, 365 Position coordinate, 33
Permittivity, 58, 108, 270, 365, 569 Position operator, 29, 347, 354
Permittivity tensor, 365, 569 Position vector, 8, 59, 127, 129, 134, 149, 164,
Permutation, 383, 449, 641, 662, 671, 677, 873 275, 280, 298, 349, 351, 439, 540, 612,
Perturbation 636, 730, 735, 741, 748, 755, 800
method, 151–172, 179 Positive definite, 17, 26, 27, 34, 36, 287, 288,
Perturbed 532, 534, 568, 570, 874, 981
state, 164 Positive definiteness, 17, 19, 26, 27, 31, 34, 36,
Phase change 287, 288, 532, 534, 547, 568–570, 681,
upon reflection, 295, 330 874
Phase difference, 286, 327, 328, 375 Positive-definite operator, 26
Phase factor, 18, 40, 74, 77, 84, 120, 133, 141, Positive helicity, 293
284, 544 Positive semi-definite, 27, 532
Phase matching, 364, 365, 371 Power series expansion, 33, 108, 116, 117, 121,
Phase matching conditions, 365 122, 198, 216, 582, 591
Phase refractive index, 358, 361, 369 Poynting vector, 309, 311, 335
Phase shift, 317, 318, 329, 330 Primitive function, 208
Phase velocity, 7, 8, 278, 280, 326, 331 Principal axis, 366, 373
Photoabsorption, 132 Principal branch, 251, 254
Photon Principal diagonal, 494, 496
absorption, 132, 135, 138 Principal minors, 287, 533, 570, 573
emission, 132, 134, 138 Principal part, 224
energy, 355 Principal submatrix, 570
Photonics, ix Principal value
Index 911

of branch, 251 R
of integral, 236, 237 Radial coordinate, 107–115, 197, 806, 822, 837
of ln z, 258 Radial wave function, 107–115, 120–122, 167
Probability, 21, 32, 126, 127, 133, 138, 172, Radiation field, 127, 136, 138
346, 353, 565, 741, 777 Raising and lowering operators, 74, 90
density, 126, 127, 129, 139 Raising operator, 74, 90
distribution, 126 Rational number, 198
distribution density, 128, 129 Rayleigh–Jeans law, 339, 345
Projection operator Real analysis, 182, 216
sensu lato, 712 Real axis, 181, 197, 235, 242, 246, 247, 253,
sensu stricto, 714 254, 257–259
Proof by contradiction, 190 Real functions, 49, 128, 141, 142, 200, 319,
Propagating electromagnetic waves, 295 348, 392, 397, 401, 402, 404, 752, 757,
Propagation constant, 326, 331, 361, 363, 364 764
Propagation vector, 364 Real number line, 200, 201, 395
Propagation velocity, 7, 278, 343 Real space, 800, 803, 806, 808
Proper rotation, 640, 643, 654 Real symmetric matrix, 287, 348, 569, 570,
Proposition 572, 600, 603
converse, 591, 825 Real symmetric quadratic form, 547, 569
P6T, 363, 365, 366, 368–370, 373, 374 Real variable, 14, 25, 200, 206, 207, 219, 230,
Pure imaginary, 28, 29, 315, 331, 419, 870 389, 581
Pure rotation group, 654, 659 Rearrangement theorem, 623, 681, 695, 701,
704, 713
Rectangular parallelepiped, 355–357
Q Recurrence relation, 84
Q-numbers, 10 Redshifted, 4, 372–374
Quadratic forms, 53, 287, 288, 532, 533, 535, Reduced mass, 58, 59, 108, 133, 735
547, 568–575, 814, 851 Reduced Planck constant, 4, 127
Quantum chemical calculations, 711, 712, 729, Reducible, 471, 686, 699, 700, 702, 742, 746,
734 749, 754, 759, 765, 769, 825, 830, 843,
Quantum chemistry, 764 845, 847, 864
Quantum electromagnetism, 378 Reducible representation, 686, 699, 700, 742,
Quantum-mechanical, 3, 21, 30–52, 54, 55, 57, 746, 749, 759
107–109, 122, 151, 152, 160, 165, 532 Reduction, 559, 687
Quantum-mechanical harmonic oscillator, Reflectance, 311, 317, 337
31–52, 55, 107–109, 152, 160, 532 Reflected light, 300, 302
Quantum mechanics, 3, 6, 11, 12, 18, 27, 29, Reflected wave, 311, 313, 324, 327
31, 51, 57, 58, 151–179, 339, 354, 427, Reflections
523, 547, 673, 748, 872 angle, 300, 302, 313, 314, 317, 319, 329,
Quantum number 643, 645
azimuthal, 120, 121, 142, 147 coefficient, 299, 300, 306–308, 317, 325,
magnetic, 140, 142 326, 337, 344, 379
orbital angular momentum, 108, 120, 122, Refraction angles, 302, 308, 315
142, 895 Refractive index
principal, 120, 121, 142 of dielectric, 280, 295, 308, 313, 314, 321,
Quantum operator, 29 326, 327
Quantum state, 21, 32, 52, 74, 90, 111, 121, group, 358
125–127, 132, 142, 143, 145, 151–159, of laser medium, 356, 358
163, 164, 166, 172, 174, 354, 741, 847, phase, 295, 358, 361, 362, 369
849 relative, 303, 312, 314, 318, 371
Quantum theory of light, 356 Region, 148, 199, 204, 206–209, 211, 215,
Quartic equation, 145 219–223, 225–227, 263, 285, 286, 318,
Quotient group, 627
912 Index

319, 321, 331, 336, 339, 351, 416, 842, theory, 621, 664, 679–726, 729, 745, 799,
892 819, 843
Regular hexagon, 764 unitary, 679–681, 683, 687, 695, 714, 825,
Regular point, 206, 216, 228 830, 843
Regular representation, 700–707 Residues, 227–230, 232, 233, 237, 257
Regular singular point, 169 Resolvent, 581, 595–600, 602–607, 609, 610,
Regular tetrahedron, 661, 777 614, 617
Relations between roots and coefficients, 461 Resolvent matrix, 595–600, 602–607, 609, 610,
Relative coordinate, 57–59 614, 617
Relative permeability, 271 Resonance integral, 740, 752, 767
Relative permittivity, 271 Resonance structure, 688, 689, 757
Relative refractive index, 303, 312, 314, 318, Resonator, 337, 359, 361
371 Reversible mapping, 447
Relativistic quantum mechanics, 12 Riemann sheet, 252
Removable singularity, 224, 244 Riemann surface, 248–265
Representation Right-circular polarization, 136, 138, 293
antisymmetric, 723–726, 858, 860, 863, 864 Right-handed system, 284, 304, 309, 350, 376
complex conjugate, 523, 762, 816, 817 Rigid body, 664, 729
conjugate, 762, 816, 817 Rodrigues formula, 85, 93, 94, 99, 116, 264
coordinate, 3, 31, 44–51, 108, 112, 122, Rotation
127, 129, 132, 136, 146, 157, 160, 163, angles, 367, 639, 640, 642, 647, 663, 667,
165, 168, 170, 171, 175, 178, 256, 395, 800, 831, 833–836, 838, 843, 844, 889
396, 440, 574, 640, 644, 650, 652, 791, axis, 639–643, 648, 657, 659, 661,
800, 823, 837, 838 664–668, 674–676, 831, 832, 834, 838,
dimension of, 679, 680, 685, 687, 688, 691, 839, 844, 889, 895
695–697, 703, 711, 719, 723, 726, 746, groups, 622, 654, 659, 663, 799, 806, 815,
777, 800, 802, 830, 839, 846, 847 843, 890
direct-product, 631–633, 720–723, 725, improper, 643, 647, 663, 780, 889
740, 741, 745, 750, 769, 796, 831, matrix, 615, 639, 663–668, 820, 838, 889
843–846, 849, 859, 861, 863 proper, 640, 643, 654
of group, 679–726, 840 symmetry, 640, 642
irreducible, 686, 688, 691–694, 696–700, transformation, 367, 439, 451, 622, 635,
702, 703, 707–711, 715–717, 719, 720, 640, 663, 667, 672–678, 799, 816, 832,
738, 740–742, 745–751, 754–757, 759, 833, 837, 889, 895
761, 765, 769–771, 774, 775, 778, 781, Row vectors, 8, 383, 464, 519, 528, 541, 542,
783–786, 788, 789, 795–797, 823, 825, 702, 830
828, 830, 840–847, 859
matrix, 3, 31, 34, 54, 86, 161, 440–443, 450,
451, 454, 456, 468, 487, 536, 553, 661, S
680, 681, 685, 686, 690, 698, 699, 708, Scalar, 270, 276–278, 434, 435, 446, 473, 486,
714, 719, 739, 811, 816, 817, 822, 826, 526, 577, 729
830, 838, 839, 844, 864, 880, 884 Scalar function, 277, 278, 729
of point group, 746 Schrödinger, E., 3, 7, 11, 14
reducible, 686, 699, 700, 742, 749, 759 Schrödinger equation
regular, 700–707 of motion, 58
space, 687, 688, 691, 692, 714, 716, 719, time-evolved, 126
738, 745, 761, 764, 777, 780, 796, 799, Schur’s First Lemma, 692–694, 697, 783, 822,
800, 802, 803, 806, 808, 817, 839, 842, 839
843, 846, 847, 849, 855, 859, 860, 864, Schur’s lemmas, 692–697, 823
884 Schur’s Second Lemma, 694, 695, 708, 738,
subduced, 750, 759, 765 824, 825, 830
symmetric, 723–726, 734, 738, 741, 742, Second-order differential operators, 389–394
745, 751, 754, 755, 759, 769, 784, 796, Second-order linear differential equations
858, 860, 863, 864 (SOLDEs), 3, 14, 15, 25, 33, 44, 69–71,
Index 913

82, 368, 379–384, 388–390, 394, 402, Skew-symmetric matrix, 590, 592, 600, 807,
424, 428, 581, 592–594 867, 871, 872, 876, 891
Secular equation, 503, 729, 744–747, 751, 761, Slab waveguide, 320, 321, 323, 324, 326, 327,
766, 770, 771, 774, 786, 787, 795 331, 359
Selection rules, 125–149, 720 Solid-state chemistry, 374
Self-adjoint, 389, 391–394, 399, 401, 402, 405, Solid-state physics, 374
410, 426, 427, 530 Space group, 637
Self-adjoint matrix, 530 Span, 435–437, 443–446, 454, 468, 469, 474,
Self-adjoint operator, 391, 399, 402, 410, 426 481, 489, 490, 501, 513, 521, 548, 549,
Sellmeier’s dispersion formula, 359, 362 575, 577, 665, 738, 745, 761, 764, 780,
Semicircle, 230–232, 236, 238, 239, 241, 242, 782, 796, 800, 817, 839, 842, 843, 849,
246 855, 859, 860, 864, 872, 873, 884
Semiclassical, 125, 147 Special functions, 57, 107, 617
Semiclassical theory, 147 Special orthogonal group SO(3), 635, 663–678,
Semiconductors, 286, 337, 354, 359–362 799, 802, 806, 808, 809, 815, 816,
Semi-infinite dielectric media, 295, 296 822–843, 845, 847, 864, 866, 867, 873,
Semi-simple, 485–488, 509, 512 884, 888–889, 892, 895, 896
Separation axioms, 195, 196 Special orthogonal group SO(n), 871, 875
Separation conditions, 195 Special solution, 8
Separation of variables, 12, 67–72, 107, 125, Special unitary group SU(2), 799, 802, 806,
341 808, 810, 812, 815, 823, 825, 827, 830,
Sesquilinear, 526 831, 843–849, 852, 864, 866, 867, 873,
Set difference, 183 892, 895, 896
Set of points, 225, 226 Special unitary group SU(n), 808, 871, 872, 875
Set theory, 181, 182, 196, 199 Spectral decomposition, 563–565, 580, 668
Similarity transformation, 458, 459, 461–463, Spectral lines, 127, 357, 729
465, 466, 475, 477, 484–486, 496, 497, Spherical basis, 806, 820
505, 506, 508, 510, 513, 515, 530, 532, Spherical coordinate, 58, 60, 109, 806, 823
554, 556, 557, 559, 560, 563, 569, 578, Spherical surface harmonics, 86, 92–103, 107,
580, 588, 589, 600, 604, 609, 654, 668, 167, 799, 806, 817, 820, 823, 839, 842,
679, 681, 683, 685, 686, 689, 782, 801, 884
804, 805, 822, 839, 863, 875, 889 Spherical symmetry, 57, 58, 892
Simple harmonic motion, 32 Spin angular momentum, 77, 843, 844
Simple root, 501, 502, 518 Spin states, 121, 810
Simply connected, 194, 209–212, 215, 219, Spontaneous emission, 346, 347, 353, 355
221, 222, 227, 890–896 Spring constant, 31, 419
Simply reducible, 686, 845, 847, 864 Square-integrable, 25
Simultaneous eigenfunction, 68, 86 Square matrix, 433, 462–464, 468, 469, 471,
Simultaneous eigenstate, 66, 68, 73, 79, 86, 90, 475, 485, 486, 496, 582, 584, 588, 679,
574–580 685, 692, 693, 697, 700, 802, 828
Singleton, 196 Square roots, 249, 288, 367, 790
Single-valuedness, 206, 215, 251 Standard deviation, 52
Singularity, 208, 211, 224, 225, 228, 229, 240, Stark effect, 164, 172
244, 408 State vectors, 127, 131, 133, 153
Singular matrix, 555 Stationary current, 273, 275
Singular part, 224, 225 Stationary waves, 331–337, 343, 375
Singular point Stimulated absorption, 346, 355
isolated, 235 Stimulated emission, 346, 347, 354, 355
regular, 169, 206, 228 Stirling’s interpolation formula, 586
Sinusoidal oscillation, 127, 129, 133 Stokes’ theorem, 296
Skew-symmetric, 590, 592, 600, 807, 867, 871, Strictly positive, 17
872, 876, 891 Structure constants, 872, 895
914 Index

Structure–property relationship, 359 T


Strum Liouville system, 427 Taylor’s expansion, 217, 219, 223, 226
Subduced representation, 750, 759, 765 Taylor’s series, 96, 216–222, 224, 234, 237,
Subgroups, 621, 624–626, 630, 632, 648, 649, 243, 263
659, 662, 663, 706, 750, 757, 759, 764, Td, 647, 654, 659, 661, 662, 778, 781, 784, 797,
808, 866, 875, 883, 887, 888 890
Submatrix, 367, 559, 570, 782, 860 Tensor
Subsets, 182, 183, 185–193, 195, 196, 199, electric permittivity, 365
226, 435, 624, 625, 629, 871, 885–887, permittivity, 365, 569
891 Termwise integration, 218, 222
Subspaces, 185, 194, 199, 435, 436, 438, 443, Thiophene, 648, 701
445, 460, 468–474, 477, 478, 481, 490, Thiophene/phenylene co-oligomers (TPCOs),
501, 503, 504, 513, 514, 518, 521, 360, 362, 363
548–550, 559, 560, 577, 579, 624, 639, Three-dimensional, 57, 58, 132–142, 163, 167,
687, 688, 691, 780, 782, 871 273, 278, 280, 323, 342, 349, 436, 451,
Superior limit, 220 496, 501, 635, 638, 639, 659, 669, 731,
Superposed wave, 334, 335 733, 784, 785, 800, 803, 819, 864, 892
Superposition Three-dimensional Cartesian coordinate, 8, 650
of waves, 285–293 Time-averaged Poynting vectors, 309, 311
Surface integral, 295–297, 351 Time-evolved Schrödinger equation, 126
Surface term, 386–388, 390, 394, 397, 402, Topological groups, 190, 799, 802, 896
403, 407, 410, 414–418, 427, 594 Topological spaces, 181, 185–196, 199, 891,
Surjective, 447, 448, 452, 628, 629, 631, 881, 895, 896
895 Topology, 181–199
Surjective mapping, 631, 881 Torus, 194
Symmetric, 67, 131, 171, 175, 287, 348, 371, Total internal reflection, 321, 327–331, 372
399, 403, 547, 570–572, 600, 603, 604, Total reflection, 295, 303, 314–320, 323, 330
637, 638, 662, 723–726, 734, 738, 741, Totally symmetric, 741, 745, 797
742, 745, 751, 754, 755, 759, 769, 784, Totally symmetric ground state, 769, 796
796, 797, 834, 847, 858, 860, 863, 864 Totally symmetric representation, 734, 738,
Symmetric matrix, 131, 569–572, 604 741, 742, 745, 751, 754, 755, 759, 769,
Symmetric representation, 723–726, 734, 738, 784
741, 742, 745, 751, 754, 755, 759, 769, Trace, 247, 257, 259, 261, 262, 289, 292, 293,
784, 796, 860, 863, 864 461, 506, 568, 569, 654, 668, 672, 674,
Symmetry 690, 694, 697–699, 703, 704, 707, 708,
groups, 622, 637, 638, 641, 650, 662, 735, 746, 748, 757, 759, 765, 770, 822, 834,
738, 755 839, 871, 872, 875
operations, 635, 637, 638, 640, 642–654, Traceless, 810, 867, 871, 878
659, 661, 684, 690, 691, 702, 710, 729, Transformation
733, 737, 738, 745–747, 750, 751, 757, coordinate, 641
759, 761, 770, 778, 782, 783 group, 623, 635, 710, 799
requirement, 175, 646, 647, 785, 787–789 linear, 433, 438–440, 443, 444, 447, 448,
species, 650, 659, 719, 745, 757, 759, 774, 450–452, 454, 456, 458, 459, 468, 471,
796 474, 481, 490, 494, 501, 506, 519, 526,
transformations, 748, 759, 765, 780 533, 535, 537–541, 555, 556, 566, 623,
Symmetry-adapted linear combination (SALC), 629, 636, 663, 669, 677, 678, 683, 684,
691, 729, 745–747, 751, 757, 760, 761, 708, 723, 804, 825, 876, 878
764, 765, 767, 770, 771, 773, 774, 777, matrix, 441, 443, 451, 452, 454, 455, 459,
783–788, 860, 861 540, 782, 808, 812
Symmetry groups, 635–678 successive, 454, 456, 458, 669, 670, 672,
Syn-phase, 335, 337 817, 818, 833, 837
System of differential equations, 581, 592–600, Transition dipole moments, 127, 132
610, 617 Transition electric dipole moment, 127
System of linear equations, 450, 452
Index 915

Transition matrix elements, 131, 740, 755, 769, Unique solution, 403, 416, 419, 424, 447, 450,
775, 776 451
Transition probability, 127, 133, 138, 172, 346, Uniqueness
347, 741, 777 of decomposition, 486, 488, 512, 564
Translation symmetry, 637 of Green’s function, 401
Transmission of representation, 440, 450, 454, 632, 633
coefficient, 306–308 of solution, 18
of electromagnetic wave, 295–337 Unitarity, 683, 713, 714, 718, 740, 808, 816,
irradiance, 311 819, 823, 851, 869
of light, 297 Unitary
Transmittance, 311 diagonalization, 290, 556–564, 568, 573,
Transpose 667, 804
complex conjugate, 22, 537 group, 799, 870–872, 875
Transposed matrix, 22, 383, 471, 523, 537 matrix, 135, 290, 530, 533, 534, 541, 547,
Transverse electric (TE) mode, 304, 324, 326, 556–559, 561, 564–568, 573, 578, 590,
329 592, 601, 604, 667, 679–681, 687, 690,
Transverse electric (TE) wave, 295, 303–312, 699, 701, 714, 746, 763, 768, 774, 782,
314, 316, 320–331 804, 805, 808, 810–812, 814, 820, 821,
Transverse magnetic (TM) mode, 304, 368, 847, 850, 854, 863, 867, 875, 876
371, 372 operator, 547–580, 860, 869, 870
Transverse magnetic (TM) wave, 295, representation, 679–681, 683, 687, 695,
303–308, 310, 312–314, 316, 318, 714, 825, 830, 843
320–331 transformation, 135, 140, 541, 566, 740,
Transverse waves, 281, 282, 352 763, 764, 770, 771, 774, 812, 849, 851,
Trial function, 174, 175, 177 852, 854, 860, 863
Triangle inequality, 525 Unitary similarity transformation, 532, 554,
Triangle matrix 556, 557, 559, 560, 563, 578, 580, 600,
lower, 462, 512, 553, 606 609, 668, 689, 782, 801, 804, 805, 822,
upper, 462, 464, 476, 485, 496, 553, 606 863, 875, 889
Trigonometric formula, 101, 130, 309, 312, Unit polarization vector
313, 330, 842, 884 of electric field, 127, 284, 303, 350, 352,
Trigonometric functions, 103, 140, 241, 319, 755, 796
639, 815, 836 of magnetic field, 304, 352
Triple root, 474, 501, 503, 504, 759 Unit set, 196
Triply degenerate, 789, 795, 796 Unit vectors, 4, 129, 130, 273, 279, 280,
Trivial, 21, 26, 184, 192, 299, 381, 434, 437, 296–298, 303–305, 325, 327, 352, 364,
475, 480, 489, 555, 567, 570, 585, 607, 436, 540, 611, 803
622, 664, 667, 693, 741, 803, 819, 860, Universal covering group, 896
867 Universal set, 182–185
Trivial solution, 399, 401, 409, 426, 453, 460, Unknowns, 153, 284, 305, 596, 598, 600
522 Unperturbed
Trivial topology, 195 state, 153, 158
T1-space, 195–196, 896 system, 152, 153
Two-level atoms, 339, 345–349, 354, 355, 375 Upper right off-diagonal elements, 450, 462
Upper triangle matrix, 462, 464, 476, 485, 496,
553, 606
U
Ultraviolet catastrophe, 339, 345
Ultraviolet light, 363
Unbounded, 25, 29, 216 V
Uncertainty principle, 29, 51–55 Variable transformation, 46, 47, 78, 140, 160,
Uncountable, 182, 621 234, 278
Underlying set, 185 Variance, 51–55
Undetermined constants, 74, 75, 84 Variance operator, 51
Uniformly convergent, 216, 218–222 Variational method, 151, 172–179
916 Index

Variational principle, 36, 110 Wigner formula, 814


Vector Wronskian, 15, 298, 381, 383, 409, 412
transformation, 182, 433–454, 459, 487,
505, 538, 635, 678, 803
Vector analysis, 269, 276, 281 X
Vector space X-ray, 4–6
complex, 136
finite-dimensional, 433, 435, 448, 449, 748
linear, 433, 438, 440, 444, 445, 448, 459, Y
518, 523, 524, 559, 575, 581, 583, 624, Yukawa potential, 107
627, 691, 872
n-dimensional, 433, 434, 437, 440, 448
Venn diagrams, 182–184, 186 Z
Vertical incidence, 303, 305, 312 Zenithal angle, 70, 674, 831, 839, 895
Zero matrix, 27, 469, 471, 475, 476, 487–489,
512, 562, 585, 587, 868, 891
W Zeros, 5, 48, 69, 132, 176, 204, 270, 318, 340,
Wave equations, 8, 269, 279, 284, 322, 324, 408, 434, 460, 532, 553, 607, 624, 639,
341, 342 695, 743, 806
Wave front, 328
Waveguide, 295, 319–331, 359, 371, 373
Wavelength dispersion, 358–362, 373 Δ
Wavelengths, 4–6, 149, 302, 303, 310, 334, δ functions, 396, 397, 399, 402, 422
336, 337, 357–359, 364, 372, 374, 611
Wavenumber, 4–6, 278, 297, 298, 301, 315,
321, 323, 326, 328, 357, 364 Θ
Wavenumber vectors, 4, 5, 279, 297, 298, 301, θ function, 422
315, 321, 323, 355, 364
Wavevector, 363, 364, 372, 373
Wave zone, 350–352 Π
Weak damping, 420 π-electron approximation, 767, 777
Weight function, 49, 384, 391, 393, 395, 405,
410, 422, 426–428, 593, 594

You might also like