
Shu Hotta

Mathematical Physical Chemistry
Practical and Intuitive Methodology

Second Edition

Takatsuki, Osaka, Japan

ISBN 978-981-15-2224-6 ISBN 978-981-15-2225-3 (eBook)


https://doi.org/10.1007/978-981-15-2225-3

© Springer Nature Singapore Pte Ltd. 2020


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the
material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology
now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
To my wife Kazue
and
To the memory of Roro
Preface to the Second Edition

This book is the second edition of Mathematical Physical Chemistry. Mathematics is
a common language of natural science including physics, chemistry, and biology.
Although the words mathematical physics and physical chemistry (or chemical
physics) are commonly used, mathematical physical chemistry sounds rather uncom-
mon. Therefore, it might well be reworded as the mathematical physics for chemists.
The book title could have been, for instance, “The Mathematics of Physics and
Chemistry” accordingly, in tribute to the famous book that was written three-quarters
of a century ago by H. Margenau and G. M. Murphy. Yet, the word mathematical
physical chemistry is expected to be granted citizenship, considering that chemistry
and related interdisciplinary fields such as materials science and molecular science
are becoming increasingly mathematical.
The main concept and main theme remain unchanged, but this second edition
contains the theory of analytic functions and the theory of continuous groups. Both
theories are counted among the most elegant theories of mathematics. The
mathematics of these topics is of a somewhat advanced level and something like a
“sufficient condition” for chemists, whereas that of the first edition may be a
prerequisite (or a necessary condition) for them. Therefore, chemists (and maybe
physicists as well) can creatively use the two editions. In association with these
major additions to the second edition, the author has placed the mathematical
topics (the theory of analytic functions, Green’s functions, exponential functions of
matrices, and the theory of continuous groups) in the last chapter of the individual
parts (Part I through Part IV).
At the same time, the author has also made several specific revisions including the
introductory discussion on the perturbation method and variational method, both of
which can be effectively used for gaining approximate solutions of various quantum-
mechanical problems. As another topic, the author has presented the recent progress
on organic lasers. This topic is expected to help develop high-performance light-
emitting devices, one of the important fields of materials science. As in the case of
the first edition, readers benefit from going freely back and forth across the whole
range of topics covered in this book.


Once again, the author wishes to thank many students for valuable discussions
and Dr. Shin’ichi Koizumi at Springer for giving him an opportunity to write
this book.

Takatsuki, Japan Shu Hotta


October 2019
Preface to the First Edition

The contents of this book are based upon manuscripts prepared by the author for
two undergraduate courses of Kyoto Institute of Technology entitled “Polymer
Nanomaterials Engineering” and “Photonics Physical Chemistry” and for a master’s
course lecture of Kyoto Institute of Technology entitled “Solid-State Polymers
Engineering.”
This book is intended for graduate and undergraduate students, especially those
who major in chemistry and, at the same time, wish to study mathematical physics.
Readers are supposed to have a basic knowledge of analysis and linear algebra.
However, they are not supposed to be familiar with the theory of analytic functions
(i.e., complex analysis), even though it is desirable to have relevant knowledge
about it.
At the beginning, mathematical physics looks daunting to chemists, as used to be
the case with myself as a chemist. The book introduces the basic concepts of
mathematical physics to chemists. Unlike other books related to mathematical
physics, this book makes a reasonable selection of material so that students majoring
in chemistry can readily understand the contents. In particular, we
stress the importance of practical and intuitive methodology. We also expect engi-
neers and physicists to benefit from reading this book.
In Part I and Part II, the book describes quantum mechanics and electromagne-
tism. Relevance between the two is well considered. Although quantum mechanics
covers the broad field of modern physics, in Part I we focus on a harmonic oscillator
and a hydrogen (like) atom. This is because we can study and deal with many of
fundamental concepts of quantum mechanics within these restricted topics. More-
over, knowledge acquired from the study of the topics can readily be extended to
practical investigation of, e.g., electronic states and vibration (or vibronic) states of
molecular systems. We describe these topics by both analytic method (that uses
differential equations) and operator approach (using matrix calculations). We believe
that the basic concepts of quantum mechanics can be best understood by contrasting
the analytical and algebraic approaches. For this reason, we give matrix representa-
tions of physical quantities whenever possible. Examples include energy


eigenvalues of a quantum-mechanical harmonic oscillator and angular momenta of a
hydrogen-like atom. At the same time, these two physical systems supply us with a
good opportunity to study classical polynomials, e.g., Hermite polynomials, (asso-
ciated) Legendre polynomials, Laguerre polynomials, Gegenbauer polynomials, and
special functions, more generally. These topics constitute one of the important
branches of mathematical physics. One of the basic concepts of quantum mechanics
is that a physical quantity is represented by a Hermitian operator or matrix. In this
respect, the algebraic approach gives a good opportunity to get familiar with this
concept. We present tangible examples for this. We also emphasize the importance
of the notion of Hermiticity of a differential operator. We often encounter a unitary
operator or unitary transformation alongside the notion of Hermitian operators. We
show several examples of unitary operators in connection with transformation of
vectors and coordinates.
Part II describes Maxwell equations and their applications to various phenomena
of electromagnetic waves. These include their propagation, reflection, and transmis-
sion in dielectric media. We restrict ourselves to treating those phenomena in
dielectrics without charge. Yet, we cover a wide range of important topics. In
particular, when two (or more) dielectrics are in contact with each other at a plane
interface, reflection and transmission of light are characterized by various important
parameters such as reflection and transmission coefficients, Brewster angles, and
critical angles. We should have a proper understanding not only from the point of
view of basic study but also to make use of relevant knowledge in optical device
applications such as a waveguide. In contrast to a concept of electromagnetic waves,
light possesses a characteristic of light quanta. We present semiclassical and statis-
tical approaches to blackbody radiation occurring in a simplified system in relation
to Part I. The physical processes are well characterized by a notion of two-level
atoms. In this context, we outline the dipole radiation within the framework of the
classical theory. We briefly describe how the optical processes occurring in a
confined dielectric medium are related to a laser that is of great importance in
fundamental science and its applications. Many of the basic equations of physics are
described as second-order linear differential equations (SOLDEs). Different methods
were developed and proposed to seek their solutions. One of the most important
methods is that of Green’s functions. We present the introductory theory of Green’s
functions accordingly. In this connection, we rethink the Hermiticity of a differential
operator.
In Part III and Part IV, we describe algebraic structures of mathematical physics.
Their understanding is useful to studies of quantum mechanics and electromagne-
tism whose topics are presented in Part I and Part II. Part III deals with theories of
linear vector spaces. We focus on the discussion of vectors and their transformations
in finite-dimensional vector spaces. Generally, we consider the vector transforma-
tions among the vector spaces of different dimensions. In this book, however, we
restrict ourselves to the case of the transformation between the vector spaces of same
dimension, i.e., endomorphism of the space (V_n → V_n). This is not only because this
is most often the case with many of physical applications, but because the relevant
operator is represented by a square matrix. Canonical forms of square matrices hold

an important position in algebra. These include a triangular matrix and a diagonalizable
matrix as well as a nilpotent matrix and an idempotent matrix. The most general form
will be Jordan canonical form. We present its essential parts in detail taking a
tangible example. Next to the general discussion, we deal with an inner product
space. Once an inner product is defined between any couple of vectors, the vector
space is given a fruitful structure. An example is a norm (i.e., “length”) of a vector.
Also, we gain a clear relationship between Part III and Part I. We define various
operators or matrices that are important in physical applications. Examples include
normal operators (or matrices) such as Hermitian operators, projection operators, and
unitary operators. Once again, we emphasize the importance of Hermitian operators.
In particular, two commutable Hermitian matrices share simultaneous eigenvectors
(or eigenstates) and, in this respect, such two matrices occupy a special position in
quantum mechanics.
Finally, Part IV describes the essence of group theory and its chemical applica-
tions. Group theory has a broad range of applications in solid-state physics, solid-
state chemistry, molecular science, etc. Nonetheless, the knowledge of group theory
does not seem to have fully prevailed among chemists. We can discover an adequate
reason for this in a preface to the first edition of Chemical Applications of Group
Theory written by F. A. Cotton. It might well be natural that definition and statement
of abstract algebra, especially group theory, sound somewhat pretentious for chem-
ists, even though the definition of group is quite simple. Therefore, we present
various examples for readers to get used to notions of group theory. The notion of
mapping is important as in the case of the linear vector spaces. Aside from the
calculation being additive for a vector space and multiplicative for a group, the
fundamental rules of calculation are pretty much the same for the vector space and
the group. We describe the characteristics of symmetry groups in detail
partly because related knowledge is useful for molecular orbital (MO) calculations
that are presented in the last section of the book. Representation theory is probably
one of the most daunting notions for chemists. Practically, however, the representa-
tion is just homomorphism that corresponds to a linear transformation in a vector
space. In this context, the representation is merely denoted by a number or a matrix.
Basis functions of representation correspond to basis vectors in a vector space.
Grand orthogonality theorem (GOT) is a “nursery bed” of the representation theory.
Therefore, readers are encouraged to understand its essence apart from the rigorous
proof of the theorem. In conjunction with Part III, we present a variety of projection
operators. These are very useful to practical applications in, e.g., quantum mechanics
and molecular science. The final parts of the book are devoted to applications of
group theory to problems of physical chemistry, especially those of quantum
chemistry, more specifically molecular orbital calculations. We see how symmetry
consideration, particularly the use of projection operators, saves us a lot of labor.
Examples include aromatic hydrocarbons and methane.
The previous sections sum up the contents of this book. Readers may start with
any part and go freely back and forth. This is because contents of many parts are
interrelated. For example, we emphasize the importance of Hermiticity of differen-
tial operators and matrices. Also projection operators and nilpotent matrices appear

in many parts along with their tangible applications to individual topics. Hence,
readers are recommended to carefully examine and compare the related contents
throughout the book. We believe that readers, especially chemists, benefit from the
writing style of this book, since it is suited to chemists who are good at intuitive
understanding.
The author would like to thank many students for their valuable suggestions and
discussions at the lectures. The author also wishes to thank Dr. Shin’ichi Koizumi
at Springer for giving him an opportunity to write this book.

Kyoto, Japan Shu Hotta


October 2017
Contents

Part I Quantum Mechanics


1 Schrödinger Equation and Its Application . . . . . . . . . . . . . . . . . . . 3
1.1 Early-Stage Quantum Theory . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Schrödinger Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3 Simple Applications of Schrödinger Equation . . . . . . . . . . . . . . 14
1.4 Quantum-Mechanical Operators and Matrices . . . . . . . . . . . . . . 21
1.5 Commutator and Canonical Commutation Relation . . . . . . . . . . 27
Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2 Quantum-Mechanical Harmonic Oscillator . . . . . . . . . . . . . . . . . . . 31
2.1 Classical Harmonic Oscillator . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.2 Formulation Based on an Operator Method . . . . . . . . . . . . . . . . 33
2.3 Matrix Representation of Physical Quantities . . . . . . . . . . . . . . 41
2.4 Coordinate Representation of Schrödinger Equation . . . . . . . . . 44
2.5 Variance and Uncertainty Principle . . . . . . . . . . . . . . . . . . . . . 51
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3 Hydrogen-Like Atoms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.1 Introductory Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.2 Constitution of Hamiltonian . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.3 Separation of Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
3.4 Generalized Angular Momentum . . . . . . . . . . . . . . . . . . . . . . . 72
3.5 Orbital Angular Momentum: Operator Approach . . . . . . . . . . . . 77
3.6 Orbital Angular Momentum: Analytic Approach . . . . . . . . . . . . 91
3.6.1 Spherical Surface Harmonics and Associated
Legendre Differential Equation . . . . . . . . . . . . . . . . . . 92
3.6.2 Orthogonality of Associated Legendre Functions . . . . . 103
3.7 Radial Wave Functions of Hydrogen-Like Atoms . . . . . . . . . . . 107
3.7.1 Operator Approach to Radial Wave Functions . . . . . . . 107
3.7.2 Normalization of Radial Wave Functions . . . . . . . . . . . 112
3.7.3 Associated Laguerre Polynomials . . . . . . . . . . . . . . . . 116

3.8 Total Wave Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122


References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
4 Optical Transition and Selection Rules . . . . . . . . . . . . . . . . . . . . . . 125
4.1 Electric Dipole Transition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
4.2 One-Dimensional System . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
4.3 Three-Dimensional System . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
4.4 Selection Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
4.5 Angular Momentum of Radiation . . . . . . . . . . . . . . . . . . . . . . . 147
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
5 Approximation Methods of Quantum Mechanics . . . . . . . . . . . . . . 151
5.1 Perturbation Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
5.1.1 Quantum State and Energy Level Shift Caused by
Perturbation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
5.1.2 Several Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
5.2 Variational Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
6 Theory of Analytic Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
6.1 Set and Topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
6.1.1 Basic Notions and Notations . . . . . . . . . . . . . . . . . . . . 182
6.1.2 Topological Spaces and Their Building Blocks . . . . . . . 185
6.1.3 T1-Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
6.1.4 Complex Numbers and Complex Plane . . . . . . . . . . . . 197
6.2 Analytic Functions of a Complex Variable . . . . . . . . . . . . . . . . 199
6.3 Integration of Analytic Functions: Cauchy’s Integral Formula . . . . 207
6.4 Taylor’s Series and Laurent’s Series . . . . . . . . . . . . . . . . . . . . . 216
6.5 Zeros and Singular Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
6.6 Analytic Continuation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
6.7 Calculus of Residues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
6.8 Examples of Real Definite Integrals . . . . . . . . . . . . . . . . . . . . . 230
6.9 Multivalued Functions and Riemann Surfaces . . . . . . . . . . . . . . 248
6.9.1 Brief Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248
6.9.2 Examples of Multivalued Functions . . . . . . . . . . . . . . . 256
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265

Part II Electromagnetism
7 Maxwell’s Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
7.1 Maxwell’s Equations and Their Characteristics . . . . . . . . . . . . . 269
7.2 Equation of Wave Motion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 276
7.3 Polarized Characteristics of Electromagnetic Waves . . . . . . . . . 280
7.4 Superposition of Two Electromagnetic Waves . . . . . . . . . . . . . 285
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293

8 Reflection and Transmission of Electromagnetic Waves


in Dielectric Media . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
8.1 Electromagnetic Fields at an Interface . . . . . . . . . . . . . . . . . . . 295
8.2 Basic Concepts Underlying Phenomena . . . . . . . . . . . . . . . . . . 297
8.3 Transverse Electric (TE) Waves and Transverse
Magnetic (TM) Waves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
8.4 Energy Transport by Electromagnetic Waves . . . . . . . . . . . . . . 308
8.5 Brewster Angles and Critical Angles . . . . . . . . . . . . . . . . . . . . 312
8.6 Total Reflection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315
8.7 Waveguide Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319
8.7.1 TE and TM Waves in a Waveguide . . . . . . . . . . . . . . . 320
8.7.2 Total Internal Reflection and Evanescent Waves . . . . . . 327
8.8 Stationary Waves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337
9 Light Quanta: Radiation and Absorption . . . . . . . . . . . . . . . . . . . . 339
9.1 Blackbody Radiation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339
9.2 Planck’s Law of Radiation and Mode Density of
Electromagnetic Waves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 341
9.3 Two-Level Atoms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345
9.4 Dipole Radiation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349
9.5 Lasers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 354
9.5.1 Brief Outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 354
9.5.2 Organic Lasers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 359
9.6 Mechanical System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 375
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 378
10 Introductory Green’s Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . 379
10.1 Second-Order Linear Differential Equations (SOLDEs) . . . . . . . 379
10.2 First-Order Linear Differential Equations (FOLDEs) . . . . . . . . . 384
10.3 Second-Order Differential Operators . . . . . . . . . . . . . . . . . . . . 389
10.4 Green’s Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 394
10.5 Construction of Green’s Functions . . . . . . . . . . . . . . . . . . . . . . 401
10.6 Initial Value Problems (IVPs) . . . . . . . . . . . . . . . . . . . . . . . . . 408
10.6.1 General Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . 408
10.6.2 Green’s Functions for IVPs . . . . . . . . . . . . . . . . . . . . . 411
10.6.3 Estimation of Surface Terms . . . . . . . . . . . . . . . . . . . . 414
10.6.4 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 418
10.7 Eigenvalue Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 425
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 430

Part III Linear Vector Spaces


11 Vectors and Their Transformation . . . . . . . . . . . . . . . . . . . . . . . . . 433
11.1 Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 433
11.2 Linear Transformations of Vectors . . . . . . . . . . . . . . . . . . . . . . 438
11.3 Inverse Matrices and Determinants . . . . . . . . . . . . . . . . . . . . . . 448

11.4 Basis Vectors and Their Transformations . . . . . . . . . . . . . . . . . 452


Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 458
12 Canonical Forms of Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 459
12.1 Eigenvalues and Eigenvectors . . . . . . . . . . . . . . . . . . . . . . . . . 459
12.2 Eigenspaces and Invariant Subspaces . . . . . . . . . . . . . . . . . . . . 468
12.3 Generalized Eigenvectors and Nilpotent Matrices . . . . . . . . . . . 473
12.4 Idempotent Matrices and Generalized Eigenspaces . . . . . . . . . . 478
12.5 Decomposition of Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 485
12.6 Jordan Canonical Form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 488
12.6.1 Canonical Form of Nilpotent Matrix . . . . . . . . . . . . . . 488
12.6.2 Jordan Blocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 493
12.6.3 Example of Jordan Canonical Form . . . . . . . . . . . . . . . 501
12.7 Diagonalizable Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 512
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 522
13 Inner Product Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 523
13.1 Inner Product and Metric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 523
13.2 Gram Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 526
13.3 Adjoint Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 535
13.4 Orthonormal Basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 541
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 545
14 Hermitian Operators and Unitary Operators . . . . . . . . . . . . . . . . . 547
14.1 Projection Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 547
14.2 Normal Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 554
14.3 Unitary Diagonalization of Matrices . . . . . . . . . . . . . . . . . . . . . 556
14.4 Hermitian Matrices and Unitary Matrices . . . . . . . . . . . . . . . . . 564
14.5 Hermitian Quadratic Forms . . . . . . . . . . . . . . . . . . . . . . . . . . . 568
14.6 Simultaneous Eigenstates and Diagonalization . . . . . . . . . . . . . 574
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 580
15 Exponential Functions of Matrices . . . . . . . . . . . . . . . . . . . . . . . . . 581
15.1 Functions of Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 581
15.2 Exponential Functions of Matrices and Their
Manipulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 585
15.3 System of Differential Equations . . . . . . . . . . . . . . . . . . . . . . . 592
15.3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 592
15.3.2 System of Differential Equations in a Matrix
Form: Resolvent Matrix . . . . . . . . . . . . . . . . . . . . . . . 595
15.3.3 Several Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . 600
15.4 Motion of a Charged Particle in Polarized Electromagnetic Wave 611
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 617

Part IV Group Theory and Its Chemical Applications


16 Introductory Group Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 621
16.1 Definition of Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 621
16.2 Subgroups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 624
16.3 Classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 625
16.4 Isomorphism and Homomorphism . . . . . . . . . . . . . . . . . . . . . . 627
16.5 Direct-Product Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 631
Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 633
17 Symmetry Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 635
17.1 A Variety of Symmetry Operations . . . . . . . . . . . . . . . . . . . . . 635
17.2 Successive Symmetry Operations . . . . . . . . . . . . . . . . . . . . . . . 643
17.3 O and Td Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 654
17.4 Special Orthogonal Group SO(3) . . . . . . . . . . . . . . . . . . . . . . . 663
17.4.1 Rotation Axis and Rotation Matrix . . . . . . . . . . . . . . . 664
17.4.2 Euler Angles and Related Topics . . . . . . . . . . . . . . . . . 669
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 678
18 Representation Theory of Groups . . . . . . . . . . . . . . . . . . . . . . . . . . 679
18.1 Definition of Representation . . . . . . . . . . . . . . . . . . . . . . . . . . 679
18.2 Basis Functions of Representation . . . . . . . . . . . . . . . . . . . . . . 683
18.3 Schur’s Lemmas and Grand Orthogonality Theorem (GOT) . . . . 692
18.4 Characters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 697
18.5 Regular Representation and Group Algebra . . . . . . . . . . . . . . . 700
18.6 Classes and Irreducible Representations . . . . . . . . . . . . . . . . . . 707
18.7 Projection Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 710
18.8 Direct-Product Representation . . . . . . . . . . . . . . . . . . . . . . . . . 720
18.9 Symmetric Representation and Antisymmetric
Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 723
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 727
19 Applications of Group Theory to Physical Chemistry . . . . . . . . . . . 729
19.1 Transformation of Functions . . . . . . . . . . . . . . . . . . . . . . . . . . 729
19.2 Method of Molecular Orbitals (MOs) . . . . . . . . . . . . . . . . . . . . 734
19.3 Calculation Procedures of Molecular Orbitals (MOs) . . . . . . . . . 742
19.4 MO Calculations Based on π-Electron Approximation . . . . . . . . 747
19.4.1 Ethylene . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 747
19.4.2 Cyclopropenyl Radical . . . . . . . . . . . . . . . . . . . . . . . . 757
19.4.3 Benzene . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 764
19.4.4 Allyl Radical . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 770
19.5 MO Calculations of Methane . . . . . . . . . . . . . . . . . . . . . . . . . . 777
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 798
20 Theory of Continuous Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 799
20.1 Introduction: Operators of Rotation and Infinitesimal
Rotation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 799

20.2 Rotation Groups: SU(2) and SO(3) . . . . . . . . . . . . . . . . . . . . . . 806


20.2.1 Construction of SU(2) Matrices . . . . . . . . . . . . . . . . . . 808
20.2.2 SU(2) Representation Matrices: Wigner Formula . . . . . 812
20.2.3 SO(3) Representation Matrices and Spherical
Surface Harmonics . . . . . . . . . . . . . . . . . . . . . . . . . . . 814
20.2.4 Irreducible Representations of SU(2) and SO(3) . . . . . . 823
20.2.5 Parameter Space of SO(3) . . . . . . . . . . . . . . . . . . . . . . 831
20.2.6 Irreducible Characters of SO(3) and Their Orthogonality . . . 837
20.3 Clebsch–Gordan Coefficients of Rotation Groups . . . . . . . . . . . 843
20.3.1 Direct-Product of SU(2) and Clebsch–Gordan
Coefficients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 843
20.3.2 Calculation Procedures of Clebsch–Gordan Coefficients . . . 849
20.3.3 Examples of Calculation of Clebsch–Gordan
Coefficients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 858
20.4 Lie Groups and Lie Algebras . . . . . . . . . . . . . . . . . . . . . . . . . . 864
20.4.1 Definition of Lie Groups and Lie Algebras:
One-Parameter Groups . . . . . . . . . . . . . . . . . . . . . . . . 865
20.4.2 Properties of Lie Algebras . . . . . . . . . . . . . . . . . . . . . 868
20.4.3 Adjoint Representation of Lie Groups . . . . . . . . . . . . . 874
20.5 Connectedness of Lie Groups . . . . . . . . . . . . . . . . . . . . . . . . . 885
20.5.1 Several Definitions and Examples . . . . . . . . . . . . . . . . 885
20.5.2 O(3) and SO(3) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 888
20.5.3 Simply Connected Lie Groups: Local Properties
and Global Properties . . . . . . . . . . . . . . . . . . . . . . . . . 890
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 896

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 899
Part I
Quantum Mechanics

Quantum mechanics is clearly distinguished from classical physics, whose major
pillars are Newtonian mechanics and the electromagnetism established by Maxwell.
Quantum mechanics was first established as a theory of atomic physics that handled
the microscopic world. Later on, quantum mechanics was applied to the macroscopic
world, i.e., the cosmos. The question of how exactly quantum mechanics describes the
natural world and of how far the theory can go remains problematic and is in dispute
to this day.
Such an ultimate question is irrelevant to this monograph. Our major aim is to
study a standard approach to applying the Schrödinger equation to selected topics. The
topics include a particle confined within a potential well, a harmonic oscillator, and a
hydrogen-like atom. Our major task rests on solving the eigenvalue problems of these
topics. To this end, we describe both an analytical method and an algebraic (or operator)
method. Focusing on these topics, we will be able to acquire various methods to
tackle a wide range of quantum-mechanical problems. These problems are usually
posed as an analytical equation (i.e., a differential equation) or an algebraic equation.
A Hamiltonian is constructed analytically or algebraically accordingly. Besides the
Hamiltonian, physical quantities are expressed as a differential operator or a matrix
operator. In both the analytical and algebraic approaches, the Hermitian property
(or Hermiticity) of an operator and matrix is of crucial importance. This feature
will, therefore, be highlighted not only in this part but also throughout this book,
along with unitary operators and matrices.
Optical transitions and the associated selection rules are dealt with in relation to the
above topics. Those subjects are closely related to the electromagnetic phenomena
that are considered in Part II.
Unlike the eigenvalue problems of the abovementioned topics, it is difficult to get
exact analytical solutions in most cases of quantum-mechanical problems. For this
reason, we need appropriate methods to obtain approximate solutions with respect to
various problems including the eigenvalue problems. In this context, we deal with
approximation techniques of a perturbation method and variational method.

In the last part, we study the theory of analytic functions, one of the most elegant
theories of mathematics. This approach not only helps cultivate a broad view of pure
mathematics, but also leads to the acquisition of practical methodology. The last part
deals with introductory set theory and topology as well.
Chapter 1
Schrödinger Equation and Its Application

Quantum mechanics is an indispensable research tool of modern natural science that
covers cosmology, atomic physics, molecular science, materials science, and so
forth. The basic concept underlying quantum mechanics rests upon Schrödinger
equation. The Schrödinger equation is described as a second-order linear differential
equation (SOLDE). The equation is analytically solved accordingly. Alternatively,
equations of the quantum mechanics are often described in terms of operators and
matrices and physical quantities are represented by those operators and matrices.
Normally, they are noncommutative. In particular, the quantum-mechanical formal-
ism requires the canonical commutation relation between position and momentum
operators. One of the great characteristics of quantum mechanics is that physical
quantities must be Hermitian. This aspect is deeply related to the requirement that
these quantities should be described by real numbers. We deal with the Hermiticity
from both an analytical point of view (or coordinate representation) relevant to the
differential equations and an algebraic viewpoint (or matrix representation) associ-
ated with the operators and matrices. Including these topics, we briefly survey the
origin of Schrödinger equation and consider its implications. To get acquainted
with the quantum-mechanical formalism, we deal with simple examples of the
Schrödinger equation.

1.1 Early-Stage Quantum Theory

The Schrödinger equation is a direct consequence of the discovery of quanta. It
stemmed from the hypothesis of energy quanta propounded by Max Planck (1900).
This hypothesis was further followed by the photon (light quantum) hypothesis
propounded by Albert Einstein (1905). He claimed that light is an aggregation of
light quanta and that individual quanta carry an energy E expressed as the Planck
constant h multiplied by the frequency of light ν, i.e.,


[Figure: (a) incident X-ray (ħk) on an electron at rest, scattered X-ray (ħk′) at angle θ, recoiled electron (p); (b) the momentum triangle ħk = ħk′ + p]
Fig. 1.1 Scattering of an X-ray beam by an electron. (a) θ denotes a scattering angle of the X-ray
beam. (b) Conservation of momentum

E = hν = ħω,    (1.1)

where ħ ≡ h/2π and ω = 2πν. The quantity ω is called the angular frequency, with ν
being the frequency. The quantity ħ is said to be the reduced Planck constant.
Also Einstein (1917) concluded that the momentum of a light quantum p is identical
to the energy of the light quantum divided by the light velocity in vacuum c. That is,
we have

p = E/c = ħω/c = ħk,    (1.2)

where k ≡ 2π/λ (λ is the wavelength of light in vacuum) and k is called the wavenumber.


Using vector notation, we have

p = ħk,    (1.3)

where k ≡ (2π/λ)n (n: a unit vector in the direction of propagation of light) is said to be the
wavenumber vector.
Meanwhile, Arthur Compton (1923) conducted various experiments in which he
investigated how an incident X-ray beam was scattered by matter (e.g., graphite,
copper). As a result, Compton found a systematic redshift in the X-ray wavelengths
as a function of the scattering angle of the X-ray beam (the Compton effect).
Moreover, he found that the shift in wavelength depended only on the scattering
angle, regardless of the material of the scatterer. The results can be summarized
in a simple equation described as

Δλ = [h/(m_e c)](1 − cos θ),    (1.4)

where Δλ denotes the shift in wavelength of the scattered beam; m_e is the rest mass of an
electron; θ is the scattering angle of the X-ray beam (see Fig. 1.1). The quantity h/(m_e c)
has a dimension of length and is denoted by λ_e. That is,

λ_e ≡ h/(m_e c).    (1.5)

In other words, λ_e is equal to the shift in the wavelength of the scattered
beam obtained when θ = π/2. The quantity λ_e is called the electron
Compton wavelength and has an approximate value of 2.426 × 10⁻¹² m.
Let us derive (1.4) on the basis of conservation of energy and momentum. To this
end, in Fig. 1.1 we assume that an electron is originally at rest. An X-ray beam is
incident on the electron. Then the X-ray is scattered and the electron recoils as shown.
The energy conservation reads as

ħω + m_e c² = ħω′ + √(p²c² + m_e²c⁴),    (1.6)

where ω and ω′ are initial and final angular frequencies of the X-ray; the second term
of the RHS is the energy of the electron, in which p is the magnitude of its momentum after
recoil. Meanwhile, conservation of the momentum as a vector quantity reads as

ħk = ħk′ + p,    (1.7)

where k and k′ are the wavenumber vectors of the X-ray before and after being scattered;
p is the momentum of the electron after recoil. Note that the initial momentum of the
electron is zero since the electron is originally at rest. Here p is defined as

p ≡ mu,    (1.8)

where u is the velocity of the electron and m is given by [1]

m = m_e/√(1 − |u|²/c²).    (1.9)

Figure 1.1 shows that ħk, ħk′, and p form a closed triangle.
From (1.6), we have

[m_e c² + ħ(ω − ω′)]² = p²c² + m_e²c⁴.    (1.10)

Hence, we get

2m_e c²ħ(ω − ω′) + ħ²(ω − ω′)² = p²c².    (1.11)

From (1.7), we have

p² = ħ²(k − k′)² = ħ²(k² + k′² − 2kk′ cos θ)
   = (ħ²/c²)(ω² + ω′² − 2ωω′ cos θ),    (1.12)

where we used the relations ω = ck and ω′ = ck′ with the third equality. Therefore, we
get

p²c² = ħ²(ω² + ω′² − 2ωω′ cos θ).    (1.13)

From (1.11) and (1.13), we have

2m_e c²ħ(ω − ω′) + ħ²(ω − ω′)² = ħ²(ω² + ω′² − 2ωω′ cos θ).    (1.14)

Equation (1.14) is simplified to the following:

2m_e c²ħ(ω − ω′) − 2ħ²ωω′ = −2ħ²ωω′ cos θ.

That is,

m_e c²(ω − ω′) = ħωω′(1 − cos θ).    (1.15)

Thus, we get

(ω − ω′)/ωω′ = 1/ω′ − 1/ω = (1/2πc)(λ′ − λ) = [ħ/(m_e c²)](1 − cos θ),    (1.16)

where λ and λ′ are the wavelengths of the initial and final X-ray beams, respectively.
Since λ′ − λ = Δλ, we have (1.4) from (1.16) accordingly.
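As a quick numerical cross-check of (1.4) and (1.5), the short Python sketch below (purely illustrative; the constants and variable names are not part of the original text) evaluates λ_e and the Compton shift Δλ for a few scattering angles:

    import math

    h = 6.62607015e-34      # Planck constant [J s]
    m_e = 9.1093837015e-31  # electron rest mass [kg]
    c = 2.99792458e8        # light velocity in vacuum [m/s]

    lambda_e = h / (m_e * c)  # electron Compton wavelength, (1.5)
    print(f"lambda_e = {lambda_e:.4e} m")  # approx 2.426e-12 m

    for theta_deg in (30, 90, 180):
        theta = math.radians(theta_deg)
        dlam = lambda_e * (1.0 - math.cos(theta))  # Compton shift, (1.4)
        print(f"theta = {theta_deg:3d} deg: dlam = {dlam:.4e} m")
    # At theta = 90 deg the shift equals lambda_e; at theta = 180 deg it is 2*lambda_e.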
We have to mention another important person, Louis-Victor de Broglie (1924), in
the development of quantum mechanics. Encouraged by the success of Einstein and
Compton, he propounded the concept of the matter wave, which was referred to as
the de Broglie wave afterward. Namely, de Broglie reversed the relationship of (1.1) and
(1.2) such that

ω = E/ħ,    (1.17)

and

k = p/ħ or λ = h/p,    (1.18)

where p equals |p| and λ is the wavelength of a corpuscular beam. This is said to be the
de Broglie wavelength. In (1.18), de Broglie thought that a particle carrying an
energy E and momentum p is accompanied by a wave that is characterized by an
angular frequency ω and wavenumber k (or a wavelength λ = 2π/k). Equation (1.18)
implies that if we are able to determine the wavelength of the corpuscular beam
experimentally, we can decide the magnitude of the momentum accordingly.
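For instance, (1.18) lets us estimate the de Broglie wavelength of an electron beam of known momentum. The sketch below (illustrative only; the accelerating voltage of 100 V is an assumed value) uses the nonrelativistic relation eV = p²/2m_e:

    import math

    h = 6.62607015e-34      # Planck constant [J s]
    m_e = 9.1093837015e-31  # electron rest mass [kg]
    e = 1.602176634e-19     # elementary charge [C]

    V_acc = 100.0  # assumed accelerating voltage [V]
    p = math.sqrt(2.0 * m_e * e * V_acc)  # nonrelativistic momentum
    lam = h / p                           # de Broglie wavelength, (1.18)
    print(f"lambda = {lam:.4e} m")        # approx 1.23e-10 m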
In turn, from the squares of both sides of (1.8) and (1.9) we get

u = p/[m_e √(1 + (p/m_e c)²)].    (1.19)

This relation represents the velocity of the particles of the corpuscular beam. If we are
dealing with an electron beam, (1.19) gives the velocity of the electron beam. As a
nonrelativistic approximation (i.e., p/m_e c ≪ 1), we have

p ≈ m_e u.

We used a relativistic relation in the second term of the RHS of (1.6), where the
energy of an electron E_e is expressed by

E_e = √(p²c² + m_e²c⁴).    (1.20)

In the meantime, deleting u² from (1.8) and (1.9) we have

mc² = √(p²c² + m_e²c⁴).

Namely, we get [1]

E_e = mc².    (1.21)

The relation (1.21) is due to Einstein (1905, 1907) and is said to be the equivalence
theorem of mass and energy.
If an electron is accompanied by a matter wave, that wave should be propagated
with a certain phase velocity v_p and a group velocity v_g. Thus, using (1.17) and (1.18)
we have

v_p = ω/k = E_e/p = √(p²c² + m_e²c⁴)/p > c,
v_g = ∂ω/∂k = ∂E_e/∂p = c²p/√(p²c² + m_e²c⁴) < c,    (1.22)
v_p v_g = c².

Notice that in the above expressions, we replaced E of (1.17) with E_e of (1.20). The
group velocity is thought to be the velocity of a wave packet and, hence, the propagation
velocity of a matter wave should be identical to v_g. Thus, v_g is considered as a
particle velocity as well. In fact, v_g given by (1.22) is identical to u expressed in
(1.19). Therefore, a particle velocity must not exceed c. As for photons (or light
quanta), v_p = v_g = c and, hence, once again we get v_p v_g = c². We will encounter the
last relation of (1.22) in Part II as well.
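These relations are easy to confirm numerically. The sketch below (an illustration with an arbitrarily chosen, mildly relativistic electron momentum) checks that v_p > c, v_g < c, v_p v_g = c², and that v_g coincides with u of (1.19):

    import math

    m_e = 9.1093837015e-31  # electron rest mass [kg]
    c = 2.99792458e8        # light velocity in vacuum [m/s]

    p = 1.0e-22  # assumed electron momentum [kg m/s]
    E_e = math.sqrt(p**2 * c**2 + m_e**2 * c**4)  # relativistic energy, (1.20)

    v_p = E_e / p                                        # phase velocity, (1.22)
    v_g = c**2 * p / E_e                                 # group velocity, (1.22)
    u = (p / m_e) / math.sqrt(1.0 + (p / (m_e * c))**2)  # particle velocity, (1.19)

    print(v_p > c, v_g < c)              # True True
    print(abs(v_p * v_g - c**2) / c**2)  # ~0 within round-off
    print(abs(v_g - u) / u)              # ~0: v_g is identical to u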
The above discussion is a brief historical outlook of early-stage quantum theory
before Erwin Schrödinger (1926) propounded his equation.

1.2 Schrödinger Equation

First we introduce a wave equation expressed by

∇²ψ = (1/v²) ∂²ψ/∂t²,    (1.23)

where ψ is an arbitrary function of a physical quantity relevant to the propagation of a
wave; v is a phase velocity of the wave; ∇², called the Laplacian, is defined below:

∇² ≡ ∂²/∂x² + ∂²/∂y² + ∂²/∂z².    (1.24)

One of the special solutions of (1.23), called a plane wave, is well studied and expressed
as

ψ = ψ₀ e^{i(k·x − ωt)}.    (1.25)

In (1.25), x denotes a position vector of a three-dimensional Cartesian coordinate
and is described as

x = (e₁ e₂ e₃)(x y z)ᵀ,    (1.26)

where the superscript T denotes transposition, and e₁, e₂, and e₃ denote basis vectors
of an orthonormal base pointing to the positive directions of the x-, y-, and z-axes.
Here we make it a rule to represent basis vectors by a row vector and to represent a
coordinate or a component of a vector by a column vector; see Sect. 11.1.
The other way around, now we wish to seek a basic equation whose solution is
described as (1.25). Taking account of (1.1)–(1.3) as well as (1.17) and (1.18), we
rewrite (1.25) as

ψ = ψ₀ e^{(i/ħ)(p·x − Et)},    (1.27)

where we redefine p = (e₁ e₂ e₃)(p_x p_y p_z)ᵀ and E as quantities associated with those
of the matter (electron) wave. Taking partial differentiation of (1.27) with respect to x, we
obtain
∂ψ/∂x = (i/ħ)p_x ψ₀ e^{(i/ħ)(p·x − Et)} = (i/ħ)p_x ψ.    (1.28)

Rewriting (1.28), we have

(ħ/i) ∂ψ/∂x = p_x ψ.    (1.29)

Similarly we have

(ħ/i) ∂ψ/∂y = p_y ψ and (ħ/i) ∂ψ/∂z = p_z ψ.    (1.30)

Comparing both sides of (1.29), we notice that we may relate the differential operator
(ħ/i) ∂/∂x to p_x. From (1.30), a similar relationship holds with the y and z components.
That is, we have the following relations:

(ħ/i) ∂/∂x ↔ p_x,  (ħ/i) ∂/∂y ↔ p_y,  (ħ/i) ∂/∂z ↔ p_z.    (1.31)

Taking partial differentiation of (1.28) once more,

∂²ψ/∂x² = [(i/ħ)p_x]² ψ₀ e^{(i/ħ)(p·x − Et)} = −(1/ħ²) p_x² ψ.    (1.32)

Hence,

−ħ² ∂²ψ/∂x² = p_x² ψ.    (1.33)

Similarly we have

−ħ² ∂²ψ/∂y² = p_y² ψ and −ħ² ∂²ψ/∂z² = p_z² ψ.    (1.34)

As in the above cases, we have

−ħ² ∂²/∂x² ↔ p_x²,  −ħ² ∂²/∂y² ↔ p_y²,  −ħ² ∂²/∂z² ↔ p_z².    (1.35)

Summing both sides of (1.33) and (1.34) and then dividing by 2m, we have

−(ħ²/2m) ∇²ψ = (p²/2m) ψ    (1.36)

and the following correspondence

−(ħ²/2m) ∇² ↔ p²/2m,    (1.37)

where m is the mass of a particle.


Meanwhile, taking partial differentiation of (1.27) with respect to t, we obtain

∂ψ/∂t = −(i/ħ)E ψ₀ e^{(i/ħ)(p·x − Et)} = −(i/ħ)E ψ.    (1.38)

That is,

iħ ∂ψ/∂t = Eψ.    (1.39)

As the above, we get the following relationship:

iħ ∂/∂t ↔ E.    (1.40)

Thus, we have relationships between c-numbers (classical numbers) and q-numbers
(quantum numbers, namely, operators) in (1.35) and (1.40). Subtracting (1.36) from
(1.39), we get

iħ ∂ψ/∂t + (ħ²/2m) ∇²ψ = (E − p²/2m) ψ.    (1.41)

Invoking the relationship on energy

(Total energy) = (Kinetic energy) + (Potential energy),    (1.42)

we have

E = p²/2m + V,    (1.43)

where V is a potential energy. Thus, (1.41) reads as

iħ ∂ψ/∂t + (ħ²/2m) ∇²ψ = Vψ.    (1.44)

Rearranging (1.44), we finally get

[−(ħ²/2m) ∇² + V] ψ = iħ ∂ψ/∂t.    (1.45)

This is the Schrödinger equation, a fundamental equation of quantum mechanics. In
(1.45), we define the following Hamiltonian operator H as

H ≡ −(ħ²/2m) ∇² + V.    (1.46)

Then we have a shorthand representation such that

Hψ = iħ ∂ψ/∂t.    (1.47)
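For readers who wish to verify such operator manipulations symbolically, the following sketch (assuming the sympy library is available; it is not part of the original text) checks in one dimension that the plane wave (1.27) satisfies (1.45) with V = 0, provided E = p²/2m as in (1.43):

    import sympy as sp

    x, t = sp.symbols('x t', real=True)
    p, m, hbar = sp.symbols('p m hbar', positive=True)

    E = p**2 / (2 * m)                           # free particle, (1.43) with V = 0
    psi = sp.exp(sp.I * (p * x - E * t) / hbar)  # 1D plane wave, cf. (1.27)

    lhs = -hbar**2 / (2 * m) * sp.diff(psi, x, 2)  # H psi with V = 0, cf. (1.46)
    rhs = sp.I * hbar * sp.diff(psi, t)            # i*hbar dpsi/dt, cf. (1.47)
    print(sp.simplify(lhs - rhs))  # 0, so (1.45) holds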

On going from (1.25) to (1.27), we realize that the quantities k and ω pertinent to a
field have been converted to the quantities p and E related to a particle. At the same time,
whereas x and t represent a whole space-time in (1.25), those in (1.27) are
characterized as localized quantities.
From a historical point of view, we have to mention a great achievement
accomplished by Werner Heisenberg (1925), who propounded matrix mechanics.
The matrix mechanics is often contrasted with the wave mechanics Schrödinger
initiated. Schrödinger and Paul Dirac (1926) demonstrated that wave mechanics and
matrix mechanics are mathematically equivalent. Note that the Schrödinger equation
is described as a nonrelativistic expression based on (1.43). In fact, the kinetic energy
K of a particle is given by [1]

K = m_e c²/√(1 − (u/c)²) − m_e c².

As a nonrelativistic approximation, we get

K ≈ m_e c² [1 + (1/2)(u/c)²] − m_e c² = (1/2) m_e u² ≈ p²/2m_e,

where we used p ≈ m_e u again as a nonrelativistic approximation; also, we used

1/√(1 − x) ≈ 1 + (1/2)x

when x (>0), corresponding to (u/c)², is small enough compared with 1. This implies
that in the above case the group velocity u of a particle is supposed to be well below the
light velocity c. Dirac (1928) formulated an equation that describes relativistic quantum
mechanics (the Dirac equation).
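A one-line numerical comparison (illustrative only, with u = 0.01c as an assumed example) shows how good this approximation is:

    import math

    m_e, c = 9.1093837015e-31, 2.99792458e8  # electron mass [kg], light velocity [m/s]
    u = 0.01 * c                             # assumed particle velocity
    K_exact = m_e * c**2 * (1.0 / math.sqrt(1.0 - (u / c)**2) - 1.0)
    K_approx = 0.5 * m_e * u**2
    print((K_exact - K_approx) / K_exact)  # ~7.5e-5, of order (3/4)(u/c)**2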
In (1.45), ψ varies as a function of x and t. Suppose, however, that a potential
V depends only upon x. Then we have

[−(ħ²/2m) ∇² + V(x)] ψ(x, t) = iħ ∂ψ(x, t)/∂t.    (1.48)

Now, let us assume that separation of variables can be done with (1.48) such that

ψ(x, t) = ϕ(x)ξ(t).    (1.49)

Then, we have

[−(ħ²/2m) ∇² + V(x)] ϕ(x)ξ(t) = iħ ∂[ϕ(x)ξ(t)]/∂t.    (1.50)

Accordingly, (1.50) can be recast as

{[−(ħ²/2m) ∇² + V(x)] ϕ(x)}/ϕ(x) = iħ [∂ξ(t)/∂t]/ξ(t).    (1.51)

For (1.51) to hold, we must equate both sides to a constant E. That is, for a certain
fixed point x₀ we have

{[−(ħ²/2m) ∇² + V(x₀)] ϕ(x₀)}/ϕ(x₀) = iħ [∂ξ(t)/∂t]/ξ(t),    (1.52)

where ϕ(x₀) in the numerator should be evaluated after operating with ∇², while
ϕ(x₀) in the denominator is evaluated simply by replacing x in ϕ(x) with x₀. Now, let us
define a function Φ(x) such that

Φ(x) ≡ {[−(ħ²/2m) ∇² + V(x)] ϕ(x)}/ϕ(x).    (1.53)

Then, we have

Φ(x₀) = iħ [∂ξ(t)/∂t]/ξ(t).    (1.54)

If the RHS of (1.54) varied depending on t, Φ(x₀) would be allowed to have various
values, but this must not be the case with our present investigation. Thus, the RHS of
(1.54) should take a constant value E. For the same reason, the LHS of (1.51) should
take a constant value as well.
Thus, (1.48) or (1.51) should be separated into the following equations:

Hϕ(x) = Eϕ(x),    (1.55)

iħ ∂ξ(t)/∂t = Eξ(t).    (1.56)

Equation (1.56) can readily be solved. Since (1.56) depends on a sole variable t, we
have

dξ(t)/ξ(t) = (E/iħ) dt or d ln ξ(t) = (E/iħ) dt.    (1.57)

Integrating (1.57) from zero to t, we get

ln [ξ(t)/ξ(0)] = Et/iħ.    (1.58)

That is,

ξ(t) = ξ(0) exp(−iEt/ħ).    (1.59)

Comparing (1.59) with (1.38), we find that the constant E in (1.55) and (1.56)
represents an energy of a particle (electron).
Thus, the next task we want to do is to solve the eigenvalue equation (1.55). After
solving the problem, we get a solution

ψ(x, t) = ϕ(x) exp(−iEt/ħ),    (1.60)

where the constant ξ(0) has been absorbed in ϕ(x). Normally, ϕ(x) is to be normal-
ized after determining the functional form (vide infra).
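Equation (1.56) is simple enough that a computer algebra system reproduces (1.59) directly; the sympy sketch below (illustrative only) cross-checks the integration:

    import sympy as sp

    t = sp.symbols('t', real=True)
    E, hbar = sp.symbols('E hbar', positive=True)
    xi = sp.Function('xi')

    # i*hbar d(xi)/dt = E xi, i.e., Eq. (1.56)
    sol = sp.dsolve(sp.Eq(sp.I * hbar * xi(t).diff(t), E * xi(t)), xi(t))
    print(sol)  # xi(t) = C1*exp(-I*E*t/hbar), in accordance with (1.59)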

1.3 Simple Applications of Schrödinger Equation

The Schrödinger equation has been expressed as (1.48). The equation is a second-order
linear differential equation (SOLDE). In particular, our major interest lies in
solving the eigenvalue problem of (1.55). Eigenvalues consist of points in a complex
plane. Those points sometimes form a continuous domain, but we focus on the
eigenvalues that comprise discrete points in the complex plane. Therefore, in our
studies the eigenvalues are countable and numbered as, e.g., λ_n (n = 1, 2, 3, …). An
example is depicted in Fig. 1.2. Having this common belief as a background, let us
first think of a simple form of SOLDE.
Example 1.1 Let us think of the following differential equation:

d²y(x)/dx² + λy(x) = 0,    (1.61)

where x is a real variable; y may be a complex function of x, with λ possibly being a
complex constant as well. Suppose that y(x) is defined within a domain [−L, L]
(L > 0). We set boundary conditions (BCs) for (1.61) such that

y(−L) = 0 and y(L) = 0 (L > 0).    (1.62)

The BCs of (1.62) are called the Dirichlet conditions. We define the following
differential operator D described as

D ≡ −d²/dx².    (1.63)

Then, rewriting (1.61), we have

Fig. 1.2 Eigenvalues λ_n (n = 1, 2, 3, …) on a complex plane

Dy(x) = λy(x).    (1.64)

According to a general principle of SOLDEs, (1.61) has two linearly independent
solutions. In the case of (1.61), we choose exponential functions for those solutions,
described by

e^{ikx} and e^{−ikx} (k ≠ 0).

This is because the above functions do not change their functional form with respect
to differentiation, and so we ascribe solving a differential equation to solving an
algebraic equation among constants (or parameters). In the present case, λ and
k are such constants.
The parameter k could be a complex number, because λ is allowed to take a
complex value as well. Linear independence of these functions is ensured by a
nonvanishing Wronskian, W. That is,

W = e^{ikx}(e^{−ikx})′ − e^{−ikx}(e^{ikx})′ = e^{ikx}(−ik e^{−ikx}) − e^{−ikx}(ik e^{ikx})
  = −ik − ik = −2ik.    (1.65)
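The Wronskian in (1.65) is quickly confirmed with sympy (an illustrative cross-check, not part of the original text):

    import sympy as sp

    x = sp.symbols('x', real=True)
    k = sp.symbols('k', nonzero=True)

    f = sp.exp(sp.I * k * x)
    g = sp.exp(-sp.I * k * x)
    W = sp.simplify(f * g.diff(x) - g * f.diff(x))  # Wronskian of the pair
    print(W)  # -2*I*k, nonvanishing for k != 0, as in (1.65)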

If k ≠ 0, W ≠ 0. Therefore, as a general solution, we get

y(x) = a e^{ikx} + b e^{−ikx} (k ≠ 0),    (1.66)

where a and b are (complex) constants. We call the two linearly independent solutions
e^{ikx} and e^{−ikx} (k ≠ 0) a fundamental set of solutions of a SOLDE. Inserting (1.66)
into (1.61), we have

(λ − k²)(a e^{ikx} + b e^{−ikx}) = 0.    (1.67)

For (1.67) to hold with any x, we must have

λ − k² = 0, i.e., λ = k².    (1.68)

Using BCs (1.62), we have

a e^{−ikL} + b e^{ikL} = 0 and a e^{ikL} + b e^{−ikL} = 0.    (1.69)

Rewriting (1.69) in a matrix form, we have

( e^{−ikL}  e^{ikL}  ) ( a )   ( 0 )
( e^{ikL}   e^{−ikL} ) ( b ) = ( 0 ).    (1.70)

For a and b in (1.70) to have nonvanishing solutions, we must have


| e^{−ikL}  e^{ikL}  |
| e^{ikL}   e^{−ikL} | = 0, i.e., e^{−2ikL} − e^{2ikL} = 0.    (1.71)

This is because if the determinant in (1.71) were not zero, we would have a = b = 0 and
y(x) ≡ 0. Note that with an eigenvalue problem we must avoid having a solution that is
identically zero. Rewriting (1.71), we get

(e^{ikL} + e^{−ikL})(e^{ikL} − e^{−ikL}) = 0.    (1.72)

That is, we have either

e^{ikL} + e^{−ikL} = 0    (1.73)

or

e^{ikL} − e^{−ikL} = 0.    (1.74)

In the case of (1.73), inserting this into (1.69) we have

e^{−ikL}(a − b) = 0.    (1.75)

Therefore,

a = b,    (1.76)

where we used the fact that e^{−ikL} is a nonvanishing function for any ikL (either real or
complex). Similarly, in the case of (1.74), we have

a = −b.    (1.77)

For (1.76), from (1.66) we have

y(x) = a(e^{ikx} + e^{−ikx}) = 2a cos kx.    (1.78)

With (1.77), in turn, we get

y(x) = a(e^{ikx} − e^{−ikx}) = 2ia sin kx.    (1.79)

Thus, we get the two linearly independent solutions (1.78) and (1.79).
Inserting BCs (1.62) into (1.78), we have
cos kL = 0.    (1.80)

Hence,

kL = π/2 + mπ (m = 0, ±1, ±2, …).    (1.81)

In (1.81), for instance, we have k = π/2L for m = 0 and k = −π/2L for m = −1. Also,
we have k = 3π/2L for m = 1 and k = −3π/2L for m = −2. These cases, however,
individually give linearly dependent solutions for (1.78). Therefore, to get a set of
linearly independent eigenfunctions we may define k as positive. Correspondingly, from
(1.68) we get eigenvalues of

λ = (2m + 1)²π²/4L² (m = 0, 1, 2, …).    (1.82)

Also, inserting BCs (1.62) into (1.79), we have

sin kL = 0.    (1.83)

Hence,

kL = nπ (n = 1, 2, 3, …).    (1.84)

From (1.68) we get

λ = n²π²/L² = (2n)²π²/4L² (n = 1, 2, 3, …),    (1.85)

where we chose positive numbers n for the same reason as the above. With the
second equality of (1.85), we made the eigenvalues easily comparable to those of (1.82).
Figure 1.3 shows the eigenvalues given in both (1.82) and (1.85) in a unit of π 2/4L2.
From (1.82) and (1.85), we find that λ is positive definite (or strictly positive), and
so from (1.68) we have

k = √λ.    (1.86)

The next step is to normalize eigenfunctions. This step corresponds to appropriate


choice of a constant a in (1.78) and (1.79) so that we can have

[Figure: the eigenvalues lie at 1, 4, 9, 16, 25, … on the real axis]
Fig. 1.3 Eigenvalues of the differential equation (1.61) under boundary conditions given by (1.62).
The eigenvalues are given in a unit of π²/4L² on a real axis

Z L Z L
I¼ yðxÞ yðxÞdx ¼ jyðxÞj2 dx ¼ 1: ð1:87Þ
L L

That is,

I = 4|a|² ∫_{−L}^{L} cos² kx dx = 4|a|² ∫_{−L}^{L} (1/2)(1 + cos 2kx) dx = 2|a|² [x + (1/2k) sin 2kx]_{−L}^{L} = 4L|a|².   (1.88)

Combining (1.87) and (1.88), we get

|a| = (1/2)√(1/L).   (1.89)

Thus, we have

a = (1/2)√(1/L) e^{iθ},   (1.90)

where θ is any real number and e^{iθ} is said to be a phase factor. We usually set e^{iθ} ≡ 1. Then, we have a = (1/2)√(1/L). Thus, for the normalized cosine eigenfunctions we get

y(x) = √(1/L) cos kx  [kL = π/2 + mπ  (m = 0, 1, 2, ⋯)]   (1.91)

that corresponds to an eigenvalue λ = (2m + 1)²π²/4L² (m = 0, 1, 2, ⋯). For the other series of normalized sine functions, similarly we get

y(x) = √(1/L) sin kx  [kL = nπ  (n = 1, 2, 3, ⋯)]   (1.92)

that corresponds to an eigenvalue λ = (2n)²π²/4L² (n = 1, 2, 3, ⋯).
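As a numerical check of (1.91) and (1.92) (a sketch we add here; SciPy assumed available and L set to 1 for concreteness), one can verify normalization and mutual orthogonality of the lowest eigenfunctions on [−L, L]:

import numpy as np
from scipy.integrate import quad

L = 1.0
k_cos = np.pi / (2 * L)                 # kL = pi/2 (m = 0 in (1.91))
k_sin = np.pi / L                       # kL = pi   (n = 1 in (1.92))
y_cos = lambda x: np.sqrt(1 / L) * np.cos(k_cos * x)
y_sin = lambda x: np.sqrt(1 / L) * np.sin(k_sin * x)

print(quad(lambda x: y_cos(x) ** 2, -L, L)[0])        # ~1.0 (normalized)
print(quad(lambda x: y_sin(x) ** 2, -L, L)[0])        # ~1.0 (normalized)
print(quad(lambda x: y_cos(x) * y_sin(x), -L, L)[0])  # ~0.0 (orthogonal)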


Notice that, arranging λ in ascending order, we have even functions and odd functions alternately as the eigenfunctions corresponding to λ. Such a property is said to be parity. We often encounter it in quantum mechanics and related fields. From (1.61) we find that if y(x) is an eigenfunction, so is cy(x). That is, we should bear in mind that an eigenvalue problem is always accompanied by an indeterminate constant and that normalization of an eigenfunction does not mean uniqueness of the solution (see Chap. 10).

Strictly speaking, we should be careful to ensure that (1.81) holds on the basis of (1.80). This is because we have not yet excluded the possibility that k is a complex number. To see it, we examine the zeros of a cosine function defined in a complex domain. Here the zeros are the (complex) numbers at which the function takes the value zero. That is, if f(z₀) = 0, z₀ is called a zero (i.e., one of the zeros) of f(z). Now we have

cos z ≡ (1/2)(e^{iz} + e^{-iz});  z = x + iy  (x, y: real).   (1.93)

Inserting z = x + iy in cos z and rearranging terms, we get

cos z = (1/2)[cos x (e^{y} + e^{-y}) − i sin x (e^{y} − e^{-y})].   (1.94)

For cos z to vanish, both its real and imaginary parts must be zero. Since e^{y} + e^{-y} > 0 for all real numbers y, we must have cos x = 0 for the real part to vanish; i.e.,

x = π/2 + mπ  (m = 0, ±1, ±2, ⋯).   (1.95)

Note in this case that sin x = ±1 (≠ 0). Therefore, for the imaginary part to vanish, e^{y} − e^{-y} = 0. That is, we must have y = 0. Consequently, the zeros of cos z are real numbers. In other words, with respect to z₀ that satisfies cos z₀ = 0 we have

z₀ = π/2 + mπ  (m = 0, ±1, ±2, ⋯).   (1.96)

The above discussion equally applies to a sine function as well.
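This can also be seen numerically (an illustration of ours using Python's standard cmath module): moving the would-be zero x = π/2 off the real axis by iy gives |cos(π/2 + iy)| = sinh y, which vanishes only at y = 0.

import cmath, math

for y in (0.0, 0.5, 1.0):
    z = cmath.pi / 2 + 1j * y
    print(y, abs(cmath.cos(z)), math.sinh(y))   # the two values agree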


Thus, we ensure that k is a nonzero real number, and the eigenvalues λ are accordingly positive definite from (1.68). This conclusion is not fortuitous but a direct consequence of the form of the differential equation we have dealt with, in combination with the BCs we imposed, i.e., the Dirichlet conditions. Detailed discussion will follow in Sects. 1.4, 8.3, and 8.4 in relation to the Hermiticity of a differential operator.
Example 1.2: A Particle Confined Within a Potential Well  The results obtained in Example 1.1 can immediately be applied to dealing with a particle (electron) in a one-dimensional infinite potential well. In this case, (1.55) reads as

(ħ²/2m) d²ψ(x)/dx² + Eψ(x) = 0,   (1.97)

where m is the mass of the particle and E is its energy. The potential V is expressed as

V(x) = \begin{cases} 0 & (-L \le x \le L), \\ \infty & (x < -L;\ x > L). \end{cases}

Rewriting (1.97), we have

d²ψ(x)/dx² + (2mE/ħ²)ψ(x) = 0   (1.98)

with BCs

ψ(L) = ψ(−L) = 0.   (1.99)

If we replace λ of (1.61) with 2mE/ħ², we can follow the procedures of Example 1.1. That is, we put

E = ħ²λ/2m   (1.100)

with λ = k² in (1.68). For k we use the values of (1.81) and (1.84). Therefore, for the energy eigenvalues we get either

E = (ħ²/2m) ⋅ (2l + 1)²π²/4L²  (l = 0, 1, 2, ⋯),

to which

ψ(x) = √(1/L) cos kx  [kL = π/2 + lπ  (l = 0, 1, 2, ⋯)]   (1.101)

corresponds, or

E = (ħ²/2m) ⋅ (2n)²π²/4L²  (n = 1, 2, 3, ⋯),

to which

ψ(x) = √(1/L) sin kx  [kL = nπ  (n = 1, 2, 3, ⋯)]   (1.102)

corresponds.
Since the particle behaves as a free particle within the potential well (−L ≤ x ≤ L) and p = ħk, we obtain
E = p²/2m = (ħ²/2m)k²,

where

k = \begin{cases} (2l + 1)\pi/2L & (l = 0, 1, 2, \cdots), \\ 2n\pi/2L & (n = 1, 2, 3, \cdots). \end{cases}

The energy E is the kinetic energy of the particle.
Although in (1.97) ψ(x) ≡ 0 trivially holds, such a function may not be regarded as a solution of the eigenvalue problem. In fact, considering that |ψ(x)|² represents the existence probability of a particle, ψ(x) ≡ 0 corresponds to a situation where the particle in question does not exist. Consequently, such a trivial case has no physical meaning.
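To attach a concrete energy scale to these formulas, the following sketch (ours; SI constants taken from scipy.constants) evaluates E = ħ²k²/2m for an electron in a well of half-width L = 0.5 nm, i.e., total width 1 nm:

import numpy as np
from scipy.constants import hbar, m_e, eV

L = 0.5e-9                                  # half-width of the well [m]
kL = np.pi / 2 * np.arange(1, 6)            # kL = pi/2, pi, 3pi/2, 2pi, 5pi/2
E = (hbar * kL / L) ** 2 / (2 * m_e) / eV   # energies in eV
print(E)   # ~0.376, 1.50, 3.39, 6.02, 9.40 eV, scaling as 1 : 4 : 9 : 16 : 25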

1.4 Quantum-Mechanical Operators and Matrices

As represented by (1.55), a quantum-mechanical operator corresponds to a physical quantity. In (1.55), we connect a Hamiltonian operator to an energy (eigenvalue). Let us rephrase the situation as follows:

PΨ = pΨ.   (1.103)

In (1.103), we are viewing P as an operation or measurement on a physical system that is characterized by the quantum state Ψ. Operating with P on the physical system (or state), we obtain a physical quantity p relevant to P as a result of the operation (or measurement).
A way to effectively achieve the above is to use a matrix and a vector to represent the operation and the physical state, respectively. Let us glance at a little bit of matrix calculation to get used to the quantum-mechanical concepts and, hence, to obtain a clear understanding of them. In Part III, we will deal with matrix calculation in detail from the point of view of a general principle. At present, a (2, 2) matrix suffices. Let A be a (2, 2) matrix expressed as
 
A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}.   (1.104)

Let |ψ⟩ be a (2, 1) matrix, i.e., a column vector such that

|ψ⟩ = \begin{pmatrix} e \\ f \end{pmatrix}.   (1.105)

Note that operating a (2, 2) matrix on a (2, 1) matrix produces another (2, 1) matrix. Furthermore, we define an adjoint matrix A† such that

A† = \begin{pmatrix} a^{*} & c^{*} \\ b^{*} & d^{*} \end{pmatrix},   (1.106)

where a* is the complex conjugate of a. That is, A† is the complex conjugate transposed matrix of A. Also, we define an adjoint vector ⟨ψ| or |ψ⟩† such that

⟨ψ| ≡ |ψ⟩† = (e*  f*).   (1.107)

In this case, |ψ⟩† also denotes the complex conjugate transpose of |ψ⟩. The notations |ψ⟩ and ⟨ψ| are due to Dirac. He named ⟨ψ| and |φ⟩ a bra vector and a ket vector, respectively. This naming or equivoque comes from the fact that ⟨ψ| ⋅ |φ⟩ = ⟨ψ|φ⟩ forms a bracket. This is a (1, 2) × (2, 1) = (1, 1) matrix, i.e., a c-number (including a complex number), and ⟨ψ|φ⟩ represents an inner product. These notations are widely used nowadays in the fields of mathematics and physics.
Taking another vector |ξ⟩ = \begin{pmatrix} g \\ h \end{pmatrix} and using the matrix calculation rule, we have

A†|ψ⟩ = |A†ψ⟩ = \begin{pmatrix} a^{*} & c^{*} \\ b^{*} & d^{*} \end{pmatrix}\begin{pmatrix} e \\ f \end{pmatrix} = \begin{pmatrix} a^{*}e + c^{*}f \\ b^{*}e + d^{*}f \end{pmatrix}.   (1.108)

According to the definition (1.107), we have

|A†ψ⟩† = ⟨A†ψ| = (ae* + cf*  be* + df*).   (1.109)

Thus, we get

⟨A†ψ|ξ⟩ = (ae* + cf*  be* + df*)\begin{pmatrix} g \\ h \end{pmatrix} = (ag + bh)e* + (cg + dh)f*.   (1.110)

Similarly, we have

⟨ψ|Aξ⟩ = (e*  f*)\begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} g \\ h \end{pmatrix} = (ag + bh)e* + (cg + dh)f*.   (1.111)

Comparing (1.110) and (1.111), we get

⟨A†ψ|ξ⟩ = ⟨ψ|Aξ⟩.   (1.112)

Also, we have

⟨ψ|Aξ⟩ = ⟨Aξ|ψ⟩*.   (1.113)

Replacing A with A† in (1.112), we get

⟨(A†)†ψ|ξ⟩ = ⟨ψ|A†ξ⟩.   (1.114)

From (1.104) and (1.106), obviously we have

(A†)† = A.   (1.115)

Then, from (1.114) and (1.115) we have

⟨Aψ|ξ⟩ = ⟨ψ|A†ξ⟩ = ⟨ξ|Aψ⟩*,   (1.116)

where the second equality comes from (1.113) with ψ and ξ exchanged there. Moreover, we have the following relation:

(AB)† = B†A†.   (1.117)

The proof is left for readers. Using this relation, we have

⟨Aψ| = |Aψ⟩† = [A|ψ⟩]† = |ψ⟩†A† = ⟨ψ|A† = ⟨ψA†|.   (1.118)

Making an inner product by multiplying |ξ⟩ from the right of the leftmost and rightmost sides of (1.118) and using (1.116), we get

⟨Aψ|ξ⟩ = ⟨ψA†|ξ⟩ = ⟨ψ|A†ξ⟩.

This relation may be regarded as the associative law with regard to the symbol "|" of the inner product. It is equivalent to the associative law of matrix multiplication.
The results obtained above can readily be extended to a general case where (n, n) matrices are dealt with.
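These rules are easy to verify numerically. The following sketch (ours; random complex (2, 2) matrices with NumPy) checks (1.112), (1.115), and (1.117), with the adjoint realized as the conjugate transpose:

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
B = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
psi = rng.normal(size=2) + 1j * rng.normal(size=2)
xi = rng.normal(size=2) + 1j * rng.normal(size=2)

dag = lambda M: M.conj().T                 # adjoint = conjugate transpose

# (1.112): <A^dag psi|xi> = <psi|A xi>  (np.vdot conjugates its first argument)
print(np.vdot(dag(A) @ psi, xi), np.vdot(psi, A @ xi))
# (1.115) and (1.117)
print(np.allclose(dag(dag(A)), A), np.allclose(dag(A @ B), dag(B) @ dag(A)))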
Now, let us introduce an Hermitian operator (or matrix) H. When we have

H† = H,   (1.119)

H is called an Hermitian matrix. Then, applying (1.112) to the Hermitian matrix H, we have

⟨H†ψ|ξ⟩ = ⟨ψ|Hξ⟩ = ⟨ψ|H†ξ⟩  or  ⟨Hψ|ξ⟩ = ⟨ψ|H†ξ⟩ = ⟨ψ|Hξ⟩.   (1.120)

Also, let us introduce a norm of a vector |ψ⟩ such that

‖ψ‖ = √⟨ψ|ψ⟩.   (1.121)

A norm is a natural extension of the notion of the "length" of a vector. The norm ‖ψ‖ is zero if and only if |ψ⟩ = 0 (zero vector). Then, from (1.105) and (1.107), we have

⟨ψ|ψ⟩ = |e|² + |f|².

Therefore, ⟨ψ|ψ⟩ = 0 ⟺ e = f = 0, i.e., |ψ⟩ = 0.
Let us further consider an eigenvalue problem represented by our newly introduced notation. The eigenvalue equation is symbolically written as

H|ψ⟩ = λ|ψ⟩,   (1.122)

where H represents an Hermitian operator and |ψ⟩ is an eigenfunction that belongs to an eigenvalue λ. Operating with ⟨ψ| on (1.122) from the left, we have

⟨ψ|H|ψ⟩ = ⟨ψ|λ|ψ⟩ = λ⟨ψ|ψ⟩ = λ,   (1.123)

where we assume that |ψ⟩ is normalized, namely ⟨ψ|ψ⟩ = 1 or ‖ψ‖ = 1. Notice that the symbol "|" in an inner product is of secondary importance. We may disregard this notation, as in the case where a product notation "⋅" is omitted by writing ab instead of a ⋅ b.
Taking a complex conjugate of (1.123), we have

⟨ψ|Hψ⟩* = λ*.   (1.124)

Using (1.116) and (1.124), we have

λ* = ⟨ψ|Hψ⟩* = ⟨ψ|H†ψ⟩ = ⟨ψ|Hψ⟩ = λ,   (1.125)

where with the third equality we used the definition (1.119). The relation λ* = λ obviously shows that any eigenvalue λ is real if H is Hermitian. The relation (1.125) immediately tells us that even though |ψ⟩ is not an eigenfunction, ⟨ψ|Hψ⟩ is real as well if H is Hermitian. The quantity ⟨ψ|Hψ⟩ is said to be an expectation value. This value is interpreted as the most probable or averaged value of H obtained as a result of operating H on a physical state |ψ⟩. We sometimes denote the expectation value as

⟨H⟩ ≡ ⟨ψ|Hψ⟩,   (1.126)

where |ψ⟩ is normalized. Unless |ψ⟩ is normalized, it can be normalized on the basis of (1.121) by choosing |Φ⟩ such that

|Φ⟩ = |ψ⟩/‖ψ‖.   (1.127)

Thus, we have an important consequence: if an Hermitian operator has an eigenvalue, it must be real. An expectation value of an Hermitian operator is real as well. A real eigenvalue and expectation value are a prerequisite for a physical quantity.
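A small numerical check of this consequence (our sketch; a random (2, 2) Hermitian matrix built with NumPy): the eigenvalues come out real, and so does any expectation value ⟨ψ|Hψ⟩.

import numpy as np

rng = np.random.default_rng(1)
M = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
H = M + M.conj().T                       # H^dag = H by construction

print(np.linalg.eigvalsh(H))             # two real eigenvalues
psi = rng.normal(size=2) + 1j * rng.normal(size=2)
psi = psi / np.linalg.norm(psi)          # normalize
print(np.vdot(psi, H @ psi))             # expectation value: imaginary part ~ 0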
As discussed above, Hermitian matrices play a central role in quantum physics. Taking a further step, let us extend the notion of Hermiticity to a function space.
In Example 1.1, we remarked that we finally reached a solution where λ is a real (and positive) number, even though at the beginning we set no restriction on λ. This is because the SOLDE form (1.61) accompanied by BCs (1.62) is Hermitian, and so the eigenvalues λ are real.
In this context, we give a little further consideration. We define an inner product between two functions as follows:

⟨g|f⟩ ≡ ∫_{a}^{b} g(x)* f(x) dx,   (1.128)

where g(x)* is the complex conjugate of g(x); x is a real variable and the integration range can be either bounded or unbounded. If a and b are real definite numbers, [a, b] is the bounded case. With the unbounded case, we have, e.g., (−∞, +∞), (−∞, c), and (c, +∞), etc., where c is a definite number. This notation will appear again in Chap. 10. In (1.128) we view the functions f and g as vectors in a function space, often referred to as a Hilbert space. We assume that any function f is square-integrable; i.e., the integral of |f|² is finite. That is,

∫_{a}^{b} |f(x)|² dx < ∞.   (1.129)

Using the above definition, let us calculate ⟨g|Df⟩, where D was defined in (1.63). Then, using integration by parts, we have

⟨g|Df⟩ = ∫_{a}^{b} g(x)* (−d²f(x)/dx²) dx = −[g*f′]_{a}^{b} + ∫_{a}^{b} g*′f′ dx
 = −[g*f′]_{a}^{b} + [g*′f]_{a}^{b} − ∫_{a}^{b} g*″f dx = [g*′f − g*f′]_{a}^{b} + ⟨Dg|f⟩.   (1.130)

If we have BCs such that

f(b) = f(a) = 0  and  g(b) = g(a) = 0, i.e., g(b)* = g(a)* = 0,   (1.131)

we get

⟨g|Df⟩ = ⟨Dg|f⟩.   (1.132)

In light of (1.120), (1.132) implies that D is Hermitian. In (1.131), notice that the functions f and g satisfy the same BCs; normally, an operator must have this property in order to be Hermitian. Thus, the Hermiticity of a differential operator is closely related to the BCs of the differential equation.
Next, we consider the following inner product:

⟨f|Df⟩ = −∫_{a}^{b} f* f″ dx = −[f*f′]_{a}^{b} + ∫_{a}^{b} f*′f′ dx = −[f*f′]_{a}^{b} + ∫_{a}^{b} |f′|² dx.   (1.133)

Note that the definite integral in (1.133) cannot be negative. There are two possibilities for D to be Hermitian according to different BCs.
(i) Dirichlet conditions: f(b) = f(a) = 0. If we could have f′ ≡ 0, ⟨f|Df⟩ would be zero. But in that case f should be constant. If so, f(x) ≡ 0 according to the BCs. We must exclude this trivial case. Consequently, to avoid this situation we must have

∫_{a}^{b} |f′|² dx > 0  or  ⟨f|Df⟩ > 0.   (1.134)

In this case, the operator D is said to be positive definite. Suppose that such a positive-definite operator has an eigenvalue λ. Then, for a corresponding eigenfunction y(x) we have

Dy(x) = λy(x).   (1.135)

In this case, we state that y(x) is an eigenfunction or eigenvector that corresponds (or belongs) to an eigenvalue λ. Taking an inner product of both sides, we have
⟨y|Dy⟩ = ⟨y|λy⟩ = λ⟨y|y⟩ = λ‖y‖²  or  λ = ⟨y|Dy⟩/‖y‖².   (1.136)

Both ⟨y|Dy⟩ and ‖y‖² are positive and, hence, we have λ > 0. Thus, if D has an eigenvalue, it must be positive. In this case, λ is said to be positive definite as well; see Example 1.1.
(ii) Neumann conditions: f′(b) = f′(a) = 0. From (1.130), D is Hermitian as well. Unlike the condition (i), however, f may be a nonzero constant in this case. Therefore, we are allowed to have

∫_{a}^{b} |f′|² dx = 0  or  ⟨f|Df⟩ = 0.   (1.137)

For any function, we have

⟨f|Df⟩ ≥ 0.   (1.138)

In this case, the operator D is said to be non-negative (or positive semi-definite). The eigenvalue may be zero from (1.136) and, hence, is called non-negative accordingly.
(iii) Periodic conditions: f(b) = f(a) and f′(b) = f′(a). We are allowed to have ⟨f|Df⟩ ≥ 0 as in the case of the condition (ii). Then, the operator and eigenvalues are non-negative.
Thus, in spite of being formally the same operator, that operator behaves differently according to the different BCs. In particular, a differential operator associated with an eigenvalue of zero is of special interest. We will encounter another illustration in Chap. 3.

1.5 Commutator and Canonical Commutation Relation

In quantum mechanics it is important whether two operators A and B are commutable. In this context, a commutator between A and B is defined such that

[A, B] ≡ AB − BA.   (1.139)

If [A, B] = 0 (zero matrix), A and B are said to be commutable (or commutative). If [A, B] ≠ 0, A and B are noncommutative. Such relationships between two operators are called commutation relations.
We have the canonical commutation relation as an underlying concept of quantum mechanics. It is defined between a (canonical) coordinate q and a (canonical) momentum p such that
[q, p] = iħ,   (1.140)

where the presence of a unit matrix E is implied. Explicitly writing it, we have

[q, p] = iħE.   (1.141)

The relations (1.140) and (1.141) are called the canonical commutation relation. On the basis of the relation p = (ħ/i)∂/∂q, a brief proof of this is as follows:

[q, p]|ψ⟩ = (qp − pq)|ψ⟩ = [q(ħ/i)∂/∂q − (ħ/i)(∂/∂q)q]|ψ⟩ = q(ħ/i)∂|ψ⟩/∂q − (ħ/i)∂(q|ψ⟩)/∂q
 = q(ħ/i)∂|ψ⟩/∂q − (ħ/i)(∂q/∂q)|ψ⟩ − q(ħ/i)∂|ψ⟩/∂q = −(ħ/i)|ψ⟩ = iħ|ψ⟩.   (1.142)

Since |ψ⟩ is an arbitrarily chosen vector, we have (1.140).
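The computation (1.142) can be mirrored symbolically. In the sketch below (ours, using SymPy), q acts as multiplication and p as (ħ/i)∂/∂q on an arbitrary test function, and the commutator indeed returns iħ times that function:

import sympy as sp

q, hbar = sp.symbols('q hbar', positive=True)
f = sp.Function('f')(q)                       # arbitrary test function

p = lambda g: (hbar / sp.I) * sp.diff(g, q)   # p = (hbar/i) d/dq
commutator = q * p(f) - p(q * f)              # (qp - pq) f

print(sp.simplify(commutator))                # I*hbar*f(q), i.e., [q, p] = i*hbar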


Using (1.117), we have

[A, B]† = (AB − BA)† = B†A† − A†B†.   (1.143)

If in (1.143) A and B are both Hermitian, we have

[A, B]† = BA − AB = −[A, B].   (1.144)

If we have an operator G such that

G† = −G,   (1.145)

G is said to be anti-Hermitian. Therefore, [A, B] is anti-Hermitian if both A and B are Hermitian. If an anti-Hermitian operator has an eigenvalue, that eigenvalue is zero or pure imaginary. To show this, suppose that

G|ψ⟩ = λ|ψ⟩,   (1.146)

where G is an anti-Hermitian operator and |ψ⟩ has been normalized. As in the case of (1.123) we have

⟨ψ|G|ψ⟩ = λ⟨ψ|ψ⟩ = λ.   (1.147)

Taking a complex conjugate of (1.147), we have

⟨ψ|Gψ⟩* = λ*.   (1.148)

Using (1.116) and (1.145) again, we have

λ* = ⟨ψ|Gψ⟩* = ⟨ψ|G†ψ⟩ = −⟨ψ|Gψ⟩ = −λ.   (1.149)

This shows that λ is zero or pure imaginary.
Therefore, (1.142) can be viewed as an eigenvalue equation for which any physical state |ψ⟩ has a pure imaginary eigenvalue iħ with respect to [q, p]. Note that both q and p are Hermitian (see Sect. 10.2, Example 10.3), and so [q, p] is anti-Hermitian, as mentioned above. The canonical commutation relation given by (1.140) is believed to underpin the uncertainty principle.
In quantum mechanics, it is of great importance whether a quantum operator is Hermitian or not. A position operator and a momentum operator, along with an angular momentum operator, are particularly important when we constitute a Hamiltonian. Let f and g be arbitrary functions. Let us consider, e.g., the following inner product with the momentum operator:

⟨g|pf⟩ = ∫_{a}^{b} g(x)* (ħ/i)(∂/∂x)[f(x)] dx,   (1.150)

where the domain [a, b] depends on the physical system; this can be either bounded or unbounded. Performing integration by parts, we have

⟨g|pf⟩ = (ħ/i)[g(x)* f(x)]_{a}^{b} − ∫_{a}^{b} (ħ/i)(∂g(x)*/∂x) f(x) dx
 = (ħ/i)[g(b)* f(b) − g(a)* f(a)] + ∫_{a}^{b} [(ħ/i)∂g(x)/∂x]* f(x) dx.   (1.151)

If we require f(b) = f(a) and g(b) = g(a), the first term vanishes and we get

⟨g|pf⟩ = ∫_{a}^{b} [(ħ/i)∂g(x)/∂x]* f(x) dx = ⟨pg|f⟩.   (1.152)

Thus, as in the case of (1.120), the momentum operator p is Hermitian. Note that the position operator q of (1.142) is Hermitian as an a priori assumption.
Meanwhile, the z-component of the angular momentum operator, Lz, is described in polar coordinates as follows:

Lz = (ħ/i) ∂/∂ϕ,   (1.153)

where ϕ is an azimuthal angle varying from 0 to 2π. The notation and implication of Lz will be mentioned in Chap. 3. Similarly to the above, we have

⟨g|Lz f⟩ = (ħ/i)[g(2π)* f(2π) − g(0)* f(0)] + ∫_{0}^{2π} [(ħ/i)∂g(ϕ)/∂ϕ]* f(ϕ) dϕ.   (1.154)

Requiring an arbitrary function f to satisfy the BC f(2π) = f(0), we reach

⟨g|Lz f⟩ = ⟨Lz g|f⟩.   (1.155)

Note that we must have the above BC, because ϕ = 0 and ϕ = 2π are spatially the same point. Thus, we find that Lz is Hermitian as well under this condition.
On the basis of the aforementioned argument, let us proceed to quantum-mechanical studies of a harmonic oscillator. Regarding the angular momentum, we will study its basic properties in Chap. 3.



Chapter 2
Quantum-Mechanical Harmonic Oscillator

Quantum-mechanical treatment of a harmonic oscillator has been a well-studied


topic from the beginning of the history of quantum mechanics. This topic is a
standard subject in classical mechanics as well. In this chapter, first we briefly
survey characteristics of a classical harmonic oscillator. From a quantum-mechanical
point of view, we deal with features of a harmonic oscillator through matrix
representation. We define creation and annihilation operators using position and
momentum operators. A Hamiltonian of the oscillator is described in terms of the
creation and annihilation operators. This enables us to easily determine energy
eigenvalues of the oscillator. As a result, energy eigenvalues are found to be positive
definite. Meanwhile, we express the Schrödinger equation by the coordinate repre-
sentation. We compare the results with those of the matrix representation and show
that the two representations are mathematically equivalent. Thus, the treatment of the
quantum-mechanical harmonic oscillator supplies us with a firm ground for studying
basic concepts of the quantum mechanics.

2.1 Classical Harmonic Oscillator

The classical Newtonian equation of motion of a one-dimensional harmonic oscillator is expressed as

m d²x(t)/dt² = −sx(t),   (2.1)

where m is the mass of the oscillator and s is a spring constant. Putting s/m = ω², we have


d²x(t)/dt² + ω²x(t) = 0.   (2.2)

In (2.2) we set ω positive, namely,

ω = √(s/m),   (2.3)

where ω is called the angular frequency of the oscillator.
If we replace ω² with λ, we have formally the same equation as (1.61). Two linearly independent solutions of (2.2) are the same as before (see Example 1.1); we have e^{iωt} and e^{-iωt} (ω ≠ 0) as such. Note, however, that in Example 1.2 we were dealing with a quantum state related to the existence probability of a particle in a potential well. In (2.2), on the other hand, we are examining the position of a harmonic oscillator subjected to the force of a spring. We are thus considering a different situation.
As a general solution we have

x(t) = ae^{iωt} + be^{-iωt},   (2.4)

where a and b are suitable constants. Let us consider BCs different from those of Examples 1.1 or 1.2 this time. That is, we set BCs such that

x(0) = 0  and  x′(0) = v₀  (v₀ > 0).   (2.5)

Notice that (2.5) gives initial conditions (ICs). Mathematically, ICs are included in BCs (see Chap. 10). From (2.4) we have

x(0) = a + b = 0  and  x′(0) = iω(a − b) = v₀.   (2.6)

Then, we get a = −b = v₀/2iω. Thus, we get a simple harmonic motion as a solution expressed as

x(t) = (v₀/2iω)(e^{iωt} − e^{-iωt}) = (v₀/ω) sin ωt.   (2.7)

From this, we have

E = K + V = (1/2)mv₀².   (2.8)

In particular, if v₀ = 0, then x(t) ≡ 0. This is a solution of (2.1) meaning that the particle is eternally at rest; it is physically acceptable as well. Notice also that, unlike Examples 1.1 and 1.2, the solution has been determined uniquely. This is due to the different BCs.
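The unique determination of the solution by the ICs (2.5) can be confirmed with SymPy's ODE solver (a sketch we add; ω and v₀ are treated as positive symbols):

import sympy as sp

t, omega, v0 = sp.symbols('t omega v0', positive=True)
x = sp.Function('x')

ode = sp.Eq(x(t).diff(t, 2) + omega**2 * x(t), 0)   # Eq. (2.2)
sol = sp.dsolve(ode, x(t), ics={x(0): 0, x(t).diff(t).subs(t, 0): v0})
print(sol)   # Eq(x(t), v0*sin(omega*t)/omega), reproducing (2.7)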

From the point of view of a mechanical system, the mathematical formulation of the classical harmonic oscillator resembles that of electromagnetic fields confined within a cavity. We return to this point later in Sect. 9.6.

2.2 Formulation Based on an Operator Method

Now let us return to our task of finding quantum-mechanical solutions of a harmonic oscillator. The potential V is given by

V(q) = (1/2)sq² = (1/2)mω²q²,   (2.9)

where q is used for a one-dimensional position coordinate. Then, we have a classical Hamiltonian H expressed as

H = p²/2m + V(q) = p²/2m + (1/2)mω²q².   (2.10)

Following the formulation of Sect. 1.2, the Schrödinger equation as an eigenvalue equation related to the energy E is described as

Hψ(q) = Eψ(q)

or

−(ħ²/2m)∇²ψ(q) + (1/2)mω²q²ψ(q) = Eψ(q).   (2.11)

This is a SOLDE, and it is well known that a SOLDE can be solved by a power series expansion method.
In the present studies, however, let us first use an operator method to solve the eigenvalue equation (2.11) of a one-dimensional oscillator. To this end, we use a quantum-mechanical Hamiltonian in which the momentum operator p is explicitly represented. Thus, the Hamiltonian reads as

H = p²/2m + (1/2)mω²q².   (2.12)

Equation (2.12) is formally the same as (2.10). Note, however, that in (2.12) p and q are expressed as quantum-mechanical operators.
As in (1.126), we first examine an expectation value ⟨H⟩ of H. It is given by

⟨H⟩ = ⟨ψ|Hψ⟩ = ⟨ψ|(p²/2m)ψ⟩ + ⟨ψ|(1/2)mω²q²ψ⟩ = (1/2m)⟨p†ψ|pψ⟩ + (1/2)mω²⟨q†ψ|qψ⟩
 = (1/2m)⟨pψ|pψ⟩ + (1/2)mω²⟨qψ|qψ⟩ = (1/2m)‖pψ‖² + (1/2)mω²‖qψ‖² ≥ 0,   (2.13)

where again we assumed that |ψ⟩ has been normalized. In (2.13) we used the notation (1.126) and the fact that both q and p are Hermitian. In this situation, ⟨H⟩ takes a non-negative value.
In (2.13), the equality holds if and only if |pψ⟩ = 0 and |qψ⟩ = 0. Let us specify a vector |ψ₀⟩ that satisfies these conditions such that

|pψ₀⟩ = 0  and  |qψ₀⟩ = 0.   (2.14)

Multiplying by q from the left on the first equation of (2.14) and by p from the left on the second equation, we have

qp|ψ₀⟩ = 0  and  pq|ψ₀⟩ = 0.   (2.15)

Subtracting the second equation of (2.15) from the first, we get

(qp − pq)|ψ₀⟩ = iħ|ψ₀⟩ = 0,   (2.16)

where with the first equality we used (1.140). Therefore, we would have |ψ₀(q)⟩ ≡ 0. This leads back to the relations (2.14). That is, ⟨H⟩ = 0 if and only if |ψ₀(q)⟩ ≡ 0. But since that state has no physical meaning, |ψ₀(q)⟩ ≡ 0 must be rejected as unsuitable for a solution of (2.11). For a physically acceptable solution of (2.13), ⟨H⟩ must accordingly take a positive definite value. Thus, on the basis of the canonical commutation relation, we restrict the range of the expectation values.
Instead of directly dealing with (2.12), it is well known to introduce the following operators [1]:
a ≡ √(mω/2ħ) q + (i/√(2mħω)) p   (2.17)

and its adjoint (complex conjugate) operator

a† = √(mω/2ħ) q − (i/√(2mħω)) p.   (2.18)

Notice here again that both q and p are Hermitian. Using a matrix representation for (2.17) and (2.18), we have
\begin{pmatrix} a \\ a^{\dagger} \end{pmatrix} = \begin{pmatrix} \sqrt{m\omega/2\hbar} & i/\sqrt{2m\hbar\omega} \\ \sqrt{m\omega/2\hbar} & -i/\sqrt{2m\hbar\omega} \end{pmatrix}\begin{pmatrix} q \\ p \end{pmatrix}.   (2.19)

Then we have

a†a = (2mħω)^{-1}(mωq − ip)(mωq + ip) = (2mħω)^{-1}[m²ω²q² + p² + imω(qp − pq)]
 = (ħω)^{-1}[(1/2)mω²q² + (1/2m)p² + (1/2)iω ⋅ iħ] = (ħω)^{-1}[H − (1/2)ħω],   (2.20)

where the second last equality comes from (1.140). Rewriting (2.20), we get

H = ħωa†a + (1/2)ħω.   (2.21)

Similarly, we get

H = ħωaa† − (1/2)ħω.   (2.22)

Subtracting (2.22) from (2.21), we have

0 = ħωa†a − ħωaa† + ħω.   (2.23)

That is,

[a, a†] = 1  or  [a, a†] = E,   (2.24)

where E represents an identity operator.
Furthermore, using (2.21) we have

[H, a†] = ħω[a†a + 1/2, a†] = ħω(a†aa† − a†a†a) = ħωa†[a, a†] = ħωa†.   (2.25)

Similarly, we get

[H, a] = −ħωa.   (2.26)

Next, let us calculate an expectation value of H. Using a normalized function |ψ⟩, from (2.21) we have

⟨ψ|H|ψ⟩ = ⟨ψ|(ħωa†a + (1/2)ħω)ψ⟩ = ħω⟨ψ|a†aψ⟩ + (1/2)ħω⟨ψ|ψ⟩
 = ħω⟨aψ|aψ⟩ + (1/2)ħω = ħω‖aψ‖² + (1/2)ħω ≥ (1/2)ħω.   (2.27)

Thus, the expectation value is equal to or larger than (1/2)ħω. This is consistent with the fact that an energy eigenvalue is positive definite, as mentioned above. Equation (2.27) also tells us that if we have

|aψ₀⟩ = 0,   (2.28)

we get

⟨ψ₀|H|ψ₀⟩ = (1/2)ħω.   (2.29)

Equation (2.29) means that the smallest expectation value is (1/2)ħω under the condition of (2.28). Under the same condition, using (2.21) we have

H|ψ₀⟩ = ħωa†a|ψ₀⟩ + (1/2)ħω|ψ₀⟩ = (1/2)ħω|ψ₀⟩.   (2.30)

Thus, |ψ₀⟩ is an eigenfunction corresponding to the eigenvalue (1/2)ħω ≡ E₀, which is identical with the smallest expectation value of (2.29). Since this is the lowest eigenvalue, |ψ₀⟩ is said to be the ground state. We ensure later that |ψ₀⟩ is certainly an eligible function for a ground state.
The above method is consistent with the variational principle [2], which stipulates that under appropriate BCs an expectation value of the Hamiltonian estimated with an arbitrary function is always larger than or equal to the smallest eigenvalue, which corresponds to the ground state.
Next, let us evaluate the energy eigenvalues of the oscillator. First we have

H|ψ₀⟩ = (1/2)ħω|ψ₀⟩ = E₀|ψ₀⟩.   (2.31)

Operating with a† on both sides of (2.31), we have

a†H|ψ₀⟩ = a†E₀|ψ₀⟩.   (2.32)

Meanwhile, using (2.25), we have

a†H|ψ₀⟩ = (Ha† − ħωa†)|ψ₀⟩.   (2.33)

Equating the RHSs of (2.32) and (2.33), we get

Ha†|ψ₀⟩ = (E₀ + ħω)a†|ψ₀⟩.   (2.34)

This implies that a†|ψ₀⟩ belongs to the eigenvalue (E₀ + ħω), which is larger than E₀ as expected. Again multiplying both sides of (2.34) by a† from the left and using (2.25), we get
Fig. 2.1 Energy eigenvalues of a quantum-mechanical harmonic oscillator on a real axis: 1/2, 3/2, 5/2, 7/2, 9/2, ⋯ in units of ħω.

H(a†)²|ψ₀⟩ = (E₀ + 2ħω)(a†)²|ψ₀⟩.   (2.35)

This implies that (a†)²|ψ₀⟩ belongs to the eigenvalue (E₀ + 2ħω). Thus, repeatedly taking the above procedures, we get

H(a†)ⁿ|ψ₀⟩ = (E₀ + nħω)(a†)ⁿ|ψ₀⟩.   (2.36)

Thus, (a†)ⁿ|ψ₀⟩ belongs to the eigenvalue

Eₙ ≡ (E₀ + nħω) = (n + 1/2)ħω,   (2.37)

where Eₙ denotes an energy eigenvalue of the n-th excited state. The energy eigenvalues are plotted in Fig. 2.1.
Our next task is to seek normalized eigenvectors of the n-th excited state. Let cₙ be a normalization constant of that state. That is, we have

|ψₙ⟩ = cₙ(a†)ⁿ|ψ₀⟩,   (2.38)

where |ψₙ⟩ is a normalized eigenfunction of the n-th excited state. To determine cₙ, let us calculate a|ψₙ⟩. This includes a factor a(a†)ⁿ. We have

a(a†)ⁿ = (aa† − a†a)(a†)^{n−1} + a†a(a†)^{n−1}
 = [a, a†](a†)^{n−1} + a†a(a†)^{n−1} = (a†)^{n−1} + a†a(a†)^{n−1}
 = (a†)^{n−1} + a†[a, a†](a†)^{n−2} + (a†)²a(a†)^{n−2}
 = 2(a†)^{n−1} + (a†)²a(a†)^{n−2}
 = 2(a†)^{n−1} + (a†)²[a, a†](a†)^{n−3} + (a†)³a(a†)^{n−3}
 = 3(a†)^{n−1} + (a†)³a(a†)^{n−3}
 = ⋯ .   (2.39)

In the above procedures, we used [a, a†] = 1. What is implied in (2.39) is that the coefficient of (a†)^{n−1} increases one by one as a is transferred toward the right one by one in the second term of the RHS. Notice that in the second term a is sandwiched such that (a†)^{m}a(a†)^{n−m} (m = 1, 2, ⋯). Finally, we have
a(a†)ⁿ = n(a†)^{n−1} + (a†)ⁿa.   (2.40)

Thus, we get

a|ψₙ⟩ = cₙa(a†)ⁿ|ψ₀⟩ = cₙ[n(a†)^{n−1} + (a†)ⁿa]|ψ₀⟩ = cₙn(a†)^{n−1}|ψ₀⟩
 = (cₙ/c_{n−1}) n c_{n−1}(a†)^{n−1}|ψ₀⟩ = (cₙ/c_{n−1}) n |ψ_{n−1}⟩,   (2.41)

where the third equality comes from (2.28).
Next, operating with a on (2.40) we get

a²(a†)ⁿ = na(a†)^{n−1} + a(a†)ⁿa = n[(n−1)(a†)^{n−2} + (a†)^{n−1}a] + a(a†)ⁿa
 = n(n−1)(a†)^{n−2} + n(a†)^{n−1}a + a(a†)ⁿa.   (2.42)

Operating with another a on (2.42), we get

a³(a†)ⁿ = n(n−1)a(a†)^{n−2} + na(a†)^{n−1}a + a²(a†)ⁿa
 = n(n−1)[(n−2)(a†)^{n−3} + (a†)^{n−2}a] + na(a†)^{n−1}a + a²(a†)ⁿa
 = n(n−1)(n−2)(a†)^{n−3} + n(n−1)(a†)^{n−2}a + na(a†)^{n−1}a + a²(a†)ⁿa
 = ⋯ .   (2.43)

To generalize the above procedures, operating with a on (2.40) m (<n) times we get

aᵐ(a†)ⁿ = n(n−1)(n−2)⋯(n−m+1)(a†)^{n−m} + f(a, a†)a,   (2.44)

where m < n and f(a, a†) is a polynomial in a† that has powers of a as coefficients. Further operating with ⟨ψ₀| and |ψ₀⟩ from the left and right on both sides of (2.44), respectively, we have

⟨ψ₀|aᵐ(a†)ⁿ|ψ₀⟩ = n(n−1)(n−2)⋯(n−m+1)⟨ψ₀|(a†)^{n−m}|ψ₀⟩ + ⟨ψ₀|f(a, a†)aψ₀⟩
 = n(n−1)(n−2)⋯(n−m+1)⟨ψ₀|(a†)^{n−m}|ψ₀⟩.   (2.45)

Note that in (2.45) ⟨ψ₀|f(a, a†)a|ψ₀⟩ vanishes because of (2.28).
Meanwhile, taking the adjoint of (2.28), we have
⟨ψ₀|a† = 0.   (2.46)

Operating with a† (n − m − 1) times from the right on the LHS of (2.46), we have

⟨ψ₀|(a†)^{n−m} = 0.

Further operating with |ψ₀⟩ from the right on the above equation, we get

⟨ψ₀|(a†)^{n−m}|ψ₀⟩ = 0.

Therefore, from (2.45) we get

⟨ψ₀|aᵐ(a†)ⁿ|ψ₀⟩ = 0.   (2.47)

Taking the adjoint of (2.47) once again, we have

⟨ψ₀|aⁿ(a†)ᵐ|ψ₀⟩ = 0.   (2.48)

Equation (2.48) can be obtained by repeatedly using (1.117). From (2.47) and (2.48), we get

⟨ψ₀|aᵐ(a†)ⁿ|ψ₀⟩ = 0  when m ≠ n.   (2.49)

If m = n, from (2.45) we get

⟨ψ₀|aⁿ(a†)ⁿ|ψ₀⟩ = n!⟨ψ₀|ψ₀⟩.   (2.50)

If we assume that |ψ₀⟩ is normalized, i.e., ⟨ψ₀|ψ₀⟩ = 1, (2.50) is expressed as

⟨ψ₀|aⁿ(a†)ⁿ|ψ₀⟩ = n!.   (2.51)

From (2.51), if we put

|ψₙ⟩ = (1/√n!)(a†)ⁿ|ψ₀⟩,   (2.52)

then we have

⟨ψₙ| = (1/√n!)⟨ψ₀|aⁿ.

Thus, from (2.49) and (2.52) we get
⟨ψₘ|ψₙ⟩ = δₘₙ.   (2.53)

At the same time, for cₙ of (2.38) we get

cₙ = 1/√n!.   (2.54)

Notice here that an undetermined phase factor e^{iθ} (θ: real) is intended such that

cₙ = (1/√n!)e^{iθ}.

But e^{iθ} is usually omitted for the sake of simplicity. Thus, we have constructed a series of orthonormal eigenfunctions |ψₙ⟩.
Furthermore, using (2.41) and (2.54) we get

a|ψₙ⟩ = √n |ψ_{n−1}⟩.   (2.55)

From (2.36), we get

H|ψₙ⟩ = (E₀ + nħω)|ψₙ⟩.   (2.56)

Meanwhile, from (2.21) we have

H|ψₙ⟩ = (ħωa†a + E₀)|ψₙ⟩.   (2.57)

Equating the RHSs of (2.56) and (2.57), we get

a†a|ψₙ⟩ = n|ψₙ⟩.   (2.58)

Thus, we find that the integer n is an eigenvalue of a†a when it is evaluated with respect to |ψₙ⟩. For this reason a†a is called the number operator. Notice that a†a is Hermitian, because we have

(a†a)† = a†(a†)† = a†a,   (2.59)

where we used (1.115) and (1.117).
Moreover, from (2.55) and (2.58),

a†a|ψₙ⟩ = √n a†|ψ_{n−1}⟩ = n|ψₙ⟩.   (2.60)

Thus,

a†|ψ_{n−1}⟩ = √n |ψₙ⟩.   (2.61)

Or, replacing n with n + 1, we get

a†|ψₙ⟩ = √(n + 1) |ψ_{n+1}⟩.   (2.62)

As implied in (2.55) and (2.62), we find that operating with a on |ψₙ⟩ lowers the energy level by one, and that operating with a† on |ψₙ⟩ raises the energy level by one. For this reason, a and a† are said to be an annihilation operator and a creation operator, respectively.

2.3 Matrix Representation of Physical Quantities

Equations (2.55) and (2.62) clearly represent the relationship between an operator and an eigenfunction (or eigenvector). The relationship is characterized by

(Matrix) × (Vector) = (Vector).   (2.63)

Thus, we are now in a position to construct this relation using matrices. From (2.53), we should be able to construct basis vectors using column vectors such that

|ψ₀⟩ = \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \\ \vdots \end{pmatrix},  |ψ₁⟩ = \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \\ \vdots \end{pmatrix},  |ψ₂⟩ = \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \\ \vdots \end{pmatrix},  ⋯.   (2.64)

Notice that these vectors form a vector space of infinite dimension. The orthonormal relation (2.53) can easily be checked. We represent a and a† so that (2.55) and (2.62) are satisfied. We obtain

a = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 & \cdots \\ 0 & 0 & \sqrt{2} & 0 & 0 & \cdots \\ 0 & 0 & 0 & \sqrt{3} & 0 & \cdots \\ 0 & 0 & 0 & 0 & 2 & \cdots \\ 0 & 0 & 0 & 0 & 0 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix}.   (2.65)

Similarly,

a^{\dagger} = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 & \cdots \\ 1 & 0 & 0 & 0 & 0 & \cdots \\ 0 & \sqrt{2} & 0 & 0 & 0 & \cdots \\ 0 & 0 & \sqrt{3} & 0 & 0 & \cdots \\ 0 & 0 & 0 & 2 & 0 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix}.   (2.66)

Note that neither a nor a† is Hermitian.
Since the determinant of the matrix of (2.19) is −i/ħ (≠ 0), using its inverse matrix we get

\begin{pmatrix} q \\ p \end{pmatrix} = \begin{pmatrix} \sqrt{\hbar/2m\omega} & \sqrt{\hbar/2m\omega} \\ (1/i)\sqrt{m\hbar\omega/2} & -(1/i)\sqrt{m\hbar\omega/2} \end{pmatrix}\begin{pmatrix} a \\ a^{\dagger} \end{pmatrix}.   (2.67)

That is, we have

q = √(ħ/2mω)(a + a†)  and  p = (1/i)√(mħω/2)(a − a†).   (2.68)

We will deal with inverse matrices in Part III. Note that q and p are both Hermitian. Inserting (2.65) and (2.66) into (2.68), we get

q = \sqrt{\frac{\hbar}{2m\omega}}\begin{pmatrix} 0 & 1 & 0 & 0 & 0 & \cdots \\ 1 & 0 & \sqrt{2} & 0 & 0 & \cdots \\ 0 & \sqrt{2} & 0 & \sqrt{3} & 0 & \cdots \\ 0 & 0 & \sqrt{3} & 0 & 2 & \cdots \\ 0 & 0 & 0 & 2 & 0 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix},   (2.69)

p = i\sqrt{\frac{m\hbar\omega}{2}}\begin{pmatrix} 0 & -1 & 0 & 0 & 0 & \cdots \\ 1 & 0 & -\sqrt{2} & 0 & 0 & \cdots \\ 0 & \sqrt{2} & 0 & -\sqrt{3} & 0 & \cdots \\ 0 & 0 & \sqrt{3} & 0 & -2 & \cdots \\ 0 & 0 & 0 & 2 & 0 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix}.   (2.70)
Equations (2.69) and (2.70) obviously show that q and p are Hermitian. We can derive various physical quantities from these equations. For instance, following matrix algebra, the Hamiltonian H can readily be calculated. The result is expressed as

H = \frac{p^{2}}{2m} + \frac{1}{2}m\omega^{2}q^{2} = \frac{\hbar\omega}{2}\begin{pmatrix} 1 & 0 & 0 & 0 & 0 & \cdots \\ 0 & 3 & 0 & 0 & 0 & \cdots \\ 0 & 0 & 5 & 0 & 0 & \cdots \\ 0 & 0 & 0 & 7 & 0 & \cdots \\ 0 & 0 & 0 & 0 & 9 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix}.   (2.71)

Looking at (2.69)–(2.71), we immediately realize that although neither q nor p is diagonalized, H is diagonalized. The matrix representation of (2.71) is said to be a representation that diagonalizes H. This representation is of great practical use. In fact, using (2.64) we get, e.g.,

H|ψ₂⟩ = \frac{\hbar\omega}{2}\begin{pmatrix} 1 & 0 & 0 & 0 & 0 & \cdots \\ 0 & 3 & 0 & 0 & 0 & \cdots \\ 0 & 0 & 5 & 0 & 0 & \cdots \\ 0 & 0 & 0 & 7 & 0 & \cdots \\ 0 & 0 & 0 & 0 & 9 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix}\begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \\ 0 \\ \vdots \end{pmatrix} = \frac{\hbar\omega}{2}\begin{pmatrix} 0 \\ 0 \\ 5 \\ 0 \\ 0 \\ \vdots \end{pmatrix} = \frac{5\hbar\omega}{2}\begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \\ 0 \\ \vdots \end{pmatrix} = \frac{5\hbar\omega}{2}|ψ₂⟩ = \left(\frac{1}{2} + 2\right)\hbar\omega|ψ₂⟩.   (2.72)

This clearly means that the second-excited state |ψ₂⟩ has an eigenenergy (1/2 + 2)ħω. More generally, we find that |ψₙ⟩ has an eigenenergy (1/2 + n)ħω, as already shown in (2.36) and (2.56).
Furthermore, let us confirm the canonical commutation relation of (1.140). Using (2.68), we have

qp − pq = (1/i)√(ħ/2mω)√(mħω/2)[(a + a†)(a − a†) − (a − a†)(a + a†)]
 = (ħ/2i) ⋅ (−2)(aa† − a†a) = iħ[a, a†] = iħ = iħE,   (2.73)

where with the second last equality we used (2.24) and the identity matrix E of infinite dimension is given by
E = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & \cdots \\ 0 & 1 & 0 & 0 & 0 & \cdots \\ 0 & 0 & 1 & 0 & 0 & \cdots \\ 0 & 0 & 0 & 1 & 0 & \cdots \\ 0 & 0 & 0 & 0 & 1 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix}.   (2.74)

Thus, the canonical commutation relation holds for the quantum-mechanical harmonic oscillator. This can be confirmed directly from (2.69) and (2.70); the proof is left for readers as an exercise (a numerical sketch follows below).
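As a numerical counterpart of this exercise (our sketch, not the book's worked solution; units ħ = m = ω = 1), one can build truncated N × N versions of (2.65)–(2.70) and check (2.71) and (2.73). In a finite truncation the relations fail only near the bottom-right corner, an artifact of cutting off the infinite matrices:

import numpy as np

N = 8
hbar = m = omega = 1.0
a = np.diag(np.sqrt(np.arange(1, N)), k=1)      # annihilation operator, (2.65)
ad = a.conj().T                                 # creation operator, (2.66)

q = np.sqrt(hbar / (2 * m * omega)) * (a + ad)            # (2.68)
p = np.sqrt(m * hbar * omega / 2) * (a - ad) / 1j         # (2.68)

H = p @ p / (2 * m) + 0.5 * m * omega**2 * q @ q
print(np.round(np.diag(H).real[:-1], 6))        # 0.5, 1.5, 2.5, ... as in (2.71)

C = q @ p - p @ q                               # should equal i*hbar*E, (2.73)
print(np.round(np.diag(C / (1j * hbar)).real, 6))  # 1, ..., 1, -(N-1) at the corner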

2.4 Coordinate Representation of Schrödinger Equation

The Schrödinger equation has been given in (1.55) or (2.11) in a SOLDE form. In contrast to the matrix representation, (1.55) and (2.11) are said to be the coordinate representation of the Schrödinger equation. Now, let us derive a coordinate representation of (2.28) to obtain an analytical solution.
On the basis of (1.31) and (2.17), a is expressed as

a = √(mω/2ħ) q + (i/√(2mħω)) p = √(mω/2ħ) q + (i/√(2mħω))(ħ/i)∂/∂q = √(mω/2ħ) q + √(ħ/2mω) ∂/∂q.   (2.75)

Thus, (2.28) reads as the following first-order linear differential equation (FOLDE):

[√(mω/2ħ) q + √(ħ/2mω) ∂/∂q]ψ₀(q) = 0,   (2.76)

or

√(mω/2ħ) qψ₀(q) + √(ħ/2mω) ∂ψ₀(q)/∂q = 0.   (2.77)

This is further reduced to

∂ψ₀(q)/∂q + (mω/ħ)qψ₀(q) = 0.   (2.78)

From this FOLDE form, we anticipate the following solution:

ψ₀(q) = N₀e^{−αq²},   (2.79)

where N₀ is a normalization constant and α is a constant coefficient. Putting (2.79) into (2.78), we have

[−2αq + (mω/ħ)q]N₀e^{−αq²} = 0.   (2.80)

Hence, we get

α = mω/2ħ.   (2.81)

Thus,

ψ₀(q) = N₀e^{−(mω/2ħ)q²}.   (2.82)

The constant N₀ can be determined from the normalization condition given by

∫_{−∞}^{∞} |ψ₀(q)|² dq = 1  or  N₀² ∫_{−∞}^{∞} e^{−(mω/ħ)q²} dq = 1.   (2.83)

Recalling the following formula:

∫_{−∞}^{∞} e^{−cq²} dq = √(π/c)  (c > 0),   (2.84)

we have

∫_{−∞}^{∞} e^{−(mω/ħ)q²} dq = √(πħ/mω).   (2.85)
To get (2.84), putting I ≡ ∫_{−∞}^{∞} e^{−cq²} dq, we have

I² = ∫_{−∞}^{∞} e^{−cq²} dq ∫_{−∞}^{∞} e^{−cs²} ds = ∫∫ e^{−c(q²+s²)} dq ds
 = ∫_{0}^{2π} dθ ∫_{0}^{∞} e^{−cr²} r dr = (1/2) ∫_{0}^{2π} dθ ∫_{0}^{∞} e^{−cR} dR = π/c,   (2.86)

where with the third equality we converted the two-dimensional Cartesian coordinates to polar coordinates; take q = r cos θ, s = r sin θ and convert the infinitesimal area element dq ds to dr ⋅ r dθ. With the second last equality of (2.86), we used the variable transformation r² → R. Hence, we get I = √(π/c).
Thus, we get

N₀ = (mω/πħ)^{1/4}  and  ψ₀(q) = (mω/πħ)^{1/4} e^{−(mω/2ħ)q²}.   (2.87)

Also, we have

a† = √(mω/2ħ) q − (i/√(2mħω)) p = √(mω/2ħ) q − (i/√(2mħω))(ħ/i)∂/∂q = √(mω/2ħ) q − √(ħ/2mω) ∂/∂q.   (2.88)

From (2.52), we get

ψₙ(q) = (1/√n!)(a†)ⁿ|ψ₀⟩ = (1/√n!)[√(mω/2ħ) q − √(ħ/2mω) ∂/∂q]ⁿ ψ₀(q)
 = (1/√n!)(mω/2ħ)^{n/2}[q − (ħ/mω)∂/∂q]ⁿ ψ₀(q).   (2.89)

Putting

β ≡ √(mω/ħ)  and  ξ = βq,   (2.90)

we rewrite (2.89) as

ψₙ(q) = ψₙ(ξ/β) = (1/√n!)(mω/2ħβ²)^{n/2}(ξ − ∂/∂ξ)ⁿ ψ₀(q)
 = (1/√n!)(1/2)^{n/2}(ξ − ∂/∂ξ)ⁿ ψ₀(q) = √(1/2ⁿn!) (mω/πħ)^{1/4}(ξ − ∂/∂ξ)ⁿ e^{−ξ²/2}.   (2.91)

Comparing (2.81) and (2.90), we have

α = β²/2.   (2.92)

Moreover, putting

Nₙ ≡ √(1/2ⁿn!) (mω/πħ)^{1/4} = √(β/(π^{1/2}2ⁿn!)),   (2.93)

we get

ψₙ(ξ/β) = Nₙ(ξ − ∂/∂ξ)ⁿ e^{−ξ²/2}.   (2.94)

We have to normalize (2.94) with respect to the variable ξ. Since ψₙ(q) has already been normalized as in (2.53), we have

∫_{−∞}^{∞} |ψₙ(q)|² dq = 1.   (2.95)

Changing the variable q to ξ, we have

(1/β) ∫_{−∞}^{∞} |ψₙ(ξ/β)|² dξ = 1.   (2.96)

Let us define ψ̃ₙ(ξ) as being normalized with respect to ξ. In other words, ψₙ(q) is converted to ψ̃ₙ(ξ) by means of the variable transformation and a concomitant change in the normalization condition. Then, we have

∫_{−∞}^{∞} |ψ̃ₙ(ξ)|² dξ = 1.   (2.97)

Comparing (2.96) and (2.97), if we define ψ̃ₙ(ξ) as

ψ̃ₙ(ξ) ≡ √(1/β) ψₙ(ξ/β),   (2.98)

then ψ̃ₙ(ξ) should be a properly normalized function. Thus, we get

ψ̃ₙ(ξ) = Ñₙ(ξ − ∂/∂ξ)ⁿ e^{−ξ²/2}  with  Ñₙ ≡ √(1/(π^{1/2}2ⁿn!)).   (2.99)

Meanwhile, according to the theory of classical orthogonal polynomials, the Hermite polynomials Hₙ(x) are defined as [3]

Hₙ(x) ≡ (−1)ⁿ e^{x²} (dⁿ/dxⁿ) e^{−x²}  (n ≥ 0),   (2.100)

where Hₙ(x) is an n-th order polynomial. We wish to show the following relation on the basis of mathematical induction:

ψ̃ₙ(ξ) = ÑₙHₙ(ξ)e^{−ξ²/2}.   (2.101)

Comparing (2.87), (2.98), and (2.99), we make sure that (2.101) holds with n = 0. When n = 1, from (2.99) we have

ψ̃₁(ξ) = Ñ₁(ξ − ∂/∂ξ)e^{−ξ²/2} = Ñ₁[ξe^{−ξ²/2} − (−ξ)e^{−ξ²/2}] = Ñ₁ ⋅ 2ξe^{−ξ²/2}
 = Ñ₁(−1)e^{ξ²}(d e^{−ξ²}/dξ)e^{−ξ²/2} = Ñ₁H₁(ξ)e^{−ξ²/2}.   (2.102)

Then, (2.101) holds with n = 1 as well.
Next, from the supposition of mathematical induction we assume that (2.101) holds with n. Then, we have

ψ̃_{n+1}(ξ) = Ñ_{n+1}(ξ − ∂/∂ξ)^{n+1} e^{−ξ²/2} = [1/√(2(n+1))] Ñₙ(ξ − ∂/∂ξ)(ξ − ∂/∂ξ)ⁿ e^{−ξ²/2}
 = [1/√(2(n+1))] (ξ − ∂/∂ξ)[ÑₙHₙ(ξ)e^{−ξ²/2}]
 = [1/√(2(n+1))] Ñₙ(−1)ⁿ(ξ − ∂/∂ξ)[e^{ξ²/2}(dⁿ/dξⁿ)e^{−ξ²}]
 = [1/√(2(n+1))] Ñₙ(−1)ⁿ[ξe^{ξ²/2}(dⁿ/dξⁿ)e^{−ξ²} − ξe^{ξ²/2}(dⁿ/dξⁿ)e^{−ξ²} − e^{ξ²/2}(d^{n+1}/dξ^{n+1})e^{−ξ²}]
 = [1/√(2(n+1))] Ñₙ(−1)^{n+1} e^{ξ²/2}(d^{n+1}/dξ^{n+1})e^{−ξ²}
 = Ñ_{n+1}(−1)^{n+1} e^{ξ²}[(d^{n+1}/dξ^{n+1})e^{−ξ²}] e^{−ξ²/2} = Ñ_{n+1}H_{n+1}(ξ)e^{−ξ²/2},   (2.103)

where we used Ñ_{n+1} = Ñₙ/√(2(n+1)) and, in the third equality, the induction hypothesis together with Hₙ(ξ)e^{−ξ²/2} = (−1)ⁿe^{ξ²/2}(dⁿ/dξⁿ)e^{−ξ²}.
This means that (2.101) holds with n + 1 as well. Thus, it follows that (2.101) is true for n being zero or any positive integer.
The orthogonality relation reads as
Table 2.1 First six Hermite polynomials

H₀(x) = 1
H₁(x) = 2x
H₂(x) = 4x² − 2
H₃(x) = 8x³ − 12x
H₄(x) = 16x⁴ − 48x² + 12
H₅(x) = 32x⁵ − 160x³ + 120x

∫_{−∞}^{∞} ψ̃ₘ(ξ)* ψ̃ₙ(ξ) dξ = δₘₙ.   (2.104)

Placing (2.98) back into the function form ψₙ(q), we have

ψₙ(q) = √β ψ̃ₙ(βq).   (2.105)

Using (2.101) and explicitly rewriting (2.105), we get

ψₙ(q) = (mω/πħ)^{1/4} √(1/2ⁿn!) Hₙ(√(mω/ħ) q) e^{−(mω/2ħ)q²}  (n = 0, 1, 2, ⋯).   (2.106)

We tabulate the first six Hermite polynomials Hₙ(x) in Table 2.1, where the index n represents the highest order of the polynomial. In Table 2.1 we see that even functions and odd functions appear alternately (i.e., parity). This is the case with ψₙ(q) as well, because ψₙ(q) is a product of Hₙ(√(mω/ħ) q) and an even function e^{−(mω/2ħ)q²} (see the short sketch below).

Combining (2.101) and (2.104), the orthogonality relation between the ψ̃ₙ(ξ) (n = 0, 1, 2, ⋯) can be described alternatively as [3]

∫_{−∞}^{∞} e^{−ξ²} Hₘ(ξ)Hₙ(ξ) dξ = √π 2ⁿn! δₘₙ.   (2.107)

Note that Hₘ(ξ) is a real function, and so Hₘ(ξ)* = Hₘ(ξ). The relation (2.107) is well known as the orthogonality of the Hermite polynomials with e^{−ξ²} taken as a weight function [3]. Here the weight function is a real and non-negative function within the domain considered [e.g., (−∞, +∞) in the present case] and is independent of the indices m and n. We will deal with it again in Sect. 8.4.
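A direct numerical check of (2.107) (our sketch; Gauss–Hermite quadrature from NumPy integrates polynomial integrands against the weight e^{−ξ²} exactly):

import math
import numpy as np
from numpy.polynomial.hermite import hermgauss, Hermite

xs, ws = hermgauss(30)                 # nodes and weights for the weight e^{-xi^2}
H = lambda n, x: Hermite.basis(n)(x)   # physicists' Hermite polynomial H_n

for (m, n) in [(2, 2), (3, 3), (2, 3)]:
    integral = np.sum(ws * H(m, xs) * H(n, xs))
    expected = math.sqrt(math.pi) * 2**n * math.factorial(n) * (m == n)
    print(m, n, integral, expected)    # the two columns agree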
The relation (2.101) and the orthogonality relation (2.107) can be understood more explicitly as follows. From (2.11) we have the Schrödinger equation of a one-dimensional quantum-mechanical harmonic oscillator such that

−(ħ²/2m) d²u(q)/dq² + (1/2)mω²q²u(q) = Eu(q).   (2.108)

Changing the variable as in (2.90), we have

−d²u(ξ)/dξ² + ξ²u(ξ) = (2E/ħω)u(ξ).   (2.109)

Defining a dimensionless parameter

λ ≡ 2E/ħω   (2.110)

and also defining a differential operator D such that

D ≡ −d²/dξ² + ξ²,   (2.111)

we have the following eigenvalue equation:

Du(ξ) = λu(ξ).   (2.112)

We further consider a function v(ξ) such that

u(ξ) = v(ξ)e^{−ξ²/2}.   (2.113)

Then, (2.109) is converted as follows:

[−d²v(ξ)/dξ² + 2ξ dv(ξ)/dξ] e^{−ξ²/2} = (λ − 1)v(ξ)e^{−ξ²/2}.   (2.114)

Since e^{−ξ²/2} does not vanish for any ξ, we have

−d²v(ξ)/dξ² + 2ξ dv(ξ)/dξ = (λ − 1)v(ξ).   (2.115)

If we define another differential operator D̃ such that

D̃ ≡ −d²/dξ² + 2ξ d/dξ,   (2.116)

we have another eigenvalue equation

D̃v(ξ) = (λ − 1)v(ξ).   (2.117)

Meanwhile, we have the following well-known differential equation:

d²Hₙ(ξ)/dξ² − 2ξ dHₙ(ξ)/dξ + 2nHₙ(ξ) = 0.   (2.118)

This equation is called the Hermite differential equation. Using (2.116), (2.118) can be recast as an eigenvalue equation such that

D̃Hₙ(ξ) = 2nHₙ(ξ).   (2.119)

Therefore, comparing (2.115) and (2.118) and putting

λ = 2n + 1,   (2.120)

we get

v(ξ) = cHₙ(ξ),   (2.121)

where c is an arbitrary constant. Thus, using (2.113), for a solution of (2.109) we get

uₙ(ξ) = cHₙ(ξ)e^{−ξ²/2},   (2.122)

where the solution u(ξ) is indexed with n. From (2.110), as an energy eigenvalue we have

Eₙ = (n + 1/2)ħω.

Thus, (2.37) is recovered. The normalization constant c of (2.122) can be decided as in (2.106).
As discussed above, the operator representation and the coordinate representation are fully consistent.

2.5 Variance and Uncertainty Principle

The uncertainty principle is one of the most fundamental concepts of quantum mechanics. To think about this conception on the basis of a quantum harmonic oscillator, let us introduce a variance operator [4]. Let A be a physical quantity and let ⟨A⟩ be its expectation value as defined in (1.126). We define a variance operator as

⟨(ΔA)²⟩,

where we have

ΔA ≡ A − ⟨A⟩.   (2.123)

In (2.123), we assume that ⟨A⟩ is obtained by operating with A on a certain physical state |ψ⟩. Then, we have

⟨(ΔA)²⟩ = ⟨(A − ⟨A⟩)²⟩ = ⟨A² − 2⟨A⟩A + ⟨A⟩²⟩ = ⟨A²⟩ − ⟨A⟩².   (2.124)

If A is Hermitian, ΔA is Hermitian as well. This is because

(ΔA)† = A† − ⟨A⟩* = A − ⟨A⟩ = ΔA,   (2.125)

where we used the fact that an expectation value of an Hermitian operator is real. Then, ⟨(ΔA)²⟩ is non-negative, as in the case of (2.13). Moreover, if |ψ⟩ is an eigenstate of A, ⟨(ΔA)²⟩ = 0. Therefore, ⟨(ΔA)²⟩ represents a measure of how widely measured values are dispersed when A is measured in reference to a quantum state |ψ⟩. Also, we define a standard deviation δA as

δA ≡ √⟨(ΔA)²⟩.   (2.126)
We have the following important theorem on the standard deviation δA [4].

Theorem 2.1 Let A and B be Hermitian operators. If A and B satisfy

[A, B] = ik  (k: nonzero real number),   (2.127)

then we have

δA ⋅ δB ≥ |k|/2   (2.128)

in reference to any quantum state |ψ⟩.

Proof We have

[ΔA, ΔB] = [A − ⟨ψ|A|ψ⟩, B − ⟨ψ|B|ψ⟩] = [A, B] = ik.   (2.129)

In (2.129), we used the fact that ⟨ψ|A|ψ⟩ and ⟨ψ|B|ψ⟩ are just real numbers, which commute with any operator. Next, we calculate the following quantity in relation to a real number λ:

‖(ΔA + iλΔB)|ψ⟩‖² = ⟨ψ|(ΔA − iλΔB)(ΔA + iλΔB)|ψ⟩ = ⟨ψ|(ΔA)²|ψ⟩ − kλ + ⟨ψ|(ΔB)²|ψ⟩λ².   (2.130)

where we used the fact that ΔA and ΔB are Hermitian. For the above quadratic form in λ to be non-negative for any real number λ, we must have

(−k)² − 4⟨ψ|(ΔA)²|ψ⟩⟨ψ|(ΔB)²|ψ⟩ ≤ 0.   (2.131)

Thus, (2.128) follows. ∎


On the basis of Theorem 2.1, we find that both δA and δB are positive on
condition that (2.127) holds. We have another important theorem.
Theorem 2.2 Let A be an Hermitian operator. The necessary and sufficient condi-
tion for a physical state jψ 0i to be an eigenstate of A is δA ¼ 0.
Proof Suppose that |ψ₀⟩ is a normalized eigenstate of A that belongs to an eigenvalue a. Then, we have

⟨ψ₀|A²ψ₀⟩ = a⟨ψ₀|Aψ₀⟩ = a²⟨ψ₀|ψ₀⟩ = a²,  ⟨ψ₀|Aψ₀⟩² = a²⟨ψ₀|ψ₀⟩² = a².   (2.132)

From (2.124) and (2.126), we have

⟨ψ₀|(ΔA)²ψ₀⟩ = 0,  i.e.,  δA = 0.   (2.133)

Note that δA is measured in reference to |ψ₀⟩. Conversely, suppose that δA = 0. Then,

δA = √⟨ψ₀|(ΔA)²ψ₀⟩ = √⟨ΔAψ₀|ΔAψ₀⟩ = ‖ΔAψ₀‖,   (2.134)

where we used the fact that ΔA is Hermitian. From the definition of the norm in (1.121), for δA = 0 to hold, we have

ΔAψ₀ = (A − ⟨A⟩)ψ₀ = 0,  i.e.,  Aψ₀ = ⟨A⟩ψ₀.   (2.135)

This indicates that ψ₀ is an eigenstate of A that belongs to the eigenvalue ⟨A⟩. This completes the proof. ∎
Theorem 2.1 implies that (2.127) holds for any physical state |ψ⟩. That is, we must have δA > 0 and δB > 0 if δA and δB are evaluated in reference to any |ψ⟩ under the condition that (2.127) holds. From Theorem 2.2, in turn, it follows that eigenstates cannot exist for A or B under the condition of (2.127).
To show this explicitly, we take an inner product of (2.127). That is, with Hermitian operators A and B, consider the following inner product:

⟨ψ|[A, B]|ψ⟩ = ⟨ψ|ik|ψ⟩,  i.e.,  ⟨ψ|AB − BA|ψ⟩ = ik,   (2.136)


where we assumed that |ψ⟩ is an arbitrarily chosen normalized vector. Suppose now that |ψ₀⟩ is an eigenstate of A that belongs to an eigenvalue a. Then, we have

A|ψ₀⟩ = a|ψ₀⟩.   (2.137)

Taking the adjoint of (2.137), we get

⟨ψ₀|A† = ⟨ψ₀|A = a*⟨ψ₀| = a⟨ψ₀|,   (2.138)

where the last equality comes from the fact that A is Hermitian (so that a is real). From (2.138), we would have

⟨ψ₀|AB − BA|ψ₀⟩ = ⟨ψ₀|AB|ψ₀⟩ − ⟨ψ₀|BA|ψ₀⟩
 = a⟨ψ₀|B|ψ₀⟩ − a⟨ψ₀|B|ψ₀⟩ = 0.

This would imply that (2.136) does not hold with |ψ₀⟩, in contradiction to (2.127), where ik ≠ 0. Namely, we conclude that no physical state can be an eigenstate of A under the condition that (2.127) holds. Equation (2.127) is rewritten as

⟨ψ|BA − AB|ψ⟩ = −ik.   (2.139)

Suppose now that |φ₀⟩ is an eigenstate of B that belongs to an eigenvalue b. Then, we can similarly show that no physical state can be an eigenstate of B.
Summarizing the above, we restate that once we have a relation [A, B] = ik (k ≠ 0), a representation matrix cannot diagonalize A or B. Or, once we postulate [A, B] = ik (k ≠ 0), we must abandon any effort to have a representation matrix that diagonalizes A and B. In the quantum-mechanical formulation of a harmonic oscillator, we have introduced the canonical commutation relation (see Sect. 2.3) described by [q, p] = iħ (1.140). Indeed, neither q nor p is diagonalized, as shown in (2.69) or (2.70).
Example 2.1 Taking a quantum harmonic oscillator as an example, we consider the variances of q and p in reference to |ψₙ⟩ (n = 0, 1, ⋯). We have

⟨(Δq)²⟩ = ⟨ψₙ|q²|ψₙ⟩ − ⟨ψₙ|q|ψₙ⟩².   (2.140)

Using (2.55) and (2.62) as well as (2.68), we get

⟨ψₙ|q|ψₙ⟩ = √(ħ/2mω) ⟨ψₙ|(a + a†)ψₙ⟩ = √(ħn/2mω) ⟨ψₙ|ψ_{n−1}⟩ + √(ħ(n+1)/2mω) ⟨ψₙ|ψ_{n+1}⟩ = 0,   (2.141)

where the last equality comes from (2.53). We have

q² = (ħ/2mω)(a + a†)² = (ħ/2mω)[a² + E + 2a†a + (a†)²],   (2.142)

where E denotes an identity operator and we used (2.24) along with the following relation:

aa† = aa† − a†a + a†a = [a, a†] + a†a = E + a†a.   (2.143)

Using (2.55) and (2.62), we have

⟨ψₙ|q²|ψₙ⟩ = (ħ/2mω)[⟨ψₙ|ψₙ⟩ + 2⟨ψₙ|a†aψₙ⟩] = (ħ/2mω)(2n + 1),   (2.144)

where we used (2.60) with the last equality. Thus, we get

⟨(Δq)²⟩ = ⟨ψₙ|q²|ψₙ⟩ − ⟨ψₙ|q|ψₙ⟩² = (ħ/2mω)(2n + 1).

Following procedures similar to those mentioned above, we get

⟨ψₙ|p|ψₙ⟩ = 0  and  ⟨ψₙ|p²|ψₙ⟩ = (mħω/2)(2n + 1).   (2.145)

Thus, we get

⟨(Δp)²⟩ = ⟨ψₙ|p²|ψₙ⟩ = (mħω/2)(2n + 1).

Accordingly, we have

δq ⋅ δp = √⟨(Δq)²⟩ √⟨(Δp)²⟩ = (ħ/2)(2n + 1) ≥ ħ/2.   (2.146)

The quantity δq ⋅ δp is equal to ħ/2 for n = 0 and becomes larger with increasing n.
The above example gives a good illustration of Theorem 2.1. Note that, putting A = q and B = p along with k = ħ in Theorem 2.1, we should have from (1.140)

δq ⋅ δp ≥ ħ/2.

This is indeed the case with (2.146) for the quantum-mechanical harmonic oscillator. This example represents the uncertainty principle more generally.
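The relation (2.146) can also be read off numerically from the truncated matrices of Sect. 2.3 (our sketch; units ħ = m = ω = 1; only states far below the truncation edge are trustworthy):

import numpy as np

N = 30
a = np.diag(np.sqrt(np.arange(1, N)), k=1)
ad = a.conj().T
q = (a + ad) / np.sqrt(2)               # (2.68) with hbar = m = omega = 1
p = (a - ad) / (1j * np.sqrt(2))

for n in range(4):
    psi = np.zeros(N, dtype=complex); psi[n] = 1.0    # |psi_n>
    var_q = np.vdot(psi, q @ q @ psi).real - np.vdot(psi, q @ psi).real ** 2
    var_p = np.vdot(psi, p @ p @ psi).real - np.vdot(psi, p @ psi).real ** 2
    print(n, np.sqrt(var_q * var_p))    # (2n + 1)/2 = 0.5, 1.5, 2.5, 3.5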
In relation to the aforementioned argument, we might well wonder whether Examples 1.1 and 1.2 have an eigenstate of a fixed momentum. Suppose that we chose for an eigenstate y(x) = ce^{ikx}, where c is a constant. Then, we would have (ħ/i)∂y(x)/∂x = ħky(x) and would get an eigenvalue ħk for the momentum. Nonetheless, such a y(x) does not satisfy the proper BCs; i.e., y(L) = y(−L) = 0. This is because e^{ikx} never vanishes for any real number k or x (or any complex number k or x, more generally). Thus, we cannot obtain a proper solution that has an eigenstate with a fixed momentum in a confined physical system.

References

1. Messiah A (1999) Quantum mechanics. Dover, New York


2. Stakgold I (1998) Green’s functions and boundary value problems, 2nd edn. Wiley, New York
3. Lebedev NN (1972) Special functions and their applications. Dover, New York
4. Shimizu A (2003) Upgrade foundation of quantum theory (in Japanese). Saiensu-sha, Tokyo
Chapter 3
Hydrogen-Like Atoms

In the history of quantum mechanics, the theory was first successfully applied to the motion of an electron in a hydrogen atom, along with the harmonic oscillator. Unlike the case of the one-dimensional harmonic oscillator we dealt with in Chap. 2, however, with a hydrogen atom we have to consider the three-dimensional motion of an electron. Accordingly, it takes somewhat elaborate calculations to constitute the Hamiltonian.
The calculation procedures themselves, however, are worth following to understand
underlying basic concepts of the quantum mechanics. At the same time, this chapter
is a treasure of special functions. In Chap. 2, we have already encountered one of
them, i.e., Hermite polynomials. Here we will deal with Legendre polynomials,
associated Legendre polynomials, etc. These special functions arise when we deal
with a physical system having, e.g., the spherical symmetry. In a hydrogen atom, an
electron is moving in a spherically symmetric Coulomb potential field produced by a
proton. This topic provides us with a good opportunity to study various special
functions. The related Schrödinger equation can be separated into an angular part
and a radial part. The solutions of angular parts are characterized by spherical
(surface) harmonics. The (associated) Legendre functions are correlated to them.
The solutions of the radial part are connected to the (associated) Laguerre poly-
nomials. The exact solutions are obtained by the product of the (associated) Legen-
dre functions and (associated) Laguerre polynomials accordingly. Thus, to study the
characteristics of hydrogen-like atoms from the quantum-mechanical perspective is
of fundamental importance.

3.1 Introductory Remarks

The motion of the electron in hydrogen is well known as a two-particle problem


(or two-body problem) in a central force field. In that case, the coordinate system of
the physical system is separated into the relative coordinates and center-of-mass
coordinates. To be more specific, the coordinate separation is true of the case where

© Springer Nature Singapore Pte Ltd. 2020 57


S. Hotta, Mathematical Physical Chemistry,
https://doi.org/10.1007/978-981-15-2225-3_3
58 3 Hydrogen-Like Atoms

two particles are moving under control only by a force field between the two
particles without other external force fields [1].
In the classical mechanics, equation of motion is separated into two equations
related to the relative coordinates and center-of-mass coordinates accordingly. Of
these, a term of the potential field is only included in the equation of motion with
respect to the relative coordinates.
The situation is the same with the quantum mechanics. Namely, the Schrödinger
equation of motion with the relative coordinates is expressed as an eigenvalue
equation that reads as
 
ħ2
 ∇2 þ V ðr Þ ψ ¼ Eψ, ð3:1Þ

where μ is a reduced mass of two particles [1], i.e., an electron and a proton; V(r) is a
potential with r being a distance between the electron and proton. In (3.1), we
assume the spherically symmetric potential; i.e., the potential is expressed only as
a function of the distance r. Moreover, if the potential is coulombic,
 
ħ2 2 e2
 ∇  ψ ¼ Eψ, ð3:2Þ
2μ 4πε0 r

where ε0 is permittivity of vacuum and e is an elementary charge.


If we think of hydrogen-like atoms such as He+, Li2+ and Be3+, we have an
equation described as
 
ħ2 Ze2
 ∇2  ψ ¼ Eψ, ð3:3Þ
2μ 4πε0 r

where Z is an atomic number and μ is a reduced mass of an electron and a nucleus


pertinent to the atomic (or ionic) species. We start with (3.3) in this chapter.

3.2 Constitution of Hamiltonian

As explicitly described in (3.3), the coulombic potential has a spherical symmetry. In


such a case, it will be convenient to recast (3.3) in a spherical coordinate (or polar
coordinate). As the physical system is of three-dimensional, we have to consider
orbital angular momentum L in Hamiltonian.
We have
3.2 Constitution of Hamiltonian 59

0 1
Lx
B C
L ¼ ðe1 e2 e3 Þ@ Ly A, ð3:4Þ
Lz

where e1, e2, and e3 denote an orthonormal basis vectors in a three-dimensional


Cartesian space (ℝ3); Lx, Ly, and Lz represent each component of L. The angular
momentum L is expressed in a form of determinant as
 
 e1 e2 e3 

 z ,
L¼xp¼ x y
 
 px py pz 

where x denotes a position vector with respect to the relative coordinates x, y, and z.
That is,
0 1
x
B C
x ¼ ð e1 e2 e3 Þ @ y A: ð3:5Þ
z

The quantity p denotes a momentum of an electron (as a particle carrying a


reduced mass μ) with px, py, and pz being their components; p is denoted similarly to
the above.
As for each component of L, we have, e.g.,

Lx ¼ ypz  zpy : ð3:6Þ

To calculate L2, we estimate Lx2, Ly2, and Lz2 separately. We have


   
Lx 2 ¼ ypz  zpy  ypz  zpy
¼ ypz ypz  ypz zpy  zpy ypz  zpy zpy
¼ y2 pz 2  ypz zpy  zpy ypz þ z2 py 2
   
¼ y2 pz 2  y zpz  iħ py  z ypy  iħ pz þ z2 py 2
 
¼ y2 pz 2 þ z2 py 2  yzpz py  zypy pz þ iħ ypy þ zpz ,

where we have used canonical commutation relation (1.140) in the second to the last
equality. In the above calculations, we used commutability of, e.g., y and pz; z and py.
For example, we have
60 3 Hydrogen-Like Atoms

   
ħ ∂ ∂ ħ ∂jψi ∂jψi
pz , y j ψi ¼ yy j ψi ¼ y y ¼ 0:
i ∂z ∂z i ∂z ∂z

Since jψi is arbitrarily chosen, this relation implies that pz and y commute. We
obtain similar relations regarding Ly2 and Lz2 as well. Thus, we have

L2 ¼ Lx 2 þ Ly 2 þ Lz 2
 
¼ y2 pz 2 þ z2 py 2 þ z2 px 2 þ x2 pz 2 þ x2 py 2 þ y2 px 2
 
þ x 2 px 2  x 2 px 2 þ y 2 py 2  y 2 py 2 þ z 2 pz 2  z 2 pz 2
 
 yzpz py þ zypy pz þ zxpx pz þ xzpz px þ xypy px þ yxpx py
 
þiħ ypy þ zpz þ zpz þ xpx þ xpx þ ypy

¼ y 2 pz 2 þ z 2 py 2 þ z 2 px 2 þ x 2 pz 2 þ x 2 py 2 þ y 2 px 2

þx2 px 2 þ y2 py 2 þ z2 pz 2 Þ  x2 px 2 þ y2 py 2 þ z2 pz 2

þyzpz py þ zypy pz þ zxpx pz þ xzpz px þ xypy px þ yxpx py Þ


 
þiħ ypy þ zpz þ zpz þ xpx þ xpx þ ypy

¼ r2  p2  rðr  pÞ  p þ 2iħðr  pÞ: ð3:7Þ

In (3.7), we are able to ease the calculations by virtue of putting a term


(x2px2  x2px2 + y2py2  y2py2 + z2pz2  z2pz2). As a result, for the second term
after the second to the last equality we have

 x 2 px 2 þ y 2 py 2 þ z 2 pz 2

þyzpz py þ zypy pz þ zxpx pz þ xzpz px þ xypy px þ yxpx py Þ


     
¼  x xpx þ ypy þ zpz px þ y xpx þ ypy þ zpz py þ z xpx þ ypy þ zpz pz

¼ rðr  pÞ  p:

The calculations of r2  p2 [the first term of (3.7)] and r  p (in the third term) are
straightforward.
In a spherical coordinate, momentum p is expressed as

p = pr eðrÞ þ pθ eðθÞ þ pϕ eðϕÞ , ð3:8Þ

where pr, pθ, and pϕ are components of p; e(r), e(θ), and e(ϕ) are orthonormal basis
vectors of ℝ3 in the direction of increasing r, θ, and ϕ, respectively (see Fig. 3.1). In
3.2 Constitution of Hamiltonian 61

(a) z (b)

e(r) z
e(r)

e(φ ) e(θ )
e(φ )
θ e(θ ) θ
O y
O
φ

Fig. 3.1 Spherical coordinate system and orthonormal basis set. (a) Orthonormal basis vectors e(r),
e(θ), and e(ϕ) in ℝ3. (b) The basis vector e(ϕ) is perpendicular to the plane shaped by the z-axis and a
straight line of y ¼ x tan ϕ

Fig. 3.1b, e(ϕ) is perpendicular to the plane shaped by the z-axis and a straight line of
y ¼ x tan ϕ. Notice that the said plane is spanned by e(r) and e(θ). Meanwhile, the
momentum operator is expressed as [2]

ħ
p¼ —
i
 
ħ ðrÞ ∂ ðθ Þ 1 ∂ ðϕÞ 1 ∂
¼ e þe þe : ð3:9Þ
i ∂r r ∂θ r sin θ ∂ϕ

The vector notation of (3.9) corresponds to (1.31). That is, in the Cartesian
coordinate, we have
 
ħ ħ ∂ ∂ ∂
p= —= e1 þ e2 þ e3 ,
i i ∂x ∂y ∂z

where — is said to be nabla (or del), a kind of differential vector operator.


Noting that

r = reðrÞ , ð3:10Þ

and using (3.9), we have


62 3 Hydrogen-Like Atoms

 
ħ ħ ∂
r  p=r  — =r : ð3:11Þ
i i ∂r

Hence,
    2
ħ ∂ ħ ∂ ∂
rðr  pÞ  p = r r ¼ ħ2 r 2 2 : ð3:12Þ
i ∂r i ∂r ∂r

Thus, we have

2  
∂ 2 ∂ 2 ∂ 2 ∂
L 2 ¼ r 2 p2 þ ħ2 r 2 þ 2ħ r ¼ r 2 2
p þ ħ r : ð3:13Þ
∂r
2 ∂r ∂r ∂r

Therefore,
 
ħ2 ∂ 2 ∂ L2
p ¼ 2
2
r þ 2: ð3:14Þ
r ∂r ∂r r

Notice here that L2 does not contain r (vide infra); i.e., L2 commutes with r2, and
so it can freely be divided by r2. Thus, the Hamiltonian H is represented by

p2
H¼ þ V ðr Þ

 2   
1 ħ ∂ ∂ L2 Ze2
¼  2 r2 þ 2  : ð3:15Þ
2μ r ∂r ∂r r 4πε0 r

Thus, the Schrödinger equation can be expressed as


 2   
1 ħ ∂ 2 ∂ L2 Ze2
 2 r þ 2  ψ ¼ Eψ: ð3:16Þ
2μ r ∂r ∂r r 4πε0 r

Now, let us describe L2 in a polar coordinate. The calculation procedures are


somewhat lengthy, but straightforward. First we have
9
x ¼ r sin θ cos ϕ, >
=
y ¼ r sin θ sin ϕ, ð3:17Þ
>
;
z ¼ r cos θ,

where we have 0  θ  π and 0  ϕ  2π. Rewriting (3.17) with respect to r, θ, and


ϕ, we get
3.2 Constitution of Hamiltonian 63

1 9
r ¼ ð x2 þ y2 þ z 2 Þ 2 , >
>
>
>
1=2 =
ð x2 þ y2 Þ ð3:18Þ
θ ¼ tan 1 , >
z >
>
1 y
>
;
ϕ ¼ tan :
x

Thus, we have
 
∂ ∂
Lz ¼ xpy  ypx ¼ iħ x  y , ð3:19Þ
∂y ∂x
∂ ∂r ∂ ∂θ ∂ ∂ϕ ∂
¼ þ þ : ð3:20Þ
∂x ∂x ∂r ∂x ∂θ ∂x ∂ϕ

In turn, we have

∂r x
¼ ¼ sin θ cos ϕ,
∂x r
1
∂θ 1 ðx2 þ y2 Þ 2  2x z x cos θ cos ϕ
¼  ¼ 2  ¼ ,
∂x 1 þ ðx2 þ y2 Þ=z2 2z x þ y2 þ z2 ðx2 þ y2 Þ12 r

∂ϕ 1 1 sin ϕ
¼  y  ¼ :
∂x 1 þ ðy2 =x2 Þ x2 r sin θ
ð3:21Þ

In calculating the last two equations of (3.21), we used the differentiation of an


arc tangent function along with a composite function. Namely,

 0 1
tan 1 x ¼ :
1 þ x2

Inserting (3.21) into (3.20), we get

∂ ∂ cos θ cos ϕ ∂ sin ϕ ∂


¼ sin θ cos ϕ þ  : ð3:22Þ
∂x ∂r r ∂θ r sin θ ∂ϕ

Similarly, we have

∂ ∂ cos θ sin ϕ ∂ cos ϕ ∂


¼ sin θ sin ϕ þ þ : ð3:23Þ
∂y ∂r r ∂θ r sin θ ∂ϕ

Inserting (3.22) and (3.23) together with (3.17) into (3.19), we get
64 3 Hydrogen-Like Atoms


Lz ¼ iħ : ð3:24Þ
∂ϕ

In a similar manner, we have

∂ ∂r ∂ ∂θ ∂ ∂ sin θ ∂
¼ þ ¼ cos θ  :
∂z ∂z ∂r ∂z ∂θ ∂r r ∂θ

Combining this relation with either (3.23) or (3.22), we get


 
∂ ∂
Lx ¼ iħ sin ϕ þ cot θ cos ϕ , ð3:25Þ
∂θ ∂ϕ
 
∂ ∂
Ly ¼ iħ cos ϕ  cot θ sin ϕ : ð3:26Þ
∂θ ∂ϕ

Now, we introduce following operators:

LðþÞ  Lx þ iLy and LðÞ  Lx  iLy : ð3:27Þ

Then, we have
   
∂ ∂ ∂ ∂
LðþÞ ¼ ħeiϕ þ i cot θ and LðÞ ¼ ħeiϕ  þ i cot θ : ð3:28Þ
∂θ ∂ϕ ∂θ ∂ϕ

Thus, we get
   
ðþÞ ðÞ ∂ ∂ iϕ ∂ ∂
L L ¼ħ e 2 iϕ
þ i cot θ e  þ i cot θ
∂θ ∂ϕ ∂θ ∂ϕ
 2   2 
∂ 1 ∂ ∂
¼ ħ2 eiϕ eiϕ  2 þ i  þ i cot θ
∂θ sin 2 θ ∂ϕ ∂θ∂ϕ
   2 2 
∂ ∂ ∂ ∂
þeiϕ cot θ  þ i cot θ þ ieiϕ cot θ  þ i cot θ 2
∂θ ∂ϕ ∂ϕ∂θ ∂ϕ
 2 2 
∂ ∂ ∂ 2 ∂
¼ ħ2 þ cot θ þ i þ cot θ ð3:29Þ
∂θ2 ∂θ ∂ϕ ∂ϕ2

In the above calculation procedure, we used differentiation of a product function.


For instance, we have
3.2 Constitution of Hamiltonian 65

   2    2 
∂ ∂ ∂ cot θ ∂ ∂ 1 ∂ ∂
i cot θ ¼i þ cot θ ¼i  þ cot θ :
∂θ ∂ϕ ∂θ ∂ϕ ∂θ∂ϕ sin 2 θ ∂ϕ ∂θ∂ϕ
2 2
∂ ∂
Note also that ∂θ∂ϕ ¼ ∂ϕ∂θ . This is because we are dealing with continuous and
differentiable functions.
Meanwhile, we have following commutation relations:

Lx , Ly ¼ iħLz , Ly , Lz ¼ iħLx , and ½Lz , Lx  ¼ iħLy : ð3:30Þ

This can easily be confirmed by requiring canonical commutation relations. The


derivation can routinely be performed, but we show it because the procedures
include several important points. For instance, we have

Lx , Ly ¼ Lx Ly  Ly Lx
     
¼ ypz  zpy zpx  xpz  zpx  xpz ypz  zpy

¼ ypz zpx  ypz xpz  zpy zpx þ zpy xpz

zpx ypz þ zpx zpy þ xpz ypz  xpz zpy


   
¼ ypx pz z  zpx ypz þ zpy xpz  xpz zpy
   
þ xpz ypz  ypz xpz þ zpx zpy  zpy zpx
     
¼ ypx zpz  pz z þ xpy zpz  pz z ¼ iħ xpy  ypx ¼ iħLz

In the above calculations, we used the canonical commutation relation as well as


commutability of, e.g., y and px; y and z; px and py. For example, we get
  !
2 2
∂ ∂ ∂ ∂ ∂ jψi ∂ jψi
px , py j ψi ¼ ħ 2
 j ψi ¼ ħ 2
 ¼ 0:
∂x ∂y ∂y ∂x ∂x∂y ∂y∂x

In the above equation, we assumed that the order of differentiation with respect to
x and y can be switched. It is because we are dealing with continuous and differen-
tiable normal functions. Thus, px and py commute.
For other important commutation relations, we have

Lx , L2 ¼ 0, Ly , L2 ¼ 0, and Lz , L2 ¼ 0: ð3:31Þ

With the derivation, use


66 3 Hydrogen-Like Atoms

½A, B þ C ¼ ½A, B þ ½A, C :

The derivation is straightforward and it is left for readers. The relations (3.30) and
(3.31) imply that a simultaneous eigenstate exists for L2 and one of Lx, Ly, and Lz.
This is because L2 commute with them from (3.31), whereas Lz does not commute
with Lx or Ly. The detailed argument about the simultaneous eigenstate can be seen
in Part III.
Thus, we have
 
LðþÞ LðÞ ¼ Lx 2 þ Ly 2 þ i Ly Lx  Lx Ly ¼ Lx 2 þ Ly 2 þ i Ly , Lx

¼ Lx 2 þ Ly 2 þ ħLz

Notice here that [Ly, Lx] ¼  [Lx, Ly] ¼  iħLz. Hence,

L2 ¼ LðþÞ LðÞ þ Lz 2  ħLz : ð3:32Þ

From (3.24), we have

2

Lz 2 ¼ ħ2 2
: ð3:33Þ
∂ϕ

Finally we get
!
2 2
∂ ∂ 1 ∂
L2 ¼ ħ2 þ cot θ þ
∂θ
2 ∂θ sin θ ∂ϕ2
2

or
"   #
2
1 ∂ ∂ 1 ∂
L2 ¼ ħ2 sin θ þ : ð3:34Þ
sin θ ∂θ ∂θ sin 2 θ ∂ϕ2

Replacing L2 in (3.15) with that of (3.34), we have


"     #
2
ħ2 ∂ ∂ 1 ∂ ∂ 1 ∂ Ze2
H¼ r2 þ sin θ þ  : ð3:35Þ
2μr ∂r
2 ∂r sin θ ∂θ ∂θ sin θ ∂ϕ
2 2 4πε0 r

Thus, the Schrödinger equation of (3.3) takes a following form:


3.3 Separation of Variables 67

( "     # )
2
ħ2 ∂ 2 ∂ 1 ∂ ∂ 1 ∂ Ze2
 2 r þ sinθ þ 2  ψ ¼Eψ: ð3:36Þ
2μr ∂r ∂r sinθ ∂θ ∂θ sin θ ∂ϕ2 4πε0 r

3.3 Separation of Variables

If the potential is spherically symmetric (e.g., a Coulomb potential), it is well known


that the Schrödinger equations of (3.1)–(3.3) can be solved by a method of separa-
tion of variables. More specifically, (3.36) can be separated into two differential
equations one of which only depends on a radial component r and the other of which
depends only upon angular components θ and ϕ.
To apply the method of separation of variables to (3.36), let us first return to
(3.15). Considering that L2 is expressed as (3.34), we assume that L2 has eigenvalues
γ (at any rate if any) and takes eigenfunctions Y(θ, ϕ) (again, if any as well)
corresponding to γ. That is,

L2 Y ðθ, ϕÞ ¼ γY ðθ, ϕÞ, ð3:37Þ

where Y(θ, ϕ) is assumed to be normalized. Meanwhile,

L2 ¼ Lx 2 þ Ly 2 þ Lz 2 : ð3:38Þ

From (3.6), we have


 {
Lx { ¼ ypz  zpy ¼ pz { y{  py { z{ ¼ pz y  py z ¼ ypz  zpy ¼ Lx : ð3:39Þ

Note that pz and y commute, so do py and z. Therefore, Lx is Hermitian, so is Lx2.


More generally if an operator A is Hermitian, so is An (n: a positive integer); readers,
please show it. Likewise, Ly and Lz are Hermitian as well. Thus, L2 is Hermitian, too.
Next, we consider an expectation value of L2, i.e., hL2i. Let jψi be an arbitrary
normalized nonzero vector (or function). Then,

hL2 i  hψ jL2 ψi

¼ hψ jLx 2 ψi þ hψ jLy 2 ψi þ hψ jLz 2 ψi

¼ hLx { ψ jLx ψi þ hLy { ψ jLy ψi þ hLz { ψ jLz ψi


68 3 Hydrogen-Like Atoms

¼ hLx ψ jLx ψi þ hLy ψ jLy ψi þ hLz ψ jLz ψi


2
¼jjLx ψ jj2 þjjLy ψ jj þ jjLz ψ jj2  0: ð3:40Þ

Notice that the second last equality comes from that Lx, Ly, and Lz are Hermitian.
An operator that satisfies (3.40) is said to be non-negative (see Sects. 1.4, 2.2, etc.
where we saw the calculation routines). Note also that in (3.40) the equality holds
only when the following relations hold:

jLx ψi ¼jLy ψi ¼jLz ψi ¼ 0: ð3:41Þ

On this condition, we have


      
jL2 ψ ¼ j Lx 2 þ Ly 2 þ Lz 2 ψ ¼ jLx 2 ψ þ jLy 2 ψ þ jLz 2 ψ
 
¼ Lx Lx ψi þ Ly Ly ψi þ Lz jLz ψi ¼ 0: ð3:42Þ

The eigenfunction that satisfies (3.42) and the next relation (3.43) is a simulta-
neous eigenstate of Lx, Ly, Lz, and L2. This could seem to be in contradiction to the
fact that Lz does not commute with Lx or Ly. However, this is an exceptional case. Let
jψ 0i be the eigenfunction that satisfies both (3.41) and (3.42). Then, we have

jLx ψ 0 i ¼jLy ψ 0 i ¼jLz ψ 0 i ¼jL2 ψ 0 i ¼ 0: ð3:43Þ

As can be seen from (3.24) to (3.26) along with (3.34), the operators Lx, Ly, Lz,
and L2 are differential operators. Therefore, (3.43) implies that jψ 0i is a constant. We
will come back this point later. In spite of this exceptional situation, it is impossible
that all Lx, Ly, and Lz as well as L2 take a whole set of eigenfunctions as simultaneous
eigenstates. We briefly show this as below.
In Chap. 2, we mention that if [A, B] ¼ ik, any physical state cannot be an
eigenstate of A or B. The situation is different, on the other hand, if we have a
following case

½A, B ¼ iC, ð3:44Þ

where A, B, and C are Hermitian operators. The relation (3.30) is a typical example
for this. If Cjψi ¼ 0 in (3.44), jψi might well be an eigenstate of A and/or B.
However, if Cjψi ¼ cjψi (c 6¼ 0), jψi cannot be an eigenstate of A or B. This can
readily be shown in a fashion similar to that described in Sect. 2.5. Let us think of,
e.g., [Lx, Ly] ¼ iħLz. Suppose that for ∃ψ 0 we have Lzj ψ 0i ¼ 0. Taking an inner
product using jψ 0i, from (3.30) we have
3.3 Separation of Variables 69

   
ψ 0 j Lx Ly  Ly Lx ψ 0 ¼ 0:

In this case, moreover, even if we have jLxψ 0i ¼ 0 and jLyψ 0i ¼ 0, we have no


inconsistency. If, on the other hand, Lzjψi ¼ mjψi (m 6¼ 0), jψi cannot be an
eigenstate of Lx or Ly as mentioned above. Thus, we should be careful to deal with
a general situation where we have [A, B] ¼ iC.
In the case where [A, B] ¼ 0; AB ¼ BA, namely A and B commute, we have a
different situation. This relation is equivalent to that an operator AB  BA has an
eigenvalue zero for any physical state jψi. Yet, this statement is of less practical use.
Again, regarding details we wish to make a discussion in Sect. 12.6 of Part III.
Returning to (3.40), let us replace ψ with a particular eigenfunction Y(θ, ϕ). Then,
we have
 
YjL2 Y ¼ hYjγYi ¼ γ hYjYi ¼ γ  0: ð3:45Þ

Again, if L2 has an eigenvalue, the eigenvalue should be non-negative. Taking


account of the coefficient ħ2 in (3.34), it is convenient to put

γ ¼ ħ2 λ ðλ  0Þ: ð3:46Þ

On ground that the solution of (3.36) can be described as

ψ ðr, θ, ϕÞ ¼ Rðr ÞY ðθ, ϕÞ, ð3:47Þ

the Schrödinger equation (3.16) can be rewritten as


 2   
1 ħ ∂ 2 ∂ L2 Ze2
 2 r þ 2  Rðr ÞY ðθ, ϕÞ ¼ ERðr ÞY ðθ, ϕÞ: ð3:48Þ
2μ r ∂r ∂r r 4πε0 r

That is,
 2   
1 ħ ∂ ∂Rðr Þ L2 Y ðθ, ϕÞ Ze2
 2 r2 Y ðθ, ϕÞ þ R ð r Þ  Rðr ÞY ðθ, ϕÞ
2μ r ∂r ∂r r 2 4πε0 r

¼ ERðr ÞY ðθ, ϕÞ: ð3:49Þ

Recalling (3.37) and (3.46), we have


 2   
1 ħ ∂ 2 ∂Rðr Þ ħ2 λY ðθ, ϕÞ Ze2
 2 r Y ðθ, ϕÞ þ Rð r Þ  Rðr ÞY ðθ, ϕÞ
2μ r ∂r ∂r r2 4πε0 r
¼ ERðr ÞY ðθ, ϕÞ: ð3:50Þ

Dividing both sides by Y(θ, ϕ), we get a SOLDE of a radial component as


70 3 Hydrogen-Like Atoms

 2   
1 ħ ∂ 2 ∂Rðr Þ ħ2 λ Ze2
 2 r þ 2 R ðr Þ  Rðr Þ ¼ ERðr Þ: ð3:51Þ
2μ r ∂r ∂r r 4πε0 r

Regarding angular components θ and ϕ, using (3.34), (3.37), and (3.46), we have
"   #
2
1 ∂ ∂ 1 ∂
L Y ðθ, ϕÞ ¼ ħ
2 2
sin θ þ Y ðθ, ϕÞ
sin θ ∂θ ∂θ sin 2 θ ∂ϕ2
¼ ħ2 λY ðθ, ϕÞ : ð3:52Þ

Dividing both sides by ħ2, we get


"   #
2
1 ∂ ∂ 1 ∂
 sin θ þ Y ðθ, ϕÞ ¼ λY ðθ, ϕÞ: ð3:53Þ
sin θ ∂θ ∂θ sin 2 θ ∂ϕ2

Notice in (3.53) that the angular part of SOLDE does not depend on a specific
form of the potential.
Now, we further assume that (3.53) can be separated into a zenithal angle part θ
and azimuthal angle part ϕ such that

Y ðθ, ϕÞ ¼ ΘðθÞΦðϕÞ: ð3:54Þ

Then we have
"   #
2
1 ∂ ∂ΘðθÞ 1 ∂ Φ ð ϕÞ
 sin θ ΦðϕÞ þ ΘðθÞ ¼ λΘðθÞΦðϕÞ: ð3:55Þ
sin θ ∂θ ∂θ sin 2 θ ∂ϕ2

Multiplying both sides by sin2θ/Θ(θ)Φ(ϕ) and arranging both the sides, we get

2  
1 ∂ ΦðϕÞ sin 2 θ 1 ∂ ∂ΘðθÞ
 ¼ sin θ þ λΘðθÞ : ð3:56Þ
ΦðϕÞ ∂ϕ2 ΘðθÞ sin θ ∂θ ∂θ

Since LHS of (3.56) depends only upon ϕ and RHS depends only on θ, we must
have

LHS of ð3:56Þ ¼ RHS of ð3:56Þ ¼ η ðconstantÞ: ð3:57Þ

Thus, we have a following relation of LHS of (3.56):


3.3 Separation of Variables 71

1 d 2 Φð ϕÞ
 ¼ η: ð3:58Þ
ΦðϕÞ dϕ2
2
Putting D   dϕ
d
2 , we get

DΦðϕÞ ¼ ηΦðϕÞ: ð3:59Þ

The SOLDEs of (3.58) and (3.59) are formally the same as (1.61) of Sect. 1.3,
where boundary conditions (BCs) are Dirichlet conditions. Unlike (1.61), however,
we have to consider different BCs, i.e., the periodic BCs.
As in Example 1.1, we adopt two linearly independent solutions. That is, we have

eimϕ and eimϕ ðm 6¼ 0Þ:

As their linear combination, we have

ΦðϕÞ ¼ aeimϕ þ beimϕ : ð3:60Þ

As BCs, we consider Φ(0) ¼ Φ(2π) and Φ0(0) ¼ Φ0(2π); i.e., we have

a þ b ¼ aei2πm þ bei2πm : ð3:61Þ

Meanwhile, we have

Φ0 ðϕÞ ¼ aimeimϕ  bimeimϕ : ð3:62Þ

Therefore, from BCs we have

aim  bim ¼ aimei2πm  bimei2πm :

Then,

a  b ¼ aei2πm  bei2πm : ð3:63Þ

From (3.61) and (3.63), we have


   
2a 1  ei2πm ¼ 0 and 2b 1  ei2πm ¼ 0:

If a 6¼ 0, we must have m ¼ 0, 1, 2,   . If a ¼ 0, we must have b 6¼ 0 to avoid


having Φ(ϕ)  0 as a solution. In that case, we have m ¼ 0, 1, 2,    as well.
Thus, it suffices to put Φ(ϕ) ¼ ceimϕ (m ¼ 0, 1, 2,   ). Therefore, as a normalized
function Φ e ðϕÞ, we get
72 3 Hydrogen-Like Atoms

e ðϕÞ ¼ p1ffiffiffiffiffi eimϕ ðm ¼ 0, 1, 2,   Þ:


Φ ð3:64Þ

Inserting it into (3.58), we have

m2 eimϕ ¼ ηeimϕ :

Therefore, we get

η ¼ m2 ðm ¼ 0, 1, 2,   Þ: ð3:65Þ

From (3.56) and (3.65), we have


 
1 d dΘðθÞ m 2 Θ ðθ Þ
 sin θ þ ¼ λΘðθÞ ðm ¼ 0, 1, 2,   Þ: ð3:66Þ
sin θ dθ dθ sin 2 θ
pffiffiffiffiffi
In (3.64) putting m ¼ 0 as an eigenvalue, we have ΦðϕÞ ¼ 1= 2π as a
corresponding eigenfunction. Unlike Examples 1.1 and 1.2, this reflects that the
d2
differential operator  dϕ 2 accompanied by the periodic BCs is a non-negative

operator that allows an eigenvalue of zero. Yet, we are uncertain of a range of m.


To clarify this point, we consider generalized angular momentum in the next section.

3.4 Generalized Angular Momentum

We obtained commutation relations of (3.30) among individual angular momentum


components Lx, Ly, and Lz. In an opposite way, we may start with (3.30) to define
angular momentum. Such a quantity is called generalized angular momentum.
Let e
J be a generalized angular momentum as in the case of (3.4) such that
0 1
Jex
e B C
J ¼ ðe1 e2 e3 Þ@ Jey A: ð3:67Þ
Jez

For the sake of simple notation, let us define J as follows so that we can eliminate
ħ and deal with dimensionless quantities in the present discussion:
3.4 Generalized Angular Momentum 73

0
1 0 1
Jex =ħ Jx
e Be C B C
J  J=ħ ¼ ðe1 e2 e3 Þ@ J y =ħ A ¼ ðe1 e2 e3 Þ@ J y A,
Jez =ħ Jz

J2 ¼ J x 2 þ J y 2 þ J z 2 : ð3:68Þ

Then, we require following commutation relations:

J x , J y ¼ iJ z , J y , J z ¼ iJ x , and ½J z , J x  ¼ iJ y : ð3:69Þ

Also, we require Jx, Jy, and Jz to be Hermitian. The operator J2 is Hermitian


accordingly. The relations (3.69) lead to

J x , J2 ¼ 0, J y , J2 ¼ 0, and J z , J2 ¼ 0: ð3:70Þ

This can be confirmed as in the case of (3.30).


As noted above, again a simultaneous eigenstate exists for J2 and one of Jx, Jy,
and Jz. According to the convention, we choose J2 and Jz for the simultaneous
eigenstate. Then, designating the eigenstate by jζ, μi, we have

J2 jζ, μi ¼ ζjζ, μi and J z jζ, μi ¼ μjζ, μi: ð3:71Þ

The implication of (3.71) is that jζ, μi is the simultaneous eigenstate and that μ is
an eigenvalue of Jz which jζ, μi belongs to with ζ being an eigenvalue of J2 which
jζ, μi belongs to as well.
Since Jz and J2 are Hermitian, both μ and ζ are real (see Sect. 1.4). Of these, ζ  0
as in the case of (3.45). We define following operators J (+) and J () as in the case of
(3.27):

J ðþÞ  J x þ iJ y and J ðÞ  J x  iJ y : ð3:72Þ

Then, from (3.69) and (3.70), we get


h i h i
J ðþÞ , J2 ¼ J ðÞ , J2 ¼ 0: ð3:73Þ

Also, we obtain following commutation relations:


h i h i h i
J z , J ðþÞ ¼ J ðþÞ ; J z , J ðÞ ¼ J ðÞ ; J ðþÞ , J ðÞ ¼ 2J z : ð3:74Þ

From (3.70) to (3.72), we get


74 3 Hydrogen-Like Atoms

J2 J ðþÞ jζ, μi ¼ J ðþÞ J2 jμi ¼ ζJ ðþÞ jζ, μi,


J2 J ðÞ jζ, μi ¼ J ðÞ J2 jζ, μi ¼ ζJ ðÞ jζ, μi: ð3:75Þ

Equation (3.75) indicates that both J (+)jζ, μi and J ()jζ, μi are eigenvectors of J2
that correspond to an eigenvalue ζ.
Meanwhile, from (3.74) we get

J z J ðþÞ jζ, μi ¼ J ðþÞ ðJ z þ 1Þjζ, μi ¼ ðμ þ 1ÞJ ðþÞ jζ, μi,


J z J ðÞ jζ, μi ¼ J ðÞ ðJ z  1Þjζ, μi ¼ ðμ  1ÞJ ðÞ jζ, μi: ð3:76Þ

The relation (3.76) means that J (+)jζ, μi is an eigenvector of Jz corresponding to


an eigenvalue (μ + 1), while J ()jζ, μi is an eigenvector of Jz corresponding to an
eigenvalue (μ  1). This implies that J (+) and J () function as raising and lowering
operators (or ladder operators) that have been introduced in this chapter. Thus, using
undetermined constants (or phase factors) aμ(+) and aμ(), we describe

J ðþÞ jζ, μi ¼ aμ ðþÞ jζ, μ þ 1i and J ðÞ jζ, μi ¼ aμ ðÞ jζ, μ  1i: ð3:77Þ

Next, let us characterize eigenvalues μ. We have

J x 2 þ J y 2 ¼ J2  J z 2 : ð3:78Þ

Therefore,
    
J x 2 þ J y 2 Þjζ, μi ¼ J2  J z 2 jζ, μi ¼ ζ  μ2 jζ, μi: ð3:79Þ

Since (Jx2 + Jy2) is a non-negative operator, its eigenvalues are non-negative as


well, as can be seen from (3.40) and (3.45). Then, we have

ζ  μ2  0: ð3:80Þ

Thus, for a fixed value of non-negative ζ, μ is bounded both upwards and


downwards. We define then a maximum of μ as j and a minimum of μ as j0.
Consequently, on the basis of (3.77), we have

J ðþÞ jζ, ji ¼ 0 and J ðÞ jζ, j0 i ¼ 0: ð3:81Þ

This is because we have no quantum state corresponding to jζ, j + 1i or jζ, j0  1i.


From (3.75) and (3.81), possible numbers of μ are
3.4 Generalized Angular Momentum 75

j, j  1, j  2,   , j0 : ð3:82Þ

From (3.69) and (3.72), we get

J ðÞ J ðþÞ ¼ J2  J z 2  J z , J ðþÞ J ðÞ ¼ J 2  J z 2 þ J z : ð3:83Þ

Operating these operators on j ζ, ji or j ζ, j0i and using (3.81) we get


   
J ðÞ J ðþÞ jζ, ji ¼ J2  J z 2  J z jζ, ji ¼ ζ  j2  j jζ, ji ¼ 0,
  
J ðþÞ J ðÞ jζ, j0 i ¼ J2  J z 2 þ J z jζ, j0 i ¼ ζ  j0 þ j0 jζ, j0 i ¼ 0:
2
ð3:84Þ

Since jζ, ji 6¼ 0 and jζ, j0i 6¼ 0, we have

ζ  j2  j ¼ ζ  j0 þ j0 ¼ 0:
2
ð3:85Þ

This means that

ζ ¼ jð j þ 1Þ ¼ j0 ð j0  1Þ: ð3:86Þ

Moreover, from (3.86) we get

jð j þ 1Þ  j0 ð j0  1Þ ¼ ð j þ j0 Þð j  j0 þ 1Þ ¼ 0: ð3:87Þ

As j  j0, j  j0 + 1 > 0. From (3.87), therefore, we get

j þ j0 ¼ 0 or j ¼ j0 : ð3:88Þ

Then, we conclude that the minimum of μ is –j. Accordingly, possible values of μ


are

μ ¼ j, j  1, j  2,   ,  j  1,  j: ð3:89Þ

That is, the number μ can take is (2j + 1). The relation (3.89) implies that taking a
positive integer k,

j  k ¼ j or j ¼ k=2: ð3:90Þ

In other words, j is permitted to take a number zero, a positive integer, or a


positive half-integer (or more precisely, half-odd-integer). For instance, if j ¼ 1/2, μ
can be 1/2 or 1/2. When j ¼ 1, μ can be 1, 0, or  1.
Finally, we have to decide undetermined constants aμ(+) and aμ(). To this end,
multiplying hζ, μ  1| on both sides of the second equation of (3.77) from the left, we
have
76 3 Hydrogen-Like Atoms

hζ, μ  1jJ ðÞ jζ, μi ¼ aμ ðÞ hζ, μ  1jζ, μ  1i ¼ aμ ðÞ , ð3:91Þ

where the second equality comes from that jζ, μ  1i has been normalized; i.e.,
jjj ζ, μ  1ijj ¼ 1. Meanwhile, taking adjoint of both sides of the first equation of
(3.77), we have
h i{ h i
hζ, μj J ðþÞ ¼ aμ ðþÞ hζ, μ þ 1j: ð3:92Þ

But, from (3.72) and the fact that Jx and Jy are Hermitian,
h i{
J ðþÞ ¼ J ð Þ : ð3:93Þ

Using (3.93) and replacing μ in (3.92) with μ  1, we get


h i
hζ, μ  1jJ ðÞ ¼ aμ1 ðþÞ hζ, μj: ð3:94Þ

Furthermore, multiplying jζ, μi on (3.94) from the right, we have


h i h i
hζ, μ  1jJ ðÞ jζ, μi ¼ aμ1 ðþÞ hζ, μjζ, μi ¼ aμ1 ðþÞ , ð3:95Þ

where again jζ, μi is assumed to be normalized. Comparing (3.91) and (3.95), we get
h i
aμ ðÞ ¼ aμ1 ðþÞ : ð3:96Þ

Taking an inner product regarding the first equation of (3.77) and its adjoint,
h i  2
hζ, μjJ ðÞ J ðþÞ jζ, μi ¼ aμ ðþÞ aμ ðþÞ hζ, μ þ 1jζ, μ þ 1i ¼aμ ðþÞ  : ð3:97Þ

Once again, the second equality of (3.97) results from the normalization of the
vector.
Using (3.83) as well as (3.71) and (3.86), (3.97) can be rewritten as

hζ, μjJ2  J z 2  J z jζ, μi ¼ hζ, μj jð j þ 1Þ  μ2  μjζ, μi


 2
¼ hζ, μjζ, μið j  μÞð j þ μ þ 1Þ ¼ aμ ðþÞ  : ð3:98Þ

Thus, we get
3.5 Orbital Angular Momentum: Operator Approach 77

pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
aμ ðþÞ ¼ eiδ ð j  μÞð j þ μ þ 1Þ ðδ : an arbitrary real numberÞ, ð3:99Þ

where eiδ is a phase factor. From (3.96) we also get


pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
aμ ðÞ ¼ eiδ ð j  μ þ 1Þ ð j þ μ Þ : ð3:100Þ

In (3.99) and (3.100), we routinely put δ ¼ 0 so that aμ(+) and aμ() can be positive
numbers. Explicitly rewriting (3.77), we get
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
J ðþÞ jζ, μi ¼ ð j  μÞð j þ μ þ 1Þjζ, μ þ 1i,
p ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
J ðÞ jζ, μi ¼ ð j  μ þ 1Þð j þ μÞjζ, μ  1i, ð3:101Þ

where j is a fixed given number chosen from among zero, positive integers, and
positive half-integers (or half-odd-integers).
As discussed above, we have derived various properties and relations with respect
to the generalized angular momentum on the basis of (i) the relation (3.30) or (3.69)
and (ii) the fact that Jx, Jy, and Jz are Hermitian operators. This notion is very useful
in dealing with various angular momenta of different origins (e.g., orbital angular
momentum, spin angular momentum) from a unified point of view. In Chap. 20, we
will revisit this issue in more detail.

3.5 Orbital Angular Momentum: Operator Approach

In Sect. 3.4 we have derived various important results on angular momenta on the
basis of the commutation relations (3.69) and the assumption that Jx, Jy, and Jz are
Hermitian. Now, let us return to the discussion on orbital angular momenta we dealt
with in Sects. 3.2 and 3.3. First, we treat the orbital angular momenta via operator
approach. This approach enables us to understand why a quantity j introduced in
Sect. 3.4 takes a value zero or positive integers with the orbital angular momenta. In
the next section (Sect 3.6) we will deal with the related issues by an analytical
method.
In (3.28) we introduced differential operators L(+) and L(). According to Sect.
3.4, we define following operators to eliminate ħ so that we can deal with dimen-
sionless quantities:
0 1
Mx
B C
M  L=ħ ¼ ðe1 e2 e3 Þ@ M y A,
Mz
78 3 Hydrogen-Like Atoms

M 2 ¼ L2 =ħ2 ¼ M x 2 þ M y 2 þ M z 2 : ð3:102Þ

Hence, we have

Lx Ly
Mx ¼ , M y ¼ , and M z ¼ Lz =ħ: ð3:103Þ
ħ ħ

Moreover, we define following operators:


 
∂ ∂
M ðþÞ  M x þ iM y ¼ LðþÞ =ħ ¼ eiϕ þ i cot θ , ð3:104Þ
∂θ ∂ϕ
 
∂ ∂
M ð Þ  LðÞ =ħ ¼ eiϕ  þ i cot θ : ð3:105Þ
∂θ ∂ϕ

Then we have
"   #
2
1 ∂ ∂ 1 ∂
M ¼ 2
sin θ þ : ð3:106Þ
sin θ ∂θ ∂θ sin 2 θ ∂ϕ2

Here we execute variable transformation such that


qffiffiffiffiffiffiffiffiffiffiffiffiffi
ξ ¼ cos θ ð0  θ  πÞ or sin θ ¼ 1  ξ2 : ð3:107Þ

Noting, e.g., that


qffiffiffiffiffiffiffiffiffiffiffiffiffi
∂ ∂ξ ∂ ∂ ∂ ∂ ∂
¼ ¼  sin θ ¼ 1  ξ2 , sin θ ¼  sin 2 θ
∂θ ∂θ ∂ξ ∂ξ ∂ξ ∂θ ∂ξ
 ∂
¼  1  ξ2 , ð3:108Þ
∂ξ

we get

qffiffiffiffiffiffiffiffiffiffiffiffiffi !
ð þÞ 2 ∂ ξ ∂
M ¼e  1ξ iϕ
þ i pffiffiffiffiffiffiffiffiffiffiffiffiffi
∂ξ 1  ξ2 ∂ϕ
qffiffiffiffiffiffiffiffiffiffiffiffiffi !
ð Þ iϕ 2 ∂ ξ ∂ ð3:109Þ
M ¼e 1ξ þ i pffiffiffiffiffiffiffiffiffiffiffiffiffi
∂ξ 1  ξ2 ∂ϕ
 
∂  ∂ 2
1 ∂
M2 ¼  1  ξ2  :
∂ξ ∂ξ 1  ξ2 ∂ϕ2
3.5 Orbital Angular Momentum: Operator Approach 79

Although we showed in Sect. 3.3 that m ¼ 0, 1, 2,   , the range of m was


unclear. The relationship between m and λ in (3.66) remains unclear so far as well.
On the basis of a general approach developed in Sect. 3.4, however, we have known
that the eigenvalue μ of the dimensionless z-component angular momentum Jz is
bounded with its maximum and minimum being j and –j, respectively [see (3.89)],
where j can be zero, a positive integer, or a positive half-odd-integer. Concomitantly,
the eigenvalue ζ of J2 equals j( j + 1).
In the present section, let us reconsider the relationship between m and λ in (3.66)
in light of the knowledge obtained in Sect. 3.4. According to the custom, we replace
μ in (3.89) with m to have

m ¼ j, j  1, j  2,   , j  1,  j: ð3:110Þ

At the moment, we assume that m can be a half-odd-integer besides zero or an


integer [3].
Now, let us define notation of Y(θ, ϕ) that appeared in (3.37). This function is
eligible for a simultaneous eigenstate of M2 and Mz and can be indexed with j and
m as in (3.110). Then, let Y(θ, ϕ) be described accordingly as

j ðθ, ϕÞ  Y ðθ, ϕÞ:


Ym ð3:111Þ

From (3.54) and (3.64), we have

j ðθ, ϕÞ / e
Ym :
imϕ

Therefore, we get

qffiffiffiffiffiffiffiffiffiffiffiffiffi !
2 ∂ mξ
M ð þÞ Y m
j ðθ, ϕÞ ¼e  1ξ

 pffiffiffiffiffiffiffiffiffiffiffiffiffi Y m
j ðθ, ϕÞ
∂ξ 1  ξ2
qffiffiffiffiffiffiffiffiffiffiffiffiffimþ1 qffiffiffiffiffiffiffiffiffiffiffiffiffim 

¼ e iϕ
1ξ 2
1ξ 2
Y j ðθ, ϕÞ ,
m
ð3:112Þ
∂ξ

where we used the following equation:


qffiffiffiffiffiffiffiffiffiffiffiffiffim  qffiffiffiffiffiffiffiffiffiffiffiffiffim1 qffiffiffiffiffiffiffiffiffiffiffiffiffi1
∂ 1
1ξ 2
¼ ðmÞ 1  ξ2  1  ξ2 ð2ξÞ
∂ξ 2
qffiffiffiffiffiffiffiffiffiffiffiffiffim2
¼ mξ 1  ξ2 ,
80 3 Hydrogen-Like Atoms

qffiffiffiffiffiffiffiffiffiffiffiffiffim  qffiffiffiffiffiffiffiffiffiffiffiffiffim2

1  ξ2 Ym ð θ, ϕ Þ ¼ mξ 1  ξ2 j ðθ, ϕÞ
Ym
∂ξ j

qffiffiffiffiffiffiffiffiffiffiffiffiffim
∂Y m
j ðθ, ϕÞ
þ 1  ξ2 : ð3:113Þ
∂ξ

Similarly, we get

qffiffiffiffiffiffiffiffiffiffiffiffiffi !
2 ∂ mξ
M ð Þ Y m
j ðθ, ϕÞ ¼e iϕ
1ξ  pffiffiffiffiffiffiffiffiffiffiffiffiffi Y m
j ðθ, ϕÞ
∂ξ 1  ξ2
qffiffiffiffiffiffiffiffiffiffiffiffiffimþ1 qffiffiffiffiffiffiffiffiffiffiffiffiffim 
iϕ ∂
¼e 1ξ 2
1ξ 2
Y j ðθ, ϕÞ :
m
ð3:114Þ
∂ξ

Let us derive the relations where M (+) or M () is successively operated on


j ðθ, ϕÞ. In the case of M , using (3.109) we have
(+)
Ym

h in qffiffiffiffiffiffiffiffiffiffiffiffiffimþn n
ð þÞ n inϕ ∂
M Y j ðθ, ϕÞ ¼ ð1Þ e
m
1  ξ2 n
∂ξ
qffiffiffiffiffiffiffiffiffiffiffiffiffim 
 1  ξ2 Ym j ð θ, ϕ Þ : ð3:115Þ

We confirm this relation by mathematical induction. We have (3.112) by


replacing n with 1 in (3.115). Namely, (3.115) holds when n ¼ 1. Next, suppose
that (3.115) holds with n. Then, using the first equation of (3.109) and noting (3.64),
we have
h inþ1 nh in o
M ðþÞ j ðθ, ϕÞ ¼ M
Ym ð þÞ
M ðþÞ Y m
j ðθ, ϕÞ

qffiffiffiffiffiffiffiffiffiffiffiffiffi !
2 ∂ ξ ∂
¼e iϕ
 1ξ þ i pffiffiffiffiffiffiffiffiffiffiffiffiffi
∂ξ 1  ξ2 ∂ϕ
qffiffiffiffiffiffiffiffiffiffiffiffiffimþn n qffiffiffiffiffiffiffiffiffiffiffiffiffi 
∂ 2 m m
ð1Þn einϕ 1  ξ2 1  ξ Y ð θ, ϕ Þ
∂ξn j

" qffiffiffiffiffiffiffiffiffiffiffiffiffi #
n iðnþ1Þϕ 2 ∂ ξðn þ mÞ
¼ ð1Þ e  1ξ  pffiffiffiffiffiffiffiffiffiffiffiffiffi
∂ξ 1  ξ2
qffiffiffiffiffiffiffiffiffiffiffiffiffimþn n qffiffiffiffiffiffiffiffiffiffiffiffiffi 
∂ 2 m m
 1ξ 2
1ξ Y j ðθ, ϕÞ
∂ξn
3.5 Orbital Angular Momentum: Operator Approach 81

( qffiffiffiffiffiffiffiffiffiffiffiffiffi" qffiffiffiffiffiffiffiffiffiffiffiffiffimþn1
n iðnþ1Þϕ
¼ ð1Þ e  1  ξ 2 ð m þ nÞ 1  ξ2

qffiffiffiffiffiffiffiffiffiffiffiffiffi  n qffiffiffiffiffiffiffiffiffiffiffiffiffi 
1 2 1 ∂ 2 m m
 1ξ ð2ξÞ 1ξ Y j ðθ, ϕÞ
2 ∂ξn
qffiffiffiffiffiffiffiffiffiffiffiffiffi qffiffiffiffiffiffiffiffiffiffiffiffiffimþn nþ1 qffiffiffiffiffiffiffiffiffiffiffiffiffi 
∂ 2 m m
 1  ξ2 1  ξ2 1  ξ Y j ð θ, ϕ Þ
∂ξnþ1
qffiffiffiffiffiffiffiffiffiffiffiffiffimþn n "qffiffiffiffiffiffiffiffiffiffiffiffiffi )
ξðn þ mÞ ∂ 2 m m
 pffiffiffiffiffiffiffiffiffiffiffiffiffi 1ξ 2
1ξ Y j ðθ, ϕÞ
1  ξ2 ∂ξn
qffiffiffiffiffiffiffiffiffiffiffiffiffimþðnþ1Þ nþ1 qffiffiffiffiffiffiffiffiffiffiffiffiffi 
nþ1 iðnþ1Þϕ ∂ 2 m m
¼ ð1Þ e 1ξ 2
1ξ Y j ðθ, ϕÞ : ð3:116Þ
∂ξnþ1

Notice that the first and third terms in the second last equality cancelled each
other. Thus, (3.115) certainly holds with (n + 1). Similarly, we have [3]

h in qffiffiffiffiffiffiffiffiffiffiffiffiffimþn n qffiffiffiffiffiffiffiffiffiffiffiffiffi 
ðÞ inϕ ∂
M Y j ðθ, ϕÞ ¼ e
m
1ξ 2
n 1ξ 2 m m
Y j ðθ, ϕÞ :
∂ξ
ð3:117Þ

Proof of (3.117) is left for readers.


From the second equation of (3.81) and (3.114) where m is replaced with j, we
have
qffiffiffiffiffiffiffiffiffiffiffiffiffi jþ1 qffiffiffiffiffiffiffiffiffiffiffiffiffi 

M ðÞ Y j
j ðθ, ϕÞ ¼ e
iϕ
1  ξ2 1  ξ2 j Y j ð θ, ϕ Þ ¼ 0: ð3:118Þ
∂ξ j

pffiffiffiffiffiffiffiffiffiffiffiffiffi j
This implies that 1  ξ2 j Y j ðθ, ϕÞ is a constant with respect to ξ. We
describe this as
qffiffiffiffiffiffiffiffiffiffiffiffiffij
1  ξ2 Y j
j ðθ, ϕÞ ¼ c ðc : constant with respect to ξÞ: ð3:119Þ

Meanwhile, putting m ¼  j and n ¼ 2j + 1 in (3.115) and taking account of the


first equation of (3.77) and the first equation of (3.81), we get
h i2jþ1
M ðþÞ Y j
j ðθ, ϕÞ

qffiffiffiffiffiffiffiffiffiffiffiffiffi jþ1 2jþ1 qffiffiffiffiffiffiffiffiffiffiffiffiffi 


∂ 2 j j
¼ ð1Þ2jþ1 eið2jþ1Þϕ 1  ξ2 2jþ1
1  ξ Y j ð θ, ϕ Þ ¼ 0: ð3:120Þ
∂ξ

This means that


82 3 Hydrogen-Like Atoms

qffiffiffiffiffiffiffiffiffiffiffiffiffij
1  ξ2 Y j j ðθ, ϕÞ ¼ ðat most a 2j  degree polynomial with ξÞ: ð3:121Þ

Replacing Y j
j ðθ, ϕÞ in (3.121) with that of (3.119), we get

qffiffiffiffiffiffiffiffiffiffiffiffiffij qffiffiffiffiffiffiffiffiffiffiffiffiffij
 j
c 1  ξ2 1  ξ2 ¼ c 1  ξ2

¼ ðat most a 2j  degree polynomial with ξÞ:


ð3:122Þ

Here, if j is a half-odd-integer, c(1  ξ2) j of (3.122) cannot be a polynomial. If, on


the other hand, j is zero or a positive integer, c(1  ξ2) j is certainly a polynomial and,
pffiffiffiffiffiffiffiffiffiffiffiffiffij j
to top it all, a 2j-degree polynomial with respect to ξ; so is 1  ξ2 Y j ðθ, ϕÞ.
According to the custom, henceforth we use l as zero or a positive integer instead
of j. That is,

Y ðθ, ϕÞ  Y m
l ðθ, ϕÞ ðl : zero or a positive integer Þ: ð3:123Þ

At the same time, so far as the orbital angular momentum is concerned, from
(3.71) and (3.86) we can identify ζ in (3.71) with l(l + 1). Namely, we have

ζ ¼ lðl þ 1Þ:

Concomitantly, m in (3.110) is determined as

m ¼ l, l  1, l  2,   1, 0,  1,     l þ 1,  l: ð3:124Þ

Thus, as expected m is zero or a positive or negative integer. Considering (3.37)


and (3.46), ζ is identical with λ in (3.46). Finally, we rewrite (3.66) such that
 
1 d dΘðθÞ m2 ΘðθÞ
 sin θ þ ¼ lðl þ 1ÞΘðθÞ, ð3:125Þ
sin θ dθ dθ sin 2 θ

where, l is equal to zero or positive integers and m is given by (3.124).


On condition of ξ ¼ cos θ (3.107), defining the following function

l ðξÞ  ΘðθÞ,
Pm ð3:126Þ

and considering (3.109) along with (3.54), we arrive at the next SOLDE described as
3.5 Orbital Angular Momentum: Operator Approach 83

   
d   m
2 dPl ðξÞ m2
1ξ þ lðl þ 1Þ  Pm ðξÞ ¼ 0: ð3:127Þ
dξ dξ 1  ξ2 l

The SOLDE of (3.127) is well known as the associated Legendre differential


equation. The solutions Pm l ðξÞ are called associated Legendre functions.
In the next section, we characterize the said equation and functions by an
analytical method. Before going into details, however, we further seek characteris-
l ðξÞ by the operator approach.
tics of Pm
Adopting the notation of (3.123) and putting m ¼ l in (3.112), we have
qffiffiffiffiffiffiffiffiffiffiffiffiffilþ1 "qffiffiffiffiffiffiffiffiffiffiffiffiffil #

M ðþÞ Y ll ðθ, ϕÞ ¼ e 1ξ iϕ 2
1ξ 2
Y l ðθ, ϕÞ :
l
ð3:128Þ
∂ξ

Corresponding to (3.81), we have M ðþÞ Y ll ðθ, ϕÞ ¼ 0. This implies that


qffiffiffiffiffiffiffiffiffiffiffiffiffil
1  ξ2 Y ll ðθ, ϕÞ ¼ c ðc : constant with respect to ξÞ: ð3:129Þ

From (3.107) and (3.64), we get

Y ll ðθ, ϕÞ ¼ κ l sin l θeilϕ , ð3:130Þ

where κ l is another constant that depends on l, but is independent of θ and ϕ. Let us


seek κ l by normalization condition. That is,
Z Z Z
2π π  2 π
dϕ sin θdθY ll ðθ, ϕÞ ¼ 2π  jκl j2 sin 2lþ1 θdθ ¼ 1, ð3:131Þ
0 0 0

where the integration is performed on a unit sphere. Note that an infinitesimal area
element on the unit sphere is represented by sinθdθdϕ.
We evaluate the above integral denoted as
Z π
I sin 2lþ1 θdθ: ð3:132Þ
0

Using integration by parts,


Z π
I¼ ð cos θÞ0 sin 2l θdθ
0
Z π
π
¼ ð cos θÞ sin θ 2l
0
þ ð cos θÞ  2l  sin 2l1 θ cos θdθ
0
Z π Z π
¼ 2l sin 2l1 θdθ  2l sin 2lþ1 θdθ: ð3:133Þ
0 0
84 3 Hydrogen-Like Atoms

Thus, we get a recurrence relation with respect to I (3.132) such that


Z π
2l
I¼ sin 2l1 θdθ: ð3:134Þ
2l þ 1 0

Repeating the above process, we get


Z π
2l 2l  2 2 22lþ1 ðl!Þ2
I¼   sin θdθ ¼ : ð3:135Þ
2l þ 1 2l  1 3 0 ð2l þ 1Þ!

Then,
rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
1 ð2l þ 1Þ! eiχ ð2l þ 1Þ!
jκ l j ¼ l or κl ¼ l ðχ : realÞ, ð3:136Þ
2 l! 4π 2 l! 4π

where eiχ is an undetermined constant (phase factor) that is to be determined below.


Thus we get
rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
eiχ ð2l þ 1Þ! l ilϕ
Y ll ðθ, ϕÞ ¼ l sin θe : ð3:137Þ
2 l! 4π

Meanwhile, in the second equation of (3.101) replacing J (), j, and μ in (3.101)


l ðθ, ϕÞ instead of jζ, μi, we get
with M (), l, and m, respectively, and using Y m

1
Y m1
l ðθ, ϕÞ ¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi M ðÞ Y m
l ðθ, ϕÞ: ð3:138Þ
ðl  m þ 1Þðl þ mÞ

Replacing m with l in (3.138), we have

1 ðÞ l
l ðθ, ϕÞ ¼ pffiffiffiffi M
Y l1 Y l ðθ, ϕÞ: ð3:139Þ
2l

Operating M () (l  m) times in total on Y ll ðθ, ϕÞ of (3.139), we have


h ilm
1 ð Þ
Ym
l ð θ, ϕ Þ ¼ p ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
ffi p ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
ffi M Y ll ðθ, ϕÞ
2lð2l  1Þ  ðl þ m þ 1Þ 1  2  ðl  mÞ
sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
ðl þ mÞ! h ðÞ ilm l
¼ M Y l ðθ, ϕÞ:
ð2lÞ!ðl  mÞ!
ð3:140Þ

Meanwhile, putting m ¼ l, n ¼ l  m, and j ¼ l in (3.117), we have


3.5 Orbital Angular Momentum: Operator Approach 85

h ilm qffiffiffiffiffiffiffiffiffiffiffiffiffim lm



M ð Þ Y ll ðθ, ϕÞ ¼ eiðlmÞϕ 1  ξ2 lm
∂ξ
"qffiffiffiffiffiffiffiffiffiffiffiffiffi #
l
 1  ξ2 Y ll ðθ, ϕÞ : ð3:141Þ

lm l
Further replacing M ðÞ Y l ðθ, ϕÞ in (3.140) with that of (3.141), we get
sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi qffiffiffiffiffiffiffiffiffiffiffiffiffim lm
ðl þ mÞ! iðlmÞϕ ∂
l ðθ, ϕÞ
Ym ¼ e 1  ξ2
ð2lÞ!ðl  mÞ! ∂ξ
lm

"qffiffiffiffiffiffiffiffiffiffiffiffiffi #
l
 1  ξ Y l ðθ, ϕÞ :
2 l
ð3:142Þ

Finally, replacing Y ll ðθ, ϕÞ in (3.142) with that of (3.137) and converting θ to ξ,


we arrive at the following equation:
sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
eiχ ð2l þ 1Þðl þ mÞ! imϕ  m=2 ∂lm h l i
l ðθ, ϕÞ ¼ l
Ym e 1  ξ2 1  ξ2 : ð3:143Þ
2 l! 4π ðl  mÞ! ∂ξ
lm

Now, let us decide eiχ . Putting m ¼ 0 in (3.143), we have


rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
eiχ ð1Þl ð2l þ 1Þ ∂l h i
2 l
Y 0l ðθ, ϕÞ ¼ l 1  ξ , ð3:144Þ
2 l!ð1Þl 4π ∂ξl

where we put (1)l on both the numerator and denominator. In RHS of (3.144),

ð1Þl ∂l 
1  ξ2 Þl  Pl ðξÞ: ð3:145Þ
2l l! ∂ξl

Equation (3.145) is well known as Rodrigues formula of Legendre polynomials.


We mention characteristics of Legendre polynomials in the next section. Thus,
rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
eiχ ð2l þ 1Þ
Y 0l ðθ, ϕÞ ¼ Pl ðξÞ: ð3:146Þ
ð1Þl 4π

According to the custom [2], we require Y 0l ð0, ϕÞ to be positive. Noting that θ ¼ 0


corresponds to ξ ¼ 1, we have
86 3 Hydrogen-Like Atoms

rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
eiχ ð2l þ 1Þ eiχ ð2l þ 1Þ
Y 0l ð0, ϕÞ ¼ Pl ð1Þ ¼ , ð3:147Þ
ð1Þl 4π ð1Þl 4π

where we used Pl(1) ¼ 1. For this important relation, see Sect. 3.6.1. Also noting that
 eiχ 
ð1Þl  ¼ 1, we must have

eiχ
¼ 1 or eiχ ¼ ð1Þl ð3:148Þ
ð1Þl

so that Y 0l ð0, ϕÞ can be positive. Thus, (3.143) is rewritten as


sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
ð 1 Þl
ð2l þ 1Þðl þ mÞ! imϕ  
2 m=2 ∂
lm
l ðθ, ϕÞ ¼
Ym e 1  ξ
2l l! 4π ðl  mÞ! ∂ξ
lm

h l i
 1  ξ2 : ð3:149Þ

In Sect. 3.3, we mentioned that jψ 0i in (3.43) is a constant. In fact, putting


l ¼ m ¼ 0 in (3.149), we have
pffiffiffiffiffiffiffiffiffiffi
Y 00 ðθ, ϕÞ ¼ 1=4π : ð3:150Þ

Thus, as a simultaneous eigenstate of all Lx, Ly, Lz, and L2 corresponding to l ¼ 0


and m ¼ 0, we have

jψ 0 i  Y 00 ðθ, ϕÞ:

The normalized functions Y m l ðθ, ϕÞ described as (3.149) define simultaneous


eigenfunctions of L2 (or M2) and Lz (or Mz). Those functions are called spherical
surface harmonics and frequently appear in various fields of mathematical physics.
As in the case of Sect. 2.3, matrix representation enables us to intuitively grasp
the relationship between angular momentum operators and their eigenfunctions
(or eigenvectors). Rewriting the relations of (3.101) so that they can meet the present
purpose, we have
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
M ðÞ jl, mi ¼ ðl  m þ 1Þðl þ mÞ jl, m  1i,
p ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
M ðþÞ jl, mi ¼ ðl  mÞðl þ m þ 1Þ jl, m þ 1i, ð3:151Þ

where we used l instead of ζ to designate the eigenstate.


3.5 Orbital Angular Momentum: Operator Approach 87

Now, we know that m takes (2l + 1) different values that correspond to each l.
This implies that the operators can be expressed with (2l + 1, 2l + 1) matrices. As
implied in (3.151), M () takes the following form:

M ð Þ ¼
0 pffiffiffiffiffiffiffiffiffiffi 1
0 2l  1
B pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi C
B ð2l  1Þ  2 C
B 0 C
B C
B C
B 0 ⋱ C
B C
B C
B ⋱ C
B pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi C
B ð2l  k þ 1Þ  k C,
B 0 C
B C
B 0 C
B C
B C
B ⋱ ⋱ C
B pffiffiffiffiffiffiffiffiffiffi C
B C
B 0 1  2l C
@ A
0
ð3:152Þ
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
where diagonal elements are zero and a (k, k + 1) element is ð2l  k þ 1Þ  k. That
is, nonzero elements are positioned just above the zero diagonal elements. Corre-
spondingly, we have

M ð þÞ ¼
0 1
0
B pffiffiffiffiffiffiffiffiffi C
B 2l  1 C
B 0 C
B pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi C
B C
B ð2l  1Þ  2 0 C
B C
B C
B ⋱ 0 C
B pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi C
B C,
B ð2l  k þ 1Þ  k 0 C
B C
B ⋱ C
B C
B ⋱ C
B C
B 0 C
B C
B ⋱ 0 C
@ A
pffiffiffiffiffiffiffiffiffi
1  2l 0
ð3:153Þ
88 3 Hydrogen-Like Atoms

where again diagonal


pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
ffi elements are zero and a (k + 1, k) element is
ð2l  k þ 1Þ  k . In this case, nonzero elements are positioned just below the
zero diagonal elements. Notice also that M () and M (+) are adjoint to each other
and that these notations correspond to (2.65) and (2.66).
Basis functions Y m l ðθ, ϕÞ can be represented by a column vector, as in the case of
Sect. 2.3. These are denoted as follows:
0 1 0 1
1 0
B C B C
B C0 B C 1
B C B C
B C0 B C 0
B C B C
jl, li ¼ B ⋮ C, jl, l þ 1i ¼ B ⋮ C,   , jl, l  1i
B C B C
B C B C
B 0 C B 0 C
@ A @ A
0 0
0 1 0 1
0 0
B C B C
B 0 C B 0 C
B ⋮ C B ⋮ C
B C B C
B C B C
¼B C , jl, li ¼ B 0 C , ð3:154Þ
B 0 C B C
B C B C
B 1 C B 0 C
@ A @ A
0 1

where the first number l in jl, li, jl, l + 1i, etc. denotes the quantum number
associated with λ ¼ l(l + 1) of (3.124) and is kept constant; the latter number denotes
m. Note from (3.154) that the column vector whose k-th row is 1 corresponds to
m such that

m ¼ l þ k  1: ð3:155Þ

For instance, if k ¼ 1, m ¼  l; if k ¼ 2l + 1, m ¼ l, etc.


The operator M () converts the column vector whose (k + 1)-th row is 1 to that
whose k-th row is 1. The former column vector corresponds to jl, m + 1i and the latter
corresponding to jl, mi. Therefore, using (3.152), we get the following
representation:
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
M ðÞ jl, m þ 1i ¼ ð2l  k þ 1Þ  kjl, mi ¼ ðl  mÞðl þ m þ 1Þ jl, mi, ð3:156Þ

where the second equality is obtained by replacing k with that of (3.155), i.e.,
k ¼ l + m + 1. Changing m to (m  1), we get the first equation of (3.151). Similarly,
we obtain the second equation of (3.151) as well. That is, we have
3.5 Orbital Angular Momentum: Operator Approach 89

pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
M ðþÞ jl, mi ¼ð2l  k þ 1Þ  k jl, m þ 1i
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
¼ ðl  mÞðl þ m þ 1Þ jl, m þ 1i: ð3:157Þ

From (3.32) we have

M 2 ¼ M ðþÞ M ðÞ þ M z 2  M z :

In the above, M (+)M () and Mz are diagonal matrices and, hence, Mz2 and M2 are
diagonal matrices as well such that
0 1
l
B C
B l þ 1 C
B C
B C
B l þ 2 C
B C
B ⋱ C
Mz ¼ B C,
B C
B kl C
B C
B C
B C
@ A

l
0 1
0
B C
B 2l  1 C
B C
B C
B ð2l  1Þ  2 C
B C
ð þÞ ðÞ B C
M M ¼B C,
B ⋱ C
B ð2l  k þ 1Þ  k C
B C
B C
B C
@ A

1  2l
ð3:158Þ

where k  l and (2l  k + 1)  k represent (k + 1, k + 1) elements of Mz and


M (+)M (), respectively. Therefore, (k + 1, k + 1) element of M2 is calculated as

ð2l  k þ 1Þ  k þ ðk  lÞ2  ðk  lÞ ¼ lðl þ 1Þ:

As expected, M2 takes a constant value l(l + 1). A matrix representation is shown


in (3.159) such that
90 3 Hydrogen-Like Atoms

0 1
l ð l þ 1Þ
B C
B lðl þ 1Þ C
B C
B C
B ⋱ C
B C
B lðl þ 1Þ C
M ¼B
2
C:
B C
B ⋱ C
B C
B C
B C
@ A
l ð l þ 1Þ
lðl þ 1Þ
ð3:159Þ

These expressions are useful to understand how the vectors of (3.154) constitute
simultaneous eigenstates of M2 and Mz. In this situation, the matrix representation is
said to diagonalize both M2 and Mz. In other words, the quantum states represented
by (3.154) are simultaneous eigenstates of M2 and Mz.
The matrices (3.152) and (3.153) that represent M () and M (+), respectively, are
said to be ladder operators or raising and lowering operators, because operating
column vectors those operators convert jmi to jm 1i as mentioned above. The
operators M () and M (+) correspond to a and a{ given in (2.65) and (2.66), respec-
tively. All these operators are characterized by that the corresponding matrices have
diagonal elements of zero and that nonvanishing elements are only positioned on
“right above” or “right below” relative to the diagonal elements. These matrices are a
kind of triangle matrices and all their diagonal elements are zero. The matrices are
characteristic of nilpotent matrices. That is, if a suitable power of a matrix is zero as a
matrix, such a matrix is said to be a nilpotent matrix (see Part III). In the present case,
(2l + 1)-th power of M () and M (+) becomes zero as a matrix.
The operator M () and M (+) can be described by the following shorthand
representations:
h i pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
M ðÞ ¼ ð2l  k þ 1Þ  kδkþ1,j ð1  k  2lÞ: ð3:160Þ
kj

pffiffiffiffiffiffiffiffiffiffi
If l ¼ 0, Mz ¼ M (+)M () ¼ M2 ¼ 0. This case corresponds to Y 00 ðθ, ϕÞ ¼ 1=4π
and we do not need the matrix representation. Defining
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
ak  ð2l  k þ 1Þ  k,

we have for instance


3.6 Orbital Angular Momentum: Analytic Approach 91

h i2 X
M ð Þ ¼ aδ aδ
p k kþ1,p p pþ1,j
¼ ak akþ1 δkþ2,j
kj
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffipffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
¼ ð2l  k þ 1Þ  k ½2l  ðk þ 1Þ þ 1  ðk þ 1Þδkþ2,j , ð3:161Þ

where the summation is nonvanishing only if p ¼ k + 1. The factor δk + 2, j implies


that the elements are shifted by one toward upper right by being squared. Similarly
we have
h i
M ð þÞ ¼ ak1 δk,jþ1 ð1  k  2l þ 1Þ: ð3:162Þ
kj

In (3.158), M (+)M () is represented as follows:


h i X
M ðþÞ M ðÞ ¼ a δ a δ
p k1 k,pþ1 p pþ1,j
¼ ak1 a j1 δk,j ¼ ðak1 Þ2 δk,j
kj

¼ ½2l  ðk  1Þ þ 1  ðk  1Þδk,j ¼ ð2l  k þ 2Þðk  1Þδk,j : ð3:163Þ

Notice that although a0 is not defined, δ1, j + 1 ¼ 0 for any j, and so this causes no
inconvenience. Hence, [M (+)M ()]kj of (3.163) is well defined with 1  k  2l + 1.
Important properties of angular momentum operators examined above are based
upon the fact that those operators are ladder operators and represented by nilpotent
matrices. These characteristics will further be studied in Part III.

3.6 Orbital Angular Momentum: Analytic Approach

In this section, our central task is to solve the associated Legendre differential
equation expressed by (3.127) by an analytical method. Putting m ¼ 0 in (3.127),
we have
 
d   dP0l ðxÞ
1  x2 þ lðl þ 1ÞP0l ðxÞ ¼ 0, ð3:164Þ
dx dx

where we use a variable x instead of ξ. Equation (3.164) is called Legendre


differential equation and its characteristics and solutions have been widely investi-
gated. Hence, we put

P0l ðxÞ  Pl ðxÞ, ð3:165Þ

where Pl(x) is said to be Legendre polynomials. We first start with Legendre


differential equation and Legendre polynomials.
92 3 Hydrogen-Like Atoms

3.6.1 Spherical Surface Harmonics and Associated Legendre


Differential Equation

Let us think of a following identity according to Byron and Fuller [4]:

 d  l  l
1  x2 1  x2 ¼ 2lx 1  x2 , ð3:166Þ
dx

where l is a positive integer. We differentiate both sides of (3.166) (l + 1) times. Here


we use the Leibniz rule about differentiation of a product function that is described
by

Xn n!
d n ðuvÞ ¼ d m udnm v, ð3:167Þ
m¼0 m!ðn  mÞ!

where

dm u=dxm  dm u:

The above shorthand notation is due to Byron and Fuller [4]. We use this notation
for simplicity from place to place.
Noting that the third order and higher differentiations of (1  x2) vanish in LHS of
(3.166), we have
h   l i
LHS ¼ dlþ1 1  x2 d 1  x2
   l  l
¼ 1  x2 dlþ2 1  x2  2ðl þ 1Þxd lþ1 1  x2
 l
lðl þ 1Þdl 1  x2 :

Also noting that the second order and higher differentiations of 2lx vanish in LHS
of (3.166), we have
h  l i
RHS ¼ d lþ1 2lx 1  x2
 l  l
¼ 2lxd lþ1 1  x2  2lðl þ 1Þd l 1  x2 :

Therefore,

LHS  RHS
   l  l  l
¼ 1  x2 dlþ2 1  x2  2xd lþ1 1  x2 þ lðl þ 1Þdl 1  x2 ¼ 0:

We define Pl(x) as
3.6 Orbital Angular Momentum: Analytic Approach 93

ð1Þl d l h i
2 l
Pl ðxÞ  1  x , ð3:168Þ
2l l! dxl
l
where a constant ð1 Þ
2l l!
is multiplied according to the custom so that we can explicitly
represent Rodrigues formula of Legendre polynomials. Thus, from (3.164) Pl(x)
defined above satisfies Legendre differential equation. Rewriting it, we get

  d 2 P l ð xÞ dP ðxÞ
1  x2  2x l þ lðl þ 1ÞPl ðxÞ ¼ 0: ð3:169Þ
dx2 dx

Or equivalently, we have
 
d  
2 dPl ðxÞ
1x þ lðl þ 1ÞPl ðxÞ ¼ 0: ð3:170Þ
dx dx

Returning to (3.127) and using x as a variable, we rewrite (3.127) as


   
d   dPm
l ð xÞ m2
1  x2 þ l ð l þ 1Þ  Pm ðxÞ ¼ 0, ð3:171Þ
dx dx 1  x2 l

where l is a non-negative integer and m is an integer that takes following values:

m ¼ l, l  1, l  2,   1, 0,  1,     l þ 1,  l:

Deferential equations expressed as


 
d dyðxÞ
pð x Þ þ cðxÞyðxÞ ¼ 0
dx dx

are of particular importance. We will come back to this point in Sect. 8.3.
Since m can be either positive or negative, from (3.171) we notice that Pm l ðxÞ and
Pm
l ð x Þ must satisfy the same differential equation (3.171). This implies that Pml ð xÞ
m
and Pl ðxÞ are connected, i.e., linearly dependent. First, let us assume that m  0. In
the case of m < 0, we will examine it later soon.
According to Dennery and Krzywicki [5], we assume
 
2 m=2
l ðxÞ ¼ κ 1  x
Pm CðxÞ, ð3:172Þ

where κ is a constant. Inserting (3.172) into (3.171) and rearranging the terms, we
obtain
94 3 Hydrogen-Like Atoms

  d2 C dC
1  x2 2
 2ðm þ 1Þx þ ðl  mÞðl þ m þ 1ÞC ¼ 0 ð0  m  lÞ: ð3:173Þ
dx dx

Recall once again that if m ¼ 0, the associated Legendre differential equation


given by (3.127) and (3.171) is exactly identical to Legendre differential equation of
(3.170). Differentiating (3.170) m times, we get
   
  d 2 d m Pl d d m Pl
1x 2
m  2ðm þ 1Þx
dx2 dx dx dxm

d m Pl
þ ðl  m Þðl þ m þ 1 Þ ¼ 0, ð3:174Þ
dxm

where we used the Leibniz rule about differentiation of (3.167). Comparing (3.173)
and (3.174), we find that

d m Pl
C ð xÞ ¼ κ 0 ,
dxm

where κ0 is a constant. Inserting this relation into (3.172) and setting κκ 0 ¼ 1, we get

 
2 m=2 d Pl ðxÞ
m
l ðxÞ ¼ 1  x
Pm
dxm
ð0  m  lÞ: ð3:175Þ

Using Rodrigues formula of (3.168), we have

ð1Þl  m=2 dlþm h l i


l ð xÞ 
Pm l
1  x2 lþm
1  x2 : ð3:176Þ
2 l! dx

Equation (3.175) defines the associated Legendre functions. Note, however, that
the function form differs from literature to literature [2, 5, 6].
Among classical orthogonal polynomials, Gegenbauer polynomials C λn ðxÞ often
appear in the literature. The relevant differential equation is defined by

  d2 λ d
1  x2 C ðxÞ  ð2λ þ 1Þx C λn ðxÞ þ nðn þ 2λÞC λn ðxÞ
dx2 n dx

1
¼0 λ> : ð3:177Þ
2

Setting n ¼ l  m and λ ¼ m þ 12 in (3.177) [5], we have

  d2 mþ12 d mþ1
1  x2 C ðxÞ  2ðm þ 1Þx Clm2 ðxÞ
2 lm
dx dx
3.6 Orbital Angular Momentum: Analytic Approach 95

mþ1
þðl  mÞðl þ m þ 1ÞC lm2 ðxÞ ¼ 0: ð3:178Þ

Once again comparing (3.174) and (3.178), we obtain

d m P l ð xÞ mþ1
¼ constant  C lm2 ðxÞ ð0  m  lÞ: ð3:179Þ
dxm

Next, let us determine the constant appearing in (3.179). To this end, we consider
a following generating function of the polynomials C λn ðxÞ defined by [7, 8]

 λ X1 
1
1  2tx þ t 2  C λ ðxÞt n
n¼0 n
λ> : ð3:180Þ
2

To calculate (3.180), let us think of a following expression for x and λ:


X1  λ 
ð 1 þ xÞ ¼ m¼0
xm , ð3:181Þ
m
 

where λ is an arbitrary real number and we define as
m
   
λ λ
 λðλ  1Þðλ  2Þ  λ  m þ 1=m! and  1: ð3:182Þ
m 0

Notice that (3.181) with (3.182) is an extension of binomial theorem (generalized


binomial theorem). Putting λ ¼ n, we have
   
n n! n
¼ and ¼ 1:
m ðn  mÞ!m! 0

We rewrite (3.181) using gamma functions Γ(z) such that

X1 Γðλ þ 1Þ
ð1 þ xÞλ ¼ xm , ð3:183Þ
m¼0 m!Γðλ  m þ 1Þ

where Γ(z) is defined by integral representation as


Z 1
Γ ðzÞ ¼ et t z1 dt ð Re z > 0Þ: ð3:184Þ
0

Changing variables such that t ¼ u2, we have


96 3 Hydrogen-Like Atoms

Z 1
eu u2z1 du ð Re z > 0Þ:
2
ΓðzÞ ¼ 2 ð3:185Þ
0

Note that the above expression is associated with the following fundamental
feature of the gamma functions:

Γðz þ 1Þ ¼ zΓðzÞ, ð3:186Þ

where z is any complex number.


Replacing x with -t(2x - t) and rewriting (3.183), we have

\[
\left(1-2tx+t^2\right)^{-\lambda} = \sum_{m=0}^{\infty}\frac{\Gamma(-\lambda+1)}{m!\,\Gamma(-\lambda-m+1)}(-t)^m(2x-t)^m. \tag{3.187}
\]

Assuming that x is a real number belonging to the interval [-1, 1], (3.187) holds with t satisfying |t| < 1 [8]. The discussion is as follows: When x satisfies the above condition, solving 1 - 2tx + t² = 0 we get the solutions

\[
t_{\pm} = x \pm i\sqrt{1-x^2}.
\]

Defining r as

\[
r \equiv \min\{|t_+|,\,|t_-|\},
\]

(1 - 2tx + t²)^{-λ}, regarded as a function of t, is analytic in the disk |t| < r. But we have

\[
|t_{\pm}| = 1.
\]

Thus, (1 - 2tx + t²)^{-λ} is analytic within the disk |t| < 1 and, hence, it can be expanded in a Taylor series (see Chap. 6). Continuing the calculation of (3.187), we have

\[
\begin{aligned}
\left(1-2tx+t^2\right)^{-\lambda}
&= \sum_{m=0}^{\infty}\frac{\Gamma(-\lambda+1)}{m!\,\Gamma(-\lambda-m+1)}(-1)^m t^m\sum_{k=0}^{m}\frac{m!}{k!(m-k)!}2^{m-k}x^{m-k}(-1)^k t^k \\
&= \sum_{m=0}^{\infty}\sum_{k=0}^{m}\frac{(-1)^{m+k}}{k!(m-k)!}\frac{\Gamma(-\lambda+1)}{\Gamma(-\lambda-m+1)}2^{m-k}x^{m-k}t^{m+k} \\
&= \sum_{m=0}^{\infty}\sum_{k=0}^{m}\frac{(-1)^{m+k}(-1)^m}{k!(m-k)!}\frac{\Gamma(\lambda+m)}{\Gamma(\lambda)}2^{m-k}x^{m-k}t^{m+k},
\end{aligned} \tag{3.188}
\]
where the last equality results from rewriting the gamma functions using (3.186). Replacing (m + k) with n, we get

\[
\left(1-2tx+t^2\right)^{-\lambda} = \sum_{n=0}^{\infty}\sum_{k=0}^{[n/2]}\frac{(-1)^k 2^{n-2k}}{k!(n-2k)!}\frac{\Gamma(\lambda+n-k)}{\Gamma(\lambda)}x^{n-2k}t^n, \tag{3.189}
\]

where [n/2] represents the largest integer that does not exceed n/2. This expression comes from the requirement that the order of x must satisfy the condition

\[
n - 2k \ge 0 \quad \text{or} \quad k \le n/2. \tag{3.190}
\]

That is, if n is even, the maximum of k is n/2; if n is odd, the maximum of k is (n - 1)/2. Comparing (3.180) and (3.189), we get [8]

\[
C_n^{\lambda}(x) = \sum_{k=0}^{[n/2]}\frac{(-1)^k 2^{n-2k}}{k!(n-2k)!}\frac{\Gamma(\lambda+n-k)}{\Gamma(\lambda)}x^{n-2k}. \tag{3.191}
\]
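The series (3.191) is easy to check numerically. The following is a minimal sketch, assuming SciPy is available, that compares it with the library's Gegenbauer polynomials at a sample point.

```python
# Sketch (assumes scipy): compare the explicit series (3.191) for C_n^lambda(x)
# with SciPy's built-in Gegenbauer polynomials at a sample point.
from math import gamma, factorial
import numpy as np
from scipy.special import eval_gegenbauer

def C_series(n, lam, x):
    # (3.191): the k-sum runs over 0 <= k <= [n/2]
    return sum((-1)**k * 2**(n - 2*k) * gamma(lam + n - k)
               / (gamma(lam) * factorial(k) * factorial(n - 2*k)) * x**(n - 2*k)
               for k in range(n // 2 + 1))

x = 0.3
for n in range(6):
    for lam in (0.5, 1.0, 2.5):
        assert np.isclose(C_series(n, lam, x), eval_gegenbauer(n, lam, x))
print("(3.191) agrees with scipy's Gegenbauer polynomials")
```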

Comparing (3.164) and (3.177) and putting λ = 1/2, we immediately find that the two differential equations are identical [7]. That is,

\[
C_n^{1/2}(x) = P_n(x). \tag{3.192}
\]

Hence, we further have

\[
P_n(x) = \sum_{k=0}^{[n/2]}\frac{(-1)^k 2^{n-2k}\,\Gamma\!\left(\frac{1}{2}+n-k\right)}{k!(n-2k)!\,\Gamma\!\left(\frac{1}{2}\right)}x^{n-2k}. \tag{3.193}
\]

Using (3.186) once again, we get [8]

\[
P_n(x) = \sum_{k=0}^{[n/2]}\frac{(-1)^k(2n-2k)!}{2^n\,k!(n-k)!(n-2k)!}x^{n-2k}. \tag{3.194}
\]

It is convenient to make a formula for the gamma function. In (3.193), n - k > 0, and so let us think of Γ(1/2 + m) (m: positive integer). Using (3.186), we have

\[
\begin{aligned}
\Gamma\!\left(\frac{1}{2}+m\right) &= \left(m-\frac{1}{2}\right)\Gamma\!\left(m-\frac{1}{2}\right) = \left(m-\frac{1}{2}\right)\left(m-\frac{3}{2}\right)\Gamma\!\left(m-\frac{3}{2}\right) = \cdots \\
&= \left(m-\frac{1}{2}\right)\left(m-\frac{3}{2}\right)\cdots\frac{1}{2}\,\Gamma\!\left(\frac{1}{2}\right) = 2^{-m}(2m-1)(2m-3)\cdots 3\cdot 1\cdot\Gamma\!\left(\frac{1}{2}\right) \\
&= 2^{-m}\frac{(2m-1)!}{2^{m-1}(m-1)!}\,\Gamma\!\left(\frac{1}{2}\right) = 2^{-2m}\frac{(2m)!}{m!}\,\Gamma\!\left(\frac{1}{2}\right).
\end{aligned} \tag{3.195}
\]

Notice that (3.195) still holds even if m = 0. Inserting n - k into m of (3.195), we get

\[
\Gamma\!\left(\frac{1}{2}+n-k\right) = 2^{-2(n-k)}\frac{(2n-2k)!}{(n-k)!}\,\Gamma\!\left(\frac{1}{2}\right). \tag{3.196}
\]

Replacing Γ(1/2 + n - k) of (3.193) with the RHS of the above equation, (3.194) follows. The gamma function Γ(1/2) often appears in mathematical physics. According to (3.185), we have

\[
\Gamma\!\left(\frac{1}{2}\right) = 2\int_0^{\infty}e^{-u^2}\,du = \sqrt{\pi}.
\]

For the derivation of the above definite integral, see (2.86) of Sect. 2.4. From (3.184), we also have

\[
\Gamma(1) = 1.
\]
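These gamma-function identities lend themselves to quick numerical confirmation; the following minimal sketch checks (3.195) for the first several m.

```python
# Sketch: check the half-integer gamma formula (3.195),
# Gamma(1/2 + m) = 2^(-2m) (2m)!/m! * Gamma(1/2), with Gamma(1/2) = sqrt(pi).
from math import gamma, factorial, sqrt, pi, isclose

for m in range(10):
    lhs = gamma(0.5 + m)
    rhs = 2.0**(-2*m) * factorial(2*m) / factorial(m) * sqrt(pi)
    assert isclose(lhs, rhs, rel_tol=1e-12)
print("(3.195) holds numerically for m = 0, ..., 9")
```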

In relation to the discussion of Sect. 3.5, let us derive an important formula for the Legendre polynomials. From (3.180) and (3.192), we get

\[
\left(1-2tx+t^2\right)^{-1/2} \equiv \sum_{n=0}^{\infty}P_n(x)\,t^n. \tag{3.197}
\]

Assuming |t| < 1, when we put x = 1 in (3.197), we have

\[
\left(1-2t+t^2\right)^{-\frac{1}{2}} = \frac{1}{1-t} = \sum_{n=0}^{\infty}t^n = \sum_{n=0}^{\infty}P_n(1)\,t^n. \tag{3.198}
\]

Comparing the individual coefficients of t^n in (3.198), we get

\[
P_n(1) = 1.
\]

See the related parts of Sect. 3.5.


Now we are in a position to determine the constant in (3.179). Differentiating (3.194) m times, we have

\[
\begin{aligned}
\frac{d^m P_l(x)}{dx^m} &= \sum_{k=0}^{[(l-m)/2]}\frac{(-1)^k(2l-2k)!\,(l-2k)(l-2k-1)\cdots(l-2k-m+1)}{2^l\,k!(l-k)!(l-2k)!}x^{l-2k-m} \\
&= \sum_{k=0}^{[(l-m)/2]}\frac{(-1)^k(2l-2k)!}{2^l\,k!(l-k)!(l-2k-m)!}x^{l-2k-m}.
\end{aligned} \tag{3.199}
\]

Meanwhile, we have

\[
C_{l-m}^{m+\frac{1}{2}}(x) = \sum_{k=0}^{[(l-m)/2]}\frac{(-1)^k 2^{l-2k-m}\,\Gamma\!\left(l+\frac{1}{2}-k\right)}{k!(l-2k-m)!\,\Gamma\!\left(m+\frac{1}{2}\right)}x^{l-2k-m}. \tag{3.200}
\]

Using (3.195) and (3.196), we have

\[
\frac{\Gamma\!\left(l+\frac{1}{2}-k\right)}{\Gamma\!\left(m+\frac{1}{2}\right)} = 2^{-2(l-k-m)}\frac{(2l-2k)!}{(l-k)!}\frac{m!}{(2m)!}.
\]

Therefore, we get

\[
C_{l-m}^{m+\frac{1}{2}}(x) = \sum_{k=0}^{[(l-m)/2]}\frac{(-1)^k(2l-2k)!}{2^l\,k!(l-k)!(l-2k-m)!}\frac{2^m\,\Gamma(m+1)}{\Gamma(2m+1)}x^{l-2k-m}, \tag{3.201}
\]

where we used m! = Γ(m + 1) and (2m)! = Γ(2m + 1). Comparing (3.199) and (3.201), we get

\[
\frac{d^m P_l(x)}{dx^m} = \frac{\Gamma(2m+1)}{2^m\,\Gamma(m+1)}\,C_{l-m}^{m+\frac{1}{2}}(x). \tag{3.202}
\]

Thus, we find that the constant appearing in (3.179) is Γ(2m + 1)/[2^m Γ(m + 1)]. Putting m = 0 in (3.202), we have P_l(x) = C_l^{1/2}(x); therefore, (3.192) is certainly recovered. This gives an easy check of (3.202).
Meanwhile, the Rodrigues formula for the Gegenbauer polynomials [5] is given by

\[
C_n^{\lambda}(x) = \frac{(-1)^n\,\Gamma(n+2\lambda)\,\Gamma\!\left(\lambda+\frac{1}{2}\right)}{2^n\,n!\,\Gamma\!\left(n+\lambda+\frac{1}{2}\right)\Gamma(2\lambda)}\left(1-x^2\right)^{-\lambda+\frac{1}{2}}\frac{d^n}{dx^n}\left[\left(1-x^2\right)^{n+\lambda-\frac{1}{2}}\right]. \tag{3.203}
\]

Hence, we have

\[
C_{l-m}^{m+\frac{1}{2}}(x) = \frac{(-1)^{l-m}\,\Gamma(l+m+1)\,\Gamma(m+1)}{2^{l-m}(l-m)!\,\Gamma(l+1)\,\Gamma(2m+1)}\left(1-x^2\right)^{-m}\frac{d^{l-m}}{dx^{l-m}}\left[\left(1-x^2\right)^l\right]. \tag{3.204}
\]

Inserting (3.204) into (3.202), we have

\[
\frac{d^m P_l(x)}{dx^m} = \frac{(-1)^{l-m}\,\Gamma(l+m+1)}{2^l(l-m)!\,\Gamma(l+1)}\left(1-x^2\right)^{-m}\frac{d^{l-m}}{dx^{l-m}}\left[\left(1-x^2\right)^l\right]
= \frac{(-1)^{l-m}(l+m)!}{2^l\,l!(l-m)!}\left(1-x^2\right)^{-m}\frac{d^{l-m}}{dx^{l-m}}\left[\left(1-x^2\right)^l\right]. \tag{3.205}
\]

Further inserting this into (3.175), we finally get

\[
P_l^m(x) = \frac{(-1)^{l-m}(l+m)!}{2^l\,l!(l-m)!}\left(1-x^2\right)^{-m/2}\frac{d^{l-m}}{dx^{l-m}}\left[\left(1-x^2\right)^l\right]. \tag{3.206}
\]

When m = 0, we have

\[
P_l^0(x) = \frac{(-1)^l}{2^l\,l!}\frac{d^l}{dx^l}\left[\left(1-x^2\right)^l\right] = P_l(x). \tag{3.207}
\]

Thus, we recover the functional form of the Legendre polynomials. The expression (3.206) is also meaningful for negative m, provided |m| ≤ l, and permits an extension of the definition of P_l^m(x) given by (3.175) to negative numbers of m [5]. Changing m to -m in (3.206), we have

\[
P_l^{-m}(x) = \frac{(-1)^{l+m}(l-m)!}{2^l\,l!(l+m)!}\left(1-x^2\right)^{m/2}\frac{d^{l+m}}{dx^{l+m}}\left[\left(1-x^2\right)^l\right]. \tag{3.208}
\]

Meanwhile, from (3.168) and (3.175),

\[
P_l^m(x) = \frac{(-1)^l}{2^l\,l!}\left(1-x^2\right)^{m/2}\frac{d^{l+m}}{dx^{l+m}}\left[\left(1-x^2\right)^l\right] \quad (0 \le m \le l). \tag{3.209}
\]

Comparing (3.208) and (3.209), we get

\[
P_l^{-m}(x) = \frac{(-1)^m(l-m)!}{(l+m)!}\,P_l^m(x) \quad (-l \le m \le l). \tag{3.210}
\]

Thus, as expected earlier, P_l^{-m}(x) and P_l^m(x) are linearly dependent.
Now we return to (3.149). From (3.206) we have

\[
\left(1-x^2\right)^{-m/2}\frac{d^{l-m}}{dx^{l-m}}\left[\left(1-x^2\right)^l\right] = \frac{(-1)^{l-m}\,2^l\,l!(l-m)!}{(l+m)!}\,P_l^m(x). \tag{3.211}
\]

Inserting (3.211) into (3.149) and changing the variable x to ξ, we have

\[
Y_l^m(\theta,\phi) = (-1)^m\sqrt{\frac{(2l+1)(l-m)!}{4\pi\,(l+m)!}}\,P_l^m(\xi)\,e^{im\phi} \quad (\xi = \cos\theta;\ 0 \le \theta \le \pi). \tag{3.212}
\]

The coefficient (-1)^m appearing in (3.212) is well known as the Condon–Shortley phase [7]. Another important expression, obtained from (3.210) and (3.212), is

\[
Y_l^{-m}(\theta,\phi) = (-1)^m\,Y_l^m(\theta,\phi)^*. \tag{3.213}
\]

Since (3.208) and (3.209) involve higher-order differentiations, it would be somewhat inconvenient to find their functional forms. Here we seek a convenient representation of the spherical harmonics using the familiar cosine and sine functions. Starting with (3.206) and applying the Leibniz rule there, we have

\[
\begin{aligned}
P_l^m(x) &= \frac{(-1)^{l-m}(l+m)!}{2^l\,l!(l-m)!}(1+x)^{-m/2}(1-x)^{-m/2}\sum_{r=0}^{l-m}\frac{(l-m)!}{r!(l-m-r)!}\frac{d^r}{dx^r}\left[(1+x)^l\right]\frac{d^{l-m-r}}{dx^{l-m-r}}\left[(1-x)^l\right] \\
&= \frac{(-1)^{l-m}(l+m)!}{2^l\,l!(l-m)!}\sum_{r=0}^{l-m}\frac{(l-m)!}{r!(l-m-r)!}\frac{l!\,(1+x)^{l-r-\frac{m}{2}}}{(l-r)!}\frac{(-1)^{l-m-r}\,l!\,(1-x)^{m+r-\frac{m}{2}}}{(m+r)!} \\
&= \frac{l!\,(l+m)!}{2^l}\sum_{r=0}^{l-m}\frac{(-1)^r}{r!(l-m-r)!}\frac{(1+x)^{l-r-\frac{m}{2}}}{(l-r)!}\frac{(1-x)^{r+\frac{m}{2}}}{(m+r)!}.
\end{aligned} \tag{3.214}
\]

Putting x = cos θ in (3.214) and using a trigonometric formula, we have

\[
P_l^m(\cos\theta) = l!\,(l+m)!\sum_{r=0}^{l-m}\frac{(-1)^r\cos^{2l-2r-m}\frac{\theta}{2}\,\sin^{2r+m}\frac{\theta}{2}}{r!(l-m-r)!\,(l-r)!\,(m+r)!}. \tag{3.215}
\]

Inserting this into (3.212), we get

\[
Y_l^m(\theta,\phi) = l!\sqrt{\frac{(2l+1)(l+m)!(l-m)!}{4\pi}}\,e^{im\phi}\sum_{r=0}^{l-m}\frac{(-1)^{r+m}\cos^{2l-2r-m}\frac{\theta}{2}\,\sin^{2r+m}\frac{\theta}{2}}{r!(l-m-r)!(l-r)!(m+r)!}. \tag{3.216}
\]

The summation domain of r must be determined so that factorials of negative integers are avoided [6]. That is,

(i) If m ≥ 0: 0 ≤ r ≤ l - m, i.e., (l - m + 1) terms.
(ii) If m < 0: |m| ≤ r ≤ l, i.e., (l - |m| + 1) terms.

For example, if we choose l for m, putting r = 0 in (3.216) we have

\[
Y_l^l(\theta,\phi) = \sqrt{\frac{(2l+1)!}{4\pi}}\,e^{il\phi}\frac{(-1)^l\cos^l\frac{\theta}{2}\,\sin^l\frac{\theta}{2}}{l!} = \sqrt{\frac{(2l+1)!}{4\pi}}\,e^{il\phi}\frac{(-1)^l\sin^l\theta}{2^l\,l!}. \tag{3.217}
\]

In particular, we have Y_0^0(θ, φ) = √(1/4π), which recovers (3.150). When m = -l, putting r = l in (3.216) we get

\[
Y_l^{-l}(\theta,\phi) = \sqrt{\frac{(2l+1)!}{4\pi}}\,e^{-il\phi}\frac{\cos^l\frac{\theta}{2}\,\sin^l\frac{\theta}{2}}{l!} = \sqrt{\frac{(2l+1)!}{4\pi}}\,e^{-il\phi}\frac{\sin^l\theta}{2^l\,l!}. \tag{3.218}
\]

For instance, choosing l = 3 and m = ±3 and using (3.217) or (3.218), we have

\[
Y_3^3(\theta,\phi) = -\sqrt{\frac{35}{64\pi}}\,e^{i3\phi}\sin^3\theta, \qquad Y_3^{-3}(\theta,\phi) = \sqrt{\frac{35}{64\pi}}\,e^{-i3\phi}\sin^3\theta.
\]

The minus sign appearing in Y_3^3(θ, φ) is due to the Condon–Shortley phase. For l = 3 and m = 0, moreover, we have

\[
\begin{aligned}
Y_3^0(\theta,\phi) &= 3!\sqrt{\frac{7\cdot 3!\cdot 3!}{4\pi}}\sum_{r=0}^{3}\frac{(-1)^r\cos^{6-2r}\frac{\theta}{2}\,\sin^{2r}\frac{\theta}{2}}{r!(3-r)!\,(3-r)!\,r!} \\
&= 18\sqrt{\frac{7}{\pi}}\left[\frac{\cos^6\frac{\theta}{2}}{0!3!3!0!} - \frac{\cos^4\frac{\theta}{2}\sin^2\frac{\theta}{2}}{1!2!2!1!} + \frac{\cos^2\frac{\theta}{2}\sin^4\frac{\theta}{2}}{2!1!1!2!} - \frac{\sin^6\frac{\theta}{2}}{3!0!0!3!}\right] \\
&= 18\sqrt{\frac{7}{\pi}}\left[\frac{\cos^6\frac{\theta}{2}-\sin^6\frac{\theta}{2}}{36} - \frac{\cos^2\frac{\theta}{2}\sin^2\frac{\theta}{2}\left(\cos^2\frac{\theta}{2}-\sin^2\frac{\theta}{2}\right)}{4}\right] \\
&= \sqrt{\frac{7}{\pi}}\left(\frac{5}{4}\cos^3\theta - \frac{3}{4}\cos\theta\right),
\end{aligned}
\]

where in the last equality we used formulae of elementary algebra and of trigonometric functions. At the same time, we get

\[
Y_3^0(0,\phi) = \sqrt{\frac{7}{4\pi}}.
\]

This is consistent with (3.147) in that Y_3^0(0, φ) is positive.
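These closed forms can be checked against a standard library. The sketch below assumes SciPy, whose sph_harm carries the same Condon–Shortley convention as (3.212); note that scipy.special.sph_harm(m, l, φ, θ) takes the azimuthal angle before the polar one.

```python
# Sketch (assumes scipy): compare the closed forms of Y_3^{+-3} and Y_3^0
# derived above with scipy.special.sph_harm, which uses the same
# Condon-Shortley convention as (3.212).
import numpy as np
from scipy.special import sph_harm

theta, phi = 0.7, 1.1                    # polar, azimuthal angles (radians)
Y3p3 = -np.sqrt(35/(64*np.pi)) * np.exp(3j*phi) * np.sin(theta)**3
Y3m3 = np.sqrt(35/(64*np.pi)) * np.exp(-3j*phi) * np.sin(theta)**3
Y30 = np.sqrt(7/np.pi) * (1.25*np.cos(theta)**3 - 0.75*np.cos(theta))

assert np.isclose(Y3p3, sph_harm(3, 3, phi, theta))
assert np.isclose(Y3m3, sph_harm(-3, 3, phi, theta))
assert np.isclose(Y30, sph_harm(0, 3, phi, theta))
print("closed forms of Y_3^m agree with scipy")
```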

3.6.2 Orthogonality of Associated Legendre Functions

The orthogonality relations of functions are important. Here we deal with them for the associated Legendre functions. Replacing m with (m - 1) in (3.174) and using the abbreviated notation d^m ≡ d^m/dx^m introduced before, we have

\[
\left(1-x^2\right)d^{m+1}P_l - 2mx\,d^m P_l + (l+m)(l-m+1)\,d^{m-1}P_l = 0. \tag{3.219}
\]

Multiplying both sides by (1 - x²)^{m-1}, we have

\[
\left(1-x^2\right)^m d^{m+1}P_l - 2mx\left(1-x^2\right)^{m-1}d^m P_l + (l+m)(l-m+1)\left(1-x^2\right)^{m-1}d^{m-1}P_l = 0.
\]

Rewriting the above equation, we get

\[
d\!\left[\left(1-x^2\right)^m d^m P_l\right] = -(l+m)(l-m+1)\left(1-x^2\right)^{m-1}d^{m-1}P_l. \tag{3.220}
\]

Now, let us define f(m) as follows:

\[
f(m) \equiv \int_{-1}^{1}\left(1-x^2\right)^m\left(d^m P_l\right)\left(d^m P_{l'}\right)dx \quad (0 \le m \le l, l'). \tag{3.221}
\]

Rewriting (3.221) as follows and integrating by parts, we have

\[
\begin{aligned}
f(m) &= \int_{-1}^{1}\left(1-x^2\right)^m\left[d\!\left(d^{m-1}P_l\right)\right]d^m P_{l'}\,dx \\
&= \left[\left(d^{m-1}P_l\right)\left(1-x^2\right)^m d^m P_{l'}\right]_{-1}^{1} - \int_{-1}^{1}\left(d^{m-1}P_l\right)d\!\left[\left(1-x^2\right)^m d^m P_{l'}\right]dx \\
&= -\int_{-1}^{1}\left(d^{m-1}P_l\right)d\!\left[\left(1-x^2\right)^m d^m P_{l'}\right]dx \\
&= \int_{-1}^{1}\left(d^{m-1}P_l\right)(l'+m)(l'-m+1)\left(1-x^2\right)^{m-1}d^{m-1}P_{l'}\,dx \\
&= (l'+m)(l'-m+1)\,f(m-1),
\end{aligned} \tag{3.222}
\]

where in the second equality the first term vanishes, and in the second-to-last equality we used (3.220). Equation (3.222) gives a recurrence formula for f(m). Performing the calculation repeatedly, we get

\[
\begin{aligned}
f(m) &= (l'+m)(l'+m-1)\cdots(l'-m+2)(l'-m+1)\,f(m-2) \\
&= \cdots \\
&= (l'+m)(l'+m-1)\cdots(l'+1)\cdot l'\cdots(l'-m+2)(l'-m+1)\,f(0) = \frac{(l'+m)!}{(l'-m)!}\,f(0),
\end{aligned} \tag{3.223}
\]

where

\[
f(0) = \int_{-1}^{1}P_l(x)P_{l'}(x)\,dx. \tag{3.224}
\]

Note that in (3.223) the coefficient of f(0) comprises 2m factors.


In (3.224), P_l(x) and P_{l'}(x) are the Legendre polynomials defined in (3.165). Then, using (3.168) we have

\[
f(0) = \frac{(-1)^l}{2^l\,l!}\frac{(-1)^{l'}}{2^{l'}\,l'!}\int_{-1}^{1}\left[d^l\!\left(1-x^2\right)^l\right]\left[d^{l'}\!\left(1-x^2\right)^{l'}\right]dx. \tag{3.225}
\]

To evaluate (3.224), we have two cases: (i) l ≠ l' and (ii) l = l'. In the first case, assuming that l > l' and integrating by parts, we have

\[
\begin{aligned}
I &= \int_{-1}^{1}\left[d^l\!\left(1-x^2\right)^l\right]\left[d^{l'}\!\left(1-x^2\right)^{l'}\right]dx \\
&= \left[\left\{d^{l-1}\!\left(1-x^2\right)^l\right\}\left\{d^{l'}\!\left(1-x^2\right)^{l'}\right\}\right]_{-1}^{1} - \int_{-1}^{1}\left[d^{l-1}\!\left(1-x^2\right)^l\right]\left[d^{l'+1}\!\left(1-x^2\right)^{l'}\right]dx.
\end{aligned} \tag{3.226}
\]

In the above, the first term vanishes because it contains (1 - x²) as a factor. Integrating (3.226) by parts another l' times as before, we get

\[
I = (-1)^{l'+1}\int_{-1}^{1}\left[d^{l-l'-1}\!\left(1-x^2\right)^l\right]\left[d^{2l'+1}\!\left(1-x^2\right)^{l'}\right]dx. \tag{3.227}
\]

In (3.227), 0 ≤ l - l' - 1 ≤ 2l, and so d^{l-l'-1}(1 - x²)^l does not vanish; but (1 - x²)^{l'} is an at most 2l'-degree polynomial and, hence, d^{2l'+1}(1 - x²)^{l'} vanishes. Therefore,

\[
f(0) = 0. \tag{3.228}
\]

If l < l', exchanging the roles of P_l(x) and P_{l'}(x) in (3.224), we get f(0) = 0 as well.
In the second case of l = l', we evaluate the following integral:

\[
I = \int_{-1}^{1}\left[d^l\!\left(1-x^2\right)^l\right]^2 dx. \tag{3.229}
\]

Similarly, integrating (3.229) by parts l times, we have

\[
I = (-1)^l\int_{-1}^{1}\left(1-x^2\right)^l d^{2l}\!\left[\left(1-x^2\right)^l\right]dx = (-1)^{2l}(2l)!\int_{-1}^{1}\left(1-x^2\right)^l dx. \tag{3.230}
\]

In (3.230), changing x to cos θ, we have

\[
\int_{-1}^{1}\left(1-x^2\right)^l dx = \int_0^{\pi}\sin^{2l+1}\theta\,d\theta. \tag{3.231}
\]

We have already estimated this integral in (3.132) to be 2^{2l+1}(l!)²/(2l + 1)!. Therefore,

\[
f(0) = \frac{(-1)^{2l}(2l)!}{2^{2l}(l!)^2}\cdot\frac{2^{2l+1}(l!)^2}{(2l+1)!} = \frac{2}{2l+1}. \tag{3.232}
\]

Thus, we get

\[
f(m) = \frac{(l+m)!}{(l-m)!}\,f(0) = \frac{(l+m)!}{(l-m)!}\cdot\frac{2}{2l+1}. \tag{3.233}
\]

From (3.228) and (3.233), we have

\[
\int_{-1}^{1}P_l^m(x)P_{l'}^m(x)\,dx = \frac{(l+m)!}{(l-m)!}\cdot\frac{2}{2l+1}\,\delta_{ll'}. \tag{3.234}
\]

Accordingly, putting

\[
\widetilde{P}_l^m(x) \equiv \sqrt{\frac{(2l+1)(l-m)!}{2(l+m)!}}\,P_l^m(x), \tag{3.235}
\]

we get

\[
\int_{-1}^{1}\widetilde{P}_l^m(x)\widetilde{P}_{l'}^m(x)\,dx = \delta_{ll'}. \tag{3.236}
\]

The normalized Legendre polynomials immediately follow. They are given by

\[
\widetilde{P}_l(x) \equiv \sqrt{\frac{2l+1}{2}}\,P_l(x) = \frac{(-1)^l}{2^l\,l!}\sqrt{\frac{2l+1}{2}}\frac{d^l}{dx^l}\left[\left(1-x^2\right)^l\right]. \tag{3.237}
\]

Combining the normalized function (3.235) with (1/√2π)e^{imφ}, we recover

\[
Y_l^m(\theta,\phi) = \sqrt{\frac{(2l+1)(l-m)!}{4\pi(l+m)!}}\,P_l^m(x)\,e^{im\phi} \quad (x = \cos\theta;\ 0 \le \theta \le \pi). \tag{3.238}
\]

Notice in (3.238), however, that we could not determine the Condon–Shortley phase (-1)^m in this way; see (3.212).
Since P_l^m(x) and P_l^{-m}(x) are linearly dependent, as noted in (3.210), the set of the associated Legendre functions alone cannot define a complete orthonormal system. In fact, we have

\[
\int_{-1}^{1}P_l^m(x)P_l^{-m}(x)\,dx = \frac{(-1)^m(l-m)!}{(l+m)!}\cdot\frac{(l+m)!}{(l-m)!}\cdot\frac{2}{2l+1} = \frac{2(-1)^m}{2l+1}. \tag{3.239}
\]

This means that P_l^m(x) and P_l^{-m}(x) are not orthogonal. Thus, we need e^{imφ} to constitute the complete orthonormal system. In other words,

\[
\int_0^{2\pi}d\phi\int_{-1}^{1}d(\cos\theta)\left[Y_{l'}^{m'}(\theta,\phi)\right]^*Y_l^m(\theta,\phi) = \delta_{ll'}\,\delta_{mm'}. \tag{3.240}
\]
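The orthogonality relation (3.234) is easy to confirm by quadrature. A minimal sketch follows, assuming SciPy; its lpmv carries the Condon–Shortley phase, which cancels in the product of two functions of the same m and so does not affect (3.234).

```python
# Sketch (assumes scipy): verify the orthogonality relation (3.234) by
# Gauss-Legendre quadrature. scipy.special.lpmv carries the Condon-Shortley
# phase (-1)^m, which cancels in the product P_l^m P_l'^m.
import numpy as np
from math import factorial
from scipy.special import lpmv

x, w = np.polynomial.legendre.leggauss(60)
m = 2
for l in range(m, 6):
    for lp in range(m, 6):
        integral = np.sum(w * lpmv(m, l, x) * lpmv(m, lp, x))
        expected = 2*factorial(l + m)/((2*l + 1)*factorial(l - m)) if l == lp else 0.0
        assert np.isclose(integral, expected, atol=1e-10)
print("(3.234) verified for m = 2 and l, l' <= 5")
```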

3.7 Radial Wave Functions of Hydrogen-Like Atoms

In Sect. 3.1 we constructed the Hamiltonian of hydrogen-like atoms. If the physical system is characterized by a central force field, the method of separation of variables into the angular part (θ, φ) and the radial (r) part is successfully applied to the problem, and that method allows us to deal with the Schrödinger equation separately. The spherical surface harmonics play a central role in dealing with the differential equations related to the angular part. We studied important properties of the special functions, such as the Legendre polynomials and associated Legendre functions, independent of the nature of the specific central force field, be it the Coulomb potential or the Yukawa potential. With the Schrödinger equation pertinent to the radial part, on the other hand, the characteristics differ depending on the nature of the individual force fields. Of these, the differential equation associated with the Coulomb potential gives exact (or analytical) solutions. It is well known that second-order differential equations can often be solved by an operator representation method. Examples include its application to a quantum-mechanical harmonic oscillator and to the angular momentum of a particle placed in a central force field. Nonetheless, the corresponding approach to the radial equation for the electron has been less popular to date. The initial approach, however, was made by Sunakawa [3]. The purpose of this section rests upon further improvement of that approach.

3.7.1 Operator Approach to Radial Wave Functions

In Sect. 3.2, the separation of variables led to the radial part of the Schrödinger equation, described as

\[
-\frac{1}{2\mu}\frac{\hbar^2}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial R(r)}{\partial r}\right) + \frac{\hbar^2\lambda}{2\mu r^2}R(r) - \frac{Ze^2}{4\pi\varepsilon_0 r}R(r) = ER(r). \tag{3.51}
\]

We identified λ with l(l + 1) in (3.124). Thus, rewriting (3.51) and indexing R(r) with l, we have

\[
-\frac{\hbar^2}{2\mu r^2}\frac{d}{dr}\left(r^2\frac{dR_l(r)}{dr}\right) + \left[\frac{\hbar^2 l(l+1)}{2\mu r^2} - \frac{Ze^2}{4\pi\varepsilon_0 r}\right]R_l(r) = ER_l(r), \tag{3.241}
\]

where R_l(r) is a radial wave function parametrized with l; μ, Z, ε₀, and E denote the reduced mass of the hydrogen-like atom, the atomic number, the permittivity of vacuum, and the energy eigenvalue, respectively. Otherwise we follow the usual conventions.

Now we are in a position to solve (3.241). As in the cases of the quantum-mechanical harmonic oscillator of Chap. 2 and the angular momentum operator of the previous section, we present the operator formalism for dealing with the radial wave functions of hydrogen-like atoms. The essential point rests upon the fact that the radial wave functions can be derived by successively operating with lowering operators on the radial wave function having the maximum allowed orbital angular momentum quantum number. The results agree with the conventional coordinate representation method based upon a power series expansion related to the associated Laguerre polynomials.

Sunakawa [3] introduced the following differential equation by suitable transformations of variable, parameter, and function:

\[
-\frac{d^2\psi_l(\rho)}{d\rho^2} + \left[\frac{l(l+1)}{\rho^2} - \frac{2}{\rho}\right]\psi_l(\rho) = \mathcal{E}\,\psi_l(\rho), \tag{3.242}
\]

where ρ = Zr/a, 𝓔 = (2μ/ħ²)(a/Z)²E, and ψ_l(ρ) = ρR_l(r), with a (= 4πε₀ħ²/μe²) being the Bohr radius of the hydrogen-like atom. Note that ρ and 𝓔 are dimensionless quantities. The related calculations are as follows: We have

\[
\frac{dR_l}{dr} = \frac{d(\psi_l/\rho)}{d\rho}\frac{d\rho}{dr} = \left(\frac{d\psi_l}{d\rho}\frac{1}{\rho} - \frac{\psi_l}{\rho^2}\right)\frac{Z}{a}.
\]

Thus, we get

\[
r^2\frac{dR_l}{dr} = \left(\frac{d\psi_l}{d\rho}\rho - \psi_l\right)\frac{a}{Z}, \qquad \frac{d}{dr}\left(r^2\frac{dR_l}{dr}\right) = \frac{d^2\psi_l(\rho)}{d\rho^2}\rho.
\]

Using the above relations, we arrive at (3.242).


Here we define the following operators:

\[
b_l \equiv \frac{d}{d\rho} + \frac{l}{\rho} - \frac{1}{l}. \tag{3.243}
\]

Hence,

\[
b_l^{\dagger} = -\frac{d}{d\rho} + \frac{l}{\rho} - \frac{1}{l}, \tag{3.244}
\]

where the operator b_l^† is the adjoint operator of b_l. Notice that these definitions are different from those of Sunakawa [3]. The operator d/dρ (≡ A) is formally an anti-Hermitian operator. We have mentioned such operators in Sect. 1.5. The remaining part of (3.243) and (3.244), l/ρ - 1/l, is a Hermitian operator, which we define as H. Thus, we foresee that b_l and b_l^† can be denoted as follows:

\[
b_l = A + H \quad \text{and} \quad b_l^{\dagger} = -A + H.
\]

These representations are analogous to those appearing in the operator formalism of a quantum-mechanical harmonic oscillator. Special care, however, should be taken in dealing with the operators b_l and b_l^†. First, we should carefully examine whether d/dρ is in fact an anti-Hermitian operator. This is because, for d/dρ to be anti-Hermitian, the solution ψ_l(ρ) must satisfy boundary conditions such that ψ_l(ρ) vanishes or takes the same value at the endpoints ρ → 0 and ρ → ∞. Second, the coordinate system we have chosen is not a Cartesian but the polar (spherical) coordinate system, and so ρ is defined only on the domain ρ > 0. We will come back to this point later.

Let us proceed with the calculations. We have

\[
\begin{aligned}
b_l b_l^{\dagger} &= \left(\frac{d}{d\rho} + \frac{l}{\rho} - \frac{1}{l}\right)\left(-\frac{d}{d\rho} + \frac{l}{\rho} - \frac{1}{l}\right) \\
&= -\frac{d^2}{d\rho^2} - \frac{l}{\rho^2} + \frac{l^2}{\rho^2} - \frac{2}{\rho} + \frac{1}{l^2} = -\frac{d^2}{d\rho^2} + \frac{l(l-1)}{\rho^2} - \frac{2}{\rho} + \frac{1}{l^2}.
\end{aligned} \tag{3.245}
\]

Also, we have

\[
b_l^{\dagger}b_l = -\frac{d^2}{d\rho^2} + \frac{l(l+1)}{\rho^2} - \frac{2}{\rho} + \frac{1}{l^2}. \tag{3.246}
\]

We further define an operator H^{(l)} as follows:

\[
H^{(l)} \equiv -\frac{d^2}{d\rho^2} + \frac{l(l+1)}{\rho^2} - \frac{2}{\rho}.
\]

Then, from (3.243) and (3.244) as well as (3.245) and (3.246), we have

\[
H^{(l)} = b_{l+1}b_{l+1}^{\dagger} + \varepsilon^{(l)} \quad (l \ge 0), \tag{3.247}
\]

where ε^{(l)} ≡ -1/(l + 1)². Alternatively,

\[
H^{(l)} = b_l^{\dagger}b_l + \varepsilon^{(l-1)} \quad (l \ge 1). \tag{3.248}
\]
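Both factorizations can be confirmed symbolically. The following sketch, assuming sympy is available, applies each side to a generic function f(ρ) and checks that the residues vanish identically.

```python
# Sketch (assumes sympy): apply both sides of the factorizations (3.247) and
# (3.248) to a generic f(rho) and check that the residues vanish identically.
import sympy as sp

rho, l = sp.symbols('rho l', positive=True)
f = sp.Function('f')(rho)

def b(k, g):       # b_k g = g' + (k/rho - 1/k) g, cf. (3.243)
    return sp.diff(g, rho) + (k/rho - 1/k)*g

def bdag(k, g):    # adjoint b_k^dagger g = -g' + (k/rho - 1/k) g, cf. (3.244)
    return -sp.diff(g, rho) + (k/rho - 1/k)*g

Hf = -sp.diff(f, rho, 2) + (l*(l + 1)/rho**2 - 2/rho)*f       # H^(l) f
rhs1 = b(l + 1, bdag(l + 1, f)) - f/(l + 1)**2                # (3.247), eps^(l)
rhs2 = bdag(l, b(l, f)) - f/l**2                              # (3.248), eps^(l-1)
assert sp.simplify(rhs1 - Hf) == 0 and sp.simplify(rhs2 - Hf) == 0
print("the factorizations (3.247) and (3.248) hold identically")
```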

If we put l = n - 1 in (3.247), with n being a fixed given integer larger than l, we obtain

\[
H^{(n-1)} = b_n b_n^{\dagger} + \varepsilon^{(n-1)}. \tag{3.249}
\]

We evaluate the following inner product of both sides of (3.249):

\[
\left\langle\chi\middle|H^{(n-1)}\middle|\chi\right\rangle = \left\langle\chi\middle|b_n b_n^{\dagger}\middle|\chi\right\rangle + \varepsilon^{(n-1)}\langle\chi|\chi\rangle
= \left\langle b_n^{\dagger}\chi\middle|b_n^{\dagger}\chi\right\rangle + \varepsilon^{(n-1)}\langle\chi|\chi\rangle
= \left\|\,b_n^{\dagger}|\chi\rangle\right\|^2 + \varepsilon^{(n-1)}\langle\chi|\chi\rangle \ge \varepsilon^{(n-1)}. \tag{3.250}
\]

Here we assume that χ is normalized. On the basis of the variational principle [9], the above expectation value must take the minimum ε^{(n-1)} in order for χ to be an eigenfunction. To satisfy this condition, we must have

\[
\left|b_n^{\dagger}\chi\right\rangle = 0. \tag{3.251}
\]

In fact, if (3.251) holds, from (3.249) we have

\[
H^{(n-1)}\chi = \varepsilon^{(n-1)}\chi. \tag{3.252}
\]

We define such a function as

\[
\psi_{n-1}^{(n)} \equiv \chi. \tag{3.253}
\]

From (3.247) and (3.248), we have the following relationship:

\[
H^{(l)}b_{l+1} = b_{l+1}H^{(l+1)} \quad (l \ge 0). \tag{3.254}
\]

Meanwhile, we define the functions

\[
\psi_{n-s}^{(n)} \equiv b_{n-s+1}b_{n-s+2}\cdots b_{n-1}\psi_{n-1}^{(n)} \quad (2 \le s \le n). \tag{3.255}
\]

With these functions, (s - 1) operators have been operated on ψ_{n-1}^{(n)}. Note that if s took 1, no operation of b_l would take place. Thus, we find that b_l acts upon the l-state to produce the (l - 1)-state; that is, b_l acts as an annihilation operator. For the sake of convenience we write

\[
H^{(n,s)} \equiv H^{(n-s)}. \tag{3.256}
\]

Using this notation and (3.254), we have

\[
\begin{aligned}
H^{(n,s)}\psi_{n-s}^{(n)} &= H^{(n,s)}b_{n-s+1}b_{n-s+2}\cdots b_{n-1}\psi_{n-1}^{(n)} \\
&= b_{n-s+1}H^{(n,s-1)}b_{n-s+2}\cdots b_{n-1}\psi_{n-1}^{(n)} \\
&= b_{n-s+1}b_{n-s+2}H^{(n,s-2)}\cdots b_{n-1}\psi_{n-1}^{(n)} \\
&= \cdots \\
&= b_{n-s+1}b_{n-s+2}\cdots H^{(n,2)}b_{n-1}\psi_{n-1}^{(n)} \\
&= b_{n-s+1}b_{n-s+2}\cdots b_{n-1}H^{(n,1)}\psi_{n-1}^{(n)} \\
&= b_{n-s+1}b_{n-s+2}\cdots b_{n-1}\varepsilon^{(n-1)}\psi_{n-1}^{(n)} \\
&= \varepsilon^{(n-1)}b_{n-s+1}b_{n-s+2}\cdots b_{n-1}\psi_{n-1}^{(n)} = \varepsilon^{(n-1)}\psi_{n-s}^{(n)}.
\end{aligned} \tag{3.257}
\]

Thus, in total, the n functions ψ_{n-s}^{(n)} (1 ≤ s ≤ n) belong to the same eigenvalue ε^{(n-1)}. Notice that the eigenenergy E_n corresponding to ε^{(n-1)} is given by

\[
E_n = -\frac{\hbar^2}{2\mu}\left(\frac{Z}{a}\right)^2\frac{1}{n^2}. \tag{3.258}
\]

If we define l ≡ n - s and take account of (3.252), the n functions ψ_l^{(n)} (l = 0, 1, 2, ···, n - 1) belong to the same eigenvalue ε^{(n-1)}.
The quantum state ψ_l^{(n)} is associated with the operator H^{(l)}. Thus, the solutions of (3.242) are given by the functions ψ_l^{(n)} parametrized with n and l, on condition that (3.251) holds. As explicitly indicated in (3.255) and (3.257), b_l lowers the parameter l by one, from l to l - 1, when it operates on ψ_l^{(n)}. The operator b₀ cannot be defined, as indicated in (3.243), and so the lowest value of l is zero. Operators such as b_l are known as ladder operators (lowering operators or, in the present case, annihilation operators). The implication is that successive operations of b_l on ψ_{n-1}^{(n)} produce various parameters l as a subscript, down to zero, while retaining the same integer parameter n as a superscript.

3.7.2 Normalization of Radial Wave Functions

Next we seek normalized eigenfunctions. The coordinate representation of (3.251) reads

\[
-\frac{d\psi_{n-1}^{(n)}}{d\rho} + \left(\frac{n}{\rho} - \frac{1}{n}\right)\psi_{n-1}^{(n)} = 0. \tag{3.259}
\]

The solution is obtained as

\[
\psi_{n-1}^{(n)} = c_n\,\rho^n e^{-\rho/n}, \tag{3.260}
\]

where c_n is a normalization constant. This can be determined as follows:

\[
\int_0^{\infty}\left|\psi_{n-1}^{(n)}\right|^2 d\rho = 1. \tag{3.261}
\]

Namely,

\[
|c_n|^2\int_0^{\infty}\rho^{2n}e^{-2\rho/n}\,d\rho = 1. \tag{3.262}
\]

Consider the following definite integral:

\[
\int_0^{\infty}e^{-2\rho\xi}\,d\rho = \frac{1}{2\xi}.
\]

Differentiating the above integral 2n times with respect to ξ gives

\[
\int_0^{\infty}\rho^{2n}e^{-2\rho\xi}\,d\rho = \left(\frac{1}{2}\right)^{2n+1}(2n)!\,\xi^{-(2n+1)}. \tag{3.263}
\]

Substituting 1/n for ξ, we obtain

\[
\int_0^{\infty}\rho^{2n}e^{-2\rho/n}\,d\rho = \left(\frac{1}{2}\right)^{2n+1}(2n)!\,n^{2n+1}. \tag{3.264}
\]

Hence,

\[
c_n = \left(\frac{2}{n}\right)^{n+\frac{1}{2}}\Big/\sqrt{(2n)!}. \tag{3.265}
\]
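The integral (3.264), which fixes the constant (3.265), is readily checked numerically; a minimal sketch assuming SciPy:

```python
# Sketch (assumes scipy): verify the integral (3.264),
# int_0^inf rho^(2n) exp(-2 rho/n) d rho = (1/2)^(2n+1) (2n)! n^(2n+1),
# which fixes the normalization constant c_n of (3.265).
from math import factorial, isclose
import numpy as np
from scipy.integrate import quad

for n in range(1, 6):
    integral, _ = quad(lambda r, n=n: r**(2*n) * np.exp(-2*r/n), 0, np.inf)
    assert isclose(integral, 0.5**(2*n + 1) * factorial(2*n) * n**(2*n + 1),
                   rel_tol=1e-8)
print("(3.264) verified for n = 1, ..., 5")
```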

To further normalize the other wave functions, we calculate the following inner product:

\[
\left\langle\psi_l^{(n)}\middle|\psi_l^{(n)}\right\rangle = \left\langle\psi_{n-1}^{(n)}\middle|b_{n-1}^{\dagger}\cdots b_{l+2}^{\dagger}b_{l+1}^{\dagger}b_{l+1}b_{l+2}\cdots b_{n-1}\middle|\psi_{n-1}^{(n)}\right\rangle. \tag{3.266}
\]

From (3.247) and (3.248), we have

\[
b_l^{\dagger}b_l + \varepsilon^{(l-1)} = b_{l+1}b_{l+1}^{\dagger} + \varepsilon^{(l)} \quad (l \ge 1). \tag{3.267}
\]

Applying (3.267) to (3.266) repeatedly and considering (3.251), we reach the following relationship:

\[
\left\langle\psi_l^{(n)}\middle|\psi_l^{(n)}\right\rangle = \left[\varepsilon^{(n-1)}-\varepsilon^{(n-2)}\right]\left[\varepsilon^{(n-1)}-\varepsilon^{(n-3)}\right]\cdots\left[\varepsilon^{(n-1)}-\varepsilon^{(l)}\right]\left\langle\psi_{n-1}^{(n)}\middle|\psi_{n-1}^{(n)}\right\rangle. \tag{3.268}
\]

To show this, we use mathematical induction. We have already normalized ψ_{n-1}^{(n)} in (3.261). Next, we calculate ⟨ψ_{n-2}^{(n)}|ψ_{n-2}^{(n)}⟩ such that

\[
\begin{aligned}
\left\langle\psi_{n-2}^{(n)}\middle|\psi_{n-2}^{(n)}\right\rangle &= \left\langle b_{n-1}\psi_{n-1}^{(n)}\middle|b_{n-1}\psi_{n-1}^{(n)}\right\rangle = \left\langle\psi_{n-1}^{(n)}\middle|b_{n-1}^{\dagger}b_{n-1}\middle|\psi_{n-1}^{(n)}\right\rangle \\
&= \left\langle\psi_{n-1}^{(n)}\middle|b_n b_n^{\dagger}+\varepsilon^{(n-1)}-\varepsilon^{(n-2)}\middle|\psi_{n-1}^{(n)}\right\rangle \\
&= \left\langle\psi_{n-1}^{(n)}\middle|b_n b_n^{\dagger}\middle|\psi_{n-1}^{(n)}\right\rangle + \left[\varepsilon^{(n-1)}-\varepsilon^{(n-2)}\right]\left\langle\psi_{n-1}^{(n)}\middle|\psi_{n-1}^{(n)}\right\rangle \\
&= \left[\varepsilon^{(n-1)}-\varepsilon^{(n-2)}\right]\left\langle\psi_{n-1}^{(n)}\middle|\psi_{n-1}^{(n)}\right\rangle.
\end{aligned} \tag{3.269}
\]

With the third equality, we used (3.267) with l = n - 1; with the last equality we used (3.251). Therefore, (3.268) holds with l = n - 2. Then it suffices to show that, assuming (3.268) holds for ⟨ψ_{l+1}^{(n)}|ψ_{l+1}^{(n)}⟩, it holds for ⟨ψ_l^{(n)}|ψ_l^{(n)}⟩ as well.
Let us calculate ⟨ψ_l^{(n)}|ψ_l^{(n)}⟩, starting with (3.266):

\[
\begin{aligned}
\left\langle\psi_l^{(n)}\middle|\psi_l^{(n)}\right\rangle &= \left\langle\psi_{n-1}^{(n)}\middle|b_{n-1}^{\dagger}\cdots b_{l+2}^{\dagger}b_{l+1}^{\dagger}b_{l+1}b_{l+2}\cdots b_{n-1}\middle|\psi_{n-1}^{(n)}\right\rangle \\
&= \left\langle\psi_{n-1}^{(n)}\middle|b_{n-1}^{\dagger}\cdots b_{l+2}^{\dagger}\left[b_{l+2}b_{l+2}^{\dagger}+\varepsilon^{(l+1)}-\varepsilon^{(l)}\right]b_{l+2}\cdots b_{n-1}\middle|\psi_{n-1}^{(n)}\right\rangle \\
&= \left\langle\psi_{n-1}^{(n)}\middle|b_{n-1}^{\dagger}\cdots b_{l+2}^{\dagger}b_{l+2}b_{l+2}^{\dagger}b_{l+2}\cdots b_{n-1}\middle|\psi_{n-1}^{(n)}\right\rangle \\
&\quad + \left[\varepsilon^{(l+1)}-\varepsilon^{(l)}\right]\left\langle\psi_{n-1}^{(n)}\middle|b_{n-1}^{\dagger}\cdots b_{l+2}^{\dagger}b_{l+2}\cdots b_{n-1}\middle|\psi_{n-1}^{(n)}\right\rangle \\
&= \left\langle\psi_{n-1}^{(n)}\middle|b_{n-1}^{\dagger}\cdots b_{l+2}^{\dagger}b_{l+2}b_{l+2}^{\dagger}b_{l+2}\cdots b_{n-1}\middle|\psi_{n-1}^{(n)}\right\rangle + \left[\varepsilon^{(l+1)}-\varepsilon^{(l)}\right]\left\langle\psi_{l+1}^{(n)}\middle|\psi_{l+1}^{(n)}\right\rangle.
\end{aligned} \tag{3.270}
\]

In the next step, using b_{l+2}^† b_{l+2} = b_{l+3}b_{l+3}^† + ε^{(l+2)} - ε^{(l+1)}, we have

\[
\begin{aligned}
\left\langle\psi_l^{(n)}\middle|\psi_l^{(n)}\right\rangle &= \left\langle\psi_{n-1}^{(n)}\middle|b_{n-1}^{\dagger}\cdots b_{l+2}^{\dagger}b_{l+2}b_{l+3}b_{l+3}^{\dagger}b_{l+3}\cdots b_{n-1}\middle|\psi_{n-1}^{(n)}\right\rangle \\
&\quad + \left[\varepsilon^{(l+1)}-\varepsilon^{(l)}\right]\left\langle\psi_{l+1}^{(n)}\middle|\psi_{l+1}^{(n)}\right\rangle + \left[\varepsilon^{(l+2)}-\varepsilon^{(l+1)}\right]\left\langle\psi_{l+1}^{(n)}\middle|\psi_{l+1}^{(n)}\right\rangle.
\end{aligned} \tag{3.271}
\]

Thus, we find that in the first term the index number of b_{l+3}^† has been increased by one, with the operator itself transferred toward the right side. On the other hand, we notice that between the second and third terms, ε^{(l+1)}⟨ψ_{l+1}^{(n)}|ψ_{l+1}^{(n)}⟩ cancels out. Repeating the above processes, we reach the following expression:

\[
\begin{aligned}
\left\langle\psi_l^{(n)}\middle|\psi_l^{(n)}\right\rangle &= \left\langle\psi_{n-1}^{(n)}\middle|b_{n-1}^{\dagger}\cdots b_{l+2}^{\dagger}b_{l+2}\cdots b_{n-1}b_n b_n^{\dagger}\middle|\psi_{n-1}^{(n)}\right\rangle \\
&\quad + \left[\varepsilon^{(n-1)}-\varepsilon^{(n-2)}\right]\left\langle\psi_{l+1}^{(n)}\middle|\psi_{l+1}^{(n)}\right\rangle + \left[\varepsilon^{(n-2)}-\varepsilon^{(n-3)}\right]\left\langle\psi_{l+1}^{(n)}\middle|\psi_{l+1}^{(n)}\right\rangle + \cdots \\
&\quad + \left[\varepsilon^{(l+2)}-\varepsilon^{(l+1)}\right]\left\langle\psi_{l+1}^{(n)}\middle|\psi_{l+1}^{(n)}\right\rangle + \left[\varepsilon^{(l+1)}-\varepsilon^{(l)}\right]\left\langle\psi_{l+1}^{(n)}\middle|\psi_{l+1}^{(n)}\right\rangle \\
&= \left[\varepsilon^{(n-1)}-\varepsilon^{(l)}\right]\left\langle\psi_{l+1}^{(n)}\middle|\psi_{l+1}^{(n)}\right\rangle.
\end{aligned} \tag{3.272}
\]

In (3.272), the first term of the RHS vanishes because of (3.251); the subsequent terms produce ⟨ψ_{l+1}^{(n)}|ψ_{l+1}^{(n)}⟩, whose coefficients cancel one another out except for [ε^{(n-1)} - ε^{(l)}]. Meanwhile, from the assumption of the mathematical induction we have

\[
\left\langle\psi_{l+1}^{(n)}\middle|\psi_{l+1}^{(n)}\right\rangle = \left[\varepsilon^{(n-1)}-\varepsilon^{(n-2)}\right]\left[\varepsilon^{(n-1)}-\varepsilon^{(n-3)}\right]\cdots\left[\varepsilon^{(n-1)}-\varepsilon^{(l+1)}\right]\left\langle\psi_{n-1}^{(n)}\middle|\psi_{n-1}^{(n)}\right\rangle.
\]

Inserting this equation into (3.272), we arrive at (3.268). In other words, we have shown that if (3.268) holds for l + 1, it holds for l as well. This completes the proof that (3.268) is true of ⟨ψ_l^{(n)}|ψ_l^{(n)}⟩ with l down to 0.
The normalized wave functions ψ̃_l^{(n)} are expressed from (3.255) as

\[
\widetilde{\psi}_l^{(n)} = \kappa(n,l)^{-\frac{1}{2}}\,b_{l+1}b_{l+2}\cdots b_{n-1}\widetilde{\psi}_{n-1}^{(n)}, \tag{3.273}
\]

where κ(n, l) is defined such that

\[
\kappa(n,l) \equiv \left[\varepsilon^{(n-1)}-\varepsilon^{(n-2)}\right]\cdot\left[\varepsilon^{(n-1)}-\varepsilon^{(n-3)}\right]\cdot\cdots\cdot\left[\varepsilon^{(n-1)}-\varepsilon^{(l)}\right], \tag{3.274}
\]

with l ≤ n - 2. More explicitly, we get

\[
\kappa(n,l) = \frac{(2n-1)!\,(n-l-1)!\,(l!)^2}{(n+l)!\,(n!)^2\,\left(n^{n-l-2}\right)^2}. \tag{3.275}
\]

In particular, from (3.265) we have

\[
\widetilde{\psi}_{n-1}^{(n)} = \left(\frac{2}{n}\right)^{n+\frac{1}{2}}\frac{1}{\sqrt{(2n)!}}\,\rho^n e^{-\rho/n}. \tag{3.276}
\]

From (3.272), we define the following operator:

\[
\widetilde{b}_l \equiv \left[\varepsilon^{(n-1)}-\varepsilon^{(l-1)}\right]^{-\frac{1}{2}}b_l. \tag{3.277}
\]

Then (3.273) becomes

\[
\widetilde{\psi}_l^{(n)} = \widetilde{b}_{l+1}\widetilde{b}_{l+2}\cdots\widetilde{b}_{n-1}\widetilde{\psi}_{n-1}^{(n)}. \tag{3.278}
\]

3.7.3 Associated Laguerre Polynomials

It will be of great importance to compare the functions ψ̃_l^{(n)} with the conventional wave functions that are expressed using associated Laguerre polynomials. For this purpose we define the following functions Φ_l^{(n)}(ρ):

\[
\Phi_l^{(n)}(\rho) \equiv \left(\frac{2}{n}\right)^{l+\frac{3}{2}}\sqrt{\frac{(n-l-1)!}{2n(n+l)!}}\,e^{-\frac{\rho}{n}}\rho^{l+1}L_{n-l-1}^{2l+1}\!\left(\frac{2\rho}{n}\right). \tag{3.279}
\]

The associated Laguerre polynomials are described as

\[
L_n^{\nu}(x) = \frac{1}{n!}x^{-\nu}e^x\frac{d^n}{dx^n}\left(x^{n+\nu}e^{-x}\right) \quad (\nu > -1). \tag{3.280}
\]

In the form of a power series expansion, the polynomials are expressed for integer k ≥ 0 as

\[
L_n^k(x) = \sum_{m=0}^{n}\frac{(-1)^m(n+k)!}{(n-m)!(k+m)!\,m!}x^m. \tag{3.281}
\]

Notice that the "Laguerre polynomials" L_n(x) are defined as

\[
L_n(x) \equiv L_n^0(x).
\]

Hence, instead of (3.280) and (3.281), the Rodrigues formula and power series expansion of L_n(x) are given by [2, 5]

\[
L_n(x) = \frac{1}{n!}e^x\frac{d^n}{dx^n}\left(x^n e^{-x}\right), \qquad L_n(x) = \sum_{m=0}^{n}\frac{(-1)^m\,n!}{(n-m)!\,(m!)^2}x^m.
\]
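The definitions (3.280) and (3.281) agree with the modern convention implemented in standard libraries; a quick sketch, assuming SciPy:

```python
# Sketch (assumes scipy): check the power-series form (3.281) of the
# associated Laguerre polynomials against scipy.special.eval_genlaguerre.
from math import factorial
import numpy as np
from scipy.special import eval_genlaguerre

def L_series(n, k, x):
    # (3.281)
    return sum((-1)**m * factorial(n + k)
               / (factorial(n - m) * factorial(k + m) * factorial(m)) * x**m
               for m in range(n + 1))

x = 0.8
for n in range(6):
    for k in range(4):
        assert np.isclose(L_series(n, k, x), eval_genlaguerre(n, k, x))
print("(3.281) matches scipy's generalized Laguerre polynomials")
```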

The function Φ_l^{(n)}(ρ) contains the multiplication factors e^{-ρ/n} and ρ^{l+1}. The function L_{n-l-1}^{2l+1}(2ρ/n) is a polynomial in ρ whose highest order is ρ^{n-l-1}. Therefore, Φ_l^{(n)}(ρ) consists of a sum of terms containing e^{-ρ/n}ρ^t, where t is an integer equal to 1 or larger. Consequently, Φ_l^{(n)}(ρ) → 0 both when ρ → 0 and when ρ → ∞ (vide supra). Thus, we have confirmed that Φ_l^{(n)}(ρ) certainly satisfies the proper boundary conditions mentioned earlier and, hence, the operator d/dρ is indeed anti-Hermitian. To show this more explicitly, we define D ≡ d/dρ. The inner product between arbitrarily chosen functions f and g is

\[
\langle f|Dg\rangle \equiv \int_0^{\infty}f^*Dg\,d\rho = \left[f^*g\right]_0^{\infty} - \int_0^{\infty}(Df)^*g\,d\rho = \left[f^*g\right]_0^{\infty} + \langle -Df|g\rangle, \tag{3.282}
\]

where f* is the complex conjugate of f. Meanwhile, from (1.112) we have

\[
\langle f|Dg\rangle = \left\langle D^{\dagger}f\middle|g\right\rangle. \tag{3.283}
\]

Therefore, if the functions f and g vanish at ρ → 0 and ρ → ∞, we get D† = -D by equating (3.282) and (3.283). This means that D is anti-Hermitian. The functions Φ_l^{(n)}(ρ) we are dealing with certainly satisfy the required boundary conditions. The operator H^{(l)} appearing in (3.247) and (3.248) is Hermitian accordingly. This is because

\[
b_l^{\dagger}b_l = (-A+H)(A+H) = H^2 - A^2 - AH + HA, \tag{3.284}
\]

\[
\begin{aligned}
\left(b_l^{\dagger}b_l\right)^{\dagger} &= \left[H^2 - A^2 - AH + HA\right]^{\dagger} = \left(H^{\dagger}\right)^2 - \left(A^{\dagger}\right)^2 - H^{\dagger}A^{\dagger} + A^{\dagger}H^{\dagger} \\
&= H^2 - (-A)^2 - H(-A) + (-A)H = H^2 - A^2 + HA - AH = b_l^{\dagger}b_l.
\end{aligned} \tag{3.285}
\]

The Hermiticity is true of b_l b_l^{\dagger} as well. Thus, the eigenvalue and the eigenstate (or wave function) which belongs to that eigenvalue are physically meaningful.
Next, consider the following operation:

\[
\widetilde{b}_l\Phi_l^{(n)}(\rho) = \left[\varepsilon^{(n-1)}-\varepsilon^{(l-1)}\right]^{-\frac{1}{2}}\left(\frac{2}{n}\right)^{l+\frac{3}{2}}\sqrt{\frac{(n-l-1)!}{2n(n+l)!}}\left(\frac{d}{d\rho}+\frac{l}{\rho}-\frac{1}{l}\right)\left\{e^{-\frac{\rho}{n}}\rho^{l+1}L_{n-l-1}^{2l+1}\!\left(\frac{2\rho}{n}\right)\right\}, \tag{3.286}
\]

where

\[
\left[\varepsilon^{(n-1)}-\varepsilon^{(l-1)}\right]^{-\frac{1}{2}} = \frac{nl}{\sqrt{(n+l)(n-l)}}. \tag{3.287}
\]

Rewriting L_{n-l-1}^{2l+1}(2ρ/n) using the Rodrigues form (3.280), and then in power-series form, we obtain

\[
\begin{aligned}
\widetilde{b}_l\Phi_l^{(n)}(\rho) &= \left(\frac{2}{n}\right)^{l+\frac{3}{2}}\frac{nl}{\sqrt{(n+l)(n-l)}}\frac{1}{\sqrt{2n(n+l)!(n-l-1)!}}\left(\frac{d}{d\rho}+\frac{l}{\rho}-\frac{1}{l}\right)\left[e^{\frac{\rho}{n}}\rho^{-l}\frac{d^{n-l-1}}{d\rho^{n-l-1}}\left(\rho^{n+l}e^{-\frac{2\rho}{n}}\right)\right] \\
&= \left(\frac{2}{n}\right)^{l+\frac{3}{2}}\frac{nl}{\sqrt{(n+l)(n-l)}}\frac{1}{\sqrt{2n(n+l)!(n-l-1)!}}\left(\frac{d}{d\rho}+\frac{l}{\rho}-\frac{1}{l}\right) \\
&\qquad\times\sum_{m=0}^{n-l-1}\frac{(-1)^m(n+l)!}{(2l+m+1)!}\left(\frac{2}{n}\right)^m\frac{(n-l-1)!}{m!\,(n-l-m-1)!}\,e^{-\frac{\rho}{n}}\rho^{l+m+1},
\end{aligned} \tag{3.288}
\]

where we used the well-known Leibniz rule for the higher-order differentiation of a product function, i.e., for d^{n-l-1}/dρ^{n-l-1}(ρ^{n+l}e^{-2ρ/n}). To perform the further calculation, notice that d/dρ does not change the functional form of e^{-ρ/n}, whereas it lowers the order of ρ^{l+m+1} by one. Meanwhile, the operation of l/ρ lowers the order of ρ^{l+m+1} by one as well. The factor (n + l)/2l appearing in the following equation (3.289) results from these calculation processes. Considering these characteristics of the operators, we get

\[
\begin{aligned}
\widetilde{b}_l\Phi_l^{(n)}(\rho) &= \left(\frac{2}{n}\right)^{l+\frac{3}{2}}\frac{n+l}{2l}\frac{nl}{\sqrt{(n+l)(n-l)}}\frac{1}{\sqrt{2n(n+l)!(n-l-1)!}}\,e^{-\rho/n}\rho^l \\
&\quad\times\Bigg\{\sum_{m=0}^{n-l-1}\frac{(-1)^{m+1}(n+l)!}{(2l+m+1)!}\frac{(n-l-1)!}{m!\,(n-l-m-1)!}\left(\frac{2}{n}\right)^{m+1}\rho^{m+1} \\
&\qquad\quad + 2l\sum_{m=0}^{n-l-1}\frac{(-1)^m(n+l-1)!}{(2l+m)!}\frac{(n-l-1)!}{m!\,(n-l-m-1)!}\left(\frac{2}{n}\right)^m\rho^m\Bigg\}.
\end{aligned} \tag{3.289}
\]

In (3.289), the calculation of the braces {···} on the RHS is somewhat complicated, so we describe the outline of the calculation procedure below:

\[
\begin{aligned}
\{\cdots\} &= \sum_{m=1}^{n-l}\frac{(-1)^m(n+l)!}{(2l+m)!}\frac{(n-l-1)!}{(m-1)!\,(n-l-m)!}\left(\frac{2\rho}{n}\right)^m \\
&\quad + 2l\sum_{m=0}^{n-l-1}\frac{(-1)^m(n+l-1)!}{(2l+m)!}\frac{(n-l-1)!}{m!\,(n-l-m-1)!}\left(\frac{2\rho}{n}\right)^m \\
&= \sum_{m=1}^{n-l-1}\frac{(-1)^m(n-l-1)!(n+l-1)!}{(2l+m)!}\left(\frac{2\rho}{n}\right)^m\left[\frac{n+l}{(m-1)!(n-l-m)!}+\frac{2l}{m!\,(n-l-m-1)!}\right] \\
&\quad + (-1)^{n-l}\left(\frac{2\rho}{n}\right)^{n-l} + \frac{(n+l-1)!}{(2l-1)!} \\
&= \sum_{m=1}^{n-l-1}\frac{(-1)^m(n-l)!\,(n+l-1)!}{(2l+m-1)!\,(n-l-m)!\,m!}\left(\frac{2\rho}{n}\right)^m + (-1)^{n-l}\left(\frac{2\rho}{n}\right)^{n-l} + \frac{(n+l-1)!}{(2l-1)!} \\
&= (n-l)!\left[\sum_{m=1}^{n-l-1}\frac{(-1)^m(n+l-1)!}{(2l+m-1)!\,(n-l-m)!\,m!}\left(\frac{2\rho}{n}\right)^m + \frac{(-1)^{n-l}}{(n-l)!}\left(\frac{2\rho}{n}\right)^{n-l} + \frac{(n+l-1)!}{(n-l)!\,(2l-1)!}\right] \\
&= (n-l)!\sum_{m=0}^{n-l}\frac{(-1)^m(n+l-1)!}{(2l+m-1)!\,(n-l-m)!\,m!}\left(\frac{2\rho}{n}\right)^m = (n-l)!\,L_{n-l}^{2l-1}\!\left(\frac{2\rho}{n}\right). \tag{3.290}
\end{aligned}
\]

Notice that in the second equality of (3.290), the summation is divided into three parts: 1 ≤ m ≤ n - l - 1, m = n - l (the highest-order term), and m = 0 (the lowest-order term). Note that in the second-to-last equality of (3.290), the highest-order term (m = n - l) and the lowest-order term (a constant) have been absorbed into a single expression, namely an associated Laguerre polynomial. Correspondingly, in the last equality of (3.290) the summation range is extended to 0 ≤ m ≤ n - l.

Summarizing the above results, we get
\[
\begin{aligned}
\widetilde{b}_l\Phi_l^{(n)}(\rho) &= \left(\frac{2}{n}\right)^{l+\frac{3}{2}}\frac{n+l}{2l}\frac{nl\,(n-l)!}{\sqrt{(n+l)(n-l)}}\frac{1}{\sqrt{2n(n+l)!(n-l-1)!}}\,e^{-\rho/n}\rho^l\,L_{n-l}^{2l-1}\!\left(\frac{2\rho}{n}\right) \\
&= \left(\frac{2}{n}\right)^{l+\frac{1}{2}}\sqrt{\frac{(n-l)!}{2n(n+l-1)!}}\,e^{-\rho/n}\rho^l\,L_{n-l}^{2l-1}\!\left(\frac{2\rho}{n}\right) \\
&= \left(\frac{2}{n}\right)^{(l-1)+\frac{3}{2}}\sqrt{\frac{[n-(l-1)-1]!}{2n[n+(l-1)]!}}\,e^{-\frac{\rho}{n}}\rho^{(l-1)+1}\,L_{n-(l-1)-1}^{2(l-1)+1}\!\left(\frac{2\rho}{n}\right) \equiv \Phi_{l-1}^{(n)}(\rho).
\end{aligned} \tag{3.291}
\]

Thus, we find that Φ_l^{(n)}(ρ) behaves exactly like ψ̃_l^{(n)}. Moreover, if we replace l in (3.279) with n - 1, we find

\[
\Phi_{n-1}^{(n)}(\rho) = \widetilde{\psi}_{n-1}^{(n)}. \tag{3.292}
\]

Operating with b̃_{n-1} on both sides of (3.292),

\[
\Phi_{n-2}^{(n)}(\rho) = \widetilde{\psi}_{n-2}^{(n)}. \tag{3.293}
\]

Likewise, successively operating with b̃_l (1 ≤ l ≤ n - 1),

\[
\Phi_l^{(n)}(\rho) = \widetilde{\psi}_l^{(n)}(\rho), \tag{3.294}
\]

with all allowed numbers of l (i.e., 0 ≤ l ≤ n - 1). This permits us to identify

\[
\Phi_l^{(n)}(\rho) \equiv \widetilde{\psi}_l^{(n)}(\rho). \tag{3.295}
\]

Consequently, it is clear that the parameter n introduced in (3.249) is identical to the principal quantum number and that the parameter l (0 ≤ l ≤ n - 1) is the orbital angular momentum quantum number (or azimuthal quantum number). The functions Φ_l^{(n)}(ρ) and ψ̃_l^{(n)}(ρ) are identical up to the constant c_n expressed in (3.265). Note, however, that a complex constant with an absolute value of 1 (a phase factor) remains undetermined, as is always the case with an eigenvalue problem.

The radial wave functions are derived from the following relationship, as described earlier:

\[
R_l^{(n)}(r) = \widetilde{\psi}_l^{(n)}/\rho. \tag{3.296}
\]

To normalize R_l^{(n)}(r), we have to calculate the following integral:
\[
\int_0^{\infty}\left|R_l^{(n)}(r)\right|^2 r^2\,dr = \int_0^{\infty}\left|\widetilde{\psi}_l^{(n)}\right|^2\frac{1}{\rho^2}\,\rho^2\left(\frac{a}{Z}\right)^2\frac{a}{Z}\,d\rho = \left(\frac{a}{Z}\right)^3\int_0^{\infty}\left|\widetilde{\psi}_l^{(n)}\right|^2 d\rho = \left(\frac{a}{Z}\right)^3. \tag{3.297}
\]

Accordingly, we choose the following functions R̃_l^{(n)}(r) for the normalized radial wave functions:

\[
\widetilde{R}_l^{(n)}(r) = \sqrt{(Z/a)^3}\,R_l^{(n)}(r). \tag{3.298}
\]

Substituting (3.296) into (3.298) and taking account of (3.279) and (3.280), we obtain

\[
\widetilde{R}_l^{(n)}(r) = \sqrt{\left(\frac{2Z}{an}\right)^3\frac{(n-l-1)!}{2n(n+l)!}}\left(\frac{2Zr}{an}\right)^l\exp\!\left(-\frac{Zr}{an}\right)L_{n-l-1}^{2l+1}\!\left(\frac{2Zr}{an}\right). \tag{3.299}
\]

Equation (3.299) is exactly the same as the normalized radial wave function obtained as the solution of (3.241) through the power series expansion. All these functions belong to the same eigenenergy E_n = -(ħ²/2μ)(Z/a)²(1/n²); see (3.258).
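The normalization of (3.299) and the n² degeneracy count discussed below can both be confirmed numerically; a sketch, assuming SciPy and setting Z = a = 1 for brevity:

```python
# Sketch (assumes scipy): verify that the radial functions (3.299) satisfy
# int_0^inf |R_l^(n)(r)|^2 r^2 dr = 1, here with Z = a = 1 for brevity.
from math import factorial, sqrt
import numpy as np
from scipy.integrate import quad
from scipy.special import eval_genlaguerre

def R(n, l, r):
    # (3.299) with Z = a = 1; x = 2r/n
    pref = sqrt((2/n)**3 * factorial(n - l - 1) / (2*n*factorial(n + l)))
    x = 2*r/n
    return pref * x**l * np.exp(-x/2) * eval_genlaguerre(n - l - 1, 2*l + 1, x)

for n in range(1, 5):
    for l in range(n):
        norm, _ = quad(lambda r: R(n, l, r)**2 * r**2, 0, np.inf)
        assert np.isclose(norm, 1.0)
print("(3.299) is normalized;",
      [sum(2*l + 1 for l in range(n)) for n in range(1, 5)], "= n^2 degeneracies")
```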
Returning to the quantum states sharing the same eigenenergy, we have n states that belong to the principal quantum number n, with the azimuthal quantum number varying as l = 0, 1, ···, n - 1. Meanwhile, from (3.158), (3.159), and (3.171), 2l + 1 quantum states possess the eigenvalue l(l + 1) of the quantity M². As already mentioned in Sects. 3.5 and 3.6, the integer m takes the 2l + 1 different values

\[
m = l,\ l-1,\ l-2,\ \cdots,\ 1,\ 0,\ -1,\ \cdots,\ -l+1,\ -l.
\]

The quantum states of hydrogen-like atoms are thus characterized by a set of integers (m, l, n). On the basis of the above discussion, the number of states sharing the same energy (i.e., the same principal quantum number n) is

\[
\sum_{l=0}^{n-1}(2l+1) = n^2.
\]

Regarding this situation, we say that the quantum states belonging to the principal quantum number n, or the energy eigenvalue E_n = -(ħ²/2μ)(Z/a)²(1/n²), are n²-fold degenerate. Note that we have ignored the freedom of the spin state.
In summary of this section, we have developed the operator formalism for dealing with the radial wave functions of hydrogen-like atoms and have seen how the operator formalism features these functions. The essential point rests upon the fact that the radial wave functions can be derived by successively operating with the lowering operators b̃_l on ψ̃_{n-1}^{(n)}, which is parametrized with a principal quantum number n and an orbital angular momentum quantum number l = n - 1. This is clearly represented by (3.278). The results agree with the conventional coordinate representation method based upon the power series expansion that leads to the associated Laguerre polynomials. Thus, the operator formalism is again found to be powerful in explicitly representing the mathematical constitution of quantum-mechanical systems.

3.8 Total Wave Functions

Since we have obtained the angular wave functions and the radial wave functions, we can describe the normalized total wave functions Λ̃_{l,m}^{(n)} of hydrogen-like atoms as a product of the angular part and the radial part such that

\[
\widetilde{\Lambda}_{l,m}^{(n)} = Y_l^m(\theta,\phi)\,\widetilde{R}_l^{(n)}(r). \tag{3.300}
\]

Let us seek several tangible functional forms for hydrogen (Z = 1), including the angular and radial parts. For example, we have

\[
\phi(1s) \equiv Y_0^0(\theta,\phi)\widetilde{R}_0^{(1)}(r) = \sqrt{\frac{1}{4\pi}}\,a^{-3/2}\,\frac{\widetilde{\psi}_{n-1}^{(n)}}{\rho}\bigg|_{n=1} = \sqrt{\frac{1}{\pi}}\,a^{-3/2}e^{-r/a}, \tag{3.301}
\]

where we used (3.276) and (3.295). For φ(2s), using (3.277) and (3.278) we have

\[
\phi(2s) \equiv Y_0^0(\theta,\phi)\widetilde{R}_0^{(2)}(r) = \frac{1}{4\sqrt{2\pi}}\,a^{-\frac{3}{2}}e^{-\frac{r}{2a}}\left(2-\frac{r}{a}\right). \tag{3.302}
\]

For φ(2p_z), in turn, we express it as

\[
\begin{aligned}
\phi(2p_z) \equiv Y_1^0(\theta,\phi)\widetilde{R}_1^{(2)}(r) &= \sqrt{\frac{3}{4\pi}}(\cos\theta)\frac{1}{2\sqrt{6}}\,a^{-\frac{3}{2}}\frac{r}{a}e^{-\frac{r}{2a}} \\
&= \frac{1}{4\sqrt{2\pi}}\,a^{-\frac{3}{2}}\frac{r}{a}e^{-\frac{r}{2a}}\cos\theta = \frac{1}{4\sqrt{2\pi}}\,a^{-\frac{3}{2}}\frac{r}{a}e^{-\frac{r}{2a}}\frac{z}{r} = \frac{1}{4\sqrt{2\pi}}\,a^{-\frac{5}{2}}e^{-\frac{r}{2a}}z.
\end{aligned} \tag{3.303}
\]

For φ(2p_{x+iy}), using (3.217) we get

\[
\begin{aligned}
\phi(2p_{x+iy}) \equiv Y_1^1(\theta,\phi)\widetilde{R}_1^{(2)}(r) &= -\frac{1}{8\sqrt{\pi}}\,a^{-\frac{3}{2}}\frac{r}{a}e^{-\frac{r}{2a}}\sin\theta\,e^{i\phi} \\
&= -\frac{1}{8\sqrt{\pi}}\,a^{-\frac{3}{2}}\frac{r}{a}e^{-\frac{r}{2a}}\frac{x+iy}{r} = -\frac{1}{8\sqrt{\pi}}\,a^{-\frac{5}{2}}e^{-\frac{r}{2a}}(x+iy). \tag{3.304}
\end{aligned}
\]

In (3.304), the minus sign comes from the Condon–Shortley phase. Furthermore, we have

\[
\begin{aligned}
\phi(2p_{x-iy}) \equiv Y_1^{-1}(\theta,\phi)\widetilde{R}_1^{(2)}(r) &= \frac{1}{8\sqrt{\pi}}\,a^{-\frac{3}{2}}\frac{r}{a}e^{-\frac{r}{2a}}\sin\theta\,e^{-i\phi} \\
&= \frac{1}{8\sqrt{\pi}}\,a^{-\frac{3}{2}}\frac{r}{a}e^{-\frac{r}{2a}}\frac{x-iy}{r} = \frac{1}{8\sqrt{\pi}}\,a^{-\frac{5}{2}}e^{-\frac{r}{2a}}(x-iy). \tag{3.305}
\end{aligned}
\]

Notice that the above notations φ(2p_{x+iy}) and φ(2p_{x-iy}) differ from the custom that uses, e.g., φ(2p_x) and φ(2p_y). We will come back to this point in Sect. 4.3.

References

1. Schiff LI (1955) Quantum mechanics, 2nd edn. McGraw-Hill, New York
2. Arfken GB (1970) Mathematical methods for physicists, 2nd edn. Academic Press, Waltham
3. Sunakawa S (1991) Quantum mechanics. Iwanami, Tokyo (in Japanese)
4. Byron FW Jr, Fuller RW (1992) Mathematics of classical and quantum physics. Dover, New York
5. Dennery P, Krzywicki A (1996) Mathematics for physicists. Dover, New York
6. Riley KF, Hobson MP, Bence SJ (2006) Mathematical methods for physics and engineering, 3rd edn. Cambridge University Press, Cambridge
7. Arfken GB, Weber HJ, Harris FE (2013) Mathematical methods for physicists, 7th edn. Academic Press, Waltham
8. Lebedev NN (1972) Special functions and their applications. Dover, New York
9. Stakgold I (1998) Green's functions and boundary value problems, 2nd edn. Wiley, New York
Chapter 4
Optical Transition and Selection Rules

In Sect. 1.2 we showed the Schrödinger equation as a function of the space coordinates and time. In subsequent sections, we dealt with the time-independent eigenvalue problems of a harmonic oscillator and a hydrogen-like atom. This implies that the physical system is isolated from the outside world and that there is no interaction between the outside world and the physical system we are considering. By virtue of such an interaction, however, the system may acquire or lose energy, momentum, angular momentum, etc. As a consequence of the interaction, the system changes its quantum state as well. Such a change is said to be a transition. If the interaction takes place as an optical process, we are to deal with an optical transition. Of the various optical transitions, the electric dipole transition is common and the most important. In this chapter, we study the optical transitions of a particle confined in a potential well, of a harmonic oscillator, and of a hydrogen atom using a semiclassical approach. The question of whether a transition is allowed or forbidden is of great importance. We have selection rules to judge this.

4.1 Electric Dipole Transition

We have a time-dependent Schrödinger equation described as

\[
H\psi = i\hbar\frac{\partial\psi}{\partial t}. \tag{1.47}
\]

Using the method of separation of variables, we obtained the two equations expressed below:

\[
H\phi(x) = E\phi(x), \tag{1.55}
\]


\[
i\hbar\frac{\partial\xi(t)}{\partial t} = E\xi(t). \tag{1.56}
\]

Equation (1.55) is an eigenvalue equation of energy, and (1.56) is an equation in time. So far we have focused our attention upon (1.55), taking a one-dimensional harmonic oscillator and hydrogen-like atoms as examples. In this chapter we deal with the time-evolved Schrödinger equation and its relevance to optical transitions. Optical transitions take place according to selection rules; we mention their significance as well.

We showed that after solving the eigenvalue equation, the Schrödinger equation is expressed as

\[
\psi(x,t) = \phi(x)\exp(-iEt/\hbar). \tag{1.60}
\]

The probability density of the system (i.e., normally a particle such as an electron or a harmonic oscillator) residing at a certain place x at a certain time t is expressed as

\[
\psi^*(x,t)\,\psi(x,t).
\]

If the Schrödinger equation is described in the form of separated variables as in the case of (1.60), the exponential factors including t cancel out and we have

\[
\psi^*(x,t)\psi(x,t) = \phi^*(x)\phi(x). \tag{4.1}
\]

This means that the probability density of the system depends only on the spatial coordinate and is constant in time. Such a state is said to be a stationary state. That is, the system continues residing in the quantum state described by φ(x) and remains unchanged independent of time.

Next, we consider a linear combination of functions described by (1.60). That is, we have

\[
\psi(x,t) = c_1\phi_1(x)\exp(-iE_1 t/\hbar) + c_2\phi_2(x)\exp(-iE_2 t/\hbar), \tag{4.2}
\]

where the first term pertains to state 1 and the second term to state 2; c₁ and c₂ are complex constants with respect to the spatial coordinates but may be weakly time-dependent. The state described by (4.2) is called a coherent state. The probability distribution of that state is described as

\[
\psi^*(x,t)\psi(x,t) = |c_1|^2|\phi_1|^2 + |c_2|^2|\phi_2|^2 + c_1^*c_2\,\phi_1^*\phi_2\,e^{-i\omega t} + c_2^*c_1\,\phi_2^*\phi_1\,e^{i\omega t}, \tag{4.3}
\]

where ω is expressed as

\[
\omega = (E_2 - E_1)/\hbar. \tag{4.4}
\]

This equation shows that the probability density of the system undergoes a sinusoidal oscillation with time. The angular frequency equals the energy difference between the two states divided by the reduced Planck constant. If the system is a charged particle such as an electron or proton, the sinusoidal oscillation is accompanied by an oscillating electromagnetic field. Thus, the coherent state is associated with the optical transition from one state to another when the transition involves the charged particle.
Optical transitions result from various causes. Of these, the electric dipole transition yields the largest transition probability, and the dipole approximation is often chosen to represent the transition probability. From the point of view of optical measurements, the electric dipole transition gives the strongest absorption or emission spectral lines. The matrix element of the electric dipole, more specifically the square of the absolute value of the matrix element, is a measure of the optical transition probability. Labelling the quantum states as a, b, etc. and describing the corresponding state vectors as |a⟩, |b⟩, etc., the matrix element P_{ba} is given by

\[
P_{ba} \equiv \langle b|\boldsymbol{\varepsilon}_e\cdot\boldsymbol{P}|a\rangle, \tag{4.5}
\]

where ε_e is a unit polarization vector of the electric field of the electromagnetic wave (i.e., light). Equation (4.5) describes the optical transition that takes place as a result of the interaction between the electrons and the radiation field, in such a way that the interaction causes the electrons in the system to change state from |a⟩ to |b⟩. That interaction is represented by ε_e · P. The quantum states |a⟩ and |b⟩ are referred to as the initial state and the final state, respectively.

The quantity P is the electric dipole moment of the system, which is defined as

\[
\boldsymbol{P} \equiv e\sum_j \boldsymbol{x}_j, \tag{4.6}
\]

where e is the elementary charge (e < 0) and x_j is the position vector of the j-th electron. A detailed description of ε_e and P can be seen in Chap. 7. The quantity P_{ba} is called the transition dipole moment or, more precisely, the transition electric dipole moment with respect to the states |a⟩ and |b⟩. We assume that the optical transition occurs from the quantum state |a⟩ to another state |b⟩. Since P_{ba} is generally a complex number, |P_{ba}|² represents the transition probability.

If we adopt the coordinate representation, (4.5) is expressed by

\[
P_{ba} = \int\phi_b^*\,\boldsymbol{\varepsilon}_e\cdot\boldsymbol{P}\,\phi_a\,d\tau, \tag{4.7}
\]

where τ denotes the integration range over space.

4.2 One-Dimensional System

Let us apply the aforementioned general description to individual cases from Chaps. 1–3.

Example 4.1: A Particle Confined in a Square-Well Potential This example was treated in Chap. 1. As before, we assume that a particle (i.e., an electron) is confined in a one-dimensional system [-L ≤ x ≤ L (L > 0)]. We consider the optical transition from the ground state φ₁(x) to the first excited state φ₂(x). Here, we put L = π/2 for convenience. Then, the normalized coherent state ψ(x, t) is described as

\[
\psi(x,t) = \frac{1}{\sqrt{2}}\left[\phi_1(x)\exp(-iE_1 t/\hbar) + \phi_2(x)\exp(-iE_2 t/\hbar)\right], \tag{4.8}
\]

where we put c₁ = c₂ = 1/√2 in (4.2). In (4.8), we have

\[
\phi_1(x) = \sqrt{\frac{2}{\pi}}\cos x \quad \text{and} \quad \phi_2(x) = \sqrt{\frac{2}{\pi}}\sin 2x. \tag{4.9}
\]

Following (4.3), we have the following real function, called a probability distribution density:

\[
\psi^*(x,t)\psi(x,t) = \frac{1}{\pi}\left[\cos^2 x + \sin^2 2x + (\sin 3x + \sin x)\cos\omega t\right], \tag{4.10}
\]

where ω is given by (4.4) as

\[
\omega = 3\hbar/2m, \tag{4.11}
\]

where m is the mass of an electron. Rewriting (4.10), we have

\[
\psi^*(x,t)\psi(x,t) = \frac{1}{\pi}\left[1 + \frac{1}{2}(\cos 2x - \cos 4x) + (\sin 3x + \sin x)\cos\omega t\right]. \tag{4.12}
\]

Integrating (4.12) over [-π/2, π/2], only the first term makes a nonvanishing contribution, which gives 1, as anticipated (because of the normalization). Putting t = 0 and integrating (4.12) over the positive domain [0, π/2], we have

\[
\int_0^{\pi/2}\psi^*(x,0)\psi(x,0)\,dx = \frac{1}{2} + \frac{4}{3\pi} \approx 0.924. \tag{4.13}
\]

Similarly, integrating (4.12) over the negative domain [-π/2, 0], we have
\[
\int_{-\pi/2}^{0}\psi^*(x,0)\psi(x,0)\,dx = \frac{1}{2} - \frac{4}{3\pi} \approx 0.076. \tag{4.14}
\]

Fig. 4.1 Probability distribution density ψ*(x, t)ψ(x, t) of a particle confined in a square-well potential, plotted over -π/2 ≤ x ≤ π/2. The solid curve and the broken curve represent the density at t = 0 and at t = π/ω (i.e., a half period), respectively.

Thus, 92% of the total charge (as a probability density) is concentrated in the positive domain. Differentiation of ψ*(x, 0)ψ(x, 0) gives five extremals, including both edges. Of these, the major maximum is located at 0.635 radian, which corresponds to about 40% of π/2. This can serve as a measure of the transition moment. Figure 4.1 demonstrates these results (see the solid curve). Meanwhile, putting t = π/ω (i.e., a half period), we plot ψ*(x, π/ω)ψ(x, π/ω). The result shows that this graph is obtained by folding back the solid curve of Fig. 4.1 with respect to the ordinate axis. Thus, we find that the charge (or the probability density) exerts a sinusoidal oscillation with angular frequency 3ħ/2m along the x-axis around the origin.

Let e₁ be a unit vector in the positive direction of the x-axis. Then, the electric dipole P of the system is

\[
\boldsymbol{P} = e\boldsymbol{x} = ex\,\boldsymbol{e}_1, \tag{4.15}
\]

where x is the position vector of the electron. Let us define the matrix element of the electric dipole transition as

\[
P_{21} \equiv \langle\phi_2(x)|\boldsymbol{e}_1\cdot\boldsymbol{P}|\phi_1(x)\rangle = \langle\phi_2(x)|ex|\phi_1(x)\rangle. \tag{4.16}
\]

Notice that we only have to consider the polarization of light parallel to the x-axis. With the coordinate representation, we have

\[
\begin{aligned}
P_{21} &= \int_{-\pi/2}^{\pi/2}\phi_2(x)\,ex\,\phi_1(x)\,dx = \int_{-\pi/2}^{\pi/2}\sqrt{\frac{2}{\pi}}(\sin 2x)\,ex\,\sqrt{\frac{2}{\pi}}\cos x\,dx \\
&= \frac{2e}{\pi}\int_{-\pi/2}^{\pi/2}x\cos x\sin 2x\,dx = \frac{e}{\pi}\int_{-\pi/2}^{\pi/2}x(\sin x + \sin 3x)\,dx \\
&= \frac{e}{\pi}\int_{-\pi/2}^{\pi/2}x\left[(-\cos x)' + \left(-\frac{1}{3}\cos 3x\right)'\right]dx = \frac{16e}{9\pi},
\end{aligned} \tag{4.17}
\]

where we used a trigonometric formula and integration by parts. The factor 16/9π in (4.17) is about 36% of π/2. This number is in pretty good agreement with the 40% estimated above from the major maximum of ψ*(x, 0)ψ(x, 0). Note that the transition moment vanishes if the two states associated with the transition have the same parity. In other words, if both states are described by sine functions, or both by cosine functions, the integral vanishes.
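The numerical values quoted in this example are easily reproduced; the sketch below, assuming SciPy and setting e = 1, checks (4.13) and (4.17).

```python
# Sketch (assumes scipy): reproduce the numbers of Example 4.1 with e = 1:
# the probability (4.13) in the positive domain and the moment (4.17).
import numpy as np
from scipy.integrate import quad

phi1 = lambda x: np.sqrt(2/np.pi) * np.cos(x)       # ground state (4.9)
phi2 = lambda x: np.sqrt(2/np.pi) * np.sin(2*x)     # first excited state (4.9)

# probability in [0, pi/2] at t = 0 for psi = (phi1 + phi2)/sqrt(2)
p_pos, _ = quad(lambda x: 0.5*(phi1(x) + phi2(x))**2, 0, np.pi/2)
# transition moment <phi2| x |phi1>
p21, _ = quad(lambda x: phi2(x) * x * phi1(x), -np.pi/2, np.pi/2)

assert np.isclose(p_pos, 0.5 + 4/(3*np.pi))         # ~0.924, cf. (4.13)
assert np.isclose(p21, 16/(9*np.pi))                # ~0.566 = 36% of pi/2, (4.17)
print(p_pos, p21)
```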
Example 4.2: One-Dimensional Harmonic Oscillator Second, let us think of an optical transition for the harmonic oscillator that we dealt with in Chap. 2. We denote the state of the oscillator by |n⟩ in place of |ψ_n⟩ (n = 0, 1, 2, ···) of Chap. 2. Then, the general expression (4.5) can be written as

\[
P_{kl} = \langle k|\boldsymbol{\varepsilon}_e\cdot\boldsymbol{P}|l\rangle. \tag{4.18}
\]

Since we are considering a single one-dimensional oscillator,

\[
\boldsymbol{\varepsilon}_e = \boldsymbol{e}_q \quad \text{and} \quad \boldsymbol{P} = eq\,\boldsymbol{e}_q, \tag{4.19}
\]

where e_q is a unit vector in the positive direction of the coordinate q. Therefore, similarly to the above we have

\[
\boldsymbol{\varepsilon}_e\cdot\boldsymbol{P} = eq. \tag{4.20}
\]

That is,

\[
P_{kl} = e\langle k|q|l\rangle. \tag{4.21}
\]

Since q is an Hermitian operator, we have

\[
P_{kl}^* = e\left\langle l\middle|q^{\dagger}\middle|k\right\rangle = e\langle l|q|k\rangle = P_{lk}, \tag{4.22}
\]

where we used (1.116). Using (2.68), we have

\[
P_{kl} = e\sqrt{\frac{\hbar}{2m\omega}}\left\langle k\middle|a+a^{\dagger}\middle|l\right\rangle = e\sqrt{\frac{\hbar}{2m\omega}}\left[\langle k|a|l\rangle + \left\langle k\middle|a^{\dagger}\middle|l\right\rangle\right]. \tag{4.23}
\]

Taking the adjoint of (2.62) and modifying the notation, we have

\[
\langle k|a = \sqrt{k+1}\,\langle k+1|. \tag{4.24}
\]

Using (2.62) once again, we get

\[
P_{kl} = e\sqrt{\frac{\hbar}{2m\omega}}\left[\sqrt{k+1}\,\langle k+1|l\rangle + \sqrt{l+1}\,\langle k|l+1\rangle\right]. \tag{4.25}
\]

Using the orthonormal conditions between the state vectors, we have

\[
P_{kl} = e\sqrt{\frac{\hbar}{2m\omega}}\left[\sqrt{k+1}\,\delta_{k+1,l} + \sqrt{l+1}\,\delta_{k,l+1}\right]. \tag{4.26}
\]

Exchanging k and l in the above, we get P_{kl} = P_{lk}; the matrix element P_{kl} is symmetric with respect to the indices k and l. Notice that the first term does not vanish only when k + 1 = l, and the second term does not vanish only when k = l + 1. Therefore we get

\[
P_{k,k+1} = e\sqrt{\frac{\hbar(k+1)}{2m\omega}} \quad \text{and} \quad P_{l+1,l} = e\sqrt{\frac{\hbar(l+1)}{2m\omega}}, \quad \text{i.e.,} \quad P_{k+1,k} = e\sqrt{\frac{\hbar(k+1)}{2m\omega}}. \tag{4.27}
\]

Meanwhile, we find that the transition matrix P is expressed as

\[
P = eq = e\sqrt{\frac{\hbar}{2m\omega}}\left(a+a^{\dagger}\right) = e\sqrt{\frac{\hbar}{2m\omega}}
\begin{pmatrix}
0 & 1 & 0 & 0 & 0 & \cdots \\
1 & 0 & \sqrt{2} & 0 & 0 & \cdots \\
0 & \sqrt{2} & 0 & \sqrt{3} & 0 & \cdots \\
0 & 0 & \sqrt{3} & 0 & 2 & \cdots \\
0 & 0 & 0 & 2 & 0 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix}, \tag{4.28}
\]

where we used (2.68). Note that a real Hermitian matrix is a symmetric matrix. Practically, the fastest way to construct the transition matrix (4.28) is to use (2.65) and (2.66); it is an intuitively obvious and straightforward task. A glance at the matrix form immediately tells us that the transition matrix elements are nonvanishing only at the (k, k + 1) and (k + 1, k) positions. Whereas the (k, k + 1) element represents the transition from the k-th excited state to the (k - 1)-th excited state accompanied by photoemission, the (k + 1, k) element implies the transition from the (k - 1)-th excited state to the k-th excited state accompanied by photoabsorption. The two transitions give the same transition moment. Note that the zeroth excited state means the ground state; see (2.64) for the basis vector representations.

We should be careful about the "addresses" of the matrix accordingly. For example, P₀,₁ in (4.27) represents the (1, 2) element of the matrix (4.28); P₂,₁ stands for the (3, 2) element.

Suppose that we seek the transition dipole moments using the coordinate representation. Then, we need to use (2.106) and perform definite integrations. For instance, we have

\[
e\int_{-\infty}^{\infty}\psi_0(q)\,q\,\psi_1(q)\,dq,
\]

which corresponds to the (1, 2) element of (4.28). Indeed, the above integral gives e√(ħ/2mω); the confirmation is left to the reader. Nonetheless, seeking a definite integral of a product of higher excited-state wave functions becomes increasingly troublesome. In this respect, the operator method described above provides us with much better insight into complicated calculations.

Equations (4.26)–(4.28) imply that the electric dipole transition is allowed to occur only when the quantum number changes by one. Notice also that the transition takes place between an even function and an odd function; see Table 2.1 and (2.101). Such a condition or restriction on the optical transition is called a selection rule. The former equation of (4.27) shows that the transition takes place from the upper state to the lower state accompanied by photon emission. The latter equation, on the other hand, shows that the transition takes place from the lower state to the upper state accompanied by photon absorption.
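The operator construction of (4.28) can be mirrored in a few lines of code. The sketch below, assuming NumPy/SciPy and setting ħ = m = ω = e = 1, also cross-checks one element against the coordinate-space integral.

```python
# Sketch (assumes numpy/scipy): build the transition matrix (4.28) from the
# ladder-operator matrix and cross-check the (1, 2) element against the
# coordinate integral <psi_0| q |psi_1> (units hbar = m = omega = e = 1).
from math import factorial
import numpy as np
from scipy.integrate import quad
from numpy.polynomial.hermite import hermval

N = 6
a = np.diag(np.sqrt(np.arange(1, N)), k=1)        # annihilation operator a
q = (a + a.T) / np.sqrt(2)                        # q = sqrt(1/2) (a + a†)

def psi(n, x):
    # normalized harmonic-oscillator eigenfunction
    c = np.zeros(n + 1); c[n] = 1.0
    return hermval(x, c) * np.exp(-x**2/2) / np.sqrt(2**n * factorial(n) * np.sqrt(np.pi))

integral, _ = quad(lambda x: psi(0, x) * x * psi(1, x), -10, 10)
assert np.isclose(q[0, 1], integral)              # both equal 1/sqrt(2)
print(np.round(q, 3))
```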

4.3 Three-Dimensional System

The hydrogen-like atoms give us a typical example. Since we have fully investigated the quantum states of these atoms, we make the most of the related results.

Example 4.3: An Electron in a Hydrogen Atom Unlike the one-dimensional system, we have to take account of the angular momentum in a three-dimensional system. We have already obtained the explicit wave functions. Here we focus on the 1s and 2p states of hydrogen. For the normalized states we have

\[
\phi(1s) = \sqrt{\frac{1}{\pi a^3}}\,e^{-r/a}, \tag{4.29}
\]

\[
\phi(2p_z) = \frac{1}{4}\sqrt{\frac{1}{2\pi a^3}}\,\frac{r}{a}\,e^{-r/2a}\cos\theta, \tag{4.30}
\]

\[
\phi(2p_{x+iy}) = -\frac{1}{8}\sqrt{\frac{1}{\pi a^3}}\,\frac{r}{a}\,e^{-r/2a}\sin\theta\,e^{i\phi}, \tag{4.31}
\]

\[
\phi(2p_{x-iy}) = \frac{1}{8}\sqrt{\frac{1}{\pi a^3}}\,\frac{r}{a}\,e^{-r/2a}\sin\theta\,e^{-i\phi}, \tag{4.32}
\]

where a denotes the Bohr radius of hydrogen. Note that the minus sign of φ(2p_{x+iy}) is due to the Condon–Shortley phase. Even though the transition probability is proportional to the square of the matrix element, so that the phase factor cancels out, we describe the state vector faithfully. The energy eigenvalues are

\[
E(1s) = -\frac{\hbar^2}{2\mu a^2}, \qquad E(2p_z) = E(2p_{x+iy}) = E(2p_{x-iy}) = -\frac{\hbar^2}{8\mu a^2}, \tag{4.33}
\]

where μ is the reduced mass of hydrogen. Note that the latter three states are degenerate.

First, we consider a transition between the φ(1s) and φ(2p_z) states. Suppose that the normalized coherent state is described as

\[
\psi(x,t) = \frac{1}{\sqrt{2}}\left\{\phi(1s)\exp[-iE(1s)t/\hbar] + \phi(2p_z)\exp[-iE(2p_z)t/\hbar]\right\}. \tag{4.34}
\]

As before, we have

\[
\psi^*(x,t)\psi(x,t) = |\psi(x,t)|^2 = \frac{1}{2}\left\{[\phi(1s)]^2 + [\phi(2p_z)]^2 + 2\phi(1s)\phi(2p_z)\cos\omega t\right\}, \tag{4.35}
\]

where ω is given by

\[
\omega = \left[E(2p_z) - E(1s)\right]/\hbar = 3\hbar/8\mu a^2. \tag{4.36}
\]

In virtue of the third term of (4.35), which contains the cos ωt factor, the charge distribution undergoes a sinusoidal oscillation along the z-axis with the angular frequency described by (4.36). For instance, the factor cos ωt is +1 when t = 0, whereas it is -1 when ωt = π, i.e., t = 8πμa²/3ħ.

Integrating (4.35), we have
\[
\int\psi^*(x,t)\psi(x,t)\,d\tau = \frac{1}{2}\int\left\{[\phi(1s)]^2 + [\phi(2p_z)]^2\right\}d\tau + \cos\omega t\int\phi(1s)\phi(2p_z)\,d\tau = \frac{1}{2} + \frac{1}{2} = 1,
\]

where we used the normalized functional forms of φ(1s) and φ(2p_z) together with their orthogonality. Note that both functions are real.

Next, we calculate the matrix element. For simplicity, we denote the matrix element simply as P(ε_e), designating only the unit polarization vector ε_e. Then, we have

\[
P(\boldsymbol{\varepsilon}_e) = \langle\phi(1s)|\boldsymbol{\varepsilon}_e\cdot\boldsymbol{P}|\phi(2p_z)\rangle, \tag{4.37}
\]

where

\[
\boldsymbol{P} = e\boldsymbol{x} = e(\boldsymbol{e}_1\ \boldsymbol{e}_2\ \boldsymbol{e}_3)\begin{pmatrix}x\\ y\\ z\end{pmatrix}. \tag{4.38}
\]

We have three possibilities for choosing ε_e out of e₁, e₂, and e₃. Choosing e₃, we have

\[
P_{z,|p_z\rangle}^{(e_3)} = e\langle\phi(1s)|z|\phi(2p_z)\rangle
= \frac{e}{4\sqrt{2}\,\pi a^4}\int_0^{\infty}r^4 e^{-3r/2a}\,dr\int_0^{\pi}\cos^2\theta\sin\theta\,d\theta\int_0^{2\pi}d\phi = \frac{2^7\sqrt{2}}{3^5}\,ea \approx 0.745\,ea. \tag{4.39}
\]

In (4.39), we express the matrix element as P_{z,|p_z⟩}^{(e_3)} to indicate the z-component of the position vector and to show explicitly that the φ(2p_z) state is responsible for the transition. In (4.39), we used z = r cos θ. We also used the radial integration

\[
\int_0^{\infty}r^4 e^{-3r/2a}\,dr = 24\left(\frac{2a}{3}\right)^5.
\]

Also, we changed the variable cos θ ⟶ t to perform the integration with respect to θ. We see that the "leverage" length of the transition moment is comparable to the Bohr radius a.
ðe Þ
With the notation Pz,jp3 we need some explanation for consistency with the latter
zi

description. Equation (4.39) represents the transition from jϕ(2pz)i to jϕ(1s)i that is
accompanied by the photon emission. Thus, j pzi in the notation means that jϕ(2pz)i
4.3 Three-Dimensional System 135

is the initial state. In the notation, in turn, (e3) denotes the polarization vector and
z represents the electric dipole.
In the case of photon absorption where the transition occurs from jϕ(1s)i to
jϕ(2pz)i, we use the following notation:

ðe Þ
Pz, 3p j ¼ e ϕ 2pz jzjϕð1sÞ : ð4:40Þ
hz

Since all the functions related to the integration are real, we have

ðe Þ ðe Þ
Pz,jp3 ¼ Pz, 3p j :
zi hz

Meanwhile, if we choose e1 for εe to evaluate the matrix element Px, we have

ðe Þ
Px,jp1 ¼ e ϕð1sÞjxjϕ 2pz
zi

Z 1 Z π Z 2π
e
¼ pffiffiffi r 4 e3r=2a dr sin 2 θ cos θdθ cos ϕdϕ ¼ 0, ð4:41Þ
4 2πa4 0 0 0

where cos ϕ comes from x ¼ r sin θ cos ϕ and an integration of cos ϕ gives zero. In a
similar manner, we have

ðe Þ
Py, 2j p ¼ e ϕð1sÞjyjϕ 2pz ¼ 0: ð4:42Þ
zi

Next, we estimate the matrix elements associated with 2px and 2py. For this
purpose, it is convenient to introduce the following complex coordinates by a unitary
transformation:
0 10 1
1 1 1 i
0 1 p ffiffi
ffi p ffiffi
ffi 0 CB pffiffiffi pffiffiffi 0 C0 1
x B 2 C x
B 2 2 CB 2
B C B CB CB C
ðe1 e2 e3 Þ@ y A ¼ ðe1 e2 e3 ÞB  piffiffiffi pi ffiffiffi 0 CB p1ffiffiffi  pffiffiffi 0 C
i @yA
B CB C
z @ 2 2 A@ 2 2 A z
0 0 1 0 0 1
0 1
1
pffiffiffi ðx þ iyÞ
B 2 C
1 1 B C
¼ pffiffiffi ðe1  ie2 Þ pffiffiffi ðe1 þ ie2 Þ e3 B B 1 C,
C ð4:43Þ
2 2 @ pffiffiffi ðx  iyÞ A
2
z

where a unitary transformation is represented by a unitary matrix defined as


136 4 Optical Transition and Selection Rules

U { U ¼ UU { ¼ E: ð4:44Þ

We will investigate details of the unitary transformation and matrix in Parts III
and IV.
We define e+ and e as follows [1]:

1 1
eþ  pffiffiffi ðe1 þ ie2 Þ and e  pffiffiffi ðe1  ie2 Þ, ð4:45Þ
2 2

where complex vectors e+ and e represent the left-circularly polarized light and
right-circularly polarized light that carry an angular momentum h and h, respec-
tively. We will revisit the characteristics and implication of these complex vectors in
Sect. 7.4.
We have
0 1
0 1 p
1
ffiffi
ffi ð x þ iy Þ
x B 2 C
B C B C
ð e1 e2 e3 Þ @ y A ¼ ð e eþ e3 Þ B
B pffiffiffi ðx  iyÞ C:
1 C ð4:46Þ
@ A
z 2
z

Note that e+, e, and e3 are orthonormal. That is,

heþ jeþ i ¼ 1, heþ je i ¼ 0, etc: ð4:47Þ

In this situation, e+, e, and e3 are said to form an orthonormal basis in a three-
dimensional complex vector space (see Sect. 11.4).
Now, choosing e+ for εe, we have [2]
 
ðeþ Þ 1
Pxiy,  e ϕð1sÞjpffiffiffi ðx  iyÞjϕ 2pxþiy , ð4:48Þ
j pþ i 2

where j p+i is a shorthand notation of ϕ(2px + iy); x  iy represents a complex electric


dipole. Equation (4.48) represents an optical process in which an electron causes
transition from ϕ(2px + iy) to ϕ(1s) to lose an angular momentum h, whereas the
radiation field gains that angular momentum to conserve a total angular momentum
ðeþ Þ
h. The notation Pxiy, reflects this situation. Using the coordinate representation,
j pþ i
we rewrite (4.48) as
4.3 Three-Dimensional System 137

Z 1 Z π Z 2π
ðe Þ e
þ
Pxiy,jp ¼  pffiffiffi r 4 e3r=2a dr sin 3 θdθ eiϕ eiϕ dϕ
þi 8 2πa4 0 0 0
pffiffiffi
27 2
¼  5 ea, ð4:49Þ
3

where we used

x  iy ¼ r sin θeiϕ : ð4:50Þ

In the definite integral of (4.49), eiϕ comes from x  iy, while eiϕ comes from
ϕ(2px + iy). Note that from (3.24) eiϕ is an eigenfunction corresponding to an angular
momentum eigenvalue h. Notice that in (4.49) exponents eiϕ and eiϕ cancel out and
that an azimuthal integral is nonvanishing.
If we choose e for εe, we have
 
ðe Þ 1
Pxþiy,jp ¼ e ϕð1sÞjpffiffiffi ðx þ iyÞjϕ 2pxþiy ð4:51Þ
þi 2
Z 1 Z π Z 2π
e
¼ pffiffiffi r 4 e3r=2a dr sin 3 θdθ e2iϕ dϕ ¼ 0, ð4:52Þ
8 2πa4 0 0 0

where we used

x þ iy ¼ r sin θeiϕ :

With (4.52), a factor e2iϕ results from the product ϕ(2px + iy)(x + iy) which renders the
integral (4.51) vanishing. Note that the only difference between (4.49) and (4.52) is
about the integration of ϕ factor. For the same reason, if we choose e3 for εe, the
matrix element vanishes. Thus, with the ϕ(2px + iy)-related matrix element, only
ð eþ Þ
Pxiy, survives. Similarly, with the ϕ(2px  iy)-related matrix element, only
j pþ i
ð e Þ
Pxþiy,jp survives. Notice that j pi is a shorthand notation of ϕ(2px  iy). That is,
i
we have, e.g.,
  pffiffiffi
ðe Þ 1 27 2
Pxþiy,jp ¼ e ϕð1sÞjpffiffiffi ðx þ iyÞjϕ 2pxiy ¼ 5 ea,
i
2 3
 
ðe Þ 1
þ
Pxiy, jp ¼ e ϕð1sÞjpffiffiffi ðx  iyÞjϕ 2pxiy ¼ 0: ð4:53Þ
i
2

Taking complex conjugate of (4.48), we have


138 4 Optical Transition and Selection Rules

    pffiffiffi
ðeþ Þ 1 27 2
Pxiy, ¼ e ϕ 2pxþiy jpffiffiffi ðx þ iyÞjϕð1sÞ ¼  5 ea ð4:54Þ
j pþ i 2 3
ðe Þ
Here recall (1.116) and (x  iy){ ¼ x + iy. Also note that since Pxiy,
þ
is real,
j pþ i
ð eþ Þ
[Pxiy,  is real as well so that we have
j pþ i
 
ð eþ Þ ðe Þ ðe Þ
Pxiy, ¼ Pxiy,
þ
¼ Pxþiy,

: ð4:55Þ
j pþ i j pþ i hpþ j

Comparing (4.48) and (4.55), we notice that the polarization vector has been
switched from e+ to e with the allowed transition, even though the matrix element
remains the same. This can be explained as follows: In (4.48) the photon emission is
occurring, while the electron is causing a transition from ϕ(2px + iy) to ϕ(1s). As a
result, the radiation field has gained an angular momentum by h during the process in
which the electron has lost an angular momentum h. In other words, h is transferred
from the electron to the radiation field and this process results in the generation of
left-circularly polarized light in the radiation field.
In (4.54), on the other hand, the reversed process takes place. That is, the photon
absorption is occurring in such a way that the electron is excited from ϕ(1s) to
ϕ(2px + iy). After this process has been completed, the electron has gained an angular
momentum by h, whereas the radiation field has lost an angular momentum by h. As
a result, the positive angular momentum h is transferred to the electron from the
radiation field that involves left-circularly polarized light. This can be translated into
the statement that the radiation field has gained an angular momentum by h. This is
equivalent to the generation of right-circularly polarized light (characterized by e)
in the radiation field. In other words, the electron gains the angular momentum by h
to compensate the change in the radiation field.
The implication of the first equation of (4.53) can be interpreted in a similar
manner. Also we have
h i  
ðe Þ ðe Þ ð eþ Þ 1

Pxþiy, jp ¼ Pxþiy,

¼ Pxiy, ¼ e ϕ 2p j p ffiffi
ffi ð x  iy Þjϕ ð 1s Þ
i jp i hp j xiy
2
pffiffiffi
27 2
¼ ea:
35

Notice that the inner products of (4.49) and (4.53) are real, even though operators
ðeþ Þ ð e Þ
x + iy and x  iy are not Hermitian. Also note that Pxiy, of (4.49) and Pxþiy,
j pþ i j p i

of (4.53) have the same absolute value with minus and plus signs, respectively. The
minus sign of (4.49) comes from the Condon–Shortley phase. The difference,
however, is not essential, because the transition probability is proportional to
4.3 Three-Dimensional System 139

 2  2

 ð e Þ 
Pxþiy,j p i  or Pxiy,  . Some literature [3, 4] uses (x + iy) instead of x + iy.
ðeþ Þ
j pþ i 
This is because simply of the inclusion of the Condon–Shortley phase; see (3.304).
Let us think of the coherent state that is composed of ϕ(1s) and ϕ(2px + iy) or
ϕ(2px  iy). Choosing ϕ(2px + iy), the state ψ(x, t) can be given by

1
ψ ðx, t Þ ¼ pffiffiffi
2
 
ϕð1sÞ exp ðiE ð1sÞt=hÞ þ ϕ 2pxþiy exp iE 2pxþiy t=h , ð4:56Þ

where ϕ(1s) is described by (3.301) and ϕ(2px + iy) is expressed as (3.304). Then we
have

ψ  ðx, t Þψ ðx, t Þ ¼ jψ ðx, t Þj2


n  2 h io
1
¼ jϕð1sÞj2 þ ϕ 2pxþiy  þ ϕð1sÞR 2pxþiy eiðϕωtÞ þ eiðϕωtÞ
2
n  2 o
1
¼ jϕð1sÞj2 þ ϕ 2pxþiy  þ 2ϕð1sÞR 2pxþiy cos ðϕ  ωt Þ , ð4:57Þ
2

where using R 2pxþiy , we denote ϕ(2px+iy) as follows:

ϕ 2pxþiy  R 2pxþiy eiϕ : ð4:58Þ

That is, R 2pxþiy represents a real component of ϕ(2px+iy) that depends only on
r and θ. The third term of (4.57) implies that the existence probability density of an
electron represented by jψ(x, t)j2 is rotating counterclockwise around the z-axis with
an angular frequency of ω. Similarly, in the case of ϕ(2pxiy), the existence prob-
ability density of an electron is rotating clockwise around the z-axis with an angular
frequency of ω.
Integrating (4.57), we have
Z Z
ψ  ðx, t Þψ ðx, t Þdτ ¼ jψ ðx, t Þj2 dτ
Z n
1  2 o
¼ jϕð1sÞj2 þ ϕ 2pxþiy  dτ
2
Z 1 Z π Z 2π
1 1
þ 2
r dr sin θ dθ ϕð1sÞR 2pxþiy dϕ cos ðϕ  ωt Þ ¼ þ ¼ 1,
0 0 0 2 2

where we used normalized functional forms of ϕ(1s) and ϕ(2px+iy); the last term
vanishes because
140 4 Optical Transition and Selection Rules

Z 2π
dϕ cos ðϕ  ωt Þ ¼ 0:
0

This is easily shown by suitable variable transformation.


In relation to the above discussion, we often use real numbers to describe wave
functions. For this purpose, we use the following unitary transformation to transform
the orthonormal basis of e imϕ to cos mϕ and sin mϕ. That is, we have
0 1
ð1Þm ð1Þm i
  pffiffiffi  pffiffiffi C
1 1 ð1Þm imϕ 1 imϕ B
pffiffiffi cos mϕ pffiffiffi sin mϕ ¼ pffiffiffiffiffi e pffiffiffiffiffi e B 2 2 C,
π π 2π 2π @ 1 i A
pffiffiffi pffiffiffi
2 2
ð4:59Þ

where we assume that m is positive so that we can appropriately take into account the
Condon–Shortley phase. Alternatively, we describe it via unitary transformation as
follows:
 
ð1Þm imϕ 1 imϕ 1 1
pffiffiffiffiffi e pffiffiffiffiffi e ¼ pffiffiffi cos mϕ pffiffiffi sin mϕ
2π 2π π π
0 m 1
ð1Þ 1
pffiffiffi pffiffiffi
B 2 2 C
B C, ð4:60Þ
@ ð1Þm i i A
pffiffiffi  pffiffiffi
2 2

In this regard, we have to be careful about normalization constants; for trigonometric


functions the constant should be p1ffiffiπ , whereas for the exponential representation the
constant is p1ffiffiffiffi

. At the same time, trigonometric functions are expressed as a linear
combination of eimϕ and eimϕ, and so if we use the trigonometric functions,
information of a magnetic quantum number is lost.
In Sect. 3.7, we showed normalized functions Λ eðnÞ ¼ Y m ðθ, ϕÞR
eðl nÞ ðr Þ of the
l,m l
imϕ e ðnÞ
hydrogen-like atom. Noting that Y m l ðθ, ϕÞ is proportional to e , Λl,m can be
described using cosmϕ and sinmϕ for the basis vectors. We denote two linearly
eðnÞ
independent vectors by Λ eðnÞ
l, cos mϕ and Λl, sin mϕ . Then, these vectors are expressed as
4.3 Three-Dimensional System 141

0 1
ð1Þm ð1Þm i
   ðnÞ  pffiffiffi  pffiffiffi C
B
eðnÞ
Λ eðnÞ e eðnÞ B 2 2 C,
l, cos mϕ l, sin mϕ ¼ Λl,m
Λ Λl,m @ A ð4:61Þ
1 i
pffiffiffi pffiffiffi
2 2

where we again assume that m is positive. In chemistry and materials science, we


normally use real functions of Λ eðnÞ eðnÞ
l, cos mϕ and Λl, sin mϕ . In particular, we use the
eð2Þ
notations of, e.g., ϕ(2px) and ϕ(2py) instead of Λ and Λeð2Þ , respectively. In
1, cos ϕ 1, sin ϕ
that case, we explicitly have a following form:
0 1
1 i
 ð2Þ ð2Þ B  p ffiffi
ffi p ffiffi

e Λ e B 2 2C C
ϕð2px Þ ϕ 2py ¼ Λ 1,1 @
1,1 1 i A
pffiffiffi pffiffiffi
2 2
0 1
1 i
 pffiffiffi pffiffiffi
B 2 2C
¼ ϕ 2pxþiy ϕ 2pxiy B @ 1
C
i A
pffiffiffi pffiffiffi
2 2

1 3 r 1 32 r 2a
pffiffiffiffiffi a2 e2a sin θ cos ϕ
r r
¼ pffiffiffiffiffi a e sin θ sin ϕ : ð4:62Þ
4 2π a 4 2π a

Thus, the Condon–Shortley phase factor has been removed.


Using this expression, we calculate matrix elements of the electric dipole transi-
tion. We have

ðe Þ
Px,jp1 i ¼ ehϕð1sÞjxjϕð2px Þi
x

Z 1 Z π Z 2π pffiffiffi
e 27 2
¼ pffiffiffi r 4 e3r=2a dr sin 3 θdθ cos 2 ϕdϕ ¼ ea: ð4:63Þ
4 2πa4 0 0 0 35

Thus, we obtained the same result as (4.49) apart from the minus sign. Since a square
of an absolute value of the transition moment plays a role, the minus sign is again of
secondary importance. With Pðye2 Þ , similarly we have

ðe Þ
Py,jp2 ¼ e ϕð1sÞjyjϕ 2py
yi

Z 1 Z π Z 2π pffiffiffi
e 4 3r=2a 27 2
¼ pffiffiffi r e dr sin θdθ
3
sin ϕdϕ ¼ 5 ea:
2
ð4:64Þ
4 2πa4 0 0 0 3

Comparing (4.39), (4.63), and (4.64), we have


142 4 Optical Transition and Selection Rules

pffiffiffi
ðe Þ ðe Þ ðe Þ 27 2
Pz, 3 p ¼ Px,j1 p i ¼ Py, 2 p ¼ 5 ea:
j zi x j yi 3
ðe Þ ðe Þ ðe Þ
In the case of Pz,jp3 , Px,j1 p i , and Py,jp2 , the optical transition is said to be polarized
zi x yi

along the z-, x-, and y-axis, respectively, and so linearly polarized lights are relevant.
Note moreover that operators z, x, and y in (4.39), (4.63), and (4.64) are Hermitian
and that ϕ(2pz), ϕ(2px), and ϕ(2py) are real functions.

4.4 Selection Rules

In a three-dimensional system such as hydrogen-like atoms, quantum states of


particles (i.e., electrons) are characterized by three quantum numbers: principal
quantum numbers, orbital angular momentum quantum numbers (or azimuthal
quantum numbers), and magnetic quantum numbers. In this section, we examine
the selection rules for the electric dipole approximation.
Of the three quantum numbers mentioned above, angular momentum quantum
numbers are denoted by l and magnetic quantum numbers by m. First, we examine
the conditions on m. With the angular momentum operator L and its corresponding
operator M, we get the following commutation relations:
 
½M z , x ¼ iy, M y , z ¼ ix, ½M x , y ¼ iz;
 
½M z , iy ¼ x, M y , ix ¼ z, ½M x , iz ¼ y; etc: ð4:65Þ

Notice that in the upper line the indices change cyclic like (z, x, y), whereas in the
lower line they change anti-cyclic such as (z, y, x). The proof of (4.65) is left for the
reader. Thus, we have, e.g.,

½M z , x þ iy ¼ x þ iy, ½M z , x  iy ¼ ðx  iyÞ, etc: ð4:66Þ

Putting

Qþ  x þ iy and Q  x  iy,

we have

½M z , Qþ  ¼ Qþ , ½M z , Q  ¼ Q : ð4:67Þ

Taking an inner product of both sides of (4.67), we have


4.4 Selection Rules 143

hm0 j½M z , Qþ jmi ¼ hm0 jM z Qþ  Qþ M z jmi ¼ m0 hm0 jQþ jmi  mhm0 jQþ jmi
¼ hm0 jQþ jmi, ð4:68Þ

where the quantum state j mi is identical to j l, mi in (3.151). Here we need no


information about l, and so it is omitted. Thus, we have, e.g., Mzj mi ¼ mj mi. Taking
its adjoint, we have hm j Mz{ ¼ hm j Mz ¼ mhmj, where Mz is Hermitian. These results
lead to (4.68). From (4.68), we get

ðm0  m  1Þhm0 jQþ jmi ¼ 0: ð4:69Þ

Therefore, for the matrix element hm0j Q+j mi not to vanish, we must have

m0  m  1 ¼ 0 or Δm ¼ 1 ðΔm  m0  mÞ:

This represents the selection rule with respect to the coordinate Q+.
Similarly, we get

ðm0  m þ 1Þhm0 jQ jmi ¼ 0: ð4:70Þ

In this case, for the matrix element hm0j Qj mi not to vanish we have

m0  m þ 1 ¼ 0 or Δm ¼ 1:

To derive (4.70), we can alternatively use the following: Taking the adjoint of (4.69),
we have

ðm0  m  1ÞhmjQ jm0 i ¼ 0:

Exchanging m0 and m, we have

ðm  m0  1Þhm0 jQ jmi ¼ 0 or ðm0  m þ 1Þhm0 jQ jmi ¼ 0:

Thus, (4.70) is recovered.


Meanwhile, we have a commutation relation

½M z , z ¼ 0: ð4:71Þ

Similarly, taking an inner product of both sides of (4.71), we have

ðm0  mÞhm0 jzjmi ¼ 0:

Therefore, for the matrix element hm0j zj mi not to vanish, we must have
144 4 Optical Transition and Selection Rules

m0  m ¼ 0 or Δm ¼ 0: ð4:72Þ

These results are fully consistent with Example 4.3 of Sect. 4.3. That is, if
circularly polarized light takes part in the optical transition, Δm ¼ 1. For instance,
using the present notation we rewrite (4.48) as
  pffiffiffi
1 1  27 2
ϕð1sÞjpffiffiffi ðx  iyÞjϕ 2pxþiy ¼ pffiffiffi h0jQ j1i ¼  5 a:
2 2 3

If linearly polarized light is related to the optical transition, we have Δm ¼ 0.


Next, we examine the conditions on l. To this end, we calculate a following
commutator [5]:
       
M2 , z ¼ M x 2 þ M y2 þ M z 2 , z ¼ M x 2 , z þ M y 2 , z

¼ M x ðM x z  zM x Þ þ M x zM x  zM x 2 þ M y M y z  zM y

þ M y zM y  zM y 2
   
¼ M x ½M x , z þ ½M x , zM x þ M y M y , z þ M y , z M y

¼ i M y x þ xM y  M x y  yM x

¼ i M x y  yM x  M y x þ xM y þ 2M y x  2M x y

¼ i 2iz þ 2M y x  2M x y ¼ 2i M y x  M x y þ iz : ð4:73Þ

In the above calculations, (i) we used [Mz, z] ¼ 0 (with the second equality); (ii) RHS
was modified so that the commutation relations can be used (the third equality); (iii)
we used Mxy ¼ Mxy  2Mxy and Myx ¼  Myx + 2Myx so that we can use (4.65)
(the second last equality). Moreover, using (4.65), (4.73) can be written as
 2 
M , z ¼ 2i xM y  M x y ¼ 2i M y x  yM x :

Similar results on the commutator can be obtained with [M2, x] and [M2, y]. For
further use, we give alternative relations such that
 
M 2 , x ¼ 2i yM z  M y z ¼ 2i M z y  zM y ,
 2 
M , y ¼ 2iðzM x  M z xÞ ¼ 2iðM x z  xM z Þ: ð4:74Þ

Using (4.73), we calculate another commutator such that


4.4 Selection Rules 145

 2  2       
M , M , z ¼ 2i M 2 , M y x  M 2 , M x y þ i M 2 , z
      
¼ 2i M y M 2 , x  M x M 2 , y þ i M 2 , z
  
¼ 2i 2iM y yM z  M y z  2iM x ðM x z  xM z Þ þ i M 2 , z

¼ 2 2 M x x þ M y y þ M z z M z  2 M x 2 þ M y 2 þ M z 2 z
þM 2 z  z M 2 g ¼ 2 M 2 z þ z M 2 : ð4:75Þ

In the above calculations, (i) we used [M2, My] ¼ [M2, Mx] ¼ 0 (with the second
equality); (ii) we used (4.74) (the third equality); (iii) RHS was modified so that we
can use the relation M ⊥ x from the definition of the angular momentum operator,
i.e., Mxx + Myy + Mzz ¼ 0 (the second last equality). We used [Mz, z] ¼ 0 as well.
Similar results are obtained with x and y. That is, we have
  
M 2 , M 2 , x ¼ 2 M 2 x þ xM 2 , ð4:76Þ
 2  2 
M , M , y ¼ 2 M 2 y þ yM 2 : ð4:77Þ

Rewriting, e.g., (4.75), we have

M 4 z  2M 2 z M 2 þ z M 4 ¼ 2 M 2 z þ z M 2 : ð4:78Þ

Using the relation (4.78) and taking inner products of both sides, we get, e.g.,

l0 jM 4 z  2M 2 z M 2 þ z M 4 jl ¼ l0 j2 M 2 z þ z M 2 jl : ð4:79Þ

That is,

l0 jM 4 z  2M 2 z M 2 þ z M 4 jl  l0 j2 M 2 z þ z M 2 jl ¼ 0:

Considering that both terms of LHS contain a factor hl0j zj li in common, we have
h i
l0 ðl0 þ 1Þ  2l0 lðl0 þ 1Þðl þ 1Þ þ l2 ðl þ 1Þ2  2l0 ðl0 þ 1Þ  2lðl þ 1Þ
2 2

hl0 jzjli ¼ 0, ð4:80Þ

where the quantum state j li is identical to j l, mi in (3.151) with m omitted.


To factorize the first factor of LHS of (4.80), we view it as a quartic equation with
respect to l0. Replacing l0 with l, we find that the first factor vanishes, and so the first
factor should have a factor (l0 + l). Then, we factorize the first factor of LHS of (4.80)
such that
146 4 Optical Transition and Selection Rules

½the first factor of LHS of ð4:80Þ


 
¼ ðl0 þ lÞ ðl0  lÞ þ 2ðl0 þ lÞ l0  l0 l þ l2  2l0 lðl0 þ lÞ  2ðl0 þ lÞ  ðl0 þ lÞ
2 2 2 2

h   i
¼ ðl0 þ lÞ ðl0 þ lÞðl0  lÞ þ 2 l0  l0 l þ l2  2l0 l  2  ðl0 þ lÞ
2 2

h   i
¼ ðl0 þ lÞ ðl0 þ lÞðl0  lÞ þ 2 l0  2l0 l þ l2  ðl0 þ l þ 2Þ
2 2

h i
¼ ðl0 þ lÞ ðl0 þ lÞðl0  lÞ þ 2ðl0  lÞ  ðl0 þ l þ 2Þ
2 2

n o
¼ ðl0 þ lÞ ðl0  lÞ ½ðl0 þ lÞ þ 2  ðl0 þ l þ 2Þ
2

¼ ðl0 þ lÞðl0 þ l þ 2Þðl0  l þ 1Þðl0  l  1Þ: ð4:81Þ

Thus rewriting (4.80), we get

ðl0 þ lÞðl0 þ l þ 2Þðl0  l þ 1Þðl0  l  1Þhl0 jzjli ¼ 0: ð4:82Þ

We have similar relations with respect to hl0j xj li and hl0j yj li because of (4.76) and
(4.77). For the electric dipole transition to be allowed, among hl0j xj li, hl0j yj li, and
hl0j zj li at least one term must be nonvanishing. For this, at least one of the four
factors of (4.81) should be zero. Since l0 + l + 2 > 0, this factor is excluded.
For l0 + l to vanish, we should have l0 ¼ l ¼ 0; notice that both l0 and l are non-
negative integers. We must then examine this condition. This condition is equivalent
to that the spherical harmonics related to the angular variables θ and ϕ take the form
pffiffiffiffiffi
of Y00 ðθ, ϕÞ ¼ 1= 4π, i.e., a constant. Therefore, the θ-related integral for the matrix
element hl0j zj li only consists of a following factor:
Z π Z π
1 1
cos θ sin θdθ ¼ sin 2θdθ ¼  ½ cos 2θπ0 ¼ 0,
0 2 0 4

where cosθ comes from a polar coordinate z ¼ r cos θ; sinθ is due to an infinitesimal
volume of space, i.e., r2 sin θdrdθdϕ. Thus, we find that hl0j zj li vanishes on
condition that l0 ¼ l ¼ 0. As a polar coordinate representation, x ¼ r sin θ cos ϕ
and y ¼ r sin θ sin ϕ, and so the ϕ-related integral hl0j xj li and hl0j yj li vanishes as
well. That is,
Z 2π Z 2π
cos ϕdϕ ¼ sin ϕdϕ ¼ 0:
0 0

Therefore, the matrix elements relevant to l0 ¼ l ¼ 0 vanish with all the coordinates;
i.e., we have
4.5 Angular Momentum of Radiation 147

hl0 jxjli ¼ hl0 jyjli ¼ hl0 jzjli ¼ 0: ð4:83Þ

Consequently, we exclude (l0 + l)-factor as well, when we consider a condition of the


allowed transition. Thus, regarding the condition that should be satisfied with the
allowed transition, from (4.82) we get

l0  l þ 1 ¼ 0 or l0  l  1 ¼ 0: ð4:84Þ

Or defining Δl  l0  l, we get

Δl ¼ 1: ð4:85Þ

Thus, for the transition to be allowed, the azimuthal quantum number must change
by one.

4.5 Angular Momentum of Radiation [6]

In Sect. 4.3 we mentioned circularly polarized light. If the circularly polarized light
acts on an electron, what can we anticipate? Here we deal with this problem within a
framework of a semiclassical theory.
Let E and H be an electric and magnetic field of a left-circularly polarized light,
respectively. They are expressed as

1
E = pffiffiffi E 0 ðe1 þ ie2 Þ exp iðkz  ωt Þ, ð4:86Þ
2
1 1 E
H = pffiffiffi H 0 ðe2 2 ie1 Þ exp iðkz  ωt Þ ¼ pffiffiffi 0 ðe2 2 ie1 Þ exp iðkz  ωt Þ: ð4:87Þ
2 2 μv

Here we assume that the light is propagating in the direction of the positive z-axis.
The electric and magnetic fields described by (4.86) and (4.87) represent the left-
circularly polarized light. A synchronized motion of an electron is expected, if the
electron exerts a circular motion in such a way that the motion direction of the
electron is always perpendicular to the electric field and parallel to the magnetic field
(see Fig. 4.2). In this situation, magnetic Lorentz force does not affect the electron
motion.
Here, the Lorentz force F(t) is described by

Fðt Þ ¼ eEðxðt ÞÞ þ exð_t Þ Bðxðt ÞÞ, ð4:88Þ

where the first term is electric Lorentz force and the second term represents the
magnetic Lorentz force.
148 4 Optical Transition and Selection Rules

Fig. 4.2 Synchronized y


motion of an electron under
a left-circularly polarized
light electron motion

electron
O E x

The quantity B called magnetic flux density is related to H as B = μ0H, where


μ0 is permeability of vacuum; see (7.10) and (7.11) of Sect. 7.1. In (4.88) E and B are
measured at a position where the electron is situated at a certain time t. Equation
(4.88) universally describes the motion of a charged particle in the presence of
electromagnetic fields. We consider another related example in Chap. 15.
Equation (4.86) can be rewritten as

1
E = pffiffiffi E 0 ½e1 cos ðkz  ωt Þ 2 e2 sin ðkz  ωt Þ
2
1
þi pffiffiffi E 0 ½e2 cos ðkz  ωt Þ þ e1 sin ðkz  ωt Þ: ð4:89Þ
2

Suppose that the electron exerts the circular motion in a region narrow enough
around the origin and that the said electron motion is confined within the xy-plane
that is perpendicular to the light propagation direction. Then, we can assume that
z  0 in (4.89). Ignoring kz in (4.89) accordingly and taking a real part, we have

1
E = pffiffiffi E0 ðe1 cos ωt þ e2 sin ωt Þ:
2

Thus, a force F exerting the electron is described by

F = eE, ð4:90Þ

where e is an elementary charge (e < 0). Accordingly, an equation of motion of the


electron is approximated such that

x = eE,
m€ ð4:91Þ
4.5 Angular Momentum of Radiation 149

where m is a mass of an electron and x is a position vector of the electron. With


individual components of the coordinate, we have

1 1
m€x ¼ pffiffiffi eE0 cos ωt and m€y ¼ pffiffiffi eE 0 sin ωt: ð4:92Þ
2 2

Integrating (4.92) two times, we get

eE
mx ¼  pffiffiffi 0 cos ωt þ Ct þ D, ð4:93Þ
2 ω2

where C and D are integration constants. Setting xð0Þ ¼  pffiffieE


2mω2
0
and x 0 (0) ¼ 0, we
have C ¼ D ¼ 0. Similarly, we have

eE
my ¼  pffiffiffi 0 sin ωt þ C 0 t þ D0 , ð4:94Þ
2 ω2

ffiffi 0 , we
where C0 and D0 are integration constants. Setting y(0) ¼ 0 and y0 ð0Þ ¼  peE
2mω
have C0 ¼ D0 ¼ 0. Thus, making t a parameter, we get
2
eE
x2 þ y2 ¼ pffiffiffi 0 : ð4:95Þ
2mω2

This implies that the electron is exerting a counterclockwise circular motion with
a radius  pffiffieE
2mω2
0
under the influence of the electric field. This is consistent with a
motion of an electron in the coherent state of ϕ(1s) and ϕ(2px + iy) as expressed in
(4.57).
An angular momentum the electron has acquired is
 
eE meE e2 E 0 2
L¼x p = xpy  ypx ¼  pffiffiffi 0  pffiffiffi 0 ¼ : ð4:96Þ
2mω2 2mω 2mω3

Identifying this with h, we have

e2 E 0 2
¼ h: ð4:97Þ
2mω3

In terms of energy, we have

e2 E 0 2
¼ hω: ð4:98Þ
2mω2

Assuming a wavelength of the light is 600 nm, we need a left-circularly polarized


light whose electric field is about 1.5 1010 [V/m].
150 4 Optical Transition and Selection Rules

A radius α of a circular motion of the electron is given by

eE
α ¼ pffiffiffi 0 : ð4:99Þ
2mω2

Under the same condition as the above, α is estimated to be 2 Å.

References

1. Jackson JD (1999) Classical electrodynamics, 3rd edn. Wiley, New York


2. Fowles GR (1989) Introduction to modern optics, 2nd edn. Dover, New York
3. Arfken GB, Weber HJ, Harris FE (2013) Mathematical methods for physicists, 7th edn. Aca-
demic Press, Waltham
4. Chen JQ, Ping J, Wang F (2002) Group representation theory for physicists, 2nd edn. World
Scientific, Singapore
5. Sunakawa S (1991) Quantum mechanics. Iwanami, Tokyo. (in Japanese)
6. Rossi B (1957) Optics. Addison-Wesley, Reading, Massachusetts
Chapter 5
Approximation Methods of Quantum
Mechanics

In Chaps. 1–3 we have focused on solving eigenvalue equations with respect to a


particle confined within a one-dimensional potential well or a harmonic oscillator
along with an electron of a hydrogen-like atom. In each example we obtained exact
analytical solutions with the quantum-mechanical states and corresponding eigen-
values (energy, angular momentum, etc.). In most cases of quantum-mechanical
problems, however, we are not able to get such analytical solutions or accurately
determine the corresponding eigenvalues. Under these circumstances, we need
appropriate approximation methods of those problems. Among those methods, the
perturbation method and variational method are widely used.
In terms of usefulness, we provide several examples concerning physical systems
that have already appeared in Chaps. 1–3. In this chapter, we examine how these
physical systems change their quantum states and corresponding energy eigenvalues
as a result of undergoing influence from the external field. We assume that the
change results from the application of external electric field. For simplicity, we focus
on the change in eigenenergy and corresponding eigenstate with respect to the
nondegenerate quantum state. As specific cases in these examples, we happen to
be able to get perturbed physical quantities accurately. Including such cases, for later
purposes we take in advance important concepts of a complete orthonormal system
(CONS) and projection operator.

5.1 Perturbation Method

In Chaps. 1–3, we considered a situation where no external field is exerted on a


physical system. In Chap. 4, on the other hand, we studied the optical transition that
takes place as a consequence of the interaction between the physical system and
electromagnetic wave. In this chapter, we wish to examine how the physical system
changes its quantum state, when an external electric field is applied to the system.
We usually assume that the external field is weak and the corresponding change in

© Springer Nature Singapore Pte Ltd. 2020 151


S. Hotta, Mathematical Physical Chemistry,
https://doi.org/10.1007/978-981-15-2225-3_5
152 5 Approximation Methods of Quantum Mechanics

the quantum state or energy is small. The applied external field causes the change in
Hamiltonian of a quantum system and results in the change in that quantum state
usually accompanied by the energy change of the system. We describe this process
as a following eigenvalue equation:

H j Ψi i ¼ Ei j Ψi i, ð5:1Þ

where H represents the total Hamiltonian; Ei is an energy eigenvalue and Ψi is an


eigenstate corresponding to Ei. Equation (5.1) implies that there should be a series of
eigenvalues and corresponding eigenvectors belonging to those eigenvalues. Strictly
speaking, however, we can get such combinations of the eigenstate and eigenvalue
only by solving (5.1) analytically. This is some kind of circular argument. Yet, at
present we do not have to go into detail about that issue. Instead we advance to
discussion of the approximation methods from a practical point of view.
We assume that H is divided into two terms such that

H ¼ H 0 þ λV, ð5:2Þ

where H0 is the Hamiltonian of the unperturbed system without the external field;
V is an additional Hamiltonian due to the applied external field, or interaction
between the physical system and the external field. The second term includes a
parameter λ that can be modulated according to the change in the strength of the
applied field. The parameter λ should be real so that H of (5.2) can be Hermitian. The
parameter λ can be the applied field itself or can merely be a dimensionless parameter
that can be set at λ ¼ 1 after finishing the calculation. We will come back to this point
later in Examples.
In (5.2) we assume that the following equation holds:

ð0Þ
H 0 j ii ¼ E i j ii, ð5:3Þ

where jii is the i-th quantum state. We designate j0i as the ground state. We assume
the excited states ji i (i ¼ 1, 2,   ) to be numbered in order of increasing energies.
These functions (or vectors) jii (including j0i) can be a quantum state that appeared
as various eigenfunctions in the previous chapters. Of these, two or more functions
may have the same eigenenergy (i.e., degenerate). If we are to deal with such
degenerate states, those states jii have to be distinguished by, e.g., jid i
(d ¼ 1, 2,   , s), where s denotes the degeneracy (or degree of degeneracy). For
simplicity, however, in this chapter we are going to examine only the change in
energy and physical states with respect to nondegenerate states. We disregard the
issue on notation of the degenerate states accordingly. We pay attention to each
individual case later, when necessary.
Regarding a one-dimensional physical system, e.g., a particle confined within
potential well and quantum-mechanical harmonic oscillator, for example, all the
quantum states are nondegenerate as we have already seen in Chaps. 1 and 2. As a
5.1 Perturbation Method 153

special case we also consider the ground state of a hydrogen-like atom. This is
because that state is nondegenerate, and so the problem can be dealt with in parallel
to the cases of the one-dimensional physical systems. We assume that the quantum
sates ji i (i ¼ 0, 1, 2,   ) constitute the orthonormal eigenvectors such that

h j j ii ¼ δji : ð5:4Þ

Remember that the notation has already appeared in (2.53); see Chap. 13 as well.

5.1.1 Quantum State and Energy Level Shift Caused by


Perturbation

One of the most important applications of the perturbation method is to evaluate the
shift in quantum state and energy level caused by the applied external field. To this
end, we expand both the quantum sate and energy as a power series of λ. That is, in
(5.1) we expand jΨii and Ei such that

ð1Þ ð2Þ ð3Þ


j Ψi i ¼j ii þ λ j ϕi i þ λ2 j ϕi i þ λ3 j ϕi i þ   , ð5:5Þ
ð0Þ ð1Þ ð2Þ ð3Þ
Ei ¼ Ei þ λE i þ λ2 Ei þ λ3 Ei þ   , ð5:6Þ

ð1Þ ð2Þ ð3Þ


where j ϕi i, j ϕi i, j ϕi i, etc. are chosen as correction terms for the state jii that
is associated with the unperturbed system. Once again, we assume that the state ji i
(i ¼ 0, 1, 2,   ) represents the nondegenerate normalized eigenvector that belongs to
ð0Þ ð1Þ ð2Þ
the eigenenergy E i of the unperturbed state. Unknown state vectors j ϕi i, j ϕi i,
ð3Þ ð1Þ ð2Þ ð3Þ
j ϕi i, etc. as well as E i , E i , Ei , etc. are to be determined after the calculation
procedures carried out from now. These states jΨii and energies Ei result from the
ð0Þ
perturbation term λV and represent the deviation from jii and E i of the unperturbed
system.
With the normalization condition we impose the following condition upon jΨii
such that

hijΨi i ¼ 1: ð5:7Þ

This condition, however, not necessarily means that jΨii has been normalized
(vide infra). From (5.4) and (5.7) we have
D E D E
ð1Þ ð2Þ
ijϕi ¼ ijϕi ¼    ¼ 0: ð5:8Þ

Let us calculate hΨij Ψii on condition of (5.7) and (5.8). From (5.5) we have
154 5 Approximation Methods of Quantum Mechanics

hD E D Ei
ð1Þ ð1Þ
hΨi jΨi i ¼ hijii þ λ ijϕi þ ϕi ji
hD E D E D Ei
ð2Þ ð2Þ ð1Þ ð1Þ
þλ2 ijϕi þ ϕi ji þ ϕi jϕi þ 
D E D E
ð1Þ ð1Þ ð1Þ ð1Þ
¼ hijii þ λ2 ϕi jϕi þ     1 þ λ2 ϕi jϕi ,

where we used (5.4) and (5.8). Thus, hΨij Ψii is normalized if we ignore the factor of
λ 2.
Now, inserting (5.5) and (5.6) into (5.1) and comparing the same power factors
with respect to λ, we obtain
h i
ð0Þ
E i  H 0 j ii ¼ 0, ð5:9Þ
h i
ð0Þ ð1Þ ð1Þ
E i  H 0 j ϕi i þ E i j ii ¼ V j ii, ð5:10Þ
h i
ð0Þ ð2Þ ð1Þ ð1Þ ð2Þ ð1Þ
Ei  H 0 j ϕi i þ E i j ϕi i þ Ei j ii ¼ V j ϕi i: ð5:11Þ

Equation (5.9) is identical with (5.3). Operating h jj on both sides of (5.10) from
the left, we get
h i h i
ð0Þ ð1Þ ð1Þ ð0Þ ð0Þ ð1Þ ð1Þ
h j j Ei  H 0 j ϕi i þ h j j E i j ii ¼ h j j E i  E j j ϕi i þ E i h jjii
h i
ð0Þ ð0Þ ð1Þ ð1Þ
¼ E i  E j h j j ϕi i þ E i δji ¼ h jjVjii: ð5:12Þ

Putting j ¼ i on (5.12), we have

ð1Þ
Ei ¼ hijVjii: ð5:13Þ

This represents the first-order correction term with the energy eigenvalue. Mean-
while, assuming j 6¼ i on (5.12), we have

ð1Þ h jjVjii
h j j ϕi i ¼ h i ð j 6¼ iÞ, ð5:14Þ
ð0Þ ð0Þ
Ei  Ej

ð 0Þ ð0Þ
where we used Ei 6¼ E j on the assumption that the quantum state jii that belongs
ð0Þ
to eigenenergy E i is nondegenerate. Here, we emphasize that the eigenstates that
ð0Þ
correspond to Ej may or may not be degenerate. In other words, the quantum state
that we are making an issue of with respect to the perturbation is nondegenerate. In
this context, we do not question whether or not other quantum states [represented by
jji in (5.14)] are degenerate.
5.1 Perturbation Method 155

Here we postulate that the eigenvectors, i.e., ji i (i ¼ 0, 1, 2,   ) of H0 form a


complete orthonormal system (CONS) such that
X
P k
jkihkj ¼ E, ð5:15Þ

where jk ih kj is said to be a projection operator and E is an identity operator. As


remarked just above, (5.15) holds regardless of whether the eigenstates are degen-
erate. The word of complete system implies that any vector (or function) can be
expanded by a linear combination of the basis vectors of the said system. The formal
definition of the projection operator can be seen in Chaps. 14 and 18.
Thus, we have

ð1Þ ð1Þ
X ð1Þ
X ð1Þ
j ϕi i ¼ E j ϕi i ¼ k
jkihk jϕi i ¼ k6¼i
jkihkjϕi i
X hkjVjii
¼ k6¼i
j ki h i, ð5:16Þ
ð0Þ ð0Þ
Ei  Ek

where with the third equality we used (5.8) and with the last equality we used (5.14).
Operating hij from the left on (5.16) and using (5.4), we recover (5.8). Hence, using
(5.5) the approximated quantum state to the first order of λ is described by

ð1Þ
X hkjVjii
j Ψi i j ii þ λ j ϕi i ¼j ii þ λ k6¼i
j ki h i: ð5:17Þ
ð0Þ ð0Þ
Ei  Ek

For a while, let us think of a case where we deal with the change in the
eigenenergy and corresponding eigenstate with respect to the degenerate quantum
state. In that case, regarding the state jii in question on (5.12) we have ∃j ji ( j 6¼ i)
ð 0Þ ð0Þ
that satisfies Ei ¼ E j . Therefore, we would have h jj Vj ii ¼ 0 from (5.12).
Generally, it is not the case, however, and so we need a relation different from
(5.12) to deal with the degenerate case. However, we do not get into details about
this issue in this book.
ð2Þ
Next let us seek the second-order correction term E i of the eigenenergy.
Operating h jj on both sides of (5.11) from the left, we get
h i
ð0Þ ð0Þ ð2Þ ð1Þ ð1Þ ð2Þ ð1Þ
Ei  Ej h j j ϕi i þ E i h j j ϕi i þ E i δji ¼ h j j V j ϕi i:

Putting j ¼ i in the above relation as before, we have


156 5 Approximation Methods of Quantum Mechanics

ð2Þ ð1Þ
Ei ¼ hi j V j ϕi i: ð5:18Þ

Notice that we used (5.8) to derive (5.18). Using (5.16) furthermore, we get

ð2Þ
X hkjVjii X hkjVjii
Ei ¼ k6¼i
hi j V j ki h i¼
k6¼i
hk j V { j ii h i
ð0Þ ð0Þ ð0Þ ð0Þ
Ei  Ek Ei  Ek
X hkjVjii X 1
¼ k6¼i
hk j V j ii h i¼
k6¼i
h i jhkjVjiij2 , ð5:19Þ
ð0Þ ð0Þ ð0Þ ð 0Þ
Ei  Ek Ei  Ek

where with the second equality we used (1.116) in combination with (1.118) and
with the third equality we used the fact that V is an Hermitian operator; i.e., V{ ¼ V.
The state jki is sometimes called an intermediate state.
Considering the expression of (5.16), we find that the approximated state of
ð1Þ
j ii þ λ j ϕi i in (5.5) is not an eigenstate of energy, because the approximated state
contains a linear combination of different eigenstates jki that have eigenenergies
ð0Þ
different from Ei of jii. Substituting (5.13) and (5.19) for (5.6), with the energy
correction terms up to the second order we have

ð0Þ ð1Þ ð2Þ


E i  Ei þ λE i þ λ2 E i
ð0Þ
X 1
¼ Ei þ λhijVjii þ λ2 h i jhkjVjiij2 : ð5:20Þ
k6¼i ð0Þ ð0Þ
Ei  Ek

If we think of the ground statej0i forjii, we get

ð0Þ ð1Þ ð2Þ


E 0  E 0 þ λE 0 þ λ2 E 0
ð0Þ
X 1
¼ E0 þ λh0jVj0i þ λ2 h i jhkjVj0ij2 : ð5:21Þ
k6¼0 ð0Þ ð0Þ
E0  Ek

ð0Þ ð0Þ
Since E 0 < Ek ðk ¼ 1, 2,   Þ for anyjki, the second-order correction term is
always negative. This contributes to the energy stabilization.

5.1.2 Several Examples

To deepen the understanding of the perturbation method, we show several examples.


We focus on the change in the eigenenergy and corresponding eigenstate with
respect to specific quantum states. We assume that the change is caused by the
applied electric field as a perturbation.
5.1 Perturbation Method 157

Example 5.1 Suppose that the charged particle is confined within a one-dimen-
sional potential well. This problem has already appeared in Example 1.2. Now let us
consider how an energy of a particle carrying a charge e is shifted by the applied
electric field. This situation could experimentally be achieved using a “nano capac-
itor” where, e.g., an electron is confined within the capacitor, while applying a
voltage between the two electrodes.
We adopt the coordinate system the same as that of Example 1.2. That is, suppose
that we apply an electric field F between the electrodes that are positioned at x ¼  L
and that the electron is confined between the two electrodes. Then, the Hamiltonian
of the system is described as

h2 d 2
H¼  eFx, ð5:22Þ
2m dx2

where m is a mass of the particle. Note that we may choose eF for the parameter λ
of Sect. 5.1.1. Then, the coordinate representation of the Schrödinger equation is
given by

h2 d 2 ψ i ð x Þ
  eFxψ i ðxÞ ¼ E i ψ i ðxÞ, ð5:23Þ
2m dx2

where Ei is the energy of the i-th state from the bottom (i.e., the ground state); ψ i is its
corresponding eigenstate. We wish to seek the perturbation energy up to the second
order and the quantum state up to the first order. We are particularly interested in the
change in the ground state j0i. Notice that the ground state is obtained in (1.101) by
putting l ¼ 0. According to (5.21) we have

ð0Þ
X 1
E 0  E0 þ λh0jVj0i þ λ2 h i jhkjVj0ij2 , ð5:24Þ
k6¼0 ð0Þ ð0Þ
E0  Ek

where V ¼  eFx. Since the field F is thought to be an adjustable parameter, in


(5.24) we may replace λ with eF (vide supra). Then, in (5.24) V is replaced by x in
turn. In this way, (5.24) can be rewritten as

ð0Þ
X 1
E 0  E0  eF h0jxj0i þ ðeF Þ2 h i jhkjxj0ij2 : ð5:25Þ
k6¼0 ð0Þ ð0Þ
E0  Ek

Considering that j0i is an even function [i.e., a cosine function in (1.101)] with
respect to x and that x is an odd function, we find that the second term vanishes.
Notice that the explicit coordinate representation of j0i is
158 5 Approximation Methods of Quantum Mechanics

rffiffiffi
1 π
j 0i ¼ cos x, ð5:26Þ
L 2L
qffiffi  
π
which is obtained by putting l ¼ 0 in j lcos i  1
L cos 2L þ lπL x ðl ¼ 0, 1, 2,   Þ in
(1.101).
By the same token, hkj xj 0i in the third term of RHS of (5.25) vanishes if jki
denotes a cosine function in (1.101). If, on the other hand, jki denotes a sine
function, hkj xj 0i does not vanish. To distinguish the sine functions from cosine
functions, we denote
rffiffiffi
1 nπ
j nsin i  sin x ðn ¼ 1, 2, 3,   Þ ð5:27Þ
L L

that is identical with (1.102); see Fig. 5.1 with the notation of the quantum states.
h ð0Þ π 2 2
h ð0Þ
ð2nÞ π 2 2 2
Now, putting E0 ¼ 2m  4L 2 and E 2n1 ¼ 2m 
4L2
ðn ¼ 1, 2, 3,   Þ in (5.25),
we rewrite it as

ð0Þ
X1 1
E0  E 0  eF h0jxj0i þ ðeF Þ2 h i jhkjxj0ij2
k¼1 ð0Þ ð0Þ
E0  Ek

Fig. 5.1 Notation of the


quantum states that
( ) ℏ
distinguish cosine and sine
functions.
= |2 ⟩ ≡ |4⟩
ð0Þ
E k ðk ¼ 0, 1, 2,   Þ
represents energy
eigenvalues of the
unperturbed states

( ) ℏ
= |2 ⟩ ≡ |3⟩

( ) ℏ
= |1 ⟩ ≡ |2⟩

( ) ℏ
= |1 ⟩ ≡ |1⟩
( ) ℏ
= |0 ⟩ ≡ | 0⟩
5.1 Perturbation Method 159

ð0Þ
X1 1
¼ E 0  eF h0jxj0i þ ðeF Þ2 h i jhnsin jxj0ij2 , ð5:25Þ
n¼1 ð0Þ ð0Þ
E 0  E 2n1

where with the second equality n denotes the number appearing in (5.27) and jnsini
ð0Þ
belongs to E2n1. Referring to, e.g., (4.17) with regard to the calculation of hkj xj 0i,
we have arrived at the following equation described by

h2 π 2 322  8  mL4 X1 n2
E0   2  ðeF Þ2 : ð5:28Þ
2m 4L π h
6 2 n¼1
ð4n  1Þ5
2

The series of the second term in (5.28) rapidly converges. Defining the first
N partial sum of the series as

XN n2
Sð N Þ  ,
n¼1
ð4n2  1Þ5

we obtain

Sð1Þ ¼ 1=243  0:004115226 and Sð6Þ  Sð1000Þ  0:004120685

as well as

½Sð1000Þ  Sð1Þ
¼ 0:001324653:
Sð1000Þ

That is, only the first term of S(N ) occupies ~99.9% of the infinite series. Thus,
ð2Þ
the stabilization energy λ2 E0 due to the perturbation is satisfactorily given by

ð2Þ 322  8  mL4


λ2 E 0  ðeF Þ2 , ð5:29Þ
243π 6 h2
ð2Þ
where the notation λ2 E 0 is due to (5.6). Meanwhile, the corrected term of the
quantum state is given by

ð1Þ
X hkjxj0i
j Ψ0 i j 0i þ λ j ϕ0 i j 0i  eF k6¼0
j ki h i
ð0Þ ð0Þ
E0  Ek

32  8  mL3 X1 ð1Þn1 n
¼j 0i þ eF j n sin i : ð5:30Þ
π 4 h2 n¼1
ð4n2  1Þ3

For a reason similar to the above, the approximation is satisfactory, if we adopt


only n ¼ 1 in the second term of RHS of (5.30). That is, we obtain
160 5 Approximation Methods of Quantum Mechanics

32  8  mL3
j Ψ0 i j 0i þ eF j 1sin i: ð5:31Þ
27π 4 h2

Thus, we have roughly estimated the stabilization energy and the corresponding
quantum state as in (5.29) and (5.31), respectively. It may well be worth estimating
rough numbers of L and F in the actual situation (or perhaps in an actual “nano
device”). The estimation is left for readers.
Example 5.2 [1] In Chap. 2 we dealt with a quantum-mechanical harmonic oscil-
lator. Here let us suppose that a charged particle (with its charge e) is performing
harmonic oscillation under an applied electric field. First we consider the coordinate
representation of Schrödinger equation. Without the external field, the Schrödinger
equation as an eigenvalue equation reads as

h2 d 2 uð qÞ 1
 þ mω2 q2 uðqÞ ¼ EuðqÞ: ð2:108Þ
2m dq2 2

Under the applied electric field, the perturbation is expressed as eFq and, hence,
the equation can be written as

h2 d 2 uð qÞ 1
 þ mω2 q2 uðqÞ  eF ðqÞquðqÞ ¼ EuðqÞ: ð5:32Þ
2m dq2 2

For simplicity we assume that the electric field is uniform independent of q so that
we can deal with it as a constant. Then, (5.32) can be rewritten as

    2 
h2 d2 uðqÞ 1 eF 2 1 2 eF
 þ mω q 
2
uðqÞ ¼ E þ mω uðqÞ: ð5:33Þ
2m dq2 2 mω2 2 mω2

Changing the variable such that

eF
Qq , ð5:34Þ
mω2

we have
  2 
h2 d 2 uð qÞ 1 1 2 eF
 þ mω Q uðqÞ ¼ E þ mω
2 2
uðqÞ: ð5:35Þ
2m dQ2 2 2 mω2

Taking account of the change in functional form that results from the variable
transformation, we rewrite (5.35) as
5.1 Perturbation Method 161

  2 
h2 d 2 e
uð Q Þ 1 1 2 eF
 þ mω Q e
2 2
uðQÞ ¼ E þ mω e
uðQÞ, ð5:36Þ
2m dQ2 2 2 mω2

where
 
eF
uð qÞ ¼ u Q þ e
uðQÞ: ð5:37Þ
mω2

e as
Defining E
 2
e  E þ 1 mω2 eF e2 F 2
E ¼ E þ , ð5:38Þ
2 mω2 2mω2

we get

h2 d 2 e
uðQÞ 1 ee
 þ mω2 Q2e
uð Q Þ ¼ E uðQÞ: ð5:39Þ
2m dQ2 2

Thus, we recover exactly the same form as (2.108). Equation (5.39) can be solved
analytically as already shown in Sect. 2.4. From (2.110) and (2.120), we have

e
2E
λ and λ ¼ 2n þ 1:

Then, we have

e ¼ hω λ ¼ hωð2n þ 1Þ :
E ð5:40Þ
2 2

From (5.38), we get


 
e e2 F 2 1 e2 F 2
E¼E 2
¼ hω n þ  : ð5:41Þ
2mω 2 2mω2
2 2
This implies that the energy stabilization due to the applied electric field is  2mω
e F
2.

Note that the system is always stabilized regardless of the sign of e.


Next, we wish to consider the problem on the basis of the matrix (or operator)
representation. Let us evaluate the perturbation energy using (5.20). Conforming the
notation of (5.20) to the present case, we have
X 1
E n  Eðn0Þ  eF hnjqjni þ ðeF Þ2 k6¼n
h i jhkjqjnij2 , ð5:42Þ
ð0Þ
Eðn0Þ  Ek

where Eðn0Þ is given by


162 5 Approximation Methods of Quantum Mechanics

 
1
Eðn0Þ ¼ hω n þ : ð5:43Þ
2

Taking account of (2.55), (2.68), and (2.62), the second term of (5.42) vanishes.
With the third term, using (2.68) as well as (2.55) and (2.61) we have
rffiffiffiffiffiffiffiffiffiffi rffiffiffiffiffiffiffiffiffiffi
h { h 
jhkjqjnij ¼ 2
kja þ a jn kja þ a{ jn
2mω 2mω
h pffiffiffi pffiffiffiffiffiffiffiffiffiffiffi 2
¼ nhkjn  1i þ n þ 1hkjn þ 1i
2mω
h pffiffiffi pffiffiffiffiffiffiffiffiffiffiffi 2
¼ nδk,n1 þ n þ 1δk,nþ1
2mω
h pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffii
h
¼ nδk,n1 þ ðn þ 1Þδk,nþ1 þ 2δk,n1 δk,nþ1 nðn þ 1Þ
2mω
h
¼ ½nδ þ ðn þ 1Þδk,nþ1 : ð5:44Þ
2mω k,n1

Notice that with the last equality of (5.44) there is no k that satisfies
k ¼ n  1 ¼ n + 1 at once. Also note that hkj qj ni is a real number. Hence, as the
third term of (5.42) we get
X 1
h i jhkjqjnij2
k6¼n ð0Þ
Eðn0Þ  Ek
X 1 h
¼  ½nδ þ ðn þ 1Þδk,nþ1 
hωðn  kÞ 2mω k,n1
k6¼n
 
1 n nþ1 1
¼ þ ¼ : ð5:45Þ
2mω n  ðn  1Þ n  ðn þ 1Þ
2 2mω2

Thus, from (5.42) we obtain


 
1 e2 F 2
En  hω n þ  : ð5:46Þ
2 2mω2

Consequently, we find that (5.46) obtained by the perturbation method is consis-


tent with (5.41) that was obtained as the exact solution. As already pointed out in
Sect. 5.1.1, however, (5.46) does not represent an eigenenergy. In fact, using (5.17)
together with (4.26) and (4.28), we have
5.1 Perturbation Method 163

ð1Þ
X hkjVj0i h1jVj0i
j Ψ0 i j 0i þ λ j ϕ0 i ¼j 0i þ λ k6¼0
j ki h i ¼j 0i þ λ j 1i h i
ð0Þ ð0Þ ð0Þ ð0Þ
E0  Ek E0  E1
rffiffiffiffiffiffiffiffiffiffiffiffiffiffi
h1jxj0i e2 F 2
¼ j0i  eF j1i h i ¼j 0i þ j 1i: ð5:47Þ
ð0Þ
E E
ð0Þ 2mω3 h
0 1

Thus, jΨ0i is not an eigenstate because jΨ0i contains both j0i and j1i. Hence,
jΨ0i does not possess an eigenenergy. Notice also that in (5.47) the factors hkj xj 0i
vanish except for h1j xj 0i; see Chaps. 2 and 4.
To think of this point further, let us come back to the coordinate representation of
Schrödinger equation.
From (5.39) and (2.106), we have

 1=4 rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi rffiffiffiffiffiffiffi 


mω 1 mω
Q e 2h Q ðn ¼ 0, 1, 2,   Þ,
mω 2
e
un ðQÞ ¼ Hn
h π 1=2 2n n! h
pffiffiffiffiffi 

where H n h Q is the Hermite polynomial of the n-th order. Using (5.34) and
(5.37), we replace Q with q  mω
eF
2 to obtain the following form described by

 
eF
un ð qÞ ¼ eun q  2

 14 rffiffiffiffiffiffiffiffiffiffiffiffiffi rffiffiffiffiffiffiffi  mω eF 2
mω 1 mω eF
¼ Hn q e 2h qmω2 ðn ¼ 0, 1, 2,   Þ: ð5:48Þ
h π 2 2n n!
1
h mω2

Since ψ n(q) of (2.106) forms the (CONS) [2], un(q) should be expanded using
ψ n(q). That is, as a solution of (5.32) we get
X
un ð qÞ ¼ c ψ ðqÞ,
k nk k
ð5:49Þ

where a set of cnk are appropriate coefficients. Since the functions ψ k(q) are
nondegenerate, un(q) expressed by (5.49) lacks a definite eigenenergy.
Example 5.3 [1] In the previous examples we studied the perturbation in
one-dimensional systems, where all the energy eigenstates are nondegenerate.
Here we deal with the change in the energy and quantum states of a hydrogen
atom (i.e., a three-dimensional system). Since the energy eigenstates are generally
degenerate (see Chap. 3), for simplicity we consider only the ground state that is
nondegenerate. As in the previous two cases, we assume that the perturbation is
caused by the applied electric field. As the simplest example, we deal with the
properties including the polarizability of the hydrogen atom in its ground state. The
polarizability of atom, molecule, etc. is one of the most fundamental properties in
materials science.
164 5 Approximation Methods of Quantum Mechanics

We assume that the electric field is applied along the direction of the z-axis; in this
case the perturbation term V is expressed as

V ¼ eFz, ð5:50Þ

where e is a charge of an electron (e < 0). Then, the total Hamiltonian H is described
by

e ¼ H 0 þ V,
H ð5:51Þ
"   #
2
h2 ∂ 2 ∂ 1 ∂ ∂ 1 ∂
H0 ¼  r þ sin θ þ
2μr 2 ∂r ∂r sin θ ∂θ ∂θ sin 2 θ ∂ϕ2
e2
 , ð5:52Þ
4πε0 r

where H0 is identical with the Hamiltonian H of (3.35), in which Z ¼ 1. In this


example, we discuss topics on the polarization and energy shift (Stark effect) caused
by the applied electric field [1, 3].
(1) Polarizability of a hydrogen atom in the ground state
As in (5.5) and (5.16), the first-order perturbed state is described by

X hkjVj0i X hkjzj0i
j Ψ0 i j 0i þ k6¼0
j ki h i ¼j 0i  eF
k6¼0
j ki h i, ð5:53Þ
ð0Þ ð0Þ ð0Þ ð0Þ
E0  Ek E0  Ek

where j0i denotes the ground state expressed as (3.301). Since the quantum states jki
represent all the quantum states of the hydrogen atom as implied in (5.3), in the
second term of RHS in (5.53) we are considering all those states except for j0i. The
ð 0Þ h2
eigenenergy E0 is identical with E 1 ¼  2μa 2 obtained from (3.258), where n ¼ 1.

As already noted, we simply numbered the states jki in order of increasing energies.
ð0Þ ð0Þ ð0Þ ð0Þ
In the case of k 6¼ j (k, j 6¼ 0), we may have either E k ¼ Ej or E k 6¼ Ej
accordingly.
Using (5.51), let us calculate the polarizability of a hydrogen atom in a ground
state. In Sect. 4.1 we gave a definition of the electric dipole moment such that
X
Pe x,
j j
ð4:6Þ

where xj is a position vector of the j-th charged particle. Placing the proton of
hydrogen at the origin, by use of the notation (3.5) P is expressed as
5.1 Perturbation Method 165

1 0 0 1
Px ex
B C B C
P ¼ ðe1 e2 e3 Þ@ Py A ¼ ex ¼ ðe1 e2 e3 Þ@ ey A:
Pz ez

Hence, the z-component of the electric dipole moment Pz of electron is given by

Pz ¼ ez: ð5:54Þ

The quantum-mechanical analog of (5.54) is given by

hPz i ¼ ehΨ0 jzjΨ0 i, ð5:55Þ

where jΨ0i is taken from (5.53). Substituting (5.53) for (5.55), we get

hΨ0 jzjΨ0 i
* +
X hkjzj0i X hkjzj0i
¼ h0 j eF k6¼0
hk j h ijzjj 0i  eF
k6¼0
j ki h i
ð0Þ ð0Þ ð0Þ ð0Þ
E0  Ek E0  Ek
X hkjzj0i X hkjzj0i
¼ h0jzj0i  eF k6¼0
h0jzjk i h i  eF
k6¼0
0jz{ jk h i
ð0Þ ð0Þ ð0Þ ð0Þ
E0  Ek E0  Ek

X jhkjzj0ij2
þðeF Þ2 k6¼0
hkjzjki h i2 : ð5:56Þ
ð0Þ ð0Þ
E0  Ek

In (5.56) readers are referred to the computation rule of inner product such as
(13.20) and (13.64). Since z is Hermitian (i.e., z{ ¼ z), the second and third terms of
RHS of (5.56) equal. Neglecting the second-order perturbation factor on the fourth
term, we have

X jhkjzj0ij2
hΨ0 jzjΨ0 i  h0jzj0i  2eF h i: ð5:57Þ
k6¼0 ð0Þ ð0Þ
E0  Ek

Thus, from (5.55) we obtain

X jhkjzj0ij2
hPz i  eh0jzj0i  2e2 F h i:
k6¼0 ð0Þ ð0Þ
E0  Ek

Using (3.301), the coordinate representation of h0j zj 0i is described by


166 5 Approximation Methods of Quantum Mechanics

Z 1
a3
h0jzj0i ¼ ze2r=a dxdydz
π 1
Z 1 Z Z π
a3 3 2r=a

¼ r e dr dϕ cos θ sin θdθ
π 0 0 0
Z 1 Z Z π
a3 3 2r=a

¼ r e dr dϕ sin 2θdθ ¼ 0, ð5:58Þ
2π 0 0 0

where we used 0 sin 2θdθ ¼ 0. Then, we have

X jhkjzj0ij2
hPz i  2e2 F h i: ð5:59Þ
k6¼0 ð0Þ ð0Þ
E0  Ek

On the basis of classical electromagnetism, we define the polarizability α as [4]

α  hPz i=F: ð5:60Þ

Here we have an important relationship between the polarizability and electric


dipole moment. That is,

ðpolarizabilityÞ ðelectric fieldÞ ¼ ðelectric dipole momentÞ:

From (5.59) and (5.60) we have

X jhkjzj0ij2
α ¼ 2e2 h i: ð5:61Þ
k6¼0 ð0Þ ð 0Þ
E0  Ek

At the first glance, to evaluate (5.61) appears to be formidable, but making the
most of the fact that the total wave functions of a hydrogen atom form the CONS, the
evaluation is straightforward. The discussion is as follows:
First, let us seek an operator G that satisfies [1]

z j 0i ¼ ðGH 0  H 0 GÞ j 0i: ð5:62Þ

To this end, we take account of all the quantum states jki of a hydrogen atom
(including j0i) that are solutions (or eigenstates of energy) of (3.36). In Sect 3.8 we
have obtained the total wave functions described by

eðnÞ ¼ Y m ðθ, ϕÞR


Λ eðl nÞ ðr Þ, ð3:300Þ
l,m l
5.1 Perturbation Method 167

ðnÞ
where Λe is expressed as a product of spherical surface harmonics and radial wave
l,m
functions. The latter functions are described by associated Laguerre functions. It is
well known [2] that the spherical surface harmonics constitute the CONS on a unit
sphere and that associated Laguerre functions form the CONS in the real one-
dimensional space. Thus, their product expressed as (3.300), i.e., aforementioned
collection of jki constitutes the CONS in the real three-dimensional space. This is
equivalent to
X
j
j jih j j¼ E, ð5:63Þ

where E is an identity operator.


The implications of (5.63) are as follows: Take any function jf(r)i and operate
(5.63) from the left. Then, we get
X X
Ef ðrÞ ¼ f ðrÞ ¼ k
j kihkj f ðrÞi ¼ k
f k j ki: ð5:64Þ

In other words, (5.64) implies that any function f(r) can be expanded into a series
of jki. The coefficients fk are defined as

f k  hkj f ðrÞi: ð5:65Þ

Those are so-called “Fourier coefficients.” Related discussion is given in Sect.


10.4 as well as Chaps. 14, 18, and 20.
We further impose the conditions on the operator G defined in (5.62).
(i) G commutes with r (¼j rj ). (ii) G has a functional form described by G ¼ G
(r, θ). (iii) G does not contain a differential operator. On those conditions, we have
∂G/∂ϕ ¼ 0. From (3.301), we have ∂ j 0 i /∂ϕ ¼ 0. Thus, we get

ðGH 0  H 0 GÞ j 0i
    
h2 1 ∂ 2∂ 1 ∂ 2∂ 1 1 ∂ ∂
¼ G 2 r  2 r G 2 sinθ G j 0i: ð5:66Þ
2μ r ∂r ∂r r ∂r ∂r r sinθ ∂θ ∂θ

This calculation is somewhat complicated, and so we calculate (5.66) termwise.


With the first term of (5.66), we have
168 5 Approximation Methods of Quantum Mechanics

  
1 ∂ 2 ∂ 1 ∂ 2 ∂G 1 ∂ 2 ∂ j 0i
 2 r G j 0i ¼  2 r j 0i  2 r G
r ∂r ∂r r ∂r ∂r r ∂r ∂r
2 
1 ∂G ∂ G ∂G ∂ j 0i G ∂ ∂ j 0i ∂G ∂ j 0i
¼  2  2r j 0i  2 j 0i   2 r2 
r ∂r ∂r ∂r ∂r r ∂r ∂r ∂r ∂r
2 
2 ∂G ∂ G ∂G ∂ j 0i G ∂ ∂ j 0i
¼ j 0i  2 j 0i  2  2 r2
r ∂r ∂r ∂r ∂r r ∂r ∂r
2 
2 ∂G ∂ G 2 ∂G G ∂ 2 ∂ j 0i
¼ j 0i  2 j 0i þ j 0i  2 r , ð5:67Þ
r ∂r ∂r a ∂r r ∂r ∂r

where with the last equality we used (3.301), namely j0 i / er/a, where a is Bohr
radius (Sect. 3.7.1). The last term of (5.67) cancels the first term of (5.66). Then, we
get

ðGH 0  H 0 GÞ j 0i
   2  
h2 1 1 ∂G ∂ G 1 1 ∂ ∂G
¼ 2  j0i þ 2 j0i þ 2 sin θ j0i
2μ r a ∂r ∂r r sin θ ∂θ ∂θ
   2 
h2 1 1 ∂G ∂ G 1 1 ∂ ∂G
¼ 2  þ 2 þ 2 sin θ j 0i: ð5:68Þ
2μ r a ∂r ∂r r sin θ ∂θ ∂θ

From (5.62) we obtain

hr j z j 0i ¼ hr j ðGH 0  H 0 GÞ j 0i ð5:69Þ

so that we can have the coordinate representation. As a result, we get


   2 
h2 1 1 ∂G ∂ G 1 1 ∂ ∂G
r cos θϕð1sÞ ¼ 2  þ 2þ 2 sin θ ϕð1sÞ, ð5:70Þ
2μ r a ∂r ∂r r sin θ ∂θ ∂θ

where ϕ(1s) is given by (3.301). Dividing (5.70) by ϕ(1s) and rearranging it, we
have
5.1 Perturbation Method 169

2   
∂ G 1 1 ∂G 1 1 ∂ ∂G 2μ
þ 2  þ sin θ ¼ r cos θ: ð5:71Þ
∂r 2 r a ∂r r 2 sin θ ∂θ ∂θ h2

Now, we assume that G has a functional form of

Gðr, θÞ ¼ gðr Þ cos θ: ð5:72Þ

Inserting (5.72) into (5.71), we have

 
d2 g 1 1 dg 2g 2μ
þ2   ¼ 2 r: ð5:73Þ
dr 2 r a dr r 2 h

Equation (5.73) has a regular singular point at the origin and resembles an Euler
equation whose general from of a homogeneous equation is described by [5]

d2 g a dg b
þ þ g ¼ 0, ð5:74Þ
dr 2 r dr r 2

where a and b are arbitrary constants. In particular, if we have a following differen-


tial equation described by

d2 g a dg a
þ  g ¼ 0, ð5:75Þ
dr 2 r dr r 2

we immediately see that one of particular solutions is g ¼ r. However, (5.73) differs


from the general Euler equation by the presence of  2a dg
dr. Then, let us assume that a
particular solution g has a form of

g ¼ pr 2 þ qr: ð5:76Þ

Inserting (5.76) into (5.73), we get

4pr 2q 2μ
4p   ¼ 2 r: ð5:77Þ
a a h

Comparing coefficient of r and constant term, we obtain

aμ a2 μ
p¼ and q ¼  2 : ð5:78Þ
2h 2
h

Hence, we get
170 5 Approximation Methods of Quantum Mechanics

 
aμ r
gð r Þ ¼  þ a r: ð5:79Þ
h2 2

Accordingly, we have
   
aμ r aμ r
Gðr, θÞ ¼ gðr Þ cos θ ¼  þ a r cos θ ¼  þ a z: ð5:80Þ
h2 2 h2 2

This is a coordinate representation of G(r, θ).


Returning back to (5.62) and operating h jj from the left on its both sides, we have

h jjzj0i ¼ h jjðGH 0  H 0 GÞj0i ¼ h jjGH 0 j0i  h jjH 0 Gj0i


h i
ð0Þ ð0Þ
¼ E 0  E j h jjGj0i: ð5:81Þ

Notice here that

ð0Þ
H 0 j 0i ¼ E 0 j 0i ð5:82Þ

and
  h i{
ð0Þ ð0Þ
h j j H 0 ¼ H 0 { j ji{ ¼ H 0 j ji{ ¼ Ej j ji ¼ E j h j j , ð5:83Þ

where we used a computation rule of (1.118) and with the second equality we used
the fact that H0 is Hermitian, i.e., H0{ ¼ H0. Rewriting (5.81), we have

h jjzj0i
h jjGj0i ¼ ð0Þ ð0Þ
: ð5:84Þ
E0  Ej

Multiplying by ⟨0|z|j⟩ on both sides of (5.84) and summing over j (≠0), we obtain

$$\sum_{j\neq 0}\langle 0|z|j\rangle\langle j|G|0\rangle = \sum_{j\neq 0}\frac{\langle 0|z|j\rangle\langle j|z|0\rangle}{E_0^{(0)} - E_j^{(0)}}. \tag{5.85}$$

Adding ⟨0|z|0⟩⟨0|G|0⟩ on both sides of (5.85), we have

$$\sum_{j}\langle 0|z|j\rangle\langle j|G|0\rangle = \sum_{j\neq 0}\frac{\langle 0|z|j\rangle\langle j|z|0\rangle}{E_0^{(0)} - E_j^{(0)}} + \langle 0|z|0\rangle\langle 0|G|0\rangle. \tag{5.86}$$

But, from (5.58) ⟨0|z|0⟩ = 0. Moreover, using the completeness of |j⟩ described
by (5.63) for the LHS of (5.86), we get

$$\langle 0|zG|0\rangle = \sum_{j}\langle 0|z|j\rangle\langle j|G|0\rangle = \sum_{j\neq 0}\frac{\langle 0|z|j\rangle\langle j|z|0\rangle}{E_0^{(0)} - E_j^{(0)}} = \sum_{j\neq 0}\frac{|\langle j|z|0\rangle|^2}{E_0^{(0)} - E_j^{(0)}} = -\frac{\alpha}{2e^2}, \tag{5.87}$$

where with the second equality we used (5.84) and with the last equality we used
(5.61). Also, we used the fact that z is Hermitian.
Meanwhile, using (5.80), the LHS of (5.87) is expressed as

$$\langle 0|zG|0\rangle = -\frac{a\mu}{\hbar^2}\left\langle 0\left|\,z\left(\frac{r}{2} + a\right)z\,\right|0\right\rangle = -\frac{a\mu}{2\hbar^2}\langle 0|zrz|0\rangle - \frac{a^2\mu}{\hbar^2}\langle 0|z^2|0\rangle. \tag{5.88}$$

Noting that |0⟩ is spherically symmetric and that z and r are commutative, (5.88)
can be rewritten as

$$\langle 0|zG|0\rangle = -\frac{a\mu}{6\hbar^2}\langle 0|r^3|0\rangle - \frac{a^2\mu}{3\hbar^2}\langle 0|r^2|0\rangle, \tag{5.89}$$

where we used ⟨0|x²|0⟩ = ⟨0|y²|0⟩ = ⟨0|z²|0⟩ = ⟨0|r²|0⟩/3.


Returning back to the coordinate representation and taking account of the variables
change from the Cartesian coordinates to the polar coordinates as in (5.58), we get

$$\langle 0|zG|0\rangle = -\frac{a\mu}{6\hbar^2}\cdot\frac{4\pi}{\pi a^3}\cdot\frac{120\,a^6}{64} - \frac{a^2\mu}{3\hbar^2}\cdot\frac{4\pi}{\pi a^3}\cdot\frac{3a^5}{4} = -\frac{a\mu}{\hbar^2}\cdot\frac{5a^3}{4} - \frac{a\mu}{\hbar^2}\cdot\frac{4a^3}{4} = -\frac{a\mu}{\hbar^2}\cdot\frac{9a^3}{4}. \tag{5.90}$$

To perform the definite integral calculations of (5.89) and obtain the result of
(5.90), modify (3.263) and use the formula described below. That is, differentiate

$$\int_0^{\infty} e^{-r\xi}\,dr = \frac{1}{\xi}$$

four (or five) times with respect to ξ and replace ξ with 2/a to get the result of (5.90)
appropriately.
Using (5.87) and (5.90), as the polarizability α we finally get

$$\alpha = -2e^2\langle 0|zG|0\rangle = 2e^2\cdot\frac{a\mu}{\hbar^2}\cdot\frac{9a^3}{4} = \frac{e^2 a\mu}{\hbar^2}\cdot\frac{9a^3}{2} = 4\pi\varepsilon_0\cdot\frac{9a^3}{2} = 18\pi\varepsilon_0 a^3, \tag{5.91}$$

where we used

$$a = 4\pi\varepsilon_0\hbar^2/\mu e^2. \tag{5.92}$$

See Sect. 3.7.1 for this relation.
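The radial integrals behind (5.89)–(5.91) are also easy to verify symbolically. The
following sketch (ours, not from the text; sympy assumed) reproduces ⟨r³⟩ = (15/2)a³,
⟨r²⟩ = 3a², and hence ⟨0|zG|0⟩ = −(aμ/ħ²)(9a³/4):

    # Verify the 1s expectation values used in (5.89)-(5.90).
    import sympy as sp

    r, a, mu, hbar = sp.symbols('r a mu hbar', positive=True)
    phi = sp.exp(-r/a)/sp.sqrt(sp.pi*a**3)          # phi(1s), cf. (3.301)

    def expect(n):
        # <r^n> = integral of |phi|^2 r^n (4 pi r^2) dr over 0..infinity
        return sp.integrate(phi**2 * r**n * 4*sp.pi*r**2, (r, 0, sp.oo))

    print(expect(3), expect(2))                     # 15*a**3/2, 3*a**2
    zG = -(a*mu/(6*hbar**2))*expect(3) - (a**2*mu/(3*hbar**2))*expect(2)
    print(sp.simplify(zG))                          # -9*a**4*mu/(4*hbar**2), cf. (5.90)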


(2) Stark effect of a hydrogen atom in a ground state
The energy shift of the ground state of a hydrogen atom up to the second order
is given by

$$E_0 \approx E_0^{(0)} - eF\langle 0|z|0\rangle + (eF)^2\sum_{j\neq 0}\frac{|\langle j|z|0\rangle|^2}{E_0^{(0)} - E_j^{(0)}}. \tag{5.93}$$

Using (5.58) and (5.87) obtained above, we readily get

$$E_0 \approx E_0^{(0)} - \frac{\alpha e^2 F^2}{2e^2} = E_0^{(0)} - \frac{\alpha F^2}{2} = E_0^{(0)} - 9\pi\varepsilon_0 a^3 F^2. \tag{5.94}$$

Experimentally, the energy shift of −9πε₀a³F² is well known as the Stark shift.
In the above examples, we have seen how energy levels and related properties are
changed by the applied electric field. The energies of the system that result from the
applied external field should not be regarded as energy eigenvalues, but rather as
expectation values.
The perturbation theory has wide application in physical problems. Examples
include the evaluation of transition probabilities between quantum states. The theory
also deals with the scattering of particles. For the treatment of the degenerate case and
related topics, interested readers are referred to appropriate literature [1, 3].

5.2 Variational Method

Another approximation method is the variational method. This method also has a wide
range of applications in mathematical physics.
A major application of the variational method lies in seeking a suitably approximated
eigenvalue. Suppose that we have an Hermitian differential opera-
tor D that satisfies an eigenvalue equation

$$Dy_n = \lambda_n y_n \quad (n = 1, 2, 3, \cdots), \tag{5.95}$$

where λₙ is a real eigenvalue and yₙ is a corresponding eigenfunction. We have
already encountered one of the typical differential operators and eigenvalue equations in
(1.63) and (1.64). In (5.95) we assume that

$$\lambda_1 \le \lambda_2 \le \lambda_3 \le \cdots, \tag{5.96}$$

where corresponding eigenstates may be degenerate.
We also assume that the collection {yₙ; n = 1, 2, 3, ⋯} constitutes the CONS. Now
suppose that we have a function u that satisfies the same boundary conditions (BCs)
as those for yₙ that is a solution of (5.95). Then, we have

$$\langle y_n|Du\rangle = \langle Dy_n|u\rangle = \langle \lambda_n y_n|u\rangle = \lambda_n^{\,*}\langle y_n|u\rangle = \lambda_n\langle y_n|u\rangle. \tag{5.97}$$

In (5.97) we used (1.132) and the computation rule of the inner product (see Sect.
13.1). Also, we used the fact that any eigenvalue of an Hermitian operator is real
(Sect. 1.4). From the assumption that the functions {yₙ} constitute the CONS, u can
be expanded in a series described by

$$u = \sum_j c_j\,|y_j\rangle \tag{5.98}$$

with

$$c_j \equiv \langle y_j|u\rangle. \tag{5.99}$$

This relation is in parallel with (5.64). Similarly, Du can be expanded such that

$$Du = \sum_j d_j\,|y_j\rangle = \sum_j \langle y_j|Du\rangle\,|y_j\rangle = \sum_j \lambda_j\langle y_j|u\rangle\,|y_j\rangle = \sum_j \lambda_j c_j\,|y_j\rangle, \tag{5.100}$$

where d_j is an expansion coefficient; with the third equality we used (5.95) and
with the last equality we used (5.99). Comparing the individual coefficients of |y_j⟩ in
(5.100), we get

$$d_j = \lambda_j c_j. \tag{5.101}$$

Thus, we obtain

$$Du = \sum_j \lambda_j c_j\,|y_j\rangle. \tag{5.102}$$

Comparing (5.98) and (5.102), we find that we may termwise operate D on (5.98).

Meanwhile, taking the inner product ⟨u|Du⟩ again termwise with (5.100), we get

$$\langle u|Du\rangle = \left\langle u\,\middle|\,\sum_j \lambda_j c_j\,y_j\right\rangle = \sum_j \lambda_j c_j\langle u|y_j\rangle = \sum_j \lambda_j c_j c_j^{\,*} = \sum_j \lambda_j |c_j|^2 \ge \lambda_1 \sum_j |c_j|^2, \tag{5.103}$$

where with the third equality we used (5.99) in combination with ⟨u|y_j⟩ =
⟨y_j|u⟩* = c_j*. With ⟨u|u⟩, we have

$$\langle u|u\rangle = \left\langle \sum_j c_j y_j\,\middle|\,\sum_j c_j y_j\right\rangle = \sum_j c_j^{\,*} c_j\,\langle y_j|y_j\rangle = \sum_j |c_j|^2, \tag{5.104}$$

where with the last equality we used the fact that the functions {yₙ} are normalized.
Comparing once again (5.103) and (5.104), we finally get

$$\langle u|Du\rangle \ge \lambda_1\langle u|u\rangle \quad\text{or}\quad \frac{\langle u|Du\rangle}{\langle u|u\rangle} \ge \lambda_1. \tag{5.105}$$

If we choose y₁ for u, we have an equality in (5.105). Thus, we reach the
following important theorem.
Theorem 5.1 Suppose that we have a linear differential equation described by

$$Dy = \lambda y, \tag{5.106}$$

where D is a suitable Hermitian differential operator. Suppose also that under
appropriate boundary conditions (BCs), (5.106) has a series of (real) eigenvalues.
Then, the smallest eigenvalue is equal to the minimum of ⟨u|Du⟩/⟨u|u⟩.
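Theorem 5.1 can be illustrated numerically. The sketch below (ours, not from the
text; numpy assumed) discretizes D = −d²/dx² on [0, π] with Dirichlet boundary
conditions, for which the smallest eigenvalue is λ₁ = 1, and checks that the Rayleigh
quotient ⟨u|Du⟩/⟨u|u⟩ never falls below λ₁ and attains it at u = y₁ = sin x:

    # Rayleigh-quotient demonstration of Theorem 5.1 on a finite-difference grid.
    import numpy as np

    n = 400
    x = np.linspace(0.0, np.pi, n + 2)[1:-1]         # interior grid points
    h = x[1] - x[0]
    D = (np.diag(2.0*np.ones(n)) + np.diag(-np.ones(n - 1), 1)
         + np.diag(-np.ones(n - 1), -1)) / h**2      # -d^2/dx^2, Dirichlet BCs

    lam1 = np.linalg.eigvalsh(D)[0]                  # ~ 1 (smallest eigenvalue)
    rq = lambda u: (u @ D @ u) / (u @ u)             # Rayleigh quotient

    rng = np.random.default_rng(0)
    print(lam1)
    print(min(rq(rng.standard_normal(n)) for _ in range(100)) >= lam1 - 1e-9)  # True
    print(rq(np.sin(x)))                             # ~ 1, equality at u = y1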

Using this powerful theorem, we are able to find a suitably approximated smallest
eigenvalue. We give simple examples next. As before, let us focus on the change in
the energies and quantum states caused by the applied electric field for a particle in a
one-dimensional potential well and a harmonic oscillator. The latter problem will be
left for readers as an exercise. As a specific case, for simplicity, we consider only the
ground state and the first excited state to choose a trial function. First, we deal with the
problem in common with both the cases. Then, we discuss the features individually.
As a general case, suppose that H is the Hamiltonian of the system (either a
particle in a one-dimensional potential well or a harmonic oscillator). We calculate
an expectation value of energy ⟨H⟩ when the system undergoes an influence of the
electric field. The Hamiltonian is described by

$$H = H_0 - eFx, \tag{5.107}$$

where H₀ is the Hamiltonian without the external field F. We suppose that the
quantum state |u⟩ is expressed as

$$|u\rangle = |0\rangle + a\,|1\rangle, \tag{5.108}$$

where |0⟩ and |1⟩ denote the ground state and the first excited state, respectively; a is an
unknown parameter to be decided after the analysis based on the variational method.
In the present case, |u⟩ is a trial function. Then, we have

$$\langle H\rangle \equiv \frac{\langle u|Hu\rangle}{\langle u|u\rangle}. \tag{5.109}$$

With the numerator, we have

$$\begin{aligned}
\langle u|Hu\rangle &= (\langle 0| + a^*\langle 1|)\,(H_0 - eFx)\,(|0\rangle + a|1\rangle) \\
&= \langle 0|H_0|0\rangle + a\langle 0|H_0|1\rangle - eF\langle 0|x|0\rangle - eFa\langle 0|x|1\rangle \\
&\quad + a^*\langle 1|H_0|0\rangle + a^*a\langle 1|H_0|1\rangle - eFa^*\langle 1|x|0\rangle - eFa^*a\langle 1|x|1\rangle \\
&= \langle 0|H_0|0\rangle - eFa\langle 0|x|1\rangle + a^*a\langle 1|H_0|1\rangle - eFa^*\langle 1|x|0\rangle. \tag{5.110}
\end{aligned}$$

Note that in (5.110) four terms vanish because of the symmetry requirement of the
symmetric potential well and harmonic oscillator with respect to the origin as well as
because of the orthogonality between the states |0⟩ and |1⟩. Here x is Hermitian and
we assume that the coordinate representation of |0⟩ and |1⟩ is real as in (1.101) and
(1.102) or in (2.106). Then, we have

$$\langle 1|x|0\rangle = \langle 1|x|0\rangle^* = \langle 0|x^{\dagger}|1\rangle = \langle 0|x|1\rangle.$$

Also, assuming that a is a real number, we rewrite (5.110) as

$$\langle u|Hu\rangle = \langle 0|H_0|0\rangle - 2eFa\langle 0|x|1\rangle + a^2\langle 1|H_0|1\rangle. \tag{5.111}$$

As for the denominator of (5.109), we have

$$\langle u|u\rangle = (\langle 0| + a\langle 1|)\,(|0\rangle + a|1\rangle) = 1 + a^2, \tag{5.112}$$

where we used the normalization conditions ⟨0|0⟩ = ⟨1|1⟩ = 1 and the orthogonality
conditions ⟨0|1⟩ = ⟨1|0⟩ = 0. Thus, we get

$$\langle H\rangle = \frac{\langle 0|H_0|0\rangle - 2eFa\langle 0|x|1\rangle + a^2\langle 1|H_0|1\rangle}{1 + a^2}. \tag{5.113}$$

Here, defining the ground-state energy and the first excited-state energy as E₀ and E₁,
respectively, we put

$$H_0|0\rangle = E_0|0\rangle \quad\text{and}\quad H_0|1\rangle = E_1|1\rangle, \tag{5.114}$$

where E₀ ≠ E₁. As already shown in Chaps. 1 and 2, both the systems have
nondegenerate energy eigenstates. Then, we have

$$\langle H\rangle = \frac{E_0 - 2eFa\langle 0|x|1\rangle + a^2 E_1}{1 + a^2}. \tag{5.115}$$

Now we wish to determine the condition under which ⟨H⟩ has an extremum. That is, we
are seeking a that satisfies

$$\frac{\partial\langle H\rangle}{\partial a} = 0. \tag{5.116}$$

Calculating the LHS of (5.116) by use of (5.115), we have

$$\frac{\partial\langle H\rangle}{\partial a} = \frac{2eF\langle 0|x|1\rangle a^2 + 2(E_1 - E_0)a - 2eF\langle 0|x|1\rangle}{(1 + a^2)^2}. \tag{5.117}$$

For ∂⟨H⟩/∂a to be zero, we must have

$$eF\langle 0|x|1\rangle a^2 + (E_1 - E_0)a - eF\langle 0|x|1\rangle = 0. \tag{5.118}$$

Then, solving the quadratic equation (5.118), we obtain

$$a = \frac{E_0 - E_1 \pm (E_1 - E_0)\sqrt{1 + \dfrac{4[eF\langle 0|x|1\rangle]^2}{(E_1 - E_0)^2}}}{2eF\langle 0|x|1\rangle}. \tag{5.119}$$

Taking the plus sign in the numerator of (5.119) and using

$$\sqrt{1 + \Delta} \approx 1 + \frac{\Delta}{2} \tag{5.120}$$

for small Δ, we obtain



$$a \approx \frac{eF\langle 0|x|1\rangle}{E_1 - E_0}. \tag{5.121}$$

Thus, with the optimized state of (5.108) we get

$$|u\rangle \approx |0\rangle + \frac{eF\langle 0|x|1\rangle}{E_1 - E_0}\,|1\rangle. \tag{5.122}$$

If one compares (5.122) with (5.17), one should recognize that these two equa-
tions are the same within the first-order approximation. Notice that putting i = 0 and
k = 1 in (5.17) and replacing V with −eFx, we have an expression similar to (5.122).
Notice once again that ⟨0|x|1⟩ = ⟨1|x|0⟩ as x is an Hermitian operator and that we
are considering real inner products.
Inserting (5.121) into (5.115) and approximating

$$\frac{1}{1 + a^2} \approx 1 - a^2, \tag{5.123}$$

we can estimate ⟨H⟩. The estimation depends upon the nature of the system we
choose. The next examples deal with these specific characteristics.
Example 5.4 We adopt the same problem as Example 5.1. That is, we consider how
the energy of a particle carrying a charge e confined within a one-dimensional
potential well is changed by the applied electric field. Here we deal with it using
the variational method.
We deal with the same Hamiltonian as in the case of Example 5.1. It is described
as

$$H = -\frac{\hbar^2}{2m}\frac{d^2}{dx^2} - eFx. \tag{5.22}$$

By putting

$$H_0 \equiv -\frac{\hbar^2}{2m}\frac{d^2}{dx^2}, \tag{5.124}$$

we rewrite the Hamiltonian as

$$H = H_0 - eFx. \tag{5.125}$$

Expressing the ground state as |0⟩ and the first excited state as |1⟩, we adopt a trial
function |u⟩ as

$$|u\rangle = |0\rangle + a\,|1\rangle. \tag{5.108}$$

In the coordinate representation, we have

$$|0\rangle = \sqrt{\frac{1}{L}}\,\cos\frac{\pi x}{2L} \tag{5.26}$$

and

$$|1\rangle = \sqrt{\frac{1}{L}}\,\sin\frac{\pi x}{L}. \tag{5.126}$$

From Example 1.2 of Sect. 1.3, we have

$$\langle 0|H_0|0\rangle = E_0 = \frac{\hbar^2}{2m}\cdot\frac{\pi^2}{4L^2}, \qquad \langle 1|H_0|1\rangle = E_1 = \frac{\hbar^2}{2m}\cdot\frac{4\pi^2}{4L^2} = 4E_0. \tag{5.127}$$

Inserting these results into (5.121), we immediately get

$$a \approx eF\,\frac{\langle 0|x|1\rangle}{3E_0}. \tag{5.128}$$

As in Example 5.1 of Sect. 5.1, we have

$$\langle 0|x|1\rangle = \frac{32L}{9\pi^2}. \tag{5.129}$$

For this calculation, use the coordinate representation of (5.26) and (5.126). Thus,
we get

$$a \approx eF\,\frac{32\cdot 8\cdot mL^3}{27\pi^4\hbar^2}. \tag{5.130}$$

Meanwhile, from (5.108) we obtain

$$|u\rangle \approx |0\rangle + eF\,\frac{\langle 0|x|1\rangle}{3E_0}\,|1\rangle. \tag{5.131}$$

The resulting |u⟩ in (5.131) is the same as (5.31), which was obtained by the first-
order approximation of (5.30). Inserting (5.130) into (5.113) and approximating the
denominator as before such that

$$\frac{1}{1 + a^2} \approx 1 - a^2, \tag{5.123}$$

and further using (5.130) for a, we get

$$\langle H\rangle \approx E_0 - (eF)^2\,\frac{32^2\cdot 8\cdot mL^4}{243\pi^6\hbar^2}. \tag{5.132}$$

Once again, this is the same result as that of (5.28) and (5.29) where only n ¼ 1 is
taken into account.
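The matrix element (5.129) that drives both (5.130) and (5.132) can be confirmed by
a one-line symbolic integration (our sketch, not part of the text; sympy assumed):

    # Check <0|x|1> = 32 L / (9 pi^2) for the well of width 2L, states (5.26), (5.126).
    import sympy as sp

    x, L = sp.symbols('x L', positive=True)
    psi0 = sp.sqrt(1/L)*sp.cos(sp.pi*x/(2*L))
    psi1 = sp.sqrt(1/L)*sp.sin(sp.pi*x/L)
    elem = sp.integrate(psi0*x*psi1, (x, -L, L))
    print(sp.simplify(elem - 32*L/(9*sp.pi**2)))    # prints 0, confirming (5.129)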
Example 5.5 We adopt the same problem as Example 5.2. The calculation pro-
cedures are almost the same as those of Example 5.2. We use

$$\langle 0|q|1\rangle = \sqrt{\frac{\hbar}{2m\omega}} \quad\text{and}\quad E_1 - E_0 = \hbar\omega. \tag{5.133}$$

The detailed procedures are left for readers as an exercise. We should get the same
results as those obtained in Example 5.2.
As discussed in the above five simple examples, we have shown the calculation
procedures of the perturbation method and the variational method. These methods not
only supply us with suitable approximation techniques, but also provide physical
and mathematical insight into many fields of natural science.

References

1. Sunakawa S (1991) Quantum mechanics. Iwanami, Tokyo. (in Japanese)


2. Byron FW Jr, Fuller RW (1992) Mathematics of classical and quantum physics. Dover,
New York
3. Schiff LI (1955) Quantum mechanics, 2nd edn. McGraw-Hill, New York
4. Jackson JD (1999) Classical electrodynamics, 3rd edn. Wiley, New York
5. Coddington EA (1989) An introduction to ordinary differential equations. Dover, New York
Chapter 6
Theory of Analytic Functions

Theory of analytic functions is one of the major fields of modern mathematics. Its
application covers a broad range of topics in natural science. A complex function
f (z), or a function that takes a complex number z as a variable, has various properties
that often differ from those of functions that take a real number x as a variable. In
particular, the analytic functions hold a paramount position in complex analysis.
In this chapter we explore various features of the analytic functions accordingly.
From a practical point of view, the theory of analytic functions is very frequently
utilized for the calculation of real definite integrals. For this reason, we describe the
related topics together with tangible examples.
The complex plane (or Gaussian plane) can be dealt with as a topological space
where the metric (or distance function) is defined. Since the complex plane has a
two-dimensional extension, we can readily imagine and investigate its topological
feature. Set theory allows us to make an axiomatic approach along with the topology.
Therefore, we introduce basic notions and building blocks of the set theory and
topology.

6.1 Set and Topology

A complex number z is usually expressed as

z = x + iy,                                                    (6.1)

where x and y are real numbers and i is an imaginary unit. Graphically, the number
z is indicated as a point in a complex plane where the real axis is drawn as abscissa
and the imaginary axis is depicted as ordinate (see Fig. 6.1). Since the complex plane
has a two-dimensional extension, we can readily imagine the domain of variability of
z and make a graphical object for it on the complex plane. A disk-like diagram that is
enclosed with a closed curve C is frequently dealt with in the theory of analytic

[Fig. 6.1 Complex plane where a complex number z is depicted]
functions. Such a diagram provides a good subject for study of the set theory and
topology. Before developing the theory of complex numbers and analytic functions,
we briefly mention the basic notions as well as notations and meanings of sets and
topology.

6.1.1 Basic Notions and Notations

A set comprises numbers (either real or complex) or mathematical objects, more


generally. The latter include, e.g., vectors, their transformation, etc., as we shall see
various illustrations in Parts III and IV. The set A is described such that

A = {1, 2, 3, ⋯, 10} or B = {x; a &lt; x &lt; b, a, b ∈ ℝ},          (6.2)

where A contains ten elements in the former case and uncountably infinite elements
are contained in the latter set B. In the latter case (sometimes in the former case as
well), the elements are usually called points.
When we speak of the sets, a universal set [1] is implied in the background. This
set contains all the sets in consideration of the study and is usually clearly defined in
the context of the discussion. For example, with the real analysis the universal set is
ℝ and with the complex analysis it is ℂ (an entire complex plane). In the latter case a
two-dimensional complex plane (see Fig. 6.1) is frequently used as the universal set.
We study various characteristics of sets (or subsets) that are contained in the
universal set and use a Venn diagram to represent them. Figure 6.2a illustrates an
example of Venn diagram that shows a subset A. As is often the case, the universal
set U is omitted in Fig. 6.2a. To show a (sub)set A we usually depict it with a closed
curve. If an element a is contained in A, we write

a ∈ A                                                          (6.3)

[Fig. 6.2 Venn diagrams. (a) An example of the Venn diagram. (b) A subset A and its complement Aᶜ. U shows the universal set]

and depict it as a point inside the closed curve. To show that b is not contained in A,
we write

b ∉ A.

In that case, we depict b outside the closed curve of Fig. 6.2a.


To show that a set A is contained in another set B, we write

A ⊂ B.                                                         (6.4)

If (6.4) holds, A is called a subset of B. Naturally, every set contained in the


universal set is its subset.
We need to explicitly show a set that has no element. The relevant set is called an
empty set and denoted by ∅. Examples are given below.
 
{x; x² &lt; 0, x ∈ ℝ} = ∅,  {x; x ≠ x} = ∅,  x ∉ ∅,  etc.

The subset A may have different cardinal numbers (or cardinalities) depending on
the nature of A. To show this explicitly, elements are sometimes indexed as, e.g.,
aᵢ (i = 1, 2, ⋯) or a_λ (λ ∈ Λ), where Λ can be a finite set or an infinite set with different
cardinalities; (6.2) is an example. A complement (or complementary set) of A is
defined as the difference U − A and denoted by Aᶜ (≡ U − A). That is, the set Aᶜ is said
to be the complement of A with respect to U and is indicated with a shaded area in
Fig. 6.2b.
Let A and B be two different subsets in U (Fig. 6.3). Figure 6.3 represents a sum
(or union, or union of sets) A ∪ B, an intersection A ∩ B, and differences A − B and
B − A. More specifically, A − B is defined as the subset of A obtained by subtracting B
from A and is referred to as the set difference of A relative to B. We have

A ∪ B = (A − B) ∪ (B − A) ∪ (A ∩ B).                           (6.5)
[Fig. 6.3 Two different subsets A and B in a universal set U. The diagram shows the sum (A ∪ B), intersection (A ∩ B), and differences (A − B and B − A)]

If A has no element in common with B, then A − B = A and B − A = B with A ∩ B = ∅.
Then, (6.5) trivially holds. Notice also that A ∪ B is the smallest set that contains both
A and B. In (6.5) the elements must not be doubly counted. Thus, we have

A ∪ A = A.

The well-known de Morgan's law can be understood graphically using Fig. 6.3:

(A ∪ B)ᶜ = Aᶜ ∩ Bᶜ and (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ.                     (6.6)

This can be shown as follows: With ∀u ∈ U we have

u ∈ (A ∪ B)ᶜ ⟺ u ∉ A ∪ B ⟺ u ∉ A and u ∉ B ⟺ u ∈ Aᶜ and u ∈ Bᶜ ⟺ u ∈ Aᶜ ∩ Bᶜ.

Furthermore, we have

u ∈ (A ∩ B)ᶜ ⟺ u ∉ A ∩ B ⟺ u ∉ A or u ∉ B ⟺ u ∈ Aᶜ or u ∈ Bᶜ ⟺ u ∈ Aᶜ ∪ Bᶜ.

We have other important relations such as the distributive laws described by

(A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C),                               (6.7)
(A ∩ B) ∪ C = (A ∪ C) ∩ (B ∪ C).                               (6.8)

The confirmation of (6.7) and (6.8) is left for readers. The Venn diagrams are
intuitively acceptable, but we need more rigorous discussion to characterize and
analyze various aspects of sets (vide infra).
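For readers who prefer to confirm such identities mechanically, the following
finite-set spot check (ours, not part of the text) exercises (6.5)–(6.8) on small
subsets of U = {0, ..., 9}; the particular sets A, B, and C are arbitrary choices:

    # Spot check of de Morgan's law (6.6) and the distributive laws (6.7)-(6.8).
    U = frozenset(range(10))
    A = frozenset({0, 1, 2, 3})
    B = frozenset({2, 3, 4, 5})
    C = frozenset({3, 5, 7})
    comp = lambda S: U - S                           # complement with respect to U

    assert comp(A | B) == comp(A) & comp(B)          # (A ∪ B)^c = A^c ∩ B^c
    assert comp(A & B) == comp(A) | comp(B)          # (A ∩ B)^c = A^c ∪ B^c
    assert (A | B) & C == (A & C) | (B & C)          # (6.7)
    assert (A & B) | C == (A | C) & (B | C)          # (6.8)
    assert (A | B) == (A - B) | (B - A) | (A & B)    # (6.5)
    print("all identities hold")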

6.1.2 Topological Spaces and Their Building Blocks

On the basis of the aforementioned discussion, we define a topology and a topological
space next. Let T be a universal set. Let τ be a collection of subsets of T. If the
collection τ satisfies the following axioms, τ is called a topology on T. The relevant
axioms are as follows:
(O1) T ∈ τ and ∅ ∈ τ, where ∅ is an empty set.
(O2) If O₁, O₂, ⋯, Oₙ ∈ τ, then O₁ ∩ O₂ ∩ ⋯ ∩ Oₙ ∈ τ.
(O3) If O_λ (λ ∈ Λ) ∈ τ, then ∪_{λ∈Λ} O_λ ∈ τ.
In the above axioms, T is called an underlying set. The coupled object of T and τ
is said to be a topological space and denoted by (T, τ). The members (i.e., subsets) of
τ are defined as open sets. (If we do not assume the topological space, the underlying
set is equivalent to the universal set. Henceforth, however, we do not have to be too
strict with the difference in this kind of terminology.)
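For a finite underlying set, the axioms (O1)–(O3) can be tested exhaustively;
for a finite collection, checking pairwise intersections and unions suffices. The
following sketch (ours; the sets chosen are arbitrary examples) accepts one
collection as a topology and rejects another:

    # A tiny checker for the open-set axioms (O1)-(O3) on a finite set T.
    from itertools import combinations

    T = frozenset({1, 2, 3})
    tau = {frozenset(), frozenset({1}), frozenset({1, 2}), T}   # a topology on T

    def is_topology(T, tau):
        if frozenset() not in tau or T not in tau:              # (O1)
            return False
        for A, B in combinations(tau, 2):
            if A & B not in tau:                                # (O2), pairwise suffices
                return False
            if A | B not in tau:                                # (O3), finite case
                return False
        return True

    print(is_topology(T, tau))                                  # True
    print(is_topology(T, {frozenset(), frozenset({1}), frozenset({2}), T}))
    # False: the union {1} ∪ {2} = {1, 2} is missing, violating (O3)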
Let (T, τ) be a topological space with a subset O′ ⊂ T. Let τ′ be defined such that

τ′ ≡ {O′ ∩ O; O ∈ τ}.

Then, (O′, τ′) can be regarded as a topological space. In this case τ′ is called a
relative topology for O′ [2, 3] and O′ is referred to as a subspace of T. The topological
space (O′, τ′) shares the same topological feature as that of (T, τ). Notice that O′ does
not necessarily need to be an open set of T. But, if O′ is an open set of T, then any
open set belonging to O′ is an open set of T as well. This is evident from the
definition of the relative topology and Axiom (O2).
The abovementioned Axioms (O1)–(O3) sound somewhat pretentious and the
definition of the open sets seems to be “descent from heaven.” Nonetheless, these
axioms and terminology turn out useful soon. Once a topological space (T, τ) is
given, subsets of various types arise and a variety of relationships among them ensue
as well. Note that the above axioms mention nothing about a set that is different from
an open set. Let S be a subset (maybe an open set or maybe not) of T and let us think
of the properties of S.
Definition 6.1 Let (T, τ) be a topological space and S be an open set of τ, i.e., S ∈ τ.
Then, the subset defined as the complement of S in (T, τ), i.e., Sᶜ = T − S, is called a
closed set in (T, τ).
Replacing S with Sᶜ in Definition 6.1, we have (Sᶜ)ᶜ = S = T − Sᶜ. The above
definition may be rephrased as follows: Let (T, τ) be a topological space and let A be
a subset of T. Then, if Aᶜ = T − A ∈ τ, then A is a closed set in (T, τ).
In parallel with the axioms (O1) to (O3), we have the following axioms related to
the closed sets of (T, τ): Let τ̃ be the collection of all the closed sets of (T, τ).
(C1) T ∈ τ̃ and ∅ ∈ τ̃, where ∅ is an empty set.
(C2) If C₁, C₂, ⋯, Cₙ ∈ τ̃, then C₁ ∪ C₂ ∪ ⋯ ∪ Cₙ ∈ τ̃.
(C3) If C_λ (λ ∈ Λ) ∈ τ̃, then ∩_{λ∈Λ} C_λ ∈ τ̃.
[Fig. 6.4 Venn diagram that represents a neighborhood S of a. S contains an open set O that contains a]

These axioms are obvious from those of (O1)–(O3) along with de Morgan’s law
(6.6).
Next, we classify and characterize a variety of elements (or points) and subsets as
well as relationships among them in terms of the open sets and closed sets. We make
the most of these notions and properties to study various aspects of topological
spaces and their structures. In what follows, we assume the presence of a topological
space (T, τ).
(a) Neighborhoods [4]
Definition 6.2 Let (T, τ) be a topological space and S be a subset of T. If S contains
an open set ∃O that contains an element (or point) a 2 T, i.e.,

a 2 O ⊂ S,

then S is called a neighborhood of a.


Figure 6.4 gives a Venn diagram that represents a neighborhood of a. This simple
definition occupies an important position in the study of the topological spaces and
the theory of analytic functions.
(b) Interior and Closure [4]
With the notion of neighborhoods at the core, we shall see further characterization
of sets and elements of the topological spaces.
Definition 6.3 Let x be a point of T and S be a subset of T. If S is a neighborhood of
x, x is said to be in the interior of S. In this case, x is called an interior point of S. The
interior of S is denoted by S°.
The interior as a technical term can be understood as follows: Suppose that x ∉ S.
Then, by Definition 6.2 S is not a neighborhood of x. By Definition 6.3, this leads to
x ∉ S°. This statement can be translated into x ∈ Sᶜ ⇒ x ∈ (S°)ᶜ. That is, Sᶜ ⊂ (S°)ᶜ.
This is equivalent to

S° ⊂ S.                                                        (6.9)

Definition 6.4 Let x be a point of T and S be a subset of T. If for any neighborhood
∀N of x we have N ∩ S ≠ ∅, x is said to be in the closure of S. In this case, x is called
an adherent point of S. The closure of S is denoted by S̄.

According to the above definition, with the adherent point x of S we explicitly
write x ∈ S̄. Think of the negation of the statement of Definition 6.4. That is, with some
neighborhood ∃N of x we have (i) N ∩ S = ∅ ⟺ x ∉ S̄. Meanwhile,
(ii) N ∩ S = ∅ ⇒ x ∉ S. Combining (i) and (ii), we have x ∉ S̄ ⇒ x ∉ S. Similarly
to the above argument about the interior, we get

S ⊂ S̄.                                                         (6.10)

Combining (6.9) and (6.10), we get

S° ⊂ S ⊂ S̄.                                                    (6.11)

In relation to the interior and closure, we have the following important lemmas.
Lemma 6.1 Let S be a subset of T and O be any open set contained in S. Then, we
have O ⊂ S°.
Proof Suppose that with an arbitrary point x, x ∈ O. Since we have x ∈ O ⊂ S, S is
a neighborhood of x (due to Definition 6.2). Then, from Definition 6.3 we have
x ∈ S°. Then, we have x ∈ O ⇒ x ∈ S°. This implies O ⊂ S°. This completes the
proof. ∎
Lemma 6.2 Let S be a subset of T. If x ∈ S°, then there is some open set ∃O that
satisfies x ∈ O ⊂ S.
Proof Suppose that with a point x, x ∈ S°. Then by Definition 6.3, S is a neighbor-
hood of x. Meanwhile, by Definition 6.2 there is some open set ∃O that satisfies
x ∈ O ⊂ S. This completes the proof. ∎
Lemmas 6.1 and 6.2 teach us further implications. From Lemma 6.1 and (6.11)
we have

O ⊂ S° ⊂ S                                                     (6.12)

with an arbitrary open set ∀O contained in S. From Lemma 6.2 we have
x ∈ S° ⇒ x ∈ O; i.e., S° ⊂ O for some open set ∃O contained in S. Expressing
this specific O as Õ, we have S° ⊂ Õ. Meanwhile, using Lemma 6.1 once again we
must get

Õ ⊂ S° ⊂ S.                                                    (6.13)

That is, we have S° ⊂ Õ and Õ ⊂ S° at once. This implies that with this specific
open set Õ, we get Õ = S°. That is,
[Fig. 6.5 (a) Largest open set S° contained in S; O denotes an open set. (b) Smallest closed set S̄ containing S; C denotes a closed set]

Õ = S° ⊂ S.                                                    (6.14)

Relation (6.14) obviously shows that S° is an open set and moreover that S° is the
largest open set contained in S. Figure 6.5a depicts this relation. In this context we
have the following important theorem.
Theorem 6.1 Let S be a subset of T. The subset S is open if and only if S = S°.
Proof From (6.14) S° is open. Therefore, if S = S°, then S is open. Conversely,
suppose that S is open. In that event S itself is an open set contained in S. Meanwhile,
S° has been characterized as the largest open set contained in S. Consequently, we
have S ⊂ S°. At the same time, from (6.11) we have S° ⊂ S. Thus, we get S = S°.
This completes the proof. ∎
In contrast to Lemmas 6.1 and 6.2, we have the following lemmas with respect to
closures.
Lemma 6.3 Let S be a subset of T and C be any closed set containing S. Then, we
have S̄ ⊂ C.
Proof Since C is a closed set, Cᶜ is an open set. Suppose that with a point x, x ∉ C.
Then, x ∈ Cᶜ. Since C ⊃ S, Cᶜ ⊂ Sᶜ. This implies Cᶜ ∩ S = ∅. As x ∈ Cᶜ = (Cᶜ)° ⊂ Cᶜ,
by Definition 6.2 Cᶜ is a neighborhood of x. Then, by Definition 6.4 x is not in S̄; i.e.,
x ∉ S̄. This statement can be translated into x ∈ Cᶜ ⇒ x ∈ (S̄)ᶜ; i.e., Cᶜ ⊂ (S̄)ᶜ. That
is, we get S̄ ⊂ C. ∎
Lemma 6.4 Let S be a subset of T and suppose that a point x does not belong to S̄,
i.e., x ∉ S̄. Then, x ∉ C for some closed set ∃C containing S.
Proof If x ∉ S̄, then by Definition 6.4 we have some neighborhood ∃N and some
open set O contained in that N such that x ∈ O ⊂ N and N ∩ S = ∅. Therefore, we
must have O ∩ S = ∅. Let C = Oᶜ. Then C is a closed set with C ⊃ S. Since
x ∈ O, x ∉ C. This completes the proof. ∎
From Lemmas 6.3 and 6.4 we obtain further implications. From Lemma 6.3 and
(6.11) we have

S ⊂ S̄ ⊂ C                                                     (6.15)

with an arbitrary closed set ∀C containing S. From Lemma 6.4 we have x ∉
S̄ ⇒ x ∈ Cᶜ; i.e., C ⊂ S̄ with some closed set ∃C containing S. Expressing this
specific C as C̃, we have C̃ ⊂ S̄. Meanwhile, using Lemma 6.3 once again we must get

S ⊂ S̄ ⊂ C̃.                                                    (6.16)

That is, we have C̃ ⊂ S̄ and S̄ ⊂ C̃ at once. This implies that with this specific
closed set C̃, we get C̃ = S̄. That is,

S ⊂ S̄ = C̃.                                                    (6.17)

Relation (6.17) obviously shows that S̄ is a closed set and moreover that S̄ is the
smallest closed set containing S. Figure 6.5b depicts this relation. In this context we
have the following important theorem.
Theorem 6.2 Let S be a subset of T. The subset S is closed if and only if S = S̄.
Proof The proof can be done analogously to that of Theorem 6.1. From (6.17) S̄ is
closed. Therefore, if S = S̄, then S is closed. Conversely, suppose that S is closed. In
that event S itself is a closed set containing S. Meanwhile, S̄ has been characterized as
the smallest closed set containing S. Then, we have S ⊃ S̄. At the same time, from
(6.11) we have S̄ ⊃ S. Thus, we get S = S̄. This completes the proof. ∎
(c) Boundary [4]
Definition 6.5 Let (T, τ) be a topological space and S be a subset of T. If a point
x ∈ T is in both the closure of S and the closure of the complement of S, i.e., Sᶜ, x is
said to be in the boundary of S. In this case, x is called a boundary point of S. The
boundary of S is denoted by Sᵇ.
By this definition Sᵇ can be expressed as

$$S^b = \overline{S} \cap \overline{S^c}. \tag{6.18}$$

Replacing S with Sᶜ, we get

$$(S^c)^b = \overline{S^c} \cap \overline{(S^c)^c} = \overline{S^c} \cap \overline{S}. \tag{6.19}$$

Thus, we have

$$S^b = (S^c)^b. \tag{6.20}$$

This means that S and Sᶜ have the same boundary. Since both S̄ and the closure of Sᶜ
are closed sets and the boundary is defined as their intersection, from Axiom (C3) the
boundary is a closed set. From Definition 6.1, a closed set is defined as the complement
of an open set, and vice versa. Thus, the open set and the closed set do not form a
confrontational concept but a complementary concept. In this respect we have the
following lemma and theorem.
Lemma 6.5 Let S be an arbitrary subset of T. Then, we have

$$\overline{S^c} = (S^{\circ})^c.$$

Proof Since S° ⊂ S, (S°)ᶜ ⊃ Sᶜ. As S° is open, (S°)ᶜ is closed. Hence, $(S^{\circ})^c \supset \overline{S^c}$,
because $\overline{S^c}$ is the smallest closed set containing Sᶜ. Next let C be an arbitrary closed set
that contains Sᶜ. Then Cᶜ is an open set that is contained in S, and so we have Cᶜ ⊂ S°,
because S° is the largest open set contained in S. Therefore, C ⊃ (S°)ᶜ. Then, if we
choose $\overline{S^c}$ for C, we must have $\overline{S^c} \supset (S^{\circ})^c$. Combining this relation with
$(S^{\circ})^c \supset \overline{S^c}$ obtained above, we get $\overline{S^c} = (S^{\circ})^c$. ∎
Theorem 6.3 [4] A necessary and sufficient condition for S to be both open and
closed at once is Sᵇ = ∅.
Proof
(i) Necessary condition: Let S be open and closed. Then, S = S̄ and S = S°.
Suppose that Sᵇ ≠ ∅. Then, from Definition 6.5 we would have ∃a ∈ $\overline{S} \cap \overline{S^c}$
for some a. Meanwhile, from Lemma 6.5 we have $\overline{S^c} = (S^{\circ})^c$. This leads to
a ∈ $\overline{S} \cap (S^{\circ})^c$. But, from the assumption this would mean a ∈ S ∩ Sᶜ. We have no
such a, however. Thus, using the proof by contradiction we must have Sᵇ = ∅.
(ii) Sufficient condition: Suppose that Sᵇ = ∅. Starting with S ⊃ S° and S ≠ S°, let
us assume that there is a such that a ∈ S and a ∉ S°. That is, we have a ∈
(S°)ᶜ ⇒ a ∈ S ∩ (S°)ᶜ ⊂ $\overline{S} \cap (S^{\circ})^c = \overline{S} \cap \overline{S^c}$ = Sᵇ, in contradiction to the sup-
position. Thus, we must not have such a, implying that S = S°. Next, starting
with S̄ ⊃ S and S̄ ≠ S, we assume that there is a such that a ∈ S̄ and a ∉ S. That
is, we have a ∈ Sᶜ ⇒ a ∈ $\overline{S} \cap S^c \subset \overline{S} \cap \overline{S^c}$ = Sᵇ, in contradiction to the suppo-
sition. Thus, we must not have such a, implying that S̄ = S. Combining this
relation with S = S° obtained above, we get S̄ = S = S°. In other words, S is
both open and closed at once. These complete the proof. ∎
The abovementioned set is sometimes referred to as a closed-open set or a clopen
set as a portmanteau word. An interesting example can be seen in Chap. 20 in
relation to topological groups (or continuous groups).
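On a finite topological space the interior, closure, and boundary introduced above
can be computed directly from their characterizations (6.14), (6.17), and (6.27), as
the sketch below shows (ours, not from the text). With the chain topology τ = {∅,
{1}, {1, 2}, T} on T = {1, 2, 3} and S = {2}, it yields S° = ∅, S̄ = {2, 3}, and
Sᵇ = {2, 3}:

    # Interior, closure, and boundary on a finite topological space.
    T = frozenset({1, 2, 3})
    tau = [frozenset(), frozenset({1}), frozenset({1, 2}), T]   # open sets
    closed = [T - O for O in tau]                               # closed sets

    def interior(S):   # largest open set contained in S, cf. (6.14)
        return max((O for O in tau if O <= S), key=len)

    def closure(S):    # smallest closed set containing S, cf. (6.17)
        return min((C for C in closed if C >= S), key=len)

    S = frozenset({2})
    print(interior(S))                 # frozenset() : S has empty interior
    print(closure(S))                  # frozenset({2, 3})
    print(closure(S) - interior(S))    # frozenset({2, 3}), the boundary S^b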

(d) Accumulation Points and Isolated Points
We have another important class of elements and sets that includes accumulation
points and isolated points.
Definition 6.6 Let (T, τ) be a topological space and S be a subset of T. Suppose that
we have a point p ∈ T. If for any neighborhood N of p we have N ∩ (S − {p}) ≠ ∅,
then p is called an accumulation point of S. A set comprising all the accumulation
points of S is said to be a derived set of S and denoted by Sᵈ.
Note that from Definition 6.4 we have

$$N \cap (S - \{p\}) \neq \varnothing \;\Longleftrightarrow\; p \in \overline{S - \{p\}}, \tag{6.21}$$

where N is an arbitrary neighborhood of p.
Definition 6.7 Let S be a subset of T. Suppose that we have a point p ∈ S. If for
some neighborhood N of p we have N ∩ (S − {p}) = ∅, then p is called an isolated
point of S. A set comprising only isolated points of S is said to be a discrete set
(or subset) of S and we denote it by Sᵈˢᶜ.
Note that if p is an isolated point, N ∩ (S − {p}) = N ∩ S − N ∩ {p} = N ∩ S −
{p} = ∅. That is, {p} = N ∩ S. Therefore, any isolated points of S are contained in S.
Contrary to this, the accumulation points are not necessarily contained in S.
Comparing Definitions 6.6 and 6.7, we notice that the latter definition is obtained
by negation of the statement of the former definition. This implies that any points of
a set are classified into two mutually exclusive alternatives, i.e., the accumulation
points or the isolated points.
We divide Sᵈ into a direct sum of two sets such that

$$S^d = (S \cup S^c) \cap S^d = \left(S^d \cap S\right) \cup \left(S^d \cap S^c\right), \tag{6.22}$$

where with the second equality we used (6.7).
Now, we define

$$\left(S^d\right)^+ \equiv S^d \cap S \quad\text{and}\quad \left(S^d\right)^- \equiv S^d \cap S^c.$$

Then, (6.22) means that Sᵈ is divided into two sets consisting of the accumulation
points, i.e., (Sᵈ)⁺ that belong to S and (Sᵈ)⁻ that do not belong to S.
Thus, as another direct sum we have

$$S = \left(S^d \cap S\right) \cup S^{dsc} = \left(S^d\right)^+ \cup S^{dsc}. \tag{6.23}$$

Equation (6.23) represents the direct sum consisting of a part of the derived set
and the whole discrete set. Here we have the following theorem.

Theorem 6.4 Let S be a subset of T. Then we have the following equation:

$$\overline{S} = S^d \cup S^{dsc} \quad \left(S^d \cap S^{dsc} = \varnothing\right). \tag{6.24}$$

Proof Let us assume that p is an accumulation point of S. Then, we have p ∈
$\overline{S - \{p\}} \subset \overline{S}$. Meanwhile, Sᵈˢᶜ ⊂ S ⊂ S̄. This implies that S̄ contains both Sᵈ and
Sᵈˢᶜ. That is, we have

$$\overline{S} \supset S^d \cup S^{dsc}.$$

It is natural to ask how to deal with other points that do not belong to Sᵈ ∪ Sᵈˢᶜ.
Taking account of the above discussion on the accumulation points and isolated
points, however, such remainder points should again be classified into accumulation
and isolated points. Since all the isolated points are contained in S, they can be taken
into S̄.
Suppose that among the accumulation points, some points p are not contained in
S; i.e., p ∉ S. In that case, we have S − {p} = S, namely $\overline{S - \{p\}} = \overline{S}$. Thus, p ∈
$\overline{S - \{p\}}$ ⟺ p ∈ S̄. From (6.21), in turn, this implies that in case p is not contained in
S, for p to be an accumulation point of S is equivalent to that p is an adherent point of
S. Then, those points p can be taken into S̄ as well. Thus, finally we get (6.24). From
Definitions 6.6 and 6.7, obviously we have Sᵈ ∩ Sᵈˢᶜ = ∅. This completes the
proof. ∎
Rewriting (6.24), we get

$$\overline{S} = \left(S^d\right)^- \cup \left[\left(S^d\right)^+ \cup S^{dsc}\right] = \left(S^d\right)^- \cup S. \tag{6.25}$$

We have other important theorems.
Theorem 6.5 Let S be a subset of T. Then we have the following relation:

$$\overline{S} = S^d \cup S. \tag{6.26}$$

Proof Let us assume that p ∈ S̄. Then, we have the following two cases: (i) if p ∈ S, we
have trivially p ∈ S ⊂ Sᵈ ∪ S. (ii) Suppose p ∉ S. Since p ∈ S̄, from Definition 6.4
any neighborhood of p contains a point of S (other than p). Then, from Definition
6.6, p is an accumulation point of S. Thus, we have p ∈ Sᵈ ⊂ Sᵈ ∪ S and, hence, get
S̄ ⊂ Sᵈ ∪ S. Conversely, suppose that p ∈ Sᵈ ∪ S. Then, (i) if p ∈ S, obviously we
have p ∈ S̄. (ii) Suppose p ∈ Sᵈ. Then, from Definition 6.6 any neighborhood of
p contains a point of S (other than p). Thus, from Definition 6.4, p ∈ S̄. Taking into
account the above two cases, we get Sᵈ ∪ S ⊂ S̄. Combining this with S̄ ⊂ Sᵈ ∪ S
obtained above, we get (6.26). This completes the proof. ∎
[Fig. 6.6 Relationship among S, S̄, Sᵈ, and Sᵈˢᶜ: S̄ = Sᵈ ∪ Sᵈˢᶜ = Sᵈ ∪ S and S = (Sᵈ)⁺ ∪ Sᵈˢᶜ; regarding the symbols and notations, see text]

Theorem 6.6 Let S be a subset of T. A necessary and sufficient condition for S to be a
closed set is that S contains all the accumulation points of S.
Proof From Theorem 6.2, that S is a closed set is equivalent to S = S̄. From
Theorem 6.5 this is equivalent to S = Sᵈ ∪ S. Obviously, this is equivalent to Sᵈ ⊂ S.
In other words, S contains all the accumulation points of S. This completes the
proof. ∎
As another proof, taking account of (6.25), we have

$$\left(S^d\right)^- = \varnothing \;\Longleftrightarrow\; \overline{S} = S \;\Longleftrightarrow\; S \text{ is a closed set (from Theorem 6.2)}.$$

This relation is equivalent to the statement that S contains all the accumulation
points of S.
In Fig. 6.6 we show the relationship among S, S̄, Sᵈ, and Sᵈˢᶜ. From Fig. 6.6, we
can readily show that

$$S^{dsc} = S - S^d = \overline{S} - S^d.$$

We have a further example of direct sum decomposition. Using Lemma 6.5, we
have

$$S^b = \overline{S} \cap \overline{S^c} = \overline{S} \cap (S^{\circ})^c = \overline{S} - S^{\circ}. \tag{6.27}$$

Since S̄ ⊃ S°, we get

$$\overline{S} = S^{\circ} \cup S^b; \quad S^{\circ} \cap S^b = \varnothing.$$

Notice that (6.27) is a succinct expression of Theorem 6.3. That is,

$$S^b = \varnothing \;\Longleftrightarrow\; \overline{S} = S^{\circ} \;\Longleftrightarrow\; S \text{ is a clopen set}.$$
[Fig. 6.7 Connectedness of subspaces. (a) Connected subspace A: any two points z₁ and z₂ of A can be connected by a continuous line that is contained entirely within A. (b) Simply connected subspace A: all the points inside any closed path C within A contain only the points that belong to A. (c) Disconnected subspace A that comprises two disjoint sets B and C]

(e) Connectedness
In the preceding several paragraphs we studied various building blocks of the
topological space and of the sets contained in the space, together with their properties.
In this paragraph, we briefly mention the relationship between the sets. Connectedness
is an important concept in topology. The concept of connectedness is associated
with the subspaces defined earlier in this section. In terms of connectedness, a
topological space can be categorized into two main types: a subspace of the one type
is connected and that of the other is disconnected.
Let A be a subspace of T. Intuitively, for A to be connected is defined as follows:
Let z1 and z2 be any two points of A. If z1 and z2 can be joined by a continuous line
that is contained entirely within A (see Fig. 6.7a), then A is said to be connected. Of
the connected subspaces, suppose that the subspace A has the property that all the
points inside any closed path (i.e., continuous closed line) C within A contain only
the points that belong to A (Fig. 6.7b). Then, that subspace A is said to be simply
connected. A subspace that is not simply connected but connected is said to be
multiply connected. A typical example for the latter is a torus. If a subspace is given
as a union of two (or more) disjoint non-empty (open) sets, that subspace is said to be
disconnected. Notice that if two (or more) sets have no element in common, those
sets are said to be disjoint. Figure 6.7c shows an example for which a disconnected
subspace A is a union of two disjoint sets B and C.
Interesting examples can be seen in Chap. 20 in relation to the connectedness.

6.1.3 T1-Space

We can “enter” a variety of topologies τ into a set T to get a topological space (T, τ).
We have two extremes of them. One is an indiscrete topology (or trivial topology)
and the other is a discrete topology [2]. For the former the topology is characterized
as

$$\tau = \{\varnothing, T\} \tag{6.28}$$

and the latter case is

$$\tau = \{\varnothing, \text{all the subsets of } T, T\}. \tag{6.29}$$

With the discrete topology τ all the subsets are open and the complements of the
individual subsets are closed by Definition 6.1. But such closed subsets are contained
in τ, and so they are again open. Thus, all the subsets are clopen sets. Practically
speaking, the aforementioned two extremes are of less interest and value. For this
reason, we need some moderate separation conditions with topological spaces. These
conditions are well established as the separation axioms. We will not get into details
about the discussion [2], but we mention the first separation axiom (Fréchet axiom)
that produces the T₁-space.
Definition 6.8 Let (T, τ) be a topological space. Suppose that with respect to
∀x, y (x ≠ y) ∈ T there is a neighborhood N in such a way that

x ∈ N and y ∉ N.

The topological space that satisfies the above separation condition is called a
T₁-space.
We mention two important theorems of the T1-space.
Theorem 6.7 A necessary and sufficient condition for (T, τ) to be a T₁-space is that
for each point x ∈ T, {x} is a closed set of T.
Proof
(i) Necessary condition: Let (T, τ) be a T₁-space. Choose any pair of elements
x, y (x ≠ y) ∈ T. Then, from Definition 6.8 there is a neighborhood ∃N of y (≠x)
that does not contain x; y ∈ N and N ∩ {x} = ∅. From Definition 6.4 this implies
that y ∉ $\overline{\{x\}}$. Thus, we have y ≠ x ⇒ y ∉ $\overline{\{x\}}$. This means that
y ∈ $\overline{\{x\}}$ ⇒ y = x. Namely, $\overline{\{x\}}$ ⊂ {x}, but from (6.11) we have
{x} ⊂ $\overline{\{x\}}$. Thus, {x} = $\overline{\{x\}}$. From Theorem 6.2 this shows that {x} is a
closed set of T.
(ii) Sufficient condition: Suppose that {x} is a closed set of T. Choose any x and
y such that y ≠ x. Since {x} is a closed set, N = T − {x} is an open set that
contains y; i.e., y ∈ T − {x} and x ∉ T − {x}. Since y ∈ (T − {x})° ⊂ T − {x},
where T − {x} is an open set, from Definition 6.2 N = T − {x} is a neighbor-
hood of y, and moreover N does not contain x. Then, from Definition 6.8 (T, τ) is
a T₁-space. These complete the proof. ∎
A set comprising a single element is called a singleton or a unit set.
Theorem 6.8 Let S be a subset of a T₁-space. Then Sᵈ is a closed set in the T₁-space.
Proof Let p be a point of T. Suppose p ∈ $\overline{S^d}$. Suppose, furthermore, p ∉ Sᵈ. Then,
from Definition 6.6 there is some neighborhood N of p such that N ∩ (S − {p}) = ∅.
Since p ∈ $\overline{S^d}$, from Definition 6.4 we must have N ∩ Sᵈ ≠ ∅ at once for this special
neighborhood N of p. Then, as we have p ∉ Sᵈ, we can take a point q such that
q ∈ N ∩ Sᵈ and q ≠ p. Meanwhile, we may choose an open set N for the above
neighborhood of p (i.e., an open neighborhood), because p ∈ N (= N°) ⊂ N. Since
N ∩ (T − {p}) = N − {p} is an open set from Theorem 6.7 as well as Axiom
(O2) and Definition 6.1, N − {p} is an open neighborhood of q which does not
contain p. Namely, we have q ∈ N − {p} [= (N − {p})°] ⊂ N − {p}. Note that
q ∈ N − {p} and p ∉ N − {p}, in agreement with Definition 6.8. As q ∈ Sᵈ, again
from Definition 6.6 we have

$$(N - \{p\}) \cap (S - \{q\}) \neq \varnothing. \tag{6.30}$$

This implies that N ∩ (S − {p}) ≠ ∅ as well. But, this is in contradiction to the
original relation of N ∩ (S − {p}) = ∅, which is obtained from p ∉ Sᵈ. Then, we
must have p ∈ Sᵈ. Thus, from the original supposition we get $\overline{S^d}$ ⊂ Sᵈ. Meanwhile,
we have $\overline{S^d}$ ⊃ Sᵈ from (6.11). We have $\overline{S^d}$ = Sᵈ accordingly. From Theorem 6.2,
this means that Sᵈ is a closed set. Hence, we have proven the theorem. ∎
The above two theorems are intuitively acceptable. For, in a one-dimensional
Euclidean space ℝ we express a singleton {x} as [x, x] and a closed interval as [x, y]
(x ≠ y). With the latter, [x, y] might well be expressed as (x, y)ᵈ. Such subsets are well
known as closed sets. According to the separation axioms, various topological
spaces can be obtained by imposing stronger constraints upon the T₁-space [2]. A
typical example of this is a metric space. In this context, the T₁-space has acquired
primitive notions of metric that can be shared with the metric space. Definitions of
the metric (or distance function) and metric space will briefly be summarized in
Chap. 13 in reference to an inner product space.
In the above discussion, we have described a brief outline of the set theory and
topology. We use the results in this chapter and in Part IV as well.
[Fig. 6.8 Complex plane where z is expressed in a polar form using a non-negative radial coordinate r and an angular coordinate θ]

6.1.4 Complex Numbers and Complex Plane

Returning to (6.1), we further investigate how to represent a complex number. In
Fig. 6.8 we redraw a complex plane and a complex number on it. The complex plane
is a natural extension of a real two-dimensional orthogonal coordinate plane (Car-
tesian coordinate plane) that graphically represents a pair of real numbers x and y as
(x, y). In the complex plane we represent a complex number as

$$z = x + iy \tag{6.1}$$

by designating x as an abscissa on the real axis and y as an ordinate on the imaginary
axis. The two axes are orthogonal to each other.
An absolute value (or modulus) of z is defined as a real non-negative number

$$|z| = \sqrt{x^2 + y^2}. \tag{6.31}$$

Analogously to a real two-dimensional polar coordinate, we can introduce in the
complex plane a non-negative radial coordinate r and an angular coordinate θ
(Fig. 6.8). In the complex analysis the angular coordinate is called an argument
and denoted by arg z so as to be

$$\arg z = \theta + 2\pi n; \quad n = 0, \pm 1, \pm 2, \cdots.$$

In Fig. 6.8, x and y are given by

$$x = r\cos\theta \quad\text{and}\quad y = r\sin\theta.$$

Thus, we have a polar form of z expressed as



$$z = r(\cos\theta + i\sin\theta). \tag{6.32}$$

Using the well-known Euler's formula (or Euler's identity)

$$e^{i\theta} = \cos\theta + i\sin\theta, \tag{6.33}$$

we get

$$z = re^{i\theta}. \tag{6.34}$$

From (6.32) and (6.34), we have

$$z^n = r^n e^{in\theta} = r^n(\cos\theta + i\sin\theta)^n = r^n(\cos n\theta + i\sin n\theta), \tag{6.35}$$

where the last equality comes from replacing θ with nθ in (6.33).
Comparing both sides of the last equality of (6.35), we have

$$(\cos\theta + i\sin\theta)^n = \cos n\theta + i\sin n\theta. \tag{6.36}$$

Equation (6.36) is called de Moivre's theorem. This relation holds with
n being integers including zero along with rational numbers including negative
numbers. Euler's formula (6.33) immediately leads to the following important
formulae:

$$\cos\theta = \frac{1}{2}\left(e^{i\theta} + e^{-i\theta}\right), \quad \sin\theta = \frac{1}{2i}\left(e^{i\theta} - e^{-i\theta}\right). \tag{6.37}$$

Notice that although in (6.32) and (6.33) we assumed that θ is a real number,
(6.33) and (6.37) hold with any complex number θ. Note also that (6.33) results from
(6.37) and the following definition of the power series expansion of the exponential
function

$$e^z \equiv \sum_{k=0}^{\infty} \frac{z^k}{k!}, \tag{6.38}$$

where z is any complex number [5]. In fact, from (6.37) and (6.38) we have the
following familiar expressions of the power series expansions of the cosine and sine
functions:

$$\cos z = \sum_{k=0}^{\infty} (-1)^k \frac{z^{2k}}{(2k)!}, \quad \sin z = \sum_{k=0}^{\infty} (-1)^k \frac{z^{2k+1}}{(2k+1)!}.$$

These expressions frequently appear in this chapter.
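These relations are easy to exercise numerically. The following sketch (ours, not
from the text; the values of θ, n, and z are arbitrary choices) spot-checks de Moivre's
theorem (6.36) and the exponential series (6.38) truncated at k = 40:

    # Numerical spot check of (6.36) and (6.38).
    import cmath, math

    theta, n = 0.7, 5
    lhs = (math.cos(theta) + 1j*math.sin(theta))**n
    rhs = math.cos(n*theta) + 1j*math.sin(n*theta)
    print(abs(lhs - rhs))                            # ~ 1e-16, de Moivre holds

    z = 0.3 + 0.4j
    series = sum(z**k / math.factorial(k) for k in range(41))
    print(abs(series - cmath.exp(z)))                # ~ 1e-16, (6.38) converges fast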



Next, let us think of a pair of complex numbers z and w with w expressed as
w = u + iv. Then, we have

$$z - w = (x - u) + i(y - v).$$

Using (6.31), we get

$$|z - w| = \sqrt{(x - u)^2 + (y - v)^2}. \tag{6.39}$$

Equation (6.39) represents the “distance” between z and w. If we define a
function ρ(z, w) such that

$$\rho(z, w) \equiv |z - w|,$$

ρ(z, w) defines a metric (or distance function); see Chap. 13. Thus, the metric gives a
distance between any arbitrarily chosen pair of elements z, w ∈ ℂ and (ℂ, ρ) can be
dealt with as a metric space. This makes it easy to view the complex plane as a
topological space. Discussions about the set theory and topology already developed
earlier in this chapter basically hold.
In the theory of analytic functions, a subset A depicted in Fig. 6.7 can be regarded
as a part of the complex plane. Since the complex plane ℂ itself represents a
topological space, the subset A may be viewed as a subspace of ℂ. A complex
function f (z) of a complex variable z of (6.1) is usually defined in a connected open
set called a region [5]. The said region is of practical use among various subspaces
and can be an entire domain of ℂ or a subset of ℂ. The connectedness of the region
can be considered in a manner similar to that described in Sect. 6.1.2.

6.2 Analytic Functions of a Complex Variable

Since the complex number and complex plane have been well characterized in the
previous section, we deal with the complex functions of a complex variable in the
complex plane.
Taking the complex conjugate of (6.1), we have

$$z^* = x - iy. \tag{6.40}$$

Combining (6.1) and (6.40) in a matrix form, we get


 
$$(z\ \ z^*) = (x\ \ y)\begin{pmatrix} 1 & 1 \\ i & -i \end{pmatrix}. \tag{6.41}$$
[Fig. 6.9 Vector synthesis of z and z*]

[Fig. 6.10 Ways a number (real or complex) is approached. (a) Two ways a number x₀ is approached on the real number line. (b) Various ways a number z₀ is approached in a complex plane; only four ways are indicated]

 
The matrix $\begin{pmatrix} 1 & 1 \\ i & -i \end{pmatrix}$ is non-singular, and so the inverse matrix exists (see Sect.
11.3) and (x y) can be described as

$$(x\ \ y) = (z\ \ z^*)\,\frac{1}{2}\begin{pmatrix} 1 & -i \\ 1 & i \end{pmatrix} \tag{6.42}$$

or with individual components we have

$$x = \frac{1}{2}(z + z^*) \quad\text{and}\quad y = -\frac{i}{2}(z - z^*), \tag{6.43}$$

which can be understood as the “vector synthesis” (see Fig. 6.9).


Equations (6.41) and (6.42) imply that the two sets of variables (x y) and (z z*) can
be dealt with on an equal basis. According to the custom, however, we prioritize the
notation using (x y) over that of (z z*). Hence, the complex function f is described as

$$f(x, y) = u(x, y) + iv(x, y), \tag{6.44}$$

where both u(x, y) and v(x, y) are real functions of the real variables x and y.

Here let us pause for a second to consider the differentiation of a function.


Suppose that a mathematical function is defined on a real domain (i.e., a real number
line). Consider whether that function is differentiable at a certain point x0 of the real
number line. On this occasion, we can approach x0 only from two directions, i.e.,
from the side of x < x0 (from the left) or from the side of x0 < x (from the right); see
Fig. 6.10a. Meanwhile, suppose that a mathematical function is defined on a
complex domain (i.e., a complex plane). Also consider whether the function is
differentiable at a certain point z0 of the complex plane. In this case, we can approach
z0 from continuously varying directions; see Fig. 6.10b where only four directions
are depicted.
In this context let us think of a simple example.
Example 6.1 Let f (x, y) be a function described by

$$f(x, y) = 2x + iy = z + x. \tag{6.45}$$

As remarked above, substituting (6.43) for (6.45), we could have

$$h(z, z^*) = 2\cdot\frac{1}{2}(z + z^*) + i\cdot\left[-\frac{i}{2}(z - z^*)\right] = z + z^* + \frac{1}{2}(z - z^*) = \frac{1}{2}(3z + z^*),$$

where f (x, y) and h(z, z*) denote different functional forms. The derivative of (6.45)
varies depending on the way z₀ is approached. For instance, think of

$$\left.\frac{df}{dz}\right|_0 = \lim_{z\to 0}\frac{f(0 + z) - f(0)}{z} = \lim_{x\to 0,\,y\to 0}\frac{2x + iy}{x + iy} = \lim_{x\to 0,\,y\to 0}\frac{2x^2 + y^2 - ixy}{x^2 + y^2}.$$

Suppose that the differentiation is taken along a straight line in the complex plane
represented by iy = (ik)x (k, x, y : real). Then, we have

$$\left.\frac{df}{dz}\right|_0 = \lim_{x\to 0,\,y\to 0}\frac{2x^2 + k^2x^2 - ikx^2}{x^2 + k^2x^2} = \lim_{x\to 0,\,y\to 0}\frac{2 + k^2 - ik}{1 + k^2} = \frac{2 + k^2 - ik}{1 + k^2}. \tag{6.46}$$

However, this means that df/dz|₀ takes varying values depending upon k. Namely,
df/dz|₀ cannot uniquely be defined but depends on different ways to approach the origin
of the complex plane. Thus, we find that the derivative takes different values
depending on straight lines along which the differentiation is taken. This means
that f (x, y) is not differentiable or analytic at z = 0.
Meanwhile, think of g(z) expressed as

$$g(z) = x + iy = z. \tag{6.47}$$

In this case, we get

$$\left.\frac{dg}{dz}\right|_0 = \lim_{z\to 0}\frac{g(0 + z) - g(0)}{z} = \lim_{z\to 0}\frac{z - 0}{z} = 1.$$

As a result, the derivative takes the same value 1, regardless of what straight lines
the differentiation is taken along.
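Example 6.1 is easy to replay numerically. In the sketch below (ours, not from the
text), the difference quotient of f = 2x + iy at the origin is evaluated along the lines
y = kx for a few slopes k and compared with (6.46), while the quotient for g(z) = z
stays at 1 in every direction:

    # Direction dependence of the difference quotient at z = 0, cf. (6.46).
    f = lambda z: 2*z.real + 1j*z.imag       # f = 2x + iy, not differentiable at 0
    g = lambda z: z                          # g = z, differentiable everywhere

    for k in (0.0, 1.0, 2.0):
        dz = 1e-8*(1 + 1j*k)                 # approach the origin along y = kx
        print(k, f(dz)/dz, (2 + k**2 - 1j*k)/(1 + k**2), g(dz)/dz)
    # f's quotient changes with k and matches (6.46); g's quotient is always 1.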
Though simple, the above example gives us a heuristic method. As before, let f (z)
be a function described as (6.44) such that

$$f(z) = f(x, y) = u(x, y) + iv(x, y), \tag{6.48}$$

where both u(x, y) and v(x, y) possess the first-order partial derivatives with respect to
x and y. Then we have a derivative df (z)/dz such that

$$\frac{df(z)}{dz} = \lim_{\Delta z\to 0}\frac{f(z + \Delta z) - f(z)}{\Delta z} = \lim_{\Delta x\to 0,\,\Delta y\to 0}\frac{u(x + \Delta x, y + \Delta y) - u(x, y) + i[v(x + \Delta x, y + \Delta y) - v(x, y)]}{\Delta x + i\Delta y}. \tag{6.49}$$

We wish to seek the condition under which df (z)/dz gives the same result regardless
of the order of taking the limits Δx → 0 and Δy → 0. That is, taking the limit Δy → 0
first, we have

$$\frac{df(z)}{dz} = \lim_{\Delta x\to 0}\frac{u(x + \Delta x, y) - u(x, y) + i[v(x + \Delta x, y) - v(x, y)]}{\Delta x} = \frac{\partial u(x, y)}{\partial x} + i\,\frac{\partial v(x, y)}{\partial x}. \tag{6.50}$$

Next, taking the limit Δx → 0 first, we get

$$\frac{df(z)}{dz} = \lim_{\Delta y\to 0}\frac{u(x, y + \Delta y) - u(x, y) + i[v(x, y + \Delta y) - v(x, y)]}{i\Delta y} = -i\,\frac{\partial u(x, y)}{\partial y} + \frac{\partial v(x, y)}{\partial y}. \tag{6.51}$$

Consequently, by equating the real and imaginary parts of (6.50) and (6.51) we
must have

$$\frac{\partial u(x, y)}{\partial x} = \frac{\partial v(x, y)}{\partial y}, \tag{6.52}$$

$$\frac{\partial v(x, y)}{\partial x} = -\frac{\partial u(x, y)}{\partial y}. \tag{6.53}$$

These relationships (6.52) and (6.53) are called the Cauchy–Riemann conditions.
Differentiating (6.52) with respect to x and (6.53) with respect to y and further
subtracting one from the other, we get

$$\frac{\partial^2 u(x, y)}{\partial x^2} + \frac{\partial^2 u(x, y)}{\partial y^2} = 0. \tag{6.54}$$

Similarly we have

$$\frac{\partial^2 v(x, y)}{\partial x^2} + \frac{\partial^2 v(x, y)}{\partial y^2} = 0. \tag{6.55}$$

From the above discussion, we draw several important implications.
(1) From (6.50), we have

$$\frac{df(z)}{dz} = \frac{\partial f}{\partial x}. \tag{6.56}$$

(2) Also from (6.51) we obtain

$$\frac{df(z)}{dz} = -i\,\frac{\partial f}{\partial y}. \tag{6.57}$$

Meanwhile, we have

$$\frac{\partial}{\partial x} = \frac{\partial z}{\partial x}\frac{\partial}{\partial z} + \frac{\partial z^*}{\partial x}\frac{\partial}{\partial z^*} = \frac{\partial}{\partial z} + \frac{\partial}{\partial z^*}. \tag{6.58}$$

Also, we get

$$\frac{\partial}{\partial y} = \frac{\partial z}{\partial y}\frac{\partial}{\partial z} + \frac{\partial z^*}{\partial y}\frac{\partial}{\partial z^*} = i\left(\frac{\partial}{\partial z} - \frac{\partial}{\partial z^*}\right). \tag{6.59}$$

As in the case of Example 6.1, we rewrite f (z) as

$$f(z) \equiv F(z, z^*), \tag{6.60}$$

where the change in the functional form is due to the variables transformation.
Hence, from (6.56) and using (6.58) we have

$$\frac{df(z)}{dz} = \frac{\partial F(z, z^*)}{\partial x} = \left(\frac{\partial}{\partial z} + \frac{\partial}{\partial z^*}\right)F(z, z^*) = \frac{\partial F(z, z^*)}{\partial z} + \frac{\partial F(z, z^*)}{\partial z^*}. \tag{6.61}$$

Similarly, we get

$$\frac{df(z)}{dz} = -i\,\frac{\partial F(z, z^*)}{\partial y} = \left(\frac{\partial}{\partial z} - \frac{\partial}{\partial z^*}\right)F(z, z^*) = \frac{\partial F(z, z^*)}{\partial z} - \frac{\partial F(z, z^*)}{\partial z^*}. \tag{6.62}$$

Equating (6.61) and (6.62), we obtain

$$\frac{\partial F(z, z^*)}{\partial z^*} = 0. \tag{6.63}$$

This clearly shows that if F(z, z*) is differentiable with respect to z, F(z, z*) does
not depend on z*, but only depends upon z. Thus, we may write the differentiable
function as

$$F(z, z^*) \equiv f(z).$$

This is an outstanding characteristic of a complex function that is differentiable
with respect to the complex variable z.
Taking the contraposition of the above statement, we say that if F(z, z*) depends
on z*, it is not differentiable with respect to z. Example 6.1 is one of the illustrations.
Conversely, we consider what happens if the Cauchy–Riemann
conditions are satisfied [5]. Let f (z) be a complex function described by

$$f(z) = u(x, y) + iv(x, y), \tag{6.48}$$

where u(x, y) and v(x, y) satisfy the Cauchy–Riemann conditions and possess con-
tinuous first-order partial derivatives with respect to x and y in some region of ℂ.
Then, we have

$$u(x + \Delta x, y + \Delta y) - u(x, y) = \frac{\partial u(x, y)}{\partial x}\Delta x + \frac{\partial u(x, y)}{\partial y}\Delta y + \varepsilon_1\Delta x + \delta_1\Delta y, \tag{6.64}$$

$$v(x + \Delta x, y + \Delta y) - v(x, y) = \frac{\partial v(x, y)}{\partial x}\Delta x + \frac{\partial v(x, y)}{\partial y}\Delta y + \varepsilon_2\Delta x + \delta_2\Delta y, \tag{6.65}$$

where the four numbers ε₁, ε₂, δ₁, and δ₂ can be made arbitrarily small in magnitude
as Δx and Δy tend to zero. The relations (6.64) and (6.65) result from the continuity
of the first-order partial derivatives of u(x, y) and v(x, y). Then, we have

$$\begin{aligned}
\frac{f(z + \Delta z) - f(z)}{\Delta z}
&= \frac{u(x + \Delta x, y + \Delta y) - u(x, y)}{\Delta z} + \frac{i[v(x + \Delta x, y + \Delta y) - v(x, y)]}{\Delta z} \\
&= \frac{\Delta x}{\Delta z}\left[\frac{\partial u(x, y)}{\partial x} + i\,\frac{\partial v(x, y)}{\partial x}\right] + \frac{\Delta y}{\Delta z}\left[\frac{\partial u(x, y)}{\partial y} + i\,\frac{\partial v(x, y)}{\partial y}\right] \\
&\quad + \frac{\Delta x}{\Delta z}(\varepsilon_1 + i\varepsilon_2) + \frac{\Delta y}{\Delta z}(\delta_1 + i\delta_2) \\
&= \frac{\Delta x}{\Delta z}\left[\frac{\partial u(x, y)}{\partial x} + i\,\frac{\partial v(x, y)}{\partial x}\right] + \frac{\Delta y}{\Delta z}\left[-\frac{\partial v(x, y)}{\partial x} + i\,\frac{\partial u(x, y)}{\partial x}\right] \\
&\quad + \frac{\Delta x}{\Delta z}(\varepsilon_1 + i\varepsilon_2) + \frac{\Delta y}{\Delta z}(\delta_1 + i\delta_2) \\
&= \frac{\Delta x + i\Delta y}{\Delta z}\,\frac{\partial u(x, y)}{\partial x} + \frac{\Delta x + i\Delta y}{\Delta z}\,i\,\frac{\partial v(x, y)}{\partial x} + \frac{\Delta x}{\Delta z}(\varepsilon_1 + i\varepsilon_2) + \frac{\Delta y}{\Delta z}(\delta_1 + i\delta_2) \\
&= \frac{\partial u(x, y)}{\partial x} + i\,\frac{\partial v(x, y)}{\partial x} + \frac{\Delta x}{\Delta z}(\varepsilon_1 + i\varepsilon_2) + \frac{\Delta y}{\Delta z}(\delta_1 + i\delta_2), \tag{6.66}
\end{aligned}$$

where with the third equality we used the Cauchy–Riemann conditions (6.52) and
(6.53). Accordingly, we have

$$\frac{f(z + \Delta z) - f(z)}{\Delta z} - \left[\frac{\partial u(x, y)}{\partial x} + i\,\frac{\partial v(x, y)}{\partial x}\right] = \frac{\Delta x}{\Delta z}(\varepsilon_1 + i\varepsilon_2) + \frac{\Delta y}{\Delta z}(\delta_1 + i\delta_2). \tag{6.67}$$

Taking the absolute values of both sides of (6.67), we get

$$\left|\frac{f(z + \Delta z) - f(z)}{\Delta z} - \left[\frac{\partial u(x, y)}{\partial x} + i\,\frac{\partial v(x, y)}{\partial x}\right]\right| \le \left|\frac{\Delta x}{\Delta z}\right||\varepsilon_1 + i\varepsilon_2| + \left|\frac{\Delta y}{\Delta z}\right||\delta_1 + i\delta_2|, \tag{6.68}$$

where $\left|\frac{\Delta x}{\Delta z}\right| = \frac{|\Delta x|}{\sqrt{(\Delta x)^2 + (\Delta y)^2}} \le 1$ and $\left|\frac{\Delta y}{\Delta z}\right| = \frac{|\Delta y|}{\sqrt{(\Delta x)^2 + (\Delta y)^2}} \le 1$. Taking the limit Δz → 0,
by assumption both the terms of the RHS of (6.68) approach zero. Thus,



$$\lim_{\Delta z\to 0}\frac{f(z + \Delta z) - f(z)}{\Delta z} = \frac{\partial u(x, y)}{\partial x} + i\,\frac{\partial v(x, y)}{\partial x} = \frac{\partial f}{\partial x}. \tag{6.69}$$

Alternatively, using the partial derivatives of u(x, y) and v(x, y) with respect to y we
can rewrite (6.66) as

$$\lim_{\Delta z\to 0}\frac{f(z + \Delta z) - f(z)}{\Delta z} = -i\,\frac{\partial u(x, y)}{\partial y} + \frac{\partial v(x, y)}{\partial y} = -i\,\frac{\partial f}{\partial y}. \tag{6.70}$$

From (6.56) and (6.57), the relations (6.69) and (6.70) imply that for f (z) to be
differentiable requires that the first-order partial derivatives of u(x, y) and v(x, y) with
respect to x and y should exist. Meanwhile, once f (z) is found to be differentiable
(or analytic), its higher order derivatives must be analytic as well (vide infra). This
requires, in turn, that the above first-order partial derivatives should be continuous.
(Note that the analyticity naturally leads to continuity.) Thus, the following theorem
will follow.
Theorem 6.9 Let f (z) be a complex function of a complex variable z such that f (z) = u(x, y) + iv(x, y), where x and y are real variables. Then, a necessary and sufficient condition for f (z) to be differentiable is that the first-order partial derivatives of u(x, y) and v(x, y) with respect to x and y exist and are continuous and that u(x, y) and v(x, y) satisfy the Cauchy–Riemann conditions.
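The Cauchy–Riemann conditions are easy to test symbolically. The following is a minimal sketch (not part of the original text; it assumes a Python environment with sympy installed) that verifies the conditions for the entire function z² and shows that they fail for z z̄ = |z|², which depends on z̄:

```python
# Check the Cauchy-Riemann conditions u_x = v_y and u_y = -v_x for two test
# functions: f = z**2 (analytic) and F = z*conj(z) = |z|**2 (not analytic).
import sympy as sp

x, y = sp.symbols('x y', real=True)
z = x + sp.I*y

def cauchy_riemann_holds(expr):
    u, v = sp.re(expr), sp.im(expr)
    return (sp.simplify(sp.diff(u, x) - sp.diff(v, y)) == 0 and
            sp.simplify(sp.diff(u, y) + sp.diff(v, x)) == 0)

print(cauchy_riemann_holds(sp.expand(z**2)))                # True
print(cauchy_riemann_holds(sp.expand(z*sp.conjugate(z))))   # False: depends on z-bar
```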
Now, we formally give definitions of the differentiability and analyticity of a complex function of a complex variable.

Definition 6.9 Let z_0 be a given point of the complex plane ℂ. Let f (z) be defined in a neighborhood containing z_0. If f (z) is single-valued and differentiable at all points of this neighborhood containing z_0, f (z) is said to be analytic at z_0. A point at which f (z) is analytic is called a regular point of f (z). A point at which f (z) is not analytic is called a singular point of f (z).

We emphasize that the above definition of analyticity at some point z_0 requires the single-valuedness and differentiability at all points of a neighborhood containing z_0. This can be understood from Fig. 6.10b and Example 6.1. The analyticity at a point is deeply connected to how and in which direction we take the limiting process. Thus, we need detailed information about the neighborhood of the point in question to determine whether the function is analytic at that point.

The next definition is associated with a global characteristic of the analytic function.

Definition 6.10 Let R be a region contained in the complex plane ℂ. Let f (z) be a complex function defined in R. If f (z) is analytic at all points of R, f (z) is said to be analytic in R. In this case R is called a domain of analyticity. If the domain of analyticity is the entire complex plane ℂ, the function is called an entire function.

To understand various characteristics of the analytic functions, it is indispensable to introduce Cauchy's integral formula. In the next section we deal with the integration of complex functions.
6.3 Integration of Analytic Functions: Cauchy's Integral Formula

We can define complex integration as a natural extension of the Riemann integral with respect to a real variable. Suppose that there is a curve C in the complex plane. Both ends of the curve are fixed and located at z_a and z_b. Suppose also that individual points of the curve C are described using a parameter t such that

z = z(t) = x(t) + iy(t)   (t_0 ≡ t_a ≤ t ≤ t_n ≡ t_b),   (6.71)

where z_0 ≡ z_a = z(t_a) and z_n ≡ z_b = z(t_b) together with z_i = z(t_i) (0 ≤ i ≤ n). For this parametrization, we assume that C is subdivided into n pieces designated by z_0, z_1, ⋯, z_n. Let f (z) be a complex function defined in a region containing C. In this situation, let us consider the following summation S_n:

S_n = Σ_{i=1}^{n} f(ζ_i)(z_i − z_{i−1}),   (6.72)

where ζ_i lies between z_{i−1} and z_i (1 ≤ i ≤ n). Taking the limit n → ∞ and concomitantly |z_i − z_{i−1}| → 0, we have

lim_{n→∞} S_n = lim_{n→∞} Σ_{i=1}^{n} f(ζ_i)(z_i − z_{i−1}).   (6.73)

If the constant limit exists independent of the choice of z_i (1 ≤ i ≤ n − 1) and ζ_i (1 ≤ i ≤ n), this limit is called a contour integral of f (z) along the curve C. We write it as

I = lim_{n→∞} S_n ≡ ∫_C f(z)dz = ∫_{z_a}^{z_b} f(z)dz.   (6.74)

Using (6.48) and z_i = x_i + iy_i, we rewrite (6.72) as

S_n = Σ_{i=1}^{n} [u(ζ_i)(x_i − x_{i−1}) − v(ζ_i)(y_i − y_{i−1})]
  + Σ_{i=1}^{n} i[v(ζ_i)(x_i − x_{i−1}) + u(ζ_i)(y_i − y_{i−1})].

Taking the limit of n → ∞ as well as |x_i − x_{i−1}| → 0 and |y_i − y_{i−1}| → 0 (0 ≤ i ≤ n), we have

I = ∫_C [u(x, y)dx − v(x, y)dy] + i∫_C [v(x, y)dx + u(x, y)dy].   (6.75)

Further rewriting (6.75), we get

I = ∫_C [u(x, y)(dx/dt) − v(x, y)(dy/dt)]dt + i∫_C [v(x, y)(dx/dt) + u(x, y)(dy/dt)]dt
  = ∫_C [u(x, y) + iv(x, y)](dx/dt)dt + i∫_C [u(x, y) + iv(x, y)](dy/dt)dt
  = ∫_C [u(x, y) + iv(x, y)][(dx/dt) + i(dy/dt)]dt = ∫_C [u(x, y) + iv(x, y)](dz/dt)dt   (6.76)
  = ∫_C f(z)(dz/dt)dt = ∫_{t_a}^{t_b} f(z)(dz/dt)dt = ∫_{z_a}^{z_b} f(z)dz.

Notice that the contour integral I of (6.74), (6.75), or (6.76) depends in general on the path connecting the points z_a and z_b. If so, (6.76) is merely of secondary importance. But if f (z) can be described by the derivative of another function, the situation becomes totally different. Suppose that we have

f(z) = dF̃(z)/dz.   (6.77)

If F̃(z) is analytic, f (z) is analytic as well, because a derivative of an analytic function is again analytic (vide infra). Then, we have

f(z)(dz/dt) = [dF̃(z)/dz](dz/dt) = dF̃[z(t)]/dt.   (6.78)

Inserting (6.78) into (6.76), we get

∫_C f(z)(dz/dt)dt = ∫_{z_a}^{z_b} f(z)dz = ∫_{t_a}^{t_b} {dF̃[z(t)]/dt}dt = F̃(z_b) − F̃(z_a).   (6.79)

In this case F̃(z) is called a primitive function of f (z). It is obvious from (6.79) that

∫_{z_a}^{z_b} f(z)dz = −∫_{z_b}^{z_a} f(z)dz.   (6.80)

This implies that the contour integral I does not depend on the path C connecting the points z_a and z_b. We must be careful, however, about a situation where there is a singular point z_s of f (z) in a region R. Then, f (z) is not defined at z_s. If one chooses C″ for a contour of the return path in such a way that z_s is on C″ (see Fig. 6.11), the RHS of (6.80) cannot exist, nor can (6.80) hold. To avoid that situation, we have to "expel" such a singularity from the region R so that R can wholly be contained in the domain of analyticity of f (z) and so that f (z) can be analytic in the entire region R. Moreover, we must choose R so that the whole region inside any closed path within R contains only points that belong to the domain of analyticity of f (z). That is, the region R must be simply connected in terms of the analyticity of f (z) (see Sect. 6.1). For a region R to be simply connected ensures that (6.80) holds for any pair of points z_a and z_b in R, so far as the integration path with respect to (6.80) is contained in R.

[Fig. 6.11 Region R and several contours for integration. Regarding the symbols and notations, see text]
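Path independence, (6.79) and (6.80), is easy to observe numerically. Here is a minimal sketch (not from the original text; it assumes Python with numpy) that integrates the entire function f(z) = z² from z_a = 0 to z_b = 1 + i along two different paths and compares the results with F̃(z_b) − F̃(z_a) for the primitive F̃(z) = z³/3:

```python
# Contour integrals of f(z) = z**2 along two paths from 0 to 1+1j; both agree
# with the primitive-function value (1+1j)**3/3, cf. (6.79)-(6.80).
import numpy as np

def contour_integral(f, path, n=200001):
    # Riemann sum over a parametrized path z(t), t in [0, 1]
    t = np.linspace(0.0, 1.0, n)
    z = path(t)
    mid = 0.5*(z[1:] + z[:-1])
    return np.sum(f(mid) * np.diff(z))

f = lambda z: z**2
straight = lambda t: (1 + 1j)*t                               # straight segment
bent = lambda t: np.where(t < 0.5, 2*t + 0j, 1 + 1j*(2*t - 1))  # via z = 1

print(contour_integral(f, straight))   # ~ (-2+2j)/3
print(contour_integral(f, bent))       # same value
print((1 + 1j)**3/3)
```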
From (6.80), we further get

∫_{z_a}^{z_b} f(z)dz + ∫_{z_b}^{z_a} f(z)dz = 0.   (6.81)

Since the integral does not depend on the paths, we can take a path from z_b to z_a along a curve C′ (see Fig. 6.11). Thus, we have

∫_C f(z)dz + ∫_{C′} f(z)dz = ∫_{C+C′} f(z)dz = 0,   (6.82)

where C + C′ constitutes a closed curve C̃ as shown. In this way, in accordance with (6.81) we get

∮_{C̃} f(z)dz = 0,   (6.83)

where C̃ is followed counterclockwise according to the custom.

In (6.83) any closed path C̃ can be chosen for the contour within the simply connected region R. Hence, we reach the following important theorem.

Theorem 6.10: Cauchy's Integral Theorem Let R be a simply connected region and let C be an arbitrary closed curve within R. Let f (z) be analytic there. Then, we have

∮_C f(z)dz = 0.   (6.84)

[Fig. 6.12 Region R and contour C for integration. (a) A singular point at z_s is contained within R. (b) The singular point at z_s is not contained within R̃ so that f(z) can be analytic in a simply connected region R̃. Regarding the symbols and notations, see text]

There is a variation in the description of Cauchy's integral theorem. For instance, let f (z) be analytic in R except for a singular point at z_s; see Fig. 6.12a. Then, the domain of analyticity of f (z) is not identical to R, but is defined as R − {z_s}. That is, the domain of analyticity of f (z) is no longer simply connected and, hence, (6.84) is not generally true. In such a case, we may deform the integration path of f (z) so that its domain of analyticity can be simply connected and (6.84) can hold. For example, we take R̃ so that it is surrounded by the curves C and Γ̃ as well as the lines L_1 and L_2 (Fig. 6.12b). As a result, z_s has been expelled from R̃ and, at the same time, R̃ becomes simply connected. Then, as a tangible form of (6.84) we get

∮_{C+Γ̃+L_1+L_2} f(z)dz = 0,   (6.85)

where Γ̃ denotes the clockwise integration (see Fig. 6.12b) and the lines L_1 and L_2 are rendered as close to each other as possible. In (6.85) the integrations along L_1 and L_2 cancel, because they are taken in reverse directions. Thus, we have

∮_{C+Γ̃} f(z)dz = 0  or  ∮_C f(z)dz = ∮_Γ f(z)dz,   (6.86)

where Γ stands for the counterclockwise integration. Equation (6.86) is valid as well when there are more singular points. In that case, instead of (6.86) we have

∮_C f(z)dz = Σ_i ∮_{Γ_i} f(z)dz,   (6.87)
where Γ_i is taken so that it encircles an individual singular point. Reflecting the nature of the singularities, we get different results with (6.87). We will come back to this point later.

On the basis of Cauchy's integral theorem, we show an integral representation of an analytic function that is well known as Cauchy's integral formula.

[Fig. 6.13 Simply connected region R encircled by a closed contour C in a complex plane ζ. A circle Γ of radius ρ centered at z is contained within R. The contour integration is taken in the counterclockwise direction along C or Γ]

Theorem 6.11: Cauchy's Integral Formula Let f (z) be analytic in a simply connected region R. Let C be an arbitrary closed curve within R that encircles z. Then, we have

f(z) = (1/2πi)∮_C [f(ζ)/(ζ − z)]dζ,   (6.88)

where the contour integration along C is taken in the counterclockwise direction.

Proof Let Γ be a circle of radius ρ in the complex plane so that z is the center of the circle (see Fig. 6.13). Then, f (ζ)/(ζ − z) has a singularity at ζ = z. Therefore, we must evaluate (6.88) using (6.86). From (6.86), we have

∮_C [f(ζ)/(ζ − z)]dζ = ∮_Γ [f(ζ)/(ζ − z)]dζ = f(z)∮_Γ dζ/(ζ − z) + ∮_Γ {[f(ζ) − f(z)]/(ζ − z)}dζ.   (6.89)

An arbitrary point ζ on the circle Γ is described as

ζ = z + ρe^{iθ}.   (6.90)

Then, taking infinitesimal quantities of (6.90), we have

dζ = ρe^{iθ}i dθ = (ζ − z)i dθ.   (6.91)

Inserting (6.91) into (6.89), we get
∮_Γ dζ/(ζ − z) = ∫_0^{2π} i dθ = 2πi.   (6.92)

Rewriting (6.89), we obtain

∮_Γ [f(ζ)/(ζ − z)]dζ = 2πi f(z) + ∮_Γ {[f(ζ) − f(z)]/(ζ − z)}dζ.   (6.93)

Since f (z) is analytic, f (z) is uniformly continuous [6]. Therefore, if we make ρ small enough, then for an arbitrary positive number ε we have | f(ζ) − f(z)| < ε when |ζ − z| = ρ. Considering this situation, we have

|(1/2πi)∮_C [f(ζ)/(ζ − z)]dζ − f(z)| = |(1/2πi)∮_Γ [f(ζ)/(ζ − z)]dζ − f(z)|
  = |(1/2πi)∮_Γ {[f(ζ) − f(z)]/(ζ − z)}dζ| ≤ (1/2π)∮_Γ [| f(ζ) − f(z)|/|ζ − z|]|dζ|
  < (ε/2π)∮_Γ |dζ|/|ζ − z| = (ε/2π)∫_0^{2π} dθ = ε.   (6.94)

Therefore, we get (6.88) in the limit of ρ → 0. This completes the proof. ∎


In the above proof, the domain of analyticity R′ of f (ζ)/(ζ − z) (with respect to ζ) is given by R′ = R ∩ [ℂ − {z}] = R − {z}. Since both R and ℂ − {z} are open sets (i.e., ℂ − {z} = {z}^c is the complementary set of the closed set {z} and, hence, an open set), R′ is an open set as well; see Sect. 6.1. Notice that R′ is not simply connected. For this reason, we had to evaluate (6.88) using (6.89).

Theorem 6.10 (Cauchy's integral theorem) and Theorem 6.11 (Cauchy's integral formula) play a crucial role in the theory of analytic functions.
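Cauchy's integral formula lends itself to a direct numerical check. The following sketch (not from the original text; it assumes Python with numpy) evaluates (6.88) on the unit circle for the entire function e^z and recovers the value of the function at an interior point:

```python
# Numerical check of Cauchy's integral formula (6.88) for f(z) = exp(z):
# (1/2*pi*i) * contour sum of f(zeta)/(zeta - z) over the unit circle.
import numpy as np

def cauchy_formula(f, z, center=0.0, radius=1.0, n=4000):
    theta = np.linspace(0.0, 2*np.pi, n, endpoint=False)
    zeta = center + radius*np.exp(1j*theta)
    dzeta = 1j*radius*np.exp(1j*theta)*(2*np.pi/n)   # d(zeta) per step
    return np.sum(f(zeta)/(zeta - z)*dzeta)/(2j*np.pi)

z0 = 0.3 - 0.2j
print(cauchy_formula(np.exp, z0))   # ~ exp(0.3 - 0.2j)
print(np.exp(z0))
```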
To derive (6.94), we can equally use the Darboux inequality [5]. It is intuitively obvious and frequently used in the theory of analytic functions.
Theorem 6.12: Darboux Inequality Let f (z) be a function for which | f (z)| is bounded on C. Here C is a piecewise continuous path in the complex plane. Then, for the integral I described by

I = ∫_C f(z)dz,

we have

|∫_C f(z)dz| ≤ max| f | · L,   (6.95)

where L represents the arc length of the curve C for the contour integration.
Proof As discussed earlier in this section, the integral is the limit as n → ∞ of the sum described by

S_n = Σ_{i=1}^{n} f(ζ_i)(z_i − z_{i−1})   (6.72)

with

I = lim_{n→∞} S_n = lim_{n→∞} Σ_{i=1}^{n} f(ζ_i)(z_i − z_{i−1}).   (6.73)

Denoting the maximum modulus of f (z) on C by max| f |, we have

|S_n| ≤ Σ_{i=1}^{n} | f(ζ_i)| · |z_i − z_{i−1}| ≤ max| f | Σ_{i=1}^{n} |z_i − z_{i−1}|.   (6.96)

The sum Σ_{i=1}^{n} |z_i − z_{i−1}| on the RHS of inequality (6.96) is the length of a polygon inscribed in the curve C. It is shorter than the arc length L of the curve C. Hence, for all n we have

|S_n| ≤ max| f | · L.

As n → ∞, |S_n| = |I| = |∫_C f(z)dz| ≤ max| f | · L. This completes the proof. ∎
Applying the Darboux inequality to (6.94), we get

|∮_Γ {[f(ζ) − f(z)]/(ζ − z)}dζ| = |∮_Γ {[f(ζ) − f(z)]/ρe^{iθ}}dζ| ≤ max|[f(ζ) − f(z)]/ρe^{iθ}| · 2πρ < 2πε.

That is,

|(1/2πi)∮_Γ {[f(ζ) − f(z)]/(ζ − z)}dζ| < ε.   (6.97)

So far, we have dealt with differentiation and integration as different mathematical manipulations. But Cauchy's integral formula (or Cauchy's integral expression) enables us to relate and unify these two manipulations. We have the following proposition for this.
Proposition 6.1 [5] Let C be a piecewise continuous curve of finite length. (The curve C may or may not be closed.) Let f (z) be a continuous function. The contour integration of f (ζ)/(ζ − z) gives f̃(z) such that

f̃(z) = (1/2πi)∫_C [f(ζ)/(ζ − z)]dζ.   (6.98)

Then, f̃(z) is analytic at any point z that does not lie on C.

Proof We consider the following expression:

Δ ≡ |[f̃(z + Δz) − f̃(z)]/Δz − (1/2πi)∫_C [f(ζ)/(ζ − z)²]dζ|,   (6.99)

where Δ is a real non-negative number. Describing (6.99) by use of (6.98) for f̃(z + Δz) and f̃(z), we get

Δ = (|Δz|/2π)|∫_C {f(ζ)/[(ζ − z − Δz)(ζ − z)²]}dζ|.   (6.100)

To obtain (6.100), in combination with (6.98) we have calculated (6.99) as follows:

Δ = (1/2π|Δz|)|∫_C {[f(ζ)(ζ − z)² − f(ζ)(ζ − z − Δz)(ζ − z) − f(ζ)Δz(ζ − z − Δz)]/[(ζ − z − Δz)(ζ − z)²]}dζ|
  = (1/2π|Δz|)|∫_C {f(ζ)(Δz)²/[(ζ − z − Δz)(ζ − z)²]}dζ|.

Since z is not on C, the integrand of (6.100) is bounded. Then, as Δz → 0, |Δz| → 0 and Δ → 0 accordingly. From (6.99), this ensures the differentiability of f̃(z). Then, by definition of the differentiation, df̃(z)/dz is given by

df̃(z)/dz = (1/2πi)∫_C [f(ζ)/(ζ − z)²]dζ.   (6.101)

As f (ζ) is continuous, f̃(z) is single-valued. Thus, f̃(z) is found to be analytic at any point z that does not lie on C. This completes the proof. ∎
If in (6.101) we take C as a closed contour that encircles z, from (6.98) we have

f̃(z) = (1/2πi)∮_C [f(ζ)/(ζ − z)]dζ.   (6.102)

Moreover, if f (z) is analytic in a simply connected region that contains C, from Theorem 6.11 we must have

f(z) = (1/2πi)∮_C [f(ζ)/(ζ − z)]dζ.   (6.103)

Comparing (6.102) with (6.103) and taking account of the single-valuedness of f (z), we must have

f(z) ≡ f̃(z).   (6.104)

Then, from (6.101) we get

df(z)/dz = (1/2πi)∮_C [f(ζ)/(ζ − z)²]dζ.   (6.105)

An analogous result holds for the n-th derivative of f (z) such that [5]

d^n f(z)/dz^n = (n!/2πi)∮_C [f(ζ)/(ζ − z)^{n+1}]dζ   (n: zero or positive integers).   (6.106)

Equation (6.106) implies that an analytic function is infinitely differentiable and that the derivatives of all orders of an analytic function are again analytic. These prominent properties arise partly from the aforementioned stringent requirement on the differentiability of a function of a complex variable.

So far, we have not assumed the continuity of df(z)/dz. However, once we have established (6.106), it assures the existence of, e.g., d²f(z)/dz² and, hence, the continuity of df(z)/dz. The same is true of d^n f(z)/dz^n for any n (zero or positive integers).
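The derivative formula (6.106) can also be checked numerically. Here is a minimal sketch (not from the original text; Python with numpy assumed) that evaluates the contour integral for n = 1 and n = 2 with f(z) = sin z on a small circle around z:

```python
# Numerical check of (6.106): the n-th derivative of f(z) = sin(z) from a
# contour integral (n!/2*pi*i) * sum of f(zeta)/(zeta - z)**(n+1) d(zeta).
import numpy as np
from math import factorial

def nth_derivative(f, z, n, radius=0.5, m=4000):
    theta = np.linspace(0.0, 2*np.pi, m, endpoint=False)
    zeta = z + radius*np.exp(1j*theta)
    dzeta = 1j*radius*np.exp(1j*theta)*(2*np.pi/m)
    return factorial(n)/(2j*np.pi)*np.sum(f(zeta)/(zeta - z)**(n + 1)*dzeta)

z0 = 0.7 + 0.1j
print(nth_derivative(np.sin, z0, 1), np.cos(z0))    # f'(z)  = cos z
print(nth_derivative(np.sin, z0, 2), -np.sin(z0))   # f''(z) = -sin z
```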
The following theorem is important and intriguing in connection with analytic functions.
Theorem 6.13: Cauchy–Liouville Theorem A bounded entire function must be a constant.

Proof Using (6.106), we consider the first derivative of an entire function f (z) described as

df/dz = (1/2πi)∮_C [f(ζ)/(ζ − z)²]dζ.

Since f (z) is an entire function, we can choose an arbitrarily large circle of radius R centered at z for the closed contour C. On the circle, we have
ζ = z + Re^{iθ},

where θ is a real number changing from 0 to 2π. Then, the above equation can be rewritten as

df/dz = (1/2πi)∫_0^{2π} [f(ζ)/(Re^{iθ})²]iRe^{iθ}dθ = (1/2πR)∫_0^{2π} [f(ζ)/e^{iθ}]dθ.   (6.107)

Taking the absolute value of both sides, we have

|df/dz| ≤ (1/2πR)∫_0^{2π} | f(ζ)|dθ ≤ (M/2πR)∫_0^{2π} dθ = M/R,   (6.108)

where M is the maximum of | f (ζ)|. As R → ∞, |df/dz| tends to zero. This implies that df/dz vanishes and, hence, that f (z) is constant. This completes the proof. ∎
At first glance, the Cauchy–Liouville theorem looks astonishing in terms of the theory of real analysis. It is because we are too familiar with, e.g., −1 ≤ sin x ≤ 1 for any real number x. Note that sin x is bounded in the real domain. In fact, in Sect. 8.6 we will show from (8.98) and (8.99) that, when a → ±∞ (a: real, a ≠ 0), sin(π/2 + ia) takes real values with sin(π/2 + ia) → +∞. This simple example clearly shows that sin ϕ is an unbounded entire function in the complex domain. As a very familiar example of a bounded entire function, we show

f(z) = cos²z + sin²z ≡ 1,   (6.109)

which is defined in the entire complex plane.
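The unboundedness of sin z off the real axis is immediately visible numerically. A minimal sketch (not from the original text; Python with numpy assumed):

```python
# sin(pi/2 + i*a) = cosh(a) is real and grows without bound, even though
# |sin x| <= 1 on the real axis.
import numpy as np

for a in [0.0, 1.0, 5.0, 10.0]:
    val = np.sin(np.pi/2 + 1j*a)
    print(a, val, np.cosh(a))   # val is (numerically) real and equals cosh(a)
```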

6.4 Taylor’s Series and Laurent’s Series

Using Cauchy’s integral formula, we study Taylor’s series and Laurent’s series in
relation to the power series expansion of analytic function. Taylor’s series and
Laurent’s series are fundamental tools to study various properties of complex
functions in the theory of analytic functions. First, let us examine Taylor’s series
of an analytic function.
Theorem 6.14 [6] Any analytic function f (z) can be expressed by uniformly
convergent power series at an arbitrary regular point of the function in the domain
of analyticity R .
6.4 Taylor’s Series and Laurent’s Series 217

Fig. 6.14 Diagram to


explain Taylor’s expansion
that is assumed to be ℛ
performed around a. A
circle C of radius r centered
at a is inside R . Regarding
other symbols and notations,
see text

Proof Let us assume that we have a circle C of radius r centered at a within R. Choose z inside C with |z − a| = ρ (ρ < r) and consider the contour integration of f (z) along C (Fig. 6.14). Then, we have

1/(ζ − z) = 1/[ζ − a − (z − a)] = [1/(ζ − a)] · 1/[1 − (z − a)/(ζ − a)]
  = 1/(ζ − a) + (z − a)/(ζ − a)² + (z − a)²/(ζ − a)³ + ⋯,   (6.110)

where ζ is an arbitrary point on the circle C. To derive (6.110), we used the following formula:

1/(1 − x) = Σ_{n=0}^{∞} x^n,

where x = (z − a)/(ζ − a). Since |(z − a)/(ζ − a)| = ρ/r < 1, the geometric series of (6.110) converges. Suppose that f (ζ) is analytic on C. Then, we have a finite positive number M on C such that

| f(ζ)| < M.   (6.111)

Using (6.110), we have

f(ζ)/(ζ − z) = f(ζ)/(ζ − a) + (z − a)f(ζ)/(ζ − a)² + (z − a)²f(ζ)/(ζ − a)³ + ⋯.   (6.112)

Hence, combining (6.111) and (6.112) we get
|Σ_{ν=n}^{∞} (z − a)^ν f(ζ)/(ζ − a)^{ν+1}| ≤ (M/r)Σ_{ν=n}^{∞} (ρ/r)^ν
  = (M/r)(ρ/r)^n/(1 − ρ/r) = [M/(r − ρ)](ρ/r)^n.   (6.113)

The RHS of (6.113) tends to zero as n → ∞. Therefore, (6.112) is uniformly convergent with respect to ζ on C. In the above calculations, when we express the infinite series of the RHS in (6.112) as Σ and the partial sum of its first n terms as Σ_n, the LHS of (6.113) can be expressed as Σ − Σ_n (≡ P_n). Since |P_n| → 0 as n → ∞, this certainly shows that the series Σ is uniformly convergent [6].

Consequently, we can perform termwise integration [6] of (6.112) on C and subsequently divide the result by 2πi to get

(1/2πi)∮_C [f(ζ)/(ζ − z)]dζ = Σ_{n=0}^{∞} [(z − a)^n/2πi]∮_C [f(ζ)/(ζ − a)^{n+1}]dζ.   (6.114)

From Theorem 6.11, the LHS of (6.114) equals f (z). Putting

A_n ≡ (1/2πi)∮_C [f(ζ)/(ζ − a)^{n+1}]dζ,   (6.115)

we get

f(z) = Σ_{n=0}^{∞} A_n(z − a)^n.   (6.116)

This completes the proof. ∎


In the above proof, from (6.106) we have

(1/2πi)∮_C [f(ζ)/(ζ − z)^{n+1}]dζ = (1/n!)d^n f(z)/dz^n.   (6.117)

Hence, combining (6.117) with (6.115) we get

A_n = (1/2πi)∮_C [f(ζ)/(ζ − a)^{n+1}]dζ = (1/n!)[d^n f(z)/dz^n]|_{z=a} = (1/n!)d^n f(a)/dz^n.   (6.118)
[Fig. 6.15 Diagram used for calculating the Taylor's series of f(z) = 1/z around z = a (a ≠ 0). A contour C encircles the point a]

Thus, (6.116) can be rewritten as

f(z) = Σ_{n=0}^{∞} (1/n!)[d^n f(a)/dz^n](z − a)^n.

This is the same form as that obtained with respect to a real variable.

Equation (6.116) is a uniformly convergent power series called a Taylor's series or Taylor's expansion of f (z) with respect to z = a, with A_n being its coefficients. In Theorem 6.14 we have assumed that the union of a circle C and its inside is simply connected. This topological environment is virtually the same as that of Fig. 6.13. In Theorem 6.11 (Cauchy's integral formula) we have shown the integral representation of an analytic function. On the basis of Cauchy's integral formula, Theorem 6.14 demonstrates that any analytic function is given a tangible functional form.

Example 6.2 Let us think of a function f (z) = 1/z. This function is analytic except for z = 0. Therefore, it can be expanded in a Taylor's series in the region ℂ − {0}. We consider the expansion around z = a (a ≠ 0). Using (6.115), the coefficients of the expansion are

A_n = (1/2πi)∮_C {1/[z(z − a)^{n+1}]}dz,

where C is a closed curve that does not contain z = 0 on C or in its inside. Therefore, 1/z is analytic in the simply connected region encircled by C, chosen so that the point z = 0 is not contained inside C (Fig. 6.15). Then, from (6.106) we get

(1/2πi)∮_C {1/[z(z − a)^{n+1}]}dz = (1/n!)[d^n(1/z)/dz^n]|_{z=a} = (1/n!)(−1)^n n!/a^{n+1} = (−1)^n/a^{n+1}.

Hence, from (6.116) we have
f(z) = 1/z = Σ_{n=0}^{∞} [(−1)^n/a^{n+1}](z − a)^n = (1/a)Σ_{n=0}^{∞} (1 − z/a)^n.   (6.119)

The series converges uniformly within a convergence circle, called a "ball" [7] (vide infra), whose convergence radius r is estimated to be

1/r = lim sup_{n→∞} |(−1)^n/a^{n+1}|^{1/n} = lim_{n→∞} (1/|a|)^{(n+1)/n} = 1/|a|.   (6.120)

That is, r = |a|. In (6.120) "lim sup_{n→∞}" stands for the superior limit and is sometimes called limes superior [6]. The estimation is due to the Cauchy–Hadamard theorem. Readers interested in the theorem and related concepts are referred to appropriate literature [6].
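The coefficients of (6.119) are easy to reproduce symbolically. A minimal sketch (not from the original text; Python with sympy assumed) computes A_n = f^(n)(a)/n! for f(z) = 1/z, cf. (6.118):

```python
# Taylor coefficients of 1/z about z = a: expect (-1)**n / a**(n+1).
import sympy as sp

z, a = sp.symbols('z a', nonzero=True)
f = 1/z
for n in range(4):
    A_n = sp.diff(f, z, n).subs(z, a)/sp.factorial(n)
    print(n, sp.simplify(A_n))   # 1/a, -1/a**2, 1/a**3, -1/a**4
```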
The concept of the convergence circle and radius is very important for defining the analyticity of a function. Formally, the convergence radius is defined as the radius of a convergence circle in a metric space, typically ℝ² and ℂ. In ℂ a region R within the convergence circle of convergence radius r (i.e., a real positive number) centered at z_0 is denoted by

R = {z; z, ∃z_0 ∈ ℂ, ρ(z, z_0) < r}.

The region R is an open set by definition (see Sect. 6.1) and the Taylor's series defined in R is convergent. On the other hand, the Taylor's series may or may not be convergent at a point z that lies on the convergence circle. In Example 6.2, f (z) = 1/z is not defined at z = 0, even though z (= 0) is on the convergence circle. At other points on the convergence circle, however, the Taylor's series is convergent.

Note that a region B_n in ℝ^n similarly defined as

B_n = {x; x, ∃x_0 ∈ ℝ^n, ρ(x, x_0) < r}

is sometimes called a ball or open ball. This concept can readily be extended to any metric space.
Next, we examine Laurent's series of an analytic function.

Theorem 6.15 [6] Let f (z) be analytic in a region R except for a point a. Then, f (z) can be described by a uniformly convergent power series within the region R − {a}.

Proof Let us assume that we have a circle C_1 of radius r_1 centered at a and another circle C_2 of radius r_2 centered at a, both within R. Moreover, let another circle Γ be centered at z (≠ a) within R so that z is outside of C_2; see Fig. 6.16.

[Fig. 6.16 Diagram to explain Laurent's expansion that is performed around a. The circle Γ containing the point z lies in the annular region bounded by the circles C_1 and C_2]

In this situation we consider the contour integration along C_1, C_2, and Γ. Using (6.87), we have

(1/2πi)∮_Γ [f(ζ)/(ζ − z)]dζ = (1/2πi)∮_{C_1} [f(ζ)/(ζ − z)]dζ − (1/2πi)∮_{C_2} [f(ζ)/(ζ − z)]dζ.   (6.121)

From Theorem 6.11, we have

LHS of (6.121) = f(z).

Notice that the region containing Γ and its inside forms a simply connected region in terms of the analyticity of f (z). For the first term of the RHS of (6.121), from Theorem 6.14 we have

(1/2πi)∮_{C_1} [f(ζ)/(ζ − z)]dζ = Σ_{n=0}^{∞} A_n(z − a)^n   (6.122)

with

A_n = (1/2πi)∮_{C_1} [f(ζ)/(ζ − a)^{n+1}]dζ   (n = 0, 1, 2, ⋯).   (6.123)

In fact, if ζ lies on the circle C_1, the Taylor's series is uniformly convergent as in the case of the proof of Theorem 6.14.

For the second term of the RHS of (6.121), however, the situation is different in that z lies outside the circle C_2. In this case, from Fig. 6.16 we have

|(ζ − a)/(z − a)| = r_2/ρ < 1.   (6.124)

Then, a geometric series described by
1/(ζ − z) = −1/[z − a − (ζ − a)] = −[1/(z − a)][1 + (ζ − a)/(z − a) + (ζ − a)²/(z − a)² + ⋯]   (6.125)

is uniformly convergent with respect to ζ on C_2. Accordingly, again we can perform termwise integration [6] of (6.125) on C_2. Hence, we get

−∮_{C_2} [f(ζ)/(ζ − z)]dζ = [1/(z − a)]∮_{C_2} f(ζ)dζ + [1/(z − a)²]∮_{C_2} f(ζ)(ζ − a)dζ
  + [1/(z − a)³]∮_{C_2} f(ζ)(ζ − a)²dζ + ⋯.   (6.126)

That is, we have

−(1/2πi)∮_{C_2} [f(ζ)/(ζ − z)]dζ = Σ_{n=−1}^{−∞} A_n(z − a)^n,   (6.127)

where

A_{−n} = (1/2πi)∮_{C_2} f(ζ)(ζ − a)^{n−1}dζ   (n = 1, 2, 3, ⋯).   (6.128)

Consequently, from (6.121), with z lying in the annular region bounded by C_1 and C_2, we get

f(z) = Σ_{n=−∞}^{∞} A_n(z − a)^n.   (6.129)

In (6.129), the coefficients A_n are given by (6.123) or (6.128) according to the plus (including zero) or minus sign of n. These complete the proof. ∎
Equation (6.129) is a uniformly convergent power series called a Laurent's series or Laurent's expansion of f (z) with respect to z = a, with A_n being its coefficients. Note that the above annular region bounded by C_1 and C_2 is not simply connected, but multiply connected.

We add that if the integration path C̃ is taken in the annular region formed by C_1 and C_2 so that it is sandwiched in between them, the values of the integrals expressed as (6.123) and (6.128) remain unchanged in virtue of (6.87). That is, instead of (6.123) and (6.128) we may have the following unified formula

A_n = (1/2πi)∮_{C̃} [f(ζ)/(ζ − a)^{n+1}]dζ   (n = 0, ±1, ±2, ⋯)   (6.130)

that represents the coefficients A_n of the power series

f(z) = Σ_{n=−∞}^{∞} A_n(z − a)^n.   (6.129)
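Laurent expansions can be produced symbolically. A minimal sketch (not from the original text; Python with sympy assumed) expands a function with a second-order pole at z = 0 and exposes the singular part of (6.129):

```python
# Laurent expansion of f(z) = 1/(z**2*(1 - z)) about z = 0:
# the series contains the singular part 1/z**2 + 1/z plus a Taylor part.
import sympy as sp

z = sp.symbols('z')
f = 1/(z**2*(1 - z))
print(sp.series(f, z, 0, 4))
# expected terms: z**(-2) + 1/z + 1 + z + z**2 + z**3 + O(z**4)
```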

6.5 Zeros and Singular Points

We have described the fundamental theorems of Taylor's expansion and Laurent's expansion. Next, we explain the important concepts of zeros and singular points.

Definition 6.11 Let f (z) be analytic in a region R. If f (z) vanishes at a point z = a, the point is called a zero of f (z). When we have

f(a) = [df(z)/dz]|_{z=a} = [d²f(z)/dz²]|_{z=a} = ⋯ = [d^{n−1}f(z)/dz^{n−1}]|_{z=a} = 0,   (6.131)

but

[d^n f(z)/dz^n]|_{z=a} ≠ 0,   (6.132)

the function is said to have a zero of order n at z = a.

If (6.131) and (6.132) hold, the first n coefficients in the Taylor's series of the analytic function f (z) at z = a vanish. Hence, we have

f(z) = A_n(z − a)^n + A_{n+1}(z − a)^{n+1} + ⋯ = (z − a)^n Σ_{k=0}^{∞} A_{n+k}(z − a)^k = (z − a)^n h(z),

where we define h(z) as

h(z) ≡ Σ_{k=0}^{∞} A_{n+k}(z − a)^k.   (6.133)

Then, h(z) is analytic and nonvanishing at z = a. From the analyticity, h(z) must be continuous at z = a and differ from zero in some finite neighborhood of z = a. Consequently, it is also the case with f (z). Therefore, if the set of zeros had an accumulation point at z = a, any neighborhood of z = a would contain another zero, in contradiction to the above statement. To avoid this contradiction, the analytic function must satisfy f (z) ≡ 0 throughout the region R. Taking the contraposition of the above statement, if an analytic function is not identically zero [i.e., f (z) ≢ 0], the zeros of that function are isolated; see the relevant discussion of Sect. 6.1.2.
Meanwhile, the Laurent's series (6.129) can be rewritten as

f(z) = Σ_{k=0}^{∞} A_k(z − a)^k + Σ_{k=1}^{∞} A_{−k}/(z − a)^k   (6.134)

so that the constitution of the Laurent's series can explicitly be recognized. The second term is called a singular part (or principal part) of f (z) at z = a. The singularity is classified as follows:

(1) If (6.134) lacks the singular part (i.e., A_{−k} = 0 for k = 1, 2, ⋯), the singular point is said to be a removable singularity. In this case, we have

f(z) = Σ_{k=0}^{∞} A_k(z − a)^k.   (6.135)

If in (6.135) we suitably define

f(a) ≡ A_0,

f (z) can be regarded as analytic. Examples include the following case, where the sine function is expanded in a Taylor's series:

sin z = z − z³/3! + z⁵/5! − ⋯ + (−1)^{n−1}z^{2n−1}/(2n − 1)! + ⋯ = Σ_{n=1}^{∞} (−1)^{n−1}z^{2n−1}/(2n − 1)!.   (6.136)

Hence, if we define

f(z) ≡ sin z/z  and  f(0) ≡ lim_{z→0} (sin z/z) = 1,   (6.137)

z = 0 is a removable singularity.

(2) Suppose that in (6.134) we have a certain positive integer n such that

A_{−n} ≠ 0  but  A_{−(n+1)} = A_{−(n+2)} = ⋯ = 0;   (6.138)

then the function f (z) is said to have a pole of order n at z = a. If, in particular, n = 1 in (6.138), i.e.,

A_{−1} ≠ 0  but  A_{−2} = A_{−3} = ⋯ = 0,   (6.139)

the function f (z) is said to have a simple pole. This special case is important in the calculation of various integrals (vide infra; see also the sketch following this classification). In the above cases, (6.134) can be rewritten as
f(z) = g(z)/(z − a)^n,   (6.140)

where g(z) defined as

g(z) ≡ Σ_{k=0}^{∞} A_{k−n}(z − a)^k   (6.141)

is analytic and nonvanishing at z = a, i.e., A_{−n} ≠ 0. Explicitly writing (6.140), we have

f(z) = Σ_{k=0}^{∞} A_k(z − a)^k + Σ_{k=1}^{n} A_{−k}/(z − a)^k.

(3) If the singular part of (6.134) comprises an infinite series, the point z = a is called an essential singularity. In that case, f (z) exhibits complicated behavior near or at z = a. Interested readers are referred to suitable literature [5].

A function f (z) that is analytic in a region of ℂ except at a set of points of the region where the function has poles is called a meromorphic function in the said region. The above definition of meromorphic functions covers Cases (1) and (2), but not Case (3). Henceforth, we will be dealing with meromorphic functions.

In the above discussion of Cases (1) and (2), we often deal with a function f (z) that has a single isolated pole at z = a. This implies that f (z) is analytic within a certain neighborhood N_a of z = a but is not analytic at z = a. More specifically, f (z) is analytic in the region N_a − {a}. Even though we consider more than one isolated pole, the situation is essentially the same. Suppose that there is another isolated pole at z = b. In that case, we again take a certain neighborhood N_b of z = b (≠ a) and f (z) is analytic in the region N_b − {b}. Readers may well wonder why we have to discuss this seemingly trifling issue. Nevertheless, think of the situation where a set of poles has an accumulation point. Any neighborhood of the accumulation point contains another pole, where the function is not analytic. This contradicts the statement that f (z) is analytic in a region such as N_a − {a}. Thus, if such a function were present, it would be intractable to deal with.
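The classification into removable singularities and poles can be read off directly from symbolic Laurent expansions. A minimal sketch (not from the original text; Python with sympy assumed):

```python
# sin(z)/z has a removable singularity at z = 0 (no singular part), whereas
# sin(z)/z**4 has a pole of order 3: its Laurent series starts at z**(-3).
import sympy as sp

z = sp.symbols('z')
print(sp.series(sp.sin(z)/z, z, 0, 6))      # 1 - z**2/6 + z**4/120 + O(z**6)
print(sp.series(sp.sin(z)/z**4, z, 0, 2))   # z**(-3) - 1/(6*z) + z/120 + O(z**2)
```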

6.6 Analytic Continuation

When we discussed Cauchy's integral formula (Theorem 6.11), we found that if a function is analytic in a certain region of ℂ and on a curve C that encircles the region, the values of the function within the region are determined once the values of the function on C are given. We have the following theorem related to this.
[Fig. 6.17 Four convergence circles C_1, C_2, C_3, and C_4 for 1/z. These convergence circles are centered at 1, i, −1, or −i]

Theorem 6.16 Let f_1(z) and f_2(z) be two functions that are analytic within a region R. Suppose that the two functions coincide in a set chosen from among (i) a neighborhood of a point z ∈ R, or (ii) a segment of a curve lying in R, or (iii) a set of points containing an accumulation point belonging to R. Then, the two functions f_1(z) and f_2(z) coincide throughout R.

Proof Suppose that f_1(z) and f_2(z) coincide on the above set chosen from the three. Then, since f_1(z) − f_2(z) = 0 on that set, the set comprises the zeros of f_1(z) − f_2(z). This implies that the zeros are not isolated in R. Thus, as evidenced from the discussion of Sect. 6.5, we conclude that f_1(z) − f_2(z) ≡ 0 throughout R. That is, f_1(z) ≡ f_2(z). This completes the proof. ∎

The theorem can be understood as follows: Two different analytic functions cannot coincide in a set chosen from the above three (or, more succinctly, a set that contains an accumulation point). In other words, the behavior of an analytic function in R is uniquely determined by the limited information of a subset belonging to R.

The above statement is reminiscent of the pre-established harmony and, hence, is occasionally talked about mysteriously. Instead, we wish to give a simple example.
Example 6.3 In Example 6.2 we described a tangible illustration of the Taylor's expansion of the function 1/z as follows:

1/z = Σ_{n=0}^{∞} [(−1)^n/a^{n+1}](z − a)^n = (1/a)Σ_{n=0}^{∞} (1 − z/a)^n.   (6.119)

If we choose 1, i, −1, or −i for a, we can draw four convergence circles as depicted in Fig. 6.17. Let each function described by (6.119) be f_1(z), f_i(z), f_{−1}(z), or f_{−i}(z), respectively. Let R be the region that consists of the outer periphery and the inside of the four circles C_1,
C_2, C_3, and C_4 (i.e., convergence circles of radius 1). The said region R is colored pale green in Fig. 6.17. There are four petals as shown.

We can view Fig. 6.17 as follows: For instance, f_1(z) and f_i(z) are defined in C_1 and its inside and in C_2 and its inside, respectively, excluding z = 0. We have f_1(z) = f_i(z) within the petal P_1 (the overlapped part as shown). Consequently, from the statement of Theorem 6.16, f_1(z) ≡ f_i(z) throughout the region encircled by C_1 and C_2 (excluding z = 0). Repeating this procedure, we get

f_1(z) ≡ f_i(z) ≡ f_{−1}(z) ≡ f_{−i}(z).

Thus, we obtain the same functional entity throughout R except for the origin (z = 0). Further continuing similar procedures, we finally reach the entire complex plane except for the origin, i.e., ℂ − {0}, regarding the domain of analyticity of 1/z.
We further compare the above results with those for an analytic function defined in the real domain. The Taylor's expansion of f (x) = 1/x (x: real with x ≠ 0) around a (a ≠ 0) reads as

f(x) = 1/x = f(a) + f′(a)(x − a) + ⋯ + [f^{(n)}(a)/n!](x − a)^n + ⋯ = Σ_{n=0}^{∞} [(−1)^n/a^{n+1}](x − a)^n.   (6.142)

This is exactly the same form as that of (6.119) aside from the notation of the argument.

The aforementioned procedure that determines the behavior of an analytic function outside the region where that function was originally defined is called analytic continuation or analytic prolongation. Comparing (6.119) and (6.142), we can see that (6.119) is a consequence of the analytic continuation of 1/x from the real domain to the complex domain.

6.7 Calculus of Residues

The theorem of residues and its application to the calculus of various integrals are among the central themes of the theory of analytic functions.

Cauchy's integral theorem (Theorem 6.10) tells us that if f (z) is analytic in a simply connected region R, we have

∮_C f(z)dz = 0,   (6.84)

where C is a closed curve within R. As we have already seen, this is not necessarily the case if f (z) has singular points in R. Meanwhile, if we look at (6.130), we become aware of a simple but important fact. That is, replacing n with −1, we have
A_{−1} = (1/2πi)∮_C f(ζ)dζ  or  ∮_C f(ζ)dζ = 2πiA_{−1}.   (6.143)

This implies that if we can obtain the Laurent's series of f (z) with respect to a singularity located at z = a and encircled by C, we can immediately estimate ∮_C f(ζ)dζ of (6.143) from the coefficient of the term (z − a)^{−1}, i.e., A_{−1}. This fact further implies that even though f (z) has singularities, ∮_C f(ζ)dζ may happen to vanish. Thus, whether (6.84) vanishes depends on the nature of f (z) and its singularities. To examine it, we define a residue of f (z) at z = a (that is encircled by C) as

Res f(a) ≡ (1/2πi)∮_C f(z)dz  or  ∮_C f(z)dz = 2πi Res f(a),   (6.144)

where Res f (a) denotes the residue of f (z) at z = a. From (6.143) and (6.144), we have

A_{−1} = Res f(a).   (6.145)

In a formal sense, the point a does not have to be a singular point. If it is a regular point of f (z), we trivially have Res f (a) = 0. If there is more than one singularity, at points z = a_j, we have

∮_C f(z)dz = 2πi Σ_j Res f(a_j).   (6.146)

Notice that we assume isolated singularities in (6.144) and (6.146). These equations are associated with Cases (1) and (2) dealt with in Sect. 6.5. Within this framework, we wish to evaluate the residue of f (z) at a pole of order n located at z = a. We use (6.140):

f(z) = g(z)/(z − a)^n,   (6.140)

where g(z) is analytic and nonvanishing at z = a. Inserting (6.140) into (6.144), we have

Res f(a) = (1/2πi)∮_C [g(ζ)/(ζ − a)^n]dζ = [1/(n − 1)!][d^{n−1}g(z)/dz^{n−1}]|_{z=a}
  = [1/(n − 1)!]{d^{n−1}[f(z)(z − a)^n]/dz^{n−1}}|_{z=a},   (6.147)

where with the second equality we used (6.106).
In the special but important case where f (z) has a simple pole at z = a, from (6.147) we get

Res f(a) = [f(z)(z − a)]|_{z=a}.   (6.148)

Setting n = 1 in (6.147) in combination with (6.140), we obtain

Res f(a) = (1/2πi)∮_C f(z)dz = (1/2πi)∮_C [g(z)/(z − a)]dz = [f(z)(z − a)]|_{z=a} = g(z)|_{z=a}.

This is nothing but Cauchy's integral formula (6.88) applied to g(z).

Summarizing the above discussion, we can calculate the residue of f (z) at a pole of order n located at z = a using one of the following alternatives: (i) using (6.147) or (ii) picking up the coefficient A_{−1} in the Laurent's series described by (6.129).

In Sect. 6.3 we mentioned that the closed contour integral (6.86) or (6.87) depends on the nature of the singularity. Here we have reached a simple criterion for evaluating that integral. In fact, suppose that we have a Laurent's expansion such that

f(z) = Σ_{n=−∞}^{∞} A_n(z − a)^n.   (6.129)

Then, let us calculate the contour integral of f (z) on a closed circle Γ of radius ρ centered at z = a. We have

I ≡ ∮_Γ f(ζ)dζ = Σ_{n=−∞}^{∞} A_n∮_Γ (ζ − a)^n dζ = Σ_{n=−∞}^{∞} iA_nρ^{n+1}∫_0^{2π} e^{i(n+1)θ}dθ.   (6.149)

The above integral vanishes except for n = −1. With n = −1 we get

I = 2πiA_{−1}.

Thus, we recover (6.145) in combination with (6.144). To get a Laurent's series of f (z), however, is not necessarily easy. In that case, we estimate a residue by (6.147). Tangible examples can be seen in the next section.
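Both residue recipes can be exercised symbolically. A minimal sketch (not from the original text; Python with sympy assumed) applies them to f(z) = 1/(1 + z²)³, which reappears in Example 6.5 below, at the third-order pole z = i:

```python
# Residue of 1/(1 + z**2)**3 at z = i by two routes.
import sympy as sp

z = sp.symbols('z')
f = 1/(1 + z**2)**3

# (i) residue as the Laurent coefficient A_{-1}, cf. (6.145)
print(sp.residue(f, z, sp.I))          # -3*I/16, i.e., 3/(16*i)

# (ii) derivative formula (6.147) with pole order n = 3
n = 3
g = sp.cancel(f*(z - sp.I)**n)         # g(z) = 1/(z + i)**3, analytic at z = i
res = sp.diff(g, z, n - 1).subs(z, sp.I)/sp.factorial(n - 1)
print(sp.simplify(res))                # -3*I/16 again
```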
6.8 Examples of Real Definite Integrals

The calculus of residues is of great practical importance. For instance, it is directly associated with the calculation of definite integrals. In particular, even if one finds a real integration hard to perform in order to solve a related problem, it is often easy to solve it using complex integration, especially the calculus of residues. Here we study several examples.

Example 6.4 Let us consider the following real definite integral:

I = ∫_{−∞}^{∞} dx/(1 + x²).   (6.150)

To this end, we convert the real variable x to the complex variable z and evaluate a contour integral I_C (Fig. 6.18) described by

I_C = ∫_{−R}^{R} dz/(1 + z²) + ∫_{Γ_R} dz/(1 + z²),   (6.151)

where R is a sufficiently large real positive number (R ≫ 1); Γ_R denotes the upper semicircle; I_C stands for the contour integral along the closed curve C that comprises the interval [−R, R] and the upper semicircle Γ_R. Simple poles (of order 1) are located at z = ±i as shown. There is only one simple pole at z = i within the upper semicircle of Fig. 6.18. From (6.146), therefore, we have

I_C = ∮_C f(z)dz = 2πi Res f(i),   (6.152)

[Fig. 6.18 Contour for the integration of 1/(1 + z²) that appears in Example 6.4. One may equally choose the upper semicircle (denoted by Γ_R) or the lower semicircle (denoted by Γ̃_R) for the contour integration]

where f (z) ≡ 1/(1 + z²). Using (6.148), the residue of f (z) at z = i can readily be estimated to be
Res f(i) = [f(z)(z − i)]|_{z=i} = 1/2i.   (6.153)

To estimate the integral (6.150), we must calculate the second term of (6.151). To this end, we change the variable z such that

z = Re^{iθ},   (6.154)

where θ is an argument. Then, using the Darboux inequality (6.95), we get

|∫_{Γ_R} dz/(1 + z²)| = |∫_0^{π} iRe^{iθ}dθ/(R²e^{2iθ} + 1)| ≤ ∫_0^{π} [R/|R²e^{2iθ} + 1|]dθ
  = R∫_0^{π} dθ/√(R⁴ + 2R²cos 2θ + 1) ≤ R · π/[R²(1 − 1/R²)] = π/[R(1 − 1/R²)].

Taking R → ∞, we have

∫_{Γ_R} dz/(1 + z²) → 0.   (6.155)

Consequently, if R → ∞, we have

lim_{R→∞} I_C = ∫_{−∞}^{∞} dx/(1 + x²) + lim_{R→∞} ∫_{Γ_R} dz/(1 + z²) = ∫_{−∞}^{∞} dx/(1 + x²) = I
  = 2πi Res f(i) = π,   (6.156)

where with the first equality we placed the variable back to x; with the last equality we used (6.152) and (6.153). Equation (6.156) gives the answer to (6.150). Notice in general that if a meromorphic function is given as a quotient of two polynomials, an integral of the type of (6.155) tends to zero as R → ∞ in the case where the degree of the denominator of that function is at least two units higher than the degree of the numerator [8].

We may equally choose another contour C̃ (the closed curve comprising the interval [−R, R] and the lower semicircle Γ̃_R) for the integration (Fig. 6.18). In that case, we have
lim_{R→∞} I_{C̃} = ∫_{∞}^{−∞} dx/(1 + x²) + lim_{R→∞} ∫_{Γ̃_R} dz/(1 + z²) = −∫_{−∞}^{∞} dx/(1 + x²)
  = 2πi Res f(−i) = −π.   (6.157)

Note that in the above equation the contour integral is taken in the counterclockwise direction and, hence, the real definite integral has been taken from ∞ to −∞. Notice also that

Res f(−i) = [f(z)(z + i)]|_{z=−i} = −1/2i,

because f (z) has a simple pole at z = −i in the lower semicircle (Fig. 6.18), for which the residue has been taken. From (6.157), we get the same result as before, namely

∫_{−∞}^{∞} dx/(1 + x²) = π.

Alternatively, we can use the method of partial fraction decomposition. In that case, as the integrand we have

1/(1 + z²) = (1/2i)[1/(z − i) − 1/(z + i)].   (6.158)

The residue of 1/(1 + z²) at z = i is 1/2i [i.e., the coefficient of 1/(z − i) in (6.158)], giving the same result as (6.153).
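As a quick numerical cross-check of Example 6.4 (not from the original text; Python with numpy and scipy assumed), direct quadrature over the real line reproduces the residue result π:

```python
# Quadrature check of (6.156): int dx/(1 + x**2) over the real line equals pi.
import numpy as np
from scipy.integrate import quad

val, err = quad(lambda x: 1.0/(1.0 + x*x), -np.inf, np.inf)
print(val, np.pi)   # both ~ 3.14159...
```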
Example 6.5 Evaluate the following real definite integral:

I = ∫_{−∞}^{∞} dx/(1 + x²)³.   (6.159)

As in the case of Example 6.4, we evaluate a contour integration described by

I_C = ∮_C dz/(1 + z²)³ = ∫_{−R}^{R} dz/(1 + z²)³ + ∫_{Γ_R} dz/(1 + z²)³,   (6.160)

where C stands for the closed curve comprising the interval [−R, R] and the upper semicircle Γ_R (see Fig. 6.18).

The function f (z) ≡ 1/(1 + z²)³ has two isolated poles at z = ±i. In the present case, however, the poles are of order 3. Hence, we use (6.147) to estimate the residue. The inside of the upper semicircle contains the pole of order 3 at z = i. Therefore, we have
Res f(i) = (1/2πi)∮_C f(z)dz = (1/2πi)∮_C [g(z)/(z − i)³]dz = (1/2!)[d²g(z)/dz²]|_{z=i},   (6.161)

where

g(z) ≡ 1/(z + i)³  and  f(z) = g(z)/(z − i)³.

Hence, we get

(1/2!)[d²g(z)/dz²]|_{z=i} = (1/2)(−3)(−4)(z + i)^{−5}|_{z=i} = (1/2)(−3)(−4)(2i)^{−5}
  = 3/16i = Res f(i).   (6.162)

For a reason similar to that of Example 6.4, the second term of (6.160) vanishes when R → ∞. Then, we obtain

lim_{R→∞} I_C = ∫_{−∞}^{∞} dz/(1 + z²)³ + lim_{R→∞} ∫_{Γ_R} dz/(1 + z²)³ = lim_{R→∞} ∮_C [g(z)/(z − i)³]dz
  = 2πi Res f(i) = 2πi · 3/16i = 3π/8.

That is,

∫_{−∞}^{∞} dz/(1 + z²)³ = ∫_{−∞}^{∞} dx/(1 + x²)³ = I = 3π/8.

If we are able to perform the Laurent's expansion of f (z), we can immediately evaluate the residue using (6.145). To this end, let us utilize the binomial expansion formula (or generalized binomial theorem). We have

f(z) = 1/(1 + z²)³ = 1/[(z + i)³(z − i)³].

Changing the variable as z − i = ζ, we have

f(z) = f̃(ζ) = 1/[ζ³(ζ + 2i)³] = (1/ζ³) · [1/(2i)³] · (1 + ζ/2i)^{−3}
  = (1/ζ³) · [1/(2i)³] · [1 + (−3)(ζ/2i) + [(−3)(−4)/2!](ζ/2i)² + [(−3)(−4)(−5)/3!](ζ/2i)³ + ⋯]
  = −(1/8i)[1/ζ³ − 3/(2iζ²) − 3/(2ζ) + 5/(4i) + ⋯],   (6.163)

where ⋯ of the above equation denotes a power series in ζ. Accompanying the variable transformation from z to ζ, the functional form f has been changed to f̃. Getting back to the original functional form, we have

f(z) = −(1/8i){1/(z − i)³ − 3/[2i(z − i)²] − 3/[2(z − i)] + 5/(4i) + ⋯}.   (6.164)

Equation (6.164) is a Laurent's expansion of f (z). From (6.164) we see that the coefficient of 1/(z − i) in f (z) is 3/16i, in agreement with (6.145) and (6.162). To obtain the answer of (6.159), once again we may equally choose the contour C̃ (Fig. 6.18) and get the same result as in the case of Example 6.4. In that case, f (z) is to be expanded into a Laurent's series around z = −i. Readers are encouraged to check it.
We make a few remarks about the (generalized) binomial expansion formula. We described it in (3.181) of Sect. 3.6.1 such that

(1 + x)^λ = Σ_{m=0}^{∞} \binom{λ}{m} x^m,   (3.181)

where λ is an arbitrary real number. In (3.181), we assumed that x is a real number with |x| < 1. But (6.163) suggests that (3.181) holds for complex x with |x| < 1. Moreover, λ in (3.181) is allowed to be any complex number. Then, on those conditions (1 + x)^λ is analytic because 1 + x ≠ 0, and so (1 + x)^λ must have a convergent Taylor's series. By the same token, the factor (1 + ζ/2i)^{−3} of (6.163) yields a convergent Taylor's series around ζ = 0. In virtue of the factor 1/ζ³, (6.163) gives a convergent Laurent's series around ζ = 0.

Returning to the present issue, we rewrite (3.181) as a Taylor's series such that

(1 + z)^λ = Σ_{m=0}^{∞} \binom{λ}{m} z^m,   (6.165)

where λ is any complex number, with z also being a complex number of |z| < 1. We may view (6.165) as a consequence of analytic continuation, and (6.163) has indeed been dealt with as such.
[Fig. 6.19 Graphical form of f(x) ≡ 1/(1 + x³). The modulus of f(x) tends to infinity at x = −1. Inflection points are present at x = 0 and x = ∛(1/2) (≈ 0.7937)]

Example 6.6 Evaluate the following real definite integral:

I = ∫_{−∞}^{∞} dx/(1 + x³).   (6.166)

We put

f(x) ≡ 1/(1 + x³).   (6.167)

Figure 6.19 depicts the graphical form of f (x). The modulus of f (x) tends to infinity at x = −1. Inflection points are present at x = 0 and x = ∛(1/2) (≈ 0.7937). Now, in the former two examples, the isolated singular points (i.e., poles) existed only in the inside of a closed contour. In the present case, however, a pole is located on the real axis. Bearing in mind this situation, we wish to estimate the integral (6.166).

Rewriting (6.166), we have

I = ∫_{−∞}^{∞} dx/[(1 + x)(x² − x + 1)]  or  I = ∫_{−∞}^{∞} dz/[(1 + z)(z² − z + 1)].

The polynomial of the denominator, i.e., (1 + z)(z² − z + 1), has a real root at z = −1 and complex roots at z = e^{±iπ/3} = (1 ± √3 i)/2. Therefore,

f(z) ≡ 1/[(1 + z)(z² − z + 1)]
has three simple poles, at z = −1 along with z = e^{±iπ/3}. This time, we take the contour C depicted in Fig. 6.20, where the contour integration I_C is performed in such a way that

I_C = ∫_{−R}^{−1−r} dz/[(1 + z)(z² − z + 1)] + ∫_{Γ_{−1}} dz/[(1 + z)(z² − z + 1)]
  + ∫_{−1+r}^{R} dz/[(1 + z)(z² − z + 1)] + ∫_{Γ_R} dz/[(1 + z)(z² − z + 1)],   (6.168)

where Γ_{−1} stands for a small semicircle of radius r around z = −1 as shown. Of the complex roots z = e^{±iπ/3}, only z = e^{iπ/3} is responsible for the contour integration (see Fig. 6.20).

[Fig. 6.20 Contour for the integration of 1/(1 + z³) that appears in Example 6.6]

Here we define a principal value P of the integral such that

P∫_{−∞}^{∞} dx/[(1 + x)(x² − x + 1)]
  ≡ lim_{r→0} {∫_{−∞}^{−1−r} dx/[(1 + x)(x² − x + 1)] + ∫_{−1+r}^{∞} dx/[(1 + x)(x² − x + 1)]},   (6.169)

where r is an arbitrary real positive number. As shown in this example, the principal value is a convenient device to traverse poles on the contour and gives a correct answer for the contour integral. At the same time, letting R go to infinity, we obtain

lim_{R→∞} I_C = P∫_{−∞}^{∞} dx/[(1 + x)(x² − x + 1)] + ∫_{Γ_{−1}} dz/[(1 + z)(z² − z + 1)]
  + ∫_{Γ_∞} dz/[(1 + z)(z² − z + 1)].   (6.170)

Meanwhile, the contour integral I_C is given by (6.146) such that
lim_{R→∞} I_C = ∮_C f(z)dz = 2πi Σ_j Res f(a_j),   (6.146)

where in the present case f (z) ≡ 1/[(1 + z)(z² − z + 1)] and we have only one simple pole, at a_j = e^{iπ/3}, within the contour C.

We are going to get the answer by combining (6.170) and (6.146) such that

I ≡ P∫_{−∞}^{∞} dx/[(1 + x)(x² − x + 1)]
  = 2πi Res f(e^{iπ/3}) − ∫_{Γ_{−1}} dz/[(1 + z)(z² − z + 1)] − ∫_{Γ_∞} dz/[(1 + z)(z² − z + 1)].   (6.171)

The third term of the RHS of (6.171) tends to zero as in the case of Examples 6.4 and 6.5. Therefore, if we can properly estimate the second term as well as the residue, the integral (6.166) can be adequately determined. Note that in this case the integral is meant as the principal value. To estimate the integral of the second term, we make use of the fact that, defining g(z) as

g(z) ≡ 1/(z² − z + 1),   (6.172)

g(z) is analytic at and around z = −1. Hence, g(z) can be expanded in a Taylor's series around z = −1 such that

g(z) = A_0 + Σ_{n=1}^{∞} A_n(z + 1)^n,   (6.173)

where A_0 ≠ 0. In fact,

A_0 = g(−1) = 1/3.   (6.174)

Then, we have

∫_{Γ_{−1}} dz/[(1 + z)(z² − z + 1)] = ∫_{Γ_{−1}} [g(z)/(1 + z)]dz
  = A_0∫_{Γ_{−1}} dz/(1 + z) + ∫_{Γ_{−1}} Σ_{n=1}^{∞} A_n(z + 1)^{n−1}dz.   (6.175)

Changing the variable z to the polar form in the RHS such that
z = −1 + re^{iθ},   (6.176)

for the first term of the RHS we have

A_0∫_{Γ_{−1}} ire^{iθ}dθ/(re^{iθ}) = A_0∫_{Γ_{−1}} i dθ = A_0∫_{π}^{0} i dθ = −A_0iπ.   (6.177)

Note that in (6.177) the contour integration along Γ_{−1} was performed in the decreasing direction of the argument θ, from π to 0. Thus, (6.175) is rewritten as

∫_{Γ_{−1}} dz/[(1 + z)(z² − z + 1)] = −A_0iπ + Σ_{n=1}^{∞} A_n i∫_{π}^{0} r^n e^{inθ}dθ,   (6.178)

where with the last equality we exchanged the order of the summation and integration. The second term of the RHS of (6.178) vanishes when r → 0. Thus, (6.171) is further rewritten as

I = lim_{r→0} I = 2πi Res f(e^{iπ/3}) + A_0iπ.   (6.179)

We have only one simple pole of f (z), at z = e^{iπ/3}, within the semicircular contour C. Then, using (6.148), we have

Res f(e^{iπ/3}) = [f(z)(z − e^{iπ/3})]|_{z=e^{iπ/3}}
  = 1/[(1 + e^{iπ/3})(e^{iπ/3} − e^{−iπ/3})] = −(1/6)(1 + √3 i).   (6.180)

The above calculus is straightforward, but (6.37) can be conveniently used. Thus, finally we get

I = −(πi/3)(1 + √3 i) + πi/3 = √3π/3 = π/√3.   (6.181)

That is, I ≈ 1.8138.   (6.182)

To avoid any ambiguity in the principal value, we properly define it as [8]

P∫_{−∞}^{∞} f(x)dx ≡ lim_{R→∞} ∫_{−R}^{R} f(x)dx.
[Fig. 6.21 Sector of a circle for the contour integration of 1/(1 + z³) that appears in Example 6.7. (a) The integration range is [0, ∞). (b) The integration range is (−∞, 0]]

The integral can also be calculated using a lower semicircle including the simple pole located at z = e^{−iπ/3} = (1 − √3 i)/2. That gives the same answer as (6.181). The calculations are left to readers as an exercise.
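The principal value of Example 6.6 can be checked numerically by folding the integral symmetrically about the pole, which removes the singularity. A minimal sketch (not from the original text; Python with numpy and scipy assumed):

```python
# P int dx/(1 + x**3) = int_0^inf [f(-1+t) + f(-1-t)] dt, and algebra reduces
# the bracket to the regular function 6/((t**2 - 3t + 3)*(t**2 + 3t + 3)).
import numpy as np
from scipy.integrate import quad

folded = lambda t: 6.0/((t*t - 3*t + 3)*(t*t + 3*t + 3))
val, _ = quad(folded, 0.0, np.inf)
print(val, np.pi/np.sqrt(3))   # both ~ 1.8138, cf. (6.181)-(6.182)
```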
Example 6.7 [9, 10] It will be interesting to compare the result of Example 6.6 with that of the following real definite integral:

I = ∫_{0}^{∞} dx/(1 + x³).   (6.183)

To evaluate (6.183), it is convenient to use a sector of a circle for the contour (see Fig. 6.21a). In this case, the contour integral I_C estimated along the sector is described by

I_C = ∫_{0}^{R} dz/(1 + z³) + ∫_{Γ_R} dz/(1 + z³) + ∫_{L} dz/(1 + z³).   (6.184)

In the third integral of (6.184), we take arg z = θ (constant) so that with z = re^{iθ} we may have dz = dr e^{iθ}. Then, that integral is given by

∫_{L} dz/(1 + z³) = ∫_{L} e^{iθ}dr/(1 + r³e^{3iθ}).   (6.185)

Setting 3iθ = 2iπ, namely θ = 2π/3, we get

∫_{L} e^{iθ}dr/(1 + r³e^{3iθ}) = e^{2πi/3}∫_{R}^{0} dr/(1 + r³) = −e^{2πi/3}∫_{0}^{R} dr/(1 + r³).

Thus, taking R → ∞ in (6.184) we have
lim_{R→∞} I_C = ∫_{0}^{∞} dx/(1 + x³) + ∫_{Γ_∞} dz/(1 + z³) − e^{2πi/3}∫_{0}^{∞} dr/(1 + r³).   (6.186)

The second term of (6.186) vanishes as before. Changing the variable r → x and taking account of (6.180), we obtain

2πi Res f(e^{iπ/3}) = (1 − e^{2πi/3})∫_{0}^{∞} dx/(1 + x³) = (1 − e^{2πi/3})I.   (6.187)

Then, we have

I = 2πi Res f(e^{iπ/3})/(1 − e^{2πi/3}) = 2πi/[(1 − e^{2πi/3})(1 + e^{iπ/3})(e^{iπ/3} − e^{−iπ/3})].

To calculate Res f (e^{iπ/3}), we used (6.148) at z = e^{iπ/3}; f (z) has a simple pole at z = e^{iπ/3} within the contour C of Fig. 6.21a. Noting that (1 − e^{2πi/3})(1 + e^{iπ/3}) = 3 and e^{iπ/3} − e^{−iπ/3} = 2i sin(π/3) = √3 i, we get

I = 2πi/(3√3 i) = 2π/(3√3).   (6.188)

That is, I ≈ 1.2092. This number is two-thirds of (6.182).
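A quick numerical cross-check (not from the original text; Python with numpy and scipy assumed) confirms (6.188):

```python
# Quadrature check of Example 6.7: int_0^inf dx/(1 + x**3) = 2*pi/(3*sqrt(3)).
import numpy as np
from scipy.integrate import quad

val, _ = quad(lambda x: 1.0/(1.0 + x**3), 0.0, np.inf)
print(val, 2*np.pi/(3*np.sqrt(3)))   # both ~ 1.2092
```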


We further extend the above calculation method to the evaluation of real definite integrals. For instance, we have the following integral described by

I = P∫_{−∞}^{0} dx/(1 + x³).   (6.189)

To evaluate this integral, we define the contour integral I_{C′} as in Fig. 6.21b, which is expressed as

I_{C′} = P∫_{−R}^{0} dz/(1 + z³) + ∫_{Γ_{−1}} dz/[(1 + z)(z² − z + 1)] + ∫_{L′} dz/(1 + z³) + ∫_{Γ_R} dz/(1 + z³).   (6.190)

Notice that combining I_{C′} of (6.190) with I_C of (6.184) makes up the contour integration of (6.168). Proceeding as before and noting that there is no singularity inside the contour C′, we have

lim_{R→∞} I_{C′} = 0 = P∫_{−∞}^{0} dz/(1 + z³) − πi/3 + e^{2πi/3}∫_{0}^{∞} dr/(1 + r³)
  = P∫_{−∞}^{0} dz/(1 + z³) − π/(3√3),

where we used (6.188) and (6.178) in combination with (6.174). Thus, we get
I = ∫_{−∞}^{0} dx/(1 + x³) = π/(3√3).   (6.191)

As a matter of course, the result of (6.191) can immediately be obtained by subtracting (6.188) from (6.181).

We frequently encounter real definite integrals including trigonometric functions. In such cases, to make a valid estimation of the integrals it is desirable to use complex exponential functions. Jordan's lemma is a powerful tool for this. The proof is given below, following the literature [5].

Lemma 6.6: Jordan's Lemma [5] Let Γ_R be a semicircle of radius R centered at the origin in the upper half of the complex plane. Let f (z) be an analytic function that tends uniformly to zero as |z| → ∞ when arg z lies in the interval 0 ≤ arg z ≤ π. Then, with a non-negative real number α, we have

lim_{R→∞} ∫_{Γ_R} e^{iαζ}f(ζ)dζ = 0.

Proof Using polar coordinates, ζ is expressed as

ζ = Re^{iθ} = R(cos θ + i sin θ).

Then, the integral is denoted by

I_R ≡ ∫_{Γ_R} e^{iαζ}f(ζ)dζ = iR∫_0^{π} f(Re^{iθ})e^{iαR(cos θ + i sin θ)}e^{iθ}dθ
  = iR∫_0^{π} f(Re^{iθ})e^{iαR cos θ − αR sin θ + iθ}dθ.

By assumption f (z) tends uniformly to zero as |z| → ∞, and so we have

| f(Re^{iθ})| < ε(R),

where ε(R) is a certain positive number that depends only on R and tends to zero as |z| → ∞. Therefore, we have

|I_R| < ε(R)R∫_0^{π} e^{−αR sin θ}dθ = 2ε(R)R∫_0^{π/2} e^{−αR sin θ}dθ,

where the last equality results from the symmetry of sin θ about π/2. Meanwhile, we have
sin θ ≥ 2θ/π  when 0 ≤ θ ≤ π/2.

Hence, we have

|I_R| < 2ε(R)R∫_0^{π/2} e^{−2αRθ/π}dθ = [πε(R)/α](1 − e^{−αR}).

Therefore, we get

lim_{R→∞} I_R = 0.

These complete the proof. ∎
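The key estimate is that R∫_0^{π} e^{−αR sin θ}dθ stays bounded (by π/α) for all R, so the prefactor ε(R) → 0 forces I_R → 0. A minimal numerical illustration (not from the original text; Python with numpy and scipy assumed):

```python
# For alpha = 1, R * int_0^pi exp(-R*sin(theta)) d(theta) stays below the
# bound pi*(1 - exp(-R)) from the proof; it actually approaches 2.
import numpy as np
from scipy.integrate import quad

for R in [1.0, 10.0, 100.0]:
    integral, _ = quad(lambda t: np.exp(-R*np.sin(t)), 0.0, np.pi)
    print(R, R*integral, np.pi*(1 - np.exp(-R)))
```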


Alternatively, for f (z) that is analytic and tends uniformly to zero as |z| → ∞ when arg z lies in the interval π ≤ arg z ≤ 2π, we similarly have

lim_{R→∞} ∫_{Γ_R} e^{−iαζ}f(ζ)dζ = 0,

where α is a certain non-negative real number. In this case, we assume that the semicircle Γ_R of radius R centered at the origin lies in the lower half of the complex plane.

Using Jordan's lemma, we study several examples.
Example 6.8 Evaluate the following real definite integral:

I = ∫_{−∞}^{∞} (sin x/x)dx.   (6.192)

Rewriting (6.192) as

I = ∫_{−∞}^{∞} (sin z/z)dz = (1/2i)∫_{−∞}^{∞} [(e^{iz} − e^{−iz})/z]dz,   (6.193)

we consider the following contour integral:

I_C = P∫_{−R}^{R} (e^{iz}/z)dz + ∫_{Γ_0} (e^{iz}/z)dz + ∫_{Γ_R} (e^{iz}/z)dz,   (6.194)

where Γ_0 represents an infinitesimally small semicircle around the origin and Γ_R shows the outer semicircle of radius R centered at the origin (see Fig. 6.22). The contour C consists of the real axis, Γ_0, and Γ_R as shown.

[Fig. 6.22 Contour for the integration of sin z/z that appears in Example 6.8]

hence, can be expanded in the Taylor’s series around any complex number.
Expanding it around the origin we have

X1 ðizÞn X1 ðizÞn
eiz ¼ n¼0 n!
¼ 1 þ n¼1 n!
: ð6:195Þ

As already mentioned in (6.173) through (6.178), among terms of RHS in (6.195)


only the first term, i.e., 1, contributes to the integral of the second term of RHS in
(6.194). Thus, as in (6.178) we get
Z
eiz
dz ¼ 1 iπ ¼ iπ: ð6:196Þ
Γ0 z

Notice that in (6.196) the argument is decreasing from π ! 0 during the contour
integration. Within the contour C, there is no singular point, and so IC ¼ 0 due to
Theorem 6.10 (Cauchy’s integral theorem). Thus, from (6.194) we obtain
Z 1 Z
eiz eiz
P dz ¼  dz ¼ iπ: ð6:197Þ
1 z Γ0 z

Exchanging the variable z !  z in (6.197), we have


Z 1 iz Z 1
e eiz
P  dz ¼ P dz ¼ iπ: ð6:198Þ
1 z 1 z

Summing both sides of (6.197) and (6.198), we get


Z 1 Z 1 iz
eiz e
P dz þ P dz ¼ 2iπ: ð6:199Þ
1 z 1 z

Rewriting (6.199), we have


244 6 Theory of Analytic Functions

Z 1 Z 1
ðeiz  eiz Þ sin z
P dz ¼ 2iP dz ¼ 2iπ: ð6:200Þ
1 z 1 z

That is, the answer is


Z 1 Z 1
sin z sin x
P dz ¼ P dx ¼ I ¼ π: ð6:201Þ
1 z 1 x

Notice that as mentioned in (6.137) z ¼ 0 is a removable singularity of sin z


z , and so
the symbol P is superfluous in (6.201).
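As a practical aside, (6.201) is easy to confirm numerically through the sine integral $\mathrm{Si}(x) = \int_0^x (\sin t/t)\,dt$, which tends to $\pi/2$; a minimal sketch, assuming SciPy is available:

```python
# Quick numerical cross-check of (6.201), a sketch assuming SciPy:
# int_{-inf}^{inf} sin x / x dx = 2 Si(inf) = pi, with Si from scipy.special.
import numpy as np
from scipy.special import sici

si_large, _ = sici(1.0e8)      # Si(x) -> pi/2 as x -> infinity
print(2.0 * si_large, np.pi)   # both ~ 3.14159...
```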
Example 6.9 Evaluate the following real definite integral:

$$I = \int_{-\infty}^{\infty} \frac{\sin^2 x}{x^2}\, dx. \quad (6.202)$$

From the trigonometric theorem, we have

$$\sin^2 x = \frac{1}{2}(1 - \cos 2x). \quad (6.203)$$

Then, the integrand of (6.202) with the variable changed is rewritten using (6.37) as

$$\frac{\sin^2 z}{z^2} = \frac{1}{4}\left[\frac{1 - e^{2iz}}{z^2} + \frac{1 - e^{-2iz}}{z^2}\right]. \quad (6.204)$$

Expanding $e^{2iz}$ as before, we have

$$e^{2iz} = \sum_{n=0}^{\infty} \frac{(2iz)^n}{n!} = 1 + 2iz + \sum_{n=2}^{\infty} \frac{(2iz)^n}{n!}. \quad (6.205)$$

Then, we get

$$1 - e^{2iz} = -2iz - \sum_{n=2}^{\infty} \frac{(2iz)^n}{n!}, \qquad \frac{1 - e^{2iz}}{z^2} = -\frac{2i}{z} - \sum_{n=2}^{\infty} \frac{(2i)^n z^{n-2}}{n!}. \quad (6.206)$$

As in Example 6.8, we wish to use (6.206) to estimate the second term of the following contour integral $I_C$:

$$\lim_{R \to \infty} I_C = \mathrm{P}\!\int_{-\infty}^{\infty} \frac{1 - e^{2iz}}{z^2}\, dz + \int_{\Gamma_0} \frac{1 - e^{2iz}}{z^2}\, dz + \int_{\Gamma_\infty} \frac{1 - e^{2iz}}{z^2}\, dz. \quad (6.207)$$

With the second integral of (6.207), as before, only the first term of (6.206) contributes to the integral. The result is

$$\int_{\Gamma_0} \frac{1 - e^{2iz}}{z^2}\, dz = -2i\int_{\Gamma_0} \frac{1}{z}\, dz = -2i\int_{\pi}^{0} i\, d\theta = -2\pi. \quad (6.208)$$

With the third integral of (6.207), we have

$$\int_{\Gamma_\infty} \frac{1 - e^{2iz}}{z^2}\, dz = \int_{\Gamma_\infty} \frac{1}{z^2}\, dz - \int_{\Gamma_\infty} \frac{e^{2iz}}{z^2}\, dz. \quad (6.209)$$

The first term of the RHS of (6.209) vanishes for a reason similar to that we have already seen in Example 6.4. The second term of the RHS vanishes as well because of Jordan's lemma. Within the contour $C$ of (6.207), again there is no singular point, and so $\lim_{R \to \infty} I_C = 0$. Hence, from (6.207) we have

$$\mathrm{P}\!\int_{-\infty}^{\infty} \frac{1 - e^{2iz}}{z^2}\, dz = -\int_{\Gamma_0} \frac{1 - e^{2iz}}{z^2}\, dz = 2\pi. \quad (6.210)$$

Exchanging the variable $z \to -z$ in (6.210) as before, we have

$$\mathrm{P}\!\left[-\int_{\infty}^{-\infty} \frac{1 - e^{-2iz}}{(-z)^2}\, dz\right] = \mathrm{P}\!\int_{-\infty}^{\infty} \frac{1 - e^{-2iz}}{z^2}\, dz = 2\pi. \quad (6.211)$$

Summing both sides of (6.210) and (6.211) in combination with (6.204), we get the answer described by

$$\int_{-\infty}^{\infty} \frac{\sin^2 z}{z^2}\, dz = \pi. \quad (6.212)$$

Again, the symbol P is superfluous in (6.210) and (6.211).
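The result (6.212) can likewise be checked numerically; the sketch below (ours, assuming NumPy) truncates the range at $X = 10^4$ and adds the tail estimate $2\int_X^{\infty} \sin^2 x/x^2\,dx \approx 1/X$, which follows because $\sin^2 x$ averages to $1/2$:

```python
# Numerical sanity check of (6.212), a sketch assuming NumPy.  The range is
# truncated at X = 1e4; since sin^2 averages to 1/2, the tail beyond X
# contributes about 2 * (1/2)/X.
import numpy as np

x = np.linspace(1.0e-8, 1.0e4, 4_000_001)
f = (np.sin(x) / x) ** 2
body = 2.0 * np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x))  # trapezoidal rule on [0, X]
tail = 1.0 / x[-1]                                        # tail estimate 2*(1/2)/X
print(body + tail, np.pi)                                 # both ~ 3.14159...
```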


Example 6.10 Evaluate the following real definite integral:

$$I = \int_{-\infty}^{\infty} \frac{\cos x}{(x-a)(x-b)}\, dx \quad (a \ne b). \quad (6.213)$$

We rewrite the denominator of (6.213) using the method of partial fraction decomposition such that

$$\frac{1}{(x-a)(x-b)} = \frac{1}{a-b}\left[\frac{1}{x-a} - \frac{1}{x-b}\right]. \quad (6.214)$$

Then, (6.213) is described by

$$I = \frac{1}{a-b}\int_{-\infty}^{\infty} \left[\frac{\cos z}{z-a} - \frac{\cos z}{z-b}\right] dz = \frac{1}{2(a-b)}\int_{-\infty}^{\infty} \left[\frac{e^{iz} + e^{-iz}}{z-a} - \frac{e^{iz} + e^{-iz}}{z-b}\right] dz. \quad (6.215)$$

Fig. 6.23 Contour for the integration of $\frac{\cos z}{(z-a)(z-b)}$ that appears in Example 6.10. (a) Along the upper contour $C$. (b) Along the lower contour $C'$

We wish to consider the integration by dividing the integrand and calculating each term individually. First, we estimate

$$\int_{-\infty}^{\infty} \frac{e^{iz}}{z-a}\, dz. \quad (6.216)$$

To apply Jordan's lemma to the integration of (6.216), we use the upper contour $C$ that consists of the real axis, $\Gamma_a$, and $\Gamma_R$; see Fig. 6.23a. As usual, we consider the following integral:

$$\lim_{R \to \infty} I_C = \mathrm{P}\!\int_{-\infty}^{\infty} \frac{e^{iz}}{z-a}\, dz + \int_{\Gamma_a} \frac{e^{iz}}{z-a}\, dz + \int_{\Gamma_\infty} \frac{e^{iz}}{z-a}\, dz, \quad (6.217)$$

where $\Gamma_a$ represents an infinitesimally small semicircle around $z = a$. Notice that (6.217) is obtained when we consider $R \to \infty$ in Fig. 6.23a. The third term of (6.217) vanishes due to Jordan's lemma. With the second term of (6.217), we change the variable $z - a \to z$ and rewrite it as

$$\int_{\Gamma_a} \frac{e^{iz}}{z-a}\, dz = \int_{\Gamma_0} \frac{e^{i(z+a)}}{z}\, dz = e^{ia}\int_{\Gamma_0} \frac{e^{iz}}{z}\, dz = -e^{ia}\, i\pi, \quad (6.218)$$

where with the last equality we used (6.196). There is no singular point within the contour $C$, and so $\lim_{R \to \infty} I_C = 0$. Then, we have

$$\mathrm{P}\!\int_{-\infty}^{\infty} \frac{e^{iz}}{z-a}\, dz = -\int_{\Gamma_a} \frac{e^{iz}}{z-a}\, dz = e^{ia}\, i\pi. \quad (6.219)$$

Next, we consider the following integral that appears in (6.215):

$$\int_{-\infty}^{\infty} \frac{e^{-iz}}{z-a}\, dz. \quad (6.220)$$

To apply Jordan's lemma to the integration of (6.220), we use the lower contour $C'$ that consists of the real axis, $\tilde{\Gamma}_a$, and $\tilde{\Gamma}_R$ (Fig. 6.23b). This time, the integral to be evaluated is given by

$$\lim_{R \to \infty} I_{C'} = -\mathrm{P}\!\int_{-\infty}^{\infty} \frac{e^{-iz}}{z-a}\, dz + \int_{\tilde{\Gamma}_a} \frac{e^{-iz}}{z-a}\, dz + \int_{\tilde{\Gamma}_\infty} \frac{e^{-iz}}{z-a}\, dz \quad (6.221)$$

so that we can trace the contour $C'$ counterclockwise. Notice that the minus sign is present in front of the principal value. The third term of (6.221) vanishes due to Jordan's lemma. Evaluating the second term of (6.221) similarly as just above, we get

$$\int_{\tilde{\Gamma}_a} \frac{e^{-iz}}{z-a}\, dz = \int_{\tilde{\Gamma}_0} \frac{e^{-i(z+a)}}{z}\, dz = e^{-ia}\int_{\tilde{\Gamma}_0} \frac{e^{-iz}}{z}\, dz = -e^{-ia}\, i\pi. \quad (6.222)$$

Hence, we obtain

$$\mathrm{P}\!\int_{-\infty}^{\infty} \frac{e^{-iz}}{z-a}\, dz = \int_{\tilde{\Gamma}_a} \frac{e^{-iz}}{z-a}\, dz = -e^{-ia}\, i\pi. \quad (6.223)$$

Notice that in (6.222) the argument is again decreasing from $0 \to -\pi$ during the contour integration. Summing (6.219) and (6.223), we get

$$\mathrm{P}\!\int_{-\infty}^{\infty} \frac{e^{iz}}{z-a}\, dz + \mathrm{P}\!\int_{-\infty}^{\infty} \frac{e^{-iz}}{z-a}\, dz = \mathrm{P}\!\int_{-\infty}^{\infty} \frac{e^{iz} + e^{-iz}}{z-a}\, dz = e^{ia}\, i\pi - e^{-ia}\, i\pi = i\pi\left(e^{ia} - e^{-ia}\right) = i\pi \cdot 2i\sin a = -2\pi\sin a.$$

In a similar manner, we also get

$$\mathrm{P}\!\int_{-\infty}^{\infty} \frac{e^{iz} + e^{-iz}}{z-b}\, dz = -2\pi\sin b.$$

Thus, the final answer is

$$I = \frac{1}{2(a-b)}\int_{-\infty}^{\infty} \left[\frac{e^{iz} + e^{-iz}}{z-a} - \frac{e^{iz} + e^{-iz}}{z-b}\right] dz = \frac{2\pi(\sin b - \sin a)}{2(a-b)} = \frac{\pi(\sin b - \sin a)}{a-b}. \quad (6.224)$$
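The closed form (6.224) can be verified numerically; the sketch below (ours, assuming NumPy, with illustrative values $a = 0.7$ and $b = -1.3$) evaluates the principal values via the symmetric combination $[\cos(p+u) - \cos(p-u)]/u$, which is regular at $u = 0$:

```python
# Numerical cross-check of (6.224), a sketch assuming NumPy; a = 0.7 and
# b = -1.3 are illustrative values.  The principal value around each pole is
# computed from the symmetric combination [cos(p+u) - cos(p-u)]/u, which is
# regular at u = 0.
import numpy as np

def pv_cos_over(pole, L=4000.0, n=4_000_001):
    # P int_{-L}^{L} cos(x)/(x - pole) dx; the tail beyond L is of order 1/L
    u = np.linspace(1.0e-9, L, n)
    g = (np.cos(pole + u) - np.cos(pole - u)) / u
    return np.sum(0.5 * (g[1:] + g[:-1]) * np.diff(u))

a, b = 0.7, -1.3
I = (pv_cos_over(a) - pv_cos_over(b)) / (a - b)      # partial fractions (6.214)
print(I, np.pi * (np.sin(b) - np.sin(a)) / (a - b))  # both ~ -2.5255
```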

Example 6.11 Evaluate the following real definite integral:

$$I = \int_{-\infty}^{\infty} \frac{\cos x}{1+x^2}\, dx. \quad (6.225)$$

The calculation is left for readers. Hints: (i) Use $\cos x = (e^{ix} + e^{-ix})/2$ and Jordan's lemma. (ii) Use the contour shown in Fig. 6.18 for the integration. Besides the examples listed above, a variety of integral calculations can be seen in the literature [9–12].
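For readers who wish to check their answer to Example 6.11: the hinted contour calculation yields $\pi/e$, which a short numerical sketch (ours, assuming SciPy; with `weight='cos'` and an infinite upper limit, `quad` computes the Fourier integral $\int_0^\infty f(x)\cos\omega x\,dx$) reproduces:

```python
# Numeric check for Example 6.11, a sketch assuming SciPy; the hinted
# contour evaluation gives pi/e.
import numpy as np
from scipy.integrate import quad

val, _ = quad(lambda x: 1.0 / (1.0 + x * x), 0.0, np.inf, weight='cos', wvar=1.0)
print(2.0 * val, np.pi / np.e)   # both ~ 1.15573 (the integrand is even)
```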

6.9 Multivalued Functions and Riemann Surfaces

The last topics of the theory of analytic functions are related to multivalued functions. Riemann surfaces play a central role in developing the theory of multivalued functions.

6.9.1 Brief Outline

We give a brief outline of the notions of multivalued functions and Riemann surfaces. These notions are indispensable for investigating the properties of irrational functions and logarithmic functions and for performing related calculations, especially integrations with respect to those functions.

Definition 6.12 Let $n$ be a positive integer and let $z$ be an arbitrary complex number. Suppose we have

$$z = w^n. \quad (6.226)$$

Then, $w$ is said to be an $n$-th root of $z$ and we denote

$$w = z^{1/n}. \quad (6.227)$$

We have expressly shown this simple thing. It is because, if we think of real $z$, $z^{1/n}$ is uniquely determined only when $n$ is odd, regardless of whether $z$ is positive or negative. In case we think of even $n$, (i) if $z$ is positive, we have two $n$-th roots $z^{1/n}$; (ii) if $z$ is negative, on the other hand, we have no real $n$-th root. Meanwhile, if we think of complex $z$, we have a totally different situation. That is, regardless of whether $n$ is odd or even, we always have $n$ different values of $z^{1/n}$ for any given complex number $z$.

As a tangible example, let us think of the $n$-th roots of $\pm 1$ for $n = 2$ and 3. These are depicted graphically in Fig. 6.24. We can readily understand the statement of the above paragraph. In Fig. 6.24a, if $z = -1$ for $n = 2$, we have either $w = (-1)^{1/2} = (e^{i\pi})^{1/2} = i$ or $w = (-1)^{1/2} = (e^{-i\pi})^{1/2} = -i$ with the square roots. In Fig. 6.24b, if $z = -1$ for $n = 3$, we have $w = (-1)^{1/3} = -1$; $w = (-1)^{1/3} = (e^{i\pi})^{1/3} = e^{i\pi/3}$; $w = (-1)^{1/3} = (e^{-i\pi})^{1/3} = e^{-i\pi/3}$ with the cubic roots. Moreover, we have $1^{1/2} = 1, -1$. Also, we have $1^{1/3} = 1, e^{2i\pi/3}, e^{-2i\pi/3}$.

Fig. 6.24 Multiple roots in the complex plane. (a) Square roots of 1 and $-1$. (b) Cubic roots of 1 and $-1$
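These $n$ distinct roots are straightforward to enumerate numerically; a minimal sketch (ours, assuming NumPy):

```python
# Numerical enumeration of the n-th roots, a sketch assuming NumPy:
# for z = r e^{i theta}, the n roots are r^{1/n} e^{i(theta + 2 pi k)/n}.
import numpy as np

def nth_roots(z, n):
    r, theta = abs(z), np.angle(z)
    k = np.arange(n)
    return r ** (1.0 / n) * np.exp(1j * (theta + 2.0 * np.pi * k) / n)

print(nth_roots(-1.0, 2))   # approximately [i, -i]
print(nth_roots(-1.0, 3))   # e^{i pi/3}, -1, e^{-i pi/3} (up to ordering)
print(nth_roots(1.0, 3))    # 1, e^{2 i pi/3}, e^{-2 i pi/3} (up to ordering)
```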
Let us extend the above preliminary discussion to the analytic functions. Suppose we are given the following polar form of $z$ such that

$$z = r(\cos\theta + i\sin\theta) \quad (6.32)$$

and

$$w = \rho(\cos\varphi + i\sin\varphi).$$

Then, we have

$$z = w^n = \rho^n(\cos\varphi + i\sin\varphi)^n = \rho^n(\cos n\varphi + i\sin n\varphi), \quad (6.228)$$

where with the last equality we used de Moivre's theorem (6.36). Comparing the real and imaginary parts of (6.32) and (6.228), we have

$$r = \rho^n \quad \text{or} \quad \rho = r^{1/n} \quad (6.229)$$

and

$$\theta = n\varphi + 2k\pi \quad (k = 0, \pm 1, \pm 2, \cdots). \quad (6.230)$$

In (6.229), we define both $r$ and $\rho$ as being positive so that a positive $\rho$ can be uniquely determined for any positive integer $n$ (i.e., either even or odd). Recall once again that if $n$ is even, we have two $n$-th roots for $\rho$ (i.e., both positive and negative) with given positive $r$. Of these two roots, we discard the negative $\rho$. Rewriting (6.230), we have

$$\varphi = \frac{1}{n}(\theta - 2k\pi) \quad (k = 0, \pm 1, \pm 2, \cdots).$$

Further rewriting (6.227), we get

$$w_k(\theta) = z^{1/n} = r^{1/n}\left[\cos\frac{1}{n}(\theta - 2k\pi) + i\sin\frac{1}{n}(\theta - 2k\pi)\right] \quad (k = 0, 1, 2, \cdots, n-1), \quad (6.231)$$

where $w_k(\theta)$ is said to be a branch. Note that we have

$$w_k(\theta - 2n\pi) = w_k(\theta). \quad (6.232)$$

That is, $w_k(\theta)$ is a periodic function of period $2n\pi$. The number of total branches is $n$; the index $k$ of $w_k(\theta)$ denotes these different branches. In particular, when $k = 0$, we have

$$w_0(\theta) = z^{1/n} = r^{1/n}\left(\cos\frac{\theta}{n} + i\sin\frac{\theta}{n}\right). \quad (6.233)$$

Comparing (6.231) and (6.233), we obtain


$$w_k(\theta) = w_0(\theta - 2k\pi). \quad (6.234)$$

From (6.234), we find that $w_k(\theta)$ is obtained by shifting (or rotating) $w_0(\theta)$ by $2k\pi$ toward the positive direction of $\theta$. The branch $w_0(\theta)$ is called the principal branch of the $n$-th root of $z$. The value of $w_0(\theta)$ is called the principal value of that branch. The implication of the presence of the branches is as follows: Suppose that the branches besides $w_0(\theta)$ were absent. Then, from (6.233) we have

$$w_0(2\pi) = z^{1/n} = r^{1/n}\left(\cos\frac{2\pi}{n} + i\sin\frac{2\pi}{n}\right) \ne w_0(0) = r^{1/n}.$$

This causes inconvenience, because $\theta = 0$ and $\theta = 2\pi$ are assumed to be identical in the complex plane. Hence, the single-valuedness would be broken with $w_0(\theta)$. But, in virtue of the presence of $w_1(\theta)$, we have

$$w_1(2\pi) = w_0(2\pi - 2\pi) = w_0(0).$$

This means that the value of $w_0(0)$ is recovered and, hence, the single-valuedness remains intact. After another cycle around the origin, similarly we have

$$w_2(4\pi) = w_0(4\pi - 4\pi) = w_0(0).$$

Thus, in succession we get

$$w_k(2k\pi) = w_0(2k\pi - 2k\pi) = w_0(0).$$

For $k = n$, from (6.231) we have

$$w_n(\theta) = z^{1/n} = r^{1/n}\left[\cos\frac{1}{n}(\theta - 2n\pi) + i\sin\frac{1}{n}(\theta - 2n\pi)\right] = r^{1/n}\left[\cos\left(\frac{\theta}{n} - 2\pi\right) + i\sin\left(\frac{\theta}{n} - 2\pi\right)\right] = r^{1/n}\left(\cos\frac{\theta}{n} + i\sin\frac{\theta}{n}\right) = w_0(\theta).$$

Thus, we have no more new functions. At the same time, the single-valuedness of $w_k(\theta)$ $(k = 0, 1, 2, \cdots, n-1)$ remains intact during these processes. To summarize the above discussion, if we have $n$ planes (or sheets) as the complex planes and allocate them to $w_0(\theta), w_1(\theta), \cdots, w_{n-1}(\theta)$ individually, we have a single-valued function $w(\theta)$ as a whole throughout these planes. In a word, the following relation represents the situation:
$$w(\theta) = \begin{cases} w_0(\theta) & \text{for Plane } 0, \\ w_1(\theta) & \text{for Plane } 1, \\ \quad\cdots & \\ w_{n-1}(\theta) & \text{for Plane } n-1. \end{cases}$$

This is the essence of the Riemann surface. The superposition of these $n$ planes is called a Riemann surface and each plane is said to be a Riemann sheet [of the function $w(\theta)$]. Each single-valued function $w_0(\theta), w_1(\theta), \cdots, w_{n-1}(\theta)$ defined on each Riemann sheet is called a branch of $w(\theta)$. In the above discussion, the origin is called a branch point of $w(z)$.

For simplicity, let us think of the function $w(z)$ described by

$$w(z) \equiv z^{1/2} = \sqrt{z}.$$

In this case, for $z = re^{i\theta}$ $(0 \le \theta \le 2\pi)$ we have two different values $w_0$ and $w_1$ given by

$$w_0(\theta) = r^{1/2} e^{i\theta/2}, \qquad w_1(\theta) = r^{1/2} e^{i(\theta - 2\pi)/2} = w_0(\theta - 2\pi) = -w_0(\theta). \quad (6.235)$$

Then, we have

$$w(\theta) = \begin{cases} w_0(\theta) & \text{for Plane } 0, \\ w_1(\theta) & \text{for Plane } 1. \end{cases} \quad (6.236)$$

In this case, $w(z)$ is called a "two-valued" function of $z$ and the functions $w_0$ and $w_1$ are said to be branches of $w(z)$. Suppose that $z$ makes a counterclockwise circuit around the origin, starting from, e.g., a real positive number $z = z_0$, and comes full circle back to the original point $z = z_0$. Then, the argument of $z$ has been increased by $2\pi$. In this situation the arguments of the individual branches $w_0$ and $w_1$ are increased by $\pi$. Accordingly, $w_0$ is switched to $w_1$ and $w_1$ is switched to $w_0$. This situation can be understood more clearly from (6.232) and (6.234). That is, putting $n = 2$ in (6.232), we have

$$w_k(\theta - 4\pi) = w_k(\theta) \quad (k = 0, 1). \quad (6.237)$$

Meanwhile, putting $k = 1$ in (6.234) we get

$$w_1(\theta) = w_0(\theta - 2\pi). \quad (6.238)$$

Replacing $\theta$ with $\theta + 2\pi$ in (6.238), we have $w_1(\theta + 2\pi) = w_0(\theta)$. Also, replacing $\theta$ with $\theta + 4\pi$ in (6.237), we have $w_1(\theta + 4\pi) = w_1(\theta) = w_0(\theta - 2\pi) = w_0(\theta + 2\pi)$, where with the last equality we replaced $\theta$ with $\theta + 2\pi$ in (6.237). Rewriting the above, we get

$$w_1(\theta + 2\pi) = w_0(\theta) \quad \text{and} \quad w_0(\theta + 2\pi) = w_1(\theta).$$

That is, adding $2\pi$ to $\theta$, $w_0$ and $w_1$ are switched to each other.
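This branch switching can be traced numerically; the following sketch (ours, assuming NumPy) follows $w_0(\theta) = e^{i\theta/2}$ along the unit circle $r = 1$ and shows the sign flip after one full circuit:

```python
# Numerical trace of the branch switching of w(z) = z^{1/2}, a sketch
# assuming NumPy.  Following w_0(theta) = e^{i theta/2} (r = 1) continuously
# around the origin flips its sign after one full circuit.
import numpy as np

theta = np.linspace(0.0, 2.0 * np.pi, 7)
w0 = np.exp(1j * theta / 2.0)
print(w0[0], w0[-1])                 # (1+0j) versus (-1+0j)
print(np.allclose(w0[-1], -w0[0]))   # True: w_0(theta + 2 pi) = w_1(theta) = -w_0(theta)
```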


Let us further think of the two-valued function $w(z) = z^{1/2}$. Strictly speaking, an analytic function cannot be a two-valued function. This is because, if it were, the continuity and differentiability would be lost from the function and, hence, the analyticity would be broken. Then, we must make a suitable device to avoid this. Such a device is called a Riemann surface.

Let us make a kit of the Riemann surface following Fig. 6.25. (i) Take a sheet of paper so that it can represent a complex plane and cut it with scissors along the real axis starting from the origin so that the cut (or slit) is made toward the real positive direction. This cut is called a branch cut, with the origin being a branch point. Let us call this sheet Plane I. (ii) Take another sheet of paper and call it Plane II. Also make a cut in Plane II in exactly the same way as that for Plane I; see Fig. 6.25a for the processes (i) and (ii). (iii) Next, put Plane I on top of Plane II so that the two branch cuts fit in line. (iv) Tape together the downside of the cut of Plane I and the foreside of the cut of Plane II (Fig. 6.25b). (v) Then, also tape together the downside of the cut of Plane II and the foreside of the cut of Plane I (see Fig. 6.25b once again).

Fig. 6.25 Simple kit that helps visualize the Riemann surface. To make it, follow the next procedures: (a) Take two sheets of paper (Planes I and II) and make slits (indicated by dashed lines) as shown. Next, put Plane I on top of Plane II so that the two slits (i.e., branch cuts) can fit in line. (b) Tape together the downside of the cut of Plane I and the foreside of the cut of Plane II. Then, also tape together the downside of the cut of Plane II and the foreside of the cut of Plane I

Thus, what we see is that, starting from, e.g., a real positive number $z = z_0$ of Plane I and coming full circle to the original point $z = z_0$, we cross the cut to enter Plane II located underneath Plane I. After another cycle within Plane II, we come back to Plane I again by crossing the cut. After all, we come back to the original Plane I after two cycles on the combined planes. This combined plane is called the Riemann surface. In this way, Planes I and II correspond to the different branches $w_0$ and $w_1$, respectively, so that each branch can be single-valued. In other words, $w(z) = z^{1/2}$ is a single-valued function that is defined on the whole Riemann surface.
There are a variety of Riemann surfaces according to the nature of the complex functions. Another example of a multivalued function is expressed as

$$w(z) = \sqrt{(z-a)(z-b)},$$

where the two branch points are located at $z = a$ and $z = b$. As before, we assume that

$$z - a = r_a e^{i\theta_a} \quad \text{and} \quad z - b = r_b e^{i\theta_b}.$$

Then, we have

$$w(\theta_a, \theta_b) = [(z-a)(z-b)]^{1/2} = \sqrt{r_a r_b}\, e^{i(\theta_a + \theta_b)/2}. \quad (6.239)$$

In the above, e.g., the argument $\theta_a$ can be decided graphically as shown in Fig. 6.26, where $\theta_a$ is the angle between a line connecting $z$ and $a$ and another line drawn in parallel with the real axis.

Fig. 6.26 Graphically decided argument $\theta_a$, which is the angle between a line connecting $z$ and $a$ and another line drawn in parallel with the real axis

We can choose $w(\theta_a, \theta_b)$ of (6.239) for the principal branch and define it as

$$w_0(\theta_a, \theta_b) \equiv \sqrt{r_a r_b}\, e^{i(\theta_a + \theta_b)/2}. \quad (6.240)$$

Also, we define $w_1(\theta_a, \theta_b)$ as

$$w_1(\theta_a, \theta_b) \equiv \sqrt{r_a r_b}\, e^{i(\theta_a + \theta_b - 2\pi)/2}. \quad (6.241)$$

Then, as in (6.235) we have

$$w_1(\theta_a, \theta_b) = w_0(\theta_a - 2\pi, \theta_b) = w_0(\theta_a, \theta_b - 2\pi) = -w_0(\theta_a, \theta_b). \quad (6.242)$$


Fig. 6.27 Geometrical relationship between the contour $C$ and the two branch points (located at $z = a$ and $z = b$) for $\sqrt{(z-a)(z-b)}$. (a) The contour $C$ encircles only $z = a$. (b) $C$ encircles both $z = a$ and $z = b$. (c) $C$ encircles neither $z = a$ nor $z = b$

From (6.242), we find that adding $2\pi$ to $\theta_a$ or $\theta_b$, $w_0$ and $w_1$ are switched to each other as before. We also find that after the variable $z$ has come full circle around one of $a$ and $b$, $w_0(\theta_a, \theta_b)$ changes sign as in the case of (6.235).

We also have

$$w_0(\theta_a - 2\pi, \theta_b - 2\pi) = w_0(\theta_a, \theta_b), \qquad w_1(\theta_a - 2\pi, \theta_b - 2\pi) = w_1(\theta_a, \theta_b). \quad (6.243)$$

From (6.243), on the other hand, after the variable $z$ has come full circle around both $a$ and $b$, both $w_0(\theta_a, \theta_b)$ and $w_1(\theta_a, \theta_b)$ keep their original values. This is also the case where the variable $z$ comes full circle without encircling $a$ or $b$. These behaviors imply that (i) if a contour encircles one of $z = a$ and $z = b$, $w_0(\theta_a, \theta_b)$ and $w_1(\theta_a, \theta_b)$ change sign and switch to each other. Thus, $w_0(\theta_a, \theta_b)$ and $w_1(\theta_a, \theta_b)$ form branches. On the other hand, (ii) if a contour encircles both $z = a$ and $z = b$, $w_0(\theta_a, \theta_b)$ and $w_1(\theta_a, \theta_b)$ remain intact. (iii) If a contour encircles neither $z = a$ nor $z = b$, $w_0(\theta_a, \theta_b)$ and $w_1(\theta_a, \theta_b)$ remain intact as well.

These three cases (i), (ii), and (iii) are depicted in Fig. 6.27a–c, respectively. In Fig. 6.27c, after $z$ comes full circle, $\arg z$ relative to $z = a$ returns to the original value, keeping $\theta_1 \le \arg z \le \theta_2$. This is similarly the case with $\arg z$ relative to $z = b$.
Correspondingly, the branch cut(s) are depicted with doubled broken line(s), e.g., as in Fig. 6.28. Note that in the cases (ii) and (iii) the branch cut can be chosen so that the circle (or contour) does not cross the branch cut. Although a bit complicated, the Riemann surface can be shaped accordingly. Other choices of the branch cuts are shown in Sect. 6.9.2 (vide infra).

Fig. 6.28 Branch cut(s) for $\sqrt{(z-a)(z-b)}$. (a) A branch cut is shown by a line connecting $z = a$ and $z = b$. (b) Branch cuts are shown by a line connecting $z = a$ and $z = -\infty$ and another line connecting $z = b$ and $z = +\infty$. In both (a, b) the branch cut(s) are depicted with doubled broken line(s)

When we consider logarithmic functions, we have to deal with a rather complicated situation. For the polar coordinate representation, we have $z = re^{i\theta}$. That is,

$$\ln z = \ln r + i\arg z = \ln|z| + i\arg z, \quad (6.244)$$

where

$$\arg z = \theta + 2\pi n \quad (n = 0, \pm 1, \pm 2, \cdots). \quad (6.245)$$

If $n = 0$ is chosen, $\ln z$ is said to be the principal value. With the logarithmic functions, the Riemann surface comprises infinitely many planes, each of which corresponds to an individual branch whose argument is given by (6.245). The point $z = 0$ is called a logarithmic branch point. Meanwhile, the branch point for, e.g., $w(z) = \sqrt{z}$ is said to be an algebraic branch point.

6.9.2 Examples of Multivalued Functions

When we deal with a function having a branch point, we must bear in mind that unless the contour crosses the branch cut (either algebraic or logarithmic), the function we are thinking of remains single-valued and analytic (if that function is originally analytic) and can be differentiated or integrated in the normal way. We give some examples of this.

Example 6.12 [6] Evaluate the following real definite integral:

$$I = \int_0^{\infty} \frac{x^{a-1}}{1+x}\, dx \quad (0 < a < 1). \quad (6.246)$$

We rewrite (6.246) as

$$I = \int_0^{\infty} \frac{z^{a-1}}{1+z}\, dz. \quad (6.247)$$

We define the integrand of (6.247) as

$$f(z) \equiv \frac{z^{a-1}}{1+z},$$

where $z^{a-1}$ can be further rewritten as

$$z^{a-1} = e^{(a-1)\ln z}.$$

The function $f(z)$ has a branch point at $z = 0$ and a simple pole at $z = -1$. Bearing in mind this situation, we consider a contour for the integration (see Fig. 6.29). In Fig. 6.29 we depict the branch cut with a doubled broken line. Lines PQ and Q′P′ are contour lines located over and under the branch cut, respectively. We assume that the lines PQ and Q′P′ are illimitably close to the real axis. Thus, starting from the point P the contour integration $I_C$ is described by

$$I_C = \int_{PQ} \frac{z^{a-1}}{1+z}\, dz + \int_{\Gamma_R} \frac{z^{a-1}}{1+z}\, dz + \int_{Q'P'} \frac{z^{a-1}}{1+z}\, dz + \int_{\Gamma_0} \frac{z^{a-1}}{1+z}\, dz, \quad (6.248)$$

where $\Gamma_R$ and $\Gamma_0$ denote the outer large circle and the inner small circle of radius $R$ $(\gg 1)$ and $r$ $(\ll 1)$, respectively. Note that with the contour integration $\Gamma_R$ is traced counterclockwise but $\Gamma_0$ is traced clockwise (see Fig. 6.29).

Fig. 6.29 Branch cut (shown with a doubled broken line) and contour for the integration of $\frac{z^{a-1}}{1+z}$

Since the simple pole is present at $z = -1$, the related residue $\operatorname{Res} f(-1)$ is

$$\operatorname{Res} f(-1) = \left[z^{a-1}\right]_{z=-1} = (-1)^{a-1} = \left(e^{i\pi}\right)^{a-1} = -e^{ai\pi}.$$

Then, we have

$$I_C = 2\pi i \operatorname{Res} f(-1) = -2\pi i\, e^{ai\pi}. \quad (6.249)$$

When $R \to \infty$ and $r \to 0$, we have

$$\left|\int_{\Gamma_R} \frac{z^{a-1}}{1+z}\, dz\right| < \frac{R^{a-1}}{R-1} \cdot 2\pi R \to 0 \quad \text{and} \quad \left|\int_{\Gamma_0} \frac{z^{a-1}}{1+z}\, dz\right| < \frac{r^{a-1}}{1-r} \cdot 2\pi r \to 0. \quad (6.250)$$

Thus, the second and fourth terms of (6.248) vanish.

Meanwhile, we take the principal value of $\ln z$ of (6.244). Since the lines PQ and Q′P′ are very close to the real axis, using (6.245) with $n = 0$ we can put $\theta = 0$ on PQ and $\theta = 2\pi$ on Q′P′. Then, we have

$$\int_{PQ} \frac{z^{a-1}}{1+z}\, dz = \int_{PQ} \frac{e^{(a-1)\ln z}}{1+z}\, dz = \int_{PQ} \frac{e^{(a-1)\ln|z|}}{1+z}\, dz. \quad (6.251)$$

Moreover, we have

$$\int_{Q'P'} \frac{z^{a-1}}{1+z}\, dz = \int_{Q'P'} \frac{e^{(a-1)\ln z}}{1+z}\, dz = \int_{Q'P'} \frac{e^{(a-1)(\ln|z| + 2\pi i)}}{1+z}\, dz = e^{(a-1)2\pi i}\int_{Q'P'} \frac{e^{(a-1)\ln|z|}}{1+z}\, dz. \quad (6.252)$$

Notice that the argument underneath the branch cut (i.e., on the line Q′P′) is increased by $2\pi$ relative to that over the branch cut.

Considering (6.249) through (6.252) and taking the limit of $R \to \infty$ and $r \to 0$, (6.248) is rewritten as

$$-2\pi i\, e^{a\pi i} = \int_0^{\infty} \frac{e^{(a-1)\ln|z|}}{1+z}\, dz + e^{(a-1)2\pi i}\int_{\infty}^{0} \frac{e^{(a-1)\ln|z|}}{1+z}\, dz = \left[1 - e^{(a-1)2\pi i}\right]\int_0^{\infty} \frac{e^{(a-1)\ln|z|}}{1+z}\, dz = \left[1 - e^{(a-1)2\pi i}\right]\int_0^{\infty} \frac{z^{a-1}}{1+z}\, dz. \quad (6.253)$$

In (6.253) we have assumed $z$ to be real and positive (see Fig. 6.29), and so we have $e^{(a-1)\ln|z|} = e^{(a-1)\ln z} = z^{a-1}$. Therefore, from (6.253) we get

$$\int_0^{\infty} \frac{z^{a-1}}{1+z}\, dz = -\frac{2\pi i\, e^{a\pi i}}{1 - e^{(a-1)2\pi i}} = -\frac{2\pi i\, e^{a\pi i}}{1 - e^{2\pi a i}}. \quad (6.254)$$

Using (6.37) for (6.254) and performing simple trigonometric calculations (after multiplying both the numerator and denominator by $1 - e^{-2\pi a i}$), we finally get

$$\int_0^{\infty} \frac{z^{a-1}}{1+z}\, dz = \int_0^{\infty} \frac{x^{a-1}}{1+x}\, dx = I = \pi/(\sin a\pi).$$
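The result can be confirmed numerically; a minimal sketch (ours, assuming SciPy), e.g., with the illustrative value $a = 0.3$:

```python
# Numerical cross-check of Example 6.12, a sketch assuming SciPy;
# a = 0.3 is an illustrative value.
import numpy as np
from scipy.integrate import quad

a = 0.3
val, _ = quad(lambda x: x ** (a - 1.0) / (1.0 + x), 0.0, np.inf)
print(val, np.pi / np.sin(a * np.pi))   # both ~ 3.8833
```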

Example 6.13 [13] Evaluate the following real definite integral:

$$I = \int_a^b \frac{1}{\sqrt{(x-a)(b-x)}}\, dx \quad (a, b:\ \text{real with } a < b). \quad (6.255)$$

To this end, we wish to evaluate the following integral described by

$$I_C = \oint_C \frac{1}{\sqrt{(z-a)(z-b)}}\, dz, \quad (6.256)$$

where we define the integrand of (6.256) as

$$f(z) \equiv \frac{1}{\sqrt{(z-a)(z-b)}}.$$

The total contour $C$ comprises $C_a$, PQ, $C_b$, and Q′P′ as depicted in Fig. 6.30. The function $f(z)$ has two branch points at $z = a$ and $z = b$; otherwise $f(z)$ is analytic. We draw a branch cut with a doubled broken line as shown. Since the contour $C$ does not cross the branch cut but encircles it, we can evaluate the integral in the normal manner. Starting from the point P′, (6.256) can be expressed by

$$I_C = \int_{C_a} \frac{dz}{\sqrt{(z-a)(z-b)}} + \int_{PQ} \frac{dz}{\sqrt{(z-a)(z-b)}} + \int_{C_b} \frac{dz}{\sqrt{(z-a)(z-b)}} + \int_{Q'P'} \frac{dz}{\sqrt{(z-a)(z-b)}}. \quad (6.257)$$

We assume that the lines PQ and Q′P′ are illimitably close to the real axis. This time, $C_a$ and $C_b$ are both traced counterclockwise. Putting

$$z - a = r_1 e^{i\theta_1} \quad \text{and} \quad z - b = r_2 e^{i\theta_2},$$

we have

$$f(z) = \frac{1}{\sqrt{r_1 r_2}}\, e^{-i(\theta_1 + \theta_2)/2}. \quad (6.258)$$

Fig. 6.30 Branch cut (shown with a doubled broken line) and contour for the integration of $\frac{1}{\sqrt{(z-a)(z-b)}}$ $(a < b)$. Only the argument $\theta_1$ is depicted. With $\theta_2$ see text

Let $C_a$ and $C_b$ be small circles of radius $\varepsilon$ centered at $z = a$ and $z = b$, respectively. When we are evaluating the integral on $C_b$, we can put

$$z - b = \varepsilon e^{i\theta_2} \quad (-\pi \le \theta_2 \le \pi); \quad \text{i.e., } r_2 = \varepsilon.$$

We also have $r_1 \approx b - a$; in Fig. 6.30 we assume $b > a$. Hence, we have

$$\int_{C_b} \frac{dz}{\sqrt{(z-a)(z-b)}} = \int_{-\pi}^{\pi} \frac{i\varepsilon\, e^{i\theta_2}}{\sqrt{r_1\varepsilon}}\, e^{-i(\theta_1 + \theta_2)/2}\, d\theta_2, \qquad \left|\int_{C_b} \frac{dz}{\sqrt{(z-a)(z-b)}}\right| \le \int_{-\pi}^{\pi} \frac{\sqrt{\varepsilon}}{\sqrt{r_1}}\, d\theta_2.$$

Therefore, we get

$$\lim_{\varepsilon \to 0} \left|\int_{C_b} \frac{dz}{\sqrt{(z-a)(z-b)}}\right| = 0 \quad \text{or} \quad \lim_{\varepsilon \to 0} \int_{C_b} \frac{dz}{\sqrt{(z-a)(z-b)}} = 0.$$

In a similar manner, we have

$$\lim_{\varepsilon \to 0} \int_{C_a} \frac{dz}{\sqrt{(z-a)(z-b)}} = 0.$$

Meanwhile, on the line Q′P′ we have

$$z - a = r_1 e^{i\theta_1} \quad \text{and} \quad z - b = r_2 e^{i\theta_2}. \quad (6.259)$$

In (6.259) we have $r_1 = x - a$, $\theta_1 = 0$ and $r_2 = b - x$, $\theta_2 = \pi$; see Fig. 6.30. Also, we have $dz = dx$. Hence, we have
$$\int_{Q'P'} \frac{dz}{\sqrt{(z-a)(z-b)}} = \int_b^a \frac{e^{-i\pi/2}}{\sqrt{(x-a)(b-x)}}\, dx = -i\int_b^a \frac{dx}{\sqrt{(x-a)(b-x)}} = i\int_a^b \frac{dx}{\sqrt{(x-a)(b-x)}}. \quad (6.260)$$

On the line PQ, in turn, we have $r_1 = x - a$ and $\theta_1 = 2\pi$. This is because when going from P′ to P, the argument is increased by $2\pi$; see Fig. 6.30 once again. Also, we have $r_2 = b - x$, $\theta_2 = \pi$, and $dz = dx$. Notice that the argument $\theta_2$ remains unchanged. Then, we get

$$\int_{PQ} \frac{dz}{\sqrt{(z-a)(z-b)}} = \int_a^b \frac{e^{-3i\pi/2}}{\sqrt{(x-a)(b-x)}}\, dx = i\int_a^b \frac{dx}{\sqrt{(x-a)(b-x)}}. \quad (6.261)$$

Inserting these results into (6.257), we obtain

$$I_C = \oint_C \frac{dz}{\sqrt{(z-a)(z-b)}} = 2i\int_a^b \frac{dx}{\sqrt{(x-a)(b-x)}}. \quad (6.262)$$

The calculation is not yet finished, however, because (6.262) is not a real definite integral. Then, let us try another contour integration. We choose a contour $C$ of radius $R$ centered at the origin. We assume that $R$ is large enough this time. Meanwhile, putting

$$z = 1/w, \quad (6.263)$$

we have

$$\sqrt{(z-a)(z-b)} = \frac{1}{w}\sqrt{(1-aw)(1-bw)} \quad \text{with} \quad dz = -\frac{dw}{w^2}. \quad (6.264)$$

Thus, we have

$$I_C = \oint_C \frac{dz}{\sqrt{(z-a)(z-b)}} = -\oint_{C'} \frac{dw}{w\sqrt{(1-aw)(1-bw)}}, \quad (6.265)$$

where the closed contour $C'$ of radius $1/R$ comes full circle clockwise around the origin of the $w$-plane (see Fig. 6.31).

Fig. 6.31 Branch cuts (shown with doubled broken lines) and contour in the $w$-plane for the integration of $\frac{1}{w\sqrt{(1-aw)(1-bw)}}$. (a) Case of $0 < a < b$. (b) Case of $a < 0 < b$. Notice that in both cases the contour $C'$ is traced clockwise (see text)

This is because, just as in the $z$-plane the contour $C$ is traced viewing the point $z = \infty$ on the right, the contour $C'$ is traced viewing the corresponding point $w = 0$ on the right. The above clockwise cycle results from this situation. The integrand of (6.265) has a simple pole at $w = 0$.

Regarding the other singularities of the integrand, there are two branch points at $w = 1/a$ and $w = 1/b$. Then, we appropriately choose $R \gg 1$ (or $1/R \ll 1$) so that $C'$ does not cut across the branch cut. To meet this condition, we have two choices. (i) The branch cut connects $w = 1/a$ and $w = 1/b$. (ii) Another choice of the branch cut is that it connects $w = 1/a$ and $w = -\infty$ along with $w = 1/b$ and $w = \infty$ (see Fig. 6.31). If we have $0 < a < b$, we can choose a branch cut as that depicted in Fig. 6.31a corresponding to Fig. 6.28a. In case $a < 0 < b$, the branch cut can be that shown in Fig. 6.31b corresponding to Fig. 6.28b. (When $a < b < 0$, the branch cut geometry may be similar to that of Fig. 6.31a.)

Consequently, in both the above cases (i) and (ii) there is only a simple pole at $w = 0$ without any other singularities inside the contour $C'$. Therefore, according to (6.148) we have a contour integration described by

$$\oint_{C'} \frac{dw}{w\sqrt{(1-aw)(1-bw)}} = -2\pi i \operatorname{Res}\left[\frac{1}{w\sqrt{(1-aw)(1-bw)}}\right]_{w=0} = -2\pi i \left.\frac{1}{\sqrt{(1-aw)(1-bw)}}\right|_{w=0} = -2\pi i, \quad (6.266)$$

where the above minus sign is due to the clockwise rotation of the contour integration. Hence, from (6.265) we get

$$I_C = \oint_C \frac{dz}{\sqrt{(z-a)(z-b)}} = 2\pi i. \quad (6.267)$$

Equating (6.262) and (6.267), we finally obtain the answer described by

$$\int_a^b \frac{dx}{\sqrt{(x-a)(b-x)}} = \pi. \quad (6.268)$$
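Again the answer admits a quick numerical confirmation. SciPy's quad happens to support exactly the algebraic end-point weight $(x-a)^{\alpha}(b-x)^{\beta}$ appearing here; a minimal sketch (ours), with illustrative $a = -2$, $b = 5$:

```python
# Numerical cross-check of (6.268), a sketch assuming SciPy; a = -2 and
# b = 5 are illustrative.  quad's weight='alg' supplies the end-point factor
# (x-a)^alpha (b-x)^beta, here with alpha = beta = -1/2.
import numpy as np
from scipy.integrate import quad

a, b = -2.0, 5.0
val, _ = quad(lambda x: 1.0, a, b, weight='alg', wvar=(-0.5, -0.5))
print(val, np.pi)   # both ~ 3.14159..., independently of a and b
```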

Example 6.14 [14] In Chap. 3 we mentioned the analyticity of the function

$$\left(1 - 2tx + t^2\right)^{-\lambda}. \quad (6.269)$$

In Chap. 3, we assumed that $x$ is a real number belonging to the interval $[-1, 1]$. Under this condition, we define $f(t)$ in a complex domain as

$$f(t) \equiv 1 - 2tx + t^2. \quad (6.270)$$

We rewrite (6.270) as

$$f(t) = \left[t - \left(x + i\sqrt{1-x^2}\right)\right]\left[t - \left(x - i\sqrt{1-x^2}\right)\right]. \quad (6.271)$$

With $f(t) = 0$ we have two roots at $t_{\pm} = x \pm i\sqrt{1-x^2}$, and so $|t_{\pm}| = 1$ (see Sect. 3.6.1).

When $\lambda = 1/2$, we are dealing with essentially the same problem as that of Example 6.13. For instance, when $x = 0$, the branch points are located at $t = \pm i$. When $x = 1/\sqrt{2}$, the branch points are positioned at $t = (1 \pm i)/\sqrt{2}$. These values of $t$ form the branch points. In Fig. 6.32 we depict these branch points and the corresponding branch cuts. In this situation, the function

$$h(t) \equiv \left(1 - 2tx + t^2\right)^{-1/2} \quad (6.272)$$

is analytic within the circle $|t| < 1$ of the complex $t$-plane [14] (Fig. 6.32). Therefore, we can freely expand $h(t)$ into a Taylor's series at any point within that circle. If we expand $h(t)$ around $t = 0$, we expect to have the following Taylor's series:

$$h(t) = \sum_{n=0}^{\infty} c_n(x)\, t^n. \quad (6.273)$$

We can readily decide $c_n(x)$ using (6.115) such that

$$c_n(x) = \frac{1}{2\pi i}\oint_C \frac{h(t)}{t^{n+1}}\, dt, \quad (6.274)$$

where the contour $C$ can be chosen so that $C$ encircles $t = 0$ in the region $|t| < 1$.

Fig. 6.32 Branch points and corresponding branch cuts for $(1 - 2tx + t^2)^{-1/2}$. For instance, when $x = 0$, the branch points are located at $t = \pm i$. When $x = 1/\sqrt{2}$, the branch points are positioned at $t = (1 \pm i)/\sqrt{2}$

Let us make a substitution such that [14]

$$1 - ut = \left(1 - 2tx + t^2\right)^{1/2}. \quad (6.275)$$

Then we have

$$t = \frac{2(u-x)}{u^2 - 1} \quad \text{and} \quad dt = \frac{2(1 - ut)\, du}{u^2 - 1} \quad \text{with } u \ne \pm 1. \quad (6.276)$$

Using these relations, we rewrite (6.274) as

$$c_n(x) = \frac{1}{2\pi i}\oint_{C'} \frac{(u^2 - 1)^n}{2^n (u-x)^{n+1}}\, du = \frac{1}{2\pi i}\oint_{C'} \frac{(-1)^n (1-u^2)^n}{2^n (u-x)^{n+1}}\, du, \quad (6.277)$$

where the contour $C'$ encircles $u = x$. Here, using (6.118), we readily get

$$c_n(x) = \frac{1}{2\pi i}\oint_{C'} \frac{(-1)^n (1-u^2)^n}{2^n (u-x)^{n+1}}\, du = \frac{(-1)^n}{2^n n!}\left.\frac{d^n (1-u^2)^n}{du^n}\right|_{u=x} \equiv P_n(x). \quad (6.278)$$

The functions $P_n(x)$ are the Legendre polynomials that have already appeared in (3.145) and (3.207) of Sects. 3.5 and 3.6. In this manner, the above argument directly confirms that the Legendre polynomials of (3.194) derived from the generating function (3.180) and those defined by the Rodrigues formula of (3.145) are identical in the complex domain. Originally, (3.145) and (3.207) were defined in the real domain. The identical representations in both domains, real and complex, are a consequence of analytic continuation.

Substituting $x = \pm 1$ into (6.277), we get
$$c_n(1) = \frac{1}{2\pi i}\oint_{C'} \frac{(-1)^n (1-u^2)^n}{2^n (u-1)^{n+1}}\, du = \frac{1}{2^n}\cdot\frac{1}{2\pi i}\oint_{C'} \frac{(u+1)^n}{u-1}\, du = \frac{1}{2^n}\left[(u+1)^n\right]_{u=1} = 1$$

and

$$c_n(-1) = \frac{1}{2\pi i}\oint_{C'} \frac{(-1)^n (1-u^2)^n}{2^n (u+1)^{n+1}}\, du = \frac{1}{2^n}\cdot\frac{1}{2\pi i}\oint_{C'} \frac{(u-1)^n}{u+1}\, du = \frac{1}{2^n}\left[(u-1)^n\right]_{u=-1} = (-1)^n. \quad (6.279)$$

Notice that in (6.279) the integrands have a simple pole at $u = \pm 1$ and, hence, we used (6.144) and (6.148) with the contour integration. The contour $C'$ is taken so that it encircles $u = 1$ in the former case and $u = -1$ in the latter. Hence, we have the important relations

$$P_n(1) = 1 \quad \text{and} \quad P_n(-1) = (-1)^n. \quad (6.280)$$
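The identity $c_n(x) = P_n(x)$, together with (6.280), can be confirmed numerically by comparing partial sums of the generating-function series with $(1 - 2tx + t^2)^{-1/2}$; a minimal sketch (ours, assuming SciPy), with illustrative $x = 0.4$ and $t = 0.3$:

```python
# Numerical confirmation of (6.278) and (6.280), a sketch assuming SciPy;
# x = 0.4 and t = 0.3 are illustrative values inside |t| < 1.
import numpy as np
from scipy.special import eval_legendre

x, t, N = 0.4, 0.3, 60
series = sum(eval_legendre(n, x) * t ** n for n in range(N))
print(series, (1.0 - 2.0 * t * x + t * t) ** -0.5)    # partial sum vs. closed form
print(eval_legendre(7, 1.0), eval_legendre(7, -1.0))  # 1.0 and -1.0, cf. (6.280)
```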

Summarizing this chapter, we have explored the theory and applications of analytic functions in the complex domain. We have studied how the theory of analytic functions produces fruitful results.

References

1. Stoll RR (1979) Set theory and logic. Dover, New York


2. Willard S (2004) General topology. Dover, New York
3. McCarty G (1988) Topology. Dover, New York
4. Mendelson B (1990) Introduction to topology, 3rd edn. Dover, New York
5. Dennery P, Krzywicki A (1996) Mathematics for physicists. Dover, New York
6. Takagi T (2010) Introduction to analysis. Iwanami, Tokyo. (in Japanese)
7. Rudin W (1987) Real and complex analysis, 3rd edn. McGraw-Hill, New York
8. Byron FW Jr, Fuller RW (1992) Mathematics of classical and quantum physics. Dover,
New York
9. Arfken GB, Weber HJ, Harris FE (2013) Mathematical methods for physicists, 7th edn.
Academic Press, Waltham
10. Hassani S (2006) Mathematical physics. Springer, New York
11. Riley KF, Hobson MP, Bence SJ (2006) Mathematical methods for physics and engineering,
3rd edn. Cambridge University Press, Cambridge
12. Boas ML (2006) Mathematical methods in the physical sciences, 3rd edn. Wiley, New York
13. Omote M (1988) Complex functions. Iwanami, Tokyo. (in Japanese)
14. Lebedev NN (1972) Special functions and their applications. Dover, New York
Part II
Electromagnetism

Electromagnetism is one of the pillars of modern physics, even though it belongs to classical physics along with Newtonian mechanics. Maxwell's equations describe and interpret almost all electromagnetic phenomena and, together with the Newton equation, constitute the basis equations of classical physics. Although after the discovery of the theory of relativity the Newton equation needed to be modified, Maxwell's equations virtually did not have to be changed. In this part we treat characteristics of electromagnetic waves that are directly derived from Maxwell's equations.

Electromagnetism becomes connected to quantum mechanics, especially when we deal with the emission and absorption of light. In fact, the experiments performed in connection with the blackbody radiation led to the discovery of light quanta and the establishment of quantum mechanics. These accounts are not only of particular interest from a historical point of view but also of great importance in understanding modern physics. To understand the propagation of electromagnetic waves in a dielectric medium is important from a basic aspect of electromagnetism. Moreover, it is deeply connected to optical applications including optical devices such as waveguides and lasers. In light of the recent progress of laser devices, we provide a tangible example of organic lasers as newly occurring laser devices.

The motion of particles as well as the spatial and temporal changes in, e.g., electromagnetic fields are very often described in terms of differential equations. We describe introductory methods of Green's functions in order to solve those differential equations.
Chapter 7
Maxwell's Equations

Maxwell's equations consist of four first-order partial differential equations. First we deal with basic properties of Maxwell's equations. Next we show how equations of electromagnetic wave motion are derived from Maxwell's equations along vector analysis. It is important to realize that the generation of the electromagnetic wave is a direct consequence of the interplay between the electric field and the magnetic field that both change with time. We deal with behaviors of electromagnetic waves in dielectric media where no true charge exists. At first glance, this restriction seems to narrow the range of application of the principles of electromagnetism. In practice, however, such a situation is universal; the topics cover a wide range of electromagnetic phenomena, e.g., light propagation in dielectrics including water, glass, polymers, etc. Polarization properties characterize the electromagnetic waves. These include linear, circular, and elliptic polarizations. These characteristics are important both from a fundamental aspect and from the point of view of optical applications.

7.1 Maxwell's Equations and Their Characteristics

In this chapter, we first represent Maxwell's equations in vector forms. The equations are represented in a differential form that is consistent with a viewpoint based on "action through medium." The equation of wave motion (or wave equation) is naturally derived from these equations.

Maxwell's equations of electromagnetism are expressed as follows:

$$\operatorname{div} \boldsymbol{D} = \rho, \quad (7.1)$$

$$\operatorname{div} \boldsymbol{B} = 0, \quad (7.2)$$
$$\operatorname{rot} \boldsymbol{E} + \frac{\partial \boldsymbol{B}}{\partial t} = 0, \quad (7.3)$$

$$\operatorname{rot} \boldsymbol{H} - \frac{\partial \boldsymbol{D}}{\partial t} = \boldsymbol{i}. \quad (7.4)$$

In (7.3), the RHS denotes a zero vector. Let us take some time to get acquainted with the physical quantities and their dimensions as well as the basic ideas and concepts along with the definitions of electromagnetism.

The quantity $\boldsymbol{D}$ is called electric flux density $[\mathrm{A\,s/m^2} = \mathrm{C/m^2}]$ (or electric displacement); $\rho$ is electric charge density $[\mathrm{C/m^3}]$. We describe vector quantities $\boldsymbol{V}$ as in (3.4):

$$\boldsymbol{V} = (\boldsymbol{e}_1\ \boldsymbol{e}_2\ \boldsymbol{e}_3)\begin{pmatrix} V_x \\ V_y \\ V_z \end{pmatrix}. \quad (7.5)$$

The notation div denotes a differential operator such that

$$\operatorname{div} \boldsymbol{V} \equiv \frac{\partial V_x}{\partial x} + \frac{\partial V_y}{\partial y} + \frac{\partial V_z}{\partial z}. \quad (7.6)$$

Thus, the div operator converts a vector to a scalar. The quantity $\boldsymbol{D}$ and the electric field $\boldsymbol{E}$ [V/m] are associated through the following expression:

$$\boldsymbol{D} = \varepsilon\boldsymbol{E}, \quad (7.7)$$

where $\varepsilon$ $[\mathrm{C^2/(N\,m^2)}]$ is called the dielectric constant (or permittivity) of the dielectric medium. The dimension can be understood from the following Coulomb's law that describes the force exerted between two charges:

$$F = \frac{1}{4\pi\varepsilon}\frac{QQ'}{r^2}, \quad (7.8)$$

where $F$ is the force; $Q$ and $Q'$ are the electric charges of the two charges; $r$ is the distance between the two charges. Equation (7.1) represents Gauss' law of electrostatics.

The electric charge of 1 Coulomb (1 [C]) is defined as follows: Suppose that two point charges having the same electric charge are placed in vacuum 1 m apart. In that situation, if the force $F$ between the two charges is

$$F = \tilde{c}^2/10^7\ [\mathrm{N}],$$

where $\tilde{c}$ is the light velocity, then we define the electric charge which each point charge possesses as 1 [C]. Here, note that $\tilde{c}$ is a dimensionless number related to the light
7.1 Maxwell’s Equations and Their Characteristics 271

velocity in vacuum that is measured in a unit of [m/s]. Notice also that the light
velocity c in vacuum is defined as

c ¼ 299 792 458 ½m=s ðexact numberÞ:

Thus, we have

ec ¼ 299 792 458 ðdimensionless numberÞ:

Vacuum is a kind of dielectric media. From (7.8), its dielectric constant ε0 is


defined by
   2 
107 C2 12 C
ε0 ¼  8:854  10 : ð7:9Þ
4πec2 Nm2 Nm2

Meanwhile, 1 s (second) has been defined from a certain spectral line emitted from
133
Cs. Thus, 1 m (meter) is defined by

ðdistance along which light is propagated in vacuum during 1 sÞ=299 792 458:

The quantity B is called magnetic flux density [Vsm2 ¼ m2 ]. Equation (7.2) repre-
Wb

sents Gauss’s law of magnetostatics. In contrast to (7.1), RHS of (7.2) is zero. This
corresponds to the fact that although a true charge exists, a true “magnetic charge”
does not exist. (More precisely, such charge has not been detected so far.) The
quantity B is connected with magnetic field H by the following relation:

B = μH, ð7:10Þ

where μ [AN2 ] is said to be permeability (or magnetic permeability). Thus, H has a


dimension [m A
]. Permeability of vacuum μ0 is defined by
 
N
μ0 ¼ 4π=10 7
: ð7:11Þ
A2

Also we have

μ0 ε0 ¼ 1=c2 : ð7:12Þ

We will encounter this relation later again. Since the quantities c and ε0 have a
defined magnitude, so does μ0 from (7.12).
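These numerical values are easy to reproduce; a minimal sketch (ours, assuming SciPy, whose scipy.constants module carries the CODATA values):

```python
# Numerical check of (7.9), (7.11), and (7.12), a sketch assuming SciPy.
from scipy.constants import c, epsilon_0, mu_0

print(epsilon_0)                 # ~ 8.854e-12 [C^2 N^-1 m^-2]
print(mu_0)                      # ~ 4 pi / 10^7 ~ 1.2566e-6 [N A^-2]
print(mu_0 * epsilon_0 * c**2)   # ~ 1, i.e., mu_0 epsilon_0 = 1/c^2
```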
We often make an issue of the relative magnitudes of the dielectric constant and magnetic permeability of a dielectric medium. That is, the relative permittivity $\varepsilon_r$ and relative permeability $\mu_r$ are defined as follows:

$$\varepsilon_r \equiv \varepsilon/\varepsilon_0 \quad \text{and} \quad \mu_r \equiv \mu/\mu_0. \quad (7.13)$$

Note that both $\varepsilon_r$ and $\mu_r$ are dimensionless quantities. Their magnitudes are equal to 1 (in the case of vacuum) or larger than 1 (with any other dielectric media).

Equations (7.3) and (7.4) deal with the change in electric and magnetic fields with time. Of these, (7.3) represents Faraday's law of electromagnetic induction due to Michael Faraday (1831). He found that when a permanent magnet was thrust into or out of a closed circuit, a transient current flowed. Moreover, that experiment implied that even without the closed circuit, an electric field was generated around the space that changed its position relative to the permanent magnet. Equation (7.3) is easier to understand if it is rewritten as follows:

$$\operatorname{rot} \boldsymbol{E} = -\frac{\partial \boldsymbol{B}}{\partial t}. \quad (7.14)$$

That is, the electric field $\boldsymbol{E}$ is generated in such a way that the induced electric field (or induced current) tends to lessen the change in the magnetic flux (or magnetic field) produced by the permanent magnet (Lenz's law). The minus sign in the RHS indicates that effect.
The rot operator appearing in (7.3) and (7.4) is defined by

$$\operatorname{rot} \boldsymbol{V} = \nabla\times\boldsymbol{V} = \begin{vmatrix} \boldsymbol{e}_1 & \boldsymbol{e}_2 & \boldsymbol{e}_3 \\ \dfrac{\partial}{\partial x} & \dfrac{\partial}{\partial y} & \dfrac{\partial}{\partial z} \\ V_x & V_y & V_z \end{vmatrix} = \left(\frac{\partial V_z}{\partial y} - \frac{\partial V_y}{\partial z}\right)\boldsymbol{e}_1 + \left(\frac{\partial V_x}{\partial z} - \frac{\partial V_z}{\partial x}\right)\boldsymbol{e}_2 + \left(\frac{\partial V_y}{\partial x} - \frac{\partial V_x}{\partial y}\right)\boldsymbol{e}_3. \quad (7.15)$$

The operator $\nabla$ has already appeared in (3.9). This operator transforms a vector to a vector. Let us think of the meaning of the rot operator. Suppose that there is a vector field that varies with time and spatial position. Suppose also that at some instant the spatial distribution of the field varies as in Fig. 7.1, where a spiral vector field $\boldsymbol{V}$ is present. For the $z$-component of $\operatorname{rot} \boldsymbol{V}$ around the origin, we have

$$(\operatorname{rot} \boldsymbol{V})_z = \frac{\partial V_y}{\partial x} - \frac{\partial V_x}{\partial y} = \lim_{\Delta x \to 0,\, \Delta y \to 0}\left[\frac{(V_1)_y - (V_3)_y}{\Delta x} - \frac{(V_2)_x - (V_4)_x}{\Delta y}\right].$$

In the case of Fig. 7.1, $(V_1)_y - (V_3)_y > 0$ and $(V_2)_x - (V_4)_x < 0$ and, hence, we find that $\operatorname{rot} \boldsymbol{V}$ has a positive $z$-component. If $V_z = 0$ and $\partial V_y/\partial z = \partial V_x/\partial z = 0$, we find from (7.15) that $\operatorname{rot} \boldsymbol{V}$ possesses only the $z$-component. The equation $\partial V_y/\partial z = \partial V_x/\partial z = 0$ implies that the vector field $\boldsymbol{V}$ is uniform in the direction of the $z$-axis. Thus, under the above conditions the spiral vector field $\boldsymbol{V}$ is accompanied by the $\operatorname{rot} \boldsymbol{V}$ vector field that is directed toward the upper side of the plane of the paper (i.e., the positive direction of the $z$-axis).
Equation (7.4) can be rewritten as

$$\operatorname{rot} \boldsymbol{H} = \boldsymbol{i} + \frac{\partial \boldsymbol{D}}{\partial t}. \quad (7.16)$$

Notice that $\frac{\partial \boldsymbol{D}}{\partial t}$ has the same dimension as $\boldsymbol{i}$ $[\mathrm{A/m^2}]$ and is called the displacement current. Without this term, we have

$$\operatorname{rot} \boldsymbol{H} = \boldsymbol{i}. \quad (7.17)$$

This relation is well known as Ampère's law or Ampère's circuital law (André-Marie Ampère: 1827), which determines the magnetic field yielded by a stationary current. Again with the aid of Fig. 7.1, (7.17) implies that the current given by $\boldsymbol{i}$ produces a spiral magnetic field.
produces spiral magnetic field.
Now, let us think of a change in amount of charges with time in a part of three-
dimensional closed space V surrounded by a closed surface S. It is given by
Z Z Z Z
d ∂ρ
ρdV ¼ dV ¼  i  ndS ¼  div idV, ð7:18Þ
dt V V ∂t S V

where n is an outward-directed normal unit vector; with the last equality we used
Gauss’s theorem. The Gauss’s theorem is described by
Z Z
div idV ¼ i  ndS:
V S

Fig. 7.1 Schematic y


representation of a spiral
vector field V that yields
rot V
(0, Δy/2)
V2 V1

(‒Δx/2, 0) O x
(Δx/2, 0)
z

V3 V4
(0, ‒Δy/2)
Fig. 7.2 Intuitive diagram that explains Gauss's theorem

Figure 7.2 gives an intuitive diagram that explains Gauss's theorem. The diagram shows a cross-section of the closed space $V$ surrounded by a surface $S$. In this case, imagine a cube or a hexahedron as $V$. The periphery is the cross-section of the closed surface accordingly. Arrows in the diagram schematically represent $\operatorname{div} \boldsymbol{i}$ on individual fragments; only those of the center infinitesimal fragment are shown with solid lines. The arrows of adjacent fragments cancel each other out and only the components on the periphery are nonvanishing. Thus, the volume integration of $\operatorname{div} \boldsymbol{i}$ is converted to the surface integration of $\boldsymbol{i}$. Readers are referred to appropriate literature on vector integration [1].

Consequently, from (7.18) we have

$$\int_V \left(\frac{\partial \rho}{\partial t} + \operatorname{div} \boldsymbol{i}\right) dV = 0. \quad (7.19)$$

Since $V$ is arbitrarily chosen, we get

$$\frac{\partial \rho}{\partial t} + \operatorname{div} \boldsymbol{i} = 0. \quad (7.20)$$

The relation (7.20) is called the current continuity equation. This relation represents the law of conservation of charge.

Meanwhile, taking div of both sides of (7.17), we have

$$\operatorname{div} \operatorname{rot} \boldsymbol{H} = \operatorname{div} \boldsymbol{i}. \quad (7.21)$$

The LHS of (7.21) reads
7.1 Maxwell’s Equations and Their Characteristics 275

     
∂ ∂H z ∂H y ∂ ∂H x ∂H z ∂ ∂H y ∂H x
div rot H ¼ 2 þ 2 þ 2
∂x ∂y ∂z ∂y ∂z ∂x ∂z ∂x ∂y
 2 2   2 2   2 2 
∂ ∂ ∂ ∂ ∂ ∂
¼  Hz þ  Hx þ  Hy
∂x∂y ∂y∂x ∂y∂z ∂z∂y ∂z∂x ∂x∂z
¼ 0:
ð7:22Þ
2 2
∂ Hz
With the last equality of (7.22), we used the fact that if, e.g., ∂x∂y
and ∂∂y∂x
Hz
are
2 2
∂ Hz
continuous and differentiable in a certain domain (x, y), ∂x∂y
¼ ∂∂y∂x
Hz
. That is, we
assume “ordinary” functions for Hz, Hx, and Hy. Thus from (7.21), we have

div i = 0: ð7:23Þ

From (7.20), we also have

∂ρðx, t Þ
¼ 0, ð7:24Þ
∂t

where we explicitly show that ρ depends upon both x and t. Note that x is a position
vector described as (3.5). Therefore, (7.24) shows that ρ(x, t) is temporally constant
at a position x, consistent with the stationary current.
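The identity div rot H = 0 of (7.22) can also be observed numerically with centered finite differences; a minimal sketch (ours, assuming NumPy; the sample field is an arbitrary smooth choice):

```python
# Numerical illustration of (7.22), div rot H = 0, a sketch assuming NumPy;
# the sample field H is an arbitrary smooth choice.
import numpy as np

h = 1.0e-3
ax, ay, az = np.eye(3) * h   # displacement vectors along x, y, z

def H(p):
    x, y, z = p
    return np.array([np.sin(y * z), x * np.exp(z), y ** 2 * np.cos(x)])

def rot(F, p):
    d = lambda i, e: (F(p + e)[i] - F(p - e)[i]) / (2.0 * h)   # centered difference
    return np.array([d(2, ay) - d(1, az), d(0, az) - d(2, ax), d(1, ax) - d(0, ay)])

def div(F, p):
    return sum((F(p + e)[i] - F(p - e)[i]) / (2.0 * h) for i, e in enumerate((ax, ay, az)))

p = np.array([0.3, -0.7, 1.1])
print(div(lambda q: rot(H, q), p))   # ~ 0 up to floating-point error
```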
Nevertheless, we encounter a problem when $\rho(\boldsymbol{x}, t)$ is temporally varying. In other words, (7.17) goes against the charge conservation law when $\rho(\boldsymbol{x}, t)$ is temporally varying. It was James Clerk Maxwell (1861–1862) who solved the problem by introducing the concept of the displacement current. In fact, taking div of both sides of (7.16), we have

$$\operatorname{div} \operatorname{rot} \boldsymbol{H} = \operatorname{div} \boldsymbol{i} + \frac{\partial\, \operatorname{div} \boldsymbol{D}}{\partial t} = \operatorname{div} \boldsymbol{i} + \frac{\partial \rho}{\partial t} = 0, \quad (7.25)$$

where with the first equality we exchanged the order of differentiations with respect to $t$ and $\boldsymbol{x}$; with the second equality we used (7.1). The last equality of (7.25) results from (7.20). In other words, in virtue of the term $\frac{\partial \boldsymbol{D}}{\partial t}$, (7.4) is consistent with the charge conservation law. Thus, the set of Maxwell's equations (7.1)–(7.4) supplies us with a well-established basis in natural science up until the present.

Although the set of these equations describes spatial and temporal changes in electric and magnetic fields in vacuum and matter including metal, in Part II we confine ourselves to the changes in the electric and magnetic fields in a uniform dielectric medium.
7.2 Equation of Wave Motion

If we further confine ourselves to the case where neither electric charge nor electric current is present in a uniform dielectric medium, we can readily obtain equations of wave motion for the electric and magnetic fields. That is,

$$\operatorname{div} \boldsymbol{D} = 0, \quad (7.26)$$

$$\operatorname{div} \boldsymbol{B} = 0, \quad (7.27)$$

$$\operatorname{rot} \boldsymbol{E} + \frac{\partial \boldsymbol{B}}{\partial t} = 0, \quad (7.28)$$

$$\operatorname{rot} \boldsymbol{H} - \frac{\partial \boldsymbol{D}}{\partial t} = 0. \quad (7.29)$$

The relations (7.27) and (7.28) are identical to (7.2) and (7.3), respectively.

Let us start with a formula of vector analysis. First, we introduce the grad operator. We have

$$\operatorname{grad} f = \nabla f = \boldsymbol{e}_1\frac{\partial f}{\partial x} + \boldsymbol{e}_2\frac{\partial f}{\partial y} + \boldsymbol{e}_3\frac{\partial f}{\partial z}.$$

That is, the grad operator transforms a scalar to a vector. We have the following formula:

$$\operatorname{rot} \operatorname{rot} \boldsymbol{V} = \operatorname{grad} \operatorname{div} \boldsymbol{V} - \nabla^2\boldsymbol{V}. \quad (7.30)$$

The operator $\nabla^2$ has already appeared in (1.24). To show (7.30), we compare the $x$-component of both sides of (7.30). That is,

$$[\operatorname{rot} \operatorname{rot} \boldsymbol{V}]_x = \frac{\partial}{\partial y}\left(\frac{\partial V_y}{\partial x} - \frac{\partial V_x}{\partial y}\right) - \frac{\partial}{\partial z}\left(\frac{\partial V_x}{\partial z} - \frac{\partial V_z}{\partial x}\right), \quad (7.31)$$

$$\left[\operatorname{grad} \operatorname{div} \boldsymbol{V} - \nabla^2\boldsymbol{V}\right]_x = \frac{\partial}{\partial x}\left(\frac{\partial V_x}{\partial x} + \frac{\partial V_y}{\partial y} + \frac{\partial V_z}{\partial z}\right) - \frac{\partial^2 V_x}{\partial x^2} - \frac{\partial^2 V_x}{\partial y^2} - \frac{\partial^2 V_x}{\partial z^2} = \frac{\partial}{\partial x}\left(\frac{\partial V_y}{\partial y} + \frac{\partial V_z}{\partial z}\right) - \frac{\partial^2 V_x}{\partial y^2} - \frac{\partial^2 V_x}{\partial z^2}. \quad (7.32)$$

Again assuming that $\frac{\partial^2 V_y}{\partial y\partial x} = \frac{\partial^2 V_y}{\partial x\partial y}$ and $\frac{\partial^2 V_z}{\partial z\partial x} = \frac{\partial^2 V_z}{\partial x\partial z}$, we have

$$[\operatorname{rot} \operatorname{rot} \boldsymbol{V}]_x = \left[\operatorname{grad} \operatorname{div} \boldsymbol{V} - \nabla^2\boldsymbol{V}\right]_x. \quad (7.33)$$

Regarding the $y$- and $z$-components, we have similar relations as well. Thus (7.30) holds.
Taking rot of both sides of (7.28), we have

$$\operatorname{rot} \operatorname{rot} \boldsymbol{E} + \operatorname{rot}\frac{\partial \boldsymbol{B}}{\partial t} = \operatorname{grad} \operatorname{div} \boldsymbol{E} - \nabla^2\boldsymbol{E} + \frac{\partial\, \operatorname{rot} \boldsymbol{B}}{\partial t} = -\nabla^2\boldsymbol{E} + \mu\varepsilon\frac{\partial^2\boldsymbol{E}}{\partial t^2} = 0, \quad (7.34)$$

where with the first equality we used (7.30) and with the second equality we used (7.7), (7.10), (7.26), and (7.29). Thus we have

$$\nabla^2\boldsymbol{E} = \mu\varepsilon\frac{\partial^2\boldsymbol{E}}{\partial t^2}. \quad (7.35)$$

Similarly, from (7.29) we get

$$\nabla^2\boldsymbol{H} = \mu\varepsilon\frac{\partial^2\boldsymbol{H}}{\partial t^2}. \quad (7.36)$$

Equations (7.35) and (7.36) are called the equations of wave motion for the electric and magnetic fields.

To consider the implications of these equations, let us think for simplicity of the following equation in a one-dimensional space:

$$\frac{\partial^2 y(x,t)}{\partial x^2} = \frac{1}{v^2}\frac{\partial^2 y(x,t)}{\partial t^2}, \quad (7.37)$$

where $y$ is an arbitrary scalar function that depends on $x$ and $t$; $v$ is a constant. Let $f(x, t)$ and $g(x, t)$ be arbitrarily chosen functions. Then, $f(x - vt)$ and $g(x + vt)$ are two solutions of (7.37). In fact, putting $X = x - vt$, we have

$$\frac{\partial f}{\partial x} = \frac{\partial f}{\partial X}\frac{\partial X}{\partial x} = \frac{\partial f}{\partial X}, \qquad \frac{\partial^2 f}{\partial x^2} = \frac{\partial}{\partial X}\left(\frac{\partial f}{\partial X}\right)\frac{\partial X}{\partial x} = \frac{\partial^2 f}{\partial X^2}, \quad (7.38)$$

$$\frac{\partial f}{\partial t} = \frac{\partial f}{\partial X}\frac{\partial X}{\partial t} = (-v)\frac{\partial f}{\partial X}, \qquad \frac{\partial^2 f}{\partial t^2} = (-v)\frac{\partial}{\partial X}\left(\frac{\partial f}{\partial X}\right)\frac{\partial X}{\partial t} = (-v)^2\frac{\partial^2 f}{\partial X^2}. \quad (7.39)$$

From the second equations of (7.38) and (7.39), we recover

$$\frac{\partial^2 f}{\partial x^2} = \frac{1}{v^2}\frac{\partial^2 f}{\partial t^2}. \quad (7.40)$$

Similarly, we get
$$\frac{\partial^2 g}{\partial x^2} = \frac{1}{v^2}\frac{\partial^2 g}{\partial t^2}. \quad (7.41)$$

Therefore, as a general solution we can take a superposition of $f(x, t)$ and $g(x, t)$. That is,

$$y(x, t) = f(x - vt) + g(x + vt). \quad (7.42)$$

The implication of (7.42) is as follows: (i) The function $f(x - vt)$ can be obtained by a parallel translation of $f(x)$ by $vt$ in the positive direction of the $x$-axis. In other words, $f(x - vt)$ is obtained by translating $f(x)$ by $v$ per unit time in the positive direction of the $x$-axis; i.e., the function represented by $f(x)$ is translated at a rate $v$ with its form unchanged in time. (ii) The function $g(x + vt)$, on the other hand, is translated at a rate $-v$ (i.e., in the negative direction of the $x$-axis) with its form unchanged in time as well. (iii) Thus, $y(x, t)$ of (7.42) represents two "waves," i.e., a forward wave and a backward wave. The propagation velocity of the two waves is $|v|$ accordingly. Usually we choose a positive number for $v$, and $v$ is called the phase velocity.
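That $y(x,t) = f(x - vt) + g(x + vt)$ satisfies (7.37) for arbitrary smooth $f$ and $g$ is easy to confirm numerically; a minimal sketch (ours, assuming NumPy; the Gaussian and Lorentzian pulses are illustrative choices):

```python
# Numerical check that y = f(x - vt) + g(x + vt) satisfies (7.37), a sketch
# assuming NumPy; the Gaussian and Lorentzian pulses are illustrative.
import numpy as np

v, h = 2.0, 1.0e-3
f = lambda s: np.exp(-s ** 2)
g = lambda s: 1.0 / (1.0 + s ** 2)
y = lambda x, t: f(x - v * t) + g(x + v * t)

x, t = 0.4, 1.3
y_xx = (y(x + h, t) - 2.0 * y(x, t) + y(x - h, t)) / h ** 2
y_tt = (y(x, t + h) - 2.0 * y(x, t) + y(x, t - h)) / h ** 2
print(y_xx - y_tt / v ** 2)   # ~ 0 up to O(h^2) discretization error
```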
Comparing (7.35) and (7.36) with (7.41), we have

$$\mu\varepsilon = 1/v^2. \quad (7.43)$$

In particular, in vacuum we recover (7.12).

Notice that $f$ and $g$ can take any functional form and, hence, they are not necessarily periodic waves. Yet, what we are mostly concerned with is a periodic wave such as a sinusoidal wave. Thus, we arrive at the following functional form:

$$f(x - vt) = Ae^{i(x - vt)}, \quad (7.44)$$

where $A$ is said to be the amplitude of the wave. The constant $A$ usually takes a positive number, but it may take a complex number including a negative number. The exponent of (7.44) contains a number having the dimension [m]. To make it a dimensionless number, we multiply $(x - vt)$ by a wavenumber $k$ that has been introduced in (1.2) and (1.3). That is, we have

$$\tilde{f}[k(x - vt)] = Ae^{ik(x - vt)} = Ae^{i(kx - kvt)} = Ae^{i(kx - \omega t)}, \quad (7.45)$$

where $\tilde{f}$ shows the change in the functional form according to the variable transformation. In (7.45), we have

$$kv = k\lambda\nu = (2\pi/\lambda)\lambda\nu = 2\pi\nu = \omega, \quad (7.46)$$

where $\nu$ and $\omega$ are said to be the frequency and angular frequency, respectively. For a three-dimensional wave $f$ of a scalar function, we have the following form:
$$f = Ae^{i(\boldsymbol{k}\cdot\boldsymbol{x} - \omega t)} = A[\cos(\boldsymbol{k}\cdot\boldsymbol{x} - \omega t) + i\sin(\boldsymbol{k}\cdot\boldsymbol{x} - \omega t)], \quad (7.47)$$

where $\boldsymbol{k}$ is said to be the wavenumber vector. In (7.47),

$$\boldsymbol{k}^2 = k^2 = k_x^2 + k_y^2 + k_z^2. \quad (7.48)$$

Equation (7.47) is virtually identical to (1.25). When we deal with a problem of classical electromagnetism, we usually take the real part of the results after the relevant calculations.

Suppose that (7.47) is a solution of the following wave equation:

$$\nabla^2 f = \frac{1}{v^2}\frac{\partial^2 f}{\partial t^2}. \quad (7.49)$$

Substituting the LHS of (7.47) for $f$ of (7.49), we have

$$-Ak^2 e^{i(\boldsymbol{k}\cdot\boldsymbol{x} - \omega t)} = -\frac{1}{v^2}A\omega^2 e^{i(\boldsymbol{k}\cdot\boldsymbol{x} - \omega t)}. \quad (7.50)$$

Comparing both sides of (7.50), we get

$$k^2 v^2 = \omega^2 \quad \text{or} \quad kv = \omega. \quad (7.51)$$

Thus, we recover (7.46).

Here we introduce a unit vector $\boldsymbol{n}$ as in (1.3) whose direction parallels that of the propagation of the wave such that

$$\boldsymbol{k} = k\boldsymbol{n} = \frac{2\pi}{\lambda}\boldsymbol{n}, \qquad \boldsymbol{n} = (\boldsymbol{e}_1\ \boldsymbol{e}_2\ \boldsymbol{e}_3)\begin{pmatrix} n_x \\ n_y \\ n_z \end{pmatrix}, \quad (7.52)$$

where $n_x$, $n_y$, and $n_z$ define direction cosines. Then (7.47) can be rewritten as

$$f = Ae^{i(k\boldsymbol{n}\cdot\boldsymbol{x} - \omega t)}, \quad (7.53)$$

where the exponent is called the phase. Suppose that the phase is fixed at zero. That is,

$$k\boldsymbol{n}\cdot\boldsymbol{x} - \omega t = 0. \quad (7.54)$$

Since $\omega = kv$, we have


$$\boldsymbol{n}\cdot\boldsymbol{x} - vt = 0. \quad (7.55)$$

Fig. 7.3 Schematic representation of a plane wave. The field has the same phase on a plane P. A solid arrow $\boldsymbol{x}$ represents an arbitrary position vector on the plane and $\boldsymbol{n}$ is a unit vector perpendicular to the plane P (i.e., parallel to a normal of the plane P). The quantity $v$ is the phase velocity of the plane wave

Equation (7.55) defines a plane in a three-dimensional space and is called Hesse's normal form. Figure 7.3 schematically represents a plane wave of the field $f$. The field has the same phase on a plane P. A solid arrow $\boldsymbol{x}$ represents an arbitrary position vector on the plane and $\boldsymbol{n}$ is a unit vector perpendicular to the plane P (i.e., parallel to a normal of the plane P). The quantity $vt$ defines the length of the perpendicular that connects the origin O and the plane P (i.e., the length from the origin to the foot of the perpendicular) at a given time $t$.

In other words, (7.54) determines a plane in such a way that the wave $f$ has the same phase (zero) at a given time $t$ at position vectors $\boldsymbol{x}$ on the plane determined by (7.54) or (7.55). That plane is moving in the direction of $\boldsymbol{n}$ at the phase velocity $v$. From this situation, a wave $f$ described by (7.53) is called a plane wave.

The refractive index $n$ of a dielectric medium is an important index that characterizes its dielectric properties. It is defined as

$$n \equiv c/v = \sqrt{\mu\varepsilon/\mu_0\varepsilon_0} = \sqrt{\mu_r\varepsilon_r}. \quad (7.56)$$

In a nonmagnetic substance such as glass and polymer materials, we can assume that $\mu_r \approx 1$. Thus, we get an approximate expression as follows:

$$n \approx \sqrt{\varepsilon_r}. \quad (7.57)$$

7.3 Polarized Characteristics of Electromagnetic Waves

As in (7.47), we assume a similar form for a solution of (7.35) such that


7.3 Polarized Characteristics of Electromagnetic Waves 281

E = E0 eiðkxωtÞ ¼ E0 eiðknxωtÞ , ð7:58Þ

where with the second equality we used (7.52). Similarly, we have

H = H0 eiðkxωtÞ ¼ H0 eiðknxωtÞ : ð7:59Þ

In (7.58) and (7.59), E0 and H0 are constant vectors and may take a complex
magnitude. Substituting (7.58) for (7.28) and using (7.10) as well as (7.43) and
(7.46), we have

ikn  E0 eiðknxωtÞ ¼ ðiωÞμH 0 eiðknxωtÞ ¼ ivkμH 0 eiðknxωtÞ


pffiffiffiffiffiffiffiffi
¼ ik μ=εH 0 eiðknxωtÞ :

Comparing coefficients of the exponential functions of the first and last sides, we get
pffiffiffiffiffiffiffiffi
H 0 = n  E0 = μ=ε : ð7:60Þ

Similarly, substituting (7.59) for (7.29) and using (7.7), we get


pffiffiffiffiffiffiffiffi
E0 = μ=ε H 0  n: ð7:61Þ

From (7.26) and (7.27), we have

n  E0 ¼ n  H 0 ¼ 0: ð7:62Þ

This indicates that E and H are both perpendicular to n, i.e., the propagation
direction of the electromagnetic wave. Thus, the electromagnetic wave is character-
ized by a transverse wave. The fields E and H have the same phase on P at an
arbitrary given time. Taking account of (7.60)–(7.62), E, H, and n are mutually
perpendicular to one another. We depict a geometry of E, H, and n for the
electromagnetic plane wave in Fig. 7.4, where a plane P is perpendicular to n.
We find that (7.60) is not independent of (7.61). In fact, taking an outer product
from the right with respect to both sides of (7.60), we have

\[ \mathbf{H}_0\times\mathbf{n} = \left(\mathbf{n}\times\mathbf{E}_0\right)\times\mathbf{n}\big/\sqrt{\mu/\varepsilon} = \mathbf{E}_0\big/\sqrt{\mu/\varepsilon}, \]

where we used (7.61) with the second equality. Thus, we get

\[ \mathbf{n}\times\mathbf{E}_0\times\mathbf{n} = \mathbf{E}_0. \tag{7.63} \]

Meanwhile, vector analysis tells us that [1]



Fig. 7.4 Mutual geometry of E and H for an electromagnetic plane wave in P. E and
H have the same phase on P at an arbitrary given time. The unit vector n is
perpendicular to P

\[ \mathbf{C}\times(\mathbf{A}\times\mathbf{B}) = \mathbf{A}(\mathbf{B}\cdot\mathbf{C}) - \mathbf{B}(\mathbf{C}\cdot\mathbf{A}). \]

In the above, putting B = C = n, we have

\[ \mathbf{n}\times(\mathbf{A}\times\mathbf{n}) = \mathbf{A}(\mathbf{n}\cdot\mathbf{n}) - \mathbf{n}(\mathbf{n}\cdot\mathbf{A}). \]

That is, we have

\[ \mathbf{A} = \mathbf{n}(\mathbf{n}\cdot\mathbf{A}) + \mathbf{n}\times(\mathbf{A}\times\mathbf{n}). \]

This relation means that A can be decomposed into a component parallel to n and
one perpendicular to n. Equation (7.63) shows that E_0 has no component parallel to
n. This is another confirmation that E is perpendicular to n.

In (7.60) and (7.61), √(μ/ε) has a dimension of [Ω]. Make sure that this can be
confirmed by (7.9) and (7.11). Hence, √(μ/ε) is said to be the characteristic impedance
[2]. We denote it by

\[ Z \equiv \sqrt{\mu/\varepsilon}. \tag{7.64} \]

Thus, we have

\[ \mathbf{H}_0 = (\mathbf{n}\times\mathbf{E}_0)/Z. \]

In vacuum we have

\[ Z_0 = \sqrt{\mu_0/\varepsilon_0} \approx 376.7\ [\Omega]. \]
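As a quick numerical illustration of (7.56) and (7.64) (a sketch, not part of the original text; the glass-like value ε_r = 2.25 is an assumed example), one may evaluate the vacuum impedance and the impedance of a nonmagnetic dielectric in Python:

```python
import math

# Vacuum constants (standard values, accurate to the digits shown).
EPS0 = 8.8541878128e-12   # vacuum permittivity [F/m]
MU0 = 4.0e-7 * math.pi    # vacuum permeability [H/m]

def impedance(eps_r: float, mu_r: float = 1.0) -> float:
    """Characteristic impedance Z = sqrt(mu/eps), cf. (7.64)."""
    return math.sqrt(mu_r * MU0 / (eps_r * EPS0))

def refractive_index(eps_r: float, mu_r: float = 1.0) -> float:
    """Refractive index n = sqrt(mu_r * eps_r), cf. (7.56)."""
    return math.sqrt(mu_r * eps_r)

print(impedance(1.0))           # vacuum: ~376.7 ohm, cf. Z0 above
print(refractive_index(2.25))   # assumed glass-like eps_r -> n = 1.5
print(impedance(2.25))          # ~Z0/n = 251.2 ohm
```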

For the electromagnetic wave to be a transverse wave means that neither E nor
H has a component along n. Choosing the positive direction of the z-axis for n and

ignoring components related to partial differentiation with respect to x and y (i.e., the
components related to ∂/∂x and ∂/∂y), we rewrite (7.28) and (7.29) for the individual
Cartesian coordinates as

\[
\begin{aligned}
-\frac{\partial E_y}{\partial z} + \frac{\partial B_x}{\partial t} &= 0 \quad\text{or}\quad -\frac{\partial D_y}{\partial z} + \mu\varepsilon\frac{\partial H_x}{\partial t} = 0,\\
\frac{\partial E_x}{\partial z} + \frac{\partial B_y}{\partial t} &= 0 \quad\text{or}\quad \frac{\partial D_x}{\partial z} + \mu\varepsilon\frac{\partial H_y}{\partial t} = 0,\\
-\frac{\partial H_y}{\partial z} - \frac{\partial D_x}{\partial t} &= 0 \quad\text{or}\quad -\frac{\partial B_y}{\partial z} - \mu\varepsilon\frac{\partial E_x}{\partial t} = 0,\\
\frac{\partial H_x}{\partial z} - \frac{\partial D_y}{\partial t} &= 0 \quad\text{or}\quad \frac{\partial B_x}{\partial z} - \mu\varepsilon\frac{\partial E_y}{\partial t} = 0.
\end{aligned} \tag{7.65}
\]

We differentiate the first equation of (7.65) with respect to z to get

\[ -\frac{\partial^2 E_y}{\partial z^2} + \frac{\partial^2 B_x}{\partial z\,\partial t} = 0. \]

Also, differentiating the fourth equation of (7.65) with respect to t and multiplying
both sides by −μ, we have

\[ -\mu\frac{\partial^2 H_x}{\partial t\,\partial z} + \mu\frac{\partial^2 D_y}{\partial t^2} = 0. \]

Summing both sides of the above equations and arranging terms, we get

\[ \frac{\partial^2 E_y}{\partial z^2} = \mu\varepsilon\frac{\partial^2 E_y}{\partial t^2}. \]

In a similar manner, we have

\[ \frac{\partial^2 E_x}{\partial z^2} = \mu\varepsilon\frac{\partial^2 E_x}{\partial t^2}. \]

Similarly, for the magnetic field, we also get

\[ \frac{\partial^2 H_x}{\partial z^2} = \mu\varepsilon\frac{\partial^2 H_x}{\partial t^2} \quad\text{and}\quad \frac{\partial^2 H_y}{\partial z^2} = \mu\varepsilon\frac{\partial^2 H_y}{\partial t^2}. \]

From the above relations, we have two plane electromagnetic waves polarized along either
the x-axis or the y-axis.
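The following finite-difference sketch (not from the text; the phase velocity and wavenumber are illustrative assumptions) checks numerically that a plane wave E_y = cos(kz − ωt) with ω = kv satisfies ∂²E_y/∂z² = με ∂²E_y/∂t²:

```python
import math

v = 2.0e8                 # assumed phase velocity [m/s]; mu*eps = 1/v**2
k = 1.0e7                 # assumed wavenumber [1/m]
omega = k * v
mu_eps = 1.0 / v ** 2

def Ey(z, t):
    """Plane-wave field polarized along y, cf. (7.47)."""
    return math.cos(k * z - omega * t)

z, t = 0.3e-6, 1.0e-15    # arbitrary evaluation point
h = 1.0e-10               # step in z; the time step is scaled by 1/v
ht = h / v

d2_dz2 = (Ey(z + h, t) - 2 * Ey(z, t) + Ey(z - h, t)) / h ** 2
d2_dt2 = (Ey(z, t + ht) - 2 * Ey(z, t) + Ey(z, t - ht)) / ht ** 2
print(d2_dz2 / (mu_eps * d2_dt2))   # -> ~1.0, i.e., the wave equation holds
```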

What is implied in the above description is that as solutions of (7.35) and (7.36)
we have a plane wave characterized by a specific direction defined by E0 and H0.
This implies that if we observe the electromagnetic wave at a fixed point, both E and
H oscillate along the mutually perpendicular directions E0 and H0. Hence, we say
that the “electric wave” is polarized in the direction E0 and that the “magnetic wave”
is polarized in the direction H0.
To characterize the polarization of the electromagnetic wave, we introduce the
following unit polarization vectors ε_e and ε_m (with indices e and m related to the
electric and magnetic field, respectively) [3]:

\[ \boldsymbol{\varepsilon}_e = \mathbf{E}_0/E_0 \quad\text{and}\quad \boldsymbol{\varepsilon}_m = \mathbf{H}_0/H_0; \qquad H_0 = E_0/Z, \tag{7.66} \]

where E_0 and H_0 are said to be amplitudes and may again be complex. We have

\[ \boldsymbol{\varepsilon}_e \times \boldsymbol{\varepsilon}_m = \mathbf{n}. \tag{7.67} \]

We call ε_e and ε_m the unit polarization vectors of the electric field and magnetic field,
respectively. As noted above, ε_e, ε_m, and n constitute a right-handed system in this
order and are mutually perpendicular to one another.
The phase of E in the plane wave (7.58) and that of H in (7.59) are individually
the same on all the points of P. From the wave equations of (7.35) and (7.36),
however, it is unclear whether E and H have the same phase. Suppose that E and H
had different phases such that

\[ \mathbf{E} = \mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{x} - \omega t)} \]

and

\[ \mathbf{H} = \mathbf{H}_0 e^{i(\mathbf{k}\cdot\mathbf{x} - \omega t + \delta)} = \mathbf{H}_0 e^{i\delta} e^{i(\mathbf{k}\cdot\mathbf{x} - \omega t)} = \widetilde{\mathbf{H}}_0 e^{i(\mathbf{k}\cdot\mathbf{x} - \omega t)}, \]

where H̃_0 (= H_0 e^{iδ}) is complex and the phase factor e^{iδ} in the exponent is unknown.
This factor, however, can be set at zero. To show this, let us make a qualitative
discussion using Fig. 7.5. Figure 7.5 shows the electric field of the plane wave at
some instant as a function of the phase Φ. Suppose that Φ is taken in the direction of n
in Fig. 7.4. Then, from (7.53) we have

\[ \Phi = k\mathbf{n}\cdot\mathbf{x} - \omega t = k\mathbf{n}\cdot\rho\mathbf{n} - \omega t = k\rho - \omega t, \]

where ρ is the distance from the origin. Also we have

\[ \mathbf{E} = \mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{x} - \omega t)} = E_0 \boldsymbol{\varepsilon}_e e^{i(k\rho - \omega t)} \quad (E_0 > 0). \]

Suppose furthermore that the phase is measured at t = 0 and that the electric field is
measured along the ε_e direction. Then, we have

Fig. 7.5 Electric field of a plane wave at some instant as a function of phase. The
sinusoidal curve representing the electric field is shifted with time from left to right
with its form unchanged

\[ \Phi = k\rho \quad\text{and}\quad E = |\mathbf{E}| = E_0 e^{ik\rho}. \tag{7.68} \]

In Fig. 7.5, the real part of E is shown with E = E_0 at ρ = 0. The sinusoidal curve
representing the electric field in Fig. 7.5 is shifted with time from left to right with its
form unchanged.
In this situation, the electric field is to be strengthened with a negative value at a
phase P1 with time according to (7.68), whereas at another phase P2 the electric field
is to be strengthened with a positive value with time. As a result, in a region tucked
between P1 and P2 a spiral magnetic field is generated toward the εm direction, i.e.,
upper side of a plane of paper. The magnitude of the magnetic field is expected to be
maximized at a center point of P1P2 (i.e., the origin of the coordinate system) where
the electric field is maximized as well. Thus, we conclude that E and H have the
same phase. It is important to realize that the generation of the electromagnetic wave
is a direct consequence of the interplay between the electric field and magnetic field
that both change with time. It is truly based upon the nature of the electric and
magnetic fields that are clearly represented in Maxwell’s equations.
Next, we consider the situation where two electromagnetic waves are propagated
in the same direction but with a different phase. Notice again that we are considering
the electromagnetic wave that is propagated in a uniform and infinite dielectric
medium without BCs.

7.4 Superposition of Two Electromagnetic Waves

Let E1 and E2 be two electric waves described such that

\[ \mathbf{E}_1 = E_1 \mathbf{e}_1 e^{i(kz - \omega t)} \quad\text{and}\quad \mathbf{E}_2 = E_2 \mathbf{e}_2 e^{i(kz - \omega t + \delta)}, \]

where E_1 (>0) and E_2 (>0) are amplitudes and e_1 and e_2 represent unit polarization
vectors in the directions of the positive x-axis and y-axis; we assume that the two waves
are being propagated in the direction of the positive z-axis; δ is a phase difference. The
total electric field E is described as the superposition of E_1 and E_2 such that

\[ \mathbf{E} = \mathbf{E}_1 + \mathbf{E}_2 = E_1 \mathbf{e}_1 e^{i(kz-\omega t)} + E_2 \mathbf{e}_2 e^{i(kz-\omega t+\delta)}. \tag{7.69} \]

Note that we usually discuss the polarization characteristics of an electromagnetic
wave by considering only the electric wave. We emphasize that an electric wave and the
concomitant magnetic wave share the same phase in a uniform and infinite dielectric
medium. One reason why the electric wave is taken to represent the electromagnetic
wave is that optical applications are mostly made in nonmagnetic substances such as
glass, water, plastics, and most semiconductors.

Let us view the temporal change of E at a fixed point x = 0; i.e., x = y = z = 0. Then,
taking the real part of (7.69), the x- and y-components of E, i.e., E_x and E_y, are
expressed as

\[ E_x = E_1\cos(\omega t) \quad\text{and}\quad E_y = E_2\cos(\omega t - \delta). \tag{7.70} \]

First, let us briefly think of the case where δ = 0. Eliminating t, we have

\[ E_y = \frac{E_2}{E_1} E_x. \tag{7.71} \]

This is an equation of a straight line. The resulting electric field E is accordingly called
linearly polarized light. That is, when we are observing the electric field of the
relevant light at the origin, the field is oscillating along the straight line described by
(7.71) with the origin centrally located in the oscillating field. If δ = π, we have

\[ E_y = -\frac{E_2}{E_1} E_x. \]

This gives a straight line as well. Therefore, if we wish to seek the relationship
between E_x and E_y, it suffices to examine it as a function of δ in the region
−π/2 ≤ δ ≤ π/2.
(i) Case I: E_1 ≠ E_2.

Let us consider the case where δ ≠ 0 in (7.70). Rewriting the second equation of
(7.70) and inserting the first equation into it so that we can eliminate t, we have

\[ E_y = E_2(\cos\omega t\cos\delta + \sin\omega t\sin\delta) = E_2\left(\cos\delta\,\frac{E_x}{E_1} \pm \sin\delta\sqrt{1 - \frac{E_x^2}{E_1^2}}\right). \]

Rearranging terms of the above equation, we have



\[ \frac{E_y}{E_2} - (\cos\delta)\frac{E_x}{E_1} = \pm(\sin\delta)\sqrt{1 - \frac{E_x^2}{E_1^2}}. \tag{7.72} \]

Squaring both sides of (7.72) and arranging the equation, we get

\[ \frac{E_x^2}{E_1^2\sin^2\delta} - \frac{2(\cos\delta)E_x E_y}{E_1 E_2\sin^2\delta} + \frac{E_y^2}{E_2^2\sin^2\delta} = 1. \tag{7.73} \]

Using a matrix form, we have

\[ (E_x\ E_y)\begin{pmatrix} \dfrac{1}{E_1^2\sin^2\delta} & -\dfrac{\cos\delta}{E_1 E_2\sin^2\delta} \\[2mm] -\dfrac{\cos\delta}{E_1 E_2\sin^2\delta} & \dfrac{1}{E_2^2\sin^2\delta} \end{pmatrix}\begin{pmatrix} E_x \\ E_y \end{pmatrix} = 1. \tag{7.74} \]

Note that the above matrix is real symmetric. In that case, to examine the properties
of the matrix we calculate its determinant along with the principal minors. A principal
minor means a minor with respect to a diagonal element. In this case, the two principal
minors are \( \frac{1}{E_1^2\sin^2\delta} \) and \( \frac{1}{E_2^2\sin^2\delta} \). Also we have

\[ \begin{vmatrix} \dfrac{1}{E_1^2\sin^2\delta} & -\dfrac{\cos\delta}{E_1 E_2\sin^2\delta} \\[2mm] -\dfrac{\cos\delta}{E_1 E_2\sin^2\delta} & \dfrac{1}{E_2^2\sin^2\delta} \end{vmatrix} = \frac{1}{E_1^2 E_2^2\sin^2\delta}. \tag{7.75} \]

Evidently, the two principal minors as well as the determinant are all positive (δ ≠ 0). In
this case, the (2, 2) matrix of (7.74) is said to be positive definite. The related
discussion will be given in Part III. The positive definiteness means that in the
quadratic form described by (7.74), the LHS takes a positive value for any real numbers
E_x and E_y except the unique case of E_x = E_y = 0, which renders the LHS zero. The
positive definiteness of a matrix ensures the existence of positive eigenvalues of
the said matrix.

Let us consider a real symmetric (2, 2) matrix that has positive principal minors
and a positive determinant in a general case. Let such a matrix M be

\[ M = \begin{pmatrix} a & c \\ c & b \end{pmatrix}, \]

where a, b > 0 and det M > 0; i.e., ab − c² > 0. Let the corresponding quadratic form
be Q. Then, we have

    
\[ Q = (x\ y)\begin{pmatrix} a & c \\ c & b \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = ax^2 + 2cxy + by^2 = a\left(x + \frac{cy}{a}\right)^2 - \frac{c^2 y^2}{a} + \frac{aby^2}{a} = a\left(x + \frac{cy}{a}\right)^2 + \frac{y^2}{a}\left(ab - c^2\right). \]

Thus, Q ≥ 0 for any real numbers x and y. We seek a condition under which Q = 0.
We readily find that with M having the above properties, only x = y = 0 makes
Q = 0. Thus, M is positive definite. We will deal with this issue from a more general
standpoint in Part III.
In general, it is pretty complicated to seek the eigenvalues and corresponding eigen-
vectors in the above case. Yet, we can extract important information from (7.74).
The eigenvalues λ are estimated as follows:

\[ \lambda = \frac{E_1^2 + E_2^2 \pm \sqrt{\left(E_1^2 + E_2^2\right)^2 - 4E_1^2 E_2^2\sin^2\delta}}{2E_1^2 E_2^2\sin^2\delta}. \tag{7.76} \]

Notice that λ in (7.76) represents two different positive eigenvalues. It is because the
inside of the square root can be rewritten as

\[ \left(E_1^2 - E_2^2\right)^2 + 4E_1^2 E_2^2\cos^2\delta > 0 \quad (\delta \neq \pi/2). \]

Also we have

\[ E_1^2 + E_2^2 > \sqrt{\left(E_1^2 + E_2^2\right)^2 - 4E_1^2 E_2^2\sin^2\delta}. \]
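These properties are easy to confirm numerically. The sketch below (illustrative amplitudes and phase assumed) builds the matrix of (7.74) and checks that its eigenvalues are positive and agree with the closed form (7.76):

```python
import numpy as np

E1, E2, delta = 2.0, 1.0, 0.8     # assumed illustrative values
s2 = np.sin(delta) ** 2

# Real symmetric matrix of the quadratic form (7.74).
M = np.array([[1.0 / (E1**2 * s2), -np.cos(delta) / (E1 * E2 * s2)],
              [-np.cos(delta) / (E1 * E2 * s2), 1.0 / (E2**2 * s2)]])
print(np.linalg.eigvalsh(M))      # both positive: M is positive definite

# Closed-form eigenvalues (7.76).
T, D = E1**2 + E2**2, 4.0 * E1**2 * E2**2 * s2
lam = (T + np.array([-1.0, 1.0]) * np.sqrt(T**2 - D)) / (2.0 * E1**2 * E2**2 * s2)
print(np.sort(lam))               # matches the eigvalsh output
```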

These relations clearly show that the quadratic form of (7.74) gives an ellipse (i.e.,
elliptically polarized light). Because of the presence of the second term of the LHS of
(7.73), both the major and minor axes of the ellipse are tilted and diverted from the
x- and y-axes.

Let us inspect the ellipse described by (7.74). Inserting E_x = E_1, obtained at t = 0
in (7.70), into (7.73) and solving the quadratic equation with respect to E_y, we get E_y
as a double root such that

\[ E_y = E_2\cos\delta. \]

Similarly, putting E_y = E_2 in (7.73), we have

\[ E_x = E_1\cos\delta. \]

These results show that the ellipse described by (7.73) or (7.74) is internally tangent
to a rectangle as depicted in Fig. 7.6a. Equation (7.69) shows that the electromag-
netic wave is propagated toward the positive direction of the z-axis. Therefore, in
Fig. 7.6a we are peeking into the oncoming wave from the bottom of the plane of the
paper

Fig. 7.6 Trace of the electric field of an elliptically polarized light. (a) The trace is
internally tangent to a rectangle of 2E_1 × 2E_2. In the case of δ > 0, starting from P at
t = 0, the coordinate point representing the electric field traces the ellipse
counterclockwise with time. (b) The trace of an elliptically polarized light for δ = π/2

at a certain position of z = constant. We set the constant = 0. Then, we find that at
t = 0 the electric field is represented by the point P (E_x = E_1, E_y = E_2 cos δ); see
Fig. 7.6a. From (7.70), if δ > 0, P traces the ellipse counterclockwise. It reaches the
maximum point of E_y = E_2 at t = δ/ω. Since the trace of the electric field forms an
ellipse as in Fig. 7.6, the associated light is said to be elliptically polarized light. If
δ < 0 in (7.70), on the other hand, P traces the ellipse clockwise.

In a special case of δ = π/2, the second term of (7.73) vanishes and we have a
simple form described as

\[ \frac{E_x^2}{E_1^2} + \frac{E_y^2}{E_2^2} = 1. \tag{7.77} \]

Thus, the principal axes of the ellipse coincide with the x- and y-axes. On the basis of
(7.70), we see from Fig. 7.6b that starting from P at t = 0, again the coordinate point
representing the electric field traces the ellipse counterclockwise with time; see the
curved arrow of Fig. 7.6b. If δ < 0, the coordinate point traces the ellipse clockwise
with time.
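Before turning to the special case E_1 = E_2, the following sketch (with assumed values E_1 = 2, E_2 = 1, δ = π/3) traces (7.70) over one period and confirms that every point satisfies the ellipse equation (7.73):

```python
import numpy as np

E1, E2, delta = 2.0, 1.0, np.pi / 3.0   # assumed illustrative values
wt = np.linspace(0.0, 2.0 * np.pi, 13)  # omega*t samples over one period

Ex = E1 * np.cos(wt)                    # (7.70)
Ey = E2 * np.cos(wt - delta)

s2 = np.sin(delta) ** 2
lhs = (Ex**2 / (E1**2 * s2)
       - 2.0 * np.cos(delta) * Ex * Ey / (E1 * E2 * s2)
       + Ey**2 / (E2**2 * s2))
print(np.allclose(lhs, 1.0))            # -> True: the trace lies on (7.73)
```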
(ii) Case II: E_1 = E_2.

Now, let us consider a simple but important case. When E_1 = E_2, (7.73) is simplified
to

\[ E_x^2 - 2\cos\delta\,E_x E_y + E_y^2 = E_1^2\sin^2\delta. \tag{7.78} \]

Using a matrix form, we have

\[ (E_x\ E_y)\begin{pmatrix} 1 & -\cos\delta \\ -\cos\delta & 1 \end{pmatrix}\begin{pmatrix} E_x \\ E_y \end{pmatrix} = E_1^2\sin^2\delta. \tag{7.79} \]

We obtain the eigenvalues λ of the matrix of (7.79) such that

\[ \lambda = 1 \pm |\cos\delta|. \tag{7.80} \]

Setting −π/2 ≤ δ ≤ π/2, we have

\[ \lambda = 1 \pm \cos\delta. \tag{7.81} \]

The corresponding normalized eigenvectors v_1 and v_2 (as column vectors) are

\[ \mathbf{v}_1 = \begin{pmatrix} \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} \end{pmatrix} \quad\text{and}\quad \mathbf{v}_2 = \begin{pmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{pmatrix}. \tag{7.82} \]

Thus, we have a diagonalizing unitary matrix P such that

\[ P = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix}. \tag{7.83} \]

Defining the matrix appearing in (7.79) as A such that

\[ A = \begin{pmatrix} 1 & -\cos\delta \\ -\cos\delta & 1 \end{pmatrix}, \tag{7.84} \]

we obtain

\[ P^{-1}AP = \begin{pmatrix} 1 + \cos\delta & 0 \\ 0 & 1 - \cos\delta \end{pmatrix}. \tag{7.85} \]
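A one-line numerical check of the diagonalization (7.85) (a sketch with an assumed value of δ):

```python
import numpy as np

delta = 0.8                                  # assumed phase difference
A = np.array([[1.0, -np.cos(delta)],
              [-np.cos(delta), 1.0]])        # matrix (7.84)
P = np.array([[1.0, 1.0],
              [-1.0, 1.0]]) / np.sqrt(2.0)   # eigenvectors (7.82) as columns

# Should print diag(1 + cos(delta), 1 - cos(delta)), cf. (7.85).
print(np.round(np.linalg.inv(P) @ A @ P, 12))
```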

Notice that the eigenvalues (1 + cos δ) and (1 − cos δ) are both positive as expected.
Rewriting (7.79), we have

\[ (E_x\ E_y)PP^{-1}\begin{pmatrix} 1 & -\cos\delta \\ -\cos\delta & 1 \end{pmatrix}PP^{-1}\begin{pmatrix} E_x \\ E_y \end{pmatrix} = (E_x\ E_y)P\begin{pmatrix} 1+\cos\delta & 0 \\ 0 & 1-\cos\delta \end{pmatrix}P^{-1}\begin{pmatrix} E_x \\ E_y \end{pmatrix} = E_1^2\sin^2\delta. \tag{7.86} \]

Here, let us define new coordinates such that

\[ \begin{pmatrix} u \\ v \end{pmatrix} \equiv P^{-1}\begin{pmatrix} E_x \\ E_y \end{pmatrix}. \tag{7.87} \]

This coordinate transformation corresponds to the transformation of the basis vectors
(e_1 e_2) such that

\[ (\mathbf{e}_1\ \mathbf{e}_2)\begin{pmatrix} E_x \\ E_y \end{pmatrix} = (\mathbf{e}_1\ \mathbf{e}_2)PP^{-1}\begin{pmatrix} E_x \\ E_y \end{pmatrix} = (\mathbf{e}_1'\ \mathbf{e}_2')\begin{pmatrix} u \\ v \end{pmatrix}, \tag{7.88} \]

where the new basis vectors (e_1′ e_2′) are given by

\[ (\mathbf{e}_1'\ \mathbf{e}_2') = (\mathbf{e}_1\ \mathbf{e}_2)P = \left(\frac{1}{\sqrt{2}}\mathbf{e}_1 - \frac{1}{\sqrt{2}}\mathbf{e}_2 \quad \frac{1}{\sqrt{2}}\mathbf{e}_1 + \frac{1}{\sqrt{2}}\mathbf{e}_2\right). \tag{7.89} \]

The coordinate system is depicted in Fig. 7.7 along with the basis vectors. The
relevant discussion will again appear in Part III.

Substituting (7.87) for (7.86) and rearranging terms, we get

\[ \frac{u^2}{E_1^2(1-\cos\delta)} + \frac{v^2}{E_1^2(1+\cos\delta)} = 1. \tag{7.90} \]

Equation (7.90) indicates that the major axis and minor axis are \(E_1\sqrt{1+\cos\delta}\) and
\(E_1\sqrt{1-\cos\delta}\), respectively. When δ = π/2, (7.90) becomes

Fig. 7.7 Relationship between the basis vectors (e_1 e_2) and (e_1′ e_2′) in the case of
E_1 = E_2; see text

Fig. 7.8 Polarized feature of light in the case of E_1 = E_2. (a) If δ > 0, the electric
field traces an ellipse from A_1 via A_2 to A_3 (see text). (b) If δ = π/2, the electric field
traces a circle from C_1 via C_2 to C_3 (left-circularly polarized light)

\[ \frac{u^2}{E_1^2} + \frac{v^2}{E_1^2} = 1. \tag{7.91} \]

This represents a circle. For this reason, the wave described by (7.91) is called
circularly polarized light. In (7.90) where δ ≠ π/2, the wave is said to be
elliptically polarized light. Thus, we have linearly, elliptically, and circularly polar-
ized lights depending on the magnitude of δ.

Let us closely examine the characteristics of the elliptically and circularly polarized
lights in the case of E_1 = E_2. When t = 0, from (7.70) we have

\[ E_x = E_1 \quad\text{and}\quad E_y = E_1\cos\delta. \tag{7.92} \]

This coordinate point corresponds to A_1, whose E_x coordinate is E_1 (see Fig. 7.8a). In
the case of Δt = δ/2ω, E_x = E_y = E_1 cos(δ/2). This point corresponds to A_2 in
Fig. 7.8a. We have

\[ \sqrt{E_x^2 + E_y^2} = \sqrt{2}\,E_1\cos(\delta/2) = E_1\sqrt{1+\cos\delta}. \tag{7.93} \]

This is equal to the major axis as anticipated. With t = Δt,

\[ E_x = E_1\cos(\omega\Delta t) \quad\text{and}\quad E_y = E_1\cos(\omega\Delta t - \delta). \tag{7.94} \]

Notice that E_y takes the maximum E_1 when Δt = δ/ω. Consequently, if δ takes a
positive value, E_y takes the maximum E_1 for a positive Δt, as is similarly the case with
Fig. 7.6a. At that time, E_x = E_1 cos δ < E_1. This point corresponds to A_3 in
Fig. 7.8a. As a result, the electric field traces the ellipse counterclockwise with time,

as in the case of Fig. 7.6. If δ takes a negative value, on the other hand, the field traces
the ellipse clockwise.

If δ = π/2, in (7.94) we have

\[ E_x = E_1\cos(\omega t) \quad\text{and}\quad E_y = E_1\cos\left(\omega t - \frac{\pi}{2}\right) = E_1\sin(\omega t). \tag{7.95} \]

We examine the case of δ = π/2 first. In this case, when t = 0, E_x = E_1 and E_y = 0
(point C_1 in Fig. 7.8b). If ωt = π/4, E_x = E_y = E_1/√2 (point C_2 in Fig. 7.8b). In turn,
if ωt = π/2, E_x = 0 and E_y = E_1 (point C_3). Again the electric field traces the circle
counterclockwise. In this situation, we see the light from above the z-axis. In other
words, we are viewing the light against the direction of its propagation. The wave is
said to be left-circularly polarized and to have positive helicity. In contrast, when
δ = −π/2, starting from point C_1 the electric field traces the circle clockwise.
That light is said to be right-circularly polarized and to have negative helicity.
With the left-circularly polarized light, (7.69) can be rewritten as

E = E1 þ E2 = E1 ðe1 þ ie2 ÞeiðkzωtÞ : ð7:96Þ

Therefore, a complex vector (e1 + ie2) characterizes the left-circular polarization. On


the other hand, (e1  ie2) characterizes the right-circular polarization. To normalize
them, it is convenient to use the following vectors as in the case of Sect. 4.3 [3].

1 1
eþ  pffiffiffi ðe1 þ ie2 Þ and e  pffiffiffi ðe1  ie2 Þ, ð4:45Þ
2 2

In the case of δ ¼ 0, we have a linearly polarized light. For this, the points A1, A2,
and A3 coalesce to be a point on a straight line of Ey ¼ Ex.
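In component form with respect to (e_1 e_2), the circular basis (4.45) makes such decompositions immediate; the sketch below (an illustration, not from the text) resolves a linearly x-polarized field into equal left- and right-circular parts:

```python
import numpy as np

e_plus = np.array([1.0, 1.0j]) / np.sqrt(2.0)    # e+ of (4.45)
e_minus = np.array([1.0, -1.0j]) / np.sqrt(2.0)  # e-

e1 = np.array([1.0, 0.0])        # linear polarization along x
c_plus = np.vdot(e_plus, e1)     # projection onto e+
c_minus = np.vdot(e_minus, e1)   # projection onto e-
print(c_plus, c_minus)           # both 1/sqrt(2): equal circular components
print(np.allclose(c_plus * e_plus + c_minus * e_minus, e1))   # -> True
```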

References

1. Arfken GB, Weber HJ, Harris FE (2013) Mathematical methods for physicists, 7th edn. Academic Press, Waltham
2. Pain HJ (2005) The physics of vibrations and waves, 6th edn. Wiley, Chichester
3. Jackson JD (1999) Classical electrodynamics, 3rd edn. Wiley, New York
Chapter 8
Reflection and Transmission of Electromagnetic Waves in Dielectric Media

In Chap. 7, we considered the propagation of electromagnetic waves in an infinite


uniform dielectric medium. In this chapter, we think of a situation where two
(or more) dielectrics are in contact with each other at a plane interface. When two
dielectric media adjoin each other with an interface, propagating electromagnetic
waves are partly reflected by the interface and partly transmitted beyond the inter-
face. We deal with these phenomena in terms of characteristic impedance of the
dielectric media. In the case of an oblique incidence of a wave, we categorize it into a
transverse electric (TE) wave and transverse magnetic (TM) wave. If a thin plate of a
dielectric is sandwiched by a couple of metal sheets, the electromagnetic wave is
confined within the dielectric. In this case, the propagating mode of the wave differs
from that of a wave propagating in a free space (i.e., a space filled by a three-
dimensionally infinite dielectric medium). If a thin plate of a dielectric having a large
refractive index is sandwiched by a couple of dielectrics with a smaller refractive
index, the electromagnetic wave is also confined within the dielectric with a larger
index. In this case, we have to take account of the total reflection that causes a phase
change upon the reflection. We deal with such specific modes of the electromagnetic
wave propagation. These phenomena are treated both from a basic aspect and from a
point of view of device application. The relevant devices are called waveguides in
optics.

8.1 Electromagnetic Fields at an Interface

We start with examining a condition of an electromagnetic field at the plane


interface. Suppose that two semi-infinite dielectric media D1 and D2 are in contact
with each other at a plane interface. Let us take a small rectangle S that strides the
interface (see Fig. 8.1). Taking a surface integral of both sides of (7.28) over the
strip, we have


\[ \int_S \operatorname{rot}\mathbf{E}\cdot\mathbf{n}\,dS + \int_S \frac{\partial\mathbf{B}}{\partial t}\cdot\mathbf{n}\,dS = 0, \tag{8.1} \]

where n is a unit vector directed along a normal of S as shown. Applying Stokes'
theorem to the first term of (8.1), we get

\[ \oint_C \mathbf{E}\cdot d\mathbf{l} + \frac{\partial\mathbf{B}}{\partial t}\cdot\mathbf{n}\,\Delta l\,\Delta h = 0. \tag{8.2} \]

With the line integral of the first term, C is a closed loop surrounding the rectangle
S and dl = tdl, where t is a unit vector directed toward the tangential direction of
C (see t1 and t2 in Fig. 8.1). The line integration is performed such that C is followed
counter-clockwise in the direction of t.
Figure 8.2 gives an intuitive diagram that explains the Stokes’ theorem. The
diagram shows an overview of a surface S encircled by a closed curve C. Suppose
that we have a spiral vector field E represented by arrowed circles as shown. In that
case, rot E is directed toward the upper side of the plane of paper in the individual
fragments. A summation of rot E  ndS forms a surface integral covering S. Mean-
while, the arrows of adjacent fragments cancel out each other and only the compo-
nents on the periphery (i.e., the curve C) are nonvanishing (see Fig. 8.2). Thus, the
surface integral of rot E is equivalent to the line integral of E. Accordingly, we get
Stokes’ theorem described by [1]

Fig. 8.1 A small rectangle S that strides an interface formed by two semi-infinite
dielectric media D1 and D2. Let a curve C be a closed loop surrounding the rectangle
S. A unit vector n is directed along a normal of S. Unit vectors t_1 and t_2 are directed
along a tangential line of the interface plane

Fig. 8.2 Diagram that intuitively explains Stokes' theorem. In the diagram a surface
S is encircled by a closed curve C. An infinitesimal portion of C is denoted by dl. The
surface S is pertinent to the surface integration. A spiral vector field E is present on
and near S

\[ \int_S \operatorname{rot}\mathbf{E}\cdot\mathbf{n}\,dS = \oint_C \mathbf{E}\cdot d\mathbf{l}. \tag{8.3} \]

Returning back to Fig. 8.1 and taking Δh ⟶ 0, we have (∂B/∂t)·n Δl Δh ⟶ 0. Then,
the second term of (8.2) vanishes and we get

\[ \oint_C \mathbf{E}\cdot d\mathbf{l} = 0. \]

This implies that

\[ \Delta l\,(\mathbf{E}_1\cdot\mathbf{t}_1 + \mathbf{E}_2\cdot\mathbf{t}_2) = 0, \]

where E_1 and E_2 represent the electric fields in the dielectrics D1 and D2 close to the
interface, respectively. Considering t_2 = −t_1 and putting t_1 = t, we get

\[ (\mathbf{E}_1 - \mathbf{E}_2)\cdot\mathbf{t} = 0, \tag{8.4} \]

where t represents a unit vector in the direction of a tangential line of the interface
plane. Equation (8.4) means that the tangential components of the electric field are
continuous on both sides of the interface. We obtain a similar result with the
magnetic field. This can be shown by taking a surface integral of both sides of
(7.29) as well. As a result, we get

\[ (\mathbf{H}_1 - \mathbf{H}_2)\cdot\mathbf{t} = 0, \tag{8.5} \]

where H1 and H2 represent the magnetic field in D1 and D2 close to the interface,
respectively. Hence, from (8.5) the tangential components of the magnetic field are
continuous on both sides of the interface as well.

8.2 Basic Concepts Underlying Phenomena

When an electromagnetic wave is incident upon an interface of dielectrics, its


reflection and transmission (refraction) take place at the interface. We address a
question of how the nature of the dielectrics and the conditions dealt with in the
previous section are associated with the optical phenomena. When we deal with the
problem, we assume non-absorbing media. Notice that the complex wavenumber
vector is responsible for an absorbing medium along with a complex index of
refraction. Nonetheless, our approach is useful to discuss related problems in the
absorbing media. Characteristic impedance plays a key role in the reflection and
transmission of light.

We represent a field (either electric or magnetic) of the incident, reflected, and


transmitted (or refracted) waves by Fi, Fr, and Ft, respectively. We call a dielectric
of the incidence side (and, hence, reflection side) D1 and another dielectric of the
transmission side D2. The fields are described by

\[ \mathbf{F}_i = F_i\boldsymbol{\varepsilon}_i e^{i(\mathbf{k}_i\cdot\mathbf{x} - \omega t)}, \tag{8.6} \]
\[ \mathbf{F}_r = F_r\boldsymbol{\varepsilon}_r e^{i(\mathbf{k}_r\cdot\mathbf{x} - \omega t)}, \tag{8.7} \]
\[ \mathbf{F}_t = F_t\boldsymbol{\varepsilon}_t e^{i(\mathbf{k}_t\cdot\mathbf{x} - \omega t)}, \tag{8.8} \]

where Fi, Fr, and Ft denote an amplitude of the field; εi, εr, and εt represent a unit
vector of polarization direction, i.e., the direction along which the field oscillates; ki,
kr, and kt are wavenumber vectors such that ki ⊥ εi, kr ⊥ εr, and kt ⊥ εt. These
wavenumber vectors represent the propagation directions of individual waves. In
(8.6) to (8.8), indices of i, r, and t stand for incidence, reflection, and transmission,
respectively.
Let xs be an arbitrary position vector at the interface between the dielectrics. Also,
let t be a unit vector paralleling the interface. Thus, tangential components of the
field are described as

\[ F_{it} = F_i(\mathbf{t}\cdot\boldsymbol{\varepsilon}_i)e^{i(\mathbf{k}_i\cdot\mathbf{x}_s - \omega t)}, \tag{8.9} \]
\[ F_{rt} = F_r(\mathbf{t}\cdot\boldsymbol{\varepsilon}_r)e^{i(\mathbf{k}_r\cdot\mathbf{x}_s - \omega t)}, \tag{8.10} \]
\[ F_{tt} = F_t(\mathbf{t}\cdot\boldsymbol{\varepsilon}_t)e^{i(\mathbf{k}_t\cdot\mathbf{x}_s - \omega t)}. \tag{8.11} \]

Note that F_{it} and F_{rt} represent the fields in D1 just close to the interface and that
F_{tt} denotes the field in D2 just close to the interface. Thus, in light of (8.4) and (8.5),
we have

\[ F_{it} + F_{rt} = F_{tt}. \tag{8.12} \]

Notice that (8.12) holds with any position xs and any time t.
Let us think of elementary calculations of exponential functions or exponential
polynomials and the relationship between the individual coefficients and exponents.
With respect to the two functions e^{ikx} and e^{ik′x}, we have two alternatives according
to the value the Wronskian takes. Here, the Wronskian W is expressed as

\[ W = \begin{vmatrix} e^{ikx} & e^{ik'x} \\ \left(e^{ikx}\right)' & \left(e^{ik'x}\right)' \end{vmatrix} = i(k'-k)\,e^{i(k+k')x}. \tag{8.13} \]

(i) W ≠ 0 if and only if k ≠ k′. In this case, e^{ikx} and e^{ik′x} are said to be linearly
independent. That is, on the condition of k ≠ k′, for any x we have

\[ ae^{ikx} + be^{ik'x} = 0 \iff a = b = 0. \tag{8.14} \]

(ii) W = 0 if k = k′. In that case, we have

\[ ae^{ikx} + be^{ik'x} = (a+b)e^{ikx} = 0 \iff a + b = 0. \]

Notice that e^{ikx} never vanishes for any x. To conclude, if we think of an
equation of an exponential polynomial

\[ ae^{ikx} + be^{ik'x} = 0, \]

we have two alternatives regarding the coefficients. One is the trivial case of
a = b = 0 and the other is a + b = 0 (with k = k′).
Next, with respect to e^{ik_1x}, e^{ik_2x}, and e^{ik_3x}, similarly we have

\[ W = \begin{vmatrix} e^{ik_1 x} & e^{ik_2 x} & e^{ik_3 x} \\ \left(e^{ik_1 x}\right)' & \left(e^{ik_2 x}\right)' & \left(e^{ik_3 x}\right)' \\ \left(e^{ik_1 x}\right)'' & \left(e^{ik_2 x}\right)'' & \left(e^{ik_3 x}\right)'' \end{vmatrix} = i(k_1-k_2)(k_2-k_3)(k_3-k_1)\,e^{i(k_1+k_2+k_3)x}, \tag{8.15} \]

where W ≠ 0 if and only if k_1 ≠ k_2, k_2 ≠ k_3, and k_3 ≠ k_1. That is, on this condition,
for any x we have

\[ ae^{ik_1 x} + be^{ik_2 x} + ce^{ik_3 x} = 0 \iff a = b = c = 0. \tag{8.16} \]
If the three exponential functions are linearly dependent, at least two of k_1, k_2, and k_3
are equal to each other, and vice versa. On this condition, again consider the following
equation of an exponential polynomial:

\[ ae^{ik_1 x} + be^{ik_2 x} + ce^{ik_3 x} = 0. \tag{8.17} \]

Without loss of generality, we assume that k_1 = k_2. Then, we have

\[ ae^{ik_1 x} + be^{ik_2 x} + ce^{ik_3 x} = (a+b)e^{ik_1 x} + ce^{ik_3 x} = 0. \]

If k_1 ≠ k_3, we must have

\[ a + b = 0 \quad\text{and}\quad c = 0. \tag{8.18} \]

If, on the other hand, k_1 = k_3, i.e., k_1 = k_2 = k_3, we have


\[ ae^{ik_1 x} + be^{ik_2 x} + ce^{ik_3 x} = (a+b+c)e^{ik_1 x} = 0. \]

That is, we have

\[ a + b + c = 0. \tag{8.19} \]

Consequently, we must have k_1 = k_2 = k_3 so that we can get three nonzero
coefficients a, b, and c.
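The Wronskian criterion itself can be checked symbolically. The following sketch (an illustration using sympy) reproduces (8.13) up to the ordering of k and k′ and shows that it vanishes when the two exponents coincide:

```python
import sympy as sp

x, k1, k2 = sp.symbols('x k1 k2')
f = sp.exp(sp.I * k1 * x)
g = sp.exp(sp.I * k2 * x)

# Wronskian of e^{i k1 x} and e^{i k2 x}, cf. (8.13).
W = sp.simplify(sp.Matrix([[f, g],
                           [sp.diff(f, x), sp.diff(g, x)]]).det())
print(W)               # I*(k2 - k1)*exp(I*x*(k1 + k2)), up to the sign convention
print(W.subs(k2, k1))  # 0: the two functions become linearly dependent
```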
Returning to (8.12), its full description is

\[ F_i(\mathbf{t}\cdot\boldsymbol{\varepsilon}_i)e^{i(\mathbf{k}_i\cdot\mathbf{x}_s-\omega t)} + F_r(\mathbf{t}\cdot\boldsymbol{\varepsilon}_r)e^{i(\mathbf{k}_r\cdot\mathbf{x}_s-\omega t)} - F_t(\mathbf{t}\cdot\boldsymbol{\varepsilon}_t)e^{i(\mathbf{k}_t\cdot\mathbf{x}_s-\omega t)} = 0. \tag{8.20} \]

Again, (8.20) must hold for any position x_s and any time t. Meanwhile, for (8.20) to
have a physical meaning, we should have

\[ F_i(\mathbf{t}\cdot\boldsymbol{\varepsilon}_i) \neq 0, \quad F_r(\mathbf{t}\cdot\boldsymbol{\varepsilon}_r) \neq 0, \quad\text{and}\quad F_t(\mathbf{t}\cdot\boldsymbol{\varepsilon}_t) \neq 0. \tag{8.21} \]

On the basis of the above consideration, we must have the following two relations:

\[ \mathbf{k}_i\cdot\mathbf{x}_s - \omega t = \mathbf{k}_r\cdot\mathbf{x}_s - \omega t = \mathbf{k}_t\cdot\mathbf{x}_s - \omega t \]

or

\[ \mathbf{k}_i\cdot\mathbf{x}_s = \mathbf{k}_r\cdot\mathbf{x}_s = \mathbf{k}_t\cdot\mathbf{x}_s, \tag{8.22} \]

and

\[ F_i(\mathbf{t}\cdot\boldsymbol{\varepsilon}_i) + F_r(\mathbf{t}\cdot\boldsymbol{\varepsilon}_r) - F_t(\mathbf{t}\cdot\boldsymbol{\varepsilon}_t) = 0 \]

or

\[ F_i(\mathbf{t}\cdot\boldsymbol{\varepsilon}_i) + F_r(\mathbf{t}\cdot\boldsymbol{\varepsilon}_r) = F_t(\mathbf{t}\cdot\boldsymbol{\varepsilon}_t). \tag{8.23} \]

In this way, we are able to obtain a relation among the amplitudes of the fields of
incidence, reflection, and transmission. Notice that we get both the relations between
exponents and coefficients at once.

First, let us consider (8.22). Suppose that the incident light (k_i) is propagated in a
dielectric medium D1 parallel to the zx-plane and that the interface is the xy-plane
(see Fig. 8.3). Also suppose that at the interface the light is partly reflected back to
D1 and partly transmitted (or refracted) into another dielectric medium D2. In
Fig. 8.3, k_i, k_r, and k_t represent the incident, reflected, and transmitted lights that
make angles θ, θ′, and ϕ with the z-axis, respectively. Then we have

Fig. 8.3 Geometry of the incident, reflected, and transmitted lights. We assume that
the light is incident from a dielectric medium D1 toward another medium D2. The
wavenumber vectors k_i, k_r, and k_t represent the incident, reflected, and transmitted
(or refracted) lights with angles θ, θ′, and ϕ, respectively. Note here that we did not
assume the equality of θ and θ′ (see text)

\[ \mathbf{k}_i = (\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3)\begin{pmatrix} k_i\sin\theta \\ 0 \\ -k_i\cos\theta \end{pmatrix}, \tag{8.24} \]

\[ \mathbf{x}_s = (\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3)\begin{pmatrix} x \\ y \\ 0 \end{pmatrix}, \tag{8.25} \]

where θ is said to be the incidence angle. A plane formed by k_i and a normal to the
interface is called a plane of incidence (or incidence plane). In Fig. 8.3, the zx-plane
forms the incidence plane. From (8.24) and (8.25), we have

\[ \mathbf{k}_i\cdot\mathbf{x}_s = k_i x\sin\theta, \tag{8.26} \]
\[ \mathbf{k}_r\cdot\mathbf{x}_s = k_{rx}x + k_{ry}y, \tag{8.27} \]
\[ \mathbf{k}_t\cdot\mathbf{x}_s = k_{tx}x + k_{ty}y, \tag{8.28} \]

where k_i = |k_i|; k_{rx} and k_{ry} are the x and y components of k_r; similarly, k_{tx} and
k_{ty} are the x and y components of k_t.

Since (8.22) holds for any x and y, we have

\[ k_i\sin\theta = k_{rx} = k_{tx}, \tag{8.29} \]
\[ k_{ry} = k_{ty} = 0. \tag{8.30} \]

From (8.30) neither k_r nor k_t has a y component. This means that k_i, k_r, and k_t are
coplanar. That is, the incident, reflected, and transmitted waves are all parallel to the
zx-plane. Notice that at the beginning we did not assume the coplanarity of those
waves. We did not assume the equality of θ and θ′ either (vide infra). From (8.29)
and Fig. 8.3, however, we have

\[ k_i\sin\theta = k_r\sin\theta' = k_t\sin\phi, \tag{8.31} \]

where k_r = |k_r| and k_t = |k_t|; θ′ and ϕ are said to be the reflection angle and the
refraction angle, respectively. Thus, the end points of k_i, k_r, and k_t are connected on a
straight line that parallels the z-axis. Figure 8.3 clearly shows it.

Now, we suppose that the wavelength of the electromagnetic wave in D1 is λ_1 and
that in D2 is λ_2. Since the incident light and reflected light are propagated in D1, we
have

\[ k_i = k_r = 2\pi/\lambda_1. \tag{8.32} \]

From (8.31) and (8.32), we get

\[ \sin\theta = \sin\theta'. \tag{8.33} \]

Therefore, we have either θ = θ′ or θ′ = π − θ. Since 0 < θ, θ′ < π/2, we have

\[ \theta = \theta'. \tag{8.34} \]

Then, returning back to (8.31), we have

\[ k_i\sin\theta = k_r\sin\theta = k_t\sin\phi. \tag{8.35} \]

This implies that the components of k_i, k_r, and k_t tangential to the interface are
the same.

Meanwhile, we have

\[ k_t = 2\pi/\lambda_2. \tag{8.36} \]

Also we have

\[ c = \lambda_0\nu, \quad v_1 = \lambda_1\nu, \quad v_2 = \lambda_2\nu, \tag{8.37} \]

where v_1 and v_2 are the phase velocities of light in D1 and D2, respectively. Since ν is
common to D1 and D2, we have

\[ c/\lambda_0 = v_1/\lambda_1 = v_2/\lambda_2 \]

or

\[ c/v_1 = \lambda_0/\lambda_1 = n_1, \qquad c/v_2 = \lambda_0/\lambda_2 = n_2, \tag{8.38} \]


where λ_0 is the wavelength in vacuum; n_1 and n_2 are the refractive indices of D1 and
D2, respectively. Combining (8.35) with (8.32), (8.36), and (8.38), we have several
relations such that

\[ \frac{\sin\theta}{\sin\phi} = \frac{k_t}{k_i} = \frac{\lambda_1}{\lambda_2} = \frac{n_2}{n_1}\ (\equiv n), \tag{8.39} \]

where n is said to be the relative refractive index of D2 relative to D1. The relation
(8.39) is called Snell's law. Notice that (8.39) reflects the kinematic aspect of light
and that this characteristic comes from the exponents of (8.20).
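As a simple numerical illustration of Snell's law (8.39) (a sketch; the air-to-glass indices n_1 = 1.0 and n_2 = 1.5 are assumed values):

```python
import math

def refraction_angle(theta: float, n1: float, n2: float) -> float:
    """Refraction angle phi from Snell's law (8.39): n1 sin(theta) = n2 sin(phi)."""
    s = n1 * math.sin(theta) / n2
    if abs(s) > 1.0:
        raise ValueError("total reflection: no real refraction angle")
    return math.asin(s)

phi = refraction_angle(math.radians(30.0), 1.0, 1.5)
print(math.degrees(phi))   # ~19.47 degrees
```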

8.3 Transverse Electric (TE) Waves and Transverse Magnetic (TM) Waves

On the basis of the above argument, we are now in the position to determine the
relations among amplitudes of the electromagnetic fields of waves of incidence,
reflection, and transmission. Notice that since we are dealing with non-absorbing
media, the relevant amplitudes are real (i.e., positive or negative). In other words,
when the phase is retained upon reflection, we have a positive amplitude due to
ei0 ¼ 1. When the phase is reversed upon reflection, on the other hand, we will be
treating a negative amplitude due to eiπ ¼  1. Nevertheless, when we consider the
total reflection, we deal with a complex amplitude (vide infra).
We start with the discussion of the vertical incidence of an electromagnetic wave
before the general oblique incidence. In Fig. 8.4a, we depict electric fields E and
magnetic fields H obtained at a certain moment near the interface. We index, e.g., Ei
for the incident field. There we define unit polarization vectors of the electric field εi,
εr, and εt as identical to be e1 (a unit vector in the direction of the x-axis). In (8.6), we
also define Fi (both electric and magnetic fields) as positive.

Fig. 8.4 Geometry of the electromagnetic fields near the interface between dielectric
media D1 and D2 in the case of vertical incidence. (a) All E_i, E_r, and E_t are directed
in the same direction e_1 (i.e., a unit vector in the positive direction of the x-axis).
(b) Although E_i and E_t are directed in the same direction, E_r is reversed. In this case,
we define E_r as negative

We have two cases about a geometry of the fields (see Fig. 8.4). The first case is
that all Ei, Er, and Et are directed in the same direction (i.e., the positive direction of
the x-axis); see Fig. 8.4a. Another case is that although Ei and Et are directed in the
same direction, Er is reversed (Fig. 8.4b). In this case, we define Er as negative.
Notice that Ei and Et are always directed in the same direction and that Er is directed
either in the same direction or in the opposite direction according to the nature of the
dielectrics. The situation will be discussed soon.
Meanwhile, unit polarization vectors of the magnetic fields are determined by
(7.67) for the incident, reflected, and transmitted waves. In Fig. 8.4, the magnetic
fields are polarized along the y-axis (i.e., perpendicular to the plane of paper). The
magnetic fields Hi and Ht are always directed to the same direction as in the case of
the electric fields. On the other hand, if the phase of Er is conserved, the direction of
Hr is reversed and vice versa. This converse relationship with respect to the electric
and magnetic fields results solely from the requirement that E, H, and the propaga-
tion unit vector n of light must constitute a right-handed system in this order. Notice
that n is reversed upon reflection.
Next, let us consider an oblique incidence. With the oblique incidence, electro-
magnetic waves are classified into two special categories, i.e., transverse electric
(TE) waves (or modes) or transverse magnetic (TM) waves (or modes). The TE wave
is characterized by the electric field that is perpendicular to the incidence plane,
whereas the TM wave is characterized by the magnetic field that is perpendicular to
the incidence plane. Here the incidence plane is a plane that is formed by the
propagation direction of the incident light and the normal to the interface of the
two dielectrics. Since E, H, and n form a right-handed system, in the TE wave H lies
on the incidence plane. For the same reason, in the TM wave E lies on the incidence
plane.
In a general case where a field is polarized in an arbitrary direction, that field can
be formed by superimposing two fields corresponding to the TE and TM waves. In
other words, if we take an arbitrary field E, it can be decomposed into a component
having a unit polarization vector directed perpendicular to the incidence plane and
another component having the polarization vector that lies on the incidence plane.
These two components are orthogonal to each other.
Example 8.1: TE Wave In Fig. 8.5 we depict the geometry of oblique incidence of
a TE wave. The xy-plane defines the interface of the two dielectrics and t of (8.9) lies
on that plane. The zx-plane defines the incidence plane. In this case, E is polarized
along the y-axis with H polarized in the zx-plane. That is, regarding E, we choose
polarization direction εi, εr, and εt of the electric field as e2 (a unit vector toward the
positive direction of the y-axis that is perpendicular to the plane of paper). In Fig. 8.5,
the polarization direction of the electric field is denoted by a symbol ⨂. Therefore,
we have

\[ \mathbf{e}_2\cdot\boldsymbol{\varepsilon}_i = \mathbf{e}_2\cdot\boldsymbol{\varepsilon}_r = \mathbf{e}_2\cdot\boldsymbol{\varepsilon}_t = 1. \tag{8.40} \]

Fig. 8.5 Geometry of the electromagnetic fields near the interface between dielectric
media D1 and D2 in the case of oblique incidence of a TE wave. The electric field E is
polarized along the y-axis (i.e., perpendicular to the plane of paper) with H polarized
in the zx-plane. Polarization directions ε_i, ε_r, and ε_t are given for H. To avoid
complication, neither H_i nor H_t is shown

For H we define the direction of unit polarization vectors εi, εr, and εt so that their
direction cosine relative to the x-axis can be positive; see Fig. 8.5. Choosing e1
(a unit vector in the direction of the x-axis) for t of (8.9) with regard to H, we have

\[ \mathbf{e}_1\cdot\boldsymbol{\varepsilon}_i = \cos\theta, \quad \mathbf{e}_1\cdot\boldsymbol{\varepsilon}_r = \cos\theta, \quad\text{and}\quad \mathbf{e}_1\cdot\boldsymbol{\varepsilon}_t = \cos\phi. \tag{8.41} \]

According as Hr is directed to the same direction as εr or the opposite direction to


εr, the amplitude is defined as positive or negative, as in the case of the vertical
incidence. In Fig. 8.5, we depict the case where the amplitude Hr is negative. That is,
the phase of the magnetic field is reversed upon reflection and, hence, Hr is in an
opposite direction to εr in Fig. 8.5. Note that Hi and Ht are in the same direction as εi
and εt, respectively. To avoid complication, neither Hi nor Ht is shown in Fig. 8.5.
Applying (8.23) to both E and H, we have

\[ E_i + E_r = E_t, \tag{8.42} \]
\[ H_i\cos\theta + H_r\cos\theta = H_t\cos\phi. \tag{8.43} \]

To derive the above equations, we choose t = e_2 for E and t = e_1 for H in (8.23).
Because of the abovementioned converse relationship between E and H, we have

\[ E_r H_r < 0. \tag{8.44} \]

Suppose that we carry out an experiment to determine six amplitudes in (8.42)


and (8.43). Out of those quantities, we can freely choose and fix Ei. Then, we have
five unknown amplitudes, i.e., Er, Et, Hi, Hr, and Ht. Thus, we need three more
relations to determine them. Here information about the characteristic impedance
Z is useful. It was defined as (7.64). From (8.6) to (8.8) as well as (8.44), we get

\[ Z_1 = \sqrt{\mu_1/\varepsilon_1} = E_i/H_i = -E_r/H_r, \tag{8.45} \]
\[ Z_2 = \sqrt{\mu_2/\varepsilon_2} = E_t/H_t, \tag{8.46} \]

where ε_1 and μ_1 are the permittivity and permeability of D1, respectively; ε_2 and μ_2
are the permittivity and permeability of D2, respectively. As an example, we have

\[ \mathbf{H}_i = \mathbf{n}\times\mathbf{E}_i/Z_1 = \mathbf{n}\times E_i\boldsymbol{\varepsilon}_{i,e}e^{i(\mathbf{k}_i\cdot\mathbf{x}-\omega t)}/Z_1 = E_i\boldsymbol{\varepsilon}_{i,m}e^{i(\mathbf{k}_i\cdot\mathbf{x}-\omega t)}/Z_1 = H_i\boldsymbol{\varepsilon}_{i,m}e^{i(\mathbf{k}_i\cdot\mathbf{x}-\omega t)}, \tag{8.47} \]

where we distinguish the polarization vectors of the electric and magnetic fields. Note,
however, that in the above discussion we did not distinguish these vectors, to avoid
complication. Comparing the coefficients of the last relation of (8.47), we get

\[ E_i/Z_1 = H_i. \tag{8.48} \]

On the basis of (8.42) to (8.46), we are able to determine E_r, E_t, H_i, H_r, and H_t.
What we wish to determine, however, is the ratio among those quantities. To this
end, dividing (8.42) and (8.43) by E_i (>0), we define the following quantities:

\[ R_E^{\perp} \equiv E_r/E_i \quad\text{and}\quad T_E^{\perp} \equiv E_t/E_i, \tag{8.49} \]

where R_E^⊥ and T_E^⊥ are said to be the reflection coefficient and transmission
coefficient of the electric field, respectively; the symbol ⊥ denotes a quantity of the
TE wave (i.e., the electric field oscillating vertically with respect to the incidence
plane). Thus, rewriting (8.42) and (8.43) using R_E^⊥ and T_E^⊥, we have

\[ R_E^{\perp} - T_E^{\perp} = -1, \qquad R_E^{\perp}\frac{\cos\theta}{Z_1} + T_E^{\perp}\frac{\cos\phi}{Z_2} = \frac{\cos\theta}{Z_1}. \tag{8.50} \]

Using Cramer's rule of matrix algebra, we have the solution such that

\[ R_E^{\perp} = \frac{\begin{vmatrix} -1 & -1 \\ \dfrac{\cos\theta}{Z_1} & \dfrac{\cos\phi}{Z_2} \end{vmatrix}}{\begin{vmatrix} 1 & -1 \\ \dfrac{\cos\theta}{Z_1} & \dfrac{\cos\phi}{Z_2} \end{vmatrix}} = \frac{Z_2\cos\theta - Z_1\cos\phi}{Z_2\cos\theta + Z_1\cos\phi}, \tag{8.51} \]

\[ T_E^{\perp} = \frac{\begin{vmatrix} 1 & -1 \\ \dfrac{\cos\theta}{Z_1} & \dfrac{\cos\theta}{Z_1} \end{vmatrix}}{\begin{vmatrix} 1 & -1 \\ \dfrac{\cos\theta}{Z_1} & \dfrac{\cos\phi}{Z_2} \end{vmatrix}} = \frac{2Z_2\cos\theta}{Z_2\cos\theta + Z_1\cos\phi}. \tag{8.52} \]

Similarly, defining

\[ R_H^{\perp} \equiv H_r/H_i \quad\text{and}\quad T_H^{\perp} \equiv H_t/H_i, \tag{8.53} \]

where R_H^⊥ and T_H^⊥ are said to be the reflection coefficient and transmission
coefficient of the magnetic field, respectively, we get

\[ R_H^{\perp} = \frac{Z_1\cos\phi - Z_2\cos\theta}{Z_2\cos\theta + Z_1\cos\phi}, \tag{8.54} \]

\[ T_H^{\perp} = \frac{2Z_1\cos\theta}{Z_2\cos\theta + Z_1\cos\phi}. \tag{8.55} \]

In this case, rewrite (8.42) as a relation among H_i, H_r, and H_t using (8.45) and (8.46).
Derivation of (8.54) and (8.55) is left for the readers. Notice also that

\[ R_H^{\perp} = -R_E^{\perp}. \tag{8.56} \]

This relation can easily be derived from (8.45).


Example 8.2: TM Wave In a manner similar to that described above, we obtain
information about the TM wave. Switching a role of E and H, we assume that H is
polarized along the y-axis with E polarized in the zx-plane. Following the aforemen-
tioned procedures, we have

E i cos θ þ E r cos θ ¼ E t cos ϕ, ð8:57Þ


Hi þ Hr ¼ Ht : ð8:58Þ

From (8.57) and (8.58), similarly we get

k Z 2 cos ϕ  Z 1 cos θ
RE ¼ , ð8:59Þ
Z 1 cos θ þ Z 2 cos ϕ
k 2Z 2 cos θ
TE ¼ : ð8:60Þ
Z 1 cos θ þ Z 2 cos ϕ

Also, we get

Table 8.1 Reflection and transmission coefficients of TE and TM waves

⊥ Incidence (TE):
  R_E^⊥ = (Z_2 cos θ − Z_1 cos ϕ) / (Z_2 cos θ + Z_1 cos ϕ)
  R_H^⊥ = (Z_1 cos ϕ − Z_2 cos θ) / (Z_2 cos θ + Z_1 cos ϕ) = −R_E^⊥
  T_E^⊥ = 2Z_2 cos θ / (Z_2 cos θ + Z_1 cos ϕ)
  T_H^⊥ = 2Z_1 cos θ / (Z_2 cos θ + Z_1 cos ϕ)
  −R_E^⊥ R_H^⊥ + T_E^⊥ T_H^⊥ (cos ϕ / cos θ) = 1   (R^⊥ + T^⊥ = 1)

∥ Incidence (TM):
  R_E^∥ = (Z_2 cos ϕ − Z_1 cos θ) / (Z_1 cos θ + Z_2 cos ϕ)
  R_H^∥ = (Z_1 cos θ − Z_2 cos ϕ) / (Z_1 cos θ + Z_2 cos ϕ) = −R_E^∥
  T_E^∥ = 2Z_2 cos θ / (Z_1 cos θ + Z_2 cos ϕ)
  T_H^∥ = 2Z_1 cos θ / (Z_1 cos θ + Z_2 cos ϕ)
  −R_E^∥ R_H^∥ + T_E^∥ T_H^∥ (cos ϕ / cos θ) = 1   (R^∥ + T^∥ = 1)

\[ R_H^{\parallel} = \frac{Z_1\cos\theta - Z_2\cos\phi}{Z_1\cos\theta + Z_2\cos\phi} = -R_E^{\parallel}, \tag{8.61} \]

\[ T_H^{\parallel} = \frac{2Z_1\cos\theta}{Z_1\cos\theta + Z_2\cos\phi}. \tag{8.62} \]

In Table 8.1, we list the important coefficients in relation to the reflection and
transmission of electromagnetic waves along with their relationship.
In Examples 8.1 and 8.2, we have examined how the reflection and transmission
coefficients vary as a function of characteristic impedance as well as incidence and
refraction angles. Meanwhile, in a nonmagnetic substance a refractive index n can be
approximated as
\[ n \approx \sqrt{\varepsilon_r}, \tag{7.57} \]

assuming that μ_r ≈ 1. In this case, we have

\[ Z = \sqrt{\mu/\varepsilon} = \sqrt{\mu_r\mu_0/\varepsilon_r\varepsilon_0} \approx Z_0/\sqrt{\varepsilon_r} \approx Z_0/n. \]

Using this relation, we can readily rewrite the reflection and transmission coeffi-
cients as a function of refractive indices of the dielectrics. The derivation is left for
the readers.
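Along those lines, the sketch below (illustrative indices and incidence angle assumed) evaluates the TE coefficients of Table 8.1 with Z ≈ Z_0/n and confirms the energy relation −R_E^⊥R_H^⊥ + T_E^⊥T_H^⊥ cos ϕ/cos θ = 1 listed there:

```python
import numpy as np

n1, n2, theta = 1.0, 1.5, np.radians(45.0)   # assumed air-to-glass incidence
phi = np.arcsin(n1 * np.sin(theta) / n2)     # Snell's law (8.39)
Z1, Z2 = 1.0 / n1, 1.0 / n2                  # Z ~ Z0/n; Z0 cancels in the ratios

den = Z2 * np.cos(theta) + Z1 * np.cos(phi)
RE = (Z2 * np.cos(theta) - Z1 * np.cos(phi)) / den   # (8.51)
RH = (Z1 * np.cos(phi) - Z2 * np.cos(theta)) / den   # (8.54) = -RE
TE = 2.0 * Z2 * np.cos(theta) / den                  # (8.52)
TH = 2.0 * Z1 * np.cos(theta) / den                  # (8.55)

print(-RE * RH + TE * TH * np.cos(phi) / np.cos(theta))   # -> 1.0
```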

8.4 Energy Transport by Electromagnetic Waves

Returning to (7.58) and (7.59), let us consider energy transport in a dielectric


medium by electromagnetic waves. Let us describe their electric (E) and magnetic
(H) fields of the electromagnetic waves in a uniform and infinite dielectric medium
such that

\[ \mathbf{E} = E\boldsymbol{\varepsilon}_e e^{i(k\mathbf{n}\cdot\mathbf{x} - \omega t)}, \tag{8.63} \]
\[ \mathbf{H} = H\boldsymbol{\varepsilon}_m e^{i(k\mathbf{n}\cdot\mathbf{x} - \omega t)}, \tag{8.64} \]

where ε_e and ε_m are unit polarization vectors; we assume that both E and H are
positive. Notice again that ε_e, ε_m, and n constitute a right-handed system in this
order.

The energy transport is characterized by the Poynting vector S described by

\[ \mathbf{S} = \mathbf{E}\times\mathbf{H}. \tag{8.65} \]

Since E and H have the dimensions [V/m] and [A/m], respectively, S has the dimension
[W/m²]. Hence, S represents an energy flow per unit time and per unit area with
respect to the propagation direction. For simplicity, let us assume that the
electromagnetic wave is propagating in the z-direction. Then we have

\[ \mathbf{E} = E\boldsymbol{\varepsilon}_e e^{i(kz - \omega t)}, \tag{8.66} \]
\[ \mathbf{H} = H\boldsymbol{\varepsilon}_m e^{i(kz - \omega t)}. \tag{8.67} \]

To seek the time-averaged energy flow in the z-direction, it suffices to multiply the
real parts of (8.66) and (8.67) and integrate the product over a period T at the point
z = 0. Thus, the time-averaged Poynting vector S̄ is given by

\[ \bar{\mathbf{S}} = \mathbf{e}_3\frac{EH}{T}\int_0^T \cos^2\omega t\,dt, \tag{8.68} \]

where T = 1/ν = 2π/ω. Using the trigonometric formula

\[ \cos^2\omega t = \frac{1}{2}(1 + \cos 2\omega t), \tag{8.69} \]

the integration can easily be performed. Thus, we get

\[ \bar{\mathbf{S}} = \frac{1}{2}EH\mathbf{e}_3. \tag{8.70} \]

Equivalently, we have

\[ \bar{\mathbf{S}} = \frac{1}{2}\mathbf{E}\times\mathbf{H}^*. \tag{8.71} \]
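The factor 1/2 in (8.70) is just the time average of cos² ωt over a period; a quick numerical check (with an assumed frequency) is:

```python
import numpy as np

omega = 2.0 * np.pi * 1.0e9        # assumed angular frequency [rad/s]
T = 2.0 * np.pi / omega
t = np.linspace(0.0, T, 100001)

avg = np.trapz(np.cos(omega * t) ** 2, t) / T
print(avg)                         # -> 0.5, reproducing the 1/2 of (8.70)
```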

Meanwhile, an energy density W is given by



\[ W = \frac{1}{2}(\mathbf{E}\cdot\mathbf{D} + \mathbf{H}\cdot\mathbf{B}), \tag{8.72} \]

where the first and second terms are pertinent to the electric and magnetic fields,
respectively. Note in (8.72) that the dimension of E·D is [V/m · C/m²] = [J/m³] and
that the dimension of H·B is [A/m · Vs/m²] = [Ws/m³] = [J/m³]. Using (7.7) and
(7.10), we have

\[ W = \frac{1}{2}\left(\varepsilon E^2 + \mu H^2\right). \tag{8.73} \]

As in the above case, estimating the time-averaged energy density W̄, we get

\[ \bar{W} = \frac{1}{2}\left(\frac{1}{2}\varepsilon E^2 + \frac{1}{2}\mu H^2\right) = \frac{1}{4}\varepsilon E^2 + \frac{1}{4}\mu H^2. \tag{8.74} \]

We also get this relation by integrating (8.73) over a wavelength λ at the time t = 0.
Using (7.60) and (7.61), we have

\[ \varepsilon E^2 = \mu H^2. \tag{8.75} \]

This implies that the energy density resulting from the electric field and that due to
the magnetic field have the same value. Thus, rewriting (8.74) we have

\[ \bar{W} = \frac{1}{2}\varepsilon E^2 = \frac{1}{2}\mu H^2. \tag{8.76} \]

Moreover, using (7.43), we have for the impedance

\[ Z = E/H = \sqrt{\mu/\varepsilon} = \mu v \quad\text{or}\quad E = \mu vH. \tag{8.77} \]

Using this relation along with (8.75), we get

\[ \bar{\mathbf{S}} = \frac{1}{2}v\varepsilon E^2\mathbf{e}_3 = \frac{1}{2}v\mu H^2\mathbf{e}_3. \tag{8.78} \]

Thus, we have various relations among the amplitudes of electromagnetic waves and
related physical quantities together with the constants of the dielectrics.

Returning to Examples 8.1 and 8.2, let us further investigate the reflection and
transmission properties of the electromagnetic waves. From (8.51) to (8.55) as well
as (8.59) to (8.62), we get in both the cases of TE and TM waves

\[ -R_E^{\perp}R_H^{\perp} + T_E^{\perp}T_H^{\perp}\frac{\cos\phi}{\cos\theta} = 1, \tag{8.79} \]
cos θ

\[ -R_E^{\parallel}R_H^{\parallel} + T_E^{\parallel}T_H^{\parallel}\frac{\cos\phi}{\cos\theta} = 1. \tag{8.80} \]

In both the TE and TM cases, we define the reflectance R and transmittance T such that

\[ R \equiv -R_E R_H = R_E^2 = \frac{2|\bar{\mathbf{S}}_r|}{2|\bar{\mathbf{S}}_i|} = \frac{|\bar{\mathbf{S}}_r|}{|\bar{\mathbf{S}}_i|}, \tag{8.81} \]

where S̄_r and S̄_i are the time-averaged Poynting vectors of the reflected and incident
waves, respectively. Also, we have

\[ T \equiv T_E T_H\frac{\cos\phi}{\cos\theta} = \frac{2|\bar{\mathbf{S}}_t|\cos\phi}{2|\bar{\mathbf{S}}_i|\cos\theta} = \frac{|\bar{\mathbf{S}}_t|\cos\phi}{|\bar{\mathbf{S}}_i|\cos\theta}, \tag{8.82} \]

where S̄_t is the time-averaged Poynting vector of the transmitted wave. Thus, we have

\[ R + T = 1. \tag{8.83} \]
The relation (8.83) represents the energy conservation. The factor cos ϕ/cos θ can be
understood from Fig. 8.6, which depicts a luminous flux near the interface. Suppose
that we have an incident wave with an irradiance I [W/m²] whose incidence plane is
the zx-plane. Notice that I has the same dimension as a Poynting vector.

Here let us think of the luminous flux that is getting through a unit area (i.e., a unit
length square) perpendicular to the propagation direction of the light. Then, this flux
illuminates an area on the interface of a unit length (in the y-direction) multiplied by
a length of 1/cos θ (in the x-direction). That is, the luminous flux has been widened
(or squeezed) by cos ϕ/cos θ times after getting through the interface. The irradiance
has been weakened (or strengthened) accordingly (see Fig. 8.6). Thus, to take a
balance of income and outgo with respect to the luminous flux before and after
getting through the interface, the transmission irradiance must be multiplied by the
factor cos ϕ/cos θ.

Fig. 8.6 Luminous flux near the interface

8.5 Brewster Angles and Critical Angles

In this section and subsequent sections, we deal with nonmagnetic substances as
dielectrics; namely, we assume μ_r ≈ 1. In that case, as mentioned in Sect. 8.3 we
rewrite, e.g., (8.51) and (8.59) as

\[ R_E^{\perp} = \frac{\cos\theta - n\cos\phi}{\cos\theta + n\cos\phi}, \tag{8.84} \]

\[ R_E^{\parallel} = \frac{\cos\phi - n\cos\theta}{\cos\phi + n\cos\theta}, \tag{8.85} \]

where n (= n_2/n_1) is the relative refractive index of D2 relative to D1. Let us think of
the condition on which R_E^⊥ = 0 or R_E^∥ = 0.
First, we consider (8.84). We have

\[ [\text{numerator of } (8.84)] = \cos\theta - n\cos\phi = \cos\theta - \frac{\sin\theta}{\sin\phi}\cos\phi = \frac{\sin\phi\cos\theta - \sin\theta\cos\phi}{\sin\phi} = \frac{\sin(\phi-\theta)}{\sin\phi}, \tag{8.86} \]

where with the second equality we used Snell's law; the last equality is due to a
trigonometric formula. Since we assume 0 < θ < π/2 and 0 < ϕ < π/2, we have
−π/2 < ϕ − θ < π/2. Therefore, sin(ϕ − θ) = 0 if and only if ϕ − θ = 0. Namely,
only when ϕ = θ could R_E^⊥ vanish. For different dielectrics having different
refractive indices, we have ϕ = θ only if ϕ = θ = 0 (i.e., vertical incidence).
But in that case we have

\[ \lim_{\phi\to 0,\,\theta\to 0}\frac{\sin(\phi-\theta)}{\sin\phi} = \frac{0}{0}. \]

This is a limit of indeterminate form. From (8.84), however, we have

\[ R_E^{\perp} = \frac{1-n}{1+n} \tag{8.87} \]

for ϕ = θ = 0. This implies that R_E^⊥ does not vanish at ϕ = θ = 0. Thus, R_E^⊥ never
vanishes for any θ or ϕ. Note that under this condition we naturally have

\[ R_E^{\parallel} = \frac{1-n}{1+n}. \]

This is because with ϕ = θ = 0 there is no physical difference between the TE and TM
waves.

In turn, let us examine (8.85) similarly in the case of the TM wave. We have
In turn, let us examine (8.85) similarly with the case of TM wave. We have

\[ [\text{numerator of } (8.85)] = \cos\phi - n\cos\theta = \cos\phi - \frac{\sin\theta}{\sin\phi}\cos\theta = \frac{\sin\phi\cos\phi - \sin\theta\cos\theta}{\sin\phi} = \frac{\sin(\phi-\theta)\cos(\phi+\theta)}{\sin\phi}. \tag{8.88} \]

With the last equality of (8.88), we used a trigonometric formula. From (8.86) we
know that sin(ϕ − θ)/sin ϕ does not vanish. Therefore, for R_E^∥ to vanish, we need
cos(ϕ + θ) = 0. Since 0 < ϕ + θ < π, cos(ϕ + θ) = 0 if and only if

\[ \phi + \theta = \pi/2. \tag{8.89} \]

In other words, for the particular angles θ = θ_B and ϕ = ϕ_B that satisfy

\[ \phi_B + \theta_B = \pi/2, \tag{8.90} \]

we have R_E^∥ = 0; i.e., we do not observe a reflected wave. The particular angle θ_B is
said to be the Brewster angle. For θ_B we have

\[ \sin\phi_B = \sin\left(\frac{\pi}{2} - \theta_B\right) = \cos\theta_B, \]
\[ n = \sin\theta_B/\sin\phi_B = \sin\theta_B/\cos\theta_B = \tan\theta_B \quad\text{or}\quad \theta_B = \tan^{-1}n, \]
\[ \phi_B = \tan^{-1}n^{-1}. \tag{8.91} \]

Suppose that we have a parallel plate consisting of a dielectric D2 of refractive
index n_2 sandwiched by another dielectric D1 of refractive index n_1 (Fig. 8.7).
Let θ_B be the Brewster angle when the TM wave is incident from D1 to D2.

Fig. 8.7 Diagram that explains the Brewster angle. Suppose that a parallel plate
consisting of a dielectric D2 of refractive index n_2 is sandwiched by another
dielectric D1 of refractive index n_1. The incidence angle θ_B represents the Brewster
angle observed when the TM wave is incident from D1 to D2; ϕ_B is another Brewster
angle that is observed when the TM wave is getting back from D2 to D1

In the

above discussion, we defined the relative refractive index n of D2 relative to D1
as n = n_2/n_1; recall (8.39). The other way around, suppose that the TM wave
is incident from D2 to D1. Then, the relative refractive index of D1 relative to D2
is n_1/n_2 = n^{-1}. In this situation, the other Brewster angle (from D2 to D1), defined
as θ̃_B, is given by

\[ \tilde{\theta}_B = \tan^{-1}n^{-1}. \tag{8.92} \]

This number is, however, identical to ϕ_B in (8.91). Thus, we have

\[ \tilde{\theta}_B = \phi_B. \tag{8.93} \]

Thus, regarding the TM wave that is propagating in D2 after getting through the
interface and is to get back to D1, θ̃_B = ϕ_B is again the Brewster angle. In this way,
the said TM wave propagates from D1 to D2 and then gets back from D2 to D1
without being reflected by the two interfaces. This conspicuous feature is often
utilized for optical devices.

If an electromagnetic wave is incident from a dielectric of a higher refractive
index to that of a lower index, total reflection can take place. This is equally the case
with both TE and TM waves. For the total reflection to take place, θ should be larger
than the critical angle θ_c that is defined by

\[ \theta_c = \sin^{-1}n. \tag{8.94} \]

This is because at θ_c, from Snell's law, we have

\[ \frac{\sin\theta_c}{\sin\frac{\pi}{2}} = \sin\theta_c = \frac{n_2}{n_1}\ (\equiv n). \tag{8.95} \]

From (8.95), we have

\[ \tan\theta_c = n/\sqrt{1-n^2} > n = \tan\theta_B. \]

In the case of the TM wave, therefore, we find that

\[ \theta_c > \theta_B. \tag{8.96} \]

The critical angle is always larger than the Brewster angle with TM waves.
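Both angles follow directly from (8.91) and (8.94); the sketch below (glass-to-air indices assumed for illustration) confirms θ_c > θ_B:

```python
import math

def brewster_angle(n1: float, n2: float) -> float:
    """Brewster angle theta_B = arctan(n2/n1), cf. (8.91)."""
    return math.atan(n2 / n1)

def critical_angle(n1: float, n2: float) -> float:
    """Critical angle theta_c = arcsin(n2/n1), cf. (8.94); requires n2 < n1."""
    return math.asin(n2 / n1)

n1, n2 = 1.5, 1.0   # assumed glass-to-air interface
print(math.degrees(brewster_angle(n1, n2)))   # ~33.7 degrees
print(math.degrees(critical_angle(n1, n2)))   # ~41.8 degrees: theta_c > theta_B
```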

8.6 Total Reflection

In Sect. 8.2 we saw that Snell's law results from a kinematical requirement. For
this reason, we may consider it as a universal relation that can be extended to
complex refraction angles. In fact, for Snell's law to hold with θ > θ_c, we
must have

\[ \sin\phi > 1. \tag{8.97} \]

This requires us to extend ϕ to a complex domain. Putting

\[ \phi = \frac{\pi}{2} + ia \quad (a:\ \text{real},\ a \neq 0), \tag{8.98} \]

we have

\[ \sin\phi \equiv \frac{1}{2i}\left(e^{i\phi} - e^{-i\phi}\right) = \frac{1}{2}\left(e^{a} + e^{-a}\right) > 1, \tag{8.99} \]

\[ \cos\phi \equiv \frac{1}{2}\left(e^{i\phi} + e^{-i\phi}\right) = -\frac{i}{2}\left(e^{a} - e^{-a}\right). \tag{8.100} \]

Thus, cos ϕ is pure imaginary.

Now, let us consider a transmitted wave whose electric field is described as

\[ \mathbf{E}_t = E\boldsymbol{\varepsilon}_t e^{i(\mathbf{k}_t\cdot\mathbf{x} - \omega t)}, \tag{8.101} \]

where ε_t is the unit polarization vector and k_t is the wavenumber vector of the
transmitted wave. Suppose that the incidence plane is the zx-plane. Then, we have

\[ \mathbf{k}_t\cdot\mathbf{x} = k_{tx}x + k_{tz}z = xk_t\sin\phi + zk_t\cos\phi, \tag{8.102} \]

where k_{tx} and k_{tz} are the x and z components of k_t, respectively; k_t = |k_t|. Putting

\[ \cos\phi = ib \quad (b:\ \text{real},\ b \neq 0), \tag{8.103} \]

we have

\[ \mathbf{E}_t = E\boldsymbol{\varepsilon}_t e^{i(xk_t\sin\phi + ibzk_t - \omega t)} = E\boldsymbol{\varepsilon}_t e^{i(xk_t\sin\phi - \omega t)}e^{-bzk_t}. \tag{8.104} \]

With the total reflection, we must have

\[ z \longrightarrow \infty \implies e^{-bzk_t} \longrightarrow 0. \tag{8.105} \]



To meet this requirement, we have

\[ b > 0. \tag{8.106} \]

Meanwhile, we have

\[ \cos^2\phi = 1 - \sin^2\phi = 1 - \frac{\sin^2\theta}{n^2} = \frac{n^2 - \sin^2\theta}{n^2}, \tag{8.107} \]

where notice that n < 1, because we are dealing with the incidence of light from a
medium with a higher refractive index to one with a lower index. When we consider
the total reflection, the numerator of (8.107) is negative, and so we have two choices
such that

\[ \cos\phi = \pm i\,\frac{\sqrt{\sin^2\theta - n^2}}{n}. \tag{8.108} \]

From (8.103) and (8.106), we get

\[ \cos\phi = i\,\frac{\sqrt{\sin^2\theta - n^2}}{n}. \tag{8.109} \]

Hence, inserting (8.109) into (8.84), we have for the TE wave

\[ R_E^{\perp} = \frac{\cos\theta - i\sqrt{\sin^2\theta - n^2}}{\cos\theta + i\sqrt{\sin^2\theta - n^2}}. \tag{8.110} \]

Then, we have

\[ R^{\perp} = -R_E^{\perp}R_H^{\perp} = R_E^{\perp}\left(R_E^{\perp}\right)^{*} = \frac{\cos\theta - i\sqrt{\sin^2\theta - n^2}}{\cos\theta + i\sqrt{\sin^2\theta - n^2}} \cdot \frac{\cos\theta + i\sqrt{\sin^2\theta - n^2}}{\cos\theta - i\sqrt{\sin^2\theta - n^2}} = 1. \tag{8.111} \]

As for the TM wave, substituting (8.109) for (8.85) we have


pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
k n2 cos θ þ i sin2 θ  n2
RE ¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi : ð8:112Þ
n2 cos θ þ i sin2 θ  n2

In this case, we also get



$$R^{\parallel} = \frac{-n^2\cos\theta + i\sqrt{\sin^2\theta - n^2}}{n^2\cos\theta + i\sqrt{\sin^2\theta - n^2}}\cdot
\frac{-n^2\cos\theta - i\sqrt{\sin^2\theta - n^2}}{n^2\cos\theta - i\sqrt{\sin^2\theta - n^2}} = 1. \tag{8.113}$$

The relations (8.111) and (8.113) ensure that the energy flow gets back to a higher
refractive index medium.
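This unit reflectance is readily verified numerically. The following sketch (an illustration with an assumed n = 1/1.5 and an assumed angle above θc) evaluates (8.110) and (8.112) as complex numbers:

```python
import math

n = 1.0 / 1.5                  # assumed relative index; theta_c is then about 41.8 deg
theta = math.radians(55.0)     # assumed angle in the total reflection region
s = math.sqrt(math.sin(theta)**2 - n**2)

R_TE = (math.cos(theta) - 1j * s) / (math.cos(theta) + 1j * s)                  # (8.110)
R_TM = (-n**2 * math.cos(theta) + 1j * s) / (n**2 * math.cos(theta) + 1j * s)   # (8.112)

# Both coefficients are complex, yet their squared moduli equal 1; cf. (8.111), (8.113).
print(abs(R_TE)**2, abs(R_TM)**2)
```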
Thus, the total reflection is characterized by the complex reflection coefficients
expressed as (8.110) and (8.112) as well as a reflectance of 1. From (8.110) and
(8.112) we can estimate the change in phase of the electromagnetic wave that takes
place by virtue of the total reflection. For this purpose, we put

$$R_E^{\perp} \equiv e^{i\alpha} \quad \text{and} \quad R_E^{\parallel} \equiv e^{i\beta}. \tag{8.114}$$

Rewriting (8.110), we have

$$R_E^{\perp} = \frac{\cos^2\theta - \left(\sin^2\theta - n^2\right) - 2i\cos\theta\sqrt{\sin^2\theta - n^2}}{1 - n^2}. \tag{8.115}$$

At the critical angle θc, from (8.95) we have

$$\sin\theta_c = n. \tag{8.116}$$

Therefore, we have

$$1 - n^2 = \cos^2\theta_c. \tag{8.117}$$

Then, as expected, we get

$$\left. R_E^{\perp}\right|_{\theta=\theta_c} = 1. \tag{8.118}$$

Note, however, that at θ = π/2 (i.e., grazing incidence) we have

$$\left. R_E^{\perp}\right|_{\theta=\pi/2} = -1. \tag{8.119}$$

From (8.115), the argument α in the complex plane is given by

$$\tan\alpha = \frac{-2\cos\theta\sqrt{\sin^2\theta - n^2}}{\cos^2\theta - \left(\sin^2\theta - n^2\right)}. \tag{8.120}$$

The argument α defines the phase shift upon the total reflection. Considering (8.115)
and (8.118), we have
$$\left.\alpha\right|_{\theta=\theta_c} = 0.$$

Fig. 8.8 Phase shift α defined in a complex plane for the total reflection of the TE wave. The
number n denotes a relative refractive index of D2 relative to D1. At the critical angle θc, α = 0

Since 1 − n² > 0 (i.e., n < 1) and in the total reflection region sin²θ − n² > 0, the
imaginary part of $R_E^{\perp}$ is negative for any θ in that region (i.e., θc to π/2). On the other hand, the real
part of $R_E^{\perp}$ varies from 1 to −1, as is evidenced from (8.118) and (8.119). At θ0 that
satisfies the following condition:

$$\sin\theta_0 = \sqrt{\frac{1 + n^2}{2}}, \tag{8.121}$$

the real part is zero. Thus, the phase shift α varies from 0 to −π as indicated in
Fig. 8.8. Comparing (8.121) with (8.116) and taking into account n < 1, we have

$$\theta_c < \theta_0 < \pi/2.$$

Similarly, we estimate the phase change for a TM wave. Rewriting (8.112), we
have

$$R_E^{\parallel} = \frac{-n^4\cos^2\theta + \sin^2\theta - n^2 + 2in^2\cos\theta\sqrt{\sin^2\theta - n^2}}{n^4\cos^2\theta + \sin^2\theta - n^2}
= \frac{-n^4\cos^2\theta + \sin^2\theta - n^2 + 2in^2\cos\theta\sqrt{\sin^2\theta - n^2}}{\left(1 - n^2\right)\left(\sin^2\theta - n^2\cos^2\theta\right)}. \tag{8.122}$$

Then, we have

$$\left. R_E^{\parallel}\right|_{\theta=\theta_c} = -1. \tag{8.123}$$

Also at θ = π/2 (i.e., grazing incidence) we have

$$\left. R_E^{\parallel}\right|_{\theta=\pi/2} = 1. \tag{8.124}$$

From (8.122), the argument β is given by

$$\tan\beta = \frac{2n^2\cos\theta\sqrt{\sin^2\theta - n^2}}{-n^4\cos^2\theta + \sin^2\theta - n^2}. \tag{8.125}$$

Considering (8.122) and (8.123), we have

$$\left.\beta\right|_{\theta=\theta_c} = \pi. \tag{8.126}$$

In the total reflection region we have

$$\sin^2\theta - n^2\cos^2\theta > n^2 - n^2\cos^2\theta = n^2\left(1 - \cos^2\theta\right) > 0. \tag{8.127}$$

Therefore, the denominator of (8.122) is positive and, hence, the imaginary part of
$R_E^{\parallel}$ is positive as well for any θ in that region (i.e., θc to π/2). From (8.123) and (8.124), on the other
hand, the real part of $R_E^{\parallel}$ in (8.122) varies from −1 to 1. At $\tilde{\theta}_0$ that satisfies the
following condition:

$$\cos\tilde{\theta}_0 = \sqrt{\frac{1 - n^2}{1 + n^4}}, \tag{8.128}$$

the real part of $R_E^{\parallel}$ is zero. Once again, we have

$$\theta_c < \tilde{\theta}_0 < \pi/2. \tag{8.129}$$

Thus, the phase β varies from π to 0 as depicted in Fig. 8.9.
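The behavior of the two phase shifts can also be traced numerically. The sketch below (an illustration with an assumed n = 1/1.5) evaluates the arguments of (8.110) and (8.112) across the total reflection region; α runs from 0 to −π and β from π to 0, in accordance with Figs. 8.8 and 8.9:

```python
import cmath, math

n = 1.0 / 1.5                          # assumed relative index
theta_c = math.asin(n)

for deg in (math.degrees(theta_c), 50.0, 60.0, 75.0, 89.9):
    th = math.radians(deg)
    s = math.sqrt(max(math.sin(th)**2 - n**2, 0.0))
    R_TE = (math.cos(th) - 1j * s) / (math.cos(th) + 1j * s)
    R_TM = (-n**2 * math.cos(th) + 1j * s) / (n**2 * math.cos(th) + 1j * s)
    # cmath.phase returns the argument of a complex number in (-pi, pi]
    print(f"theta = {deg:5.1f} deg: alpha = {cmath.phase(R_TE):+.3f} rad,"
          f" beta = {cmath.phase(R_TM):+.3f} rad")
```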


In this section, we mentioned somewhat peculiar features of complex trigono-
metric functions, such as sin ϕ > 1, viewed in light of real functions. As a matter of
course, the complex angle ϕ should be determined experimentally from (8.109). In this
context, readers are referred to Chap. 6, which dealt with the theory of analytic functions [2].

Fig. 8.9 Phase shift β for the total reflection of the TM wave. At the critical angle θc, β = π

8.7 Waveguide Applications

There are many optical devices based upon light propagation. Among them, wave-
guide devices utilize the total reflection. We explain their operation principle.
Suppose that we have a thin plate (usually called a slab) comprising a
dielectric medium that spreads infinitely in two dimensions and that the plate is
sandwiched by another dielectric (or maybe air or vacuum) or metal. In this
2

situation electromagnetic waves are confined within the slab. Moreover, only under
restricted conditions are those waves allowed to propagate in parallel to the slab
plane. Such electromagnetic waves are usually called propagation modes or simply
"modes." An optical device thus designed is called a waveguide. These modes are
characterized by repeated total reflection during the propagation. Another mode is the
evanescent mode. Because of the total reflection, the energy transport is not allowed
to take place vertically to the interface of two dielectrics. For this reason, the
evanescent mode is thought to be propagated in parallel to the interface, very close
to it.

8.7.1 TE and TM Waves in a Waveguide

In a waveguide configuration, propagating waves are classified into TE and TM


modes. Quality of materials that constitute a waveguide largely governs the propa-
gation modes within the waveguide.
Figure 8.10 depicts a cross-section of a slab waveguide. We assume that the
electromagnetic wave is propagated toward the positive direction of the z-axis and
that a waveguide infinitely spreads toward the z- and x-axes. Suppose that the said
waveguide is spatially confined toward the y-axis. Let the thickness of the wave-
guide be d. From a point of view of material that shapes a waveguide, waveguides
are classified into two types. (i) Electromagnetic waves are completely confined
within the waveguide layer. This case typically happens when a dielectric forming
the waveguide is sandwiched between a couple of metal layers (Fig. 8.10a). This is
because the electromagnetic wave is not allowed to exist or propagate inside the
metal.
Fig. 8.10 Cross-section of a slab waveguide comprising a dielectric medium. (a) A waveguide is
sandwiched between a couple of metal layers. (b) A waveguide is sandwiched between a couple of
layers consisting of another dielectric called the clad layer. The sandwiched layer is called the core
layer

(ii) Electromagnetic waves are not completely confined within the waveguide.
This case happens when the dielectric of the waveguide is sandwiched by a couple of
other dielectrics. We distinguish this case, the total internal reflection, from the
above case (i). We further describe it in Sect. 8.7.2. For the total internal reflection to
take place, the refractive index of the waveguide must be higher than those of other
dielectrics (Fig. 8.10b). The dielectric of the waveguide is called core layer and the
other dielectric is called clad layer. In this case, electromagnetic waves are allowed
to propagate inside of the clad layer, even though the region is confined very close to
the interface between the clad and core layers. Such electromagnetic waves are said
to be evanescent waves. According to these two cases (i) and (ii), we have different
conditions under which the allowed modes can exist.
Now, let us return to Maxwell's equations. We have introduced the equations of
wave motion (7.35) and (7.36) from Maxwell's equations (7.28) and (7.29) along with
(7.7) and (7.10). One of their simplest solutions is a plane wave described by (7.53).
The plane wave is characterized by the fact that the wave has the same phase on an infinitely
spreading plane perpendicular to the propagation direction (characterized by a
wavenumber vector k). In a waveguide, however, the electromagnetic field is
confined with respect to the direction parallel to the normal to the slab plane (i.e.,
the direction of the y-axis in Fig. 8.10). Consequently, the electromagnetic field can
no longer have the same phase in that direction. Yet, as solutions of the equations of
wave motion, we can have a solution that has the same phase in the direction of the
x-axis. Bearing in mind such a situation, let us think of Maxwell's equations in
relation to the equations of wave motion.
Ignoring components related to partial differentiation with respect to x (i.e., the
components related to ∂/∂x) and rewriting (7.28) and (7.29) for the individual Cartesian
coordinates, we have [3]

$$\frac{\partial E_z}{\partial y} - \frac{\partial E_y}{\partial z} + \frac{\partial B_x}{\partial t} = 0, \tag{8.130}$$

$$\frac{\partial E_x}{\partial z} + \frac{\partial B_y}{\partial t} = 0, \tag{8.131}$$

$$-\frac{\partial E_x}{\partial y} + \frac{\partial B_z}{\partial t} = 0, \tag{8.132}$$

$$\frac{\partial H_z}{\partial y} - \frac{\partial H_y}{\partial z} - \frac{\partial D_x}{\partial t} = 0, \tag{8.133}$$

$$\frac{\partial H_x}{\partial z} - \frac{\partial D_y}{\partial t} = 0, \tag{8.134}$$

$$-\frac{\partial H_x}{\partial y} - \frac{\partial D_z}{\partial t} = 0. \tag{8.135}$$

Of the above equations, we collect those pertinent to Ex and differentiate (8.131),
(8.132), and (8.133) with respect to z, y, and t, respectively, to get

$$\frac{\partial^2 E_x}{\partial z^2} + \frac{\partial^2 B_y}{\partial z\,\partial t} = 0, \tag{8.136}$$

$$-\frac{\partial^2 E_x}{\partial y^2} + \frac{\partial^2 B_z}{\partial y\,\partial t} = 0, \tag{8.137}$$

$$\frac{\partial^2 H_z}{\partial t\,\partial y} - \frac{\partial^2 H_y}{\partial t\,\partial z} - \frac{\partial^2 D_x}{\partial t^2} = 0. \tag{8.138}$$

Multiplying (8.138) by μ, further combining it with (8.136) and (8.137), and using (7.7),
we get

$$\frac{\partial^2 E_x}{\partial y^2} + \frac{\partial^2 E_x}{\partial z^2} = \mu\varepsilon\,\frac{\partial^2 E_x}{\partial t^2}. \tag{8.139}$$

This is a two-dimensional equation of wave motion. In a similar manner, from
(8.130), (8.134), and (8.135), we have for the magnetic field

$$\frac{\partial^2 H_x}{\partial y^2} + \frac{\partial^2 H_x}{\partial z^2} = \mu\varepsilon\,\frac{\partial^2 H_x}{\partial t^2}. \tag{8.140}$$

Equations (8.139) and (8.140) are two-dimensional wave equations with respect
to the y- and z-coordinates. In the direction of the x-axis, a propagating wave has
the same phase. Suppose that we have plane wave solutions for them as in the case of
(7.58) and (7.59). Then, we have

$$\boldsymbol{E} = \boldsymbol{E}_0 e^{i(\boldsymbol{k}\cdot\boldsymbol{x} - \omega t)} = \boldsymbol{E}_0 e^{i(k\boldsymbol{n}\cdot\boldsymbol{x} - \omega t)},\quad
\boldsymbol{H} = \boldsymbol{H}_0 e^{i(\boldsymbol{k}\cdot\boldsymbol{x} - \omega t)} = \boldsymbol{H}_0 e^{i(k\boldsymbol{n}\cdot\boldsymbol{x} - \omega t)}. \tag{8.141}$$

Note that a plane wave expressed by (7.58) and (7.59) is propagated uniformly in a
dielectric medium. In a waveguide, on the other hand, the electromagnetic waves
undergo repeated (total) reflections from the two boundaries positioned on either side of
the waveguide, while being propagated.

In a three-dimensional version, the wavenumber vector has three components kx,
ky, and kz as expressed in (7.48). In (8.141), in turn, k has y- and z-components such
that

$$k^2 \equiv |\boldsymbol{k}|^2 = k_y^2 + k_z^2. \tag{8.142}$$

Equations (8.139) and (8.140) can be rewritten as

$$\frac{\partial^2 E_x}{\partial(\pm y)^2} + \frac{\partial^2 E_x}{\partial(\pm z)^2} = \mu\varepsilon\,\frac{\partial^2 E_x}{\partial(\pm t)^2},\quad
\frac{\partial^2 H_x}{\partial(\pm y)^2} + \frac{\partial^2 H_x}{\partial(\pm z)^2} = \mu\varepsilon\,\frac{\partial^2 H_x}{\partial(\pm t)^2}.$$

Accordingly, we have four wavenumber vector components

$$k_y = \pm|k_y| \quad \text{and} \quad k_z = \pm|k_z|.$$

Figure 8.11 indicates this situation, where an electromagnetic wave can be propa-
gated within a slab waveguide in any one direction out of four choices of k. In this
section, we assume that the electromagnetic wave is propagated toward the positive
direction of the z-axis, and so we define kz as positive. On the other hand, ky can be
either positive or negative. Thus, we get

$$k_z = k\sin\theta \quad \text{and} \quad k_y = \pm k\cos\theta. \tag{8.143}$$

Fig. 8.11 Four possible propagation directions k of an electromagnetic wave in a slab waveguide

Fig. 8.12 Geometries and propagation of the electromagnetic waves in a slab waveguide

Figure 8.12 shows the geometries of the electromagnetic waves within the slab
waveguide. The slab plane is parallel to the zx-plane. Let the positions of the two
interfaces of the slab waveguide be

$$y = 0 \quad \text{and} \quad y = d. \tag{8.144}$$

That is, we assume that the thickness of the waveguide is d.


Since (8.139) describes a wave equation for only one component Ex, (8.139) is
suited for representing a TE wave. In (8.140), in turn, a wave equation is given only
for Hx, and hence, it is suited for representing a TM wave. With the TE wave the
electric field oscillates parallel to the slab plane and vertical to the propagation
direction. With the TM wave, in turn, the magnetic field oscillates parallel to the slab
plane and vertical to the propagation direction. In a general case, electromagnetic
waves in a slab waveguide are formed by superposition of TE and TM waves. Notice
that Fig. 8.12 is applicable to both TE and TM waves.
Let us further proceed with the waveguide analysis. The electric field E within the
waveguide is described by superposition of the incident and reflected waves. Using
the first equation of (8.141) and (8.143), we have

$$\boldsymbol{E}(z, y) = \boldsymbol{\varepsilon}_e E_+ e^{i(kz\sin\theta + ky\cos\theta - \omega t)} + \boldsymbol{\varepsilon}_e{}' E_- e^{i(kz\sin\theta - ky\cos\theta - \omega t)}, \tag{8.145}$$

where E₊ (E₋) and ε_e (ε_e′) represent the amplitude and unit polarization vector of the
incident (reflected) wave, respectively. The vector ε_e (or ε_e′) is defined in (7.67).
Equation (8.145) is common to both the cases of TE and TM waves. From now,
we consider the TE mode case. Suppose that the slab waveguide is sandwiched between
a couple of metal sheets of high conductance. Since the electric field must be absent
inside the metal, the electric field at the interface must be zero owing to the
continuity condition for the tangential component of the electric field. Thus, we require
that the following condition be met with (8.145):
$$\boldsymbol{t}\cdot\boldsymbol{E}(z, 0) = 0 = \boldsymbol{t}\cdot\boldsymbol{\varepsilon}_e E_+ e^{i(kz\sin\theta - \omega t)} + \boldsymbol{t}\cdot\boldsymbol{\varepsilon}_e{}' E_- e^{i(kz\sin\theta - \omega t)}
= \left(\boldsymbol{t}\cdot\boldsymbol{\varepsilon}_e E_+ + \boldsymbol{t}\cdot\boldsymbol{\varepsilon}_e{}' E_-\right)e^{i(kz\sin\theta - \omega t)}. \tag{8.146}$$

Therefore, since $e^{i(kz\sin\theta - \omega t)}$ never vanishes, we have

$$\boldsymbol{t}\cdot\boldsymbol{\varepsilon}_e E_+ + \boldsymbol{t}\cdot\boldsymbol{\varepsilon}_e{}' E_- = 0, \tag{8.147}$$

where t is a tangential unit vector at the interface.


Since E is polarized along the x-axis, setting ε_e = ε_e′ = e₁ and taking t as e₁, we
get

$$E_+ + E_- = 0.$$

This means that the reflection coefficient of the electric field is −1. Denoting
E₊ = −E₋ ≡ E₀ (>0), we have

$$\begin{aligned}
\boldsymbol{E} &= \boldsymbol{e}_1 E_0\left[e^{i(kz\sin\theta + ky\cos\theta - \omega t)} - e^{i(kz\sin\theta - ky\cos\theta - \omega t)}\right]\\
&= \boldsymbol{e}_1 E_0\left(e^{iky\cos\theta} - e^{-iky\cos\theta}\right)e^{i(kz\sin\theta - \omega t)}\\
&= \boldsymbol{e}_1 2iE_0\sin(ky\cos\theta)\,e^{i(kz\sin\theta - \omega t)}. \tag{8.148}
\end{aligned}$$

Requiring the electric field to vanish at the other interface y = d, we have

$$\boldsymbol{E}(z, d) = 0 = \boldsymbol{e}_1 2iE_0\sin(kd\cos\theta)\,e^{i(kz\sin\theta - \omega t)}.$$

Note that in terms of the boundary conditions we are thinking of the Dirichlet conditions
(see Sects. 1.3 and 10.3). In this case, we have nodes for the electric field at the
interface between the metal and the dielectric. For this condition to be satisfied, we must
have

$$kd\cos\theta = m\pi \quad (m = 1, 2, \cdots). \tag{8.149}$$

From (8.149), we have the following condition for m:

$$m \le kd/\pi. \tag{8.150}$$

Meanwhile, we have

$$k = nk_0, \tag{8.151}$$

where n is the refractive index of the dielectric that shapes the slab waveguide; the
quantity k0 is the wavenumber of the electromagnetic wave in vacuum. The index n is
given by

$$n = c/v, \tag{8.152}$$

where c and v are the light velocities in vacuum and in the dielectric medium, respectively.
Here v is meant as a velocity in an infinitely spreading dielectric. Thus, θ is allowed
to take several (or more) values depending upon k, d, and m.
Since in the z-direction no specific boundary conditions are imposed, we have
propagating modes in that direction characterized by a propagation constant (vide
infra). Looking at (8.148), we notice that k sin θ plays the role of a wavenumber in a
free space. For this reason, the quantity β defined as

$$\beta = k\sin\theta = nk_0\sin\theta \tag{8.153}$$

is said to be a propagation constant. In (8.153), k0 is the wavenumber in vacuum. From
(8.149) and (8.153), we get

$$\beta = \left(k^2 - \frac{m^2\pi^2}{d^2}\right)^{1/2}. \tag{8.154}$$

Thus, the allowed TE waves indexed by m are called TE modes and represented as
TEm. The phase velocity vp is given by

$$v_p = \omega/\beta. \tag{8.155}$$

Meanwhile, the group velocity vg is given by

$$v_g = \frac{d\omega}{d\beta} = \left(\frac{d\beta}{d\omega}\right)^{-1}. \tag{8.156}$$

Using (8.154) and noting that k² = ω²/v², we get

$$v_g = v^2\beta/\omega. \tag{8.157}$$

Thus, we have

$$v_p v_g = v^2. \tag{8.158}$$

Note that in (1.22) of Sect. 1.1 we saw a relationship similar to (8.158).
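A small numerical sketch makes the mode structure of (8.149)-(8.158) concrete; the wavelength, index, and thickness below are assumed example values. It lists the allowed TEm modes of a metal-clad slab and confirms the relation vp·vg = v²:

```python
import math

# Assumed example values: vacuum wavelength 1.0 um, core index 1.5,
# slab thickness 1.8 um, metal cladding on both sides.
lam0, n, d = 1.0e-6, 1.5, 1.8e-6
c0 = 2.99792458e8
k = n * 2 * math.pi / lam0        # wavenumber in the dielectric, (8.151)
v = c0 / n                        # light velocity in the dielectric, (8.152)
omega = v * k

m_max = int(k * d / math.pi)      # bound on the mode number, (8.150)
for m in range(1, m_max + 1):
    beta = math.sqrt(k**2 - (m * math.pi / d)**2)   # propagation constant, (8.154)
    vp, vg = omega / beta, v**2 * beta / omega      # (8.155), (8.157)
    print(f"TE{m}: beta = {beta:.4e} m^-1, vp*vg/v^2 = {vp * vg / v**2:.6f}")
```

With these assumed parameters five TE modes are allowed, and the printed ratio equals 1 for every mode, as (8.158) requires.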


The characteristics of TM waves can be analyzed in a similar manner by exam-
ining the magnetic field Hx. In that case, the reflection coefficient of the magnetic
field is +1 and we have antinodes for the magnetic field at the interface.
Concomitantly, we adopt the Neumann conditions as the boundary conditions (see
Sects. 1.3 and 8.3). Regardless of the difference in the boundary conditions, how-
ever, the discussion from (8.149) to (8.158) applies to the analysis of TM waves.
Once Hx is determined, Ey and Ez can be determined as well from (8.134) and
(8.135).

8.7.2 Total Internal Reflection and Evanescent Waves

If a slab waveguide shaped by a dielectric is sandwiched by a couple of layers of another
dielectric (Fig. 8.10b), the situation differs from the metal waveguide (Fig. 8.10a) we
encountered in Sect. 8.7.1. Suppose in Fig. 8.10b that the former dielectric D1 of a
refractive index n1 is sandwiched by the latter dielectric D2 of a refractive index n2.
Suppose that an electromagnetic wave is being propagated from D1 toward D2.
Then, we must have

$$n_1 > n_2 \tag{8.159}$$

so that the total internal reflection can take place at the interface of D1 and D2. In this
case, the dielectrics D1 and D2 act as a core layer and a clad layer, respectively.
The biggest difference between the present waveguide and the previous one is
that unlike the previous case, the total internal reflection occurs in the present case.
Concomitantly, an evanescent wave is present in the clad layer very close to the
interface.
First, let us estimate the conditions that must be satisfied so that an electromagnetic
wave can be propagated within a waveguide. Figure 8.13 depicts a cross-section of
the waveguide where the light is propagated in the direction of k. In Fig. 8.13,
suppose that we have a normal N to the plane of paper at P. Then, N and a straight
line XY shape a plane NXY. Also suppose that a dielectric fills a semi-infinite space
situated below NXY. Further suppose that there is another virtual plane N′X′Y′ that
is parallel with NXY as shown. Here N′ is parallel to N. The separation of the two
parallel planes is d. We need the virtual plane N′X′Y′ just to estimate an optical path
difference (or phase difference, more specifically) between two waves, i.e., a prop-
agating wave and a reflected wave.
Let n be a unit vector in the direction of k; i.e.,

$$\boldsymbol{n} = \boldsymbol{k}/|\boldsymbol{k}| = \boldsymbol{k}/k. \tag{8.160}$$

Then the electromagnetic wave is described as

$$\boldsymbol{E} = \boldsymbol{E}_0 e^{i(\boldsymbol{k}\cdot\boldsymbol{x} - \omega t)} = \boldsymbol{E}_0 e^{i(k\boldsymbol{n}\cdot\boldsymbol{x} - \omega t)}. \tag{8.161}$$

Fig. 8.13 Cross-section of the waveguide where the light is propagated in the direction of k. A
dielectric fills a semi-infinite space situated below NXY. We suppose another virtual plane N′X′Y′
that is parallel with NXY. We need the plane N′X′Y′ to estimate an optical path difference (or phase
difference)

Suppose that we take a coordinate system such that

$$\boldsymbol{x} = r\boldsymbol{n} + s\boldsymbol{u} + t\boldsymbol{v}, \tag{8.162}$$

where u and v represent unit vectors in directions perpendicular to n. Then (8.161)
can be expressed by

$$\boldsymbol{E} = \boldsymbol{E}_0 e^{i(kr - \omega t)}. \tag{8.163}$$

Suppose that the wave is propagated starting from a point A to P and reflected at
P. Then, the wave is further propagated to B and reflected again to reach Q. The
wave front is originally at AB and finally at PQ. Thus, the Z-shaped optical path
length APBQ is equal to the separation between A′B′ and PQ. Notice that the
separation between A′B′ and P′Q′ is taken so that it is equal to that between AB
and PQ. The geometry of Fig. 8.13 implies that two waves that start from AB and
A′B′ at the same time reach PQ at the same time.
We find the separation between AB and A′B′ is

$$2d\cos\theta.$$

Let us tentatively call these waves Wave-AB and Wave-A′B′ and describe their
electric fields as $\boldsymbol{E}_{AB}$ and $\boldsymbol{E}_{A'B'}$, respectively. Then, we denote

$$\boldsymbol{E}_{AB} = \boldsymbol{E}_0 e^{i(kr - \omega t)}, \tag{8.164}$$

$$\boldsymbol{E}_{A'B'} = \boldsymbol{E}_0 e^{i[k(r + 2d\cos\theta) - \omega t]}, \tag{8.165}$$

where k is the wavenumber in the dielectric. Note that since $\boldsymbol{E}_{A'B'}$ lags behind $\boldsymbol{E}_{AB}$, a
plus sign appears in the first term of the exponent. Therefore, the phase difference
between the two waves is

$$2kd\cos\theta. \tag{8.166}$$



Now, let us come back to the actual geometry of the waveguide. That is, the core
layer of thickness d is sandwiched by a couple of clad layers (Fig. 8.10b). In this
situation, the wave $\boldsymbol{E}_{AB}$ experiences the total internal reflection two times, which we
ignored in the above discussion of the metal waveguide. Since the total internal
reflection causes a complex phase shift, we have to take account of this effect. The
phase shift was defined as α of (8.114) for a TE mode and β for a TM mode. Notice
that in Fig. 8.13 the electric field oscillates perpendicularly to the plane of paper with
the TE mode, whereas it oscillates in parallel with the plane of paper with the TM
mode. In both cases the electric field oscillates perpendicularly to n. Conse-
quently, the phase shift due to these reflections has to be added to (8.166). Thus, for
the phase commensuration to be obtained, the following condition must be satisfied:

$$2kd\cos\theta + 2\delta = 2l\pi \quad (l = 0, 1, 2, \cdots), \tag{8.167}$$

where δ is either δTE or δTM defined below, according to the case of the TE wave and
TM wave, respectively. For a practical purpose, (8.167) is dealt with by a numerical
calculation, e.g., to design an optical waveguide.
Unlike (8.149), what is most important with (8.167) is that the condition l = 0
is permitted because of δ < 0 (see just below).
For convenience and according to custom, we adopt a phase shift notation
other than that defined in (8.114). With the TE mode, the phase is retained upon
reflection at the critical angle, and so we identify α with an additional component
δTE. In the TM case, on the other hand, the phase is reversed upon reflection at the
critical angle (i.e., a π shift occurs). Since this π shift has been incorporated into β, it
suffices to consider only an additional component δTM. That is, we have

$$\delta_{TE} \equiv \alpha \quad \text{and} \quad \delta_{TM} \equiv \beta - \pi. \tag{8.168}$$

We rewrite (8.110) as

$$R_E^{\perp} = e^{i\alpha} = e^{i\delta_{TE}} = \frac{ae^{-i\sigma}}{ae^{i\sigma}} = e^{-2i\sigma}$$

and

$$\delta_{TE} = -2\sigma \quad (\sigma > 0), \tag{8.169}$$

where we have

$$ae^{i\sigma} = \cos\theta + i\sqrt{\sin^2\theta - n^2}. \tag{8.170}$$

Therefore,

$$\tan\sigma = -\tan\frac{\delta_{TE}}{2} = \frac{\sqrt{\sin^2\theta - n^2}}{\cos\theta}. \tag{8.171}$$

Meanwhile, rewriting (8.112) we have

$$R_E^{\parallel} = e^{i\beta} = e^{i(\delta_{TM} + \pi)} = -\frac{be^{-i\tau}}{be^{i\tau}} = -e^{-2i\tau} = e^{i(-2\tau + \pi)}, \tag{8.172}$$

where

$$be^{i\tau} = n^2\cos\theta + i\sqrt{\sin^2\theta - n^2}. \tag{8.173}$$

Note that the minus sign with the second last equality in (8.172) is due to the phase
reversal upon reflection. From (8.172), we may put

$$\beta = -2\tau + \pi.$$

Comparing this with the second equation of (8.168), we get

$$\delta_{TM} = -2\tau \quad (\tau > 0). \tag{8.174}$$

Consequently, we get

$$\tan\tau = -\tan\frac{\delta_{TM}}{2} = \frac{\sqrt{\sin^2\theta - n^2}}{n^2\cos\theta}. \tag{8.175}$$

Finally, the additional phase changes δTE and δTM upon the total reflection are given
by [4]

$$\delta_{TE} = -2\tan^{-1}\frac{\sqrt{\sin^2\theta - n^2}}{\cos\theta} \quad \text{and} \quad
\delta_{TM} = -2\tan^{-1}\frac{\sqrt{\sin^2\theta - n^2}}{n^2\cos\theta}. \tag{8.176}$$

We emphasize that in (8.176) both δTE and δTM are negative quantities. This phase
shift has to be included in (8.167) as a negative quantity δ. At first glance, (8.176)
seems to differ largely from (8.120) and (8.125). Nevertheless, noting the trigono-
metric formula

$$\tan 2x = \frac{2\tan x}{1 - \tan^2 x}$$

and remembering that δTM in (8.168) includes π arising from the phase reversal, we
find that both the relations are virtually identical.
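For a concrete feel of how (8.167) is "dealt with by a numerical calculation," the following sketch solves 2kd cos θ + 2δTE = 2lπ for the guided TE modes by bisection, using δTE of (8.176); the indices, wavelength, and thickness are assumed example values:

```python
import math

# Assumed example: core index 1.50, clad index 1.45, vacuum wavelength 1.0 um,
# core thickness 3.0 um.
n1, n2, lam0, d = 1.50, 1.45, 1.0e-6, 3.0e-6
n = n2 / n1
k = n1 * 2 * math.pi / lam0
theta_c = math.asin(n)

def delta_TE(th):                  # phase shift upon total reflection, (8.176)
    return -2.0 * math.atan(math.sqrt(math.sin(th)**2 - n**2) / math.cos(th))

def f(th, l):                      # left side minus right side of (8.167)
    return 2 * k * d * math.cos(th) + 2 * delta_TE(th) - 2 * l * math.pi

l = 0
while True:
    lo, hi = theta_c + 1e-9, math.pi / 2 - 1e-9
    if f(lo, l) * f(hi, l) > 0:    # no sign change: no further guided mode
        break
    for _ in range(60):            # bisection; f decreases monotonically in theta
        mid = 0.5 * (lo + hi)
        if f(lo, l) * f(mid, l) <= 0:
            hi = mid
        else:
            lo = mid
    print(f"TE_{l}: theta = {math.degrees(lo):.3f} deg")
    l += 1
```

With these assumed parameters, three guided TE modes (l = 0, 1, 2) are found. Note that, unlike (8.149), the l = 0 solution exists precisely because δTE < 0.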
Evanescent waves are drawing large attention in the fields of basic physics and
applied device physics. If the total internal reflection is absent, ϕ is real. But, under
the total internal reflection, ϕ is complex and cos ϕ is pure imaginary. The electric field of the evanescent
wave is described as

$$\boldsymbol{E}_t = E\boldsymbol{\varepsilon}_t e^{i(k_t z\sin\phi + k_t y\cos\phi - \omega t)} = E\boldsymbol{\varepsilon}_t e^{i(k_t z\sin\phi + k_t y\,ib - \omega t)}
= E\boldsymbol{\varepsilon}_t e^{i(k_t z\sin\phi - \omega t)}e^{-k_t yb}. \tag{8.177}$$

In (8.177), the unit polarization vector ε_t is either perpendicular to the plane of paper of
Fig. 8.13 (the TE case) or parallel to it (the TM case). Notice that the coordinate
system is different from that of (8.104). The quantity k_t sin ϕ is the propagation
constant. Let $v_p^{(s)}$ and $v_p^{(e)}$ be the phase velocity of the electromagnetic wave in the slab
waveguide (i.e., core layer) and of the evanescent wave in the clad layer, respectively.
Then, in virtue of Snell's law we have

$$v_1 < v_p^{(s)} = \frac{v_1}{\sin\theta} = \frac{\omega}{k\sin\theta} = \frac{\omega}{k_t\sin\phi} = v_p^{(e)} = \frac{v_2}{\sin\phi} < v_2, \tag{8.178}$$

where v1 and v2 are the light velocities in a free space filled by the dielectric D1 and D2,
respectively. For this, we used the relation described as

$$\omega = v_1 k = v_2 k_t. \tag{8.179}$$

We also used Snell's law with the third equality. Notice that sin ϕ > 1 in the
evanescent region and that k_t sin ϕ is a propagation constant in the clad layer. Also
note that $v_p^{(s)}$ is equal to $v_p^{(e)}$ and that these phase velocities are in between the two
velocities of the free space. Thus, the evanescent waves must be present, accompa-
nying the propagating waves that undergo the total internal reflections in a slab
waveguide.
As remarked in (6.105), the electric field of evanescent waves decays exponen-
tially with increasing distance from the core-clad interface. This implies that the evanescent waves exist only in the clad
layer very close to the interface of the core and clad layers.
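The chain of equalities and inequalities in (8.178) can be checked with a few lines of Python; the indices and the angle θ > θc below are assumed example values:

```python
import math

c0 = 2.99792458e8
n1, n2 = 1.50, 1.45              # assumed core and clad indices
v1, v2 = c0 / n1, c0 / n2
theta = math.radians(80.0)       # assumed angle above theta_c = asin(n2/n1), ~75.2 deg

vp_core = v1 / math.sin(theta)            # phase velocity along z in the core layer
sin_phi = (n1 / n2) * math.sin(theta)     # Snell's law continued into sin(phi) > 1
vp_evan = v2 / sin_phi                    # phase velocity of the evanescent wave

print(f"v1 = {v1:.4e}  <  vp = {vp_core:.4e}  <  v2 = {v2:.4e}")
assert v1 < vp_core < v2
assert math.isclose(vp_core, vp_evan)     # the two phase velocities coincide, (8.178)
```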

8.8 Stationary Waves

So far, we have been dealing with propagating waves either in a free space or in a
waveguide. If the dielectric shaping the waveguide is confined in another direction,
the propagating waves show specific properties. Examples include optical fibers.
In this section we consider a situation where the electromagnetic wave is prop-
agating in a dielectric medium and reflected by a “wall” formed by metal or another
dielectric. In such a situation, the original wave (i.e., a forward wave) causes
interference with the backward wave and a stationary wave is formed as a conse-
quence of the interference.

To approach this issue, we deal with the superposition of two waves that have
different phases and different amplitudes. To generalize the problem, let ψ1 and
ψ2 be two cosine functions described as

$$\psi_1 = a_1\cos b_1 \quad \text{and} \quad \psi_2 = a_2\cos b_2. \tag{8.180}$$

Their addition is expressed as

$$\psi = \psi_1 + \psi_2 = a_1\cos b_1 + a_2\cos b_2. \tag{8.181}$$

Here we wish to unify (8.181) into a single cosine (or sine) function. To this end, we
modify the description of ψ2 such that

$$\begin{aligned}
\psi_2 &= a_2\cos[b_1 + (b_2 - b_1)]\\
&= a_2[\cos b_1\cos(b_2 - b_1) - \sin b_1\sin(b_2 - b_1)]\\
&= a_2[\cos b_1\cos(b_1 - b_2) + \sin b_1\sin(b_1 - b_2)]. \tag{8.182}
\end{aligned}$$

Then, the addition is described by

$$\psi = \psi_1 + \psi_2 = [a_1 + a_2\cos(b_1 - b_2)]\cos b_1 + a_2\sin(b_1 - b_2)\sin b_1. \tag{8.183}$$

Putting R such that

$$R = \sqrt{[a_1 + a_2\cos(b_1 - b_2)]^2 + a_2{}^2\sin^2(b_1 - b_2)}
= \sqrt{a_1{}^2 + a_2{}^2 + 2a_1a_2\cos(b_1 - b_2)}, \tag{8.184}$$

we get

$$\psi = R\cos(b_1 - \theta), \tag{8.185}$$

where θ is expressed by

$$\tan\theta = \frac{a_2\sin(b_1 - b_2)}{a_1 + a_2\cos(b_1 - b_2)}. \tag{8.186}$$

Figure 8.14 represents a geometrical diagram in relation to the superposition of two


waves having different amplitudes (a1 and a2) and different phases (b1 and b2) [5].
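The reduction (8.184)-(8.186) to a single cosine is purely trigonometric and can be spot-checked numerically; the following sketch compares both sides of (8.185) for randomly chosen amplitudes and phases (the ranges are arbitrary assumptions):

```python
import math, random

random.seed(1)
for _ in range(5):
    a1, a2 = random.uniform(0.5, 2.0), random.uniform(0.1, 1.5)
    b1, b2 = random.uniform(0.0, 2 * math.pi), random.uniform(0.0, 2 * math.pi)
    R = math.sqrt(a1**2 + a2**2 + 2 * a1 * a2 * math.cos(b1 - b2))              # (8.184)
    theta = math.atan2(a2 * math.sin(b1 - b2), a1 + a2 * math.cos(b1 - b2))     # (8.186)
    lhs = a1 * math.cos(b1) + a2 * math.cos(b2)                                 # (8.181)
    rhs = R * math.cos(b1 - theta)                                              # (8.185)
    assert abs(lhs - rhs) < 1e-12
print("(8.185) reproduces psi1 + psi2 in all trials")
```

Here atan2 is used instead of inverting tan θ so that θ lands in the quadrant implied by (8.183).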
To apply (8.185) to the superposition of two electromagnetic waves that are
propagating forward and backward to collide head-on with each other, we change
the variables such that

$$b_1 = kx - \omega t \quad \text{and} \quad b_2 = -kx - \omega t, \tag{8.187}$$

where the former equation represents a forward wave, whereas the latter a backward
wave. Then we have

$$b_1 - b_2 = 2kx. \tag{8.188}$$

Equation (8.185) is rewritten as

$$\psi(x, t) = R\cos(kx - \omega t - \theta), \tag{8.189}$$

with

$$R = \sqrt{a_1{}^2 + a_2{}^2 + 2a_1a_2\cos 2kx} \tag{8.190}$$

and

$$\tan\theta = \frac{a_2\sin 2kx}{a_1 + a_2\cos 2kx}. \tag{8.191}$$

Fig. 8.14 Geometrical diagram in relation to the superposition of two waves having different
amplitudes (a1 and a2) and different phases (b1 and b2)

Equation (8.189) looks simple, but since both R and θ vary as functions of x, the
situation is somewhat complicated, unlike a simple sinusoidal wave. Nonetheless,
when x takes a special value, (8.189) is expressed by a simple function form. For
example, at t = 0,

$$\psi(x, 0) = (a_1 + a_2)\cos kx.$$


Fig. 8.15 Superposition of two sinusoidal waves. In (8.189) and (8.190), we put a1 = 1, a2 = 0.5,
with (i) t = 0; (ii) t = T/4; (iii) t = T/2; (iv) t = 3T/4. ψ(x, t) is plotted as a function of the phase kx

This corresponds to (i) of Fig. 8.15. If t = T/2 (where T is a period, i.e., T = 2π/ω),
we have

$$\psi(x, T/2) = -(a_1 + a_2)\cos kx.$$

This corresponds to (iii) of Fig. 8.15. But the waves described by (ii) or (iv) do not
have a simple function form.
We characterize Fig. 8.15 below. If we have

$$2kx = n\pi \quad \text{or} \quad x = n\lambda/4 \tag{8.192}$$

with λ being a wavelength, then θ = 0 or π, and so θ can be eliminated. This situation
occurs at every quarter of a wavelength. Let us put t = 0 and examine how
the superposed wave looks. For instance, putting x = 0, x = λ/4, and x = λ/2 we
have

$$\psi(0, 0) = |a_1 + a_2|,\quad \psi(\lambda/4, 0) = \psi(3\lambda/4, 0) = 0,\quad \psi(\lambda/2, 0) = -|a_1 + a_2|, \tag{8.193}$$

respectively. Notice that in Fig. 8.15 we took a1, a2 > 0. At another instant t = T/4,
we similarly have

$$\psi(0, T/4) = 0,\quad \psi(\lambda/4, T/4) = |a_1 - a_2|,\quad \psi(\lambda/2, T/4) = 0,\quad
\psi(-\lambda/4, T/4) = -|a_1 - a_2|. \tag{8.194}$$
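These special values are immediately reproduced by evaluating the superposed wave directly; the sketch below uses the amplitudes a1 = 1, a2 = 0.5 of Fig. 8.15:

```python
import math

a1, a2 = 1.0, 0.5   # the amplitudes used in Fig. 8.15

def psi(kx, wt):
    # forward wave a1*cos(kx - wt) plus backward wave a2*cos(-kx - wt); cf. (8.187)
    return a1 * math.cos(kx - wt) + a2 * math.cos(-kx - wt)

# t = 0 at kx = 0, pi/2, pi (i.e., x = 0, lambda/4, lambda/2); cf. (8.193)
print([round(psi(kx, 0.0), 6) for kx in (0.0, math.pi / 2, math.pi)])
# t = T/4, i.e., wt = pi/2; cf. (8.194)
print([round(psi(kx, math.pi / 2), 6) for kx in (0.0, math.pi / 2, math.pi)])
```

The printed lists, [1.5, 0.0, -1.5] and [0.0, 0.5, 0.0], match (8.193) and (8.194).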

Thus, the waves that vary with time are characterized by two drum-shaped envelopes
that have extremals |a1 + a2| and |a1 − a2|, or those −|a1 + a2| and −|a1 − a2|. An
important implication of Fig. 8.15 is that no node is present in the superposed wave.
In other words, there is no instant t0 when ψ(x, t0) = 0 for any x. From the aspect of
energy transport of electromagnetic waves, if |a1| > |a2| (where a1 and a2 represent
the amplitudes of the forward and backward waves, respectively), the net energy flow
takes place in the travelling direction of the forward wave. If, on the other hand,
|a1| < |a2|, the net energy flow takes place in the travelling direction of the backward
wave. In this respect, think of Poynting vectors.
In the case |a1| = |a2|, the situation is particularly simple. No net energy flow takes
place in this case. Correspondingly, we observe nodes. Such waves are called
stationary waves. Let us consider this simple situation for an electromagnetic wave
that is incident perpendicularly on the interface between two dielectrics (one of them
may be a metal).
Returning to (7.58) and (7.66), we describe two electromagnetic waves that
are propagating in the positive and negative directions of the z-axis such that

$$\boldsymbol{E}_1 = E_1\boldsymbol{\varepsilon}_e e^{i(kz - \omega t)} \quad \text{and} \quad \boldsymbol{E}_2 = E_2\boldsymbol{\varepsilon}_e e^{i(-kz - \omega t)}, \tag{8.195}$$

where ε_e is a unit polarization vector arbitrarily fixed so that it can be parallel to the
interface, i.e., the wall (i.e., perpendicular to the z-axis). The situation is depicted in
Fig. 8.16. Notice that in Fig. 8.16 E1 represents the forward wave (i.e., the incident
wave) and E2 the backward wave (i.e., the wave reflected at the interface). Thus, the
superposed wave is described as

$$\boldsymbol{E} = \boldsymbol{E}_1 + \boldsymbol{E}_2. \tag{8.196}$$

Fig. 8.16 Superposition of electric fields of the forward (or incident) wave E1 and the backward
(or reflected) wave E2

Taking account of the reflection of an electromagnetic wave perpendicularly incident
on a wall, let us consider the following two cases:
(i) Syn-phase:
The phase of the electric field is retained upon reflection. We assume that
E1 = E2 (>0). Then, we have

$$\boldsymbol{E} = E_1\boldsymbol{\varepsilon}_e\left[e^{i(kz - \omega t)} + e^{i(-kz - \omega t)}\right]
= E_1\boldsymbol{\varepsilon}_e e^{-i\omega t}\left(e^{ikz} + e^{-ikz}\right) = 2E_1\boldsymbol{\varepsilon}_e e^{-i\omega t}\cos kz. \tag{8.197}$$

In (8.197), we put z = 0 at the interface for convenience. Taking the real part of
(8.197), we have

$$\boldsymbol{E} = 2E_1\boldsymbol{\varepsilon}_e\cos\omega t\cos kz. \tag{8.198}$$

Note that in (8.198) the variables z and t have been separated. This implies that
we have a stationary wave. For this case to be realized, the characteristic
impedance of the dielectric on the incident wave side should be sufficiently smaller
than that of the other side; see (8.51) and (8.59). In other words, the dielectric
constant of the incident side should be large enough. We have nodes at positions
that satisfy

$$kz = -\frac{\pi}{2} - m\pi \quad (m = 0, 1, 2, \cdots) \quad \text{or} \quad -z = \frac{1}{4}\lambda + \frac{m}{2}\lambda. \tag{8.199}$$

Note that we are thinking of the stationary wave in the region of z < 0.
Equation (8.199) indicates that nodes are formed at a quarter wavelength from
the interface and at every half wavelength from it. The node means a position
where no electric field is present.
Meanwhile, antinodes are observed at positions

$$kz = -m\pi \quad (m = 0, 1, 2, \cdots) \quad \text{or} \quad -z = \frac{m}{2}\lambda.$$

Thus, the nodes and antinodes alternate with every quarter wavelength.
(ii) Anti-phase:
The phase of the electric field is reversed. We assume that E1 = −E2 (>0).
Then, we have

$$\boldsymbol{E} = E_1\boldsymbol{\varepsilon}_e\left[e^{i(kz - \omega t)} - e^{i(-kz - \omega t)}\right]
= E_1\boldsymbol{\varepsilon}_e e^{-i\omega t}\left(e^{ikz} - e^{-ikz}\right) = 2iE_1\boldsymbol{\varepsilon}_e e^{-i\omega t}\sin kz. \tag{8.200}$$

Taking the real part of (8.200), we have

$$\boldsymbol{E} = 2E_1\boldsymbol{\varepsilon}_e\sin\omega t\sin kz. \tag{8.201}$$

In (8.201) the variables z and t have been separated as well. For this case to be
realized, the characteristic impedance of the dielectric on the incident wave side
should be sufficiently larger than that of the other side. In other words, the dielectric
constant of the incident side should be small enough. Practically, this situation
can easily be attained by choosing a metal of high reflectance for the wall material.
We have nodes at positions that satisfy

$$kz = -m\pi \quad (m = 0, 1, 2, \cdots) \quad \text{or} \quad -z = \frac{m}{2}\lambda. \tag{8.202}$$

The nodes are formed at the interface and at every half wavelength from it. As
in the case of the syn-phase, the antinodes take place at positions shifted
by a quarter wavelength relative to the nodes.
If there is another interface at, say, z = −L (L > 0), the wave goes back and forth
many times. If the absolute value of the reflection coefficient at the interface is
high enough (i.e., close to unity), attenuation of the wave is ignorable. For
both the syn-phase and anti-phase cases, we must have

$$kL = m\pi \quad (m = 1, 2, \cdots) \quad \text{or} \quad L = \frac{m}{2}\lambda \tag{8.203}$$

so that stationary waves can stand stable. For a practical purpose, an optical
device having such interfaces is said to be a resonator. Various geometries
and constitutions of the resonator have been proposed in combination with various
dielectrics including semiconductors. Related discussion can be seen in Chap. 9.

References

1. Arfken GB, Weber HJ, Harris FE (2013) Mathematical methods for physicists, 7th edn. Academic Press, Waltham
2. Rudin W (1987) Real and complex analysis, 3rd edn. McGraw-Hill, New York
3. Smith FG, King TA, Wilkins D (2007) Optics and photonics, 2nd edn. Wiley, Chichester
4. Born M, Wolf E (2005) Principles of optics, 7th edn. Cambridge University Press, Cambridge
5. Pain HJ (2005) The physics of vibrations and waves, 6th edn. Wiley, Chichester
Chapter 9
Light Quanta: Radiation and Absorption

So far we have discussed the propagation of light and its reflection and transmission
(or refraction) at an interface of dielectric media. We described the characteristics of
light from the point of view of an electromagnetic wave. In this chapter, we describe
properties of light in relation to quantum mechanics. To this end, we start with
Planck's law of radiation that successfully reproduced experimental results related to
blackbody radiation. Before this law had been established, the Rayleigh-Jeans law
failed to explain the experimental results in the high frequency region of radiation (the
ultraviolet catastrophe). Planck's law of radiation led to the discovery of light
quanta. Einstein interpreted Planck's law of radiation on the basis of a model of two-
level atoms. This model includes the so-called Einstein A and B coefficients that are
important in optics applications, especially lasers. We derive these coefficients from
a classical point of view based on a dipole oscillation. We also consider a close
relationship between electromagnetic waves confined in a cavity and the motion of a
harmonic oscillator.

9.1 Blackbody Radiation

Historically, the relevant theory was first propounded by Max Planck and then
Albert Einstein, as briefly discussed in Chap. 1. The theory was developed on the
basis of the experiments called cavity radiation or blackbody radiation. Here,
however, we wish to derive Planck's law of radiation on the assumption of the
existence of quantum harmonic oscillators.
As discussed in Chap. 2, the ground state of a quantum harmonic oscillator has an
energy $\frac{1}{2}\hbar\omega$. Therefore, we measure energies of the oscillator in reference to that
state. Let N0 be the number of oscillators (i.e., light quanta) present in the ground
state. Then, according to the Boltzmann distribution law, the number of oscillators in the
first excited state, N1, is


$$N_1 = N_0 e^{-\hbar\omega/k_B T}, \tag{9.1}$$

where kB is the Boltzmann constant and T is the absolute temperature. Let Nj be the number
of oscillators in the j-th excited state. Then we have

$$N_j = N_0 e^{-j\hbar\omega/k_B T}. \tag{9.2}$$

Let N be the total number of oscillators in the system. Then, we get

$$N = N_0 + N_0 e^{-\hbar\omega/k_B T} + \cdots + N_0 e^{-j\hbar\omega/k_B T} + \cdots
= N_0\sum\nolimits_{j=0}^{\infty} e^{-j\hbar\omega/k_B T}. \tag{9.3}$$

Let E be the total energy of the oscillator system in reference to the ground state. That
is, we put the ground-state energy at zero. Then we have

$$E = 0\cdot N_0 + N_0\hbar\omega e^{-\hbar\omega/k_B T} + \cdots + N_0 j\hbar\omega e^{-j\hbar\omega/k_B T} + \cdots
= N_0\sum\nolimits_{j=0}^{\infty} j\hbar\omega e^{-j\hbar\omega/k_B T}. \tag{9.4}$$

Therefore, the average energy of the oscillators $\bar{E}$ is

$$\bar{E} = \frac{E}{N} = \hbar\omega\,\frac{\sum_{j=0}^{\infty} je^{-j\hbar\omega/k_B T}}{\sum_{j=0}^{\infty} e^{-j\hbar\omega/k_B T}}. \tag{9.5}$$

Putting $x \equiv e^{-\hbar\omega/k_B T}$ [1], we have

$$\bar{E} = \hbar\omega\,\frac{\sum_{j=0}^{\infty} jx^j}{\sum_{j=0}^{\infty} x^j}. \tag{9.6}$$

Since x < 1, we have

$$\sum\nolimits_{j=0}^{\infty} jx^j = \sum\nolimits_{j=0}^{\infty} jx^{j-1}\cdot x
= x\,\frac{d}{dx}\left(\sum\nolimits_{j=0}^{\infty} x^j\right)
= x\,\frac{d}{dx}\left(\frac{1}{1-x}\right) = \frac{x}{(1-x)^2}, \tag{9.7}$$

$$\sum\nolimits_{j=0}^{\infty} x^j = \frac{1}{1-x}. \tag{9.8}$$

Therefore, we get

$$\bar{E} = \frac{\hbar\omega x}{1-x} = \frac{\hbar\omega e^{-\hbar\omega/k_B T}}{1 - e^{-\hbar\omega/k_B T}} = \frac{\hbar\omega}{e^{\hbar\omega/k_B T} - 1}. \tag{9.9}$$

The function

$$\frac{1}{e^{\hbar\omega/k_B T} - 1} \tag{9.10}$$

is a form of the Bose-Einstein distribution functions; more specifically, it is today called the
Bose-Einstein distribution function for photons.
If $(\hbar\omega/k_B T) \ll 1$, then $e^{\hbar\omega/k_B T} \approx 1 + (\hbar\omega/k_B T)$. Therefore, we have

$$\bar{E} \approx k_B T. \tag{9.11}$$

Thus, the relation (9.9) asymptotically agrees with the classical theory. In other words,
according to the classical theory related to the law of equipartition of energy, an energy of
kBT/2 is distributed to each of the two degrees of freedom of motion, i.e., the kinetic
energy and the potential energy of a harmonic oscillator.
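The crossover between the quantum result (9.9) and the classical limit (9.11) is easy to see numerically; in the sketch below, the temperature and the set of frequencies are assumed example values:

```python
import math

hbar = 1.054571817e-34   # J s
kB = 1.380649e-23        # J/K
T = 300.0                # assumed temperature

for omega in (1e12, 1e13, 1e14, 1e15):             # assumed angular frequencies, rad/s
    x = hbar * omega / (kB * T)
    E_mean = hbar * omega / (math.exp(x) - 1.0)    # (9.9)
    print(f"hbar*omega/(kB*T) = {x:8.3f}:  E_mean/(kB*T) = {E_mean / (kB * T):.4f}")
```

For ħω ≪ kBT the printed ratio approaches 1, the classical equipartition value, whereas for ħω ≫ kBT it is exponentially suppressed.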

9.2 Planck's Law of Radiation and Mode Density of Electromagnetic Waves

Researchers at the time tried to find the relationship between the energy density
inside the cavity and the (angular) frequency of radiation. To reach the relationship, let
us introduce the concept of the mode density of electromagnetic waves related to the
blackbody radiation. We define the mode density D(ω) as the number of modes of
electromagnetic waves per unit volume per unit angular frequency. We refer to the
electromagnetic waves having allowed specific angular frequencies and polarizations
as modes. These modes must be described as linearly independent functions.
Determination of the mode density is related to the boundary conditions (BCs)
imposed on a physical system. We already dealt with this problem in Chaps. 2, 3,
and 8. These BCs often appear when we find solutions of differential equations. Let
us consider the following wave equation:

$$\frac{\partial^2\psi}{\partial x^2} = \frac{1}{v^2}\frac{\partial^2\psi}{\partial t^2}. \tag{9.12}$$

According to the method of separation of variables, we put

$$\psi(x, t) = X(x)T(t). \tag{9.13}$$

Substituting (9.13) into (9.12) and dividing both sides by X(x)T(t), we have
$$\frac{1}{X}\frac{d^2X}{dx^2} = \frac{1}{v^2}\frac{1}{T}\frac{d^2T}{dt^2} = -k^2, \tag{9.14}$$

where k is an undetermined (possibly complex) constant. For the x component, we
get

$$\frac{d^2X}{dx^2} + k^2X = 0. \tag{9.15}$$

Remember that k is supposed to be a complex number for the moment (see Example 1.1).
Modifying Example 1.1 a little so that (9.15) is posed in a domain [0, L] and
imposing the Dirichlet conditions such that

$$X(0) = X(L) = 0, \tag{9.16}$$

we find a solution of

$$X(x) = a\sin kx, \tag{9.17}$$

where a is a constant. The constant k can be determined so as to satisfy the BCs; i.e.,

$$kL = m\pi \quad \text{or} \quad k = m\pi/L \quad (m = 1, 2, \cdots). \tag{9.18}$$

Thus, we get real numbers for k. Then, we have a solution

$$T(t) = b\sin kvt = b\sin\omega t. \tag{9.19}$$

The overall solution is then

$$\psi(x, t) = c\sin kx\sin\omega t. \tag{9.20}$$

This solution has already appeared in Chap. 8 as a stationary solution. The readers
are encouraged to derive these results.
In a three-dimensional case, we have the wave equation

$$\frac{\partial^2\psi}{\partial x^2} + \frac{\partial^2\psi}{\partial y^2} + \frac{\partial^2\psi}{\partial z^2} = \frac{1}{v^2}\frac{\partial^2\psi}{\partial t^2}. \tag{9.21}$$

In this case, we also assume

$$\psi(x, t) = X(x)Y(y)Z(z)T(t). \tag{9.22}$$

Similarly we get

$$\frac{1}{X}\frac{d^2X}{dx^2} + \frac{1}{Y}\frac{d^2Y}{dy^2} + \frac{1}{Z}\frac{d^2Z}{dz^2} = \frac{1}{v^2}\frac{1}{T}\frac{d^2T}{dt^2} = -k^2. \tag{9.23}$$

Putting

$$\frac{1}{X}\frac{d^2X}{dx^2} = -k_x^2,\quad \frac{1}{Y}\frac{d^2Y}{dy^2} = -k_y^2,\quad \frac{1}{Z}\frac{d^2Z}{dz^2} = -k_z^2, \tag{9.24}$$

we have

$$k_x^2 + k_y^2 + k_z^2 = k^2. \tag{9.25}$$

Then, we get a stationary wave solution as in the one-dimensional case such that

$$\psi(x, t) = c\sin k_x x\sin k_y y\sin k_z z\sin\omega t. \tag{9.26}$$

The BCs to be satisfied by X(x), Y(y), and Z(z) are

$$k_x L = m_x\pi,\quad k_y L = m_y\pi,\quad k_z L = m_z\pi \quad (m_x, m_y, m_z = 1, 2, \cdots). \tag{9.27}$$

Returning to the main issue, let us deal with the mode density. Think of a cube of
side L that is placed as shown in Fig. 9.1. Calculating k, we have

$$k = \sqrt{k_x^2 + k_y^2 + k_z^2} = \frac{\pi}{L}\sqrt{m_x^2 + m_y^2 + m_z^2} = \frac{\omega}{c}, \tag{9.28}$$

where we assumed that the inside of the cavity is vacuum and, hence, the propagation
velocity of light is c. Rewriting (9.28), we have

$$\sqrt{m_x^2 + m_y^2 + m_z^2} = \frac{L\omega}{\pi c}. \tag{9.29}$$

Fig. 9.1 Cube of side L. We use this simple model to estimate the mode density
The numbers mx, my, and mz represent allowable modes in the cavity; the set (mx,
my, mz) specifies individual modes. Note that mx, my, and mz are all positive integers.
If, for instance, −mx were allowed, this would produce sin(−kx x) = −sin kx x; but
this function is linearly dependent on sin kx x. Then, a mode indexed by −mx should
not be regarded as an independent mode. Given ω, a set (mx, my, mz) that satisfies
(9.29) corresponds to each mode. Therefore, the number of modes that satisfy the
following expression

$$\sqrt{m_x^2 + m_y^2 + m_z^2} \le \frac{L\omega}{\pi c} \tag{9.30}$$

represents those corresponding to angular frequencies equal to or less than the given ω.
Each mode has a one-to-one correspondence with a lattice point indexed by (mx, my,
mz). Accordingly, if mx, my, mz ≫ 1, the number of allowed modes approximately
equals one-eighth of the volume of a sphere having a radius of $\frac{L\omega}{\pi c}$. Let NL be the
number of modes whose angular frequencies are equal to or less than ω. Recalling that
there are two independent modes having the same index (mx, my, mz) but mutually
orthogonal polarizations, we have

$$N_L = \frac{4\pi}{3}\left(\frac{L\omega}{\pi c}\right)^3\cdot\frac{1}{8}\cdot 2 = \frac{L^3\omega^3}{3\pi^2 c^3}. \tag{9.31}$$

Consequently, the mode density D(ω) is expressed as

$$D(\omega)d\omega = \frac{1}{L^3}\frac{dN_L}{d\omega}d\omega = \frac{\omega^2}{\pi^2 c^3}d\omega, \tag{9.32}$$

where D(ω)dω represents the number of modes per unit volume whose angular
frequencies range between ω and ω + dω.
Now, we introduce a function ρ(ω) as an energy density per unit angular
frequency. Then, combining D(ω) with (9.9), we get

$$\rho(\omega) = D(\omega)\,\frac{\hbar\omega}{e^{\hbar\omega/k_B T} - 1} = \frac{\hbar\omega^3}{\pi^2 c^3}\,\frac{1}{e^{\hbar\omega/k_B T} - 1}. \tag{9.33}$$

The relation (9.33) is called Planck's law of radiation. Notice that ρ(ω) has the
dimension [J m⁻³ s].
To solve (9.15) under the Dirichlet conditions (9.16) is pertinent to analyzing the
electric field within a cavity surrounded by a metal husk, because the electric field
must be absent at the interface between the cavity and the metal. The problem, however,
can equivalently be solved using the magnetic field. This is because at the interface
the reflection coefficients of the electric field and magnetic field have reversed signs (see
Chap. 8). Thus, given an equation for the magnetic field, we may use the Neumann
conditions. These conditions require the differential coefficients to vanish at the boundary
(i.e., the interface between the cavity and the metal). In a similar manner to the above,
we get

$$\psi(x, t) = c\cos kx\cos\omega t. \tag{9.34}$$

By imposing the BCs, again we have (9.18), which leads to the same result as the above.
We may also impose the periodic BCs. This type of equation has already been
treated in Chap. 3. In that case we have solutions of

$$e^{ikx} \quad \text{and} \quad e^{-ikx}.$$

The BCs demand that $e^0 = 1 = e^{ikL}$. That is,

$$kL = 2\pi m \quad (m = 0, \pm 1, \pm 2, \cdots). \tag{9.35}$$

Notice that $e^{ikx}$ and $e^{-ikx}$ are linearly independent and, hence, a minus sign for m is
permitted. Correspondingly, we have

$$N_L = \frac{4\pi}{3}\left(\frac{L\omega}{2\pi c}\right)^3\cdot 2 = \frac{L^3\omega^3}{3\pi^2 c^3}. \tag{9.36}$$

In other words, here we have to consider the whole volume of a sphere with half the radius
of the previous case. Thus, we reach the same conclusion as before.
If the average energy of an oscillator were described by (9.11), we would obtain the
following description of ρ(ω) such that

$$\rho(\omega) = D(\omega)k_B T = \frac{\omega^2}{\pi^2 c^3}k_B T. \tag{9.37}$$

This relation is well known as the Rayleigh-Jeans law, but (9.37) disagreed with
experimental results in that, according to the Rayleigh-Jeans law, ρ(ω) diverges toward
infinity as ω goes to infinity. The discrepancy between the theory and the experimental
results was referred to as the "ultraviolet catastrophe." Planck's law of radiation
described by (9.33), on the other hand, reproduces the experimental results well.
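The agreement at low frequency and the divergence of (9.37) at high frequency are visible in a direct comparison; in the sketch below, T = 5000 K is an assumed example value:

```python
import math

hbar, kB, c = 1.054571817e-34, 1.380649e-23, 2.99792458e8
T = 5000.0                                     # assumed temperature

def rho_planck(w):                             # (9.33)
    return (hbar * w**3 / (math.pi**2 * c**3)) / (math.exp(hbar * w / (kB * T)) - 1.0)

def rho_rayleigh_jeans(w):                     # (9.37)
    return w**2 * kB * T / (math.pi**2 * c**3)

for w in (1e13, 1e14, 1e15, 1e16):             # angular frequencies, rad/s
    print(f"omega = {w:.0e}: Planck = {rho_planck(w):.3e},"
          f"  Rayleigh-Jeans = {rho_rayleigh_jeans(w):.3e}  [J m^-3 s]")
```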

9.3 Two-Level Atoms

Although Planck established Planck's law of radiation, researchers at that time
hesitated to profess the existence of light quanta. It was Einstein who derived
Planck's law by assuming two-level atoms in which light quanta play a role.
His assumption comprises the following three postulates: (i) The physical system
to be addressed comprises so-called hypothetical "two-level" atoms that have only
two energy levels. If two-level atoms absorb a light quantum, a ground-state electron
is excited up to a higher level (i.e., the stimulated absorption). (ii) The higher-level
electron may spontaneously lose its energy and return back to the ground state (the
spontaneous emission). (iii) The higher-level electron may also lose its energy and
return back to the ground state. Unlike (ii), however, the excited electron has to be
stimulated by being irradiated by light quanta having an energy corresponding to the
energy difference between the ground state and the excited state (the stimulated emis-
sion). Figure 9.2 schematically depicts the optical processes of the Einstein model.

Fig. 9.2 Optical processes of the Einstein two-level atom model (stimulated absorption, stimulated
emission, and spontaneous emission)
Under those postulates, Einstein dealt with the problem probabilistically. Sup-
pose that the ground state and the excited state have energies E1 and E2. Einstein
assumed that light quanta having an energy equaling E2 − E1 take part in all
three of the above transition processes. He also propounded the idea that a light quantum
has an energy that is proportional to its (angular) frequency. That is, he thought that
the following relation should hold:

$$\hbar\omega_{21} = E_2 - E_1, \tag{9.38}$$

where ω21 is the angular frequency of light that takes part in the optical transitions.
For the time being, let us follow Einstein's postulates.
(i) Stimulated absorption: This process is simply said to be an "absorption." Let Wa
[s⁻¹] be the transition probability that the electron absorbs a light quantum to be
excited to the excited state. Wa is described as

$$W_a = N_1 B_{21}\rho(\omega_{21}), \tag{9.39}$$

where N1 is the number of atoms occupying the ground state; B21 is a propor-
tionality constant; ρ(ω21) is due to (9.33). Note that in (9.39) we used ω21 instead
of ω in (9.33). The coefficient B21 is called the Einstein B coefficient; more
specifically, one of the Einstein B coefficients. Namely, B21 is pertinent to the
transition from the ground state to the excited state.
(ii) Emission processes: The processes include both the spontaneous and stimulated
emissions. Let We [s⁻¹] be the transition probability that the electron emits a
light quantum and returns back to the ground state. We is described as

$$W_e = N_2 B_{12}\rho(\omega_{21}) + N_2 A_{12}, \tag{9.40}$$

where N2 is the number of atoms occupying the excited state; B12 and A12 are
proportionality constants. The coefficient A12 is called the Einstein A coefficient,
relevant to the spontaneous emission. The coefficient B12 is associated with
the stimulated emission and is also called the Einstein B coefficient together with B21.
Here, B12 is pertinent to the transition from the excited state to the ground state.
Now, we have

$$B_{12} = B_{21}. \tag{9.41}$$

The reasoning for this is as follows: The coefficients B12 and B21 are proportional to
the matrix elements pertinent to the optical transition. Let T be an operator associated
with the transition. Then, a matrix element is described, using the inner product
notation of Chap. 1, by

$$B_{21} = \langle\psi_2|T|\psi_1\rangle, \tag{9.42}$$

where ψ1 and ψ2 are the initial and final states of the system in relation to the optical
transition. As a good approximation, we use $e\boldsymbol{r}$ for T (dipole approximation), where
e is an elementary charge and r is a position operator (see Chap. 1). If (9.42)
represents the absorption process (i.e., the transition from the ground state to the excited
state), the corresponding emission process should be described as a reversed process
by

$$B_{12} = \langle\psi_1|T|\psi_2\rangle. \tag{9.43}$$

Notice that in (9.43) ψ2 and ψ1 are the initial and final states.
Taking the complex conjugate of (9.42), we have

$$B_{21}{}^{*} = \langle\psi_1|T^{\dagger}|\psi_2\rangle, \tag{9.44}$$

where T† is an operator adjoint to T (see Chap. 1). With an Hermitian operator H,
from Sect. 1.4 we have

$$H^{\dagger} = H. \tag{1.119}$$

Since T is also Hermitian, we have

$$T^{\dagger} = T. \tag{9.45}$$

Thus, we get

$$B_{21}{}^{*} = B_{12}. \tag{9.46}$$

But, as in the cases of Sects. 4.2 and 4.3, ψ1 and ψ2 can be represented as real
functions. Then, we have

$$B_{21} = B_{21}{}^{*} = B_{12}.$$

That is, we assume that the matrix B is real symmetric. In the case of two-level
atoms, as a matrix form we get

$$B = \begin{pmatrix} 0 & B_{12} \\ B_{12} & 0 \end{pmatrix}. \tag{9.47}$$

Compare (9.47) with (4.28).
Now, in the thermal equilibrium, we have

$$W_e = W_a. \tag{9.48}$$

That is,

$$N_2 B_{21}\rho(\omega_{21}) + N_2 A_{12} = N_1 B_{21}\rho(\omega_{21}), \tag{9.49}$$

where we used (9.41) for the LHS. Assuming the Boltzmann distribution law, we get

$$\frac{N_2}{N_1} = \exp[-(E_2 - E_1)/k_B T]. \tag{9.50}$$

Here if, moreover, we assume (9.38), we get

$$\frac{N_2}{N_1} = \exp(-\hbar\omega_{21}/k_B T). \tag{9.51}$$

Combining (9.49) and (9.51), we have

$$\exp(-\hbar\omega_{21}/k_B T) = \frac{B_{21}\rho(\omega_{21})}{B_{21}\rho(\omega_{21}) + A_{12}}. \tag{9.52}$$

Solving (9.52) with respect to ρ(ω21), we finally get

$$\rho(\omega_{21}) = \frac{A_{12}}{B_{21}}\cdot\frac{\exp(-\hbar\omega_{21}/k_B T)}{1 - \exp(-\hbar\omega_{21}/k_B T)}
= \frac{A_{12}}{B_{21}}\cdot\frac{1}{\exp(\hbar\omega_{21}/k_B T) - 1}. \tag{9.53}$$

Assuming that

$$\frac{A_{12}}{B_{21}} = \frac{\hbar\omega_{21}{}^3}{\pi^2 c^3}, \tag{9.54}$$

we have

$$\rho(\omega_{21}) = \frac{\hbar\omega_{21}{}^3}{\pi^2 c^3}\cdot\frac{1}{\exp(\hbar\omega_{21}/k_B T) - 1}. \tag{9.55}$$

This is none other than Planck's law of radiation.
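The algebra of (9.49)-(9.55) can be retraced numerically: given the ratio (9.54) and the Boltzmann populations (9.51), the steady-state balance reproduces Planck's law exactly. The sketch below uses assumed example values for T and ω21:

```python
import math

hbar, kB, c = 1.054571817e-34, 1.380649e-23, 2.99792458e8
T, w21 = 1000.0, 2.0e14            # assumed temperature and transition frequency

A_over_B = hbar * w21**3 / (math.pi**2 * c**3)     # A12/B21, (9.54)
boltz = math.exp(-hbar * w21 / (kB * T))           # N2/N1, (9.51)

# Solve N1*B21*rho = N2*B21*rho + N2*A12 for rho, i.e., (9.53):
rho = A_over_B * boltz / (1.0 - boltz)

planck = A_over_B / (math.exp(hbar * w21 / (kB * T)) - 1.0)   # (9.55)
print(rho, planck)
assert math.isclose(rho, planck, rel_tol=1e-12)
```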

9.4 Dipole Radiation

In (9.54) we only know the ratio of A12 to B21. To gain a good knowledge of these
Einstein coefficients, we briefly examine the mechanism of the dipole radiation. The
electromagnetic radiation results from an accelerated motion of a dipole.
A dipole moment p(t) is defined as a function of time t by

$$\boldsymbol{p}(t) = \int \boldsymbol{x}'\rho(\boldsymbol{x}', t)\,d\boldsymbol{x}', \tag{9.56}$$

where x′ is a position vector in a Cartesian coordinate system; the integral is taken over the
whole three-dimensional space; ρ is the charge density appearing in (7.1). If we
consider a system comprising point charges, the integration can immediately be carried
out to yield

$$\boldsymbol{p}(t) = \sum\nolimits_i q_i\boldsymbol{x}_i, \tag{9.57}$$

where qi is the charge of each point charge i and xi is the position vector of the point
charge i. From (9.56) and (9.57), we find that p(t) depends on how we set up the
coordinate system. However, if the total charge of the system is zero, p(t) does not
depend on the coordinate system. Let p(t) and p′(t) be the dipole moments viewed from
the frames O and O′, respectively (see Fig. 9.3).

Fig. 9.3 Dipole moment viewed from the frame O or O′

Fig. 9.4 Electromagnetic radiation from an accelerated motion of a dipole. (a) A dipole placed at
the origin of the coordinate system is executing harmonic oscillation along the z-direction around an
equilibrium position. (b) Electromagnetic radiation from a dipole in a wave zone. εe and εm are unit
polarization vectors of the electric field and magnetic field, respectively. εe, εm, and n form a right-
handed system

Then we have

$$\boldsymbol{p}'(t) = \sum\nolimits_i q_i\boldsymbol{x}_i{}' = \sum\nolimits_i q_i(\boldsymbol{x}_0 + \boldsymbol{x}_i)
= \sum\nolimits_i q_i\boldsymbol{x}_0 + \sum\nolimits_i q_i\boldsymbol{x}_i = \sum\nolimits_i q_i\boldsymbol{x}_i = \boldsymbol{p}(t). \tag{9.58}$$

Notice that with the third equality the first term vanishes because the total charge is
zero. The system comprising two point charges that have opposite charges (±q) is
particularly simple but very important. In that case we have

$$\boldsymbol{p}(t) = q\boldsymbol{x}_1 + (-q)\boldsymbol{x}_2 = q(\boldsymbol{x}_1 - \boldsymbol{x}_2) = q\tilde{\boldsymbol{x}}. \tag{9.59}$$

Here we assume that q > 0 according to the custom and, hence, $\tilde{\boldsymbol{x}}$ is a vector directed
from the minus point charge to the plus charge.
Figure 9.4 displays the geometry of an oscillating dipole and the electromagnetic radia-
tion from it. Figure 9.4a depicts the dipole. It is placed at the origin of the coordinate
system and assumed to be of an atomic or molecular scale in extension; we regard the
center of the dipole as the origin. Figure 9.4b represents a large-scale geometry of the
dipole and the surrounding space of it. For the electromagnetic radiation, an accelerated
motion of the dipole is of primary importance. The electromagnetic fields produced
by $\ddot{\boldsymbol{p}}(t)$ vary as the inverse of r, where r is a macroscopic distance between the dipole
and the observation point. Namely, r is much larger compared to the dipole size.
There are other electromagnetic fields that result from the dipole moment. These
fields result from p(t) and $\dot{\boldsymbol{p}}(t)$. Strictly speaking, we have to include those quantities
that are responsible for the electromagnetic fields associated with the dipole radia-
tion. Nevertheless, the fields produced by p(t) and $\dot{\boldsymbol{p}}(t)$ vary as functions of the
inverse cube and inverse square of r, respectively. Therefore, the surface integral of
the square of the fields associated with p(t) and $\dot{\boldsymbol{p}}(t)$ asymptotically reaches zero for
large enough r with respect to a sphere enveloping the dipole. Regarding $\ddot{\boldsymbol{p}}(t)$, on the
other hand, the surface integral of the square of the fields remains finite even for
large enough r. For this reason, we refer to the spatial region where $\ddot{\boldsymbol{p}}(t)$ does not
vanish as a wave zone.
Suppose that a dipole placed at the origin of the coordinate system is executing
harmonic oscillation along the z-direction around an equilibrium position (see
Fig. 9.4). The motion of the two charges having plus and minus signs is described by

$$\boldsymbol{z}_+ = z_0\boldsymbol{e}_3 + ae^{-i\omega t}\boldsymbol{e}_3 \quad (z_0, a > 0), \tag{9.60}$$

$$\boldsymbol{z}_- = -z_0\boldsymbol{e}_3 - ae^{-i\omega t}\boldsymbol{e}_3, \tag{9.61}$$

where z₊ and z₋ are the position vectors of the plus charge and minus charge, respectively;
z0 and −z0 are the equilibrium positions of each charge; a is the amplitude of the
harmonic oscillation; ω is the angular frequency of the oscillation. Then, the accelera-
tions of the charges are given by

$$\boldsymbol{a}_+ \equiv \ddot{\boldsymbol{z}}_+ = -a\omega^2 e^{-i\omega t}\boldsymbol{e}_3, \tag{9.62}$$

$$\boldsymbol{a}_- \equiv \ddot{\boldsymbol{z}}_- = a\omega^2 e^{-i\omega t}\boldsymbol{e}_3. \tag{9.63}$$

Meanwhile, we have

$$\boldsymbol{p}(t) = q\boldsymbol{z}_+ + (-q)\boldsymbol{z}_- = q(\boldsymbol{z}_+ - \boldsymbol{z}_-) \quad (q > 0). \tag{9.64}$$

Therefore,

$$\ddot{\boldsymbol{p}}(t) = q(\ddot{\boldsymbol{z}}_+ - \ddot{\boldsymbol{z}}_-) = -2qa\omega^2 e^{-i\omega t}\boldsymbol{e}_3. \tag{9.65}$$

The quantity pð€t Þ, i.e., the second derivative of p(t) with respect to time, produces
the electric field described by [2]
352 9 Light Quanta: Radiation and Absorption

E= 2€
p=4πε0 c2 r ¼ 2 qaω2 eiωt e3 =2πε0 c2 r, ð9:66Þ

where r is a distance between the dipole and observation point. In (9.66) we ignored
a term proportional to inverse square and cube of r for the aforementioned reason.
As described in (9.66), the strength of the radiation electric field in the wave zone
measured at a point away from the oscillating dipole is proportional to a component
of the vector of the acceleration motion of the dipole [i.e., pð€t Þ ]. The radiation
electric field lies in the direction perpendicular to a line connecting the observation
point and the point of the dipole (Fig. 9.4). Let εe be a unit polarization vector of the
electric field in that direction and let E⊥ be the radiation electric field. Then, we have

E⊥ = 2 qaω2 eiωt ðe3  εe Þεe =2πε0 c2 r ¼ 2 qaω2 eiωt εe sin θ=2πε0 c2 r : ð9:67Þ

As shown in Sect. 7.3, (εe  e3)εe in (9.67) “extracts” from e3 a vector component
parallel to εe. Such an operation is said to be a projection of a vector. The related
discussion can be seen in Part III.
It takes a time of r/c for the emitted light from the charge to arrive at the
observation point. Consequently, the acceleration of the charge has to be measured
at the time when the radiation leaves the charge. Let t be the instant when the electric
field is measured at the measuring point. Then, it follows that the radiation leaves the
charge at a time of t  r/c. Thus, the electric field relevant to the radiation that can be
observed far enough away from the oscillating charge is described as [2]

qaω2 eiωðtcÞ sin θ


r

E⊥ ðx, t Þ = 2 εe : ð9:68Þ
2πε0 c2 r

The radiation electric field must necessarily be accompanied by a magnetic field.


Writing the radiation magnetic field as H⊥(x, t), we have [3]

1 qaω2 eiωðtcÞ sin θ qaω2 eiωðtcÞ sin θ


r r

H ðx, t Þ = 2  n εe = 2 n εe
cμ0 2πε0 c2 r 2πcr

qaω2 eiωðtcÞ sin θ


r

=2 εm , ð9:69Þ
2πcr

where n represents a unit vector in the direction parallel to a line connecting the
observation point and the dipole. The εm is a unit polarization vector of the magnetic
field as defined by (7.67). From the above, we see that the radiation electromagnetic
waves in the wave zone are transverse waves that show the properties the same as
those of electromagnetic waves in a free space.
Now, let us evaluate a time-averaged energy flux from an oscillating dipole.
Using (8.71), we have
9.4 Dipole Radiation 353

1 qaω2 eiωðtcÞ sin θ qaω2 eiωðtcÞ sin θ


r r
1
SðθÞ ¼ E H =   n
2 2 2πε0 c2 r 2πcr
ω4 sin 2 θ
= ðqaÞ2 n: ð9:70Þ
8π 2 ε0 c3 r 2

If we are thinking of an atom or a molecule in which the dipole consists of an


electron and a positive charge that compensates it, q is replaced with e (e < 0).
Then (9.70) reads as

ω4 sin 2 θ
SðθÞ = ðeaÞ2 n: ð9:71Þ
8π 2 ε0 c3 r 2

Let us relate the above argument to Einstein A and B coefficients. Since we are
dealing with an isolated dipole, we might well suppose that the radiation comes from
the spontaneous emission. Let P be a total power of emission from the oscillating
dipole that gets through a sphere of radius r. Then we have
Z Z 2π Z π
P¼ SðθÞ  ndS ¼ dϕ SðθÞr 2 sin θdθ
0 0
Z 2π Z π
ω 4
¼ ðeaÞ2 dϕ sin 3 θdθ: ð9:72Þ
8π 2 ε0 c3 0 0

Changing cosθ to t, the integral I  0 sin 3 θdθ can be converted into
Z 1 
I¼ 1  t 2 dt ¼ 4=3: ð9:73Þ
1

Thus, we have

ω4
P¼ ðeaÞ2 : ð9:74Þ
3πε0 c3

A probability of the spontaneous emission is given by N2 A12. Since we are


dealing with a single dipole, N2 can be put 1. Accordingly, an expected power of
emission is A12ħω21. Replacing ω in (9.74) with ω21 in (9.55) and equating A12ħω21
to P, we get

ω21 3
A12 ¼ ðeaÞ2 : ð9:75Þ
3πε0 c3 ħ

From (9.54), we also get


354 9 Light Quanta: Radiation and Absorption

π
B12 ¼ ðeaÞ2 : ð9:76Þ
3ε0 ħ 2

In order to relate these results to those of quantum mechanics, we may replace a2


in the above expressions with a square of an absolute value of the matrix elements of
the position operator r. That is, representing | 1i and | 2i as the quantum states of the
ground and excited states of a two-level atom, we define h1| r| 2i as the matrix
element of the position operator. Relating |h1| r| 2i|2 to a2, we get

ω21 3 e2 πe2
A12 ¼ jh1jrj2ij2 and B12 ¼ jh1jrj2ij2 : ð9:77Þ
3πε0 c ħ
3
3ε0 ħ2

From (9.77), we have

πe2 πe2
B12 ¼ h1jrj2ih1jrj2i ¼ h1jrj2ih2jr{ j1
3ε0 ħ 2
3ε0 ħ2
πe2
¼ h1jrj2ih2jrj1i, ð9:78Þ
3ε0 ħ2

where with the second equality we used (1.116); the last equality comes from that r is
Hermitian. Meanwhile, we have

πe2 2 πe2 { πe2


B21 ¼ j h 2 jrj1i j ¼ h 2 jrj1i h 1 jr j2 ¼ h2jrj1ih1jrj2i: ð9:79Þ
3ε0 ħ2 3ε0 ħ2 3ε0 ħ2

Hence, we recover (9.41).

9.5 Lasers
9.5.1 Brief Outlook

A concept of the two-level atoms proposed by Einstein is universal and independent


of materials and can be utilized for some particular purposes. Actually, in later years
many researchers tried to validate that concept and verified its validity. After basic
researches of 1950s and 1960s, fundamentals were put into practical use as various
optical devices. Typical examples are masers and lasers, abbreviations for “micro-
wave amplification by stimulated emission of radiation” and “light amplification by
stimulated emission of radiation.” Of these, lasers are common and particularly
important nowadays. On the basis of universality of the concept, a lot of materials
including semiconductors and dyes are used in gaseous, liquid, and solid states.
9.5 Lasers 355

t t+dt

I(x) I(x+dx)

0 x x+dx L

Fig. 9.5 Rectangular parallelepiped of a laser medium with a length L and a cross-section area
S (not shown). I(x) denotes irradiance at a point of x from the origin

Let us consider a rectangular parallelepiped of a laser medium with a length L and


cross-section area S (Fig. 9.5). Suppose there are N two-level atoms in the rectan-
gular parallelepiped such that

N ¼ N1 þ N2, ð9:80Þ

where N1 and N2 represent the number of atoms occupying the ground and excited
states, respectively. Suppose that a light is propagated from the left of the rectangular
parallelepiped and entering it. Then, we expect that three processes occur simulta-
neously. One is a stimulated absorption and others are stimulated emission and
spontaneous emission. After these process, an increment dE in photon energy of the
total system (i.e., the rectangular parallelepiped) during dt is described by

dE ¼ fN 2 ½B21 ρðω21 Þ þ A12  N 1 B21 ρðω21 Þgħω21 dt: ð9:81Þ

In light of (9.39) and (9.40), a dimensionless quantity dE/ħω21 represents a number


of effective events of photons emission that have occurred during dt. Since in lasers
the stimulated emission is dominant, we shall forget about the spontaneous emission
and rewrite (9.81) as

dE ¼ fN 2 B21 ρðω21 Þ  N 1 B21 ρðω21 Þgħω21 dt


¼ B21 ρðω21 ÞðN 2  N 1 Þħω21 dt: ð9:82Þ

Under a thermal equilibrium, we have N2 < N1 on the basis of Boltzmann distribu-


tion law, and so dE < 0. In this occasion, therefore, the photon energy decreases. For
the light amplification to take place, therefore, we must have a following condition:

N2 > N1: ð9:83Þ

This energy distribution is called inverted distribution or population inversion. Thus,


the laser oscillation is a typical nonequilibrium phenomenon. To produce the
population inversion, we need an external exciting source using an electrical or
optical device.
The essence of lasers rests upon the fact that stimulated emission produces a
photon that possesses a wavenumber vector (k) and a polarization (ε) both the same
as those of an original photon. For this reason, the laser light is monochromatic and
356 9 Light Quanta: Radiation and Absorption

highly directional. To understand the fundamental mechanism underlying the related


phenomena, interested readers are encouraged to seek appropriate literature of
quantum theory of light for further reading [4].
To make a discussion simple and straightforward, let us assume that the light is
incident parallel to the long axis of the rectangular parallelepiped. Then, the stimu-
lated emission produces light to be propagated in the same direction. As a result, an
irradiance I measured in that direction is described as

E 0
I¼ c: ð9:84Þ
SL

Note that the light velocity in the laser medium c0 is given by

c0 ¼ c=n, ð9:85Þ

where n is a refractive index of the laser medium. Taking an infinitesimal of both


sides of (9.84), we have

c0 c0
dI ¼ dE ¼ B21 ρðω21 ÞðN 2  N 1 Þħω21 dt
SL SL
e 21 c0 dt,
¼ B21 ρðω21 ÞNħω ð9:86Þ

where Ne ¼ ðN 2  N 1 Þ=SL denotes a “net” density of atoms that occupy the excited
state.
The energy density ρ(ω21) can be written as

ρðω21 Þ ¼ I ðω21 Þgðω21 Þ=c0 , ð9:87Þ

where I(ω12) [Js1m2] represents an intensity of radiation; g(ω21) is a gain function


[s]. The gain function is a measure that shows how favorably (or unfavorably) the
transition takes place at the said angular frequency ω12. This is normalized in the
emission range such that
Z 1
gðωÞdω ¼ 1:
0

The quantity I(ω21) is an energy flux that gets through per unit area per unit time.
This flux corresponds to an energy contained in a long and thin rectangular paral-
lelepiped of a length c0 and a unit cross-section area. To obtain ρ(ω21), I(ω12) should
be divided by c0 in (9.87) accordingly. Using (9.87) and replacing c0dt with a distance
dx and I(ω12) with I(x) as a function of x, we rewrite (9.86) as
9.5 Lasers 357

e 21
B21 gðω21 ÞNħω
dI ðxÞ ¼ 0 I ðxÞdx: ð9:88Þ
c

Dividing (9.88) by I(x) and integrating both sides, we have


Z Z Z
I
dI ðxÞ I e 21
B21 gðω21 ÞNħω x
¼ d ln I ðxÞ ¼ dx, ð9:89Þ
I0 I ðxÞ I0 c0 0

where I0 is an irradiance of light at an instant when the light is entering the laser
medium from the left. Thus, we get
 
e 21
B21 gðω21 ÞNħω
I ðxÞ ¼ I 0 exp x : ð9:90Þ
c0

Equation (9.90) shows that an irradiance of the laser light is augmented exponen-
tially along the path of the laser light. In (9.90), denoting an exponent as G

e 21
B21 gðω21 ÞNħω
G 0 , ð9:91Þ
c

we get

I ðxÞ ¼ I 0 exp Gx:

The constant G is said to be a gain constant. This is an index that indicates the laser
performance. Large numbers B21, g(ω21), and N e yield a high performance of the
laser.
In Sects. 8.8 and 9.2, we sought conditions for electromagnetic waves to cause
constructive interference. In a one-dimensional dielectric medium, the condition is
described as

kL ¼ mπ or mλ ¼ 2L ðm ¼ 1, 2,   Þ, ð9:92Þ

where k and λ denote a wavenumber and wavelength in the dielectric medium,


respectively. Indexing k and λ with m that represents a mode, we have

k m L ¼ mπ or mλm ¼ 2L ðm ¼ 1, 2,   Þ: ð9:93Þ

This condition can be expressed by different manners such that

ωm ¼ 2πνm ¼ 2πc0 =λm ¼ 2πc=nλm ¼ mπc=nL: ð9:94Þ

It is often the case that if the laser is a long and thin rod, rectangular parallele-
piped, etc., we see that sharply resolved and regularly spaced spectral lines are
358 9 Light Quanta: Radiation and Absorption

observed in emission spectrum. These lines are referred to as a longitudinal


multimode. The separation between two neighboring emission lines is referred to
as the free spectral range [2]. If adjacent emission lines are clearly resolved so that
the free spectral range can easily be recognized, we can derive useful information
from the laser oscillation spectra (vide infra).
Rewriting (9.94) as, e.g.,

πc
ωm n ¼ m, ð9:95Þ
L

and taking differential (or variation) of both sides, we get

δn δn πc
nδωm þ ωm δn ¼ nδωm þ ωm δω ¼ n þ ωm δωm ¼ δm: ð9:96Þ
δωm m δωm L

Therefore, we get

1
πc δn
δωm ¼ n þ ωm δm: ð9:97Þ
L δωm

Equation (9.97) premises the wavelength dispersion of a refractive index of a laser


medium. Here, the wavelength dispersion means that the refractive index varies as a
function of wavelengths of light in a matter. The laser materials often have a
considerably large dispersion and relevant information is indispensable.
From (9.97), we find that

δn
n g  n þ ωm ð9:98Þ
δωm

plays a role of refractive index when the laser material has a wavelength dispersion.
The quantity ng is said to be a group refractive index (or group index). Thus, (9.97) is
rewritten as

πc
δω ¼ δm, ð9:99Þ
Lng

where we omitted the index m of ωm. When we need to distinguish the refractive
index n clearly from the group refractive index, we refer to n as a phase refractive
index. Rewriting (9.98) as a relation of continuous quantities and using differenti-
ation instead of variation, we have [2]

dn dn
ng ¼ n þ ω or ng ¼ n  λ : ð9:100Þ
dω dλ
9.5 Lasers 359

To derive the second equation of (9.100), we used following relations: Namely,


taking a variation of λω ¼ 2πc, we have

ω λ
ωdλ þ λdω ¼ 0 or ¼ :
dω dλ

Several formulae or relation equations were proposed to describe the wavelength


dispersion. One of famous and useful formula among them is Sellmeier’s dispersion
formula [5]. As an example, the Sellmeier’s dispersion formula can be described as
sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
B
n¼ Aþ  2 , ð9:101Þ
1  λc

where A, B, and C are appropriate constants with A and B being dimensionless and
C having a dimension [m]. In an actual case, it would be difficult to determine
n analytically. However, if we are able to obtain well-resolved spectra, δm can be put
as 1 and δωm can be determined from the free spectral range. Expressing it as δωFSR
from (9.99) we have

πc πc
δωFSR ¼ or ng ¼ : ð9:102Þ
Lng LðδωFSR Þ

Thus, one can determine ng as a function of wavelengths.

9.5.2 Organic Lasers

By virtue of prominent features of lasers and related phenomena, researchers and


engineers have been developing and proposing up to now various device structures
and their operation principles in the research field of device physics. Of these, we
have organic lasers as newly occurring laser devices. Organic crystals possess the
well-defined structureproperty relationship and some of them exhibit peculiar
light-emitting features. Therefore, those crystals are suited for studying their lasing
property. In this section we study the light-emitting properties (especially the lasing
property) of the organic crystals in relation to the wavelength dispersion of refractive
index of materials. From the point of view of device physics, another key issue lies in
designing an efficient diffraction grating or resonator.
In the following tangible examples, we further investigate specific aspects of the
light-emitting properties of the organic crystals in the slab waveguide configurations
(see Sect. 8.8) to pursue fundamental properties of the organic light-emitting mate-
rials and incorporate them into high-performance devices.
Example 9.1 [6] Figure 9.6 [6] displays a broadband emission spectra of a crystal
consisting of an organic semiconductor AC’7. As another example, Fig. 9.7 [6]
360 9 Light Quanta: Radiation and Absorption

(a) (b)

10 11
Intensity (103 counts)

Intensity (103 counts)


10
5

9
0
500 520 540 560 580 600 620 640 529 530 531
Wavelength (nm) Wavelength (nm)

Fig. 9.6 Broadband emission spectra of an organic semiconductor crystal AC’7. (a) Full spectrum.
(b) Enlarged profile of the spectrum around 530 nm. Reproduced from Yamao T, Okuda Y,
Makino Y, Hotta S (2011) Dispersion of the refractive indices of thiophene/phenylene co-oligomer
single crystals. J Appl Phys 110(5): 053113/7 pages [6], with the permission of AIP Publishing.
https://doi.org/10.1063/1.3634117

4
Intensity (103 counts)

0
522 524 526 528
Wavelength (nm)

Fig. 9.7 Laser oscillation spectrum of an organic semiconductor crystal AC’7. Reproduced from
Yamao T, Okuda Y, Makino Y, Hotta S (2011) Dispersion of the refractive indices of thiophene/
phenylene co-oligomer single crystals. J Appl Phys 110(5): 053113/7 pages [6], with the permission
of AIP Publishing. https://doi.org/10.1063/1.3634117

displays a laser oscillation spectrum of an AC’7 crystal. The structural formula of


AC’7 is shown in Fig. 9.8 together with other related organic semiconductors. Once
we choose an empirical formula of the wavelength dispersion [e.g., (9.101)], we can
determine constants of that empirical formula by comparing it with experimentally
decided data. For such data, laser oscillation spectra (Fig. 9.7) were used in addition
to the broadband emission spectra. It is because the laser oscillation spectra are
essentially identical to Fig. 9.6 in a sense that both the broadband and laser emission
lines gave the same free spectral range. Inserting (9.101) into (9.100) and expressing
ng as a function of λ, Yamao et al. got a following expression [6]:
9.5 Lasers 361

Fig. 9.8 Structural


formulae of several organic
semiconductors BP1T,
AC5, and AC’7

h  2 i2
A 1  λc þB
ng ¼ rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
h i ffirffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
h ffi: ð9:103Þ
 c 2 3  c 2 i
1 λ A 1 λ þB

Determining optimum constants A, B, and C, a set of these constants yields a reliable


dispersion formula in (9.101). Numerical calculations can be utilized effectively.
The procedures are as follows: (i) Tentatively choosing probable numbers for A, B,
and C for (9.103), ng can be expressed as a function of λ. (ii) The resulting fitting
curve is then compared with ng data experimentally decided from (9.102). After this
procedure, one can choose another set of A, and B, and C and again compare the
fitting curve with the experimental data. (iii) This procedure can be repeated many
times through iterative numerical computations of (9.103) using different sets of
A, B, and C.
Thus, we should be able to adjust and determine better and better combination of
A, B, and C so that the refined function (9.103) can reproduce the experimental
results as precise as one pleases. At the same time, we can determine the most
reliable combination of A, B, and C with the dispersion formula of (9.101). Figure 9.9
[6] shows several examples of the wavelength dispersion for organic semiconductor
crystals. Optimized constants A, B, and C of (9.101) are listed in Table 9.1 [6].
The formulae (9.101) and (9.103) along with associated procedures to determine
the constants A, B, and C are expected to apply to various laser and light-emitting
materials consisting of semiconducting inorganic and organic materials.
Example 9.2 [7] If we wish to construct a laser device, it will be highly desired to
equip the laser material with a suitable diffraction grating or resonator [8, 9]. In that
case, besides the information about the dispersion of phase refractive index, we need
numerical data of the propagation constant. The propagation constant has appeared
in Sect. 8.7.1 and is defined as
362 9 Light Quanta: Radiation and Absorption

Fig. 9.9 Examples of the


wavelength dispersion of (a) 7 (a)
group indices and (b)

Group index ng
6 BP1T AC'7
refractive indices for several
organic semiconductor AC5
crystals. Reproduced from 5
Yamao T, Okuda Y,
4
Makino Y, Hotta S (2011)
Dispersion of the refractive
3
indices of thiophene/
phenylene co-oligomer
single crystals. J Appl Phys 3.3
110(5): 053113/7 pages [6], 3.2 AC'7 (b)

Refractive index n
with the permission of AIP 3.1 BP1T
Publishing. https://doi.org/ 3
10.1063/1.3634117
2.9
2.8
2.7
2.6 AC5
2.5
500 600
Wavelength (nm)

Table 9.1 Optimized constants of A, B, and C for Sellmeier’s dispersion formula (9.101) with
several organic semiconductor crystalsa
Material A B C (nm)
BP1T 5.7 1.04 397
AC5 3.9 1.44 402
AC’7 6.0 1.06 452
a
Reproduced from Yamao T, Okuda Y, Makino Y, Hotta S (2011) Dispersion of the refractive
indices of thiophene/phenylene co-oligomer single crystals. J Appl Phys 110(5): 053113/7 pages,
with the permission of AIP Publishing. https://doi.org/10.1063/1.3634117

β ¼ k sin θ ¼ nk 0 sin θ: ð8:153Þ

As the phase refractive index n has a dispersion, the propagation constant β has a
dispersion as well. In optics, n sin θ is often referred to as an effective index. We
denote it by

neff  n sin θ or β ¼ k0 neff : ð9:104Þ

As 0  θ  π/2, we have n sin θ  n. In general, it is going to be difficult to obtain


analytical description or solution for the dispersion of both the phase refractive index
and effective index. As in Example 9.1, we usually obtain the relevant data by the
numerical calculations.
9.5 Lasers 363

(a)

P6T
(b) (c)
P6T crystal
air

optical device

AZO grating
2 µm

Fig. 9.10 Construction of an organic light-emitting device. (a) Structural formula of P6T. (b)
Diffraction grating of an organic device. The purple arrow indicates the direction of grating
wavevector K. Reproduced from Yamao T, Higashihara S, Yamashita S, Sano H, Inada Y,
Yamashita K, Ura S, Hotta S (2018) Design principle of high-performance organic single-crystal
light-emitting devices. J Appl Phys 123(23): 235501/13 pages [7], with the permission of AIP
Publishing. https://doi.org/10.1063/1.5030486. (c) Cross-section of P6T crystal/AZO substrate

Under these circumstances, to establish a design principle of high-performance


organic light-emitting devices Yamao et al. [7] have developed a light-emitting
device where an organic crystal of P6T (see structural formula of Fig. 9.10a) was
placed onto a diffraction grating (Fig. 9.10b [7]). The compound P6T is known as
one of a family of thiophene/phenylene co-oligomers (TPCOs) together with BP1T,
AC5, AC’7, etc. that appear in Fig. 9.8 [10, 11]. The diffraction grating was
engraved using a focused ion beam (FIB) apparatus on an aluminum-doped zinc
oxide (AZO) substrate. The resulting substrate was then laminated with the thin
crystal of P6T (Fig. 9.10c). Using those devices, Yamao et al. excited the P6T crystal
with ultraviolet light of a mercury lamp and collected the emissions from the crystal
and analyzed those emission spectra (vide infra). Detailed analysis of those spectra
has revealed that there is a close relationship between the peak location of emission
lines and emission direction. Thus, to analyze the angle-dependent emission spectra
turns out to be a powerful tool to accurately determine the dispersion of propagation
constant of the laser material.
In a light-emitting device, we must consider a problem of out-coupling of light.
Seeking efficient out-coupling of light is equivalent to solving equations of electro-
magnetic wave motion inside and outside the slab crystal under appropriate bound-
ary conditions. The boundary conditions are given such that tangential components
of electromagnetic fields are continuous across the interface formed by the slab
crystal and air or a device substrate. Moreover, if a diffraction grating is present in an
optical device (including the laser), the diffraction grating influences emission
364 9 Light Quanta: Radiation and Absorption

Fig. 9.11 Geometrical relationship among the light propagation in crystal (indicated with β), the
propagation outside the crystal (k0), and the grating wavevector (K). This geometry represents the
phase matching between the emission inside the device (organic slab crystal) and out-coupled
emission (i.e., the emission outside the device); see text

characteristics of the device. In other words, regarding the out-coupling of light we


must impose the following condition on the device such that [7]

β 2 mK ¼ ðk0  eeÞee, ð9:105Þ

where β is the propagation vector with |β|  β that appeared in (8.153). The quantity
β is called a propagation constant. Equation (9.105) can be visualized in Fig. 9.11 as
geometrical relationship among the light propagation in crystal (indicated with β),
the propagation outside the crystal (k0), and the grating wavevector (K ). The grating
wavevector K is defined as

jK j  K ¼ 2π=Λ,

where Λ is the grating period and the direction of K is perpendicular to the grating
grooves; that direction is indicated in Fig. 9.10b with a purple arrow. The unit vector
ee is defined as

β  mK
ee ¼ :
j β  mK j

Namely, ee is oriented in the direction parallel to β 2 mK. In (9.105), furthermore,


m is the diffraction order with a natural number (i.e., a positive integer); k0 denotes
the wavenumber vector of an emission in vacuum with

jk0 j  k0 ¼ 2π=λp , ð9:106Þ

where λp is the emission peak wavelength in vacuum. Note that k0 is almost identical
to the wavenumber of the emission in air (i.e., the emission to be detected). If we are
dealing with the emission parallel to the substrate plane, i.e., grazing emission,
(9.105) can be reduced to
9.5 Lasers 365

β 2 mK ¼ k0 : ð9:107Þ

Equations (9.105) and (9.107) represent the relationship between β and k0, i.e., the
phase matching conditions between the emission inside the device (organic slab
crystal) and the out-coupled emission, i.e., the emission outside the device (in air).
Thus, the boundary conditions can be restated as (i) the tangential component
continuity of electromagnetic fields at both the planes of interface (see Sect. 8.1) and
(ii) the phase matching of the electromagnetic fields both inside and outsides the
laser medium. We examine these two factors below.
(i) Tangential component continuity of electromagnetic fields:
Let us suppose that we are dealing with a dielectric medium with anisotropic
dielectric constant, but isotropic magnetic permeability. In this situation, Maxwell’s
equations must be formulated so that we can deal with the electromagnetic properties
in an anisotropic medium. Such substances are widely available and organic crystals
are counted as typical examples. Among those crystals, P6T crystallizes in the
monoclinic system as in many other cases of organic crystals [10, 11]. In this case,
the permittivity tensor (or electric permittivity tensor) ε is written as [7]
0 1
εaa 0 εac
B C
ε¼@ 0 εbb 0 A: ð9:108Þ
εc a 0 εc c

Notice that in (9.108) the permittivity tensor is described in the orthogonal


coordinate of abc -system, where a and b coincide with those of the crystallographic
a- and b-axes of P6T with the c -axis being perpendicular to the ab-plane. Note that
in many of organic crystals the c-axis is not perpendicular to the ab-plane. Mean-
while, we assume that the magnetic permeability μ of P6T is isotropic and identical
to that of vacuum. That is, in a tensor form we have
0 1
μ0 0 0
B C
μ¼@ 0 μ0 0 A, ð9:109Þ
0 0 μ0

where μ0 is the magnetic permeability of vacuum.


Then, the Maxwell’s equations in a matrix form read as

∂H
rot E ¼ 2 μ0 , ð9:110Þ
∂t
366 9 Light Quanta: Radiation and Absorption

Fig. 9.12 Laboratory ∗


coordinate system (the ξηζ-
system) for the device
experiments. Regarding the
symbols and notations,
see text

0 1
εaa 0 εac
B C ∂E
rot H ¼ ε0 @ 0 εbb 0 A : ð9:111Þ
∂t
εc a 0 εc c

In (9.111) the equation is described in the abc -system. But, it will be desired to
describe the Maxwell’s equations in the laboratory coordinate system (see Fig. 9.12)
so that we can readily visualize the light propagation in an anisotropic crystal such as
P6T. Figure 9.12 depicts the ξηζ-system, where the ξ-axis is in the direction of the
light propagation within the crystal, namely the ξ-axis parallels β in (9.107). We
assume that the ξηζ-system forms an orthogonal coordinate.
Here we define the dielectric constant ellipsoid e ε described by
0 1
ξ
B C
ðξ η ζ Þe
ε@ η A ¼ 1, ð9:112Þ
ζ

which represents an ellipsoid in the ξηζ-system. If one cuts the ellipsoid by a plane
that includes the origin of the coordinate system and perpendicular to the ξ-axis, its
cross-section is an ellipse. In this situation, one can choose the η- and ζ- axes for the
principal axis of the ellipse. This implies that when we put ξ ¼ 0, we must have
0 1
0
B C
ð0 η ζ Þe
ε@ η A ¼ εηη η2 þ εζζ ζ 2 ¼ 1: ð9:113Þ
ζ

In other words, we must have e


ε in the form of
9.5 Lasers 367

0 1
εξξ εξη εξζ
B C
e
ε ¼ @ εηξ εηη 0 A,
εζξ 0 εζζ

where eε is expressed in the ξηζ-system. In terms of the matrix algebra, the principal
submatrix with respect to the η- and ζ- components should be diagonalized (see Sect.
14.5). On this condition, the electric flux density of the electromagnetic wave is
polarized in the η- and ζ-direction. A half of the reciprocal square root of the
pffiffiffiffiffiffi pffiffiffiffiffiffi
principal axes, i.e., 1= εηη or 1= εζζ , represents the anisotropic refractive indices.
Suppose that the ξηζ-system is reached by two successive rotations starting from
the abc -system in such a way that we first perform the rotation by α around the c -
axis that is followed by another rotation by δ around the ξ-axis (Fig. 9.12). Then we
have [7]
0 1 0 1
εξξ εξη εξζ εaa 0 εac
B C 1 1 B C
@ εηξ εηη 0 A ¼ Rξ ðδÞRc ðαÞ@ 0 εbb 0 ARc ðαÞRξ ðδÞ, ð9:114Þ
εζξ 0 εζζ εc a 0 εc c

where Rc ðαÞ and Rξ(δ) stand for the first and second rotation matrices of the above,
respectively. In (9.114) we have
0 1 0 1
cos α  sin α 0 1 0 0
B C B C
Rc ðαÞ ¼ @ sin α cos α 0 A, Rξ ðδÞ ¼ @ 0 cos δ  sin δ A: ð9:115Þ
0 0 1 0 sin δ cos δ

With the matrix presentations and related coordinate transformations for (9.114)
and (9.115), see Sects. 11.4, 17.1, and 17.4.2. Note that the rotation angle δ cannot
be decided independent of α. In the literature [7], furthermore, the quantity εξη ¼ εηξ
was ignored as small compared to other components. As a result, (9.111) is
converted into the following equation described by
0 1
εξξ 0 εξζ
B C ∂E
rot H ¼ ε0 @ 0 εηη 0 A : ð9:116Þ
∂t
εζξ 0 εζζ

Notice here that the electromagnetic quantities H and E are measured in the ξηζ-
coordinate system.
368 9 Light Quanta: Radiation and Absorption

Meanwhile, the electromagnetic waves that are propagating within the slab
crystal are described as

Φν exp ½iðβξ  ωt Þ , ð9:117Þ

where Φν stands for either an electric field or a magnetic field with ν chosen from
ν ¼ 1, 2, 3 representing each component of the ξηζ-system. Inserting (9.117) into
(9.116) and separating the resulting equation into ξ, η, and ζ components, we obtain
six equations with respect to six components of H and E. Of these, we are partic-
ularly interested in Eξ and Eζ as well as Hη, because we assume that the electromag-
netic wave is propagated as a TM mode [7].
Using the set of the above six equations, with Hη in the P6T crystal we get the
following second-order linear differential equation (SOLDE):
 
d2 H η εξζ dH η εξζ 2 εξξ
þ 2ineff k 0 þ εξξ   n 2
k 2 H ¼ 0, ð9:118Þ
dζ 2 εζζ dζ εζζ εζζ eff 0 η

where neff ¼ n sin θ in (8.153) is the effective index. Assuming as a solution of


(9.118)
 
H η ¼ H cryst eiκζ cos ðkζ þ ϕs Þ κ 6¼ 0, k 6¼ 0, H cryst , ϕs : constant ð9:119Þ

and inserting (9.119) into (9.118), we get a following type of equation described by

Ζeiκζ cos ðkζ þ ϕs Þ þ Ωeiκζ sin ðkζ þ ϕs Þ ¼ 0: ð9:120Þ

In (9.119) Hcryst represents the magnetic field within the P6T crystal. The
constants Ζ and Ω in (9.120) can be described using κ and k together with the
dH
constant coefficients of dζη and Hη of SOLDE (9.118). Meanwhile, we have
 
 eiκζ cos ðkζ þ ϕ Þ eiκζ sin ðkζ þ ϕs Þ 
 s 2iκ
 iκζ 0 0  ¼ ke 6¼ 0,
 e cos ðkζ þ ϕs Þ eiκζ sin ðkζ þ ϕs Þ 

where the differentiation is pertinent to ζ. This shows that eiκζ cos (kζ + ϕs) and
eiκζ sin (kζ + ϕs) are linearly independent. This in turn implies that

ΖΩ0 ð9:121Þ

in (9.120). Thus, from (9.121) we can determine κ and k such that


9.5 Lasers 369

εξζ εξζ 1   
κ¼β ¼ neff k0 , k 2 ¼ 2 εξξ εζζ  εξζ 2 εζζ  neff 2 k0 2 : ð9:122Þ
εζζ εζζ εζζ

Further using the aforementioned six equations to eliminate Eζ, we obtain

εζζ
Eξ ¼ i kH eiκζ sin ðkζ þ ϕs Þ: ð9:123Þ
ωε0 ðεξξ εζζ  εξζ 2 Þ cryst

Equations (9.119) and (9.123) define the electromagnetic fields as a function of ζ


within the P6T slab crystal.
Meanwhile, the fields in the AZO substrate and air can readily be determined.
Assuming that both the substances are isotropic, we have εξζ ¼ 0 and εξξ ¼ εζζ  e
n2
and, hence, we get

d2 H η  2 
2
þ e
n  neff 2 k0 2 H η ¼ 0, ð9:124Þ

where e
n is the refractive index of either the substrate or air. Since

e
n  neff  n, ð9:125Þ

where n is the phase refractive index of the P6T crystal, we have e


n2  neff 2 < 0.
Defining a quantity
qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
γ  neff 2  e n2 k 0 ð9:126Þ

for the substrate (i.e., AZO) and air, from (9.124) we express the electromagnetic
fields as

e γζ ,
H η ¼ He ð9:127Þ

where He denotes the magnetic field relevant to the substrate or air. Considering that
ζ < 0 for the substrate and ζ > d/ cos δ in air (see Fig. 9.13), with the magnetic field
we have

H η ¼ He e γðζ cosd δÞ
e γζ and H η ¼ He ð9:128Þ

for the substrate and air, respectively. Notice that Hη ! 0, when ζ !  1 with the
substrate and ζ ! 1 for air. Correspondingly, with the electric field we get
370 9 Light Quanta: Radiation and Absorption

γ e γζ γ e γðζ cosd δÞ
E ξ ¼ i He and E ξ ¼ i He ð9:129Þ
ωε0e
n 2
ωε0e
n2

for the substrate and air, respectively.


The device geometry is characterized by that the P6T slab crystal is sandwiched
by air and the device substrate so as to form the three-layered (air/crystal/substrate)
structure (Fig. 9.10c). Therefore, we impose boundary conditions on Hη in (9.119)
and (9.128) along with Eξ in (9.123) and (9.129) in such a way that their tangential
components are continuous across the interfaces between the crystal and the sub-
strate and between the crystal and air.
Figure 9.13 represents the geometry of the cross-section of air/P6T crystal/AZO
substrate. From (9.119) and (9.128), (a) the tangential continuity condition for the
magnetic field at the crystal/substrate interface is described as

e γ0 ð cos δÞ,


H cryst eiκ0 cos ðk  0 þ ϕs Þð cos δÞ ¼ He ð9:130Þ

where the factor of cosδ comes from the requirement of the tangential continuity
condition. (b) The tangential continuity condition for the electric field at the same
interface reads as

εζζ γ e γ0
i kH eiκ0 sin ðk  0 þ ϕs Þ ¼ i He : ð9:131Þ
ωε0 ðεξξ εζζ  εξζ 2 Þ cryst ωε0e
n2

Thus, dividing both sides of (9.131) by both sides of (9.130), we get

εζζ γ εζζ γ
k tan ϕs ¼  2 or k tan ðϕs Þ ¼ 2 , ð9:132Þ
εξξ εζζ  εξζ 2 e
n ε ξξ ε ζζ  ε ξζ
2
e
n

where γ and e n are substrate related quantities; see (9.125) and (9.126). Another
boundary condition at the air/crystal interface can be obtained in a similar manner.
Notice that in that case ζ ¼ d/ cos δ; see Fig. 9.13. As a result, we have

Fig. 9.13 Geometry of the ∗

cross-section of air/P6T
crystal/AZO substrate. The
angle δ is identical with that
of Fig. 9.12. The points air
P and P0 are located at the
crystal/substrate interface
and air/crystal interface, P6T crystal
respectively. The distance
between P and P0 is d/ cos δ

AZO substrate
9.5 Lasers 371

εζζ kd γ
k tan þ ϕs ¼ , ð9:133Þ
εξξ εζζ  εξζ 2 cos δ e
n2

where γ and e n are air related quantities. The quantity ϕs can be eliminated
considering arc tangent of (9.132) and (9.133). In combination with (9.122), we
finally get the following eigenvalue equation for TM modes:
rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
1
k0 d ðε ε  εξζ 2 Þðεζζ  neff 2 Þ=ð cos δÞ
εζζ 2 ξξ ζζ
" sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi#
1 1 n 2  nair 2
¼ lπ þ tan ðεξξ εζζ  εξζ 2 Þ eff
nair 2 εζζ  neff 2
" sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi#
1 1 n 2  nsub 2
þ tan ðεξξ εζζ  εξζ 2 Þ eff , ð9:134Þ
nsub 2 εζζ  neff 2

where nair (¼ 1) is the refractive index of air, nsub is that of AZO substrate equipped
with a diffraction grating, d is the crystal thickness, and l is the order of the transverse
mode. The refractive index nsub is estimated as the average of the refractive indices
of air and AZO weighted by volume fraction they occupy in the diffraction grating.
To solve (9.134), iterative numerical computation was needed. The detailed calcu-
lation procedures for this can be seen in the literature [7] and supplementary material
therein.
Note that (9.134) includes the well-known eigenvalue equation for a waveguide
that comprises an isotropic core dielectric medium sandwiched by a couple of clad
layers having the same refractive index (symmetric waveguide) [12]. In that case, in
fact, the second and third terms in RHS of (9.134) are the same and their sum is
identical with δTM of (8.176) in Sect. 8.7.2. To confirm it, in (9.134) use
εξξ ¼ εζζ  n2 [see (7.57)] and εξζ ¼ 0 together with the relative refractive index
given by (8.39).
(ii) Phase matching of the electromagnetic fields:
Once we have obtained the eigenvalue equation described by (9.134), we will be
able to solve the problem and, hence, to design a high-performance optical device,
especially the laser. To this end, let us return back to (9.107). Taking inner products
(see Chap. 13) of both sides of (9.107), we have

hβ 2 mKjβ 2 mK i ¼ hk0 jk0 i:

That is, we get


372 9 Light Quanta: Radiation and Absorption

β2 þ m2 K 2  2mKβ cos φ ¼ k0 2 , ð9:135Þ

where φ is an angle between mK and β (see Fig. 9.11). Inserting (9.104) into (9.135)
and solving the quadratic equation, we obtain neff as a function of k0 (or emission
wavelength). This implies that we can determine neff from the experimental data.
Yet, the above process does not mean that we can determine neff uniquely. It is
because there is still room for choice of a positive integer m. Using (9.125) that
restricts the values neff, however, we can decide the most probable integer m. Thus,
we can compare neff obtained from (9.134) with neff determined from the
experiments.
As described above, we have explained two major factors of the out-coupling of
emission. In the above discussion, we assumed that the TM modes dominate.
Combining (9.107) and (9.134), we successfully assign individual emissions to the
specific TMml modes, in which the indices m and l are longitudinal and transverse
modes with m associated with the grating. Note that m 1 and l 0. With the index
l, l ¼ 0 is allowed because the total internal reflection (Sects. 8.5 and 8.6) is
responsible.
Figure 9.14a [7] shows an example of the angle-dependent emission spectra. For
this, the grazing emission was intended. There are two series of progression in which
the emission peak locations are either redshifted or blueshifted with increasing

(a) (b)
4
(×10 counts)
Intensity

2
3

90
ion angle (degree)

60

30
RotatAngle

−30

−60

−90
600 700 800
Wavelength (nm)

Fig. 9.14 Angle-dependent emission spectra of a P6T crystal. (a) Emission spectra. Reproduced
from Yamao T, Higashihara S, Yamashita S, Sano H, Inada Y, Yamashita K, Ura S, Hotta S (2018)
Design principle of high-performance organic single-crystal light-emitting devices. J Appl Phys
123(23): 235501/13 pages [7], with the permission of AIP Publishing. https://doi.org/10.1063/1.
5030486. (b) The grazing emissions (k0) were collected at various angles θ made with the grating
wavevector direction (K)
9.5 Lasers 373

Refractive index, effective index


(2, 0, b)

(2, 1, b) (m, l, s) = (1, 0, r)

2
(1, 1, r)

600 700 800


Peak wavelength (nm)

Fig. 9.15 Wavelength dispersion of effective indices neff. The open and closed symbols show the
data pertinent to the blueshifted and redshifted peaks, respectively. These data were calculated from
either (9.105) or (9.107). The colored solid curves represent the dispersions of the effective
refractive indices computed from (9.134). The numbers and characters (m, l, s) denote the diffrac-
tion order (m), order of transverse mode (l ), and shift direction (s) that distinguishes between
blueshift (b) and redshift (r). A black solid curve at the upper right indicates the wavelength
dispersion of the phase refractive indices (n) of the P6T crystal related to the one principal axis.
Reproduced from Yamao T, Higashihara S, Yamashita S, Sano H, Inada Y, Yamashita K, Ura S,
Hotta S (2018) Design principle of high-performance organic single-crystal light-emitting devices. J
Appl Phys 123(23): 235501/13 pages [7], with the permission of AIP Publishing. https://doi.org/10.
1063/1.5030486

angles jθj. Figure 9.15 [7] compares the effective indices determined by the exper-
iments and numerical computations based on (9.134). The experimentally observed
emission peaks are indicated with and open circles (or open triangles) or closed
circles (or closed triangles) with blueshifted or redshifted peaks, respectively. The
color solid curves represent the results obtained from (9.134).
As clearly recognized in Fig. 9.15 [7], the results of the experiments and com-
putations are satisfactory. In particular, TM20 (blueshifted) and TM10 (redshifted)
modes are seamlessly jointed so as to be an upper group of emission peaks. A lower
group of emission peaks are assigned to TM21 (blueshifted) or TM11 (redshifted)
modes. Notice here that these modes may have multiple values of neff according to
different εξξ, εζζ, and εξζ that vary with the different propagation directions of light
(i.e., the ξ-axis) within the slab crystal waveguide.
From a practical point of view, it is desired to make a device so that it can strongly
emit light in the direction parallel to the grating wavevector. In that case, (9.107) can
be rewritten as

β ¼ ðk0 þ mK Þee,
374 9 Light Quanta: Radiation and Absorption

where all the vectors β, K, and k0 are parallel to ee. Then, we have two choices such
that

2π mK
β ¼ k0 þ mK ¼ neff k 0 , k 0 ¼ ¼ ;
λp neff  1
2π mK
β ¼ k0  mK ¼ neff k0 , k0 ¼ ¼ : ð9:136Þ
λp neff þ 1

The first and second equations of (9.136) are associated with the spectra of
redshifted and blueshifted progressions, respectively. Further rewriting (9.136) to
combine the two equations, we get

2π ðneff  1Þ ðneff  1ÞΛ


λp ¼ ¼ , ð9:137Þ
mK m

where we used the relation of K ¼ 2π/Λ.


From (9.134) we find that neff implicitly depends on l and d (crystal thickness).
Thus, (9.137) tells us that if we choose m, l, and d appropriately, we can determine
the most probable neff. Then, choosing the grating period Λ appropriately in turn, we
may predict the emission peak wavelength λp. The other way around, designating λp
properly, we can decide Λ. In fact, λp can experimentally be determined from an
emission color (or maximum of emission gain) inherent to the chemical species of
organic crystals. For example, a maximum of emission gain of P6T is located around
660 nm, as can be seen from Fig. 9.14a [7]. In combination with (9.134), (9.137)
finally allows us to choose the optimum combination of the device parameters m, l,
d, Λ, and λp.
When the laser oscillation is taking place, (9.136) and (9.137) should be replaced
with the following expressions that represent the Bragg’s condition [13, 14]:

2β ¼ mK ð9:138Þ

or

2neff Λ
λp ¼ : ð9:139Þ
m

As discussed above in detail, Examples 9.1 and 9.2 enable us to design a high-
performance laser device using an organic crystal that is characterized by a pretty
complicated crystal structure associated with anisotropic refractive indices. The
abovementioned design principle enables one to construct effective laser devices
that consist of light-emitting materials either organic or inorganic more widely. At
the same time, these examples are expected to provide an effective methodology in
interdisciplinary fields encompassing solid-state physics and solid-state chemistry as
well as device physics.
9.6 Mechanical System 375

9.6 Mechanical System

As outlined above, two-level atoms have distinct characteristics in connection with


lasers. Electromagnetic waves similarly confined within a one-dimensional cavity
exhibit related properties and above all have many features in common with a
harmonic oscillator [15].
We have already described several features and properties of the harmonic
oscillator (Chap. 2). Meanwhile, we have briefly discussed formation of electromag-
netic stationary waves (Chap. 8). There are several resemblance and correspondence
between the harmonic oscillator and electromagnetic stationary waves when we
view them as mechanical systems. The point is that in a harmonic oscillator the
position and momentum are in-quadrature relationship; i.e., their phase difference is
π/2. For the electromagnetic stationary waves, electric field and magnetic field are
in-quadrature relationship as well.
In Chap. 8, we examine the conditions under which stationary waves are formed.
In a dielectric medium both sides of which are equipped with metal layers
(or mirrors), the electric field is described as

E ¼ 2E 1 εe sin ωt sin kz: ð8:201Þ

In (8.201), we assumed that forward and backward electromagnetic waves are


propagating in the direction of the z-axis. Here, we assume that the interfaces
(or walls) are positioned at z ¼ 0 and z ¼ L. Within a domain [0, L], the two
waves form a stationary wave. Since this expression assumed two waves, the
electromagnetic energy was doubled. To normalize the energy sopthat ffiffiffi a single
wave is contained, the amplitude E1 of (8.201) should be divides by 2. Therefore,
we think of a following description for E:
pffiffiffi pffiffiffi
E¼ 2Ee1 sin ωt sin kz or Ex ¼ 2E sin ωt sin kz, ð9:140Þ

where we designated the polarization vector as a direction of the x-axis. At the same
time, we omitted the index from the amplitude. Thus, from the second equation of
(7.65) we have
pffiffiffi
∂H y 1 ∂E x 2Ek
¼ ¼ sin ωt cos kz: ð9:141Þ
∂t μ ∂z μ

Note that this equation appears in the second equation of (8.131) as well. Integrating
both sides of (9.141), we get
pffiffiffi
2Ek
Hy ¼ cos ωt cos kz:
μω

Using a relation ω ¼ vk, we have


376 9 Light Quanta: Radiation and Absorption

pffiffiffi
2E
Hy ¼ cos ωt cos kz þ C,
μv

where v is a light velocity in the dielectric medium and C is an integration constant.


Removing C and putting

E
H ,
μv

we have
pffiffiffi
Hy ¼ 2H cos ωt cos kz:

Using a vector expression, we have


pffiffiffi
H ¼ e2 2H cos ωt cos kz:

Thus, E (ke1), H (ke2), and n (ke3) form the right-handed system in this order. As
noted in Sect. 8.8, at the interface (or wall) the electric field and magnetic field form
nodes and antinodes, respectively. Namely, the two fields are in-quadrature.
Let us calculate electromagnetic energy of the dielectric medium within a cavity.
In the present case the cavity is meant as the dielectric sandwiched by a couple of
metal layer. We have

W ¼ W e þ W m, ð9:142Þ

where W is the total electromagnetic energy; We and Wm are electric and magnetic
energies, respectively. Let the length of the cavity be L. Then the energy per unit
cross-section area is described as
Z L Z L
ε μ
We ¼ E dz and W m ¼
2
H 2 dz: ð9:143Þ
2 0 2 0

Performing integration, we get

ε
W e ¼ LE 2 sin2 ωt,
2
2
μ μ E ε
W m ¼ LH 2 cos 2 ωt ¼ L cos 2 ωt ¼ LE 2 cos 2 ωt, ð9:144Þ
2 2 μv 2

where we used 1/v2 ¼ εμ with the last equality. Thus, we have


9.6 Mechanical System 377

ε μ
W ¼ LE 2 ¼ LH 2 : ð9:145Þ
2 2
fe , W
Representing an energy per unit volume as W g e
m , and W, we have

fe ¼ ε E 2 sin2 ωt, W
W g ε 2 2 e ε 2 μ 2
m ¼ E cos ωt, W ¼ E ¼ H : ð9:146Þ
2 2 2 2

In Chap. 2, we treated motion of a harmonic oscillator. There, we had

v0
xð t Þ ¼ sin ωt ¼ x0 sin ωt: ð2:7Þ
ω

Here, we have defined an amplitude of the harmonic oscillation as x0 (>0)

v0
x0  :
ω

Then, momentum is described as

pðt Þ ¼ mxð_t Þ ¼ mωx0 cos ωt:

Defining

p0  mωx0 ,

we have

pðt Þ ¼ p0 cos ωt:

Let a kinetic energy and potential energy of the oscillator be K and V, respec-
tively. Then, we have

1 1 2
K¼ pðt Þ2 ¼ p cos 2 ωtxðt Þ,
2m 2m 0
1 1
V ¼ mω2 xðt Þ2 ¼ mω2 x0 2 sin2 ωtxðt Þ,
2 2
1 2 1 1
W ¼KþV ¼ p ¼ mv0 2 ¼ mω2 x0 2 : ð9:147Þ
2m 0 2 2

Comparing (9.146) and (9.147), we recognize the following relationship in energy


between the electromagnetic fields and harmonic oscillator motion [15]:
pffiffiffiffi pffiffiffi pffiffiffiffi pffiffiffi
mωx0 ⟷ εE and p0 = m⟷ μH: ð9:148Þ
378 9 Light Quanta: Radiation and Absorption

Thus, there is an elegant contradistinction between the dynamics of electromagnetic


fields in cavity and motion of a harmonic oscillator. In fact, quantum electromag-
netism is based upon the treatment of a quantum harmonic oscillator introduced in
Chap. 2.

References

1. Moore WJ (1955) Physical chemistry, 3rd edn. Prentice-Hall, Englewood Cliffs


2. Smith FG, King TA, Wilkins D (2007) Optics and photonics, 2nd edn. Wiley, Chichester
3. Sunakawa S (1965) Theoretical electromagnetism. Kinokuniya, Tokyo. (in Japanese)
4. Loudon R (2000) The quantum theory of light, 3rd edn. Oxford University Press, Oxford
5. Born M, Wolf E (2005) Principles of optics, 7th edn. Cambridge University Press, Cambridge
6. Yamao T, Okuda Y, Makino Y, Hotta S (2011) Dispersion of the refractive indices of
thiophene/phenylene co-oligomer single crystals. J Appl Phys 110(5):053113
7. Yamao T, Higashihara S, Yamashita S, Sano H, Inada Y, Yamashita K, Ura S, Hotta S (2018)
Design principle of high-performance organic single-crystal light-emitting devices. J Appl Phys
123(23):235501
8. Siegman AE (1986) Lasers. University Science Books, Sausalito
9. Palmer C (2005) Diffraction grating handbook, 6th edn. Newport Corporation, New York
10. Hotta S, Yamao T (2011) The thiophene/phenylene co-oligomers: exotic molecular semicon-
ductors integrating high-performance electronic and optical functionalities. J Mater Chem 21
(5):1295–1304
11. Hotta S, Goto M, Azumi R, Inoue M, Ichikawa M, Taniguchi Y (2004) Crystal structures of
thiophene/phenylene co-oligomers with different molecular shapes. Chem Mater 16
(2):237–241
12. Yariv A, Yeh P (2003) Optical waves in crystals: propagation and control of laser radiation.
Wiley, New York
13. Pain HJ (2005) The physics of vibrations and waves, 6th edn. Wiley, Chichester
14. Inada Y, Kawata Y, Kawai T, Hotta S, Yamao T (2020) Laser oscillation at an intended
wavelength from a distributed feedback structure on anisotropic organic crystals. J Appl Phys
127(5):053102
15. Fox M (2006) Quantum Optics. Oxford University Press, Oxford
Chapter 10
Introductory Green’s Functions

In this chapter, we deal with various properties and characteristics of differential


equations, especially first-order linear differential equations (FOLDEs) and second-
order linear differential equations (SOLDEs). These differential equations are char-
acterized by differential operators and boundary conditions (BCs). Of these, differ-
ential operators appearing in SOLDEs are particularly important. Under appropriate
conditions, the said operators can be converted to Hermitian operators. The SOLDEs
associated to classical orthogonal polynomials play a central role in many fields of
mathematical physics including quantum mechanics and electromagnetism. We
study the general principle of SOLDEs in relation to several specific SOLDEs we
have studied in Part I and examine general features of an eigenvalue problem and an
initial-value problem (IVP). In this context, Green’s functions provide a powerful
tool for solving SOLDEs. For a practical purpose, we deal with actual construction
of Green’s functions. In Sect. 8.8, we dealt with steady-state characteristics of
electromagnetic waves in dielectrics in terms of propagation, reflection, and trans-
mission. When we consider transient characteristics of electromagnetic and optical
phenomena, we often need to deal with SOLDEs having constant coefficients. This
is well known in connection with a motion of a damped harmonic oscillator. In the
latter part of this chapter, we treat the initial value problem of a SOLDE of this type.

10.1 Second-Order Linear Differential Equations


(SOLDEs)

A general form of n-th order linear differential equations has the following form:

dn u dn1 u du
a n ð xÞ n þ an1 ðxÞ n1 þ    þ a1 ðxÞ þ a0 ðxÞu ¼ d ðxÞ: ð10:1Þ
dx dx dx

© Springer Nature Singapore Pte Ltd. 2020 379


S. Hotta, Mathematical Physical Chemistry,
https://doi.org/10.1007/978-981-15-2225-3_10
380 10 Introductory Green’s Functions

Table 10.1 Characteristics of SOLDEs


Type I Type II Type III Type IV
Equation Homogeneous Homogeneous Inhomogeneous Inhomogeneous
Boundary conditions Homogeneous Inhomogeneous Homogeneous Inhomogeneous

If d(x) ¼ 0, the differential equation is said to be homogeneous; otherwise it is called


inhomogeneous. Equation (10.1) is a linear function of u and its derivatives.
Likewise, we have a SOLDE such that

d2 u du
að x Þ þ bðxÞ þ cðxÞu ¼ dðxÞ: ð10:2Þ
dx2 dx

In (10.2) we assume that the variable x is real. The equation can be solved under
appropriate boundary conditions (BCs). A general form of BCs is described as
 
du  du
B1 ðuÞ ¼ α1 uðaÞ þ β1 x¼a þ γ 1 uðbÞ þ δ1  ¼ σ1, ð10:3Þ
dx dx x¼b
 
du  du
B2 ðuÞ ¼ α2 uðaÞ þ β2 x¼a þ γ 2 uðbÞ þ δ2  ¼ σ2, ð10:4Þ
dx dx x¼b

where α1, β1, γ 1, δ1 σ 1, etc. are real constants; u(x) is defined in an interval [a, b],
where a and b can be infinity (i.e., 1). The LHS of B1(u) and B2(u) are referred to
as boundary functionals [1, 2]. If σ 1 ¼ σ 2 ¼ 0, the BCs are called homogeneous;
otherwise the BCs are said to be inhomogeneous. In combination with the inhomo-
geneous equation expressed as (10.2), Table 10.1 summarizes characteristics of
SOLDEs. We have four types of SOLDEs according to homogeneity and inhomo-
geneity of equations and BCs.
Even though SOLDEs are mathematically tractable, yet it is not easy necessarily
to solve them depending upon the nature of a(x), b(x), and c(x) of (8.2). Nonetheless,
if those functions are constant coefficients, it can readily be solved. We will deal with
SOLDEs of that type in great deal later. Suppose that we find two linearly indepen-
dent solutions u1(x) and u2(x) of a following homogeneous equation:

d2 u du
að x Þ þ bðxÞ þ cðxÞu ¼ 0: ð10:5Þ
dx2 dx

Then, any solution u(x) of (10.5) can be expressed as their linear combination such
that

uðxÞ ¼ c1 u1 ðxÞ þ c2 u2 ðxÞ, ð10:6Þ

where c1 and c2 are some arbitrary constants.


In general, suppose that there are arbitrarily chosen n functions; i.e., f1(x), f2(x),   ,
fn(x). Suppose a following equation with those functions:
10.1 Second-Order Linear Differential Equations (SOLDEs) 381

a1 f 1 ðxÞ þ a2 f 2 ðxÞ þ    þ an f n ðxÞ ¼ 0, ð10:7Þ

where a1, a2,   , and an are constants. If a1 ¼ a2 ¼    ¼ an ¼ 0, (10.7) always


holds. In this case, (10.7) is said to be a trivial linear relation. If f1(x), f2(x),   , and
fn(x) satisfy a nontrivial linear relation, f1(x), f2(x),   , and fn(x) are said to be linearly
dependent. That is, the nontrivial expression means that in (10.7) at least one of a1,
a2,   , and an is nonzero. Suppose that an 6¼ 0. Then, from (10.7), fn(x) is expressed
as

a1 a a
f n ð xÞ ¼  f ðxÞ  2 f 2 ðxÞ      n1 f n1 ðxÞ: ð10:8Þ
an 1 an an

If f1(x), f2(x),   , and fn(x) are not linearly dependent, they are called linearly
independent. In other words, the statement that f1(x), f2(x),   , and fn(x) are linearly
independent is equivalent to that (10.7) holds if and only if a1 ¼ a2 ¼    ¼ an ¼ 0.
We will have relevant discussion in Part III.
Now suppose that with the above two linearly independent functions u1(x) and
u2(x), we have

a1 u1 ðxÞ þ a2 u2 ðxÞ ¼ 0: ð10:9Þ

Differentiating (10.9), we have

du1 ðxÞ du ðxÞ


a1 þ a2 2 ¼ 0: ð10:10Þ
dx dx

Expressing (10.9) and (10.10) in a matrix form, we get


0 1
u1 ð x Þ u2 ðxÞ  a 
@ du ðxÞ du ðxÞ A 1 ¼ 0: ð10:11Þ
1 2
a2
dx dx

Thus, that u1(x) and u2(x) are linearly independent is equivalent to that the following
expression holds:
 
 u1 ð x Þ u2 ðxÞ 

 du ðxÞ du ðxÞ   W ðu1 , u2 Þ 6¼ 0, ð10:12Þ
 1 2 
 
dx dx

where W(u1, u2) is called Wronskian of u1(x) and u2(x). In fact, if W(u1, u2) ¼ 0, then
we have

du2 du
u1  u2 1 ¼ 0: ð10:13Þ
dx dx
382 10 Introductory Green’s Functions

This implies that there is a functional relationship between u1 and u2. In fact, if we
can express as u2 ¼ u2(u1(x)), then du
dx ¼ du1 dx . That is,
2 du2 du1

du2 du du du du du du
u1 ¼ u1 2 1 ¼ u2 1 or u1 2 ¼ u2 or 2 ¼ 1 , ð10:14Þ
dx du1 dx dx du1 u2 u1

where the second equality of the first equation comes from (10.13). The third
equation can easily be integrated to yield

u2 u
ln ¼ c or 2 ¼ ec or u2 ¼ ec u1 : ð10:15Þ
u1 u1

Equation (10.15) shows that u1(x) and u2(x) are linearly dependent. It is easy to show
if u1(x) and u2(x) are linearly dependent, W(u1, u2) ¼ 0. Thus, we have a following
statement:

Two functions are linearly dependent: , W ðu1 , u2 Þ ¼ 0:

Then, as the contraposition of this statement we have

Two functions are linearly independent: , W ðu1 , u2 Þ 6¼ 0:

On the other hand, suppose that we have another solution u3(x) of (10.5) besides u1(x) and u2(x). Then, we have

a(x) d²u1/dx² + b(x) du1/dx + c(x)u1 = 0,
a(x) d²u2/dx² + b(x) du2/dx + c(x)u2 = 0,
a(x) d²u3/dx² + b(x) du3/dx + c(x)u3 = 0.   (10.16)

Again rewriting (10.16) in matrix form, we have

( d²u1/dx²   du1/dx   u1 ) ( a )
( d²u2/dx²   du2/dx   u2 ) ( b ) = 0.   (10.17)
( d²u3/dx²   du3/dx   u3 ) ( c )

A necessary and sufficient condition to obtain a nontrivial solution (i.e., a solution besides a = b = c = 0) is that [1]

| d²u1/dx²   du1/dx   u1 |
| d²u2/dx²   du2/dx   u2 | = 0.   (10.18)
| d²u3/dx²   du3/dx   u3 |

Note here that

| d²u1/dx²   du1/dx   u1 |       | u1         u2         u3        |
| d²u2/dx²   du2/dx   u2 |  = −  | du1/dx     du2/dx     du3/dx    |  ≡ −W(u1, u2, u3),   (10.19)
| d²u3/dx²   du3/dx   u3 |       | d²u1/dx²   d²u2/dx²   d²u3/dx²  |

where W(u1, u2, u3) is the Wronskian of u1(x), u2(x), and u3(x). In the above relation, we used the facts that the determinant of a matrix is identical to that of its transpose and that the determinant of a matrix changes sign upon permutation of row vectors. In short, a necessary and sufficient condition for a nontrivial solution is that W(u1, u2, u3) vanishes.
This implies that u1(x), u2(x), and u3(x) are linearly dependent. However, we have assumed that u1(x) and u2(x) are linearly independent, and so (10.18) and (10.19) mean that u3(x) must be described as a linear combination of u1(x) and u2(x). That is, there is no third linearly independent solution. Consequently, the general solution of (10.5) must be given by (10.6). In this sense, u1(x) and u2(x) are said to be a fundamental set of solutions of (10.5).
Next, let us consider the inhomogeneous equation (10.2). Suppose that up(x) is a particular solution of (10.2), and introduce a function v(x) such that

u(x) = v(x) + up(x).   (10.20)

Substituting (10.20) into (10.2), we have

a(x) d²v/dx² + b(x) dv/dx + c(x)v + a(x) d²up/dx² + b(x) dup/dx + c(x)up = d(x).   (10.21)

Therefore, we have

a(x) d²v/dx² + b(x) dv/dx + c(x)v = 0.

But v(x) can be described by a linear combination of u1(x) and u2(x) as in the case of (10.6). Hence, the general solution of (10.2) should be expressed as

u(x) = c1u1(x) + c2u2(x) + up(x),   (10.22)

where c1 and c2 are arbitrary (complex) constants.
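
The structure of (10.22) can be reproduced with a computer algebra system. Below is a minimal sketch, assuming SymPy; the sample equation u″ + u = 1 is an arbitrary choice for illustration.

import sympy as sp

x = sp.symbols('x')
u = sp.Function('u')

# General solution of u'' + u = 1: homogeneous part + particular solution, cf. (10.22)
sol = sp.dsolve(sp.Eq(u(x).diff(x, 2) + u(x), 1), u(x))
print(sol)  # Eq(u(x), C1*sin(x) + C2*cos(x) + 1)

Here u1 = sin x and u2 = cos x form a fundamental set, while up(x) = 1 is a particular solution.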

10.2 First-Order Linear Differential Equations (FOLDEs)

In the discussion that follows, first-order linear differential equations (FOLDEs) supply us with useful information. A general form of FOLDEs is expressed as

a(x) du/dx + b(x)u = d(x)   [a(x) ≠ 0].   (10.23)

An associated boundary condition is given by a boundary functional B(u) such that

B(u) = αu(a) + βu(b) = σ,   (10.24)

where α, β, and σ are real constants; u(x) is defined in an interval [a, b]. If in (10.23) d(x) ≡ 0, (10.23) can readily be integrated to yield a solution. In the general case, let us multiply both sides of (10.23) by w(x). Then we have

w(x)a(x) du/dx + w(x)b(x)u = w(x)d(x).   (10.25)

We define p(x) as

p(x) ≡ w(x)a(x),   (10.26)

where w(x) is called a weight function. As mentioned in Sect. 2.3, the weight function is a real and non-negative function within the domain considered. Here we suppose that

dp(x)/dx = w(x)b(x).   (10.27)

Then, (10.25) can be rewritten as

d/dx [p(x)u] = w(x)d(x).   (10.28)

Thus, we can immediately integrate (10.28) to obtain the solution

u = (1/p(x)) [ ∫^x w(x′)d(x′)dx′ + C ],   (10.29)

where C is an arbitrary integration constant.
To seek w(x), from (10.26) and (10.27) we have

p′ = (wa)′ = wb = wa (b/a).   (10.30)

This can easily be integrated for wa to be expressed as

wa = C′ exp( ∫ (b/a) dx ) or w = (C′/a) exp( ∫ (b/a) dx ),   (10.31)

where C′ is an arbitrary integration constant. The quantity C′/a must be non-negative so that w can be non-negative.
Example 10.1 Let us think of the following FOLDE within an interval [a, b]; i.e., a ≤ x ≤ b:

du/dx + xu = x.   (10.32)

A boundary condition is set such that

u(a) = σ.   (10.33)

Notice that (10.33) is obtained by setting α = 1 and β = 0 in (10.24). Following the above argument, we obtain a solution described as

u = (1/p(x)) [ ∫_a^x w(x′)d(x′)dx′ + u(a)p(a) ].   (10.34)

Also, we have

p(x) = w(x) = exp( ∫_a^x x′ dx′ ) = exp[ (x² − a²)/2 ].   (10.35)

The integration on RHS of (10.34) can be performed as follows:

∫_a^x w(x′)d(x′)dx′ = ∫_a^x x′ exp[ (x′² − a²)/2 ] dx′
  = exp(−a²/2) ∫_{a²/2}^{x²/2} e^t dt
  = exp(−a²/2) [ exp(x²/2) − exp(a²/2) ]
  = exp(−a²/2) exp(x²/2) − 1 = p(x) − 1,   (10.36)

where with the second equality we used integration by the substitution x′²/2 ⟶ t. Considering (10.33) and noting that p(a) = 1, (10.34) is rewritten as

u = (1/p(x)) [ p(x) − 1 + σ ] = 1 + (σ − 1)/p(x)
  = 1 + (σ − 1) exp[ (a² − x²)/2 ],   (10.37)

where with the last equality we used (10.35).
Notice that when σ = 1, u(x) ≡ 1. This is because u(x) ≡ 1 is certainly a solution of (10.32) and satisfies the BC (10.33). Uniqueness of the solution imposes this strict condition upon (10.37).
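
The closed form (10.37) is easy to cross-check numerically. Below is a minimal sketch, assuming NumPy and SciPy; the values a = 0 and σ = 2 are arbitrary test choices, not taken from the text.

import numpy as np
from scipy.integrate import solve_ivp

a, sigma = 0.0, 2.0
xs = np.linspace(a, a + 2.0, 201)

# Direct integration of (10.32): du/dx = x - x u, with u(a) = sigma
num = solve_ivp(lambda t, u: t - t * u, (xs[0], xs[-1]), [sigma],
                t_eval=xs, rtol=1e-10, atol=1e-12).y[0]

# Closed form (10.37): u = 1 + (sigma - 1) exp[(a^2 - x^2)/2]
exact = 1.0 + (sigma - 1.0) * np.exp((a**2 - xs**2) / 2.0)

print(np.max(np.abs(num - exact)))  # small; agreement within the solver tolerance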
In the next two examples, we discuss more general aspects.
Example 10.2 Let us consider the following differential operator:

Lx = d/dx.   (10.38)

We think of the following identity obtained by integration by parts:

∫_a^b dx φ* (dψ/dx) + ∫_a^b dx (dφ/dx)* ψ = [φ*ψ]_a^b.   (10.39)

Rewriting this, we get

∫_a^b dx φ* (dψ/dx) − ∫_a^b dx (−dφ/dx)* ψ = [φ*ψ]_a^b.   (10.40)

Looking at (10.40), we notice that LHS comprises a difference between two integrals, while RHS, referred to as a boundary term (or surface term), does not contain an integral.
Recalling the expression (1.128) and defining

L̃x ≡ −d/dx

in (10.40), we have

⟨φ|Lxψ⟩ − ⟨L̃xφ|ψ⟩ = [φ*ψ]_a^b.   (10.41)

Here, RHS of (10.41) needs to vanish so that we can have

⟨φ|Lxψ⟩ = ⟨L̃xφ|ψ⟩.   (10.42)

Meanwhile, adopting the expression (1.112) with respect to an adjoint operator, we have the following expression:

⟨φ|Lxψ⟩ = ⟨Lx†φ|ψ⟩.   (10.43)

Comparing (10.42) and (10.43) and considering that φ and ψ are arbitrary functions, we have L̃x = Lx†. Thus, as an operator adjoint to Lx we get

Lx† = L̃x = −d/dx = −Lx.

Notice that only if the surface term vanishes can the adjoint operator Lx† be appropriately defined. We will encounter a similar expression again in Part III.
We add that if for an operator A we have a relation described by

A† = −A,   (10.44)

the operator A is said to be anti-Hermitian. We have already encountered such an operator in Sect. 1.5.
Let us then examine on what condition the surface term vanishes. The RHS of (10.40) and (10.41) is given by

φ*(b)ψ(b) − φ*(a)ψ(a).

For this term to vanish, we should have

φ*(b)ψ(b) = φ*(a)ψ(a) or φ*(b)/φ*(a) = ψ(a)/ψ(b).

If ψ(b) = 2ψ(a), then we should have φ(b) = (1/2)φ(a) for the surface term to vanish. Recalling (10.24), the above conditions are expressed as

B(ψ) = 2ψ(a) − ψ(b) = 0,   (10.45)

B′(φ) = (1/2)φ(a) − φ(b) = 0.   (10.46)

The boundary functional B′(φ) is said to be adjoint to B(ψ). The two boundary functionals are admittedly different. If, however, we set ψ(b) = ψ(a), then we should have φ(b) = φ(a) for the surface term to vanish. That is,

B(ψ) = ψ(a) − ψ(b) = 0,   (10.47)

B′(φ) = φ(a) − φ(b) = 0.   (10.48)

Thus, the two functionals are identical, and ψ and φ satisfy the same homogeneous BCs with respect to these functionals.
As discussed above, a FOLDE is characterized by its differential operator as well as by a BC (or boundary functional). The same is true of SOLDEs.
Example 10.3 Next, let us consider the following differential operator:

Lx = (1/i) d/dx.   (10.49)

As in the case of Example 10.2, we have

∫_a^b dx φ* [(1/i) dψ/dx] − ∫_a^b dx [(1/i) dφ/dx]* ψ = (1/i)[φ*ψ]_a^b.   (10.50)

Also rewriting (10.50) using the inner product notation, we get

⟨φ|Lxψ⟩ − ⟨Lxφ|ψ⟩ = (1/i)[φ*ψ]_a^b.   (10.51)

Apart from the factor 1/i, RHS of (10.51) is again given by

φ*(b)ψ(b) − φ*(a)ψ(a).

Repeating a discussion similar to that of Example 10.2, when the surface term vanishes we get

⟨φ|Lxψ⟩ = ⟨Lxφ|ψ⟩.   (10.52)

Comparing (10.43) and (10.52), we have

⟨Lxφ|ψ⟩ = ⟨Lx†φ|ψ⟩.   (10.53)

Again considering that φ and ψ are arbitrary functions, we get

Lx† = Lx.   (10.54)

As in (10.54), if a differential operator is identical to its adjoint operator, the operator is called self-adjoint. On the basis of (1.119), Lx would apparently be Hermitian. However, we have to be careful in asserting that Lx is Hermitian. For a differential operator to be Hermitian, (i) the operator must be self-adjoint, and (ii) the two boundary functionals adjoint to each other must be identical. In other words, ψ and φ must satisfy the same homogeneous BCs with respect to these functionals. In this example, we must have the same boundary functionals as those described by (10.47) and (10.48). If and only if conditions (i) and (ii) are satisfied is the operator said to be Hermitian. This may seem a somewhat formal statement. Nonetheless, satisfaction of these conditions is also required of second-order differential operators for them to be Hermitian. In fact, the SOLDEs we studied in Part I are essentially dealt with within the framework of the aforementioned formalism.

10.3 Second-Order Differential Operators

Second-order differential operators are the most common operators and are frequently treated in mathematical physics. The general differential operators are described as

Lx = a(x) d²/dx² + b(x) d/dx + c(x),   (10.55)

where a(x), b(x), and c(x) can in general be complex functions of a real variable x. Let us think of the following identities [1]:

v* (a d²u/dx²) − u d²(av*)/dx² = d/dx [ av* du/dx − u d(av*)/dx ],
v* (b du/dx) + u d(bv*)/dx = d/dx [ b u v* ],
v* cu − u cv* = 0.   (10.56)

Summing both sides of (10.56), we have the identity

v* [ a d²u/dx² + b du/dx + cu ] − u [ d²(av*)/dx² − d(bv*)/dx + cv* ]
  = d/dx [ av* du/dx − u d(av*)/dx ] + d/dx [ b u v* ].   (10.57)

Hence, following the expressions of Sect. 10.2, we define Lx† such that

(Lx†v)* ≡ d²(av*)/dx² − d(bv*)/dx + cv*.   (10.58)

Taking the complex conjugate of both sides, we get

Lx†v = d²(a*v)/dx² − d(b*v)/dx + c*v.   (10.59)

Considering the differentiation of product functions, we have as Lx†

Lx† = a* d²/dx² + ( 2 da*/dx − b* ) d/dx + ( d²a*/dx² − db*/dx + c* ).   (10.60)

Rewriting (10.57) by means of (10.55) and (10.59), we have

v*(Lxu) − (Lx†v)*u = d/dx [ av* du/dx − u d(av*)/dx + b u v* ].   (10.61)

Assuming that the relevant SOLDE is defined in [r, s] and integrating (10.61) over that interval, we get

∫_r^s dx [ v*(Lxu) − (Lx†v)*u ] = [ av* du/dx − u d(av*)/dx + b u v* ]_r^s.   (10.62)

Using the definition of an inner product described in (1.128) and rewriting (10.62), we have

⟨v|Lxu⟩ − ⟨Lx†v|u⟩ = [ av* du/dx − u d(av*)/dx + b u v* ]_r^s.

Here, if RHS of the above (i.e., the surface term of the above expression) vanishes, we get

⟨v|Lxu⟩ = ⟨Lx†v|u⟩.

We find that this notation is consistent with (1.112).

Bearing in mind this situation, let us seek a condition under which the differential operator Lx is Hermitian. Suppose here that a(x), b(x), and c(x) are all real and that

da(x)/dx = b(x).   (10.63)

Then, instead of (10.60), we have

Lx† = a(x) d²/dx² + b(x) d/dx + c(x) = Lx.

Thus, we have succeeded in constituting a self-adjoint operator Lx. In that case, (10.62) can be rewritten as

∫_r^s dx [ v*(Lxu) − (Lxv)*u ] = [ a ( v* du/dx − u dv*/dx ) ]_r^s.   (10.64)

Notice that b(x) has been eliminated from (10.64). If RHS of (10.64) vanishes, we get

⟨v|Lxu⟩ = ⟨Lxv|u⟩.

This notation is consistent with (1.119), and the Hermiticity of Lx becomes well defined.
If we do not have the condition da(x)/dx = b(x), how can we deal with the problem? The answer is that, following the procedures of Sect. 10.2, we can convert Lx to a self-adjoint operator by multiplying Lx by the weight function w(x) introduced in (10.26), (10.27), and (10.31). Replacing a(x), b(x), and c(x) with w(x)a(x), w(x)b(x), and w(x)c(x), respectively, in the identity (10.57), we rewrite (10.57) as

v* [ aw d²u/dx² + bw du/dx + cwu ] − u [ d²(awv*)/dx² − d(bwv*)/dx + cwv* ]
  = d/dx [ awv* du/dx − u d(awv*)/dx ] + d/dx [ bw u v* ].   (10.65)

Let us calculate the braces of the second term of LHS of (10.65). Using (10.26) and (10.27), we have

d²(awv*)/dx² − d(bwv*)/dx + cwv*
  = [ (aw)′v* + aw v*′ ]′ − [ (bw)′v* + bw v*′ ] + cwv*
  = [ bw v* + aw v*′ ]′ − [ (bw)′v* + bw v*′ ] + cwv*
  = (bw)′v* + bw v*′ + (aw)′v*′ + aw v*″ − (bw)′v* − bw v*′ + cwv*
  = (aw)′v*′ + aw v*″ + cwv* = bw v*′ + aw v*″ + cwv*
  = w ( a v*″ + b v*′ + c v* ) = w ( a*v″ + b*v′ + c*v )*
  = w ( av″ + bv′ + cv )*.   (10.66)

The second to last equality of (10.66) is based on the assumption that a(x), b(x), and c(x) are real functions. Meanwhile, for RHS of (10.65) we have

d/dx [ awv* du/dx − u d(awv*)/dx ] + d/dx [ bw u v* ]
  = d/dx [ awv* du/dx − u (aw)′v* − u aw dv*/dx ] + d/dx [ bw u v* ]
  = d/dx [ awv* du/dx − u bw v* − u aw dv*/dx ] + d/dx [ bw u v* ]
  = d/dx [ aw ( v* du/dx − u dv*/dx ) ] = d/dx [ p ( v* du/dx − u dv*/dx ) ].   (10.67)

With the last equality of (10.67), we used (10.26). Using (10.66) and (10.67), we rewrite (10.65) once again as

v* w ( a d²u/dx² + b du/dx + cu ) − u w ( a d²v/dx² + b dv/dx + cv )*
  = d/dx [ p ( v* du/dx − u dv*/dx ) ].   (10.68)

Then, integrating (10.68) from r to s, we finally get

∫_r^s dx w(x) { v*(Lxu) − (Lxv)*u } = [ p ( v* du/dx − u dv*/dx ) ]_r^s.   (10.69)

The relations (10.69) and (10.62) are called the generalized Green's identity. We emphasize that as far as the coefficients a(x), b(x), and c(x) in (10.55) are real functions, the associated differential operator Lx can be converted to a self-adjoint form following the procedures of (10.66) and (10.67).
In the above, LHS of the original homogeneous differential equation (10.5) is rewritten as

a(x)w(x) d²u/dx² + b(x)w(x) du/dx + c(x)w(x)u = d/dx [ p(x) du/dx ] + c w(x) u.

Rewriting this, we have

Lxu = (1/w(x)) d/dx [ p(x) du/dx ] + cu   [w(x) > 0],   (10.70)

where we have

p(x) = a(x)w(x) and dp(x)/dx = b(x)w(x).   (10.71)

The latter equation of (10.71) corresponds to (10.63) if we assume w(x) ≡ 1. When the differential operator Lx is defined as in (10.70), Lx is said to be self-adjoint with respect to the weight function w(x).
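
The prescription (10.31) together with the check (10.71) is straightforward to carry out symbolically. The following is a minimal sketch, assuming SymPy; the choice a = 1, b = −2x (the operator appearing in the Hermite differential equation) and C′ = 1 are sample assumptions for illustration.

import sympy as sp

x = sp.symbols('x')
a_coef = sp.Integer(1)
b_coef = -2 * x

# (10.31) with C' = 1: w = (1/a) exp( integral of b/a dx )
w = sp.exp(sp.integrate(b_coef / a_coef, x)) / a_coef   # exp(-x**2)
p = sp.simplify(w * a_coef)                             # (10.26): p = w a

# (10.71): dp/dx must equal b w for self-adjointness
print(sp.simplify(sp.diff(p, x) - b_coef * w))          # 0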
Now we examine boundary functionals. The homogeneous adjoint boundary functionals are described as follows:

B1†(v) = α1 v*(r) + β1 (dv*/dx)|_{x=r} + γ1 v*(s) + δ1 (dv*/dx)|_{x=s} = 0,   (10.72)

B2†(v) = α2 v*(r) + β2 (dv*/dx)|_{x=r} + γ2 v*(s) + δ2 (dv*/dx)|_{x=s} = 0.   (10.73)

In (10.3), putting α1 = 1 and β1 = γ1 = δ1 = 0, we have

B1(u) = u(r) = σ1.   (10.74)

Also, putting γ2 = 1 and α2 = β2 = δ2 = 0, we have

B2(u) = u(s) = σ2.   (10.75)

Further putting

σ1 = σ2 = 0,   (10.76)

we also get homogeneous BCs of

B1(u) = B2(u) = 0; i.e., u(r) = u(s) = 0.   (10.77)

For RHS of (10.69) to vanish, it suffices to define B1†(v) and B2†(v) such that

B1†(v) = v*(r) and B2†(v) = v*(s).   (10.78)

Then, the homogeneous adjoint BCs read as

v*(r) = v*(s) = 0, i.e., v(r) = v(s) = 0.   (10.79)

In this manner, we can readily construct homogeneous adjoint BCs the same as those of (10.77) so that Lx can be Hermitian.
We list several prescriptions of typical BCs below:

(i) u(r) = u(s) = 0 (Dirichlet conditions),
(ii) du/dx|_{x=r} = du/dx|_{x=s} = 0 (Neumann conditions),
(iii) u(r) = u(s) and du/dx|_{x=r} = du/dx|_{x=s} (periodic conditions).   (10.80)

Yet care should be taken when handling RHS of (10.69), i.e., the surface terms, because conditions (i) to (iii) are sufficient but not necessary for the surface terms to vanish; such conditions are not limited to these. Meanwhile, we often have to deal with nonvanishing surface terms. In that case, we have to start with (10.62) instead of (10.69).
In Sect. 10.2, we defined the Hermiticity of a differential operator in such a way that the operator is self-adjoint and that the homogeneous BCs and homogeneous adjoint BCs are the same. In light of the above argument, however, we may relax the conditions for a differential operator to be Hermitian. This is particularly the case when p(x) = a(x)w(x) in (10.69) vanishes at both endpoints. We will encounter such a situation in Sect. 10.7.

10.4 Green’s Functions

Having made the above preparations, let us proceed to the study of Green's functions for SOLDEs. Though we keep it to a minimum, we have to mention a bit of formalism.
Given Lx defined by (10.55), let us assume

Lxu(x) = d(x)   (10.81)

under homogeneous BCs, with the inhomogeneous term d(x) being an arbitrary function. We also assume that (10.81) is well defined in a domain [r, s]. The numbers r and s may be infinite. Suppose simultaneously that we have

Lx†v(x) = h(x)   (10.82)

under homogeneous adjoint BCs [1, 2], with the inhomogeneous term h(x) being an arbitrary function as well.
Let us describe the above relations as

L|u⟩ = |d⟩ and L†|v⟩ = |h⟩.   (10.83)

Suppose that there is an inverse operator L⁻¹ ≡ G such that

GL = LG = E,   (10.84)

where E is an identity operator. Operating with G on (10.83), we have

GL|u⟩ = E|u⟩ = |u⟩ = G|d⟩.   (10.85)

This implies that (10.81) has been solved and the solution is given by G|d⟩. Since the inverse operation of differentiation is integration, G is expected to be an integral operator.
We have

⟨x|LG|y⟩ = Lx⟨x|G|y⟩ = LxG(x, y).   (10.86)

Meanwhile, using (10.84) we get

⟨x|LG|y⟩ = ⟨x|E|y⟩ = ⟨x|y⟩.   (10.87)

Using a weight function w(x), we generalize the inner product of (1.128) such that

⟨g|f⟩ ≡ ∫_r^s w(x) g*(x) f(x) dx.   (10.88)

As we expand an arbitrary vector using basis vectors, we "expand" an arbitrary function |f⟩ using basis vectors |x⟩. Here, we are treating the real numbers as if they formed continuous, innumerable basis vectors on the real number line (see Fig. 10.1).

[Fig. 10.1 Function |f⟩ and its coordinate representation f(x)]

Thus, we can expand |f⟩ in terms of |x⟩ such that

|f⟩ = ∫_r^s dx w(x) f(x) |x⟩.   (10.89)

In (10.89) we considered f(x) as if it were an expansion coefficient. The following notation is reasonable accordingly:

f(x) ≡ ⟨x|f⟩.   (10.90)

In (10.90), f(x) can be viewed as the coordinate representation of |f⟩. Thus, from (10.89) we get

⟨x′|f⟩ = f(x′) = ∫_r^s dx w(x) f(x) ⟨x′|x⟩.   (10.91)

Alternatively, we have

f(x′) = ∫_r^s dx f(x) δ(x − x′).   (10.92)

This comes from a property of the δ function [1] described as

∫_r^s dx f(x) δ(x) = f(0).   (10.93)

Comparing (10.91) and (10.92), we have

w(x)⟨x′|x⟩ = δ(x − x′) or ⟨x′|x⟩ = δ(x − x′)/w(x) = δ(x′ − x)/w(x).   (10.94)

Thus, comparing (10.86) and (10.87) and using (10.94), we get

LxG(x, y) = δ(x − y)/w(x).   (10.95)

In a similar manner, we also have

Lx†g(x, y) = δ(x − y)/w(x).   (10.96)

To arrive at (10.96), we start the discussion by assuming an operator (Lx†)⁻¹ such that (Lx†)⁻¹ ≡ g with gL† = L†g = E.
The function G(x, y) is called a Green's function and g(x, y) is said to be an adjoint Green's function. The handling of Green's functions and adjoint Green's functions is based upon (10.95) and (10.96), respectively. As (10.81) is defined in a domain r ≤ x ≤ s, (10.95) and (10.96) are defined in a domain r ≤ x ≤ s and r ≤ y ≤ s. Notice that except for the point x = y we have

LxG(x, y) = 0 and Lx†g(x, y) = 0.   (10.97)

That is, G(x, y) and g(x, y) satisfy the homogeneous equation with respect to the variable x. Accordingly, we require G(x, y) and g(x, y) to satisfy the same homogeneous BCs with respect to the variable x as those imposed upon u(x) and v(x) of (10.81) and (10.82), respectively [1].
The relation (10.88) can be obtained as follows. Operating with ⟨g| on (10.89), we have

⟨g|f⟩ = ∫_r^s dx w(x) f(x) ⟨g|x⟩ = ∫_r^s w(x) g*(x) f(x) dx,   (10.98)

where for the last equality we used

⟨g|x⟩ = ⟨x|g⟩* = g(x)*.   (10.99)

For this, see (1.113), where A is replaced with the identity operator E, with regard to the complex conjugate of an inner product of two vectors. Also see (13.2) of Sect. 13.1.
If in (10.69) the surface term (i.e., RHS) vanishes under appropriate conditions, e.g., (10.80), we have

∫_r^s dx w(x) { v*(Lxu) − (Lx†v)*u } = 0,   (10.100)

which is called Green's identity. Since (10.100) is derived from the identities (10.56), (10.100) is an identity as well (as the terminology "Green's identity" indicates). Therefore, (10.100) must hold for any functions u and v so far as they satisfy homogeneous BCs. Thus, replacing v in (10.100) with g(x, y) and using (10.96) together with (10.81), we have

∫_r^s dx w(x) { g*(x, y)[Lxu(x)] − [Lx†g(x, y)]*u(x) }
  = ∫_r^s dx w(x) { g*(x, y)d(x) − [δ(x − y)/w(x)] u(x) }
  = ∫_r^s dx w(x) g*(x, y) d(x) − u(y) = 0,   (10.101)

where with the second to last equality we used a property of the δ function. Also notice that δ(x − y)/w(x) is a real function. Rewriting (10.101), we get

u(y) = ∫_r^s dx w(x) g*(x, y) d(x).   (10.102)

Similarly, replacing u in (10.100) with G(x, y) and using (10.95) together with (10.82), we have

v(y) = ∫_r^s dx w(x) G*(x, y) h(x).   (10.103)

Replacing u and v in (10.100) with G(x, q) and g(x, t), respectively, we have

∫_r^s dx w(x) { g*(x, t)[LxG(x, q)] − [Lx†g(x, t)]*G(x, q) } = 0.   (10.104)

Notice that we have chosen q and t for the second argument y in (10.95) and (10.96), respectively. Inserting (10.95) and (10.96) into the above equation after changing arguments, we have

∫_r^s dx w(x) { g*(x, t) δ(x − q)/w(x) − [δ(x − t)/w(x)] G(x, q) } = 0.   (10.105)

Thus, we get

g*(q, t) = G(t, q) or g(q, t) = G*(t, q).   (10.106)

This implies that G(t, q) must satisfy the adjoint BCs with respect to the second argument q. Inserting (10.106) into (10.102), we get

u(y) = ∫_r^s dx w(x) G(y, x) d(x).   (10.107)

Or, exchanging the arguments x and y, we have

u(x) = ∫_r^s dy w(y) G(x, y) d(y).   (10.108)

Similarly, inserting (10.106) into (10.103), we get

v(y) = ∫_r^s dx w(x) g(y, x) h(x).   (10.109)

Or, we have

v(x) = ∫_r^s dy w(y) g(x, y) h(y).   (10.110)

Equations (10.107) to (10.110) clearly show that the homogeneous equations [given by putting d(x) ≡ h(x) ≡ 0] have only the trivial solutions u(x) ≡ 0 and v(x) ≡ 0 under homogeneous BCs. Note that this is always the case when we are able to construct a Green's function. This in turn implies that we can construct a Green's function if the differential operator is accompanied by initial conditions. Conversely, if the homogeneous equation has a nontrivial solution under homogeneous BCs, Eqs. (10.107) to (10.110) will not work.
If the differential operator L in (10.81) is Hermitian, according to the associated remarks of Sect. 10.2 we must have Lx = Lx†, and u(x) and v(x) of (10.81) and (10.82) must satisfy the same homogeneous BCs. Consequently, in the case of an Hermitian operator we should have

G(x, y) = g(x, y).   (10.111)

From (10.106) and (10.111), if the operator is Hermitian we get

G(x, y) = G*(y, x).   (10.112)

In Sect. 10.3 we assumed that the coefficients a(x), b(x), and c(x) are real to assure that Lx is Hermitian [1]. On this condition, G(x, y) is real as well (vide infra). Then we have

G(x, y) = G(y, x).   (10.113)

That is, G(x, y) is real symmetric with respect to the arguments x and y.
To apply Green's functions in practice, we have to estimate the behavior of the Green's function near x = y, because in light of (10.95) and (10.96) there is a "jump" at x = y. When we deal with a case where a self-adjoint operator is relevant, using the function p(x) of (10.69) we have

[a(x)/p(x)] ∂/∂x [ p ∂G/∂x ] + c(x)G = δ(x − y)/w(x).   (10.114)

Multiplying both sides by p(x)/a(x), we have

∂/∂x [ p ∂G/∂x ] = [p(x)/a(x)] δ(x − y)/w(x) − [p(x)c(x)/a(x)] G(x, y).   (10.115)

Using a property of the δ function expressed by

f(x)δ(x) = f(0)δ(x) or f(x)δ(x − y) = f(y)δ(x − y),   (10.116)

we have

∂/∂x [ p ∂G/∂x ] = [p(y)/a(y)] δ(x − y)/w(y) − [p(x)c(x)/a(x)] G(x, y).   (10.117)

Integrating (10.117) with respect to x, we get

p ∂G(x, y)/∂x = [p(y)/(a(y)w(y))] θ(x − y) − ∫_r^x dt [p(t)c(t)/a(t)] G(t, y) + C,   (10.118)

where C is a constant. The function θ(x − y) is defined by

θ(x) = 1 (x > 0), θ(x) = 0 (x < 0).   (10.119)

Note that we have

dθ(x)/dx = δ(x).   (10.120)

On RHS of (10.118), the first term has a discontinuity at x = y because of θ(x − y), whereas the second term is continuous with respect to x. Thus, we have

lim_{ε→+0} [ p(y + ε) ∂G(x, y)/∂x |_{x=y+ε} − p(y − ε) ∂G(x, y)/∂x |_{x=y−ε} ]
  = lim_{ε→+0} [p(y)/(a(y)w(y))] [θ(+ε) − θ(−ε)] = p(y)/(a(y)w(y)).   (10.121)

Since p(y) is continuous with respect to the argument y, this factor drops out and we get

lim_{ε→+0} [ ∂G(x, y)/∂x |_{x=y+ε} − ∂G(x, y)/∂x |_{x=y−ε} ] = 1/(a(y)w(y)).   (10.122)

Thus, ∂G(x, y)/∂x is accompanied by a discontinuity at x = y of magnitude 1/(a(y)w(y)). Since this jump is finite, integrating once more with respect to x we find that G(x, y) itself is continuous at x = y. These properties of G(x, y) are useful for calculating Green's functions in practice. We will encounter several examples in the following sections.
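
Both properties can be verified symbolically for any explicitly known Green's function. Below is a minimal sketch, assuming SymPy; the operator Lx = d²/dx² on [0, 1] with Dirichlet BCs (so that a = w = 1), together with its well-known Green's function written in the code, is a standard illustration and is not an example taken from the text above.

import sympy as sp

x, y = sp.symbols('x y')

# Green's function of u'' = f on [0, 1] with u(0) = u(1) = 0 (assumed example):
F1 = x * (y - 1)   # branch for x < y
F2 = y * (x - 1)   # branch for x > y

# Continuity of G at x = y:
print(sp.simplify((F1 - F2).subs(x, y)))                          # 0
# Jump of dG/dx at x = y; (10.122) requires 1/(a(y) w(y)) = 1 here:
print(sp.simplify((sp.diff(F2, x) - sp.diff(F1, x)).subs(x, y)))  # 1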

Suppose that there are two Green's functions that satisfy the same homogeneous BCs. Let G(x, y) and G̃(x, y) be such functions. Then, we must have

LxG(x, y) = δ(x − y)/w(x) and LxG̃(x, y) = δ(x − y)/w(x).   (10.123)

Subtracting both sides of (10.123), we have

Lx [ G̃(x, y) − G(x, y) ] = 0.   (10.124)

In virtue of the linearity of the BCs, G(x, y) − G̃(x, y) must satisfy the same homogeneous BCs as well. But (10.124) is a homogeneous equation, and so we must have a trivial solution from the aforementioned constructability of the Green's function, such that

G(x, y) − G̃(x, y) ≡ 0 or G(x, y) ≡ G̃(x, y).   (10.125)

This obviously indicates that a Green's function must be unique.
We have assumed in Sect. 10.3 that the coefficients a(x), b(x), and c(x) are real. Therefore, taking the complex conjugate of (10.95) we have

LxG*(x, y) = δ(x − y)/w(x).   (10.126)

Notice here that both δ(x − y) and w(x) are real functions. Subtracting (10.95) from (10.126), we have

Lx [ G*(x, y) − G(x, y) ] = 0.

Again, from the uniqueness of the Green's function, we get G*(x, y) = G(x, y); i.e., G(x, y) is real, accordingly. This is independent of the specific structure of Lx. In other words, so far as we are dealing with real coefficients a(x), b(x), and c(x), G(x, y) is real whether or not Lx is self-adjoint.

10.5 Construction of Green’s Functions

So far we have dealt with homogeneous boundary conditions (BCs) with respect to the differential equation

a(x) d²u/dx² + b(x) du/dx + c(x)u = d(x),   (10.2)

where the coefficients a(x), b(x), and c(x) are real. In this case, if d(x) ≡ 0 in (10.108), namely the SOLDE is a homogeneous equation, we have the solution u(x) ≡ 0 on the basis of (10.108). If, on the other hand, we have inhomogeneous boundary conditions (BCs), additional terms appear on RHS of (10.108) for both homogeneous and inhomogeneous equations. In this section, we examine how we can deal with this problem.
Following the remarks made in Sect. 10.3, we start with (10.62) or (10.69). If we deal with a self-adjoint or Hermitian operator, we can apply (10.69) to the problem. In the more general case where the operator is not self-adjoint, (10.62) is useful; we will have a good opportunity to use it in Sect. 10.6.
In Sect. 10.3, we mentioned that we may relax the definition of the Hermiticity of a differential operator in the case where the surface term vanishes. Meanwhile, we should bear in mind that Green's functions and adjoint Green's functions are constructed using homogeneous BCs, regardless of whether we are concerned with a homogeneous or an inhomogeneous equation. Thus, even if the surface terms do not vanish, we may regard the differential operator as Hermitian. This is because we deal with essentially the same Green's function in solving a problem for both the homogeneous and inhomogeneous equations (vide infra). Notice also that whether or not RHS vanishes, we are to use the same Green's function [1]. In this sense, we do not have to be too strict with the definition of Hermiticity.
Now, suppose that for a (self-adjoint) differential operator Lx we are given

∫_r^s dx w(x) { v*(Lxu) − (Lxv)*u } = [ p(x) ( v* du/dx − u dv*/dx ) ]_r^s,   (10.69)

where w(x) > 0 in the domain [r, s] and p(x) is a real function. Note that since (10.69) is an identity, we may choose G(x, y) for v with an appropriate choice of w(x). Then from (10.95) and (10.112), we have

∫_r^s dx w(x) { G(y, x)d(x) − [δ(x − y)/w(x)] u(x) }
  = [ p(x) ( G(y, x) du(x)/dx − u(x) ∂G(y, x)/∂x ) ]_{x=r}^{x=s}.   (10.127)

Note that in (10.127) we used the fact that both δ(x − y) and w(x) are real. Using a property of the δ function, we get

u(y) = ∫_r^s dx w(x) G(y, x) d(x) − [ p(x) ( G(y, x) du(x)/dx − u(x) ∂G(y, x)/∂x ) ]_{x=r}^{x=s}.   (10.128)

When the differential operator Lx can be made Hermitian under the appropriate condition (10.71), we have as a real Green's function

G(x, y) = G(y, x).   (10.113)

The function G(x, y) satisfies homogeneous BCs. Hence, if we assume, e.g., the Dirichlet BCs [see (10.80)], we have

G(r, y) = G(s, y) = 0.   (10.129)

Using the symmetry of G(x, y) with respect to the arguments x and y, from (10.129) we get

G(y, r) = G(y, s) = 0.   (10.130)

Thus, the first term of the surface terms of (10.128) is eliminated to yield

u(y) = ∫_r^s dx w(x) G(y, x) d(x) + [ p(x) u(x) ∂G(y, x)/∂x ]_{x=r}^{x=s}.

Exchanging the arguments x and y, we get

u(x) = ∫_r^s dy w(y) G(x, y) d(y) + [ p(y) u(y) ∂G(x, y)/∂y ]_{y=r}^{y=s}.   (10.131)

Then, (i) substituting the values u(s) and u(r) associated with the inhomogeneous BCs described as

B1(u) = σ1 and B2(u) = σ2,   (10.132)

and (ii) calculating ∂G(x, y)/∂y |_{y=r} and ∂G(x, y)/∂y |_{y=s}, we will be able to obtain a unique solution. Notice that even though we may have formally the same differential operator, we get different Green's functions depending upon the different BCs. We will see tangible examples later.
On the basis of the general discussion of Sect. 10.4 and this section, we are in a position to construct the Green's functions. Except for the points x = y, the Green's function G(x, y) must satisfy the following differential equation:

LxG(x, y) = 0,   (10.133)

where Lx is given by

Lx = a(x) d²/dx² + b(x) d/dx + c(x).   (10.55)

The differential equation Lxu = d(x) is defined within an interval [r, s], where r may be −∞ and s may be +∞.
From now on, we regard a(x), b(x), and c(x) as real functions. From (10.133), we expect the Green's function to be described as a linear combination of a fundamental set of solutions u1(x) and u2(x). Here, the fundamental set of solutions is given by two linearly independent solutions of the homogeneous equation Lxu = 0. Then we should be able to express G(x, y) as a combination of F1(x, y) and F2(x, y) described as

F1(x, y) = c1u1(x) + c2u2(x) for r ≤ x < y,
F2(x, y) = d1u1(x) + d2u2(x) for y < x ≤ s,   (10.134)

where c1, c2, d1, and d2 are arbitrary (complex) constants to be determined later. These constants are given as functions of y. The combination has to be made such that

G(x, y) = F1(x, y) for r ≤ x < y,
G(x, y) = F2(x, y) for y < x ≤ s.   (10.135)

Thus, using the θ(x) function defined in (10.119), we describe G(x, y) as

G(x, y) = F1(x, y)θ(y − x) + F2(x, y)θ(x − y).   (10.136)

Notice that F1(x, y) and F2(x, y) are "ordinary" functions, whereas G(x, y) is not, because G(x, y) contains the θ(x) function.
If we have

F2(x, y) = F1(y, x),   (10.137)

then

G(x, y) = F1(x, y)θ(y − x) + F1(y, x)θ(x − y).   (10.138)

Hence, we get

G(x, y) = G(y, x).   (10.139)

From (10.113), Lx is then Hermitian. Suppose, for instance, that F1(x, y) = (x − r)(y − s) and F2(x, y) = (x − s)(y − r). Then, (10.137) is satisfied and, hence, if we can construct the Green's function from these F1(x, y) and F2(x, y), Lx should be Hermitian. However, if we had, e.g., F1(x, y) = x − r and F2(x, y) = y − s, then G(x, y) ≠ G(y, x), and so Lx would not be Hermitian.
The Green's functions must satisfy the homogeneous BCs. That is,

B1(G) = B2(G) = 0.   (10.140)

Also, we require the continuity condition of G(x, y) at x = y and the discontinuity condition of ∂G(x, y)/∂x at x = y described by (10.122). Thus, we have four conditions to be satisfied by G(x, y): the two BCs together with the continuity and discontinuity conditions. These four conditions determine the four constants c1, c2, d1, and d2.
Now, let us inspect further details of the Green's functions through an example.
Example 10.4 Let us consider the following differential equation:

d²u/dx² + u = 1.   (10.141)

We assume that the domain of the argument x is [0, L]. We set boundary conditions such that

u(0) = σ1 and u(L) = σ2.   (10.142)

Thus, if at least one of σ1 and σ2 is not zero, we are dealing with an inhomogeneous differential equation under inhomogeneous BCs.
Next, let us seek the conditions that the Green's function satisfies. We also seek a fundamental set of solutions of the homogeneous equation described by

d²u/dx² + u = 0.   (10.143)

This is obtained by putting a = c = 1 and b = 0 in the general form (10.5), with the weight function being unity. The differential equation (10.143) is therefore self-adjoint according to the argument of Sect. 10.3. A fundamental set of solutions is given by

e^{ix} and e^{−ix}.

Then, we have

F1(x, y) = c1e^{ix} + c2e^{−ix} for 0 ≤ x < y,
F2(x, y) = d1e^{ix} + d2e^{−ix} for y < x ≤ L.   (10.144)

The functions F1(x, y) and F2(x, y) must satisfy the following BCs:

F1(0, y) = c1 + c2 = 0 and F2(L, y) = d1e^{iL} + d2e^{−iL} = 0.   (10.145)

Thus, we have

F1(x, y) = c1 ( e^{ix} − e^{−ix} ), F2(x, y) = d1 ( e^{ix} − e^{2iL}e^{−ix} ).   (10.146)

Therefore, at x = y the continuity condition gives

c1 ( e^{iy} − e^{−iy} ) = d1 ( e^{iy} − e^{2iL}e^{−iy} ).   (10.147)

The discontinuity condition of (10.122) is equivalent to

∂F2(x, y)/∂x |_{x=y} − ∂F1(x, y)/∂x |_{x=y} = 1.   (10.148)

This is because both F1(x, y) and F2(x, y) are ordinary functions supposed to be differentiable at any x. The relation (10.148) then reads as

i d1 ( e^{iy} + e^{2iL}e^{−iy} ) − i c1 ( e^{iy} + e^{−iy} ) = 1.   (10.149)

From (10.147) and (10.149), using Cramer's rule we have

c1 = (1/D) | 0, −(e^{iy} − e^{2iL}e^{−iy}); 1, i(e^{iy} + e^{2iL}e^{−iy}) |
   = i ( e^{iy} − e^{2iL}e^{−iy} ) / [ 2 ( 1 − e^{2iL} ) ],   (10.150)

d1 = (1/D) | e^{iy} − e^{−iy}, 0; −i(e^{iy} + e^{−iy}), 1 |
   = i ( e^{iy} − e^{−iy} ) / [ 2 ( 1 − e^{2iL} ) ],   (10.151)

where the rows of each 2 × 2 determinant are separated by semicolons and

D = | e^{iy} − e^{−iy}, −(e^{iy} − e^{2iL}e^{−iy}); −i(e^{iy} + e^{−iy}), i(e^{iy} + e^{2iL}e^{−iy}) | = −2i ( 1 − e^{2iL} )

is the determinant of the coefficient matrix of (10.147) and (10.149). Substituting these parameters into (10.146), we get

F1(x, y) = sin x ( e^{2iL}e^{−iy} − e^{iy} ) / ( 1 − e^{2iL} ), F2(x, y) = sin y ( e^{2iL}e^{−ix} − e^{ix} ) / ( 1 − e^{2iL} ).   (10.152)

Making the denominator real, we have

F1(x, y) = sin x [ cos(y − 2L) − cos y ] / (2 sin²L),
F2(x, y) = sin y [ cos(x − 2L) − cos x ] / (2 sin²L).   (10.153)

Using the θ(x) function, we get

G(x, y) = { sin x [ cos(y − 2L) − cos y ] / (2 sin²L) } θ(y − x)
  + { sin y [ cos(x − 2L) − cos x ] / (2 sin²L) } θ(x − y).   (10.154)

Thus, G(x, y) = G(y, x) as expected. Notice, however, that if L = nπ (n = 1, 2, ⋯), the Green's function cannot be defined as an ordinary function even for x ≠ y. We return to this point later.
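
The algebra of (10.150)–(10.153) can be delegated to a computer algebra system. Below is a minimal verification sketch, assuming SymPy; the numerical spot-check values x = 0.3, y = 0.7, L = 2.0 are arbitrary.

import sympy as sp

x, y, L = sp.symbols('x y L', real=True)
c1, d1 = sp.symbols('c1 d1')

# Continuity (10.147) and derivative jump (10.149) at x = y
eq1 = sp.Eq(c1*(sp.exp(sp.I*y) - sp.exp(-sp.I*y)),
            d1*(sp.exp(sp.I*y) - sp.exp(2*sp.I*L)*sp.exp(-sp.I*y)))
eq2 = sp.Eq(sp.I*d1*(sp.exp(sp.I*y) + sp.exp(2*sp.I*L)*sp.exp(-sp.I*y))
            - sp.I*c1*(sp.exp(sp.I*y) + sp.exp(-sp.I*y)), 1)
sol = sp.solve([eq1, eq2], [c1, d1])

# Compare F1 = c1 (e^{ix} - e^{-ix}) with the closed form (10.153)
F1 = sol[c1]*(sp.exp(sp.I*x) - sp.exp(-sp.I*x))
closed = sp.sin(x)*(sp.cos(y - 2*L) - sp.cos(y))/(2*sp.sin(L)**2)
diff = (F1 - closed).subs({x: 0.3, y: 0.7, L: 2.0})
print(abs(complex(diff)))  # ~1e-16: the two expressions agree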
The solution of (10.141) under the homogeneous BCs is then described as

u(x) = ∫_0^L dy G(x, y)
  = { [ cos(x − 2L) − cos x ] / (2 sin²L) } ∫_0^x sin y dy + { sin x / (2 sin²L) } ∫_x^L [ cos(y − 2L) − cos y ] dy.   (10.155)

This can readily be integrated to yield the solution of the inhomogeneous equation such that

u(x) = [ cos(x − 2L) − cos x − 2 sin L sin x − cos 2L + 1 ] / (2 sin²L)
  = [ cos(x − 2L) − cos x − 2 sin L sin x + 2 sin²L ] / (2 sin²L).   (10.156)

Next, let us consider the surface term. This is given by the second term of (10.131). We get

∂F1(x, y)/∂y |_{y=L} = sin x / sin L, ∂F2(x, y)/∂y |_{y=0} = [ cos(x − 2L) − cos x ] / (2 sin²L).   (10.157)

Therefore, with the inhomogeneous BCs we have the following solution of the inhomogeneous equation:

u(x) = [ cos(x − 2L) − cos x − 2 sin L sin x + 2 sin²L ] / (2 sin²L)
  + [ 2σ2 sin L sin x + σ1 ( cos x − cos(x − 2L) ) ] / (2 sin²L),   (10.158)

where the second term is the surface term. If σ1 = σ2 = 1, we have

u(x) ≡ 1.

Looking at (10.141), we find that u(x) ≡ 1 is certainly a solution of (10.141) with the inhomogeneous BCs σ1 = σ2 = 1. The uniqueness of the solution then ensures that u(x) ≡ 1 is the sole solution under the said BCs.
From (10.154), we find that G(x, y) has a singularity at L = nπ (n: integers). This is associated with the fact that the homogeneous equation (10.143) has a nontrivial solution, e.g., u(x) = sin x, under the homogeneous BCs u(0) = u(L) = 0. The present situation is essentially the same as that of Example 1.1 of Sect. 1.3. In other words, when λ = 1 in (1.61), the form of the differential equation is identical to (10.143), with virtually the same Dirichlet conditions. The point is that (10.143) can be viewed as a homogeneous equation and, at the same time, as an eigenvalue equation. In such a case, the Green's function approach fails.
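
As a cross-check, one can confirm symbolically that (10.158) satisfies both (10.141) and the inhomogeneous BCs (10.142). A minimal sketch, assuming SymPy:

import sympy as sp

x, L, s1, s2 = sp.symbols('x L sigma1 sigma2', real=True)

# The solution (10.158), surface term included
u = (sp.cos(x - 2*L) - sp.cos(x) - 2*sp.sin(L)*sp.sin(x) + 2*sp.sin(L)**2
     + 2*s2*sp.sin(L)*sp.sin(x) + s1*(sp.cos(x) - sp.cos(x - 2*L))) / (2*sp.sin(L)**2)

print(sp.simplify(sp.diff(u, x, 2) + u - 1))  # 0: the ODE (10.141) holds
print(sp.simplify(u.subs(x, 0) - s1))         # 0: u(0) = sigma1
print(sp.simplify(u.subs(x, L) - s2))         # 0: u(L) = sigma2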

10.6 Initial Value Problems (IVPs)

10.6.1 General Remarks

IVPs frequently appear in mathematical physics. The relevant conditions are dealt with as BCs in the theory of differential equations. With the boundary functionals B1(u) and B2(u) of (10.3) and (10.4), setting α1 = β2 = 1 and the other coefficients to zero, we get

B1(u) = u(p) = σ1 and B2(u) = du/dx |_{x=p} = σ2.   (10.159)

In the above, note that we choose [r, s] for the domain of x. The points r and s can be infinite as before. Any point p within the domain [r, s] may be designated as the special point on which the BCs (10.159) are imposed. Conditions of this kind are particularly prominent among BCs because they are set at a single point of the argument; such conditions are usually called initial conditions. In this section, we investigate the fundamental characteristics of IVPs.
Suppose that we have the homogeneous BCs

u(p) = du/dx |_{x=p} = 0.   (10.160)

Given the differential operator Lx defined in (10.55), i.e.,

Lx = a(x) d²/dx² + b(x) d/dx + c(x),   (10.55)

let a fundamental set of solutions be u1(x) and u2(x) for

Lxu(x) = 0.   (10.161)

A general solution u(x) of (10.161) is given by a linear combination of u1(x) and u2(x) such that

u(x) = c1u1(x) + c2u2(x),   (10.162)

where c1 and c2 are arbitrary (complex) constants. Suppose that we have the homogeneous BCs expressed by (10.160). Then, we have

u(p) = c1u1(p) + c2u2(p) = 0,
u′(p) = c1u1′(p) + c2u2′(p) = 0.

Rewriting this in matrix form, we have

( u1(p)    u2(p)   ) ( c1 )
( u1′(p)   u2′(p)  ) ( c2 ) = 0.

Since the matrix comprises the Wronskian of the fundamental set of solutions u1(x) and u2(x), its determinant never vanishes at any point p. That is, we have

| u1(p)    u2(p)   |
| u1′(p)   u2′(p)  | ≠ 0.   (10.163)

Then, we necessarily have c1 = c2 = 0. From (10.162), we have the trivial solution

u(x) ≡ 0

under the initial conditions as homogeneous BCs. Thus, as already discussed, a Green's function can always be constructed for IVPs.
To seek the Green's functions for IVPs, we return to the generalized Green's identity described as
∫_r^s dx [ v*(Lxu) − (Lx†v)*u ] = [ av* du/dx − u d(av*)/dx + b u v* ]_r^s.   (10.62)

For the surface term (RHS) to vanish, for homogeneous BCs we have, e.g.,

u(s) = du/dx |_{x=s} = 0 and v(r) = dv/dx |_{x=r} = 0,

for the two sets of BCs adjoint to each other. Obviously, these are not identical, simply because the former is determined at s and the latter at the different point r. For this reason, the operator Lx is not Hermitian, even though it is formally self-adjoint. In such a case, we would rather use Lx directly than construct a self-adjoint operator, because we cannot make the operator Hermitian either way.
Hence, unlike the preceding sections, we do not need a weight function w(x); or, we may regard w(x) ≡ 1. Then, we reconsider the conditions which the Green's functions should satisfy. On the basis of the general considerations of Sect. 10.4, especially (10.86), (10.87), and (10.94), we have [2]

LxG(x, y) = ⟨x|y⟩ = δ(x − y).   (10.164)

Therefore, we have

∂²G(x, y)/∂x² + [b(x)/a(x)] ∂G(x, y)/∂x + [c(x)/a(x)] G(x, y) = δ(x − y)/a(x).

Integrating the above equation with respect to x (with integration by parts for the second term), we get

∂G(x, y)/∂x + [b(x)/a(x)] G(x, y) − ∫_{x0}^{x} [b(ξ)/a(ξ)]′ G(ξ, y) dξ + ∫_{x0}^{x} [c(ξ)/a(ξ)] G(ξ, y) dξ = θ(x − y)/a(y) + C,

where C is a constant collecting the terms evaluated at the lower limit x0. Noting that the functions other than ∂G(x, y)/∂x and θ(x − y)/a(y) are continuous, as before we have

lim_{ε→+0} [ ∂G(x, y)/∂x |_{x=y+ε} − ∂G(x, y)/∂x |_{x=y−ε} ] = 1/a(y).   (10.165)

10.6.2 Green’s Functions for IVPs

From a practical point of view, we may set r = 0 in (10.62). Then, with (10.2) we can choose the domain [0, s] (for s > 0) or [s, 0] (for s < 0). For simplicity, we use x instead of s. We consider the two cases x > 0 and x < 0.
(i) Case I (x > 0): Let u1(x) and u2(x) be a fundamental set of solutions. We define F1(x, y) and F2(x, y) as before such that

F1(x, y) = c1u1(x) + c2u2(x) for 0 ≤ x < y,
F2(x, y) = d1u1(x) + d2u2(x) for 0 < y < x.   (10.166)

As before, we set

G(x, y) = F1(x, y) for 0 ≤ x < y,
G(x, y) = F2(x, y) for 0 < y < x.

The homogeneous BCs are defined as

u(0) = 0 and u′(0) = 0.

Correspondingly, we have

F1(0, y) = 0 and F1′(0, y) = 0.

This is translated into

c1u1(0) + c2u2(0) = 0 and c1u1′(0) + c2u2′(0) = 0.

In matrix form, we get

( u1(0)    u2(0)   ) ( c1 )
( u1′(0)   u2′(0)  ) ( c2 ) = 0.

As mentioned above, since u1(x) and u2(x) are a fundamental set of solutions, we have c1 = c2 = 0. Hence, we get

F1(x, y) = 0.

From the continuity and discontinuity conditions (10.165) imposed upon the Green's functions, we have

d1u1(y) + d2u2(y) = 0 and d1u1′(y) + d2u2′(y) = 1/a(y).   (10.167)

As before, we get

d1 = −u2(y) / [ a(y)W(u1(y), u2(y)) ] and d2 = u1(y) / [ a(y)W(u1(y), u2(y)) ],

where W(u1(y), u2(y)) is the Wronskian of u1(y) and u2(y). Thus, we get

F2(x, y) = [ u2(x)u1(y) − u1(x)u2(y) ] / [ a(y)W(u1(y), u2(y)) ].   (10.168)

(ii) Case II (x < 0): Next, we think of the case described by

F1(x, y) = c1u1(x) + c2u2(x) for y < x ≤ 0,
F2(x, y) = d1u1(x) + d2u2(x) for x < y < 0.   (10.169)

Proceeding similarly as above, we have c1 = c2 = 0. Also, we get

F2(x, y) = [ u1(x)u2(y) − u2(x)u1(y) ] / [ a(y)W(u1(y), u2(y)) ].   (10.170)

Here, notice that the sign in (10.170) is reversed relative to (10.168). This is because for the discontinuity condition, instead of (10.167) we must have

d1u1′(y) + d2u2′(y) = −1/a(y).

This results from the fact that the magnitude relationship between the arguments x and y has been reversed in (10.169) relative to (10.166).
Summarizing the above argument, (10.168) is obtained in the domain 0 ≤ y < x and (10.170) in the domain x < y < 0. Noting this characteristic, we define a function such that

Θ(x, y) ≡ θ(x − y)θ(y) − θ(y − x)θ(−y).   (10.171)

Notice that

Θ(−x, −y) = −Θ(x, y).

That is, Θ(x, y) is antisymmetric with respect to the origin.
Figure 10.2 shows the feature of Θ(x, y). If the "initial point" is taken at x = a, we can use Θ(x − a, y − a) instead; see Fig. 10.3. The function is described as

[Fig. 10.2 Graph of the function Θ(x, y). Θ(x, y) = 1 or −1 in the hatched areas; otherwise Θ(x, y) = 0]

[Fig. 10.3 Graph of the function Θ(x − a, y − a). We assume a > 0]

Θ(x − a, y − a) = θ(x − y)θ(y − a) − θ(y − x)θ(a − y).

Note that Θ(x − a, y − a) can be obtained by shifting Θ(x, y) in the positive direction of the x- and y-axes by a (a can be either positive or negative; in Fig. 10.3 we assume a > 0). Using the Θ(x, y) function, the Green's function is described as

G(x, y) = { [ u2(x)u1(y) − u1(x)u2(y) ] / [ a(y)W(u1(y), u2(y)) ] } Θ(x, y).   (10.172)

Defining a function F such that

F(x, y) ≡ [ u2(x)u1(y) − u1(x)u2(y) ] / [ a(y)W(u1(y), u2(y)) ],   (10.173)

we have

G(x, y) = F(x, y)Θ(x, y).   (10.174)

Notice that

Θ(x, y) ≠ Θ(y, x) and G(x, y) ≠ G(y, x).   (10.175)

It is therefore obvious that the differential operator is not Hermitian.

10.6.3 Estimation of Surface Terms

To include the surface term in the inhomogeneous case, we use (10.62):

∫_r^s dx [ v*(Lxu) − (Lx†v)*u ] = [ av* du/dx − u d(av*)/dx + b u v* ]_r^s.   (10.62)

As before, we set r = 0 in (10.62). Also, we classify (10.62) into two cases according as s > 0 or s < 0.
(i) Case I (x > 0, y > 0): Equation (10.62) reads as

∫_0^∞ dx [ v*(Lxu) − (Lx†v)*u ] = [ av* du/dx − u d(av*)/dx + b u v* ]_0^∞.   (10.176)

This time, inserting

g(x, y) = G*(y, x) = G(y, x)   (10.177)

into v of (10.176) and arranging terms, we have

u(y) = ∫_0^∞ dx G(y, x) d(x)
  − [ a(x)G(y, x) du(x)/dx − u(x) (da(x)/dx) G(y, x) − u(x)a(x) ∂G(y, x)/∂x + b(x)u(x)G(y, x) ]_{x=0}^{x=∞}.   (10.178)
[Fig. 10.4 Domain of G(y, x). The areas in which G(y, x) does not vanish are hatched]

Note that in the above we used Lx†g(x, y) = δ(x − y). In Fig. 10.4, we depict the domain in which G(y, x) does not vanish. Notice that we get this domain by folding back that of Θ(x, y) (see Fig. 10.2) relative to the straight line y = x. Thus, we find that G(y, x) vanishes for x > y, and so does ∂G(y, x)/∂x; see Fig. 10.4. Namely, the second term of RHS of (10.178) vanishes at x = ∞. In other words, g(x, y) and G(y, x) must satisfy the adjoint BCs; i.e.,

g(∞, y) = G(y, ∞) = 0.   (10.179)

At the same time, the upper limit of the integration range of (10.178) can be set at y. Noting the above, we have

(the first term) of (10.178) = ∫_0^y dx G(y, x) d(x).   (10.180)

Also, with the second term of (10.178) we get

(the second term) of (10.178)
  = [ a(x)G(y, x) du(x)/dx − u(x) (da(x)/dx) G(y, x) − u(x)a(x) ∂G(y, x)/∂x + b(x)u(x)G(y, x) ]_{x=0}
  = a(0)G(y, 0) du(0)/dx − u(0) (da(0)/dx) G(y, 0) − u(0)a(0) ∂G(y, x)/∂x |_{x=0} + b(0)u(0)G(y, 0).   (10.181)

If we substitute the inhomogeneous BCs


u(0) = σ1 and du/dx |_{x=0} = σ2,   (10.182)

into (10.181) along with the other appropriate values, we should be able to get a unique solution as

u(y) = ∫_0^y dx G(y, x) d(x)
  + [ σ2 a(0) − σ1 (da(0)/dx) + σ1 b(0) ] G(y, 0) − σ1 a(0) ∂G(y, x)/∂x |_{x=0}.   (10.183)

Exchanging the arguments x and y, we get

u(x) = ∫_0^x dy G(x, y) d(y)
  + [ σ2 a(0) − σ1 (da(0)/dx) + σ1 b(0) ] G(x, 0) − σ1 a(0) ∂G(x, y)/∂y |_{y=0}.   (10.184)

Here, we consider that Θ(x, y) = 1 in this region and use (10.174). Meanwhile, from (10.174), we have

∂G(x, y)/∂y = [∂F(x, y)/∂y] Θ(x, y) + F(x, y) [∂Θ(x, y)/∂y].   (10.185)

In the second term,

∂Θ(x, y)/∂y = [∂θ(x − y)/∂y] θ(y) + θ(x − y) [∂θ(y)/∂y] − [∂θ(y − x)/∂y] θ(−y) − θ(y − x) [∂θ(−y)/∂y]
  = −δ(x − y)θ(y) + θ(x − y)δ(y) − δ(y − x)θ(−y) + θ(y − x)δ(y)
  = −δ(x − y) [θ(y) + θ(−y)] + [θ(x − y) + θ(y − x)] δ(y)
  = −δ(x − y) + [θ(x) + θ(−x)] δ(y)
  = −δ(x − y) + δ(y),   (10.186)

where we used

θ(x) + θ(−x) ≡ 1   (10.187)

as well as

f(y)δ(y) = f(0)δ(y)   (10.188)

and

δ(−y) = δ(y).   (10.189)

However, the function −δ(x − y) + δ(y) is of secondary importance. This is because in (10.184) we may choose [ε, x − ε] (ε > 0) for the domain of y and take ε → +0 after the integration and the other calculations related to the surface terms. Therefore, −δ(x − y) + δ(y) in (10.186) virtually vanishes.
Thus, we can express (10.185) as

∂G(x, y)/∂y = [∂F(x, y)/∂y] Θ(x, y) = ∂F(x, y)/∂y.

Then, finally we reach

u(x) = ∫_0^x dy F(x, y) d(y)
  + [ σ2 a(0) − σ1 (da(0)/dx) + σ1 b(0) ] F(x, 0) − σ1 a(0) ∂F(x, y)/∂y |_{y=0}.   (10.190)

(ii) Case II (x < 0, y < 0): Similarly to the above, Eq. (10.62) reads as

u(y) = ∫_{−∞}^{0} dx G(y, x) d(x)
  − [ a(x)G(y, x) du(x)/dx − u(x) (da(x)/dx) G(y, x) − u(x)a(x) ∂G(y, x)/∂x + b(x)u(x)G(y, x) ]_{x=−∞}^{x=0}.

As mentioned above, the lower limit of the integration range is y. Considering that both G(y, x) and ∂G(y, x)/∂x vanish at x < y (see Fig. 10.4), we have

u(y) = ∫_y^0 dx G(y, x) d(x)
  − [ a(x)G(y, x) du(x)/dx − u(x) (da(x)/dx) G(y, x) − u(x)a(x) ∂G(y, x)/∂x + b(x)u(x)G(y, x) ]_{x=0}
  = −∫_0^y dx G(y, x) d(x) − [ σ2 a(0) − σ1 (da(0)/dx) + σ1 b(0) ] G(y, 0) + σ1 a(0) ∂G(y, x)/∂x |_{x=0}.   (10.191)

Comparing (10.191) with (10.183), we recognize that the sign of RHS of (10.191) is reversed relative to RHS of (10.183). This also holds after exchanging the arguments x and y. Note, however, that Θ(x, y) = −1 in the present case. As a result, the two minus signs cancel, and (10.191) takes exactly the same expression as (10.183). Proceeding with the calculations similarly, for both Cases I and II we arrive at the unified solution (10.190), valid throughout the domain (−∞, +∞).

10.6.4 Examples

To deepen our understanding of Green's functions, we deal with tangible examples of IVPs below.
Example 10.5 Let us consider the following inhomogeneous differential equation:

d²u/dx² + u = 1.   (10.192)

Note that (10.192) is formally the same differential equation as (10.141). We may encounter (10.192) when observing the motion of a charged harmonic oscillator placed in a static electric field. We assume that the domain of the argument x is the whole range of real numbers. We set boundary conditions such that

u(0) = σ1 and u′(0) = σ2.   (10.193)

As in the case of Example 10.4, a fundamental set of solutions is given by

e^{ix} and e^{−ix}.   (10.194)

Therefore, following (10.173), we get

F(x, y) = sin(x − y).   (10.195)

Also, following (10.190) we have

u(x) = ∫_0^x dy sin(x − y) + σ2 sin x − σ1 [ ∂ sin(x − y)/∂y ] |_{y=0}
  = 1 − cos x + σ2 sin x + σ1 cos x.   (10.196)

In particular, if we choose σ1 = 1 and σ2 = 0, we have

u(x) ≡ 1.   (10.197)

The uniqueness of the solution ensures that this is the sole solution under the inhomogeneous BCs described by σ1 = 1 and σ2 = 0.
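
The IVP solution (10.196) can be cross-checked against direct numerical integration. Below is a minimal sketch, assuming NumPy and SciPy; the test values σ1 = 0.5 and σ2 = −1 are arbitrary.

import numpy as np
from scipy.integrate import solve_ivp

s1, s2 = 0.5, -1.0
ts = np.linspace(0.0, 10.0, 401)

# (10.192) as a first-order system, with u(0) = s1, u'(0) = s2
num = solve_ivp(lambda t, z: [z[1], 1.0 - z[0]], (ts[0], ts[-1]),
                [s1, s2], t_eval=ts, rtol=1e-10, atol=1e-12).y[0]

exact = 1.0 - np.cos(ts) + s2 * np.sin(ts) + s1 * np.cos(ts)  # (10.196)
print(np.max(np.abs(num - exact)))  # small; agreement within the solver tolerance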
Example 10.6: Damped Oscillator If a harmonic oscillator undergoes friction, the oscillator exhibits damped oscillation; such an oscillator is said to be a damped oscillator. The damped oscillator is often dealt with when we consider bound electrons in a dielectric medium under the influence of a dynamic external field varying with time. This is the case when the electron is placed in an alternating electric field or an electromagnetic wave.
The equation of motion of the damped oscillator is described as

m d²u/dt² + r du/dt + ku = d(t),   (10.198)

where m is the mass of an electron, r is a damping constant, and k is a spring constant of the damped oscillator. To seek a fundamental set of solutions of the homogeneous equation described as

m d²u/dt² + r du/dt + ku = 0,

putting

u = e^{iρt}   (10.199)

and inserting it into (10.198), we have

( −ρ² + iρ r/m + k/m ) e^{iρt} = 0.

Since e^{iρt} does not vanish, we have

−ρ² + iρ r/m + k/m = 0.   (10.200)

We call this equation the characteristic quadratic equation. There are three cases for the solutions of the quadratic equation (10.200). Solving (10.200), we get

ρ = ir/2m ± √( −r²/4m² + k/m ).   (10.201)

(i) Equation (10.201) gives two pure imaginary roots, i.e., −r²/4m² + k/m < 0 (overdamping). (ii) The equation has double roots, −r²/4m² + k/m = 0 (critical damping). (iii) The equation has two complex roots, −r²/4m² + k/m > 0 (weak damping). Of these, Case (iii) is characterized by an oscillating solution and has many applications in mathematical physics. For Cases (i) and (ii), on the other hand, we do not have an oscillating solution.
Case (i):
The characteristic roots are given by

ρ = ir/2m ± i √( r²/4m² − k/m ).

Therefore, we have a fundamental set of solutions described by

u(t) = exp(−rt/2m) exp( ± √( r²/4m² − k/m ) t ).

Then, a general solution is given by

u(t) = exp(−rt/2m) [ a exp( √( r²/4m² − k/m ) t ) + b exp( −√( r²/4m² − k/m ) t ) ].

Case (ii):
The characteristic roots are given by

ρ = ir/2m.

Therefore, one of the solutions is

u1(t) = exp(−rt/2m).

Another solution u2(t) is given by

u2(t) = c ∂u1(t)/∂(iρ) = c′ t exp(−rt/2m),

where c and c′ are appropriate constants. Thus, the general solution is given by

u(t) = a exp(−rt/2m) + b t exp(−rt/2m).

The most important and interesting feature, the "damped oscillator," emerges in the following Case (iii) and appears in many fields of natural science. We are particularly interested in this case.
Case (iii):
Suppose that the damping is relatively weak, so that the characteristic equation has two complex roots. Let us examine further details of this case following the prescriptions for IVPs. We divide (10.198) by m for ease of handling of the differential equation:

d²u/dt² + (r/m) du/dt + (k/m) u = (1/m) d(t).

Putting

ω ≡ √( −r²/4m² + k/m ),   (10.202)

we get a fundamental set of solutions described as

u(t) = exp(−rt/2m) exp(±iωt).   (10.203)

Given the BCs, following (10.172) we get as a Green's function

G(t, τ) = { [ u2(t)u1(τ) − u1(t)u2(τ) ] / W(u1(τ), u2(τ)) } Θ(t, τ)
  = (1/ω) e^{−(r/2m)(t−τ)} sin ω(t − τ) Θ(t, τ),   (10.204)

where u1(t) = exp(−rt/2m) exp(iωt) and u2(t) = exp(−rt/2m) exp(−iωt).
We examine whether G(t, τ) is eligible as the Green's function as follows:

dG/dt = (1/ω) [ −(r/2m) e^{−(r/2m)(t−τ)} sin ω(t − τ) + ω e^{−(r/2m)(t−τ)} cos ω(t − τ) ] Θ(t, τ)
  + (1/ω) e^{−(r/2m)(t−τ)} sin ω(t − τ) [ δ(t − τ)θ(τ) + δ(τ − t)θ(−τ) ].   (10.205)

The second term of (10.205) vanishes because sin ω(t − τ) δ(t − τ) = 0 · δ(t − τ) = 0. Thus,

d²G/dt² = (1/ω) [ (r/2m)² e^{−(r/2m)(t−τ)} sin ω(t − τ) − (r/m) ω e^{−(r/2m)(t−τ)} cos ω(t − τ)
  − ω² e^{−(r/2m)(t−τ)} sin ω(t − τ) ] Θ(t, τ)
  + (1/ω) [ −(r/2m) e^{−(r/2m)(t−τ)} sin ω(t − τ) + ω e^{−(r/2m)(t−τ)} cos ω(t − τ) ]
  × [ δ(t − τ)θ(τ) + δ(τ − t)θ(−τ) ].   (10.206)

In the last term, using the properties of the δ and θ functions, we get δ(t − τ). Note here that

e^{−(r/2m)(t−τ)} cos ω(t − τ) [ δ(t − τ)θ(τ) + δ(τ − t)θ(−τ) ]
  = e^{−(r/2m)·0} cos(ω · 0) { δ(t − τ) [ θ(τ) + θ(−τ) ] }
  = δ(t − τ) [ θ(τ) + θ(−τ) ] = δ(t − τ),

while the corresponding sine term vanishes. Thus, rearranging (10.206), we get

d²G/dt² = δ(t − τ) + { −(k/m)(1/ω) e^{−(r/2m)(t−τ)} sin ω(t − τ)
  − (r/m)(1/ω) [ −(r/2m) e^{−(r/2m)(t−τ)} sin ω(t − τ) + ω e^{−(r/2m)(t−τ)} cos ω(t − τ) ] } Θ(t, τ)
  = δ(t − τ) − (k/m) G − (r/m) dG/dt,   (10.207)

where we used (10.202) [i.e., k/m = ω² + (r/2m)²] for the last equality. Rearranging (10.207) once again, we have

d²G/dt² + (r/m) dG/dt + (k/m) G = δ(t − τ).   (10.208)

Defining the operator

Lt ≡ d²/dt² + (r/m) d/dt + k/m,   (10.209)

we get

LtG = δ(t − τ).   (10.210)

Note that this expression is consistent with (10.164). Thus, we find that (10.210) satisfies the condition (10.123) of the Green's function, with the weight function identified with unity.
Now, suppose that a sinusoidally changing external field eiΩt influences the
motion of the damped oscillator. Here we assume that an amplitude of the external
field is unity. Then, we have

d2 u r du k 1
þ þ u ¼ eiΩt : ð10:211Þ
dt 2 m dx m m
10.6 Initial Value Problems (IVPs) 423

Thus, as a solution of the homogeneous boundary conditions [i.e., uð0Þ ¼ uð_0Þ ¼ 0]


we get
Z t
1
e2mðtτÞ eiΩt sin ωðt  τÞdτ,
r
uð t Þ ¼ ð10:212Þ
mω 0

where t is an arbitrary positive or negative time. Equation (10.212) shows that with
the real part we have an external field cosΩt and that with the imaginary part we have
an external field sinΩt. To calculate (10.212) we use
h i
1 iωðtτÞ
sin ωðt  τÞ ¼ e  eiωðtτÞ : ð10:213Þ
2i

Then, the equation can readily be solved by integration of exponential functions,


even though we have to do somewhat lengthy (but straightforward) calculations.
Thus for the real part (i.e., the external field is cosΩt), we get a solution

1 r 1 r 2 1 2 
Cuðt Þ ¼ ΩsinΩt þ cos Ωt  Ω  ω2 cos Ωt
m m m 2m m
h
1 r   r 1 2 
þ e2mt Ω2  ω2 cos ωt  Ω þ ω2 sin ωt
m 2m ω
r 2 r 31
 cos ωt  sin ωt, ð10:214Þ
2m 2m ω

where C is a constant [i.e., a constant denominator of u(t)] expressed as

 2 r2   r 4
C ¼ Ω 2  ω 2 þ 2 Ω 2 þ ω2 þ : ð10:215Þ
2m 2m

For the imaginary part (i.e., the external field is sinΩt) we get

1 2  1 r 1 r 2
Cuðt Þ ¼  Ω  ω2 sin Ωt  ΩcosΩt þ sin Ωt
m m m m 2m
 
1 2mr t Ω  2  r r 2Ω
þ e Ω  ω sin ωt þ Ωcosωt þ
2
sin ωt :
m ω m 2m ω
ð10:216Þ

In Fig. 10.5 we show an example that depicts the positions of a damped oscillator as a function of time. In Fig. 10.5a, the amplitude of the envelope gradually diminishes with time. An enlarged diagram near the origin (Fig. 10.5b) clearly reflects the initial conditions $u(0) = \dot{u}(0) = 0$. In Fig. 10.5, we put m = 1 [kg], Ω = 1 [1/s], ω = 0.94 [1/s], and r = 0.006 [kg/s]. In the above calculations, if r/m is small enough (i.e., the damping is small enough), the third and fourth orders of r/m may be ignored and the approximation is precise enough.
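The closed form (10.214) can also be cross-checked against a direct numerical integration of (10.211) with a cosine drive. A sketch assuming NumPy/SciPy, using the parameter values of Fig. 10.5 (variable names ours):

```python
import numpy as np
from scipy.integrate import solve_ivp

m, r, Omega = 1.0, 0.006, 1.0
omega = 0.94                       # as in Fig. 10.5
k = m*omega**2 + r**2/(4*m)        # from omega^2 = k/m - r^2/(4 m^2)

def u_closed(t):
    """Closed-form u(t) of (10.214), external field cos(Omega t)."""
    a = r/(2*m)
    C = (Omega**2-omega**2)**2 + (r**2/(2*m**2))*(Omega**2+omega**2) + a**4
    steady = (r/m)*Omega*np.sin(Omega*t) + (a**2-(Omega**2-omega**2))*np.cos(Omega*t)
    trans = np.exp(-a*t)*((Omega**2-omega**2-a**2)*np.cos(omega*t)
                          - (a/omega)*(Omega**2+omega**2+a**2)*np.sin(omega*t))
    return (steady + trans)/(m*C)

# Direct integration of m u'' + r u' + k u = cos(Omega t), u(0) = u'(0) = 0
rhs = lambda t, y: [y[1], (np.cos(Omega*t) - r*y[1] - k*y[0])/m]
t = np.linspace(0.0, 50.0, 2001)
sol = solve_ivp(rhs, (0.0, 50.0), [0.0, 0.0], t_eval=t, rtol=1e-10, atol=1e-12)
print(np.max(np.abs(sol.y[0] - u_closed(t))))  # tiny: agreement to solver tolerance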

[Fig. 10.5 Example of a damped oscillation as a function of t; the data are taken from (10.214). (a) Overall profile (position [m] versus time [s]). (b) Profile enlarged near the origin (phase 0).]

In the case of inhomogeneous BCs, given $\sigma_1 = u(0)$ and $\sigma_2 = \dot{u}(0)$ we can decide the additional term S(t) using (10.190) such that

$$S(t) = \left(\sigma_2 + \sigma_1\frac{r}{2m}\right)e^{-\frac{r}{2m}t}\,\frac{1}{\omega}\sin\omega t + \sigma_1 e^{-\frac{r}{2m}t}\cos\omega t. \tag{10.217}$$

This term arises from (10.190). Thus, from (10.212) and (10.217), u(t) + S(t) gives a unique solution for the SOLDE with inhomogeneous BCs. Notice that S(t) does not depend on the external field.

10.7 Eigenvalue Problems

We often encounter eigenvalue problems in mathematical physics. Of these, those


related to Hermitian differential operators have particularly interesting and important
features. The eigenvalue problems we have considered in Part I are typical illustra-
tions. Here we investigate general properties of the eigenvalue problems.
Returning to the case of homogeneous BCs, we consider a following homoge-
neous SOLDE:

$$a(x)\frac{d^{2}u}{dx^{2}} + b(x)\frac{du}{dx} + c(x)u = 0. \tag{10.5}$$

Defining a following differential operator $L_x$ such that

$$L_x = a(x)\frac{d^{2}}{dx^{2}} + b(x)\frac{d}{dx} + c(x), \tag{10.55}$$

we have a homogeneous equation

$$L_x u(x) = 0. \tag{10.218}$$

Putting a constant λ instead of c(x), we have

$$a(x)\frac{d^{2}u}{dx^{2}} + b(x)\frac{du}{dx} - \lambda u = 0. \tag{10.219}$$

If we define a differential operator $L_x$ such that

$$L_x = a(x)\frac{d^{2}}{dx^{2}} + b(x)\frac{d}{dx} - \lambda, \tag{10.220}$$

we have a homogeneous equation

$$L_x u = 0 \tag{10.221}$$

to express (10.219). Instead, if we define a differential operator $L_x$ such that

$$L_x = a(x)\frac{d^{2}}{dx^{2}} + b(x)\frac{d}{dx},$$

we have the same homogeneous equation


$$L_x u = \lambda u \tag{10.222}$$

to express (10.219).
Equations (10.221) and (10.222) are essentially the same except that the expres-
sion is different. The expression using (10.222) is familiar to us as an eigenvalue
equation. The difference between (10.5) and (10.219) is that whereas c(x) in (10.5) is
a given fixed function, λ in (10.219) is constant, but may be varied according to the
solution of u(x). One of the most essential properties of the eigenvalue problem that
is posed in the form of (10.222) is that its solution is not uniquely determined as
already studied in various cases of Part I. Remember that the methods based upon the
Green’s function are valid for a problem to which a homogeneous differential
equation has a trivial solution (i.e., identically zero) under homogeneous BCs. In
contrast to this situation, even though the eigenvalue problem is basically posed as a
homogeneous equation under homogeneous BCs, nontrivial solutions are expected
to be obtained. In this respect, we have seen that in Part I we rejected a trivial
solution (i.e., identically zero) because of no physical meaning.
As exemplified in Part I, the eigenvalue problems that appear in mathematical
physics are closely connected to the Hermiticity of (differential) operators. This is
because in many cases an eigenvalue is required to be real. We have already
examined how we can convert a differential operator to the self-adjoint form. That
is, if we define p(x) as in (10.26), we have the self-adjoint operator as described in
(10.70). As a symbolic description, we have

$$w(x)L_x u = \frac{d}{dx}\left[p(x)\frac{du}{dx}\right] + c(x)w(x)u. \tag{10.223}$$

In the same way, multiplying both sides of (10.222) by w(x), we get

$$w(x)L_x u = \lambda w(x)u. \tag{10.224}$$

For instance, the Hermite differential equation that has already appeared as (2.118) in Sect. 2.3 is described as

$$\frac{d^{2}u}{dx^{2}} - 2x\frac{du}{dx} + 2nu = 0. \tag{10.225}$$

If we express (10.225) as in (10.224), multiplying $e^{-x^2}$ on both sides of (10.225) we have

$$\frac{d}{dx}\left(e^{-x^2}\frac{du}{dx}\right) + 2n\,e^{-x^2}u = 0. \tag{10.226}$$

Notice that the differential operator has been converted to a self-adjoint form according to (10.31) that defines a real and positive weight function $e^{-x^2}$ in the

present case. The domain of the Hermite differential equation is $(-\infty, +\infty)$, at the endpoints (i.e., $\pm\infty$) of which the surface term of the RHS of (10.69) approaches zero sufficiently rapidly in virtue of $e^{-x^2}$.
In Sects. 3.4 and 3.5, in turn, we dealt with the (associated) Legendre differential equation (3.127) for which the relevant differential operator is self-adjoint. The surface term corresponding to (10.69) vanishes. It is because $(1-\xi^2)$ vanishes at the endpoints $\xi = \cos\theta = \pm 1$ (i.e., θ = 0 or π) from (3.107). Thus, the Hermiticity is automatically ensured for the (associated) Legendre differential equation as well as the Hermite differential equation. In those cases, even though the differential equations do not satisfy any particular BCs, the Hermiticity is yet ensured.
In the theory of differential equations, the aforementioned properties of the Hermitian operators have been fully investigated as the so-called Sturm–Liouville system (or problem) in the form of a homogeneous differential equation. The related differential equations are connected to classical orthogonal polynomials bearing personal names such as Hermite, Laguerre, Jacobi, Gegenbauer, Legendre, and Tchebichef. These equations frequently appear in quantum mechanics and electromagnetism as typical examples of the Sturm–Liouville system. They can be converted to self-adjoint differential equations by multiplying an original form by a weight function. The resulting equations can be expressed as

$$\frac{d}{dx}\left[a(x)w(x)\frac{dY_n(x)}{dx}\right] + \lambda_n w(x)Y_n(x) = 0, \tag{10.227}$$

where Yn(x) is a collective representation of classical orthogonal polynomials.


Equation (10.226) is an example. Conventionally, a following form is adopted
instead of (10.227):
 
$$\frac{1}{w(x)}\frac{d}{dx}\left[a(x)w(x)\frac{dY_n(x)}{dx}\right] + \lambda_n Y_n(x) = 0, \tag{10.228}$$

where we put (aw)′ = bw. That is, the differential equation is originally described as

$$a(x)\frac{d^{2}Y_n(x)}{dx^{2}} + b(x)\frac{dY_n(x)}{dx} + \lambda_n Y_n(x) = 0. \tag{10.229}$$

In the case of Hermite polynomials, for instance, a(x) = 1 and $w(x) = e^{-x^2}$. Since we have $(aw)' = -2x\,e^{-x^2} = bw$, we can put b = −2x. Examples including this case are tabulated in Table 10.2. The eigenvalues λn are associated with real numbers that characterize the individual physical systems. The related fields have wide applications in many branches of natural science.
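The SOLDEs tabulated in Table 10.2 below are easy to verify numerically. A minimal sketch for the Hermite entry, assuming NumPy (whose numpy.polynomial.hermite module uses the physicists' convention appearing in (10.225)):

```python
import numpy as np
from numpy.polynomial.hermite import Hermite

n = 5
Hn = Hermite.basis(n)          # physicists' Hermite polynomial H_5
x = np.linspace(-3.0, 3.0, 7)

# Residual of the Hermite SOLDE: H_n'' - 2x H_n' + 2n H_n = 0
res = Hn.deriv(2)(x) - 2*x*Hn.deriv(1)(x) + 2*n*Hn(x)
print(np.allclose(res, 0.0))   # -> True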
Table 10.2 Classical polynomials and their related SOLDEs

Hermite $H_n(x)$: $\frac{d^2}{dx^2}H_n - 2x\frac{d}{dx}H_n + 2nH_n = 0$; weight $w(x) = e^{-x^2}$; domain $(-\infty, +\infty)$.
Laguerre $L_n^{\nu}(x)$: $x\frac{d^2}{dx^2}L_n^{\nu} + (\nu+1-x)\frac{d}{dx}L_n^{\nu} + nL_n^{\nu} = 0$; weight $w(x) = x^{\nu}e^{-x}$ ($\nu > -1$); domain $[0, +\infty)$.
Gegenbauer $C_n^{\lambda}(x)$: $(1-x^2)\frac{d^2}{dx^2}C_n^{\lambda} - (2\lambda+1)x\frac{d}{dx}C_n^{\lambda} + n(n+2\lambda)C_n^{\lambda} = 0$; weight $w(x) = (1-x^2)^{\lambda-\frac{1}{2}}$ ($\lambda > -\frac{1}{2}$); domain $[-1, +1]$.
Legendre $P_n(x)$: $(1-x^2)\frac{d^2}{dx^2}P_n - 2x\frac{d}{dx}P_n + n(n+1)P_n = 0$; weight $w(x) = 1$; domain $[-1, +1]$.

After having converted the operator to the self-adjoint form, i.e., $L_x = L_x^{\dagger}$, instead of (10.100) we have

$$\int_r^s dx\,w(x)\{v^{*}(L_x u) - [L_x v]^{*}u\} = 0. \tag{10.230}$$

Rewriting it, we get

$$\int_r^s dx\,v^{*}[w(x)L_x u] = \int_r^s dx\,[w(x)L_x v]^{*}u. \tag{10.231}$$

If we use an inner product notation described by (10.88), we get

$$\langle v \mid L_x u\rangle = \langle L_x v \mid u\rangle. \tag{10.232}$$

Here let us think of two eigenfunctions $\psi_i$ and $\psi_j$ that belong to eigenvalues $\lambda_i$ and $\lambda_j$, respectively. That is,

$$w(x)L_x\psi_i = \lambda_i w(x)\psi_i \quad\text{and}\quad w(x)L_x\psi_j = \lambda_j w(x)\psi_j. \tag{10.233}$$

Inserting $\psi_i$ and $\psi_j$ into u and v, respectively, in (10.232), we have

$$\langle\psi_j \mid L_x\psi_i\rangle = \langle\psi_j \mid \lambda_i\psi_i\rangle = \lambda_i\langle\psi_j \mid \psi_i\rangle = \langle L_x\psi_j \mid \psi_i\rangle = \langle\lambda_j\psi_j \mid \psi_i\rangle = \lambda_j^{*}\langle\psi_j \mid \psi_i\rangle. \tag{10.234}$$

With the second and third equalities, we have used a rule of the inner product (see Parts I and III). Therefore, we get

$$\left(\lambda_i - \lambda_j^{*}\right)\langle\psi_j \mid \psi_i\rangle = 0. \tag{10.235}$$

Putting i = j in (10.235), we get

$$\left(\lambda_i - \lambda_i^{*}\right)\langle\psi_i \mid \psi_i\rangle = 0. \tag{10.236}$$

An inner product $\langle\psi_i \mid \psi_i\rangle$ vanishes if and only if $|\psi_i\rangle \equiv 0$; see the inner product calculation rules of Sect. 13.1. However, $|\psi_i\rangle \equiv 0$ is not acceptable as a physical state. Therefore, we must have $\langle\psi_i \mid \psi_i\rangle \neq 0$. Thus, we get

$$\lambda_i - \lambda_i^{*} = 0 \quad\text{or}\quad \lambda_i = \lambda_i^{*}. \tag{10.237}$$

The relation (10.237) obviously indicates that λi is real; i.e., we find that the eigenvalues of an Hermitian operator are real. If $\lambda_i \neq \lambda_j = \lambda_j^{*}$, from (10.235) we get

$$\langle\psi_j \mid \psi_i\rangle = 0. \tag{10.238}$$

That is, | ψ ji and | ψ ii are orthogonal to each other.



We often encounter related orthogonality relationships between vectors and functions. We saw several cases in Part I and will see other cases in Part III.
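For the Hermite case, the orthogonality (10.238) under the weight $e^{-x^2}$ can be checked directly by Gauss–Hermite quadrature. A sketch assuming NumPy:

```python
import numpy as np
from numpy.polynomial.hermite import Hermite, hermgauss

x, w = hermgauss(40)                    # nodes/weights for the weight e^{-x^2}
H = [Hermite.basis(n) for n in range(4)]

# <H_i | H_j> with weight e^{-x^2}: vanishes for i != j
for i in range(4):
    for j in range(4):
        print(i, j, round(float(np.sum(w * H[i](x) * H[j](x))), 6))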

Part III
Linear Vector Spaces

In this part we treat vectors and their transformations in linear vector spaces so that
we can address various aspects of mathematical physics systematically but intui-
tively. We outline general principles of linear vector spaces mostly from an algebraic
point of view. Starting with abstract definition and description of vectors, we deal
with their transformation in a vector space using a matrix. An inner product is a
central concept in the theory of a linear vector space so that two vectors can be
associated with each other to yield a scalar. Unlike many of the books of linear
algebra and linear vector spaces, however, we describe canonical forms of matrices
before considering the inner product. This is because we can treat the topics in light
of the abstract theory of matrices and vector space without a concept of the inner
product. Of the canonical forms of matrices, Jordan canonical form is of paramount
importance. We study how it is constructed providing a tangible example.
In relation to the inner product space, normal operators such as Hermitian
operators and unitary operators frequently appear in quantum mechanics and elec-
tromagnetism. From a general aspect, we revisit the theory of Hermitian operators
that often appeared in both Parts I and II.
The last part deals with exponential functions of matrices. The relevant topics can
be counted as one of the important branches of applied mathematics. Also, the
exponential functions of matrices play an essential role in the theory of continuous
groups that will be dealt with in Part IV.
Chapter 11
Vectors and Their Transformation

In this chapter, we deal with the theory of finite-dimensional linear vector spaces.
Such vector spaces are spanned by a finite number of linearly independent vectors,
namely, basis vectors. In conjunction with developing an abstract concept and
theory, we mention a notion of mapping among mathematical elements. A linear
transformation of a vector is a special kind of mapping. In particular, we focus on
endomorphism within an n-dimensional vector space Vⁿ. Here, the endomorphism is defined as a linear transformation: Vⁿ → Vⁿ. The endomorphism is represented by an (n, n) square matrix. This is most often the case with physical and chemical
applications, when we deal with matrix algebra. In this book we focus on this type
of transformation.
A non-singular matrix plays an important role in the endomorphism. In this
connection, we consider its inverse matrix and determinant. All these fundamental
concepts supply us with a sufficient basis for better understanding of the theory of
the linear vector spaces. Through these processes, we should be able to get
acquainted with connection between algebraic and analytical approaches and gain
a broad perspective on various aspects of mathematical physics and related fields.

11.1 Vectors

From both fundamental and practical points of view, it is desirable to define linear
vector spaces in an abstract way. Suppose V is a set of elements denoted by a, b, c,
etc., called vectors. The set V is a linear vector space (or simply a vector space) if a sum a + b ∈ V is defined for any pair of vectors a and b and the elements of V satisfy the following mathematical relations:

$$(a + b) + c = a + (b + c), \tag{11.1}$$


$$a + b = b + a, \tag{11.2}$$
$$a + 0 = a, \tag{11.3}$$
$$a + (-a) = 0. \tag{11.4}$$

For the above, 0 is called the zero vector. Furthermore, for a ∈ V, ca ∈ V is defined (c is a complex number called a scalar) and we assume the following relations among vectors and scalars:

$$(cd)a = c(da), \tag{11.5}$$
$$1a = a, \tag{11.6}$$
$$c(a + b) = ca + cb, \tag{11.7}$$
$$(c + d)a = ca + da. \tag{11.8}$$

On the basis of the above relations, we can construct the following expression called a linear combination:

$$c_1 a_1 + c_2 a_2 + \cdots + c_n a_n.$$

If this linear combination is equated to zero, we obtain

$$c_1 a_1 + c_2 a_2 + \cdots + c_n a_n = 0. \tag{11.9}$$

If (11.9) holds only in the case where every $c_i = 0$ $(1 \le i \le n)$, the vectors $a_1, a_2, \cdots, a_n$ are said to be linearly independent. In this case the relation represented by (11.9) is said to be trivial. If the relation is nontrivial (i.e., $\exists c_i \neq 0$), those vectors are said to be linearly dependent.
If in the vector space V the maximum number of linearly independent vectors is n, V is said to be an n-dimensional vector space and is sometimes denoted by Vⁿ. In this case any vector x of Vⁿ is expressed uniquely as a linear combination of linearly independent vectors such that

$$x = x_1 a_1 + x_2 a_2 + \cdots + x_n a_n. \tag{11.10}$$

Suppose x is denoted by

$$x = x_1' a_1 + x_2' a_2 + \cdots + x_n' a_n. \tag{11.11}$$

Subtracting both sides of (11.11) from (11.10), we obtain

$$0 = \left(x_1 - x_1'\right)a_1 + \left(x_2 - x_2'\right)a_2 + \cdots + \left(x_n - x_n'\right)a_n. \tag{11.12}$$

Linear independence of the vectors $a_1, a_2, \cdots, a_n$ implies $x_i - x_i' = 0$, i.e., $x_i = x_i'$ $(1 \le i \le n)$. These n linearly independent vectors are referred to as basis vectors.

A vector space that has a finite number of basis vectors is called finite-dimensional;
otherwise it is infinite-dimensional.
Alternatively, we express (11.10) as

$$x = (a_1 \cdots a_n)\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \tag{11.13}$$

A set of coordinates $\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$ is called a column vector (or a numerical vector) that indicates an “address” of the vector x with respect to the basis vectors $a_1, a_2, \cdots, a_n$. Any vector in Vⁿ can be expressed as a linear combination of the basis vectors and, hence, we say that Vⁿ is spanned by $a_1, a_2, \cdots, a_n$. This is represented as

$$V^n = \mathrm{Span}\{a_1, a_2, \cdots, a_n\}. \tag{11.14}$$

Let us think of a subset W of Vⁿ (i.e., W ⊂ Vⁿ). If the following relations hold for W, W is said to be a (linear) subspace of Vⁿ:

$$a, b \in W \Longrightarrow a + b \in W, \qquad a \in W \Longrightarrow ca \in W. \tag{11.15}$$

These two relations ensure that the relations of (11.1)–(11.8) hold for W as well. The dimension of W is equal to or smaller than n. For instance W = Span{a₁, a₂, ⋯, a_r} (r ≤ n) is a subspace of Vⁿ. If r = n, W = Vⁿ. Suppose that there are two subspaces W₁ = Span{a₁} and W₂ = Span{a₂}. Note that in this case W₁ ∪ W₂ is not a subspace, because W₁ ∪ W₂ does not contain a₁ + a₂. However, a set U defined by

$$U = \{x = x_1 + x_2;\ \forall x_1 \in W_1,\ \forall x_2 \in W_2\} \tag{11.16}$$

is a subspace of Vⁿ. We denote this subspace by W₁ + W₂.
To show this is in fact a subspace, suppose that x, y ∈ W₁ + W₂. Then, we may express x = x₁ + x₂ and y = y₁ + y₂, where x₁, y₁ ∈ W₁; x₂, y₂ ∈ W₂. We have x + y = (x₁ + y₁) + (x₂ + y₂), where x₁ + y₁ ∈ W₁ and x₂ + y₂ ∈ W₂ because both W₁ and W₂ are subspaces. Therefore, x + y ∈ W₁ + W₂. Meanwhile, with any scalar c, cx = cx₁ + cx₂ ∈ W₁ + W₂. By definition (11.15), W₁ + W₂ is a subspace accordingly. Suppose here x₁ ∈ W₁. Then, x₁ = x₁ + 0 ∈ W₁ + W₂. Then, W₁ ⊂ W₁ + W₂. Similarly, we have W₂ ⊂ W₁ + W₂. Thus, W₁ + W₂ contains both W₁ and W₂. Conversely, let W be an arbitrary subspace that contains both W₁ and W₂. Then, we have ∀x₁ ∈ W₁ ⊂ W and ∀x₂ ∈ W₂ ⊂ W and, hence, we have x₁ + x₂ ∈ W by definition (11.15). But, from (11.16) W₁ + W₂ = {x₁ + x₂; ∀x₁ ∈ W₁, ∀x₂ ∈ W₂}. Hence, W₁ + W₂ ⊂ W. Consequently, any subspace necessarily contains

[Fig. 11.1 Decomposition of a vector in a three-dimensional Cartesian space ℝ³ into two subspaces. (a) ℝ³ = W₁ + W₂; W₁ ∩ W₂ = Span{e₂}, where e₂ is a unit vector in the positive direction of the y-axis. (b) ℝ³ = W₁ + W₃; W₁ ∩ W₃ = {0}.]

W₁ + W₂. This implies that W₁ + W₂ is the smallest subspace that contains both W₁ and W₂.

Example 11.1 Consider a three-dimensional Cartesian space ℝ³ (Fig. 11.1). We regard the xy-plane and yz-plane as subspaces W₁ and W₂, respectively, and ℝ³ = W₁ + W₂. In Fig. 11.1a, a vector $\overrightarrow{OB}$ (in ℝ³) is expressed as $\overrightarrow{OA} + \overrightarrow{AB}$ (i.e., a sum of a vector in W₁ and one in W₂). Alternatively, the same vector $\overrightarrow{OB}$ can be expressed as $\overrightarrow{OA'} + \overrightarrow{A'B}$. On the other hand, we can designate a subspace in a different way; i.e., in Fig. 11.1b the z-axis is chosen for a subspace W₃ instead of W₂. We have ℝ³ = W₁ + W₃ as well. In this case, however, $\overrightarrow{OB}$ is uniquely expressed as $\overrightarrow{OB} = \overrightarrow{OP} + \overrightarrow{PB}$. Notice that in Fig. 11.1a W₁ ∩ W₂ = Span{e₂}, where e₂ is a unit vector in the positive direction of the y-axis. In Fig. 11.1b, on the other hand, we have W₁ ∩ W₃ = {0}.
We can generalize this example to the following theorem.
Theorem 11.1 Let W₁ and W₂ be subspaces of V and V = W₁ + W₂. Then a vector x in V is uniquely expressed as

$$x = x_1 + x_2, \quad x_1 \in W_1,\ x_2 \in W_2,$$

if and only if W₁ ∩ W₂ = {0}.


Proof Suppose W₁ ∩ W₂ = {0} and x = x₁ + x₂ = x₁′ + x₂′; x₁, x₁′ ∈ W₁, x₂, x₂′ ∈ W₂. Then x₁ − x₁′ = x₂′ − x₂. The LHS belongs to W₁ and the RHS belongs to W₂. Both sides belong to W₁ ∩ W₂ accordingly. Hence, from the supposition both sides should be equal to the zero vector. Therefore, x₁ = x₁′, x₂ = x₂′. This implies that x is expressed uniquely as x = x₁ + x₂. Conversely, suppose the vector representation (x = x₁ + x₂) is unique and x ∈ W₁ ∩ W₂. Then, x = x + 0 = 0 + x; x, 0 ∈ W₁ and x, 0 ∈ W₂. Uniqueness of the representation implies that x = 0. Consequently, W₁ ∩ W₂ = {0} follows.
In case W₁ ∩ W₂ = {0}, V = W₁ + W₂ is said to be a direct sum of W₁ and W₂, or we say that V is decomposed into a direct sum of W₁ and W₂. We symbolically denote this by

$$V = W_1 \oplus W_2. \tag{11.17}$$

In this case, the following equality holds:

$$\dim V = \dim W_1 + \dim W_2, \tag{11.18}$$

where “dim” stands for the dimension of the vector space considered. To prove (11.18), we suppose that V is an n-dimensional vector space and that W₁ and W₂ are spanned by r₁ and r₂ linearly independent vectors, respectively, such that

$$W_1 = \mathrm{Span}\left\{e_1^{(1)}, e_2^{(1)}, \cdots, e_{r_1}^{(1)}\right\} \quad\text{and}\quad W_2 = \mathrm{Span}\left\{e_1^{(2)}, e_2^{(2)}, \cdots, e_{r_2}^{(2)}\right\}. \tag{11.19}$$

This is equivalent to the dimension of W₁ and W₂ being r₁ and r₂, respectively. If V = W₁ + W₂ (here we do not assume that the summation is a direct sum), we have

$$V = \mathrm{Span}\left\{e_1^{(1)}, e_2^{(1)}, \cdots, e_{r_1}^{(1)}, e_1^{(2)}, e_2^{(2)}, \cdots, e_{r_2}^{(2)}\right\}. \tag{11.20}$$

Then, we have n ≤ r₁ + r₂. This is almost trivial: if we had r₁ + r₂ < n, these (r₁ + r₂) vectors could not span V, and we would need additional vectors so that all the vectors together span V. Thus, we have n ≤ r₁ + r₂ accordingly. That is,

$$\dim V \le \dim W_1 + \dim W_2. \tag{11.21}$$

Now, let us assume that V = W₁ ⊕ W₂. Then, $e_i^{(2)}$ (1 ≤ i ≤ r₂) must be linearly independent of $e_1^{(1)}, e_2^{(1)}, \cdots, e_{r_1}^{(1)}$. If not, $e_i^{(2)}$ could be described as a linear combination of $e_1^{(1)}, e_2^{(1)}, \cdots, e_{r_1}^{(1)}$. But this would imply that $e_i^{(2)} \in W_1$, i.e., $e_i^{(2)} \in W_1 \cap W_2$, in contradiction to V = W₁ ⊕ W₂; it is because W₁ ∩ W₂ = {0} by assumption. Likewise, $e_j^{(1)}$ (1 ≤ j ≤ r₁) is linearly independent of $e_1^{(2)}, e_2^{(2)}, \cdots, e_{r_2}^{(2)}$. Hence, $e_1^{(1)}, e_2^{(1)}, \cdots, e_{r_1}^{(1)}, e_1^{(2)}, e_2^{(2)}, \cdots, e_{r_2}^{(2)}$ must be linearly independent. Thus, n ≥ r₁ + r₂; this is because in the vector space V we may well have additional vector(s) that are independent of the above (r₁ + r₂) vectors. Meanwhile, n ≤ r₁ + r₂ from the above. Consequently, we must have n = r₁ + r₂. Thus, we have proven that

$$V = W_1 \oplus W_2 \Longrightarrow \dim V = \dim W_1 + \dim W_2.$$

Conversely, suppose n = r₁ + r₂. Then, any vector x in V is expressed uniquely as

$$x = \left(a_1 e_1^{(1)} + \cdots + a_{r_1}e_{r_1}^{(1)}\right) + \left(b_1 e_1^{(2)} + \cdots + b_{r_2}e_{r_2}^{(2)}\right). \tag{11.22}$$

The vector described by the first term is contained in W₁ and that described by the second term in W₂. Both terms are again expressed uniquely. Therefore, we get V = W₁ ⊕ W₂. This is a proof of

$$\dim V = \dim W_1 + \dim W_2 \Longrightarrow V = W_1 \oplus W_2.$$

The above statements are summarized as the following theorem: ∎


Theorem 11.2 Let V be a vector space and let W₁ and W₂ be subspaces of V. Also suppose V = W₁ + W₂. Then, the following relation holds:

$$\dim V \le \dim W_1 + \dim W_2. \tag{11.21}$$

Furthermore, we have

$$\dim V = \dim W_1 + \dim W_2, \tag{11.23}$$

if and only if V = W₁ ⊕ W₂.


Theorem 11.2 is readily extended to the case where there are three or more subspaces. That is, having W₁, W₂, ⋯, Wm so that V = W₁ + W₂ + ⋯ + Wm, we obtain the following relation:

$$\dim V \le \dim W_1 + \dim W_2 + \cdots + \dim W_m. \tag{11.24}$$

The equality of (11.24) holds if and only if V = W₁ ⊕ W₂ ⊕ ⋯ ⊕ Wm.


In light of Theorem 11.2, Example 11.1 says that 3 = dim ℝ³ < 2 + 2 = dim W₁ + dim W₂. But dim ℝ³ = 2 + 1 = dim W₁ + dim W₃. Therefore, we have ℝ³ = W₁ ⊕ W₃.

11.2 Linear Transformations of Vectors

In the previous section we introduced vectors and their calculation rules in a linear
vector space. It is natural and convenient to relate a vector to another vector, as a
function f relates a number (either real or complex) to another number such that y = f(x), where x and y are certain two numbers. A linear transformation from the vector space V to another vector space W is a mapping A: V → W such that

$$A(ca + db) = cA(a) + dA(b). \tag{11.25}$$

We will briefly discuss the concepts of mapping at the end of this section.
It is convenient to define addition of linear transformations. It is defined as

$$(A + B)a = Aa + Ba, \tag{11.26}$$

where a is any vector in V.


Since (11.25) is a broad but abstract definition, we begin with a well-known simple example of rotation of a vector within the xy-plane (Fig. 11.2). We denote an arbitrary position vector x in the xy-plane by

$$x = x e_1 + y e_2 = (e_1\ e_2)\begin{pmatrix} x \\ y \end{pmatrix}, \tag{11.27}$$

where e₁ and e₂ are unit basis vectors in the xy-plane and x and y are coordinates of the vector x in reference to e₁ and e₂. The expression (11.27) is consistent with (11.13). The rotation represented in Fig. 11.2 is an example of a linear transformation. We call this rotation R. According to the definition,

$$R(x e_1 + y e_2) = R(x) = x R(e_1) + y R(e_2). \tag{11.28}$$

Putting $R(x e_1 + y e_2) = x'$, $R(e_1) = e_1'$, and $R(e_2) = e_2'$,

$$x' = x e_1' + y e_2' = (e_1'\ e_2')\begin{pmatrix} x \\ y \end{pmatrix}. \tag{11.29}$$

From Fig. 11.2 we readily obtain

[Fig. 11.2 Rotation of a vector x by an angle θ within the xy-plane.]

$$e_1' = e_1\cos\theta + e_2\sin\theta, \qquad e_2' = -e_1\sin\theta + e_2\cos\theta. \tag{11.30}$$

Using a matrix representation,

$$(e_1'\ e_2') = (e_1\ e_2)\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}. \tag{11.31}$$

Substituting (11.30) into (11.29), we obtain

$$x' = (x\cos\theta - y\sin\theta)e_1 + (x\sin\theta + y\cos\theta)e_2. \tag{11.32}$$

Meanwhile, x′ can be expressed relative to the original basis vectors e₁ and e₂:

$$x' = x' e_1 + y' e_2. \tag{11.33}$$

Comparing (11.32) and (11.33), uniqueness of the representation ensures that

$$x' = x\cos\theta - y\sin\theta, \qquad y' = x\sin\theta + y\cos\theta. \tag{11.34}$$

Using a matrix representation once again,

$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}. \tag{11.35}$$

Further combining (11.29) and (11.31), we get

$$x' = R(x) = (e_1\ e_2)\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}. \tag{11.36}$$

The above example demonstrates that the linear transformation R has a (2, 2)
matrix representation shown in (11.36). Moreover, this example obviously shows
that if a vector is expressed as a linear combination of the basis vectors, the
“coordinates” (represented by a column vector) can be transformed as well by the
same matrix.
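A quick numerical illustration of (11.31) and (11.35) — a sketch assuming NumPy, variable names ours:

```python
import numpy as np

theta = np.pi/6
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

xy = np.array([1.0, 0.0])   # coordinates of x with respect to (e1, e2)
print(R @ xy)               # coordinates of x' = R(x), as in (11.35)

# The same R also acts on the basis row (e1 e2) from the right, as in (11.31)
E = np.eye(2)               # columns: e1, e2
print(E @ R)                # columns: e1', e2'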
Regarding an abstract n-dimensional linear vector space Vⁿ, the linear vector transformation A is given by

$$A(x) = (e_1 \cdots e_n)\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \tag{11.37}$$

where $e_1, e_2, \cdots, e_n$ are basis vectors and $x_1, x_2, \cdots, x_n$ are the corresponding coordinates of a vector $x = \sum_{i=1}^n x_i e_i$. We assume that the transformation is a mapping A: Vⁿ → Vⁿ (i.e., an endomorphism). In this case the transformation is represented by an (n, n) matrix. Note that the matrix operates on the basis vectors from the right and on the coordinates (i.e., a column vector) from the left. In (11.37) we often omit the parenthesis to simply write Ax.
Here we mention matrix notation for later convenience. We often identify a linear vector transformation with its representation matrix and denote both transformation and matrix by A. On this occasion, we write

$$A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}, \quad A = (A)_{ij} = (a_{ij}), \ \text{etc.}, \tag{11.38}$$

where with the second expression $(A)_{ij}$ and $(a_{ij})$ represent the matrix A itself, for we frequently use indexed matrix notations such as $A^{-1}$, $A^{\dagger}$, and $\widetilde{A}$. The notation (11.38) can conveniently be used in such cases. Note moreover that $a_{ij}$ represents the matrix A as well.
Equation (11.37) has a duality such that the matrix A operates either on the basis vectors or on the coordinates. This can explicitly be written as

$$A(x) = \left[(e_1 \cdots e_n)\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}\right]\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (e_1 \cdots e_n)\left[\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}\right].$$

That is, we assume the associative law with the above expression. Making a summation representation, we have

$$A(x) = \left(\sum\nolimits_{k=1}^n e_k a_{k1} \cdots \sum\nolimits_{k=1}^n e_k a_{kn}\right)\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (e_1 \cdots e_n)\begin{pmatrix} \sum_{l=1}^n a_{1l}x_l \\ \vdots \\ \sum_{l=1}^n a_{nl}x_l \end{pmatrix} = \sum_{k=1}^n\sum_{l=1}^n e_k a_{kl}x_l = \sum_{l=1}^n\left(\sum_{k=1}^n e_k a_{kl}\right)x_l = \sum_{k=1}^n\left(\sum_{l=1}^n a_{kl}x_l\right)e_k. \tag{11.39}$$

That is, the above equation can be viewed in either of two ways, i.e., coordinate transformation with fixed basis vectors or vector transformation with fixed coordinates.

Also, Eq. (11.37) can formally be written as

$$A(x) = (e_1 \cdots e_n)A\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (e_1 A \cdots e_n A)\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \tag{11.40}$$

where we assumed that the distributive law holds with operation of A on $(e_1 \cdots e_n)$. Meanwhile, if in (11.37) we put $x_i = 1$, $x_j = 0$ $(j \neq i)$, from (11.39) we get

$$A(e_i) = \sum_{k=1}^n e_k a_{ki}.$$

Therefore, (11.39) can be rewritten as

$$A(x) = (A(e_1) \cdots A(e_n))\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \tag{11.41}$$

Since $x_i$ $(1 \le i \le n)$ can arbitrarily be chosen, comparing (11.40) and (11.41) we have

$$e_i A = A(e_i) \quad (1 \le i \le n). \tag{11.42}$$

The matrix representation is unique in reference to the same basis vectors. Suppose that there is another matrix representation of the transformation A such that

$$A(x) = (e_1 \cdots e_n)\begin{pmatrix} a_{11}' & \cdots & a_{1n}' \\ \vdots & \ddots & \vdots \\ a_{n1}' & \cdots & a_{nn}' \end{pmatrix}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \tag{11.43}$$

Subtracting (11.43) from (11.37), we obtain

$$(e_1 \cdots e_n)\left[\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} - \begin{pmatrix} a_{11}' & \cdots & a_{1n}' \\ \vdots & \ddots & \vdots \\ a_{n1}' & \cdots & a_{nn}' \end{pmatrix}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}\right] = (e_1 \cdots e_n)\begin{pmatrix} \sum_{k=1}^n\left(a_{1k}-a_{1k}'\right)x_k \\ \vdots \\ \sum_{k=1}^n\left(a_{nk}-a_{nk}'\right)x_k \end{pmatrix} = 0.$$

 
On the basis of the linear independence of the basis vectors, $\sum_{k=1}^n\left(a_{ik} - a_{ik}'\right)x_k = 0$ $(1 \le i \le n)$. This relationship holds for any arbitrarily and independently chosen complex numbers $x_i$ $(1 \le i \le n)$. Therefore, we must have $a_{ik} = a_{ik}'$ $(1 \le i, k \le n)$, meaning that the matrix representation of A is unique with regard to fixed basis vectors.
Nonetheless, if a set of vectors $e_1, e_2, \cdots, e_n$ does not constitute basis vectors (i.e., those vectors are linearly dependent), the aforementioned uniqueness of the matrix representation loses its meaning. For instance, in V² take vectors e₁ and e₂ such that e₁ = e₂ (i.e., the two vectors are linearly dependent) and let the transformation matrix be $B = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}$. This means that e₁ should remain e₁ after the transformation. At the same time, the vector e₂ (= e₁) should be converted to 2e₂ (= 2e₁). This is impossible except for the case of e₁ = e₂ = 0. The above matrix B in its own right is an object of matrix algebra, of course.
Putting a = b = 0 and c = d = 1 in the definition (11.25) of the linear transformation, we obtain

$$A(\mathbf{0}) = A(\mathbf{0}) + A(\mathbf{0}).$$

Combining this relation with (11.4) gives

$$A(\mathbf{0}) = \mathbf{0}. \tag{11.44}$$

Then do we have a vector u ≠ 0 for which A(u) = 0? The answer is yes. This is because if the (2, 2) matrix $\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$ is chosen for R, we get a linear transformation such that

$$(e_1'\ e_2') = (e_1\ e_2)\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} = (e_1\ \mathbf{0}).$$

That is, we have $R(e_2) = 0$.
In general, vectors x (∈ V) satisfying A(x) = 0 form a subspace in a vector space V. This is because A(x) = 0 and A(y) = 0 ⟹ A(x + y) = A(x) + A(y) = 0 and A(cx) = cA(x) = 0. We call this subspace of maximum dimension a null-space and represent it as Ker A, where Ker stands for “kernel.” In other words, Ker A = A⁻¹(0). Note that this symbolic notation does not ensure the existence of the inverse transformation A⁻¹ (vide infra), but represents the set comprising the elements x that satisfy A(x) = 0. In the above example Ker R = Span{e₂}.
A(Vⁿ) (here Vⁿ is the vector space considered and we assume that A is an endomorphism of Vⁿ) also forms a subspace in Vⁿ. In fact, for any x, y ∈ Vⁿ, we have A(x), A(y) ∈ A(Vⁿ) ⟹ A(x) + A(y) = A(x + y) ∈ A(Vⁿ) and cA(x) = A(cx) ∈ A(Vⁿ). Obviously, A(Vⁿ) ⊂ Vⁿ. The subspace A(Vⁿ) is said to be an image of the

transformation A and is sometimes denoted by Im A. We have a so-called “dimension theorem” expressed as follows:

Theorem 11.3: Dimension Theorem Let Vⁿ be a linear vector space of dimension n. Also let A be an endomorphism: Vⁿ → Vⁿ. Then we have

$$\dim V^n = \dim A(V^n) + \dim\mathrm{Ker}\,A. \tag{11.45}$$

The number dim A(Vⁿ) is said to be the rank of the linear transformation A. That is, we write

$$\dim A(V^n) = \mathrm{rank}\,A.$$

Also the number dim Ker A is said to be the nullity of the linear transformation A. That is, we have

$$\dim\mathrm{Ker}\,A = \mathrm{nullity}\,A.$$

Thus, (11.45) can be written succinctly as

$$\dim V^n = \mathrm{rank}\,A + \mathrm{nullity}\,A.$$

Proof Let $e_1, e_2, \cdots, e_n$ be basis vectors of Vⁿ. First, assume

$$A(e_1) = A(e_2) = \cdots = A(e_n) = 0.$$

This implies that nullity A = n. Then, $A\left(\sum_{i=1}^n x_i e_i\right) = \sum_{i=1}^n x_i A(e_i) = 0$. Since the $x_i$ are arbitrarily chosen, the expression means that A(x) = 0 for ∀x ∈ Vⁿ. This implies A = 0. That is, rank A = 0. Thus, (11.45) certainly holds.
To proceed with the proof of the theorem, we think of a linear combination $\sum_{i=1}^n c_i e_i$. Next, assume that Ker A = Span{e₁, e₂, ⋯, e_ν} (ν < n); dim Ker A = ν. After A is operated on the above linear combination, we are left with $\sum_{i=\nu+1}^n c_i A(e_i)$. We put

$$\sum_{i=1}^n c_i A(e_i) = \sum_{i=\nu+1}^n c_i A(e_i) = 0. \tag{11.46}$$

Suppose that the (n − ν) vectors A(e_i) (ν + 1 ≤ i ≤ n) were linearly dependent. Then, without loss of generality we can assume $c_{\nu+1} \neq 0$. Dividing (11.46) by $c_{\nu+1}$ we obtain

$$A(e_{\nu+1}) + \frac{c_{\nu+2}}{c_{\nu+1}}A(e_{\nu+2}) + \cdots + \frac{c_n}{c_{\nu+1}}A(e_n) = 0, \quad\text{i.e.,}\quad A\left(e_{\nu+1} + \frac{c_{\nu+2}}{c_{\nu+1}}e_{\nu+2} + \cdots + \frac{c_n}{c_{\nu+1}}e_n\right) = 0.$$

Meanwhile, the (ν + 1) vectors $e_1, e_2, \cdots, e_\nu$ and $e_{\nu+1} + \frac{c_{\nu+2}}{c_{\nu+1}}e_{\nu+2} + \cdots + \frac{c_n}{c_{\nu+1}}e_n$ are linearly independent, because $e_1, e_2, \cdots, e_n$ are basis vectors of Vⁿ. This would imply that the dimension of Ker A is ν + 1, in contradiction to Ker A = Span{e₁, e₂, ⋯, e_ν}. Thus, the (n − ν) vectors A(e_i) (ν + 1 ≤ i ≤ n) must be linearly independent.
Let $V^{n-\nu}$ be described as

$$V^{n-\nu} = \mathrm{Span}\{A(e_{\nu+1}), A(e_{\nu+2}), \cdots, A(e_n)\}. \tag{11.47}$$

Then, $V^{n-\nu}$ is a subspace of Vⁿ, and so dim A(Vⁿ) ≥ n − ν = dim $V^{n-\nu}$. Meanwhile, from (11.46) we have

$$A\left(\sum_{i=\nu+1}^n c_i e_i\right) = 0.$$

From the above discussion, however, this relation holds if and only if $c_{\nu+1} = \cdots = c_n = 0$. This implies that Ker A ∩ $V^{n-\nu}$ = {0}. Then, we have

$$V^{n-\nu} + \mathrm{Ker}\,A = V^{n-\nu} \oplus \mathrm{Ker}\,A.$$

Meanwhile, from Theorem 11.2 we have

$$\dim\left[V^{n-\nu} \oplus \mathrm{Ker}\,A\right] = \dim V^{n-\nu} + \dim\mathrm{Ker}\,A = (n - \nu) + \nu = n = \dim V^n.$$

Thus, we must have dim A(Vⁿ) = n − ν = dim $V^{n-\nu}$. Since $V^{n-\nu}$ is a subspace of Vⁿ and $V^{n-\nu} \subset A(V^n)$ from (11.47), the equality of dimensions gives $V^{n-\nu} = A(V^n)$.
To conclude, we get

$$\dim A(V^n) + \dim\mathrm{Ker}\,A = \dim V^n, \qquad V^n = A(V^n) \oplus \mathrm{Ker}\,A. \tag{11.48}$$

This completes the proof. ∎


Comparing Theorem 11.3 with Theorem 11.2, we find that Theorem 11.3 is a
special case of Theorem 11.2. Equations (11.45) and (11.48) play an important role
in the theory of linear vector space.
As an exercise, we have a following example:

Example 11.2 Let $e_1, e_2, e_3, e_4$ be basis vectors of V⁴. Let A be an endomorphism of V⁴ described by

$$A = \begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 \end{pmatrix}.$$

We have

$$(e_1, e_2, e_3, e_4)\begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 \end{pmatrix} = (e_1 + e_3,\ e_2 + e_4,\ e_1 + e_2 + e_3 + e_4,\ \mathbf{0}).$$

That is,

$$A(e_1) = e_1 + e_3, \quad A(e_2) = e_2 + e_4, \quad A(e_3) = e_1 + e_2 + e_3 + e_4, \quad A(e_4) = 0.$$

We have

$$A(-e_1 - e_2 + e_3) = -A(e_1) - A(e_2) + A(e_3) = 0.$$

Then, we find

$$A(V^4) = \mathrm{Span}\{e_1 + e_3,\ e_2 + e_4\}, \quad \mathrm{Ker}\,A = \mathrm{Span}\{-e_1 - e_2 + e_3,\ e_4\}. \tag{11.49}$$

For any x ∈ V⁴, using scalars $c_i$ (1 ≤ i ≤ 4), we have

$$x = c_1 e_1 + c_2 e_2 + c_3 e_3 + c_4 e_4 = \frac{1}{2}(c_1 + c_3)(e_1 + e_3) + \frac{1}{2}(2c_2 + c_3 - c_1)(e_2 + e_4) + \frac{1}{2}(c_3 - c_1)(-e_1 - e_2 + e_3) + \frac{1}{2}(c_1 - 2c_2 - c_3 + 2c_4)e_4. \tag{11.50}$$

Thus, x has been uniquely represented as (11.50) with respect to the basis vectors $(e_1 + e_3)$, $(e_2 + e_4)$, $(-e_1 - e_2 + e_3)$, and $e_4$. The linear independence of these vectors can easily be checked by equating (11.50) with zero. We also confirm that

[Fig. 11.3 Concept of mapping from a set X to another set Y: injective, surjective, and bijective (= injective + surjective).]
 
$$V^4 = A(V^4) \oplus \mathrm{Ker}\,A.$$
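Example 11.2 and the dimension theorem can be confirmed numerically. A sketch assuming NumPy/SciPy; scipy.linalg.null_space returns an orthonormal basis of Ker A, which spans the same kernel as the vectors quoted in (11.49):

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1, 0, 1, 0],
              [0, 1, 1, 0],
              [1, 0, 1, 0],
              [0, 1, 1, 0]], dtype=float)

rank = np.linalg.matrix_rank(A)              # dim A(V^4) = 2
N = null_space(A)                            # N.shape[1] = nullity = 2
print(rank, N.shape[1], rank + N.shape[1])   # 2 2 4 = dim V^4

# -e1 - e2 + e3 and e4 indeed lie in Ker A
print(A @ np.array([-1, -1, 1, 0.]), A @ np.array([0, 0, 0, 1.]))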

Linear transformation is a kind of mapping. Figure 11.3 depicts the concept of mapping. Suppose two sets X and Y. The mapping f is a correspondence between an element x (∈ X) and y (∈ Y). The set f(X) (⊂ Y) is said to be the range of f. (i) The mapping f is injective: if x₁ ≠ x₂ ⟹ f(x₁) ≠ f(x₂). (ii) The mapping is surjective: f(X) = Y; for ∀y ∈ Y corresponding element(s) x ∈ X exist(s). (iii) The mapping is bijective: if the mapping f is both injective and surjective, it is said to be bijective (or a reversible or invertible mapping). A mapping that is not invertible is said to be non-invertible.
If the mapping f is bijective, a unique element x ∈ X exists for ∀y ∈ Y such that f(x) = y. In terms of solving an equation, we say that with any given y we can find a unique solution x to the equation f(x) = y. In this case x is said to be the inverse element to y, and this is denoted by x = f⁻¹(y). The mapping f⁻¹ is called the inverse mapping. If a linear transformation is relevant, the mapping is said to be an inverse transformation.
Here we focus on a case where both X and Y form a vector space and the mapping is an endomorphism. Regarding the linear transformation A: Vⁿ → Vⁿ (i.e., an endomorphism of Vⁿ), we have the following important theorem:

Theorem 11.4 Let A: Vⁿ → Vⁿ be an endomorphism of Vⁿ. A necessary and sufficient condition for the existence of an inverse transformation to A (i.e., A⁻¹) is A⁻¹(0) = {0}.

Proof Suppose A⁻¹(0) = {0}. Then A(x₁) = A(x₂) ⟺ A(x₁ − x₂) = 0 ⟺ x₁ − x₂ = 0, i.e., x₁ = x₂. This implies that the transformation A is injective. The other way round, suppose that A is injective. If A⁻¹(0) ≠ {0}, there should be b (≠ 0) with which A(b) = 0. This is, however, in contradiction to A being injective. Then we must have A⁻¹(0) = {0}. Thus, A⁻¹(0) = {0} ⟺ A is injective.
Meanwhile, A⁻¹(0) = {0} ⟺ dim A⁻¹(0) = 0 ⟺ dim A(Vⁿ) = n (due to Theorem 11.3), i.e., A(Vⁿ) = Vⁿ. Then A⁻¹(0) = {0} ⟺ A is surjective. Combining this with the abovementioned statement, we have A⁻¹(0) = {0} ⟺ A is bijective. This statement is equivalent to the existence of an inverse transformation.
In the proof of Theorem 11.4, to show that A⁻¹(0) = {0} ⟺ A is surjective, we have used the dimension theorem (Theorem 11.3), for which the relevant vector

space is finite (i.e., n-dimensional). In other words, that A is surjective is equivalent


to that A is injective with a finite-dimensional vector space, and vice versa. To
conclude, so far as we are thinking of the endomorphism of a finite-dimensional
vector space, if we can show it is either injective or surjective, the other necessarily
follows and, hence, the mapping is bijective. ∎

11.3 Inverse Matrices and Determinants

The existence of the inverse transformation plays a particularly important role in the theory of linear vector spaces. The inverse transformation is a linear transformation. Let $x_1 = A^{-1}(y_1)$ and $x_2 = A^{-1}(y_2)$. Also, we have $A(c_1x_1 + c_2x_2) = c_1A(x_1) + c_2A(x_2) = c_1y_1 + c_2y_2$. Thus, $c_1x_1 + c_2x_2 = A^{-1}(c_1y_1 + c_2y_2) = c_1A^{-1}(y_1) + c_2A^{-1}(y_2)$, showing that $A^{-1}$ is a linear transformation. As already mentioned, a matrix that represents a linear transformation A is uniquely determined with respect to fixed basis vectors. This should be the case with $A^{-1}$ accordingly. We have an important theorem for this.

Theorem 11.5 [1] The necessary and sufficient condition for the matrix $A^{-1}$ that represents the inverse transformation to A to exist is that det A ≠ 0 (“det” means a determinant). Here the matrix A represents the linear transformation A. The matrix $A^{-1}$ is uniquely determined and given by

$$\left(A^{-1}\right)_{ij} = (-1)^{i+j}(M)_{ji}/(\det A), \tag{11.51}$$

where $(M)_{ij}$ is the minor of det A corresponding to the element $A_{ij}$.


Proof First, we suppose that the matrix $A^{-1}$ exists so that it satisfies the following relation:

$$\sum_{k=1}^n\left(A^{-1}\right)_{ik}(A)_{kj} = \delta_{ij} \quad (1 \le i, j \le n). \tag{11.52}$$

On this condition, suppose that det A = 0. From the properties of determinants this implies that one of the columns of A (let it be the m-th column) can be expressed as a linear combination of the other columns of A such that

$$A_{km} = \sum_{j \neq m} A_{kj}c_j. \tag{11.53}$$

Putting i = m in (11.52), multiplying by $c_j$, and summing over j ≠ m, we get



 
$$\sum_{k=1}^n\left(A^{-1}\right)_{mk}\sum_{j \neq m} A_{kj}c_j = \sum_{j \neq m}\delta_{mj}c_j = 0. \tag{11.54}$$

From (11.52) and (11.53), on the other hand, we obtain

$$\sum_{k=1}^n\left(A^{-1}\right)_{mk}\sum_{j \neq m} A_{kj}c_j = \left\{\sum_{k=1}^n\left(A^{-1}\right)_{ik}(A)_{km}\right\}_{i=m} = 1. \tag{11.55}$$

There is an inconsistency between (11.54) and (11.55). The inconsistency resulted from the supposition that the matrix $A^{-1}$ exists. Therefore, we conclude that if det A = 0, $A^{-1}$ does not exist. Taking the contraposition of the above statement, we say that if $A^{-1}$ exists, then det A ≠ 0. Suppose next that det A ≠ 0. In this case, on the basis of the well-established result, a unique $A^{-1}$ exists and it is given by (11.51). This completes the proof. ∎
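Formula (11.51) translates directly into code. A brute-force sketch assuming NumPy (fine for small n; the function name is ours), compared against numpy.linalg.inv:

```python
import numpy as np

def inv_by_cofactors(A):
    """(A^{-1})_{ij} = (-1)^{i+j} (M)_{ji} / det A, as in (11.51)."""
    n = A.shape[0]
    det = np.linalg.det(A)
    inv = np.empty_like(A, dtype=float)
    for i in range(n):
        for j in range(n):
            # (M)_{ji}: delete row j and column i
            minor = np.delete(np.delete(A, j, axis=0), i, axis=1)
            inv[i, j] = (-1)**(i + j) * np.linalg.det(minor) / det
    return inv

A = np.array([[2., 1., 0.], [1., 3., 1.], [0., 1., 2.]])
print(np.allclose(inv_by_cofactors(A), np.linalg.inv(A)))  # -> True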
Summarizing the characteristics of the endomorphism within a finite-dimensional vector space, we have

injective ⟺ surjective ⟺ bijective ⟺ det A ≠ 0.

Let a matrix A be

$$A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}.$$

The determinant of the matrix A is denoted by det A or by

$$\begin{vmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{vmatrix}.$$

The determinant is defined as

$$\det A \equiv \sum_{\sigma=\left(\begin{smallmatrix} 1 & 2 & \cdots & n \\ i_1 & i_2 & \cdots & i_n \end{smallmatrix}\right)}\varepsilon(\sigma)\,a_{1i_1}a_{2i_2}\cdots a_{ni_n}, \tag{11.56}$$

where σ runs over the permutations of 1, 2, …, n and ε(σ) denotes a sign of + (in the case of even permutations) or − (in the case of odd permutations).
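Definition (11.56) can be transcribed almost literally. A sketch assuming NumPy and itertools — exponential in n, so for illustration only:

```python
import numpy as np
from itertools import permutations

def det_by_permutations(A):
    """det A = sum over sigma of eps(sigma) * a_{1,i1} ... a_{n,in}, cf. (11.56)."""
    n = A.shape[0]
    total = 0.0
    for sigma in permutations(range(n)):
        # parity via inversion count gives eps(sigma)
        inv = sum(sigma[a] > sigma[b] for a in range(n) for b in range(a+1, n))
        term = (-1.0)**inv
        for row, col in enumerate(sigma):
            term *= A[row, col]
        total += term
    return total

A = np.random.rand(5, 5)
print(np.isclose(det_by_permutations(A), np.linalg.det(A)))  # -> True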
We deal with triangle matrices for future discussion. An upper triangle matrix is denoted by

$$T = \begin{pmatrix} a_{11} & & * \\ & \ddots & \\ \text{\Large 0} & & a_{nn} \end{pmatrix}, \tag{11.57}$$

where the asterisk (*) means that the upper right off-diagonal elements can take any complex numbers (including zero), and the large zero shows that all the lower left off-diagonal elements are zero. Its determinant is given by

$$\det T = a_{11}a_{22}\cdots a_{nn}. \tag{11.58}$$

In fact, focusing on $a_{ni_n}$ we notice that only if $i_n = n$ does $a_{ni_n}$ not vanish. Then we get

$$\det T = \sum_{\sigma=\left(\begin{smallmatrix} 1 & 2 & \cdots & n-1 \\ i_1 & i_2 & \cdots & i_{n-1} \end{smallmatrix}\right)}\varepsilon(\sigma)\,a_{1i_1}a_{2i_2}\cdots a_{n-1\,i_{n-1}}\,a_{nn}. \tag{11.59}$$

Repeating this process, we finally obtain (11.58).


The endomorphic linear transformation can be described succinctly as

AðxÞ ¼ y, ð11:60Þ

where we have vectors such that x = ∑ni¼1 xi ei and y = ∑ni¼1 yi ei . In reference to the
same set of basis vectors (e1  en) and using a matrix representation, we have
0 10 1 0 1
a11    a1n x1 y1
B CB C B C
AðxÞ ¼ ðe1   en Þ@ ⋮ ⋱ ⋮ A@ ⋮ A ¼ ðe1   en Þ@ ⋮ A: ð11:61Þ
an1    ann xn yn

From the unique representation of a vector in reference to the basis vectors, (11.61) is
simply expressed as
0 10 1 0 1
a11    a1n x1 y1
B CB C B C
@⋮ ⋱ ⋮ A@ ⋮ A ¼ @ ⋮ A : ð11:62Þ
an1    ann xn yn

With a shorthand notation, we have


Xn
yi ¼ a x
k¼1 ik k
ð1  i  nÞ: ð11:63Þ

From the above discussion, for the linear transformation A to be bijective, det A ≠ 0. In terms of the system of linear equations, we say that for (11.62) to have a unique solution $\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$ for a given $\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}$, we must have det A ≠ 0. Conversely, det A = 0 is equivalent to (11.62) having indefinite solutions or no solution. As far as the matrix algebra is concerned, (11.61) is symbolically described by omitting the parenthesis as

$$Ax = y. \tag{11.64}$$

However, when the vector transformation is explicitly taken into account, the full representation of (11.61) should be borne in mind.
The relations (11.60) and (11.64) can be considered as a set of simultaneous equations. A necessary and sufficient condition for (11.64) to have a unique solution x for a given y is det A ≠ 0. In that case, the solution x of (11.64) can be symbolically expressed as

$$x = A^{-1}y, \tag{11.65}$$

where $A^{-1}$ represents the inverse matrix of A.


Example 11.3 Think of a three-dimensional rotation by θ in ℝ³ around the z-axis. The relevant transformation matrix is

$$R = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

As det R = 1 ≠ 0, the transformation is bijective. This means that for ∀y ∈ ℝ³ there is always a corresponding x ∈ ℝ³. This x can be found by solving Rx = y, i.e., x = R⁻¹y. Putting x = xe₁ + ye₂ + ze₃ and y = x′e₁ + y′e₂ + z′e₃, a matrix representation is given by

$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = R^{-1}\begin{pmatrix} x' \\ y' \\ z' \end{pmatrix} = \begin{pmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} x' \\ y' \\ z' \end{pmatrix}.$$

Thus, x can be obtained by rotating y by −θ.


Example 11.4 Think of the following matrix that represents a linear transformation P:

$$P = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}. \tag{11.66}$$

This matrix transforms a vector x = xe₁ + ye₂ + ze₃ into y = xe₁ + ye₂ as follows:

[Fig. 11.4 Example of an endomorphism P: ℝ³ → ℝ³.]

$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x \\ y \\ 0 \end{pmatrix}. \tag{11.67}$$

In this example we are thinking of an endomorphism P: ℝ³ → ℝ³. Geometrically, it can be viewed as in Fig. 11.4. Let us think of (11.67) from the point of view of solving a system of linear equations and newly consider the next equation. In other words, we are thinking of finding x, y, and z with given a, b, and c in (11.68):

$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} a \\ b \\ c \end{pmatrix}. \tag{11.68}$$

If c = 0, we can readily find a solution x = a, y = b, but z can be any (complex) number; we thus have indefinite solutions. If c ≠ 0, we have no solution. The former situation reflects the fact that the transformation represented by P is not injective. Meanwhile, the latter reflects the fact that the transformation is not surjective. Remember that as det P = 0, the transformation is neither injective nor surjective.

11.4 Basis Vectors and Their Transformations

In the previous sections we showed that a vector is uniquely represented as a column vector in reference to a set of fixed basis vectors. The representation, however, will be changed under a different set of basis vectors.
First let us think of a linear transformation of a set of basis vectors $e_1, e_2, \cdots, e_n$. The transformation matrix A representing a linear transformation A is defined as follows:

$$A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}.$$

Notice here that we often denote both a linear transformation and its corresponding matrix by the same character. After the transformation, suppose that the resulting vectors are given by $e_1', e_2', \cdots, e_n'$. This is explicitly described as

$$(e_1 \cdots e_n)\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix} = (e_1' \cdots e_n'). \tag{11.69}$$

With a shorthand notation, we have

$$e_i' = \sum_{k=1}^n a_{ki}e_k \quad (1 \le i \le n). \tag{11.70}$$

Care should be taken not to confuse (11.70) with (11.63). Here, the set of vectors $e_1', e_2', \cdots, e_n'$ may or may not be linearly independent. Let us operate with both sides of (11.70) from the left on $\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$ and equate both sides to zero. That is,

$$(e_1 \cdots e_n)\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (e_1' \cdots e_n')\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = 0.$$

Since $e_1, e_2, \cdots, e_n$ are the basis vectors, we get

$$\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = 0. \tag{11.71}$$

Meanwhile, we must have $\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = 0$ so that $e_1', e_2', \cdots, e_n'$ can be linearly independent (i.e., so as to be a set of basis vectors). But this means that (11.71) has such a unique (and trivial) solution and, hence, det A ≠ 0. If conversely det A ≠ 0, (11.71) has a unique trivial solution and $e_1', e_2', \cdots, e_n'$ are linearly independent. Thus, a necessary and sufficient condition for $e_1', e_2', \cdots, e_n'$ to be a set of basis vectors is det A ≠ 0. If det A = 0, (11.71) has indefinite solutions (including a trivial solution)

and $e_1', e_2', \cdots, e_n'$ are linearly dependent, and vice versa. In case det A ≠ 0, an inverse matrix $A^{-1}$ exists, and so we have

$$(e_1 \cdots e_n) = (e_1' \cdots e_n')A^{-1}. \tag{11.72}$$

In the previous steps we saw how the linear transformation (and the corresponding matrix representation) converts a set of basis vectors $(e_1 \cdots e_n)$ to another set of basis vectors $(e_1' \cdots e_n')$. Is it possible, then, to find a suitable transformation between two sets of arbitrarily chosen basis vectors? The answer is yes. This is because any vector can be expressed uniquely by a linear combination of any set of basis vectors. A whole array of such linear combinations uniquely defines a transformation matrix between the two sets of basis vectors as expressed in (11.69) and (11.72). The matrix has a nonzero determinant and has an inverse matrix. The concept of the transformation between basis vectors is important and very often used in various fields of natural science.
Example 11.5 We revisit Example 11.2. The relation (11.49) tells us that the basis vectors of A(V⁴) and those of Ker A span V⁴ in total. Therefore, in light of the above argument, there should be a linear transformation R between the two sets of vectors, i.e., $e_1, e_2, e_3, e_4$ and $e_1 + e_3,\ e_2 + e_4,\ -e_1 - e_2 + e_3,\ e_4$. Moreover, the matrix R associated with the linear transformation must be non-singular (i.e., det R ≠ 0). In fact, we find that R is expressed as

$$R = \begin{pmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & -1 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{pmatrix}.$$

This is because we have the following relation between the two sets of basis vectors:

$$(e_1, e_2, e_3, e_4)\begin{pmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & -1 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{pmatrix} = (e_1 + e_3,\ e_2 + e_4,\ -e_1 - e_2 + e_3,\ e_4).$$

We have det R = 2 ≠ 0 as expected.
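This change of basis is easy to check numerically: the columns of R are the coordinates of the new basis vectors with respect to (e₁ e₂ e₃ e₄). A sketch assuming NumPy:

```python
import numpy as np

R = np.array([[1, 0, -1, 0],
              [0, 1, -1, 0],
              [1, 0,  1, 0],
              [0, 1,  0, 1]], dtype=float)

E = np.eye(4)            # columns: e1, e2, e3, e4
print(E @ R)             # columns: e1+e3, e2+e4, -e1-e2+e3, e4
print(np.linalg.det(R))  # -> 2.0, non-singular as expected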


Next, let us consider successive linear transformations of vectors. Again we
assume that the transformations are endomorphism:V n ! V n. We have to take into
account transformations of basis vectors along with the targeted vectors. First we
choose a transformation by a non-singular matrix (having a nonzero determinant) for
the subsequent transformation to have a unique matrix representation (vide supra).
The vector transformation by P is expressed as

$$P(x) = (e_1 \cdots e_n)\begin{pmatrix} p_{11} & \cdots & p_{1n} \\ \vdots & \ddots & \vdots \\ p_{n1} & \cdots & p_{nn} \end{pmatrix}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \tag{11.73}$$

where the non-singular matrix P represents the transformation P. Notice here that the transformation and its matrix are represented by the same P. As mentioned in Sect. 11.2, the matrix P can be operated either from the right on the basis vectors or from the left on the column vector. We explicitly write

$$P(x) = \left[(e_1 \cdots e_n)\begin{pmatrix} p_{11} & \cdots & p_{1n} \\ \vdots & \ddots & \vdots \\ p_{n1} & \cdots & p_{nn} \end{pmatrix}\right]\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (e_1' \cdots e_n')\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \tag{11.74}$$

where $(e_1' \cdots e_n') = (e_1 \cdots e_n)P$ [here P is the non-singular matrix defined in (11.73)].
Alternatively, we have
$$P(x) = (e_1 \cdots e_n)\left[\begin{pmatrix} p_{11} & \cdots & p_{1n} \\ \vdots & \ddots & \vdots \\ p_{n1} & \cdots & p_{nn} \end{pmatrix}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}\right] = (e_1 \cdots e_n)\begin{pmatrix} x_1' \\ \vdots \\ x_n' \end{pmatrix}, \tag{11.75}$$

where

$$\begin{pmatrix} x_1' \\ \vdots \\ x_n' \end{pmatrix} = \begin{pmatrix} p_{11} & \cdots & p_{1n} \\ \vdots & \ddots & \vdots \\ p_{n1} & \cdots & p_{nn} \end{pmatrix}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \tag{11.76}$$

Equation (11.75) gives a column vector representation of the vector obtained by the transformation P, viewed in reference to the basis vectors $(e_1 \cdots e_n)$. Combining (11.74) and (11.75), we get

$$P(x) = (e_1' \cdots e_n')\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (e_1 \cdots e_n)\begin{pmatrix} x_1' \\ \vdots \\ x_n' \end{pmatrix}. \tag{11.77}$$

We further make another linear transformation A: Vⁿ → Vⁿ. In this case the corresponding matrix A may be non-singular (i.e., det A ≠ 0) or singular (det A = 0). We have to distinguish the matrix representations of the two cases, because the matrix representations are uniquely defined in reference to an individual set of basis vectors; see (11.37) and (11.43). Let us denote the matrices $A_O$ and A′ with respect to the basis vectors $(e_1 \cdots e_n)$ and $(e_1' \cdots e_n')$, respectively. Then, A[P(x)] can be described in two different ways as follows:

$$A[P(x)] = (e_1' \cdots e_n')A'\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (e_1 \cdots e_n)A_O\begin{pmatrix} x_1' \\ \vdots \\ x_n' \end{pmatrix}. \tag{11.78}$$

This can be rewritten in reference to the linearly independent set of vectors $e_1, \cdots, e_n$ as

$$A[P(x)] = [(e_1 \cdots e_n)P]A'\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (e_1 \cdots e_n)\left[A_O P\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}\right]. \tag{11.79}$$

As (11.79) is described for a vector $x = \sum_{i=1}^n x_i e_i$ arbitrarily chosen in Vⁿ, we get

$$PA' = A_O P. \tag{11.80}$$

Since P is non-singular, we finally obtain

$$A' = P^{-1}A_O P. \tag{11.81}$$

We can see (11.79) from the point of view of successive linear transformations. When the subsequent operation is viewed in reference to the basis vectors $e_1', \cdots, e_n'$ newly reached by the precedent transformation, we make it a rule to write the relevant subsequent operator A′ from the right. In the case where the subsequent operation is viewed in reference to the original basis vectors, on the other hand, we write the subsequent operator $A_O$ from the left. Further discussion and examples can be seen in Part IV.
We see (11.81) in a different manner. Suppose we have

$$A(x) = (e_1 \cdots e_n)A_O\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \tag{11.82}$$

Note that since the transformation A has been performed in reference to the basis vectors $(e_1 \cdots e_n)$, $A_O$ should be used for the matrix representation. This is rewritten as

$$A(x) = (e_1 \cdots e_n)PP^{-1}A_O PP^{-1}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \tag{11.83}$$

Meanwhile, any vector x in Vⁿ can be written as

$$x = (e_1 \cdots e_n)\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (e_1 \cdots e_n)PP^{-1}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (e_1' \cdots e_n')P^{-1}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (e_1' \cdots e_n')\begin{pmatrix} \tilde{x}_1 \\ \vdots \\ \tilde{x}_n \end{pmatrix}. \tag{11.84}$$

In (11.84) we put

$$(e_1 \cdots e_n)P = (e_1' \cdots e_n'), \qquad P^{-1}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} \tilde{x}_1 \\ \vdots \\ \tilde{x}_n \end{pmatrix}. \tag{11.85}$$

Equation (11.84) gives column vector representations of the same vector x viewed in reference to the basis set $e_1, \cdots, e_n$ or $e_1', \cdots, e_n'$. Equation (11.85) should not be confused with (11.76). That is, (11.76) relates the two coordinates $\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$ and $\begin{pmatrix} x_1' \\ \vdots \\ x_n' \end{pmatrix}$ of different vectors, before and after the transformation, viewed in reference to the same set of basis vectors. The relation (11.85), on the other hand, relates the two coordinates $\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$ and $\begin{pmatrix} \tilde{x}_1 \\ \vdots \\ \tilde{x}_n \end{pmatrix}$ of the same vector viewed in reference to different sets of basis vectors. Thus, from (11.83) we have

$$A(x) = (e_1' \cdots e_n')P^{-1}A_O P\begin{pmatrix} \tilde{x}_1 \\ \vdots \\ \tilde{x}_n \end{pmatrix}. \tag{11.86}$$

Meanwhile, viewing the transformation A in reference to the basis vectors $(e_1' \cdots e_n')$, we have

$$A(x) = (e_1' \cdots e_n')A'\begin{pmatrix} \tilde{x}_1 \\ \vdots \\ \tilde{x}_n \end{pmatrix}. \tag{11.87}$$

Equating (11.86) and (11.87), we get

$$A' = P^{-1}A_O P. \tag{11.88}$$

Thus, (11.81) is recovered.


The relations expressed by (11.81) and (11.88) are called a similarity transformation on $A_O$. The matrices $A_O$ and A′ are said to be similar to each other. As mentioned earlier, if A′ (and hence $A_O$) is non-singular, the linear transformation A produces a set of basis vectors other than $e_1', \cdots, e_n'$, say $e_1'', \cdots, e_n''$. We write this symbolically as

$$(e_1 \cdots e_n)PA = (e_1' \cdots e_n')A = (e_1'' \cdots e_n''). \tag{11.89}$$

Therefore, such A defines successive transformations of the basis vectors in conjunction with P defined in (11.73). The successive transformations and resulting basis vectors supply us with important applications in the field of group theory. The topics will be dealt with in Part IV.

Reference

1. Dennery P, Krzywicki A (1996) Mathematics for physicists. Dover, New York


Chapter 12
Canonical Forms of Matrices

In Sect. 11.4 we saw that the transformation matrices are altered depending on the
basis vectors we choose. Then a following question arises. Can we convert a
(transformation) matrix to as simple a form as possible by similarity transforma-
tion(s)? In Sect. 11.4 we have also shown that if we have two sets of basis vectors in
a linear vector space V n we can always find a non-singular transformation matrix
between the two. In conjunction with the transformation of the basis vectors, the
matrix undergoes similarity transformation. It is our task in this chapter to find a
simple form or a specific form (i.e., canonical form) of a matrix as a result of the
similarity transformation. For this purpose, we should first find eigenvalue(s) and
corresponding eigenvector(s) of the matrix. Depending upon the nature of matrices,
we get various canonical forms of matrices such as a triangle matrix and a diagonal
matrix. Regarding any form of matrices, we can treat these matrices under a unified
form called the Jordan canonical form.

12.1 Eigenvalues and Eigenvectors

An eigenvalue problem is one of the important subjects of the theory of linear vector spaces. Let A be a linear transformation on Vⁿ. The resulting matrix can be brought to one of several different canonical forms. A typical example is a diagonal matrix. To reach a satisfactory answer, we start with the so-called eigenvalue problem.
Suppose that after the transformation of x we have

$$A(x) = \alpha x, \tag{12.1}$$

where α is a certain (complex) number. Then we say that α is an eigenvalue and that
x is an eigenvector that corresponds to the eigenvalue α. Using a notation of (11.37)
of Sect. 11.2, we have


$$A(x) = (e_1 \cdots e_n)\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \alpha(e_1 \cdots e_n)\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.$$

From linear independence of $e_1, e_2, \cdots, e_n$, we simply write

$$\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \alpha\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.$$

If we identify x with $\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$ at fixed basis vectors $(e_1 \cdots e_n)$, we may naturally rewrite (12.1) as

$$Ax = \alpha x. \tag{12.2}$$

If $x_1$ and $x_2$ belong to the eigenvalue α, so do $x_1 + x_2$ and $cx_1$ (c is an appropriate complex number). Therefore, all the eigenvectors belonging to the eigenvalue α along with 0 (a zero vector) form a subspace (within Vⁿ) corresponding to the eigenvalue α.
Strictly speaking, we should use terminologies such as a “proper” (or ordinary) eigenvalue, eigenvector, and eigenspace to distinguish them from a “generalized” eigenvalue, eigenvector, and eigenspace. We will return to this point later. Further rewriting (12.2) we have

$$(A - \alpha E)x = 0, \tag{12.3}$$

where E is an (n, n) unit matrix. Equations (12.2) and (12.3) are said to be an eigenvalue equation (or eigenequation). In (12.2) or (12.3), x = 0 always holds (as a trivial solution). Consequently, for x ≠ 0 to be a solution we must have

$$|A - \alpha E| = 0. \tag{12.4}$$

In (12.4), |A − αE| stands for det(A − αE).
Now let us define the following polynomial:

$$f_A(x) = |xE - A| = \begin{vmatrix} x - a_{11} & \cdots & -a_{1n} \\ \vdots & \ddots & \vdots \\ -a_{n1} & \cdots & x - a_{nn} \end{vmatrix}. \tag{12.5}$$

A necessary and sufficient condition for α to be an eigenvalue is that α is a root of $f_A(x) = 0$. The function $f_A(x)$ is said to be a characteristic polynomial and we call $f_A(x) = 0$ a characteristic equation. This is an n-th order polynomial. Putting

$$f_A(x) = x^n + a_1 x^{n-1} + \cdots + a_n, \tag{12.6}$$

we have

$$a_1 = -(a_{11} + a_{22} + \cdots + a_{nn}) \equiv -\mathrm{Tr}\,A, \tag{12.7}$$

where Tr stands for “trace,” that is, a summation of the diagonal elements. Moreover,

$$a_n = (-1)^n|A|. \tag{12.8}$$

The characteristic equation $f_A(x) = 0$ has n roots including possible multiple roots. Let those roots be $\alpha_1, \cdots, \alpha_n$ (some of them may be identical). Then we have

$$f_A(x) = \prod_{i=1}^n(x - \alpha_i). \tag{12.9}$$

Furthermore, according to the relations between roots and coefficients we get

$$\alpha_1 + \cdots + \alpha_n = -a_1 = \mathrm{Tr}\,A, \tag{12.10}$$

$$\alpha_1\cdots\alpha_n = (-1)^n a_n = |A|. \tag{12.11}$$

The characteristic polynomial $f_A(x)$ is invariant under a similarity transformation. In fact,

$$f_{P^{-1}AP}(x) = \left|xE - P^{-1}AP\right| = \left|P^{-1}(xE - A)P\right| = |P|^{-1}|xE - A||P| = |xE - A| = f_A(x). \tag{12.12}$$

This leads to invariance of the trace under a similarity transformation. That is,

$$\mathrm{Tr}\left(P^{-1}AP\right) = \mathrm{Tr}\,A. \tag{12.13}$$

Let us think of the following triangle matrix:

T = \begin{pmatrix} a_{11} & & * \\ & \ddots & \\ 0 & & a_{nn} \end{pmatrix}.  (12.14)

A matrix of this type is regarded as one of the canonical forms of matrices. Its characteristic polynomial f_T(x) is

f_T(x) = |xE − T| = \begin{vmatrix} x - a_{11} & & * \\ 0 & \ddots & \\ 0 & 0 & x - a_{nn} \end{vmatrix}.  (12.15)

Therefore, we get

f_T(x) = ∏_{i=1}^{n} (x − a_{ii}),  (12.16)

where we used (11.58). From (12.14) and (12.16), we find accordingly that the eigenvalues of a triangle matrix are given by its diagonal elements. Our immediate task will be to examine whether and how a given matrix is converted to a triangle matrix through a similarity transformation. The following theorem is important.

Theorem 12.1 Every (n, n) square matrix can be converted to a triangle matrix by a similarity transformation [1].
Proof A triangle matrix is either an "upper" triangle matrix [in which all the lower left off-diagonal elements are zero; see (12.14)] or a "lower" triangle matrix (in which all the upper right off-diagonal elements are zero). In the present case we show the proof for the upper triangle matrix; for the lower triangle matrix the theorem is proven in a similar manner.

The proof proceeds by mathematical induction. First we show that the theorem is true for a (2, 2) matrix. Suppose that one of the eigenvalues of A_2 is α_1 and that its corresponding eigenvector is x_1. Then we have

A_2 x_1 = α_1 x_1,  (12.17)

where we assume that x_1 represents a column vector. Let a non-singular matrix be

P_1 = (x_1 p_1),  (12.18)

where p_1 is another column vector chosen in such a way that x_1 and p_1 are linearly independent, so that P_1 can be a non-singular matrix. Then, we have

P_1^{-1} A_2 P_1 = (P_1^{-1} A_2 x_1  P_1^{-1} A_2 p_1).  (12.19)

Meanwhile,

x_1 = (x_1 p_1) \begin{pmatrix} 1 \\ 0 \end{pmatrix} = P_1 \begin{pmatrix} 1 \\ 0 \end{pmatrix}.

Hence, we have

P_1^{-1} A_2 x_1 = α_1 P_1^{-1} x_1 = α_1 P_1^{-1} P_1 \begin{pmatrix} 1 \\ 0 \end{pmatrix} = α_1 \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} \alpha_1 \\ 0 \end{pmatrix}.  (12.20)

Thus (12.19) can be rewritten as

P_1^{-1} A_2 P_1 = \begin{pmatrix} \alpha_1 & * \\ 0 & * \end{pmatrix}.  (12.21)

This shows that Theorem 12.1 is true of a (2, 2) matrix A2.


Now let us show that Theorem 12.1 holds in the general case, i.e., for a (n, n) square matrix A_n. Let α_n be one of the eigenvalues of A_n. On the basis of the argument for the (2, 2) matrix case, suppose that after a suitable similarity transformation by a non-singular matrix P̃ we have

Ã_n = (P̃)^{-1} A_n P̃.  (12.22)

Then, we can describe Ã_n as

Ã_n = \begin{pmatrix} \alpha_n & * \\ 0 & A_{n-1} \end{pmatrix},  (12.23)

where A_{n−1} is a (n − 1, n − 1) matrix.

In (12.23), α_n is one of the eigenvalues of A_n. To show that (12.23) is valid, we use a similar argument as in the case of the (2, 2) matrix. That is, we set P̃ such that

P̃ = (a_n p_1 p_2 ⋯ p_{n−1}),

where a_n is an eigenvector corresponding to α_n and P̃ is a non-singular matrix formed by the n linearly independent column vectors a_n, p_1, p_2, ⋯, and p_{n−1}. Then we have

Ã_n = (P̃)^{-1} A_n P̃ = ((P̃)^{-1} A_n a_n  (P̃)^{-1} A_n p_1  (P̃)^{-1} A_n p_2 ⋯ (P̃)^{-1} A_n p_{n−1}).  (12.24)

The vector a_n can be expressed as

a_n = (a_n p_1 p_2 ⋯ p_{n−1}) \begin{pmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix} = P̃ \begin{pmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}.

Therefore, we have

(P̃)^{-1} A_n a_n = (P̃)^{-1} α_n a_n = (P̃)^{-1} α_n P̃ \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} = α_n \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} = \begin{pmatrix} \alpha_n \\ 0 \\ \vdots \\ 0 \end{pmatrix}.

Thus, from (12.24) we see that one can express the matrix form of Ã_n as (12.23). By the hypothesis of the mathematical induction, we assume that there exist a (n − 1, n − 1) non-singular square matrix P_{n−1} and an upper triangle matrix Δ_{n−1} such that

P_{n−1}^{-1} A_{n−1} P_{n−1} = Δ_{n−1}.  (12.25)

Let us define the following matrix:

P_n = \begin{pmatrix} d & 0 \cdots 0 \\ 0 & \\ \vdots & P_{n-1} \\ 0 & \end{pmatrix},

where d ≠ 0. The matrix P_n is a (n, n) non-singular square matrix; remember that det P_n = d (det P_{n−1}) ≠ 0. Operating with P_n on Ã_n from the right, we have

Ã_n P_n = \begin{pmatrix} \alpha_n & x_{n-1}^T \\ 0 & A_{n-1} \end{pmatrix} \begin{pmatrix} d & 0 \\ 0 & P_{n-1} \end{pmatrix} = \begin{pmatrix} \alpha_n d & x_{n-1}^T P_{n-1} \\ 0 & A_{n-1} P_{n-1} \end{pmatrix},  (12.26)

where x_{n−1}^T is the transpose of a column vector (x_1 ⋯ x_{n−1})^T. Therefore, x_{n−1}^T P_{n−1} is a (1, n − 1) matrix (i.e., a row vector). Meanwhile, we have

P_n Δ_n = \begin{pmatrix} d & 0 \\ 0 & P_{n-1} \end{pmatrix} \begin{pmatrix} \alpha_n & * \\ 0 & \Delta_{n-1} \end{pmatrix} = \begin{pmatrix} \alpha_n d & d\,* \\ 0 & P_{n-1}\Delta_{n-1} \end{pmatrix}.  (12.27)

From the assumption (12.25), A_{n−1}P_{n−1} = P_{n−1}Δ_{n−1}; choosing the first-row entries * of Δ_n as d^{-1} x_{n−1}^T P_{n−1}, the two right-hand sides coincide, and hence we have

LHS of (12.26) = LHS of (12.27).  (12.28)

That is,

Ã_n P_n = P_n Δ_n.  (12.29)

In (12.29) we define Δ_n such that

Δ_n = \begin{pmatrix} \alpha_n & * \\ 0 & \Delta_{n-1} \end{pmatrix},  (12.30)

which has appeared in the LHS of (12.27). As Δ_{n−1} is a triangle matrix by the induction hypothesis, Δ_n is a triangle matrix as well. Combining (12.22) and (12.29), we finally get

Δ_n = (P̃P_n)^{-1} A_n (P̃P_n).  (12.31)

Notice that P̃P_n is a non-singular matrix, and so (P̃P_n)(P_n^{-1}P̃^{-1}) = E. Hence, P_n^{-1}P̃^{-1} = (P̃P_n)^{-1}. Equation (12.31) obviously shows that A_n has been converted to a triangle matrix Δ_n. This completes the proof. ∎

Equation (12.31) implies that the eigenvalues are disposed on the diagonal positions of the triangle matrix. Triangle matrices can further undergo similarity transformations.
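Before moving to an example, it may help to see Theorem 12.1 realized numerically. The following Python sketch (an illustration assuming NumPy and SciPy are installed; not part of the formal development) uses the complex Schur decomposition, which is precisely a triangularization by a unitary similarity transformation:

```python
import numpy as np
from scipy.linalg import schur

# any (n, n) matrix is similar (here even unitarily similar) to an
# upper triangle matrix; its diagonal then carries the eigenvalues
A = np.array([[0., -1.],
              [1.,  0.]])            # rotation matrix; eigenvalues +i, -i
T, Z = schur(A, output='complex')    # A = Z T Z^H with Z unitary, T triangular
print(np.allclose(A, Z @ T @ Z.conj().T))   # True
print(np.diag(T))                    # the eigenvalues appear on the diagonal
```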
Example 12.1 Let us think of the following triangle matrix A:

A = \begin{pmatrix} 2 & 1 \\ 0 & 1 \end{pmatrix}.  (12.32)

The eigenvalues of A are 2 and 1. Remember that the diagonal elements of a triangle matrix give its eigenvalues. According to (12.20), the vector \begin{pmatrix} 1 \\ 0 \end{pmatrix} can be chosen as an eigenvector (in column vector representation) corresponding to the eigenvalue 2. Another eigenvector can be chosen to be \begin{pmatrix} 1 \\ -1 \end{pmatrix}. This is because for the eigenvalue 1 we obtain

\begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix} x = 0

as the eigenvalue equation (A − E)x = 0. Therefore, with an eigenvector \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} corresponding to the eigenvalue 1, we get c_1 + c_2 = 0, and so we have \begin{pmatrix} 1 \\ -1 \end{pmatrix} as a simple form of the eigenvector. Hence, putting P = \begin{pmatrix} 1 & 1 \\ 0 & -1 \end{pmatrix}, the similarity transformation is carried out such that

P^{-1}AP = \begin{pmatrix} 1 & 1 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} 2 & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 0 & -1 \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}.  (12.33)

This is a simple example of matrix diagonalization.
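The computation of (12.33) can be checked mechanically; a minimal SymPy sketch (assuming SymPy is available):

```python
import sympy as sp

A = sp.Matrix([[2, 1],
               [0, 1]])
P = sp.Matrix([[1,  1],
               [0, -1]])    # columns: the two eigenvectors found above
print(P.inv() * A * P)      # Matrix([[2, 0], [0, 1]]), reproducing (12.33)
```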


Regarding eigenvalue/eigenvector problems, we have another important theorem.

Theorem 12.2 Eigenvectors corresponding to different eigenvalues of A are linearly independent.

Proof We prove the theorem by mathematical induction. Let α_1 and α_2 be two different eigenvalues of a matrix A and let a_1 and a_2 be eigenvectors corresponding to α_1 and α_2, respectively.
Let us think of the following equation:

c_1 a_1 + c_2 a_2 = 0.  (12.34)

Suppose that a_1 and a_2 are linearly dependent. Then, without loss of generality we can put c_1 ≠ 0. Accordingly, we get

a_1 = −(c_2/c_1) a_2.  (12.35)

Operating A from the left on (12.35), we have

α_1 a_1 = −(c_2/c_1) α_2 a_2 = α_2 a_1.  (12.36)

In the second equality we have used (12.35). From (12.36) we have

(α_1 − α_2) a_1 = 0.  (12.37)

As α_1 ≠ α_2, α_1 − α_2 ≠ 0. This implies a_1 = 0, in contradiction to the fact that a_1 is an eigenvector. Thus, a_1 and a_2 must be linearly independent.
Next we assume that Theorem 12.2 is true in the case where we have (n − 1) eigenvalues α_1, α_2, ⋯, α_{n−1} that are different from one another and the corresponding eigenvectors a_1, a_2, ⋯, a_{n−1} are linearly independent. Let us think of the following equation:

c_1 a_1 + c_2 a_2 + ⋯ + c_{n−1} a_{n−1} + c_n a_n = 0,  (12.38)

where a_n is an eigenvector corresponding to an eigenvalue α_n. Suppose here that a_1, a_2, ⋯, a_n are linearly dependent. If c_n = 0, we have

c_1 a_1 + c_2 a_2 + ⋯ + c_{n−1} a_{n−1} = 0.  (12.39)

But, from the linear independence of a_1, a_2, ⋯, a_{n−1} we have c_1 = c_2 = ⋯ = c_{n−1} = 0. It would then follow that a_1, a_2, ⋯, a_n are linearly independent, in contradiction to the assumption. We should therefore have c_n ≠ 0. Accordingly, we get

a_n = −(1/c_n)(c_1 a_1 + c_2 a_2 + ⋯ + c_{n−1} a_{n−1}).  (12.40)

Operating A from the left on (12.38) again, we have

α_n a_n = −(1/c_n)(c_1 α_1 a_1 + c_2 α_2 a_2 + ⋯ + c_{n−1} α_{n−1} a_{n−1}).  (12.41)

Here we think of two cases: (i) α_n ≠ 0 and (ii) α_n = 0.

(i) Case I: α_n ≠ 0.
Multiplying both sides of (12.40) by α_n, we have

α_n a_n = −(1/c_n)(c_1 α_n a_1 + c_2 α_n a_2 + ⋯ + c_{n−1} α_n a_{n−1}).  (12.42)

Subtracting (12.42) from (12.41), we get

0 = (c_1/c_n)(α_n − α_1)a_1 + (c_2/c_n)(α_n − α_2)a_2 + ⋯ + (c_{n−1}/c_n)(α_n − α_{n−1})a_{n−1}.  (12.43)

Since we assume that the eigenvalues are different from one another, α_n ≠ α_1, α_n ≠ α_2, ⋯, α_n ≠ α_{n−1}. This implies that c_1 = c_2 = ⋯ = c_{n−1} = 0. From (12.40) we then have a_n = 0. This is, however, in contradiction to the fact that a_n is an eigenvector. This means that our original supposition that a_1, a_2, ⋯, a_n are linearly dependent was wrong. Thus, the eigenvectors a_1, a_2, ⋯, a_n must be linearly independent.

(ii) Case II: α_n = 0.
Suppose again that a_1, a_2, ⋯, a_n are linearly dependent. Since, as before, c_n ≠ 0, we get (12.40) and (12.41) once again. Putting α_n = 0 in (12.41), we have

0 = −(c_1/c_n) α_1 a_1 − (c_2/c_n) α_2 a_2 − ⋯ − (c_{n−1}/c_n) α_{n−1} a_{n−1}.  (12.44)

Since the eigenvalues are different, we should have α_1 ≠ 0, α_2 ≠ 0, ⋯, and α_{n−1} ≠ 0. Then, considering that a_1, a_2, ⋯, a_{n−1} are linearly independent, for (12.44) to hold we must have c_1 = c_2 = ⋯ = c_{n−1} = 0. But, from (12.40) we then have a_n = 0, again in contradiction to the fact that a_n is an eigenvector. Thus, the eigenvectors a_1, a_2, ⋯, a_n must be linearly independent. These complete the proof. ∎
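Theorem 12.2 is easy to probe numerically: for a matrix with distinct eigenvalues, the matrix whose columns are the eigenvectors has full rank. A small NumPy sketch with a hypothetical example matrix (assuming NumPy is available):

```python
import numpy as np

A = np.array([[1., 2., 0.],
              [0., 3., 1.],
              [0., 0., 5.]])      # triangle matrix: eigenvalues 1, 3, 5
w, V = np.linalg.eig(A)           # columns of V are eigenvectors
print(w)                          # three different eigenvalues
print(np.linalg.matrix_rank(V))   # 3: the eigenvectors are linearly independent
```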

12.2 Eigenspaces and Invariant Subspaces

Equations (12.21), (12.30), and (12.31) show that if we adopt an eigenvector as one of the basis vectors, the matrix representation of the linear transformation A in reference to such basis vectors is obtained so that the leftmost column is zero except for the top-left corner, on which an eigenvalue is positioned. (Note that if the said eigenvalue is zero, the leftmost column is a zero vector.) Meanwhile, neither Theorem 12.1 nor Theorem 12.2 tells us about the multiplicity of eigenvalues. If the eigenvalues have multiplicity, we have to think about different aspects. This is a major issue of this section.

Let us start with a discussion of invariant subspaces. Let A be a (n, n) square matrix. If a subspace W in V^n is characterized by

x ∈ W ⟹ Ax ∈ W,

W is said to be invariant with respect to A (or simply A-invariant), i.e., an invariant subspace in V^n = Span{a_1, a_2, ⋯, a_n}. Suppose that x is an eigenvector of A and that its corresponding eigenvalue is α. Then, Span{x} is an invariant subspace of V^n. This is because A(cx) = cAx = cαx = α(cx), so that cx is again an eigenvector belonging to α. Suppose that dim W = m (m ≤ n) and that W = Span{a_1, a_2, ⋯, a_m}. If W is A-invariant, A causes a linear transformation within W. In that case, expressing A in reference to a_1, a_2, ⋯, a_m, a_{m+1}, ⋯, a_n, we have

 
(a_1 a_2 ⋯ a_m a_{m+1} ⋯ a_n) A = (a_1 a_2 ⋯ a_m a_{m+1} ⋯ a_n) \begin{pmatrix} A_m & * \\ 0 & * \end{pmatrix},  (12.45)

where A_m is a (m, m) square matrix and the "zero" denotes a (n − m, m) zero matrix. Notice that the transformation A converts the remaining (n − m) basis vectors a_{m+1}, a_{m+2}, ⋯, a_n of V^n into linear combinations of a_1, a_2, ⋯, and a_n. The triangle matrix Δ_n given in (12.30) and (12.31) is an example in which A_m is a (1, 1) matrix (i.e., simply a complex number).
Let us examine the properties of A-invariant subspaces still further. Let a be any vector in V^n and think of the following (n + 1) vectors [2]:

a, Aa, A²a, ⋯, Aⁿa.

These vectors are linearly dependent, since there are at most n linearly independent vectors in V^n. These vectors span a subspace in V^n. Let us call this subspace M; i.e.,

M ≡ Span{a, Aa, A²a, ⋯, Aⁿa}.

Consider the following equation:

c_0 a + c_1 Aa + c_2 A²a + ⋯ + c_n Aⁿa = 0.  (12.46)

Not all c_i (0 ≤ i ≤ n) are zero, because the vectors are linearly dependent. Suppose that c_n ≠ 0. Then, from (12.46) we have

Aⁿa = −(1/c_n)(c_0 a + c_1 Aa + c_2 A²a + ⋯ + c_{n−1} A^{n−1}a).

Operating A on the above equation from the left, we have

A^{n+1}a = −(1/c_n)(c_0 Aa + c_1 A²a + c_2 A³a + ⋯ + c_{n−1} Aⁿa).

Thus, A^{n+1}a is contained in M. That is, we have A^{n+1}a ∈ Span{a, Aa, A²a, ⋯, Aⁿa}. Next, suppose that c_n = 0. Then, at least one of the c_i (0 ≤ i ≤ n − 1) is not zero. Suppose that c_{n−1} ≠ 0. From (12.46), we have

A^{n−1}a = −(1/c_{n−1})(c_0 a + c_1 Aa + c_2 A²a + ⋯ + c_{n−2} A^{n−2}a).

Operating A² on the above equation from the left, we have

A^{n+1}a = −(1/c_{n−1})(c_0 A²a + c_1 A³a + c_2 A⁴a + ⋯ + c_{n−2} Aⁿa).

Again, A^{n+1}a is contained in M. Repeating similar procedures, we reach

c_0 a + c_1 Aa = 0.

If c_1 = 0, then we must have c_0 ≠ 0. If so, a = 0. This is impossible, however, because we should have a ≠ 0. Then, we have c_1 ≠ 0 and, hence,

Aa = −(c_0/c_1) a.

Operating Aⁿ once again on the above equation from the left, we have

A^{n+1}a = −(c_0/c_1) Aⁿa.

Thus, once again A^{n+1}a is contained in M.


In the above discussion, we get AM ⊂ M. Operating A further, A²M ⊂ AM ⊂ M, A³M ⊂ A²M ⊂ AM ⊂ M, ⋯. Then we have

a, Aa, A²a, ⋯, Aⁿa, A^{n+1}a, ⋯ ∈ Span{a, Aa, A²a, ⋯, Aⁿa}.

That is, M is an A-invariant subspace. We also have

m ≡ dim M ≤ dim V^n = n.

There are m basis vectors in M, and so, representing A in matrix form in reference to the n basis vectors of V^n that include these m vectors, we have

A = \begin{pmatrix} A_m & * \\ 0 & * \end{pmatrix}.  (12.47)

Note again that, as in (12.45), V^n is spanned by the m basis vectors in M together with the other (n − m) linearly independent vectors.
We sometimes encounter a situation where two subspaces W_1 and W_2 are both A-invariant. Here we can take basis vectors a_1, a_2, ⋯, a_m for W_1 and a_{m+1}, a_{m+2}, ⋯, a_n for W_2. In reference to such a_1, a_2, ⋯, a_n as basis vectors of V^n, we have

A = \begin{pmatrix} A_m & 0 \\ 0 & A_{n-m} \end{pmatrix},  (12.48)

where A_{n−m} is a (n − m, n − m) square matrix and "zero" denotes either a (n − m, m) or a (m, n − m) zero matrix. Alternatively, we denote

A = A_m ⊕ A_{n−m}.

In this situation, the matrix A is said to be reducible.

As stated above, A causes a linear transformation within both W_1 and W_2. In other words, A_m and A_{n−m} cause linear transformations within W_1 and W_2 in reference to a_1, a_2, ⋯, a_m and a_{m+1}, a_{m+2}, ⋯, a_n, respectively. In this case V^n can be represented as a direct sum of W_1 and W_2 such that

V^n = W_1 ⊕ W_2 = Span{a_1, a_2, ⋯, a_m} ⊕ Span{a_{m+1}, a_{m+2}, ⋯, a_n}.  (12.49)

This is because Span{a_1, a_2, ⋯, a_m} ∩ Span{a_{m+1}, a_{m+2}, ⋯, a_n} = {0}. In fact, if the two subspaces possessed some x (≠ 0) in common, a_1, a_2, ⋯, a_n would become linearly dependent, in contradiction.
The vector space V^n may well be decomposed further into subspaces of lower dimension. For further discussion we need the following theorem together with the concepts of a generalized eigenvector and a generalized eigenspace.

Theorem 12.3: Hamilton–Cayley Theorem [3, 4] Let f_A(x) be the characteristic polynomial pertinent to a linear transformation A: V^n → V^n. Then f_A(A)x = 0 for ∀x ∈ V^n. That is, Ker f_A(A) = V^n.
Proof To prove the theorem we use the following well-known property of determinants. Namely, let A be a (n, n) square matrix expressed as

A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}.

Let Ã be the cofactor matrix of A, namely

Ã = \begin{pmatrix} \Delta_{11} & \cdots & \Delta_{1n} \\ \vdots & \ddots & \vdots \\ \Delta_{n1} & \cdots & \Delta_{nn} \end{pmatrix},

where Δ_{ij} is the cofactor of a_{ij}. Then

Ã^T A = A Ã^T = |A| E,  (12.50)

where Ã^T is the transposed matrix of Ã. We now apply (12.50) to the characteristic polynomial to get the following equation:

\widetilde{(xE − A)}^T (xE − A) = (xE − A) \widetilde{(xE − A)}^T = |xE − A| E = f_A(x) E,  (12.51)

where \widetilde{(xE − A)} is the cofactor matrix of xE − A. Let the cofactor of the (i, j)-element of (xE − A) be Δ_{ij}. Note in this case that Δ_{ij} is an at most (n − 1)-th order polynomial of x. Let us put accordingly

Δ_{ij} = b_{ij,0} x^{n−1} + b_{ij,1} x^{n−2} + ⋯ + b_{ij,n−1}.  (12.52)

Also, we put B_k = (b_{ij,k}).


Then we have

\widetilde{(xE − A)} = \begin{pmatrix} \Delta_{11} & \cdots & \Delta_{1n} \\ \vdots & \ddots & \vdots \\ \Delta_{n1} & \cdots & \Delta_{nn} \end{pmatrix}
= \begin{pmatrix} b_{11,0}x^{n-1} + \cdots + b_{11,n-1} & \cdots & b_{1n,0}x^{n-1} + \cdots + b_{1n,n-1} \\ \vdots & \ddots & \vdots \\ b_{n1,0}x^{n-1} + \cdots + b_{n1,n-1} & \cdots & b_{nn,0}x^{n-1} + \cdots + b_{nn,n-1} \end{pmatrix}
= x^{n−1} B_0 + x^{n−2} B_1 + ⋯ + B_{n−1}.  (12.53)

Thus, we get

|xE − A| E = (xE − A)(x^{n−1} B_0^T + x^{n−2} B_1^T + ⋯ + B_{n−1}^T).  (12.54)

Replacing x with A and considering that B_l^T (l = 0, ⋯, n − 1) is commutative with A, we have

f_A(A) = (A − A)(A^{n−1} B_0^T + A^{n−2} B_1^T + ⋯ + B_{n−1}^T) = 0.  (12.55)

This means that f_A(A)x = 0 for ∀x ∈ V^n. These complete the proof. ∎
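The Hamilton–Cayley theorem can be verified directly for any concrete matrix. The sketch below (SymPy assumed; the matrix is an arbitrary example) evaluates f_A(A) by Horner's scheme:

```python
import sympy as sp

x = sp.symbols('x')
A = sp.Matrix([[1, 2, 0],
               [0, 1, 1],
               [0, 0, 3]])
n = A.shape[0]
p = A.charpoly(x)                # characteristic polynomial f_A(x)
F = sp.zeros(n, n)
for c in p.all_coeffs():         # highest-order coefficient first
    F = F * A + c * sp.eye(n)    # Horner's scheme with a matrix argument
print(F == sp.zeros(n, n))       # True: f_A(A) = 0
```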


In relation to the Hamilton–Cayley theorem, we consider the important concept of a minimal polynomial.

Definition 12.1 Let f(x) be a polynomial of x with scalar coefficients such that f(A) = 0, where A is a (n, n) matrix. Among such f(x), the lowest-order polynomial with a highest-order coefficient of one is said to be the minimal polynomial. We denote it by φ_A(x); i.e., φ_A(A) = 0.

We have an important property here. Namely, the minimal polynomial φ_A(x) is a divisor of any polynomial f(x) with f(A) = 0. In fact, suppose that we have

f(x) = g(x)φ_A(x) + h(x).

Inserting A into x, we have

f(A) = g(A)φ_A(A) + h(A) = h(A) = 0.

From the above equation, h(x) would be a polynomial whose order is lower than that of φ_A(x) and which nonetheless satisfies h(A) = 0. But the presence of such a nonzero h(x) is in contradiction to the definition of the minimal polynomial. This implies that h(x) ≡ 0. Thus, we get

f(x) = g(x)φ_A(x).

That is, φ_A(x) must be a divisor of f(x).

Suppose that φ′_A(x) is another minimal polynomial. If the order of φ′_A(x) were lower than that of φ_A(x), we could have chosen φ′_A(x) for the minimal polynomial from the beginning. Thus we assume that φ_A(x) and φ′_A(x) are of the same order. We have

f(x) = g(x)φ_A(x) = g′(x)φ′_A(x).

Then, we have

φ_A(x)/φ′_A(x) = g′(x)/g(x) = c,

where c is a constant, because φ_A(x) and φ′_A(x) are of the same order. But c should be one, since the highest-order coefficient of a minimal polynomial is one. Thus, φ_A(x) is uniquely defined.
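For a matrix with a single eigenvalue α, the minimal polynomial is necessarily (x − α)^k for some k, so it can be found by testing successive powers. A minimal sketch (SymPy assumed; the matrix is a hypothetical example):

```python
import sympy as sp

x = sp.symbols('x')
A = sp.Matrix([[2, 1, 0],
               [0, 2, 0],
               [0, 0, 2]])         # single eigenvalue 2, multiplicity 3
n = A.shape[0]
print(sp.factor(A.charpoly(x).as_expr()))   # (x - 2)**3
alpha = 2
for k in range(1, n + 1):
    if (A - alpha * sp.eye(n))**k == sp.zeros(n, n):
        print(f"minimal polynomial: (x - {alpha})**{k}")   # here k = 2 < 3
        break
```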

12.3 Generalized Eigenvectors and Nilpotent Matrices

Equation (12.4) ensures that an eigenvalue is accompanied by an eigenvector. Therefore, if a matrix A : V^n → V^n has n different eigenvalues without multiple roots, the vector space V^n is decomposed into a direct sum of one-dimensional subspaces spanned by those individual linearly independent eigenvectors (see the discussion of Sect. 12.1). Thus, we have

V^n = W_1 ⊕ W_2 ⊕ ⋯ ⊕ W_n = Span{a_1} ⊕ Span{a_2} ⊕ ⋯ ⊕ Span{a_n},  (12.56)

where the a_i (1 ≤ i ≤ n) are eigenvectors corresponding to the n different eigenvalues. The situation, however, is not always this simple. Even when a matrix has eigenvalues with multiple roots, we may still have a simple case, as shown in the next example.

Example 12.2 Let us think of the following matrix A : V³ → V³:

A = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix}.  (12.57)

The matrix has a triple root 2. As can easily be seen below, the individual eigenvectors a_1, a_2, and a_3 form basis vectors of each invariant subspace, indicating that V³ can be decomposed into a direct sum of the three invariant subspaces as in (12.56):

(a_1 a_2 a_3) \begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix} = (2a_1 2a_2 2a_3).  (12.58)

Let us think of another simple example.


Example 12.3 Let us think of a linear transformation A : V² → V² expressed as

A(x) = (a_1 a_2) \begin{pmatrix} 3 & 1 \\ 0 & 3 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix},  (12.59)

where a_1 and a_2 are basis vectors and, for ∀x ∈ V², x = x_1 a_1 + x_2 a_2. This example has a double root 3. We have

(a_1 a_2) \begin{pmatrix} 3 & 1 \\ 0 & 3 \end{pmatrix} = (3a_1  a_1 + 3a_2).  (12.60)

Thus, Span{a_1} is A-invariant, but this is not the case with Span{a_2}. This implies that a_1 is an eigenvector corresponding to the eigenvalue 3 but a_2 is not. A detailed discussion of matrices of this kind follows below.
Nilpotent matrices play an important role in matrix algebra. These matrices are defined as follows.

Definition 12.2 Let N be a linear transformation in a vector space V^n. Suppose that we have

N^k = 0 and N^{k−1} ≠ 0,  (12.61)

where N is a (n, n) square matrix and k (≥ 2) is a certain natural number. Then, N is said to be a nilpotent matrix of order k, or a k-th order nilpotent matrix. If (12.61) holds with k = 1, N is a zero matrix.
Nilpotent matrices have the following properties:

(i) The eigenvalues of a nilpotent matrix are all zero. Let N be a k-th order nilpotent matrix. Suppose that

Nx = αx,  (12.62)

where α is an eigenvalue and x (≠ 0) is its corresponding eigenvector. Operating N (k − 1) more times from the left on both sides of (12.62), we have

N^k x = α^k x.  (12.63)

Meanwhile, N^k = 0 by definition, and so α^k = 0, namely α = 0.

Conversely, suppose that all the eigenvalues of a (n, n) matrix N are zero. From Theorem 12.1, via a suitable similarity transformation with P we have

P^{-1} N P = Ñ,

where Ñ is a triangle matrix. Then, using (12.12) and (12.16) we have

f_{P^{-1}NP}(x) = f_{Ñ}(x) = f_N(x) = xⁿ.

From Theorem 12.3, we have

f_N(N) = Nⁿ = 0.

Namely, N is a nilpotent matrix. In a trivial case, we have N = 0 (a zero matrix).

By Definition 12.2, we have N^k = 0 for a k-th order nilpotent (n, n) matrix N. Then, its minimal polynomial is φ_N(x) = x^k (k ≤ n).
(ii) A nilpotent matrix N is not diagonalizable (except for a zero matrix). Suppose that N is diagonalizable. Then N can be diagonalized by a non-singular matrix P such that

P^{-1} N P = 0.

The above equation holds because N only takes eigenvalues of zero. Operating P from the left on the above equation and P^{-1} from the right, we have

N = 0.

This means that N would be a zero matrix, in contradiction. Thus, a nilpotent matrix N is not diagonalizable.
Example 12.4 Let N be a matrix of the following form:

N = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}.

Then, we can easily check that N² = 0. Therefore, N is a nilpotent matrix of the second order. Note that N is an upper triangle matrix, and so its eigenvalues are given by the diagonal elements. In the present case the eigenvalue is zero (as a double root), consistent with the aforementioned property.

With a nilpotent matrix of order k, we have at least one vector x such that N^{k−1}x ≠ 0. Here we add that a zero transformation A (or matrix) can be defined by

A = 0 ⟺ Ax = 0 for ∀x ∈ V^n.  (12.64)

Taking the contraposition of this, we have

A ≠ 0 ⟺ Ax ≠ 0 for ∃x ∈ V^n.

In relation to nilpotent matrices, we have the following important theorem.

Theorem 12.4 If N is a k-th order nilpotent matrix, then for ∃x ∈ V^n the following k vectors are linearly independent:

x, Nx, N²x, ⋯, N^{k−1}x.

Proof Take x such that N^{k−1}x ≠ 0 and think of the following equation:

∑_{i=0}^{k−1} c_i N^i x = 0.  (12.65)

Multiplying (12.65) by N^{k−1} from the left and using N^k = 0, we get

c_0 N^{k−1} x = 0.  (12.66)

This implies that c_0 = 0. Thus, we are left with

∑_{i=1}^{k−1} c_i N^i x = 0.  (12.67)

Next, multiplying (12.67) by N^{k−2} from the left, we similarly get

c_1 N^{k−1} x = 0,  (12.68)

implying that c_1 = 0. Continuing this procedure, we finally get c_i = 0 (0 ≤ i ≤ k − 1). This completes the proof. ∎
This immediately tells us that for a k-th order nilpotent matrix N : V^n → V^n, we should have k ≤ n. This is because the number of linearly independent vectors is at most n.

In Example 12.4, N causes a transformation of the basis vectors in V² such that

(e_1 e_2) N = (e_1 e_2) \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} = (0 e_1).

That is, Ne_2 = e_1. Then, the two linearly independent vectors e_2 and Ne_2 correspond to the case of Theorem 12.4.
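Theorem 12.4 is likewise easy to check numerically. A sketch with a hypothetical third-order nilpotent matrix (NumPy assumed):

```python
import numpy as np

N = np.array([[0., 1., 0.],
              [0., 0., 1.],
              [0., 0., 0.]])       # N^2 != 0, N^3 = 0 (third order)
x = np.array([0., 0., 1.])         # chosen so that N^2 x != 0
chain = np.column_stack([x, N @ x, N @ N @ x])
print(np.linalg.matrix_rank(chain))                  # 3: x, Nx, N^2 x independent
print(np.allclose(np.linalg.matrix_power(N, 3), 0))  # True: N^3 = 0
```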
So far we have shown simple cases where matrices can be diagonalized via a similarity transformation. This is equivalent to saying that the relevant vector space is decomposed into a direct sum of (invariant) subspaces. Nonetheless, if the characteristic polynomial of the matrix has multiple root(s), it remains uncertain whether the vector space can be decomposed into such a direct sum. To answer this question, we need the following lemma.
Lemma 12.1 Let f_1(x), f_2(x), ⋯, and f_s(x) be polynomials without a common factor. Then there exist s polynomials M_1(x), M_2(x), ⋯, M_s(x) that satisfy the relation

M_1(x)f_1(x) + M_2(x)f_2(x) + ⋯ + M_s(x)f_s(x) = 1.  (12.69)

Proof Let M_i(x) (1 ≤ i ≤ s) be arbitrarily chosen polynomials and deal with the set of g(x) that can be expressed as

g(x) = M_1(x)f_1(x) + M_2(x)f_2(x) + ⋯ + M_s(x)f_s(x).  (12.70)

Let the whole set of such g(x) be H. Then H has the following two properties:

(i) g_1(x), g_2(x) ∈ H ⟹ g_1(x) + g_2(x) ∈ H,
(ii) g(x) ∈ H, M(x) an arbitrarily chosen polynomial ⟹ M(x)g(x) ∈ H.

Now let the lowest-order polynomial in the set H be g_0(x). Then all g(x) (∈ H) are multiples of g_0(x). Suppose that, dividing g(x) by g_0(x), we have

g(x) = M(x)g_0(x) + h(x),  (12.71)

where h(x) is a certain polynomial. Since g(x), g_0(x) ∈ H, we have h(x) ∈ H from the above properties (i) and (ii). If h(x) ≠ 0, the order of h(x) is lower than that of g_0(x) from (12.71). This is, however, in contradiction to the definition of g_0(x). Therefore, h(x) = 0. Thus,

g(x) = M(x)g_0(x).  (12.72)

This implies that H is identical to the whole set of polynomials comprising multiples of g_0(x). In particular, the polynomials f_i(x) (1 ≤ i ≤ s) belong to H; to show this, put M_i(x) = 1 with all the other M_j(x) = 0 ( j ≠ i). Hence, the polynomial g_0(x) should be a common factor of the f_i(x). Meanwhile, g_0(x) ∈ H, and so by virtue of (12.70) we have

g_0(x) = M_1(x)f_1(x) + M_2(x)f_2(x) + ⋯ + M_s(x)f_s(x).  (12.73)

By assumption the greatest common factor of f_1(x), f_2(x), ⋯, and f_s(x) is 1. This implies that g_0(x) = 1. Thus, we finally get (12.69) and complete the proof. ∎
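For two polynomials, the identity (12.69) is the Bézout relation produced by the extended Euclidean algorithm. A short SymPy sketch (assuming SymPy is available; f_1 and f_2 are hypothetical examples):

```python
import sympy as sp

x = sp.symbols('x')
f1 = (x - 1)**2
f2 = x - 2
M1, M2, g = sp.gcdex(f1, f2, x)     # M1*f1 + M2*f2 = gcd(f1, f2)
print(g)                            # 1: f1 and f2 have no common factor
print(sp.simplify(M1*f1 + M2*f2))   # 1, i.e., relation (12.69) with s = 2
```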

12.4 Idempotent Matrices and Generalized Eigenspaces

In Sect. 12.1 we showed that eigenvectors corresponding to different eigenvalues of A are linearly independent. Also, if those eigenvalues do not possess multiple roots, the vector space comprises a direct sum of the subspaces spanned by the corresponding eigenvectors. However, how do we treat a situation where the eigenvalues possess multiple roots? Even in this case there is at least one eigenvector that corresponds to each such eigenvalue. To adequately address the question, we need the concepts of generalized eigenvectors and generalized eigenspaces.

For this purpose we extend and generalize the concept of eigenvectors. For a certain natural number l, if a vector x (∈ V^n) satisfies the following relations, x is said to be a generalized eigenvector of rank l that corresponds to an eigenvalue α:

(A − αE)^l x = 0,  (A − αE)^{l−1} x ≠ 0.

Thus, an eigenvector as defined in (12.3) may be said to be a generalized eigenvector of rank 1. When we need to distinguish it from generalized eigenvectors of rank l (l ≥ 2), we call it a "proper" eigenvector. Thus far we have only been concerned with proper eigenvectors. We will present an important theorem related to the generalized eigenvectors below; in it, idempotent operators play a role. Their definition is simple.
Definition 12.3 An operator A is said to be idempotent if A² = A.

From this simple definition, we can draw several important pieces of information. Let A be an idempotent operator that operates on V^n and let x be an arbitrary vector in V^n. Then, A²x = A(Ax) = Ax. That is,

A(Ax − x) = 0.

Then, we have

Ax = x or Ax = 0.  (12.74)

From Theorem 11.3 (the dimension theorem), we have

V^n = A(V^n) ⊕ Ker A,

where

A(V^n) = {x; x ∈ V^n, Ax = x},  Ker A = {x; x ∈ V^n, Ax = 0}.  (12.75)

Thus, we find that A decomposes V^n into a direct sum of A(V^n) and Ker A. Conversely, we can readily verify that if there exists an operator A that satisfies (12.75), such an A must be an idempotent operator. The verification is left to readers. Meanwhile, we have

(E − A)² = E − 2A + A² = E − 2A + A = E − A.

Hence, E − A is an idempotent matrix as well. Moreover, we have

A(E − A) = (E − A)A = 0.

Putting E − A ≡ B and following a procedure similar to the above, we see that Bx = x is equivalent to (E − A)x = x, i.e., to x − Ax = x, i.e., to Ax = 0.

Therefore, B(V^n) = {x; x ∈ V^n, Bx = x} is identical to Ker A. Writing

W_A = {x; x ∈ V^n, Ax = x},  W̄_A = {x; x ∈ V^n, Ax = 0},  (12.76)

we get

W_A = A V^n,  W̄_A = (E − A)V^n = B V^n = Ker A.

That is,

V^n = W_A + W̄_A.

Suppose that ∃u ∈ W_A ∩ W̄_A. Then, from (12.76), Au = u and Au = 0, namely u = 0, and so V^n is a direct sum of W_A and W̄_A. That is,

V^n = W_A ⊕ W̄_A.

Notice that if we consider the identity operator E as an idempotent operator, we are thinking of a trivial case; that is, V^n = V^n ⊕ {0}.

The result can immediately be extended to the case where more idempotent operators take part in the vector space. Let us define operators such that

A_1 + A_2 + ⋯ + A_s = E,

where E is the (n, n) identity matrix. Also,

A_i A_j = A_i δ_{ij}.

Moreover, we define W_i such that

W_i = A_i V^n = {x; x ∈ V^n, A_i x = x}.  (12.77)

Then

V^n = W_1 ⊕ W_2 ⊕ ⋯ ⊕ W_s.  (12.78)

In fact, suppose that ∃x (∈ V^n) ∈ W_i, W_j (i ≠ j). Then A_i x = x = A_j x. Operating A_j ( j ≠ i) from the left, A_j A_i x = A_j x = A_j A_j x. That is, 0 = x = A_j x, implying that W_i ∩ W_j = {0}. Meanwhile,

V^n = (A_1 + A_2 + ⋯ + A_s)V^n = A_1 V^n + A_2 V^n + ⋯ + A_s V^n = W_1 + W_2 + ⋯ + W_s.  (12.79)

As W_i ∩ W_j = {0} (i ≠ j), (12.79) is a direct sum. Thus, (12.78) follows.


Example 12.5 Think of the following transformation A:

(e_1 e_2 e_3 e_4) A = (e_1 e_2 e_3 e_4) \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} = (e_1 e_2 0 0).

Put x = ∑_{i=1}^{4} x_i e_i. Then, we have

A(x) = ∑_{i=1}^{4} x_i A(e_i) = ∑_{i=1}^{2} x_i e_i ∈ W_A,

(E − A)(x) = x − A(x) = ∑_{i=3}^{4} x_i e_i ∈ W̄_A.

In the above,

(e_1 e_2 e_3 e_4)(E − A) = (e_1 e_2 e_3 e_4) \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} = (0 0 e_3 e_4).

Thus, we have

W_A = Span{e_1, e_2},  W̄_A = Span{e_3, e_4},  V⁴ = W_A ⊕ W̄_A.

The properties of idempotent matrices can easily be checked; this is left to readers.
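Checking those properties takes only a few lines. A NumPy sketch (assuming NumPy is available):

```python
import numpy as np

A = np.diag([1., 1., 0., 0.])    # the idempotent matrix of Example 12.5
E = np.eye(4)
print(np.allclose(A @ A, A))                   # A^2 = A
print(np.allclose((E - A) @ (E - A), E - A))   # E - A is idempotent as well
print(np.allclose(A @ (E - A), 0))             # A(E - A) = 0
xv = np.array([3., -1., 2., 5.])
print(A @ xv, (E - A) @ xv)      # the components in W_A and in Ker A
```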
Using the aforementioned idempotent operators, let us introduce the following theorem.

Theorem 12.5 [3, 4] Let A be a linear transformation V^n → V^n. Suppose that a vector x (∈ V^n) satisfies the following relation:

(A − αE)^l x = 0,  (12.80)

where l is a sufficiently large natural number. Then the set comprising all such x forms an A-invariant subspace that corresponds to the eigenvalue α. Let α_1, α_2, ⋯, α_s be the eigenvalues of A that are different from one another. Then V^n is decomposed into a direct sum of the A-invariant subspaces that correspond individually to α_1, α_2, ⋯, α_s. This is succinctly expressed as follows:

V^n = W̃_{α_1} ⊕ W̃_{α_2} ⊕ ⋯ ⊕ W̃_{α_s}.  (12.81)

Here W̃_{α_i} (1 ≤ i ≤ s) is given by

W̃_{α_i} = {x; x ∈ V^n, (A − α_i E)^{l_i} x = 0},  (12.82)

where l_i is a sufficiently large natural number. If the multiplicity of α_i is n_i, then dim W̃_{α_i} = n_i.
Proof Let us define the aforementioned A-invariant subspaces as W̃_{α_k} (1 ≤ k ≤ s). Let f_A(x) be the characteristic polynomial of A. Factorizing f_A(x) into a product of powers of first-degree polynomials, we have

f_A(x) = ∏_{i=1}^{s} (x − α_i)^{n_i},  (12.83)

where n_i is the multiplicity of α_i. Let us put

f_i(x) = f_A(x)/(x − α_i)^{n_i} = ∏_{j≠i}^{s} (x − α_j)^{n_j}.  (12.84)

Then f_1(x), f_2(x), ⋯, and f_s(x) do not have a common factor. Consequently, Lemma 12.1 tells us that there are polynomials M_1(x), M_2(x), ⋯, and M_s(x) such that

M_1(x)f_1(x) + ⋯ + M_s(x)f_s(x) = 1.  (12.85)

Replacing x with the matrix A, we get

M_1(A)f_1(A) + ⋯ + M_s(A)f_s(A) = E.  (12.86)

Or, defining M_i(A)f_i(A) ≡ A_i,

A_1 + A_2 + ⋯ + A_s = E,  (12.87)

where E is the (n, n) identity matrix. Moreover, we have

A_i A_j = A_i δ_{ij}.  (12.88)

In fact, if i ≠ j,

A_i A_j = M_i(A)f_i(A)M_j(A)f_j(A) = M_i(A)M_j(A)f_i(A)f_j(A)
  = M_i(A)M_j(A) ∏_{k≠i}^{s} (A − α_k E)^{n_k} ∏_{l≠j}^{s} (A − α_l E)^{n_l}
  = M_i(A)M_j(A) f_A(A) ∏_{k≠i,j}^{s} (A − α_k E)^{n_k} = 0.  (12.89)

The second equality results from the fact that M_j(A) and f_i(A) are commutative, since both are polynomials of A. The last equality follows from the Hamilton–Cayley theorem. On the basis of (12.87) and (12.89),

A_i = A_i E = A_i(A_1 + A_2 + ⋯ + A_s) = A_i².  (12.90)

Thus, we find that A_i is an idempotent matrix.


Next, let us show that, with the A_i determined above, A_i V^n is identical to W̃_{α_i} (1 ≤ i ≤ s). To this end, we define W_i such that

W_i = A_i V^n = {x; x ∈ V^n, A_i x = x}.  (12.91)

We have

(x − α_i)^{n_i} f_i(x) = f_A(x).  (12.92)

Therefore, from the Hamilton–Cayley theorem we have

(A − α_i E)^{n_i} f_i(A) = 0.  (12.93)

Operating with M_i(A) from the left and using the fact that M_i(A) commutes with (A − α_i E)^{n_i}, we get

(A − α_i E)^{n_i} A_i = 0,  (12.94)

where we used M_i(A)f_i(A) = A_i. Operating with both sides of this equation on V^n, furthermore, we have

(A − α_i E)^{n_i} A_i V^n = 0.

This means that

A_i V^n ⊂ W̃_{α_i} (1 ≤ i ≤ s).  (12.95)

Conversely, suppose that x ∈ W̃_{α_i}. Then (A − α_i E)^l x = 0 holds for a certain natural number l. If M_i(x)f_i(x) were divisible by x − α_i, the LHS of (12.85) would be divisible by x − α_i as well, leading to a contradiction. Thus, it follows that (x − α_i)^l and M_i(x)f_i(x) do not have a common factor. Consequently, Lemma 12.1 ensures that we have polynomials M(x) and N(x) such that

M(x)(x − α_i)^l + N(x)M_i(x)f_i(x) = 1

and, hence,

M(A)(A − α_i E)^l + N(A)M_i(A)f_i(A) = E.  (12.96)

Operating with both sides of (12.96) on x, we get

M(A)(A − α_i E)^l x + N(A)M_i(A)f_i(A)x = N(A)A_i x = x.  (12.97)

Notice that the first term of (12.97) vanishes by (12.82). As A_i is a polynomial of A, it commutes with N(A). Hence, we have

x = A_i[N(A)x] ∈ A_i V^n.  (12.98)

Thus, we get

W̃_{α_i} ⊂ A_i V^n (1 ≤ i ≤ s).  (12.99)

From (12.95) and (12.99), we conclude that

W̃_{α_i} = A_i V^n (1 ≤ i ≤ s).  (12.100)

In other words, W_i defined as (12.91) is identical to W̃_{α_i} defined as (12.82). Thus, we have

V^n = W_1 ⊕ W_2 ⊕ ⋯ ⊕ W_s

or

V^n = W̃_{α_1} ⊕ W̃_{α_2} ⊕ ⋯ ⊕ W̃_{α_s}.  (12.101)

This completes the former half of the proof. For the latter half, the proof is as follows.

Suppose that dim W̃_{α_i} = n′_i. In parallel with the decomposition of V^n into the direct sum of (12.81), A can be reduced as

A ~ \begin{pmatrix} A^{(1)} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & A^{(s)} \end{pmatrix},  (12.102)

where A^{(i)} (1 ≤ i ≤ s) is a (n′_i, n′_i) matrix and the symbol ~ indicates that A has been transformed by a suitable similarity transformation. The matrix A^{(i)} represents the linear transformation that A causes within W̃_{α_i}. We denote the identity matrix of order n′_i by E_{n′_i}. Equation (12.82) implies that the matrix represented by

N_i = A^{(i)} − α_i E_{n′_i}  (12.103)

is a nilpotent matrix. The order of a nilpotent matrix is at most n′_i (vide supra) and, hence, l_i can be taken as n′_i. With N_i we have

f_{N_i}(x) = |xE_{n′_i} − N_i| = |xE_{n′_i} − (A^{(i)} − α_i E_{n′_i})| = |(x + α_i)E_{n′_i} − A^{(i)}| = x^{n′_i}.  (12.104)

The last equality holds because the eigenvalues of a nilpotent matrix are all zero. Meanwhile,

f_{A^{(i)}}(x) = |xE_{n′_i} − A^{(i)}| = f_{N_i}(x − α_i) = (x − α_i)^{n′_i}.  (12.105)

Equation (12.105) implies that

f_A(x) = ∏_{i=1}^{s} f_{A^{(i)}}(x) = ∏_{i=1}^{s} (x − α_i)^{n′_i} = ∏_{i=1}^{s} (x − α_i)^{n_i}.  (12.106)

The last equality comes from (12.83). Thus, n′_i = n_i. These procedures complete the proof. At the same time, we may equate l_i in (12.82) to n_i. ∎
Theorem 12.1 shows that any square matrix can be converted to an (upper) triangle matrix by a similarity transformation. Theorem 12.5 demonstrates that the matrix can further be segmented according to the individual eigenvalues. Considering Theorem 12.1 again, A^{(i)} (1 ≤ i ≤ s) can be described as an upper triangle matrix by

A^{(i)} ~ \begin{pmatrix} \alpha_i & \cdots & * \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \alpha_i \end{pmatrix}.  (12.107)

Therefore, denoting N^{(i)} such that

N^{(i)} = A^{(i)} − α_i E_{n_i} = \begin{pmatrix} 0 & \cdots & * \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 0 \end{pmatrix},  (12.108)

we find that N^{(i)} is nilpotent. This is because all the eigenvalues of N^{(i)} are zero. From (12.108), we have

[N^{(i)}]^{μ_i} = 0 (μ_i ≤ n_i).
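The decomposition (12.81) can be made concrete with SymPy's nullspace, since W̃_{α_i} = Ker (A − α_iE)^{n_i}. A sketch with a hypothetical matrix (SymPy assumed):

```python
import sympy as sp

A = sp.Matrix([[3, 1, 0],
               [0, 3, 0],
               [0, 0, 5]])       # eigenvalue 3 (double), eigenvalue 5 (simple)
E = sp.eye(3)
W3 = ((A - 3*E)**2).nullspace()  # generalized eigenspace for alpha = 3
W5 = (A - 5*E).nullspace()       # (proper) eigenspace for alpha = 5
print(len(W3), len(W5))          # 2 and 1: the dimensions add up to n = 3
print(sp.Matrix.hstack(*W3, *W5).rank())   # 3: the direct sum spans V^3
```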

12.5 Decomposition of Matrix

To investigate the canonical forms of matrices, it is convenient if a matrix can be decomposed into appropriate forms. To this end, the following definition is important.

Definition 12.4 A matrix similar to a diagonal matrix is said to be semi-simple.

In the above definition, if a matrix is related to another matrix by a similarity transformation, those matrices are said to be similar to each other. When we have two matrices A and A′, we express this by A ~ A′ as stated above. This relation satisfies the equivalence law. That is,

(i) A ~ A, (ii) A ~ A′ ⟹ A′ ~ A, (iii) A ~ A′, A′ ~ A″ ⟹ A ~ A″.

Readers may check this. We have the following important theorem on matrix decomposition.

Theorem 12.6 [3, 4] Any (n, n) square matrix A is expressed uniquely as

A = S + N,  (12.109)

where S is semi-simple and N is nilpotent; S and N are commutable, i.e., SN = NS. Furthermore, S and N are polynomials of A with scalar coefficients.

Proof Using (12.86) and (12.87), we write

S = α_1 A_1 + ⋯ + α_s A_s = ∑_{i=1}^{s} α_i M_i(A)f_i(A).  (12.110)

Then, Eq. (12.110) is a polynomial of A. From Theorems 12.1 and 12.5, A^{(i)} (1 ≤ i ≤ s) in (12.102) is characterized by the facts that A^{(i)} is a triangle matrix whose eigenvalue α_i is positioned on the diagonal and that the order of A^{(i)} is identical to the multiplicity of α_i. Since A_i (1 ≤ i ≤ s) is an idempotent matrix, it can be diagonalized through a similarity transformation (see Sect. 12.7). In fact, S is transformed, via the same similarity transformation as in (12.102), into

S ~ \begin{pmatrix} \alpha_1 E_{n_1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \alpha_s E_{n_s} \end{pmatrix},  (12.111)

where E_{n_i} (1 ≤ i ≤ s) is an identity matrix of order n_i, identical to the multiplicity of α_i. This expression is equivalent to, e.g.,

A_1 ~ \begin{pmatrix} E_{n_1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 0 \end{pmatrix}

in (12.110). Thus, S is obviously semi-simple. Putting N = A − S, N is described after the above transformation as

N ~ \begin{pmatrix} N^{(1)} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & N^{(s)} \end{pmatrix},  N^{(i)} = A^{(i)} − α_i E_{n_i}.  (12.112)

Since each N^{(i)} is nilpotent as stated in Sect. 12.4, N is nilpotent as well. Also, N of (12.112) is a polynomial of A, as is S. Therefore, S and N are commutable.
To prove the uniqueness of the decomposition, we show the following:

(i) Let S and S′ be commutable semi-simple matrices. Then, those matrices are simultaneously diagonalized. That is, with a certain non-singular matrix P, P^{-1}SP and P^{-1}S′P are diagonalized at once. Hence, S ± S′ is semi-simple as well.
(ii) Let N and N′ be commutable nilpotent matrices. Then, N ± N′ is nilpotent as well.
(iii) A matrix both semi-simple and nilpotent is a zero matrix.
(i) Let the different eigenvalues of S be α_1, ⋯, α_s. Then, since S is semi-simple, the vector space V^n is decomposed into a direct sum of eigenspaces W_{α_i} (1 ≤ i ≤ s). That is, we have

V^n = W_{α_1} ⊕ ⋯ ⊕ W_{α_s}.

Since S and S′ are commutable, for ∃x ∈ W_{α_i} we have SS′x = S′Sx = S′(α_i x) = α_i S′x. Hence, we have S′x ∈ W_{α_i}. Namely, W_{α_i} is S′-invariant. Therefore, if we adopt basis vectors {a_1, ⋯, a_n} with respect to the direct sum decomposition, we get

S ~ \begin{pmatrix} \alpha_1 E_{n_1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \alpha_s E_{n_s} \end{pmatrix},  S′ ~ \begin{pmatrix} S'_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & S'_s \end{pmatrix}.

Since S′ is semi-simple, S′_i (1 ≤ i ≤ s) must be semi-simple as well. Here, let {e_1, ⋯, e_n} be the original basis vectors before the basis vector transformation and let P be the representation matrix of the said transformation. Then, we have

(e_1 ⋯ e_n)P = (a_1 ⋯ a_n).

Thus, we get

P^{-1}SP = \begin{pmatrix} \alpha_1 E_{n_1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \alpha_s E_{n_s} \end{pmatrix},  P^{-1}S′P = \begin{pmatrix} S'_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & S'_s \end{pmatrix}.

This means that both P^{-1}SP and P^{-1}S′P are diagonal (each semi-simple block S′_i can be diagonalized by a further similarity transformation within its own block, which leaves α_i E_{n_i} unchanged). That is, P^{-1}SP ± P^{-1}S′P = P^{-1}(S ± S′)P is diagonal, indicating that S ± S′ is semi-simple as well.
(ii) Suppose that N^ν = 0 and N′^{ν′} = 0. From the assumption, N and N′ are commutable. Consequently, using the binomial theorem we have

(N ± N′)^m = N^m ± mN^{m−1}N′ + ⋯ + (±1)^i \frac{m!}{i!(m−i)!} N^{m−i}N′^i + ⋯ + (±1)^m N′^m.  (12.113)

Putting m = ν + ν′ − 1: if i ≥ ν′, then N′^i = 0 from the supposition; if i < ν′, we have m − i > m − ν′ = ν + ν′ − 1 − ν′ = ν − 1, i.e., m − i ≥ ν, and therefore N^{m−i} = 0. Consequently, we have N^{m−i}N′^i = 0 for every i in (12.113). Thus, we get (N ± N′)^m = 0, indicating that N ± N′ is nilpotent.
(iii) Let S be a semi-simple and nilpotent matrix. We describe S as

S ~ \begin{pmatrix} \alpha_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \alpha_n \end{pmatrix},  (12.114)

where some of the α_i (1 ≤ i ≤ n) may be identical. Since S is nilpotent, all the α_i (1 ≤ i ≤ n) are zero. We then have S ~ 0; i.e., S = 0 accordingly.

Now, suppose that a matrix A is decomposed differently from (12.109). That is, we have

A = S + N = S′ + N′ or S − S′ = N′ − N.  (12.115)

From the assumption, S′ and N′ are commutable. Moreover, since S, S′, N, and N′ are described by polynomials of A, they are commutable with one another. Hence, from (i) and (ii) along with the second equation of (12.115), S − S′ and N′ − N are both semi-simple and nilpotent at once. Consequently, from (iii), S − S′ = N′ − N = 0. Thus, we finally get S = S′ and N = N′. That is, the decomposition is unique. These complete the proof. ∎

Theorem 12.6 implies that the matrix decomposition of (12.109) is unique. On the basis of Theorem 12.6, we investigate the Jordan canonical forms of matrices in the next section.

12.6 Jordan Canonical Form

Once the vector space V^n has been decomposed into a direct sum of generalized eigenspaces with the matrix reduced in parallel, we are able to deal with the individual eigenspaces W̃_{α_i} (1 ≤ i ≤ s) and the corresponding A^{(i)} (1 ≤ i ≤ s) separately.

12.6.1 Canonical Form of Nilpotent Matrix

To avoid complicating the notation, we think of the following situation, where we assume a (n, n) matrix that operates on V^n. Let the nilpotent matrix be N. Suppose that N^{ν−1} ≠ 0 and N^ν = 0 (1 ≤ ν ≤ n). If ν = 1, the nilpotent matrix N is a zero matrix. Notice that, since the characteristic polynomial is described by f_N(x) = xⁿ, we have f_N(N) = Nⁿ = 0 from the Hamilton–Cayley theorem. Let W^{(i)} be given such that

W^{(i)} = {x; x ∈ V^n, N^i x = 0}.

Then we have

V^n = W^{(ν)} ⊃ W^{(ν−1)} ⊃ ⋯ ⊃ W^{(1)} ⊃ W^{(0)} ≡ {0}.  (12.116)

Note that when ν = 1, we trivially have V^n = W^{(ν)} ⊃ W^{(0)} ≡ {0}. Let us put dim W^{(i)} = m_i, m_i − m_{i−1} = r_i (1 ≤ i ≤ ν), m_0 ≡ 0. Then we can add r_ν linearly independent vectors a_1, a_2, ⋯, a_{r_ν} to the basis vectors of W^{(ν−1)} so that those r_ν vectors can be basis vectors of W^{(ν)}. Unless N = 0 (i.e., a zero matrix), we must have at least one such vector; from the supposition, with some x ≠ 0 we have N^{ν−1}x ≠ 0, and so x ∉ W^{(ν−1)}. At least one such vector x is present and it is eligible as a basis vector of W^{(ν)}. Hence, r_ν ≥ 1 and we have

W^{(ν)} = Span{a_1, a_2, ⋯, a_{r_ν}} ⊕ W^{(ν−1)}.  (12.117)

Note that (12.117) is expressed as a direct sum. Meanwhile, Na_1, Na_2, ⋯, Na_{r_ν} ∈ W^{(ν−1)}. In fact, suppose that x ∈ W^{(ν)}; then N^ν x = N^{ν−1}(Nx) = 0, that is, Nx ∈ W^{(ν−1)}.

According to a reasoning similar to that made above, we have

Span{Na_1, Na_2, ⋯, Na_{r_ν}} ∩ W^{(ν−2)} = {0}.

Moreover, these r_ν vectors Na_1, Na_2, ⋯, Na_{r_ν} are linearly independent. Suppose that

c_1 Na_1 + c_2 Na_2 + ⋯ + c_{r_ν} Na_{r_ν} = 0.  (12.118)

Operating N^{ν−2} from the left, we have

N^{ν−1}(c_1 a_1 + c_2 a_2 + ⋯ + c_{r_ν} a_{r_ν}) = 0.

This would imply that c_1 a_1 + c_2 a_2 + ⋯ + c_{r_ν} a_{r_ν} ∈ W^{(ν−1)}. On the basis of the above argument, however, we must then have c_1 = c_2 = ⋯ = c_{r_ν} = 0; in other words, if the c_i (1 ≤ i ≤ r_ν) were nonzero, we would have a_i ∈ W^{(ν−1)}, in contradiction. From (12.118) this means the linear independence of Na_1, Na_2, ⋯, Na_{r_ν}.

As W^{(ν−1)} ⊃ W^{(ν−2)}, we may well have additional linearly independent vectors among the basis vectors of W^{(ν−1)}. Let those vectors be a_{r_ν+1}, ⋯, a_{r_{ν−1}}. Here we assume that the number of such vectors is r_{ν−1} − r_ν; we have r_{ν−1} − r_ν ≥ 0 accordingly. In this way we can construct the basis vectors of W^{(ν−1)} by including a_{r_ν+1}, ⋯, a_{r_{ν−1}} along with Na_1, Na_2, ⋯, Na_{r_ν}. As a result, we get
W^{(ν−1)} = Span{Na_1, ⋯, Na_{r_ν}, a_{r_ν+1}, ⋯, a_{r_{ν−1}}} ⊕ W^{(ν−2)}.

We can repeat these processes to construct W^{(ν−2)} such that

W^{(ν−2)} = Span{N²a_1, ⋯, N²a_{r_ν}, Na_{r_ν+1}, ⋯, Na_{r_{ν−1}}, a_{r_{ν−1}+1}, ⋯, a_{r_{ν−2}}} ⊕ W^{(ν−3)}.

For W^{(i)}, furthermore, we have

W^{(i)} = Span{N^{ν−i}a_1, ⋯, N^{ν−i}a_{r_ν}, N^{ν−i−1}a_{r_ν+1}, ⋯, N^{ν−i−1}a_{r_{ν−1}}, ⋯, a_{r_{i+1}+1}, ⋯, a_{r_i}} ⊕ W^{(i−1)}.  (12.119)

Further repeating the procedures, we exhaust all the n basis vectors of W^{(ν)} = V^n. These vectors are given as follows:

N^k a_{r_{i+1}+1}, ⋯, N^k a_{r_i} (1 ≤ i ≤ ν; 0 ≤ k ≤ i − 1).

At the same time we have

0 ≤ r_{ν+1} < 1 ≤ r_ν ≤ r_{ν−1} ≤ ⋯ ≤ r_1.  (12.120)

Table 12.1 [3, 4] shows the resulting structure of these basis vectors pertinent to Jordan blocks. In Table 12.1, counting the basis vectors laterally, from the top we have r_ν, r_{ν−1}, ⋯, r_1 vectors; their sum is n. This is the same number as that counted vertically. The dimension n of the vector space V^n is thus given by

n = ∑_{i=1}^{ν} r_i = ∑_{i=1}^{ν} i(r_i − r_{i+1}).  (12.121)

Let us examine the structure of Table 12.1 more closely. More specifically, let us inspect the i-layered structures of (r_i − r_{i+1}) vectors. Picking up a vector from among a_{r_{i+1}+1}, ⋯, a_{r_i}, we call it a_ρ. Then, we get the following set of vectors a_ρ, Na_ρ, N²a_ρ, ⋯, N^{i−1}a_ρ in the i-layered structure. These i vectors are displayed "vertically" in Table 12.1. They are linearly independent (see Theorem 12.4) and form an i-dimensional N-invariant subspace, i.e., Span{N^{i−1}a_ρ, N^{i−2}a_ρ, ⋯, Na_ρ, a_ρ}, where r_{i+1} + 1 ≤ ρ ≤ r_i. The matrix representation of the linear transformation N with respect to the set of these i vectors is
Table 12.1 Jordan blocks and structure of basis vectors for a nilpotent matrix^a

r_ν:     a_1, ⋯, a_{r_ν}
r_{ν−1}: Na_1, ⋯, Na_{r_ν};  a_{r_ν+1}, ⋯, a_{r_{ν−1}}
⋮        N²a_1, ⋯, N²a_{r_ν};  Na_{r_ν+1}, ⋯, Na_{r_{ν−1}};  ⋯
r_i:     N^{ν−i}a_1, ⋯, N^{ν−i}a_{r_ν};  N^{ν−i−1}a_{r_ν+1}, ⋯, N^{ν−i−1}a_{r_{ν−1}};  ⋯;  a_{r_{i+1}+1}, ⋯, a_{r_i}
r_{i−1}: N^{ν−i+1}a_1, ⋯;  N^{ν−i}a_{r_ν+1}, ⋯;  ⋯;  Na_{r_{i+1}+1}, ⋯, Na_{r_i}
⋮
r_2:     N^{ν−2}a_1, ⋯;  N^{ν−3}a_{r_ν+1}, ⋯;  ⋯;  N^{i−2}a_{r_{i+1}+1}, ⋯;  a_{r_3+1}, ⋯, a_{r_2}
r_1:     N^{ν−1}a_1, ⋯, N^{ν−1}a_{r_ν};  N^{ν−2}a_{r_ν+1}, ⋯, N^{ν−2}a_{r_{ν−1}};  ⋯;  N^{i−1}a_{r_{i+1}+1}, ⋯, N^{i−1}a_{r_i};  Na_{r_3+1}, ⋯, Na_{r_2};  a_{r_2+1}, ⋯, a_{r_1}

The column groups contain, from left to right, r_ν, r_{ν−1} − r_ν, ⋯, r_i − r_{i+1}, ⋯, r_2 − r_3, and r_1 − r_2 chains; the corresponding numbers of vectors are νr_ν, (ν − 1)(r_{ν−1} − r_ν), ⋯, i(r_i − r_{i+1}), ⋯, 2(r_2 − r_3), and r_1 − r_2, so that n = ∑_{k=1}^{ν} r_k.

^a Adapted from Satake I (1974) Linear algebra (Mathematics Library 1: in Japanese) [4], with the permission of Shokabo Co., Ltd., Tokyo

 
(N^{i−1}a_ρ  N^{i−2}a_ρ ⋯ Na_ρ  a_ρ) N
= (N^{i−1}a_ρ  N^{i−2}a_ρ ⋯ Na_ρ  a_ρ) \begin{pmatrix} 0 & 1 & & & \\ & 0 & 1 & & \\ & & \ddots & \ddots & \\ & & & 0 & 1 \\ & & & & 0 \end{pmatrix}.  (12.122)

These (i, i) matrices of (12.122) are called i-th order Jordan blocks. Notice that the number of those Jordan blocks is (r_i − r_{i+1}). Let us expressly define this number as [3, 4]

J_i = r_i − r_{i+1},  (12.123)

where J_i is the number of the i-th order Jordan blocks. The total number of Jordan blocks within the whole vector space V^n = W^{(ν)} is

∑_{i=1}^{ν} J_i = ∑_{i=1}^{ν} (r_i − r_{i+1}) = r_1.  (12.124)

Recalling the dimension theorem mentioned in (11.45), we have

dim V^n = dim Ker N^i + dim N^i(V^n) = dim Ker N^i + rank N^i.  (12.125)

Meanwhile, since W^{(i)} = Ker N^i, dim W^{(i)} = m_i = dim Ker N^i. From (12.116), m_0 ≡ 0. Then (11.45) now reads

dim V^n = m_i + rank N^i.  (12.126)

That is,

n = m_i + rank N^i  (12.127)

or

m_i = n − rank N^i.  (12.128)

Meanwhile, from Table 12.1 [3, 4] we have

dim W^{(i)} = m_i = ∑_{k=1}^{i} r_k,  dim W^{(i−1)} = m_{i−1} = ∑_{k=1}^{i−1} r_k.  (12.129)

Hence, we have

r_i = m_i − m_{i−1}.  (12.130)

Then we get [3, 4]

J_i = r_i − r_{i+1} = (m_i − m_{i−1}) − (m_{i+1} − m_i) = 2m_i − m_{i−1} − m_{i+1}
  = 2(n − rank N^i) − (n − rank N^{i−1}) − (n − rank N^{i+1})
  = rank N^{i−1} + rank N^{i+1} − 2 rank N^i.  (12.131)

The number J_i is therefore determined uniquely by N. The total number of Jordan blocks, r_1, is also computed using (12.128) and (12.130) as

r_1 = m_1 − m_0 = m_1 = n − rank N = dim Ker N,  (12.132)

where the last equality arises from the dimension theorem expressed as (11.45).
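Relation (12.131) lets one read off the Jordan structure of a nilpotent matrix from ranks alone. A NumPy sketch with a hypothetical N containing one second-order and one first-order block (NumPy assumed):

```python
import numpy as np

N = np.array([[0., 1., 0.],
              [0., 0., 0.],
              [0., 0., 0.]])
n = N.shape[0]
rank = lambda k: np.linalg.matrix_rank(np.linalg.matrix_power(N, k))
for i in range(1, n + 1):
    J_i = rank(i - 1) + rank(i + 1) - 2 * rank(i)   # Eq. (12.131)
    print(f"J_{i} = {J_i}")    # J_1 = 1, J_2 = 1, J_3 = 0
```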
In Table 12.1 [3, 4], moreover, we have two extreme cases. First, if ν = 1 in (12.116), i.e., N = 0, from (12.132) we have r_1 = n and r_2 = ⋯ = r_n = 0; see Fig. 12.1a. We also confirm n = ∑_{i=1}^{ν} r_i as in (12.121). In this case, all the eigenvectors are proper eigenvectors with multiplicity n and we have n first-order Jordan blocks. The other extreme is the case ν = n. In that case, we have

r_1 = r_2 = ⋯ = r_n = 1.  (12.133)

In the latter case, we also have n = ∑_{i=1}^{ν} r_i as in (12.121). We have only one proper eigenvector and (n − 1) generalized eigenvectors; see Fig. 12.1b. From (12.132), this special case, where we have only one n-th order Jordan block, occurs when rank N = n − 1.

12.6.2 Jordan Blocks

Let us think of (12.81) on the basis of (12.102). Picking up A^{(i)} from (12.102) and considering (12.108), we put

N_i = A^{(i)} − α_i E_{n_i} (1 ≤ i ≤ s),  (12.134)



[Fig. 12.1 Examples of the structure of Jordan blocks: (a) r_1 = n; (b) r_1 = r_2 = ⋯ = r_n = 1.]

where E_{n_i} denotes the (n_i, n_i) identity matrix. We express the nilpotent matrix as N_i as before. In (12.134) the number n_i corresponds to n in V^n of Sect. 12.6.1. As N_i is a (n_i, n_i) matrix, we have

N_i^{n_i} = 0.

Here we are speaking of ν-th order nilpotent matrices N_i such that N_i^{ν−1} ≠ 0 and N_i^ν = 0 (1 ≤ ν ≤ n_i). We can deal with N_i in a manner fully consistent with the theory we developed in Sect. 12.6.1. Each A^{(i)} comprises one or more Jordan blocks A^{(κ)} expressed as
A^{(κ)} = N_{κ_i} + α_i E_{κ_i} (1 ≤ κ_i ≤ n_i),  (12.135)

where A^{(κ)} denotes the κ-th Jordan block in A^{(i)}. In A^{(κ)}, N_{κ_i} and E_{κ_i} are a nilpotent (κ_i, κ_i) matrix and the (κ_i, κ_i) identity matrix, respectively. In N_{κ_i}, κ_i zeros are displayed on the principal diagonal and entries of 1 are positioned on the matrix elements next above the principal diagonal; all other entries are zero; see, e.g., the matrix of (12.122). As in Sect. 12.6.1, the number κ_i is called the dimension of the Jordan block. Thus, A^{(i)} of (12.102) can further be reduced to segmented matrices A^{(κ)}. Our next task is to find out how many Jordan blocks are contained in an individual A^{(i)} and what the dimensions of those Jordan blocks are.
Corresponding to (12.122), the matrix representation of the linear transformation by A^{(κ)} with respect to the set of κ_i vectors is

([A^{(κ)} − α_i E_{κ_i}]^{κ_i−1} a_σ  [A^{(κ)} − α_i E_{κ_i}]^{κ_i−2} a_σ ⋯ a_σ)[A^{(κ)} − α_i E_{κ_i}]
= ([A^{(κ)} − α_i E_{κ_i}]^{κ_i−1} a_σ  [A^{(κ)} − α_i E_{κ_i}]^{κ_i−2} a_σ ⋯ a_σ) \begin{pmatrix} 0 & 1 & & & \\ & 0 & 1 & & \\ & & \ddots & \ddots & \\ & & & 0 & 1 \\ & & & & 0 \end{pmatrix}.  (12.136)

The vector a_σ stands for a vector associated with the κ-th Jordan block of A^{(i)}. From (12.136) we obtain

[A^{(κ)} − α_i E_{κ_i}][A^{(κ)} − α_i E_{κ_i}]^{κ_i−1} a_σ = 0.  (12.137)

Namely,

A^{(κ)} [A^{(κ)} − α_i E_{κ_i}]^{κ_i−1} a_σ = α_i [A^{(κ)} − α_i E_{κ_i}]^{κ_i−1} a_σ.  (12.138)

This shows that [A^{(κ)} − α_i E_{κ_i}]^{κ_i−1} a_σ is a proper eigenvector of A^{(κ)} that corresponds to the eigenvalue α_i. On the other hand, a_σ is a generalized eigenvector of rank κ_i. There are another (κ_i − 2) generalized eigenvectors [A^{(κ)} − α_i E_{κ_i}]^μ a_σ (1 ≤ μ ≤ κ_i − 2). In total, there are κ_i eigenvectors [one proper eigenvector and (κ_i − 1) generalized eigenvectors]. Also we see that a sole proper eigenvector is found for each Jordan block.

In reference to these κ_i eigenvectors as the basis vectors, the (κ_i, κ_i) matrix A^{(κ)} (i.e., a Jordan block) is expressed as

A^{(κ)} = \begin{pmatrix} \alpha_i & 1 & & & \\ & \alpha_i & 1 & & \\ & & \ddots & \ddots & \\ & & & \alpha_i & 1 \\ & & & & \alpha_i \end{pmatrix}.  (12.139)

A (n_i, n_i) matrix A^{(i)} of (12.102) pertinent to an eigenvalue α_i contains a direct sum of Jordan blocks whose dimensions range from 1 to n_i. The largest possible number of Jordan blocks of dimension d (with [n_i/2] + 1 ≤ d ≤ n_i, where [μ] denotes the largest integer that does not exceed μ) is at most one.

An example depicted below is a matrix A^{(i)} that explicitly includes two one-dimensional Jordan blocks, a (3, 3) three-dimensional Jordan block, and a (5, 5) five-dimensional Jordan block:

A^{(i)} ~ \begin{pmatrix}
\alpha_i & & & & & & & & & \\
 & \alpha_i & & & & & & & & \\
 & & \alpha_i & 1 & & & & & & \\
 & & & \alpha_i & 1 & & & & & \\
 & & & & \alpha_i & & & & & \\
 & & & & & \alpha_i & 1 & & & \\
 & & & & & & \alpha_i & 1 & & \\
 & & & & & & & \alpha_i & 1 & \\
 & & & & & & & & \alpha_i & 1 \\
 & & & & & & & & & \alpha_i
\end{pmatrix},

where A^{(i)} is a (10, 10) upper triangle matrix in which α_i is displayed on the principal diagonal with entries 0 or 1 on the matrix elements next above the principal diagonal and with all other entries zero.
Theorem 12.1 shows that every (n, n) square matrix can be converted to a triangle matrix by a suitable similarity transformation, and the diagonal elements then give the eigenvalues. Furthermore, Theorem 12.5 ensures that A can be reduced to the generalized eigenspaces W̃_{α_i} (1 ≤ i ≤ s) according to the individual eigenvalues. Suppose, for example, that after a suitable similarity transformation a full matrix A is represented as

A ~ \begin{pmatrix}
\alpha_1 & & & & & & & & & & & \\
 & \alpha_2 & * & * & & & & & & & & \\
 & & \alpha_2 & * & & & & & & & & \\
 & & & \alpha_2 & & & & & & & & \\
 & & & & \ddots & & & & & & & \\
 & & & & & \alpha_i & * & * & * & & & \\
 & & & & & & \alpha_i & * & * & & & \\
 & & & & & & & \alpha_i & * & & & \\
 & & & & & & & & \alpha_i & & & \\
 & & & & & & & & & \ddots & & \\
 & & & & & & & & & & \alpha_s & * \\
 & & & & & & & & & & & \alpha_s
\end{pmatrix},  (12.140)

where

A = A^{(1)} ⊕ A^{(2)} ⊕ ⋯ ⊕ A^{(i)} ⊕ ⋯ ⊕ A^{(s)}.  (12.141)

In (12.141), A^{(1)} is a (1, 1) matrix (i.e., simply a number), A^{(2)} is a (3, 3) matrix, ⋯, A^{(i)} is a (4, 4) matrix, ⋯, and A^{(s)} is a (2, 2) matrix.
The above matrix form allows us to deal with the segmented triangle matrices separately. In the case of (12.140) we may use the following matrix for the similarity transformation:

P̃ = \begin{pmatrix}
1 & & & & & & & & \\
 & 1 & & & & & & & \\
 & & 1 & & & & & & \\
 & & & 1 & & & & & \\
 & & & & \ddots & & & & \\
 & & & & & P & & & \\
 & & & & & & \ddots & & \\
 & & & & & & & 1 & \\
 & & & & & & & & 1
\end{pmatrix},

where the (4, 4) matrix

P = \begin{pmatrix} p_{11} & p_{12} & p_{13} & p_{14} \\ p_{21} & p_{22} & p_{23} & p_{24} \\ p_{31} & p_{32} & p_{33} & p_{34} \\ p_{41} & p_{42} & p_{43} & p_{44} \end{pmatrix}

is a non-singular matrix occupying the rows and columns of the α_i-block. The matrix P is to be operated on A^{(i)} so that we can separately perform the similarity transformation with respect to the (4, 4) nilpotent matrix A^{(i)} − α_i E_4, following the procedures mentioned in Sect. 12.6.1.
Thus, only the α_i-associated segment can be treated, with the other segments left unchanged. In a similar fashion, we can consecutively deal with the matrix segments related to the other eigenvalues. In a practical case, however, it is more convenient to seek the different eigenvalues and the corresponding (generalized) eigenvectors at once and convert the matrix to the Jordan canonical form. To make a guess about the structure of a matrix, however, the following argument will be useful. Let us think of an example after that.
Using (12.140), we have

A − α_i E ~ \begin{pmatrix}
\alpha_1-\alpha_i & & & & & & & & & & & \\
 & \alpha_2-\alpha_i & * & * & & & & & & & & \\
 & & \alpha_2-\alpha_i & * & & & & & & & & \\
 & & & \alpha_2-\alpha_i & & & & & & & & \\
 & & & & \ddots & & & & & & & \\
 & & & & & 0 & * & * & * & & & \\
 & & & & & & 0 & * & * & & & \\
 & & & & & & & 0 & * & & & \\
 & & & & & & & & 0 & & & \\
 & & & & & & & & & \ddots & & \\
 & & & & & & & & & & \alpha_s-\alpha_i & * \\
 & & & & & & & & & & & \alpha_s-\alpha_i
\end{pmatrix}.

Here we are treating the (n, n) matrix A on V^n. Note that the matrix A − α_i E is not nilpotent as a whole. Suppose that the multiplicity of α_i is n_i; in (12.140) n_i = 4. Since the eigenvalues α_1, α_2, ⋯, and α_s take different values from one another, α_1 − α_i, α_2 − α_i, ⋯, and α_s − α_i ≠ 0. In a triangle matrix the diagonal elements give the eigenvalues and, hence, α_1 − α_i, α_2 − α_i, ⋯, and α_s − α_i are nonzero eigenvalues of A − α_i E. We rewrite (12.140) as

A − α_i E ~ \begin{pmatrix} M^{(1)} & & & & & \\ & M^{(2)} & & & & \\ & & \ddots & & & \\ & & & M^{(i)} & & \\ & & & & \ddots & \\ & & & & & M^{(s)} \end{pmatrix},  (12.142)

where

M^{(1)} = (α_1 − α_i),  M^{(2)} = \begin{pmatrix} \alpha_2-\alpha_i & * & * \\ & \alpha_2-\alpha_i & * \\ & & \alpha_2-\alpha_i \end{pmatrix}, ⋯,
M^{(i)} = \begin{pmatrix} 0 & * & * & * \\ & 0 & * & * \\ & & 0 & * \\ & & & 0 \end{pmatrix}, ⋯,  M^{(s)} = \begin{pmatrix} \alpha_s-\alpha_i & * \\ & \alpha_s-\alpha_i \end{pmatrix}.

Thus, M^{(p)} (p ≠ i) is a non-singular matrix and M^{(i)} is a nilpotent matrix. Note that if we can find μ_i such that [M^{(i)}]^{μ_i−1} ≠ 0 and [M^{(i)}]^{μ_i} = 0, then the minimal polynomial φ_{M^{(i)}}(x) for M^{(i)} is φ_{M^{(i)}}(x) = x^{μ_i}. Consequently, we get

(A − α_i E)^{μ_i} ~ \begin{pmatrix}
[M^{(1)}]^{\mu_i} & & & & & \\
 & [M^{(2)}]^{\mu_i} & & & & \\
 & & \ddots & & & \\
 & & & 0 & & \\
 & & & & \ddots & \\
 & & & & & [M^{(s)}]^{\mu_i}
\end{pmatrix}.  (12.143)

In (12.143) the diagonal elements of the non-singular triangle matrices [M^{(p)}]^{μ_i} (p ≠ i) are (α_p − α_i)^{μ_i} (≠ 0). Thus, we have a "perforated" matrix (A − α_i E)^{μ_i} in which [M^{(i)}]^{μ_i} = 0. Putting

Φ_A(x) ≡ ∏_{i=1}^{s} (x − α_i)^{μ_i},

we get

Φ_A(A) ≡ ∏_{i=1}^{s} (A − α_i E)^{μ_i} = 0.

The polynomial Φ_A(x) gives the minimal polynomial of A.


From the above argument, we can choose μi for li in (12.82). Meanwhile, M (i) in
(12.142) is identical to Ni in (12.134). Rewriting ((12.134)), we get

AðiÞ ¼ M ðiÞ þ αi E ni : ð12:144Þ


k
Let us think of matrices AðiÞ  αi E ni and (A  αiE)k (k  μi). From (11.45), we
find

dim V n ¼ n ¼ dim Ker ðA  αi E Þk þ rank ðA  αi E Þk , ð12:145Þ

and

dim W̃_{α_i} = n_i = dim Ker [A^{(i)} − α_i E_{n_i}]^k + rank [A^{(i)} − α_i E_{n_i}]^k,  (12.146)

where rank (A − α_i E)^k = dim (A − α_i E)^k (V^n) and rank [A^{(i)} − α_i E_{n_i}]^k = dim [A^{(i)} − α_i E_{n_i}]^k (W̃_{α_i}).

Noting that

dim Ker (A − α_i E)^k = dim Ker [A^{(i)} − α_i E_{n_i}]^k,  (12.147)

we get

n − rank (A − α_i E)^k = n_i − rank [A^{(i)} − α_i E_{n_i}]^k.  (12.148)

This notable property comes from the non-singularity of [M^{(p)}]^k (p ≠ i; k a positive integer); i.e., all the eigenvalues of [M^{(p)}]^k are nonzero. In particular, as rank [A^{(i)} − α_i E_{n_i}]^{μ_i} = 0, from (12.148) we have

rank (A − α_i E)^l = n − n_i (l ≥ μ_i).  (12.149)

Meanwhile, putting k = 1 in (12.148) and using (12.132), we get

dim Ker [A^{(i)} − α_i E_{n_i}] = n_i − rank [A^{(i)} − α_i E_{n_i}] = n − rank (A − α_i E) = dim Ker (A − α_i E) = r_1^{(i)}.  (12.150)

The value r_1^{(i)} gives the number of Jordan blocks with the eigenvalue α_i.
Moreover, we must consider a following situation. We know how the matrix A(i)
in (10.134) is reduced to Jordan blocks of lower dimension. To get detailed infor-
mation about it, however, we have to get the information about (generalized)
eigenvalues corresponding to eigenvalues other than αi. In this context, (12.148) is
useful. Equation (12.131) tells how the number of Jordan blocks in a nilpotent matrix
is determined. If we can get this knowledge before we have found out all the
(generalized) eigenvectors, it will be easier to address the problem. Let us rewrite
(12.131) as
h iq1 h iqþ1
J ðqiÞ ¼ r q  r qþ1 ¼ rank AðiÞ  αi Eni þ rank AðiÞ  αi E ni
h iq
 2rank AðiÞ  αi E ni , ð12:151Þ
12.6 Jordan Canonical Form 501

where we define J ðqiÞ as the number of the q-th order Jordan blocks within A(i). Note
that these blocks are expressed as (q, q) matrices. Meanwhile, using (12.148) J ðqiÞ is
expressed as

J ðqiÞ ¼ rank ðA  αi EÞq1 þ rank ðA  αi E Þqþ1  2rank ðA  αi E Þq : ð12:152Þ

This relation can be obtained by replacing $k$ in (12.148) with $q-1$, $q+1$, and $q$, respectively, and deleting $n$ and $n_i$ from these three relations. This enables us to gain access to the whole structure of the linear transformation represented by the $(n, n)$ matrix $A$ without reducing it to subspaces.
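Equation (12.152) lends itself directly to numerical computation. The following is a minimal sketch (assuming NumPy; the function name is ours) that counts the $q$-th order Jordan blocks of a matrix for a given eigenvalue from matrix ranks alone, without computing any eigenvectors:

```python
import numpy as np

def jordan_block_count(A, alpha, q):
    """Number of q-th order Jordan blocks of A for the eigenvalue alpha,
    evaluated via (12.152):
        J_q = rank(B^(q-1)) + rank(B^(q+1)) - 2*rank(B^q),  B = A - alpha*E."""
    B = A - alpha * np.eye(A.shape[0])
    rank = lambda k: np.linalg.matrix_rank(np.linalg.matrix_power(B, k))
    return rank(q - 1) + rank(q + 1) - 2 * rank(q)
```

For the matrix $A$ of (12.153) below, `jordan_block_count(A, 2, 1)` and `jordan_block_count(A, 2, 2)` both return 1, in agreement with (12.164) and (12.165).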
To enrich our understanding of Jordan canonical forms, the following tangible example will be beneficial.
12.6.3 Example of Jordan Canonical Form

Let us think of the following matrix $A$:

$$A = \begin{pmatrix} 1 & -1 & 0 & 0 \\ 1 & 3 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ -3 & -1 & -2 & 4 \end{pmatrix}. \qquad (12.153)$$

The characteristic polynomial $f_A(x)$ is given by

$$
f_A(x) = \begin{vmatrix} x-1 & 1 & 0 & 0 \\ -1 & x-3 & 0 & 0 \\ 0 & 0 & x-2 & 0 \\ 3 & 1 & 2 & x-4 \end{vmatrix} = (x-4)(x-2)^3. \qquad (12.154)
$$

Equating (12.154) to zero, we get an eigenvalue 4 as a simple root and an eigenvalue 2 as a triple root. The vector space $V^4$ is then decomposed into two invariant subspaces. The first is a one-dimensional kernel (or null-space) of the transformation $(A-4E)$ and the other is a three-dimensional kernel of the transformation $(A-2E)^3$. We have to seek eigenvectors that span these invariant subspaces.
(i) $x = 4$:

An eigenvector belonging to the first invariant subspace must satisfy a proper eigenvalue equation, since the eigenvalue 4 is a simple root. This equation is expressed as

$$(A-4E)\mathbf{x} = 0.$$

This reads in a matrix form as

$$
\begin{pmatrix} -3 & -1 & 0 & 0 \\ 1 & -1 & 0 & 0 \\ 0 & 0 & -2 & 0 \\ -3 & -1 & -2 & 0 \end{pmatrix}
\begin{pmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \end{pmatrix} = 0. \qquad (12.155)
$$

This is equivalent to the following set of four equations:

$$-3c_1 - c_2 = 0, \quad c_1 - c_2 = 0, \quad -2c_3 = 0, \quad -3c_1 - c_2 - 2c_3 = 0.$$

These are equivalent to $c_3 = 0$ and $c_1 = c_2 = -3c_1$. Therefore, $c_1 = c_2 = c_3 = 0$ with an arbitrarily chosen number $c_4$, which is chosen as 1 as usual. Hence, designating the proper eigenvector as $e_1^{(4)}$, its column vector representation is

$$e_1^{(4)} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}.$$

The $(4, 4)$ matrix in (12.155) representing $(A-4E)$ has rank 3. The number of Jordan blocks for the eigenvalue 4 is given by (12.150) as

$$r_1^{(4)} = 4 - \operatorname{rank}\,(A-4E) = 1. \qquad (12.156)$$

In this case, the Jordan block is naturally one-dimensional. In fact, using (12.152) we have

$$
J_1^{(4)} = \operatorname{rank}\,(A-4E)^0 + \operatorname{rank}\,(A-4E)^2 - 2\,\operatorname{rank}\,(A-4E) = 4 + 3 - 2 \cdot 3 = 1. \qquad (12.157)
$$
In (12.157), $J_1^{(4)}$ gives the number of the first-order Jordan blocks for the eigenvalue 4. We used

$$
(A-4E)^2 = \begin{pmatrix} 8 & 4 & 0 & 0 \\ -4 & 0 & 0 & 0 \\ 0 & 0 & 4 & 0 \\ 8 & 4 & 4 & 0 \end{pmatrix} \qquad (12.158)
$$

and confirmed that $\operatorname{rank}\,(A-4E)^2 = 3$.

(ii) $x = 2$:

The eigenvalue 2 is a triple root. Therefore, we must examine how the invariant subspace can further be decomposed into subspaces of lower dimension. To this end we first start with a secular equation expressed as

$$(A-2E)\mathbf{x} = 0. \qquad (12.159)$$

The matrix representation is

$$
\begin{pmatrix} -1 & -1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ -3 & -1 & -2 & 2 \end{pmatrix}
\begin{pmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \end{pmatrix} = 0.
$$

This is equivalent to the following set of two equations:

$$c_1 + c_2 = 0, \quad -3c_1 - c_2 - 2c_3 + 2c_4 = 0.$$

From the above, we can put $c_1 = c_2 = 0$ and $c_3 = c_4$ ($= 1$). The equations also allow the existence of another proper eigenvector. For this we have $c_1 = -c_2 = 1$, $c_3 = 0$, and $c_4 = 1$. Thus, for the two proper eigenvectors corresponding to the eigenvalue 2, we get

$$
e_1^{(2)} = \begin{pmatrix} 0 \\ 0 \\ 1 \\ 1 \end{pmatrix}, \qquad
e_2^{(2)} = \begin{pmatrix} 1 \\ -1 \\ 0 \\ 1 \end{pmatrix}.
$$
The dimension of the invariant subspace corresponding to the eigenvalue 2 is three (due to the triple root) and, hence, there should be one generalized eigenvector. To determine it, we examine the following matrix equation:

$$(A-2E)^2 \mathbf{x} = 0. \qquad (12.160)$$

The matrix representation is

$$
\begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ -4 & 0 & -4 & 4 \end{pmatrix}
\begin{pmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \end{pmatrix} = 0.
$$

That is,

$$-c_1 - c_3 + c_4 = 0. \qquad (12.161)$$

Furthermore, we have

$$
(A-2E)^3 = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ -8 & 0 & -8 & 8 \end{pmatrix}. \qquad (12.162)
$$

Moreover, $\operatorname{rank}\,(A-2E)^l$ ($= 1$) remains unchanged for $l \ge 2$, as expected from (12.149).
It will be convenient to examine the structure of the invariant subspace. For this purpose, we seek the number of Jordan blocks $r_1^{(2)}$ and their order. Using (12.150), we have

$$r_1^{(2)} = 4 - \operatorname{rank}\,(A-2E) = 4 - 2 = 2. \qquad (12.163)$$

The number of first-order Jordan blocks is

$$
J_1^{(2)} = \operatorname{rank}\,(A-2E)^0 + \operatorname{rank}\,(A-2E)^2 - 2\,\operatorname{rank}\,(A-2E) = 4 + 1 - 2 \cdot 2 = 1. \qquad (12.164)
$$

In turn, the number of second-order Jordan blocks is

$$
J_2^{(2)} = \operatorname{rank}\,(A-2E) + \operatorname{rank}\,(A-2E)^3 - 2\,\operatorname{rank}\,(A-2E)^2 = 2 + 1 - 2 \cdot 1 = 1. \qquad (12.165)
$$
In the above, $J_1^{(2)}$ and $J_2^{(2)}$ are obtained from (12.152). Thus, Fig. 12.2 gives the constitution of Jordan blocks for the eigenvalues 4 and 2. The overall number of Jordan blocks is three; the number of the first-order and second-order Jordan blocks is two and one, respectively.

The proper eigenvector $e_1^{(2)}$ is related to $J_1^{(2)}$ of (12.164). The set of the proper eigenvector $e_2^{(2)}$ and the corresponding generalized eigenvector $g_2^{(2)}$ is pertinent to $J_2^{(2)}$ of (12.165). We must have the generalized eigenvector $g_2^{(2)}$ in such a way that

$$
(A-2E)^2 g_2^{(2)} = (A-2E)\bigl[(A-2E) g_2^{(2)}\bigr] = (A-2E)\, e_2^{(2)} = 0. \qquad (12.166)
$$

From (12.161), we can put $c_1 = c_3 = c_4 = 0$ and $c_2 = -1$. Thus, the matrix representation of the generalized eigenvector $g_2^{(2)}$ is

$$g_2^{(2)} = \begin{pmatrix} 0 \\ -1 \\ 0 \\ 0 \end{pmatrix}. \qquad (12.167)$$

We stress here that $e_1^{(2)}$ is not eligible for a proper pair with $g_2^{(2)}$ in $J_2$. It is because from (12.166) we have

$$(A-2E)\, g_2^{(2)} = e_2^{(2)}, \qquad (A-2E)\, g_2^{(2)} \ne e_1^{(2)}. \qquad (12.168)$$

Thus, we have determined a set of (generalized) eigenvectors. The matrix representation $R$ for the basis vectors transformation is given by

$$
R = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & -1 & -1 \\ 0 & 1 & 0 & 0 \\ 1 & 1 & 1 & 0 \end{pmatrix}
= \Bigl(\, \widetilde{e_1^{(4)}}\ \widetilde{e_1^{(2)}}\ \widetilde{e_2^{(2)}}\ \widetilde{g_2^{(2)}} \,\Bigr), \qquad (12.169)
$$

where the symbol $\sim$ denotes the column vector representation; $e_1^{(4)}$, $e_1^{(2)}$, and $e_2^{(2)}$ represent proper eigenvectors and $g_2^{(2)}$ is a generalized eigenvector. Performing a similarity transformation using this $R$, we get the following Jordan canonical form:
Fig. 12.2 Structure of Jordan blocks of the matrix shown in (12.170): for the eigenvalue 4 there is a single first-order Jordan block; for the eigenvalue 2 there are one first-order and one second-order Jordan block.
$$
R^{-1} A R =
\begin{pmatrix} -1 & 0 & -1 & 1 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ -1 & -1 & 0 & 0 \end{pmatrix}
\begin{pmatrix} 1 & -1 & 0 & 0 \\ 1 & 3 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ -3 & -1 & -2 & 4 \end{pmatrix}
\begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & -1 & -1 \\ 0 & 1 & 0 & 0 \\ 1 & 1 & 1 & 0 \end{pmatrix}
= \begin{pmatrix} 4 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 2 \end{pmatrix}. \qquad (12.170)
$$

The structure of Jordan blocks is shown in Fig. 12.2. Notice here that the trace of $A$ remains unchanged before and after the similarity transformation.
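The arithmetic of (12.170) is easy to check numerically; a minimal sketch (NumPy assumed, variable names ours):

```python
import numpy as np

# Matrices A and R of (12.153) and (12.169).
A = np.array([[ 1, -1,  0, 0],
              [ 1,  3,  0, 0],
              [ 0,  0,  2, 0],
              [-3, -1, -2, 4]], dtype=float)
R = np.array([[0, 0,  1,  0],
              [0, 0, -1, -1],
              [0, 1,  0,  0],
              [1, 1,  1,  0]], dtype=float)

J = np.linalg.inv(R) @ A @ R
print(np.round(J))                # the Jordan form of (12.170)
print(np.trace(A), np.trace(J))   # both 10: the trace is invariant
```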
Next, we consider column vector representations. According to (11.37), let us view the matrix $A$ as a linear transformation over $V^4$. Then $A$ is given by

$$
A(\mathbf{x}) = (e_1\ e_2\ e_3\ e_4)\, A \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}
= (e_1\ e_2\ e_3\ e_4)
\begin{pmatrix} 1 & -1 & 0 & 0 \\ 1 & 3 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ -3 & -1 & -2 & 4 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}, \qquad (12.171)
$$

where $e_1, e_2, e_3,$ and $e_4$ are basis vectors and $x_1, x_2, x_3,$ and $x_4$ are the corresponding coordinates of a vector $\mathbf{x} = \sum_{i=1}^{n} x_i e_i \in V^4$. We rewrite (12.171) as
$$
A(\mathbf{x}) = (e_1\ e_2\ e_3\ e_4)\, R R^{-1} A R R^{-1}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}
= \Bigl( e_1^{(4)}\ e_1^{(2)}\ e_2^{(2)}\ g_2^{(2)} \Bigr)
\begin{pmatrix} 4 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 2 \end{pmatrix}
\begin{pmatrix} x_1' \\ x_2' \\ x_3' \\ x_4' \end{pmatrix}, \qquad (12.172)
$$

where we have

$$
\Bigl( e_1^{(4)}\ e_1^{(2)}\ e_2^{(2)}\ g_2^{(2)} \Bigr) = (e_1\ e_2\ e_3\ e_4)\, R, \qquad
\begin{pmatrix} x_1' \\ x_2' \\ x_3' \\ x_4' \end{pmatrix} = R^{-1}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}. \qquad (12.173)
$$
After (11.84), we have

$$
\mathbf{x} = (e_1\ e_2\ e_3\ e_4)
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}
= (e_1\ e_2\ e_3\ e_4)\, R R^{-1}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}
= \Bigl( e_1^{(4)}\ e_1^{(2)}\ e_2^{(2)}\ g_2^{(2)} \Bigr)
\begin{pmatrix} x_1' \\ x_2' \\ x_3' \\ x_4' \end{pmatrix}. \qquad (12.174)
$$
As for $V^n$ in general, let us put $R = (p_{ij})$ and $R^{-1} = (q_{ij})$. Also let us represent the $j$-th (generalized) eigenvector by a column vector and denote it by $p^{(j)}$. There we display the individual (generalized) eigenvectors in the order $(e^{(1)}\ e^{(2)} \cdots e^{(j)} \cdots e^{(n)})$, where $e^{(j)}$ ($1 \le j \le n$) denotes either a proper eigenvector or a generalized eigenvector according to (12.174). Each $p^{(j)}$ is represented in reference to the original basis vectors $(e_1 \cdots e_n)$. Then we have

$$
R^{-1} p^{(j)} = \Bigl( \textstyle\sum_{k=1}^{n} q_{ik}\, p_k^{(j)} \Bigr) = \delta_i^{(j)}, \qquad (12.175)
$$

where $\delta_i^{(j)}$ denotes a column vector in which only the $j$-th row is 1, all other entries being 0. Thus, the column vector $\delta_i^{(j)}$ is an "address" of $e^{(j)}$ in reference to $(e^{(1)}\ e^{(2)} \cdots e^{(j)} \cdots e^{(n)})$ taken as basis vectors.

In our present case, in fact, we have

$$
R^{-1} p^{(1)} =
\begin{pmatrix} -1 & 0 & -1 & 1 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ -1 & -1 & 0 & 0 \end{pmatrix}
\begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}
= \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \qquad
R^{-1} p^{(2)} =
\begin{pmatrix} -1 & 0 & -1 & 1 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ -1 & -1 & 0 & 0 \end{pmatrix}
\begin{pmatrix} 0 \\ 0 \\ 1 \\ 1 \end{pmatrix}
= \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix},
$$

$$
R^{-1} p^{(3)} =
\begin{pmatrix} -1 & 0 & -1 & 1 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ -1 & -1 & 0 & 0 \end{pmatrix}
\begin{pmatrix} 1 \\ -1 \\ 0 \\ 1 \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}, \qquad
R^{-1} p^{(4)} =
\begin{pmatrix} -1 & 0 & -1 & 1 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ -1 & -1 & 0 & 0 \end{pmatrix}
\begin{pmatrix} 0 \\ -1 \\ 0 \\ 0 \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}. \qquad (12.176)
$$

In (12.176), $p^{(1)}$ is the column vector representation of $e_1^{(4)}$; $p^{(2)}$ is the column vector representation of $e_1^{(2)}$, and so on.

The minimal polynomial $\Phi_A(x)$ is expressed as

$$\Phi_A(x) = (x-4)(x-2)^2.$$

Readers can easily make sure of it.


We remark that a non-singular matrix $R$ pertinent to the similarity transformation is not uniquely determined; rather, we have some arbitrariness. In (12.169), for example, if we adopt $-g_2^{(2)}$ instead of $g_2^{(2)}$, we should adopt $-e_2^{(2)}$ instead of $e_2^{(2)}$ accordingly. Thus, instead of $R$ in (12.169) we may choose

$$
R' = \begin{pmatrix} 0 & 0 & -1 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 0 \\ 1 & 1 & -1 & 0 \end{pmatrix}. \qquad (12.177)
$$

In this case, we also get the same Jordan canonical form as before. That is,

$$
R'^{-1} A R' = \begin{pmatrix} 4 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 2 \end{pmatrix}.
$$

Suppose that we choose $R''$ such that

$$
(e_1\ e_2\ e_3\ e_4)\, R'' = (e_1\ e_2\ e_3\ e_4)
\begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & -1 & -1 & 0 \\ 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 1 \end{pmatrix}
= \Bigl( e_1^{(2)}\ e_2^{(2)}\ g_2^{(2)}\ e_1^{(4)} \Bigr).
$$

In this case, we have

$$
R''^{-1} A R'' = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 4 \end{pmatrix}.
$$

Note that we get a different disposition of the matrix elements from that of (12.172).

Next, we decompose $A$ into a semi-simple matrix and a nilpotent matrix. In (12.172), we had

$$
R^{-1} A R = \begin{pmatrix} 4 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 2 \end{pmatrix}.
$$

Defining

$$
S = \begin{pmatrix} 4 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{pmatrix} \quad \text{and} \quad
N = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix},
$$

we have

$$R^{-1} A R = S + N, \quad \text{i.e.,} \quad A = R(S+N)R^{-1} = R S R^{-1} + R N R^{-1}.$$

Performing the above matrix calculations and putting $\widetilde{S} = R S R^{-1}$ and $\widetilde{N} = R N R^{-1}$, we get

$$A = \widetilde{S} + \widetilde{N}$$

with

$$
\widetilde{S} = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ -2 & 0 & -2 & 4 \end{pmatrix} \quad \text{and} \quad
\widetilde{N} = \begin{pmatrix} -1 & -1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ -1 & -1 & 0 & 0 \end{pmatrix}.
$$

That is, we have

$$
A = \begin{pmatrix} 1 & -1 & 0 & 0 \\ 1 & 3 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ -3 & -1 & -2 & 4 \end{pmatrix}
= \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ -2 & 0 & -2 & 4 \end{pmatrix}
+ \begin{pmatrix} -1 & -1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ -1 & -1 & 0 & 0 \end{pmatrix}. \qquad (12.178)
$$

Even though the matrix forms of $S$ and $N$ differ depending on the choice of the similarity transformation $R$ (namely, $R'$, $R''$, etc. represented above), the decomposition (12.178) is unique. That is, $\widetilde{S}$ and $\widetilde{N}$ are uniquely determined once a matrix $A$ is given. The confirmation is left for readers as an exercise.
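As a numerical sketch of that exercise (NumPy assumed), one can confirm that $\widetilde{S}$ and $\widetilde{N}$ of (12.178) add up to $A$, that $\widetilde{N}$ is nilpotent, and that the two parts commute, which is the hallmark of the semi-simple/nilpotent decomposition:

```python
import numpy as np

A = np.array([[ 1, -1,  0, 0],
              [ 1,  3,  0, 0],
              [ 0,  0,  2, 0],
              [-3, -1, -2, 4]], dtype=float)
S = np.array([[ 2, 0,  0, 0],
              [ 0, 2,  0, 0],
              [ 0, 0,  2, 0],
              [-2, 0, -2, 4]], dtype=float)    # semi-simple part of (12.178)
N = A - S                                      # nilpotent part of (12.178)

assert np.allclose(np.linalg.matrix_power(N, 2), 0)  # N is nilpotent (N^2 = 0)
assert np.allclose(S @ N, N @ S)                     # the two parts commute
```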
We present another simple example. Let $A$ be a matrix described as

$$A = \begin{pmatrix} 0 & -4 \\ 1 & 4 \end{pmatrix}.$$

The eigenvalue of $A$ is 2 as a double root. According to routine, we have an eigenvector $e_1^{(2)}$ as a column vector, e.g.,

$$e_1^{(2)} = \begin{pmatrix} 2 \\ -1 \end{pmatrix}.$$

Another eigenvector is a generalized eigenvector $g_1^{(2)}$ of rank 2. This can be decided such that

$$(A-2E)\, g_1^{(2)} = \begin{pmatrix} -2 & -4 \\ 1 & 2 \end{pmatrix} g_1^{(2)} = e_1^{(2)}.$$

As an option, we get

$$g_1^{(2)} = \begin{pmatrix} 1 \\ -1 \end{pmatrix}.$$

Thus, we can choose $R$ for the transformation matrix together with its inverse matrix $R^{-1}$ such that

$$
R = \begin{pmatrix} 2 & 1 \\ -1 & -1 \end{pmatrix} \quad \text{and} \quad
R^{-1} = \begin{pmatrix} 1 & 1 \\ -1 & -2 \end{pmatrix}.
$$

Therefore, with a Jordan canonical form we have

$$R^{-1} A R = \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}. \qquad (12.179)$$

As before, putting

$$S = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} \quad \text{and} \quad N = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix},$$

we have

$$R^{-1} A R = S + N, \quad \text{i.e.,} \quad A = R(S+N)R^{-1} = R S R^{-1} + R N R^{-1}.$$

Putting $\widetilde{S} = R S R^{-1}$ and $\widetilde{N} = R N R^{-1}$, we get

$$A = \widetilde{S} + \widetilde{N}$$

with

$$
\widetilde{S} = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} \quad \text{and} \quad
\widetilde{N} = \begin{pmatrix} -2 & -4 \\ 1 & 2 \end{pmatrix}. \qquad (12.180)
$$
We may also choose $R'$ for the transformation matrix together with its inverse $R'^{-1}$ instead of $R$ and $R^{-1}$, respectively, such that we have, e.g.,

$$
R' = \begin{pmatrix} -2 & 3 \\ 1 & -1 \end{pmatrix} \quad \text{and} \quad
R'^{-1} = \begin{pmatrix} 1 & 3 \\ 1 & 2 \end{pmatrix}.
$$

Using these matrices, we get exactly the same Jordan canonical form and the same matrix decomposition as (12.179) and (12.180). Thus, again we find that the matrix decomposition is unique.
Another simple example is the lower triangular matrix

$$A = \begin{pmatrix} 2 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

Following the now familiar procedures, as a diagonalizing matrix we have, e.g.,

$$
R = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \quad \text{and} \quad
R^{-1} = \begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
$$

Then, we get

$$R^{-1} A R = S = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

Therefore, the "decomposition" is

$$
A = R S R^{-1} = \begin{pmatrix} 2 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} + \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},
$$

where the first term is a semi-simple matrix and the second is a nilpotent matrix (i.e., the zero matrix). Thus, the decomposition is once again unique.

12.7 Diagonalizable Matrices

Among canonical forms of matrices, the simplest form is a diagonalizable matrix. Here we define a diagonalizable matrix as a matrix that can be converted to one whose off-diagonal elements are all zero. In Sect. 12.5 we investigated various properties of matrices. In this section we examine the basic properties of diagonalizable matrices.

In Sect. 12.6.1 we have shown that $\operatorname{Span}\{N^{i-1} a_\rho,\ N^{i-2} a_\rho, \cdots, N a_\rho,\ a_\rho\}$ forms an $N$-invariant subspace of dimension $i$, where $a_\rho$ satisfies the relations $N^i a_\rho = 0$ and $N^{i-1} a_\rho \ne 0$ ($r_{i+1} + 1 \le j \le r_i$) as in (12.122). Of these vectors, only $N^{i-1} a_\rho$ is a proper eigenvector; it is accompanied by $(i-1)$ generalized eigenvectors. Note that only the proper eigenvector can construct a one-dimensional $N$-invariant subspace by itself. This is because for any other generalized eigenvector $g$ (here $g$ stands for all the generalized eigenvectors), $Ng$ ($\ne 0$) and $g$ are linearly independent. Note that with a proper eigenvector $e$, we have $Ne = 0$. The corresponding Jordan block is represented by a matrix as given in (12.122) in reference to the basis vectors comprising these $i$ eigenvectors. Therefore, if an $(n, n)$ matrix $A$ has only proper eigenvectors, all the Jordan blocks are one-dimensional. This means that $A$ is diagonalizable. That $A$ has only proper eigenvectors is equivalent to saying that each of those eigenvectors forms a one-dimensional subspace and that $V^n$ is a direct sum of the subspaces spanned by the individual proper eigenvectors. In other words, if $V^n$ is a direct sum of subspaces (i.e., eigenspaces) spanned by individual proper eigenvectors of $A$, then $A$ is diagonalizable.
Next, suppose that $A$ is diagonalizable. Then, after an appropriate similarity transformation with a non-singular matrix $P$, $A$ has the following form:

$$
P^{-1} A P = \operatorname{diag}(\alpha_1, \cdots, \alpha_1,\ \alpha_2, \cdots, \alpha_2,\ \cdots,\ \alpha_s, \cdots, \alpha_s). \qquad (12.181)
$$

In this case, let us examine what form a minimal polynomial $\varphi_A(x)$ for $A$ takes. The characteristic polynomial $f_A(x)$ for $A$ is invariant under a similarity transformation, and so is $\varphi_A(x)$. That is,

$$\varphi_{P^{-1}AP}(x) = \varphi_A(x). \qquad (12.182)$$

From (12.181), we find that $A - \alpha_i E$ ($1 \le i \le s$) has a "perforated" form such as (12.143), with the diagonalized form unchanged. Then we have
$$(A-\alpha_1 E)(A-\alpha_2 E) \cdots (A-\alpha_s E) = 0. \qquad (12.183)$$

This is because a product of matrices having only diagonal elements is merely a product of the individual diagonal elements. Meanwhile, in virtue of the Hamilton–Cayley theorem, we have

$$f_A(A) = \prod_{i=1}^{s} (A - \alpha_i E)^{n_i} = 0.$$

Rewriting this expression, we have

$$(A-\alpha_1 E)^{n_1} (A-\alpha_2 E)^{n_2} \cdots (A-\alpha_s E)^{n_s} = 0. \qquad (12.184)$$

In light of (12.183), this implies that the minimal polynomial $\varphi_A(x)$ is expressed as

$$\varphi_A(x) = (x-\alpha_1)(x-\alpha_2) \cdots (x-\alpha_s). \qquad (12.185)$$

Surely $\varphi_A(x)$ of (12.185) is a lowest-order polynomial among those satisfying $f(A) = 0$ and is a divisor of $f_A(x)$. Also $\varphi_A(x)$ has a highest-order coefficient of 1. Thus, $\varphi_A(x)$ should be a minimal polynomial of $A$ and we conclude that $\varphi_A(x)$ does not have a multiple root.

Then let us think how $V^n$ is characterized in case $\varphi_A(x)$ does not have a multiple root. This is equivalent to $\varphi_A(x)$ being described by (12.185). To see this, suppose that we have two matrices $A$ and $B$ and let $BV^n = W$. We wish to use the following relation:

$$
\operatorname{rank}\,(AB) = \dim ABV^n = \dim AW = \dim W - \dim\bigl(A^{-1}\{0\} \cap W\bigr)
\ge \dim W - \dim A^{-1}\{0\} = \dim BV^n - (n - \dim AV^n) = \operatorname{rank} A + \operatorname{rank} B - n. \qquad (12.186)
$$

In (12.186), the third equality comes from the fact that the domain of $A$ is restricted to $W$. Concomitantly, $A^{-1}\{0\}$ is restricted to $A^{-1}\{0\} \cap W$ as well; notice that $A^{-1}\{0\} \cap W$ is a subspace. Considering these situations, we use a relation corresponding to that of (11.45). The fourth equality is due to the dimension theorem of (11.45). Applying (12.186) to (12.183) successively, we have
$$
\begin{aligned}
0 &= \operatorname{rank}\,[(A-\alpha_1 E)(A-\alpha_2 E) \cdots (A-\alpha_s E)] \\
&\ge \operatorname{rank}\,(A-\alpha_1 E) + \operatorname{rank}\,[(A-\alpha_2 E) \cdots (A-\alpha_s E)] - n \\
&\ge \operatorname{rank}\,(A-\alpha_1 E) + \operatorname{rank}\,(A-\alpha_2 E) + \operatorname{rank}\,[(A-\alpha_3 E) \cdots (A-\alpha_s E)] - 2n \\
&\;\;\vdots \\
&\ge \operatorname{rank}\,(A-\alpha_1 E) + \cdots + \operatorname{rank}\,(A-\alpha_s E) - (s-1)n
= \sum_{i=1}^{s} \bigl[\operatorname{rank}\,(A-\alpha_i E) - n\bigr] + n.
\end{aligned}
$$

Finally we get

$$\sum_{i=1}^{s} \bigl[n - \operatorname{rank}\,(A-\alpha_i E)\bigr] \ge n. \qquad (12.187)$$

As $n - \operatorname{rank}\,(A-\alpha_i E) = \dim W_{\alpha_i}$, we have

$$\sum_{i=1}^{s} \dim W_{\alpha_i} \ge n. \qquad (12.188)$$

Meanwhile, we have

$$
V^n \supset W_{\alpha_1} \oplus W_{\alpha_2} \oplus \cdots \oplus W_{\alpha_s}, \qquad
n \ge \dim\bigl(W_{\alpha_1} \oplus W_{\alpha_2} \oplus \cdots \oplus W_{\alpha_s}\bigr) = \sum_{i=1}^{s} \dim W_{\alpha_i}. \qquad (12.189)
$$

The equality results from the property of a direct sum. From (12.188) and (12.189), we get

$$\sum_{i=1}^{s} \dim W_{\alpha_i} = n. \qquad (12.190)$$

Hence,

$$V^n = W_{\alpha_1} \oplus W_{\alpha_2} \oplus \cdots \oplus W_{\alpha_s}. \qquad (12.191)$$

Thus, we have proven that if the minimal polynomial does not have a multiple root, $V^n$ is decomposed into a direct sum of eigenspaces as in (12.191).

If in turn $V^n$ is decomposed into a direct sum of eigenspaces as in (12.191), $A$ can be diagonalized by a similarity transformation. The proof is as follows. Suppose that (12.191) holds. Then, we can take only eigenvectors for the basis vectors of $V^n$. Suppose that $\dim W_{\alpha_i} = n_i$. Then, we can take vectors $a_k$ $\bigl(\sum_{j=1}^{i-1} n_j + 1 \le k \le \sum_{j=1}^{i} n_j\bigr)$ so that the $a_k$ can be the basis vectors of $W_{\alpha_i}$. In reference to this basis set, we describe a vector $\mathbf{x} \in V^n$ such that

$$\mathbf{x} = (a_1\ a_2 \cdots a_n) \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}.$$

Operating $A$ on $\mathbf{x}$, we get

$$
A(\mathbf{x}) = (a_1\ a_2 \cdots a_n)\, A \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
= \bigl(A(a_1)\ A(a_2) \cdots A(a_n)\bigr) \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
= (\alpha_1 a_1\ \alpha_2 a_2 \cdots \alpha_n a_n) \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
= (a_1\ a_2 \cdots a_n)
\begin{pmatrix} \alpha_1 & & & \\ & \alpha_2 & & \\ & & \ddots & \\ & & & \alpha_n \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \qquad (12.192)
$$

where with the second equality we used the notation of (11.40); with the third equality some of the $\alpha_i$ ($1 \le i \le n$) may be identical; $a_i$ is an eigenvector that corresponds to the eigenvalue $\alpha_i$. Suppose that $a_1, a_2, \cdots,$ and $a_n$ are obtained by transforming an "original" basis set $e_1, e_2, \cdots,$ and $e_n$ by $R$. Then, we have

$$
A(\mathbf{x}) = (e_1\ e_2 \cdots e_n)\, R
\begin{pmatrix} \alpha_1 & & & \\ & \alpha_2 & & \\ & & \ddots & \\ & & & \alpha_n \end{pmatrix}
R^{-1}
\begin{pmatrix} x_1^{(0)} \\ x_2^{(0)} \\ \vdots \\ x_n^{(0)} \end{pmatrix}.
$$

We denote the transformation $A$ with respect to the basis set $e_1, e_2, \cdots,$ and $e_n$ by $A'$; see (11.82) for the notation. Then, we have

$$
A(\mathbf{x}) = (e_1\ e_2 \cdots e_n)\, A' \begin{pmatrix} x_1^{(0)} \\ x_2^{(0)} \\ \vdots \\ x_n^{(0)} \end{pmatrix}.
$$

Therefore, we get

$$
R^{-1} A' R = \begin{pmatrix} \alpha_1 & & & \\ & \alpha_2 & & \\ & & \ddots & \\ & & & \alpha_n \end{pmatrix}. \qquad (12.193)
$$

Thus, $A$ is similar to a diagonal matrix as represented in (12.192) and (12.193).

It is straightforward to show that a minimal polynomial of a diagonalizable matrix has no multiple root. The proof is left for readers.

Summarizing the above arguments, we have the following theorem:

Theorem 12.7 [3, 4] The following three statements related to $A$ are equivalent:
(i) The matrix $A$ is similar to a diagonal matrix.
(ii) The minimal polynomial $\varphi_A(x)$ does not have a multiple root.
(iii) The vector space $V^n$ is decomposed into a direct sum of eigenspaces.

In Example 12.1 we showed the diagonalization of a matrix. There $A$ has two different eigenvalues. Since for an $(n, n)$ matrix having $n$ different eigenvalues the characteristic polynomial does not have a multiple root, the minimal polynomial necessarily has no multiple root either. The above theorem therefore ensures that a matrix whose characteristic polynomial has no multiple root must be diagonalizable.

Another consequence of this theorem is that an idempotent matrix is diagonalizable. Such a matrix is characterized by $A^2 = A$. Then $A(A-E) = 0$. Taking its determinant, $(\det A)[\det(A-E)] = 0$. Therefore, we have either $\det A = 0$ or $\det(A-E) = 0$. Hence, the eigenvalues of $A$ are zero or 1. Think of $f(x) = x(x-1)$. As $f(A) = 0$, the minimal polynomial is a divisor of $f(x)$. It has no multiple root, and so the matrix is diagonalizable.
Example 12.6 Let us revisit Example 12.1, where we dealt with

$$A = \begin{pmatrix} 2 & 1 \\ 0 & 1 \end{pmatrix}. \qquad (12.32)$$

From (12.33), $f_A(x) = (x-2)(x-1)$. Note that $f_A(x) = f_{P^{-1}AP}(x)$. Let us treat the problem according to Theorem 12.5. Also we use the notation of (12.85). Given $f_1(x) = x-1$ and $f_2(x) = x-2$, let us decide $M_1(x)$ and $M_2(x)$ such that they satisfy

$$M_1(x) f_1(x) + M_2(x) f_2(x) = 1. \qquad (12.194)$$

We find $M_1(x) = 1$ and $M_2(x) = -1$. Thus, using the notation of Theorem 12.5, Sect. 12.4, we have

$$
A_1 = M_1(A) f_1(A) = A - E = \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}, \qquad
A_2 = M_2(A) f_2(A) = -A + 2E = \begin{pmatrix} 0 & -1 \\ 0 & 1 \end{pmatrix}.
$$

We also have

$$A_1 + A_2 = E, \qquad A_i A_j = A_j A_i = A_i \delta_{ij}. \qquad (12.195)$$

Thus, we find that $A_1$ and $A_2$ are idempotent matrices. As both $A_1$ and $A_2$ are expressed by a polynomial in $A$, they commute with $A$.
We find that $A$ is represented by

$$A = \alpha_1 A_1 + \alpha_2 A_2, \qquad (12.196)$$

where $\alpha_1$ and $\alpha_2$ denote the eigenvalues 2 and 1, respectively. Thus, choosing proper eigenvectors for the basis vectors, we have decomposed the vector space $V^n$ into a direct sum of invariant subspaces comprising the proper eigenvectors. Concomitantly, $A$ is represented as in (12.196). The relevant decomposition is always possible for a diagonalizable matrix.

Thus, idempotent matrices play an important role in the theory of linear vector spaces.
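A small numerical sketch of Example 12.6 (NumPy assumed) confirms the projector properties of (12.195) and the spectral decomposition (12.196):

```python
import numpy as np

A  = np.array([[2, 1], [0, 1]], dtype=float)
E  = np.eye(2)
A1 = A - E          # M1(A) f1(A) with M1 = 1,  f1 = x - 1
A2 = -A + 2 * E     # M2(A) f2(A) with M2 = -1, f2 = x - 2

assert np.allclose(A1 @ A1, A1) and np.allclose(A2 @ A2, A2)  # idempotent
assert np.allclose(A1 @ A2, 0) and np.allclose(A1 + A2, E)    # (12.195)
assert np.allclose(2 * A1 + 1 * A2, A)                        # (12.196)
```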
Example 12.7 Let us think of the following matrix:

$$A = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{pmatrix}. \qquad (12.197)$$

This is a triangular matrix, and so the diagonal elements give the eigenvalues. We have an eigenvalue 1 as a double root and an eigenvalue 0 as a simple root. The matrix can be diagonalized using $P$ such that

$$
\widetilde{A} = P^{-1} A P =
\begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & -1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & -1 \end{pmatrix}
= \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}. \qquad (12.198)
$$

As can be checked easily, $\widetilde{A}^2 = \widetilde{A}$. We also have

$$
E - \widetilde{A} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad (12.199)
$$

where $\bigl(E - \widetilde{A}\bigr)^2 = E - \widetilde{A}$ holds as well. Moreover, $\widetilde{A}\bigl(E - \widetilde{A}\bigr) = \bigl(E - \widetilde{A}\bigr)\widetilde{A} = 0$. Thus $\widetilde{A}$ and $E - \widetilde{A}$ behave like $A_1$ and $A_2$ of (12.195).
Next, suppose that $\mathbf{x} \in V^n$ is expressed as a linear combination of basis vectors $a_1, a_2, \cdots,$ and $a_n$. Then

$$\mathbf{x} = c_1 a_1 + c_2 a_2 + \cdots + c_{n-1} a_{n-1} + c_n a_n. \qquad (12.200)$$

Here let us define the following linear transformation $P^{(k)}$ such that $P^{(k)}$ "extracts" the $k$-th component of $\mathbf{x}$. That is,

$$
P^{(k)}(\mathbf{x}) = P^{(k)}\Bigl( \sum_{j=1}^{n} c_j a_j \Bigr) = \sum_{j=1}^{n} \sum_{i=1}^{n} p_{ij}^{[k]} c_j a_i = c_k a_k, \qquad (12.201)
$$

where $p_{ij}^{[k]}$ is the matrix representation of $P^{(k)}$. In fact, suppose that there is another arbitrarily chosen vector $\mathbf{y}$ such that

$$\mathbf{y} = d_1 a_1 + d_2 a_2 + \cdots + d_{n-1} a_{n-1} + d_n a_n. \qquad (12.202)$$

Then we have

$$
P^{(k)}(a\mathbf{x} + b\mathbf{y}) = (a c_k + b d_k)\, a_k = a c_k a_k + b d_k a_k = a P^{(k)}(\mathbf{x}) + b P^{(k)}(\mathbf{y}). \qquad (12.203)
$$

Thus $P^{(k)}$ is a linear transformation. In (12.201), for the third equality to hold, we should have

$$p_{ij}^{[k]} = \delta_i^{(k)} \delta_j^{(k)}, \qquad (12.204)$$

where $\delta_i^{(j)}$ has been defined in (12.175). Meanwhile, $\delta_j^{(k)}$ denotes a row vector in which only the $k$-th column is 1, all other entries being 0. Note that $\delta_i^{(k)}$ represents an $(n, 1)$ matrix and that $\delta_j^{(k)}$ denotes a $(1, n)$ matrix. Therefore, $\delta_i^{(k)} \delta_j^{(k)}$ represents an $(n, n)$ matrix whose $(k, k)$ element is 1, with all other elements 0. Thus, $P^{(k)}(\mathbf{x})$ is denoted by

$$
P^{(k)}(\mathbf{x}) = (a_1 \cdots a_n)
\begin{pmatrix}
0 & & & & \\
 & \ddots & & & \\
 & & 1 & & \\
 & & & \ddots & \\
 & & & & 0
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = x_k a_k, \qquad (12.205)
$$

where only the $(k, k)$ element is 1, all other elements being 0. Then $P^{(k)}[P^{(k)}(\mathbf{x})] = P^{(k)}(\mathbf{x})$. That is,

$$\bigl[P^{(k)}\bigr]^2 = P^{(k)}. \qquad (12.206)$$

Also $P^{(k)}[P^{(l)}(\mathbf{x})] = 0$ if $k \ne l$. Meanwhile, we have $P^{(1)}(\mathbf{x}) + \cdots + P^{(n)}(\mathbf{x}) = \mathbf{x}$. Hence, $P^{(1)}(\mathbf{x}) + \cdots + P^{(n)}(\mathbf{x}) = [P^{(1)} + \cdots + P^{(n)}](\mathbf{x}) = \mathbf{x}$. Since this relation holds with any $\mathbf{x} \in V^n$, we get

$$P^{(1)} + \cdots + P^{(n)} = E. \qquad (12.207)$$

As shown above, an idempotent matrix such as $P^{(k)}$ always exists. In particular, if the basis vectors comprise only proper eigenvectors, the decomposition as expressed in (12.196) is possible. In that case, it is described as

$$A = \alpha_1 A_1 + \cdots + \alpha_n A_n, \qquad (12.208)$$

where $\alpha_1, \cdots,$ and $\alpha_n$ are eigenvalues (some of which may be identical) and $A_1, \cdots,$ and $A_n$ are idempotent matrices such as those represented by (12.205).

Yet, we have to be careful when constructing idempotent matrices according to the formalism described in Theorem 12.5. It is because we often encounter a situation where different matrices give an identical characteristic polynomial. We briefly mention this in the next example.
Example 12.8 Let us think about the following two matrices:

$$
A = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix}, \qquad
B = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{pmatrix}. \qquad (12.209)
$$

Then, following Theorem 12.5, Sect. 12.4, we have

$$f_A(x) = f_B(x) = (x-3)(x-2)^2, \qquad (12.210)$$

with eigenvalues $\alpha_1 = 3$ and $\alpha_2 = 2$. Also we have $f_1(x) = (x-2)^2$ and $f_2(x) = x-3$. Following the procedures of (12.85) and (12.86), we obtain

$$M_1(x) = x - 2 \quad \text{and} \quad M_2(x) = -x^2 + 3x - 3.$$

Therefore, we have

$$M_1(x) f_1(x) = (x-2)^3, \qquad M_2(x) f_2(x) = (x-3)\bigl(-x^2 + 3x - 3\bigr). \qquad (12.211)$$

Hence, we get

$$
A_1 \equiv M_1(A) f_1(A) = (A - 2E)^3, \qquad
A_2 \equiv M_2(A) f_2(A) = (A - 3E)\bigl(-A^2 + 3A - 3E\bigr). \qquad (12.212)
$$

Similarly, we get $B_1$ and $B_2$ by replacing $A$ with $B$ in (12.212). Thus, we have

$$
A_1 = B_1 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \qquad
A_2 = B_2 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \qquad (12.213)
$$

Notice that we get the same idempotent matrices of (12.213), even though the matrix forms of $A$ and $B$ differ. Also we have

$$A_1 + A_2 = B_1 + B_2 = E.$$

Then, we have

$$A = (A_1 + A_2)A = A_1 A + A_2 A, \qquad B = (B_1 + B_2)B = B_1 B + B_2 B.$$

Nonetheless, although $A = 3A_1 + 2A_2$ holds, $B \ne 3B_1 + 2B_2$. That is, the decomposition of the form of (12.208) is not true of $B$. A decomposition of this kind is possible only with diagonalizable matrices.

In summary, an $(n, n)$ matrix with $s$ ($1 \le s \le n$) different eigenvalues has at least $s$ proper eigenvectors. (Note that a diagonalizable matrix has $n$ proper eigenvectors.) In the case of $s < n$, the matrix has multiple root(s) and may have generalized eigenvectors. If the matrix has a generalized eigenvector of rank $\nu$, that vector is accompanied by $(\nu - 1)$ further generalized eigenvectors along with a sole proper eigenvector. Those vectors form an invariant subspace together with the proper eigenvector(s). In total, such $n$ (generalized) eigenvectors span the whole vector space $V^n$.
With the eigenvalue equation $A(\mathbf{x}) = \alpha \mathbf{x}$, we have an indefinite but nontrivial solution $\mathbf{x} \ne 0$ only for restricted numbers $\alpha$ (i.e., the eigenvalues) in the complex plane. However, we have a unique but trivial solution $\mathbf{x} = 0$ for complex numbers $\alpha$ other than the eigenvalues. This is characteristic of the eigenvalue problem.

References

1. Mirsky L (1990) An introduction to linear algebra. Dover, New York
2. Hassani S (2006) Mathematical physics. Springer, New York
3. Satake I-O (1975) Linear algebra (Pure and applied mathematics). Marcel Dekker, New York
4. Satake I (1974) Linear algebra. Shokabo, Tokyo (in Japanese)
Chapter 13
Inner Product Space

Thus far we have treated the theory of linear vector spaces. The vector spaces, however, were somewhat "structureless," and so it is desirable to introduce a concept of metric or measure into the linear vector spaces. We call a linear vector space where an inner product is defined an inner product space. In virtue of the concept of the inner product, the linear vector space is given a variety of structures. For instance, introduction of the inner product to the linear vector space immediately leads to the definition of adjoint operators and Gram matrices. Above all, the concept of the inner product can readily be extended to a functional space and facilitates understanding of, e.g., orthogonalization of functions, as was exemplified in Parts I and II. Moreover, the definition of the inner product allows us to relate matrix operators and differential operators. In particular, it is a key issue for understanding the logical structure of quantum mechanics. This can easily be understood from the fact that Paul Dirac, who was known as one of the prominent founders of quantum mechanics, invented bra and ket vectors to represent an inner product.

13.1 Inner Product and Metric

The inner product relates two vectors to a complex number. To do this, we introduce the notation $|a\rangle$ and $\langle b|$ to represent the vectors. This notation is due to Dirac and widely used in physics and mathematical physics. Usually $|a\rangle$ and $\langle b|$ are called a "ket" vector and a "bra" vector, respectively, again due to Dirac. Alternatively, we may call $\langle a|$ an adjoint vector of $|a\rangle$. Or we denote $\langle a| \equiv |a\rangle^\dagger$. The symbol "$\dagger$" (dagger) means that for a matrix its transposed matrix should be taken with complex conjugate matrix elements. That is, $\bigl(a_{ij}\bigr)^\dagger = \bigl(a_{ji}^*\bigr)$. If we represent a full matrix, we have
$$
A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}, \qquad
A^\dagger = \begin{pmatrix} a_{11}^* & \cdots & a_{n1}^* \\ \vdots & \ddots & \vdots \\ a_{1n}^* & \cdots & a_{nn}^* \end{pmatrix}. \qquad (13.1)
$$

We call $A^\dagger$ an adjoint matrix or adjoint to $A$; see (1.106). The operations of transposition and complex conjugation are commutable. A further remark will be made below after showing the definition of the inner product. The symbols $|a\rangle$ and $\langle b|$ represent vectors and, hence, we do not need to use bold characters to show that they are vectors.

The definition of the inner product is as follows:

$$\langle b|a\rangle = \langle a|b\rangle^*, \qquad (13.2)$$

$$\langle a|(\beta|b\rangle + \gamma|c\rangle) = \beta\langle a|b\rangle + \gamma\langle a|c\rangle, \qquad (13.3)$$

$$\langle a|a\rangle \ge 0. \qquad (13.4)$$

In (13.4), equality holds only if $|a\rangle = 0$. Note here that two vectors are said to be orthogonal to each other if their inner product vanishes, i.e., $\langle b|a\rangle = \langle a|b\rangle = 0$. In particular, if a vector $|a\rangle \in V^n$ is orthogonal to all the vectors in $V^n$, i.e., $\langle x|a\rangle = 0$ for $\forall \mathbf{x} \in V^n$, then $|a\rangle = 0$. This is because if we choose $|a\rangle$ for $|x\rangle$, we have $\langle a|a\rangle = 0$. This means that $|a\rangle = 0$. We call a linear vector space in which the inner product is defined an inner product space.

We can give another structure to a vector space. An example is a metric (or distance function). Suppose that there is an arbitrary set $Q$. If a real non-negative number $\rho(a, b)$ is defined as follows with any arbitrary elements $a, b, c \in Q$, the set $Q$ is called a metric space [1]:

$$\rho(a, b) = \rho(b, a), \qquad (13.5)$$

$$\rho(a, b) \ge 0 \ \text{for}\ \forall a, b; \quad \rho(a, b) = 0\ \text{if and only if}\ a = b, \qquad (13.6)$$

$$\rho(a, b) + \rho(b, c) \ge \rho(a, c). \qquad (13.7)$$

In our study a vector space is chosen for the set $Q$. Here let us define a norm for each vector $\mathbf{a}$. The norm is defined as

$$\|a\| = \sqrt{\langle a|a\rangle}. \qquad (13.8)$$

If we define $\rho(a, b) \equiv \|a - b\|$, then $\|a - b\|$ satisfies the definitions of a metric. Equations (13.5) and (13.6) are obvious. For (13.7) let us consider a vector $|c\rangle$ given by $|c\rangle = |a\rangle - x\langle b|a\rangle |b\rangle$ with real $x$. Since $\langle c|c\rangle \ge 0$, we have
$$x^2 \langle a|b\rangle \langle b|a\rangle \langle b|b\rangle - 2x \langle a|b\rangle \langle b|a\rangle + \langle a|a\rangle \ge 0$$

or

$$x^2 |\langle a|b\rangle|^2 \langle b|b\rangle - 2x |\langle a|b\rangle|^2 + \langle a|a\rangle \ge 0. \qquad (13.9)$$

The inequality (13.9), related to a quadratic expression in $x$ with real coefficients, requires that

$$\langle a|a\rangle \langle b|b\rangle \ge \langle a|b\rangle \langle b|a\rangle = |\langle a|b\rangle|^2. \qquad (13.10)$$

That is,

$$\sqrt{\langle a|a\rangle} \cdot \sqrt{\langle b|b\rangle} \ge |\langle a|b\rangle|. \qquad (13.11)$$

Namely,

$$\|a\| \cdot \|b\| \ge |\langle a|b\rangle| \ge \operatorname{Re}\langle a|b\rangle. \qquad (13.12)$$

The relations (13.11) and (13.12) are known as the Cauchy–Schwarz inequality.


Meanwhile, we have

$$\|a+b\|^2 = \langle a+b|a+b\rangle = \|a\|^2 + \|b\|^2 + 2\operatorname{Re}\langle a|b\rangle, \qquad (13.13)$$

$$(\|a\| + \|b\|)^2 = \|a\|^2 + \|b\|^2 + 2\|a\| \cdot \|b\|. \qquad (13.14)$$

Comparing (13.13) and (13.14) and using (13.12), we have

$$\|a\| + \|b\| \ge \|a+b\|. \qquad (13.15)$$

The inequality (13.15) is known as the triangle inequality. In (13.15), replacing $a \to a-b$ and $b \to b-c$, we get

$$\|a-b\| + \|b-c\| \ge \|a-c\|. \qquad (13.16)$$

Thus, (13.16) is equivalent to (13.7). At the same time, the norm defined in (13.8) may be regarded as a "length" of a vector $\mathbf{a}$.

As $\beta|b\rangle + \gamma|c\rangle$ represents a vector, we use a shorthand notation for it:

$$|\beta b + \gamma c\rangle \equiv \beta|b\rangle + \gamma|c\rangle. \qquad (13.17)$$

According to the definition (13.3),

$$\langle a|\beta b + \gamma c\rangle = \beta\langle a|b\rangle + \gamma\langle a|c\rangle. \qquad (13.18)$$

Also from (13.2),

$$\langle \beta b + \gamma c|a\rangle = \langle a|\beta b + \gamma c\rangle^* = [\beta\langle a|b\rangle + \gamma\langle a|c\rangle]^* = \beta^*\langle b|a\rangle + \gamma^*\langle c|a\rangle. \qquad (13.19)$$

That is,

$$\langle \beta b + \gamma c| = \beta^*\langle b| + \gamma^*\langle c|. \qquad (13.20)$$

Therefore, when we take out a scalar from a bra vector, it should be complex-conjugated. When the scalar is taken out from a ket vector, on the other hand, it is unaffected, by definition (13.3). To show this, we have $|\alpha a\rangle = \alpha|a\rangle$. Taking its adjoint, $\langle \alpha a| = \alpha^*\langle a|$.

We can view (13.17) as a linear transformation in a vector space. In other words, if we regard $|\cdot\rangle$ as a transformation of a vector $\mathbf{a} \in V^n$ to $|a\rangle \in \widetilde{V}^n$, then $|\cdot\rangle$ is a linear transformation of $V^n$ to $\widetilde{V}^n$. On the other hand, $\langle \cdot|$ cannot be regarded as a linear transformation of $\mathbf{a} \in V^n$ to $\langle a| \in \widetilde{V}^{n\prime}$. Sometimes the latter transformation is referred to as "antilinear" or "sesquilinear." From the point of view of formalism, the inner product can be viewed as an operation: $\widetilde{V}^n \times \widetilde{V}^{n\prime} \to \mathbb{C}$.

Let us consider $|x\rangle = |y\rangle$ in an inner product space $\widetilde{V}^n$. Then $|x\rangle - |y\rangle = 0$. That is, $|x - y\rangle = 0$. Therefore, we have $\mathbf{x} = \mathbf{y}$, or $|0\rangle = 0$. This means that the linear transformation $|\cdot\rangle$ converts $0 \in V^n$ to $|0\rangle \in \widetilde{V}^n$. This is a characteristic of a linear transformation represented in (11.44). Similarly we have $\langle 0| = 0$. However, we do not have to get into further details in this book. Also if we have to specify a vector space, we simply do so by designating it as $V^n$.

13.2 Gram Matrices

Once we have defined an inner product between any pair of vectors $|a\rangle$ and $|b\rangle$ of a vector space $V^n$, we can define and calculate various quantities related to inner products. As an example, let $|a_1\rangle, \cdots, |a_n\rangle$ and $|b_1\rangle, \cdots, |b_n\rangle$ be two sets of vectors in $V^n$. The vectors $|a_1\rangle, \cdots, |a_n\rangle$ may or may not be linearly independent. The same is true of $|b_1\rangle, \cdots, |b_n\rangle$. Let us think of the following matrix $M$:

$$
M = \begin{pmatrix} \langle a_1| \\ \vdots \\ \langle a_n| \end{pmatrix} (|b_1\rangle \cdots |b_n\rangle)
= \begin{pmatrix} \langle a_1|b_1\rangle & \cdots & \langle a_1|b_n\rangle \\ \vdots & \ddots & \vdots \\ \langle a_n|b_1\rangle & \cdots & \langle a_n|b_n\rangle \end{pmatrix}. \qquad (13.21)
$$

We assume the following cases.
(i) Suppose that in (13.21) $|b_1\rangle, \cdots, |b_n\rangle$ are linearly dependent. Then, without loss of generality we can put

$$|b_1\rangle = c_2|b_2\rangle + c_3|b_3\rangle + \cdots + c_n|b_n\rangle. \qquad (13.22)$$

Then we have

$$
M = \begin{pmatrix}
c_2\langle a_1|b_2\rangle + \cdots + c_n\langle a_1|b_n\rangle & \langle a_1|b_2\rangle & \cdots & \langle a_1|b_n\rangle \\
\vdots & \vdots & \ddots & \vdots \\
c_2\langle a_n|b_2\rangle + \cdots + c_n\langle a_n|b_n\rangle & \langle a_n|b_2\rangle & \cdots & \langle a_n|b_n\rangle
\end{pmatrix}. \qquad (13.23)
$$

Multiplying the second column, $\cdots$, and the $n$-th column by $(-c_2)$, $\cdots$, and $(-c_n)$, respectively, and adding them to the first column, we get a matrix whose first column vanishes:

$$
M \to \begin{pmatrix}
0 & \langle a_1|b_2\rangle & \cdots & \langle a_1|b_n\rangle \\
\vdots & \vdots & \ddots & \vdots \\
0 & \langle a_n|b_2\rangle & \cdots & \langle a_n|b_n\rangle
\end{pmatrix}. \qquad (13.24)
$$

Hence, $\det M = 0$.

(ii) Suppose in turn that in (13.21) $|a_1\rangle, \cdots, |a_n\rangle$ are linearly dependent. In that case, again, without loss of generality we can put

$$|a_1\rangle = d_2|a_2\rangle + d_3|a_3\rangle + \cdots + d_n|a_n\rangle. \qquad (13.25)$$

Focusing attention on the individual rows and taking a procedure similar to that described above, we have

$$
M \to \begin{pmatrix}
0 & \cdots & 0 \\
\langle a_2|b_1\rangle & \cdots & \langle a_2|b_n\rangle \\
\vdots & \ddots & \vdots \\
\langle a_n|b_1\rangle & \cdots & \langle a_n|b_n\rangle
\end{pmatrix}. \qquad (13.26)
$$

Again, $\det M = 0$.

Next let us examine the case where $\det M = 0$. In that case, the $n$ column vectors of $M$ in (13.21) are linearly dependent. Without loss of generality, we suppose that the first column is expressed as a linear combination of the other $(n-1)$ columns such that
$$
\begin{pmatrix} \langle a_1|b_1\rangle \\ \vdots \\ \langle a_n|b_1\rangle \end{pmatrix}
= c_2 \begin{pmatrix} \langle a_1|b_2\rangle \\ \vdots \\ \langle a_n|b_2\rangle \end{pmatrix}
+ \cdots
+ c_n \begin{pmatrix} \langle a_1|b_n\rangle \\ \vdots \\ \langle a_n|b_n\rangle \end{pmatrix}. \qquad (13.27)
$$

Rewriting this, we have

$$
\begin{pmatrix} \langle a_1|b_1 - c_2 b_2 - \cdots - c_n b_n\rangle \\ \vdots \\ \langle a_n|b_1 - c_2 b_2 - \cdots - c_n b_n\rangle \end{pmatrix}
= \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix}. \qquad (13.28)
$$

Multiplying the first row, $\cdots$, and the $n$-th row of (13.28) by appropriate complex numbers $p_1^*, \cdots,$ and $p_n^*$, respectively, we have

$$
\langle p_1 a_1 | b_1 - c_2 b_2 - \cdots - c_n b_n \rangle = 0, \ \cdots, \
\langle p_n a_n | b_1 - c_2 b_2 - \cdots - c_n b_n \rangle = 0. \qquad (13.29)
$$

Adding all the above, we get

$$\langle p_1 a_1 + \cdots + p_n a_n | b_1 - c_2 b_2 - \cdots - c_n b_n \rangle = 0. \qquad (13.30)$$

Now, suppose that $|a_1\rangle, \cdots, |a_n\rangle$ are basis vectors. Then, $|p_1 a_1 + \cdots + p_n a_n\rangle$ represents any vector in the vector space. This implies that $|b_1 - c_2 b_2 - \cdots - c_n b_n\rangle = 0$; for this see the remarks after (13.4). That is,

$$|b_1\rangle = c_2|b_2\rangle + \cdots + c_n|b_n\rangle. \qquad (13.31)$$

Thus, $|b_1\rangle, \cdots, |b_n\rangle$ are linearly dependent.

Meanwhile, $\det M = 0$ also implies that the $n$ row vectors of $M$ in (13.21) are linearly dependent. In that case, performing calculations similar to the above, we can readily show that if $|b_1\rangle, \cdots, |b_n\rangle$ are basis vectors, then $|a_1\rangle, \cdots, |a_n\rangle$ are linearly dependent.

We summarize the above discussion by the following statement. Suppose that we have two sets of vectors $|a_1\rangle, \cdots, |a_n\rangle$ and $|b_1\rangle, \cdots, |b_n\rangle$. Then:

At least one set of vectors is linearly dependent. $\Longleftrightarrow$ $\det M = 0$.
Both sets of vectors are linearly independent. $\Longleftrightarrow$ $\det M \ne 0$.

The latter statement is obtained by taking the contraposition of the former statement.

We restate the above in the following theorem:

Theorem 13.1 Let $|a_1\rangle, \cdots, |a_n\rangle$ and $|b_1\rangle, \cdots, |b_n\rangle$ be two sets of vectors defined in a vector space $V^n$. A necessary and sufficient condition for both these sets of vectors to be linearly independent is that $\det M \ne 0$ for the matrix $M$ defined below:
$$
M = \begin{pmatrix} \langle a_1| \\ \vdots \\ \langle a_n| \end{pmatrix} (|b_1\rangle \cdots |b_n\rangle)
= \begin{pmatrix} \langle a_1|b_1\rangle & \cdots & \langle a_1|b_n\rangle \\ \vdots & \ddots & \vdots \\ \langle a_n|b_1\rangle & \cdots & \langle a_n|b_n\rangle \end{pmatrix}.
$$

Next we consider the norm of a vector expressed in reference to a set of basis vectors $|e_1\rangle, \cdots, |e_n\rangle$ of $V^n$. Let us express a vector $|x\rangle$ in an inner product space as follows, as in the case of (11.10) and (11.13):

$$
|x\rangle = x_1|e_1\rangle + x_2|e_2\rangle + \cdots + x_n|e_n\rangle = |x_1 e_1 + x_2 e_2 + \cdots + x_n e_n\rangle
= (|e_1\rangle \cdots |e_n\rangle) \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \qquad (13.32)
$$

The bra vector $\langle x|$ is then denoted by

$$
\langle x| = (x_1^* \cdots x_n^*) \begin{pmatrix} \langle e_1| \\ \vdots \\ \langle e_n| \end{pmatrix}. \qquad (13.33)
$$
Thus, we have an inner product described as

$$
\langle x|x\rangle = (x_1^* \cdots x_n^*) \begin{pmatrix} \langle e_1| \\ \vdots \\ \langle e_n| \end{pmatrix} (|e_1\rangle \cdots |e_n\rangle) \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}
= (x_1^* \cdots x_n^*) \begin{pmatrix} \langle e_1|e_1\rangle & \cdots & \langle e_1|e_n\rangle \\ \vdots & \ddots & \vdots \\ \langle e_n|e_1\rangle & \cdots & \langle e_n|e_n\rangle \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \qquad (13.34)
$$

Here the matrix expressed as follows is called a Gram matrix [2–4]:

$$
G = \begin{pmatrix} \langle e_1|e_1\rangle & \cdots & \langle e_1|e_n\rangle \\ \vdots & \ddots & \vdots \\ \langle e_n|e_1\rangle & \cdots & \langle e_n|e_n\rangle \end{pmatrix}. \qquad (13.35)
$$

As $\langle e_j|e_i\rangle = \langle e_i|e_j\rangle^*$, we have $G = G^\dagger$. From Theorem 13.1, $\det G \ne 0$. With a shorthand notation, we write $(G)_{ij} = (g_{ij}) = (\langle e_i|e_j\rangle)$. As already mentioned in Sect. 1.4, if for a matrix $H$ we have a relation described by
$$H = H^\dagger, \qquad (1.119)$$

it is said to be an Hermitian matrix or a self-adjoint matrix. We often simply say that such a matrix is Hermitian.

Since Gram matrices frequently appear in matrix algebra and play an important role there, their properties are worth examining. Since $G$ is an Hermitian matrix, it can be diagonalized through a similarity transformation using a unitary matrix. We will give the proof later (see Sect. 14.3).

Let us deal with (13.34) further. We have

$$
\langle x|x\rangle = (x_1^* \cdots x_n^*)\, U U^\dagger
\begin{pmatrix} \langle e_1|e_1\rangle & \cdots & \langle e_1|e_n\rangle \\ \vdots & \ddots & \vdots \\ \langle e_n|e_1\rangle & \cdots & \langle e_n|e_n\rangle \end{pmatrix}
U U^\dagger \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \qquad (13.36)
$$

where $U$ is defined by $U U^\dagger = U^\dagger U = E$. Such a matrix $U$ is called a unitary matrix. We represent the matrix form of $U$ as

$$
U = \begin{pmatrix} u_{11} & \cdots & u_{1n} \\ \vdots & \ddots & \vdots \\ u_{n1} & \cdots & u_{nn} \end{pmatrix}, \qquad
U^\dagger = \begin{pmatrix} u_{11}^* & \cdots & u_{n1}^* \\ \vdots & \ddots & \vdots \\ u_{1n}^* & \cdots & u_{nn}^* \end{pmatrix}. \qquad (13.37)
$$

Here, putting

$$
U^\dagger \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} \xi_1 \\ \vdots \\ \xi_n \end{pmatrix}
\quad \text{or equivalently} \quad \sum_{k=1}^{n} x_k u_{ki}^* \equiv \xi_i, \qquad (13.38)
$$

and taking its adjoint such that

$$(x_1^* \cdots x_n^*)\, U = (\xi_1^* \cdots \xi_n^*) \quad \text{or equivalently} \quad \sum_{k=1}^{n} x_k^* u_{ki} = \xi_i^*,$$

we have

$$
\langle x|x\rangle = (\xi_1^* \cdots \xi_n^*)\, U^\dagger
\begin{pmatrix} \langle e_1|e_1\rangle & \cdots & \langle e_1|e_n\rangle \\ \vdots & \ddots & \vdots \\ \langle e_n|e_1\rangle & \cdots & \langle e_n|e_n\rangle \end{pmatrix}
U \begin{pmatrix} \xi_1 \\ \vdots \\ \xi_n \end{pmatrix}. \qquad (13.39)
$$

We assume that the Gram matrix is diagonalized by the similarity transformation by $U$. After being diagonalized, similarly to (12.192) and (12.193), the Gram matrix has the following form $G'$:
$$
U^\dagger G U = G' = \begin{pmatrix} \lambda_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda_n \end{pmatrix}. \qquad (13.40)
$$

Thus, we get

$$
\langle x|x\rangle = (\xi_1^* \cdots \xi_n^*)
\begin{pmatrix} \lambda_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda_n \end{pmatrix}
\begin{pmatrix} \xi_1 \\ \vdots \\ \xi_n \end{pmatrix}
= \lambda_1 |\xi_1|^2 + \cdots + \lambda_n |\xi_n|^2. \qquad (13.41)
$$

From the relation (13.4), $\langle x|x\rangle \ge 0$. This implies that in (13.41) we have

$$\lambda_i \ge 0 \quad (1 \le i \le n). \qquad (13.42)$$

To show this, suppose that for some $\lambda_i$ we had $\lambda_i < 0$, and take $\xi_i \ne 0$ for that index. Then, with the column vector $(0\ \cdots\ \xi_i\ \cdots\ 0)^T$ we would have $\langle x|x\rangle = \lambda_i |\xi_i|^2 < 0$, in contradiction.

Since we have $\det G \ne 0$, taking the determinant of (13.40) we have

$$
\det G' = \det\bigl(U^\dagger G U\bigr) = \det\bigl(U^\dagger U G\bigr) = \det E \cdot \det G = \det G = \prod_{i=1}^{n} \lambda_i \ne 0. \qquad (13.43)
$$

Combining (13.42) and (13.43), all the eigenvalues $\lambda_i$ are positive; i.e.,

$$\lambda_i > 0 \quad (1 \le i \le n). \qquad (13.44)$$
The norm $\langle x|x\rangle = 0$ if and only if $\xi_1 = \xi_2 = \cdots = \xi_n = 0$, which corresponds to $x_1 = x_2 = \cdots = x_n = 0$ from (13.38).

For further study, we generalize the aforementioned feature a little. That is, if $|e_1\rangle, \cdots,$ and $|e_n\rangle$ are linearly dependent, from Theorem 13.1 we get $\det G = 0$. This implies that there is at least one eigenvalue $\lambda_i = 0$ ($1 \le i \le n$) and that for a vector $(0\ \cdots\ \xi_i\ \cdots\ 0)^T$ with $\xi_i \ne 0$ we have $\langle x|x\rangle = 0$. (Suppose that in $V^2$ we have the linearly dependent vectors $|e_1\rangle$ and $|e_2\rangle = |-e_1\rangle$; see Example 13.2 below.)
Let $H$ be an Hermitian matrix [i.e., $(H)_{ij} = (H)_{ji}^*$]. If we have $\phi$ as a function of complex variables $x_1, \cdots, x_n$ such that

$$\phi(x_1, \cdots, x_n) = \sum_{i,j=1}^{n} x_i^* (H)_{ij}\, x_j, \qquad (13.45)$$

where $H_{ij}$ is a matrix element of an Hermitian matrix, then $\phi(x_1, \cdots, x_n)$ is said to be an Hermitian quadratic form. Suppose that $\phi(x_1, \cdots, x_n) = 0$ if and only if $x_1 = x_2 = \cdots = x_n = 0$, and otherwise $\phi(x_1, \cdots, x_n) > 0$ with any other set of $x_i$ ($1 \le i \le n$). Then the said Hermitian quadratic form is called positive definite and we write

$$H > 0. \qquad (13.46)$$

If $\phi(x_1, \cdots, x_n) \ge 0$ for any $x_i$ ($1 \le i \le n$) and $\phi(x_1, \cdots, x_n) = 0$ for at least one set $(x_1, \cdots, x_n)$ in which some $x_i \ne 0$, then $\phi(x_1, \cdots, x_n)$ is said to be positive semi-definite or non-negative. In that case, we write

$$H \ge 0. \qquad (13.47)$$

From the above argument, a Gram matrix comprising linearly independent vectors is positive definite, whereas one comprising linearly dependent vectors is non-negative. On the basis of the above argument, including (13.36) to (13.44), we have

$$
H > 0 \Longleftrightarrow \lambda_i > 0\ (1 \le i \le n),\ \det H > 0;
$$
$$
H \ge 0 \Longleftrightarrow \lambda_i \ge 0\ \text{with some}\ \lambda_i = 0\ (1 \le i \le n),\ \det H = 0. \qquad (13.48)
$$

Notice here that the eigenvalues $\lambda_i$ remain unchanged under a (unitary) similarity transformation. Namely, the eigenvalues are inherent to $H$.

We have already encountered several examples of positive definite and non-negative operators. A typical example of the former case is the Hamiltonian of the quantum-mechanical harmonic oscillator (see Chap. 2). In this case, the energy eigenvalues are all positive (i.e., positive definite). The orbital angular momenta $L^2$ of hydrogen-like atoms, on the other hand, are non-negative operators and, hence, an eigenvalue of zero is permitted.
Alternatively, the Gram matrix can be defined as $B^\dagger B$, where $B$ is any $(n, n)$ matrix. If we take an orthonormal basis $|\eta_1\rangle, |\eta_2\rangle, \cdots, |\eta_n\rangle$, then $|e_i\rangle$ can be expressed as

$$
|e_i\rangle = \sum_{j=1}^{n} b_{ji}\, |\eta_j\rangle, \qquad
\langle e_k|e_i\rangle = \sum_{j=1}^{n} \sum_{l=1}^{n} b_{lk}^*\, b_{ji}\, \langle \eta_l|\eta_j\rangle
= \sum_{j=1}^{n} \sum_{l=1}^{n} b_{lk}^*\, b_{ji}\, \delta_{lj}
= \sum_{j=1}^{n} b_{jk}^*\, b_{ji}
= \sum_{j=1}^{n} \bigl(B^\dagger\bigr)_{kj} (B)_{ji}
= \bigl(B^\dagger B\bigr)_{ki}. \qquad (13.49)
$$

For the second equality of (13.49), we used the orthonormality condition $\langle \eta_i|\eta_j\rangle = \delta_{ij}$. Thus, the Gram matrix $G$ defined in (13.35) can be regarded as identical to $B^\dagger B$.

In Sect. 11.4 we dealt with a linear transformation of a set of basis vectors $e_1, e_2, \cdots,$ and $e_n$ by a matrix $A$ defined in (11.69) and examined whether the transformed vectors $e_1', e_2', \cdots,$ and $e_n'$ are linearly independent. As a result, a necessary and sufficient condition for $e_1', e_2', \cdots,$ and $e_n'$ to be linearly independent (i.e., to be a set of basis vectors) is $\det A \ne 0$. Thus, we notice that $B$ plays the same role as $A$ of (11.69) and, hence, $\det B \ne 0$ if and only if the set of vectors $|e_1\rangle, |e_2\rangle, \cdots,$ and $|e_n\rangle$ defined in (13.32) is linearly independent. By the same token as the above, we conclude that the eigenvalues of $B^\dagger B$ are all positive if and only if $\det B \ne 0$ (i.e., $B$ is non-singular). Alternatively, if $B$ is singular, $\det B^\dagger B = \det B^\dagger \det B = 0$. In that case at least one of the eigenvalues of $B^\dagger B$ must be zero. The Gram matrices appearing in (13.35) are frequently dealt with in the field of mathematical physics in conjunction with quadratic forms. Further topics can be seen in the next chapter.
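The statement that $G = B^\dagger B$ is positive definite exactly when $B$ is non-singular is easy to probe numerically. A minimal sketch (NumPy assumed; the matrices below are arbitrary illustrations, not taken from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
B_full = rng.normal(size=(4, 4))   # generically non-singular
B_sing = B_full.copy()
B_sing[:, 0] = B_sing[:, 1]        # force linear dependence of the columns

for B in (B_full, B_sing):
    G = B.conj().T @ B             # Gram matrix B†B, cf. (13.49)
    lam = np.linalg.eigvalsh(G)    # real eigenvalues of an Hermitian matrix
    print(np.round(lam, 10))       # all > 0 for B_full; one zero for B_sing
```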
Example 13.1 Let us take two vectors $|\varepsilon_1\rangle$ and $|\varepsilon_2\rangle$ that are expressed as

$$|\varepsilon_1\rangle = |e_1\rangle + |e_2\rangle, \qquad |\varepsilon_2\rangle = |e_1\rangle + i|e_2\rangle. \qquad (13.50)$$

Here we have $\langle e_i|e_j\rangle = \delta_{ij}$ ($1 \le i, j \le 2$). Then we have a Gram matrix expressed as

$$
G = \begin{pmatrix} \langle \varepsilon_1|\varepsilon_1\rangle & \langle \varepsilon_1|\varepsilon_2\rangle \\ \langle \varepsilon_2|\varepsilon_1\rangle & \langle \varepsilon_2|\varepsilon_2\rangle \end{pmatrix}
= \begin{pmatrix} 2 & 1+i \\ 1-i & 2 \end{pmatrix}. \qquad (13.51)
$$

The principal minors of $G$ are $|2| = 2 > 0$ and $\begin{vmatrix} 2 & 1+i \\ 1-i & 2 \end{vmatrix} = 4 - (1+i)(1-i) = 2 > 0$. Therefore, according to Theorem 14.11, $G > 0$ (vide infra).

Let us diagonalize the matrix $G$. To this end, we find the roots of the characteristic equation. That is,

$$\det\,(G - \lambda E) = \begin{vmatrix} 2-\lambda & 1+i \\ 1-i & 2-\lambda \end{vmatrix} = 0, \qquad \lambda^2 - 4\lambda + 2 = 0. \qquad (13.52)$$

We have $\lambda = 2 \pm \sqrt{2}$. Then as a diagonalizing unitary matrix $U$ we get
$$
U = \begin{pmatrix} \dfrac{1+i}{2} & \dfrac{1+i}{2} \\[4pt] \dfrac{\sqrt{2}}{2} & -\dfrac{\sqrt{2}}{2} \end{pmatrix}. \qquad (13.53)
$$

Thus, we get

$$
U^\dagger G U =
\begin{pmatrix} \dfrac{1-i}{2} & \dfrac{\sqrt{2}}{2} \\[4pt] \dfrac{1-i}{2} & -\dfrac{\sqrt{2}}{2} \end{pmatrix}
\begin{pmatrix} 2 & 1+i \\ 1-i & 2 \end{pmatrix}
\begin{pmatrix} \dfrac{1+i}{2} & \dfrac{1+i}{2} \\[4pt] \dfrac{\sqrt{2}}{2} & -\dfrac{\sqrt{2}}{2} \end{pmatrix}
= \begin{pmatrix} 2+\sqrt{2} & 0 \\ 0 & 2-\sqrt{2} \end{pmatrix}. \qquad (13.54)
$$

The eigenvalues $2+\sqrt{2}$ and $2-\sqrt{2}$ are real and positive, as expected. That is, the Gram matrix is positive definite.
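A quick numerical cross-check of (13.53) and (13.54) can be done as follows (a sketch, NumPy assumed):

```python
import numpy as np

G = np.array([[2, 1 + 1j], [1 - 1j, 2]])
U = np.array([[(1 + 1j) / 2,   (1 + 1j) / 2],
              [np.sqrt(2) / 2, -np.sqrt(2) / 2]])

assert np.allclose(U.conj().T @ U, np.eye(2))  # U is unitary
D = U.conj().T @ G @ U
print(np.round(D, 6))                          # diag(2 + sqrt 2, 2 - sqrt 2)
```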
Example 13.2 Let $|e_1\rangle$ and $|e_2\rangle$ ($= -|e_1\rangle$) be two vectors. Then, we have a Gram matrix expressed as

$$
G = \begin{pmatrix} \langle e_1|e_1\rangle & \langle e_1|e_2\rangle \\ \langle e_2|e_1\rangle & \langle e_2|e_2\rangle \end{pmatrix}
= \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}. \qquad (13.55)
$$

Similarly to the case of Example 13.1, we have as an eigenvalue equation

$$\det\,(G - \lambda E) = \begin{vmatrix} 1-\lambda & -1 \\ -1 & 1-\lambda \end{vmatrix} = 0, \qquad \lambda^2 - 2\lambda = 0. \qquad (13.56)$$

We have $\lambda = 2$ or $0$. As a diagonalizing unitary matrix $U$ we get

$$
U = \begin{pmatrix} \dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} \\[4pt] -\dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} \end{pmatrix}. \qquad (13.57)
$$

Thus, we get

$$
U^\dagger G U =
\begin{pmatrix} \dfrac{1}{\sqrt{2}} & -\dfrac{1}{\sqrt{2}} \\[4pt] \dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} \end{pmatrix}
\begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}
\begin{pmatrix} \dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} \\[4pt] -\dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} \end{pmatrix}
= \begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix}. \qquad (13.58)
$$
As expected, we have a diagonal matrix, one of whose eigenvalues is zero. That is, the Gram matrix is non-negative.

In the present case, let us think of the following Hermitian quadratic form:

$$
\phi(x_1, x_2) = \sum_{i,j=1}^{2} x_i^* (G)_{ij}\, x_j
= (x_1^*\ x_2^*)\, U U^\dagger G U U^\dagger \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
= (x_1^*\ x_2^*)\, U \begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix} U^\dagger \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
= (\xi_1^*\ \xi_2^*) \begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} \xi_1 \\ \xi_2 \end{pmatrix}
= 2|\xi_1|^2.
$$

Then, if we take $\begin{pmatrix} \xi_1 \\ \xi_2 \end{pmatrix} = \begin{pmatrix} 0 \\ \xi_2 \end{pmatrix}$ with $\xi_2 \ne 0$ as a column vector, we get $\phi(x_1, x_2) = 0$. With this type of column vector, we have

$$
U^\dagger \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 0 \\ \xi_2 \end{pmatrix}, \quad \text{i.e.,} \quad
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = U \begin{pmatrix} 0 \\ \xi_2 \end{pmatrix}
= \begin{pmatrix} \dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} \\[4pt] -\dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} \end{pmatrix} \begin{pmatrix} 0 \\ \xi_2 \end{pmatrix}
= \frac{1}{\sqrt{2}} \begin{pmatrix} \xi_2 \\ \xi_2 \end{pmatrix},
$$

where $\xi_2$ is an arbitrary complex number. Thus, even for $\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \ne \begin{pmatrix} 0 \\ 0 \end{pmatrix}$ we may have $\phi(x_1, x_2) = 0$. To be more specific, if we take, e.g., $\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$ or $\begin{pmatrix} 1 \\ 2 \end{pmatrix}$, we get

$$
\phi(1, 1) = (1\ 1) \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = (1\ 1) \begin{pmatrix} 0 \\ 0 \end{pmatrix} = 0,
$$

$$
\phi(1, 2) = (1\ 2) \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \end{pmatrix} = (1\ 2) \begin{pmatrix} -1 \\ 1 \end{pmatrix} = 1 > 0.
$$

13.3 Adjoint Operators

A linear transformation $A$ is defined similarly as before, and $A$ transforms $|x\rangle$ of (13.32) such that

$$
A(|x\rangle) = (|e_1\rangle \cdots |e_n\rangle)
\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}
\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \qquad (13.59)
$$

where $(a_{ij})$ is a matrix representation of $A$. Defining $A(|x\rangle) = |A(x)\rangle$ and $\langle (x)A^\dagger| = (\langle x|)A^\dagger = [A(|x\rangle)]^\dagger$ to be consistent with (1.117), we have

$$
\langle (x)A^\dagger| = (x_1^* \cdots x_n^*)
\begin{pmatrix} a_{11}^* & \cdots & a_{n1}^* \\ \vdots & \ddots & \vdots \\ a_{1n}^* & \cdots & a_{nn}^* \end{pmatrix}
\begin{pmatrix} \langle e_1| \\ \vdots \\ \langle e_n| \end{pmatrix}. \qquad (13.60)
$$
Therefore, putting $|y\rangle = \sum_{i=1}^{n} y_i |e_i\rangle$, we have

$$
\langle (x)A^\dagger|y\rangle = (x_1^* \cdots x_n^*)
\begin{pmatrix} a_{11}^* & \cdots & a_{n1}^* \\ \vdots & \ddots & \vdots \\ a_{1n}^* & \cdots & a_{nn}^* \end{pmatrix}
\begin{pmatrix} \langle e_1| \\ \vdots \\ \langle e_n| \end{pmatrix}
(|e_1\rangle \cdots |e_n\rangle)
\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}
= (x_1^* \cdots x_n^*)\, A^\dagger G \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}. \qquad (13.61)
$$

Meanwhile, we get

$$
\langle y|A(x)\rangle = (y_1^* \cdots y_n^*)
\begin{pmatrix} \langle e_1| \\ \vdots \\ \langle e_n| \end{pmatrix}
(|e_1\rangle \cdots |e_n\rangle)
\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}
\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}
= (y_1^* \cdots y_n^*)\, G A \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \qquad (13.62)
$$

Hence, we have

$$
\langle y|A(x)\rangle^* = (y_1 \cdots y_n)\, G^* A^* \begin{pmatrix} x_1^* \\ \vdots \\ x_n^* \end{pmatrix}
= (y_1 \cdots y_n)\, G^T \bigl(A^\dagger\bigr)^T \begin{pmatrix} x_1^* \\ \vdots \\ x_n^* \end{pmatrix}. \qquad (13.63)
$$

With the second equality, we used $G^* = (G^\dagger)^T = G^T$ (note that $G$ is Hermitian) and $A^* = (A^\dagger)^T$. The complex conjugate matrix $A^*$ is defined as
$$
A^* \equiv \begin{pmatrix} a_{11}^* & \cdots & a_{1n}^* \\ \vdots & \ddots & \vdots \\ a_{n1}^* & \cdots & a_{nn}^* \end{pmatrix}.
$$

Comparing (13.61) and (13.63), we find that one is the transposed matrix of the other. Also note that $(AB)^T = B^T A^T$, $(ABC)^T = C^T B^T A^T$, etc. Since an inner product can be viewed as a $(1, 1)$ matrix, two mutually transposed $(1, 1)$ matrices are identical. Hence, we get

$$\langle (x)A^\dagger|y\rangle = \langle y|A(x)\rangle^* = \langle A(x)|y\rangle, \qquad (13.64)$$

where the second equality is due to (13.2).

The other way around, we may use (13.64) for the definition of the adjoint operator of a linear transformation $A$. In fact, on the basis of (13.64) we have

$$
\begin{aligned}
\sum_{i,j,k} x_i^* \bigl(A^\dagger\bigr)_{ik} (G)_{kj}\, y_j &= \sum_{i,j,k} y_j\, (G^*)_{jk} (A^*)_{ki}\, x_i^*, \\
\sum_{i,j,k} x_i^*\, y_j \Bigl[\bigl(A^\dagger\bigr)_{ik} (G)_{kj} - (G^*)_{jk} (A^*)_{ki}\Bigr] &= 0, \\
\sum_{i,j,k} x_i^*\, y_j \Bigl[\bigl(A^\dagger\bigr)_{ik} - (A^*)_{ki}\Bigr] (G)_{kj} &= 0. \qquad (13.65)
\end{aligned}
$$

With the third equality of (13.65), we used $(G^*)_{jk} = (G)_{kj}$, i.e., $G^\dagger = G$ (Hermitian matrix). Thanks to the freedom in the choice of basis vectors as well as of $x_i$ and $y_i$, we must have

$$\bigl(A^\dagger\bigr)_{ik} = (A^*)_{ki}. \qquad (13.66)$$
Adopting the matrix representation of (13.59) for $A$, we get [1]

$$\bigl(A^\dagger\bigr)_{ik} = \bigl(a_{ki}^*\bigr). \qquad (13.67)$$

Thus, we confirm that the adjoint operator $A^\dagger$ is represented by the complex conjugate transposed matrix of $A$, in accordance with (13.1).

Taking the complex conjugate of (13.64), we have

$$\langle (x)A^\dagger|y\rangle^* = \Bigl\langle y \Bigl| \bigl(A^\dagger\bigr)^\dagger(x) \Bigr\rangle = \langle y|A(x)\rangle.$$

Comparing both sides of the second equality, we get
$$\bigl(A^\dagger\bigr)^\dagger = A. \qquad (13.68)$$

In Chap. 11, we have seen how the matrix representation of a linear transformation $A$ is changed from $A$ to $A'$ by a transformation of the basis vectors. We have

$$A' = P^{-1} A P. \qquad (11.88)$$

In a similar manner, taking the adjoint of (11.88) we get

$$(A')^\dagger = P^\dagger A^\dagger \bigl(P^{-1}\bigr)^\dagger = P^\dagger A^\dagger \bigl(P^\dagger\bigr)^{-1}. \qquad (13.69)$$

In (13.69), we denote the adjoint operator before the basis vectors transformation simply by $A^\dagger$ to avoid complicated notation. We also have

$$\bigl(A^\dagger\bigr)' = (A')^\dagger. \qquad (13.70)$$

Meanwhile, suppose that $\bigl(A^\dagger\bigr)' = Q^{-1} A^\dagger Q$. Then, from (13.69) and (13.70) we have

$$P^\dagger = Q^{-1}.$$

Next let us perform a calculation as below:

$$
\begin{aligned}
\langle (\alpha u + \beta v)A^\dagger|y\rangle &= \langle y|A(\alpha u + \beta v)\rangle^* = \alpha^*\langle y|A(u)\rangle^* + \beta^*\langle y|A(v)\rangle^* \\
&= \alpha^*\langle (u)A^\dagger|y\rangle + \beta^*\langle (v)A^\dagger|y\rangle = \langle \alpha(u)A^\dagger + \beta(v)A^\dagger|y\rangle.
\end{aligned}
$$

As $\mathbf{y}$ is an element arbitrarily chosen from the relevant vector space, we have

$$\langle (\alpha u + \beta v)A^\dagger| = \langle \alpha(u)A^\dagger + \beta(v)A^\dagger|, \qquad (13.71)$$

or

$$(\alpha u + \beta v)A^\dagger = \alpha(u)A^\dagger + \beta(v)A^\dagger. \qquad (13.72)$$

Equation (13.71) states the equality of two vectors in an inner product space on both sides, whereas (13.72) states it in a vector space where the inner product is not defined. In either case, both (13.71) and (13.72) show that $A^\dagger$ is indeed a linear transformation. In fact, the matrix representation of (13.66) and (13.67) is independent of the concept of the inner product.

Suppose that there are two (or more) adjoint operators $B$ and $C$ that correspond to $A$. Then, from (13.64) we have
$$\langle (x)B|y\rangle = \langle (x)C|y\rangle = \langle y|A(x)\rangle^*. \qquad (13.73)$$

Also we have

$$\langle (x)B - (x)C|y\rangle = \langle (x)(B - C)|y\rangle = 0. \qquad (13.74)$$

As $\mathbf{x}$ and $\mathbf{y}$ are arbitrarily chosen elements, we get $B = C$, indicating the uniqueness of the adjoint operator.

It is of importance to examine how the norm of a vector is changed by a linear transformation. To this end, let us perform the calculation below:

$$
\langle (x)A^\dagger|A(x)\rangle = (x_1^* \cdots x_n^*)
\begin{pmatrix} a_{11}^* & \cdots & a_{n1}^* \\ \vdots & \ddots & \vdots \\ a_{1n}^* & \cdots & a_{nn}^* \end{pmatrix}
\begin{pmatrix} \langle e_1| \\ \vdots \\ \langle e_n| \end{pmatrix}
(|e_1\rangle \cdots |e_n\rangle)
\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}
\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}
= (x_1^* \cdots x_n^*)\, A^\dagger G A \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \qquad (13.75)
$$

Equation (13.75) gives the norm of the vector after its transformation. We may have a case where the norm is conserved before and after the transformation. Actually, comparing (13.34) and (13.75), we notice that if $A^\dagger G A = G$, then $\langle x|x\rangle = \langle (x)A^\dagger|A(x)\rangle$. Let us have the following example of this.

Example 13.3 Let us take two mutually orthogonal vectors $|e_1\rangle$ and $|e_2\rangle$ with $\|e_2\| = 2\|e_1\|$ as basis vectors in the $xy$-plane (Fig. 13.1). We assume that $|e_1\rangle$ is a

Fig. 13.1 Basis vectors $|e_1\rangle$ and $|e_2\rangle$ in the $xy$-plane and their linear transformation by $R$.
unit vector. Hence, the set of vectors $|e_1\rangle$ and $|e_2\rangle$ does not constitute an orthonormal basis. Let $|x\rangle$ be an arbitrary position vector there expressed as

$$|x\rangle = (|e_1\rangle\ |e_2\rangle) \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}. \qquad (13.76)$$

Then we have

$$
\langle x|x\rangle = (x_1\ x_2) \begin{pmatrix} \langle e_1| \\ \langle e_2| \end{pmatrix} (|e_1\rangle\ |e_2\rangle) \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
= (x_1\ x_2) \begin{pmatrix} \langle e_1|e_1\rangle & \langle e_1|e_2\rangle \\ \langle e_2|e_1\rangle & \langle e_2|e_2\rangle \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
= (x_1\ x_2) \begin{pmatrix} 1 & 0 \\ 0 & 4 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
= x_1^2 + 4x_2^2. \qquad (13.77)
$$

Next let us think of the following linear transformation $R$ whose matrix representation is given by

$$
R = \begin{pmatrix} \cos\theta & 2\sin\theta \\ -(\sin\theta)/2 & \cos\theta \end{pmatrix}. \qquad (13.78)
$$

The transformation matrix $R$ is geometrically represented in Fig. 13.1. Following (11.36) we have

$$
R(|x\rangle) = (|e_1\rangle\ |e_2\rangle) \begin{pmatrix} \cos\theta & 2\sin\theta \\ -(\sin\theta)/2 & \cos\theta \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}. \qquad (13.79)
$$

As a result of the transformation $R$, the basis vectors $|e_1\rangle$ and $|e_2\rangle$ are transformed into $|e_1'\rangle$ and $|e_2'\rangle$, respectively, as in Fig. 13.1, such that

$$
(|e_1'\rangle\ |e_2'\rangle) = (|e_1\rangle\ |e_2\rangle) \begin{pmatrix} \cos\theta & 2\sin\theta \\ -(\sin\theta)/2 & \cos\theta \end{pmatrix}.
$$

Taking the inner product of (13.79), we have

$$
\langle (x)R^\dagger|R(x)\rangle
= (x_1\ x_2) \begin{pmatrix} \cos\theta & -(\sin\theta)/2 \\ 2\sin\theta & \cos\theta \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & 4 \end{pmatrix}
\begin{pmatrix} \cos\theta & 2\sin\theta \\ -(\sin\theta)/2 & \cos\theta \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
= (x_1\ x_2) \begin{pmatrix} 1 & 0 \\ 0 & 4 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
= x_1^2 + 4x_2^2. \qquad (13.80)
$$

Putting

$$G = \begin{pmatrix} 1 & 0 \\ 0 & 4 \end{pmatrix}, \qquad (13.81)$$

we have $R^\dagger G R = G$.

Comparing (13.77) and (13.80), we notice that the norm of $|x\rangle$ remains unchanged after the transformation $R$. This means that $R$ is virtually a unitary transformation. The somewhat unfamiliar matrix form of $R$ resulted from the choice of basis vectors other than an orthonormal basis.
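A quick numerical check of $R^\dagger G R = G$ in Example 13.3 (a sketch, NumPy assumed; the angle is an arbitrary test value):

```python
import numpy as np

theta = 0.7                              # arbitrary angle
c, s = np.cos(theta), np.sin(theta)
R = np.array([[c, 2 * s], [-s / 2, c]])  # transformation (13.78)
G = np.array([[1.0, 0.0], [0.0, 4.0]])   # Gram matrix (13.81)

assert np.allclose(R.T @ G @ R, G)       # the norm is conserved, cf. (13.80)
```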

13.4 Orthonormal Basis

Now we introduce an orthonormal basis, the simplest and most important basis set in an inner product space. If we choose the orthonormal basis so that $\langle e_i|e_j\rangle = \delta_{ij}$, the Gram matrix is $G = E$. Thus, $R^\dagger G R = G$ reads as $R^\dagger R = E$. In that case a linear transformation is represented by a unitary matrix, and it conserves the norm of a vector as well as the inner product of two arbitrary vectors.

So far we have assumed that an adjoint operator $A^\dagger$ operates only on a row vector from the right, as is evident from (13.61). At the same time, $A$ operates only on a column vector from the left as in (13.62). To render the notation of (13.61) and (13.62) consistent with the associative law, we have to examine the commutability of $A^\dagger$ with $G$. In this context, the choice of an orthonormal basis enables us to get through this troublesome situation and largely eases matrix calculation. Thus,
$$
\langle (x)A^\dagger|A(x)\rangle
= (x_1^* \cdots x_n^*)
\begin{pmatrix} a_{11}^* & \cdots & a_{n1}^* \\ \vdots & \ddots & \vdots \\ a_{1n}^* & \cdots & a_{nn}^* \end{pmatrix}
\begin{pmatrix} \langle e_1| \\ \vdots \\ \langle e_n| \end{pmatrix}
(|e_1\rangle \cdots |e_n\rangle)
\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}
\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}
= (x_1^* \cdots x_n^*)\, A^\dagger E A \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}
= (x_1^* \cdots x_n^*)\, A^\dagger A \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \qquad (13.82)
$$

At the same time, we adopt a simple notation as below instead of (13.75):

$$
\langle xA^\dagger|Ax\rangle = (x_1^* \cdots x_n^*)\, A^\dagger A \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \qquad (13.83)
$$

This notation is now consistent with the associative law. Note that $A^\dagger$ and $A$ may operate on either a column or a row vector. We can also do without the symbol "$|$" in (13.83) and express it as

$$\langle xA^\dagger Ax\rangle = \langle xA^\dagger|Ax\rangle. \qquad (13.84)$$

Thus, we can freely operate $A^\dagger$ and $A$ from both the left and the right. By the same token, we rewrite (13.62) as

$$
\langle y|Ax\rangle = \langle yAx\rangle = (y_1^* \cdots y_n^*)
\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}
\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}
= (y_1^* \cdots y_n^*)\, A \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \qquad (13.85)
$$

Here, notice that a vector $|x\rangle$ is represented by a column vector with respect to the orthonormal basis. Using (13.64), we have

$$
\langle y|Ax\rangle^* = \langle xA^\dagger|y\rangle = (x_1^* \cdots x_n^*)\, A^\dagger \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}
= (x_1^* \cdots x_n^*)
\begin{pmatrix} a_{11}^* & \cdots & a_{n1}^* \\ \vdots & \ddots & \vdots \\ a_{1n}^* & \cdots & a_{nn}^* \end{pmatrix}
\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}. \qquad (13.86)
$$

If in (13.83) we put $|y\rangle = |Ax\rangle$, then $\langle xA^\dagger|Ax\rangle = \langle y|y\rangle \ge 0$. Thus, we define the norm of $|Ax\rangle$ as

$$\|Ax\| = \sqrt{\langle xA^\dagger|Ax\rangle}. \qquad (13.87)$$
Now we are in a position to construct an orthonormal basis in $V^n$ using $n$ linearly independent vectors $|i\rangle$ ($1 \le i \le n$). The following theorem is well known as the Gram–Schmidt orthonormalization.

Theorem 13.2: Gram–Schmidt Orthonormalization Theorem [5] Suppose that there is a set of linearly independent vectors $|i\rangle$ ($1 \le i \le n$) in $V^n$. Then one can construct an orthonormal basis $|e_i\rangle$ ($1 \le i \le n$) so that $\langle e_i|e_j\rangle = \delta_{ij}$ ($1 \le i, j \le n$) and each vector $|e_i\rangle$ is a linear combination of the vectors $|i\rangle$.

Proof First let us take $|1\rangle$. This can be normalized such that

$$|e_1\rangle = \frac{|1\rangle}{\sqrt{\langle 1|1\rangle}}, \qquad \langle e_1|e_1\rangle = 1. \qquad (13.88)$$

Next let us take $|2\rangle$ and then make the following vector:

$$|e_2\rangle = \frac{1}{L_2}\bigl[|2\rangle - \langle e_1|2\rangle|e_1\rangle\bigr], \qquad (13.89)$$

where $L_2$ is a normalization constant such that $\langle e_2|e_2\rangle = 1$. Note that $|e_2\rangle$ cannot be a zero vector. This is because if $|e_2\rangle$ were a zero vector, $|2\rangle$ and $|e_1\rangle$ (or $|1\rangle$) would be linearly dependent, in contradiction to the assumption. We have $\langle e_1|e_2\rangle = 0$. Thus, $|e_1\rangle$ and $|e_2\rangle$ are orthonormal.

After this, the proof is based upon mathematical induction. Suppose that the theorem is true of $(n-1)$ vectors. That is, let $|e_i\rangle$ ($1 \le i \le n-1$) be such that $\langle e_i|e_j\rangle = \delta_{ij}$ ($1 \le i, j \le n-1$) and each vector $|e_i\rangle$ is a linear combination of the vectors $|i\rangle$. Meanwhile, let us define
544 13 Inner Product Space

Xn1 
jf
ni  jni  j¼1
ej jn jej i: ð13:90Þ

Again, the vector j f


ni cannot be a zero vector as asserted above. We have
Xn1   Xn1 
hek j f
ni ¼ hek jni  j¼1
ej jn ek jej ¼ hek jni  e jn δkj ¼ 0, ð13:91Þ
j¼1 j

where 1 k n  1. The second equality comes from the assumption of the


induction. The vector j f
ni can always be normalized such that

f
j ni
jen i ¼ qffiffiffiffiffiffiffiffiffiffi , hen jen i ¼ 1: ð13:92Þ
fni
hnj f

Thus, the theorem is proven. In (13.92) a phase factor eiθ (θ: an arbitrarily chosen
real number) can be added such that

f
eiθ jni
jen i ¼ qffiffiffiffiffiffiffiffiffiffi : ð13:93Þ
fni
hnj f

To prove Theorem 13.2, we have used the following simple but important
theorem. ∎
Theorem 13.3 Let us have any n vectors jii ¼
6 0 (1 i n) in V n and let these
vectors jii be orthogonal to one another. Then the vectors jii are linearly
independent.
Proof Let us think of the following equation:

c1 j1i þ c2 j2i þ    þ cn jni ¼ 0: ð13:94Þ

Multiplying (13.94) by hij from the left and considering the orthogonality among
the vectors, we have

ci hijii ¼ 0: ð13:95Þ

Since hijii 6¼ 0, ci ¼ 0. The above is true of any ci and jii. Then c1 ¼ c2 ¼    ¼ cn ¼ 0.


Thus, (13.94) implies that j1i,j2i,    , jni are linearly independent. ∎
References 545

References

1. Hassani S (2006) Mathematical physics. Springer, New York


2. Satake I-O (1975) Linear algebra (Pure and applied mathematics). Marcel Dekker, New York
3. Satake I (1974) Linear algebra. Shokabo, Tokyo. (in Japanese)
4. Riley KF, Hobson MP, Bence SJ (2006) Mathematical methods for physics and engineering, 3rd
edn. Cambridge University Press, Cambridge
5. Dennery P, Krzywicki A (1996) Mathematics for physicists. Dover, New York
Chapter 14
Hermitian Operators and Unitary
Operators

Hermitian operators and unitary operators are quite often encountered in mathemat-
ical physics and, in particular, quantum physics. In this chapter we investigate their
basic properties. Both Hermitian operators and unitary operators fall under the
category of normal operators. The normal matrices are characterized by an important
fact that those matrices can be diagonalized by a unitary matrix. Moreover,
Hermitian matrices always possess real eigenvalues. This fact largely eases mathe-
matical treatment of quantum mechanics. In relation to these topics, in this chapter
we investigate projection operators systematically. We find their important applica-
tion to physicochemical problems in Part IV. We further investigate Hermitian
quadratic forms and real symmetric quadratic forms as an important branch of matrix
algebra. In connection with this topic, positive definiteness and non-negative prop-
erty of a matrix are an important concept. This characteristic is readily applicable to
theory of differential operators, thus rendering this chapter closely related to basic
concepts of quantum physics.

14.1 Projection Operators

In Chap. 12 we considered the decomposition of a vector space to direct sum of


invariant subspaces. We also mentioned properties of idempotent operators. More-
over, we have shown how an orthonormal basis can be constructed from a set of
linearly independent vectors. In this section an orthonormal basis set is implied as
basis vectors in an inner product space V n.
Let us start with a concept of an orthogonal complement. Let W be a subspace in
V n. Let us think of a set of vectors jxi such that

© Springer Nature Singapore Pte Ltd. 2020 547


S. Hotta, Mathematical Physical Chemistry,
https://doi.org/10.1007/978-981-15-2225-3_14
548 14 Hermitian Operators and Unitary Operators

n o
8
jxi; hxjyi ¼ 0 for j yi 2 W :

We name this set W⊥ and call it an orthogonal complement of W. The set W⊥


forms a subspace of V n. In fact, if jai, j bi 2 W⊥, ha j yi ¼ 0, hb| yi ¼ 0. Since
(ha j + hb| )| yi ¼ ha| yi + hb| yi ¼ 0. Therefore, jai + j bi 2 W⊥ and hαa| yi ¼ αha|
yi ¼ 0. Hence, jαai ¼ α j ai 2 W⊥. Then, W⊥ is a subspace of V n.
Theorem 14.1 Let W be a subspace and W⊥ be its orthogonal complement in V n.
Then,

V n ¼ W⨁W⊥ : ð14:1Þ

Proof Suppose that an orthonormal basis comprising je1i, j e2i, and j eni spans
V n;

V n ¼ Spanfje1 i, je2 i, . . . , jen ig: ð14:2Þ

Of the orthonormal basis, let |e1i, |e2i, and |eri (r < n) span W. Let an
arbitrarily chosen vector from V n be |xi. Then we have
Xn
jxi ¼ x1 je1 i þ x2 je2 i þ    þ xn jen i ¼ x
i¼1 i
j ei i: ð14:3Þ

Multiplying hejj on (14.3) from the left, we have


 Xn  Xn
ej j xi ¼ i¼1
x i e j j e i i ¼ x δ ¼ xj :
i¼1 i ij
ð14:4Þ

That is,
Xn
j xi ¼ i¼1
hei jxijei i: ð14:5Þ

Meanwhile, put
Xr
j x0 i ¼ i¼1
hei jxijei i: ð14:6Þ

Then we have jx0i 2 W. Also putting jx00i ¼ j xi  j x0i and multiplying


hei j (1  i  r) on it from the left, we get
14.1 Projection Operators 549

hei j x00 i ¼ hei jxi  hei jx0 i ¼ hei jxi  hei jxi ¼ 0: ð14:7Þ

Taking account of W ¼ Span{| e1i, | e2i,   , | eri}, we get jx00i 2 W⊥. That is,
for 8jxi 2 V n

jxi ¼jx0 i þ jx00 i: ð14:8Þ

This means that V n ¼ W + W⊥. Meanwhile, we have W \ W⊥ ¼ {0}. In fact,


suppose that |xi 2 W \ W⊥. Then hx j xi ¼ 0 because of the orthogonality.
However, this implies that |xi ¼ 0. Consequently, we have V n ¼ W ⨁ W⊥. This
completes the proof.
The consequence of the Theorem 14.1 is that the dimension of W⊥ is (n  r). In
other words, we have

dimV n ¼ n ¼ dimW þ dimW ⊥ :

Moreover, the contents of the Theorem 14.1 can readily be generalized to more
subspaces such that

V n ¼ W 1 ⨁W2 ⨁  ⨁Wr , ð14:9Þ

where W1, W2,   , and Wr (r  n) are mutually orthogonal complements. In this


case 8jxi 2 V n can be expressed uniquely as the sum of individual vectors jw1i, j w2i,
  , and j wri of each subspace; i.e.,

jxi ¼jw1 i þ jw2 i þ   þjwr i ¼ jw1 þ w2 þ    þ wr i: ð14:10Þ

Let us define the following operators similarly to the case of (12.201):

Pi ðjxiÞ ¼ jwi i ð1  i  r Þ: ð14:11Þ

Thus, the operator Pi extracts a vector jwii in a subspace Wi. Then we have

ðP1 þ P2 þ    þ Pr ÞðjxiÞ ¼ P1 jxi þ P2 jxi þ    þ Pr jxi


¼ jw1 iþjw2 i þ    þ jwr i ¼jxi: ð14:12Þ

Since jxi is an arbitrarily chosen vector, we get

P1 þ P2 þ    þ Pr ¼ E: ð14:13Þ

Moreover,
550 14 Hermitian Operators and Unitary Operators

Pi ½Pi ðjxiÞ ¼ Pi ðjwi iÞ ¼ jwi i ð1  i  r Þ: ð14:14Þ

Therefore, from (14.11) and (14.14) we have

Pi ½Pi ðjxiÞ ¼ Pi ðjxiÞ: ð14:15Þ

The vector jxi is arbitrarily chosen, and so we get

Pi 2 ¼ Pi : ð14:16Þ

Choose another arbitrary vector jyi 2 V n such that

jyi ¼ju1 i þ ju2 i þ   þjur i ¼ ju1 þ u2 þ    þ ur i: ð14:17Þ

Then, we have
 
hxPi yi ¼ hw1 þ w2 þ    þ wr Pi j u1 þ u2 þ    þ ur i
: ð14:18Þ
¼ hw1 þ w2 þ    þ wr j ui i ¼ hwi jui i

With the last equality, we used the mutual orthogonality of the subspaces.
Meanwhile, we have

hyjPi xi ¼ hu1 þ u2 þ    þ ur jPi jw1 þ w2 þ    þ wr i


: ð14:19Þ
¼ hu1 þ u2 þ    þ ur jwi i ¼ hui jwi i ¼ hwi jui i

Comparing (14.18) and (14.19), we get


 
hxjPi yi ¼ hyjPi xi ¼ hxPi { y , ð14:20Þ

where we used (13.64) with the second equality. Since jxi and jyi are arbitrarily
chosen, we get

Pi { ¼ Pi : ð14:21Þ

Equation (14.21) shows that Pi is Hermitian.


The above discussion parallels that made in Sect. 12.4 with an idempotent
operator. We have a following definition about a projection operator.
Definition 14.1 An operator P is said to be a projection operator if P2 ¼ P and
P{ ¼ P. That is, an idempotent and Hermitian operator is a projection operator.
As described above, a projection operator is characterized by (14.16) and (14.21).
An idempotent operator does not premise the presence of an inner product space, but
we only need a direct sum of subspaces. In contrast, if we deal with the projection
14.1 Projection Operators 551

operator, we are thinking of orthogonal compliments as subspaces and their direct


sum. The projection operator can adequately be defined in an inner product vector
space having an orthonormal basis.
From (14.13) we have

ðP1 þ P2 þ    þ Pr ÞðP1 þ P2 þ    þ Pr Þ
Xr X Xr X X
¼ P2þ
i¼1 i
PP ¼
i6¼j i j i¼1 i
P þ PP ¼Eþ
i6¼j i j
P P ¼ E:
i6¼j i j
ð14:22Þ

In (14.22) we used (14.13) and (14.16). Therefore, we get


X
PP
i6¼j i j
¼ 0:

In particular, we have

Pi Pj ¼ 0, Pj Pi ¼ 0 ði 6¼ jÞ: ð14:23Þ

In fact, we have
   
Pi Pj ðjxiÞ ¼ Pi jwj i ¼ 0 ði 6¼ j, 1  i, j  nÞ: ð14:24Þ

The second equality comes from Wi \ Wj ¼ {0}. Notice that in (14.24), indices
i and j are interchangeable. Again, |xi is arbitrarily chosen, and so (14.23) holds.
Combining (14.16) and (14.23), we write

Pi Pj ¼ δij : ð14:25Þ

In virtue of the relation (14.23), Pi + Pj (i 6¼ j) is a projection operator as well


[1]. In fact, we have
 2
Pi þ Pj ¼ Pi 2 þ Pi Pj þ Pj Pi þ Pj 2 ¼ Pi 2 þ Pj 2 ¼ Pi þ Pj ,

where the second equality comes from (14.23). Also we have


 {
Pi þ Pj ¼ Pi { þ Pj { ¼ Pi þ Pj :

The following notation is often used:

j wi ihwi j
Pi ¼ : ð14:26Þ
j jwi j j  j jwi j j

Then we have
552 14 Hermitian Operators and Unitary Operators

jwi ihwi j
Pi j xi ¼ j ðj w1 iþ j w2 i þ    þ jws iÞ
jjwi jj  jjwi jj
¼ ½jwi iðhwi jw1 i þ hwi jw2 i þ    þ hwi jws iÞ=hwi jwi i
¼ ½jwi ihwi jwi i=hwi jwi i ¼j wi i:

Furthermore, we have

Pi 2 j xi ¼ Pi j wi i ¼j wi i ¼ Pi j xi: ð14:27Þ

Equation (14.16) is recovered accordingly. Meanwhile,

ðjwi ihwi jÞ{ ¼ ðhwi jÞ{ ðjwi iÞ{ ¼ jwi ihwi j: ð14:28Þ

Hence, we recover

Pi { ¼ Pi : ð14:21Þ

In (14.28) we used (AB){ ¼ B{A{. In fact, we have


  { {   { {  D  E

xB A y ¼ xB A y ¼ hyAjBxi ¼ hyjABjxi ¼ xðABÞ{ jy : ð14:29Þ

With the second equality of (14.29), we used (13.86) where A is replaced with
B and jyi is replaced with A{ j yi. Since (14.29) holds for arbitrarily chosen vectors
jxi and jyi, comparing the first and last sides of (14.29) we have

ðABÞ{ ¼ B{ A{ : ð14:30Þ

We can express (14.29) alternatively as follows:

   D  {  { E
xB{ A{ y ¼ y A{  B{ x ¼ hyAjBxi ¼ hyABxi , ð14:31Þ

where with the second equality we used (13.68). Also recall the remarks after (13.83)
with the expressions of (14.29) and (14.31). Other notations can be adopted.
We can view a projection operator under a more strict condition. Related oper-
ators can be defined as well. As in (14.26), let us define an operator such that

Pek ¼j ek ihek j : ð14:32Þ

Operating Pei on jxi ¼ x1 j e1i + x2 j e2i +    + xn j eni from the left, we get
14.1 Projection Operators 553

 Xn    Xn     Xn 
 ej ¼ek i  ej ¼  ek i 
Pek jxi¼jek ihek  j¼1
x j j¼1
x j e k j¼1
x j δ kj ¼x k ek i:

Thus, we find that Pek plays the same role as P(k) defined in (12.201). Represented
by a matrix, Pek has the same structure as that denoted in (12.205). Evidently,
 2 {
Pek ¼ Pek , Pek ¼ Pek , P
f1 þ f
P2 þ    þ f
Pn ¼ E: ð14:33Þ

ðk Þ
Now let us modify P(k) in (12.201). There PðkÞ ¼ δi δjðkÞ , where only the (k,k)
ðk Þ ðk Þ
element is 1, otherwise 0, in the (n, n) matrix. We define a matrix PðmÞ ¼ δi δjðmÞ. A
full matrix representation for it is
0 1
0
B C
B ⋱ 1 C
B C
B C
B 0 C
B C
ðk Þ B C
PðmÞ ¼B 0 C, ð14:34Þ
B C
B C
B 0 C
B C
B C
@ A

0

ðk Þ
where only the (k, m) element is 1, otherwise 0. In an example of (14.34), PðmÞ is an
ðk Þ
upper triangle matrix (k < m). Therefore, its eigenvalues are all zero, and so PðmÞ is a
nilpotent matrix. If k > m, the matrix is a lower triangle matrix and nilpotent as well.
Such a matrix is not Hermitian (nor a projection operator), as can be immediately
seen from the matrix form of (14.34). Because of the properties of nilpotent matrices
ðk Þ
mentioned in Sect. 12.3, PðmÞ ðk 6¼ mÞ is not diagonalizable either.
Various relations can be extracted. As an example, we have

ðk Þ ðlÞ
X ðk Þ ðk Þ ðk Þ
PðmÞ PðnÞ ¼ δ δqðmÞ δðqlÞ δjðnÞ
q i
¼ δi δml δjðnÞ ¼ δml PðnÞ : ð14:35Þ

ðk Þ
Note that PðkÞ  PðkÞ defined in (12.201). From (14.35), moreover, we have

ðk Þ ðmÞ ðk Þ ðnÞ ðmÞ ðnÞ ðmÞ ðnÞ ðmÞ


PðmÞ PðnÞ ¼ PðnÞ , PðmÞ PðnÞ ¼ PðnÞ , PðnÞ PðmÞ ¼ PðmÞ ,
h i2
ðkÞ ðk Þ ðk Þ ðmÞ ðmÞ ðmÞ ðmÞ
PðmÞ PðmÞ ¼ δmk PðmÞ , PðmÞ PðmÞ ¼ PðmÞ ¼ PðmÞ , etc:
554 14 Hermitian Operators and Unitary Operators

These relations remain unchanged after the unitary similarity transformation by


U. For instance, taking the first equation of the above, we have
h ih i
ðkÞ ðmÞ ðk Þ ðmÞ ðk Þ
U { PðmÞ PðnÞ U ¼ U { PðmÞ U U { PðnÞ U ¼ U { PðnÞ U:

ðk Þ
Among these operators, only PðkÞ is eligible for a projection operator. We will
encounter further examples in Part IV.
ðk Þ
Using PðkÞ in (13.62), we have
0 1
D  E  x1
 ðkÞ  ðk Þ B C
yPðkÞ ðxÞ ¼ y1   yn GPðkÞ @ ⋮ A: ð14:36Þ
xn

Within a framework of an orthonormal basis where G ¼ E, the representation is


largely simplified to be
0 1
D  E  x1
 ðkÞ  ðkÞ B C
yPðkÞ x ¼ y1   yn PðkÞ @ ⋮ A ¼ yk xk :
 
ð14:37Þ
xn

14.2 Normal Operators

There are a large group of operators called normal operators that play an important
role in mathematical physics, especially quantum physics. A normal operator is
defined as an operator on an inner product space that commutes with its adjoint
operator. That is, let A be a normal operator. Then, we have

AA{ ¼ A{ A: ð14:38Þ

The normal operators include an Hermitian operator H defined as H{ ¼ H as well


as a unitary operator U defined as UU{ ¼ U{U ¼ E.
In this condition let us estimate the norm of jA{xi together with jAxi defined by
(13.87). If A is a normal operator,
rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
D  { Effi qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
kA{ xk ¼ x A{ jA{ x ¼ xAA{ x ¼ xA{ Ax ¼j jAxj j : ð14:39Þ
14.2 Normal Operators 555

The other way around suppose that j|Ax| j ¼ kA{xk. Then, since ||Ax||2 ¼ kA{xk2,
hxA{Axi ¼ hxAA{xi. That is, hx| A{A  AA{| xi ¼ 0 for an arbitrarily chosen vector
jxi. To assert A{A  AA{ ¼ 0, i.e., A{A ¼ AA{ on the assumption that hx| A{A  AA{|
xi ¼ 0, we need the following theorems:
Theorem 14.2 [2] A linear transformation A on an inner product space is the zero
transformation if and only if hy| Axi ¼ 0 for any vectors jxi and jyi.
Proof If A ¼ 0, then hy| Axi ¼ hy| 0i ¼ 0. This is because in (13.3) putting β ¼ 1 ¼  γ
and jbi ¼ j ci, we get ha| 0i ¼ 0. Conversely, suppose that hy| Axi ¼ 0 for any
vectors jxi and jyi. Then, putting jyi ¼ j Axi, hxA{| Axi ¼ 0 and hy| yi ¼ 0. This
implies that jyi ¼ j Axi ¼ 0. For jAxi ¼ 0 to hold for any jxi we must have A ¼ 0.
Note here that if A is a singular matrix, for some vectors jxi, j Axi ¼ 0. However,
even though A is singular, for jAxi ¼ 0 to hold for any jxi, A ¼ 0.
We have another important theorem under a further restricted condition. ∎
Theorem 14.3 [2] A linear transformation A on an inner product space is the zero
transformation if and only if hx| Axi ¼ 0 for any vectors jxi.
Proof As in the case of Theorem 14.2, a necessary condition is trivial. To prove a
sufficient condition, let us consider the following:

hx þ yjAðx þ yÞi ¼ hxjAxi þ hyjAyi þ hxjAyi þ hyjAxi, hxjAyi þ hyjAxi


¼ hx þ yjAðx þ yÞi  hxjAxi  hyjAyi: ð14:40Þ

From the assumption that hx| Axi ¼ 0 with any vectors jxi, we have

hxjAyi þ hyjAxi ¼ 0: ð14:41Þ

Meanwhile, replacing jyi by jiyi in (14.41), we get

hxjAiyi þ hiyjAxi ¼ i½hxjAyi  hyjAxi ¼ 0: ð14:42Þ

That is,

hxjAyi  hyjAxi ¼ 0: ð14:43Þ

Combining (14.41) and (14.43), we get

hxjAyi ¼ 0: ð14:44Þ

Theorem 14.2 means that A ¼ 0, indicating that the sufficient condition holds.
This completes the proof. ∎
Thus, returning to the beginning, i.e., remarks made after (14.39), we establish the
following theorem:
556 14 Hermitian Operators and Unitary Operators

Theorem 14.4 A necessary and sufficient condition for a linear transformation A on


an inner product space to be a normal operator is

kA{ xk ¼ kAxk: ð14:45Þ

14.3 Unitary Diagonalization of Matrices

A normal operator has a distinct property. The normal operator can be diagonalized
by a similarity transformation by a unitary matrix. The transformation is said to be a
unitary similarity transformation. Let us prove the following theorem.
Theorem 14.5 [3] A necessary and sufficient condition for a matrix A to be
diagonalized by unitary similarity transformation is that the matrix A is a normal
matrix.
Proof To prove the necessary condition, suppose that A can be diagonalized by a
unitary matrix U. That is,

U { AU ¼ D, i:e:, A ¼ UDU { and A{ ¼ UD{ U { , ð14:46Þ

where D is a diagonal matrix. Then


  
AA{ ¼ UDU { UD{ U { ¼ UDD{ U { ¼ UD{ DU {
  
¼ UD{ U { UDU { ¼ A{ A: ð14:47Þ

For the third equality, we used DD{ ¼ D{D (i.e., D and D{ are commutable). This
shows that A is a normal matrix.
To prove the sufficient condition, let us show that a normal matrix can be
diagonalized by unitary similarity transformation. The proof is due to mathematical
induction, as is the case with Theorem 12.1.
First we show that Theorem is true of a (2, 2) matrix. Suppose that one of
eigenvalues of A2 is α1 and that its corresponding eigenvector is jx1i. Following
procedures of the proof for Theorem 12.1 and remembering the Gram–Schmidt
orthonormalization theorem, we can construct a unitary matrix U1 such that

U 1 ¼ ðjx1 i jp1 iÞ, ð14:48Þ

where jx1i represents a column vector and jp1i is another arbitrarily determined
column vector. Then we can convert A2 to a triangle matrix such that
14.3 Unitary Diagonalization of Matrices 557

α1 x
f
A2  U 1 { A2 U 1 ¼ : ð14:49Þ
0 y

Then we have
 {  {    
f2 A
A f2  ¼ U 1 A2 U 1 U 1 { A2 U 1 Þ{ ¼ U 1 { A2 U 1 U 1 { A2 { U 1
   h i{
¼ U 1 { A2 A2 { U 1 ¼ U 1 { A2 { A2 U 1 ¼ U 1 { A2 { U 1 U 1 { A2 U 1 ¼ f
A2 Af2 :

ð14:50Þ

With the fourth equality, we used the supposition that A2 is a normal matrix.
Equation (14.50) means that fA2 defined in (14.49) is a normal operator. Via simple
matrix calculations, we have
! !
h i{ jα1 j2 þ jxj2 xy h i{ jα1 j2 α1 x
A2 f
f A2 ¼ , A2 f
f A2 ¼ : ð14:51Þ
x y jyj2 α1 x jxj2 þ jyj2

For (14.50) to hold, we must have x ¼ 0 in (14.51). Accordingly, we get

α1 0
f
A2 ¼ : ð14:52Þ
0 y

This implies that a normal matrix A2 has been diagonalized by the unitary
similarity transformation.
Now let us examine a general case where we consider a (n, n) square normal
matrix An. Let αn be one of eigenvalues of An. On the basis of the argument of the
e we
(2, 2) matrix case, after a suitable similarity transformation by a unitary matrix U
first have
 {
fn ¼ U
A e
e An U, ð14:53Þ

where we can put

fn ¼ αn xT
A , ð14:54Þ
0 B

where x is a column vector of order (n  1), 0 is a zero column vector of order


(n  1), and B is a (n  1, n  1) matrix. Then we have
558 14 Hermitian Operators and Unitary Operators

fn { ¼ αn 0
½A , ð14:55Þ
x B{

where x is a complex column vector. Performing matrix calculations, we have


! !
fn { ¼ jαn j2 þ xT x xT B{ fn  ¼
fn { ½A jαn j2 αn xT
f
An ½A , ½A :
Bx BB{ α n x x xT þ B { B
ð14:56Þ

fn ½A
For A fn { ½g
fn { ¼ ½A An  to hold with (14.56), we must have x = 0. Thus we get

αn 0
fn ¼
A : ð14:57Þ
0 B

Since f
An is a normal matrix, so is B. According to mathematical induction, let us
assume that the theorem holds with a (n  1, n  1) matrix, i.e., B. Then, also from
the assumption there exists a unitary matrix C and a diagonal matrix D, both of order
(n  1), such that BC ¼ CD. Hence,

αn 0 1 0 1 0 αn 0
¼ : ð14:58Þ
0 B 0 C 0 C 0 D

Here putting

1 0 αn 0
fn ¼
C and fn ¼
D , ð14:59Þ
0 C 0 D

we get

f fn ¼ C
An C fn D
fn : ð14:60Þ
h i{
fn is a (n, n) unitary matrix, [ C
As C fn ¼ C
fn { C fn ¼ E . Hence,
fn C
h i{
fn A
C fn C
fn ¼ D
fn . Thus, from (14.53) finally we get

h i{  {
fn
C e An U
U eCfn ¼ D
fn : ð14:61Þ

eC
Putting U fn ¼ V, V being another unitary operator,

fn :
V { An V ¼ D ð14:62Þ

This completes the proof. ∎


14.3 Unitary Diagonalization of Matrices 559

A direct consequence of Theorem 14.5 is that with any normal matrix we can find
a set of orthonormal eigenvectors corresponding to individual eigenvalues whether
or not those are degenerate.
In Sect. 12.2 we dealt with a decomposition of a linear vector space and relevant
reduction of an operator when discussing canonical forms of matrices. In this context
Theorem 14.5 gives a simple and clear criterion for this. Equation (14.57) implies
that a (n  1, n  1) submatrix B can further be reduced to matrices having lower
dimensions. Considering that a diagonal matrix is a special case of triangle matrices,
a normal matrix that has been diagonalized by the unitary similarity transformation
gives eigenvalues by its diagonal elements.
From a point of view of the aforementioned aspect, let us consider the character-
istics of normal matrices, starting with the discussion about the invariant subspaces.
We have a following important theorem.
Theorem 14.6 Let A be a normal matrix and let one of its eigenvalues be α. Let Wα
be an eigenspace corresponding to α. Then, Wα is both A-invariant and A{-invariant.
Also Wα⊥ is both A-invariant and A{-invariant.
Proof Theorem 14.5 ensures that a normal matrix is diagonalized by unitary
similarity transformation. Therefore, we deal with only “proper” eigenvalues and
eigenvectors here. First we show if a subspace W is A-invariant, then its orthogonal
complements W⊥ is A{-invariant. In fact, suppose that jxi 2 W and jx0i 2 W⊥. Then,
from (13.64) and (13.86) we have
   
hx0 jAxi ¼ 0 ¼ xA{ jx0 ¼ xjA{ x0 : ð14:63Þ

The first equality comes from the fact that jxi 2 W ⟹ A j xi(¼| Axi) 2 W as W is
A-invariant. From the last equality of (14.63), we have
 
A{ j x0 i ¼ jA{ x0 i 2 W ⊥ : ð14:64Þ

That is, W⊥ is A{-invariant.


Next suppose that jxi 2 Wα. Then we have

AA{ j xi ¼ A{ A j xi ¼ A{ ðαjxiÞ ¼ αA{ j xi: ð14:65Þ

Therefore, A{ j xi 2 Wα. This means that Wα is A{-invariant. From the above


remark, Wα⊥ is (A{){-invariant, i.e., A-invariant accordingly. This completes the
proof. ∎
From Theorem 14.5, we know that the resulting diagonal matrix D fn in (14.62) has
a form with n eigenvalues (αn) some of which may be multiple roots arranged in
diagonal elements. After diagonalizing the matrix, those eigenvalues can be sorted
out according to different eigenvalues α1, α2,   , and αs. This can also be done by
unitary similarity transformation. The relevant unitary matrix U is represented as
560 14 Hermitian Operators and Unitary Operators

0 1
1
B ⋱ C
B C
B C
B 1 C
B C
B C
B 0  1 C
B C
B 1 C
B C
B C
U¼B ⋮ ⋱ ⋮ C, ð14:66Þ
B C
B 1 C
B C
B C
B 1  0 C
B C
B 1 C
B C
B C
@ ⋱ A
1

where except (i, j) and ( j, i) elements equal to 1, all the off-diagonal elements are
zero. If operated from the left, U exchanges the i-th and j-th rows of the matrix. If
operated from the right, U exchanges the i-th and j-th columns of the matrix. Note
that U is at once unitary and Hermitian with eigenvalue 1 or 1. Note that U2 ¼ E.
This is because exchanging two columns (or two rows) two times produces identity
transformation. Thus performing such unitary similarity transformations appropriate
times, we get
0 1
α1
B ⋱ C
B C
B C
B α1 C
B C
B α2 C
B C
B C
B ⋱ C
Dn B
~
B
C:
C ð14:67Þ
B α2 C
B C
B ⋱ C
B C
B αs C
B C
B C
@ ⋱ A
αs

The matrix is identical to that represented in (12.181).


In parallel, V n is decomposed to mutually orthogonal subspaces associated with
different eigenvalues α1, α2,   , and αs such that

V n ¼ W α1  W α2      W αs : ð14:68Þ

This expression is formally identical to that represented in (12.191). Note,


however, that in (12.181) orthogonal subspaces are not implied. At the same time,
An is reduced to
14.3 Unitary Diagonalization of Matrices 561

0 1
Að1Þ
B C
B  C
B C
An B
~
B Að2Þ C,
C ð14:69Þ
B C
@ ⋮ ⋱ ⋮A
   AðsÞ

according to the different eigenvalues.


A normal operator has other distinct properties. Following theorems are good
examples.
Theorem 14.7 Let A be a normal operator on V n. Then jxi is an eigenvector of
A with an eigenvalue α, if and only if jxi is an eigenvector of A{ with an eigenvalue
α.
Proof We apply (14.45) for the proof. Both (A  αE){ ¼ A{  αE and (A  αE) are
normal, since A is normal. Consequently, we have j|(A  αE)x| j ¼ 0 if and only if
k(A{  αE)xk ¼ 0. Since only the zero vector has a zero norm, we get
 
ðA  αE Þ j xi ¼ 0 if and only if A{  α E j xi ¼ 0:

This completes the proof. ∎


n
Theorem 14.8 Let A be a normal operator on V . Then, eigenvectors corresponding
to different eigenvalues are mutually orthogonal.
Proof Let A be a normal operator on V n. Let jui be an eigenvector corresponding to
an eigenvalue α and jvi be an eigenvector corresponding to an eigenvalue β with
α 6¼ β. Then we have
 
αhvjui ¼ hvjαui ¼ vjAui ¼ hujA{ v ¼ hujβ vi ¼ hβ vjui ¼ βhvjui, ð14:70Þ

where with the fourth equality we used Theorem 14.7. Then we get

ðα  βÞhvjui ¼ 0:

Since α  β 6¼ 0, hv| ui ¼ 0. Namely, the eigenvectors jui and jvi are mutually
orthogonal.
In (12.208) we mentioned the decomposition of diagonalizable matrices. As for
the normal matrices, we have a related matrix decomposition. Let A be a normal
operator. Then, according to Theorem 14.5, A can be diagonalized and expressed as
(14.67). This is equivalently expressed as a following succinct relation. That is, if we
choose U for a diagonalizing unitary matrix, we have
562 14 Hermitian Operators and Unitary Operators

U { AU ¼ α1 P1 þ α2 P2 þ    þ αs Ps , ð14:71Þ

where α1, α2,   , and αs are different eigenvalues of A; Pl (1  l  s) is described


such that, e.g.,
0 1
E n1
B C
B  C
B C
P1 ¼ B
B 0n 2 C,
C ð14:72Þ
B C
@ ⋮ ⋱ ⋮A
   0n s

where E n1 stands for a (n1, n1) identity matrix with n1 corresponding to multiplicity
of α1. A matrix represented by 0n2 is a (n2, n2) zero matrix, and so forth. This
expression is in accordance with (14.69). From a matrix form (14.72), obviously
Pl (1  l  s) is a projection operator. Thus, operating U and U{ on both sides
(14.71) from the left and right of (14.71), respectively, we obtain

A ¼ α1 UP1 U { þ α2 UP2 U { þ    þ αs UPs U { : ð14:73Þ

Defining Pel  UPl U { ð1  l  sÞ, we have

A ¼ α1 f
P1 þ α1 f
P2 þ    þ αs Pes : ð14:74Þ

In (14.74), we can easily check that Pel is a projection operator with α1, α2,   ,
and αs being different eigenvalues of A. If αl (1  l  s) is degenerate, we express Pel
fμ ð1  μ  m Þ, where m is multiplicity of α . In that case, we may write
as P l l l l

Pel ¼ f
P1l      Pf
l :
ml
ð14:75Þ

Also we have

Pek Pel ¼ UPk U { UPl U { ¼ UPk EPl U { ¼ UPk Pl U { ¼ 0 ð1  k, l  sÞ:

The last equality comes from (14.23). Similarly, we have

Pel Pek ¼ 0:

Thus, we have
14.3 Unitary Diagonalization of Matrices 563

Pek Pel ¼ δkl :

If the operator is decomposed as in the case of (14.75), we can express

fμ Peν ¼ δ ð1  μ, ν  m Þ:
P l l μν l

Conversely, if an operator A is expressed by (14.74), that operator is normal


operator. In fact, we have

X { X X X X
A{ A ¼ αPe αPe ¼ eP
α α P e ¼ e ¼
α α δ P ei ,
jαi j2 P
i i i j j j i,j i j i j i,j i j ij i i
X X { X X X
AA{ ¼ αPe αPe ¼ eP
α α P e ¼ e ¼
α α δ P ei
jαi j2 P
j j j i i i i,j i j j i i,j i j ji j i

ð14:76Þ

Hence, A{A ¼ AA{. If projection operators are further decomposed as in the case
of (14.75), we have a related expression to (14.76). Thus, a necessary and sufficient
condition for an operator to be a normal operator is that the said operator is expressed
as (14.74). The relation (14.74) is well known as a spectral decomposition theorem.
Thus, the spectral decomposition theorem is equivalent to Theorem 14.5.
The relations (12.208) and (14.74) are virtually the same, aside from the fact that
whereas (14.74) premises an inner product space, (12.208) does not premise
it. Correspondingly, whereas the related operators are called projection operators
with the case of (14.74), those operators are said to be idempotent operators for
(12.208).
Example 14.1 Let us think of a Gram matrix of Example 13.1, as shown below.

2 1þi
G¼ : ð13:51Þ
1i 2

After a unitary similarity transformation, we got


pffiffiffi !
{ 2þ 2 0
U GU ¼ pffiffiffi : ð13:54Þ
0 2 2

e ¼ U { GU and rewriting (13.54), we have


Putting G
564 14 Hermitian Operators and Unitary Operators

pffiffiffi !
2þ 2 0 pffiffiffi 1 0 pffiffiffi 0 0

G pffiffiffi ¼ 2 þ 2 þ 2 2 :
0 2 2 0 0 0 1

e { ¼ G, we get
By a back calculation of U GU
0 pffiffiffi 1
1 2ð1 þ iÞ
pffiffiffi B C pffiffiffi
G¼ 2þ 2 B @ pffiffiffi
2 4 Cþ 2 2
A
2ð 1  i Þ 1
4 2
0 pffiffiffi 1
1 2ð 1 þ i Þ
B  C
B pffiffiffi2 4 C: ð14:77Þ
@ A
2ð 1  i Þ 1

4 2
pffiffiffi pffiffiffi
Putting eigenvalues α1 ¼ 2 þ 2 and α2 ¼ 2  2 along with
0 pffiffiffi 1 0 pffiffiffi 1
1 2ð1 þ iÞ 1 2ð1 þ iÞ
B C B  C
A1 ¼ B
@ p ffiffi
ffi 2 4 C,
A A2 ¼ B
@ p ffiffi
ffi 2 4 C,
A ð14:78Þ
2ð1  iÞ 1 2ð1  iÞ 1

4 2 4 2

we get

G ¼ α1 A1 þ α2 A2 : ð14:79Þ

In the above, A1 and A2 are projection operators. In fact, as anticipated we have

A1 2 ¼ A1 , A2 2 ¼ A2 , A1 A2 ¼ A2 A1 ¼ 0, A1 þ A2 ¼ E: ð14:80Þ

Moreover, (14.78) obviously shows that both A1 and A2 are Hermitian. Thus,
Eqs. (14.77) and (14.79) are an example of the spectral decomposition. The decom-
position is unique.
The Example 14.1 can be dealt with in parallel to Example 12.5. In Example 12.5,
however, an inner product space is not implied, and so we used an idempotent matrix
instead of a projection operator. Note that as can be seen in Example 12.5 that
idempotent matrix was not Hermitian.

14.4 Hermitian Matrices and Unitary Matrices

Of normal matrices, Hermitian matrices and unitary matrices play a crucial role both
in fundamental and applied science. Let us think of several topics and examples.
14.4 Hermitian Matrices and Unitary Matrices 565

In quantum physics, one frequently treats expectation value of an operator. In


general, such an operator is Hermitian, more strictly an observable. Moreover, a
vector on an inner product space is interpreted as a state on a Hilbert space. Suppose
that there is a linear operator (or observable that represents a physical quantity)
O that has discrete (or countable) eigenvalues α1, α2,   . The number of the
eigenvalues may be a finite number or an infinite number, but here we assume the
finite number; i.e., let us suppose that we have eigenvalues α1, α2,   , and αs, in
consistent with our previous discussion.
In quantum physics, we have a well-known Born probability rule. The rule says
the following: Suppose that we carry out a physical measurement on A with respect
to a physical state jui. Here we assume that jui has been normalized. Then, the
probability ℘l that A takes αl (1  l  s) is given by
 2
℘l ¼ Pel u , ð14:81Þ

where Pel is a projection operator that projects jui to an eigenspace W αl spanned by


jαl, ki. Here k (1  k  nl) reflects the multiplicity nl of an eigenvalue αl. Hence, we
express the nl-dimensional eigenspace W αl as

W αl ¼ Spanfjαl , 1i, jαl , 2i,   , jαl , nl ig: ð14:82Þ

Now, we define an expectation value hAi of A such that


Xs
hAi  α℘:
l¼1 l l
ð14:83Þ

From (14.81) we have


 2 D { E      
℘l ¼ Pel u ¼ uPel jPel u ¼ uPel jPel u ¼ uPel Pel u ¼ uPel u : ð14:84Þ

For the third equality we used the fact that Pel is Hermitian; for the last equality we
used Pel ¼ Pel . Summing (14.84) with the index l, we have
2

X X  D X E
℘ ¼
l l l
uPel u ¼ u l
el u ¼ huEui ¼ hu j ui ¼ 1,
P

where with the third equality we used (14.33).


Meanwhile, from the spectral decomposition theorem, we have

A ¼ α1 f
P1 þ α2 f
P2 þ    þ αs Pes : ð14:74Þ

Operating huj and jui on both sides of (14.74), we get


566 14 Hermitian Operators and Unitary Operators

D   E D   E    
 f  f e
hujAjui ¼ α1 uP 1 u þ α2 uP2 u þ    þ αs u Ps u

¼ α1 ℘1 þ α2 ℘2 þ    þ αs ℘s : ð14:85Þ

Equating (14.83) and (14.85), we have

hAi ¼ hujAjui: ð14:86Þ

In quantum physics, a real number is required for an expectation value of an


observable (i.e., a physical quantity). To warrant this, we have following theorems.
Theorem 14.9 A linear transformation A on an inner product space is Hermitian if
and only if hu|A|ui is real for all jui of that inner product space.
Proof If A ¼ A{, then hu|A|ui ¼ hu| A{| ui ¼ hu| A| ui. Therefore, hu| A| ui is real.
Conversely, if hu|A|ui is real for all jui, we have
 
hujAjui ¼ hujAjui ¼ ujA{ ju :

Hence,
 
ujA  A{ ju ¼ 0: ð14:87Þ

From Theorem 14.3, we get A  A{ ¼ 0, i.e., A ¼ A{. This completes the proof.∎
Theorem 14.10 The eigenvalues of an Hermitian operator A are real.
Proof Let α be an eigenvalue of A and let jui be a corresponding eigenvector. Then,
A j ui ¼ α j ui. Operating huj from the left, hu|A|ui ¼ αhu| ui ¼ α j |u|j. Thus,

α ¼ hujAjui= j juj j : ð14:88Þ

Since A is Hermitian, hu|A|ui is real. Then, α is real as well.


Unitary matrices have a following conspicuous features: (i) An inner product is
held invariant under unitary transformation: Suppose that jx0i ¼ U j xi and
jy0i ¼ U j yi, where U is a unitary operator. Then hy0| x0i ¼ hyU{| Uxi ¼ hy| xi. A
norm of any vector is held invariant under unitary transformation as well. This is
easily checked by replacing jyi with jxi in the above. (ii) Let U be a unitary matrix
and suppose that λ be an eigenvalue with jλi being its corresponding eigenvector of
that matrix. Then we have

j λi ¼ U { U j λi ¼ U { ðλjλiÞ ¼ λU { j λi ¼ λλ j λi, ð14:89Þ

where with the last equality we used Theorem 14.7. Thus


14.4 Hermitian Matrices and Unitary Matrices 567

ð1  λλ Þ j λi ¼ 0: ð14:90Þ

As jλi 6¼ 0 is assumed, 1  λλ ¼ 0. That is

λλ ¼ jλj2 ¼ 1: ð14:91Þ

Thus, eigenvalues of a unitary matrix have unit absolute value. Let those eigen-
values be λk ¼ eiθk ðk ¼ 1, 2,   ; θk : realÞ. Then, from (12.11) we have
Y Y Y 
detU ¼ λk ¼ eiðθ1 þθ2 þÞ and jdetU j ¼ jλk j ¼ eiθk  ¼ 1:
k k k

That is, any unitary matrix has a determinant of unit absolute value. ∎
Example 14.2 Let us think of a following unitary matrix R:

cos θ  sin θ
R¼ : ð14:92Þ
sin θ cos θ

A characteristic equation is
 
 cos θ  λ  sin θ 
 ¼ λ2  2λ cos θ þ 1: ð14:93Þ
 sin θ cos θ  λ 

Solving (14.93), we have


pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
λ ¼ cos θ cos 2 θ  1 ¼ cos θ i j sin θ j : ð14:94Þ

(i) θ ¼ 0: This is a trivial case. The matrix R has automatically been diagonalized
to be an identity matrix. Eigenvalues are 1 (double root).
(ii) θ ¼ π: This is a trivial case. Eigenvalues are 1 (double root).
(iii) θ 6¼ 0, π: Let us think of 0 < θ < π. Then λ ¼ cos θ i sin θ. As a diagonalizing
unitary matrix U, we get

0 1 0 1
1 1 1 i
pffiffiffi pffiffiffi pffiffiffi pffiffiffi
B 2 2C B 2 2 C
U¼B
@
C, U{ ¼ B C: ð14:95Þ
i i A @ 1 i A
 pffiffiffi pffiffiffi pffiffiffi  pffiffiffi
2 2 2 2

Therefore, we have
568 14 Hermitian Operators and Unitary Operators

0 1 0 1
1 i 1 1
pffiffiffi pffiffiffi pffiffiffi pffiffiffi
B 2 2 C cos θ  sin θ B 2 2C
U { RU ¼ B
@ 1
C B C
i A sin θ cos θ @  piffiffiffi i A
pffiffiffi  pffiffiffi pffiffiffi
2 2 2 2
eiθ 0
¼ : ð14:96Þ
0 eiθ

A trace of the resulting matrix is 2 cos θ. In the case of π < θ < 2π, we get a
diagonal matrix similarly. The conformation is left for readers.

14.5 Hermitian Quadratic Forms

The Hermitian quadratic forms appeared in, e.g., (13.34) and (13.83) in relation to
Gram matrices in Sect. 13.2. The Hermitian quadratic forms have wide applications
in the field of mathematical physics and materials science.
Let H be an Hermitian operator and jxi on an inner vector space. We define the
Hermitian quadratic form in an arbitrary orthonormal basis as follows:
0 1
x1
  B C X 
hxjH jxi  x1   xn H @ ⋮ A ¼ x ðH Þij xj ,
i,j i
xn

where jxi is represented as a column vector, as already mentioned in Sect. 13.4. Let
us start with unitary diagonalization of (13.40), where a Gram matrix is a kind of
Hermitian matrix. Following similar procedures, as in the case of (13.36) we obtain a
diagonal matrix and an inner product such that
0 10 1 2
λ1  0 ξ1 
 B   
CB C 
hxjH jxi ¼ ξ1   ξn @ ⋮ ⋱ ⋮ A@ ⋮ A ¼ λ1 ξ1 j2 þ    þ λn ξn  : ð14:97Þ

0    λn ξn 

Notice, however, that the Gram matrix comprising basis vectors (that are linearly
independent) is positive definite. Remember that if a Gram matrix is constructed by
A{A (Hermitian as well) according to whether A is non-singular or singular, A{A is
either positive definite or non-negative. The Hermitian matrix we are dealing with
here, in general, does not necessarily possess the positive definiteness or
non-negative feature. Yet, remember that hx| H| xi and eigenvalues λ1,   , λn are real.
Positive definiteness of matrices is an important concept in relation to the
Hermitian (and real) quadratic forms (see Sect. 13.2). In particular, in the case
where all the matrix elements are real and | xi is defined in a real domain, we are
14.5 Hermitian Quadratic Forms 569

dealing with the quadratic form with respect to a real symmetric matrix. In the case
of the real quadratic forms, we sometimes adopt the following notation [4, 5]:
0 1
x1
Xn B C
A½x  xT Ax ¼ a x x ;x
i,j¼1 ij i j
¼ @ ⋮ A,
xn

where A ¼ (aij) is a real symmetric matrix and xi (1  i  n) are real numbers. The
positive definiteness is invariant under a transformation PTAP, where P is
non-singular. In fact, if A > 0, for x0T ¼ xTP and A0 ¼ PTAP we have
 
xT Ax = xT P PT AP PT x = x0T A0 x0 :

Since P is non-singular, PTx = x0 represents any arbitrary vector. Hence,

A0 ¼ PT AP > 0, ð14:98Þ

where with the notation PTAP > 0; see (13.46). In particular, using a suitable
orthogonal matrix O, we obtain
0 1
λ1  0
B C
OT AO ¼ @ ⋮ ⋱ ⋮ A:
0  λn

From the above argument, OTAO > 0. We have

detOT AO ¼ detOT detAdetO ¼ ð 1ÞdetAð 1Þ ¼ detA:

Therefore, from (14.97), we have λi > 0 (1  i  n). Thus, we get

Y
n
detA ¼ λi > 0:
i¼1

Notice that in the above discussion, PTAP is said to be an equivalent transforma-


tion of A by P and that we distinguish the equivalent transformation from the
similarity transformation P1A P. Nonetheless, if we choose an orthogonal matrix
O for P, the two transformations are the same because OT ¼ O1.
We often encounter real quadratic forms in the field of electromagnetism. Typical
example is a trace of electromagnetic fields observed with an elliptically or circularly
polarized light (see Sect. 7.3). A permittivity tensor of an anisotropic media such as
crystals (either inorganic or organic) is another example. A tangible example for this
appeared in Sect. 9.5.2 with an anisotropic organic crystal.
Regarding the real quadratic forms, we have a following important theorem.
570 14 Hermitian Operators and Unitary Operators

Theorem 14.11 [4, 5] Let A be a n-dimensional real symmetric matrix A ¼ (aij). Let
A(k) be k-dimensional principal submatrices described by
0 1
ai1 i1 ai1 i2    ai1 ik
Ba    ai2 ik C
B i2 i1 ai2 i2 C
AðkÞ ¼ B C ð1  i1 < i2 <    < ik  nÞ,
@ ⋮ ⋮ ⋱ ⋮ A
aik i1 aik i2  aik ik

where the principal submatrices mean a matrix made by striking out rows and
columns on diagonal elements. Alternatively, a principal submatrix of a matrix
A can be defined as a matrix whose diagonal elements are a part of the diagonal
elements of A. As a special case, those include a11,   , or ann as a (1, 1) matrix (i.e.,
merely a number) as well as A itself. Then, we have

A > 0⟺detAðkÞ > 0 ð1  k  nÞ: ð14:99Þ

Proof First, suppose that A > 0. Then, in a quadratic form A[x] equating (n  k)
variables xl ¼ 0 (l 6¼ i1, i2,   , ik), we obtain a quadratic form of
Xk
a x x :
μ,ν¼1 iμ iν iμ iν

Since A > 0, this (partial) quadratic form is positive definite as well; i.e., we have

AðkÞ > 0:

Therefore, we get

detAðkÞ > 0:

This is due to (13.48). Notice that detA(k) is said to be a principal minor. Thus, we
have proven ⟹ of (14.99).
To prove ⟸, in turn, we use mathematical induction. If n ¼ 1, we have a trivial
case; i.e., A is merely a real positive number. Suppose that ⟸ is true of n  1. Then,
we have A(n  1) > 0 by supposition. Thus, it follows that it will suffice to show
A > 0 on condition that A(n  1) > 0 and detA > 0 in addition. Let A be a n-
dimensional real symmetric non-singular matrix such that
14.5 Hermitian Quadratic Forms 571

!
Aðn1Þ a
A¼ ,
aT an

where A(n  1) is a symmetric matrix and non-singular as well. We define P such that

1
!
E Aðn1Þ a
P¼ ,
0 1

where E is a (n  1, n  1) unit matrix. Notice that detP ¼ det E  1 ¼ 1 6¼ 0,


indicating that P is non-singular. We have

E 0
PT ¼ 1 :
aT Aðn1Þ 1

For this expression, consider a non-singular matrix S. Then, we have SS1 ¼ E.


Taking its transposition, we have (S1)TST ¼ E. Therefore, if S is a symmetric matrix
(S1)TS ¼ E; i.e., (S1)T ¼ S1. Hence, an inverse matrix of a symmetric matrix is
symmetric as well. Then, for a symmetric matrix A(n  1) we have

1 T 1 T  T T 1
aT Aðn1Þ ¼ Aðn1Þ a ¼ Aðn1Þ a:

Therefore, A can be expressed as


!
Aðn1Þ 0
A¼P T
1
P
0 an  Aðn1Þ ½a
! ! 1 !
E 0 Aðn1Þ 0 E Aðn1Þ a
¼ 1 1
:
aT Aðn1Þ 1 0 an  Aðn1Þ ½a 0 1
ð14:100Þ

Now, taking a determinant of (14.100), we have


h in 1
o
det A ¼ det PT det Aðn1Þ an  Aðn1Þ ½a det P
h in 1
o
¼ det Aðn1Þ an  Aðn1Þ ½a :

By supposition, we have det A(n  1) > 0 and det A > 0. Hence, we have
572 14 Hermitian Operators and Unitary Operators

1
an  Aðn1Þ ½a > 0:
!
ðn1Þ 1 xðn1Þ
Putting aen  an  A ½a and x  , we get
xn
!
Aðn1Þ 0 h i
½x ¼ Aðn1Þ xðn1Þ þ aen xn 2 :
0 aen

Since A(n  1) > 0 and aen > 0, we have


!
e Aðn1Þ 0
A > 0:
0 aen

Meanwhile, A is expressed as

e
A ¼ PT AP:

From (14.98), A > 0. These complete the proof. ∎


We also have a related theorem (the following Theorem 14.12) for an Hermitian
quadratic form.
g
Theorem 14.12 Let A ¼ (aij) be a n-dimensional Hermitian matrix. Let A ðk Þ
be k-
dimensional principal submatrices. Then, we have

g
A > 0⟺detAðk Þ
> 0 ð1  k  nÞ,

g
where Aðk Þ
is described as
0 1
a11    a1k
gðk Þ B C
A ¼@⋮ ⋱ ⋮ A:
ak1  akk

The proof is left for readers.


Example 14.3 Let us consider a following Hermitian (real symmetric) matrix and
corresponding Hermitian quadratic form.

5 2 5 2 x1
H¼ , hxjH jxi ¼ ðx1 x2 Þ : ð14:101Þ
2 1 2 1 x2
14.5 Hermitian Quadratic Forms 573

Principal minors of H are 5 (>0) and 1 (>0) and detH ¼ 5  4 ¼ 1 (>0).


Therefore, from Theorem 14.11 we have H > 0. A characteristic equation gives
following eigenvalues; i.e.,
pffiffiffi pffiffiffi
λ 1 ¼ 3 þ 2 2, λ 2 ¼ 3  2 2:

Both the eigenvalues are positive as anticipated. As a diagonalizing matrix R, we


get
pffiffiffi pffiffiffi !
1þ 2 1 2
R¼ :
1 1

To obtain a unitary matrix, we have to seek norms of column vectors.


pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
pffiffiffi
Corresponding to λ1 and λ2, we estimate their norms to be 4 þ 2 2 and
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
pffiffiffi
4  2 2, respectively. Using them, as a unitary matrix U we get
0 1
1 1
B pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
pffiffiffi  pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
pffiffiffi C
B 42 2 4 þ 2 2C
U¼B C: ð14:102Þ
@ 1 1 A
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
pffiffiffi pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
pffiffiffi
4þ2 2 42 2

Thus, performing the matrix diagonalization, we obtain a diagonal matrix D such


that
pffiffiffi !
{ 3þ2 2 0
D ¼ U HU ¼ pffiffiffi : ð14:103Þ
0 32 2

Let us view the above unitary diagonalization in terms of coordinate transforma-


tion. Using the above matrix U and changing (14.101) as in (13.36),

5 2 x1
hxjH jxi ¼ ðx1 x2 ÞUU { UU {
21 x2
pffiffiffi !
3þ2 2 0 x1
¼ ðx1 x2 ÞU pffiffiffi U { : ð14:104Þ
0 32 2 x2

Making an argument analogous to that of Sect. 13.2 and using similar notation,
we have
574 14 Hermitian Operators and Unitary Operators

f
x1 x1
hexjH 0 jexi ¼ ð f x2 ÞH 0
x1 f ¼ ðx1 x2 ÞUU { HUU { ¼ hxjH jxi: ð14:105Þ
f
x2 x2

That is, we have

f
x1 x1
¼ U{ : ð14:106Þ
f
x2 x2

Or, taking an adjoint of (14.106), we have equivalently ð fx1 fx2 Þ ¼ ðx1 x2 ÞU .


f
x1 x1
Notice here that and are real and that U is real (i.e., orthogonal
f
x2 x2
matrix). Likewise,

H 0 ¼ U { HU: ð14:107Þ

x1 f
x1
Thus, it follows that and are different column vector representa-
x2 f
x2
tions of the identical vector that is viewed in reference to two different sets of
orthonormal bases (i.e., different coordinate systems).
From (14.102), as an approximation we have
!
0:92 0:38 cos 22:5  sin 22:5
Uffi ffi : ð14:108Þ
0:38 0:92 sin 22:5 cos 22:5

Equating hx|H|xi to a constant, we get a hypersurface in a plane. Choosing 1 for a


constant, we get an equation of hypersurface (i.e., an ellipse) as a function of f
x1 and
fx2 such that

fx1 2 fx2 2
qffiffiffiffiffiffiffiffiffiffiffi 2
þ qffiffiffiffiffiffiffiffiffiffiffi 2
¼ 1: ð14:109Þ
1pffiffi 1pffiffi
3þ2 2 32 2

Figure 14.1 depicts the ellipse.

14.6 Simultaneous Eigenstates and Diagonalization

In quantum physics, a concept of simultaneous eigenstate is important and has been


briefly mentioned in Sect. 3.3. To rephrase this concept, suppose that there are two
operators B and C and ask whether the two operators (or more) possess a common set
14.6 Simultaneous Eigenstates and Diagonalization 575

Fig. 14.1 Ellipse obtained


through diagonalization of a
Hermitian quadratic form.
The angle θ is about 22.5

of eigenvectors. The question is boiled down to whether the two operators commute.
To address this question, the following theorem is important.
Theorem 14.13 [2] Two Hermitian matrices B and C commute if and only if there
exists a complete orthonormal set of common eigenvectors.
Proof Suppose that there exists a complete orthonormal set of common eigenvec-
tors {jxi; mii} that span the linear vector space, where i and mi are a positive integer
and that jxii corresponds to an eigenvalue bi of B and ci of C. Note that if mi is 1, we
say that the spectrum is nondegenerate and that if mi is equal to two or more, the
spectrum is said to be degenerate. Then we have

Bðjxi iÞ ¼ bi j xi i, C ðjxi iÞ ¼ ci j xi i: ð14:110Þ

Therefore, BC(| xii) ¼ B(ci| xii) ¼ ciB(| xii) ¼ cibi j xii. Similarly, CB(|
xii) ¼ cibi j xii. Consequently, (BC  CB)(| xii) ¼ 0 for any jxii. As all the set of
jxii span the vector space, BC  CB ¼ 0, namely BC ¼ CB.
In turn, assume that BC ¼ CB and that B(| xii) ¼ bi j xii, where jxii are
orthonormal. Then, we have

CBðjxi iÞ ¼ bi C ðjxi iÞ⟹B½Cðjxi iÞ ¼ bi C ðjxi iÞ: ð14:111Þ

This implies that C(| xii) is an eigenvector of B corresponding to the eigenvalue bi.
We have two cases.
(i) The spectrum is nondegenerate: The spectrum is said to be nondegenerate if
only one eigenvector belongs to an eigenvalue. In other words, multiplicity of bi
is one. Then, C(| xii) must be equal to some constant times jxii; i.e., C(|
xii) ¼ ci j xii. That is, jxii is an eigenvector of C corresponding to an eigenvalue
ci. That is, jxii is a common eigenvector to B and C. This completes the proof.
576 14 Hermitian Operators and Unitary Operators

(ii) The spectrum is degenerate: The spectrum is said to be degenerate if two or


more eigenvectors belong to an eigenvalue. Multiplicity
 of bi is two or more;
 ð1Þ  ð2Þ ðmÞ
here suppose that the multiplicity is m (m 2). Let xi i, xi i,   , and j xi i
be linearly independent vectors and belong to the eigenvalue bi of B. Then, from
the assumption we have m eigenvectors

ð1Þ ð2Þ ðmÞ


C jxi i , C jxi i ,   , C jxi i

that belong to an eigenvalue bi of B. This means that individual


ð μÞ ð1Þ
C jxi i ð1  μ  mÞ are described by linear combination of j xi i,
ð2Þ ðmÞ
j xi i,   , and j xi i. What we want to prove is to show that suitable linear
combination of these m vectors constitutes an eigenvector corresponding to an
eigenvalue cμ of C. Here, to avoid complexity we denote the multiplicity by
m instead of the abovementioned mi.
ð μÞ
The vectors C jxi i ð1  μ  mÞ can be described as

ð1Þ
Xm ð jÞ ðmÞ
Xm ð jÞ
C jxi i ¼ j¼1
α j1 j xi i,   , C jxi i ¼ α
j¼1 jm
j xi i: ð14:112Þ

Using full matrix representation, we have


0 1
  γ1
Xm ðk Þ  ð1Þ  ðmÞ B C
C γ jx i ¼ xi i  xi i C @ ⋮ A
k¼1 k i
γm
0 10 1
  α11    α1m γ1
 ð1Þ  ðmÞ B CB C
¼ xi i  xi i @ ⋮ ⋱ ⋮ A@ ⋮ A: ð14:113Þ
αm1    αmm γm

In (14.113), we adopt the notation of (11.37). Since (αij) is a matrix representation


of an Hermitian operator C, (αij) is Hermitian as well. More specifically, if we take an
ðlÞ
inner product of a vector expressed in (14.112) with j xi i, then we have
D  E D Xm E Xm D  E Xm
ðlÞ  ðk Þ ðlÞ  ð jÞ ðlÞ  ð jÞ
xi Cxi ¼ xi  j¼1
αjk x i ¼ j¼1
αjk xi  xi ¼ α δ
j¼1 jk lj

¼ αlk ð1  k,l  mÞ, ð14:114Þ

where the third equality comes from the orthonormality of the basis vectors.
Meanwhile,
14.6 Simultaneous Eigenstates and Diagonalization 577

D E D E D E
ðlÞ ðk Þ ðk Þ ðlÞ ðk Þ ðlÞ
xi jCxi ¼ xi jC { xi ¼ xi jCxi ¼ αkl  , ð14:115Þ

where the second equality comes from the Hermiticity of C. From (14.114) and
(14.115), we get

αlk ¼ αkl  ð1  k, l  mÞ: ð14:116Þ

This indicates that (αij) is in fact Hermitian.


We are seeking the condition under which linear combinations of the eigenvec-
ðk Þ
tors j xi i ð1  k  mÞ for B are simultaneously eigenvectors of C. If the linear
P ðk Þ
combination m k¼1 γ k j xi i is to be an eigenvector of C, we must have

Xm ðk Þ
Xm ð jÞ
C γ jx i ¼ c
k¼1 k i
γ jx i
j¼1 j i
: ð14:117Þ

Considering (14.113), we have


0 10 1 0 1
α11  α1m γ1 γ1
   
 ð1Þ  ðmÞ B
B
CB C  ð 1Þ  ðmÞ B C
xi i  xi i @ ⋮ ⋱ ⋮ A@ ⋮ A ¼ xi i  xi i c@ ⋮ C
C B C B
A:
αm1    αmm γm γm
ð14:118Þ

ð jÞ
The vectors j xi i ð1  k  mÞ span an invariant subspace (i.e., an eigenspace
corresponding to an eigenvalue of bi). Let us call this subspace Wm. Consequently, in
ð jÞ
(14.118) we can equate the scalar coefficients of individual j xi i ð1  k  mÞ .
Then, we get
0 10 1 0 1
α11  α1m γ1 γ1
B CB C B C
@⋮ ⋱ ⋮ A@ ⋮ A ¼ c@ ⋮ A: ð14:119Þ
αm1    αmm γm γm

This is nothing other than an eigenvalue equation. Since (αij) is an Hermitian


matrix, there should be m eigenvalues cμ some of which may be identical (the
degenerate case). Moreover, we can always decide m orthonormal column vectors
by solving (14.119). We denote them by γ (μ) (1  μ  m) that belong to cμ. Rewriting
(14.119), we get
578 14 Hermitian Operators and Unitary Operators

0 10 ðμÞ
1 0 ðμÞ 1
α11  α1m γ1 γ1
B CB C B C
@⋮ ⋱ ⋮ A@ ⋮ A ¼ cμ @ ⋮ A: ð14:120Þ
αm1    αmm γ ðmμÞ γ ðmμÞ

Equation (14.120) implies that we can construct a (m, m) unitary matrix from the
m orthonormal column vectors γ (μ). Using the said unitary matrix, we will be able to
diagonalize (αij) according to Theorem 14.5.
Having determined m eigenvectors γ (μ) (1  μ  m), we can construct a set of
eigenvectors such that
 Xm ðμÞ  ðkÞ
 ðμÞ
 yi i ¼ γ xi i:
k¼1 k
ð14:121Þ

ðμÞ
Finally let us confirm that j yi i ð1  μ  mÞ in fact constitute an orthonormal
basis. To show this, we have
D  E DXm Xm E
ðνÞ  ðμÞ ðνÞ ðk Þ  ðμÞ ðlÞ
yi  yi ¼ γ
k¼1 k
x i  γ
l¼1 l
x i
Xm Xm h ðνÞ i ðμÞ D ðkÞ ðlÞ E
¼ k¼1 l¼1 k
γ γ l xi jxi
Xm Xm h i ð14:122Þ
ðνÞ  ðμÞ
¼ k¼1 l¼1 k
γ γ l δkl
Xm h ðνÞ i ðμÞ
¼ k¼1 k
γ γ k ¼ δνμ

The last equality comes from the fact that a matrix comprising m orthonormal
ðμÞ
column vectors γ (μ) (1  μ  m) forms a unitary matrix. Thus, j yi i ð1  μ  mÞ
certainly constitute an orthonormal basis.
The above completes the proof. ∎
Theorem 14.13 can be restated as follows: Two Hermitian matrices B and C can
be simultaneously diagonalized by a unitary similarity transformation. As mentioned
above, we can construct a unitary matrix U such that
0 ð1Þ ðmÞ
1
γ1  γ1
B C
U¼@⋮ ⋱ ⋮ A:
γ ðm1Þ  γ ðmmÞ

Then, using U, matrices B and C are diagonalized such that


14.6 Simultaneous Eigenstates and Diagonalization 579

0 1 0 1
bi c1
B bi C B c2 C
B C B C
U { BU ¼ B
B
C,
C U { CU ¼ B
B
C: ð14:123Þ
C
@ ⋱ A @ ⋱ A
bi cm

Note that both B and C are represented in an invariant subspace Wm.


As we have already seen in Part I that dealt with an eigenvalue problem of a
hydrogen-like atom, squared angular momentum (L2) and z-component of angular
momentum (Lz) possess a mutual set of eigenvectors and, hence, their eigenvalues
are determined at once. Related matrix representations to (14.123) were given in
(3.159). On the other hand, this was not the case with a set of operators Lx, and Ly,
and Lz; see (3.30). Yet, we pointed out the exceptional case where these three
operators along with L2 take an eigenvalue zero in common which an eigenstate
pffiffiffiffiffiffiffiffiffiffi
Y 0 ðθ, ϕÞ  1=4π corresponds to. Nonetheless, no complete orthonormal set of
0

common eigenvectors exists with the set of operators Lx, and Ly, and Lz. This fact is
equivalent to that these three operators are noncommutative among them. In con-
trast, L2 and Lz share a complete orthonormal set of common eigenvectors and,
hence, are commutable.
ð1Þ ð2Þ ðmÞ
Notice that C jxi i , C jxi i ,   , and Cjxi i are not necessarily linearly
independent (see Sect. 9.4). Suppose that among m eigenvalues cμ (1  μ  m),
some cμ ¼ 0. Then, detC ¼ 0 according to (13.48). This means that C is singular. In
ð1Þ ð2Þ ðmÞ
that case, C jxi i , C jxi i ,   , and Cjxi i are linearly dependent. In Sect.
 
3.3, in fact, we had j Lz Y 00 ðθ, ϕÞ ¼j L2 Y 00 ðθ, ϕÞ ¼ 0 . But, this special situation
does not affect the proof of Theorem 14.13.
We know that any matrix A can be decomposed such that

1  h1  i
A¼A þ A{ þ i A  A{ , ð14:124Þ
2 2i
   
where we put B  12 A þ A{ and C  2i1 A  A{ ; both B and C are Hermitian.
That is any matrix A can be decomposed to two Hermitian matrices in such a way
that

A ¼ B þ iC: ð14:125Þ

Note here that B and C commute if and only if A and A{ commute, that is A is a
normal matrix. In fact, from (14.125) we get

AA{  A{ A ¼ 2iðCB  BC Þ: ð14:126Þ

From (14.126), if and only if B and C commute (i.e., B and C can be diagonalized
simultaneously), AA{  A{A ¼ 0, i.e., AA{ ¼ A{A. This indicates that A is a normal
matrix. Thus, the following theorem will follow.
580 14 Hermitian Operators and Unitary Operators

Theorem 14.14 A matrix can be diagonalized by a unitary similarity transforma-


tion, if and only if it is a normal matrix.
Thus, Theorem 14.13 is naturally generalized so that it can be stated as follows:
Two normal matrices B and C commute if and only if there exists a complete
orthonormal set of common eigenvectors.
In Sects. 14.1 and 14.3, we mentioned the spectral decomposition. There, we
showed a special case where projection operators commute with one another; see
(14.23). Thus, in light of Theorem 14.13, those projection operators can be diago-
nalized at once to be expressed as, e.g., (14.72). This is a conspicuous feature of the
projection operators.

References

1. Hassani S (2006) Mathematical physics. Springer, New York


2. Byron FW Jr, Fuller RW (1992) Mathematics of classical and quantum physics. Dover,
New York
3. Mirsky L (1990) An introduction to linear algebra. Dover, New York
4. Satake I-O (1975) Linear algebra (Pure and applied mathematics). Marcel Dekker, New York
5. Satake I (1974) Linear algebra. Shokabo, Tokyo. (in Japanese)
Chapter 15
Exponential Functions of Matrices

In Chap. 12, we dealt with a function of matrices. In this chapter we study several
important definitions and characteristics of functions of matrices. If elements of
matrices consist of analytic functions of a real variable, such matrices are of
particular importance. For instance, differentiation can naturally be defined with
the functions of matrices. Of these, exponential functions of matrices are widely used
in various fields of mathematical physics. These functions frequently appear in a
system of differential equations. In Chap. 10, we showed that SOLDEs with suitable
BCs can be solved using Green’s functions. In the present chapter, in parallel, we
show a solving method based on resolvent matrices. The exponential functions of
matrices have broad applications in group theory that we will study in Part IV. In
preparation for it, we study how the collection of matrices forms a linear vector
space. In accordance with Chap. 13, we introduce basic notions of inner product and
norm to the matrices.

15.1 Functions of Matrices

We consider a matrix in which individual components are differentiable with respect


to a real variable such that
 
Aðt Þ ¼ aij ðt Þ : ð15:1Þ

We define the differentiation as

dAðt Þ  0 
A0 ðt Þ ¼  aij ðt Þ : ð15:2Þ
dt

We have

© Springer Nature Singapore Pte Ltd. 2020 581


S. Hotta, Mathematical Physical Chemistry,
https://doi.org/10.1007/978-981-15-2225-3_15
582 15 Exponential Functions of Matrices

½Aðt Þ þ Bðt Þ0 ¼ A0 ðt Þ þ B0 ðt Þ, ð15:3Þ


½Aðt ÞBðt Þ0 ¼ A0 ðt ÞBðt Þ þ Aðt ÞB0 ðt Þ: ð15:4Þ

To get (15.4), putting A(t)B(t)  C(t) we have


X
cij ðt Þ ¼ a ðt Þbkj ðt Þ,
k ik
ð15:5Þ

where C(t) ¼ (cij(t)), B(t) ¼ (bij(t)). Differentiating (15.5), we get


X X
c0ij ðt Þ ¼ a0 ðt Þbkj ðt Þ þ
k ik
a ðt Þb0kj ðt Þ,
k ik
ð15:6Þ

namely, we have C0(t) ¼ A0(t)B(t) + A(t)B0(t), i.e., (15.4) holds.


Analytic functions are usually expanded as an infinite power series. As an
analogue, a function of a matrix is expected to be expanded as such. Among various
functions of matrices, exponential functions are particularly important and have a
wide field of applications. As in the case of a real or complex number x, the
exponential function of matrix A can be defined as

1 2 1
exp A  E þ A þ A þ    þ Aν þ   , ð15:7Þ
2! ν!

where A is a (n, n) square matrix and E is a (n, n) identity matrix. Note that E is used
instead of a number 1 that appears in the power series expansion of exp x described
as

1 2 1
exp x  1 þ x þ x þ    þ xν þ   :
2! ν!

Here we define the convergence of a series of matrices as in the following definition:

Definition 15.1 Let $A_0, A_1, \ldots, A_\nu, \ldots$ $[A_\nu \equiv (a_{ij,\nu})\ (1 \le i, j \le n)]$ be a sequence of matrices. Let $a_{ij,0}, a_{ij,1}, \ldots, a_{ij,\nu}, \ldots$ be the corresponding sequence of each matrix component. Let us define a series of matrices as $\tilde{A} \equiv (\tilde{a}_{ij}) = A_0 + A_1 + \cdots + A_\nu + \cdots$ and a series of individual matrix components as $\tilde{a}_{ij} \equiv a_{ij,0} + a_{ij,1} + \cdots + a_{ij,\nu} + \cdots$. Then, if each $\tilde{a}_{ij}$ converges, we say that $\tilde{A}$ converges.

Regarding the matrix notation in Definition 15.1, readers are referred to (11.38).
Now, let us show that the power series of (15.7) converges [1, 2]. Let A be an (n, n) square matrix. Putting $A = (a_{ij})$ and $\max_{i,j}|a_{ij}| = M$, and defining $A^\nu \equiv \left(a_{ij}^{(\nu)}\right)$, where $a_{ij}^{(\nu)}$ is the (i, j) component of $A^\nu$, we get

$$\left|a_{ij}^{(\nu)}\right| \le n^\nu M^\nu \quad (1 \le i, j \le n). \tag{15.8}$$

Note that $A^0 = E$; hence, $a_{ij}^{(0)} = \delta_{ij}$. Then, if ν = 0, (15.8) routinely holds. When ν = 1, (15.8) obviously holds as well from the definition of M, assuming n ≥ 2; we are considering (n, n) matrices with n ≥ 2. We wish to show that (15.8) holds for any ν (≥2) using mathematical induction. Suppose that (15.8) holds with ν − 1. That is,

$$\left|a_{ij}^{(\nu-1)}\right| \le n^{\nu-1} M^{\nu-1} \quad (1 \le i, j \le n).$$

Then, we have

$$\left(A^\nu\right)_{ij} \equiv a_{ij}^{(\nu)} = \left(A^{\nu-1}A\right)_{ij} = \sum_{k=1}^n \left(A^{\nu-1}\right)_{ik}(A)_{kj} = \sum_{k=1}^n a_{ik}^{(\nu-1)}\, a_{kj}. \tag{15.9}$$

Thus, we get

$$\left|a_{ij}^{(\nu)}\right| = \left|\sum_{k=1}^n a_{ik}^{(\nu-1)}\, a_{kj}\right| \le \sum_{k=1}^n \left|a_{ik}^{(\nu-1)}\right|\left|a_{kj}\right| \le n \cdot n^{\nu-1} M^{\nu-1} \cdot M = n^\nu M^\nu. \tag{15.10}$$

Consequently, (15.8) holds with ν.


Meanwhile, we have a next relation such that
X1 1
enM ¼ ν¼0 ν!
nν M ν : ð15:11Þ

The series (15.11) is certainly converges; note that nM is not a matrix but a
number. From (15.10) we obtain
X1  1 ðνÞ  X1 1
 a 
ν¼0 ν! ij ν¼0 ν!
nν M ν ¼ enM :

P 1 ðνÞ
This implies that 1 ν¼0 ν! aij converges (i.e., absolute convergence [3]). Thus,
from Definition 15.1, (15.7) converges.
To further examine the exponential functions of matrices, in the following proposition we show that a collection of matrices forms a linear vector space.

Proposition 15.1 Let $\mathcal{M}$ be the collection of all (n, n) square matrices. Then, $\mathcal{M}$ forms a linear vector space.

Proof Let $A = (a_{ij}) \in \mathcal{M}$ and $B = (b_{ij}) \in \mathcal{M}$ be (n, n) square matrices. Then, $\alpha A = (\alpha a_{ij})$ and $A + B = (a_{ij} + b_{ij})$ are again (n, n) square matrices, and so we have $\alpha A \in \mathcal{M}$ and $A + B \in \mathcal{M}$. Thus, $\mathcal{M}$ forms a linear vector space. ∎
In Proposition 15.1, $\mathcal{M}$ forms an $n^2$-th order vector space $V^{n^2}$. To define an inner product in this space, we introduce $\left|P_{(l)}^{(k)}\right\rangle$ (k, l = 1, 2, ..., n) as a basis set of $n^2$ vectors. Now, let $A = (a_{ij})$ be an (n, n) square matrix. To explicitly show that A is a vector in the inner product space, we express it as

$$|A\rangle = \sum_{k,l=1}^n a_{kl} \left|P_{(l)}^{(k)}\right\rangle.$$

Then, an adjoint matrix of $|A\rangle$ is written as

$$\langle A| \equiv |A\rangle^\dagger = \sum_{k,l=1}^n a_{kl}^{\ast} \left\langle P_{(l)}^{(k)}\right|, \tag{15.12}$$

where $\left\langle P_{(l)}^{(k)}\right|$ is an adjoint matrix of $\left|P_{(l)}^{(k)}\right\rangle$. With the notations in (15.12), see Sects. 1.4 and 13.3; for $P_{(l)}^{(k)}$ see Sect. 14.1. Meanwhile, let $B = (b_{ij})$ be another (n, n) square matrix denoted by

$$|B\rangle = \sum_{s,t=1}^n b_{st} \left|P_{(t)}^{(s)}\right\rangle.$$

Here, let us define $\left\{\left|P_{(l)}^{(k)}\right\rangle\right\}$ as an orthonormal basis set, with the inner product between vectors described in the form of matrices such that

$$\left\langle P_{(l)}^{(k)} \middle| P_{(t)}^{(s)} \right\rangle \equiv \delta_{ks}\delta_{lt}.$$

Then, from (15.12) the inner product between A and B is given by

$$\langle A|B\rangle = \sum_{k,l,s,t=1}^n a_{kl}^{\ast}\, b_{st} \left\langle P_{(l)}^{(k)} \middle| P_{(t)}^{(s)} \right\rangle = \sum_{k,l,s,t=1}^n a_{kl}^{\ast}\, b_{st}\, \delta_{ks}\delta_{lt} = \sum_{k,l=1}^n a_{kl}^{\ast}\, b_{kl}. \tag{15.13}$$

The relation (15.13) leads to the norm already introduced in (13.8). The norm of a matrix is defined as

$$\|A\| \equiv \sqrt{\langle A|A\rangle} = \sqrt{\sum_{i,j=1}^n \left|a_{ij}\right|^2}. \tag{15.14}$$

We have the following theorem regarding two matrices.

Theorem 15.1 Let $A = (a_{ij})$ and $B = (b_{ij})$ be two square matrices. Then, we have the following inequality:

$$\|AB\| \le \|A\| \cdot \|B\|. \tag{15.15}$$

Proof Let C = AB. Then, from the Cauchy–Schwarz inequality (Sect. 13.1) we get

$$\|C\|^2 = \sum_{i,j} \left|c_{ij}\right|^2 = \sum_{i,j}\left|\sum\nolimits_k a_{ik}b_{kj}\right|^2 \le \sum_{i,j}\left(\sum\nolimits_k |a_{ik}|^2\right)\left(\sum\nolimits_l |b_{lj}|^2\right) = \sum_{i,k}|a_{ik}|^2 \sum_{j,l}|b_{lj}|^2 = \|A\|^2 \cdot \|B\|^2. \tag{15.16}$$

This leads to (15.15). ∎
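As a quick numerical illustration of the norm (15.14) and the inequality (15.15), the following sketch (assuming NumPy; the random matrices are arbitrary) may be run; note that (15.14) coincides with the Frobenius norm used by NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

def norm(M):
    """The matrix norm of (15.14), i.e., the Frobenius norm."""
    return np.sqrt(np.sum(np.abs(M) ** 2))

print(norm(A @ B) <= norm(A) * norm(B))        # True, illustrating (15.15)
print(np.isclose(norm(A), np.linalg.norm(A)))  # True: same as NumPy's default
```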

15.2 Exponential Functions of Matrices and Their Manipulations

Using Theorem 15.1, we explore important characteristics of exponential functions of matrices. For this we have the next theorem.

Theorem 15.2 [4] Let A and B be mutually commutative matrices; i.e., AB = BA. Then, we have

$$\exp(A+B) = \exp A \exp B = \exp B \exp A. \tag{15.17}$$

Proof Since A and B are mutually commutative, $(A+B)^n$ can be expanded through the binomial theorem such that

$$\frac{1}{n!}(A+B)^n = \sum_{k=0}^n \frac{A^k}{k!}\,\frac{B^{n-k}}{(n-k)!}. \tag{15.18}$$
n!

Meanwhile, with an arbitrary integer m, we have

X2m 1 X X
n m An m Bn
n¼0 n!
ð A þ B Þ ¼ n¼0 n! n¼0 n!
þ Rm , ð15:19Þ

P 
where Rm has a form of ðk,lÞ A k l
B =k!l!, where the summation is taken over all
combinations of (k, l) that satisfy max(k, l)  m + 1 and k + l  2m. The number of
all the combinations (k, l) is m(m + 1) accordingly; see Fig. 15.1 [4]. We remark that
if in (15.19) we put m ¼ 0, (15.19) trivially holds with Rm ¼ 0 (i.e., zero matrix). The
[Fig. 15.1 Number of possible combinations (k, l), indicated with green points. The number of green points in the upper left triangle is m(m + 1)/2; that in the lower right triangle is m(m + 1)/2 as well, so the total number of green points is m(m + 1). Adapted from Yamanouchi T, Sugiura M (1960) Introduction to continuous groups (New Mathematics Series 18; in Japanese) [4], with the permission of Baifukan Co., Ltd., Tokyo]

The matrix $R_m$, however, changes with increasing m, and so we must evaluate it as m → ∞. Putting max(‖A‖, ‖B‖, 1) = C and using Theorem 15.1, we get

$$\|R_m\| \le \sum_{(k,l)} \frac{\|A\|^k\,\|B\|^l}{k!\,l!}. \tag{15.20}$$

In (15.20), max(k, l) ≥ m + 1 and min(k, l) ≥ 0, and so $k!\,l! \ge (m+1)!$. Since $\|A\|^k\|B\|^l \le C^{k+l} \le C^{2m}$, we have

$$\sum_{(k,l)} \frac{\|A\|^k\,\|B\|^l}{k!\,l!} \le \frac{m(m+1)\,C^{2m}}{(m+1)!} = \frac{C^{2m}}{(m-1)!}, \tag{15.21}$$

where m(m + 1) in the numerator comes from the number of combinations (k, l) as pointed out above. From Stirling's interpolation formula [5], we have

$$\Gamma(z+1) \approx \sqrt{2\pi}\, e^{-z}\, z^{z+\frac{1}{2}}. \tag{15.22}$$

Replacing z with an integer m − 1, we get

$$\Gamma(m) = (m-1)! \approx \sqrt{2\pi}\, e^{-(m-1)}\,(m-1)^{m-\frac{1}{2}}. \tag{15.23}$$

Then, we have

$$[\text{RHS of }(15.21)] = \frac{C^{2m}}{(m-1)!} \approx \frac{C^{2m}\, e^{m-1}}{\sqrt{2\pi}\,(m-1)^{m-\frac{1}{2}}} = \frac{C}{\sqrt{2\pi e}}\left(\frac{eC^2}{m-1}\right)^{m-\frac{1}{2}} \tag{15.24}$$

and

$$\lim_{m\to\infty} \frac{C}{\sqrt{2\pi e}}\left(\frac{eC^2}{m-1}\right)^{m-\frac{1}{2}} = 0. \tag{15.25}$$

Thus, from (15.20) and (15.21) we get

$$\lim_{m\to\infty} \|R_m\| = 0. \tag{15.26}$$

Remember that $\|R_m\| = 0$ if and only if $R_m = 0$ (the zero matrix); see (13.4) and (13.8) in Sect. 13.1. Finally, taking the limit (m → ∞) of both sides of (15.19), we obtain

$$\lim_{m\to\infty}\sum_{n=0}^{2m}\frac{1}{n!}(A+B)^n = \lim_{m\to\infty}\left[\sum_{n=0}^m \frac{A^n}{n!}\,\sum_{n=0}^m \frac{B^n}{n!} + R_m\right].$$

Considering (15.26), we get

$$\lim_{m\to\infty}\sum_{n=0}^{2m}\frac{1}{n!}(A+B)^n = \lim_{m\to\infty}\sum_{n=0}^m \frac{A^n}{n!}\,\sum_{n=0}^m \frac{B^n}{n!}. \tag{15.27}$$

This implies that (15.17) holds. It is obvious that if A and B commute, then exp A and exp B commute; that is, exp A exp B = exp B exp A. These complete the proof. ∎
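Theorem 15.2 is easy to probe numerically. In the sketch below (assuming NumPy and SciPy; the matrices are arbitrary illustrations), B is chosen as a polynomial in A so that AB = BA, whereas C does not commute with A; the factorization (15.17) then holds in the first case but fails in the second.

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[1.0, 2.0], [2.0, 1.0]])
B = 3.0 * A - np.eye(2)          # a polynomial in A, hence AB = BA
print(np.allclose(expm(A + B), expm(A) @ expm(B)))   # True (commuting case)

C = np.array([[0.0, 1.0], [0.0, 0.0]])               # AC != CA
print(np.allclose(expm(A + C), expm(A) @ expm(C)))   # False in general
```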
From Theorem 15.2, we immediately get the following property:

$$\text{(1)} \quad (\exp A)^{-1} = \exp(-A). \tag{15.28}$$

To show this, it suffices to note that A and −A commute; that is, $A(-A) = -A^2 = (-A)A$. Replacing B with −A in (15.17), we have

$$\exp(A - A) = \exp 0 = E = \exp A \exp(-A) = \exp(-A)\exp A. \tag{15.29}$$

This leads to (15.28). Notice here that exp 0 = E from (15.7). We further have important properties of exponential functions of matrices:

$$\text{(2)} \quad P^{-1}(\exp A)P = \exp\left(P^{-1}AP\right). \tag{15.30}$$
$$\text{(3)} \quad \exp\left(A^T\right) = (\exp A)^T. \tag{15.31}$$

$$\text{(4)} \quad \exp\left(A^\dagger\right) = (\exp A)^\dagger. \tag{15.32}$$

To show (2) we have

$$\left(P^{-1}AP\right)^n = \left(P^{-1}AP\right)\left(P^{-1}AP\right)\cdots\left(P^{-1}AP\right) = P^{-1}A\left(PP^{-1}\right)A\left(PP^{-1}\right)\cdots\left(PP^{-1}\right)AP = P^{-1}AEAE\cdots EAP = P^{-1}A^n P. \tag{15.33}$$

Applying (15.33) to (15.7), we have

$$\exp\left(P^{-1}AP\right) = E + P^{-1}AP + \frac{\left(P^{-1}AP\right)^2}{2!} + \cdots + \frac{\left(P^{-1}AP\right)^n}{n!} + \cdots = P^{-1}EP + P^{-1}AP + P^{-1}\frac{A^2}{2!}P + \cdots + P^{-1}\frac{A^n}{n!}P + \cdots = P^{-1}\left(E + A + \frac{A^2}{2!} + \cdots + \frac{A^n}{n!} + \cdots\right)P = P^{-1}(\exp A)P. \tag{15.34}$$

With (3) and (4), use the following equations:

$$\left(A^T\right)^n = \left(A^n\right)^T \quad\text{and}\quad \left(A^\dagger\right)^n = \left(A^n\right)^\dagger. \tag{15.35}$$

The confirmation of (15.31) and (15.32) is left for readers. For future purposes, we describe another important theorem and further properties of the exponential functions of matrices.
Theorem 15.3 Let $a_1, a_2, \ldots, a_n$ be the eigenvalues of an (n, n) square matrix A. Then, $\exp a_1, \exp a_2, \ldots, \exp a_n$ are the eigenvalues of exp A.

Proof From Theorem 12.1 we know that every (n, n) square matrix can be converted to a triangle matrix by a similarity transformation. Also, we know that the eigenvalues of a triangle matrix are given by its diagonal elements (Sect. 12.1). Let P be a non-singular matrix used for such a similarity transformation. Then, we have

$$P^{-1}AP = \begin{pmatrix} a_1 & & \ast \\ & \ddots & \\ 0 & & a_n \end{pmatrix}. \tag{15.36}$$

Meanwhile, let T be a triangle matrix described by

$$T = \begin{pmatrix} t_1 & & \ast \\ & \ddots & \\ 0 & & t_n \end{pmatrix}. \tag{15.37}$$

Then, from a property of the triangle matrix we have

$$T^m = \begin{pmatrix} t_1^{\,m} & & \ast \\ & \ddots & \\ 0 & & t_n^{\,m} \end{pmatrix}. \tag{15.38}$$

Applying this property to (15.36), we get

$$\left(P^{-1}AP\right)^m = P^{-1}A^m P = \begin{pmatrix} a_1^{\,m} & & \ast \\ & \ddots & \\ 0 & & a_n^{\,m} \end{pmatrix}, \tag{15.39}$$

where the last equality is due to (15.33). Summing (15.39) over m following (15.34), we obtain another triangle matrix expressed as

$$\exp\left(P^{-1}AP\right) = P^{-1}(\exp A)P = \begin{pmatrix} \exp a_1 & & \ast \\ & \ddots & \\ 0 & & \exp a_n \end{pmatrix}. \tag{15.40}$$

Once again considering that the eigenvalues of a triangle matrix are given by its diagonal elements and that eigenvalues are invariant under a similarity transformation, (15.40) implies that $\exp a_1, \exp a_2, \ldots, \exp a_n$ are the eigenvalues of exp A. These complete the proof. ∎
The question of whether the eigenvalues are proper eigenvalues or generalized eigenvalues is irrelevant to Theorem 15.3. In other words, each eigenvalue may be either a proper one or a generalized one (Sect. 12.6); that depends solely upon the nature of the matrix A in question.
Theorem 15.3 immediately leads to the next important relation. Taking the determinant of (15.40), we get

$$\det\left[P^{-1}(\exp A)P\right] = \det P^{-1}\,\det(\exp A)\,\det P = \det(\exp A) = (\exp a_1)(\exp a_2)\cdots(\exp a_n) = \exp(a_1 + a_2 + \cdots + a_n) = \exp(\mathrm{Tr}\,A). \tag{15.41}$$

To derive (15.41), refer to (11.58) and (12.10) as well. We list (15.41) as an important property of the exponential functions of matrices:

$$\text{(5)} \quad \det(\exp A) = \exp(\mathrm{Tr}\,A). \tag{15.41}$$
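Both Theorem 15.3 and Property (5) lend themselves to a quick numerical check. The sketch below (assuming NumPy and SciPy; the random matrix is an arbitrary example) compares the eigenvalues of exp A with the exponentials of the eigenvalues of A, and the determinant of exp A with exp(Tr A).

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))        # generic, possibly non-symmetric

eig_A = np.linalg.eigvals(A)
eig_expA = np.linalg.eigvals(expm(A))
# Theorem 15.3: exp(a_i) are the eigenvalues of exp A (compare as sorted sets)
print(np.allclose(np.sort_complex(np.exp(eig_A)), np.sort_complex(eig_expA)))
# Property (5): det(exp A) = exp(Tr A)
print(np.isclose(np.linalg.det(expm(A)), np.exp(np.trace(A))))
```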

We have examined general aspects of the exponential functions of matrices until now. Nonetheless, we have not yet studied their relationship with specific types of matrices. Regarding the applications of exponential functions of matrices to mathematical physics, especially with the transformations of functions and vectors, we frequently encounter orthogonal matrices and unitary matrices. For this purpose, we wish to examine the following properties.
(6) Let A be a real skew-symmetric matrix. Then, exp A is a real orthogonal matrix.
(7) Let A be an anti-Hermitian matrix. Then, exp A is a unitary matrix.
To show (6), we note that a skew-symmetric matrix is described as

$$A^T = -A \quad\text{or}\quad A^T + A = 0. \tag{15.42}$$

Then, we have $A^T A = -A^2 = A A^T$ and, hence, A and $A^T$ commute. Therefore, from Theorem 15.2, we have

$$\exp\left(A + A^T\right) = \exp 0 = E = \exp A \exp A^T = \exp A\,(\exp A)^T, \tag{15.43}$$

where the last equality comes from Property (3) of (15.31). Equation (15.43) implies that exp A is a real orthogonal matrix.
Since an anti-Hermitian matrix A is expressed as

$$A^\dagger = -A \quad\text{or}\quad A^\dagger + A = 0, \tag{15.44}$$

we have $A^\dagger A = -A^2 = A A^\dagger$, and so A and $A^\dagger$ commute. Using (15.32), we have

$$\exp\left(A + A^\dagger\right) = \exp 0 = E = \exp A \exp A^\dagger = \exp A\,(\exp A)^\dagger, \tag{15.45}$$

where the last equality comes from Property (4) of (15.32). This implies that exp A is a unitary matrix.
Regarding Properties (6) and (7), if A is replaced with tA (t : real), the relations (15.42) through (15.45) hold unchanged. Then, Properties (6) and (7) are rewritten as:
(6)′ Let A be a real skew-symmetric matrix. Then, exp tA (t : real) is a real orthogonal matrix.
(7)′ Let A be an anti-Hermitian matrix. Then, exp tA (t : real) is a unitary matrix.
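Properties (6)′ and (7)′ can be illustrated in a few lines of code (the sketch assumes NumPy and SciPy; the generating matrices and the value of t are arbitrary): exponentiating a real skew-symmetric matrix yields an orthogonal matrix, and exponentiating an anti-Hermitian matrix yields a unitary one.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)
X = rng.standard_normal((3, 3))
A = X - X.T                                # real skew-symmetric: A^T = -A
R = expm(2.5 * A)                          # exp tA with t = 2.5
print(np.allclose(R @ R.T, np.eye(3)))     # True: R is real orthogonal

Y = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
H = Y - Y.conj().T                         # anti-Hermitian: H^dagger = -H
U = expm(2.5 * H)
print(np.allclose(U @ U.conj().T, np.eye(3)))  # True: U is unitary
```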
Now, let a function F(t) be expressed as F(t) = exp tx with real numbers t and x. Then, we have the well-known formula

$$\frac{d}{dt}F(t) = xF(t). \tag{15.46}$$

Next, we extend (15.46) to the exponential functions of matrices. Let us define the differentiation of an exponential function of a matrix with respect to a real parameter t. Then, in concert with (15.46), we have the following important theorem.

Theorem 15.4 [1, 2] Let F(t) ≡ exp tA, with t being a real number and A being a matrix that does not depend on t. Then, we have

$$\frac{d}{dt}F(t) = AF(t). \tag{15.47}$$

In (15.47) we assume that the individual matrix components of F(t) are differentiable with respect to t and that the differentiation of a matrix is defined as in (15.2).
Proof We have

$$F(t+\Delta t) = \exp[(t+\Delta t)A] = \exp(tA + \Delta t\,A) = (\exp tA)(\exp \Delta t\,A) = F(t)F(\Delta t) = (\exp \Delta t\,A)(\exp tA) = F(\Delta t)F(t), \tag{15.48}$$

where we considered that tA and ΔtA are commutative and used Theorem 15.2. Therefore, we get

$$\frac{1}{\Delta t}[F(t+\Delta t) - F(t)] = \frac{1}{\Delta t}[F(\Delta t) - E]\,F(t) = \frac{1}{\Delta t}\left[\sum_{\nu=0}^{\infty}\frac{1}{\nu!}(\Delta t\,A)^{\nu} - E\right]F(t) = \left[\sum_{\nu=1}^{\infty}\frac{1}{\nu!}(\Delta t)^{\nu-1}A^{\nu}\right]F(t). \tag{15.49}$$

Taking the limit of both sides of (15.49), we get

$$\lim_{\Delta t\to 0}\frac{1}{\Delta t}[F(t+\Delta t)-F(t)] \equiv \frac{d}{dt}F(t) = AF(t).$$

Notice that in (15.49) only the first term (i.e., ν = 1) of the RHS is nonvanishing as Δt → 0. Then, (15.47) follows. This completes the proof. ∎

On the basis of the power series expansion of the exponential function of a matrix defined in (15.7), A and F(t) are commutative. Therefore, instead of (15.47) we may write

$$\frac{d}{dt}F(t) = F(t)A. \tag{15.50}$$
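A finite-difference check of (15.47) and (15.50) is straightforward (the sketch assumes NumPy and SciPy; the matrix A, the value of t, and the step Δt are arbitrary illustrative choices).

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [-1.0, 0.0]])
t, dt = 0.7, 1e-6
F = lambda s: expm(s * A)

dF_numeric = (F(t + dt) - F(t - dt)) / (2 * dt)       # central difference
print(np.allclose(dF_numeric, A @ F(t), atol=1e-8))   # True: (15.47)
print(np.allclose(A @ F(t), F(t) @ A))                # True: (15.50)
```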

Using Theorem 15.4, we show the following important properties. These are converse propositions to Properties (6) and (7).
(8) Let exp tA be a real orthogonal matrix for any real number t. Then, A is a real skew-symmetric matrix.
(9) Let exp tA be a unitary matrix for any real number t. Then, A is an anti-Hermitian matrix.
To show (8), from the assumption, for any real number t we have

$$\exp tA\,(\exp tA)^T = \exp tA \exp\left(tA^T\right) = E. \tag{15.51}$$

Differentiating (15.51) with respect to t by use of (15.4), we have

$$(A\exp tA)\exp\left(tA^T\right) + (\exp tA)\,A^T\exp\left(tA^T\right) = 0. \tag{15.52}$$

Since (15.52) must hold for any real number t, it must hold with t → 0 as well. On this condition, from (15.52) we get $A + A^T = 0$. That is, A is a real skew-symmetric matrix.
In the case of (9), similarly we have

$$\exp tA\,(\exp tA)^\dagger = \exp tA \exp\left(tA^\dagger\right) = E. \tag{15.53}$$

Differentiating (15.53) with respect to t and taking t → 0, we get $A + A^\dagger = 0$. That is, A is an anti-Hermitian matrix.
Properties (6)–(9), including Properties (6)′ and (7)′, are frequently used later in Chap. 20 in relation to Lie groups and Lie algebras.

15.3 System of Differential Equations

In this section we describe important applications of the exponential functions of matrices to systems of differential equations.

15.3.1 Introduction

In Chap. 10 we dealt with SOLDEs using Green's functions. In the present section we examine the properties of systems of differential equations. There is a close relationship between the two topics. First we briefly outline it.
We describe an example of a system of differential equations. First, let us think of the following general equations:

$$\dot{x}(t) = a(t)x(t) + b(t)y(t), \tag{15.54}$$

$$\dot{y}(t) = c(t)x(t) + d(t)y(t), \tag{15.55}$$

where x(t) and y(t) vary as functions of t; a(t), b(t), c(t), and d(t) are coefficients.
Differentiating (15.54) with respect to t, we get

$$\ddot{x}(t) = \dot{a}(t)x(t) + a(t)\dot{x}(t) + \dot{b}(t)y(t) + b(t)\dot{y}(t) = \dot{a}(t)x(t) + a(t)\dot{x}(t) + \dot{b}(t)y(t) + b(t)[c(t)x(t) + d(t)y(t)] = [\dot{a}(t) + b(t)c(t)]\,x(t) + a(t)\dot{x}(t) + [\dot{b}(t) + b(t)d(t)]\,y(t), \tag{15.56}$$

where with the second equality we used (15.55). To delete y(t) from (15.56), we multiply both sides of (15.56) by b(t), multiply both sides of (15.54) by $\dot{b}(t) + b(t)d(t)$, and subtract the latter equation from the former. As a result, we obtain

$$b\ddot{x} - \left[\dot{b} + (a+d)b\right]\dot{x} + \left[a\left(\dot{b} + bd\right) - b(\dot{a} + bc)\right]x = 0. \tag{15.57}$$

Upon the above derivation, we assume b ≠ 0. Solving (15.57) and substituting the resulting solution into (15.54), we can get a solution for y(t). If, on the other hand, b = 0, we have $\dot{x}(t) = a(t)x(t)$. This is a simple FOLDE and can readily be integrated to yield

$$x = C\exp\left[\int^t a(t')\,dt'\right], \tag{15.58}$$

where C is an integration constant.


Meanwhile, differentiating (15.55) with respect to t, we have

$$\ddot{y} = \dot{c}x + c\dot{x} + \dot{d}y + d\dot{y} = \dot{c}x + cax + \dot{d}y + d\dot{y} = (\dot{c} + ca)\,C\exp\left[\int^t a(t')\,dt'\right] + \dot{d}y + d\dot{y},$$

where we used (15.58) as well as $\dot{x} = ax$. Thus, we are going to solve the following inhomogeneous SOLDE:

$$\ddot{y} - d\dot{y} - \dot{d}y = (\dot{c} + ca)\,C\exp\left[\int^t a(t')\,dt'\right]. \tag{15.59}$$

In Sect. 10.2 we have shown how to solve an inhomogeneous FOLDE using a weight function. For a later purpose, let us consider the next FOLDE, assuming a(x) ≡ 1 in (10.23):

$$\dot{x}(t) + p(t)x = q(t). \tag{15.60}$$

As another method to solve (15.60), we examine the method of variation of constants. The homogeneous equation corresponding to (15.60) is

$$\dot{x}(t) + p(t)x = 0. \tag{15.61}$$

It can be solved as before such that

$$x(t) = C\exp\left[-\int^t p(t')\,dt'\right] \equiv u(t). \tag{15.62}$$

Now we assume that a solution of (15.60) can be sought by putting

$$x(t) = k(t)u(t), \tag{15.63}$$

where we assume that the functional form u(t) remains unchanged in the inhomogeneous equation (15.60) and that instead the "constant" k(t) may change as a function of t. Inserting (15.63) into (15.60), we have

$$\dot{x} + px = \dot{k}u + k\dot{u} + pku = \dot{k}u + k(\dot{u} + pu) = \dot{k}u + k(-pu + pu) = \dot{k}u = q, \tag{15.64}$$

where with the third equality we used (15.61) and (15.62) to get $\dot{u} = -pu$. In this way, (15.64) can easily be integrated to give

$$k(t) = \int^t \frac{q(t')}{u(t')}\,dt' + C,$$

where C is an integration constant. Thus, as a solution of (15.60) we have

$$x(t) = k(t)u(t) = u(t)\int^t \frac{q(t')}{u(t')}\,dt' + Cu(t). \tag{15.65}$$

Comparing (15.65) with (10.29), we notice that these two expressions are related. In other words, the method of variation of constants in (15.65) is essentially the same as the method using the weight function in (10.29).
Another important point to bear in mind is that in Chap. 10 we have solved
SOLDEs of constant coefficients using the method of Green’s functions. In that case,
we have separately estimated a homogeneous term and inhomogeneous term (i.e.,
surface term). In the present section we are going to seek a method corresponding to
that using the Green’s functions. In the above discussion we see that a system of
differential equations with two unknowns can be translated into a SOLDE. Then, we
expect such a system of differential equations of constant coefficients to be solved
somehow or other. We will hereinafter describe it in detail giving examples.
15.3.2 System of Differential Equations in a Matrix Form: Resolvent Matrix

The equations (15.54) and (15.55) are unified in a single homogeneous matrix equation such that

$$(\dot{x}(t)\ \ \dot{y}(t)) = (x(t)\ \ y(t))\begin{pmatrix} a(t) & c(t) \\ b(t) & d(t) \end{pmatrix}. \tag{15.66}$$

In (15.66), x(t) and y(t) and their derivatives represent the functions (of t), and so we describe them as row matrices. With this expression of the equation, see Sect. 11.2 and, e.g., (11.37) therein. Another expression is a transposition of (15.66) described by

$$\begin{pmatrix} \dot{x}(t) \\ \dot{y}(t) \end{pmatrix} = \begin{pmatrix} a(t) & b(t) \\ c(t) & d(t) \end{pmatrix}\begin{pmatrix} x(t) \\ y(t) \end{pmatrix}. \tag{15.67}$$

Notice that the coefficient matrix has been transposed in (15.67) accordingly. We are most interested in equations of constant coefficients and, hence, we rewrite (15.66) once again explicitly as

$$(\dot{x}(t)\ \ \dot{y}(t)) = (x(t)\ \ y(t))\begin{pmatrix} a & c \\ b & d \end{pmatrix} \tag{15.68}$$

or

$$\frac{d}{dt}(x(t)\ \ y(t)) = (x(t)\ \ y(t))\begin{pmatrix} a & c \\ b & d \end{pmatrix}. \tag{15.69}$$

Putting F(t) ≡ (x(t) y(t)), we have

$$\frac{dF(t)}{dt} = F(t)\begin{pmatrix} a & c \\ b & d \end{pmatrix}. \tag{15.70}$$

This is exactly the same as (15.50) if we put

$$A = \begin{pmatrix} a & c \\ b & d \end{pmatrix}. \tag{15.71}$$

Then, we get

$$F(t) = \exp tA. \tag{15.72}$$

As already shown in (15.7) of Sect. 15.1, the exponential function of a matrix, i.e., exp tA, can be defined as

$$\exp tA \equiv E + tA + \frac{1}{2!}(tA)^2 + \cdots + \frac{1}{\nu!}(tA)^\nu + \cdots. \tag{15.73}$$

From Property (5) of (15.41), we have det(exp A) ≠ 0. This is because no matter what number (either real or complex) Tr A may take, exp t(Tr A) never vanishes. That is, F(t) = exp A is non-singular, and so an inverse matrix of exp A must exist. This is the case with exp tA as well; see the equation below:

$$\text{(5)}' \quad \det(\exp tA) = \exp \mathrm{Tr}(tA) = \exp t(\mathrm{Tr}\,A). \tag{15.74}$$

Note that if A (or tA) is an (n, n) matrix, F(t) of (15.72) is an (n, n) matrix as well. Since in that general case F(t) is non-singular, it consists of n linearly independent column vectors, i.e., (n, 1) matrices; see Chap. 11. Then, F(t) is symbolically described as

$$F(t) = (\mathbf{x}_1(t)\ \ \mathbf{x}_2(t)\ \cdots\ \mathbf{x}_n(t)),$$

where $\mathbf{x}_i(t)$ (1 ≤ i ≤ n) represents the i-th column vector. Note that since F(0) = E, $\mathbf{x}_i(0)$ has its i-th row component equal to 1 and the other components equal to 0. Since F(t) is non-singular, the n column vectors $\mathbf{x}_i(t)$ (1 ≤ i ≤ n) are linearly independent. In the case of, e.g., (15.66), A is a (2, 2) matrix, which implies that x(t) and y(t) represent two linearly independent column vectors, i.e., (2, 1) matrices. Furthermore, if we look at the constitution of (15.70), x(t) and y(t) represent two linearly independent solutions of (15.70). We will come back to this point later.
On the basis of the method of variation of constants, we deal with inhomogeneous equations and describe below the procedures for solving the system of equations with two unknowns. They can readily be conformed to the general case where equations with n unknowns are dealt with.
Now, we rewrite (15.68) symbolically such that

$$\dot{\mathbf{x}}(t) = \mathbf{x}(t)A, \tag{15.75}$$

where $\mathbf{x}(t) \equiv (x(t)\ \ y(t))$, $\dot{\mathbf{x}}(t) \equiv (\dot{x}(t)\ \ \dot{y}(t))$, and $A \equiv \begin{pmatrix} a & c \\ b & d \end{pmatrix}$. Then, the inhomogeneous equation can be described as

$$\dot{\mathbf{x}}(t) = \mathbf{x}(t)A + \mathbf{b}(t), \tag{15.76}$$

where we define the inhomogeneous term as $\mathbf{b} \equiv (p\ \ q)$. This is equivalent to

$$(\dot{x}(t)\ \ \dot{y}(t)) = (x(t)\ \ y(t))\begin{pmatrix} a & c \\ b & d \end{pmatrix} + (p\ \ q). \tag{15.77}$$

Using the same solution F(t) that appears in the homogeneous equation, we assume that the solution of the inhomogeneous equation is described by

$$\mathbf{x}(t) = \mathbf{k}(t)F(t), \tag{15.78}$$

where k(t) is a variable "constant." Replacing x(t) in (15.76) with x(t) = k(t)F(t) of (15.78), we obtain

$$\dot{\mathbf{x}} = \dot{\mathbf{k}}F + \mathbf{k}\dot{F} = \dot{\mathbf{k}}F + \mathbf{k}FA = \mathbf{k}FA + \mathbf{b}, \tag{15.79}$$

where with the second equality we used $\dot{F} = FA$ of (15.50). Then, we have

$$\dot{\mathbf{k}}F = \mathbf{b}. \tag{15.80}$$

This can be integrated so that we may have

$$\mathbf{k} = \int^t ds\,\left[\mathbf{b}(s)F(s)^{-1}\right]. \tag{15.81}$$

As previously noted, $F(s)^{-1}$ exists because F(s) is non-singular. Thus, as a solution we get

$$\mathbf{x}(t) = \mathbf{k}(t)F(t) = \int_{t_0}^t ds\,\left[\mathbf{b}(s)F(s)^{-1}\right]F(t). \tag{15.82}$$

Here we define

$$R(s,t) \equiv F(s)^{-1}F(t). \tag{15.83}$$

The matrix R(s, t) is said to be a resolvent matrix [6]. This matrix plays an essential role in solving systems of differential equations, both homogeneous and inhomogeneous, with either homogeneous or inhomogeneous boundary conditions (BCs). Rewriting (15.82), we get

$$\mathbf{x}(t) = \mathbf{k}(t)F(t) = \int_{t_0}^t ds\,[\mathbf{b}(s)R(s,t)]. \tag{15.84}$$

We summarize the principal characteristics of the resolvent matrix:

$$\text{(1)} \quad R(s,s) = F(s)^{-1}F(s) = E, \tag{15.85}$$

where E is the (2, 2) unit matrix, i.e., $E = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$, for an equation having two unknowns.

$$\text{(2)} \quad R(s,t)^{-1} = \left[F(s)^{-1}F(t)\right]^{-1} = F(t)^{-1}F(s) = R(t,s). \tag{15.86}$$

$$\text{(3)} \quad R(s,t)R(t,u) = F(s)^{-1}F(t)\,F(t)^{-1}F(u) = F(s)^{-1}F(u) = R(s,u). \tag{15.87}$$

It is easy to include a term in the solution related to inhomogeneous BCs. As already seen in (10.33) of Sect. 10.2, a boundary condition for the FOLDEs is set such that, e.g.,

$$u(a) = \sigma, \tag{15.88}$$

where u(t) is a solution of a FOLDE. In that case, the BC can be translated into the "initial" condition. In the present case, we describe the BCs as

$$\mathbf{x}(t_0) \equiv \mathbf{x}_0 \equiv (\sigma\ \ \tau), \tag{15.89}$$

where σ ≡ x(t₀) and τ ≡ y(t₀). Then, the said term associated with the inhomogeneous BCs is expected to be written as

$$\mathbf{x}(t_0)R(t_0,t).$$

In fact, since $\mathbf{x}(t_0)R(t_0,t_0) = \mathbf{x}(t_0)E = \mathbf{x}(t_0)$, this satisfies the BCs. Then, the full expression of the solution of the inhomogeneous equation that takes account of the BCs is assumed to be expressed as

$$\mathbf{x}(t) = \mathbf{x}(t_0)R(t_0,t) + \int_{t_0}^t ds\,[\mathbf{b}(s)R(s,t)]. \tag{15.90}$$

Let us confirm that (15.90) certainly gives a proper solution of (15.76). First we give a formula on the differentiation

$$Q(t) = \frac{d}{dt}\int_{t_0}^t P(t,s)\,ds, \tag{15.91}$$

where P(t, s) stands for a (1, 2) matrix whose general form is described by (q(t, s)  r(t, s)), in which q(t, s) and r(t, s) are functions of t and s. We rewrite (15.91) as

$$\begin{aligned} Q(t) &= \lim_{\Delta\to 0}\frac{1}{\Delta}\left[\int_{t_0}^{t+\Delta}P(t+\Delta,s)\,ds - \int_{t_0}^{t}P(t,s)\,ds\right] \\ &= \lim_{\Delta\to 0}\frac{1}{\Delta}\left[\int_{t}^{t+\Delta}P(t+\Delta,s)\,ds + \int_{t_0}^{t}P(t+\Delta,s)\,ds - \int_{t_0}^{t}P(t,s)\,ds\right] \\ &\approx \lim_{\Delta\to 0}\frac{1}{\Delta}\,P(t,t)\Delta + \lim_{\Delta\to 0}\frac{1}{\Delta}\left[\int_{t_0}^{t}P(t+\Delta,s)\,ds - \int_{t_0}^{t}P(t,s)\,ds\right] \\ &= P(t,t) + \int_{t_0}^{t}\frac{\partial P(t,s)}{\partial t}\,ds. \end{aligned} \tag{15.92}$$

Replacing P(t, s) of (15.92) with b(s)R(s, t), we have

$$\frac{d}{dt}\int_{t_0}^t \mathbf{b}(s)R(s,t)\,ds = \mathbf{b}(t)R(t,t) + \int_{t_0}^t \mathbf{b}(s)\frac{\partial R(s,t)}{\partial t}\,ds = \mathbf{b}(t) + \int_{t_0}^t \mathbf{b}(s)\frac{\partial R(s,t)}{\partial t}\,ds, \tag{15.93}$$

where with the last equality we used (15.85). Considering (15.83) and (15.93) and differentiating (15.90) with respect to t, we get

$$\begin{aligned} \dot{\mathbf{x}}(t) &= \mathbf{x}(t_0)F(t_0)^{-1}\dot{F}(t) + \int_{t_0}^t ds\,\left[\mathbf{b}(s)F(s)^{-1}\dot{F}(t)\right] + \mathbf{b}(t) \\ &= \mathbf{x}(t_0)F(t_0)^{-1}F(t)A + \int_{t_0}^t ds\,\left[\mathbf{b}(s)F(s)^{-1}F(t)A\right] + \mathbf{b}(t) \\ &= \mathbf{x}(t_0)R(t_0,t)A + \int_{t_0}^t ds\,[\mathbf{b}(s)R(s,t)]\,A + \mathbf{b}(t) \\ &= \left\{\mathbf{x}(t_0)R(t_0,t) + \int_{t_0}^t ds\,[\mathbf{b}(s)R(s,t)]\right\}A + \mathbf{b}(t) \\ &= \mathbf{x}(t)A + \mathbf{b}(t), \end{aligned} \tag{15.94}$$

where with the second equality we used (15.70) and with the last equality we used (15.90). Thus, we have certainly recovered the original differential equation (15.76). Consequently, (15.90) is the proper solution of the given inhomogeneous equation with inhomogeneous BCs.

The above discussion and formulation equally apply to the general case of the
differential equation with n unknowns, even though the calculation procedures
become increasingly complicated with the increasing number of unknowns.
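For a constant matrix A, the whole scheme can be condensed into a few lines of numerical code. The sketch below (assuming NumPy and SciPy; the quadrature is a plain trapezoidal rule and the step count is arbitrary) implements (15.90) with R(s, t) = exp[(t − s)A]; as a consistency check it reproduces the closed-form solution (15.112) obtained in Example 15.1 below.

```python
import numpy as np
from scipy.linalg import expm

def solve_inhomogeneous(A, b, x0, t0, t, steps=2000):
    """Row-vector convention of (15.76): dx/dt = x A + b(t), x(t0) = x0."""
    s = np.linspace(t0, t, steps)
    # b(s) R(s, t) with R(s, t) = exp[(t - s)A], cf. (15.90)
    integrand = np.array([b(si) @ expm((t - si) * A) for si in s])
    integral = np.trapz(integrand, s, axis=0)
    return x0 @ expm((t - t0) * A) + integral

A = np.array([[1.0, 2.0], [2.0, 1.0]])
b = lambda s: np.array([1.0, 2.0])
c, d, t = 0.5, -0.3, 1.0
x_num = solve_inhomogeneous(A, b, np.array([c, d]), 0.0, t)
x_ref = 0.5 * ((c - d + 1) * np.exp(-t) + (c + d + 1) * np.exp(3 * t) - 2)
y_ref = 0.5 * ((d - c - 1) * np.exp(-t) + (c + d + 1) * np.exp(3 * t))
print(np.allclose(x_num, [x_ref, y_ref], atol=1e-4))   # True
```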

15.3.3 Several Examples

To deepen our understanding of the essence and characteristics of systems of differential equations, we deal with several examples regarding the equations of two unknowns with constant coefficients.
unknowns with constant coefficients. The constant matrices A of specific types (e.g.,
anti-Hermitian matrices and skew-symmetric matrices) have a wide field of appli-
cations in mathematical physics, especially in Lie groups and Lie algebras (see
Chap. 20). In the subsequent examples, however, we deal with various types of
matrices.
Example 15.1 Solve the following equation

$$\dot{x}(t) = x + 2y + 1, \quad \dot{y}(t) = 2x + y + 2, \tag{15.95}$$

under the BCs x(0) = c and y(0) = d.
In a matrix form, (15.95) can be written as

$$(\dot{x}(t)\ \ \dot{y}(t)) = (x\ \ y)\begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix} + (1\ \ 2). \tag{15.96}$$

The homogeneous equation corresponding to (15.96) is expressed as

$$(\dot{x}(t)\ \ \dot{y}(t)) = (x\ \ y)\begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}. \tag{15.97}$$

As discussed in Sect. 15.3.2, this can readily be solved to produce a solution F(t) such that

$$F(t) = \exp tA, \tag{15.98}$$

where $A = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}$. Since A is a real symmetric (i.e., Hermitian) matrix, it can be diagonalized by a unitary similarity transformation (see Sect. 14.3). For the purpose of getting an exponential function of a matrix, it is easier to estimate exp tD (where D is a diagonal matrix) than to directly calculate exp tA. That is, we rewrite (15.97) as

$$(\dot{x}(t)\ \ \dot{y}(t)) = (x\ \ y)\,UU^\dagger\begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}UU^\dagger, \tag{15.99}$$

where U is a unitary matrix; i.e., $U^\dagger = U^{-1}$.


Following routine calculation procedures (as in, e.g., Example 14.3), we readily
get a following diagonal matrix and the corresponding unitary matrix for the
diagonalization. That is, we obtain
0 1
1 1
pffiffiffi pffiffiffi 
B 2 2C 1 2
U¼B@
C and D ¼ U 1 U ¼ U 1 AU
1 1 A 2 1
 pffiffiffi pffiffiffi
2 2

1 0
¼ : ð15:100Þ
0 3

Or we have

A ¼ UDU 1 : ð15:101Þ

Hence, we get
 
F ðt Þ ¼ exp tA ¼ exp tUDU 1 ¼ U ð exp tDÞU 1 , ð15:102Þ

where with the last equality we used (15.30). From Theorem 15.3, we have

et 0
exp tD ¼ : ð15:103Þ
0 e3t

In turn, we get

$$F(t) = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix}\begin{pmatrix} e^{-t} & 0 \\ 0 & e^{3t} \end{pmatrix}\begin{pmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix} = \frac{1}{2}\begin{pmatrix} e^{-t}+e^{3t} & -e^{-t}+e^{3t} \\ -e^{-t}+e^{3t} & e^{-t}+e^{3t} \end{pmatrix}. \tag{15.104}$$

With an inverse matrix, we have

$$F(t)^{-1} = \frac{1}{2}\begin{pmatrix} e^{t}+e^{-3t} & -e^{t}+e^{-3t} \\ -e^{t}+e^{-3t} & e^{t}+e^{-3t} \end{pmatrix}. \tag{15.105}$$
Therefore, as a resolvent matrix we get

$$R(s,t) = F(s)^{-1}F(t) = \frac{1}{2}\begin{pmatrix} e^{-(t-s)}+e^{3(t-s)} & -e^{-(t-s)}+e^{3(t-s)} \\ -e^{-(t-s)}+e^{3(t-s)} & e^{-(t-s)}+e^{3(t-s)} \end{pmatrix}. \tag{15.106}$$

Thus, as a solution of (15.95) we obtain

$$\mathbf{x}(t) = \mathbf{x}(0)R(0,t) + \int_0^t ds\,[\mathbf{b}(s)R(s,t)], \tag{15.107}$$

where x(0) = (x(0) y(0)) = (c d) represents the BCs and b(s) = (1 2) comes from (15.96). This can easily be calculated so that we have

$$\mathbf{x}(0)R(0,t) = \left(\frac{1}{2}\left[(c-d)e^{-t}+(c+d)e^{3t}\right]\quad \frac{1}{2}\left[(d-c)e^{-t}+(c+d)e^{3t}\right]\right) \tag{15.108}$$

and

$$\int_0^t ds\,[\mathbf{b}(s)R(s,t)] = \left(\frac{1}{2}\left(e^{-t}+e^{3t}\right)-1\quad \frac{1}{2}\left(e^{3t}-e^{-t}\right)\right). \tag{15.109}$$

Note that the first component of (15.108) and (15.109) represents the x component of the solution of (15.95) and the second component represents the y component. From (15.108), we have

$$\mathbf{x}(0)R(0,0) = \mathbf{x}(0)E = \mathbf{x}(0) = (c\ \ d). \tag{15.110}$$

Thus, we find that the BCs are certainly satisfied. Putting t = 0 in (15.109), we have

$$\int_0^0 ds\,[\mathbf{b}(s)R(s,0)] = 0, \tag{15.111}$$

confirming that the second term of (15.107) vanishes at t = 0.
The summation of (15.108) and (15.109) gives the overall solution of (15.107). Separating the individual components of the solution, we describe them as

$$x(t) = \frac{1}{2}\left[(c-d+1)e^{-t}+(c+d+1)e^{3t}-2\right], \quad y(t) = \frac{1}{2}\left[(d-c-1)e^{-t}+(c+d+1)e^{3t}\right]. \tag{15.112}$$
Moreover, differentiating (15.112) with respect to t, we recover the original form of (15.95). The confirmation is left for readers.
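An independent numerical confirmation of (15.112) is also easy (the sketch assumes SciPy's general-purpose integrator solve_ivp; the values of c and d are arbitrary).

```python
import numpy as np
from scipy.integrate import solve_ivp

c, d = 1.0, 2.0

def rhs(t, u):            # (15.95): x' = x + 2y + 1, y' = 2x + y + 2
    x, y = u
    return [x + 2 * y + 1, 2 * x + y + 2]

sol = solve_ivp(rhs, (0.0, 1.0), [c, d], rtol=1e-10, atol=1e-12)
t = sol.t[-1]
x_ref = 0.5 * ((c - d + 1) * np.exp(-t) + (c + d + 1) * np.exp(3 * t) - 2)
y_ref = 0.5 * ((d - c - 1) * np.exp(-t) + (c + d + 1) * np.exp(3 * t))
print(np.allclose(sol.y[:, -1], [x_ref, y_ref]))   # True
```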
Remarks on Example 15.1: We become aware of several points in Example 15.1.
(i) The resolvent matrix given by (15.106) can be obtained by replacing t of (15.104) with t − s. This is one of the important characteristics of a resolvent matrix derived from an exponential function of a constant matrix. To see this, note that if A is a constant matrix, tA and sA commute. By the definition (15.7) of the exponential function we have

$$\exp tA = E + tA + \frac{1}{2!}t^2A^2 + \cdots + \frac{1}{\nu!}t^\nu A^\nu + \cdots, \tag{15.113}$$

$$\exp(-sA) = E + (-s)A + \frac{1}{2!}(-s)^2A^2 + \cdots + \frac{1}{\nu!}(-s)^\nu A^\nu + \cdots. \tag{15.114}$$

Both (15.113) and (15.114) are polynomials of a constant matrix A and, hence, exp tA and exp(−sA) commute as well. Then, from Theorem 15.2 and Property (1) of (15.28), we get

$$\exp(tA - sA) = (\exp tA)[\exp(-sA)] = [\exp(-sA)](\exp tA) = \exp(-sA + tA) = \exp[(t-s)A]. \tag{15.115}$$

That is, using (15.28), (15.72), and (15.83), we get

$$R(s,t) = F(s)^{-1}F(t) = [\exp(sA)]^{-1}\exp(tA) = \exp(-sA)\exp(tA) = \exp[(t-s)A] = F(t-s). \tag{15.116}$$

This implies that once we get exp tA, we can safely obtain the resolvent matrix by simply replacing t of (15.72) with t − s. Also, exp tA and exp(−sA) are commutative. In the present case, therefore, by exchanging the order of the products of $F(s)^{-1}$ and F(t) in (15.106), we have the same result as (15.106) such that

$$R(s,t) = F(t)F(s)^{-1} = \frac{1}{2}\begin{pmatrix} e^{-(t-s)}+e^{3(t-s)} & -e^{-(t-s)}+e^{3(t-s)} \\ -e^{-(t-s)}+e^{3(t-s)} & e^{-(t-s)}+e^{3(t-s)} \end{pmatrix}. \tag{15.117}$$

(ii) The resolvent matrix of the present example is real symmetric. This is because if A is real symmetric, i.e., $A^T = A$, we have

$$\left(A^n\right)^T = \left(A^T\right)^n = A^n, \tag{15.118}$$

that is, $A^n$ is real symmetric as well. From (15.7), exp A is real symmetric accordingly. In a similar manner, if A is Hermitian, exp A is also Hermitian.
(iii) Let us think of a case where A is not a constant matrix but varies as a function of t. In that case, we have

$$A(t) = \begin{pmatrix} a(t) & b(t) \\ c(t) & d(t) \end{pmatrix}. \tag{15.119}$$

Also, we have

$$A(s) = \begin{pmatrix} a(s) & b(s) \\ c(s) & d(s) \end{pmatrix}. \tag{15.120}$$

We can easily check that in a general case

$$A(t)A(s) \ne A(s)A(t), \tag{15.121}$$

namely, A(t) and A(s) are not generally commutative. The matrix tA(t) is not commutative with sA(s), either. In turn, (15.115) does not hold either. Thus, we need a more elaborate treatment in such a case.
Example 15.2 Find the resolvent matrix of the following homogeneous equation:

$$\dot{x}(t) = 0, \quad \dot{y}(t) = x + y. \tag{15.122}$$

In a matrix form, (15.122) can be written as

$$(\dot{x}(t)\ \ \dot{y}(t)) = (x\ \ y)\begin{pmatrix} 0 & 1 \\ 0 & 1 \end{pmatrix}. \tag{15.123}$$

The matrix $A = \begin{pmatrix} 0 & 1 \\ 0 & 1 \end{pmatrix}$ is characterized as an idempotent matrix (see Sect. 12.4). As in the case of Example 15.1, we get

$$(\dot{x}(t)\ \ \dot{y}(t)) = (x\ \ y)\,PP^{-1}\begin{pmatrix} 0 & 1 \\ 0 & 1 \end{pmatrix}PP^{-1}, \tag{15.124}$$

where we choose $P = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ and, hence, $P^{-1} = \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix}$. Notice that in this example, since A is not a symmetric matrix, we did not use a unitary matrix for the similarity transformation. As a diagonal matrix D, we have

$$D = P^{-1}\begin{pmatrix} 0 & 1 \\ 0 & 1 \end{pmatrix}P = P^{-1}AP = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}. \tag{15.125}$$
Or we have

$$A = PDP^{-1}. \tag{15.126}$$

Hence, we get

$$F(t) = \exp tA = \exp\left(tPDP^{-1}\right) = P(\exp tD)\,P^{-1}, \tag{15.127}$$

where we used (15.30) again. From Theorem 15.3, we have

$$\exp tD = \begin{pmatrix} 1 & 0 \\ 0 & e^{t} \end{pmatrix}. \tag{15.128}$$

In turn, we get

$$F(t) = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & e^{t} \end{pmatrix}\begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & -1+e^{t} \\ 0 & e^{t} \end{pmatrix}. \tag{15.129}$$

With an inverse matrix, we have

$$F(t)^{-1} = \begin{pmatrix} 1 & e^{-t}-1 \\ 0 & e^{-t} \end{pmatrix}. \tag{15.130}$$

Therefore, as a resolvent matrix we get

$$R(s,t) = F(s)^{-1}F(t) = \begin{pmatrix} 1 & -1+e^{t-s} \\ 0 & e^{t-s} \end{pmatrix}. \tag{15.131}$$

Using the resolvent matrix, we can include the inhomogeneous term along with BCs (or initial conditions). Once again, by exchanging the order of the products of $F(s)^{-1}$ and F(t) in (15.131), we obtain the same resolvent matrix. This can readily be checked. Moreover, we get R(s, t) = F(t − s).
Example 15.3 Solve the following equation

$$\dot{x}(t) = x + c, \quad \dot{y}(t) = x + y + d, \tag{15.132}$$

under the BCs x(0) = σ and y(0) = τ. In a matrix form, (15.132) can be written as

$$(\dot{x}(t)\ \ \dot{y}(t)) = (x\ \ y)\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} + (c\ \ d). \tag{15.133}$$

The matrix of the type $M = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ was fully investigated in Sect. 12.6 and characterized as a matrix that cannot be diagonalized. In such a case, we directly calculate exp tM. Here, remember that a product of triangle matrices of the same kind (i.e., upper triangle matrices or lower triangle matrices; see Sect. 12.1) is a triangle matrix of the same kind. We have

$$M^2 = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}, \quad M^3 = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 3 \\ 0 & 1 \end{pmatrix}, \ \cdots.$$

Repeating this, we get

$$M^n = \begin{pmatrix} 1 & n \\ 0 & 1 \end{pmatrix}. \tag{15.134}$$

Thus, we have

$$\exp tM = R(0,t) = E + tM + \frac{1}{2!}t^2M^2 + \cdots + \frac{1}{\nu!}t^\nu M^\nu + \cdots = \begin{pmatrix} e^t & t\left[1 + t + \frac{1}{2!}t^2 + \cdots + \frac{1}{(\nu-1)!}t^{\nu-1} + \cdots\right] \\ 0 & e^t \end{pmatrix} = \begin{pmatrix} e^t & te^t \\ 0 & e^t \end{pmatrix}. \tag{15.135}$$

With the resolvent matrix we get

$$R(s,t) = \exp(-sM)\exp tM = \begin{pmatrix} e^{t-s} & (t-s)e^{t-s} \\ 0 & e^{t-s} \end{pmatrix}. \tag{15.136}$$

Using (15.107), we get

$$\mathbf{x}(t) = (\sigma\ \ \tau)R(0,t) + \int_0^t ds\,[(c\ \ d)R(s,t)]. \tag{15.137}$$

Thus, we obtain the solution described by

$$x(t) = (c+\sigma)e^t - c, \quad y(t) = (c+\sigma)te^t + (d-c+\tau)e^t + c - d. \tag{15.138}$$

From (15.138), we find that the original inhomogeneous equation (15.132) and the BCs x(0) = σ and y(0) = τ are recovered.
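The non-diagonalizable case is also handled transparently by a numerical matrix exponential. The following check (assuming NumPy and SciPy; the value of t is arbitrary) compares SciPy's expm of tM with the closed form (15.135).

```python
import numpy as np
from scipy.linalg import expm

M = np.array([[1.0, 1.0], [0.0, 1.0]])    # non-diagonalizable matrix of (15.133)
t = 0.8
closed_form = np.array([[np.exp(t), t * np.exp(t)],
                        [0.0,       np.exp(t)]])   # (15.135)
print(np.allclose(expm(t * M), closed_form))       # True
```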
Example 15.4 Find the resolvent matrix of the following homogeneous equation:

$$\dot{x}(t) = 0, \quad \dot{y}(t) = x. \tag{15.139}$$

In a matrix form, (15.139) can be written as

$$(\dot{x}(t)\ \ \dot{y}(t)) = (x\ \ y)\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}. \tag{15.140}$$

The matrix $N = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$ is characterized as a nilpotent matrix (Sect. 12.3). Since $N^2$ and higher powers of N vanish, we have

$$\exp tN = E + tN = \begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix}. \tag{15.141}$$

We have a corresponding resolvent matrix such that

$$R(s,t) = \begin{pmatrix} 1 & t-s \\ 0 & 1 \end{pmatrix}. \tag{15.142}$$

Using the resolvent matrix, we can include the inhomogeneous term along with BCs.
Example 15.5 As a trivial case, let us consider the following homogeneous equation:

$$(\dot{x}(t)\ \ \dot{y}(t)) = (x\ \ y)\begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix}, \tag{15.143}$$

where one of a and b can be zero, or both of them can be zero. Even though (15.143) merely apposes two FOLDEs, we can deal with such cases in an automatic manner. From Theorem 15.3, as eigenvalues of exp A, where $A = \begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix}$, we have $e^a$ and $e^b$. That is, we have

$$F(t) = \exp tA = \begin{pmatrix} e^{at} & 0 \\ 0 & e^{bt} \end{pmatrix}. \tag{15.144}$$

In particular, if a = b (including a = b = 0), we merely have the same FOLDE twice, with the roles of x and y interchanged.
Readers may well wonder why we have to argue such a trivial case expressly. That is because the theory we developed in this chapter holds widely and enables us to make a clear forecast about the constitution of the solution of first-order differential equations. For instance, (15.144) immediately tells us that a fundamental solution for

$$\dot{x}(t) = ax(t) \tag{15.145}$$

is $e^{at}$. We only have to perform routine calculations with a counterpart of x(t) [or y(t)] of (x(t) y(t)) using (15.107) to solve an inhomogeneous differential equation under BCs.
Returning to Example 15.1, we had

$$(\dot{x}(t)\ \ \dot{y}(t)) = (x\ \ y)\begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix} + (1\ \ 2). \tag{15.96}$$

This equation can be converted to

$$(\dot{x}(t)\ \ \dot{y}(t))\,U = (x\ \ y)\,UU^{-1}\begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}U + (1\ \ 2)\,U, \tag{15.146}$$

where $U = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix}$. Defining (X Y) as

$$(X\ \ Y) \equiv (x\ \ y)\,U = (x\ \ y)\begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix}, \tag{15.147}$$

we have

$$(\dot{X}(t)\ \ \dot{Y}(t)) = (X\ \ Y)\begin{pmatrix} -1 & 0 \\ 0 & 3 \end{pmatrix} + (1\ \ 2)\,U. \tag{15.148}$$

Then, according to (15.90), we get a solution

$$\mathbf{X}(t) = \mathbf{X}(t_0)\tilde{R}(t_0,t) + \int_{t_0}^t ds\,\bigl[\tilde{\mathbf{b}}(s)\tilde{R}(s,t)\bigr], \tag{15.149}$$

where $\mathbf{X}(t_0) = (X(t_0)\ \ Y(t_0))$ and $\tilde{\mathbf{b}}(s) = \mathbf{b}(s)U$. The resolvent matrix $\tilde{R}(s,t)$ is given by

$$\tilde{R}(s,t) = \begin{pmatrix} e^{-(t-s)} & 0 \\ 0 & e^{3(t-s)} \end{pmatrix}. \tag{15.150}$$

To convert (X Y) to (x y), operating with $U^{-1}$ on both sides of (15.149) from the right, we obtain

$$\mathbf{X}(t)U^{-1} = \mathbf{X}(t_0)U^{-1}U\tilde{R}(t_0,t)U^{-1} + \int_{t_0}^t ds\,\bigl[\tilde{\mathbf{b}}(s)U^{-1}U\tilde{R}(s,t)U^{-1}\bigr], \tag{15.151}$$

where in the first and second terms we insert $U^{-1}U = E$. Then, we get

$$\mathbf{x}(t) = \mathbf{x}(t_0)\bigl[U\tilde{R}(t_0,t)U^{-1}\bigr] + \int_{t_0}^t ds\,\mathbf{b}(s)\bigl[U\tilde{R}(s,t)U^{-1}\bigr]. \tag{15.152}$$

We calculate $U\tilde{R}(s,t)U^{-1}$ to have

$$U\tilde{R}(s,t)U^{-1} = \frac{1}{2}\begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}\begin{pmatrix} e^{-(t-s)} & 0 \\ 0 & e^{3(t-s)} \end{pmatrix}\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix} = \frac{1}{2}\begin{pmatrix} e^{-(t-s)}+e^{3(t-s)} & -e^{-(t-s)}+e^{3(t-s)} \\ -e^{-(t-s)}+e^{3(t-s)} & e^{-(t-s)}+e^{3(t-s)} \end{pmatrix} = R(s,t). \tag{15.153}$$

Thus, we recover (15.90), i.e., we have

$$\mathbf{x}(t) = \mathbf{x}(t_0)R(t_0,t) + \int_{t_0}^t ds\,[\mathbf{b}(s)R(s,t)]. \tag{15.90}$$

Equation (15.153) shows that R(s, t) and $\tilde{R}(s,t)$ are connected to each other through the unitary similarity transformation such that

$$\tilde{R}(s,t) = U^{-1}R(s,t)U = U^{\dagger}R(s,t)U. \tag{15.154}$$
Example 15.6 Let us consider the following system of differential equations:

$$(\dot{x}(t)\ \ \dot{y}(t)) = (x\ \ y)\begin{pmatrix} 0 & -\omega \\ \omega & 0 \end{pmatrix} + (a\ \ 0). \tag{15.155}$$

The matrix $\begin{pmatrix} 0 & -\omega \\ \omega & 0 \end{pmatrix}$ is characterized as an anti-Hermitian operator. This type of operator frequently appears in the Lie groups and Lie algebras that we will deal with in Chap. 20. Here we define an operator D as

$$D = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \quad\text{or}\quad \omega D = \begin{pmatrix} 0 & -\omega \\ \omega & 0 \end{pmatrix}. \tag{15.156}$$

The calculation procedures will be seen in Chap. 20, and so we only show the result here. That is, we have

$$\exp t\omega D = \begin{pmatrix} \cos\omega t & -\sin\omega t \\ \sin\omega t & \cos\omega t \end{pmatrix}. \tag{15.157}$$

We seek the resolvent matrix R(s, t) such that

$$R(s,t) = \begin{pmatrix} \cos\omega(t-s) & -\sin\omega(t-s) \\ \sin\omega(t-s) & \cos\omega(t-s) \end{pmatrix}. \tag{15.158}$$

The implication of (15.158) is simple; the synthesis of a rotation of ωt and the inverse rotation of ωs is described by ω(t − s). Physically, (15.155) represents the motion of a point mass under an external field.
We routinely obtained the above solution using (15.90). If, for example, we solve (15.155) under the BCs for which the point mass is placed at rest at the origin of the xy-plane at t = 0, we get a solution described by

$$x = \frac{a}{\omega}\sin\omega t, \quad y = \frac{a}{\omega}(\cos\omega t - 1). \tag{15.159}$$

Deleting t from (15.159), we get

$$x^2 + \left(y + \frac{a}{\omega}\right)^2 = \left(\frac{a}{\omega}\right)^2. \tag{15.160}$$

In other words, the point mass performs a circular motion as shown in Fig. 15.2.
In Example 15.1 we mentioned that if the matrix A that defines the system of differential equations varies as a function of t, then the solution method based upon exp tA does not work. Yet, we have an important case where A is not a constant matrix. Let us consider the next illustration.
[Fig. 15.2 Circular motion of a point mass]

15.4 Motion of a Charged Particle in Polarized Electromagnetic Wave

In Sect. 4.5 we dealt with an electron motion under a circularly polarized light. Here we consider the motion of a charged particle in a linearly polarized electromagnetic wave.
Suppose that a charged particle is placed in vacuum. Suppose also that an electromagnetic wave (light) linearly polarized along the x-direction is propagated in the positive direction of the z-axis. Unlike the case of Sect. 4.5, we have to take account of the influence from both the electric field and the magnetic field components of the Lorentz force.
From (7.58) we have

$$\mathbf{E} = \mathbf{E}_0\, e^{i(\mathbf{k}\cdot\mathbf{x}-\omega t)} = E_0\mathbf{e}_1\, e^{i(kz-\omega t)}, \tag{15.161}$$

where $\mathbf{e}_1$, $\mathbf{e}_2$, and $\mathbf{e}_3$ are the unit vectors of the x-, y-, and z-directions, respectively. (The latter two unit vectors appear just below.) If the extent of motion of the charged particle is narrow enough around the origin compared to the wavelength of the electromagnetic field, we can ignore kz in the exponent (see Sect. 4.5). Then, we have

$$\mathbf{E} \approx E_0\mathbf{e}_1\, e^{-i\omega t}. \tag{15.162}$$

Taking the real part of (15.162), we get

$$\mathbf{E} = E_0\mathbf{e}_1\cos\omega t. \tag{15.163}$$

Meanwhile, from (7.59) we have

$$\mathbf{H} = \mathbf{H}_0\, e^{i(kz-\omega t)} \approx \mathbf{H}_0\cos\omega t. \tag{15.164}$$

From (7.60), furthermore, we get

$$\mathbf{H}_0 = \mathbf{e}_3 \times \mathbf{E}_0\big/\sqrt{\mu_0/\varepsilon_0} = \mathbf{e}_3 \times E_0\mathbf{e}_1/\mu_0 c = E_0\mathbf{e}_2/\mu_0 c. \tag{15.165}$$

Thus, as the magnetic flux density we have

$$\mathbf{B} = \mu_0\mathbf{H} \approx \mu_0\mathbf{H}_0\cos\omega t = E_0\mathbf{e}_2\cos\omega t/c. \tag{15.166}$$

The Lorentz force F exerted on the charged particle is then described by

$$\mathbf{F} = e\mathbf{E} + e\,\dot{\mathbf{x}}\times\mathbf{B}, \tag{15.167}$$

where $\mathbf{x}$ (= x$\mathbf{e}_1$ + y$\mathbf{e}_2$ + z$\mathbf{e}_3$) is the position vector of the charged particle having a charge e. In (15.167) the first term represents the electric Lorentz force and the second term the magnetic Lorentz force; see Sect. 4.5. Replacing E and B in (15.167) with those in (15.163) and (15.166), we get

$$\mathbf{F} = \left(1 - \frac{\dot{z}}{c}\right)eE_0\mathbf{e}_1\cos\omega t + \frac{eE_0}{c}\dot{x}\,\mathbf{e}_3\cos\omega t = m\ddot{\mathbf{x}} = m\ddot{x}\,\mathbf{e}_1 + m\ddot{y}\,\mathbf{e}_2 + m\ddot{z}\,\mathbf{e}_3, \tag{15.168}$$

where m is the mass of the charged particle. Comparing the vector components, we have

$$m\ddot{x} = \left(1 - \frac{\dot{z}}{c}\right)eE_0\cos\omega t, \quad m\ddot{y} = 0, \quad m\ddot{z} = \frac{eE_0}{c}\dot{x}\cos\omega t. \tag{15.169}$$

Note that the Lorentz force is not exerted in the direction of the y-axis. Putting $a \equiv eE_0/m$ and $b \equiv eE_0/mc$, we get

$$\ddot{x} = a\cos\omega t - b\dot{z}\cos\omega t, \quad \ddot{z} = b\dot{x}\cos\omega t. \tag{15.170}$$

Further putting $\dot{x} = \xi$ and $\dot{z} = \zeta$, we obtain

$$\dot{\xi} = -b\zeta\cos\omega t + a\cos\omega t, \quad \dot{\zeta} = b\xi\cos\omega t. \tag{15.171}$$


Writing (15.171) in a matrix form, we obtain the following system of inhomogeneous differential equations:

$$(\dot{\xi}(t)\ \ \dot{\zeta}(t)) = (\xi\ \ \zeta)\begin{pmatrix} 0 & b\cos\omega t \\ -b\cos\omega t & 0 \end{pmatrix} + (a\cos\omega t\ \ 0), \tag{15.172}$$

where the inhomogeneous term is given by (a cos ωt  0). First, we think of the homogeneous equation described by

$$(\dot{\xi}(t)\ \ \dot{\zeta}(t)) = (\xi\ \ \zeta)\begin{pmatrix} 0 & b\cos\omega t \\ -b\cos\omega t & 0 \end{pmatrix}. \tag{15.173}$$

Meanwhile, we consider the following equation:

$$\frac{d}{dt}\begin{pmatrix} \cos f(t) & \sin f(t) \\ -\sin f(t) & \cos f(t) \end{pmatrix} = \begin{pmatrix} -\dot{f}(t)\sin f(t) & \dot{f}(t)\cos f(t) \\ -\dot{f}(t)\cos f(t) & -\dot{f}(t)\sin f(t) \end{pmatrix} = \begin{pmatrix} \cos f(t) & \sin f(t) \\ -\sin f(t) & \cos f(t) \end{pmatrix}\begin{pmatrix} 0 & \dot{f}(t) \\ -\dot{f}(t) & 0 \end{pmatrix}. \tag{15.174}$$

Closely inspecting (15.173) and (15.174), we find that if we can choose f(t) so that it satisfies

$$\dot{f}(t) = b\cos\omega t, \tag{15.175}$$

we might well be able to solve (15.172). In that case, moreover, we expect that the two linearly independent column vectors

$$\begin{pmatrix} \cos f(t) \\ -\sin f(t) \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} \sin f(t) \\ \cos f(t) \end{pmatrix} \tag{15.176}$$

give a fundamental set of solutions of (15.172). A function f(t) that satisfies (15.175) can be given by

$$f(t) = (b/\omega)\sin\omega t = \tilde{b}\sin\omega t, \tag{15.177}$$

where $\tilde{b} \equiv b/\omega$. Notice that at t = 0 the two column vectors of (15.176) give

$$\begin{pmatrix} 1 \\ 0 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 0 \\ 1 \end{pmatrix},$$

if we choose $f(t) = \tilde{b}\sin\omega t$ for f(t) in (15.176).
Thus, defining F(t) as

$$F(t) \equiv \begin{pmatrix} \cos\left(\tilde{b}\sin\omega t\right) & \sin\left(\tilde{b}\sin\omega t\right) \\ -\sin\left(\tilde{b}\sin\omega t\right) & \cos\left(\tilde{b}\sin\omega t\right) \end{pmatrix} \tag{15.178}$$

and taking account of (15.174), we recast the homogeneous equation (15.173) as

$$\frac{dF(t)}{dt} = F(t)A, \quad A \equiv \begin{pmatrix} 0 & b\cos\omega t \\ -b\cos\omega t & 0 \end{pmatrix}. \tag{15.179}$$

Equation (15.179) is formally the same as (15.47), even though F(t) is not given in the form of exp tA. This enables us to apply the general scheme mentioned in Sect. 15.3.2 to the present case. That is, we are going to address the given problem using the method of variation of constants such that

$$\mathbf{x}(t) = \mathbf{k}(t)F(t), \tag{15.78}$$

where we define $\mathbf{x}(t) \equiv (\xi(t)\ \ \zeta(t))$ and $\mathbf{k}(t) \equiv (u(t)\ \ v(t))$ as a variable constant. Hence, using (15.80)–(15.90) we should be able to find the solution of (15.172) described by

$$\mathbf{x}(t) = \mathbf{x}(t_0)R(t_0,t) + \int_{t_0}^t ds\,[\mathbf{b}(s)R(s,t)], \tag{15.90}$$

where $\mathbf{b}(s) \equiv (a\cos\omega s\ \ 0)$, i.e., the inhomogeneous term in (15.172). The resolvent matrix R(s, t) is expressed as

$$R(s,t) = F(s)^{-1}F(t) = \begin{pmatrix} \cos[f(t)-f(s)] & \sin[f(t)-f(s)] \\ -\sin[f(t)-f(s)] & \cos[f(t)-f(s)] \end{pmatrix}, \tag{15.180}$$

where f(t) is given by (15.177). For $F(s)^{-1}$, we simply take the transpose of (15.178), because F is an orthogonal matrix. Thus, we obtain

$$F(s)^{-1} = \begin{pmatrix} \cos\left(\tilde{b}\sin\omega s\right) & -\sin\left(\tilde{b}\sin\omega s\right) \\ \sin\left(\tilde{b}\sin\omega s\right) & \cos\left(\tilde{b}\sin\omega s\right) \end{pmatrix}.$$

The resolvent matrix (15.180) possesses the properties described as (15.85)–(15.87). These can readily be checked. Also, we become aware that F(t) of (15.178) represents a rotation matrix in ℝ²; see Sect. 11.2. Therefore, F(t) and $F(s)^{-1}$ are commutative. Notice, however, that we do not have a resolvent matrix of the simple form that appears in (15.116). In fact, from (15.178) we find R(s, t) ≠ F(t − s). This is because the matrix A given in (15.179) is not constant but depends on t.
Rewriting (15.90) as

$$\mathbf{x}(t) = \mathbf{x}(0)R(0,t) + \int_0^t ds\,[(a\cos\omega s\ \ 0)\,R(s,t)] \tag{15.181}$$

and setting the BCs as x(0) = (σ τ), with the individual components we finally obtain

$$\xi(t) = \sigma\cos\left(\tilde{b}\sin\omega t\right) - \tau\sin\left(\tilde{b}\sin\omega t\right) + \frac{a}{b}\sin\left(\tilde{b}\sin\omega t\right),$$

$$\zeta(t) = \sigma\sin\left(\tilde{b}\sin\omega t\right) + \tau\cos\left(\tilde{b}\sin\omega t\right) - \frac{a}{b}\cos\left(\tilde{b}\sin\omega t\right) + \frac{a}{b}. \tag{15.182}$$

Note that differentiating both sides of (15.182) with respect to t, we recover the original equation (15.172) to be solved. Further setting ξ(0) = σ = ζ(0) = τ = 0 as the BCs, we have

$$\xi(t) = \frac{a}{b}\sin\left(\tilde{b}\sin\omega t\right), \quad \zeta(t) = \frac{a}{b}\left[1 - \cos\left(\tilde{b}\sin\omega t\right)\right]. \tag{15.183}$$

Notice that the above BCs correspond to the charged particle being initially placed at the origin at rest (i.e., with zero velocity). Deleting the argument t, we get

$$\xi^2 + \left(\zeta - \frac{a}{b}\right)^2 = \left(\frac{a}{b}\right)^2. \tag{15.184}$$
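A direct numerical integration of (15.171) confirms the closed-form velocities (15.183); the sketch below assumes SciPy's solve_ivp, and the parameter values a, b, and ω are arbitrary.

```python
import numpy as np
from scipy.integrate import solve_ivp

a, b, omega = 1.0, 0.6, 2.0
bt = b / omega                           # \tilde{b} = b / omega of (15.177)

def rhs(t, u):                           # (15.171)
    xi, zeta = u
    return [-b * zeta * np.cos(omega * t) + a * np.cos(omega * t),
            b * xi * np.cos(omega * t)]

ts = np.linspace(0.0, 10.0, 200)
sol = solve_ivp(rhs, (0.0, 10.0), [0.0, 0.0], t_eval=ts,
                rtol=1e-10, atol=1e-12)

xi_ref = (a / b) * np.sin(bt * np.sin(omega * ts))            # (15.183)
zeta_ref = (a / b) * (1.0 - np.cos(bt * np.sin(omega * ts)))
print(np.allclose(sol.y[0], xi_ref), np.allclose(sol.y[1], zeta_ref))
# True True; the velocities trace the circle (15.184), with zeta >= 0 throughout.
```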

Figure 15.3 depicts the particle velocities ξ and ζ in the x- and z-directions, respectively. It is interesting that although the velocity in the x-direction switches between negative and positive, that in the z-direction is kept non-negative. This implies that the particle is oscillating along the x-direction, but continually drifting toward the z-direction while oscillating. In fact, if we integrate ζ(t), its constant part yields a continually drifting term (a/b)t.
Figure 15.4 shows the Lorentz force F as a function of time in the case where the charge of the particle is positive. Again, the electric Lorentz force (represented by E) switches from positive to negative in the x-direction, but the magnetic Lorentz force (represented by $\dot{\mathbf{x}}\times\mathbf{B}$) remains non-negative in the z-direction all the time.
To precisely analyze the positional change of the particle, we need to integrate (15.182) once again. This requires a detailed numerical calculation. For this purpose the following relation is useful [7]:
[Fig. 15.3 Velocities ξ and ζ of a charged particle in the x- and z-directions, respectively. The charged particle exerts a periodic motion under the influence of a linearly polarized electromagnetic wave]

[Fig. 15.4 Lorentz force as a function of time. In (a, b), the phase of the electromagnetic fields E and B is reversed. As a result, the electric Lorentz force (represented by E) switches from positive to negative in the x-direction, but the magnetic Lorentz force (represented by $\dot{\mathbf{x}}\times\mathbf{B}$) remains non-negative all the time in the z-direction]

$$\cos(x\sin t) = \sum_{m=-\infty}^{\infty} J_m(x)\cos mt, \qquad \sin(x\sin t) = \sum_{m=-\infty}^{\infty} J_m(x)\sin mt, \tag{15.185}$$

where the functions $J_m(x)$ are called the Bessel functions, which satisfy the following equation:

$$\frac{d^2 J_n(x)}{dx^2} + \frac{1}{x}\frac{dJ_n(x)}{dx} + \left(1 - \frac{n^2}{x^2}\right)J_n(x) = 0. \tag{15.186}$$

Although we do not examine the properties of the Bessel functions in this book, the related topics are dealt with in detail in the literature [5, 7–9]. The Bessel functions are widely investigated as one of the special functions in many branches of mathematical physics.
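Incidentally, the expansion (15.185) is easy to verify numerically (the sketch assumes SciPy's Bessel function scipy.special.jv; the values of x and t and the truncation order are arbitrary).

```python
import numpy as np
from scipy.special import jv

x, t = 1.3, 0.7
m = np.arange(-40, 41)                   # truncated summation range
lhs_cos, lhs_sin = np.cos(x * np.sin(t)), np.sin(x * np.sin(t))
rhs_cos = np.sum(jv(m, x) * np.cos(m * t))
rhs_sin = np.sum(jv(m, x) * np.sin(m * t))
print(np.isclose(lhs_cos, rhs_cos), np.isclose(lhs_sin, rhs_sin))  # True True
```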
To conclude this section, we have described an example in which the matrix A of (15.179) characterizing the system of differential equations is not a constant matrix but depends on the parameter t.
may well be difficult to find a fundamental set of solutions. Once we can find the
fundamental set of solutions, however, it is always possible to construct the resolvent
matrix and solve the problem using it. In this respect, we have already encountered a
similar situation in Chap. 10 where the Green’s functions were constructed by a
fundamental set of solutions.
In this section, a fundamental set of solutions could be found and, as a conse-
quence, we could construct the resolvent matrix. It is, however, not always the case
that we can recast the system of differential equations (15.66) as the form of
(15.179). Nonetheless, whenever we are successful in finding the fundamental set
of solutions, we can construct a resolvent matrix and, hence, solve the problems.
This is a generic and powerful tool.

References

1. Satake I-O (1975) Linear algebra (Pure and applied mathematics). Marcel Dekker, New York
2. Satake I (1974) Linear algebra. Shokabo, Tokyo. (in Japanese)
3. Rudin W (1987) Real and complex analysis, 3rd edn. McGraw-Hill, New York
4. Yamanouchi T, Sugiura M (1960) Introduction to continuous groups. Baifukan, Tokyo.
(in Japanese)
5. Dennery P, Krzywicki A (1996) Mathematics for physicists. Dover, New York
6. Inami T (1998) Ordinary differential equations. Iwanami, Tokyo. (in Japanese)
7. Riley KF, Hobson MP, Bence SJ (2006) Mathematical methods for physics and engineering, 3rd
edn. Cambridge University Press, Cambridge
8. Arfken GB, Weber HJ, Harris FE (2013) Mathematical methods for physicists, 7th edn. Academic Press, Waltham
9. Hassani S (2006) Mathematical physics. Springer, New York
Part IV
Group Theory and Its Chemical
Applications

Universe comprises space and matter. These two mutually stipulate their modality of
existence. We often comprehend related various aspects as manifestation of sym-
metry. In this part we deal with the symmetry from a point of view of group theory.
In this last part of the book, we outline and emphasize chemical applications of the
methods of mathematical physics. This part supplies us with introductory description
of group theory. The group theory forms an important field of both pure and applied
mathematics. Starting with the definition of groups, we cover a variety of topics
related to group theory. Of these, symmetry groups are familiar to chemists, because
they deal with a variety of matter and molecules that are characterized by different
types of symmetry. The symmetry group is a kind of finite group and called a point
group as well. Meanwhile, we have various infinite groups that include rotation
group as a typical example. We also mention an introductory theory of the rotation group SO(3), which deals with an important topic of, e.g., Euler angles. We also treat successive coordinate transformations.
Next, we describe representation theory of groups. Schur’s lemmas and related
grand orthogonality theorem underlie the representation theory of groups. In parallel,
characters and irreducible representations are important concepts that support the
representation theory. We present various representations, e.g., regular representa-
tion, direct-product representation, and symmetric and antisymmetric representa-
tions. These have wide applications in the field of quantum mechanics and quantum
chemistry, and so forth.
On the basis of the above topics, we highlight quantum chemical applications of
group theory in relation to a method of molecular orbitals. As tangible examples, we
adopt aromatic molecules and methane.
The last part deals with the theory of continuous groups. The relevant theory has wide applications in many fields of both pure and applied physics and chemistry. We highlight the topics of SU(2) and SO(3) that very often appear there. Tangible examples help understand the essence.
Chapter 16
Introductory Group Theory

A group comprises mathematical elements that satisfy four simple definitions. A bunch of groups exists under these simple definitions. This makes the group theory a
discriminating field of mathematics. To get familiar with various concepts of groups,
we first show several tangible examples. Group elements can be numbers (both real
and complex) and matrices. More abstract mathematical elements can be included as
well. Examples include transformation, operation, etc. as already studied in previous
parts. Once those mathematical elements form a group, they share several common
notions such as classes, subgroups, and direct-product groups. In this context,
readers are encouraged to conceive different kinds of groups close to their heart.
Mapping is an important concept as in the case of vector spaces. In particular,
isomorphism and homomorphism frequently appear in the group theory. These
concepts are closely related to the representation theory that is an important pillar
of the group theory.

16.1 Definition of Groups

In contrast to a broad range of applications, the definition of the group is simple. Let
ℊ be a set of elements gν, where ν is an index either countable (e.g., integers) or
uncountable (e.g., real numbers) and the number of elements may be finite or
infinite. We denote this by ℊ ¼ {gν}. If a group is a finite group, we express it as

$$\mathscr{G} = \{g_1, g_2, \cdots, g_n\}, \tag{16.1}$$

where n is said to be an order of the group.


Definition of the group comprises the following four axioms with respect to a
well-defined “multiplication” rule between any pair of elements. The multiplication


is denoted by a symbol " ⋄ " below. Note that the symbol ⋄ implies an ordinary
multiplication, an ordinary addition, etc.
(A1) If a and b are any elements of the set ℊ, then so is a ⋄ b. (We sometimes say that the set is "closed" regarding the multiplication.)
(A2) Multiplication is associative; i.e., a ⋄ (b ⋄ c) = (a ⋄ b) ⋄ c.
(A3) The set ℊ contains an element e called the identity element such that we have a ⋄ e = e ⋄ a = a for any element a of ℊ.
(A4) For any a of ℊ, we have an element b such that a ⋄ b = b ⋄ a = e. The element b is said to be the inverse element of a. We denote b ≡ a⁻¹.
In the above definitions, we assume that the commutative law does not necessarily hold, that is, a ⋄ b ≠ b ⋄ a. In that case the group ℊ is said to be a noncommutative group. However, we have cases where the commutative law holds, i.e., a ⋄ b = b ⋄ a. If so, the group ℊ is called a commutative group or an Abelian group.
Let us think of some examples of groups. Henceforth, we follow the convention
and write ab to express a ⋄ b.
Example 16.1 We present several examples of groups below. Examples (i)–(iv) are
simple, but Example (v) is general.
(i) ℊ = {1, −1}. The set ℊ makes a group with respect to multiplication. This is an example of a finite group.
(ii) ℊ = {⋯, −3, −2, −1, 0, 1, 2, 3, ⋯}. The set ℊ makes a group with respect to addition. This is an infinite group. For instance, take a (>0) and form a + 1, then (a + 1) + 1, [(a + 1) + 1] + 1, ⋯ again and again. No finite set could thus be closed under the addition and, hence, we must have an infinite group.
 
(iii) Let us start with a matrix $a = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$. Then, the inverse is $a^{-1} = a^3 = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$. We have $a^2 = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}$; its inverse is $a^2$ itself. These four elements make a group. That is, ℊ = {e, a, a², a³}. This is an example of cyclic groups.
(iv) ℊ ¼ {1}. It is a most trivial case, but sometimes the trivial case is very
important as well. We will come back later to this point.
(v) Let us think of a more general case. In Chap. 11, we discussed endomorphisms on a vector space and showed the necessary and sufficient condition for the existence of an inverse transformation. In this relation, we consider a set that comprises matrices such that

GL(n, ℂ) ≡ {A = (aij) | i, j = 1, 2, ⋯, n; aij ∈ ℂ, det A ≠ 0}.

This may be either a finite group or an infinite group. The former can be a symmetry group and the latter can be a rotation group. This group is characterized by a set of invertible and endomorphic linear transformations over a vector space Vⁿ; it is called a linear transformation group or a general linear group and is denoted by GL(n, ℂ), GL(Vⁿ), GL(V), etc. The relevant transformations are bijective. We can readily make sure that the axioms (A1) to (A4) are satisfied with GL(n, ℂ). Here, the vector space can be ℂⁿ or a function space.
The structure of a finite group is tabulated in a multiplication table. This is made up such that the group elements are arranged in the first row and the first column and the intersection of an element gi in the row and gj in the column is designated as the product gi ⋄ gj. Choosing the above (iii) as an example, we make its multiplication table (see Table 16.1), where we define a² ≡ b and a³ ≡ c.

Table 16.1 Multiplication table of ℊ = {e, a, a², a³}

ℊ | e | a | b ≡ a² | c ≡ a³
e | e | a | b | c
a | a | b | c | e
b | b | c | e | a
c | c | e | a | b
Having a look at Table 16.1, we notice that in the individual rows and columns
each group element appears once and only once. This is well known as the rearrangement theorem.
Theorem 16.1: Rearrangement Theorem [1] In each row or each column in the
group multiplication table, individual group elements appear once and only once.
From this, each row and each column list merely rearranged group elements.
Proof Let a set ℊ = {g1 ≡ e, g2, ⋯, gn} be a group. Arbitrarily choosing any element h from ℊ and multiplying the individual elements by h, we obtain a set H ≡ {hg1, hg2, ⋯, hgn}. Then, all the group elements of ℊ appear in H once and only once. Choosing any group element gi, let us multiply gi by h⁻¹ to get h⁻¹gi. Since h⁻¹gi must be a certain element gk of ℊ, we put h⁻¹gi = gk. Multiplying both sides by h, we have gi = hgk. Therefore, we are able to find this very element hgk in H, i.e., gi in H. This implies that the element gi necessarily appears in H. Suppose in turn that gi appears more than once. Then, we must have gi = hgk = hgl (k ≠ l). Multiplying the relation by h⁻¹, we would get h⁻¹gi = gk = gl, in contradiction to the supposition. This means that gi appears in H once and only once. This confirms that the theorem is true of each row of the group multiplication table.
A similar argument applies with a set H′ ≡ {g1h, g2h, ⋯, gnh}. This confirms in turn that the theorem is true of each column of the group multiplication table. These complete the proof. ∎

16.2 Subgroups

As we think of subspaces in a linear vector space, we have subgroups in a group. The definition of a subgroup is that a subset H of a group forms a group with respect to the multiplication ⋄ that is defined for the group ℊ. The identity element makes a group by itself. Both {e} and ℊ are subgroups as well. We often call subgroups other than {e} and ℊ “proper” subgroups.
A necessary and sufficient condition for the subset H to be a subgroup is the following:
(1) hi, hj ∈ H ⟹ hi ⋄ hj ∈ H.
(2) h ∈ H ⟹ h⁻¹ ∈ H.
If H is a subgroup of ℊ, it is obvious that the relations (1) and (2) hold. Conversely, if (1) and (2) hold, H is a subgroup. In fact, (1) ensures the aforementioned relation (A1). Since H is a subset of ℊ, this guarantees the associative law (A2). The relation (2) ensures (A4). Finally, by virtue of (1) and (2), h ⋄ h⁻¹ = e is contained in H; this implies that (A3) is satisfied. Thus, H is a subgroup, because H satisfies the axioms (A1) to (A4). Of the above examples, (iii) has a subgroup H = {e, a²}.
It is important to decompose a set into subsets that do not mutually share an element (except for a special element). We saw this in Part III when we decomposed a linear vector space into subspaces. In that case, the said special element was the zero vector. Here, let us consider a related question in a similar aspect.
Let H = {h1 ≡ e, h2, ⋯, hs} be a subgroup of ℊ. Also, let us consider aH, where a ∈ ℊ and a ∉ H. Suppose that aH is a subset of ℊ such that aH = {ah1, ah2, ⋯, ahs}. Then, we have another subset H + aH. If H contains s elements, so does aH. In fact, if it were not the case, namely, if ahi = ahj (i ≠ j), multiplying both sides by a⁻¹ we would have hi = hj, in contradiction. Next, let us take b such that b ∉ H and b ∉ aH and make up bH and H + aH + bH successively. Our question is whether these procedures decompose ℊ into subsets that are mutually exclusive and collectively exhaustive.
Suppose that we can succeed in such a decomposition and get

ℊ = g1H + g2H + ⋯ + gkH,  (16.2)

where g1, g2, ⋯, gk are mutually different elements with g1 being the identity e. In that case, (16.2) is said to be the left coset decomposition of ℊ by H. Similarly, the right coset decomposition can be done to give

ℊ = Hg1 + Hg2 + ⋯ + Hgk.  (16.3)

In general, however,

gkH ≠ Hgk or gkHgk⁻¹ ≠ H.  (16.4)

Taking the case of the left cosets as an example, let us examine whether different cosets mutually contain a common element. Suppose that giH and gjH mutually contain a common element. Then, that element would be expressed as gihp = gjhq (1 ≤ i, j ≤ n; 1 ≤ p, q ≤ s). Thus, we have gihphq⁻¹ = gj. Since H is a subgroup of ℊ, hphq⁻¹ ∈ H. This implies that gj ∈ giH. It is in contradiction to the definition of the left cosets. Thus, we conclude that different cosets do not mutually contain a common element.
Suppose that the order of ℊ and H is n and s, respectively. The k different cosets comprise s elements individually, and different cosets do not mutually possess a common element; hence, we must have

n = sk,  (16.5)

where k is called the index of H. We will see many examples afterward.
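Equation (16.5) can be confirmed by a direct computation. Below is a minimal Python sketch (again assuming NumPy; the helper name contains is ours) that decomposes the cyclic group ℊ = {e, a, a², a³} of Example 16.1 (iii) into left cosets of its subgroup H = {e, a²}:

```python
import numpy as np

a = np.array([[0, -1], [1, 0]])
G = [np.linalg.matrix_power(a, k) for k in range(4)]   # {e, a, a^2, a^3}
H = [np.linalg.matrix_power(a, k) for k in (0, 2)]     # subgroup {e, a^2}

def contains(subset, m):
    return any(np.array_equal(m, g) for g in subset)

# Form left cosets gH, skipping any g already covered by an earlier coset.
cosets = []
for g in G:
    if not any(contains(c, g) for c in cosets):
        cosets.append([g @ h for h in H])

n, s, k = len(G), len(H), len(cosets)
print(n, s, k)       # 4 2 2
assert n == s * k    # Eq. (16.5): order = (order of H) x (index of H)
```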

16.3 Classes

Another way to decompose a group into subsets is the decomposition into conjugacy classes. A conjugate element is defined as follows: Let a be an element arbitrarily chosen from a group. Then, an element gag⁻¹ is called a conjugate element or conjugate to a. If c is conjugate to b and b is conjugate to a, then c is conjugate to a. It is because

c = g′bg′⁻¹, b = gag⁻¹ ⟹ c = g′bg′⁻¹ = g′gag⁻¹g′⁻¹ = (g′g)a(g′g)⁻¹.  (16.6)

In the above, a set containing a and all the elements conjugate to a is said to be a (conjugate) class of a. Denoting this set by ∁a, we have

∁a = {a, g2ag2⁻¹, g3ag3⁻¹, ⋯, gnagn⁻¹}.  (16.7)

In ∁a the same element may appear repeatedly. It is obvious that in every group the identity element e forms a class by itself. That is,

∁e = {e}.  (16.8)

As in the case of the decomposition of a group into (left or right) cosets, we can decompose a group into classes. If the group elements are not exhausted by a set comprising ∁e and ∁a, let us take b such that b ≠ e and b ∉ ∁a and make ∁b similarly to (16.7). Repeating this procedure, we should be able to decompose a group into classes. In fact, if the group elements have not yet been exhausted after these procedures, take a remaining element z and make a class. If only z remains at this moment, z makes a class by itself (as in the case of e). Notice that for an Abelian group every element makes a class by itself. Thus, with a finite group we have a decomposition such that

ℊ = ∁e + ∁a + ∁b + ⋯ + ∁z.  (16.9)

To show that (16.9) is really a decomposition, suppose, for instance, that a set ∁a ∩ ∁b is not an empty set and that x ∈ ∁a ∩ ∁b. Then, we must have α and β that satisfy the following relation: x = αaα⁻¹ = βbβ⁻¹, i.e., b = β⁻¹αaα⁻¹β = (β⁻¹α)a(β⁻¹α)⁻¹. This implies that b has already been included in ∁a, in contradiction to the supposition. Thus, (16.9) is in fact a decomposition of ℊ into a finite number of classes.
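The class decomposition (16.9) is easy to carry out numerically. The sketch below (Python with NumPy; the group used — the 3×3 permutation matrices, i.e., the smallest noncommutative group — is our own illustration, not an example from the text, and the helper names are ours) decomposes that group into its three classes:

```python
import itertools
import numpy as np

# All 3x3 permutation matrices: a noncommutative group of order 6.
G = [np.eye(3, dtype=int)[list(p)] for p in itertools.permutations(range(3))]

def index_of(m):
    return next(i for i, g in enumerate(G) if np.array_equal(m, g))

def conj_class(a):
    # The class of a: all g a g^{-1}; for these orthogonal matrices g^{-1} = g^T.
    return sorted({index_of(g @ a @ g.T) for g in G})

classes = []
for i, a in enumerate(G):
    if not any(i in c for c in classes):
        classes.append(conj_class(a))

print([len(c) for c in classes])  # [1, 3, 2]: identity, transpositions, 3-cycles
```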
In the above, we thought of a class conjugate to a single element. This notion can be extended to a class conjugate to a subgroup. Let H be a subgroup of ℊ and let g be an element of ℊ. Let us now consider a set H′ = gHg⁻¹. The set H′ is a subgroup of ℊ and is called a conjugate subgroup.
In fact, let hi and hj be any two elements of H, that is, let ghig⁻¹ and ghjg⁻¹ be any two elements of H′. Then, we have

(ghig⁻¹)(ghjg⁻¹) = ghihjg⁻¹ = ghkg⁻¹,  (16.10)

where hk = hihj ∈ H. Hence, ghkg⁻¹ ∈ H′. Meanwhile, (ghig⁻¹)⁻¹ = ghi⁻¹g⁻¹ ∈ H′. Thus, the conditions (1) and (2) of Sect. 16.2 are satisfied with H′. Therefore, H′ is a subgroup of ℊ. The subgroup H′ has the same order as H. This is because, for any two different elements hi and hj, ghig⁻¹ ≠ ghjg⁻¹.
If for ∀g ∈ ℊ and a subgroup H we have the following equality

g⁻¹Hg = H,  (16.11)

such a subgroup H is said to be an invariant subgroup. If (16.11) holds, H should be a sum of classes (readers, please show this). A set comprising only the identity, i.e., {e}, forms a class. Therefore, if H is a proper subgroup, H must contain two or more classes. The relation (16.11) can be rewritten as

gH = Hg.  (16.12)

This implies that the left coset is identical to the right coset. Thus, as far as we are dealing with a coset pertinent to an invariant subgroup, we do not have to distinguish left and right cosets.
Now let us consider anew the (left) coset decomposition of ℊ by an invariant subgroup H:

ℊ = g1H + g2H + ⋯ + gkH,  (16.13)

where we have H = {h1 ≡ e, h2, ⋯, hs}. Then, the multiplication of two elements that belong to the cosets giH and gjH is expressed as

(gihl)(gjhm) = gi(gjgj⁻¹)hlgjhm = gigj(gj⁻¹hlgj)hm = gigjhphm = gigjhq,  (16.14)

where the third equality comes from (16.11). That is, we should have ∃hp such that gj⁻¹hlgj = hp, and hphm = hq. In (16.14), hα ∈ H (α stands for l, m, p, q, etc. with 1 ≤ α ≤ s). Note that gigjhq ∈ gigjH. Accordingly, a product of elements belonging to giH and gjH belongs to gigjH. We rewrite (16.14) as a relation between the sets:

(giH)(gjH) = gigjH.  (16.15)

Viewing the LHS of (16.15) as a product of two cosets, we find that the said product is a coset as well. This implies that a collection of the cosets forms a group. Such a group that possesses the cosets as elements is said to be a factor group or quotient group. In this context, the multiplication is a product of cosets. We denote the factor group by

ℊ/H.

The identity element of this factor group is H. This is because, putting gi = e in (16.15), we get H(gjH) = gjH. Alternatively, putting gj = e, we have (giH)H = giH. Moreover, putting gj = gi⁻¹ in (16.15), we get

(giH)(gi⁻¹H) = gigi⁻¹H = H.  (16.16)

Hence, (giH)⁻¹ = gi⁻¹H. That is, the inverse element of giH is gi⁻¹H.

16.4 Isomorphism and Homomorphism

As in the case of the linear vector spaces, we consider mappings between group elements. Of these, the notions of isomorphism and homomorphism are important.
Definition 16.1 Let ℊ = {x, y, ⋯} and ℊ′ = {x′, y′, ⋯} be groups and let a mapping ℊ → ℊ′ exist. Suppose that there is a one-to-one correspondence (i.e., an injective mapping)

x ↔ x′, y ↔ y′, ⋯

between the elements such that xy = z implies x′y′ = z′ and vice versa. Meanwhile, any element in ℊ′ must be the image of some element of ℊ; that is, the mapping is surjective as well and, hence, the mapping is bijective. Then, the two groups ℊ and ℊ′ are said to be isomorphic. The relevant mapping is called an isomorphism. We symbolically denote this relation by

ℊ ≅ ℊ′.

Note that the aforementioned groups can be either finite or infinite. We did not designate the identity elements. Suppose that x is the identity e. Then, from the relations xy = z and x′y′ = z′ we have

ey = z = y, x′y′ = z′ = y′.  (16.17)

Then, we get

x′ = e′, i.e., e ↔ e′.  (16.18)

Also, let us put y = x⁻¹. Then,

xx⁻¹ = z = e, x′(x⁻¹)′ = e′, x′y′ = z′ = e′.  (16.19)

Comparing the second and third equations of (16.19), we get

y′ = (x′)⁻¹ = (x⁻¹)′.  (16.20)

The bijective character mentioned above can be loosened somewhat in such a way that the one-to-one correspondence is replaced with an n-to-one correspondence. We have the following definition.
Definition 16.2 Let ℊ = {x, y, ⋯} and ℊ′ = {x′, y′, ⋯} be groups. Also, let a mapping ρ: ℊ → ℊ′ exist such that, with any two arbitrarily chosen elements, the following relation holds:

ρ(x)ρ(y) = ρ(xy).  (16.21)

Then, the two groups ℊ and ℊ′ are said to be homomorphic. The relevant mapping is called a homomorphism. We symbolically denote this relation by

ℊ ∼ ℊ′.

In this case, we have

ρ(e)ρ(e) = ρ(ee) = ρ(e), i.e., ρ(e) = e′,

where e′ is the identity element of ℊ′. Also, we have

ρ(x)ρ(x⁻¹) = ρ(xx⁻¹) = ρ(e) = e′.

Therefore,

[ρ(x)]⁻¹ = ρ(x⁻¹).

The two groups can be either finite or infinite. Note that in the above the mapping is not necessarily injective. The mapping may or may not be surjective. Regarding the identity and inverse elements, we have the same relations as (16.18) and (16.20). From Definitions 16.1 and 16.2, we say that a bijective homomorphism is an isomorphism.
Let us introduce the important notion of a kernel of a mapping. In this regard, we have the following definition.
Definition 16.3 Let ℊ = {e, x, y, ⋯} and ℊ′ = {e′, x′, y′, ⋯} be groups and let e and e′ be their identity elements. Suppose that there exists a homomorphic mapping ρ: ℊ → ℊ′. Also, let F be a subset of ℊ such that

ρ(F) = e′.  (16.22)

Then, F is said to be a kernel of ρ.


Regarding the kernel, we have the following important theorems.
Theorem 16.2 Let ℊ = {e, x, y, ⋯} and ℊ′ = {e′, x′, y′, ⋯} be groups, where e and e′ are the identity elements. A necessary and sufficient condition for a surjective and homomorphic mapping ρ: ℊ → ℊ′ to be isomorphic is that the kernel F = {e}.
Proof We assume that F = {e}. Suppose that ρ(x) = ρ(y). Then, we have

ρ(x)[ρ(y)]⁻¹ = ρ(x)ρ(y⁻¹) = ρ(xy⁻¹) = e′.  (16.23)

The first and second equalities result from the homomorphism of ρ. Since F = {e}, xy⁻¹ = e, i.e., x = y. Therefore, ρ is injective (i.e., a one-to-one correspondence). As ρ is surjective from the assumption, ρ is bijective. The mapping ρ is isomorphic accordingly.
Conversely, suppose that ρ is isomorphic. Also, suppose that ρ(x) = e′ for some x ∈ ℊ. From (16.18), ρ(e) = e′. We have ρ(x) = ρ(e) = e′ ⟹ x = e due to the isomorphism of ρ (i.e., the one-to-one correspondence). This implies F = {e}. This completes the proof. ∎
We become aware of a close relationship between Theorem 16.2 and the linear transformation versus its kernel already mentioned in Sect. 11.2 of Part III. Figure 16.1 shows this relationship. Figure 16.1a represents the homomorphic mapping ρ: ℊ → ℊ′ between two groups, whereas Fig. 16.1b shows the linear transformation (endomorphism) A: Vⁿ → Vⁿ in a vector space Vⁿ.

Fig. 16.1 Mapping in a group and vector space. (a) Homomorphic mapping ρ: ℊ → ℊ′ between two groups. (b) Linear transformation (endomorphism) A: Vⁿ → Vⁿ in a vector space Vⁿ; such a transformation is invertible (bijective) when its kernel is trivial
Theorem 16.3 Suppose that there exists a homomorphic mapping ρ: ℊ → ℊ′, where ℊ and ℊ′ are groups. Then, the kernel F of ρ is an invariant subgroup of ℊ.
Proof Let ki and kj be any two arbitrarily chosen elements of F. Then,

ρ(ki) = ρ(kj) = e′,  (16.24)

where e′ is the identity element of ℊ′. From (16.21) we have

ρ(kikj) = ρ(ki)ρ(kj) = e′e′ = e′.  (16.25)

Therefore, kikj ∈ F. Meanwhile, from (16.20) we have

ρ(ki⁻¹) = [ρ(ki)]⁻¹ = e′⁻¹ = e′.  (16.26)

Then, ki⁻¹ ∈ F. Thus, F is a subgroup of ℊ.
Next, for ∀g ∈ ℊ we have

ρ(gkig⁻¹) = ρ(g)ρ(ki)ρ(g⁻¹) = ρ(g)e′ρ(g⁻¹) = e′.  (16.27)

Accordingly, we have gkig⁻¹ ∈ F. Thus, gFg⁻¹ ⊂ F. Since g is chosen arbitrarily, replacing it with g⁻¹ we have g⁻¹Fg ⊂ F. Multiplying by g and g⁻¹ on both sides from the left and the right, respectively, we get F ⊂ gFg⁻¹. Consequently, we get

gFg⁻¹ = F.  (16.28)

This implies that the kernel F of ρ is an invariant subgroup of ℊ. ∎



Theorem 16.4: Homomorphism Theorem Let ℊ = {x, y, ⋯} and ℊ′ = {x′, y′, ⋯} be groups and let a homomorphic (and surjective) mapping ρ: ℊ → ℊ′ exist. Also, let F be the kernel of ρ. Let us define a surjective mapping ρ̃: ℊ/F → ℊ′ such that

ρ̃(giF) = ρ(gi).  (16.29)

Then, ρ̃ is an isomorphic mapping.
Proof From (16.15) and (16.21), it is obvious that ρ̃ is homomorphic. The confirmation is left for readers. Let giF and gjF be two different cosets. Suppose here that ρ(gi) = ρ(gj). Then, we have

ρ(gi⁻¹gj) = ρ(gi⁻¹)ρ(gj) = [ρ(gi)]⁻¹ρ(gj) = [ρ(gi)]⁻¹ρ(gi) = e′.  (16.30)

This implies that gi⁻¹gj ∈ F. That is, we would have gj ∈ giF. This is in contradiction to the definition of a coset. Thus, we should have ρ(gi) ≠ ρ(gj). In other words, the different cosets giF and gjF are mapped into the different elements ρ(gi) and ρ(gj) in ℊ′. That is, ρ̃ is isomorphic; i.e., ℊ/F ≅ ℊ′. ∎

16.5 Direct-Product Groups

So far, we have investigated basic properties of groups. In Sect. 16.4, we examined factor groups. The homomorphism theorem shows that the factor group is characterized by division; in the case of a finite group, the order of the group is reduced. In this section, we study the opposite operation, i.e., the properties of a direct product of groups, or direct-product groups.
Let H = {h1 ≡ e, h2, ⋯, hm} and H′ = {h′1 ≡ e, h′2, ⋯, h′n} be groups of the order m and n, respectively. Suppose that (i) ∀hi (1 ≤ i ≤ m) and ∀h′j (1 ≤ j ≤ n) commute, i.e., hih′j = h′jhi, and that (ii) H ∩ H′ = {e}. Under these conditions, let us construct a set ℊ such that

ℊ = {h1h′1 ≡ e, hih′j (1 ≤ i ≤ m, 1 ≤ j ≤ n)}.  (16.31)

In other words, ℊ is a set comprising the mn elements hih′j. A product of elements is defined as

(hih′j)(hkh′l) = hihkh′jh′l = hph′q,  (16.32)

where hp = hihk and h′q = h′jh′l. The identity element is ee = e. The inverse element is (hih′j)⁻¹ = h′j⁻¹hi⁻¹ = hi⁻¹h′j⁻¹. The associative law is obvious from hih′j = h′jhi. Thus, ℊ forms a group. This is said to be a direct product of groups, or a direct-product group. The groups H and H′ are called direct factors of ℊ. In this case, we succinctly represent

ℊ = H × H′.

In the above, the condition (ii) is equivalent to the statement that ∀g ∈ ℊ is uniquely represented as

g = hh′; h ∈ H, h′ ∈ H′.  (16.33)

In fact, suppose that H ∩ H′ = {e} and that g can be represented in two ways such that

g = h1h′1 = h2h′2; h1, h2 ∈ H, h′1, h′2 ∈ H′.  (16.34)

Then, we have

h2⁻¹h1 = h′2h′1⁻¹; h2⁻¹h1 ∈ H, h′2h′1⁻¹ ∈ H′.  (16.35)

From the supposition, we get

h2⁻¹h1 = h′2h′1⁻¹ = e.  (16.36)

That is, h2 = h1 and h′2 = h′1. This means that the representation is unique. Conversely, suppose that the representation is unique and that x ∈ H ∩ H′. Then, we must have

x = xe = ex.  (16.37)

Thanks to the uniqueness of the representation, x = e. This implies H ∩ H′ = {e}.


Now suppose h 2 H . Then for 8 g2ℊ putting g ¼ hν h0μ , we have

1 1
ghg1 ¼ hν h0μ hh0μ hν 1 ¼ hν hh0μ h0μ hν 1 ¼ hν hhν 1 2 H : ð16:38Þ

Then we have gH g1 ⊂ H . Similarly to the proof of Theorem 16.3, we get

gH g1 ¼ H : ð16:39Þ

This shows that H is an invariant subgroup of ℊ. Similarly, H 0 is an invariant


subgroup as well.
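A direct-product group is equally easy to assemble numerically. The sketch below (Python with NumPy; the two commuting order-2 matrix groups are our own illustration) checks the closure rule (16.32) and the uniqueness of the representation (16.33):

```python
import numpy as np

# Two commuting matrix groups of order 2 whose intersection is only the identity.
H  = [np.diag([1, 1, 1]), np.diag([-1, -1, 1])]   # {e, h}
Hp = [np.diag([1, 1, 1]), np.diag([1, 1, -1])]    # {e, h'}

# The direct product: all mn products h h', Eq. (16.31).
G = [h @ hp for h in H for hp in Hp]

# Unique representation g = h h', Eq. (16.33): the mn products are all distinct.
keys = {tuple(g.ravel()) for g in G}
assert len(keys) == len(H) * len(Hp)

# Closure under the product rule of Eq. (16.32): G is itself a group of order 4.
assert all(tuple((x @ y).ravel()) in keys for x in G for y in G)
```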

Regarding the unique representation of the group element of a direct-product


group, we become again aware of the close relationship between the direct product
and direct sum that was mentioned earlier in Part III.

Reference

1. Cotton FA (1990) Chemical applications of group theory. Wiley, New York


Chapter 17
Symmetry Groups

We have many opportunities to observe symmetry, both macroscopic and microscopic, in the natural world. First, we need to formulate the symmetry appropriately. For this purpose, we must regard various symmetry operations as mathematical elements and classify these operations under several categories. In Part III we examined various properties of vectors and their transformations. We also showed that a vector transformation can be viewed as a coordinate transformation. On these topics, we focused upon abstract concepts in various ways. On another front, however, we have not paid attention to specific geometric objects, especially molecules. In this chapter, we study the symmetry of these concrete objects. For this, it will be indispensable to correctly understand a variety of symmetry operations. At the same time, we deal with the vector and coordinate transformations as group elements. Among such transformations, rotations occupy a central place in group theory and related fields of mathematics. Regarding the three-dimensional Euclidean space, SO(3) is particularly important. This is characterized by an infinite group, in contrast to the various symmetry groups (or point groups) we investigate in the earlier parts of this chapter.

17.1 A Variety of Symmetry Operations

To understand various aspects of symmetry operations, it is convenient and essential


to consider a general point that is fixed in a three-dimensional Euclidean space and to
examine how this point is transformed in the space. In parallel to the description in
Part III, we express the coordinate of the general point P as


P = \begin{pmatrix} x \\ y \\ z \end{pmatrix}.  (17.1)

Note that P may be on, within, or outside a geometric object or molecule that we are dealing with. The relevant position vector x for P is expressed as

x = xe1 + ye2 + ze3 = (e1 e2 e3)\begin{pmatrix} x \\ y \\ z \end{pmatrix},  (17.2)

where e1, e2, and e3 denote the orthonormal basis vectors pointing to the positive directions of the x-, y-, and z-axes, respectively. Similarly, we denote a linear transformation A by

A(x) = (e1 e2 e3)\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix}.  (17.3)

Among various linear transformations that are represented by matrices, orthogonal


transformations are the simplest and most widely used. We use orthogonal matrices
to represent the orthogonal transformations accordingly.
Let us think of a movement or translation of a geometric object and an operation that causes such a movement. First, suppose that the geometric object is fixed on a coordinate system. Then, the object is moved (or translated) to another place. If before and after such a movement (or translation) one cannot tell whether the object has been moved, we say that the object possesses a “symmetry” in a certain sense. Thus, we have to specify this symmetry. In that context, group theory deals with the symmetry and defines it clearly.
To tell whether the object has been moved (to another place), we usually distinguish it by a change in (i) positional relationship and (ii) attribute or property. To make the situation simple, let us consider the following example:
Example 17.1 We have two round disks, i.e., Disk A and Disk B. Suppose that
Disk A is a solid white disk, whereas Disk B is partly painted black (see Fig. 17.1). In
Fig. 17.1 we are thinking of a rotation of an object (e.g., round disk) around an axis
standing on its center and stretching perpendicularly to the object plane.
If an arbitrarily chosen positional vector fixed on the object before the rotation is
moved to another position that was not originally occupied by the object, then we
recognize that the object has certainly been moved. For instance, imagine that a
round disk having a through-hole located aside from center is rotating. What about
the case where that position was originally occupied by the object, then? We have
Fig. 17.1 Rotation of an object. (a) Case where we cannot recognize that the object has been moved. (b) Case where we can recognize that the object has been moved because of its attribute (i.e., because the round disk is partly painted black)

two possibilities. The first alternative is that we cannot recognize that the object has
been moved. The second one is that we can yet recognize that the object has been
moved. According to Fig. 17.1a, b, we have the former case and the latter case, respectively. In the latter case, we have recognized the movement of the object by its attribute, i.e., by the fact that the object is partly painted black.
However, we do not have to be rigorous here. We have a clear intuitive criterion
for a judgment of whether a geometric object has been moved. From now on, we
assume that the geometric character of an object is pertinent to both its positional
relationship and attribute. Thus, we define the equivalent (or indistinguishable)
disposition of an object and the operation that yields such an equivalent disposition
as follows:
Definition 17.1
(i) Symmetry operation: A geometric operation that produces an equivalent
(or indistinguishable) disposition of an object.
(ii) Equivalent (or indistinguishable) disposition: Suppose that regarding a geomet-
ric operation of an object, we cannot recognize that the object has been moved
before and after that geometric operation. In that case, the original disposition of
the object and the resulting disposition reached after the geometric operation are
referred to as an equivalent disposition. The relevant geometric operation is the symmetry operation.
Here we should clearly distinguish translation (i.e., parallel displacement) from
the abovementioned symmetry operations. This is because for a geometric object to
possess the translation symmetry the object must be infinite in extent, typically an
infinite crystal lattice. The relevant discipline is widely studied as space group and
has a broad class of applications in physics and chemistry. However, we will not deal
with the space group or associated topics, but focus our attention upon symmetry
groups in this book.
In the above example, the rotation is a symmetry operation with Fig. 17.1a, but the said geometric operation is not a symmetry operation with Fig. 17.1b.
Let us further inspect properties of the symmetry operations. Let us consider a set H consisting of symmetry operations, and let a and b be any two symmetry operations of H. Then, (i) a ⋄ b is a symmetry operation as well. (ii) The multiplication of successive symmetry operations a, b, c is associative; i.e., a ⋄ (b ⋄ c) = (a ⋄ b) ⋄ c. (iii) The set H contains an element e called the identity element such that we have a ⋄ e = e ⋄ a = a with any element a of H. Operating “nothing” should be e. If a rotation is relevant, a 2π rotation is thought to be e. These are intuitively acceptable. (iv) For any a of H, we have an element b such that a ⋄ b = b ⋄ a = e. The element b is said to be the inverse element of a. We denote it by b ≡ a⁻¹. The inverse element corresponds to an operation that brings the disposition of a geometric object back to the original disposition. Thus, H forms a group. We call H satisfying the above criteria a symmetry group. A symmetry group is called a point group as well. This is because a point group comprises the symmetry operations of geometric objects as group elements and those objects have at least one fixed point after the relevant symmetry operation. The name of a point group comes from this fact.
As mentioned above, a symmetry operation is best characterized by a (3, 3) orthogonal matrix. In Example 17.1, e.g., the π rotation is represented by an orthogonal matrix A such that

A = \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.  (17.4)

This operation represents a π rotation around the z-axis. Let us think of another symmetry operation described by

B = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}.  (17.5)

This produces a mirror symmetry with respect to the xy-plane. Then, we have

C ≡ AB = BA = \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix}.  (17.6)

The operation C shows an inversion about the origin. Thus, A, B, and C along with an identity element E form a group. Here, E is expressed as

E = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.  (17.7)

The above group is represented by four three-dimensional diagonal matrices whose elements are 1 or −1. Therefore, it is evident that the inverse element of A, B, and C is A, B, and C itself, respectively. The said group is commutative (Abelian) and is said to be a four-group [1].
Meanwhile, we have a number of noncommutative groups. From the point of view of the matrix structure, the noncommutativity comes from the off-diagonal elements of the matrix. A typical example is a rotation matrix with a rotation angle different from zero or nπ (n: integer). For later use, let us have a matrix form that expresses a θ rotation around the z-axis. Figure 17.2 depicts a graphical illustration for this.

Fig. 17.2 Rotation by θ around the z-axis

A matrix R has the following form:

R = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}.  (17.8)

Note that R has been reduced. This implies that the three-dimensional Euclidean space is decomposed into a two-dimensional subspace (the xy-plane) and a one-dimensional subspace (the z-axis). The xy-coordinates are not mixed with the z-component after the rotation R. Note, however, that if the rotation axis is oblique to the xy-plane, this is not the case. We will come back to this point later.
Taking only the xy-coordinates in Fig. 17.3, we make a calculation. Using the addition theorem of trigonometric functions, we get

x′ = r cos(θ + α) = r(cos α cos θ − sin α sin θ) = x cos θ − y sin θ,  (17.9)

y′ = r sin(θ + α) = r(sin α cos θ + cos α sin θ) = y cos θ + x sin θ,  (17.10)

where we used x = r cos α and y = r sin α. Combining (17.9) and (17.10), in matrix form we get
Fig. 17.3 Transformation of the xy-coordinates by a θ rotation

\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}.  (17.11)

Equation (17.11) is the same as (11.31) and represents the transformation matrix of a rotation angle θ within the xy-plane. Whereas in Chap. 11 we considered this from the point of view of the transformation of basis vectors, here we deal with the transformation of coordinates in the fixed coordinate system. If θ ≠ nπ (n: integer), the off-diagonal elements do not vanish. Including the z-component, we get (17.8).
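For readers who wish to experiment, the rotation matrix (17.8) can be checked numerically. A minimal Python sketch (assuming the NumPy library) verifies its orthogonality and the additivity of rotation angles about a fixed axis:

```python
import numpy as np

def Rz(theta):
    """Rotation by theta around the z-axis, Eq. (17.8)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

t1, t2 = 0.3, 1.1
R = Rz(t1)
assert np.allclose(R.T @ R, np.eye(3))            # orthogonal: R^T R = E
assert np.isclose(np.linalg.det(R), 1.0)          # proper rotation: det = +1
assert np.allclose(Rz(t1) @ Rz(t2), Rz(t1 + t2))  # angles about one axis add up
```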
Let us summarize the symmetry operations and their (3, 3) matrix representations. The coordinates before and after a symmetry operation are expressed as

\begin{pmatrix} x \\ y \\ z \end{pmatrix} and \begin{pmatrix} x' \\ y' \\ z' \end{pmatrix}, respectively.

(i) Identity transformation:


To leave a geometric object or a coordinate system unchanged (or unmoved).
By convention, we denote it by a capital letter E. It is represented by a (3, 3)
identity matrix.
(ii) Rotation symmetry around a rotation axis:
Here a “proper” rotation is intended. We denote a rotation by a rotation axis and
its magnitude (i.e., rotation angle). Thus, we have
R_{zθ} = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}, R_{yϕ} = \begin{pmatrix} \cos\phi & 0 & \sin\phi \\ 0 & 1 & 0 \\ -\sin\phi & 0 & \cos\phi \end{pmatrix}, R_{xφ} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\varphi & -\sin\varphi \\ 0 & \sin\varphi & \cos\varphi \end{pmatrix}.  (17.12)

With R_{yϕ} we first consider the following coordinate transformation:

\begin{pmatrix} z' \\ x' \\ y' \end{pmatrix} = \begin{pmatrix} \cos\phi & -\sin\phi & 0 \\ \sin\phi & \cos\phi & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} z \\ x \\ y \end{pmatrix}.  (17.13)

This can easily be visualized in Fig. 17.4; consider that the cyclic permutation of (x y z)ᵀ produces (z x y)ᵀ. Shuffling the order of the coordinates, we get

\begin{pmatrix} x' \\ y' \\ z' \end{pmatrix} = \begin{pmatrix} \cos\phi & 0 & \sin\phi \\ 0 & 1 & 0 \\ -\sin\phi & 0 & \cos\phi \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix}.  (17.14)

Fig. 17.4 Rotation by ϕ around the y-axis

By convention of the symmetry groups, the following notation Cn is used to denote a rotation. A subscript n of Cn represents the order of the rotation axis. The order means the largest number n such that the rotation through 2π/n gives an equivalent configuration. Successive rotations performed m times (m < n) are denoted by

Cnᵐ.

If m = n, the successive rotations produce an equivalent configuration the same as the beginning; i.e., Cnⁿ = E. The rotation angles θ, φ, etc. used above are restricted to 2πm/n accordingly.
(iii) Mirror symmetry with respect to a plane of mirror symmetry:
We denote a mirror symmetry by a mirror symmetry plane (e.g., the xy-plane, yz-plane). We have

M_{xy} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}, M_{yz} = \begin{pmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, M_{zx} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.  (17.15)

The mirror symmetry is usually denoted by σ v, σ h, and σ d, whose subscripts


stand for “vertical,” “horizontal,” and “dihedral,” respectively. Among these
symmetry operations, σ v and σ d include a rotation axis in the symmetry plane,
while σ h is perpendicular to the rotation axis if such an axis exists. Notice that a
group belonging to Cs symmetry possesses only E and σ h. Although σ h can
exist by itself as a mirror symmetry, neither σ v nor σ d can exist as a mirror
symmetry by itself. We will come back to this point later.
(iv) Inversion symmetry with respect to a center of inversion:
We specify an inversion center if necessary, e.g., the origin O of a coordinate system. The operation of inversion is denoted by

I_O = \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix}.  (17.16)

Note that, as is obvious from the matrix form, I_O is commutable with any other symmetry operation. Note also that I_O can be expressed as successive symmetry operations, or a product of symmetry operations. For instance, we have

I_O = R_{zπ}M_{xy} = \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}.  (17.17)

Note that R_{zπ} and M_{xy} are commutable; i.e., R_{zπ}M_{xy} = M_{xy}R_{zπ}.

(v) Improper rotation:
This is a combination of a proper rotation and a reflection by a mirror symmetry plane. That is, a rotation around an axis is performed first and then a reflection is carried out by a mirror plane that is perpendicular to the rotation axis. For instance, an improper rotation is expressed as

M_{xy}R_{zθ} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}\begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & -1 \end{pmatrix}.  (17.18)

As mentioned just above, the inversion symmetry I can be viewed as an improper rotation. Note that in this case the reflection and rotation operations are commutable. However, we will follow the conventional custom that considers the inversion symmetry as an independent symmetry operation. Readers may well wonder why we need to consider the improper rotation. The answer is simple; it rests solely upon the axiom (A1) of the group theory: a group must be closed with respect to the multiplication.
The improper rotations are usually denoted by Sn. A subscript n again stands for the order of rotation.
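The relations (17.17) and (17.18) lend themselves to a quick numerical check. The following Python sketch (assuming NumPy) confirms that the inversion factorizes as R_{zπ}M_{xy} with the two factors commuting, and that an improper rotation has determinant −1:

```python
import numpy as np

def Rz(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

Mxy = np.diag([1.0, 1.0, -1.0])     # mirror in the xy-plane, Eq. (17.15)
IO  = np.diag([-1.0, -1.0, -1.0])   # inversion, Eq. (17.16)

# Eq. (17.17): the inversion equals a pi rotation about z times the xy mirror,
# and the two factors commute.
assert np.allclose(Rz(np.pi) @ Mxy, IO)
assert np.allclose(Mxy @ Rz(np.pi), IO)

# Eq. (17.18): an improper rotation S(theta) = Mxy Rz(theta) has det = -1,
# so it can never be produced by proper rotations alone.
S = Mxy @ Rz(0.7)
assert np.isclose(np.linalg.det(S), -1.0)
```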

17.2 Successive Symmetry Operations

Let us now consider successive reflections in different planes and successive rotations about different axes [2]. Figure 17.5a displays two reflections with respect to the planes σ and σ̃, both perpendicular to the xy-plane. The said planes make a dihedral angle θ, with their intersection line identical to the z-axis. Also, the plane σ is identical with the zx-plane. Suppose that an arrow lies on the xy-plane perpendicularly to the zx-plane. As in (17.15), the operation σ is represented as

σ = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.  (17.19)

Fig. 17.5 Successive two reflections about two planes σ and σ̃ that make an angle θ. (a) Reflections σ and σ̃ with respect to the two planes. (b) Successive operations of σ and σ̃ in this order; the combined operation is denoted by σ̃σ. The operations result in a 2θ rotation around the z-axis. (c) Successive operations of σ̃ and σ in this order; the combined operation is denoted by σσ̃. The operations result in a −2θ rotation around the z-axis

To determine a matrix representation of σ̃, we calculate the matrix again as in the above case; namely, σ̃ is obtained by transforming σ with the rotation R_{zθ}. As a result, we have

σ̃ = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} \cos 2\theta & \sin 2\theta & 0 \\ \sin 2\theta & -\cos 2\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}.  (17.20)

Notice that this matrix representation is referred to the original xyz-coordinate system; see the discussion of Sect. 11.4. Hence, we describe the successive transformations σ followed by σ̃ as

σ̃σ = \begin{pmatrix} \cos 2\theta & -\sin 2\theta & 0 \\ \sin 2\theta & \cos 2\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}.  (17.21)

The expression (17.21) means that the multiplication is done first by σ and then by σ̃; see Fig. 17.5b. Note that σ and σ̃ are conjugate to each other (Sect. 16.3). In this case, the combined operations produce a 2θ rotation around the z-axis.
If, on the other hand, the multiplication is made first by σ̃ and then by σ, we have a −2θ rotation around the z-axis; see Fig. 17.5c. As a matrix representation, we have

σσ̃ = \begin{pmatrix} \cos 2\theta & \sin 2\theta & 0 \\ -\sin 2\theta & \cos 2\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}.  (17.22)

Thus, successive operations of reflection about two planes that make a dihedral angle θ yield a rotation of 2θ around the z-axis (i.e., the intersection line of σ and σ̃). The operation σσ̃ is an inverse to σ̃σ. That is,

(σσ̃)(σ̃σ) = E.  (17.23)

We have det σ̃σ = det σσ̃ = 1.
Meanwhile, putting

R_{2θ} = \begin{pmatrix} \cos 2\theta & -\sin 2\theta & 0 \\ \sin 2\theta & \cos 2\theta & 0 \\ 0 & 0 & 1 \end{pmatrix},  (17.24)

we have

σ̃σ = R_{2θ} or σ̃ = R_{2θ}σ.  (17.25)

This implies the following: Suppose a plane σ and a straight line lying on it. Also, suppose that one first makes a reflection about σ and then makes a 2θ rotation around the said straight line. Then, the resulting transformation is equivalent to a reflection about σ̃ that makes an angle θ with σ. At the same time, the said straight line is the intersection line of σ and σ̃. Note that the dihedral angle between the two planes σ and σ̃ is half the angle of the rotation. Thus, any two of the symmetry operations related to (17.25) are mutually dependent; any two of them produce the third symmetry operation.
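The relations (17.20)–(17.25) can also be verified symbolically. The following sketch (Python with the SymPy library) checks that the two successive reflections produce rotations by ±2θ and that the two products are mutually inverse:

```python
from sympy import Matrix, cos, sin, simplify, symbols

t = symbols('theta', real=True)

def Rz(a):
    """Rotation by a around the z-axis, Eq. (17.8)."""
    return Matrix([[cos(a), -sin(a), 0],
                   [sin(a),  cos(a), 0],
                   [0, 0, 1]])

sigma = Matrix([[1, 0, 0], [0, -1, 0], [0, 0, 1]])  # Eq. (17.19): zx-plane mirror
sigma_tilde = Rz(t) * sigma * Rz(-t)                # Eq. (17.20); Rz(t)**-1 = Rz(-t)

zero = Matrix.zeros(3, 3)
# Eq. (17.21): sigma-tilde after sigma is a rotation by +2*theta ...
assert simplify(sigma_tilde * sigma - Rz(2 * t)) == zero
# Eq. (17.22): ... and sigma after sigma-tilde is a rotation by -2*theta.
assert simplify(sigma * sigma_tilde - Rz(-2 * t)) == zero
# Eq. (17.23): the two products are mutually inverse.
assert simplify((sigma * sigma_tilde) * (sigma_tilde * sigma) - Matrix.eye(3)) == zero
```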
In the above illustration, we did not take account of the presence of a symmetry axis. If the aforementioned axis is a symmetry axis Cn, we must have

2θ = 2π/n or n = π/θ.  (17.26)

Fig. 17.6 Successive two π rotations around two C2 axes, C_{xπ} and C_{aπ}, that intersect at an angle θ. Of these, C_{xπ} is identical to the x-axis

From a symmetry requirement, there should be n planes of mirror symmetry in


combination with the Cn axis. Moreover, an intersection line of these n mirror
symmetry planes should coincide with that Cn axis. This can be seen as various
Cnv groups.
Next, we consider other successive symmetry operations. Suppose that there are two C2 axes (C_{xπ} and C_{aπ} as shown) that intersect at an angle θ (Fig. 17.6). There, C_{xπ} is identical to the x-axis and the other (C_{aπ}) lies on the xy-plane making an angle θ with C_{xπ}. Following procedures similar to the above, we have matrix representations of the successive C2 operations in reference to the xyz-system such that

C_{xπ} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix},

C_{aπ} = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix}\begin{pmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} \cos 2\theta & \sin 2\theta & 0 \\ \sin 2\theta & -\cos 2\theta & 0 \\ 0 & 0 & -1 \end{pmatrix},  (17.27)

where C_{aπ} can be calculated similarly to (17.20). Again, we get


C_{aπ}C_{xπ} = \begin{pmatrix} \cos 2\theta & -\sin 2\theta & 0 \\ \sin 2\theta & \cos 2\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}, C_{xπ}C_{aπ} = \begin{pmatrix} \cos 2\theta & \sin 2\theta & 0 \\ -\sin 2\theta & \cos 2\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}.  (17.28)

Notice that (C_{xπ}C_{aπ})(C_{aπ}C_{xπ}) = E. Once again, putting

R_{2θ} = \begin{pmatrix} \cos 2\theta & -\sin 2\theta & 0 \\ \sin 2\theta & \cos 2\theta & 0 \\ 0 & 0 & 1 \end{pmatrix},  (17.29)

we have [1, 2]

C_{aπ}C_{xπ} = R_{2θ} or C_{aπ} = R_{2θ}C_{xπ}.  (17.30)

Note that in the above two illustrations for the successive symmetry operations, both
the relevant operators have been represented in reference to the original xyz-system.
For this reason, the latter operation was done from the left in (17.28).
From a symmetry requirement, once again the aforementioned C2 axes must be
present in combination with the Cn axis. Moreover, those C2 axes should be
perpendicular to the Cn axis. This can be seen in various Dn groups.
Another illustration of successive symmetry operations is an improper rotation. If
the rotation angle is π, this causes an inversion symmetry. In this illustration,
reflection, rotation, and inversion symmetries coexist. A C2h symmetry is a typical
example.
Equations (17.21), (17.22), and (17.29) demonstrate the same relation. Namely,
two successive mirror symmetry operations about a couple of planes and two
successive π-rotations about a couple of C2 axes cause the same effect with regard
to the geometric transformation. In this relation, we emphasize that two successive reflection operations make the determinant of the relevant matrix product 1. These aspects cause an interesting effect, and we will briefly discuss it in relation to the O and Td groups.
If furthermore the abovementioned mirror symmetry planes and C2 axes coexist,
the symmetry planes coincide with or bisect the C2 axes and vice versa. If these were
not the case, another mirror plane or C2 axis would be generated from the symmetry
requirement and the newly generated plane or axis would be coupled with the
original plane or axis. From the above argument, these processes again produce
another Cn axis. That must be prohibited.
Next, suppose that a Cn axis intersects obliquely with a plane of mirror symmetry.
A rotation of 2π/n around such an axis produces another mirror symmetry plane.
Fig. 17.7 Mirror symmetry plane σh and a sole rotation axis perpendicular to it

This newly generated plane intersects with the original mirror plane and produces a
different Cn axis according to the above discussion. Thus, in this situation a mirror
symmetry plane cannot coexist with a sole rotation axis. In a geometric object with
higher symmetry such as Oh, however, several mirror symmetry planes can coexist
with several rotation axes in such a way that the axes intersect with the mirror planes
obliquely. In case the Cn axis intersects perpendicularly to a mirror symmetry plane,
that plane can coexist with a sole rotation axis (see Fig. 17.7). This is actually the
case with a geometric object having a C2h symmetry. The mirror symmetry plane is
denoted by σ h.
Now, let us examine simple examples of molecules and associated symmetry
operations.
Example 17.2 Figure 17.8 shows chemical structural formulae of thiophene,
bithiophene, biphenyl, and naphthalene. These molecules belong to C2v, C2h, D2,
and D2h, respectively. Note that these symbols are normally used to show specific
point groups. Notice also that in biphenyl two benzene rings are twisted relative to
the molecular axis. As an example, a multiplication table is shown in Table 17.1 for a
C2v group. Table 17.1 clearly demonstrates that the group constitution of C2v differs
from that of the group appearing in Example 16.1 (iii), even though the order is four in both cases. Similar tables can be made for C2h and D2; this is left for readers as an
an exercise. We will find that the multiplication tables of these groups have the same
structure and that C2v, C2h, and D2 are all isomorphic to one another as a four group.
Table 17.2 gives matrix representation of symmetry operations for C2v. The repre-
sentation is defined as the transformation by the symmetry operations of a set of
basis vectors (x y z) in ℝ3.
Meanwhile, Table 17.3 gives the multiplication table of D2h. We recognize that the multiplication table of C2v appears in the upper-left and lower-right blocks. If we suitably rearrange the order of the group elements, we can make another multiplication table so that, e.g., C2h may appear in the upper-left and lower-right blocks. As in the case of Table 17.2, Table 17.4 summarizes the matrix representation of the symmetry operations for D2h. There are eight group elements, i.e., an identity, an inversion, three mutually perpendicular C2 axes, and three mutually perpendicular planes of mirror symmetry (σ). Here, we consider the possibility of constructing subgroups of D2h. The order of the subgroups must be a divisor of eight, and so let us list the subgroups whose order is four and examine how many such subgroups exist. We have ₈C₄ = 70 combinations, but those allowed should be restricted by the requirement
Fig. 17.8 Chemical structural formulae and point groups of (a) thiophene, (b) bithiophene, (c) biphenyl, and (d) naphthalene

Table 17.1 Multiplication table of C2v

C2v | E | C2(z) | σv(zx) | σv′(yz)
E | E | C2 | σv | σv′
C2(z) | C2 | E | σv′ | σv
σv(zx) | σv | σv′ | E | C2
σv′(yz) | σv′ | σv | C2 | E

Table 17.2 Matrix representation of symmetry operations for C2v

E: diag(1, 1, 1); C2 (around the z-axis): diag(−1, −1, 1); σv(zx): diag(1, −1, 1); σv′(yz): diag(−1, 1, 1)

of forming a group. This is because all the groups must contain the identity element, and so the number allowed is equal to or no greater than ₇C₃ = 35.
(i) In light of the aforementioned discussion, two C2 axes mutually intersecting at π/2 yield another C2 axis around the normal to the plane defined by the intersecting axes. Thus, three C2 axes have been chosen and a D2 symmetry results. In this case, we have only one choice. (ii) In the case of C2v, two planes mutually intersecting at π/2 yield a C2 axis around their line of intersection. There are three possibilities of choosing two planes out of three (i.e., ₃C₂ = 3). (iii) If we choose the inversion i along with, e.g., one of the three C2 axes, a σ necessarily results. This is also the case when we first combine a σ with i to obtain a C2 axis. We have three possibilities (i.e., ₃C₁ = 3) as well. Thus, we have only seven choices to construct subgroups of D2h having an order of four. This is summarized in Table 17.5.
An inverse of any element is that element itself. Therefore, if with any of the above subgroups one chooses any element out of the remaining four elements and combines it with the identity, one can construct a subgroup Cs, C2, or Ci of an order of 2. Since all those subgroups of an order of 4 and 2 are commutative with D2h, these subgroups are invariant subgroups. Thus, in terms of a direct-product group, D2h can be expressed with various direct factors. Conversely, we can construct factor groups from the coset decomposition.

Table 17.3 Multiplication table of D2h

D2h | E | C2(z) | σv(zx) | σv′(yz) | i | σv″(xy) | C2′(y) | C2″(x)
E | E | C2 | σv | σv′ | i | σv″ | C2′ | C2″
C2(z) | C2 | E | σv′ | σv | σv″ | i | C2″ | C2′
σv(zx) | σv | σv′ | E | C2 | C2′ | C2″ | i | σv″
σv′(yz) | σv′ | σv | C2 | E | C2″ | C2′ | σv″ | i
i | i | σv″ | C2′ | C2″ | E | C2 | σv | σv′
σv″(xy) | σv″ | i | C2″ | C2′ | C2 | E | σv′ | σv
C2′(y) | C2′ | C2″ | i | σv″ | σv | σv′ | E | C2
C2″(x) | C2″ | C2′ | σv″ | i | σv′ | σv | C2 | E

For instance, we have

D2h/C2v ≅ Cs, D2h/C2v ≅ C2, D2h/C2v ≅ Ci.  (17.31)

In turn, we express direct-product groups as, e.g.,

D2h = C2v × Cs, D2h = C2v × C2, D2h = C2v × Ci.

Example 17.3 Figure 17.9 shows an equilateral triangle placed on the xy-plane of a three-dimensional Cartesian coordinate system. The orthonormal basis vectors e1 and e2 are designated as shown. As a chemical species, we have, e.g., boron trifluoride (BF3). In the molecule, a boron atom is positioned at the molecular center with three fluorine atoms located at the vertices of the equilateral triangle. The boron atom and fluorine atoms form a planar molecule (Fig. 17.10).
A symmetry group belonging to D3h comprises twelve symmetry operations such that

D3h = {E, C3, C3′, C2, C2′, C2″, σh, S3, S3′, σv, σv′, σv″},  (17.32)

where symmetry operations of the same species but distinct operation are denoted by a “prime” or “double prime.” When we represent these operations by a matrix, it is straightforward in most cases. For instance, a matrix for σv is given by M_{yz} of (17.15). However, we should make some matrix calculations for C2′, C2″, σv′, and σv″.
To determine a matrix representation of, e.g., σv′ in reference to the xy-coordinate system with the orthonormal basis vectors e1 and e2, we consider the x′y′-coordinate system with the orthonormal basis vectors e1′ and e2′ (see Fig. 17.9). A transformation matrix between the two sets of basis vectors is represented by
Table 17.4 Matrix representation of symmetry operations for D2h

E: diag(1, 1, 1); C2(z): diag(−1, −1, 1); C2(y): diag(−1, 1, −1); C2(x): diag(1, −1, −1); i: diag(−1, −1, −1); σ(xy): diag(1, 1, −1); σ(zx): diag(1, −1, 1); σ(yz): diag(−1, 1, 1)
Table 17.5 Choice of symmetry operations for construction of subgroups of D2h

Subgroup | E | C2 | σ | i | Choices
D2 | 1 | 3 | 0 | 0 | 1
C2v | 1 | 1 | 2 | 0 | 3
C2h | 1 | 1 | 1 | 1 | 3

(e1′ e2′ e3′) = (e1 e2 e3)R_{zπ/6} = (e1 e2 e3)\begin{pmatrix} \sqrt{3}/2 & -1/2 & 0 \\ 1/2 & \sqrt{3}/2 & 0 \\ 0 & 0 & 1 \end{pmatrix}.  (17.33)

This representation corresponds to (11.69). Let Σv be a reflection with respect to the z′x′-plane. This is the same operation as σv′; however, the matrix representation is different. This is because Σv is represented in reference to the x′y′z′-system, while σv′ is in reference to the xyz-system. The matrix representation of Σv is simple and expressed as

Σv = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.  (17.34)

Referring to (11.80), we have

R_{zπ/6}Σv = σv′R_{zπ/6} or σv′ = R_{zπ/6}Σv[R_{zπ/6}]⁻¹.  (17.35)

Thus, we see that in the first equation of (17.35) the order of the multiplications is reversed according as the latter operation is expressed in the xyz-system or in the x′y′z′-system. As a full matrix representation, we get
σv′ = \begin{pmatrix} \sqrt{3}/2 & -1/2 & 0 \\ 1/2 & \sqrt{3}/2 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} \sqrt{3}/2 & 1/2 & 0 \\ -1/2 & \sqrt{3}/2 & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 1/2 & \sqrt{3}/2 & 0 \\ \sqrt{3}/2 & -1/2 & 0 \\ 0 & 0 & 1 \end{pmatrix}.  (17.36)

Notice that this matrix representation is referred to the original xyz-coordinate system as before. Graphically, (17.35) corresponds to the multiplication of the symmetry operations done in the order of (i) a −π/6 rotation, (ii) a reflection (denoted by Σv), and (iii) a π/6 rotation (Fig. 17.9). The associated matrices are multiplied from the left.

Fig. 17.9 Equilateral triangle placed on the xy-plane. Several symmetry operations of D3h are shown. We consider the successive operations (i), (ii), and (iii) to represent σv′ in reference to the xy-coordinate system (see text)

Fig. 17.10 Boron trifluoride (BF3) belonging to the D3h point group
Similarly, with C2′ we have

C2′ = \begin{pmatrix} -1/2 & \sqrt{3}/2 & 0 \\ \sqrt{3}/2 & 1/2 & 0 \\ 0 & 0 & -1 \end{pmatrix}.  (17.37)

The matrix form of (17.37) can also be decided in a manner similar to the above according to the three successive operations shown in Fig. 17.9. In terms of classes, σv, σv′, and σv″ form a conjugacy class, and C2, C2′, and C2″ form another conjugacy class. With regard to the reflection and rotation, we have det σv′ = −1 and det C2 = 1, respectively.
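Equations (17.36) and (17.37) are conveniently checked by symbolic computation. In the sketch below (Python with SymPy), σv′ is obtained by conjugating Σv with R_{zπ/6}; the last lines reproduce the matrix of (17.37) by conjugating C_{xπ} = diag(1, −1, −1) with R_{zπ/3} — one possible route, our assumption about the choice of the C2′ axis:

```python
from sympy import Matrix, Rational, cos, sin, pi, simplify, sqrt

Rz = lambda a: Matrix([[cos(a), -sin(a), 0],
                       [sin(a),  cos(a), 0],
                       [0, 0, 1]])
Sigma_v = Matrix([[1, 0, 0], [0, -1, 0], [0, 0, 1]])   # Eq. (17.34)

# Eq. (17.35)/(17.36): sigma_v' = Rz(pi/6) Sigma_v Rz(pi/6)^{-1} in the xyz-system.
sigma_v_prime = simplify(Rz(pi / 6) * Sigma_v * Rz(-pi / 6))
expected = Matrix([[Rational(1, 2), sqrt(3) / 2, 0],
                   [sqrt(3) / 2, -Rational(1, 2), 0],
                   [0, 0, 1]])
assert simplify(sigma_v_prime - expected) == Matrix.zeros(3, 3)

# Eq. (17.37): conjugating Cx(pi) = diag(1, -1, -1) by Rz(pi/3) gives a C2 axis
# rotated by 60 degrees from the x-axis and reproduces the matrix of (17.37).
Cx_pi = Matrix([[1, 0, 0], [0, -1, 0], [0, 0, -1]])
print(simplify(Rz(pi / 3) * Cx_pi * Rz(-pi / 3)))
```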

17.3 O and Td Groups

As a geometric object (or a molecule) has higher symmetry, we have to deal with more symmetry operations and the relationships between them. As an example, we consider the O and Td groups. Both groups have 24 symmetry operations and are isomorphic to each other.
Let us think of the group O first. We start with considering the rotations by π/2 around the x-, y-, and z-axes. The matrices representing these rotations are obtained from (17.12) to give
R_{xπ/2} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix}, R_{yπ/2} = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ -1 & 0 & 0 \end{pmatrix}, R_{zπ/2} = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}.  (17.38)

We continue multiplications of these matrices so that the matrices can make up a complete set (i.e., a closure). Counting over those matrices, we have 24 of them, and they form a group termed O. The group O is a pure rotation group. Here, a pure rotation group is defined as a group whose elements comprise only proper rotations (with their determinant of 1). An example is shown in Fig. 17.11, where the individual vertices of the cube have three arrows so that the cube does not possess mirror symmetries or an inversion symmetry. The group O has five conjugacy classes. Figure 17.12 summarizes them; geometrical characteristics of the individual classes are sketched as well. These classes are categorized by the trace of the matrix. This is because the trace is kept unchanged by a similarity transformation. (Remember that elements of the same conjugacy class are connected with a similarity transformation.) Having a look at the sketches, we notice that each operation switches the basis vectors e1, e2, and e3, i.e., the x-, y-, and z-axes. Therefore, the presence of diagonal elements (either 1 or −1) implies that the matrix takes the corresponding basis vector(s) as an eigenvector with respect to the rotation. The corresponding eigenvalue(s) are either 1 or −1 accordingly. This is expected from the fact that the matrix is an orthogonal matrix (i.e., unitary). The trace, namely the summation of the diagonal elements, is closely related to the geometrical feature of the operation.
The operations of a π rotation around the x-, y-, and z-axes and those of a π rotation around an axis bisecting any two of the three axes have a trace of −1. The former operations take all the basis vectors as eigenvectors; that is, all the diagonal elements are nonvanishing. With the latter operations, however, only one diagonal element is nonvanishing, and it is −1. This feature comes from the fact that the bisected axes are switched by the rotation, whereas the remaining axis is reversed by the rotation.
Fig. 17.11 Cube whose individual vertices have three arrows for the cube not to possess mirror symmetries. This object belongs to the point group O, called a pure rotation group

Fig. 17.12 Point group O and its five conjugacy classes: χ = 3 (the identity E), χ = 1 (the six π/2 rotations), χ = 0 (the eight 2π/3 rotations), χ = −1 (the three π rotations around the coordinate axes), and χ = −1 (the six π rotations around the bisecting axes). Geometrical characteristics of the individual classes are briefly sketched
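The 24 elements of O and the trace tally of Fig. 17.12 can be generated by computer from the three matrices of (17.38) alone. A minimal Python sketch (assuming NumPy) follows:

```python
import numpy as np

Rx = np.array([[1, 0, 0], [0, 0, -1], [0, 1, 0]])    # Eq. (17.38)
Ry = np.array([[0, 0, 1], [0, 1, 0], [-1, 0, 0]])
Rz = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]])

# Closure of {Rx, Ry, Rz} under matrix multiplication: the pure rotation group O.
group = {tuple(np.eye(3, dtype=int).ravel())}
frontier = [Rx, Ry, Rz]
while frontier:
    g = frontier.pop()
    key = tuple(g.ravel())
    if key not in group:
        group.add(key)
        frontier.extend([g @ Rx, g @ Ry, g @ Rz])

print(len(group))    # 24

# Tally the traces chi; they separate the five conjugacy classes of Fig. 17.12.
tally = {}
for key in group:
    chi = key[0] + key[4] + key[8]      # diagonal elements of the flattened matrix
    tally[chi] = tally.get(chi, 0) + 1
print(tally)         # {3: 1, 1: 6, 0: 8, -1: 9}  (9 = 3 + 6: two classes share chi = -1)
```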

Another characteristic is the generation of eight rotation axes that trisect the x-, y-, and z-axes, more specifically, that trisect the solid angle π/2 formed by the x-, y-, and z-axes. Since such a rotation switches all the x-, y-, and z-axes, the trace is zero. At the same time, we find that this operation belongs to C3. This operation is generated by two successive π/2 rotations around two mutually orthogonal axes. To inspect this situation more closely, we consider a conjugacy class of π/2 rotations that belongs to the C4 symmetry and includes six elements, i.e., R_{xπ/2}, R_{yπ/2}, R_{zπ/2}, R_{x̄π/2}, R_{ȳπ/2}, and R_{z̄π/2}. With these notations, e.g., R_{xπ/2} stands for a π/2 counterclockwise rotation around the x-axis; R_{x̄π/2} denotes a π/2 counterclockwise rotation around the x̄-axis (i.e., the −x direction). Consequently, R_{x̄π/2} implies a π/2 clockwise rotation around the x-axis and, hence, is the inverse element of R_{xπ/2}. Namely, we have

R_{x̄π/2} = [R_{xπ/2}]⁻¹.  (17.39)

Now let us consider the successive two rotations. This is denoted by the multiplication of the matrices that represent the related rotations. For instance, the multiplication of R_{xπ/2} and R′_{yπ/2} produces the following:

R_{xyz2π/3} = R_{xπ/2}R′_{yπ/2} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix}\begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ -1 & 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}.  (17.40)

In (17.40), we define Rxyz2π3 as a 2π/3 counterclockwise rotation around an axis that


trisects the x-, y-, and z-axes. The prime “ 0 ” of R0yπ means that the operation is carried
2
out in reference to the new coordinate system reached by the previous operation Rxπ2.
For this reason, R0yπ is operated (i.e., multiplied) from the right in (17.40). Compare
2
this with the remark made just after (17.30). Changing the order of Rxπ2 and R0yπ , we
2
have
0 10 1 0 1
0 0 1 1 0 0 0 1 0
0 B CB C B C
Rxyz2π3 ¼ Ry2 Rxπ ¼ @ 0
π 1 0 A@ 0 0 1 A ¼ @ 0 0 1 A, ð17:41Þ
2
1 0 0 0 1 0 1 0 0

where $R_{xy\bar{z}\frac{2\pi}{3}}$ is a 2π/3 counterclockwise rotation around an axis that trisects the x-, y-,
and $\bar{z}$-axes. Notice that we used $R'_{x\frac{\pi}{2}}$ this time, because it was performed after $R_{y\frac{\pi}{2}}$.
Thus, we notice that there are eight related operations that trisect the eight octants of the
coordinate system. These operations are further categorized into four sets, in each of which
the two elements are an inverse element of each other. For instance, we have

$$R_{\bar{x}\bar{y}\bar{z}\frac{2\pi}{3}} = \left(R_{xyz\frac{2\pi}{3}}\right)^{-1}. \quad (17.42)$$

Notice that a 2π/3 counterclockwise rotation around the axis that trisects the $\bar{x}$-,
$\bar{y}$-, and $\bar{z}$-axes is equivalent to a 2π/3 clockwise rotation around the axis that trisects the
x-, y-, and z-axes. Also, we have $R_{\bar{x}\bar{y}z\frac{2\pi}{3}} = \left(R_{xy\bar{z}\frac{2\pi}{3}}\right)^{-1}$, etc. Moreover, we have "cyclic"
relations such as

$$R_{x\frac{\pi}{2}} R'_{y\frac{\pi}{2}} = R_{y\frac{\pi}{2}} R'_{z\frac{\pi}{2}} = R_{z\frac{\pi}{2}} R'_{x\frac{\pi}{2}} = R_{xyz\frac{2\pi}{3}}. \quad (17.43)$$
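The products (17.40)–(17.43) are easy to confirm numerically. The following minimal sketch (ours, assuming NumPy) encodes the three π/2 rotation matrices and checks the cyclic relations and the inverse relation (17.42):

```python
import numpy as np

# pi/2 counterclockwise rotations about the x-, y-, and z-axes
Rx = np.array([[1, 0, 0], [0, 0, -1], [0, 1, 0]])
Ry = np.array([[0, 0, 1], [0, 1, 0], [-1, 0, 0]])
Rz = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]])

# (17.40): R_x(pi/2) R_y(pi/2) is the 2*pi/3 rotation about (1, 1, 1)
R_xyz = Rx @ Ry
print(R_xyz)              # [[0 0 1], [1 0 0], [0 1 0]]
print(np.trace(R_xyz))    # 0, as required for a C3 element

# (17.43): the "cyclic" relations give the same C3 element
assert np.array_equal(Rx @ Ry, Ry @ Rz)
assert np.array_equal(Rx @ Ry, Rz @ Rx)

# (17.42): the inverse equals the transpose for an orthogonal matrix,
# i.e., the 2*pi/3 rotation about (-1, -1, -1)
assert np.array_equal(np.linalg.inv(R_xyz).round().astype(int), R_xyz.T)
```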

Returning to Sect. 11.4, we had

$$A[P(\mathbf{x})] = \left[(\mathbf{e}_1 \cdots \mathbf{e}_n)P\right] A' \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (\mathbf{e}_1 \cdots \mathbf{e}_n)\left[A_O P \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}\right]. \quad (11.79)$$

The implication of (11.79) is that the LHS is related to the transformation of basis vectors
while retaining the coordinates $\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$ and that transformation matrices should be
operated on the basis vectors from the right. Meanwhile, the RHS describes the transformation of coordinates while retaining the basis vectors. In that case, transformation
matrices should be operated on the coordinates from the left. Thus, the order of
operator multiplication is reversed. Following (11.80), we describe

$$R_{x\frac{\pi}{2}} R'_{y\frac{\pi}{2}} = R_O R_{x\frac{\pi}{2}}, \quad \text{i.e.,} \quad R_O = R_{x\frac{\pi}{2}} R'_{y\frac{\pi}{2}} \left(R_{x\frac{\pi}{2}}\right)^{-1}, \quad (17.44)$$

where $R_O$ is viewed in reference to the original (or fixed) coordinate system and is
conjugate to $R'_{y\frac{\pi}{2}}$. Thus, we have

$$R_O = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ -1 & 0 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 0 \end{pmatrix} = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \quad (17.45)$$

Note that (17.45) is identical to the matrix representation of a π/2 rotation around the z-axis.
This is evident from the fact that the y-axis is converted to the original z-axis by
$R_{x\frac{\pi}{2}}$; readers, imagine it.
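The conjugation (17.44) and the result (17.45) can likewise be checked in a few lines (again a sketch of ours, assuming NumPy):

```python
import numpy as np

Rx = np.array([[1, 0, 0], [0, 0, -1], [0, 1, 0]])   # R_x(pi/2)
Ry = np.array([[0, 0, 1], [0, 1, 0], [-1, 0, 0]])   # R_y(pi/2), playing R'_y(pi/2)
Rz = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]])   # R_z(pi/2)

# (17.44)-(17.45): viewing R'_y(pi/2) from the fixed frame amounts to
# conjugation by R_x(pi/2); the result is a pi/2 rotation about the z-axis
R_O = Rx @ Ry @ np.linalg.inv(Rx)
assert np.array_equal(R_O.round().astype(int), Rz)
```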
We have two conjugacy classes of π rotations (the C2 symmetry). One of them
includes six elements, i.e., $R_{xy\pi}$, $R_{yz\pi}$, $R_{zx\pi}$, $R_{\bar{x}y\pi}$, $R_{\bar{y}z\pi}$, and $R_{\bar{z}x\pi}$. In these notations a
subscript, e.g., xy, stands for an axis that bisects the angle formed by the x- and y-axes; a
subscript $\bar{x}y$ denotes an axis bisecting the angle formed by the $\bar{x}$- and y-axes. Another
class includes three elements, i.e., $R_{x\pi}$, $R_{y\pi}$, and $R_{z\pi}$.
As for $R_{x\pi}$, $R_{y\pi}$, and $R_{z\pi}$, a combination of these operations should yield a C2
rotation axis, as discussed in Sect. 17.2. Of these three rotation axes, in fact, any two

produce a C2 rotation around the remaining axis, as is the case with naphthalene
belonging to the D2h symmetry (see Sect. 17.2).
Regarding the class comprising the six π rotation elements, a combination of, e.g.,
$R_{xy\pi}$ and $R_{\bar{x}y\pi}$, whose axes cross each other at a right angle, causes a related effect. For the other
combinations, the two C2 axes intersect each other at π/3; see Fig. 17.6 and put
θ = π/3 there. In this respect, elementary analytic geometry teaches the positional
relationship among planes and straight lines. The argument is as follows: A plane
determined by three points $\begin{pmatrix} x_1 \\ y_1 \\ z_1 \end{pmatrix}$, $\begin{pmatrix} x_2 \\ y_2 \\ z_2 \end{pmatrix}$, and $\begin{pmatrix} x_3 \\ y_3 \\ z_3 \end{pmatrix}$ that do not sit on a line is
expressed by the following equation:

$$\begin{vmatrix} x & y & z & 1 \\ x_1 & y_1 & z_1 & 1 \\ x_2 & y_2 & z_2 & 1 \\ x_3 & y_3 & z_3 & 1 \end{vmatrix} = 0. \quad (17.46)$$
Substituting $\begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$, $\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}$, and $\begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}$, we have

$$x - y + z = 0. \quad (17.47)$$

Taking account of direction cosines and using Hesse's normal form, we get

$$\frac{1}{\sqrt{3}}(x - y + z) = 0, \quad (17.48)$$

where the normal to the plane expressed in (17.48) has direction cosines of $\frac{1}{\sqrt{3}}$, $-\frac{1}{\sqrt{3}}$,
and $\frac{1}{\sqrt{3}}$ in relation to the x-, y-, and z-axes, respectively.
Therefore, the normal is given by a straight line connecting the origin and
$\begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix}$. In other words, a line connecting the origin and a corner of a cube is the
normal to the plane described by (17.48). That plane is formed by two intersecting
lines, i.e., rotation axes of C2 and C′2 (see Fig. 17.13, which depicts a cube whose sides
have a length of 2). These axes make an angle of π/3; this can easily be checked by taking an inner
product between $\begin{pmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \\ 0 \end{pmatrix}$ and $\begin{pmatrix} 0 \\ 1/\sqrt{2} \\ 1/\sqrt{2} \end{pmatrix}$. These column vectors are the direction
cosines of C2 and C′2, respectively. On the basis of the discussion of Sect. 17.2, we must have a

[Fig. 17.13 body: the labeled points are (0, 1, 1), (1, −1, 1), and (1, 1, 0); the C2 and C′2 axes make an angle of π/3.]

Fig. 17.13 Rotation axes of C2 and C′2 along with another rotation axis C3 in a point group O

Fig. 17.14 Simple kit (Sheets 1–3) that helps visualize the positional relationship among planes and straight
lines in three-dimensional space. To make it, follow these procedures: (i) Take three thick sheets of
paper and make slits (dashed lines) as shown. (ii) Insert Sheet 2 into Sheet 1 so that the two sheets
make a right angle. (iii) Insert Sheet 3 into the combined Sheets 1 and 2

rotation axis of C3. That is, this axis trisects a solid angle π/2 shaped by three
intersecting sides.
It is sometimes hard to visualize or envisage the positional relationship among
planes and straight lines in three-dimensional space. It will therefore be useful to
make a simple kit to help visualize it. Figure 17.14 gives an illustration.
Another typical example having 24 group elements is Td. A molecule of methane
belongs to this symmetry. Table 17.6 collects the relevant symmetry operations and
their (3, 3) matrix representations. As in the case of Fig. 17.12, the matrices show
how a set of vectors (x y z) is transformed according to the symmetry operations.
Comparing it with Fig. 17.12, we immediately recognize that a close relationship
exists between Td and O and that these point groups share notable characteristics.
(i) Both Td and O consist of five conjugacy classes, each of which contains the
same number of symmetry species. (ii) Both Td and O contain a pure rotation group
T as a subgroup. The subgroup T consists of 12 group elements: E, 8C3, and 3C2.
The remaining twelve group elements of Td are symmetry species related to
reflection: S4 and σd. The elements 6S4 and 6σd correspond to 6C4 and 6C2 of O,
Table 17.6 Symmetry operations and their matrix representations of Td

E:
$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$

8C3:
$\begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}$ $\begin{pmatrix} 0 & 0 & 1 \\ -1 & 0 & 0 \\ 0 & -1 & 0 \end{pmatrix}$ $\begin{pmatrix} 0 & 0 & -1 \\ 1 & 0 & 0 \\ 0 & -1 & 0 \end{pmatrix}$ $\begin{pmatrix} 0 & 0 & -1 \\ -1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}$ $\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}$ $\begin{pmatrix} 0 & -1 & 0 \\ 0 & 0 & -1 \\ 1 & 0 & 0 \end{pmatrix}$ $\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & -1 \\ -1 & 0 & 0 \end{pmatrix}$ $\begin{pmatrix} 0 & -1 & 0 \\ 0 & 0 & 1 \\ -1 & 0 & 0 \end{pmatrix}$

3C2:
$\begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix}$ $\begin{pmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}$ $\begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$

6S4:
$\begin{pmatrix} -1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix}$ $\begin{pmatrix} -1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 0 \end{pmatrix}$ $\begin{pmatrix} 0 & 0 & 1 \\ 0 & -1 & 0 \\ -1 & 0 & 0 \end{pmatrix}$ $\begin{pmatrix} 0 & 0 & -1 \\ 0 & -1 & 0 \\ 1 & 0 & 0 \end{pmatrix}$ $\begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & -1 \end{pmatrix}$ $\begin{pmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & -1 \end{pmatrix}$

6σd:
$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}$ $\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & -1 & 0 \end{pmatrix}$ $\begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}$ $\begin{pmatrix} 0 & 0 & -1 \\ 0 & 1 & 0 \\ -1 & 0 & 0 \end{pmatrix}$ $\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$ $\begin{pmatrix} 0 & -1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$

respectively. That is, successive operations of S4 cause effects similar to those of C4
of O. Meanwhile, successive operations of σd are related to those of 6C2 of O.
Let us imagine in Fig. 17.15 that a regular tetrahedron is inscribed in a cube. As
an example of symmetry operations, suppose that the three pairs of planes of σd (six
planes in total) are given by the equations x = ±y, y = ±z, and z = ±x. Their
Hesse normal forms are represented as

$$\frac{1}{\sqrt{2}}(x \mp y) = 0, \quad (17.49)$$
$$\frac{1}{\sqrt{2}}(y \mp z) = 0, \quad (17.50)$$
$$\frac{1}{\sqrt{2}}(z \mp x) = 0. \quad (17.51)$$

Then, a dihedral angle α of two of the planes is given by

$$\cos\alpha = \frac{1}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}} = \frac{1}{2} \quad \text{or} \quad (17.52)$$
$$\cos\alpha = \frac{1}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}} - \frac{1}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}} = 0. \quad (17.53)$$

That is, α = π/3 or α = π/2. On the basis of the discussion of Sect. 17.2, the
intersection of the two planes must be a rotation axis of C3 or C2. Once again, in the
case of C3 the intersection is a straight line connecting the origin and a vertex of the
cube. This can readily be verified as follows: For instance, the two planes given by x = y
and y = z make an angle π/3 and produce the intersection line x = y = z. This line, in
turn, connects the origin and a vertex of the cube.
If we choose, e.g., the two planes x = ±y from the above, these planes make a right
angle and their intersection must be a C2 axis. The three such C2 axes coincide with the x-,
y-, and z-axes. In this light, σd functions similarly to 6C2 of O in that their
combinations produce 8C3 or 3C2. Thus, the constitution and operation of Td and
O are closely related.
Let us more closely inspect the structure and constitution of O and Td. First we
construct a mapping ρ between group elements of O and Td such that

$$\rho : g \in O \longleftrightarrow g' \in T_d, \quad \text{if } g \in T,$$
$$\rho : g \in O \longleftrightarrow -(g')^{-1} \in T_d, \quad \text{if } g \notin T.$$

In the above relation, the minus sign indicates that, in taking the inverse, the representation
matrix R must be replaced with −R. Then, ρ(g) = g′ is an isomorphic mapping. In
fact, comparing Fig. 17.12 and Table 17.6, ρ gives identical matrix representations for
O and Td. For example, taking the first matrix of S4 of Td, we have

Fig. 17.15 Regular tetrahedron inscribed in a cube. As an example of symmetry operations, we
can choose three pairs of planes of σd (six planes in total) given by the equations x = ±y, y = ±z, and z = ±x

$$-\begin{pmatrix} -1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix}^{-1} = -\begin{pmatrix} -1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix}.$$

The resulting matrix is identical to the first matrix of C4 of O. Thus, we find that Td and
O are isomorphic to each other.
Both O and Td consist of 24 group elements and are isomorphic to the symmetric group
S4; do not confuse this with the same symbol S4 used for a group element of Td. The subgroup
T consists of three conjugacy classes: E, 8C3, and 3C2. Since T is constructed only of
entire classes, it is an invariant subgroup; in this respect see the discussion of Sect.
16.3. The groups O, Td, and T along with Th and Oh form the cubic groups [2]. In
Table 17.7, we list these cubic groups together with their names and orders. Of these,
O is a pure rotation subgroup of Oh, and T is a pure rotation subgroup of Td and Th.
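The mapping ρ can be spot-checked numerically. The following sketch (ours, assuming NumPy) takes the first S4 matrix of Table 17.6 and confirms that −M⁻¹ reproduces the first C4 matrix of O, i.e., Rx(π/2):

```python
import numpy as np

M = np.array([[-1, 0, 0], [0, 0, -1], [0, 1, 0]])   # first S4 matrix of Td
Rx = np.array([[1, 0, 0], [0, 0, -1], [0, 1, 0]])   # first C4 matrix of O

image = -np.linalg.inv(M)
assert np.array_equal(image.round().astype(int), Rx)
print(np.linalg.det(M), np.linalg.det(Rx))   # -1.0 (improper) vs +1.0 (proper)
```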
Symmetric groups are related to permutations of n elements (or objects or
numbers). The permutation has already appeared in (11.56) when we defined a
determinant of a matrix. There it was defined as

$$\sigma = \begin{pmatrix} 1 & 2 & \cdots & n \\ i_1 & i_2 & \cdots & i_n \end{pmatrix},$$

where σ means a permutation of the numbers 1, 2, ⋯, n. Therefore, the symmetric
group of degree n has n! group elements (i.e., different ways of rearrangement).
Although we do not dwell on symmetric groups much, we state the following
important theorem related to finite groups without proof. Interested readers are
referred to the literature [1].

Table 17.7 Several cubic groups and their characteristics

Notation   Group name             Order   Remark
T          Tetrahedral rotation   12      Subgroup of Th, Td
Th, Td     Tetrahedral            24
O          Octahedral rotation    24      Subgroup of Oh
Oh         Octahedral             48

Theorem 17.1: Cayley's Theorem [1] Every finite group ℊ of order n is isomorphic to a subgroup (possibly the whole group) of the symmetric group Sn.

17.4 Special Orthogonal Group SO(3)

In Part III and Part IV thus far, we have dealt with a broad class of linear transformations. The related groups are finite groups. Here we will describe characteristics of
the special orthogonal group SO(3), a kind of infinite group. The group SO(3) represents
rotations in three-dimensional Euclidean space ℝ3. Rotations are made around an
axis (a line through the origin) with the origin fixed. A rotation is defined by an
azimuth of direction and a magnitude (angle) of rotation. The azimuth of direction is
defined by two parameters and the magnitude is defined by one parameter, a rotation
angle. Hence, a rotation is defined by three independent parameters. Since those
parameters are continuously variable, SO(3) is one of the continuous groups.
Two rotations result in another rotation with the origin again fixed. A reverse
rotation is unambiguously defined. An identity transformation is naturally defined.
An associative law holds as well. Thus, the relevant rotations form a group, i.e.,
SO(3). A rotation is represented by a real (3, 3) matrix whose determinant is 1. A
matrix representation is uniquely determined once an orthonormal basis is set in ℝ3.
Any rotation is represented by a rotation matrix accordingly. The rotation matrix R is
defined by

$$R^T R = R R^T = E, \quad (17.54)$$

where R is a real matrix with det R = 1. Notice that we exclude the case where det
R = −1. Matrices that satisfy (17.54) with det R = ±1 are referred to as orthogonal
matrices, which cause orthogonal transformations. Correspondingly, orthogonal groups
(represented by orthogonal matrices) contain rotation groups as a special case. In
other words, the orthogonal groups contain the rotation groups as a subgroup. The
orthogonal group in ℝ3, denoted O(3), contains SO(3) as a subgroup. By the same
token, orthogonal matrices contain rotation matrices as a special case.
In Sect. 17.2 we treated reflection and improper rotation, with the determinant of
their matrices being −1. In this section these transformations are excluded and only
rotations are dealt with. We focus on geometric characteristics of the rotation groups.

Readers are referred to a more detailed representation theory of SO(3) in the appropriate
literature [1].

17.4.1 Rotation Axis and Rotation Matrix

In this section we denote a vector by, e.g., |x⟩. We start with showing that any
rotation is accompanied by a unique rotation axis. The rotation axis is defined as
follows: Suppose that there is a rigid body with some point within the body fixed.
Here the said rigid body may have infinite extent. Then the rigid body exerts
rotation. The rotation axis is a line on which every point is unmoved during the
rotation. As a matter of course, the identity matrix E has three linearly independent
rotation axes. (Practically, this represents no rotation.)
Theorem 17.2 Any rotation matrix R is accompanied by at least one rotation axis.
Unless the rotation matrix is identity, the rotation matrix should be accompanied by
one and only one rotation axis.
Proof As R is an orthogonal matrix of determinant 1, so are $R^T$ and $R^{-1}$. Then we
have

$$(R - E)^T = R^T - E = R^{-1} - E. \quad (17.55)$$

Hence, we get

$$\det(R - E) = \det(R^T - E) = \det(R^{-1} - E) = \det\left[R^{-1}(E - R)\right] = \det R^{-1} \det(E - R) = \det(E - R) = -\det(R - E). \quad (17.56)$$

Note here that for any (3, 3) matrix A of ℝ3,

$$\det(-A) = (-1)^3 \det A = -\det A.$$

This equality holds in ℝn with n odd, but in ℝn with n even we have
det(−A) = det A; then, (17.56) would result in the trivial equation 0 = 0.
Therefore, the discussion made below only applies to ℝn with n odd. Thus, from
(17.56) we have

$$\det(R - E) = 0. \quad (17.57)$$

This implies that for $\exists |x_0\rangle \neq 0$

$$(R - E)|x_0\rangle = 0. \quad (17.58)$$

Therefore, we get

$$R(a|x_0\rangle) = a|x_0\rangle, \quad (17.59)$$

where a is an arbitrarily chosen real number. In this case an eigenvalue of R is
1, to which an eigenvector $a|x_0\rangle$ corresponds. Thus, as a rotation axis, we have a
straight line expressed as

$$l = \mathrm{Span}\{a|x_0\rangle;\ a \in \mathbb{R}\}. \quad (17.60)$$
This proves the presence of a rotation axis.


Next suppose that there are two (or more) rotation axes. The presence of two
rotation axes naturally implies that there are two linearly independent vectors (i.e.,
two straight lines that mutually intersect at the fixed point). Suppose that such
vectors are |u⟩ and |v⟩. Then we have

$$(R - E)|u\rangle = 0, \quad (17.61)$$
$$(R - E)|v\rangle = 0. \quad (17.62)$$

Let us consider a vector ∀|y⟩ chosen from Span{|u⟩, |v⟩}, where we put

$$\mathrm{Span}\{|u\rangle, |v\rangle\} \equiv \{a|u\rangle + b|v\rangle;\ a, b \in \mathbb{R}\}. \quad (17.63)$$

That is, Span{|u⟩, |v⟩} represents a plane P formed by the two mutually intersecting
straight lines. Then we have

$$|y\rangle = s|u\rangle + t|v\rangle, \quad (17.64)$$

where s and t are some real numbers. Operating R − E on (17.64), we have

$$(R - E)|y\rangle = (R - E)(s|u\rangle + t|v\rangle) = s(R - E)|u\rangle + t(R - E)|v\rangle = 0. \quad (17.65)$$

This indicates that any vector in P can be an eigenvector of R, implying that an
infinite number of rotation axes exist.
Now, take another vector |w⟩ that is perpendicular to the plane P (see Fig. 17.16).
Let us consider an inner product ⟨y|Rw⟩. Since

$$R|u\rangle = |u\rangle, \quad \langle u|R^\dagger = \langle u|R^T = \langle u|. \quad (17.66)$$

Similarly, we have

Fig. 17.16 Plane P formed by two mutually intersecting straight lines represented by |u⟩ and |v⟩. Another vector |w⟩ is perpendicular to the plane P

$$\langle v|R^T = \langle v|. \quad (17.67)$$

Therefore, using the relation (17.64), we get

$$\langle y|R^T = \langle y|. \quad (17.68)$$

Here we are dealing with real numbers and, hence, we have

$$R^\dagger = R^T. \quad (17.69)$$

Now we have

$$\langle y|Rw\rangle = \langle yR^T|Rw\rangle = \langle y|R^T R w\rangle = \langle y|Ew\rangle = \langle y|w\rangle = 0. \quad (17.70)$$

In (17.70) the second equality comes from the associative law; the third is due to
(17.54). The last equality comes from the fact that |w⟩ is perpendicular to P.
From (17.70) we have

$$\langle y|(R - E)w\rangle = 0. \quad (17.71)$$

However, we should be careful not to conclude immediately from (17.71) that
(R − E)|w⟩ = 0, i.e., R|w⟩ = |w⟩. This is because in (17.70) |y⟩ does not represent
all vectors in ℝ3, but merely represents all the vectors in Span{|u⟩, |v⟩}. Nonetheless, both |w⟩ and |Rw⟩ are perpendicular to P, and so

$$|Rw\rangle = a|w\rangle, \quad (17.72)$$

where a is some real number. From (17.72), we have



$$\langle wR^\dagger|Rw\rangle = \langle wR^T|Rw\rangle = \langle w|w\rangle = |a|^2\langle w|w\rangle, \quad \text{i.e., } a = \pm 1,$$

where the first equality comes from the fact that R is an orthogonal (real) matrix. Since det R = 1,
a = 1. Thus, from (17.72) this time around we have |Rw⟩ = |w⟩; that is,

$$(R - E)|w\rangle = 0. \quad (17.73)$$

Equations (17.65) and (17.73) imply that for any vector |x⟩ arbitrarily chosen from
ℝ3 we have

$$(R - E)|x\rangle = 0. \quad (17.74)$$

Consequently, we get

$$R - E = 0 \quad \text{or} \quad R = E. \quad (17.75)$$

The above procedures represented by (17.61) to (17.75) indicate that the presence
of two rotation axes necessarily requires the transformation matrix to be identity. This
implies that all the vectors in ℝ3 are eigenvectors of such a rotation matrix.
Taking the contraposition of the above, unless the rotation matrix is identity, the
relevant rotation cannot have two rotation axes. Meanwhile, the proof of the former
half ensures the presence of at least one rotation axis. Consequently, any rotation is
characterized by a unique rotation axis except for the identity transformation. This
completes the proof. ∎
An immediate consequence of Theorem 17.2 is that a rotation matrix has
an eigenvalue 1, to which an eigenvector representing the rotation axis corresponds. This statement includes the trivial fact that all the eigenvalues of the identity
matrix are 1. In Sect. 14.4, we calculated the eigenvalues of a two-dimensional rotation
matrix. The eigenvalues were $e^{i\theta}$ and $e^{-i\theta}$, where θ is a rotation angle.
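Theorem 17.2 suggests a practical recipe: the rotation axis of any R ∈ SO(3) is an eigenvector belonging to the eigenvalue 1. A minimal numerical sketch (ours, assuming NumPy; the helper name `rotation_axis` is our own) reads:

```python
import numpy as np

def rotation_axis(R):
    """Return a unit eigenvector of R with eigenvalue 1 (the rotation axis)."""
    w, v = np.linalg.eig(R)
    i = np.argmin(np.abs(w - 1.0))     # pick the eigenvalue closest to 1
    axis = np.real(v[:, i])
    return axis / np.linalg.norm(axis)

# Example: 2*pi/3 rotation about the body diagonal (1, 1, 1)
R = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
print(rotation_axis(R))                       # ~ (1, 1, 1)/sqrt(3), up to sign
print(np.round(np.linalg.eigvals(R), 6))      # 1 and exp(+/- 2*pi*i/3)
```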
Let us consider rotation matrices that we dealt with in Sect. 17.1. The matrix
representing the rotation around the z-axis by a rotation angle θ is expressed by

$$R = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}. \quad (17.8)$$

In Sect. 14.4 we treated the diagonalization of a rotation matrix. As R is already in reduced (block-diagonal) form, the
diagonalization can be performed in a manner essentially the same as (14.92). That
is, as a diagonalizing unitary matrix, we have

$$U = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ -\frac{i}{\sqrt{2}} & \frac{i}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad U^\dagger = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{i}{\sqrt{2}} & 0 \\ \frac{1}{\sqrt{2}} & -\frac{i}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{pmatrix}. \quad (17.76)$$

As a result of the unitary similarity transformation, we get

$$U^\dagger R U = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{i}{\sqrt{2}} & 0 \\ \frac{1}{\sqrt{2}} & -\frac{i}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ -\frac{i}{\sqrt{2}} & \frac{i}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} e^{i\theta} & 0 & 0 \\ 0 & e^{-i\theta} & 0 \\ 0 & 0 & 1 \end{pmatrix}. \quad (17.77)$$

Thus, the eigenvalues are 1, $e^{i\theta}$, and $e^{-i\theta}$. The eigenvalue 1 results from the existence of
the unique rotation axis. When θ = 0, (17.77) gives an identity matrix with all the
eigenvalues 1, as expected. When θ = π, the eigenvalues are 1, −1, and −1. The
eigenvalue 1 is again associated with the unique rotation axis. The (unitary) similarity transformation keeps the trace unchanged; that is, the trace χ is

$$\chi = 1 + 2\cos\theta. \quad (17.78)$$

As R is a normal operator, spectral decomposition can be done as in the case of
Example 14.1. Here we only show the result below:

$$R = e^{i\theta}\begin{pmatrix} \frac{1}{2} & \frac{i}{2} & 0 \\ -\frac{i}{2} & \frac{1}{2} & 0 \\ 0 & 0 & 0 \end{pmatrix} + e^{-i\theta}\begin{pmatrix} \frac{1}{2} & -\frac{i}{2} & 0 \\ \frac{i}{2} & \frac{1}{2} & 0 \\ 0 & 0 & 0 \end{pmatrix} + \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

The three matrices in the above equation are projection operators.
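Both the diagonalization (17.77) and the spectral decomposition can be verified numerically for a sample angle; the sketch below is our own illustration, assuming NumPy:

```python
import numpy as np

theta = 0.7
c, s = np.cos(theta), np.sin(theta)
R = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

U = np.array([[1 / np.sqrt(2), 1 / np.sqrt(2), 0],
              [-1j / np.sqrt(2), 1j / np.sqrt(2), 0],
              [0, 0, 1]])

# (17.77): the unitary similarity transformation diagonalizes R
D = U.conj().T @ R @ U
assert np.allclose(D, np.diag([np.exp(1j * theta), np.exp(-1j * theta), 1]))

# Spectral decomposition: R = e^{i*theta} P1 + e^{-i*theta} P2 + P3
P1 = np.array([[0.5, 0.5j, 0], [-0.5j, 0.5, 0], [0, 0, 0]])
P2 = P1.conj()
P3 = np.diag([0, 0, 1.0])
assert np.allclose(R, np.exp(1j * theta) * P1 + np.exp(-1j * theta) * P2 + P3)
assert np.allclose(P1 @ P1, P1) and np.allclose(P1 @ P2, 0)   # projectors
```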



17.4.2 Euler Angles and Related Topics

Euler angles are well known and have long been used in various fields of science.
We wish to connect the above discussion with the Euler angles.
In Part III we dealt with successive linear transformations. This can be extended
to the case of three or more successive transformations. Suppose that we have three
successive transformations R1, R2, and R3 and that the coordinate system (a three-dimensional orthonormal basis) is transformed as O ⟶ I ⟶ II ⟶ III accordingly.
The symbol "O" stands for the original coordinate system, and I, II, and III represent
the successively transformed systems.
In the discussion that follows, let us denote the transformations by R2′, R3′, R3″,
etc. in reference to the transformed coordinate systems. For example, R3′ means that the third
transformation is viewed from the system I; R3″ indicates that the
third transformation is viewed from the system II. That is, the number of primes "′"
indicates the coordinate system (I or II) from which the transformation is viewed. Let
R2 (without prime) stand for the second transformation viewed from the system O.
Meanwhile, we have

$$R_1 R_2' = R_2 R_1. \quad (17.79)$$

This notation is in parallel with (11.80). Similarly, we have

$$R_2' R_3'' = R_3' R_2' \quad \text{and} \quad R_1 R_3' = R_3 R_1. \quad (17.80)$$

Therefore, we get [3]

$$R_1 R_2' R_3'' = R_1 R_3' R_2' = R_3 R_1 R_2' = R_3 R_2 R_1. \quad (17.81)$$

Also, combining (17.79) and (17.80), we have

$$R_3'' = (R_2 R_1)^{-1} R_3 (R_2 R_1). \quad (17.82)$$

Let us call R2′, R3″, etc. transformations on a "moving" coordinate system (i.e., the
systems I, II, III, ⋯). On the other hand, we call R1, R2, etc. transformations on a
"fixed" system (i.e., the original coordinate system O). Thus, (17.81) shows that the
multiplication order is reversed with respect to the moving system and the fixed
system [3].
For practical purposes it would be enough to consider three successive transformations. Let us think, however, of a general case where n successive transformations are involved (n denotes a positive integer). For the purpose of succinct
notation, let us define the linear transformations and relevant coordinate systems
as those in Fig. 17.17. Also, we define the following orthogonal transformation $R_j^{(i)}$:

O ⟶ I ⟶ II ⟶ III ⟶ ⋯

Fig. 17.17 Successive orthogonal transformations and relevant coordinate systems

$$R_j^{(i)} \quad (0 \le i < j \le n), \quad \text{and} \quad R_i \equiv R_i^{(0)}, \quad (17.83)$$

where $R_j^{(i)}$ is defined as the transformation Rj described in reference to the coordinate
system i; $R_i^{(0)}$ means that Ri is referred to the original coordinate system (i.e., the
fixed coordinate system). Then we have

$$R_{i-1}^{(i-2)} R_k^{(i-1)} = R_k^{(i-2)} R_{i-1}^{(i-2)} \quad (k > i-1). \quad (17.84)$$

Particularly, when i = 3,

$$R_2^{(1)} R_k^{(2)} = R_k^{(1)} R_2^{(1)} \quad (k > 2). \quad (17.85)$$

For i = 2 we have

$$R_1 R_k^{(1)} = R_k R_1 \quad (k > 1). \quad (17.86)$$

We define the n successive transformations on a moving coordinate system as $\widetilde{R}_n$
such that

$$\widetilde{R}_n \equiv R_1 R_2^{(1)} R_3^{(2)} \cdots R_{n-2}^{(n-3)} R_{n-1}^{(n-2)} R_n^{(n-1)}. \quad (17.87)$$

Applying (17.84) to $R_{n-1}^{(n-2)} R_n^{(n-1)}$ and rewriting (17.87), we have

$$\widetilde{R}_n = R_1 R_2^{(1)} R_3^{(2)} \cdots R_{n-2}^{(n-3)} R_n^{(n-2)} R_{n-1}^{(n-2)}. \quad (17.88)$$

Applying (17.84) again to $R_{n-2}^{(n-3)} R_n^{(n-2)}$, we get

$$\widetilde{R}_n = R_1 R_2^{(1)} R_3^{(2)} \cdots R_n^{(n-3)} R_{n-2}^{(n-3)} R_{n-1}^{(n-2)}. \quad (17.89)$$

Proceeding similarly, we have

$$\widetilde{R}_n = R_1 R_n^{(1)} R_2^{(1)} R_3^{(2)} \cdots R_{n-2}^{(n-3)} R_{n-1}^{(n-2)} = R_n R_1 R_2^{(1)} R_3^{(2)} \cdots R_{n-2}^{(n-3)} R_{n-1}^{(n-2)}, \quad (17.90)$$

where with the last equality we used (17.86). In this case, we have

$$R_1 R_n^{(1)} = R_n R_1. \quad (17.91)$$

To reach the RHS of (17.90), we applied (17.84) (n − 1) times in total. Then we
repeat the above procedures with respect to $R_{n-1}^{(n-2)}$ another (n − 2) times to get

$$\widetilde{R}_n = R_n R_{n-1} R_1 R_2^{(1)} R_3^{(2)} \cdots R_{n-2}^{(n-3)}. \quad (17.92)$$

Further proceeding similarly, we finally get

$$\widetilde{R}_n = R_n R_{n-1} R_{n-2} \cdots R_3 R_2 R_1. \quad (17.93)$$

In total, we have applied the permutation of (17.84) n(n − 1)/2 times. When n is 2,
n(n − 1)/2 = 1; this is the case with (17.79). If n is 3, n(n − 1)/2 = 3; this is the case
with (17.81). Thus, (17.93) once again confirms that the multiplication order is
reversed with respect to the moving system and the fixed system; a numerical check of this order reversal is sketched below.
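The order reversal is easy to check numerically for n = 3: construct the moving-frame versions of R2 and R3 from (17.79) and (17.82) and compare the two products. The sketch below is our own (assuming NumPy; `random_rotation` is our own helper):

```python
import numpy as np

rng = np.random.default_rng(1)

def random_rotation():
    """Random proper rotation via QR decomposition of a Gaussian matrix."""
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(Q) < 0:        # enforce det = +1
        Q[:, 0] = -Q[:, 0]
    return Q

R1, R2, R3 = (random_rotation() for _ in range(3))

# Moving-frame transformations, per (17.79) and (17.82):
R2p  = np.linalg.inv(R1) @ R2 @ R1               # R2'  (viewed from system I)
R3pp = np.linalg.inv(R2 @ R1) @ R3 @ (R2 @ R1)   # R3'' (viewed from system II)

# (17.81)/(17.93): product order is reversed between the two descriptions
assert np.allclose(R1 @ R2p @ R3pp, R3 @ R2 @ R1)
```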
Meanwhile, we define $\widetilde{P}$ as

$$\widetilde{P} \equiv R_1 R_2^{(1)} R_3^{(2)} \cdots R_{n-2}^{(n-3)} R_{n-1}^{(n-2)}. \quad (17.94)$$

Alternatively, we describe

$$\widetilde{P} = R_{n-1} R_{n-2} \cdots R_3 R_2 R_1. \quad (17.95)$$

Then, from (17.87) and (17.90) we get

$$\widetilde{R}_n = \widetilde{P} R_n^{(n-1)} = R_n \widetilde{P}. \quad (17.96)$$

Equivalently, we have

$$R_n = \widetilde{P} R_n^{(n-1)} \widetilde{P}^{-1} \quad \text{or} \quad (17.97)$$
$$R_n^{(n-1)} = \widetilde{P}^{-1} R_n \widetilde{P}. \quad (17.98)$$

Moreover, we have

$$\widetilde{P}^T = \left[R_1 R_2^{(1)} R_3^{(2)} \cdots R_{n-2}^{(n-3)} R_{n-1}^{(n-2)}\right]^T = \left[R_{n-1}^{(n-2)}\right]^T \left[R_{n-2}^{(n-3)}\right]^T \cdots \left[R_2^{(1)}\right]^T [R_1]^T$$
$$= \left[R_{n-1}^{(n-2)}\right]^{-1} \left[R_{n-2}^{(n-3)}\right]^{-1} \cdots \left[R_2^{(1)}\right]^{-1} R_1^{-1} = \left[R_1 R_2^{(1)} R_3^{(2)} \cdots R_{n-2}^{(n-3)} R_{n-1}^{(n-2)}\right]^{-1} = \widetilde{P}^{-1}. \quad (17.99)$$

The third equality of (17.99) comes from the fact that the matrices $R_{n-1}^{(n-2)}, R_{n-2}^{(n-3)}, \cdots, R_2^{(1)}$,
and R1 are orthogonal matrices. Then, we get

$$\widetilde{P}^T \widetilde{P} = \widetilde{P} \widetilde{P}^T = E. \quad (17.100)$$

Thus, $\widetilde{P}$ is an orthogonal matrix.


From the point of view of practical applications, (17.97) and (17.98) are very useful.
This is because $R_n$ and $R_n^{(n-1)}$ are conjugate to each other. Consequently, $R_n$ has the
same eigenvalues and trace as $R_n^{(n-1)}$. In light of (11.81), we see that (17.97) and
(17.98) relate $R_n$ (i.e., viewed in reference to the original coordinate system) to
$R_n^{(n-1)}$ [i.e., the same transformation viewed in reference to the coordinate system
reached after the (n − 1) transformations]. Since the transformation $R_n^{(n-1)}$ is usually
described in a simple form, matrix calculations to compute $R_n$ can readily be done.
Now let us consider an example.
Example 17.4: Successive Rotations A typical illustration of three successive
transformations in moving coordinate systems is well known in association with
Euler angles. It contains the following three steps:
(i) Rotation by α around the z-axis in the original coordinate system (O).
(ii) Rotation by β around the y′-axis in the transferred coordinate system (I).
(iii) Rotation by γ around the z″-axis (the same as the z′-axis) in the transferred
coordinate system (II).
The above three steps are represented by the matrices $R_{z\alpha}$, $R'_{y'\beta}$, and $R''_{z''\gamma}$ in
(17.12). That is, as a total transformation $\widetilde{R}_3$ we have

$$\widetilde{R}_3 = \begin{pmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{pmatrix} \begin{pmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
$$= \begin{pmatrix} \cos\alpha\cos\beta\cos\gamma - \sin\alpha\sin\gamma & -\cos\alpha\cos\beta\sin\gamma - \sin\alpha\cos\gamma & \cos\alpha\sin\beta \\ \sin\alpha\cos\beta\cos\gamma + \cos\alpha\sin\gamma & -\sin\alpha\cos\beta\sin\gamma + \cos\alpha\cos\gamma & \sin\alpha\sin\beta \\ -\sin\beta\cos\gamma & \sin\beta\sin\gamma & \cos\beta \end{pmatrix}. \quad (17.101)$$

This matrix corresponds to (17.87) with n = 3. The angles α, β, and γ in (17.101)
are called Euler angles, and their domains are usually taken as 0 ≤ α ≤ 2π, 0 ≤ β ≤ π, and
0 ≤ γ ≤ 2π. The matrix (17.101) is widely used in quantum mechanics and related
fields of natural science. The matrix notation, however, differs from literature to
literature, and so care should be taken [3–5]. Using the notation of (17.87), we have
$$R_1 = \begin{pmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad (17.102)$$
$$R_2^{(1)} = \begin{pmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{pmatrix}, \quad (17.103)$$
$$R_3^{(2)} = \begin{pmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{pmatrix}. \quad (17.104)$$

From (17.94) we also get

$$\widetilde{P} \equiv R_1 R_2^{(1)} = \begin{pmatrix} \cos\alpha\cos\beta & -\sin\alpha & \cos\alpha\sin\beta \\ \sin\alpha\cos\beta & \cos\alpha & \sin\alpha\sin\beta \\ -\sin\beta & 0 & \cos\beta \end{pmatrix}. \quad (17.105)$$

Corresponding to (17.97), we have

$$R_3 = \widetilde{P} R_3^{(2)} \widetilde{P}^{-1} = R_1 R_2^{(1)} R_3^{(2)} \left[R_2^{(1)}\right]^{-1} R_1^{-1}. \quad (17.106)$$

Now the matrix calculations are readily performed such that

$$R_3 = \begin{pmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{pmatrix} \begin{pmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \cos\beta & 0 & -\sin\beta \\ 0 & 1 & 0 \\ \sin\beta & 0 & \cos\beta \end{pmatrix} \begin{pmatrix} \cos\alpha & \sin\alpha & 0 \\ -\sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
$$= \begin{pmatrix} (\cos^2\alpha\cos^2\beta + \sin^2\alpha)\cos\gamma + \cos^2\alpha\sin^2\beta & \cos\alpha\sin\alpha\sin^2\beta(1-\cos\gamma) - \cos\beta\sin\gamma & \cos\alpha\cos\beta\sin\beta(1-\cos\gamma) + \sin\alpha\sin\beta\sin\gamma \\ \cos\alpha\sin\alpha\sin^2\beta(1-\cos\gamma) + \cos\beta\sin\gamma & (\sin^2\alpha\cos^2\beta + \cos^2\alpha)\cos\gamma + \sin^2\alpha\sin^2\beta & \sin\alpha\cos\beta\sin\beta(1-\cos\gamma) - \cos\alpha\sin\beta\sin\gamma \\ \cos\alpha\cos\beta\sin\beta(1-\cos\gamma) - \sin\alpha\sin\beta\sin\gamma & \sin\alpha\cos\beta\sin\beta(1-\cos\gamma) + \cos\alpha\sin\beta\sin\gamma & \sin^2\beta\cos\gamma + \cos^2\beta \end{pmatrix}. \quad (17.107)$$

Notice that in (17.107) we have a trace χ described as

$$\chi = 1 + 2\cos\gamma. \quad (17.108)$$

The trace is the same as that of $R_3^{(2)}$, as expected from (17.106) and (17.97).
Equation (17.107) apparently seems complicated but has a simple and well-defined
meaning. The rotation represented by $R_3^{(2)}$ is characterized by a rotation by γ around
the z″-axis. Figure 17.18 represents the orientation of the z″-axis viewed in reference
to the original xyz-system. That is, the z″-axis (identical to the rotation axis A) is
designated by an azimuthal angle α and a zenithal angle β as shown. The operation
R3 is represented by a rotation by γ around the axis A in the xyz-system. The angles α,
β, and γ coincide with the Euler angles designated with the same independent
parameters α, β, and γ.
From (17.77) and (17.107), a diagonalizing matrix for R3 is $\widetilde{P}U$. That is,

$$\left(\widetilde{P}U\right)^\dagger R_3 \left(\widetilde{P}U\right) = U^\dagger \widetilde{P}^{-1} R_3 \widetilde{P} U = U^\dagger R_3^{(2)} U = \begin{pmatrix} e^{i\gamma} & 0 & 0 \\ 0 & e^{-i\gamma} & 0 \\ 0 & 0 & 1 \end{pmatrix}. \quad (17.109)$$

Note that as $\widetilde{P}$ is a real matrix, we have

Fig. 17.18 Rotation γ around the rotation axis A. The orientation of A is defined by the angles α and β as shown

$$\widetilde{P}^\dagger = \widetilde{P}^T = \widetilde{P}^{-1} \quad (17.110)$$

and

$$\widetilde{P}U = \begin{pmatrix} \cos\alpha\cos\beta & -\sin\alpha & \cos\alpha\sin\beta \\ \sin\alpha\cos\beta & \cos\alpha & \sin\alpha\sin\beta \\ -\sin\beta & 0 & \cos\beta \end{pmatrix} \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ -\frac{i}{\sqrt{2}} & \frac{i}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
$$= \begin{pmatrix} \frac{1}{\sqrt{2}}\cos\alpha\cos\beta + \frac{i}{\sqrt{2}}\sin\alpha & \frac{1}{\sqrt{2}}\cos\alpha\cos\beta - \frac{i}{\sqrt{2}}\sin\alpha & \cos\alpha\sin\beta \\ \frac{1}{\sqrt{2}}\sin\alpha\cos\beta - \frac{i}{\sqrt{2}}\cos\alpha & \frac{1}{\sqrt{2}}\sin\alpha\cos\beta + \frac{i}{\sqrt{2}}\cos\alpha & \sin\alpha\sin\beta \\ -\frac{1}{\sqrt{2}}\sin\beta & -\frac{1}{\sqrt{2}}\sin\beta & \cos\beta \end{pmatrix}. \quad (17.111)$$

The vector representing the rotation axis corresponds to the eigenvalue 1. The direction
cosines of the x-, y-, and z-components of the rotation axis A are cosα sinβ, sinα sinβ,
and cosβ (see Fig. 3.1), respectively, when viewed in reference to the original xyz-coordinate system. This can directly be shown as follows: The characteristic equation of R3 is expressed as

$$|R_3 - \lambda E| = 0. \quad (17.112)$$

Using (17.107) we have

$$|R_3 - \lambda E| = \begin{vmatrix} (\cos^2\alpha\cos^2\beta + \sin^2\alpha)\cos\gamma + \cos^2\alpha\sin^2\beta - \lambda & \cos\alpha\sin\alpha\sin^2\beta(1-\cos\gamma) - \cos\beta\sin\gamma & \cos\alpha\cos\beta\sin\beta(1-\cos\gamma) + \sin\alpha\sin\beta\sin\gamma \\ \cos\alpha\sin\alpha\sin^2\beta(1-\cos\gamma) + \cos\beta\sin\gamma & (\sin^2\alpha\cos^2\beta + \cos^2\alpha)\cos\gamma + \sin^2\alpha\sin^2\beta - \lambda & \sin\alpha\cos\beta\sin\beta(1-\cos\gamma) - \cos\alpha\sin\beta\sin\gamma \\ \cos\alpha\cos\beta\sin\beta(1-\cos\gamma) - \sin\alpha\sin\beta\sin\gamma & \sin\alpha\cos\beta\sin\beta(1-\cos\gamma) + \cos\alpha\sin\beta\sin\gamma & \sin^2\beta\cos\gamma + \cos^2\beta - \lambda \end{vmatrix}. \quad (17.113)$$

When λ = 1, we must have the direction cosines of the rotation axis as an
eigenvector. That is, we get

$$\begin{pmatrix} (\cos^2\alpha\cos^2\beta + \sin^2\alpha)\cos\gamma + \cos^2\alpha\sin^2\beta - 1 & \cos\alpha\sin\alpha\sin^2\beta(1-\cos\gamma) - \cos\beta\sin\gamma & \cos\alpha\cos\beta\sin\beta(1-\cos\gamma) + \sin\alpha\sin\beta\sin\gamma \\ \cos\alpha\sin\alpha\sin^2\beta(1-\cos\gamma) + \cos\beta\sin\gamma & (\sin^2\alpha\cos^2\beta + \cos^2\alpha)\cos\gamma + \sin^2\alpha\sin^2\beta - 1 & \sin\alpha\cos\beta\sin\beta(1-\cos\gamma) - \cos\alpha\sin\beta\sin\gamma \\ \cos\alpha\cos\beta\sin\beta(1-\cos\gamma) - \sin\alpha\sin\beta\sin\gamma & \sin\alpha\cos\beta\sin\beta(1-\cos\gamma) + \cos\alpha\sin\beta\sin\gamma & \sin^2\beta\cos\gamma + \cos^2\beta - 1 \end{pmatrix} \begin{pmatrix} \cos\alpha\sin\beta \\ \sin\alpha\sin\beta \\ \cos\beta \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}. \quad (17.114)$$
The above matrix calculations certainly verify that (17.114) holds; readers, check it (a numerical check is sketched below).
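The following sketch (ours, assuming NumPy) builds R3 from (17.106) for arbitrary Euler angles and verifies the trace (17.108) and the eigenvector of (17.114):

```python
import numpy as np

def Rz(t):
    return np.array([[np.cos(t), -np.sin(t), 0],
                     [np.sin(t), np.cos(t), 0],
                     [0, 0, 1]])

def Ry(t):
    return np.array([[np.cos(t), 0, np.sin(t)],
                     [0, 1, 0],
                     [-np.sin(t), 0, np.cos(t)]])

alpha, beta, gamma = 0.4, 1.1, 2.0          # arbitrary Euler angles
P = Rz(alpha) @ Ry(beta)                    # P~ of (17.105)
R3 = P @ Rz(gamma) @ P.T                    # (17.106); P is orthogonal

# (17.108): trace of R3 equals 1 + 2 cos(gamma)
assert np.isclose(np.trace(R3), 1 + 2 * np.cos(gamma))

# (17.114): the axis has direction cosines (cos a sin b, sin a sin b, cos b)
axis = np.array([np.cos(alpha) * np.sin(beta),
                 np.sin(alpha) * np.sin(beta),
                 np.cos(beta)])
assert np.allclose(R3 @ axis, axis)         # eigenvector with eigenvalue 1
```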
As an application of (17.107) to an illustrative example, let us consider a 2π/3
rotation around an axis trisecting a solid angle π/2 formed by the x-, y-, and z-axes
(see Fig. 17.19 and Sect. 17.3). Then we have

$$\cos\alpha = \sin\alpha = 1/\sqrt{2}, \quad \cos\beta = 1/\sqrt{3}, \quad \sin\beta = \sqrt{2}/\sqrt{3},$$
$$\cos\gamma = -1/2, \quad \sin\gamma = \sqrt{3}/2. \quad (17.115)$$

Substituting (17.115) into (17.107), we get


Fig. 17.19 Rotation axis of C3 that permutates basis vectors. (a) The C3 axis is a straight line that
connects the origin and each vertex of a cube. (b) Cube viewed along the C3 axis that connects the
origin and a vertex

$$R_3 = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}. \quad (17.116)$$

If we write the linear transformation R3 following (11.37), we get

$$R_3(|x\rangle) = (|e_1\rangle\ |e_2\rangle\ |e_3\rangle)\begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}, \quad (17.117)$$

where |e1⟩, |e2⟩, and |e3⟩ represent the unit basis vectors in the direction of the x-, y-, and z-axes, respectively. We have |x⟩ = x1|e1⟩ + x2|e2⟩ + x3|e3⟩. Thus, we get

$$R_3(|x\rangle) = (|e_2\rangle\ |e_3\rangle\ |e_1\rangle)\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}. \quad (17.118)$$

This implies a cyclic permutation of the basis vectors. This is equally well expressed in terms
of the column vector (i.e., coordinate) transformation as

$$R_3(|x\rangle) = (|e_1\rangle\ |e_2\rangle\ |e_3\rangle)\begin{pmatrix} x_3 \\ x_1 \\ x_2 \end{pmatrix}. \quad (17.119)$$

Care should be taken as to which linear transformation is intended: the transformation of basis
vectors or the transformation of coordinates. The two readings are sketched below.
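A two-line numerical illustration (ours, assuming NumPy) of the two readings:

```python
import numpy as np

M = np.array([[0, 0, 1], [1, 0, 0], [0, 1, 0]])   # R3 of (17.116)

# Columns of M are the images of the basis vectors: e1 -> e2, e2 -> e3, e3 -> e1
for i in range(3):
    print(f"e{i+1} ->", M[:, i])

x = np.array([1, 2, 3])
print(M @ x)      # [3 1 2]: the coordinates transform as in (17.119)
```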
As mentioned above, we have compared geometric features on the moving
coordinate system and the fixed coordinate system. The features apparently seem to
differ at a glance, but are essentially the same.

References

1. Hamermesh M (1989) Group theory and its application to physical problems. Dover, New York
2. Cotton FA (1990) Chemical applications of group theory. Wiley, New York
3. Edmonds AR (1957) Angular momentum in quantum mechanics. Princeton University Press,
Princeton
4. Chen JQ, Ping J, Wang F (2002) Group representation theory for physicists, 2nd edn. World
Scientific, Singapore
5. Arfken GB, Weber HJ, Harris FE (2013) Mathematical methods for physicists, 7th edn. Academic Press, Waltham
Chapter 18
Representation Theory of Groups

Representation theory is an important pillar of group theory. As we shall see
soon, the word "representation" and its definition sound a bit daunting. In short,
however, we need "numbers" or "matrices" to do a mathematical calculation.
Therefore, we may think of a representation as merely numbers and matrices.
Individual representations have their dimension. If that dimension is one, we treat a
representation as a number (real or complex). If the dimension is two or more, we are
going to deal with matrices; in the n-dimensional case, it is an (n, n) square
matrix. In this chapter we focus on the representation theory of finite groups. In this
case, we have an important theorem stating that a representation of any finite group
can be converted to a unitary representation by a similarity transformation. That is,
the group elements of a finite group are represented by unitary matrices. According to the
dimension of the representation, we have the same number of basis vectors. Bearing
these things firmly in mind, we can pretty easily understand this important notion of
representation.

18.1 Definition of Representation

In Sect. 16.4 we dealt with various aspects of mappings between group elements.
Of these, we studied fundamental properties of isomorphism and homomorphism. In
this section we introduce the notion of representation of groups and study it. If we
deal with a finite group consisting of n elements, we describe it as ℊ = {g1 ≡ e,
g2, ⋯, gn} as in the case of Sect. 16.1.
Definition 18.1 Let ℊ = {gν} be a group comprising elements gν. Suppose that a
(d, d) matrix D(gν) is given for each group element gν. Suppose also that in
correspondence with


$$g_\mu g_\nu = g_\rho, \quad (18.1)$$
$$D(g_\mu) D(g_\nu) = D(g_\rho) \quad (18.2)$$

holds. Then the set L consisting of the D(gν), that is,

$$L = \{D(g_\nu)\},$$

is said to be a representation.
Although the definition seems somewhat daunting, the representation is, as
already seen, merely a homomorphism.
We call the individual matrices D(gν) representation matrices. The dimension d of the
matrices is said to be the dimension of the representation as well. In correspondence with
gνe = gν, we have

$$D(g_\mu) D(e) = D(g_\mu). \quad (18.3)$$

Therefore, we get

$$D(e) = E, \quad (18.4)$$

where E is an identity matrix of dimension d. Also, in correspondence with
$g_\nu g_\nu^{-1} = g_\nu^{-1} g_\nu = e$, we have

$$D(g_\nu) D(g_\nu^{-1}) = D(g_\nu^{-1}) D(g_\nu) = D(e) = E. \quad (18.5)$$

That is,

$$D(g_\nu^{-1}) = [D(g_\nu)]^{-1}. \quad (18.6)$$

Namely, an inverse matrix corresponds to an inverse element. If the representation has one-to-one correspondence, the representation is said to be faithful. In the
case of n-to-one correspondence, the representation is homomorphic. In particular,
representing all group elements by 1 [as a (1, 1) matrix] is called an "identity
representation."
Illustrative examples of the representation have already appeared in Sects. 17.2
and 17.3. In those cases the representation was faithful. For instance, in Tables 17.2,
17.4, and 17.6 as well as Fig. 17.12, the number of representation matrices is the
same as that of group elements. In Sects. 17.2 and 17.3, in most cases we used
real orthogonal matrices. Such matrices are included among unitary matrices. A representation using unitary matrices is said to be a "unitary representation." In fact, a
representation of a finite group can be converted to a unitary representation.

Theorem 18.1 [1, 2] A representation of any finite group can be converted to a
unitary representation by a similarity transformation.
Proof Let ℊ = {g1, g2, ⋯, gn} be a finite group of order n. Let D(gν) be a
representation matrix of gν of dimension d. Here we suppose that D(gν) is not
unitary, but non-singular. Using the D(gi), we construct the following matrix H such
that

$$H = \sum_{i=1}^{n} D(g_i)^\dagger D(g_i). \quad (18.7)$$

Then, for an arbitrarily chosen group element gj we have

$$D(g_j)^\dagger H D(g_j) = \sum_{i=1}^{n} D(g_j)^\dagger D(g_i)^\dagger D(g_i) D(g_j) = \sum_{i=1}^{n} \left[D(g_i)D(g_j)\right]^\dagger \left[D(g_i)D(g_j)\right]$$
$$= \sum_{i=1}^{n} D(g_i g_j)^\dagger D(g_i g_j) = \sum_{k=1}^{n} D(g_k)^\dagger D(g_k) = H, \quad (18.8)$$

where the third equality comes from the homomorphism of the representation and the
second last equality is due to the rearrangement theorem (see Sect. 16.1).
Note that each matrix D(gi)†D(gi) is an Hermitian Gram matrix constructed from a
non-singular matrix and that H is a summation of such Gram matrices. Consequently, on the basis of the argument of Sect. 13.2, H is positive definite and all
the eigenvalues of H are positive. Then, using an appropriate unitary matrix U, we
can get a diagonal matrix Λ such that

$$U^\dagger H U = \Lambda. \quad (18.9)$$

Here, the diagonal matrix Λ is given by

$$\Lambda = \begin{pmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_d \end{pmatrix}, \quad (18.10)$$

with λi > 0 (1 ≤ i ≤ d). We define $\Lambda^{1/2}$ such that



$$\Lambda^{1/2} = \begin{pmatrix} \sqrt{\lambda_1} & & & \\ & \sqrt{\lambda_2} & & \\ & & \ddots & \\ & & & \sqrt{\lambda_d} \end{pmatrix}, \quad (18.11)$$

$$\left(\Lambda^{1/2}\right)^{-1} = \begin{pmatrix} \sqrt{\lambda_1^{-1}} & & & \\ & \sqrt{\lambda_2^{-1}} & & \\ & & \ddots & \\ & & & \sqrt{\lambda_d^{-1}} \end{pmatrix}. \quad (18.12)$$

Notice that both $\Lambda^{1/2}$ and $(\Lambda^{1/2})^{-1}$ are non-singular. Furthermore, we define a
matrix V such that

$$V = U\left(\Lambda^{1/2}\right)^{-1}. \quad (18.13)$$

Then, multiplying both sides of (18.8) by $V^{-1}$ from the left and by V from the
right and inserting $VV^{-1} = E$ in between, we get

$$V^{-1} D(g_j)^\dagger V V^{-1} H V V^{-1} D(g_j) V = V^{-1} H V. \quad (18.14)$$

Meanwhile, we have

$$V^{-1} H V = \Lambda^{1/2} U^\dagger H U \left(\Lambda^{1/2}\right)^{-1} = \Lambda^{1/2} \Lambda \left(\Lambda^{1/2}\right)^{-1} = \Lambda. \quad (18.15)$$

With the second equality of (18.15), we used (18.9). Inserting (18.15) into
(18.14), we get

$$V^{-1} D(g_j)^\dagger V \Lambda V^{-1} D(g_j) V = \Lambda. \quad (18.16)$$

Multiplying both sides of (18.16) by $\Lambda^{-1}$ from the left, we get

$$\Lambda^{-1} V^{-1} D(g_j)^\dagger (V\Lambda) \cdot V^{-1} D(g_j) V = E. \quad (18.17)$$

Using (18.13), we have

$$\Lambda^{-1} V^{-1} = (V\Lambda)^{-1} = \left[U\Lambda^{1/2}\right]^{-1} = \left(\Lambda^{1/2}\right)^{-1} U^{-1} = \left(\Lambda^{1/2}\right)^{-1} U^\dagger.$$

Meanwhile, taking the adjoint of (18.13) and noting (18.12), we get

$$V^\dagger = \left[\left(\Lambda^{1/2}\right)^{-1}\right]^\dagger U^\dagger = \left(\Lambda^{1/2}\right)^{-1} U^\dagger.$$

Using (18.13) once again, we have

$$V\Lambda = U\Lambda^{1/2}.$$

Also using (18.13) we have

$$\left(V^{-1}\right)^\dagger = \left\{\left[U\left(\Lambda^{1/2}\right)^{-1}\right]^{-1}\right\}^\dagger = \left[\Lambda^{1/2} U^\dagger\right]^\dagger = U\left(\Lambda^{1/2}\right)^\dagger = U\Lambda^{1/2},$$

where we used the unitarity of U and (18.11).


Using the above relations and rewriting (18.17), finally we get

$$V^\dagger D(g_j)^\dagger \left(V^{-1}\right)^\dagger \cdot V^{-1} D(g_j) V = E. \quad (18.18)$$

Defining $\widetilde{D}(g_j)$ as follows,

$$\widetilde{D}(g_j) \equiv V^{-1} D(g_j) V, \quad (18.19)$$

and taking the adjoint of both sides of (18.19), we get

$$V^\dagger D(g_j)^\dagger \left(V^{-1}\right)^\dagger = \widetilde{D}(g_j)^\dagger. \quad (18.20)$$

Then, from (18.18) we have

$$\widetilde{D}(g_j)^\dagger \widetilde{D}(g_j) = E. \quad (18.21)$$

Equation (18.21) implies that a representation of any finite group can be
converted to a unitary representation by the similarity transformation of (18.19).
This completes the proof. ∎
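The construction used in the proof can be traced numerically. The sketch below (our own illustration, assuming NumPy) starts from a deliberately non-unitary two-dimensional representation of the two-element group {e, g} with g² = e and recovers a unitary representation via (18.7)–(18.19):

```python
import numpy as np

# A non-unitary but faithful representation of the two-element group:
# conjugate the unitary representation diag(1, -1) by a non-unitary T.
T = np.array([[1.0, 2.0], [0.0, 1.0]])
D_e = np.eye(2)
D_g = T @ np.diag([1.0, -1.0]) @ np.linalg.inv(T)
assert not np.allclose(D_g.conj().T @ D_g, np.eye(2))   # D_g is not unitary

# (18.7): H = sum_i D(g_i)^dagger D(g_i) is positive definite
H = D_e.conj().T @ D_e + D_g.conj().T @ D_g

# (18.9)-(18.13): diagonalize H and set V = U Lambda^{-1/2}
lam, U = np.linalg.eigh(H)
V = U @ np.diag(lam ** -0.5)

# (18.19): the transformed representation is unitary
for D in (D_e, D_g):
    Dt = np.linalg.inv(V) @ D @ V
    assert np.allclose(Dt.conj().T @ Dt, np.eye(2))
```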

18.2 Basis Functions of Representation

In Part III we considered a linear transformation of a vector. In that case we
defined vectors as abstract elements for which the operation laws (11.1)–(11.8)
hold. We assumed that the operation is addition. In this part, so far we have dealt
with vectors mostly in ℝ3. Therefore, the vectors naturally possess geometric features.
In this section, we extend the notion of vectors so that they can be treated under a wider
scope. More specifically, we include mathematical functions treated in analysis as
vectors.
To this end, let us think of a basis of a representation. According to the dimension
d of the representation, we have d basis vectors. Here we adopt d linearly independent basis vectors for the representation.
Let ψ1, ψ2, ⋯, and ψd be linearly independent vectors in a vector space Vd. Let
ℊ = {g1, g2, ⋯, gn} be a finite group of order n. Here we assume that gi (1 ≤ i ≤ n)
is a linear transformation (or operator) such as a symmetry operation dealt with in
Chap. 17. Suppose that the following relation holds with gi ∈ ℊ (1 ≤ i ≤ n):

$$g_i(\psi_\nu) = \sum_{\mu=1}^{d} \psi_\mu D_{\mu\nu}(g_i) \quad (1 \le \nu \le d). \quad (18.22)$$

Here we followed the notation of (11.37) that represented a linear transformation
of a vector. Rewriting (18.22) more explicitly, we have

$$g_i(\psi_\nu) = (\psi_1\ \psi_2\ \cdots\ \psi_d)\begin{pmatrix} D_{11}(g_i) & \cdots & D_{1d}(g_i) \\ \vdots & \ddots & \vdots \\ D_{d1}(g_i) & \cdots & D_{dd}(g_i) \end{pmatrix}. \quad (18.23)$$

Comparing (18.23) with (11.37), we notice that ψ1, ψ2, ⋯, and ψd act as vectors.
The corresponding coordinates of the vectors (i.e., a column vector with entries x1, x2, ⋯,
xd) have been omitted.
Now let us make sure that a set L consisting of {D(g1), D(g2), ⋯, D(gn)} forms a
representation. Operating gj on (18.22), we have

$$g_j[g_i(\psi_\nu)] = [g_j g_i](\psi_\nu) = g_j\left[\sum_{\mu=1}^{d}\psi_\mu D_{\mu\nu}(g_i)\right] = \sum_{\mu=1}^{d} g_j(\psi_\mu)D_{\mu\nu}(g_i)$$
$$= \sum_{\mu=1}^{d}\sum_{\lambda=1}^{d}\psi_\lambda D_{\lambda\mu}(g_j)D_{\mu\nu}(g_i) = \sum_{\lambda=1}^{d}\psi_\lambda\left[D(g_j)D(g_i)\right]_{\lambda\nu}. \quad (18.24)$$

Putting gjgi = gk according to the multiplication of group elements, we get

$$g_k(\psi_\nu) = \sum_{\lambda=1}^{d} \psi_\lambda \left[D(g_j) D(g_i)\right]_{\lambda\nu}. \quad (18.25)$$

Meanwhile, replacing gi in (18.22) with gk, we have

$$g_k(\psi_\nu) = \sum_{\lambda=1}^{d} \psi_\lambda D_{\lambda\nu}(g_k). \quad (18.26)$$

Comparing (18.25) and (18.26) and considering the uniqueness of the vector representation based on the linear independence of the basis vectors (see Sect. 11.1), we get

$$\left[D(g_j) D(g_i)\right]_{\lambda\nu} = D_{\lambda\nu}(g_k) \equiv \left[D(g_k)\right]_{\lambda\nu}, \quad (18.27)$$

where the last identity follows the notation (11.38) of Sect. 11.2. Hence, we have

$$D(g_j) D(g_i) = D(g_k). \quad (18.28)$$

Thus, the set L consisting of {D(g1), D(g2), ⋯, D(gn)} is certainly a representation of the group ℊ = {g1, g2, ⋯, gn}. In such a case, the set B consisting of d linearly
independent functions (i.e., vectors)

$$B = \{\psi_1, \psi_2, \cdots, \psi_d\} \quad (18.29)$$

is said to be the basis functions of the representation D. The number d equals the
dimension of the representation. Correspondingly, the representation matrix is a (d, d)
square matrix. As remarked above, the correspondence between the elements
of L and ℊ is not necessarily one-to-one (isomorphic), but may be n-to-one
(homomorphic).
Definition 18.2 Let D and D′ be two representations of ℊ = {g1, g2, ⋯, gn}.
Suppose that these representations are related to each other by a similarity transformation such that

$$D'(g_i) = T^{-1} D(g_i) T \quad (1 \le i \le n), \quad (18.30)$$

where T is a non-singular matrix. Then, D and D′ are said to be equivalent representations, or simply equivalent. If the representations are not equivalent, they are
called inequivalent.
Suppose that $B = \{\psi_1, \psi_2, \cdots, \psi_d\}$ is a basis of a representation D of
ℊ = {g1, g2, ⋯, gn}. Then we have (18.22). Let T be a non-singular matrix. Using
T, we want to transform the basis of the representation D from B to a new set
$B' = \{\psi'_1, \psi'_2, \cdots, \psi'_d\}$. The individual elements of the new basis are expressed as

$$\psi'_\nu = \sum_{\lambda=1}^{d} \psi_\lambda T_{\lambda\nu}. \quad (18.31)$$

Since T is non-singular, this ensures that $\psi'_1, \psi'_2, \cdots$, and $\psi'_d$ are linearly independent and, hence, that B′ forms another basis set of D (see the discussion of Sect.
11.4). Thus, we can describe ψμ (1 ≤ μ ≤ d) in terms of the $\psi'_\nu$. That is,

$$\psi_\mu = \sum_{\lambda=1}^{d} \psi'_\lambda \left(T^{-1}\right)_{\lambda\mu}. \quad (18.32)$$

Operating gi on both sides of (18.31), we have

$$g_i(\psi'_\nu) = \sum_{\lambda=1}^{d} g_i(\psi_\lambda) T_{\lambda\nu} = \sum_{\lambda=1}^{d} \sum_{\mu=1}^{d} \psi_\mu D_{\mu\lambda}(g_i) T_{\lambda\nu}$$
$$= \sum_{\lambda=1}^{d} \sum_{\mu=1}^{d} \sum_{\kappa=1}^{d} \psi'_\kappa \left(T^{-1}\right)_{\kappa\mu} D_{\mu\lambda}(g_i) T_{\lambda\nu} = \sum_{\kappa=1}^{d} \psi'_\kappa \left[T^{-1} D(g_i) T\right]_{\kappa\nu}. \quad (18.33)$$

Let D′ be the representation of ℊ in reference to $B' = \{\psi'_1, \psi'_2, \cdots, \psi'_d\}$. Then we
have

$$g_i(\psi'_\nu) = \sum_{\kappa=1}^{d} \psi'_\kappa D'_{\kappa\nu}. \quad (18.34)$$

Hence, (18.30) follows in virtue of the linear independence of
$\psi'_1, \psi'_2, \cdots$, and $\psi'_d$. Thus, we see that the transformation of basis vectors via
(18.31) causes a similarity transformation between representation matrices. This is
in parallel with (11.81) and (11.88).
Below we show several important notions as definitions. We assume that the group
is a finite group.
Definition 18.3 Let D and $\widetilde{D}$ be two representations of ℊ = {g1, g2, ⋯, gn}. Let
D(gi) and $\widetilde{D}(g_i)$ be two representation matrices of gi (1 ≤ i ≤ n). If we construct Δ(gi)
such that

$$\Delta(g_i) = \begin{pmatrix} D(g_i) & \\ & \widetilde{D}(g_i) \end{pmatrix}, \quad (18.35)$$

then Δ(gi) is a representation as well. The representation Δ(gi) is said to be a direct
sum of D(gi) and $\widetilde{D}(g_i)$. We denote it by

$$\Delta(g_i) = D(g_i) \oplus \widetilde{D}(g_i). \quad (18.36)$$

The dimension of Δ(gi) is the sum of that of D(gi) and that of $\widetilde{D}(g_i)$.
Definition 18.4 Let D be a representation of ℊ and let D(gi) be a representation
matrix of a group element gi. If we can convert D(gi) to a block matrix such as
(18.35) by its similarity transformation, D is said to be a reducible representation, or
completely reducible. If the representation is not reducible, it is irreducible.

A reducible representation can be decomposed (or reduced) into a direct sum
of plural irreducible representations. Such an operation (or procedure) is called
reduction.
Definition 18.5 Let ℊ = {g1, g2, ⋯, gn} be a group and let V be a vector space.
Then, we denote a representation D of ℊ operating on V by

$$D : \mathscr{G} \to GL(V).$$

We call V a representation space (or carrier space) of D [3, 4].
From Definition 18.5, the dimension of the representation is identical with the
dimension of V. Suppose that there is a subspace W in V. If W is D(gi)-invariant
(see Sect. 12.2) for all gi ∈ ℊ, W is said to be an invariant subspace of V. Here, we
say that W is D(gi)-invariant if we have |x⟩ ∈ W ⟹ D(gi)|x⟩ ∈ W; see Sect. 12.2.
Notice that |x⟩ may well represent a function ψν of (18.22). In this context, we have the
following important theorem.
Theorem 18.2 [3] Let D: ℊ → GL(V) be a unitary representation over V. Let W be a
D(gi)-invariant subspace of V, where gi (1 ≤ i ≤ n) ∈ ℊ = {g1, g2, ⋯, gn}. Then, W⊥
is a D(gi)-invariant subspace of V as well.
Proof Suppose that |a⟩ ∈ W⊥ and let |b⟩ ∈ W. Then, from (13.86) we have

$$\langle b|D(g_i)|a\rangle = \langle a|D(g_i)^\dagger|b\rangle^* = \langle a|[D(g_i)]^{-1}|b\rangle^* = \langle a|D(g_i^{-1})|b\rangle^*,$$

where we used the fact that D(gi) is a unitary matrix; see (18.6). But, since W is
D(gi)-invariant from the supposition, $D(g_i^{-1})|b\rangle \in W$. Here notice that $g_i^{-1}$ must be
some gi (1 ≤ i ≤ n) ∈ ℊ. Therefore, we should have

$$\langle b|D(g_i)|a\rangle = \langle a|D(g_i^{-1})|b\rangle^* = 0.$$

This implies that D(gi)|a⟩ ∈ W⊥. This means that W⊥ is a D(gi)-invariant
subspace of V. This completes the proof. ∎
From Theorem 14.1, we have

$$V = W \oplus W^\perp.$$

Correspondingly, as in the case of (12.102), D(gi) is reduced as follows:

$$D(g_i) = \begin{pmatrix} D^{(W)}(g_i) & \\ & D^{(W^\perp)}(g_i) \end{pmatrix},$$


where $D^{(W)}(g_i)$ and $D^{(W^\perp)}(g_i)$ are representation matrices associated with the subspaces
W and W⊥, respectively. This is the converse of the additive representations that
appeared in Definition 18.3. From (11.17) and (11.18), we have

$$\dim V = \dim W + \dim W^\perp. \quad (18.37)$$

If V is a d-dimensional vector space, V is spanned by d linearly independent basis
vectors (or functions) ψμ (1 ≤ μ ≤ d). Suppose that the dimensions of W and W⊥ are $d^{(W)}$
and $d^{(W^\perp)}$, respectively. Then, W and W⊥ are spanned by $d^{(W)}$ linearly independent
vectors and $d^{(W^\perp)}$ linearly independent vectors, respectively. The subspaces W and
W⊥ may well be further decomposed into orthogonal complements [i.e., D(gi)-invariant subspaces of V]. In this situation, in general, we write

$$D(g_i) = D^{(1)}(g_i) \oplus D^{(2)}(g_i) \oplus \cdots \oplus D^{(\omega)}(g_i). \quad (18.38)$$

We will develop further detailed discussion in Sect. 18.4.
If the aforementioned decomposition cannot be done, D is irreducible. Therefore,
it will be of great importance to examine the dimensions of the irreducible representations
contained in (18.38). This is closely related to the choice of basis vectors and the
properties of the invariant subspaces. Notice that the same irreducible representation
may occur repeatedly in (18.38).
Before advancing to the next section, however, let us think of an example to get
used to these abstract concepts. This is at the same time for the purpose of anticipating the
contents of Chap. 19.
Example 18.1 Figure 18.1 shows structural formulae including resonance structures for the allyl radical. Thanks to the resonance structures, the allyl radical belongs to
C2v; see the multiplication table in Table 17.1. The molecule lies on the yz-plane, and
the line connecting C1 and a bonded H is the C2 axis (see Fig. 18.1). The C2 axis is
identical to the z-axis. The molecule has mirror symmetries with respect to the yz-
and zx-planes. We denote the π-orbitals of C1, C2, and C3 by ϕ1, ϕ2, and ϕ3, respectively. We suppose that these orbitals extend toward the x-direction with a positive
sign on the upper side and a negative sign on the lower side relative to the plane of the
paper. Notice that we follow the custom of the group theory notation with the coordinate
setting in Fig. 18.1.
We consider an inner vector space V3 spanned by ϕ1, ϕ2, and ϕ3. To show explicitly
that these are vectors of the inner vector space, we express them as |ϕ1⟩, |ϕ2⟩,
and |ϕ3⟩ in this example. Then, according to (11.19) of Sect. 11.1, we write

$$V^3 = \mathrm{Span}\{|\phi_1\rangle, |\phi_2\rangle, |\phi_3\rangle\}.$$

The vector space V3 is the representation space pertinent to the representation D of the
present example. Also, in parallel with (13.32) of Sect. 13.2, we express an arbitrary
vector |ψ⟩ ∈ V3 as

Fig. 18.1 Allyl radical and its resonance structure. The molecule is placed on the yz-plane. The z-axis is identical with a straight line connecting C1 and H (i.e., the C2 axis)

$$|\psi\rangle = c_1|\phi_1\rangle + c_2|\phi_2\rangle + c_3|\phi_3\rangle = |c_1\phi_1 + c_2\phi_2 + c_3\phi_3\rangle = (|\phi_1\rangle\ |\phi_2\rangle\ |\phi_3\rangle)\begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix}.$$

Let us now operate a group element of C2v. For example, choosing C2(z), we have

$$C_2(z)(|\psi\rangle) = (|\phi_1\rangle\ |\phi_2\rangle\ |\phi_3\rangle)\begin{pmatrix} -1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & -1 & 0 \end{pmatrix}\begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix}. \quad (18.39)$$

Thus, C2(z) is represented by a (3, 3) matrix. Other group elements are
represented similarly. These results are collected in Table 18.1, where each matrix
is given with respect to |ϕ1⟩, |ϕ2⟩, and |ϕ3⟩ as basis vectors. Notice that the matrix
representations differ from those of Table 17.2, where we chose |x⟩, |y⟩, and |z⟩ for
the basis vectors. From Table 18.1, we immediately see that the representation
matrices are reduced to an upper (1, 1) diagonal block (i.e., just a number) and a
lower (2, 2) diagonal block.
In Table 18.1, we find that all the matrices are Hermitian (as well as unitary).
Since C2v is an Abelian group (i.e., commutative), in light of Theorem 14.13, we
should be able to diagonalize these matrices by a single unitary similarity transformation all at once. In fact, E and σ′v(yz) are invariant with respect to a unitary similarity
transformation, and so we only have to diagonalize C2(z) and σv(zx) at once. As the
characteristic equation of the above (3, 3) matrix of (18.39), we have

Table 18.1 Matrix representation of symmetry operations given with respect to |ϕ1⟩, |ϕ2⟩, and |ϕ3⟩

E: $\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$  C2(z): $\begin{pmatrix} -1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & -1 & 0 \end{pmatrix}$  σv(zx): $\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}$  σ′v(yz): $\begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix}$

$$\begin{vmatrix} -\lambda - 1 & 0 & 0 \\ 0 & -\lambda & -1 \\ 0 & -1 & -\lambda \end{vmatrix} = 0.$$

Solving the above equation, we get λ = 1 or λ = −1 (as a double root). Also, as a
diagonalizing unitary matrix U, we get
$$U = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ 0 & \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{pmatrix} = U^\dagger. \quad (18.40)$$

Thus, (18.39) can be rewritten as

$$C_2(z)(|\psi\rangle) = (|\phi_1\rangle\ |\phi_2\rangle\ |\phi_3\rangle) U U^\dagger \begin{pmatrix} -1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & -1 & 0 \end{pmatrix} U U^\dagger \begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix}$$
$$= \left(|\phi_1\rangle\ \tfrac{1}{\sqrt{2}}(|\phi_2 + \phi_3\rangle)\ \tfrac{1}{\sqrt{2}}(|\phi_2 - \phi_3\rangle)\right) \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} c_1 \\ \tfrac{1}{\sqrt{2}}(c_2 + c_3) \\ \tfrac{1}{\sqrt{2}}(c_2 - c_3) \end{pmatrix}.$$

The diagonalization of the representation matrix σv(zx) using the same U as
above is left to the readers as an exercise (a numerical check is sketched at the end of this example). Table 18.2 shows the results of the
diagonalized representations with regard to the vectors |ϕ1⟩, $\tfrac{1}{\sqrt{2}}(|\phi_2 + \phi_3\rangle)$,
and $\tfrac{1}{\sqrt{2}}(|\phi_2 - \phi_3\rangle)$. Notice that the traces of the matrices remain unchanged between
Tables 18.1 and 18.2, i.e., before and after the unitary similarity transformation.
From Table 18.2, we find that these vectors are eigenfunctions of the individual
symmetry operations. Each diagonal element is a corresponding eigenvalue of
those operations. Of the three vectors, |ϕ1⟩ and $\tfrac{1}{\sqrt{2}}(|\phi_2 + \phi_3\rangle)$ have the same
eigenvalues with respect to the individual symmetry operations. With symmetry

Table 18.2 Matrix representation of symmetry operations given with respect to |ϕ1⟩, $\tfrac{1}{\sqrt{2}}(|\phi_2 + \phi_3\rangle)$, and $\tfrac{1}{\sqrt{2}}(|\phi_2 - \phi_3\rangle)$

E: $\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$  C2(z): $\begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$  σv(zx): $\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}$  σ′v(yz): $\begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix}$

operations C2 and σv(zx), the other vector $\tfrac{1}{\sqrt{2}}(|\phi_2 - \phi_3\rangle)$ has an eigenvalue of the
opposite sign to that of the former two vectors. Thus, we find that we have arrived at
"symmetry-adapted" vectors by taking linear combinations of the original vectors.
Returning to Table 17.2, we see that the representation matrices there had already
been diagonalized. In terms of the representation space, we constructed the representation matrices with respect to x, y, and z as symmetry-adapted basis vectors. In
the next chapter, we make the most of such vectors constructed by symmetry-adapted linear combination.
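The exercise mentioned above, and indeed the whole of Table 18.2, can be verified numerically. The following sketch (ours, assuming NumPy) applies the single unitary matrix U of (18.40) to all four matrices of Table 18.1 at once:

```python
import numpy as np

s = 1 / np.sqrt(2)
U = np.array([[1, 0, 0], [0, s, s], [0, s, -s]])   # (18.40); U = U^dagger

ops = {
    "E":            np.eye(3),
    "C2(z)":        np.array([[-1, 0, 0], [0, 0, -1], [0, -1, 0]]),
    "sigma_v(zx)":  np.array([[1, 0, 0], [0, 0, 1], [0, 1, 0]]),
    "sigma_v'(yz)": -np.eye(3),
}
for name, D in ops.items():
    Dt = U.T @ D @ U                                 # U is real, so U^dagger = U^T
    assert np.allclose(Dt, np.diag(np.diag(Dt)))     # diagonalized all at once
    assert np.isclose(np.trace(Dt), np.trace(D))     # trace is invariant
    print(name, np.diag(Dt).round())
```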
Meanwhile, by allocating appropriate numbers to the coefficients c1, c2, and c3, we can
represent any vector in V3 spanned by |ϕ1⟩, |ϕ2⟩, and |ϕ3⟩. In the next chapter, we
deal with molecular orbital (MO) calculations. Including the present case of the
allyl radical, we solve the energy eigenvalue problem by appropriately determining
those coefficients (i.e., eigenvectors) and the corresponding energy eigenvalues in a
representation space. The dimension of the representation space depends on the
number of molecular orbitals. The representation space is decomposed into several
or more (but finitely many) orthogonal complements according to (18.38). We will come
back to the present example in Sect. 19.4.4 and further investigate the problems
there.
As can be seen in the above example, the representation space accommodates
various types of vectors, e.g., mathematical functions in the present case. If the
representation space is decomposed into invariant subspaces, we can choose appropriate basis vectors for each subspace, the number of which is equal to the dimensionality of each irreducible representation [4]. In this context, the situation is related
to that of Part III, where we studied how a linear vector space is decomposed
into a direct sum of invariant subspaces that are spanned by associated basis vectors.
In particular, it will be of great importance to construct mutually orthogonal
symmetry-adapted vectors through linear combination of the original basis vectors in
each subspace associated with an irreducible representation. We further study these
important subjects in the following several sections.

18.3 Schur’s Lemmas and Grand Orthogonality Theorem (GOT)

Pivotal notions of the representation theory of finite groups rest upon Schur's
lemmas (the first lemma and the second lemma).
Schur's First Lemma [1, 2] Let D and $\widetilde{D}$ be two irreducible representations of ℊ.
Let the dimensions of the representations D and $\widetilde{D}$ be m and n, respectively. Suppose that
for ∀g ∈ ℊ the following relation holds:

$$D(g)M = M\widetilde{D}(g), \quad (18.41)$$

where M is an (m, n) matrix. Then we must have
Case (i): M = 0, or
Case (ii): M is a square matrix (i.e., m = n) with detM ≠ 0.
In Case (i), the representations D and $\widetilde{D}$ are inequivalent. In Case (ii), on the
other hand, D and $\widetilde{D}$ are equivalent.

Proof
(a) First, suppose that $m > n$. Let $B = \{\psi_1, \psi_2, \cdots, \psi_m\}$ be a basis set of the representation space related to $D$. Then we have

$$g(\psi_\nu) = \sum_{\mu=1}^{m} \psi_\mu D_{\mu\nu}(g) \quad (1 \le \nu \le m).$$

Next we form linear combinations of $\psi_1, \psi_2, \cdots,$ and $\psi_m$ such that

$$\phi_\nu = \sum_{\mu=1}^{m} \psi_\mu M_{\mu\nu} \quad (1 \le \nu \le n). \qquad (18.42)$$

Operating with $g$ on both sides of (18.42), we have

$$g(\phi_\nu) = \sum_{\mu=1}^{m} g(\psi_\mu) M_{\mu\nu} = \sum_{\mu=1}^{m}\Big[\sum_{\lambda=1}^{m} \psi_\lambda D_{\lambda\mu}(g)\Big] M_{\mu\nu} = \sum_{\lambda=1}^{m} \psi_\lambda \Big[\sum_{\mu=1}^{m} D_{\lambda\mu}(g) M_{\mu\nu}\Big] = \sum_{\lambda=1}^{m} \psi_\lambda \Big[\sum_{\mu=1}^{n} M_{\lambda\mu} \tilde{D}_{\mu\nu}(g)\Big] = \sum_{\mu=1}^{n} \Big[\sum_{\lambda=1}^{m} \psi_\lambda M_{\lambda\mu}\Big] \tilde{D}_{\mu\nu}(g) = \sum_{\mu=1}^{n} \phi_\mu \tilde{D}_{\mu\nu}(g). \qquad (18.43)$$

With the fourth equality of (18.43), we used (18.41). Therefore, $\tilde{B} = \{\phi_1, \phi_2, \cdots, \phi_n\}$ forms a basis of a representation of $\tilde{D}$.
If $m > n$, we would thus have been able to construct a basis using the $n$ ($< m$) functions $\phi_\nu$ ($1 \le \nu \le n$), each a linear combination of the $\psi_\mu$ ($1 \le \mu \le m$). In that case, we would have obtained a representation of smaller dimension $n$ for $\tilde{D}$ by taking linear combinations of $\psi_1, \psi_2, \cdots,$ and $\psi_m$. As $n < m$ by supposition, this implies that, as in (18.35), $D$ would be decomposed into representations whose dimensions are smaller than $m$, in contradiction to the supposition that $D$ is irreducible. To avoid this contradiction, we must have $M = 0$. This makes (18.41) hold trivially.
(b) Next we consider the case $m < n$. Taking the adjoint (complex conjugate transpose) of (18.41) and exchanging both sides, we get

$$\tilde{D}(g)^{\dagger} M^{\dagger} = M^{\dagger} D(g)^{\dagger}. \qquad (18.44)$$

Then, from (18.21) and (18.28) we have

$$D(g)^{\dagger} D(g) = E \quad \text{and} \quad D(g)D(g^{-1}) = D(e). \qquad (18.45)$$

Therefore, we get

$$D(g^{-1}) = D(g)^{-1} = D(g)^{\dagger}.$$

Similarly, we have

$$\tilde{D}(g^{-1}) = \tilde{D}(g)^{-1} = \tilde{D}(g)^{\dagger}.$$

Thus, (18.44) is rewritten as

$$\tilde{D}(g^{-1}) M^{\dagger} = M^{\dagger} D(g^{-1}). \qquad (18.46)$$

The number of rows of $M^{\dagger}$ is larger than the number of its columns; this time, the dimension $n$ of $\tilde{D}(g^{-1})$ is larger than the dimension $m$ of $D(g^{-1})$. A group element $g$ designates an arbitrary element of ℊ, and so does $g^{-1}$. Thus, (18.46) can be treated in parallel with (18.41), where $m > n$ with an $(m, n)$ matrix $M$. Figure 18.2 graphically shows the magnitude relationship between the dimensions of the representation matrices and $M$. Thus, in parallel with the argument of (a), we exclude the case $m < n$ as well.
(c) We consider the third case, $m = n$. As before, we make linear combinations of the vectors contained in the basis set $B = \{\psi_1, \psi_2, \cdots, \psi_m\}$ such that

$$\phi_\nu = \sum_{\mu=1}^{m} \psi_\mu M_{\mu\nu} \quad (1 \le \nu \le m), \qquad (18.47)$$

where $M$ is an $(m, m)$ square matrix. If $\det M = 0$, then $\phi_1, \phi_2, \cdots,$ and $\phi_m$ are linearly dependent (see Sect. 11.4), so the number $p$ of linearly independent vectors satisfies $p < m$. As in (18.43) again, this implies that we would have obtained a representation of smaller dimension $p$ for $\tilde{D}$, in contradiction to the supposition that $\tilde{D}$ is irreducible. To avoid this contradiction, we must have $\det M \neq 0$. These complete the proof. ∎

(a): >

( ) × = ( )
×
(m, m) (m, n) (m, n) (n, n)

(b): <

( ) × = × ( )
(n, n) (n, m) (n, m) (m, m)

Fig. 18.2 Magnitude relationship between dimensions of representation matrices and M. (a) m > n.
(b) m < n. The diagram is based on (18.41) and (18.46) of the text

Schur's Second Lemma [1, 2] Let $D$ be a representation of ℊ. Suppose that for all $g \in$ ℊ we have

$$D(g)M = MD(g). \qquad (18.48)$$

Then, if $D$ is irreducible, $M = cE$ with $c$ an arbitrary complex number, where $E$ is the identity matrix.
Proof Let $c$ be an arbitrarily chosen complex number. From (18.48) we have

$$D(g)(M - cE) = (M - cE)D(g). \qquad (18.49)$$

If $D$ is irreducible, Schur's First Lemma implies that we must have either (i) $M - cE = 0$ or (ii) $\det(M - cE) \neq 0$. A matrix $M$ has at least one eigenvalue $\lambda$ (Sect. 12.1), and so, choosing $\lambda$ for $c$ in (18.49), we have $\det(M - \lambda E) = 0$. Consequently, only the former case is allowed. That is, we have

$$M = cE. \qquad (18.50)$$
Schur's lemmas lead to an important orthogonality theorem that plays a fundamental role in many scientific fields. The orthogonality theorem covers both the matrices themselves and their traces (or characters).

Theorem 18.3: Grand Orthogonality Theorem (GOT) [5] Let $D^{(1)}, D^{(2)}, \cdots$ be all the inequivalent irreducible representations of a group ℊ = {g_1, g_2, ⋯, g_n} of order $n$. Let $D^{(\alpha)}$ and $D^{(\beta)}$ be two irreducible representations chosen from among $D^{(1)}, D^{(2)}, \cdots$. Then, regarding their matrix representations, we have the following relationship:
$$\sum_{g} D_{ij}^{(\alpha)}(g)^{*}\, D_{kl}^{(\beta)}(g) = \frac{n}{d_\alpha}\, \delta_{\alpha\beta}\, \delta_{ik}\, \delta_{jl}, \qquad (18.51)$$

where $\Sigma_g$ means that the summation is taken over all $n$ group elements, and $d_\alpha$ denotes the dimension of the representation $D^{(\alpha)}$. The symbol $\delta_{\alpha\beta}$ means that $\delta_{\alpha\beta} = 1$ when $D^{(\alpha)}$ and $D^{(\beta)}$ are equivalent and $\delta_{\alpha\beta} = 0$ when they are inequivalent.
Proof First we prove the case where $D^{(\alpha)} = D^{(\beta)}$. For simplicity of expression we omit the superscript and denote $D^{(\alpha)}$ simply by $D$. Let us construct a matrix $A$ such that

$$A = \sum_{g} D(g)\, X\, D(g^{-1}), \qquad (18.52)$$

where $X$ is an arbitrary matrix. Hence,

$$D(g')A = \sum_{g} D(g')D(g)XD(g^{-1}) = \sum_{g} D(g')D(g)XD(g^{-1})D(g'^{-1})D(g') = \sum_{g} D(g'g)\,X\,D[(g'g)^{-1}]\, D(g'). \qquad (18.53)$$

Thanks to the rearrangement theorem, for fixed $g'$ the element $g'g$ runs through all the group elements as $g$ does. Therefore, we have

$$\sum_{g} D(g'g)\,X\,D[(g'g)^{-1}] = \sum_{g} D(g)\,X\,D(g^{-1}) = A. \qquad (18.54)$$

Thus,

$$D(g)A = AD(g). \qquad (18.55)$$

According to Schur's Second Lemma, we have

$$A = \lambda E. \qquad (18.56)$$

The value of the constant $\lambda$ depends upon the choice of $X$. Let $X$ be $\delta_i^{(l)}\delta_j^{(m)}$, the matrix whose elements are all zero except for the $(l, m)$-component, which is 1 (Sects. 12.5 and 12.6). Thus, from (18.52) and (18.56) we have

$$\sum_{g,p,q} D_{ip}(g)\, \delta_p^{(l)} \delta_q^{(m)}\, D_{qj}(g^{-1}) = \sum_{g} D_{il}(g)\, D_{mj}(g^{-1}) = \lambda_{lm}\, \delta_{ij}, \qquad (18.57)$$

where $\lambda_{lm}$ is a constant to be determined. Using the unitarity of the representation, we have
$$\sum_{g} D_{il}(g)\, D_{jm}(g)^{*} = \lambda_{lm}\, \delta_{ij}. \qquad (18.58)$$
Next, we wish to determine the coefficients $\lambda_{lm}$. To this end, setting $i = j$ and summing over $i$ in (18.57), we get for the LHS

$$\sum_{g}\sum_{i} D_{il}(g)\, D_{mi}(g^{-1}) = \sum_{g} \big[D(g^{-1})D(g)\big]_{ml} = \sum_{g} \big[D(e)\big]_{ml} = \sum_{g} \delta_{ml} = n\,\delta_{ml}, \qquad (18.59)$$

where $n$ is the order of the group. As for the RHS, we have

$$\sum_{i} \lambda_{lm}\, \delta_{ii} = \lambda_{lm}\, d, \qquad (18.60)$$

where $d$ is the dimension of $D$. From (18.59) and (18.60), we get

$$\lambda_{lm}\, d = n\,\delta_{lm} \quad \text{or} \quad \lambda_{lm} = \frac{n}{d}\, \delta_{lm}. \qquad (18.61)$$

Therefore, from (18.58),

$$\sum_{g} D_{il}(g)\, D_{jm}(g)^{*} = \frac{n}{d}\, \delta_{lm}\, \delta_{ij}. \qquad (18.62)$$

Specifying the species of the irreducible representation, we get

$$\sum_{g} D_{il}^{(\alpha)}(g)\, D_{jm}^{(\alpha)}(g)^{*} = \frac{n}{d_\alpha}\, \delta_{lm}\, \delta_{ij}, \qquad (18.63)$$

where $d_\alpha$ is the dimension of $D^{(\alpha)}$.
Next, we examine the relationship between two inequivalent irreducible representations. Let $D^{(\alpha)}$ and $D^{(\beta)}$ be such representations with dimensions $d_\alpha$ and $d_\beta$, respectively. Let us construct a matrix $B$ such that

$$B = \sum_{g} D^{(\alpha)}(g)\, X\, D^{(\beta)}(g^{-1}), \qquad (18.64)$$

where $X$ is again an arbitrary matrix. Hence,

$$D^{(\alpha)}(g')B = \sum_{g} D^{(\alpha)}(g')D^{(\alpha)}(g)\,X\,D^{(\beta)}(g^{-1}) = \sum_{g} D^{(\alpha)}(g')D^{(\alpha)}(g)\,X\,D^{(\beta)}(g^{-1})D^{(\beta)}(g'^{-1})D^{(\beta)}(g') = \sum_{g} D^{(\alpha)}(g'g)\,X\,D^{(\beta)}[(g'g)^{-1}]\, D^{(\beta)}(g') = B\,D^{(\beta)}(g'). \qquad (18.65)$$

According to Schur's First Lemma, we have

$$B = 0. \qquad (18.66)$$

Putting $X = \delta_i^{(l)}\delta_j^{(m)}$ as before and rewriting (18.64), we get

$$\sum_{g} D_{il}^{(\alpha)}(g)\, D_{jm}^{(\beta)}(g)^{*} = 0. \qquad (18.67)$$

Combining (18.63) and (18.67), we get (18.51). These procedures complete the proof. ∎
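GOT can also be checked numerically for a group with a multidimensional irreducible representation. The following Python/NumPy sketch is a non-authoritative illustration using C3v (order n = 6): A1 and A2 are its one-dimensional irreducible representations, and the two-dimensional representation E is realized by the standard rotation and reflection matrices (an assumed but conventional choice).

import numpy as np

def rot(t):  # proper rotation through angle t
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s], [s, c]])

def ref(t):  # reflection; with the rotations this closes into C3v
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, s], [s, -c]])

n = 6
E2 = [rot(0), rot(2*np.pi/3), rot(4*np.pi/3),
      ref(0), ref(2*np.pi/3), ref(4*np.pi/3)]
irreps = {"A1": [np.eye(1)]*6,
          "A2": [np.eye(1)]*3 + [-np.eye(1)]*3,
          "E":  E2}

for a, Da in irreps.items():
    for b, Db in irreps.items():
        da = Da[0].shape[0]
        # S[i,j,k,l] = sum_g D^(a)_ij(g)* D^(b)_kl(g), cf. (18.51)
        S = sum(np.einsum("ij,kl->ijkl", M.conj(), N) for M, N in zip(Da, Db))
        if a == b:
            T = (n/da) * np.einsum("ik,jl->ijkl", np.eye(da), np.eye(da))
        else:
            T = np.zeros_like(S)
        print(a, b, np.allclose(S, T))  # True in every case

Each printed comparison confirms (18.51): the sum vanishes for inequivalent pairs and equals (n/dα) δ_ik δ_jl for α = β.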

18.4 Characters

Representation matrices of a group are square matrices. In Part III we examined the properties of the trace, i.e., the sum of the diagonal elements of a square matrix. In group theory the trace is called a character.

Definition 18.6 Let $D$ be a (matrix) representation of a group ℊ = {g_1, g_2, ⋯, g_n}. The sum of the diagonal elements, $\chi(g)$, is defined as follows:

$$\chi(g) \equiv \mathrm{Tr}\,D(g) = \sum_{i=1}^{d} D_{ii}(g), \qquad (18.68)$$

where $g$ stands for the group elements $g_1, g_2, \cdots,$ and $g_n$; Tr stands for "trace"; and $d$ is the dimension of the representation $D$. Let ∁ be the set defined as

$$∁ = \{\chi(g_1), \chi(g_2), \cdots, \chi(g_n)\}. \qquad (18.69)$$

Then the set ∁ is called the character of $D$. The character of an irreducible representation is said to be an irreducible character.

Let us describe several properties of the character or trace.

(i) The character of the identity element, $\chi(e)$, is equal to the dimension $d$ of the representation. This is because the identity element is represented by a unit matrix.
(ii) Let $P$ and $Q$ be two square matrices. Then we have

$$\mathrm{Tr}(PQ) = \mathrm{Tr}(QP). \qquad (18.70)$$

This is because

$$\sum_{i} (PQ)_{ii} = \sum_{i}\sum_{j} P_{ij} Q_{ji} = \sum_{j}\sum_{i} Q_{ji} P_{ij} = \sum_{j} (QP)_{jj}. \qquad (18.71)$$

Putting $Q = SP^{-1}$ in (18.70), we get

$$\mathrm{Tr}(PSP^{-1}) = \mathrm{Tr}(SP^{-1}P) = \mathrm{Tr}(S). \qquad (18.72)$$

Therefore, we have the following property:

(iii) Characters of group elements that are conjugate to each other are equal. If $g_i$ and $g_j$ are conjugate, these elements are connected by means of a suitable element $g$ such that

$$g\, g_i\, g^{-1} = g_j. \qquad (18.73)$$

Accordingly, the representation matrices satisfy

$$D(g)D(g_i)D(g^{-1}) = D(g)D(g_i)[D(g)]^{-1} = D(g_j). \qquad (18.74)$$

Taking the trace of both sides of (18.74), we have

$$\chi(g_i) = \chi(g_j). \qquad (18.75)$$

(iv) Any two equivalent representations have the same trace. This immediately follows from (18.30).
There are several orthogonality theorems about traces. Among them, the following theorem is well known.

Theorem 18.4 The traces of irreducible representations satisfy the following orthogonality relation:

$$\sum_{g} \chi^{(\alpha)}(g)^{*}\, \chi^{(\beta)}(g) = n\,\delta_{\alpha\beta}, \qquad (18.76)$$

where $\chi^{(\alpha)}$ and $\chi^{(\beta)}$ are the traces of the irreducible representations $D^{(\alpha)}$ and $D^{(\beta)}$, respectively.

Proof In (18.51), putting $i = j$ and $k = l$ on both sides and summing over all $i$ and $k$, we have
$$\sum_{g}\sum_{i,k} D_{ii}^{(\alpha)}(g)^{*}\, D_{kk}^{(\beta)}(g) = \sum_{i,k} \frac{n}{d_\alpha}\, \delta_{\alpha\beta}\, \delta_{ik}\, \delta_{ik} = \sum_{i,k} \frac{n}{d_\alpha}\, \delta_{\alpha\beta}\, \delta_{ik} = \frac{n}{d_\alpha}\, \delta_{\alpha\beta}\, d_\alpha = n\,\delta_{\alpha\beta}. \qquad (18.77)$$

From (18.68) and (18.77), we get (18.76). This completes the proof. ∎
Since the character takes the same value for all group elements belonging to the same conjugacy class $K_l$, we may write it as $\chi(K_l)$ and rewrite the summation of (18.76) as a summation over the conjugacy classes. Thus, we have

$$\sum_{l=1}^{n_c} \chi^{(\alpha)}(K_l)^{*}\, \chi^{(\beta)}(K_l)\, k_l = n\,\delta_{\alpha\beta}, \qquad (18.78)$$

where $n_c$ denotes the number of conjugacy classes in the group and $k_l$ indicates the number of group elements contained in the class $K_l$.
We have seen a case where a representation matrix can be reduced to two (or more) block matrices, as in (18.35). As already seen in Part III, such decomposition into block matrices takes place with normal matrices (including unitary matrices). The character is often used to examine the constitution of a reducible representation or a reducible matrix.

Alternatively, if a unitary matrix (or, more widely, a normal matrix) is decomposed into block matrices, we say that the unitary matrix comprises a direct sum of those block matrices. In physics and chemistry dealing with atoms, molecules, crystals, etc., we very often encounter such a situation. Extending (18.36), the relation can generally be summarized as

$$D(g_i) = D^{(1)}(g_i) \oplus D^{(2)}(g_i) \oplus \cdots \oplus D^{(\omega)}(g_i), \qquad (18.79)$$

where $D(g_i)$ is a reducible representation for a group element $g_i$, and $D^{(1)}(g_i), D^{(2)}(g_i), \cdots,$ and $D^{(\omega)}(g_i)$ are irreducible representations of the group. The notation $D^{(\omega)}(g_i)$ means that $D^{(\omega)}(g_i)$ may be equivalent (or identical) to $D^{(1)}(g_i), D^{(2)}(g_i)$, etc., or may be inequivalent to them. More specifically, the same irreducible representation may well appear several times.
To make the above situation clear, we usually use the following equation instead:

$$D(g_i) = \sum_{\alpha} q_\alpha D^{(\alpha)}(g_i), \qquad (18.80)$$

where $q_\alpha$ is zero or a positive integer and the $D^{(\alpha)}$ are the different types of irreducible representations. If the same $D^{(\alpha)}$ appears repeatedly in the direct sum, then $q_\alpha$ specifies how many times $D^{(\alpha)}$ appears there; unless $D^{(\alpha)}$ appears, $q_\alpha$ is zero.
Bearing the above in mind, we take the trace of (18.80). Then we have

$$\chi(g) = \sum_{\alpha} q_\alpha\, \chi^{(\alpha)}(g), \qquad (18.81)$$

where we omitted the subscript $i$ indicating the element. To find $q_\alpha$, let us multiply both sides of (18.81) by $\chi^{(\alpha)}(g)^{*}$ and take the summation over the group elements. That is,

$$\sum_{g} \chi^{(\alpha)}(g)^{*}\, \chi(g) = \sum_{\beta} q_\beta \sum_{g} \chi^{(\alpha)}(g)^{*}\, \chi^{(\beta)}(g) = \sum_{\beta} q_\beta\, n\,\delta_{\alpha\beta} = q_\alpha\, n, \qquad (18.82)$$

where we used (18.76) in the second equality. Thus, we get

$$q_\alpha = \frac{1}{n} \sum_{g} \chi^{(\alpha)}(g)^{*}\, \chi(g). \qquad (18.83)$$

The integer $q_\alpha$ explicitly gives the number of times $D^{(\alpha)}(g)$ appears in the reducible representation $D(g)$. The expression pertinent to the classes is

$$q_\alpha = \frac{1}{n} \sum_{i} \chi^{(\alpha)}(K_i)^{*}\, \chi(K_i)\, k_i, \qquad (18.84)$$

where $K_i$ and $k_i$ denote the $i$-th class of the group and the number of elements belonging to $K_i$, respectively.
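Equation (18.83) is easy to apply in practice. The following Python/NumPy sketch is an assumed example consistent with the allyl-radical discussion of Sect. 18.2: it reduces the character χ = (3, −1, 1, −3) of the three-orbital representation over C2v (these traces are our assumption for Table 18.1).

import numpy as np

# Character table of C2v (cf. Table 18.4); each class has one element, n = 4.
chars = {"A1": np.array([1,  1,  1,  1]),
         "A2": np.array([1,  1, -1, -1]),
         "B1": np.array([1, -1,  1, -1]),
         "B2": np.array([1, -1, -1,  1])}
n = 4

# Assumed reducible character of the (phi1, phi2, phi3) representation.
chi_red = np.array([3, -1, 1, -3])

for name, chi in chars.items():
    q = np.sum(chi * chi_red) / n   # (18.83); the characters here are real
    print(name, int(q))             # A2 appears once, B1 twice

The output A2 once and B1 twice agrees with the symmetry-adapted vectors found in Sect. 18.2: (1/√2)(|φ2⟩ − |φ3⟩) spans A2, while |φ1⟩ and (1/√2)(|φ2⟩ + |φ3⟩) span two copies of B1.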

18.5 Regular Representation and Group Algebra

Now, the reader may wonder how many different irreducible representations exist for a group. To answer this question, let us introduce a special representation called the regular representation.

Definition 18.7 Let ℊ = {g_1, g_2, ⋯, g_n} be a group. Let us define an $(n, n)$ square matrix $D^{(R)}(g_\nu)$ for an arbitrary group element $g_\nu$ ($1 \le \nu \le n$) such that

$$\big[D^{(R)}(g_\nu)\big]_{ij} = \delta\big(g_i^{-1} g_\nu g_j\big) \quad (1 \le i, j \le n), \qquad (18.85)$$

$$\text{where} \quad \delta(g_\nu) = \begin{cases} 1 & \text{for } g_\nu = e \ (\text{i.e., the identity}), \\ 0 & \text{for } g_\nu \neq e. \end{cases} \qquad (18.86)$$

Let us consider the set

$$R = \big\{D^{(R)}(g_1), D^{(R)}(g_2), \cdots, D^{(R)}(g_n)\big\}. \qquad (18.87)$$

Then the set $R$ is said to be the regular representation of the group ℊ.
In fact, $R$ is a representation. This is confirmed as follows. In (18.85), if $g_i^{-1} g_\nu g_j = e$, then $\delta(g_i^{-1} g_\nu g_j) = 1$; this occurs when $g_\nu g_j = g_i$ (A). Meanwhile, consider the situation where $g_j^{-1} g_\mu g_k = e$; this occurs when $g_\mu g_k = g_j$ (B). Replacing $g_j$ in (A) with that in (B), we have

$$g_\nu g_\mu g_k = g_i. \qquad (18.88)$$

That is,

$$g_i^{-1} g_\nu g_\mu g_k = e. \qquad (18.89)$$

If we choose $g_i$ and $g_\nu$, then $g_j$ is uniquely decided from (A). If $g_\mu$ is chosen separately, then $g_k$ is uniquely decided from (B) as well, because $g_j$ has already been uniquely decided. Thus, performing the following matrix calculation, we get

$$\sum_{j} \big[D^{(R)}(g_\nu)\big]_{ij} \big[D^{(R)}(g_\mu)\big]_{jk} = \sum_{j} \delta\big(g_i^{-1} g_\nu g_j\big)\, \delta\big(g_j^{-1} g_\mu g_k\big) = \delta\big(g_i^{-1} g_\nu g_\mu g_k\big) = \big[D^{(R)}(g_\nu g_\mu)\big]_{ik}. \qquad (18.90)$$

Rewriting (18.90) as a matrix product, we get

$$D^{(R)}(g_\nu)\, D^{(R)}(g_\mu) = D^{(R)}(g_\nu g_\mu). \qquad (18.91)$$

Thus, $D^{(R)}$ is certainly a representation.
To further confirm this, let us think of an example.

Example 18.2 Let us consider a thiophene molecule, which we have already examined in Sect. 17.2. In the multiplication table, we arrange E, C_2, σ_v, σ_v′ in the first column and their inverse elements E, C_2, σ_v, σ_v′ in the first row; in this case, each inverse element coincides with the element itself. Paying attention, e.g., to C_2, we allocate the number 1 to each place where C_2 appears and the number 0 otherwise. The resulting matrix is the regular representation of C_2; see Table 18.3. Thus, as $D^{(R)}(C_2)$ we get

$$D^{(R)}(C_2) = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}. \qquad (18.92)$$

As evidenced by (18.92), the rearrangement theorem ensures that the number 1 appears once and only once in each column and each row, in such a way that the individual column and row vectors are linearly independent. Thus, at the same time, we confirm that the matrix is unitary.
Another characteristic of the regular representation is that the identity is represented by an identity matrix. In this example, we have

$$D^{(R)}(E) = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}. \qquad (18.93)$$

For the other symmetry operations, we have

$$D^{(R)}(\sigma_v) = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}, \quad D^{(R)}(\sigma_v') = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix}.$$
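Definition (18.85) translates directly into code. The following Python/NumPy sketch builds the regular representation of C2v from its multiplication table (Table 18.3); the encoding of the table as an index array is our assumption, not the book's notation.

import numpy as np

# Multiplication table of C2v in the order (E, C2, sv, sv'):
# table[i, j] = index of g_i * g_j (cf. Table 18.3).
table = np.array([[0, 1, 2, 3],
                  [1, 0, 3, 2],
                  [2, 3, 0, 1],
                  [3, 2, 1, 0]])

def D_reg(nu):
    """Regular representation (18.85): [D(g_nu)]_ij = delta(g_i^{-1} g_nu g_j)."""
    n = len(table)
    inv = [int(np.where(table[i] == 0)[0][0]) for i in range(n)]  # g_i^{-1}
    M = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(n):
            M[i, j] = int(table[inv[i], table[nu, j]] == 0)
    return M

for nu, name in enumerate(["E", "C2", "sv", "sv'"]):
    print(name, D_reg(nu).tolist())  # D_reg(1) reproduces (18.92)

The traces of the printed matrices are (4, 0, 0, 0), anticipating (18.94) below.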

Let $\chi^{(R)}(g_\nu)$ be the character of the regular representation. Then, according to the definition (18.85),

$$\chi^{(R)}(g_\nu) = \sum_{i=1}^{n} \delta\big(g_i^{-1} g_\nu g_i\big) = \begin{cases} n & \text{for } g_\nu = e, \\ 0 & \text{for } g_\nu \neq e. \end{cases} \qquad (18.94)$$

As can be seen from (18.92), the regular representation is reducible, because the matrix can be decomposed into block matrices. Therefore, the representation can be reduced to a direct sum of irreducible representations such that

$$D^{(R)} = \sum_{\alpha} q_\alpha D^{(\alpha)}, \qquad (18.95)$$

where $q_\alpha$ is a positive integer or zero and $D^{(\alpha)}$ is an irreducible representation. Then, from (18.81), we have

Table 18.3 How to make a regular representation of C2v

  C2v      | E⁻¹  | C2(z)⁻¹ | σv(zx)⁻¹ | σv′(yz)⁻¹
  E        | E    | C2      | σv       | σv′
  C2(z)    | C2   | E       | σv′      | σv
  σv(zx)   | σv   | σv′     | E        | C2
  σv′(yz)  | σv′  | σv      | C2       | E
$$\chi^{(R)}(g_\nu) = \sum_{\alpha} q_\alpha\, \chi^{(\alpha)}(g_\nu), \qquad (18.96)$$

where $\chi^{(\alpha)}$ is the trace of the irreducible representation $D^{(\alpha)}$. Using (18.83) and (18.94),

$$q_\alpha = \frac{1}{n} \sum_{g} \chi^{(\alpha)}(g)^{*}\, \chi^{(R)}(g) = \frac{1}{n}\, \chi^{(\alpha)}(e)^{*}\, \chi^{(R)}(e) = \frac{1}{n}\, \chi^{(\alpha)}(e)^{*}\, n = \chi^{(\alpha)}(e)^{*} = d_\alpha. \qquad (18.97)$$

Note that the dimension $d_\alpha$ of the representation $D^{(\alpha)}$ is equal to the trace of its identity matrix. Notice also from (18.95) and (18.97) that $D^{(R)}$ contains every irreducible representation $D^{(\alpha)}$ exactly $d_\alpha$ times. To show this more clearly, Table 18.4 gives the character table of C2v. The regular representation matrices are given in Example 18.2. For these, we have

$$D^{(R)}(C_{2v}) = A_1 + A_2 + B_1 + B_2. \qquad (18.98)$$

This relation indicates that each irreducible representation of C2v is contained exactly once, which equals its dimension (every irreducible representation of C2v being one-dimensional). Returning to (18.94) and (18.96) and replacing $q_\alpha$ with $d_\alpha$ there, we get

$$\sum_{\alpha} d_\alpha\, \chi^{(\alpha)}(g_\nu) = \begin{cases} n & \text{for } g_\nu = e, \\ 0 & \text{for } g_\nu \neq e. \end{cases} \qquad (18.99)$$
In particular, when $g_\nu = e$, we again have $\chi^{(\alpha)}(e) = d_\alpha$ (note that $d_\alpha$ is a real number!). That is,

$$\sum_{\alpha} d_\alpha^{\,2} = n. \qquad (18.100)$$

This is a very important relation in the representation theory of finite groups in that (18.100) sets an upper limit on the number of irreducible representations and their dimensions: that number cannot exceed the order of the group.
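Relations (18.97) and (18.100) are readily confirmed numerically. A short sketch for C3v, whose regular representation contains A1 and A2 once and the two-dimensional E twice (the per-element character lists are assumptions consistent with the standard C3v table):

import numpy as np

# Per-element characters of the C3v irreps (elements ordered e, 2C3, 3sv).
chars = {"A1": [1]*6, "A2": [1]*3 + [-1]*3, "E": [2, -1, -1, 0, 0, 0]}
n = 6
chi_reg = [n] + [0]*(n - 1)   # chi_R(e) = n and 0 otherwise, cf. (18.94)

dims = {}
for a, chi in chars.items():
    q = sum(c*r for c, r in zip(chi, chi_reg)) / n  # (18.97): q_alpha = d_alpha
    dims[a] = int(q)
print(dims)                                  # {'A1': 1, 'A2': 1, 'E': 2}
print(sum(d**2 for d in dims.values()) == n) # (18.100): 1 + 1 + 4 = 6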

Table 18.4 Character table of C2v

       | E | C2(z) | σv(zx) | σv′(yz) |
  A1   | 1 |  1    |  1     |  1      | z; x², y², z²
  A2   | 1 |  1    | −1     | −1      | xy
  B1   | 1 | −1    |  1     | −1      | x; zx
  B2   | 1 | −1    | −1     |  1      | y; yz
In (18.76) and (18.78), we showed the orthogonality relationship between traces. There is another important orthogonality relationship between them. To prove it, we need the notion of a group algebra [5]. The argument is as follows:
(a) Let us think of a set comprising group elements, expressed as

$$\aleph = \sum_{g} a_g\, g, \qquad (18.101)$$

where $g$ is a group element of a group ℊ = {g_1, g_2, ⋯, g_n}; $a_g$ is an arbitrarily chosen complex number; and $\Sigma_g$ means that the summation is taken over the group elements. Let $\aleph'$ be another set defined similarly to (18.101). That is,

$$\aleph' = \sum_{g'} a'_{g'}\, g'. \qquad (18.102)$$

Then we can define the following sum:

$$\aleph + \aleph' = \sum_{g} a_g\, g + \sum_{g'} a'_{g'}\, g' = \sum_{g} \big(a_g + a'_g\big)\, g. \qquad (18.103)$$

Also, we get

$$\aleph\,\aleph' = \Big(\sum_{g} a_g\, g\Big)\Big(\sum_{g'} a'_{g'}\, g'\Big) = \sum_{g}\sum_{g'} a_g\, a'_{g'}\, g g' = \sum_{g}\Big(\sum_{g'} a_{g g'^{-1}}\, a'_{g'}\Big)\, g, \qquad (18.104)$$

where we used the rearrangement theorem and a suitable relabeling of the group elements. Thus, we see that the above-defined set ℵ is closed under summation (i.e., linear combination) and multiplication. A set closed under summation and multiplication is said to be an algebra; when the underlying set forms a group, the algebra is called a group algebra.

So far we have treated the calculation as a multiplication between two elements $g$ and $g'$, i.e., $g \diamond g'$ (see Sect. 16.1). Now we regard the calculation as involving summation as well. In that case, the group elements act as basis vectors of a vector space. Bearing this in mind, let us further define specific group algebras.
(b) Let $K_i$ be the $i$-th conjugacy class of the group ℊ = {g_1, g_2, ⋯, g_n}, and let $K_i$ be written as

$$K_i = \big\{A_1^{(i)}, A_2^{(i)}, \cdots, A_{k_i}^{(i)}\big\}, \qquad (18.105)$$

where $k_i$ is the number of elements belonging to $K_i$. Now think of the set $gK_ig^{-1}$ ($\forall g \in$ ℊ). Then, by the definition of a class, we have

$$gK_ig^{-1} \subset K_i. \qquad (18.106)$$

Multiplying by $g^{-1}$ from the left and by $g$ from the right on both sides, we get $K_i \subset g^{-1}K_ig$. Since $g$ is arbitrarily chosen, replacing $g$ with $g^{-1}$ we have

$$K_i \subset gK_ig^{-1}. \qquad (18.107)$$

Therefore, we get

$$gK_ig^{-1} = K_i. \qquad (18.108)$$

Meanwhile, for $A_\alpha^{(i)}, A_\beta^{(i)} \in K_i$ with $A_\alpha^{(i)} \neq A_\beta^{(i)}$ ($1 \le \alpha, \beta \le k_i$) we have

$$gA_\alpha^{(i)}g^{-1} \neq gA_\beta^{(i)}g^{-1} \quad (\forall g \in ℊ). \qquad (18.109)$$

This is because, if equality held in (18.109), we would have $A_\alpha^{(i)} = A_\beta^{(i)}$, a contradiction.
(c) Let $K$ be a set collecting several classes, described as

$$K = \sum_{i} a_i K_i, \qquad (18.110)$$

where $a_i$ is a positive integer or zero. Thanks to (18.108) and (18.110), we have

$$gKg^{-1} = K. \qquad (18.111)$$

Conversely, if a group algebra $K$ satisfies (18.111), then $K$ can be expressed as a sum of classes as in (18.110). To see this, suppose that $K$ is not expressed by (18.110) but is instead described by

$$K = \sum_{i} a_i K_i + Q, \qquad (18.112)$$

where $Q$ is an "incomplete" set that does not form a class. Then,

$$gKg^{-1} = g\Big(\sum_{i} a_i K_i + Q\Big)g^{-1} = \sum_{i} a_i\, gK_ig^{-1} + gQg^{-1} = \sum_{i} a_i K_i + gQg^{-1} = K = \sum_{i} a_i K_i + Q, \qquad (18.113)$$

where the second-to-last equality comes from (18.111). Thus, from (18.113) we get

$$gQg^{-1} = Q \quad (\forall g \in ℊ). \qquad (18.114)$$

By the definition of classes, this implies that $Q$ must form a "complete" class, in contradiction to the supposition. Thus, (18.110) holds.

In Sect. 16.3 we described several characteristics of an invariant subgroup. As readily seen from (16.11) and (18.111), any invariant subgroup consists of two or more entire classes. Conversely, if a subgroup comprises entire classes, it must be an invariant subgroup.
(d) Let us think of a product of classes. Let $K_i$ and $K_j$ be sets described as in (18.105). We define the product $K_iK_j$ as the set containing all products $A_\alpha^{(i)}A_\beta^{(j)}$ ($1 \le \alpha \le k_i$, $1 \le \beta \le k_j$). That is,

$$K_iK_j = \sum_{l=1}^{k_i} \sum_{m=1}^{k_j} A_l^{(i)} A_m^{(j)}. \qquad (18.115)$$

Multiplying (18.115) by $g$ ($\forall g \in$ ℊ) from the left and by $g^{-1}$ from the right, we get

$$gK_iK_jg^{-1} = \big(gK_ig^{-1}\big)\big(gK_jg^{-1}\big) = K_iK_j. \qquad (18.116)$$

From the above discussion, we get

$$K_iK_j = \sum_{l} c_{ijl} K_l, \qquad (18.117)$$

where $c_{ijl}$ is a positive integer or zero. In fact, when we take $gK_iK_jg^{-1}$, we merely permute the terms in (18.117).
(e) If two elements of the group ℊ = {g_1, g_2, ⋯, g_n} are conjugate to each other, their inverse elements are conjugate to each other as well. In fact, suppose that for $g_\mu, g_\nu \in K_i$ we have

$$g_\mu = g\, g_\nu\, g^{-1}. \qquad (18.118)$$

Then, taking the inverse of (18.118), we get

$$g_\mu^{-1} = g\, g_\nu^{-1}\, g^{-1}. \qquad (18.119)$$

Thus, given a class $K_i$, there exists another class $K_{i'}$ that consists of the inverses of the elements of $K_i$. If $g_\mu \neq g_\nu$, then $g_\mu^{-1} \neq g_\nu^{-1}$. Therefore, $K_i$ and $K_{i'}$ are of the same order. With $k_i$ the number of elements contained in $K_i$ and $k_{i'}$ that in $K_{i'}$, we have

$$k_i = k_{i'}. \qquad (18.120)$$
If $K_j \neq K_{i'}$ (i.e., $K_j$ is not identical with $K_{i'}$ as a set), then $K_iK_j$ does not contain $e$. In fact, suppose that for $g_\rho \in K_i$ there were a group element $b \in K_j$ such that $bg_\rho = e$. Then we would have

$$b = g_\rho^{-1} \quad \text{and} \quad b \in K_{i'}. \qquad (18.121)$$

This would mean that $b \in K_j \cap K_{i'}$, implying that $K_j = K_{i'}$ by the definition of a class, in contradiction to $K_j \neq K_{i'}$. Consequently, $K_iK_j$ does not contain $e$. Taking the product of the classes $K_i$ and $K_{i'}$, on the other hand, we obtain the identity $e$ precisely $k_i$ times. Rewriting (18.117), we have

$$K_iK_j = c_{ij1}K_1 + \sum_{l \neq 1} c_{ijl} K_l, \qquad (18.122)$$

where $K_1 = \{e\}$. As mentioned below, we are most interested in the first term of (18.122). In (18.122), if $K_j = K_{i'}$, we have

$$c_{ij1} = k_i. \qquad (18.123)$$

On the other hand, if $K_j \neq K_{i'}$, then $c_{ij1} = 0$. Summarizing the above arguments, we get

$$c_{ij1} = \begin{cases} k_i & \text{for } K_j = K_{i'}, \\ 0 & \text{for } K_j \neq K_{i'}. \end{cases} \qquad (18.124)$$

Or, symbolically, we write

$$c_{ij1} = k_i\, \delta_{ji'}. \qquad (18.125)$$

18.6 Classes and Irreducible Representations

After the aforementioned considerations, we have the following theorem:

Theorem 18.5 Let ℊ = {g_1, g_2, ⋯, g_n} be a group. The traces of its irreducible representations satisfy the following orthogonality relation:

$$\sum_{\alpha=1}^{n_r} \chi^{(\alpha)}(K_i)\, \chi^{(\alpha)}(K_j)^{*} = \frac{n}{k_i}\, \delta_{ij}, \qquad (18.126)$$

where the summation over $\alpha$ is taken over all $n_r$ inequivalent irreducible representations; $K_i$ and $K_j$ indicate conjugacy classes; and $k_i$ denotes the number of elements contained in the $i$-th class $K_i$.
Proof Rewriting (18.108), we have

$$gK_i = K_ig \quad (\forall g \in ℊ). \qquad (18.127)$$

Since a homomorphic correspondence holds between a group element and its representation matrix, a similar correspondence holds with (18.127) as well. Let $\hat{K}_i$ be the sum of the $k_i$ matrices of the $\alpha$-th irreducible representation $D^{(\alpha)}$ over the class $K_i$, expressed as

$$\hat{K}_i = \sum_{g \in K_i} D^{(\alpha)}(g). \qquad (18.128)$$

Note that in (18.128) $D^{(\alpha)}$ functions as a linear transformation with respect to the group algebra. From (18.127), we have

$$D^{(\alpha)}(g)\,\hat{K}_i = \hat{K}_i\, D^{(\alpha)}(g) \quad (\forall g \in ℊ). \qquad (18.129)$$

Since $D^{(\alpha)}$ is an irreducible representation, $\hat{K}_i$ must, on the basis of Schur's Second Lemma, be expressed as

$$\hat{K}_i = \lambda E. \qquad (18.130)$$

To determine $\lambda$, we take the trace of both sides of (18.130). Then, from (18.128) and (18.130) we get

$$k_i\, \chi^{(\alpha)}(K_i) = \lambda\, d_\alpha. \qquad (18.131)$$

Thus, we get

$$\hat{K}_i = \frac{k_i}{d_\alpha}\, \chi^{(\alpha)}(K_i)\, E. \qquad (18.132)$$

Next, corresponding to (18.122), we have

$$\hat{K}_i \hat{K}_j = \sum_{l} c_{ijl}\, \hat{K}_l. \qquad (18.133)$$

Replacing $\hat{K}_l$ in (18.133) with the expression (18.132), we get
$$k_i k_j\, \chi^{(\alpha)}(K_i)\, \chi^{(\alpha)}(K_j) = d_\alpha \sum_{l} c_{ijl}\, k_l\, \chi^{(\alpha)}(K_l). \qquad (18.134)$$

Returning to (18.99) and rewriting it, we have

$$\sum_{\alpha} d_\alpha\, \chi^{(\alpha)}(K_i) = n\,\delta_{i1}, \qquad (18.135)$$

where again $K_1 = \{e\}$. With respect to (18.134), we sum over all the irreducible representations $\alpha$. Then we have

$$\sum_{\alpha} k_i k_j\, \chi^{(\alpha)}(K_i)\, \chi^{(\alpha)}(K_j) = \sum_{l} c_{ijl}\, k_l \Big[\sum_{\alpha} d_\alpha\, \chi^{(\alpha)}(K_l)\Big] = \sum_{l} c_{ijl}\, k_l\, n\,\delta_{l1} = c_{ij1}\, n. \qquad (18.136)$$

In (18.136) we remark that $k_1 = 1$, meaning that the number of group elements contained in $K_1 = \{e\}$ is 1. Rewriting (18.136), we get

$$k_i k_j \sum_{\alpha} \chi^{(\alpha)}(K_i)\, \chi^{(\alpha)}(K_j) = c_{ij1}\, n = k_i\, n\,\delta_{ji'}, \qquad (18.137)$$

where we used (18.125) in the last equality. Moreover, using

$$\chi^{(\alpha)}(K_{i'}) = \chi^{(\alpha)}(K_i)^{*}, \qquad (18.138)$$

we get

$$k_i k_j \sum_{\alpha} \chi^{(\alpha)}(K_i)\, \chi^{(\alpha)}(K_j)^{*} = k_i\, n\,\delta_{ji}. \qquad (18.139)$$

Rewriting (18.139), we finally get

$$\sum_{\alpha=1}^{n_r} \chi^{(\alpha)}(K_i)\, \chi^{(\alpha)}(K_j)^{*} = \frac{n}{k_i}\, \delta_{ij}. \qquad (18.126)$$

∎

Equations (18.78) and (18.126) are well known as orthogonality relations. So far, however, we have had no idea about the mutual relationship in magnitude between the two numbers $n_c$ and $n_r$. In (18.78), let us consider the following set $S$:

$$S = \Big\{\sqrt{k_1}\, \chi^{(\alpha)}(K_1),\ \sqrt{k_2}\, \chi^{(\alpha)}(K_2),\ \cdots,\ \sqrt{k_{n_c}}\, \chi^{(\alpha)}(K_{n_c})\Big\}. \qquad (18.140)$$
Viewing the individual components of $S$ as the coordinates of an $n_c$-dimensional vector, (18.78) can be considered an inner product expressed using the (complex) coordinates of two such vectors. At the same time, (18.78) represents an orthogonality relationship between these vectors. Since we can obtain at most $n_c$ mutually orthogonal (i.e., linearly independent) vectors in an $n_c$-dimensional space, for the number $n_r$ of such vectors we have

$$n_r \le n_c. \qquad (18.141)$$

Here $n_r$ is equal to the number of different $\alpha$, i.e., the number of irreducible representations.

Meanwhile, in (18.126) we consider the following set $S'$:

$$S' = \big\{\chi^{(1)}(K_i),\ \chi^{(2)}(K_i),\ \cdots,\ \chi^{(n_r)}(K_i)\big\}. \qquad (18.142)$$

Similarly, the individual components of $S'$ can be considered the coordinates of an $n_r$-dimensional vector. Again, (18.126) implies an orthogonality relation among such vectors. Therefore, for the number $n_c$ of mutually orthogonal vectors we have

$$n_c \le n_r. \qquad (18.143)$$

Thus, from (18.141) and (18.143) we finally reach a simple but very important conclusion about the relationship between $n_c$ and $n_r$:

$$n_r = n_c. \qquad (18.144)$$

That is, the number $n_r$ of inequivalent irreducible representations is equal to the number $n_c$ of conjugacy classes of the group. An immediate and important consequence of (18.144), together with (18.100), is that every representation of an Abelian group is one-dimensional. This is because in an Abelian group each group element constitutes a conjugacy class by itself; we have $n_c = n$ accordingly.
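Because of (18.144) the character table is square, and (18.78) and (18.126) become its row and column orthogonality. A brief numerical sketch for C3v (the class sizes and table entries are the standard ones, stated here as an assumption):

import numpy as np

# C3v character table over classes K1={e}, K2={2C3}, K3={3sv};
# class sizes k_l, order n = 6; rows: A1, A2, E.
k = np.array([1, 2, 3])
X = np.array([[1,  1,  1],
              [1,  1, -1],
              [2, -1,  0]])
n = 6

# Row orthogonality (18.78): sum_l chi_a(K_l)* chi_b(K_l) k_l = n delta_ab
print(np.allclose((X * k) @ X.T, n * np.eye(3)))
# Column orthogonality (18.126): sum_a chi_a(K_i) chi_a(K_j)* = (n/k_i) delta_ij
print(np.allclose(X.T @ X, np.diag(n / k)))

Both checks print True; the characters of C3v are real, so complex conjugation is omitted.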

18.7 Projection Operators

In Sect. 18.2 we described how basis vectors (or functions) are transformed by a symmetry operation. In that case the symmetry operation is performed by a group element that belongs to a transformation group. More specifically, given a group ℊ = {g_1, g_2, ⋯, g_n} and a set of basis vectors $\psi_1, \psi_2, \cdots,$ and $\psi_d$, the basis vectors are transformed by $g_i \in$ ℊ ($1 \le i \le n$) such that
$$g_i(\psi_\nu) = \sum_{\mu=1}^{d} \psi_\mu D_{\mu\nu}(g_i). \qquad (18.22)$$

We may well ask, then, how we can construct such basis vectors. In this section we address this question. A central concept here is the projection operator. We have already studied the definition and basic properties of projection operators. In this section we deal with them bearing in mind that we apply group theory to molecular science, especially to quantum chemical calculations.

In Sect. 18.6 we examined the permissible number of irreducible representations and reached the conclusion that the number $n_r$ of inequivalent irreducible representations is equal to the number $n_c$ of conjugacy classes of the group. According to this conclusion, we modify (18.22) such that

$$g\big(\psi_i^{(\alpha)}\big) = \sum_{j=1}^{d_\alpha} \psi_j^{(\alpha)}\, D_{ji}^{(\alpha)}(g), \qquad (18.145)$$

where $\alpha$ and $d_\alpha$ denote the $\alpha$-th irreducible representation and its dimension, respectively; the subscript $i$ is omitted from $g_i$ for simplicity. This naturally leads to the next question: how are the basis vectors $\psi_j^{(\alpha)}$ related to the basis vectors $\psi_j^{(\beta)}$ belonging to a different irreducible representation $\beta$?

Suppose that we have an arbitrarily chosen function $f$. Then $f$ may be assumed to contain components of various irreducible representations. Thus, let us assume that $f$ is decomposed into these components such that

$$f = \sum_{\alpha} \sum_{m} c_m^{(\alpha)}\, \psi_m^{(\alpha)}, \qquad (18.146)$$

where $c_m^{(\alpha)}$ is an expansion coefficient and $\psi_m^{(\alpha)}$ is the $m$-th component of the $\alpha$-th irreducible representation.

Now, let us define the following operator:

$$P_{l(m)}^{(\alpha)} \equiv \frac{d_\alpha}{n} \sum_{g} D_{lm}^{(\alpha)}(g)^{*}\, g, \qquad (18.147)$$

where $\Sigma_g$ means that the summation is taken over all $n$ group elements $g$, and $d_\alpha$ denotes the dimension of the representation $D^{(\alpha)}$. Operating with $P_{l(m)}^{(\alpha)}$ on $f$, we have
$$P_{l(m)}^{(\alpha)} f = \frac{d_\alpha}{n} \sum_{g} D_{lm}^{(\alpha)}(g)^{*}\, gf = \frac{d_\alpha}{n} \sum_{g} D_{lm}^{(\alpha)}(g)^{*} \sum_{\nu}\sum_{k} c_k^{(\nu)}\, g\psi_k^{(\nu)} = \frac{d_\alpha}{n} \sum_{\nu}\sum_{k} c_k^{(\nu)} \sum_{j=1}^{d_\nu} \psi_j^{(\nu)} \Big[\sum_{g} D_{lm}^{(\alpha)}(g)^{*}\, D_{jk}^{(\nu)}(g)\Big] = \frac{d_\alpha}{n} \sum_{\nu}\sum_{k} c_k^{(\nu)} \sum_{j=1}^{d_\nu} \psi_j^{(\nu)}\, \frac{n}{d_\alpha}\, \delta_{\alpha\nu}\, \delta_{lj}\, \delta_{mk} = c_m^{(\alpha)}\, \psi_l^{(\alpha)}, \qquad (18.148)$$

where we used (18.51) in the second-to-last equality.

Thus, the implication of the operator $P_{l(m)}^{(\alpha)}$ is that, if $f$ contains the component $\psi_m^{(\alpha)}$, i.e., $c_m^{(\alpha)} \neq 0$, then $P_{l(m)}^{(\alpha)}$ extracts the component $c_m^{(\alpha)} \psi_m^{(\alpha)}$ from $f$ and converts it to $c_m^{(\alpha)} \psi_l^{(\alpha)}$. If the $\psi_m^{(\alpha)}$ component is not contained in $f$, then (18.148) gives $P_{l(m)}^{(\alpha)} f = 0$; in that case we choose a more suitable function for $f$. In this context, we will investigate an example of a quantum chemical calculation later.
In the above case, let us call $P_{l(m)}^{(\alpha)}$ a projection operator sensu lato. In Sect. 14.1 we dealt with several aspects of projection operators; there we mentioned that a projection operator should, in the rigorous sense, be idempotent and Hermitian (Definition 14.1). To address another important aspect of projection operators, let us prove an important relation in the following theorem.

Theorem 18.6 [1, 2] Let $P_{l(m)}^{(\alpha)}$ and $P_{s(t)}^{(\beta)}$ be projection operators defined as in (18.147). Then the following equation holds:

$$P_{l(m)}^{(\alpha)}\, P_{s(t)}^{(\beta)} = \delta_{\alpha\beta}\, \delta_{ms}\, P_{l(t)}^{(\alpha)}. \qquad (18.149)$$

Proof We have
$$\begin{aligned} P_{l(m)}^{(\alpha)}\, P_{s(t)}^{(\beta)} &= \frac{d_\alpha}{n}\sum_{g} D_{lm}^{(\alpha)}(g)^{*}\, g\;\frac{d_\beta}{n}\sum_{g'} D_{st}^{(\beta)}(g')^{*}\, g' = \frac{d_\alpha d_\beta}{n^2}\sum_{g}\sum_{g'} D_{lm}^{(\alpha)}(g)^{*}\, D_{st}^{(\beta)}(g^{-1}g')^{*}\; g\,g^{-1}g' \\ &= \frac{d_\alpha d_\beta}{n^2}\sum_{g'}\sum_{k}\Big[\sum_{g} D_{lm}^{(\alpha)}(g)^{*}\, D_{ks}^{(\beta)}(g)\Big]\, D_{kt}^{(\beta)}(g')^{*}\, g' = \frac{d_\alpha d_\beta}{n^2}\sum_{g'}\sum_{k}\frac{n}{d_\alpha}\,\delta_{\alpha\beta}\,\delta_{lk}\,\delta_{ms}\, D_{kt}^{(\beta)}(g')^{*}\, g' \\ &= \delta_{\alpha\beta}\,\delta_{ms}\,\frac{d_\beta}{n}\sum_{g'} D_{lt}^{(\beta)}(g')^{*}\, g' = \delta_{\alpha\beta}\,\delta_{ms}\, P_{l(t)}^{(\alpha)}, \end{aligned} \qquad (18.150)$$

where we replaced $g'$ by $g^{-1}g'$ (rearrangement theorem), used $D_{st}^{(\beta)}(g^{-1}g')^{*} = \sum_{k} D_{ks}^{(\beta)}(g)\, D_{kt}^{(\beta)}(g')^{*}$ (homomorphism and unitarity), and applied (18.51). This completes the proof. ∎


In the above proof, we used the rearrangement theorem and grand orthogonality
theorem (GOT) as well as the homomorphism and unitarity of the representation
matrices.
Comparing (18.146) and (18.148), we notice that the term $c_m^{(\alpha)} \psi_m^{(\alpha)}$ is not extracted as it stands; rather, $c_m^{(\alpha)} \psi_l^{(\alpha)}$ is obtained instead. This is due to the linearity of $P_{l(m)}^{(\alpha)}$. Nonetheless, this is somewhat inconvenient for practical purposes. To overcome this inconvenience, we modify (18.149). Putting $s = m$ in (18.149), we have

$$P_{l(m)}^{(\alpha)}\, P_{m(t)}^{(\beta)} = \delta_{\alpha\beta}\, P_{l(t)}^{(\alpha)}. \qquad (18.151)$$

We further modify the relation. Putting $\beta = \alpha$, we have

$$P_{l(m)}^{(\alpha)}\, P_{m(t)}^{(\alpha)} = P_{l(t)}^{(\alpha)}. \qquad (18.152)$$

Putting $t = l$ furthermore,

$$P_{l(m)}^{(\alpha)}\, P_{m(l)}^{(\alpha)} = P_{l(l)}^{(\alpha)}. \qquad (18.153)$$

In particular, putting $m = l$ moreover, we get
$$P_{l(l)}^{(\alpha)}\, P_{l(l)}^{(\alpha)} = \big[P_{l(l)}^{(\alpha)}\big]^2 = P_{l(l)}^{(\alpha)}. \qquad (18.154)$$

In fact, putting $m = l$ in (18.148), we get

$$P_{l(l)}^{(\alpha)} f = c_l^{(\alpha)}\, \psi_l^{(\alpha)}. \qquad (18.155)$$

This means that the term $c_l^{(\alpha)} \psi_l^{(\alpha)}$ has been extracted entirely.

Moreover, putting $\beta = \alpha$, $s = l$, and $t = m$ in (18.149), we have

$$P_{l(m)}^{(\alpha)}\, P_{l(m)}^{(\alpha)} = \big[P_{l(m)}^{(\alpha)}\big]^2 = \delta_{ml}\, P_{l(m)}^{(\alpha)}.$$

Therefore, for $P_{l(m)}^{(\alpha)}$ to be an idempotent operator, we must have $m = l$. That is, of the various operators $P_{l(m)}^{(\alpha)}$, only $P_{l(l)}^{(\alpha)}$ is eligible as an idempotent operator.

Meanwhile, writing $P_{l(l)}^{(\alpha)}$ out fully, we have

$$P_{l(l)}^{(\alpha)} = \frac{d_\alpha}{n} \sum_{g} D_{ll}^{(\alpha)}(g)^{*}\, g. \qquad (18.156)$$

Taking the complex conjugate transposition (i.e., the adjoint) of (18.156), we have

$$\big[P_{l(l)}^{(\alpha)}\big]^{\dagger} = \frac{d_\alpha}{n} \sum_{g} D_{ll}^{(\alpha)}(g)\, g^{\dagger} = \frac{d_\alpha}{n} \sum_{g^{-1}} D_{ll}^{(\alpha)}(g^{-1})\, \big(g^{-1}\big)^{\dagger} = \frac{d_\alpha}{n} \sum_{g} D_{ll}^{(\alpha)}(g)^{*}\, g = P_{l(l)}^{(\alpha)}, \qquad (18.157)$$

where we used the unitarity of $g$ (in the third equality) and the equivalence of summation over $g$ and over $g^{-1}$. Notice that the notation $g^{\dagger}$ is less common; it should be interpreted as meaning that $g^{\dagger}$ operates on a vector constituting a representation space. Thus, the notation $g^{\dagger}$ implies that $g^{\dagger}$ is equivalent to the adjoint of its unitary representation matrix, $D^{(\alpha)}(g)^{\dagger}$. Note also that $D_{ll}^{(\alpha)}$ is not a matrix but a (complex) number; namely, in (18.156) $D_{ll}^{(\alpha)}(g)^{*}$ is a coefficient of the operator $g$.

Equations (18.154) and (18.157) establish that $P_{l(l)}^{(\alpha)}$ is a projection operator in the rigorous sense. Let us accordingly call $P_{l(l)}^{(\alpha)}$ a projection operator sensu stricto. Also, we notice that $c_m^{(\alpha)} \psi_m^{(\alpha)}$ is extracted entirely from an arbitrary function $f$, including its coefficient. This situation resembles that of (12.205).
Regarding $P_{l(m)}^{(\alpha)}$ ($l \neq m$), on the other hand, we have

$$\big[P_{l(m)}^{(\alpha)}\big]^{\dagger} = \frac{d_\alpha}{n} \sum_{g} D_{lm}^{(\alpha)}(g)\, g^{\dagger} = \frac{d_\alpha}{n} \sum_{g^{-1}} D_{lm}^{(\alpha)}(g^{-1})\, \big(g^{-1}\big)^{\dagger} = \frac{d_\alpha}{n} \sum_{g} D_{ml}^{(\alpha)}(g)^{*}\, g = P_{m(l)}^{(\alpha)}. \qquad (18.158)$$

Hence, $P_{l(m)}^{(\alpha)}$ is not Hermitian. We have many other related equations; for instance,

$$\big[P_{l(m)}^{(\alpha)}\, P_{m(l)}^{(\alpha)}\big]^{\dagger} = \big[P_{m(l)}^{(\alpha)}\big]^{\dagger} \big[P_{l(m)}^{(\alpha)}\big]^{\dagger} = P_{l(m)}^{(\alpha)}\, P_{m(l)}^{(\alpha)}. \qquad (18.159)$$

Therefore, $P_{l(m)}^{(\alpha)} P_{m(l)}^{(\alpha)}$ is Hermitian, recovering the relation (18.153).
Putting $m = l$ and $t = s$ in (18.149), we get

$$P_{l(l)}^{(\alpha)}\, P_{s(s)}^{(\beta)} = \delta_{\alpha\beta}\, \delta_{ls}\, P_{l(s)}^{(\alpha)}. \qquad (18.160)$$

As in the case of (18.146), we assume that a function $h$ is described as

$$h = \sum_{\alpha} \sum_{m} d_m^{(\alpha)}\, \phi_m^{(\alpha)}, \qquad (18.161)$$

where $\phi_m^{(\alpha)}$ is transformed in the same manner as $\psi_m^{(\alpha)}$ in (18.146), but is linearly independent of $\psi_m^{(\alpha)}$. Namely, we have

$$g\big(\phi_i^{(\alpha)}\big) = \sum_{j=1}^{d_\alpha} \phi_j^{(\alpha)}\, D_{ji}^{(\alpha)}(g). \qquad (18.162)$$

Tangible examples can be seen in Chap. 19.


ð αÞ
Operating PlðlÞ on both sides of (18.155), we have

h i h i
ðαÞ 2 ðαÞ ðαÞ ðαÞ ð αÞ ðαÞ ðαÞ
PlðlÞ f ¼ PlðlÞ cl ψ l ¼ PlðlÞ f ¼ cl ψ l , ð18:163Þ

h i
ð αÞ 2 ð αÞ
where with the second equality we used PlðlÞ ¼ PlðlÞ . That is, we have

h i
ðαÞ ðαÞ ðαÞ ðαÞ ðαÞ
PlðlÞ cl ψ l ¼ cl ψ l : ð18:164Þ

ðαÞ ðαÞ
This equation means that cl ψ l is an eigenfunction corresponding to an
ð αÞ ðαÞ ðαÞ
eigenvalue 1 of PlðlÞ . In other words, once cl ψ l is extracted from f, it belongs to
the “position” l of an irreducible representation α. Furthermore, for some constants
c and d as well as the functions f and h that appeared in (18.146) and (18.161), we
consider a following equation:
$$P_{l(l)}^{(\alpha)} \big[c\, c_l^{(\alpha)} \psi_l^{(\alpha)} + d\, d_l^{(\alpha)} \phi_l^{(\alpha)}\big] = c\, P_{l(l)}^{(\alpha)} c_l^{(\alpha)} \psi_l^{(\alpha)} + d\, P_{l(l)}^{(\alpha)} d_l^{(\alpha)} \phi_l^{(\alpha)} = c\, c_l^{(\alpha)} \psi_l^{(\alpha)} + d\, d_l^{(\alpha)} \phi_l^{(\alpha)}, \qquad (18.165)$$

where in the last equality we used (18.164). This means that an arbitrary linear combination of $c_l^{(\alpha)} \psi_l^{(\alpha)}$ and $d_l^{(\alpha)} \phi_l^{(\alpha)}$ again belongs to the position $l$ of the irreducible representation $\alpha$. If $\psi_l^{(\alpha)}$ and $\phi_l^{(\alpha)}$ are linearly independent, we can construct two orthonormal basis vectors following Theorem 13.2 (the Gram–Schmidt orthonormalization theorem). If there are other linearly independent vectors $p_l^{(\alpha)} \xi_l^{(\alpha)}, q_l^{(\alpha)} \varphi_l^{(\alpha)}, \cdots$, where $\xi_l^{(\alpha)}, \varphi_l^{(\alpha)}$, etc. belong to the position $l$ of the irreducible representation $\alpha$, then $c\, c_l^{(\alpha)} \psi_l^{(\alpha)} + d\, d_l^{(\alpha)} \phi_l^{(\alpha)} + p\, p_l^{(\alpha)} \xi_l^{(\alpha)} + q\, q_l^{(\alpha)} \varphi_l^{(\alpha)} + \cdots$ again belongs to the position $l$ of the irreducible representation $\alpha$. Thus, we can construct orthonormal basis vectors of the representation space according to Theorem 13.2.
Regarding arbitrary functions $f$ and $h$ as vectors and using (18.160), we form the inner product

$$\langle h | P_{l(l)}^{(\alpha)} P_{s(s)}^{(\beta)} | f \rangle = \langle h | P_{l(l)}^{(\alpha)} P_{l(l)}^{(\alpha)} P_{s(s)}^{(\beta)} | f \rangle = \langle h | P_{l(l)}^{(\alpha)}\, \delta_{\alpha\beta}\, \delta_{ls}\, P_{l(s)}^{(\alpha)} | f \rangle = \delta_{\alpha\beta}\, \delta_{ls}\, \langle h | P_{l(l)}^{(\alpha)} P_{l(s)}^{(\alpha)} | f \rangle, \qquad (18.166)$$

where in the first equality we used $P_{l(l)}^{(\alpha)} = P_{l(l)}^{(\alpha)} P_{l(l)}^{(\alpha)}$ from (18.154). Meanwhile, from (18.148) we have

$$P_{l(s)}^{(\alpha)} | f \rangle = c_s^{(\alpha)}\, | \psi_l^{(\alpha)} \rangle. \qquad (18.167)$$

Also, using (18.155), we get

$$P_{l(l)}^{(\alpha)} | h \rangle = d_l^{(\alpha)}\, | \phi_l^{(\alpha)} \rangle. \qquad (18.168)$$

Taking the adjoint of (18.168), we have

$$\langle h | \big[P_{l(l)}^{(\alpha)}\big]^{\dagger} = \langle h | P_{l(l)}^{(\alpha)} = d_l^{(\alpha)*}\, \langle \phi_l^{(\alpha)} |, \qquad (18.169)$$

where we used $\big[P_{l(l)}^{(\alpha)}\big]^{\dagger} = P_{l(l)}^{(\alpha)}$ from (18.157). The relation (18.169) follows the notation of Sect. 13.3. Substituting (18.167) and (18.168) into (18.166),

$$\langle h | P_{l(l)}^{(\alpha)} P_{s(s)}^{(\beta)} | f \rangle = \delta_{\alpha\beta}\, \delta_{ls}\, d_l^{(\alpha)*} c_s^{(\alpha)}\, \langle \phi_l^{(\alpha)} | \psi_l^{(\alpha)} \rangle.$$

Meanwhile, (18.166) can also be described as
$$\langle h | P_{l(l)}^{(\alpha)} P_{s(s)}^{(\beta)} | f \rangle = d_l^{(\alpha)*}\, \langle \phi_l^{(\alpha)} | c_s^{(\beta)} \psi_s^{(\beta)} \rangle = d_l^{(\alpha)*} c_s^{(\beta)}\, \langle \phi_l^{(\alpha)} | \psi_s^{(\beta)} \rangle. \qquad (18.170)$$

For (18.166) and (18.170) to be identical, we must have

$$\delta_{\alpha\beta}\, \delta_{ls}\, d_l^{(\alpha)*} c_s^{(\alpha)}\, \langle \phi_l^{(\alpha)} | \psi_l^{(\alpha)} \rangle = d_l^{(\alpha)*} c_s^{(\beta)}\, \langle \phi_l^{(\alpha)} | \psi_s^{(\beta)} \rangle.$$

Deleting the coefficients, we get

$$\langle \phi_l^{(\alpha)} | \psi_s^{(\beta)} \rangle = \delta_{\alpha\beta}\, \delta_{ls}\, \langle \phi_l^{(\alpha)} | \psi_l^{(\alpha)} \rangle. \qquad (18.171)$$

The relation (18.171) is frequently used to judge whether definite integrals vanish. The functional forms depend upon the actual problems we encounter in various situations. We will deal with this problem in Chap. 19 in relation to, e.g., the discussion of optical transitions and the evaluation of overlap integrals.

To evaluate (18.170): if $\alpha \neq \beta$ or $l \neq s$, we simply get

$$\langle h | P_{l(l)}^{(\alpha)} P_{s(s)}^{(\beta)} | f \rangle = 0 \quad (\alpha \neq \beta \ \text{or} \ l \neq s). \qquad (18.172)$$

That is, under the given condition $\alpha \neq \beta$ or $l \neq s$, $P_{s(s)}^{(\beta)} | f \rangle$ and $P_{l(l)}^{(\alpha)} | h \rangle$ are orthogonal (see Theorem 13.3 of Sect. 13.4). The relation clearly indicates that functions belonging to different irreducible representations ($\alpha \neq \beta$) are mutually orthogonal. Even when the functions belong to the same irreducible representation, they are orthogonal if they are allocated to different "places" as basis vectors, designated by $l$, $s$, etc. Here the place means the index $j$ of $\psi_j^{(\alpha)}$ in (18.145), or of $\phi_j^{(\alpha)}$ in (18.162), that designates the "order" of $\psi_j^{(\alpha)}$ within $\psi_1^{(\alpha)}, \psi_2^{(\alpha)}, \cdots, \psi_{d_\alpha}^{(\alpha)}$, or of $\phi_j^{(\alpha)}$ within $\phi_1^{(\alpha)}, \phi_2^{(\alpha)}, \cdots, \phi_{d_\alpha}^{(\alpha)}$. This occurs if the representation is multidimensional (i.e., of dimensionality $d_\alpha$).
Putting $\alpha = \beta$ in (18.160), we get

$$P_{l(l)}^{(\alpha)}\, P_{s(s)}^{(\alpha)} = \delta_{ls}\, P_{l(s)}^{(\alpha)}.$$

In the above relation, furthermore, unless $l = s$ we have

$$P_{l(l)}^{(\alpha)}\, P_{s(s)}^{(\alpha)} = 0.$$

Therefore, on the basis of the discussion of Sect. 14.1, $P_{l(l)}^{(\alpha)} + P_{s(s)}^{(\alpha)}$ is also a projection operator in the case $l \neq s$. Notice, however, that if $l = s$, then $P_{l(l)}^{(\alpha)} + P_{s(s)}^{(\alpha)}$ is not a projection operator; the reader can readily show this. Moreover, let us define $P^{(\alpha)}$ as below:
$$P^{(\alpha)} \equiv \sum_{l=1}^{d_\alpha} P_{l(l)}^{(\alpha)}. \qquad (18.173)$$

Then $P^{(\alpha)}$ is once again a projection operator. From (18.156), $P^{(\alpha)}$ in (18.173) can be rewritten as

$$P^{(\alpha)} = \frac{d_\alpha}{n} \sum_{g} \Big[\sum_{l=1}^{d_\alpha} D_{ll}^{(\alpha)}(g)^{*}\Big]\, g = \frac{d_\alpha}{n} \sum_{g} \chi^{(\alpha)}(g)^{*}\, g. \qquad (18.174)$$

Returning to (18.155) and taking the summation over $l$ there, we have

$$\sum_{l=1}^{d_\alpha} P_{l(l)}^{(\alpha)} f = \sum_{l=1}^{d_\alpha} c_l^{(\alpha)}\, \psi_l^{(\alpha)}.$$

Using (18.173), we get

$$P^{(\alpha)} f = \sum_{l=1}^{d_\alpha} c_l^{(\alpha)}\, \psi_l^{(\alpha)}. \qquad (18.175)$$

Thus, the operator $P^{(\alpha)}$ has a clear meaning: it extracts from $f$ all the vectors (or functions) belonging to the irreducible representation $\alpha$, including their coefficients. Defining those functions as

$$\psi^{(\alpha)} \equiv \sum_{l=1}^{d_\alpha} c_l^{(\alpha)}\, \psi_l^{(\alpha)}, \qquad (18.176)$$

we succinctly rewrite (18.175) as

$$P^{(\alpha)} f = \psi^{(\alpha)}. \qquad (18.177)$$
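Acting with (18.174) on coefficient vectors yields symmetry-adapted components directly. The following Python/NumPy sketch applies the character projectors of C2v to the three-orbital representation of the allyl radical assumed in the earlier examples (the matrices D and the trial vector f are our illustrative assumptions):

import numpy as np

# Assumed C2v representation on the allyl basis (phi1, phi2, phi3).
D = {"E":   np.eye(3),
     "C2":  np.array([[-1, 0, 0], [0, 0, -1], [0, -1, 0]]),
     "sv":  np.array([[ 1, 0, 0], [0, 0,  1], [0,  1, 0]]),
     "sv'": -np.eye(3)}
chars = {"A1": [1, 1, 1, 1], "A2": [1, 1, -1, -1],
         "B1": [1, -1, 1, -1], "B2": [1, -1, -1, 1]}
n = 4

def P(alpha):
    """Character projector (18.174); d_alpha = 1 for every irrep of C2v."""
    return sum(c * D[g] for c, g in zip(chars[alpha], D)) / n

f = np.array([1.0, 1.0, 0.0])   # an arbitrary trial vector (c1, c2, c3)
for a in chars:
    print(a, P(a) @ f)
# B1 yields (1, 1/2, 1/2), i.e., the phi1 and (phi2 + phi3) components;
# A2 yields (0, 1/2, -1/2); A1 and B2 annihilate f, cf. (18.175).

The two nonvanishing projections reproduce the symmetry-adapted combinations found in Sect. 18.2, coefficients included.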

In turn, let us calculate $P^{(\beta)}P^{(\alpha)}$. This can be done with (18.174) as follows:

$$P^{(\beta)}P^{(\alpha)} = \frac{d_\beta}{n} \sum_{g} \chi^{(\beta)}(g)^{*}\, g\;\frac{d_\alpha}{n} \sum_{g'} \chi^{(\alpha)}(g')^{*}\, g'. \qquad (18.178)$$

To carry out the calculation, (i) we first replace $g'$ with $g^{-1}g'$ and rewrite the summation over $g'$ as that over $g^{-1}g'$; (ii) using the homomorphism and unitarity of the representation, we rewrite $\chi^{(\alpha)}(g^{-1}g')^{*}$ as

$$\chi^{(\alpha)}(g^{-1}g')^{*} = \sum_{i,j} D_{ji}^{(\alpha)}(g)\, D_{ji}^{(\alpha)}(g')^{*}. \qquad (18.179)$$

Rewriting (18.178), we have
$$P^{(\beta)}P^{(\alpha)} = \frac{d_\beta d_\alpha}{n^2} \sum_{g}\sum_{g'} \sum_{k,i,j} D_{kk}^{(\beta)}(g)^{*}\, D_{ji}^{(\alpha)}(g)\, D_{ji}^{(\alpha)}(g')^{*}\, g' = \frac{d_\beta d_\alpha}{n^2} \sum_{g'} \sum_{k,i,j} \frac{n}{d_\beta}\, \delta_{\alpha\beta}\, \delta_{kj}\, \delta_{ki}\, D_{ji}^{(\alpha)}(g')^{*}\, g' = \delta_{\alpha\beta}\, \frac{d_\alpha}{n} \sum_{g'} \sum_{k} D_{kk}^{(\alpha)}(g')^{*}\, g' = \delta_{\alpha\beta}\, \frac{d_\alpha}{n} \sum_{g'} \chi^{(\alpha)}(g')^{*}\, g' = \delta_{\alpha\beta}\, P^{(\alpha)}. \qquad (18.180)$$

These relationships are anticipated from the fact that both $P^{(\alpha)}$ and $P^{(\beta)}$ are projection operators.
As in the case of (18.160), (18.180) is useful for evaluating inner products of functions. Again taking the inner product of (18.180) with arbitrary functions $f$ and $g$, we have

$$\langle g | P^{(\beta)}P^{(\alpha)} | f \rangle = \langle g | \delta_{\alpha\beta}\, P^{(\alpha)} | f \rangle = \delta_{\alpha\beta}\, \langle g | P^{(\alpha)} | f \rangle. \qquad (18.181)$$

If we define $P$ such that

$$P \equiv \sum_{\alpha=1}^{n_r} P^{(\alpha)}, \qquad (18.182)$$

then $P$ is once again a projection operator (see Sect. 14.1). Now, taking the summation in (18.177) over all the irreducible representations $\alpha$, we have

$$\sum_{\alpha=1}^{n_r} P^{(\alpha)} f = \sum_{\alpha=1}^{n_r} \psi^{(\alpha)} = f. \qquad (18.183)$$

The function $f$ was arbitrarily chosen; hence, we get

$$P = E. \qquad (18.184)$$

As mentioned in Example 18.1, the concept of the representation space is not only important but also very useful for addressing various problems of physics and chemistry. For instance, the molecular orbital methods dealt with in Chap. 19 consider a representation space for the molecule whose energy eigenvalues we wish to know; the dimension of this representation space is equal to the number of molecular orbitals. According to the symmetry species of the molecule, the representation matrix is decomposed into a direct sum of invariant eigenspaces relevant to the individual irreducible representations. If basis vectors belong to different irreducible representations, such vectors are orthogonal to each other by virtue of (18.171). Even when vectors belong to the same irreducible representation, they are orthogonal if they are allocated to different places. It is often the case, however, that vectors belong to the same place of the same irreducible representation; then it is always possible, according to Theorem 13.2, to make them mutually orthogonal by taking their linear combinations. In such a way, we can construct an orthonormal basis set throughout the representation space. The method is in fact a powerful tool for solving energy eigenvalue problems and for determining the associated eigenfunctions (or molecular orbitals).

18.8 Direct-Product Representation

In Sect. 16.5 we studied the basic properties of direct-product groups. Correspondingly, in this section we examine the properties of direct-product representations. This notion is very useful for investigating optical transitions in molecular systems and the selection rules relevant to those transitions.

Let $D^{(\alpha)}$ and $D^{(\beta)}$ be two different irreducible representations whose dimensions are $d_\alpha$ and $d_\beta$, respectively. Then, operating with a group element $g$ on the basis functions, we have

$$g(\psi_i) = \sum_{k=1}^{d_\alpha} \psi_k\, D_{ki}^{(\alpha)}(g) \quad (1 \le i \le d_\alpha), \qquad (18.185)$$

$$g(\phi_j) = \sum_{l=1}^{d_\beta} \phi_l\, D_{lj}^{(\beta)}(g) \quad (1 \le j \le d_\beta), \qquad (18.186)$$

where $\psi_k$ ($1 \le k \le d_\alpha$) and $\phi_l$ ($1 \le l \le d_\beta$) are basis functions of $D^{(\alpha)}$ and $D^{(\beta)}$, respectively. We can construct $d_\alpha d_\beta$ new basis vectors from the products $\psi_k\phi_l$. These functions are transformed by $g$ such that

$$g(\psi_i\phi_j) = g(\psi_i)\, g(\phi_j) = \Big[\sum_{k} \psi_k\, D_{ki}^{(\alpha)}(g)\Big]\Big[\sum_{l} \phi_l\, D_{lj}^{(\beta)}(g)\Big] = \sum_{k}\sum_{l} \psi_k\phi_l\, D_{ki}^{(\alpha)}(g)\, D_{lj}^{(\beta)}(g). \qquad (18.187)$$

Here let us define the matrix $\big[D^{(\alpha \otimes \beta)}(g)\big]_{kl,ij}$ such that

$$\big[D^{(\alpha \otimes \beta)}(g)\big]_{kl,ij} \equiv D_{ki}^{(\alpha)}(g)\, D_{lj}^{(\beta)}(g). \qquad (18.188)$$

Then we have

$$g(\psi_i\phi_j) = \sum_{k}\sum_{l} \psi_k\phi_l\, \big[D^{(\alpha \otimes \beta)}(g)\big]_{kl,ij}. \qquad (18.189)$$
The notation using double subscripts is somewhat complicated. We notice, however, that in (18.188) the order of the subscripts $kilj$ is converted to $kl, ij$; i.e., the subscripts $i$ and $l$ have been interchanged.

We write $D^{(\alpha)}$ and $D^{(\beta)}$ in explicit form as follows:

$$D^{(\alpha)}(g) = \begin{pmatrix} d_{1,1}^{(\alpha)} & \cdots & d_{1,d_\alpha}^{(\alpha)} \\ \vdots & \ddots & \vdots \\ d_{d_\alpha,1}^{(\alpha)} & \cdots & d_{d_\alpha,d_\alpha}^{(\alpha)} \end{pmatrix}, \quad D^{(\beta)}(g) = \begin{pmatrix} d_{1,1}^{(\beta)} & \cdots & d_{1,d_\beta}^{(\beta)} \\ \vdots & \ddots & \vdots \\ d_{d_\beta,1}^{(\beta)} & \cdots & d_{d_\beta,d_\beta}^{(\beta)} \end{pmatrix}. \qquad (18.190)$$

Thus,

$$D^{(\alpha \otimes \beta)}(g) = D^{(\alpha)}(g) \otimes D^{(\beta)}(g) = \begin{pmatrix} d_{1,1}^{(\alpha)} D^{(\beta)}(g) & \cdots & d_{1,d_\alpha}^{(\alpha)} D^{(\beta)}(g) \\ \vdots & \ddots & \vdots \\ d_{d_\alpha,1}^{(\alpha)} D^{(\beta)}(g) & \cdots & d_{d_\alpha,d_\alpha}^{(\alpha)} D^{(\beta)}(g) \end{pmatrix}. \qquad (18.191)$$

To get familiar with the double-subscript notation, we describe the case of $(2, 2)$ matrices. Denoting

$$D^{(\alpha)}(g) = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \quad \text{and} \quad D^{(\beta)}(g) = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix}, \qquad (18.192)$$

we get

$$D^{(\alpha \otimes \beta)}(g) = D^{(\alpha)}(g) \otimes D^{(\beta)}(g) = \begin{pmatrix} a_{11}b_{11} & a_{11}b_{12} & a_{12}b_{11} & a_{12}b_{12} \\ a_{11}b_{21} & a_{11}b_{22} & a_{12}b_{21} & a_{12}b_{22} \\ a_{21}b_{11} & a_{21}b_{12} & a_{22}b_{11} & a_{22}b_{12} \\ a_{21}b_{21} & a_{21}b_{22} & a_{22}b_{21} & a_{22}b_{22} \end{pmatrix}. \qquad (18.193)$$

Corresponding to (18.22), (18.189) describes the transformation of $\psi_i\phi_j$ in terms of the double subscripts. At the same time, the set $\{\psi_i\phi_j;\ 1 \le i \le d_\alpha,\ 1 \le j \le d_\beta\}$ is a basis of $D^{(\alpha \otimes \beta)}$. In fact,
$$\big[D^{(\alpha \otimes \beta)}(gg')\big]_{kl,ij} = D_{ki}^{(\alpha)}(gg')\, D_{lj}^{(\beta)}(gg') = \Big[\sum_{\mu} D_{k\mu}^{(\alpha)}(g)\, D_{\mu i}^{(\alpha)}(g')\Big]\Big[\sum_{\nu} D_{l\nu}^{(\beta)}(g)\, D_{\nu j}^{(\beta)}(g')\Big] = \sum_{\mu}\sum_{\nu} \big[D^{(\alpha \otimes \beta)}(g)\big]_{kl,\mu\nu}\, \big[D^{(\alpha \otimes \beta)}(g')\big]_{\mu\nu,ij}. \qquad (18.194)$$

Equation (18.194) shows that the rule of matrix multiplication with respect to the double subscripts is satisfied. Consequently, we get

$$D^{(\alpha \otimes \beta)}(gg') = D^{(\alpha \otimes \beta)}(g)\, D^{(\alpha \otimes \beta)}(g'). \qquad (18.195)$$

The relation (18.195) indicates that $D^{(\alpha \otimes \beta)}$ is certainly a representation of the group. This representation is said to be a direct-product representation.

The character of the direct-product representation is given by putting $k = i$ and $l = j$ in (18.188). That is,

$$\big[D^{(\alpha \otimes \beta)}(g)\big]_{ij,ij} = D_{ii}^{(\alpha)}(g)\, D_{jj}^{(\beta)}(g). \qquad (18.196)$$

Denoting

$$\chi^{(\alpha \otimes \beta)}(g) \equiv \sum_{i,j} \big[D^{(\alpha \otimes \beta)}(g)\big]_{ij,ij}, \qquad (18.197)$$

we have

$$\chi^{(\alpha \otimes \beta)}(g) = \chi^{(\alpha)}(g)\, \chi^{(\beta)}(g). \qquad (18.198)$$
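In matrix terms, (18.188) is the Kronecker product, and (18.198) follows from it. The following Python/NumPy sketch uses the two-dimensional irreducible representation E of C3v (an assumed but conventional realization) to verify (18.198), and then applies the reduction formula (18.83) to E ⊗ E:

import numpy as np

def rot(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s], [s, c]])

def ref(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, s], [s, -c]])

# The 2-dimensional irrep E of C3v (order n = 6).
E2 = [rot(0), rot(2*np.pi/3), rot(4*np.pi/3),
      ref(0), ref(2*np.pi/3), ref(4*np.pi/3)]
n = 6

# (18.198): the trace of the Kronecker product is the product of the traces.
for M in E2:
    assert np.isclose(np.trace(np.kron(M, M)), np.trace(M)**2)

# Reducing E (x) E with (18.83): q_alpha = (1/n) sum_g chi_a(g)* chi(g).
chi = {"A1": [1]*6, "A2": [1]*3 + [-1]*3, "E": [np.trace(M) for M in E2]}
chi_prod = [np.trace(M)**2 for M in E2]
for a in chi:
    q = sum(np.conj(x)*y for x, y in zip(chi[a], chi_prod)) / n
    print(a, round(float(np.real(q))))  # 1, 1, 1: E (x) E = A1 + A2 + E

The printed multiplicities anticipate the general statement made next: the direct product of irreducible representations need not be irreducible.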

Even though $D^{(\alpha)}$ and $D^{(\beta)}$ are both irreducible, $D^{(\alpha \otimes \beta)}$ is not necessarily irreducible. Suppose that

$$D^{(\alpha)}(g) \otimes D^{(\beta)}(g) = \sum_{\omega} q_\omega\, D^{(\omega)}(g), \qquad (18.199)$$

where $q_\omega$ is given by (18.83). Then we have

$$q_\gamma = \frac{1}{n} \sum_{g} \chi^{(\gamma)}(g)^{*}\, \chi^{(\alpha \otimes \beta)}(g) = \frac{1}{n} \sum_{g} \chi^{(\gamma)}(g)^{*}\, \chi^{(\alpha)}(g)\, \chi^{(\beta)}(g), \qquad (18.200)$$

where $n$ is the order of the group. This relation is often used in quantum mechanical and quantum chemical calculations, especially to evaluate optical transitions in matter, including atoms and molecular systems. It is also useful for examining whether a definite integral of a product of functions (or an inner product of vectors) vanishes.
In Sect. 16.5 we investigated the definition and properties of direct-product groups. Similarly to the case of the direct-product representation, we can consider representations of direct-product groups. Let us consider two groups ℊ and H and assume that the direct-product group ℊ ⊗ H is defined (Sect. 16.5). Also let $D^{(\alpha)}$ and $D^{(\beta)}$ be $d_\alpha$- and $d_\beta$-dimensional representations of the groups ℊ and H, respectively. Furthermore, let us define a matrix $D^{(\alpha \otimes \beta)}(ab)$ as in (18.188) such that

$$\big[D^{(\alpha \otimes \beta)}(ab)\big]_{kl,ij} \equiv D_{ki}^{(\alpha)}(a)\, D_{lj}^{(\beta)}(b), \qquad (18.201)$$

where $a$ and $b$ are arbitrarily chosen from ℊ and H, respectively, and $ab \in$ ℊ ⊗ H. Then the set comprising the $D^{(\alpha \otimes \beta)}(ab)$ forms a representation of ℊ ⊗ H. (The reader is invited to verify this.) The dimension of $D^{(\alpha \otimes \beta)}$ is $d_\alpha d_\beta$. The character is given by (18.69), and so in the present case, putting $i = k$ and $j = l$ in (18.201), we get

$$\chi^{(\alpha \otimes \beta)}(ab) = \sum_{k}\sum_{l} \big[D^{(\alpha \otimes \beta)}(ab)\big]_{kl,kl} = \sum_{k}\sum_{l} D_{kk}^{(\alpha)}(a)\, D_{ll}^{(\beta)}(b) = \chi^{(\alpha)}(a)\, \chi^{(\beta)}(b). \qquad (18.202)$$

Equation (18.202) resembles (18.198); hence, we should be careful not to confuse them. In (18.198) we were thinking of a direct-product representation within a single group ℊ. In (18.202), however, we are considering a representation of a direct-product group comprising two different groups. In fact, whereas the character in (18.198) is computed with respect to a single group element $g$, in (18.202) we evaluate the character with respect to two group elements $a$ and $b$ chosen from the different groups ℊ and H, respectively.

18.9 Symmetric Representation and Antisymmetric Representation

As mentioned in Sect. 18.2, we have viewed a group element $g$ as a linear transformation over a vector space $V$; there we dealt with widely chosen functions as vectors. In this section we introduce further useful ideas so that we can apply them to molecular science and atomic physics.

In the previous section we defined the direct-product representation. In $D^{(\alpha \otimes \beta)}(g) = D^{(\alpha)}(g) \otimes D^{(\beta)}(g)$ we may freely consider the case $D^{(\alpha)}(g) = D^{(\beta)}(g)$. Then we have
$$g(\psi_i\phi_j) = g(\psi_i)\,g(\phi_j) = \sum_{k}\sum_{l} \psi_k\phi_l\, D_{ki}^{(\alpha)}(g)\, D_{lj}^{(\alpha)}(g). \qquad (18.203)$$

For the product function $\psi_j\phi_i$ we get a similar equation:

$$g(\psi_j\phi_i) = g(\psi_j)\,g(\phi_i) = \sum_{k}\sum_{l} \psi_k\phi_l\, D_{kj}^{(\alpha)}(g)\, D_{li}^{(\alpha)}(g). \qquad (18.204)$$

On the basis of the linearity of the relations (18.203) and (18.204), let us construct linear combinations of the product functions. That is,

$$g(\psi_i\phi_j \pm \psi_j\phi_i) = g(\psi_i\phi_j) \pm g(\psi_j\phi_i) = \sum_{k}\sum_{l} \psi_k\phi_l\, D_{ki}^{(\alpha)}(g)\, D_{lj}^{(\alpha)}(g) \pm \sum_{k}\sum_{l} \psi_k\phi_l\, D_{kj}^{(\alpha)}(g)\, D_{li}^{(\alpha)}(g) = \sum_{k}\sum_{l} (\psi_k\phi_l \pm \psi_l\phi_k)\, D_{ki}^{(\alpha)}(g)\, D_{lj}^{(\alpha)}(g). \qquad (18.205)$$

Here, defining $\Psi_{ij}^{\pm}$ as

$$\Psi_{ij}^{\pm} = \psi_i\phi_j \pm \psi_j\phi_i, \qquad (18.206)$$

we rewrite (18.205) as

$$g\Psi_{ij}^{\pm} = \sum_{k}\sum_{l} \Psi_{kl}^{\pm}\, \frac{1}{2}\Big[D_{ki}^{(\alpha)}(g)\, D_{lj}^{(\alpha)}(g) \pm D_{li}^{(\alpha)}(g)\, D_{kj}^{(\alpha)}(g)\Big]. \qquad (18.207)$$

Notice that $\Psi_{ij}^{+}$ and $\Psi_{ij}^{-}$ in (18.206) are symmetric and antisymmetric, respectively, with respect to the interchange of the subscripts $i$ and $j$. That is,

$$\Psi_{ij}^{\pm} = \pm\,\Psi_{ji}^{\pm}. \qquad (18.208)$$

We may naturally ask how we can constitute representation matrices from (18.207). To answer this question, we carry out the calculation by replacing $g$ with $gg'$ in (18.207) and following the procedure of Sect. 18.2. Thus, we have

$$gg'\Psi_{ij}^{\pm} = \sum_{k}\sum_{l} \Psi_{kl}^{\pm}\, \frac{1}{2}\Big[D_{ki}^{(\alpha)}(gg')\, D_{lj}^{(\alpha)}(gg') \pm D_{li}^{(\alpha)}(gg')\, D_{kj}^{(\alpha)}(gg')\Big] = \sum_{k}\sum_{l} \Psi_{kl}^{\pm}\, \frac{1}{2} \sum_{\mu}\sum_{\nu} \Big[D_{k\mu}^{(\alpha)}(g)\, D_{l\nu}^{(\alpha)}(g) \pm D_{l\mu}^{(\alpha)}(g)\, D_{k\nu}^{(\alpha)}(g)\Big]\, D_{\mu i}^{(\alpha)}(g')\, D_{\nu j}^{(\alpha)}(g') = \sum_{k}\sum_{l} \Psi_{kl}^{\pm}\, \frac{1}{2} \sum_{\mu}\sum_{\nu} \Big\{\big[D^{(\alpha \otimes \alpha)}(g)\big]_{kl,\mu\nu} \pm \big[D^{(\alpha \otimes \alpha)}(g)\big]_{lk,\mu\nu}\Big\}\, D_{\mu i}^{(\alpha)}(g')\, D_{\nu j}^{(\alpha)}(g'), \qquad (18.209)$$
ð18:209Þ

where the last equality follows from the definition of a direct-product representation
(18.188). We notice that the terms of [D(α α)(gg0)]kl, μν [D(α α)(gg0)]lk, μν in
(18.209) are symmetric and antisymmetric with respect to the interchange of sub-
scripts k and l together with subscripts μ and ν, respectively. Comparing both sides of
(18.209), we see that this must be the case with i and j as well. Then, the last factor of
(18.209) should be rewritten as:
h i
ðαÞ ðαÞ 1 ðαÞ 0 ðαÞ 0 ðαÞ ðαÞ
Dμi ðg0 ÞDνj ðg0 Þ ¼ D ðg ÞDνj ðg Þ Dνi ðg0 ÞDμj ðg0 Þ :
2 μi

Now, we define the following notations accordingly:


n o h i h o
1
D½α α
ð gÞ  f Dðα αÞ ðgÞ þ Dðα αÞ ðgÞlk,μν
kl,μν 2 kl,μν
h i
1 ðαÞ ðαÞ ðαÞ ð αÞ
¼ Dkμ ðgÞDlν ðgÞ þ Dlμ ðgÞDkν ðgÞ : ð18:210Þ
2

nh i h i o
1
Dfα αg
ð gÞ kl,μν
 Dðα αÞ ðgÞ  Dðα αÞ ðgÞ lk,μν
2 kl,μν
h i
1 ðαÞ ð αÞ ðαÞ ðαÞ
¼ Dkμ ðgÞDlν ðgÞ  Dlμ ðgÞDkν ðgÞ : ð18:211Þ
2
Meanwhile, using $D_{\mu i}^{(\alpha)}(g')\, D_{\nu j}^{(\alpha)}(g') = \big[D^{(\alpha \otimes \alpha)}(g')\big]_{\mu\nu,ij}$ and considering the exchange of summation with respect to the subscripts $\mu$ and $\nu$, we can also define $D^{[\alpha \otimes \alpha]}(g')$ and $D^{\{\alpha \otimes \alpha\}}(g')$ for the symmetric and antisymmetric cases, respectively. Thus, for the symmetric case, we have

$$gg'\Psi_{ij}^{+} = \sum_{k}\sum_{l}\sum_{\mu}\sum_{\nu} \Psi_{kl}^{+}\, \Big\{D^{[\alpha \otimes \alpha]}(g)\Big\}_{kl,\mu\nu} \Big\{D^{[\alpha \otimes \alpha]}(g')\Big\}_{\mu\nu,ij} = \sum_{k}\sum_{l} \Psi_{kl}^{+}\, \Big\{D^{[\alpha \otimes \alpha]}(g)\, D^{[\alpha \otimes \alpha]}(g')\Big\}_{kl,ij}. \qquad (18.212)$$

Using (18.210), (18.207) can be rewritten as
$$g\Psi_{ij}^{+} = \sum_{k}\sum_{l} \Psi_{kl}^{+}\, \Big\{D^{[\alpha \otimes \alpha]}(g)\Big\}_{kl,ij}. \qquad (18.213)$$

Then we have

$$gg'\Psi_{ij}^{+} = \sum_{k}\sum_{l} \Psi_{kl}^{+}\, \Big\{D^{[\alpha \otimes \alpha]}(gg')\Big\}_{kl,ij}. \qquad (18.214)$$

Comparing (18.212) and (18.214), we finally get

$$D^{[\alpha \otimes \alpha]}(gg') = D^{[\alpha \otimes \alpha]}(g)\, D^{[\alpha \otimes \alpha]}(g'). \qquad (18.215)$$

Similarly, for the antisymmetric case, we have

$$gg'\Psi_{ij}^{-} = \sum_{k}\sum_{l} \Psi_{kl}^{-}\, \Big\{D^{\{\alpha \otimes \alpha\}}(g)\, D^{\{\alpha \otimes \alpha\}}(g')\Big\}_{kl,ij}, \qquad (18.216)$$

$$g\Psi_{ij}^{-} = \sum_{k}\sum_{l} \Psi_{kl}^{-}\, \Big\{D^{\{\alpha \otimes \alpha\}}(g)\Big\}_{kl,ij}, \qquad (18.217)$$

$$D^{\{\alpha \otimes \alpha\}}(gg') = D^{\{\alpha \otimes \alpha\}}(g)\, D^{\{\alpha \otimes \alpha\}}(g'). \qquad (18.218)$$

Thus, both $D^{[\alpha \otimes \alpha]}(g)$ and $D^{\{\alpha \otimes \alpha\}}(g)$ produce well-defined representations. Letting the dimension of the representation $\alpha$ be $d_\alpha$, we have $d_\alpha(d_\alpha + 1)/2$ functions belonging to the symmetric representation and $d_\alpha(d_\alpha - 1)/2$ functions belonging to the antisymmetric representation. With a two-dimensional representation, for instance, the functions belonging to the symmetric representation are

$$\psi_1\phi_1, \quad \psi_1\phi_2 + \psi_2\phi_1, \quad \text{and} \quad \psi_2\phi_2.$$

The function belonging to the antisymmetric representation is

$$\psi_1\phi_2 - \psi_2\phi_1.$$

Note that these vectors have not yet been normalized.


From (18.210) and (18.211), we can readily get useful expressions with charac-
ters of symmetric and antisymmetric representations. In (18.210) putting μ ¼ k and
ν ¼ l and summing over k and l,
References 727

X Xn o
χ ½α α
ð gÞ ¼ k l
D ½α α
ð g Þ
kl,kl
X X h i
1 ð αÞ ðαÞ ðαÞ ðαÞ
¼ D kk ð g ÞD ll ð g Þ þ D lk ð g ÞD kl ð g Þ ð18:219Þ
2 k l
h i 2 h  i 
1
¼ χ ðαÞ ðgÞ þ χ ðαÞ g2 :
2

Similarly we have
h i2 h 
fα αg 1 ð αÞ ðαÞ
 2 i
χ ðgÞ ¼ χ ð gÞ  χ g : ð18:220Þ
2
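Formulas (18.219) and (18.220) require only χ(g) and χ(g²). The following Python/NumPy sketch evaluates them for the two-dimensional representation E of C3v (an assumed example, with g² obtained by squaring the representation matrices):

import numpy as np

def rot(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s], [s, c]])

def ref(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, s], [s, -c]])

# Two-dimensional irrep E of C3v, elements ordered (e, 2C3, 3sv).
E2 = [rot(0), rot(2*np.pi/3), rot(4*np.pi/3),
      ref(0), ref(2*np.pi/3), ref(4*np.pi/3)]

chi_sym  = [(np.trace(M)**2 + np.trace(M @ M)) / 2 for M in E2]   # (18.219)
chi_anti = [(np.trace(M)**2 - np.trace(M @ M)) / 2 for M in E2]   # (18.220)
print(np.round(chi_sym, 6))   # (3, 0, 0, 1, 1, 1): symmetric square = A1 + E
print(np.round(chi_anti, 6))  # (1, 1, 1, -1, -1, -1): antisymmetric = A2

The dimensions check out: d(d+1)/2 = 3 symmetric and d(d−1)/2 = 1 antisymmetric functions for d = 2, as stated above.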

References

1. Inui T, Tanabe Y, Onodera Y (1990) Group theory and its applications in physics. Springer,
Berlin
2. Inui T, Tanabe Y, Onodera Y (1980) Group theory and its applications in physics, expanded
ed. Shokabo, Tokyo. (in Japanese)
3. Hassani S (2006) Mathematical physics. Springer, New York
4. Chen JQ, Ping J, Wang F (2002) Group representation theory for physicists, 2nd edn. World
Scientific, Singapore
5. Hamermesh M (1989) Group theory and its application to physical problems. Dover, New York
Chapter 19
Applications of Group Theory to Physical Chemistry

On the basis of our studies of group theory, in this final chapter we apply the knowledge to molecular orbital (MO) calculations (or quantum chemical calculations). As tangible examples, we adopt aromatic hydrocarbons (ethylene, cyclopropenyl radical, benzene, and allyl radical) and methane. The approach is based upon the method of linear combination of atomic orbitals (LCAO). To seek an appropriate LCAO MO, we make the most of a method based on the symmetry-adapted linear combination (SALC); the use of projection operators is a powerful tool for this purpose. For a correct understanding, it is desirable to consider the transformation of functions. To this end, we first show several examples.

In the process of carrying out MO calculations, we encounter a secular equation as an eigenvalue equation. Using a SALC eases the solution of the secular equation. Molecular science relies largely on spectroscopic measurements, and researchers need to assign individual spectral lines to specific transitions between the relevant molecular states. Representation theory works well in this situation. Thus, group theory finds a perfect fit in its applications to molecular science.

19.1 Transformation of Functions

Before showing individual examples, let us consider the transformation of functions (or vectors) by a symmetry operation. Here we consider scalar functions. For example, suppose an arbitrary function f(x, y) on Cartesian xy-coordinates. Figure 19.1 shows a contour map of f(x, y) = constant. Then suppose that the map is rotated around the origin. More specifically, the position vector r_0 fixed on a "summit" [i.e., a point that gives a maximal value of f(x, y)] undergoes a symmetry operation, namely a rotation around the origin. As a result, r_0 is transformed to r_0′. Here, we assume that the "mountain" represented by f(x, y) is a

© Springer Nature Singapore Pte Ltd. 2020 729


S. Hotta, Mathematical Physical Chemistry,
https://doi.org/10.1007/978-981-15-2225-3_19
730 19 Applications of Group Theory to Physical Chemistry

Fig. 19.1 Contour map of f y


(x, y) and f 0(x0, y0). The
function f 0(x0, y0) is obtained
by rotating a map of f(x, y)
around the z-axis
′ ′, ′
,

O x

Concomitantly, a general point r (see Sect. 17.1) is transformed to r′ in exactly the same way as r0.

A new function f′ gives a new contour map after the rotation. Let us describe f′ as

f′ ≡ O_R f.  (19.1)

The RHS of (19.1) means that f′ is produced as a result of operating the rotation R on f. We describe the position vector after the transformation as r0′. Following the notation of Sect. 11.1, r0′ and r′ are expressed as

r0′ = R(r0) and r′ = R(r).  (19.2)

The matrix representation of R is given by, e.g., (11.35). In (19.2) we have

r_0 = (\mathbf{e}_1\ \mathbf{e}_2) \begin{pmatrix} x_0 \\ y_0 \end{pmatrix} and r_0' = (\mathbf{e}_1\ \mathbf{e}_2) \begin{pmatrix} x_0' \\ y_0' \end{pmatrix},  (19.3)

where e1 and e2 are orthonormal basis vectors in the xy-plane. Also we have

r = (\mathbf{e}_1\ \mathbf{e}_2) \begin{pmatrix} x \\ y \end{pmatrix} and r' = (\mathbf{e}_1\ \mathbf{e}_2) \begin{pmatrix} x' \\ y' \end{pmatrix}.  (19.4)

Meanwhile, the following equation must hold:

f′(x′, y′) = f(x, y).  (19.5)

Or, using (19.1) we have

O_R f(x′, y′) = f(x, y).  (19.6)



The above argument can be extended to a three-dimensional (or higher-dimensional) space. In that case, we similarly have

O_R f(x′, y′, z′) = f(x, y, z),  O_R f(r′) = f(r),  or  f′(r′) = f(r),  (19.7)

where

r = (\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3) \begin{pmatrix} x \\ y \\ z \end{pmatrix} and r' = (\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3) \begin{pmatrix} x' \\ y' \\ z' \end{pmatrix}.  (19.8)

The last relation of (19.7) comes from (19.1). Using (19.2) and (19.3), we rewrite (19.7) as

O_R f[R(r)] = f(r).  (19.9)

Replacing r with R⁻¹(r), we get

O_R f{R[R⁻¹(r)]} = f[R⁻¹(r)].  (19.10)

That is,

O_R f(r) = f[R⁻¹(r)].  (19.11)

More succinctly, (19.11) may be rewritten as

Rf(r) = f[R⁻¹(r)].  (19.12)

Comparing (19.11) with (19.12) and considering (19.1), we have

f′ ≡ O_R f ≡ Rf.  (19.13)

To gain a good understanding of the function transformation, let us think of some examples.

Example 19.1 Let f(x, y) be a function described by

f(x, y) = (x − a)² + (y − b)²;  a, b > 0.  (19.14)

A contour is shown in Fig. 19.2. We consider a π/2 rotation around the z-axis. Then, f(x, y) is transformed into Rf(x, y) = f′(x, y) such that

Fig. 19.2 Contour maps of f(x, y) = (x − a)² + (y − b)² and f′(x′, y′) = (x′ + b)² + (y′ − a)² before and after a π/2 rotation around the z-axis

Rf(x, y) = f′(x, y) = (x + b)² + (y − a)².  (19.15)

We also have

r_0 = (\mathbf{e}_1\ \mathbf{e}_2) \begin{pmatrix} a \\ b \end{pmatrix},
r_0' = (\mathbf{e}_1\ \mathbf{e}_2) \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = (\mathbf{e}_1\ \mathbf{e}_2) \begin{pmatrix} -b \\ a \end{pmatrix},

where we define R as

R = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}.

Similarly,

r = (\mathbf{e}_1\ \mathbf{e}_2) \begin{pmatrix} x \\ y \end{pmatrix} and r' = (\mathbf{e}_1\ \mathbf{e}_2) \begin{pmatrix} -y \\ x \end{pmatrix}.

From (19.15), we have

f′(x′, y′) = (x′ + b)² + (y′ − a)² = (−y + b)² + (x − a)² = f(x, y).  (19.16)

This ensures that (19.5) holds. The implication of (19.16) combined with (19.5) is that the view of f′(x′, y′) from (−b, a) is the same as that of f(x, y) from (a, b). Imagine that if we are standing at (−b, a) of f′(x′, y′), we are at the bottom of the "valley" of f′(x′, y′). Exactly in the same manner, if we are standing at (a, b) of f(x, y), we are at the bottom of the valley of f(x, y) as well. Notice that for both f(x, y) and f′(x′, y′), (a, b) and (−b, a) are the lowest points, respectively.

Fig. 19.3 Contour map of f(r) = e^{−2[(x−a)²+y²+z²]} + e^{−2[(x+a)²+y²+z²]} (a > 0). The functional form remains unchanged by a reflection with respect to the yz-plane

Meanwhile, we have

R^{-1} = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix},  R^{-1}(\mathbf{r}) = (\mathbf{e}_1\ \mathbf{e}_2) \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = (\mathbf{e}_1\ \mathbf{e}_2) \begin{pmatrix} y \\ -x \end{pmatrix}.

Then we have

f[R⁻¹(r)] = (y − a)² + [(−x) − b]² = (x + b)² + (y − a)² = Rf(r).  (19.17)

Thus, (19.12) certainly holds.
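For readers who wish to check (19.17) numerically, here is a minimal Python sketch (assuming NumPy and the arbitrary illustrative values a = 1, b = 2, which are not prescribed by the text):

```python
# A short numerical check of (19.12) for Example 19.1, assuming a = 1, b = 2.
import numpy as np

a, b = 1.0, 2.0
f = lambda x, y: (x - a) ** 2 + (y - b) ** 2          # (19.14)
f_prime = lambda x, y: (x + b) ** 2 + (y - a) ** 2    # (19.15)

R = np.array([[0.0, -1.0],
              [1.0,  0.0]])        # pi/2 rotation around the z-axis
R_inv = R.T                        # R is orthogonal, so R^{-1} = R^T

rng = np.random.default_rng(0)
for _ in range(5):
    r = rng.normal(size=2)
    x1, y1 = R_inv @ r             # R^{-1}(r) = (y, -x)
    assert np.isclose(f(x1, y1), f_prime(*r))   # f(R^{-1}(r)) = (Rf)(r)
print("Rf(r) = f(R^{-1}(r)) verified at random points.")
```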


Example 19.2 Let f(r) and g(r) be functions described by

f(r) = e^{−2[(x−a)²+y²+z²]} + e^{−2[(x+a)²+y²+z²]}  (a > 0),  (19.18)

g(r) = e^{−2[(x−a)²+y²+z²]} − e^{−2[(x+a)²+y²+z²]}  (a > 0).  (19.19)

Figure 19.3 shows an outline of the contour of f(r). We consider the following symmetry operation in a three-dimensional coordinate system:

R = \begin{pmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} and R^{-1} = \begin{pmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.  (19.20)

This represents a reflection with respect to the yz-plane. Then we have

f[R⁻¹(r)] = Rf(r) = f(r),  (19.21)

Fig. 19.4 Plots of f(r) = e^{−2[(x−a)²+y²+z²]} + e^{−2[(x+a)²+y²+z²]} and g(r) = e^{−2[(x−a)²+y²+z²]} − e^{−2[(x+a)²+y²+z²]} (a > 0) as functions of x on the x-axis. (a) f(r). (b) g(r)

g[R⁻¹(r)] = Rg(r) = −g(r).  (19.22)

Plotting f(r) and g(r) as functions of x on the x-axis, we depict the results in Fig. 19.4. Looking at (19.21) and (19.22), we find that f(r) and g(r) are solutions of an eigenvalue equation for the operator R. The corresponding eigenvalues are 1 and −1 for f(r) and g(r), respectively. In particular, f(r) keeps its functional form after the transformation R. In this case, f(r) is said to be invariant with the transformation R. Moreover, f(r) is invariant with the following eight transformations:

R = \begin{pmatrix} \pm 1 & 0 & 0 \\ 0 & \pm 1 & 0 \\ 0 & 0 & \pm 1 \end{pmatrix}.  (19.23)

These transformations form a group that is isomorphic to D2h. Therefore, f(r) is eligible as a basis function of the totally symmetric representation Ag of D2h. Notice that f(r) is invariant as well with a rotation of an arbitrary angle around the x-axis. On the other hand, g(r) belongs to B3u.
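A similar numerical sketch (again assuming Python/NumPy and an illustrative a = 1) confirms the eigenvalue relations (19.21) and (19.22):

```python
# A sketch verifying (19.21) and (19.22), assuming a = 1 (an arbitrary value).
import numpy as np

a = 1.0
g1 = lambda x, y, z: np.exp(-2 * ((x - a) ** 2 + y ** 2 + z ** 2))
g2 = lambda x, y, z: np.exp(-2 * ((x + a) ** 2 + y ** 2 + z ** 2))
f = lambda x, y, z: g1(x, y, z) + g2(x, y, z)   # (19.18), even under x -> -x
g = lambda x, y, z: g1(x, y, z) - g2(x, y, z)   # (19.19), odd under x -> -x

rng = np.random.default_rng(1)
for _ in range(5):
    x, y, z = rng.normal(size=3)
    # R is the reflection (19.20); R^{-1}(r) = (-x, y, z).
    assert np.isclose(f(-x, y, z), f(x, y, z))    # eigenvalue +1
    assert np.isclose(g(-x, y, z), -g(x, y, z))   # eigenvalue -1
print("f and g are eigenfunctions of R with eigenvalues +1 and -1.")
```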

19.2 Method of Molecular Orbitals (MOs)

Bearing in mind these arguments, we examine several examples of quantum chemical calculations. Our approach is based upon the molecular orbital theory. The theory assumes the existence of molecular orbitals (MOs) in a molecule, just as the notion of atomic orbitals has been well established in atomic physics. Furthermore, we assume that the molecular orbitals comprise a linear combination of atomic orbitals (LCAO).

This notion is equivalent to assuming that individual electrons in a molecule move independently in a potential field produced by the nuclei and the other electrons. In other words, we assume that each electron is moving in an MO that extends over

the whole molecule. The electronic state of the molecule is formed by different MOs of various energies that are occupied by electrons. As in the case of an atom, an MO ψi(r) occupied by an electron is described as

H\psi_i(\mathbf{r}) \equiv \left[ -\frac{\hbar^2}{2m}\nabla^2 + V(\mathbf{r}) \right] \psi_i(\mathbf{r}) = E_i \psi_i(\mathbf{r}),  (19.24)

where H is the Hamiltonian of the molecule; m is the mass of an electron (note that we do not use a reduced mass μ here); r is the position vector of the electron; ∇² is the Laplacian (Laplace operator); V(r) is the potential of the molecule at r; Ei is the energy of the electron occupying ψi, called a molecular orbital energy. We assume that V(r) possesses the same symmetry as the molecule.

Let ℊ be a symmetry group and let R be a group element arbitrarily chosen from ℊ. Suppose that an arbitrary position vector r is moved to another position r′. This transformation is expressed as (19.2). Since V(r) has the same symmetry as the molecule, an electron "feels" the same potential field at r′ as that at r. That is,

V(r) = V′(r′) = V(r′).  (19.25)

Or we have

V(r)ψ(r) = V(r′)ψ′(r′),  (19.26)

where ψ is an arbitrary function. Defining

V(r)ψ(r) ≡ [Vψ](r) = Vψ(r),  (19.27)

and recalling (19.1) and (19.7), we get

[RVψ](r′) = R[Vψ](r′) = V′ψ′(r′) = V′(r′)ψ′(r′) = V(r′)ψ′(r′) = V(r′)Rψ(r′) = VRψ(r′) = [VRψ](r′).  (19.28)

Comparing the first and last sides and remembering that ψ is an arbitrary function, we get

RV = VR.  (19.29)

Next, let us examine the symmetry of the Laplacian ∇². The Laplacian is defined in Sect. 1.2 as

\nabla^2 \equiv \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2},  (1.24)

where x, y, and z denote the Cartesian coordinates. Let S be an orthogonal matrix that transforms the xyz-coordinate system into the x′y′z′-coordinate system. We suppose that S is expressed as

S = \begin{pmatrix} s_{11} & s_{12} & s_{13} \\ s_{21} & s_{22} & s_{23} \\ s_{31} & s_{32} & s_{33} \end{pmatrix} and \begin{pmatrix} x' \\ y' \\ z' \end{pmatrix} = \begin{pmatrix} s_{11} & s_{12} & s_{13} \\ s_{21} & s_{22} & s_{23} \\ s_{31} & s_{32} & s_{33} \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix}.  (19.30)

Since S is an orthogonal matrix, we have

\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} s_{11} & s_{21} & s_{31} \\ s_{12} & s_{22} & s_{32} \\ s_{13} & s_{23} & s_{33} \end{pmatrix} \begin{pmatrix} x' \\ y' \\ z' \end{pmatrix}.  (19.31)

This equation is due to

SSᵀ = SᵀS = E,  (19.32)

where E is a unit matrix. Then we have

\frac{\partial}{\partial x'} = \frac{\partial x}{\partial x'}\frac{\partial}{\partial x} + \frac{\partial y}{\partial x'}\frac{\partial}{\partial y} + \frac{\partial z}{\partial x'}\frac{\partial}{\partial z} = s_{11}\frac{\partial}{\partial x} + s_{12}\frac{\partial}{\partial y} + s_{13}\frac{\partial}{\partial z}.  (19.33)

Partially differentiating (19.33) once more, we have

\frac{\partial^2}{\partial x'^2} = s_{11}\frac{\partial}{\partial x'}\frac{\partial}{\partial x} + s_{12}\frac{\partial}{\partial x'}\frac{\partial}{\partial y} + s_{13}\frac{\partial}{\partial x'}\frac{\partial}{\partial z}
= s_{11}\left(s_{11}\frac{\partial}{\partial x} + s_{12}\frac{\partial}{\partial y} + s_{13}\frac{\partial}{\partial z}\right)\frac{\partial}{\partial x} + s_{12}\left(s_{11}\frac{\partial}{\partial x} + s_{12}\frac{\partial}{\partial y} + s_{13}\frac{\partial}{\partial z}\right)\frac{\partial}{\partial y} + s_{13}\left(s_{11}\frac{\partial}{\partial x} + s_{12}\frac{\partial}{\partial y} + s_{13}\frac{\partial}{\partial z}\right)\frac{\partial}{\partial z}
= s_{11}^2\frac{\partial^2}{\partial x^2} + s_{11}s_{12}\frac{\partial^2}{\partial y\,\partial x} + s_{11}s_{13}\frac{\partial^2}{\partial z\,\partial x} + s_{12}s_{11}\frac{\partial^2}{\partial x\,\partial y} + s_{12}^2\frac{\partial^2}{\partial y^2} + s_{12}s_{13}\frac{\partial^2}{\partial z\,\partial y} + s_{13}s_{11}\frac{\partial^2}{\partial x\,\partial z} + s_{13}s_{12}\frac{\partial^2}{\partial y\,\partial z} + s_{13}^2\frac{\partial^2}{\partial z^2}.  (19.34)

Calculating, e.g., the terms ∂²/∂y′² and ∂²/∂z′², we have similar results. Then, collecting all those 27 terms, we get, e.g.,

\left( s_{11}^2 + s_{21}^2 + s_{31}^2 \right) \frac{\partial^2}{\partial x^2} = \frac{\partial^2}{\partial x^2},  (19.35)

where we used the orthogonality relation of S. The cross terms such as ∂²/∂x∂y and ∂²/∂y∂x all vanish. Consequently, we get

\frac{\partial^2}{\partial x'^2} + \frac{\partial^2}{\partial y'^2} + \frac{\partial^2}{\partial z'^2} = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}.  (19.36)

Defining ∇′² as

\nabla'^2 \equiv \frac{\partial^2}{\partial x'^2} + \frac{\partial^2}{\partial y'^2} + \frac{\partial^2}{\partial z'^2},  (19.37)

we have

\nabla'^2 = \nabla^2.  (19.38)

Notice that (19.38) holds not only for the symmetry operations of the molecule, but also for any rotation operation in ℝ³ (see Sect. 17.4).
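The invariance (19.36)/(19.38) can also be checked symbolically. The following sketch assumes SymPy and, for concreteness, a rotation about the z-axis as the orthogonal matrix S of (19.30); the test function ψ is an arbitrary illustrative choice:

```python
# A symbolic check of (19.36)/(19.38), assuming SymPy and a concrete rotation
# about the z-axis as the orthogonal matrix S of (19.30).
import sympy as sp

x, y, z = sp.symbols("x y z", real=True)
t = sp.pi / 3                                # an arbitrary fixed rotation angle
S = sp.Matrix([[sp.cos(t), -sp.sin(t), 0],
               [sp.sin(t),  sp.cos(t), 0],
               [0,          0,         1]])
xp, yp, zp = S * sp.Matrix([x, y, z])        # primed coordinates r' = S r

def laplacian(f):
    return sp.diff(f, x, 2) + sp.diff(f, y, 2) + sp.diff(f, z, 2)

# The Laplacian of psi(r') taken in the unprimed coordinates must equal
# (Laplacian psi) evaluated at r', i.e., the chain-rule terms of (19.34)
# collapse by the orthogonality of S.
psi = lambda u, v, w: sp.exp(-(u**2 + v**2 + w**2)) * u * v
lhs = laplacian(psi(xp, yp, zp))
rhs = laplacian(psi(x, y, z)).subs({x: xp, y: yp, z: zp}, simultaneous=True)
print(sp.simplify(lhs - rhs))                # prints 0
```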
As in the case of (19.28), we have

[R∇²ψ](r′) = R[∇²ψ](r′) = ∇′²ψ′(r′) = ∇²ψ′(r′) = ∇²Rψ(r′) = [∇²Rψ](r′).  (19.39)

Consequently, we get

R∇² = ∇²R.  (19.40)

Adding both sides of (19.29) and (19.40), we get

R(∇² + V) = (∇² + V)R.  (19.41)

From (19.24), we have

RH = HR.  (19.42)

Thus, we confirm that the Hamiltonian H commutes with any symmetry operation R. In other words, H is invariant with the coordinate transformation relevant to the symmetry operation.

Now we consider the matrix representation of (19.42). Let D be an irreducible representation of the symmetry group ℊ. Then, we have

D(g)H = HD(g)  (g ∈ ℊ).

Let {ψ1, ψ2, ⋯, ψd} be a set of basis vectors that span a representation space associated with D. Then, on the basis of Schur's Second Lemma of Sect. 18.3, (19.42) immediately leads to an important conclusion: if H is represented by a matrix, we must have

H = λE,  (19.43)

where λ is an arbitrary complex constant. That is, H is represented such that

H = \begin{pmatrix} \lambda & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda \end{pmatrix},  (19.44)

where the above matrix is a (d, d) square diagonal matrix. This implies that H is reduced to

H = λD^(0) ⊕ ⋯ ⊕ λD^(0),

where D^(0) denotes the totally symmetric representation; notice that it is given by 1 for any symmetry operation. Thus, the commutability of an operator with all the symmetry operations is equivalent to the statement that the operator belongs to the totally symmetric representation.
Since H is Hermitian, λ should be real. Operating both sides of (19.42) on (ψ1 ψ2 ⋯ ψd) from the right, we have

(\psi_1\ \psi_2\ \cdots\ \psi_d)RH = (\psi_1\ \psi_2\ \cdots\ \psi_d)HR = (\psi_1\ \psi_2\ \cdots\ \psi_d) \begin{pmatrix} \lambda & & \\ & \ddots & \\ & & \lambda \end{pmatrix} R = (\lambda\psi_1\ \lambda\psi_2\ \cdots\ \lambda\psi_d)R = \lambda(\psi_1\ \psi_2\ \cdots\ \psi_d)R.  (19.45)

In particular, putting R = E, we get

(ψ1 ψ2 ⋯ ψd)H = λ(ψ1 ψ2 ⋯ ψd).

Simplifying this equation, we have

ψiH = λψi  (1 ≤ i ≤ d).  (19.46)

Thus, λ is found to be an energy eigenvalue of H, and the ψi (1 ≤ i ≤ d) are eigenfunctions belonging to the eigenvalue λ.
Meanwhile, using

R(\psi_i) = \sum_{k=1}^{d} \psi_k D_{ki}(R),

we rewrite (19.45) as

(\psi_1\ \psi_2\ \cdots\ \psi_d)RH = (\psi_1 R\ \psi_2 R\ \cdots\ \psi_d R)H = \left( \sum_{k=1}^{d}\psi_k D_{k1}(R)\ \ \sum_{k=1}^{d}\psi_k D_{k2}(R)\ \cdots\ \sum_{k=1}^{d}\psi_k D_{kd}(R) \right) H = \lambda \left( \sum_{k=1}^{d}\psi_k D_{k1}(R)\ \ \sum_{k=1}^{d}\psi_k D_{k2}(R)\ \cdots\ \sum_{k=1}^{d}\psi_k D_{kd}(R) \right),

where with the second equality we used the relation (11.42); i.e.,

ψiR = R(ψi).

Thus, we get

\sum_{k=1}^{d} \psi_k D_{ki}(R) H = \lambda \sum_{k=1}^{d} \psi_k D_{ki}(R)  (1 ≤ i ≤ d).  (19.47)

The relations (19.45) to (19.47) imply that ψ1, ψ2, ⋯, and ψd, as well as their linear combinations formed with a representation matrix D(R), are eigenfunctions belonging to the same eigenvalue λ. That is, ψ1, ψ2, ⋯, and ψd are said to be degenerate with multiplicity d.

Following the remarks after Theorem 14.5, we can construct an orthonormal basis set {ψ̃1, ψ̃2, ⋯, ψ̃d} of eigenfunctions. These vectors can be constructed via linear combinations of ψ1, ψ2, ⋯, and ψd. Transforming the vectors ψ̃i (1 ≤ i ≤ d) by R, we get

\left\langle \sum_{k=1}^{d} \tilde{\psi}_k D_{ki}(R) \,\Big|\, \sum_{l=1}^{d} \tilde{\psi}_l D_{lj}(R) \right\rangle = \sum_{k=1}^{d} \sum_{l=1}^{d} D_{ki}(R)^{*} D_{lj}(R) \langle \tilde{\psi}_k | \tilde{\psi}_l \rangle = \sum_{k=1}^{d} D_{ki}(R)^{*} D_{kj}(R) = \sum_{k=1}^{d} \left[ D^{\dagger}(R) \right]_{ik} [D(R)]_{kj} = \left[ D^{\dagger}(R) D(R) \right]_{ij} = \delta_{ij}.

With the last equality, we used the unitarity of D(R). Thus, the orthonormality of the basis set {ψ̃1, ψ̃2, ⋯, ψ̃d} is retained after the unitary transformation.
In the above discussion, we have assumed that ψ1, ψ2, ⋯, and ψd belong to a certain irreducible representation D^(ν). Since ψ̃1, ψ̃2, ⋯, and ψ̃d consist of their linear combinations, ψ̃1, ψ̃2, ⋯, and ψ̃d belong to D^(ν) as well. Also, we have assumed that ψ̃1, ψ̃2, ⋯, and ψ̃d form an orthonormal basis; hence, according to Theorem 11.3 these vectors constitute basis vectors that belong to D^(ν). In particular, functions derived using projection operators share these characteristics. In fact, the above principles underlie the molecular orbital calculations dealt with in Sects. 19.4 and 19.5. We will go into more detail in subsequent sections.
Bearing the aforementioned argument in mind, we make the most of the relation expressed by (18.171) to evaluate inner products of functions (or vectors). In this connection, we often need to calculate matrix elements of an operator. One of the most typical examples is the matrix elements of the Hamiltonian. In the field of molecular science we have to estimate overlap integrals, Coulomb integrals, resonance integrals, etc. Other examples include transition matrix elements pertinent to, e.g., the electric dipole transition. To this end, we deal with direct-product representations (see Sects. 18.7 and 18.8). Let O^(γ) be an Hermitian operator belonging to the γ-th irreducible representation. Let us think of the following inner product:

⟨ϕl^(β)|O^(γ)ψs^(α)⟩,  (19.48)

where ψs^(α) and ϕl^(β) are the s-th component of the α-th irreducible representation and the l-th component of the β-th irreducible representation, respectively; see (18.146) and (18.161).
(i) Let us first think of the case where O^(γ) is the Hamiltonian H. Suppose that ψs^(α) belongs to an eigenvalue λ of H. Then, we have

H|ψl^(α)⟩ = λ|ψl^(α)⟩.  (19.49)

In that case, with (19.48) we get

⟨ϕl^(β)|Hψs^(α)⟩ = λ⟨ϕl^(β)|ψs^(α)⟩.  (19.50)

This equation is essentially the same as (18.171). That is,

⟨ϕs^(β)|ψl^(α)⟩ = δαβ δls ⟨ϕs^(β)|ψl^(α)⟩.  (18.171)

At first glance this equation seems trivial, but it gives us a powerful tool that saves us a lot of troublesome calculations. In fact, if we encounter a series of inner product calculations (i.e., definite integrals), we can ignore many of them, because the matrix elements of the Hamiltonian do not vanish unless α = β and l = s. That is, we only have to estimate ⟨ϕs^(α)|ψs^(α)⟩; otherwise the inner products vanish. The functional forms of ϕs^(α) and ψs^(α) are determined depending upon the individual practical problems. We will show tangible examples later (Sect. 19.4).
(ii) Next, let us consider matrix elements of the optical transition. In this case, we are thinking of the transition probability between quantum states. Assuming the dipole approximation, we use εe · P^(γ) for O^(γ) in ⟨ϕl^(β)|O^(γ)ψs^(α)⟩ of (19.48). The quantity P^(γ) represents the electric dipole moment associated with the position vector. Normally, we can readily find it in a character table, in which we examine which irreducible representation γ the position vector components x, y, and z indicated in the rightmost column correspond to. Table 18.4 is an example. If we take a unit polarization vector εe parallel to the position vector component that the character table designates, εe · P^(γ) is nonvanishing. Suppose that ψs^(α) and ϕl^(β) are an initial state and a final state, respectively. At first glance, we would wish to use (18.200) and count how many times a representation β occurs in the direct-product representation D^(γ⊗α). It is often the case, however, that β is not an irreducible representation.

Even in such a case, if either α or β belongs to a totally symmetric representation, the handling will be easier. This situation corresponds to the case where the initial or the final electronic configuration forms a closed shell, whose electronic state is totally symmetric [1]. The former occurs when we consider the optical absorption that takes place, e.g., in a molecule in its ground electronic state. The latter corresponds to an optical emission that ends up in the ground state. Let us consider the former case. Since Oj^(γ) is Hermitian, we rewrite (19.48) as

⟨ϕl^(β)|Oj^(γ)ψs^(α)⟩ = ⟨Oj^(γ)ϕl^(β)|ψs^(α)⟩ = ⟨ψs^(α)|Oj^(γ)ϕl^(β)⟩*,  (19.51)

where we assume that ψs^(α) is a ground state having a closed-shell electronic configuration. Therefore, ψs^(α) belongs to a totally symmetric irreducible representation. For (19.48) not to vanish, therefore, we may alternatively state that it is necessary for

D^(γ⊗β) = D^(γ) ⊗ D^(β)

to contain D^(α) belonging to a totally symmetric representation. Note that in group theory we usually write D^(γ) ⊗ D^(β) instead of D^(γ)D^(β); see (18.191).

If ϕl^(β) belongs to a reducible representation, from (18.80) we have

D^{(\beta)} = \sum_{\omega} q_{\omega} D^{(\omega)},

where D^(ω) belongs to an irreducible representation ω. Then, we get

D^{(\gamma)} \otimes D^{(\beta)} = \sum_{\omega} q_{\omega} D^{(\gamma)} \otimes D^{(\omega)}.

Thus, applying (18.199) and (18.200), we can obtain a direct sum of irreducible representations. After that, we examine whether the irreducible representation α is contained in D^(γ) ⊗ D^(β). Since the totally symmetric representation is one-dimensional, s = 1 in (19.51).

19.3 Calculation Procedures of Molecular Orbitals (MOs)

We describe a brief outline of the MO method based on the LCAO (LCAO-MO). Suppose that a molecule contains n atomic orbitals and that each MO ψi (1 ≤ i ≤ n) comprises a linear combination of those n atomic orbitals ϕk (1 ≤ k ≤ n). That is, we have

\psi_i = \sum_{k=1}^{n} c_{ki} \phi_k  (1 ≤ i ≤ n),  (19.52)

where the cki are complex coefficients. We assume that the ϕk are normalized. That is,

⟨ϕk|ϕk⟩ ≡ ∫ ϕk* ϕk dτ = 1,  (19.53)

where dτ implies that the integration is taken over ℝ³. The notation ⟨g|f⟩ means an inner product defined in Sect. 13.1. The inner product is usually defined by a definite integral of g*f whose integration range covers a part or all of ℝ³, depending on the constitution of the physical system.

First we try to solve the Schrödinger equation given as an eigenvalue equation. The said equation is described as

Hψ = λψ  or  (H − λ)ψ = 0.  (19.54)

Replacing ψ with (19.52), we have

\sum_{k=1}^{n} c_k (H - \lambda)\phi_k = 0,  (19.55)

where the subscript i in (19.52) has been omitted for simplicity. Multiplying by ϕj* from the left and integrating over the whole of ℝ³, we have

\sum_{k=1}^{n} c_k \int \phi_j^{*} (H - \lambda) \phi_k \, d\tau = 0.  (19.56)

Rewriting (19.56), we get

\sum_{k=1}^{n} c_k \int \left( \phi_j^{*} H \phi_k - \lambda \phi_j^{*} \phi_k \right) d\tau = 0.  (19.57)

Here let us define the following quantities:

H_{jk} = \int \phi_j^{*} H \phi_k \, d\tau and S_{jk} = \int \phi_j^{*} \phi_k \, d\tau,  (19.58)

where Sii = 1 (1 ≤ i ≤ n) because the ϕi are normalized. Then we get

\sum_{k=1}^{n} c_k \left( H_{jk} - \lambda S_{jk} \right) = 0.  (19.59)

Rewriting (19.59) in matrix form, we get

\begin{pmatrix} H_{11} - \lambda & \cdots & H_{1n} - \lambda S_{1n} \\ \vdots & \ddots & \vdots \\ H_{n1} - \lambda S_{n1} & \cdots & H_{nn} - \lambda \end{pmatrix} \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix} = 0,  (19.60)

where note that Sii = 1 (1 ≤ i ≤ n) because of the normalization of ϕi. Suppose that by solving (19.59) or (19.60) we get λi (1 ≤ i ≤ n), some of which may be identical (i.e., the degenerate case), and obtain n different column eigenvectors corresponding to the n eigenvalues λi. In light of (12.4) of Sect. 12.1, the following condition must be satisfied in order to get eigenvectors for which not all ck are zero:

\begin{vmatrix} H_{11} - \lambda & \cdots & H_{1n} - \lambda S_{1n} \\ \vdots & \ddots & \vdots \\ H_{n1} - \lambda S_{n1} & \cdots & H_{nn} - \lambda \end{vmatrix} = 0.  (19.61)

Equation (19.61) is called a secular equation. This equation involves a determinant of order n, and so we are expected to get n roots, some of which may be identical (i.e., a degenerate case).
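In numerical practice, (19.60) and (19.61) constitute a generalized eigenvalue problem Hc = λSc, which standard linear algebra libraries solve directly. The following Python sketch assumes SciPy and purely illustrative two-orbital values α = −10, β = −1, S₁₂ = 0.25 (these numbers anticipate the notation of Sect. 19.4 and are not prescribed here):

```python
# A minimal numerical sketch for the secular equation (19.61), assuming SciPy
# and illustrative two-orbital values alpha = -10, beta = -1, S12 = 0.25.
import numpy as np
from scipy.linalg import eigh

alpha, beta, S12 = -10.0, -1.0, 0.25
H = np.array([[alpha, beta],
              [beta,  alpha]])            # H_jk of (19.58)
S = np.array([[1.0, S12],
              [S12, 1.0]])                # S_jk of (19.58), S_ii = 1

# eigh with two arguments solves H c = lambda S c, i.e., det(H - lambda S) = 0.
lam, c = eigh(H, S)
print(lam)                                # [-12.  -8.8]
print((alpha - beta) / (1 - S12), (alpha + beta) / (1 + S12))  # -12.0 -8.8
```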
In the above discussion, it is useful to introduce the following notation:

⟨ϕj|Hϕk⟩ ≡ Hjk = ∫ ϕj* H ϕk dτ and ⟨ϕj|ϕk⟩ ≡ Sjk = ∫ ϕj* ϕk dτ.  (19.62)

This notation has already been introduced in Sect. 1.4. Equation (19.62) certainly satisfies the definition of the inner product described in (13.2)–(13.4); readers, please check it. On the basis of (13.64), we have

⟨y|Hx⟩ = ⟨xH†|y⟩.  (19.63)

Since H is Hermitian, H† = H. That is,

⟨y|Hx⟩ = ⟨xH|y⟩.  (19.64)

To solve (19.61) for a sufficiently large number n is usually formidable. However, if we can find appropriate conditions, (19.61) can be solved fairly easily. The essential point rests upon how we deal with the off-diagonal elements Hjk and Sjk. If we are able to choose the basis vectors appropriately so that we get

Hjk = 0 and Sjk = 0 for j ≠ k,  (19.65)

the secular equation is reduced to the simple form

\begin{vmatrix} \tilde{H}_{11} - \lambda\tilde{S}_{11} & & & \\ & \tilde{H}_{22} - \lambda\tilde{S}_{22} & & \\ & & \ddots & \\ & & & \tilde{H}_{nn} - \lambda\tilde{S}_{nn} \end{vmatrix} = 0,  (19.66)

where all the off-diagonal elements are zero and the eigenvalues λ are given by

λi = H̃ii/S̃ii.  (19.67)

This means that the eigenvalue equation has automatically been solved.

The best way to achieve this is to choose basis functions (vectors) that conform to the symmetry of the molecule. That is, we "shuffle" the atomic orbitals as follows:

\xi_i = \sum_{k=1}^{n} d_{ki} \phi_k  (1 ≤ i ≤ n),  (19.68)

where the ξi are new functions chosen instead of the ψi of (19.52). That is, we construct linear combinations of the atomic orbitals that belong to individual irreducible representations. Such a linear combination is called a symmetry-adapted linear combination (SALC). We remark that not all atomic orbitals are necessarily included in a SALC. Then we have

\tilde{H}_{jk} = \int \xi_j^{*} H \xi_k \, d\tau and \tilde{S}_{jk} = \int \xi_j^{*} \xi_k \, d\tau.  (19.69)

Thus, instead of (19.61), we have the new secular equation

\det\left( \tilde{H}_{jk} - \lambda \tilde{S}_{jk} \right) = 0.  (19.70)

Since the Hamiltonian H is totally symmetric, in terms of the direct-product representation Hξk in (19.69) belongs to the same irreducible representation as ξk. This can be understood intuitively; but to assert it, use (18.76) and (18.200) along with the fact that the characters of the totally symmetric representation are all 1.

In light of (18.171) and (18.181), if ξk and ξj belong to different irreducible representations, H̃jk and S̃jk both vanish at once. From (18.172), at the same time, ξk and ξj are orthogonal to each other. The most ideal situation is to get (19.66) with all the off-diagonal elements vanishing. Regarding the diagonal elements of (19.70), we always have the direct product of the same representation and, hence, the integrals (19.69) do not vanish. Thus, we get a powerful guideline for casting (19.70) in as simple a form as possible.

If the SALC orbitals ξj and ξk belong to the same irreducible representation, the relevant matrix elements are generally nonvanishing. Even in that case, however, according to Theorem 13.2 we can construct a set of orthonormal vectors by taking appropriate linear combinations of ξj and ξk. The resulting vectors naturally belong to the same irreducible representation. This can be done by solving the secular equation as described below. On top of that, the Hermiticity of the Hamiltonian ensures that those vectors can be rendered orthogonal (see Theorem 14.5). If n electrons are present in a molecule, we deal with n SALC orbitals, which we accordingly view as vectors. In terms of representation theory, these vectors span a representation space where the vectors undergo symmetry operations. Thus, we can construct orthonormal basis vectors belonging to various irreducible representations throughout the representation space.
To address our problems, we take the following procedures: (i) First we have to determine the symmetry species (i.e., point group) of the molecule. (ii) Next, we pick up

the atomic orbitals contained in the molecule. (iii) We examine how those atomic orbitals are transformed by the symmetry operations. Since the symmetry operations are represented by (n, n) unitary matrices, we can readily decide the trace of each matrix. Generally, that matrix representation is reducible, and so we reduce the representation according to the procedures of (18.81)–(18.83). Thus, we are able to determine how many irreducible representations are contained in the original reducible representation. (iv) After having determined the irreducible representations, we construct SALCs and constitute a secular equation using them. (v) Solving the secular equation, we determine the molecular orbital energies and decide the functional forms of the corresponding MOs. (vi) We examine physicochemical properties such as optical transitions within the molecule.

In the procedure (iii) above, if the same irreducible representation appears more than once, we have to solve a secular equation of order two or more. Even in that case, we can render the set of resulting eigenvectors orthogonal to one another during the process of solving the problem.
To construct the abovementioned SALCs, we make the most of the projection operators defined in Sect. 14.1. In (18.147), putting m = l, we have

P_{l(l)}^{(\alpha)} = \frac{d_\alpha}{n} \sum_{g} D_{ll}^{(\alpha)}(g)^{*} \, g.  (19.71)

Or we can choose a projection operator P^(α) described as

P^{(\alpha)} = \frac{d_\alpha}{n} \sum_{g} \left[ \sum_{l=1}^{d_\alpha} D_{ll}^{(\alpha)}(g)^{*} \right] g = \frac{d_\alpha}{n} \sum_{g} \chi^{(\alpha)}(g)^{*} \, g.  (18.174)

In a one-dimensional representation, P_{l(l)}^(α) and P^(α) are identical. As expressed in (18.155) and (18.175), these projection operators act on an arbitrary function and extract the specific component(s) pertinent to a specific irreducible representation of the point group to which the molecule belongs.

At first glance the definitions of P_{l(l)}^(α) and P^(α) look daunting, but the use of character tables relieves the calculation task. In particular, all the irreducible representations are one-dimensional (i.e., just a number!) for Abelian groups, as mentioned in Sect. 18.6. For this, notice that the individual group elements form a class by themselves. Therefore, the utilization of character tables becomes easier for Abelian groups. Even when we encounter a case where the dimension of a representation is more than one (i.e., the case of noncommutative groups), D_{ll}^(α) can be determined without much difficulty.

19.4 MO Calculations Based on π-Electron Approximation

On the basis of the general argument developed in Sects. 19.2 and 19.3, we perform molecular orbital calculations on individual molecules. First, we apply group theory to the molecular orbital calculations of aromatic hydrocarbons such as ethylene, the cyclopropenyl radical and cation, benzene, and the allyl radical. In these cases, in addition to adopting the molecular orbital theory, we adopt the so-called "π-electron approximation." For the first three molecules, we will not have to "solve" a secular equation. For the allyl radical, however, we deal with two SALC orbitals belonging to the same irreducible representation and, hence, the final molecular orbitals must be obtained by solving the secular equation.

19.4.1 Ethylene

We start with one of the simplest examples, ethylene. Ethylene is a planar molecule and belongs to the D2h symmetry (see Sect. 17.2). In the molecule two π-electrons extend vertically to the molecular plane toward the upper and lower directions (Fig. 19.5). The molecular plane forms a node of the atomic 2pz orbitals; that is, those atomic orbitals change sign across the molecular plane (i.e., the xy-plane).

Fig. 19.5 Ethylene molecule placed on the xyz-coordinate system. Two pz atomic orbitals of carbon are denoted by ϕ1 and ϕ2. The atomic orbitals change sign across the molecular plane (i.e., the xy-plane)

In Fig. 19.5, the two pz atomic orbitals of carbon are denoted by ϕ1 and ϕ2. We should be able to construct basis vectors using ϕ1 and ϕ2. Corresponding to the two atomic orbitals, we are dealing with a two-dimensional vector space. Let us consider how ϕ1 and ϕ2 are transformed by a symmetry operation. First we examine the operation C2(z). This operation exchanges ϕ1 and ϕ2. That is, we have

C2(z)(ϕ1) = ϕ2  (19.72)

and

C2(z)(ϕ2) = ϕ1.  (19.73)

Equations (19.72) and (19.73) can be combined into the following equation:

(ϕ1 ϕ2) C2(z) = (ϕ2 ϕ1).  (19.74)

Thus, using a matrix representation, we have

C_2(z) = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.  (19.75)

In Sect. 17.2, we had

R_{z\theta} = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}  (19.76)

or

C_2(z) = \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.  (19.77)

Whereas (19.76) and (19.77) show the transformation of a position vector in ℝ³, (19.75) represents the transformation of functions in a two-dimensional vector space. A vector space composed of functions may be finite-dimensional or infinite-dimensional. We have already encountered the latter case in Part I, where we dealt with the quantum mechanics of a harmonic oscillator. Such a function space is often referred to as a Hilbert space. An essential feature of (19.75) is that the trace (or character) of the matrix is zero.
Let us consider another transformation, C2(y). In this case the situation differs from the above in that ϕ1 is converted to −ϕ1 by C2(y), and ϕ2 to −ϕ2. Notice again that the molecular plane forms a node of the atomic p-orbitals. Thus, C2(y) is represented by

C_2(y) = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}.  (19.78)

The trace of the matrix C2(y) is −2. In this way, choosing ϕ1 and ϕ2 as the basis functions for the representation of D2h, we can determine the characters of the individual symmetry transformations of the atomic 2pz orbitals in ethylene belonging to D2h. We collect the results in Table 19.1.

Next, we examine what kinds of irreducible representations are contained in our present representation of ethylene. To do this, we need the character table of D2h (see Table 19.2).

Table 19.1 Characters of the individual symmetry transformations of the atomic 2pz orbitals in ethylene

D2h | E | C2(z) | C2(y) | C2(x) | i  | σ(xy) | σ(zx) | σ(yz)
Γ   | 2 | 0     | −2    | 0     | 0  | −2    | 0     | 2

Table 19.2 Character table of D2h

D2h | E | C2(z) | C2(y) | C2(x) | i  | σ(xy) | σ(zx) | σ(yz) |
Ag  | 1 |  1    |  1    |  1    |  1 |  1    |  1    |  1    | x², y², z²
B1g | 1 |  1    | −1    | −1    |  1 |  1    | −1    | −1    | xy
B2g | 1 | −1    |  1    | −1    |  1 | −1    |  1    | −1    | zx
B3g | 1 | −1    | −1    |  1    |  1 | −1    | −1    |  1    | yz
Au  | 1 |  1    |  1    |  1    | −1 | −1    | −1    | −1    |
B1u | 1 |  1    | −1    | −1    | −1 | −1    |  1    |  1    | z
B2u | 1 | −1    |  1    | −1    | −1 |  1    | −1    |  1    | y
B3u | 1 | −1    | −1    |  1    | −1 |  1    |  1    | −1    | x

If a specific kind of irreducible representation is contained, we want to examine how many times that specific representation occurs. Equation (18.83) is very useful for this purpose. In the present case, n = 8 in (18.83). Also taking into account (18.79) and (18.80), we get

Γ = B1u ⊕ B3g,  (19.79)

where Γ is the reducible representation for the set consisting of the two pz atomic orbitals of ethylene. Equation (19.79) clearly shows that Γ is a direct sum of the two irreducible representations B1u and B3g that belong to D2h. Instead of (19.79), we usually express the direct sum simply in the quantum chemical notation

Γ = B1u + B3g.  (19.80)
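The reduction (19.79) is easy to reproduce by machine. The following Python sketch assumes the character table of Table 19.2 entered by hand and applies (18.83), qα = (1/n) Σg χ^(α)(g) χ(g) with n = 8, to the characters Γ of Table 19.1:

```python
# A sketch of the reduction (19.79), assuming the hand-entered D2h character
# table of Table 19.2 and the reduction formula (18.83).
D2h = {  # class order: E, C2(z), C2(y), C2(x), i, s(xy), s(zx), s(yz)
    "Ag":  [1,  1,  1,  1,  1,  1,  1,  1],
    "B1g": [1,  1, -1, -1,  1,  1, -1, -1],
    "B2g": [1, -1,  1, -1,  1, -1,  1, -1],
    "B3g": [1, -1, -1,  1,  1, -1, -1,  1],
    "Au":  [1,  1,  1,  1, -1, -1, -1, -1],
    "B1u": [1,  1, -1, -1, -1, -1,  1,  1],
    "B2u": [1, -1,  1, -1, -1,  1, -1,  1],
    "B3u": [1, -1, -1,  1, -1,  1,  1, -1],
}
Gamma = [2, 0, -2, 0, 0, -2, 0, 2]          # Table 19.1 (the two 2pz orbitals)

for name, chi in D2h.items():
    q = sum(a * b for a, b in zip(chi, Gamma)) / 8
    if q:
        print(name, int(q))                  # B1u 1 and B3g 1, i.e., (19.80)
```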

As a next step, we are going to find appropriate basis functions belonging to the two irreducible representations of (19.80). For this purpose, we use the projection operators expressed as

P^{(\alpha)} = \frac{d_\alpha}{n} \sum_{g} \chi^{(\alpha)}(g)^{*} \, g.  (18.174)

Taking ϕ1 for instance, we apply (18.174) to ϕ1. That is,

Table 19.3 Character table of C2

C2 | E | C2 |
A  | 1 |  1 | z; x², y², z², xy
B  | 1 | −1 | x, y; yz, zx

P^{(B_{1u})}\phi_1 = \frac{1}{8} \sum_{g} \chi^{(B_{1u})}(g) \, g\phi_1 = \frac{1}{8} \left[ 1\cdot\phi_1 + 1\cdot\phi_2 + (-1)(-\phi_1) + (-1)(-\phi_2) + (-1)(-\phi_2) + (-1)(-\phi_1) + 1\cdot\phi_2 + 1\cdot\phi_1 \right] = \frac{1}{2}(\phi_1 + \phi_2).  (19.81)

Also for B3g, we apply (18.174) to ϕ1 and get

P^{(B_{3g})}\phi_1 = \frac{1}{8} \sum_{g} \chi^{(B_{3g})}(g) \, g\phi_1 = \frac{1}{8} \left[ 1\cdot\phi_1 + (-1)\phi_2 + (-1)(-\phi_1) + 1\cdot(-\phi_2) + 1\cdot(-\phi_2) + (-1)(-\phi_1) + (-1)\phi_2 + 1\cdot\phi_1 \right] = \frac{1}{2}(\phi_1 - \phi_2).  (19.82)

Thus, after going through routine but sure procedures, we have reached appropriate basis functions that belong to each irreducible representation.
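The projections (19.81) and (19.82) can likewise be checked in matrix form. In the sketch below (a Python illustration, not part of the text's derivation), each of the eight group elements acts on the basis (ϕ1, ϕ2) by the 2 × 2 matrix implied by the transformation rules above, e.g., (19.75):

```python
# A sketch of (18.174) in matrix form for ethylene: each D2h element acts on
# the basis (phi1, phi2) by a 2x2 matrix, and the projector extracts the SALC.
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])   # exchanges phi1 and phi2, cf. (19.75)

# Action of the eight D2h elements on (phi1, phi2), in the class order
# E, C2(z), C2(y), C2(x), i, s(xy), s(zx), s(yz):
ops = [I2, X, -I2, -X, -X, -I2, X, I2]
chi_B1u = [1, 1, -1, -1, -1, -1, 1, 1]
chi_B3g = [1, -1, -1, 1, 1, -1, -1, 1]

P_B1u = sum(c * D for c, D in zip(chi_B1u, ops)) / 8
P_B3g = sum(c * D for c, D in zip(chi_B3g, ops)) / 8
phi1 = np.array([1.0, 0.0])
print(P_B1u @ phi1)    # [0.5  0.5] ->  (phi1 + phi2)/2, cf. (19.81)
print(P_B3g @ phi1)    # [0.5 -0.5] ->  (phi1 - phi2)/2, cf. (19.82)
```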
As mentioned earlier (see Table 17.5), D2h can be expressed as a direct-product group, and the C2 group is contained in D2h as a subgroup. We can use this for the present analysis. Suppose now that we have a group ℊ and that H is a subgroup of ℊ. Let g be an arbitrary element of ℊ and let D(g) be a representation of g. Meanwhile, let h be an arbitrary element of H. Then, with ∀h ∈ H, the collection of D(h) is a representation of H. We write this relation as

D ↓ H.  (19.83)

This representation is called a subduced representation of D onto H. Table 19.3 shows the character table of the irreducible representations of C2. In the present case, we are thinking of C2(z) as C2; see Table 17.5 and Fig. 19.5. Then we have

B1u ↓ C2 = A and B3g ↓ C2 = B.  (19.84)

The expression (19.84) is called a compatibility relation. Note that in (19.84) C2 is not a symmetry operation but a subgroup of D2h. Thus, (19.81) is reduced to

P^{(A)}\phi_1 = \frac{1}{2} \sum_{g} \chi^{(A)}(g) \, g\phi_1 = \frac{1}{2} \left[ 1\cdot\phi_1 + 1\cdot\phi_2 \right] = \frac{1}{2}(\phi_1 + \phi_2).  (19.85)

Also, we have

P^{(B)}\phi_1 = \frac{1}{2} \sum_{g} \chi^{(B)}(g) \, g\phi_1 = \frac{1}{2} \left[ 1\cdot\phi_1 + (-1)\phi_2 \right] = \frac{1}{2}(\phi_1 - \phi_2).  (19.86)

The relations (19.85) and (19.86) are essentially the same as (19.81) and (19.82), respectively.

We can easily construct the character table of C2 (see Table 19.3). There should be two irreducible representations. For the totally symmetric representation, we allocate 1 to each symmetry operation. For the other representation we allocate 1 to the identity element E and −1 to the element C2, so that the rows and columns of the character table are orthogonal to each other.

Readers might well ask why we bother to take such circuitous approaches to reach predictable results such as (19.81) and (19.82) or (19.85) and (19.86). This question seems natural when we are dealing with a case where the number of basis vectors (i.e., the dimension of the vector space) is small, typically 2 as in the present case. With increasing dimension of the vector space, however, seeking and determining the appropriate SALCs becomes increasingly complicated and difficult. Under such circumstances, the projection operator is an indispensable tool to address the problem.
We have a two-dimensional secular equation to solve:

\begin{vmatrix} \tilde{H}_{11} - \lambda\tilde{S}_{11} & \tilde{H}_{12} - \lambda\tilde{S}_{12} \\ \tilde{H}_{21} - \lambda\tilde{S}_{21} & \tilde{H}_{22} - \lambda\tilde{S}_{22} \end{vmatrix} = 0.  (19.87)

In the above argument, (1/2)(ϕ1 + ϕ2) belongs to B1u and (1/2)(ϕ1 − ϕ2) belongs to B3g. Since they belong to different irreducible representations, we have

H̃12 = S̃12 = 0.  (19.88)

Thus, the secular equation (19.87) is reduced to

\begin{vmatrix} \tilde{H}_{11} - \lambda\tilde{S}_{11} & 0 \\ 0 & \tilde{H}_{22} - \lambda\tilde{S}_{22} \end{vmatrix} = 0.  (19.89)

As expected, (19.89) has automatically been solved to give the solutions

λ1 = H̃11/S̃11 and λ2 = H̃22/S̃22.  (19.90)

The next step is to determine the energy eigenvalues of the molecule. Note here that the role of the SALCs is to determine a suitable irreducible representation that corresponds to the "direction" of a vector. As a coefficient keeps the direction of a vector unaltered, it is of secondary importance. The final forms of the normalized MOs can be decided last; that procedure includes the normalization of the vectors. Thus, we tentatively choose the following functions for the SALCs:

ξ1 = ϕ1 + ϕ2 and ξ2 = ϕ1 − ϕ2.

Then, we have

\tilde{H}_{11} = \int \xi_1^{*} H \xi_1 \, d\tau = \int (\phi_1 + \phi_2)^{*} H (\phi_1 + \phi_2) \, d\tau = \int \phi_1 H \phi_1 \, d\tau + \int \phi_1 H \phi_2 \, d\tau + \int \phi_2 H \phi_1 \, d\tau + \int \phi_2 H \phi_2 \, d\tau = H_{11} + H_{12} + H_{21} + H_{22} = H_{11} + 2H_{12} + H_{22}.  (19.91)

Similarly, we have

\tilde{H}_{22} = \int \xi_2^{*} H \xi_2 \, d\tau = H_{11} - 2H_{12} + H_{22}.  (19.92)

The last equality comes from the fact that we have chosen real functions for ϕ1 and ϕ2, as studied in Part I. Moreover, we have

H_{11} \equiv \int \phi_1^{*} H \phi_1 \, d\tau = \int \phi_1 H \phi_1 \, d\tau = \int \phi_2 H \phi_2 \, d\tau = H_{22},
H_{12} \equiv \int \phi_1^{*} H \phi_2 \, d\tau = \int \phi_1 H \phi_2 \, d\tau = \int \phi_2 H \phi_1 \, d\tau = \int \phi_2^{*} H \phi_1 \, d\tau = H_{21}.  (19.93)

The first equation comes from the fact that both H11 and H22 are calculated using the same 2pz atomic orbital of carbon. The second equation results from the fact that H is Hermitian. Notice that both ϕ1 and ϕ2 are real functions. Following convention, we denote

α ≡ H11 = H22 and β ≡ H12,  (19.94)

where α is called the Coulomb integral and β the resonance integral. Then, we have

H̃11 = 2(α + β).  (19.95)

In a similar manner, we get

H̃22 = 2(α − β).  (19.96)

Meanwhile, we have

\tilde{S}_{11} = \langle \xi_1 | \xi_1 \rangle = \int (\phi_1 + \phi_2)^{*} (\phi_1 + \phi_2) \, d\tau = \int (\phi_1 + \phi_2)^2 \, d\tau = \int \phi_1^2 \, d\tau + \int \phi_2^2 \, d\tau + 2\int \phi_1 \phi_2 \, d\tau = 2 + 2\int \phi_1 \phi_2 \, d\tau,  (19.97)

where we used the fact that ϕ1 and ϕ2 have been normalized. Also following convention, we denote

S \equiv \int \phi_1 \phi_2 \, d\tau = S_{12},  (19.98)

where S is called the overlap integral. Thus, we have

S̃11 = 2(1 + S).  (19.99)

Similarly, we get

S̃22 = 2(1 − S).  (19.100)

Substituting (19.95) and (19.96) along with (19.99) and (19.100) into (19.90), we get the energy eigenvalues

λ1 = (α + β)/(1 + S) and λ2 = (α − β)/(1 − S).  (19.101)

From (19.97), we get

||ξ1|| = √⟨ξ1|ξ1⟩ = √(2(1 + S)).  (19.102)

Thus, for one of the MOs, corresponding to the energy eigenvalue λ1, we get

Ψ1 = |ξ1⟩ / ||ξ1|| = (ϕ1 + ϕ2)/√(2(1 + S)).  (19.103)

Also, we have

||ξ2|| = √⟨ξ2|ξ2⟩ = √(2(1 − S)).  (19.104)

For the other MO, corresponding to the energy eigenvalue λ2, we get



Fig. 19.6 HOMO and LUMO energy levels of ethylene and their assignments

Ψ2 = |ξ2⟩ / ||ξ2|| = (ϕ1 − ϕ2)/√(2(1 − S)).  (19.105)

Note that both the normalized MOs and the energy eigenvalues depend upon whether we ignore the overlap integral, as is the case with the simplest Hückel approximation. Nonetheless, the MO functional forms (in this case either ϕ1 + ϕ2 or ϕ1 − ϕ2) remain the same regardless of the approximation level for the overlap integral. Regarding the quantitative evaluation of α, β, and S, we will briefly mention it later.
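As a small consistency check (again in Python, with the same illustrative overlap S = 0.25 assumed in the sketch of Sect. 19.3), the normalized MOs (19.103) and (19.105) are orthonormal with respect to the overlap metric:

```python
# A quick check, assuming an illustrative overlap S = 0.25: the normalized MOs
# (19.103) and (19.105) satisfy <Psi_i|Psi_j> = delta_ij under the metric S.
import numpy as np

S12 = 0.25
S = np.array([[1.0, S12], [S12, 1.0]])                  # overlap of (phi1, phi2)
Psi1 = np.array([1.0,  1.0]) / np.sqrt(2 * (1 + S12))   # (19.103)
Psi2 = np.array([1.0, -1.0]) / np.sqrt(2 * (1 - S12))   # (19.105)
C = np.array([Psi1, Psi2])
print(np.round(C @ S @ C.T, 12))                        # identity matrix
```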
Once we have decided the symmetry of an MO (or the irreducible representation to which the orbital belongs) and its energy eigenvalue, we are in a position to examine various physicochemical properties of the molecule. One of them is the optical transition within a molecule, particularly the electric dipole transition. In most cases, the most important transition is that occurring between the highest occupied molecular orbital (HOMO) and the lowest unoccupied molecular orbital (LUMO). In the case of ethylene, those levels are depicted in Fig. 19.6. In the ground state (i.e., the most stable state), two electrons occupy the B1u state (HOMO). The excited state is assigned to B3g (LUMO). A ground state that consists only of fully occupied MOs belongs to the totally symmetric representation; in the case of ethylene, the ground state belongs to Ag accordingly.

If a photon is absorbed by a molecule, the molecule is excited by the energy of the photon. In ethylene, this process takes place by exciting an electron from B1u to B3g. The resulting electronic state ends up with one electron remaining in B1u and another excited to B3g (the final state). Thus, the representation of the final-state electronic configuration (denoted by Γf) is described as

Γf = B1u ⊗ B3g.  (19.106)

That is, the final excited state is expressed as a direct product of the states associated with the optical transition. To determine the symmetry of the final state, we use (18.198) and (18.200). If Γf in (19.106) is reducible, using (18.200) we can

determine the number of times that the individual representations occur. Calculating χ^(B1u)(g) χ^(B3g)(g) for each group element, we readily get the result. Using the character table, we have

Γf = B2u.  (19.107)

Thus, we find that the transition is Ag ⟶ B2u, where Ag is called the initial-state electronic configuration and B2u the final-state electronic configuration. This transition is characterized by the electric dipole transition moment operator P. Here we have an important physical quantity, the transition matrix element. This quantity Tfi is approximated by

Tfi ≈ ⟨Θf | εe · P | Θi⟩,  (19.108)

where εe is a unit polarization vector of the electric field; Θi and Θf are the initial-state and final-state electronic configurations, respectively. The description (19.108) parallels (4.5). Notice that in (4.5) we dealt with single-electron systems such as a particle confined in a square-well potential, a single one-dimensional harmonic oscillator, and an electron in a hydrogen atom.
In the present case, however, we are dealing with a two-electron system, ethylene. Consequently, we cannot describe Θf by a simple wave function but must use a more elaborate function. Nonetheless, when we discuss the optical transition of a molecule, it is often the case that, when studying optical absorption or emission, we first wish to know whether such a phenomenon truly takes place. In such a case, a qualitative prediction is of great importance. This can be done by judging whether the integral (19.108) vanishes. If the integral does not vanish, the relevant transition is said to be allowed. If, on the other hand, the integral vanishes, the transition is called forbidden. In this context, a systematic approach based on group theory is a powerful tool.

Let us consider the optical absorption of ethylene. For the transition matrix element, we have

Tfi = ⟨Θf(B2u) | εe · P | Θi(Ag)⟩.  (19.109)

Again, note that a closed-shell electronic configuration belongs to the totally symmetric representation [1]. Suppose that the position vector x belongs to an irreducible representation η. A necessary condition for (19.109) not to vanish is that D^(η) ⊗ D^(Ag) contains the irreducible representation D^(B2u). In the present case, all the representations are one-dimensional, and so we can use χ^(ω) instead of D^(ω), where ω denotes an arbitrary irreducible representation of D2h. This procedure is straightforward, as seen in Sect. 19.2.

Moreover, if the characters are real (as is the case with many symmetry groups, including the point group D2h), the situation is easier. Suppose in general that we are examining whether the following matrix element vanishes:

M_{fi} = \left\langle \Phi_f^{(\beta)} \middle| O^{(\gamma)} \middle| \Phi_i^{(\alpha)} \right\rangle,  (19.110)

where α, β, and γ stand for irreducible representations and O is an appropriate operator. In this case, (18.200) can be rewritten as

q_\alpha = \frac{1}{n} \sum_g \chi^{(\alpha)}(g)^{*} \chi^{(\gamma\otimes\beta)}(g) = \frac{1}{n} \sum_g \chi^{(\alpha)}(g) \chi^{(\gamma\otimes\beta)}(g) = \frac{1}{n} \sum_g \chi^{(\alpha)}(g) \chi^{(\gamma)}(g) \chi^{(\beta)}(g) = \frac{1}{n} \sum_g \chi^{(\gamma)}(g) \chi^{(\alpha)}(g) \chi^{(\beta)}(g) = \frac{1}{n} \sum_g \chi^{(\gamma)}(g) \chi^{(\alpha\otimes\beta)}(g) = \frac{1}{n} \sum_g \chi^{(\gamma)}(g)^{*} \chi^{(\alpha\otimes\beta)}(g) = q_\gamma.  (19.111)

Consequently, the number of times that D^(γ) appears in D^(α⊗β) is identical to the number of times that D^(α) appears in D^(γ⊗β). Thus, it suffices to examine whether D^(γ⊗β) contains D^(α). In other words, we only have to examine whether qα ≠ 0 in (19.111).
Thus, applying (19.111) to (19.109), we examine whether D^(B2u⊗Ag) contains the irreducible representation D^(η) that is related to x. We easily get

B2u = B2u ⊗ Ag.  (19.112)

Therefore, if εe · P (or x) belongs to B2u, the transition is allowed. Consulting the character table, we find that y belongs to B2u. In this case, (19.111) in fact reads

q_{B_{2u}} = \frac{1}{8} \left[ 1\cdot 1 + (-1)(-1) + 1\cdot 1 + (-1)(-1) + (-1)(-1) + 1\cdot 1 + (-1)(-1) + 1\cdot 1 \right] = 1.

Equivalently, we simply write

B2u ⊗ B2u = Ag.

This means that if light polarized along the y-axis is incident, i.e., εe is parallel to the y-axis, the transition is allowed. In that situation, ethylene is said to be polarized along the y-axis, or polarized in the direction of the y-axis. As the molecular axis is parallel to the y-axis, ethylene is polarized in the direction of the molecular axis. This is often the case with aromatic molecules having a well-defined molecular long axis, such as ethylene.

Next we examine whether ethylene is polarized along, e.g., the x-axis. From the character table of D2h (see Table 19.2), x belongs to B3u. In that case, using (19.111) we have

q_{B_{3u}} = \frac{1}{8} \left[ 1\cdot 1 + (-1)(-1) + (-1)\cdot 1 + 1\cdot(-1) + (-1)(-1) + 1\cdot 1 + 1\cdot(-1) + (-1)\cdot 1 \right] = 0.

This implies that B3u is not contained in B1u ⊗ B3g (= B2u).

The above results on the optical transitions are quite obvious. Once we get used to using a character table, such estimations can be made quickly.
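The polarization test just carried out by hand lends itself to the same tabular treatment. A sketch (assuming the same hand-entered D2h characters as in the earlier reduction example) runs through the three polarization directions at once:

```python
# A sketch of the selection-rule test (19.111) for the ethylene transition
# B1u -> B3g, assuming the hand-entered D2h character table. The transition is
# allowed for polarization eta iff
# q = (1/n) sum_g chi^(eta)(g) chi^(B1u)(g) chi^(B3g)(g) is nonzero.
D2h = {  # class order: E, C2(z), C2(y), C2(x), i, s(xy), s(zx), s(yz)
    "B1u": [1,  1, -1, -1, -1, -1,  1,  1],
    "B2u": [1, -1,  1, -1, -1,  1, -1,  1],
    "B3u": [1, -1, -1,  1, -1,  1,  1, -1],
    "B3g": [1, -1, -1,  1,  1, -1, -1,  1],
}
n = 8
for eta, axis in [("B1u", "z"), ("B2u", "y"), ("B3u", "x")]:
    q = sum(a * b * c for a, b, c in
            zip(D2h[eta], D2h["B1u"], D2h["B3g"])) / n
    print(axis, "allowed" if q else "forbidden")   # only y is allowed
```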

19.4.2 Cyclopropenyl Radical [1]

Let us think of another example, the cyclopropenyl radical, which has three resonance structures (Fig. 19.7). It is a planar molecule and its three carbon atoms form an equilateral triangle. Hence, the molecule belongs to the D3h symmetry (see Sect. 17.2). In the molecule three π-electrons extend vertically to the molecular plane toward the upper and lower directions. Suppose that the cyclopropenyl radical is placed on the xy-plane. The three pz atomic orbitals of the carbons located at the vertices of the equilateral triangle are denoted by ϕ1, ϕ2, and ϕ3 in Fig. 19.8. The orbitals are numbered clockwise so that the calculations are consistent with the conventional notation of the character table (vide infra). We assume that these π-orbitals take positive and negative signs on the upper and lower sides of the plane of the paper, respectively, with a nodal plane lying in the xy-plane. The situation is similar to that of ethylene, and the problem can be treated in parallel with the case of ethylene.

As in the case of ethylene, we can choose ϕ1, ϕ2, and ϕ3 as real functions. We construct basis vectors using these vectors. What we want to do to address the problem is as follows: (i) We examine how ϕ1, ϕ2, and ϕ3 are transformed by the symmetry operations of D3h. From this analysis, we can determine to which irreducible representations the SALCs should be assigned. (ii) On the basis of the knowledge obtained in (i), we construct proper MOs.

In Table 19.4 we list the character table of D3h along with the symmetry species. First we examine the traces (characters) of the representation matrices. As in the case of ethylene, a subgroup C3 of D3h plays an essential role (vide infra). This subgroup contains three group elements such that

C3 = {E, C3, C3²}.

In the above, we use the same notation for the group name and the group element, so we should be careful not to confuse them.

Fig. 19.7 Three resonance structures of the cyclopropenyl radical

Fig. 19.8 Three pz atomic orbitals of the carbons of the cyclopropenyl radical placed on the xy-plane. The carbon atoms are located at the vertices of an equilateral triangle. The atomic orbitals are denoted by ϕ1, ϕ2, and ϕ3

Table 19.4 Character table of D3h

D3h | E | 2C3 | 3C2 | σh | 2S3 | 3σv |
A1′ | 1 |  1  |  1  |  1 |  1  |  1  | x² + y², z²
A2′ | 1 |  1  | −1  |  1 |  1  | −1  |
E′  | 2 | −1  |  0  |  2 | −1  |  0  | (x, y); (x² − y², xy)
A1″ | 1 |  1  |  1  | −1 | −1  | −1  |
A2″ | 1 |  1  | −1  | −1 | −1  |  1  | z
E″  | 2 | −1  |  0  | −2 |  1  |  0  | (yz, zx)

They are transformed as follows:

C3(z)(ϕ1) = ϕ3,  C3(z)(ϕ2) = ϕ1,  and  C3(z)(ϕ3) = ϕ2;  (19.113)

C3²(z)(ϕ1) = ϕ2,  C3²(z)(ϕ2) = ϕ3,  and  C3²(z)(ϕ3) = ϕ1.  (19.114)

Equation (19.113) can be combined into the following form:

(ϕ1 ϕ2 ϕ3) C3(z) = (ϕ3 ϕ1 ϕ2).  (19.115)

Using a matrix representation, we have

C_3(z) = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}.  (19.116)

In turn, (19.114) is expressed as



C_3^2(z) = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}.  (19.117)

Both traces of (19.116) and (19.117) are zero.

Similarly, let us check the representation matrices of the other symmetry species. Of these, e.g., for the C2 related to the y-axis (see Fig. 19.8) we have

(ϕ1 ϕ2 ϕ3) C2 = (−ϕ1 −ϕ3 −ϕ2).  (19.118)

Therefore,

C_2 = \begin{pmatrix} -1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & -1 & 0 \end{pmatrix},  (19.119)

and we have a trace of −1 accordingly. Regarding σh, we have

(ϕ1 ϕ2 ϕ3) σh = (−ϕ1 −ϕ2 −ϕ3) and \sigma_h = \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix}.  (19.120)

In this way we can determine the trace of each symmetry transformation of the basis functions ϕ1, ϕ2, and ϕ3. We collect the resulting characters of the reducible representation Γ in Table 19.5. It can be reduced to a direct sum of irreducible representations according to the procedures given in (18.81)–(18.83) and using the character table of D3h (Table 19.4). As a result, we get

Γ = A2″ + E″.  (19.121)

As in the case of ethylene, we make the best use of the information on the subgroup C3 of D3h. Let us consider the subduced representation of D of D3h onto C3. For this, in Table 19.6 we show the character table of the irreducible representations of C3. We can readily construct this character table: there should be three irreducible representations. For the totally symmetric representation, we allocate 1 to each symmetry operation. For the other representations, we allocate 1 to the identity element E and the two other cube roots of 1, i.e., ε and ε* [where ε = exp(i2π/3)], to the elements C3 and C3², as shown, so that the rows and columns of the character table are orthogonal to each other.
Returning to the construction of the subduced representation, we have

Table 19.5 Characters of the individual symmetry transformations of the 2pz orbitals in the cyclopropenyl radical

D3h | E | 2C3 | 3C2 | σh | 2S3 | 3σv
Γ   | 3 |  0  | −1  | −3 |  0  |  1

Table 19.6 Character table of C3 [ε = exp(i2π/3)]

C3 | E | C3 | C3² |
A  | 1 |  1 |  1  | z; x² + y², z²
E  | 1 |  ε |  ε* | (x, y); (x² − y², xy), (yz, zx)
   | 1 | ε* |  ε  |

A2″ ↓ C3 = A and E″ ↓ C3 = E⁽¹⁾ + E⁽²⁾.  (19.122)

Then, corresponding to (18.147) we have

P^{(A)}\phi_1 = \frac{1}{3} \sum_g \chi^{(A)}(g)^{*} \, g\phi_1 = \frac{1}{3} \left[ 1\cdot\phi_1 + 1\cdot\phi_2 + 1\cdot\phi_3 \right] = \frac{1}{3}(\phi_1 + \phi_2 + \phi_3).  (19.123)

Also, we have

P^{[E^{(1)}]}\phi_1 = \frac{1}{3} \sum_g \chi^{[E^{(1)}]}(g)^{*} \, g\phi_1 = \frac{1}{3} \left[ 1\cdot\phi_1 + \varepsilon^{*}\phi_3 + (\varepsilon^{*})^{*}\phi_2 \right] = \frac{1}{3}(\phi_1 + \varepsilon\phi_2 + \varepsilon^{*}\phi_3).  (19.124)

Also, we get

P^{[E^{(2)}]}\phi_1 = \frac{1}{3} \sum_g \chi^{[E^{(2)}]}(g)^{*} \, g\phi_1 = \frac{1}{3} \left[ 1\cdot\phi_1 + \varepsilon\phi_3 + \varepsilon^{*}\phi_2 \right] = \frac{1}{3}(\phi_1 + \varepsilon^{*}\phi_2 + \varepsilon\phi_3).  (19.125)

Here is the best place to mention the eigenvalue of a symmetry operator. Let us designate the SALCs as follows:

ξ1 = ϕ1 + ϕ2 + ϕ3,  ξ2 = ϕ1 + εϕ2 + ε*ϕ3,  and  ξ3 = ϕ1 + ε*ϕ2 + εϕ3.  (19.126)

Let us choose C3 as a symmetry operator. Then we have


C3(ξ1) = C3(ϕ1 + ϕ2 + ϕ3) = C3ϕ1 + C3ϕ2 + C3ϕ3 = ϕ3 + ϕ1 + ϕ2 = ξ1,  (19.127)

where for the second equality we used the fact that C3 is a linear operator. That is, regarding the SALC ξ1, the eigenvalue of C3 is 1.

Similarly, we have

C3(ξ2) = C3(ϕ1 + εϕ2 + ε*ϕ3) = C3ϕ1 + εC3ϕ2 + ε*C3ϕ3 = ϕ3 + εϕ1 + ε*ϕ2 = ε(ϕ1 + εϕ2 + ε*ϕ3) = εξ2.  (19.128)

Furthermore, we get

C3(ξ3) = ε*ξ3.  (19.129)

Thus, we find that for the SALCs ξ2 and ξ3, the eigenvalues of C3 are ε and ε*, respectively. These pieces of information imply that if we appropriately choose proper functions as basis vectors, the character of a symmetry operation in a one-dimensional representation is identical to an eigenvalue of that symmetry operation (see Table 19.6).

For the last part of the calculations, we follow the procedures described in the case of ethylene. Using the above functions ξ1, ξ2, and ξ3, we construct the secular equation such that

\begin{vmatrix} \tilde{H}_{11} - \lambda\tilde{S}_{11} & & \\ & \tilde{H}_{22} - \lambda\tilde{S}_{22} & \\ & & \tilde{H}_{33} - \lambda\tilde{S}_{33} \end{vmatrix} = 0.  (19.130)

Since we have obtained three SALCs that are assigned to the individual irreducible representations A, E⁽¹⁾, and E⁽²⁾, these SALCs span the representation space V³. This makes the off-diagonal elements of the secular equation vanish, and it is simplified as in (19.130). Here, we have
\tilde{H}_{11} = \int \xi_1^{*} H \xi_1 \, d\tau = \int (\phi_1 + \phi_2 + \phi_3)^{*} H (\phi_1 + \phi_2 + \phi_3) \, d\tau = 3(\alpha + 2\beta),  (19.131)

where we used the same α and β as defined in (19.94). Strictly speaking, the α and β appearing in (19.131) should be slightly different from those of (19.94), because the Hamiltonian is different. This approximation, however, is sufficient for the present studies. In a similar manner, we get

\tilde{H}_{22} = \int \xi_2^{*} H \xi_2 \, d\tau = 3(\alpha - \beta) and \tilde{H}_{33} = \int \xi_3^{*} H \xi_3 \, d\tau = 3(\alpha - \beta).  (19.132)

Meanwhile, we have

\tilde{S}_{11} = \langle \xi_1 | \xi_1 \rangle = \int (\phi_1 + \phi_2 + \phi_3)^{*} (\phi_1 + \phi_2 + \phi_3) \, d\tau = \int (\phi_1 + \phi_2 + \phi_3)^2 \, d\tau = 3(1 + 2S).  (19.133)

Similarly, we get

S̃22 = S̃33 = 3(1 − S).  (19.134)

Readers are urged to verify (19.133) and (19.134).

Substituting (19.131) through (19.134) into (19.130), we get the energy eigenvalues

λ1 = (α + 2β)/(1 + 2S),  λ2 = (α − β)/(1 − S),  and  λ3 = (α − β)/(1 − S).  (19.135)

Notice that the two MOs belonging to E″ have the same energy. These MOs are said to be energetically degenerate. This situation is characteristic of a two-dimensional representation. Actually, even though the group C3 has only one-dimensional representations (because it is an Abelian group), the two complex-conjugate representations labeled E behave as if they were a two-dimensional representation [2]. We will encounter the same situation again in the next example, benzene.
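The degeneracy in (19.135) is also visible numerically. Here is a Python sketch in the simple Hückel limit S = 0 (so that the eigenvalues reduce to α + 2β and a doubly degenerate α − β), assuming the illustrative values α = −10, β = −1 used earlier:

```python
# A numerical sketch of (19.135) in the simple Hueckel limit S = 0, assuming
# illustrative values alpha = -10, beta = -1 (not prescribed by the text).
import numpy as np

alpha, beta = -10.0, -1.0
# All three carbons of the ring are mutually adjacent:
H = alpha * np.eye(3) + beta * (np.ones((3, 3)) - np.eye(3))
print(np.linalg.eigvalsh(H))   # [-12.  -9.  -9.]: alpha + 2 beta, then a
                               # doubly degenerate pair at alpha - beta
```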
From (19.126), we get

||ξ1|| = √⟨ξ1|ξ1⟩ = √(3(1 + 2S)).  (19.136)

Thus, as the MO corresponding to the energy eigenvalue λ1, i.e., Ψ1, we get

Ψ1 = |ξ1⟩ / ||ξ1|| = (ϕ1 + ϕ2 + ϕ3)/√(3(1 + 2S)).  (19.137)

Also, we have

||ξ2|| = √⟨ξ2|ξ2⟩ = √(3(1 − S)) and ||ξ3|| = √⟨ξ3|ξ3⟩ = √(3(1 − S)).  (19.138)

Thus, for another MO, corresponding to the energy eigenvalue λ2, we get



Ψ2 = |ξ2⟩ / ||ξ2|| = (ϕ1 + εϕ2 + ε*ϕ3)/√(3(1 − S)).  (19.139)

Also, for the MO corresponding to λ3 (= λ2), we have

Ψ3 = |ξ3⟩ / ||ξ3|| = (ϕ1 + ε*ϕ2 + εϕ3)/√(3(1 − S)).  (19.140)

Equations (19.139) and (19.140) include complex numbers, and so they are inconvenient for computer analysis. In that case, we can convert them to real numbers. In Part III, we examined the properties of unitary transformations. Since a unitary transformation keeps the norm of a vector unchanged, it is suited to our present purpose. The conversion can be done using the following unitary matrix U:

U = \begin{pmatrix} \dfrac{1}{\sqrt{2}} & -\dfrac{i}{\sqrt{2}} \\ \dfrac{1}{\sqrt{2}} & \dfrac{i}{\sqrt{2}} \end{pmatrix}.  (19.141)

Then we have

(\Psi_2\ \Psi_3) U = \left( \frac{1}{\sqrt{2}}(\Psi_2 + \Psi_3)\ \ \frac{i}{\sqrt{2}}(-\Psi_2 + \Psi_3) \right).  (19.142)

Thus, defining Ψ̃2 and Ψ̃3 as

\tilde{\Psi}_2 = \frac{1}{\sqrt{2}}(\Psi_2 + \Psi_3) and \tilde{\Psi}_3 = \frac{i}{\sqrt{2}}(-\Psi_2 + \Psi_3),  (19.143)

we get

\tilde{\Psi}_2 = \frac{1}{\sqrt{6(1-S)}} \left[ 2\phi_1 + (\varepsilon + \varepsilon^{*})\phi_2 + (\varepsilon^{*} + \varepsilon)\phi_3 \right] = \frac{1}{\sqrt{6(1-S)}}(2\phi_1 - \phi_2 - \phi_3).  (19.144)

Also, we have

\tilde{\Psi}_3 = \frac{i}{\sqrt{6(1-S)}} \left[ (\varepsilon^{*} - \varepsilon)\phi_2 + (\varepsilon - \varepsilon^{*})\phi_3 \right] = \frac{i}{\sqrt{6(1-S)}} \left( -i\sqrt{3}\phi_2 + i\sqrt{3}\phi_3 \right) = \frac{1}{\sqrt{2(1-S)}}(\phi_2 - \phi_3).  (19.145)

Thus, we have successfully converted the complex functions to real functions. In the above unitary transformation, notice that the norms of the vectors remain unchanged before and after the transformation.

As the cyclopropenyl radical has three π electrons, two of them occupy the lowest energy level of A2″. The remaining electron occupies a level of E″. Since this level possesses an energy higher than α, the electron occupying it is anticipated to be unstable. Under such a circumstance, the molecule tends to lose said electron so as to become a cation. Following the argument given in the previous case of ethylene, it is easy to make sure that the allowed transition of the cyclopropenyl radical takes place when the light is polarized parallel to the molecular plane (i.e., the xy-plane in Fig. 19.8). The proof is left to readers as an exercise. This polarization feature is typical of planar molecules with high molecular symmetry.

19.4.3 Benzene

Benzene has structural formula which is shown in Fig. 19.9. It is a planar molecule
and six carbon atoms form a regular hexagon. Hence, the molecule belongs to D6h
symmetry. In the molecule six π-electrons extend vertically to the molecular plane
toward upper and lower directions as in the case of ethylene and cyclopropenyl
radical. This is a standard illustration of quantum chemistry and dealt with in many
textbooks. As in the case of ethylene and cyclopropenyl radical, the problem can be
treated similarly.
As before, six equivalent pz atomic orbitals of carbon are denoted by ϕ1 to ϕ6 in
Fig. 19.10. These vectors or their linear combinations span a six-dimensional
representation space. We construct basis vectors using these vectors. Following
the previous procedures, we construct proper SALC orbitals along with MOs.
Similarly as before, a subgroup C6 of D6h plays an essential role. This subgroup
contains six group elements such that
 
C6 ¼ E, C6 , C 3 , C 2 , C 23 , C56 : ð19:146Þ

Taking C6(z) as an example, we have

ðϕ1 ϕ2 ϕ3 ϕ4 ϕ5 ϕ6 ÞC 6 ðzÞ ¼ ðϕ6 ϕ1 ϕ2 ϕ3 ϕ4 ϕ5 Þ: ð19:147Þ

Using a matrix representation, we have

Fig. 19.9 Structural


formula of benzene. It
belongs to D6h symmetry
19.4 MO Calculations Based on π-Electron Approximation 765

Fig. 19.10 Six equivalent


pz atomic orbitals of carbon
y
of benzene

O x
z

0 1
0 1 0 0 0 0
B0 0C
B 0 1 0 0 C
B C
B0 0 0 1 0 0C
C 6 ðzÞ ¼ B
B0
C: ð19:148Þ
B 0 0 0 1 0C
C
B C
@0 0 0 0 0 1A
1 0 0 0 0 0

Once again, we can determine the trace for individual symmetry transformations
belonging to D6h. We collect the results in Table 19.7. The representation is
reducible and this is reduced as follows using a character table of D6h
(Table 19.8). As a result, we get

Γ ¼ A2u þ B2g þ E 1g þ E2u : ð19:149Þ

As a subduced representation of D of D6h to C6, we have

A2u # C 6 ¼ A, B2g # C 6 ¼ B, E 1g # C 6 ¼ 2E1 , E 2u # C 6 ¼ 2E2 : ð19:150Þ

Here, we used Table 19.9 that shows a character table of irreducible representa-
tions of C6. Following the previous procedures, as SALCs we have
766 19 Applications of Group Theory to Physical Chemistry

Table 19.7 Characters for individual symmetry transformations of 2pz orbitals in benzene
D6h E 2C6 2C3 C2 3C 02 3C 002 i 2S3 2S6 σh 3σ d 3σ v
Γ 6 0 0 0 2 0 0 0 0 6 0 2

Table 19.8 Character table of D6h


D6h E 2C6 2C3 C2 3C 02 3C 002 i 2S3 2S6 σh 3σ d 3σ v
A1g 1 1 1 1 1 1 1 1 1 1 1 1 x2 + y2, z2
A2g 1 1 1 1 1 1 1 1 1 1 1 1
B1g 1 1 1 1 1 1 1 1 1 1 1 1
B2g 1 1 1 1 1 1 1 1 1 1 1 1
E1g 2 1 1 2 0 0 2 1 1 2 0 0 (yz, zx)
E2g 2 1 1 2 0 0 2 1 1 2 0 0 (x2  y2, xy)
A1u 1 1 1 1 1 1 1 1 1 1 1 1
A2u 1 1 1 1 1 1 1 1 1 1 1 1 z
B1u 1 1 1 1 1 1 1 1 1 1 1 1
B2u 1 1 1 1 1 1 1 1 1 1 1 1
E1u 2 1 1 2 0 0 2 1 1 2 0 0 (x, y)
E2u 2 1 1 2 0 0 2 1 1 2 0 0

6PðAÞ ϕ1  ξ1 ¼ ϕ1 þ ϕ2 þ ϕ3 þ ϕ4 þ ϕ5 þ ϕ6 ,
6PðBÞ ϕ1  ξ2 ¼ ϕ1  ϕ2 þ ϕ3  ϕ4 þ ϕ5  ϕ6 ,
P½E1  ϕ1  ξ3 ¼ ϕ1 þ εϕ2  ε ϕ3  ϕ4  εϕ5 þ ε ϕ6 ,
ð1Þ

ð19:151Þ
P½E1  ϕ  ξ ¼ ϕ þ ε ϕ  εϕ  ϕ  ε ϕ þ εϕ ,
ð2Þ

1 4 1 2 3 4 5 6

P½E2  ϕ1  ξ5 ¼ ϕ1  ε ϕ2  εϕ3 þ ϕ4  ε ϕ5  εϕ6 ,


ð1Þ

P½E2  ϕ1  ξ6 ¼ ϕ1  εϕ2  ε ϕ3 þ ϕ4  εϕ5  ε ϕ6 ,


ð2Þ

where ε ¼ exp (iπ/3).


Correspondingly, we have a diagonal secular equation of a sixth-order such that
 
He e 
 11  λS11 
 e 22  λe 
 H S22 
 
 e 33  λe 
 H S33 
 
  ¼ 0: ð19:152Þ
 
 e 44  λe 
 H S44 
 
 e 55  λe 
 H S55 
 H 66  λS66 
e e

Here, we have, for example,


19.4 MO Calculations Based on π-Electron Approximation 767

Table 19.9 Character table of C6


C6 E C6 C3 C2 C 23 C 56 ε ¼ exp (iπ/3)
A 1 1 1 1 1 1 z; x2 + y2, z2
B 1 1 1 1 1 1
 
E1 1 ε ε 1 ε ε (x, y); (yz, zx)
1 ε ε 1 ε ε
 
E2 1 ε ε 1 ε ε (x2  y2, xy)
1 ε ε 1 ε ε

ð
e 11 ¼ ξ1 Hξ1 dτ ¼ hξ1 jHξ1 i ¼ 6ðα þ 2β þ 2β0 þ β00 Þ:
H ð19:153Þ

In (19.153), we used the same α and β defined in (19.94) as in the case of


cyclopropenyl radical. That is, α is a Coulomb integral and β is a resonance integral
between two adjacent 2pz orbital of carbon. Meanwhile, β0 is a resonance integral
between orbitals of “meta” positions such as ϕ1 and ϕ3. A quantity β00 is a resonance
integral between orbitals of “para” positions such as ϕ1 and ϕ4. It is unfamiliar to
include such kind resonance integrals of β0 and β00 at a simple π-electron approxi-
mation level. To ignore such resonance integrals is because of a practical purpose to
simplify the calculations. However, we have no reason to exclude them. Or rather,
the use of appropriate SALCs makes it feasible to include β0 and β00.
In a similar manner, we get
ð
e 22 ¼ ξ2 Hξ2 dτ ¼ hξ2 jHξ2 i ¼ 6ðα  2β þ 2β0  β00 Þ,
H
e 33 ¼ H
e 44 ¼ 6ðα þ β  β0  β00 Þ, ð19:154Þ
H
e 55 ¼ H
H e 66 ¼ 6ðα  β  β0 þ β00 Þ:

Meanwhile, we have

e
S11 ¼ hξ1 jξ1 i ¼ 6ð1 þ 2S þ 2S0 þ S00 Þ,
e
S22 ¼ hξ2 jξ2 i ¼ 6ð1  2S þ 2S0  S00 Þ,
ð19:155Þ
e
S33 ¼ e
S44 ¼ 6ð1 þ S  S0  S00 Þ,
e
S55 ¼ e
S66 ¼ 6ð1  S  S0 þ S00 Þ,

where S, S0, and S00 are overlap integrals between the ortho, meta, and para positions,
respectively. Substituting (19.153) through (19.155) for (19.152), the energy eigen-
values are readily obtained as
768 19 Applications of Group Theory to Physical Chemistry

α þ 2β þ 2β0 þ β00 α  2β þ 2β0  β00 α þ β  β0  β00


λ1 ¼ 0 00 , λ2 ¼ , λ3 ¼ λ4 ¼ ,
1 þ 2S þ 2S þ S 1  2S þ 2S0  S00 1 þ S  S0  S00
0 00
αββ þβ
λ5 ¼ λ6 ¼ :
1  S  S0 þ S00
ð19:156Þ

Notice that two MOs ξ3 and ξ4 as well as ξ5 and ξ6 are degenerate. As can be seen,
λ3 and λ4 are doubly degenerate. So are λ5 and λ6.
From (19.151), we get for instance

pffiffiffiffiffiffiffiffiffiffiffiffiffiffi qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
kξ1 k ¼ hξ1 jξ1 i ¼ 6ð1 þ 2S þ 2S0 þ S00 Þ: ð19:157Þ

Thus, for one of normalized MOs corresponding to an energy eigenvalue λ1, we


get

j ξ1 i ϕ1 þ ϕ2 þ ϕ3 þ ϕ4 þ ϕ5 þ ϕ6
Ψ1 ¼ ¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
ffi : ð19:158Þ
kξ 1 k 6ð1 þ 2S þ 2S0 þ S00 Þ

Following the previous examples, we have other normalized MOs. That is, we
have

j ξ2 i ϕ1  ϕ2 þ ϕ3  ϕ4 þ ϕ5  ϕ6
Ψ2 ¼ ¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
ffi ,
kξ 2 k 6ð1  2S þ 2S0  S00 Þ
j ξ3 i ϕ1 þ εϕ2  ε ϕ3  ϕ4  εϕ5 þ ε ϕ6
Ψ3 ¼ ¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
ffi ,
kξ 3 k 6ð1 þ S  S0  S00 Þ
j ξ4 i ϕ1 þ ε ϕ2  εϕ3  ϕ4  ε ϕ5 þ εϕ6
Ψ4 ¼ ¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
ffi , ð19:159Þ
kξ 4 k 6ð1 þ S  S0  S00 Þ
j ξ i ϕ  ε ϕ2  εϕ3 þ ϕ4  ε ϕ5  εϕ6
Ψ5 ¼ 5 ¼ 1 pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
ffi ,
kξ 5 k 6ð1  S  S0 þ S00 Þ
j ξ i ϕ  εϕ2  ε ϕ3 þ ϕ4  εϕ5  ε ϕ6
Ψ6 ¼ 6 ¼ 1 pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
ffi :
kξ 6 k 6ð1  S  S0 þ S00 Þ

The eigenfunction Ψ i (1  i  6) corresponds to the eigenvalue λi. As in the case


f3 and Ψ
of cyclopropenyl radical, Ψ 3 and Ψ 4 can be transformed to Ψ f4, respectively,
through a unitary matrix of (19.141). Thus, we get
19.4 MO Calculations Based on π-Electron Approximation 769

Fig. 19.11 Energy diagram ( )


and MO assignments of
benzene. Energy
eigenvalues λ1 to λ6 are
= ( )
given in (19.156)

= ( )

f3 ¼ 2ϕ1 þpϕffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
Ψ 2  ϕ3  2ϕ4  ϕ5 þ ϕ6
,
12ð1 þ S  S0  S00 Þ
ð19:160Þ
f4 ¼ ϕp
Ψ 2 þ ϕ3  ϕ5  ϕ6
ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi :
2 1 þ S  S0  S00

f5 and Ψ
Similarly, transforming Ψ 5 and Ψ 6 to Ψ f6 , respectively, we have

f5 ¼ 2ϕ1 pϕffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
Ψ 2  ϕ3 þ 2ϕ4  ϕ5  ϕ6
,
12ð1  S  S0 þ S00 Þ
ð19:161Þ
f6 ¼ ϕp
Ψ 2  ϕ3 þ ϕ5  ϕ6
ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi :
2 1  S  S0 þ S00

Figure 19.11 shows an energy diagram and MO assignments of benzene.


A major optical transition takes place among HOMO (E1g) and LUMO (E2u)
levels. In the case of optical absorption, an initial electronic configuration is assigned
to the totally symmetric representation A1g and the symmetry of the final electronic
configuration is described as

Γ ¼ E 1g E 2u :

Therefore, a transition matrix element is expressed as


D  E

ΦðE1g E 2u Þ
jεe  PΦðA1g Þ , ð19:162Þ

where ΦðA1g Þ stands for the totally symmetric ground-state electronic configuration;
ΦðE1g E2u Þ denotes an excited state electronic configuration represented by a direct-
product representation. This representation is reducible and expressed as a direct
sum of irreducible representations such that
770 19 Applications of Group Theory to Physical Chemistry

Γ ¼ B1u þ B2u þ E 1u : ð19:163Þ

Notice that unlike ethylene a direct-product representation associated with the


final state is reducible.
To examine whether (19.162) is nonvanishing, as in the case of (19.109) we
estimate whether a direct-product representation A1g E1g E2u ¼ E1g E2u ¼
B1u + B2u + E1u contains an irreducible representation which εe  P belongs
to. Consulting a character table of D6h, we find that x and y belong to an irreducible
representation E1u and that z belongs to A2u. Since the direct sum Γ in (19.163)
contains E1u, benzene is expected to be polarized along both x- and y-axes (see
Fig. 19.10). Since (19.163) does not contain A2u, the transition along the z-axis is
forbidden. Accordingly, the transition takes place when the light is polarized parallel
to the molecular plane (i.e., the xy-plane). This is a common feature among planar
aromatic molecules including benzene and cyclopropenyl radical. On the other hand,
A2u is not contained in Γ, and so we do not expect the optical transition to occur in
the direction of the z-axis.

19.4.4 Allyl Radical [1]

We revisit the allyl radical and perform its MO calculations. As already noted,
Tables 18.1 and 18.2 of Example 18.1 collected representation matrices of individual
symmetry operations in reference to the basis vectors comprising three atomic
orbitals of allyl radical. As usual, we examine traces (or characters) of those
matrices. Table 19.10 collects them. The representation is readily reduced according
to the character table of C2v (see Table 18.4) so that we have

Γ ¼ A2 þ 2B1 : ð19:164Þ

We have two SALC orbitals that belong to the same irreducible representation of
B1. As noted in Sect. 19.3, the orbitals obtained by a linear combination of these two
SALCs belong to B1 as well. Such a linear combination is given by a unitary
transformation. In the present case, it is convenient to transform the basis vectors
two times. The first transformation is carried out to get SALCs and the second one
will be done in the process of solving a secular equation using the SALCs. Sche-
matically showing the procedures, we have

Table 19.10 Characters for C2v E C2(z) σ v(zx) σ 0v ðyzÞ


individual symmetry
Γ 3 1 1 3
transformations of 2pz orbitals
in allyl radical
19.4 MO Calculations Based on π-Electron Approximation 771

8 8 8
< ϕ1
> < Ψ1
> < Φ1
>
ϕ2 ! Ψ 2 ! Φ2 ,
>
: >
: >
:
ϕ3 Ψ3 Φ3

where ϕ1, ϕ2, ϕ3 show the original atomic orbitals; Ψ 1, Ψ 2, Ψ 3 the SALCs; Φ1, Φ2,
Φ3 the final MOs. Thus, the three sets of vectors are connected through unitary
transformations.
Starting with ϕ1 and following the previous cases, we have, e.g.,
h i
1 X ðB1 Þ
PðB1 Þ ϕ1 ¼ χ ð g Þ gϕ1 ¼ ϕ1 :
4 g

Also starting with ϕ2, we have


h i
1 X ðB1 Þ 1
PðB1 Þ ϕ2 ¼ χ ðgÞ gϕ2 ¼ ðϕ2 þ ϕ3 Þ:
4 g 2

Meanwhile, we get

PðA2 Þ ϕ1 ¼ 0,
h i
1 X ðA2 Þ 1
PðA2 Þ ϕ2 ¼ χ ð g Þ gϕ2 ¼ ðϕ2  ϕ3 Þ:
4 g 2

Thus, we recovered the results of Example 18.1. Notice that ϕ1 does not partic-
ipate in A2, but take part in B1 by itself.
Normalized SALCs are given as follows:
qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
Ψ 1 ¼ ϕ1 , Ψ 2 ¼ ðϕ2 þ ϕ3 Þ= 2ð1 þ S0 Þ, Ψ 3 ¼ ðϕ2  ϕ3 Þ= 2ð1  S0 Þ,
Ð
where we define S0  ϕ2ϕ3dτ. If Ψ 1, Ψ 2, and Ψ 3 belonged to different irreducible
representations (as in the cases of previous three examples of ethylene,
cyclopropenyl radical, and benzene), the secular equation would be fully reduced
to a form of (19.66). In the present case, however, Ψ 1 and Ψ 2 belong to the same
irreducible representation B1, making the situation a bit complicated. Nonetheless,
we can use (19.61) and the secular equation is “partially” reduced.
Defining
ð ð
H jk ¼ Ψ j HΨ k dτ and Sjk ¼ Ψ j Ψ k dτ,

we have a following secular equation the same form as (19.61):


772 19 Applications of Group Theory to Physical Chemistry

det H jk  λSjk ¼ 0:

More specifically, we have


 rffiffiffiffiffiffiffiffiffiffiffiffi 
 
 αλ
2
ðβ  SλÞ 
 1 þ S0
0 
 
 rffiffiffiffiffiffiffiffiffiffiffiffi 
 2 α þ β0 
  ¼ 0,
 1 þ S0 ðβ  SλÞ 1 þ S0
λ 0 
 
 
 α  β0 
 0 0  λ 
1  S0

where α, β, and S are similarly defined as (19.94) and (19.98). The quantity β0 is
defined as
ð
0
β  ϕ2 Hϕ3 dτ:

Thus, the secular equation is separated into the following two:


 rffiffiffiffiffiffiffiffiffiffiffiffi 
 
 αλ
2 
 0 ðβ  SλÞ 
 1þS  α  β0
 rffiffiffiffiffiffiffiffiffiffiffiffi  ¼ 0 and  λ ¼ 0: ð19:165Þ
 2 αþβ 0  1  S0
 ð β  Sλ Þ  λ 
 1 þ S0 1þS 0 

The second equation immediately gives

α  β0
λ¼ :
1  S0

The first equation of (19.165) is somewhat complicated, and so we adopt the next
approximation. That is,

S0 ¼ β0 ¼ 0: ð19:166Þ

This approximation is justified, because two carbon atoms C2 and C3 are pretty
remote, and so the interaction between them is likely to be weak enough. Thus, we
rewrite the first equation of (19.165) as
 pffiffiffi 
 αλ 2ðβ  SλÞ 

 pffiffiffi  ¼ 0: ð19:167Þ
 2ðβ  SλÞ αλ 
19.4 MO Calculations Based on π-Electron Approximation 773

Moreover, we assume that since S is a small quantity compared to 1, a square term


of S2 1. Hence, we ignore S2.
Using the approximation of (19.166) and assuming S2 0, from (19.167) we
have a following quadratic equation:

λ2  2ðα  2βSÞλ þ α2  2β2 ¼ 0:

Solving this equation, we have


rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi  
pffiffiffi 2αS pffiffiffi αS
λ ¼ α  2βS  2β 1 α  2βS  2β 1  ,
β β

where the last approximation is based on

pffiffiffiffiffiffiffiffiffiffiffi 1
1x 1 x
2

with a small quantity x. Thus, we get


pffiffiffi pffiffiffi pffiffiffi pffiffiffi
λL αþ 2β 1 2S and λH α 2β 1þ 2S , ð19:168Þ

where λL < λH. To determine the corresponding eigenfunctions Φ (i.e., MOs), we use
a linear combination of two SALCs Ψ 1 and Ψ 2. That is, putting

Φ ¼ c1 Ψ 1 þ c2 Ψ 2 ð19:169Þ

and from (19.167), we obtain


pffiffiffi
ðα  λÞc1 þ 2ðβ  SλÞc2 ¼ 0:

Thus, with λ ¼ λL we get c1 ¼ c2 and for λ ¼ λH we get c1 ¼  c2. Consequently,


as a normalized eigenfunction ΦL corresponding to λL we get

1 1 pffiffiffi 1=2 pffiffiffi


ΦL ¼ pffiffiffi ðΨ 1 þ Ψ 2 Þ ¼ 1 þ 2S 2ϕ1 þ ϕ2 þ ϕ3 : ð19:170Þ
2 2

Likewise, as a normalized eigenfunction ΦH corresponding to λH we get

1 1 pffiffiffi 1=2 pffiffiffi


ΦH ¼ pffiffiffi ðΨ 1 þ Ψ 2 Þ ¼ 1  2S  2ϕ1 þ ϕ2 þ ϕ3 : ð19:171Þ
2 2

As another eigenvalue λ0 and corresponding eigenfunction Φ0, we have


774 19 Applications of Group Theory to Physical Chemistry

1
λ0 α and Φ0 ¼ pffiffiffi ðϕ2  ϕ3 Þ, ð19:172Þ
2

where we have λL < λ0 < λH. The eigenfunction Φ0 does not participate in chemical
bonding and, hence, is said to be a nonbonding orbital.
It is worth noting that eigenfunctions ΦL, Φ0, and ΦH have the same function
forms as those obtained by the simple Hückel theory that ignores the overlap
integrals S. It is because the interaction between ϕ2 and ϕ3 that construct SALCs
of Ψ 2 and Ψ 3 is weak.
Notice that within the framework of the simple Hückel theory, two sets of basis
vectors jϕ1 i, p1ffiffi2 jϕ2 þ ϕ3 i, p1ffiffi2 jϕ2  ϕ3 i and |ΦLi, |ΦHi, |Φ0i are connected by a
following unitary matrix V:
 
1 1
ðjΦL i jΦH i jΦ0 iÞ ¼ jϕ1 i pffiffiffi jϕ2 þ ϕ3 i pffiffiffijϕ2  ϕ3 i V
2 2
0 1
1 1
pffiffiffi  pffiffiffi 0
 B
B 2 2 C
C
1 1 B C
¼ jϕ1 i pffiffiffi jϕ2 þ ϕ3 i pffiffiffijϕ2  ϕ3 i B 1
pffiffiffi
1
pffiffiffi C: ð19:173Þ
2 2 B 0 C
@ 2 2 A
0 0 1

Although both SALCs Ψ 1 and Ψ 2 belong to the same irreducible representation


B1, they are not orthogonal. As can be seen from (19.170) and (19.171), however, we
find that ΦL and ΦH that are sought by solving the secular equation (19.167) have
been mutually orthogonal. Starting from jϕ1i, jϕ2i, and jϕ3i of Example 18.1, we
reached jΦLi, jΦHi, and jΦ0i via two-step unitary transformations (18.40) and
(19.173). The combined unitary transformations W ¼ UV are unitary again. That
is, we have
0 1 1 1
pffiffiffi  pffiffiffi 0
B 2 2 C
B C
B 1 1 1 C
ðjϕ1 ijϕ2 ijϕ3 iÞUV ¼ ðjϕ1 ijϕ2 ijϕ3 iÞB
B 2 pffiffiffi C
B 2 2 C C
@ 1 1 1 A
 pffiffiffi
2 2 2
¼ ðjΦL i jΦH i jΦ0 iÞ: ð19:174Þ

The optical transition of allyl radical represents general features of the optical
transitions of molecules. To make a story simple, let us consider a case of allyl
cation. Figure 19.12 shows electronic configurations together with symmetry species
of individual eigenstates and their corresponding energy eigenvalues for the allyl
cation. In Fig. 19.13, we redraw its geometry where the origin is located at the center
19.4 MO Calculations Based on π-Electron Approximation 775

(a) (b) (c)


( ) − 2 )(1 + 2

( )

( ) + 2 )(1 − 2

Fig. 19.12 Electronic configurations and symmetry species of individual eigenstates along with
their corresponding energy eigenvalues for the allyl cation. (a) Ground state. (b) First excited state.
(c) Second excited state

C1

r1
C3 C2
r3 O r2

Fig. 19.13 Geometry and position vectors of carbon atoms of the allyl cation. The origin is located
at the center of a line segment connecting C2 and C3 (r2 + r3 = 0)

of a line segment connecting C2 and C3 (r2 + r3 = 0). Major optical transitions


(or optical absorption) are ΦL ! Φ0 and ΦL ! ΦH.
(i) ΦL ! Φ0: In this case, following (19.109) the transition matrix element Tfi is
described by
D E
T fi ¼ Θf ðB1 A2 Þ
jεe  PjΘi ðA1 Þ : ð19:175Þ

In the above equation, we designate the irreducible representation of eigenstates


according to (19.164). Therefore, the symmetry of the final electronic configu-
ration is described as

Γ ¼ B1 A2 ¼ B2 :

Since a direct product of the initial electronic configuration [Θi ðA1 Þ ] and the
final configuration Θf ðB1 A2 Þ is B1 A2 A1 ¼ B2. Then, if εe  P belongs to the
same irreducible representation B2, the associated optical transition should be
allowed. Consulting a character table of C2v, we find that y belongs to B2. Thus,
the allowed optical transition is polarized along the y-direction.
(ii) ΦL ! ΦH: In parallel with the above case, Tfi is described by
D E
T fi ¼ Θf ðB1 B1 Þ
jεe  PjΘi ðA1 Þ : ð19:176Þ

Thus, the transition is characterized by


776 19 Applications of Group Theory to Physical Chemistry

A1 ! A1 ,

where the former A1 indicates the electronic ground state and the latter A1
indicates the excited state given by B1 B1 ¼ A1. The direct product of them is
simply described as B1 B1 A1 ¼ A1. Consulting a character table again, we
find that z belongs to A1. This implies that the allowed transition is polarized
along the z-direction.
Next, we investigate the above optical transition in a semiquantitative manner. In
principle, the transition matrix element Tfi should be estimated from (19.175) and
(19.176) that use electronic configurations of two-electron system. Nonetheless, a
formulation of (4.7) of Sect. 4.1 based upon one-electron states well serves our
present purpose. For ΦL ! Φ0 transition, we have
ð ð
T fi ðΦL ! Φ0 Þ ¼ Φ0 εe  PΦL dτ ¼ eεe  Φ0 rΦL dτ
ð
1 pffiffiffi 1
¼ eεe  2ϕ1 þ ϕ2 þ ϕ3 r pffiffiffi ðϕ2  ϕ3 Þ dτ
2 2
ð pffiffiffi pffiffiffi

¼ peffiffiffi  2ϕ1 rϕ2  2ϕ1 rϕ3 þ ϕ2 rϕ2 þ ϕ3 rϕ2  ϕ3 rϕ3 dτ
2 2
ð
eεe
pffiffiffi  ðϕ2 rϕ2  ϕ3 rϕ3 Þ dτ
2 2
 ð ð
eεe
pffiffiffi  r2 jϕ2 j2 dτ  r3 jϕ3 j2 dτ
2 2
eε eε
¼ peffiffiffi  ðr2  r3 Þ ¼ pffiffieffi  r2
2 2 2
Ð
where with the first near equality we ignored integrals ϕirϕjdτ (i 6¼ j); with the last
near equality r r2 or r3. For these approximations, we assumed that an electron
density is very high near C2 or C3 with ignorable density at a place remote from
them. Choosing εe for the direction of r2, we get

e
T fi ðΦL ! Φ0 Þ pffiffiffi j r2 j :
2

With ΦL ! ΦH transition, similarly we have


ð
eεe eε
T fi ðΦL ! ΦH Þ ¼ ΦH εe  PΦL dτ  ðr3 2 2r1 þ r2 Þ = 2 e  r1 :
4 2

Choosing εe for the direction of r1, we get


19.5 MO Calculations of Methane 777

e
T fi ðΦL ! ΦH Þ jr j:
2 1

Transition
pffiffiffi probability is proportional to a square of an absolute value of Tfi. Using
j r2 j 3 j r1 j, we have

  e2 3e2
T fi ðΦL ! Φ0 Þ2 j r2 j 2 ¼ jr j2 ,
2 2 1
  e2
T fi ðΦL ! ΦH Þ2 jr j2 :
4 1

Thus, we obtain
   2
T fi ðΦL ! Φ0 Þ2 6T fi ðΦL ! ΦH Þ : ð19:177Þ

Thus, the transition probability of ΦL ! Φ0 is about six times that for ΦL ! ΦH.
Note that in the above simple estimation we ignore an overlap integral S.
From the above discussion, we conclude that (i) the ΦL ! Φ0 transition is
polarized along the r2 direction (i.e., the molecular long axis) and that the ΦL ! ΦH
transition is polarized along the r1 direction (i.e., the molecular short axis).
(ii) Transition probability of ΦL ! Φ0 is about six times that of ΦL ! ΦH. Note
that the polarized characteristics are consistent with those obtained from the discus-
sion based on the group theory. The conclusion reached by the semiquantitative
estimation of a simple molecule of allyl cation well typifies the general optical
features of more complicated molecules having a well-defined molecular long axis
such as polyenes.

19.5 MO Calculations of Methane

So far, we investigated MO calculations of aromatic molecules based upon π-


electron approximation. These are a homogeneous system that has the same quality
of electrons. Here we deal with methane that includes a carbon and surrounding four
hydrogens. These hydrogen atoms form a regular tetrahedron with the carbon atom
positioned at a center of the tetrahedron. It is therefore considered as a heterogeneous
system. The calculation principle, however, is consistent, namely, we make the most
of projection operators and construct appropriate SALCs of methane.
We deal with four 1s electrons of hydrogen along with two 2s electrons and two
2p electrons of carbon. Regarding basis functions of carbon, however, we consider
2s atomic orbital and three 2p orbitals (i.e., 2px, 2py, 2pz orbitals). That is, we deal
with eight atomic orbitals all together. These are depicted in Fig. 19.14. The
dimension of the vector space (i.e., representation space) is eight accordingly.
778 19 Applications of Group Theory to Physical Chemistry

Fig. 19.14 Four 1s atomic z


orbitals of hydrogen and a
2pz orbital of carbon. The
former orbitals are
represented by H1 to H4. 2px
and 2py orbitals of carbon
are omitted for simplicity

O
y

As before, we wish to determine irreducible representations which individual


MOs belong to. As already mentioned in Sect. 17.3, there are 24 symmetry opera-
tions in a point group Td which methane belongs to (see Table 17.6). According to
the symmetry operations, we decide transformation matrices related to each opera-
tion. For example, Cxyz
3 transforms basis functions as follows:

H 1 H 2 H 3 H 4 C2s C2px C2py C2pz C xyz


3

¼ H 1 H 3 H 4 H 2 C2s C2py C2pz C2px ,

where by the above notations we denoted atomic species and molecular orbitals.
Hence, as a matrix representation we have
0 1
1 0 0 0 0 0 0 0
B0 0C
B 0 0 1 0 0 0 C
B C
B0 1 0 0 0 0 0 0C
B C
B0 0C
B 0 1 0 0 0 0 C
Cxyz ¼B C, ð19:178Þ
3
B0 0 0 0 1 0 0 0C
B C
B0 1C
B 0 0 0 0 0 0 C
B C
@0 0 0 0 0 1 0 0A
0 0 0 0 0 0 1 0

where Cxyz
3 is the same operation as Rxyz2π
3
appeared in Sect. 17.3. Therefore,
19.5 MO Calculations of Methane 779

χ C xyz
3 ¼ 2:

As another example σ yz
d , we have

H 1 H 2 H 3 H 4 C2s C2px C2py C2pz σ yz


d

¼ H 1 H 2 H 4 H 3 C2s C2px C2pz C2py ,

where σ yz
d represents a mirror symmetry with respect to the plane that includes the x-
axis and bisects the angle formed by the y- and z-axes. Thus, we have
0 1
1 0 0 0 0 0 0 0
B0 0C
B 1 0 0 0 0 0 C
B C
B0 0 0 1 0 0 0 0C
B C
B0 0C
yz B 0 1 0 0 0 0 C
σd ¼ B C: ð19:179Þ
B0 0 0 0 1 0 0 0C
B C
B0 0C
B 0 0 0 0 1 0 C
B C
@0 0 0 0 0 0 0 1A
0 0 0 0 0 0 1 0

Then, we have

χ σ yz
d ¼ 4:

Taking some more examples, for Cz2 we get

H 1 H 2 H 3 H 4 C2s C2px C2py C2pz C z2


¼ H 4 H 3 H 2 H 1 C2s  C2px  C2py C2pz ,

where Cz2 means a rotation by π around the z-axis. Also, we have


780 19 Applications of Group Theory to Physical Chemistry

0 1
0 0 0 1 0 0 0 0
B0 0C
B 0 1 0 0 0 0 C
B C
B0 1 0 0 0 0 0 0C
B C
B1 0 0C
B 0 0 0 0 0 C
C2 ¼ B
z
C, ð19:180Þ
B0 0 0 0 1 0 0 0C
B C
B0 0 1 0 0C
B 0 0 0 C
B C
@0 0 0 0 0 0 1 0 A
0 0 0 0 0 0 0 1
χ Cz2 ¼ 0:


With S42 (i.e., an improper rotation by π2 around the z-axis), we get


H 1 H 2 H 3 H 4 C2s C2px C2py C2pz S42
¼ H 3 H 1 H 4 H 2 C2s C2py  C2px  C2pz ,

0 1
0 1 0 0 0 0 0 0
B0 0 C
B 0 0 1 0 0 0 C
B C
B1 0 0 0 0 0 0 0 C
B C
B0 0 C
zπ2 B 0 1 0 0 0 0 C
S4 ¼ B C, ð19:181Þ
B0 0 0 0 1 0 0 0 C
B C
B0 1 0 C
B 0 0 0 0 0 C
B C
@0 0 0 0 0 1 0 0 A
0 0 0 0 0 0 0 1
zπ2
χ S4 ¼ 0:

As for the identity matrix E, we have

χ ðEÞ ¼ 8:

Thus, Table 19.11 collects characters of individual symmetry transformations


with respect to hydrogen 1s and carbon 2s and 2p orbitals in methane. From the
above examples, we notice that all the symmetry operators R reduce an eight-
dimensional representation space V8 to subspaces of Span{H1 H2 H3 H4} and
Span{C2s C2px C2py C2pz}. In terms of a notation of Sect. 12.2, we have
 
V 8 ¼ SpanfH 1 H 2 H 3 H 4 g  Span C2s C2px C2py C2pz : ð19:182Þ
19.5 MO Calculations of Methane 781

Table 19.11 Characters for Td E 8C3 3C2 6S4 6σ d


individual symmetry
Γ 8 2 0 0 4
transformations of hydrogen
1s orbitals and carbon 2s and
2p orbitals in methane

In other words, V8 is decomposed into the above two R-invariant subspaces (see
Part III); one is the hydrogen-related subspace and the other is the carbon-related
subspace.
Correspondingly, a representation D comprising the above representation matri-
ces should be reduced to a direct sum of irreducible representations D(α). That is, we
should have
X
D¼ q DðαÞ ,
α α

where qα is a positive integer or zero. Here we are thinking of decomposition of (8, 8)


matrices such as (19.178) into submatrices. We estimate qα using (18.83). That is,

1 X ðαÞ
qα ¼ χ ðgÞ χ ðgÞ: ð18:83Þ
n g

With an irreducible representation A1 of Td for instance, we have

1 X ðA1 Þ 1
qA1 ¼ χ ðgÞ χ ðgÞ ¼ ð1  8 þ 1  8  2 þ 1  6  4Þ ¼ 2:
24 g 24

As for A2, we have

1
qA 2 ¼ ½1  8 þ 1  8  2 þ ð1Þ  6  4 ¼ 0:
24

Regarding T2, we get

1
qT 2 ¼ ð3  8 þ 1  6  4Þ ¼ 2:
24

For other irreducible representations of Td, we get qα ¼ 0. Consequently, we have

D ¼ 2DðA1 Þ þ 2DðT 2 Þ : ð19:183Þ

Evidently, both for the hydrogen-related representation D(H )and carbon-related


representation D(C), we have individually
782 19 Applications of Group Theory to Physical Chemistry

DðH Þ ¼ DðA1 Þ þ DðT 2 Þ and DðCÞ ¼ DðA1 Þ þ DðT 2 Þ , ð19:184Þ

where D ¼ D(H ) + D(C).


In fact, (19.180) for example, in a subspace Span{H1 H2 H3 H4}, C z2 is expressed
as
0 1
0 0 0 1
B0 0 0C
B 1 C
C z2 ¼ B C:
@0 1 0 0A
1 0 0 0

Following routine procedures based on a characteristic polynomials, we get


eigenvalues +1 (as a double root) and 1 (a double root as well). Unitary similarity
transformation using a unitary matrix P expressed as
01 1 1 1 1
B2 2 2 2 C
B1 1C
B 1

1
 C
B2 2C
P¼B
B1
2 2 C
B 1 1 1C
B2   C
@ 2 2 2C
A
1 1 1 1
 
2 2 2 2

yields a diagonal matrix such that


0 1
1 0 0 0
B0 1 0C
B 0 C
P1 C z2 P ¼ B C:
@0 0 1 0A
0 0 0 1

Note that this diagonal matrix is identical to submatrix of (19.180) for Span

{C2s C2px C2py C2pz}. Similarly, S42 of (19.181) gives eigenvalues 1 and i for

both the subspaces. Notice that since S42 is unitary, its eigenvalues take a complex
number with an absolute value of 1. Since these symmetry operation matrices are
unitary, these matrices must be diagonalized according to Theorem 14.5. Using
unitary matrices whose column vectors are chosen from eigenvectors of the matrices,
diagonal elements are identical to eigenvalues including their multiplicity. Writing
representation matrices of symmetry operations for the hydrogen-associated sub-
space and carbon-associated subspace as H and C, we find that H and C have the
same eigenvalues in common. Notice here that different types of transformation

matrices (e.g., C z2 , S42 ) give a different set of eigenvalues. Via unitary similarity
transformation using unitary matrices P and Q, we get
19.5 MO Calculations of Methane 783

1
P1 HP ¼ Q1 CQ or PQ1 H PQ1 ¼ C:

Namely, H and C are similar, i.e., the representation is equivalent. Thus, recalling
Schur’s First Lemma, Eq. (19.184) results.
Our next task is to construct SALCs, i.e., proper basis vectors using projection
operators. From the above, we anticipate that the MOs comprise a linear combina-
tion of the hydrogen-associated SALC and carbon-associated SALC that belongs to
the same irreducible representation (i.e., A1 or T2). For this purpose, we first find
proper SALCs using projection operators described as

ðαÞ dα X ðαÞ
PlðlÞ ¼ D ðgÞ g:
g ll
ð18:156Þ
n

We apply this operator to Span{H1 H2 H3 H4}. Taking, e.g., H1 and operating


both sides of (18.156) on H1, as a basis vector corresponding to a one-dimensional
representation of A1 we have

ðA Þ dA1 X ðA1 Þ 1
P1ð11Þ H 1 ¼ D ðgÞ gH 1 ¼ ½ð1  H 1 Þ þ ð1  H 1 þ 1  H 1 þ 1  H 3
g 11
n 24
þ1  H 4 þ 1  H 3 þ 1  H 2 þ 1  H 4 þ 1  H 2 Þ þ ð1  H 4 þ 1  H 3 þ 1  H 2 Þ
þð1  H 3 þ 1  H 2 þ 1  H 2 þ 1  H 4 þ 1  H 4 þ 1  H 3 Þ
þð1  H 1 þ 1  H 4 þ 1  H 1 þ 1  H 2 þ 1  H 1 þ 1  H 3 Þ
1
¼ ð6  H 1 þ 6  H 2 þ 6  H 3 þ 6  H 4 Þ
24
1
¼ ðH 1 þ H 2 þ H 3 þ H 4 Þ: ð19:185Þ
4

The case of C2s is simple, because all the symmetry operations convert C2s to
itself. That is, we have

ðA Þ 1
P1ð11Þ C2s ¼ ð24  C2sÞ ¼ C2s: ð19:186Þ
24

Regarding C2p, taking C2px for instance, we have

ðA Þ 1 
P1ð11Þ C2px ¼ fð1  C2px Þ þ 1  C2py þ 1  C2pz þ 1  C2py
24
þ1  C2pz þ 1  C2py þ 1  C2pz þ 1  C2py þ 1  C2pz 

þ½ð1  C2px þ 1  ðC2px Þ þ 1  ðC2px Þþ½1  ðC2px Þ

þ1  ðC2px Þ þ 1  C2pz þ 1  C2pz þ 1  C2py þ 1  C2py 


784 19 Applications of Group Theory to Physical Chemistry

Table 19.12 Character table of Td


Td E 8C3 3C2 6S4 6σ d
A1 1 1 1 1 1 x2 + y2 + z2
A2 1 1 1 1 1
E 2 1 2 0 0 (2z2  x2  y2, x2  y2)
T1 3 0 1 1 1
T2 3 0 1 1 1 (x, y, z); (xy, yz, zx)


þ 1  C2py þ 1  C2px þ 1  C2pz þ 1  C2py þ 1  C2px þ 1  C2pz

¼ 0:

The calculation is somewhat tedious, but it is natural that since C2s is spherically
symmetric, it belongs to the totally symmetric representation. Conversely, it is
natural to think that C2px is totally unlikely to contain a totally symmetric represen-
tation. This is also the case with C2py and C2pz.
Table 19.12 shows the character table of Td in which the three-dimensional
irreducible representation T2 is spanned by basis vectors (x y z). Since in
Table 17.6 each (3, 3) matrix is given in reference to the vectors (x y z), it can
directly be utilized to represent T2. More specifically, we can directly choose the
ðT Þ
diagonal elements (1, 1), (2, 2), and (3, 3) of the individual (3, 3) matrices for D112 ,
ðT Þ ðT Þ
D222 , and D332 elements of the projection operators, respectively. Thus, we can
construct SALCs using projection operators explained in Sect. 18.7. For example,
using H1 we obtain SALCs that belong to T2 such that

ðT Þ 3 X ðT 2 Þ 3
P1ð12Þ H 1 ¼ D ðgÞ gH 1 ¼ fð1  H 1 Þ
g 11
24 24
þ½ð1Þ  H 4 þ ð1Þ  H 3 þ 1  H 2 
þ½0  ðH 3 Þ þ 0  ðH 2 Þ þ 0  ðH 2 Þ þ 0  ðH 4 Þ þ ð1Þ  H 4 þ ð1Þ  H 3 
þð0  H 1 þ 0  H 4 þ 1  H 1 þ 1  H 2 þ 0  H 1 þ 0  H 3 Þg
3
¼ ½2  H 1 þ 2  H 2 þ ð2Þ  H 3 þ ð2Þ  H 4 
24
1
¼ ðH 1 þ H 2  H 3  H 4 Þ: ð19:187Þ
4

Similarly, we have

ðT Þ 1
P2ð22Þ H 1 ¼ ðH 1  H 2 þ H 3  H 4 Þ, ð19:188Þ
4
ðT Þ 1
P3ð32Þ H 1 ¼ ðH 1  H 2  H 3 þ H 4 Þ: ð19:189Þ
4
19.5 MO Calculations of Methane 785

ðT Þ
Now we can easily guess that P1ð12Þ C2px solely contains C2px. Likewise,
ðT Þ ðT Þ
P2ð22Þ C2py and P3ð32Þ C2pz contain only C2py and C2pz, respectively. In fact, we
obtain what we anticipate. That is,

ðT Þ ðT Þ ðT Þ
P1ð12Þ C2px ¼ C2px , P2ð22Þ C2py ¼ C2py , P3ð32Þ C2pz ¼ C2pz : ð19:190Þ

This gives a good example to illustrate the general concept of projection operators
and related calculations of inner products
D discussed in Sect.E 18.7. For example, we
ðT 2 Þ ðT 2 Þ
have a nonvanishing inner product of P1ð1Þ H 1 jP1ð1Þ C2px and an inner product of,
D E
ðT Þ ðT Þ
e.g., P2ð22Þ H 1 jP1ð12Þ C2px is zero. This significantly reduces efforts to solve a
ðT Þ ðT Þ
secular equation (vide infra). Notice that functions P1ð12Þ H 1 , P1ð12Þ C2px , etc. are
ðT Þ ðT Þ
linearly independent of one another. Recall also that P1ð12Þ H 1 and P1ð12Þ C2px
ð αÞ ðαÞ
correspond to ϕl and ψ l in (18.171), respectively. That is, the former functions
are linearly independent, while they belong to the same place “1” of the same three-
dimensional irreducible representation T2.
Equations (19.187) to (19.190) seem intuitively obvious. This is because if we
draw a molecular geometry of methane (see Fig. 19.14), we can immediately
recognize the relationship between the directionality in ℝ3 and “directionality” of
SALCs represented by sings of hydrogen atomic orbitals (or C2px, C2py, and C2pz of
carbon).
As stated above, we have successfully obtained SALCs relevant to methane.
Therefore, our next task is to solve an eigenvalue problem and construct appropriate
MOs. To do this, let us first normalize the SALCs. We assume that carbon-based
SALCs have already been normalized as well-studied atomic orbitals. We suppose
that all the functions are real. For the hydrogen-based SALCs

hH 1 þ H 2 þ H 3 þ H 4 jH 1 þ H 2 þ H 3 þ H 4 i ¼ 4hH 1 jH 1 i þ 12hH 1 jH 2 i
¼ 4 þ 12SHH ¼ 4 1 þ 3SHH , ð19:191Þ

where the second equality comes from the fact that jH1i, i.e., a 1s atomic orbital is
normalized. We define hH1| H2i ¼ SHH, i.e., an overlap integral between two
adjacent hydrogen atoms. Note also that hHi| Hji (1  i, j  4; i 6¼ j) is the same
because of the symmetry requirement.
Thus, as a normalized SALC we have

j H1 þ H2 þ H3 þ H4i
H ðA1 Þ  pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi : ð19:192Þ
2 1 þ 3SHH
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
Defining a denominator as c ¼ 2 1 þ 3SHH , we have
786 19 Applications of Group Theory to Physical Chemistry

H ðA1 Þ ¼j H 1 þ H 2 þ H 3 þ H 4 i=c: ð19:193Þ

Also, we have

hH 1 þ H 2  H 3  H 4 jH 1 þ H 2  H 3  H 4 i ¼ 4hH 1 jH 1 i  4hH 1 jH 2 i
¼ 4  4SHH ¼ 4 1  SHH : ð19:194Þ

Thus, we have

ðT 2 Þ j H1 þ H2  H3  H4i
H1  pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi : ð19:195Þ
2 1  SHH
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
Also defining a denominator as d ¼ 2 1  SHH , we have

ðT 2 Þ
H1 j H 1 þ H 2  H 3  H 4 i=d:

Similarly, we define other hydrogen-based SALCs as

ðT 2 Þ
H2 j H 1  H 2 þ H 3  H 4 i=d,
ðT Þ
,
H3 2 j H 1  H 2  H 3 þ H 4 i=d:

The next step is to construct MOs using the above SALCs. To this end, we make a
linear combination using SALCs belonging to the same irreducible representations.
In the case of A1, we choose H ðA1 Þ and C2s. Naturally, we anticipate two linear
combinations of

a1 H ðA1 Þ þ b1 C2s, ð19:196Þ

where a1 and b1 are arbitrary constants. On the basis of the discussions of projection
operators in Sect. 18.7, both the above two linear combinations belong to A1 as well.
ðT Þ ðT Þ ðT Þ
Similarly, according to the projection operators P1ð12Þ , P2ð22Þ , and P3ð32Þ , we make three
sets of linear combinations

ðT Þ ðT Þ ðT Þ
q1 P1ð12Þ H 1 þ r 1 C2px ; q2 P2ð22Þ H 1 þ r 2 C2py ; q3 P3ð32Þ H 1 þ r 3 C2pz , ð19:197Þ

where q1, r1, etc. are arbitrary constants. These three sets of linear combinations
belong to individual “addresses” 1, 2, and 3 of T2. What we have to do is to
determine coefficients of the above MOs and to normalize them by solving the
secular equations. With two different energy eigenvalues, we get two orthogonal
(i.e., linearly independent) MOs for the individual four sets of linear combinations of
19.5 MO Calculations of Methane 787

(19.196) and (19.197). Thus, total eight linear combinations constitute MOs of
methane.
In light of (19.183), the secular equation can be reduced as follows according to
the representations A1 and T2. There we have changed the order of entries in the
equation so that we can deal with the equation easily. Then we have
 
 H 11  λ H 12  λS12 
 
 H  λS H  λ 
 21 21 22 
 
 G11  λ G12  λT 12 
 
 G21  λT 21 G22  λ 
 
 
 F 11  λ F 12  λV 12 
 
 F 21  λV 21 F 22  λ 
 
 
 K  λ K  λW 
 11 12 12 
 K 21  λW 21 K 22  λ 
¼ 0,
ð19:198Þ

where off-diagonal elements are all zero. This is because of the symmetry require-
ment (see Sect. 19.2). Thus, in (19.198) the secular equation is decomposed into four
(2, 2) blocks. The first block is pertinent to A1 of a hydrogen-based component and
carbon-based component from the left, respectively. Lower three blocks are perti-
nent to T2 of hydrogen-based and carbon-based components from the left, respec-
ðT Þ ðT Þ ðT Þ
tively, in order of P1ð12Þ , P2ð22Þ , and P3ð32Þ SALCs from the top. The notations follow
those of (19.59).
We compute these equations. The calculations are equivalent to solving the
following four two-dimensional secular equations:
 
 H 11  λ H 12  λS12 

  ¼ 0,
 H  λS H 22  λ 
21 21
 
 G11  λ G12  λT 12 

  ¼ 0,
 G  λT G22  λ 
21 21
  ð19:199Þ
 F 11  λ F 12  λV 12 

  ¼ 0,
 F  λV F 22  λ 
21 21
 
 K 11  λ K 12  λW 12 

  ¼ 0:
 K  λW K λ 
21 21 22

Notice that these four secular equations are the same as a (2, 2) determinant of
(19.59) in the form of a secular equation. Note, at the same time, that while (19.59)
did not assume SALCs, (19.199) takes account of SALCs. That is, (19.199)
788 19 Applications of Group Theory to Physical Chemistry

expresses a secular equation with respect to two SALCs that belong to the same
irreducible representation.
The first equation of (19.199) reads as

1  S12 2 λ2  ðH 11 þ H 22  2H 12 S12 Þλ þ H 11 H 22  H 12 2 ¼ 0: ð19:200Þ

In (19.200) we define quantities as follows:


ð
hH 1 þ H 2 þ H 3 þ H 4 jC2si 2hH 1 jC2si
S12  H ðA1 Þ C2sdτ  hH ðA1 Þ jC2si ¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
2 1 þ 3SHH 1 þ 3SHH
2SCH
¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
A1
ffi: ð19:201Þ
1 þ 3SHH

In (19.201), an overlap integral between hydrogen atomic orbitals and C2s is


identical from the symmetry requirement and it is defined as

A1  hH 1 jC2si:
SCH ð19:202Þ

Also in (19.200), other quantities are defined as follows:


D E
αH þ 3βHH
H 11  H ðA1 Þ jHH ðA1 Þ ¼ , ð19:203Þ
1 þ 3SHH
H 22  hC2sjHC2si, ð19:204Þ
D E hH þ H þ H þ H jHC2si 2hH jHC2si
H 12  H ðA1 Þ jHC2s ¼ 1 pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
2 3
ffi4 ¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
1

2 1 þ 3S HH
1 þ 3SHH
2βCH
¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
A1
ffi, ð19:205Þ
1 þ 3SHH

where H is a Hamiltonian of a methane molecule. In (19.203) and (19.205),


moreover, we define the quantities as
   
αH  H 1 HH 1 i, βHH  hH 1 HH 2 , βCH
A1  hH 1 jHC2si: ð19:206Þ

The quantity of H11 is a “Coulomb” integral of the hydrogen-based SALC that


involves four hydrogen atoms. Solving the first equation of (19.199), we get
qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
 
H 11 þH 22 2H 12 S12  ðH 11 H 22 Þ2 þ4 H 12 2 þH 11 H 22 S12 2 H 12 S12 ðH 11 þH 22 Þ
λ¼ :
2 1S12 2
ð19:207Þ
19.5 MO Calculations of Methane 789

Similarly, we obtain related solutions for the latter three eigenvalue equations of
(19.199). With the second equation of (19.199), for instance, we have

1  T 12 2 λ2  ðG11 þ G22  2G12 T 12 Þλ þ G11 G22  G12 2 ¼ 0: ð19:208Þ

Solving this, we get


qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
 
G11 þG22 2G12 T 12  ðG11 G22 Þ2 þ4 G12 2 þG11 G22 T 12 2 G12 T 12 ðG11 þG22 Þ
λ¼ :
2 1T 12 2
ð19:209Þ

In (19.208) and (19.209) we define these quantities as follows:


ð D E hH þH H H jC2p i 2hH jC2p i
ðT Þ ðT Þ
T 12  H 1 2 C2px dτ H 1 2 j2Cpx ¼ 1 2
pffiffiffiffiffiffiffiffiffiffiffiffiffiffi
3
ffi4 x
¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffi
1

x
2 1SHH 1SHH
2SCH
¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffi
T2
ffi, ð19:210Þ
1SHH
D E
ðT Þ ðT Þ αH  βHH
G11  H 1 2 jHH 1 2 ¼ , ð19:211Þ
1  SHH
G22  hC2px jHC2px i, ð19:212Þ
D E hH þ H  H  H jHC2p i 2hH jHC2p i
ðT Þ
G12  H 1 2 jHC2px ¼ 1 2
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
3

4 x
¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
1
ffix
2 1  SHH 1  SHH
2βCH
¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
T2
ffi: ð19:213Þ
1  SHH

In the above equations, we further define integrals such that

T 2  hH 1 jC2px i, βT 2  hH 1 jℌC2px i:
SCH ð19:214Þ
CH

In (19.210), an overlap integral T12 between four hydrogen atomic orbitals and
C2px is identical again from the symmetry requirement. That is, the integrals of
(19.213) are additive regarding a product of plus components of a hydrogen atomic
orbital of (19.187) and C2px as well as another product of minus components of a
hydrogen atomic orbital of (19.187) and C2px; see Fig. 19.14. Notice that C2px has a
node on yz-plane.
The third and fourth equations of (19.199) give exactly the same eigenvalues as
that given in (19.209). This is obvious from the fact that all the latter three equations
of (19.199) are associated with a irreducible representation T2. The corresponding
three MOs are triply degenerate.
In (19.207) and (19.209), the plus sign gives a higher orbital energy and the minus
sign gives a lower energy. Equations (19.207) and (19.209), however, look
790 19 Applications of Group Theory to Physical Chemistry

somewhat complicated. To simplify the situation, (i) in, e.g., (19.207) let us consider
a case where jH11 j  j H22j or jH11 j j H22j. In that case, (H11  H22)2
dominates inside a square root, and so ignoring 2H12S12 and S122 we have
λ H11 or λ H22. Inserting these values into (19.199), we have either Ψ
H ðA1 Þ or Ψ C2s, where Ψ is a resulting MO. This implies that no interaction would
arise between H ðA1 Þ and C2s. (ii) If, on the other hand, H11 ¼ H22, we would get

H 11  H 12 S12  j H 12  H 11 S12 j
λ¼ : ð19:215Þ
1  S12 2

As H12  H11S12 is positive or negative, we have a following alternative:


11 þH 12 11 H 12
Case I: H12  H11S12 > 0. We have λH ¼ H1þS 12
and λL ¼ H1S 12
, where
λ H > λ L.
11 H 12 11 þH 12
Case II: H12  H11S12 < 0. We have λH ¼ H1S 12
and λL ¼ H1þS 12
, where
λ H > λ L.
In this case, we would anticipate maximum orbital mixing between H ðA1 Þ and
C2s. (iii) If H11 and H22 are moderately different in between the above (i) and (ii), the
orbital mixing is likely to be moderate. This is an actual situation. With (19.209) we
have eigenvalues expressed similarly to those of (19.207). Therefore, classifications
related to the above Cases I and II hold with G12  G11T12, F12  F11V12, and
K12  K11W12 as well.
In spite of simplicity of (19.215), the quantities H11, H12, and S12 are hard to
calculate. In general cases including the present example (i.e., methane), the diffi-
culty in calculating these quantities results essentially from the fact that we are
dealing with a many-particle interactions that include electron repulsion. Nonethe-
less, for a simplest case of a hydrogen molecular ion Hþ 2 the estimation is feasible
[3]. Let us estimate H12  H11S12 quantitatively according to Atkins and
Friedman [3].
The Hamiltonian of Hþ 2 is described as

ħ2 2 e2 e2 e2
H¼ ∇   þ , ð19:216Þ
2m 4πε0 r A 4πε0 r B 4πε0 R

where m is an electron rest mass and other symbols are defined in Fig. 19.15; the last
term represents the repulsive interaction between the two hydrogen nuclei. To
estimate the 0
quantities
1 in (19.215), it is convenient to use dimensionless ellipsoidal
μ
B C
coordinates @ ν A such that [3]
ϕ

rA þ rB rA  rB
μ¼ and ν¼ : ð19:217Þ
R R
19.5 MO Calculations of Methane 791

A B

Fig. 19.15 Configuration of electron (e) and hydrogen nuclei (A and B). rA and rB denote a
separation between the electron and A and that between the electron and B, respectively. R denotes
a separation between A and B

The quantity ϕ is an azimuthal angle around the molecular axis (i.e., the straight
line connecting the two nuclei). Then, we have
          2  
 ħ2 2 e2   e2   e 

H 11 ¼ hAjHjAi ¼ A ∇  A þ A  A þ A A
2m 4πε0 r A  4πε0 r B  4πε0 R
   
e2 1 e2
¼ E 1s  A A þ , ð19:218Þ
4πε0 rB 4πε0 R

where E1s is the same as that given in (3.258) with Z ¼ n ¼ 1 and μ replaced with
ħ2
m (i.e., E1s ¼  2ma 2 ). Using a coordinate representation of (3.301), we have

rffiffiffi
1 3=2 rA =a
j Ai ¼ a e : ð19:219Þ
π

Moreover considering (19.217), we have


    ð
1 1 1
A A ¼ 3 dτe2rA =a :
rB πa rB

Converting Cartesian coordinates to ellipsoidal coordinates [3] such that


ð ð
3 2π
ð1 ð1
R
dτ ¼ dϕ dμ dν μ2  ν2 ,
2 0 1 1

we have
792 19 Applications of Group Theory to Physical Chemistry

    ð1 ð1
1 R3 eðμþνÞR=a
A A ¼  2π dμ dν μ2  ν2
rB 8πa 3 R
1 1 ðμ  νÞ
2
ð ð1
R2 1
¼ 3 dμ dνðμ þ νÞeðμþνÞR=a : ð19:220Þ
2a 1 1

Ð1 Ð1
Putting I ¼ 1 dμ 1 dνðμ þ νÞeðμþνÞR=a , we obtain
ð1 ð1 ð1 ð1
μR=a νR=a μR=a
I¼ μe dμ e dν þ e dμ νeνR=a dν: ð19:221Þ
1 1 1 1

The above definite integrals can readily be calculated using the methods
described in Sect. 3.7.2; see, e.g., (3.262) and (3.263). For instance, we have
ð1  1
ecx 1
ecν dν ¼ ¼ ðec  ec Þ: ð19:222Þ
1 c 1
c

Differentiating (19.222) with respect to the parameter c, we get


ð1
1
νecν dν ¼ ½ð1  cÞec  ð1 þ cÞec : ð19:223Þ
1 c2

In the present case, c is to be replaced with R/a. Other calculations of the definite
integrals are left for readers. Thus, we obtain
h i
a3 R
I¼ 3
2  2 1 þ e2R=a :
R a

In turn, we have
    h i
1 R2 1 R
A A ¼ 3  I ¼ 1  1 þ e2R=a :
rB 2a R a

Introducing a symbol according to Atkins and Friedman [3]


19.5 MO Calculations of Methane 793

e2
j0  ,
4πε0

we finally obtain

j0 h R
i j
H 11 ¼ E 1s  1  1 þ e2R=a þ 0 : ð19:224Þ
R a R

The quantity S12 can be obtained as follows:


ð 2π ð1 ð1
R3
S12 ¼ hB j Ai ¼ dϕ dμ dν μ2  ν2 eμR=a , ð19:225Þ
8πa3 0 1 1
qffiffi
1 3=2 rB =a
where we used j Bi ¼ πa e and (19.217). Noting that

ð1 ð1 ð1
2 μR=a
dμ dν μ2  ν2 eμR=a ¼ dμ 2μ2  e
1 1 1 3

and following procedures similar to those described above, we get


 2
R 1 R
S12 ¼ 1 þ þ eR=a : ð19:226Þ
a 3 a

In turn, for H12 we have


          2  
 ħ2 e2   2   
H 12 ¼ hAjHjBi ¼ A ∇2  B þ A  e B þ A e B
2m 4πε0 r B   4πε0 r A   4πε0 R
   
e2 1 e2
¼ E1s hAjBi A B þ hAjBi: ð19:227Þ
4πε0 r A  4πε0 R

ħ 2
Note that in (19.227) jBi is an eigenfunction belonging to E1s ¼  2ma 2 that is an
ħ2 e2
eigenvalue of an operator  2m ∇  4πε0 rB . From (3.258), we estimate E1s to be
2

13.61 eV. Using (19.217) and converting Cartesian coordinates to ellipsoidal coor-
dinates once again, we get
794 19 Applications of Group Theory to Physical Chemistry

   
1 1 R R=a
A B ¼ 1þ e :
rA a a

Thus, we have
 
j0 j R R=a
H 12 ¼ E1s þ S  0 1þ e : ð19:228Þ
R 12 a a

We are now in the position to evaluate H12  H11S12 in (19.213). According to


Atkins and Friedman [3], we define following notations:
       
1 1
j  j0 A A
0
and k  j0 A B :
0
rB rA

Then, we have

H 12  H 11 S12 ¼ k0 þ j0 S12 : ð19:229Þ

The calculation of this quantity is straightforward. The result is

H 12  H 11 S12
h i
1 2R R=a j0 R R 1 R 2 3R=a
¼ j0  2 e  1þ 1þ þ Þ e
R 3a R a a 3 a

j0 a 2R R=a j0 h i
a R 1 R 2 3R=a
¼  e  1þ 1þ þ Þ e : ð19:230Þ
a R 3a a R a 3 a

In (19.230) we notice that whereas the second term is always negative, the first term
may be negative or positive depending upon R. We could not tell a priori whether
H12  H11S12 is negative accordingly. If we had R a, (19.230) would become
positive.
Let us then make a quantitative estimation. The Bohr radius a is about 52.9 pm
(using an electron rest mass). As an experimental result, R is approximately 106 pm
[3]. Hence, for a Hþ2 ion we have

R=a 2:0:

Using this number, we get

H 12  H 11 S12 0:13j0 =a < 0:

We estimate H12  H11S12 to be ~  3.5 eV. Therefore, from (19.215) we get


19.5 MO Calculations of Methane 795

H 11 þ H 12 H 11  H 12
λL ¼ and λH ¼ , ð19:231Þ
1 þ S12 1  S12

where λL and λH indicate lower and higher energy eigenvalues, respectively.


Namely, Case II in the above is more likely. Correspondingly, for MOs we have

j Aiþ j Bi j Ai j Bi
Ψ L ¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi and Ψ H ¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi , ð19:232Þ
2ð1 þ S12 Þ 2ð1  S12 Þ

where Ψ L and Ψ H belong to λL and λH, respectively. The results are virtually the
same as those given in (19.101) to (19.105) of Sect. 19.4.1. In (19.101) and
Fig. 19.6, however, we merely assumed that λ1 ¼ αþβ αβ
1þS is lower than λ2 ¼ 1S .
Here, we have confirmed that this is truly the case. A chemical bond is formed in
such a way that an electron is distributed as much as possible along the molecular
axis (see Fig. 19.4 for a schematic) and that in this configuration a minimized orbital
energy is achieved.
In our present case, H11 in (19.203) and H22 in (19.204) should differ. This is also
the case with G11 in (19.211) and G22 in (19.212). Here we return back to (19.199).
Suppose that we get a MO by solving, e.g., the first secular equation of (19.199) such
that

Ψ ¼ a1 H ðA1 Þ þ b1 C2s:

From the secular equation, we get

H 12  λS12
a1 ¼  b , ð19:233Þ
H 11  λ 1

where S12 and H12 were defined as in (19.201) and (19.205), respectively. A
e is described by
normalized MO Ψ

a1 H ðA1 Þ þ b1 C2s
e ¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
Ψ ffi: ð19:234Þ
a1 2 þ b1 2 þ 2a1 b1 S12

Thus, according to two different energy eigenvalues λ we will get two linearly
independent MOs. Other three secular equations are dealt with similarly.
From the energetical consideration of Hþ 2 we infer that a1b1 > 0 with a bonding
MO and a1b1 < 0 with an antibonding MO. Meanwhile, since both H ðA1 Þ and C2s
belong to the irreducible representation A1, so does Ψe on the basis of the discussion
on the projection operators of Sect. 18.7. Thus, we should be able to construct proper
MOs that belong to A1. Similarly, we get proper MOs belonging to T2 by solving
other three secular equations of (19.199). In this case, three bonding MOs are triply
degenerate, so are three antibonding MOs. All these six MOs belong to the
796 19 Applications of Group Theory to Physical Chemistry

Fig. 19.16 Probable energy ∗


diagram and MO symmetry
species of methane. Adapted
from http://www.science.
oregonstate.edu/~gablek/
CH334/Chapter1/methane_ ∗
MOs.htm with kind
permission of Professor
Kevin P. Gable

CH4

irreducible representation T2. Thus, we can get a complete set of MOs for methane.
These eight MOs span the representation space V8.
To precisely determine the energy levels, we need to take more elaborate
approaches to approximate and calculate various parameters that appear in
(19.199). At the same time, we need to perform detailed experiments including
spectroscopic measurements and interpret those results carefully [4]. Taking account
of these situations, Fig. 19.16 [5] displays as an example of MO calculations that
give a probable energy diagram and MO symmetry species of methane. The diagram
comprises a ground-state bonding a1 state and its corresponding antibonding state a1
along with triply degenerate bonding t2 states, and their corresponding antibonding
state t 2 .
We emphasize that the said elaborate approaches ensue truly from the “paper-
and-pencil” methods based upon group theory. The group theory thus supplies us
with a powerful tool and clear guideline for addressing various quantum chemical
problems, a few of which we are introducing as an example in this book.
Finally, let us examine the optical transition of methane. In this case, we have to
consider electronic configurations of the initial and final states. If we are dealing with
optical absorption, the initial state is the ground-state A1 that is described by the
totally symmetric representation. The final state, on the other hand, will be an excited
state, which is described by a direct-product representation related to the two states
that are associated with the optical transition. The matrix element is expressed as
D E
Θðα βÞ
jεe  PjΘðA1 Þ , ð19:235Þ

where ΘðA1 Þ stands for an electronic configuration of the totally symmetric ground
state; Θ(α β) denotes an electronic configuration of an excited state represented by a
direct-product representation pertinent to irreducible representations α and β; P is an
electric dipole operator and εe is a unit polarization vector of the electric field. From a
19.5 MO Calculations of Methane 797

character table for Td, we find that εe  P belongs to the irreducible representation T2
(see Table 19.12).
The ground-state electronic configuration is A1 (totally symmetric). It is denoted
by

a21 t 22 t 0 2 t 00 2 ,
2 2

where three T2 states are distinguished by a prime and double prime. For possible
configuration of excited states, we have.

a21 t 2 t 0 2 t 00 2 t 2 ðA1 ! T 2 T 2 Þ, a21 t 2 t 0 2 t 00 2 a1 ðA1 ! T 2


2 2 2 2
A1 ¼ T 2 Þ,
a1 t 22 t 0 2 t 00 2 t 2 a1 t 22 t 0 2 t 00 2 a1
2 2 2 2
ðA1 ! A1 T 2 ¼ T 2 Þ, ðA1 ! A1 A1 ¼ A1 Þ:
ð19:236Þ

In (19.236), we have optical excitations of t 2 ! t 2 , t 2 ! a1 , a1 ! t 2 , and a1 ! a1 ,


respectively. Consequently, the excited states are denoted by T2 T2, T2, T2, and A1,
respectively. Since εe  P is Hermitian as seen in (19.51) of Sect. 19.2, we have
D E D E
Θðα βÞ
jεe  PjΘðA1 Þ ¼ ΘðA1 Þ jεe  PjΘðα βÞ
: ð19:237Þ

Therefore, according to the general theory of Sect. 19.2, we need to examine whether
T2 D(α) D(β) contains A1 to judge whether the optical transition is allowed. As
mentioned above, D(α) D(β) is chosen from among T2 T2, T2, T2, and A1. The
results are given as below:

T2 T2 T 2 ¼ 3T 1 þ 4T 2 þ 2E þ A2 þ A1 , ð19:238Þ
T2 T 2 ¼ T 1 þ T 2 þ E þ A1 , ð19:239Þ
T2 A1 ¼ T 2 : ð19:240Þ

Admittedly, (19.238) and (19.239) contain A1, and so the transition is allowed. As
for (19.240), however, the transition is forbidden because it does not contain A1. In
light of the character table of Td (Table 19.12), we find that the allowed transitions
(i.e., t 2 ! t 2 , t 2 ! a1 , and a1 ! t 2 ) equally take place in the direction polarized
along the x-, y-, z-axes. This is often the case with molecules having higher
symmetries such as methane. The transition a1 ! a1 , on the other hand, is
forbidden.
In the above argument including the energetic consideration, we could not tell
magnitude relationship of MO energies or photon energies associated with the
optical transition. Once again, this requires more accurate calculations and experi-
ments. Yet, the discussion we have developed gives us strong guiding principles in
the investigation of molecular science.
798 19 Applications of Group Theory to Physical Chemistry

References

1. Cotton FA (1990) Chemical applications of group theory. Wiley, New York


2. Hamermesh M (1989) Group theory and its application to physical problems. Dover, New York
3. Atkins P, Friedman R (2005) Molecular quantum mechanics, 4th edn. Oxford University Press,
Oxford
4. Anslyn EV, Dougherty DA (2006) Modern physical organic chemistry. University Science
Books, Sausalito
5. Gable KP (2014) Molecular orbitals: Methane, http://www.science.oregonstate.edu/~gablek/
CH334/Chapter1/methane_MOs.htm
Chapter 20
Theory of Continuous Groups

In Chap. 16, we classified the groups into finite groups and infinite groups. We
focused mostly on finite groups and their representations in the precedent chapters.
Of the infinite groups, continuous groups and their properties have been widely
investigated. In this chapter we think of the continuous groups as a collective term
that include Lie groups and topological groups. The continuous groups are also
viewed as the transformation group. Aside from the strict definition, we will see the
continuous group as a natural extension of the rotation group or SO(3) that we
studied briefly in Chap. 17. Here we reconstruct SO(3) on the basis of the notion of
infinitesimal rotation. We make the most of the exponential functions of matrices the
theory of which we have explored in Chap. 15. In this context, we study the basic
properties and representations of the special unitary group of SU(2) that has close
relevance to SO(3). Thus, we focus our attention on the representation theory of SU
(2) and SO(3). The results are associated with the (generalized) angular momenta
which we dealt with in Chap. 3. In particular, we show that the spherical surface
harmonics constitute the basis functions of the representation space of SO(3).
Finally, we study various important properties of SU(2) and SO(3) within a
framework of Lie groups and Lie algebras. The last sections comprise the description
of abstract ideas, but those are useful for us to comprehend the constitution of the
group theory from a higher perspective.

20.1 Introduction: Operators of Rotation and Infinitesimal


Rotation

To characterize the continuous transformation appropriately, we have to describe an


infinitesimal transformation of rotation. This is because a rotation with a finite angle
is attained by an infinite number of infinitesimal rotations.

© Springer Nature Singapore Pte Ltd. 2020 799


S. Hotta, Mathematical Physical Chemistry,
https://doi.org/10.1007/978-981-15-2225-3_20
800 20 Theory of Continuous Groups

First let us consider how a function Ψ is transformed by the rotation (see Sect.
19.1). Here we express Ψ by
0 1
c1
B c2 C
B C
Ψ ¼ ðψ 1 ψ 2   ψ d ÞB C, ð20:1Þ
@⋮A
cd

where {ψ 1,   , ψ d} span the representation space relevant to the rotation; d is a


dimension of the representation space; c1, c2,    cd are coefficients (or coordinates).
Henceforth, we make it a rule to denote a vector in a representation space as in
(20.1). This is a natural extension of (11.13). We assume ψ v (v ¼ 1, 2,   , d ) as a
vector component and ψ v is normally expressed as a function of a position vector x in
a real three-dimensional space (see Chap. 18). Considering for simplicity a
one-dimensional representation space (i.e., d ¼ 1) and omitting the index v, as a
transformation of ψ we have
 
Rθ ψ ðxÞ ¼ ψ R1
θ ð xÞ , ð20:2Þ

where the index θ indicates the rotation angle in a real space whose dimension is two
or three. Note that the dimensions of the real space and representation space may or
may not be the same. Equation (20.2) is essentially the same as (19.11).
As a three-dimensional matrix representation of rotation, we have, e.g.,
0 1
cos θ  sin θ 0
B C
Rθ ¼ @ sin θ cos θ 0 A: ð20:3Þ
0 0 1

The matrix Rθ represents a rotation around the z-axis in ℝ3; see Fig. 11.2 and
(11.31). Borrowing the notation of (17.3) and (17.12), we have
0 10 1
cos θ  sin θ 0 x
B CB C
Rθ ðxÞ ¼ ðe1 e2 e3 Þ@ sin θ cos θ 0 A@ y A: ð11:36Þ
0 0 1 z

Therefore, we get
0 10 1
cos θ sin θ 0 x
B CB C
R1
θ ðxÞ ¼ ðe1 e2 e3 Þ@  sin θ cos θ 0 A@ y A: ð20:4Þ
0 0 1 z
20.1 Introduction: Operators of Rotation and Infinitesimal Rotation 801

Let us think of an infinitesimal rotation. Taking a first-order quantity of an


infinitesimal angle θ, (20.4) can be approximated as
0 10 1 0 1
1 θ 0 x x þ θy
B CB C B C
R1 B CB C B C
θ ðxÞ  ðe1 e2 e3 Þ@ θ 1 0 A@ y A ¼ ðe1 e2 e3 Þ@ y  θx A

0 0 1 z z ð20:5Þ
¼ ðxe1 þ θye1 ye2  θxe2 ze3 Þ
¼ ð½x þ θye1 ½y  θxe2 ze3 Þ:

Putting

ψ ðxÞ = ψ ðxe1 þ ye2 þ ze3 Þ  ψ ðx, y, zÞ,

we have
 
Rθ ψ ðxÞ ¼ ψ R1
θ ð xÞ ð20:2Þ

or

Rθ ψ ðx, y, zÞ  ψ ðx þ θy, y  θx, zÞ


∂ψ ∂ψ
¼ ψ ðx, y, zÞ þ θy þ ðθxÞ
∂x ∂y ð20:6Þ
 
∂ ∂
¼ ψ ðx, y, zÞ  iθðiÞ x  y ψ ðx, y, zÞ:
∂y ∂x

Now, using a dimensionless operator ℳz (¼Lz/ħ) such that


 
∂ ∂
ℳz  i x  y , ð20:7Þ
∂y ∂x

we get

Rθ ψ ðx, y, zÞ  ψ ðx, y, zÞ  iθℳz ψ ðx, y, zÞ ¼ ð1  iθℳz Þψ ðx, y, zÞ: ð20:8Þ

Note that the operator ℳz has already appeared in Sect. 3.5 [also see (3.19), Sect.
3.2], but in this section we denote it by ℳz to distinguish it from the previous
notation Mz. This is because ℳz and Mz are connected through a unitary similarity
transformation (vide infra).
From (20.8), we express an infinitesimal rotation θ around the z-axis as
802 20 Theory of Continuous Groups

Rθ = 1  iθℳz : ð20:9Þ

Next, let us think of a rotation of a finite angle θ. We have


 n
Rθ ¼ Rθ=n , ð20:10Þ

where RHS of (20.10) implies that the rotation Rθ of a finite angle θ is attained by
n successive infinitesimal rotations of Rθ/n with a large enough n.
Taking a limit n ! 1 [1, 2], we get
 n
 n θ
Rθ ¼ lim Rθ=n ¼ lim 1  i ℳz ¼ exp ðiθℳz Þ: ð20:11Þ
n!1 n!1 n

Recall the following definition of an exponential function other than (15.7) [3]:

x n
ex  n!1
lim 1 þ : ð20:12Þ
n

Note that as in the case of (15.7), (20.12) holds when x represents a matrix. Thus,
if a dimension of the representation space d is two or more, (20.9) should be read as

Rθ ¼ E  iθℳz , ð20:13Þ

where E is a (d, d) identity matrix; ℳz is represented by a (d, d) square matrix


accordingly. As already shown in Chap. 15, an exponential function of a matrix is
well defined. Since ℳz is Hermitian, iℳz is anti-Hermitian (see Chaps. 2 and 3).
Thus, using (20.11) Rθ can be expressed as

Rθ ¼ exp ðθAz Þ, ð20:14Þ

with

Az  iℳz : ð20:15Þ

Operators of this type play an essential role in continuous groups. This is because
in (20.14) the operator Az represents a Lie algebra, which in turn generates a Lie
group. The Lie group is categorized as one of continuous groups along with a
topological group. In this chapter we further explore various characteristics of SO
(3) and SU(2), both of which are Lie groups and have been fully investigated from
various aspects. The Lie algebras frequently appear in the theory of continuous
groups in combination with the Lie groups. Brief outline of the theory of Lie groups
and Lie algebras will be given at the end of this chapter.
20.1 Introduction: Operators of Rotation and Infinitesimal Rotation 803

Let us further consider implications of (20.9) and (20.13). From those equations
we have

ℳz ¼ ð1  Rθ Þ=iθ ð20:16Þ
or

ℳz ¼ ðE  Rθ Þ=iθ: ð20:17Þ

As a very trivial case, we might well be able to think of a one-dimensional real


space. Let the space extend along the z-axis and a unit vector of that space be e3.
Then, with any rotation Rθ around the z-axis makes e3 unchanged (or invariant). That
is, we have

Rθ ðe3 Þ ¼ e3 :

This means Rθ  E. Putting this into (20.17), we have ℳz ¼ 0. From (20.15), we


have Az ¼ 0 as well. In fact, (20.14) shows that this is the case.
With the three-dimensional case, we have
0 1 0 1
1 0 0 1 θ 0
B C B C
E ¼ @0 1 0A and Rθ ¼ @ θ 1 0 A: ð20:18Þ
0 0 1 0 0 1

Therefore, from (20.17) we get


0 1
0 i 0
B C
ℳz ¼ @ i 0 0 A: ð20:19Þ
0 0 0

Next, viewing x, y, and z as vector components (i.e., functions), we think of the


following equation:
0 10 1
0 i 0 c1
B CB C
ℳz ðΧÞ ¼ ðx y zÞ@ i 0 0 A@ c2 A: ð20:20Þ
0 0 0 c3

The notation of (20.20) denotes the vector transformation in the representation


space, in accordance with (11.37). In (20.20) we consider (x y z) as basis vectors as in
the case of (e1 e2 e3) of (17.2) and (17.3). In other words, we are thinking of a vector
transformation in a three-dimensional representation space spanned by a set of basis
804 20 Theory of Continuous Groups

Fig. 20.1 Function


f(x, y) ¼ ay (a > 0). The = , = ( > 0)
function ay is drawn green.
Only a part of f (x, y) > 0
is shown

vectors (x y z). In this case, ℳz represented by (20.19) operates on a vector (i.e., a


function) Χ expressed by
0 1
c1
B C
Χ ¼ ð x y z Þ @ c2 A, ð20:21Þ
c3

where c1, c2, and c3 are coefficients (or coordinates). Again, this notation is a natural
extension of (11.13). We emphasize once more that (x y z) does not represents
coordinates but functions. As an example, we depict a function f(x, y) ¼ ay (a > 0) in
Fig. 20.1. The function ay (drawn green in Fig. 20.1) behaves as a vector
with respect to a linear transformation like a basis vector e2 of the Cartesian
coordinate. The “directionality” as a vector can easily be visualized with the function
ay.
Since ℳz of (20.19) is Hermitian, it must be diagonalized by unitary similarity
transformation (Chap. 14). As in the case of, e.g., (14.96), we find a diagonalizing
unitary matrix P with respect to ℳz such that
0 1
1 1
 pffiffiffi pffiffiffi 0
B 2 2 C
B C
P¼B
B  pffiffiffi  piffiffiffi
i C: ð20:22Þ
@ 0C
A
2 2
0 0 1

Using (20.22), we rewrite (20.20) as


20.1 Introduction: Operators of Rotation and Infinitesimal Rotation 805

0 1 0 1
0 i 0 c1
B C B C
ℳz ðΧÞ ¼ ðx y zÞP  P1 B
@i 0 0C 1 B C
AP  P @ c2 A
0 0 0 c3
0 1 1
0
1 1 1 i
 pffiffiffi pffiffiffi 0 0 1
 pffiffiffi pffiffiffi 0 0 1
B 2 2 C 1 0 0 B 2 2 C c1
B CB CB CB C
B CB B 1 CB C
¼ ðx y zÞB 1 0 C
 pffiffiffi 0 C AB pffiffiffi pffiffiffi 0 C
B
i i 0 i c2
B  pffiffiffi C@ C@ A
@ 2 2 A @ 2 2 A
0 0 0 c3
0 0 1 0 0 1
0 1
0 1  p1ffiffiffi c þ piffiffiffi c
 1 0 0 B 2 C
1 2
 2
1 i 1 i B CB
B
C
C
¼  pffiffiffi x  pffiffiffi y pffiffiffi x  pffiffiffi y z B
@ 0 1 0 CB 1
AB pffiffiffi c1 þ pffiffiffi c2 C
i :
2 2 2 2 C
@ 2 2 A
0 0 0
c3
ð20:23Þ

fz such that
Here we define M
0 1
1 0 0
fz  P1 ℳz P ¼ B
M @0
C
1 0 A: ð20:24Þ
0 0 0

Equation (20.24) was obtained by putting l ¼ 1 and shuffling the rows and
columns of (3.158) in Chap. 3. That is, choosing a (real) unitary matrix R3 given by
0 1
0 0 1
B C
R3 ¼ @ 1 0 0 A, ð17:116Þ
0 1 0

we have
0 1
1 0 0
f 1 B C
R1 1 1
3 M z R3 ¼ R3 P ℳz PR3 ¼ ðPR3 Þ ℳz ðPR3 Þ ¼ @ 0 0 0 A ¼ Mz:
0 0 1

Hence, Mz that appeared as a diagonal matrix in (3.158) has been obtained by a


unitary similarity transformation of ℳz using a unitary operator PR3.
Comparing (20.20) and (20.23), we obtain useful information. If we choose (x y z)
for basis vectors, we could not have a representation that diagonalizes ℳz. If, on the
other hand, we choose  p1ffiffi2 x  piffiffi2 y p1ffiffi2 x  piffiffi2 y z for the basis vectors, we get a
representation that diagonalizes ℳz. These basis vectors are sometimes referred to as
806 20 Theory of Continuous Groups

a spherical basis. If we carefully look at the latter basis vectors (i.e., spherical basis),
we
 1 notice that these vectors have the same transformation properties as
Y 1 Y 1
1 Y 01 . In fact, using (3.216) and converting the spherical coordinates to
Cartesian coordinates in (3.216), we get [4].
rffiffiffiffiffi  
 3 1 1 1
Y 11 Y 1
1 Y 01 ¼  pffiffiffi ðx þ iyÞ pffiffiffi ðx  iyÞ z , ð20:25Þ
4π r 2 2

where r is a radial coordinate (i.e., a positive number)


 that appears in the polar
coordinate (r, θ). Note that the factor r contained in Y 11 Y 1
1 Y 01 is invariant with
respect to the rotation.
We can generalize the results and conform them to the discussion of related
characteristics in representation spaces of higher dimensions. The spherical surface
harmonics possess an important position for this (vide infra).
From Sect. 3.5, (3.150) we have

M z Y 00 ¼ 0,
pffiffiffiffiffiffiffiffiffiffi
where Y 00 ðθ, ϕÞ ¼ 1=4π . The above relation obviously shows that Y 00 ðθ, ϕÞ
belongs to an eigenvalue of zero of Mz. This is consistent with the earlier comments
made on the one-dimensional real space.

20.2 Rotation Groups: SU(2) and SO(3)

Bearing in mind the aforementioned background knowledge, we further develop the


theory of the rotation groups SU(2) and SO(3) and explore in detail various proper-
ties of them. First, we deal with the topics through conventional approaches. Later
we will describe them systematically.
Following the argument developed above, we wish to relate an anti-Hermitian
operator to a rotation operator of a finite angle θ represented by (17.12). We express
Hermitian operators related to the x- and y-axes as
0 1 0 1
0 0 0 0 0 i
B C B C
ℳx ¼ @ 0 0 i A and ℳy ¼ @ 0 0 0 A:
0 i 0 i 0 0

Then, anti-Hermitian operators corresponding to (20.15) are described by


0 1 0 1 0 1
0 1 0 0 0 0 0 0 1
B C B C B C
Az ¼ @ 1 0 0 A, A x ¼ @ 0 0 1 A, Ay ¼ @ 0 0 0 A, ð20:26Þ
0 0 0 0 1 0 1 0 0
20.2 Rotation Groups: SU(2) and SO(3) 807

where Ax  iℳx and Ay  iℳy. These are characterized by real skew-symmetric


matrices.
Using (15.7), we calculate exp(θAz) such that

1 1 1 1
exp ðθAz Þ ¼ E þ θAz þ ðθAz Þ2 þ ðθAz Þ3 þ ðθAz Þ4 þ ðθAz Þ5 þ   : ð20:27Þ
2! 3! 4! 5!

We note that
0 1 0 1
1 0 0 1 0 0
B C B C
Az 2 ¼@ 0 1 0 A ¼ Az , Az ¼ @ 0
6 4
1 0 A ¼ Az 8 , etc:;
0 0 0 0 0 0
0 1 0 1
0 1 0 0 1 0
B C B C
Az 3 ¼ @ 1 0 0 A ¼ Az 7 , Az 5 ¼ @ 1 0 0 A ¼ Az 9 , etc:
0 0 0 0 0 0

Thus, we get

1 1 1 1
exp ðθAz Þ ¼ E þ ðθAz Þ2 þ ðθAz Þ4 þ    þ θAz þ ðθAz Þ3 þ ðθAz Þ5 þ   
2! 4! 3! 5!
0 1
1 1
1  θ2 þ θ4     0 0
B 2! 4! C
B C
¼BB 1 2 1 4 C
0 1  θ þ θ     0 C
@ 2! 4! A
0 0 1
0 1
1 3 1 5
0 θ þ θ  θ þ    0
B 3! 5! C
B C
þBB 1 3 1 5 C
θ  θ þ θ þ    0 0 C :
@ 3! 5! A
0 0 0
0 1 0 1
cos θ 0 0 0  sin θ 0
B C B C
¼B@ 0 cos θ 0 C B
A þ @ sin θ 0 0CA
0 0 1 0 0 0
0 1
cos θ  sin θ 0
B C
B
¼ @ sin θ cos θ 0 C A:
0 0 1
ð20:28Þ

Hence, we recover the matrix form of (17.12). Similarly, we get


808 20 Theory of Continuous Groups

0 1
1 0 0
B C
exp ðφAx Þ ¼ @ 0 cos φ  sin φ A, ð20:29Þ
0 sin φ cos φ
0 1
cos ϕ 0 sin ϕ
 B C
exp ϕAy ¼ @ 0 1 0 A: ð20:30Þ
 sin ϕ 0 cos ϕ

Of the above operators, using (20.28) and (20.30) we recover (17.101) related to
the Euler angles such that

f
R3 ¼ exp ðαAz Þ exp βAy exp ðγAz Þ: ð20:31Þ

Using exponential functions of matrices, (20.31) gives a transformation matrix


containing the Euler angles and supplies us with a direct method to be able to make a
matrix representation of SO(3).
Although (20.31) is useful for the matrix representation in the real space, its
extension to abstract representation spaces would somewhat be limited. In this
respect, the method developed in SU(2) can widely be used to produce representa-
tion matrices for a representation space of an arbitrary dimension. Let us start with
constructing those representation matrices using (2, 2) complex matrices.

20.2.1 Construction of SU(2) Matrices

In Sect. 16.1 we mentioned a general linear group GL(n, ℂ). There are many
subgroups of GL(n, ℂ). Among them, SU(n), in particular SU(2), plays a central
role in theoretical physics and chemistry. In this section we examine how to
construct SU(2) matrices.
Definition 20.1 A group consisting of (n, n) unitary matrices with a determinant 1 is
called a special unitary group of degree n and denoted by SU(n).
In Chap. 14 we mentioned that the “absolute value” of determinant of a unitary
matrix is 1. In Definition 20.1, on the other hand, there is an additional constraint in
that we are dealing with unitary matrices whose determinant is 1.
The SU(2) matrices U have the following general form:
 
a b
U¼ , ð20:32Þ
c d

where a, b, c, and d are complex numbers. According to Definition 20.1 we try to


seek conditions for these numbers. The unitarity condition UU{ ¼ U{U ¼ E reads as
20.2 Rotation Groups: SU(2) and SO(3) 809

! ! !
a b a c jaj2 þ jbj2 ac þ bd 
¼
c d b d  a c þ b d jcj2 þ jdj2
! ! ! ! ð20:33Þ
a c a b jaj2 þ jcj2 a b þ c  d 1 0
¼ ¼ ¼ :
b d c d ab þ cd  jbj2 þ jdj2 0 1

Then we have
(i) |a|2 + |b|2 ¼ 1, (ii) |c|2 + |d|2 ¼ 1, (iii) |a|2 + |c|2 ¼ 1,
(iv) |b|2 + |d|2 ¼ 1, (v) ac + bd ¼ 0, (vi) ab + cd ¼ 0,
(vii) ad  bc ¼ 1.
Of the above conditions, (vii) comes from detU ¼ 1. Condition (iv) can be
eliminated by conditions (i)(iii). From (v) and (vii), using Cramer’s rule we have

0 b a 0

1 a b b 1 a
c¼   ¼ ¼ b , d ¼   ¼ ¼ a : ð20:34Þ
a b 2
jaj þ jbj 2 a b jaj þ jbj2
2

b a b a

From (20.34) we see that (i)–(iv) are equivalent and that (v) and (vi) are redun-
dant. Thus, as a general form of U we get
 
a b
U¼ with jaj2 þ jbj2 ¼ 1: ð20:35Þ
b a

As a result, we have four freedoms with a and b with a restricting condition of


|a|2 + |b|2 ¼ 1. That is, we finally get three freedoms with choice of parameters.
Putting

a ¼ p þ iq and b ¼ r þ is ðp, q, r, s : realÞ,

we have

p2 þ q2 þ r 2 þ s2 ¼ 1: ð20:36Þ

Therefore, the restricted condition of (20.35) is equivalent to that p, q, r, and s are


regarded as coordinates that are positioned on a unit sphere of ℝ4.
From (20.35), we have
 
1 { a b
U ¼U ¼ :
b a

Also putting
810 20 Theory of Continuous Groups

   
a1 b1 a2 b2
U1 ¼ and U2 ¼ , ð20:37Þ
b1 a1 b2 a2

we get
 
a1 a2  b1 b2 a1 b2 þ b1 a2
U1U2 ¼ : ð20:38Þ
a2 b1  a1 b2 b1 b2 þ a1 a2

Thus, both U1 and U1U2 satisfy criteria (20.35) and, hence, the unitary matrices
having a matrix form of (20.35) constitute a group; i.e., SU(2).
Next we wish to connect the SU(2) matrices to the matrices that represent the
angular momentum. In Sect. 3.4 we developed the theory of generalized angular
momentum. In the case of j ¼ 1/2 we can obtain accurate information about the spin
states. Using (3.101) as a starting point and defining the state belonging to the spin
   
1 0
angular momentum μ ¼  1/2 and μ ¼ 1/2 as and , respectively, we
0 1
obtain
   
ðþÞ 0 0 ð Þ 0 1
J ¼ and J ¼ : ð20:39Þ
1 0 0 0
   
1 0
Note that and are in accordance with (2.64) as a column vector
0 1
representation. Using (3.72), we have
     
1 0 1 1 0 i 1 1 0
Jx ¼ , Jy ¼ , Jz ¼ : ð20:40Þ
2 1 0 2 i 0 2 0 1
   
1 1 1
In (20.40) Jz was given so that we may have J z ¼ and
0 2 0
   
0 1 0 1
Jz ¼ . Deleting the coefficient from (20.40), we get Pauli spin
1 2 1 2
matrices such that
     
0 1 0 i 1 0
σx ¼ , σy ¼ , σz ¼ : ð20:41Þ
1 0 i 0 0 1

Multiplying i on (20.40), we have anti-Hermitian operators described by


     
1 0 i 1 0 1 1 i 0
ζx ¼ , ζy ¼ , ζz ¼ : ð20:42Þ
2 i 0 2 1 0 2 0 i

Notice that these are traceless anti-Hermitian matrices.


20.2 Rotation Groups: SU(2) and SO(3) 811

Now let us seek rotation matrices in a manner related to that deriving (20.31) but
at a more general level. From Property (7)0 of Chap. 15 (see Sect. 15.2), we know
that

exp ðtAÞ

with an anti-Hermitian operator A combined with a real number t produces a unitary


matrix. Let us apply this to the anti-Hermitian matrices described by (20.42).
Using ζ z of (20.42), we calculate exp(aζ z) as in (20.27). We show only the results
described by
0 α α 1
α α @ cos 2 þ i sin 2 0
A
exp ðαζ z Þ ¼ E  cos  iσ z sin ¼ α α
2 2 0 cos  i sin
2 2

!
e2 0
¼ : ð20:43Þ
0 e 2

Readers are recommended to derive this. Using ζ y and applying similar calcula-
tion procedures to the above, we also get
0 1
β β
 B
cos
2
sin
2 C:
exp βζ y ¼ @ A ð20:44Þ
β β
 sin cos
2 2

The representation matrices describe “complex rotations” and (20.43) and


ð12Þ
(20.44) correspond to (20.28) and (20.30), respectively. We define Dα,β,γ as the
representation matrix that describes the successive rotations α, β, and γ and corre-
sponds to f
R3 in (17.101) and (20.31) including Euler angles. As a result, we have

ð12Þ 
Dα,β,γ ¼ exp ðαζ z Þ exp βζ y exp ðγζ z Þ
0 1
β β
e2ðαþγ Þ cos e2ðαγ Þ sin
i i

B 2 2 C,
¼@ A ð20:45Þ
β β
e2ðγαÞ sin e2ðαþγÞ cos
i i

2 2
ð12Þ
where the superscript 12 of Dα,β,γ corresponds to the non-negative generalized angular
 1
momentum j ¼ 2 . Equation (20.45) can be obtained by putting a ¼ e2ðαþγÞ cos β2
i

and b ¼ e2ðαγÞ sin β2 in the matrix U of (20.35). Using this general SU(2) represen-
i

tation matrix, we construct representation matrices of higher orders.


812 20 Theory of Continuous Groups

20.2.2 SU(2) Representation Matrices: Wigner Formula

Suppose that we have two functions (or vectors) v and u. We regard v and u are
linearly independent vectors that undergo a unitary transformation by a unitary
matrix U of (20.35). That is, we have
 
0 0 a b
ðv u Þ ¼ ðv uÞU ¼ ðv uÞ , ð20:46Þ
b a

where (v u) are transformed into (v0 u0). Namely, we have

v0 ¼ av  b u, u0 ¼ bv þ a u:

Let us further define according to Hamermesh [5] an orthonormal set of 2j + 1


functions such that

u jþm v jm
f m ¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ðm ¼ j, j þ 1,   , j  1, jÞ, ð20:47Þ
ð j þ mÞ!ð j  mÞ!

where j is an integer or a half-odd-integer. Equation (20.47) implies that 2j + 1


monomials fm of 2j-degree with respect to v and u constitute the orthonormal basis.
Related discussion can be seen in Sect. 20.3.2. The description that follows includes
contents related to generalized angular momenta (see Sect. 3.4).
We examine how these functions fm (or vectors) are transformed by (20.45). We
denote this transformation by Rða, bÞ. That is, we have
X ð jÞ
Rða, bÞð f m Þ  m0
f m0 Dm0 m ða, bÞ, ð20:48Þ

ð jÞ
where Dm0 m ða, bÞ denotes matrix elements with respect to the transformation.
Replacing u and v with u0 and v0, respectively, we get

1
ℜða,bÞf m ¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ðbv þ a uÞ jþm ðav  b uÞ jm
ð j þ mÞ!ð j  mÞ!
X 1
¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ðbv þ a uÞ jþm ðav  b uÞ jm
μ,v ð j þ mÞ!ð j  mÞ!
X 1 ð j þ mÞ!ð j  mÞ!
¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
μ,v ð j þ mÞ!ð j  mÞ! ð j þ m  μÞ!μ!ð j  m  vÞ!v!

ða uÞ jþmμ ðbvÞμ ðb uÞ jmv ðavÞv


pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
X ð j þ mÞ!ð j  mÞ!
¼ μ,v ð j þ m  μÞ!μ!ð j  m  vÞ!v!
ða Þ jþmμ bμ ðb Þ jmv av u2jμv vμþv :

ð20:49Þ
20.2 Rotation Groups: SU(2) and SO(3) 813

Here, to eliminate v, putting

2j  μ  v ¼ j þ m0 , μ þ v ¼ j  m0 ,

we get
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
X ð j þ mÞ!ð j  mÞ!
Rða, bÞf m ¼ μ,m0 ð j þ m  μÞ!μ!ðm0  m þ μÞ!ð j  m0  μÞ!
0
ða Þ jþmμ bμ ðb Þm mþμ a jm μ u jþm v jm :
0 0 0
ð20:50Þ

Expressing f m0 as
0 0
u jþm v jm
f m0 ¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ðm0 ¼ j, j þ 1,   , j  1, jÞ, ð20:51Þ
ð j þ m0 Þ!ð j  m0 Þ!

we obtain
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
X ð j þ mÞ!ð j  mÞ!ð j þ m0 Þ!ð j  m0 Þ!
Rða, bÞð f m Þ ¼ f 0
μ,m 0 m
ð j þ m  μÞ!μ!ðm0  m þ μÞ!ð j  m0  μÞ!
0
ða Þ jþmμ bμ ðb Þm mþμ a jm μ :
0
ð20:52Þ

Equating (20.52) with (20.48), we get


pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
ð jÞ
X ð j þ mÞ!ð j  mÞ!ð j þ m0 Þ!ð j  m0 Þ!
Dm0 m ða, bÞ ¼
μ ð j þ m  μÞ!μ!ðm0  m þ μÞ!ð j  m0  μÞ!
0
ða Þ jþmμ bμ ðb Þm mþμ a jm μ :
0
ð20:53Þ

In (20.53), factorials of negative integers must be avoided; see (3.216). Finally,


replacing a (or a) and b (or b) with e2ðαþγÞ cos β2 [or e2ðαþγÞ cos β2] and e2ðαγÞ sin β2
i i i

[or e2ðγαÞ sin β2] by use of (20.45), respectively, we get


i

pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
ð jÞ
X ð1Þm0 mþμ ð j þ mÞ!ð j  mÞ!ð j þ m0 Þ!ð j  m0 Þ!
Dm0 m ðα, β, γ Þ ¼
μ ð j þ m  μÞ!μ!ðm0  m þ μÞ!ð j  m0  μÞ!
 2jþmm0 2μ  m0 mþ2μ
iðαm0 þγmÞ β β
e cos sin : ð20:54Þ
2 2

Equation (20.54) is called Wigner formula [5–7].


ð jÞ
To confirm that Dm0 m is indeed unitary, we carry out following calculations. From
(20.47) we have
814 20 Theory of Continuous Groups

Xj Xj ju jþm v jm j Xj juj2ð jþmÞ jvj2ð jmÞ


2
j f j2 ¼
m¼j m m¼j ð j þ mÞ!ð j  mÞ!
¼ m¼j ð j þ mÞ!ð j  mÞ!
:

Meanwhile, we have

h i2j X2j ð2jÞ!juj2ð2jkÞ jvj2k Xj ð2jÞ!juj2ð jþmÞ jvj


2ð jmÞ
2
juj þ jvj 2
¼ ¼ ,
k¼0 ð2j  k Þ!k! m¼j ð j þ mÞ!ð j  mÞ!

where with the last equality k was replaced with j  m. Comparing the above two
equations, we get

Xj h i2j
1
j f j2 ¼ juj2 þ jvj2 : ð20:55Þ
m¼j m ð2jÞ!

Viewing (20.46) as variables transformation and taking adjoint of (20.46), we


have
   
v0 { v
¼U , ð20:56Þ
u0 u
 
a b
where we defined U ¼ (i.e., a unitary matrix). Operating (20.56) on
b a
both sides of (20.46) from the right, we get
     
v0 v v
ðv0 u0 Þ ¼ ðv uÞUU { ¼ ð v uÞ :
u0 u u

That is, we have a quadratic form described by

ju0 j þ jv0 j ¼ juj2 þ jvj2 :


2 2
ð20:57Þ
Pj 2
From (20.55) and (20.57), we conclude that m¼j j f m j (i.e., the length of
ð jÞ
vector) is invariant under the transformation Dm0 m. This implies that the transforma-
ð jÞ
tion is unitary and, hence, that Dm0 m is a unitary matrix [5].

20.2.3 SO(3) Representation Matrices and Spherical Surface


Harmonics

Once representation matrices D( j )(α, β, γ) have been obtained, we are able to get
further useful information. We immediately notice that (20.54) is related to (3.216)
20.2 Rotation Groups: SU(2) and SO(3) 815

that is explicitly described by trigonometric functions. Putting m ¼ 0 in (20.54) we


have
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
ð jÞ ð jÞ
X ð1Þm0 þμ j! ð j þ m0 Þ!ð j  m0 Þ!
Dm0 0 ðα, β, γ Þ ¼ Dm0 0 ðα, βÞ
¼ μ ð j  μÞ!μ!ðm0 þ μÞ!ð j  m0  μÞ!
 2jm0 2μ  m0 þ2μ
0 β β
eiαm cos sin : ð20:58Þ
2 2

Note that (20.58) does not depend on the variable γ, because m ¼ 0 leads to
eimγ ¼ ei  0  γ ¼ e0 ¼ 1 in (20.54). Notice also that the event of m ¼ 0 in (20.54)
never happens with the case where j is a half-odd-integer, but happens with the case
where j is an integer l. In fact, D(l )(α, β, γ) is a (2l + 1)-degree representation of SO
(3). Later in this section, we will give a full matrix form in the case of l ¼ 1.
Replacing m0 with m in (20.58) we get
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi  2jm2μ  mþ2μ
ð jÞ
X ð1Þmþμ j! ð j þ mÞ!ð j  mÞ! β β
iαm
Dm0 ðα,β,γ Þ ¼ μ ð j  μÞ!μ!ðm þ μÞ!ð j  m  μÞ!
e cos sin :
2 2
ð20:59Þ

For (20.59) to be consistent with (3.216), by changing notations of variables and


replacing j with an integer l we further rewrite (20.59) to get
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi  2l2rm  2rþm
ðlÞ
X ð1Þmþr l! ðl þ mÞ!ðl  mÞ! θ θ
imϕ
Dm0 ðϕ, θÞ ¼ r r!ðl  m  r Þ!ðl  r Þ!ðm þ r Þ!
e cos sin :
2 2
ð20:60Þ

Notice that in (20.60) we deleted the variable γ as it was redundant. Comparing


(20.60) with (3.216), we get [5].

h i rffiffiffiffiffiffiffiffiffiffiffiffi
ðlÞ 4π m
Dm0 ðϕ, θÞ ¼ Y ðθ, ϕÞ: ð20:61Þ
2l þ 1 l

From a general point of view, let us return to a discussion of SU(2) and introduce
a following important theorem in relation to the basis functions of SU(2).
Theorem 20.1 [1] Let {ψ j, ψ j + 1,   , ψ j} be a set of (2j + 1) functions. Let D( j )
be a representation of the special unitary group SU(2). Suppose that we have a
following expression for ψ m, k (m ¼  j,   , j) such that
h i
ð jÞ
ψ m,k ðα, β, γ Þ ¼ Dmk ðα, β, γ Þ , ð20:62Þ

where α, β, and γ are Euler angles that appeared in (17.101). Then, the above
functions are basis functions of the representation of D( j ).
816 20 Theory of Continuous Groups

Fig. 20.2 Coordinate I


systems O, I, and II and their ( , , )
transformation. Regarding
the symbols and notations,
see text
O

Proof In Fig. 20.2 we show coordinate systems O, I, and II. Let P, Q, R be operators
of transformation (i.e., rotation) between the coordinate system as shown. The
transformation Q is defined as the coordinate transformation that transforms I to
II. Note that their inverse transformations, e.g., Q1 exists. In Fig. 20.2, P and R are
specified as P(α0, β0, γ 0) and R(α, β, γ), respectively. Then, we have [1, 2]

Q1 R ¼ P: ð20:63Þ

According to the expression (19.12), we have

  h i
ð jÞ 
Qψ m,k ðα, β, γ Þ ¼ ψ m,k Q1 ðα, β, γ Þ ¼ ψ m,k ðα0 , β0 , γ 0 Þ ¼ Dmk Q1 R ,

where the last equality comes from (20.62) and (20.63); (α, β, γ) and (α0, β0, γ 0) stand
for the transformations R(α, β, γ) and P(α0, β0, γ 0), respectively. The matrix element
ð jÞ 
Dmk Q1 R can be written as

ð jÞ 
X  ð jÞ
Xh i
ð jÞ
Dmk Q1 R ¼ DðmnjÞ Q1 Dnk ðRÞ ¼ DðnmjÞ ðQÞ Dnk ðRÞ,
n n

where with the last equality we used the unitarity of the representation matrix D( j ).
Taking complex conjugate of the above expression, we get
X h i
ð jÞ
Qψ m,k ðα, β, γ Þ ¼ ψ m,k ðα0 , β0 , γ 0 Þ ¼ DðnmjÞ ðQÞ Dnk ðRÞ
n
X h i X
ð jÞ ð jÞ
¼ n
D nm ð Q Þ D nk ð α, β, γ Þ ¼ ψ n,k ðα, β, γ ÞDðnmjÞ ðQÞ, ð20:64Þ
n

where with the last equality we used the supposition (20.62). Equation (20.64)
implies that ψ m, k (m ¼  j,   , 0,   , j) are basis functions of the representation
of D( j ). This completes the proof. ∎
If we restrict Theorem 20.1 to SO(3), we get an important result. Replacing j with
an integer l and putting k ¼ 0 in (20.62) and (20.64), we get
20.2 Rotation Groups: SU(2) and SO(3) 817

X
Qψ m,0 ðα, β, γ Þ ¼ ψ m,0 ðα0 , β0 , γ 0 Þ ¼ n
ψ n,0 ðα, β, γ ÞDðnm

ðQÞ,
h i ð20:65Þ
ðlÞ
ψ m,0 ðα, β, γ Þ ¼ Dm0 ðα, β, γ Þ :

This expression shows that the complex conjugates of individual components for
the “center” column vector of the representation matrix give the basis functions of
the representation of D(l ).
Removing γ as it is redundant and by the aid of (20.61), we get
rffiffiffiffiffiffiffiffiffiffiffiffi
4π m
ψ m,0 ðα, βÞ  Y ðβ, αÞ:
2l þ 1 l

Thus, in combination with (20.61), Theorem 20.1 immediately indicates that the
spherical surface harmonics Y m l ðθ, ϕÞ ðm ¼ l,   , 0,   , lÞ span the representation
space with respect to D(l ). That is, we have
X
l ðβ, αÞ ¼
QY m Y n ðβ, αÞDðnm
n l

ðQÞ: ð20:66Þ

Now, let us get back to (20.63). From (20.63), we have

Q ¼ RP1 : ð20:67Þ

Since P represents an arbitrary transformation, we may choose any (orthogonal)


coordinate system I in ℝ3. Meanwhile, a transformation which transforms I into II is
necessarily present and given by Q that is expressed as (20.67); see (11.72) and the
relevant argument of Sect. 11.4. Consequently, Q represents an arbitrary transfor-
mation in ℝ3 in turn.
Now, in (20.67) putting P ¼ E we get Q  R(α, β, γ). Then, replacing Q with R in
(20.66) as a special case we have
X
Rðα, β, γ ÞY m
l ðβ, αÞ ¼ Y n ðβ, αÞDðnm
n l

ðα, β, γ Þ: ð20:68Þ

The LHS of (20.68) is associated with three successive transformations. To


describe it appropriately, let us recall (19.12) again. We have
 
Rf ðrÞ ¼ f R1 ðrÞ : ð19:12Þ

Now, we are thinking of the case where R in (19.12) is described by R(α, β, γ) or


f
R3 ¼ Rzα R0y0 β R00 z00 γ in (17.101). That is, we have
818 20 Theory of Continuous Groups

h i
Rðα, β, γ Þf ðrÞ ¼ f Rðα, β, γ Þ1 ðrÞ :

f3 of (17.101), we readily get [5].


Considering the constitution of R

Rðα, β, γ Þ1 ¼ Rðγ, β, αÞ:

Thus, we have

Rðα, β, γ Þf ðrÞ ¼ f ½Rðγ, β, αÞðrÞ: ð20:69Þ

Applying (20.69) to (20.68), we get the successive (inverse) transformations of


the functions such that

l ðβ, αÞ ! Y l ðβ, α  γ Þ ! Y l ðβ  β, α  γ Þ ! Y l ðβ  β, α  γ  αÞ:


Ym m m m

The last entry in the above is [5]

l ðβ  β, α  γ  αÞ ¼ Y l ð0, γ Þ:
Ym m

Then, from (20.68) we get


X
l ð0, γ Þ ¼
Ym Y nl ðβ, αÞDðnm

ðα, β, γ Þ: ð20:70Þ
n

Meanwhile, from (20.61) we have


rffiffiffiffiffiffiffiffiffiffiffiffih i
2l þ 1 ðlÞ
l ð0, γ Þ ¼
Ym

Dm0 ðγ, 0Þ :

Returning to (20.60), if we would have sin 2θ factor in (20.60) or in (3.216), the


ðlÞ
relevant terms vanish with Dm0 ðγ, 0Þ except for special cases where 2r + m ¼ 0 in
(3.216) or (20.60). But, to avoid having a negative integer inside the factorial in the
denominator of (20.60), we must have r + m 0. Since r 0, we must have
2r + m ¼ (r + m) + r 0 as well. In other words, only if r ¼ m ¼ 0 in (20.60), we
ðlÞ
have 2r + m ¼ 0, for which Dm0 ðγ, 0Þ does not vanish. Using these conditions in
(20.60), (20.60) is greatly simplified to be

ðlÞ
Dm0 ðγ, 0Þ ¼ δm0 :

From (20.61) we get [5]


20.2 Rotation Groups: SU(2) and SO(3) 819

rffiffiffiffiffiffiffiffiffiffiffiffi
2l þ 1
l ð0, γ Þ
Ym ¼
4π 0m
δ : ð20:71Þ

Notice that this expression is consistent with (3.147) and (3.148). Meanwhile,
replacing ϕ and θ in (20.61) with α and β, respectively, multiplying the resulting
ðlÞ
equation by Dmk ðα, β, γ Þ, and further summing over m, we get
rffiffiffiffiffiffiffiffiffiffiffiffi
X 4π m ðlÞ
X h ðlÞ i
ðlÞ
Y ð β, αÞD ðα, β, γ Þ ¼ D ð α, β, γ Þ Dmk ðα, β, γ Þ
m 2l þ 1 l mk m m0
rffiffiffiffiffiffiffiffiffiffiffiffi
4π k
¼ δ0k ¼ Y ð0, γ Þ, ð20:72Þ
2l þ 1 l

where with the second equality we used the unitarity of the matrices Dðnm

ðα, β, γ Þ (see
Sect. 20.2.2)
qffiffiffiffiffiffiffi and with the last equality we used (20.70) multiplied by the constant

2lþ1 . At the same time, we recovered (20.71) comparing the last two sides of
(20.72).
Rewriting (20.48) in a (2j + 1, 2j + 1) matrix form, we get

f j , f jþ1 ,   , f ℜða, bÞ
j1 , f j
0 1
Dj,j  Dj,j
 B C ð20:73Þ
¼ f j , f jþ1 ,   , f j1 , f j B
@ ⋮ ⋱ ⋮ C A,
D j,j  D j,j

where fm (m ¼  j, j + 1,   , j  1, j) is described as in (20.47). If j is an integer l,


we can choose Y m l ðβ, αÞ for individual fm (i.e., basis functions).
To get used to the abstract representation theory in continuous groups, we wish to
think of a following trivial case: Putting j ¼ 0 in (20.54), (20.54) becomes trivial, but
still valid. In that case, we have m0 ¼ m ¼ 0 so that the inside of the factorial can be
non-negative. In turn, we must have μ ¼ 0 as well. Thus, we have

ð0Þ
D00 ðα, β, γ Þ ¼ 1:

Or, inserting l ¼ m ¼ 0 into (20.61), we have


h i pffiffiffiffiffi 0 pffiffiffiffiffiffiffiffiffiffi
ð 0Þ
D00 ðϕ, θÞ ¼1¼ 4π Y 0 ðθ, ϕÞ, i:e:, Y 00 ðθ, ϕÞ ¼ 1=4π :

Thus, we recover (3.150).


Next, let us reproduce three-dimensional representation D(1) using (20.54). A full
matrix form is expressed as
820 20 Theory of Continuous Groups

0 1
D1,1 D1,0 D1,1
B C
Dð1Þ ¼ B
@ D0,1 D0,0 D0,1 C A
D1,1 D1,0 D1,1
0 β 1 β 1
eiðαþγ Þ cos 2 pffiffiffi eiα sin β eiðαγÞ sin 2 ð20:74Þ
B 2 2 2 C
B C
B C
B 1
pffiffiffi e sin β C
1 iγ
¼ B  pffiffiffi eiγ sin β cos β C:
B 2 2 C
B C
@ β 1 βA
eiðγαÞ sin 2  pffiffiffi eiα sin β eiðαþγÞ cos 2
2 2 2

Note that (20.74) is a unitary matrix. The confirmation is left for readers. The
complex conjugate of the column vector m ¼ 0 in (20.61) with l ¼ 1 is described by
0 1
1
pffiffiffi eiα sin β
h i B
B 2
C
C
Dm0 ðϕ, θÞ ¼ B
ð1Þ C
B cos β C ð20:75Þ
@ 1 iα A
 pffiffiffi e sin β
2

that is related to the spherical surface harmonics (see Sect. 3.6); also see p. 760,
Table 15.4 of Ref. [4].
Now, (20.25), (20.61), (20.65), and (20.73) provide a clue to relating D(1) to the
f3 of (17.101) that includes Euler angles. The argument is
rotation matrix of ℝ3, i.e., R
as follows: As already shown in (20.61) and (20.65), we have related the basis
vectors ( f1 f0 f1) of D(1) to the spherical harmonics Y 1 1 Y 01 Y 11 . Denoting the
spherical basis by

1 1
ff e e
1  pffiffiffi ðx  iyÞ, f 0  z, f 1  pffiffiffi ðx  iyÞ, ð20:76Þ
2 2

we have
0 1
1 1
pffiffiffi  pffiffiffi C
B 2 0 2C
B
ff e e
1 f 0 f 1 ¼ ðx z yÞU ¼ ðx z yÞB
B 0 1 0 C C, ð20:77Þ
@ i i A
 pffiffiffi 0  pffiffiffi
2 2

where we define a (unitary) matrix U as


20.2 Rotation Groups: SU(2) and SO(3) 821

0 1
1 1
pffiffiffi 0  pffiffiffi C
B 2
B 2C
UB
B 0 1 0 C C:
@ i i A
 pffiffiffi 0  pffiffiffi
2 2

Equation (20.77) describes the relationship among two set of functions


f
f 1 fe0 fe1 and (x z y). We should not regard (x z y) as coordinate in ℝ3 as
mentioned in Sect. 20.1. The (3, 3) matrix of (20.77) is essentially the same as P of
0
(20.22). Following (20.73) and taking account of (20.25), we express ff e0 e0
1 f 0 f 1
as

0
ff e 0 e 0  ff
1 f 0 f 1
e e f e e ð1Þ
1 f 0 f 1 Rða, bÞ ¼ f 1 f 0 f 1 D : ð20:78Þ

This means that a set of functions ff e e


1 f 0 f 1 forms the basis vectors of D
(1)
as
well. Meanwhile, the relation (20.77) holds after the transformation, because (20.77)
is independent of the choice of specific coordinate systems. That is, we have

0
ff e 0 e 0 ¼ ðx0 z0 y0 Þ U:
1 f 0 f 1

Then, from (20.78) we get

0
ðx0 z0 y0 Þ ¼ ff e 0 e 0 1 ¼ ff
1 f 0 f 1 U
e e ð1Þ 1
1 f 0 f 1 D U

¼ ðx z yÞUDð1Þ U 1 : ð20:79Þ

Defining a unitary matrix V as


0 1
1 0 0
B C
V ¼@0 0 1A
0 1 0

and operating V on both sides of (20.79) from the right, we get

ðx0 z0 y0 ÞV ¼ ðx z yÞV  V 1 UDð1Þ U 1 V:

That is, we have


 1
ðx0 y0 z0 Þ ¼ ðx y zÞ U 1 V Dð1Þ U 1 V: ð20:80Þ
822 20 Theory of Continuous Groups

Note that (x0 z0 y0)V ¼ (x0 y0 z0) and (x z y)V ¼ (x y z); that is, V exchanges the
order of z and y in (x z y).
Meanwhile, regarding (x y z) as the basis vectors in ℝ3 we have

ð x0 y0 z 0 Þ ¼ ð x y z Þ R , ð20:81Þ

where R represents a (3, 3) orthogonal matrix.


Equation (20.81) represents a rotation in SO(3). Since, x, y, and z are linearly
independent, comparing (20.80) and (20.81) we get
 1  {
R  U 1 V Dð1Þ U 1 V ¼ U 1 V Dð1Þ U 1 V: ð20:82Þ

β β
More explicitly, using a ¼ e2ðαþγÞ cos and b ¼ e2ðαγÞ sin
i i
2 2 as in the case of
(20.45), we describe R as
0   1
Re a2  b2 Im a2 þ b2 2 Re ðabÞ
B   C
R ¼ @ Im a2  b2 Re a2 þ b2 2ImðabÞ A: ð20:83Þ
2 Re ðab Þ 2Imðab Þ 2
j aj  j bj 2

Comparing (20.83) with (17.101), we find out that R is identical to f R3 of


f3 are
(17.101). Equations (20.82) and (20.83) clearly show that D(1) and R  R
connected to each other through the unitary similarity transformation, even though
these matrices differ in that the former matrix is unitary and the latter is a real
orthogonal matrix. That is, D(1) and f
R3 are equivalent in terms of representation; see
Schur’s First Lemma of Sect. 18.3. Thus, D(1) is certainly a representation matrix of
f3 is given by
SO(3). The trace of D(1) and R

f3 ¼ cos ðα þ γ Þð1 þ cos βÞ þ cos β:


TrDð1Þ ¼ Tr R

Let us continue the discussion still further. We think of the quantity (x/r y/r z/r),
where r is radial coordinate of (20.25). Operating (20.82) on (x/r y/r z/r) from the
right, we have

ðx=r y=r z=r ÞR ¼ ðx=r y=r z=r ÞV 1 UDð1Þ U 1 V

¼ ff e e ð1Þ 1
1 =r f 0 =r f 1 =r D U V:

From (20.25), (20.61), (20.75), and (20.76), we get


 
1 1
ff e e
1 =r f 0 =r f 1 =r ¼ pffiffiffi eiα sin β cos β  pffiffiffi eiα sin β : ð20:84Þ
2 2

Using (20.74) and (20.84), we obtain


20.2 Rotation Groups: SU(2) and SO(3) 823

ff e e
1 =r f 0 =r f 1 =r D
ð1Þ
¼ ð0 1 0Þ:

This is a tangible example of (20.72). Then, we have

ff e e ð1Þ 1
1 =r f 0 =r f 1 =r D U V ¼ ðx=r y=r z=r ÞR ¼ ð0 0 1Þ: ð20:85Þ

For the confirmation of (20.85), use the spherical coordinate representation (3.17)
to denote x, y, and z in the above equation. This seems to be merely a confirmation of
the unitarity of representation matrices; see (20.72). Nonetheless, we emphasize that
the simple relation (20.85) holds only with a special case where the parameters α and
β in D(1) (or R ) are identical with the angular components of spherical coordinates
associated with the Cartesian coordinates x, y, and z. Equation (20.85), however,
does not hold in a general case where the parameters α and β are different from those
angular components. In other words, (20.68) that describes the special case where
Q  R(α, β, γ) leads to (20.85); see Fig. 20.2. Equation (20.66), on the other hand, is
used for the general case where Q represents an arbitrary transformation in ℝ3.
Regarding further detailed discussion on the topics, readers are referred to the
literature [1, 2, 5, 6].

20.2.4 Irreducible Representations of SU(2) and SO(3)

In this section we further explore characteristics of the representation of SU(2) and


SO(3) and show the irreducibility of the representation matrices of SU(2) and SO(3).
ð jÞ
To prove the irreducibility of the SU(2) representation matrices Dm0 m , we use
Schur’s lemmas already explained in Sect. 18.3. For this purpose we use special
types of matrices of [5]. In (20.54) we consider a special case where m0 ¼ j. Then, the
factor ( j  m0  μ) ! ¼ (μ)! in the denominator survives only when μ ¼ 0. Hence,
sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi   jþm   jm
ð jÞ jm ð2jÞ! iðαjþγmÞ β β
Djm ðα, β, γ Þ ¼ ð1Þ e cos sin :
ð j þ mÞ!ð j  mÞ! 2 2
ð20:86Þ

ð jÞ
Since in (20.86) the exponential term never vanishes, Djm ðα, β, γ Þ does not vanish
either except for special values of β.
ð jÞ
Next, we consider Dm0 m ðα, 0, γ Þ. For this term not to vanish, the power of sin β2 in
(20.54) must be zero; i.e., we have

m0  m þ 2μ ¼ 0: ð20:87Þ
824 20 Theory of Continuous Groups

Meanwhile, no condition is imposed on the power of cos β2 in (20.54); note that


 2jþmm0 2μ
cos 02 ¼ 1. In the denominator of (20.54), to avoid a negative number in
factorials we must have m0  m + μ 0 and μ 0. If μ > 0,
m0  m + 2μ ¼ (m0  m + μ) + μ > 0 as well. Therefore, for (20.87) to hold, we
must have μ ¼ 0. From (20.87), in turn, this means m0  m ¼ 0, i.e., m0 ¼ m. Then,
we have a greatly simplified equation

ð jÞ 0
Dm0 m ðα, 0, γ Þ ¼ δm0 m eiðαm þγmÞ : ð20:88Þ

ð jÞ
This implies that Dm0 m ðα, 0, γ Þ is diagonal. Suppose that we have a matrix
ð jÞ
A ¼ ðAm0 m Þ. If A commutes with Dm0 m ðα, 0, γ Þ, it must be diagonal unless
ð jÞ
Dm0 m ðα, 0, γ Þ ¼ cδm0 m, where c is a constant independent of m0 and m. But,
ð jÞ
Dm0 m ðα, 0, γ Þ 6¼ cδm0 m because of (20.88). That is, A should have a form Am0 m ¼
ð jÞ
am δm0 m . If A commutes with Dm0 m ðα, β, γ Þ, then taking the ( j, m) component of the
product we have
X ð jÞ
X ð jÞ ð jÞ
ADð jÞ ¼ A D ¼
k jk km
aδ D
k k jk km
¼ aj Djm : ð20:89Þ
jm

Meanwhile,
X ð jÞ
X ð jÞ ð jÞ
Dð jÞ A ¼ k
Djk Akm ¼ k
Djk am δkm ¼ am Djm : ð20:90Þ
jm

From (20.89) and (20.90), we have

ð jÞ 
Djm ðα, β, γ Þ aj  am ¼ 0:

ð jÞ
As already mentioned above, the matrix elements Djm ðα, β, γ Þ of (20.86) do not
ð jÞ
vanish except for special values of β. Therefore, for A to commute with Dm0 m ðα, β, γ Þ,
we must have aj ¼ am  a for all m. This implies

Am0 m ¼ aδm0 m : ð20:91Þ

ð jÞ
Thus, from Schur’s Second Lemma Dm0 m ðα, β, γ Þ is irreducible.
We must be careful, however, about the application of Schur’s Second Lemma.
This is because Schur’s Second Lemma described in Sect. 18.3 says the
following:
A representation D is irreducible ⟹ The matrix M that is commutative with all
D(g) is limited to a form of M ¼ cE.
20.2 Rotation Groups: SU(2) and SO(3) 825

Nonetheless, we are uncertain of truth or falsehood of the converse proposition.


Fortunately, however, the converse proposition is true if the representation is unitary.
We prove the converse proposition by its contraposition such that.
A representation D is not irreducible (i.e., reducible) ⟹ Of the matrices that are
commutative with all D(g), we can find at least one matrix M of the type other than
M ¼ cE.
In this context, Theorem 18.2 (Sect. 18.2) as well as (18.38) teaches us that the
unitary representation D(g) is completely reducible so that we can describe it as a
direct sum such that

DðgÞ ¼ Dð1Þ ðgÞ Dð2Þ ðgÞ  DðωÞ ðgÞ, ð20:92Þ

where g is any group element. Then, we can choose a following matrix A for a linear
transformation that is commutative with D(g):

A ¼ αð1Þ E 1 αð2Þ E 2  αðωÞ E ω , ð20:93Þ

where α(1),   , α(ω) are complex constants that may be different from one another;
E1,   , Eω are unit matrices having the same dimension as D(1)(g),   , D(ω)(g),
respectively. Then, even though A is not a type of A ¼ αδm0 m, A commutes with D(g)
for any g.
Here we rephrase Schur’s Second Lemma as the following theorem.
Theorem 20.2 Let D be a unitary representation of a group G. Suppose that with
8
g 2 G we have

DðgÞM ¼ MDðgÞ: ð18:48Þ

A necessary and sufficient condition for the said unitary representation to be


irreducible is that linear transformations that are commutative with D(g) (8g 2 G)
are limited to those of a form described by A ¼ αδm0 m , where α is a (complex)
constant.
Next, we examine the irreducibility of the SO(3) representation matrices.
We develop the discussion in conjunction with explanation about the important
properties of the representation matrices of SU(2) as well as SO(3). We rewrite
(20.54) when j ¼ l (l : zero or positive integers) to relate it to the rotation in ℝ3.

h i pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
X ð1Þm0 mþμ ðl þ mÞ!ðl  mÞ!ðl þ m0 Þ!ðl  m0 Þ!
ðlÞ
D ðα, β, γ Þ ¼
m0 m μ ðl þ m  μÞ!μ!ðm0  m þ μÞ!ðl  m0  μÞ!
 2lþmm0 2μ  m0 mþ2μ
0 β β
 eiðαm þγmÞ cos sin :
2 2
ð20:94Þ
826 20 Theory of Continuous Groups

f3 (appearing in
Meanwhile, as mentioned in Sect. 20.2.3 the transformation R
Sect. 17.4.2) described by

f
R3 ¼ Rzα R0 y0 β R00 z00 γ

may be regarded as R(α, β, γ) of (20.63). Consequently, the representation matrix


D(l )(α, β, γ) is described by

ðlÞ 
DðlÞ ðα, β, γ Þ  DðlÞ f
R3 ¼ DðαlÞ ðRzα ÞDβ R0y0 β DðγlÞ R00 z00 γ : ð20:95Þ

Equation (20.95) is based upon the definition of representation (18.2). This


ð12Þ
notation is in parallel with (20.45) in which Dα,β,γ is described by the product of
three exponential functions of matrices each of which was associated with the
rotation characterized by Euler angles of (17.101). Denoting DðαlÞ ðRzα Þ ¼
ðlÞ 
DðlÞ ðα, 0, 0Þ, Dβ R0y0 β ¼ DðlÞ ð0, β, 0Þ, and DðγlÞ R00 z00 γ ¼ DðlÞ ð0, 0, γ Þ, we have

DðlÞ ðα, β, γ Þ ¼ DðlÞ ðα, 0, 0ÞDðlÞ ð0, β, 0ÞDðlÞ ð0, 0, γ Þ, ð20:96Þ

where with each representation two out of three Euler angles are zero.
In light of (20.94), we estimate each factor of (20.96). By the same token as
before, putting β ¼ γ ¼ 0 in (20.94) let us examine on what conditions
 m0 mþ2μ
sin 02 survives. For this factor to survive, we must have m0  m + 2μ ¼ 0
as in (20.87). Then, following the argument as before, we should have μ ¼ 0 and
m0 ¼ m. Thus, D(l )(α, 0, 0) must be a diagonal matrix. This is the case with
D(l )(0, 0, γ) as well.
Equation (3.216) implies that the functions Y m l ðβ, αÞ ðm ¼ l,   , 0,   , lÞ are
eigenfunctions with regard to the rotation about the z-axis. In other words, we have

imα m
l ðθ, ϕÞ ¼ Y l ðθ, ϕ  αÞ ¼ e
Rzα Y m Y l ðθ, ϕÞ:
m

l ðθ, ϕÞ can be described as


This is because from (3.216) Y m

l ðθ, ϕÞ ¼ F ðθ Þe
Ym imϕ
,

where F(θ) is a function of θ. Hence, we have

imðϕαÞ
l ðθ, ϕ  αÞ ¼ F ðθ Þe
Ym ¼ eimα F ðθÞeimϕ ¼ eimα Y m
l ðθ, ϕÞ:

Therefore, with a full matrix representation we get


20.2 Rotation Groups: SU(2) and SO(3) 827


Y l lþ1
l Yl   Y 0l   Y ll Rzα
0 1
eiðlÞα
B C
B C
 l lþ1 B
l B
eiðlþ1Þα C
¼ Y l Y l   Y l   Y l B
0 C
C
B ⋱ C
@ A
eilα ð20:97Þ
0 ilα
1
e
B C
B C
 l lþ1 B eiðl1Þα C
¼ Y l Y l   Y 0l   Y ll B
B
C:
C
B ⋱ C
@ A
eilα

With a shorthand notation, we have

DðlÞ ðα, 0, 0Þ ¼ eimα δm0 m ðm ¼ l,   , lÞ: ð20:98Þ

From (20.96), with the (m0, m) matrix elements of D(l )(α, β, γ) we get
h i X h h h i
DðlÞ ðα, β, γ Þ ¼ D ðlÞ
ðα, 0, 0 Þ 0
ms D ðlÞ
ð 0, β, 0 Þst D ðlÞ
ð 0, 0, γ Þ
m0 m s,t
X h i tm
isα ðlÞ imγ
¼ s,t
e δm0 s D ð0, β, 0Þ e δtm ð20:99Þ
0
h i st

¼ eim α DðlÞ ð0, β, 0Þ 0 eimγ


mm

Meanwhile, from (20.94) we have

h i pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
X ð1Þm0 mþμ ðl þ mÞ!ðl  mÞ!ðl þ m0 Þ!ðl  m0 Þ!
ðlÞ
D ð0, β, 0Þ ¼
m0 m μ ðl þ m  μÞ!μ!ðm0  m þ μÞ!ðl  m0  μÞ!
 2lþmm0 2μ  m0 mþ2μ:
β β
cos sin ð20:100Þ
2 2

Thus, we find that [DðlÞ ðα, β, γ Þm0 m of (20.99) has been factorized into three
factors. Equation (20.94) has such an implication. The same characteristic is shared
with the SU(2) representation matrices (20.54) more generally. This can readily be
understood from the aforementioned discussion.
Taking D(1) of (20.74) as an example, we have
828 20 Theory of Continuous Groups

Dð1Þ ðα, β, γ Þ ¼ Dð1Þ ðα, 0, 0ÞDð1Þ ð0, β, 0ÞDð1Þ ð0, 0, γ Þ


0 β 1
2β 1
pffiffiffi sin β
0 iα 1B cos 2 sin 2
2 2 C0 iγ 1
e 0 0 B C e 0 0
B CB CB C
¼B 0 1 0 CBB  p
1
ffiffi
ffi sin β cos β p
1
ffiffi
ffi sin β CB
C@ 0 1 0 C
@ AB 2 2 C A
B C
0 0 eiα @ β 1 β A 0 0 e iγ
sin 2  pffiffiffi sin β cos2
2 2 2
0 β 1 β 1
eiðαþγÞ cos 2 pffiffiffi eiα sin β eiðαγÞ sin 2
B 2 2 2 C
B C
B 1 C
B  pffiffiffi eiγ sin β cos β pffiffiffi e sin β C
1 iγ
¼B C:
B 2 2 C
B C
@ βA
iðγαÞ 2β 1 iα i ðαþγ Þ
e sin  pffiffiffi e sin β e cos 2
2 2 2
ð20:74Þ

In this way, we have reproduced the result of (20.74).


Once D(l )(α, β, γ) has been factorized as in (20.94), we can readily examine
whether the representation is irreducible. To know what kinds of matrices commute
with D(l )(α, β, γ), it suffices to examine whether the matrices commute with individ-
ual factors. The argument is due to Hamermesh [5].
(i) Let A be a matrix having a dimension the same as that of D(l )(α, β, γ), namely
A is a (2l + 1, 2l + 1) square matrix with a form of A ¼ (aij). First, we examine the
commutativity of D(l )(α, 0, 0) and A. With the matrix elements we have
h i X h i X
ADðlÞ ðα, 0, 0Þ ¼ a m0k D
ðlÞ
ðα, 0, 0 Þ ¼ am0 k eimα δkm
m0 m k km
k

¼ am0 m eimα , ð20:101Þ

where the second equality comes from (20.98). Also, we have


h i Xh i 0
DðlÞ ðα, 0, 0ÞA ¼ DðlÞ ðα, 0, 0Þ akm ¼ am0 m eim α : ð20:102Þ
m0 m k m0 k

From (20.101) and (20.102), for a condition of commutativity we have

0
am0 m eimα  eim α ¼ 0:

If m0 6¼ m, we should have am0 m ¼ 0. If m0 ¼ m, am0 m does not have to vanish,


but may take an arbitrary complex number. Hence, A is a diagonal matrix,
namely
20.2 Rotation Groups: SU(2) and SO(3) 829

A ¼ am δ m 0 m , ð20:103Þ

where am is an arbitrary complex number.


(ii) Next we examine the commutativity of D(l )(0, β, 0) and A of (20.103). Using
(20.66), we have
X
l ðβ, αÞ ¼
QY m Y n ðβ, αÞDðnm
n l

ðQÞ: ð20:66Þ

Choosing R(0, θ, 0) for Q and putting α ¼ 0 in (20.66), we have


X
Rð0, θ, 0ÞY m
l ðβ, 0Þ ¼ Y n ðβ, 0ÞDðnm
n l

ð0, θ, 0Þ: ð20:104Þ

Meanwhile, from (20.69) we get


h i
1
Rð0, θ, 0ÞY m
l ð β, 0 Þ ¼ Y m
l Rð 0, θ, 0 Þ ð β, 0 Þ ¼ Ym
l ½Rð0, θ, 0Þðβ, 0Þ

l ðβ  θ, 0Þ:
¼ Ym ð20:105Þ

Combining (20.104) and (20.105) we get


X
l ðβ  θ, 0Þ ¼
Ym Y n ðβ, 0ÞDðnm
n l

ð0, θ, 0Þ: ð20:106Þ

Further putting β ¼ 0 in (20.106), we have


X
l ðθ, 0Þ ¼
Ym Y n ð0, 0ÞDðnm
n l

ð0, θ, 0Þ: ð20:107Þ

Taking account of (20.71), RHS does not vanish in (20.107) only when
n ¼ 0. On this condition, we obtain

ðlÞ
l ðθ, 0Þ ¼ Y l ð0, 0ÞD0m ð0, θ, 0Þ:
Ym ð20:108Þ
0

Meanwhile, from (20.71) with m ¼ 0, we have


rffiffiffiffiffiffiffiffiffiffiffiffi
2l þ 1
Y 0l ð0, 0Þ ¼ : ð20:109Þ

Inserting this into (20.108), we have


rffiffiffiffiffiffiffiffiffiffiffiffi
2l þ 1 ðlÞ
l ðθ, 0Þ
Ym ¼
4π 0m
D ð0, θ, 0Þ: ð20:110Þ

Replacing θ with θ in (20.110), we get


830 20 Theory of Continuous Groups

rffiffiffiffiffiffiffiffiffiffiffiffi
ðlÞ 4π m
D0m ð0, θ, 0Þ ¼ Y ðθ, 0Þ: ð20:111Þ
2l þ 1 l

Compare (20.111) with (20.61) and confirm (20.111) using (20.74) with
l ¼ 1. Viewing (20.111) as a “row vector,” (20.111) with l ¼ 1 can be expressed
as

ð1Þ ð1Þ ð1Þ


D0,1 ð0, θ, 0Þ D0,0 ð0, θ, 0Þ D0,1 ð0, θ, 0Þ
rffiffiffiffiffi  
4π  1 1 1
¼ Y ðθ, 0Þ Y l ðθ, 0Þ Y l ðθ, 0Þ ¼ pffiffiffi sin θ cos θ  pffiffiffi sin θ :
0 1
3 l 2 2

Getting back to our topic, with the matrix elements we have


h i X h i h i
ADðlÞ ð0, θ, 0Þ ¼ k
a k δ 0k D ðlÞ
ð 0, θ, 0 Þ ¼ a0 DðlÞ ð0, θ, 0Þ
0m km 0m

and
h i Xh i h i
DðlÞ ð0, θ, 0ÞA ¼ k
DðlÞ ð0, θ, 0Þ am δkm ¼ DðlÞ ð0, θ, 0Þ am ,
0m 0k 0m

where we assume A ¼ am δm0 m in (20.103). Since from (20.108)


[D (0, θ, 0)]0m does not vanish other than specific values of θ, for A and
(l )

D(l )(0, θ, 0) to commute we must have

am ¼ a0

for all m ¼  l,  l + 1,   , 0,   , l  1, l. Then, from (20.103) we get

A ¼ a0 δm0 m , ð20:112Þ

where a0 is an arbitrary complex number. Thus, a matrix that commutes with


both D(l )(α, 0, 0) and D(l )(0, β, 0) must be of a form of (20.112). The same
discussion holds with D(l )(α, β, γ) of (20.94) as a whole with regard to the
commutativity. On the basis of Schur’s Second Lemma (Sect. 18.3), we con-
clude that the representation D(l )(α, β, γ) is again irreducible. This is one of the
prominent features of SO(3) as well as SU(2).
As can be seen clearly in (20.97), the dimension of representation matrix
D(l )(α, β, γ) is 2l + 1. Suppose that there is another representation matrix
0 0
Dðl Þ ðα, β, γ Þ ðl0 6¼ lÞ. Then, D(l )(α, β, γ) and Dðl Þ ðα, β, γ Þ are inequivalent (see
Sect. 18.3). Meanwhile, the unitary representation of SO(3) is completely
reducible and, hence, any reducible unitary representation D can be described by
20.2 Rotation Groups: SU(2) and SO(3) 831

X
D¼ a DðlÞ ðα, β, γ Þ,
l l
ð20:113Þ

where al is zero or a positive integer. If al 2, this means that the same


representations D(l )(α, β, γ) repeatedly appear. Equation (20.113) provides a
tangible example of (20.92) expressed as

DðgÞ ¼ Dð1Þ ðgÞ Dð2Þ ðgÞ  DðωÞ ðgÞ ð20:92Þ

and applies to the representation matrices of SU(2) more generally. We will


encounter related discussion in Sect. 20.3 in connection with the direct-product
representation.

20.2.5 Parameter Space of SO(3)

As already discussed in Sect. 17.4.2, we need three angles α, β, and γ to specify the
rotation in ℝ3. Their domains are usually taken as follows:

0 α 2π, 0 β π, 0 γ 2π:

The domains defined above are referred to as a parameter space.


Yet, there are different implications depending upon choosing different coordinate
systems. The first choice is a moving coordinate system where α, β, and γ are Euler
angles (see Sect. 17.4.2). The second choice is a fixed coordinate system where α is an
azimuthal angle, β is a zenithal angle, and γ is a rotation angle. Notice that in the
former case α, β, and γ represent equivalent rotation angles. In the latter case, however,
although α and β define the orientation of a rotation axis, γ is designated as a rotation
angle. In the present section we further discuss characteristics of the parameter space.
First, we study several characteristics of (3, 3) real orthogonal matrices. (i) The (3, 3)
real orthogonal matrices have eigenvalues 1, eiγ , and eiγ (0 γ π). This is a direct
consequence of (17.77) and invariance of the characteristic equation of the matrix. The
rotation axis is associated with an eigenvector that belongs to the eigenvalue 1.
Let A ¼ (aij) (1 i, j 3) be a (3, 3) real orthogonal matrix. Let u be an
eigenvector belonging to the eigenvalue 1. We suppose that when we are thinking of
the rotation on the fixed coordinate system, u is given by a column vector such that
0 1
u1
B C
u ¼ @ u2 A :
u3

Then, we have
832 20 Theory of Continuous Groups

A u = 1u = u: ð20:114Þ

Operating AT on both sides of (20.114), we get

AT A u = Eu = u = AT u, ð20:115Þ

where we used the property of an orthogonal matrix; i.e., ATA ¼ E. From (20.114)
and (20.115), we have

A  AT u ¼ 0: ð20:116Þ

Writing (20.116) in matrix components, we have [5].


20 1 0 13 0 1
a11 a12 a13 a11 a21 a31 u1
6B C B C7 B C
4@ a21 a22 a23 A  @ a12 a22 a32 A5@ u2 A ¼ 0: ð20:117Þ
a31 a32 a33 a13 a23 a33 u3

That is,

ða12  a21 Þu2 þ ða13  a31 Þu3 ¼ 0,


ða21  a12 Þu1 þ ða23  a32 Þu3 ¼ 0,
ða31  a13 Þu1 þ ða32  a23 Þu2 ¼ 0: ð20:118Þ

Solving (20.118), we get

u1 : u2 : u3 ¼ ða32  a23 Þ : ða13  a31 Þ : ða21  a12 Þ: ð20:119Þ


1 0
u1
B C
If we normalize u, i.e., u12 + u22 + u32 ¼ 1, @ u2 A gives direction cosines of the
u3
eigenvector, namely the rotation axis. Equation (20.119) applies to any orthogonal
matrix. A simple example is (20.3); a more complicated example can be seen in
(17.107). Confirmation is left for readers. We use this property soon below.
Let us consider infinitesimal transformations of rotation. As already implied in
(20.5), an infinitesimal rotation θ around the z-axis is described by
0 1
1 θ 0
B C
Rzθ ¼ @ θ 1 0 A: ð20:120Þ
0 0 1

Now, we newly introduce infinitesimal rotation operators as below:


20.2 Rotation Groups: SU(2) and SO(3) 833

0 1 0 1 0 1
1 0 0 1 0 η 1 ζ 0
B C B C B C
Rxξ ¼ @ 0 1 ξ A, Ryη ¼ @ 0 1 0 A, Rzζ ¼ @ ζ 1 0 A: ð20:121Þ
0 ξ 1 η 0 1 0 0 1

These operators represent infinitesimal rotations ξ, η, and ζ around the x-, y-, and
z-axes, respectively (see Fig. 20.3). Note that these operators commute one another
to the first order of infinitesimal quantities ξ, η, and ζ. That is, we have

Rxξ Ryη ¼ Ryη Rxξ , Ryη Rzζ ¼ Rzζ Ryη , Rxξ Ryη Rzζ ¼ Rzζ Ryη Rxξ , etc:

with, e.g.,
0 1
1 ζ η
B C
Rxξ Ryη Rzζ ¼ @ ζ 1 ξ A ð20:122Þ
η ξ 1

to the first order. Notice that the order of operations Rxξ, Ryη, and Rzζ is disregarded.
Next, let us consider a successive transformation with a finite rotation angle ω that
follows the infinitesimal rotations. Readers may well ask why and for what purpose
we need to make such elaborate calculations. It is partly because when we were
dealing with finite groups in the previous chapters, group elements were finite and
relevant calculations were straightforward accordingly. But, now we are thinking of
continuous groups that possess an infinite number of group elements and, hence, we
have to consider a “density” of group elements. In this respect, we are now thinking
of how the density of group elements in the parameter space is changed according to

Fig. 20.3 Infinitesimal rotations ξ, η, and ζ around the x-, y-, and z-axes, respectively

the rotation of a finite angle ω. The results are used for various group calculations including the orthogonalization of characters.
Getting back to our subject, we further operate a finite rotation of an angle ω around the z-axis subsequent to the aforementioned infinitesimal rotations. The discussion developed below is due to Hamermesh [5]. We denote this rotation by R_ω. Since we are dealing with a spherically symmetric system, any rotation axis can equivalently be chosen without loss of generality. Notice also that rotations of the same rotation angle belong to the same conjugacy class [see (17.106) and Fig. 17.18]. Defining R_{xξ}R_{yη}R_{zζ} ≡ R_Δ from (20.122), the rotation R that combines R_Δ and the subsequently occurring R_ω is described by
\[
\begin{aligned}
R = R_\Delta R_\omega &= \begin{pmatrix} 1 & -\zeta & \eta \\ \zeta & 1 & -\xi \\ -\eta & \xi & 1 \end{pmatrix}\begin{pmatrix} \cos\omega & -\sin\omega & 0 \\ \sin\omega & \cos\omega & 0 \\ 0 & 0 & 1 \end{pmatrix} \\
&= \begin{pmatrix} \cos\omega - \zeta\sin\omega & -\sin\omega - \zeta\cos\omega & \eta \\ \sin\omega + \zeta\cos\omega & \cos\omega - \zeta\sin\omega & -\xi \\ \xi\sin\omega - \eta\cos\omega & \xi\cos\omega + \eta\sin\omega & 1 \end{pmatrix}. \tag{20.123}
\end{aligned}
\]

Hence, we have
\[
\begin{aligned}
R_{32} - R_{23} &= \xi\cos\omega + \eta\sin\omega - (-\xi) = \xi(1 + \cos\omega) + \eta\sin\omega, \\
R_{13} - R_{31} &= \eta - (\xi\sin\omega - \eta\cos\omega) = \eta(1 + \cos\omega) - \xi\sin\omega, \\
R_{21} - R_{12} &= \sin\omega + \zeta\cos\omega - (-\sin\omega - \zeta\cos\omega) = 2(\sin\omega + \zeta\cos\omega).
\end{aligned} \tag{20.124}
\]

Since (20.124) gives a relative directional ratio of the rotation axis, we should normalize a vector whose components are given by (20.124) to get the direction cosines of the rotation axis. Note that the direction of the rotation axis related to R of (20.123) should be close to that of R_ω and that only R_{21} − R_{12} in (20.124) has a term (i.e., 2 sin ω) free of the infinitesimal quantities ξ, η, ζ. Therefore, it suffices to normalize by R_{21} − R_{12} to seek the direction cosines to the first order. Hence, dividing (20.124) by 2 sin ω, as components of the direction cosines we get
\[
\frac{\xi(1+\cos\omega)}{2\sin\omega} + \frac{\eta}{2}, \qquad \frac{\eta(1+\cos\omega)}{2\sin\omega} - \frac{\xi}{2}, \qquad 1. \tag{20.125}
\]

Meanwhile, combining (17.78) and (20.123), we can find a rotation angle ω′ for R in (20.123). The trace χ of (20.123) is written as

\[
\chi = 1 + 2(\cos\omega - \zeta\sin\omega). \tag{20.126}
\]

From (17.78) we have
\[
\chi = 1 + 2\cos\omega'. \tag{20.127}
\]
Equating (20.126) and (20.127), we obtain
\[
\cos\omega' = \cos\omega - \zeta\sin\omega. \tag{20.128}
\]
Approximating (20.128), we get
\[
1 - \frac{1}{2}\omega'^2 \approx 1 - \frac{1}{2}\omega^2 - \zeta\omega. \tag{20.129}
\]
From (20.129), we have the following approximate expression:
\[
(\omega' + \omega)(\omega' - \omega) \approx 2\omega(\omega' - \omega) \approx 2\zeta\omega.
\]
Hence, we obtain
\[
\omega' \approx \omega + \zeta. \tag{20.130}
\]
Combining (20.125) and (20.130), we get their product to the first order such that
\[
\omega\left[\frac{\xi(1+\cos\omega)}{2\sin\omega} + \frac{\eta}{2}\right], \qquad \omega\left[\frac{\eta(1+\cos\omega)}{2\sin\omega} - \frac{\xi}{2}\right], \qquad \omega + \zeta. \tag{20.131}
\]

Since these quantities are products of the individual direction cosines and the rotation angle ω + ζ, we introduce variables e_x, e_y, and e_z as the x-, y-, and z-related quantities, respectively. Namely, we have
\[
\begin{aligned}
e_x &= \omega\left[\frac{\xi(1+\cos\omega)}{2\sin\omega} + \frac{\eta}{2}\right], \\
e_y &= \omega\left[\frac{\eta(1+\cos\omega)}{2\sin\omega} - \frac{\xi}{2}\right], \\
e_z &= \omega + \zeta.
\end{aligned} \tag{20.132}
\]

To evaluate how the density of group elements is changed as a function of the rotation angle ω, we are interested in the variations of e_x, e_y, and e_z that depend on ξ, η, and ζ. To this end, we calculate the Jacobian J as

\[
J = \frac{\partial(e_x, e_y, e_z)}{\partial(\xi, \eta, \zeta)} =
\begin{vmatrix}
\dfrac{\partial e_x}{\partial \xi} & \dfrac{\partial e_x}{\partial \eta} & \dfrac{\partial e_x}{\partial \zeta} \\[1ex]
\dfrac{\partial e_y}{\partial \xi} & \dfrac{\partial e_y}{\partial \eta} & \dfrac{\partial e_y}{\partial \zeta} \\[1ex]
\dfrac{\partial e_z}{\partial \xi} & \dfrac{\partial e_z}{\partial \eta} & \dfrac{\partial e_z}{\partial \zeta}
\end{vmatrix} =
\begin{vmatrix}
\dfrac{\omega(1+\cos\omega)}{2\sin\omega} & \dfrac{\omega}{2} & 0 \\[1ex]
-\dfrac{\omega}{2} & \dfrac{\omega(1+\cos\omega)}{2\sin\omega} & 0 \\[1ex]
0 & 0 & 1
\end{vmatrix}
= \frac{\omega^2(1+\cos\omega)^2}{4\sin^2\omega} + \frac{\omega^2}{4} = \frac{\omega^2}{4\sin^2\frac{\omega}{2}}, \tag{20.133}
\]

where we used formulae of trigonometric functions with the last equality. Note that (20.133) does not depend on the sign of ω. This is because J represents the relative volume density ratio between the two parameter spaces of the e_x e_y e_z-system and the ξηζ-system, and this ratio is solely determined by the modulus of the rotation angle.
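The determinant manipulation and the trigonometric reduction in (20.133) can be verified symbolically. The following short sketch, assuming the SymPy library is available, rebuilds the matrix of partial derivatives and simplifies its determinant:

import sympy as sp

w = sp.symbols('omega', positive=True)
c = w * (1 + sp.cos(w)) / (2 * sp.sin(w))   # the diagonal entry in (20.133)
M = sp.Matrix([[c, w / 2, 0],
               [-w / 2, c, 0],
               [0, 0, 1]])                  # matrix of partial derivatives
J = sp.simplify(M.det())
print(sp.simplify(J - w**2 / (4 * sp.sin(w / 2)**2)))  # should print 0
print(sp.limit(J, w, 0))                               # 1, the omega -> 0 limit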
Taking ω → 0, we have
\[
\lim_{\omega \to 0} J = \lim_{\omega \to 0} \frac{\omega^2}{4\sin^2\frac{\omega}{2}} = \lim_{\omega \to 0} \frac{(\omega^2)'}{\left(4\sin^2\frac{\omega}{2}\right)'} = \lim_{\omega \to 0} \frac{2\omega}{4\sin\frac{\omega}{2}\cos\frac{\omega}{2}} = \lim_{\omega \to 0} \frac{\omega}{\sin\omega} = \lim_{\omega \to 0} \frac{(\omega)'}{(\sin\omega)'} = \lim_{\omega \to 0} \frac{1}{\cos\omega} = 1.
\]

Thus, the relative volume density ratio in the limit of ω → 0 is 1, as expected. Let dV = de_x de_y de_z and dΠ = dξ dη dζ be the volume elements of each coordinate system. Then we have
\[
dV = J\, d\Pi. \tag{20.134}
\]
We assume that
\[
J = \frac{dV}{d\Pi} = \frac{\rho_{\xi\eta\zeta}}{\rho_{e_x e_y e_z}},
\]

where ρ_{e_x e_y e_z} and ρ_{ξηζ} are the "number densities" of group elements in each coordinate system. We suppose that the infinitesimal rotations in the ξηζ-coordinates are converted by a finite rotation R_ω of (20.123) into the e_x e_y e_z-coordinates. In this way, J can be viewed as an "expansion" coefficient as a function of the rotation angle |ω|. Hence, we have
\[
dV = \frac{\rho_{\xi\eta\zeta}}{\rho_{e_x e_y e_z}}\, d\Pi \quad \text{or} \quad \rho_{e_x e_y e_z}\, dV = \rho_{\xi\eta\zeta}\, d\Pi. \tag{20.135}
\]

Equation (20.135) implies that the total (infinite) number of group elements contained in the group SO(3) is invariant with respect to the transformation.
Let us calculate the total volume of SO(3). This can be measured such that
\[
\int d\Pi = \int \frac{1}{J}\, dV. \tag{20.136}
\]

Converting dV to the polar coordinate representation, we get
\[
\Pi[\mathrm{SO}(3)] = \int d\Pi = \int_0^\pi d\tilde{\omega} \int_0^\pi d\theta \int_0^{2\pi} d\phi\, \frac{4\sin^2\frac{\tilde{\omega}}{2}}{\tilde{\omega}^2}\, \tilde{\omega}^2 \sin\theta = 8\pi^2, \tag{20.137}
\]
where we denote the total volume of SO(3) by Π[SO(3)]. Note that ω in (20.133) has been replaced with ω̃ ≡ |ω| so that the radial coordinate can be positive. As already noted, (20.133) does not depend on the sign of ω. In other words, among the angles α, β, and ω (usually γ is used instead of ω; see Sect. 17.4.2), ω is taken as −π ≤ ω < π (instead of 0 ≤ ω < 2π) so that it can be conformed to the radial coordinate.
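A crude numerical quadrature confirms the value 8π² of (20.137); the sketch below assumes NumPy, and the helper trap is our own:

import numpy as np

def trap(y, x):
    """Composite trapezoidal rule for samples y on the grid x."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

# Integrand of (20.137): the weight 4 sin^2(w/2) sin(theta) in polar form
w = np.linspace(0.0, np.pi, 2001)       # omega-tilde = |omega|
t = np.linspace(0.0, np.pi, 2001)       # zenithal angle theta
volume = trap(4.0 * np.sin(w / 2.0)**2, w) * trap(np.sin(t), t) * 2.0 * np.pi
print(volume, 8.0 * np.pi**2)           # both ~ 78.9568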
We utilize (20.137) for the calculation of various functions. Let f(ϕ, θ, ω) be an arbitrary function on the parameter space. We can readily estimate a "mean value" of f(ϕ, θ, ω) on that space. That is, we have
\[
\overline{f(\phi,\theta,\omega)} \equiv \frac{\int f(\phi,\theta,\omega)\, d\Pi}{\int d\Pi}, \tag{20.138}
\]
where \(\overline{f(\phi,\theta,\omega)}\) represents the mean value of the function and is given by
\[
\overline{f(\phi,\theta,\omega)} = \left[ 4 \int_0^\pi d\tilde{\omega} \int_0^\pi d\theta \int_0^{2\pi} d\phi\, f(\phi,\theta,\omega) \sin^2\frac{\tilde{\omega}}{2} \sin\theta \right] \Big/ \int d\Pi. \tag{20.139}
\]
There would be some inconvenience in using (20.139) because of the mixture of the variables ω and ω̃. Yet, this causes no problem if f(ϕ, θ, ω) is an even function with respect to ω (see Sect. 20.2.6).

20.2.6 Irreducible Characters of SO(3) and Their Orthogonality

To evaluate (20.139), let us get back to the irreducible representations D^{(l)}(α, β, γ) of Sect. 20.2.3. Also, we recall how successive coordinate transformations produce changes in the orthogonal matrices of transformation (Sect. 17.4.2). Since the parameters α, β, and γ uniquely specify an individual rotation, different sets of α, β, and γ (in this section we use ω instead of γ) cause different irreducible

Fig. 20.4 Geometrical arrangement of two rotation axes A and A′ accompanied by the same rotation angle ω
representations. A corresponding rotation matrix is given by (17.101). In fact, the said matrix is a representation of the rotation. Keeping these points in mind, we discuss irreducible representations and their characters, particularly in relation to the orthogonality of the irreducible characters.
Figure 20.4 depicts a geometrical arrangement of two rotation axes accompanied by the same rotation angle ω. Let R_ω and R′_ω be two such rotations around the rotation axes A and A′, respectively. Let Q be another rotation that transforms the rotation axis A to A′. Then we have [1, 2]
\[
R'_\omega = Q R_\omega Q^{-1}. \tag{20.140}
\]

This implies that R_ω and R′_ω belong to the same conjugacy class. To consider the situation more clearly, let us view the two rotations R_ω and R′_ω from two different coordinate systems, say some xyz-coordinate system and another x′y′z′-coordinate system. Suppose also that A coincides with the z-axis and that A′ coincides with the z′-axis. Meanwhile, if we describe R_ω in reference to the xyz-system and R′_ω in reference to the x′y′z′-system, the representation of these rotations must be identical in reference to the individual coordinate systems. Let this representation matrix be R̄_ω.
(i) If we describe R_ω and R′_ω with respect to the xyz-system, we have
\[
R_\omega = \bar{R}_\omega, \qquad R'_\omega = \left(Q^{-1}\right)^{-1} \bar{R}_\omega Q^{-1} = Q \bar{R}_\omega Q^{-1}. \tag{20.141}
\]
Namely, we reproduce
\[
R'_\omega = Q R_\omega Q^{-1}. \tag{20.140}
\]
(ii) If we describe R_ω and R′_ω with respect to the x′y′z′-system, we have
\[
R'_\omega = \bar{R}_\omega, \qquad R_\omega = Q^{-1} \bar{R}_\omega Q. \tag{20.142}
\]
Hence, again we reproduce (20.140). That is, the relation (20.140) is independent of specific choices of the coordinate system.

Taking the representation indexed by l of Sect. 20.2.4 with respect to (20.140), we have
\[
D^{(l)}\left(R'_\omega\right) = D^{(l)}\left(Q R_\omega Q^{-1}\right) = D^{(l)}(Q)\, D^{(l)}(R_\omega)\, D^{(l)}\left(Q^{-1}\right) = D^{(l)}(Q)\, D^{(l)}(R_\omega)\left[D^{(l)}(Q)\right]^{-1}, \tag{20.143}
\]
where with the last equality we used (18.6). Operating D^{(l)}(Q) on (20.143) from the right, we get
\[
D^{(l)}\left(R'_\omega\right) D^{(l)}(Q) = D^{(l)}(Q)\, D^{(l)}(R_\omega). \tag{20.144}
\]
Since both D^{(l)}(R′_ω) and D^{(l)}(R_ω) are irreducible, from (18.41) of Schur's First Lemma D^{(l)}(R′_ω) and D^{(l)}(R_ω) are equivalent. An infinite number of rotations determined by the orientation of the rotation axis A that is accompanied by an azimuthal angle α (0 ≤ α < 2π) and a zenithal angle β (0 ≤ β < π) (see Fig. 17.18) form a conjugacy class with each specific ω shared. Thus, we have classified the various representations D^{(l)}(R_ω) according to ω and l. Meanwhile, we may identify D^{(l)}(R_ω) with D^{(l)}(α, β, γ) of Sect. 20.2.4.
In Sect. 20.2.2 we know that the spherical surface harmonics Y_l^m(θ, ϕ) (m = −l, ..., 0, ..., l) constitute the basis functions of D^{(l)} and span the representation space. The dimension of the matrix (or the representation space) is 2l + 1. Therefore, D^{(l)} and D^{(l′)} for different l and l′ have a different dimension. From (18.41) of Schur's First Lemma, such D^{(l)} and D^{(l′)} are inequivalent.
Returning to (20.139), let us evaluate a mean value \(\overline{f(\phi,\theta,\omega)}\). Let the traces of D^{(l)}(R′_ω) and D^{(l)}(R_ω) be χ^{(l)}(R′_ω) and χ^{(l)}(R_ω), respectively. Remembering (12.13), the trace is invariant under a similarity transformation, and so χ^{(l)}(R′_ω) = χ^{(l)}(R_ω). Then, we put
\[
\chi^{(l)}(\omega) \equiv \chi^{(l)}(R_\omega) = \chi^{(l)}\left(R'_\omega\right). \tag{20.145}
\]

Consequently, it suffices to evaluate the trace χ^{(l)}(ω) using D^{(l)}(α, 0, 0) of (20.96), whose representation matrix is given by (20.97). We have
\[
\chi^{(l)}(\omega) = \sum_{m=-l}^{l} e^{im\omega} = \frac{e^{-il\omega}\left(1 - e^{i\omega(2l+1)}\right)}{1 - e^{i\omega}} = \frac{e^{-il\omega}\left(1 - e^{-i\omega}\right)\left(1 - e^{i\omega(2l+1)}\right)}{(1 - e^{i\omega})(1 - e^{-i\omega})}, \tag{20.146}
\]
where
\[
[\text{numerator of } (20.146)] = e^{-il\omega} + e^{il\omega} - e^{i\omega(l+1)} - e^{-i\omega(l+1)} = 2[\cos l\omega - \cos(l+1)\omega] = 4\sin\left(l + \frac{1}{2}\right)\omega \sin\frac{\omega}{2}
\]
and
\[
[\text{denominator of } (20.146)] = 2(1 - \cos\omega) = 4\sin^2\frac{\omega}{2}.
\]
Thus, we get
\[
\chi^{(l)}(\omega) = \frac{\sin\left(l + \frac{1}{2}\right)\omega}{\sin\frac{\omega}{2}}. \tag{20.147}
\]
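As a quick sanity check of (20.147), one may compare the geometric-series sum of (20.146) with the closed form; a minimal sketch assuming NumPy:

import numpy as np

def chi_sum(l, w):
    """chi^(l)(omega) computed as the sum of e^{i m omega}, m = -l, ..., l."""
    m = np.arange(-l, l + 1)
    return np.exp(1j * m * w).sum().real

def chi_closed(l, w):
    """The closed form (20.147)."""
    return np.sin((l + 0.5) * w) / np.sin(w / 2.0)

w = 0.9                                        # an arbitrary test angle
for l in range(5):
    print(l, chi_sum(l, w), chi_closed(l, w))  # the two columns coincide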

To adapt (18.76) to the present formulation, we rewrite it as
\[
\frac{1}{n} \sum_g \left[\chi^{(\alpha)}(g)\right]^* \chi^{(\beta)}(g) = \delta_{\alpha\beta}. \tag{20.148}
\]

The summation over the n group elements in (20.148) should be read as an integration in the case of the continuous groups. Instead of a finite number of irreducible representations as in a finite group, we are thinking of an infinite number of irreducible representations with continuous groups. In (20.139) the denominator ∫dΠ (= 8π²) corresponds to n of (20.148). If f(ϕ, θ, ω) or f(α, β, ω) is an even function with respect to ω (−π ≤ ω < π), as in the case of (20.147), the numerator of (20.139) can be expressed in the form
\[
\int f(\alpha,\beta,\omega)\, d\Pi = \int_0^\pi d\omega \int_0^\pi d\beta \int_0^{2\pi} d\alpha\, f(\alpha,\beta,\omega)\, \frac{4\sin^2\frac{\omega}{2}}{\omega^2}\, \omega^2 \sin\beta = 4\int_0^\pi d\omega \int_0^\pi d\beta \int_0^{2\pi} d\alpha\, f(\alpha,\beta,\omega) \sin^2\frac{\omega}{2} \sin\beta. \tag{20.149}
\]

If, moreover, f(α, β, ω) depends only on ω, again as in the case of the character described by (20.147), the calculation is further simplified such that
\[
\int f(\omega)\, d\Pi = 16\pi \int_0^\pi d\omega\, f(\omega) \sin^2\frac{\omega}{2}. \tag{20.150}
\]
Replacing f(ω) with [χ^{(l′)}(ω)]^* χ^{(l)}(ω), we have
\[
\begin{aligned}
\int \left[\chi^{(l')}(\omega)\right]^* \chi^{(l)}(\omega)\, d\Pi &= 16\pi \int_0^\pi d\omega \left[\chi^{(l')}(\omega)\right]^* \chi^{(l)}(\omega) \sin^2\frac{\omega}{2} \\
&= 16\pi \int_0^\pi d\omega\, \frac{\sin\left(l' + \frac{1}{2}\right)\omega\, \sin\left(l + \frac{1}{2}\right)\omega}{\sin\frac{\omega}{2}\,\sin\frac{\omega}{2}} \sin^2\frac{\omega}{2} = 16\pi \int_0^\pi d\omega\, \sin\left(l' + \frac{1}{2}\right)\omega\, \sin\left(l + \frac{1}{2}\right)\omega \\
&= 8\pi \int_0^\pi d\omega\, [\cos(l' - l)\omega - \cos(l' + l + 1)\omega]. \tag{20.151}
\end{aligned}
\]

Since l′ + l + 1 > 0, the integral of the cos(l′ + l + 1)ω term vanishes. Only if l′ − l = 0 does the integral of the cos(l′ − l)ω term not vanish; it then takes a value of π. Therefore, we have
\[
\int \left[\chi^{(l')}(\omega)\right]^* \chi^{(l)}(\omega)\, d\Pi = 8\pi^2 \delta_{l'l}. \tag{20.152}
\]
Finally, with \(\overline{[\chi^{(l')}(\omega)]^* \chi^{(l)}(\omega)}\) defined as in (20.139), we get
\[
\overline{\left[\chi^{(l')}(\omega)\right]^* \chi^{(l)}(\omega)} = \delta_{l'l}. \tag{20.153}
\]

This relation gives the orthogonalization condition in concert with (20.148) obtained in the case of a finite group.
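The orthogonality relation (20.152) can also be checked directly. In the sketch below (assuming SymPy), the integral (20.150) is evaluated with f(ω) = [χ^{(l′)}(ω)]^* χ^{(l)}(ω) and divided by 8π², reproducing δ_{l′l}; note that, as in (20.151), the weight sin²(ω/2) cancels the character denominators, so the integrand reduces to a product of sines:

import sympy as sp

w = sp.symbols('omega')

for lp in range(3):
    for l in range(3):
        integrand = sp.sin((2 * lp + 1) * w / 2) * sp.sin((2 * l + 1) * w / 2)
        val = 16 * sp.pi * sp.integrate(integrand, (w, 0, sp.pi))
        print(lp, l, sp.simplify(val / (8 * sp.pi**2)))   # prints delta_{l' l}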
In Sect. 18.6 we have shown that the number of inequivalent irreducible representations of a finite group is well defined and given by the number of conjugacy classes of that group. This is clearly demonstrated in (18.144). In the case of the continuous groups, however, the situation is somewhat complicated and we are uncertain of the number of the inequivalent irreducible representations. We address this problem as follows: Suppose that Ω(α, β, ω) were an irreducible representation that is inequivalent to every D^{(l)}(α, β, ω) (l = 0, 1, 2, ⋯). Let K(ω) be the character of Ω(α, β, ω). Defining f(ω) ≡ [χ^{(l)}(ω)]^* K(ω), (20.150) reads as
\[
\int \left[\chi^{(l)}(\omega)\right]^* K(\omega)\, d\Pi = 16\pi \int_0^\pi d\omega \left[\chi^{(l)}(\omega)\right]^* K(\omega) \sin^2\frac{\omega}{2}. \tag{20.154}
\]

The orthogonalization condition (20.153) demands that the integral of (20.154) vanish. Similarly, we would have
\[
\int \left[\chi^{(l+1)}(\omega)\right]^* K(\omega)\, d\Pi = 16\pi \int_0^\pi d\omega \left[\chi^{(l+1)}(\omega)\right]^* K(\omega) \sin^2\frac{\omega}{2}, \tag{20.155}
\]
which must vanish as well. Subtracting the RHS of (20.154) from the RHS of (20.155) and taking account of the fact that χ^{(l)}(ω) is real [see (20.147)], we have
\[
\int_0^\pi d\omega \left[\chi^{(l+1)}(\omega) - \chi^{(l)}(\omega)\right] K(\omega) \sin^2\frac{\omega}{2} = 0. \tag{20.156}
\]

Invoking the trigonometric formula
\[
\sin a - \sin b = 2\cos\frac{a+b}{2}\sin\frac{a-b}{2}
\]
and applying it to (20.147), (20.156) can be rewritten as
\[
2\int_0^\pi d\omega\, [\cos(l+1)\omega]\, K(\omega) \sin^2\frac{\omega}{2} = 0 \quad (l = 0, 1, 2, \cdots). \tag{20.157}
\]

Putting l = 0 in the LHS of (20.154) and from the assumption that the integral of (20.154) vanishes, we also have
\[
\int_0^\pi d\omega\, K(\omega) \sin^2\frac{\omega}{2} = 0, \tag{20.158}
\]
where we used χ^{(0)}(ω) = 1. Then, (20.157) and (20.158) are combined to give
\[
\int_0^\pi d\omega\, (\cos l\omega)\, K(\omega) \sin^2\frac{\omega}{2} = 0 \quad (l = 0, 1, 2, \cdots). \tag{20.159}
\]

This implies that all the Fourier cosine coefficients of K(ω) sin²(ω/2) vanish. Considering that cos lω (l = 0, 1, 2, ⋯) forms a complete orthogonal system in the region [0, π], we must have
\[
K(\omega) \sin^2\frac{\omega}{2} \equiv 0. \tag{20.160}
\]

Requiring K(ω) to be continuous, (20.160) is equivalent to K(ω) ≡ 0. This implies that there is no inequivalent irreducible representation other than D^{(l)}(α, β, ω) (l = 0, 1, 2, ⋯). In other words, the representations D^{(l)}(α, β, ω) (l = 0, 1, 2, ⋯) constitute a complete set of irreducible representations. The spherical surface harmonics Y_l^m(θ, ϕ) (m = −l, ..., 0, ..., l) constitute the basis functions of D^{(l)} and span the representation space of D^{(l)}(α, β, ω) accordingly.
In the above discussion, we assumed that K(ω) is an even function with respect to ω, as is the case for χ^{(l)}(ω) in (20.147). Hence, K(ω) sin²(ω/2) is an even function as well. Therefore, we assumed that K(ω) sin²(ω/2) can be expanded into the Fourier cosine series.

20.3 Clebsch–Gordan Coefficients of Rotation Groups

In Sect. 18.8, we studied direct-product representations. We know that even though D^{(α)} and D^{(β)} are both irreducible, D^{(α⊗β)} is not necessarily irreducible. This is a common feature of both finite and infinite groups. Moreover, the reducible unitary representation can be expressed as a direct sum of irreducible representations such that
\[
D^{(\alpha)}(g) \otimes D^{(\beta)}(g) = \sum_\omega q_\omega D^{(\omega)}(g). \tag{18.199}
\]

In the infinite groups, typically the continuous groups, this situation often appears when we deal with the addition of two angular momenta. Examples include the addition of two (or more) orbital angular momenta and that of an orbital angular momentum and a spin angular momentum [6, 8].
Meanwhile, there is a set of basis vectors that spans the representation space with respect to each individual irreducible representation. Here we have the following question: What are the basis vectors that span the total reduced representation space described by (18.199)? The answer to this question is that these vectors must be constructed from the basis vectors relevant to the individual irreducible representations. The Clebsch–Gordan coefficients appear as the coefficients of a linear combination of the basis vectors associated with those irreducible representations.
In a word, to find the Clebsch–Gordan coefficients is equivalent to calculating the proper coefficients of the basis functions that span the representation space of the direct-product groups. The relevant continuous groups in which we are interested are SU(2) and SO(3).

20.3.1 Direct-Product of SU(2) and Clebsch–Gordan Coefficients

In Sect. 20.2.6 we explained the orthogonality of the irreducible characters of SO(3) by deriving (20.153). It takes advanced theory to show the orthogonality of the irreducible characters of SU(2). Fortunately, however, we have an expression similar to (20.153) with SU(2) as well. Readers are encouraged to look up appropriate literature for this [9]. On the basis of this important expression, we further develop the representation theory of the continuous groups. Within this framework, we calculate the Clebsch–Gordan coefficients.
As in the case of the previous section, the character of the irreducible representation D^{(j)}(α, β, γ) of SU(2) is described only as a function of a rotation angle. Similarly to (20.147), in SU(2) we get


\[
\chi^{(j)}(\omega) = \sum_{m=-j}^{j} e^{im\omega} = \frac{\sin\left(j + \frac{1}{2}\right)\omega}{\sin\frac{\omega}{2}}, \tag{20.161}
\]

where χ^{(j)}(ω) is the irreducible character of D^{(j)}(α, β, γ). This is because we can align the rotation axis in the direction of, e.g., the z-axis, so that the representation matrix of a rotation angle ω is typified by D^{(j)}(ω, 0, 0) or D^{(j)}(0, 0, ω). Or, we can equally choose any direction of the rotation axis at will.
We are interested in calculating the angular momenta of a coupled system, i.e., the addition of the angular momenta of that system [5, 6]. The term "coupled system" needs some explanation. This is because, on the one hand, we may deal with, e.g., a sum of an orbital angular momentum and a spin angular momentum of a "single" electron, but, on the other hand, we might calculate, e.g., a sum of two angular momenta of "two" electrons. In either case, we will deal with the total angular momenta of the coupled system.
Now, let us consider a direct-product group for this kind of problem. Suppose that one system is characterized by a quantum number j1 and the other by j2. Here these numbers are supposed to be those of (3.89), namely the highest positive number of the generalized angular momentum in the z-direction. That is, the representation matrices of rotation for systems 1 and 2 are assumed to be D^{(j1)}(ω, 0, 0) and D^{(j2)}(ω, 0, 0), respectively. Then, we are to deal with the direct-product representation D^{(j1)} ⊗ D^{(j2)} (see Sect. 18.8). As in (18.193), we describe
\[
D^{(j_1 \otimes j_2)}(\omega) = D^{(j_1)}(\omega) \otimes D^{(j_2)}(\omega). \tag{20.162}
\]

In (20.162), the group element is represented by a rotation angle ω, or more strictly by the rotation operation of an angle ω. The quantum numbers j1 and j2 label the irreducible representations of the rotations for systems 1 and 2, respectively. According to (20.161), we have
\[
\chi^{(j_1 \otimes j_2)}(\omega) = \chi^{(j_1)}(\omega)\, \chi^{(j_2)}(\omega), \tag{20.163}
\]

where χ^{(j1⊗j2)}(ω) gives the character of the direct-product representation; see (18.198). Furthermore, we have [5]
\[
\chi^{(j_1 \otimes j_2)}(\omega) = \sum_{m_1=-j_1}^{j_1} e^{im_1\omega} \sum_{m_2=-j_2}^{j_2} e^{im_2\omega} = \sum_{m_1=-j_1}^{j_1} \sum_{m_2=-j_2}^{j_2} e^{i(m_1+m_2)\omega} = \sum_{J=|j_1-j_2|}^{j_1+j_2} \sum_{M=-J}^{J} e^{iM\omega} = \sum_{J=|j_1-j_2|}^{j_1+j_2} \chi^{(J)}(\omega), \tag{20.164}
\]
where the positive number J can be chosen from among
\[
J = |j_1 - j_2|,\ |j_1 - j_2| + 1,\ \cdots,\ j_1 + j_2.
\]

Rewriting (20.164) more explicitly, we have
\[
\chi^{(j_1 \otimes j_2)}(\omega) = \chi^{(|j_1-j_2|)}(\omega) + \chi^{(|j_1-j_2|+1)}(\omega) + \cdots + \chi^{(j_1+j_2)}(\omega). \tag{20.165}
\]

If both j1 and j2 are integers, from (20.165) we can immediately derive an important result. In light of (18.199) and (18.200), we multiply [χ^{(k)}(ω)]^* on both sides of (20.165) and then integrate to get
\[
\begin{aligned}
\int \left[\chi^{(k)}(\omega)\right]^* \chi^{(j_1 \otimes j_2)}(\omega)\, d\Pi &= \int \left[\chi^{(k)}(\omega)\right]^* \chi^{(|j_1-j_2|)}(\omega)\, d\Pi + \int \left[\chi^{(k)}(\omega)\right]^* \chi^{(|j_1-j_2|+1)}(\omega)\, d\Pi + \cdots \\
&\quad + \int \left[\chi^{(k)}(\omega)\right]^* \chi^{(j_1+j_2)}(\omega)\, d\Pi \\
&= 8\pi^2 \left(\delta_{k,|j_1-j_2|} + \delta_{k,|j_1-j_2|+1} + \cdots + \delta_{k,j_1+j_2}\right), \tag{20.166}
\end{aligned}
\]
where k is zero or an integer and with the last equality we used (20.152). Dividing both sides by ∫dΠ, we get
\[
\overline{\left[\chi^{(k)}(\omega)\right]^* \chi^{(j_1 \otimes j_2)}(\omega)} = \delta_{k,|j_1-j_2|} + \delta_{k,|j_1-j_2|+1} + \cdots + \delta_{k,j_1+j_2}, \tag{20.167}
\]
where we used (20.153). Equation (20.166) implies that the mean value does not vanish only if k is identical to one of |j1 − j2|, |j1 − j2| + 1, ⋯, j1 + j2. If, for example, we choose |j1 − j2| for k in (20.167), we have
\[
\overline{\left[\chi^{(|j_1-j_2|)}(\omega)\right]^* \chi^{(j_1 \otimes j_2)}(\omega)} = \delta_{|j_1-j_2|,|j_1-j_2|} = 1. \tag{20.168}
\]

This relation corresponds to (18.200), where a finite group is relevant. Note that the volume of SO(3), i.e., ∫dΠ = 8π², corresponds to the order n of a finite group in (20.148). Considering (18.199) and (18.200) once again, it follows that in the above case D^{(|j1−j2|)} takes place once and only once in the direct-product representation D^{(j1⊗j2)}. Thus, we obtain the following important relation:
\[
D^{(j_1 \otimes j_2)} = D^{(j_1)}(\omega) \otimes D^{(j_2)}(\omega) = \sum_{J=|j_1-j_2|}^{j_1+j_2} \oplus\, D^{(J)}. \tag{20.169}
\]

Any group (finite or infinite) is said to be simply reducible if each irreducible representation takes place at most once when the direct-product representation of the group in question is reduced. Equation (20.169) clearly shows that SO(3) is simply reducible. With respect to SU(2) we also have the same expression (20.169) [9].
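The decomposition (20.169) can be confirmed at the level of characters: by (20.163)–(20.165), χ^{(j1)}χ^{(j2)} must equal the sum of the χ^{(J)}. A minimal sketch assuming SymPy (a half-odd-integer case is chosen to illustrate SU(2) as well):

import sympy as sp

w = sp.symbols('omega')

def chi(j):
    """Character (20.161); j may be an integer or a half-odd-integer."""
    return sp.sin((j + sp.Rational(1, 2)) * w) / sp.sin(w / 2)

j1, j2 = sp.Rational(3, 2), sp.Integer(1)
product = chi(j1) * chi(j2)
series = sum(chi(J) for J in (sp.Rational(1, 2), sp.Rational(3, 2),
                              sp.Rational(5, 2)))   # J = |j1-j2|, ..., j1+j2
for x in (0.3, 0.7, 2.1):                           # a few test angles
    print((product - series).subs(w, x).evalf())    # ~ 0 in every case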

In Sect. 18.8 we examined the direct-product representation of two (irreducible) representations. Let D^{(μ)}(g) and D^{(ν)}(g) be two different irreducible representations of the group G. Here G may be either a finite or an infinite group. Note that in Sect. 18.8 we focused on the finite groups and that now we are thinking of an infinite group, typically SU(2) and SO(3). Let n_μ and n_ν be the dimensions of the representation spaces of the irreducible representations μ and ν, respectively. We assume that each representation space is spanned by the following basis functions:
\[
\psi_j^{(\mu)}\ (j = 1, \cdots, n_\mu) \quad \text{and} \quad \phi_l^{(\nu)}\ (l = 1, \cdots, n_\nu).
\]
We know from Sect. 18.8 that n_μ n_ν new basis functions can be constructed using the functions described by ψ_j^{(μ)} φ_l^{(ν)}. Our next task will be to classify these n_μ n_ν functions into those belonging to different irreducible representations. For this, we need the relation (18.199). According to custom, we rewrite it as follows:
\[
D^{(\mu)}(g) \otimes D^{(\nu)}(g) = \sum_\omega (\mu\nu\omega)\, D^{(\omega)}(g), \tag{20.170}
\]
where (μνω) is identical with q_ω of the RHS of (18.199) and shows how many times the same representation ω takes place in the direct-product representation. Naturally, we have
\[
(\mu\nu\omega) = (\nu\mu\omega).
\]
Then, we get
\[
\sum_\omega (\mu\nu\omega)\, n_\omega = n_\mu n_\nu.
\]

Meanwhile, we should be able to constitute functions that have the same transformation properties as those of the irreducible representation ω. In other words, these functions should make up the basis functions belonging to ω. Let these functions be Ψ_s^{(ωτ_ω)}. Then, we have
\[
\Psi_s^{(\omega\tau_\omega)} = \sum_{j,l} (\mu j, \nu l\, |\, \omega\tau_\omega s)\, \psi_j^{(\mu)} \phi_l^{(\nu)}, \tag{20.171}
\]
where τ_ω = 1, ⋯, (μνω) and s designates the functions contained in the irreducible representation ω; i.e., s = 1, ⋯, n_ω. If ω takes place more than once, we label ω as ωτ_ω. The coefficients with respect to the above linear combination,
\[
(\mu j, \nu l\, |\, \omega\tau_\omega s),
\]
are called Clebsch–Gordan coefficients. Readers might well be bewildered by the notations of abstract algebra, but the Examples of Sect. 20.3.3 should greatly relieve them of anxiety about it.

As we shall see later in this chapter, the functions Ψ_s^{(ωτ_ω)} are going to be normalized along with the product functions ψ_j^{(μ)} φ_l^{(ν)}. The Clebsch–Gordan coefficients must form a unitary matrix accordingly. Notice that a transformation between two orthonormal basis sets is unitary (see Chap. 14). Since there are n_μ n_ν product functions as the basis vectors, the Clebsch–Gordan coefficients are associated with (n_μ n_ν, n_μ n_ν) square matrices. The said coefficients can be defined for any group, either finite or infinite. Or rather, difficulty in determining the Clebsch–Gordan coefficients arises when τ_ω is more than one. This is because we should take account of linear combinations described by
\[
\sum_{\tau_\omega} c_{\tau_\omega} \Psi_s^{(\omega\tau_\omega)}
\]
that belong to the irreducible representation ω. This would cause arbitrariness and ambiguity [5]. Nevertheless, we do not need to get into further discussion of this problem. From now on, we focus on the Clebsch–Gordan coefficients of SU(2) and SO(3), typical simply reducible groups, for which τ_ω is at most one.
That SU(2) and SO(3) are simply reducible saves us a lot of labor. The argument for this is as follows: Equation (20.169) shows how the dimension (2j1 + 1)(2j2 + 1) of the representation space for the direct product of the two representations is redistributed to the individual irreducible representations, each of which takes place only once. The arithmetic based on this fact is as follows:
\[
\begin{aligned}
(2j_1+1)(2j_2+1) &= 2(j_1-j_2)(2j_2+1) + 2\left[\frac{1}{2} \cdot 2j_2(2j_2+1)\right] + (2j_2+1) \\
&= 2(j_1-j_2)(2j_2+1) + 2(0+1+2+\cdots+2j_2) + (2j_2+1) \\
&= [2(j_1-j_2)+0] + 2[(j_1-j_2)+1] + 2[(j_1-j_2)+2] + \cdots + 2[(j_1-j_2)+2j_2] + (2j_2+1) \\
&= [2(j_1-j_2)+1] + \{2[(j_1-j_2)+1]+1\} + \cdots + [2(j_1+j_2)+1], \tag{20.172}
\end{aligned}
\]
where with the second equality we used the formula 1 + 2 + ⋯ + n = n(n+1)/2; the resulting numbers 0, 1, 2, ⋯, and 2j2 are distributed among the 2j2 + 1 terms of the subsequent RHS of (20.172), one each; the third term 2j2 + 1 is likewise distributed among the 2j2 + 1 terms, one each, on the rightmost-hand side. Equation (20.172) is symmetric in j1 and j2, but if we assume that j1 ≥ j2, each term of the rightmost-hand side of (20.172) is positive. Therefore, for convenience we assume j1 ≥ j2 without loss of generality.
To clarify the situation, we list several tables (Tables 20.1, 20.2, and 20.3). Tables 20.1 and 20.2 show specific cases, whereas Table 20.3 represents the general case. In these tables the topmost layer has a diagram that corresponds to the case of J = |j1 − j2| and contains 2|j1 − j2| + 1 different quantum states. The lower layers include diagrams that correspond to the cases J = |j1 − j2| + 1, ⋯, j1 + j2.

Table 20.1 Summation of angular momenta. J = 0, 1, or 2; M = −J, −J + 1, ⋯, J. Entries show M = m1 + m2. Regarding the dotted aqua line, see text

  m2 \ m1    −1    0    1 (= j1)
     −1      −2   −1    0
      0      −1    0    1
  1 (= j2)    0    1    2
Table 20.2 Summation of angular momenta. J = 1, 2, or 3; M = −J, −J + 1, ⋯, J. Entries show M = m1 + m2

  m2 \ m1    −2    −1    0    1    2 (= j1)
     −1      −3    −2   −1    0    1
      0      −2    −1    0    1    2
  1 (= j2)   −1     0    1    2    3

Table 20.3 Summation of angular momenta in a general case. We suppose j1 > 2j2. J = |j1 − j2|, ⋯, j1 + j2; M = −J, −J + 1, ⋯, J. Entries show M = m1 + m2

  m2 \ m1    −j1          −j1 + 1        ⋯    j1 − 1        j1
    −j2      −j1 − j2     −j1 − j2 + 1   ⋯    j1 − j2 − 1   j1 − j2
  −j2 + 1    −j1 − j2 + 1 −j1 − j2 + 2   ⋯    j1 − j2       j1 − j2 + 1
     ⋯          ⋯             ⋯          ⋯       ⋯             ⋯
     j2      −j1 + j2     −j1 + j2 + 1   ⋯    j1 + j2 − 1   j1 + j2

For each J, 2J + 1 functions (or quantum states) are included, labelled according to the different numbers M = −J, −J + 1, ⋯, J. The rightmost column displays M ≡ m1 + m2 = |j1 − j2|, |j1 − j2| + 1, ⋯, j1 + j2 from the topmost row through the bottommost row.
Tables 20.1, 20.2, and 20.3 comprise 2j2 + 1 layers, in which the total (2j1 + 1)(2j2 + 1) functions that form the basis vectors of D^{(j1)}(ω) ⊗ D^{(j2)}(ω) are redistributed according to the M number. The same numbers that appear from the topmost row through the bottommost row are marked red and connected with dotted aqua lines. These lines form a parallelogram together with the top and bottom horizontal lines as shown. The upper and lower sides of the parallelogram are 2|j1 − j2| in width. The parallelogram gets flattened with decreasing |j1 − j2|. If j1 = j2, the parallelogram coalesces into a line (Table 20.1). Regarding the notations M and J, see Sects. 20.3.2 and 20.3.3 (vide infra).
Bearing in mind the above remarks, we focus on the coupling of two angular
momenta as a typical example. Within this framework we deal with the

Clebsch–Gordan coefficients with regard to SU(2). We develop the discussion according to the literature [5]. The relevant approach helps address more complicated situations where the coupling of three or more angular momenta, including j–j coupling and L–S coupling, is responsible [6].
From (20.47) we know that f_m forms a basis set that spans the (2j + 1)-dimensional representation space associated with D^{(j)} such that
\[
f_m = \frac{u^{j+m} v^{j-m}}{\sqrt{(j+m)!(j-m)!}} \quad (m = -j, -j+1, \cdots, j-1, j). \tag{20.47}
\]
The functions f_m are transformed according to R(a, b) so that we have
\[
R(a,b)(f_m) = \sum_{m'} f_{m'}\, D^{(j)}_{m'm}(a, b), \tag{20.48}
\]
where j can be either an integer or a half-odd-integer. We henceforth work on f_m of (20.47).

20.3.2 Calculation Procedures of Clebsch–Gordan Coefficients

In the previous section we described how the quantum states are systematically made up of two angular momenta. We express those states in terms of a linear combination of the basis functions related to the direct-product representations of (20.169). To determine those coefficients, we follow the calculation procedures due to Hamermesh [5]. The calculation procedures are rather lengthy, and so we summarize each item separately.
(1) Invariant quantities A_J and B_J:
We start with seeking quantities invariant under the unitary transformation U of (20.35) expressed as
\[
U = \begin{pmatrix} a & b \\ -b^* & a^* \end{pmatrix} \quad \text{with} \quad |a|^2 + |b|^2 = 1. \tag{20.35}
\]

As in (20.46), let us consider combinations of variables u ≡ (u2 u1) and v ≡ (v2 v1) that are transformed such that
\[
\left(u_2'\ u_1'\right) = (u_2\ u_1)\begin{pmatrix} a & b \\ -b^* & a^* \end{pmatrix}, \qquad \left(v_2'\ v_1'\right) = (v_2\ v_1)\begin{pmatrix} a & b \\ -b^* & a^* \end{pmatrix}. \tag{20.173}
\]
As a shorthand notation we have
\[
u' = um, \qquad v' = vm, \tag{20.174}
\]

where we define u′, v′, and m as
\[
u' \equiv \left(u_2'\ u_1'\right), \qquad v' \equiv \left(v_2'\ v_1'\right), \qquad m \equiv \begin{pmatrix} a & b \\ -b^* & a^* \end{pmatrix}. \tag{20.175}
\]
As in (20.47) we also define the following functions:
\[
\begin{aligned}
\psi_{m_1}^{j_1} &= \frac{u_1^{j_1+m_1} u_2^{j_1-m_1}}{\sqrt{(j_1+m_1)!(j_1-m_1)!}} \quad (m_1 = -j_1, -j_1+1, \cdots, j_1-1, j_1), \\
\phi_{m_2}^{j_2} &= \frac{v_1^{j_2+m_2} v_2^{j_2-m_2}}{\sqrt{(j_2+m_2)!(j_2-m_2)!}} \quad (m_2 = -j_2, -j_2+1, \cdots, j_2-1, j_2).
\end{aligned} \tag{20.176}
\]

Meanwhile, we also consider a combination of variables x ≡ (x2 x1) that is transformed as
\[
\left(x_2'\ x_1'\right) = (x_2\ x_1)\begin{pmatrix} a^* & b^* \\ -b & a \end{pmatrix}. \tag{20.177}
\]
As a shorthand notation we have
\[
x' = x m^*, \tag{20.178}
\]
where we have
\[
x' \equiv \left(x_2'\ x_1'\right), \qquad x \equiv (x_2\ x_1), \qquad m^* = \begin{pmatrix} a^* & b^* \\ -b & a \end{pmatrix}. \tag{20.179}
\]

The unitary matrix m^* is said to be the complex conjugate matrix of m. We say that in (20.174) u and v are transformed covariantly and that in (20.178) x is transformed contravariantly. Defining a unitary operator g as
\[
g \equiv \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \tag{20.180}
\]
and operating g on both sides of (20.174), we have
\[
u'g = u\, g g^\dagger m g = (ug)\left(g^\dagger m g\right) = (ug)\, m^*. \tag{20.181}
\]

Rewriting (20.181) as
\[
u'g = (ug)\, m^*, \tag{20.182}
\]
we find that ug is transformed contravariantly. Taking the transposition of (20.178), we have
\[
x'^{\mathrm{T}} = (m^*)^{\mathrm{T}} x^{\mathrm{T}} = m^\dagger x^{\mathrm{T}}.
\]

Then, we get
\[
u' x'^{\mathrm{T}} = (um)\left(m^\dagger x^{\mathrm{T}}\right) = u\left(m m^\dagger\right) x^{\mathrm{T}} = u x^{\mathrm{T}} = u_2 x_2 + u_1 x_1, \tag{20.183}
\]
where with the third equality we used the unitarity of m. This implies that ux^T is an invariant quantity under the unitary transformation by m. We have
\[
v' x'^{\mathrm{T}} = v x^{\mathrm{T}}, \tag{20.184}
\]
meaning that vx^T is an invariant as well.
In a similar manner, we have

\[
u_1' v_2' - u_2' v_1' = v'(u'g)^{\mathrm{T}} = v' g^{\mathrm{T}} u'^{\mathrm{T}} = v m g^{\mathrm{T}} m^{\mathrm{T}} u^{\mathrm{T}} = v g^{\mathrm{T}} u^{\mathrm{T}} = v (ug)^{\mathrm{T}} = (v_2\ v_1)\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}\begin{pmatrix} u_2 \\ u_1 \end{pmatrix} = u_1 v_2 - u_2 v_1. \tag{20.185}
\]
Thus, v(ug)^T (or u1v2 − u2v1) is another invariant under the unitary transformation by m. In the above discussion we can view (20.183) to (20.185) as quadratic forms of two variables (see Sect. 14.5).
Using these invariants, we define other important invariants A_J and B_J as follows [5]:
\[
A_J \equiv (u_1 v_2 - u_2 v_1)^{j_1+j_2-J} (u_2 x_2 + u_1 x_1)^{j_1-j_2+J} (v_2 x_2 + v_1 x_1)^{j_2-j_1+J}, \tag{20.186}
\]
\[
B_J \equiv (u_2 x_2 + u_1 x_1)^{2J}. \tag{20.187}
\]

The polynomial A_J is of degree 2j1 with respect to the covariant variables u1 and u2 and of degree 2j2 with respect to v1 and v2. The degree with respect to the contravariant variables x1 and x2 is 2J. Expanding A_J in powers of x1 and x2, we get
\[
A_J = \sum_{M=-J}^{J} W_M^J X_M^J, \tag{20.188}
\]
where X_M^J is defined as
\[
X_M^J \equiv \frac{x_1^{J+M} x_2^{J-M}}{\sqrt{(J+M)!(J-M)!}} \quad (M = -J, -J+1, \cdots, J-1, J) \tag{20.189}
\]

and the coefficients W_M^J are polynomials in u1, u2, v1, and v2. The coefficients W_M^J will be determined soon in combination with X_M^J. Meanwhile, we have
\[
\begin{aligned}
B_J &= \sum_{k=0}^{2J} \frac{(2J)!}{k!(2J-k)!} (u_1 x_1)^k (u_2 x_2)^{2J-k} = (2J)! \sum_{M=-J}^{J} \frac{1}{(J+M)!(J-M)!} (u_1 x_1)^{J+M} (u_2 x_2)^{J-M} \\
&= (2J)! \sum_{M=-J}^{J} \frac{1}{(J+M)!(J-M)!}\, u_1^{J+M} u_2^{J-M} x_1^{J+M} x_2^{J-M} = (2J)! \sum_{M=-J}^{J} \Phi_M^J X_M^J, \tag{20.190}
\end{aligned}
\]
where Φ_M^J is defined as
\[
\Phi_M^J \equiv \frac{u_1^{J+M} u_2^{J-M}}{\sqrt{(J+M)!(J-M)!}} \quad (M = -J, -J+1, \cdots, J-1, J). \tag{20.191}
\]

The implications of the above calculations of abstract algebra are as follows: In Sect. 20.2.2 we determined the representation matrices D^{(j)}_{m'm} of SU(2) using (20.47) and (20.48). We see that the functions f_m (m = −j, −j+1, ⋯, j−1, j) of (20.47) are transformed as basis vectors by the unitary transformation D^{(j)}. Meanwhile, the functions Φ_M^J (M = −J, −J+1, ⋯, J−1, J) of (20.191) are transformed as basis vectors by the unitary transformation D^{(J)}. Here let us compare (20.188) and (20.190). We see that whereas Φ_M^J associates X_M^J with the invariant B_J, W_M^J associates X_M^J with the invariant A_J. Thus, W_M^J plays the same role as Φ_M^J and, hence, we expect that W_M^J is eligible as a set of basis vectors for D^{(J)} as well.
(2) Binomial expansion of A_J:
To calculate W_M^J, we expand A_J using the binomial theorem such that [5]
\[
\begin{aligned}
(u_1 v_2 - u_2 v_1)^{j_1+j_2-J} &= \sum_{\lambda=0}^{j_1+j_2-J} (-1)^\lambda \binom{j_1+j_2-J}{\lambda} (u_1 v_2)^{j_1+j_2-J-\lambda} (u_2 v_1)^\lambda, \\
(u_2 x_2 + u_1 x_1)^{j_1-j_2+J} &= \sum_{\mu=0}^{j_1-j_2+J} \binom{j_1-j_2+J}{\mu} (u_1 x_1)^{j_1-j_2+J-\mu} (u_2 x_2)^\mu, \\
(v_2 x_2 + v_1 x_1)^{j_2-j_1+J} &= \sum_{\nu=0}^{j_2-j_1+J} \binom{j_2-j_1+J}{\nu} (v_1 x_1)^{j_2-j_1+J-\nu} (v_2 x_2)^\nu.
\end{aligned} \tag{20.192}
\]

Therefore, we have
\[
\begin{aligned}
A_J = \sum_{\lambda,\mu,\nu} (-1)^\lambda &\binom{j_1+j_2-J}{\lambda}\binom{j_1-j_2+J}{\mu}\binom{j_2-j_1+J}{\nu} \\
&\times u_1^{2j_1-\lambda-\mu} u_2^{\lambda+\mu} v_1^{j_2-j_1+J-\nu+\lambda} v_2^{j_1+j_2-J-\lambda+\nu} x_1^{2J-\mu-\nu} x_2^{\mu+\nu}. \tag{20.193}
\end{aligned}
\]
Introducing new summation variables m1 = j1 − λ − μ and m2 = J − j1 + λ − ν, we obtain
\[
\begin{aligned}
A_J = \sum_{\lambda,m_1,m_2} (-1)^\lambda &\binom{j_1+j_2-J}{\lambda}\binom{j_1-j_2+J}{j_1-\lambda-m_1}\binom{j_2-j_1+J}{j_2-\lambda+m_2} \\
&\times u_1^{j_1+m_1} u_2^{j_1-m_1} v_1^{j_2+m_2} v_2^{j_2-m_2} x_1^{J+m_1+m_2} x_2^{J-m_1-m_2}, \tag{20.194}
\end{aligned}
\]
where to derive \(\binom{j_2-j_1+J}{j_2-\lambda+m_2}\) in the RHS we used
\[
\binom{a}{b} = \binom{a}{a-b}. \tag{20.195}
\]

Further using (20.176) and (20.189), we modify A_J such that
\[
A_J = \sum_{m_1,m_2,\lambda} (-1)^\lambda (j_1+j_2-J)!(j_1-j_2+J)!(j_2-j_1+J)!\, \frac{[(j_1+m_1)!(j_1-m_1)!(j_2+m_2)!(j_2-m_2)!(J+m_1+m_2)!(J-m_1-m_2)!]^{1/2}}{\lambda!(j_1+j_2-J-\lambda)!(j_1-\lambda-m_1)!(J-j_2+\lambda+m_1)!(j_2-\lambda+m_2)!(J-j_1+\lambda-m_2)!}\, \psi_{m_1}^{j_1} \phi_{m_2}^{j_2} X_{m_1+m_2}^J.
\]
Setting m1 + m2 = M, we have
\[
A_J = \sum_{m_1,m_2,\lambda} (-1)^\lambda (j_1+j_2-J)!(j_1-j_2+J)!(j_2-j_1+J)!\, \frac{[(j_1+m_1)!(j_1-m_1)!(j_2+m_2)!(j_2-m_2)!(J+M)!(J-M)!]^{1/2}}{\lambda!(j_1+j_2-J-\lambda)!(j_1-\lambda-m_1)!(J-j_2+\lambda+m_1)!(j_2-\lambda+m_2)!(J-j_1+\lambda-m_2)!}\, \psi_{m_1}^{j_1} \phi_{m_2}^{j_2} X_M^J. \tag{20.196}
\]

In this way, we get W_M^J of (20.188) expressed as
\[
W_M^J = (j_1+j_2-J)!(j_1-j_2+J)!(j_2-j_1+J)! \sum_{m_1,m_2;\, m_1+m_2=M} C_{m_1,m_2}^J\, \psi_{m_1}^{j_1} \phi_{m_2}^{j_2}, \tag{20.197}
\]
where we define C_{m_1,m_2}^J as
\[
C_{m_1,m_2}^J \equiv \sum_\lambda \frac{(-1)^\lambda [(j_1+m_1)!(j_1-m_1)!(j_2+m_2)!(j_2-m_2)!(J+M)!(J-M)!]^{1/2}}{\lambda!(j_1+j_2-J-\lambda)!(j_1-\lambda-m_1)!(J-j_2+\lambda+m_1)!(j_2-\lambda+m_2)!(J-j_1+\lambda-m_2)!}. \tag{20.198}
\]

From (20.197), we assume that the functions W_M^J (M = −J, −J+1, ⋯, J−1, J) are suited to forming basis vectors of D^{(J)}. Then, any basis set Λ_M^J should be connected to W_M^J via a unitary transformation such that
\[
\Lambda_M^J = U W_M^J,
\]
where U is a unitary matrix. Then, we get W_M^J = U^† Λ_M^J. What we have to do is only to normalize W_M^J. If we look carefully at the functional form of (20.197), we become aware that we only have to adjust the first factor of (20.197), which is a function of J only. Hence, as the properly normalized functions we expect to have
\[
\Psi_M^J = C(J)\, W_M^J = C(J)\, U^\dagger \Lambda_M^J, \tag{20.199}
\]
where C(J) is a constant that depends only on J for given numbers j1 and j2. An implication of (20.199) is that we can get suitably normalized functions Ψ_M^J using arbitrary basis vectors Λ_M^J.
Putting
\[
\rho_J \equiv C(J)\, (j_1+j_2-J)!(j_1-j_2+J)!(j_2-j_1+J)!, \tag{20.200}
\]
we get
\[
\Psi_M^J = \rho_J \sum_{m_1,m_2;\, m_1+m_2=M} C_{m_1,m_2}^J\, \psi_{m_1}^{j_1} \phi_{m_2}^{j_2}. \tag{20.201}
\]

The combined states are completely decided by J and M. If we fix J at a certain number, (20.201) is expressed as a linear combination of different functions ψ_{m1}^{j1} φ_{m2}^{j2} with varying m1 and m2 but fixed m1 + m2 = M. Returning to Tables 20.1, 20.2, and 20.3, the same M can be found in at most 2j2 + 1 places. This implies that Ψ_M^J of (20.201) has at most 2j2 + 1 terms on condition that j1 ≥ j2. In (20.201) the Clebsch–Gordan coefficients are given by
\[
\rho_J\, C_{m_1,m_2}^J.
\]
The descriptions of the Clebsch–Gordan coefficients differ from literature to literature [5, 6]. We adopt the description due to Hamermesh [5] such that

\[
(j_1 m_1 j_2 m_2 | JM) \equiv \rho_J\, C_{m_1,m_2}^J, \qquad \Psi_M^J = \sum_{m_1,m_2;\, m_1+m_2=M} (j_1 m_1 j_2 m_2 | JM)\, \psi_{m_1}^{j_1} \phi_{m_2}^{j_2}. \tag{20.202}
\]

(3) Normalization of Ψ_M^J:
The normalization condition for Ψ_M^J of (20.201) or (20.202) is given by
\[
|\rho_J|^2 \sum_{m_1,m_2;\, m_1+m_2=M} \left(C_{m_1,m_2}^J\right)^2 = 1. \tag{20.203}
\]

To obtain (20.203), we assume the following normalization condition for the basis vectors ψ_{m1}^{j1} φ_{m2}^{j2} that span the representation space. That is, for an appropriate pair of complex variables z1 and z2, we define their inner product as follows [9]:
\[
\left\langle z_1^l z_2^{m-l} \,\middle|\, z_1^k z_2^{m-k} \right\rangle = \sqrt{l!(m-l)!}\,\sqrt{k!(m-k)!}\; \delta_{lk} \quad \text{or} \quad \left\langle \frac{z_1^l z_2^{m-l}}{\sqrt{l!(m-l)!}} \,\middle|\, \frac{z_1^k z_2^{m-k}}{\sqrt{k!(m-k)!}} \right\rangle = \delta_{lk}. \tag{20.204}
\]
Equation (20.204) implies that the m + 1 monomials of degree m with respect to z1 and z2 constitute an orthonormal basis.
Since ρ_J defined in (20.200) is independent of M, we can evaluate it by choosing M conveniently in (20.203). Setting M (= m1 + m2) = J and substituting it in J − j1 + λ − m2 of (20.198), we obtain (m1 + m2) − j1 + λ − m2 = λ + m1 − j1. So far as we are dealing with integers, the inside of a factorial must be non-negative, and so we have
\[
\lambda + m_1 - j_1 \ge 0 \quad \text{or} \quad \lambda \ge j_1 - m_1. \tag{20.205}
\]
Meanwhile, looking at another factorial (j1 − λ − m1)!, we get
\[
\lambda \le j_1 - m_1. \tag{20.206}
\]
From the above equations, we obtain only one choice for λ such that [5]
\[
\lambda = j_1 - m_1. \tag{20.207}
\]

Notice that j1 − m1 is an integer whether j1 is an integer or a half-odd-integer. Then, we get

\[
\begin{aligned}
C_{m_1,m_2}^J &= (-1)^{j_1-m_1}\, \frac{\sqrt{(2J)!}}{(J-j_2+j_1)!(J-j_1+j_2)!} \sqrt{\frac{(j_1+m_1)!(j_2+m_2)!}{(j_1-m_1)!(j_2-m_2)!}} \\
&= (-1)^{j_1-m_1} \sqrt{\frac{(2J)!}{(J-j_2+j_1)!(J-j_1+j_2)!}}\, \sqrt{\binom{j_1+m_1}{j_2-m_2}\binom{j_2+m_2}{j_1-m_1}}. \tag{20.208}
\end{aligned}
\]
To confirm (20.208), calculate \(\binom{j_1+m_1}{j_2-m_2}\binom{j_2+m_2}{j_1-m_1}\) and use, e.g.,
\[
j_1 - j_2 + m_1 + m_2 = j_1 - j_2 + M = j_1 - j_2 + J. \tag{20.209}
\]

Thus, we get
\[
\sum_{m_1,m_2;\, m_1+m_2=M} \left(C_{m_1,m_2}^J\right)^2 = \frac{(2J)!}{(J-j_2+j_1)!(J-j_1+j_2)!} \sum_{m_1,m_2;\, m_1+m_2=M} \binom{j_1+m_1}{j_2-m_2}\binom{j_2+m_2}{j_1-m_1}. \tag{20.210}
\]

To evaluate the sum, we use the following relations [4, 5, 10]:
\[
\Gamma(z)\Gamma(1-z) = \pi/\sin \pi z. \tag{20.211}
\]
Replacing z with x + 1 in (20.211), we have
\[
\Gamma(x+1)\Gamma(-x) = \pi/\sin \pi(x+1). \tag{20.212}
\]

Meanwhile, using gamma functions, we have
\[
\begin{aligned}
\binom{x}{y} &= \frac{x!}{y!(x-y)!} = \frac{\Gamma(x+1)}{y!\,\Gamma(x-y+1)} = \frac{\Gamma(x+1)\Gamma(-x)}{\Gamma(x-y+1)\Gamma(y-x)} \cdot \frac{\Gamma(y-x)}{y!\,\Gamma(-x)} \\
&= \frac{\pi}{\sin \pi(x+1)} \cdot \frac{\sin \pi(x-y+1)}{\pi} \cdot \frac{\Gamma(y-x)}{y!\,\Gamma(-x)} \\
&= \frac{\sin \pi(x-y+1)}{\sin \pi(x+1)} \cdot \frac{\Gamma(y-x)}{y!\,\Gamma(-x)} = (-1)^y\, \frac{\Gamma(y-x)}{y!\,\Gamma(-x)}. \tag{20.213}
\end{aligned}
\]

Replacing x in (20.213) with y − x − 1, we have
\[
\binom{y-x-1}{y} = (-1)^y\, \frac{\Gamma(x+1)}{y!\,\Gamma(x-y+1)} = (-1)^y \binom{x}{y},
\]
where with the last equality we used (20.213). That is, we get
\[
\binom{x}{y} = (-1)^y \binom{y-x-1}{y}. \tag{20.214}
\]

Applying (20.214) to (20.210), we get
\[
\begin{aligned}
\sum_{m_1,m_2;\, m_1+m_2=M} \left(C_{m_1,m_2}^J\right)^2 &= \frac{(2J)!}{(J-j_2+j_1)!(J-j_1+j_2)!} \sum_{m_1,m_2;\, m_1+m_2=M} (-1)^{j_2-m_2}\binom{j_2-j_1-J-1}{j_2-m_2}\,(-1)^{j_1-m_1}\binom{j_1-j_2-J-1}{j_1-m_1} \\
&= \frac{(-1)^{j_1+j_2-J}(2J)!}{(J-j_2+j_1)!(J-j_1+j_2)!} \sum_{m_1,m_2;\, m_1+m_2=M} \binom{j_2-j_1-J-1}{j_2-m_2}\binom{j_1-j_2-J-1}{j_1-m_1}. \tag{20.215}
\end{aligned}
\]

Meanwhile, from the binomial theorem, we have
\[
(1+x)^r (1+x)^s = \left[\sum_\alpha \binom{r}{\alpha} x^\alpha\right]\left[\sum_\beta \binom{s}{\beta} x^\beta\right] = \sum_{\alpha,\beta} \binom{r}{\alpha}\binom{s}{\beta} x^{\alpha+\beta} = (1+x)^{r+s} = \sum_\gamma \binom{r+s}{\gamma} x^\gamma. \tag{20.216}
\]

Comparing the coefficients of the γ-th power monomials of the last three sides, we get
\[
\binom{r+s}{\gamma} = \sum_{\alpha,\beta;\, \alpha+\beta=\gamma} \binom{r}{\alpha}\binom{s}{\beta}. \tag{20.217}
\]

Applying (20.217) to (20.215) and employing (20.214) once again, we obtain
\[
\begin{aligned}
\sum_{m_1,m_2;\, m_1+m_2=M} \left(C_{m_1,m_2}^J\right)^2 &= \frac{(-1)^{j_1+j_2-J}(2J)!}{(J-j_2+j_1)!(J-j_1+j_2)!} \binom{-2J-2}{j_1+j_2-J} = \frac{(-1)^{j_1+j_2-J}(-1)^{j_1+j_2-J}(2J)!}{(J-j_2+j_1)!(J-j_1+j_2)!} \binom{j_1+j_2+J+1}{j_1+j_2-J} \\
&= \frac{(2J)!}{(J-j_2+j_1)!(J-j_1+j_2)!} \cdot \frac{(j_1+j_2+J+1)!}{(j_1+j_2-J)!(2J+1)!} \\
&= \frac{(j_1+j_2+J+1)!}{(2J+1)(J-j_2+j_1)!(J-j_1+j_2)!(j_1+j_2-J)!}. \tag{20.218}
\end{aligned}
\]

Inserting (20.218) into (20.203), as a positive number ρ_J we have
\[
\rho_J = \sqrt{\frac{(2J+1)(J-j_2+j_1)!(J-j_1+j_2)!(j_1+j_2-J)!}{(j_1+j_2+J+1)!}}. \tag{20.219}
\]

From (20.200), we find
\[
C(J) = \sqrt{\frac{2J+1}{(J-j_2+j_1)!(J-j_1+j_2)!(j_1+j_2-J)!(j_1+j_2+J+1)!}}. \tag{20.220}
\]

Thus, as the Clebsch–Gordan coefficients (j1m1j2m2|JM), at last we get
\[
\begin{aligned}
(j_1 m_1 j_2 m_2 | JM) &\equiv \rho_J\, C_{m_1,m_2}^J \\
&= \sqrt{\frac{(2J+1)(J-j_2+j_1)!(J-j_1+j_2)!(j_1+j_2-J)!}{(j_1+j_2+J+1)!}} \\
&\quad \times \sum_\lambda \frac{(-1)^\lambda [(j_1+m_1)!(j_1-m_1)!(j_2+m_2)!(j_2-m_2)!(J+M)!(J-M)!]^{1/2}}{\lambda!(j_1+j_2-J-\lambda)!(j_1-\lambda-m_1)!(J-j_2+\lambda+m_1)!(j_2-\lambda+m_2)!(J-j_1+\lambda-m_2)!}. \tag{20.221}
\end{aligned}
\]

During the course of the above calculations, we encountered, e.g., Γ(−x) and factorials involving a negative integer. Under ordinary circumstances, however, such things must be avoided. Nonetheless, it is convenient for practical use if we properly recognize that we first choose a number, e.g., x close to (but not identical with) a negative integer and finally take the limit of x at the negative integer after the related calculations have been finished.
Choosing, e.g., M (= m1 + m2) = −J in (20.203) instead of setting M = J as above, we see that only the term with λ = j2 + m2 in (20.198) survives. Yet we get the same result as above. The confirmation is left for readers.
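Formula (20.221) is easy to transcribe into a small program. The following Python sketch (the helper names fact and cg are ours; exact rational arithmetic via the standard fractions module keeps the factorials exact) reproduces, e.g., the coefficients obtained in the Examples of Sect. 20.3.3 below:

from fractions import Fraction
from math import factorial, sqrt

def fact(x):
    """x! for an integer-valued non-negative x; None flags a negative
    integer (such terms drop out of the lambda sum, cf. the text)."""
    x = Fraction(x)
    if x < 0:
        return None
    assert x.denominator == 1
    return factorial(x.numerator)

def cg(j1, m1, j2, m2, J, M):
    """(j1 m1 j2 m2 | J M) evaluated literally from (20.221)."""
    if m1 + m2 != M:
        return 0.0
    j1, m1, j2, m2, J, M = (Fraction(x) for x in (j1, m1, j2, m2, J, M))
    pref = ((2 * J + 1) * fact(J - j2 + j1) * fact(J - j1 + j2)
            * fact(j1 + j2 - J) / fact(j1 + j2 + J + 1))
    num = (fact(j1 + m1) * fact(j1 - m1) * fact(j2 + m2) * fact(j2 - m2)
           * fact(J + M) * fact(J - M))
    s = Fraction(0)
    for lam in range(int(j1 + j2 - J) + 1):
        dens = [fact(lam), fact(j1 + j2 - J - lam), fact(j1 - lam - m1),
                fact(J - j2 + lam + m1), fact(j2 - lam + m2),
                fact(J - j1 + lam - m2)]
        if None not in dens:
            d = 1
            for f in dens:
                d *= f
            s += Fraction((-1) ** lam, d)
    return float(s) * sqrt(float(pref) * float(num))

# (20.229): (1,0,1,-1|1,-1) = 1/sqrt(2) and (1,-1,1,0|1,-1) = -1/sqrt(2)
print(cg(1, 0, 1, -1, 1, -1), cg(1, -1, 1, 0, 1, -1))
print(cg(1, 0, 1, 0, 2, 0))   # 2/sqrt(6), the middle coefficient of (20.228)

Half-odd-integer quantum numbers such as j = 1/2 may be passed as 0.5; all factorial arguments in (20.221) are then still integers, so the same routine covers SU(2).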

20.3.3 Examples of Calculation of Clebsch–Gordan Coefficients

The simple examples mentioned below help us understand the calculation procedures of the Clebsch–Gordan coefficients and their implications.
Example 20.1 As the simplest example, we examine the case of D^{(1/2)} ⊗ D^{(1/2)}. At the same time, it is an example of the symmetric and antisymmetric representations

discussed in Sect. 18.9. Taking two sets of complex variables u ≡ (u2 u1) and v ≡ (v2 v1) and rewriting them according to (20.176), we have
\[
\psi_{-1/2}^{1/2} = u_2, \quad \psi_{1/2}^{1/2} = u_1, \quad \phi_{-1/2}^{1/2} = v_2, \quad \phi_{1/2}^{1/2} = v_1,
\]
so that we can get the basis functions of the direct-product representation described by
\[
(u_2 v_2\ \ u_1 v_2\ \ u_2 v_1\ \ u_1 v_1).
\]
Notice that we have four functions above; i.e., (2j1 + 1)(2j2 + 1) = 4 with j1 = j2 = 1/2. Using (20.221) we can determine the proper basis functions with respect to Ψ_M^J of (20.202). For example, for Ψ_{-1}^1 we have only one choice, m1 = m2 = −1/2, to get m1 + m2 = M = −1. In (20.221) we have no other choice but to take λ = 0. In this way, we have
\[
\Psi_{-1}^1 = u_2 v_2 = \psi_{-1/2}^{1/2}\, \phi_{-1/2}^{1/2}.
\]

Similarly, we get
\[
\Psi_1^1 = u_1 v_1 = \psi_{1/2}^{1/2}\, \phi_{1/2}^{1/2}.
\]
For the two other functions, using (20.221) we obtain
\[
\begin{aligned}
\Psi_0^1 &= \frac{1}{\sqrt{2}}(u_2 v_1 + u_1 v_2) = \frac{1}{\sqrt{2}}\left(\psi_{-1/2}^{1/2}\phi_{1/2}^{1/2} + \psi_{1/2}^{1/2}\phi_{-1/2}^{1/2}\right), \\
\Psi_0^0 &= \frac{1}{\sqrt{2}}(u_1 v_2 - u_2 v_1) = \frac{1}{\sqrt{2}}\left(\psi_{1/2}^{1/2}\phi_{-1/2}^{1/2} - \psi_{-1/2}^{1/2}\phi_{1/2}^{1/2}\right).
\end{aligned}
\]

We put the above results in a simple matrix form representing the basis vector transformation such that
\[
(u_2 v_2\ \ u_1 v_2\ \ u_1 v_1\ \ u_2 v_1)
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & \frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \\
0 & 0 & 1 & 0 \\
0 & \frac{1}{\sqrt{2}} & 0 & -\frac{1}{\sqrt{2}}
\end{pmatrix}
= \left(\Psi_{-1}^1\ \Psi_0^1\ \Psi_1^1\ \Psi_0^0\right). \tag{20.222}
\]

In this way, (20.222) shows how the basis functions that possess a proper irreducible representation and span the direct-product representation space are constructed using the original product functions. The Clebsch–Gordan coefficients play the role of a linker (i.e., a unitary operator) between those two sets of functions. In this respect these coefficients resemble those appearing in the symmetry-adapted linear combinations (SALCs) mentioned in Sects. 18.2 and 19.3.
Expressing the unitary transformation according to (20.173) and choosing (Ψ_{-1}^1 Ψ_0^1 Ψ_1^1 Ψ_0^0) as the basis vectors, we describe the transformation in a manner similar to (20.48) such that
\[
R(a,b)\left(\Psi_{-1}^1\ \Psi_0^1\ \Psi_1^1\ \Psi_0^0\right) = \left(\Psi_{-1}^1\ \Psi_0^1\ \Psi_1^1\ \Psi_0^0\right)\left[D^{(1/2)} \otimes D^{(1/2)}\right]. \tag{20.223}
\]

Then, as the matrix representation of D^{(1/2)} ⊗ D^{(1/2)} we get
\[
D^{(1/2)} \otimes D^{(1/2)} =
\begin{pmatrix}
a^2 & \sqrt{2}\,ab & b^2 & 0 \\
-\sqrt{2}\,ab^* & aa^* - bb^* & \sqrt{2}\,a^*b & 0 \\
(b^*)^2 & -\sqrt{2}\,a^*b^* & (a^*)^2 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}. \tag{20.224}
\]

Notice that (20.224) is a block matrix; namely, (20.224) has been reduced according to the symmetric and antisymmetric representations. Symbolically writing (20.223) and (20.224), we have
\[
D^{(1/2)} \otimes D^{(1/2)} = D^{(1)} \oplus D^{(0)}.
\]
The block matrix of (20.224) is unitary, and so the (3, 3) submatrix and the (1, 1) submatrix (i.e., the number 1) are unitary accordingly. To show this, use the conditions (20.35) and (20.173). The confirmation is left for readers. The (3, 3) submatrix is a symmetric representation, whose representation space (Ψ_{-1}^1 Ψ_0^1 Ψ_1^1) spans. Note that these functions are symmetric under the exchange m1 ↔ m2. Or, we may think of these functions as symmetric under the exchange ψ ↔ ϕ. Though trivial, the basis function Ψ_0^0 spans the antisymmetric representation space. The corresponding submatrix is merely the number 1. This is directly related to the fact that v(ug)^T (or u1v2 − u2v1) of (20.185) is an invariant. The function Ψ_0^0 changes sign (i.e., is antisymmetric) under the exchange m1 ↔ m2 or ψ ↔ ϕ.
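The reduction just described can also be observed numerically: conjugating the Kronecker product of a random SU(2) matrix with itself by the Clebsch–Gordan matrix produces the block form of (20.224). A sketch assuming NumPy; note that, for convenience, the product basis is ordered here as (u2v2, u2v1, u1v2, u1v1) so that the direct-product representation is literally the Kronecker product, and the matrix C below is (20.222) rewritten in that ordering:

import numpy as np

rng = np.random.default_rng(1)
z = rng.normal(size=4)
a, b = z[0] + 1j * z[1], z[2] + 1j * z[3]
n = np.sqrt(abs(a)**2 + abs(b)**2)
a, b = a / n, b / n                      # |a|^2 + |b|^2 = 1 as in (20.35)
m = np.array([[a, b], [-b.conjugate(), a.conjugate()]])

D = np.kron(m, m)                        # direct-product representation

s = 1.0 / np.sqrt(2.0)
# Columns: Psi^1_{-1}, Psi^1_0, Psi^1_1, Psi^0_0 (cf. (20.222))
C = np.array([[1, 0, 0, 0],
              [0, s, 0, -s],
              [0, s, 0, s],
              [0, 0, 1, 0]])

B = C.conj().T @ D @ C                   # representation on the Psi basis
print(np.round(B, 3))                    # a 3x3 block (D^(1)) plus the 1 of D^(0)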
Once again, readers might well ask why we need to work out such an elaborate means to pick up a small pebble. With increasing dimension of the vector space, however, seeking an appropriate set of proper basis functions becomes increasingly as difficult as lifting a huge rock. Though simple, the next example gives us a feel for it. Under such situations, a projection operator is an indispensable tool for addressing the problems.
Example 20.2 We examine the direct product D^{(1)} ⊗ D^{(1)}, where j1 = j2 = 1. This is another example of the symmetric and antisymmetric representations. Taking two sets of complex variables u ≡ (u2 u1) and v ≡ (v2 v1) and rewriting them

according to (20.176) again, we have nine, i.e., (2j1 + 1)(2j2 + 1), product functions expressed as
\[
\psi_{-1}^1\phi_{-1}^1,\ \psi_0^1\phi_{-1}^1,\ \psi_1^1\phi_{-1}^1,\ \psi_{-1}^1\phi_0^1,\ \psi_0^1\phi_0^1,\ \psi_1^1\phi_0^1,\ \psi_{-1}^1\phi_1^1,\ \psi_0^1\phi_1^1,\ \psi_1^1\phi_1^1. \tag{20.225}
\]

These product functions form the basis functions of the direct-product representation. Consequently, we will be able to construct proper eigenfunctions by means of linear combinations of these functions. This method is again related to that based on SALCs discussed in Chap. 19. In the present case, it is accomplished by finding the Clebsch–Gordan coefficients described by (20.221).
As implied in Table 20.1, we need to determine the individual coefficients of the nine functions of (20.225) with respect to the following functions, which are constituted by linear combinations of the above nine functions:
\[
\Psi_{-2}^2,\ \Psi_{-1}^2,\ \Psi_0^2,\ \Psi_1^2,\ \Psi_2^2,\ \Psi_{-1}^1,\ \Psi_0^1,\ \Psi_1^1,\ \Psi_0^0.
\]

(i) With the first five functions, we have J = j1 + j2 = 2. In this case, the determination procedures of the coefficients are pretty much simplified. Substituting J = j1 + j2 into the second factor of the denominator of (20.221), we get (−λ)! as well as λ! in the first factor of the denominator. Therefore, for the insides of the factorials not to be negative we must have λ = 0. Consequently, we have [5]
\[
\rho_J = \sqrt{\frac{(2j_1)!(2j_2)!}{(2J)!}}, \qquad
C_{m_1,m_2}^J = \left[\frac{(J+M)!(J-M)!}{(j_1+m_1)!(j_1-m_1)!(j_2+m_2)!(j_2-m_2)!}\right]^{1/2}.
\]
With Ψ_{-2}^2 we have only one choice for ρ_J and C_{m_1,m_2}^J. That is, we have m1 = −j1 and m2 = −j2. Then, Ψ_{-2}^2 = ψ_{-1}^1 φ_{-1}^1. In the case of M = −1, we have two choices in (20.221); i.e., m1 = −1, m2 = 0 and m1 = 0, m2 = −1. Then, we get
\[
\Psi_{-1}^2 = \frac{1}{\sqrt{2}}\left(\psi_{-1}^1\phi_0^1 + \psi_0^1\phi_{-1}^1\right). \tag{20.226}
\]

In the case of M = 1, we similarly have two choices in (20.221); i.e., m1 = 1, m2 = 0 and m1 = 0, m2 = 1. We get
\[
\Psi_1^2 = \frac{1}{\sqrt{2}}\left(\psi_1^1\phi_0^1 + \psi_0^1\phi_1^1\right). \tag{20.227}
\]
Moreover, in the case of M = 0, we have three choices in (20.221); i.e., m1 = −1, m2 = 1 and m1 = 0, m2 = 0 along with m1 = 1, m2 = −1 (see Table 20.1). As a result, we get
\[
\Psi_0^2 = \frac{1}{\sqrt{6}}\left(\psi_{-1}^1\phi_1^1 + 2\psi_0^1\phi_0^1 + \psi_1^1\phi_{-1}^1\right). \tag{20.228}
\]

(ii) With J = 1, i.e., Ψ_{-1}^1, Ψ_0^1, and Ψ_1^1, we start with (20.221). Noting the λ! and (1 − λ)! in the first two factors of the denominator, we have λ = 0 or λ = 1. In the case of M = −1, we have only one choice, m1 = 0, m2 = −1, for λ = 0. Similarly, m1 = −1, m2 = 0 for λ = 1. Hence, we get
\[
\Psi_{-1}^1 = \frac{1}{\sqrt{2}}\left(\psi_0^1\phi_{-1}^1 - \psi_{-1}^1\phi_0^1\right). \tag{20.229}
\]
In a similar manner, for M = 0 and M = 1, respectively, we have
\[
\Psi_0^1 = \frac{1}{\sqrt{2}}\left(\psi_1^1\phi_{-1}^1 - \psi_{-1}^1\phi_1^1\right), \tag{20.230}
\]
\[
\Psi_1^1 = \frac{1}{\sqrt{2}}\left(\psi_1^1\phi_0^1 - \psi_0^1\phi_1^1\right). \tag{20.231}
\]

(iii) With J = 0 (i.e., Ψ_0^0), we have J = j1 − j2 (= 0). In this case, we have J − j1 + λ − m2 = −j2 + λ − m2 in the denominator of (20.221) [5]. Since this factor is inside a factorial, we have −j2 + λ − m2 ≥ 0. Also, we have j2 − λ + m2 ≥ 0 from the denominator of (20.221). Hence, we have only one choice, λ = j2 + m2. Therefore, we get
\[
\rho_J = \sqrt{\frac{(2J+1)!(2j_2)!}{(2j_1+1)!}}, \qquad
C_{m_1,m_2}^J = (-1)^{j_2+m_2}\left[\frac{(j_1+m_1)!(j_1-m_1)!}{(J+M)!(J-M)!(j_2+m_2)!(j_2-m_2)!}\right]^{1/2}.
\]
In the above, we have three choices: m1 = −1, m2 = 1; m1 = m2 = 0; m1 = 1, m2 = −1. As a result, we get
\[
\Psi_0^0 = \frac{1}{\sqrt{3}}\left(\psi_{-1}^1\phi_1^1 - \psi_0^1\phi_0^1 + \psi_1^1\phi_{-1}^1\right).
\]

Summarizing the above results, we have constructed the proper eigenfunctions with respect to the combined angular momenta using the basis functions of the direct-product representation. An example of the relevant matrix representations is described as follows:
\[
\left(\psi_{-1}^1\phi_{-1}^1\ \ \psi_0^1\phi_{-1}^1\ \ \psi_1^1\phi_{-1}^1\ \ \psi_0^1\phi_1^1\ \ \psi_1^1\phi_1^1\ \ \psi_{-1}^1\phi_0^1\ \ \psi_{-1}^1\phi_1^1\ \ \psi_1^1\phi_0^1\ \ \psi_0^1\phi_0^1\right)
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & \frac{1}{\sqrt{2}} & 0 & 0 & 0 & \frac{1}{\sqrt{2}} & 0 & 0 & 0 \\
0 & 0 & \frac{1}{\sqrt{6}} & 0 & 0 & 0 & \frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{3}} \\
0 & 0 & 0 & \frac{1}{\sqrt{2}} & 0 & 0 & 0 & -\frac{1}{\sqrt{2}} & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & \frac{1}{\sqrt{2}} & 0 & 0 & 0 & -\frac{1}{\sqrt{2}} & 0 & 0 & 0 \\
0 & 0 & \frac{1}{\sqrt{6}} & 0 & 0 & 0 & -\frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{3}} \\
0 & 0 & 0 & \frac{1}{\sqrt{2}} & 0 & 0 & 0 & \frac{1}{\sqrt{2}} & 0 \\
0 & 0 & \frac{2}{\sqrt{6}} & 0 & 0 & 0 & 0 & 0 & -\frac{1}{\sqrt{3}}
\end{pmatrix}
= \left(\Psi_{-2}^2\ \Psi_{-1}^2\ \Psi_0^2\ \Psi_1^2\ \Psi_2^2\ \Psi_{-1}^1\ \Psi_0^1\ \Psi_1^1\ \Psi_0^0\right). \tag{20.232}
\]

In (20.232) the matrix elements represent the Clebsch–Gordan coefficients, and their combination in (20.232) forms a unitary matrix. In (20.232) we have arbitrariness in the disposition of the elements of both the row and column vectors. In other words, we have arbitrariness with respect to the unitary similarity transformation of the matrix, but a unitary matrix that has undergone a unitary similarity transformation is again a unitary matrix (see Chap. 14). To show that the determinant of (20.232) is 1, use the expansion of the determinant and the fact that when (a multiple of) a certain column (or row) is added to another column (or row), the determinant is unchanged (see Sect. 11.3).
We find that the functions belonging to J = 2 and J = 0 are the basis functions of the symmetric representations. Meanwhile, those belonging to J = 1 are the basis functions of the antisymmetric representations (see Sect. 18.9). Expressing the unitary transformation of this example according to (20.173) and choosing (Ψ_{-2}^2 Ψ_{-1}^2 Ψ_0^2 Ψ_1^2 Ψ_2^2 Ψ_{-1}^1 Ψ_0^1 Ψ_1^1 Ψ_0^0) as the basis vectors, we describe the transformation as


\[
\Re(a,b)\left(\Psi_{-2}^2\ \Psi_{-1}^2\ \Psi_0^2\ \Psi_1^2\ \Psi_2^2\ \Psi_{-1}^1\ \Psi_0^1\ \Psi_1^1\ \Psi_0^0\right) = \left(\Psi_{-2}^2\ \Psi_{-1}^2\ \Psi_0^2\ \Psi_1^2\ \Psi_2^2\ \Psi_{-1}^1\ \Psi_0^1\ \Psi_1^1\ \Psi_0^0\right)\left[D^{(1)} \otimes D^{(1)}\right]. \tag{20.233}
\]
Notice again that the representation matrix of D^{(1)} ⊗ D^{(1)} can be converted (or reduced) to a block unitary matrix according to the symmetric and antisymmetric representations such that
\[
D^{(1)} \otimes D^{(1)} = D^{(2)} \oplus D^{(1)} \oplus D^{(0)}, \tag{20.234}
\]

where D^{(2)} and D^{(0)} are the symmetric representations and D^{(1)} is the antisymmetric representation. Recall that SO(3) is simply reducible. To show (20.234), we need somewhat lengthy calculations, but the computation procedures are straightforward.
The set of basis functions (Ψ_{-2}^2 Ψ_{-1}^2 Ψ_0^2 Ψ_1^2 Ψ_2^2) spans a five-dimensional symmetric representation space. In turn, Ψ_0^0 spans a one-dimensional symmetric representation space. The other set of basis functions (Ψ_{-1}^1 Ψ_0^1 Ψ_1^1) spans a three-dimensional antisymmetric representation space.
We add that seeking some special Clebsch–Gordan coefficients is easy. For example, we can readily find them in the cases of J = j1 + j2 = M or J = j1 + j2 = −M. The final results for the normalized basis functions are
\[
\Psi_{j_1+j_2}^{j_1+j_2} = \psi_{j_1}^{j_1}\phi_{j_2}^{j_2}, \qquad \Psi_{-(j_1+j_2)}^{j_1+j_2} = \psi_{-j_1}^{j_1}\phi_{-j_2}^{j_2}.
\]
That is, among the related coefficients, only
\[
(j_1, m_1 = j_1, j_2, m_2 = j_2\, |\, J = j_1+j_2, M = J)
\]
and
\[
(j_1, m_1 = -j_1, j_2, m_2 = -j_2\, |\, J = j_1+j_2, M = -J)
\]
survive, each taking a value of 1; see the corresponding parts of the matrix elements in (20.221). This can readily be seen in Tables 20.1, 20.2, and 20.3.

20.4 Lie Groups and Lie Algebras

In Sect. 20.2 we developed a practical approach to dealing with various aspects and characteristics of SU(2) and SO(3). In this section we introduce the elegant theory of Lie groups and Lie algebras, which enables us to study SU(2) and SO(3) systematically.

20.4.1 Definition of Lie Groups and Lie Algebras: One-Parameter Groups

We start with the following discussion. Let A(t) be a non-singular matrix whose elements vary as functions of a real number t; i.e., we think of the matrix as a function of the real number t. If under such conditions A(t) constitutes a group, A(t) is said to be a one-parameter group. In particular, we are interested in the situation where we have
\[
A(s+t) = A(s)A(t). \tag{20.235}
\]
We further get
\[
A(s)A(t) = A(s+t) = A(t+s) = A(t)A(s). \tag{20.236}
\]

Therefore, any pair A(t) and A(s) is commutative. Putting t = s = 0 in (20.236), we have
\[
A(0) = A(0)A(0). \tag{20.237}
\]
Since A(0) is non-singular, A(0)^{−1} must exist. Then, multiplying both sides of (20.237) by A(0)^{−1}, we obtain
\[
A(0) = E, \tag{20.238}
\]
where E is an identity matrix. Also, we have
\[
A(t)A(-t) = A(0) = E, \tag{20.239}
\]
meaning that
\[
A(-t) = A(t)^{-1}.
\]

Differentiating both sides of (20.235) with respect to s at s = 0, we get

A′(t) = A′(0)A(t).   (20.240)

Defining

X ≡ A′(0),   (20.241)

we obtain

A′(t) = XA(t).   (20.242)

Integrating (20.242), we have

A(t) = exp(tX)   (20.243)

under the initial condition A(0) = E. Thus, any one-parameter group A(t) is described
by (20.243) on condition that A(t) is represented by (20.235).
In Sect. 20.1 we mentioned the Lie algebra in relation to (20.14). In this context,
(20.243) shows how the Lie algebra is related to the one-parameter group. That a
group is a continuous group is thus deeply connected to the one-parameter groups it
contains. Equation (20.45) of Sect. 20.2 is an example of a product of one-parameter
groups. Bearing in mind the above situation, we describe the definitions of the Lie
group and the Lie algebra.
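As a quick numerical illustration of (20.235), (20.238), and (20.243) (not part of the formal development), one may verify the one-parameter group law with the matrix exponential. The sketch below assumes NumPy and SciPy; the matrix X is an arbitrary example.

```python
import numpy as np
from scipy.linalg import expm

X = np.array([[0.0, -1.3], [1.3, 0.0]])   # any square matrix will do
s, t = 0.7, -0.4
lhs = expm((s + t) * X)                   # A(s + t)
rhs = expm(s * X) @ expm(t * X)           # A(s)A(t)
print(np.allclose(lhs, rhs))              # True: the group law (20.235)
print(np.allclose(expm(0 * X), np.eye(2)))  # True: A(0) = E, (20.238)
```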
Definition 20.2 [9] Let G be a subgroup of GL(n, ℂ). Suppose that
Aₖ (k = 1, 2, ⋯) ∈ G and that lim_{k→∞} Aₖ = A [∈ GL(n, ℂ)]. If A ∈ G, then G is called
a linear Lie group of dimension n.

We usually call a linear Lie group simply a Lie group. Note that GL(n, ℂ) has
appeared in Sect. 16.1. The definition again is likely to bewilder chemists in
that it reads as though the definition itself, including the term Lie group, were
written in the air. Nevertheless, the next example is again expected to relieve the
chemists of needless anxiety.
Example 20.3 Let G be a group and let Aₖ ∈ G with det Aₖ = 1 such that

A_k = \begin{pmatrix} \cos θ_k & -\sin θ_k \\ \sin θ_k & \cos θ_k \end{pmatrix}   (θₖ: real number).   (20.244)

Suppose that lim_{k→∞} θₖ = θ. Then we have

lim_{k→∞} A_k = \begin{pmatrix} \cos θ & -\sin θ \\ \sin θ & \cos θ \end{pmatrix} ≡ A.   (20.245)

Certainly, we have A ∈ G, because det Aₖ = det A = 1. Such a group is called
SO(2).

In a word, a Lie group is a group whose elements are continuous and
differentiable (or analytic) functions of the parameters. In that sense, the groups
such as SU(2) and SO(3) that we have already encountered are Lie groups.
Next, the following is in turn a definition of a Lie algebra.
Definition 20.3 [9] Let G be a linear Lie group. The set of all matrices X such that,
for every real number t,

exp(tX) ∈ G,   (20.246)

is called the Lie algebra of the corresponding Lie group G. We denote the Lie
algebra of the Lie group G by 𝔤.

The relation (20.246) includes the case where X is a (1, 1) matrix (i.e., merely a
complex number) as the trivial case. Most importantly, (20.246) is directly
connected to (20.243), and Definition 20.3 shows the direct relevance between the
Lie group and the Lie algebra. The two theories of Lie groups and Lie algebras
underlie continuous groups.

The Lie algebra 𝔤 has the following properties:

(i) If X ∈ 𝔤, then aX ∈ 𝔤 as well, where a is any real number.
(ii) If X, Y ∈ 𝔤, then X + Y ∈ 𝔤 as well.
(iii) If X, Y ∈ 𝔤, then [X, Y] (≡ XY − YX) ∈ 𝔤 as well.
In fact, exp[t(aX)] = exp[(ta)X] and ta is a real number, so that we have (i).
Regarding (ii) we should examine the individual properties of X. As already shown in
Property (7′) of Chap. 15 (Sect. 15.2), e.g., a unitary matrix exp(tX) is associated
with an anti-Hermitian matrix X; we are interested in this particular case. In the
case of SU(2), X is a traceless anti-Hermitian matrix, and as to SO(3), X is a real
skew-symmetric matrix. In the former case, for example, traceless anti-Hermitian
matrices X and Y can be expressed as

X = \begin{pmatrix} ic & a + ib \\ -a + ib & -ic \end{pmatrix},   Y = \begin{pmatrix} ih & f + ig \\ -f + ig & -ih \end{pmatrix},   (20.247)

where a, b, c, etc. are real. Then, we have

X + Y = \begin{pmatrix} i(c + h) & (a + f) + i(b + g) \\ -(a + f) + i(b + g) & -i(c + h) \end{pmatrix}.   (20.248)

Thus, X + Y is a traceless anti-Hermitian matrix as well. With the property (iii), we
need some explanation. Suppose X, Y ∈ 𝔤. Then, with any real numbers s and t, exp
(sX), exp(tY), exp(−sX), exp(−tY) ∈ G by Definition 20.3. Then, their product exp
(sX) exp(tY) exp(−sX) exp(−tY) ∈ G as well. Taking an infinitesimal transforma-
tion of these four factors within the first order of s and t, we have

exp(sX) exp(tY) exp(−sX) exp(−tY)
  ≈ (1 + sX)(1 + tY)(1 − sX)(1 − tY)
  = (1 + tY + sX + stXY)(1 − tY − sX + stXY)
  ≈ (1 − tY − sX + stXY) + (tY − stYX) + (sX − stXY) + stXY
  = 1 + st(XY − YX).   (20.249)

Notice that we ignored the terms having s², t², st², s²t, and s²t² as a coefficient.
Defining

[X, Y] ≡ XY − YX,   (20.250)

we have

exp(sX) exp(tY) exp(−sX) exp(−tY) ≈ 1 + st[X, Y] ≈ exp(st[X, Y]).   (20.251)

Since the LHS represents an element of G and st is a real number, by Definition 20.3
[X, Y] ∈ 𝔤. The quantity [X, Y] is called a commutator. The commutator and its
definition appeared in (1.139).
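The heuristic estimate (20.249)–(20.251) is easily checked numerically: for small s and t, the group commutator exp(sX) exp(tY) exp(−sX) exp(−tY) approaches exp(st[X, Y]). A minimal sketch follows (NumPy/SciPy assumed; the matrices X and Y are arbitrary anti-Hermitian examples).

```python
import numpy as np
from scipy.linalg import expm

X = np.array([[1j, 2.0], [-2.0, -1j]])    # traceless anti-Hermitian
Y = np.array([[0.5j, 1j], [1j, -0.5j]])   # traceless anti-Hermitian
s = t = 1.0e-3
g = expm(s * X) @ expm(t * Y) @ expm(-s * X) @ expm(-t * Y)
comm = X @ Y - Y @ X                      # [X, Y] of (20.250)
# Residual should be of third order (s^2 t, s t^2), here ~1e-9:
print(np.max(np.abs(g - expm(s * t * comm))))
```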
From properties (i) and (ii), the Lie algebra 𝔤 forms a vector space (see Chap. 11).
An element of 𝔤 is usually expressed by a matrix; the zero vector corresponds to the
zero matrix (see Proposition 15.1). As already seen in Chap. 15, we define an inner
product between any A, B ∈ 𝔤 such that

⟨A|B⟩ ≡ Σ_{i,j} a*_{ij} b_{ij}.   (20.252)

Thus, 𝔤 constitutes an inner product (vector) space.
20.4.2 Properties of Lie Algebras

Let us further continue our discussion of Lie algebras by thinking of the following
proposition about the inner product.

Proposition 20.1 Let f(t) and g(t) be continuous and differentiable functions with
respect to a real number t. Then, we have

d/dt ⟨f(t)|g(t)⟩ = ⟨df(t)/dt | g(t)⟩ + ⟨f(t) | dg(t)/dt⟩.   (20.253)
Proof We calculate the following equation:

⟨f(t + Δ)|g(t + Δ)⟩ − ⟨f(t)|g(t)⟩
  = ⟨f(t + Δ)|g(t + Δ)⟩ − ⟨f(t)|g(t + Δ)⟩ + ⟨f(t)|g(t + Δ)⟩ − ⟨f(t)|g(t)⟩
  = ⟨f(t + Δ) − f(t)|g(t + Δ)⟩ + ⟨f(t)|g(t + Δ) − g(t)⟩.   (20.254)

Dividing both sides of (20.254) by a real number Δ and taking the limit as below,
we have

lim_{Δ→0} (1/Δ)[⟨f(t + Δ)|g(t + Δ)⟩ − ⟨f(t)|g(t)⟩]
  = lim_{Δ→0} [⟨(f(t + Δ) − f(t))/Δ | g(t + Δ)⟩ + ⟨f(t) | (g(t + Δ) − g(t))/Δ⟩],   (20.255)

where we used the calculation rules for inner products of (13.3) and (13.20). Then,
we get (20.253). This completes the proof. ∎
Now, let us apply (20.253) to a unitary operator A(t) that we dealt with in the previous
section. Replacing f(t) and g(t) in (20.253) with ψA(t)† and A(t)χ, respectively, we
have

d/dt ⟨ψA(t)† | A(t)χ⟩ = ⟨ψ dA(t)†/dt | A(t)χ⟩ + ⟨ψA(t)† | dA(t)/dt χ⟩,   (20.256)

where ψ and χ are arbitrarily chosen vectors that do not depend on t. Then, we have

LHS of (20.256) = d/dt ⟨ψ|A(t)†A(t)|χ⟩ = d/dt ⟨ψ|E|χ⟩ = d/dt ⟨ψ|χ⟩ = 0,   (20.257)

where with the second equality we used the unitarity of A(t). Taking the limit t → 0 in
(20.256), we have

lim_{t→0} [RHS of (20.256)] = ⟨ψ dA(0)†/dt | A(0)χ⟩ + ⟨ψA(0)† | dA(0)/dt χ⟩.

Meanwhile, taking the limit t → 0 in the following relation, we have

[dA(t)†/dt]_{t→0} = [(dA(t)/dt)†]_{t→0} = [A′(0)]†,

where we assumed that the operations of taking the adjoint and t → 0 are commutable. Then,
we get
lim_{t→0} [RHS of (20.256)] = ⟨ψ[A′(0)]† | A(0)χ⟩ + ⟨ψA(0)† | dA(0)/dt χ⟩
  = ⟨ψX†|χ⟩ + ⟨ψ|Xχ⟩ = ⟨ψ|(X† + X)χ⟩,   (20.258)

where we used X ≡ A′(0) of (20.241) and A(0) = A(0)† = E. From (20.257) and
(20.258), we have

0 = ⟨ψ|(X† + X)χ⟩.

Since ψ and χ are arbitrary, from Theorem 14.2 we get

X† + X ≡ 0,  i.e.,  X† = −X.   (20.259)

This indicates that X is an anti-Hermitian operator. Let X be expressed as (x_{ij}).
Then, from (20.259) x_{ij} = −x*_{ji}. As for the diagonal elements, x_{ii} = −x*_{ii}; i.e., x_{ii} +
x*_{ii} = 0. Hence, x_{ii} is a pure imaginary number or zero. We have the following
theorem accordingly.

Theorem 20.3 Let A(t) be a one-parameter unitary group with A(0) = E that is
described by A(t) = exp(tX) and satisfies (20.235). Then, X ≡ A′(0) is an anti-
Hermitian operator.
Differentiating both sides of A(t) = exp(tX) with respect to t and using (15.47) of
Theorem 15.4 [11, 12], we have

A′(t) = X exp(tX) = XA(t) = A(t)X.   (20.260)

Putting t = 0 in (20.260), we restore X = A′(0). If we require A(t) to be unitary,
from Theorem 20.3 again X should be anti-Hermitian. Thus, once we have a one-
parameter group in the form of a unitary operator A(t) = exp(tX), the exponent can be
separated into a product of a parameter t and a t-independent constant anti-Hermitian
operator X. In fact, all the one-parameter groups that have appeared in Sect. 20.2 are
of the type exp(tX).

Conversely, let us consider what type of operator exp(tX) would be if X is anti-
Hermitian. Here we assume that X is an (n, n) matrix. With an arbitrary real number
t we have

[exp(tX)][exp(tX)]† = [exp(tX)][exp(tX†)] = [exp(tX)][exp(−tX)]
  = exp(tX − tX) = exp 0 = E,   (20.261)

where we used (15.32) and Theorem 15.2 as well as the assumption that X is anti-
Hermitian. Note that exp(tX) and exp(−tX) are commutative; see (15.29) of Sect.
15.2. Equation (20.261) implies that exp(tX) is unitary, i.e., exp(tX) ∈ U(n). The
notation U(n) means a unitary group. Hence, X ∈ 𝔲(n), where 𝔲(n) means the Lie
algebra corresponding to U(n).
Next, let us seek the Lie algebra that corresponds to the special unitary group SU(n).
By Definition 20.3, it is obvious that since U(n) ⊃ SU(n), we have 𝔲(n) ⊃ 𝔰𝔲(n), where
𝔰𝔲(n) means the Lie algebra corresponding to SU(n). It suffices to find the condition
under which det[exp(tX)] = 1 holds with any real number t and the elements X ∈ 𝔲(n).
From (15.41) [9], we have

det[exp(tX)] = exp Tr(tX) = exp t[Tr(X)] = 1,   (20.262)

where Tr stands for the trace (see Chap. 12). This is equivalent to saying that

t[Tr(X)] = 2mπi   (m: zero or integers)   (20.263)

holds with any real t. For this, we must have Tr(X) = 0 with m = 0. This implies that
𝔰𝔲(n) comprises anti-Hermitian matrices with trace zero (i.e., traceless).
Consequently, the Lie algebra 𝔰𝔲(n) is certainly a subset (or subspace) of 𝔲(n), which
consists of anti-Hermitian matrices.
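These claims lend themselves to quick numerical checks. A minimal sketch (NumPy/SciPy assumed) draws a random traceless anti-Hermitian X ∈ 𝔰𝔲(3) and verifies that exp(tX) is unitary with determinant 1 for several real t, in line with (20.261) and (20.262).

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
M = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
X = (M - M.conj().T) / 2                  # anti-Hermitian part of M
X -= np.trace(X) / 3 * np.eye(3)          # remove the trace: X is now in su(3)
for t in (0.3, -1.7, 5.0):
    A = expm(t * X)
    print(np.allclose(A.conj().T @ A, np.eye(3)),   # A is unitary
          np.isclose(np.linalg.det(A), 1.0))        # det A = exp(t Tr X) = 1
```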
In relation to (20.256), let us think of a real orthogonal matrix A(t). If A(t) is real
and orthogonal, (20.256) can be rewritten as

d/dt ⟨ψA(t)ᵀ | A(t)χ⟩ = ⟨ψ dA(t)ᵀ/dt | A(t)χ⟩ + ⟨ψA(t)ᵀ | dA(t)/dt χ⟩.   (20.264)

Following a procedure similar to the case of (20.259), we get

Xᵀ + X ≡ 0,  i.e.,  Xᵀ = −X.   (20.265)

In this case we obtain a skew-symmetric matrix. All its diagonal elements are zero
and, hence, the skew-symmetric matrix is traceless. Other typical Lie groups are the
orthogonal group O(n) and the special orthogonal group SO(n), and the corresponding
Lie algebras are denoted by 𝔬(n) and 𝔰𝔬(n), respectively. Note that both 𝔬(n) and
𝔰𝔬(n) consist of skew-symmetric matrices.

Conversely, let us consider what type of operator exp(tX) would be if X is a skew-
symmetric matrix. Here we assume that X is an (n, n) matrix. With an arbitrary real
number t we have

[exp(tX)][exp(tX)]ᵀ = [exp(tX)][exp(tXᵀ)] = [exp(tX)][exp(−tX)]
  = exp(tX − tX) = exp 0 = E,   (20.266)

where we used (15.31). This implies that exp(tX) is an orthogonal matrix, i.e., exp
(tX) ∈ O(n). Hence, X ∈ 𝔬(n). Notice that exp(tX) and exp(−tX) are commutative.
Summarizing the above, we have the following theorem.
Theorem 20.4 The n-th order Lie algebra 𝔲(n) corresponding to the Lie group U(n)
(i.e., the unitary group) consists of all the anti-Hermitian (n, n) matrices. The n-th
order Lie algebra 𝔰𝔲(n) corresponding to the Lie group SU(n) (i.e., the special unitary
group) comprises all the anti-Hermitian (n, n) matrices with trace zero. The n-th
order Lie algebras 𝔬(n) and 𝔰𝔬(n) corresponding to the Lie groups O(n) and SO(n),
respectively, comprise all the real skew-symmetric (n, n) matrices.

Notice that if A and B are anti-Hermitian matrices, so are A + B and cA (c: real
number). This is true of skew-symmetric matrices as well. Hence, 𝔲(n), 𝔰𝔲(n), 𝔬(n),
and 𝔰𝔬(n) form linear vector spaces.

In Sect. 20.4.1 we introduced the commutator. Commutators are ubiquitous in
quantum mechanics, especially as commutation relations (see Part I). Their major
properties are as follows:

(i) [X + Y, Z] = [X, Z] + [Y, Z],
(ii) [aX, Y] = a[X, Y]   (a ∈ ℝ),
(iii) [X, Y] = −[Y, X],
(iv) [X, [Y, Z]] + [Y, [Z, X]] + [Z, [X, Y]] = 0.   (20.267)

The last equation of (20.267) is well known as Jacobi's identity. Readers are
encouraged to check it.
Since the Lie algebra forms a linear vector space of finite dimension, we denote
its basis vectors by X₁, X₂, ⋯, X_d. Then, we have

𝔤(d) = Span{X₁, X₂, ⋯, X_d},   (20.268)

where 𝔤(d) stands for a Lie algebra of dimension d. As their commutators belong to
𝔤(d) as well, with an arbitrary pair of X_i and X_j (i, j = 1, 2, ⋯, d) we have

[X_i, X_j] = Σ_{k=1}^{d} f_{ijk} X_k,   (20.269)

where the set of real coefficients f_{ijk} is called the structure constants, which define
the structure of the Lie algebra.
Example 20.4 In (20.42) we write ζ₁ ≡ ζ_x, ζ₂ ≡ ζ_y, and ζ₃ ≡ ζ_z. Rewriting it
explicitly, we have

ζ₁ = (1/2)\begin{pmatrix} 0 & -i \\ -i & 0 \end{pmatrix},  ζ₂ = (1/2)\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix},  ζ₃ = (1/2)\begin{pmatrix} -i & 0 \\ 0 & i \end{pmatrix}.   (20.270)

Then, we get

[ζ_i, ζ_j] = Σ_{k=1}^{3} ε_{ijk} ζ_k,   (20.271)

where ε_{ijk} is called the Levi-Civita symbol [4] and is defined by

ε_{ijk} = +1 for (i, j, k) = (1, 2, 3), (2, 3, 1), (3, 1, 2);
ε_{ijk} = −1 for (i, j, k) = (3, 2, 1), (1, 3, 2), (2, 1, 3);
ε_{ijk} = 0 otherwise.   (20.272)

Notice that in (20.272), if (i, j, k) represents an even permutation of (1, 2, 3), ε_{ijk} = 1,
but if (i, j, k) is an odd permutation of (1, 2, 3), ε_{ijk} = −1. Otherwise (e.g., for
i = j, j = k, or k = i), ε_{ijk} = 0. The relation of (20.271) is essentially the same as
(3.30) and (3.69) of Chap. 3. We have

𝔰𝔲(2) = Span{ζ₁, ζ₂, ζ₃}.   (20.273)
Example 20.5 In (20.26) we write A₁ ≡ A_x, A₂ ≡ A_y, and A₃ ≡ A_z. Rewriting it, we
have

A₁ = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix},  A₂ = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ -1 & 0 & 0 \end{pmatrix},  A₃ = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.   (20.274)

Again, we have

[A_i, A_j] = Σ_{k=1}^{3} ε_{ijk} A_k.   (20.275)

This is of the same form as (20.271). Namely,

𝔰𝔬(3) = 𝔬(3) = Span{A₁, A₂, A₃}.

Equation (20.274) clearly shows that the diagonal elements of A_i (i = 1, 2, 3)
are zero.
From Examples 20.4 and 20.5, 𝔰𝔲(2) and 𝔰𝔬(3) [or 𝔬(3)] structurally resemble
each other. This fact is directly related to the similarity between SU(2) and SO(3). Note
also that the dimensionality of 𝔰𝔲(2) and 𝔰𝔬(3) is the same in terms of linear vector
spaces.

In Chap. 3, we dealt with the generalized angular momentum using the relation
(3.30). Using (20.15), we can readily obtain relations of the same form as (20.271)
and (20.275). That is, from (3.30), defining the following anti-Hermitian operators
as

J̃_x ≡ −iJ_x/ħ,  J̃_y ≡ −iJ_y/ħ,  J̃_z ≡ −iJ_z/ħ,

we get

[J̃_l, J̃_m] = Σ_n ε_{lmn} J̃_n,   (20.276)

where l, m, n stand for x, y, z. The derivation is left for readers.
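A direct numerical verification of the structure constants (20.271) and (20.275) may also be instructive. The sketch below (NumPy assumed) checks [X_i, X_j] = Σ_k ε_{ijk} X_k for both bases, with the sign conventions of (20.270) and (20.274) used above.

```python
import numpy as np

zeta = [0.5 * np.array(m) for m in ([[0, -1j], [-1j, 0]],
                                    [[0, -1], [1, 0]],
                                    [[-1j, 0], [0, 1j]])]
A = [np.array(m, dtype=complex) for m in ([[0, 0, 0], [0, 0, -1], [0, 1, 0]],
                                          [[0, 0, 1], [0, 0, 0], [-1, 0, 0]],
                                          [[0, -1, 0], [1, 0, 0], [0, 0, 0]])]

def eps(i, j, k):
    """Levi-Civita symbol of (20.272), with indices 0..2; returns +1, -1, or 0."""
    return ((i - j) * (j - k) * (k - i)) // 2

for basis in (zeta, A):
    ok = all(np.allclose(basis[i] @ basis[j] - basis[j] @ basis[i],
                         sum(eps(i, j, k) * basis[k] for k in range(3)))
             for i in range(3) for j in range(3))
    print(ok)   # True for both the su(2) and the so(3) basis
```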
20.4.3 Adjoint Representation of Lie Groups

In (20.252) of Sect. 20.4.1 we defined an inner product on 𝔤 as

⟨A|B⟩ ≡ Σ_{i,j} a*_{ij} b_{ij}.   (20.252)

We can readily check that the definition (20.252) satisfies the defining properties of
the inner product, (13.2), (13.3), and (13.4). In fact, we have

⟨B|A⟩ = Σ_{i,j} b*_{ij} a_{ij} = (Σ_{i,j} a*_{ij} b_{ij})* = ⟨A|B⟩*.   (20.277)
Equation (13.3) is obvious from the calculation rules for matrices. With (13.4),
we get

⟨A|A⟩ = Tr(A†A) = Σ_{i,j} |a_{ij}|² ≥ 0,   (20.278)

where we have A = (a_{ij}), and the equality holds if and only if all a_{ij} = 0, i.e., A = 0.
Thus, ⟨A|A⟩ gives a positive definite inner product on 𝔤.

From (20.278), we may equally define an inner product on 𝔤 as

⟨A|B⟩ = Tr(A†B).   (20.279)

In fact, we have

Tr(A†B) = Σ_{i,j} (A†)_{ij} (B)_{ji} = Σ_{i,j} a*_{ji} b_{ji} ≡ ⟨A|B⟩.

In another way, we can readily compare (20.279) with (20.252) using, e.g., a
(2, 2) matrix and find that both equations give the same result. It is left for readers as
an exercise.
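For readers who prefer a numerical check to the hand comparison, the following sketch (NumPy assumed; the matrices are random examples) confirms that (20.252) and (20.279) give the same value.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
B = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
by_components = np.sum(A.conj() * B)        # sum_ij a_ij^* b_ij, as in (20.252)
by_trace = np.trace(A.conj().T @ B)         # Tr(A^dagger B), as in (20.279)
print(np.isclose(by_components, by_trace))  # True
```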
From Theorem 20.4, 𝔲(n) corresponding to the unitary group U(n) consists of all
the anti-Hermitian (n, n) matrices. With these matrices, we have

⟨B|A⟩ = Tr(B†A) = Σ_{i,j} b*_{ji} a_{ji} = Σ_{i,j} (−b_{ij})(−a*_{ij}) = Σ_{i,j} a*_{ij} b_{ij} = ⟨A|B⟩.   (20.280)

Then, comparing (20.277) and (20.280), we have

⟨A|B⟩* = ⟨A|B⟩.

This means that ⟨A|B⟩ is real. For example, using 𝔲(2), let us evaluate an inner
product. We denote arbitrary X₁, X₂ ∈ 𝔲(2) by

X₁ = \begin{pmatrix} ia & c + id \\ -c + id & ib \end{pmatrix},  X₂ = \begin{pmatrix} ip & r + is \\ -r + is & iq \end{pmatrix},

where a, b, c, d; p, q, r, s are real. Then, according to the calculation rule for the inner
product of the Lie algebra, we have

⟨X₁|X₂⟩ = ap + 2(cr + ds) + bq,

which certainly gives a real inner product. Notice that this is the case with 𝔰𝔲(2) as well.
Next, let us think of the following inner product:

⟨gXg⁻¹ | gYg⁻¹⟩ = Tr[(gXg⁻¹)†(gYg⁻¹)],   (20.281)

where g is any non-singular matrix.

Bearing in mind that we are particularly interested in the continuous groups U(n)
[or its subgroup SU(n)] and O(n) [or its subgroup SO(n)], we assume that g is an
element of those groups and is represented by a unitary matrix (including an orthog-
onal matrix); i.e., g⁻¹ = g† (or gᵀ). Then, if g is a unitary matrix, we have

Tr[(gXg⁻¹)†(gYg⁻¹)] = Tr[(gXg†)†(gYg†)] = Tr(gX†g†gYg†)
  = Tr(gX†Yg†) = Tr(X†Y) = ⟨X|Y⟩,   (20.282)

where with the second last equality we used (12.13), namely the invariance of the trace
under the (unitary) similarity transformation. If g is represented by a real orthogonal
matrix, instead of (20.282) we have
Tr[(gXg⁻¹)†(gYg⁻¹)] = Tr[(gXgᵀ)†(gYgᵀ)] = Tr[(gᵀ)†X†g†gYgᵀ]
  = Tr(ḡX†gᵀgYgᵀ) = Tr(gX†Ygᵀ) = Tr(X†Y) = ⟨X|Y⟩,   (20.283)

where ḡ is the complex conjugate matrix of g (for a real matrix, ḡ = g). Combining
(20.281) with (20.282) or (20.283), we have

⟨gXg⁻¹ | gYg⁻¹⟩ = ⟨X|Y⟩.   (20.284)

This relation clearly shows that the real inner product of (20.284) remains
unchanged under the operation

X ⟼ gXg⁻¹,

where X is an anti-Hermitian (or skew-symmetric) matrix and g is a unitary
(or orthogonal) matrix.
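The invariance (20.284) can likewise be verified numerically. A minimal sketch follows (NumPy assumed; X and Y are random anti-Hermitian matrices, and g is a random unitary matrix obtained from a QR decomposition).

```python
import numpy as np

rng = np.random.default_rng(1)
def anti_hermitian(n):
    M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (M - M.conj().T) / 2

X, Y = anti_hermitian(3), anti_hermitian(3)
g, _ = np.linalg.qr(rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3)))
ip = lambda A, B: np.trace(A.conj().T @ B)     # inner product (20.279)
print(np.isclose(ip(g @ X @ g.conj().T, g @ Y @ g.conj().T), ip(X, Y)))  # True
```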
Now, we give the following definition and related theorems concerning both
Lie groups and Lie algebras.

Definition 20.4 Let G be a Lie group chosen from U(n) [including SU(n)] and O(n)
[including SO(n)]. Let 𝔤 be the Lie algebra corresponding to G. Let g ∈ G and X ∈ 𝔤.
We define the following transformation on 𝔤 such that

Ad[g](X) ≡ gXg⁻¹.   (20.285)

Then, Ad[g] is said to be an adjoint representation of G on 𝔤. We write the relation
(20.285) as

Ad[g]: 𝔤 → 𝔤.

That is, Ad[g](X) ∈ 𝔤. The operator Ad[g] is a kind of mapping (i.e., an endomor-
phism) discussed in Sect. 11.2. Notice that both g and X are represented by (n, n)
matrices. The matrix g is either a unitary matrix or an orthogonal matrix. The matrix
X is either an anti-Hermitian matrix or a skew-symmetric matrix.
Theorem 20.5 Let g be an element of a Lie group G and let 𝔤 be the Lie algebra of G.
Then, Ad[g] is a linear transformation on 𝔤.

Proof Let X, Y ∈ 𝔤. Then, we have

Ad[g](aX + bY) ≡ g(aX + bY)g⁻¹ = a(gXg⁻¹) + b(gYg⁻¹)
  = aAd[g](X) + bAd[g](Y).   (20.286)

Thus, we find that Ad[g] is a linear transformation. By Definition 20.3, exp
(tX) ∈ G with an arbitrary real number t. Meanwhile, we have

exp{tAd[g](X)} = exp(tgXg⁻¹) = exp(gtXg⁻¹) = g exp(tX)g⁻¹,

where with the last equality we used (15.30). Then, if g ∈ G, g exp(tX)g⁻¹ ∈ G.
Again from Definition 20.3, Ad[g](X) ∈ 𝔤. That is, Ad[g] is a linear transformation
on 𝔤. This completes the proof. ∎

Notice that in Theorem 20.5, G may be any linear Lie group contained in
GL(n, ℂ). The Lie algebra corresponding to GL(n, ℂ) is denoted by 𝔤𝔩(n, ℂ). The
following theorem shows that Ad[g] is a representation.
Theorem 20.6 Let g be an element of a Lie group G. Then, g ⟼ Ad[g] is a
representation of G on 𝔤.

Proof From (20.285), we have

Ad[g₁g₂](X) ≡ (g₁g₂)X(g₁g₂)⁻¹ = g₁g₂Xg₂⁻¹g₁⁻¹ = g₁Ad[g₂](X)g₁⁻¹
  = Ad[g₁](Ad[g₂](X)) = (Ad[g₁]Ad[g₂])(X).   (20.287)

Comparing the first and last sides of (20.287), we get

Ad[g₁g₂] = Ad[g₁]Ad[g₂].

That is, g ⟼ Ad[g] is a representation of G on 𝔤. ∎

By virtue of Theorem 20.6, we call g ⟼ Ad[g] an adjoint representation of G on
𝔤. Once again, we remark that

X ⟼ Ad[g](X) ≡ X′ or Ad[g]: 𝔤 → 𝔤   (20.288)

Fig. 20.5 Linear transformation Ad[g]: 𝔤 → 𝔤 (i.e., endomorphism). Ad[g] is an endomorphism that
operates on the vector space 𝔤
is a linear transformation 𝔤 → 𝔤, namely an endomorphism that operates on the vector
space 𝔤 (see Sect. 11.2). Figure 20.5 schematically depicts it. The notation of the
adjoint representation Ad is somewhat confusing. That is, (20.288) shows that Ad[g]
is a linear transformation 𝔤 → 𝔤 (i.e., an endomorphism). On the other hand, Ad is
thought of as a mapping G → G′, where G and G′ mean two groups. Namely, we
express it symbolically as [9]

g ⟼ Ad[g] or Ad: G → G′.   (20.289)

The groups G and G′ may or may not be identical. Examples can be seen in, e.g., (20.294)
and (20.302) later.
As mentioned above, if 𝔤 is either 𝔲(n) or 𝔬(n), Ad[g] is an orthogonal
transformation on 𝔤. The representation of Ad[g] is real accordingly.
An immediate implication of this is that the transformation of the basis vectors of
𝔤 has a connection with the corresponding Lie groups such as U(n) and O(n).
Moreover, it is well known and well studied that there is a close relationship between
Ad[SU(2)] and SO(3). First, we wish to seek basis vectors of 𝔰𝔲(2). A general form
of X ∈ 𝔰𝔲(2) is the following traceless anti-Hermitian matrix:

X = \begin{pmatrix} ic & a + ib \\ -a + ib & -ic \end{pmatrix},

where a, b, and c are real. We have encountered this type of matrix in (20.42) and
(20.270). Using the inner product described in (20.252), we determine an orthonor-
mal basis set of 𝔰𝔲(2) such that

e₁ = (1/√2)\begin{pmatrix} 0 & -i \\ -i & 0 \end{pmatrix},  e₂ = (1/√2)\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix},  e₃ = (1/√2)\begin{pmatrix} -i & 0 \\ 0 & i \end{pmatrix},   (20.290)

where ⟨e_i|e_j⟩ = δ_{ij} (i, j = 1, 2, 3). Note that e_k = √2 ζ_k (k = 1, 2, 3), with ζ_k
given in (20.270).
Choosing the SU(2) elements

g_α ≡ \begin{pmatrix} e^{-iα/2} & 0 \\ 0 & e^{iα/2} \end{pmatrix}  and  g_β ≡ \begin{pmatrix} \cos(β/2) & -\sin(β/2) \\ \sin(β/2) & \cos(β/2) \end{pmatrix}

that appeared in (20.43) and (20.44), we have [9]
Ad[g_α](e₁) = g_α e₁ g_α⁻¹
  = (1/√2)\begin{pmatrix} e^{-iα/2} & 0 \\ 0 & e^{iα/2} \end{pmatrix}\begin{pmatrix} 0 & -i \\ -i & 0 \end{pmatrix}\begin{pmatrix} e^{iα/2} & 0 \\ 0 & e^{-iα/2} \end{pmatrix}
  = (1/√2)\begin{pmatrix} 0 & -\sin α - i\cos α \\ \sin α - i\cos α & 0 \end{pmatrix} = e₁ cos α + e₂ sin α.   (20.291)

Similarly, we get

Ad[g_α](e₂) = (1/√2)\begin{pmatrix} 0 & -\cos α + i\sin α \\ \cos α + i\sin α & 0 \end{pmatrix} = −e₁ sin α + e₂ cos α,   (20.292)

Ad[g_α](e₃) = (1/√2)\begin{pmatrix} -i & 0 \\ 0 & i \end{pmatrix} = e₃.   (20.293)
Notice that each of (20.291)–(20.293) was obtained as a product of three simple
(2, 2) matrices. Thus, we obtain

(e₁ e₂ e₃) Ad[g_α] = (e₁ e₂ e₃)\begin{pmatrix} \cos α & -\sin α & 0 \\ \sin α & \cos α & 0 \\ 0 & 0 & 1 \end{pmatrix}.   (20.294)

From (20.294), as the real representation we get

Ad[g_α] = \begin{pmatrix} \cos α & -\sin α & 0 \\ \sin α & \cos α & 0 \\ 0 & 0 & 1 \end{pmatrix}.   (20.295)

Calculating (e₁ e₂ e₃) Ad[g_β] likewise, we get

Ad[g_β] = \begin{pmatrix} \cos β & 0 & \sin β \\ 0 & 1 & 0 \\ -\sin β & 0 & \cos β \end{pmatrix}.   (20.296)
Moreover, with

g_γ = \begin{pmatrix} e^{-iγ/2} & 0 \\ 0 & e^{iγ/2} \end{pmatrix},   (20.297)

we have

Ad[g_γ] = \begin{pmatrix} \cos γ & -\sin γ & 0 \\ \sin γ & \cos γ & 0 \\ 0 & 0 & 1 \end{pmatrix}.   (20.298)
Finally, let us reproduce (17.101) by calculating

Ad[g_α] · Ad[g_β] · Ad[g_γ] = Ad[g_α g_β g_γ],   (20.299)

where we used the fact that g_ω ⟼ Ad[g_ω] (ω = α, β, γ) is a representation of SU(2)
on 𝔰𝔲(2). To obtain an explicit form of the representation matrix of Ad[g]
[g ∈ SU(2)], we use

g_{αβγ} ≡ g_α g_β g_γ = \begin{pmatrix} e^{-i(α+γ)/2}\cos(β/2) & -e^{-i(α-γ)/2}\sin(β/2) \\ e^{i(α-γ)/2}\sin(β/2) & e^{i(α+γ)/2}\cos(β/2) \end{pmatrix}   (20.300)

and calculate Ad[g_{αβγ}](e₁) such that

Ad[g_{αβγ}](e₁) = g_{αβγ} e₁ g_{αβγ}⁻¹.

Carrying out the (2, 2) matrix multiplications, this becomes

(1/√2)\begin{pmatrix} i\sinβ\cosγ & -i(\cosα\cosβ\cosγ - \sinα\sinγ) - (\sinα\cosβ\cosγ + \cosα\sinγ) \\ -i(\cosα\cosβ\cosγ - \sinα\sinγ) + \sinα\cosβ\cosγ + \cosα\sinγ & -i\sinβ\cosγ \end{pmatrix}

  = (e₁ e₂ e₃)\begin{pmatrix} \cosα\cosβ\cosγ - \sinα\sinγ \\ \sinα\cosβ\cosγ + \cosα\sinγ \\ -\sinβ\cosγ \end{pmatrix}.   (20.301)

Note that the matrix of (20.300) is identical with (20.45). Similarly calculating Ad
[g_{αβγ}](e₂) and Ad[g_{αβγ}](e₃), we obtain
(e₁ e₂ e₃) Ad[g_{αβγ}] =
(e₁ e₂ e₃)\begin{pmatrix} \cosα\cosβ\cosγ - \sinα\sinγ & -\cosα\cosβ\sinγ - \sinα\cosγ & \cosα\sinβ \\ \sinα\cosβ\cosγ + \cosα\sinγ & -\sinα\cosβ\sinγ + \cosα\cosγ & \sinα\sinβ \\ -\sinβ\cosγ & \sinβ\sinγ & \cosβ \end{pmatrix}.   (20.302)

The matrix on the RHS of (20.302) is the same as (17.101), which contains the Euler
angles. The calculations are somewhat lengthy but straightforward. As mentioned just above
and as can be seen, e.g., in (20.294) and (20.302), we may symbolically express the
adjoint representation as [9]

Ad: SU(2) → SO(3)   (20.303)

in parallel to (20.289). Accordingly, from (20.302) we may identify Ad[g_{αβγ}] with
the SO(3) matrix (17.101) and write

Ad[g_{αβγ}] = \begin{pmatrix} \cosα\cosβ\cosγ - \sinα\sinγ & -\cosα\cosβ\sinγ - \sinα\cosγ & \cosα\sinβ \\ \sinα\cosβ\cosγ + \cosα\sinγ & -\sinα\cosβ\sinγ + \cosα\cosγ & \sinα\sinβ \\ -\sinβ\cosγ & \sinβ\sinγ & \cosβ \end{pmatrix}.   (20.304)
In the above discussion, we make several remarks. (i) g ∈ SU(2) is described by
(2, 2) complex matrices, but Ad[SU(2)] is expressed by (3, 3) real orthogonal
matrices. Thus, the dimension of the matrices is different. (ii) Notice again that
(e₁ e₂ e₃) of (20.302) is an orthonormal basis set of 𝔰𝔲(2). Hence, (20.302) can be
viewed as an orthogonal transformation of (e₁ e₂ e₃). That is, we may regard Ad[SU(2)]
as comprising all the matrix representations of SO(3). From (12.64) and (20.302), we write
[9]

Ad[SU(2)] − SO(3) = ∅ or Ad[SU(2)] = SO(3).

Regarding the adjoint representation of G, we have the following important
theorem.

Theorem 20.7 The adjoint representation Ad[SU(2)] is a surjective homomor-
phism of SU(2) onto SO(3). The kernel F of the representation is {e, −e} ⊂ SU(2).
With any rotation R ∈ SO(3), exactly two elements ±g ∈ SU(2) satisfy Ad[g] = Ad
[−g] = R.
Proof We have shown by (20.302) that the adjoint representation Ad[SU(2)] is a
surjective mapping of SU(2) onto SO(3). Since Ad[g] is a representation, namely a
homomorphism mapping from SU(2) to SO(3), we seek its kernel on the basis of
Theorem 16.3 (Sect. 16.4). Let h be an element of the kernel of SU(2) such that

Ad[h] = E ∈ SO(3),   (20.305)

where E is the identity transformation of SO(3). Operating (20.305) on e₃ ∈ 𝔰𝔲(2),
from (20.305) we have the following expression:

Ad[h](e₃) = he₃h⁻¹ = E(e₃) = e₃ or he₃ = e₃h,

where e₃ is given by (20.290). Expressing h as \begin{pmatrix} h₁₁ & h₁₂ \\ h₂₁ & h₂₂ \end{pmatrix}, we get

\begin{pmatrix} -ih₁₁ & ih₁₂ \\ -ih₂₁ & ih₂₂ \end{pmatrix} = \begin{pmatrix} -ih₁₁ & -ih₁₂ \\ ih₂₁ & ih₂₂ \end{pmatrix},

where the common factor 1/√2 has been dropped. From the above relation, we must
have h₁₂ = h₂₁ = 0. As h ∈ SU(2), det h = 1. Hence, for h we choose
h = \begin{pmatrix} a & 0 \\ 0 & a⁻¹ \end{pmatrix}, where a ≠ 0. Moreover, considering he₂ = e₂h, we get

\begin{pmatrix} 0 & -a \\ a⁻¹ & 0 \end{pmatrix} = \begin{pmatrix} 0 & -a⁻¹ \\ a & 0 \end{pmatrix}.   (20.306)

This implies that a = a⁻¹, or a = ±1. That is, h = ±e, where e denotes the
identity element of SU(2). (Note that we get no further conditions from he₁ = e₁h.)
Thus, we obtain

F = {e, −e}.   (20.307)

In fact, with respect to any X ∈ 𝔰𝔲(2) we have

Ad[−e](X) = (−e)X(−e)⁻¹ = (−e)X(−e) = X.   (20.308)

Therefore, certainly we get

Ad[−e] = E.   (20.309)
Meanwhile, suppose that with g₁, g₂ ∈ SU(2) we have Ad[g₁] = Ad[g₂] [∈ SO(3)].
Then, we have

Ad[g₁g₂⁻¹] = Ad[g₁]Ad[g₂⁻¹] = Ad[g₂]Ad[g₂⁻¹] = Ad[g₂g₂⁻¹]
  = Ad[e] = E.   (20.310)

Therefore, from (20.307) and (20.309) we get

g₁g₂⁻¹ = ±e or g₁ = ±g₂.   (20.311)

Conversely, we have

Ad[−g] = Ad[−e]Ad[g] = Ad[g],   (20.312)

where with the first equality we used Theorem 20.6 and with the second equality we
used (20.309). The relations (20.311) and (20.312) imply that with any g ∈ SU(2)
there are exactly two elements ±g that satisfy Ad[g] = Ad[−g]. These complete the
proof. ∎
In the above proof of Theorem 20.7, (20.309) is a simple but important expres-
sion. In view of Theorem 16.4 (the homomorphism theorem), we summarize the above
discussion as follows: Let Ad be a homomorphism mapping such that

Ad: SU(2) ⟶ SO(3)   (20.313)

with the kernel F = {e, −e}. The relation is schematically represented in Fig. 20.6.
Notice the resemblance between Fig. 20.6 and Fig. 16.1a.

Fig. 20.6 Homomorphism mapping Ad: SU(2) ⟶ SO(3) with the kernel F = {e, −e}. E is the
identity transformation of SO(3)

Correspondingly, from Theorem 16.4 of Sect. 16.4 we have an isomorphic
mapping Ãd such that

Ãd: SU(2)/F ⟶ SO(3),   (20.314)

where F = {e, −e}. Symbolically writing (20.314) as in Sect. 16.4, we have
SU(2)/F ≅ SO(3). Notice that F is an invariant subgroup of SU(2). Using the general
form (20.300) of SU(2) and the general form (20.302) of SO(3), we can readily
show Ad[g_{αβγ}] = Ad[−g_{αβγ}]. That is, from (20.300) we have

g_{α+2π, β+2π, γ+2π} = −g_{αβγ}.

Both the two elements ±g_{αβγ} produce the identical orthogonal
matrix given on the RHS of (20.302).
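The 2:1 correspondence can be made concrete numerically. The sketch below (NumPy assumed) computes the matrix of Ad[g] by expanding g e_k g⁻¹ in the basis (20.290) (with the sign conventions adopted above), and confirms that the result lies in SO(3) and that g and −g give the same matrix.

```python
import numpy as np

s = 1 / np.sqrt(2)
e = [s * np.array(m) for m in ([[0, -1j], [-1j, 0]],
                               [[0, -1], [1, 0]],
                               [[-1j, 0], [0, 1j]])]
ip = lambda A, B: np.trace(A.conj().T @ B)     # inner product (20.279)

def Ad(g):
    """Matrix of X -> g X g^{-1} in the orthonormal basis (e1, e2, e3)."""
    cols = [[ip(e[i], g @ e[k] @ g.conj().T).real for i in range(3)]
            for k in range(3)]
    return np.array(cols).T

a, b, c = 0.4, 1.1, -0.7                       # Euler angles (alpha, beta, gamma)
g = (np.diag([np.exp(-1j * a / 2), np.exp(1j * a / 2)])
     @ np.array([[np.cos(b / 2), -np.sin(b / 2)],
                 [np.sin(b / 2),  np.cos(b / 2)]])
     @ np.diag([np.exp(-1j * c / 2), np.exp(1j * c / 2)]))   # g_{abc} of (20.300)
R = Ad(g)
print(np.allclose(R.T @ R, np.eye(3)), np.isclose(np.linalg.det(R), 1.0))
print(np.allclose(Ad(-g), R))                  # Ad[g] = Ad[-g], Theorem 20.7
```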
To associate the above results of abstract algebra with the tangible matrix form
obtained earlier, we rewrite (20.54) by applying the simple trigonometric formula

sin β = 2 sin(β/2) cos(β/2).

That is, we get

D^{(j)}_{m'm}(\alpha,\beta,\gamma)
= \sum_{\mu}\frac{(-1)^{m'-m+\mu}\sqrt{(j+m)!\,(j-m)!\,(j+m')!\,(j-m')!}}{(j+m-\mu)!\,\mu!\,(m'-m+\mu)!\,(j-m'-\mu)!}\,
e^{-i(\alpha m'+\gamma m)}\left(\cos\frac{\beta}{2}\right)^{2j}
\left[\left(\cos\frac{\beta}{2}\right)^{m-m'-2\mu}\left(\sin\frac{\beta}{2}\right)^{m'-m+2\mu}\right]

= \sum_{\mu}\frac{(-1)^{m'-m+\mu}\sqrt{(j+m)!\,(j-m)!\,(j+m')!\,(j-m')!}}{(j+m-\mu)!\,\mu!\,(m'-m+\mu)!\,(j-m'-\mu)!}\,
2^{m-m'-2\mu}\, e^{-i(\alpha m'+\gamma m)}\,(\sin\beta)^{m'-m+2\mu}
\left(\cos\frac{\beta}{2}\right)^{2j+2(m-m'-2\mu)}.   (20.315)
From (20.315), we clearly see that (i) if j is zero or a positive integer, so are m and
m′. Therefore, (20.315) is a periodic function of period 2π with respect to α, β, and γ. But
(ii) if j is a half-odd-integer, so are m and m′. Then, (20.315) is a periodic function
of period 4π. Note that in the case of (ii), 2j = 2n + 1 (n = 0, 1, 2, ⋯). Therefore, in
(20.315) we have

(cos β/2)^{2j+2(m−m′−2μ)} = (cos β/2) · (cos β/2)^{2(n+m−m′−2μ)}.   (20.316)

As studied in Sect. 20.2.3, in the former case (i) the spherical surface harmonics
span the representation space of D^{(l)} (l = 0, 1, 2, ⋯). The simplest case for it is
D^{(0)} = 1, a (1, 1) identity matrix, i.e., merely the number 1. The second simplest case
was given by (20.74), which describes D^{(1)}. Meanwhile, the simplest case for (ii) is
D^{(1/2)}, whose matrix representation was given by D^{(1/2)}(α, β, γ) in (20.45) or by g_{αβγ} in
(20.300). With g_{αβγ}, both Ad[±g_{αβγ}] produce the real orthogonal (3, 3) matrix
expressed as (20.302). This representation matrix naturally belongs to SO(3).
20.5 Connectedness of Lie Groups

As already discussed in Chap. 6, connectedness, reflecting topological characteristics,
is an important concept for Lie groups. When we thought of the analytic
functions on the complex plane ℂ, the connectedness was easily envisaged geo-
metrically. To deal with the connectedness of general Lie groups, on the other hand,
we need abstract concepts. In this section we take a down-to-earth approach as much
as possible.
20.5.1 Several Definitions and Examples

To discuss this topic, let us give several definitions and examples.

Definition 20.5 Suppose that A is a subset of GL(n, ℂ) and that we have a pair of
elements a, b ∈ A ⊂ GL(n, ℂ). Meanwhile, let f(t) be a continuous function defined
on the interval 0 ≤ t ≤ 1 that takes values within A; that is, f(t) ∈ A for all t. If f(0) = a
and f(1) = b, then a and b are said to be connected within A.

Major parts of the discussion that follows are based mostly upon the literature [9]. A
simple example for the above is given below.
   
1 0 1 0
Example 20.6 Let a ¼ and b ¼ be elements of SO(2). Let f
0 1 0 1
(x) be described by
 
cos πt  sin πt
f ðt Þ ¼ :
sin πt cos πt

Then, f(t) is a continuous matrix function of all real numbers t. We have f(0) ¼ a
and f(1) ¼ b and, hence, a and b are connected to each other within SO(2).
For a and b to be connected within A is denoted by a ~ b. The symbol ~ then
satisfies the equivalence relation. That is, we have the following proposition.

Proposition 20.2 Let a, b, c ∈ A ⊂ GL(n, ℂ). Then, we have the following three
relations: (i) a ~ a. (ii) If a ~ b, then b ~ a. (iii) If a ~ b and b ~ c, then a ~ c.

Proof (i) Let f(t) = a (0 ≤ t ≤ 1). Then, f(0) = f(1) = a, so that we have a ~ a. (ii) Let
f(t) be a continuous curve that connects a and b such that f(0) = a and f(1) = b
(0 ≤ t ≤ 1), and so a ~ b. Then, g(t) ≡ f(1 − t) is another continuous curve within A,
with g(0) = b and g(1) = a. This implies b ~ a. (iii) Let f(t) and g(t) be two continuous
curves that connect a with b and b with c, respectively. Meanwhile, we define h(t) as

h(t) = f(2t)   (0 ≤ t ≤ 1/2);   h(t) = g(2t − 1)   (1/2 ≤ t ≤ 1),

so that h(t) is a third continuous curve; indeed, h(1/2) = f(1) = g(0). From the
supposition we have h(0) = f(0) = a, h(1/2) = f(1) = g(0) = b, and h(1) = g(1) = c.
This implies that if a ~ b and b ~ c, then we have a ~ c. ∎
Let us give another related definition.

Definition 20.6 Let A be a subset of GL(n, ℂ) and let a point a ∈ A. Then, the collection
of points that are connected to a within A is called the connected component of a and
denoted by C(a).

From Proposition 20.2 (i), a ∈ C(a). Let C(a) and C(b) be two connected
components. We have two alternatives with C(a) and C(b): (I) C(a) ∩ C(b) = ∅;
(II) C(a) = C(b). Suppose c ∈ C(a) ∩ C(b). Then, it follows that c ~ a and c ~ b. From
Proposition 20.2 (ii) and (iii), we have a ~ b. Hence, we get C(a) = C(b). Thus, the
subset A is a direct sum of several connected components such that

A = C(a) ∪ C(b) ∪ ⋯ ∪ C(z) with any C(p) ∩ C(q) = ∅ (p ≠ q),   (20.317)

where p, q are taken from among a, b, ⋯, z. In particular, the connected component
containing e (i.e., the identity element) is of major importance.

Definition 20.7 Let A be a subset of GL(n, ℂ). If any pair of elements a, b (∈ A) is
connected to each other within A, then A is connected, or A is called a connected set.
A connected set is nothing but a set that consists of a single connected component.
For connected sets, we have the following important theorem.

Theorem 20.8 An image of a connected set A obtained by a continuous mapping
f is a connected set as well.

Proof The proof is almost self-evident. Let f: A ⟶ B be a continuous mapping from
a connected set A to B and suppose f(A) = B. Take any b, b′ ∈ B; from the supposition
f(A) = B, there exist a, a′ ∈ A that satisfy f(a) = b and f(a′) = b′. Then, from
Definition 20.5 there is a continuous function g(t) (0 ≤ t ≤ 1) that satisfies g(0) = a
and g(1) = a′. Now, define a function h(t) ≡ f[g(t)]. Then, h(t) is continuous, with
h(0) = f[g(0)] = f(a) = b and h(1) = f[g(1)] = f(a′) = b′. Again from Definition 20.5,
h(t) is a continuous curve that connects b and b′ within B. Consequently, from
Definition 20.7, B is a connected set as well. This completes the proof. ∎
Applications of Theorem 20.8 are given below as examples.

Example 20.7 Let f(x) be a real continuous function defined on a subset A ⊂ GL(n, ℝ),
where x ∈ A with A being a connected set. Suppose that for a pair of
a, b ∈ A, f(a) = p and f(b) = q (p, q: real) with p < q. Then, f(x) takes all real
numbers on the interval [p, q] as its values. In other words, we have
f(A) ⊃ [p, q]. This is well known as the intermediate value theorem.

Example 20.8 [9] Suppose that A is a connected subset of GL(n, ℝ). Then, with any
pair of elements a, b ∈ A ⊂ GL(n, ℝ), we must have a continuous function g(t)
(0 ≤ t ≤ 1) such that g(0) = a and g(1) = b. Meanwhile, the determinant is defined for
every element of A as a real continuous function.

Now, suppose that det a = p < 0 and det b = q > 0 with p < q. Then, from
Theorem 20.8 we would have det(A) ⊃ [p, q]. Consequently, we would have some
x ∈ A such that det(x) = 0. But this contradicts A ⊂ GL(n, ℝ), because we
must have det x ≠ 0 for every x ∈ A. Thus, within a connected set the sign of the
determinant must be constant, provided the determinant is a real continuous function.
In relation to the discussion of connectedness, we have the following general
theorem.

Theorem 20.9 [9] Let G₀ be the connected component within a linear Lie group
G that contains the identity element e. Then, G₀ is an invariant subgroup of G. The
connected component C(g) containing g ∈ G is identical with gG₀ = G₀g.

Proof Let a, b ∈ G₀. Since G₀ = C(e), from the supposition there are continuous
curves f(t) and g(t) within G₀ that connect a with e and b with e, respectively, such
that f(0) = a, f(1) = e and g(0) = b, g(1) = e. Meanwhile, let h(t) ≡ f(t)[g(t)]⁻¹. Then,
h(t) is another continuous curve. We have h(0) = f(0)[g(0)]⁻¹ = ab⁻¹ and h(1) = f(1)
[g(1)]⁻¹ = e·e⁻¹ = e. That is, ab⁻¹ and e are connected, which is indicative of
ab⁻¹ ∈ G₀. This implies that G₀ is a subgroup of G.

Next, with any g ∈ G and any a ∈ G₀ we have f(t) in the same sense as above.
Defining k(t) as k(t) = g f(t)g⁻¹, k(t) is a continuous curve within G with
k(0) = gag⁻¹ and k(1) = geg⁻¹ = e. Hence, we have gag⁻¹ ∈ G₀. Since a is an
arbitrary point of G₀, we have gG₀g⁻¹ ⊂ G₀ accordingly. Similarly, g is an
arbitrary element of G, and so replacing g with g⁻¹ in the above, we get g⁻¹G₀g ⊂ G₀.
Operating with g and g⁻¹ from the left and right, respectively, we obtain G₀ ⊂ gG₀g⁻¹.
Combining this with the above relation gG₀g⁻¹ ⊂ G₀, we obtain gG₀g⁻¹ = G₀ or
gG₀ = G₀g. This means that G₀ is an invariant subgroup of G; see Sect. 16.3.

Taking any a ∈ G₀ and f(t) in the same sense as above again, l(t) = g f(t) is a
continuous curve within G with l(0) = ga and l(1) = ge = g. Then, ga and g are
connected within G. That is, ga ∈ C(g). Since a is an arbitrary point of G₀, gG₀ ⊂ C(g).
Meanwhile, choosing any p ∈ C(g), we set a continuous curve m(t) (0 ≤ t ≤ 1)
that connects p with g within G. This implies that m(0) = p and m(1) = g. Also
setting a continuous curve n(t) = g⁻¹m(t), we have n(0) = g⁻¹m(0) = g⁻¹p and
n(1) = g⁻¹m(1) = g⁻¹g = e. This means that g⁻¹p ∈ G₀. Multiplying by g on both
sides, we get p ∈ gG₀. Since p is arbitrarily chosen from C(g), we get C(g) ⊂ gG₀.
Combining this with the above relation gG₀ ⊂ C(g), we obtain C(g) = gG₀ = G₀g.
These complete the proof. ∎
In the above proof, we add the following statement: In Sect. 16.2, the necessary
and sufficient condition for a subset H to be a subgroup was given as (1) h_i, h_j ∈ H ⟹
h_i ⋄ h_j ∈ H and (2) h ∈ H ⟹ h⁻¹ ∈ H. The conditions (1) and (2) can be combined
into the single condition

h_i ∈ H, h_j ∈ H ⟹ h_i ⋄ h_j⁻¹ ∈ H.

In fact, in this combined condition, if we replace h_j with h_i, we get
h_i ⋄ h_i⁻¹ = e ∈ H. In turn, if we replace h_i with e, we get e ⋄ h_j⁻¹ = h_j⁻¹ ∈ H.
Finally, if we replace h_j with h_j⁻¹, we get h_i ⋄ (h_j⁻¹)⁻¹ = h_i ⋄ h_j ∈ H.
Thus, H satisfies the axioms (A1), (A3), and (A4) of Sect. 16.1 and, hence, forms
a (sub)group.

The above general theorem can immediately be applied to an important class of
Lie groups, SO(n). We discuss this topic below.
20.5.2 O(3) and SO(3)

In Chap. 17 we dealt with finite groups related to O(3) and SO(3), i.e., their
subgroups. In this section, we examine several of their important properties in terms of Lie
groups.

The groups O(3) and SO(3) are characterized by their determinants: det O(3) = ±1
and det SO(3) = 1. Example 20.8 implies that in O(3) there should be two different
connected components according to det O(3) = ±1. Also, Theorem 20.9 tells us that
the connected component C(E) is an invariant subgroup G₀ of O(3), where E is the
identity element of O(3). Obviously, G₀ is SO(3). In turn, the other connected
component is SO(3)ᶜ, where SO(3)ᶜ denotes the complementary set of SO(3); for
the notation, see Sect. 6.1. Remembering (20.317), we have

O(3) = SO(3) ∪ SO(3)ᶜ.   (20.318)
This can be rewritten as a direct sum such that

O(3) = C(E) ∪ C(−E) with C(E) ∩ C(−E) = ∅,   (20.319)

where E denotes the (3, 3) unit matrix and

−E = \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix}.

In the above, SO(3) is identical to C(E), with SO(3) itself being a connected set;
SO(3)ᶜ is identical with C(−E), which is another connected set.

Another description of O(3) is [13]

O(3) = SO(3) ⊗ {E, −E},   (20.320)

where the symbol ⊗ denotes a direct-product group (Sect. 16.5). An alternative
description for this is
O(3) = SO(3) ∪ {(−E)SO(3)},   (20.321)

where the direct sum is implied.

In Sect. 20.2.6, we discussed two rotations around different rotation axes with the
same rotation angle. These rotations belong to the same conjugacy class; see
(20.140). With Q as another rotation that transforms the rotation axis of R_ω to that
of R′_ω, the two rotations R_ω and R′_ω having the same rotation angle ω are associated
with each other through the following relation:

R′_ω = QR_ωQ⁻¹.   (20.140)

Notice that this relation is independent of the choice of a specific coordinate
system. In Fig. 20.4, let us choose the z-axis as the rotation axis A with respect to
the rotation R_ω. Then, the rotation matrix R_ω is expressed in reference to the
Cartesian coordinate system as

R_ω = \begin{pmatrix} \cos ω & -\sin ω & 0 \\ \sin ω & \cos ω & 0 \\ 0 & 0 & 1 \end{pmatrix}.   (20.322)
Meanwhile, multiplying both sides of (20.140) by −E, we have

−R′_ω = Q(−R_ω)Q⁻¹.   (20.323)

Note that −E is commutative with any Q (or Q⁻¹). Also, notice that from
(20.323), −R′_ω and −R_ω are again connected via a unitary similarity transformation.
From (20.322), we have

−R_ω = \begin{pmatrix} -\cos ω & \sin ω & 0 \\ -\sin ω & -\cos ω & 0 \\ 0 & 0 & -1 \end{pmatrix} = \begin{pmatrix} \cos(ω+π) & -\sin(ω+π) & 0 \\ \sin(ω+π) & \cos(ω+π) & 0 \\ 0 & 0 & -1 \end{pmatrix},   (20.324)

where −R_ω represents an improper rotation of an angle ω + π (see Sect. 17.1).
Referring to Fig. 17.12 and Table 17.6 (Sect. 17.3), as another tangible example
we further get

\begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix} = \begin{pmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & -1 \end{pmatrix},   (20.325)
where the first factor of the LHS belongs to O (the octahedral rotation group) and the
RHS belongs to T_d (the tetrahedral group); the RHS gives a mirror symmetry. Combining
(20.324) and (20.325), we get the following schematic representation:

(Rotation) × (Inversion) = (Improper rotation) or (Mirror symmetry).

This is a finite-group version of (20.320).
20.5.3 Simply Connected Lie Groups: Local Properties and Global Properties

As a final topic of Lie groups, we outline the characteristics of simply connected
groups.

In (20.318), (20.319), and (20.321) we showed three different decompositions of
O(3). This can be generalized to O(n). That is, instead of −E we may take
F described by [9]

F = \begin{pmatrix} 1 & & & & \\ & 1 & & & \\ & & \ddots & & \\ & & & 1 & \\ & & & & -1 \end{pmatrix},

where det F = −1. Then, we have

O(n) = SO(n) ∪ SO(n)ᶜ = C(E) ∪ C(F) = SO(n) + F·SO(n),   (20.326)

where the last side represents the coset decomposition (Sect. 16.2). This is essen-
tially the same as (20.321). In terms of the isomorphism, we have

O(n)/SO(n) ≅ {E, F}.   (20.327)

If in particular n is odd (n ≥ 3), we can choose −E instead of F, because
det(−E) = −1. Then, similarly to (20.320) we have

O(n) = SO(n) ⊗ {E, −E}   (n: odd),   (20.328)

where E is the (n, n) identity matrix. Note that since −E is commutative with any
element of O(n), the direct product of (20.328) is allowed.
Of the connected components, the one containing the identity element is of particular
importance. This is because, as already mentioned in Sect. 20.4.1, the "initial"
condition of the one-parameter group A(t) is set at t = 0 such that

A(0) = E.   (20.238)

Under this condition, it is important to examine how A(t) evolves with t, starting
from E at t = 0. In this connection we have the following theorems that give good
grounds to the discussion of Sect. 20.2. We describe them without proof.
Interested readers are encouraged to consult the literature [9].
Theorem 20.10 Let G be a linear Lie group. Let G₀ be the connected component of
the identity element e ∈ G. Then, G₀ is another linear Lie group, and the Lie algebras
of G and G₀ are identical.

Theorem 20.10 shows the significance of the connected component of the
identity. From this theorem, the Lie algebras of, e.g., O(n) and SO(n) are identical;
namely, both 𝔬(n) and 𝔰𝔬(n) are given by the real skew-symmetric matrices. This is
inherently related to the properties of Lie algebras. The zero matrix (i.e., the zero vector)
as an element of the Lie algebra corresponds to the identity element of the Lie group.
This is obvious from the relation

e = exp 0,

where e is the identity element of the Lie group and 0 represents the zero matrix.
Basically, Lie algebras are suited to describing local properties associated with
infinitesimal transformations around the identity element e. More specifically, the
exponential functions of the real skew-symmetric matrices cannot produce −E;
indeed, det[exp(tX)] = e^{t Tr(X)} = 1 for skew-symmetric X, whereas det(−E) = −1
for O(3). Considering that one of the connected components of O(3) is SO(3), which contains
the identity element, it is intuitively understandable that 𝔬(n) and 𝔰𝔬(n) are
the same.

Another interesting point is that SO(3) is at once an open set and a closed set (i.e.,
a clopen set; see Sect. 6.1). This is relevant to Theorem 6.3, which says that a necessary
and sufficient condition for a subset S of a topological space to be both open and
closed at once is Sᵇ = ∅ (i.e., S has no boundary). Since the determinant of any element of O(3) is
either +1 or −1, it is natural that SO(3) has no boundary; if there were a boundary,
its determinant would be zero, in contradiction to SO(3) ⊂ GL(n, ℝ); see Example
20.8.
Theorem 20.11 Let G be a connected linear Lie group. (The Lie group G may be G₀
in the sense of Theorem 20.10.) Then, any g ∈ G can be described by

g = (exp t₁X₁)(exp t₂X₂)⋯(exp t_dX_d),   (20.329)

where X_i (i = 1, 2, ⋯, d) is a basis set of the Lie algebra corresponding to G and
t_i (i = 1, 2, ⋯, d) is an arbitrary real number.

Originally, the notion of the Lie algebra was introduced to deal with infinites-
imal transformations near the identity element. Yet, Theorem 20.11 says that any
transformation (or element) of a connected linear Lie group can be described by a
finite number of elements of the corresponding Lie algebra. That is, the Lie algebra
determines the global characteristics of the connected Lie group.
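For G = SO(3), Theorem 20.11 is familiar: the Euler-angle factorization is precisely a product of three one-parameter subgroups generated by elements of 𝔰𝔬(3). A numerical sketch (NumPy/SciPy assumed; generators taken from (20.274)):

```python
import numpy as np
from scipy.linalg import expm

A2 = np.array([[0, 0, 1], [0, 0, 0], [-1, 0, 0]], dtype=float)
A3 = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 0]], dtype=float)
a, b, c = 0.4, 1.1, -0.7
R = expm(a * A3) @ expm(b * A2) @ expm(c * A3)   # product of one-parameter groups
print(np.allclose(R.T @ R, np.eye(3)), np.isclose(np.linalg.det(R), 1.0))
# First column agrees with that of the Euler matrix (20.302)/(20.332):
print(R[:, 0], [np.cos(a)*np.cos(b)*np.cos(c) - np.sin(a)*np.sin(c),
                np.sin(a)*np.cos(b)*np.cos(c) + np.cos(a)*np.sin(c),
                -np.sin(b)*np.cos(c)])
```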
Strengthening the condition of connectedness, we now deal with simply
connected groups. In this context we have the following definitions.

Definition 20.8 Let G be a linear Lie group and let I be the interval [0, 1]. Let f be a
continuous function such that

f: I ⟶ G or f(I) ⊂ G.   (20.330)

Then, the function f is said to be a path, in which f(0) and f(1) are the initial point
and the end point, respectively. If f(0) = f(1) = x₀, f is called a loop (i.e., a closed
path) at x₀ [14]. If f(t) ≡ x₀ (0 ≤ t ≤ 1), f is said to be a constant loop.

Definition 20.9 Let f and g be two paths. If f and g can be continuously deformed
into each other, they are said to be homotopic.

To be more specific with Definition 20.9, let us define a function h(s, t) that is
continuous with respect to s and t in the region I × I = {(s, t); 0 ≤ s ≤ 1, 0 ≤ t ≤ 1}. If,
furthermore, h(0, t) = f(t) and h(1, t) = g(t) hold, f and g are homotopic [9].

Definition 20.10 Let G be a connected Lie group. If all the loops at an arbitrarily
chosen point x ∈ G are homotopic to a constant loop, G is said to be simply
connected.

This may be a somewhat abstract concept. Rephrasing the statement in a word:
for G to be simply connected means that loops at any x ∈ G can be continuously
contracted to that point x. The next example helps one understand the meaning of being
simply connected.
Example 20.9 Let S² be the spherical surface in ℝ³ such that

S² = {x ∈ ℝ³; ⟨x|x⟩ = 1}.   (20.331)

Any x is equivalent by virtue of the spherical symmetry of S². Let us think of any
loop at x. This loop can be contracted to the point x. Hence, S² is simply connected.
By the same token, a spherical hypersurface (or hypersphere) given by
Sⁿ = {x ∈ ℝⁿ⁺¹; ⟨x|x⟩ = 1} is simply connected. Thus, from (20.36), SU(2) is
simply connected. Note that the parameter space of SU(2) is S³.

Example 20.10 In Sect. 20.5.2 we showed that SO(3) is a connected set. But it is
not simply connected. To see this, we return to Sect. 17.4.2, which dealt with the three-
dimensional rotation matrices. Rewriting R̃₃ of (17.101) as R̃₃(α, β, γ), we have
R̃₃(α, β, γ) = \begin{pmatrix} \cosα\cosβ\cosγ - \sinα\sinγ & -\cosα\cosβ\sinγ - \sinα\cosγ & \cosα\sinβ \\ \sinα\cosβ\cosγ + \cosα\sinγ & -\sinα\cosβ\sinγ + \cosα\cosγ & \sinα\sinβ \\ -\sinβ\cosγ & \sinβ\sinγ & \cosβ \end{pmatrix}.   (20.332)

Note that in (20.332) we represent the rotation in the moving coordinate system.
Putting β = 0 in (20.332), we get

R̃₃(α, 0, γ) = \begin{pmatrix} \cos(α+γ) & -\sin(α+γ) & 0 \\ \sin(α+γ) & \cos(α+γ) & 0 \\ 0 & 0 & 1 \end{pmatrix}.   (20.333)

If in (20.333) we replace α with α + ϕ₀ and γ with γ − ϕ₀, we obtain
the same result as (20.333). That is, different sets of parameters give the same result
(i.e., the same rotation) such that

(α, 0, γ) ⟷ (α + ϕ₀, 0, γ − ϕ₀).   (20.334)
Putting β = π in (20.332), we get

R̃₃(α, π, γ) = \begin{pmatrix} -\cos(α-γ) & -\sin(α-γ) & 0 \\ -\sin(α-γ) & \cos(α-γ) & 0 \\ 0 & 0 & -1 \end{pmatrix}.   (20.335)

If, in turn, we replace α with α + ϕ₀ and γ with γ + ϕ₀ in (20.335), we obtain the
same result as (20.335). Again, different sets of parameters give the same result (i.e.,
the same rotation) such that

(α, π, γ) ⟷ (α + ϕ₀, π, γ + ϕ₀).   (20.336)
Meanwhile, R₃ of (17.107) is expressed as

R₃ = \begin{pmatrix}
\cos^2α\cos^2β\cosγ + \sin^2α\cosγ + \cos^2α\sin^2β & \cosα\sinα\sin^2β(1-\cosγ) - \cosβ\sinγ & \cosα\cosβ\sinβ(1-\cosγ) + \sinα\sinβ\sinγ \\
\cosα\sinα\sin^2β(1-\cosγ) + \cosβ\sinγ & \sin^2α\cos^2β\cosγ + \cos^2α\cosγ + \sin^2α\sin^2β & \sinα\cosβ\sinβ(1-\cosγ) - \cosα\sinβ\sinγ \\
\cosβ\sinβ\cosα(1-\cosγ) - \sinα\sinβ\sinγ & \sinα\cosβ\sinβ(1-\cosγ) + \cosα\sinβ\sinγ & \sin^2β\cosγ + \cos^2β
\end{pmatrix}.   (20.337)

Fig. 20.7 Geometry of the rotation axis (A) accompanied by a rotation γ. (a) Geometry viewed in
parallel to the equator. (b, c) Geometry viewed from above the north pole (N). If the azimuthal angle
α is measured at the antipodal point P̃, α should be replaced with (b) α + π (in the case of 0 ≤ α ≤ π)
or (c) α − π (in the case of π < α ≤ 2π)
Note that in (20.337) we represent the rotation in the fixed coordinate system. In this
case, again we have arbitrariness in the choice of parameters. Figure 20.7 shows the
geometry of the rotation axis (A) accompanied by a rotation γ; see Fig. 17.18 once
again. In this parameter space specifying SO(3), the rotation is characterized by the
azimuthal angle α, the zenithal angle β, and the magnitude of rotation γ. The domains of
variability of the parameters (α, β, γ) are

0 ≤ α ≤ 2π, 0 ≤ β ≤ π, 0 ≤ γ ≤ 2π.

Figure 20.7a shows the geometry viewed in parallel to the equator; it includes the
zenithal angle β as well as a point P (i.e., the point of intersection between the rotation
axis and the geosphere) and its antipodal point P̃. If β is measured at P̃, β and γ should be
replaced with π − β and 2π − γ, respectively. Meanwhile, Fig. 20.7b, c depict the
azimuthal angle α viewed from above the north pole (N). If the azimuthal angle α is
measured at the antipodal point P̃, α should be replaced with either α + π (in the case
of 0 ≤ α ≤ π; see Fig. 20.7b) or α − π (in the case of π < α ≤ 2π; see Fig. 20.7c).
Consequently, we have the following correspondence:

(α, β, γ) ⟷ (α ± π, π − β, 2π − γ).   (20.338)

Thus, we confirm once again that different sets of parameters give the same
rotation. In other words, two different points in the parameter space correspond to
the same transformation (i.e., rotation). In fact, using these different sets of param-
eters, the matrix form of (20.337) is held unchanged. The confirmation is left for
readers.
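A numerical confirmation of the parameter redundancies (20.334) and (20.336) is immediate with the Euler factorization used in the preceding sketch (NumPy/SciPy assumed; A2 and A3 are the generators of (20.274)).

```python
import numpy as np
from scipy.linalg import expm

A2 = np.array([[0, 0, 1], [0, 0, 0], [-1, 0, 0]], dtype=float)
A3 = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 0]], dtype=float)
R = lambda a, b, c: expm(a * A3) @ expm(b * A2) @ expm(c * A3)

a, c, phi0 = 0.9, -0.3, 0.5
print(np.allclose(R(a, 0, c), R(a + phi0, 0, c - phi0)))          # (20.334): True
print(np.allclose(R(a, np.pi, c), R(a + phi0, np.pi, c + phi0)))  # (20.336): True
```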
Because of the aforementioned characteristic of the parameter space, a loop
cannot always be contracted to a single point in SO(3). For this reason, SO(3) is not simply
connected.

In contrast to Example 20.10, in the case of SU(2) the mapping from the parameter space S³
to SU(2) is bijective. That is, any two different points of S³ give different
transformations of SU(2) (i.e., the mapping is injective), and any element of SU(2) has a
corresponding point in S³ (i.e., the mapping is surjective). This can be rephrased by saying that
SU(2) and S³ are isomorphic.

In this context, let us give important concepts. Let (T, τ) and (S, σ) be two
topological spaces. Suppose that f: (T, τ) ⟶ (S, σ) is a bijective mapping. If, in this
case, furthermore, both f and f⁻¹ are continuous mappings, (T, τ) and (S, σ) are said to
be homeomorphic [14]. In our present case, SU(2) and S³ are homeomorphic.

Apart from being simply connected or not, SU(2) and SO(3) resemble each other in many
aspects. The resemblance already showed up in Chap. 3 when we considered the
(orbital) angular momentum and the generalized angular momentum. As shown in
(3.30) and (3.69), the commutation relations were virtually the same. It is obvious
when we compare (20.271) and (20.275). That is, in terms of Lie algebras their
structure constants are the same and, eventually, the structures of the corresponding Lie
groups are related. This became well understood when we considered the adjoint
representation of the Lie group G on its Lie algebra 𝔤. Of the groups whose Lie
algebras are characterized by the same structure constants, the simply connected
group is said to be the universal covering group. For example, among the Lie groups
O(3), SO(3), and SU(2), only SU(2) is simply connected and, hence, is a universal
covering group.

The understanding of the Lie algebra is indispensable for explicitly describing the
elements of the Lie group, especially the connected component of the identity
element e ∈ G. Then, we are able to understand the global characteristics of the
Lie group. In this chapter, we have focused on SU(2) and SO(3) as typical examples
of linear Lie groups.
Finally, though formal, we describe the definition of a topological group.

Definition 20.11 [15] Let G be a set. If G satisfies the following conditions, G is
called a topological group:

(i) The set G is a group.
(ii) The set G is a T₁-space.
(iii) The group operations (or calculations) of G are continuous. That is, let P and Q be
two mappings such that

P: G × G ⟶ G; P(g, h) = gh (g, h ∈ G),   (20.339)

Q: G ⟶ G; Q(g) = g⁻¹ (g ∈ G).   (20.340)

Then, both the mappings P and Q are continuous.

By Definition 20.11, a topological group combines the structures of a group and a
topological space. Usually, "continuous group" is a collective term for Lie groups
and topological groups. Continuous groups are very frequently dealt with in
various fields of natural science and mathematical physics.

References

1. Inui T, Tanabe Y, Onodera Y (1990) Group theory and its applications in physics. Springer,
Berlin
2. Inui T, Tanabe Y, Onodera Y (1980) Group theory and its applications in physics, expanded
edn. Shokabo, Tokyo. (in Japanese)
3. Takagi T (2010) Introduction to analysis, Standard edn. Iwanami, Tokyo. (in Japanese)
4. Arfken GB, Weber HJ, Harris FE (2013) Mathematical methods for physicists, 7th edn.
Academic Press, Waltham
5. Hamermesh M (1989) Group theory and its application to physical problems. Dover, New York
6. Rose ME (1995) Elementary theory of angular momentum. Dover, New York
7. Hassani S (2006) Mathematical physics. Springer, New York
8. Edmonds AR (1957) Angular momentum in quantum mechanics. Princeton University Press,
Princeton
9. Yamanouchi T, Sugiura M (1960) Introduction to continuous groups. Baifukan, Tokyo.
(in Japanese)
10. Dennery P, Krzywicki A (1996) Mathematics for physicists. Dover, New York
11. Satake I-O (1975) Linear algebra (Pure and applied mathematics). Marcel Dekker, New York
12. Satake I (1974) Linear algebra. Shokabo, Tokyo. (in Japanese)
13. Steeb W-H (2007) Continuous symmetries, Lie algebras, differential equations and computer
algebra, 2nd edn. World Scientific, Singapore
14. McCarty G (1988) Topology. Dover, New York
15. Murakami S (2004) Foundation of continuous groups, revived edn. Asakura Publishing, Tokyo.
(in Japanese)
Index

A
Abelian group, 622, 626, 689, 710, 746, 762
Abscissa axis, 181, 197
Absolute convergence, 583
Absolute temperature, 340
Absolute value, 127, 138, 141, 197, 205, 216, 337, 354, 567, 777, 782, 808
Abstract algebra, 846, 852, 884
Abstract concept, 433, 635, 688, 885, 892
AC5, 361–363
AC'7, 359–363
Accumulation points, 191–193, 223, 225, 226
Adherent point, 186, 192
Adjoint boundary functionals, 393
Adjoint Green's functions, 396, 402
Adjoint matrix, 22, 524, 584
Adjoint operator, 109, 387, 389, 523, 535–539, 541, 554
Adjoint representation, 874–884, 895
Algebraic branch point, 256
Algebraic equation, 15
Algebraic method, 1
Allowed transition, 138, 147, 764, 776
Allyl radical, 688, 689, 691, 747, 770–777
Aluminum-doped zinc oxide (AZO), 363, 369–371
Ampère, A.-M., 273
Ampère's circuital law, 273
Amplitude, 278, 284, 285, 298, 300, 303, 305, 310, 324, 332, 333, 335, 351, 375, 377, 422, 423
Analytical equation, 1
Analytical method, 77, 83, 91
Analytical solutions, 44, 107, 151
Analytic continuation, 225–227, 234, 264
Analytic functions, 181–265, 319, 581, 582, 866, 885
Analytic prolongation, 227
Angular coordinate, 197
Angular frequency, 4, 6, 32, 127, 129, 133, 139, 278, 341, 344, 346, 351, 356
Angular momentum operator, 29, 77–91, 108, 142, 145
Anisotropic
  crystal, 366
  dielectric constant, 365, 366
  media, 569
  medium, 365
Annihilation operators, 31, 41, 111, 112
Anti-Hermitian, 28, 29, 109, 116, 117, 387, 590, 592, 600, 610, 802, 806, 810, 867, 870–873, 875, 876, 878
Anti-Hermitian operators, 28, 109, 610, 806, 810, 870, 873
Antilinear, 526
Anti-phase, 336, 337
Antipodal point, 895
Antisymmetric, 412, 723–726, 858, 860, 863, 864
Antisymmetric representation, 723–726, 858, 860, 863, 864
Approximation methods, 151–179
Arc tangent, 63, 371
Arguments, 30, 55, 66, 152, 187, 197, 227, 231, 238, 243, 247, 252, 254, 256, 258, 260, 261, 264, 303, 317, 319, 353, 385, 394, 398–400, 403, 405, 408, 412, 416, 418, 454, 463, 489, 498, 499, 517, 532, 557, 569, 573, 608, 615, 623, 647, 658, 681, 693, 704, 707, 731, 734, 740, 747, 751, 764, 797, 806, 817, 820, 826, 828, 847
Aromatic hydrocarbons, 729, 747
Aromatic molecules, 756, 770, 777
Associated Laguerre polynomials, 57, 108, 116–122
Associated Legendre differential equation, 83, 91–103, 427
Associated Legendre functions, 57, 83, 94, 103–107
Associative law, 23, 441, 541, 624, 631, 663, 666
Atomic orbitals, 729, 734, 742, 744, 745, 747, 749, 752, 757, 764, 770, 771, 777, 785, 788, 789
Attribute, 636, 637
Azimuthal angle, 29, 70, 674, 791, 831, 839, 895
Azimuthal quantum numbers, 120, 142, 147
Azimuth of direction, 663

B
Backward wave, 278, 331, 333, 335
Ball, 220
Basis
  functions, 88, 683–691, 720, 734, 744, 748–750, 777, 799, 815–817, 819, 839, 842, 843, 846, 849, 859, 860, 863, 864
  set, 457, 515, 516, 541, 547, 584, 685, 692, 693, 720, 739, 847, 849, 854, 878, 881,
Bithiophene, 648
Blackbody, 339–341
Blackbody radiation, 339–341
Block matrix, 686, 860
Bohr radius
  of hydrogen, 133
  of hydrogen-like atoms, 108
Boltzmann distribution law, 339, 348, 355
Born probability rule, 565
Bose–Einstein distribution functions, 341
Boron trifluoride (BF₃), 650
Boundary, 14, 71, 109, 117, 173, 174, 189–190, 325–327, 341, 344, 363, 365, 370, 379, 380, 384, 385, 388, 389, 393, 401, 405, 408, 418, 423, 597, 891
  functionals, 380, 384, 388, 389, 393, 408
  term, 386
Boundary conditions (BCs), 14–17, 19, 20, 25–27, 30, 32, 36, 56, 71, 72, 109, 116, 117, 173, 174, 285, 325–327, 341–343, 345, 363, 365, 370, 379, 380, 384–386, 388, 389, 393–395, 397–399, 401–403, 405, 407–411, 415, 418, 419, 421, 423–427, 581, 597–600, 602, 605, 607, 608, 610, 615
  adjoint, 410, 415
  homogeneous, 380, 388, 389, 393, 394, 397, 399, 401–403, 405, 407–409, 411, 423, 426, 427, 597, 600
  homogeneous adjoint, 393–395
  inhomogeneous, 380, 394, 395, 402, 403, 405, 407, 415, 418, 419, 424, 597–599,
892 605, 607, 608
vectors, 8, 41, 59, 60, 132, 140, 155, 291, Bounded, 25, 29, 74, 79, 212, 214–216, 221,
395, 433, 435, 439–446, 448, 450, 222
452–460, 468–470, 474, 477, 487, BP1T, 361–363
489–491, 495, 505–507, 513, 515, Bragg’s condition, 374
518–520, 528, 529, 533, 537–540, 547, Branch, 250, 252, 253, 255, 256, 427, 547, 617
568, 576, 636, 640, 648, 650, 654, 657, cut, 253, 255–259, 262, 263
677, 679, 684–686, 688, 689, 691, 704, point, 252–257, 259, 262–264
710, 711, 716, 717, 719, 730, 738, 740, Bra vector, 22, 523, 526, 529
744, 745, 747, 751, 757, 761, 764, 770, Brewster angles, 312–314
783, 784, 803, 805, 820, 822, 843, 847,
848, 852, 854, 855, 859, 860, 863, 872,
878 C
Benzene, 648, 729, 747, 762, 764–771 Canonical commutation relation, 3, 27–30, 34,
Bijective, 447, 450, 451, 623, 628, 629, 895 43, 44, 54, 59, 65
Bijective mapping, 895 Canonical coordinate, 27
Binomial expansion, 233, 234, 852 Canonical forms of matrices, 459–522, 559
Binomial theorem Canonical momentum, 27
generalized, 95, 233 Cardinalities, 183
Biphenyl, 648 Cardinal numbers, 183
Index 901

Carrier space, 687 Column vectors, 8, 21, 41, 88, 90, 290, 435,
Cartesian coordinates, 45, 61, 109, 171, 197, 440, 441, 452, 455, 457, 462–465, 502,
283, 321, 349, 650, 736, 791, 793, 804, 505, 507, 508, 510, 527, 535, 541, 542,
806, 823, 889 556, 557, 568, 573, 574, 577, 596, 613,
Cartesian space, 59, 436 658, 677, 684, 759, 782, 810, 817, 820,
Cauchy–Hadamard theorem, 220 831, 863
Cauchy–Liouville theorem, 215–216 Common factor, 477, 478, 482, 483
Cauchy Riemann conditions, 203–206 Commutable, 27, 482, 486–488, 524, 556, 579,
Cauchy–Schwarz inequality, 525, 585 642, 643, 869
Cauchy’s integral formula, 206–216, 219, 225, Commutation relation, 3, 27–30, 34, 43, 44, 54,
229 59, 65, 73, 77, 142–144, 872, 895
Cauchy’s integral theorem, 209–212, 227, 243 Commutative, 27, 171, 472, 518, 585, 591, 603,
Cavity, 33, 339, 341, 343–345, 375, 376, 378 604, 615, 622, 639, 649, 689, 824, 825,
Cavity radiation, 339 865, 870, 871, 889, 890
Center of inversion, 642 Commutative groups, 622
Center-of-mass coordinates, 57, 58 Commutative law, 622
Central force fields, 57, 107 Commutator, 27–30, 144, 868, 872
Characteristic equation, 421, 461, 462, 501, Compatibility relation, 750
533, 567, 573, 675, 689, 831 Complementary set, 183, 212, 888
Characteristic impedance Complements, 183, 185, 189, 190, 195, 212,
of dielectric media, 295, 336 547–549, 559, 688, 691, 888
Characteristic polynomial, 461, 471, 477, 481, Complete orthonormal system (CONS), 151,
489, 513, 517, 520, 782 155, 163, 166, 167, 173, 842
Characters, 373, 453, 524, 628, 631, 637, 694, Complete system, 155
697–700, 702, 703, 722, 723, 726, 741, Completely reducible, 825, 830
745, 746, 748, 750, 751, 755–757, 759, Complex amplitude, 303
761, 765, 770, 775, 776, 780, 784, 796, Complex analysis, 181, 182, 197
797, 834, 837–844 Complex conjugate, 22, 24, 25, 28, 34, 117,
Chemical species, 374, 650 137, 199, 347, 390, 397, 401, 523, 524,
Circularly polarized light 526, 536, 537, 693, 714, 762, 816, 817,
left-, 136, 138, 147–149, 293 820, 850, 876
right-, 136, 138 Complex conjugate representation, 762, 816,
Clad layers, 321, 327, 331, 371 817
Classes, 191, 625–627, 637, 653, 654, 656, 657, Complex conjugate transposed matrix, 22, 537
659, 662, 663, 699, 700, 704–711, 746, Complex domain, 19, 201, 216, 227, 263–265,
834, 838, 839, 841, 888, 889 315
Classical Hamiltonian, 33 Complex function, 14, 181, 199, 200, 204, 206,
Classical orthogonal polynomials, 47, 94, 379, 207, 216, 254, 389, 764
427 Complex phase shift, 329
Clebsch Gordan Coefficients, 843 Complex plane, 14, 181, 182, 197–199, 201,
Clopen sets, 190, 193, 195, 891 206, 207, 211, 212, 216, 227, 241, 242,
Closed interval, 196 249, 251, 253, 263, 317, 318, 521, 885
Closed-open set, 190 Complex roots, 235, 420, 421
Closed paths, 194, 209, 892 Complex variables, 15, 199–206, 215, 230,
Closed sets, 185, 186, 188–190, 193, 195, 196, 532, 855
212, 891 Composite function, 63
Closed shell, 741, 755 Compton, A., 4, 6
Closed surface, 273, 274 Compton effect, 4
Closures, 186–189, 654 Compton wavelength, 5
C-numbers, 10, 22 Condon–Shortley phase, 101, 102, 106, 123,
Cofactor, 471, 472 138, 140, 141
Cofactor matrix, 471, 472 Conjugacy classes, 625, 654, 657, 659, 662,
Coherent state, 126–128, 133, 139, 149 699, 707, 710, 711, 841
902 Index

Conjugate element, 625 Cross-section area, 355, 356, 376


Conjugate representation, 762 Crystal
Connected anisotropic, 366, 569
components, 886–888, 891, 896 organic, 359, 360, 362, 363, 365, 372–374,
set, 886–888, 892 569
Connectedness, 194, 199, 887, 892, 896 Cubic roots, 249
Conservation of charge, 274 Current continuity equation, 274
Conservation of energy, 5 Cyclic groups, 622
Constant loop, 892 Cyclic permutation, 641, 677
Constant matrix, 603, 604, 610, 617 Cyclopropenyl radical, 729, 747, 757–764, 767,
Constant term, 169 768, 770, 771
Constructability, 401
Constructive interference, 357
Continuous function, 214, 885–887, 892 D
Continuous groups, 190, 663, 804, 816, 833, Damped oscillator, 419–424
838, 848, 877, 883, 894, 896 Damping
Continuous mapping, 886, 895 critical, 419
Contour integral, 207, 208, 229, 230, 232, 236, over, 419
239, 240, 242, 244 weak, 420
Contour integration, 211, 213, 214, 217, 220, Damping constant, 419
230, 232, 236, 238–240, 243, 247, 257, Darboux inequality, 212–213, 231
261, 262, 265 de Broglie, L.-V., 6
Contour map, 729 de Broglie wave, 6
Contraposition, 204, 223, 382, 449, 476, 528, Definite integrals, 26, 98, 112, 132, 137, 171,
667, 825 181, 230–248, 257, 259, 261, 717, 723,
Contravariant, 850, 851 741, 742, 792
Convergence circle, 220, 226 de Moivre’s theorem, 198, 250
Convergence radius, 220 de Morgan’s law, 184, 186
Coordinate point, 289, 292 Degeneracy, 152
Coordinate representation, 3, 31, 44–51, 112, Degenerate
122, 129, 132, 136, 146, 157, 160, 165, doubly, 768
168, 170, 171, 175, 178, 256, 395, 396, triply, 789, 795, 796
791, 823, 837 Derived set, 191
Coordinate transformation, 291, 367, 441, 573, Determinants, 42, 59, 287, 383, 409, 433,
635, 641, 677, 738, 816, 837 448–452, 454, 471, 517, 531, 567, 571,
Core layer, 321, 327, 329, 331 589, 647, 654, 662–664, 744, 787, 808,
Corpuscular beam, 6, 7 863, 887, 888, 891
Correction terms, 153–156 Device physics, 330, 359, 374
Coset Device substrate, 363, 370
decomposition, 624, 626, 649, 890 Diagonal elements, 87, 88, 90, 287, 450, 461,
left, 624–626 462, 465, 476, 496, 498, 499, 512, 514,
right, 624–626 518, 559, 560, 570, 588, 589, 639, 640,
Coulomb integral, 740, 752, 767, 788 654, 690, 697, 744, 745, 761, 782, 784,
Coulomb potential, 57, 67, 107 787, 870, 871, 873
Coulomb’s law, 270 Diagonalizable, 475, 553
Countable, 14, 565, 621 Diagonalizable matrices, 512–522, 561
Coupling, 849 Diagonalization, 466, 517, 573–580, 601, 667,
Covariant, 850, 851 690
Covering group, 896 Diagonalized, 43, 54, 367, 475, 477, 486, 487,
Cramer’s rule, 306, 406, 809 513, 515, 530, 547, 556, 557, 559, 561,
Creation operator, 41 567, 578–580, 600, 606, 690, 691, 782,
Critical angles, 312–314, 317, 318, 320, 329 804
Critical damping, 419
Index 903

Diagonal matrices, 89, 459, 485, 517, 535, 556, Dispersion of refractive index, 358, 359, 361,
558, 559, 568, 573, 600, 601, 604, 638, 362
681, 689, 738, 782, 805, 826, 828, 879 Displacement current, 273, 275
Diagonal positions, 465, 486 Distance function, 181, 196, 199, 524
Dielectric constant, 270, 271, 336, 365 Distributive law, 184, 442
Dielectric constant ellipsoid, 366 Divisor, 473, 514, 648
Dielectric medium, 269–272, 275, 276, 280, Domain
285, 286, 295–337, 339, 357, 365, 371, of analyticity, 206, 209, 210, 212, 216, 227
375, 376, 419 complex, 19, 227, 263–265, 315
Differentiable, 65, 201, 204, 206, 215, 275, of variability, 181, 895
406, 581, 591, 866, 868
Differential equations, 3, 14, 15, 17, 19, 26, 44,
50, 67, 83, 91–103, 107, 108, 169, 174, E
269, 341, 368, 379–389, 392, 401, 403, Effective index, 362, 368
405, 408, 418, 421, 426, 427, 581, Eigenenergy, 43, 111, 121, 151–156, 162–164
592–600, 613, 617 Eigenequation, 460
Differential operators, 9, 14, 19, 26, 27, 50, 68, Eigenfunctions, 17, 18, 24, 26, 36, 37, 40, 41,
72, 77, 167, 172–174, 270, 379, 386, 67–69, 72, 86, 110, 112, 137, 152, 173,
388–394, 399, 402, 403, 409, 414, 425, 429, 690, 715, 720, 739, 768, 773, 774,
426, 523, 547 793, 826, 861, 863
Diffraction grating, 359, 361, 363, 371 Eigenspace, 460, 468–473, 478–485, 487, 488,
Diffraction order, 364, 373 496, 513, 515, 517, 559, 565, 577, 719
Dimensionless, 72, 77, 79, 108, 270–272, 278, Eigenstate
355, 359, 790, 801 energy, 156, 157, 163, 166, 176, 774
Dimensionless parameter, 50, 152 Eigenvalue
Dimension theorem, 444, 445, 447, 479, 492, equation, 13, 24, 29, 33, 50, 51, 58, 126,
493, 514 151, 152, 160, 172, 371, 408, 426, 460,
Dipole 466, 502, 521, 534, 577, 729, 734, 742,
approximation, 127, 142, 347 744, 789
isolated, 353 problems, 14, 16, 18, 21, 24, 120, 125, 379,
oscillation, 339 425–430, 459, 522, 579, 691, 720, 785
radiation, 349–354 Eigenvectors, 26, 41, 74, 152, 153, 155, 288,
Dirac equation, 12 290, 459–468, 471, 473–478, 493, 495,
Dirac, P., 11, 12, 22, 523–526 497, 500–503, 505, 507, 510, 513, 515,
Direct factors, 632, 649 516, 518, 520, 521, 556, 559, 561, 566,
Direct sum, 191, 193, 437, 471, 473, 474, 575–580, 654, 665, 667, 676, 691, 743,
477–481, 484, 487–489, 496, 513, 515, 746, 782, 831, 832
517, 518, 547, 550, 633, 686, 687, 691, Einstein, A., 3, 4, 6, 7, 339, 341, 345–347, 349,
699, 702, 719, 742, 749, 769, 770, 781, 353, 354
825, 843, 886, 888, 889 Einstein coefficients, 349
Direction cosines, 279, 305, 658, 675, 676, 832, Electric dipole
834, 835 approximation, 142
Direct-product groups, 621, 631–633, 649, 720, moment, 127, 164, 166, 741, 755
723, 750, 843, 844, 888 operator, 796
Direct-product representations, 720–723, 725, transition, 125–127, 129, 132, 141, 146,
740, 741, 745, 769, 770, 796, 831, 740, 754, 755
843–846, 849, 859, 861, 863 Electric displacement, 270
Dirichlet conditions, 14, 19, 26, 71, 325, 342, Electric field, 127, 147, 149, 151, 156, 160,
344, 408 161, 163, 164, 172, 174, 177, 269, 270,
Disconnected, 194 272, 284–286, 289, 292, 293, 297,
Discrete set, 191 303–306, 310, 315, 324, 325, 328, 329,
Discrete topology, 195 331, 335, 336, 344, 350–352, 368–370,
Disjoint sets, 194 375, 376, 418, 419, 611, 755, 796
904 Index

Electric flux density, 270, 367 Evanescent mode, 320


Electric permittivity, 365 Evanescent wave, 321, 327–331
Electric permittivity tensor, 365 Even function, 18, 49, 132, 157, 837, 840, 842
Electric waves, 284–286 Even permutations, 449, 873
Electromagnetic fields, 33, 127, 148, 295–297, Excited state, 37, 128, 132, 152, 174, 175, 177,
303, 305, 321, 351, 363, 365, 369, 371, 339, 346, 347, 354–356, 754, 769, 776,
377, 569, 611 796, 797
Electromagnetic induction, 272 Existence probability, 21, 32, 139
Electromagnetic plane wave, 281 Existence probability density, 139
Electromagnetic waves, 127, 151, 269, Expansion coefficient, 173, 396, 836
280–293, 295–337, 339, 341–345, 352, Expectation value, 24, 25, 33–36, 51, 52, 67,
357, 363, 367, 368, 375, 379, 419, 172, 174, 565, 566
595–600 Exponential functions, 15, 198, 241, 281, 298,
Electromagnetism, 166, 269, 270, 279, 378, 299, 423, 581–617, 799, 802, 808, 826,
379, 427, 569 891
Electronic configuration, 741, 754, 755, 769, Exponential polynomials, 299
774–776, 796, 797 External force fields, 58
Element, 45, 83, 127, 182, 287, 347, 433, 447, Extremals, 129, 334
523, 553, 581, 621, 635, 679, 735, 812 Extremum, 176
Elementary charge, 58, 127, 148, 347
Ellipse, 288, 289, 292, 366, 574, 575
Ellipsoidal coordinates, 790, 791, 793 F
Elliptic polarizations, 269 Factor group, 627, 631, 649
Elliptically polarized light, 288, 289, 292 Faraday, M., 272
Emission color, 374 Faraday’s law of electromagnetic induction,
Emission gain, 374 272
Empty set, 183, 185, 626 Finite group, 621–623, 626, 628, 629, 631, 662,
Endomorphism, 433, 441, 443, 444, 446–449, 663, 679–681, 683, 684, 686, 692, 703,
452, 454, 622, 630, 876, 878 799, 833, 840, 841, 845, 846, 888, 890
Energy conservation, 5, 311 Finite set, 183
Energy eigenvalues, 20, 31, 36, 37, 51, 133, First-order approximation, 177, 178
151, 152, 154, 158, 172, 532, 691, 719, First-order linear differential equation
739, 751, 753, 754, 762, 767, 768, 774, (FOLDE), 44, 379, 384–389, 593, 598,
786, 795 607, 608
Energy level, 41, 153–156, 172, 345, 764, 796 First-order partial differential equations, 269
Energy transport, 308–311, 320, 335 Fixed coordinate system, 640, 657, 678, 831,
Entire function, 206, 215, 216, 242 895
Equation of motion, 58, 148, 419 Flux, 148, 271, 272, 311, 352, 356, 612
Equation of wave motion, 269, 276–280, 322 Focused ion beam (FIB), 363
Equilateral triangle, 650, 757 Forbidden, 125, 755, 770, 797
Equivalence law, 485 Forward wave, 278, 331, 333, 335
Equivalence relation, 885 Four group, 639, 648
Equivalent transformation, 569 Fourier coefficients, 167
Essential singularity, 225 Fourier cosine coefficients, 842
Ethylene, 729, 747–757, 759, 761, 764, 770, Fourier cosine series, 842
771 Fréchet axiom, 195
Euclidean space, 196, 635, 639, 663 Free space, 295, 326, 331, 352
Euler angles, 669–678, 808, 811, 815, 820, 826, Functionals, 13, 15, 100, 101, 118, 122, 134,
831, 881 139, 160, 167, 169, 201, 203, 219, 227,
Euler equation, 169 234, 278, 380, 382, 384, 388, 389, 393,
Euler’s formula, 198 408, 523, 594, 717, 734, 741, 746, 754,
Euler’s identity, 198 854
Index 905

Function space, 25, 623, 728 H


Fundamental set of solutions, 15, 383, 404, 405, Half-integer, 75, 77
409, 411, 418–421, 613, 617 Half-odd-integers, 75, 77, 79, 82, 812, 815,
Fundamental solution, 608 849, 855, 884
Hamilton–Cayley theorem, 471–472, 482, 489,
514
G Hamiltonian, 11, 21, 29, 31, 33, 36, 43, 57–67,
Gain function, 356 107, 152, 157, 164, 174, 177, 532, 735,
Gamma functions, 95, 97, 98, 856 738, 740, 741, 745, 761, 788, 790
Gaussian plane, 181 Harmonic oscillator, 30–52, 57, 107–109, 125,
Gauss’ law, 270, 271 126, 130–132, 151, 152, 160, 174, 175,
Gegenbauer polynomials, 94, 99 339, 341, 375, 377, 379, 418, 419, 532,
Generalized angular momentum, 72–77, 810, 748, 755
811, 844, 873, 895 Heisenberg, W., 11
Generalized binomial theorem, 95, 233 Hermite differential equation, 51, 426, 427
Generalized eigenspaces, 471, 478–485, 488, Hermite polynomials, 47, 49, 57, 163, 427
496 Hermitian, 23, 52, 67, 130, 152, 347, 379, 530,
Generalized eigenvalues, 460, 500, 589 547, 600, 681, 738, 802
Generalized eigenvectors, 471, 473–478, 493, Hermitian differential operators, 172, 174, 425
495, 497, 500, 504, 505, 507, 511, 513, Hermitian matrix, 24, 131, 530, 532, 537, 568,
521 572, 577, 600
Generalized Green’s identity, 392, 409 Hermitian operator, 23–25, 52, 53, 68, 77, 156,
General linear group, 623, 808 173, 177, 347, 379, 399, 402, 427, 429,
General point, 635, 730, 815 547–580, 740, 806
General solution, 15, 32, 278, 384, 409, 420 Hermitian quadratic form, 532, 535, 547,
Generating function, 95, 264 568–575
Geometric object, 635–638, 640, 648, 654 Hermiticity, 3, 19, 25, 26, 117, 391, 394, 402,
Geometric series, 217, 221 426, 427, 577, 745
Glass, 269, 280, 286 Hesse’s normal forms, 280, 658, 661
Global properties, 890–896 Highest-occupied molecular orbital (HOMO),
Gram matrix, 529, 530, 532–535, 541, 563, 754, 769
568, 681 Hilbert space, 25, 565, 748
Gram–Schmidt orthonormalization, 543, 556 Homeomorphic, 895
Grand orthogonality theorem (GOT), 692–697, Homogeneous BCs, 388, 389, 393, 394, 397,
713 399, 401–403, 405, 407–409, 411, 423,
Grating period, 364, 374 425, 426
Grating wavevector, 363, 364, 372, 373 Homogeneous equation, 169, 380, 397, 399,
Grazing emissions, 364, 372 401, 402, 404, 405, 408, 419, 425, 426,
Grazing incidence, 317, 318 594, 597, 600, 604, 607, 613, 614
Greatest common factor, 478 Homomorphic, 628–631, 680, 685, 708
Green’s functions, 379–430, 581, 592, 594, 617 Homomorphic mapping, 629, 630
Green’s identity, 392, 397 Homomorphism, 621, 627–631, 679–681, 713,
Ground state, 36, 128, 132, 152, 153, 156, 157, 718, 881, 883
163, 164, 172, 174, 176, 177, 339, 340, Homomorphism theorem, 631, 883
345–347, 741, 754, 769, 776, 796, 797 Homotopic, 892
Group Hückel approximation, 754
algebra, 700–708 Hydrogen, 57, 122, 125, 132–142, 163–172,
representation, 679–726 755, 777, 780, 785, 788–790
theory, 581, 621–633, 635, 636, 643, Hydrogen-like atom, 57–123, 125, 126, 132,
679–726, 729–797 140, 142, 151, 153, 532, 579
Group refractive index, 358 Hypersphere, 892
Group velocity, 7, 12, 326 Hypersurface, 574, 892
906 Index

I contour, 211, 213, 214, 217, 220, 230, 232,


Idempotent matrix, 479, 482, 486, 517, 520, 236, 238, 239, 243, 247, 257, 261, 262,
521, 564, 604 265
Identity counter-clockwise, 210, 211, 232, 257
element, 622, 624, 625, 627–631, 638, 649, definite, 132
697, 751, 759, 882, 886–888, 891, 892, by parts, 25, 29, 83, 130, 386
896 path, 209, 210, 222
matrix, 43, 480, 482, 484, 486, 494, 562, range, 25, 239, 415, 417, 742
567, 582, 640, 664, 667, 668, 680, 694, real, 135
702, 703, 780, 802, 865, 884, 890 surface, 274
operator, 35, 55, 155, 167, 395, 397, 480 termwise, 218, 222
transformation, 640, 663, 667, 882 volume, 274
Image Interface, 295–298, 300–305, 311, 314, 320,
of transformation, 443 321, 324–327, 331, 335–337, 339, 344,
Imaginary axis, 181, 197 363, 365, 370, 375, 376
Imaginary unit, 181 Interference, 331, 357
Improper rotation, 643, 647, 663, 780, 889, 890 Interior point, 186
Incidence angle, 301, 313 Intermediate state, 156
Incidence plane, 301, 304, 306, 311, 315 Intermediate value theorem, 887
Incident light, 300, 302, 304 Intersection, 183, 184, 190, 623, 643, 645, 646,
Indeterminate constant, 18 649, 661, 895
Indiscrete topology, 195 Invariant, 461, 468–474, 481, 487, 501, 513,
Inequivalent, 685, 692, 694–696, 699, 707, 559, 566, 569, 589, 626, 632, 687, 689,
710, 711, 830, 839, 841, 842 691, 706, 719, 734, 738, 803, 806, 814,
Infinite dielectric media, 285, 286 837, 839, 849, 851, 852, 860
Infinite-dimensional, 41, 43, 435, 748 Invariant subgroups, 626, 630, 632, 649, 662,
Infinite group, 622, 628, 629, 635, 663, 799, 706, 883, 887, 888
843, 846 Invariant subspace, 468–474, 477, 481, 490,
Infinite power series, 582 501, 503, 504, 513, 518, 521, 547, 559,
Infinite series, 159, 218, 225 577, 579, 687, 688, 691, 781
Infinite set, 182 Inverse element, 447, 622, 627, 629, 631, 638,
Inflection points, 235 656, 680, 701, 706
Inhomogeneous boundary conditions (BCs), Inverse mapping, 447
402, 403, 405, 407, 415, 419, 424, Inverse matrix, 42, 200, 433, 448–452, 454,
597–599 511, 512, 571, 596, 601, 605, 680
Inhomogeneous equation, 380, 383, 402, 407, Inverse transformations, 443, 447, 448, 622,
594, 596–599, 607 816, 818
Initial value problem (IVP), 379, 408–424 Inversion symmetry, 642, 643, 647, 654
Injective, 447, 448, 452, 627, 629, 895 Inverted distribution, 355
Injective mapping, 627 Invertible, 447, 622
Inner product Invertible mapping, 447
of functions, 719, 740 Irradiance, 311, 355–357
space, 196, 523–544, 547, 550, 554–556, Irrational functions, 248
563–566, 584 Irreducible, 686, 688, 691–694, 696–700, 703,
of vectors, 397, 523, 576 707–711, 715–717, 719, 720, 722, 738,
vector space, 523, 551, 868 740–742, 745–751, 754–757, 759, 761,
In-quadrature, 375, 376 765, 769–771, 774, 775, 778, 781,
Integrand, 214, 232, 244, 246, 257, 259, 262, 783–786, 788, 789, 795–797, 823–831,
265 837–847, 859
Integration Irreducible character, 697, 842–844
clockwise, 210, 257, 262 Irreducible representations, 687, 688, 692, 694,
complex, 206 696, 698, 699, 702, 703, 707–711,
constant, 149, 376, 385, 593, 594 715–717, 719, 720, 738, 740–742,
Index 907

745–748, 750, 751, 754–757, 759, 761, Light-emitting device, 363, 372, 373
765, 769–771, 774, 775, 778, 781, Light-emitting material
783–786, 788, 789, 795–797, 823–831, Light propagation, 148, 269, 319, 364, 366
837, 840–847, 859 Light quantum, 3, 4, 345–347
Isolated point, 191–193 organic, 359, 361, 374
Isomorphic, 629, 631, 648, 654, 661–663, 685, Limes superior, 220
734, 883, 895 Limit, 202, 205, 207, 212, 213, 220, 258, 312,
Isomorphism, 621, 627–631, 679, 890 415, 417, 587, 591, 703, 802, 836, 858,
869
Linear algebra, 491
J Linear combination
Jacobi’s identity, 872 of basis vectors, 155, 435, 440, 519, 691,
j j coupling, 849 843
Jordan blocks, 490–502, 504–506, 513 of functions, 126, 849, 854, 861
Jordan canonical forms, 459, 488–512 of a fundamental set of solutions, 404
LCAO, 734
Linear combination of atomic orbitals (LCAO),
K 729, 734, 742
Kernel, 443, 501, 629–631, 881, 883 Linearly dependent, 17, 93, 100, 107, 299, 344,
Ket vector, 22, 523, 526 381–383, 434, 443, 444, 454, 466, 468,
469, 471, 527, 528, 531, 532, 543, 693
Linearly independent, 15–17, 32, 71, 298, 341,
L 345, 368, 380–383, 404, 433, 434, 437,
Laboratory coordinate system, 366 445, 453, 456, 462, 463, 466–470, 473,
Ladder operators, 74, 90, 91, 112 476–478, 489, 490, 513, 526, 528, 532,
Laguerre polynomials, 57, 108, 116–122 533, 543, 544, 547, 568, 576, 579, 596,
Laser 613, 664, 665, 684, 685, 688, 693, 701,
material, 358, 361, 363 710, 715, 716, 785, 786, 795, 812, 822
medium, 355–358, 365 Linearly polarized light, 142, 144, 286, 293
organic, 359–374 Linear transformation group, 623
oscillation, 355, 358, 360, 374 Linear vector space
Lasing property, 359 finite-dimensional, 872
Laurent’s expansion, 221–223, 229, 233, 234 Line integral, 296
Laurent’s series, 216–224, 228, 229, 234 Local properties, 890–896
Law of conservation of charge, 274 Logarithmic branch point, 256
Law of electromagnetic induction, 272 Logarithmic functions, 248
Law of equipartition of energy, 341 Longitudinal multimode, 358
Left-circularly polarized light, 136, 138, Loop, 296, 892, 895
147–149, 293 Lorentz force, 147, 611, 612, 615
Left-circular polarization, 293 Lowering operators, 74, 90, 108, 112, 122
Legendre differential equation, 83, 91–103, 427 Lower left off-diagonal elements, 450, 462
Legendre polynomials, 57, 85, 91, 93, 98, 100, Lower triangle matrix, 462, 512, 553, 606
104, 106, 107, 264 Lowest-unoccupied molecular orbital (LUMO),
Leibniz rule, 92, 94, 101, 118 754, 769
Leibniz rule about differentiation, 92, 94 L S coupling, 849
Levi-Civita symbol, 873 Luminous flux, 311
Lie algebra, 592, 600, 610, 799, 802, 864, 891,
892, 895, 896
Lie group, 891 M
linear, 866, 867, 877, 887, 891, 892 Magnetic field, 147, 269, 271–273, 275,
Light amplification, 354, 355 283–285, 297, 303–307, 310, 322, 324,
Light amplification by stimulated emission of 326, 344, 350, 352, 368–370, 375, 376,
radiation (laser), 354 611
908 Index

Magnetic flux, 272 Molecular orbitals (MOs), 691, 719, 720, 729,
Magnetic flux density, 148, 271, 612 734–797
Magnetic permeability, 271, 365 Molecular science, 3, 711, 723, 729, 740, 797
Magnetic quantum number, 140, 142 Molecular short axis, 777
Magnetic wave, 284, 286 Molecular states, 729
Magnetostatics, 271 Molecular systems, 720, 723
Magnitude relationship, 412, 693, 694, 797 Momentum, 4–6, 27, 55, 59, 60, 72–77, 79, 82,
Major axis, 291, 292 86, 91–107, 125, 132, 138, 147–150,
Mapping 375, 377, 810, 843, 844, 873, 895
inverse, 447 Momentum operators, 3, 29, 31, 33, 61, 77–91,
invertible, 447 108, 142, 145
non-invertible, 447 Monoclinic, 365
reversible, 447 Monomials, 812, 855, 857
Mathematical induction, 48, 80, 113, 114, 462, Moving coordinate systems, 669, 670, 672,
464, 466, 543, 556, 558, 570, 583 678, 831, 893
Matrix Multiple roots, 249, 461, 473, 474, 477, 478,
algebra, 43, 306, 367, 433, 443, 451, 474, 514, 515, 517, 521, 559
530, 547 Multiplication table, 623, 648, 688, 701
decomposition, 485, 488, 512, 561 Multiplicity, 468, 481, 486, 493, 498, 562, 565,
representation, 3, 31, 41–44, 86, 89, 90, 575, 576, 782
440, 442, 443, 450, 451, 454, 456, 468, Multiply connected, 194, 222
490, 494, 503–505, 519, 536–538, 540, Multivalued function, 248–265
553, 576, 579, 640, 643–646, 648, 650,
652, 657, 659, 661, 663, 689–691, 694,
697, 730, 738, 746, 748, 758, 764, 778, N
800, 808, 826, 860, 863, 881, 884 Nabla, 61
Matrix element Nano capacitor, 157
of electric dipole, 127, 129 Nano device, 160
of Hamiltonian, 131, 532, 740, 741 Naphthalene, 648, 658
Matrix equation, 504 Necessary and sufficient condition, 53, 190,
Matrix function, 885 193, 195, 206, 382, 383, 447, 448, 451,
Matter wave, 6, 7 453, 461, 528, 533, 556, 563, 622, 624,
Maximum number, 434 629, 825, 887, 891
Maxwell, J.C., 275 Negation, 187, 191
Maxwell’s equations, 269–293, 321, 365, 366 Negative helicity, 293
Mechanical system, 33, 375–378 Neumann conditions, 27, 327, 344, 394
Meromorphic, 225, 231 Newtonian equation, 31
Meta, 767 Nilpotent, 475, 485–488, 498, 553
Methane, 659, 729, 777–797 Nilpotent matrices, 90, 91, 473–478, 484,
Method of variation of constants, 594, 596, 614 487–494, 497, 499, 500, 509, 512, 553,
Metric, 181, 196, 199, 523 607
Metric space, 196, 199, 220, 524 Nodes, 325, 335–337, 376, 747, 748, 789
Minimal polynomial, 472, 473, 475, 499, 508, Non-commutative, 3, 27, 579
513–515, 517 Non-commutative group, 622, 639, 746
Minor axis, 291 Non-degenerate, 151–154, 163, 176, 575
Mirror symmetry Non-equilibrium phenomenon, 355
plane of, 642, 643, 646–648 Non-invertible mapping, 447
Mode density, 341–345 Non-magnetic substance, 280, 286, 308, 312
Modulus, 197, 213, 235, 836 Non-negative, 27, 34, 49, 52, 68, 69, 74, 93,
Molecular axis, 648, 756, 791, 795 146, 197, 214, 241, 242, 384, 385, 524,
Molecular long axis, 756, 777 532, 535, 547, 568, 615, 811, 819, 855
Molecular orbital energy, 735, 746 Non-negative operator, 72, 74, 532
Index 909

Non-relativistic approximation, 7, 11 473, 496, 501, 502, 513, 639, 710, 742,
Non-singular matrix, 433, 454, 455, 462, 463, 746, 755, 803, 806, 864
465, 475, 486, 497, 499, 508, 513, 570, case, 343
571, 588, 681, 685, 865, 875 harmonic oscillator, 31, 57, 126, 130–132,
Non-trivial, 381, 434 755
Non-trivial solution, 382, 383, 399, 408, 426, infinite potential well, 19
521 position coordinate, 33
Non-vanishing, 15, 16, 90, 91, 128, 131, 137, representation, 746, 761, 762, 783, 800
146, 223, 225, 228, 274, 296, 394, 591, system, 128–132, 163
654, 741, 745, 770, 785 One-parameter group, 865, 870, 891
Non-zero Open ball, 220
coefficient, 300 Open neighborhood, 196
determinant, 454 Operator, 31, 61, 130, 151, 270, 347, 379, 456,
eigenvalue, 498 478, 523, 547, 610, 647, 684, 729, 801
element, 87, 88 Operator method, 33–41, 132
vector, 67 Optical devices, 314, 319, 320, 337, 354, 355,
Norm, 24, 53, 524, 525, 529, 531, 539, 541, 363, 371
543, 554, 561, 566, 573, 581, 584, 763, Optical path, 327, 328
764 Optical process, 125, 136, 346
Normalization constant, 37, 45, 51, 112, 140, Optical transition, 125–151, 346, 347, 717, 720,
543 722, 741, 746, 754, 755, 757, 769, 770,
Normalized, 13, 18, 24, 25, 28, 34, 35, 37, 39, 774–776, 796, 797
47, 53, 54, 67, 71, 76, 86, 106, 110, 113, Optical waveguide, 329
115, 121, 122, 128, 132–134, 139, 140, Optics, 295, 339, 362
153, 154, 174, 175, 356, 543, 544, 565, Orbital angular momentum, 58, 77–107, 843,
726, 742, 743, 751, 753, 754, 768, 771, 844, 895
785, 795, 847, 854, 864 Orbital angular momentum quantum numbers,
Normalized eigenfunction, 112, 773 108, 120, 122, 142
Normalized eigenstate, 53 Orbital energy, 789, 795
Normalized eigenvectors, 37, 153, 290 Order of group, 648, 696
Normal operator, 547, 554–557, 561, 563, 668 Ordinate axis, 129
n-th roots, 248–251 Organic laser, 359–374
Nullity, 444 Organic materials, 361
Null-space, 443, 501 Orthogonal, 47–49, 94, 107, 175, 304, 344,
Number operator, 40 379, 427, 429, 524, 539, 544, 548, 551,
Numerical vector, 435 560, 561, 636, 655, 691, 710, 717, 719,
737, 745, 746, 751, 759, 774, 786, 871,
875, 884
O complements, 547–549, 559, 688, 691
O(3), 280, 642, 647, 654–663, 888–889 coordinate, 197, 365, 366, 817
Oblique coordinate, 639 group, 663–678, 871
Oblique incidence matrix, 569, 574, 590, 592, 614, 636, 638,
of wave, 295 654, 663, 664, 667, 672, 680, 736, 822,
Observable, 565, 566 831, 832, 837, 871, 875, 876, 881, 884
Octants, 656 transformation, 636, 663, 669, 878, 881
Odd functions, 18, 49, 132, 157 Orthonormal base, 8, 532, 540–543
Odd permutations, 449, 873 Orthonormal basis set, 547, 584, 720, 739, 740,
Off-diagonal elements, 450, 512, 560, 639, 640, 847, 878, 881
744, 745, 761, 787 Orthonormal basis vectors, 59, 60, 636, 650,
O(n), 875 716, 730, 745
One-dimensional, 33, 49, 128, 130, 151, 152, Orthonormal eigenfunctions, 40
157, 167, 174, 177, 196, 277, 357, 375, Orthonormal eigenvectors, 153, 559
910 Index

Orthonormal system, 107, 151, 155, 842 Physicochemical properties, 746, 754
Oscillating electromagnetic field, 127 Planar molecule, 650, 747, 757, 764
Out-coupling Planck, M., 3, 127, 339, 341–345, 349
of emission, 372 Planck’s law of radiation, 339, 341–345, 349
of light, 363, 364 Plane electromagnetic waves, 283
Over damping, 419 Plane figure, 643
Overlap integrals, 717, 740, 753, 754, 767, 774, Plane of incidence, 301
777, 785, 788, 789 Plane of mirror symmetry, 642, 647
Plastics, 286
Point group, 635, 638, 648, 659, 745, 746, 755,
P 778
Paper-and-pencil, 796 Polar coordinate, 29, 45, 58, 62, 146, 171, 197,
Para, 767 241, 806, 837
Parallelepiped, 355–357 Polar coordinate representation, 146, 256, 837
Parallelogram, 848 Polar form, 197, 237, 249
Parallel translation, 278 Polarizability, 163–171
Parameter space, 831–837, 892, 895 Polarization vector, 127, 134, 135, 138, 284,
Parity, 18, 49, 130 285, 303–306, 309, 315, 324, 331, 335,
Partial differentiation, 8–10, 283, 321 350, 352, 375, 741, 755, 796
Partial fraction decomposition Polyenes, 777
method of, 232, 245 Polymers, 269, 280
Partial sum, 159, 218 Polynomials, 38, 47–49, 57, 85, 91, 93–95,
Particular solutions, 169, 383 98–100, 104–107, 116–122, 163, 231,
Path, 194, 208–210, 212, 222, 327, 328, 357, 235, 264, 298, 299, 379, 427, 428, 460,
892 461, 471–473, 475, 477, 478, 481–483,
Pauli spin matrices, 810 486, 488, 489, 499, 508, 513–515, 517,
Periodic conditions, 27, 394 518, 520, 603, 782, 851, 852
Periodic function, 250, 884 Population inversion, 355
Periodic wave, 278 Portmanteau word, 190
Permeability, 148, 271, 306, 365 Position coordinate, 33
Permittivity, 58, 108, 270, 365, 569 Position operator, 29, 347, 354
Permittivity tensor, 365, 569 Position vector, 8, 59, 127, 129, 134, 149, 164,
Permutation, 383, 449, 641, 662, 671, 677, 873 275, 280, 298, 349, 351, 439, 540, 612,
Perturbation 636, 730, 735, 741, 748, 755, 800
method, 151–172, 179 Positive definite, 17, 26, 27, 34, 36, 287, 288,
Perturbed 532, 534, 568, 570, 874, 981
state, 164 Positive definiteness, 17, 19, 26, 27, 31, 34, 36,
Phase change 287, 288, 532, 534, 547, 568–570, 681,
upon reflection, 295, 330 874
Phase difference, 286, 327, 328, 375 Positive-definite operator, 26
Phase factor, 18, 40, 74, 77, 84, 120, 133, 141, Positive helicity, 293
284, 544 Positive semi-definite, 27, 532
Phase matching, 364, 365, 371 Power series expansion, 33, 108, 116, 117, 121,
Phase matching conditions, 365 122, 198, 216, 582, 591
Phase refractive index, 358, 361, 369 Poynting vector, 309, 311, 335
Phase shift, 317, 318, 329, 330 Primitive function, 208
Phase velocity, 7, 8, 278, 280, 326, 331 Principal axis, 366, 373
Photoabsorption, 132 Principal branch, 251, 254
Photon Principal diagonal, 494, 496
absorption, 132, 135, 138 Principal minors, 287, 533, 570, 573
emission, 132, 134, 138 Principal part, 224
energy, 355 Principal submatrix, 570
Photonics, ix Principal value
Index 911

of branch, 251 R
of integral, 236, 237 Radial coordinate, 107–115, 197, 806, 822, 837
of ln z, 258 Radial wave function, 107–115, 120–122, 167
Probability, 21, 32, 126, 127, 133, 138, 172, Radiation field, 127, 136, 138
346, 353, 565, 741, 777 Raising and lowering operators, 74, 90
density, 126, 127, 129, 139 Raising operator, 74, 90
distribution, 126 Rational number, 198
distribution density, 128, 129 Rayleigh–Jeans law, 339, 345
Projection operator Real analysis, 182, 216
sensu lato, 712 Real axis, 181, 197, 235, 242, 246, 247, 253,
sensu stricto, 714 254, 257–259
Proof by contradiction, 190 Real functions, 49, 128, 141, 142, 200, 319,
Propagating electromagnetic waves, 295 348, 392, 397, 401, 402, 404, 752, 757,
Propagation constant, 326, 331, 361, 363, 364 764
Propagation vector, 364 Real number line, 200, 201, 395
Propagation velocity, 7, 278, 343 Real space, 800, 803, 806, 808
Proper rotation, 640, 643, 654 Real symmetric matrix, 287, 348, 569, 570,
Proposition 572, 600, 603
converse, 591, 825 Real symmetric quadratic form, 547, 569
P6T, 363, 365, 366, 368–370, 373, 374 Real variable, 14, 25, 200, 206, 207, 219, 230,
Pure imaginary, 28, 29, 315, 331, 419, 870 389, 581
Pure rotation group, 654, 659 Rearrangement theorem, 623, 681, 695, 701,
704, 713
Rectangular parallelepiped, 355–357
Q Recurrence relation, 84
Q-numbers, 10 Redshifted, 4, 372–374
Quadratic forms, 53, 287, 288, 532, 533, 535, Reduced mass, 58, 59, 108, 133, 735
547, 568–575, 814, 851 Reduced Planck constant, 4, 127
Quantum chemical calculations, 711, 712, 729, Reducible, 471, 686, 699, 700, 702, 742, 746,
734 749, 754, 759, 765, 769, 825, 830, 843,
Quantum chemistry, 764 845, 847, 864
Quantum electromagnetism, 378 Reducible representation, 686, 699, 700, 742,
Quantum-mechanical, 3, 21, 30–52, 54, 55, 57, 746, 749, 759
107–109, 122, 151, 152, 160, 165, 532 Reduction, 559, 687
Quantum-mechanical harmonic oscillator, Reflectance, 311, 317, 337
31–52, 55, 107–109, 152, 160, 532 Reflected light, 300, 302
Quantum mechanics, 3, 6, 11, 12, 18, 27, 29, Reflected wave, 311, 313, 324, 327
31, 51, 57, 58, 151–179, 339, 354, 427, Reflections
523, 547, 673, 748, 872 angle, 300, 302, 313, 314, 317, 319, 329,
Quantum number 643, 645
azimuthal, 120, 121, 142, 147 coefficient, 299, 300, 306–308, 317, 325,
magnetic, 140, 142 326, 337, 344, 379
orbital angular momentum, 108, 120, 122, Refraction angles, 302, 308, 315
142, 895 Refractive index
principal, 120, 121, 142 of dielectric, 280, 295, 308, 313, 314, 321,
Quantum operator, 29 326, 327
Quantum state, 21, 32, 52, 74, 90, 111, 121, group, 358
125–127, 132, 142, 143, 145, 151–159, of laser medium, 356, 358
163, 164, 166, 172, 174, 354, 741, 847, phase, 295, 358, 361, 362, 369
849 relative, 303, 312, 314, 318, 371
Quantum theory of light, 356 Region, 148, 199, 204, 206–209, 211, 215,
Quartic equation, 145 219–223, 225–227, 263, 285, 286, 318,
Quotient group, 627
912 Index

319, 321, 331, 336, 339, 351, 416, 842, theory, 621, 664, 679–726, 729, 745, 799,
892 819, 843
Regular hexagon, 764 unitary, 679–681, 683, 687, 695, 714, 825,
Regular point, 206, 216, 228 830, 843
Regular representation, 700–707 Residues, 227–230, 232, 233, 237, 257
Regular singular point, 169 Resolvent, 581, 595–600, 602–607, 609, 610,
Regular tetrahedron, 661, 777 614, 617
Relations between roots and coefficients, 461 Resolvent matrix, 595–600, 602–607, 609, 610,
Relative coordinate, 57–59 614, 617
Relative permeability, 271 Resonance integral, 740, 752, 767
Relative permittivity, 271 Resonance structure, 688, 689, 757
Relative refractive index, 303, 312, 314, 318, Resonator, 337, 359, 361
371 Reversible mapping, 447
Relativistic quantum mechanics, 12 Riemann sheet, 252
Removable singularity, 224, 244 Riemann surface, 248–265
Representation Right-circular polarization, 136, 138, 293
antisymmetric, 723–726, 858, 860, 863, 864 Right-handed system, 284, 304, 309, 350, 376
complex conjugate, 523, 762, 816, 817 Rigid body, 664, 729
conjugate, 762, 816, 817 Rodrigues formula, 85, 93, 94, 99, 116, 264
coordinate, 3, 31, 44–51, 108, 112, 122, Rotation
127, 129, 132, 136, 146, 157, 160, 163, angles, 367, 639, 640, 642, 647, 663, 667,
165, 168, 170, 171, 175, 178, 256, 395, 800, 831, 833–836, 838, 843, 844, 889
396, 440, 574, 640, 644, 650, 652, 791, axis, 639–643, 648, 657, 659, 661,
800, 823, 837, 838 664–668, 674–676, 831, 832, 834, 838,
dimension of, 679, 680, 685, 687, 688, 691, 839, 844, 889, 895
695–697, 703, 711, 719, 723, 726, 746, groups, 622, 654, 659, 663, 799, 806, 815,
777, 800, 802, 830, 839, 846, 847 843, 890
direct-product, 631–633, 720–723, 725, improper, 643, 647, 663, 780, 889
740, 741, 745, 750, 769, 796, 831, matrix, 615, 639, 663–668, 820, 838, 889
843–846, 849, 859, 861, 863 proper, 640, 643, 654
of group, 679–726, 840 symmetry, 640, 642
irreducible, 686, 688, 691–694, 696–700, transformation, 367, 439, 451, 622, 635,
702, 703, 707–711, 715–717, 719, 720, 640, 663, 667, 672–678, 799, 816, 832,
738, 740–742, 745–751, 754–757, 759, 833, 837, 889, 895
761, 765, 769–771, 774, 775, 778, 781, Row vectors, 8, 383, 464, 519, 528, 541, 542,
783–786, 788, 789, 795–797, 823, 825, 702, 830
828, 830, 840–847, 859
matrix, 3, 31, 34, 54, 86, 161, 440–443, 450,
451, 454, 456, 468, 487, 536, 553, 661, S
680, 681, 685, 686, 690, 698, 699, 708, Scalar, 270, 276–278, 434, 435, 446, 473, 486,
714, 719, 739, 811, 816, 817, 822, 826, 526, 577, 729
830, 838, 839, 844, 864, 880, 884 Scalar function, 277, 278, 729
of point group, 746 Schrödinger, E., 3, 7, 11, 14
reducible, 686, 699, 700, 742, 749, 759 Schrödinger equation
regular, 700–707 of motion, 58
space, 687, 688, 691, 692, 714, 716, 719, time-evolved, 126
738, 745, 761, 764, 777, 780, 796, 799, Schur’s First Lemma, 692–694, 697, 783, 822,
800, 802, 803, 806, 808, 817, 839, 842, 839
843, 846, 847, 849, 855, 859, 860, 864, Schur’s lemmas, 692–697, 823
884 Schur’s Second Lemma, 694, 695, 708, 738,
subduced, 750, 759, 765 824, 825, 830
symmetric, 723–726, 734, 738, 741, 742, Second-order differential operators, 389–394
745, 751, 754, 755, 759, 769, 784, 796, Second-order linear differential equations
858, 860, 863, 864 (SOLDEs), 3, 14, 15, 25, 33, 44, 69–71,
Index 913

82, 368, 379–384, 388–390, 394, 402, Skew-symmetric matrix, 590, 592, 600, 807,
424, 428, 581, 592–594 867, 871, 872, 876, 891
Secular equation, 503, 729, 744–747, 751, 761, Slab waveguide, 320, 321, 323, 324, 326, 327,
766, 770, 771, 774, 786, 787, 795 331, 359
Selection rules, 125–149, 720 Solid-state chemistry, 374
Self-adjoint, 389, 391–394, 399, 401, 402, 405, Solid-state physics, 374
410, 426, 427, 530 Space group, 637
Self-adjoint matrix, 530 Span, 435–437, 443–446, 454, 468, 469, 474,
Self-adjoint operator, 391, 399, 402, 410, 426 481, 489, 490, 501, 513, 521, 548, 549,
Sellmeier’s dispersion formula, 359, 362 575, 577, 665, 738, 745, 761, 764, 780,
Semicircle, 230–232, 236, 238, 239, 241, 242, 782, 796, 800, 817, 839, 842, 843, 849,
246 855, 859, 860, 864, 872, 873, 884
Semiclassical, 125, 147 Special functions, 57, 107, 617
Semiclassical theory, 147 Special orthogonal group SO(3), 635, 663–678,
Semiconductors, 286, 337, 354, 359–362 799, 802, 806, 808, 809, 815, 816,
Semi-infinite dielectric media, 295, 296 822–843, 845, 847, 864, 866, 867, 873,
Semi-simple, 485–488, 509, 512 884, 888–889, 892, 895, 896
Separation axioms, 195, 196 Special orthogonal group SO(n), 871, 875
Separation conditions, 195 Special solution, 8
Separation of variables, 12, 67–72, 107, 125, Special unitary group SU(2), 799, 802, 806,
341 808, 810, 812, 815, 823, 825, 827, 830,
Sesquilinear, 526 831, 843–849, 852, 864, 866, 867, 873,
Set difference, 183 892, 895, 896
Set of points, 225, 226 Special unitary group SU(n), 808, 871, 872, 875
Set theory, 181, 182, 196, 199 Spectral decomposition, 563–565, 580, 668
Similarity transformation, 458, 459, 461–463, Spectral lines, 127, 357, 729
465, 466, 475, 477, 484–486, 496, 497, Spherical basis, 806, 820
505, 506, 508, 510, 513, 515, 530, 532, Spherical coordinate, 58, 60, 109, 806, 823
554, 556, 557, 559, 560, 563, 569, 578, Spherical surface harmonics, 86, 92–103, 107,
580, 588, 589, 600, 604, 609, 654, 668, 167, 799, 806, 817, 820, 823, 839, 842,
679, 681, 683, 685, 686, 689, 782, 801, 884
804, 805, 822, 839, 863, 875, 889 Spherical symmetry, 57, 58, 892
Simple harmonic motion, 32 Spin angular momentum, 77, 843, 844
Simple root, 501, 502, 518 Spin states, 121, 810
Simply connected, 194, 209–212, 215, 219, Spontaneous emission, 346, 347, 353, 355
221, 222, 227, 890–896 Spring constant, 31, 419
Simply reducible, 686, 845, 847, 864 Square-integrable, 25
Simultaneous eigenfunction, 68, 86 Square matrix, 433, 462–464, 468, 469, 471,
Simultaneous eigenstate, 66, 68, 73, 79, 86, 90, 475, 485, 486, 496, 582, 584, 588, 679,
574–580 685, 692, 693, 697, 700, 802, 828
Singleton, 196 Square roots, 249, 288, 367, 790
Single-valuedness, 206, 215, 251 Standard deviation, 52
Singularity, 208, 211, 224, 225, 228, 229, 240, Stark effect, 164, 172
244, 408 State vectors, 127, 131, 133, 153
Singular matrix, 555 Stationary current, 273, 275
Singular part, 224, 225 Stationary waves, 331–337, 343, 375
Singular point Stimulated absorption, 346, 355
isolated, 235 Stimulated emission, 346, 347, 354, 355
regular, 169, 206, 228 Stirling’s interpolation formula, 586
Sinusoidal oscillation, 127, 129, 133 Stokes’ theorem, 296
Skew-symmetric, 590, 592, 600, 807, 867, 871, Strictly positive, 17
872, 876, 891 Structure constants, 872, 895
914 Index

Structure–property relationship, 359 T


Strum Liouville system, 427 Taylor’s expansion, 217, 219, 223, 226
Subduced representation, 750, 759, 765 Taylor’s series, 96, 216–222, 224, 234, 237,
Subgroups, 621, 624–626, 630, 632, 648, 649, 243, 263
659, 662, 663, 706, 750, 757, 759, 764, Td, 647, 654, 659, 661, 662, 778, 781, 784, 797,
808, 866, 875, 883, 887, 888 890
Submatrix, 367, 559, 570, 782, 860 Tensor
Subsets, 182, 183, 185–193, 195, 196, 199, electric permittivity, 365
226, 435, 624, 625, 629, 871, 885–887, permittivity, 365, 569
891 Termwise integration, 218, 222
Subspaces, 185, 194, 199, 435, 436, 438, 443, Thiophene, 648, 701
445, 460, 468–474, 477, 478, 481, 490, Thiophene/phenylene co-oligomers (TPCOs),
501, 503, 504, 513, 514, 518, 521, 360, 362, 363
548–550, 559, 560, 577, 579, 624, 639, Three-dimensional, 57, 58, 132–142, 163, 167,
687, 688, 691, 780, 782, 871 273, 278, 280, 323, 342, 349, 436, 451,
Superior limit, 220 496, 501, 635, 638, 639, 659, 669, 731,
Superposed wave, 334, 335 733, 784, 785, 800, 803, 819, 864, 892
Superposition Three-dimensional Cartesian coordinate, 8, 650
of waves, 285–293 Time-averaged Poynting vectors, 309, 311
Surface integral, 295–297, 351 Time-evolved Schrödinger equation, 126
Surface term, 386–388, 390, 394, 397, 402, Topological groups, 190, 799, 802, 896
403, 407, 410, 414–418, 427, 594 Topological spaces, 181, 185–196, 199, 891,
Surjective, 447, 448, 452, 628, 629, 631, 881, 895, 896
895 Topology, 181–199
Surjective mapping, 631, 881 Torus, 194
Symmetric, 67, 131, 171, 175, 287, 348, 371, Total internal reflection, 321, 327–331, 372
399, 403, 547, 570–572, 600, 603, 604, Total reflection, 295, 303, 314–320, 323, 330
637, 638, 662, 723–726, 734, 738, 741, Totally symmetric, 741, 745, 797
742, 745, 751, 754, 755, 759, 769, 784, Totally symmetric ground state, 769, 796
796, 797, 834, 847, 858, 860, 863, 864 Totally symmetric representation, 734, 738,
Symmetric matrix, 131, 569–572, 604 741, 742, 745, 751, 754, 755, 759, 769,
Symmetric representation, 723–726, 734, 738, 784
741, 742, 745, 751, 754, 755, 759, 769, Trace, 247, 257, 259, 261, 262, 289, 292, 293,
784, 796, 860, 863, 864 461, 506, 568, 569, 654, 668, 672, 674,
Symmetry 690, 694, 697–699, 703, 704, 707, 708,
groups, 622, 637, 638, 641, 650, 662, 735, 746, 748, 757, 759, 765, 770, 822, 834,
738, 755 839, 871, 872, 875
operations, 635, 637, 638, 640, 642–654, Traceless, 810, 867, 871, 878
659, 661, 684, 690, 691, 702, 710, 729, Transformation
733, 737, 738, 745–747, 750, 751, 757, coordinate, 641
759, 761, 770, 778, 782, 783 group, 623, 635, 710, 799
requirement, 175, 646, 647, 785, 787–789 linear, 433, 438–440, 443, 444, 447, 448,
species, 650, 659, 719, 745, 757, 759, 774, 450–452, 454, 456, 458, 459, 468, 471,
796 474, 481, 490, 494, 501, 506, 519, 526,
transformations, 748, 759, 765, 780 533, 535, 537–541, 555, 556, 566, 623,
Symmetry-adapted linear combination (SALC), 629, 636, 663, 669, 677, 678, 683, 684,
691, 729, 745–747, 751, 757, 760, 761, 708, 723, 804, 825, 876, 878
764, 765, 767, 770, 771, 773, 774, 777, matrix, 441, 443, 451, 452, 454, 455, 459,
783–788, 860, 861 540, 782, 808, 812
Symmetry groups, 635–678 successive, 454, 456, 458, 669, 670, 672,
Syn-phase, 335, 337 817, 818, 833, 837
System of differential equations, 581, 592–600, Transition dipole moments, 127, 132
610, 617 Transition electric dipole moment, 127
System of linear equations, 450, 452
Index 915

Transition matrix elements, 131, 740, 755, 769, Unique solution, 403, 416, 419, 424, 447, 450,
775, 776 451
Transition probability, 127, 133, 138, 172, 346, Uniqueness
347, 741, 777 of decomposition, 486, 488, 512, 564
Translation symmetry, 637 of Green’s function, 401
Transmission of representation, 440, 450, 454, 632, 633
coefficient, 306–308 of solution, 18
of electromagnetic wave, 295–337 Unitarity, 683, 713, 714, 718, 740, 808, 816,
irradiance, 311 819, 823, 851, 869
of light, 297 Unitary
Transmittance, 311 diagonalization, 290, 556–564, 568, 573,
Transpose 667, 804
complex conjugate, 22, 537 group, 799, 870–872, 875
Transposed matrix, 22, 383, 471, 523, 537 matrix, 135, 290, 530, 533, 534, 541, 547,
Transverse electric (TE) mode, 304, 324, 326, 556–559, 561, 564–568, 573, 578, 590,
329 592, 601, 604, 667, 679–681, 687, 690,
Transverse electric (TE) wave, 295, 303–312, 699, 701, 714, 746, 763, 768, 774, 782,
314, 316, 320–331 804, 805, 808, 810–812, 814, 820, 821,
Transverse magnetic (TM) mode, 304, 368, 847, 850, 854, 863, 867, 875, 876
371, 372 operator, 547–580, 860, 869, 870
Transverse magnetic (TM) wave, 295, representation, 679–681, 683, 687, 695,
303–308, 310, 312–314, 316, 318, 714, 825, 830, 843
320–331 transformation, 135, 140, 541, 566, 740,
Transverse waves, 281, 282, 352 763, 764, 770, 771, 774, 812, 849, 851,
Trial function, 174, 175, 177 852, 854, 860, 863
Triangle inequality, 525 Unitary similarity transformation, 532, 554,
Triangle matrix 556, 557, 559, 560, 563, 578, 580, 600,
lower, 462, 512, 553, 606 609, 668, 689, 782, 801, 804, 805, 822,
upper, 462, 464, 476, 485, 496, 553, 606 863, 875, 889
Trigonometric formula, 101, 130, 309, 312, Unit polarization vector
313, 330, 842, 884 of electric field, 127, 284, 303, 350, 352,
Trigonometric functions, 103, 140, 241, 319, 755, 796
639, 815, 836 of magnetic field, 304, 352
Triple root, 474, 501, 503, 504, 759 Unit set, 196
Triply degenerate, 789, 795, 796 Unit vectors, 4, 129, 130, 273, 279, 280,
Trivial, 21, 26, 184, 192, 299, 381, 434, 437, 296–298, 303–305, 325, 327, 352, 364,
475, 480, 489, 555, 567, 570, 585, 607, 436, 540, 611, 803
622, 664, 667, 693, 741, 803, 819, 860, Universal covering group, 896
867 Universal set, 182–185
Trivial solution, 399, 401, 409, 426, 453, 460, Unknowns, 153, 284, 305, 596, 598, 600
522 Unperturbed
Trivial topology, 195 state, 153, 158
T1-space, 195–196, 896 system, 152, 153
Two-level atoms, 339, 345–349, 354, 355, 375 Upper right off-diagonal elements, 450, 462
Upper triangle matrix, 462, 464, 476, 485, 496,
553, 606
U
Ultraviolet catastrophe, 339, 345
Ultraviolet light, 363
Unbounded, 25, 29, 216 V
Uncertainty principle, 29, 51–55 Variable transformation, 46, 47, 78, 140, 160,
Uncountable, 182, 621 234, 278
Underlying set, 185 Variance, 51–55
Undetermined constants, 74, 75, 84 Variance operator, 51
Uniformly convergent, 216, 218–222 Variational method, 151, 172–179
916 Index

Variational principle, 36, 110 Wigner formula, 814


Vector Wronskian, 15, 298, 381, 383, 409, 412
transformation, 182, 433–454, 459, 487,
505, 538, 635, 678, 803
Vector analysis, 269, 276, 281 X
Vector space X-ray, 4–6
complex, 136
finite-dimensional, 433, 435, 448, 449, 748
linear, 433, 438, 440, 444, 445, 448, 459, Y
518, 523, 524, 559, 575, 581, 583, 624, Yukawa potential, 107
627, 691, 872
n-dimensional, 433, 434, 437, 440, 448
Venn diagrams, 182–184, 186 Z
Vertical incidence, 303, 305, 312 Zenithal angle, 70, 674, 831, 839, 895
Zero matrix, 27, 469, 471, 475, 476, 487–489,
512, 562, 585, 587, 868, 891
W Zeros, 5, 48, 69, 132, 176, 204, 270, 318, 340,
Wave equations, 8, 269, 279, 284, 322, 324, 408, 434, 460, 532, 553, 607, 624, 639,
341, 342 695, 743, 806
Wave front, 328
Waveguide, 295, 319–331, 359, 371, 373
Wavelength dispersion, 358–362, 373 Δ
Wavelengths, 4–6, 149, 302, 303, 310, 334, δ functions, 396, 397, 399, 402, 422
336, 337, 357–359, 364, 372, 374, 611
Wavenumber, 4–6, 278, 297, 298, 301, 315,
321, 323, 326, 328, 357, 364 Θ
Wavenumber vectors, 4, 5, 279, 297, 298, 301, θ function, 422
315, 321, 323, 355, 364
Wavevector, 363, 364, 372, 373
Wave zone, 350–352 Π
Weak damping, 420 π-electron approximation, 767, 777
Weight function, 49, 384, 391, 393, 395, 405,
410, 422, 426–428, 593, 594

You might also like