M504Lect1 - Schur's Unitary Triangularization Theorem
This lecture introduces the notion of unitary equivalence and presents Schur’s theorem and
some of its consequences. It roughly corresponds to Sections 2.1, 2.2, and 2.4 of the textbook.
1 Unitary matrices
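Definition 1. A matrix $U \in M_n$ is called unitary if
\[
U U^* = I \quad (= U^* U).
\]
If $U$ is a real matrix (in which case $U^*$ is just $U^{\top}$), then $U$ is called an orthogonal matrix.
Note that a unitary matrix $U$ satisfies $|\det U| = 1$, since $|\det U|^2 = \det(U^*)\det(U) = \det(U^* U) = 1$.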
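Examples: Matrices of reflections and of rotations are unitary (in fact, orthogonal) matrices.
For instance, in 3D-space,
\[
\text{reflection along the $z$-axis:}\quad
U = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix},
\qquad \det U = -1,
\]
\[
\text{rotation along the $z$-axis:}\quad
U = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix},
\qquad \det U = 1.
\]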
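As a quick numerical sanity check of these two examples, the following minimal sketch verifies $UU^{\top} = I$ and the stated determinants. The angle `theta = 0.7` and the helper names (`matmul`, `transpose`, `det3`, `is_identity`) are illustrative choices, not part of the lecture.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    """Transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*a)]

def det3(a):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))

def is_identity(a, tol=1e-12):
    """True if a is the identity matrix up to the tolerance tol."""
    return all(abs(a[i][j] - (1.0 if i == j else 0.0)) <= tol
               for i in range(len(a)) for j in range(len(a)))

theta = 0.7  # arbitrary illustrative angle (an assumption of this sketch)
reflection = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, -1.0]]
rotation = [[math.cos(theta), -math.sin(theta), 0.0],
            [math.sin(theta), math.cos(theta), 0.0],
            [0.0, 0.0, 1.0]]

for name, u in (("reflection", reflection), ("rotation", rotation)):
    # Both matrices are real, so U* is simply the transpose.
    print(name, is_identity(matmul(u, transpose(u))), round(det3(u), 12))
```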
That these matrices are unitary is best seen using one of the alternate characterizations listed
below.
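Theorem 2. Given $U \in M_n$, the following statements are equivalent:
(i) $U$ is unitary,
(ii) $U$ preserves the Hermitian norm, i.e., $\|Ux\|_2 = \|x\|_2$ for all $x \in \mathbb{C}^n$,
(iii) the columns $U_1, \dots, U_n$ of $U$ form an orthonormal system, i.e., $\langle U_i, U_j \rangle = \delta_{i,j}$ for all $i, j \in \{1, \dots, n\}$.
Proof. (i) ⇒ (ii). If $U^* U = I$, then, for any $x \in \mathbb{C}^n$,
\[
\|Ux\|_2^2 = \langle Ux, Ux \rangle = \langle U^* U x, x \rangle = \langle x, x \rangle = \|x\|_2^2.
\]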
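(ii) ⇒ (iii). Let $(e_1, e_2, \dots, e_n)$ denote the canonical basis of $\mathbb{C}^n$, so that the $j$th column of $U$ is $U_j = U e_j$. Assume that $U$ preserves the
Hermitian norm of every vector. We obtain, for $j \in \{1, \dots, n\}$,
\[
\langle U_j, U_j \rangle = \|U_j\|_2^2 = \|U e_j\|_2^2 = \|e_j\|_2^2 = 1.
\]
Moreover, for $i, j \in \{1, \dots, n\}$ with $i \neq j$, we have, for any complex number $\lambda$ of modulus 1,
\begin{align*}
\operatorname{Re} \langle \lambda U_i, U_j \rangle
&= \frac{1}{2} \left( \|\lambda U_i + U_j\|_2^2 - \|U_i\|_2^2 - \|U_j\|_2^2 \right)
 = \frac{1}{2} \left( \|U(\lambda e_i + e_j)\|_2^2 - \|U(e_i)\|_2^2 - \|U(e_j)\|_2^2 \right) \\
&= \frac{1}{2} \left( \|\lambda e_i + e_j\|_2^2 - \|e_i\|_2^2 - \|e_j\|_2^2 \right) = 0.
\end{align*}
This does imply that $\langle U_i, U_j \rangle = 0$ (argue for instance that $|\langle \lambda U_i, U_j \rangle| = \operatorname{Re} \langle \lambda U_i, U_j \rangle$ for a properly
chosen $\lambda \in \mathbb{C}$ with $|\lambda| = 1$).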
(iii) ⇒ (i). Observe that the $(i, j)$th entry of $U^* U$ is $U_i^* U_j$ to realize that (iii) directly translates
into $U^* U = I$.
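The characterizations of a unitary matrix by norm preservation and by orthonormal columns can be illustrated numerically. The sketch below uses a hypothetical 2×2 example $U = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & i \\ i & 1 \end{bmatrix}$ and an arbitrary test vector; the inner product conjugates its first argument, which is a convention chosen for this sketch only.

```python
import math

s = 1 / math.sqrt(2)
# Hypothetical example of a unitary matrix (not taken from the lecture).
U = [[s, s * 1j], [s * 1j, s]]

def apply(m, x):
    """Matrix-vector product m x."""
    return [sum(m[i][k] * x[k] for k in range(len(x))) for i in range(len(m))]

def inner(x, y):
    """Hermitian inner product, conjugating the first argument (sketch convention)."""
    return sum(a.conjugate() * b for a, b in zip(x, y))

def norm(x):
    """Hermitian norm of a vector."""
    return math.sqrt(inner(x, x).real)

x = [2 - 1j, 0.5 + 3j]  # arbitrary test vector

# Norm preservation: ||U x|| should equal ||x||.
print(norm(apply(U, x)), norm(x))

# Orthonormal columns: the Gram matrix of the columns should be the identity.
columns = [[U[i][j] for i in range(2)] for j in range(2)]
gram = [[inner(columns[i], columns[j]) for j in range(2)] for i in range(2)]
print(gram)
```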
According to (iii), a unitary matrix can be interpreted as the matrix of an orthonormal basis in
another orthonormal basis. In terms of linear maps represented by matrices $A$, the change of
orthonormal bases therefore corresponds to the transformation $A \mapsto U A U^*$ for some unitary
matrix $U$. This transformation defines the unitary equivalence.
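Definition 3. Two matrices $A, B \in M_n$ are called unitarily equivalent if there exists a unitary
matrix $U \in M_n$ such that
\[
B = U A U^*.
\]
Observation: If $A, B \in M_n$ are unitarily equivalent, then $\sum_{1 \le i,j \le n} |a_{i,j}|^2 = \sum_{1 \le i,j \le n} |b_{i,j}|^2$. Indeed, these two sums equal $\operatorname{tr}(A^* A)$ and $\operatorname{tr}(B^* B)$, respectively, and
\begin{align*}
\operatorname{tr}(B^* B) &= \operatorname{tr}\big( (U A U^*)^* (U A U^*) \big) = \operatorname{tr}( U A^* U^* U A U^* ) = \operatorname{tr}( U A^* A U^* ) = \operatorname{tr}( U^* U A^* A ) \\
&= \operatorname{tr}(A^* A).
\end{align*}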
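The invariance of the sum of squared moduli of the entries under unitary equivalence can be checked on a small example. This is only a sketch: the matrices `U` and `A` below are hypothetical, and `frob2` is an illustrative name for the quantity $\sum_{i,j} |a_{i,j}|^2$.

```python
import math

def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def adjoint(a):
    """Conjugate transpose of a square matrix."""
    n = len(a)
    return [[complex(a[j][i]).conjugate() for j in range(n)] for i in range(n)]

def frob2(a):
    """Sum of the squared moduli of the entries."""
    return sum(abs(z) ** 2 for row in a for z in row)

s = 1 / math.sqrt(2)
U = [[s, s * 1j], [s * 1j, s]]   # hypothetical unitary matrix
A = [[1, 2 + 1j], [0, -3j]]      # hypothetical test matrix
B = matmul(matmul(U, A), adjoint(U))  # B = U A U*

# The two printed values should agree up to rounding.
print(frob2(A), frob2(B))
```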
2 Statement of Schur’s theorem and some of its consequences
Schur’s unitary triangularization theorem says that every matrix is unitarily equivalent to a
triangular matrix. Precisely, it reads as follows.
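Theorem 4. Given $A \in M_n$ with eigenvalues $\lambda_1, \dots, \lambda_n$, counting multiplicities, there exists
a unitary matrix $U \in M_n$ such that
\[
A = U \begin{bmatrix}
\lambda_1 & x & \cdots & x \\
0 & \lambda_2 & \ddots & \vdots \\
\vdots & \ddots & \ddots & x \\
0 & \cdots & 0 & \lambda_n
\end{bmatrix} U^*.
\]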
Note that such a decomposition is far from unique (see Example 2.3.2 p.80-81). Let us now
state a few consequences of Schur’s theorem. First, the Cayley–Hamilton theorem says that
every square matrix is annihilated by its own characteristic polynomial.
Theorem 5. Given A ∈ Mn , one has
pA (A) = 0.
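For a 2×2 matrix, the characteristic polynomial is $p_A(x) = x^2 - \operatorname{tr}(A)\, x + \det(A)$, so the statement can be checked by hand or with a few lines of code. The sketch below uses a hypothetical integer matrix so that the arithmetic is exact.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[2, 1], [-1, 3]]  # hypothetical example matrix

trace = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
A2 = matmul(A, A)

# Evaluate p_A(A) = A^2 - tr(A) A + det(A) I entry by entry.
p_of_A = [[A2[i][j] - trace * A[i][j] + (det if i == j else 0)
           for j in range(2)] for i in range(2)]
print(p_of_A)  # prints [[0, 0], [0, 0]]
```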
The second consequence of Schur’s theorem says that every matrix is similar to a block-
diagonal matrix where each block is upper triangular and has a constant diagonal. This
is an important step in a possible proof of the Jordan canonical form.
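Theorem 6. Given $A \in M_n$ with distinct eigenvalues $\lambda_1, \dots, \lambda_k$, there is an invertible matrix
$S \in M_n$ such that
\[
A = S \begin{bmatrix}
T_1 & 0 & \cdots & 0 \\
0 & T_2 & \ddots & \vdots \\
\vdots & \ddots & \ddots & 0 \\
0 & \cdots & 0 & T_k
\end{bmatrix} S^{-1},
\qquad \text{where $T_i$ has the form} \quad
T_i = \begin{bmatrix}
\lambda_i & x & \cdots & x \\
0 & \lambda_i & \ddots & \vdots \\
\vdots & \ddots & \ddots & x \\
0 & \cdots & 0 & \lambda_i
\end{bmatrix}.
\]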
The arguments for Theorems 5 and 6 (see next section) do not use the unitary equivalence
stated in Schur’s theorem, but merely the equivalence (that is, a relation of the form $A = S T S^{-1}$ with $S$ invertible). The unitary equivalence is nonetheless
crucial for the final consequence of Schur’s theorem, which says that there are diagonalizable
matrices arbitrarily close to any matrix (in other words, the set of diagonalizable matrices is
dense in $M_n$).
Theorem 7. Given $A \in M_n$ and $\varepsilon > 0$, there exists a diagonalizable matrix $\widetilde{A} \in M_n$ such that
\[
\sum_{1 \le i,j \le n} |a_{i,j} - \widetilde{a}_{i,j}|^2 < \varepsilon.
\]
3 Proofs
Proof of Theorem 4. We proceed by induction on n ≥ 1. For n = 1, there is nothing to do.
Suppose now the result true up to an integer n − 1, n ≥ 2. Let A ∈ Mn with eigenvalues
λ1 , . . . , λn , counting multiplicities. Consider an eigenvector v1 associated to the eigenvalue
$\lambda_1$. We may assume that $\|v_1\|_2 = 1$. We use it to form an orthonormal basis $(v_1, v_2, \dots, v_n)$.
The matrix $A$ is equivalent to the matrix of the linear map $x \mapsto Ax$ relative to the basis
$(v_1, v_2, \dots, v_n)$, i.e.,
\[
A = V \begin{bmatrix}
\lambda_1 & x & \cdots & x \\
0 & & & \\
\vdots & & \widetilde{A} & \\
0 & & &
\end{bmatrix} V^{-1}, \tag{1}
\]
where $V = \begin{bmatrix} v_1 & \cdots & v_n \end{bmatrix}$ is the matrix of the system $(v_1, v_2, \dots, v_n)$ relative to the canonical basis.
Since this is a unitary matrix, the equivalence of (1) is in fact a unitary equivalence. Note
that $p_A(x) = (\lambda_1 - x)\, p_{\widetilde{A}}(x)$, so that the eigenvalues of $\widetilde{A} \in M_{n-1}$, counting multiplicities, are
$\lambda_2, \dots, \lambda_n$. We use the induction hypothesis to find a unitary matrix $\widetilde{W} \in M_{n-1}$ such that
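\[
\widetilde{A} = \widetilde{W} \begin{bmatrix}
\lambda_2 & x & \cdots & x \\
0 & \ddots & \ddots & \vdots \\
\vdots & \ddots & \ddots & x \\
0 & \cdots & 0 & \lambda_n
\end{bmatrix} \widetilde{W}^*,
\qquad \text{i.e.,} \qquad
\widetilde{W}^* \widetilde{A}\, \widetilde{W} = \begin{bmatrix}
\lambda_2 & x & \cdots & x \\
0 & \ddots & \ddots & \vdots \\
\vdots & \ddots & \ddots & x \\
0 & \cdots & 0 & \lambda_n
\end{bmatrix}.
\]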
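Now observe that, in block form,
\begin{align*}
\begin{bmatrix} 1 & 0 \\ 0 & \widetilde{W} \end{bmatrix}^*
\begin{bmatrix} \lambda_1 & x \, \cdots \, x \\ 0 & \widetilde{A} \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ 0 & \widetilde{W} \end{bmatrix}
&= \begin{bmatrix} 1 & 0 \\ 0 & \widetilde{W}^* \end{bmatrix}
\begin{bmatrix} \lambda_1 & x \, \cdots \, x \\ 0 & \widetilde{A}\, \widetilde{W} \end{bmatrix}
= \begin{bmatrix} \lambda_1 & x \, \cdots \, x \\ 0 & \widetilde{W}^* \widetilde{A}\, \widetilde{W} \end{bmatrix} \\
&= \begin{bmatrix}
\lambda_1 & x & \cdots & x \\
0 & \lambda_2 & \cdots & x \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_n
\end{bmatrix}.
\end{align*}
Since $W := \begin{bmatrix} 1 & 0 \\ 0 & \widetilde{W} \end{bmatrix}$ is a unitary matrix, this reads
\[
\begin{bmatrix}
\lambda_1 & x & \cdots & x \\
0 & & & \\
\vdots & & \widetilde{A} & \\
0 & & &
\end{bmatrix}
= W \begin{bmatrix}
\lambda_1 & x & \cdots & x \\
0 & \lambda_2 & \cdots & x \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_n
\end{bmatrix} W^*. \tag{2}
\]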
Putting the unitary equivalences (1) and (2) together shows the result of Theorem 4 (with
U = V W ) for the integer n. This concludes the inductive proof.
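For $n = 2$ the induction consists of a single step: pick a unit eigenvector $v_1$, complete it to an orthonormal basis, and conjugate. The following sketch carries this out in plain Python for 2×2 matrices. The function name `schur_2x2`, the tolerance `1e-14`, and the test matrix are assumptions made for illustration; this is not a general or numerically robust Schur routine.

```python
import cmath
import math

def adjoint(m):
    """Conjugate transpose of a square matrix given as a list of rows."""
    n = len(m)
    return [[complex(m[j][i]).conjugate() for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def schur_2x2(A):
    """Return (U, T) with U unitary and T = U* A U upper triangular (2x2 only)."""
    (a, b), (c, d) = A
    # One eigenvalue, from the quadratic formula for the characteristic polynomial.
    lam = (a + d) / 2 + cmath.sqrt(((a - d) / 2) ** 2 + b * c)
    # Two candidate eigenvectors; each solves (A - lam I) v = 0 when it is nonzero.
    candidates = [(b, lam - a), (lam - d, c)]
    v = max(candidates, key=lambda w: abs(w[0]) ** 2 + abs(w[1]) ** 2)
    size = math.sqrt(abs(v[0]) ** 2 + abs(v[1]) ** 2)
    if size < 1e-14:
        # A is (numerically) a multiple of the identity: any unit vector works.
        v1 = (1.0 + 0j, 0j)
    else:
        v1 = (complex(v[0]) / size, complex(v[1]) / size)
    # Complete v1 to an orthonormal basis (v1, v2) of C^2.
    v2 = (-v1[1].conjugate(), v1[0].conjugate())
    U = [[v1[0], v2[0]], [v1[1], v2[1]]]
    T = matmul(matmul(adjoint(U), A), U)
    return U, T

U, T = schur_2x2([[1, 2], [3, 2]])  # hypothetical matrix with eigenvalues 4 and -1
print(T)  # upper triangular up to rounding, with the eigenvalues on the diagonal
```

Since $T = U^* A U$ with $U$ unitary, this is the same as $A = U T U^*$, the form used in Theorem 4.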
Now that Schur’s theorem is established, we may prove the consequences stated in Section 2.
Proof of Theorem 5. First attempt, valid when A is diagonalizable. In this case, there is a
basis (v1 , . . . , vn ) of eigenvectors associated to (not necessarily distinct) eigenvalues λ1 , . . . , λn .
It is enough to show that the matrix pA (A) vanishes on each basis vector vi , 1 ≤ i ≤ n. Note
that
pA (A) = (λ1 I − A) · · · (λn I − A) = [(λ1 I − A) · · · (λi−1 I − A)(λi+1 I − A) · · · (λn I − A)](λi I − A),
because (λi I − A) commutes with all (λj I − A). Then the expected result follows from
pA (A)(vi ) = [· · · ](λi I − A)(vi ) = [· · · ](0) = 0.
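Final proof. Let $\lambda_1, \dots, \lambda_n$ be the eigenvalues of $A \in M_n$, counting multiplicities. According
to Schur’s theorem, we can write
\[
A = S T S^{-1}, \qquad \text{where} \quad
T = \begin{bmatrix}
\lambda_1 & x & \cdots & x \\
0 & \lambda_2 & \cdots & x \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_n
\end{bmatrix}.
\]
Since $P(A) = S P(T) S^{-1}$ for any polynomial $P$, it is enough to show that $p_A(T) = (\lambda_1 I - T) \cdots (\lambda_n I - T) = 0$. This follows by observing, inductively on $k$, that the first $k$ columns of $(\lambda_1 I - T) \cdots (\lambda_k I - T)$ are zero.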
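For an upper triangular matrix $T$ with diagonal entries $\lambda_1, \dots, \lambda_n$, one can watch the product $(\lambda_1 I - T) \cdots (\lambda_k I - T)$ lose one more leading column at each step until it vanishes for $k = n$. The sketch below does this for a hypothetical 3×3 integer matrix; the helper names are illustrative.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def shifted(t, lam):
    """Return lam * I - t."""
    n = len(t)
    return [[(lam if i == j else 0) - t[i][j] for j in range(n)] for i in range(n)]

def leading_zero_columns(m):
    """Number of consecutive zero columns of m, starting from the left."""
    count = 0
    for j in range(len(m)):
        if any(m[i][j] != 0 for i in range(len(m))):
            break
        count += 1
    return count

T = [[1, 4, 5], [0, 2, 6], [0, 0, 3]]  # hypothetical upper triangular example
n = len(T)

product = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
zero_columns = []
for k in range(n):
    # Multiply by the next factor (lambda_k I - T), where lambda_k = T[k][k].
    product = matmul(product, shifted(T, T[k][k]))
    zero_columns.append(leading_zero_columns(product))

print(zero_columns)  # prints [1, 2, 3]
print(product)       # prints [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
```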
To establish the next consequence of Schur’s theorem, we will use the following result.
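Lemma 8. If $A \in M_m$ and $B \in M_n$ are two matrices with no eigenvalue in common, then the
matrices
\[
\begin{bmatrix} A & M \\ 0 & B \end{bmatrix}
\qquad \text{and} \qquad
\begin{bmatrix} A & 0 \\ 0 & B \end{bmatrix}
\]
are equivalent for any choice of $M \in M_{m \times n}$.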
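Proof. For $X \in M_{m \times n}$, consider the matrices $S$ and $S^{-1}$ given by
\[
S = \begin{bmatrix} I & X \\ 0 & I \end{bmatrix}
\qquad \text{and} \qquad
S^{-1} = \begin{bmatrix} I & -X \\ 0 & I \end{bmatrix}.
\]
We compute
\[
S^{-1} \begin{bmatrix} A & 0 \\ 0 & B \end{bmatrix} S = \begin{bmatrix} A & AX - XB \\ 0 & B \end{bmatrix}.
\]
The result will follow as soon as we can find $X \in M_{m \times n}$ such that $AX - XB = M$ for the prescribed $M \in M_{m \times n}$. Denoting by $F$ the linear map
\[
F : X \in M_{m \times n} \mapsto AX - XB \in M_{m \times n},
\]
it is enough to show that $F$ is surjective. But since $F$ is a linear map from $M_{m \times n}$ into itself,
it is therefore enough to show that $F$ is injective, i.e., that
\[
AX - XB = 0 \overset{?}{\Longrightarrow} X = 0. \tag{3}
\]
To see why this is true, let us consider $X \in M_{m \times n}$ such that $AX = XB$, and observe that
\[
A^2 X = A(AX) = A(XB) = (AX)B = XB^2, \qquad A^3 X = A(A^2 X) = A(XB^2) = (AX)B^2 = XB^3,
\]
etc., so that $P(A) X = X P(B)$ for any polynomial $P$. If we choose $P = p_A$ as the characteristic
polynomial of $A$, the Cayley–Hamilton theorem implies
\[
0 = p_A(A) X = X p_A(B). \tag{4}
\]
Denoting by $\lambda_1, \dots, \lambda_m$ the eigenvalues of $A$, we have $p_A(B) = (\lambda_1 I - B) \cdots (\lambda_m I - B)$. Note that
each factor $(\lambda_i I - B)$ is invertible, since none of the $\lambda_i$ is an eigenvalue of $B$, so that $p_A(B)$ is
itself invertible. We can now conclude from (4) that $X = 0$. This establishes (3), and finishes
the proof.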
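When $A$ and $B$ are upper triangular, the equation $AX - XB = M$ can be solved entry by entry, starting from the lower left corner of $X$. The sketch below does this with exact rational arithmetic and then checks that $S^{-1} \begin{bmatrix} A & 0 \\ 0 & B \end{bmatrix} S = \begin{bmatrix} A & M \\ 0 & B \end{bmatrix}$ for $S = \begin{bmatrix} I & X \\ 0 & I \end{bmatrix}$. The function names and the example matrices are hypothetical, and the triangular assumption is a simplification that the lemma itself does not need.

```python
from fractions import Fraction

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def solve_triangular_sylvester(a, b, m):
    """Solve A X - X B = M for upper triangular A, B with no common diagonal entry."""
    rows, cols = len(a), len(b)
    x = [[Fraction(0)] * cols for _ in range(rows)]
    for i in reversed(range(rows)):      # bottom row first
        for j in range(cols):            # leftmost column first
            rhs = Fraction(m[i][j])
            rhs -= sum(a[i][k] * x[k][j] for k in range(i + 1, rows))
            rhs += sum(x[i][l] * b[l][j] for l in range(j))
            x[i][j] = rhs / (a[i][i] - b[j][j])  # nonzero by assumption
    return x

def block(tl, tr, br):
    """Assemble the block matrix [[tl, tr], [0, br]]."""
    top = [r1 + r2 for r1, r2 in zip(tl, tr)]
    bottom = [[0] * len(tl[0]) + row for row in br]
    return top + bottom

A = [[1, 2], [0, 1]]   # hypothetical upper triangular block (eigenvalue 1)
B = [[3]]              # hypothetical block (eigenvalue 3)
M = [[4], [5]]

X = solve_triangular_sylvester(A, B, M)
AX, XB = matmul(A, X), matmul(X, B)
residual = [[AX[i][j] - XB[i][j] for j in range(len(B))] for i in range(len(A))]

identity = [[1, 0], [0, 1]]
S = block(identity, X, [[1]])
S_inv = block(identity, [[-entry for entry in row] for row in X], [[1]])
conjugated = matmul(matmul(S_inv, block(A, [[0], [0]], B)), S)

print(residual == M)                  # prints True
print(conjugated == block(A, M, B))   # prints True
```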
We could have given a less conceptual proof of Lemma 8 in case both A and B are upper
triangular (see exercises), which is actually what the proof presented below requires.
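Proof of Theorem 6. Listing the eigenvalues of $A$ so that equal eigenvalues are adjacent, Schur’s theorem shows that $A$ is equivalent to a block upper triangular matrix whose diagonal blocks $T_1, \dots, T_k$ are upper triangular, each $T_i$ having constant diagonal $\lambda_i$.
We now use Lemma 8 repeatedly in the chain of equivalences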
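\begin{align*}
A &\sim
\begin{bmatrix}
T_1 & X & \cdots & \cdots & X \\
0 & T_2 & X & \cdots & X \\
\vdots & 0 & T_3 & \ddots & \vdots \\
\vdots & \vdots & \ddots & \ddots & X \\
0 & 0 & \cdots & 0 & T_k
\end{bmatrix}
\sim
\begin{bmatrix}
T_1 & 0 & \cdots & \cdots & 0 \\
0 & T_2 & X & \cdots & X \\
\vdots & 0 & T_3 & \ddots & \vdots \\
\vdots & \vdots & \ddots & \ddots & X \\
0 & 0 & \cdots & 0 & T_k
\end{bmatrix}
=
\begin{bmatrix}
\begin{bmatrix} T_1 & 0 \\ 0 & T_2 \end{bmatrix} & X & \cdots & X \\
0 & T_3 & \ddots & \vdots \\
\vdots & \ddots & \ddots & X \\
0 & \cdots & 0 & T_k
\end{bmatrix} \\
&\sim
\begin{bmatrix}
\begin{bmatrix} T_1 & 0 \\ 0 & T_2 \end{bmatrix} & 0 & \cdots & 0 \\
0 & T_3 & \ddots & X \\
\vdots & \ddots & \ddots & X \\
0 & \cdots & 0 & T_k
\end{bmatrix}
=
\begin{bmatrix}
T_1 & 0 & 0 & \cdots & 0 \\
0 & T_2 & 0 & \cdots & 0 \\
0 & 0 & T_3 & \ddots & X \\
\vdots & \vdots & \ddots & \ddots & X \\
0 & 0 & \cdots & 0 & T_k
\end{bmatrix}
\sim \cdots \\
&\sim
\begin{bmatrix}
T_1 & 0 & \cdots & 0 \\
0 & T_2 & \ddots & \vdots \\
\vdots & \ddots & \ddots & 0 \\
0 & \cdots & 0 & T_k
\end{bmatrix}.
\end{align*}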
This is the announced result.
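Proof of Theorem 7. Let $\lambda_1, \dots, \lambda_n$ denote the eigenvalues of $A$, counting multiplicities. According to Schur’s
theorem, there exists a unitary matrix $U \in M_n$ such that
\[
A = U \begin{bmatrix}
\lambda_1 & x & \cdots & x \\
0 & \lambda_2 & \ddots & \vdots \\
\vdots & \ddots & \ddots & x \\
0 & \cdots & 0 & \lambda_n
\end{bmatrix} U^*.
\]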
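If $\widetilde{\lambda}_i := \lambda_i + i\eta$ for $1 \le i \le n$ (here $i$ is the index, not the imaginary unit) and $\eta > 0$ is small enough to guarantee that $\widetilde{\lambda}_1, \dots, \widetilde{\lambda}_n$ are all distinct, we set
\[
\widetilde{A} = U \begin{bmatrix}
\widetilde{\lambda}_1 & x & \cdots & x \\
0 & \widetilde{\lambda}_2 & \ddots & \vdots \\
\vdots & \ddots & \ddots & x \\
0 & \cdots & 0 & \widetilde{\lambda}_n
\end{bmatrix} U^*.
\]
In this case, the eigenvalues of $\widetilde{A}$ (i.e., $\widetilde{\lambda}_1, \dots, \widetilde{\lambda}_n$) are all distinct, hence $\widetilde{A}$ is diagonalizable.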
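We now notice that
\[
\sum_{1 \le i,j \le n} |a_{i,j} - \widetilde{a}_{i,j}|^2 = \operatorname{tr}\big( (A - \widetilde{A})^* (A - \widetilde{A}) \big).
\]
But since $A - \widetilde{A}$ is unitarily equivalent to the diagonal matrix $\operatorname{diag}[\lambda_1 - \widetilde{\lambda}_1, \dots, \lambda_n - \widetilde{\lambda}_n]$, this
quantity equals $\sum_{1 \le i \le n} |\lambda_i - \widetilde{\lambda}_i|^2$. It follows that
\[
\sum_{1 \le i,j \le n} |a_{i,j} - \widetilde{a}_{i,j}|^2 = \sum_{1 \le i \le n} i^2 \eta^2 < \varepsilon,
\]
provided $\eta$ is chosen small enough to have $\eta^2 < \varepsilon / \sum_{1 \le i \le n} i^2$.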
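The perturbation used in this proof is easy to try on a matrix that is already upper triangular, so that one may take $U = I$. The sketch below shifts the $i$th diagonal entry by $i\eta$ and reports the squared distance $\eta^2 \sum_i i^2$. The function name `perturb_diagonal`, the particular choice of $\eta$, and the Jordan block example are assumptions made for illustration; distinctness is tested with exact float comparison, which is adequate only for a toy example.

```python
import math

def perturb_diagonal(t, eps):
    """Shift the i-th diagonal entry of upper triangular t by i * eta (i = 1..n).

    eta is chosen so that eta^2 * sum(i^2) < eps, then halved until the new
    diagonal entries are pairwise distinct.
    """
    n = len(t)
    weight = sum(i * i for i in range(1, n + 1))   # sum of i^2 for i = 1..n
    eta = 0.5 * math.sqrt(eps / weight)            # eta^2 * weight = eps / 4 < eps
    while True:
        diag = [t[i][i] + (i + 1) * eta for i in range(n)]
        if len(set(diag)) == n:
            break
        eta /= 2
    perturbed = [row[:] for row in t]
    for i in range(n):
        perturbed[i][i] = diag[i]
    return perturbed, eta

# A Jordan block: upper triangular (so U = I here) and not diagonalizable.
T = [[1.0, 1.0], [0.0, 1.0]]
eps = 1e-2
T_tilde, eta = perturb_diagonal(T, eps)

squared_distance = sum(abs(T[i][j] - T_tilde[i][j]) ** 2
                       for i in range(2) for j in range(2))
print(T_tilde, squared_distance < eps)
```

The perturbed matrix has two distinct diagonal entries, hence two distinct eigenvalues, and is therefore diagonalizable.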
4 Exercises
Ex.6: Prove that a matrix $U \in M_n$ is unitary iff it preserves the Hermitian inner product,
i.e., iff $\langle Ux, Uy \rangle = \langle x, y \rangle$ for all $x, y \in \mathbb{C}^n$.
Ex.10: Given an invertible matrix $A \in M_n$, show that its inverse $A^{-1}$ can be expressed as a
polynomial of degree $\le n - 1$ in $A$.
Ex.11: Without using the Cayley–Hamilton theorem, prove that if $T \in M_m$ and $\widetilde{T} \in M_n$ are two
upper triangular matrices with no eigenvalue in common, then the matrices
\[
\begin{bmatrix} T & M \\ 0 & \widetilde{T} \end{bmatrix}
\qquad \text{and} \qquad
\begin{bmatrix} T & 0 \\ 0 & \widetilde{T} \end{bmatrix}
\]
are equivalent for any choice of $M \in M_{m \times n}$.
[Hint: Observe that you need to show $TX = X\widetilde{T} \Longrightarrow X = 0$. Start by considering the
element in the lower left corner of the matrix $TX = X\widetilde{T}$ to show that $x_{m,1} = 0$, then
consider the diagonal $i - j = m - 2$ (the one just above the lower left corner) of the
matrix $TX = X\widetilde{T}$ to show that $x_{m-1,1} = 0$ and $x_{m,2} = 0$, etc.]