
MA1513

Chapter 3: (Linear Algebra) Linear Transformation, Eigenvalues and Eigenvectors


3.1 Linear Transformation
3.2 Eigenvalues and Eigenvectors
3.3 Eigenspaces
3.4 Diagonalizable Matrices
3.5 Diagonalization
3.6 Powers of Matrices

3.1 Linear Transformation


In this chapter, we are going to look at matrices from a different perspective by treating them as
transformations. More precisely, we shall see how a matrix transforms a vector into another
vector through matrix multiplication. This provides a geometrical interpretation of matrices in the
lower dimensional cases.
Matrix as a Mapping (slide 2)
 0 − 1
We start with a simple 2 x 2 matrix A =  1 0 
 
1 
Given this 2-vector u=  which we treat it as an input vector,
1 
we shall process it under matrix A by means of multiplication.
The resulting product Au is another 2-vector which we call it the output vector, or the image.

In other words, we can think of a matrix as a mapping from R2 to R2.


Geometrically, this mapping transforms the arrow representing the vector u into the arrow
representing the vector Au.

Here we denote the mapping by the letter T, which stands for transformation.
Notation:
T : R2 → R2 defined by T(u) = Au for all u in R2
We shall call such a mapping a linear transformation.

2 x 2 Linear Transformation (slide 3)

We now look at a general input vector $u = \begin{pmatrix} x \\ y \end{pmatrix}$.

We process this input vector as before and get the output vector

$$Au = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} -y \\ x \end{pmatrix}.$$

This gives us a formula for the linear transformation:

T : R2 → R2 given by $T\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} -y \\ x \end{pmatrix}$.

The geometrical interpretation of this matrix can now be seen more clearly by considering a slightly
more complicated diagram, like the letter F.

We see that the resulting diagram is the letter F “lying down”. In fact, the effect of this
transformation is a 90° counterclockwise rotation about the origin.
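
As a quick check of this geometric interpretation, here is a small NumPy sketch (the sample points tracing the letter F are made-up illustrative values, not taken from the slide):

```python
import numpy as np

# The 2 x 2 matrix from the example: a 90-degree counterclockwise rotation
A = np.array([[0.0, -1.0],
              [1.0,  0.0]])

# A few illustrative points sketching a letter "F", one point per column
points = np.array([[0, 0], [0, 2], [1, 2], [0, 1], [0.8, 1]], dtype=float).T

images = A @ points   # each column is the image of the corresponding input point
print(images.T)       # every (x, y) becomes (-y, x): the figure is rotated 90 degrees
```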
Geometric Transformations of R2 (slide 4)

Indeed, many 2 x 2 matrices give transformations that have interesting geometrical interpretation on
the xy-plane. We shall give the 4 basic types of transformation, namely rotation, reflection, scaling
and shearing.

Rotation

The matrix $\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$ gives a counterclockwise rotation through an angle θ.

Reflection

The matrix $\begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}$ represents the reflection about the y-axis,
while the matrix $\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$ gives the reflection about the x-axis.

Scaling
The matrix $\begin{pmatrix} a & 0 \\ 0 & a \end{pmatrix}$ represents an enlargement if a > 1, and represents a contraction if 0 < a < 1.

When the two diagonal entries are different, the x and y scaling will not have the same proportion.

Shearing
The matrix $\begin{pmatrix} 1 & a \\ 0 & 1 \end{pmatrix}$ has a shearing effect parallel to the x-axis. How much the letter is slanted
depends on the value of a, and it is slanted to the left if a < 0.

Similarly, the matrix $\begin{pmatrix} 1 & 0 \\ a & 1 \end{pmatrix}$ has a shearing effect parallel to the y-axis.

m x n Linear Transformation (slide 5)


Now we look at a general m x n matrix A and go through the same procedure. In order to perform
the matrix multiplication, the input vector u must be an n-vector and the output vector Au will be an
m-vector.

So this gives a mapping T with domain Rn and codomain Rm.


T is called a linear transformation from Rn to Rm with notation
T : Rn → Rm defined by T(u) = Au for all u ∈ Rn.
A is called the standard matrix of the linear transformation.
Suppose $A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}$ and $u = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}$.

This gives the formula for T:

$$T\left(\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}\right) = \begin{pmatrix} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\ \vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n \end{pmatrix}$$

Linear Transformation by Formula (slide 6)


Suppose we are given a mapping in terms of a formula as follows:

T : R2 → R3 such that $T\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x + y \\ 2x \\ -3y \end{pmatrix}$

We can regard x and y as two parameters, and rewrite the formula as a linear combination as shown
in the middle term below:

$$\begin{pmatrix} x + y \\ 2x \\ -3y \end{pmatrix} = x\begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix} + y\begin{pmatrix} 1 \\ 0 \\ -3 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 2 & 0 \\ 0 & -3 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}$$

Then this linear combination can be rewritten in matrix form with $\begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix}$ and $\begin{pmatrix} 1 \\ 0 \\ -3 \end{pmatrix}$ forming the two
columns of a 3 x 2 matrix $A = \begin{pmatrix} 1 & 1 \\ 2 & 0 \\ 0 & -3 \end{pmatrix}$.

So we see that T is a linear transformation with this given standard matrix A.
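
A short NumPy sketch (illustrative, not part of the notes) confirming that multiplication by this standard matrix reproduces the formula:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [2.0, 0.0],
              [0.0, -3.0]])   # the 3 x 2 standard matrix found above

def T_formula(x, y):
    # The transformation written out directly as a formula
    return np.array([x + y, 2 * x, -3 * y])

x, y = 2.0, 5.0
print(A @ np.array([x, y]))   # matrix form: [  7.   4. -15.]
print(T_formula(x, y))        # formula form: the same output
```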

Images of Standard Basis (slide 7)


Let T : Rn → Rm be any linear transformation with the standard matrix given by
$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}.$$
Suppose we use the standard basis vectors of Rn

as the input vectors of this mapping. Then


$$T(e_1) = Ae_1 = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}\begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} = \begin{pmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{pmatrix}$$

We notice that this output vector is precisely the first column of the standard matrix A.
Similarly, if we take the jth standard basis vector as the input vector, the output vector will
correspond to the jth column of A. In other words,
the image T(ej) = the jth column of A
and the matrix A = (T(e1) T(e2) ··· T(en)).

Properties of Linear Transformation (slide 8)
If T : Rn → Rm is a linear transformation, then
1. T(0) = 0 (T preserves zero vector)
2. T(u + v) = T(u) + T(v) (T preserves addition)
3. T(cu) = cT(u) (T preserves scalar multiplication)
4. T( c1u1 + c2u2 + ··· + ckuk ) = c1T(u1) + c2T(u2) + ··· + ckT(uk) (T preserves linear combinations)

All these properties are referred to as the linearity properties of the linear transformation. They
can be verified easily by writing T as multiplication of the input vector by the standard matrix A.
For example, in the last property, the left hand side can be written as pre-multiplying the linear
combination by A:
T( c1u1 + c2u2 + ··· + ckuk ) = A( c1u1 + c2u2 + ··· + ckuk )

= c1 Au1 + c2 Au2 + ··· + ck Auk

= c1T(u1) + c2T(u2) + ··· + ckT(uk)

where the second equality is a property of matrix multiplication.
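
As an informal numerical check of these linearity properties (a sketch with randomly generated data, not a proof), one can compare both sides of the last property for an arbitrary standard matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 2))              # an arbitrary 3 x 2 standard matrix
T = lambda x: A @ x                          # T(u) = Au

u, v = rng.standard_normal(2), rng.standard_normal(2)
c1, c2 = 2.5, -1.3

# T preserves linear combinations and the zero vector
print(np.allclose(T(c1 * u + c2 * v), c1 * T(u) + c2 * T(v)))   # True
print(np.allclose(T(np.zeros(2)), np.zeros(3)))                  # True
```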

Example: Linearity Property (slide 9)


We are given a linear transformation T : R2 → R2 with both the standard matrix and the formula unknown.
The only information we have is the images of two particular input vectors:

$$T\begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 4 \\ 2 \end{pmatrix} \quad\text{and}\quad T\begin{pmatrix} 1 \\ -1 \end{pmatrix} = \begin{pmatrix} 0 \\ 6 \end{pmatrix}.$$

Based on this piece of information, how do we find the images of $e_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $e_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$ under T?
We shall see that the linearity properties of T can help us.
First observe that the two input vectors form a basis for R2 (we just need to check that they are linearly
independent). This means it is possible to express $e_1$ and $e_2$ as linear combinations of this basis.
Using Gaussian elimination, or simply by inspection,

$$\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \tfrac{1}{2}\begin{pmatrix} 1 \\ 1 \end{pmatrix} + \tfrac{1}{2}\begin{pmatrix} 1 \\ -1 \end{pmatrix}$$

so by the linearity property and the given information above,

$$T\begin{pmatrix} 1 \\ 0 \end{pmatrix} = T\left(\tfrac{1}{2}\begin{pmatrix} 1 \\ 1 \end{pmatrix} + \tfrac{1}{2}\begin{pmatrix} 1 \\ -1 \end{pmatrix}\right) = \tfrac{1}{2}T\begin{pmatrix} 1 \\ 1 \end{pmatrix} + \tfrac{1}{2}T\begin{pmatrix} 1 \\ -1 \end{pmatrix} = \tfrac{1}{2}\begin{pmatrix} 4 \\ 2 \end{pmatrix} + \tfrac{1}{2}\begin{pmatrix} 0 \\ 6 \end{pmatrix} = \begin{pmatrix} 2 \\ 4 \end{pmatrix}.$$
Likewise,

$$\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \tfrac{1}{2}\begin{pmatrix} 1 \\ 1 \end{pmatrix} - \tfrac{1}{2}\begin{pmatrix} 1 \\ -1 \end{pmatrix}$$

so

$$T\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \tfrac{1}{2}T\begin{pmatrix} 1 \\ 1 \end{pmatrix} - \tfrac{1}{2}T\begin{pmatrix} 1 \\ -1 \end{pmatrix} = \tfrac{1}{2}\begin{pmatrix} 4 \\ 2 \end{pmatrix} - \tfrac{1}{2}\begin{pmatrix} 0 \\ 6 \end{pmatrix} = \begin{pmatrix} 2 \\ -2 \end{pmatrix}.$$
Recall from an earlier slide that the columns of the standard matrix A of the linear transformation are
given by the images of the standard basis vectors $e_1$ and $e_2$. We can therefore recover:

$$\text{Standard Matrix } A = \bigl(T(e_1)\ \ T(e_2)\bigr) = \begin{pmatrix} 2 & 2 \\ 4 & -2 \end{pmatrix}.$$
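
The same recovery can be scripted. Since A(u1 u2) = (T(u1) T(u2)), the standard matrix is (T(u1) T(u2))(u1 u2)⁻¹; a NumPy sketch with the numbers above:

```python
import numpy as np

U = np.array([[1.0, 1.0],
              [1.0, -1.0]])   # columns: the two given input vectors
V = np.array([[4.0, 0.0],
              [2.0, 6.0]])   # columns: their given images under T

A = V @ np.linalg.inv(U)      # A satisfies A @ U = V
print(A)                      # [[ 2.  2.]
                              #  [ 4. -2.]]
```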

Images of a basis (slide 10)
From the above example, we have the following generalization:

Given linear transformation T : Rn → Rm and a basis {u1, u2, …, un} for Rn.

For any v in Rn, we have v = c1u1 + c2u2 + ··· + cnun for some scalars c1, c2, …, cn, and hence

T(v) = c1T(u1) + c2T(u2) + ··· + cnT(un)

Hence, knowing the images T(u1), T(u2),…, T(un) is enough to determine:

• the image T(v) of any vector v in the domain Rn;


• the standard matrix of T;
• the formula of T.

We say:
the linear transformation T is completely determined by the images T(u1), T(u2), …, T(un) of the basis
{u1, u2, …, un}.

3.2 Eigenvalues and Eigenvectors
Given a square matrix, we can talk about the eigenvalues and eigenvectors associated with this matrix.
They are probably the most commonly used concepts in linear algebra when it comes to solving
engineering problems. In Chapter 4, we shall see how these concepts are used in solving
systems of differential equations.

Eigenvalues and Eigenvectors (slide 2)

Let A be an n x n square matrix, which we treat as a linear transformation.

Let x be a nonzero (input) vector in Rn. If the output vector Ax is a scalar multiple of x, then we call x
an eigenvector of A.

If we write Ax = λx, then the scalar λ (pronounced: “lambda”) is called an eigenvalue of A;

and x is said to be an eigenvector of A associated with the eigenvalue λ.

Geometrically (in 2- or 3-spaces), the arrows representing x and Ax are parallel, pointing either in the
same or opposite directions.

Example 1: Eigenvalues and Eigenvectors (slide 3)

 0.96 0.01 
A =  
 0.04 0.99 

Let’s use the following as the input vectors:


1  1 
x =   y =  
 4  − 1
Then

So x is an eigenvector of A with the eigenvalue 1.

And

So y is an eigenvector of A with the eigenvalue 0.95.
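
A quick NumPy check of these two claims (a sketch, not part of the original slides):

```python
import numpy as np

A = np.array([[0.96, 0.01],
              [0.04, 0.99]])
x = np.array([1.0, 4.0])
y = np.array([1.0, -1.0])

print(A @ x)                          # [1. 4.]       = 1 * x   -> eigenvalue 1
print(A @ y)                          # [ 0.95 -0.95] = 0.95*y  -> eigenvalue 0.95
print(np.allclose(A @ y, 0.95 * y))   # True
```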

Example 2: Eigenvalues and Eigenvectors (slide 4-5)

We shall look at another matrix B with input vectors x, y, z:

1 1 1  1  1 1
       
B = 1 1 1  x = 1  y=  0  z =  −2 
1 1 1  1   −1  1
       

First, we have

$$Bx = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 3 \\ 3 \\ 3 \end{pmatrix} = 3x$$

So x is an eigenvector associated with eigenvalue 3.

Before we move on to y and z, let us look at the scalar multiples of x. For instance,

$$B(2x) = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}\begin{pmatrix} 2 \\ 2 \\ 2 \end{pmatrix} = \begin{pmatrix} 6 \\ 6 \\ 6 \end{pmatrix} = 3(2x)$$

So 2x is also an eigenvector associated with eigenvalue 3.

In fact, for any non-zero scalar k, kx is an eigenvector associated with eigenvalue 3:

$$B(kx) = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}\begin{pmatrix} k \\ k \\ k \end{pmatrix} = \begin{pmatrix} 3k \\ 3k \\ 3k \end{pmatrix} = 3(kx)$$

In general,

if x is an eigenvector associated with eigenvalue λ,

then any non-zero scalar multiple kx will also be an eigenvector associated with the same eigenvalue λ.

Now for y and z,

$$By = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}\begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} = 0y \qquad Bz = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}\begin{pmatrix} 1 \\ -2 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} = 0z$$

So y and z are both eigenvectors associated with eigenvalue 0.

Note:

1. 0 can be an eigenvalue, but the zero vector 0 cannot be an eigenvector.

2. Two eigenvectors that are not scalar multiples of each other may have the same eigenvalue.

Finding Eigenvalues (slide 6)

Let A be an n x n square matrix. Without being given any input vector, how do we find the eigenvalues of A?

We will derive the answer using concepts that we have learned in earlier chapters.

First of all, by definition, we have

λ is an eigenvalue of A ⇔ Ax = λx for some nonzero column vector x

⇔ λx – Ax = 0 for some nonzero column vector x

We can further rewrite λx – Ax = 0 as (λI – A) x = 0.

So the condition now becomes:

(λI – A) x = 0 (*) for some nonzero column vector x (△)

The matrix equation (*) with x as unknown represents a homogeneous system with the coefficient
matrix λI – A.

Condition (△) implies this linear system has a non-trivial solution. Now recall that a homogeneous
system having a non-trivial solution means the coefficient matrix λI – A is singular, that is,

det(λI – A) = 0 (†)

Now if we expand the determinant using the inductive expansion (see section 1.7), the equation (†)
will turn into a polynomial equation in terms of λ. In order to find the eigenvalues λ, we need
to solve this polynomial equation.

We shall look at an example to elaborate the above.

Example 1: Finding Eigenvalues (slide 7)


 0.96 0.01 
A =  
 0.04 0.99 
  1 0   0.96 0.01   λ − 0.96 − 0.01
det(λI − A) = det λ   −    = = (λ − 0.96)(λ − 0.99) − (−0.01)(−0.04)
  0 1   0.04 0.99   − 0.04 λ − 0.99

So we have λ 2 1.95λ + 0.95 ,


det(λI − A) =− which is a polynomial in 𝜆𝜆 of degree 2.

To find the eigenvalues of A, we solve the polynomial equation:

=0
which gives the solutions 𝜆𝜆 = 1 and 0.95.
Hence the eigenvalues of A are 1 and 0.95.
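
For comparison, the same eigenvalues can be obtained numerically (an illustrative sketch):

```python
import numpy as np

A = np.array([[0.96, 0.01],
              [0.04, 0.99]])

print(np.roots([1, -1.95, 0.95]))   # roots of lambda^2 - 1.95*lambda + 0.95: 1 and 0.95
print(np.linalg.eigvals(A))         # the same two eigenvalues (possibly in a different order)
```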
Characteristic Polynomial (slide 8)
Let A be an n x n square matrix. Then det(λI – A) is a polynomial of degree n. This polynomial is called the
characteristic polynomial of A.

Example: The characteristic polynomial of $A = \begin{pmatrix} 0.96 & 0.01 \\ 0.04 & 0.99 \end{pmatrix}$ is λ² − 1.95λ + 0.95.

From the earlier discussion, we have:
λ is an eigenvalue of A ⇔ det(λI – A) = 0 ⇔ λ is a root of the characteristic polynomial

Example 2: Finding Eigenvalues (slide 9)


0 − 1 0
 
C = 0 0 2
1 1 1
 

To find the eigenvalues of C, we solve its characteristic polynomial:

(using inductive expansion)

= 𝜆𝜆3 − 𝜆𝜆2 − 2𝜆𝜆 + 2

It may not be easy to factorize a degree 3 polynomial. A short cut is to try guessing one possible root.
In this case, it is not difficult to guess the root λ = 1. Ultimately, the complete factorization of this
polynomial is given by

$$\det(\lambda I - C) = (\lambda - 1)(\lambda^2 - 2) = (\lambda - 1)\bigl(\lambda - \sqrt{2}\bigr)\bigl(\lambda + \sqrt{2}\bigr)$$

The eigenvalues of C are the roots of the above polynomial, which are 1, √2 and −√2.
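
A numerical sanity check of these roots (illustrative sketch):

```python
import numpy as np

C = np.array([[0.0, -1.0, 0.0],
              [0.0,  0.0, 2.0],
              [1.0,  1.0, 1.0]])

print(np.roots([1, -1, -2, 2]))   # roots of lambda^3 - lambda^2 - 2*lambda + 2
print(np.linalg.eigvals(C))       # approximately 1, 1.414..., -1.414..., i.e. 1 and +/- sqrt(2)
```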

Triangular Matrices (slide 10)


If A is a triangular matrix, the eigenvalues of A are all the diagonal entries of A.
In particular, the eigenvalues of a diagonal matrix are its diagonal entries.
For example,

$$\begin{pmatrix} -1 & 3.5 & 14 \\ 0 & 5 & -26 \\ 0 & 0 & 2 \end{pmatrix}$$

This is an upper triangular matrix with diagonal entries –1, 5 and 2. So the eigenvalues are –1, 5 and 2.

$$\begin{pmatrix} -2 & 0 & 0 \\ 99 & 0 & 0 \\ 10 & -4.5 & 10 \end{pmatrix}$$

This is a lower triangular matrix with diagonal entries –2, 0 and 10. So the eigenvalues are –2, 0 and 10.
Eigenvalue and Singularity (slide 11)
If 0 is an eigenvalue of A, then A is singular. Otherwise, A is non-singular.
Below is an explanation:
0 is an eigenvalue of A
⇔ 0 is a root of the characteristic polynomial det(λI – A)
⇔ det(0I – A) = 0
⇔ det(–A) = 0
⇔ det(A) = 0
⇔ A is singular
Note that the arguments go both ways.
Matrix with Complex Eigenvalues (slide 12)
To end this section, we make a remark about complex eigenvalues.
Given a matrix A, it is possible that the characteristic polynomial det(λI – A) may have complex roots.

Example: $A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$

The characteristic polynomial of this matrix is det(λI – A) = λ² + 1. It has complex roots: λ = ± i.

So this matrix has complex eigenvalues. The corresponding eigenvectors will have complex
components, i.e. they come from a vector space over the complex numbers. We shall see some
applications of complex eigenvalues and eigenvectors in solving systems of differential equations in
the last chapter.

3.3 Eigenspaces
We have seen in the previous section that, given a square matrix, there is a systematic way to find
the eigenvalues by solving the characteristic polynomial. In this section, we shall see how to find the
eigenvectors associated with a given eigenvalue. The collection of all such eigenvectors, together with
the zero vector, forms a subspace of some n-space. This subspace is called an eigenspace.
Eigenspace (slide 2)
Given an n x n square matrix A.
Suppose λ is an eigenvalue of A. Then det(λI – A) = 0. This means the homogeneous linear system
(λI – A) x = 0
with coefficient matrix λI – A has non-trivial solutions. In fact,
all the non-trivial solutions of the system (λI – A) x = 0 are the eigenvectors of A associated to λ.
The solution space of this homogeneous system is called the eigenspace of A associated with λ and
we denote this subspace of Rn by Eλ.
If u is a nonzero vector in Eλ, then u is an eigenvector of A associated with the eigenvalue λ.
Example 1: Eigenspace (slide 3-5)
 0.96 0.01 
A =  
 0.04 0.99 
In the previous section, we have found the eigenvalues of A to be 1 and 0.95 by solving its
characteristic polynomial det(𝜆𝜆I – A) = (𝜆𝜆 - 1)(𝜆𝜆 - 0.95). So correspondingly there are two eigenspace
E1 and E-0.95. associated to eigenvalues 1 and 0.95 respectively. We solve for these two eigenspaces
separately.

For λ = 1, the homogeneous system (λI – A) x = 0 is given by

$$(I - A)x = 0 \iff \begin{pmatrix} 1 - 0.96 & -0.01 \\ -0.04 & 1 - 0.99 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$

Using Gaussian elimination, we can solve the system and get the general solution

$$\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 0.25t \\ t \end{pmatrix} = t\begin{pmatrix} 0.25 \\ 1 \end{pmatrix}$$

with parameter t. We can pull out the parameter t as a scalar as shown on the right side above. This
means any non-zero scalar multiple of $\begin{pmatrix} 0.25 \\ 1 \end{pmatrix}$ is an eigenvector of A associated with the eigenvalue
1. The eigenspace is given by

$$E_1 = \text{span}\left\{\begin{pmatrix} 0.25 \\ 1 \end{pmatrix}\right\}$$

We can also take this eigenvector $\begin{pmatrix} 0.25 \\ 1 \end{pmatrix}$ (or any of its non-zero multiples) as a basis for E1.
We do the same for λ = 0.95.

$$(0.95I - A)x = 0 \iff \begin{pmatrix} 0.95 - 0.96 & -0.01 \\ -0.04 & 0.95 - 0.99 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$

Again using Gaussian elimination, we can solve the system and get the general solution

$$\begin{pmatrix} x \\ y \end{pmatrix} = t\begin{pmatrix} -1 \\ 1 \end{pmatrix}$$
Any non-zero scalar multiple of $\begin{pmatrix} -1 \\ 1 \end{pmatrix}$ is an eigenvector of A associated with the eigenvalue 0.95.

The eigenspace E0.95 is given by

$$E_{0.95} = \text{span}\left\{\begin{pmatrix} -1 \\ 1 \end{pmatrix}\right\}$$

and a basis for E0.95 is given by $\begin{pmatrix} -1 \\ 1 \end{pmatrix}$ or any of its non-zero multiples.
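
These eigenspace calculations can be mirrored in code. A sketch using SymPy's nullspace (assuming SymPy is available) returns a basis of each eigenspace in exact arithmetic:

```python
from sympy import Matrix, Rational, eye

A = Matrix([[Rational(96, 100), Rational(1, 100)],
            [Rational(4, 100), Rational(99, 100)]])

for lam in [1, Rational(95, 100)]:
    basis = (lam * eye(2) - A).nullspace()   # basis of the solution space of (lam*I - A)x = 0
    print(lam, basis)
# E_1 is spanned by (1/4, 1)^T (a multiple of (0.25, 1)^T), and E_0.95 by (-1, 1)^T.
```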
Example 2: Eigenspace (slide 6-7)
1 1 1 
 
B = 1 1 1 
1 1 1 
 

We have seen before that the eigenvalues of B are 3 and 0 by solving the characteristic polynomial
det(𝜆𝜆I – B) = (𝜆𝜆 - 3)𝜆𝜆2. So B has two eigenspaces E3 and E0 associated to eigenvalues 1 and 0.95
respectively.

These eigenspaces can be solved in a similar way. Here we will only find the eignespace E0.

We substitute λ = 0 in the homogeneous system (λI – B) x = 0 to get

$$(-B)x = 0 \iff \begin{pmatrix} -1 & -1 & -1 \\ -1 & -1 & -1 \\ -1 & -1 & -1 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$$

Using Gaussian elimination, we solve the system to get the general solution with two parameters s
and t:

$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = s\begin{pmatrix} -1 \\ 1 \\ 0 \end{pmatrix} + t\begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}$$

This means any non-zero linear combination of $\begin{pmatrix} -1 \\ 1 \\ 0 \end{pmatrix}$ and $\begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}$ is an eigenvector of B associated with
the eigenvalue 0. The eigenspace can be expressed as the linear span of these two vectors:

$$E_0 = \text{span}\left\{\begin{pmatrix} -1 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}\right\}$$

and these two vectors give a basis for the eigenspace E0.

Eigenspace of Identity Matrix (slide 8)

We use the 3 x 3 identity matrix as an example, but the result can be generalized to any n x n identity
matrix.

1 0 0
 
I3 =  0 1 0 
0 0 1
 

We have the following properties:

1. The identity matrix I3 has only one eigenvalue 1.

2. I3 has only one eigenspace E1.

3. The eigenspace E1 is R3.

Property 1 is easy to see, as I3 is a diagonal matrix, and the eigenvalues of a diagonal matrix are all its
diagonal entries, which is just 1 in this case. In fact, if you consider the characteristic polynomial of
the identity matrix, it is given by det(λI3 – I3) = (λ − 1)³, which has only one root, 1.

Property 2 follows from 1 immediately.

Property 3 says that any non-zero 3-vector in R3 is an eigenvector. To see this, we look at the matrix

$$\lambda I_3 - I_3 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$$

with λ = 1, which turns out to be the zero matrix. Then

(λI3 – I3) x = 0 ⟹ 0x = 0

This homogeneous system with the zero matrix as the coefficient matrix is satisfied by every vector in R3.
This shows that the eigenspace E1 is R3.

Multiplicity of Eigenvalues (slide 9)

For the three matrices we have seen in this section, let us recap their characteristic polynomials.
 0.96 0.01 
A =  
 0.04 0.99 

The characteristic polynomial of A is given by this factorized form det(𝜆𝜆I – A) = (𝜆𝜆 - 1)(𝜆𝜆 - 0.95).

Note that the power of each of the two factors (𝜆𝜆 - 1) and (𝜆𝜆 - 0.95) is 1. We call these powers the
respective multiplicities of the eigenvalues

1 1 1 
 
B = 1 1 1 
 
1 1 1 

The characteristic polynomial of B is det(𝜆𝜆I – B) = (𝜆𝜆 - 3) 𝜆𝜆2

The factor (𝜆𝜆 - 3) corresponding to eigenvalue 3 has power 1, and the factor 𝜆𝜆2 corresponding to
eigenvalue 0 has power 2. So we say the eigenvalue 3 has multiplicity 1, and the eigenvalue 0 has
multiplicity 2.
1 0 0
 
I3 =  0 1 0 
0 0 1
 

The characteristic polynomial of I is det(𝜆𝜆I – I) = (𝜆𝜆 - 1)3 has only one factor with power 3.

So the eigenvalue 1 has multiplicity 3.

Now let’s look at the dimension of the various eigenspaces. Recall that the dimension is the number
of basis vectors for a certain vector space.

For matrix A, we have dim E1 = 1, dim E0.95 = 1.

For matrix B, we have dim E3 = 1, dim E0 = 2.

For the identity matrix I, dim E1 = dim R3 = 3.

Observe that the dimension of the eigenspace is equal to the multiplicity of the corresponding
eigenvalues given above.

This is not true in general. However, there is a relationship between the two quantities.

Dimension of Eigenspace (slide 10)

Suppose the characteristic polynomial of a matrix A is factorized as follows, with all the common
factors grouped together:

$$\det(\lambda I - A) = (\lambda - \lambda_1)^{r_1}(\lambda - \lambda_2)^{r_2}\cdots(\lambda - \lambda_k)^{r_k}$$

What we know from this polynomial are: λ1 to λk are all the eigenvalues of A, and r1 to rk are the
respective multiplicities of these eigenvalues. If A is an n x n matrix, then

r1 + r2 + … + rk = n
dim Eλi ≤ ri for all i

The inequality above says that the number of basis vectors in each eigenspace cannot be more than
the multiplicity of the eigenvalue in the characteristic polynomial.

Here’s an example.

Suppose $\det(\lambda I - A) = (\lambda - 1)^1(\lambda - 2)^3(\lambda - 4)^2$ is the characteristic polynomial of a 6 x 6 matrix A. So
we know there are three eigenvalues: 1, 2 and 4.

What is the dimension of each of their eigenspaces?

Let’s look at E2 first. Since the multiplicity of eigenvalue 2 is equal to 3, we know

dim E2 ≤ 3.

In other words, the dimension can be 1, 2 or 3, but we can’t tell the exact value until we find the
eigenspace explicitly.

Similarly, for E4, the multiplicity of the eigenvalue 4 in the characteristic polynomial is 2, so

dim E4 ≤ 2.

As for E1, since the multiplicity of eigenvalue 1 in the characteristic polynomial is 1, we conclude that

dim E1 = 1

because dimension of an eigenspace cannot be anything less than 1.

3.4 Diagonalizable Matrices
A diagonal matrix is a special type of matrix that has many nice features. For matrices that are
not diagonal, we would like to “convert” them to diagonal matrices. Those matrices that can
be converted are known as diagonalizable matrices. In this section, we will give a more
precise description, and show how this notion is related to eigenvalues and eigenvectors.
Example: Power of Matrix (slide 2-3)
We start with an example as a motivation, which is about taking powers of square matrices.

$$A = \begin{pmatrix} 0.96 & 0.01 \\ 0.04 & 0.99 \end{pmatrix}$$

We want to compute $A^n = \begin{pmatrix} 0.96 & 0.01 \\ 0.04 & 0.99 \end{pmatrix}^n$.

We cannot simply raise each entry to the power n, because of the way we perform matrix multiplication.

What we will do is to perform a “factorization” on the matrix A in the following way:

$$\begin{pmatrix} 0.96 & 0.01 \\ 0.04 & 0.99 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 4 & -1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & 0.95 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 4 & -1 \end{pmatrix}^{-1}$$

The right hand side is the product of three matrices; we will explain later how they are derived.
(You may try to work backward to verify that the product is indeed equal to A.)

The two outer matrices in the product are inverses of each other, which we denote by P and P⁻¹, and
the matrix in the middle is a diagonal matrix, which we denote by D. So we have

A = PDP –1

Our objective is to raise A to the power n. So we do the same to the product on the right.

An = (PDP –1)n = (PDP –1)(PDP –1)(PDP –1) ···(PDP –1) (n times) (*)

(Note that (PDP –1)n ≠ PnDnP –n)

One nice property about matrix multiplication is the associative law, which allows us to rearrange
the parenthesis on the right hand side of (*):

PD(P –1P)D(P –1P)D(P –1P) ··· (P –1P)DP –1 (**)

We observe that there are many pairs of P –1P side by side in (**). All the intermediate pairs of P –1P
can be cancelled as their product is the identity matrix and we are left with a much simpler product:

An = PD D ··· DP –1 = PD nP –1.

Note that on the right hand side, we raise D to the power of n, while the powers of P and P-1 remain
unchanged.

As mentioned, diagonal matrices have many nice features, one of them being that multiplication is easy
to perform. In particular, raising a diagonal matrix to the power n is just raising its individual diagonal
entries to the power n. So for our D, we have

$$D^n = \begin{pmatrix} 1 & 0 \\ 0 & 0.95 \end{pmatrix}^n = \begin{pmatrix} 1^n & 0 \\ 0 & 0.95^n \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 0.95^n \end{pmatrix}$$

For example, if we want to compute A¹⁰⁰, instead of multiplying matrix A with itself 100 times, we
express it as:

$$A^{100} = PD^{100}P^{-1} = \begin{pmatrix} 1 & 1 \\ 4 & -1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & 0.95^{100} \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 4 & -1 \end{pmatrix}^{-1}$$

Essentially we just need to perform multiplication on three matrices as shown, and this is a much more
efficient way to get the answer.
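
A short NumPy sketch comparing the two routes to A¹⁰⁰ (repeated multiplication versus the PD¹⁰⁰P⁻¹ shortcut), using the matrices from this example:

```python
import numpy as np

A = np.array([[0.96, 0.01],
              [0.04, 0.99]])
P = np.array([[1.0, 1.0],
              [4.0, -1.0]])

A100_direct = np.linalg.matrix_power(A, 100)
A100_diag = P @ np.diag([1.0**100, 0.95**100]) @ np.linalg.inv(P)

print(np.allclose(A100_direct, A100_diag))   # True: both routes give the same matrix
print(A100_diag)
```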

Example 1: Diagonalizable Matrix (slide 4)


From our previous example, we have the factorization of A given by

$$\begin{pmatrix} 0.96 & 0.01 \\ 0.04 & 0.99 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 4 & -1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & 0.95 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 4 & -1 \end{pmatrix}^{-1}$$

which is represented algebraically by A = PDP⁻¹. We can rewrite this equality as:

P⁻¹AP = D

which in the expanded form is given by:

$$\begin{pmatrix} 1 & 1 \\ 4 & -1 \end{pmatrix}^{-1}\begin{pmatrix} 0.96 & 0.01 \\ 0.04 & 0.99 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 4 & -1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 0.95 \end{pmatrix}$$

What we have done is to convert a non-diagonal matrix A into a diagonal matrix D. When such a
conversion can be carried out, we say the matrix A is a diagonalizable matrix, and P is the matrix that
diagonalizes A to give us a diagonal matrix D.
Diagonalizable Matrix (slide 5)
Let’s state the definition more generally:
An n x n square matrix A is called diagonalizable if we can find a non-singular matrix P such that
P –1AP is a diagonal matrix:
$$P^{-1}AP = \begin{pmatrix} \lambda_1 & & & 0 \\ & \lambda_2 & & \\ & & \ddots & \\ 0 & & & \lambda_n \end{pmatrix}$$

We say: the matrix P diagonalizes A. Matrix P is necessarily non-singular, as we need to take its
inverse matrix.
Example 2: Diagonalizable Matrix (slide 6)

Here’s another example of 3 x 3 matrix B and the corresponding matrix P:


1 1 1  1 1 1 
   
B = 1 1 1  P = 1 0 − 2 
1 1 1  1 − 1 1 
   

We can check that P –1BP is a diagonal matrix!

So B is a diagonalizable matrix, and P diagonalizes B.

How do we get this matrix P? We shall see at the end of this section that P is related to the
eigenvectors of B.
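
A quick numerical check of this claim (sketch):

```python
import numpy as np

B = np.ones((3, 3))
P = np.array([[1.0,  1.0,  1.0],
              [1.0,  0.0, -2.0],
              [1.0, -1.0,  1.0]])

print(np.linalg.inv(P) @ B @ P)   # approximately diag(3, 0, 0), so P diagonalizes B
```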

Example: Non-diagonalizable Matrix (slide 7)

Not all square matrices are diagonalizable. Let us look at this example M, which is a non-
diagonalizable matrix.
2 0
M =  
1 2 

In other words, we cannot find a matrix P that diagonalizes M. To see why this is the case, we try to
argue by means of contradiction:

Assumption: Suppose there is a P such that P⁻¹MP is diagonal.

Let’s write this non-singular matrix as $P = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$.

Then det(P) = ad − bc ≠ 0 (*).

Now let us expand the product P⁻¹MP and equate it with some diagonal matrix $\begin{pmatrix} e & 0 \\ 0 & f \end{pmatrix}$.

Then

$$P^{-1}MP = \frac{1}{ad - bc}\begin{pmatrix} 2ad - ab - 2bc & -b^2 \\ a^2 & ab + 2ad - 2bc \end{pmatrix} = \begin{pmatrix} e & 0 \\ 0 & f \end{pmatrix}$$

The next thing is to compare the entries on both sides of the equality above.

For the (1,2)-entry, we have −b²/(ad − bc) = 0, which implies b = 0.

For the (2,1)-entry, we have a²/(ad − bc) = 0, which implies a = 0.

This will imply: det(P) = ad − bc = 0, which contradicts (*) above.

This means the above assumption is wrong, which implies that we cannot find the required matrix P.

This is an ad hoc method to show a non-diagonalizable matrix. In the next section, we will see a more
systematic way to determine whether a matrix is diagonalizable or not.

Diagonalizability (slide 8)

Let us make a comparison of the three matrices:


1 1 1 
 0.96 0.01    2 0
A =   B = 1 1 1  M =  
 0.04 0.99  1 1 1 
  1 2 

We have seen that A, B are diagonalizable, while M is not diagonalizable.

Let us now bring the eigenvalues and eigenvectors into the discussion.

A has two eigenvalues 1 and 0.95, B also has two eigenvalues 3 and 0, while M has one eigenvalue 2
(observe that M is a triangular matrix).

However, it is not the number of eigenvalues that decides whether the matrix is diagonalizable or not.
It is the eigenvectors.

For matrix A, the two eigenvalues each correspond to an eigenvector, namely $\begin{pmatrix} 1 \\ 4 \end{pmatrix}$ for eigenvalue 1 and $\begin{pmatrix} 1 \\ -1 \end{pmatrix}$ for eigenvalue 0.95.

For matrix B, we have one eigenvector associated to eigenvalue 3, namely $\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$,

and two eigenvectors associated to eigenvalue 0: $\begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}$ and $\begin{pmatrix} 1 \\ -2 \\ 1 \end{pmatrix}$.

For matrix M, I leave it to you to check that an eigenvector of M associated to eigenvalue 2 is given
by $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$.

Note that we are not saying the matrices above have only 2, 3 and 1 eigenvectors respectively. In fact,
these matrices have infinitely many eigenvectors.

For example, for M, any non-zero scalar multiple of $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$ is also an eigenvector.

The correct way to count is to include the condition “linearly independent”:

In other words, matrix A has two linearly independent eigenvectors; matrix B has three linearly
independent eigenvectors; while M has only one linearly independent eigenvector.

In the last case, there are “not enough” linearly independent eigenvectors. But how do we tell
whether there are enough eigenvectors for a given matrix? The answer is to compare it with the size
of the matrix.

Diagonalizability and Eigenvectors (slide 9)

The following result gives the condition for a matrix to be diagonalizable.

Let A be an n x n square matrix.

If A has n linearly independent eigenvectors, then A is diagonalizable

If we cannot find n eigenvectors of A that are linearly independent, then A is not diagonalizable.

(We will give an explanation of this result in the next segment.)

Using the examples above:

A is 2 x 2 and has 2 linearly independent eigenvectors. So A is diagonalizable.

B is 3 x 3 and has 3 linearly independent eigenvectors. So B is diagonalizable.

M is 2 x 2 and only has 1 linearly independent eigenvector. So M is non-diagonalizable.

Two Useful Observations (slide 10)

The following two observations not only explain the condition for diagonalizability, but are also
techniques commonly used in matrix multiplication.

Observation 1: Matrix multiplication of two matrices A and B.

If we write matrix B in terms of its columns b1 , b2 , ··· , bn. Then

AB = A(b1 b2 ··· bn) = (Ab1 Ab2 ··· Abn)

In other words, the 1st column of AB is given by Ab1, the 2nd column of AB is given by Ab2 and so on.

To illustrate, take any concrete example of a 3 x 3 matrix A multiplied by a 3 x 3 matrix B, and denote
the three columns of B as b1, b2, b3.

Then you can check that the first column of AB is Ab1, the second column is Ab2, and the third
column is Ab3.

Observation 2: Matrix multiplication of two matrices B and D, where D is a diagonal matrix.

Suppose the diagonal entries of D are d1 , d2 , ··· , dn, and the columns of B are b1 , b2 , ··· , bn. Then

BD = (b1 b2 ··· bn) D = (d1b1 d2b2 ··· dnbn)

Let us use an example with D having diagonal entries 2, 3 and 4.

Let b1, b2, b3 be the three columns of the matrix B. Checking the product BD
by columns, we see that column 1 is 2b1, column 2 is 3b2, and column 3 is 4b3.
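
Both observations are easy to verify numerically. The following sketch uses arbitrary illustrative matrices (not the ones from the slide figures):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.integers(-3, 4, size=(3, 3)).astype(float)
B = rng.integers(-3, 4, size=(3, 3)).astype(float)
D = np.diag([2.0, 3.0, 4.0])

# Observation 1: the j-th column of AB is A times the j-th column of B
print(np.allclose((A @ B)[:, 1], A @ B[:, 1]))      # True

# Observation 2: the j-th column of BD is d_j times the j-th column of B
print(np.allclose((B @ D)[:, 2], 4.0 * B[:, 2]))    # True
```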

Diagonalizability and Eigenvectors - proof (slide 11)

We want to show: if an n x n matrix A has n linearly independent eigenvectors, then it is


diagonalizable.

First, we denote the n eigenvectors by u1, u2, …, un and let λ1, λ2, …, λn be the corresponding
eigenvalues. (Note that some of these n eigenvalues may be the same.)

Then we form the n x n matrix P by using the n eigenvectors as its columns, and let D be the diagonal
matrix whose entries are the eigenvalues:

$$P = (u_1\ u_2\ \cdots\ u_n) \quad\text{and}\quad D = \begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{pmatrix}$$

Note that P is guaranteed to be non-singular because the columns are linearly independent.

Now AP = (Au1 Au2 ··· Aun) = (λ1u1 λ2u2 ··· λnun)

The first equality above follows from the first observation in the previous segment.

The second equality is due to u1, u2, …, un being eigenvectors of A, so that Aui = λiui.

Now from the second observation in the previous segment,

(λ1u1 λ2u2 ··· λnun) = (u1 u2 ··· un) D

The product on the right is precisely PD.

So putting everything together, we get AP = PD.

And by bringing the P on the right hand side to the left, we get P –1AP = D.

This means the matrix P diagonalizes A to give the diagonal matrix D and so we conclude that A is
diagonalizable.

3.5 Diagonalization
In this section, we will introduce a systematic procedure to determine whether a matrix is
diagonalizable or not. The approach will also show how to diagonalize a matrix. This process is called
diagonalization.
Algorithm for Diagonalization (slide 2)
Step 1: Solve the characteristic equation det(λI – A) = 0 to find all distinct eigenvalues λ1, λ2, …, λk.
Step 2: For each λi, find a basis Sλi for the eigenspace Eλi by solving (λiI – A)x = 0.

Step 3: Let S = Sλ1 ∪ Sλ2 ∪ … ∪ Sλk. (Then |S| is the total number of basis vectors from all the
eigenspaces.)
(a) If |S| < n, then A is not diagonalizable.
In this case, there are not enough linearly independent eigenvectors to form a matrix P that
diagonalizes A.
(b) If |S| = n, then A is diagonalizable.
In this case, if S = {u1, u2, …, un}, then the square matrix P = (u1 u2 ··· un) diagonalizes A. (A sketch of this algorithm in code is given below.)
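
Here is a sketch of the three steps in code, using SymPy for exact arithmetic (the helper name and structure are illustrative, not part of the notes; SymPy's built-in Matrix.diagonalize() does essentially the same job):

```python
from sympy import Matrix

def diagonalize_or_report(A):
    """Steps 1-3: eigenvalues, eigenspace bases, then compare |S| with n."""
    n = A.shape[0]
    S = []
    for lam, mult, basis in A.eigenvects():   # steps 1 and 2 together
        S.extend(basis)                       # collect a basis of each eigenspace
    if len(S) < n:                            # step 3(a): not diagonalizable
        return None
    P = Matrix.hstack(*S)                     # step 3(b): eigenvectors as columns
    return P, P.inv() * A * P                 # the second matrix is diagonal

result = diagonalize_or_report(Matrix([[1, 1, 1], [1, 1, 1], [1, 1, 1]]))
if result is not None:
    P, D = result
    print(P)
    print(D)    # diagonal, with the eigenvalues 3, 0, 0 (in some order) on the diagonal
```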
Example 1: Algorithm for Diagonalization (slide 3-4)

1 1 1 
 
B = 1 1 1 
1 1 1 
 

Step 1: By solving the characteristic polynomial, the eigenvalues are 3 and 0.


Step 2: For λ = 3, solve (3I – B) x = 0; for λ = 0, solve (0I – B) x = 0.

We have $S_3 = \left\{\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}\right\}$, a basis for E3, and $S_0 = \left\{\begin{pmatrix} -1 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}\right\}$, a basis for E0.

Step 3: |S| = |S3| + |S0| = 3, which agrees with the size 3 x 3 of matrix B.

So we conclude B is diagonalizable.

We form the matrix P using the three vectors in S as columns: $P = \begin{pmatrix} 1 & -1 & -1 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}$

Then $P^{-1}BP = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$

The diagonal matrix on the right has the eigenvalues of B as the diagonal entries. Note that 0
appears two times as there are two eigenvectors associated to this eigenvalue.
Note: In step 3, it is not necessary for you to actually perform the multiplication P-1BP. All you need
to do is to write down the diagonal matrix using the eigenvalues found in step 1.

Remark: Matrix P is not unique. There are many possible matrices that can diagonalize B to give a
diagonal matrix. We can use any other eigenvectors of B as the columns of P, as long as they are
linearly independent.

For example, P can be replaced by a matrix whose three columns are non-zero scalar multiples of the
original eigenvectors, since such columns are still eigenvectors of B.

We can also re-order the columns.

For example, a matrix Q can be obtained from P by moving the first column to the
last. As long as the three columns are linearly independent eigenvectors, it will
still diagonalize the matrix.

You can check that $Q^{-1}BQ = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 3 \end{pmatrix}$
Take note that the order of the diagonal entries on the right is rearranged accordingly.
Example 2: Non-Diagonalizable Matrix (slide 5)

 1 0 0
 
A =  1 2 0
 − 3 5 2
 

Step 1: The eigenvalues are 1 and 2. (A is a lower triangular matrix, so the eigenvalues can be read
off from the diagonal entries directly.)

Step 2: For λ = 1, solve (I – A) x = 0; for λ = 2, solve (2I – A) x = 0.

We have a basis $S_1 = \left\{\begin{pmatrix} 1 \\ -1 \\ 8 \end{pmatrix}\right\}$ for E1 and a basis $S_2 = \left\{\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}\right\}$ for E2.


Step 3: |S| = |S1| + |S2| = 2.

Since we only have two linearly independent eigenvectors, A is not diagonalizable.

Dimension of Eigenspace (slide 6)


Recall from section 3.3 the characteristic polynomial of an n x n matrix A:

$$\det(\lambda I - A) = (\lambda - \lambda_1)^{r_1}(\lambda - \lambda_2)^{r_2}\cdots(\lambda - \lambda_k)^{r_k}$$

where r1, r2, …, rk are called the multiplicities of the eigenvalues of A. And we have
r1 + r2 + … + rk = n and dim Eλi ≤ ri for all i.

Note that dim Eλi = |Sλi| in the algorithm above.

Case 1: If dim Eλi = ri for all i, then |S| = r1 + r2 + … + rk = n.
In this case, A is diagonalizable.
Case 2: If dim Eλi < ri for at least one i, then |S| < r1 + r2 + … + rk = n.
In this case, A is not diagonalizable.

Matrix with Maximum Distinct Eigenvalues (slide 7)

Let A be an n x n square matrix.

If A has n distinct eigenvalues, then A is diagonalizable.

Suppose the n eigenvalues of A are λ1, λ2, …, λn. Then each eigenvalue corresponds to an
eigenvector u1, u2, …, un. Since all the eigenvalues are distinct, these eigenvectors will be linearly
independent. So we have n linearly independent eigenvectors for A. This implies A is diagonalizable.
What this observation tells us is that, when we carry out the diagonalization algorithm, after step 1,
if we obtain n distinct eigenvalues, then we can straight away conclude that the matrix is
diagonalizable without going through the remaining steps.
However, if we need to find the matrix P that diagonalizes A, we still need to perform steps 2 and 3
of the algorithm to find explicit eigenvectors.

Example: Matrix with Maximum Distinct Eigenvalues (slide 8)


$$A = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 0 & 2 & 2 & 2 \\ 0 & 0 & 3 & 3 \\ 0 & 0 & 0 & 4 \end{pmatrix}$$

Observe that this is an upper triangular matrix with diagonal entries 1, 2, 3, 4.
So we know immediately that A has 4 distinct eigenvalues and hence A is a diagonalizable matrix.

$$B = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{pmatrix}$$

This is a diagonal matrix with diagonal entries 1, 1, 2, 2. So B has only 2 distinct eigenvalues.
Is B diagonalizable? All diagonal matrices are automatically diagonalizable.

So B is also a diagonalizable matrix despite not having the maximum number of distinct eigenvalues.
Hence, if a matrix is an n x n diagonalizable matrix, it need not have n distinct eigenvalues.
Diagonalization and Linear Transformation (slide 9)
Suppose T : R2 → R2 is defined by T(u) = Au where $A = \begin{pmatrix} 5 & 3 \\ 3 & 5 \end{pmatrix}$.

A is diagonalizable with eigenvalues 8 and 2, and eigenvectors $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$ and $\begin{pmatrix} -1 \\ 1 \end{pmatrix}$. (Check!)

So we let $P = \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}$ and $D = \begin{pmatrix} 8 & 0 \\ 0 & 2 \end{pmatrix}$.

Then the standard matrix A = PDP⁻¹ and T(u) = PDP⁻¹u.
We shall see how the input vector is transformed to the output vector in three steps as follows:
u → P⁻¹u → DP⁻¹u → PDP⁻¹u
We first draw the standard coordinate system (black arrows) corresponding to the standard basis
vectors. Then we construct a new coordinate system using our two eigenvectors $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$ and $\begin{pmatrix} -1 \\ 1 \end{pmatrix}$ as a
basis (red arrows), as shown in the diagram.
basis (red arrows), as shown in the diagram.

Then the new coordinate systems will have the new axes (blue dotted arrows) along the two
eigenvectors.
Now for the step u → P –1u , the resulting vector gives the coordinate vector relative to the new
system.
For the second step P –1u → DP –1u, the effect is to do a scaling in the new coordinate system.
To illustrate this, let us use a square as an input figure, and look at the effect.

After multiplying by P-1, we should regard the square in the new coordinate system.
Now the matrix $D = \begin{pmatrix} 8 & 0 \\ 0 & 2 \end{pmatrix}$ scales along the first axis by a factor of 8 as shown in the diagram above,
transforming the top right and bottom left corners of the square.
At the same time, D scales along the second axis by a factor of 2, transforming the top left and bottom
right corners of the square as shown.
So we obtain the four corners of the resulting figure, which is a rhombus.
Finally, DP –1u → PDP –1u, the effect is to bring the vector back to the original coordinate system.

3.6 Powers of Matrices
In this last section, we shall revisit powers of square matrices that are diagonalizable. We will also
give an example of the application of matrix powers to population modeling.
Iterative Systems (slide 2)
Many real life examples come in the form of iterative systems.
These are systems with various stages (usually over a period of time) such that:
• the stages can be described using n-vectors, x0, x1, x2, …
• consecutive stages are related by a fixed n x n matrix A: xk = Axk−1
where xk−1 and xk are n-vectors representing the (k−1)th and kth stages respectively.
For example, in a system where the stages are represented by 3-vectors, the matrix A is a 3 x 3 matrix.


The stage 1 vector is given by x1 = Ax0.

The stage 2 vector is given by x2 = Ax1 = A(Ax0) = A2x0.

In a similar manner, the stage n vector is given by


xn = Axn-1 = A(Axn-2) = A(A(Axn-3)) = … = Anx0
Power of Diagonalizable Matrices (slide 3)
Let A be a diagonalizable matrix of order n and P a non-singular matrix such that

$$P^{-1}AP = \begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{pmatrix}$$

Raising both sides to the power m, the left side can be simplified as (P⁻¹AP)ᵐ = P⁻¹AᵐP,

while the right hand side is given by the matrix

$$\begin{pmatrix} \lambda_1^m & & 0 \\ & \ddots & \\ 0 & & \lambda_n^m \end{pmatrix}$$

So we have

$$A^m = P\begin{pmatrix} \lambda_1^m & & 0 \\ & \ddots & \\ 0 & & \lambda_n^m \end{pmatrix}P^{-1}$$


This gives a simple way to compute and analyse large powers of A by just performing matrix
multiplication on three matrices.
Example: Power of Diagonalizable Matrices (slide 4)
 − 4 0 − 6
 
A= 2 1 2 
 3 0 5 
 

By going through the algorithm for diagonalization, we obtain (details skipped):


 − 2 0 − 1  − 1 0 0
   
P = 1 1 0  and P AP =  0 1 0 
−1

 1 0 1   0 0 2
   

 (−1)m 0 0 
Then  
A m
= P 0 1m 0  P−1
 0 0 2m 

In fact, since this matrix is non-singular, we can even apply negative powers. In particular, we
can let m = −1 to get A⁻¹ in the same form:

$$A^{-1} = P\begin{pmatrix} (-1)^{-1} & 0 & 0 \\ 0 & 1^{-1} & 0 \\ 0 & 0 & 2^{-1} \end{pmatrix}P^{-1}$$

Population Modeling (slide 5-7)
We now apply the power of matrix to population modeling.
Suppose the population in a certain country is divided into rural and urban. Studies show that every
year, 4% of rural population will move to the urban area, while 1% of the urban population will move
to the rural area.

What will be the projected proportions of the two populations in the long term?
Answer: 20% of the population will be rural, and 80% will be urban.
How do we compute that?
As this is an iterative system, we need to analyse the population in terms of the year. So we let
an = the rural population in year n, and bn = the urban population in year n, with respect to current
year as the starting point.
Now based on the information we have, the rural population in year n depends on both the rural and
urban population in year n−1:

an = 0.96an−1 + 0.01bn−1

In other words, 96% of the rural population in year n−1 and 1% of the urban population in year n−1 will
contribute to the rural population in year n.

Similarly, 4% of the rural population in year n−1 and 99% of the urban population in year n−1 will contribute
to the urban population in year n:

bn = 0.04an−1 + 0.99bn−1

This is a kind of linear system that we can convert into matrix equation form as before.

Let $A = \begin{pmatrix} 0.96 & 0.01 \\ 0.04 & 0.99 \end{pmatrix}$ and $x_n = \begin{pmatrix} a_n \\ b_n \end{pmatrix}$. Then we have the iterative relation

xn = Axn−1 = A(Axn−2) = A(A(Axn−3)) = … = Anx0


as before. Here x0 represents the current population.
To determine the long term population, we will look at an and bn for large n, which means we need
to find the nth stage vector for large n.
And from what we derived, we should be looking at An for some large power n of A. Using
diagonalization, we have

$$A^n = PD^nP^{-1} = \begin{pmatrix} 1 & 1 \\ 4 & -1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & 0.95^n \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 4 & -1 \end{pmatrix}^{-1}$$

The diagonal entries are 1 and 0.95, which are the eigenvalues of A. Since 0.95 < 1, when it is raised
to a large power, it will be close to 0. So we can approximate this entry with 0, and hence the
resulting product can be computed to be approximately given by:

$$A^{(\text{big } n)} \approx \begin{pmatrix} 1 & 1 \\ 4 & -1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 4 & -1 \end{pmatrix}^{-1} = \begin{pmatrix} 0.2 & 0.2 \\ 0.8 & 0.8 \end{pmatrix}$$

We can then use this matrix to find the populations in the long term:

$$\begin{pmatrix} a_{(\text{big } n)} \\ b_{(\text{big } n)} \end{pmatrix} \approx \begin{pmatrix} 0.2 & 0.2 \\ 0.8 & 0.8 \end{pmatrix}\begin{pmatrix} a_0 \\ b_0 \end{pmatrix} = \begin{pmatrix} 0.2(a_0 + b_0) \\ 0.8(a_0 + b_0) \end{pmatrix}$$

The resulting vector shows that the rural population in the long term is 20% of the current total
population, and the urban population in the long term is 80% of the current total population.
Of course, this is a very simple model of the population, with many assumptions made. For example,
factors such as migration, birth and death are not taken into account.
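
The long-term behaviour can also be seen by simply iterating the system, as in the sketch below (the starting populations are made-up illustrative numbers):

```python
import numpy as np

A = np.array([[0.96, 0.01],
              [0.04, 0.99]])
x = np.array([500_000.0, 500_000.0])   # illustrative rural and urban starting populations

for _ in range(200):                   # iterate x_n = A x_{n-1} for many years
    x = A @ x

print(x / x.sum())                     # approximately [0.2, 0.8]: 20% rural, 80% urban
```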
