
Lecture Notes on Computational Mathematics

Dr. K. Manjunatha Prasad


Professor of Mathematics,
Department of Data Science, PSPH
Manipal Academy of Higher Education, Manipal, Karnataka-576 104
kmprasad63@gmail.com, km.prasad@manipal.edu
M.Sc Data Science/Biostatistics/Digital Epidemiology (I sem), Batch 2021-2022

Contents

1 Decomposition of Matrices, Generalized inverses

1.1 Rank
1.2 Determinants
1.3 Eigenvalues, Positive Definite Matrices and Decompositions
1.4 Diagonalization
1.4.1 Characteristics of a diagonalizable matrix
1.5 Positive Definite Matrices
1.5.1 Properties of PD and PSD matrices
1.6 LU Decomposition
1.7 Cholesky Decomposition
1.8 Spectral Decomposition Theorem
1.9 Singular Values
1.10 The Singular Value Decomposition
1.11 Generalized Inverses and Applications
1.12 Construction of generalized inverse
1.13 Minimum Norm, Least Squares g-inverse and Moore-Penrose inverse
1.13.1 Construction of Moore-Penrose inverse


1. Decomposition of Matrices, Generalized inverses

1.1 Rank
Column space of a matrix: Given an m × n matrix A, each column of A is a vector in R^m, and the subspace spanned by those columns is known as the ‘column space’ of A (denoted by C(A)).
The dimension of the column space of A is known as the ‘column rank’ of A.

Row space of a matrix: Given an m × n matrix A, each row of A is a vector in R^n, and the subspace spanned by those rows is known as the ‘row space’ of A.
The dimension of the row space of A is known as the ‘row rank’ of A.
Note: For any matrix A, we have

Row Rank(A) = Column Rank(A)

Definition 1.1 (Rank). The rank of an m × n matrix A is the Column Rank(A), which is the same as the Row Rank(A).

Theorem 1.1. Let A, B be matrices such that AB is defined. Then

rank(AB) ≤ min{rank(A), rank(B)}

Proof. A vector in C (AB) is of the form ABx for some vector x, and therefore it belongs to C (A). There-
fore C (AB) ⊆ C (A) and hence rank(AB) ≤ rank A. Similarly, we observe that R (AB) ⊆ R (B) and there-
fore rank(AB) ≤ rank B.

Theorem 1.2. Let A be an m × n matrix of rank r, r ≠ 0. There exist matrices B, C of order m × r and r × n respectively such that rank B = rank C = r and A = BC. This decomposition of the matrix A is called a rank factorization of A.

Proof. Consider a basis for the column space of A, say b_1, b_2, ..., b_r. Construct an m × r matrix B = (b_1 ··· b_r). Since each column of A is a linear combination of the columns of B, there exists an r × n matrix C such that A = BC. From the definition of B, it is trivial that rank B = r. Since r = rank A ≤ rank C and C is of size r × n, we obtain rank C = r.
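
Numerically, a rank factorization can be obtained, for instance, from a column-pivoted QR decomposition. The following is a minimal sketch assuming NumPy and SciPy are available; the helper name rank_factorization and the tolerance are our own choices, not part of the notes.

```python
import numpy as np
from scipy.linalg import qr

def rank_factorization(A, tol=1e-10):
    """Return B (m x r), C (r x n) with A = BC and rank B = rank C = r."""
    Q, R, piv = qr(A, pivoting=True)              # column-pivoted QR
    r = int(np.sum(np.abs(np.diag(R)) > tol))     # numerical rank
    B = A[:, piv[:r]]                             # r linearly independent columns of A
    C = np.linalg.lstsq(B, A, rcond=None)[0]      # solve BC = A for C
    return B, C

A = np.array([[3., 6., 6.],
              [1., 2., 2.]])
B, C = rank_factorization(A)
print(np.allclose(B @ C, A))   # True
```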

Exercise 1.1. Let A be an m × n matrix. Then N (A) = (C (A T ))⊥ .

Exercise 1.2. Let A be an n × n matrix. Then the following conditions are equivalent.

(i) A is nonsingular, i.e., rankA = n.

(ii) For any b ∈ R^n, Ax = b has a unique solution.


(iii) There exists a unique matrix B such that AB = BA = I.

Theorem 1.3. Let A, B be m × n matrices. Then rank(A + B) ≤ rank A + rank B.

Proof. Let A = XY, B = UV be rank factorizations of A, B. Then
\[ A + B = XY + UV = \begin{pmatrix} X & U \end{pmatrix} \begin{pmatrix} Y \\ V \end{pmatrix}. \]
So, rank(A + B) ≤ rank(X U). Clearly, dim(C(X U)) ≤ dim(C(X)) + dim(C(U)) = rank A + rank B. This proves the theorem.

Exercise 1.3. Let A be an m × n matrix and let M and N be invertible matrices of size m × m and n × n, respectively. Then prove that

(i) rank(M A) = rank A

(ii) rank(AN) = rank A.

Theorem 1.4. Given an m × n matrix A of rank r, there exist invertible matrices M, N of order m × m and n × n respectively such that
\[ MAN = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}. \]

Example 1. Obtain the canonical form of the following matrix and give two different rank factorizations:
\[ A = \begin{pmatrix} 3 & 6 & 6 \\ 1 & 2 & 2 \end{pmatrix}. \]

Solution. The canonical form of A is given by
\[ A = P \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} Q \]
where P and Q are invertible matrices and r is the rank of the matrix A.


Consider the augmented matrix [I : A] to find the invertible matrix P. Now
\[ [I : A] = \left(\begin{array}{cc:ccc} 1 & 0 & 3 & 6 & 6 \\ 0 & 1 & 1 & 2 & 2 \end{array}\right) \xrightarrow{R_1 \leftrightarrow R_2} \left(\begin{array}{cc:ccc} 0 & 1 & 1 & 2 & 2 \\ 1 & 0 & 3 & 6 & 6 \end{array}\right) \xrightarrow{R_2 \to R_2 - 3R_1} \left(\begin{array}{cc:ccc} 0 & 1 & 1 & 2 & 2 \\ 1 & -3 & 0 & 0 & 0 \end{array}\right). \]
Take $M = \begin{pmatrix} 0 & 1 \\ 1 & -3 \end{pmatrix}$; then $MA = \begin{pmatrix} 1 & 2 & 2 \\ 0 & 0 & 0 \end{pmatrix}$ and $P = M^{-1} = \begin{pmatrix} 3 & 1 \\ 1 & 0 \end{pmatrix}$.

Consider the augmented matrix $\begin{pmatrix} MA \\ I_3 \end{pmatrix}$ to find the invertible matrix Q. Now
\[ \begin{pmatrix} MA \\ I_3 \end{pmatrix} = \begin{pmatrix} 1 & 2 & 2 \\ 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \xrightarrow{C_2 \to C_2 - 2C_1} \begin{pmatrix} 1 & 0 & 2 \\ 0 & 0 & 0 \\ 1 & -2 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \xrightarrow{C_3 \to C_3 - 2C_1} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 1 & -2 & -2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \]
Take $N = \begin{pmatrix} 1 & -2 & -2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$; then $MAN = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$ and $Q = N^{-1} = \begin{pmatrix} 1 & 2 & 2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$.

Therefore
\[ A = M^{-1} \begin{pmatrix} I_1 & 0 \\ 0 & 0 \end{pmatrix} N^{-1} = P \begin{pmatrix} I_1 & 0 \\ 0 & 0 \end{pmatrix} Q = \begin{pmatrix} 3 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} 1 & 2 & 2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \]

A rank factorization of A is
\[ A = \begin{pmatrix} 3 \\ 1 \end{pmatrix} \begin{pmatrix} 1 & 2 & 2 \end{pmatrix}. \]
Now
\[ A = \begin{pmatrix} 3 \\ 1 \end{pmatrix} I_1 \begin{pmatrix} 1 & 2 & 2 \end{pmatrix} = \begin{pmatrix} 3 \\ 1 \end{pmatrix} (2)(2)^{-1} \begin{pmatrix} 1 & 2 & 2 \end{pmatrix} = \begin{pmatrix} 6 \\ 2 \end{pmatrix} \begin{pmatrix} \tfrac{1}{2} & 1 & 1 \end{pmatrix} \]
is another rank factorization.

Exercise 1.4. Obtain the canonical form of the following matrices and give two different rank factorizations:
\[ \begin{pmatrix} 2 & 1 \\ 1 & 0 \end{pmatrix}, \quad \begin{pmatrix} -2 & 1 \\ -1 & -1 \end{pmatrix}, \quad \begin{pmatrix} 1 & -1 & 1 \\ 1 & 1 & 2 \end{pmatrix}, \quad \begin{pmatrix} 1 & 1 & -1 \\ 2 & 1 & 1 \end{pmatrix}. \]

Exercise 1.5. Let A be an n × n matrix of rank r. Then there exists an n × n matrix Z of rank n − r such
that A + Z is nonsingular.

Theorem 1.5 (Frobenius Inequality). Let A, B be n × n matrices. Then

rank(AB) ≥ rank A + rank B − n.

Proof. Let the rank of A be r. If $A = M^{-1} \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} N^{-1}$, then for $Z = M^{-1} \begin{pmatrix} 0 & 0 \\ 0 & I_{n-r} \end{pmatrix} N^{-1}$ we get that A + Z = M^{-1} N^{-1} is an invertible matrix. Further,

rank B = rank((A + Z)B) = rank(AB + ZB) ≤ rank(AB) + rank(ZB)
≤ rank(AB) + rank(Z) = rank(AB) + n − r = rank(AB) + n − rank A.

This proves the Frobenius inequality.

1.2 Determinants
Consider R^{n×n}, the set of all n × n matrices over R.
A mapping D : R^{n×n} → R is said to be n-linear if, for each i, 1 ≤ i ≤ n, D is a linear function of the i-th row when the other (n − 1) rows are held fixed.


Example 2. The function D : R^{n×n} → R defined by D(A) = a_{11}a_{22}···a_{nn}, the product of the diagonal elements of the matrix A, is n-linear.

Exercise 1.6. A linear combination of n-linear functions is n-linear.

Alternating n-linear Function


A mapping D : Rn×n → R is said to be an alternating n-linear function if

(i) D is n-linear

(ii) If B is a matrix obtained by interchanging two rows of A, then D(A) = −D(B).

When D : Rn×n → R satisfies the condition (i) above, the condition (ii) can be replaced by
(ii)′ D(A) = 0 if two rows are equal. (Exercise)

Theorem 1.6. If a mapping D : Rn×n → R is n-linear, then the following are equivalent:

(i) If B is a matrix obtained by interchanging two rows of A, then D(A) = −D(B).

(ii) D(A) = 0 if any two rows are equal.

(iii) If B is a matrix obtained by interchanging two adjacent rows of A, then D(A) = −D(B).

(iv) D(A) = 0 if any two adjacent rows are equal.

Proof. (i) =⇒ (ii) Consider a matrix A such that its i-th row A i and j-th row A j are same for some
i ̸= j. Now
D(A) = D(A 1 , . . . , A i , . . . , A j , . . . , A n ) = D(A 1 , . . . , A j , . . . , A i , . . . , A n )

But from (i) we get

D(A 1 , . . . , A i , . . . , A j , . . . , A n ) = −D(A 1 , . . . , A j , . . . , A i , . . . , A n )

Therefore D(A) = 0.
(ii) =⇒ (i) Consider a matrix B whose k-th row B_k is the same as A_k, the k-th row of A, for all k ≠ i, j, with B_i = A_j and B_j = A_i (i < j). Now obtain a matrix C whose k-th row C_k is the same as A_k for all k ≠ i, j and C_i = C_j = A_i + A_j. Now from (ii), D(C) = 0 and we get

0 = D(C 1 , . . . , C i , . . . , C j , . . . , C n )

= D(A 1 , . . . , (A i + A j ), . . . , (A i + A j ), . . . , A n )

= D(A 1 , . . . , A i , . . . , A i , . . . , A n ) + D(A 1 , . . . , A i , . . . , A j , . . . , A n )+

D(A 1 , . . . , A j , . . . , A i , . . . , A n ) + D(A 1 , . . . , A j , . . . , A j , . . . , A n )

= D(A 1 , . . . , A i , . . . , A j , . . . , A n ) + D(A 1 , . . . , A j , . . . , A i , . . . , A n )

= D(A) + D(B)

Therefore D(B) = −D(A).


(i) =⇒ (iii) is trivial.

Lecture Notes 5
1.2 Determinants

(iii) =⇒ (i) Consider the sequence of rows

A 1 , A 2 , . . . , A i−1 , A i , A i+1 , . . . , A j−1 , A j , A j+1 , . . . A n where i < j.

Now interchange the row A_i with A_{i+1} and continue until we get the sequence in the order

A 1 , A 2 , . . . , A i−1 , A i+1 , . . . , A j−1 , A j , A i , A j+1 , . . . A n .

This requires k = j − i many interchanges of adjacent rows. Further to get A j in the i-th position we
require k − 1 such interchanges of adjacent rows. So, if B is the matrix with interchange of i-th and j-th
rows of A, we get B from A after 2k − 1 interchanges of adjacent rows. So, D(B) = (−1)2k−1 D(A) = −D(A).
(ii) =⇒ (iv) is trivial, (iv) =⇒ (ii) follows from (iii) =⇒ (i).

Alternating n-linear Function, n = 2 case


Now consider an alternating 2-linear function D : R2×2 → R, where n of the above discussion is 2. Let
e 1 = (1, 0) and e 2 = (0, 1), the first and second rows of I respectively. For any matrix A of size 2 × 2, the
first row is given by a 11 e 1 + a 12 e 2 and similarly the second row is a 21 e 1 + a 22 e 2 . Therefore,

D(A) = D(a 11 e 1 + a 12 e 2 , a 21 e 1 + a 22 e 2 )

= D(a 11 e 1 , a 21 e 1 + a 22 e 2 ) + D(a 12 e 2 , a 21 e 1 + a 22 e 2 )

= D(a 11 e 1 , a 21 e 1 ) + D(a 11 e 1 , a 22 e 2 ) + D(a 12 e 2 , a 21 e 1 )

+ D(a 12 e 2 , a 22 e 2 )

= a 11 a 21 D(e 1 , e 1 ) + a 11 a 22 D(e 1 , e 2 ) + a 12 a 21 D(e 2 , e 1 ) + a 12 a 22 D(e 2 , e 2 )

Now employing (ii) or (ii)′, the above equals
\[ D(A) = a_{11}a_{22}\,D(e_1, e_2) + a_{12}a_{21}\,D(e_2, e_1) = (a_{11}a_{22} - a_{12}a_{21})\,D(e_1, e_2). \tag{1.1} \]

Determinant Function
Determinant function on Rn×n is a mapping D : Rn×n → R such that

(i) D is n-linear

(ii) D(A) = 0 if two rows are equal

(iii) D(I) = 1 for the identity matrix.

If D satisfies the property (iii) given above, then (1.1) reduces to D(A) = (a 11 a 22 − a 12 a 21 ).
Now for any alternating n-linear function D : R^{n×n} → R, we shall consider e_i, the i-th row of an identity matrix. Now
\[ D(A) = D\Big(\sum_{i=1}^{n} a_{1i} e_i,\; \sum_{i=1}^{n} a_{2i} e_i,\; \ldots,\; \sum_{i=1}^{n} a_{ni} e_i\Big). \tag{1.2} \]

Also, consider the set of all permutations of degree n

P n = {σ = (σ1 , σ2 , . . . , σn ) : 1 ≤ σ1 , σ2 , . . . , σn ≤ n and distinct}


Now applying the n-linear property and D(e_{k_1}, e_{k_2}, ..., e_{k_n}) = 0 whenever some k_i is repeated, (1.2) becomes
\[ D(A) = \sum_{\sigma} a_{1\sigma_1} a_{2\sigma_2} \cdots a_{n\sigma_n}\, D(e_{\sigma_1}, e_{\sigma_2}, \ldots, e_{\sigma_n}). \]
Now by the alternating property of D, we get D(e_{σ_1}, e_{σ_2}, ..., e_{σ_n}) = sgn(σ) D(e_1, e_2, ..., e_n), where sgn(σ) is (−1)^k, k being the number of interchanges of elements in (σ_1, σ_2, ..., σ_n) required to get (1, 2, ..., n). A permutation is said to be an odd permutation if k above is odd, and an even permutation otherwise. Hence, we get
\[ D(A) = \sum_{\sigma} \mathrm{sgn}(\sigma)\, a_{1\sigma_1} a_{2\sigma_2} \cdots a_{n\sigma_n}\, D(e_1, e_2, \ldots, e_n) = \sum_{\sigma} \mathrm{sgn}(\sigma)\, a_{1\sigma_1} a_{2\sigma_2} \cdots a_{n\sigma_n}\, D(I). \]
Further, if D satisfies (iii) of the determinant function, then D(A) is uniquely determined by the entries of A and
\[ D(A) = \sum_{\sigma} \mathrm{sgn}(\sigma)\, a_{1\sigma_1} a_{2\sigma_2} \cdots a_{n\sigma_n}. \]
So the determinant function on R^{n×n} exists and is unique.


If D is a determinant function on Rn×n , then D(A) is called determinant of matrix A and denoted
by det(A).
Given an n × n matrix A and 1 ≤ i, j ≤ n (n > 1), A(i | j) is the submatrix of A of size (n − 1) × (n − 1) obtained by removing the i-th row and j-th column. If D is an alternating (n − 1)-linear function, then we write D_{ij}(A) = D(A(i | j)).
If a_{ij} is the (i, j)-th element of a matrix A ∈ R^{n×n}, then the cofactor of a_{ij} is c_{ij} = (−1)^{i+j} det(A(i | j)).
We mention the following result without any proof.

Theorem 1.7. Consider an n × n matrix A and an alternating (n − 1)-linear function D. For each j, 1 ≤ j ≤ n, the function E_j defined by
\[ E_j(A) = \sum_{i=1}^{n} (-1)^{i+j} a_{ij}\, D_{ij}(A) \]
is an alternating n-linear function. Further, if D is a determinant function, then so is E_j.

The following are some basic and useful properties of the determinant:

(i) The determinant is a linear function of any row when all the other rows are held fixed.

(ii) The determinant changes sign if two rows are interchanged.

(iii) The determinant is unchanged if a constant multiple of one row is added to another row.

(iv) det(AB) = det(A) det(B).

(v) det(A) = det(A T ).


 
(vi) If A and B are square matrices and $D = \begin{pmatrix} A & C \\ 0 & B \end{pmatrix}$, then det(D) = det(A) det(B).


Remark (Laplace expansion): The determinant can be evaluated by expansion along a row or a column.

• The determinant expansion for a real n × n matrix A about the j-th column is
\[ \det(A) = \sum_{i=1}^{n} (-1)^{i+j} a_{ij} \det(A(i \mid j)). \]

• The determinant expansion for a real n × n matrix A about the i-th row is
\[ \det(A) = \sum_{j=1}^{n} (-1)^{i+j} a_{ij} \det(A(i \mid j)). \]

Exercise 1.7. For a square matrix A ∈ R^{n×n}, the adjoint matrix adj(A) is the matrix with
\[ (\mathrm{adj}(A))_{ji} = (-1)^{i+j} \det(A(i \mid j)). \]
Prove that A · adj(A) = det(A) I. Hence prove Cramer's rule, i.e., that the j-th coordinate x_j of the solution to Ax = b is given by x_j = |B_j| / |A|, where B_j is the matrix obtained by replacing the j-th column of A by b.

Exercise 1.8. For a square matrix A ∈ R^{n×n}, A has an inverse if and only if det(A) is nonzero.

Exercise 1.9. Find the inverse of the following matrices using the adjoint method:

(i) $\begin{pmatrix} 1 & 2 & -1 \\ -1 & 1 & 2 \\ 2 & -1 & 1 \end{pmatrix}$

(ii) $\begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}$

Exercise 1.10. Solve by Cramer’s rule, if possible:

(i) 3x + y + 2z = 3, 2x − 3y − z = −3, x + 2y + z = 4

(ii) x + y + z = 11, 2x − 6y − z = 0, 3x + 4y + 2z = 0

(iii) 3x − 2y = 7, 3y − 2z = 6, 3z − 2x = −1
Exercise 1.11. Solve for x: \[ \begin{vmatrix} x & 2 & -1 \\ 2 & 5 & x \\ -1 & 2 & x \end{vmatrix} = 0. \]

Exercise 1.12. If ω is an imaginary cube root of unity, evaluate \[ \begin{vmatrix} 1 & \omega & \omega^2 \\ \omega & \omega^2 & 1 \\ \omega^2 & 1 & \omega \end{vmatrix}. \]

Exercise 1.13. Prove that \[ \begin{vmatrix} 1+a & b & c \\ a & 1+b & c \\ a & b & 1+c \end{vmatrix} = 1 + a + b + c. \]


Exercise 1.14. Show that \[ \begin{vmatrix} a+b+2c & a & b \\ c & b+c+2a & b \\ c & a & c+a+2b \end{vmatrix} = 2(a+b+c)^3. \]

Exercise 1.15. Evaluate \[ \begin{vmatrix} 91 & 92 & 93 \\ 94 & 95 & 96 \\ 97 & 98 & 99 \end{vmatrix}. \]

Exercise 1.16. Solve for x: \[ \begin{vmatrix} x+3 & 5 & 7 \\ 3 & x+5 & 7 \\ 3 & 5 & x+7 \end{vmatrix} = 0. \]
   
Exercise 1.17. If $A = \begin{pmatrix} 4 & 10 & 11 \\ 7 & 6 & 2 \\ 1 & 5 & 4 \end{pmatrix}$ and $B = \begin{pmatrix} 1 & -2 & 3 \\ 0 & 2 & 1 \\ -4 & 5 & 2 \end{pmatrix}$, find |AB|.

Exercise 1.18. If $A = \begin{pmatrix} 102 & 105 \\ 100 & 100 \end{pmatrix}$ and $B = \begin{pmatrix} 160 & 150 \\ 150 & 150 \end{pmatrix}$, find |AB|.

Exercise 1.19. If $A = \begin{pmatrix} 3 & 2 & x \\ 4 & 1 & -1 \\ 0 & 3 & 4 \end{pmatrix}$ is a singular matrix, find x.

Exercise 1.20. If x = −9 is a root of $\begin{vmatrix} x & 3 & 7 \\ 2 & x & 2 \\ 7 & 6 & x \end{vmatrix} = 0$, find the other two roots.

Exercise 1.21. Decide whether the determinant of the following matrix A is even or odd, without evaluating it explicitly:
\[ A = \begin{pmatrix} 387 & 456 & 589 & 238 \\ 488 & 455 & 677 & 382 \\ 440 & 982 & 654 & 651 \\ 892 & 564 & 786 & 442 \end{pmatrix}. \]

Exercise 1.22. If A, B are n × n matrices, show that \[ \begin{vmatrix} A+B & A \\ A & A \end{vmatrix} = |A||B|. \]

Exercise 1.23. Evaluate the determinant of an n × n matrix A where a_{ij} = ij if i ≠ j and a_{ij} = 1 + ij if i = j.

1.3 Eigenvalues, Positive Definite Matrices and Decompositions


Characteristic Polynomial: Given an n × n matrix A over the field R (or C), the characteristic polynomial is the polynomial of degree n in the variable x given by det(xI − A). If P(x) = det(xI − A), the characteristic equation is given by P(x) = 0; i.e., the characteristic equation of A is det(xI − A) = 0.


Note: In the characteristic polynomial, the coefficient of x^n is 1, the coefficient of x^{n−1} is (−1)^1 Trace(A), ..., the coefficient of x^{n−i} is (−1)^i s_i, where s_i is the sum of all i × i principal minors of A, ..., and the constant term is (−1)^n det(A).

Definition 1.2. The roots of the characteristic equation of a square matrix A are called the eigenvalues (characteristic values) of A; in other words, λ is said to be an eigenvalue of A if there exists a vector x ≠ 0 such that Ax = λx. Such a vector x is called an eigenvector of A corresponding to the eigenvalue λ.

Note that the set of vectors { x : Ax = λ x} is the nullspace of A − λ I. This nullspace is called the
eigenspace of A corresponding to the eigenvalue λ and its dimension is called the geometric multi-
plicity of λ.
The eigenvalues may not all be distinct. The number of times an eigenvalue occurs as a root of the
characteristic equation is called the algebraic multiplicity of the eigenvalue.
Cayley-Hamilton theorem: Given an n × n matrix A with characteristic polynomial P(x) = det(xI − A), we have P(A) = 0.
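
A quick numerical check of the Cayley-Hamilton theorem; np.poly returns the coefficients of det(xI − A), leading coefficient first, and the matrix below is our own example:

```python
import numpy as np

A = np.array([[2., -3.],
              [5., 1.]])
coeffs = np.poly(A)                 # characteristic polynomial: x^2 - 3x + 17
P_A = np.zeros_like(A)
for c in coeffs:                    # evaluate P(A) by Horner's rule
    P_A = P_A @ A + c * np.eye(2)
print(np.allclose(P_A, 0))          # True, as Cayley-Hamilton asserts
```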

Exercise 1.24. Two similar matrices have the same characteristic polynomial.

Exercise 1.25. Find the characteristic polynomials of the following matrices:

(i) $\begin{pmatrix} 2 & -3 \\ 5 & 1 \end{pmatrix}$

(ii) $\begin{pmatrix} 1 & 3 & 0 \\ -2 & 2 & -1 \\ 4 & 0 & 2 \end{pmatrix}$

Exercise 1.26. Find the characteristic polynomial of a 2 × 2 matrix whose trace and determinant are 7
and 6 respectively.

Exercise 1.27. Show that a matrix A and its transpose A T have the same characteristic polynomial.
 
Exercise 1.28. Suppose $M = \begin{pmatrix} A_1 & B \\ 0 & A_2 \end{pmatrix}$ where A_1 and A_2 are square matrices. Show that the characteristic polynomial of M is the product of the characteristic polynomials of A_1 and A_2.

Exercise 1.29. Find the characteristic polynomials of the following matrices:

(i) $\begin{pmatrix} 1 & 2 & 3 & 4 \\ 0 & 2 & 8 & -6 \\ 0 & 0 & 3 & -5 \\ 0 & 0 & 0 & 4 \end{pmatrix}$

(ii) $\begin{pmatrix} 2 & 5 & 7 & -9 \\ 1 & 4 & -6 & 4 \\ 0 & 0 & 6 & -5 \\ 0 & 0 & 2 & 3 \end{pmatrix}$

(iii) $\begin{pmatrix} 5 & 8 & -1 & 0 \\ 0 & 3 & 6 & 7 \\ 0 & -3 & 5 & -4 \\ 0 & 0 & 0 & 7 \end{pmatrix}$

Exercise 1.30. Find all the eigenvalues and the eigenvectors corresponding to each of the eigenvalues of the following matrices:

(i) $\begin{pmatrix} 1 & 4 \\ 2 & 3 \end{pmatrix}$

(ii) $\begin{pmatrix} 1 & 0 & -1 \\ 1 & 2 & 1 \\ 2 & 2 & 3 \end{pmatrix}$

(iii) $\begin{pmatrix} 1 & -3 & 3 \\ 3 & -5 & 3 \\ 6 & -6 & 4 \end{pmatrix}$

Exercise 1.31. Show that the eigenvectors corresponding to distinct eigenvalues of a matrix are linearly
independent.

1.4 Diagonalization
Definition 1.3. Two n × n matrices A and B are said to be similar if there exists an invertible matrix P
such that P −1 AP = B.

The product P −1 AP is called a similarity transformation on A.


A fundamental problem is the following: Given a square matrix A, can we reduce it to a simplest possible
form by means of a similarity transformation?
Note that diagonal matrices have the simplest form. So the question reduces to the following: Is every square matrix similar to a diagonal matrix?
The answer is No. For example, let $A = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}$. Note that A² = 0. If there were a nonsingular matrix P such that P⁻¹AP = D, then D² = P⁻¹APP⁻¹AP = P⁻¹A²P = 0 ⇒ D = 0 ⇒ A = 0, which is not true. So A is not similar to a diagonal matrix. Hence not all square matrices are similar to a diagonal matrix.

Definition 1.4. A square matrix A is said to be diagonalizable if A is similar to a diagonal matrix.


1.4.1. Characteristics of a diagonalizable matrix

We examine the following equation:
\[ P^{-1} A_{n\times n} P = D = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}, \]
which implies
\[ AP = PD \;\Rightarrow\; A[P_1 \,|\, P_2 \,|\, \cdots \,|\, P_n] = [P_1 \,|\, P_2 \,|\, \cdots \,|\, P_n] \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}. \]
Equivalently,
\[ [AP_1 \,|\, AP_2 \,|\, \cdots \,|\, AP_n] = [\lambda_1 P_1 \,|\, \lambda_2 P_2 \,|\, \cdots \,|\, \lambda_n P_n]. \]

Hence AP_j = λ_j P_j, i.e., (λ_j, P_j) is an eigenpair of A; that is, P must be a matrix whose columns constitute n linearly independent eigenvectors, and D is a diagonal matrix whose diagonal entries are the corresponding eigenvalues. Conversely, if there exists a linearly independent set of n eigenvectors that are used as columns to build a nonsingular matrix P, and if D is the diagonal matrix whose diagonal entries are the corresponding eigenvalues, then P⁻¹AP = D.
A complete set of eigenvectors for A_{n×n} is any set of n linearly independent eigenvectors of A. By the above discussion it follows that A is diagonalizable if and only if it has a complete set of eigenvectors. Hence A is diagonalizable if and only if the algebraic multiplicity equals the geometric multiplicity for each eigenvalue.
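
Numerically, the eigenvector matrix returned by np.linalg.eig plays the role of P whenever it is nonsingular. A minimal sketch (the rank test with a default tolerance is a heuristic, not a proof of diagonalizability); the test matrix is the Jordan-block example above:

```python
import numpy as np

def try_diagonalize(A):
    """Attempt P^{-1} A P = D using a complete set of eigenvectors."""
    evals, P = np.linalg.eig(A)                  # columns of P are eigenvectors
    if np.linalg.matrix_rank(P) < A.shape[0]:    # no complete set of eigenvectors
        return None
    return P, np.linalg.inv(P) @ A @ P           # second factor should be diag(evals)

A = np.array([[0., 0.],
              [1., 0.]])
print(try_diagonalize(A))   # None: this matrix is not diagonalizable
```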

Exercise 1.32. If possible, diagonalize the following matrix with a similarity transformation:
\[ A = \begin{pmatrix} 1 & -4 & -4 \\ 8 & -11 & -8 \\ -8 & 8 & 5 \end{pmatrix} \]

Exercise 1.33. If possible, diagonalize the following matrices with a similarity transformation. Otherwise give reasons why they are not diagonalizable:

1. $A = \begin{pmatrix} 0 & 1 \\ -8 & 4 \end{pmatrix}$

2. $A = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}$

3. $A = \begin{pmatrix} 5 & 4 & 2 & 1 \\ 0 & 1 & -1 & -1 \\ -1 & -1 & 3 & 0 \\ 1 & 1 & -1 & 2 \end{pmatrix}$

4. $A = \begin{pmatrix} 5 & -6 & -6 \\ -1 & 4 & 2 \\ 3 & -6 & -4 \end{pmatrix}$

1.5 Positive Definite Matrices


Definition 1.5. A principal submatrix of a square matrix is a submatrix formed by a set of rows and
the corresponding set of columns. A principal minor of a square matrix is the determinant of a principal
submatrix.

Definition 1.6. An n × n matrix A is said to be symmetric if A = A T . An n × n matrix A is said to be


positive definite if it is a symmetric matrix and if xT Ax > 0 for every nonzero vector x.

Definition 1.7. An n × n matrix A is said to be positive semidefinite if it is a symmetric matrix and xᵀAx ≥ 0 for all x.

Example 3. An identity matrix is trivially a positive definite matrix.
The matrices $\begin{pmatrix} 2 & 1 \\ 1 & 3 \end{pmatrix}$ and $\begin{pmatrix} 1 & -2 \\ -2 & 5 \end{pmatrix}$ are positive definite. The matrix $\begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}$ is neither positive definite nor positive semidefinite. The matrix $\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$ is positive semidefinite.

1.5.1. Properties of PD and PSD matrices

1. If A is positive definite then it is nonsingular.

2. If A, B are positive definite and if α, β ≥ 0, with α + β > 0, then α A + βB is positive definite.

3. If A is positive definite then | A | > 0.

4. If A is positive definite then any principal submatrix of A is positive definite.

5. Let A be a symmetric n × n matrix. Then A is positive definite if and only if the eigenvalues of
A are all positive. Similarly, A is positive semidefinite if and only if the eigenvalues of A are all
nonnegative.

6. Let A be a symmetric n × n matrix. Then A is positive definite if and only if all the principal minors of A are positive. (Similarly, A is positive semidefinite if and only if all the principal minors of A are nonnegative.)


7. Let A be a symmetric n × n matrix. Then A is positive definite if and only if all leading principal
minors of A are positive.

Note: (i) If A is a symmetric matrix, then the eigenvalues of A are all real.
(ii) If v, w are eigenvectors of a symmetric matrix corresponding to distinct eigenvalues α, β, then v and w are mutually orthogonal.
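
Property 5 above gives a practical test. A sketch using np.linalg.eigvalsh (which assumes its argument is symmetric); the tolerance and the test matrices are our own choices:

```python
import numpy as np

def definiteness(A, tol=1e-10):
    """Classify a symmetric matrix by its eigenvalues (property 5)."""
    evals = np.linalg.eigvalsh(A)      # real eigenvalues, ascending order
    if np.all(evals > tol):
        return "positive definite"
    if np.all(evals > -tol):
        return "positive semidefinite"
    return "neither"

print(definiteness(np.array([[2., 1.], [1., 3.]])))   # positive definite
print(definiteness(np.array([[1., 1.], [1., 1.]])))   # positive semidefinite
print(definiteness(np.array([[1., 2.], [2., 1.]])))   # neither
```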

Definition 1.8. A square matrix P is said to be orthogonal if P −1 = P T ; that is to say, if PP T = P T P = I.

1.6 LU Decomposition
Theorem 1.8. Let
\[ A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix} \]
be a non-singular matrix. Then A can be factorized into the form LU, where
\[ L = \begin{pmatrix} l_{11} & 0 & \cdots & 0 \\ l_{21} & l_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ l_{n1} & l_{n2} & \cdots & l_{nn} \end{pmatrix} \quad\text{and}\quad U = \begin{pmatrix} 1 & u_{12} & \cdots & u_{1n} \\ 0 & 1 & \cdots & u_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}, \]
if
\[ a_{11} \neq 0, \quad \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} \neq 0, \quad \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} \neq 0, \quad \text{and so on.} \]
Such a factorization, whenever it exists, is unique.

Similarly, the factorization LU where
\[ L = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ l_{21} & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ l_{n1} & l_{n2} & \cdots & 1 \end{pmatrix} \quad\text{and}\quad U = \begin{pmatrix} u_{11} & u_{12} & \cdots & u_{1n} \\ 0 & u_{22} & \cdots & u_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & u_{nn} \end{pmatrix} \]
is also unique.

Example 4. Consider a positive definite matrix A ∈ R^{3×3}. Write
\[ A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} = LU = \begin{pmatrix} 1 & 0 & 0 \\ l_{21} & 1 & 0 \\ l_{31} & l_{32} & 1 \end{pmatrix} \begin{pmatrix} u_{11} & u_{12} & u_{13} \\ 0 & u_{22} & u_{23} \\ 0 & 0 & u_{33} \end{pmatrix}, \]
i.e.,
\[ A = \begin{pmatrix} u_{11} & u_{12} & u_{13} \\ l_{21}u_{11} & l_{21}u_{12} + u_{22} & l_{21}u_{13} + u_{23} \\ l_{31}u_{11} & l_{31}u_{12} + l_{32}u_{22} & l_{31}u_{13} + l_{32}u_{23} + u_{33} \end{pmatrix}. \]
Equating the corresponding entries, we get
\[ u_{11} = a_{11}, \quad u_{12} = a_{12}, \quad u_{13} = a_{13}, \]
\[ l_{21} = \frac{a_{21}}{a_{11}}, \quad l_{31} = \frac{a_{31}}{a_{11}}, \]
\[ u_{22} = a_{22} - \frac{a_{21}}{a_{11}} a_{12}, \quad u_{23} = a_{23} - \frac{a_{21}}{a_{11}} a_{13}, \]
\[ l_{32} = \frac{a_{32} - \frac{a_{31}}{a_{11}} a_{12}}{u_{22}}, \]
from which u_{33} can be computed.

Note: We follow this systematic procedure to evaluate the elements of L and U (where L is unit lower triangular and U is upper triangular):
Step I: Determine the first row of U and the first column of L.
Step II: Determine the second row of U and the second column of L.
Step III: Determine the third row of U.
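
Carried out row by row, this is the Doolittle algorithm. A minimal sketch without pivoting, so it assumes the leading principal minors are nonzero as in Theorem 1.8; the test matrix is the one from Exercise 1.34 below:

```python
import numpy as np

def doolittle_lu(A):
    """LU factorization with unit lower-triangular L (no pivoting)."""
    n = A.shape[0]
    L, U = np.eye(n), np.zeros((n, n))
    for k in range(n):
        U[k, k:] = A[k, k:] - L[k, :k] @ U[:k, k:]                    # k-th row of U
        L[k+1:, k] = (A[k+1:, k] - L[k+1:, :k] @ U[:k, k]) / U[k, k]  # k-th column of L
    return L, U

A = np.array([[2., 3., 1.],
              [1., 2., 3.],
              [3., 1., 2.]])
L, U = doolittle_lu(A)
print(np.allclose(L @ U, A))   # True
```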

Exercise 1.34. Factorize the matrix
\[ A = \begin{pmatrix} 2 & 3 & 1 \\ 1 & 2 & 3 \\ 3 & 1 & 2 \end{pmatrix} \]
into the LU form.

Exercise 1.35. Factorize the matrix
\[ A = \begin{pmatrix} 4 & 3 & -1 \\ 1 & 1 & 1 \\ 3 & 5 & 3 \end{pmatrix} \]
into the LU form.

Exercise 1.36. Factorize the matrix
\[ A = \begin{pmatrix} 5 & -2 & 1 \\ 7 & 1 & -5 \\ 3 & 7 & 4 \end{pmatrix} \]
into the LU form. Hence solve the system Ax = b where b = [4 8 10]ᵀ.

Exercise 1.37. Factorize the matrix
\[ A = \begin{pmatrix} 4 & 3 & 2 \\ 2 & 3 & 4 \\ 1 & 2 & 1 \end{pmatrix} \]
into the LU form.


1.7 Cholesky Decomposition


Note that a positive real number can be decomposed into identical factors using the square root operation; for example, 16 = 4 × 4 and 2 = √2 × √2. Similarly, for matrices we have the following.

Theorem 1.9. A positive definite matrix A can be factorized into a product A = LLᵀ, where L is a lower-triangular matrix with positive diagonal entries:
\[ \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix} = \begin{pmatrix} l_{11} & 0 & \cdots & 0 \\ l_{21} & l_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ l_{n1} & l_{n2} & \cdots & l_{nn} \end{pmatrix} \begin{pmatrix} l_{11} & l_{21} & \cdots & l_{n1} \\ 0 & l_{22} & \cdots & l_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & l_{nn} \end{pmatrix}. \]
L is called the Cholesky factor of A, and L is unique.

Example 5. Consider a positive definite matrix A ∈ R^{3×3}. Write
\[ A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} = LL^T = \begin{pmatrix} l_{11} & 0 & 0 \\ l_{21} & l_{22} & 0 \\ l_{31} & l_{32} & l_{33} \end{pmatrix} \begin{pmatrix} l_{11} & l_{21} & l_{31} \\ 0 & l_{22} & l_{32} \\ 0 & 0 & l_{33} \end{pmatrix}, \]
i.e.,
\[ A = \begin{pmatrix} l_{11}^2 & l_{21}l_{11} & l_{31}l_{11} \\ l_{21}l_{11} & l_{21}^2 + l_{22}^2 & l_{21}l_{31} + l_{22}l_{32} \\ l_{31}l_{11} & l_{21}l_{31} + l_{22}l_{32} & l_{31}^2 + l_{32}^2 + l_{33}^2 \end{pmatrix}. \]
Equating the first columns of the two matrices, we get
\[ l_{11} = \sqrt{a_{11}}, \quad l_{21} = \frac{a_{21}}{l_{11}}, \quad l_{31} = \frac{a_{31}}{l_{11}}. \]
Equating the second and third columns, we get
\[ l_{22} = \sqrt{a_{22} - l_{21}^2}, \quad l_{32} = \frac{a_{32} - l_{31}l_{21}}{l_{22}}, \quad l_{33} = \sqrt{a_{33} - (l_{31}^2 + l_{32}^2)}. \]
l 22

Note: If the matrix is positive semidefinite, instead of positive definite, then it still has a decomposition of the form A = LLᵀ, where the diagonal entries of L are allowed to be zero. However, this decomposition is not unique.
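
The column-by-column recipe of Example 5 is the Cholesky algorithm. A sketch assuming the input is positive definite; np.linalg.cholesky performs the same computation, and the test matrix is our own example:

```python
import numpy as np

def cholesky(A):
    """Lower-triangular L with A = L L^T, for positive definite A."""
    n = A.shape[0]
    L = np.zeros((n, n))
    for j in range(n):
        L[j, j] = np.sqrt(A[j, j] - L[j, :j] @ L[j, :j])              # diagonal entry
        L[j+1:, j] = (A[j+1:, j] - L[j+1:, :j] @ L[j, :j]) / L[j, j]  # entries below it
    return L

A = np.array([[4., 2., 2.],
              [2., 5., 3.],
              [2., 3., 6.]])
L = cholesky(A)
print(np.allclose(L @ L.T, A), np.allclose(L, np.linalg.cholesky(A)))  # True True
```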

Exercise 1.38. Solve the system
\[ \begin{pmatrix} 5 & 0 & 1 \\ 0 & -2 & 0 \\ 1 & 0 & 5 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 8 \\ -4 \\ 16 \end{pmatrix} \]
by Cholesky's method.


1.8 Spectral Decomposition Theorem


Theorem 1.10. Given a symmetric matrix A, there exists an orthogonal matrix U such that

A = U diag(λ1 , λ2 , . . . , λr , 0, . . . , 0)U T ,

where λ1, λ2, ..., λr are the eigenvalues of A and the first r columns of U are unit eigenvectors corresponding to these eigenvalues.

Lemma 1.1. If A is a symmetric matrix of size n × n and v is a unit eigenvector corresponding to the eigenvalue α, then B = αvvᵀ is a matrix satisfying the following properties:

(i) B is a matrix of rank one such that AB = BA = B2 .

(ii) Rank(A) = Rank(B) + Rank(C), where C = A − B.

Proof. Since v is an eigenvector of A, we have that Av = αv and therefore AB = BA = B2 . From


the definition of B, Range(B) ⊆ Range(A) and therefore Range(C) ⊆ Range(A). Since A and B are
symmetric, so is C = A − B. Therefore BC = CB = 0 implies Range(B) ⊥ Range(C), and therefore
Range(B) ⊕ Range(C) = Range(A). So, Rank(A) = Rank(B) + Rank(C).

Proof. The proof is by induction on the rank of the given symmetric matrix A. If the rank of A is one with eigenvalue λ1 and eigenvector v, then A is of the form λ1 u1 u1ᵀ, where u1 = v/||v||. Now extend {u1} to an orthonormal basis {u1, u2, ..., un} of Rⁿ, and construct a matrix U by taking u1, u2, ..., un as columns in the same order. Clearly UUᵀ = UᵀU = I and
\[ A = U \begin{pmatrix} \lambda_1 & 0 \\ 0 & 0 \end{pmatrix} U^T. \]

So, the theorem holds when the rank of A is one. Suppose that the theorem holds for all matrices of rank r − 1. If A is a symmetric matrix of rank r, choose an eigenvalue λ1 of A with corresponding unit eigenvector u1, as in the earlier lemma. Now construct B = λ1 u1 u1ᵀ satisfying Rank(A) = Rank(B) + Rank(C) and BC = CB = 0. Since C is a symmetric matrix and Rank(C) = r − 1, by induction there exists an orthogonal matrix V such that

C = V diag(λ2, ..., λr, 0, ..., 0)Vᵀ,

where λ2, ..., λr are the eigenvalues of C and the first r − 1 columns of V are corresponding unit eigenvectors. Since BC = CB = 0, u2, ..., ur are eigenvectors corresponding to the eigenvalues λ2, ..., λr of C as well as of A. For the same reason, u1 together with the first r − 1 columns {u2, ..., ur} of V forms a set of orthonormal vectors. Extend this set of orthonormal vectors to form an orthonormal basis {u1, u2, ..., ur, u_{r+1}, ..., un} of Rⁿ. Now construct an orthogonal matrix U using these orthonormal vectors; from the definition of B and C, it is clear that

A = U diag(λ1, λ2, ..., λr, 0, ..., 0)Uᵀ.

 
Example 6. Obtain the spectral decomposition of $A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$.

Solution. The spectral decomposition of a symmetric matrix is given by A = QDQᵀ.
The first step is to find the eigenvalues of A. To get the eigenvalues we solve the characteristic equation |A − λI| = 0:
\[ |A - \lambda I| = \begin{vmatrix} 2-\lambda & 1 \\ 1 & 2-\lambda \end{vmatrix} = 0. \]
Solving, we get λ = 3, 1.
An eigenvector corresponding to the eigenvalue λ = 3 is given by
\[ \begin{pmatrix} -1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}. \]
Solving, we get $\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$. The normalized eigenvector corresponding to λ = 3 is $\begin{pmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{pmatrix}$.
An eigenvector corresponding to the eigenvalue λ = 1 is given by
\[ \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}. \]
Solving, we get $\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} -1 \\ 1 \end{pmatrix}$. The normalized eigenvector corresponding to λ = 1 is $\begin{pmatrix} \frac{-1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{pmatrix}$.
The spectral decomposition of A is given as
\[ A = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{-1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix} \begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{-1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix}. \]
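
For a symmetric matrix, np.linalg.eigh returns exactly this factorization; a sketch reproducing Example 6:

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 2.]])
evals, Q = np.linalg.eigh(A)      # Q is orthogonal; columns are unit eigenvectors
print(evals)                      # [1. 3.]
print(np.allclose(Q @ np.diag(evals) @ Q.T, A))   # True: A = Q D Q^T
```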

Exercise 1.39. If A is a positive semidefinite matrix, then there exists a unique positive semidefinite matrix B such that B² = A. (The matrix B is called the square root of A and is denoted by A^{1/2}.)


 
Exercise 1.40. Obtain the spectral decomposition of $\begin{pmatrix} 1 & 3 \\ 3 & 1 \end{pmatrix}$.

Exercise 1.41. Obtain the spectral decomposition of the matrices $\begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}$, $\begin{pmatrix} 2 & 5 \\ 5 & 3 \end{pmatrix}$.

1.9 Singular Values


Let A be an n × n matrix. The singular values of A are defined to be the eigenvalues of (AAᵀ)^{1/2}. The singular values are always nonnegative, since AAᵀ is a positive semidefinite matrix, and we denote them by

σ1(A) ≥ ··· ≥ σn(A)

or simply by

σ1 ≥ ··· ≥ σn.

Suppose A is an m × n matrix with m < n. Augment A by n − m zero rows to get a square matrix, say B. Then the singular values of A are defined to be the singular values of B. Suppose m > n; then a similar definition can be given by augmenting A by zero columns instead of zero rows.
The following assertions can be verified easily. We omit the proof.

(i) The singular values of A and P AQ are identical for any orthogonal matrices P,Q.

(ii) The rank of a matrix equals the number of nonzero singular values of the matrix.

(iii) If A is symmetric then the singular values of A are the absolute values of its eigenvalues. If A is
positive semidefinite then the singular values are the same as eigenvalues.

1.10 The Singular Value Decomposition


Theorem 1.11. Given an m × n matrix A, there exist orthogonal matrices U and V such that

A = U diag(σ1 , σ2 , . . . , σr , 0, . . . , 0)V T

where σ1 , σ2 , . . . , σr are the singular values of A.

Proof. If A is a matrix of size m × n and with rank r, it is clear that the matrix AAᵀ is of size m × m with rank r. Further, the eigenvalues of AAᵀ are non-negative; let the positive eigenvalues be σ1², σ2², ..., σr². Consider the orthogonal unit eigenvectors u1, u2, ..., ur of AAᵀ corresponding to the eigenvalues σ1², σ2², ..., σr². From the spectral decomposition theorem, we have

AAᵀ = P diag(σ1², σ2², ..., σr²)Pᵀ,


where P is the matrix obtained by taking the orthogonal unit eigenvectors u1, u2, ..., ur as its columns. Note that PᵀP = I and PPᵀAAᵀ = AAᵀ, which implies PPᵀA = A. Now write v_i = (1/σ_i)Aᵀu_i. Observe that
\[ A^T A v_i = A^T A \Big(\frac{1}{\sigma_i} A^T u_i\Big) = \frac{1}{\sigma_i} A^T (A A^T u_i) = \frac{1}{\sigma_i} A^T (\sigma_i^2 u_i) = \sigma_i^2 v_i, \]
and therefore the v_i are eigenvectors of AᵀA. Further, the v_i are orthogonal unit vectors, and therefore {v1, v2, ..., vr} is a set of orthonormal vectors. For Q obtained by taking {v1, v2, ..., vr} as its columns, we get

AᵀA = QD²Qᵀ,

where D² = diag(σ1², σ2², ..., σr²). From the definition of v_i, it is clear that

Q = AᵀPD⁻¹

and therefore

PDQᵀ = PDD⁻¹PᵀA = PPᵀA = A.

Now extend the matrices P and Q to orthogonal matrices U and V, respectively, to get
\[ A = U \begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix} V^T. \]

Remark: u_i is an eigenvector of AAᵀ and v_i is an eigenvector of AᵀA corresponding to the same eigenvalue σ_i². These vectors are called the singular vectors of A.
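
np.linalg.svd computes this factorization directly; a minimal sketch (the matrix is our own example):

```python
import numpy as np

A = np.array([[3., 1., 1.],
              [-1., 3., 1.]])
U, s, Vt = np.linalg.svd(A)        # A = U S Vt with U, Vt orthogonal
S = np.zeros(A.shape)
np.fill_diagonal(S, s)             # place singular values on the diagonal of S
print(s)                           # singular values in descending order
print(np.allclose(U @ S @ Vt, A))  # True
```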

Exercise 1.42. Find the singular value decomposition of the matrix
\[ A = \begin{pmatrix} 1 & -2 & 2 \\ -1 & 2 & -2 \end{pmatrix}. \]

1.11 Generalized Inverses and Applications


Consider the linear system
Ax = y (1.3)

where A is an m × n matrix and y ∈ R (A), the range space of A. If the matrix A is nonsingular then
x = A −1 y will be the solution to the system (1.3). Suppose the matrix A is singular or m ̸= n then we
need a right candidate G of order n × m such that

AG y = y. (1.4)

That is G y is a solution to the linear system (1.3). Equivalently, G is of order n × m such that

AG A = A. (1.5)

Hence we can define the generalized inverse as the following.


Definition 1.9. Given an m × n matrix A, an n × m matrix G is said to be a generalized inverse of A if

AGA = A.

G is also known as g-inverse, {1}−inverse, pseudo inverse, partial inverse by many authors in the
literature. We denote an arbitrary generalized inverse by A − . The set of all generalized inverses is
denoted by { A − }.
Remark: If A is square and nonsingular, then A −1 is the unique g-inverse of A.

Lemma 1.2. If G is a g-inverse of A, then rank(A) = rank(AG) = rank(G A).

Proof. Since AG A = A and rank(AB) ≤ min{ rank(A), rank(B)} for any two matrices A, B, we have

rank (A) = rank (AG A) ≤ rank (AG) ≤ rank (A).

Also,
rank (A) = rank (AG A) ≤ rank (G A) ≤ rank (A).

=⇒ rank(A) = rank(AG) = rank(G A).

Example 7. Let A be a matrix and let G be a g-inverse of A. Show that the class of all g-inverses of A is
given by
G + (I − G A)U + V (I − AG),

where U, V are arbitrary.

Solution. Consider any matrix G + (I − G A)U + V (I − AG) where U and V are arbitrary matrices and
G is a g-inverse of A.
We have AG A = A. Now,

A(G + (I − G A)U + V (I − AG))A = AG A + A(I − G A)U A + AV (I − AG)A

= AG A + AU A − AG AU A + AV A − AV AG A

= A + AU A − AU A + AV A − AV A

=A

=⇒ A(G + (I − G A)U + V (I − AG))A = A.

Conversely, consider any H ∈ {A⁻}. With G a g-inverse of A, write

H = G + (H − G) = G + W, where W = H − G and AWA = 0.

Now W = (I − GA)W + GAW, and AWA = 0 ⟹ GAWA = 0 ⟹ GAW = GAW(I − AG). So

W = (I − GA)W + GAW(I − AG).

Therefore H = G + (I − GA)U + V(I − AG), with U = W = H − G and V = GAW.

∴ H ∈ {G + (I − GA)U + V(I − AG) : G is a g-inverse of A and U, V are arbitrary}.
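
A numerical spot-check of this parametrization, taking G to be the Moore-Penrose inverse from np.linalg.pinv and random U, V (the matrix is our own example):

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1., 0., -1., 2.],
              [2., 0., -2., 4.],
              [-1., 1., 1., 3.]])
G = np.linalg.pinv(A)                        # one particular g-inverse of A
U = rng.standard_normal(G.shape)
V = rng.standard_normal(G.shape)
H = G + (np.eye(4) - G @ A) @ U + V @ (np.eye(3) - A @ G)
print(np.allclose(A @ H @ A, A))             # True: H is again a g-inverse
```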


1.12 Construction of generalized inverse


The following characterizations are easy to verify.

1. If A = BC is a rank factorization, then

G = C_r⁻ B_l⁻

is a g-inverse of A, where C_r⁻ is a right inverse of C and B_l⁻ is a left inverse of B.

Proof. Note that B has a left inverse B_l⁻ (since B has full column rank) and C has a right inverse C_r⁻ (since C has full row rank). Now set G = C_r⁻ B_l⁻; then

AGA = BC C_r⁻ B_l⁻ BC = B I_r I_r C = BC = A.

Hence, G is a g-inverse of A.

 
2. If $A = P \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} Q$ for any non-singular matrices P and Q, then
\[ G = Q^{-1} \begin{pmatrix} I_r & U \\ W & V \end{pmatrix} P^{-1} \]
is a generalized inverse of A for arbitrary U, V and W.

Proof. Note that $\begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}$ is a g-inverse of itself, since
\[ \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}. \]
Further note that, for any U, V, W of appropriate sizes, $\begin{pmatrix} I_r & U \\ W & V \end{pmatrix}$ is a g-inverse of $\begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}$, since
\[ \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} I_r & U \\ W & V \end{pmatrix} \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}. \]
Now set $G = Q^{-1} \begin{pmatrix} I_r & U \\ W & V \end{pmatrix} P^{-1}$. Since
\[ AGA = P \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} Q Q^{-1} \begin{pmatrix} I_r & U \\ W & V \end{pmatrix} P^{-1} P \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} Q = P \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} Q = A, \]
G is a g-inverse of A.


This also shows that any matrix which is not a square, nonsingular matrix admits infinitely many
g-inverses.

3. Let A be of rank r. Since A is a matrix of rank r, there exists an r × r submatrix whose determinant is nonzero. Without loss of generality, let
\[ A = \begin{pmatrix} B & C \\ D & E \end{pmatrix}, \]
where B_{r×r} is a non-singular submatrix of A. Then there exists a matrix X such that C = BX and E = DX, and
\[ G = \begin{pmatrix} B^{-1} & 0 \\ 0 & 0 \end{pmatrix} \]
is a g-inverse of A. (Easy to verify.)


 
1 0 −1 2
 
 2 0 −2 4
Example 8. Find two different g-inverses of  .
 
−1 1 1 3
 
−2 2 2 6

Solution. Note that the matrix is of rank 2, since the Echelon form of the matrix is,
 
1 0 −1 2
 
 0 1 0 5 
.
 

 0 0 0 0 
 
0 0 0 0
 
2 0
Now, note that, A 1 =   is a 2 × 2 nonsingular minor. Now, fitting the inverse of A 1 in the
−1 1
appropriate place, we get the
 
0 1/2 0 0
 
 0 1/2 1 0 
G1 =  .
 
 0 0 0 0 
 
0 0 0 0
   
1 0 1 0
Similarly, A 2 =   , which gives A −1 =  . And the g-inverse is,
2
−1 1 1 1
 
1 0 0 0
 
 1 0 1 0 
G2 =  .
 
 0 0 0 0 
 
0 0 0 0


Definition 1.10. A g-inverse G of A is called a reflexive g-inverse if it satisfies

G AG = G.

Note that, if G is any g-inverse of A, G AG is a reflexive g-inverse of A.

Theorem 1.12. Let G be a g-inverse of A. Then

rank A ≤ rankG.

Furthermore, equality holds if and only if G is reflexive.

Proof. From Lemma 1.2, rank(A) = rank(GA) ≤ rank(G). If G is a reflexive g-inverse of A, then A is a g-inverse of G, hence rank(G) ≤ rank(A), and equality holds.
Conversely, suppose rank(A) = rank(G). First observe that C(GA) ⊆ C(G). By Lemma 1.2, rank(G) = rank(GA) and hence C(G) = C(GA). Therefore G = GAX for some X. Now GAG = GAGAX = GAX = G, and hence G is reflexive.

1.13 Minimum Norm, Least Squares g-inverse and Moore-Penrose inverse
Definition 1.11 (minimum norm g-inverse). A g-inverse G of A is said to be a minimum norm g-inverse
if, in addition to AG A = A, it satisfies
(G A)T = G A.

Definition 1.12 (least squares g-inverse). A g-inverse G of A is said to be a least squares g-inverse if,
in addition to AG A = A, it satisfies
(AG)T = AG.

Definition 1.13 (Moore-Penrose inverse). If G is a reflexive g-inverse of A which is both minimum norm
and least squares then it is called a Moore-Penrose inverse of A.
In other words, G is said to be Moore-Penrose inverse of A if it satisfies

AG A = A, G AG = G, (AG)T = AG and (G A)T = G A.

Moore-Penrose inverse is denoted by A + .

Lemma 1.3. Let A be a complex matrix of order m × n. Then the Moore-Penrose inverse of A exists and is unique.

Proof. Let A = BC be a rank factorization. Then

B+ = (B T B)−1 B T , C + = C T (CC T )−1

and then
A + = C + B+ .

Verification:


(i)

A A + A = BCC T (CC T )−1 (B T B)−1 B T BC

= BC (∵ CC T (CC T )−1 = I, (B T B)−1 B T B = I)

=A

(ii)

A + A A + = C T (CC T )−1 (B T B)−1 B T BCC T (CC T )−1 (B T B)−1 B T

= C T (CC T )−1 (B T B)−1 B T

= A+

(iii)

A A + = BCC T (CC T )−1 (B T B)−1 B T

= B(B T B)−1 B T

(A A + )T = (B(B T B)−1 B T )T

= B(B T B)−1 B T

∴ (A A + )T = A A +

(iv)

A + A = C T (CC T )−1 (B T B)−1 B T BC

= C T (CC T )−1 C

(A + A)T = (C T (CC T )−1 C)T

= C T (CC T )−1 C

∴ (A + A)T = A + A.

Since, all the four conditions of Moore-Penrose are satisfied, therefore A + is Moore-Penrose inverse of
A. Hence the existence. To prove the uniqueness, let G 1 and G 2 be two Moore-Penrose inverse of A.
Then

G 1 = G 1 AG 1 = G 1 G 1T A T (∵ AG 1 = (AG 1 )T )

= G 1 G 1T A T G 2T A T (∵ AG 2 A = A)

= G 1 G 1T A T AG 2 (∵ AG 2 = (AG 2 )T )

= G 1 AG 1 AG 2 (∵ AG 1 = (AG 1 )T )

= G 1 AG 2 AG 2 (AG 1 A = A, G 2 AG 2 = G 2 )

= G 1 A A T G 2T G 2 (∵ G 2 A = (G 2 A)T )

= A T G 1T A T G 2T G 2 (∵ G 1 A = (G 1 A)T )

= A T G 2T G 2 (∵ AG 1 A = A)

= G 2 AG 2 = G 2 (∵ G 2 A = (G 2 A)T , G 2 AG 2 = G 2 ).


Hence, whenever the Moore-Penrose inverse exists, it is unique.
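
The formulas B⁺ = (BᵀB)⁻¹Bᵀ and C⁺ = Cᵀ(CCᵀ)⁻¹ from the proof are directly computable. A sketch using a rank factorization of the rank-1 matrix from Exercise 1.43 below:

```python
import numpy as np

# A rank factorization A = BC of the rank-1 matrix from Exercise 1.43
B = np.array([[2.],
              [4.]])
C = np.array([[1., 0.5, 1.5]])
A = B @ C                                    # [[2, 1, 3], [4, 2, 6]]

B_plus = np.linalg.inv(B.T @ B) @ B.T        # left inverse of B
C_plus = C.T @ np.linalg.inv(C @ C.T)        # right inverse of C
A_plus = C_plus @ B_plus                     # Moore-Penrose inverse A+ = C+ B+

print(np.allclose(A_plus, np.linalg.pinv(A)))   # True
```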

1.13.1. Construction of Moore-Penrose inverse

(i) If A is a matrix of rank 1, then A⁺ = (1/α)Aᵀ is the Moore-Penrose inverse of A, where α = Trace(AᵀA). (Proof as exercise.)

Exercise 1.43. Find the Moore-Penrose inverse of $A = \begin{pmatrix} 2 & 1 & 3 \\ 4 & 2 & 6 \end{pmatrix}$.

Here rank(A) = 1,
\[ A^T A = \begin{pmatrix} 20 & 10 & 30 \\ 10 & 5 & 15 \\ 30 & 15 & 45 \end{pmatrix}, \quad \mathrm{Trace}(A^T A) = 70, \]
so
\[ A^+ = \frac{1}{\alpha} A^T = \begin{pmatrix} 2/70 & 4/70 \\ 1/70 & 2/70 \\ 3/70 & 6/70 \end{pmatrix}, \quad \text{where } \alpha = \mathrm{Trace}(A^T A). \]

(ii) Let A be an m × n matrix. The singular value decomposition of A is given by A_{(m×n)} = U_{(m×m)} Σ_{(m×n)} Vᵀ_{(n×n)}, where U and V are orthogonal matrices and Σ is a block diagonal matrix consisting of the singular values of A and zeros.

By using the singular value decomposition, the Moore-Penrose inverse of A is given by A⁺_{(n×m)} = V_{(n×n)} Σ⁺_{(n×m)} Uᵀ_{(m×m)}, where Σ⁺ is a block diagonal matrix consisting of the reciprocals of the nonzero singular values of A and zeros. (Proof left as an exercise.)
 
Example 9. Find the Moore-Penrose inverse of $A = \begin{pmatrix} 1 & -1 \\ -2 & 2 \\ 2 & -2 \end{pmatrix}$ using the S.V.D.

Solution. Given $A = \begin{pmatrix} 1 & -1 \\ -2 & 2 \\ 2 & -2 \end{pmatrix}$.
The formula for the SVD is A_{m×n} = U_{m×m} Σ_{m×n} Vᵀ_{n×n}, where U, V are orthogonal matrices. In this case the formula for calculating the Moore-Penrose inverse is A⁺ = V Σ⁺ Uᵀ.

The first step is to find the singular values of A. To get the singular values we find the eigenvalues of AᵀA by solving |AᵀA − λI| = 0. Now
\[ A^T A = \begin{pmatrix} 1 & -2 & 2 \\ -1 & 2 & -2 \end{pmatrix} \begin{pmatrix} 1 & -1 \\ -2 & 2 \\ 2 & -2 \end{pmatrix} = \begin{pmatrix} 9 & -9 \\ -9 & 9 \end{pmatrix}. \]
Therefore
\[ |A^T A - \lambda I| = \begin{vmatrix} 9-\lambda & -9 \\ -9 & 9-\lambda \end{vmatrix} = 0 \;\Rightarrow\; \lambda = 18, 0. \]
Thus the singular values are σ = √18, 0.
Finding an eigenvector corresponding to λ = 18 by solving the matrix equation (AᵀA − λI)X = 0, i.e.,
\[ \begin{pmatrix} -9 & -9 \\ -9 & -9 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \]
we get $v = \begin{pmatrix} -1 \\ 1 \end{pmatrix}$. Normalizing v we get $v_1 = \begin{pmatrix} \frac{-1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{pmatrix}$.
We require a unit vector v₂ orthogonal to v₁. To find v₂ we write $v_2 = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}$ such that y₁² + y₂² = 1 and v₁ᵀv₂ = 0, i.e., (1/√2)(−y₁ + y₂) = 0. Therefore
\[ v_2 = \begin{pmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{pmatrix}, \qquad V = \begin{pmatrix} \frac{-1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix}. \]
Next we find U_{3×3}. Let
\[ u_1 = \frac{1}{\sigma_1} A v_1 = \frac{1}{\sqrt{18}} \begin{pmatrix} 1 & -1 \\ -2 & 2 \\ 2 & -2 \end{pmatrix} \begin{pmatrix} \frac{-1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{pmatrix} = \begin{pmatrix} \frac{-1}{3} \\ \frac{2}{3} \\ \frac{-2}{3} \end{pmatrix}. \]
We now extend the set {u₁} to form an orthonormal basis for R³. We need two orthonormal vectors which are orthogonal to u₁, satisfying u₁ᵀx = 0, i.e., −x₁ + 2x₂ − 2x₃ = 0. Then
\[ \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 2x_2 - 2x_3 \\ x_2 \\ x_3 \end{pmatrix} = x_2 \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix} + x_3 \begin{pmatrix} -2 \\ 0 \\ 1 \end{pmatrix}. \]
A basis for the solution set is given by
\[ w_1 = \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix}, \quad w_2 = \begin{pmatrix} -2 \\ 0 \\ 1 \end{pmatrix}. \]
Applying the Gram-Schmidt process to {w₁, w₂} we obtain
\[ u_2 = \begin{pmatrix} \frac{2}{\sqrt{5}} \\ \frac{1}{\sqrt{5}} \\ 0 \end{pmatrix}, \quad u_3 = \begin{pmatrix} \frac{-2}{\sqrt{45}} \\ \frac{4}{\sqrt{45}} \\ \frac{5}{\sqrt{45}} \end{pmatrix}, \qquad \therefore\; U = \begin{pmatrix} \frac{-1}{3} & \frac{2}{\sqrt{5}} & \frac{-2}{\sqrt{45}} \\ \frac{2}{3} & \frac{1}{\sqrt{5}} & \frac{4}{\sqrt{45}} \\ \frac{-2}{3} & 0 & \frac{5}{\sqrt{45}} \end{pmatrix}. \]
Therefore
\[ A = U \begin{pmatrix} \sqrt{18} & 0 \\ 0 & 0 \\ 0 & 0 \end{pmatrix} V^T \]
is the required singular value decomposition of A. Hence the Moore-Penrose inverse of A is
\[ A^+ = V \Sigma^+ U^T = \begin{pmatrix} \frac{-1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix} \begin{pmatrix} \frac{1}{\sqrt{18}} & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} U^T = \begin{pmatrix} \frac{1}{18} & \frac{-1}{9} & \frac{1}{9} \\ \frac{-1}{18} & \frac{1}{9} & \frac{-1}{9} \end{pmatrix}. \]
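
The same computation via np.linalg.svd confirms the hand calculation in Example 9:

```python
import numpy as np

A = np.array([[1., -1.],
              [-2., 2.],
              [2., -2.]])
U, s, Vt = np.linalg.svd(A)
s_plus = np.array([1.0 / x if x > 1e-12 else 0.0 for x in s])  # reciprocals of nonzero singular values
S_plus = np.zeros((A.shape[1], A.shape[0]))
np.fill_diagonal(S_plus, s_plus)
A_plus = Vt.T @ S_plus @ U.T
print(A_plus)                                   # [[ 1/18, -1/9, 1/9], [-1/18, 1/9, -1/9]]
print(np.allclose(A_plus, np.linalg.pinv(A)))   # True
```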
We now obtain some characterizations of the Moore-Penrose inverse in terms of volume. For example, we will show that if A is an n × n matrix then A⁺ is a g-inverse of A with minimum volume. First we prove some preliminary results. It is easily seen that A⁺ can be determined from the singular value decomposition of A. A more general result is proved next.

Theorem 1.13. Let A be an n × n matrix of rank r and let
\[ A = P \begin{pmatrix} \Sigma & 0 \\ 0 & 0 \end{pmatrix} Q \]
be the singular value decomposition of A, where P, Q are orthogonal and Σ = diag(σ1, ..., σr). Then the class of g-inverses of A is given by
\[ G = Q^T \begin{pmatrix} \Sigma^{-1} & X \\ Y & Z \end{pmatrix} P^T \tag{1.6} \]
where X, Y, Z are arbitrary matrices of appropriate dimensions. The class of reflexive g-inverses G of A is given by (1.6) with the additional condition that Z = YΣX. The class of least squares g-inverses G of A is given by (1.6) with X = 0. The class of minimum norm g-inverses G of A is given by (1.6) with Y = 0. Finally, the Moore-Penrose inverse of A is given by (1.6) with X, Y, Z all being zero.
