
Chapter 1

Matrix Algebra

1.1 Vector
1. A vector is an ordered sequence of elements arranged in a row or column. Unless otherwise
noted, a vector will always be assumed to be a column vector. For example, a below is a
3-element column vector and x is an n-element column vector:
\[
a = \begin{bmatrix} 5 \\ 1 \\ 3 \end{bmatrix}, \qquad
x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
\]
Column vectors can be transformed into row vectors by the operation of transposition.
We denote the transposition operation by a prime. Thus, row vector:
\[
a' = [\,5\;\; 1\;\; 3\,], \qquad x' = [\,x_1\;\; x_2\;\; \dots\;\; x_n\,]
\]

2. The inner product of two vectors is defined as
\[
a'b = [\,a_1\;\; a_2\;\; \dots\;\; a_n\,]\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}
= a_1b_1 + a_2b_2 + \dots + a_nb_n = \sum_{i=1}^{n} a_ib_i = b'a
\]
That is, corresponding elements are multiplied together and summed to give the product,
which is a scalar.
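As a quick numerical illustration (a minimal NumPy sketch added here; the vector b is an arbitrary example, not from the notes):

```python
import numpy as np

a = np.array([5.0, 1.0, 3.0])
b = np.array([2.0, 4.0, 1.0])  # arbitrary second vector for illustration

# Inner product: multiply corresponding elements and sum.
inner = np.sum(a * b)          # 5*2 + 1*4 + 3*1 = 17

# a'b = b'a, and both agree with the built-in dot product.
assert inner == np.dot(a, b) == np.dot(b, a)
```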
3. A special case of the inner product:
\[
\text{Sum of squares: } a'a = \sum_{i=1}^{n} a_i^2
\]
Example: Let e = [e_1 e_2 ... e_n]' be the (n × 1) vector of OLS residuals; then we can obtain
\[
\text{Residual sum of squares} = e'e = \sum_{i=1}^{n} e_i^2
\]

4. Definition: Let i be the vector whose elements are all ones:
\[
i_{n\times 1} = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}
\]

5. Suppose i and x are two (n × 1) column vectors; then
\[
\sum_{i=1}^{n} x_i = x_1 + x_2 + \dots + x_n
= [\,1\;\; 1\;\; \dots\;\; 1\,]\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = i'x
\]

6. Sample mean:
\[
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{n}\, i'x
\]

7. A column vector whose every element is the sample mean:
\[
\begin{bmatrix} \bar{x} \\ \bar{x} \\ \vdots \\ \bar{x} \end{bmatrix}_{n\times 1}
= i\bar{x} = i\,\frac{1}{n}\,i'x = \frac{1}{n}\,ii'x
\]
⇒ (1/n) ii' is an n × n matrix with every element equal to 1/n.
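The identities i'x = Σ x_i and (1/n)ii'x = (x̄, ..., x̄)' can be checked numerically; a small NumPy sketch (the data vector is arbitrary):

```python
import numpy as np

x = np.array([4.0, 7.0, 1.0])           # arbitrary data vector
n = len(x)
i = np.ones(n)                          # the "summer" vector i

total = i @ x                           # i'x = sum of the elements
xbar = total / n                        # sample mean = (1/n) i'x

J = np.outer(i, i) / n                  # (1/n) ii': every element equals 1/n
mean_vec = J @ x                        # a vector with xbar in every position

assert np.allclose(mean_vec, xbar)      # [4. 4. 4.] versus 4.0
```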

1.2 Matrices
1. Definition:
A matrix is a rectangular array of elements. The order of a matrix is given by the
number of rows and the number of columns. The first number is the number of rows, and
the second number is the number of columns. A matrix A of order (or dimension) m × n
can be expressed as
\[
A = \begin{bmatrix}
a_{11} & a_{12} & \dots & a_{1n} \\
a_{21} & a_{22} & \dots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \dots & a_{mn}
\end{bmatrix}
\]

Example:
\[
B = \begin{bmatrix} 2 & 1 & 7 & 4 \\ 1 & 2 & -2 & 1 \\ 4 & 1 & 2 & -3 \end{bmatrix}
\]
The matrix B is of order 3 × 4.

2. Definition: Let A be an m × n matrix. A is said to be a square matrix if m = n.

3. Definition: lower-triangular matrix
A matrix in which all elements above the diagonal are 0, e.g.
\[
A = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 3 & 0 \\ 4 & 5 & 6 \end{bmatrix}
\]

4. Definition: upper-triangular matrix
A matrix in which all elements below the diagonal are 0, e.g.
\[
A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 6 \end{bmatrix}
\]

5. A diagonal matrix is a square matrix in which all off-diagonal elements are zero:
\[
A = \begin{bmatrix}
a_{11} & 0 & \dots & 0 \\
0 & a_{22} & \dots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \dots & a_{nn}
\end{bmatrix}
\]
We often denote a diagonal matrix A as
\[
\operatorname{diag}(a_{11}, a_{22}, \dots, a_{nn}),
\]
where a_{ii} is the ith element on the principal diagonal.

6. The identity matrix of order n × n is defined as
\[
I_n = \begin{bmatrix}
1 & 0 & \dots & 0 \\
0 & 1 & \dots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \dots & 1
\end{bmatrix}
\]

7. Suppose A is a matrix of order m × n; it follows that
\[
I_m A = A I_n = A
\]

That is, pre- or post-multiplying by I does not change the matrix.

8. The identity matrix can be entered or suppressed at will in matrix multiplication. For
example,
y − Py = Iy − Py = (I − P)y = My, where M = I − P

9. The transpose of a matrix A = [a_{ij}], denoted A', is obtained by creating the matrix
whose kth row is the kth column of the original matrix, i.e. A' = [a_{ji}].
Example:
\[
A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}, \qquad
A' = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}
\]

10. A square matrix A is a symmetric matrix if and only if A = A'.
Example 1:
\[
A = \begin{bmatrix} 1 & -2 & 3 \\ -2 & 1 & 4 \\ 3 & 4 & 2 \end{bmatrix} = A'
\]
Example 2:
If X is any n × K matrix, then X'X is a symmetric matrix.
11. If k is a scalar, then (kA)' = A'k' = A'k = kA' (a scalar equals its own transpose).
12. (A′ )′ = A
13. The transpose of a sum is the sum of the transposes.
(A + B)′ = A′ + B′

14. The transpose of an identity matrix I is the identity matrix itself, i.e. I′ = I.
15. Definition: A matrix with all elements zero is said to be a null matrix and denoted as On .
16. Two matrices A and B are said to be equal if A − B = O.
17. Definition: Matrix multiplication
If the matrix A is m × n and B is n × k, then the product AB is defined. Write
\[
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
= \begin{bmatrix} a_1' \\ a_2' \\ \vdots \\ a_m' \end{bmatrix},
\qquad
B = \begin{bmatrix}
b_{11} & b_{12} & \cdots & b_{1k} \\
b_{21} & b_{22} & \cdots & b_{2k} \\
\vdots & \vdots & \ddots & \vdots \\
b_{n1} & b_{n2} & \cdots & b_{nk}
\end{bmatrix}
= [\,b_1\;\; b_2\;\; \cdots\;\; b_k\,]
\]
Then C = AB is the m × k matrix
\[
C = AB = \begin{bmatrix} a_1' \\ a_2' \\ \vdots \\ a_m' \end{bmatrix}
[\,b_1\;\; b_2\;\; \cdots\;\; b_k\,]
= \begin{bmatrix}
a_1'b_1 & a_1'b_2 & \cdots & a_1'b_k \\
a_2'b_1 & a_2'b_2 & \cdots & a_2'b_k \\
\vdots & \vdots & \ddots & \vdots \\
a_m'b_1 & a_m'b_2 & \cdots & a_m'b_k
\end{bmatrix}
\]

Example 1:
\[
A = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix}, \qquad
B = \begin{bmatrix} 2 & 1 & 3 \\ 0 & 1 & 2 \end{bmatrix}
\]
\[
\Rightarrow AB = \begin{bmatrix} 1(2)+3(0) & 1(1)+3(1) & 1(3)+3(2) \\ 2(2)+4(0) & 2(1)+4(1) & 2(3)+4(2) \end{bmatrix}
= \begin{bmatrix} 2 & 4 & 9 \\ 4 & 6 & 14 \end{bmatrix}
\]

Example 2:
Let X be an n × K matrix defined as
\[
X_{(n\times K)} = \begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1K} \\
x_{21} & x_{22} & \cdots & x_{2K} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{nK}
\end{bmatrix}
= \begin{bmatrix} x_1' \\ x_2' \\ \vdots \\ x_n' \end{bmatrix}
\]
where
\[
x_i = \begin{bmatrix} x_{i1} \\ x_{i2} \\ \vdots \\ x_{iK} \end{bmatrix}, \qquad i = 1, 2, \dots, n
\]
It turns out that
\[
X'X_{(K\times K)} = [\,x_1\;\; x_2\;\; \dots\;\; x_n\,]\begin{bmatrix} x_1' \\ x_2' \\ \vdots \\ x_n' \end{bmatrix}
= \sum_{i=1}^{n} x_ix_i'
\]
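The identity X'X = Σ x_i x_i' can be verified numerically; a short NumPy sketch with random data (the dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 5, 3
X = rng.normal(size=(n, K))            # rows are the observation vectors x_i'

# Direct product X'X versus the sum of outer products x_i x_i'.
XtX = X.T @ X
outer_sum = sum(np.outer(X[i], X[i]) for i in range(n))

assert np.allclose(XtX, outer_sum)
```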

18. Generally, AB ≠ BA even when both products exist, e.g.
\[
A = \begin{bmatrix} 4 & 7 \\ 3 & 2 \end{bmatrix}, \qquad
B = \begin{bmatrix} 1 & 5 \\ 6 & 8 \end{bmatrix}
\]
\[
\Rightarrow AB = \begin{bmatrix} 46 & 76 \\ 15 & 31 \end{bmatrix}
\neq BA = \begin{bmatrix} 19 & 17 \\ 48 & 58 \end{bmatrix}
\]

19. The transpose of the product of two matrices is given by
\[
(AB)' = B'A', \quad \text{where } A \text{ is } m\times n \text{ and } B \text{ is } n\times k.
\]
That is, the transpose of a product is the product of the transposes in reverse order.
Example:
\[
A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, \qquad
B = \begin{bmatrix} 2 & 0 \\ 1 & 1 \end{bmatrix}
\]
\[
\Rightarrow AB = \begin{bmatrix} 4 & 2 \\ 10 & 4 \end{bmatrix}
\;\Rightarrow\; (AB)' = \begin{bmatrix} 4 & 10 \\ 2 & 4 \end{bmatrix}
\]
\[
B'A' = \begin{bmatrix} 2 & 1 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix}
= \begin{bmatrix} 4 & 10 \\ 2 & 4 \end{bmatrix}
\]

20. A matrix A is an idempotent matrix if
\[
A = AA.
\]
That is, multiplying A by itself, however many times, simply reproduces the original matrix.
Example:
\[
A = \begin{bmatrix} 4 & -2 \\ 6 & -3 \end{bmatrix}, \qquad
A^2 = \begin{bmatrix} 4 & -2 \\ 6 & -3 \end{bmatrix}\begin{bmatrix} 4 & -2 \\ 6 & -3 \end{bmatrix}
= \begin{bmatrix} 4 & -2 \\ 6 & -3 \end{bmatrix}
\]

21. Example 1:
(1) projection matrix
\[
P = X(X'X)^{-1}X'
\]
(2) residual maker
\[
M = I - X(X'X)^{-1}X'
\]

22. Example 2: Let A = I − (1/n) ii' (an idempotent matrix); then
\[
Ax = x - \frac{1}{n}(ii')x = \begin{bmatrix} x_1 - \bar{x} \\ x_2 - \bar{x} \\ \vdots \\ x_n - \bar{x} \end{bmatrix}
\]

The matrix A transforms raw data into deviation form.
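A NumPy sketch checking that A = I − (1/n)ii' is idempotent and produces deviations from the mean (the data vector is arbitrary):

```python
import numpy as np

n = 4
i = np.ones(n)
A = np.eye(n) - np.outer(i, i) / n      # the demeaning matrix I - (1/n)ii'

x = np.array([3.0, 8.0, 5.0, 4.0])      # arbitrary data
dev = A @ x                              # deviations from the mean

assert np.allclose(dev, x - x.mean())
assert np.allclose(A @ A, A)             # idempotent: AA = A
```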

1.3 Trace of a Matrix


1. If the square matrix A is of order n × n, the trace of A is defined as the sum of the
elements on the principal diagonal, i.e.
\[
\operatorname{tr}(A) = \sum_{i=1}^{n} a_{ii}
\]

2. Basic properties of the trace:

(1) tr(cA) = c · tr(A), where c is a constant.


(2) tr(A′ ) = tr(A)

(3) tr(A + B) = tr(A) + tr(B)
(4) tr(In ) = n
(5) tr(AB) = tr(BA)
(6) tr(ABC) = tr(CAB) = tr(BCA)
(7) a'a = tr(a'a) = tr(aa'), where a is an n × 1 column vector.
(8) \( \operatorname{tr}(A'A) = \operatorname{tr}(AA') = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}^2 \)
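Properties (5) and (8) are easy to spot-check numerically; a NumPy sketch with random matrices:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 4))
B = rng.normal(size=(4, 3))

# Property (5): tr(AB) = tr(BA) even though AB (3x3) and BA (4x4) differ.
assert np.isclose(np.trace(A @ B), np.trace(B @ A))

# Property (8): tr(A'A) equals the sum of all squared elements.
assert np.isclose(np.trace(A.T @ A), np.sum(A**2))
```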

1.4 Determinant of a Square Matrix


1. Definition: For a 2 × 2 matrix
\[
A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\]
its determinant is defined as
\[
\det(A) = |A| = a_{11}(-1)^{1+1}|a_{22}| + a_{12}(-1)^{1+2}|a_{21}|
= a_{11}a_{22} - a_{21}a_{12}
\]
2. For a 3 × 3 matrix
\[
A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
\]
its determinant is
\[
|A| = a_{11}(-1)^{1+1}\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix}
+ a_{12}(-1)^{1+2}\begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix}
+ a_{13}(-1)^{1+3}\begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}
\]
3. Suppose A is an n × n matrix; then
\[
|A| = \sum_{j=1}^{n} a_{ij}C_{ij} \;\text{ for any row } i = 1, 2, \dots, n,
\qquad
|A| = \sum_{i=1}^{n} a_{ij}C_{ij} \;\text{ for any column } j = 1, 2, \dots, n,
\]
where C_{ij} = (−1)^{i+j}|A_{ij}| is the cofactor of the element a_{ij}, and A_{ij} is the
submatrix obtained from A by deleting row i and column j.
4. If A is an n × n matrix and c is a nonzero constant, then |cA| = c^n|A|, e.g.
\[
A = \begin{bmatrix} 1 & 2 \\ 3 & 10 \end{bmatrix} \quad \text{and} \quad
B = 3A = \begin{bmatrix} 3 & 6 \\ 9 & 30 \end{bmatrix}
\;\Rightarrow\; |A| = 4, \quad |B| = 36 = 3^2 \cdot 4
\]
5. If any row (or column) of a matrix is a multiple of any other row (or column), then its
determinant is 0.
6. |A′| = |A|.
7. If A and B are square matrices of the same order, then |AB| = |A| · |B|.
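Properties 4, 6, and 7 can be confirmed on the example from item 4; a NumPy sketch:

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 10.0]])
assert np.isclose(np.linalg.det(A), 4.0)

# |cA| = c^n |A| for an n x n matrix.
assert np.isclose(np.linalg.det(3 * A), 3**2 * np.linalg.det(A))

# |AB| = |A| |B| and |A'| = |A| (B here is an arbitrary choice).
B = np.array([[2.0, 0.0], [1.0, 1.0]])
assert np.isclose(np.linalg.det(A @ B),
                  np.linalg.det(A) * np.linalg.det(B))
assert np.isclose(np.linalg.det(A.T), np.linalg.det(A))
```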

1.5 Inverse Matrix
1. An n × n matrix A has an inverse, denoted A^{-1}, provided that AA^{-1} = A^{-1}A = I_n.

2. A matrix that has an inverse is said to be a nonsingular matrix. Otherwise, it is said to


be singular.

3. The inverse of an identity matrix is the identity matrix itself, i.e. I^{-1} = I.

4. The inverse of the inverse is the original matrix itself, i.e. (A^{-1})^{-1} = A.

5. The inverse of the transpose is the transpose of the inverse, i.e. (A')^{-1} = (A^{-1})'.

6. If A and B are nonsingular, then (AB)^{-1} = B^{-1}A^{-1}.

7. Let A be a nonsingular square matrix of order n; then the inverse matrix of A is
\[
A^{-1} = \frac{1}{|A|}\begin{bmatrix}
C_{11} & C_{12} & \dots & C_{1n} \\
C_{21} & C_{22} & \dots & C_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
C_{n1} & C_{n2} & \dots & C_{nn}
\end{bmatrix}'
\]

8. Example:
\[
A = \begin{bmatrix} 1 & 3 & 4 \\ 1 & 2 & 1 \\ 2 & 4 & 5 \end{bmatrix}
\]
C_{11} = 6, C_{12} = −3, C_{13} = 0, C_{21} = 1, C_{22} = −3, C_{23} = 2, C_{31} = −5, C_{32} = 3, C_{33} = −1
\[
|A| = 1(6) + 3(-3) + 4(0) = -3
\]
\[
A^{-1} = \frac{1}{-3}\begin{bmatrix} 6 & -3 & 0 \\ 1 & -3 & 2 \\ -5 & 3 & -1 \end{bmatrix}'
= \frac{1}{-3}\begin{bmatrix} 6 & 1 & -5 \\ -3 & -3 & 3 \\ 0 & 2 & -1 \end{bmatrix}
= \begin{bmatrix} -2 & -\frac{1}{3} & \frac{5}{3} \\ 1 & 1 & -1 \\ 0 & -\frac{2}{3} & \frac{1}{3} \end{bmatrix}
\]
Verify:
\[
AA^{-1} = \begin{bmatrix} 1 & 3 & 4 \\ 1 & 2 & 1 \\ 2 & 4 & 5 \end{bmatrix}
\begin{bmatrix} -2 & -\frac{1}{3} & \frac{5}{3} \\ 1 & 1 & -1 \\ 0 & -\frac{2}{3} & \frac{1}{3} \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = I_3
\]
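The same inverse can be obtained numerically; a NumPy sketch reproducing the cofactor result above:

```python
import numpy as np

A = np.array([[1.0, 3.0, 4.0],
              [1.0, 2.0, 1.0],
              [2.0, 4.0, 5.0]])

A_inv = np.linalg.inv(A)

# Matches the adjugate computation above, e.g. first row is [-2, -1/3, 5/3].
expected = np.array([[-2.0, -1/3,  5/3],
                     [ 1.0,  1.0, -1.0],
                     [ 0.0, -2/3,  1/3]])
assert np.allclose(A_inv, expected)
assert np.allclose(A @ A_inv, np.eye(3))
```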

1.6 Rank of a Matrix


1. Definition: Linear Combinations
A linear combination of a set of n vectors a_1, a_2, ..., a_n is denoted as
\[
c_1a_1 + c_2a_2 + \dots + c_na_n
\]
for constants c_1, c_2, ..., c_n.

     
2. Example: Suppose
\[
a_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \quad
a_2 = \begin{bmatrix} 2 \\ 1 \end{bmatrix}, \quad
a_3 = \begin{bmatrix} 3 \\ 4 \end{bmatrix},
\]
and c_1 = −2, c_2 = 1, c_3 = 2; then the linear combination is
\[
c_1a_1 + c_2a_2 + c_3a_3
= -2\begin{bmatrix} 1 \\ 2 \end{bmatrix} + 1\begin{bmatrix} 2 \\ 1 \end{bmatrix} + 2\begin{bmatrix} 3 \\ 4 \end{bmatrix}
= \begin{bmatrix} 6 \\ 5 \end{bmatrix}
\]

3. Definition: Linearly Independent
Denote the n columns of the matrix A as a_1, a_2, ..., a_n. The set of these vectors is
linearly independent if and only if the only scalars satisfying
\[
c_1a_1 + c_2a_2 + \dots + c_na_n = 0
\]
are c_1 = c_2 = ... = c_n = 0. Otherwise they are linearly dependent.
4. Example:
Suppose a_1 = [1, 1]' and a_2 = [1, 2]'; then the only solution c_1 = c_2 = 0 satisfies
\[
c_1\begin{bmatrix} 1 \\ 1 \end{bmatrix} + c_2\begin{bmatrix} 1 \\ 2 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \end{bmatrix}.
\]
This implies that a_1 and a_2 are linearly independent. (|A| ≠ 0)
5. Example:
Suppose a_1 = [1, 2]' and a_2 = [2, 4]'; then there exist nonzero solutions c_1 = −2, c_2 = 1
such that
\[
c_1\begin{bmatrix} 1 \\ 2 \end{bmatrix} + c_2\begin{bmatrix} 2 \\ 4 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \end{bmatrix}.
\]
Therefore, we say that a_1 and a_2 are linearly dependent. (|A| = 0)
6. Let A be an m × n matrix. The rank of A, denoted by rank(A), is the maximum number
of linearly independent rows or columns of A. Equivalently, rank(A) = r if and only if
r is the order of the largest square submatrix (r × r) whose determinant is not zero.
7. If the maximum number of linearly independent columns (or rows) is equal to the number
of columns, we say that the matrix has a full column rank.
8. If the maximum number of linearly independent rows (or columns) is equal to the number
of rows, we say that the matrix has a full row rank.
9. If a square matrix A has full column (and row) rank, then |A| ≠ 0 and A is said to be
nonsingular.
10. Example: Suppose
\[
A = \begin{bmatrix} 3 & 2 & 7 \\ 0 & 1 & -3 \\ 3 & 4 & 1 \end{bmatrix}
\;\Rightarrow\; |A| = 0
\]
In this example, there exist nonzero solutions c_1 = −13/3, c_2 = 3, and c_3 = 1 such that
\[
-\frac{13}{3}\begin{bmatrix} 3 \\ 0 \\ 3 \end{bmatrix}
+ 3\begin{bmatrix} 2 \\ 1 \\ 4 \end{bmatrix}
+ \begin{bmatrix} 7 \\ -3 \\ 1 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}.
\]
Therefore, we say that a_1, a_2 and a_3 are linearly dependent.

Let A_{11} denote the submatrix of A obtained by deleting the first row and the first column of A. Then
\[
A_{11} = \begin{bmatrix} 1 & -3 \\ 4 & 1 \end{bmatrix}
\;\Rightarrow\; |A_{11}| = 13 \neq 0
\]
Hence rank(A) = 2.

11. rank(I_n) = n.

12. rank(cA) = rank(A), where c is a nonzero constant.

13. rank(A') = rank(A).

14. If A is an (m × n) matrix, then rank(A) ≤ min{m, n}.

15. If A is an (n × n) matrix, then rank(A) = n if and only if A is nonsingular.

16. Let X be an (n × K) matrix; then rank(X) = rank(X'X).

17. Let X be an (n × K) matrix and A be an (n × n) nonsingular matrix; then rank(AX) = rank(X).
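Several of these facts can be checked numerically; a NumPy sketch using the singular matrix from item 10:

```python
import numpy as np

A = np.array([[3.0, 2.0,  7.0],
              [0.0, 1.0, -3.0],
              [3.0, 4.0,  1.0]])

# |A| = 0, so A is singular; the largest nonsingular submatrix is 2 x 2.
assert np.isclose(np.linalg.det(A), 0.0)
assert np.linalg.matrix_rank(A) == 2

# rank(X) = rank(X'X), illustrated on a random n x K matrix.
X = np.random.default_rng(2).normal(size=(6, 3))
assert np.linalg.matrix_rank(X) == np.linalg.matrix_rank(X.T @ X)
```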

1.7 Partitioned Matrices


1. Let
\[
A = \left[\begin{array}{cc|c}
1 & 2 & 3 \\
4 & 7 & 5 \\
\hline
8 & 2 & 4 \\
2 & 1 & 3
\end{array}\right]
= \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix};
\]
then we say that A is a partitioned matrix.

2. If
\[
A = \begin{bmatrix} A_{11} & 0 \\ 0 & A_{22} \end{bmatrix}
\]
then the inverse of A is
\[
A^{-1} = \begin{bmatrix} A_{11}^{-1} & 0 \\ 0 & A_{22}^{-1} \end{bmatrix}
\]
provided that A_{11}^{-1} and A_{22}^{-1} exist.
3. If
\[
A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
\]
where A_{11} and A_{22} are square nonsingular matrices, then
\[
A^{-1} = \begin{bmatrix}
B_{11} & -B_{11}A_{12}A_{22}^{-1} \\
-A_{22}^{-1}A_{21}B_{11} & A_{22}^{-1} + A_{22}^{-1}A_{21}B_{11}A_{12}A_{22}^{-1}
\end{bmatrix}
\]
where B_{11} = (A_{11} − A_{12}A_{22}^{-1}A_{21})^{-1}. Or, alternatively,
\[
A^{-1} = \begin{bmatrix}
A_{11}^{-1} + A_{11}^{-1}A_{12}B_{22}A_{21}A_{11}^{-1} & -A_{11}^{-1}A_{12}B_{22} \\
-B_{22}A_{21}A_{11}^{-1} & B_{22}
\end{bmatrix}
\]
where B_{22} = (A_{22} − A_{21}A_{11}^{-1}A_{12})^{-1}.
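The first partitioned-inverse formula can be verified against a direct inversion; a NumPy sketch with arbitrary well-conditioned blocks:

```python
import numpy as np

rng = np.random.default_rng(3)
A11 = rng.normal(size=(2, 2)) + 3 * np.eye(2)   # shifted to stay nonsingular
A12 = rng.normal(size=(2, 2))
A21 = rng.normal(size=(2, 2))
A22 = rng.normal(size=(2, 2)) + 3 * np.eye(2)

A = np.block([[A11, A12], [A21, A22]])

A22_inv = np.linalg.inv(A22)
B11 = np.linalg.inv(A11 - A12 @ A22_inv @ A21)  # inverse of the Schur complement

A_inv_blocks = np.block([
    [B11,                  -B11 @ A12 @ A22_inv],
    [-A22_inv @ A21 @ B11,
     A22_inv + A22_inv @ A21 @ B11 @ A12 @ A22_inv],
])

assert np.allclose(A_inv_blocks, np.linalg.inv(A))
```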

1.8 Quadratic Forms and Definite Matrices


1. Definition: A quadratic form in x is a function of the form
\[
q = \sum_{i=1}^{n}\sum_{j=1}^{n} x_ix_ja_{ij}.
\]
Let A be an (n × n) symmetric matrix; then q = x'Ax.


Example:
\[
q = x'Ax = [\,x_1\;\; x_2\,]\begin{bmatrix} 1 & 2 \\ 2 & 6 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= x_1^2 + 4x_1x_2 + 6x_2^2
\]

2. Definition: A quadratic form x′ Ax or its matrix A is said to be

(1) positive definite (p.d.) if q > 0 for all nonzero x.


(2) negative definite (n.d.) if q < 0 for all nonzero x.
(3) positive semidefinite (p.s.d.) if q ≥ 0 for all nonzero x.
(4) negative semidefinite (n.s.d.) if q ≤ 0 for all nonzero x.

3. If A is an (n × n) symmetric matrix and rank(A)=n, then the following are all equivalent:

(1) x′ Ax > 0 (p.d.) for all nonzero x.


(2) The determinants of the n leading principal minors are all strictly positive, i.e.
\[
|a_{11}| > 0, \quad
\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} > 0, \quad
\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} > 0,
\;\dots,\; |A| > 0
\]
4. Example:
Show that \( A = \begin{bmatrix} 1 & 2 \\ 2 & 6 \end{bmatrix} \) is p.d.
(1) Method 1:
\[
q = x'Ax = [\,x_1\;\; x_2\,]\begin{bmatrix} 1 & 2 \\ 2 & 6 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= x_1^2 + 4x_1x_2 + 6x_2^2 = (x_1 + 2x_2)^2 + 2x_2^2
\]
q > 0 for all x ≠ 0, since q is a sum of squares.
(2) Method 2: A is p.d. since |a_{11}| = 1 > 0 and |A| = 2 > 0.
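Both methods, plus the eigenvalue criterion of Section 1.10 (item 16), can be checked numerically; a NumPy sketch:

```python
import numpy as np

A = np.array([[1.0, 2.0], [2.0, 6.0]])

# Leading principal minors: |a11| = 1 > 0 and |A| = 2 > 0, so A is p.d.
assert A[0, 0] > 0 and np.linalg.det(A) > 0

# Equivalently (Section 1.10, item 16), all eigenvalues are positive.
assert np.all(np.linalg.eigvalsh(A) > 0)

# Spot-check the quadratic form at a few random nonzero points.
rng = np.random.default_rng(4)
for _ in range(5):
    x = rng.normal(size=2)
    assert x @ A @ x > 0
```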

5. Properties of positive definite and positive semidefinite matrices

(1) A positive definite matrix has strictly positive diagonal elements, while a p.s.d.
matrix has nonnegative diagonal elements.
(2) If A is p.d., then A^{-1} exists and is p.d.
(3) If X is n × K, then X'X is p.s.d.
Proof:
Let c be a K × 1 nonzero vector and let y_{n×1} = Xc. Then
\[
q = c'X'Xc = y'y = \sum_i y_i^2 \ge 0.
\]
(4) If X is n × K and rank(X) = K, then X'X is p.d. and nonsingular.

6. Consider two matrices A and B with the same dimension, then

(1) A > B if A − B is p.d.


(2) A ≥ B if A − B is p.s.d.
(3) A < B if A − B is n.d.
(4) A ≤ B if A − B is n.s.d.

7. Example:
Show that the OLS estimator b = (X'X)^{-1}X'y has minimum variance in the class of linear
unbiased estimators.
Proof:
Let b_0 = Cy be another linear unbiased estimator of β, where C is a K × n matrix such
that CX = I. Let D = C − (X'X)^{-1}X'; then
\[
DX = CX - I_K = O_{K\times K}
\]
\[
b_0 = Cy = C(X\beta + \varepsilon) = \beta + C\varepsilon
\;\Rightarrow\; b_0 - \beta = C\varepsilon
\]
\begin{align*}
\operatorname{Var}(b_0|X) &= E[(b_0 - \beta)(b_0 - \beta)'|X] \\
&= E(C\varepsilon\varepsilon'C'|X) \\
&= C\,E(\varepsilon\varepsilon'|X)\,C' \\
&= \sigma^2 CC', \quad \text{since } E(\varepsilon\varepsilon'|X) = \sigma^2 I \\
&= \sigma^2\big[(D + (X'X)^{-1}X')(D + (X'X)^{-1}X')'\big] \\
&= \sigma^2(X'X)^{-1} + \sigma^2 DD', \quad \text{since } DX = O \\
&= \operatorname{Var}(b|X) + \sigma^2 DD'
\end{align*}
Since DD' is p.s.d., it follows that Var(b_0|X) ≥ Var(b|X). This implies that the least
squares estimator b is the best linear unbiased estimator (BLUE) of β.

Note:
Show that DD' is p.s.d.
Proof:
The quadratic form in DD' is
\[
q = z'DD'z = h'h = \sum_i h_i^2 \ge 0, \quad \text{where } h = D'z.
\]
Therefore, DD' is p.s.d.
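That DD' is p.s.d. for any matrix D can also be seen from its eigenvalues; a NumPy sketch with an arbitrary random D:

```python
import numpy as np

# DD' is p.s.d. for any D: z'DD'z = (D'z)'(D'z) >= 0.
rng = np.random.default_rng(5)
D = rng.normal(size=(3, 5))

eigvals = np.linalg.eigvalsh(D @ D.T)   # symmetric matrix, so use eigvalsh
assert np.all(eigvals >= -1e-12)        # nonnegative up to rounding error
```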

1.9 Matrix Differentiation


1. Suppose a function y = f(x_1, x_2, ..., x_n) = f(x) is a scalar-valued function of a vector x,
where x' = [x_1 x_2 ... x_n]. The gradient of y is denoted as
\[
\frac{\partial f(x)}{\partial x} = \begin{bmatrix}
\partial y/\partial x_1 \\
\partial y/\partial x_2 \\
\vdots \\
\partial y/\partial x_n
\end{bmatrix}
\]

The Hessian matrix (i.e. second-derivative matrix) of y is defined as
\[
H = \frac{\partial^2 y}{\partial x\,\partial x'}
= \frac{\partial(\partial y/\partial x)}{\partial x'}
= \begin{bmatrix}
f_{11} & f_{12} & \cdots & f_{1n} \\
f_{21} & f_{22} & \cdots & f_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
f_{n1} & f_{n2} & \cdots & f_{nn}
\end{bmatrix}
\]
where f_{ij} = ∂²y/∂x_i∂x_j.

2. If a' = [a_1 a_2 ... a_n] and x' = [x_1 x_2 ... x_n], then y = a'x = a_1x_1 + a_2x_2 + ... + a_nx_n and
\[
\frac{\partial y}{\partial x} = \frac{\partial(a'x)}{\partial x} = \frac{\partial(x'a)}{\partial x}
= \begin{bmatrix} \partial y/\partial x_1 \\ \partial y/\partial x_2 \\ \vdots \\ \partial y/\partial x_n \end{bmatrix}
= \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix} = a
\]
Example:
Residual sum of squares: e'e = y'y − 2b'X'y + b'X'Xb, so
\[
\frac{\partial(-2b'X'y)}{\partial b} = -2X'y
\]
3. Theorem:
If A is a symmetric matrix, then
\[
\frac{\partial x'Ax}{\partial x} = 2Ax.
\]
Example 1: Let \( A = \begin{bmatrix} 1 & 3 \\ 3 & 4 \end{bmatrix} \). Then
\[
x'Ax = [\,x_1\;\; x_2\,]\begin{bmatrix} 1 & 3 \\ 3 & 4 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= x_1^2 + 4x_2^2 + 6x_1x_2
\]
\[
\frac{\partial x'Ax}{\partial x} = \begin{bmatrix} 2x_1 + 6x_2 \\ 6x_1 + 8x_2 \end{bmatrix}
= 2\begin{bmatrix} 1 & 3 \\ 3 & 4 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = 2Ax
\]
Example 2:
\[
\frac{\partial(b'X'Xb)}{\partial b} = 2X'Xb, \quad \text{since } X'X \text{ is symmetric}
\]
∂b
4. If A is not symmetric, then
\[
\frac{\partial x'Ax}{\partial x} = (A + A')x.
\]
e.g. \( A = \begin{bmatrix} 1 & 3 \\ 0 & 4 \end{bmatrix} \). Then
\[
y = x'Ax = [\,x_1\;\; x_2\,]\begin{bmatrix} 1 & 3 \\ 0 & 4 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= x_1^2 + 4x_2^2 + 3x_1x_2
\]
\[
\frac{\partial x'Ax}{\partial x}
= \begin{bmatrix} \partial y/\partial x_1 \\ \partial y/\partial x_2 \end{bmatrix}
= \begin{bmatrix} 2x_1 + 3x_2 \\ 3x_1 + 8x_2 \end{bmatrix}
= \begin{bmatrix} 2 & 3 \\ 3 & 8 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= \left(\begin{bmatrix} 1 & 3 \\ 0 & 4 \end{bmatrix} + \begin{bmatrix} 1 & 0 \\ 3 & 4 \end{bmatrix}\right)
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= (A + A')x
\]

5. If y_{(m×1)} = A_{(m×n)} x_{(n×1)}, then
\[
\frac{\partial y}{\partial x} = A'_{(n\times m)}.
\]
e.g.
\[
y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}
= \begin{bmatrix} 5 & 3 & 2 \\ 2 & 1 & 3 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} 5x_1 + 3x_2 + 2x_3 \\ 2x_1 + x_2 + 3x_3 \end{bmatrix} = Ax
\]
\[
\frac{\partial y}{\partial x} = \frac{\partial(Ax)}{\partial x}
= \begin{bmatrix}
\partial y_1/\partial x_1 & \partial y_2/\partial x_1 \\
\partial y_1/\partial x_2 & \partial y_2/\partial x_2 \\
\partial y_1/\partial x_3 & \partial y_2/\partial x_3
\end{bmatrix}
= \begin{bmatrix} 5 & 2 \\ 3 & 1 \\ 2 & 3 \end{bmatrix} = A'
\]
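These differentiation rules can be validated against finite differences; a NumPy sketch using the non-symmetric matrix from item 4:

```python
import numpy as np

def numeric_grad(f, x, h=1e-6):
    """Central finite-difference gradient of a scalar function."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

A = np.array([[1.0, 3.0], [0.0, 4.0]])   # not symmetric
x = np.array([0.7, -1.2])                # arbitrary evaluation point

analytic = (A + A.T) @ x                  # the rule from item 4
numeric = numeric_grad(lambda v: v @ A @ v, x)
assert np.allclose(analytic, numeric, atol=1e-5)
```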

1.10 Eigenvalues and Eigenvectors


1. Suppose that we want to find the solutions of
\[
Ac = \lambda c
\]
where A is a known (k × k) square matrix, c is an unknown (k × 1) nonzero vector, and λ
is an unknown scalar. Thus,
\begin{align*}
Ac &= \lambda Ic \\
Ac - \lambda Ic &= 0 \\
(A - \lambda I)c &= 0
\end{align*}
If the inverse of (A − λI) existed, then
\[
(A - \lambda I)^{-1}(A - \lambda I)c = 0
\]
would imply that c = 0, contradicting the condition that c ≠ 0. It follows that the matrix
(A − λI) must be singular, i.e. (A − λI)^{-1} does not exist. This implies that
\[
|A - \lambda I| = 0
\]

2. Suppose A is a (k × k) matrix of known numbers such that
\[
A_{(k\times k)}\, c_{(k\times 1)} = \lambda\, c_{(k\times 1)}
\]
or
\[
(A - \lambda I)c = 0
\]
where λ is an unknown scalar and c is an unknown k × 1 vector. Then
\[
|A - \lambda I| = 0
\]
The above polynomial equation in λ of degree k is known as the characteristic equation
of A. Its roots λ are called the characteristic roots (or eigenvalues) of the matrix A. Each
λ_i can be substituted into (A − λI)c = 0 and the corresponding k × 1 vector c_i obtained,
where c_i is a nonzero vector. The c vectors are known as the characteristic vectors (or
eigenvectors) of A.

3. Ac_i = λ_i c_i, i = 1, 2, ..., k. Stacking all k solutions produces the matrix equation
\[
A_{k\times k}\,\big[\,c_1\;\; c_2\;\; \dots\;\; c_k\,\big]_{k\times k}
= \big[\,\lambda_1 c_1\;\; \lambda_2 c_2\;\; \dots\;\; \lambda_k c_k\,\big]_{k\times k} \tag{1.1}
\]
\[
= \big[\,c_1\;\; c_2\;\; \dots\;\; c_k\,\big]
\begin{bmatrix}
\lambda_1 & 0 & \dots & 0 \\
0 & \lambda_2 & \dots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \dots & \lambda_k
\end{bmatrix} \tag{1.2}
\]
It can be written as
\[
A_{k\times k}\,C_{k\times k} = C_{k\times k}\,\Lambda_{k\times k}
\]
where Λ is the diagonal matrix of eigenvalues. Assume C is nonsingular; then we obtain
the diagonalization of A:
\[
\Lambda = C^{-1}AC
\]
where Λ is a k × k diagonal matrix with the eigenvalues λ_i in the diagonal positions.

4. Example: Suppose
\[
A = \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix}
\]
\[
A - \lambda I = \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix} - \lambda\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
= \begin{bmatrix} 3-\lambda & 1 \\ 1 & 3-\lambda \end{bmatrix}
\]
The characteristic equation is given by
\[
|A - \lambda I| = 0
\;\Rightarrow\; \begin{vmatrix} 3-\lambda & 1 \\ 1 & 3-\lambda \end{vmatrix} = 0
\;\Rightarrow\; \lambda^2 - 6\lambda + 8 = (\lambda - 4)(\lambda - 2) = 0
\]
The eigenvalues of A are λ_1 = 4 and λ_2 = 2.

• Find eigenvectors:
(1) λ_1 = 4:
\[
Ac_1 = 4c_1, \quad \text{i.e.}\quad
\begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix}\begin{bmatrix} c_{11} \\ c_{21} \end{bmatrix}
= 4\begin{bmatrix} c_{11} \\ c_{21} \end{bmatrix}
\]
\[
\Rightarrow \begin{bmatrix} 3c_{11} + c_{21} \\ c_{11} + 3c_{21} \end{bmatrix}
= \begin{bmatrix} 4c_{11} \\ 4c_{21} \end{bmatrix}
\;\Rightarrow\; \begin{bmatrix} -c_{11} + c_{21} \\ c_{11} - c_{21} \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\;\Rightarrow\; c_{11} = c_{21}
\]
Let c_{11} = 1; then c_{21} = 1. Therefore, the eigenvector for λ_1 = 4 is
\[
c_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}
\]
(2) λ_2 = 2:
\[
Ac_2 = 2c_2, \quad \text{i.e.}\quad
\begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix}\begin{bmatrix} c_{12} \\ c_{22} \end{bmatrix}
= 2\begin{bmatrix} c_{12} \\ c_{22} \end{bmatrix}
\]
\[
\Rightarrow \begin{bmatrix} 3c_{12} + c_{22} \\ c_{12} + 3c_{22} \end{bmatrix}
= \begin{bmatrix} 2c_{12} \\ 2c_{22} \end{bmatrix}
\;\Rightarrow\; \begin{bmatrix} c_{12} + c_{22} \\ c_{12} + c_{22} \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\;\Rightarrow\; c_{12} = -c_{22}
\]
Let c_{12} = 1; then c_{22} = −1. Therefore, the eigenvector for λ_2 = 2 is
\[
c_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}
\]

• Since the equation system is homogeneous, it yields an infinite number of eigenvectors
corresponding to each root λ_i.
• Check Λ = C^{-1}AC:
\[
C = [\,c_1\;\; c_2\,] = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}
\;\Rightarrow\; C^{-1} = \frac{-1}{2}\begin{bmatrix} -1 & -1 \\ -1 & 1 \end{bmatrix}
\]
\[
\Rightarrow \Lambda = C^{-1}AC
= \frac{-1}{2}\begin{bmatrix} -1 & -1 \\ -1 & 1 \end{bmatrix}
\begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix}
\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}
= \begin{bmatrix} 4 & 0 \\ 0 & 2 \end{bmatrix}
\]
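The same eigenvalues and the diagonalization can be reproduced numerically; a NumPy sketch (np.linalg.eigh returns orthonormal eigenvectors for symmetric matrices, so it also illustrates items 7-10 that follow):

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 3.0]])

# eigh returns eigenvalues in ascending order with orthonormal eigenvectors.
lam, Q = np.linalg.eigh(A)
assert np.allclose(lam, [2.0, 4.0])

# Diagonalization: Q'AQ = diag(eigenvalues), and Q is orthogonal.
assert np.allclose(Q.T @ A @ Q, np.diag(lam))
assert np.allclose(Q.T @ Q, np.eye(2))
```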
5. The eigenvalues of a symmetric matrix are all real.

6. If all k eigenvalues are distinct, C will have k linearly independent columns, so that
\[
\Lambda = C^{-1}AC
\]

7. If A is symmetric, the eigenvectors are linearly independent and pairwise orthogonal, in
that c_i'c_j = 0 for λ_i ≠ λ_j.

8. Normalization of eigenvectors: c_i'c_i = 1, i = 1, ..., k.

9. Let Q denote the matrix whose columns are the normalized orthogonal eigenvectors. Then
\[
Q'Q = I
\]
The matrix Q is called an orthogonal matrix since its inverse is its transpose:
\[
Q'Q = QQ' = I \;\Rightarrow\; Q^{-1} = Q', \quad \text{since } Q^{-1}Q = I
\]
\[
Q'Q = \begin{bmatrix} c_1' \\ c_2' \\ \vdots \\ c_k' \end{bmatrix}
[\,c_1\;\; c_2\;\; \dots\;\; c_k\,]
= \begin{bmatrix}
c_1'c_1 & c_1'c_2 & \dots & c_1'c_k \\
c_2'c_1 & c_2'c_2 & \dots & c_2'c_k \\
\vdots & \vdots & \ddots & \vdots \\
c_k'c_1 & c_k'c_2 & \dots & c_k'c_k
\end{bmatrix}
= \begin{bmatrix}
1 & 0 & \dots & 0 \\
0 & 1 & \dots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \dots & 1
\end{bmatrix} = I
\]

10. Let A be a (k × k) symmetric matrix. Then there exists a (k × k) orthogonal matrix Q
such that Q'AQ = Λ is diagonal:
\[
Q'AQ = \Lambda \;\Leftrightarrow\; A = Q\Lambda Q'
\]

11. Example: (continued)

• λ_1 = 4:
\[
\begin{bmatrix} 3-4 & 1 \\ 1 & 3-4 \end{bmatrix}\begin{bmatrix} c_{11} \\ c_{21} \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\;\Rightarrow\; c_{11} = c_{21}, \qquad c_{11}^2 + c_{21}^2 = 1 \;\text{(normalization)}
\]
\[
\Rightarrow c_1 = \begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}
\;\text{ or }\; \begin{bmatrix} -1/\sqrt{2} \\ -1/\sqrt{2} \end{bmatrix}
\]
• λ_2 = 2:
\[
\begin{bmatrix} 3-2 & 1 \\ 1 & 3-2 \end{bmatrix}\begin{bmatrix} c_{12} \\ c_{22} \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\;\Rightarrow\; c_{12} = -c_{22}, \qquad c_{12}^2 + c_{22}^2 = 1 \;\text{(normalization)}
\]
\[
\Rightarrow c_2 = \begin{bmatrix} 1/\sqrt{2} \\ -1/\sqrt{2} \end{bmatrix}
\;\text{ or }\; \begin{bmatrix} -1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}
\]
• Check Q'Q = QQ' = I:
\[
Q'Q = \begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & -1/\sqrt{2} \end{bmatrix}'
\begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & -1/\sqrt{2} \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\]
• Check Q'AQ = Λ:
\[
Q'AQ = \begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & -1/\sqrt{2} \end{bmatrix}'
\begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix}
\begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & -1/\sqrt{2} \end{bmatrix}
= \begin{bmatrix} 4 & 0 \\ 0 & 2 \end{bmatrix} = \Lambda
\]

12. The determinant of a symmetric matrix is the product of its eigenvalues.
Proof:
\begin{align*}
|\Lambda| &= |Q'AQ| \\
&= |Q'|\cdot|A|\cdot|Q|, \quad \text{since } |AB| = |A|\cdot|B| \\
&= |Q'|\cdot|Q|\cdot|A| \\
&= |Q'Q|\cdot|A| \\
&= |I|\cdot|A| \\
&= |A|
\end{align*}

13. The sum of all the eigenvalues is equal to the trace of A.
Proof:
\begin{align*}
\operatorname{tr}(\Lambda) &= \operatorname{tr}(C^{-1}AC) \\
&= \operatorname{tr}(ACC^{-1}), \quad \text{since } \operatorname{tr}(AB) = \operatorname{tr}(BA) \\
&= \operatorname{tr}(A)
\end{align*}
14. The rank of A is equal to the number of nonzero eigenvalues.
Proof:
Let k_1 be the number of nonzero eigenvalues of the matrix A. Then
\begin{align*}
\operatorname{rank}(A) &= \operatorname{rank}(Q\Lambda Q') \\
&= \operatorname{rank}(\Lambda Q'), \quad \text{since } \operatorname{rank}(BX) = \operatorname{rank}(X) \text{ if } B \text{ is nonsingular} \\
&= \operatorname{rank}(Q\Lambda), \quad \text{since } \operatorname{rank}(X) = \operatorname{rank}(X') \\
&= \operatorname{rank}(\Lambda), \quad \text{since } \operatorname{rank}(BX) = \operatorname{rank}(X) \text{ if } B \text{ is nonsingular} \\
&= k_1
\end{align*}

15. The rank of an idempotent matrix is equal to its trace.


16. Suppose A is symmetric; then A is positive definite if and only if all eigenvalues of A are
positive.
17. If A is symmetric and positive definite, then there exists a nonsingular matrix P = QΛ^{1/2}
such that A = PP', where Q is an orthogonal matrix of eigenvectors and
\[
\Lambda^{1/2} = \begin{bmatrix}
\sqrt{\lambda_1} & 0 & \cdots & 0 \\
0 & \sqrt{\lambda_2} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sqrt{\lambda_k}
\end{bmatrix}.
\]
There is also a matrix T = Λ^{1/2}Q' such that A = T'T and TA^{-1}T' = I.
Proof:
\[
A = Q\Lambda Q' = Q\Lambda^{1/2}\Lambda^{1/2}Q' = (Q\Lambda^{1/2})(Q\Lambda^{1/2})' = PP'
\]
\[
A = Q\Lambda Q' = Q\Lambda^{1/2}\Lambda^{1/2}Q' = (\Lambda^{1/2}Q')'(\Lambda^{1/2}Q') = T'T
\]

18. Example: (continued)

(1) Check PP' = A:
\[
P = Q\Lambda^{1/2}
= \begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & -1/\sqrt{2} \end{bmatrix}
\begin{bmatrix} 2 & 0 \\ 0 & \sqrt{2} \end{bmatrix}
= \begin{bmatrix} \sqrt{2} & 1 \\ \sqrt{2} & -1 \end{bmatrix}
\]
\[
PP' = \begin{bmatrix} \sqrt{2} & 1 \\ \sqrt{2} & -1 \end{bmatrix}
\begin{bmatrix} \sqrt{2} & 1 \\ \sqrt{2} & -1 \end{bmatrix}'
= \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix} = A
\]
(2) Check T'T = A:
\[
T = \Lambda^{1/2}Q'
= \begin{bmatrix} 2 & 0 \\ 0 & \sqrt{2} \end{bmatrix}
\begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & -1/\sqrt{2} \end{bmatrix}'
= \begin{bmatrix} \sqrt{2} & \sqrt{2} \\ 1 & -1 \end{bmatrix}
\]
\[
T'T = \begin{bmatrix} \sqrt{2} & \sqrt{2} \\ 1 & -1 \end{bmatrix}'
\begin{bmatrix} \sqrt{2} & \sqrt{2} \\ 1 & -1 \end{bmatrix}
= \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix} = A
\]
(3) Check TA^{-1}T' = I:
\[
A = \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix}
\;\Rightarrow\; A^{-1} = \frac{1}{8}\begin{bmatrix} 3 & -1 \\ -1 & 3 \end{bmatrix}
\]
\[
TA^{-1}T' = \frac{1}{8}\begin{bmatrix} \sqrt{2} & \sqrt{2} \\ 1 & -1 \end{bmatrix}
\begin{bmatrix} 3 & -1 \\ -1 & 3 \end{bmatrix}
\begin{bmatrix} \sqrt{2} & \sqrt{2} \\ 1 & -1 \end{bmatrix}'
= \frac{1}{8}\begin{bmatrix} 8 & 0 \\ 0 & 8 \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I_2
\]
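The square-root factorizations A = PP' = T'T can be reproduced numerically; a NumPy sketch for the same A:

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 3.0]])
lam, Q = np.linalg.eigh(A)               # A is symmetric p.d.

P = Q @ np.diag(np.sqrt(lam))            # P = Q Lambda^{1/2}
T = np.diag(np.sqrt(lam)) @ Q.T          # T = Lambda^{1/2} Q'

assert np.allclose(P @ P.T, A)           # A = PP'
assert np.allclose(T.T @ T, A)           # A = T'T
assert np.allclose(T @ np.linalg.inv(A) @ T.T, np.eye(2))
```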

Chapter 2

Multivariate Distributions

2.1 Multivariate Densities


1. Let x denote a vector of random variables X_1, X_2, ..., X_k,
\[
x_{k\times 1} = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_k \end{bmatrix}.
\]
Then the expected values of x can be expressed as the vector
\[
\mu = E(x) = \begin{bmatrix} E(X_1) \\ E(X_2) \\ \vdots \\ E(X_k) \end{bmatrix}
= \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_k \end{bmatrix}
\]

The variance-covariance (or covariance) matrix is denoted by
\begin{align*}
\operatorname{Var}(x) &= E[(x - \mu)(x - \mu)'] \\
&= E\left\{\begin{bmatrix} X_1-\mu_1 \\ X_2-\mu_2 \\ \vdots \\ X_k-\mu_k \end{bmatrix}
\big[(X_1-\mu_1)\;\;(X_2-\mu_2)\;\;\cdots\;\;(X_k-\mu_k)\big]\right\} \\
&= \begin{bmatrix}
E(X_1-\mu_1)^2 & E[(X_1-\mu_1)(X_2-\mu_2)] & \dots & E[(X_1-\mu_1)(X_k-\mu_k)] \\
E[(X_2-\mu_2)(X_1-\mu_1)] & E(X_2-\mu_2)^2 & \dots & E[(X_2-\mu_2)(X_k-\mu_k)] \\
\vdots & \vdots & \ddots & \vdots \\
E[(X_k-\mu_k)(X_1-\mu_1)] & E[(X_k-\mu_k)(X_2-\mu_2)] & \dots & E(X_k-\mu_k)^2
\end{bmatrix} \\
&= \begin{bmatrix}
\sigma_{11} & \sigma_{12} & \dots & \sigma_{1k} \\
\sigma_{21} & \sigma_{22} & \dots & \sigma_{2k} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{k1} & \sigma_{k2} & \dots & \sigma_{kk}
\end{bmatrix} = \Sigma
\end{align*}
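The population moments μ and Σ can be recovered from simulated draws; a NumPy sketch (the particular μ and Σ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6], [0.6, 1.0]])

# Draw many k-variate normal vectors and estimate mu and Sigma.
x = rng.multivariate_normal(mu, Sigma, size=200_000)

mu_hat = x.mean(axis=0)
Sigma_hat = np.cov(x, rowvar=False)      # sample variance-covariance matrix

assert np.allclose(mu_hat, mu, atol=0.02)
assert np.allclose(Sigma_hat, Sigma, atol=0.02)
```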

2.2 Multivariate Normal Distribution


1. Let x be a k × 1 normal random vector. Its probability density is given by
\[
f(x) = (2\pi)^{-k/2}\,|\Sigma|^{-1/2}\exp\!\left\{-\frac{1}{2}(x-\mu)'\Sigma^{-1}(x-\mu)\right\}
\]
where μ = [μ_1 μ_2 ... μ_k]' and Σ is the positive definite matrix
\[
\Sigma = \begin{bmatrix}
\sigma_{11} & \sigma_{12} & \cdots & \sigma_{1k} \\
\sigma_{21} & \sigma_{22} & \cdots & \sigma_{2k} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{k1} & \sigma_{k2} & \cdots & \sigma_{kk}
\end{bmatrix}
\]
An abbreviated notation is
\[
x \sim N(\mu, \Sigma)
\]

2. If x_{k×1} ∼ N(μ, Σ_{k×k}), then

(1) x − μ ∼ N(0, Σ).

(2) Let c be a k × 1 vector of constants; then
\[
c'x \sim N(c'\mu,\; c'\Sigma c)
\]
since E(c'x) = c'E(x) = c'μ and
\begin{align*}
\operatorname{Var}(c'x) &= E\big[(c'x - E(c'x))(c'x - E(c'x))'\big] \\
&= E\big[(c'x - c'\mu)(c'x - c'\mu)'\big] \\
&= c'E\big[(x - \mu)(x - \mu)'\big]c \\
&= c'\Sigma c
\end{align*}

3. Suppose that x_{k×1} ∼ N(0, I_k); then
\[
x'x \sim \chi^2(k)
\]

4. Suppose that x ∼ N(0, σ²I); then
\[
\frac{1}{\sigma^2}\,x'x = x'(\sigma^2 I)^{-1}x \sim \chi^2(k)
\]

5. Suppose that x ∼ N(0, Σ), where Σ is a positive definite matrix. Then
\[
x'\Sigma^{-1}x \sim \chi^2(k)
\]

6. If x ∼ N(0, σ²I) and A is a (k × k) symmetric and idempotent matrix with rank r (r ≤ k),
then
\[
\frac{1}{\sigma^2}\,x'Ax \sim \chi^2(r)
\]
Proof:
Let Q denote the orthogonal matrix of eigenvectors of A, so that
\[
Q'AQ = \Lambda = \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}
\]
Define y = Q'x, so that x = Qy. Then E(y) = 0 and
\begin{align*}
\operatorname{Var}(y) &= E(yy') = E(Q'xx'Q) = Q'(\sigma^2 I)Q = \sigma^2 I \\
\Rightarrow y &\sim N(0, \sigma^2 I)
\end{align*}
\[
x'Ax = y'Q'AQy = y'\Lambda y = y_1^2 + y_2^2 + \dots + y_r^2
\]
Since y_i/σ ∼ N(0, 1),
\[
\frac{x'Ax}{\sigma^2} \sim \chi^2(r)
\]
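The χ²(r) result can be illustrated by Monte Carlo; a NumPy sketch that builds an idempotent A as a projection matrix (an arbitrary construction for illustration) and checks the first two moments of x'Ax/σ²:

```python
import numpy as np

rng = np.random.default_rng(7)
k, sigma2 = 4, 2.0

# A symmetric idempotent matrix of rank 2: projection onto 2 random columns.
Z = rng.normal(size=(k, 2))
A = Z @ np.linalg.inv(Z.T @ Z) @ Z.T
r = int(np.rint(np.trace(A)))            # rank = trace for an idempotent matrix
assert np.allclose(A @ A, A) and r == 2

# Simulate x ~ N(0, sigma^2 I) and form q = x'Ax / sigma^2.
x = rng.normal(scale=np.sqrt(sigma2), size=(100_000, k))
q = np.einsum('ni,ij,nj->n', x, A, x) / sigma2

# Sample mean and variance should match chi-square(r): mean r, variance 2r.
assert abs(q.mean() - r) < 0.05
assert abs(q.var() - 2 * r) < 0.2
```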
7. Suppose x ∼ N(0, σ²I) and there are two quadratic forms x'Ax and x'Bx, where A and B
are symmetric and idempotent matrices. Then x'Ax and x'Bx are statistically independent
if and only if
\[
AB = O.
\]

8. Assume x ∼ N(0, σ²I). Let L be an (m × n) matrix and A a symmetric matrix of order n.
The linear form Lx is independent of the quadratic form x'Ax if and only if LA = O.
