Chapter 1
Matrix Algebra
1.1 Vector
1. A vector is an ordered sequence of elements arranged in a row or column. Unless otherwise
noted, a vector will always be assumed to be a column vector. For example, a is a
3-element column vector and x is an n-element column vector:
$$a = \begin{bmatrix} 5 \\ 1 \\ 3 \end{bmatrix}, \qquad x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$
Column vectors can be transformed into row vectors by the operation of transposition.
We denote the transposition operation by a prime. Thus, the row vectors are
$$a' = [5 \ \ 1 \ \ 3], \qquad x' = [x_1 \ \ x_2 \ \ \cdots \ \ x_n]$$
4. Definition: Let i be a vector that contains a column of ones:
$$i_{n\times 1} = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}$$
6. Sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{n}\, i'x$
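The identity $\bar{x} = \frac{1}{n} i'x$ is easy to verify numerically. A minimal NumPy sketch (the data values are illustrative, not from the notes):

```python
import numpy as np

# Sample mean two ways: directly, and as (1/n) i'x with the
# summation vector i of ones (illustrative data).
x = np.array([2.0, 4.0, 6.0, 8.0])
n = len(x)
i = np.ones(n)                  # i: n-vector of ones
xbar = (1.0 / n) * (i @ x)      # (1/n) i'x
assert np.isclose(xbar, x.mean())
```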
1.2 Matrices
1. Definition:
A matrix is a rectangular array of elements. The order of a matrix is given by the
number of rows and the number of columns: the first number is the number of rows, and
the second number is the number of columns. A matrix A of order (or dimension) m × n
can be expressed as
$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}$$
Example:
$$B = \begin{bmatrix} 2 & 1 & 7 & 4 \\ 1 & 2 & -2 & 1 \\ 4 & 1 & 2 & -3 \end{bmatrix}$$
The matrix B is of order 3 × 4.
2. Definition: Let A be an m × n matrix. A is said to be a square matrix if m = n.
5. A diagonal matrix is a square matrix whose off-diagonal elements are all zero:
$$A = \begin{bmatrix} a_{11} & 0 & \cdots & 0 \\ 0 & a_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{bmatrix}$$
We often denote a diagonal matrix A as $A = \mathrm{diag}(a_{11}, a_{22}, \ldots, a_{nn})$.
For an m × n matrix A, the identity matrix satisfies $I_m A = A I_n = A$.
8. The identity matrix can be entered or suppressed at will in matrix multiplication. For
example,
$$y - Py = Iy - Py = (I - P)y = My, \quad \text{where } M = I - P$$
9. The transpose of a matrix A = [a_{ij}], denoted A′, is obtained by creating the matrix
whose kth row is the kth column of the original matrix, i.e. A′ = [a_{ji}].
Example:
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}, \qquad A' = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}$$
14. The transpose of an identity matrix I is the identity matrix itself, i.e. I′ = I.
15. Definition: A matrix with all elements zero is said to be a null matrix and is denoted O.
16. Two matrices A and B are said to be equal if A − B = O.
17. Definition: Matrix multiplication
If the matrix A is m × n and B is n × k, then AB is defined. Write A in terms of its
rows and B in terms of its columns:
$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} = \begin{bmatrix} a_1' \\ a_2' \\ \vdots \\ a_m' \end{bmatrix}, \qquad B = \begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1k} \\ b_{21} & b_{22} & \cdots & b_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ b_{n1} & b_{n2} & \cdots & b_{nk} \end{bmatrix} = [b_1 \ \ b_2 \ \ \cdots \ \ b_k]$$
Then C = AB is an m × k matrix defined as
$$C = AB = \begin{bmatrix} a_1' \\ a_2' \\ \vdots \\ a_m' \end{bmatrix} [b_1 \ \ b_2 \ \ \cdots \ \ b_k] = \begin{bmatrix} a_1'b_1 & a_1'b_2 & \cdots & a_1'b_k \\ a_2'b_1 & a_2'b_2 & \cdots & a_2'b_k \\ \vdots & \vdots & \ddots & \vdots \\ a_m'b_1 & a_m'b_2 & \cdots & a_m'b_k \end{bmatrix}$$
Example 1:
$$A = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix}, \qquad B = \begin{bmatrix} 2 & 1 & 3 \\ 0 & 1 & 2 \end{bmatrix}$$
$$\Rightarrow AB = \begin{bmatrix} 1(2)+3(0) & 1(1)+3(1) & 1(3)+3(2) \\ 2(2)+4(0) & 2(1)+4(1) & 2(3)+4(2) \end{bmatrix} = \begin{bmatrix} 2 & 4 & 9 \\ 4 & 6 & 14 \end{bmatrix}$$
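As a numerical cross-check of Example 1 (NumPy here is illustrative, not part of the notes), each entry of C = AB is the inner product of a row of A with a column of B:

```python
import numpy as np

# C = AB, with entry (i, j) equal to a_i' b_j.
A = np.array([[1, 3],
              [2, 4]])
B = np.array([[2, 1, 3],
              [0, 1, 2]])
C = A @ B
assert C.tolist() == [[2, 4, 9], [4, 6, 14]]
# Entry (0, 2): inner product of row 0 of A and column 2 of B.
assert C[0, 2] == A[0, :] @ B[:, 2]
```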
Example 2:
Let X be an n × K matrix defined as
$$X_{(n\times K)} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1K} \\ x_{21} & x_{22} & \cdots & x_{2K} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nK} \end{bmatrix} = \begin{bmatrix} x_1' \\ x_2' \\ \vdots \\ x_n' \end{bmatrix}$$
where
$$x_i = \begin{bmatrix} x_{i1} \\ x_{i2} \\ \vdots \\ x_{iK} \end{bmatrix}, \quad i = 1, 2, \ldots, n$$
The transpose of a product satisfies $(AB)' = B'A'$. That is, the transpose of a product
is the product of the transposes in reverse order.
Example:
$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, \qquad B = \begin{bmatrix} 2 & 0 \\ 1 & 1 \end{bmatrix}$$
$$\Rightarrow AB = \begin{bmatrix} 4 & 2 \\ 10 & 4 \end{bmatrix} \;\Rightarrow\; (AB)' = \begin{bmatrix} 4 & 10 \\ 2 & 4 \end{bmatrix}$$
$$B'A' = \begin{bmatrix} 2 & 1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix} = \begin{bmatrix} 4 & 10 \\ 2 & 4 \end{bmatrix}$$
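A quick check of the reversal rule (AB)′ = B′A′ on the same matrices (NumPy used for illustration):

```python
import numpy as np

# Verify (AB)' = B'A' on the matrices from the example.
A = np.array([[1, 2], [3, 4]])
B = np.array([[2, 0], [1, 1]])
lhs = (A @ B).T
rhs = B.T @ A.T
assert (lhs == rhs).all()
assert lhs.tolist() == [[4, 10], [2, 4]]
```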
A square matrix A is idempotent if A = AA.
That is, multiplying A by itself, however many times, simply reproduces the original matrix.
Example:
$$A = \begin{bmatrix} 4 & -2 \\ 6 & -3 \end{bmatrix}$$
$$A^2 = \begin{bmatrix} 4 & -2 \\ 6 & -3 \end{bmatrix} \begin{bmatrix} 4 & -2 \\ 6 & -3 \end{bmatrix} = \begin{bmatrix} 4 & -2 \\ 6 & -3 \end{bmatrix} = A$$
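The idempotency of this A can be confirmed numerically (illustrative NumPy check):

```python
import numpy as np

# A is idempotent: AA = A, so any positive power of A equals A.
A = np.array([[4, -2], [6, -3]])
assert (A @ A == A).all()
assert (np.linalg.matrix_power(A, 5) == A).all()
```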
21. Example 1:
(3) tr(A + B) = tr(A) + tr(B)
(4) tr(In ) = n
(5) tr(AB) = tr(BA)
(6) tr(ABC) = tr(CAB) = tr(BCA)
(7) $a'a = \mathrm{tr}(a'a) = \mathrm{tr}(aa')$, where a is an n × 1 column vector.
(8) $\mathrm{tr}(A'A) = \mathrm{tr}(AA') = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}^2$
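The trace properties above can be spot-checked on small matrices; this NumPy sketch (random illustrative matrices, not from the notes) verifies (3), (5), (6), and (8):

```python
import numpy as np

# Spot-check trace properties on random 3x3 matrices.
rng = np.random.default_rng(0)
A, B, C = rng.normal(size=(3, 3, 3))     # three 3x3 matrices
assert np.isclose(np.trace(A + B), np.trace(A) + np.trace(B))   # (3)
assert np.isclose(np.trace(A @ B), np.trace(B @ A))             # (5)
assert np.isclose(np.trace(A @ B @ C), np.trace(C @ A @ B))     # (6)
assert np.isclose(np.trace(A.T @ A), (A ** 2).sum())            # (8)
```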
where $C_{ij}$ is called the cofactor of the element $a_{ij}$ and is given by $C_{ij} = (-1)^{i+j} |A_{ij}|$;
$A_{ij}$ is the submatrix obtained from A by deleting row i and column j.
4. If A is an n × n matrix and c is a nonzero constant, then $|cA| = c^n |A|$. e.g.
$$A = \begin{bmatrix} 1 & 2 \\ 3 & 10 \end{bmatrix} \quad\text{and}\quad B = 3A = \begin{bmatrix} 3 & 6 \\ 9 & 30 \end{bmatrix}$$
$$\Rightarrow |A| = 4, \quad |B| = 36 = 3^2 \cdot 4$$
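Checking $|cA| = c^n|A|$ on this example (NumPy, illustrative):

```python
import numpy as np

# |cA| = c^n |A| with n = 2, c = 3.
A = np.array([[1.0, 2.0], [3.0, 10.0]])
B = 3 * A
assert np.isclose(np.linalg.det(A), 4.0)
assert np.isclose(np.linalg.det(B), 3**2 * np.linalg.det(A))    # 36
```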
5. If any row (or column) of a matrix is a multiple of any other row (or column), then its
determinant is 0.
6. |A′| = |A|.
7. If A and B are square matrices of the same order, then |AB| = |A| · |B|.
1.5 Inverse Matrix
1. An n × n matrix A has an inverse, denoted $A^{-1}$, provided that $AA^{-1} = A^{-1}A = I_n$.
3. The inverse of an identity matrix is the identity matrix itself, i.e. I−1 = I.
4. The inverse of the inverse is the original matrix itself, i.e. (A−1 )−1 = A.
5. The inverse of the transpose is the transpose of the inverse, i.e. (A′ )−1 = (A−1 )′ .
8. Example:
$$A = \begin{bmatrix} 1 & 3 & 4 \\ 1 & 2 & 1 \\ 2 & 4 & 5 \end{bmatrix}$$
$$C_{11} = 6,\; C_{12} = -3,\; C_{13} = 0,\; C_{21} = 1,\; C_{22} = -3,\; C_{23} = 2,\; C_{31} = -5,\; C_{32} = 3,\; C_{33} = -1$$
$$|A| = 1(6) + 3(-3) + 4(0) = -3$$
$$A^{-1} = \frac{1}{-3} \begin{bmatrix} 6 & -3 & 0 \\ 1 & -3 & 2 \\ -5 & 3 & -1 \end{bmatrix}' = \frac{1}{-3} \begin{bmatrix} 6 & 1 & -5 \\ -3 & -3 & 3 \\ 0 & 2 & -1 \end{bmatrix} = \begin{bmatrix} -2 & -\frac{1}{3} & \frac{5}{3} \\ 1 & 1 & -1 \\ 0 & -\frac{2}{3} & \frac{1}{3} \end{bmatrix}$$
Verify:
$$AA^{-1} = \begin{bmatrix} 1 & 3 & 4 \\ 1 & 2 & 1 \\ 2 & 4 & 5 \end{bmatrix} \begin{bmatrix} -2 & -\frac{1}{3} & \frac{5}{3} \\ 1 & 1 & -1 \\ 0 & -\frac{2}{3} & \frac{1}{3} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = I_3$$
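The cofactor computation of A⁻¹ can be confirmed against a library inverse (NumPy, illustrative):

```python
import numpy as np

# The inverse obtained from cofactors, checked against numpy.
A = np.array([[1.0, 3.0, 4.0],
              [1.0, 2.0, 1.0],
              [2.0, 4.0, 5.0]])
A_inv = np.array([[-2.0, -1/3,  5/3],
                  [ 1.0,  1.0, -1.0],
                  [ 0.0, -2/3,  1/3]])
assert np.allclose(A_inv, np.linalg.inv(A))
assert np.allclose(A @ A_inv, np.eye(3))
```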
A linear combination of the vectors $a_1, a_2, \ldots, a_n$ with scalar weights $c_1, c_2, \ldots, c_n$ is
$$c_1 a_1 + c_2 a_2 + \ldots + c_n a_n$$
1 2 3
2. Example: Suppose a1 = , a2 = , a3 = , and c1 = −2, c2 = 1, c3 = 2, then
2 1 4
the linear combination is
1 2 3 6
c1 a1 + c2 a2 + c3 a3 = −2 +1 +2 =
2 1 4 5
In this example, there exist nonzero solutions $c_1 = -\frac{13}{3}$, $c_2 = 3$, and $c_3 = 1$ such that
$$-\frac{13}{3}\begin{bmatrix} 3 \\ 0 \\ 3 \end{bmatrix} + 3\begin{bmatrix} 2 \\ 1 \\ 4 \end{bmatrix} + \begin{bmatrix} 7 \\ -3 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$
Hence rank(A) = 2.
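The linear dependence and the resulting rank can be verified numerically (NumPy, illustrative):

```python
import numpy as np

# Columns are linearly dependent (Ac = 0 for nonzero c), so rank is 2.
A = np.array([[3.0, 2.0, 7.0],
              [0.0, 1.0, -3.0],
              [3.0, 4.0, 1.0]])
c = np.array([-13/3, 3.0, 1.0])
assert np.allclose(A @ c, 0.0)
assert np.linalg.matrix_rank(A) == 2
```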
17. Let X be an (n × K) matrix and A be an (n × n) nonsingular matrix; then rank(AX) = rank(X).
2. If
$$A = \begin{bmatrix} A_{11} & 0 \\ 0 & A_{22} \end{bmatrix}$$
then the inverse of A is
$$A^{-1} = \begin{bmatrix} A_{11}^{-1} & 0 \\ 0 & A_{22}^{-1} \end{bmatrix}$$
3. If
$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}$$
where $A_{11}$ and $A_{22}$ are square nonsingular matrices, then
$$A^{-1} = \begin{bmatrix} B_{11} & -B_{11} A_{12} A_{22}^{-1} \\ -A_{22}^{-1} A_{21} B_{11} & A_{22}^{-1} + A_{22}^{-1} A_{21} B_{11} A_{12} A_{22}^{-1} \end{bmatrix}$$
where $B_{11} = (A_{11} - A_{12} A_{22}^{-1} A_{21})^{-1}$.
A quadratic form in x is $q = x'Ax$, where A is a symmetric matrix. For example,
$$q = x'Ax = [x_1 \ \ x_2] \begin{bmatrix} 1 & 2 \\ 2 & 6 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = x_1^2 + 4x_1x_2 + 6x_2^2$$
3. If A is an (n × n) symmetric matrix and rank(A)=n, then the following are all equivalent:
4. Example:
Show that $A = \begin{bmatrix} 1 & 2 \\ 2 & 6 \end{bmatrix}$ is p.d.
(1) Method 1:
$$q = x'Ax = [x_1 \ \ x_2] \begin{bmatrix} 1 & 2 \\ 2 & 6 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = x_1^2 + 4x_1x_2 + 6x_2^2 = (x_1 + 2x_2)^2 + 2x_2^2 > 0 \;\text{ for } x \neq 0$$
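Positive definiteness can also be checked through the eigenvalues of A, and the completed square can be spot-checked at a point (NumPy; the evaluation point is illustrative):

```python
import numpy as np

# A symmetric matrix is p.d. iff all its eigenvalues are positive.
A = np.array([[1.0, 2.0], [2.0, 6.0]])
assert (np.linalg.eigvalsh(A) > 0).all()

# Spot-check q = x'Ax = (x1 + 2 x2)^2 + 2 x2^2 at one point.
x1, x2 = 3.0, -1.0
x = np.array([x1, x2])
assert np.isclose(x @ A @ x, (x1 + 2 * x2) ** 2 + 2 * x2 ** 2)
```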
(1) A positive definite matrix has diagonal elements that are strictly positive, while a p.s.d.
matrix has nonnegative diagonal elements.
(2) If A is p.d., then $A^{-1}$ exists and is p.d.
(3) If X is n × K, then $X'X$ is p.s.d.
Proof:
Let c be a K × 1 nonzero vector. Then
$$q = c'X'Xc = y'y = \sum_i y_i^2 \geq 0, \quad \text{where } y_{n\times 1} = Xc$$
7. Example:
Show that the OLS estimator $b = (X'X)^{-1}X'y$ has minimum variance in the class of linear unbiased
estimators.
Proof:
Let $b_0 = Cy$ be another linear unbiased estimator of β, where C is a K × n matrix such
that CX = I.
Let $D = C - (X'X)^{-1}X'$
$$\Rightarrow DX = CX - I_K = O_{K\times K}$$
$$b_0 = Cy = C(X\beta + \varepsilon) = \beta + C\varepsilon \;\Rightarrow\; b_0 - \beta = C\varepsilon$$
Hence $\mathrm{Var}(b_0) = \sigma^2 CC'$, and since $CC' = (D + (X'X)^{-1}X')(D + (X'X)^{-1}X')' = DD' + (X'X)^{-1}$ (using DX = O),
$$\mathrm{Var}(b_0) - \mathrm{Var}(b) = \sigma^2 DD'$$
which is p.s.d.: for any z,
$$q = z'DD'z = h'h = \sum_i h_i^2 \geq 0, \quad \text{where } h = D'z$$
The Hessian matrix (i.e. the matrix of second derivatives) of y is defined as
$$H = \frac{\partial^2 y}{\partial x \partial x'} = \frac{\partial(\partial y/\partial x)}{\partial x'} = \begin{bmatrix} f_{11} & f_{12} & \cdots & f_{1n} \\ f_{21} & f_{22} & \cdots & f_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ f_{n1} & f_{n2} & \cdots & f_{nn} \end{bmatrix}, \quad \text{where } f_{ij} = \frac{\partial^2 y}{\partial x_i \partial x_j}$$
If $y = a'x = a_1x_1 + a_2x_2 + \ldots + a_nx_n$, then
$$\frac{\partial y}{\partial x} = \frac{\partial(a'x)}{\partial x} = \frac{\partial(x'a)}{\partial x} = \begin{bmatrix} \partial y/\partial x_1 \\ \partial y/\partial x_2 \\ \vdots \\ \partial y/\partial x_n \end{bmatrix} = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix} = a$$
Example:
Residual sum of squares:
$$e'e = y'y - 2b'X'y + b'X'Xb$$
$$\frac{\partial(-2b'X'y)}{\partial b} = -2X'y$$
3. Theorem:
If A is a symmetric matrix, then
$$\frac{\partial x'Ax}{\partial x} = 2Ax.$$
Example 1: $A = \begin{bmatrix} 1 & 3 \\ 3 & 4 \end{bmatrix}$. Then
$$x'Ax = [x_1 \ \ x_2] \begin{bmatrix} 1 & 3 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = x_1^2 + 4x_2^2 + 6x_1x_2$$
$$\frac{\partial x'Ax}{\partial x} = \begin{bmatrix} 2x_1 + 6x_2 \\ 6x_1 + 8x_2 \end{bmatrix} = 2\begin{bmatrix} 1 & 3 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = 2Ax$$
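The gradient formula ∂(x′Ax)/∂x = 2Ax can be checked against a finite-difference approximation (NumPy sketch; the evaluation point is illustrative):

```python
import numpy as np

# Compare the analytic gradient 2Ax with central finite differences.
A = np.array([[1.0, 3.0], [3.0, 4.0]])
x = np.array([0.5, -2.0])
grad = 2 * A @ x

eps = 1e-6
q = lambda v: v @ A @ v
num = np.array([(q(x + eps * e) - q(x - eps * e)) / (2 * eps)
                for e in np.eye(2)])
assert np.allclose(grad, num, atol=1e-5)
```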
Example 2:
$$\frac{\partial(b'X'Xb)}{\partial b} = 2X'Xb, \quad \text{since } X'X \text{ is symmetric}$$
4. If A is not symmetric, then
$$\frac{\partial x'Ax}{\partial x} = (A + A')x.$$
e.g. $A = \begin{bmatrix} 1 & 3 \\ 0 & 4 \end{bmatrix}$. Then
$$y = x'Ax = [x_1 \ \ x_2] \begin{bmatrix} 1 & 3 \\ 0 & 4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = x_1^2 + 4x_2^2 + 3x_1x_2$$
$$\frac{\partial x'Ax}{\partial x} = \begin{bmatrix} \partial y/\partial x_1 \\ \partial y/\partial x_2 \end{bmatrix} = \begin{bmatrix} 2x_1 + 3x_2 \\ 3x_1 + 8x_2 \end{bmatrix} = \begin{bmatrix} 2 & 3 \\ 3 & 8 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \left( \begin{bmatrix} 1 & 3 \\ 0 & 4 \end{bmatrix} + \begin{bmatrix} 1 & 0 \\ 3 & 4 \end{bmatrix} \right) \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = (A + A')x$$
5. If $y_{(m\times 1)} = A_{(m\times n)}\, x_{(n\times 1)}$, then
$$\frac{\partial y}{\partial x} = A'_{(n\times m)}.$$
e.g.
$$y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} 5 & 3 & 2 \\ 2 & 1 & 3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 5x_1 + 3x_2 + 2x_3 \\ 2x_1 + x_2 + 3x_3 \end{bmatrix} = Ax$$
$$\frac{\partial y}{\partial x} = \frac{\partial(Ax)}{\partial x} = \begin{bmatrix} \partial y_1/\partial x_1 & \partial y_2/\partial x_1 \\ \partial y_1/\partial x_2 & \partial y_2/\partial x_2 \\ \partial y_1/\partial x_3 & \partial y_2/\partial x_3 \end{bmatrix} = \begin{bmatrix} 5 & 2 \\ 3 & 1 \\ 2 & 3 \end{bmatrix} = A'$$
If c is an eigenvector of A with eigenvalue λ, then
$$Ac = \lambda c = \lambda I c \;\Rightarrow\; Ac - \lambda Ic = 0 \;\Rightarrow\; (A - \lambda I)c = 0$$
If the inverse of $(A - \lambda I)$ existed, then
$$(A - \lambda I)^{-1}(A - \lambda I)c = 0$$
This would imply that $c = 0$, contradicting the condition that $c \neq 0$. It follows that the matrix
$(A - \lambda I)$ must be singular.
Consider
$$A_{(k\times k)}\, c_{(k\times 1)} = \lambda\, c_{(k\times 1)}$$
or
$$(A - \lambda I)c = 0$$
where λ is an unknown scalar and c is an unknown k × 1 vector. Then λ solves the
characteristic equation
$$|A - \lambda I| = 0$$
Collecting the k eigenvectors as the columns of C, the system can be written as
$$A_{k\times k}\, C_{k\times k} = C_{k\times k}\, \Lambda_{k\times k}$$
where Λ is the diagonal matrix of eigenvalues. If C is nonsingular,
$$\Lambda = C^{-1} A C$$
4. Example: Suppose
$$A = \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix}$$
$$A - \lambda I = \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix} - \lambda \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 3-\lambda & 1 \\ 1 & 3-\lambda \end{bmatrix}$$
$$|A - \lambda I| = 0 \;\Rightarrow\; \begin{vmatrix} 3-\lambda & 1 \\ 1 & 3-\lambda \end{vmatrix} = 0 \;\Rightarrow\; \lambda^2 - 6\lambda + 8 = (\lambda - 4)(\lambda - 2) = 0$$
The eigenvalues of A are $\lambda_1 = 4$ and $\lambda_2 = 2$.
• Find eigenvectors:
(1) $\lambda_1 = 4$:
$$Ac_1 = 4c_1, \quad\text{i.e.}\quad \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} c_{11} \\ c_{21} \end{bmatrix} = 4 \begin{bmatrix} c_{11} \\ c_{21} \end{bmatrix}$$
$$\Rightarrow \begin{bmatrix} 3c_{11} + c_{21} \\ c_{11} + 3c_{21} \end{bmatrix} = \begin{bmatrix} 4c_{11} \\ 4c_{21} \end{bmatrix} \;\Rightarrow\; \begin{bmatrix} -c_{11} + c_{21} \\ c_{11} - c_{21} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \;\Rightarrow\; c_{11} = c_{21}$$
Let $c_{11} = 1$; then $c_{21} = 1$. Therefore, an eigenvector for $\lambda_1 = 4$ is $c_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$.
(2) $\lambda_2 = 2$:
$$Ac_2 = 2c_2, \quad\text{i.e.}\quad \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} c_{12} \\ c_{22} \end{bmatrix} = 2 \begin{bmatrix} c_{12} \\ c_{22} \end{bmatrix}$$
$$\Rightarrow \begin{bmatrix} 3c_{12} + c_{22} \\ c_{12} + 3c_{22} \end{bmatrix} = \begin{bmatrix} 2c_{12} \\ 2c_{22} \end{bmatrix} \;\Rightarrow\; \begin{bmatrix} c_{12} + c_{22} \\ c_{12} + c_{22} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \;\Rightarrow\; c_{12} = -c_{22}$$
Let $c_{12} = 1$; then $c_{22} = -1$. Therefore, an eigenvector for $\lambda_2 = 2$ is $c_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$.
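The eigenvalues and eigenvectors of this example can be confirmed with a library routine (NumPy's `eigh` for symmetric matrices; illustrative, not part of the notes):

```python
import numpy as np

# Eigenvalues/eigenvectors of A = [[3, 1], [1, 3]].
A = np.array([[3.0, 1.0], [1.0, 3.0]])
lam, C = np.linalg.eigh(A)             # ascending order for symmetric A
assert np.allclose(lam, [2.0, 4.0])
# Each column c of C satisfies Ac = lambda * c.
for l, c in zip(lam, C.T):
    assert np.allclose(A @ c, l * c)
```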
• Since the system of equations is homogeneous, it yields an infinite number of eigenvectors
corresponding to each root $\lambda_i$: any nonzero scalar multiple of $c_i$ is also an eigenvector.
• Check $\Lambda = C^{-1}AC$:
$$C = [c_1 \ \ c_2] = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \;\Rightarrow\; C^{-1} = \frac{-1}{2}\begin{bmatrix} -1 & -1 \\ -1 & 1 \end{bmatrix}$$
$$\Rightarrow \Lambda = C^{-1}AC = \frac{-1}{2}\begin{bmatrix} -1 & -1 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} = \begin{bmatrix} 4 & 0 \\ 0 & 2 \end{bmatrix}$$
5. The eigenvalues of a symmetric matrix are all real.
6. If all k eigenvalues are distinct, C has k linearly independent columns, C is nonsingular, and
$$\Lambda = C^{-1} A C$$
9. Let Q denote the matrix whose columns are normalized orthogonal eigenvectors. Then
$$Q'Q = I$$
The matrix Q is called an orthogonal matrix since its inverse is its transpose, i.e.
$$Q'Q = QQ' = I$$
$$Q'Q = \begin{bmatrix} c_1' \\ c_2' \\ \vdots \\ c_k' \end{bmatrix} [c_1 \ \ c_2 \ \ \cdots \ \ c_k] = \begin{bmatrix} c_1'c_1 & c_1'c_2 & \cdots & c_1'c_k \\ c_2'c_1 & c_2'c_2 & \cdots & c_2'c_k \\ \vdots & \vdots & \ddots & \vdots \\ c_k'c_1 & c_k'c_2 & \cdots & c_k'c_k \end{bmatrix} = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} = I$$
In this case,
$$Q'AQ = \Lambda \;\Leftrightarrow\; A = Q\Lambda Q'$$
• $\lambda_1 = 4$:
$$\begin{bmatrix} 3-4 & 1 \\ 1 & 3-4 \end{bmatrix} \begin{bmatrix} c_{11} \\ c_{21} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$
$$\Rightarrow \begin{cases} c_{11} = c_{21} \\ c_{11}^2 + c_{21}^2 = 1 \ \text{(normalization)} \end{cases} \;\Rightarrow\; c_1 = \begin{bmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix} \text{ or } \begin{bmatrix} \frac{-1}{\sqrt{2}} \\ \frac{-1}{\sqrt{2}} \end{bmatrix}$$
• $\lambda_2 = 2$:
$$\begin{bmatrix} 3-2 & 1 \\ 1 & 3-2 \end{bmatrix} \begin{bmatrix} c_{12} \\ c_{22} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$
$$\Rightarrow \begin{cases} c_{12} = -c_{22} \\ c_{12}^2 + c_{22}^2 = 1 \ \text{(normalization)} \end{cases} \;\Rightarrow\; c_2 = \begin{bmatrix} \frac{1}{\sqrt{2}} \\ \frac{-1}{\sqrt{2}} \end{bmatrix} \text{ or } \begin{bmatrix} \frac{-1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix}$$
• Check $Q'Q = QQ' = I$:
$$Q'Q = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{-1}{\sqrt{2}} \end{bmatrix}' \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{-1}{\sqrt{2}} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$
• Check $Q'AQ = \Lambda$:
$$Q'AQ = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{-1}{\sqrt{2}} \end{bmatrix}' \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{-1}{\sqrt{2}} \end{bmatrix} = \begin{bmatrix} 4 & 0 \\ 0 & 2 \end{bmatrix} = \Lambda$$
14. The rank of A is equal to the number of nonzero eigenvalues.
Proof:
Let $k_1$ be the number of nonzero eigenvalues of the symmetric matrix A. Then
$$\begin{aligned} \mathrm{rank}(A) &= \mathrm{rank}(Q\Lambda Q') \\ &= \mathrm{rank}(\Lambda Q'), \quad \text{since rank(BX) = rank(X) if B is nonsingular} \\ &= \mathrm{rank}(Q\Lambda), \quad \text{since rank(X) = rank(X')} \\ &= \mathrm{rank}(\Lambda), \quad \text{since rank(BX) = rank(X) if B is nonsingular} \\ &= k_1 \end{aligned}$$
(1) Check $PP' = A$, where $P = Q\Lambda^{1/2}$:
$$P = Q\Lambda^{1/2} = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{-1}{\sqrt{2}} \end{bmatrix} \begin{bmatrix} 2 & 0 \\ 0 & \sqrt{2} \end{bmatrix} = \begin{bmatrix} \frac{2}{\sqrt{2}} & 1 \\ \frac{2}{\sqrt{2}} & -1 \end{bmatrix}$$
$$PP' = \begin{bmatrix} \frac{2}{\sqrt{2}} & 1 \\ \frac{2}{\sqrt{2}} & -1 \end{bmatrix} \begin{bmatrix} \frac{2}{\sqrt{2}} & 1 \\ \frac{2}{\sqrt{2}} & -1 \end{bmatrix}' = \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix} = A$$
(2) Check $T'T = A$, where $T = \Lambda^{1/2}Q'$:
$$T = \Lambda^{1/2}Q' = \begin{bmatrix} 2 & 0 \\ 0 & \sqrt{2} \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{-1}{\sqrt{2}} \end{bmatrix}' = \begin{bmatrix} \frac{2}{\sqrt{2}} & \frac{2}{\sqrt{2}} \\ 1 & -1 \end{bmatrix}$$
$$T'T = \begin{bmatrix} \frac{2}{\sqrt{2}} & \frac{2}{\sqrt{2}} \\ 1 & -1 \end{bmatrix}' \begin{bmatrix} \frac{2}{\sqrt{2}} & \frac{2}{\sqrt{2}} \\ 1 & -1 \end{bmatrix} = \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix} = A$$
(3) Check $TA^{-1}T' = I$:
$$A = \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix} \;\Rightarrow\; A^{-1} = \frac{1}{8}\begin{bmatrix} 3 & -1 \\ -1 & 3 \end{bmatrix}$$
$$TA^{-1}T' = \frac{1}{8}\begin{bmatrix} \frac{2}{\sqrt{2}} & \frac{2}{\sqrt{2}} \\ 1 & -1 \end{bmatrix} \begin{bmatrix} 3 & -1 \\ -1 & 3 \end{bmatrix} \begin{bmatrix} \frac{2}{\sqrt{2}} & 1 \\ \frac{2}{\sqrt{2}} & -1 \end{bmatrix} = \frac{1}{8}\begin{bmatrix} \frac{4}{\sqrt{2}} & \frac{4}{\sqrt{2}} \\ 4 & -4 \end{bmatrix} \begin{bmatrix} \frac{2}{\sqrt{2}} & 1 \\ \frac{2}{\sqrt{2}} & -1 \end{bmatrix} = \frac{1}{8}\begin{bmatrix} 8 & 0 \\ 0 & 8 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I_2$$
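The factorizations checked above all follow from the spectral decomposition A = QΛQ′; a NumPy sketch of the same checks (illustrative, not part of the notes):

```python
import numpy as np

# A = Q Lambda Q', so T = Lambda^{1/2} Q' satisfies T'T = A
# and P = Q Lambda^{1/2} satisfies PP' = A.
A = np.array([[3.0, 1.0], [1.0, 3.0]])
lam, Q = np.linalg.eigh(A)             # lam = [2, 4]
T = np.diag(np.sqrt(lam)) @ Q.T
P = Q @ np.diag(np.sqrt(lam))
assert np.allclose(T.T @ T, A)
assert np.allclose(P @ P.T, A)
assert np.allclose(T @ np.linalg.inv(A) @ T.T, np.eye(2))
```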
Chapter 2
Multivariate Distributions
Let x denote a k × 1 random vector:
$$x_{k\times 1} = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_k \end{bmatrix}$$
The variance-covariance (or covariance) matrix is denoted by
$$\mathrm{Var}(x) = E[(x - \mu)(x - \mu)']$$
$$= E\left( \begin{bmatrix} X_1 - \mu_1 \\ X_2 - \mu_2 \\ \vdots \\ X_k - \mu_k \end{bmatrix} [(X_1 - \mu_1) \ \ (X_2 - \mu_2) \ \ \cdots \ \ (X_k - \mu_k)] \right)$$
$$= \begin{bmatrix} E(X_1-\mu_1)^2 & E[(X_1-\mu_1)(X_2-\mu_2)] & \cdots & E[(X_1-\mu_1)(X_k-\mu_k)] \\ E[(X_2-\mu_2)(X_1-\mu_1)] & E(X_2-\mu_2)^2 & \cdots & E[(X_2-\mu_2)(X_k-\mu_k)] \\ \vdots & \vdots & \ddots & \vdots \\ E[(X_k-\mu_k)(X_1-\mu_1)] & E[(X_k-\mu_k)(X_2-\mu_2)] & \cdots & E(X_k-\mu_k)^2 \end{bmatrix}$$
$$= \begin{bmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1k} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{k1} & \sigma_{k2} & \cdots & \sigma_{kk} \end{bmatrix} = \Sigma$$
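The definition Var(x) = E[(x − µ)(x − µ)′] = Σ can be illustrated by simulation: the sample covariance of many draws approaches Σ (NumPy sketch; this Σ is an illustrative choice, not from the notes):

```python
import numpy as np

# Sample covariance of simulated draws converges to Sigma.
rng = np.random.default_rng(42)
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
X = rng.multivariate_normal(mean=[0, 0], cov=Sigma, size=200_000)
S = np.cov(X, rowvar=False)            # rows of X are observations
assert np.allclose(S, Sigma, atol=0.05)
assert np.allclose(S, S.T)             # covariance matrices are symmetric
```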
An abbreviated notation is
$$x \sim N(\mu, \Sigma)$$
If $x \sim N(\mu, \Sigma)$, then for any conformable fixed vector c,
$$c'x \sim N(c'\mu, c'\Sigma c),$$
since $E(c'x) = c'E(x) = c'\mu$ and $\mathrm{Var}(c'x) = c'\Sigma c$.
If $x \sim N(0, I_k)$, then
$$x'x \sim \chi^2(k)$$
More generally, if $x \sim N(0, \Sigma)$, then
$$x'\Sigma^{-1}x \sim \chi^2(k)$$
Let $y = Q'x$, where Q is the orthogonal matrix of eigenvectors of A. Then
$$\mathrm{Var}(y) = E(yy') = E(Q'xx'Q) = Q'(\sigma^2 I)Q = \sigma^2 I \;\Rightarrow\; y \sim N(0, \sigma^2 I)$$
Since $x = Qy$ and A is symmetric and idempotent with rank r, $\Lambda = Q'AQ$ has r diagonal
elements equal to one and the rest zero, so
$$x'Ax = y'Q'AQy = y'\Lambda y = y_1^2 + y_2^2 + \ldots + y_r^2$$
$$\because \frac{y_i}{\sigma} \sim N(0, 1) \qquad \therefore \frac{x'Ax}{\sigma^2} \sim \chi^2(r)$$
7. Suppose $x \sim N(0, \sigma^2 I)$ and consider two quadratic forms $x'Ax$ and $x'Bx$, where A and B
are symmetric and idempotent matrices. Then $x'Ax$ and $x'Bx$ are statistically independent
if and only if
$$AB = O.$$