Chapter 1
Matrix Algebra
1.1 Vector
1. A vector is an ordered sequence of elements arranged in a row or column. Unless otherwise
noted, a vector will always be assumed to be a column vector. For example, a is a
3-element column vector and x is an n-element column vector:
$$a = \begin{bmatrix} 5 \\ 1 \\ 3 \end{bmatrix}, \qquad x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$
Column vectors can be transformed into row vectors by the operation of transposition.
We denote the transposition operation by a prime. Thus, the row vectors are
$$a' = [5 \ \ 1 \ \ 3], \qquad x' = [x_1 \ \ x_2 \ \ \cdots \ \ x_n]$$
4. Definition: Let i be a vector that contains a column of ones:
$$i_{n\times 1} = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}$$
6. Sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{n}\, i'x$
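The identity $\bar{x} = \frac{1}{n} i'x$ is easy to verify numerically. A minimal NumPy sketch (the data values are illustrative, not from the notes):

```python
import numpy as np

# Sample mean two ways: directly, and as (1/n) i'x with the
# summation vector i of ones (illustrative data).
x = np.array([2.0, 4.0, 6.0, 8.0])
n = len(x)
i = np.ones(n)                  # i: n-vector of ones
xbar = (1.0 / n) * (i @ x)      # (1/n) i'x
assert np.isclose(xbar, x.mean())
```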
1.2 Matrices
1. Definition:
A matrix is a rectangular array of elements. The order of a matrix is given by the
number of rows and the number of columns: the first number is the number of rows, and
the second number is the number of columns. A matrix A of order (or dimension) m × n
can be expressed as
$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}$$
Example:
$$B = \begin{bmatrix} 2 & 1 & 7 & 4 \\ 1 & 2 & -2 & 1 \\ 4 & 1 & 2 & -3 \end{bmatrix}$$
The matrix B is of order 3 × 4.
2. Definition: Let A be an m × n matrix. A is said to be a square matrix if m = n.
5. A diagonal matrix is a square matrix whose off-diagonal elements are all zero:
$$A = \begin{bmatrix} a_{11} & 0 & \cdots & 0 \\ 0 & a_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{bmatrix}$$
We often denote a diagonal matrix A as $A = \mathrm{diag}(a_{11}, a_{22}, \ldots, a_{nn})$.
For an m × n matrix A, the identity matrix satisfies $I_m A = A I_n = A$.
8. The identity matrix can be entered or suppressed at will in matrix multiplication. For
example,
$$y - Py = Iy - Py = (I - P)y = My, \quad \text{where } M = I - P$$
9. The transpose of a matrix A = [a_{ij}], denoted A′, is obtained by creating the matrix
whose kth row is the kth column of the original matrix, i.e. A′ = [a_{ji}].
Example:
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}, \qquad A' = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}$$
14. The transpose of an identity matrix I is the identity matrix itself, i.e. I′ = I.
15. Definition: A matrix with all elements zero is said to be a null matrix and is denoted O.
16. Two matrices A and B are said to be equal if A − B = O.
17. Definition: Matrix multiplication
If the matrix A is m × n and B is n × k, then AB is defined. Write A in terms of its
rows and B in terms of its columns:
$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} = \begin{bmatrix} a_1' \\ a_2' \\ \vdots \\ a_m' \end{bmatrix}, \qquad B = \begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1k} \\ b_{21} & b_{22} & \cdots & b_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ b_{n1} & b_{n2} & \cdots & b_{nk} \end{bmatrix} = [b_1 \ \ b_2 \ \ \cdots \ \ b_k]$$
Then C = AB is an m × k matrix defined as
$$C = AB = \begin{bmatrix} a_1' \\ a_2' \\ \vdots \\ a_m' \end{bmatrix} [b_1 \ \ b_2 \ \ \cdots \ \ b_k] = \begin{bmatrix} a_1'b_1 & a_1'b_2 & \cdots & a_1'b_k \\ a_2'b_1 & a_2'b_2 & \cdots & a_2'b_k \\ \vdots & \vdots & \ddots & \vdots \\ a_m'b_1 & a_m'b_2 & \cdots & a_m'b_k \end{bmatrix}$$
Example 1:
$$A = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix}, \qquad B = \begin{bmatrix} 2 & 1 & 3 \\ 0 & 1 & 2 \end{bmatrix}$$
$$\Rightarrow AB = \begin{bmatrix} 1(2)+3(0) & 1(1)+3(1) & 1(3)+3(2) \\ 2(2)+4(0) & 2(1)+4(1) & 2(3)+4(2) \end{bmatrix} = \begin{bmatrix} 2 & 4 & 9 \\ 4 & 6 & 14 \end{bmatrix}$$
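As a numerical cross-check of Example 1 (NumPy here is illustrative, not part of the notes), each entry of C = AB is the inner product of a row of A with a column of B:

```python
import numpy as np

# C = AB, with entry (i, j) equal to a_i' b_j.
A = np.array([[1, 3],
              [2, 4]])
B = np.array([[2, 1, 3],
              [0, 1, 2]])
C = A @ B
assert C.tolist() == [[2, 4, 9], [4, 6, 14]]
# Entry (0, 2): inner product of row 0 of A and column 2 of B.
assert C[0, 2] == A[0, :] @ B[:, 2]
```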
Example 2:
Let X be an n × K matrix defined as
$$X_{(n\times K)} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1K} \\ x_{21} & x_{22} & \cdots & x_{2K} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nK} \end{bmatrix} = \begin{bmatrix} x_1' \\ x_2' \\ \vdots \\ x_n' \end{bmatrix}$$
where
$$x_i = \begin{bmatrix} x_{i1} \\ x_{i2} \\ \vdots \\ x_{iK} \end{bmatrix}, \quad i = 1, 2, \ldots, n$$
The transpose of a product satisfies $(AB)' = B'A'$. That is, the transpose of a product
is the product of the transposes in reverse order.
Example:
$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, \qquad B = \begin{bmatrix} 2 & 0 \\ 1 & 1 \end{bmatrix}$$
$$\Rightarrow AB = \begin{bmatrix} 4 & 2 \\ 10 & 4 \end{bmatrix} \;\Rightarrow\; (AB)' = \begin{bmatrix} 4 & 10 \\ 2 & 4 \end{bmatrix}$$
$$B'A' = \begin{bmatrix} 2 & 1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix} = \begin{bmatrix} 4 & 10 \\ 2 & 4 \end{bmatrix}$$
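A quick check of the reversal rule (AB)′ = B′A′ on the same matrices (NumPy used for illustration):

```python
import numpy as np

# Verify (AB)' = B'A' on the matrices from the example.
A = np.array([[1, 2], [3, 4]])
B = np.array([[2, 0], [1, 1]])
lhs = (A @ B).T
rhs = B.T @ A.T
assert (lhs == rhs).all()
assert lhs.tolist() == [[4, 10], [2, 4]]
```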
A square matrix A is idempotent if A = AA.
That is, multiplying A by itself, however many times, simply reproduces the original matrix.
Example:
$$A = \begin{bmatrix} 4 & -2 \\ 6 & -3 \end{bmatrix}$$
$$A^2 = \begin{bmatrix} 4 & -2 \\ 6 & -3 \end{bmatrix} \begin{bmatrix} 4 & -2 \\ 6 & -3 \end{bmatrix} = \begin{bmatrix} 4 & -2 \\ 6 & -3 \end{bmatrix} = A$$
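The idempotency of this A can be confirmed numerically (illustrative NumPy check):

```python
import numpy as np

# A is idempotent: AA = A, so any positive power of A equals A.
A = np.array([[4, -2], [6, -3]])
assert (A @ A == A).all()
assert (np.linalg.matrix_power(A, 5) == A).all()
```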
21. Example 1:
(3) tr(A + B) = tr(A) + tr(B)
(4) tr(In ) = n
(5) tr(AB) = tr(BA)
(6) tr(ABC) = tr(CAB) = tr(BCA)
(7) $a'a = \mathrm{tr}(a'a) = \mathrm{tr}(aa')$, where a is an n × 1 column vector.
(8) $\mathrm{tr}(A'A) = \mathrm{tr}(AA') = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}^2$
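The trace properties above can be spot-checked on small matrices; this NumPy sketch (random illustrative matrices, not from the notes) verifies (3), (5), (6), and (8):

```python
import numpy as np

# Spot-check trace properties on random 3x3 matrices.
rng = np.random.default_rng(0)
A, B, C = rng.normal(size=(3, 3, 3))     # three 3x3 matrices
assert np.isclose(np.trace(A + B), np.trace(A) + np.trace(B))   # (3)
assert np.isclose(np.trace(A @ B), np.trace(B @ A))             # (5)
assert np.isclose(np.trace(A @ B @ C), np.trace(C @ A @ B))     # (6)
assert np.isclose(np.trace(A.T @ A), (A ** 2).sum())            # (8)
```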
where $C_{ij}$ is called the cofactor of the element $a_{ij}$ and is given by $C_{ij} = (-1)^{i+j} |A_{ij}|$;
$A_{ij}$ is the submatrix obtained from A by deleting row i and column j.
4. If A is an n × n matrix and c is a nonzero constant, then $|cA| = c^n |A|$. e.g.
$$A = \begin{bmatrix} 1 & 2 \\ 3 & 10 \end{bmatrix} \quad\text{and}\quad B = 3A = \begin{bmatrix} 3 & 6 \\ 9 & 30 \end{bmatrix}$$
$$\Rightarrow |A| = 4, \quad |B| = 36 = 3^2 \cdot 4$$
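Checking $|cA| = c^n|A|$ on this example (NumPy, illustrative):

```python
import numpy as np

# |cA| = c^n |A| with n = 2, c = 3.
A = np.array([[1.0, 2.0], [3.0, 10.0]])
B = 3 * A
assert np.isclose(np.linalg.det(A), 4.0)
assert np.isclose(np.linalg.det(B), 3**2 * np.linalg.det(A))    # 36
```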
5. If any row (or column) of a matrix is a multiple of any other row (or column), then its
determinant is 0.
6. |A′| = |A|.
7. If A and B are square matrices of the same order, then |AB| = |A| · |B|.
1.5 Inverse Matrix
1. An n × n matrix A has an inverse, denoted $A^{-1}$, provided that $AA^{-1} = A^{-1}A = I_n$.
3. The inverse of an identity matrix is the identity matrix itself, i.e. I−1 = I.
4. The inverse of the inverse is the original matrix itself, i.e. (A−1 )−1 = A.
5. The inverse of the transpose is the transpose of the inverse, i.e. (A′ )−1 = (A−1 )′ .
8. Example:
$$A = \begin{bmatrix} 1 & 3 & 4 \\ 1 & 2 & 1 \\ 2 & 4 & 5 \end{bmatrix}$$
$$C_{11} = 6,\; C_{12} = -3,\; C_{13} = 0,\; C_{21} = 1,\; C_{22} = -3,\; C_{23} = 2,\; C_{31} = -5,\; C_{32} = 3,\; C_{33} = -1$$
$$|A| = 1(6) + 3(-3) + 4(0) = -3$$
$$A^{-1} = \frac{1}{-3} \begin{bmatrix} 6 & -3 & 0 \\ 1 & -3 & 2 \\ -5 & 3 & -1 \end{bmatrix}' = \frac{1}{-3} \begin{bmatrix} 6 & 1 & -5 \\ -3 & -3 & 3 \\ 0 & 2 & -1 \end{bmatrix} = \begin{bmatrix} -2 & -\frac{1}{3} & \frac{5}{3} \\ 1 & 1 & -1 \\ 0 & -\frac{2}{3} & \frac{1}{3} \end{bmatrix}$$
Verify:
$$AA^{-1} = \begin{bmatrix} 1 & 3 & 4 \\ 1 & 2 & 1 \\ 2 & 4 & 5 \end{bmatrix} \begin{bmatrix} -2 & -\frac{1}{3} & \frac{5}{3} \\ 1 & 1 & -1 \\ 0 & -\frac{2}{3} & \frac{1}{3} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = I_3$$
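The cofactor computation of A⁻¹ can be confirmed against a library inverse (NumPy, illustrative):

```python
import numpy as np

# The inverse obtained from cofactors, checked against numpy.
A = np.array([[1.0, 3.0, 4.0],
              [1.0, 2.0, 1.0],
              [2.0, 4.0, 5.0]])
A_inv = np.array([[-2.0, -1/3,  5/3],
                  [ 1.0,  1.0, -1.0],
                  [ 0.0, -2/3,  1/3]])
assert np.allclose(A_inv, np.linalg.inv(A))
assert np.allclose(A @ A_inv, np.eye(3))
```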
A linear combination of the vectors $a_1, a_2, \ldots, a_n$ with scalar weights $c_1, c_2, \ldots, c_n$ is
$$c_1 a_1 + c_2 a_2 + \ldots + c_n a_n$$
1 2 3
2. Example: Suppose a1 = , a2 = , a3 = , and c1 = −2, c2 = 1, c3 = 2, then
2 1 4
the linear combination is
1 2 3 6
c1 a1 + c2 a2 + c3 a3 = −2 +1 +2 =
2 1 4 5
In this example, there exist nonzero solutions $c_1 = -\frac{13}{3}$, $c_2 = 3$, and $c_3 = 1$ such that
$$-\frac{13}{3}\begin{bmatrix} 3 \\ 0 \\ 3 \end{bmatrix} + 3\begin{bmatrix} 2 \\ 1 \\ 4 \end{bmatrix} + \begin{bmatrix} 7 \\ -3 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$
Hence rank(A) = 2.
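The linear dependence and the resulting rank can be verified numerically (NumPy, illustrative):

```python
import numpy as np

# Columns are linearly dependent (Ac = 0 for nonzero c), so rank is 2.
A = np.array([[3.0, 2.0, 7.0],
              [0.0, 1.0, -3.0],
              [3.0, 4.0, 1.0]])
c = np.array([-13/3, 3.0, 1.0])
assert np.allclose(A @ c, 0.0)
assert np.linalg.matrix_rank(A) == 2
```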
17. Let X be an (n × K) matrix and A be an (n × n) nonsingular matrix; then rank(AX) = rank(X).
2. If
$$A = \begin{bmatrix} A_{11} & 0 \\ 0 & A_{22} \end{bmatrix}$$
then the inverse of A is
$$A^{-1} = \begin{bmatrix} A_{11}^{-1} & 0 \\ 0 & A_{22}^{-1} \end{bmatrix}$$
3. If
$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}$$
where $A_{11}$ and $A_{22}$ are square nonsingular matrices, then
$$A^{-1} = \begin{bmatrix} B_{11} & -B_{11} A_{12} A_{22}^{-1} \\ -A_{22}^{-1} A_{21} B_{11} & A_{22}^{-1} + A_{22}^{-1} A_{21} B_{11} A_{12} A_{22}^{-1} \end{bmatrix}$$
where $B_{11} = (A_{11} - A_{12} A_{22}^{-1} A_{21})^{-1}$.
A quadratic form in x is $q = x'Ax$, where A is a symmetric matrix. For example,
$$q = x'Ax = [x_1 \ \ x_2] \begin{bmatrix} 1 & 2 \\ 2 & 6 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = x_1^2 + 4x_1x_2 + 6x_2^2$$
3. If A is an (n × n) symmetric matrix and rank(A)=n, then the following are all equivalent:
4. Example:
Show that $A = \begin{bmatrix} 1 & 2 \\ 2 & 6 \end{bmatrix}$ is p.d.
(1) Method 1:
$$q = x'Ax = [x_1 \ \ x_2] \begin{bmatrix} 1 & 2 \\ 2 & 6 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = x_1^2 + 4x_1x_2 + 6x_2^2 = (x_1 + 2x_2)^2 + 2x_2^2 > 0 \;\text{ for } x \neq 0$$
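Positive definiteness can also be checked through the eigenvalues of A, and the completed square can be spot-checked at a point (NumPy; the evaluation point is illustrative):

```python
import numpy as np

# A symmetric matrix is p.d. iff all its eigenvalues are positive.
A = np.array([[1.0, 2.0], [2.0, 6.0]])
assert (np.linalg.eigvalsh(A) > 0).all()

# Spot-check q = x'Ax = (x1 + 2 x2)^2 + 2 x2^2 at one point.
x1, x2 = 3.0, -1.0
x = np.array([x1, x2])
assert np.isclose(x @ A @ x, (x1 + 2 * x2) ** 2 + 2 * x2 ** 2)
```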
(1) A positive definite matrix has diagonal elements that are strictly positive, while a p.s.d.
matrix has nonnegative diagonal elements.
(2) If A is p.d., then $A^{-1}$ exists and is p.d.
(3) If X is n × K, then $X'X$ is p.s.d.
Proof:
Let c be a K × 1 nonzero vector. Then
$$q = c'X'Xc = y'y = \sum_i y_i^2 \geq 0, \quad \text{where } y_{n\times 1} = Xc$$
7. Example:
Show that the OLS estimator $b = (X'X)^{-1}X'y$ has minimum variance in the class of linear unbiased
estimators.
Proof:
Let $b_0 = Cy$ be another linear unbiased estimator of β, where C is a K × n matrix such
that CX = I.
Let $D = C - (X'X)^{-1}X'$
$$\Rightarrow DX = CX - I_K = O_{K\times K}$$
$$b_0 = Cy = C(X\beta + \varepsilon) = \beta + C\varepsilon \;\Rightarrow\; b_0 - \beta = C\varepsilon$$
Hence $\mathrm{Var}(b_0) = \sigma^2 CC'$, and since $CC' = (D + (X'X)^{-1}X')(D + (X'X)^{-1}X')' = DD' + (X'X)^{-1}$ (using DX = O),
$$\mathrm{Var}(b_0) - \mathrm{Var}(b) = \sigma^2 DD'$$
which is p.s.d.: for any z,
$$q = z'DD'z = h'h = \sum_i h_i^2 \geq 0, \quad \text{where } h = D'z$$
The Hessian matrix (i.e. the matrix of second derivatives) of y is defined as
$$H = \frac{\partial^2 y}{\partial x \partial x'} = \frac{\partial(\partial y/\partial x)}{\partial x'} = \begin{bmatrix} f_{11} & f_{12} & \cdots & f_{1n} \\ f_{21} & f_{22} & \cdots & f_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ f_{n1} & f_{n2} & \cdots & f_{nn} \end{bmatrix}, \quad \text{where } f_{ij} = \frac{\partial^2 y}{\partial x_i \partial x_j}$$
If $y = a'x = a_1x_1 + a_2x_2 + \ldots + a_nx_n$, then
$$\frac{\partial y}{\partial x} = \frac{\partial(a'x)}{\partial x} = \frac{\partial(x'a)}{\partial x} = \begin{bmatrix} \partial y/\partial x_1 \\ \partial y/\partial x_2 \\ \vdots \\ \partial y/\partial x_n \end{bmatrix} = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix} = a$$
Example:
Residual sum of squares:
$$e'e = y'y - 2b'X'y + b'X'Xb$$
$$\frac{\partial(-2b'X'y)}{\partial b} = -2X'y$$
3. Theorem:
If A is a symmetric matrix, then
$$\frac{\partial x'Ax}{\partial x} = 2Ax.$$
Example 1: $A = \begin{bmatrix} 1 & 3 \\ 3 & 4 \end{bmatrix}$. Then
$$x'Ax = [x_1 \ \ x_2] \begin{bmatrix} 1 & 3 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = x_1^2 + 4x_2^2 + 6x_1x_2$$
$$\frac{\partial x'Ax}{\partial x} = \begin{bmatrix} 2x_1 + 6x_2 \\ 6x_1 + 8x_2 \end{bmatrix} = 2\begin{bmatrix} 1 & 3 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = 2Ax$$
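The gradient formula ∂(x′Ax)/∂x = 2Ax can be checked against a finite-difference approximation (NumPy sketch; the evaluation point is illustrative):

```python
import numpy as np

# Compare the analytic gradient 2Ax with central finite differences.
A = np.array([[1.0, 3.0], [3.0, 4.0]])
x = np.array([0.5, -2.0])
grad = 2 * A @ x

eps = 1e-6
q = lambda v: v @ A @ v
num = np.array([(q(x + eps * e) - q(x - eps * e)) / (2 * eps)
                for e in np.eye(2)])
assert np.allclose(grad, num, atol=1e-5)
```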
Example 2:
$$\frac{\partial(b'X'Xb)}{\partial b} = 2X'Xb, \quad \text{since } X'X \text{ is symmetric}$$
4. If A is not symmetric, then
$$\frac{\partial x'Ax}{\partial x} = (A + A')x.$$
e.g. $A = \begin{bmatrix} 1 & 3 \\ 0 & 4 \end{bmatrix}$. Then
$$y = x'Ax = [x_1 \ \ x_2] \begin{bmatrix} 1 & 3 \\ 0 & 4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = x_1^2 + 4x_2^2 + 3x_1x_2$$
$$\frac{\partial x'Ax}{\partial x} = \begin{bmatrix} \partial y/\partial x_1 \\ \partial y/\partial x_2 \end{bmatrix} = \begin{bmatrix} 2x_1 + 3x_2 \\ 3x_1 + 8x_2 \end{bmatrix} = \begin{bmatrix} 2 & 3 \\ 3 & 8 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \left( \begin{bmatrix} 1 & 3 \\ 0 & 4 \end{bmatrix} + \begin{bmatrix} 1 & 0 \\ 3 & 4 \end{bmatrix} \right) \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = (A + A')x$$
5. If $y_{(m\times 1)} = A_{(m\times n)}\, x_{(n\times 1)}$, then
$$\frac{\partial y}{\partial x} = A'_{(n\times m)}.$$
e.g.
$$y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} 5 & 3 & 2 \\ 2 & 1 & 3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 5x_1 + 3x_2 + 2x_3 \\ 2x_1 + x_2 + 3x_3 \end{bmatrix} = Ax$$
$$\frac{\partial y}{\partial x} = \frac{\partial(Ax)}{\partial x} = \begin{bmatrix} \partial y_1/\partial x_1 & \partial y_2/\partial x_1 \\ \partial y_1/\partial x_2 & \partial y_2/\partial x_2 \\ \partial y_1/\partial x_3 & \partial y_2/\partial x_3 \end{bmatrix} = \begin{bmatrix} 5 & 2 \\ 3 & 1 \\ 2 & 3 \end{bmatrix} = A'$$
If c is an eigenvector of A with eigenvalue λ, then
$$Ac = \lambda c = \lambda I c \;\Rightarrow\; Ac - \lambda Ic = 0 \;\Rightarrow\; (A - \lambda I)c = 0$$
If the inverse of $(A - \lambda I)$ existed, then
$$(A - \lambda I)^{-1}(A - \lambda I)c = 0$$
This would imply that $c = 0$, contradicting the condition that $c \neq 0$. It follows that the matrix
$(A - \lambda I)$ must be singular.
Consider
$$A_{(k\times k)}\, c_{(k\times 1)} = \lambda\, c_{(k\times 1)}$$
or
$$(A - \lambda I)c = 0$$
where λ is an unknown scalar and c is an unknown k × 1 vector. Then λ solves the
characteristic equation
$$|A - \lambda I| = 0$$
Collecting the k eigenvectors as the columns of C, the system can be written as
$$A_{k\times k}\, C_{k\times k} = C_{k\times k}\, \Lambda_{k\times k}$$
where Λ is the diagonal matrix of eigenvalues. If C is nonsingular,
$$\Lambda = C^{-1} A C$$
4. Example: Suppose
$$A = \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix}$$
$$A - \lambda I = \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix} - \lambda \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 3-\lambda & 1 \\ 1 & 3-\lambda \end{bmatrix}$$
$$|A - \lambda I| = 0 \;\Rightarrow\; \begin{vmatrix} 3-\lambda & 1 \\ 1 & 3-\lambda \end{vmatrix} = 0 \;\Rightarrow\; \lambda^2 - 6\lambda + 8 = (\lambda - 4)(\lambda - 2) = 0$$
The eigenvalues of A are $\lambda_1 = 4$ and $\lambda_2 = 2$.
• Find eigenvectors:
(1) $\lambda_1 = 4$:
$$Ac_1 = 4c_1, \quad\text{i.e.}\quad \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} c_{11} \\ c_{21} \end{bmatrix} = 4 \begin{bmatrix} c_{11} \\ c_{21} \end{bmatrix}$$
$$\Rightarrow \begin{bmatrix} 3c_{11} + c_{21} \\ c_{11} + 3c_{21} \end{bmatrix} = \begin{bmatrix} 4c_{11} \\ 4c_{21} \end{bmatrix} \;\Rightarrow\; \begin{bmatrix} -c_{11} + c_{21} \\ c_{11} - c_{21} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \;\Rightarrow\; c_{11} = c_{21}$$
Let $c_{11} = 1$; then $c_{21} = 1$. Therefore, an eigenvector for $\lambda_1 = 4$ is $c_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$.
(2) $\lambda_2 = 2$:
$$Ac_2 = 2c_2, \quad\text{i.e.}\quad \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} c_{12} \\ c_{22} \end{bmatrix} = 2 \begin{bmatrix} c_{12} \\ c_{22} \end{bmatrix}$$
$$\Rightarrow \begin{bmatrix} 3c_{12} + c_{22} \\ c_{12} + 3c_{22} \end{bmatrix} = \begin{bmatrix} 2c_{12} \\ 2c_{22} \end{bmatrix} \;\Rightarrow\; \begin{bmatrix} c_{12} + c_{22} \\ c_{12} + c_{22} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \;\Rightarrow\; c_{12} = -c_{22}$$
Let $c_{12} = 1$; then $c_{22} = -1$. Therefore, an eigenvector for $\lambda_2 = 2$ is $c_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$.
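The eigenvalues and eigenvectors of this example can be confirmed with a library routine (NumPy's `eigh` for symmetric matrices; illustrative, not part of the notes):

```python
import numpy as np

# Eigenvalues/eigenvectors of A = [[3, 1], [1, 3]].
A = np.array([[3.0, 1.0], [1.0, 3.0]])
lam, C = np.linalg.eigh(A)             # ascending order for symmetric A
assert np.allclose(lam, [2.0, 4.0])
# Each column c of C satisfies Ac = lambda * c.
for l, c in zip(lam, C.T):
    assert np.allclose(A @ c, l * c)
```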
• Since the system of equations is homogeneous, it yields an infinite number of eigenvectors
corresponding to each root $\lambda_i$: any nonzero scalar multiple of $c_i$ is also an eigenvector.
• Check $\Lambda = C^{-1}AC$:
$$C = [c_1 \ \ c_2] = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \;\Rightarrow\; C^{-1} = \frac{-1}{2}\begin{bmatrix} -1 & -1 \\ -1 & 1 \end{bmatrix}$$
$$\Rightarrow \Lambda = C^{-1}AC = \frac{-1}{2}\begin{bmatrix} -1 & -1 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} = \begin{bmatrix} 4 & 0 \\ 0 & 2 \end{bmatrix}$$
5. The eigenvalues of a symmetric matrix are all real.
6. If all k eigenvalues are distinct, C has k linearly independent columns, C is nonsingular, and
$$\Lambda = C^{-1} A C$$
9. Let Q denote the matrix whose columns are normalized orthogonal eigenvectors. Then
$$Q'Q = I$$
The matrix Q is called an orthogonal matrix since its inverse is its transpose, i.e.
$$Q'Q = QQ' = I$$
$$Q'Q = \begin{bmatrix} c_1' \\ c_2' \\ \vdots \\ c_k' \end{bmatrix} [c_1 \ \ c_2 \ \ \cdots \ \ c_k] = \begin{bmatrix} c_1'c_1 & c_1'c_2 & \cdots & c_1'c_k \\ c_2'c_1 & c_2'c_2 & \cdots & c_2'c_k \\ \vdots & \vdots & \ddots & \vdots \\ c_k'c_1 & c_k'c_2 & \cdots & c_k'c_k \end{bmatrix} = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} = I$$
In this case,
$$Q'AQ = \Lambda \;\Leftrightarrow\; A = Q\Lambda Q'$$
• $\lambda_1 = 4$:
$$\begin{bmatrix} 3-4 & 1 \\ 1 & 3-4 \end{bmatrix} \begin{bmatrix} c_{11} \\ c_{21} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$
$$\Rightarrow \begin{cases} c_{11} = c_{21} \\ c_{11}^2 + c_{21}^2 = 1 \ \text{(normalization)} \end{cases} \;\Rightarrow\; c_1 = \begin{bmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix} \text{ or } \begin{bmatrix} \frac{-1}{\sqrt{2}} \\ \frac{-1}{\sqrt{2}} \end{bmatrix}$$
• $\lambda_2 = 2$:
$$\begin{bmatrix} 3-2 & 1 \\ 1 & 3-2 \end{bmatrix} \begin{bmatrix} c_{12} \\ c_{22} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$
$$\Rightarrow \begin{cases} c_{12} = -c_{22} \\ c_{12}^2 + c_{22}^2 = 1 \ \text{(normalization)} \end{cases} \;\Rightarrow\; c_2 = \begin{bmatrix} \frac{1}{\sqrt{2}} \\ \frac{-1}{\sqrt{2}} \end{bmatrix} \text{ or } \begin{bmatrix} \frac{-1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix}$$
• Check $Q'Q = QQ' = I$:
$$Q'Q = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{-1}{\sqrt{2}} \end{bmatrix}' \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{-1}{\sqrt{2}} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$
• Check $Q'AQ = \Lambda$:
$$Q'AQ = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{-1}{\sqrt{2}} \end{bmatrix}' \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{-1}{\sqrt{2}} \end{bmatrix} = \begin{bmatrix} 4 & 0 \\ 0 & 2 \end{bmatrix} = \Lambda$$
14. The rank of A is equal to the number of nonzero eigenvalues.
Proof:
Let $k_1$ be the number of nonzero eigenvalues of the symmetric matrix A. Then
$$\begin{aligned} \mathrm{rank}(A) &= \mathrm{rank}(Q\Lambda Q') \\ &= \mathrm{rank}(\Lambda Q'), \quad \text{since rank(BX) = rank(X) if B is nonsingular} \\ &= \mathrm{rank}(Q\Lambda), \quad \text{since rank(X) = rank(X')} \\ &= \mathrm{rank}(\Lambda), \quad \text{since rank(BX) = rank(X) if B is nonsingular} \\ &= k_1 \end{aligned}$$
(1) Check $PP' = A$, where $P = Q\Lambda^{1/2}$:
$$P = Q\Lambda^{1/2} = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{-1}{\sqrt{2}} \end{bmatrix} \begin{bmatrix} 2 & 0 \\ 0 & \sqrt{2} \end{bmatrix} = \begin{bmatrix} \frac{2}{\sqrt{2}} & 1 \\ \frac{2}{\sqrt{2}} & -1 \end{bmatrix}$$
$$PP' = \begin{bmatrix} \frac{2}{\sqrt{2}} & 1 \\ \frac{2}{\sqrt{2}} & -1 \end{bmatrix} \begin{bmatrix} \frac{2}{\sqrt{2}} & 1 \\ \frac{2}{\sqrt{2}} & -1 \end{bmatrix}' = \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix} = A$$
(2) Check $T'T = A$, where $T = \Lambda^{1/2}Q'$:
$$T = \Lambda^{1/2}Q' = \begin{bmatrix} 2 & 0 \\ 0 & \sqrt{2} \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{-1}{\sqrt{2}} \end{bmatrix}' = \begin{bmatrix} \frac{2}{\sqrt{2}} & \frac{2}{\sqrt{2}} \\ 1 & -1 \end{bmatrix}$$
$$T'T = \begin{bmatrix} \frac{2}{\sqrt{2}} & \frac{2}{\sqrt{2}} \\ 1 & -1 \end{bmatrix}' \begin{bmatrix} \frac{2}{\sqrt{2}} & \frac{2}{\sqrt{2}} \\ 1 & -1 \end{bmatrix} = \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix} = A$$
(3) Check $TA^{-1}T' = I$:
$$A = \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix} \;\Rightarrow\; A^{-1} = \frac{1}{8}\begin{bmatrix} 3 & -1 \\ -1 & 3 \end{bmatrix}$$
$$TA^{-1}T' = \frac{1}{8}\begin{bmatrix} \frac{2}{\sqrt{2}} & \frac{2}{\sqrt{2}} \\ 1 & -1 \end{bmatrix} \begin{bmatrix} 3 & -1 \\ -1 & 3 \end{bmatrix} \begin{bmatrix} \frac{2}{\sqrt{2}} & 1 \\ \frac{2}{\sqrt{2}} & -1 \end{bmatrix} = \frac{1}{8}\begin{bmatrix} \frac{4}{\sqrt{2}} & \frac{4}{\sqrt{2}} \\ 4 & -4 \end{bmatrix} \begin{bmatrix} \frac{2}{\sqrt{2}} & 1 \\ \frac{2}{\sqrt{2}} & -1 \end{bmatrix} = \frac{1}{8}\begin{bmatrix} 8 & 0 \\ 0 & 8 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I_2$$
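The factorizations checked above all follow from the spectral decomposition A = QΛQ′; a NumPy sketch of the same checks (illustrative, not part of the notes):

```python
import numpy as np

# A = Q Lambda Q', so T = Lambda^{1/2} Q' satisfies T'T = A
# and P = Q Lambda^{1/2} satisfies PP' = A.
A = np.array([[3.0, 1.0], [1.0, 3.0]])
lam, Q = np.linalg.eigh(A)             # lam = [2, 4]
T = np.diag(np.sqrt(lam)) @ Q.T
P = Q @ np.diag(np.sqrt(lam))
assert np.allclose(T.T @ T, A)
assert np.allclose(P @ P.T, A)
assert np.allclose(T @ np.linalg.inv(A) @ T.T, np.eye(2))
```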
Chapter 2
Multivariate Distributions
Let x denote a k × 1 random vector:
$$x_{k\times 1} = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_k \end{bmatrix}$$
The variance-covariance (or covariance) matrix is denoted by
$$\mathrm{Var}(x) = E[(x - \mu)(x - \mu)']$$
$$= E\left( \begin{bmatrix} X_1 - \mu_1 \\ X_2 - \mu_2 \\ \vdots \\ X_k - \mu_k \end{bmatrix} [(X_1 - \mu_1) \ \ (X_2 - \mu_2) \ \ \cdots \ \ (X_k - \mu_k)] \right)$$
$$= \begin{bmatrix} E(X_1-\mu_1)^2 & E[(X_1-\mu_1)(X_2-\mu_2)] & \cdots & E[(X_1-\mu_1)(X_k-\mu_k)] \\ E[(X_2-\mu_2)(X_1-\mu_1)] & E(X_2-\mu_2)^2 & \cdots & E[(X_2-\mu_2)(X_k-\mu_k)] \\ \vdots & \vdots & \ddots & \vdots \\ E[(X_k-\mu_k)(X_1-\mu_1)] & E[(X_k-\mu_k)(X_2-\mu_2)] & \cdots & E(X_k-\mu_k)^2 \end{bmatrix}$$
$$= \begin{bmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1k} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{k1} & \sigma_{k2} & \cdots & \sigma_{kk} \end{bmatrix} = \Sigma$$
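The definition Var(x) = E[(x − µ)(x − µ)′] = Σ can be illustrated by simulation: the sample covariance of many draws approaches Σ (NumPy sketch; this Σ is an illustrative choice, not from the notes):

```python
import numpy as np

# Sample covariance of simulated draws converges to Sigma.
rng = np.random.default_rng(42)
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
X = rng.multivariate_normal(mean=[0, 0], cov=Sigma, size=200_000)
S = np.cov(X, rowvar=False)            # rows of X are observations
assert np.allclose(S, Sigma, atol=0.05)
assert np.allclose(S, S.T)             # covariance matrices are symmetric
```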
An abbreviated notation is
$$x \sim N(\mu, \Sigma)$$
If $x \sim N(\mu, \Sigma)$, then for any conformable fixed vector c,
$$c'x \sim N(c'\mu, c'\Sigma c),$$
since $E(c'x) = c'E(x) = c'\mu$ and $\mathrm{Var}(c'x) = c'\Sigma c$.
If $x \sim N(0, I_k)$, then
$$x'x \sim \chi^2(k)$$
More generally, if $x \sim N(0, \Sigma)$, then
$$x'\Sigma^{-1}x \sim \chi^2(k)$$
Let $y = Q'x$, where Q is the orthogonal matrix of eigenvectors of A. Then
$$\mathrm{Var}(y) = E(yy') = E(Q'xx'Q) = Q'(\sigma^2 I)Q = \sigma^2 I \;\Rightarrow\; y \sim N(0, \sigma^2 I)$$
Since $x = Qy$ and A is symmetric and idempotent with rank r, $\Lambda = Q'AQ$ has r diagonal
elements equal to one and the rest zero, so
$$x'Ax = y'Q'AQy = y'\Lambda y = y_1^2 + y_2^2 + \ldots + y_r^2$$
$$\because \frac{y_i}{\sigma} \sim N(0, 1) \qquad \therefore \frac{x'Ax}{\sigma^2} \sim \chi^2(r)$$
7. Suppose $x \sim N(0, \sigma^2 I)$ and consider two quadratic forms $x'Ax$ and $x'Bx$, where A and B
are symmetric and idempotent matrices. Then $x'Ax$ and $x'Bx$ are statistically independent
if and only if
$$AB = O.$$