
Notes on Matrix Algebra

Giovanni S.F. Bruno


Current address: Department of Economics, Bocconi University, Milan
E-mail address: giovanni.bruno@unibocconi.it
Contents

Chapter 1. Introduction
Chapter 2. Formal definitions and notation
Chapter 3. Basic operations
3.1. Transpose of a matrix
3.2. Matrix addition
3.3. Partitioned matrices
3.4. Scalar multiplication
3.5. Vector multiplication
3.6. Matrix multiplication
3.7. Trace of a matrix
Chapter 4. Determinant of a matrix
4.1. Definition
4.2. The determinant of matrices of order 1, 2 and 3
4.3. The cofactor expansion
4.4. Properties
Chapter 5. Linearly independent vectors
5.1. Linear combinations of vectors
5.2. Linearly independent vectors
Chapter 6. Rank
6.1. Definition
6.2. Maximal number of linearly independent vectors
6.3. Row rank
6.4. The rank of a product matrix
6.5. Systems of linear equations
Chapter 7. Inverse matrices
7.1. Existence and uniqueness of the inverse matrix
7.2. Computation of the inverse
7.3. Ordinary Least Squares
Chapter 8. Vector spaces, spanning sets and projection matrices
8.1. Vector spaces
8.2. Spanning sets
8.3. Orthogonal spaces
8.4. Idempotent matrices
8.5. Projection matrices
8.6. OLS residuals and predicted values
Bibliography
CHAPTER 1

Introduction

These notes cover matrix algebra results that are useful to econometricians. They are based on Greene (2008), Searle (1982), Rao (1973) and Harville (1997). Exercises are given throughout and I recommend that the reader work through all of them, refraining from looking at the solution that follows each exercise until having produced her/his own solution or, at least, having made a serious effort.
Consider Table 1, showing disposable income and aggregate consumption for the U.S. over the years 1940-1950.

Table 1. U.S. Consumption and Income, 1940-1950

Year   Consumption   Disposable Income
1940   226           241
1941   240           280
1942   235           319
1943   245           331
1944   255           345
1945   265           340
1946   295           332
1947   300           320
1948   305           339
1949   315           338
1950   325           371


Now consider the array of numbers in Table 1 extracted as



$$
\begin{pmatrix}
226 & 241 \\
240 & 280 \\
235 & 319 \\
245 & 331 \\
255 & 345 \\
265 & 340 \\
295 & 332 \\
300 & 320 \\
305 & 339 \\
315 & 338 \\
325 & 371
\end{pmatrix}, \tag{1.0.1}
$$
where the position of an entry determines its meaning. For example,
the entry in the first column and third row, 235, is consumption in year
1942. It is therefore clear that a row in the array indicates the values
taken on by both economic variables in the same year and a column the
values taken on by the same economic variable over all years. Such an array is called a matrix; matrices are thus mathematical objects well suited to representing data on economic variables.
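A minimal sketch, assuming plain Python with a matrix stored as a list of rows, of how position alone carries meaning in (1.0.1). The names `table1`, `row_1942` and `consumption_series` are illustrative, not from the notes. Python counts from 0, so the entry in row 3 and column 1 is `table1[2][0]`.

```python
# Table 1 stored as a list of rows: one row per year (1940-1950),
# column 0 = consumption, column 1 = disposable income.
table1 = [
    [226, 241], [240, 280], [235, 319], [245, 331], [255, 345], [265, 340],
    [295, 332], [300, 320], [305, 339], [315, 338], [325, 371],
]

# Entry in row 3, column 1 (zero-based [2][0]): consumption in 1942.
consumption_1942 = table1[2][0]

# A row holds both variables for one year; a column holds one variable for all years.
row_1942 = table1[2]
consumption_series = [row[0] for row in table1]

print(consumption_1942)  # 235
print(row_1942)          # [235, 319]
```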
CHAPTER 2

Formal definitions and notation

Using the words of Searle (1982), a matrix is a square or rectangular array of numbers arranged in rows and columns. All columns of a matrix are of equal length, as are all rows. I stick to the convention of denoting a matrix by a capital letter and its elements by the same letter in lower case, subscripted first by the row index and then by the column index. Therefore, $a_{ij}$ denotes the element of the matrix $A$ in row $i$ and column $j$. If $A$ has $n$ rows and $k$ columns, $A$ is said to be an $(n \times k)$ matrix or, equivalently, a matrix of order $(n \times k)$, and can be written as

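$$
A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1j} & \cdots & a_{1k} \\
a_{21} & a_{22} & \cdots & a_{2j} & \cdots & a_{2k} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
a_{i1} & a_{i2} & \cdots & a_{ij} & \cdots & a_{ik} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nj} & \cdots & a_{nk}
\end{pmatrix}; \tag{2.0.2}
$$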
$n$ and $k$ are referred to as the dimensions of $A$. When we wish to indicate that a matrix $A$ is of order $(n \times k)$, we can use the compact notation $A_{n\times k}$.
If the entries $a_{ij}$ are real numbers, $A$ is said to be a real matrix. In this course we deal exclusively with real matrices. A sometimes convenient representation of a matrix $A$ of order $(n \times k)$ is $A = \{a_{ij}\}$, $i = 1, \dots, n$ and $j = 1, \dots, k$, where $a_{ij}$ is the generic element of $A$ in the i.th row and j.th column.
A matrix whose elements are all zero is called a null matrix and is denoted by $0$; if its order $(n \times k)$ needs to be made explicit, it is denoted by $0_{n\times k}$. If $n = k$, $A$ is said to be a square matrix of order $k$. For example
$$
A = \begin{pmatrix}
2.1 & 1.5 & 0 & 5.9 \\
3.5 & 5.1 & 3.4 & 6.1 \\
0 & 0 & 2.1 & 6.7 \\
1.1 & 4.1 & 0.4 & 7.2
\end{pmatrix}.
$$
The elements $a_{ii}$, $i = 1, \dots, n$, of a square matrix are referred to as the terms on the main diagonal of $A$ or, simply, as the main diagonal of $A$.

The terms of a square matrix that lie on the line parallel to and just below the main diagonal are referred to as the first lower subdiagonal of $A$, the terms lying on the line parallel to and just below the first lower subdiagonal are referred to as the second lower subdiagonal, and so on. Upper subdiagonals are defined similarly. In the foregoing matrix $A$ the main diagonal is given by the terms $(2.1, 5.1, 2.1, 7.2)$, and the first lower and upper subdiagonals are $(3.5, 0, 0.4)$ and $(1.5, 3.4, 6.7)$, respectively. The elements of a square matrix not on the main diagonal are referred to as the off-diagonal or non-diagonal terms.
A square matrix whose off-diagonal terms are all zero is called a diagonal matrix, for example
$$
A = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 7 & 0 \\ 0 & 0 & 5 \end{pmatrix}.
$$
A diagonal matrix whose main-diagonal elements are all equal to one is called an identity matrix and is denoted by $I$. If $I$ is of a given order $n$, it is usually denoted by $I_n$, for example
$$
I_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
$$
A matrix consisting of only a single column is called a column vector,
$$
\mathbf{x} = \begin{pmatrix} x_{11} \\ x_{21} \\ \vdots \\ x_{n1} \end{pmatrix},
$$
the order of which is $(n \times 1)$. It is denoted by a lower-case bold letter. A matrix consisting of only a single row is called a row vector,
$$
\mathbf{x}' = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1n} \end{pmatrix},
$$
the order of which is $(1 \times n)$. It is denoted by a lower-case bold letter with a prime (the reason for the prime notation will become clear below).
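Example 1. A column vector:
$$
\mathbf{a} = \begin{pmatrix} 11 \\ 7 \\ 5 \\ 10 \end{pmatrix}.
$$
A row vector:
$$
\mathbf{b}' = \begin{pmatrix} 2 & 4 & 5 & 23 \end{pmatrix}.
$$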

Two matrices $A = \{a_{ij}\}$, $i = 1, \dots, n_A$ and $j = 1, \dots, k_A$, and $B = \{b_{ij}\}$, $i = 1, \dots, n_B$ and $j = 1, \dots, k_B$, are equal if and only if $n_A = n_B = n$, $k_A = k_B = k$, and $a_{ij} = b_{ij}$ for all $i = 1, \dots, n$ and $j = 1, \dots, k$. In words, two matrices $A$ and $B$ are equal if and only if they are of the same order and each element of $A$ is equal to the corresponding element of $B$.
CHAPTER 3

Basic operations

3.1. Transpose of a matrix


Given the matrix $A = \{a_{ij}\}$, $i = 1, \dots, n$ and $j = 1, \dots, k$, the transpose of $A$ is denoted by $A'$ and is defined as
$$
A' = \{a_{ji}\},
$$
$j = 1, \dots, k$ and $i = 1, \dots, n$. In words, the transpose of a matrix $A$ is a matrix whose element in row $j$ and column $i$ is the element of $A$ in row $i$ and column $j$. Evidently, the rows of $A'$ are the columns of $A$, with order preserved from first to last, and if $A$ is of order $(n \times k)$ its transpose is of order $(k \times n)$.
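Example 2. Given
$$
A = \begin{pmatrix} 5 & 3 & 2 & 0 \\ 9 & 7 & 0 & 1 \end{pmatrix},
\qquad
A' = \begin{pmatrix} 5 & 9 \\ 3 & 7 \\ 2 & 0 \\ 0 & 1 \end{pmatrix}.
$$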
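The definition of the transpose translates directly into code. The following is a sketch in plain Python, assuming matrices stored as lists of rows; `transpose` is a hypothetical helper name. It reproduces Example 2 and checks that transposing twice gives back the original matrix.

```python
def transpose(A):
    """Return A': the element in row j, column i of A' is a_ij."""
    n, k = len(A), len(A[0])
    return [[A[i][j] for i in range(n)] for j in range(k)]

# Matrix A of Example 2, order (2 x 4).
A = [[5, 3, 2, 0],
     [9, 7, 0, 1]]

At = transpose(A)          # order (4 x 2)
print(At)                  # [[5, 9], [3, 7], [2, 0], [0, 1]]
print(transpose(At) == A)  # True: transposing twice gives back A
```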

Transposing twice gives back the original matrix: $(A')' = A$.
The transpose of a row vector is the same vector shaped as a column
vector and the transpose of a column vector is the same vector shaped
as a row vector (this explains the prime notation for row vectors).
Example 3. Given
$$
\mathbf{x} = \begin{pmatrix} 1 \\ 6 \\ 5 \\ 9 \end{pmatrix},
\qquad
\mathbf{x}' = \begin{pmatrix} 1 & 6 & 5 & 9 \end{pmatrix}.
$$

Exercise 1. Compute the transpose of matrix (1.0.1).
Solution:
$$
\begin{pmatrix}
226 & 240 & 235 & 245 & 255 & 265 & 295 & 300 & 305 & 315 & 325 \\
241 & 280 & 319 & 331 & 345 & 340 & 332 & 320 & 339 & 338 & 371
\end{pmatrix}
$$
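A matrix $A$ is said to be symmetric if it is a square matrix such that $A = A'$. For example
$$
A = \begin{pmatrix} 1 & 4 & 9 \\ 4 & 7 & 3 \\ 9 & 3 & 6 \end{pmatrix}
$$
is a symmetric matrix of order 3. A generic $(n \times n)$ symmetric matrix can be represented as
$$
A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{12} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{1n} & a_{2n} & \cdots & a_{nn}
\end{pmatrix}.
$$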

3.2. Matrix addition


The matrix addition and matrix subtraction are defined only for matrices of the same order. In this case the matrices are said to be conformable for matrix addition. Given two $(n \times k)$ matrices $A = \{a_{ij}\}$ and $B = \{b_{ij}\}$, $i = 1, \dots, n$ and $j = 1, \dots, k$,
(1) the matrix addition of $A$ and $B$ is denoted by the matrix $A + B$, defined as
$$
A + B = \{a_{ij} + b_{ij}\},
$$
$i = 1, \dots, n$ and $j = 1, \dots, k$; in words, the matrix addition of two matrices is the matrix of elementwise sums;
(2) the matrix subtraction of $A$ and $B$ is denoted by the matrix $A - B$, defined as
$$
A - B = \{a_{ij} - b_{ij}\},
$$
$i = 1, \dots, n$ and $j = 1, \dots, k$; in words, the matrix subtraction of two matrices is the matrix of elementwise differences.
The null matrix plays the same role as the scalar 0 in scalar addition. So,
$$
A_{n\times k} + 0_{n\times k} = A_{n\times k}.
$$
3.2.1. The transpose of the matrix addition. Given two matrices of the same order, $A$ and $B$, it is not hard to verify that $(A + B)' = A' + B'$.

3.2.2. The laws of algebra. It is easy to verify that matrix addition is commutative, $A + B = B + A$, and associative, $(A + B) + C = A + (B + C)$.
Exercise 2. Verify the properties of matrix addition through nu-
merical examples.
Solution: easy.
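One possible answer to Exercise 2, sketched in plain Python under the assumption that matrices are lists of rows: random integer matrices (seeded, so the run is reproducible) are used to check commutativity, associativity, the role of the null matrix and the rule $(A + B)' = A' + B'$ of 3.2.1. The helper names are hypothetical. A numerical check of this kind illustrates the properties but does not prove them.

```python
import random

def add(A, B):
    """Elementwise sum; A and B must have the same order."""
    assert len(A) == len(B) and len(A[0]) == len(B[0]), "not conformable for addition"
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def transpose(A):
    return [list(col) for col in zip(*A)]

def random_matrix(n, k, rng):
    return [[rng.randint(-9, 9) for _ in range(k)] for _ in range(n)]

rng = random.Random(0)  # fixed seed for reproducibility
A, B, C = (random_matrix(3, 4, rng) for _ in range(3))
Z = [[0] * 4 for _ in range(3)]  # null matrix 0_{3x4}

print(add(A, B) == add(B, A))                                   # commutative
print(add(add(A, B), C) == add(A, add(B, C)))                   # associative
print(add(A, Z) == A)                                           # null matrix is neutral
print(transpose(add(A, B)) == add(transpose(A), transpose(B)))  # (A+B)' = A'+B'
```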

3.3. Partitioned matrices


Partitioning is an important operation that does not transform the matrix itself, but only its representation.
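Consider the following matrix
$$
A = \begin{pmatrix}
3 & 4 & 5 & 6 & 8 & 3 \\
4 & 7 & 9 & 1 & 4 & 2 \\
2 & 8 & 8 & 6 & 7 & 1 \\
4 & 3 & 0 & 7 & 0 & 9 \\
4 & 7 & 5 & 3 & 3 & 5
\end{pmatrix} \tag{3.3.1}
$$
and draw two dashed lines separating the matrix into four blocks as follows
$$
A = \left(\begin{array}{cccc|cc}
3 & 4 & 5 & 6 & 8 & 3 \\
4 & 7 & 9 & 1 & 4 & 2 \\
2 & 8 & 8 & 6 & 7 & 1 \\ \hline
4 & 3 & 0 & 7 & 0 & 9 \\
4 & 7 & 5 & 3 & 3 & 5
\end{array}\right).
$$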
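Matrix $A$ can thereby be represented as
$$
A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}, \tag{3.3.2}
$$
where
$$
A_{11} = \begin{pmatrix} 3 & 4 & 5 & 6 \\ 4 & 7 & 9 & 1 \\ 2 & 8 & 8 & 6 \end{pmatrix}, \quad
A_{12} = \begin{pmatrix} 8 & 3 \\ 4 & 2 \\ 7 & 1 \end{pmatrix}, \quad
A_{21} = \begin{pmatrix} 4 & 3 & 0 & 7 \\ 4 & 7 & 5 & 3 \end{pmatrix}
$$
and
$$
A_{22} = \begin{pmatrix} 0 & 9 \\ 3 & 5 \end{pmatrix}.
$$
$A$ of (3.3.2) is represented as a matrix of matrices. $A_{11}$, $A_{12}$, $A_{21}$ and $A_{22}$ are called submatrices of $A$, $A$ of (3.3.2) is called a partitioned matrix, and the process that leads from $A$ of (3.3.1) to $A$ of (3.3.2) is called partitioning.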
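Partitioning only changes how $A$ is viewed, which can be made concrete by slicing. The sketch below (plain Python; the helper `block` is hypothetical and uses zero-based index ranges) extracts the four submatrices of (3.3.2) from $A$ of (3.3.1) and glues them back together.

```python
# Matrix A of (3.3.1), order (5 x 6).
A = [[3, 4, 5, 6, 8, 3],
     [4, 7, 9, 1, 4, 2],
     [2, 8, 8, 6, 7, 1],
     [4, 3, 0, 7, 0, 9],
     [4, 7, 5, 3, 3, 5]]

def block(A, rows, cols):
    """Submatrix of A made of the given row and column index ranges (zero-based)."""
    return [[A[i][j] for j in cols] for i in rows]

# 2 x 2 partitioning: first 3 rows / last 2 rows, first 4 columns / last 2 columns.
A11 = block(A, range(0, 3), range(0, 4))
A12 = block(A, range(0, 3), range(4, 6))
A21 = block(A, range(3, 5), range(0, 4))
A22 = block(A, range(3, 5), range(4, 6))

# Partitioning changes only the representation: gluing the blocks back gives A.
top = [r1 + r2 for r1, r2 in zip(A11, A12)]
bottom = [r1 + r2 for r1, r2 in zip(A21, A22)]
print(top + bottom == A)  # True
print(A22)                # [[0, 9], [3, 5]]
```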
Notice that $A_{11}$ and $A_{12}$ have the same number of columns as $A_{21}$ and $A_{22}$, respectively, and that $A_{11}$ and $A_{21}$ have the same number of rows as $A_{12}$ and $A_{22}$, respectively. This is no accident: a proper partitioning requires the horizontal and vertical dashed lines to run the full width and height of the matrix. In general terms, the partitioning of a generic matrix $A_{n\times k}$ into four submatrices as in (3.3.2) is called a $2 \times 2$ partitioning and can always be represented as
$$
A = \begin{pmatrix} K_{n_1\times k_1} & L_{n_1\times k_2} \\ M_{n_2\times k_1} & N_{n_2\times k_2} \end{pmatrix},
$$
where $n_1 + n_2 = n$ and $k_1 + k_2 = k$. Partitioning can involve fewer or more than four submatrices; for example, both

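$$
A = \left(\begin{array}{cc|cccc}
3 & 4 & 5 & 6 & 8 & 3 \\
4 & 7 & 9 & 1 & 4 & 2 \\
2 & 8 & 8 & 6 & 7 & 1 \\
4 & 3 & 0 & 7 & 0 & 9 \\
4 & 7 & 5 & 3 & 3 & 5
\end{array}\right) \tag{3.3.3}
$$
and
$$
A = \left(\begin{array}{cc|cccc}
3 & 4 & 5 & 6 & 8 & 3 \\
4 & 7 & 9 & 1 & 4 & 2 \\ \hline
2 & 8 & 8 & 6 & 7 & 1 \\
4 & 3 & 0 & 7 & 0 & 9 \\ \hline
4 & 7 & 5 & 3 & 3 & 5
\end{array}\right) \tag{3.3.4}
$$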
are legitimate partitionings of $A$. Partitioning a generic matrix $A_{n\times k}$ into two submatrices as in (3.3.3) is called a $1 \times 2$ partitioning and can always be represented, in general terms, as
$$
A = \begin{pmatrix} K_{n\times k_1} & L_{n\times k_2} \end{pmatrix},
$$
where $k_1 + k_2 = k$. Partitioning a matrix $A_{n\times k}$ into six submatrices as in (3.3.4) is called a $3 \times 2$ partitioning.
Exercise 3. 1) Formulate the $3 \times 2$ partitioning of a matrix $A_{n\times k}$ in general terms. 2) Find a $2 \times 1$ partitioning of $A$ of (3.3.1). 3) Formulate the $2 \times 1$ partitioning of a matrix $A_{n\times k}$ in general terms.
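Solution: 1)
$$
A_{n\times k} = \begin{pmatrix}
K_{n_1\times k_1} & L_{n_1\times k_2} \\
M_{n_2\times k_1} & N_{n_2\times k_2} \\
O_{n_3\times k_1} & P_{n_3\times k_2}
\end{pmatrix},
$$
with $n = n_1 + n_2 + n_3$ and $k = k_1 + k_2$.
2)
$$
A = \left(\begin{array}{cccccc}
3 & 4 & 5 & 6 & 8 & 3 \\
4 & 7 & 9 & 1 & 4 & 2 \\
2 & 8 & 8 & 6 & 7 & 1 \\
4 & 3 & 0 & 7 & 0 & 9 \\ \hline
4 & 7 & 5 & 3 & 3 & 5
\end{array}\right).
$$
3)
$$
A_{n\times k} = \begin{pmatrix} K_{n_1\times k} \\ L_{n_2\times k} \end{pmatrix},
$$
with $n = n_1 + n_2$.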
Example 4. Partitioning is important in regression analysis, for instance when we wish to keep the variables of interest distinct from the other regressors in the sample regressor matrix $X$. So, if in
$$
X = \begin{pmatrix}
0.1 & 1.2 & 1 \\
0.4 & 1.8 & 2 \\
0.6 & 1.8 & 3 \\
0.1 & 1.9 & 4 \\
0.3 & 1.7 & 5 \\
0.1 & 1.3 & 6
\end{pmatrix}
$$
the first two columns are the explanatory variables of interest, whereas the last is only a control variable, we may find it convenient to represent $X$ as a partitioned matrix $X = (X_1 \; X_2)$, where
$$
X_1 = \begin{pmatrix}
0.1 & 1.2 \\
0.4 & 1.8 \\
0.6 & 1.8 \\
0.1 & 1.9 \\
0.3 & 1.7 \\
0.1 & 1.3
\end{pmatrix}
\quad\text{and}\quad
X_2 = \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \\ 5 \\ 6 \end{pmatrix}.
$$
Remark 1. It is worth noting that $A$ of (3.3.1) and $A$ of (3.3.2) (or $A$ of (3.3.1) and $A$ of (3.3.3), for that matter) are the same $(5 \times 6)$ matrix from a mathematical point of view. Partitioning merely makes explicit some qualitative difference among the entries of the matrix, as in Example 4, where the entries of the first two columns, observations on the variables of interest, differ in role from the entries of the last column, observations on the control variable.
The transpose of a partitioned matrix can be carried out in two logical steps. First, transpose the matrix of submatrices as if its elements were scalars; second, transpose each submatrix. For example, let
$$
A = \begin{pmatrix} B & C \end{pmatrix},
$$
then
$$
A' = \begin{pmatrix} B' \\ C' \end{pmatrix};
$$
or let
$$
A = \begin{pmatrix} B & C & D \\ E & F & G \end{pmatrix},
$$
then
$$
A' = \begin{pmatrix} B' & E' \\ C' & F' \\ D' & G' \end{pmatrix}.
$$
The transpose of an $l \times m$ partitioned matrix is always an $m \times l$ partitioned matrix.
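Exercise 4. Transpose the partitioned matrix of (3.3.4) and verify that it is a $2 \times 3$ partitioning of the transpose of $A$ in (3.3.1).
Solution:
$$
A' = \left(\begin{array}{cc|cc|c}
3 & 4 & 2 & 4 & 4 \\
4 & 7 & 8 & 3 & 7 \\ \hline
5 & 9 & 8 & 0 & 5 \\
6 & 1 & 6 & 7 & 3 \\
8 & 4 & 7 & 0 & 3 \\
3 & 2 & 1 & 9 & 5
\end{array}\right)
= \begin{pmatrix}
3 & 4 & 2 & 4 & 4 \\
4 & 7 & 8 & 3 & 7 \\
5 & 9 & 8 & 0 & 5 \\
6 & 1 & 6 & 7 & 3 \\
8 & 4 & 7 & 0 & 3 \\
3 & 2 & 1 & 9 & 5
\end{pmatrix}.
$$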
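The two-step rule for transposing a partitioned matrix can be checked on Exercise 4. In this plain-Python sketch (the helper names `block` and `assemble` are illustrative), $A$ of (3.3.1) is cut into the $3 \times 2$ partitioning of (3.3.4), the matrix of submatrices is transposed, each submatrix is transposed, and the result is compared with $A'$ computed directly.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def block(M, rows, cols):
    return [[M[i][j] for j in cols] for i in rows]

def assemble(blocks):
    """Glue a matrix of submatrices (a list of block rows) into one matrix."""
    out = []
    for block_row in blocks:
        for r in range(len(block_row[0])):
            out.append([x for B in block_row for x in B[r]])
    return out

A = [[3, 4, 5, 6, 8, 3],
     [4, 7, 9, 1, 4, 2],
     [2, 8, 8, 6, 7, 1],
     [4, 3, 0, 7, 0, 9],
     [4, 7, 5, 3, 3, 5]]

row_blocks = [range(0, 2), range(2, 4), range(4, 5)]  # 3 block rows, as in (3.3.4)
col_blocks = [range(0, 2), range(2, 6)]               # 2 block columns
P = [[block(A, rb, cb) for cb in col_blocks] for rb in row_blocks]  # 3 x 2 partitioning

# Step 1: transpose the matrix of submatrices; step 2: transpose each submatrix.
Pt = [[transpose(P[i][j]) for i in range(len(P))] for j in range(len(P[0]))]  # 2 x 3

print(assemble(P) == A)              # True
print(assemble(Pt) == transpose(A))  # True
```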

3.4. Scalar multiplication


Let $\lambda$ be a scalar and $A = \{a_{ij}\}$ be a matrix of any order. The multiplication of $A$ by $\lambda$ is defined as the matrix
$$
\lambda A = \{\lambda a_{ij}\}.
$$
Example 5. Let
$$
A = \begin{pmatrix} 3 & 4 \\ 1 & 7 \end{pmatrix},
$$
then
$$
3.5A = \begin{pmatrix} 10.5 & 14 \\ 3.5 & 24.5 \end{pmatrix}.
$$

3.5. Vector multiplication


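Given two column vectors $\mathbf{a}$ and $\mathbf{b}$ of order $(n \times 1)$,
$$
\mathbf{a} = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}
\quad\text{and}\quad
\mathbf{b} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix},
$$
the multiplication of $\mathbf{a}$ by $\mathbf{b}$ is carried out through the inner product of the two vectors. The inner product of $\mathbf{a}$ and $\mathbf{b}$ is obtained by multiplying each element of the first vector by the corresponding element of the second vector and then summing all of the resulting products; hence it is itself a scalar. By a convention that will become clear when the matrix product is introduced, the first vector in the inner product is transposed. So, the inner product of $\mathbf{a}$ and $\mathbf{b}$ is written as
$$
\mathbf{a}'\mathbf{b} \equiv \begin{pmatrix} a_1 & a_2 & \cdots & a_n \end{pmatrix}
\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}
\equiv \sum_{i=1}^{n} a_i b_i.
$$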
It is evident that the inner product is commutative, that is $\mathbf{a}'\mathbf{b} = \mathbf{b}'\mathbf{a}$. The inner product of two column vectors of different order is not defined.
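A plain-Python sketch of the inner product, assuming vectors stored as lists; `inner` is a hypothetical name and the vectors are illustrative. Raising an error for vectors of different order mirrors the remark that the inner product is then not defined.

```python
def inner(a, b):
    """Inner product a'b = sum_i a_i b_i; vectors must have the same order."""
    if len(a) != len(b):
        raise ValueError("inner product not defined for vectors of different order")
    return sum(x * y for x, y in zip(a, b))

a = [1, 2, 3]
b = [4, 5, 6]
print(inner(a, b))                 # 1*4 + 2*5 + 3*6 = 32
print(inner(a, b) == inner(b, a))  # True: a'b = b'a
```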
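Exercise 5. Find the inner product of
$$
\mathbf{a} = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}
\quad\text{and}\quad
\mathbf{b} = \begin{pmatrix} 0.5 \\ 1 \\ 0.7 \\ -1 \\ 0.8 \end{pmatrix}.
$$
Solution: $\mathbf{a}'\mathbf{b} = 0.5 + 1 + 0.7 - 1 + 0.8 = 2$.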
3.6. Matrix multiplication


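The multiplication of a matrix $A$ by a matrix $B$ is defined only if the number of columns of $A$ is equal to the number of rows of $B$, that is if $A$ is of order $(n \times k)$ and $B$ is of order $(k \times m)$. If this is the case we say that $A$ and $B$ are conformable for the matrix multiplication of $A$ by $B$. Partition
$$
A = \begin{pmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1k} \\
a_{21} & a_{22} & a_{23} & \cdots & a_{2k} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nk}
\end{pmatrix}
$$
into its rows as
$$
A = \begin{pmatrix} \mathbf{a}_1' \\ \mathbf{a}_2' \\ \vdots \\ \mathbf{a}_n' \end{pmatrix},
$$
where
$$
\mathbf{a}_i' = \begin{pmatrix} a_{i1} & a_{i2} & a_{i3} & \cdots & a_{ik} \end{pmatrix}
$$
is the i.th row of $A$, $i = 1, \dots, n$. Then, partition
$$
B = \begin{pmatrix}
b_{11} & b_{12} & \cdots & b_{1m} \\
b_{21} & b_{22} & \cdots & b_{2m} \\
b_{31} & b_{32} & \cdots & b_{3m} \\
\vdots & \vdots & \ddots & \vdots \\
b_{k1} & b_{k2} & \cdots & b_{km}
\end{pmatrix}
$$
into its columns
$$
B = \begin{pmatrix} \mathbf{b}_1 & \mathbf{b}_2 & \cdots & \mathbf{b}_m \end{pmatrix},
$$
where
$$
\mathbf{b}_i = \begin{pmatrix} b_{1i} \\ b_{2i} \\ b_{3i} \\ \vdots \\ b_{ki} \end{pmatrix}
$$
is the i.th column of $B$, $i = 1, \dots, m$. Finally, define the matrix product of $A$ by $B$ as the $(n \times m)$ matrix of inner products
$$
AB = \begin{pmatrix}
\mathbf{a}_1'\mathbf{b}_1 & \mathbf{a}_1'\mathbf{b}_2 & \cdots & \mathbf{a}_1'\mathbf{b}_m \\
\mathbf{a}_2'\mathbf{b}_1 & \mathbf{a}_2'\mathbf{b}_2 & \cdots & \mathbf{a}_2'\mathbf{b}_m \\
\vdots & \vdots & \ddots & \vdots \\
\mathbf{a}_n'\mathbf{b}_1 & \mathbf{a}_n'\mathbf{b}_2 & \cdots & \mathbf{a}_n'\mathbf{b}_m
\end{pmatrix}.
$$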
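The definition of $AB$ as the matrix of inner products $\mathbf{a}_i'\mathbf{b}_j$ can be sketched in plain Python as follows, assuming matrices are lists of rows; `matmul` is a hypothetical helper, and `zip(*B)` yields the columns of $B$. As a check, it uses the vectors of Exercise 8.

```python
def matmul(A, B):
    """AB as the (n x m) matrix of inner products a_i' b_j:
    rows of A (order n x k) against columns of B (order k x m)."""
    k = len(A[0])
    if k != len(B):
        raise ValueError("A and B are not conformable for the product AB")
    cols = list(zip(*B))  # columns b_1, ..., b_m of B
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in A]

# Column vector and row vector of Exercise 8.
A = [[1], [5], [0], [1], [9]]   # order (5 x 1)
B = [[0, 2, 8, 5, 1]]           # order (1 x 5)

print(matmul(A, B)[1])  # [0, 10, 40, 25, 5]
print(matmul(B, A))     # [[24]]
```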
Clearly, $AB$ need not equal $BA$; indeed, $BA$ need not even be defined when $AB$ is. For $BA$ to be defined we need $m = n$. We have therefore proved the following property of the matrix product: if both $AB$ and $BA$ are defined, then both products are square matrices (of order $n$ and $k$, respectively). The matrix product $AB$ is described in words by saying that $A$ is postmultiplied by $B$ or, equivalently, that $B$ is premultiplied by $A$.
The identity and the null matrices play the same role as, respectively, the scalars 1 and 0 in scalar multiplication, so
$$
A_{n\times k} I_k = I_n A_{n\times k} = A_{n\times k}
$$
and
$$
A_{n\times k} 0_{k\times k} = 0_{n\times n} A_{n\times k} = 0_{n\times k}.
$$

Exercise 6. Given

1 2 3 7
5 6 0 2
0 0
1
0
A= 0 1 3 8 and B =

6 5
1 2 3 3
0 0
9 3 6 5

Is the matrix product $AB$ defined? If yes, compute it. Is the matrix product $BA$ defined? If yes, compute it.
Solution: $AB$ is defined and
$$
AB = \begin{pmatrix} 16 & 17 \\ 6 & 10 \\ 19 & 15 \\ 20 & 16 \\ 39 & 48 \end{pmatrix}.
$$
$BA$ is not defined.

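Exercise 7. Given
$$
A = \begin{pmatrix} 1 \\ 5 \\ 0 \\ 1 \\ 9 \end{pmatrix}
\quad\text{and}\quad
B = \begin{pmatrix} 0 & 2 & 8 \end{pmatrix},
$$
is the matrix product $AB$ defined? If yes, compute it. Is the matrix product $BA$ defined? If yes, compute it.
Solution: $AB$ is defined and
$$
AB = \begin{pmatrix} 0 & 2 & 8 \\ 0 & 10 & 40 \\ 0 & 0 & 0 \\ 0 & 2 & 8 \\ 0 & 18 & 72 \end{pmatrix}.
$$
$BA$ is not defined.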
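Exercise 8. Given
$$
A = \begin{pmatrix} 1 \\ 5 \\ 0 \\ 1 \\ 9 \end{pmatrix}
\quad\text{and}\quad
B = \begin{pmatrix} 0 & 2 & 8 & 5 & 1 \end{pmatrix},
$$
is the matrix product $AB$ defined? If yes, compute it. Is the matrix product $BA$ defined? If yes, compute it.
Solution: $AB$ is defined and
$$
AB = \begin{pmatrix}
0 & 2 & 8 & 5 & 1 \\
0 & 10 & 40 & 25 & 5 \\
0 & 0 & 0 & 0 & 0 \\
0 & 2 & 8 & 5 & 1 \\
0 & 18 & 72 & 45 & 9
\end{pmatrix}.
$$
$BA$ is defined, $BA = 24$.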
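3.6.1. The classical regression model in matrix form. Consider the classical regression model for a data set with $n$ observations
$$
y_i = \beta_1 x_{i,1} + \dots + \beta_k x_{i,k} + \varepsilon_i, \tag{3.6.1}
$$
where $y_i$ denotes the value of the dependent variable, $x_{i,h}$ the value of the h.th regressor, $h = 1, \dots, k$, and $\varepsilon_i$ the value of the random shock, all of them at the i.th observation, $i = 1, \dots, n$.
The $n$ equations (3.6.1) can be expressed compactly in matrix form as
$$
\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon},
$$
where
$$
\mathbf{y} = \begin{pmatrix} y_1 \\ \vdots \\ y_i \\ \vdots \\ y_n \end{pmatrix}, \quad
X = \begin{pmatrix}
x_{1,1} & \cdots & x_{1,h} & \cdots & x_{1,k} \\
\vdots & & \vdots & & \vdots \\
x_{i,1} & \cdots & x_{i,h} & \cdots & x_{i,k} \\
\vdots & & \vdots & & \vdots \\
x_{n,1} & \cdots & x_{n,h} & \cdots & x_{n,k}
\end{pmatrix}, \quad
\boldsymbol{\beta} = \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_h \\ \vdots \\ \beta_k \end{pmatrix}
$$
and $\boldsymbol{\varepsilon} = (\varepsilon_1 \; \dots \; \varepsilon_i \; \dots \; \varepsilon_n)'$, so that $\mathbf{y}$ and $\boldsymbol{\varepsilon}$ are $(n \times 1)$ column vectors, $X$ is an $(n \times k)$ matrix and $\boldsymbol{\beta}$ is a $(k \times 1)$ column vector.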
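To see that $\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ just stacks the $n$ scalar equations (3.6.1), the plain-Python sketch below builds $\mathbf{y}$ both ways. The data, coefficients and shocks are made up for illustration; only the structure follows the text, and `matvec` is a hypothetical helper.

```python
def matvec(X, beta):
    """X beta: each element is the inner product of a row of X with beta."""
    return [sum(x * b for x, b in zip(row, beta)) for row in X]

# Illustrative (made-up) data: n = 4 observations, k = 2 regressors
# (a constant and one variable), with hypothetical beta and shocks eps.
X = [[1, 2], [1, 3], [1, 5], [1, 7]]
beta = [10, 3]
eps = [1, -1, 0, 2]

y = [xb + e for xb, e in zip(matvec(X, beta), eps)]   # y = X beta + eps

# The same vector built equation by equation from (3.6.1).
y_scalar = [beta[0] * X[i][0] + beta[1] * X[i][1] + eps[i] for i in range(4)]
print(y)              # [17, 18, 25, 33]
print(y == y_scalar)  # True
```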

3.6.2. The transpose of the matrix product. Given two matrices $A$ and $B$ that are conformable for the matrix multiplication of $A$ by $B$, it holds that $(AB)' = B'A'$.

3.6.3. The laws of algebra. The associative law holds for matrix multiplication, provided that the matrices are conformable. Given $A_{n\times k}$, $B_{k\times l}$ and $C_{l\times m}$,
$$
(A_{n\times k} B_{k\times l}) C_{l\times m} = A_{n\times k} (B_{k\times l} C_{l\times m}) = A_{n\times k} B_{k\times l} C_{l\times m}.
$$
The distributive law also holds,
$$
A(B + C) = AB + AC,
$$
provided that, on the one hand, $A$ and $B$ are conformable for matrix multiplication and, on the other, $B$ and $C$ are conformable for matrix addition.
As already seen, the commutative law does not hold: it is not generally true that $AB = BA$, even if both products are defined. In special cases it may happen that $AB = BA$, as for the following matrices
$$
A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}
\quad\text{and}\quad
B = \begin{pmatrix} 0 & 2 \\ 3 & 3 \end{pmatrix}. \tag{3.6.2}
$$
When this happens we say that $A$ and $B$ commute in matrix product.
Exercise 9. Verify that $A$ and $B$ of (3.6.2) commute in matrix product.
Solution:
$$
AB = \begin{pmatrix} 6 & 8 \\ 12 & 18 \end{pmatrix} = BA.
$$
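A plain-Python check of Exercise 9, together with an illustrative pair that does not commute and a numerical instance of $(AB)' = B'A'$ from 3.6.2. The matrix `C` is an assumption chosen only for illustration.

```python
def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in A]

def transpose(M):
    return [list(col) for col in zip(*M)]

# A and B of (3.6.2): they commute.
A = [[1, 2], [3, 4]]
B = [[0, 2], [3, 3]]
print(matmul(A, B), matmul(B, A))  # [[6, 8], [12, 18]] twice

# A pair that does not commute (illustrative choice).
C = [[1, 1], [0, 1]]
print(matmul(A, C) == matmul(C, A))  # False

# Transpose of a product: (AC)' = C'A'.
print(transpose(matmul(A, C)) == matmul(transpose(C), transpose(A)))  # True
```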

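3.6.4. The cross-product matrix. Given a matrix $X$ of order $(n \times k)$ and a column vector $\mathbf{y}$ of order $n$, the square matrix of order $k$, $X'X$, is called the cross-product matrix for $X$,
$$
X'X = \begin{pmatrix}
\sum_{i=1}^{n} x_{i,1}^2 & \cdots & \sum_{i=1}^{n} x_{i,1}x_{i,h} & \cdots & \sum_{i=1}^{n} x_{i,1}x_{i,k} \\
\vdots & & \vdots & & \vdots \\
\sum_{i=1}^{n} x_{i,h}x_{i,1} & \cdots & \sum_{i=1}^{n} x_{i,h}^2 & \cdots & \sum_{i=1}^{n} x_{i,h}x_{i,k} \\
\vdots & & \vdots & & \vdots \\
\sum_{i=1}^{n} x_{i,k}x_{i,1} & \cdots & \sum_{i=1}^{n} x_{i,k}x_{i,h} & \cdots & \sum_{i=1}^{n} x_{i,k}^2
\end{pmatrix},
$$
and the column vector of order $k$, $X'\mathbf{y}$, is called the cross-product vector of $X$ and $\mathbf{y}$,
$$
X'\mathbf{y} = \begin{pmatrix}
\sum_{i=1}^{n} x_{i,1} y_i \\ \vdots \\ \sum_{i=1}^{n} x_{i,h} y_i \\ \vdots \\ \sum_{i=1}^{n} x_{i,k} y_i
\end{pmatrix}.
$$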
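The element-by-element formulas for $X'X$ and $X'\mathbf{y}$ can be compared with the matrix products they define. A plain-Python sketch with made-up data ($n = 4$, $k = 2$: a constant column and one regressor):

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in A]

# Illustrative (made-up) data: n = 4 observations, k = 2 regressors.
X = [[1, 2], [1, 3], [1, 5], [1, 7]]
y = [17, 18, 25, 33]
n, k = len(X), len(X[0])

# Element (h, l) of X'X is sum_i x_{i,h} x_{i,l}; element h of X'y is sum_i x_{i,h} y_i.
XtX = [[sum(X[i][h] * X[i][l] for i in range(n)) for l in range(k)] for h in range(k)]
Xty = [sum(X[i][h] * y[i] for i in range(n)) for h in range(k)]

# Same results via the matrix products X'X and X'y.
print(XtX == matmul(transpose(X), X))                                 # True
print(Xty == [r[0] for r in matmul(transpose(X), [[v] for v in y])])  # True
print(XtX)  # [[4, 17], [17, 87]]
```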

3.6.5. Cross-product matrices of partitioned matrices. Let $A$ be a $1 \times 2$ partitioned matrix, $A = (B \; C)$. Then, the cross-product matrix of $A$ is
$$
A'A = \begin{pmatrix} B' \\ C' \end{pmatrix} (B \; C)
= \begin{pmatrix} B'B & B'C \\ C'B & C'C \end{pmatrix}.
$$
Sometimes it may also be convenient to compute the cross-product matrix of $A'$ (also referred to as the outer product of $A$),
$$
AA' = (B \; C) \begin{pmatrix} B' \\ C' \end{pmatrix} = BB' + CC'.
$$
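A numerical check of the two block formulas above, in plain Python with illustrative blocks $B$ $(3 \times 2)$ and $C$ $(3 \times 1)$ chosen arbitrarily:

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in A]

def add(A, B):
    return [[a + b for a, b in zip(r, s)] for r, s in zip(A, B)]

# Illustrative blocks (made up): B is (3 x 2), C is (3 x 1), A = (B C) is (3 x 3).
B = [[1, 2], [0, 1], [3, 1]]
C = [[4], [2], [1]]
A = [rb + rc for rb, rc in zip(B, C)]

# A'A has blocks B'B, B'C (top) and C'B, C'C (bottom).
BtB, BtC = matmul(transpose(B), B), matmul(transpose(B), C)
CtB, CtC = matmul(transpose(C), B), matmul(transpose(C), C)
blocks = [r1 + r2 for r1, r2 in zip(BtB, BtC)] + [r1 + r2 for r1, r2 in zip(CtB, CtC)]
print(matmul(transpose(A), A) == blocks)  # True

# AA' = BB' + CC'.
print(matmul(A, transpose(A)) == add(matmul(B, transpose(B)), matmul(C, transpose(C))))  # True
```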

3.7. Trace of a matrix


The trace is defined only for square matrices. Given a square matrix $A = \{a_{ij}\}$, $i, j = 1, \dots, n$, the trace of $A$, denoted by $\mathrm{tr}(A)$, is the scalar defined as
$$
\mathrm{tr}(A) = \sum_{i=1}^{n} a_{ii}.
$$
In words, the trace of a matrix is the sum of all terms on its main diagonal.
Example 6. Given
$$
A = \begin{pmatrix} 1 & 5 & 8 \\ 2 & 3 & 9 \\ 6 & 4 & 10 \end{pmatrix},
$$
$\mathrm{tr}(A) = 1 + 3 + 10 = 14$.
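The properties of the trace operator can be verified by direct inspection:
(1) $\mathrm{tr}(A) = \mathrm{tr}(A')$;
(2) $\mathrm{tr}(A + B) = \mathrm{tr}(A) + \mathrm{tr}(B)$;
(3) $\mathrm{tr}(A_{n\times k} B_{k\times n}) = \mathrm{tr}(B_{k\times n} A_{n\times k})$.
Exercise 10. Verify the foregoing properties and prove that
$$
\mathrm{tr}(A_{n\times k} B_{k\times m} C_{m\times n}) = \mathrm{tr}(B_{k\times m} C_{m\times n} A_{n\times k}) = \mathrm{tr}(C_{m\times n} A_{n\times k} B_{k\times m})
$$
(the last claim is a direct implication of property (3)).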
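Solution: (1) follows from the fact that a square matrix has the same main diagonal as its transpose; (2) follows from the fact that the main diagonal of $A + B$ is the elementwise sum of the main diagonals of $A$ and $B$; (3)
$$
\mathrm{tr}(A_{n\times k} B_{k\times n}) \equiv \sum_{i=1}^{n}\sum_{j=1}^{k} a_{ij} b_{ji} = \sum_{j=1}^{k}\sum_{i=1}^{n} b_{ji} a_{ij} \equiv \mathrm{tr}(B_{k\times n} A_{n\times k}).
$$
Finally, by the associative law for matrix multiplication and property (3),
$$
\mathrm{tr}[A_{n\times k}(B_{k\times m} C_{m\times n})] = \mathrm{tr}[B_{k\times m}(C_{m\times n} A_{n\times k})] = \mathrm{tr}[C_{m\times n} A_{n\times k} B_{k\times m}].
$$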
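The cyclic property of Exercise 10 holds even when the three products have different orders. A plain-Python sketch with illustrative matrices $A$ $(2 \times 3)$, $B$ $(3 \times 2)$ and $C$ $(2 \times 2)$; `trace` and `matmul` are hypothetical helper names.

```python
def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in A]

def trace(M):
    """Sum of the main-diagonal elements; M must be square."""
    assert len(M) == len(M[0]), "trace is defined only for square matrices"
    return sum(M[i][i] for i in range(len(M)))

# Illustrative factors: A (2 x 3), B (3 x 2), C (2 x 2).
A = [[1, 2, 0], [3, 1, 4]]
B = [[2, 1], [0, 5], [1, 3]]
C = [[1, 2], [0, 1]]

t1 = trace(matmul(matmul(A, B), C))  # tr(ABC), a (2 x 2) product
t2 = trace(matmul(matmul(B, C), A))  # tr(BCA), a (3 x 3) product
t3 = trace(matmul(matmul(C, A), B))  # tr(CAB), a (2 x 2) product
print(t1, t2, t3)  # 42 42 42
```

Note that $\mathrm{tr}(ABC)$ and $\mathrm{tr}(CAB)$ are traces of $(2 \times 2)$ matrices and $\mathrm{tr}(BCA)$ of a $(3 \times 3)$ matrix, yet the three traces coincide.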
CHAPTER 4

Determinant of a matrix

4.1. Definition
Loosely speaking, the determinant is a function of the elements of a matrix and is defined only for square matrices. While the formal definition is cumbersome (readers are referred to Rao (1973), p. 22, or Searle (1982), p. 90), it is not strictly needed for our econometric interests. Below, I show how to obtain the determinant of square matrices of order 1, 2 and 3 and then provide a general computational rule that applies to any square matrix.

4.2. The determinant of matrices of order 1, 2 and 3


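Given a generic square matrix of order 1 (a singleton matrix)
$$
A_{1\times 1} = (a_{11}),
$$
its determinant, indicated as $\det(A_{1\times 1})$, is defined as
$$
\det(A_{1\times 1}) = a_{11}. \tag{4.2.1}
$$
Given a generic square matrix of order 2
$$
A_{2\times 2} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix},
$$
its determinant, indicated as $\det(A_{2\times 2})$, is defined as
$$
\det(A_{2\times 2}) = a_{11}a_{22} - a_{12}a_{21}. \tag{4.2.2}
$$
Given a generic square matrix of order 3
$$
A_{3\times 3} = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix},
$$
its determinant is defined as
$$
\det(A_{3\times 3}) = a_{11}a_{22}a_{33} + a_{21}a_{32}a_{13} + a_{12}a_{23}a_{31} - a_{13}a_{22}a_{31} - a_{23}a_{32}a_{11} - a_{12}a_{21}a_{33}. \tag{4.2.3}
$$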
The foregoing formula can be better understood (and memorized) by thinking of $A_{3\times 3}$ as made up of:
(1) its five diagonals, running from top left to bottom right: the main diagonal $(a_{11}, a_{22}, a_{33})$, the first lower sub-diagonal $(a_{21}, a_{32})$, the second lower sub-diagonal $(a_{31})$, and the first and second upper sub-diagonals $(a_{12}, a_{23})$ and $(a_{13})$, placed symmetrically to the corresponding lower ones;
(2) or, alternatively, its five anti-diagonals, running from top right to bottom left: the main anti-diagonal $(a_{13}, a_{22}, a_{31})$, the first lower sub-anti-diagonal $(a_{23}, a_{32})$, the second lower sub-anti-diagonal $(a_{33})$, the first upper sub-anti-diagonal $(a_{12}, a_{21})$ and the second upper sub-anti-diagonal $(a_{11})$.
Then, $\det(A_{3\times 3})$ is the sum of the product of the main-diagonal terms, $a_{11}a_{22}a_{33}$, the product of the first lower sub-diagonal terms and the second upper sub-diagonal term, $a_{21}a_{32}a_{13}$, and the product of the first upper sub-diagonal terms and the second lower sub-diagonal term, $a_{12}a_{23}a_{31}$, minus the sum of the products taken in the same way along the anti-diagonals: the product of the main anti-diagonal terms, $a_{13}a_{22}a_{31}$, the product of the first lower sub-anti-diagonal terms and the second upper sub-anti-diagonal term, $a_{23}a_{32}a_{11}$, and the product of the first upper sub-anti-diagonal terms and the second lower sub-anti-diagonal term, $a_{12}a_{21}a_{33}$.
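Formula (4.2.3), read as "diagonal products minus anti-diagonal products", can be sketched as a small Python function; `det3` is a hypothetical name and the test matrix is illustrative. The rule applies to order 3 only and does not extend to larger matrices.

```python
def det3(a):
    """Determinant of a (3 x 3) matrix by formula (4.2.3): products along the
    diagonals minus products along the anti-diagonals."""
    (a11, a12, a13), (a21, a22, a23), (a31, a32, a33) = a
    diagonals = a11 * a22 * a33 + a21 * a32 * a13 + a12 * a23 * a31
    anti_diagonals = a13 * a22 * a31 + a23 * a32 * a11 + a12 * a21 * a33
    return diagonals - anti_diagonals

M = [[2, 1, 0],
     [1, 3, 1],
     [0, 1, 4]]   # illustrative matrix
print(det3(M))    # 24 - (0 + 2 + 4) = 18
```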
Exercise 11. Compute the determinant of the matrix

1 1 2
A= 3 5 4
1 0 2
Solution: det=18.

4.3. The cofactor expansion


I now describe a general procedure to compute the determinant of a generic square matrix $A$ of order $n$ recursively. Consider any row (or column) of $A$, say the i.th row, and multiply each element $a_{ij}$ of this row by its minor, $\det(A_{ij})$, that is the determinant of the matrix derived from $A$ by crossing out the i.th row and the j.th column (the row and column containing $a_{ij}$). Then, multiply each term $a_{ij}\det(A_{ij})$ by $(-1)^{i+j}$. The sum of all resulting products is $\det(A)$:
$$
\det(A) = \sum_{j=1}^{n} (-1)^{i+j} a_{ij} \det(A_{ij}).
$$
The scalar term
$$
c_{ij} = (-1)^{i+j} \det(A_{ij}) \tag{4.3.1}
$$
is called a cofactor of $A$, specifically the cofactor of element $a_{ij}$, and for this reason the present procedure is called a cofactor expansion of the determinant.
A square matrix with zero determinant is said to be singular.
Remark 2. Notice that the number of factors in each product of $\det(A)$ is equal to $n$, each product contains one and only one element from every row and every column of $A$, and the number of products to add up is $n! \equiv 1 \cdot 2 \cdots n$. Therefore, a square matrix with either a null column or a null row is singular: every product contains a factor from that row or column and is therefore zero.
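The cofactor expansion is naturally recursive: each step reduces the order by one until (4.2.1) applies. A plain-Python sketch expanding along the first row (`minor` and `det` are hypothetical helper names; indices are zero-based, so the sign $(-1)^{1+(j+1)}$ becomes `(-1) ** j`):

```python
def minor(A, i, j):
    """Matrix obtained from A by crossing out row i and column j (zero-based)."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(A) if r != i]

def det(A):
    """Determinant by cofactor expansion along the first row, applied recursively."""
    n = len(A)
    if n == 1:
        return A[0][0]                      # formula (4.2.1)
    # Zero-based (0, j) is (1, j+1) in the text, so (-1)^(1 + j + 1) = (-1)^j.
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(n))

M = [[2, 1, 0],
     [1, 3, 1],
     [0, 1, 4]]
print(det(M))  # 18, the same value formula (4.2.3) gives

# A square matrix with a null row is singular (Remark 2).
print(det([[1, 2, 3], [0, 0, 0], [4, 5, 6]]))  # 0
```

Because it evaluates $n!$ products (Remark 2), this approach is only practical for small matrices; numerical libraries rely on matrix factorizations instead.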
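Exercise 12. Consider
$$
A_{3\times 3} = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}.
$$
Obtain $\det(A_{3\times 3})$ by applying a cofactor expansion and formula (4.2.2), and verify that it is identical to that in (4.2.3).
Solution:
$$
\begin{aligned}
\det(A_{3\times 3}) &= (-1)^2 a_{11}(a_{22}a_{33} - a_{23}a_{32}) + (-1)^3 a_{12}(a_{21}a_{33} - a_{23}a_{31}) + (-1)^4 a_{13}(a_{21}a_{32} - a_{22}a_{31}) \\
&= a_{11}a_{22}a_{33} - a_{11}a_{23}a_{32} - a_{12}a_{21}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{13}a_{22}a_{31}.
\end{aligned}
$$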
Exercise 13. Consider
$$
A_{2\times 2} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}.
$$
Obtain $\det(A_{2\times 2})$ by applying a cofactor expansion and formula (4.2.1), and verify that it is identical to that in (4.2.2).
Solution:
$$
\det(A_{2\times 2}) = (-1)^2 a_{11}a_{22} + (-1)^3 a_{12}a_{21} = a_{11}a_{22} - a_{12}a_{21}.
$$

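Exercise 14. Compute the determinant of the generic square matrix of order 4
$$
A_{4\times 4} = \begin{pmatrix}
a_{11} & a_{12} & a_{13} & a_{14} \\
a_{21} & a_{22} & a_{23} & a_{24} \\
a_{31} & a_{32} & a_{33} & a_{34} \\
a_{41} & a_{42} & a_{43} & a_{44}
\end{pmatrix}
$$
by a cofactor expansion and formula (4.2.3).
Solution: Consider the first row and apply the general rule:
$$
\begin{aligned}
\det(A_{4\times 4}) = {} & (-1)^2 a_{11}(a_{22}a_{33}a_{44} + a_{32}a_{43}a_{24} + a_{23}a_{34}a_{42} - a_{24}a_{33}a_{42} - a_{34}a_{43}a_{22} - a_{23}a_{32}a_{44}) \\
& + (-1)^3 a_{12}(a_{21}a_{33}a_{44} + a_{31}a_{43}a_{24} + a_{23}a_{34}a_{41} - a_{24}a_{33}a_{41} - a_{34}a_{43}a_{21} - a_{23}a_{31}a_{44}) \\
& + (-1)^4 a_{13}(a_{21}a_{32}a_{44} + a_{31}a_{42}a_{24} + a_{22}a_{34}a_{41} - a_{24}a_{32}a_{41} - a_{34}a_{42}a_{21} - a_{22}a_{31}a_{44}) \\
& + (-1)^5 a_{14}(a_{21}a_{32}a_{43} + a_{31}a_{42}a_{23} + a_{22}a_{33}a_{41} - a_{23}a_{32}a_{41} - a_{33}a_{42}a_{21} - a_{22}a_{31}a_{43}).
\end{aligned}
$$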
4.4. Properties
I spell out next some of the properties of the determinant function; for an exhaustive list, the reader is referred to Searle (1982), pp. 92-99.
(1) $\det(A) = \det(A')$.
(2) $\det(AB) = \det(A)\det(B)$, for $A$ and $B$ square matrices of the same order.
(3) If $A$ contains null rows or columns, then $\det(A) = 0$.
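The three properties can be checked numerically with a recursive cofactor expansion. A plain-Python sketch with illustrative integer matrices, so that all comparisons are exact (helper names are hypothetical):

```python
def minor(A, i, j):
    return [row[:j] + row[j + 1:] for r, row in enumerate(A) if r != i]

def det(A):
    """Cofactor expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in A]

A = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]   # illustrative, det(A) = 18
B = [[1, 2, 0], [0, 1, 0], [3, 0, 1]]   # illustrative, det(B) = 1

print(det(A) == det(transpose(A)))             # property (1)
print(det(matmul(A, B)) == det(A) * det(B))    # property (2)
print(det([[1, 0, 2], [4, 0, 5], [7, 0, 8]]))  # property (3): null column -> 0
```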
CHAPTER 5

Linearly independent vectors

5.1. Linear combinations of vectors


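Given a matrix $A_{n\times k}$ partitioned into its columns $\mathbf{a}_i$, $i = 1, \dots, k$,
$$
A_{n\times k} = \begin{pmatrix} \mathbf{a}_1 & \mathbf{a}_2 & \cdots & \mathbf{a}_k \end{pmatrix},
$$
and a vector
$$
\mathbf{c}_{k\times 1} = \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_k \end{pmatrix},
$$
a linear combination of the columns of $A_{n\times k}$ is given by the $(n \times 1)$ column vector
$$
A_{n\times k}\mathbf{c}_{k\times 1} = c_1\mathbf{a}_1 + c_2\mathbf{a}_2 + \dots + c_k\mathbf{a}_k.
$$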
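Example 7. Let
$$
A = \begin{pmatrix} 2 & 1 \\ 4 & 3 \\ 6 & 5 \end{pmatrix}
\quad\text{and}\quad
\mathbf{c} = \begin{pmatrix} 0.5 \\ 2 \end{pmatrix};
$$
then
$$
A\mathbf{c} = 0.5\begin{pmatrix} 2 \\ 4 \\ 6 \end{pmatrix} + 2\begin{pmatrix} 1 \\ 3 \\ 5 \end{pmatrix}.
$$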
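Example 8. Suppose you have data on gross income, $\mathbf{y}$, and income tax payments, $\mathbf{t}$, for a cross-section of $n$ households, so that
$$
\mathbf{y} = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \quad
\mathbf{t} = \begin{pmatrix} t_1 \\ \vdots \\ t_n \end{pmatrix}.
$$
Then, net income, $\mathbf{x}$, is a linear combination of $\mathbf{y}$ and $\mathbf{t}$:
$$
\mathbf{x} = \begin{pmatrix} y_1 & t_1 \\ \vdots & \vdots \\ y_n & t_n \end{pmatrix}
\begin{pmatrix} 1 \\ -1 \end{pmatrix} = \mathbf{y} - \mathbf{t}.
$$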

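Along the foregoing lines, $A_{n\times k}C_{k\times m}$ is an $(n \times m)$ matrix of $m$ linear combinations of the columns of $A_{n\times k}$, where
$$
C_{k\times m} = \begin{pmatrix} \mathbf{c}_1 & \mathbf{c}_2 & \cdots & \mathbf{c}_m \end{pmatrix}
\quad\text{and}\quad
\mathbf{c}_i = \begin{pmatrix} c_{1i} \\ c_{2i} \\ \vdots \\ c_{ki} \end{pmatrix}, \; i = 1, \dots, m.
$$
This can be seen more clearly by expanding $A_{n\times k}C_{k\times m}$ as follows
$$
A_{n\times k}C_{k\times m} = \begin{pmatrix} A_{n\times k}\mathbf{c}_1 & A_{n\times k}\mathbf{c}_2 & \cdots & A_{n\times k}\mathbf{c}_m \end{pmatrix}.
$$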
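A plain-Python sketch of $A\mathbf{c}$ as a weighted sum of columns, using the entries of Example 7 as printed, and of $AC$ as $m$ such combinations side by side. The helper `combine_columns` is hypothetical and the second column of `C` is an illustrative addition.

```python
def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in A]

def combine_columns(A, c):
    """c_1 a_1 + ... + c_k a_k, built column by column."""
    n, k = len(A), len(A[0])
    return [sum(c[j] * A[i][j] for j in range(k)) for i in range(n)]

# A and c of Example 7 (entries as printed).
A = [[2, 1], [4, 3], [6, 5]]
c = [0.5, 2]
print(combine_columns(A, c))  # [3.0, 8.0, 13.0]

# A C stacks the m linear combinations A c_1, ..., A c_m side by side.
C = [[0.5, 1], [2, -1]]   # second column is an illustrative extra combination
AC = matmul(A, C)
print([row[0] for row in AC] == combine_columns(A, [0.5, 2]))  # True
print([row[1] for row in AC] == combine_columns(A, [1, -1]))   # True
```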
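Exercise 15. Given a matrix $A_{n\times k}$, represent in matrix form the two cases of one and of $m$ generic linear combinations of the rows of $A_{n\times k}$.
Solution: Let
$$
A_{n\times k} = \begin{pmatrix} \mathbf{a}_1' \\ \mathbf{a}_2' \\ \vdots \\ \mathbf{a}_n' \end{pmatrix},
$$
where $\mathbf{a}_i' = (a_{i1} \; \dots \; a_{ij} \; \dots \; a_{ik})$, $i = 1, \dots, n$, and
$$
\mathbf{c}' = \begin{pmatrix} c_1 & \cdots & c_n \end{pmatrix};
$$
then a generic linear combination of the rows of $A$ can be expressed as
$$
\mathbf{c}'A_{n\times k} = c_1\mathbf{a}_1' + c_2\mathbf{a}_2' + \dots + c_n\mathbf{a}_n'.
$$
Let
$$
C_{m\times n} = \begin{pmatrix} \mathbf{c}_1' \\ \mathbf{c}_2' \\ \vdots \\ \mathbf{c}_m' \end{pmatrix}
\quad\text{and}\quad
\mathbf{c}_i' = \begin{pmatrix} c_{i1} & \cdots & c_{in} \end{pmatrix}, \; i = 1, \dots, m;
$$
then $m$ generic linear combinations of the rows of $A$ can be expressed as
$$
C_{m\times n}A_{n\times k} = \begin{pmatrix} \mathbf{c}_1'A_{n\times k} \\ \mathbf{c}_2'A_{n\times k} \\ \vdots \\ \mathbf{c}_m'A_{n\times k} \end{pmatrix}.
$$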

5.2. Linearly independent vectors


The columns of $A_{n\times k}$ are said to be linearly independent if and only if there exists no non-null vector $\mathbf{c}_{k\times 1}$ such that $A_{n\times k}\mathbf{c}_{k\times 1} = \mathbf{0}_{n\times 1}$. Otherwise, they are said to be linearly dependent. It is easy to prove that any set of vectors containing the null vector is a set of linearly dependent vectors (see Exercise 17). The following exercise shows an important property of sets of linearly independent vectors.
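Exercise 16. Prove that if the vectors $\mathbf{a}_1, \dots, \mathbf{a}_k$ are linearly independent, so are $\mathbf{a}_1, \dots, \mathbf{a}_i$, for any integer $i < k$.
Solution: Assume there exists a non-null $\mathbf{c}_{i\times 1}$ such that $c_1\mathbf{a}_1 + \dots + c_i\mathbf{a}_i = \mathbf{0}_{n\times 1}$, and let $A_{n\times k} = (\mathbf{a}_1 \; \dots \; \mathbf{a}_i \; \dots \; \mathbf{a}_k)$ and $\mathbf{b}_{k\times 1} = (\mathbf{c}_{i\times 1}' \;\; \mathbf{0}_{(k-i)\times 1}')'$. Then,
$$
A_{n\times k}\mathbf{b}_{k\times 1} = \mathbf{0}_{n\times 1},
$$
contradicting that $\mathbf{a}_1, \dots, \mathbf{a}_k$ are linearly independent, since $\mathbf{b}_{k\times 1}$ is non-null by construction.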
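The following three exercises show important properties of sets of linearly dependent vectors.
Exercise 17. Given the matrix
$$
A_{n\times k} = \begin{pmatrix} \mathbf{a}_1 & \mathbf{a}_2 & \cdots & \mathbf{a}_k \end{pmatrix},
$$
prove that if any one of the columns $\mathbf{a}_i$, $i = 1, \dots, k$, is the null vector, then $\mathbf{a}_1, \mathbf{a}_2, \dots, \mathbf{a}_k$ are linearly dependent.
Solution: Let $\mathbf{a}_1 = \mathbf{0}_{n\times 1}$ (the case of any other null column is analogous) and
$$
\mathbf{c} = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix},
$$
so that $\mathbf{c}$ is non-null and $A_{n\times k}\mathbf{c} = \mathbf{0}_{n\times 1}$.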
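Exercise 18. Consider a matrix
$$
A_{n\times k} = \begin{pmatrix} \mathbf{a}_1 & \mathbf{a}_2 & \cdots & \mathbf{a}_k \end{pmatrix}
$$
whose columns are non-zero and linearly dependent. Prove that at least one column of $A$ can be obtained as a linear combination of its predecessors.
Solution: There exists a non-zero $\mathbf{x}_{k\times 1}$ such that $A_{n\times k}\mathbf{x}_{k\times 1} = \mathbf{0}_{n\times 1}$, or
$$
x_1\mathbf{a}_1 + x_2\mathbf{a}_2 + \dots + x_k\mathbf{a}_k = \mathbf{0}_{n\times 1}, \tag{5.2.1}
$$
and, since all vectors are non-zero, at least two components of $\mathbf{x}_{k\times 1}$ are non-zero. Among all non-zero components of $\mathbf{x}_{k\times 1}$, take the one with the highest subscript, say $x_i$ (necessarily $i \geq 2$), and solve (5.2.1) for $\mathbf{a}_i$:
$$
\mathbf{a}_i = -\frac{1}{x_i}\left(x_1\mathbf{a}_1 + \dots + x_{i-1}\mathbf{a}_{i-1}\right).
$$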

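Exercise 19. Consider a matrix
$$
A_{n\times k} = \begin{pmatrix} \mathbf{a}_1 & \mathbf{a}_2 & \cdots & \mathbf{a}_k \end{pmatrix}
$$
and prove that if a column of $A_{n\times k}$, say $\mathbf{a}_i$, is a linear combination of some other columns of $A_{n\times k}$, say $\mathbf{a}_{i_1}, \dots, \mathbf{a}_{i_{k'}}$, $k' < k$, then the columns of $A_{n\times k}$ are a set of linearly dependent vectors.
Solution: There exists a vector $\mathbf{x}_{k'\times 1}$ such that
$$
x_1\mathbf{a}_{i_1} + \dots + x_{k'}\mathbf{a}_{i_{k'}} = \mathbf{a}_i.
$$
Therefore,
$$
\begin{pmatrix} \mathbf{a}_{i_1} & \mathbf{a}_{i_2} & \cdots & \mathbf{a}_{i_{k'}} & \mathbf{a}_i \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_{k'} \\ -1 \end{pmatrix} = \mathbf{0}_{n\times 1},
$$
where the coefficient vector is non-null because of its last element. Setting to zero the coefficients of the remaining columns of $A_{n\times k}$ yields a non-null vector $\mathbf{c}$ with $A_{n\times k}\mathbf{c} = \mathbf{0}_{n\times 1}$.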
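In Example 8, the column vector $\mathbf{x}$ is a linear combination of $\mathbf{y}$ and $\mathbf{t}$. Therefore, by Exercise 19, the columns of the data matrix $(\mathbf{y} \; \mathbf{t} \; \mathbf{x})$ are a set of linearly dependent vectors. This can be verified directly by noting that $(\mathbf{y} \; \mathbf{t} \; \mathbf{x})\mathbf{c} = \mathbf{0}$, with $\mathbf{c} = (1 \; -1 \; -1)'$.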
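With made-up numbers for Example 8 (four households; the figures are purely illustrative), the dependency among the columns of $(\mathbf{y} \; \mathbf{t} \; \mathbf{x})$ shows up when computing $(\mathbf{y} \; \mathbf{t} \; \mathbf{x})\mathbf{c}$ for $\mathbf{c} = (1 \; -1 \; -1)'$ in plain Python:

```python
# Illustrative (made-up) data: gross income y and tax t for 4 households.
y = [100, 80, 120, 60]
t = [20, 12, 30, 6]
x = [yi - ti for yi, ti in zip(y, t)]   # net income x = y - t

D = [list(r) for r in zip(y, t, x)]     # data matrix (y t x), order (4 x 3)
c = [1, -1, -1]                         # non-null vector

Dc = [sum(cj * dij for cj, dij in zip(c, row)) for row in D]
print(Dc)  # [0, 0, 0, 0]: the columns of (y t x) are linearly dependent
```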
Remark 3. The same analysis can be conducted in terms of row
vectors.
CHAPTER 6

Rank

6.1. Definition
Given a matrix $A$, the rank of $A$, $\mathrm{rank}(A)$, is defined as the maximal number of linearly independent columns of $A$. $A_{n\times k}$ is said to be of full column rank (f.c.r.) if and only if $\mathrm{rank}(A_{n\times k}) = k$.
It follows that if $A$ has f.c.r. and $\mathbf{c}$ is a conformable non-null column vector, then $A\mathbf{c} = \mathbf{b} \neq \mathbf{0}$ (otherwise the columns of $A$ would be linearly dependent).

6.2. Maximal number of linearly independent vectors


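The fundamental question arises about the maximal number of vectors that can be contained in a set of linearly independent vectors of order $n$. A first part of the answer is that it cannot be smaller than $n$. This is proved at once upon noting that the $n$ columns of the identity matrix of order $n$,
$$
I_n = \begin{pmatrix} \mathbf{e}_1 & \cdots & \mathbf{e}_n \end{pmatrix},
$$
are linearly independent (this is obvious, since if $\mathbf{x} \neq \mathbf{0}$ then $I_n\mathbf{x} = \mathbf{x} \neq \mathbf{0}$). What follows proves that a set of linearly independent vectors of order $n$ cannot contain more than $n$ vectors.
Let $\mathbf{a}_1, \dots, \mathbf{a}_n$ be linearly independent column vectors of order $n$. Now, consider any non-null column vector
$$
\mathbf{a}_0 = \begin{pmatrix} a_{1,0} & a_{2,0} & \cdots & a_{n,0} \end{pmatrix}';
$$
it can be represented as
$$
\mathbf{a}_0 = a_{1,0}\mathbf{e}_1 + a_{2,0}\mathbf{e}_2 + \dots + a_{n,0}\mathbf{e}_n. \tag{6.2.1}
$$
Consider the set of vectors
$$
\begin{pmatrix} \mathbf{a}_1 & \mathbf{e}_1 & \cdots & \mathbf{e}_n \end{pmatrix}.
$$
Since
$$
\mathbf{a}_1 = I_n\mathbf{a}_1 = \begin{pmatrix} \mathbf{e}_1 & \cdots & \mathbf{e}_n \end{pmatrix} \begin{pmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{n1} \end{pmatrix},
$$
$(\mathbf{a}_1 \; \mathbf{e}_1 \; \dots \; \mathbf{e}_n)$ is a set of linearly dependent non-null vectors (see Exercise 19), and so, by Exercise 18, some vector in it can be written as a linear combination of its predecessors; since $\mathbf{a}_1$ has no predecessors, this vector is some $\mathbf{e}_i$. Then, $\mathbf{e}_i$ can be replaced out in equation (6.2.1) to get
$$
\mathbf{a}_0 = b_{1,0}\mathbf{a}_1 + c_{1,0}\mathbf{e}_1 + \dots + c_{i-1,0}\mathbf{e}_{i-1} + a_{i+1,0}\mathbf{e}_{i+1} + \dots + a_{n,0}\mathbf{e}_n. \tag{6.2.2}
$$
Consider the vectors
$$
\begin{pmatrix} \mathbf{a}_2 & \mathbf{a}_1 & \mathbf{e}_1 & \cdots & \mathbf{e}_{i-1} & \mathbf{e}_{i+1} & \cdots & \mathbf{e}_n \end{pmatrix}.
$$
Since $\mathbf{a}_2$ is a linear combination of $\mathbf{e}_1, \dots, \mathbf{e}_i, \dots, \mathbf{e}_n$ and $\mathbf{e}_i$ in turn is a linear combination of $\mathbf{a}_1, \mathbf{e}_1, \dots, \mathbf{e}_{i-1}$, $\mathbf{a}_2$ can be expressed as a linear combination of $\mathbf{a}_1, \mathbf{e}_1, \dots, \mathbf{e}_{i-1}, \mathbf{e}_{i+1}, \dots, \mathbf{e}_n$, which makes $(\mathbf{a}_2 \; \mathbf{a}_1 \; \mathbf{e}_1 \; \dots \; \mathbf{e}_{i-1} \; \mathbf{e}_{i+1} \; \dots \; \mathbf{e}_n)$ a set of linearly dependent vectors. So, one more vector $\mathbf{e}_h$ can be replaced out in equation (6.2.2). The process of adding $\mathbf{a}$'s and excluding $\mathbf{e}$'s (once included, an $\mathbf{a}$ is never excluded, since its predecessors are only $\mathbf{a}$'s and all $\mathbf{a}$'s are linearly independent) continues until $\mathbf{a}_0$ is written as a linear combination of $\mathbf{a}_1, \dots, \mathbf{a}_n$. By Exercise 19, $\mathbf{a}_0, \mathbf{a}_1, \dots, \mathbf{a}_n$ are then linearly dependent; since $\mathbf{a}_0$ is arbitrary, no set of $n + 1$ vectors of order $n$ is linearly independent, which shows that the maximal number of linearly independent vectors of order $n$ is $n$.
As an immediate implication, if $A_{n\times k}$ is of f.c.r. then $k \leq n$.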

6.3. Row rank


It is possible to formulate the concept of rank in terms of row vectors, defining $\operatorname{rank}(A)$ as the maximum number of linearly independent rows of A. Indeed, another implication of the result proved in the preceding section is that this definition is equivalent to that of Section 6.1, that is, the maximum number of linearly independent columns in A equals the maximum number of linearly independent rows (for a formal proof see Searle (1982), pp. 169-170). For this reason a square matrix of f.c.r. is simply said to be of full rank (f.r.).
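The equality of row rank and column rank can be checked on a small example. In the sketch below (an illustration under stated assumptions, not the proof cited above), `rank` counts the pivot columns found by Gaussian elimination, which is the maximal number of linearly independent columns; applying it to $A'$, whose columns are the rows of A, gives the row rank. The helper names and the example matrix are hypothetical.

```python
from fractions import Fraction


def rank(rows):
    """Number of pivot columns after Gaussian elimination (exact rational arithmetic)."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue  # column c depends on the pivot columns to its left
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def transpose(a):
    return [list(col) for col in zip(*a)]


# A hypothetical (3 x 4) matrix: the second row is twice the first.
A = [[1, 2, 3, 4],
     [2, 4, 6, 8],
     [1, 0, 1, 0]]

col_rank = rank(A)             # maximal number of independent columns of A
row_rank = rank(transpose(A))  # columns of A' are the rows of A
print(col_rank, row_rank)      # 2 2
```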
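Exercise 20. Prove that a square matrix of f.r. contains neither null rows nor null columns (hint: use the result that the maximum number of linearly independent columns in a matrix equals that of linearly independent rows, and exercise 17).
Solution: That null columns are not admissible has already been proved in exercise 17. For null rows, let A be a square matrix of order n whose first row is null (placing the null row first entails no loss of generality),
$A = \begin{pmatrix} 0' \\ a_2' \\ \vdots \\ a_n' \end{pmatrix}$,
and
$c' = \begin{pmatrix} 1 & 0 & \dots & 0 \end{pmatrix}$,
so that $c'A = 0'$ with $c \neq 0$. The rows of A are then linearly dependent, so that $\operatorname{rank}(A) < n$, contradicting that A has f.r.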

6.4. The rank of a product matrix


The following result concerning the rank of a product matrix is also
important.
Given two matrices A and B conformable for the product AB, then
(6.4.1) $\operatorname{rank}(AB) \le \min[\operatorname{rank}(A), \operatorname{rank}(B)]$.
For a proof, see Searle (1982), pp. 196-197.
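Inequality (6.4.1) is easy to see on small examples: the $(3 \times 3)$ product of a $(3 \times 2)$ and a $(2 \times 3)$ matrix can never have rank 3. The sketch below checks the bound, assuming exact arithmetic via Python's `fractions` module; the matrices and helper names are hypothetical.

```python
from fractions import Fraction


def rank(rows):
    """Rank by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def matmul(a, b):
    """Product of two conformable matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


A = [[1, 0], [0, 1], [1, 1]]    # (3 x 2), rank 2
B1 = [[1, 2, 3], [2, 4, 6]]     # (2 x 3), rank 1
B2 = [[1, 0, 1], [0, 1, 1]]     # (2 x 3), rank 2

for B in (B1, B2):
    AB = matmul(A, B)           # (3 x 3)
    bound = min(rank(A), rank(B))
    print(rank(AB), "<=", bound)  # 1 <= 1, then 2 <= 2
    assert rank(AB) <= bound
```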

6.5. Systems of linear equations


6.5.1. Non-homogeneous systems. The non-homogeneous system of n linear equations in n unknowns
$a_{11} x_1 + \dots + a_{1n} x_n = b_1$
$\vdots$
$a_{i1} x_1 + \dots + a_{in} x_n = b_i$
$\vdots$
$a_{n1} x_1 + \dots + a_{nn} x_n = b_n$,
can always be represented in matrix form
(6.5.1) $Ax = b$,
where A is any given non-null square matrix of order n and b is any given non-null $(n \times 1)$ vector.
Now, I prove that there exists a unique $(n \times 1)$ vector $x \neq 0$, solution to the system in (6.5.1), if, and only if, A has full rank (f.r.), i.e. $\operatorname{rank}(A) = n$.
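I start with the sufficiency part and suppose $\operatorname{rank}(A) = n$. The existence of $x \neq 0$ follows from the fact that, in light of the result of Section 6.2, the n + 1 columns of the partitioned matrix $\begin{pmatrix} A & b \end{pmatrix}$, all of order n, are a set of linearly dependent vectors, so there exists a non-null $[(n+1) \times 1]$ vector $\begin{pmatrix} \tilde{x}' & c \end{pmatrix}'$, with $\tilde{x} \neq 0$ and $c \neq 0$, such that
$\begin{pmatrix} A & b \end{pmatrix} \begin{pmatrix} \tilde{x} \\ c \end{pmatrix} = 0$,
or equivalently
$A\tilde{x} + bc = 0$
(that both $\tilde{x}$ and c are non-null follows from the fact that b is non-null and A has f.r.: if $c = 0$, then $A\tilde{x} = 0$ with $\tilde{x} \neq 0$; if $\tilde{x} = 0$, then $bc = 0$ with $c \neq 0$). Hence, taking $x = -(1/c)\tilde{x}$, so that $Ax = b$, establishes existence.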
Uniqueness is easily proved as follows. Let
$b = Ax_1 = Ax_2$;
then $A(x_1 - x_2) = 0$, and since A has f.r. it follows that $x_1 = x_2$.
To prove necessity, assume that there exists a unique $(n \times 1)$ vector $x_1 \neq 0$, solution to the system in (6.5.1). Suppose, by contradiction, that A is not of f.r. Then, there exists a vector $x_2 \neq 0$ such that $Ax_2 = 0$. Hence, all vectors $x_1 + a x_2$, for any real scalar a, are solutions to (6.5.1), contradicting the uniqueness of $x_1$.
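For a square matrix of f.r., the unique solution of $Ax = b$ can be computed by Gauss-Jordan elimination. The following is a minimal sketch in exact rational arithmetic (Python's `fractions` module); the function name `solve` and the example system are hypothetical. It raises an error exactly when it finds a column that depends on the preceding ones, i.e. when A is not of f.r.

```python
from fractions import Fraction


def solve(a, b):
    """Solve A x = b by Gauss-Jordan elimination; A must be square of full rank."""
    n = len(a)
    # augmented matrix (A | b) in exact rational arithmetic
    m = [[Fraction(x) for x in row] + [Fraction(bi)] for row, bi in zip(a, b)]
    for c in range(n):
        pivot = next((i for i in range(c, n) if m[i][c] != 0), None)
        if pivot is None:
            # column c is a combination of the previous columns: rank(A) < n
            raise ValueError("A is not of full rank: no unique solution")
        m[c], m[pivot] = m[pivot], m[c]
        p = m[c][c]
        m[c] = [x / p for x in m[c]]       # make the pivot equal to 1
        for i in range(n):
            if i != c and m[i][c] != 0:    # clear column c in every other row
                f = m[i][c]
                m[i] = [x - f * y for x, y in zip(m[i], m[c])]
    return [row[n] for row in m]


A = [[2, 1], [1, 3]]   # hypothetical matrix of full rank
b = [3, 5]
x = solve(A, b)
print(x)  # [Fraction(4, 5), Fraction(7, 5)]

try:
    solve([[1, 2], [2, 4]], [1, 1])   # rank 1: no unique solution
except ValueError as err:
    print(err)
```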
From Section 6.3 it follows that an equivalent statement can be made in terms of row vectors. Given any non-null $(1 \times n)$ vector $b'$, there exists a unique $(1 \times n)$ vector $x' \neq 0$, solution to the system $x'A = b'$, if, and only if, $\operatorname{rank}(A) = n$.
6.5.2. Homogeneous systems. A homogeneous system of n linear equations in n unknowns can always be represented as
$Ax = 0$,
where A is any given non-null square matrix of order n. It is immediate that there exists a non-zero solution to the system if and only if A is not of f.r., since a non-zero x with $Ax = 0$ is exactly a non-trivial linear combination of the columns of A equal to the null vector. Otherwise, the only solution is the trivial one, $x = 0$.
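A non-zero solution of a homogeneous system can be read off the reduced row echelon form of A: set one free variable to 1, the remaining free variables to 0, and solve each reduced equation for its pivot variable. The sketch below is a hypothetical helper, `null_vector`, written with Python's `fractions` module; it returns `None` when every column has a pivot, i.e. when A is of f.r. and only $x = 0$ solves the system.

```python
from fractions import Fraction


def null_vector(a):
    """Return a non-zero x with A x = 0, or None if A has full column rank."""
    m = [[Fraction(x) for x in row] for row in a]
    n_rows, n_cols = len(m), len(m[0])
    pivots, r = [], 0
    for c in range(n_cols):  # reduce A to reduced row echelon form
        pivot = next((i for i in range(r, n_rows) if m[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot: x_c is a free variable
        m[r], m[pivot] = m[pivot], m[r]
        p = m[r][c]
        m[r] = [x / p for x in m[r]]
        for i in range(n_rows):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    free = [c for c in range(n_cols) if c not in pivots]
    if not free:
        return None  # only the trivial solution x = 0
    x = [Fraction(0)] * n_cols
    x[free[0]] = Fraction(1)       # one free variable set to 1, the others to 0
    for row, c in zip(m, pivots):
        x[c] = -row[free[0]]       # solve each reduced equation for its pivot variable
    return x


singular = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]   # hypothetical example of rank 2
x = null_vector(singular)
print(x)                              # the vector (1, -2, 1)', as Fractions
print(null_vector([[2, 1], [1, 3]]))  # None: full rank
```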
CHAPTER 7

Inverse matrices

The concept of an inverse matrix is fundamental in multiple regression analysis.

7.1. Existence and uniqueness of the inverse matrix


If a square matrix A of order n has f.r., then a conformable square matrix, $A^{-1}$, exists such that
(7.1.1) $A^{-1}A = AA^{-1} = I_n$,
and is unique. Such a matrix $A^{-1}$ is called the inverse of A.
Existence and uniqueness of $A^{-1}$ are proved by using the results of Section 6.5.1. Partition $I_n$ into its columns as
$I_n = \begin{pmatrix} e_1 & \dots & e_h & \dots & e_n \end{pmatrix}$
and notice that all columns $e_h$ are non-zero vectors. Since A is of f.r., the first result of Section 6.5.1 applies, and so for each $h = 1, \dots, n$ there exists a unique $(n \times 1)$ vector $x_h \neq 0$ such that $Ax_h = e_h$. Then, letting
$R = \begin{pmatrix} x_1 & \dots & x_h & \dots & x_n \end{pmatrix}$,
we have
(7.1.2) $AR = I_n$.
From the last result of Section 6.5.1 it follows that for each $h = 1, \dots, n$ there exists a unique $(n \times 1)$ vector $y_h \neq 0$ such that $y_h' A = e_h'$. Then, upon letting
$L = \begin{pmatrix} y_1' \\ \vdots \\ y_h' \\ \vdots \\ y_n' \end{pmatrix}$
and noting that $I_n$ is a symmetric matrix (so that its rows are $e_1', \dots, e_n'$), we have
(7.1.3) $LA = I_n$.
To prove that $R = L = A^{-1}$, just premultiply both sides of equation (7.1.2) by L to have $LAR = L$, then replace LA in the foregoing equation with the right-hand side of equation (7.1.3) to obtain $R = L$. Uniqueness of $A^{-1}$ follows from the uniqueness of each $x_h$.
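The constructive argument above (solve $Ax_h = e_h$ for each h and collect the solutions as the columns of R) translates directly into code. This is a minimal sketch in exact rational arithmetic; the helper names (`solve`, `inverse_by_columns`) and the example matrix are hypothetical. It then checks numerically that R is both a right and a left inverse, as the argument shows.

```python
from fractions import Fraction


def solve(a, b):
    """Solve A x = b by Gauss-Jordan elimination (A square of full rank)."""
    n = len(a)
    m = [[Fraction(x) for x in row] + [Fraction(bi)] for row, bi in zip(a, b)]
    for c in range(n):
        pivot = next((i for i in range(c, n) if m[i][c] != 0), None)
        if pivot is None:
            raise ValueError("A is not of full rank")
        m[c], m[pivot] = m[pivot], m[c]
        p = m[c][c]
        m[c] = [x / p for x in m[c]]
        for i in range(n):
            if i != c and m[i][c] != 0:
                f = m[i][c]
                m[i] = [x - f * y for x, y in zip(m[i], m[c])]
    return [row[n] for row in m]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def inverse_by_columns(a):
    """R = (x_1 ... x_n), where each x_h solves A x_h = e_h."""
    n = len(a)
    identity = [[int(i == j) for j in range(n)] for i in range(n)]
    # I_n is symmetric, so its h-th row is also its h-th column e_h
    xs = [solve(a, identity[h]) for h in range(n)]
    return [list(row) for row in zip(*xs)]  # place the x_h side by side as columns


A = [[2, 1], [1, 3]]  # hypothetical full-rank matrix
R = inverse_by_columns(A)
I2 = [[1, 0], [0, 1]]
print(matmul(A, R) == I2, matmul(R, A) == I2)  # True True
```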
Remark 4. From the analysis in Section 6.5.1, it follows that the converse also holds true: given a square matrix A, if a unique $A^{-1}$ exists satisfying (7.1.1), then A has f.r. Here, for simplicity, I am maintaining that A is a square matrix. Indeed, the stronger statement holds true that if a unique $A^{-1}$ exists satisfying (7.1.1), then A is a square matrix with f.r., which has the important implication that the inverse is defined only for this class of matrices.
Assuming A of f.r. implies the following properties, in addition to existence and uniqueness of $A^{-1}$ and equation (7.1.1):
(1) $(A^{-1})^{-1} = A$
(2) $(A')^{-1} = (A^{-1})'$
(3) If A and B are square matrices of the same order, both of f.r., $(AB)^{-1} = B^{-1}A^{-1}$
(4) $\det(A^{-1}) \neq 0$ and $\det(A^{-1}) = 1/\det(A)$
Remark 5. Property 2 implies that the inverse of a symmetric matrix is symmetric, that is, if $A' = A$, then $(A^{-1})' = A^{-1}$.

7.2. Computation of the inverse


Given a square matrix $A = \{a_{ij}\}$, $i, j = 1, \dots, n$, define first its cofactor matrix as $C = \{c_{ij}\}$, $i, j = 1, \dots, n$, where $c_{ij}$ is the cofactor of $a_{ij}$ as defined in (4.3.1). Then, define the adjugate (or adjoint) of A as
$\operatorname{adj}(A) = C'$.
Finally, assuming that A has f.r., its inverse can be computed as
(7.2.1) $A^{-1} = \dfrac{1}{\det(A)} \operatorname{adj}(A)$.
A proof of (7.2.1) is found in Dhrymes (2000), pp. 37-38.
Remark 6. Since the adjugate of a square matrix always exists (determinants of square matrices are always defined), from (7.2.1) it is clear that the inverse of a square matrix A exists if, and only if, $\det(A) \neq 0$, that is, if and only if A is non-singular. Also, given Remark 4, A non-singular is equivalent to A of f.r.
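Formula (7.2.1) can be implemented directly. The sketch below assumes the usual definition of the cofactor, $c_{ij} = (-1)^{i+j}$ times the determinant of the submatrix obtained by deleting row i and column j, and computes determinants by cofactor expansion along the first row. Cofactor expansion takes a number of operations that grows factorially with n, so this is only a didactic sketch for small matrices; the function names and the example matrix are hypothetical.

```python
from fractions import Fraction


def minor(a, i, j):
    """Submatrix of a obtained by deleting row i and column j."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]


def det(a):
    """Determinant by cofactor expansion along the first row."""
    if not a:
        return Fraction(1)  # determinant of the empty (0 x 0) matrix, by convention
    return sum((-1) ** j * Fraction(a[0][j]) * det(minor(a, 0, j)) for j in range(len(a)))


def adjugate(a):
    """adj(A) = C', with c_ij = (-1)^(i+j) det(minor(A, i, j))."""
    n = len(a)
    c = [[(-1) ** (i + j) * det(minor(a, i, j)) for j in range(n)] for i in range(n)]
    return [list(col) for col in zip(*c)]  # transpose of the cofactor matrix


def inverse(a):
    """A^{-1} = adj(A) / det(A), defined only when det(A) != 0."""
    d = det(a)
    if d == 0:
        raise ValueError("A is singular: the inverse does not exist")
    return [[x / d for x in row] for row in adjugate(a)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


A = [[2, 0, 1], [1, 3, 2], [1, 1, 2]]  # hypothetical example with det(A) = 6
A_inv = inverse(A)
print(det(A))                                                  # 6
print(matmul(A, A_inv) == [[1, 0, 0], [0, 1, 0], [0, 0, 1]])   # True
```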

7.3. Ordinary Least Squares


7.3.1. The rank of the cross-product matrix. The cross-product matrix of the sample regressor matrix X, $X'X$, plays a fundamental role in the classical regression model of Section 3.6.1. I start with the following general result.
Let A be an $(n \times k)$ matrix. I prove that $\operatorname{rank}(A'A) = k$ if and only if A is of f.c.r. Suppose $\operatorname{rank}(A) = k$. If $A'Ac = 0$, then $c'A'Ac = (Ac)'(Ac) = 0$; since $(Ac)'(Ac)$ is the sum of squares of the elements of Ac, this implies $Ac = 0$, and since $\operatorname{rank}(A) = k$, then $c = 0$, which proves $\operatorname{rank}(A'A) = k$. Now, suppose $\operatorname{rank}(A'A) = k$. If $Ac = 0$, then $A'Ac = 0$, and since $\operatorname{rank}(A'A) = k$, then $c = 0$, which proves $\operatorname{rank}(A) = k$.
Exercise 21. Let A be an $(n \times k)$ matrix; prove that $\operatorname{rank}(AA') = n$ if and only if $A'$ is of f.c.r.
Solution: Suppose $\operatorname{rank}(A') = n$. If $AA'c = 0$, then $c'AA'c = (A'c)'(A'c) = 0$, which implies $A'c = 0$ (equivalently, $c'A = 0'$), and since $\operatorname{rank}(A') = n$, then $c = 0$, which proves $\operatorname{rank}(AA') = n$. Now, suppose $\operatorname{rank}(AA') = n$. If $A'c = 0$, then $AA'c = 0$, and since $\operatorname{rank}(AA') = n$, then $c = 0$, which proves $\operatorname{rank}(A') = n$.
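Both results can be checked on small matrices. The sketch below (exact arithmetic with Python's `fractions` module; helper names and matrices are hypothetical) shows that $A'A$ reaches rank k = 2 when A is of f.c.r. and falls short of k when it is not, and that $AA'$ has rank n = 2 for a $(2 \times 3)$ matrix A whose transpose is of f.c.r.

```python
from fractions import Fraction


def rank(rows):
    """Rank by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


A_fcr = [[1, 0], [1, 1], [1, 2], [1, 3]]   # (4 x 2), columns independent
A_def = [[1, 2], [1, 2], [1, 2], [1, 2]]   # (4 x 2), second column = 2 * first

for A in (A_fcr, A_def):
    AtA = matmul(transpose(A), A)           # (2 x 2) cross-product matrix
    print(rank(A), rank(AtA))               # 2 2, then 1 1

# Exercise 21: a (2 x 3) matrix whose transpose is of f.c.r.
A = [[1, 0, 1], [0, 1, 1]]
print(rank(transpose(A)), rank(matmul(A, transpose(A))))   # 2 2
```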
7.3.2. The OLS estimator. Consider the linear regression model in matrix form, as in Section 3.6.1,
$y = X\beta + \varepsilon$,
and assume that X is of f.c.r. The ordinary least squares (OLS) estimator for $\beta$, say $b_{ols}$, is such that the cross-products between each regressor in X and the OLS residuals
(7.3.1) $e = y - Xb_{ols}$
are equal to zero:
$X'(y - Xb_{ols}) = 0$,
or equivalently
$X'Xb_{ols} = X'y$.
Since X is of f.c.r., $X'X$ has f.r. (Section 7.3.1), which implies that the OLS estimator exists, with the formula
(7.3.2) $b_{ols} = (X'X)^{-1}X'y$,
and that the formula for the OLS predicted values, $\hat{y}$, is
(7.3.3) $\hat{y} = Xb_{ols}$.
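The OLS computation can be sketched in a few lines. The data below are hypothetical (they are not the consumption and income figures of Table 1), and the helpers are illustrative. The sketch solves the normal equations $X'Xb = X'y$ by Gauss-Jordan elimination in exact arithmetic and then checks the orthogonality conditions $X'e = 0$.

```python
from fractions import Fraction


def solve(a, b):
    """Solve A x = b by Gauss-Jordan elimination (A square of full rank)."""
    n = len(a)
    m = [[Fraction(x) for x in row] + [Fraction(bi)] for row, bi in zip(a, b)]
    for c in range(n):
        pivot = next((i for i in range(c, n) if m[i][c] != 0), None)
        if pivot is None:
            raise ValueError("matrix is not of full rank")
        m[c], m[pivot] = m[pivot], m[c]
        p = m[c][c]
        m[c] = [x / p for x in m[c]]
        for i in range(n):
            if i != c and m[i][c] != 0:
                f = m[i][c]
                m[i] = [x - f * y for x, y in zip(m[i], m[c])]
    return [row[n] for row in m]


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def matvec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]


X = [[1, 1], [1, 2], [1, 3], [1, 4]]   # hypothetical: constant term and one regressor
y = [6, 5, 7, 10]                      # hypothetical observations

Xt = transpose(X)
b_ols = solve(matmul(Xt, X), matvec(Xt, y))   # normal equations X'X b = X'y
y_hat = matvec(X, b_ols)                      # predicted values, (7.3.3)
e = [yi - fi for yi, fi in zip(y, y_hat)]     # residuals, (7.3.1)

print(b_ols)          # [Fraction(7, 2), Fraction(7, 5)]
print(matvec(Xt, e))  # X'e: both entries are zero
```

Solving the normal equations directly, rather than forming $(X'X)^{-1}$ explicitly, is a common computational choice; when X is of f.c.r. it yields the same $b_{ols}$ as (7.3.2).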
CHAPTER 8

Vector spaces, spanning sets and projection matrices

8.1. Vector spaces


A vector space V is defined as a collection of (column) vectors such that, given any two vectors $v_1$ and $v_2$ in V, then $v_1 + v_2 \in V$ and $cv_1 \in V$ for any real scalar c. The n-dimensional Euclidean space $\mathbb{R}^n$ is a vector space.

8.2. Spanning sets


Given the $(n \times k)$ real matrix A, each column $a_i$ of A, $i = 1, \dots, k$, belongs to $\mathbb{R}^n$, and the set of all linear combinations of the columns of A is called the space spanned by the columns of A (or also the range of A), denoted by $R(A)$.
$R(A)$ can be easily proved to be a subspace of $\mathbb{R}^n$ (it is obvious that $R(A) \subseteq \mathbb{R}^n$; $R(A)$ is a vector space since, given any two vectors $a_1$ and $a_2$ belonging to $R(A)$, then $a_1 + a_2 \in R(A)$ and $ca_1 \in R(A)$ for any real scalar c). Since each element of $R(A)$ is a vector of n components, $R(A)$ is said to be a vector space of order n. The dimension of $R(A)$, denoted by $\dim[R(A)]$, is the maximal number of linearly independent vectors in $R(A)$.
It is not hard to prove that $\dim[R(A)] = \operatorname{rank}(A)$. Indeed, since any vector v in $R(A)$ can be written as $v = Ab$, where b is some vector of order k, any collection of m vectors in $R(A)$ can be written as the matrix product
$\begin{pmatrix} v_1 & \dots & v_m \end{pmatrix} = AB_{k \times m}$.
Hence, given inequality (6.4.1), $\operatorname{rank}(AB_{k \times m}) \le \operatorname{rank}(A)$ for any $B_{k \times m}$ and any m, with equality holding if $m = k$ and $B_{k \times m}$ is non-singular, which proves that $\dim[R(A)] = \operatorname{rank}(A)$. Obviously, if A is of f.c.r., then $\dim[R(A)] = k$.

8.3. Orthogonal spaces


Two column vectors of the same order, x and y, are said to be orthogonal if and only if $x'y = 0$ (or, equivalently, $y'x = 0$). The set of all vectors in $\mathbb{R}^n$ that are orthogonal to the vectors of $R(A)$ is denoted by $A^{\perp}$ and is a subspace of $\mathbb{R}^n$. A proof follows. $A^{\perp} \subseteq \mathbb{R}^n$ by definition. Given any two vectors $b_1$ and $b_2$ belonging to $A^{\perp}$ and any $a \in R(A)$, by definition $b_1'a = 0$ and $b_2'a = 0$, but then also $(b_1 + b_2)'a = 0$ and, for any scalar c, $(cb_1)'a = 0$, which completes the proof.
Importantly, it is possible to prove (see Rao (1973), p. 11) that
(8.3.1) $\dim(A^{\perp}) = n - \operatorname{rank}(A)$.
$A^{\perp}$ is commonly referred to as the space orthogonal to $R(A)$, or also the orthogonal complement of $R(A)$.
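Since a vector b is orthogonal to every vector of $R(A)$ exactly when it is orthogonal to every column of A, that is when $A'b = 0$, $A^{\perp}$ is the null space of $A'$. The sketch below (hypothetical helpers and example, exact arithmetic via `fractions`) computes a basis of that null space from the reduced row echelon form of $A'$ and compares its size with $n - \operatorname{rank}(A)$, as in (8.3.1). The rank of A is computed from A itself, so the comparison is not circular.

```python
from fractions import Fraction


def rref(rows):
    """Reduced row echelon form over the rationals; returns (matrix, pivot columns)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        p = m[r][c]
        m[r] = [x / p for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    return m, pivots


def null_space(rows):
    """Basis of {x : M x = 0}, one vector per free variable."""
    m, pivots = rref(rows)
    n_cols = len(m[0])
    basis = []
    for f in range(n_cols):
        if f in pivots:
            continue
        x = [Fraction(0)] * n_cols
        x[f] = Fraction(1)
        for row, c in zip(m, pivots):
            x[c] = -row[f]
        basis.append(x)
    return basis


A = [[1, 0], [1, 1], [1, 2], [1, 3]]   # hypothetical (4 x 2) matrix, n = 4
n = len(A)
At = [list(col) for col in zip(*A)]

perp_basis = null_space(At)            # basis of A-perp = {b : A'b = 0}
rank_A = len(rref(A)[1])
print(len(perp_basis), n - rank_A)     # 2 2

# every basis vector is orthogonal to every column of A
for b in perp_basis:
    for col in At:
        assert sum(u * v for u, v in zip(col, b)) == 0
```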
Exercise 22. Prove that any subspace of $\mathbb{R}^n$ contains the null vector, $0_n$.
Solution: Let V be a subspace of $\mathbb{R}^n$ and $v \in V$. Then, since V is closed under scalar multiplication, $0v = 0_n \in V$.

8.4. Idempotent matrices


Define $A^2$ as $A^2 = AA$. That is, in analogy with the power of a scalar, A raised to the power of 2 is the matrix product of A by itself. By definition of matrix product, $A^2$ is defined only if A is a square matrix.
An important class of matrices is given by the square matrices A such that $A^2 = A$. These matrices are called idempotent. An important property of this class of matrices is that the rank of an idempotent matrix equals its trace (see Rao (1973), p. 28).
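The rank-equals-trace property can be checked on two small idempotent matrices: the centering matrix $I_3 - \frac{1}{3}\iota\iota'$, where $\iota$ is a vector of ones (my example, not one introduced in these notes), and a non-symmetric idempotent matrix. A minimal sketch in exact arithmetic; the helper names are hypothetical.

```python
from fractions import Fraction


def rank(rows):
    """Rank by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


n = 3
C = [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)] for i in range(n)]  # centering matrix
D = [[1, 1], [0, 0]]  # idempotent but not symmetric

for A in (C, D):
    assert matmul(A, A) == A     # A is idempotent
    print(rank(A), trace(A))     # 2 2, then 1 1
```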

8.5. Projection matrices


Assume A of f.c.r.¹ and define the operator $P_{[A]}$ as
$P_{[A]} = A(A'A)^{-1}A'$.
One can easily verify that $P_{[A]}$ is a symmetric ($P_{[A]}' = P_{[A]}$) and idempotent ($P_{[A]}P_{[A]} = P_{[A]}$) matrix.

¹ This is a stringent assumption made to keep the analysis simple. It could be relaxed in more advanced treatments that use the concept of a generalized inverse.

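Exercise 23. Verify that $P_{[A]}' = P_{[A]}$ and $P_{[A]}P_{[A]} = P_{[A]}$.
Solution: Since $A'A$ is symmetric, so is its inverse (Remark 5), hence
$[A(A'A)^{-1}A']' = A[(A'A)^{-1}]'A' = A(A'A)^{-1}A'$
and, since $(A'A)^{-1}A'A = I_k$,
$A(A'A)^{-1}A'A(A'A)^{-1}A' = A(A'A)^{-1}A'$.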
Any matrix with the two foregoing properties is called an orthogonal projector, and so is $P_{[A]}$. In geometrical terms, $P_{[A]}$ projects vectors onto $R(A)$ along a direction that is parallel to the space orthogonal to $R(A)$, $A^{\perp}$. Symmetrically,
$M_{[A]} = I_n - P_{[A]}$
is the orthogonal projector that projects vectors onto $A^{\perp}$ along a direction that is parallel to the space orthogonal to $A^{\perp}$, $R(A)$.
Exercise 24. Prove that $M_{[A]}$ is an orthogonal projector (hint: just verify that $M_{[A]}$ is symmetric and idempotent).
Solution:
$(I - P_{[A]})' = I' - P_{[A]}' = I - P_{[A]}$
and
$(I - P_{[A]})(I - P_{[A]}) = I - P_{[A]} - P_{[A]} + P_{[A]}P_{[A]} = I - P_{[A]} = M_{[A]}$.
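The algebra of Exercises 23 and 24 can be confirmed numerically. The sketch below computes $P_{[A]} = A(A'A)^{-1}A'$ and $M_{[A]} = I_n - P_{[A]}$ in exact arithmetic for a hypothetical $(4 \times 2)$ matrix A of f.c.r.; the helper names (`inverse`, `projector`) are illustrative and not taken from these notes.

```python
from fractions import Fraction


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def inverse(a):
    """Inverse of a square full-rank matrix by Gauss-Jordan elimination."""
    n = len(a)
    m = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(a)]
    for c in range(n):
        pivot = next((i for i in range(c, n) if m[i][c] != 0), None)
        if pivot is None:
            raise ValueError("matrix is not of full rank")
        m[c], m[pivot] = m[pivot], m[c]
        p = m[c][c]
        m[c] = [x / p for x in m[c]]
        for i in range(n):
            if i != c and m[i][c] != 0:
                f = m[i][c]
                m[i] = [x - f * y for x, y in zip(m[i], m[c])]
    return [row[n:] for row in m]


def projector(a):
    """P = A (A'A)^{-1} A' for A of full column rank."""
    at = transpose(a)
    return matmul(matmul(a, inverse(matmul(at, a))), at)


A = [[1, 0], [1, 1], [1, 2], [1, 3]]   # hypothetical, f.c.r.
n = len(A)
P = projector(A)
M = [[int(i == j) - P[i][j] for j in range(n)] for i in range(n)]   # M = I - P

for Q in (P, M):
    assert transpose(Q) == Q   # symmetric
    assert matmul(Q, Q) == Q   # idempotent
print("P and M are symmetric and idempotent")
```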
The properties of orthogonal projectors are readily understood once their geometrical meaning is grasped. Of course, they can also be demonstrated algebraically, which is what the following exercises are concerned with.
Exercise 25. (This is a little bit hard, but instructive.) Given two $(n \times k)$ real matrices A and B, both of f.c.r., prove that if A and B span the same space then $P_{[A]} = P_{[B]}$ (hint: prove that A can always be expressed as $A = BK$, where K is a non-singular $(k \times k)$ matrix).
Solution: If $R(A)$ coincides with $R(B)$, then every column of A belongs to $R(B)$, and as such every column of A can be expressed as a linear combination of the columns of B: $A = BK$, where K is $(k \times k)$. Therefore, $P_{[A]} = BK(K'B'BK)^{-1}K'B'$. Since both A and B have rank equal to k, in the light of inequality (6.4.1), $k \le \min[k, \operatorname{rank}(K)]$, which implies that $\operatorname{rank}(K) \ge k$, and since $\operatorname{rank}(K) > k$ is not possible (see Section 6.2), then $\operatorname{rank}(K) = k$ and K is non-singular. Finally, by the property of the inverse of the product of square matrices,
$P_{[A]} = BK(K'B'BK)^{-1}K'B'$
$= BKK^{-1}(B'B)^{-1}(K')^{-1}K'B'$
$= P_{[B]}$.
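Exercise 25 can be illustrated with a hypothetical B of f.c.r. and a hypothetical non-singular $K = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}$ (det K = 1): the columns of $A = BK$ span the same space as those of B, and the two projectors coincide exactly. A minimal sketch with illustrative helper names.

```python
from fractions import Fraction


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def inverse(a):
    """Inverse of a square full-rank matrix by Gauss-Jordan elimination."""
    n = len(a)
    m = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(a)]
    for c in range(n):
        pivot = next((i for i in range(c, n) if m[i][c] != 0), None)
        if pivot is None:
            raise ValueError("matrix is not of full rank")
        m[c], m[pivot] = m[pivot], m[c]
        p = m[c][c]
        m[c] = [x / p for x in m[c]]
        for i in range(n):
            if i != c and m[i][c] != 0:
                f = m[i][c]
                m[i] = [x - f * y for x, y in zip(m[i], m[c])]
    return [row[n:] for row in m]


def projector(a):
    """P = A (A'A)^{-1} A' for A of full column rank."""
    at = transpose(a)
    return matmul(matmul(a, inverse(matmul(at, a))), at)


B = [[1, 0], [1, 1], [1, 2], [1, 3]]   # hypothetical (4 x 2), f.c.r.
K = [[2, 1], [1, 1]]                   # hypothetical non-singular (2 x 2)
A = matmul(B, K)                       # same column space as B

print(A)                                # [[2, 1], [3, 2], [4, 3], [5, 4]]
print(projector(A) == projector(B))     # True
```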

Exercise 26. Prove that $P_{[A]}$ and $M_{[A]}$ are orthogonal, that is, $P_{[A]}M_{[A]} = 0$.
Solution: $P_{[A]}M_{[A]} = P_{[A]}(I - P_{[A]}) = P_{[A]} - P_{[A]}P_{[A]} = P_{[A]} - P_{[A]} = 0$.

Exercise 27. Given any $(n \times 1)$ real vector v lying in $R(A)$, prove that $P_{[A]}v = v$ and $M_{[A]}v = 0$ (hint: express v as $v = Ac$, where c is a real $(k \times 1)$ vector).
Solution: Since $v \in R(A)$, it can be written as a linear combination of the columns of A, so let $v = Ac$ (see Section 5.1); then $P_{[A]}v = P_{[A]}Ac = A(A'A)^{-1}A'Ac = Ac = v$. Given this, $M_{[A]}v = v - P_{[A]}v = v - v = 0$.
From exercise 27, applied to each column of A (each of which lies in $R(A)$), it clearly follows that
(8.5.1) $P_{[A]}A = A$
and
(8.5.2) $M_{[A]}A = 0$.
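Exercises 26 and 27 and equations (8.5.1)-(8.5.2) can be confirmed on the same kind of hypothetical example: $P_{[A]}M_{[A]} = 0$, $P_{[A]}A = A$, $M_{[A]}A = 0$, and $P_{[A]}v = v$ for a vector $v = Ac$ (here c = (2, -1)', an arbitrary choice). A minimal sketch with illustrative helper names.

```python
from fractions import Fraction


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def inverse(a):
    """Inverse of a square full-rank matrix by Gauss-Jordan elimination."""
    n = len(a)
    m = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(a)]
    for c in range(n):
        pivot = next((i for i in range(c, n) if m[i][c] != 0), None)
        if pivot is None:
            raise ValueError("matrix is not of full rank")
        m[c], m[pivot] = m[pivot], m[c]
        p = m[c][c]
        m[c] = [x / p for x in m[c]]
        for i in range(n):
            if i != c and m[i][c] != 0:
                f = m[i][c]
                m[i] = [x - f * y for x, y in zip(m[i], m[c])]
    return [row[n:] for row in m]


def projector(a):
    """P = A (A'A)^{-1} A' for A of full column rank."""
    at = transpose(a)
    return matmul(matmul(a, inverse(matmul(at, a))), at)


A = [[1, 0], [1, 1], [1, 2], [1, 3]]   # hypothetical, f.c.r.
n, k = len(A), len(A[0])
P = projector(A)
M = [[int(i == j) - P[i][j] for j in range(n)] for i in range(n)]   # M = I - P
zero_nn = [[0] * n for _ in range(n)]
zero_nk = [[0] * k for _ in range(n)]
v = [[2], [1], [0], [-1]]              # v = A c with c = (2, -1)', as a column

print(matmul(P, M) == zero_nn)   # True: exercise 26
print(matmul(P, v) == v)         # True: exercise 27
print(matmul(P, A) == A)         # True: (8.5.1)
print(matmul(M, A) == zero_nk)   # True: (8.5.2)
```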
8.6. OLS residuals and predicted values
Using the OLS formula in (7.3.2), the OLS residual vector in (7.3.1) can be written as $e = y - X(X'X)^{-1}X'y$, that is,
(8.6.1) $e = M_{[X]} y$,
where, according to the definition of orthogonal projectors,
$M_{[X]} = I - X(X'X)^{-1}X'$.
Therefore, the OLS residual vector is the orthogonal projection of y onto the space orthogonal to that spanned by the regressors, $X^{\perp}$. For this reason $M_{[X]}$ is sometimes called the residual maker. From equations (7.3.2) and (7.3.3) it follows that
$\hat{y} = P_{[X]} y$,
and so the vector of OLS predicted values, $\hat{y}$, is the orthogonal projection of y onto the space spanned by the regressors, $R(X)$. Clearly, $e'\hat{y} = y'M_{[X]}P_{[X]}y = 0$ (see exercise 26, recalling that $M_{[X]}$ and $P_{[X]}$ are symmetric, so that $M_{[X]}P_{[X]} = (P_{[X]}M_{[X]})' = 0$); therefore, the OLS method decomposes y into two orthogonal components:
(8.6.2) $y = \hat{y} + e$.
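The decomposition (8.6.2) can be reproduced with projectors alone, without computing $b_{ols}$ explicitly. The data below are hypothetical (they are not the figures of Table 1), and the helper names are illustrative. The sketch forms $P_{[X]}$ and the residual maker $M_{[X]}$, then checks that $e'\hat{y} = 0$ and $y = \hat{y} + e$.

```python
from fractions import Fraction


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def inverse(a):
    """Inverse of a square full-rank matrix by Gauss-Jordan elimination."""
    n = len(a)
    m = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(a)]
    for c in range(n):
        pivot = next((i for i in range(c, n) if m[i][c] != 0), None)
        if pivot is None:
            raise ValueError("matrix is not of full rank")
        m[c], m[pivot] = m[pivot], m[c]
        p = m[c][c]
        m[c] = [x / p for x in m[c]]
        for i in range(n):
            if i != c and m[i][c] != 0:
                f = m[i][c]
                m[i] = [x - f * y for x, y in zip(m[i], m[c])]
    return [row[n:] for row in m]


def projector(a):
    """P = A (A'A)^{-1} A' for A of full column rank."""
    at = transpose(a)
    return matmul(matmul(a, inverse(matmul(at, a))), at)


X = [[1, 1], [1, 2], [1, 3], [1, 4]]   # hypothetical regressors (constant + one variable)
y = [6, 5, 7, 10]                      # hypothetical observations
n = len(X)

P = projector(X)
M = [[int(i == j) - P[i][j] for j in range(n)] for i in range(n)]   # residual maker

y_hat = [sum(p * v for p, v in zip(row, y)) for row in P]   # P[X] y
e = [sum(q * v for q, v in zip(row, y)) for row in M]       # M[X] y

print(sum(u * v for u, v in zip(e, y_hat)))      # e'y_hat: prints 0
print([u + v for u, v in zip(y_hat, e)] == y)    # True: y = y_hat + e
```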
Bibliography

Dhrymes, P. J., 2000. Mathematics for Econometrics. New York: Springer-Verlag.
Greene, W. H., 2008. Econometric Analysis. Upper Saddle River, NJ: Prentice Hall.
Harville, D. A., 1997. Matrix Algebra From A Statistician's Perspective. Springer-Verlag.
Rao, C. R., 1973. Linear Statistical Inference and Its Applications. New York: Wiley.
Searle, S. R., 1982. Matrix Algebra Useful for Statistics. New York: Wiley.
