
Models Not of Full Rank

Estimation/Hypothesis Testing
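$$Y_{N\times 1} = X_{N\times p}\, b_{p\times 1} + e_{N\times 1}$$

With $e = Y - E(Y)$ and $E(e) = 0$, we have $E(Y) = Xb$ and $\operatorname{var}(e) = E(ee') = \sigma^2 I_N$,
which results in $e \sim (0, \sigma^2 I)$ and $Y \sim (Xb, \sigma^2 I)$.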

Normal equations:
Using least squares we obtain
$$X'X\hat{b} = X'Y$$
Consider a completely randomized design
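$$y_{ij} = \mu + \alpha_i + e_{ij} \quad \text{for } i = 1, 2, 3$$

Then

$$b = \begin{bmatrix} \mu \\ \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{bmatrix}$$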

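and the data are represented as

$$
\begin{array}{c|cccc}
\text{observation} & \mu & \alpha_1 & \alpha_2 & \alpha_3 \\ \hline
y_{11} & 1 & 1 & 0 & 0 \\
y_{12} & 1 & 1 & 0 & 0 \\
y_{13} & 1 & 1 & 0 & 0 \\
y_{21} & 1 & 0 & 1 & 0 \\
y_{22} & 1 & 0 & 1 & 0 \\
y_{31} & 1 & 0 & 0 & 1
\end{array}
$$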

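where the data are:

$$
\begin{array}{l|ccc}
 & \text{normal} & \text{off-type} & \text{aberrant} \\ \hline
 & 101 & 84 & 32 \\
 & 105 & 88 & \\
 & 94 & & \\ \hline
\text{totals} & 300 & 172 & 32
\end{array}
$$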

The sum of the last three columns of $X$ is the first column: every $y_{ij}$ contains $\mu$, so the first column of $X$ is all
ones, and every $y_{ij}$ contains exactly one $\alpha_i$, so the last three columns sum to a column of ones. Hence $X$ is not of
full column rank. $X'X$ is square and symmetric; its elements are the inner products of the columns of $X$ with each
other. Because $X$ is not of full column rank, $X'X$ is not of full rank.

$$X'X = \begin{bmatrix} 6 & 3 & 2 & 1 \\ 3 & 3 & 0 & 0 \\ 2 & 0 & 2 & 0 \\ 1 & 0 & 0 & 1 \end{bmatrix}$$

NOTE: the elements of $X'X$ are the number of times each parameter of the model occurs in a total,
i.e. $\mu$ occurs 6 times in $y_{..}$, $\alpha_1$ occurs 3 times in $y_{..}$, $\alpha_2$ occurs 2 times in $y_{..}$, and $\alpha_3$ occurs once in $y_{..}$;
$\mu$ occurs 3 times in $y_{1.}$, $\alpha_1$ occurs 3 times in $y_{1.}$, and $\alpha_2$ and $\alpha_3$ do not occur in $y_{1.}$.
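As a quick numerical check of the rank argument and of the counts in $X'X$, the following sketch builds the design matrix of the example with NumPy. The choice of library and the variable names (`X`, `XtX`) are assumptions made for illustration; they are not part of the original notes.

```python
import numpy as np

# Design matrix for y_ij = mu + alpha_i + e_ij with group sizes 3, 2, 1.
# Columns: mu, alpha_1, alpha_2, alpha_3.
X = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
])

# The last three columns sum to the first column (a column of ones).
cols_sum_to_first = bool((X[:, 1:].sum(axis=1) == X[:, 0]).all())

XtX = X.T @ X                           # inner products of the columns of X
rank_X = np.linalg.matrix_rank(X)       # 3, although X has 4 columns
rank_XtX = np.linalg.matrix_rank(XtX)   # also 3

print(XtX)
print(cols_sum_to_first, rank_X, rank_XtX)
```

The diagonal of `XtX` holds the group counts (6, 3, 2, 1), matching the NOTE above.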

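and

$$
X'Y =
\begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} y_{11} \\ y_{12} \\ y_{13} \\ y_{21} \\ y_{22} \\ y_{31} \end{bmatrix}
=
\begin{bmatrix} y_{..} \\ y_{1.} \\ y_{2.} \\ y_{3.} \end{bmatrix}
=
\begin{bmatrix} 504 \\ 300 \\ 172 \\ 32 \end{bmatrix}
$$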

For example, the first two observations are

$$101 = y_{11} = \mu + \alpha_1 + e_{11}$$
$$105 = y_{12} = \mu + \alpha_1 + e_{12}$$
$X'Y$ is a vector consisting of the inner products of the columns of $X$ with $Y$, and since the nonzero elements of $X$ are
ones, we obtain

$$X'Y = \begin{bmatrix} y_{..} \\ y_{1.} \\ y_{2.} \\ y_{3.} \end{bmatrix}$$
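The claim that $X'Y$ collects the grand total and the group totals can be checked directly. This is a minimal sketch assuming NumPy; the names `X`, `y` and `Xty` are illustrative only.

```python
import numpy as np

# Columns of X: mu, alpha_1, alpha_2, alpha_3.
X = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
])
# Observations in the order y11, y12, y13, y21, y22, y31.
y = np.array([101, 105, 94, 84, 88, 32])

# Each entry is the inner product of one column of X with y,
# i.e. the sum of the observations that contain that parameter.
Xty = X.T @ y
print(Xty)  # grand total followed by the three group totals
```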
Since $X'X$ is not of full rank, there is no unique solution to the normal equations

$$X'Xb^0 = X'Y$$

where

$$b^0 = \begin{bmatrix} \mu^0 \\ \alpha_1^0 \\ \alpha_2^0 \\ \alpha_3^0 \end{bmatrix}$$

and, applying a generalized inverse $G$ of $X'X$ (a matrix satisfying $X'XGX'X = X'X$), we write

$$GX'Xb^0 = GX'Y, \qquad b^0 = GX'Y$$
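In the example, the normal equations are

$$
\begin{bmatrix} 6 & 3 & 2 & 1 \\ 3 & 3 & 0 & 0 \\ 2 & 0 & 2 & 0 \\ 1 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} \mu^0 \\ \alpha_1^0 \\ \alpha_2^0 \\ \alpha_3^0 \end{bmatrix}
=
\begin{bmatrix} y_{..} \\ y_{1.} \\ y_{2.} \\ y_{3.} \end{bmatrix}
=
\begin{bmatrix} 504 \\ 300 \\ 172 \\ 32 \end{bmatrix}
$$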

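The normal equations are rewritten as

$$X'Xb^0 = X'Y,$$

that is, $E(X'Y) = X'Xb$ with $X'Y$ in place of its expectation and $b$ replaced by $b^0$ on the LHS.
Because $X'XGX' = X'$, substituting $b^0 = GX'Y$ gives $X'XGX'Y = X'Y$. Hence a solution is

$$b^0 = GX'Y$$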
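To illustrate that different generalized inverses give different solutions of the same normal equations, the sketch below builds two of them for the example. It assumes NumPy, and the two particular choices of $G$ (inverting a nonsingular $3 \times 3$ principal block of $X'X$ and padding with zeros) are picked for illustration; they are not prescribed by the notes.

```python
import numpy as np

X = np.array([[1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 0, 0],
              [1, 0, 1, 0], [1, 0, 1, 0], [1, 0, 0, 1]], dtype=float)
y = np.array([101, 105, 94, 84, 88, 32], dtype=float)
XtX, Xty = X.T @ X, X.T @ y

# G1: invert the lower-right 3x3 block (alpha rows/columns), zeros elsewhere.
G1 = np.zeros((4, 4))
G1[1:, 1:] = np.linalg.inv(XtX[1:, 1:])

# G2: invert the upper-left 3x3 block (mu, alpha_1, alpha_2), zeros elsewhere.
G2 = np.zeros((4, 4))
G2[:3, :3] = np.linalg.inv(XtX[:3, :3])

b0_1 = G1 @ Xty   # approximately [0, 100, 86, 32]
b0_2 = G2 @ Xty   # approximately [32, 68, 54, 0]

for G, b0 in ((G1, b0_1), (G2, b0_2)):
    # G is a generalized inverse of X'X ...
    assert np.allclose(XtX @ G @ XtX, XtX)
    # ... and b0 = G X'y solves the normal equations X'X b0 = X'y.
    assert np.allclose(XtX @ b0, Xty)

print(b0_1, b0_2)
```

The two solutions differ, yet both satisfy the normal equations, which is the non-uniqueness described above.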

Consequence of a Solution:
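$b^0$ is a function of $Y$.

a.

$$
\begin{aligned}
E(b^0) &= GX'E(Y) \\
       &= GX'Xb \\
       &= Hb
\end{aligned}
$$

where $H = GX'X$, so $b^0$ is an unbiased estimator of $Hb$ rather than of $b$.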
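The matrix $H = GX'X$ can be computed for the example to see what $b^0$ actually estimates. This is a sketch assuming NumPy and one particular generalized inverse (the inverse of the $\alpha$ block padded with zeros); the parameter vector `b` below is a hypothetical value invented purely to illustrate $Hb$.

```python
import numpy as np

X = np.array([[1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 0, 0],
              [1, 0, 1, 0], [1, 0, 1, 0], [1, 0, 0, 1]], dtype=float)
XtX = X.T @ X

# One generalized inverse of X'X (an illustrative choice).
G = np.zeros((4, 4))
G[1:, 1:] = np.linalg.inv(XtX[1:, 1:])

H = G @ XtX
print(H)
# Rows of H: [0,0,0,0], [1,1,0,0], [1,0,1,0], [1,0,0,1], so with this G
# E(b0) = Hb = (0, mu + alpha_1, mu + alpha_2, mu + alpha_3)'.

# Hypothetical parameter vector (mu, alpha_1, alpha_2, alpha_3), not from the data.
b = np.array([50.0, 50.0, 36.0, -18.0])
expected_b0 = H @ b
print(expected_b0)
```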

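b.

$$
\begin{aligned}
\operatorname{var}(b^0) &= \operatorname{var}(GX'Y) \\
                        &= GX'\operatorname{var}(Y)XG' \\
                        &= GX'XG'\sigma^2
\end{aligned}
$$

For $X'X$ symmetric and $P$ an orthogonal permutation matrix,

$$(XP)'(XP) = P'X'XP = \begin{bmatrix} A_{11} & A_{12} \\ A_{12}' & A_{22} \end{bmatrix}$$

where $A_{11}$ is nonsingular with rank equal to the rank of $X$; then

$$G = P \begin{bmatrix} A_{11}^{-1} & 0 \\ 0 & 0 \end{bmatrix} P'$$

and

$$GX'XG' = G$$

and

$$\operatorname{var}(b^0) = G\sigma^2$$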
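The permutation construction of $G$ can be traced numerically for the example. The sketch below assumes NumPy; the particular permutation (moving the $\mu$ column last so that the leading block is diagonal) and the names `perm`, `P`, `A`, `inner` are illustrative choices, not taken from the notes.

```python
import numpy as np

X = np.array([[1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 0, 0],
              [1, 0, 1, 0], [1, 0, 1, 0], [1, 0, 0, 1]], dtype=float)
XtX = X.T @ X
r = np.linalg.matrix_rank(XtX)        # rank of X'X (3 here)

# Permutation matrix that reorders the columns of X to alpha_1, alpha_2, alpha_3, mu.
perm = [1, 2, 3, 0]
P = np.eye(4)[:, perm]                # X @ P equals X[:, perm]
A = P.T @ XtX @ P                     # partitioned matrix [[A11, A12], [A12', A22]]

# Invert the leading r x r block A11 and pad with zeros.
inner = np.zeros((4, 4))
inner[:r, :r] = np.linalg.inv(A[:r, :r])
G = P @ inner @ P.T

assert np.allclose(XtX @ G @ XtX, XtX)   # G is a generalized inverse of X'X
assert np.allclose(G @ XtX @ G.T, G)     # hence var(b0) = G * sigma^2
print(np.round(G, 4))
```

With this permutation $A_{11} = \operatorname{diag}(3, 2, 1)$, so $G$ is diagonal with entries $0, 1/3, 1/2, 1$.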

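c. Estimating $E(y)$

$$
\begin{aligned}
\widehat{E(y)} = \hat{y} &= Xb^0 \\
                         &= XGX'y
\end{aligned}
$$

Note that this vector is invariant to $G$, since $XGX'$ is invariant to $G$; hence $\hat{y}$ is always the same regardless of
which solution $b^0$ is used.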
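The invariance of $XGX'$ and of $\hat{y}$ can be seen by comparing two different generalized inverses on the example data. This is a minimal sketch assuming NumPy; both choices of $G$ are illustrative.

```python
import numpy as np

X = np.array([[1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 0, 0],
              [1, 0, 1, 0], [1, 0, 1, 0], [1, 0, 0, 1]], dtype=float)
y = np.array([101, 105, 94, 84, 88, 32], dtype=float)
XtX = X.T @ X

# Two different generalized inverses of X'X (illustrative choices).
G1 = np.zeros((4, 4))
G1[1:, 1:] = np.linalg.inv(XtX[1:, 1:])
G2 = np.zeros((4, 4))
G2[:3, :3] = np.linalg.inv(XtX[:3, :3])

# The solutions b0 differ ...
b0_1, b0_2 = G1 @ X.T @ y, G2 @ X.T @ y
# ... but X G X' and the fitted values do not.
proj_1, proj_2 = X @ G1 @ X.T, X @ G2 @ X.T
y_hat_1, y_hat_2 = X @ b0_1, X @ b0_2

print(b0_1, b0_2)
print(y_hat_1)   # the group means, repeated for each observation
```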

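d. Residual Error Sum of Squares

$$
\begin{aligned}
SSE &= (y - Xb^0)'(y - Xb^0) \\
    &= y'y - y'Xb^0 - (Xb^0)'y + (Xb^0)'(Xb^0) \\
    &= y'(I - XGX')(I - XGX')y \\
    &= y'(I - XGX')y \\
    &= y'y - y'XGX'y \\
    &= y'y - b^{0\prime}X'y \quad \text{in computational form}
\end{aligned}
$$

and $XGX'$ is invariant to $G$, so SSE is invariant to $G$ and hence invariant to $b^0$.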
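The equivalent forms of SSE can be compared on the example data. The sketch below assumes NumPy and one illustrative generalized inverse; the names `sse_resid`, `sse_quad` and `sse_comp` are made up for this example.

```python
import numpy as np

X = np.array([[1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 0, 0],
              [1, 0, 1, 0], [1, 0, 1, 0], [1, 0, 0, 1]], dtype=float)
y = np.array([101, 105, 94, 84, 88, 32], dtype=float)
XtX = X.T @ X
N = len(y)

# One generalized inverse of X'X (illustrative choice) and its solution.
G = np.zeros((4, 4))
G[1:, 1:] = np.linalg.inv(XtX[1:, 1:])
b0 = G @ X.T @ y

resid = y - X @ b0
sse_resid = resid @ resid                            # (y - Xb0)'(y - Xb0)
sse_quad = y @ (np.eye(N) - X @ G @ X.T) @ y         # y'(I - XGX')y
sse_comp = y @ y - b0 @ (X.T @ y)                    # y'y - b0'X'y

print(sse_resid, sse_quad, sse_comp)   # all three agree (70 for these data)
```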

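e. Estimating residual error variance

With $y \sim N(Xb, \sigma^2 I)$,

$$
\begin{aligned}
E(SSE) &= E[y'(I - XGX')y] \\
       &= \operatorname{tr}[(I - XGX')\sigma^2 I] + (Xb)'(I - XGX')Xb \\
       &= \sigma^2 \operatorname{rank}(I - XGX') \\
       &= [N - \operatorname{rank}(X)]\sigma^2
\end{aligned}
$$

Hence

$$\hat{\sigma}^2 = \frac{SSE}{N - \operatorname{rank}(X)}$$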
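For the example, the degrees of freedom $N - \operatorname{rank}(X)$ and the variance estimate can be computed as follows. This is a sketch assuming NumPy and an illustrative generalized inverse; it also checks that the second term of $E(SSE)$ vanishes because $(I - XGX')X = 0$.

```python
import numpy as np

X = np.array([[1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 0, 0],
              [1, 0, 1, 0], [1, 0, 1, 0], [1, 0, 0, 1]], dtype=float)
y = np.array([101, 105, 94, 84, 88, 32], dtype=float)
XtX = X.T @ X
N = len(y)
r = np.linalg.matrix_rank(X)

# One generalized inverse of X'X (illustrative choice).
G = np.zeros((4, 4))
G[1:, 1:] = np.linalg.inv(XtX[1:, 1:])

M = np.eye(N) - X @ G @ X.T      # I - XGX', symmetric and idempotent
df_error = np.trace(M)           # equals rank(M) = N - rank(X)
annihilates_X = np.allclose(M @ X, 0)   # so (Xb)'(I - XGX')Xb = 0 for any b

sse = y @ M @ y
sigma2_hat = sse / (N - r)
print(df_error, sigma2_hat)      # 3 degrees of freedom, estimate 70/3
```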

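f. Partitioning the SST (sum of squares total):

$$
\begin{array}{lll}
 & SSM = N\bar{y}^2 = y'N^{-1}11'y & \\
SSR = y'XGX'y = b^{0\prime}X'y & SSR_m = y'(XGX' - N^{-1}11')y = SSR - SSM & SSR_m = y'(XGX' - N^{-1}11')y \\
SSE = y'(I - XGX')y & SSE = y'(I - XGX')y & SSE = y'(I - XGX')y \\ \hline
SST = y'y & SST = y'y & SST_m = y'y - N\bar{y}^2
\end{array}
$$

where SSM is the sum of squares from fitting a general mean.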
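The three partitions can be verified on the example data with the quadratic forms written out explicitly. This is a sketch assuming NumPy and an illustrative generalized inverse; `J` stands for $N^{-1}11'$ and `proj` for $XGX'$, both names chosen here.

```python
import numpy as np

X = np.array([[1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 0, 0],
              [1, 0, 1, 0], [1, 0, 1, 0], [1, 0, 0, 1]], dtype=float)
y = np.array([101, 105, 94, 84, 88, 32], dtype=float)
XtX = X.T @ X
N = len(y)

G = np.zeros((4, 4))                      # illustrative generalized inverse
G[1:, 1:] = np.linalg.inv(XtX[1:, 1:])

proj = X @ G @ X.T                        # XGX'
J = np.ones((N, N)) / N                   # N^{-1} 11'

SST = y @ y                               # total, uncorrected
SSM = y @ J @ y                           # N * ybar^2, fitting a general mean
SSR = y @ proj @ y                        # reduction due to fitting the model
SSR_m = y @ (proj - J) @ y                # reduction corrected for the mean
SSE = y @ (np.eye(N) - proj) @ y          # residual
SST_m = SST - SSM                         # total corrected for the mean

print(SST, SSM, SSR, SSR_m, SSE, SST_m)
```

For these data the pieces are 45886, 42336, 45816, 3480, 70 and 3550, and each column of the partition adds up to its total.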

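g. Coefficient of Determination $R^2$

The estimated expected values of $y$ are $\hat{y}$.

The coefficient of determination is the square of the product-moment correlation between the observations $y$ and $\hat{y}$, so

$$
R^2 = \frac{\left[\sum_{i=1}^{N} (y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}})\right]^2}
           {\sum_{i=1}^{N} (y_i - \bar{y})^2 \, \sum_{i=1}^{N} (\hat{y}_i - \bar{\hat{y}})^2}
$$

Note:

$$X'XGX' = X'$$

and because $1'$ is the first row of $X'$, $1'XGX' = 1'$. Hence $\bar{\hat{y}} = N^{-1}1'XGX'y = N^{-1}1'y = \bar{y}$, and thus

$$R^2 = \frac{SSR_m^2}{SST_m \, SSR_m} = \frac{SSR_m}{SST_m}$$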
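Both expressions for $R^2$ can be compared on the example data. The sketch below assumes NumPy and uses `np.corrcoef` for the product-moment correlation; the generalized inverse and the variable names are illustrative choices.

```python
import numpy as np

X = np.array([[1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 0, 0],
              [1, 0, 1, 0], [1, 0, 1, 0], [1, 0, 0, 1]], dtype=float)
y = np.array([101, 105, 94, 84, 88, 32], dtype=float)
XtX = X.T @ X

G = np.zeros((4, 4))                      # illustrative generalized inverse
G[1:, 1:] = np.linalg.inv(XtX[1:, 1:])
y_hat = X @ G @ X.T @ y                   # fitted values

# The fitted values have the same mean as the observations.
same_mean = np.isclose(y_hat.mean(), y.mean())

# R^2 as the squared correlation between y and y_hat.
r2_corr = np.corrcoef(y, y_hat)[0, 1] ** 2

# R^2 as SSR_m / SST_m.
SSR_m = np.sum((y_hat - y.mean()) ** 2)
SST_m = np.sum((y - y.mean()) ** 2)
r2_ratio = SSR_m / SST_m

print(r2_corr, r2_ratio)   # both equal 3480 / 3550, about 0.98
```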