You are on page 1of 137

SPRINGER BRIEFS IN STATISTICS

Simo Puntanen
George P. H. Styan
Jarkko Isotalo

Formulas Useful
for Linear Regression
Analysis and Related
Matrix Theory
Its Only Formulas
But We Like Them

SpringerBriefs in Statistics

For further volumes:


http://www.springer.com/series/8921

Photograph 1 Tiritiri Island, Auckland, New Zealand. (Photo: SP)

Simo Puntanen
George P. H. Styan
Jarkko Isotalo

Formulas Useful
for Linear Regression
Analysis and Related
Matrix Theory
Its Only Formulas But We Like Them

123

George P. H. Styan
Department of Mathematics
and Statistics
McGill University
Montral, QC, Canada

Simo Puntanen
School of Information Sciences
University of Tampere
Tampere, Finland
Jarkko Isotalo
Department of Forest Sciences
University of Helsinki
Helsinki, Finland

ISSN 2191-544X
ISBN 978-3-642-32930-2
DOI 10.1007/978-3-642-32931-9

ISSN 2191-5458 (electronic)


ISBN 978-3-642-32931-9 (eBook)

Springer Heidelberg New York Dordrecht London


Library of Congress Control Number: 2012948184
The Author(s) 2013
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or
information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed. Exempted from this legal reservation are brief
excerpts in connection with reviews or scholarly analysis or material supplied specifically for the
purpose of being entered and executed on a computer system, for exclusive use by the purchaser
of the work. Duplication of this publication or parts thereof is permitted only under the provisions of
the Copyright Law of the Publishers location, in its current version, and permission for use must always
be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright
Clearance Center. Violations are liable to prosecution under the respective Copyright Law.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt
from the relevant protective laws and regulations and therefore free for general use.
While the advice and information in this book are believed to be true and accurate at the date of
publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for
any errors or omissions that may be made. The publisher makes no warranty, express or implied, with
respect to the material contained herein.
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)

Preface

Lie la lie, lie la la-lie lie la-lie.


There must be fifty ways to leave your lover.
Oh, still crazy after all these years.
Paul Simon1

Think about going to a lonely island for some substantial time and that you are supposed to decide what books to take with you. This book is then a serious alternative:
it does not only guarantee a good nights sleep (reading in the late evening) but also
oers you a survival kit in your urgent regression problems (denitely met at the day
time on any lonely island, see for example Photograph 1, p. ii).
Our experience is that even though a huge amount of the formulas related to linear
models is available in the statistical literature, it is not always so easy to catch them
when needed. The purpose of this book is to collect together a good bunch of helpful
ruleswithin a limited number of pages, however. They all exist in literature but are
pretty much scattered. The rst version (technical report) of the Formulas appeared
in 1996 (54 pages) and the fourth one in 2008. Since those days, the authors have
never left home without the Formulas.
This book is not a regular textbookthis is supporting material for courses given
in linear regression (and also in multivariate statistical analysis); such courses are
extremely common in universities providing teaching in quantitative statistical analysis. We assume that the reader is somewhat familiar with linear algebra, matrix
calculus, linear statistical models, and multivariate statistical analysis, although a
thorough knowledge is not needed, one year of undergraduate study of linear algebra and statistics is expected. A short course in regression would also be necessary
before traveling with our book. Here are some examples of smooth introductions to
regression: Chatterjee & Hadi (2012) (rst ed. 1977), Draper & Smith (1998) (rst
ed. 1966), Seber & Lee (2003) (rst ed. 1977), and Weisberg (2005) (rst ed. 1980).
The term regression itself has an exceptionally interesting history: see the excellent chapter entitled Regression towards Mean in Stigler (1999), where (on p. 177)
he says that the story of Francis Galtons (18221911) discovery of regression is an
exciting one, involving science, experiment, mathematics, simulation, and one of the
great thought experiments of all time.
1

From (1) The Boxer, a folk rock ballad written by Paul Simon in 1968 and rst recorded by Simon
& Garfunkel, (2) 50 Ways to Leave Your Lover, a 1975 song by Paul Simon, from his album Still
Crazy After All These Years, (3) Still Crazy After All These Years, a 1975 song by Paul Simon and
title track from his album Still Crazy After All These Years.

vi

Preface

This book is neither a real handbook: by a handbook we understand a thorough


representation of a particular area. There are some recent handbook-type books dealing with matrix algebra helpful for statistics. The book by Seber (2008) should be
mentioned in particular. Some further books are, for example, by Abadir & Magnus
(2005) and Bernstein (2009). Quick visits to matrices in linear models and multivariate analysis appear in Puntanen, Seber & Styan (2013) and in Puntanen & Styan
(2013).
We do not provide any proofs nor references. The book by Puntanen, Styan &
Isotalo (2011) oers many proofs for the formulas. The website http://www.sis.uta.
/tilasto/matrixtricks supports both these books by additional material.
Sincere thanks go to Gtz Trenkler, Oskar Maria Baksalary, Stephen J. Haslett,
and Kimmo Vehkalahti for helpful comments. We give special thanks to Jarmo Niemel for his outstanding LATEX assistance. The Figure 1 (p. xii) was prepared using
the Survo software, online at http://www.survo. (thanks go to Kimmo Vehkalahti)
and the Figure 2 (p. xii) using PSTricks (thanks again going to Jarmo Niemel).
We are most grateful to Alice Blanck, Ulrike Stricker-Komba, and to Niels Peter
Thomas of Springer for advice and encouragement.
This research has been supported in part by the Natural Sciences and Engineering
Research Council of Canada.
SP, GPHS & JI
June 7, 2012
MSC 2000: 15-01, 15-02, 15A09, 15A42, 15A99, 62H12, 62J05.
Key words and phrases: Best linear unbiased estimation, CauchySchwarz inequality, column space, eigenvalue decomposition, estimability, GaussMarkov model,
generalized inverse, idempotent matrix, linear model, linear regression, Lwner ordering, matrix inequalities, oblique projector, ordinary least squares, orthogonal projector, partitioned linear model, partitioned matrix, rank cancellation rule, reduced
linear model, Schur complement, singular value decomposition.

References
Abadir, K. M. & Magnus, J. R. (2005). Matrix Algebra. Cambridge University Press.
Bernstein, D. S. (2009). Matrix Mathematics: Theory, Facts, and Formulas. Princeton University
Press.
Chatterjee, S. & Hadi, A. S. (2012). Regression Analysis by Example, 5th Edition. Wiley.
Draper, N. R. & Smith, H. (1998). Applied Regression Analysis, 3rd Edition. Wiley.
Puntanen, S., Styan, G. P. H. & Isotalo, J. (2011). Matrix Tricks for Linear Statistical Models: Our
Personal Top Twenty. Springer.
Puntanen, S. & Styan, G. P. H. (2013). Chapter 52: Random Vectors and Linear Statistical Models.
Handbook of Linear Algebra, 2nd Edition (Leslie Hogben, ed.), Chapman & Hall, in press.
Puntanen, S., Seber, G. A. F. & Styan, G. P. H. (2013). Chapter 53: Multivariate Statistical Analysis.
Handbook of Linear Algebra, 2nd Edition (Leslie Hogben, ed.), Chapman & Hall, in press.
Seber, G. A. F. (2008). A Matrix Handbook for Statisticians. Wiley.
Seber, G. A. F. & Lee, A. J. (2006). Linear Regression Analysis, 2nd Edition. Wiley.
Stigler, S. M. (1999). Statistics on the Table: The History of Statistical Concepts and Methods.
Harvard University Press.
Weisberg, S. (2005). Applied Linear Regression, 3rd Edition. Wiley.

Contents

Formulas Useful for Linear Regression Analysis and Related Matrix Theory . . 1
1
The model matrix & other preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
2
Fitted values and residuals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3
Regression coecients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
4
Decompositions of sums of squares . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
5
Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
6
Best linear predictor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
7
Testing hypotheses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
8
Regression diagnostics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
9
BLUE: Some preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
10 Best linear unbiased estimator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
11 The relative eciency of OLSE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
12 Linear suciency and admissibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
13 Best linear unbiased predictor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
14 Mixed model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
15 Multivariate linear model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
16 Principal components, discriminant analysis, factor analysis . . . . . . . . . . . 74
17 Canonical correlations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
18 Column space properties and rank rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
19 Inverse of a matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
20 Generalized inverses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
21 Projectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
22 Eigenvalues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
23 Singular value decomposition & other matrix decompositions . . . . . . . . . . 105
24 Lwner ordering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
25 Inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
26 Kronecker product, some matrix derivatives . . . . . . . . . . . . . . . . . . . . . . . . . 115
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117

vii

Notation

Rnm

the set of n  m real matrices: all matrices considered in this book


are real

Rnm
r

the subset of Rnm consisting of matrices with rank r

NNDn

the subset of symmetric n  n matrices consisting of nonnegative


denite (nnd) matrices

PDn

the subset of NNDn consisting of positive denite (pd) matrices

null vector, null matrix

1n

column vector of ones, shortened 1

In

identity matrix, shortened I

ij

the j th column of I; j th standard basis vector

Anm D faij g

n  m matrix A with its elements aij , A D .a1 W : : : W am / presented columnwise, A D .a.1/ W : : : W a.n/ /0 presented row-wise

column vector a 2 Rn

A0

transpose of matrix A; A is symmetric if A0 D A, skew-symmetric


if A0 D A

.A W B/

partitioned (augmented) matrix

1

inverse of matrix Ann : AB D BA D In H) B D A1

A

generalized inverse of matrix A: AA A D A

the MoorePenrose inverse of matrix A: AAC A D A; AC AAC D


AC ; .AAC /0 D AAC ; .AC A/0 D AC A

A1=2

nonnegative denite square root of A 2 NNDn

C1=2

nonnegative denite square square root of AC 2 NNDn

ha; bi

standard inner product in Rn : ha; bi D a0 b

ha; biV

inner product a0 Vb; V is a positive denite inner product matrix


(ipm)

ix

Notation

kak

Euclidean norm (standard norm, 2-norm) of vector a: kak2 D a0 a,


also denoted as kak2

kakV

kak2V D a0 Va, norm when the ipm is positive denite V

kAkF

Euclidean (Frobenius) norm of matrix A: kAk2F D tr.A0 A/

det.A/

determinant of matrix A, also denoted as jAj

diag.d1 ; : : : ; dn / n  n diagonal matrix with listed diagonal entries


diag.A/

diagonal matrix formed by the diagonal entries of Ann , denoted


also as A

r.A/

rank of matrix A, denoted also as rank.A/

tr.A/

trace of matrix Ann , denoted also as trace.A/: tr.A/ D a11 C


a22 C    C ann

A L 0

A is nonnegative denite: A D LL0 for some L

A >L 0

A is positive denite: A D LL0 for some invertible L

A L B

A  B is nonnegative denite, Lwner partial ordering

A >L B

A  B is positive denite

cos.a; b/

the cosine of the angle, , between the nonzero vectors a and b:


cos.a; b/ D cos  D ha; bi=.kakkbk/

vec.A/

the vector of columns of A, vec.Anm / D .a01 ; : : : ; a0m /0 2 Rnm

AB

Kronecker product of Anm and Bpq :


1
0
a11 B : : : a1m B
::
:: C
B :
npmq
A B D @ ::
:
: A2R
an1 B : : : anm B
 11 A12 
Schur complement of A11 in A D A
A21 A22 : A221 D A22 
A
D
A=A
A21 A
12
11
11

A221
PA

orthogonal projector onto C .A/ w.r.t. ipm I: PA D A.A0 A/ A0 D


AAC

PAIV

orthogonal projector onto C .A/ w.r.t. ipm V: PAIV D


A.A0 VA/ A0 V

PAjB

projector onto C .A/ along C .B/: PAjB .A W B/ D .A W 0/

C .A/

column space of matrix Anm : C .A/ D f y 2 Rn W y D Ax for


some x 2 Rm g

N .A/

null space of matrix Anm : N .A/ D f x 2 Rm W Ax D 0 g

C .A/?

orthocomplement of C .A/ w.r.t. ipm I: C .A/? D N .A0 /

A?

matrix whose column space is C .A? / D C .A/? D N .A0 /

C .A/?
V

orthocomplement of C .A/ w.r.t. ipm V

A?
V

?
?
1 ?
matrix whose column space is C .A/?
V : AV D .VA/ D V A

Notation

xi

chi .A/ D i

the i th largest eigenvalue of Ann (all eigenvalues being real):


.i ; ti / is the i th eigenpair of A: Ati D i ti , ti 0

ch.A/

set of all n eigenvalues of Ann , including multiplicities

nzch.A/

set of nonzero eigenvalues of Ann , including multiplicities

sgi .A/ D i

the i th largest singular value of Anm

sg.A/

set of singular values of Anm

U CV

sum of vector spaces U and V

U V

direct sum of vector spaces U and V; here U \ V D f0g

U V
vard .y/ D

direct sum of orthogonal vector spaces U and U


sy2

sample variance: argument is variable vector y 2 Rn :


vard .y/ D

1
y0 Cy
n1

covd .x; y/ D sxy sample covariance:


covd .x; y/ D

1
x0 Cy
n1

1
n1

n
X

.xi  xN i /.yi  y/
N

iD1

p
cord .x; y/ D rxy sample correlation: rxy D x0 Cy= x0 Cx  y0 Cy, C is the centering
matrix
E.x/

expectation of a p-dimensional random vector x: E.x/ D x D


.1 ; : : : ; p /0 2 Rp

cov.x/ D

covariance matrix (p  p) of a p-dimensional random vector x:


cov.x/ D D fij g D E.x  x /.x  x /0 ;
cov.xi ; xj / D ij D E.xi  i /.xj  j /;
var.xi / D i i D i2

cor.x/

correlation matrix of a p-dimensional random vector x:




ij
cor.x/ D f%ij g D
i j

xii
y
15
10
5
0
-5
-10
-15
x
-15

-10

-5

10

15

Figure 1 Observations from N2 .0; /; x D 5, y D 4, %xy D 0:7. Regression line has the
slope O1  %xy y =x . Also the regression line of x on y is drawn. The direction of the rst
major axis of the contour ellipse is determined by t1 , the rst eigenvector of .

X
y

SST

SSE

Hy

SST

SSE

SSE

SSR

H
y

Jy

Figure 2 Illustration of SST D SSR C SSE.

JHy

SSR

Formulas Useful for Linear Regression Analysis


and Related Matrix Theory

1 The model matrix & other preliminaries


1.1

Linear model. By M D fy; X;  2 Vg we mean that we have the model y D


X C ", where E.y/ D X 2 Rn and cov.y/ D  2 V, i.e., E."/ D 0 and
cov."/ D  2 V; M is often called the GaussMarkov model.
 y is an observable random vector, " is unobservable random error vector,
X D .1 W X0 / is a given n  p (p D k C 1) model (design) matrix, the
vector D .0 ; 1 ; : : : ; k /0 D .0 ; 0x /0 and scalar  2 > 0 are unknown
 yi D E.yi / C "i D 0 C x0.i/ x C "i D 0 C 1 xi1 C    C k xik C "i ,
where x0.i/ D the i th row of X0
 from the context it is apparent when X has full column rank; when distributional properties are considered, we assume that y  Nn .X;  2 V/
 according to the model, we believe that E.y/ 2 C .X/, i.e., E.y/ is a linear
combination of the columns of X but we do not know which linear combination
 from the context it is clear which formulas require that the model has the
intercept term 0 ; p refers to the number of columns of X and hence in the
no-intercept model p D k
 if the explanatory variables xi are random variables, then the model M
may be interpreted as the conditional model of y given X: E.y j X/ D X,
cov.y j X/ D  2 V, and the error term is dierence y  E.y j X/. In short,
regression is the study of how the conditional distribution of y, when x is
given, changes with the value of x.

S. Puntanen et al., Formulas Useful for Linear Regression


Analysis and Related Matrix Theory, SpringerBriefs in Statistics,
c The Author(s) 2013
DOI: 10.1007/978-3-642-32931-9_1, 

Formulas Useful for Linear Regression Analysis and Related Matrix Theory

1
x0.1/
C
B
X D .1 W X0 / D .1 W x1 W : : : W xk / D @ ::: A 2 Rn.kC1/
model matrix
0
n

p,
p DkC1
x.n/
0 0 1 0
1
x.1/
x11 x12 : : : x1k
B :: C @ ::
::
:: A
2 Rnk
X0 D .x1 W : : : W xk / D @ : A D
:
:
:
0
data matrix
xn1 xn2 : : : xnk
x.n/
of x1 ; : : : ; xk
0

1.2

1.3

1.4

1 D .1; : : : ; 1/0 2 Rn ; ii D .0; : : : ; 1 (i th) : : : ; 0/0 2 Rn ;


i0i X D x0.i/ D .1; x0.i/ / D the i th row of X;
i0i X0 D x0.i/ D the i th row of X0

1.5

variable vectors in variable space Rn

x1 ; : : : ; xk

observation vectors in observation space Rk

x.1/ ; : : : ; x.n/
1.6

Xy D .X0 W y/ 2 Rn.kC1/

1.7

J D 1.10 1/1 10 D n1 110 D P1 D Jn D orthogonal projector onto C .1n /


N y;
N : : : ; y/
N 0 2 Rn
Jy D y1
N D yN D .y;

1.8

IJDC

1.9

.I  J/y D Cy D y  y1
N n
N : : : ; yn  y/
N 0
D y  yN D yQ D .y1  y;

1.10

joint data matrix of x1 ; : : : ; xk and response y

C D Cn D orthogonal projector onto C .1n /? ,


centering matrix

xN D .xN 1 ; : : : ; xN k /0 D n1 X00 1n
D n1 .x.1/ C    C x.n/ / 2 Rk

1.11

1.12

centered y

0 0
xN
:
0
@
N D ::
D .xN 1 W : : : W xN k W yN / D 1.Nx ; y/
xN 0

N
JXy D .JX0 W Jy/ D .xN 1 1 W : : : W xN k 1 W y1/

vector of x-means
1
yN
:: A
n.kC1/
: 2R
yN

Q 0 D .I  J/X0 D CX0 D .x1  xN 1 W : : : W xk  xN k / D .Qx1 W : : : W xQ k /


X
1 0 0 1
0 0
xQ .1/
x.1/  xN 0
C
B
B
:
: C
nk
::
D@
centered X0
A D @ :: A 2 R
0
0
0
x.n/  xN
xQ .n/

1 The model matrix & other preliminaries

1.13

Q y D .I  J/Xy
X
Q 0 W y  yN / D .X
Q 0 W yQ /
D CXy D .CX0 W Cy/ D .X

1.14

centered Xy


 0
  0
Q Q0Q
Q X
X0 CX0 X00 Cy
X
0 0 X0 y
D
Q 0 yQ 0 yQ
y0 CX0 y0 Cy
yQ 0 X
1
0
t11 t12 : : : t1k t1y
 

::
::
:: C
B ::
txy
ssp.X0 / ssp.X0 ; y/
:
:
:
: C
D
DB
@tk1 tk2 : : : tkk tky A
tyy
ssp.y; X0 / ssp.y/
ty1 ty2 : : : tyk tyy

Qy D
Q y0 X
T D Xy0 .I  J/Xy D X

T
D 0xx
txy

D ssp.X0 W y/ D ssp.Xy / corrected sums of squares and cross products


1.15

Q 0 D X00 CX0 D
Q 00 X
Txx D X
D

n
X

n
X

xQ .i/ xQ 0.i/ D

iD1

n
X

.x.i/  xN /.x.i/  xN /0

iD1

x.i/ x0.i/  nNxxN 0 D fx0i Cxj g D ftij g D ssp.X0 /

iD1



Sxx sxy
D 0
sxy sy2

1.16

S D covd .Xy / D covd .X0 W y/ D




covd .X0 / covd .X0 ; y/
D
sample covariance matrix of xi s and y
covd .y; X0 / vard .y/
  

x
covs .x/ covs .x; y/
D
here x is the vector
D covs
y
covs .y; x/ vars .y/
of xs to be observed

1.17

T D diag.T/ D diag.t11 ; : : : ; tkk ; tyy /;

1
T
n1

S D diag.S/ D diag.s12 ; : : : ; sk2 ; sy2 /


1.18

Q D .xQ W : : : W xQ W yQ /
X
y
1
k
Q
Q0X
Q y T1=2 ; centering & scaling: diag.X
DX
y y / D IkC1

1.19

While calculating the correlations, we assume that all variables have nonzero
variances, that is, the matrix diag.T/ is positive denite, or in other words:
xi C .1/, i D 1; : : : ; k, y C .1/.

1.20

R D cord .X0 W y/ D cord .Xy /

sample correlation matrix of xi s and y

D S1=2
SS1=2
D T1=2
TT1=2

! 

 
Q X
Q 0 yQ
QX0 X
Rxx rxy
cord .X0 / cord .X0 ; y/
Q D
Q0X
0 0
0
DX
D
D
y y
Q
r0xy 1
1
cord .y; X0 /
yQ 0 yQ
yQ 0 X
0

1.21

Formulas Useful for Linear Regression Analysis and Related Matrix Theory



T t
T D 0xx xy ;
txy tyy
0

1
t1y
D @ ::: A ;
tky

1.22

txy

1.23

SSy D

Sxx sxy
s0xy sy2


D

sxy

1
s1y
D @ ::: A ;
sky

RD

1
T;
n1



Rxx rxy
r0xy 1

rxy

1
r1y
D @ ::: A
rky

n
n
n
X
2
X
X
.yi  y/
N 2D
yi2  n1
yi
iD1


SD

n
X

iD1

iD1

yi2  nyN 2 D y0 Cy D y0 y  y0 Jy

iD1

1.24

SPxy D
D

n
X
iD1
n
X

.xi  x/.y
N i  y/
N D

n
X

xi yi 

1
n

n
X

iD1

xi

n
X

iD1


yi

iD1

xi yi  nxN yN D x0 Cy D x0 y  x0 Jy

iD1

1.25

sxy D covs .x; y/ D


1.26

1
y0 Cy;
n1
1
covd .x; y/ D n1
x0 Cy

sy2 D vars .y/ D vard .y/ D

rij D cors .xi ; xj / D cord .xi ; xj / D cos.Cxi ; Cxj / D cos.Qxi ; xQj / D xQ 0i xQj
x0 Cxj
sij
SPij
tij
D
Dp 0 i
Dp
Dp
0
ti i tjj
si sj
SSi SSj
xi Cxi  xj Cxj

sample
correlation

1.27

If x and y are centered then cord .x; y/ D cos.x; y/.

1.28

Keeping observed data as a theoretical distribution. Let u1 , . . . , un be the observed values of some empirical variable u, and let u be a discrete random
variable whose values are u1 , . . . , un , each with P.u D ui / D n1 . Then
s 2 . More generally, consider a data matrix
E.u / D uN and var.u / D n1
n u
0
U D .u.1/ W : : : W u.n/ / and dene a discrete random vector u with probability function P.u D u.i/ / D n1 , i D 1; : : : ; n. Then
N
E.u / D u;

cov.u / D n1 U0 CU D

n1
n

covd .U/ D

n1
S:
n

Moreover, the sample correlation matrix of data matrix U is the same as the
(theoretical, population) correlation matrix of u . Therefore, any property
shown for population statistics, holds for sample statistics and vice versa.
1.29

Mahalanobis distance. Consider a data matrix Unp D .u.1/ W : : : W u.n/ /0 ,


where covd .U/ D S 2 PDp . The (squared) sample Mahalanobis distance of

1 The model matrix & other preliminaries

the observation u.i/ from the mean vector uN is dened as


N S/ D .u.i/  u/
N 0 S1 .u.i/  u/
N D kS1=2 .u.i/  u/k
N 2:
MHLN2 .u.i/ ; u;
Q D PCU . If x is a random
N S/ D .n1/hQ i i , where H
Moreover, MHLN2 .u.i/ ; u;
p
vector with E.x/ D  2 R and cov.x/ D 2 PDp , then the (squared)
Mahalanobis distance between x and  is the random variable
MHLN2 .x; ; / D .x/0 1 .x/ D z0 zI
1.30

z D 1=2 .x/:

Statistical distance. The squared Euclidean distance of the i th observation u.i/


N 2 . Given the data matrix Unp , one
from the mean uN is of course ku.i/  uk
may wonder if there is a more informative way, in statistical sense, to measure
N Consider a new variable D a0 u so that the n
the distance between u.i/ and u.
N
values of are in the variable vector z D Ua. Then i D a0 u.i/ and N D a0 u,
and we may dene
N
ja0 .u.i/  u/j
N
ji  j
D
Di .a/ D p
;
p
0
a Sa
vars ./

where S D covd .U/.

Let us nd a vector a which maximizes Di .a/. In view of 22.24c (p. 102),


N 0 S1 .u.i/  u/
N D MHLN2 .u.i/ ; u;
N S/:
max Di2 .a/ D .u.i/  u/
a0

N
The maximum is attained for any vector a proportional to S1 .u.i/  u/.
1.31

C .A/ D the column space of Anm D .a1 W : : : W am /


D f z 2 Rn W z D At D a1 t1 C    C am tm for some t 2 Rm g  Rn

1.32

C .A/? D the orthocomplement of C .A/


D the set of vectors which are orthogonal (w.r.t. the standard inner
product u0 v) to every vector in C .A/
D f u 2 Rn W u0 At D 0 for all t g D f u 2 Rn W A0 u D 0 g
D N .A0 / D the null space of A0

1.33

Linear independence and rank.A/. The columns of Anm are linearly independent i N .A/ D f0g. The rank of A, r.A/, is the maximal number of
linearly independent columns (equivalently, rows) of A; r.A/ D dim C .A/.

1.34

A? D a matrix whose column space is C .A? / D C .A/? D N .A0 /:


Z 2 fA? g () A0 Z D 0 and r.Z/ D n  r.A/ D dim C .A/? :

1.35

The rank of the model matrix X D .1 W X0 / can be expressed as


r.X/ D 1 C r.X0 /  dim C .1/ \ C .X0 / D r.1 W CX0 /
D 1 C r.CX0 / D 1 C r.Txx / D 1 C r.Sxx /;

Formulas Useful for Linear Regression Analysis and Related Matrix Theory

and thereby
r.Sxx / D r.X/  1 D r.CX0 / D r.X0 /  dim C .1/ \ C .X0 /:
If all x-variables have nonzero variances, i.e., the correlation matrix Rxx is
properly dened, then r.Rxx / D r.Sxx / D r.Txx /. Moreover,
Sxx is pd () r.X/ D k C 1 () r.X0 / D k and 1 C .X0 /:
1.36

 In 1.1 vector y is a random vector but for example in 1.6 y is an observed


sample value.
 cov.x/ D D fij g refers to the covariance matrix (p  p) of a random
vector x (with p elements), cov.xi ; xj / D ij , var.xi / D i i D i2 :
cov.x/ D D E.x  x /.x  x /0
D E.xx0 /  x 0x ;

x D E.x/:

 notation x  .; / indicates that E.x/ D  and cov.x/ D


 the determinant det./ is called the (population) generalized variance


 cor.x/ D % D iijj refers to the correlation matrix of a random vector x:


1=2
;
cor.x/ D % D 1=2

D 1=2
% 1=2

 cov.Ax/ D A cov.x/A0 D AA0 , A 2 Rap , E.Ax/ D Ax


 E 1=2 .x  / D 0,

cov 1=2 .x  / D Ip , when x  .; /

 cov.T0 x/ D , if D TT0 is the eigenvalue decomposition of


 var.a0 x/ D a0 cov.x/a D a0 a  0 for all a 2 Rp and hence every
covariance matrix is nonnegative denite; is singular i there exists a
nonzero a 2 Rp such that a0 x D a constant with probability 1
 var.x1 x2 / D 12 C 22 212

  
 
x
cov.x/ cov.x; y/
xx xy
,
 cov
D
D
yx yy
y
cov.y; x/ cov.y/
  

xx  xy
x
cov
D
 0xy y2
y
 cov.x; y/ refers to the covariance matrix between random vectors x and y:
cov.x; y/ D E.x  x /.y  y /0 D E.xy0 /  x 0y D xy
 cov.x; x/ D cov.x/
 cov.Ax; By/ D A cov.x; y/B0 D A xy B0
 cov.Ax C By/ D A xx A0 C B yy B0 C A xy B0 C B yx A0

1 The model matrix & other preliminaries

 cov.Ax; By C Cz/ D cov.Ax; By/ C cov.Ax; Cz/


 cov.a0 x; y/ D a0 cov.x; y/ D a0  xy D a1 1y C    C ap py
 cov.z/ D I2 ,

 2


x
x xy
p0
H)
cov.Az/
D
;
AD
xy y2
y % y 1  %2
i.e.,


 cov

 
u
cov.z/ D cov
D I2
v
  2


p
x xy
xu


D
H) cov
xy y2
y %u C 1  %2 v
1
xy =x2

! 
  

x
0
2
0
x
xy
D x 2
D cov
y  2 x
0 y .1  %2 /
1
y
x

 covd .Unp / D covs .u/ refers to the sample covariance matrix


n
X
1
1
N .i/  u/
N 0DS
covd .U/ D n1
U0 CU D n1
.u.i/  u/.u
iD1

 the determinant det.S/ is called the (sample) generalized variance


 covd .UA/ D A0 covd .U/A D A0 SA D covs .A0 u/
 covd .US1=2 / D Ip D covd .CUS1=2 /
 U D CUS1=2 : U is centered and transformed so that the new variN
N and each has variance 1. Moreover, diag.UU0 / D
ables
are uncorrelated
NN
2
2
N S/.
diag.d1 ; : : : ; dn /, where di2 D MHLN2 .u.i/ ; u;
 U D CUdiag.S/1=2 : U is centered and scaled so that each variable
has variance 1 (and the squared length n  1)
Q is centered and scaled so that each variable
Q D CUdiag.U0 CU/1=2 : U
 U
1
has length 1 (and variance n1 )
 Denote U# D CUT, where Tpp comprises the orthonormal eigenvectors
of S W S D TT0 , D diag.1 ; : : : ; p /. Then U# is centered and transformed so that the new variables are uncorrelated and the i th variable has
variance i : covd .U# / D .
 vars .u1 C u2 / D covd .Un2 12 / D 10 covd .U/1 D 10 S22 1 D s12 C s22 C
2s12
 vars .u1 u2 / D s12 C s22 2s12

Formulas Useful for Linear Regression Analysis and Related Matrix Theory

2 Fitted values and residuals


2.1

H D X.X0 X/ X0 D XXC D PX


0

D X.X X/
2.2

1

X;

orthogonal projector onto C .X/

when r.X/ D p

H D P1 C P.IJ/X0 D P1 C PXQ 0

X D .1 W X0 /

Q 0 / X
Q 00 D J C X
Q0
Q 00 X
Q 0 T
Q 0 .X
DJCX
xx X0

Q 0 D .I  J/X0 D CX0
X

2.3

Q 0 / X
Q 00 D PC .X/\C .1/?
Q 0 .X
Q 00 X
H  J D PXQ 0 D X

2.4

H D P X1 C P M1 X2 I
M1 D I  P X1 ;

X D .X1 W X2 /;

X1 2 Rnp1 ;

X2 2 Rnp2

2.5

H  PX1 D PM1 X2 D PC .X/\C .M1 / D M1 X2 .X02 M1 X2 / X02 M1

2.6

C .M1 X2 / D C .X/ \ C .M1 /;


C .M1 X2 /? D N .X02 M1 / D C .X/?  C .X1 /

2.7

C .Cx/ D C .1 W x/ \ C .1/? ;

C .Cx/? D C .1 W x/?  C .1/

2.8

r.X02 M1 X2 / D r.X02 M1 / D r.X2 /  dim C .X1 / \ C .X2 /

2.9

The matrix X02 M1 X2 is pd i r.M1 X2 / D p2 , i.e., i C .X1 / \ C .X2 / D f0g


and X2 has full column rank. In particular, Txx D X00 CX0 is pd i r.CX0 / D
k i r.X/ D k C 1 i
C .X0 / \ C .1/ D f0g and X0 has full column rank.

2.10

H D X1 .X01 M2 X1 / X01 M2 C X2 .X02 M1 X2 / X02 M1


i C .X1 / \ C .X2 / D f0g

2.11

MDIH

2.12

M D I  .PX1 C PM1 X2 / D M1  PM1 X2 D M1 .I  PM1 X2 /

2.13

c D OLSE.X/
yO D Hy D XO D X

2.14

Because yO is the projection of y onto C .X/, it depends only on C .X/, not on


a particular choice of X D X , as long as C .X/ D C .X /. The coordinates
O depend on the choice of X.
of yO with respect to X, i.e., ,

orthogonal projector onto C .X/? D N .X0 /

OLSE of X, the tted values

2 Fitted values and residuals

2.15

yO D Hy D XO D 1O0 C X0 O x D O0 1 C O1 x1 C    C Ok xk
1
N 0 T1
D .J C PXQ 0 /y D Jy C .I  J/X0 T1
xx t xy D .yN  x
xx t xy /1 C X0 Txx t xy

2.16

yO D X1 O 1 C X2 O 2 D X1 .X01 M2 X1 /1 X01 M2 y C X2 .X02 M1 X2 /1 X02 M1 y


D .PX1 C PM1 X2 /y D X1 .X01 X1 /1 X01 y C M1 X2 .X02 M1 X2 /1 X02 M1 y
D PX1 y C M1 X2 O 2

here and in 2.15 r.X/ D p

2.17

OLS criterion. Let O be any vector minimizing ky  Xk2 . Then XO is


OLSE.X/. Vector XO is always unique but O is unique i r.X/ D p. Even
if r.X/ < p, O is called the OLSE of even though it is not an ordinary estimator because of its nonuniqueness; it is merely a solution to the minimizing
problem. The OLSE of K0 is K0 O which is unique i K0 is estimable.

2.18

Normal equations. Let O 2 Rp be any solution to normal equation X0 X D


X0 y. Then O minimizes ky  Xk. The general solution to X0 X D X0 y is
O D .X0 X/ X0 y C Ip  .X0 X/ X0 Xz;
where z 2 Rp is free to vary and .X0 X/ is an arbitrary (but xed) generalized
inverse of X0 X.

2.19

Generalized normal equations. Let Q 2 Rp be any solution to the generalized


normal equation X0 V1 X D X0 V1 y, where V 2 PDn . Then Q minimizes
ky  XkV1 . The general solution to the equation X0 V1 X D X0 V1 y is
Q D .X0 V1 X/ X0 V1 y C Ip  .X0 V1 X/ X0 V1 Xz;
where z 2 Rp is free to vary.

2.20

Under the model fy; X;  2 Ig the following holds:


(a) E.Oy/ D E.Hy/ D X;
(b) cov.Oy; y  yO / D 0;

cov.Oy/ D cov.Hy/ D  2 H

cov.y; y  yO / D  2 M

(c) yO  Nn .X;  2 H/ under normality


O where x0 is the i th row of X
(d) yOi D x0.i/ ,
.i/

yOi  N.x0.i/ ;  2 hi i /

(e) "O D y  yO D .In  H/y D My D res.yI X/

"O D the residual vector

(f) " D y  X;

E."/ D 0;

cov."/ D  2 In

" D error vector

O D 0, cov."/
O D  2 M and hence the components of the residual
(g) E."/
vector "O may be correlated and have unequal variances
(h) "O D My  Nn .0;  2 M/ when normality is assumed
O
(i) "Oi D yi  yOi D yi  x0.i/ ;

"Oi  N0;  2 .1  hi i /,

"Oi D the i th
residual

10

Formulas Useful for Linear Regression Analysis and Related Matrix Theory

(j) var.O"i / D  2 .1  hi i / D  2 mi i
(k) cor.O"i ; "Oj / D
(l)
2.21

hij
mij
D
1=2
.1  hi i /.1  hjj /
.mi i mjj /1=2

"Oi
 N.0; 1/;
p
 1  hi i

"i
 N.0; 1/


Under the intercept model fy; .1 W X0 /;  2 Ig the following holds:



 0
y0 .I  J/Hy
y .H  J/y 1=2
(a) cord .y; yO / D p
D
y0 .I  J/y
y0 .I  J/y  y0 H.I  J/Hy


SSR 1=2
D
D R D Ryx D the multiple correlation
SST
O D0
(b) cord .Oy; "/

the tted values and residuals are uncorrelated

O D0
(c) cord .xi ; "/

each xi -variable is uncorrelated


with the residual vector

O D .C/.1  R2 /1=2 ( 0)
(d) cord .y; "/
(e) "O 0 1 D y0 M1 D 0, i.e.,
0O

Pn

the residual vector "O is centered

D0

(f) "O y D 0;

"O xi D 0

(g) yO 0 1 D y0 H1 D y0 1, i.e.,
2.22

Oi
iD1 "

y and "O may be


positively correlated

1
n

Pn
iD1

yOi D yN

the mean of yOi -values is yN

Under the model fy; X;  2 Vg we have


(a) E.Oy/ D X;
O D ;
(b) E./

cov.Oy/ D  2 HVH;

cov.Hy; My/ D  2 HVM,

O D  2 .X0 X/1 X0 VX.X0 X/1 .


cov./

[if r.X/ D p]

3 Regression coecients

3.1

In this section we consider the model fy; X;  2 Ig, where r.X/ D p most of
the time and X D .1 W X0 /. As regards distribution, y  Nn .X;  2 I/.
!
O0

0
1
0
estimated regression
O D .X X/ X y D
2 RkC1
O x
coecients, OLSE./

3.2

O D ;
E./

O D  2 .X0 X/1 I
cov./

3.3

O0 D yN  O 0x xN D yN  .O1 xN 1 C    C Ok xN k /

O  NkC1 ;  2 .X0 X/1 


estimated constant term,
intercept

3 Regression coecients

3.4

11

O x D .O1 ; : : : ; Ok /0
1
Q 00 X
Q 0 /1 X
Q 00 yQ D T1
D .X00 CX0 /1 X00 Cy D .X
xx t xy D Sxx sxy

X D .1 W x/;

O0 D yN  O1 x;
N

sy
SPxy
sxy
O1 D
D 2 D rxy
SSx
sx
sx

3.5

k D 1;

3.6

If the model does not have the intercept term, we denote p D k, X D X0 ,


and O D .O1 ; : : : ; Op /0 D .X0 X/1 X0 y D .X00 X0 /1 X00 y.

3.7

If X D .X1 W X2 /, Mi D I  PXi , Xi 2 Rnpi , i D 1; 2, then


! 

O1
.X01 M2 X1 /1 X01 M2 y

and
O D O D
.X02 M1 X2 /1 X02 M1 y
2

3.8

O 1 D .X01 X1 /1 X01 y  .X01 X1 /1 X01 X2 O 2 D .X01 X1 /1 X01 .y  X2 O 2 /:

3.9

Denoting the full model as M12 D fy; .X1 W X2 /;  2 Ig and


M1 D fy; X1 1 ;  2 Ig;

small model
2

M122 D fM2 y; M2 X1 1 ;  M2 g; with M2 D I  PX2 ;


O i .A / D OLSE of i under the model A ;

reduced
model

we can write 3.8 as


(a) O 1 .M12 / D O 1 .M1 /  .X01 X1 /1 X01 X2 O 2 .M12 /,
and clearly we have
(b) O 1 .M12 / D O 1 .M122 /.

(FrischWaughLovell theorem)

3.10

Let M12 D fy; .1 W X0 /;  2 Ig, D .0 W 0x /0 , M121 D fCy; CX0 x ;


 2 Cg D centered model. Then 3.9b means that x has the same OLSE in the
original model and in the centered model.

3.11

O 1 .M12 / D .X01 X1 /1 X01 y, i.e., the old regression coecients do not change
when the new regressors (X2 ) are added i X01 X2 O 2 D 0.

3.12

The following statements are equivalent:


(a) O 2 D 0,

(b) X02 M1 y D 0,

(c) y 2 N .X02 M1 / D C .M1 X2 /? ,

(d) pcord .X2 ; y j X01 / D 0 or y 2 C .X1 /I

3.13

X D .1 W X01 W X2 /
D .X1 W X2 /:

The old regression coecients do not change when one new regressor xk is
added i X01 xk D 0 or Ok D 0, with Ok D 0 being equivalent to x0k M1 y D 0.

12

3.14

3.15

Formulas Useful for Linear Regression Analysis and Related Matrix Theory

In the intercept model the old regression coecients O1 , O2 , . . . , Okp2 (of


real predictors) do not change when new regressors (whose values are in
X2 ) are added if cord .X01 ; X2 / D 0 or O 2 D 0; here X D .1 W X01 W X2 /.
Ok D

x0k .I  PX1 /y
x 0 M1 y
u0 v
D 0k
D 0 ,
0
xk .I  PX1 /xk
x k M1 x k
vv

where X D .X1 W xk / and

u D M1 y D res.yI X1 /, v D M1 xk D res.xk I X1 /, i.e., Ok is the OLSE of


k in the reduced model M121 D fM1 y; M1 xk k ;  2 M1 g.
3.16
3.17

y 0 M1 y
2
2
 0
D ryk12:::k1
 t kk  SSE.yI X1 /
Ok2 D ryk12:::k1
xk M1 x0k
Multiplying xk by a means that Ok will be divided by a.
Multiplying y by b means that Ok will be multiplied by b.
si
O i D Oi ;
sy

3.18

O D R1
xx rxy ;

3.19

k D 2 W O 1 D

3.20

Let X D .1 W X1 W X2 / WD .Z W X2 /, where X2 2 Rnp2 . Then

O 0 D 0;
standardized regr. coecients
(all variables centered & equal variance)

2 1=2
.1  r2y
/
r1y  r12 r2y
r2y  r12 r1y
;

O
D
;

O
D
r
2
1
1y2
2
2
2 1=2
1  r12
1  r12
.1  r12 /

O D  2 .X0 X/1
cov./

1


N 0 T1
n 10 X0
xN Nx0 T1
2
2 1=n C x
xx
xx
D
D
N
X00 1 X00 X0
T1
T1
xx x
xx
0
1
00
01
t
t
:
:
: t 0k
!
B 10 t 11 : : : t 1k C
var.O0 / cov.O0 ; O x /
C
2 Bt
:: : :
: C
D
D

B ::
@ :
: :: A
cov.O x ; O0 / cov.O x /
:
t k0 t k1 : : : t kk
 0



1
0

T2
2 Z Z Z X2
2 T
D
D
X02 Z X02 X2
T2 T22
 0

1

2 Z .I  PX2 /Z
; where
D

X02 .I  PZ /X2 1
0
1
(a) T22 D T1
221 D X2 .I  P.1WX1 / X2 

D .X02 M1 X2 /1 D X02 .C  PCX1 /X2 1


D X02 CX2  X02 CX1 .X01 CX1 /1 X01 CX2 1
1
D .T22  T21 T1
11 T12 / ,

M1 D I  P.1WX1 /

3 Regression coecients

13



T11 T12
2 Rkk ,
(b) Txx D .X1 W X2 /0 C.X1 W X2 / D
T21 T22


 
T1
,
D
xx
 T22
(c) t kk D x0k .I  PX1 /xk 1

X D .X1 W xk /, X1 D .1 W x1 W : : : W xk1 /

D 1=SSE.k/ D 1=SSE.xk explained by all other xs/


D 1=SSE.xk I X1 / D 1=tkkX1

corresp. result holds for all t i i

(d) t kk D .x0k Cxk /1 D 1=tkk i X01 Cxk D 0 i r1k D r2k D    D


rk1;k D 0,
(e) t 00 D

1
n

N D .n  10 PX0 1/1 D k.I  PX0 /1k1 .


C xN 0 T1
xx x

3.21

Under fy; .1 W x/;  2 Ig:




P 2

2
N 2 =SSx x=SS
xi =n xN
N
x
2 1=n C x
O
D
;
cov./ D 
x=SS
N
1=SSx
xN
1
SSx
x
x0 x
2
xN
; var.O0 / D  2
; cor.O0 ; O1 / D p
:
var.O1 / D
SSx
nSSx
x0 x=n

3.22

cov.O 2 / D  2 .X02 M1 X2 /1 ;

3.23

cov.O 1 j M12 / D  2 .X01 M2 X1 /1 L  2 .X01 X1 /1 D cov.O 1 j M1 /:


adding new regressors cannot decrease the variances of old regression coefcients.

3.24

Ri2 D R2 .xi explained by all other xs/


D R2 .xi I X.i/ /
SSR.i /
SSE.i /
D
D1
SST.i /
SST.i /

O 2 2 Rp2 ;

X D .X1 W X2 /;

X2 is n  p2

X.i/ D .1 W x1 ; : : : ; xi1 ; xiC1 ; : : : ; xk /


SSE.i / D SSE.xi I X.i/ /, SST.i / D ti i

1
D r i i ; i D 1; : : : ; k;
VIFi D variance ination factor
1  Ri2
SST.i /
ti i
ij
1
ij
D
D
D ti i t i i ; R1
xx D fr g; Txx D ft g
SSE.i /
SSE.i /

3.25

VIFi D

3.26

VIFi  1 and VIFi D 1

3.27

var.Oi / D  2 t i i
D

cord .xi ; X.i/ / D 0

VIFi
rii
2
2
D 2
D
;
D 2
SSE.i /
ti i
ti i
.1  Ri2 /ti i

i D 1; : : : ; k

14

Formulas Useful for Linear Regression Analysis and Related Matrix Theory

2
;
2
.1  r12
/ti i

3.28

k D 2 W var.Oi / D

3.29

cor.O1 ; O2 / D r1234:::k D  pcord .x1 ; x2 j X2 /;

3.30

O D O 2 .X0 X/1
cov./

3.31

var.
c Oi / D O 2 t i i D se2 .Oi /

3.32

se.Oi / D

3.33

Oi t=2Ink1 se.Oi /

3.34

cor.O1 ; O2 / D r12
X2 D .x3 W : : : W xk /

estimated covariance matrix of O

p
p
var.
c Oi / D O t i i

estimated variance of Oi
estimated stdev of Oi , standard error of Oi
.1  /100% condence interval for i

Best linear unbiased prediction, BLUP, of y under




   
X
y
2 In 0
I
;
; 
M D
0 1
x0#
y
a linear model with new future observation; see Section 13 (p. 65). Suppose
that X D .1 W X0 / and denote x0# D .1; x0 / D .1; x1 ; : : : ; xk /. Then
(a) y D x0# C "

new unobserved value y with


a given .1; x0 / under M

(b) yO D x0# O D O0 C x0 O x D O0 C O1 x1 C    C Ok xk


D .yN  O 0x xN / C O 0x x D yN C O 0x .x  xN /
(c) e D y  yO
(d) var.yO / D

yO D BLUP.y /

prediction error with a given x

O
var.x0# /

D  2 x0# .X0 X/1 x#




C var O 0x .x  xN /

WD  2 h#

D var.y/
N
Note: cov.O x ; y/
N D0


2 1
0 1
D  n C .x  xN / Txx .x  xN /


1
N/
D  2 n1 C n1
.x  xN /0 S1
xx .x  x


1
D  2 n1 C n1
MHLN2 .x ; xN ; Sxx /


1
N 2
.x  x/
; when k D 1; yO D O0 C O1 x
C
(e) var.yO / D  2
n
SSx
(f) var.e / D var.y  yO /
variance of the prediction error
2
2
D var.y / C var.yO / D  C  h# D  2 1 C x0# .X0 X/1 x# 


1
N 2
.x  x/
; when k D 1; yO D O0 C O1 x
(g) var.e / D  2 1 C C
n
SSx
O D O 2 h#
(h) se2 .yO / D var.
c yO / D se2 .x0# /

estimated variance of yO

4 Decompositions of sums of squares

15

(i) se2 .e / D var.e


c /
D var.y
c   yO / D O 2 .1 C h# /
estimated variance of e
p
0 O
O
(j) yO t=2Ink1 var.
c yO / D x# t=2Ink1 se.x0# /
p
condence interval for E.y /
D yO t=2Ink1 O h#
p
p
c   yO / D yO t=2Ink1 O 1 C h#
(k) yO t=2Ink1 var.y
prediction interval for the new unobserved y
p
p
(l) yO .k C 1/F;kC1;nk1 O h#
WorkingHotelling condence band for E.y /

4 Decompositions of sums of squares


Unless otherwise stated we assume that 1 2 C .X/ holds throughout this section.
4.1

SST D ky  yN k2 D k.I  J/yk2 D y0 .I  J/y D y0 y  nyN 2 D tyy total SS

4.2

SSR D kOy  yN k2 D k.H  J/yk2 D k.I  J/Hyk2 D y0 .H  J/y


D y0 PCX0 y D y0 PXQ 0 y D t0xy T1
xx t xy

4.3

SS due to regression; 1 2 C .X/

SSE D ky  yO k2 D k.I  H/yk2 D y0 .I  H/y D y0 My D y0 .C  PCX0 /y


D y0 y  y0 XO D tyy  t0xy T1
xx t xy

4.4

SST D SSR C SSE

4.5

(a) df.SST/ D r.I  J/ D n  1;

4.6

sy2 D SST=.n  1/,

(b) df.SSR/ D r.H  J/ D r.X/  1;

MSR D SSR=r.X/  1,

(c) df.SSE/ D r.I  H/ D n  r.X/;

MSE D SSE=n  r.X/ D O 2 .

SST D

n
X

.yi  y/
N 2;

SSR D

iD1

4.7

residual sum of squares

n
X
.yOi  y/
N 2;
iD1

SSE D

n
X

.yi  yOi /2

iD1



SSR
2
D SST.1  R2 / D SST.1  Ryx
/
SSE D SST 1 
SST

4.8

2
/
MSE D O 2 sy2 .1  Ryx

2
2
which corresponds to yx
D y2 .1  %yx
/

4.9

MSE D O 2 D SSE=n  r.X/


D SSE=.n  k  1/; when r.X/ D k C 1

unbiased estimate of  2 ,
residual mean square

16

4.10

Formulas Useful for Linear Regression Analysis and Related Matrix Theory

We always have .I  J/y D .H  J/y C .I  H/y and similarly always y0 .I 


J/y D y0 .H  J/y C y0 .I  H/y, but the decomposition 4.4,
(a) k.I  J/yk2 D k.H  J/yk2 C k.I  H/yk2 ,
is valid i .Oy  yN /0 .y  yO / D .H  J/y0 .I  H/y D y0 J.H  I/y D 0, which
holds for all y i JH D J which is equivalent to 1 2 C .X/, i.e., to H1 D 1.
Decomposition (a) holds also if y is centered or y 2 C .X/.

4.11

1 2 C .X/ () H  J is orthogonal projector () JH D HJ D J in


which situation JOy D JHy D Jy D .y;
N y;
N : : : ; y/
N 0.

4.12

In the intercept model we usually have X D .1 W X0 /. If X does not explicitly


have 1 as a column, but 1 2 C .X/, then C .X/ D C .1 W X/ and we have
H D P1 C PCX D J C PCX , and H  J is indeed an orthogonal projector.

4.13

SSE D min ky  Xk2 D SSE.yI X/ D kres.yI X/k2

4.14

SST D min ky  1k2 D SSE.yI 1/ D kres.yI 1/k2

4.15

SSR D minkHy  1k2 D SSE.HyI 1/ D kres.HyI 1/k2

D SSE.yI 1/  SSE.yI X/

change in SSE gained


adding real predictors
when 1 is already in the model

D SSE.X0 j 1/
4.16

y explained only by 1

2
R2 D Ryx
D

D
D

SSR
SSE
D1
SST
SST

SSE.yI 1/  SSE.yI X/
SSE.yI 1/
t0xy T1
xx t xy
tyy

s0xy S1
xx sxy

multiple correlation coecient squared,


coecient of determination
fraction of SSE.yI 1/ D SST accounted
for by adding predictors x1 ; : : : ; xk

sy2

O 0 rxy D O 1 r1y C    C O k rky


D r0xy R1
xx rxy D
4.17

O D cord2 .yI yO / D cos2 Cy; .H  J/y


R2 D max cord2 .yI X/ D cord2 .yI X/

4.18

X D .X1 W X2 /; M1 D I  PX1
(a) SSE D y0 I  .PX1 C PM1 X2 /y
0
0
D y M1 y  y PM1 X2 y D SSE.M1 yI M1 X2 /
D SSE.yI X1 /  SSR.eyX1 I EX2 X1 /, where

(b) eyX1 D res.yI X1 / D M1 y D residual of y after elimination of X1 ,


(c) EX2 X1 D res.X2 I X1 / D M1 X2 .

4 Decompositions of sums of squares

17

4.19

y0 PM1 X2 y D SSE.yI X1 /  SSE.yI X1 ; X2 / D SSE.X2 j X1 /


D reduction in SSE when adding X2 to the model

4.20

Denoting M12 D fy; X;  2 Ig, M1 D fy; X1 1 ;  2 Ig, and M121 D fM1 y;


M1 X2 2 ;  2 M1 g, the following holds:
(a) SSE.M121 / D SSE.M12 / D y0 My,
(b) SST.M121 / D y0 M1 y D SSE.M1 /,
(c) SSR.M121 / D y0 M1 y  y0 My D y0 PM1 X2 y,
(d) R2 .M121 / D

SSR.M121 /
y 0 P M1 X2 y
y0 My
D
D1 0
,
0
SST.M121 /
y M1 y
y M1 y

2
2
(e) X2 D xk : R2 .M121 / D ryk12:::k1
and 1  ryk12:::k1
D

y0 My
,
y 0 M1 y

(f) 1  R2 .M12 / D 1  R2 .M1 /1  R2 .M121 /


D 1  R2 .yI X1 /1  R2 .M1 yI M1 X2 /;
2
2
2
2
2
D .1  ry1
/.1  ry21
/.1  ry312
/    .1  ryk12:::k1
/.
(g) 1  Ry12:::k

4.21

If the model does not have the intercept term [or 1 C .X/], then the decomposition 4.4 is not valid. In this situation, we consider the decomposition
y0 y D y0 Hy C y0 .I  H/y;

SSTc D SSRc C SSEc :

In the no-intercept model, the coecient of determination is dened as


Rc2 D

SSRc
y0 Hy
SSE
D 0 D1 0 :
SSTc
yy
yy

In the no-intercept model we may have Rc2 D cos2 .y; yO / cord2 .y; yO /. However, if both X and y are centered (actually meaning that the intercept term
is present but not explicitly), then we can use the usual denitions of R2 and
Ri2 . [We can think that SSRc D SSE.yI 0/  SSE.yI X/ D change in SSE
gained adding predictors when there are no predictors previously at all.]
4.22

Sample partial correlations. Below we consider the data matrix .X W Y/ D


.x1 W : : : W xp W y1 W : : : W yq /.
(a) EYX D res.YI X/ D .I  P.1WX/ /Y D MY D .ey1 X W : : : W eyq X /,
(b) eyi X D .I  P.1WX/ /yi D Myi ,
(c) pcord .Y j X/ D cord res.YI X/ D cord .EYX /
D partial correlations of variables of Y after elimination of X,
(d) pcord .y1 ; y2 j X/ D cord .ey1 X ; ey2 X /,
(e) Tyyx D E0YX EYX D Y0 .I  P.1WX/ /Y D Y0 MY D ftij x g 2 PDq ,

18

Formulas Useful for Linear Regression Analysis and Related Matrix Theory

(f) ti ix D y0i .I  P.1WX/ /yi D y0i Myi D SSE.yi I 1; X/.



4.23

Denote T D

Txx Txy
Tyx Tyy

Then


D


T1

xxy
;
 T1
yyx


X0 CY X0 CX
, C D I  J, M D I  P.1WX/ .
Y0 CX Y0 CY

(a) T1 D


cord .X W Y/1 D R1 D


R1

xxy
;

R1
yyx

0
0
0
1 0
0
(b) Tyyx D Tyy  Tyx T1
xx Txy D Y CY  Y CX.Y CX/ Y CX D Y MY,

(c) Ryyx D Ryy  Ryx R1


xx Rxy ,
(d) pcord .Y j X/ D diag.Tyyx /1=2 Tyyx diag.Tyyx /1=2
D diag.Ryyx /1=2 Ryyx diag.Ryyx /1=2 :
4.24

.Y  XB/0 C.Y  XB/ D .CY  CXB/0 .CY  CXB/


L .CY  PCX Y/0 .CY  PCX Y/
D Y0 C.I  PCX /CY D Tyy  Tyx T1
xx Txy
for all B, and hence for all B we have
covs .y  B0 x/ D covd .Y  XB/ L Syy  Syx S1
xx Sxy ;
1
where the equality is attained if B D T1
xx Txy D Sxx Sxy ; see 6.7 (p. 28).

4.25

rxy  rx ry
rxy D p
2
2
.1  rx
/.1  ry
/

4.26

If Y D .x1 W x2 /, X D .x3 W : : : W xk / and Ryyx D fr ij g, then

partial correlation

12

r
:
cor.O1 ; O2 / D  pcord .x1 ; x2 j x3 ; : : : ; xk / D r123:::k D p
r 11 r 22
4.27

Added variable plot (AVP). Let X D .X1 W xk / and denote


u D eyX1 D M1 y D res.yI X1 /;
v D exk X1 D M1 xk D res.xk I X1 /:
The scatterplot of exk X1 versus eyX1 is an AVP. Moreover, consider the mod 
els M12 D fy; .X1 W xk /;  2 Ig, with D k1 , M1 D fy; X1 1 ;  2 Ig,
and M121 D fM1 y; M1 xk k ;  2 M1 g D feyX1 ; exk X1 k ;  2 M1 g. Then
(a) Ok .M12 / D Ok .M121 /,
(b) res.yI X/ D res.M1 yI M1 xk /,

(FrischWaughLovell theorem)
2
R2 .M121 / D ryk12:::k1
,

2
(c) 1  R2 .M12 / D 1  R2 .M1 /.1  ryk12:::k1
/.

5 Distributions

19

5 Distributions
5.1

Discrete uniform distribution. Let x be a random variable whose values are


1; 2; : : : ; N , each with equal probability 1=N . Then E.x/ D 12 .N C 1/, and
1
.N 2  1/.
var.x/ D 12

5.2

Sum of squares and cubes of integers:


n
X

i 2 D 16 n.n C 1/.2n C 1/;

iD1

n
X

i 3 D 14 n2 .n C 1/2 :

iD1

5.3

Let x1 ; : : : ; xp be a random sample selected without a replacement from A D


2
f1; 2; : : : ; N g. Denote y D x1 C    C xp D 1p0 x. Then var.xi / D N121 ,


cor.xi ; xj / D  N11 D %, i; j D 1; : : : ; p, cor 2 .x1 ; y/ D p1 C 1  p1 %.

5.4

Bernoulli distribution. Let x be a random variable whose values are 0 and


1, with probabilities p and q D 1  p. Then x  Ber.p/ and E.x/ D p,
var.x/ D pq. If y D x1 C    C xn , where xi are independent and each
xi  Ber.p/, then y follows the binomial distribution, y  Bin.n; p/, and
E.x/ D np, var.x/ D npq.

5.5

Two dichotomous variables. On the basis of the following frequency table:


1 
n


x
sx2 D
D

1
;
n1 n
n1 n
n
0
1 total
1 ad  bc
ad  bc
; rxy D p
;
sxy D
0 a b
n1
n

y
1 c d
n.ad  bc/2
2
2 D
:
D nrxy
total  n


5.6

5.7

 
Let z D yx be a discrete 2-dimensional random vector which is obtained
from the frequency table in 5.5 so that each
has the same proba observation

bility 1=n. Then E.x/ D n , var.x/ D n 1  n , cov.x; y/ D .ad  bc/=n2 ,
p
and cor.x; y/ D .ad  bc/=  .
In terms of the probabilities:
var.x/ D p1 p2 ;
cov.x; y/ D p11 p22  p12 p21 ;
p11 p22  p12 p21
D %xy :
cor.x; y/ D p
p1 p2 p1 p2

x
0
1 total
0 p11 p12 p1
y
1 p21 p22 p2
total p1 p2 1

20

Formulas Useful for Linear Regression Analysis and Related Matrix Theory


p11
p21
p11
() f
D
p21




a b
p12
D det
D0
c d
p22
p12
a
b
()
D
p22
c
d

5.8

%xy D 0 () det

5.9

Dichotomous random variables x and y are statistically independent i %xy D


0.

5.10

Independence between random variables means statistical (stochastic) independence: the random vectors
  x and y are statistically independent i the joint
distribution function of xy is the product of the distribution functions of x
and y. For example, if x and y are discrete random variables with values
x1 ; : : : ; xr and y1 ; : : : ; yc , then x and y are statistically independent i
P.x D xi ; y D yj / D P.x D xi / P.y D yj /; i D 1; : : : ; r; j D 1; : : : ; c:

5.11

Finiteness matters. Throughout this book, we assume that the expectations,


variances and covariances that we are dealing with are nite. Then independence of the random variables x and y implies that cor.x; y/ D 0. This implication may not be true if the niteness is not holding.

5.12

Denition N1: A p-dimensional random variable z is said to have a p-variate


normal distribution Np if every linear function a0 z has a univariate normal
distribution. We denote z  Np .; /, where  D E.z/ and D cov.z/. If
a0 z D b, where b is a constant, we dene a0 z  N.b; 0/.

5.13

Denition N2: A p-dimensional random variable z, with  D E.z/ and D


cov.z/, is said to have a p-variate normal distribution Np if it can be expressed
as z D  C Fu, where F is an p  r matrix of rank r and u is a random vector
of r independent univariate normal random variables.

5.14

If z  Np then each element of z follows N1 . The reverse relation does not


necessarily hold.

5.15

 
If z D xy is multinormally distributed, then x and y are stochastically independent i they are uncorrelated.

5.16

Let z  Np .; /, where is positive denite. Then z has a density


n.zI ; / D

5.17

1
1
0 1
e 2 .z/ .z/ :
.2
/p=2 jj1=2

Contours of constant density for N2 .; / are ellipses dened by


A D f z 2 R2 W .z  /0 1 .z  / D c 2 g:

5 Distributions

21

p
These ellipses are centered at  and have axes c i ti , where i D chi ./
and ti is the corresponding eigenvector. The major axis is the longest diameter
(line through ) of the ellipse, that is, we want to nd a point z1 solving
maxkz  k2 subject to z 2 A. Denoting u D z  , the above task becomes
max u0 u

subject to u0 1 u D c 2 ;

p
for which the solution is u1 D z1   D c 1 t1 , and u01 u1 D c 2 1 .
Correspondingly, the minor axis is the shortest diameter of the ellipse A.

5.18

 2 12
21  2

The eigenvalues of D


D 2


1 %
, where 12  0, are
% 1

ch1 ./ D  2 C 12 D  2 .1 C %/;


ch2 ./ D  2  12 D  2 .1  %/;
 
 
and t1 D p1 11 , t2 D p1 1
1 . If 12
0, then t 1 D
2

5.19

When p D 2 and cor.1 ; 2 / D % ( 1), we have


1  2
1

11 12
1 1 2 %
1
(a) D
D
21 22
1 2 % 22
D

1
12 22 .1

%2 /

22 12
12 12

p1 1
2 1


.

1
%
1
2
C
1 B
B 1 1 2 C ;
D
@
A
2
%
1
1%
2
1 2 2

(b) det./ D 12 22 .1  %2 /


12 22 ,



1
.1  1 /2
1
(c) n.zI ; / D
 exp
p
2.1  %2 /
12
2
1 2 1  %2
.1  1 /.2  2 /
.2  2 /2
 2%
C
1 2
22
5.20

Suppose that z  N.; /, z D


:



 
 
xx xy
x
x
,D
. Then
,D
y
y
yx yy

(a) the conditional distribution of y given that x is held xed at a selected


value x D x is normal with mean
N
E.y j x/ D y C yx 1
xx .x  x /
N
N
1
D .y  yx 1
xx x / C yx xx x;
N
(b) and the covariance matrix (partial covariances)
cov.y j x/ D yyx D yy  yx 1
xx xy D = xx :
N

22

5.21

Formulas Useful for Linear Regression Analysis and Related Matrix Theory

 
 
x
x
;
If z D
 NpC1 .; /;  D
y
y




xx  xy
 
1
D
;

; then
D
 0xy y2
  yy
0
1
0
1
(a) E.y j x/ D y C  0xy 1
xx .x  x / D .y   xy xx x / C  xy xx x,
N
N
N
2
2
(b) var.y j x/ D y12:::p
D yx
D y2   0xy 1

xx xy
N


0
1



xy xx xy
D y2 1 
y2
2
/ D 1= yy D conditional variance;
D y2 .1  %yx
2
D
(c) %yx

5.22

When

x
y

 0xy 1
xx  xy
y2

D the squared population multiple correlation.

 N2 , cor.x; y/ D %, and WD xy =x2 , we have




(a) E.y j x/ D y C .x  x / D y C % yx .x  x / D .y  x / C x,


N
N
N
N
2
(b) var.y j x/ D yx
D y2 
N

2
xy

x2

D y2 .1  %2 /
y2 D var.y/.

5.23

The random vector y C yx 1


xx .x  x / appears to be the best linear predictor of y on the basis of x, denoted as BLP.yI x/. In general, BLP.yI x/ is
not the conditional expectation of y given x.

5.24

The random vector


eyx D y  BLP.yI x/ D y  y C yx 1
xx .x  x /
is the vector of residuals of y from its regression on x, i.e., prediction error
between y and its best linear predictor BLP.yI x/. The matrix of partial covariances of y (holding x xed) is
cov.eyx / D yyx D yy  yx 1
xx xy D = xx :
If z is not multinormally distributed, the matrix yyx is not necessarily the
covariance matrix of the conditional distribution.

5.25

The population partial correlations. The ij -element of the matrix of partial


correlations of y (eliminating x) is
ij x
D cor.eyi x ; eyj x /;
%ij x D p
i ix jj x
fij x g D yyx ;
and eyi x D yi  yi 

f%ij x g D cor.eyx /;
 0xyi 1
xx .x

 x /,  xyi D cov.x; yi /. In particular,

5 Distributions

23

%xy D p

%xy  %x %y
2
.1  %2x /.1  %y
/

5.26

The conditional expectation E.y j x/, where x is now a random vector, is


BP.yI x/, the best predictor of y on the basis of x. Notice that BLP.yI x/ is the
best linear predictor.

5.27

In the multinormal distribution, BLP.yI x/ D E.y j x/ D BP.yI x/ D the best


predictor of y on the basis of x; here E.y j x/ is a random vector.

5.28

The conditional mean E.y j x/ is called (in the world of random variables)
the regression function (true Nmean of y when x is held at a selected value x)
N
and similarly var.y j x/ is called the variance function. Note that in the multiN
normal case E.y j x/ is simply a linear function of x and var.y j x/ does not
N
N
depend on x at all. N
N
 
Let yx be a random vector and let E.y j x/ WD m.x/ be a random variable
taking the value E.y j x D x/ when x takes the value x, and var.y j x/ WD
N the value var.y j x D x/N when x D x. Then
v.x/ is a random variable taking
N
N
E.y/ D EE.y j x/ D Em.x/;
var.y/ D varE.y j x/ C Evar.y j x/ D varm.x/ C Ev.x/:

5.29

5.30

 
Let yx be a random vector such that E.y j x D x/ D C x. Then D
N
N
xy =x2 and D y  x .

5.31

If z  .; / and A is symmetric, then E.z0 Az/ D tr.A/ C 0 A.

5.32

Central 2 -distribution: z  Nn .0; In /: z0 z D 2n  2 .n/

5.33

Noncentral 2 -distribution: z  Nn .; In /: z0 z D 2n;  2 .n; /, D 0 

5.34

z  Nn .;  2 In / W z0 z= 2  2 .n; 0 = 2 /

5.35

Let z  Nn .; / where is pd and let A and B be symmetric. Then


(a) z0 Az  2 .r; / i AA D A, in which case r D tr.A/ D r.A/,
D 0 A,
(b) z0 1 z D z0 cov.z/1 z  2 .r; /, where r D n, D 0 1 ,
(c) .z  /0 1 .z  / D MHLN2 .z; ; /  2 .n/,
(d) z0 Az and z0 Bz are independent i AB D 0,
(e) z0 Az and b0 z are independent i Ab D 0.

24

5.36

Formulas Useful for Linear Regression Analysis and Related Matrix Theory

Let z  Nn .0; / where is nnd and let A and B be symmetric. Then


(a) z0 Az  2 .r/ i AA D A, in which case r D tr.A/ D
r.A/,
(b) z0 Az and x0 Bx are independent () AB D 0,
(c) z0  z D z0 cov.z/ z  2 .r/ for any choice of  and r./ D r.

5.37

Let z  Nn .; / where is pd and let A and B be symmetric. Then


(a) var.z0 Az/ D 2 tr.A/2  C 40 AA,
(b) cov.z0 Az; z0 Bz/ D 2 tr.AB/ C 40 AB.

5.38

2m; =m
Noncentral F -distribution: F D
 F.m; n; /, where 2m; and 2n
2n =n
are independent

5.39

t -distribution: t2 .n/ D F.1; n/

5.40

Let y  Nn .X;  2 I/, where X D .1 W X0 /, r.X/ D k C 1. Then


(a) y0 y= 2  2 .n; 0 X0 X= 2 /,
(b) y0 .IJ/y= 2  2 n1; 0 X0 .IJ/X= 2  D 2 n1; 0x Txx x = 2 ,
(c) y0 .H  J/y= 2  2 k; 0x Txx x = 2 ,
(d) y0 .I  H/y= 2 D SSE= 2  2 .n  k  1/.

5.41

Suppose In D A1 C    C Am . Then the following statements are equivalent:


(a) n D r.A1 / C    C r.Am /,
(b) A2i D Ai for i D 1; : : : ; m,
(c) Ai Aj D 0 for all i j .

5.42

Cochrans theorem. Let z  Nn .; I/ and let z0 z D z0 A1 z C    C z0 Am z.


Then any of 5.41a5.41c is a necessary and sucient condition for z0 Ai z to
be independently distributed as 2 r.Ai /; .

5.43

Wishart-distribution. Let U0 D .u.1/ W : : : W u.n/ / be a random sample from


Np .0; /,
Pi.e., u.i/ s are independent and each u.i/  Np .0; /. Then W D
U0 U D niD1 u.i/ u0.i/ is said to have a Wishart-distribution with n degrees
of freedom and scale matrix , and we write W  Wp .n; /.

5.44

Hotellings T 2 distribution. Suppose v  Np .0; /, W  Wp .m; /, v and


W are independent, is pd. Hotellings T 2 distribution is the distribution of

5 Distributions

25

T 2 D m  v0 W1 v

 1

W
Wishart 1
v D .normal r.v./0
.normal r.v./
D v0
m
df
and is denoted as T 2  T2 .p; m/.
5.45

Let U0 D .u.1/ W u.2/ W : : : W u.n/ / be a random sample from Np .; /. Then:


(a) The (transposed) rows of U, i.e., u.1/ , u.2/ , . . . , u.n/ , are independent
random vectors, each u.i/  Np .; /.
(b) The columns of U, i.e., u1 , u2 , . . . , up are n-dimensional random vectors:
ui  Nn .i 1n ; i2 In /, cov.ui ; uj / D ij In .
1
0
0 1
1 1 n
u1
B : C
(c) z D vec.U/ D @ ::: A, E.z/ D @ :: A D  1n .
up
p 1n
0 2
1
1 In 12 In : : : 1p In
::
:: C
B :
(d) cov.z/ D @ ::
:
: A D In .
p1 In p2 In : : : p2 In
(e) uN D n1 .u.1/ C u.2/ C    C u.n/ / D n1 U0 1n D .uN 1 ; uN 2 ; : : : ; uNp /0 .
(f) S D
D

1
T
n1
1
n1

n
X

1
U0 .I
n1

 J/U D

1
n1

n
X

N .i/  u/
N 0
.u.i/  u/.u

iD1


u.i/ u0.i/  nuN uN 0 :

iD1



N D , E.S/ D , uN  Np ; n1 .
(g) E.u/
(h) uN and T D U0 .I  J/U are independent and T  W.n  1; /.
N 0 ; S/,
(i) Hotellings T 2 : T 2 D n.uN  0 /0 S1 .uN  0 / D n  MHLN2 .u;
np
T2
.n1/p

 F.p; n  p;  /;

 D n.  0 /0 1 .  0 /:

(j) Hypothesis  D 0 is rejected at risk level , if


n.uN  0 /0 S1 .uN  0 / >

p.n1/
FIp;np :
np

(k) A 100.1  /% condence region for the mean of the Np .; / is the
ellipsoid determined by all  such that
n.uN  /0 S1 .uN  /

(l) max
a0

p.n1/
FIp;np :
np

a0 .uN  0 /2
N 0 ; S/.
D .uN  0 /0 S1 .uN  0 / D MHLN2 .u;
a0 Sa

26

5.46

Formulas Useful for Linear Regression Analysis and Related Matrix Theory

Let U01 and U02 be independent random samples from Np .1 ; / and Np .2 ;
/, respectively. Denote Ti D U0i Cni Ui and
S D

1
.T1 C T2 /;
n 1 C n2  2

If 1 D 2 , then
5.47

n1 n2
N 1  uN 2 /:
.uN 1  uN 2 /0 S1
 .u
n 1 C n2

 F.p; n1 C n2  p  1/.

If n1 D 1, then Hotellings T 2 becomes


T2 D

5.48

n1 Cn2 p1 2
T
.n1 Cn2 2/p

T2 D

n2
.u.1/
n2 C1

N 2 /:
 uN 2 /0 S1
2 .u.1/  u

Let U0 D .u.1/ W : : : W u.n/ / be a random sample from Np .; /. Then the


likelihood function and its logarithm are
(a) L D .2
/

pn
2

n
h
i
X
n
jj 2 exp  12
.u.i/  /0 1 .u.i/  / ,
iD1

(b) log L D  12 pn log.2


/  12 n logjj 

n
X
.u.i/  /0 1 .u.i/  /.
iD1

The function L is considered as a function of  and while U0 is being xed.


The maximum likelihood estimators, MLEs, of  and are the vector 
and the positive denite matrix  that maximize L:
N
(c)  D n1 U0 1 D .uN 1 ; uN 2 ; : : : ; uNp /0 D u,
(d) The maximum of L is max L.; / D
;

(e) Denote max L.0 ; / D

where 0 D
is

1
n

Pn

1
.2
/np=2 j

j

n=2

n1
S.
n

enp=2 .

1
enp=2 ,
.2
/np=2 j 0 jn=2

iD1 .u.i/

 0 /.u.i/  0 /0 . Then the likelihood ratio

max L.0 ; /
D
D
max; L.; /
(f) Wilkss lambda D 2=n D
5.49

 D n1 U0 .I  J/U D

j  j
j 0 j

1
1C

1
T2
n1

n=2
:

, T 2 D n.uN  0 /0 S1 .uN  0 /.

Let U0 D .u.1/ W : : : W u.n/ / be a random sample from NkC1 .; /, where


 
 
x.i/
x
; E.u.i/ / D
;
u.i/ D
yi
y


xx  xy
cov.u.i/ / D
; i D 1; : : : ; n:
 0xy y2

6 Best linear predictor

27

Then the conditional mean and variance of y, given that x D x, are


N
0
E.y j x/ D y C  0xy 1
xx .x  x / D 0 C x x;
N
N
N
2
var.y j x/ D yx
;
N
1
where 0 D y   0xy 1
xx x and x D xx  xy . Then
N D O0 ;
MLE.0 / D yN  s0xy S1
xx x

O
MLE. x / D S1
xx sxy D x ;

2
2
1
MLE.yx
/ D n1 .tyy
 t0xy T1
xx t xy / D n SSE:
2
2
The squared population multiple correlation %yx
D  0xy 1
xx  xy =y equals
1
0 i x D xx  xy D 0. The hypothesis x D 0, i.e., %yx D 0, can be tested
by

F D

2
=k
Ryx
2 /=.n  k  1/
.1  Ryx

6 Best linear predictor


6.1

Let f .x/ be a scalar valued function of the random vector x. The mean squared
error of f .x/ with respect to y (y being a random variable or a xed constant)
is
MSEf .x/I y D Ey  f .x/2 :
Correspondingly, for the random vectors y and f.x/, the mean squared error
matrix of f.x/ with respect to y is
MSEMf.x/I y D Ey  f.x/y  f.x/0 :

6.2

We might be interested in predicting the random variable y on the basis of


some function of the random vector x; denote this function as f .x/. Then
f .x/ is called a predictor of y on the basis of x. Choosing f .x/ so that it
minimizes the mean squared error MSEf .x/I y D Ey  f .x/2 gives the
best predictor BP.yI x/. Then the BP.yI x/ has the property
min MSEf .x/I y D min Ey  f .x/2 D Ey  BP.yI x/2 :

f .x/

f .x/

It appears that the conditional expectation E.y j x/ is the best predictor of y:


E.y j x/ D BP.yI x/. Here we have to consider E.y j x/ as a random variable,
not a real number.
6.3

biasf .x/I y D Ey  f .x/

6.4

The mean squared error MSE.a0 x C bI y/ of the linear (inhomogeneous) predictor a0 x C b with respect to y can be expressed as

28

Formulas Useful for Linear Regression Analysis and Related Matrix Theory

Ey  .a0 x C b/2 D var.y  a0 x/ C y  .a0 x C b/2


D variance C bias2 ;
and in the general case, the mean squared error matrix MSEM.Ax C bI y/ is
Ey  .Ax C b/y  .Ax C b/0
D cov.y  Ax/ C ky  .Ax C b/k2 :
6.5

BLP: Best linear predictor. Let x and y be random vectors such that
  

   
x
xx xy
x
x
; cov
D
:
E
D
y
yx yy
y
y
Then a linear predictor Gx C g is said to be the best linear predictor, BLP, for
y, if the Lwner ordering
MSEM.Gx C gI y/
L MSEM.Fx C fI y/
holds for every linear predictor Fx C f of y.

6.6


 

x
xx xy
; and denote
DD
yx yy
y
  


x
x
I
0
D
:
Bz D
y
y  yx 
 yx 
xx I
xx x

Let cov.z/ D cov

Then in view the block diagonalization theorem of a nonnegative denite matrix, see 20.26 (p. 89), we have




/0 xy
I . 
I
0
xx xy
0
xx
(a) cov.Bz/ D BB D
yx yy
0
I
 yx 
xx I


0
xx
; yyx D yy  yx 
D
xx xy ;
0 yyx
(b) cov.x; y  yx 
xx x/ D 0,
6.7

and

(a) cov.y  yx 
xx x/ D yyx
L cov.y  Fx/

for all F,



(b) BLP.yI x/ D y C yx 
xx .x  x / D .y  yx xx x / C yx xx x.

6.8

Let eyx D y  BLP.yI x/ be the prediction error. Then


(a) eyx D y  BLP.yI x/ D y  y C yx 
xx .x  x /,
(b) cov.eyx / D yy  yx 
xx xy D yyx D MSEMBLP.yI x/I y,
(c) cov.eyx ; x/ D 0.

6.9

According to 4.24 (p. 18), for the data matrix U D .X W Y/ we have

6 Best linear predictor

29

0
Syy  Syx S1
xx Sxy
L covd .Y  XB/ D covs .y  B x/

sy2

s0xy S1
xx sxy

vard .y  Xb/ D vars .y  b x/

for all B;

for all b;

1
where the minimum is attained when B D S1
xx Sxy D Txx Txy .

6.10

x 2 C . xx W x / and x  x 2 C . xx / with probability 1.

6.11

The following statements are equivalent:


(a) cov.x; y  Ax/ D 0.
(b) A is a solution to A xx D yx .

(c) A is of the form A D yx 
xx C Z.Ip  xx xx /; Z is free to vary.

(d) A.x  x / D yx 
xx .x  x / with probability 1.
6.12


If b D 1
xx  xy (no worries to use xx ), then

(a) min var.y  b0 x/ D var.y  b0 x/


b

2
2
D y2   0xy 1
xx  xy D y12:::p D yx ;

(b) cov.x; y  b0 x/ D 0,


(c) max cor 2 .y; b0 x/ D cor 2 .y; b0 x/ D cor 2 .y;  0xy 1
xx x/
b

 0xy 1
xx  xy
y2

2
D %yx
,

squared population
multiple correlation,

2
2
0
1
2
2
2
(d) yx
D y2   0xy 1
xx  xy D y .1   xy xx  xy =y / D y .1  %yx /.

6.13

(a) The tasks of solving b from minb var.y  b0 x/ and maxb cor 2 .y; b0 x/
yield essentially the same solutions b D 1
xx  xy .
(b) The tasks of solving b from minb ky  Abk2 and maxb cos2 .y; Ab/ yield
essentially the same solutions Ab D PA y.
(c) The tasks of solving b from minb vard .y  X0 b/ and maxb cord2 .y; X0 b/
O
yield essentially the same solutions b D S1
xx sxy D x .

6.14

Consider a random vector z with E.z/ D , cov.z/ D , and let the random
vector BLP.zI A0 z/ denote the BLP of z based on A0 z. Then
(a) BLP.zI A0 z/ D  C cov.z; A0 z/cov.A0 z/ A0 z  E.A0 z/
D  C A.A0 A/ A0 .z  /
D  C P0AI .z  /;
(b) the covariance matrix of the prediction error ezA0 z D z  BLP.zI A0 z/ is

30

Formulas Useful for Linear Regression Analysis and Related Matrix Theory

covz  BLP.zI A0 z/ D  A.A0 A/ A0


D .I  PAI / D 1=2 .I  P1=2 A / 1=2 :
6.15

6.16

 
Cooks trick. The best linear predictor of z D xy on the basis of x is

 

x
x
BLP.zI x/ D
D
:
y C yx 
BLP.yI x/
xx .x  x /
Let cov.z/ D and denote z.i/ D .1 ; : : : ; i /0 and consider the following
residuals (prediction errors)
e10 D 1 ;

ei1:::i1 D i  BLP.i I z.i1/ /;

i D 2; : : : ; p:

Let e be a p-dimensional random vector of these residuals: e D .e10 ; e21 ; : : : ;


ep1:::p1 /0 . Then cov.e/ WD D is a diagonal matrix and
2
2
det./ D detcov.e/ D 12 21
   p12:::p1
2
D .1  %212 /.1  %2312 /    .1  %p1:::p1
/12 22    p2

12 22    p2 :


6.17

(Continued . . . ) The vector e can be written as e D Fz, where F is a lower


triangular matrix (with ones in diagonal) and cov.e/ D D D FF0 . Thereby
D F1 D1=2 D1=2 .F0 /1 WD LL0 , where L D F1 D1=2 is an lower triangular matrix. This gives a statistical proof of the triangular factorization of a
nonnegative denite matrix.

6.18

2
:
Recursive decomposition of 1  %y12:::p
2
2
2
2
2
1  %y12:::p
D .1  %y1
/.1  %y21
/.1  %y312
/    .1  %yp12:::p1
/:

6.19

Mustonens measure of multivariate dispersion. Let cov.z/ D pp . Then


Mvar./ D max

p
X

2
i12:::i1
;

iD1

where the maximum is sought over all permutations of 1 ; : : : ; p .


6.20

Consider the data matrix X0 D .x1 W : : : W xk / and denote e1 D .I  P1 /x1 ,


ei D .I  P.1Wx1 W:::Wxi 1 / /xi , E D .e1 W e2 W : : : W ek /. Then E0 E is a diagonal
matrix where ei i D SSE.xi I 1; x1 ; : : : ; xi1 /, and
 1 0 
2
2
2
det n1
E E D .1  r12
/.1  R312
/    .1  Rk1:::k1
/s12 s22    sk2
D det.Sxx /:

6.21

AR(1)-structure. Let yi D %yi1 C ui , i D 1; : : : ; n, j%j < 1, where ui s


(i D : : : , 2, 1, 0, 1, 2; : : : ) are independent random variables, each having
E.ui / D 0 and var.ui / D u2 . Then

6 Best linear predictor

31

n1 1

1
%
%2 : : : %
B
%
1
% : : : %n2 C
B :
:
::
:: C
(a) cov.y/ D D  2 V D
::
1  %2 @ ::
:
: A
n1
n2
n3
%
%
::: 1
%
u2 jij j

D
%
:
1  %2
 


V11 v12
y
(b) cor.y/ D V D cor .n1/ D
D f%jij j g,
yn
v012 1
u2

1
0
n1
,
(c) V1
11 v12 D V11  %V11 in1 D %in1 D .0; : : : ; 0; %/ WD b 2 R

(d) BLP.yn I y.n1/ / D b0 y.n1/ / D v012 V1


11 y.n1/ D %yn1 ,
(e) en D yn  BLP.yn I y.n1/ / D yn  %yn1 D nth prediction error,
(f) cov.y.n1/ ; yn  b0 y.n1/ D cov.y.n1/ ; yn  %yn1 / D 0 2 Rn1 ,
(g) For each yi , i D 2; : : : ; n, we have
BLP.yi I y.i1/ / D %yi1 ;

eyi y.i 1/ D yi  %yi1 WD ei :

Dene e1 D y1 , ei D yi  %yi1 , i D 2; : : : ; n, i.e.,


0
1 0
1 0 0 ::: 0 0
y1
B y2  %y1 C B% 1 0 : : : 0 0
B
C B : : :
::
::
C B : : :
eDB
:
:
B y3 : %y2 C D B : : :
@
A @ 0 0 0 : : : % 1
::
yn  %yn1
0 0 0 : : : 0 %

1
0
0C
:: C
:C
C y WD Ly;
0A
1

where L 2 Rnn .


1
00
(h) cov.e/ D  2
WD  2 D D cov.Ly/ D  2 LVL0 ,
0 .1  %2 /In1
(i) LVL0 D D;

V D L1 D.L0 /1 ; V1 D L0 D1 L,




p

1  %2 0 0
1  %2 0 0
1
1
1=2
p
;
D
;
D
(j) D1 D 1%
2
0
In1
1%2
0
In1
0p
1
1  %2 0 0 : : : 0 0 0
B %
1 0 ::: 0 0 0C
B
1
::
:: ::
::
:: :: C
B
(k) D1=2 L D p
:
:
:
:
: :C
B
C
1  %2 @
0
0 0 : : : % 1 0 A
0

0 ::: 0

1
WD p
K;
1  %2
(l) D1=2 LVL0 D1=2 D

1
KVK0 D In ,
1  %2

% 1

32

Formulas Useful for Linear Regression Analysis and Related Matrix Theory

1
K0 K
1  %2
0 ::: 0
0
% : : : 0
0
::
::
::
:
:
:
0 : : : % 1 C %2
0 ::: 0
%

(m) V1 D L0 D1 L D L0 D1=2 D1=2 L D


0

1
%
B% 1 C %2
1 B
::
B :::
D
:
B
2
1% @
0
0
0
0

1
0
0C
:: C
: C
C;
%A
1

(n) det.V/ D .1  %2 /n1 .


(o) Consider the model M D fy; X;  2 Vg, where  2 D u2 =.1  %2 /.
Premultiplying this model by K yields the model M D fy ; X ; u2 Ig,
where y D Ky and X D KX.
6.22

DurbinWatson test statistic for testing % D 0:


X
n
n
X
"O 0 G0 G"O
2
2.1  %/;
O
DW D
.O"i  "Oi1 /
"O2i D
"O 0 "O
iD2
iD1
Pn
P
where %O D niD2 "Oi1 "Oi
O2i and G 2 R.n1/n and G0 G 2 Rnn are
iD2 "
1
0
1 1 0 : : : 0 0 0
B 0 1 1 : : : 0 0 0C
B :
:: ::
::
:: :: C
:
GDB
: :
:
: :C
C;
B :
@ 0 0 0 : : : 1 1 0A
0 0 0 : : : 0 1 1
1
0
1 1 0 : : : 0 0 0
B1 2 1 : : : 0 0 0 C
B :
::
::
::
::
:: C
:
G0 G D B
:
:
:
:
: C
C:
B :
@ 0 0 0 : : : 1 2 1A
0 0 0 : : : 0 1 1

6.23

Let the sample covariance matrix of x1 ; : : : ; xk ; y be





S s
S D r jij j D 0xx xy
2 R.kC1/.kC1/ :
sxy sy2
0
k
Then sxy D rSxx ik and O x D S1
xx sxy D rik D .0; : : : ; 0; r/ 2 R .

6.24

1
B1
B
1
If L D B
B:
@ ::

0
1
1
::
:

0
0
1
::
:

:::
:::
:::
::
:
1 1 1 :::

1
0
0C
C
0C 2 Rnn , then
:: C
:A
1

7 Testing hypotheses

33

.LL0 /1

11 0
1
::: 1
1
2 1 0 : : : 0 0
B1 2 1 : : : 0 0 C
::: 2
2 C
C
C
B
B 0 1 2 : : : 0 0 C
::: 3
3 C
C
B
::
:: C D B :: :: :: : :
:: :: C
::
C:
:
:
: C
B : : : : : : C
@ 0 0 0 : : : 2 1 A
2 3 : : : n  1 n  1A
1 2 3 ::: n  1 n
0 0 0 : : : 1 1

1
B1
B
B1
DB
B :::
B
@1

1
2
2
::
:

1
2
3
::
:

7 Testing hypotheses
7.1

Consider the model M D fy; X;  2 Ig, where y  Nn .X;  2 I/, X is n  p


(p D k C 1). Most of the time we assume that X D .1 W X0 / and it has full
column rank. Let the hypothesis to be tested be
H W K0 D 0; where K0 2 Rqqp , i.e., Kpq has a full column rank.
Let MH D fy; X j K0 D 0;  2 Ig denote the model under H and let C .X /
be that subspace of C .X/ where the hypothesis H holds:
C .X / D f z W there exists b such that z D Xb and K0 b D 0 g:
Then MH D fy; X ;  2 Ig and the hypothesis is H : E.y/ 2 C .X /. In general we assume that K0 is estimable, i.e., C .K/  C .X0 / and the hypothesis
said to be testable. If r.X/ D p then every K0 is estimable.

7.2

The F -statistic for testing H : F D

7.3

SSE= 2 D y0 My= 2  2 .n  p/

7.4

Q D SSEH  SSE D SSE


D SSE.MH /  SSE.M /
D y0 .I  PX /y  y0 .I  H/y

Q=q
Q=q
 F.q; n  p; /
D
2
O
SSE=.n  p/

change in SSE due to the hypothesis


SSEH D SSE.MH / D SSE under H

D y0 .H  PX /y; and hence

7.5

SSE.MH /  SSE.M /=q


SSE.M /=.n  p/
2
.R2  RH
/=q
D
2
.1  R /=.n  p/
SSE.MH /  SSE.M /=r.X/  r.X /
D
SSE.M /=n  r.X/
2
.R2  RH
/=r.X/  r.X /
D
2
.1  R /=n  r.X/

F D

2
RH
D R2 under H

34

Formulas Useful for Linear Regression Analysis and Related Matrix Theory

7.6

Q= 2  2 .q; /;

central 2 if H true

7.7

D 0 X0 .H  PX /X= 2 D 0 X0 .I  PX /X= 2

7.8

C .X / D C .XK? /;

7.9

Consider hypothesis H : kqC1 D    D k D 0, i.e., 2 D 0, when


 
1
; 2 2 Rq ; X D .X1 W X2 /; X1 D .1 W x1 W : : : W xkq /:
D
2

q D r.X/  r.X /

r.X / D r.X/  dim C .X0 / \ C .K/ D r.X/  r.K/

Then K0 D .0 W Iq /, X D X1 D XK? , and 7.107.16 hold.


7.10

Q D y0 .H  PX1 /y
D y 0 P M1 X2 y

D 02 X02 M1 X2 2 = 2

M1 D I  PX1 

D y0 M1 X2 .X02 M1 X2 /1 X02 M1 y


D O 02 X02 M1 X2 O 2 D O 02 T221 O 2

T221 D X02 M1 X2

D O 02 cov.O 2 /1 O 2  2
7.11
7.12

cov.O 2 / D  2 .X02 M1 X2 /1

.O 2  2 /0 T221 .O 2  2 /=q

FIq;nk1
O 2

condence ellipsoid for 2

The left-hand side of 7.11 can be written as


.O 2  2 /0 cov.O 2 /1 .O 2  2 / 2 =q
D .O 2  2 /0 cov.O 2 /1 .O 2  2 /=q
O 2
and hence the condence region for 2 is

.O 2  2 /0 cov.O 2 /1 .O 2  2 /=q


FIq;nk1 :
7.13

Consider the last two regressors: X D .X1 W xk1 ; xk /. Then 3.29 (p. 14)
implies that


1
rk1;kX1
O
; rk1;kX1 D rk1;k12:::k2 ;
cor. 2 / D
rk1;kX1
1
and hence the orientation of the ellipse (the directions of the major and minor
axes) is determined by the partial correlation between xk1 and xk .

7.14

The volume of the ellipsoid .O 2  2 /0 cov.O 2 /1 .O 2  2 / D c 2 is proportional to c q detcov.O 2 /.

7.15

.O x  x /0 Txx .O x  x /=k

FIk;nk1
O 2

condence region for x

7 Testing hypotheses

7.16

35

Q D kres.yI X1 /  res.yI X/k2 D SSE.yI X1 /  SSE.yI X1 ; X2 /


D SSE D change in SSE when adding X2 to the model
D SSR  SSRH D y0 .H  J/y  y0 .PX  J/y D SSR

7.17

If the hypothesis is H : K0 D d, where K0 2 Rqqp , then


(a) Q D .K0 O  d/0 K0 .X0 X/1 K1 .K0 O  d/
O 1 .K0 O  d/ 2 WD u0 cov.u/1 u 2 ;
D .K0 O  d/0 cov.K0 /
(b) Q= 2 D 2 .q; /;
(c) F D

D .K0  d/0 K0 .X0 X/1 K1 .K0  d/= 2

Q=q
O 1 .K0 O  d/=q  F.q; p  q; /,
D .K0 O  d/0 cov.K0 /
O 2

(d) K0 O  d  Nq K0  d;  2 K0 .X0 X/1 K,


(e) SSEH D minK0 Dd ky  Xk2 D ky  XO r k2 ,
(f) O r D O  .X0 X/1 KK0 .X0 X/1 K1 .K0 O  d/.
7.18

restricted OLSE

The restricted OLSE O r is the solution to equation


 0
   0 
Xy
XX K
O r
D
; where  is the Lagrangian multiplier.
0
K 0
d

The equation above has a unique solution i r.X0 W K/ D p C q, and hence
O r may be unique even though O is not unique.

7.19

If the restriction is K0 D 0 and L is a matrix (of full column rank) such that
L 2 fK? g, then X D XL, and the following holds:
(a) XO r D X .X0 X /1 X0 y D XL.L0 X0 XL/1 L0 X0 y,


(b) O r D Ip  .X0 X/1 KK0 .X0 X/1 KK0 O
D .Ip  P0KI.X0 X/1 /O D PLIX0 X O
D L.L0 X0 XL/1 L0 X0 XO D L.L0 X0 XL/1 L0 X0 y:
(c) cov.O r / D  2 L.L0 X0 XL/1 L0


D  2 .X0 X/1  .X0 X/1 KK0 .X0 X/1 K1 K0 .X0 X/1 :

7.20

If we want to nd K so that there is only one solution to X0 X D X0 y which


satises the constraint K0 D 0, we need to consider the equation
 0 
 0 

 

XX
Xy
PX y
X
D
D
; or equivalently,
:
K0
0
K0
0
0
The above equation
 a unique solution for every given PX y i C .X / \
 Xhas
C .K/ D f0g and K0 has a full column rank in which case the solution for

36

Formulas Useful for Linear Regression Analysis and Related Matrix Theory

is .X0 X C KK0 /1 X0 y. Requirement C .X0 / \ C .K/ D f0g means that K0


is not estimable.
7.21

H W 1 D    D k D 0, i.e., H W E.y/ 2 C .1/ W


F D Foverall D

7.22

R2 =k
MSR
D
 F.k; n  k  1; /
MSE
.1  R2 /=.n  k  1/

H W i D 0 W F D F .Oi / D
D

Oi2
R2
SSE
D
;
D
MSE
.1  R2 /=.n  k  1/
se2 .Oi /
2
ryirest

2
.1  ryirest
/=.n  k  1/

 F.1; n  k  1; /;

2
2
where R2 D R2  R.i/
D change in R2 when xi deleted, R.i/
D
2
R .yI X.i/ /, X.i/ D fall other regressors except the i thg D frestg.

q
Oi
Oi
Dp
D F .Oi /  t.n  k  1/
se.Oi /
var.
c Oi /

7.23

H W i D 0 W t D t .Oi / D

7.24

H W k0 D 0 W F D

O 2
O 2
.k0 /
.k0 /
D
 F.1; n  k  1; /
O
O 2 k0 .X0 X/1 k
var.k
c 0 /

7.25

H W K0 D d W F D

R2 =q
 F.q; n  p; /,
.1  R2 /=.n  p/

2
where R2 D R2  RH
D change in R2 due to the hypothesis.

7.26

In the no-intercept model R2 has to be replaced with Rc2 .

7.27

Consider the model M D fy; X;  2 Vg, where X 2 Rpnp , V is pd, y 


Nn .X;  2 V/, and F .V / is the F -statistic for testing linear hypothesis H :
K0 D d (K0 2 Rqqp /. Then
(a) F .V / D

Q.V /=q
Q.V /=q
 F.q; n  p; /,
D
Q 2
SSE.V /=.n  p/

(b) Q 2 D SSE.V /=f;

f D n  p,

(c) SSE.V / D min ky  Xk2V1

unbiased estimator of  2
under fy; X;  2 Vg
weighted SSE

Q 0 V1 .y  X/,
Q
D min .y  X/0 V1 .y  X/ D .y  X/
(d) SSE.V /= 2  2 .n  p/,
(e) Q D .X0 V1 X/1 X0 V1 y D BLUE./ under fy; X;  2 Vg,

7 Testing hypotheses

37

Q
Q D  2 K0 .X0 V1 X/1 K D K0 cov./K,
(f) cov.K0 /

Q D Q 2 K0 .X0 V1 X/1 K,


(g) cov.K0 /
(h) Q.V / D SSEH .V /  SSE.V / D SSE.V /

change in SSE.V /
due to H ,

(i) Q.V / D .K0 Q  d/0 K0 .X0 V1 X/1 K1 .K0 Q  d/


Q 1 .K0 Q  d/ 2 WD u0 cov.u/1 u 2 ;
D .K0 Q  d/0 cov.K0 /
Q 1 .K0 Q  d/  2 .q; /,
(j) Q.V /= 2 D .K0 Q  d/0 cov.K0 /
Q 1 .K0  d/= 2 ,
(k) D .K0  d/0 cov.K0 /
Q 1 .K0 Q  d/ 2 =q
.K0 Q  d/0 cov.K0 /
Q 2
0 Q
0
0 Q 1
D .K  d/ cov.K / .K0 Q  d/=q  F.q; f; /;

(l) F .V / D

(m) K0 Q  d  Nq K0  d;  2 K0 .X0 V1 X/1 K,


(n) SSEH .V / D minK0 Dd ky  Xk2V1 D ky  XQ r k2V1 , where
(o) Q r D Q  .X0 V1 X/1 KK0 .X0 V1 X/1 K1 .K0 Q  d/
7.28

restricted
BLUE.

Consider general situation, when V is possibly singular. Denote


Q
Q 0 W .y  X/;
SSE.W / D .y  X/

where W D V C XUX0

with U satisfying C .W/ D C .X W V/ and XQ D BLUE.X/. Then


Q 2 D SSE.W /=f;

where f D r.X W V/  r.X/ D r.VM/;

is an unbiased estimator for  2 . Let K0 be estimable, i.e., C .K/  C .X0 /,


Q
Q D  2 D, m D r.D/, and assume that DD u D
denote u D K0 d,
cov.K0 /
0 
2
u. Then .u D u=m/=  F.m; f; /, i.e.,

Q  .K0 Q  d/=m  F.m; f; /:


.K0 Q  d/0 cov.K0 /
7.29

If r.X/ < p but K0 is estimable, then the hypothesis K0 D d can be


tested as presented earlier by replacing appropriated inverses with generalized
inverses and replacing p (D k C 1) with r D r.X/.

7.30

One-way ANOVA for g groups; hypothesis is H : 1 D    D g .


P 1
group 1: y11 ; y12 ; : : : ; y1n1 ; SS1 D jnD1
.y1j  yN1 /2
Png
group g: yg1 ; yg2 ; : : : ; yg ng ; SSg D j D1 .ygj  yNg /2
(a) SSE D SSEWithin D SS1 C  CSSg ;

SSBetween D

g
X
iD1

ni .yNi  y/
N 2,

38

Formulas Useful for Linear Regression Analysis and Related Matrix Theory

(b) F D
7.31

SSB=.g  1/
 F.g  1; n  g/
SSE=.n  g/

if H is true.

t -test for the equality of the expectations of 2 groups: H W 1 D 2 .


N 2 C n2 .yN2  y/
N 2
.yN1  yN2 /2
n1 .yN1  y/

D
SS1 C SS2
SS1 C SS2 1
1
C
n2
n2
n1
n2
1

n 1 n2
SS1 C SS2
D
 .yN1  yN2 /
 .yN1  yN2 / 
n
n2
.yN1  yN2 /2
D
 F.1; n  2/ D t2 .n  2/; n1 C n2 D n
SSE n
n  2 n1 n2

F D

7.32

One-way ANOVA in matrix terms.


y D X C ";
0 1 0
1n1
y1
:
@ :: A D B
@ ::
:
yg
0
0
Jn1 : : :
B :
H D @ :: : : :

yi D .yi1 ; yi2 ; : : : ; yi ni /0 ; i D 1; : : : ; g;
10 1 0 1
::: 0
"1
1
: C :
:
::
: :: A @ :: A C @ :: A ;
g
"g
: : : 1ng
1
0
1
0
0
In1  Jn1 : : :
:: C
::
::
B
C
::
A;
:
: A; M D @
:
:

0 : : : Jng
0
: : : Ing  Jng
0
1 0 1
0
1
Cn1 y1
yN1 1n1
yN 1
B : C B:C
B
C
Hy D @ :: A D @ :: A ; My D @ ::: A ;
yNg 1ng
yN g
Cng yg
1
0
N n1
.yN1  y/1
C
B
:
:
.H  Jn /y D @
A;
:
N ng
.yNg  y/1
0

SST D y Cn y D

ni
g X
X

.yij  y/
N 2;

iD1 j D1

SSE D y0 My D SS1 C    C SSg ;


0

SSR D y .H  Jn /y D

g
X

ni .yNi  y/
N 2 D SSB;

iD1

SST D SSR C SSE;

because 1n 2 C .X/;

SSEH D SSE.1 D    D g / D SST H) SSEH  SSE D SSR:


7.33

Consider one-way ANOVA situation with g groups. Let x be a variable indicating the group where the observation belongs to so that x has values 1; : : : ,

8 Regression diagnostics

39

g 
1
g. Suppose that the n data points y111 ; : : : ; y1n
; : : : ; ygg1 ; : : : ; ygng
1
comprise
 a theoretical two-dimensional discrete distribution for the random
vector yx where each pair appears with the same probability 1=n. Let
E.y j x/ WD m.x/ be a random variable which takes the value E.y j x D i /
when x takes the value i , and var.y j x/ WD v.x/ is a random variable taking
the value var.y j x D i / when x D i . Then the decomposition

var.y/ D varE.y j x/ C Evar.y j x/ D varm.x/ C Ev.x/


(multiplied by n) can be expressed as
ni
g X
X

.yij  y/
N D

iD1 j D1

g
X

ni .yNi  y/
N C

.yij  yNi /2 ;

iD1 j D1

iD1

SST D SSR C SSE;

ni
g X
X

y0 .I  J/y D y0 .H  J/y C y0 .I  H/y;

where J and H are as in 7.32.


7.34

One-way ANOVA using another parametrization.


y D X C "; yi D .yi1 ; yi2 ; : : : ; yi ni /0 ; i D 1; : : : ; g;
101 0 1
0 1 0
1n1 1n1 : : : 0
"1
y1
1 C
::
:: : :
:: C B
:: A ;
C
B
@
@ :: A D B
C
A
@
:
:
:
:
:
:
:
@:A
:
yg
"g
1ng 0 : : : 1ng
g
1
0
n n1 : : : ng
B
n
1 n1 : : : 0 C
C
X0 X D B
@ ::: ::: : : : ::: A ;
n g 0 : : : ng
0
1
0 1
0
0 0 ::: 0
B0 1=n1 : : : 0 C
C
B
y
N
C 2 f.X0 X/ g; O D GX0 y D B :1 C :
:
:
:
G WD B
:
:
:
:
:
@:
@ :: A
:
:
: A
0

yNg

: : : 1=ng

8 Regression diagnostics
8.1

Some particular residuals under fy; X;  2 Ig:


(a) "Oi D yi  yOi

ordinary least squares residual

(b) RESSi D .yi  yOi /=O


(c) STUDIi D ri D

yi  yOi
p
O 1  hi i

scaled residual
internally Studentized residual

40

Formulas Useful for Linear Regression Analysis and Related Matrix Theory

(d) STUDEi D ti D

yi  yOi
p
O .i/ 1  hi i

externally Studentized residual

O .i/ D the estimate of  in the model where the i th case is excluded,


STUDEi  t.n  k  2; /.
8.2

Denote


X.i/
;
XD 0
x.i/

 
y
y D .i/ ;
yi

Z D .X W ii /;

 

D
;

ii D .0; : : : ; 0; 1/0 ;

and let M.i/ D fy.i/ ; X.i/ ;  2 In1 g be the model where the i th (the last, for
notational convenience) observation is omitted, and let MZ D fy; Z;  2 Ig
O
O
O
be the extended (mean-shift) model. Denote O .i/ D .M
.i/ /, Z D .MZ /,
O
O
and D .MZ /. Then
(a) O Z D X0 .I  ii i0i /X1 X0 .I  ii i0i /y D .X0.i/ X.i/ /1 X0.i/ y.i/ D O .i/ ,
i0 My
"Oi
(b) O D 0i
D
;
ii Mii
mi i

we assume that mi i > 0, i.e., is estimable,

(c) SSEZ D SSE.MZ /


D y0 .I  PZ /y D y0 .M  PMii /y D y0 .I  ii i0i  P.Iii i0i /X /y
D SSE.i/ D SSE  y0 PMii y D SSE  mi i O2 D SSE  "O2i =mi i ;
(d) SSE  SSEZ D y0 PMii y D change in SSE due to hypothesis D 0,


.In1  PX.i / /y.i/
;
(e) .In  PZ /y D
0
(f) ti2 D

y0 PMii y
SSE  SSEZ
O2
D
D
,
O
y0 .M  PMii /y=.n  k  2/
SSEZ =.n  k  2/
se2 ./

(g) ti2  F.1; n  k  2; 2 mi i = 2 /,


O D O 2 =mi i D O 2 =mi i ,
(h) se2 ./
Z
.i/

F -test statistic for


testing D 0 in MZ ,

(i) mi i > 0 () Mii 0 () ii C .X/ () x.i/ 2 C .X0.i/ /


() is estimable under MZ
() is estimable under MZ :
8.3

If Z D .1 W in /, then tn D

yn  yN.n/
yn  yN
D
,
p
p
s.n/ = 1  1=n
s.n/ 1  1=n

2
D sample variance of y1 ; : : : ; yn1 and yN.n/ D .y1 C    C
where s.n/
yn1 /=.n  1/.

8 Regression diagnostics

41

8.4

O
DFBETAi D O  O .i/ D .X0 X/1 X0 ii ;

X.O  O .i/ / D Hii O

8.5

O
y  XO .i/ D My C Hii O H) "O.i/ D i0i .y  XO .i/ / D yi  x0.i/ O .i/ D ,
"O.i/ D the i th predicted residual: "O.i/ is based on a t to the data with the i th
case deleted

8.6

"O2.1/ C    C "O2.n/ D PRESS

8.7

COOK2i D Di2 D .O  O .i/ /0 X0 X.O  O .i/ /=.p O 2 /


D .Oy  yO i /0 .Oy  yO i /=.p O 2 / D O2 hi i =.p O 2 /
ri2 hi i
hi i
1 "O2i
D
;
p O 2 mi i mi i
p mi i

8.8

COOK2i D

8.9

hi i D x0.i/ .X0 X/1 x.i/ ;


x0.i/

the predicted residual sum of squares

.1; x0.i/ /

hi i D leverage

D the i th row of X

h11 C h22 C    C hnn D p .D k C 1/ D trace.H/ D r.X/

8.11

H1 D 1

8.12

1
1

hi i
;
n
c

8.13

h2ij

8.14

hi i D
D
D

yO i D XO .i/

ri D STUDIi

8.10

1 2 C .X/

1
;
4
1
n
1
n
1
n

Cooks distance

c D # of rows of X which are identical with x0.i/ (c  1)

for all i j

Q .i/
C xQ 0.i/ T1
xx x
N/
C .x.i/  xN /0 T1
xx .x.i/  x
C

1
.x
n1 .i/

N/ D
 xN /0 S1
xx .x.i/  x

.xi  x/
N 2
1
C
;
n
SSx

x0.i/ D the i th row of X0


1
n

1
MHLN2 .x.i/ ; xN ; Sxx /
n1

8.15

hi i D

8.16

N / D .n  1/hQ i i
MHLN2 .x.i/ ; xN ; Sxx / D .x.i/  xN /0 S1
xx .x.i/  x

8.17

chmax .X0 X/
.X X/ D
chmin .X0 X/

8.18

chmax .X0 X/
i .X X/ D
chi .X0 X/

when k D 1

1=2
D
1=2
D

sgmax .X/
sgmin .X/
sgmax .X/
sgi .X/

Q D PCX
H
0

condition number of X

condition indexes of X

42

Formulas Useful for Linear Regression Analysis and Related Matrix Theory

8.19

Variables (columns) x1 ; : : : ; xp are exactly collinear if one of the xi s is an


exact linear combination of the others. This is exact collinearity, i.e., linear
dependence, and by term collinearity or near dependence we mean inexact
collinear relations.

8.20

Let MZ D fy; Z;  2 Ig denote the extended model where Z D .X W U/,


M D fy1 ; X.1/ ; Inb g is the model without the last b observations and

 

 

X.1/
0
; X.2/ 2 Rbp ; Unb D
; D
:
Xnp D

X.2/
Ib


Ia  H11 H12
; a C b D n;
(a) M D
H21 Ib  H22
M22 D Ib  H22 D U0 .In  H/U;
(b) r.M22 / D r.U0 M/ D b  dim C .U/ \ C .X/,
(c) rX0 .In  UU0 / D r.X.1/ / D r.X/  dim C .X/ \ C .U/,
(d) dim C .X/\C .U/ D r.X/r.X.1/ / D r.X.2/ /dim C .X0.1/ /\C .X0.2/ /,
(e) r.M22 / D b  r.X.2/ / C dim C .X0.1/ / \ C .X0.2/ /,
(f) M22 is pd () C .X0.2/ /  C .X0.1/ / () r.X.1/ / D r.X/
() ch1 .U0 HU/ D ch1 .H22 / < 1
() C .X/ \ C .U/ D f0g
() (and then ) is estimable under MZ :

8.21

O
O
O
If X.1/ has full column rank, then .M
 / D .MZ / WD  and
O2
(a) O D .U0 MU/1 U0 My D M1
22 "
D

.M1
22 M21
0

W Ib /y D

M1
22 M21 y1

"O 2 D lower part of "O D My


C y2

y2 D lower part of y
0

(b) SSEZ D y .In  PZ /y D y .M  PMU /y D y In  UU0  P.In UU0 /X y


D SSE D SSE  y0 PMU y
O 2 D SSE  O 0 M22 O
D SSE  "O 02 M1
22 "
0
(c) O  O  D .X0 X/1 X0 UM1
22 U My

O 2 D .X0 X/1 X0 UO
D .X0 X/1 X0.2/ M1
22 "
y0 MU.U0 MU/1 U0 My=b
y0 PMU y=b
D
y0 .M  PMU /y=.n  p  b/
SSE =.n  p  b/
0
1
"O 2 M22 "O 2 =b
 F.b; n  p  b; 0 M22 = 2 /
D
SSE =.n  p  b/
D F -test statistic for testing hypothesis D 0 in MZ

(d) t2 D

D the multiple case analogue of externally Studentized residual

8 Regression diagnostics

43

O
(e) COOK D .O  O  /0 X0 X.O  O  /=.p O 2 / D O 0 H22 =.p
O 2 /
1
O 2 =.p O 2 /
D "O 02 M1
22 H22 M22 "

8.22

jZ0 Zj D jX0 .In  UU0 /Xj D jX0.1/ X.1/ j D jM22 jjX0 Xj

8.23

jX0 Xj.1  i0i Hii / D jX0 .In  ii i0i /Xj;

8.24

C .X0.1/ /

8.25

PZ D

8.26

Corresponding to 8.2 (p. 40), consider the models M D fy; X;  2 Vg,


M.i/ D fy.i/ ; X.i/ ;  2 V.n1/ g and MZ D fy; Z;  2 Vg, where r.Z/ D
Q 0 denote the BLUE of  under MZ . Then:
p C 1, and let Q D .Q 0 W /

C .X0.2/ /

i.e.,

mi i D jX0.i/ X.i/ j=jX0 Xj

PX.1/ 0
D f0g () H D
0 PX.2/


PX.1/ 0
;
0 Ib


QZ D I n  P Z D

Ia  PX.1/ 0
0
0

(a) Q Z D Q .i/ ;

Q D ePi =m
P ii ,

ePi
(b) DFBETAi .V / D Q  Q .i/ D .X0 V1 X/1 X0 V1 ii
,
m
P ii
P dened as
where m
P i i is the i th diagonal element of the matrix M
P D V1  V1 X.X0 V1 X/1 X0 V1 D M.MVM/ M D .MVM/C ,
(c) M
P We denote
and ePi is the i th element of the vector eP D My.
P D V1 X.X0 V1 X/1 X0 V1 and hence H
P CM
P D V1 .
(d) H
(e) ti2 .V / D
(f) Q 2 D

ePi2
ePi2
Q2
D
D
2
Q
var.
f ePi /
m
P i i Q .i/
var.
f /

SSE.V /
,
np

P
where SSE.V / D y0 My;

F -test statistic
for testing D 0
2
D
Q .i/

SSE.i/ .V /
,
np1

P i i D SSE.V /  Q2 m
P i i D SSEZ .V /,
(g) SSE.i/ .V / D SSE.V /  ePi2 =m
(h) var.ePi / D  2 m
P ii ;
(i) COOK2i .V / D
8.27

Q D  2 =m
var./
P ii ,

ri2 .V / hP i i
.Q  Q .i/ /0 X0 V1 X.Q  Q .i/ /
D
.
p Q 2
p m
P ii

Consider the deletion of the last b observations in model M D fy; X;  2 Vg


and let Z be partitioned as in 8.20 (p. 42). Then
P 1 eP 2
(a) Q  Q  D .X0 V1 X/1 X0 V1 UQ D .X0 V1 X/1 X0 V1 UM
22
Q  is calculated without the last b observations;

44

Formulas Useful for Linear Regression Analysis and Related Matrix Theory

P 1 U0 My
P DM
P 1 eP 2 ;
(b) Q D .U0 MU/
22

P
eP 2 D lower part of eP D My

0 P
P  y0 MU.U
P
P
MU/1 U0 My
(c) SSEZ .V / D y0 My
P  eP 0 M
P 1 P 2
D y0 My
2 22 e
P Z y D y0 MZ .MZ VMZ / MZ y
D y0 M

MZ D I  P Z

D SSE .V /,
(d) t2 .V / D

P 1 eP 2 =b
eP 02 M
22
P 22 = 2 /,
 F.b; n  p  b; 0 M
SSE .V /=.n  p  b/

Q
P 22 =.p
(e) COOK2 .V / D .Q  Q  /0 X0 V1 X.Q  Q  /=.p 2 / D Q 0 H
Q 2 /.

9 BLUE: Some preliminaries


9.1

Let Z be any matrix such that C .Z/ D C .X? / D N .X0 / D C .M/ and
denote F D VC1=2 X, L D V1=2 M. Then F0 L D X0 PV M, and
X0 PV M D 0 H) PF C PL D P.FWL/ D PV :

9.2

P Consider the linear model M D fy; X;  2 Vg, where X and V


Matrix M.
P and M
R be dened as
may not have full column ranks. Let the matrices M
P D M.MVM/ M;
M

P V:
R D PV MP
M

P V is always
P is unique i C .X W V/ D Rn . However, PV MP
The matrix M
unique. Suppose that the condition HPV M D 0 holds. Then
R D PV M.MVM/ MPV D VC  VC X.X0 VC X/ X0 VC ,
(a) M
R D MVC M  MVC X.X0 VC X/ X0 VC M,
(b) M
R D MM
R D MM
R
R
(c) M
D MMM,
(d) M.MVM/C M D .MVM/C M D M.MVM/C D .MVM/C ,
R M
R D M,
R i.e., V 2 f.M/
R  g,
(e) MV
R D r.VM/ D r.V/  dim C .X/ \ C .V/ D r.X W V/  r.X/,
(f) r.M/
(g) If Z is a matrix with property C .Z/ D C .M/, then
R D PV Z.Z0 VZ/ Z0 PV ;
M

P D VZ.Z0 VZ/ Z0 V:
VMV

(h) Let .X W Z/ be orthogonal. Then always (even if HPV M 0)


C

.X W Z/0 V.X W Z/ D .X W Z/0 VC .X W Z/:
Moreover, if in addition we have HPV M D 0, then

9 BLUE: Some preliminaries

.X W Z/0 V.X W Z/C D




45

X0 V C X X0 V C Z
D
Z0 VC X Z0 VC Z


X0 VX  X0 VZ.Z0 VZ/ Z0 VXC
X0 VC XX0 VZ.Z0 VZ/C
:
.Z0 VZ/C Z0 VXX0 VC X
Z0 VZ  Z0 VX.X0 VX/ X0 VZC

(i) If V is positive denite and C .Z/ D C .M/, then


(i)

P DM
R D M.MVM/ M D .MVM/C D Z.Z0 VZ/ Z0
M
D V1  V1 X.X0 V1 X/ X0 V1 D V1 .I  PXIV1 /;

P
(ii) X.X0 V1 X/ X0 D V  VZ.Z0 VZ/ Z0 V D V  VMV,
(iii) X.X0 V1 X/ X0 V1 D I  VZ.Z0 VZ/ Z0
P D I  P0ZIV D I  PVZIV1 :
D I  VM
(j) If V is positive denite and .X W Z/ is orthogonal, then
.X0 V1 X/1 D X0 VX  X0 VZ.Z0 VZ/1 Z0 VX:
9.3

If V is positive denite, then


P
P D V1  H
(a) M
D V1  V1 X.X0 V1 X/ X0 V1 D V1 .I  PXIV1 /;
P D M.MVM/ M D MMM
P
P D MM
P
(b) M
D MM
D M.MVM/C M D .MVM/C M D M.MVM/C D .MVM/C :

9.4

M.MVM/ M D M.MVM/C M i C .M/  C .MV/ i r.X W V/ D n

9.5

P general case. Consider the model fy; X; Vg. Let U be any matrix
Matrix M:
such that W D V C XUX0 has the property C .W/ D C .X W V/. Then
PW M.MVM/ MPW D .MVM/C
D WC  WC X.X0 W X/ X0 WC ;
i.e.,
R W D WC  WC X.X0 W X/ X0 WC :
P W WD M
PW MP
R in 9.2 (p. 44).
R W has the corresponding properties as M
The matrix M

9.6

W D VM.MVM/ MV C X.X0 W X/ X0

9.7

V  VM.MVM/ MV D X.X0 W X/ X0  X0 UX


D HVH  HVM.MVM/ MVH

46

9.8

Formulas Useful for Linear Regression Analysis and Related Matrix Theory

X.X0 WC X/ X0 WC D X.X0 W X/ X0 WC D I  WM.MWM/ MPW


D I  VM.MVM/ MPW D H  HVM.MVM/ MPW

9.9

Let W D V C XUU0 X 2 NNDn have property C .X W V/ D C .W/. Then


W is a generalized inverse of V i C .V/ \ C .XUU0 X0 / D f0g.

9.10

Properties of X0 W X. Suppose that V 2 NNDnn , X 2 Rnp , and W D


V C XUX0 , where U 2 Rpp . Then the following statements are equivalent:
(a) C .X/  C .W/,
(b) C .X W V/ D C .W/,
(c) r.X W V/ D r.W/,
(d) X0 W X is invariant for any choice of W ,
(e) C .X0 W X/ is invariant for any choice of W ,
(f) C .X0 W X/ D C .X0 / for any choice of W ,
(g) r.X0 W X/ D r.X/ irrespective of the choice of W ,
(h) r.X0 W X/ is invariant with respect to the choice of W ,
(i) X.X0 W X/ X0 W X D X for any choices of the g-inverses involved.
Moreover, each of these statements is equivalent to (a) C .X/  C .W0 /, and
hence to the statements (b)(i) obtained from (b)(i), by setting W0 in place
of W. We will denote
(w) W D f Wnn W W D V C XUX0 ; C .W/ D C .X W V/ g.

9.11

C .VX? /? D C .W X W I  W W/

9.12

Consider the linear model fy; X; Vg and denote W D V C XUX0 2 W, and


let W be an arbitrary g-inverse of W. Then

9.13

W D V C XUX0 2 W

C .W X/ C .X/? D Rn ;

C .W X/? C .X/ D Rn ;

C .W /0 X C .X/? D Rn ;

C .W /0 X? C .X/ D Rn :

PA .PA NPA /C PA D .PA NPA /C PA


D PA .PA NPA /C D .PA NPA /C

9.14

for any N

Let X be any matrix such that C .X/ D C .X /. Then


(a) PV X .X0 VX / X0 PV D PV X.X0 VX/ X0 PV ,
(b) PV X .X0 VC X / X0 PV D PV X.X0 VC X/ X0 PV ,
(c) C .X/  C .V/ H)

9 BLUE: Some preliminaries

47

X .X0 V X / X0 D X.X0 V X/ X0 D H.H0 V H/ H D .H0 V H/C .


9.15

Let Xo be such that C .X/ D C .Xo / and X0o Xo D Ir , where r D r.X/. Then
Xo .X0o VC Xo /C X0o D H.H0 VC H/C H D .H0 VC H/C ;
whereas Xo .X0o VC Xo /C X0o D X.X0 VC X/C X0 i C .X0 XX0 V/ D C .X0 V/.

9.16

Let V 2 NNDnn , X 2 Rnp , and U 2 Rpp be such such that W D


V C XUX0 saties the condition C .W/ D C .X W V/. Then the equality
W D VB.B0 VB/ B0 V C X.X0 W X/ X0
holds for an n  q matrix B i
C .VW X/  C .B/?

and

C .VX? /  C .VB/;

or, equivalently, C .VW X/ D C .B/? \ C .V/, the subspace C .VW X/


being independent of the choice of W .
9.17

Let V 2 NNDnn , X 2 Rnp and C .X/  C .V/. Then the equality


V D VB.B0 VB/ B0 V C X.X0 V X/ X0
holds for an n  q matrix B i C .X/ D C .B/? \ C .V/.

9.18

Denition of estimability: Let K0 2 Rqp . Then K0 is estimable if there


exists a linear estimator Fy which is unbiased for K0 .
In what follows we consider the model fy; X1 1 C X2 2 ; Vg, where Xi 2
Rnpi , i D 1; 2, p1 C p2 D p.

9.19

K0 is estimable i C .K/  C .X0 / i K D X0 A for some A i K0 O D


K0 .X0 X/ X0 y is unique for all .X0 X/ .

9.20

L 2 is estimable i C .L0 /  C .X02 M1 /, i.e., L D BM1 X2 for some B.

9.21

The following statements are equivalent:


(a) 2 is estimable,

(b) C Ip02  C .X0 /,
(c) C .X1 / \ C .X2 / D f0g and r.X2 / D p2 ,
(d) r.M1 X2 / D r.X2 / D p2 .

9.22

Denoting PX2 X1 D X2 .X02 M1 X2 / X02 M1 , the following statements are


equivalent:
(a) X2 2 is estimable,

(b) r.X02 / D r.X02 M1 /,

48

Formulas Useful for Linear Regression Analysis and Related Matrix Theory

(c) C .X1 / \ C .X2 / D f0g,

(d) PX2 X1 X2 D X2 ,

(e) PX2 X1 is invariant with respect to the choice of .X02 M1 X2 / ,


(f) PX2 X1 is a projector onto C .X2 / along C .X1 / C .X/? ,
(g) H D PX2 X1 C PX1 X2 .
9.23

k is estimable () xk C .X1 / () M1 xk 0

9.24

k0 is estimable () k 2 C .X0 /

9.25

is estimable () r.Xnp / D p

9.26

Consider the one-way ANOVA using the following parametrization:


 

y D X C "; D
;

101 0 1
0 1 0
1
1
:
:
:
0
"1
y1
n1
n1
C
::
:: : :
:: C B
B :1 C C @ :: A :
@ :: A D B
A
@
:
:
:
:
:
:
@:A
:
yg
"g
1ng 0 : : : 1ng
g
Then the parametric function a0 D .b; c0 / D b C c0  is estimable i
a0 u D 0; where u0 D .1; 10g /;
0

i.e.,

b  .c1 C    C cg / D 0;

and c  is estimable i c 1g D 0. Such a function c0  is called a contrast, and


the contrasts of the form i  j , i j , are called elementary contrasts.
9.27

(Continued . . . ) Denote the model matrix above as X D .1n W X0 / 2


Rn.gC1/ , and let C 2 Rnn be the centering matrix. Then
(a) N .X/ D C .u/; where u0 D .1; 10g /,
(b) N .CX0 / D C .X00 C/? D C .1g /; and hence C .X00 C/ D C .1g /? .

10 Best linear unbiased estimator


10.1

Denition 1: Let k0 be an estimable parametric function. Then g0 y is


BLUE.k0 / under fy; X;  2 Vg if g0 y is an unbiased estimator of k0 and
it has the minimum variance among all linear unbiased estimators of k0 :
E.g0 y/ D k0 and var.g0 y/
var.f 0 y/ for any f W E.f 0 y/ D k0 :

10.2

Denition 2: Gy is BLUE.X/ under fy; X;  2 Vg if


E.Gy/ D X and cov.Gy/
L cov.Fy/ for any F W E.Fy/ D X:

10 Best linear unbiased estimator

49

10.3

Gy D BLUE.X/ under fy; X;  2 Vg () G.X W VM/ D .X W 0/.

10.4

GaussMarkov theorem: OLSE.X/ D BLUE.X/ under fy; X;  2 Ig.

10.5

f D XQ should be understood as
The notation Gy D BLUE.X/ D X
Gy 2 fBLUE.X j M /g;

i.e.,

G 2 fPXjVM g;

where fBLUE.X j M /g refers to the set of all representations of the BLUE.


The matrix G is unique i C .X W V/ D Rn , but the value of Gy is always
unique after observing y.
10.6

A gentle warning regarding notation 10.5 may be worth giving. Namely in


10.5 Q refers now to any vector Q D Ay such that XQ is the BLUE.X/. The
vector Q in 10.5 need not be the BLUE./the parameter vector may not
even be estimable.

10.7

Let K0 be an estimable parametric function under fy; X;  2 Vg. Then


Gy D BLUE.K0 / () G.X W VM/ D .K0 W 0/:

10.8

Gy D BLUE.X/ under fy; X;  2 Vg i there exists a matrix L so that G is


a solution to (Pandoras Box)
  0  

G
0
V X
D
:
L
X0
X0 0

10.9

The general solution for G satisfying G.X W VM/ D .X W 0/ can be expressed,


for example, in the following ways:
(a) G1 D .X W 0/.X W VM/ C F1 In  .X W VM/.X W VM/ ,
(b) G2 D X.X0 W X/ X0 W C F2 .In  WW /,
(c) G3 D In  VM.MVM/ M C F3 In  MVM.MVM/ M,
(d) G4 D H  HVM.MVM/ M C F4 In  MVM.MVM/ M,
where F1 , F2 , F3 and F4 are arbitrary matrices, W D V C XUX0 and U is
any matrix such that C .W/ D C .X W V/, i.e., W 2 W, see 9.10w (p. 46).

10.10 If C .X/  C .V/, fy; X;  2 Vg is called a weakly singular linear model.


10.11 Consistency condition for M D fy; X;  2 Vg. Under M , we have
y 2 C .X W V/ with probability 1.
The above statement holds if the model M is indeed correct, so to say (as all
our models are supposed to be). Notice that C .X W V/ can be written as
C .X W V/ D C .X W VM/ D C .X/ C .VM/:

50

Formulas Useful for Linear Regression Analysis and Related Matrix Theory

10.12 Under the (consistent model) fy; X;  2 Vg, the estimators Ay and By are said
to be equal if their realized values are equal for all y 2 C .X W V/:
A1 y equals A2 y () A1 y D A2 y for all y 2 C .X W V/ D C .X W VM/:
10.13 If A1 y and A2 y are two BLUEs under fy; X;  2 Vg, then A1 y D A2 y for all
y 2 C .X W V/.
10.14 A linear estimator ` 0 y which is unbiased for zero, is called a linear zero function. Every linear zero function can be written as b0 My for some b. Hence an
unbiased estimator Gy is BLUE.X/ i Gy is uncorrelated with every linear
zero function.
10.15 If Gy is the BLUE for an estimable K0 under fy; X;  2 Vg, then LGy is the
BLUE for LK0 ; shortly, fLBLUE.K0 /g  fBLUE.LK0 /g for any L.
10.16 Gy D BLUE.GX/ under fy; X;  2 Vg () GVM D 0
10.17 If Gy D BLUE.X/ then HGy D BLUE.X/ and thereby there exists L
such that XLy D BLUE.X/.
10.18 Consider the models M D fy; X; Vg and MW D fy; X; Wg, where W D
V C XUU0 X0 , and C .W/ D C .X W V/. Then
(a) cov.XO j MW / D HWH;
cov.XQ j MW / D X.X0 W X/ X0 D .HW H/C ;
(b) cov.XO j M / D HVH;

cov.XQ j M / D X.X0 W X/ X0  XUU0 X0 ,

(c) cov.XO j MW /  cov.XQ j MW / D cov.XO j M /  cov.XQ j M /,


(d) fBLUE.X j MW /g D fBLUE.X j M /g.
10.19 Under fy; X;  2 Vg, the BLUE.X/ has the representations
f D XQ D 
Q
BLUE.X/ D X
D PXIV1 y D X.X0 V1 X/ X0 V1 y
X.X0# X# / X0# y#
0 
 0 

D
D X.X V X/ X V y

y# D V

1=2

V pd
1=2

y, X# D V
X
C .X/  C .V/

D H  HVM.MVM/ My
D OLSE.X/  HVM.MVM/ My
D I  VM.MVM/ My
D X.X0 W X/ X0 W y

W D V C XUX0 2 W

10 Best linear unbiased estimator

51

10.20 BLUE./ D Q D .X0 X/1 X0 y  .X0 X/1 X0 VM.MVM/ My


P
D O  .X0 X/1 X0 VMy
0

D .X V X/

1

only r.X/ D p requested

C .X/  C .V/, any V will do

XV y

Q D X.X0 V1 X/ X0


10.21 (a) cov.X/
0

V pd

D X.X V X/ X D .HV H/
C .X/  C .V/
D HVH  HVM.MVM/ MVH
D K0 K  K0 PL K
K D V1=2 H; L D V1=2 M
O  HVM.MVM/ MVH
D cov.X/
D V  VM.MVM/ MV D V1=2 .I  PV1=2 M /V1=2
D X.X0 W X/ X0  XUX0 ,
Q
(b) cov./
D .X0 V1 X/1
0

1

V pd, r.X/ D p
0

1

1

D .X X/ X VX.X X/  .X X/ X VM.MVM/ MVX.X0 X/1


O  .X0 X/1 K0 PL K.X0 X/1
D cov./
only r.X/ D p requested
0
1
0
0
0
1
D .X X/ .K K  K PL K/.X X/
K D V1=2 X; L D V1=2 M
D .X0 VC X/1

C .X/  C .V/; r.X/ D p

Q D MVM= and
10.22 Denoting D .H W M/0 V.H W M/, we have cov.X/
Q
Q D
hence r./ D r.V/ D r.MVM/ C rcov.X/, which yields rcov.X/
Q
r.V/r.VM/ D dim C .X/\C .V/. Moreover, C cov.X/ D C .X/\C .V/
Q D dim C .X/ \ C .V/.
and if r.X/ D p, we have rcov./
10.23 The BLUE of X and its covariance matrix remain invariant for any choice
of X as long as C .X/ remains the same.
Q D cov.X/
O  cov.X/
Q D HVM.MVM/ MVH
10.24 (a) cov.XO  X/
Q D cov./
O  cov./
Q
(b) cov.O  /
D .X0 X/1 X0 VM.MVM/ MVX.X0 X/1
O /
Q D cov./
Q
(c) cov.;
P
10.25 "Q D y  XQ D VM.MVM/ My D VMy

residual of the BLUE

Q D cov.y/  cov.X/
Q D covVM.MVM/ My
Q D cov.y  X/
10.26 cov."/
P D VM.MVM/ MV;
Q D C .VM/
D cov.VMy/
C cov."/
10.27 Pandoras Box. Consider the model fy; X;  2 Vg and denote

52

Formulas Useful for Linear Regression Analysis and Related Matrix Theory


V X
;
X0 0

D

and

BD

 


V X
B1 B2
D
2 f  g:
X0 0
B3 B4

Then

 
 
V
X
\
C
D f0g,
X0
0


V X
(b) r
D r.V W X/ C r.X/,
X0 0
(a) C

(c) XB02 X D X;

XB3 X D X,

(d) XB4 X0 D XB04 X0 D VB03 X0 D XB3 V D VB2 X0 D XB02 V,


(e) X0 B1 X, X0 B1 V and VB1 X are all zero matrices,
(f) VB1 VB1 V D VB1 V D VB01 VB1 V D VB01 V,
(g) tr.VB1 / D r.V W X/  r.X/ D tr.VB01 /,
(h) VB1 V and XB4 X0 are invariant for any choice of B1 and B4 ,
Q D XB4 X0 D V  VB1 V,
(i) XQ D XB02 y, cov.X/
"Q D y  XQ D VB1 y,
Q D  2 k0 B4 k,
(j) for estimable k0 , k0 Q D k0 B02 y D k0 B3 y, var.k0 /
(k) Q 2 D y0 B1 y=f is an unbiased estimator of  2 ; f D r.VM/.
10.28 "Q D y  XQ D .I  PXIV1 /y

residual of the BLUE, V pd

P
D VM.MVM/ My D VMy


10.29 "Q # D V1=2 "Q

V can be singular

residual in M# D fy# ; X# ;  2 Ig, V pd

D V1=2 .I  PXIV1 /y
10.30 SSE.V / D minky  Xk2V1 D min.y  X/0 V1 .y  X/

D
D
D

ky  PXIV1 yk2V1
Q 2 1 D .y  X/
Q 0 V1 .y
ky  Xk
V
"Q 0# "Q # D y0# .I  PX# /y#

Q D "Q 0 V1 "Q


 X/
y# D V1=2 y, X# D V1=2 X

D y0 V1  V1 X.X0 V1 X/ X0 V1 y


P
D y0 M.MVM/ My D y0 My
10.31 Q 2 D SSE.V /=n  r.X/

general presentation

unbiased estimator of  2 , V pd

10.32 Let W be dened as W D V C XUX0 , with C .W/ D C .X W V/. Then an


unbiased estimator for  2 is

10 Best linear unbiased estimator

53

(a) Q 2 D SSE.W /=f , where f D r.X W V/  r.X/ D r.VM/, and


Q 0 W .y  X/
Q D .y  X/
Q 0 V .y  X/
Q
(b) SSE.W / D .y  X/
0

0


0

 0

D "Q W "Q D y W  W X.X W X/ X W y
P D SSE.V /: Note: y 2 C .W/
D y0 M.MVM/ My D y0 My
P D y0 My D SSE.I / 8 y 2 C .X W V/ () .VM/2 D VM
10.33 SSE.V / D y0 My
10.34 Let V be pd and let X be partitioned as X D .X1 W X2 /, r.X/ D p. Then, in
view of 21.2321.24 (p. 95):
P 1 X2 .X02 M
P 1 X2 /1 X02 M
P1
(a) P.X1 WX2 /IV1 D X1 .X01 V1 X1 /1 X01 V1 C VM
P 2 X1 /1 X01 M
P 2 C X2 .X02 M
P 1 X2 /1 X02 M
P 1;
D X1 .X01 M
P 1 D V1  V1 X1 .X0 V1 X1 /1 X0 V1 D M1 .M1 VM1 / M1 ,
(b) M
1
1
P 2 X1 /1 X0 M
P
(c) Q 1 D .X01 M
1 2 y;

P 1 X2 /1 X0 M
P
Q 2 D .X02 M
2 1 y,

P 1 X2 /1 ,
(d) cov.Q 2 / D  2 .X02 M
(e) cov.O 2 / D  2 .X02 M1 X2 /1 X02 M1 VM1 X2 .X02 M1 X2 /1 ,
(f) BLUE.X/
D P.X1 WX2 /IV1 y D X1 Q 1 C X2 Q 2
P 2 X1 /1 X0 M
P 2 y C X2 .X0 M
P 1 X2 /1 X0 M
P 1y
D X1 .X0 M
D

1
1
0 1
1 0 1
X1 .X1 V X1 / X1 V y

1 0 P
P 1 X2 .X0 M
P
C VM
2 1 X2 / X2 M1 y

P 1 X2 Q 2
D PX1 IV1 y C VM
P 1 X2 Q 2 .M12 /;
D X1 Q 1 .M1 / C VM

M1 D fy; X1 1 ;  2 Vg

(g) Q 1 .M12 / D Q 1 .M1 /  .X01 V1 X1 /1 X01 V1 X2 Q 2 .M12 /.


(h) Replacing V1 with VC and denoting PAIVC D A.A0 VC A/1 A0 VC , the
results in (c)(g) hold under a weakly singular model.
10.35 Assume that C .X1 / \ C .X2 / D f0g, r.M1 X2 / D p2 and denote
W D V C X1 U1 X01 C X2 U2 X02 ; Wi D V C Xi Ui X0i ;
P 1W D M1 .M1 WM1 / M1 D M1 .M1 W2 M1 / M1 ;
M

i D 1; 2;

where C .Wi / D C .Xi W V/, C .W/ D C .X W V/. Then


P 1W X2 /1 X02 M
P 1W y:
BLUE. 2 / D Q 2 .M12 / D .X02 M
10.36 (Continued . . . ) If the disjointness holds, and C .X2 /  C .X1 W V/, then
P 1 X 2 / X 0 M
P
(a) X2 Q 2 .M12 / D X2 .X02 M
2 1 y,
(b) X1 Q 1 .M12 / D X1 Q 1 .M1 /  X1 .X01 WC X1 / X01 WC X2 Q 2 .M12 /,

54

Formulas Useful for Linear Regression Analysis and Related Matrix Theory

(c) BLUE.X j M12 /


P 1  BLUE.X2 2 j M12 /
D BLUE.X1 1 j M1 / C W1 M
P 1 X2 2 j M12 /:
D BLUE.X1 1 j M1 / C W1  BLUE.M
P 12 y
10.37 SSE.V / D minky  Xk2V1 D y0 M

P 12 D M
P
M

D y0 V1 .I  P.X1 WX2 /IV1 /y


P 1 X2 .X02 M
P 1 X2 /1 X02 M
P 1y
D y0 V1 .I  PX1 IV1 /y  y0 M
D SSEH .V /  SSE.V /

H : 2 D 0

P 1 X2 .X02 M
P 1 X2 /1 X02 M
P 1y
10.38 SSE.V / D y0 M
0 P
0 P
D y M1 y  y M12 y
change in SSE.V / due to the hypothesis
D minky  Xk2V1  minky  Xk2V1
H

H W 2 D 0

D Q 02 cov.Q 2 /1 Q 2  2 D Q.V /


10.39 OLSE.X/ D BLUE.X/ i any of the following equivalent conditions
holds. (Note: V is replaceable with VC and H and M can be interchanged.)
(a) HV D VH,

(b) HV D HVH,

(c) HVM D 0,

(d) X0 VZ D 0, where C .Z/ D C .M/,

(e) C .VX/  C .X/,

(f) C .VX/ D C .X/ \ C .V/,

(g) HVH
L V, i.e., V  HVH is nonnegative denite,
(h) HVH
rs V, i.e., r.V  HVH/ D r.V/  r.HVH/, i.e., HVH and V are
rank-subtractive, i.e., HVH is below V w.r.t. the minus ordering,
(i) C .X/ has a basis consisting of r D r.X/ ortonormal eigenvectors of V,
(j) r.T0f1g X/ C    C r.T0fsg X/ D r.X/, where Tfig is a matrix consisting of
the orthonormal eigenvectors corresponding to the ith largest eigenvalue
fig of V; f1g > f2g >    > fsg , fig s are the distinct eigenvalues
of V,
(k) T0fig HTfig D .T0fig HTfig /2 for all i D 1; 2; : : : ; s,
(l) T0fig HTfj g D 0 for all i; j D 1; 2; : : : ; s, i j ,
(m) the squared nonzero canonical correlations between y and Hy are the
nonzero eigenvalues of V HVH for all V , i.e.,
cc2C .y; Hy/ D nzch.V HVH/

for all V ;

(n) V can be expressed as V D HAH C MBM, where A L 0, B L 0, i.e.,

10 Best linear unbiased estimator

55

V 2 V1 D f V L 0 W V D HAH C MBM; A L 0; B L 0 g;
(o) V can be expressed as V D XCX0 C ZDZ0 , where C L 0, D L 0, i.e.,
V 2 V2 D f V L 0 W V D XCX0 C ZDZ0 ; C L 0; D L 0 g;
(p) V can be expressed as V D I C XKX0 C ZLZ0 , where 2 R, and K
and L are symmetric, such that V is nonnegative denite, i.e.,
V 2 V3 D f V L 0 W V D I C XKX0 C ZLZ0 ; K D K0 ; L D L0 g:
10.40 Intraclass correlation structure. Consider the model M D fy; X;  2 Vg,
where 1 2 C .X/. If V D .1  %/I C %110 , then OLSE.X/ D BLUE.X/.
10.41 Consider models M% D fy; X;  2 Vg and M0 D fy; X;  2 Ig, where V
has intraclass correlation structure. Let the hypothesis to be tested be H :
E.y/ 2 C .X /, where it is assumed that 1 2 C .X /  C .X/. Then the
F -test statistics for testing H are the same under M% and M0 .
10.42 Consider the models M1 D fy; X; V1 g and M2 D fy; X; V2 g where V1
and V2 are pd and X 2 Rnp
. Then the following statements are equivalent:
r
(a) BLUE.X j M1 / D BLUE.X j M2 /,
(b) PXIV1 D PXIV1 ,
1

(c) X

V1
2 PXIV1
1

D X0 V1
2 ,

1
(d) P0XIV1 V1
2 PXIV1 D V2 PXIV1 ,
1

(e)

V1
2 PXIV1
1

is symmetric,

1
(f) C .V1
1 X/ D C .V2 X/,

(g) C .V1 X? / D C .V2 X? /,


(h) C .V2 V1
1 X/ D C .X/,
(i) X0 V1
1 V2 M D 0,
V2 V1=2
 V1=2
X/ D C .V1=2
X/,
(j) C .V1=2
1
1
1
1
X/ has a basis U D .u1 W : : : W ur / comprising a set of r
(k) C .V1=2
1
eigenvectors of V1=2
V2 V1=2
,
1
1
X D UA for some Arp , r.A/ D r,
(l) V1=2
1
1=2
1
(m) X D V1=2
1 UA; the columns of V1 U are r eigenvectors of V2 V1 ,

(n) C .X/ has a basis comprising a set of r eigenvectors of V2 V1


1 .

56

Formulas Useful for Linear Regression Analysis and Related Matrix Theory

10.43 Consider the models M1 D fy; X; V1 g and M2 D fy; X; V2 g, where


C
 0
r.X/ D r. Denote PXIWC D X.X0 WC
1 X/ X W1 , where W1 D V1 C
1
XUU0 X0 , and C .W1 / D C .X W V1 / and so PXIWC y is the BLUE for X
1
under M1 . Then PXIWC y is the BLUE for X also under M2 i any of the
1
following equivalent conditions holds:
(a) X0 WC
1 V2 M D 0,

(b) C .V2 WC
1 X/  C .X/,

C
C
?
?
(c) C .V2 M/  N .X0 WC
1 / D C .W1 X/ D C .G/; G 2 f.W1 X/ g,

(d) C .WC
1 X/ is spanned by a set of r proper eigenvectors of V2 w.r.t. W1 ,
(e) C .X/ is spanned by a set of r eigenvectors of V2 WC
1,
(f) PXIWC V2 is symmetric,
1

(g) V2 2 f V2 2 NNDn W V2 D XN1 X0 C GN2 G0 for some N1 and N2 g.


10.44 Consider the models M1 D fy; X; V1 g and M2 D fy; X; V2 g, and let
the notation fBLUE.X j M1 /g  fBLUE.X j M2 /g mean that every representation of the BLUE for X under M1 remains the BLUE for X under
M2 . Then the following statements are equivalent:
(a) fBLUE.X j M1 /g  fBLUE.X j M2 /g,
(b) fBLUE.K0 j M1 /g  fBLUE.K0 / j M2 /g for every estimable K0 ,
(c) C .V2 X? /  C .V1 X? /,
(d) V2 D aV1 C XN1 X0 C V1 MN2 MV1 for some a 2 R, N1 and N2 ,
(e) V2 D XN3 X0 C V1 MN4 MV1 for some N3 and N4 .
10.45 Consider the models M1 D fy; X; V1 g and M2 D fy; X; V2 g. For X to
have a common BLUE under M1 and M2 it is necessary and sucient that
C .V1 X? W V2 X? / \ C .X/ D f0g:
10.46 Consider the linear models M1 D fy; X; V1 g and M2 D fy; X; V2 g. Then
the following statements are equivalent:
(a) G.X W V1 M W V2 M/ D .X W 0 W 0/ has a solution for G,
(b) C .V1 M W V2 M/ \ C .X/ D f0g,




MV1
MV1 M
(c) C
C
:
MV2
MV2 M
10.47 Let U 2 Rnk be given such that r.U/
n  1. Then for every X satisfying C .U/  C .X/ the equality OLSE.X/ D BLUE.X/ holds under
fy; X;  2 Vg i any of the following equivalent conditions holds:

10 Best linear unbiased estimator

(a) V.I  PU / D

57

trV.I  PU /
.I  PU /,
n  r.U/

(b) V can be expressed as V D aICUAU0 , where a 2 R, and A is symmetric,


such that V is nonnegative denite.
10.48 (Continued . . . ) If U D 1n then V in (b) above becomes V D aI C b110 , i.e.
V is a completely symmetric matrix.
10.49 Consider the model M12 D fy; X1 1 C X2 2 ; Vg and the reduced model
M121 D fM1 y; M1 X2 2 ; M1 VM1 g. Then
(a) every estimable function of 2 is of the form LM1 X2 2 for some L,
(b) K0 2 is estimable under M12 i it is estimable under M121 .
10.50 Generalized FrischWaughLovell theorem. Let us denote
fBLUE.M1 X2 2 j M12 /g D f Ay W Ay is BLUE for M1 X2 2 g:
Then every representation of the BLUE of M1 X2 2 under M12 remains the
BLUE under M121 and vice versa, i.e., the sets of the BLUEs coincide:
fBLUE.M1 X2 2 j M12 /g D fBLUE.M1 X2 2 j M121 /g:
In other words: Let K0 2 be an arbitrary estimable parametric function under
M12 . Then every representation of the BLUE of K0 2 under M12 remains
the BLUE under M121 and vice versa.
10.51 Let 2 be estimable under M12 . Then
O 2 .M12 / D O 2 .M121 /;

Q 2 .M12 / D Q 2 .M121 /:

10.52 Equality of the OLSE and BLUE of the subvectors. Consider a partitioned
linear model M12 , where X2 has full column rank and C .X1 /\C .X2 / D f0g
holds. Then the following statements are equivalent:
(a) O 2 .M12 / D Q 2 .M12 /,
(b) O 2 .M121 / D Q 2 .M121 /,
(c) C .M1 VM1 X2 /  C .M1 X2 /,
(d) C M1 VM1 .M1 X2 /?   C .M1 X2 /? ,
(e) PM1 X2 M1 VM1 D M1 VM1 PM1 X2 ,
(f) PM1 X2 M1 VM1 QM1 X2 D 0, where QM1 X2 D I  PM1 X2 ,
(g) PM1 X2 VM1 D M1 VPM1 X2 ,
(h) PM1 X2 VM D 0,
(i) C .M1 X2 / has a basis comprising p2 eigenvectors of M1 VM1 .

58

Formulas Useful for Linear Regression Analysis and Related Matrix Theory

11 The relative eciency of OLSE


11.1

The Watson eciency  under model M D fy; X; Vg. The relative eciency of OLSE vs. BLUE is dened as the ratio
O DD
e./

Q
jX0 Xj2
detcov./
:
D 0
O
jX VXj  jX0 V1 Xj
detcov./

We have 0 < 
1, with  D 1 i OLSE./ D BLUE./.
BWK The BloomeldWatsonKnott inequality. The lower bound for  is
41 n
42 n1
4p npC1


D 12 22    p2
;
.1 C n /2 .2 C n1 /2
.p C npC1 /2
i.e.,
min  D 0min
X

X XDIp

p
p
Y
Y
1
4i niC1
D
i2 ;
D
jX0 VXj  jX0 V1 Xj
.i C niC1 /2
iD1

iD1

where i D chi .V/, and i D i th antieigenvalue of V. The lower bound is


attained when X is
Xbad D .t1 tn W : : : W tp tnpC1 / D T.i1 in W : : : W ip inpC1 /;
where ti s are the orthonormal eigenvectors of V and p
n=2.
11.2

Suppose that r.X/ D p and denote K D V1=2 X, L D V1=2 M. Then



  0
 0   0
X VX X0 VM
K K K0 L
Xy
;
D
D
cov
My
MVX MVM
L0 K L0 L
  
  0 1 0

O cov./
Q
.X X/ X VX.X0 X/1 
O
cov./
cov Q D
D
Q cov./
Q



cov./


0
0
FK KF
FK .In  PL /KF
D
; F D .X0 X/1 ;
FK0 .In  PL /KF FK0 .In  PL /KF
Q D cov./
O  .X0 X/1 X0 VM.MVM/ MVX.X0 X/1
cov./
O  D:
D .X0 X/1 K0 .In  PL /K.X0 X/1 WD cov./

11.3

D

Q
jcov./j
jX0 VX  X0 VM.MVM/ MVXj  jX0 Xj2
D
O
jX0 VXj  jX0 Xj2
jcov./j
jX0 VX  X0 VM.MVM/ MVXj
jX0 VXj
D jIp  .X0 VX/1 X0 VM.MVM/ MVXj

D jIp  .K0 K/1 K0 PL Kj;

11 The relative eciency of OLSE

59

where we must have r.VX/ D r.X/ D p so that dim C .X/ \ C .V/? D f0g.
In a weakly singular model with r.X/ D p the above representation is valid.
11.4

Consider a weakly singular linear model where r.X/ D p and let and 1 
2      p  0 and 1  2      p > 0 denote the canonical
Q respectively, i.e.,
correlations between X0 y and My, and O and ,
i D cci .X0 y; My/;

O /;
Q
i D cci .;

i D 1; : : : ; p:

Suppose that p
n=2 in which case the number of the canonical correlations,
i.e., the number of pairs of canonical variables based on X0 y and My, is p.
Then
(a) m D r.X0 VM/ D number of nonzero i s,
p D number of nonzero i s,
2
(b) f 12 ; : : : ; m
g D nzch.X0 VX/1 X0 VM.MVM/ MVX
D nzch.PV1=2 X PV1=2 M / D nzch.PV1=2 H PV1=2 M /

D nzch.PK PL /;
(c)

f12 ; : : : ; p2 g

D chX0 X.X0 VX/1 X0 X  .X0 VC X/1 


O 1  cov./
Q
D ch.cov.//
O  D/
O 1 .cov./
D ch.cov.//
D chIp  X0 X.K0 K/1 K0 PL K.X0 X/1 
D chIp  .K0 K/1 K0 PL K D f1  ch.K0 K/1 K0 PL Kg
D f1  ch.X0 VX/1 X0 VM.MVM/ MVXg;

2
(d) i2 D 1  piC1
;

i.e.;

0
O /
Q D 1  cc2
cc2i .;
piC1 .X y; My/;

11.5

i D 1; : : : ; p.

Under a weakly singular model the Watson eciency can be written as


jX0 VX  X0 VM.MVM/ MVXj
jX0 Xj2
D
jX0 VXj  jX0 VC Xj
jX0 VXj
0

0
D jIp  X VM.MVM/ MVX.X VX/1 j

D

D jIn  PV1=2 X PV1=2 M j D jIn  PV1=2 H PV1=2 M j


D

p
Y
iD1

i2 D

p
Y

.1  i2 /:

iD1

11.6

Q   2 cov./
O D 0, and
The i2 s are the roots of the equation detcov./
2
O
Q
thereby they are solutions to cov./w D  cov./w, w 0.

11.7

Let Gy be the BLUE.X/ and denote K D V1=2 H. Then

60

Formulas Useful for Linear Regression Analysis and Related Matrix Theory

 
Hy
HVH
D
My
MVH
  
Hy
HVH
cov
D
GVG0
Gy

cov

  0

HVM
K K K0 L
D
;
MVM
L0 K L0 L
 

K0 K
K0 .I  PL /K
GVG0
D
:
GVG0
K0 .I  PL /K K0 .I  PL /K

Denote T1 D .HVH/ HVM.MVM/ MVH, T2 D .HVH/ GVG0 . Then


2
(a) cc2C .Hy; My/ D f 12 ; : : : ; m
g

m D r.HVM/

D nzch.T1 / D nzch.PL PK /


D nzch.PL PK / D cc2C .X0 y; My/,
(b) cc2C .Hy; Gy/ D f12 ; : : : ; g2 g

g D dim C .X/ \ C .V/




D nzch.T2 / D nzch.HVH/ GVG0 


D nzch.K0 K / K0 .In  PL /K 
D nzchPK .In  PL / D nzchPK .In  PL /,
(c) cc21 .Hy; Gy/ D max cor 2 .a0 Hy; b0 Gy/
a;b

a0 GVG0 a
max taken subject to VHa 0
a
a0 HVHa
D ch1 .HVH/ GVG0  D 12 ,

D max

(d) cc2i .Hy; My/ D 1  cc2hiC1 .Hy; Gy/;


11.8

i D 1; : : : ; h;

h D r.VH/.

u D # of unit canonical correlations ( i s) between X0 y and My


D # of unit canonical correlations between Hy and My
D dim C .V1=2 X/ \ C .V1=2 M/ D dim C .VX/ \ C .VM/
D r.V/  dim C .X/ \ C .V/  dim C .M/ \ C .V/
D r.HPV M/

11.9

Under M D fy; X; Vg, the following statements are equivalent:


(a) HPV M D 0,

(b) PV M D MPV ,

(d) C .VH/ \ C .VM/ D f0g,

(e) C .V

(c) C .PV H/  C .H/,


1=2

H/ \ C .V1=2 M/ D f0g,

(f) C .X/ D C .X/ \ C .V/  C .X/ \ C .V/? ,


(g) u D dim C .VH/ \ C .VM/ D r.HPV M/ D 0, where u is the number of
unit canonical correlations between Hy and My,
Q D PV X.X0 VC X/ X0 PV .
(h) cov.X/
11.10 The squared canonical correlations i2 s are the proper eigenvalues of K0 PL K
with respect to K0 K, i.e.,
HVM.MVM/ MVHw D 2 HVHw;

VHw 0:

11 The relative eciency of OLSE

61

The squared canonical correlations i2 s are the proper eigenvalues of L0 L D


GVG0 with respect to K0 K D HVH:
GVG0 w D  2 HVHw D .1  2 /HVHw;

VHw 0:

11.11 We can arrange the i2 s as follows:


12 D    D u2 D 1;
1>

2
uC1

2
uCtC1

  
2
mC1

u D r.HPV M/

2
uCt

D 

2
D m
> 0;
2
D mCs D %2h

m D r.HVM/
D 0;

2
where i2 D 1  hiC1
, i D 1; : : : ; h D r.VH/, s D dim C .VX/ \ C .X/
and t D dim C .V/ \ C .X/  s.

11.12 Antieigenvalues. Denoting x D V1=2 z, the Watson eciency  can be interpreted as a specic squared cosine:
.x0 V1=2  V1=2 x/2
.x0 x/2
D
D cos2 .V1=2 x; V1=2 x/
x0 Vx  x0 V1 x
x0 Vx  x0 V1 x
.z0 Vz/2
D 0 2
D cos2 .Vz; z/:
z V z  z0 z
The vector z minimizing  [maximizing angle .Vz; z/] is called an antieigenvector and the corresponding minimum ( 12 ) the rst antieigenvalue (squared):
p
p
1 n
2 1 n
;
1 D min cos.Vz; z/ WD cos '.V/ D
D
1 C n
.1 C n /=2
z0
D

where '.V/ is the matrix angle of V and .i ; ti / is the p


i th eigenpair
p of V. The
rst antieigenvector has forms proportional to z1 D n t1 1 tn . The
second antieigenvalue of V is dened as
p
2 2 n1
2 D min cos.Vz; z/ D
:
2 C n1
z0; z0 z1 D0
11.13 Consider the models M12 D fy; X1 1 C X2 2 ; Vg, M1 D fy; X1 ; Vg, and
M1H D fHy; X1 1 ; HVHg, where X is of full column rank and C .X/ 
C .V/. Then the corresponding Watson eciencies are
(a) e.O j M12 / D

jcov.Q j M12 /j
jX0 Xj2
,
D 0
jX VXj  jX0 VC Xj
jcov.O j M12 /j

(b) e.O 2 j M12 / D


(c) e.O 1 j M1 / D

jX02 M1 X2 j2
jcov.Q 2 j M12 /j
,
D 0
P 1 X2 j
jX2 M1 VM1 X2 j  jX02 M
jcov.O 2 j M12 /j

jX01 X1 j2
,
jX01 VX1 j  jX01 VC X1 j

62

Formulas Useful for Linear Regression Analysis and Related Matrix Theory

(d) e.O 1 j M1H / D

jX01 VX1 j

jX01 X1 j2
.
 jX01 .HVH/ X1 j

11.14 (Continued . . . ) The Watson eciency e.O j M12 / can be expressed as


e.O j M12 / D e.O 1 j M1 /  e.O 2 j M12 / 

1
e.O 1 j M1H /

where
e.O 1 j M1H / D

jX01 X1 j2
jX01 VX1 j  jX01 .HVH/ X1 j

D jIp1  X01 VM1 X2 .X02 M1 VM1 X2 /1 X02 M1 VX01 .X01 VX1 /1 j:
11.15 (Continued . . . ) The following statements are equivalent:
(a) e.O j M12 / D e.O 2 j M12 /,
(b) C .X1 /  C .VX/,
(c) Hy is linearly sucient for X1 1 under the model M1 .
11.16 (Continued . . . ) The eciency factorization multiplier  is dened as
D

e.O j M12 /
:
e.O 1 j M12 /  e.O 2 j M12 /

We say that the Watson eciency factorizes if  D 1, i.e., e.O j M12 / D


e.O 1 j M12 /  e.O 2 j M12 /, which happens i (supposing X0 X D Ip )
jIn  PVC1=2 X1 PVC1=2 X2 j D jIn  PV1=2 X1 PV1=2 X2 j:
 
11.17 Consider a weakly singular linear model $\mathscr{M}_Z = \{y, Z\gamma, V\}$, where $Z = (X : i_i)$ has full column rank, denote $\mathscr{M} = \{y, X\beta, V\}$, and let $\mathscr{M}_{(i)} = \{y_{(i)}, X_{(i)}\beta, V_{(i)}\}$ be the version of $\mathscr{M}$ in which the $i$th observation is deleted. Assume that $X'i_i \ne 0$ ($i = 1, \dots, n$) and that $\mathrm{OLSE}(\beta)$ equals $\mathrm{BLUE}(\beta)$ under $\mathscr{M}$. Then
$$\tilde\beta(\mathscr{M}_{(i)}) = \hat\beta(\mathscr{M}_{(i)}) \quad\text{for all } i = 1, \dots, n$$
iff $V$ satisfies $MVM = \alpha^2M$ for some nonzero scalar $\alpha$.

11.18 The Bloomfield–Watson efficiency
$$\tfrac12\operatorname{tr}\bigl[(HV - VH)(HV - VH)'\bigr] = \tfrac12\|HV - VH\|_F^2 = \|HVM\|_F^2 = \operatorname{tr}(HVMV) = \operatorname{tr}(HV^2) - \operatorname{tr}(HVHV) \le \tfrac14\sum_{i=1}^{p}(\lambda_i - \lambda_{n-i+1})^2,$$
where equality is attained in the same situation as the minimum of the Watson efficiency.


11.19 C. R. Rao's criterion for the goodness of OLSE:
$$\operatorname{tr}\bigl[\operatorname{cov}(X\hat\beta) - \operatorname{cov}(X\tilde\beta)\bigr] = \operatorname{tr}\bigl[HVH - X(X'V^{-1}X)^{-1}X'\bigr] \le \sum_{i=1}^{p}\bigl(\lambda_i^{1/2} - \lambda_{n-i+1}^{1/2}\bigr)^2.$$

11.20 Equality of the BLUEs of $X_2\beta_2$ under two models. Consider the models $\mathscr{M}_{12} = \{y, X_1\beta_1 + X_2\beta_2, V\}$ and $\bar{\mathscr{M}}_{12} = \{y, X_1\beta_1 + X_2\beta_2, \bar V\}$, where $C(X_1)\cap C(X_2) = \{0\}$. Then the following statements are equivalent:
(a) $\{\mathrm{BLUE}(X_2\beta_2 \mid \mathscr{M}_{12})\} \subseteq \{\mathrm{BLUE}(X_2\beta_2 \mid \bar{\mathscr{M}}_{12})\}$,
(b) $\{\mathrm{BLUE}(M_1X_2\beta_2 \mid \mathscr{M}_{12})\} \subseteq \{\mathrm{BLUE}(M_1X_2\beta_2 \mid \bar{\mathscr{M}}_{12})\}$,
(c) $C(\bar VM) \subseteq C(X_1 : VM)$,
(d) $C(M_1\bar VM) \subseteq C(M_1VM)$,
(e) $C\bigl[M_1\bar VM_1(M_1X_2)^\perp\bigr] \subseteq C\bigl[M_1VM_1(M_1X_2)^\perp\bigr]$,
(f) for some $L_1$ and $L_2$, the matrix $M_1\bar VM_1$ can be expressed as
$$M_1\bar VM_1 = M_1X_2L_1X_2'M_1 + M_1VM_1Q_{M_1X_2}L_2Q_{M_1X_2}M_1VM_1 = M_1X_2L_1X_2'M_1 + M_1VML_2MVM_1.$$

12 Linear sufficiency and admissibility


12.1 Under the model $\{y, X\beta, \sigma^2V\}$, a linear statistic $Fy$ is called linearly sufficient for $X\beta$ if there exists a matrix $A$ such that $AFy$ is the BLUE of $X\beta$.

12.2 Let $W = V + XUU'X'$ be an arbitrary nnd matrix satisfying $C(W) = C(X : V)$; this notation holds throughout this section. Then a statistic $Fy$ is linearly sufficient for $X\beta$ iff any of the following equivalent statements holds:
(a) $C(X) \subseteq C(WF')$,
(b) $N(F)\cap C(X : V) \subseteq C(VX^\perp)$,
(c) $r(X : VF') = r(WF')$,
(d) $C(X'F') = C(X')$ and $C(FX)\cap C(FVX^\perp) = \{0\}$,
(e) the best linear predictor of $y$ based on $Fy$, $\mathrm{BLP}(y; Fy)$, is almost surely equal to a linear function of $Fy$ which does not depend on $\beta$.

12.3 Let $Fy$ be linearly sufficient for $X\beta$ under $\mathscr{M} = \{y, X\beta, \sigma^2V\}$. Then each BLUE of $X\beta$ under the transformed model $\{Fy, FX\beta, \sigma^2FVF'\}$ is the BLUE of $X\beta$ under the original model $\mathscr{M}$ and vice versa.

12.4 Let $K'\beta$ be an estimable parametric function under the model $\{y, X\beta, \sigma^2V\}$. Then the following statements are equivalent:
(a) $Fy$ is linearly sufficient for $K'\beta$,
(b) $N(FX : FVX^\perp) \subseteq N(K' : 0)$,
(c) $C\bigl[X(X'W^-X)^-K\bigr] \subseteq C(WF')$.


12.5 Under the model $\{y, X\beta, \sigma^2V\}$, a linear statistic $Fy$ is called linearly minimal sufficient for $X\beta$ if for any other linearly sufficient statistic $Ly$ there exists a matrix $A$ such that $Fy = ALy$ almost surely.

12.6 The statistic $Fy$ is linearly minimal sufficient for $X\beta$ iff $C(X) = C(WF')$.

12.7 Let $K'\beta$ be an estimable parametric function under the model $\{y, X\beta, \sigma^2V\}$. Then the following statements are equivalent:
(a) $Fy$ is linearly minimal sufficient for $K'\beta$,
(b) $N(FX : FVX^\perp) = N(K' : 0)$,
(c) $C(K) = C(X'F')$ and $FVX^\perp = 0$.

12.8 Let $X_1\beta_1$ be estimable under $\{y, X_1\beta_1 + X_2\beta_2, V\}$ and denote $W_1 = V + X_1X_1'$ and $\dot M_2 = M_2(M_2W_1M_2)^-M_2$. Then $X_1'\dot M_2y$ is linearly minimal sufficient for $X_1\beta_1$.

12.9 Under the model $\{y, X\beta, \sigma^2V\}$, a linear statistic $Fy$ is called linearly complete for $X\beta$ if for all matrices $A$ such that $E(AFy) = 0$ it follows that $AFy = 0$ almost surely.

12.10 A statistic $Fy$ is linearly complete for $X\beta$ iff $C(FV) \subseteq C(FX)$.

12.11 $Fy$ is linearly complete and linearly sufficient for $X\beta$ $\iff$ $Fy$ is linearly minimal sufficient $\iff$ $C(X) = C(WF')$.

12.12 Linear Lehmann–Scheffé theorem. Under the model $\{y, X\beta, \sigma^2V\}$, let $Ly$ be a linear unbiased estimator of $X\beta$ and let $Fy$ be linearly complete and linearly sufficient for $X\beta$. Then the BLUE of $X\beta$ is almost surely equal to the best linear predictor of $Ly$ based on $Fy$, $\mathrm{BLP}(Ly; Fy)$.
12.13 Admissibility. Consider the linear model $\mathscr{M} = \{y, X\beta, \sigma^2V\}$ and let $K'\beta$ be an estimable parametric function, $K' \in \mathbb{R}^{q\times p}$. Denote the set of all linear (homogeneous) estimators of $K'\beta$ as $\mathrm{LE}_q(y) = \{Fy : F \in \mathbb{R}^{q\times n}\}$. The mean squared error matrix of $Fy$ with respect to $K'\beta$ is defined as
$$\mathrm{MSEM}(Fy; K'\beta) = E(Fy - K'\beta)(Fy - K'\beta)',$$
and the quadratic risk of $Fy$ under $\mathscr{M}$ is
$$\mathrm{risk}(Fy; K'\beta) = \operatorname{trace}\mathrm{MSEM}(Fy; K'\beta) = E(Fy - K'\beta)'(Fy - K'\beta) = \sigma^2\operatorname{trace}(FVF') + \|(FX - K')\beta\|^2 = \operatorname{trace}\operatorname{cov}(Fy) + \mathrm{bias}^2.$$
A linear estimator $Ay$ is said to be admissible for $K'\beta$ among $\mathrm{LE}_q(y)$ under $\mathscr{M}$ if there does not exist $Fy \in \mathrm{LE}_q(y)$ such that the inequality
$$\mathrm{risk}(Fy; K'\beta) \le \mathrm{risk}(Ay; K'\beta)$$
holds for every $(\beta, \sigma^2) \in \mathbb{R}^p\times(0, \infty)$ and is strict for at least one point $(\beta, \sigma^2)$. The set of admissible estimators of $K'\beta$ is denoted as $\mathrm{AD}(K'\beta)$.

12.14 Consider the model $\{y, X\beta, \sigma^2V\}$ and let $K'\beta$ ($K' \in \mathbb{R}^{q\times p}$) be estimable, i.e., $K' = LX$ for some $L \in \mathbb{R}^{q\times n}$. Then $Fy \in \mathrm{AD}(K'\beta)$ iff
$$C(VF') \subseteq C(X), \quad FVL' - FVF' \ge_L 0, \quad\text{and}\quad C\bigl[(F - L)X\bigr] = C\bigl[(F - L)N\bigr],$$
where $N$ is a matrix satisfying $C(N) = C(X)\cap C(V)$.

12.15 Let $K'\beta$ be any arbitrary parametric function. Then a necessary condition for $Fy \in \mathrm{AD}(K'\beta)$ is that $C(FX : FV) \subseteq C(K')$.

12.16 If $Fy$ is linearly sufficient and admissible for an estimable $K'\beta$, then $Fy$ is also linearly minimal sufficient for $K'\beta$.

13 Best linear unbiased predictor


13.1 BLUP: best linear unbiased predictor. Consider the linear model with new observations:
$$\mathscr{M}_f = \left\{\begin{pmatrix} y \\ y_f\end{pmatrix},\ \begin{pmatrix} X \\ X_f\end{pmatrix}\beta,\ \sigma^2\begin{pmatrix} V & V_{12} \\ V_{21} & V_{22}\end{pmatrix}\right\},$$
where $y_f$ is an unobservable random vector containing new observations (observable in the future). Above we have $E(y) = X\beta$, $E(y_f) = X_f\beta$, and
$$\operatorname{cov}(y) = \sigma^2V, \quad \operatorname{cov}(y_f) = \sigma^2V_{22}, \quad \operatorname{cov}(y, y_f) = \sigma^2V_{12}.$$
A linear predictor $Gy$ is said to be a linear unbiased predictor, LUP, for $y_f$ if $E(Gy) = E(y_f) = X_f\beta$ for all $\beta \in \mathbb{R}^p$, i.e., $E(Gy - y_f) = 0$, i.e., $C(X_f') \subseteq C(X')$. Then $y_f$ is said to be unbiasedly predictable; the difference $Gy - y_f$ is the prediction error. A linear unbiased predictor $Gy$ is the BLUP for $y_f$ if
$$\operatorname{cov}(Gy - y_f) \le_L \operatorname{cov}(Fy - y_f) \quad\text{for all } Fy \in \{\mathrm{LUP}(y_f)\}.$$

13.2 A linear predictor $Ay$ is the BLUP for $y_f$ iff the equation
$$A(X : VX^\perp) = (X_f : V_{21}X^\perp)$$
is satisfied. This is equivalent to the existence of a matrix $L$ such that $A$ satisfies the equation (Pandora's Box for the BLUP)
$$\begin{pmatrix} V & X \\ X' & 0\end{pmatrix}\begin{pmatrix} A' \\ L\end{pmatrix} = \begin{pmatrix} V_{12} \\ X_f'\end{pmatrix}.$$

13.3 A linear estimator $\ell'y$ which is unbiased for zero is called a linear zero function. Every linear zero function can be written as $b'My$ for some $b$. Hence a linear unbiased predictor $Ay$ is the BLUP for $y_f$ under $\mathscr{M}_f$ iff
$$\operatorname{cov}(Ay - y_f, \ell'y) = 0 \quad\text{for every linear zero function } \ell'y.$$

13.4 Let $C(X_f') \subseteq C(X')$, i.e., $X_f\beta$ is estimable (assumed below in all statements). The general solution to 13.2 can be written, for example, as
$$A_0 = (X_f : V_{21}M)(X : VM)^+ + F(I_n - P_{(X:VM)}),$$
where $F$ is free to vary. Even though the multiplier $A$ may not be unique, the observed value $Ay$ of the BLUP is unique with probability 1. We can get, for example, the following matrices $A_i$ such that $A_iy$ equals $\mathrm{BLUP}(y_f)$:
$$A_1 = X_fB + V_{21}W^-\bigl[I_n - X(X'W^-X)^-X'W^-\bigr],$$
$$A_2 = X_fB + V_{21}V^-\bigl[I_n - X(X'W^-X)^-X'W^-\bigr],$$
$$A_3 = X_fB + V_{21}M(MVM)^-M,$$
$$A_4 = X_f(X'X)^-X' + \bigl[V_{21} - X_f(X'X)^-X'V\bigr]M(MVM)^-M,$$
where $W = V + XUX'$ and $B = (X'W^-X)^-X'W^-$ with $U$ satisfying $C(W) = C(X : V)$.

13.5 The $\mathrm{BLUP}(y_f)$ can be written as
$$\mathrm{BLUP}(y_f) = X_f\tilde\beta + V_{21}W^-(y - X\tilde\beta) = X_f\tilde\beta + V_{21}W^-\tilde\varepsilon = X_f\tilde\beta + V_{21}V^-\tilde\varepsilon$$
$$= X_f\tilde\beta + V_{21}M(MVM)^-My = \mathrm{BLUE}(X_f\beta) + V_{21}\dot My$$
$$= X_f\hat\beta + (V_{21} - X_fX^+V)\dot My = \mathrm{OLSE}(X_f\beta) + (V_{21} - X_fX^+V)\dot My,$$
where $\tilde\varepsilon = y - X\tilde\beta$ is the vector of the BLUE's residuals:
$$X\tilde\beta = y - VM(MVM)^-My = y - V\dot My, \qquad \tilde\varepsilon = V\dot My = W\dot My.$$

13.6 $\mathrm{BLUP}(y_f) = \mathrm{BLUE}(X_f\beta) \iff C(V_{12}) \subseteq C(X)$.

13.7 The following statements are equivalent:
(a) $\mathrm{BLUP}(y_f) = \mathrm{OLSE}(X_f\beta) = X_f\hat\beta$ for a fixed $X_f = LX$,
(b) $C\bigl[V_{12} - V(X')^+X_f'\bigr] \subseteq C(X)$,
(c) $C(V_{12} - VHL') \subseteq C(X)$.
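A small numerical sketch related to 13.4–13.5, not from the book: for a positive definite $V$ (so that $W = V$ may be used), it checks with numpy that $X_f\tilde\beta + V_{21}V^{-1}(y - X\tilde\beta)$ agrees with the representation $A_3y = X_fBy + V_{21}M(MVM)^-My$. All matrices and the seed are arbitrary illustrations.

```python
# BLUP(y_f): two representations from 13.4-13.5 give the same value (sketch).
import numpy as np

rng = np.random.default_rng(1)
n, p, m = 8, 3, 2
X = rng.standard_normal((n, p))
Xf = rng.standard_normal((m, p))
L = rng.standard_normal((n + m, n + m))
Vfull = L @ L.T + np.eye(n + m)                    # joint covariance, pd
V, V12 = Vfull[:n, :n], Vfull[:n, n:]
V21 = V12.T
y = rng.standard_normal(n)

Vinv = np.linalg.inv(V)
beta_tilde = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)    # BLUE of beta
blup_1 = Xf @ beta_tilde + V21 @ Vinv @ (y - X @ beta_tilde)

# representation A3 with the "dotted" M of 13.5 (W = V since V is pd)
M = np.eye(n) - X @ np.linalg.pinv(X)
Mdot = M @ np.linalg.pinv(M @ V @ M) @ M
B = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv)
blup_2 = Xf @ B @ y + V21 @ Mdot @ y

print(np.allclose(blup_1, blup_2))                 # True
```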
13.8 The following statements are equivalent:
(a) $\mathrm{BLUP}(y_f) = \mathrm{OLSE}(X_f\beta) = X_f\hat\beta$ for all $X_f$ of the form $X_f = LX$,
(b) $C(VX) \subseteq C(X)$ and $C(V_{12}) \subseteq C(X)$,
(c) $\mathrm{OLSE}(X\beta) = \mathrm{BLUE}(X\beta)$ and $\mathrm{BLUP}(y_f) = \mathrm{BLUE}(X_f\beta)$.
13.9 Let $X_f\beta$ be estimable so that $X_f = LX$ for some $L$, and denote $Gy = \mathrm{BLUE}(X\beta)$. Then
$$\operatorname{cov}\bigl[\mathrm{BLUP}(y_f)\bigr] = \operatorname{cov}(X_f\tilde\beta) + \operatorname{cov}\bigl[V_{21}V^-(y - X\tilde\beta)\bigr] = \operatorname{cov}(LX\tilde\beta) + \operatorname{cov}\bigl[V_{21}V^-(y - X\tilde\beta)\bigr]$$
$$= \operatorname{cov}(LGy) + \operatorname{cov}\bigl[V_{21}V^-(y - Gy)\bigr] = \operatorname{cov}(LGy) + \operatorname{cov}(V_{21}\dot My)$$
$$= L\operatorname{cov}(Gy)L' + V_{21}\dot MV_{12} = L\operatorname{cov}(Gy)L' + V_{21}V^-\bigl[V - \operatorname{cov}(Gy)\bigr]V^-V_{12}.$$

13.10 If the covariance matrix of $(y', y_f')'$ has an intraclass correlation structure and $1_n \in C(X)$, $1_m \in C(X_f)$, then
$$\mathrm{BLUP}(y_f) = \mathrm{OLSE}(X_f\beta) = X_f\hat\beta = X_f(X'X)^-X'y = X_fX^+y.$$

13.11 If the covariance matrix of $(y', y_f)'$ has an AR(1) structure $\{\varrho^{|i-j|}\}$, then
$$\mathrm{BLUP}(y_f) = x_f'\tilde\beta + \varrho\,i_n'\tilde\varepsilon = x_f'\tilde\beta + \varrho\,\tilde\varepsilon_n,$$
where $x_f'$ corresponds to $X_f$ and $\tilde\varepsilon$ is the vector of the BLUE's residuals.
13.12 Consider the models Mf and Mf , where Xf is a given estimable parametric
function such that C .Xf0 /  CN .X0 / and C .Xf0 /  C .X0 /:
N 
N
    
X
V11 V12
y
;
;
;
Mf D
Xf
V21 V22
yf

    
X
V V
y
Mf D
; N ; N 11 N 12 :
X
V21 V22
y
f
f
N
N
N
N
Then every representation of the BLUP for yf under the model Mf is also a
BLUP for yf under the model Mf if and only if

N


X V11 M
X V11 M
:
C N N N C
Xf V21 M
Xf V21 M
N
N N


13.13 Under the model $\mathscr{M}_f$, a linear statistic $Fy$ is called linearly prediction sufficient for $y_f$ if there exists a matrix $A$ such that $AFy$ is the BLUP for $y_f$. Moreover, $Fy$ is called linearly minimal prediction sufficient for $y_f$ if for any other linearly prediction sufficient statistic $Sy$ there exists a matrix $A$ such that $Fy = ASy$ almost surely. Under the model $\mathscr{M}_f$, $Fy$ is linearly prediction sufficient for $y_f$ iff
$$N(FX : FVX^\perp) \subseteq N(X_f : V_{21}X^\perp),$$
and $Fy$ is linearly minimal prediction sufficient iff equality holds above.

13.14 Let $Fy$ be linearly prediction sufficient for $y_f$ under $\mathscr{M}_f$. Then every representation of the BLUP for $y_f$ under the transformed model
$$\left\{\begin{pmatrix} Fy \\ y_f\end{pmatrix},\ \begin{pmatrix} FX \\ X_f\end{pmatrix}\beta,\ \sigma^2\begin{pmatrix} FVF' & FV_{12} \\ V_{21}F' & V_{22}\end{pmatrix}\right\}$$
is also the BLUP under $\mathscr{M}_f$ and vice versa.

14 Mixed model

14.1 The mixed model: $y = X\beta + Z\gamma + \varepsilon$, where $y$ is an observable and $\varepsilon$ an unobservable random vector, $X_{n\times p}$ and $Z_{n\times q}$ are known matrices, $\beta$ is a vector of unknown parameters having fixed values (fixed effects), and $\gamma$ is an unobservable random vector (random effects) such that $E(\gamma) = 0$, $E(\varepsilon) = 0$, $\operatorname{cov}(\gamma) = \sigma^2D$, $\operatorname{cov}(\varepsilon) = \sigma^2R$, $\operatorname{cov}(\gamma, \varepsilon) = 0$. We may denote briefly
$$\mathscr{M}_{\mathrm{mix}} = \{y, X\beta + Z\gamma, \sigma^2D, \sigma^2R\}.$$
Then $E(y) = X\beta$, $\operatorname{cov}(y) = \operatorname{cov}(Z\gamma + \varepsilon) = \sigma^2(ZDZ' + R) := \sigma^2\Sigma$, and
$$\operatorname{cov}\begin{pmatrix} y \\ \gamma\end{pmatrix} = \sigma^2\begin{pmatrix} ZDZ' + R & ZD \\ DZ' & D\end{pmatrix} = \sigma^2\begin{pmatrix} \Sigma & ZD \\ DZ' & D\end{pmatrix}.$$
14.2 The mixed model can be presented as a version of the model with new observations, the new observations now being $\gamma$: $y = X\beta + Z\gamma + \varepsilon$, $\gamma = 0\cdot\beta + \varepsilon_f$, where $\operatorname{cov}(\varepsilon_f) = \operatorname{cov}(\gamma) = \sigma^2D$ and $\operatorname{cov}(\gamma, \varepsilon) = \operatorname{cov}(\varepsilon_f, \varepsilon) = 0$:
$$\mathscr{M}_{\mathrm{mix\text{-}new}} = \left\{\begin{pmatrix} y \\ \gamma\end{pmatrix},\ \begin{pmatrix} X \\ 0\end{pmatrix}\beta,\ \sigma^2\begin{pmatrix} ZDZ' + R & ZD \\ DZ' & D\end{pmatrix}\right\}.$$
14.3 The linear predictor $Ay$ is the BLUP for $\gamma$ under the mixed model $\mathscr{M}_{\mathrm{mix}}$ iff
$$A(X : \Sigma M) = (0 : DZ'M),$$
where $\Sigma = ZDZ' + R$. In terms of Pandora's Box, $Ay = \mathrm{BLUP}(\gamma)$ iff there exists a matrix $L$ such that $A$ satisfies the equation
$$\begin{pmatrix} \Sigma & X \\ X' & 0\end{pmatrix}\begin{pmatrix} A' \\ L\end{pmatrix} = \begin{pmatrix} ZD \\ 0\end{pmatrix}.$$
The linear estimator $By$ is the BLUE for $X\beta$ under the model $\mathscr{M}_{\mathrm{mix}}$ iff
$$B(X : \Sigma M) = (X : 0).$$
14.4 Under the mixed model $\mathscr{M}_{\mathrm{mix}}$ we have
(a) $\mathrm{BLUE}(X\beta) = X\tilde\beta = X(X'W^-X)^-X'W^-y$,
(b) $\mathrm{BLUP}(\gamma) = \tilde\gamma = DZ'W^-(y - X\tilde\beta) = DZ'W^-\tilde\varepsilon = DZ'\Sigma^-\tilde\varepsilon = DZ'M(M\Sigma M)^-My = DZ'\dot My$,
where $W = \Sigma + XUX'$, $C(W) = C(X : \Sigma)$, and $\dot M = M(M\Sigma M)^-M$.

14.5 Henderson's mixed model equations are defined as
$$\begin{pmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + D^{-1}\end{pmatrix}\begin{pmatrix} \beta \\ \gamma\end{pmatrix} = \begin{pmatrix} X'R^{-1}y \\ Z'R^{-1}y\end{pmatrix}.$$
They are obtained by minimizing the following quadratic form $f(\beta, \gamma)$ with respect to $\beta$ and $\gamma$ (treating also $\gamma$ as a non-random vector):
$$f(\beta, \gamma) = \begin{pmatrix} y - X\beta - Z\gamma \\ \gamma\end{pmatrix}'\begin{pmatrix} R & 0 \\ 0 & D\end{pmatrix}^{-1}\begin{pmatrix} y - X\beta - Z\gamma \\ \gamma\end{pmatrix}.$$
If $\beta^*$ and $\gamma^*$ are solutions to the mixed model equations, then $X\beta^* = \mathrm{BLUE}(X\beta) = X\tilde\beta$ and $\gamma^* = \mathrm{BLUP}(\gamma)$.
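A small numerical sketch of 14.5, not part of the original text: it solves Henderson's equations with numpy and compares the solutions with the GLS/BLUP formulas $X\tilde\beta = X(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}y$ and $\tilde\gamma = DZ'\Sigma^{-1}(y - X\tilde\beta)$, $\Sigma = ZDZ' + R$. The dimensions, matrices, and seed are arbitrary.

```python
# Henderson's mixed model equations vs. GLS/BLUP formulas (illustration only).
import numpy as np

rng = np.random.default_rng(2)
n, p, q = 12, 2, 4
X = rng.standard_normal((n, p))
Z = rng.standard_normal((n, q))
D = np.diag(rng.uniform(0.5, 2.0, q))          # cov of random effects
R = np.eye(n)                                  # cov of errors
y = rng.standard_normal(n)

Rinv, Dinv = np.linalg.inv(R), np.linalg.inv(D)
lhs = np.block([[X.T @ Rinv @ X, X.T @ Rinv @ Z],
                [Z.T @ Rinv @ X, Z.T @ Rinv @ Z + Dinv]])
rhs = np.concatenate([X.T @ Rinv @ y, Z.T @ Rinv @ y])
sol = np.linalg.solve(lhs, rhs)
beta_h, gamma_h = sol[:p], sol[p:]

Sigma = Z @ D @ Z.T + R
Sinv = np.linalg.inv(Sigma)
beta_gls = np.linalg.solve(X.T @ Sinv @ X, X.T @ Sinv @ y)
gamma_blup = D @ Z.T @ Sinv @ (y - X @ beta_gls)

print(np.allclose(beta_h, beta_gls), np.allclose(gamma_h, gamma_blup))  # True True
```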

14.6 Let us denote
$$y_\# = \begin{pmatrix} y \\ 0\end{pmatrix}, \qquad X_\# = \begin{pmatrix} X & Z \\ 0 & I_q\end{pmatrix}, \qquad V_\# = \begin{pmatrix} R & 0 \\ 0 & D\end{pmatrix}.$$
Then $f(\beta, \gamma)$ in 14.5 can be expressed as
$$f(\beta, \gamma) = \Bigl(y_\# - X_\#\tbinom{\beta}{\gamma}\Bigr)'V_\#^{-1}\Bigl(y_\# - X_\#\tbinom{\beta}{\gamma}\Bigr),$$
and the minimum of $f(\beta, \gamma)$ is attained at
$$\binom{\tilde\beta}{\tilde\gamma} = (X_\#'V_\#^{-1}X_\#)^-X_\#'V_\#^{-1}y_\#, \qquad X_\#\binom{\tilde\beta}{\tilde\gamma} = \begin{pmatrix} X\tilde\beta + Z\tilde\gamma \\ \tilde\gamma\end{pmatrix}.$$

14.7 If $V_\#$ is singular, then minimizing the quadratic form
$$f(\beta, \gamma) = \Bigl(y_\# - X_\#\tbinom{\beta}{\gamma}\Bigr)'W_\#^-\Bigl(y_\# - X_\#\tbinom{\beta}{\gamma}\Bigr),$$
where $W_\# = V_\# + X_\#X_\#'$, yields $\tilde\beta$ and $\tilde\gamma$.

14.8 One choice for $X_\#^\perp$ is
$$X_\#^\perp = \begin{pmatrix} I_n \\ -Z'\end{pmatrix}M = \begin{pmatrix} M \\ -Z'M\end{pmatrix}.$$

14.9 Let us consider the fixed effects partitioned model
$$\mathscr{F}: y = X\beta + Z\gamma + \varepsilon, \qquad \operatorname{cov}(y) = \operatorname{cov}(\varepsilon) = R,$$
where both $\beta$ and $\gamma$ are fixed (but of course unknown) coefficients, and supplement $\mathscr{F}$ with the stochastic restrictions $y_0 = \gamma + \varepsilon_0$, $\operatorname{cov}(\varepsilon_0) = D$. This supplementation can be expressed as the partitioned model
$$\mathscr{F}_* = \{y_*, X_*\tbinom{\beta}{\gamma}, V_*\} = \left\{\begin{pmatrix} y \\ y_0\end{pmatrix},\ \begin{pmatrix} X & Z \\ 0 & I_q\end{pmatrix}\binom{\beta}{\gamma},\ \begin{pmatrix} R & 0 \\ 0 & D\end{pmatrix}\right\}.$$
Model $\mathscr{F}_*$ is a partitioned linear model $\mathscr{F} = \{y, X\beta + Z\gamma, R\}$ supplemented with stochastic restrictions on $\gamma$.
14.10 The estimator $By_*$ is the BLUE for $X_*\tbinom{\beta}{\gamma}$ under the model $\mathscr{F}_*$ iff $B$ satisfies the equation $B(X_* : V_*X_*^\perp) = (X_* : 0)$, i.e.,
(a) $\quad\begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22}\end{pmatrix}\begin{pmatrix} X & Z & RM \\ 0 & I_q & -DZ'M\end{pmatrix} = \begin{pmatrix} X & Z & 0 \\ 0 & I_q & 0\end{pmatrix}.$
14.11 Let $By_*$ be the BLUE of $X_*\tbinom{\beta}{\gamma}$ under the augmented model $\mathscr{F}_*$, i.e.,
$$By_* = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22}\end{pmatrix}\begin{pmatrix} y \\ y_0\end{pmatrix} = \begin{pmatrix} B_{11} \\ B_{21}\end{pmatrix}y + \begin{pmatrix} B_{12} \\ B_{22}\end{pmatrix}y_0 = \mathrm{BLUE}\bigl(X_*\tbinom{\beta}{\gamma} \bigm| \mathscr{F}_*\bigr).$$
Then it is necessary and sufficient that $B_{21}y$ is the BLUP of $\gamma$ and $(B_{11} - ZB_{21})y$ is the BLUE of $X\beta$ under the mixed model $\mathscr{M}_{\mathrm{mix}}$; in other words, (a) in 14.10 holds iff
$$\begin{pmatrix} I_n & -Z \\ 0 & I_q\end{pmatrix}B\begin{pmatrix} y \\ 0\end{pmatrix} = \begin{pmatrix} B_{11} - ZB_{21} \\ B_{21}\end{pmatrix}y = \begin{pmatrix} \mathrm{BLUE}(X\beta \mid \mathscr{M}_{\mathrm{mix}}) \\ \mathrm{BLUP}(\gamma \mid \mathscr{M}_{\mathrm{mix}})\end{pmatrix},$$
or equivalently
$$B\begin{pmatrix} y \\ 0\end{pmatrix} = \begin{pmatrix} B_{11} \\ B_{21}\end{pmatrix}y = \begin{pmatrix} \mathrm{BLUE}(X\beta \mid \mathscr{M}_{\mathrm{mix}}) + \mathrm{BLUP}(Z\gamma \mid \mathscr{M}_{\mathrm{mix}}) \\ \mathrm{BLUP}(\gamma \mid \mathscr{M}_{\mathrm{mix}})\end{pmatrix}.$$
14.12 Assume that $B$ satisfies (a) in 14.10. Then all representations of the BLUE of $X\beta$ and the BLUP of $\gamma$ in the mixed model $\mathscr{M}_{\mathrm{mix}}$ can be generated through
$$\begin{pmatrix} B_{11} - ZB_{21} \\ B_{21}\end{pmatrix}y$$
by varying $B_{11}$ and $B_{21}$.
14.13 Consider two mixed models:
$$\mathscr{M}_1 = \{y, X\beta + Z\gamma, D_1, R_1\}, \qquad \mathscr{M}_2 = \{y, X\beta + Z\gamma, D_2, R_2\},$$
and denote $\Sigma_i = ZD_iZ' + R_i$. Then every representation of $\mathrm{BLUP}(\gamma \mid \mathscr{M}_1)$ continues to be $\mathrm{BLUP}(\gamma \mid \mathscr{M}_2)$ iff any of the following equivalent conditions holds:
(a) $C\begin{pmatrix} \Sigma_2M \\ D_2Z'M\end{pmatrix} \subseteq C\begin{pmatrix} X & \Sigma_1M \\ 0 & D_1Z'M\end{pmatrix}$,
(b) $C\begin{pmatrix} R_2M \\ D_2Z'M\end{pmatrix} \subseteq C\begin{pmatrix} X & R_1M \\ 0 & D_1Z'M\end{pmatrix}$,
(c) $C\begin{pmatrix} M\Sigma_2M \\ D_2Z'M\end{pmatrix} \subseteq C\begin{pmatrix} M\Sigma_1M \\ D_1Z'M\end{pmatrix}$,
(d) $C\begin{pmatrix} MR_2M \\ D_2Z'M\end{pmatrix} \subseteq C\begin{pmatrix} MR_1M \\ D_1Z'M\end{pmatrix}$.

14.14 (Continued . . . ) Both the BLUE.X j M1 / continues to be the BLUE.X j M2 /


and the BLUP. j M1 / continues to be the BLUP. j M2 / i any of the following equivalent conditions holds:




1M
2M
C
;
(a) C
D2 Z0 M
D1 Z0 M




R1 M
R2 M

C
;
(b) C
D2 Z0 M
D1 Z0 M
?
(c) C .V2 X?
 /  C .V1 X /, where




Ri 0
In
?
and X D
M:
Vi D
Z0
0 Di

(d) The matrix V2 can be expressed as


? 0
V2 D X N1 X C V1 X?
 N2 .X / V1

for some N1 , N2 :

14.15 Consider the partitioned linear fixed effects model
$$\mathscr{F} = \{y, X_1\beta_1 + X_2\beta_2, R\} = \{y, X\beta, R\}.$$
Let $\mathscr{M}$ be a linear mixed effects model $y = X_1\beta_1 + X_2\gamma_2 + \varepsilon$, where $\operatorname{cov}(\gamma_2) = D$ and $\operatorname{cov}(\varepsilon) = R$, which we denote as
$$\mathscr{M} = \{y, X_1\beta_1, \Sigma\} = \{y, X_1\beta_1, X_2DX_2' + R\}.$$
Then, denoting $\Sigma = X_2DX_2' + R$, the following statements hold:
(a) There exists a matrix L such that Ly is the BLUE of M2 X1 1 under the
models F and M i N .X1 W X2 W RX?
1 /  N .M2 X1 W 0 W 0/.
(b) Every representation of the BLUE of M2 X1 1 under F is also the BLUE
of M2 X1 1 under M i C .X2 W RX? / C .X?
1 /.
(c) Every representation of the BLUE of M2 X1 1 under M is also the
BLUE of M2 X1 1 under F i C .X2 W RX? /  C .X?
1 /.


15 Multivariate linear model


15.1

Instead of one response variable y, consider d response variables y1 ; : : : ; yd .


Let the n observed values of these d variables be in the data matrix Ynd
while Xnp is the usual model matrix:
0 0 1
0 0 1
y.1/
x.1/
B :: C
B :: C
Y D .y1 W : : : W yd / D @ : A ; X D .x1 W : : : W xp / D @ : A :
y0.n/

x0.n/

Denote B D . 1 W : : : W d / 2 Rpd , and assume that


yj D Xj C "j ;

cov."j / D j2 In ;

E."j / D 0;

j D 1; : : : ; d;

.y1 W : : : W yd / D X. 1 W : : : W d / C ."1 W : : : W "d /; Y D XB C ":

The columns of Y, y1 ; : : : ; yd , are n-dimensional random vectors such that


yj  .Xj ; j2 In /;

cov.yj ; yk / D j k In ;

j; k D 1; : : : ; d:

Transposing Y D XB C " yields


.y.1/ W : : : W y.n/ / D B0 .x.1/ W : : : W x.n/ / C .".1/ W : : : W ".n/ /;
where the (transposed) rows of Y, i.e., y.1/ ; : : : ; y.n/ , are independent d dimensional random vectors such that
y.i/  .B0 x.i/ ; /;

cov.y.i/ ; y.j / / D 0;

i j:

Notice that in this setup rows (observations) are independent but columns
(variables) may correlate. We denote this model shortly as A D fY; XB; g.
15.2

Denoting

1
y1
y D vec.Y/ D @ ::: A ;
yd

X :::
: :
@
E.y / D :: : :
0 :::

10 1
0
1
:: A @ :: A
:
:
X

D .Id X/ vec.B/;
we get

1
12 In 12 In : : : 1d In
::
:: C
B :
cov.y / D @ ::
:
: A D In ;
d1 In d 2 In : : : d2 In
0

and hence the multivariate model can be rewritten as a univariate model


B D fvec.Y/; .Id X/ vec.B/; In g WD fy ; X  ; V g:
15.3 By analogy with the univariate model, we can estimate $XB$ by minimizing
$$\operatorname{tr}(Y - XB)'(Y - XB) = \|Y - XB\|_F^2.$$
The resulting $X\hat B = HY$ is the OLSE of $XB$. Moreover, we have
$$(Y - XB)'(Y - XB) \ge_L (Y - HY)'(Y - HY) = Y'MY := E_{\mathrm{res}} \quad\text{for all } B,$$
where the equality holds iff $XB = X\hat B = P_XY = HY$.

15.4 Under multinormality, $Y'MY \sim W_d\bigl(n - r(X), \Sigma\bigr)$, and if $r(X) = p$, we have
$$\mathrm{MLE}(B) = \hat B = (X'X)^{-1}X'Y, \qquad \mathrm{MLE}(\Sigma) = n^{-1}Y'MY = n^{-1}E_{\mathrm{res}}.$$
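A minimal numerical sketch of 15.3–15.4, not part of the original text: it computes $\hat B$ and $E_{\mathrm{res}} = Y'MY$ with numpy and checks that the multivariate OLSE is just the collection of the ordinary column-by-column regressions. Data, dimensions, and seed are arbitrary.

```python
# OLSE of XB and the residual SSP matrix E_res in the multivariate model.
import numpy as np

rng = np.random.default_rng(3)
n, p, d = 20, 3, 4
X = rng.standard_normal((n, p))
Y = rng.standard_normal((n, d))

B_hat = np.linalg.solve(X.T @ X, X.T @ Y)        # p x d, one column per response
H = X @ np.linalg.solve(X.T @ X, X.T)
M = np.eye(n) - H
E_res = Y.T @ M @ Y                              # d x d residual SSP matrix

# column by column: each response is an ordinary least-squares regression
B_cols = np.column_stack([np.linalg.lstsq(X, Y[:, j], rcond=None)[0]
                          for j in range(d)])
print(np.allclose(B_hat, B_cols),
      np.allclose(E_res, (Y - X @ B_hat).T @ (Y - X @ B_hat)))   # True True
```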

15.5 $C(V_\star X_\star) \subseteq C(X_\star) \implies \mathrm{BLUE}(X_\star\beta_\star) = \mathrm{OLSE}(X_\star\beta_\star)$.

15.6 Consider a linear hypothesis $H$: $K'B = D$, where $K \in \mathbb{R}^{p\times q}$ and $D \in \mathbb{R}^{q\times d}$ are such that $K'B$ is estimable. Then the minimum $E_H$, say, of
$$(Y - XB)'(Y - XB) \quad\text{subject to } K'B = D$$
occurs (in the Löwner sense) when $XB$ equals
$$X\hat B_H = X\hat B - X(X'X)^-K\bigl[K'(X'X)^-K\bigr]^{-1}(K'\hat B - D),$$
and so we have, corresponding to 7.17 (p. 35),
$$E_H - E_{\mathrm{res}} = (K'\hat B - D)'\bigl[K'(X'X)^-K\bigr]^{-1}(K'\hat B - D).$$
Hypothesis testing in the multivariate model can be based on some appropriate function of $(E_H - E_{\mathrm{res}})E_{\mathrm{res}}^{-1}$, or on some closely related matrix.
15.7 Consider two independent samples $Y_1' = (y_{(11)} : \dots : y_{(1n_1)})$ and $Y_2' = (y_{(21)} : \dots : y_{(2n_2)})$ so that each $y_{(1i)} \sim N_d(\mu_1, \Sigma)$ and $y_{(2i)} \sim N_d(\mu_2, \Sigma)$. Then the test statistic for the hypothesis $\mu_1 = \mu_2$ can be based, e.g., on
$$\mathrm{ch}_1\bigl[(E_H - E_{\mathrm{res}})E_{\mathrm{res}}^{-1}\bigr] = \frac{n_1n_2}{n_1 + n_2}\,\mathrm{ch}_1\bigl[(\bar y_1 - \bar y_2)(\bar y_1 - \bar y_2)'E_{\mathrm{res}}^{-1}\bigr] = \frac{n_1n_2}{(n_1 + n_2)(n_1 + n_2 - 2)}(\bar y_1 - \bar y_2)'S_*^{-1}(\bar y_1 - \bar y_2),$$
where $S_* = \frac{1}{n_1 + n_2 - 2}E_{\mathrm{res}} = \frac{1}{n_1 + n_2 - 2}T_*$, and
$$E_{\mathrm{res}} = Y'(I_n - H)Y = Y_1'C_{n_1}Y_1 + Y_2'C_{n_2}Y_2 = T_*,$$
$$E_H - E_{\mathrm{res}} = Y'(I_n - J_n)Y - Y'(I_n - H)Y = Y'(H - J_n)Y = n_1(\bar y_1 - \bar y)(\bar y_1 - \bar y)' + n_2(\bar y_2 - \bar y)(\bar y_2 - \bar y)' = \frac{n_1n_2}{n_1 + n_2}(\bar y_1 - \bar y_2)(\bar y_1 - \bar y_2)' := E_{\mathrm{Between}},$$
$$E_H = E_{\mathrm{Between}} + E_{\mathrm{res}}.$$
Hence one appropriate test statistic appears to be a function of the squared Mahalanobis distance
$$\mathrm{MHLN}^2(\bar y_1, \bar y_2; S_*) = (\bar y_1 - \bar y_2)'S_*^{-1}(\bar y_1 - \bar y_2).$$
One such statistic is $\dfrac{n_1n_2}{n_1 + n_2}(\bar y_1 - \bar y_2)'S_*^{-1}(\bar y_1 - \bar y_2) = T^2$, Hotelling's $T^2$.
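A numerical sketch of 15.7, not part of the original text: it computes the two-sample Hotelling $T^2$ from the pooled covariance matrix $S_*$, together with the standard $F$-transformation (a well-known distributional fact, stated here only for context). The simulated data and seed are arbitrary.

```python
# Two-sample Hotelling T^2 from the pooled covariance matrix (illustration only).
import numpy as np

rng = np.random.default_rng(4)
d, n1, n2 = 3, 15, 20
Y1 = rng.multivariate_normal(np.zeros(d), np.eye(d), size=n1)
Y2 = rng.multivariate_normal(np.zeros(d), np.eye(d), size=n2)

ybar1, ybar2 = Y1.mean(axis=0), Y2.mean(axis=0)
T_star = (Y1 - ybar1).T @ (Y1 - ybar1) + (Y2 - ybar2).T @ (Y2 - ybar2)  # E_res
S_star = T_star / (n1 + n2 - 2)                                         # pooled cov

diff = ybar1 - ybar2
mahalanobis_sq = diff @ np.linalg.solve(S_star, diff)   # MHLN^2(ybar1, ybar2; S_*)
T2 = n1 * n2 / (n1 + n2) * mahalanobis_sq               # Hotelling's T^2

# under H0, F = T2 * (n1+n2-d-1) / ((n1+n2-2)*d) follows F(d, n1+n2-d-1)
F = T2 * (n1 + n2 - d - 1) / ((n1 + n2 - 2) * d)
print(T2, F)
```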


16 Principal components, discriminant analysis, factor analysis


16.1 Principal component analysis, PCA. Let a $p$-dimensional random vector $z$ have $E(z) = 0$ and $\operatorname{cov}(z) = \Sigma$. Consider the eigenvalue decomposition of $\Sigma$: $\Sigma = T\Lambda T'$, $\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_p)$, $T'T = I_p$, $T = (t_1 : \dots : t_p)$, where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0$ are the ordered eigenvalues of $\Sigma$. Then the random variable $t_i'z$, which is the $i$th element of the random vector $T'z$, is the $i$th (population) principal component of $z$.

16.2 Denote $T_{(i)} = (t_1 : \dots : t_i)$. Then, in the above notation,
$$\max_{b'b=1}\operatorname{var}(b'z) = \operatorname{var}(t_1'z) = \lambda_1, \qquad \max_{\substack{b'b=1 \\ T_{(i-1)}'b = 0}}\operatorname{var}(b'z) = \operatorname{var}(t_i'z) = \lambda_i,$$
i.e., $t_i'z$ has maximum variance among all normalized linear combinations uncorrelated with the elements of $T_{(i-1)}'z$.
16.3 Predictive approach to PCA. Let $A_{p\times k}$ have orthonormal columns, and consider the best linear predictor of $z$ on the basis of $A'z$. Then the Euclidean norm of the covariance matrix of the prediction error, see 6.14 (p. 29),
$$\|\Sigma - \Sigma A(A'\Sigma A)^{-1}A'\Sigma\|_F = \|\Sigma^{1/2}(I_p - P_{\Sigma^{1/2}A})\Sigma^{1/2}\|_F,$$
is a minimum when $A$ is chosen as $T_{(k)} = (t_1 : \dots : t_k)$. Moreover, minimizing the trace of the covariance matrix of the prediction error, i.e., maximizing $\operatorname{tr}(P_{\Sigma^{1/2}A}\Sigma)$, yields the same result.

16.4 Sample principal components, geometric interpretation. Let $\tilde X$ be an $n\times p$ centered data matrix. How should $G_{p\times k}$ be chosen if we wish to minimize the sum of squared orthogonal distances of the observations $\tilde x_{(i)}$ from $C(G)$? The function to be minimized is
$$\|E\|_F^2 = \|\tilde X - \tilde XP_G\|_F^2 = \operatorname{tr}(\tilde X'\tilde X) - \operatorname{tr}(P_G\tilde X'\tilde X),$$
and the solution is $\hat G = T_{(k)} = (t_1 : \dots : t_k)$, where the $t_i$ are the first $k$ orthonormal eigenvectors of $\tilde X'\tilde X$ (which are the same as those of the corresponding covariance matrix). The new projected observations are the columns of the matrix $T_{(k)}T_{(k)}'\tilde X' = P_{T_{(k)}}\tilde X'$. In particular, if $k = 1$, the new projected observations are the columns of
$$t_1t_1'\tilde X' = t_1(t_1'\tilde X') = t_1(t_1'\tilde x_{(1)}, \dots, t_1'\tilde x_{(n)}) := t_1s_1',$$
where the vector $s_1 = \tilde Xt_1 = (t_1'\tilde x_{(1)}, \dots, t_1'\tilde x_{(n)})' \in \mathbb{R}^n$ comprises the values (scores) of the first (sample) principal component; the $i$th individual has the score $t_1'\tilde x_{(i)}$ on the first principal component. The scores are the (signed) lengths of the new projected observations.


16.5 PCA and matrix approximation. The approximation of the centered data matrix $\tilde X$ by a matrix of lower rank yields the PCA. The scores can be obtained directly from the SVD of $\tilde X$: $\tilde X = U\Delta T'$. The scores of the $j$th principal component are in the $j$th column of $U\Delta = \tilde XT$. The columns of the matrix $T'\tilde X'$ represent the new rotated observations.
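A minimal numerical sketch of 16.4–16.5, not from the book: it computes principal component scores from the SVD of a centered data matrix with numpy and checks that they agree (up to sign) with the eigenvector route through $\tilde X'\tilde X$. The data and seed are arbitrary.

```python
# Principal component scores from the SVD of the centered data matrix.
import numpy as np

rng = np.random.default_rng(5)
n, p = 50, 4
X = rng.standard_normal((n, p)) @ rng.standard_normal((p, p))
Xc = X - X.mean(axis=0)                          # centered data matrix

U, s, Tt = np.linalg.svd(Xc, full_matrices=False)
scores_svd = U * s                               # columns are the PC scores, = Xc @ Tt.T

evals, T = np.linalg.eigh(Xc.T @ Xc)             # eigenvectors of Xc'Xc
order = np.argsort(evals)[::-1]
T = T[:, order]

print(np.allclose(np.abs(scores_svd), np.abs(Xc @ T)))   # True (signs may differ)
print(np.allclose(s**2, evals[order]))                   # squared singular values = eigenvalues
```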

16.6 Discriminant analysis. Let $x$ denote a $d$-variate random vector with $E(x) = \mu_1$ in population 1 and $E(x) = \mu_2$ in population 2, and $\operatorname{cov}(x) = \Sigma$ (pd) in both populations. If $a$ maximizes
$$\frac{[a'(\mu_1 - \mu_2)]^2}{\operatorname{var}(a'x)} = \frac{[a'(\mu_1 - \mu_2)]^2}{a'\Sigma a}$$
over $a \ne 0$, then $a'x$ is called a linear discriminant function for the two populations. Moreover,
$$\max_{a\ne 0}\frac{[a'(\mu_1 - \mu_2)]^2}{a'\Sigma a} = (\mu_1 - \mu_2)'\Sigma^{-1}(\mu_1 - \mu_2) = \mathrm{MHLN}^2(\mu_1, \mu_2; \Sigma),$$
and any linear combination $a'x$ with $a = b\,\Sigma^{-1}(\mu_1 - \mu_2)$, where $b \ne 0$ is an arbitrary constant, is a linear discriminant function for the two populations. In other words, finding the linear combination $a'x$ for which $a'\mu_1$ and $a'\mu_2$ are as distant from each other as possible (distance measured in terms of the standard deviation of $a'x$) yields a linear discriminant function $a'x$.
16.7

Let U01 D .u.11/ W : : : W u.1n1 / / and U02 D .u.21/ W : : : W u.2n2 / / be independent random samples from d -dimensional populations .1 ; / and .2 ; /,
respectively. Denote, cf. 5.46 (p. 26), 15.7 (p. 73),
Ti D U0i Cni Ui ; T D T1 C T2 ;
1
T ; n D n1 C n2 ;
S D
n1 C n2  2
and uN i D

1
ni

max
a0

U0i 1ni , uN D n1 .U01 W U02 /1n . Let S be pd. Then


a0 .uN 1  uN 2 /2
a 0 S a

N 1  uN 2 / D MHLN2 .uN 1 ; uN 2 ; S /
D .uN 1  uN 2 /0 S1
 .u
a0 .uN 1  uN 2 /.uN 1  uN 2 /0 a
a0 Ba
D
.n

2/
max
;
D .n  2/ max
a0 T a
a0
a0 a0 T a
P
N uN i  u/
N 0 . The vector
where B D .uN 1  uN 2 /.uN 1  uN 2 /0 D n1nn2 2iD1 ni .uN i  u/.
1 N
of coecients of the discriminant function is given by a D S .u1  uN 2 / or
any vector proportional to it.
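A small numerical sketch related to 16.6–16.7, not part of the original text: it computes the sample discriminant coefficients $a = S_*^{-1}(\bar u_1 - \bar u_2)$ and the corresponding squared Mahalanobis distance, and illustrates the allocation of one new observation. The simulated populations, rule, and seed are illustrative assumptions.

```python
# Sample linear discriminant coefficients and Mahalanobis distance (sketch).
import numpy as np

rng = np.random.default_rng(6)
d, n1, n2 = 3, 30, 25
U1 = rng.multivariate_normal([0, 0, 0], np.eye(d), size=n1)
U2 = rng.multivariate_normal([1, 0.5, 0], np.eye(d), size=n2)

u1, u2 = U1.mean(axis=0), U2.mean(axis=0)
T1 = (U1 - u1).T @ (U1 - u1)
T2 = (U2 - u2).T @ (U2 - u2)
S_star = (T1 + T2) / (n1 + n2 - 2)              # pooled covariance matrix

a = np.linalg.solve(S_star, u1 - u2)            # discriminant coefficient vector
mhln2 = (u1 - u2) @ a                           # = (u1-u2)' S_*^{-1} (u1-u2)

# a new observation x is assigned to population 1 if a'x is closer to a'u1
x = rng.multivariate_normal([1, 0.5, 0], np.eye(d))
assign_to_1 = abs(a @ x - a @ u1) < abs(a @ x - a @ u2)
print(mhln2, assign_to_1)
```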
16.8

The model for factor analysis is x D  C Af C ", where x is an observable


random vector of p components; E.x/ D  and cov.x/ D . Vector f is an


m-dimensional random vector, m


p, whose elements are called (common)
factors. The elements of " are called specic or unique factors. The matrix
Apm is the unknown matrix of factor loadings. Moreover,
E."/ D 0;

cov."/ D D diag.

E.f/ D 0;

cov.f/ D D Im ;

2
1;:::;

2
p /;

cov.f; "/ D 0:

The fundamental equation for factor analysis is cov.x/ D cov.Af/ C cov."/,


i.e., D AA0 C . Then

  
x
A
and cov.f  Lx/ L cov.f  A0 1 x/ for all L.
cov
D
f
A0 I m
The matrix L D A0 1 can be represented as
L D A0 1 D .A0 1 A C Im /1 A0 1 ;
and the individual factor scores can be obtained by L .x  /.

17 Canonical correlations
17.1 Let $z = \binom{x}{y}$ denote a $d$-dimensional random vector with (possibly singular) covariance matrix $\Sigma$. Let $x$ and $y$ have dimensions $d_1$ and $d_2 = d - d_1$, respectively. Denote
$$\operatorname{cov}\binom{x}{y} = \Sigma = \begin{pmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy}\end{pmatrix}, \qquad r(\Sigma_{xy}) = m, \quad r(\Sigma_{xx}) = h \le r(\Sigma_{yy}).$$
Let $\varrho_1^2$ be the maximum value of $\operatorname{cor}^2(a'x, b'y)$, and let $a = a_1$ and $b = b_1$ be the corresponding maximizing values of $a$ and $b$. Then the positive square root $\sqrt{\varrho_1^2}$ is called the first (population) canonical correlation between $x$ and $y$, denoted as $\mathrm{cc}_1(x, y) = \varrho_1$, and $u_1 = a_1'x$ and $v_1 = b_1'y$ are called the first (population) canonical variables.

17.2 Let $\varrho_2^2$ be the maximum value of $\operatorname{cor}^2(a'x, b'y)$, where $a'x$ is uncorrelated with $a_1'x$ and $b'y$ is uncorrelated with $b_1'y$, and let $u_2 = a_2'x$ and $v_2 = b_2'y$ be the maximizing values. The positive square root $\sqrt{\varrho_2^2}$ is called the second canonical correlation, denoted as $\mathrm{cc}_2(x, y) = \varrho_2$, and $u_2$ and $v_2$ are called the second canonical variables. Continuing in this manner, we obtain $h$ pairs of canonical variables $u = (u_1, u_2, \dots, u_h)'$ and $v = (v_1, v_2, \dots, v_h)'$.

17.3

We have
cor 2 .a0 x; b0 y/ D

xy C1=2
b  /2
.a0 C1=2
.a0 xy b/2
xx
yy
;
D
a0 xx a  b0 yy b
a0 a  b0 b

1=2
where a D 1=2
xx a, b D yy b. In view of 23.3 (p. 106), we get


max cor 2 .a0 x; b0 y/ D sg21 . C1=2


xy C1=2
/
xx
yy
a;b

17.4

C
2
2
D ch1 . C
xx xy yy yx / D cc1 .x; y/ D %1 :

The minimal angle between the subspaces A D C .A/ and B D C .B/ is


dened to be the number 0
min

=2 for which
cos2 min D

17.5

.0 A0 B/2
:
A0 0 A0 A  0 B0 B

max cos2 .u; v/ D max

u2A; v2B
u0; v0

B0

Let Ana and Bnb be given matrices. Then


.01 A0 B 1 /2
.0 A0 B/2
D
01 A0 A1  01 B0 B 2
A0 0 A0 A  0 B0 B
max

B0

01 A0 PB A1
D ch1 .PA PB / D 21 ;
01 A0 A1

where .21 ; 1 / is the rst proper eigenpair for .A0 PB A; A0 A/ satisfying


A0 PB A1 D 21 A0 A1 ;

A1 0:

The vector 1 is the proper eigenvector satisfying


B0 PA B 1 D 21 B0 B 1 ;
17.6

B 1 0:

Consider an n-dimensional random vector u such that cov.u/ D In and dene


x D A0 u and y D B0 u where A 2 Rna and B 2 Rnb . Then

 


 0   0
x
A A A0 B
xx xy
Au
D :
cov
D
WD
D cov
yx yy
B0 u
B0 A B0 B
y
Let %i denote the ith largest canonical correlation between the random vectors
A0 u and B0 u and let 01 A0 u and 01 B0 u be the rst canonical variables. In view
of 17.5,
01 A0 PB A1
.0 A0 B/2
D ch1 .PA PB /:
D
01 A0 A1
A0 0 A0 A  0 B0 B

%21 D max

B0

In other words, .%21 ; 1 / is the rst proper eigenpair for .A0 PB A; A0 A/:
A0 PB A1 D %21 A0 A1 ;

A1 0:

Moreover, .%21 ; A1 / is the rst eigenpair of PA PB : PA PB A1 D %21 A1 .


17.7 Suppose that $r = r(A) \le r(B)$ and denote
$$\mathrm{cc}(A'u, B'u) = \{\varrho_1, \dots, \varrho_m, \varrho_{m+1}, \dots, \varrho_h\} = \text{the set of all canonical correlations},$$
$$\mathrm{cc}_+(A'u, B'u) = \{\varrho_1, \dots, \varrho_m\} = \text{the set of nonzero canonical correlations}, \quad m = r(A'B).$$
Then
(a) there are $h = r(A)$ pairs of canonical variables $\alpha_i'A'u$, $\beta_i'B'u$, and $h$ corresponding canonical correlations $\varrho_1 \ge \varrho_2 \ge \cdots \ge \varrho_h \ge 0$,
(b) the vectors $\alpha_i$ are the proper eigenvectors of $A'P_BA$ w.r.t. $A'A$, and the $\varrho_i^2$'s are the corresponding proper eigenvalues,
(c) the $\varrho_i^2$'s are the $h$ largest eigenvalues of $P_AP_B$,
(d) the nonzero $\varrho_i^2$'s are the nonzero eigenvalues of $P_AP_B$, i.e.,
$$\mathrm{cc}_+^2(A'u, B'u) = \mathrm{nzch}(P_AP_B) = \mathrm{nzch}(\Sigma_{yx}\Sigma_{xx}^-\Sigma_{xy}\Sigma_{yy}^-),$$
(e) the vectors $\beta_i$ are the proper eigenvectors of $B'P_AB$ w.r.t. $B'B$,
(f) the number of nonzero $\varrho_i$'s is $m = r(A'B) = r(A) - \dim C(A)\cap C(B)^\perp$,
(g) the number of unit $\varrho_i$'s is $u = \dim C(A)\cap C(B)$,
(h) the number of zero $\varrho_i$'s is $s = r(A) - r(A'B) = \dim C(A)\cap C(B)^\perp$.
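A numerical check of 17.7(d), not from the book: with $\operatorname{cov}(u) = I_n$ the nonzero squared canonical correlations between $A'u$ and $B'u$ equal the nonzero eigenvalues of $P_AP_B$, which the sketch compares with a direct SVD-based computation using $\Sigma_{xx} = A'A$, $\Sigma_{xy} = A'B$, $\Sigma_{yy} = B'B$. The matrices and seed are arbitrary.

```python
# Canonical correlations via eigenvalues of P_A P_B (illustration only).
import numpy as np

rng = np.random.default_rng(7)
n, a, b = 30, 3, 4
A = rng.standard_normal((n, a))
B = rng.standard_normal((n, b))

PA = A @ np.linalg.solve(A.T @ A, A.T)
PB = B @ np.linalg.solve(B.T @ B, B.T)
eig = np.sort(np.linalg.eigvals(PA @ PB).real)[::-1][:min(a, b)]

def inv_sqrt(S):
    w, Q = np.linalg.eigh(S)
    return Q @ np.diag(1 / np.sqrt(w)) @ Q.T

# singular values of Sxx^{-1/2} Sxy Syy^{-1/2} are the canonical correlations
cc = np.linalg.svd(inv_sqrt(A.T @ A) @ (A.T @ B) @ inv_sqrt(B.T @ B),
                   compute_uv=False)
print(np.allclose(eig, cc**2))      # True
```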
17.8 Let $u$ denote a random vector with covariance matrix $\operatorname{cov}(u) = I_n$, let $K \in \mathbb{R}^{n\times p}$, $L \in \mathbb{R}^{n\times q}$, and let $F$ have the property $C(F) = C(K)\cap C(L)$. Then
(a) the canonical correlations between $K'Q_Lu$ and $L'Q_Ku$ are all less than 1, and are precisely those canonical correlations between $K'u$ and $L'u$ that are not equal to 1;
(b) the nonzero eigenvalues of $P_KP_L - P_F$ are all less than 1, and are precisely those squared canonical correlations between the vectors $K'u$ and $L'u$ that are not equal to 1.

18 Column space properties and rank rules


18.1 For conformable matrices $A$ and $B$, the following statements hold:
(a) $N(A) = N(A'A)$,
(b) $C(A)^\perp = C(A^\perp) = N(A')$,
(c) $C(A) \subseteq C(B) \iff C(B)^\perp \subseteq C(A)^\perp$,
(d) $C(A) = C(AA')$,
(e) $r(A) = r(A') = r(AA') = r(A'A)$,
(f) $\mathbb{R}^n = C(A_{n\times m}) \oplus C(A_{n\times m})^\perp = C(A) \oplus N(A')$,
(g) $r(A_{n\times m}) = n - r(A^\perp) = n - \dim N(A')$,
(h) $C(A : B) = C(A) + C(B)$,
(i) $r(A : B) = r(A) + r(B) - \dim C(A)\cap C(B)$,
(j) $r\binom{A}{B} = r(A) + r(B) - \dim C(A')\cap C(B')$,
(k) $C(A + B) \subseteq C(A) + C(B) = C(A : B)$,
(l) $r(A + B) \le r(A : B) \le r(A) + r(B)$,
(m) $C(A : B)^\perp = C(A)^\perp\cap C(B)^\perp$.
18.2 Rank cancellation rules:
(a) $LAY = MAY$ and $r(AY) = r(A)$ $\implies$ $LA = MA$,
(b) $DAM = DAN$ and $r(DA) = r(A)$ $\implies$ $AM = AN$.

18.3 $r(AB) = r(A)$ and $FAB = 0$ $\implies$ $FA = 0$.

18.4 (a) $r(AB) = r(A) \implies r(FAB) = r(FA)$ for all $F$,
(b) $r(AB) = r(B) \implies r(ABG) = r(BG)$ for all $G$.

18.5 Rank of the product: $r(AB) = r(A) - \dim C(A')\cap C(B)^\perp$.

18.6 $r(A : B) = r(A) + r\bigl[(I - P_A)B\bigr] = r(A) + r\bigl[(I - AA^-)B\bigr] = r(A) + r(B) - \dim C(A)\cap C(B)$.

18.7 $r\binom{A}{B} = r(A) + r\bigl[(I - P_{A'})B'\bigr] = r(A) + r\bigl[B(I - A^-A)\bigr] = r(A) + r(B) - \dim C(A')\cap C(B')$.

18.8 $r(A'UA) = r(A'U) \iff C(A'UA) = C(A'U) \iff A'UA(A'UA)^-A'U = A'U$ [holds e.g. if $U \ge_L 0$].

18.9 $r(A'UA) = r(A) \iff C(A'UA) = C(A') \iff A'UA(A'UA)^-A' = A'$.
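A quick numerical check of 18.5–18.6, not part of the original text: it verifies the two rank rules for arbitrary numpy matrices, computing the intersection dimension via $\dim(U\cap W) = \dim U + \dim W - \dim(U + W)$. All matrices and the seed are illustrative.

```python
# Numerical verification of the rank rules 18.5 and 18.6 (illustration only).
import numpy as np

rng = np.random.default_rng(8)
r = np.linalg.matrix_rank

# 18.6: r(A : B) = r(A) + r((I - P_A) B)
A = rng.standard_normal((10, 4)) @ rng.standard_normal((4, 6))   # 10 x 6, rank 4
B = rng.standard_normal((10, 3))
PA = A @ np.linalg.pinv(A)                     # orthogonal projector onto C(A)
print(r(np.hstack([A, B])) == r(A) + r((np.eye(10) - PA) @ B))   # True

# 18.5: r(AB) = r(A) - dim C(A') ∩ C(B)^⊥
A2 = rng.standard_normal((7, 5)) @ rng.standard_normal((5, 6))   # 7 x 6, rank 5
B2 = rng.standard_normal((6, 2))
QB = np.eye(6) - B2 @ np.linalg.pinv(B2)       # projector onto C(B2)^⊥
dim_cap = r(A2.T) + r(QB) - r(np.hstack([A2.T, QB]))   # dim C(A2') ∩ C(B2)^⊥
print(r(A2 @ B2) == r(A2) - dim_cap)                   # True
```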

18.10 r.A C B/ D r.A/ C r.B/


() dim C .A/ \ C .B/ D 0 D dim C .A0 / \ C .B0 /
18.11 C .U C V/ D C .U W V/ if U and V are nnd
18.12 C .G/ D C .G / H) C .AG/ D C .AG /
18.13 C .Anm / D C .Bnp /
i 9 F W Anm D Bnp Fpm ; where C .B0 / \ C .F? / D f0g


18.14 AA0 D BB0 () 9 orthogonal Q W A D BQ


18.15 Let y 2 Rn be a given vector, y 0. Then y0 Qx D 0 8 Qnp H) x D 0.
18.16 Frobenius inequality: r.AZB/  r.AZ/ C r.ZB/  r.Z/
18.17 Sylvesters inequality:
r.Amn Bnp /  r.A/ C r.B/  n; with equality i N .A/  C .B/
18.18 For A 2 Rnm we have
(a) .In  AA /0 2 fA? g,
(b) In  .A /0 A0 2 fA? g,
(c) In  .A0 / A0 2 fA? g,
(d) Im  A A 2 f.A0 /? g,
(e) In  AAC D In  PA D QA 2 fA? g.
(

? )
Anp Bnq
In
QA 2
;
B0
0
Iq


18.19

? )
 (

Anp
QA 0
2
;
0qp
0 0

QA D I  P A

18.20

In
B0

(
? )
Bnq
2
Iq

18.21 Consider Anp , Znq , D D diag.d1 ; : : : ; dq / 2 PDq , and suppose that


Z0 Z D Iq and C .A/  C .Z/. Then D1=2 Z0 .In  PA / 2 f.D1=2 Z0 A/? g.
18.22 C .A/ \ C .B/ D C A.A0 B? /?  D C A.A0 QB /?  D C A.I  PA0 QB /
18.23 C .A/ \ C .B/? D C A.A0 B/?  D C A.I  PA0 B / D C PA .I  PA0 B /
18.24 C .A/ \ C .B/ D C AA0 .AA0 C BB0 / BB0 
18.25 Disjointness: C .A/ \ C .B/ D f0g. The following statements are equivalent:
(a) C .A/ \ C .B/ D f0g,
 0

 0
A
A 0
0
0 
0
0
(b)
.AA C BB / .AA W BB / D
,
B0
0 B0
(c) A0 .AA0 C BB0 / AA0 D A0 ,
(d) A0 .AA0 C BB0 / B D 0,


(e) .AA0 C BB0 / is a generalized inverse of AA0 ,


(f) A0 .AA0 C BB0 / A D PA0 ,
 
 0
0
A
(g) C

C
,
B0
B0
(h) N .A W B/  N .0 W B/,
(i) Y.A W B/ D .0 W B/ has a solution for Y,


PA0 0


(j) P A0 D
,
0 P B0
B0
(k) r.QB A/ D r.A/,
(l) C .A0 QB / D C .A0 /,
(m) PA0 QB A0 D A0 ,
(n) ch1 .PA PB / < 1,
(o) det.I  PA PB / 0.
18.26 Let us denote PAB D A.A0 QB A/ A0 QB , QB D I  PB . Then the following
statements are equivalent:
(a) PAB is invariant w.r.t. the choice of .A0 QB A/ ,
(b) C A.A0 QB A/ A0 QB  is invariant w.r.t. the choice of .A0 QB A/ ,
(c) r.QB A/ D r.A/,
(d) PAB A D A,
(e) PAB is the projector onto C .A/ along C .B/  C .A W B/? ,
(f) C .A/ \ C .B/ D f0g,
(g) P.AWB/ D PAB C PBA .
18.27 Let A 2 Rnp and B 2 Rnq . Then
r.PA PB QA / D r.PA PB / C r.PA W PB /  r.A/  r.B/
D r.PA PB / C r.QA PB /  r.B/
D r.PB PA / C r.QB PA /  r.A/
D r.PB PA QB /:
18.28 (a) C .APB / D C .AB/,
(b) r.AB/ D r.APB / D r.PB A0 / D r.B0 A0 / D r.PB PA0 / D r.PA0 PB /.


19 Inverse of a matrix
19.1

(a) A; : submatrix of Ann , obtained by choosing the elements of A


which lie in rows and columns ; and are index sets of the rows
and the columns of A, respectively.
(b) A D A; : principal submatrix; same rows and columns chosen.
(c) ALi D i th leading principal submatrix of A: ALi D A1; : : : ; i .
(d) A.; /: submatrix of A, obtained by choosing the elements of A which
do not lie in rows and columns .
(e) A.i; j / D submatrix of A, obtained by deleting row i and column j .
(f) minor.aij / D det.A.i; j / D ij th minor of A corresponding to aij .
(g) cof.aij / D .1/iCj minor.aij / D ij th cofactor of A corresponding to
aij .
(h) det.A/ D principal minor.
(i) det.ALi / D leading principal minor of order i .

19.2

Determinant. The determinant of matrix Ann , denoted as jAj or det.A/, is


det.a/ D a when a 2 R; when n > 1, we have the Laplace expansion of the
determinant by minors along the i th row:
det.A/ D

n
X

aij cof.aij /;

i 2 f1; : : : ; ng:

j D1

19.3

An alternative equivalent denition of the determinant is the following:


X
det.A/ D
.1/f .i1 ;:::;in / a1i1 a2i2    anin ;
where the summation is taken over all permutations fi1 ; : : : ; in g of the set of
integers f1; : : : ; ng, and the function f .i1 ; : : : ; in / equals the number of transpositions necessary to change fi1 ; : : : ; in g to f1; : : : ; ng. A transposition is the
interchange of two of the integers. The determinant produces all products of
n terms of the elements of A such that exactly one element is selected from
each row and each column of A; there are n of such products.

19.4

If Ann Bnn D In then B D A1 and A is said to be nonsingular. Ann is


nonsingular i det.A/ 0 i rank.A/ D n.

19.5 If $A = \begin{pmatrix} a & b \\ c & d\end{pmatrix}$, then $A^{-1} = \dfrac{1}{ad - bc}\begin{pmatrix} d & -b \\ -c & a\end{pmatrix}$, $\det(A) = ad - bc$.
If $A = \{a_{ij}\}$, then
$$A^{-1} = \{a^{ij}\} = \frac{1}{\det(A)}\bigl[\operatorname{cof}(A)\bigr]' = \frac{1}{\det(A)}\operatorname{adj}(A),$$
where $\operatorname{cof}(A) = \{\operatorname{cof}(a_{ij})\}$. The matrix $\operatorname{adj}(A) = [\operatorname{cof}(A)]'$ is the adjoint matrix of $A$.

19.6 Let a nonsingular matrix $A$ be partitioned as $A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22}\end{pmatrix}$, where $A_{11}$ is a square matrix. Then




A11
0
A12
I A1
I
0
11
(a) A D
0
I
0 A22  A21 A1
A21 A1
11 I
11 A12

 1
1
1
1
1
A11 C A1
11 A12 A221 A21 A11 A11 A12 A221
(b) A1 D
1
1
1
A221 A21 A11
A221


1
A1
A1
112
112 A12 A22
D
1
1
1
1
1
A1
22 A21 A112 A22 C A22 A21 A112 A12 A22

 1   1
A11 A12
A11 0
1
A1
C
D
221 .A21 A11 W I/; where
I
0 0
(c) A112 D A11  A12 A1
22 A21 D the Schur complement of A22 in A;
A221 D A22  A21 A1
11 A12 WD A=A11 :
(d) For possibly singular A the generalized Schur complements are
A112 D A11  A12 A
22 A21 ;
(e) r.A/ D r.A11 / C r.A221 /
D r.A22 / C r.A112 /

A221 D A22  A21 A


11 A12 :

if C .A12 /  C .A11 /; C .A021 /  C .A011 /


if C .A21 /  C .A22 /; C .A012 /  C .A022 /

(f) jAj D jA11 jjA22  A21 A


11 A12 j

if C .A12 /  C .A11 /,
C .A021 /  C .A011 /

D jA22 jjA11  A12 A


22 A21 j

if C .A21 /  C .A22 /,
C .A012 /  C .A022 /

(g) Let Aij 2 Rnn and A11 A21 D A21 A11 . Then jAj D jA11 A22 
A21 A12 j.
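A numerical sketch of the partitioned inverse in 19.6, not from the book: it checks with numpy that the bottom-right block of $A^{-1}$ is the inverse of the Schur complement $A_{22\cdot1} = A_{22} - A_{21}A_{11}^{-1}A_{12}$ and that $\det(A) = \det(A_{11})\det(A_{22\cdot1})$. The matrix and partition sizes are arbitrary.

```python
# Schur complement and partitioned inverse (illustration only).
import numpy as np

rng = np.random.default_rng(9)
L = rng.standard_normal((6, 6))
A = L @ L.T + np.eye(6)                      # a nonsingular (pd) 6 x 6 matrix
A11, A12 = A[:4, :4], A[:4, 4:]
A21, A22 = A[4:, :4], A[4:, 4:]

S = A22 - A21 @ np.linalg.solve(A11, A12)    # Schur complement of A11 in A
Ainv = np.linalg.inv(A)

print(np.allclose(Ainv[4:, 4:], np.linalg.inv(S)))                                # (2,2) block
print(np.allclose(Ainv[:4, 4:], -np.linalg.solve(A11, A12) @ np.linalg.inv(S)))   # (1,2) block
print(np.isclose(np.linalg.det(A), np.linalg.det(A11) * np.linalg.det(S)))
```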
19.7

WedderburnGuttman theorem. Consider A 2 Rnp , x 2 Rn , y 2 Rp and


suppose that WD x0 Ay 0. Then in view of the rank additivity on the Schur
complement, see 19.6e, we have


A Ay
D r.A/ D r.x0 Ay/ C r.A  1 Ayx0 A/;
r 0
x A x0 Ay
and hence rank.A  1 Ayx0 A/ D rank.A/  1.


19.8

I A
0 I

1
D


I A
;
0 I

Enn Fnm

0 Gmm D jEjjGj

19.9

Let A be nnd. Then there exists L such that A D L0 L, where


 

 0
 0
A11 A12
L1
L1 L1 L01 L2
0
.L1 W L2 / D
D
:
ADLLD
L02
L02 L1 L02 L2
A21 A22
If A is positive denite, then


.L01 Q2 L1 /1
.L01 Q2 L1 /1 L01 L2 .L02 L2 /1
A1 D
.L02 L2 /1 L02 L1 .L01 Q2 L1 /1
.L02 Q1 L2 /1
 11 12 
A A
; Qj D I  PLj ; L0i Qj Li D Ai ij ;
D
A21 A22
where A11 and A12 (and correspondingly A22 and A21 ) can be expressed also
as
A11 D .L01 L1 /1 C .L01 L1 /1 L01 L2 .L02 Q1 L2 /1 L02 L1 .L01 L1 /1 ;
A12 D .L01 L1 /1 L01 L2 .L02 Q1 L2 /1 :
0

19.10 .X X/

1

n
D
X00 1
0
t 00
Bt 10
B
D B ::
@ :
t k0

1

19.11 .X W y/ .X W y/

10 X0
X00 X0

1



1=n C xN 0 T1
xN Nx0 T1
xx
xx
D
N
T1
T1
xx x
xx
1

t 01 : : : t 0k
t 11 : : : t 1k C
C
:: : :
:: C
:
:
: A
k1
kk
t
::: t

!
1
 0
1
X X X0 y
G
O SSE
D
D
;
1
1
y0 X y0 y
O 0 SSE
SSE

where

19.12 G D X0 .I  Py /X1 D .X0 X/1 C O O 0 =SSE


19.13 jX0 Xj D njX00 .I  J/X0 j D njTxx j D jX00 X0 j10 .I  PX0 /1
D jX00 X0 j  k.I  PX0 /1k2
19.14 j.X W y/0 .X W y/j D jX0 Xj  SSE
19.15 r.X/ D 1 C r.X0 /  dim C .1/ \ C .X0 /
D 1 C r.I  J/X0  D 1 C r.Txx / D 1 C r.Rxx / D 1 C r.Sxx /
19.16 jRxx j 0 () r.X/ D k C 1 () jX0 Xj 0 () jTxx j 0
() r.X0 / D k & 1 C .X0 /
19.17 rij D 1 for some i j implies that jRxx j D 0 (but not vice versa)
19.18 (a) The columns of X D .1 W X0 / are orthogonal i


(b) the columns of X0 are centered and cord .X0 / D Rxx D Ik .


19.19 The statements (a) the columns of X0 are orthogonal, (b) cord .X0 / D Ik (i.e.,
orthogonality and uncorrelatedness) are equivalent if X0 is centered.
19.20 It is possible that
(a) cos.x; y/ is high but cord .x; y/ D 0,
(b) cos.x; y/ D 0 but cord .x; y/ D 1.
19.21 cord .x; y/ D 0 () y 2 C .Cx/? D C .1 W x/?  C .1/
[exclude the cases when x 2 C .1/ and/or y 2 C .1/]
19.22 T D



Txx txy
;
t0xy tyy


R
R D 0xx
rxy



S s
S D 0xx xy ;
sxy syy
1
0
R11 r12

rxy C
rxy
B
D @ r012 1
A
1
0
rxy
1

2
19.23 jRj D jRxx j.1  r0xy R1
xx rxy / D jRxx j.1  Ryx /
1
2
19.24 jRxx j D jR11 j.1  r012 R1
11 r12 / D jR11 j.1  Rk1:::k1 /
2
2
19.25 jRj D jR11 j.1  Rk1:::k1
/.1  Ryx
/;

2
Ryx
D R2 .yI X/

2
2
2
2
2
19.26 jRj D .1  r12
/.1  R312
/.1  R4123
/    .1  Rk123:::k1
/.1  Ryx
/
2
19.27 1  Ryx
D

19.28 R1 D
1

19.29 T

jRj
2
2
2
2
D .1  ry1
/.1  ry21
/.1  ry312
/    .1  ryk12:::k1
/
jRxx j

Rxx rxy
.rxy /0 r yy

Txx txy
D
.txy /0 t yy


D f r ij g;
N


ij

D f t g;
N

R1
xx D
T1
xx

R11 r12
.r12 /0 r kk

T11 t12
D
.t12 /0 t kk


D fr ij g


D ft ij g

1
0
1
2
19.30 Rxx D .Rxx  rxy r0xy /1 D R1
xx C R xx rxy rxy R xx =.1  R /

19.31 t yy D 1=SSE D the last diagonal element of T1


19.32 t i i D 1=SSE.xi I X.i/ / D the i th diagonal element of T1
xx


19.33 r yy D

19.34 r i i D

1
1

r0xy R1
xx rxy

1
D the last diagonal element of R1
2
1  Ryx

1
1  Ri2

2
2
D VIFi D the i th diagonal element of R1
xx ; Ri D R .xi I X.i/ /

19.35 Assuming that appropriate inverses exist:
(a) $(B - CD^{-1}C')^{-1} = B^{-1} + B^{-1}CS^{-1}C'B^{-1}$, where $S = D - C'B^{-1}C$.
(b) $(B + uv')^{-1} = B^{-1} - \dfrac{B^{-1}uv'B^{-1}}{1 + v'B^{-1}u}$.
(c) $(B + k\,i_ii_j')^{-1} = B^{-1} - \dfrac{k}{1 + k\,i_j'B^{-1}i_i}\,B^{-1}i_ii_j'B^{-1}$.
(d) $\bigl[X'(I - i_ii_i')X\bigr]^{-1} = (X_{(i)}'X_{(i)})^{-1} = (X'X)^{-1} + \dfrac{1}{1 - h_{ii}}(X'X)^{-1}x_{(i)}x_{(i)}'(X'X)^{-1}$.
(e) $(A + kI)^{-1} = A^{-1} - kA^{-1}(I + kA^{-1})^{-1}A^{-1}$.
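A quick numerical check of the Sherman–Morrison update 19.35(b), not part of the original text; the matrix, vectors, and seed below are arbitrary numpy illustrations.

```python
# Sherman-Morrison rank-one update of an inverse (illustration only).
import numpy as np

rng = np.random.default_rng(10)
B = rng.standard_normal((5, 5)) + 5 * np.eye(5)     # comfortably nonsingular
u = rng.standard_normal(5)
v = rng.standard_normal(5)

Binv = np.linalg.inv(B)
update = Binv - np.outer(Binv @ u, v @ Binv) / (1 + v @ Binv @ u)
print(np.allclose(update, np.linalg.inv(B + np.outer(u, v))))       # True
```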

20 Generalized inverses
In what follows, let A 2 Rnm .
20.1 Consider the following four conditions on a matrix $G_{m\times n}$:
(mp1) $AGA = A$,
(mp2) $GAG = G$,
(mp3) $(AG)' = AG$,
(mp4) $(GA)' = GA$.
Then: (mp1) $\iff$ $G \in \{A^-\}$, a generalized inverse; (mp1) & (mp2) $\iff$ $G \in \{A_r^-\}$, a reflexive g-inverse; (mp1) & (mp3) $\iff$ $G \in \{A_\ell^-\}$, a least squares g-inverse; (mp1) & (mp4) $\iff$ $G \in \{A_m^-\}$, a minimum norm g-inverse. All four conditions $\iff$ $G = A^+$: the Moore–Penrose inverse (unique).


20.2 The matrix $G_{m\times n}$ is a generalized inverse of $A_{n\times m}$ if any of the following equivalent conditions holds:
(a) the vector $Gy$ is a solution to $Ab = y$ whenever this equation is consistent, i.e., whenever $y \in C(A)$;
(b1) $GA$ is idempotent and $r(GA) = r(A)$, or equivalently
(b2) $AG$ is idempotent and $r(AG) = r(A)$;
(c) $AGA = A$ [condition (mp1)].

20.3

AGA D A & r.G/ D r.A/ () G is a reexive generalized inverse of A

20.4

A general solution to a consistent (solvable) equation Ax D y is


A y C .Im  A A/z; where the vector z 2 Rm is free to vary,
and A is an arbitrary (but xed) generalized inverse.

20.5

The class of all solutions to a consistent equation Ax D y is fGyg, where G


varies through all generalized inverses of A.

20.6

The equation Ax D y is consistent i y 2 C .A/ i A0 u D 0 H) y0 u D 0.

20.7

The equation AX D Y has a solution (in X) i C .Y/  C .A/ in which case


the general solution is
A Y C .Im  A A/Z; where Z is free to vary.

20.8

A necessary and sucient condition for the equation AXB D C to have a


solution is that AA CB B D C, in which case the general solution is
X D A CB C Z  A AZBB ; where Z is free to vary.

20.9

Two alternative representations of a general solution to g-inverse of A are


(a) G D A C U  A AUAA ,
(b) G D A C V.In  AA / C .Im  A A/W,
where A is a particular g-inverse and U, V, W are free to vary. In particular,
choosing A D AC , the general representations can be expressed as
(c) G D AC C U  PA0 UPA , (d) G D AC C V.In  PA / C .Im  PA0 /W.

20.10 Let A 0, C 0. Then AB C is invariant w.r.t. the choice of B i C .C/ 


C .B/ & C .A0 /  C .B0 /.
20.11 AA C D C for some A i AA C D C for all A i AA C is invariant
w.r.t. the choice of A i C .C/  C .A/.
20.12 AB B D A () C .A0 /  C .B0 /
20.13 Let C 0. Then C .AB C/ is invariant w.r.t. the choice of B i C .A0 / 
C .B0 / holds along with C .C/  C .B/ or along with r.ABC L/ D r.A/,
where L is any matrix such that C .L/ D C .B/ \ C .C/.


20.14 rank.AB C/ is invariant w.r.t. the choice of B i at least one of the column
spaces C .AB C/ and C .C0 .B0 / A0 / is invariant w.r.t. the choice of B .
20.15 r.A/ D r.AA A/
r.AA /
r.A /;
20.16 C .AA / D C .A/ but
20.17 C .Im  A A/ D N .A/;

r.AC / D r.A/

C .A A/ D C .A0 / () A 2 fA


14 g
N .In  AA / D C .A/

20.18 C .AC / D C .A0 /


20.19 AC D .A0 A/C A0 D A0 .AA0 /C
20.20 .AC /0 D .A0 /C ;

.A /0 2 f.A0 / g

20.21 If $A$ has a full rank decomposition $A = UV'$, then
$$A^+ = V(V'V)^{-1}(U'U)^{-1}U'.$$
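A numerical sketch of 20.21, not from the book: it builds the Moore–Penrose inverse from a full rank decomposition with numpy, compares it with `numpy.linalg.pinv`, and verifies the four conditions of 20.1. The dimensions and seed are arbitrary.

```python
# Moore-Penrose inverse from a full rank decomposition A = U V' (sketch).
import numpy as np

rng = np.random.default_rng(11)
r, n, m = 2, 6, 4
U = rng.standard_normal((n, r))              # full column rank n x r
V = rng.standard_normal((m, r))              # full column rank m x r
A = U @ V.T                                  # n x m with rank r

A_plus = V @ np.linalg.inv(V.T @ V) @ np.linalg.inv(U.T @ U) @ U.T
print(np.allclose(A_plus, np.linalg.pinv(A)))                  # True

G = A_plus                                   # the four Moore-Penrose conditions
print(np.allclose(A @ G @ A, A), np.allclose(G @ A @ G, G),
      np.allclose((A @ G).T, A @ G), np.allclose((G @ A).T, G @ A))
```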
20.22 .AB/C D BC AC () C .BB0 A0 /  C .A0 / & C .A0 AB/  C .B/


1 0
V0 . Then:
20.23 Let Anm have a singular value decomposition A D U
0
0
 1 
1 K
U0 ;
G 2 fA g () G D V
L N
 1

1
K
g
()
G
D
V
U0 ;
G 2 fA
12
L L1 K
 1 
1 0

g
D
fA
g
()
G
D
V
U0 ;
G 2 fA
13
`
L N
 1 
1 K

g
D
fA
g
()
G
D
V
U0 ;
G 2 fA
14
m
0 N
 1 
1 0
U0 ;
G D AC () G D V
0 0
where K, L, and N are arbitrary matrices.
20.24 Let A D TT0 D T1 1 T01 be the EVD of A with 1 comprising the nonzero
eigenvalues. Then G is a symmetric and nnd reexive g-inverse of A i

 1
L
1
T0 ; where L is an arbitrary matrix.
GDT
L0 L0 1 L

20.25 If $A_{n\times m} = \begin{pmatrix} B & C \\ D & E\end{pmatrix}$ and $r(A) = r = r(B_{r\times r})$, then
$$G_{m\times n} = \begin{pmatrix} B^{-1} & 0 \\ 0 & 0\end{pmatrix} \in \{A^-\}.$$

20.26 In (a)(c) below we consider a nonnegative denite A partitioned as


 

 0
A11 A12
L1 L1 L01 L2
D
:
A D L0 L D
L02 L1 L02 L2
A21 A22
(a) Block diagonalization of a nonnegative denite matrix:




I A
I .L01 L1 / L01 L2
A12
11
(i) .L1 W L2 /
DL
WD LU
0
I
0
I
D L1 W .I  PL1 /L2 ;

 0
L1 L1
0
;
(ii) U0 L0 LU D U0 AU D
0 L02 .I  PL1 /L2


 
 
I
0
I A
0
11 A12 D A11
;
(iii)
A
A21 AD
0
I
0 A22  A21 A
11 I
11 A12




I A
I
0
A11
0
A12
11
(iv) A D
;
A21 AD
0
I
0 A22  A21 A
11 I
11 A12


where AD
11 , A11 , and A11 are arbitrary generalized inverses of A11 , and

A221 D A22  A21 A11 A12 D the Schur complement of A11 in A.

(b) The matrix A# is one generalized inverse of A:




A
A
A12 A
#
112
112
22
;
A D





A
22 A21 A112 A22 C A22 A21 A112 A12 A22
where A112 D L01 Q2 L1 , with B denoting a g-inverse of B and Q2 D
I  PL2 . In particular, the matrix A# is a symmetric reexive g-inverse
of A for any choices of symmetric reexive g-inverses .L02 L2 / and
.L01 Q2 L1 / . We say that A# is in BanachiewiczSchur form.
(c) If any of the following conditions hold, the all three hold:
r.A/ D r.A11 / C r.A22 /, i.e., C .L1 / \ C .L2 / D f0g,


AC
AC
A12 AC
C
112
112
22
(ii) A D
;
C
C
C
C
C
AC
22 A21 A112 A22 C A22 A21 A112 A12 A22

 C
A11 C AC
A12 AC
A21 AC
AC
A12 AC
C
11
221
11
11
221
(iii) A D
:
C
AC
AC
221 A21 A11
221

(i)

20.27 Consider Anm and Bnm and let the pd inner product matrix be V. Then
kAxkV
kAx C BykV 8 x; y 2 Rm () A0 VB D 0:
The statement above holds also for nnd V, i.e., t0 Vu is a semi-inner product.


20.28 Let G be a g-inverse of A such that Gy is a minimum norm solution (w.r.t.


standard norm) of Ax D y for any y 2 C .A/. Then it is necessary and sucient that AGA D A and .GA/0 D GA, i.e., G 2 fA
14 g. Such a G is called a
.
minimum norm g-inverse and denoted as A
m
20.29 Let A 2 Rnm and let the inner product matrix be N 2 PDn and denote a
2 Rmn . Then the following statements
minimum norm g-inverse as A
m.N/
are equivalent:
(a) G D A
,
m.N/
(b) AGA D A, .GA/0 N D NGA (here N can be nnd),
(c) GAN1 A0 D N1 A0 ,
(d) GA D PN1 A0 IN , i.e., .GA/0 D PA0 IN1 .
g,
20.30 (a) .N C A0 A/ A0 A0 .N C A0 A/ A 2 fA
m.N/
(b) C .A0 /  C .N/ H) N A0 .A0 N A/ 2 fA
g.
m.N/
20.31 Let G be a matrix (not necessarily a g-inverse) such that Gy is a least-squares
solution (w.r.t. standard norm) of Ax D y for any y, that is
ky  AGyk
ky  Axk for all x 2 Rm ; y 2 Rn :
Then it is necessary and sucient that AGA D A and .AG/0 D AG, that is,

G 2 fA
13 g. Such a G is called a least-squares g-inverse and denoted as A` .
20.32 Let the inner product matrix be V 2 PDn and denote a least-squares g-inverse
. Then the following statements are equivalent:
as A
`.V/
(a) G D A
,
`.V/
(b) AGA D A, .AG/0 V D VAG,
(c) A0 VAG D A0 V (here V can be nnd),
(d) AG D PAIV .
20.33 .A0 VA/ A0 V 2 fA
`.V/ g:

 
0

20.34 Let the inner product matrix Vnn be pd. Then .A0 /
D A`.V1 / .
m.V/
20.35 The minimum norm solution for X0 a D k is aQ D .X0 /
k WD Gk, where
m.V/

0


.X0 /
m.V/ WD G D W X.X W X/ I

W D V C XX0 :

Furthermore, BLUE.k0 / D aQ 0 y D k0 G0 y D k0 .X0 W X/ X0 W y.


21 Projectors
21.1

Orthogonal projector. Let A 2 Rnm


and let the inner product
and the correr
p
sponding norm be dened as ht; ui D t0 u, and ktk D ht; ti, respectively.
Further, let A? 2 Rnq be a matrix spanning C .A/? D N .A0 /, and the
2 Rn.nr/ form bases for
columns of the matrices Ab 2 Rnr and A?
b
C .A/ and C .A/? , respectively. Then the following conditions are equivalent
ways to dene the unique matrix P:
(a) The matrix P transforms every y 2 Rn ,
y D yA C yA? ;

yA 2 C .A/;

yA? 2 C .A/? ;

into its projection onto C .A/ along C .A/? ; that is, for each y above, the
multiplication Py gives the projection yA : Py D yA .
(b) P.Ab C A? c/ D Ab for all b 2 Rm , c 2 Rq .
(c) P.A W A? / D .A W 0/.
(d) P.Ab W A?
/ D .Ab W 0/.
b
(e) C .P/  C .A/;

minky  Abk2 D ky  Pyk2

(f) C .P/  C .A/;

P0 A D A.

(g) C .P/ D C .A/;

P0 P D P.

(h) C .P/ D C .A/;

P2 D P;

(i) C .P/ D C .A/;

Rn D C .P/  C .In  P/.

for all y 2 Rn .

P0 D P.

(j) P D A.A0 A/ A0 D AAC .


(k) P D Ab .A0b Ab /1 A0b .


(l) P D Ao A0o D U I0r 00 U0 , where U D .Ao W A?
o /, the columns of Ao and
forming
orthonormal
bases
for
C
.A/
and
C .A/? , respectively.
A?
o
The matrix P D PA is the orthogonal projector onto the column space C .A/
w.r.t. the inner product ht; ui D t0 u. Correspondingly, the matrix In  PA is
the orthogonal projector onto C .A? /: In  PA D PA? .
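A minimal numerical sketch of 21.1, not part of the original text: it checks with numpy that $P_A = A(A'A)^-A'$ (computed here as $A A^+$, which is invariant to the choice of g-inverse) is symmetric and idempotent, fixes $C(A)$, produces residuals orthogonal to $C(A)$, and attains the least-squares minimum. The matrix, test vector, and seed are arbitrary.

```python
# Basic properties of the orthogonal projector P_A (illustration only).
import numpy as np

rng = np.random.default_rng(12)
A = rng.standard_normal((8, 2)) @ rng.standard_normal((2, 4))   # 8 x 4, rank 2
PA = A @ np.linalg.pinv(A)            # equals A (A'A)^+ A'

y = rng.standard_normal(8)
print(np.allclose(PA, PA.T), np.allclose(PA @ PA, PA))   # symmetric, idempotent
print(np.allclose(PA @ A, A))                            # C(A) is mapped onto itself
print(np.allclose(A.T @ (y - PA @ y), 0))                # residual is orthogonal to C(A)

# least squares property of 21.1(e): ||y - P_A y|| <= ||y - A b|| for any b
b = rng.standard_normal(4)
print(np.linalg.norm(y - PA @ y) <= np.linalg.norm(y - A @ b))   # True
```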
21.2

In 21.321.10 we consider orthogonal projectors dened w.r.t. the standard


inner product in Rn so that they are symmetric idempotent matrices. If P is
idempotent but not necessarily symmetric, it is called an oblique projector or
simply a projector.

21.3

PA C PB is an orthogonal projector () A0 B D 0, in which case PA C PB


is the orthogonal projector onto C .A W B/.


21.4 $P_{(A:B)} = P_A + P_{(I - P_A)B}$.

21.5 The following statements are equivalent:
(a) $P_A - P_B$ is an orthogonal projector,
(b) $P_AP_B = P_BP_A = P_B$,
(c) $\|P_Ax\| \ge \|P_Bx\|$ for all $x \in \mathbb{R}^n$,
(d) $P_A - P_B \ge_L 0$,
(e) $C(B) \subseteq C(A)$.
If any of the above conditions holds, then $P_A - P_B = P_{(I - P_B)A} = P_{C(A)\cap C(B)^\perp}$.
21.6

Let L be a matrix with property C .L/ D C .A/ \ C .B/. Then


(a) C .A/ D C L W .I  PL /A D C .A/ \ C .B/  C .I  PL /A,
(b) PA D PL C P.IPL /A D PL C PC .A/\C .L/? ,
(c) PA PB D PL C P.IPL /A P.IPL /B ,
(d) .I  PL /PA D PA .I  PL / D PA  PL D PC .A/\C .L/? ,
(e) C .I  PL /A D C .I  PL /PA  D C .A/ \ C .L/? ,
(f) r.A/ D dim C .A/ \ C .B/ C dim C .A/ \ C .L/? .

21.7

Commuting projectors. Denote C .L/ D C .A/ \ C .B/. Then the following


statements are equivalent;
(a) PA PB D PB PA ,
(b) PA PB D PC .A/\C .B/ D PL ,
(c) P.AWB/ D PA C PB  PA PB ,
(d) C .A W B/ \ C .B/? D C .A/ \ C .B/? ,
(e) C .A/ D C .A/ \ C .B/  C .A/ \ C .B/? ,
(f) r.A/ D dim C .A/ \ C .B/ C dim C .A/ \ C .B/? ,
(g) r.A0 B/ D dim C .A/ \ C .B/,
(h) P.IPA /B D PB  PA PB ,
(i) C .PA B/ D C .A/ \ C .B/,
(j) C .PA B/  C .B/,
(k) P.IPL /A P.IPL /B D 0.

21.8

Let PA and PB be orthogonal projectors of order n  n. Then


(a) 1
chi .PA  PB /
1;
(b) 0
chi .PA PB /
1;

i D 1; : : : ; n,

i D 1; : : : ; n,


(c) trace.PA PB /
r.PA PB /,
(d) #fchi .PA PB / D 1g D dim C .A/ \ C .B/, where #fchi .Z/ D 1g D the
number of unit eigenvalues of Z.
21.9

The following statements are equivalent:


ch1 .PA PB / < 1;

C .A/ \ C .B/ D f0g;

det.I  PA PB / 0:

21.10 Let A 2 Rna , B 2 Rnb , and T 2 Rnt , and assume that C .T/  C .A/.
Moreover, let U be any matrix satisfying C .U/ D C .A/ \ C .B/. Then
C .QT U/ D C .QT A/ \ C .QT B/;

where QT D In  PT :

21.11 As a generalization to y0 .In  PX /y


.y  Xb/0 .y  Xb/ for all b, we have,
for given Ynq and Xnp , the Lwner ordering
Y0 .In  PX /Y
L .Y  XB/0 .Y  XB/ for all Bpq :


P12
be an orthogonal projector where P11 is a square matrix.
21.12 Let P D PP11
21 P22
Then P221 D P22  P21 P
11 P12 is also an orthogonal projector.
is idempotent i any of the following conditions holds:
21.13 The matrix P 2 Rnn
r
(a) P D AA for some A,
(b) In  P is idempotent,
(c) r.P/ C r.In  P/ D n,
(d) Rn D C .P/ C .In  P/,
(e) P has a full rank decomposition P D UV0 , where V0 U D Ir ,


I 0
B1 , where B is a nonsingular matrix,
(f) P D B r
0 0


I C
(g) P D D r
D0 , where D is an orthogonal matrix and C 2 Rr.nr/ ,
0 0
(h) r.P/ D tr.P/ and r.In  P/ D tr.In  P/.
21.14 If P2 D P then rank.P/ D tr.P/ and
ch.P/ D f0; : : : ; 0; 1; : : : ; 1g;

#fch.P/ D 1g D rank.P/;

but ch.P/ D f0; : : : ; 0; 1; : : : ; 1g does not imply P2 D P (unless P0 D P).


21.15 Let P be symmetric. Then P2 D P () ch.P/ D f0; : : : ; 0; 1; : : : ; 1g; here
#fch.P/ D 1g D rank.P/ D tr.P/.


21.16 P2 D P H) P is the oblique projector onto C .P/ along N .P/, where the
direction space can be also written as N .P/ D C .I  P/.
21.17 AA is the oblique projector onto C .A/ along N .AA /, where the direction
space can be written as N .AA / D C .In  AA /. Correspondingly, A A
is the oblique projector onto C .A A/ along N .A A/ D N .A/.
21.18 AAC D PA ;

AC A D PA0

21.19 Generalized projector PAjB . By PAjB we mean any matrix G, say, satisfying
(a) G.A W B/ D .A W 0/,
where it is assumed that C .A/ \ C .B/ D f0g, which is a necessary and sufcient condition for the solvability of (a). Matrix G is a generalized projector
onto C .A/ along C .B/ but it need not be unique and idempotent as is the
case when C .A W B/ D Rn . We denote the set of matrices G satisfying (a) as
fPAjB g and the general expression for G is, for example,
G D .A W 0/.A W B/ C F.I  P.AWB/ /; where F is free to vary.
21.20 Suppose that C .A/ \ C .B/ D f0n g D C .C/ \ C .D/. Then
fPCjD g  fPAjB g () C .A/  C .C/ and C .B/  C .D/:
, and
21.21 Orthogonal projector w.r.t. the inner product matrix V. Let A 2 Rnm
r
let the inner product (and the corresponding norm) be dened as ht; uiV D
?
t0 Vu where V is pd. Further, let A?
V be an n  q matrix spanning C .A/V D
0
?
1 ?
N .A V/ D C .VA/ D C .V A /. Then the following conditions are
equivalent ways to dene the unique matrix P :
(a) P .Ab C A?
V c/ D Ab

for all b 2 Rm , c 2 Rq .

1 ?
(b) P .A W A?
V / D P .A W V A / D .A W 0/.

(c) C .P /  C .A/;

minky  Abk2V D ky  P yk2V for all y 2 Rn .

(d) C .P /  C .A/;

P0 VA D VA.

(e) P0 .VA W A? / D .VA W 0/.


(f) C .P / D C .A/;

P0 V.In  P / D 0.

(g) C .P / D C .A/;

P2 D P ;

.VP /0 D VP .

(h) C .P / D C .A/; Rn D C .P /  C .In  P /; here  refers to the


orthogonality with respect to the given inner product.
(i) P D A.A0 VA/ A0 V, which is invariant for any choice of .A0 VA/ .
The matrix P D PAIV is the orthogonal projector onto the column space
C .A/ w.r.t. the inner product ht; uiV D t0 Vu. Correspondingly, the matrix


In  PAIV is the orthogonal projector onto C .A?


V /:
In  PAIV D PA? IV D V1 Z.Z0 V1 Z/ Z0 D P0ZIV1 D PV1 ZIV ;
V

where Z 2 fA g, i.e., PAIV D In  P0A? IV1 D In  PV1 A? IV .


?

21.22 Consider the linear model fy; X; Vg and let C .X/?


denote the set of vecV1
tors which are orthogonal to every vector in C .X/ with respect to the inner
product matrix V1 . Then the following sets are identical:
(a) C .X/?
,
V1

(b) C .VX? /,

(c) N .X0 V1 /,

(d) C .V1 X/? ,

(e) N .PXIV1 /,

(f) C .In  PXIV1 /.

Denote W D V C XUX0 , where C .W/ D C .X W V/. Then


C .VX? / D C .W X W In  W W/? ;
where W is an arbitrary (but xed) generalized inverse of W. The column
space C .VX? / can be expressed also as
C .VX? / D C .W /0 X W In  .W /0 W0 ? :
Moreover, let V be possibly singular and assume that C .X/  C .V/. Then
C .VX? / D C .V X W In  V V/?  C .V X/? ;
where the inclusion becomes equality i V is positive denite.
21.23 Let V be pd and let X be partitioned as X D .X1 W X2 /. Then
(a) PX1 IV1 C PX2 IV is an orthogonal projector i X01 V1 X2 D 0, in which
case PX1 IV1 C PX2 IV1 D P.X1 WX2 /IV1 ,
(b) P.X1 WX2 /IV1 D PX1 IV1 C P.IP
(c) P.IP

X1 IV1

/X2 IV1

X1 IV1

/X2 IV1 ,

1 0 P
P
P 1 X2 .X0 M
D PVMP 1 X2 IV1 D VM
2 1 X2 / X2 M1 ,

P 1 D V1  V1 X1 .X0 V1 X1 /1 X0 V1 D M1 .M1 VM1 / M1 .


(d) M
1
1
21.24 Consider a weakly singular partitioned linear model and denote PAIVC D
P 1 D M1 .M1 VM1 / M1 . Then
A.A0 VC A/ A0 VC and M
PXIVC D X.X0 VC X/ X0 VC D V1=2 PVC1=2 X VC1=2
D V1=2 .PVC1=2 X1 C P.IP
D PX1 IVC C V1=2 P.IP

VC1=2 X1

VC1=2 X1

/VC1=2 X2 /V

/VC1=2 X2 V

C1=2

C1=2

D PX1 IVC C V1=2 PV1=2 MP 1 X2 VC1=2


P 1 X2 / X02 M
P 1 PV :
P 1 X2 .X02 M
D PX1 IVC C VM


22 Eigenvalues
22.1 Definition: The scalar $\lambda$ is an eigenvalue of $A_{n\times n}$ if
$$At = \lambda t \ \text{ for some nonzero vector } t, \text{ i.e., } (A - \lambda I_n)t = 0,$$
in which case $t$ is an eigenvector of $A$ corresponding to $\lambda$, and $(\lambda, t)$ is an eigenpair for $A$. If $A$ is symmetric, then all eigenvalues are real. The eigenvalues are the $n$ roots of the characteristic equation
$$p_A(\lambda) = \det(A - \lambda I_n) = 0,$$
where $p_A(\lambda) = \det(A - \lambda I_n)$ is the characteristic polynomial of $A$. We denote (when eigenvalues are real)
$$\lambda_1 \ge \cdots \ge \lambda_n, \quad \mathrm{ch}_i(A) = \lambda_i, \quad \mathrm{ch}(A) = \{\lambda_1, \dots, \lambda_n\} = \text{spectrum of } A, \quad \mathrm{nzch}(A) = \{\lambda_i : \lambda_i \ne 0\}.$$
22.2

The characteristic equation can be written as


pA ./ D ./n C S1 ./n1 C    C Sn1 ./ C Sn D 0;
pA ./ D .1  /    .n  / D 0;
for appropriate real coecients S1 ; : : : ; Sn : Si is the sum of all i  i principal
minors and hence Sn D pA .0/ D det.A/ D 1    n , S1 D tr.A/ D 1 C
   C n , and thereby always
det.A/ D 1    n ;

tr.A/ D 1 C    C n :

For n D 3, we have
det.A  I3 /
D ./3 C tr.A/./2




a11 a12 a11 a13 a22 a23

./ C det.A/:

C
C
C
a21 a22 a31 a33 a32 a33
22.3 The characteristic polynomial of $A = \begin{pmatrix} a & b \\ b & c\end{pmatrix}$ is
$$p_A(\lambda) = \lambda^2 - (a + c)\lambda + (ac - b^2) = \lambda^2 - \operatorname{tr}(A)\lambda + \det(A).$$
The eigenvalues of $A$ are $\tfrac12\bigl[a + c \pm \sqrt{(a - c)^2 + 4b^2}\bigr]$.
22.4

The following statements are equivalent (for a nonnull t and pd A):


(a) t is an eigenvector of A,

(b) cos.t; At/ D 1,

(c) t0 At  .t0 A1 t/1 D 0 with ktk D 1,


(d) cos.A1=2 t; A1=2 t/ D 1,


(e) cos.y; At/ D 0 for all y 2 C .t/? .


22.5

The spectral radius of A 2 Rnn is dened as .A/ D maxfjchi .A/jg; the


eigenvalue corresponding to .A/ is called the dominant eigenvalue.

EVD Eigenvalue decomposition. A symmetric Ann can be written as


A D TT0 D 1 t1 t01 C    C n tn t0n ;
and thereby
.At1 W At2 W : : : W Atn / D .1 t1 W 2 t2 W : : : W n tn /;

AT D T;

where Tnn is orthogonal, D diag.1 ; : : : ; n /, and 1      n are


the ordered eigenvalues of A; chi .A/ D i . The columns ti of T are the
orthonormal eigenvectors of A.
Consider the distinct eigenvalues of A, f1g >    > fsg , and let Tfig be an
n  mi matrix consisting of the orthonormal eigenvectors corresponding to
fig ; mi is the multiplicity of fig . Then
A D TT0 D f1g Tf1g T0f1g C    C fsg Tfsg T0fsg :
With this ordering, is unique and T is unique up to postmultiplying by a
blockdiagonal matrix U D blockdiag.U1 ; : : : ; Us /, where Ui is an orthogonal
mi  mi matrix. If all the eigenvalues are distinct, then U is a diagonal matrix
with diagonal elements equal to 1.
22.6

For a nonnegative denite n  n matrix A with rank r > 0 we have



 0 
1 0
T1
D T1 1 T01
A D TT0 D .T1 W T0 /
T00
0 0
D 1 t1 t01 C    C r tr t0r ;
where 1      r > 0, 1 D diag.1 ; : : : ; r /, and T1 D .t1 W : : : W tr /,
T0 D .trC1 W : : : W tn /.

22.7

The nnd square root of the nnd matrix A D TT0 is dened as


0
A1=2 D T1=2 T0 D T1 1=2
1 T1 I

22.8

.A1=2 /C D T1 1=2
T01 WD AC1=2 :
1

If App D .ab/Ip Cb1p 1p0 for some a; b 2 R, then A is called a completely


symmetric matrix. If it is a covariance matrix (i.e., nnd) then it is said to have
an intraclass correlation structure. Consider the matrices
App D .a  b/Ip C b1p 1p0 ;

pp D .1  %/Ip C %1p 1p0 :

(a) det.A/ D .a  b/n1 a C .p  1/b,


(b) the eigenvalues of A are a C .p  1/b (with multiplicity 1), and a  b
(with multiplicity n  1),


(c) $A$ is nonsingular iff $a \ne b$ and $a \ne -(p - 1)b$, in which case
$$A^{-1} = \frac{1}{a - b}\left(I_p - \frac{b}{a + (p - 1)b}\,1_p1_p'\right),$$

(d) $\mathrm{ch}(\Sigma) = \begin{cases} 1 + (p - 1)\varrho & \text{with multiplicity } 1, \\ 1 - \varrho & \text{with multiplicity } p - 1, \end{cases}$

(e) $\Sigma$ is nonnegative definite $\iff -\frac{1}{p-1} \le \varrho \le 1$,

(f) $t_1 = \alpha 1_p$ = eigenvector w.r.t. $\lambda_1 = 1 + (p - 1)\varrho$, $\alpha \ne 0 \in \mathbb{R}$; $t_2, \ldots, t_p$ are orthonormal eigenvectors w.r.t. $\lambda_i = 1 - \varrho$, $i = 2, \ldots, p$; $t_2, \ldots, t_p$ form an orthonormal basis for $\mathscr{C}(1_p)^{\perp}$,

(g) $\det(\Sigma) = (1 - \varrho)^{p-1}[1 + (p - 1)\varrho]$,

(h) $\Sigma 1_p = [1 + (p - 1)\varrho]1_p =: \lambda_1 1_p$; if $\varrho \ne -\frac{1}{p-1}$, then $\Sigma^{-1}1_p = \lambda_1^{-1}1_p$, in which case
$$1_p'\Sigma^{-1}1_p = \lambda_1^{-2}\,1_p'\Sigma 1_p = \lambda_1^{-1}\,1_p'1_p = \frac{p}{1 + (p - 1)\varrho},$$

(i) $\Sigma^{-1} = \dfrac{1}{1 - \varrho}\left(I_p - \dfrac{\varrho}{1 + (p - 1)\varrho}\,1_p1_p'\right)$, for $\varrho \ne 1$, $\varrho \ne -\dfrac{1}{p-1}$.

(j) Suppose that
$$\mathrm{cov}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} \Sigma_{xx} & \sigma_{xy} \\ \sigma_{xy}' & 1 \end{pmatrix} = (1 - \varrho)I_{p+1} + \varrho 1_{p+1}1_{p+1}',$$
where (necessarily) $-\frac{1}{p} \le \varrho \le 1$. Then
$$\varrho^2_{y\cdot x} = \sigma_{xy}'\Sigma_{xx}^{-1}\sigma_{xy} = \frac{p\varrho^2}{1 + (p - 1)\varrho}.$$

(k) Assume that $\mathrm{cov}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} \Sigma & \varrho 1_p1_p' \\ \varrho 1_p1_p' & \Sigma \end{pmatrix}$, where $\Sigma = (1 - \varrho)I_p + \varrho 1_p1_p'$. Then
$$\mathrm{cor}(1_p'x, 1_p'y) = \frac{p\varrho}{1 + (p - 1)\varrho}.$$
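A small numerical illustration of 22.8 (an added sketch assuming NumPy; the example values of $p$ and $\varrho$ are arbitrary): the eigenvalues, determinant and inverse of the intraclass correlation matrix $\Sigma$ follow the closed forms above.

```python
# Sketch: check eigenvalues, determinant and inverse of
# Sigma = (1 - rho) I_p + rho 1 1' against the closed forms in 22.8.
import numpy as np

p, rho = 5, 0.3
one = np.ones((p, 1))
Sigma = (1 - rho) * np.eye(p) + rho * (one @ one.T)

eig = np.sort(np.linalg.eigvalsh(Sigma))
expected = np.sort([1 + (p - 1) * rho] + [1 - rho] * (p - 1))
assert np.allclose(eig, expected)

assert np.isclose(np.linalg.det(Sigma), (1 - rho)**(p - 1) * (1 + (p - 1) * rho))

Sigma_inv = (np.eye(p) - rho / (1 + (p - 1) * rho) * (one @ one.T)) / (1 - rho)
assert np.allclose(Sigma_inv, np.linalg.inv(Sigma))
```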

22.9 Consider a completely symmetric matrix $A_{p\times p} = (a - b)I_p + b1_p1_p'$. Then
$$L^{-1}AL = \begin{pmatrix} a-b & 0 & 0 & \cdots & 0 & b \\ 0 & a-b & 0 & \cdots & 0 & b \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & a-b & b \\ 0 & 0 & 0 & \cdots & 0 & a + (p-1)b \end{pmatrix} =: F,$$
where $L$ carries out elementary column operations:
$$L = \begin{pmatrix} I_{p-1} & 0_{p-1} \\ -1_{p-1}' & 1 \end{pmatrix}, \qquad L^{-1} = \begin{pmatrix} I_{p-1} & 0_{p-1} \\ 1_{p-1}' & 1 \end{pmatrix},$$
and hence $\det(A) = (a - b)^{p-1}[a + (p - 1)b]$. Moreover, $\det(A - \lambda I_p) = 0$ iff $\det(\lambda I_p - F) = 0$.
22.10 The algebraic multiplicity of $\lambda$, denoted as $\mathrm{alg\,mult}_A(\lambda)$, is its multiplicity as a root of $\det(A - \lambda I_n) = 0$. The set $\{\, t \ne 0 : At = \lambda t \,\}$ is the set of all eigenvectors associated with $\lambda$. The eigenspace of $A$ corresponding to $\lambda$ is
$$\{\, t : (A - \lambda I_n)t = 0 \,\} = \mathscr{N}(A - \lambda I_n).$$
If $\mathrm{alg\,mult}_A(\lambda) = 1$, $\lambda$ is called a simple eigenvalue; otherwise it is a multiple eigenvalue. The geometric multiplicity of $\lambda$ is the dimension of the eigenspace of $A$ corresponding to $\lambda$:
$$\mathrm{geo\,mult}_A(\lambda) = \dim \mathscr{N}(A - \lambda I_n) = n - r(A - \lambda I_n).$$
Moreover, $\mathrm{geo\,mult}_A(\lambda) \le \mathrm{alg\,mult}_A(\lambda)$ for each $\lambda \in \mathrm{ch}(A)$; here the equality holds e.g. when $A$ is symmetric.
22.11 Similarity. Two $n \times n$ matrices $A$ and $B$ are said to be similar whenever there exists a nonsingular matrix $F$ such that $F^{-1}AF = B$. The product $F^{-1}AF$ is called a similarity transformation of $A$.

22.12 Diagonalizability. A matrix $A_{n\times n}$ is said to be diagonalizable whenever there exists a nonsingular matrix $F_{n\times n}$ such that $F^{-1}AF = D$ for some diagonal matrix $D_{n\times n}$, i.e., $A$ is similar to a diagonal matrix. In particular, any symmetric $A$ is diagonalizable.

22.13 The following statements concerning the matrix $A_{n\times n}$ are equivalent:

(a) $A$ is diagonalizable,

(b) $\mathrm{geo\,mult}_A(\lambda) = \mathrm{alg\,mult}_A(\lambda)$ for all $\lambda \in \mathrm{ch}(A)$,

(c) $A$ has $n$ linearly independent eigenvectors.


22.14 Let $A \in \mathbb{R}^{n\times m}$, $B \in \mathbb{R}^{m\times n}$. Then $AB$ and $BA$ have the same nonzero eigenvalues: $\mathrm{nzch}(AB) = \mathrm{nzch}(BA)$. Moreover, $\det(I_n - AB) = \det(I_m - BA)$.

22.15 Let $A, B \in \mathrm{NND}_n$. Then

(a) $\mathrm{tr}(AB) \le \mathrm{ch}_1(A)\cdot\mathrm{tr}(B)$,

(b) $A \le_L B \implies \mathrm{ch}_i(A) \le \mathrm{ch}_i(B)$, $i = 1, \ldots, n$; here we must have at least one strict inequality if $A \ne B$.


22.16 Interlacing theorem. Let $A_{n\times n}$ be symmetric and let $B$ be a principal submatrix of $A$ of order $(n-1) \times (n-1)$. Then
$$\mathrm{ch}_{i+1}(A) \le \mathrm{ch}_i(B) \le \mathrm{ch}_i(A), \qquad i = 1, \ldots, n-1.$$



22.17 Let $A \in \mathbb{R}^{n\times a}_r$ and denote $B = \begin{pmatrix} 0 & A \\ A' & 0 \end{pmatrix}$, $\mathrm{sg}(A) = \{\delta_1, \ldots, \delta_r\}$. Then the nonzero eigenvalues of $B$ are $\delta_1, \ldots, \delta_r, -\delta_1, \ldots, -\delta_r$.

In 22.18–EY1 we consider a symmetric $A_{n\times n}$ whose EVD is $A = T\Lambda T'$ and denote $T_{(k)} = (t_1 : \ldots : t_k)$, $T_k = (t_{n-k+1} : \ldots : t_n)$.
22.18 (a) $\mathrm{ch}_1(A) = \lambda_1 = \max\limits_{x \ne 0}\dfrac{x'Ax}{x'x} = \max\limits_{x'x = 1} x'Ax$,

(b) $\mathrm{ch}_n(A) = \lambda_n = \min\limits_{x \ne 0}\dfrac{x'Ax}{x'x} = \min\limits_{x'x = 1} x'Ax$,

(c) $\mathrm{ch}_n(A) = \lambda_n \le \dfrac{x'Ax}{x'x} \le \lambda_1 = \mathrm{ch}_1(A)$; $\quad \dfrac{x'Ax}{x'x} = $ Rayleigh quotient,

(d) $\mathrm{ch}_2(A) = \lambda_2 = \max\limits_{x'x = 1,\ t_1'x = 0} x'Ax$,

(e) $\mathrm{ch}_{k+1}(A) = \lambda_{k+1} = \max\limits_{x'x = 1,\ T_{(k)}'x = 0} x'Ax$, $\quad k = 1, \ldots, n-1$.
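A numerical sketch of 22.18 (added; assumes NumPy): the Rayleigh quotient of an arbitrary vector lies between $\mathrm{ch}_n(A)$ and $\mathrm{ch}_1(A)$, and the extremes are attained at the corresponding eigenvectors.

```python
# Sketch: Rayleigh quotient bounds and their attainment at eigenvectors.
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((6, 6))
A = (B + B.T) / 2
lam, T = np.linalg.eigh(A)          # ascending: lam[0] = ch_n, lam[-1] = ch_1

def rayleigh(x):
    return float(x @ A @ x / (x @ x))

x = rng.standard_normal(6)
assert lam[0] - 1e-12 <= rayleigh(x) <= lam[-1] + 1e-12
assert np.isclose(rayleigh(T[:, -1]), lam[-1])   # maximum at t_1
assert np.isclose(rayleigh(T[:, 0]), lam[0])     # minimum at t_n
```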

22.19 Let $A_{n\times n}$ be symmetric and let $k$ be a given integer, $k \le n$. Then
$$\max_{G'G = I_k} \mathrm{tr}(G'AG) = \max_{G'G = I_k} \mathrm{tr}(P_GA) = \lambda_1 + \cdots + \lambda_k,$$
where the upper bound is obtained when $G = (t_1 : \ldots : t_k) = T_{(k)}$.


PST Poincaré separation theorem. Let $A_{n\times n}$ be a symmetric matrix, and let $G_{n\times k}$ be such that $G'G = I_k$, $k \le n$. Then, for $i = 1, \ldots, k$,
$$\mathrm{ch}_{n-k+i}(A) \le \mathrm{ch}_i(G'AG) \le \mathrm{ch}_i(A).$$
Equality holds on the right simultaneously for all $i = 1, \ldots, k$ when $G = T_{(k)}K$, and on the left if $G = T_kL$; $K$ and $L$ are arbitrary orthogonal matrices.

CFT Courant–Fischer theorem. Let $A_{n\times n}$ be symmetric, let $k$ be a given integer with $2 \le k \le n$ and let $B \in \mathbb{R}^{n\times(k-1)}$. Then

(a) $\min\limits_{B}\ \max\limits_{B'x = 0} \dfrac{x'Ax}{x'x} = \lambda_k$,

(b) $\max\limits_{B}\ \min\limits_{B'x = 0} \dfrac{x'Ax}{x'x} = \lambda_{n-k+1}$.

The result (a) is obtained when $B = T_{(k-1)}$ and $x = t_k$.


EY1 Eckart–Young theorem. Let $A_{n\times n}$ be a symmetric matrix of rank $r$. Then
$$\min_{r(B) = k} \|A - B\|_F^2 = \min_{r(B) = k} \mathrm{tr}[(A - B)(A - B)'] = \lambda_{k+1}^2 + \cdots + \lambda_r^2,$$
and the minimum is attained when
$$B = T_{(k)}\Lambda_{(k)}T_{(k)}' = \lambda_1 t_1t_1' + \cdots + \lambda_k t_kt_k'.$$
22.20 Let $A$ be a given $n \times m$ matrix of rank $r$, and let $B \in \mathbb{R}^{n\times m}_k$, $k < r$. Then
$$(A - B)(A - B)' \ge_L (A - AP_{B'})(A - AP_{B'})' = A(I_m - P_{B'})A',$$
and hence $\|A - B\|_F^2 \ge \|A - AP_{B'}\|_F^2$. Moreover, if $B$ has the full rank decomposition $B = FG'$, where $G'G = I_k$, then $P_{B'} = P_G$ and
$$(A - B)(A - B)' \ge_L (A - AGG')(A - AGG')' = A(I_m - GG')A'.$$
22.21 Suppose $V \in \mathrm{PD}_n$ and $C$ denotes the centering matrix. Then the following statements are equivalent:

(a) $CVC = c^2C$ for some $c \ne 0$,

(b) $V = \sigma^2I + a1' + 1a'$, where $a$ is an arbitrary vector and $\sigma^2$ is any scalar ensuring the positive definiteness of $V$.

The eigenvalues of $V$ in (b) are $\sigma^2 + 1'a \pm \sqrt{n\,a'a}$, each with multiplicity one, and $\sigma^2$ with multiplicity $n - 2$.
22.22 Eigenvalues of $A$ w.r.t. pd $B$. Let $A_{n\times n}$ and $B_{n\times n}$ be symmetric, of which $B$ is nonnegative definite. Let $\lambda$ be a scalar and $w$ a vector such that

(a) $Aw = \lambda Bw$, $\quad Bw \ne 0$.

Then we call $\lambda$ a proper eigenvalue and $w$ a proper eigenvector of $A$ with respect to $B$, or shortly, $(\lambda, w)$ is a proper eigenpair for $(A, B)$. There may exist a vector $w \ne 0$ such that $Aw = Bw = 0$, in which case (a) is satisfied with arbitrary $\lambda$. We call such a vector $w$ an improper eigenvector of $A$ with respect to $B$. The space of improper eigenvectors is $\mathscr{C}(A : B)^{\perp}$. Consider next the situation when $B$ is positive definite (in which case the word proper can be dropped off). Then (a) becomes

(b) $Aw = \lambda Bw$, $\quad w \ne 0$.

Premultiplying (b) by $B^{-1}$ yields the usual eigenvalue equation $B^{-1}Aw = \lambda w$, $w \ne 0$. We denote

(c) $\mathrm{ch}(A, B) = \mathrm{ch}(B^{-1}A) = \mathrm{ch}(AB^{-1}) = \{\mu_1, \ldots, \mu_n\}$.

The matrix $B^{-1}A$ is not necessarily symmetric but in view of

(d) $\mathrm{nzch}(B^{-1}A) = \mathrm{nzch}(B^{-1/2}\cdot B^{-1/2}A) = \mathrm{nzch}(B^{-1/2}AB^{-1/2})$,

and the symmetry of $B^{-1/2}AB^{-1/2}$, the eigenvalues of $B^{-1}A$ are all real.


Premultiplying (a) by $B^{-1/2}$ yields $B^{-1/2}Aw = \lambda B^{1/2}w$, i.e., $B^{-1/2}AB^{-1/2}\cdot B^{1/2}w = \lambda B^{1/2}w$, which shows the equivalence of the following statements:

(e) $(\lambda, w)$ is an eigenpair for $(A, B)$,

(f) $(\lambda, B^{1/2}w)$ is an eigenpair for $B^{-1/2}AB^{-1/2}$,

(g) $(\lambda, w)$ is an eigenpair for $B^{-1}A$.
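The eigenvalues $\mathrm{ch}(A, B)$ can be computed in several equivalent ways; the sketch below (added, assuming SciPy and NumPy are available) compares `scipy.linalg.eigh(A, B)` with the eigenvalues of $B^{-1}A$ and of $B^{-1/2}AB^{-1/2}$.

```python
# Sketch: three equivalent computations of ch(A, B) for symmetric A, pd B.
import numpy as np
from scipy.linalg import eigh, fractional_matrix_power

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4)); A = (A + A.T) / 2
C = rng.standard_normal((4, 4)); B = C @ C.T + 4 * np.eye(4)   # pd B

mu_gen = eigh(A, B, eigvals_only=True)                          # eigenvalues of (A, B)
mu_inv = np.sort(np.linalg.eigvals(np.linalg.inv(B) @ A).real)  # ch(B^{-1} A)
Bmh = fractional_matrix_power(B, -0.5).real                     # B^{-1/2}
mu_sym = np.sort(np.linalg.eigvalsh(Bmh @ A @ Bmh))             # ch(B^{-1/2} A B^{-1/2})
assert np.allclose(mu_gen, mu_inv) and np.allclose(mu_gen, mu_sym)
```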
22.23 Rewriting (b) above as $(A - \lambda B)w = 0$, we observe that nontrivial solutions $w$ for (b) exist iff $\det(A - \lambda B) = 0$. The expression $A - \lambda B$, with indeterminate $\lambda$, is called a matrix pencil or simply a pencil.
22.24 Let $A_{n\times n}$ be symmetric and $B_{n\times n}$ positive definite. Then

(a) $\max\limits_{x \ne 0}\dfrac{x'Ax}{x'Bx} = \max\limits_{x \ne 0}\dfrac{x'B^{1/2}\cdot B^{-1/2}AB^{-1/2}\cdot B^{1/2}x}{x'B^{1/2}\cdot B^{1/2}x} = \max\limits_{z \ne 0}\dfrac{z'B^{-1/2}AB^{-1/2}z}{z'z}$
$$= \mathrm{ch}_1(B^{-1/2}AB^{-1/2}) = \mathrm{ch}_1(B^{-1}A) =: \mu_1 = \text{the largest root of } \det(A - \mu B) = 0.$$

(b) Denote $W_i = (w_1 : \ldots : w_i)$. The vectors $w_1, \ldots, w_n$ satisfy
$$\max_{x \ne 0}\frac{x'Ax}{x'Bx} = \frac{w_1'Aw_1}{w_1'Bw_1} = \mathrm{ch}_1(B^{-1}A) = \mu_1,$$
$$\max_{W_{i-1}'Bx = 0}\frac{x'Ax}{x'Bx} = \frac{w_i'Aw_i}{w_i'Bw_i} = \mathrm{ch}_i(B^{-1}A) = \mu_i, \qquad i > 1;$$
$w_i$ is an eigenvector of $B^{-1}A$ corresponding to the eigenvalue $\mathrm{ch}_i(B^{-1}A) = \mu_i$, i.e., $\mu_i$ is the $i$th largest root of $\det(A - \mu B) = 0$.

(c) $\max\limits_{x \ne 0}\dfrac{(a'x)^2}{x'Bx} = \max\limits_{x \ne 0}\dfrac{x'aa'x}{x'Bx} = \mathrm{ch}_1(aa'B^{-1}) = a'B^{-1}a$.

22.25 Let $B_{n\times n}$ be nonnegative definite and $a \in \mathscr{C}(B)$. Then
$$\max_{Bx \ne 0}\frac{(a'x)^2}{x'Bx} = a'B^-a,$$
where the equality is obtained iff $Bx = \alpha a$ for some $\alpha \in \mathbb{R}$.


22.26 Let $V_{p\times p}$ be nnd with $V_\delta = \mathrm{diag}(V)$ being pd. Then
$$\max_{a \ne 0}\frac{a'Va}{a'V_\delta a} = \mathrm{ch}_1(V_\delta^{-1/2}VV_\delta^{-1/2}) = \mathrm{ch}_1(R_V),$$
where $R_V = V_\delta^{-1/2}VV_\delta^{-1/2}$ can be considered as a correlation matrix. Moreover,
$$\frac{a'Va}{a'V_\delta a} \le p \qquad \text{for all } a \in \mathbb{R}^p,$$
where the equality is obtained iff $V = \sigma^2qq'$ for some $\sigma \in \mathbb{R}$ and some $q = (q_1, \ldots, q_p)'$, and $a$ is a multiple of $a_* = V_\delta^{-1/2}1 = \sigma^{-1}(1/q_1, \ldots, 1/q_p)'$.

22.27 Simultaneous diagonalization. Let $A_{n\times n}$ and $B_{n\times n}$ be symmetric. Then there exists an orthogonal matrix $Q$ such that $Q'AQ$ and $Q'BQ$ are both diagonal iff $AB = BA$.

22.28 Consider the symmetric matrices $A_{n\times n}$ and $B_{n\times n}$.

(a) If $B$ is pd, then there exists a nonsingular matrix $Q_{n\times n}$ such that
$$Q'AQ = \Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n), \qquad Q'BQ = I_n,$$
where $\mathrm{ch}(B^{-1}A) = \{\lambda_1, \ldots, \lambda_n\}$. The columns of $Q$ are the eigenvectors of $B^{-1}A$; $Q$ is not necessarily orthogonal.

(b) Let $B$ be nnd with $r(B) = b$. Then there exists a matrix $L_{n\times b}$ such that
$$L'AL = \mathrm{diag}(\lambda_1, \ldots, \lambda_b), \qquad L'BL = I_b.$$

(c) Let $B$ be nnd with $r(B) = b$, and assume that $r(N'AN) = r(N'A)$, where $N = B^{\perp}$. Then there exists a nonsingular matrix $Q_{n\times n}$ such that
$$Q'AQ = \begin{pmatrix} \Lambda_1 & 0 \\ 0 & \Lambda_2 \end{pmatrix}, \qquad Q'BQ = \begin{pmatrix} I_b & 0 \\ 0 & 0 \end{pmatrix},$$
where $\Lambda_1 \in \mathbb{R}^{b\times b}$ and $\Lambda_2 \in \mathbb{R}^{(n-b)\times(n-b)}$ are diagonal matrices.
22.29 As in 22.22 (p. 101), consider the eigenvalues of $A$ w.r.t. nnd $B$ but allow now $B$ to be singular. Let $\lambda$ be a scalar and $w$ a vector such that

(a) $Aw = \lambda Bw$, i.e., $(A - \lambda B)w = 0$; $\quad Bw \ne 0$.

Scalar $\lambda$ is a proper eigenvalue and $w$ a proper eigenvector of $A$ with respect to $B$. The nontrivial solutions $w$ for (a) above exist iff $\det(A - \lambda B) = 0$. The matrix pencil $A - \lambda B$, with indeterminate $\lambda$, is said to be singular if $\det(A - \lambda B) = 0$ is satisfied for all $\lambda$; otherwise the pencil is regular.

22.30 If $\mu_i$ is the $i$th largest proper eigenvalue of $A$ with respect to $B$, then we write
$$\mathrm{ch}_i(A, B) = \mu_i, \qquad \mu_1 \ge \mu_2 \ge \cdots \ge \mu_b, \quad b = r(B),$$
$$\mathrm{ch}(A, B) = \{\mu_1, \ldots, \mu_b\} = \text{set of proper eigenvalues of } A \text{ w.r.t. } B,$$
$$\mathrm{nzch}(A, B) = \text{set of nonzero proper eigenvalues of } A \text{ w.r.t. } B.$$


22.31 Let $A_{n\times n}$ be symmetric, $B_{n\times n}$ nnd, and $r(B) = b$, and assume that $r(N'AN) = r(N'A)$, where $N = B^{\perp}$. Then there are precisely $b$ proper eigenvalues of $A$ with respect to $B$, $\mathrm{ch}(A, B) = \{\mu_1, \ldots, \mu_b\}$, some of which may be repeated or null. Also $w_1, \ldots, w_b$, the corresponding eigenvectors, can be so chosen that if $w_i$ is the $i$th column of $W_{n\times b}$, then
$$W'AW = \Lambda_1 = \mathrm{diag}(\mu_1, \ldots, \mu_b), \qquad W'BW = I_b, \qquad W'AN = 0.$$

22.32 Suppose that $r(Q_BAQ_B) = r(Q_BA)$ holds; here $Q_B = I_n - P_B$. Then

(a) $\mathrm{nzch}(A, B) = \mathrm{nzch}\bigl([A - AQ_B(Q_BAQ_B)^-Q_BA]B^-\bigr)$,

(b) $\mathscr{C}(A) \subseteq \mathscr{C}(B) \implies \mathrm{nzch}(A, B) = \mathrm{nzch}(AB^-)$,

where the set $\mathrm{nzch}(AB^-)$ is invariant with respect to the choice of the $B^-$.
22.33 Consider the linear model $\{y, X\beta, V\}$. The nonzero proper eigenvalues of $V$ with respect to $H$ are the same as the nonzero eigenvalues of the covariance matrix of the $\mathrm{BLUE}(X\beta)$.
22.34 If $A_{n\times n}$ is symmetric and $B_{n\times n}$ is nnd satisfying $\mathscr{C}(A) \subseteq \mathscr{C}(B)$, then
$$\max_{Bx \ne 0}\frac{x'Ax}{x'Bx} = \frac{w_1'Aw_1}{w_1'Bw_1} = \mu_1 = \mathrm{ch}_1(B^+A) = \mathrm{ch}_1(A, B),$$
where $\mu_1$ is the largest proper eigenvalue of $A$ with respect to $B$ and $w_1$ the corresponding proper eigenvector satisfying $Aw_1 = \mu_1Bw_1$, $Bw_1 \ne 0$. If $\mathscr{C}(A) \subseteq \mathscr{C}(B)$ does not hold, then $x'Ax/x'Bx$ has no upper bound.
22.35 Under a weakly singular linear model where $r(X) = r$ we have
$$\max_{Vx \ne 0}\frac{x'Hx}{x'Vx} = \mu_1 = \mathrm{ch}_1(V^+H) = \mathrm{ch}_1(HV^+H) = 1/\mathrm{ch}_r[(HV^+H)^+],$$
and hence $1/\mu_1$ is the smallest nonzero eigenvalue of $\mathrm{cov}(X\tilde{\beta})$.
22.36 Consider symmetric $n \times n$ matrices $A$ and $B$. Then
$$\sum_{i=1}^{n}\mathrm{ch}_i(A)\,\mathrm{ch}_{n-i+1}(B) \le \mathrm{tr}(AB) \le \sum_{i=1}^{n}\mathrm{ch}_i(A)\,\mathrm{ch}_i(B).$$
Suppose that $A \in \mathbb{R}^{n\times p}$, $B \in \mathbb{R}^{p\times n}$, and $k = \min(r(A), r(B))$. Then
$$-\sum_{i=1}^{k}\mathrm{sg}_i(A)\,\mathrm{sg}_i(B) \le \mathrm{tr}(AB) \le \sum_{i=1}^{k}\mathrm{sg}_i(A)\,\mathrm{sg}_i(B).$$

22.37 Suppose that $A$, $B$ and $A - B$ are nnd $n \times n$ matrices with $r(B) = k$. Then
$$\mathrm{ch}_i(A - B) \ge \mathrm{ch}_{i+k}(A), \qquad i = 1, \ldots, n-k,$$
with equality for all $i$ iff $B = T_{(k)}\Lambda_{(k)}T_{(k)}'$, where $\Lambda_{(k)}$ comprises the first $k$ eigenvalues of $A$ and $T_{(k)} = (t_1 : \ldots : t_k)$.
22.38 For $A, B \in \mathbb{R}^{n\times m}$, where $r(A) = r$ and $r(B) = k$, we have
$$\mathrm{sg}_i(A - B) \ge \mathrm{sg}_{i+k}(A), \qquad i + k \le r,$$
and $\mathrm{sg}_i(A - B) \ge 0$ for $i + k > r$. The equality is attained above iff $k \le r$ and $B = \sum_{i=1}^{k}\mathrm{sg}_i(A)\,t_iu_i'$, where $A = \sum_{i=1}^{r}\mathrm{sg}_i(A)\,t_iu_i'$ is the SVD of $A$.

23 Singular value decomposition & other matrix decompositions


SVD Singular value decomposition. Matrix $A \in \mathbb{R}^{n\times m}_r$ $(m \le n)$ can be written as
$$A = (U_1 : U_0)\begin{pmatrix} \Delta_1 & 0 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} V_1' \\ V_0' \end{pmatrix} = U\Delta V' = U_1\Delta_1V_1' = U_*\Delta_*V' = \delta_1u_1v_1' + \cdots + \delta_ru_rv_r',$$
where $\Delta_1 = \mathrm{diag}(\delta_1, \ldots, \delta_r)$, $\delta_1 \ge \cdots \ge \delta_r > 0$, $\Delta \in \mathbb{R}^{n\times m}$, and
$$\Delta = \begin{pmatrix} \Delta_1 & 0 \\ 0 & 0 \end{pmatrix} \in \mathbb{R}^{n\times m}, \quad \Delta_1 \in \mathbb{R}^{r\times r}, \quad \Delta_* \in \mathbb{R}^{m\times m},$$
$$\Delta_* = \mathrm{diag}(\delta_1, \ldots, \delta_r, \delta_{r+1}, \ldots, \delta_m) = \text{the first $m$ rows of } \Delta, \qquad \delta_{r+1} = \delta_{r+2} = \cdots = \delta_m = 0,$$
$$\delta_i = \mathrm{sg}_i(A) = \sqrt{\mathrm{ch}_i(A'A)} = \text{$i$th singular value of } A, \quad i = 1, \ldots, m,$$
$$U_{n\times n} = (U_1 : U_0), \quad U_1 \in \mathbb{R}^{n\times r}, \quad U'U = UU' = I_n,$$
$$V_{m\times m} = (V_1 : V_0), \quad V_1 \in \mathbb{R}^{m\times r}, \quad V'V = VV' = I_m,$$
$$U_* = (u_1 : \ldots : u_m) = \text{the first $m$ columns of } U, \quad U_* \in \mathbb{R}^{n\times m},$$
$$V'A'AV = \Delta'\Delta = \Delta_*^2 = \begin{pmatrix} \Delta_1^2 & 0 \\ 0 & 0 \end{pmatrix} \in \mathbb{R}^{m\times m}, \qquad U'AA'U = \Delta\Delta' = \Delta_\#^2 = \begin{pmatrix} \Delta_1^2 & 0 \\ 0 & 0 \end{pmatrix} \in \mathbb{R}^{n\times n},$$
$u_i = $ $i$th left singular vector of $A$, $i$th eigenvector of $AA'$; $\quad v_i = $ $i$th right singular vector of $A$, $i$th eigenvector of $A'A$.
23.1 With the above notation,

(a) $\{\delta_1^2, \ldots, \delta_r^2\} = \mathrm{nzch}(A'A) = \mathrm{nzch}(AA')$,

(b) $\mathscr{C}(V_1) = \mathscr{C}(A')$, $\quad \mathscr{C}(U_1) = \mathscr{C}(A)$,

(c) $Av_i = \delta_iu_i$, $\quad A'u_i = \delta_iv_i$, $\quad i = 1, \ldots, m$,

(d) $A'u_i = 0$, $\quad i = m+1, \ldots, n$,

(e) $u_i'Av_i = \delta_i$, $i = 1, \ldots, m$; $\quad u_i'Av_j = 0$ for $i \ne j$.
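A short numerical sketch of the SVD relations in 23.1 (added; assumes NumPy): the singular values are the square roots of the eigenvalues of $A'A$, and $Av_i = \delta_iu_i$, $A'u_i = \delta_iv_i$.

```python
# Sketch: SVD relations of 23.1 for an n x m matrix with n >= m.
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 4))                    # n = 6 >= m = 4
U, d, Vt = np.linalg.svd(A, full_matrices=False)   # A = U diag(d) Vt

assert np.allclose(d**2, np.sort(np.linalg.eigvalsh(A.T @ A))[::-1])
for i in range(4):
    assert np.allclose(A @ Vt[i], d[i] * U[:, i])      # A v_i = delta_i u_i
    assert np.allclose(A.T @ U[:, i], d[i] * Vt[i])    # A' u_i = delta_i v_i
```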

23.2 Not all pairs of orthogonal $U$ and $V$ satisfying $U'AA'U = \Delta_\#^2$ and $V'A'AV = \Delta_*^2$ yield $A = U\Delta V'$.

23.3 Let $A$ have an SVD $A = U\Delta V' = \delta_1u_1v_1' + \cdots + \delta_ru_rv_r'$, $r(A) = r$. Then
$$\mathrm{sg}_1(A) = \delta_1 = \max x'Ay \quad \text{subject to } x'x = y'y = 1,$$
and the maximum is obtained when $x = u_1$ and $y = v_1$,
$$\mathrm{sg}_1^2(A) = \delta_1^2 = \max_{x \ne 0,\ y \ne 0}\frac{(x'Ay)^2}{x'x\cdot y'y} = \mathrm{ch}_1(A'A).$$
The second largest singular value $\delta_2$ can be obtained as $\delta_2 = \max x'Ay$, where the maximum is taken over the set
$$\{\, x \in \mathbb{R}^n,\ y \in \mathbb{R}^m : x'x = y'y = 1,\ x'u_1 = y'v_1 = 0 \,\}.$$
The $i$th largest singular value can be defined correspondingly as
$$\max_{\substack{x \ne 0,\ y \ne 0 \\ U_{(k)}'x = 0,\ V_{(k)}'y = 0}}\frac{x'Ay}{\sqrt{x'x\cdot y'y}} = \delta_{k+1}, \qquad k = 1, \ldots, r-1,$$
where $U_{(k)} = (u_1 : \ldots : u_k)$ and $V_{(k)} = (v_1 : \ldots : v_k)$; the maximum occurs when $x = u_{k+1}$ and $y = v_{k+1}$.
23.4 Let $A \in \mathbb{R}^{n\times m}$, $B \in \mathbb{R}^{n\times n}$, $C \in \mathbb{R}^{m\times m}$, and let $B$ and $C$ be pd. Then
$$\max_{x \ne 0,\ y \ne 0}\frac{(x'Ay)^2}{x'Bx\cdot y'Cy} = \max_{x \ne 0,\ y \ne 0}\frac{(x'B^{1/2}\cdot B^{-1/2}AC^{-1/2}\cdot C^{1/2}y)^2}{x'B^{1/2}B^{1/2}x\cdot y'C^{1/2}C^{1/2}y}$$
$$= \max_{t \ne 0,\ u \ne 0}\frac{(t'B^{-1/2}AC^{-1/2}u)^2}{t't\cdot u'u} = \mathrm{sg}_1^2(B^{-1/2}AC^{-1/2}).$$
With minor changes the above holds for possibly singular $B$ and $C$.
23.5 The matrix 2-norm (or the spectral norm) is defined as
$$\|A\|_2 = \max_{\|x\|_2 = 1}\|Ax\|_2 = \max_{x \ne 0}\left(\frac{x'A'Ax}{x'x}\right)^{1/2} = \sqrt{\mathrm{ch}_1(A'A)} = \mathrm{sg}_1(A),$$
where $\|x\|_2$ refers to the standard Euclidean vector norm, and $\mathrm{sg}_i(A) = \sqrt{\mathrm{ch}_i(A'A)} = \delta_i = $ the $i$th largest singular value of $A$. Obviously we have $\|Ax\|_2 \le \|A\|_2\|x\|_2$. Recall that
$$\|A\|_F = \sqrt{\delta_1^2 + \cdots + \delta_r^2}, \qquad \|A\|_2 = \delta_1, \qquad \text{where } r = \mathrm{rank}(A).$$


EY2 Eckart–Young theorem. Let $A_{n\times m}$ be a given matrix of rank $r$, with the singular value decomposition $A = U_1\Delta_1V_1' = \delta_1u_1v_1' + \cdots + \delta_ru_rv_r'$. Let $B$ be an $n \times m$ matrix of rank $k$ $(< r)$. Then
$$\min_{B}\|A - B\|_F^2 = \delta_{k+1}^2 + \cdots + \delta_r^2,$$
and the minimum is attained taking $B = \hat{B}_k = \delta_1u_1v_1' + \cdots + \delta_ku_kv_k'$.
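The Eckart–Young statement can be illustrated with a truncated SVD; the following sketch (added, assuming NumPy) checks that the rank-$k$ truncation attains the error $\delta_{k+1}^2 + \cdots + \delta_r^2$.

```python
# Sketch: best rank-k approximation via truncated SVD (EY2).
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((8, 5))
U, d, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
B_k = U[:, :k] @ np.diag(d[:k]) @ Vt[:k]           # truncated SVD
err = np.linalg.norm(A - B_k, 'fro')**2
assert np.isclose(err, np.sum(d[k:]**2))           # = sum of remaining delta_i^2
```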
23.6 Let $A_{n\times m}$ and $B_{m\times n}$ have the SVDs $A = U\Delta_AV'$, $B = R\Delta_BS'$, where $\Delta_A = \mathrm{diag}(\alpha_1, \ldots, \alpha_r)$, $\Delta_B = \mathrm{diag}(\beta_1, \ldots, \beta_r)$, and $r = \min(n, m)$. Then
$$|\mathrm{tr}(AXBY)| \le \alpha_1\beta_1 + \cdots + \alpha_r\beta_r$$
for all orthogonal $X_{m\times m}$ and $Y_{n\times n}$. The upper bound is attained when $X = VR'$ and $Y = SU'$.

23.7 Let $A$ and $B$ be $n \times m$ matrices with SVDs corresponding to 23.6. Then

(a) the minimum of $\|XA - BY\|_F$ when $X$ and $Y$ run through orthogonal matrices is attained at $X = RU'$ and $Y = SV'$,

(b) the minimum of $\|A - BZ\|_F$ when $Z$ varies over orthogonal matrices is attained when $Z = LK'$, where $L\Theta K'$ is the SVD of $B'A$.

FRD Full rank decomposition of $A_{n\times m}$, $r(A) = r$:
$$A = BC', \qquad \text{where } r(B_{n\times r}) = r(C_{m\times r}) = r.$$

CHO Cholesky decomposition. Let $A_{n\times n}$ be nnd. Then there exists a lower triangular matrix $U_{n\times n}$ having all $u_{ii} \ge 0$ such that $A = UU'$.

QRD QR-decomposition. Matrix $A_{n\times m}$ can be expressed as $A = Q_{n\times n}R_{n\times m}$, where $Q$ is orthogonal and $R$ is upper triangular.

POL Polar decomposition. Matrix $A_{n\times m}$ $(n \le m)$ can be expressed as $A = P_{n\times n}U_{n\times m}$, where $P$ is nonnegative definite with $r(P) = r(A)$ and $UU' = I_n$.

SCH Schur's triangularization theorem. Let $A_{n\times n}$ have real eigenvalues $\lambda_1, \ldots, \lambda_n$. Then there exists an orthogonal $U_{n\times n}$ such that $U'AU = T$, where $T$ is an upper-triangular matrix with the $\lambda_i$'s as its diagonal elements.
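A sketch of three of these factorizations using NumPy/SciPy (added; the library routines follow the usual conventions: `numpy.linalg.cholesky` returns the lower-triangular factor, `numpy.linalg.qr` gives $A = QR$, and `scipy.linalg.schur` gives the real Schur form).

```python
# Sketch: Cholesky, QR and Schur factorizations of small example matrices.
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(6)
F = rng.standard_normal((4, 4))
A = F @ F.T                                # pd matrix

L = np.linalg.cholesky(A)                  # lower triangular, A = L L'
assert np.allclose(A, L @ L.T)

Q, R = np.linalg.qr(F)                     # F = Q R, Q orthogonal, R upper triangular
assert np.allclose(F, Q @ R)

S = (F + F.T) / 2                          # symmetric => real eigenvalues
T, U = schur(S)                            # S = U T U'; T is diagonal here
assert np.allclose(S, U @ T @ U.T)
```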
ROT Orthogonal rotation. Denote
$$A_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}, \qquad B_\theta = \begin{pmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{pmatrix}.$$
Then any $2 \times 2$ orthogonal matrix $Q$ is $A_\theta$ or $B_\theta$ for some $\theta$. Transformation $A_\theta u_{(i)}$ rotates the observation $u_{(i)}$ by the angle $\theta$ in the counter-clockwise direction, and $B_\theta u_{(i)}$ makes the reflection of the observation $u_{(i)}$ w.r.t. the line $y = \tan\bigl(\tfrac{\theta}{2}\bigr)x$. Matrix
$$C_\theta = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}$$
carries out an orthogonal rotation clockwise.
HSD Hartwig–Spindelböck decomposition. Let $A \in \mathbb{R}^{n\times n}$ be of rank $r$. Then there exists an orthogonal $U \in \mathbb{R}^{n\times n}$ such that
$$A = U\begin{pmatrix} \Sigma K & \Sigma L \\ 0 & 0 \end{pmatrix}U', \qquad \text{where } \Sigma = \mathrm{diag}(\sigma_1I_{r_1}, \ldots, \sigma_tI_{r_t}),$$
with $\Sigma$ being the diagonal matrix of singular values of $A$, $\sigma_1 > \cdots > \sigma_t > 0$, $r_1 + \cdots + r_t = r$, while $K \in \mathbb{R}^{r\times r}$, $L \in \mathbb{R}^{r\times(n-r)}$ satisfy $KK' + LL' = I_r$.
23.8 Consider the matrix $A \in \mathbb{R}^{n\times n}$ with representation HSD. Then:

(a) $A'A = U\begin{pmatrix} K'\Sigma^2K & K'\Sigma^2L \\ L'\Sigma^2K & L'\Sigma^2L \end{pmatrix}U'$, $\quad AA' = U\begin{pmatrix} \Sigma^2 & 0 \\ 0 & 0 \end{pmatrix}U'$,
$$A^+A = U\begin{pmatrix} K'K & K'L \\ L'K & L'L \end{pmatrix}U', \qquad AA^+ = U\begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}U',$$

(b) $A$ is an oblique projector, i.e., $A^2 = A$, iff $\Sigma K = I_r$,

(c) $A$ is an orthogonal projector iff $L = 0$, $\Sigma = I_r$, $K = I_r$.
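One way to obtain HSD factors from an SVD $A = U\Delta V'$ is to take $\Sigma = \Delta_1$, $K = V_1'U_1$ and $L = V_1'U_0$; this explicit construction is an added illustration (it is not stated in the text above), and the sketch below (assuming NumPy) verifies $KK' + LL' = I_r$ and the block form of $A$.

```python
# Sketch: constructing Hartwig-Spindelboeck factors from a full SVD.
# The choice K = V1'U1, L = V1'U0 is an assumption of this illustration.
import numpy as np

rng = np.random.default_rng(7)
G = rng.standard_normal((5, 3))
A = G @ rng.standard_normal((3, 5))        # n = 5, rank r = 3

U, d, Vt = np.linalg.svd(A)                # full SVD: U is 5 x 5
r = int(np.sum(d > 1e-10))
Sigma = np.diag(d[:r])
V1 = Vt[:r].T                              # first r right singular vectors
K, L = V1.T @ U[:, :r], V1.T @ U[:, r:]

assert np.allclose(K @ K.T + L @ L.T, np.eye(r))
top = np.hstack([Sigma @ K, Sigma @ L])
block = np.vstack([top, np.zeros((5 - r, 5))])
assert np.allclose(A, U @ block @ U.T)
```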

23.9 A matrix $E \in \mathbb{R}^{n\times n}$ is a general permutation matrix if it is a product of elementary permutation matrices $E_{ij}$; $E_{ij}$ is the identity matrix $I_n$ with the $i$th and $j$th rows (or equivalently columns) interchanged.

23.10 (a) $A_{n\times n}$ is nonnegative if all $a_{ij} \ge 0$,

(b) $A_{n\times n}$ is reducible if there is a permutation matrix $Q$ such that
$$Q'AQ = \begin{pmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{pmatrix},$$
where $A_{11}$ and $A_{22}$ are square, and it is otherwise irreducible.
PFT Perron–Frobenius theorem. If $A_{n\times n}$ is nonnegative and irreducible, then

(a) $A$ has a positive eigenvalue, $\varrho$, equal to the spectral radius of $A$, $\rho(A) = \max\{|\mathrm{ch}_i(A)|\}$; the eigenvalue corresponding to $\rho(A)$ is called the dominant eigenvalue,

(b) $\varrho$ has multiplicity 1,

(c) there is a positive eigenvector (all elements $> 0$) corresponding to $\varrho$.


23.11 Stochastic matrix. If $A \in \mathbb{R}^{n\times n}$ is nonnegative and $A1_n = 1_n$, then $A$ is a stochastic matrix. If also $1_n'A = 1_n'$, then $A$ is a doubly stochastic matrix. If, in addition, both diagonal sums are 1, then $A$ is a superstochastic matrix.
23.12 Magic square. The matrix
$$A = \begin{pmatrix} 16 & 3 & 2 & 13 \\ 5 & 10 & 11 & 8 \\ 9 & 6 & 7 & 12 \\ 4 & 15 & 14 & 1 \end{pmatrix},$$
appearing in Albrecht Dürer's copper-plate engraving Melencolia I, is a magic square, i.e., a $k \times k$ array such that the numbers in every row, column and in each of the two main diagonals add up to the same magic sum, 34 in this case. The matrix $A$ here defines a classic magic square since the entries in $A$ are the consecutive integers $1, 2, \ldots, k^2$. The Moore–Penrose inverse $A^+$ is also a magic square (though not a classic one) and its magic sum is $1/34$.
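A quick numerical check of the Moore–Penrose claim (added sketch, assuming NumPy): the row, column and diagonal sums of $A^+$ are computed directly.

```python
# Sketch: the Moore-Penrose inverse of Duerer's magic square has magic sum 1/34.
import numpy as np

A = np.array([[16,  3,  2, 13],
              [ 5, 10, 11,  8],
              [ 9,  6,  7, 12],
              [ 4, 15, 14,  1]], dtype=float)
assert np.allclose(A.sum(axis=0), 34) and np.allclose(A.sum(axis=1), 34)

Ap = np.linalg.pinv(A)
sums = np.concatenate([Ap.sum(axis=0), Ap.sum(axis=1),
                       [np.trace(Ap), np.trace(np.fliplr(Ap))]])
assert np.allclose(sums, 1 / 34)           # rows, columns and both diagonals
```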

24 Löwner ordering
24.1 Let $A_{n\times n}$ be symmetric.

(a) The following statements are equivalent:

(i) $A$ is positive definite; shortly $A \in \mathrm{PD}_n$, $A >_L 0$

(ii) $x'Ax > 0$ for all vectors $x \ne 0$

(iii) $\mathrm{ch}_i(A) > 0$ for $i = 1, \ldots, n$

(iv) $A = FF'$ for some $F_{n\times n}$, $r(F) = n$

(v) all leading principal minors $> 0$

(vi) $A^{-1}$ is positive definite

(b) The following statements are equivalent:

(i) $A$ is nonnegative definite; shortly $A \in \mathrm{NND}_n$, $A \ge_L 0$

(ii) $x'Ax \ge 0$ for all vectors $x$

(iii) $\mathrm{ch}_i(A) \ge 0$ for $i = 1, \ldots, n$

(iv) $A = FF'$ for some matrix $F$

(v) all principal minors $\ge 0$
24.2 $A \le_L B \iff B - A \ge_L 0 \iff B - A = FF'$ for some matrix $F$ (definition of Löwner ordering)
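A numerical test of the Löwner ordering (added sketch assuming NumPy; the helper name `loewner_leq` is an illustrative choice, not from the text): $A \le_L B$ is checked by testing whether $B - A$ is symmetric and nonnegative definite.

```python
# Sketch: check A <=_L B by testing whether B - A is nnd.
import numpy as np

def loewner_leq(A, B, tol=1e-10):
    """True if A <=_L B, i.e. B - A is symmetric nonnegative definite."""
    D = B - A
    return np.allclose(D, D.T) and np.linalg.eigvalsh(D).min() >= -tol

F = np.array([[1.0, 0.5], [0.0, 1.0]])
B = np.eye(2) + F @ F.T                  # B = A + FF' with A = I
assert loewner_leq(np.eye(2), B)
assert not loewner_leq(B, np.eye(2))
```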

24.3 Let $A, B \in \mathrm{NND}_n$. Then

(a) $A \le_L B \iff \mathscr{C}(A) \subseteq \mathscr{C}(B)$ and $\mathrm{ch}_1(AB^-) \le 1$, in which case $\mathrm{ch}_1(AB^-)$ is invariant with respect to $B^-$,

(b) $A \le_L B \iff \mathscr{C}(A) \subseteq \mathscr{C}(B)$ and $AB^-A \le_L A$.

24.4 Let $A \le_L B$ where $A >_L 0$ and $B \ge_L 0$. Then $\det(A) = \det(B) \implies A = B$.

24.5 Albert's theorem. Let $A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}$ be a symmetric matrix where $A_{11}$ is a square matrix. Then the following three statements are equivalent:

(a) $A \ge_L 0$,

(b$_1$) $A_{11} \ge_L 0$, $\quad$ (b$_2$) $\mathscr{C}(A_{12}) \subseteq \mathscr{C}(A_{11})$, $\quad$ (b$_3$) $A_{22} - A_{21}A_{11}^-A_{12} \ge_L 0$,

(c$_1$) $A_{22} \ge_L 0$, $\quad$ (c$_2$) $\mathscr{C}(A_{21}) \subseteq \mathscr{C}(A_{22})$, $\quad$ (c$_3$) $A_{11} - A_{12}A_{22}^-A_{21} \ge_L 0$.

24.6 (Continued . . . ) The following three statements are equivalent:

(a) $A >_L 0$,

(b$_1$) $A_{11} >_L 0$, $\quad$ (b$_2$) $A_{22} - A_{21}A_{11}^{-1}A_{12} >_L 0$,

(c$_1$) $A_{22} >_L 0$, $\quad$ (c$_2$) $A_{11} - A_{12}A_{22}^{-1}A_{21} >_L 0$.

24.7 The following three statements are equivalent:

(a) $A = \begin{pmatrix} B & b \\ b' & \alpha \end{pmatrix} \ge_L 0$,

(b) $B - bb'/\alpha \ge_L 0$,

(c) $B \ge_L 0$, $\quad b \in \mathscr{C}(B)$, $\quad b'B^-b \le \alpha$.

24.8 Let $U$ and $V$ be pd. Then:

(a) $U - B'V^{-1}B >_L 0 \iff V - BU^{-1}B' >_L 0$,

(b) $U - B'V^{-1}B \ge_L 0 \iff V - BU^{-1}B' \ge_L 0$.

24.9 Let $A >_L 0$ and $B >_L 0$. Then: $A <_L B \iff A^{-1} >_L B^{-1}$.

24.10 Let $A \ge_L 0$ and $B \ge_L 0$. Then any two of the following conditions imply the third: $A \le_L B$, $A^+ \ge_L B^+$, $r(A) = r(B)$.

24.11 Let $A$ be symmetric. Then $A - BB' \ge_L 0$ iff
$$A \ge_L 0, \qquad \mathscr{C}(B) \subseteq \mathscr{C}(A), \qquad \text{and} \qquad I - B'A^-B \ge_L 0.$$

24.12 Consider a symmetric matrix
$$R = \begin{pmatrix} 1 & r_{12} & r_{13} \\ r_{21} & 1 & r_{23} \\ r_{31} & r_{32} & 1 \end{pmatrix}, \qquad \text{where all } r_{ij}^2 \le 1.$$
Then $R$ is a correlation matrix iff $R$ is nnd, which holds iff $\det(R) = 1 - r_{12}^2 - r_{13}^2 - r_{23}^2 + 2r_{12}r_{13}r_{23} \ge 0$, or, equivalently, $(r_{12} - r_{13}r_{23})^2 \le (1 - r_{13}^2)(1 - r_{23}^2)$.
24.13 The inertia $\mathrm{In}(A)$ of a symmetric $A_{n\times n}$ is defined as the triple $\{\pi, \nu, \delta\}$, where $\pi$ is the number of positive eigenvalues of $A$, $\nu$ is the number that are negative, and $\delta$ is the number that are zero. Thus $\pi + \nu = r(A)$ and $\pi + \nu + \delta = n$. Matrix $A$ is nnd if $\nu = 0$, and pd if $\nu = \delta = 0$.


24.14 The inertia of the symmetric partitioned matrix $A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}$ satisfies

(a) $\mathrm{In}(A) = \mathrm{In}(A_{11}) + \mathrm{In}(A_{22} - A_{21}A_{11}^-A_{12})$ if $\mathscr{C}(A_{12}) \subseteq \mathscr{C}(A_{11})$,

(b) $\mathrm{In}(A) = \mathrm{In}(A_{22}) + \mathrm{In}(A_{11} - A_{12}A_{22}^-A_{21})$ if $\mathscr{C}(A_{21}) \subseteq \mathscr{C}(A_{22})$.

24.15 The minus (or rank-subtractivity) partial ordering for $A_{n\times m}$ and $B_{n\times m}$:
$$A \le^- B \iff r(B - A) = r(B) - r(A),$$
or equivalently, $A \le^- B$ holds iff any of the following conditions holds:

(a) $A^-A = A^-B$ and $AA^- = BA^-$ for some $A^- \in \{A^-\}$,

(b) $A_1^-A = A_1^-B$ and $AA_2^- = BA_2^-$ for some $A_1^-, A_2^- \in \{A^-\}$,

(c) $\{B^-\} \subseteq \{A^-\}$,

(d) $\mathscr{C}(A) \cap \mathscr{C}(B - A) = \{0\}$ and $\mathscr{C}(A') \cap \mathscr{C}(B' - A') = \{0\}$,

(e) $\mathscr{C}(A) \subseteq \mathscr{C}(B)$, $\mathscr{C}(A') \subseteq \mathscr{C}(B')$ and $AB^-A = A$ for some (and hence for all) $B^-$.
24.16 Let $V \in \mathrm{NND}_n$, $X \in \mathbb{R}^{n\times p}$, and denote
$$\mathcal{U} = \{\, U : 0 \le_L U \le_L V,\ \mathscr{C}(U) \subseteq \mathscr{C}(X) \,\}.$$
The maximal element (in the Löwner partial ordering) of $\mathcal{U}$ is the shorted matrix of $V$ with respect to $X$, and is denoted as $\mathrm{Sh}(V \mid X)$.
24.17 The shorted matrix $S = \mathrm{Sh}(V \mid X)$ has the following properties:

(a) $S = \mathrm{cov}(X\tilde{\beta})$ under $\{y, X\beta, V\}$,

(b) $\mathscr{C}(S) = \mathscr{C}(X) \cap \mathscr{C}(V)$,

(c) $\mathscr{C}(V) = \mathscr{C}(S) \oplus \mathscr{C}(V - S)$,

(d) $r(V) = r(S) + r(V - S)$,

(e) $SV^-(V - S) = 0$,

(f) $V^- \in \{S^-\}$ for some (and hence for all) $V^- \in \{V^-\}$, i.e., $\{V^-\} \subseteq \{S^-\}$.


25 Inequalities
CSI $\quad (x'y)^2 \le x'x\cdot y'y \qquad$ Cauchy–Schwarz inequality, with equality holding iff $x$ and $y$ are linearly dependent

25.1 Recall: (a) Equality holds in CSI if $x = 0$ or $y = 0$. (b) The nonnull vectors $x$ and $y$ are linearly dependent (l.d.) iff $x = \lambda y$ for some $\lambda \in \mathbb{R}$.

In what follows, the matrix $V_{n\times n}$ is nnd or pd, depending on the case. The ordered eigenvalues are $\lambda_1 \ge \cdots \ge \lambda_n$ and the corresponding orthonormal eigenvectors are $t_1, \ldots, t_n$.
25.2 $(x'Vy)^2 \le x'Vx\cdot y'Vy$ $\qquad$ equality iff $Vx$ and $Vy$ are l.d.

25.3 $(x'y)^2 \le x'Vx\cdot y'V^{-1}y$ $\qquad$ equality iff $x$ and $Vy$ are l.d.; $V$ is pd

25.4 $(x'P_Vy)^2 \le x'Vx\cdot y'V^+y$ $\qquad$ equality iff $V^{1/2}x$ and $V^{+1/2}y$ are l.d.

25.5 $(x'y)^2 \le x'Vx\cdot y'V^-y$ $\qquad$ for all $y \in \mathscr{C}(V)$

25.6 $(x'x)^2 \le x'Vx\cdot x'V^{-1}x$, i.e., $\dfrac{(x'x)^2}{x'Vx\cdot x'V^{-1}x} \le 1$, where the equality holds (assuming $x \ne 0$) iff $x$ is an eigenvector of $V$: $Vx = \lambda x$ for some $\lambda \in \mathbb{R}$

25.7 $(x'Vx)^{-1} \le x'V^{-1}x$, where $x'x = 1$

25.8 Let $V \in \mathrm{NND}_n$ with $V^{\{p\}}$ defined as
$$V^{\{p\}} = V^p, \quad p = 1, 2, \ldots; \qquad V^{\{0\}} = P_V; \qquad V^{\{p\}} = (V^+)^{|p|}, \quad p = -1, -2, \ldots$$
Then $(x'V^{\{(h+k)/2\}}y)^2 \le (x'V^{\{h\}}x)(y'V^{\{k\}}y)$ for $h, k = \ldots, -1, 0, 1, 2, \ldots$, with equality iff $Vx \propto V^{\{1+(k-h)/2\}}y$.
KI $\quad \dfrac{(x'x)^2}{x'Vx\cdot x'V^{-1}x} \ge \dfrac{4\lambda_1\lambda_n}{(\lambda_1 + \lambda_n)^2} =: \tau_1^2 \qquad$ Kantorovich inequality, $\lambda_i = \mathrm{ch}_i(V)$, $V \in \mathrm{PD}_n$

25.9 Equality holds in the Kantorovich inequality KI when $x$ is proportional to $t_1 \pm t_n$, where $t_1$ and $t_n$ are orthonormal eigenvectors of $V$ corresponding to $\lambda_1$ and $\lambda_n$; when the eigenvalues $\lambda_1$ and $\lambda_n$ are both simple (i.e., each has multiplicity 1) then this condition is also necessary.
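A numerical sketch of KI and 25.9 (added; assumes NumPy): the Kantorovich ratio of an arbitrary vector stays above $4\lambda_1\lambda_n/(\lambda_1 + \lambda_n)^2$, and $x = t_1 + t_n$ attains the bound.

```python
# Sketch: Kantorovich inequality and its attainment at x = t1 + tn.
import numpy as np

rng = np.random.default_rng(8)
F = rng.standard_normal((5, 5))
V = F @ F.T + np.eye(5)                      # pd V
lam, T = np.linalg.eigh(V)                   # ascending: lam[0] = l_n, lam[-1] = l_1
l1, ln = lam[-1], lam[0]
bound = 4 * l1 * ln / (l1 + ln)**2

def ratio(x):
    return (x @ x)**2 / ((x @ V @ x) * (x @ np.linalg.inv(V) @ x))

x = rng.standard_normal(5)
assert ratio(x) >= bound - 1e-12
assert np.isclose(ratio(T[:, -1] + T[:, 0]), bound)   # x = t1 + tn attains the bound
```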

25.10 $\tau_1 = \dfrac{\sqrt{\lambda_1\lambda_n}}{(\lambda_1 + \lambda_n)/2} = $ the first antieigenvalue of $V$
25.11 $\dfrac{1}{\tau_1^2} = \dfrac{(\lambda_1 + \lambda_n)^2}{4\lambda_1\lambda_n} = \dfrac{(\lambda_1 + \lambda_n)(1/\lambda_1 + 1/\lambda_n)}{4} = \left(\dfrac{\lambda_1 + \lambda_n}{2\sqrt{\lambda_1\lambda_n}}\right)^2$

25.12 $x'Vx \le \dfrac{1}{x'V^{-1}x} + \left(\sqrt{\lambda_1} - \sqrt{\lambda_n}\right)^2 \qquad (x'x = 1)$

WI Wielandt inequality. Consider $V \in \mathrm{NND}_n$ with $r(V) = v$ and let $\lambda_i$ be the $i$th largest eigenvalue of $V$ and $t_i$ the corresponding eigenvector. Let $x \in \mathscr{C}(V)$ and $y$ be nonnull vectors satisfying the condition $x'y = 0$. Then
$$\frac{(x'Vy)^2}{x'Vx\cdot y'Vy} \le \left(\frac{\lambda_1 - \lambda_v}{\lambda_1 + \lambda_v}\right)^2 = 1 - \frac{4\lambda_1\lambda_v}{(\lambda_1 + \lambda_v)^2} =: \nu_v^2.$$
The upper bound is attained when $x = t_1 + t_v$ and $y = t_1 - t_v$.

25.13 If $X \in \mathbb{R}^{n\times p}$ and $Y \in \mathbb{R}^{n\times q}$ then
$$0 \le_L Y'(I_n - P_X)Y \le_L (Y - XB)'(Y - XB) \qquad \text{for all } B_{p\times q}.$$

25.14 $|X'Y|^2 \le |X'X|\cdot|Y'Y|$ for all $X_{n\times p}$ and $Y_{n\times p}$

25.15 Let $V \in \mathrm{NND}_n$ with $V^{\{p\}}$ defined as in 25.8 (p. 112). Then
$$X'V^{\{(h+k)/2\}}Y(Y'V^{\{k\}}Y)^-Y'V^{\{(h+k)/2\}}X \le_L X'V^{\{h\}}X \qquad \text{for all } X \text{ and } Y,$$
with equality holding iff $\mathscr{C}(V^{\{h/2\}}X) \subseteq \mathscr{C}(V^{\{k/2\}}Y)$.


25.16 $X'P_VX(X'VX)^-X'P_VX \le_L X'V^+X$ $\qquad$ equality iff $\mathscr{C}(VX) = \mathscr{C}(P_VX)$

25.17 $X'V^+X \le_L \dfrac{(\lambda_1 + \lambda_v)^2}{4\lambda_1\lambda_v}\,X'P_VX(X'VX)^-X'P_VX$ $\qquad r(V) = v$

25.18 $X'VX \le_L \dfrac{(\lambda_1 + \lambda_v)^2}{4\lambda_1\lambda_v}\,X'P_VX(X'V^+X)^-X'P_VX$ $\qquad r(V) = v$

25.19 $X'V^{-1}X \le_L \dfrac{(\lambda_1 + \lambda_n)^2}{4\lambda_1\lambda_n}\,X'X(X'VX)^{-1}X'X$ $\qquad V$ pd

25.20 If $X'P_VX$ is idempotent, then $(X'VX)^+ \le_L X'V^+X$, where the equality holds iff $P_VXX'VX = VX$.

25.21 $(X'VX)^+ \le_L X'V^{-1}X$, if $V$ is pd and $XX'X = X$


25.22 Under the full rank model $\{y, X\beta, V\}$ we have

(a) $\mathrm{cov}(\tilde{\beta}) \le_L \mathrm{cov}(\hat{\beta}) \le_L \dfrac{(\lambda_1 + \lambda_n)^2}{4\lambda_1\lambda_n}\,\mathrm{cov}(\tilde{\beta})$,

(b) $(X'V^{-1}X)^{-1} \le_L (X'X)^{-1}X'VX(X'X)^{-1} \le_L \dfrac{(\lambda_1 + \lambda_n)^2}{4\lambda_1\lambda_n}\,(X'V^{-1}X)^{-1}$,

(c) $\mathrm{cov}(\hat{\beta}) - \mathrm{cov}(\tilde{\beta}) = X'VX - (X'V^{-1}X)^{-1} \le_L \left(\sqrt{\lambda_1} - \sqrt{\lambda_n}\right)^2 I_p$, $\quad X'X = I_p$.
25.23 Consider $V \in \mathrm{NND}_n$, $X_{n\times p}$ and $Y_{n\times q}$, satisfying $X'P_VY = 0$. Then

(a) $X'VY(Y'VY)^-Y'VX \le_L \left(\dfrac{\lambda_1 - \lambda_v}{\lambda_1 + \lambda_v}\right)^2 X'VX$, $\qquad r(V) = v$,

(b) $V - P_VX(X'V^+X)^-X'P_V \ge_L VY(Y'VY)^-Y'V$,

(c) $X'VX - X'P_VX(X'V^+X)^-X'P_VX \ge_L X'VY(Y'VY)^-Y'VX$,

where the equality holds iff $r(V) = r(VX) + r(VY)$.
25.24 Samuelson's inequality. Consider the $n \times k$ data matrix $X_0 = (x_{(1)} : \ldots : x_{(n)})'$ and denote $\bar{x} = (\bar{x}_1, \ldots, \bar{x}_k)'$, and $S_{xx} = \frac{1}{n-1}X_0'CX_0$. Then
$$\frac{(n-1)^2}{n}\,S_{xx} - (x_{(j)} - \bar{x})(x_{(j)} - \bar{x})' \ge_L 0,$$
or equivalently,
$$(x_{(j)} - \bar{x})'S_{xx}^{-1}(x_{(j)} - \bar{x}) = \mathrm{MHLN}^2(x_{(j)}, \bar{x}, S_{xx}) \le \frac{(n-1)^2}{n}, \qquad j = 1, \ldots, n.$$
The equality above holds iff all $x_{(i)}$ different from $x_{(j)}$ coincide with their mean.

25.25 Let $V_{p\times p}$ be nnd. Then $1'V1 \le p\,\mathrm{tr}(V)$, i.e., assuming $1'V1 \ne 0$,
$$\alpha(1) := \frac{p}{p-1}\left(1 - \frac{\mathrm{tr}(V)}{1'V1}\right) \le 1,$$
where the equality is obtained iff $V = \sigma^211'$ for some $\sigma \in \mathbb{R}$. The term $\alpha(1)$ can be interpreted as Cronbach's alpha.
25.26 Using the notation above, and assuming that $\mathrm{diag}(V) = V_\delta$ is pd [see 22.26 (p. 102)],
$$\alpha(a) = \frac{p}{p-1}\left(1 - \frac{a'V_\delta a}{a'Va}\right) \le \frac{p}{p-1}\left(1 - \frac{1}{\mathrm{ch}_1(R_V)}\right)$$
for all $a \in \mathbb{R}^p$, where $R_V$ is the correlation matrix calculated from the covariance matrix $V$. Moreover, the equality $\alpha(a) = 1$ is obtained iff $V = \sigma^2qq'$ for some $\sigma \in \mathbb{R}$ and some $q = (q_1, \ldots, q_p)'$, and $a$ is a multiple of $a_* = (1/q_1, \ldots, 1/q_p)'$.
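An added sketch (assuming NumPy; the mixing matrix used to generate an example covariance matrix is arbitrary) computing Cronbach's alpha $\alpha(1)$ of 25.25 and the upper bound of 25.26 through $\mathrm{ch}_1(R_V)$.

```python
# Sketch: Cronbach's alpha alpha(1) and the eigenvalue upper bound of 25.26.
import numpy as np

rng = np.random.default_rng(9)
M = np.array([[1.0, 0.5, 0.3, 0.2],
              [0.0, 1.0, 0.4, 0.1],
              [0.0, 0.0, 1.0, 0.6],
              [0.0, 0.0, 0.0, 1.0]])        # arbitrary mixing matrix
X = rng.standard_normal((50, 4)) @ M
V = np.cov(X, rowvar=False)                 # sample covariance matrix
p = V.shape[0]
one = np.ones(p)

alpha = p / (p - 1) * (1 - np.trace(V) / (one @ V @ one))
D = np.diag(1 / np.sqrt(np.diag(V)))
R_V = D @ V @ D                             # correlation matrix
upper = p / (p - 1) * (1 - 1 / np.linalg.eigvalsh(R_V).max())
assert alpha <= upper + 1e-12 <= 1 + 1e-12
```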

26 Kronecker product, some matrix derivatives


26.1 The Kronecker product of $A_{n\times m} = (a_1 : \ldots : a_m)$ and $B_{p\times q}$, and $\mathrm{vec}(A)$, are defined as
$$A \otimes B = \begin{pmatrix} a_{11}B & a_{12}B & \cdots & a_{1m}B \\ a_{21}B & a_{22}B & \cdots & a_{2m}B \\ \vdots & \vdots & & \vdots \\ a_{n1}B & a_{n2}B & \cdots & a_{nm}B \end{pmatrix} \in \mathbb{R}^{np\times mq}, \qquad \mathrm{vec}(A) = \begin{pmatrix} a_1 \\ \vdots \\ a_m \end{pmatrix}.$$

26.2 $(A \otimes B)' = A' \otimes B'$; $\qquad a' \otimes b = ba' = b \otimes a'$; $\qquad \alpha \otimes A = \alpha A = A \otimes \alpha$

26.3 $(F : G) \otimes B = (F \otimes B : G \otimes B)$; $\qquad (A \otimes B)(C \otimes D) = AC \otimes BD$

26.4 $(A \otimes b')(c \otimes D) = (b' \otimes A)(D \otimes c) = Acb'D$

26.5 $A_{n\times m} \otimes B_{p\times k} = (A \otimes I_p)(I_m \otimes B) = (I_n \otimes B)(A \otimes I_k)$

26.6 $(A \otimes B)^{-1} = A^{-1} \otimes B^{-1}$; $\qquad (A \otimes B)^+ = A^+ \otimes B^+$

26.7 $P_{A\otimes B} = P_A \otimes P_B$; $\qquad r(A \otimes B) = r(A)\cdot r(B)$

26.8 $\mathrm{tr}(A \otimes B) = \mathrm{tr}(A)\cdot\mathrm{tr}(B)$; $\qquad \|A \otimes B\|_F = \|A\|_F\cdot\|B\|_F$

26.9 Let $A \ge_L 0$ and $B \ge_L 0$. Then $A \otimes B \ge_L 0$.

26.10 Let $\mathrm{ch}(A_{n\times n}) = \{\lambda_1, \ldots, \lambda_n\}$ and $\mathrm{ch}(B_{m\times m}) = \{\mu_1, \ldots, \mu_m\}$. Then $\mathrm{ch}(A \otimes B) = \{\lambda_i\mu_j\}$ and $\mathrm{ch}(A \otimes I_m + I_n \otimes B) = \{\lambda_i + \mu_j\}$, where $i = 1, \ldots, n$, $j = 1, \ldots, m$.
26.11 $\mathrm{vec}(\alpha A + \beta B) = \alpha\,\mathrm{vec}(A) + \beta\,\mathrm{vec}(B)$, $\qquad \alpha, \beta \in \mathbb{R}$

26.12 $\mathrm{vec}(ABC) = (I \otimes AB)\,\mathrm{vec}(C) = (C' \otimes A)\,\mathrm{vec}(B) = (C'B' \otimes I)\,\mathrm{vec}(A)$

26.13 $\mathrm{vec}(A^{-1}) = \bigl[(A^{-1})' \otimes A^{-1}\bigr]\mathrm{vec}(A)$

26.14 $\mathrm{tr}(AB) = [\mathrm{vec}(A')]'\,\mathrm{vec}(B)$
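A numerical check of the vec/Kronecker identities 26.12–26.14 (added sketch assuming NumPy; `vec` below is the column-stacking operator, i.e. NumPy's Fortran-order flatten).

```python
# Sketch: vec(ABC) = (C' kron A) vec(B), tr(AB) = vec(A')'vec(B),
# and vec(A^{-1}) = [(A^{-1})' kron A^{-1}] vec(A).
import numpy as np

rng = np.random.default_rng(10)
vec = lambda M: M.flatten('F')                                    # column-stacking vec

A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))
C = rng.standard_normal((2, 5))
assert np.allclose(vec(A @ B @ C), np.kron(C.T, A) @ vec(B))      # 26.12

D = rng.standard_normal((4, 3))
assert np.isclose(np.trace(A @ D), vec(A.T) @ vec(D))             # 26.14

E = rng.standard_normal((4, 4)) + 4 * np.eye(4)                   # nonsingular
Einv = np.linalg.inv(E)
assert np.allclose(vec(Einv), np.kron(Einv.T, Einv) @ vec(E))     # 26.13
```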


26.15 Below are some matrix derivatives.

(a) $\dfrac{\partial Ax}{\partial x} = A'$

(b) $\dfrac{\partial\,x'Ax}{\partial x} = (A + A')x$; $\quad = 2Ax$ when $A$ is symmetric

(c) $\dfrac{\partial\,\mathrm{vec}(AX)}{\partial\,\mathrm{vec}(X)'} = I \otimes A$

(d) $\dfrac{\partial\,\mathrm{vec}(AXB)}{\partial\,\mathrm{vec}(X)'} = B' \otimes A$

(e) $\dfrac{\partial\,\mathrm{tr}(AX)}{\partial X} = A'$

(f) $\dfrac{\partial\,\mathrm{tr}(X'AX)}{\partial X} = 2AX$ for symmetric $A$

(g) $\dfrac{\partial\,\log|X'AX|}{\partial X} = 2AX(X'AX)^{-1}$ for symmetric $A$
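Formula (b) can be verified by finite differences; the sketch below (added, assuming NumPy) compares $(A + A')x$ with a central-difference gradient of $x'Ax$.

```python
# Sketch: numerical check of d(x'Ax)/dx = (A + A')x via central differences.
import numpy as np

rng = np.random.default_rng(11)
A = rng.standard_normal((4, 4))                 # not necessarily symmetric
x = rng.standard_normal(4)

f = lambda z: z @ A @ z
grad_formula = (A + A.T) @ x

eps = 1e-6
grad_numeric = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                         for e in np.eye(4)])
assert np.allclose(grad_formula, grad_numeric, atol=1e-5)
```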

Index

A
Abadir, Karim M. vi
added variable plot 18
adjoint matrix 83
admissibility 65
Alberts theorem 110
ANOVA 37
another parametrization 39
conditional mean and variance 38
estimability 48
in matrix terms 38
antieigenvalues 58, 61
approximating
the centered data matrix 75
AR(1)-structure see autocorrelation
autocorrelation
V D f%ji j j g 30
BLP 31
in linear model 32
in prediction 67
regression coecients 32
testing 32
B
Baksalary, Oskar Maria vi
BanachiewiczSchur form 89
Bernoulli distribution 19
Bernstein, Dennis S. vi
best linear predictor see BLP
best linear unbiased estimator see BLUE
best linear unbiased predictor see BLUP, 65
best predictor see BP
bias 27
binomial distribution 19
Blanck, Alice vi

block diagonalization 28, 89


BloomeldWatson eciency 62
BloomeldWatsonKnott inequality 58
BLP
BLP.yI x/ 22, 28
BLP.zI A0 z/ 29
BLP.yn I y.n1/ / 31
y C yx 1
22
xx .x  x /
y C yx 
28
xx .x  x /
autocorrelation 31
BLUE
Q representations 51
,
P 1W X2 /1 X0 M
P
53
Q 2 D .X02 M
2 1W y
0 P
1 0 P
Q
2 D .X2 M1 X2 / X2 M1 y 53
Q i in the general case 53
BLUE.K0 / 49
BLUE.X/ 48
denition 48
fundamental equation 49
Q as a shorted matrix 111
cov.X/
Q
X, general representations 49
Q representations 50
X,
equality of OLSE and BLUE, several
conditions 54
restricted BLUE 37
without the ith observation 43
BLUP
fundamental equation 66
in mixed model 68
of y 14
of  68
of yf 66
Pandoras Box 66
representations for BLUP.yf / 66
BP
BP.yI x/ 23


BP.yI x/ 27
E.y j x/ 27
in Np 23
is the conditional expectation


23, 27

C
canonical correlations
between x and y 76
between O and Q 59
between Hy and My 60
between K0 QL u and L0 QK u 78
non-unit ccs between K0 u and L0 u 78
proper eigenvalues 77
unit ccs 60
Watson eciency 59
CauchySchwarz inequality 112
centered X0 ; Xy 2
centered and scaled Xy 3
centered model 11
centering matrix C 2
characteristic equation 96
Chatterjee, Samprit v
Cholesky decomposition 107
Cochrans theorem 24
coecient of determination see multiple
correlation
cofactor 82
collinearity 42
column space
denition 5
C .VX? / 95
C .W X W I  W W/ 95
C .X W V/ D C .X W VM/ 49
C X0 .V C XUX0 / X 46
simple properties 78
Q
of cov.X/
51
completely symmetric matrix 57
eigenvalues 97, 98
condition index 41
condition number 41
conditional see normal distribution
EE.y j x/ 23
varE.y j x/ 23, 38
expectation and the best predictor 27
condence ellipse see condence region
condence interval
for i 14
for E.y / 15
WorkingHotelling condence band 15
condence region
for 2 34
for x 34
for  25

consistency (when solving X)


of AX D B 87
of AXB D C 87
of X.A W B/ D .0 W B/ 81, 94
consistency of linear model 49
constraints to obtain unique O 35
contingency table 19
contours of constant density xii, 20
contrast 48
Cooks distance 41
COOK 2i .V / 43
Cook, R. Dennis 30
correlation see multiple correlation
cor.x; y/ D xy =x y xi
cors .xi ; xj /; cord .xi ; xj / 4
R D cord .y; yO / 10
cor.O1 ; O2 / 14, 18
cor."Oi ; "Oj / 10
as a cosine 4
between dichotomous variables 19
random sample without replacement 19
correlation matrix
cor.x/ 6
cord .X0 / D Rxx 3
cord .X0 W y/ D R 3
correlation matrix 111
determinant 84, 85
inverse 13, 85, 86
rank 5, 84
recursive decomposition of det.R/ 85
CourantFischer theorem 100
covariance
cov.x; y/ D E.x  x /.y  y / xi
covs .x; y/; covd .x; y/ 4
cov.z0 Az; z0 Bz/ 24
covariance matrix
cov.x/ 6
cov.x; y/ 6
O
cov./
12
cov.O 2 / 13
cov.O x ; y/
N D 0 14
cov.Q 2 / 53
cov.eyx / 28
cov.eyx ; x/ 28
Q
cov.X/
51
cov.x; y  Ax/ D 0 29
covz  BLP.zI A0 z/ 29
covd .U/ 7
covd .UA/ 7
covd .US1=2 / 7
covd .UT/; S D TT0 7
covd .X0 / D Sxx 3
covd .Xy / D S 3

always nonnegative denite 6
of the prediction error 28, 74
partial covariance 22
crazy, still v
Cronbachs alpha 114
D
data matrix
Xy D .X0 W y/ 2
best rank-k approximation 75
keeping data as a theoretical distribution 4
decomposition see eigenvalue decomposition, see full rank decomposition, see
singular value decomposition
HartwigSpindelbck 108
derivatives, matrix derivatives 116
determinant
22 21
Laplace expansion 82
recursive decomposition of det.S/ 30
recursive decomposition of det./ 30
Schur complement 83
DFBETAi 41
diagonalizability 99
discrete uniform distribution 19
discriminant function 75
disjointness
C .A/ \ C .B/ D f0g, several conditions
80
C .VH/ \ C .VM/ 60
C .X0.1/ / \ C .X0.2/ / 43
C .X/ \ C .VM/ 49
C .X1 / \ C .X2 / 8
estimability in outlier testing 42
distance see Cooks distance, see Mahalanobis distance, see statistical
distance
distribution see Hotellings T 2 , see normal
distribution
F -distribution 24
t -distribution 24
Bernoulli 19
binomial 19
chi-squared 23, 24
discrete uniform 19
Hotellings T 2 25
of MHLN2 .x; ; / 23
of y0 .H  J/y 24
of y0 .I  H/y 24
of z0 Az 23, 24
of z0 1 z 23
Wishart-distribution 24, 73
dominant eigenvalue 97, 108

Draper, Norman R. v
DurbinWatson test statistic
Drers Melencolia I 109

32

E
EckartYoung theorem
non-square matrix 107
symmetric matrix 101
eciency of OLSE see Watson eciency,
BloomeldWatson eciency, Raos
eciency
eigenvalue, dominant 97, 108
eigenvalues
denition 96
.; w/ as an eigenpair for .A; B/ 102
ch. 22 / 21
ch.A; B/ D ch.B1 A/ 102
ch.A22 / 96
ch1 .A; B/ D ch1 .BC A/ 104
jA  Ij D 0 96
jA  Bj D 0 101
nzch.A; B/ D nzch.AB / 104
nzch.AB/ D nzch.BA/ 99
algebraic multiplicity 99
characteristic polynomial 96
eigenspace 99
geometric multiplicity 99
intraclass correlation 97, 98
of 2 I C a10 C 1a0 101
Q
of cov.X/
104
of PA PB 78, 92, 93
of PK PL  PF 78
spectral radius 97
elementary column/row operation 99
equality
of BLUPs of yf under two models 67
of BLUPs under two mixed models 70
of OLSE and BLUE, several conditions 54
of OLSE and BLUE, subject to
C .U/ C .X/ 56
of the BLUEs under two models 55
of the OLSE and BLUE of 2 57
O M.i/ / D .
Q M.i/ / 62
.
O 2 .M12 / D Q 2 .M12 / 57
Q 2 .M12 / D Q 2 .M12 / 63
N
BLUP.yf / D BLUE.X
66
f /
BLUP.yf / D OLSE.Xf / 67
SSE.V / D SSE.I / 53
XO D XQ 54
Ay D By 50
PXIV1 D PXIV2 55
estimability

 D 0 25
1 D 2 26, 73
1 D    D k D 0 36
i D 0 36
D 0 in outlier testing 40
1 D    D g 37
1 D 2 38
k0 D 0 36
K0 D d 35, 36
K0 D d when cov.y/ D  2 V 36
K0 D 0 33
K0 B D D 73
overall-F -value 36
testable 37
under intraclass correlation 55

denition 47
constraints 35
contrast 48
in hypothesis testing 33, 37
of 48
of k 48
of in the extended model 42
of in the extended model 40
of c0  48
of K0 47
of X2 2 47
extended (mean-shift) model 40, 42, 43
F
factor analysis 75
factor scores 76
niteness matters 20
tted values 8
frequency table 19
FrischWaughLovell theorem 11, 18
generalized 57
Frobenius inequality 80
full rank decomposition 88, 107
G
Galtons discovery of regression v
GaussMarkov model see linear model
GaussMarkov theorem 49
generalized inverse
the 4 MoorePenrose conditions 86
BanachiewiczSchur form 89
least-squares g-inverse 90
minimum norm g-inverse 90
partitioned nnd matrix 89
reexive g-inverse 87, 89
through SVD 88
generalized variance 6, 7
generated observations from N2 .0; / xii
H
Hadi, Ali S. v
HartwigSpindelbck decomposition 108
Haslett, Stephen J. vi
hat matrix H 8
Hendersons mixed model equations 69
Hotellings T 2 24, 25, 73
when n1 D 1 26
hypothesis
2 D 0 34
x D 0 27, 36
D 0 in outlier testing 42

I
idempotent matrix
eigenvalues 93
full rank decomposition 93
rank D trace 93
several conditions 93
illustration of SST = SSR + SSE xii
independence
linear 5
statistical 20
uN and T D U0 .I  J/U 25
observations, rows 24, 25
of dichotomous variables 20
quadratic forms 23, 24
inequality
chi .AA0 /
chi .AA0 C BB0 / 99
n
x0 Ax=x0 x
1 100
kAxk
kAx C Byk 89
kAxk2
kAk2 kxk2 106
tr.AB/
ch1 .A/  tr.B/ 99
BloomeldWatsonKnott 58
CauchySchwarz 112
Frobenius 80
Kantorovich 112
Samuelsons 114
Sylvesters 80
Wielandt 113
inertia 111
intercept 10
interlacing theorem 100
intersection
C .A/ \ C .B/ 80
C .A/ \ C .B/, several properties 92
intraclass correlation
F -test 55
OLSE D BLUE 55
eigenvalues 97, 98


invariance
of C .AB C/ 87
of AB C 87
of X0 .V C XUX0 / X 46
of r.AB C/ 88
inverse
of Ann 82
intraclass correlation 97
of A D faij g D fmin.i; j /g 32
of R 85
of Rxx 13, 86
of X0 X 12, 84
of 22 21
of a partitioned block matrix 83
of autocorrelation matrix 32
of partitioned pd matrix 84
Schur complement 83
irreducible matrix 108
K
Kantorovich inequality 112
Kronecker product 115
in multivariate model 72
in multivariate sample 25
L
Lagrangian multiplier 35
Laplace expansion of determinant 82
LATEX vi
least squares see OLSE
Lee, Alan J. v
LehmannSche theorem 64
leverage hi i 41
likelihood function 26
likelihood ratio 26
linear hypothesis see hypothesis
linear independence 5
collinearity 42
linear model 1
multivariate 72
linear prediction suciency 68
linear zero function 50, 66
linearly complete 64
linearly sucient see suciency
Lwner ordering see minimizing
denition 109
Alberts theorem 110
Y0 MY
L .Y  XB/0 .Y  XB/ 18, 113
M
magic square

109

Magnus, Jan R. vi
Mahalanobis distance
MHLN2 .Ny1 ; yN 2 ; S / 26, 73, 75
MHLN2 .u.i/ ; uN ; S/ 4
MHLN2 .x; ; / 5, 23
MHLN2 .x ; xN ; Sxx / 14
MHLN2 .x.i/ ; xN ; Sxx / 41, 114
MHLN2 .1 ; 2 ; / 75
major/minor axis of the ellipse 21
matrix
irreducible 108
nonnegative 108
permutation 108
reducible 108
shorted 111
spanning C .A/ \ C .B/ 80
stochastic 109
R
matrix M
P V 44
PV MP
P WDM
R W 45
PW MP
P
matrix M
M.MVM/ M 44
P
52, 53
SSE.V / D y0 My
P1
matrix M
M1 .M1 VM1 / M1 53, 95
P 1W
matrix M
M1 .M1 W2 M1 / M1 53
matrix angle 61
matrix of corrected sums of squares
T D Xy0 CXy 3
Txx D X00 CX0 3
maximizing
tr.G0 AG/ subject to G0 G D Ik 100
.a0 x/2 =x0 Bx 102
.x0 Ay/2 =.x0 Bx  y0 Cy/ 106
.x0 Ay/2 =.x0 x  y0 y/ 106
a0 .1  2 /2 =a0 a 75
a0 .Nu  0 /2 =a0 Sa 25
a0 .Nu1  uN 2 /2 =a0 S a 75
a0 .u.i/  uN /2 =a0 Sa 5
.Vz; z/ 61
cor.a0 x; b0 y/ 76
cor.y; b0 x/ 29
cord .y; X/ 16
cord .y; X0 b/ 29
cos2 .u; v/; u 2 A; v 2 B 77
.0 A0 B/2
77
0
A0 A  0 B0 B
2
kHVMkF 62
u0 u subject to u0 1 u D c 2 21
var.b0 z/ 74
x0 Ax=x0 Bx 102
x0 Ax=x0 Bx, B nnd 104
x0 Hx=x0 Vx 104

likelihood function 26
mean squared error 27
mean squared error matrix 27, 64
Melencolia I by Drer 109
minimizing
.Y  XB/0 .Y  XB/ 18, 72, 73, 93, 113
.y  X/0 .y  X/ 9
.y  X/0 .y  X/ under K0 D d 35
.y  X/0 V1 .y  X/ 9, 36, 52
.y  X/0 V1 .y  X/ under K0 D d
37
Ey  .Ax C b/y  .Ax C b/0 28
MSEM.Ax C bI y/ 28
cos.Vz; z/ 61
cov.Gy/ subject to GX D X 48
cov.y  Fx/ 28
covd .Y  XB/ 18, 28
covs .y  B0 x/ 18, 28
O D jcov./j=jcov.
Q
O
e./
/j
58
k  A.A0 A/1 A0 kF 74
kA  BZkF subject to Z orthogonal 107
kA  BkF subject to r.B/ D k 101, 107
Q  XP
Q G kF 74
kX
ky  AxkV 94
ky  1 k 16
ky  Axk 91
ky  Xk 9
tr.P1=2 A / 74
var.g0 y/ subject to X0 g D k 48
var.y  b0 x/ 29
vard .y  X0 b/ 28, 29
angle 77
mean squared error 28
mean squared error matrix 28
orthogonal distances 74
minor 82
leading principal 82
principal 82
minus partial ordering 54, 111
mixed model 68
MLE
of 0 and x 27
of  and 26
2
27
of yx
model matrix 2
rank 5, 84
multinormal distribution
denition 20
conditional mean 22, 27
conditional variance 22, 27
contours 20
density function of N2 21
density function of Np 20
sample U0 from Np 25

multiple correlation
R D cord .y; yO / 10
in M121 17
in no-intercept model 17
population 22, 27, 29
squared R2 16
multivariate linear model 72
Mustonens measure 30
N
Niemel, Jarmo vi
no-intercept model 1, 17, 36
nonnegative deniteness
of partitioned matrix 110
nonnegative matrix 108
norm
matrix 2-norm 106
spectral 106
normal distribution see multinormal
distribution
normal equation 9
general solution 9
generalized 9
null space 5
O
observation space 2
observation vector 2
OLS criterion 9
OLSE
constraints 35
tted values 8
of 10
of K0 9
of X 8, 9
restricted OLSE 35
without the i th observation 40
without the last b observations 42
ordinary least squares see OLSE
orthocomplement
C .A/? 5
C .VX? / 95
C .VX? / D C .W X W I  W W/?
95
C .V1 X/? 95
C .X/?
95
V1
A? 5, 80, 91
94
A?
V
orthogonal projector see projector
simple properties of PA 91
simple properties of PAIV 94
C 2

46,


H 8
J 2
M 8
P221 93
PC .A/\C .B/ 92
PC .A/\C .B? / 92
PXIV1 D I  P0MIV 45, 95
commuting PA PB D PB PA 92
decomposition of P.X1 WX2 /IVC 95
decomposition of P.X1 WX2 /IV1 53, 95
dierence PA  PB 92
sum PA C PB 91
orthogonal rotation 107
overall-F-value 36
P
Pandoras Box
BLUE 49, 51
BLUP 66
mixed model 68
partial correlation
pcord .Y j X/ 17
%ij x 22
%xy 22
rxy 18
2
30
decomposition of 1  %yx
2
decomposition of 1  Ry12:::k
2
decomposition of 1  Ry12:::k
population 22
partial covariance 22
partitioned matrix
g-inverse 89
inverse 83
MP-inverse 89
nonnegative deniteness 110
pencil 102, 103
permutation
determinant 82
matrix 108
PerronFrobenius theorem 108
Poincar separation theorem 100
polar decomposition 107
predictable, unbiasedly 65
prediction error
Gy  yf 65
y  BLP.yI x/ 22, 28
ei 1:::i1 30
Mahalanobis distance 14
with a given x 14
prediction interval for y 15
principal component analysis 74
matrix approximation 75
predictive approach 74

sample principal components 74


principal components
Q 75
from the SVD of X
principal minor
sum of all i  i principal minors 96
principal submatrix 82
projector see orthogonal projector
PAjB 94
PXjVM 49
oblique 91, 94
proper eigenvalues
ch.A; B/ 103
ch.GVG0 ; HVH/ 60
ch.V; H/ 104
PSTricks vi
Q
QR-decomposition 107
quadratic form
E.z0 Az/ 23
var.z0 Az/ 24
distribution 23, 24
independence 23, 24
quadratic risk 64
R

17
85

random sample without replacement 19


rank
denition 5
simple properties 78
of .1 W X0 / 5, 84
of .A W B/ 79
Q
of cov.X/
51
of AB 79
of APB 81
of A 88
of HPV M 60
of Txx 5, 8
of X0 .V C XUX0 / X 46
of X02 M1 X2 8
of correlation matrix 5, 84
of model matrix 5, 84
rank additivity
r.A C B/ D r.A/ C r.B/ 79
Schur complement 83
rank cancellation rule 79
rank-subtractivity see minus partial ordering
Raos eciency 63
Rayleigh quotient x0 Ax=x0 x 100
recursive decomposition
2
30
of 1  %yx
2
of 1  Ry12:::k
17
2
of 1  Ry12:::k
85

of det.R/ 85
of det.S/ 30
of det./ 30
reduced model
R2 .M121 / 17
M121 57
M1H 62
AVP 18
SSE.M121 / 17
reducible matrix 108
reection 108
regression coecients 10
old ones do not change 11
standardized 12
regression function 23
regression towards mean v
relative eciency of OLSE see Watson
eciency
residual
y  BLP.yI x/ 22, 28
ei 1:::i1 30
after elimination of X1 16
externally Studentized 40, 42
internally Studentized 39
of BLUE 51, 52
of OLSE 9, 39
predicted residual 41
scaled 39
residual mean square, MSE 15
residual sum of squares, SSE 15
rotation 107
of observations 75
S
Samuelsons inequality 114
Schur complement 83
determinant 83
MP-inverse 89
rank additivity 83
Schurs triangularization theorem 107
Seber, George A. F. v, vi
shorted matrix 111
similarity 99
Simon, Paul v
simultaneous diagonalization
by a nonsingular matrix 103
by an orthogonal matrix 103
of commuting matrices 103
singular value decomposition 105
skew-symmetric: A0 D A ix
Smith, Harry v
solution
to A.X W VX? / D .Xf W V21 X? / 66

to AX D B 87
to AXB D C 87
to Ax D y 87
to A 87
to G.X W VM/ D .X W 0/ 49
to X0 X D X0 y 9
to X.A W B/ D .0 W B/ 94
spectral radius 97, 108
spectrum see eigenvalues, 96
square root of nnd A 97
standard error of Oi 14
statistical distance 5
Stigler, Stephen M. v
stochastic matrix 109
stochastic restrictions 70
Stricker-Komba, Ulrike vi
suciency
linear 63
linear prediction 68
linearly minimal 64
sum of products SPxy D txy 4
sum of squares
change in SSE 16, 33, 35
change in SSE(V) 37
predicted residual 41
SSy D tyy 4
SSE 15
SSE under M121 17
SSE, various representations 16
SSR 15
SST 15
weighted SSE 36, 52
sum of squares and cubes of integers 19
Survo vi
Sylvesters inequality 80
T
theorem
Albert 110
Cochran 24
CourantFischer 100
EckartYoung 101, 107
FrischWaughLovell 11, 18, 57
GaussMarkov 49
interlacing 100
LehmannSche 64
PerronFrobenius 108
Poincar separation 100
Schurs triangularization 107
WedderburnGuttman 83
Thomas, Niels Peter vi
Tiritiri Island ii


total sum of squares, SST 15


transposition 82
Trenkler, Gtz vi
triangular factorization 30
triangular matrix 30, 107
U
unbiased estimator of  2
O 2 15
2
40
O .i/
Q 2 36, 37, 52
unbiasedly predictable 65

14

V
variable space 2
variable vector 2
variance
vars .y/; vard .y/ 4
var.x/ D E.x  x /2
se2 .Oi / 14
var.a0 x/ D a0 a 6
var.Oi / 13
var."Oi / 10

of a dichotomous variable 19
of prediction error with a given x
variance function 23
vec 115
in multivariate model 72
in multivariate sample 25
vector of means xN 2
Vehkalahti, Kimmo vi
VIF 13, 86
volume of the ellipsoid 34

xi

Watson eciency
denition 58
decomposition 62
factorization 62
lower limit 58
weakly singular linear model: C .X/ C .V/
49
WedderburnGuttman theorem 83
Weisberg, Sanford v
Wielandt inequality 113
Wilkss lambda 26
Wishart-distribution see distribution
WorkingHotelling condence band 15