(i) $H_c(I_n - \frac{1}{n}J_n) = H_c$
(iv) $H_c(I_n - \frac{1}{n}J_n - H_c) = (H - \frac{1}{n}J_n)(I_n - H) = 0$
\[
\frac{SSR}{\sigma^2} \sim \chi^2(p, \lambda), \quad \text{where } \lambda = \frac{1}{2\sigma^2}\beta_1^T X_c^T X_c \beta_1
\]
\[
\frac{SSE}{\sigma^2} \sim \chi^2(n - p - 1)
\]
\begin{align*}
\lambda &= \frac{1}{2\sigma^2}\,\beta^T X^T X_c (X_c^T X_c)^{-1} X_c^T X \beta \\
&= \frac{1}{2\sigma^2}\,(\alpha, \beta_1^T)\begin{pmatrix} \mathbf{1}^T \\ X_c^T \end{pmatrix} X_c (X_c^T X_c)^{-1} X_c^T (\mathbf{1}, X_c)\begin{pmatrix} \alpha \\ \beta_1 \end{pmatrix} \\
&= \frac{1}{2\sigma^2}\,\beta_1^T X_c^T X_c \beta_1,
\end{align*}
since $\mathbf{1}^T X_c = 0$.
• Theorem 8.1d. If $y$ is $N_n(X\beta, \sigma^2 I)$, the distribution of
\[
F = \frac{SSR/p}{SSE/(n-p-1)}
\]
is as follows:
\begin{align*}
SST &= y^T\Big(I_n - \frac{1}{n}J_n\Big)y \\
&= y^T(I - H)y + y^T(H - H_1)y + y^T\Big(H_1 - \frac{1}{n}J_n\Big)y \\
&= SSE + SS(\beta_2 \mid \beta_1) + SSR(\text{reduced})
\end{align*}
$SSR(\text{reduced})$ is the sum of squares due to fitting $X_1$ only in the reduced (null) model, $y = X_1\beta_1^* + e^*$. Thus, $SS(\beta_2 \mid \beta_1)$ is the "extra" regression sum of squares due to $\beta_2$ after adjusting for $\beta_1$, and can be expressed as $SSR(\text{full}) - SSR(\text{reduced})$.
(Proof) We know $H$ and $H_1$ are idempotent. How about $HH_1$? First observe that $HX = X(X^TX)^{-1}X^TX = X$, i.e. $X = [X(X^TX)^{-1}X^T]X$. Partitioning $X$ on the left side and the last $X$ on the right side, we obtain
(Proof) We showed (i) in the last chapter. For (iii), $(I - H)(H - H_1) = 0$. For (ii), the noncentrality parameter after applying the result is
\[
\lambda_1 = \frac{1}{2\sigma^2}\,\beta^T X^T (H - H_1) X \beta.
\]
CHAPTER 6. MULTIPLE REGRESSION: TESTS OF HYPOTHESES AND CONFIDENCE INTERVALS
\[
\beta^T X^T (H - H_1) X \beta = (\beta_1^T X_1^T + \beta_2^T X_2^T)(H - H_1)(X_1\beta_1 + X_2\beta_2)
\]
with $\beta = (\beta_0, \beta_1, \beta_2, \beta_3, \beta_4)^T$.
(ii) $SSH/\sigma^2 = (C\hat\beta)^T[C(X^TX)^{-1}C^T]^{-1}(C\hat\beta)/\sigma^2$ is $\chi^2(q, \lambda)$, where SSH is the sum of squares due to $C\beta$ (i.e. due to the hypothesis).
\[
= y^T X(X^TX)^{-1}C^T[C(X^TX)^{-1}C^T]^{-1}C(X^TX)^{-1}X^T y = y^T A y
\]
\[
F = \frac{SSH/q}{SSE/(n-p-1)} = \frac{(C\hat\beta)^T[C(X^TX)^{-1}C^T]^{-1}(C\hat\beta)/q}{SSE/(n-p-1)}
\]
• The F test for $H_0: C\beta = 0$ in the above theorem is called the general linear hypothesis test. The degrees of freedom $q$ is the number of linear combinations in $C\beta$.
• SSH can be written as $(C\hat\beta - 0)^T[C(X^TX)^{-1}C^T]^{-1}(C\hat\beta - 0)$, which is a squared distance between $C\hat\beta$ and $C\beta = 0$ under the null. Intuitively, if $C\hat\beta$ is very far from 0, there is evidence against $H_0$.
\[
\frac{\partial U(\beta, \lambda)}{\partial \lambda} = C\beta = 0
\]
\[
\frac{\partial U(\beta, \lambda)}{\partial \beta} = -2X^Ty + 2X^TX\beta + C^T\lambda
\]
\[
\hat\beta_c = (X^TX)^{-1}X^Ty - \frac{1}{2}(X^TX)^{-1}C^T\lambda
\]
Since $C\hat\beta_c = 0$,
\[
C(X^TX)^{-1}X^Ty - \frac{1}{2}C(X^TX)^{-1}C^T\lambda = 0,
\]
yielding
\[
\lambda = 2[C(X^TX)^{-1}C^T]^{-1}C(X^TX)^{-1}X^Ty.
\]
Thus
\begin{align*}
\hat\beta_c &= \hat\beta - (X^TX)^{-1}C^T[C(X^TX)^{-1}C^T]^{-1}C(X^TX)^{-1}X^Ty \\
&= \hat\beta - (X^TX)^{-1}C^T[C(X^TX)^{-1}C^T]^{-1}C\hat\beta \\
&= (I - B)\hat\beta,
\end{align*}
where $B = (X^TX)^{-1}C^T[C(X^TX)^{-1}C^T]^{-1}C$.
(ii) $\mathrm{cov}(\hat\beta_c) = \sigma^2(X^TX)^{-1} - \sigma^2(X^TX)^{-1}C^T[C(X^TX)^{-1}C^T]^{-1}C(X^TX)^{-1}$
\[
\mathrm{cov}(\hat\beta_c) = (I - B)(X^TX)^{-1}(I - B)^T\sigma^2
\]
The sum of squares due to regression can be decomposed into two parts: the sum of squares due to the hypothesis (SSH) and the sum of squares due to the remaining regression after adjusting for the hypothesis.
\[
y^T H_X y = (C\hat\beta)^T\{C(X^TX)^{-1}C^T\}^{-1}C\hat\beta + \sigma^2\,\hat\beta_c^T\{\mathrm{var}(\hat\beta_c)\}^{-}\hat\beta_c.
\]
\[
(I - B) = UD^*P^T = [U_1, U_2]\begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} P_1^T \\ P_2^T \end{pmatrix} = U_1 D P_1^T,
\]
where $P_1$ and $U_1$ are $(p+1)\times(p+1-q)$ matrices and $D$ is a $(p+1-q)\times(p+1-q)$ matrix.
Let
\[
a = \begin{pmatrix} C \\ P_1^T \end{pmatrix}\beta = Q\beta.
\]
Then, the regression model $y = X\beta + e$ can be expressed as $y = XQ^{-1}Q\beta + e = Wa + e$.
Then, we have
\begin{align*}
y^T H_X y = y^T H_W y &= \sigma^2\,\hat a^T\{\mathrm{var}(\hat a)\}^{-1}\hat a \\
&= \sigma^2(Q\hat\beta)^T\{Q\,\mathrm{var}(\hat\beta)\,Q^T\}^{-1}(Q\hat\beta) \\
&= \big((C\hat\beta)^T, (P_1^T\hat\beta)^T\big)\begin{pmatrix} C(X^TX)^{-1}C^T & C(X^TX)^{-1}P_1 \\ P_1^T(X^TX)^{-1}C^T & P_1^T(X^TX)^{-1}P_1 \end{pmatrix}^{-1}\begin{pmatrix} C\hat\beta \\ P_1^T\hat\beta \end{pmatrix}.
\end{align*}
\[
B(X^TX)^{-1}C^T = (X^TX)^{-1}C^T[C(X^TX)^{-1}C^T]^{-1}C(X^TX)^{-1}C^T = (X^TX)^{-1}C^T.
\]
(b) From (a), we have
\begin{align*}
&(I - B)(X^TX)^{-1}C^T = 0 \\
\iff\ &U_1DP_1^T(X^TX)^{-1}C^T = 0 \\
\iff\ &D^{-1}DP_1^T(X^TX)^{-1}C^T = 0 \quad (\text{premultiplying by } D^{-1}U_1^T,\ U_1^TU_1 = I) \\
\iff\ &P_1^T(X^TX)^{-1}C^T = 0.
\end{align*}
Then,
\[
y^T H_X y = (C\hat\beta)^T\{C(X^TX)^{-1}C^T\}^{-1}C\hat\beta + (P_1^T\hat\beta)^T\{P_1^T(X^TX)^{-1}P_1\}^{-1}P_1^T\hat\beta.
\]
Now we show $(P_1^T\hat\beta)^T\{P_1^T(X^TX)^{-1}P_1\}^{-1}P_1^T\hat\beta = \sigma^2\,\hat\beta_c^T\{\mathrm{var}(\hat\beta_c)\}^{-}\hat\beta_c$.
\begin{align*}
\sigma^2\,\hat\beta_c^T\{\mathrm{var}(\hat\beta_c)\}^{-}\hat\beta_c
&= \sigma^2\,\hat\beta^T(I - B)^T\{(I - B)\,\mathrm{var}(\hat\beta)\,(I - B)^T\}^{-}(I - B)\hat\beta \\
&= \sigma^2\,\hat\beta^TP_1DU_1^T\{U_1DP_1^T\,\mathrm{var}(\hat\beta)\,P_1DU_1^T\}^{-}U_1DP_1^T\hat\beta \\
&= \sigma^2\,\hat\beta^TP_1DU_1^T[U_1\{DP_1^T\,\mathrm{var}(\hat\beta)\,P_1D\}^{-1}U_1^T]U_1DP_1^T\hat\beta \\
&= \sigma^2\,\hat\beta^TP_1\{P_1^T\,\mathrm{var}(\hat\beta)\,P_1\}^{-1}P_1^T\hat\beta \\
&= (P_1^T\hat\beta)^T\{P_1^T(X^TX)^{-1}P_1\}^{-1}P_1^T\hat\beta.
\end{align*}
that falsely rejects at least one hypothesis when all hypotheses are true:
\[
\alpha_f = P(E_1 \cup E_2 \cup \cdots \cup E_k) \le \sum_{j=1}^k P(E_j) = \sum_{j=1}^k \alpha_j.
\]
Let $\sum_{i=1}^k \alpha_i = \alpha$ be the desired overall error rate. One choice is $\alpha_i = \alpha/k$.
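As a sketch (with hypothetical p-values, not from the text), the Bonferroni rule rejects hypothesis $j$ when its p-value is below $\alpha/k$; R's `p.adjust` implements the equivalent scaled comparison:

```r
# Hypothetical per-test p-values; alpha is the desired familywise level.
pvals <- c(0.004, 0.020, 0.300)
alpha <- 0.05
k <- length(pvals)
reject_manual <- pvals < alpha / k                             # test each at alpha/k
reject_padj <- p.adjust(pvals, method = "bonferroni") < alpha  # equivalent built-in
identical(reject_manual, reject_padj)                          # TRUE
```

Only the first hypothesis is rejected here: 0.004 < 0.05/3, while 0.020 is not.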
• Scheffé’s method: Scheffé’s method works for testing any linear combination of
β.
\[
F_a = \frac{(a^T\hat\beta)^T(a^T(X^TX)^{-1}a)^{-1}a^T\hat\beta}{s^2} = \frac{(a^T\hat\beta)^2}{s^2\,a^T(X^TX)^{-1}a},
\]
Note that if
\[
P\left(\max_a \frac{(a^T\hat\beta - a^T\beta)^2}{a^TSa} \ge c\right) = \alpha
\]
is satisfied, then, for any $a$,
\[
P\left(\frac{(a^T\hat\beta - a^T\beta)^2}{a^TSa} \ge c\right) \le \alpha.
\]
Then, what is $\max_a \frac{(a^T\hat\beta - a^T\beta)^2}{a^TSa}$?
\[
\max_a \frac{(a^T\hat\beta - a^T\beta)^2}{a^TSa} = (\hat\beta - \beta)^TS^{-1}(\hat\beta - \beta)
\]
by the Cauchy–Schwarz inequality, and the maximum occurs when $a \propto S^{-1}(\hat\beta - \beta)$.
\[
\max_a \frac{(a^T\hat\beta)^2}{s^2\,a^T(X^TX)^{-1}a} = \frac{\hat\beta^TX^TX\hat\beta}{s^2}
\]
(ii) If $y$ is $N_n(X\beta, \sigma^2I)$, then $\hat\beta^TX^TX\hat\beta/\{(p+1)s^2\}$ is distributed as $F(p+1, n-p-1)$. Thus
\[
\max_a \frac{(a^T\hat\beta)^2}{s^2\,a^T(X^TX)^{-1}a\,(p+1)}
\]
is distributed as $F(p+1, n-p-1)$.
• Simultaneous intervals
Using the results obtained in this section, one can construct simultaneous intervals for $\beta$.
The Bonferroni interval for $\beta_1, \cdots, \beta_p$ is $\hat\beta_j \pm t_{n-p-1}(\alpha/2p)\,s\sqrt{g_{jj}}$, where $g_{jj}$ is the diagonal entry of $(X^TX)^{-1}$ corresponding to $\beta_j$. This implies that
\[
P\Big(\forall j,\ \beta_j \in \big(\hat\beta_j - t_{n-p-1}(\alpha/2p)\,s\sqrt{g_{jj}},\ \hat\beta_j + t_{n-p-1}(\alpha/2p)\,s\sqrt{g_{jj}}\big)\Big) \ge 1 - \alpha.
\]
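A sketch of these intervals on simulated data (the data and variable names are illustrative, not from the text); the hand-computed Bonferroni intervals agree with `confint` run at the adjusted level $1 - \alpha/p$:

```r
set.seed(1)
n <- 30; p <- 2
X <- cbind(1, matrix(rnorm(n * p), n, p))    # design matrix with intercept
y <- drop(X %*% c(1, 2, -1) + rnorm(n))
fit <- lm(y ~ X - 1)
s <- summary(fit)$sigma                      # residual standard deviation
g <- diag(solve(t(X) %*% X))                 # g_jj
alpha <- 0.05
tcrit <- qt(1 - alpha / (2 * p), df = n - p - 1)
j <- 2:(p + 1)                               # the p slope coefficients
ci <- cbind(coef(fit)[j] - tcrit * s * sqrt(g[j]),
            coef(fit)[j] + tcrit * s * sqrt(g[j]))
# Same intervals via confint at the Bonferroni-adjusted level:
ci2 <- confint(fit, level = 1 - alpha / p)[j, ]
all.equal(unname(ci), unname(ci2))           # TRUE
```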
• 8.37: gas vapor example (We repeat the data description.)
When gasoline is pumped into the tank of a car, vapors are vented into the
atmosphere. An experiment was conducted to determine whether y, the amount
of vapor, can be predicted using the following four variables based on initial
x1 = tank temperature (◦ F)
x2 = gasoline temperature (◦ F)
fit=lm(y ~ ., data=gas)
SST=sum((gas$y-mean(gas$y))^2)
SSR=sum((fit$fitted-mean(gas$y))^2)
SSE=sum((gas$y-fit$fitted)^2)
F=(SSR/4)/(SSE/(32-4-1))
F
[1] 84.54
summary(fit)$f
[1] 7.327e-15
Test $H_0: \beta_1 = \beta_3 = 0$.
X=model.matrix(fit)
fit2=lm(y ~ x2+x4, data=gas)
X1=model.matrix(fit2)
H=X%*%solve(t(X)%*%X)%*%t(X)
H1=X1%*%solve(t(X1)%*%X1)%*%t(X1)
SS.beta2.beta1=t(gas$y)%*%(H-H1)%*%gas$y
SSE=t(gas$y)%*%(diag(dim(X)[1])-H)%*%gas$y
F= (SS.beta2.beta1/2)/(SSE/(32-4-1))
F
[,1]
[1,] 2.493
[,1]
[1,] 0.1015
#Alternatively
anova(fit2,fit)
Model 1: y ˜ x2 + x4
Model 2: y ˜ x1 + x2 + x3 + x4
Res.Df RSS Df Sum of Sq F Pr(>F)
1 29 238
2 27 201 2 37.2 2.49 0.1
C=matrix(c(0,1,-1,0,0,0,0,1,-12,0,0,0,0,1,-1),nrow=3,byrow=T)
C
[1,] 0 1 -1 0 0
[2,] 0 0 1 -12 0
[3,] 0 0 0 1 -1
hat.beta=solve(t(X)%*%X)%*%t(X)%*%gas$y
SSH=t(C%*%hat.beta)%*%solve(C%*%solve(t(X)%*%X)%*%t(C))%*%(C%*%hat.beta)
F=(SSH/3)/(SSE/(32-4-1))
F
[,1]
[1,] 10.57
[,1]
[1,] 8.99e-05
Chapter 7
In this chapter we consider various approaches to checking the model and the underlying assumptions.
7.1 Residuals
\[
\hat e = y - X\hat\beta = (I - H)y = (I - H)(X\beta + e) = (I - H)e \quad \text{due to } HX = X.
\]
• Properties of residuals:
\[
E(\hat e) = 0, \quad \mathrm{cov}(\hat e) = \sigma^2(I - H), \quad \mathrm{cov}(\hat e, \hat y) = 0
\]
\[
\sum_{i=1}^n \hat e_i = 0, \quad \hat e^T y = y^T(I - H)y = SSE, \quad \hat e^T\hat y = y^T(I - H)Hy = 0
\]
CHAPTER 7. MULTIPLE REGRESSION: MODEL VALIDATION AND DIAGNOSTICS
\[
\hat e^T X = y^T(I - H)X = 0.
\]
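These properties are easy to verify numerically; a sketch on simulated data (the data and names are illustrative, not from the text):

```r
set.seed(1)
n <- 25
x1 <- rnorm(n); x2 <- rnorm(n)
y <- 1 + x1 + 2 * x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)
e <- resid(fit)
# Residuals sum to zero (model with intercept) and are orthogonal to
# the fitted values and to each column of X:
c(sum(e), sum(e * fitted(fit)), sum(e * x1), sum(e * x2))   # all ~ 0
# e^T y equals SSE:
all.equal(sum(e * y), sum(e^2))                             # TRUE
```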
(i) $\frac{1}{n} \le h_{ii} \le 1$ for $i = 1, \cdots, n$
(ii) $-\frac{1}{2} \le h_{ij} \le \frac{1}{2}$ for $i \ne j$
(iii) $h_{ii} = \frac{1}{n} + (x_{ci} - \bar x)(X_c^TX_c)^{-1}(x_{ci} - \bar x)^T$, where $x_{ci} = (x_{i1}, x_{i2}, \cdots, x_{ip})$, $\bar x = (\bar x_1, \bar x_2, \cdots, \bar x_p)$ and $(x_{ci} - \bar x)$ is the $i$th row of the centered matrix $X_c$. [Also, note that $h_{ii} = x_i(X^TX)^{-1}x_i^T$, where $x_i = (1, x_{i1}, x_{i2}, \cdots, x_{ip})$.]
(iv) $\mathrm{tr}(H) = \sum h_{ii} = p + 1$
\[
H = \frac{1}{n}J + H_c = \frac{1}{n}J + X_c(X_c^TX_c)^{-1}X_c^T \quad (7.1)
\]
Dividing (7.2) by $h_{ii}$ ($\ge 1/n$), we obtain $1 = h_{ii} + \frac{\sum_{j \ne i} h_{ij}^2}{h_{ii}}$, which implies $h_{ii} \le 1$.
Since $\hat y = Hy$, if the $i$th observation heavily influences its own fit (i.e. $y_i \approx \hat y_i$), then $h_{ii}$ is close to 1. Recall that $1/n \le h_{ii} \le 1$. Such an observation is called a leverage point. Note that leverage points can be identified based on $X$ only (without $y$).
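A sketch: leverage computed from $X$ alone flags a point placed far from the others (the cutoff $2(p+1)/n$ is a common rule of thumb, not from the text; the data are simulated):

```r
set.seed(1)
n <- 30
x <- c(rnorm(n - 1), 10)                      # last observation far from the rest
X <- cbind(1, x)
h <- diag(X %*% solve(t(X) %*% X) %*% t(X))   # h_ii, computed without any y
range(h)                                      # within [1/n, 1]
sum(h)                                        # equals p + 1 = 2
which(h > 2 * ncol(X) / n)                    # flags the outlying observation
```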
7.4 Outliers
The residuals do not have the same variance, so it is desirable to scale them. There are two common methods of scaling. The first is the (internally) studentized residual,
\[
r_i = \frac{e_i}{\hat\sigma\sqrt{1 - h_{ii}}},
\]
where $\hat\sigma^2 = SSE/(n - p - 1)$.
In simple regression,
\[
h_{ii} = \frac{1}{n} + \frac{(x_i - \bar x)^2}{\sum (x_j - \bar x)^2}
\]
and
\[
r_i = \frac{e_i}{\hat\sigma\sqrt{1 - \frac{1}{n} - \frac{(x_i - \bar x)^2}{\sum (x_j - \bar x)^2}}}.
\]
Properties:
(i) $\sum r_i \ne 0$
(ii) $E(r_i) = 0$
\[
t_i = \frac{e_i}{\hat\sigma_{(i)}\sqrt{1 - h_{ii}}}
\]
\[
(n - p - 2)\,\hat\sigma_{(i)}^2 = (n - p - 1)\,\hat\sigma^2 - \frac{e_i^2}{1 - h_{ii}}.
\]
where the last equality uses the fact that
\[
(A - uv^T)^{-1} = A^{-1} + \frac{A^{-1}uv^TA^{-1}}{1 - v^TA^{-1}u}.
\]
Thus
\[
\hat\beta_{(i)} = \hat\beta - (X^TX)^{-1}x_i^Ty_i + \frac{(X^TX)^{-1}x_i^Tx_i(X^TX)^{-1}(X^Ty - x_i^Ty_i)}{1 - h_{ii}}
\]
and
\[
\hat\beta_{(i)} = \hat\beta - \frac{(X^TX)^{-1}x_i^Te_i}{1 - h_{ii}}.
\]
It is convenient to know that
\[
y_i - x_i\hat\beta_{(i)} \equiv e_{(i)} = e_i + \frac{h_{ii}e_i}{1 - h_{ii}} = e_i/(1 - h_{ii}).
\]
(This in turn yields $\hat\beta_{(i)} = \hat\beta - (X^TX)^{-1}x_i^Te_{(i)}$.)
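The identity $e_{(i)} = e_i/(1 - h_{ii})$ can be checked by brute force on simulated data (the leave-one-out fit is recomputed directly; the data are illustrative):

```r
set.seed(1)
n <- 20
x <- rnorm(n)
y <- 1 + x + rnorm(n)
fit <- lm(y ~ x)
h <- hatvalues(fit); e <- resid(fit)
i <- 7
fit_i <- lm(y[-i] ~ x[-i])                        # refit without observation i
e_del_refit <- y[i] - sum(c(1, x[i]) * coef(fit_i))
e_del_formula <- unname(e[i] / (1 - h[i]))        # closed form from the text
all.equal(e_del_refit, e_del_formula)             # TRUE
```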
\begin{align*}
(n - 1 - p - 1)\,\hat\sigma_{(i)}^2 &= \sum_{j=1}^n (y_j - x_j\hat\beta_{(i)})^2 - (y_i - x_i\hat\beta_{(i)})^2 \\
&= \sum_{j=1}^n \Big(e_j + \frac{h_{ji}e_i}{1 - h_{ii}}\Big)^2 - \Big(\frac{e_i}{1 - h_{ii}}\Big)^2 \quad (7.3) \\
&= \sum_{j=1}^n e_j^2 - \frac{e_i^2}{1 - h_{ii}} = (n - p - 1)\,\hat\sigma^2 - \frac{e_i^2}{1 - h_{ii}}
\end{align*}
It remains to show
(i) $\frac{1}{\sigma^2}(n - 1 - p - 1)\,\hat\sigma_{(i)}^2 \sim \chi^2(n - 1 - p - 1)$
(ii) $(n - 1 - p - 1)\,\hat\sigma_{(i)}^2 \perp \frac{e_i^2}{1 - h_{ii}}$
(iii) $\frac{1}{\sigma^2}\frac{e_i^2}{1 - h_{ii}} \sim \chi^2(1)$.
\[
(n - 1 - p - 1)\,\hat\sigma_{(i)}^2 = y^T(I - H)y - y^TLy.
\]
1 − 1.
From this result, one can consider a t-test using $t_i$ for testing $H_0: \theta = 0$ in the mean-shift outlier model, $E(y_i \mid x_i) = x_i\beta + \theta$. Since $n$ tests will be made,
\[
\mathrm{PRESS} \equiv \sum_{i=1}^n (y_i - \hat y_{i(i)})^2,
\]
where ybi(i) = xi βb(i) and βb(i) is the estimated β using the data without the ith
observation.
• The first expression requires fitting the regression $n$ times, while the second expression requires fitting it only once.
• A scaled residual, ei /(1 − hii ) that corresponds to a large value of hii contributes
more to PRESS. For a given dataset, PRESS may be a better measure than SSE
of how well the model will predict future observations (why?). When the objec-
tive is prediction, one can choose a model with small PRESS among candidate
models.
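A sketch comparing the two ways of computing PRESS on simulated data: $n$ leave-one-out refits versus the single-fit formula $\sum_i \{e_i/(1 - h_{ii})\}^2$ (the data are illustrative):

```r
set.seed(1)
n <- 20
x <- rnorm(n); y <- 1 + x + rnorm(n)
fit <- lm(y ~ x)
h <- hatvalues(fit); e <- resid(fit)
press_onefit <- sum((e / (1 - h))^2)             # single fit
press_refit <- sum(sapply(1:n, function(i) {     # n leave-one-out refits
  fi <- lm(y[-i] ~ x[-i])
  (y[i] - sum(c(1, x[i]) * coef(fi)))^2
}))
all.equal(press_onefit, press_refit)             # TRUE
```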
• Cook's distance
\begin{align*}
D_i &= (\hat y - \hat y_{(i)})^T(\hat y - \hat y_{(i)})/\{(p+1)\hat\sigma^2\} \\
&= (\hat\beta - \hat\beta_{(i)})^T(X^TX)(\hat\beta - \hat\beta_{(i)})/\{(p+1)\hat\sigma^2\}
\end{align*}
Replacing $\hat\beta_{(i)} = \hat\beta - \frac{(X^TX)^{-1}x_i^Te_i}{1 - h_{ii}}$ in $D_i$,
\begin{align*}
D_i &= \frac{1}{(p+1)\hat\sigma^2}\,e_i^2\,\frac{x_i(X^TX)^{-1}x_i^T}{(1 - h_{ii})^2} \\
&= \frac{1}{p+1}\left(\frac{e_i}{\hat\sigma\sqrt{1 - h_{ii}}}\right)^2\frac{h_{ii}}{1 - h_{ii}} \\
&= \frac{r_i^2}{p+1}\,\frac{h_{ii}}{1 - h_{ii}}.
\end{align*}
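A numerical check of the last expression against R's built-in `cooks.distance` (simulated, illustrative data):

```r
set.seed(1)
n <- 25
x1 <- rnorm(n); x2 <- rnorm(n)
y <- 1 + x1 - x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)
h <- hatvalues(fit)
r <- rstandard(fit)                              # studentized residuals r_i
D <- r^2 * h / (length(coef(fit)) * (1 - h))     # r_i^2 h_ii / ((p+1)(1-h_ii))
all.equal(unname(D), unname(cooks.distance(fit)))  # TRUE
```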
• DFFITS
\[
\mathrm{DFFITS}_i = \frac{\hat y_i - \hat y_{i(i)}}{\sqrt{\hat\sigma_{(i)}^2 h_{ii}}}
\]
\[
\hat y_i - \hat y_{i(i)} = x_i(\hat\beta - \hat\beta_{(i)}) = \frac{h_{ii}e_i}{1 - h_{ii}}
\]
\begin{align*}
\mathrm{DFFITS}_i &= \frac{1}{\sqrt{\hat\sigma_{(i)}^2 h_{ii}}}\,\frac{h_{ii}e_i}{1 - h_{ii}} \\
&= \frac{e_i}{\sqrt{\hat\sigma_{(i)}^2(1 - h_{ii})}}\sqrt{\frac{h_{ii}}{1 - h_{ii}}} \\
&= t_i\sqrt{\frac{h_{ii}}{1 - h_{ii}}},
\end{align*}
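Similarly, the closed form $t_i\sqrt{h_{ii}/(1 - h_{ii})}$ matches R's `dffits` (simulated, illustrative data):

```r
set.seed(1)
n <- 25
x <- rnorm(n); y <- 1 + x + rnorm(n)
fit <- lm(y ~ x)
h <- hatvalues(fit)
t_ext <- rstudent(fit)                 # externally studentized residuals t_i
all.equal(unname(t_ext * sqrt(h / (1 - h))), unname(dffits(fit)))  # TRUE
```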
• DFBETAS:
\[
\mathrm{DFBETAS}_{ji} = \frac{\hat\beta_j - \hat\beta_{j(i)}}{\sqrt{\hat\sigma_{(i)}^2\,c_{jj}}},
\]
Alternatively,
\[
\mathrm{DFBETAS}_{ji} = \frac{a_{ji}}{\sqrt{c_{jj}}}\,\frac{t_i}{\sqrt{1 - h_{ii}}},
\]
where $a_{ji}$ is the $(j,i)$th element of $A = (X^TX)^{-1}X^T$. Since
\[
\hat\beta - \hat\beta_{(i)} = \frac{(X^TX)^{-1}x_i^Te_i}{1 - h_{ii}} = \frac{a_ie_i}{1 - h_{ii}},
\]
\[
\mathrm{DFBETAS}_{ji} = \frac{a_{ji}e_i}{(1 - h_{ii})\sqrt{\hat\sigma_{(i)}^2\,c_{jj}}} = \frac{a_{ji}}{\sqrt{c_{jj}}}\,\frac{t_i}{\sqrt{1 - h_{ii}}}.
\]
When $|\mathrm{DFBETAS}_{ji}| > \frac{2}{\sqrt{n}}$, the $i$th data point is considered influential.
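A check of the closed form $a_{ji}t_i/(\sqrt{c_{jj}}\sqrt{1 - h_{ii}})$ against R's `dfbetas` (note: `dfbetas`, the scaled version, not `dfbeta`); the data are simulated and illustrative:

```r
set.seed(1)
n <- 25
x <- rnorm(n); y <- 1 + x + rnorm(n)
fit <- lm(y ~ x)
X <- model.matrix(fit)
XtX_inv <- solve(t(X) %*% X)
A <- XtX_inv %*% t(X)                  # A = (X^T X)^{-1} X^T
cjj <- diag(XtX_inv)                   # c_jj
h <- hatvalues(fit); t_ext <- rstudent(fit)
num <- t(A) * (t_ext / sqrt(1 - h))    # row i scaled by t_i / sqrt(1 - h_ii)
D <- sweep(num, 2, sqrt(cjj), "/")     # divide column j by sqrt(c_jj)
all.equal(unname(D), unname(dfbetas(fit)))   # TRUE
```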
#hat matrix
hii=hat(model.matrix(fit))
#usual residual
ei=gas$y-fit$fitted
#studentized residual
ri=rstandard(fit)
#R studentized residual (external)
ti=rstudent(fit)
par(mfrow=c(2,2), mar=c(4,4,2,1))
plot(fit$fitted, ei, xlab=expression( hat(y)[i]),
ylab=expression(e[i]))
abline(h=0,col='gray')
plot(fit$fitted, ri, xlab=expression( hat(y)[i]),
ylab=expression(r[i]))
abline(h=0,col='gray')
plot(fit$fitted, ti, xlab=expression( hat(y)[i]),
ylab=expression(t[i]))
abline(h=0,col='gray')
plot(1:length(hii), hii, xlab='observation number',
ylab=expression(h[ii]), ylim=c(0,1))
[Figure: 2×2 panel of diagnostic plots — $e_i$, $r_i$, and $t_i$ against $\hat y_i$, and $h_{ii}$ against observation number.]
Influence measures
#cooks distance
cooks.distance(fit)
#DFFITS
dffits(fit)
#DFBETAS
dfbetas(fit)