Hsuhl LRClass Chap2
許湘伶
Applied Linear Regression Models
(Kutner, Nachtsheim, Neter, Li)
Yi = β0 + β1 Xi + εi
Illustration
A market research analyst studying the relation between sales (Y) and advertising expenditures (X):
to obtain an interval estimate of β1, which would provide information as to how many additional sales dollars, on average, are generated by an additional dollar of advertising expenditure
Test:
H0 : β1 = 0;
Ha : β1 ≠ 0.
Point estimator b1:
b1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²
Mean: E{b1} = β1
Variance: σ²{b1} = σ² / Σ(Xi − X̄)²
- b1 is a linear combination of the Yi:
  b1 = Σ ki Yi, where ki = (Xi − X̄) / Σ(Xi − X̄)² is a function of the Xi only
- Properties of the ki:
  Σ ki = 0
  Σ ki Xi = 1
  Σ ki² = 1 / Σ(Xi − X̄)²
Normality
The sampling distribution of b1 is normal, with
Mean: E{b1} = β1
Variance: σ²{b1} = σ² / Σ(Xi − X̄)²
Estimate σ² by MSE:
MSE = SSE/(n − 2) = Σ(Yi − Ŷi)² / (n − 2)
s²{b1} = MSE / Σ(Xi − X̄)²
Theorem 1
The estimator b1 has minimum variance among all unbiased linear estimators of the form
β̂1 = Σ ci Yi
ci: arbitrary constants with Σ ci = 0 and Σ ci Xi = 1
- Unbiased: E{β̂1} = β1
- σ²{β̂1} = σ² Σ ci²
Write ci = ki + di, where ki = (Xi − X̄) / Σ(Xi − X̄)² and the di are arbitrary constants. Then:
Σ ki di = 0
σ² Σ ki² = σ²{b1}
σ²{β̂1} is at a minimum when Σ di² = 0 ⇔ all di = 0 ⇔ ci = ki
hsuhl (NUK) LR Chap 2 13 / 118
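The minimizing step in Theorem 1 rests on a variance decomposition that the slide states only implicitly; written out from the quantities defined above:

```latex
\sigma^2\{\hat\beta_1\}
  = \sigma^2 \sum c_i^2
  = \sigma^2 \sum (k_i + d_i)^2
  = \sigma^2 \sum k_i^2 + \sigma^2 \sum d_i^2 + 2\sigma^2 \sum k_i d_i .
```

Since Σ ki di = 0, this gives σ²{β̂1} = σ²{b1} + σ² Σ di², which is smallest exactly when all di = 0, i.e. ci = ki.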
Sampling distribution of (b1 − β1 )/s{b1 }
Theorem 2
(b1 − β1)/s{b1} ∼ t(n − 2)

Proof of (b1 − β1)/s{b1} ∼ t(n − 2):
Theorem 3
For regression model (2.1),
SSE/σ² ∼ χ²(n − 2), and is independent of b0 and b1.
Theorem 4
t Distribution
Let Z and χ²(ν) be independent random variables (standard normal and χ², respectively). A t random variable is defined as follows:
t(ν) = Z / √(χ²(ν)/ν), where Z and χ²(ν) are indep.
Then
(b1 − β1)/s{b1} = [(b1 − β1)/σ{b1}] ÷ [s{b1}/σ{b1}] ∼ Z / √(χ²(n − 2)/(n − 2))
⇒ (b1 − β1)/s{b1} ∼ t(n − 2)
P{t(α/2; n − 2) ≤ (b1 − β1)/s{b1} ≤ t(1 − α/2; n − 2)} = 1 − α
⇒ P{b1 − t(1 − α/2; n − 2) s{b1} ≤ β1 ≤ b1 + t(1 − α/2; n − 2) s{b1}} = 1 − α
## method 1
sb1<-sqrt(MSE/sum((Size-mean(Size))^2)) # s{b1}
tdf<-qt(1-0.05/2,n-2)                   # t(0.975; n-2), needed below
conf<-c(b1-tdf*sb1,b1+tdf*sb1)          # 95% C.I. for beta1
conf
## method 2
fitreg<-lm(Hrs~Size)
summary(fitreg)
confint(fitreg)
s²{b1} = MSE / Σ(Xi − X̄)² = 0.120404
s{b1} = 0.3470
t(0.975; 23) = 2.069
The 95 percent confidence interval:
2.85 ≤ β1 ≤ 4.29
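Putting the stated values together, the 95 percent limits follow from b1 ± t(0.975; 23) s{b1}:

```latex
3.5702 \pm 2.069(0.3470) = 3.5702 \pm 0.718
\;\Rightarrow\; 2.85 \le \beta_1 \le 4.29 .
```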
Tests concerning β1
(b1 − β1)/s{b1} ∼ t(n − 2)
Two-Sided Test
H0 : β1 = 0 vs. Ha : β1 ≠ 0
Tests concerning β1
α = 0.05; n = 25
t(0.975; 23) = 2.069
b1 = 3.5702
|t*| = |3.5702/0.3470| = 10.29 > 2.069 ⇒ conclude Ha : β1 ≠ 0
p-value: P{t(23) > t* = 10.29} < 0.0005
One-Sided Test
H0 : β1 ≤ 0 vs. Ha : β1 > 0
Sampling distribution of b0
Theorem 5
For regression model (2.1), the sampling distribution of b0 is
normal, with mean and variance:
E{b0 } = β0
σ²{b0} = σ²[1/n + X̄² / Σ(Xi − X̄)²]
An estimator of σ²{b0}:
s²{b0} = MSE[1/n + X̄² / Σ(Xi − X̄)²]
Theorem 6
(b0 − β0)/s{b0} ∼ t(n − 2)
## method 1
sb0<-sqrt(MSE*(1/n+mean(Size)^2/sum((Size-mean(Size))^2))) # s{b0}
b0<-mean(Hrs)-b1*mean(Size)
tdf0<-qt(1-0.1/2,n-2)
conf0<-c(b0-tdf0*sb0,b0+tdf0*sb0)
conf0
## method 2
fitreg<-lm(Hrs~Size)
confint(fitreg,level = 0.9)
Confidence interval for β0
Toluca Company
Range of X : [20,120]
t(0.95; 23) = 1.714
s²{b0} = MSE[1/n + X̄² / Σ(Xi − X̄)²] = 685.34
s{b0} = 26.18
The 90% C.I. for β0 :
17.5 ≤ β0 ≤ 107.2
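Using b0 = 62.37 (the intercept of the fitted Toluca line quoted later in these notes), the 90% limits follow from b0 ± t(0.95; 23) s{b0}:

```latex
62.37 \pm 1.714(26.18) = 62.37 \pm 44.87
\;\Rightarrow\; 17.5 \le \beta_0 \le 107.2 .
```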
Power of Tests
The power of the test at level α: the probability that the decision rule leads to conclusion Ha when Ha in fact holds
Power = P{|t*| > t(1 − α/2; n − 2) | δ}
δ = |β1 − β10| / σ{b1}   (Appendix Table B.5)
Common objective
To estimate the mean of one or more probability distributions of Y.
Illustration
A study of the relation between level of piecework pay (X) and worker productivity (Y).
The mean productivity at high and medium levels of piecework pay may be of particular interest for purposes of analyzing the benefits obtained from an increase in the pay.
Variance:
σ²{Ŷh} = σ²[1/n + (Xh − X̄)² / Σ(Xi − X̄)²]
Properties of Ŷh
When Xh = 0: Var(Ŷh) = Var(b0) and s²{Ŷh} = s²{b0}
b1 and Ȳ are uncorrelated ⇔ σ{Ȳ, b1} = 0
s²{Ŷh} = MSE[1/n + (Xh − X̄)² / Σ(Xi − X̄)²]
Theorem 7
(Ŷh − E{Yh})/s{Ŷh} ∼ t(n − 2)
Example 1
Toluca Company example
Find a 90 percent confidence interval for E{Yh} when the lot size is Xh = 65 units.
Ŷh = 62.37 + 3.5702(65) = 294.4
s²{Ŷh} = 2,384[1/25 + (65 − 70.00)²/19,800] = 98.37 ⇒ s{Ŷh} = 9.918
t(.95; 23) = 1.7141
277.4 ≤ E{Yh } ≤ 311.4
Conclude: the mean number of work hours required when
lots of 65 units are produced is somewhere between 277.4 and
311.4 hours.
the estimate of the mean number of work hours is moderately
precise.
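The interval quoted above follows from Ŷh ± t(0.95; 23) s{Ŷh}, using the values just computed:

```latex
294.4 \pm 1.7141(9.918) = 294.4 \pm 17.0
\;\Rightarrow\; 277.4 \le E\{Y_h\} \le 311.4 .
```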
## method 1
EYhCI<-c(hYh-tdf90*sYh,hYh+tdf90*sYh) # 90% C.I. for E{Yh}
## method 2
fitreg<-lm(Hrs~Size,data=toluca)
summary(fitreg)
predXCI<-predict(fitreg,data.frame(Size = c(Xh)),
interval = "confidence", se.fit = FALSE, level = 0.9)
plot(Size, Hrs)
abline(fitreg,col="blue")
Example 2
Toluca Company example
Estimate E{Yh } for lots with Xh = 100 units with a 90
percent confidence interval
Ŷh = 62.37 + 3.5702(100) = 419.4
s²{Ŷh} = 2,384[1/25 + (100 − 70.00)²/19,800] = 203.72
s{Ŷh} = 14.27
t(.95; 23) = 1.7141
394.9 ≤ E{Yh } ≤ 443.9
Conclude: the confidence interval is somewhat wider than that for Example 1, since the Xh level here is substantially farther from the mean X̄ = 70 than is Xh = 65.
β0 = 0.10 β1 = 0.95
E{Y } = 0.10 + 0.95X
σ = 0.12
An applicant whose high school GPA is Xh = 3.5:
E{Yh} = 0.10 + 0.95(3.5) = 3.425
E{Yh} ± 3σ: 3.425 ± 3(0.12) ⇒ 3.065 ≤ Yh(new) ≤ 3.785
(Yh(new) − Ŷh)/s{pred} ∼ t(n − 2) for normal error regression model (2.1)
s²{pred} = MSE + s²{Ŷh} = MSE[1 + 1/n + (Xh − X̄)² / Σ(Xi − X̄)²]
Example
Toluca Company: Xh = 100
90 percent prediction interval: t(0.95; 23) = 1.714
## Example (p59)
fitreg<-lm(Hrs~Size,data=toluca)
Xh<-100
fitnew<-predict.lm(fitreg,data.frame(Size = c(Xh)),
se.fit = TRUE, level = 0.9)
s2pred<-fitnew$se.fit^2+fitnew$residual.scale^2 # s2{pred} = s2{Yhat_h} + MSE
c(s2pred,sqrt(s2pred))
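Combining the quantities above (MSE = 2,384 and s²{Ŷh} = 203.72 from Example 2) gives the 90 percent prediction limits:

```latex
s^2\{\mathrm{pred}\} = 2{,}384 + 203.72 = 2{,}587.72,
\qquad s\{\mathrm{pred}\} = 50.87,
```

so Ŷh ± t(0.95; 23) s{pred} = 419.4 ± 1.714(50.87), i.e. 332.2 ≤ Yh(new) ≤ 506.6.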
Toluca Company:
The prediction interval for Yh(new) is wider than the C.I. for E{Yh}: in predicting the work hours required for a new lot, we encounter both the variability of Ŷh from sample to sample and the lot-to-lot variation within the probability distribution of Y.
(2.38a):
The prediction interval is wider the further Xh is from X̄
Example
Toluca Company: Xh = 100
90 percent prediction interval for the mean number of work
hours Ȳh(new) in three new production runs
Previous results:
Ŷh = 419.4; s²{Ŷh} = 203.72
MSE = 2,384; t(0.95; 23) = 1.714
⇒ s²{predmean} = MSE/m + s²{Ŷh} = 2,384/3 + 203.72 = 998.4   (m = 3 new runs)
s{predmean} = 31.60
⇒ 419.4 ± 1.714(31.60), i.e. 365.2 ≤ Ȳh(new) ≤ 473.6
SSTO = Σ(Yi − Ȳ)²
If all Yi are the same ⇒ SSTO = 0
The greater the variation among the Yi, the larger is SSTO.
SSE: error sum of squares
SSE = Σ(Yi − Ŷi)²
If all Yi fall on the fitted regression line ⇒ SSE = 0
The greater the variation of the Yi around the fitted regression line, the larger is SSE.
SSR = Σ(Ŷi − Ȳ)²
The regression line is horizontal ⇒ SSR = 0; otherwise SSR > 0
a measure associated with the regression line
The larger SSR is in relation to SSTO, the greater is the
effect of the regression relation in accounting for the total
variation in the Yi observations.
Yi − Ȳ = (Ŷi − Ȳ) + (Yi − Ŷi)
(total deviation) = (deviation of fitted regression value around mean) + (deviation around fitted regression line)
Two components:
- The deviation of the fitted value Ŷi around the mean Ȳ.
- The deviation of the observation Yi around the fitted regression line.
The cross-product term vanishes: 2Σ(Ŷi − Ȳ)(Yi − Ŷi) = 0
SSR = b1² Σ(Xi − X̄)²
Degrees of freedom: SSR has 1; SSE has n − 2 (∵ β0, β1 are estimated in Ŷi)
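The identity SSR = b1² Σ(Xi − X̄)² follows by substituting b0 = Ȳ − b1 X̄ into the fitted values:

```latex
SSR = \sum (\hat Y_i - \bar Y)^2
    = \sum \bigl(b_0 + b_1 X_i - \bar Y\bigr)^2
    = \sum \bigl(b_1 (X_i - \bar X)\bigr)^2
    = b_1^2 \sum (X_i - \bar X)^2 .
```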
Mean Squares
SS      df      MS
SSR     1       MSR = SSR/1 (regression mean square)
SSE     n − 2   MSE = SSE/(n − 2) (error mean square)
SSTO    n − 1   MSTO = SSTO/(n − 1) ≠ MSR + MSE
ANOVA table
SSTO = Σ(Yi − Ȳ)² = ΣYi² − nȲ²
SS(correction for mean) = nȲ²
E{MSE} = σ²
E{MSR} = σ² + β1² Σ(Xi − X̄)²
F test of β1 = 0 vs. β1 ≠ 0
H0 : β1 = 0
Ha : β1 ≠ 0
Test statistic:
F* = MSR/MSE
Large values of F* ⇒ Ha; values of F* near 1 ⇒ H0
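For simple linear regression the F test and the two-sided t test of β1 = 0 are equivalent, since

```latex
F^* = \frac{MSR}{MSE}
    = \frac{b_1^2 \sum (X_i - \bar X)^2}{MSE}
    = \frac{b_1^2}{s^2\{b_1\}}
    = (t^*)^2 ,
```

and correspondingly [t(1 − α/2; n − 2)]² = F(1 − α; 1, n − 2). For the Toluca example, (10.29)² ≈ 105.9.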
Analysis of Variance Approach to Regression Analysis
Cochran’s theorem
If all n observations Yi come from the same normal distribution with mean µ and variance σ², and SSTO is decomposed into k sums of squares SSr, each with degrees of freedom dfr, then the SSr/σ² terms are independent χ² variables with dfr degrees of freedom if
Σ (r = 1 to k) dfr = n − 1
Property
If β1 = 0 so that all Yi have the same mean µ = β0 and the
same variance σ 2 , SSE/σ 2 and SSR/σ 2 are independent χ2
variables.
When H0 holds:
F* = MSR/MSE = [(SSR/σ²)/1] ÷ [(SSE/σ²)/(n − 2)] ∼ [χ²(1)/1] ÷ [χ²(n − 2)/(n − 2)] = F(1, n − 2)
When H0 holds, F* ∼ F(1, n − 2); when Ha holds, F* follows the noncentral F distribution.
The decision rule (α = Type I error rate):
If F* ≤ F(1 − α; 1, n − 2), conclude H0; if F* > F(1 − α; 1, n − 2), conclude Ha
Example
Toluca Company
Earlier test on β1 : (the t test, p46)
Two-Sided Test
H0 : β1 = β10
If |t*| = |(b1 − β10)/s{b1}| ≤ t(1 − α/2; n − 2), conclude H0
If |t*| > t(1 − α/2; n − 2), conclude Ha
Example
H0 : β1 = β10 = 0
Ha : β1 ≠ β10 = 0
We have
F* = MSR/MSE = 252,378/2,384 = 105.9
What is the conclusion?
H0 : β1 = 0   Ha : β1 ≠ 0
1. Full Model
Yi = β0 + β1 Xi + εi
SSE(F):
SSE(F) = Σ[Yi − (b0 + b1 Xi)]² = Σ(Yi − Ŷi)² = SSE
2. Reduced Model
Hypothesis:
H0 : β1 = 0   Ha : β1 ≠ 0
Under H0 the model reduces to Yi = β0 + εi
SSE(R):
SSE(R) = Σ(Yi − b0)² = Σ(Yi − Ȳ)² = SSTO   (in the reduced model, b0 = Ȳ)
3. Test Statistic
SSE(F ) ≤ SSE(R)
The more parameters are in the model, the better one can fit
the data and the smaller are the deviations around the fitted
regression function.
When SSE(F) is not much less than SSE(R), using the full model does not account for much more of the variability of the Yi than does the reduced model: the variation of the observations around the fitted regression function for the full model is almost as great as that around the fitted regression function for the reduced model. This suggests that H0 holds.
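The comparison just described is formalized by the general linear test statistic; specialized to simple regression (dfR = n − 1, dfF = n − 2) it reduces to the ANOVA F*:

```latex
F^* = \frac{SSE(R) - SSE(F)}{df_R - df_F} \div \frac{SSE(F)}{df_F}
    = \frac{SSTO - SSE}{(n-1)-(n-2)} \div \frac{SSE}{n-2}
    = \frac{MSR}{MSE} .
```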
Decision rule:
If F* ≤ F(1 − α; dfR − dfF, dfF), conclude H0; if F* > F(1 − α; dfR − dfF, dfF), conclude Ha
Coefficient of Determination
R² = SSR/SSTO = 1 − SSE/SSTO
R² = 0 ⇒ no linear relation between X and Y:
There is no linear association between X and Y in the sample
data, and the predictor variable X is of no help in reducing the
variation in Yi with linear regression.
R² → 1 ⇒ a stronger linear relation between X and Y:
The closer it is to 1, the greater is said to be the degree of linear
association between X and Y .
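As a worked example, using the Toluca Company values quoted in these notes (SSR = 252,378, and SSTO = 307,203 as reported in the text's ANOVA table for this data set):

```latex
R^2 = \frac{SSR}{SSTO} = \frac{252{,}378}{307{,}203} \approx 0.822 ,
```

i.e., about 82 percent of the variation in work hours is accounted for by lot size.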
Coefficient of Correlation
Marginal Distribution
Y1, Y2 ∼ N2(µ1, µ2, σ1, σ2, ρ12):
Y1 ∼ N(µ1, σ1²) ⇒ f1(Y1) = [1/(√(2π) σ1)] exp{−(1/2)[(Y1 − µ1)/σ1]²}
Y2 ∼ N(µ2, σ2²) ⇒ f2(Y2) = [1/(√(2π) σ2)] exp{−(1/2)[(Y2 − µ2)/σ2]²}
ρ12 = σ{Y1, Y2}/(σ{Y1} σ{Y2}) = σ12/(σ1 σ2)
Y1 ⊥ Y2 ⇒ σ12 = 0 ⇒ ρ12 = 0
If Y1 and Y2 are positively related⇒ σ12 and ρ12 are positive.
−1 ≤ ρ12 ≤ 1
α1|2 = µ1 − µ2 ρ12 (σ1/σ2)
β12 = ρ12 (σ1/σ2)
σ²1|2 = σ1²(1 − ρ12²)
(ρ12 = 0 ⇒ Y1 ⊥ Y2)
Test statistic:
t* = r12 √(n − 2) / √(1 − r12²) ∼ t(n − 2)
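This t* is numerically identical to the t* for testing β1 = 0, since r12² equals R² in simple regression:

```latex
(t^*)^2 = \frac{(n-2)\, r_{12}^2}{1 - r_{12}^2}
        = \frac{(n-2)\, R^2}{1 - R^2}
        = \frac{SSR}{SSE/(n-2)}
        = \frac{MSR}{MSE}
        = F^* .
```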
With z′ = (1/2) ln[(1 + r12)/(1 − r12)] (the Fisher z transformation), ς the corresponding transform of ρ12, and σ{z′} = 1/√(n − 3):
(z′ − ς)/σ{z′} ∼ N(0, 1) approximately
⇒ z′ ± z(1 − α/2) σ{z′}
A C.I. for ρ12 can be employed to test whether or not ρ12 has
a specified value. (ex. 0.5)
0 ≤ ρ212 ≤ 1: measures the relative reduction in the variability
of Y2 associated with the use of variable Y1 .
ρ12² = (σ1² − σ²1|2)/σ1² = (σ2² − σ²2|1)/σ2²
Test: (two-sided)