
Chapter 2

Inferences in Regression and Correlation Analysis

許湘伶
Applied Linear Regression Models
(Kutner, Nachtsheim, Neter, Li)



Inferences concerning:
- the regression parameters β0 and β1: interval estimation of β0 and β1, and tests about them
- interval estimation of the mean E{Y} of the probability distribution of Y, for given X
- prediction intervals for a new observation Y
- confidence bands for the regression line
- the analysis of variance approach to regression analysis
- the general linear test approach
- descriptive measures of association
- the correlation coefficient


Normal error regression model

Assume that the normal error regression model is applicable:

Yi = β0 + β1 Xi + εi

β0 and β1 are parameters
Xi are known constants
εi ∼ N(0, σ²) and are independent
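As a quick illustration of this setup, the model can be simulated directly; the parameter values in the following R sketch are hypothetical, chosen only to show the mechanics:

# simulate Yi = beta0 + beta1*Xi + eps_i with hypothetical parameters
set.seed(1)
n     <- 25
X     <- seq(20, 120, length.out = n)    # known constants
beta0 <- 10; beta1 <- 3; sigma <- 2      # hypothetical true values
eps   <- rnorm(n, mean = 0, sd = sigma)  # independent N(0, sigma^2) errors
Y     <- beta0 + beta1 * X + eps
coef(lm(Y ~ X))                          # estimates should be near (10, 3)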


Inferences Concerning β1

Inferences about the slope β1 of the regression line

Illustration
A market research analyst studying the relation between sales (Y) and advertising expenditures (X) may wish:
- to obtain an interval estimate of β1, which provides information as to how many additional sales dollars, on average, are generated by an additional dollar of advertising expenditure

Test:

H0 : β1 = 0;
Ha : β1 ≠ 0.


Inferences Concerning β1

Inferences about the slope β1 of the regression line (cont.)

Figure 1 : Regression Model when β1 = 0

β1 = 0 ⇒ no linear association between Y and X:
- The regression line is horizontal.
- The means of Y: E{Y} = β0.
- The probability distributions of Y are identical at all levels of X.
Sampling Distribution of b1

Point estimator b1

b1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²

The sampling distribution of b1 refers to the different values of b1 that would be obtained with repeated sampling when the levels of the predictor variable X are held constant from sample to sample.


Sampling Distribution of b1

Point estimator b1 (cont.)

Sampling distribution of b1 (requires the properties of the ki)

For the normal error regression model:
  Mean: E{b1} = β1
  Variance: σ²{b1} = σ² / Σ(Xi − X̄)²

b1 is a linear combination of the Yi:
  b1 = Σ ki Yi
  ki = (Xi − X̄) / Σ(Xi − X̄)² : a function of the Xi


Sampling Distribution of b1

Point estimator b1 (cont.)

Properties of the ki:
  Σ ki = 0
  Σ ki Xi = 1
  Σ ki² = 1 / Σ(Xi − X̄)²
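These identities are easy to verify numerically; the X values in this R sketch are arbitrary illustrative constants:

X <- c(30, 50, 70, 90, 110)                    # arbitrary illustrative X values
k <- (X - mean(X)) / sum((X - mean(X))^2)      # the ki
sum(k)                                         # = 0
sum(k * X)                                     # = 1
all.equal(sum(k^2), 1 / sum((X - mean(X))^2))  # TRUE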


Sampling Distribution of b1

Normality

Yi: independently, normally distributed

A linear combination of independent normal random variables is normally distributed.


Sampling Distribution of b1

Normality (cont.)

Normality properties
  Mean: E{b1} = β1
  Variance: σ²{b1} = σ² / Σ(Xi − X̄)²


Sampling Distribution of b1

Normality

Normality properties (cont.)

The unbiased estimator of σ²:

  MSE = SSE/(n − 2) = Σ(Yi − Ŷi)²/(n − 2)

Estimated variance of b1:

  s²{b1} = MSE / Σ(Xi − X̄)²

The point estimator s²{b1} is an unbiased estimator of σ²{b1}.
The point estimator of σ{b1}: s{b1}
Sampling Distribution of b1

b1: unbiased linear estimator

Theorem 1
The estimator b1 has minimum variance among all unbiased linear estimators of the form
  β̂1 = Σ ci Yi
where the ci are arbitrary constants with Σ ci = 0 and Σ ci Xi = 1 (required for unbiasedness).
  - Unbiased: E{β̂1} = β1
  - σ²{β̂1} = σ² Σ ci²
  - Write ci = ki + di, where ki = (Xi − X̄)/Σ(Xi − X̄)² and the di are arbitrary constants.
  - Then Σ ki di = 0, so
      σ²{β̂1} = σ² Σ ki² + σ² Σ di² = σ²{b1} + σ² Σ di²
  - σ²{β̂1} is at a minimum when Σ di² = 0 ⇔ all di = 0 ⇔ ci = ki


Sampling distribution of (b1 − β1)/s{b1}

Sampling distribution of (b1 − β1)/s{b1}

Theorem 2
  (b1 − β1)/s{b1} ∼ t(n − 2)

If the observations Yi come from the same normal population, (Ȳ − µ)/s{Ȳ} ∼ t(n − 1).
Here two parameters (β0, β1) are estimated, hence n − 2 degrees of freedom.


Sampling distribution of (b1 − β1 )/s{b1 }

Sampling distribution of (b1 − β1)/s{b1} (cont.)

Proof of (b1 − β1)/s{b1} ∼ t(n − 2):

Theorem 3
For regression model (2.1),
  SSE/σ² ∼ χ²(n − 2),
and is independent of b0 and b1.


Sampling distribution of (b1 − β1 )/s{b1 }

Sampling distribution of (b1 − β1)/s{b1} (cont.)

Theorem 4 (t Distribution)
Let Z (standard normal) and χ²(ν) be independent random variables. Then
  t(ν) = Z / [χ²(ν)/ν]^(1/2)
is a t random variable with ν degrees of freedom.


Sampling distribution of (b1 − β1 )/s{b1 }

Sampling distribution of (b1 − β1)/s{b1} (cont.)

1. (b1 − β1)/σ{b1} ∼ Z (standard normal variable)
2. s²{b1}/σ²{b1} ∼ χ²(n − 2)/(n − 2)
3. Hence
     (b1 − β1)/s{b1} = [(b1 − β1)/σ{b1}] ÷ [s{b1}/σ{b1}] ∼ Z / [χ²(n − 2)/(n − 2)]^(1/2)
   where Z and χ² are independent:
4. Z is a function of b1
5. b1 is independent of SSE/σ² ∼ χ²

⇒ (b1 − β1)/s{b1} ∼ t(n − 2)


Confidence interval for β1

Confidence interval for β1

(b1 − β1)/s{b1} ∼ t(n − 2)

P{t(α/2; n − 2) ≤ (b1 − β1)/s{b1} ≤ t(1 − α/2; n − 2)} = 1 − α
⇒ P{b1 − t(1 − α/2; n − 2) s{b1} ≤ β1 ≤ b1 + t(1 − α/2; n − 2) s{b1}} = 1 − α

t(α/2; n − 2): the (α/2)100 percentile of the t distribution with n − 2 d.f.
Confidence interval for β1

Confidence interval for β1 (cont.)

Symmetric: t(α/2; n − 2) = −t(1 − α/2; n − 2)

The 1 − α confidence limits for β1 are:

b1 ± t(1 − α/2; n − 2)s{b1 }



Confidence interval for β1

Ex: Confidence interval for β1

The Toluca Company: an estimate of β1 with 95 percent confidence coefficient.
#### Example p46
toluca<-read.table("toluca.txt",header=T)
attach(toluca)
## method 1
n<-length(Size)
alpha<-0.05
tdf<-qt(1-alpha/2,n-2)
Sxx<-sum((Size-mean(Size))^2)
b1<-sum((Size-mean(Size))*(Hrs-mean(Hrs)))/Sxx
b0<-mean(Hrs)-b1*mean(Size)
c(b0,b1)
SSE<-sum((Hrs-(b0+b1*Size))^2)
MSE<-SSE/(n-2)
Confidence interval for β1

Ex: Confidence interval for β1 (cont.)

sb1<-sqrt(MSE/sum((Size-mean(Size))^2)) #s{b1}
conf<-c(b1-tdf*sb1,b1+tdf*sb1)
conf
## method 2
fitreg<-lm(Hrs~Size)
summary(fitreg)
confint(fitreg)
s²{b1} = MSE / Σ(Xi − X̄)² = 0.120404
s{b1} = 0.3470
t(0.975; 23) = 2.069
The 95 percent confidence interval:
  2.85 ≤ β1 ≤ 4.29




Confidence interval for β1

Tests concerning β1

(b1 − β1)/s{b1} ∼ t(n − 2)

Two-Sided Test

H0 : β1 = 0 vs. Ha : β1 ≠ 0

The test statistic:
  t* = b1 / s{b1}

The decision rule (level of significance α):
  If |t*| ≤ t(1 − α/2; n − 2), conclude H0
  If |t*| > t(1 − α/2; n − 2), conclude Ha


Confidence interval for β1

Tests concerning β1

α = 0.05; n = 25
t(0.975; 23) = 2.069
b1 = 3.5702
|t*| = |3.5702/0.3470| = 10.29 > 2.069
⇒ conclude Ha: β1 ≠ 0

p-value: P{t(23) > t* = 10.29} < 0.0005
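A minimal R sketch of this two-sided test, reusing b1, sb1, and n from the Toluca code above:

tstar <- b1 / sb1                              # t* = b1 / s{b1}
tstar                                          # 10.29
abs(tstar) > qt(1 - 0.05/2, n - 2)             # TRUE -> conclude Ha: beta1 != 0
2 * pt(abs(tstar), n - 2, lower.tail = FALSE)  # two-sided p-value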
Confidence interval for β1

Tests concerning β1

One-Sided Test

H0 : β1 ≤ 0 vs. Ha : β1 > 0

The test statistic:
  t* = b1 / s{b1}

The decision rule (level of significance α):
  If t* ≤ t(1 − α; n − 2), conclude H0
  If t* > t(1 − α; n − 2), conclude Ha


Confidence interval for β1

Tests concerning β1

Two-Sided Test

H0 : β1 = β10 vs. Ha : β1 ≠ β10

The test statistic:
  t* = (b1 − β10) / s{b1}

The decision rule (level of significance α):
  If |t*| ≤ t(1 − α/2; n − 2), conclude H0
  If |t*| > t(1 − α/2; n − 2), conclude Ha


Inferences concerning β0

Sampling distribution of b0

The point estimator b0: b0 = Ȳ − b1 X̄

Theorem 5
For regression model (2.1), the sampling distribution of b0 is normal, with mean and variance:
  E{b0} = β0
  σ²{b0} = σ² [1/n + X̄² / Σ(Xi − X̄)²]


Inferences concerning β0

Sampling distribution of b0 (cont.)

An estimator of σ²{b0}:
  s²{b0} = MSE [1/n + X̄² / Σ(Xi − X̄)²]

An estimator of σ{b0}: s{b0}

Theorem 6
  (b0 − β0)/s{b0} ∼ t(n − 2)


Confidence interval for β0

Confidence interval for β0

The 1 − α confidence limits for β0 are:

b0 ± t(1 − α/2; n − 2)s{b0 }

## method 1
sb0<-sqrt(MSE*(1/n+mean(Size)^2/sum((Size-mean(Size))^2))) # s{b0}
b0<-mean(Hrs)-b1*mean(Size)
tdf0<-qt(1-0.1/2,n-2)  # 90 percent interval
conf0<-c(b0-tdf0*sb0,b0+tdf0*sb0)
conf0
## method 2
fitreg<-lm(Hrs~Size)
confint(fitreg,level = 0.9)
Confidence interval for β0

Example C.I. for β0

Toluca Company
  Range of X: [20, 120]
  t(0.95; 23) = 1.714
  s²{b0} = MSE [1/n + X̄² / Σ(Xi − X̄)²] = 685.34
  s{b0} = 26.18
The 90% C.I. for β0:
  17.5 ≤ β0 ≤ 107.2


Confidence interval for β0

Example C.I. for β0 (cont.)

This interval does not necessarily provide information about the "setup" cost, since we are not certain whether a linear regression model is appropriate when the scope of the model is extended to X = 0.
Some considerations on making inferences concerning β0
and β1

Departures from Normality

If the probability distributions of Y are not exactly normal but do not depart seriously from normality, the sampling distributions of b0 and b1 will be approximately normal, and the use of the t distribution will provide approximately the specified confidence coefficient or level of significance.


Some considerations on making inferences concerning β0
and β1

Departures from Normality (cont.)

Even if the distributions of Y are far from normal, the estimators b0 and b1 generally have the property of asymptotic normality: their distributions approach normality under very general conditions as the sample size increases.


Some considerations on making inferences concerning β0
and β1

Power of Tests

H0 : β1 = β10 vs. Ha : β1 ≠ β10

Test statistic: t* = (b1 − β10) / s{b1}

The power of this test at level α is the probability that the decision rule leads to conclusion Ha when Ha in fact holds:
  Power = P{|t*| > t(1 − α/2; n − 2) | δ}


Some considerations on making inferences concerning β0
and β1

Power of Tests (cont.)

δ: the noncentrality measure, i.e., a measure of how far the true value of β1 is from β10:
  δ = |β1 − β10| / σ{b1}    (Appendix Table B.5)
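In R, this power can be computed from the noncentral t distribution; the δ value below is hypothetical, for illustration only:

alpha <- 0.05; n <- 25
delta <- 1.0                     # hypothetical noncentrality |beta1 - beta10| / sigma{b1}
tc    <- qt(1 - alpha/2, n - 2)  # two-sided critical value
# power = P(|t*| > tc) where t* ~ noncentral t(n-2, ncp = delta) under Ha
pt(-tc, n - 2, ncp = delta) + pt(tc, n - 2, ncp = delta, lower.tail = FALSE)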


Some considerations on making inferences concerning β0
and β1

Common objective
To estimate the mean for one or more probability distributions of Y.

Illustration
A study of the relation between level of piecework pay (X) and worker productivity (Y):
The mean productivity at high and medium levels of piecework pay may be of particular interest for purposes of analyzing the benefits obtained from an increase in the pay.


Some considerations on making inferences concerning β0
and β1

Xh: the level of X for which we wish to estimate the mean response
  - may be a value which occurred in the sample
  - or another value of the predictor variable within the scope of the model
E{Yh}: the mean response when X = Xh
Ŷh: the point estimator of E{Yh}:
  Ŷh = b0 + b1 Xh

What is the sampling distribution of Ŷh?


Some considerations on making inferences concerning β0
and β1

Sampling distribution of Ŷh

For normal error regression model (2.1), the sampling distribution of Ŷh is normal:
  Mean: E{Ŷh} = E{Yh}
  Variance: σ²{Ŷh} = σ² [1/n + (Xh − X̄)² / Σ(Xi − X̄)²]

Ŷh is a linear combination of the observations Yi.
Ŷh is an unbiased estimator of E{Yh}.


Some considerations on making inferences concerning β0
and β1

Figure 2 : Effect on Ŷh of Variation in b1 from Sample to Sample in Two Samples with Same Means Ȳ and X̄


Some considerations on making inferences concerning β0
and β1

Properties of Ŷh

When Xh = 0:
  Var{Ŷh} = Var{b0}
  s²{Ŷh} = s²{b0}
b1 and Ȳ are uncorrelated ⇔ σ{Ȳ, b1} = 0


Some considerations on making inferences concerning β0
and β1

Properties of Ŷh (cont.)

When MSE is substituted for σ², the estimated variance of Ŷh is:
  s²{Ŷh} = MSE [1/n + (Xh − X̄)² / Σ(Xi − X̄)²]

The estimated standard deviation of Ŷh: s{Ŷh}


Interval Estimation of E{Yh }

Sampling Distribution of (Ŷh − E{Yh })/s{Ŷh }

Theorem 7
  (Ŷh − E{Yh}) / s{Ŷh} ∼ t(n − 2)

The 1 − α confidence limits for E{Yh} are:
  Ŷh ± t(1 − α/2; n − 2) s{Ŷh}


Interval Estimation of E{Yh }

Example 1
Toluca Company example
  Find a 90 percent confidence interval for E{Yh} when the lot size is Xh = 65 units.
  Ŷh = 62.37 + 3.5702(65) = 294.4
  s²{Ŷh} = 2,384 [1/25 + (65 − 70.00)²/19,800] = 98.37 ⇒ s{Ŷh} = 9.918
  t(0.95; 23) = 1.7141
  277.4 ≤ E{Yh} ≤ 311.4
Conclude: the mean number of work hours required when lots of 65 units are produced is somewhere between 277.4 and 311.4 hours; the estimate of the mean number of work hours is moderately precise.


Interval Estimation of E{Yh }

Code for Example 1

#### Example p54


toluca<-read.table("toluca.txt",header=T)
attach(toluca)
## method 1
n<-length(Size)
alpha<-0.1
tdf90<-qt(1-alpha/2,n-2)
Sxx<-sum((Size-mean(Size))^2)
b1<-sum((Size-mean(Size))*(Hrs-mean(Hrs)))/Sxx
b0<-mean(Hrs)-b1*mean(Size)
c(b0,b1)
SSE<-sum((Hrs-(b0+b1*Size))^2)
MSE<-SSE/(n-2)
sb1<-sqrt(MSE/sum((Size-mean(Size))^2)) #s{b1}
Xh<-65
hYh<-b0+b1* Xh
sYh<-sqrt(MSE*(1/n+(Xh-mean(Size))^2/sum((Size-mean(Size))^2)))
Interval Estimation of E{Yh }

Code for Example 1 (cont.)

EYhCI<-c(hYh-tdf90*sYh,hYh+tdf90*sYh)

## method 2
fitreg<-lm(Hrs~Size,data=toluca)
summary(fitreg)
predXCI<-predict(fitreg,data.frame(Size = c(Xh)),
interval = "confidence", se.fit = F,level = 0.9)
plot(Size, Hrs)
abline(fitreg,col="blue")

# confidence interval at X_h
points(Xh, predXCI[, "fit"],col="red",pch=15)
points(Xh, predXCI[, "lwr"],col="red",pch=4)  # lower limit
points(Xh, predXCI[, "upr"],col="red",pch=4)  # upper limit


Interval Estimation of E{Yh }

Example 2
Toluca Company example
  Estimate E{Yh} for lots with Xh = 100 units with a 90 percent confidence interval.
  Ŷh = 62.37 + 3.5702(100) = 419.4
  s²{Ŷh} = 2,384 [1/25 + (100 − 70.00)²/19,800] = 203.72
  s{Ŷh} = 14.27
  t(0.95; 23) = 1.7141
  394.9 ≤ E{Yh} ≤ 443.9
Conclude: the confidence interval is somewhat wider than that for Example 1, since the Xh level here is substantially farther from the mean X̄ = 70 than Xh = 65 is.


Interval Estimation of E{Yh }

Sampling distribution of Ŷh

The variance of Ŷh is smallest when Xh = X̄.
In an experiment to estimate the mean response at a particular level Xh of the predictor variable, the precision of the estimate will be greatest if the observations on X are spaced so that X̄ = Xh.
The usual relationship between C.I. and tests applies in inferences concerning the mean response: the two-sided confidence limits can be utilized for two-sided tests concerning the mean response at Xh. Alternatively, a regular decision rule can be set up.
The confidence limits for a mean response E{Yh} are not sensitive to moderate departures from the assumption that the error terms are normally distributed.


Prediction of New Observation

Prediction of New Observation

Toluca Company: the next lot to be produced consists of 100 units; the company wishes to predict the number of work hours for this particular lot.

Another example: a regression relation between company sales and number of persons 16 or more years old has been estimated from data for the past 10 years; we wish to predict next year's company sales.


Prediction of New Observation

Prediction of New Observation

College admissions example
The relevant parameters of the regression model are known:
  β0 = 0.10, β1 = 0.95
  E{Y} = 0.10 + 0.95X
  σ = 0.12
An applicant whose high school GPA is Xh = 3.5:
  E{Yh} = 0.10 + 0.95(3.5) = 3.425
E{Yh} ± 3σ:
  3.425 ± 3(0.12) ⇒ 3.065 ≤ Yh(new) ≤ 3.785


Prediction of New Observation

The basic idea of a prediction interval is to choose a range in the distribution of Y wherein most of the observations will fall, and then to declare that the next observation will fall in this range.
When the regression parameters of normal error regression model (2.1) are known, the 1 − α prediction limits for Yh(new) are:
  E{Yh} ± z(1 − α/2) σ


Prediction of New Observation

Prediction Interval for Yh(new) when Parameters Unknown

Figure 3 (Figure 2.5 in the text): Prediction of Yh(new) when Parameters Unknown


Prediction of New Observation

Prediction Interval for Yh(new) when Parameters Unknown


(cont.)

Since we cannot be certain of the location of the distribution of Y, prediction limits for Yh(new) clearly must take account of two elements (Figure 2.5):
  - variation in the possible location of the distribution of Y
  - variation within the probability distribution of Y
Prediction limits for a new observation Yh(new) at a given Xh are obtained by means of:

Theorem 8
  (Yh(new) − Ŷh) / s{pred} ∼ t(n − 2)  for normal error regression model (2.1)


Prediction of New Observation

Prediction Interval for Yh(new) when Parameters Unknown


(cont.)

The 1 − α prediction limits for Yh(new):
  Ŷh ± t(1 − α/2; n − 2) s{pred}

The difference Yh(new) − Ŷh may be viewed as the prediction error, with Ŷh serving as the best point estimate of the value of the new observation Yh(new).
σ²{pred}: the variance of the prediction error:
  σ²{pred} = σ²{Yh(new) − Ŷh} = σ² + σ²{Ŷh}


Prediction of New Observation

Prediction Interval for Yh(new) when Parameters Unknown


(cont.)

σ²{pred} has two components:
  - the variance of the distribution of Y at X = Xh: σ²
  - the variance of the sampling distribution of Ŷh: σ²{Ŷh}

An unbiased estimator of σ²{pred}:
  s²{pred} = MSE + s²{Ŷh} = MSE [1 + 1/n + (Xh − X̄)² / Σ(Xi − X̄)²]


Prediction of New Observation

Example
Toluca Company: Xh = 100
90 percent prediction interval: t(0.95; 23) = 1.714
  Ŷh = 419.4, s²{Ŷh} = 203.72, MSE = 2,384
  ⇒ s²{pred} = 2,384 + 203.72 = 2,587.72
  s{pred} = 50.87

The 90 percent prediction interval for Yh(new):
  332.2 ≤ Yh(new) ≤ 506.6


Prediction of New Observation

## Example (p59)
fitreg<-lm(Hrs~Size,data=toluca)
Xh<-100
fitnew<-predict.lm(fitreg,data.frame(Size = c(Xh)),
                   se.fit = TRUE, level = 0.9)
# s^2{pred} = s^2{Yh-hat} + MSE; residual.scale is sqrt(MSE)
s2pred<-fitnew$se.fit^2+fitnew$residual.scale^2
c(s2pred,sqrt(s2pred))
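Alternatively, predict() can return the prediction limits directly; a short sketch:

predict(fitreg, data.frame(Size = Xh),
        interval = "prediction", level = 0.9)  # columns fit, lwr, upr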


Prediction of New Observation

This prediction interval is rather wide and may not be useful for planning worker requirements for the next lot.
The interval can still be useful for control purposes: if the actual work hours fall outside the prediction limits, this is an alert that a change in the production process may have occurred.


Prediction of New Observation

Comparing Yh(new) and E{Yh }

Toluca Company:
The prediction interval for Yh(new) is wider than the C.I. for E{Yh}: in predicting the work hours required for a new lot, we encounter both the variability in Ŷh from sample to sample and the lot-to-lot variation within the probability distribution of Y.
By (2.38a), the prediction interval is wider the further Xh is from X̄.


Prediction of New Observation

Prediction of Mean of m New Observations for Given Xh

Predict the mean of m new observations on Y for a given Xh.
Ȳh(new): the mean of the m new Y observations to be predicted
The new Y observations are assumed independent.
The appropriate 1 − α prediction limits:
  Ŷh ± t(1 − α/2; n − 2) s{predmean}

  s²{predmean} = MSE/m + s²{Ŷh}
               = MSE [1/m + 1/n + (Xh − X̄)² / Σ(Xi − X̄)²]

Two components of s²{predmean}:
Prediction of New Observation

Prediction of Mean of m New Observations for Given Xh


(cont.)

  - the variance of the mean of m observations from the probability distribution of Y at X = Xh
  - the variance of the sampling distribution of Ŷh


Prediction of New Observation

Prediction of Mean of m New Observations for Given Xh


(cont.)

Example
Toluca Company: Xh = 100
90 percent prediction interval for the mean number of work hours Ȳh(new) in three new production runs.
Previous results:
  Ŷh = 419.4; s²{Ŷh} = 203.72
  MSE = 2,384; t(0.95; 23) = 1.714
  ⇒ s²{predmean} = 2,384/3 + 203.72 = 998.4
  s{predmean} = 31.60


Prediction of New Observation

Prediction of Mean of m New Observations for Given Xh


(cont.)

The prediction interval for Ȳh(new):
  365.2 ≤ Ȳh(new) ≤ 473.6

The total number of work hours for the three lots:
  1,095.6 ≤ Total work hours ≤ 1,420.8
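A short R sketch of this calculation, reusing fitnew from the prediction-interval code above (m = 3 new runs):

m <- 3
s2predmean <- fitnew$residual.scale^2 / m + fitnew$se.fit^2  # MSE/m + s^2{Yh-hat}
spredmean  <- sqrt(s2predmean)                               # 31.60
fitnew$fit + c(-1, 1) * qt(0.95, 23) * spredmean             # ~ (365.2, 473.6)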


Analysis of Variance Approach to Regression Analysis

Partition of Total Sum of Squares

The analysis of variance approach is based on the partitioning of sums of squares and degrees of freedom associated with Y.
The variation is measured in terms of the deviations of the Yi around their mean Ȳ:
  Yi − Ȳ




Analysis of Variance Approach to Regression Analysis

Partition of Total Sum of Squares (cont.)

Total variation (SSTO: total sum of squares):
  SSTO = Σ(Yi − Ȳ)²
  If all Yi are the same ⇒ SSTO = 0.
  The greater the variation among the Yi, the larger is SSTO.

SSE: error sum of squares
  SSE = Σ(Yi − Ŷi)²
  If all Yi fall on the fitted regression line ⇒ SSE = 0.
  The greater the variation of the Yi around the fitted regression line, the larger is SSE.


Analysis of Variance Approach to Regression Analysis

Partition of Total Sum of Squares (cont.)

SSR: regression sum of squares
  SSR = Σ(Ŷi − Ȳ)²
  If the regression line is horizontal ⇒ SSR = 0; otherwise SSR > 0.
  SSR is a measure associated with the regression line.
  The larger SSR is in relation to SSTO, the greater is the effect of the regression relation in accounting for the total variation in the Yi observations.


Analysis of Variance Approach to Regression Analysis

Formal Development of Partitioning

The total deviation:
  Yi − Ȳ  =  (Ŷi − Ȳ)  +  (Yi − Ŷi)
  (total deviation around mean = deviation of fitted regression value around mean + deviation around fitted regression line)

Two components:
  - the deviation of the fitted value Ŷi around the mean Ȳ
  - the deviation of the observation Yi around the fitted regression line


Analysis of Variance Approach to Regression Analysis

Formal Development of Partitioning (cont.)

  Σ(Yi − Ȳ)² = Σ(Ŷi − Ȳ)² + Σ(Yi − Ŷi)²
     SSTO         SSR           SSE

(The cross-product term 2 Σ(Ŷi − Ȳ)(Yi − Ŷi) = 0.)
SSR = b1² Σ(Xi − X̄)²

Degrees of freedom (df):
  SS     df      explanation
  SSTO   n − 1   (∵ Σ(Yi − Ȳ) = 0)
  SSR    1
  SSE    n − 2   (∵ β0, β1 estimated in Ŷi)


Analysis of Variance Approach to Regression Analysis

Mean Squares

Mean square (MS): a sum of squares divided by its associated df

  SS    df      MS
  SSR   1       MSR = SSR/1 (regression mean square)
  SSE   n − 2   MSE = SSE/(n − 2) (error mean square)

Mean squares are not additive:
  SSTO/(n − 1) ≠ MSR + MSE


Analysis of Variance Approach to Regression Analysis

ANOVA table

Table 1 : ANOVA Table for Simple Linear Regression

  Source of Variation  SS                  df     MS                E{MS}
  Regression           SSR = Σ(Ŷi − Ȳ)²   1      MSR = SSR/1       σ² + β1² Σ(Xi − X̄)²
  Error                SSE = Σ(Yi − Ŷi)²  n − 2  MSE = SSE/(n−2)   σ²
  Total                SSTO = Σ(Yi − Ȳ)²  n − 1


Analysis of Variance Approach to Regression Analysis

ANOVA table (cont.)

Modified ANOVA table:
  SSTO = Σ(Yi − Ȳ)² = Σ Yi² − n Ȳ²

SSTOU: total uncorrected sum of squares:
  SSTOU = Σ Yi²

Correction-for-mean sum of squares:
  SS(correction for mean) = n Ȳ²


Analysis of Variance Approach to Regression Analysis

ANOVA table (cont.)

Table 2 : Modified ANOVA Table for Simple Linear Regression

  Source of Variation   SS                               df     MS
  Regression            SSR = Σ(Ŷi − Ȳ)²                1      MSR = SSR/1
  Error                 SSE = Σ(Yi − Ŷi)²               n − 2  MSE = SSE/(n−2)
  Total                 SSTO = Σ(Yi − Ȳ)²               n − 1
  Correction for mean   SS(correction for mean) = n Ȳ²  1
  Total, uncorrected    SSTOU = Σ Yi²                    n


Analysis of Variance Approach to Regression Analysis

Expected Mean Squares

E{MSE} = σ²
E{MSR} = σ² + β1² Σ(Xi − X̄)²

MSE is an unbiased estimator of σ².

Implications:
  - The mean of the sampling distribution of MSE is σ².
  - The mean of the sampling distribution of MSR is σ² when β1 = 0.
  - When β1 ≠ 0, E{MSR} > E{MSE} = σ². (∵ β1² Σ(Xi − X̄)² > 0)


Analysis of Variance Approach to Regression Analysis

F test of β1 = 0 vs. β1 ≠ 0

The analysis of variance approach provides us with a battery of highly useful tests for regression models.
For the simple linear regression case, the ANOVA provides us with a test of:
  H0 : β1 = 0
  Ha : β1 ≠ 0

Test statistic:
  F* = MSR / MSE

Large values of F* support Ha; values of F* near 1 support H0.
Analysis of Variance Approach to Regression Analysis

F test of β1 = 0 vs. β1 ≠ 0 (cont.)

Cochran's theorem
If all n observations Yi come from the same normal distribution with mean µ and variance σ², and SSTO is decomposed into k sums of squares SSr, each with degrees of freedom dfr, then the SSr/σ² terms are independent χ² variables with dfr degrees of freedom if
  Σ(r = 1 to k) dfr = n − 1


Analysis of Variance Approach to Regression Analysis

F test of β1 = 0 vs. β1 ≠ 0 (cont.)

Property
If β1 = 0, so that all Yi have the same mean µ = β0 and the same variance σ², then SSE/σ² and SSR/σ² are independent χ² variables.

When H0 holds:
  F* = [SSR/σ²]/1 ÷ [SSE/σ²]/(n − 2) = MSR/MSE ∼ [χ²(1)/1] ÷ [χ²(n − 2)/(n − 2)] ∼ F(1, n − 2)

When Ha holds, F* follows the noncentral F distribution.


Analysis of Variance Approach to Regression Analysis

Construction of Decision Rule

Under H0, F* ∼ F(1, n − 2).
The decision rule (α = Type I error):
  If F* ≤ F(1 − α; 1, n − 2), conclude H0
  If F* > F(1 − α; 1, n − 2), conclude Ha
where F(1 − α; 1, n − 2) is the (1 − α)100 percentile of the appropriate F distribution.


Analysis of Variance Approach to Regression Analysis

Example

Toluca Company
Earlier test on β1 (the t test, p. 46):
Two-Sided Test, H0 : β1 = β10:
  If |t*| = |(b1 − β10)/s{b1}| ≤ t(1 − α/2; n − 2), conclude H0
  If |t*| > t(1 − α/2; n − 2), conclude Ha


Analysis of Variance Approach to Regression Analysis

Example

Using the F test:
  H0 : β1 = β10 = 0
  Ha : β1 ≠ β10 = 0

α = 0.05; n = 25; F(0.95; 1, 23) = 4.28
  If F* ≤ 4.28, conclude H0

We have
  F* = MSR/MSE = 252,378/2,384 = 105.9
What is the conclusion?
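In R, anova() on the fitted model reports F* and its p-value; a minimal check with the Toluca fit:

anova(fitreg)    # row "Size": F value = MSR/MSE = 105.9
qf(0.95, 1, 23)  # critical value F(0.95; 1, 23) = 4.28
# F* = 105.9 > 4.28 -> conclude Ha: beta1 != 0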


Analysis of Variance Approach to Regression Analysis

Equivalence of F test and t Test

For a given α level, the F test of
  H0 : β1 = 0 vs. Ha : β1 ≠ 0
is algebraically equivalent to the two-tailed t test:

  F* = b1²/s²{b1} = (b1/s{b1})² = (t*)²
  [t(1 − α/2; n − 2)]² = F(1 − α; 1, n − 2)

t test: two-tailed; F test: one-tailed.
General Linear Test Approach

Steps for General Linear Test Approach

1. Fit the full model and obtain SSE(F).
2. Fit the reduced model under H0 and obtain SSE(R).
3. Use the test statistic and decision rule.


General Linear Test Approach

1. Full Model

The full or unrestricted model:
  Yi = β0 + β1 Xi + εi

SSE(F):
  SSE(F) = Σ[Yi − (b0 + b1 Xi)]² = Σ(Yi − Ŷi)² = SSE


General Linear Test Approach

2. Reduced Model

Hypotheses:
  H0 : β1 = 0 vs. Ha : β1 ≠ 0

The reduced or restricted model when H0 holds:
  Yi = β0 + εi

SSE(R) (in the reduced model the least squares estimator of β0 is Ȳ):
  SSE(R) = Σ(Yi − b0)² = Σ(Yi − Ȳ)² = SSTO


General Linear Test Approach

3. Test Statistic

SSE(F) ≤ SSE(R)

The more parameters in the model, the better one can fit the data, and the smaller are the deviations around the fitted regression function.


General Linear Test Approach

3. Test Statistic (cont.)

When SSE(F) is close to SSE(R), the variation of the observations around the fitted regression function for the full model is almost as great as the variation around the fitted regression function for the reduced model; using the full model does not account for much more of the variability of the Yi than does the reduced model.
⇒ This suggests that the reduced model is adequate, i.e., that H0 holds.


General Linear Test Approach

3. Test Statistic (cont.)

A small difference SSE(R) − SSE(F) suggests that H0 holds.
⇔ A large difference suggests that Ha holds.

Test statistic, a function of SSE(R) − SSE(F):
  F* = [SSE(R) − SSE(F)]/(dfR − dfF) ÷ SSE(F)/dfF
which follows the F distribution when H0 holds.

Decision rule:
  If F* ≤ F(1 − α; dfR − dfF, dfF), conclude H0
  If F* > F(1 − α; dfR − dfF, dfF), conclude Ha


General Linear Test Approach

3. Test Statistic (cont.)

For testing whether or not β1 = 0, we have:
  SSE(R) = SSTO, SSE(F) = SSE
  dfR = n − 1, dfF = n − 2
  ⇒ F* = MSR/MSE
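The general linear test can be run in R by fitting both models and comparing them with anova(); a sketch with the Toluca data:

full    <- lm(Hrs ~ Size, data = toluca)  # full model
reduced <- lm(Hrs ~ 1,    data = toluca)  # reduced model under H0: beta1 = 0
anova(reduced, full)  # F* = [SSE(R)-SSE(F)]/(dfR-dfF) / [SSE(F)/dfF]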


Descriptive Measures of Linear Association between X
and Y

Reason

The usefulness of estimates or predictions depends upon the width of the interval and upon the user's needs for precision, which vary from one application to another.
No single descriptive measure of the "degree of linear association" can capture the essential information as to whether a given regression relation is useful in any particular application.


Descriptive Measures of Linear Association between X
and Y

Coefficient of Determination

SSTO is a measure of the uncertainty in predicting Y when X is not considered.
SSE measures the variation in the Yi when a regression model utilizing the predictor variable X is employed.
A natural measure of the effect of X in reducing the variation in Y is to express the reduction in variation (SSTO − SSE = SSR) as a proportion of the total variation:

  R² = SSR/SSTO = 1 − SSE/SSTO


Descriptive Measures of Linear Association between X
and Y

Coefficient of Determination (cont.)

R²: the coefficient of determination
  0 ≤ SSE ≤ SSTO
  ⇒ 0 ≤ R² = SSR/SSTO = 1 − SSE/SSTO ≤ 1

SSE = 0 ⇒ R² = 1 (if all Yi = Ŷi)
SSR = 0 ⇒ R² = 0 (if all Ŷi = Ȳ, i.e., b1 = 0: no linear relation between X and Y)
Descriptive Measures of Linear Association between X
and Y

Coefficient of Determination (cont.)

R² = 0 ⇒ no linear relation between X and Y:
  There is no linear association between X and Y in the sample data, and the predictor variable X is of no help in reducing the variation in the Yi with linear regression.
R² → 1 ⇒ the linear relation between X and Y is stronger:
  The closer R² is to 1, the greater is said to be the degree of linear association between X and Y.


Descriptive Measures of Linear Association between X
and Y

Coefficient of Determination (cont.)

The Toluca Company:


fitreg<-lm(Hrs~Size,data=toluca)
anova(fitreg)
summary(fitreg)



Descriptive Measures of Linear Association between X
and Y

Coefficient of Determination (cont.)

The variation in work hours is reduced by 82.2% when lot size is considered.




Descriptive Measures of Linear Association between X
and Y

Coefficient of Determination (cont.)

R² is widely used for describing the usefulness of a regression model.
Serious misunderstandings:
  - A high R² indicates that useful predictions can be made. (Not necessarily correct; e.g., the prediction interval at Xh = 100 was still wide.)
  - A high R² indicates that the estimated regression line is a good fit. (Not necessarily correct; the relation may be curvilinear.)
  - An R² near 0 indicates that X and Y are not related. (Not necessarily correct; the relation may be curvilinear.)




Descriptive Measures of Linear Association between X
and Y

Coefficient of Correlation

A measure of linear association between Y and X when both Y and X are random is the coefficient of correlation:
  r = ±√R²

A plus or minus sign is attached to this measure according to whether the slope of the fitted regression line is positive or negative.
  −1 ≤ r ≤ 1
The wider the Xi are spaced, the higher R² will tend to be.
SSR: the "explained variation" in Y; SSE: the "unexplained variation".
R² is interpreted in terms of the proportion of the total variation (SSTO) in Y which has been "explained" by X.
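A quick numerical check in R with the Toluca fit (r carries the sign of b1):

R2 <- summary(fitreg)$r.squared
sign(coef(fitreg)[2]) * sqrt(R2)  # equals cor(Size, Hrs)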
Distinction between Regression and Correlation Model

Normal Correlation Models

Should the X values be assumed to be known constants?
  The confidence coefficients and risks of errors refer to repeated sampling in which the X values are kept the same from sample to sample.

Frequently, it may not be appropriate to consider the X values as known constants:
  - one cannot control daily temperatures
  - "height of person" vs. "weight of person"
For such cases we use a correlation model: the normal correlation model.


Distinction between Regression and Correlation Model

Bivariate Normal Distribution

Two variables Y1, Y2: bivariate normal distribution.

Density Function
The density function of the bivariate normal distribution:

  f(Y1, Y2) = 1/(2π σ1 σ2 √(1 − ρ12²)) ×
              exp{ −1/[2(1 − ρ12²)] [ ((Y1 − µ1)/σ1)² − 2ρ12 ((Y1 − µ1)/σ1)((Y2 − µ2)/σ2) + ((Y2 − µ2)/σ2)² ] }

Five parameters: µ1, µ2, σ1, σ2, ρ12




Distinction between Regression and Correlation Model

Bivariate Normal Distribution (cont.)

Marginal Distributions

If Y1, Y2 ∼ N2(µ1, µ2, σ1, σ2, ρ12):
  Y1 ∼ N(µ1, σ1²) ⇒ f1(Y1) = 1/(√(2π) σ1) exp{ −(1/2) ((Y1 − µ1)/σ1)² }
  Y2 ∼ N(µ2, σ2²) ⇒ f2(Y2) = 1/(√(2π) σ2) exp{ −(1/2) ((Y2 − µ2)/σ2)² }

When Y1, Y2 ∼ N2(µ1, µ2, σ1, σ2, ρ12) ⇒ Y1 ∼ N(µ1, σ1²); Y2 ∼ N(µ2, σ2²).
The converse is not generally true.


Distinction between Regression and Correlation Model

Bivariate Normal Distribution (cont.)

ρ12: the coefficient of correlation between Y1 and Y2:
  ρ12 = σ{Y1, Y2}/(σ{Y1} σ{Y2}) = σ12/(σ1 σ2)

Y1 ⊥ Y2 ⇒ σ12 = 0 ⇒ ρ12 = 0
If Y1 and Y2 are positively related ⇒ σ12 and ρ12 are positive.
−1 ≤ ρ12 ≤ 1


Distinction between Regression and Correlation Model

Conditional Probability Distribution of Y1

If Y1, Y2 ∼ N2(µ1, µ2, σ1, σ2, ρ12), with Y1 ∼ N(µ1, σ1²) and Y2 ∼ N(µ2, σ2²), the density function of the conditional probability distribution of Y1 for a given value of Y2 is normal, Y1|Y2 ∼ N(α1|2 + β12 Y2, σ²1|2):

  f(Y1|Y2) = f(Y1, Y2)/f2(Y2) = 1/(√(2π) σ1|2) exp{ −(1/2) ((Y1 − α1|2 − β12 Y2)/σ1|2)² }

  α1|2 = µ1 − µ2 ρ12 (σ1/σ2)
  β12 = ρ12 (σ1/σ2)
  σ²1|2 = σ1² (1 − ρ12²)


Distinction between Regression and Correlation Model

Conditional Probability Distribution of Y1 (cont.)

α1|2: the intercept of the line of regression of Y1 on Y2
β12: the slope of this line
The conditional distribution of Y1, given Y2, is equivalent to the normal error regression model (1.24).


Distinction between Regression and Correlation Model

Conditional Probability Distribution of Y1 (cont.)


Three important characteristics of the conditional distributions of
Y1 :
1 normal: slice a bivariate normal distribution vertically; scaled
its area;

hsuhl (NUK) LR Chap 2 108 / 118


Distinction between Regression and Correlation Model

Conditional Probability Distribution of Y1 (cont.)

2. The means of the conditional probability distributions of Y1 fall on a straight line:
  E{Y1|Y2} = α1|2 + β12 Y2
3. All conditional probability distributions have the same standard deviation σ1|2.

⇒ Equivalence to the normal error regression model.


Distinction between Regression and Correlation Model

Conditional Probability Distribution of Y1 (cont.)

Can we still use regression model (2.1) if Y1 and Y2 are not bivariate normal? Yes, provided that:
1. The conditional distributions of the Yi, given Xi, are normal and independent, with conditional means β0 + β1 Xi and conditional variance σ².
2. The Xi are independent random variables whose probability distribution does not involve the parameters β0, β1, σ².


Distinction between Regression and Correlation Model

Inferences on Correlation Coefficients

To study the relationship between two variables: ρ12

MLE of ρ12:

  r12 = Σ(Yi1 − Ȳ1)(Yi2 − Ȳ2) / [Σ(Yi1 − Ȳ1)² Σ(Yi2 − Ȳ2)²]^(1/2)

r12 is a biased estimator of ρ12.
−1 ≤ r12 ≤ 1


Distinction between Regression and Correlation Model

Inferences on Correlation Coefficients (cont.)


Test
Assume the population is bivariate normal:
  H0 : ρ12 = 0 vs. Ha : ρ12 ≠ 0
(ρ12 = 0 ⇒ Y1 ⊥ Y2)
  ⇐⇒ H0 : β12 = 0 vs. Ha : β12 ≠ 0
  ⇐⇒ H0 : β21 = 0 vs. Ha : β21 ≠ 0

Test statistic:
  t* = r12 √(n − 2) / √(1 − r12²) ∼ t(n − 2)


Distinction between Regression and Correlation Model

Inferences on Correlation Coefficients

This test statistic is identical to the regression t* statistic b1/s{b1}.
(∵ r = √(SSR/SSTO) = b1 [Σ(Xi − X̄)²/SSTO]^(1/2))
The appropriate decision rule to control the Type I error at α:
  If |t*| ≤ t(1 − α/2; n − 2), conclude H0
  If |t*| > t(1 − α/2; n − 2), conclude Ha


Distinction between Regression and Correlation Model

Inferences on Correlation Coefficients (cont.)

Example
A national oil company: service station gasoline sales vs. sales of auxiliary products
  - 23 of its service stations
  - average monthly sales data: Y1 = gasoline sales; Y2 = sales of auxiliary products and services
  - r12 = 0.52; α = 0.05
To test whether or not the association is positive:
  H0 : ρ12 ≤ 0 vs. Ha : ρ12 > 0
t* = 2.79 > t(0.95; 21) = 1.721 ⇒ conclude Ha (P-value = 0.006)
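This test can be reproduced in R from the summary statistics alone, or run directly with cor.test() when the raw data are available (y1, y2 below are placeholders for the two sales variables):

r12 <- 0.52; n <- 23
tstar <- r12 * sqrt(n - 2) / sqrt(1 - r12^2)  # 2.79
pt(tstar, n - 2, lower.tail = FALSE)          # one-sided p-value ~ 0.006
# with raw data: cor.test(y1, y2, alternative = "greater")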


Distinction between Regression and Correlation Model

Interval Estimation of ρ12

The Fisher z transformation:
  z′ = (1/2) ln[(1 + r12)/(1 − r12)]

When n is large (n ≥ 25), the sampling distribution of z′ is approximately normal, N(E{z′}, σ²{z′}):
  E{z′} = ζ = (1/2) ln[(1 + ρ12)/(1 − ρ12)]    (2.90)
  σ²{z′} = 1/(n − 3)    (depends only on n)    (2.91)

(z′, ζ): Table B.8


Distinction between Regression and Correlation Model

Interval Estimation of ρ12 (cont.)

Interval estimate (n ≥ 25):
  (z′ − ζ)/σ{z′} ∼ N(0, 1)
  ⇒ z′ ± z(1 − α/2) σ{z′}

The 1 − α confidence limits for ρ12 are obtained by transforming the limits on ζ back via (2.90).
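A compact R sketch of this interval; atanh() and tanh() implement the Fisher z transformation and its inverse. The r12 and n from the oil company example are reused purely for illustration (note n = 23 is below the n ≥ 25 guideline, so the approximation is rough):

r12 <- 0.52; n <- 23; alpha <- 0.10
zp   <- atanh(r12)                 # z' = (1/2) log((1 + r12)/(1 - r12))
se   <- 1 / sqrt(n - 3)            # sigma{z'}
zlim <- zp + c(-1, 1) * qnorm(1 - alpha/2) * se
tanh(zlim)                         # back-transformed limits for rho12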


Distinction between Regression and Correlation Model

Interval Estimation of ρ12 (cont.)

A C.I. for ρ12 can be employed to test whether or not ρ12 has a specified value, e.g., 0.5.
0 ≤ ρ12² ≤ 1: ρ12² measures the relative reduction in the variability of Y2 associated with the use of variable Y1 (and symmetrically for Y1):
  ρ12² = (σ1² − σ²1|2)/σ1²
  ρ12² = (σ2² − σ²2|1)/σ2²


Distinction between Regression and Correlation Model

Spearman Rank Correlation Coefficient

When no appropriate transformations can be found, a nonparametric rank correlation procedure may be useful for making inferences about the association between Y1 and Y2.
The Spearman rank correlation coefficient is the ordinary Pearson product-moment correlation coefficient computed on the ranks Ri1, Ri2:

  rs = Σ(Ri1 − R̄1)(Ri2 − R̄2) / [Σ(Ri1 − R̄1)² Σ(Ri2 − R̄2)²]^(1/2)

Test (two-sided):
  H0 : There is no association between Y1 and Y2
  Ha : There is an association between Y1 and Y2

  ⇒ t* = rs √(n − 2) / √(1 − rs²) ∼ t(n − 2)
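In R, cor.test() computes rs and this test directly; the vectors below are hypothetical placeholder data:

y1 <- c(5.1, 4.8, 6.0, 5.5, 4.2, 6.3, 5.0, 4.9)  # hypothetical data
y2 <- c(2.0, 1.7, 2.4, 2.1, 1.5, 2.6, 1.9, 1.8)
cor.test(y1, y2, method = "spearman")
# equivalently: cor(rank(y1), rank(y2))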
