
Lecture 12


Linear Regression:
Test and Confidence Intervals

Fall  2013  
Prof.  Yao  Xie,  yao.xie@isye.gatech.edu  
H.  Milton  Stewart  School  of  Industrial  Systems  &  Engineering  
Georgia  Tech

1
Outline
• Properties of β̂₁ and β̂₀ as point estimators
• Hypothesis  test  on  slope  and  intercept  
• Confidence  intervals  of  slope  and  intercept  
• Real  example:  house  prices  and  taxes  

2
Regression analysis
• Step  1:  graphical  display  of  data  —  scatter  plot:  sales  
vs.  advertisement  cost  
• calculate  correlation
3
• Step  2:  find  the  relationship  or  association  between  
Sales  and  Advertisement  Cost  —  Regression

4
Simple linear regression
Based on the scatter diagram, it is probably reasonable to assume that the mean of the
random variable Y is related to X by the following simple linear regression model:

Yᵢ = β₀ + β₁xᵢ + εᵢ,  i = 1, 2, …, n,  εᵢ ~ N(0, σ²)

Response: Yᵢ.  Regressor or predictor: xᵢ.  Intercept: β₀.  Slope: β₁.  Random error: εᵢ.

where the slope and intercept of the line are called regression coefficients.

• The case of simple linear regression considers a single regressor or predictor x and a
dependent or response variable Y.

5
Regression coefficients

Given data (x₁, y₁), (x₂, y₂), …, (xₙ, yₙ), let (sums over i = 1, …, n)

S_xx = Σ(xᵢ − x̄)² = Σxᵢ² − (Σxᵢ)²/n   (11-10)

S_xy = Σ(yᵢ − ȳ)(xᵢ − x̄) = Σxᵢyᵢ − (Σxᵢ)(Σyᵢ)/n   (11-11)

The least squares estimates of the regression coefficients are

β̂₁ = S_xy / S_xx

β̂₀ = ȳ − β̂₁x̄

Fitted (estimated) regression model:

ŷᵢ = β̂₀ + β̂₁xᵢ
Caveat: regression relationships are valid only for values of the regressor variable within the range of the original data. Be careful with extrapolation. 6
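The least-squares formulas above can be sketched in a few lines of Python (a minimal illustration; the toy data below are invented, not from the lecture):

```python
def least_squares_fit(x, y):
    """Least squares estimates via S_xy / S_xx (Equations 11-10 and 11-11)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    Sxx = sum(xi * xi for xi in x) - sx * sx / n           # sum (x_i - xbar)^2
    Sxy = sum(xi * yi for xi, yi in zip(x, y)) - sx * sy / n
    b1 = Sxy / Sxx                                         # slope estimate
    b0 = sy / n - b1 * sx / n                              # intercept: ybar - b1 * xbar
    return b0, b1

# toy data lying exactly on y = 2 + 3x
b0, b1 = least_squares_fit([1, 2, 3, 4], [5, 8, 11, 14])
print(b0, b1)  # -> 2.0 3.0
```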
Estimation of variance
• Using the fitted model, we can estimate the value of the response variable for a given predictor:

ŷᵢ = β̂₀ + β̂₁xᵢ

• Residuals: rᵢ = yᵢ − ŷᵢ
• Our model: Yᵢ = β₀ + β₁Xᵢ + εᵢ, i = 1,…,n, Var(εᵢ) = σ²
• Unbiased estimator (MSE: Mean Square Error)

σ̂² = MSE = Σᵢ₌₁ⁿ rᵢ² / (n − 2)

7
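The variance estimate can be sketched the same way (toy numbers again, invented for illustration; the coefficients are assumed to have been fitted already):

```python
def sigma2_hat(x, y, b0, b1):
    """Unbiased error-variance estimate: MSE = sum of squared residuals / (n - 2)."""
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    return sum(r * r for r in residuals) / (len(x) - 2)

# toy data: y = 2 + 3x except the third point, which is off by +1
x = [1, 2, 3, 4]
y = [5, 8, 12, 14]
print(sigma2_hat(x, y, 2.0, 3.0))  # residuals 0, 0, 1, 0 -> SSE = 1, MSE = 0.5
```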
Punchline
• the coefficients β̂₁ and β̂₀ are both calculated from data, and they are subject to error.
• if the true model is y = β₁x + β₀, then β̂₁ and β̂₀ are point estimators for the true coefficients.
• we can talk about the ``accuracy'' of β̂₁ and β̂₀

8
Assessing linear regression model
• Test  hypothesis  about  true  slope  and  intercept  
β₁ = ?, β₀ = ?

• Construct confidence intervals

β₁ ∈ [β̂₁ − a, β̂₁ + a],  β₀ ∈ [β̂₀ − b, β̂₀ + b]  with probability 1 − α

• Assume the errors are normally distributed:

εᵢ ~ N(0, σ²)
9
Properties of Regression Estimators

• Because β̂₁ is a linear combination of the observations Yᵢ, we can show that the least squares estimators are unbiased:

E(β̂₁) = β₁   (11-15)

and, in a similar manner, E(β̂₀) = β₀.

• Since we have assumed that V(εᵢ) = σ², it follows that

V(β̂₁) = σ²/S_xx   (11-16)

V(β̂₀) = σ² [1/n + x̄²/S_xx]   (11-17)

• The covariance of the random variables β̂₀ and β̂₁ is not zero; it can be shown (see Exercise 11-98) that cov(β̂₀, β̂₁) = −σ²x̄/S_xx.

• The estimate of σ² can be used in Equations 11-16 and 11-17 to provide estimates of the variance of the slope and the intercept. We call the square roots of the resulting estimators the estimated standard errors of the slope and intercept, respectively.

10
Estimated standard errors of coefficients
• We can replace σ² with its estimator σ̂²:

σ̂² = MSE = Σᵢ₌₁ⁿ rᵢ² / (n − 2),  where rᵢ = yᵢ − ŷᵢ and ŷᵢ = β̂₀ + β̂₁xᵢ

• Using the results from the previous page, estimate the standard error of the coefficients. In simple linear regression the estimated standard errors of the slope and intercept are

se(β̂₁) = √(σ̂²/S_xx)  and  se(β̂₀) = √(σ̂² [1/n + x̄²/S_xx])

where σ̂² is computed from Equation 11-13.
11
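These two standard-error formulas translate directly into code. The numerical check below plugs in values from the oxygen purity example worked later in the deck (σ̂² = 1.18, S_xx = 0.68088, n = 20, and x̄ = 1.196 for that data set):

```python
import math

def se_slope(sigma2, Sxx):
    """Estimated standard error of the slope."""
    return math.sqrt(sigma2 / Sxx)

def se_intercept(sigma2, n, xbar, Sxx):
    """Estimated standard error of the intercept."""
    return math.sqrt(sigma2 * (1.0 / n + xbar * xbar / Sxx))

print(round(se_slope(1.18, 0.68088), 3))                  # approx 1.317
print(round(se_intercept(1.18, 20, 1.196, 0.68088), 3))   # approx 1.593
```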
Hypothesis test in simple linear regression

An important part of assessing the adequacy of a linear regression model is testing statistical hypotheses about the model parameters and constructing certain confidence intervals.

• we wish to test the hypothesis whether the slope equals a constant, say, β₁,₀:

H₀: β₁ = β₁,₀
H₁: β₁ ≠ β₁,₀   (11-18)

(a two-sided alternative). To test this hypothesis we must make the additional assumption that the error component is normally distributed; the complete assumptions are that the errors are NID(0, σ²). It then follows that the observations Yᵢ are NID(β₀ + β₁xᵢ, σ²), and β̂₁ is a linear combination of the Yᵢ.

• e.g. relate ads to sales: we are interested in studying whether or not an increase of $ on ads will increase $ in sales
• sale = a · ads + constant?

12
A related and important question: whether or not the slope is zero? Significance of regression

H₀: β₁ = 0
H₁: β₁ ≠ 0   (11-23)

• if β₁ = 0, that means Y does not depend on X, i.e., there is no linear relationship between X and Y
• Y and X are independent
• Failure to reject H₀: β₁ = 0 may imply either that x is of little value in explaining the variation in Y (the best estimator of Y for any x is ŷ = ȳ), or that the true relationship between x and Y is not linear. Alternatively, if H₀: β₁ = 0 is rejected, x is of value in explaining the variability in Y: either the straight-line model is adequate, or better results could be obtained by adding higher-order terms in x.
• In the advertisement example: does ads increase sales? or no effect?

13
[Figure: two scatter plots of y versus x, panels (a) and (b)]

• (a) H₀ not rejected
• (b) H₀ rejected

14
Use t-test for slope

• Under H₀: β₁ = β₁,₀, the test statistic

T₀ = (β̂₁ − β₁,₀) / √(σ̂²/S_xx)   (11-19)

follows the t distribution with n − 2 degrees of freedom. (Since the errors are NID(0, σ²), β̂₁ ~ N(β₁, σ²/S_xx), (n − 2)σ̂²/σ² has a chi-square distribution with n − 2 degrees of freedom, and β̂₁ is independent of σ̂².)

• Reject H₀ (two-sided test) if

|t₀| > t_{α/2,n−2}   (11-20)

where t₀ is computed from Equation 11-19. Since the denominator of Equation 11-19 is just the estimated standard error of the slope, we could also write the test statistic as T₀ = (β̂₁ − β₁,₀)/se(β̂₁).

15
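The slope test is a one-liner in code. The critical value t_{α/2,n−2} comes from a t table (hard-coded here to keep the sketch dependency-free); the numbers are those of the deck's oxygen purity example:

```python
import math

def t_slope(b1_hat, b1_0, sigma2, Sxx):
    """Test statistic of Equation 11-19."""
    return (b1_hat - b1_0) / math.sqrt(sigma2 / Sxx)

# oxygen purity example: test H0: beta1 = 0 at alpha = 0.01
t0 = t_slope(14.947, 0.0, 1.18, 0.68088)
t_crit = 2.88                 # t_{0.005,18} from a t table
print(round(t0, 2), abs(t0) > t_crit)  # -> 11.35 True, so reject H0
```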
Example: Oxygen Purity Tests of Coefficients (Example 11-2)

We will test for significance of regression using the model for the oxygen purity data from Example 11-1. The hypotheses are

H₀: β₁ = 0
H₁: β₁ ≠ 0

and we will use α = 0.01. From Example 11-1 and Table 11-2 we have

n = 20,  S_xx = 0.68088,  β̂₁ = 14.947,  σ̂² = 1.18

• Calculate the test statistic; Equation 11-19 becomes

t₀ = β̂₁ / √(σ̂²/S_xx) = 14.947 / √(1.18/0.68088) = 11.35

Practical interpretation: since t_{0.005,18} = 2.88, the value of the test statistic falls in the critical region, so H₀: β₁ = 0 is rejected. There is strong evidence to support this claim; the P-value for this test is P ≈ 1.23 × 10⁻⁹. Table 11-2 presents the Minitab output, which reports the same t-statistic and P-value; it also reports the intercept test, t₀ = 46.62, so the hypothesis of a zero intercept is rejected as well.

Table 11-1: Hydrocarbon level x (%) and oxygen purity y (%)

Number  x (%)  y (%)
1  0.99  90.01
2  1.02  89.05
3  1.15  91.43
4  1.29  93.74
5  1.46  96.73
6  1.36  94.45
7  0.87  87.59
8  1.23  91.77
9  1.55  99.42
10  1.40  93.65
11  1.19  93.54
12  1.15  92.52
13  0.98  90.56
14  1.01  89.54
15  1.11  89.85
16  1.20  90.39
17  1.26  93.25
18  1.32  93.41
19  1.43  94.98
20  0.95  87.33

[Figure 11-1: scatter diagram of oxygen purity y versus hydrocarbon level x from Table 11-1]

16
Use t-test for intercept

• Use a similar form of test for hypotheses about the intercept:

H₀: β₀ = β₀,₀
H₁: β₀ ≠ β₀,₀   (11-21)

• Test statistic:

T₀ = (β̂₀ − β₀,₀) / √(σ̂² [1/n + x̄²/S_xx]) = (β̂₀ − β₀,₀) / se(β̂₀)   (11-22)

• Under H₀, T₀ ~ t distribution with n − 2 degrees of freedom.
• Reject the null hypothesis if the computed value of this test statistic satisfies |t₀| > t_{α/2,n−2}. Note that the denominator of the test statistic in Equation 11-22 is just the standard error of the intercept.

17
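As a numerical check, the whole oxygen purity analysis can be reproduced from Table 11-1 in a few lines (a sketch; the comments quote the deck's rounded values):

```python
import math

# Table 11-1: hydrocarbon level x (%) and oxygen purity y (%)
x = [0.99, 1.02, 1.15, 1.29, 1.46, 1.36, 0.87, 1.23, 1.55, 1.40,
     1.19, 1.15, 0.98, 1.01, 1.11, 1.20, 1.26, 1.32, 1.43, 0.95]
y = [90.01, 89.05, 91.43, 93.74, 96.73, 94.45, 87.59, 91.77, 99.42, 93.65,
     93.54, 92.52, 90.56, 89.54, 89.85, 90.39, 93.25, 93.41, 94.98, 87.33]

n = len(x)
Sxx = sum(v * v for v in x) - sum(x) ** 2 / n
Sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
b1 = Sxy / Sxx                      # slope, approx 14.947
b0 = sum(y) / n - b1 * sum(x) / n   # intercept, approx 74.283
sigma2 = sum((b - b0 - b1 * a) ** 2 for a, b in zip(x, y)) / (n - 2)  # approx 1.18

xbar = sum(x) / n
t_slope = b1 / math.sqrt(sigma2 / Sxx)                              # approx 11.35
t_intercept = b0 / math.sqrt(sigma2 * (1 / n + xbar ** 2 / Sxx))    # approx 46.62
print(b1, sigma2, t_slope, t_intercept)
```

Both t-statistics exceed their critical values, so the hypotheses of a zero slope and a zero intercept are both rejected, matching the example.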
Class activity
Given the regression line:
y = 22.2 + 10.5x, estimated for x = 1, 2, 3, …, 20

1. The estimated slope is:
A. β̂₁ = 22.2  B. β̂₁ = 10.5  C. biased

2. The predicted value for x* = 10 is
A. y* = 22.2  B. y* = 127.2  C. y* = 32.7

3. The predicted value for x* = 40 is
A. y* = 442.2  B. y* = 127.2  C. Cannot extrapolate

18
Class activity
1. The estimated slope is significantly different from zero when

A. β̂₁√S_XX / σ̂ > t_{α/2,n−2}
B. β̂₁√S_XX / σ̂ < t_{α/2,n−2}
C. β̂₁²S_XX / σ̂² > F_{α/2,n−1,1}

2. The estimated intercept is plausibly zero when

A. Its confidence interval contains 0.
B. β̂₀√S_XX / σ̂ < t_{α/2,n−2}
C. β̂₀ / (σ̂ √(1/n + x̄²/S_xx)) > t_{α/2,n−2}
19
Confidence interval

• we can obtain confidence interval estimates of the slope and the intercept
• width of confidence interval is a measure of the overall quality of the regression

slope:  T₀ = (β̂₁ − β₁) / √(σ̂²/S_xx)  ~ t distribution with n − 2 degrees of freedom

intercept:  T₀ = (β̂₀ − β₀) / √(σ̂² [1/n + x̄²/S_xx])  ~ t distribution with n − 2 degrees of freedom

(here β₁ and β₀ are the true parameters)

20
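The interval forms that follow from these statistics can be sketched directly; the numerical check uses the oxygen purity values quoted in the deck's worked example (β̂₁ = 14.947, σ̂² = 1.18, S_xx = 0.68088, t_{0.025,18} = 2.101):

```python
import math

def slope_ci(b1, sigma2, Sxx, t_crit):
    """100(1-alpha)% CI on the slope; t_crit = t_{alpha/2, n-2} from a t table."""
    half = t_crit * math.sqrt(sigma2 / Sxx)
    return b1 - half, b1 + half

def intercept_ci(b0, sigma2, n, xbar, Sxx, t_crit):
    """100(1-alpha)% CI on the intercept."""
    half = t_crit * math.sqrt(sigma2 * (1.0 / n + xbar ** 2 / Sxx))
    return b0 - half, b0 + half

# oxygen purity example: 95% CI on the slope
lo, hi = slope_ci(14.947, 1.18, 0.68088, 2.101)
print(round(lo, 3), round(hi, 3))  # -> 12.181 17.713
```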
Confidence intervals

These statistics are both distributed as t random variables with n − 2 degrees of freedom. This leads to the following definition of 100(1 − α)% confidence intervals on the slope and intercept.

Under the assumption that the observations are normally and independently distributed, a 100(1 − α)% confidence interval on the slope β₁ in simple linear regression is

β̂₁ − t_{α/2,n−2} √(σ̂²/S_xx) ≤ β₁ ≤ β̂₁ + t_{α/2,n−2} √(σ̂²/S_xx)   (11-29)

Similarly, a 100(1 − α)% confidence interval on the intercept β₀ is

β̂₀ − t_{α/2,n−2} √(σ̂² [1/n + x̄²/S_xx]) ≤ β₀ ≤ β̂₀ + t_{α/2,n−2} √(σ̂² [1/n + x̄²/S_xx])   (11-30)

21

Example: Oxygen Purity Confidence Interval on the Slope (Example 11-4)

We will find a 95% confidence interval (α = 0.05) on the slope of the regression line using the data in Example 11-1. Recall that β̂₁ = 14.947, S_xx = 0.68088, and σ̂² = 1.18 (see Table 11-2). Then, from Equation 11-29 we find

β̂₁ − t_{0.025,18} √(σ̂²/S_xx) ≤ β₁ ≤ β̂₁ + t_{0.025,18} √(σ̂²/S_xx)

or

14.947 − 2.101 √(1.18/0.68088) ≤ β₁ ≤ 14.947 + 2.101 √(1.18/0.68088)

This simplifies to

12.181 ≤ β₁ ≤ 17.713

The confidence interval does not include 0, so there is enough evidence (at α = 0.05) that the slope is not zero, i.e., that there is correlation between X and Y. Practical interpretation: the CI is reasonably narrow (±2.766) because the error variance is fairly small.

22
Example: house selling price and annual taxes

Wellington ["Prediction, Linear Regression, and a Minimum Sum of Relative Errors" (Vol. 19, 1977)] presents data on the selling price and annual taxes for 24 houses. The data are shown in the following table.

Sale Price/1000  Taxes (Local, School, County)/1000
25.9  4.9176
29.5  5.0208
27.9  4.5429
25.9  4.5573
29.9  5.0597
29.9  3.8910
30.9  5.8980
28.9  5.6039
35.9  5.8282
31.5  5.3003
31.0  6.2712
30.9  5.9592
30.0  5.0500
36.9  8.2464
41.9  6.6969
40.5  7.7841
43.9  9.0384
37.5  5.9894
37.9  7.5422
44.5  8.7951
37.9  6.0831
38.9  8.3607
36.9  8.1400
45.8  9.1416

Independent variable X: SalePrice
Dependent variable Y: Taxes
23
x " 7.50. CARAVAN 2WD 201
(c) Calculate the fitted value of y corresponding to x "
• qualitative analysis
• calculate correlation: r = 0.8760

24
Independent variable X: SalePrice
Dependent variable Y: Taxes

n = 24,  x̄ = 34.6125,  ȳ = 6.4049

S_xx = Σ(xᵢ − x̄)² = 829.0462

S_xy = Σ(yᵢ − ȳ)(xᵢ − x̄) = 191.3612

Therefore, the least squares estimates of the slope and intercept are

β̂₁ = S_xy / S_xx = 191.3612 / 829.0462 = 0.2308

β̂₀ = ȳ − β̂₁x̄ = 6.4049 − 0.2308 × 34.6125 = −1.5837
25
The fitted simple linear regression model (with the coefficients rounded):

ŷ = −1.5837 + 0.2308x

• residuals: rᵢ = yᵢ − ŷᵢ

σ̂² = MSE = Σᵢ₌₁ⁿ rᵢ² / (n − 2) = 0.6088
26
• standard  error  of  regression  coefficients

se(β̂₁) = √(σ̂²/S_xx) = √(0.6088/829.0462) = 0.0271

se(β̂₀) = √(σ̂² [1/n + x̄²/S_xx]) = √(0.6088 [1/24 + 34.6125²/829.0462]) = 0.9514

27
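The house-price numbers on these slides can be reproduced end-to-end from the 24 data pairs (a sketch; the comments quote the deck's rounded values):

```python
import math

# sale price / 1000 (x) and annual taxes / 1000 (y) for 24 houses
price = [25.9, 29.5, 27.9, 25.9, 29.9, 29.9, 30.9, 28.9, 35.9, 31.5, 31.0, 30.9,
         30.0, 36.9, 41.9, 40.5, 43.9, 37.5, 37.9, 44.5, 37.9, 38.9, 36.9, 45.8]
taxes = [4.9176, 5.0208, 4.5429, 4.5573, 5.0597, 3.8910, 5.8980, 5.6039, 5.8282,
         5.3003, 6.2712, 5.9592, 5.0500, 8.2464, 6.6969, 7.7841, 9.0384, 5.9894,
         7.5422, 8.7951, 6.0831, 8.3607, 8.1400, 9.1416]

x, y, n = price, taxes, len(price)
Sxx = sum(v * v for v in x) - sum(x) ** 2 / n
Syy = sum(v * v for v in y) - sum(y) ** 2 / n
Sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
r = Sxy / math.sqrt(Sxx * Syy)                 # correlation, approx 0.876
b1 = Sxy / Sxx                                 # slope, approx 0.2308
b0 = sum(y) / n - b1 * sum(x) / n              # intercept, approx -1.58
sigma2 = sum((b - b0 - b1 * a) ** 2 for a, b in zip(x, y)) / (n - 2)   # approx 0.61
se_b1 = math.sqrt(sigma2 / Sxx)                                        # approx 0.0271
se_b0 = math.sqrt(sigma2 * (1 / n + (sum(x) / n) ** 2 / Sxx))          # approx 0.95
print(r, b1, b0, sigma2, se_b1, se_b0)
```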
Exercise 11-26: consider the house data on sale price and taxes paid.

• test

H₀: β₁ = 0
H₁: β₁ ≠ 0

• calculate test statistic

t₀ = β̂₁ / se(β̂₁) = 0.2308 / 0.0271 = 8.5166

• threshold

t_{α/2,n−2} = t_{0.0025,22} = 3.119

• value of test statistic is greater than threshold
• —> reject H₀
28
• construct confidence interval for slope parameter

β̂₁ − t_{α/2,n−2} √(σ̂²/S_xx) ≤ β₁ ≤ β̂₁ + t_{α/2,n−2} √(σ̂²/S_xx)   (11-29)

with t_{α/2,n−2} = t_{0.0025,22} = 3.119:

0.2308 − 3.119 × 0.0271 ≤ β₁ ≤ 0.2308 + 3.119 × 0.0271

0.1463 ≤ β₁ ≤ 0.3153

29
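Reproducing this final interval arithmetically (a sketch using the slides' rounded values; 3.119 = t_{0.0025,22} from a t table):

```python
b1_hat, se_b1 = 0.2308, 0.0271
t_crit = 3.119                      # t_{0.0025,22} from a t table
lo = b1_hat - t_crit * se_b1
hi = b1_hat + t_crit * se_b1
print(round(lo, 4), round(hi, 4))   # -> 0.1463 0.3153
```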
