
Lecture 12


Linear Regression:
Test and Confidence Intervals

Fall  2013  
Prof.  Yao  Xie,  yao.xie@isye.gatech.edu  
H.  Milton  Stewart  School  of  Industrial  Systems  &  Engineering  
Georgia  Tech

1
Outline
• Properties of β̂₁ and β̂₀ as point estimators
• Hypothesis  test  on  slope  and  intercept  
• Confidence  intervals  of  slope  and  intercept  
• Real  example:  house  prices  and  taxes  

2
Regression analysis
• Step  1:  graphical  display  of  data  —  scatter  plot:  sales  
vs.  advertisement  cost  
• calculate  correlation
3
• Step  2:  find  the  relationship  or  association  between  
Sales  and  Advertisement  Cost  —  Regression

4
Simple linear regression
Based on the scatter diagram, it is probably reasonable to assume that the mean of the
random variable Y is related to X by the following simple linear regression model:

Yᵢ = β₀ + β₁xᵢ + εᵢ,  i = 1, 2, …, n,  εᵢ ~ N(0, σ²)

Response: Yᵢ.  Regressor or predictor: xᵢ.  Intercept: β₀.  Slope: β₁.  Random error: εᵢ.

where the slope and intercept of the line are called regression coefficients.

• The case of simple linear regression considers a single regressor or predictor x and a
dependent or response variable Y.

5
Regression coefficients

Given data (x₁, y₁), (x₂, y₂), …, (xₙ, yₙ), let (sums over i = 1, …, n)

S_xx = Σ(xᵢ − x̄)² = Σxᵢ² − (Σxᵢ)²/n   (11-10)

S_xy = Σ(yᵢ − ȳ)(xᵢ − x̄) = Σxᵢyᵢ − (Σxᵢ)(Σyᵢ)/n   (11-11)

The least squares estimates of the regression coefficients are

β̂₁ = S_xy / S_xx

β̂₀ = ȳ − β̂₁x̄

Fitted (estimated) regression model:

ŷᵢ = β̂₀ + β̂₁xᵢ
Caveat: regression relationships are valid only for values of the regressor variable within the range of the original data. Be careful with extrapolation. 6
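The least-squares formulas above can be sketched in a few lines of Python (a minimal illustration; the toy data below are invented, not from the lecture):

```python
def least_squares_fit(x, y):
    """Least squares estimates via S_xy / S_xx (Equations 11-10 and 11-11)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    Sxx = sum(xi * xi for xi in x) - sx * sx / n           # sum (x_i - xbar)^2
    Sxy = sum(xi * yi for xi, yi in zip(x, y)) - sx * sy / n
    b1 = Sxy / Sxx                                         # slope estimate
    b0 = sy / n - b1 * sx / n                              # intercept: ybar - b1 * xbar
    return b0, b1

# toy data lying exactly on y = 2 + 3x
b0, b1 = least_squares_fit([1, 2, 3, 4], [5, 8, 11, 14])
print(b0, b1)  # -> 2.0 3.0
```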
Estimation of variance
• Using the fitted model, we can estimate the value of the response variable for a given predictor:

ŷᵢ = β̂₀ + β̂₁xᵢ

• Residuals: rᵢ = yᵢ − ŷᵢ
• Our model: Yᵢ = β₀ + β₁Xᵢ + εᵢ, i = 1,…,n, Var(εᵢ) = σ²
• Unbiased estimator (MSE: Mean Square Error)

σ̂² = MSE = Σᵢ₌₁ⁿ rᵢ² / (n − 2)

7
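The variance estimate can be sketched the same way (toy numbers again, invented for illustration; the coefficients are assumed to have been fitted already):

```python
def sigma2_hat(x, y, b0, b1):
    """Unbiased error-variance estimate: MSE = sum of squared residuals / (n - 2)."""
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    return sum(r * r for r in residuals) / (len(x) - 2)

# toy data: y = 2 + 3x except the third point, which is off by +1
x = [1, 2, 3, 4]
y = [5, 8, 12, 14]
print(sigma2_hat(x, y, 2.0, 3.0))  # residuals 0, 0, 1, 0 -> SSE = 1, MSE = 0.5
```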
Punchline
• the coefficients β̂₁ and β̂₀ are both calculated from data, and they are subject to error.
• if the true model is y = β₁x + β₀, then β̂₁ and β̂₀ are point estimators for the true coefficients.
• we can talk about the ``accuracy'' of β̂₁ and β̂₀

8
Assessing linear regression model
• Test  hypothesis  about  true  slope  and  intercept  
β₁ = ?, β₀ = ?

• Construct confidence intervals

β₁ ∈ [β̂₁ − a, β̂₁ + a],  β₀ ∈ [β̂₀ − b, β̂₀ + b]  with probability 1 − α

• Assume the errors are normally distributed:

εᵢ ~ N(0, σ²)
9
Properties of Regression Estimators

• Because β̂₁ is a linear combination of the observations Yᵢ, we can show that the least squares estimators are unbiased:

E(β̂₁) = β₁   (11-15)

and, in a similar manner, E(β̂₀) = β₀.

• Since we have assumed that V(εᵢ) = σ², it follows that

V(β̂₁) = σ²/S_xx   (11-16)

V(β̂₀) = σ² [1/n + x̄²/S_xx]   (11-17)

• The covariance of the random variables β̂₀ and β̂₁ is not zero; it can be shown (see Exercise 11-98) that cov(β̂₀, β̂₁) = −σ²x̄/S_xx.

• The estimate of σ² can be used in Equations 11-16 and 11-17 to provide estimates of the variance of the slope and the intercept. We call the square roots of the resulting estimators the estimated standard errors of the slope and intercept, respectively.

10
Estimated standard errors of coefficients
• We can replace σ² with its estimator σ̂²:

σ̂² = MSE = Σᵢ₌₁ⁿ rᵢ² / (n − 2),  where rᵢ = yᵢ − ŷᵢ and ŷᵢ = β̂₀ + β̂₁xᵢ

• Using the results from the previous page, estimate the standard error of the coefficients. In simple linear regression the estimated standard errors of the slope and intercept are

se(β̂₁) = √(σ̂²/S_xx)  and  se(β̂₀) = √(σ̂² [1/n + x̄²/S_xx])

where σ̂² is computed from Equation 11-13.
11
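These two standard-error formulas translate directly into code. The numerical check below plugs in values from the oxygen purity example worked later in the deck (σ̂² = 1.18, S_xx = 0.68088, n = 20, and x̄ = 1.196 for that data set):

```python
import math

def se_slope(sigma2, Sxx):
    """Estimated standard error of the slope."""
    return math.sqrt(sigma2 / Sxx)

def se_intercept(sigma2, n, xbar, Sxx):
    """Estimated standard error of the intercept."""
    return math.sqrt(sigma2 * (1.0 / n + xbar * xbar / Sxx))

print(round(se_slope(1.18, 0.68088), 3))                  # approx 1.317
print(round(se_intercept(1.18, 20, 1.196, 0.68088), 3))   # approx 1.593
```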
Hypothesis test in simple linear regression

An important part of assessing the adequacy of a linear regression model is testing statistical hypotheses about the model parameters and constructing certain confidence intervals.

• we wish to test the hypothesis whether the slope equals a constant, say, β₁,₀:

H₀: β₁ = β₁,₀
H₁: β₁ ≠ β₁,₀   (11-18)

(a two-sided alternative). To test this hypothesis we must make the additional assumption that the error component is normally distributed; the complete assumptions are that the errors are NID(0, σ²). It then follows that the observations Yᵢ are NID(β₀ + β₁xᵢ, σ²), and β̂₁ is a linear combination of the Yᵢ.

• e.g. relate ads to sales: we are interested in studying whether or not an increase of $ on ads will increase $ in sales
• sale = a · ads + constant?

12
A related and important question: whether or not the slope is zero? Significance of regression

H₀: β₁ = 0
H₁: β₁ ≠ 0   (11-23)

• if β₁ = 0, that means Y does not depend on X, i.e., there is no linear relationship between X and Y
• Y and X are independent
• Failure to reject H₀: β₁ = 0 may imply either that x is of little value in explaining the variation in Y (the best estimator of Y for any x is ŷ = ȳ), or that the true relationship between x and Y is not linear. Alternatively, if H₀: β₁ = 0 is rejected, x is of value in explaining the variability in Y: either the straight-line model is adequate, or better results could be obtained by adding higher-order terms in x.
• In the advertisement example: does ads increase sales? or no effect?

13
[Figure: two scatter plots of y versus x, panels (a) and (b)]

• (a) H₀ not rejected
• (b) H₀ rejected

14
Use t-test for slope

• Under H₀: β₁ = β₁,₀, the test statistic

T₀ = (β̂₁ − β₁,₀) / √(σ̂²/S_xx)   (11-19)

follows the t distribution with n − 2 degrees of freedom. (Since the errors are NID(0, σ²), β̂₁ ~ N(β₁, σ²/S_xx), (n − 2)σ̂²/σ² has a chi-square distribution with n − 2 degrees of freedom, and β̂₁ is independent of σ̂².)

• Reject H₀ (two-sided test) if

|t₀| > t_{α/2,n−2}   (11-20)

where t₀ is computed from Equation 11-19. Since the denominator of Equation 11-19 is just the estimated standard error of the slope, we could also write the test statistic as T₀ = (β̂₁ − β₁,₀)/se(β̂₁).

15
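The slope test is a one-liner in code. The critical value t_{α/2,n−2} comes from a t table (hard-coded here to keep the sketch dependency-free); the numbers are those of the deck's oxygen purity example:

```python
import math

def t_slope(b1_hat, b1_0, sigma2, Sxx):
    """Test statistic of Equation 11-19."""
    return (b1_hat - b1_0) / math.sqrt(sigma2 / Sxx)

# oxygen purity example: test H0: beta1 = 0 at alpha = 0.01
t0 = t_slope(14.947, 0.0, 1.18, 0.68088)
t_crit = 2.88                 # t_{0.005,18} from a t table
print(round(t0, 2), abs(t0) > t_crit)  # -> 11.35 True, so reject H0
```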
Example: Oxygen Purity Tests of Coefficients (Example 11-2)

We will test for significance of regression using the model for the oxygen purity data from Example 11-1. The hypotheses are

H₀: β₁ = 0
H₁: β₁ ≠ 0

and we will use α = 0.01. From Example 11-1 and Table 11-2 we have

n = 20,  S_xx = 0.68088,  β̂₁ = 14.947,  σ̂² = 1.18

• Calculate the test statistic; Equation 11-19 becomes

t₀ = β̂₁ / √(σ̂²/S_xx) = 14.947 / √(1.18/0.68088) = 11.35

Practical interpretation: since t_{0.005,18} = 2.88, the value of the test statistic falls in the critical region, so H₀: β₁ = 0 is rejected. There is strong evidence to support this claim; the P-value for this test is P ≈ 1.23 × 10⁻⁹. Table 11-2 presents the Minitab output, which reports the same t-statistic and P-value; it also reports the intercept test, t₀ = 46.62, so the hypothesis of a zero intercept is rejected as well.

Table 11-1: Hydrocarbon level x (%) and oxygen purity y (%)

Number  x (%)  y (%)
1  0.99  90.01
2  1.02  89.05
3  1.15  91.43
4  1.29  93.74
5  1.46  96.73
6  1.36  94.45
7  0.87  87.59
8  1.23  91.77
9  1.55  99.42
10  1.40  93.65
11  1.19  93.54
12  1.15  92.52
13  0.98  90.56
14  1.01  89.54
15  1.11  89.85
16  1.20  90.39
17  1.26  93.25
18  1.32  93.41
19  1.43  94.98
20  0.95  87.33

[Figure 11-1: scatter diagram of oxygen purity y versus hydrocarbon level x from Table 11-1]

16
Use t-test for intercept

• Use a similar form of test for hypotheses about the intercept:

H₀: β₀ = β₀,₀
H₁: β₀ ≠ β₀,₀   (11-21)

• Test statistic:

T₀ = (β̂₀ − β₀,₀) / √(σ̂² [1/n + x̄²/S_xx]) = (β̂₀ − β₀,₀) / se(β̂₀)   (11-22)

• Under H₀, T₀ ~ t distribution with n − 2 degrees of freedom.
• Reject the null hypothesis if the computed value of this test statistic satisfies |t₀| > t_{α/2,n−2}. Note that the denominator of the test statistic in Equation 11-22 is just the standard error of the intercept.

17
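As a numerical check, the whole oxygen purity analysis can be reproduced from Table 11-1 in a few lines (a sketch; the comments quote the deck's rounded values):

```python
import math

# Table 11-1: hydrocarbon level x (%) and oxygen purity y (%)
x = [0.99, 1.02, 1.15, 1.29, 1.46, 1.36, 0.87, 1.23, 1.55, 1.40,
     1.19, 1.15, 0.98, 1.01, 1.11, 1.20, 1.26, 1.32, 1.43, 0.95]
y = [90.01, 89.05, 91.43, 93.74, 96.73, 94.45, 87.59, 91.77, 99.42, 93.65,
     93.54, 92.52, 90.56, 89.54, 89.85, 90.39, 93.25, 93.41, 94.98, 87.33]

n = len(x)
Sxx = sum(v * v for v in x) - sum(x) ** 2 / n
Sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
b1 = Sxy / Sxx                      # slope, approx 14.947
b0 = sum(y) / n - b1 * sum(x) / n   # intercept, approx 74.283
sigma2 = sum((b - b0 - b1 * a) ** 2 for a, b in zip(x, y)) / (n - 2)  # approx 1.18

xbar = sum(x) / n
t_slope = b1 / math.sqrt(sigma2 / Sxx)                              # approx 11.35
t_intercept = b0 / math.sqrt(sigma2 * (1 / n + xbar ** 2 / Sxx))    # approx 46.62
print(b1, sigma2, t_slope, t_intercept)
```

Both t-statistics exceed their critical values, so the hypotheses of a zero slope and a zero intercept are both rejected, matching the example.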
Class activity
Given the regression line:
y = 22.2 + 10.5x, estimated for x = 1, 2, 3, …, 20

1. The estimated slope is:
A. β̂₁ = 22.2  B. β̂₁ = 10.5  C. biased

2. The predicted value for x* = 10 is
A. y* = 22.2  B. y* = 127.2  C. y* = 32.7

3. The predicted value for x* = 40 is
A. y* = 442.2  B. y* = 127.2  C. Cannot extrapolate

18
Class activity
1. The estimated slope is significantly different from zero when

A. β̂₁√S_XX / σ̂ > t_{α/2,n−2}
B. β̂₁√S_XX / σ̂ < t_{α/2,n−2}
C. β̂₁²S_XX / σ̂² > F_{α/2,n−1,1}

2. The estimated intercept is plausibly zero when

A. Its confidence interval contains 0.
B. β̂₀√S_XX / σ̂ < t_{α/2,n−2}
C. β̂₀ / (σ̂ √(1/n + x̄²/S_xx)) > t_{α/2,n−2}
19
Confidence interval

• we can obtain confidence interval estimates of the slope and the intercept
• width of confidence interval is a measure of the overall quality of the regression

slope:  T₀ = (β̂₁ − β₁) / √(σ̂²/S_xx)  ~ t distribution with n − 2 degrees of freedom

intercept:  T₀ = (β̂₀ − β₀) / √(σ̂² [1/n + x̄²/S_xx])  ~ t distribution with n − 2 degrees of freedom

(here β₁ and β₀ are the true parameters)

20
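The interval forms that follow from these statistics can be sketched directly; the numerical check uses the oxygen purity values quoted in the deck's worked example (β̂₁ = 14.947, σ̂² = 1.18, S_xx = 0.68088, t_{0.025,18} = 2.101):

```python
import math

def slope_ci(b1, sigma2, Sxx, t_crit):
    """100(1-alpha)% CI on the slope; t_crit = t_{alpha/2, n-2} from a t table."""
    half = t_crit * math.sqrt(sigma2 / Sxx)
    return b1 - half, b1 + half

def intercept_ci(b0, sigma2, n, xbar, Sxx, t_crit):
    """100(1-alpha)% CI on the intercept."""
    half = t_crit * math.sqrt(sigma2 * (1.0 / n + xbar ** 2 / Sxx))
    return b0 - half, b0 + half

# oxygen purity example: 95% CI on the slope
lo, hi = slope_ci(14.947, 1.18, 0.68088, 2.101)
print(round(lo, 3), round(hi, 3))  # -> 12.181 17.713
```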
Confidence intervals

These statistics are both distributed as t random variables with n − 2 degrees of freedom. This leads to the following definition of 100(1 − α)% confidence intervals on the slope and intercept.

Under the assumption that the observations are normally and independently distributed, a 100(1 − α)% confidence interval on the slope β₁ in simple linear regression is

β̂₁ − t_{α/2,n−2} √(σ̂²/S_xx) ≤ β₁ ≤ β̂₁ + t_{α/2,n−2} √(σ̂²/S_xx)   (11-29)

Similarly, a 100(1 − α)% confidence interval on the intercept β₀ is

β̂₀ − t_{α/2,n−2} √(σ̂² [1/n + x̄²/S_xx]) ≤ β₀ ≤ β̂₀ + t_{α/2,n−2} √(σ̂² [1/n + x̄²/S_xx])   (11-30)

21

Example: Oxygen Purity Confidence Interval on the Slope (Example 11-4)

We will find a 95% confidence interval (α = 0.05) on the slope of the regression line using the data in Example 11-1. Recall that β̂₁ = 14.947, S_xx = 0.68088, and σ̂² = 1.18 (see Table 11-2). Then, from Equation 11-29 we find

β̂₁ − t_{0.025,18} √(σ̂²/S_xx) ≤ β₁ ≤ β̂₁ + t_{0.025,18} √(σ̂²/S_xx)

or

14.947 − 2.101 √(1.18/0.68088) ≤ β₁ ≤ 14.947 + 2.101 √(1.18/0.68088)

This simplifies to

12.181 ≤ β₁ ≤ 17.713

The confidence interval does not include 0, so there is enough evidence (at α = 0.05) that the slope is not zero, i.e., that there is correlation between X and Y. Practical interpretation: the CI is reasonably narrow (±2.766) because the error variance is fairly small.

22
Example: house selling price and annual taxes

Wellington ["Prediction, Linear Regression, and a Minimum Sum of Relative Errors" (Vol. 19, 1977)] presents data on the selling price and annual taxes for 24 houses. The data are shown in the following table.

Sale Price/1000  Taxes (Local, School, County)/1000
25.9  4.9176
29.5  5.0208
27.9  4.5429
25.9  4.5573
29.9  5.0597
29.9  3.8910
30.9  5.8980
28.9  5.6039
35.9  5.8282
31.5  5.3003
31.0  6.2712
30.9  5.9592
30.0  5.0500
36.9  8.2464
41.9  6.6969
40.5  7.7841
43.9  9.0384
37.5  5.9894
37.9  7.5422
44.5  8.7951
37.9  6.0831
38.9  8.3607
36.9  8.1400
45.8  9.1416

Independent variable X: SalePrice
Dependent variable Y: Taxes
23
x " 7.50. CARAVAN 2WD 201
(c) Calculate the fitted value of y corresponding to x "
• qualitative analysis
• calculate correlation: r = 0.8760

24
Independent variable X: SalePrice
Dependent variable Y: Taxes

n = 24,  x̄ = 34.6125,  ȳ = 6.4049

S_xx = Σ(xᵢ − x̄)² = 829.0462

S_xy = Σ(yᵢ − ȳ)(xᵢ − x̄) = 191.3612

Therefore, the least squares estimates of the slope and intercept are

β̂₁ = S_xy / S_xx = 191.3612 / 829.0462 = 0.2308

β̂₀ = ȳ − β̂₁x̄ = 6.4049 − 0.2308 × 34.6125 = −1.5837
25
The fitted simple linear regression model (with the coefficients rounded):

ŷ = −1.5837 + 0.2308x

• residuals: rᵢ = yᵢ − ŷᵢ

σ̂² = MSE = Σᵢ₌₁ⁿ rᵢ² / (n − 2) = 0.6088
26
• standard  error  of  regression  coefficients

se(β̂₁) = √(σ̂²/S_xx) = √(0.6088/829.0462) = 0.0271

se(β̂₀) = √(σ̂² [1/n + x̄²/S_xx]) = √(0.6088 [1/24 + 34.6125²/829.0462]) = 0.9514

27
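The house-price numbers on these slides can be reproduced end-to-end from the 24 data pairs (a sketch; the comments quote the deck's rounded values):

```python
import math

# sale price / 1000 (x) and annual taxes / 1000 (y) for 24 houses
price = [25.9, 29.5, 27.9, 25.9, 29.9, 29.9, 30.9, 28.9, 35.9, 31.5, 31.0, 30.9,
         30.0, 36.9, 41.9, 40.5, 43.9, 37.5, 37.9, 44.5, 37.9, 38.9, 36.9, 45.8]
taxes = [4.9176, 5.0208, 4.5429, 4.5573, 5.0597, 3.8910, 5.8980, 5.6039, 5.8282,
         5.3003, 6.2712, 5.9592, 5.0500, 8.2464, 6.6969, 7.7841, 9.0384, 5.9894,
         7.5422, 8.7951, 6.0831, 8.3607, 8.1400, 9.1416]

x, y, n = price, taxes, len(price)
Sxx = sum(v * v for v in x) - sum(x) ** 2 / n
Syy = sum(v * v for v in y) - sum(y) ** 2 / n
Sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
r = Sxy / math.sqrt(Sxx * Syy)                 # correlation, approx 0.876
b1 = Sxy / Sxx                                 # slope, approx 0.2308
b0 = sum(y) / n - b1 * sum(x) / n              # intercept, approx -1.58
sigma2 = sum((b - b0 - b1 * a) ** 2 for a, b in zip(x, y)) / (n - 2)   # approx 0.61
se_b1 = math.sqrt(sigma2 / Sxx)                                        # approx 0.0271
se_b0 = math.sqrt(sigma2 * (1 / n + (sum(x) / n) ** 2 / Sxx))          # approx 0.95
print(r, b1, b0, sigma2, se_b1, se_b0)
```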
Exercise 11-26: consider the house data on sale price and taxes paid.

• test

H₀: β₁ = 0
H₁: β₁ ≠ 0

• calculate test statistic

t₀ = β̂₁ / se(β̂₁) = 0.2308 / 0.0271 = 8.5166

• threshold

t_{α/2,n−2} = t_{0.0025,22} = 3.119

• value of test statistic is greater than threshold
• —> reject H₀
28
• construct confidence interval for slope parameter

β̂₁ − t_{α/2,n−2} √(σ̂²/S_xx) ≤ β₁ ≤ β̂₁ + t_{α/2,n−2} √(σ̂²/S_xx)   (11-29)

with t_{α/2,n−2} = t_{0.0025,22} = 3.119:

0.2308 − 3.119 × 0.0271 ≤ β₁ ≤ 0.2308 + 3.119 × 0.0271

0.1463 ≤ β₁ ≤ 0.3153

29
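Reproducing this final interval arithmetically (a sketch using the slides' rounded values; 3.119 = t_{0.0025,22} from a t table):

```python
b1_hat, se_b1 = 0.2308, 0.0271
t_crit = 3.119                      # t_{0.0025,22} from a t table
lo = b1_hat - t_crit * se_b1
hi = b1_hat + t_crit * se_b1
print(round(lo, 4), round(hi, 4))   # -> 0.1463 0.3153
```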
