
Inference in Regression Analysis

Dr. Frank Wood

Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 4, Slide 1


Today: Normal Error Regression Model
Yi = β0 + β1Xi + εi
• Yi is the value of the response variable in the ith trial
• β0 and β1 are parameters
• Xi is a known constant, the value of the
predictor variable in the ith trial
• εi ~ iid N(0, σ²)
• i = 1,…,n
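A minimal simulation sketch of this model (an added illustration, not part of the original slides); the values β0 = 9 and β1 = 2 mirror the "true" line in the later example figures, and σ = 2 is an arbitrary choice.

import numpy as np

# Simulation sketch of the normal error regression model
# Y_i = beta_0 + beta_1 * X_i + eps_i, with eps_i ~ iid N(0, sigma^2).
rng = np.random.default_rng(0)

beta0, beta1, sigma = 9.0, 2.0, 2.0        # illustrative values, not from the slides
X = np.arange(1, 12, dtype=float)          # fixed, known predictor values
eps = rng.normal(0.0, sigma, size=X.size)  # iid normal errors
Y = beta0 + beta1 * X + eps                # observed responses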



Inferences concerning β1
• Tests concerning β1 (the slope) are often of
interest, particularly
H0 : β1 = 0
Ha : β1 ≠ 0
the null hypothesis model

Yi = β0 + (0)Xi + εi
implies that there is no relationship between
Y and X



Review : Hypothesis Testing
• Elements of a statistical test
– Null hypothesis, H0
– Alternative hypothesis, Ha
– Test statistic
– Rejection region



Review : Hypothesis Testing - Errors
• Errors
– A type I error is made if H0 is rejected when H0 is
true. The probability of a type I error is denoted
by α. The value of α is called the level of the test.
– A type II error is made if H0 is accepted when Ha
is true. The probability of a type II error is
denoted by β.



P-value
• The p-value, or attained significance level, is
the smallest level of significance α for which
the observed data indicate that the null
hypothesis should be rejected.
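For illustration (an added sketch with hypothetical numbers, not from the slides), a two-sided p-value can be computed from a t statistic with n − 2 degrees of freedom, the reference distribution derived later in this lecture.

from scipy import stats

# Two-sided p-value for a t statistic with n - 2 degrees of freedom.
t_stat, n = 2.8, 11          # hypothetical test statistic and sample size
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)
print(p_value)               # reject H0 at level alpha if p_value < alpha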



Null Hypothesis
• If β = 0 then with 95% confidence the b1
would fall
40
in some range around zero
[Figure: simulated data with the null-hypothesis fit overlaid.
Guess: y = 0x + 21.2, mse: 37.1; True: y = 2x + 9, mse: 4.22.
Axes: Predictor/Input (1–11) vs. Response/Output (10–40).]
Alternative Hypothesis : Least Squares Fit
[Figure: least squares fit to the same data.
Estimate: y = 2.09x + 8.36, mse: 4.15; True: y = 2x + 9, mse: 4.22.
Axes: Predictor/Input (1–11) vs. Response/Output (10–40).
Note: b1, suitably rescaled, is the test statistic.]



Testing This Hypothesis
• Only have a finite sample
• A different finite set of samples (from the same
population / source) will (almost always)
produce different estimates (b0, b1) of β0 and β1
given the same estimation procedure
• b0 and b1 are random variables whose
sampling distributions can be statistically
characterized
• Hypothesis tests can be constructed using
these distributions.
Example : Sampling Dist. Of b1
• The point estimator of β1 is

$$ b_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} $$

• The sampling distribution of b1 is the
distribution over b1 that occurs when the
predictor variables Xi are held fixed and the
observed outputs are repeatedly sampled
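An added code sketch (not part of the slides) computing b1, and the intercept b0 = Ȳ − b1X̄, directly from this formula.

import numpy as np

# Least squares slope and intercept computed from the formula above.
def least_squares(X, Y):
    Xbar, Ybar = X.mean(), Y.mean()
    b1 = np.sum((X - Xbar) * (Y - Ybar)) / np.sum((X - Xbar) ** 2)
    b0 = Ybar - b1 * Xbar
    return b0, b1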



Sampling Dist. Of b1 In Normal Regr. Model
• For a normal error regression model the
sampling distribution of b1 is normal, with
mean and variance given by

$$ E(b_1) = \beta_1, \qquad V(b_1) = \frac{\sigma^2}{\sum (X_i - \bar{X})^2} $$

• To show this we need to go through a number
of algebraic steps.
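An added empirical check (not from the slides): hold X fixed, repeatedly resample Y under the illustrative parameters assumed earlier, and compare the Monte Carlo mean and variance of b1 with β1 and σ²/Σ(Xi − X̄)².

import numpy as np

# Monte Carlo check of the sampling distribution of b1.
rng = np.random.default_rng(1)
beta0, beta1, sigma = 9.0, 2.0, 2.0        # illustrative values
X = np.arange(1, 12, dtype=float)

slopes = []
for _ in range(10_000):
    Y = beta0 + beta1 * X + rng.normal(0.0, sigma, size=X.size)
    slopes.append(np.sum((X - X.mean()) * (Y - Y.mean()))
                  / np.sum((X - X.mean()) ** 2))

print(np.mean(slopes))                          # ≈ beta1 = 2
print(np.var(slopes))                           # ≈ sigma^2 / Σ(Xi - X̄)^2
print(sigma**2 / np.sum((X - X.mean()) ** 2))   # theoretical variance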



First step
• To show

$$ \sum (X_i - \bar{X})(Y_i - \bar{Y}) = \sum (X_i - \bar{X}) Y_i $$

we observe

$$ \begin{aligned}
\sum (X_i - \bar{X})(Y_i - \bar{Y}) &= \sum (X_i - \bar{X}) Y_i - \sum (X_i - \bar{X}) \bar{Y} \\
&= \sum (X_i - \bar{X}) Y_i - \bar{Y} \sum (X_i - \bar{X}) \\
&= \sum (X_i - \bar{X}) Y_i - \bar{Y} \sum X_i + \bar{Y}\, n \frac{\sum X_i}{n} \\
&= \sum (X_i - \bar{X}) Y_i
\end{aligned} $$
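A quick numerical sanity check of this identity (added illustration) on arbitrary data.

import numpy as np

# Verify that subtracting Ybar does not change the cross-product sum.
rng = np.random.default_rng(2)
X, Y = rng.normal(size=20), rng.normal(size=20)

lhs = np.sum((X - X.mean()) * (Y - Y.mean()))
rhs = np.sum((X - X.mean()) * Y)
print(np.isclose(lhs, rhs))   # True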



Slope as linear combination of outputs
• b1 can be expressed as a linear combination
of the Yi’s

$$ \begin{aligned}
b_1 &= \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} \\
    &= \frac{\sum (X_i - \bar{X}) Y_i}{\sum (X_i - \bar{X})^2} \\
    &= \sum k_i Y_i
\end{aligned} $$

where

$$ k_i = \frac{X_i - \bar{X}}{\sum (X_i - \bar{X})^2} $$



Properties of the ki’s
• It can be shown that

$$ \sum k_i = 0, \qquad \sum k_i X_i = 1, \qquad \sum k_i^2 = \frac{1}{\sum (X_i - \bar{X})^2} $$
(possible homework). We will use these
properties to prove various properties of the
sampling distributions of b1 and b0.
write on board
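The three properties can also be verified numerically; this added sketch (not from the slides) uses an arbitrary set of X values.

import numpy as np

# Numerical check of the three k_i properties.
X = np.arange(1, 12, dtype=float)
k = (X - X.mean()) / np.sum((X - X.mean()) ** 2)

print(np.isclose(k.sum(), 0.0))                                    # Σ k_i = 0
print(np.isclose(np.sum(k * X), 1.0))                              # Σ k_i X_i = 1
print(np.isclose(np.sum(k**2), 1.0 / np.sum((X - X.mean())**2)))   # Σ k_i² = 1/Σ(Xi−X̄)²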
Normality of b1’s Sampling Distribution
• Useful fact:
– A linear combination of independent normal
random variables is normally distributed
– More formally: when Y1, …, Yn are independent
normal random variables, the linear combination
a1Y1 + a2Y2 + … + anYn is normally distributed,
with mean ∑ aiE(Yi) and variance ∑ ai²V(Yi)



Normality of b1’s Sampling Distribution
• Since b1 is a linear combination of the Yi’s
and each Yi is an independent normal
random variable, then b1 is distributed
normally as well

$$ b_1 = \sum k_i Y_i, \qquad k_i = \frac{X_i - \bar{X}}{\sum (X_i - \bar{X})^2} $$

write on board



b1 is an unbiased estimator
• This can be seen using two of the properties

$$ \begin{aligned}
E(b_1) = E\Big(\sum k_i Y_i\Big) &= \sum k_i E(Y_i) = \sum k_i (\beta_0 + \beta_1 X_i) \\
&= \beta_0 \sum k_i + \beta_1 \sum k_i X_i \\
&= \beta_0(0) + \beta_1(1) \\
&= \beta_1
\end{aligned} $$



Variance of b1
• Since the Yi are independent random
variables with variance σ² and the ki’s are
constants we get

$$ \begin{aligned}
V(b_1) = V\Big(\sum k_i Y_i\Big) &= \sum k_i^2 V(Y_i) \\
&= \sum k_i^2 \sigma^2 = \sigma^2 \sum k_i^2 \\
&= \sigma^2 \frac{1}{\sum (X_i - \bar{X})^2}
\end{aligned} $$

note that this assumes that we know σ².
– Can we?



Estimated variance of b1
• If we don’t know σ² then we can replace it with
the MSE estimate s²
• Remember

$$ s^2 = MSE = \frac{SSE}{n-2} = \frac{\sum (Y_i - \hat{Y}_i)^2}{n-2} = \frac{\sum e_i^2}{n-2} $$

plugging in we get

$$ V(b_1) = \frac{\sigma^2}{\sum (X_i - \bar{X})^2}, \qquad \hat{V}(b_1) = \frac{s^2}{\sum (X_i - \bar{X})^2} $$
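An added sketch (not from the slides) of the plug-in estimate V̂(b1); the helper name is illustrative only.

import numpy as np

# Plug-in estimate of V(b1): replace sigma^2 with s^2 = MSE.
def estimated_slope_variance(X, Y):
    b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    b0 = Y.mean() - b1 * X.mean()
    resid = Y - (b0 + b1 * X)
    s2 = np.sum(resid ** 2) / (X.size - 2)      # MSE = SSE / (n - 2)
    return s2 / np.sum((X - X.mean()) ** 2)     # Vhat(b1) = s^2 / Σ(Xi - X̄)^2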
Digression : Gauss-Markov Theorem
• In a regression model where E(εi) = 0,
V(εi) = σ² < ∞, and εi and εj are
uncorrelated for all i ≠ j, the least squares
estimators b0 and b1 are unbiased and have
minimum variance among all unbiased linear
estimators.
– Remember

$$ b_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}, \qquad b_0 = \bar{Y} - b_1 \bar{X} $$
Proof
• The theorem states that b1 has minimum
variance among all unbiased linear estimators
of the form

$$ \hat{\beta}_1 = \sum c_i Y_i $$

• As this estimator must be unbiased we have

$$ E(\hat{\beta}_1) = \sum c_i E(Y_i) = \sum c_i (\beta_0 + \beta_1 X_i)
= \beta_0 \sum c_i + \beta_1 \sum c_i X_i = \beta_1 $$



Proof cont.
• Given these constraints

$$ \beta_0 \sum c_i + \beta_1 \sum c_i X_i = \beta_1 $$

clearly it must be the case that Σci = 0 and
ΣciXi = 1 (write these on board as conditions of unbiasedness)

• The variance of this estimator is

$$ V(\hat{\beta}_1) = \sum c_i^2 V(Y_i) = \sigma^2 \sum c_i^2 $$



Proof cont.
• Now define ci = ki + di where the ki are the
constants we already defined and the di are
arbitrary constants. Let’s look at the variance
of the estimator

$$ V(\hat{\beta}_1) = \sum c_i^2 V(Y_i) = \sigma^2 \sum (k_i + d_i)^2
= \sigma^2 \Big( \sum k_i^2 + \sum d_i^2 + 2 \sum k_i d_i \Big) $$

Note we just demonstrated that

$$ \sigma^2 \sum k_i^2 = V(b_1) $$



Proof cont.
• Now by showing that ∑ ki di = 0 we’re almost
done

$$ \begin{aligned}
\sum k_i d_i &= \sum k_i (c_i - k_i) \\
&= \sum k_i c_i - \sum k_i^2 \\
&= \sum c_i \frac{X_i - \bar{X}}{\sum (X_i - \bar{X})^2} - \frac{1}{\sum (X_i - \bar{X})^2} \\
&= \frac{\sum c_i X_i - \bar{X} \sum c_i}{\sum (X_i - \bar{X})^2} - \frac{1}{\sum (X_i - \bar{X})^2} = 0
\end{aligned} $$

from the conditions of unbiasedness (Σci = 0, ΣciXi = 1)
Proof end
• So we are left with

$$ V(\hat{\beta}_1) = \sigma^2 \Big( \sum k_i^2 + \sum d_i^2 \Big) = V(b_1) + \sigma^2 \sum d_i^2 $$
which is minimized when the di’s = 0.


This means that the least squares estimator
b1 has minimum variance among all unbiased
linear estimators.



Sampling Distribution of (b1 − β1)/Ŝ(b1)
• b1 is normally distributed so (b1 − β1)/V(b1)^{1/2}
is a standard normal variable
• We don’t know V(b1) so it must be estimated
from data. We have already denoted its
estimate V̂(b1)
• Using this estimate it can be shown that

$$ \frac{b_1 - \beta_1}{\hat{S}(b_1)} \sim t(n-2), \qquad \hat{S}(b_1) = \sqrt{\hat{V}(b_1)} $$
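An added sketch (not from the slides) of forming this studentized statistic for a hypothesized slope value.

import numpy as np

# Studentized slope statistic (b1 - beta1) / S_hat(b1).
def studentized_slope(X, Y, beta1_hypothesized=0.0):
    b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    b0 = Y.mean() - b1 * X.mean()
    s2 = np.sum((Y - (b0 + b1 * X)) ** 2) / (X.size - 2)     # MSE
    se_b1 = np.sqrt(s2 / np.sum((X - X.mean()) ** 2))        # S_hat(b1)
    return (b1 - beta1_hypothesized) / se_b1                 # ~ t(n - 2) if beta1 is true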



Where does this come from?
• We need to rely upon the following theorem
– For the normal error regression model

$$ \frac{SSE}{\sigma^2} = \frac{\sum (Y_i - \hat{Y}_i)^2}{\sigma^2} \sim \chi^2(n-2) $$

and is independent of b0 and b1

• Intuitively this follows the standard result for the sum
of squared normal random variables
– Here there are two linear constraints imposed by the
regression parameter estimation that each reduce the
number of degrees of freedom by one.
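An added empirical check of this theorem (not from the slides): simulate many data sets under the illustrative parameters assumed earlier and compare SSE/σ² with a χ²(n − 2) distribution.

import numpy as np
from scipy import stats

# Monte Carlo check that SSE / sigma^2 behaves like chi-square(n - 2).
rng = np.random.default_rng(3)
beta0, beta1, sigma = 9.0, 2.0, 2.0        # illustrative values
X = np.arange(1, 12, dtype=float)

draws = []
for _ in range(10_000):
    Y = beta0 + beta1 * X + rng.normal(0.0, sigma, size=X.size)
    b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    b0 = Y.mean() - b1 * X.mean()
    sse = np.sum((Y - (b0 + b1 * X)) ** 2)
    draws.append(sse / sigma**2)

print(np.mean(draws))   # ≈ n - 2 = 9, the mean of a chi-square(9) variable
print(stats.kstest(draws, "chi2", args=(X.size - 2,)).pvalue)  # should be large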



Another useful fact : t distribution
• Let z and χ²(ν) be independent random
variables (standard normal and chi-square
respectively). We then define a t random
variable as follows:

$$ t(\nu) = \frac{z}{\sqrt{\chi^2(\nu)/\nu}} $$

This version of the t distribution has one
parameter, the degrees of freedom ν



Distribution of the studentized statistic
• To derive the distribution of this statistic, first
we do the following rewrite
$$ \frac{b_1 - \beta_1}{\hat{S}(b_1)} = \frac{\dfrac{b_1 - \beta_1}{S(b_1)}}{\dfrac{\hat{S}(b_1)}{S(b_1)}}
\qquad \text{(the numerator is a standard normal variable)} $$

$$ \frac{\hat{S}(b_1)}{S(b_1)} = \sqrt{\frac{\hat{V}(b_1)}{V(b_1)}} $$



Studentized statistic cont.
• And note the following

$$ \frac{\hat{V}(b_1)}{V(b_1)} = \frac{MSE / \sum (X_i - \bar{X})^2}{\sigma^2 / \sum (X_i - \bar{X})^2}
= \frac{MSE}{\sigma^2} = \frac{SSE}{\sigma^2 (n-2)} $$

where we know (by the given theorem) the
distribution of the last term is χ² and indep. of
b1 and b0

$$ \frac{SSE}{\sigma^2 (n-2)} \sim \frac{\chi^2(n-2)}{n-2} $$



Studentized statistic final
• But by the given definition of the t distribution
we have our result
$$ \frac{b_1 - \beta_1}{\hat{S}(b_1)} \sim t(n-2) $$

because putting everything together we can
see that

$$ \frac{b_1 - \beta_1}{\hat{S}(b_1)} \sim \frac{z}{\sqrt{\chi^2(n-2)/(n-2)}} $$



Confidence Intervals and Hypothesis Tests
• Now that we know the sampling distribution of
the studentized b1 (t with n − 2 degrees of
freedom) we can construct confidence intervals
and hypothesis tests easily, as sketched below
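A concluding added sketch (not from the slides): a (1 − α) confidence interval for β1 and a two-sided test of H0: β1 = 0 based on the t(n − 2) result.

import numpy as np
from scipy import stats

# Confidence interval for beta1 and two-sided test of H0: beta1 = 0.
def slope_inference(X, Y, alpha=0.05):
    b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    b0 = Y.mean() - b1 * X.mean()
    s2 = np.sum((Y - (b0 + b1 * X)) ** 2) / (X.size - 2)     # MSE
    se_b1 = np.sqrt(s2 / np.sum((X - X.mean()) ** 2))        # S_hat(b1)
    df = X.size - 2
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)           # CI for beta1
    t_stat = b1 / se_b1                                       # statistic for H0: beta1 = 0
    p_value = 2 * stats.t.sf(abs(t_stat), df)
    return ci, t_stat, p_value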

