
LINEAR REGRESSION ANALYSIS

MODULE – II
Lecture - 4

Simple Linear Regression Analysis
Dr. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur

Maximum likelihood estimation


We assume that the $\varepsilon_i$'s $(i = 1, 2, \ldots, n)$ are independent and identically distributed following a normal distribution $N(0, \sigma^2)$.

Now we use the method of maximum likelihood to estimate the parameters of the linear regression model

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \quad (i = 1, 2, \ldots, n).$$

Under this assumption, the observations $y_i$ $(i = 1, 2, \ldots, n)$ are independently distributed with $N(\beta_0 + \beta_1 x_i, \sigma^2)$ for all $i = 1, 2, \ldots, n$. The likelihood function of the given observations $(x_i, y_i)$ and unknown parameters $\beta_0$, $\beta_1$ and $\sigma^2$ is

$$L(x_i, y_i; \beta_0, \beta_1, \sigma^2) = \prod_{i=1}^{n}\left(\frac{1}{2\pi\sigma^2}\right)^{1/2}\exp\left[-\frac{1}{2\sigma^2}(y_i - \beta_0 - \beta_1 x_i)^2\right].$$

The maximum likelihood estimates of $\beta_0$, $\beta_1$ and $\sigma^2$ can be obtained by maximizing $L(x_i, y_i; \beta_0, \beta_1, \sigma^2)$ or, equivalently, $\ln L(x_i, y_i; \beta_0, \beta_1, \sigma^2)$ where

$$\ln L(x_i, y_i; \beta_0, \beta_1, \sigma^2) = -\frac{n}{2}\ln 2\pi - \frac{n}{2}\ln \sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \beta_0 - \beta_1 x_i)^2.$$

The normal equations are obtained by partially differentiating the log-likelihood with respect to $\beta_0$, $\beta_1$ and $\sigma^2$ and equating the derivatives to zero:

$$\frac{\partial \ln L(x_i, y_i; \beta_0, \beta_1, \sigma^2)}{\partial \beta_0} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(y_i - \beta_0 - \beta_1 x_i) = 0$$

$$\frac{\partial \ln L(x_i, y_i; \beta_0, \beta_1, \sigma^2)}{\partial \beta_1} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(y_i - \beta_0 - \beta_1 x_i)x_i = 0$$

and

$$\frac{\partial \ln L(x_i, y_i; \beta_0, \beta_1, \sigma^2)}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(y_i - \beta_0 - \beta_1 x_i)^2 = 0.$$
The solution of these normal equations gives the maximum likelihood estimates of $\beta_0$, $\beta_1$ and $\sigma^2$ as

$$b_0 = \bar{y} - b_1\bar{x},$$

$$b_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2} = \frac{s_{xy}}{s_{xx}}$$

and

$$s^2 = \frac{\sum_{i=1}^{n}(y_i - b_0 - b_1 x_i)^2}{n},$$

respectively.
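As a quick numerical illustration, these closed-form estimates are easy to compute directly. The following is a minimal Python sketch; the data vectors x and y are made-up values for illustration, not from the lecture.

```python
# Minimal sketch: closed-form ML estimates in simple linear regression.
# The data below are illustrative assumptions, not from the lecture.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])
n = len(x)

sxy = np.sum((x - x.mean()) * (y - y.mean()))  # s_xy
sxx = np.sum((x - x.mean()) ** 2)              # s_xx

b1 = sxy / sxx                            # ML estimate of beta_1
b0 = y.mean() - b1 * x.mean()             # ML estimate of beta_0
s2 = np.sum((y - b0 - b1 * x) ** 2) / n   # ML estimate of sigma^2 (divisor n)

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, s^2 = {s2:.4f}")
```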

It can be verified that the Hessian matrix of second-order partial derivatives of $\ln L$ with respect to $\beta_0$, $\beta_1$ and $\sigma^2$ is negative definite at $\beta_0 = b_0$, $\beta_1 = b_1$ and $\sigma^2 = s^2$, which ensures that the likelihood function is maximized at these values.

Note that the least squares and maximum likelihood estimates of $\beta_0$ and $\beta_1$ are identical when the disturbances are normally distributed. The least squares and maximum likelihood estimates of $\sigma^2$ differ. In fact, the least squares based (unbiased) estimate of $\sigma^2$ is

$$\hat{\sigma}^2 = \frac{1}{n-2}\sum_{i=1}^{n}(y_i - b_0 - b_1 x_i)^2,$$

so the maximum likelihood estimate is related to it as

$$s^2 = \frac{n-2}{n}\,\hat{\sigma}^2.$$

Thus $b_0$ and $b_1$ are unbiased estimators of $\beta_0$ and $\beta_1$, whereas $s^2$ is a biased estimate of $\sigma^2$, though it is asymptotically unbiased. Since the maximum likelihood and least squares estimates of $\beta_0$ and $\beta_1$ coincide, their variances are the same. For the two estimators of $\sigma^2$, however,

$$MSE(s^2) < Var(\hat{\sigma}^2),$$

and since $\hat{\sigma}^2$ is unbiased, $Var(\hat{\sigma}^2) = MSE(\hat{\sigma}^2)$; that is, $s^2$ has the smaller mean squared error.



Testing of hypotheses and confidence interval estimation for the slope parameter


Now we consider the tests of hypotheses and confidence interval estimation for the slope parameter of the model under two cases, viz., when $\sigma^2$ is known and when $\sigma^2$ is unknown.

Case 1: When $\sigma^2$ is known


Consider the simple linear regression model $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ $(i = 1, 2, \ldots, n)$. It is assumed that the $\varepsilon_i$'s are independent and identically distributed and follow $N(0, \sigma^2)$.
First we develop a test for the null hypothesis related to the slope parameter

$$H_0: \beta_1 = \beta_{10}$$

where $\beta_{10}$ is some given constant.

Assuming $\sigma^2$ to be known, we know that

$$E(b_1) = \beta_1, \qquad Var(b_1) = \frac{\sigma^2}{s_{xx}},$$

and $b_1$ is a linear combination of the normally distributed $y_i$'s, so

$$b_1 \sim N\left(\beta_1, \frac{\sigma^2}{s_{xx}}\right),$$

and so the following statistic can be constructed:

$$Z_1 = \frac{b_1 - \beta_{10}}{\sqrt{\dfrac{\sigma^2}{s_{xx}}}}$$

which is distributed as $N(0, 1)$ when $H_0$ is true.
A decision rule to test $H_1: \beta_1 \neq \beta_{10}$ can be framed as follows:

Reject $H_0$ if $|Z_1| > z_{\alpha/2}$

where $z_{\alpha/2}$ is the $\alpha/2$ percentage point of the $N(0,1)$ distribution. Similarly, the decision rule for a one-sided alternative hypothesis can also be framed.

The $100(1-\alpha)\%$ confidence interval for $\beta_1$ can be obtained using the $Z_1$ statistic as follows:

$$P\left[-z_{\alpha/2} \le Z_1 \le z_{\alpha/2}\right] = 1 - \alpha$$

$$P\left[-z_{\alpha/2} \le \frac{b_1 - \beta_1}{\sqrt{\dfrac{\sigma^2}{s_{xx}}}} \le z_{\alpha/2}\right] = 1 - \alpha$$

$$P\left[b_1 - z_{\alpha/2}\sqrt{\frac{\sigma^2}{s_{xx}}} \le \beta_1 \le b_1 + z_{\alpha/2}\sqrt{\frac{\sigma^2}{s_{xx}}}\right] = 1 - \alpha.$$

So the $100(1-\alpha)\%$ confidence interval for $\beta_1$ is

$$\left(b_1 - z_{\alpha/2}\sqrt{\frac{\sigma^2}{s_{xx}}},\; b_1 + z_{\alpha/2}\sqrt{\frac{\sigma^2}{s_{xx}}}\right)$$

where $z_{\alpha/2}$ is the $\alpha/2$ percentage point of the $N(0,1)$ distribution.
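As an illustration, the test and the interval can be computed as in the following minimal Python sketch; the data, the assumed known value of $\sigma^2$, the hypothesized slope beta10 and the level alpha are all made-up values for illustration.

```python
# Minimal sketch: test of H0: beta_1 = beta10 and CI for beta_1, sigma^2 known.
# Data, sigma2, beta10 and alpha are illustrative assumptions.
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])
sigma2 = 0.04   # assumed known error variance
beta10 = 2.0    # hypothesized slope under H0
alpha = 0.05

sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx

z1 = (b1 - beta10) / np.sqrt(sigma2 / sxx)   # ~ N(0, 1) under H0
z_crit = stats.norm.ppf(1 - alpha / 2)       # z_{alpha/2}
reject = abs(z1) > z_crit                    # two-sided decision rule

half = z_crit * np.sqrt(sigma2 / sxx)
print(f"Z1 = {z1:.3f}, reject H0: {reject}, "
      f"CI = ({b1 - half:.4f}, {b1 + half:.4f})")
```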



Case 2: When $\sigma^2$ is unknown

When $\sigma^2$ is unknown, we proceed as follows. We know that

$$\frac{SS_{res}}{\sigma^2} \sim \chi^2(n-2),$$

where $SS_{res} = \sum_{i=1}^{n}(y_i - b_0 - b_1 x_i)^2$ is the residual sum of squares, and

$$E\left(\frac{SS_{res}}{n-2}\right) = \sigma^2.$$

Further, $SS_{res}/\sigma^2$ and $b_1$ are independently distributed. This result will be proved formally later in the module on multiple linear regression. It also follows from the result that, under the normal distribution, the maximum likelihood estimates of the population mean and the population variance (the sample mean and the sample variance) are independently distributed, so $b_1$ and $s^2$ are also independently distributed.

Thus the following statistic can be constructed:

$$t_0 = \frac{b_1 - \beta_{10}}{\sqrt{\dfrac{\hat{\sigma}^2}{s_{xx}}}} = \frac{b_1 - \beta_{10}}{\sqrt{\dfrac{SS_{res}}{(n-2)\,s_{xx}}}}$$

which follows a $t$-distribution with $(n-2)$ degrees of freedom, denoted as $t_{n-2}$, when $H_0$ is true.



A decision rule to test $H_1: \beta_1 \neq \beta_{10}$ is to

reject $H_0$ if $|t_0| > t_{n-2,\alpha/2}$

where $t_{n-2,\alpha/2}$ is the $\alpha/2$ percentage point of the $t$-distribution with $(n-2)$ degrees of freedom.

Similarly, the decision rule for a one-sided alternative hypothesis can also be framed.

The $100(1-\alpha)\%$ confidence interval for $\beta_1$ can be obtained using the $t_0$ statistic as follows:

$$P\left[-t_{n-2,\alpha/2} \le t_0 \le t_{n-2,\alpha/2}\right] = 1 - \alpha$$

$$P\left[-t_{n-2,\alpha/2} \le \frac{b_1 - \beta_1}{\sqrt{\dfrac{\hat{\sigma}^2}{s_{xx}}}} \le t_{n-2,\alpha/2}\right] = 1 - \alpha$$

$$P\left[b_1 - t_{n-2,\alpha/2}\sqrt{\frac{\hat{\sigma}^2}{s_{xx}}} \le \beta_1 \le b_1 + t_{n-2,\alpha/2}\sqrt{\frac{\hat{\sigma}^2}{s_{xx}}}\right] = 1 - \alpha.$$

So the $100(1-\alpha)\%$ confidence interval for $\beta_1$ is

$$\left(b_1 - t_{n-2,\alpha/2}\sqrt{\frac{SS_{res}}{(n-2)\,s_{xx}}},\; b_1 + t_{n-2,\alpha/2}\sqrt{\frac{SS_{res}}{(n-2)\,s_{xx}}}\right).$$
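A minimal Python sketch of this t-test and interval follows; the data, the hypothesized slope beta10 and the level alpha are again made-up values for illustration.

```python
# Minimal sketch: t-test of H0: beta_1 = beta10 and CI for beta_1,
# sigma^2 unknown. Data, beta10 and alpha are illustrative assumptions.
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])
beta10 = 2.0
alpha = 0.05
n = len(x)

sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
b0 = y.mean() - b1 * x.mean()
ss_res = np.sum((y - b0 - b1 * x) ** 2)

se_b1 = np.sqrt(ss_res / ((n - 2) * sxx))     # estimated standard error of b1
t0 = (b1 - beta10) / se_b1                    # ~ t_{n-2} under H0
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
reject = abs(t0) > t_crit

print(f"t0 = {t0:.3f}, reject H0: {reject}, "
      f"CI = ({b1 - t_crit * se_b1:.4f}, {b1 + t_crit * se_b1:.4f})")
```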

Testing of hypotheses and confidence interval estimation for the intercept term

Now we consider the tests of hypotheses and confidence interval estimation for the intercept term under two cases, viz., when $\sigma^2$ is known and when $\sigma^2$ is unknown.

Case 1: When $\sigma^2$ is known

Suppose the null hypothesis under consideration is $H_0: \beta_0 = \beta_{00}$, where $\sigma^2$ is known. Using the results that $E(b_0) = \beta_0$,

$$Var(b_0) = \sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{s_{xx}}\right)$$

and $b_0$ is a linear combination of normally distributed random variables, the following statistic can be constructed:

$$Z_0 = \frac{b_0 - \beta_{00}}{\sqrt{\sigma^2\left(\dfrac{1}{n} + \dfrac{\bar{x}^2}{s_{xx}}\right)}}$$

which has a $N(0, 1)$ distribution when $H_0$ is true.

A decision rule to test $H_1: \beta_0 \neq \beta_{00}$ can be framed as follows:

Reject $H_0$ if $|Z_0| > z_{\alpha/2}$

where $z_{\alpha/2}$ is the $\alpha/2$ percentage point of the $N(0,1)$ distribution.

Similarly, the decision rule for a one-sided alternative hypothesis can also be framed.

The $100(1-\alpha)\%$ confidence interval for $\beta_0$ when $\sigma^2$ is known can be derived using the $Z_0$ statistic as follows:

$$P\left[-z_{\alpha/2} \le Z_0 \le z_{\alpha/2}\right] = 1 - \alpha$$

$$P\left[-z_{\alpha/2} \le \frac{b_0 - \beta_0}{\sqrt{\sigma^2\left(\dfrac{1}{n} + \dfrac{\bar{x}^2}{s_{xx}}\right)}} \le z_{\alpha/2}\right] = 1 - \alpha$$

$$P\left[b_0 - z_{\alpha/2}\sqrt{\sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{s_{xx}}\right)} \le \beta_0 \le b_0 + z_{\alpha/2}\sqrt{\sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{s_{xx}}\right)}\right] = 1 - \alpha.$$

So the $100(1-\alpha)\%$ confidence interval for $\beta_0$ is

$$\left(b_0 - z_{\alpha/2}\sqrt{\sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{s_{xx}}\right)},\; b_0 + z_{\alpha/2}\sqrt{\sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{s_{xx}}\right)}\right).$$
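A minimal Python sketch for the intercept test and interval with known $\sigma^2$; the data, sigma2, the hypothesized intercept beta00 and alpha are made-up values for illustration.

```python
# Minimal sketch: test of H0: beta_0 = beta00 and CI for beta_0, sigma^2 known.
# Data, sigma2, beta00 and alpha are illustrative assumptions.
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])
sigma2 = 0.04
beta00 = 0.0
alpha = 0.05
n = len(x)

sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
b0 = y.mean() - b1 * x.mean()

se_b0 = np.sqrt(sigma2 * (1 / n + x.mean() ** 2 / sxx))  # sd of b0
z0 = (b0 - beta00) / se_b0                               # ~ N(0, 1) under H0
z_crit = stats.norm.ppf(1 - alpha / 2)

print(f"Z0 = {z0:.3f}, reject H0: {abs(z0) > z_crit}, "
      f"CI = ({b0 - z_crit * se_b0:.4f}, {b0 + z_crit * se_b0:.4f})")
```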


Case 2: When $\sigma^2$ is unknown

When $\sigma^2$ is unknown, the following statistic is constructed:

$$t_0 = \frac{b_0 - \beta_{00}}{\sqrt{\dfrac{SS_{res}}{n-2}\left(\dfrac{1}{n} + \dfrac{\bar{x}^2}{s_{xx}}\right)}}$$

which follows a $t$-distribution with $(n-2)$ degrees of freedom, i.e., $t_{n-2}$, when $H_0$ is true.

A decision rule to test $H_1: \beta_0 \neq \beta_{00}$ is as follows:

Reject $H_0$ whenever $|t_0| > t_{n-2,\alpha/2}$

where $t_{n-2,\alpha/2}$ is the $\alpha/2$ percentage point of the $t$-distribution with $(n-2)$ degrees of freedom.

Similarly, the decision rule for a one-sided alternative hypothesis can also be framed.
The $100(1-\alpha)\%$ confidence interval for $\beta_0$ can be obtained as follows. Consider

$$P\left[-t_{n-2,\alpha/2} \le t_0 \le t_{n-2,\alpha/2}\right] = 1 - \alpha$$

$$P\left[-t_{n-2,\alpha/2} \le \frac{b_0 - \beta_0}{\sqrt{\dfrac{SS_{res}}{n-2}\left(\dfrac{1}{n} + \dfrac{\bar{x}^2}{s_{xx}}\right)}} \le t_{n-2,\alpha/2}\right] = 1 - \alpha$$

$$P\left[b_0 - t_{n-2,\alpha/2}\sqrt{\frac{SS_{res}}{n-2}\left(\frac{1}{n} + \frac{\bar{x}^2}{s_{xx}}\right)} \le \beta_0 \le b_0 + t_{n-2,\alpha/2}\sqrt{\frac{SS_{res}}{n-2}\left(\frac{1}{n} + \frac{\bar{x}^2}{s_{xx}}\right)}\right] = 1 - \alpha.$$

So the $100(1-\alpha)\%$ confidence interval for $\beta_0$ is

$$\left(b_0 - t_{n-2,\alpha/2}\sqrt{\frac{SS_{res}}{n-2}\left(\frac{1}{n} + \frac{\bar{x}^2}{s_{xx}}\right)},\; b_0 + t_{n-2,\alpha/2}\sqrt{\frac{SS_{res}}{n-2}\left(\frac{1}{n} + \frac{\bar{x}^2}{s_{xx}}\right)}\right).$$
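A minimal Python sketch for the intercept t-test and interval; the data, the hypothesized intercept beta00 and alpha are made-up values for illustration.

```python
# Minimal sketch: t-test of H0: beta_0 = beta00 and CI for beta_0,
# sigma^2 unknown. Data, beta00 and alpha are illustrative assumptions.
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])
beta00 = 0.0
alpha = 0.05
n = len(x)

sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
b0 = y.mean() - b1 * x.mean()
ss_res = np.sum((y - b0 - b1 * x) ** 2)

se_b0 = np.sqrt(ss_res / (n - 2) * (1 / n + x.mean() ** 2 / sxx))
t0 = (b0 - beta00) / se_b0                    # ~ t_{n-2} under H0
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)

print(f"t0 = {t0:.3f}, reject H0: {abs(t0) > t_crit}, "
      f"CI = ({b0 - t_crit * se_b0:.4f}, {b0 + t_crit * se_b0:.4f})")
```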

Confidence interval for $\sigma^2$

A confidence interval for $\sigma^2$ can also be derived as follows. Since $SS_{res}/\sigma^2 \sim \chi^2_{n-2}$, consider

$$P\left[\chi^2_{n-2,\alpha/2} \le \frac{SS_{res}}{\sigma^2} \le \chi^2_{n-2,1-\alpha/2}\right] = 1 - \alpha$$

$$P\left[\frac{SS_{res}}{\chi^2_{n-2,1-\alpha/2}} \le \sigma^2 \le \frac{SS_{res}}{\chi^2_{n-2,\alpha/2}}\right] = 1 - \alpha.$$

The corresponding $100(1-\alpha)\%$ confidence interval for $\sigma^2$ is

$$\left(\frac{SS_{res}}{\chi^2_{n-2,1-\alpha/2}},\; \frac{SS_{res}}{\chi^2_{n-2,\alpha/2}}\right).$$
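A minimal Python sketch of this interval; the data and the level alpha are made-up values for illustration.

```python
# Minimal sketch: chi-square confidence interval for sigma^2.
# Data and alpha are illustrative assumptions.
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])
alpha = 0.05
n = len(x)

sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
b0 = y.mean() - b1 * x.mean()
ss_res = np.sum((y - b0 - b1 * x) ** 2)

lower = ss_res / stats.chi2.ppf(1 - alpha / 2, df=n - 2)  # divide by upper point
upper = ss_res / stats.chi2.ppf(alpha / 2, df=n - 2)      # divide by lower point
print(f"{100 * (1 - alpha):.0f}% CI for sigma^2: ({lower:.4f}, {upper:.4f})")
```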
