
Chapter 2

Simple Linear Regression Analysis

The simple linear regression model


We consider modelling the relationship between a dependent variable and one independent variable. When there is only one independent variable in the linear regression model, the model is generally termed a simple linear regression model. When there is more than one independent variable in the model, the linear model is termed a multiple linear regression model.

The linear model

Consider a simple linear regression model

$$ y = \beta_0 + \beta_1 X + \varepsilon $$

where $y$ is termed the dependent or study variable and $X$ is termed the independent or explanatory variable. The terms $\beta_0$ and $\beta_1$ are the parameters of the model. The parameter $\beta_0$ is termed the intercept term, and the parameter $\beta_1$ is termed the slope parameter. These parameters are usually called regression coefficients. The unobservable error component $\varepsilon$ accounts for the failure of the data to lie on the straight line and represents the difference between the true and observed realizations of $y$. There can be several reasons for such a difference, e.g., the effect of all the variables omitted from the model, the variables may be qualitative, inherent randomness in the observations, etc. We assume that $\varepsilon$ is an unobserved random variable, independent and identically distributed with mean zero and constant variance $\sigma^2$. Later, we will additionally assume that $\varepsilon$ is normally distributed.

The independent variable is viewed as controlled by the experimenter, so it is considered non-stochastic, whereas $y$ is viewed as a random variable with

$$ E(y) = \beta_0 + \beta_1 X $$

and

$$ \operatorname{Var}(y) = \sigma^2 . $$

Sometimes $X$ can also be a random variable. In such a case, instead of the sample mean and sample variance of $y$, we consider the conditional mean of $y$ given $X = x$,

$$ E(y \mid x) = \beta_0 + \beta_1 x , $$


and the conditional variance of $y$ given $X = x$,

$$ \operatorname{Var}(y \mid x) = \sigma^2 . $$

When the values of $\beta_0$, $\beta_1$ and $\sigma^2$ are known, the model is completely described. The parameters $\beta_0$, $\beta_1$ and $\sigma^2$ are generally unknown in practice, and $\varepsilon$ is unobserved. The determination of the statistical model $y = \beta_0 + \beta_1 X + \varepsilon$ depends on the determination (i.e., estimation) of $\beta_0$, $\beta_1$ and $\sigma^2$. In order to know the values of these parameters, $n$ pairs of observations $(x_i, y_i)\ (i = 1, \ldots, n)$ on $(X, y)$ are observed/collected and are used to determine these unknown parameters.

Various methods of estimation can be used to determine the estimates of the parameters. Among them, the

methods of least squares and maximum likelihood are the popular methods of estimation.

Least squares estimation


Suppose a sample of $n$ pairs of observations $(x_i, y_i)\ (i = 1, 2, \ldots, n)$ is available. These observations are assumed to satisfy the simple linear regression model, and so we can write

$$ y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \quad (i = 1, 2, \ldots, n). $$

The principle of least squares estimates the parameters $\beta_0$ and $\beta_1$ by minimizing the sum of squares of the differences between the observations and the line in the scatter diagram. Such an idea can be viewed from different perspectives. When the vertical differences between the observations and the line in the scatter diagram are considered, and their sum of squares is minimized to obtain the estimates of $\beta_0$ and $\beta_1$, the method is known as direct regression.
[Figure: Direct regression — the vertical distances between the observed points $(x_i, y_i)$ and the line $Y = \beta_0 + \beta_1 X$ are minimized.]


Alternatively, the sum of squares of the differences between the observations and the line in the horizontal direction in the scatter diagram can be minimized to obtain the estimates of $\beta_0$ and $\beta_1$. This is known as the reverse (or inverse) regression method.

[Figure: Reverse regression method — the horizontal distances between the observed points $(x_i, y_i)$ and the line $Y = \beta_0 + \beta_1 X$ are minimized.]

Instead of horizontal or vertical errors, if the sum of squares of the perpendicular distances between the observations and the line in the scatter diagram is minimized to obtain the estimates of $\beta_0$ and $\beta_1$, the method is known as orthogonal regression or the major axis regression method.

[Figure: Major axis regression method — the perpendicular distances between the observed points and the line $Y = \beta_0 + \beta_1 X$ are minimized.]


Instead of minimizing a distance, an area can also be minimized. The reduced major axis regression method minimizes the sum of the areas of the rectangles defined between the observed data points and the nearest points on the line in the scatter diagram to obtain the estimates of the regression coefficients. This is shown in the following figure:

[Figure: Reduced major axis method — the areas of the rectangles between the observed points and the line $Y = \beta_0 + \beta_1 X$ are minimized.]

The method of least absolute deviation regression considers the sum of the absolute deviations of the observations from the line in the vertical direction in the scatter diagram, as in the case of direct regression, to obtain the estimates of $\beta_0$ and $\beta_1$.

No assumption is required about the form of the probability distribution of $\varepsilon_i$ in deriving the least squares estimates. For the purpose of deriving statistical inferences only, we assume that the $\varepsilon_i$'s are random variables with $E(\varepsilon_i) = 0$, $\operatorname{Var}(\varepsilon_i) = \sigma^2$ and $\operatorname{Cov}(\varepsilon_i, \varepsilon_j) = 0$ for all $i \neq j$ $(i, j = 1, 2, \ldots, n)$. This assumption is needed to find the mean, variance and other properties of the least-squares estimates. The assumption that the $\varepsilon_i$'s are normally distributed is utilized while constructing the tests of hypotheses and confidence intervals for the parameters.

Based on these approaches, different estimates of $\beta_0$ and $\beta_1$ are obtained which have different statistical properties. Among them, the direct regression approach is the most popular. Generally, the direct regression estimates are referred to as the least-squares estimates or ordinary least squares estimates.

Direct regression method


This method is also known as ordinary least squares estimation. Assume that a set of $n$ paired observations $(x_i, y_i),\ i = 1, 2, \ldots, n$, is available which satisfies the linear regression model $y = \beta_0 + \beta_1 X + \varepsilon$. So we can write the model for each observation as

$$ y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \quad (i = 1, 2, \ldots, n). $$

The direct regression approach minimizes the sum of squares

$$ S(\beta_0, \beta_1) = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2 $$

with respect to $\beta_0$ and $\beta_1$.

The partial derivative of $S(\beta_0, \beta_1)$ with respect to $\beta_0$ is

$$ \frac{\partial S(\beta_0, \beta_1)}{\partial \beta_0} = -2 \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i) $$

and the partial derivative of $S(\beta_0, \beta_1)$ with respect to $\beta_1$ is

$$ \frac{\partial S(\beta_0, \beta_1)}{\partial \beta_1} = -2 \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)\, x_i . $$

The solutions for $\beta_0$ and $\beta_1$ are obtained by setting

$$ \frac{\partial S(\beta_0, \beta_1)}{\partial \beta_0} = 0 \qquad \text{and} \qquad \frac{\partial S(\beta_0, \beta_1)}{\partial \beta_1} = 0 . $$

The solutions of these two equations are called the direct regression estimators, or usually the ordinary least squares (OLS) estimators, of $\beta_0$ and $\beta_1$.

This gives the ordinary least squares estimates $b_0$ of $\beta_0$ and $b_1$ of $\beta_1$ as

$$ b_0 = \bar{y} - b_1 \bar{x}, \qquad b_1 = \frac{s_{xy}}{s_{xx}} $$

where

$$ s_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}), \quad s_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2, \quad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \quad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i .$$
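As a quick numerical illustration of these formulas, the following minimal Python sketch computes $b_0$ and $b_1$ from a small made-up data set (the data values and variable names are assumptions for illustration, not from the text):

```python
import numpy as np

# hypothetical illustrative data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

x_bar, y_bar = x.mean(), y.mean()
s_xy = np.sum((x - x_bar) * (y - y_bar))   # corrected cross-product sum
s_xx = np.sum((x - x_bar) ** 2)            # corrected sum of squares of x

b1 = s_xy / s_xx            # slope estimate
b0 = y_bar - b1 * x_bar     # intercept estimate
```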


Further, we have

$$ \frac{\partial^2 S(\beta_0, \beta_1)}{\partial \beta_0^2} = 2n, \qquad \frac{\partial^2 S(\beta_0, \beta_1)}{\partial \beta_1^2} = 2\sum_{i=1}^{n} x_i^2, \qquad \frac{\partial^2 S(\beta_0, \beta_1)}{\partial \beta_0\, \partial \beta_1} = 2\sum_{i=1}^{n} x_i = 2 n \bar{x} . $$

The Hessian matrix, which is the matrix of second-order partial derivatives, in this case is given as

$$ H^* = \begin{pmatrix} \dfrac{\partial^2 S(\beta_0,\beta_1)}{\partial \beta_0^2} & \dfrac{\partial^2 S(\beta_0,\beta_1)}{\partial \beta_0 \partial \beta_1} \\[1ex] \dfrac{\partial^2 S(\beta_0,\beta_1)}{\partial \beta_0 \partial \beta_1} & \dfrac{\partial^2 S(\beta_0,\beta_1)}{\partial \beta_1^2} \end{pmatrix} = 2\begin{pmatrix} n & n\bar{x} \\[0.5ex] n\bar{x} & \sum_{i=1}^{n} x_i^2 \end{pmatrix} = 2\begin{pmatrix} \ell' \\ x' \end{pmatrix}\begin{pmatrix} \ell & x \end{pmatrix} $$

where $\ell = (1, 1, \ldots, 1)'$ is an $n$-vector with elements unity and $x = (x_1, \ldots, x_n)'$ is an $n$-vector of observations on $X$.

The matrix $H^*$ is positive definite if its determinant and the element in the first row and first column of $H^*$ are positive. The determinant of $H^*$ is given by

$$ |H^*| = 4\left( n \sum_{i=1}^{n} x_i^2 - n^2 \bar{x}^2 \right) = 4 n \sum_{i=1}^{n} (x_i - \bar{x})^2 \geq 0 . $$

The case when $\sum_{i=1}^{n} (x_i - \bar{x})^2 = 0$ is not interesting, because in this case all the observations are identical, i.e. $x_i = c$ (some constant). In such a case, there is no relationship between $x$ and $y$ in the context of regression analysis. Since $\sum_{i=1}^{n} (x_i - \bar{x})^2 > 0$, therefore $|H^*| > 0$. So $H^*$ is positive definite for any $(\beta_0, \beta_1)$; therefore, $S(\beta_0, \beta_1)$ has a global minimum at $(b_0, b_1)$.


The fitted line or the fitted linear regression model is

$$ \hat{y} = b_0 + b_1 x , $$

and the predicted values are

$$ \hat{y}_i = b_0 + b_1 x_i \quad (i = 1, 2, \ldots, n). $$

The difference between the observed value $y_i$ and the fitted (or predicted) value $\hat{y}_i$ is called a residual. The $i$-th residual is defined as

$$ e_i = y_i - \hat{y}_i = y_i - (b_0 + b_1 x_i) \quad (i = 1, 2, \ldots, n). $$
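Continuing the earlier illustrative sketch (same assumed arrays and estimates), fitted values and residuals follow directly:

```python
y_hat = b0 + b1 * x      # fitted values
e = y - y_hat            # residuals e_i = y_i - y_hat_i; they sum to zero
```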

Properties of the direct regression estimators:

Unbiased property:

Note that $b_1 = \dfrac{s_{xy}}{s_{xx}}$ and $b_0 = \bar{y} - b_1 \bar{x}$ are linear combinations of $y_i\ (i = 1, \ldots, n)$. Therefore

$$ b_1 = \sum_{i=1}^{n} k_i y_i $$

where $k_i = (x_i - \bar{x})/s_{xx}$. Note that $\sum_{i=1}^{n} k_i = 0$ and $\sum_{i=1}^{n} k_i x_i = 1$, so

$$ E(b_1) = \sum_{i=1}^{n} k_i E(y_i) = \sum_{i=1}^{n} k_i (\beta_0 + \beta_1 x_i) = \beta_1 . $$

Thus $b_1$ is an unbiased estimator of $\beta_1$. Next,

$$ E(b_0) = E(\bar{y} - b_1 \bar{x}) = E(\bar{y}) - \bar{x}\, E(b_1) = \beta_0 + \beta_1 \bar{x} - \beta_1 \bar{x} = \beta_0 . $$

Thus $b_0$ is an unbiased estimator of $\beta_0$.


Variances:

Using the assumption that the $y_i$'s are independently distributed, the variance of $b_1$ is

$$ \operatorname{Var}(b_1) = \sum_{i=1}^{n} k_i^2 \operatorname{Var}(y_i) + \sum_{i \neq j} k_i k_j \operatorname{Cov}(y_i, y_j) = \sigma^2\, \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{s_{xx}^2} = \frac{\sigma^2}{s_{xx}} $$

(since $\operatorname{Cov}(y_i, y_j) = 0$ as $y_1, \ldots, y_n$ are independent).

The variance of $b_0$ is

$$ \operatorname{Var}(b_0) = \operatorname{Var}(\bar{y}) + \bar{x}^2 \operatorname{Var}(b_1) - 2\bar{x}\operatorname{Cov}(\bar{y}, b_1). $$

First, we find that

$$ \operatorname{Cov}(\bar{y}, b_1) = \operatorname{Cov}\!\left( \bar{y}, \sum_{i=1}^{n} k_i y_i \right) = \sum_{i=1}^{n} k_i \operatorname{Cov}(\bar{y}, y_i) = \frac{\sigma^2}{n}\sum_{i=1}^{n} k_i = 0 . $$

So

$$ \operatorname{Var}(b_0) = \sigma^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{s_{xx}} \right). $$

Covariance:

The covariance between $b_0$ and $b_1$ is

$$ \operatorname{Cov}(b_0, b_1) = \operatorname{Cov}(\bar{y}, b_1) - \bar{x}\operatorname{Var}(b_1) = -\frac{\bar{x}\,\sigma^2}{s_{xx}} . $$

It can further be shown that the ordinary least squares estimators $b_0$ and $b_1$ possess the minimum variance in the class of linear and unbiased estimators. So they are termed the Best Linear Unbiased Estimators (BLUE). This property is known as the Gauss–Markov theorem, which is discussed later in the multiple linear regression model.



Residual sum of squares:

The residual sum of squares is given by

$$ SS_{res} = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2 . $$

Substituting $b_0 = \bar{y} - b_1 \bar{x}$,

$$
\begin{aligned}
SS_{res} &= \sum_{i=1}^{n} \big[ (y_i - \bar{y}) - b_1 (x_i - \bar{x}) \big]^2 \\
&= \sum_{i=1}^{n} (y_i - \bar{y})^2 + b_1^2 \sum_{i=1}^{n} (x_i - \bar{x})^2 - 2 b_1 \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \\
&= s_{yy} + b_1^2 s_{xx} - 2 b_1^2 s_{xx} \\
&= s_{yy} - b_1^2 s_{xx} \\
&= s_{yy} - \frac{s_{xy}^2}{s_{xx}} \\
&= s_{yy} - b_1 s_{xy}
\end{aligned}
$$

where $s_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2$ and $\bar{y} = \dfrac{1}{n}\sum_{i=1}^{n} y_i$.

Estimation of $\sigma^2$

The estimator of $\sigma^2$ is obtained from the residual sum of squares as follows. Assuming that $y_i$ is normally distributed, it follows that $SS_{res}/\sigma^2$ has a $\chi^2$ distribution with $(n-2)$ degrees of freedom, i.e.

$$ \frac{SS_{res}}{\sigma^2} \sim \chi^2(n-2). $$

Thus, using the result about the expectation of a chi-square random variable, we have

$$ E(SS_{res}) = (n-2)\,\sigma^2 . $$

Thus an unbiased estimator of $\sigma^2$ is

$$ s^2 = \frac{SS_{res}}{n-2} . $$

Note that $SS_{res}$ has only $(n-2)$ degrees of freedom. The two degrees of freedom are lost due to the estimation of $b_0$ and $b_1$. Since $s^2$ depends on the estimates $b_0$ and $b_1$, it is a model-dependent estimate of $\sigma^2$.
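A one-line sketch of this estimator, reusing the residuals from the earlier illustrative snippet:

```python
n = len(x)
SS_res = np.sum(e ** 2)       # residual sum of squares
s2 = SS_res / (n - 2)         # unbiased estimator of sigma^2
```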


Estimate of variances of $b_0$ and $b_1$:

The estimators of the variances of $b_0$ and $b_1$ are obtained by replacing $\sigma^2$ by its estimate $\hat{\sigma}^2 = s^2$ as follows:

$$ \widehat{\operatorname{Var}}(b_0) = s^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{s_{xx}} \right) $$

and

$$ \widehat{\operatorname{Var}}(b_1) = \frac{s^2}{s_{xx}} . $$

It is observed that $\sum_{i=1}^{n} e_i = 0$, mirroring the assumption $E(\varepsilon_i) = 0$. In the light of this property, $e_i$ can be regarded as an estimate of the unknown $\varepsilon_i\ (i = 1, \ldots, n)$. This helps in verifying the different model assumptions on the basis of the given sample $(x_i, y_i),\ i = 1, 2, \ldots, n$.

Further, note that

(i) $\sum_{i=1}^{n} x_i e_i = 0$,

(ii) $\sum_{i=1}^{n} \hat{y}_i e_i = 0$,

(iii) $\sum_{i=1}^{n} y_i = \sum_{i=1}^{n} \hat{y}_i$, and

(iv) the fitted line always passes through $(\bar{x}, \bar{y})$.

Centered model:

Sometimes it is useful to measure the independent variable around its mean. In such a case, the model $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ has a centered version as follows:

$$ y_i = \beta_0 + \beta_1 (x_i - \bar{x}) + \beta_1 \bar{x} + \varepsilon_i \quad (i = 1, 2, \ldots, n) $$
$$ \phantom{y_i} = \beta_0^* + \beta_1 (x_i - \bar{x}) + \varepsilon_i $$

where $\beta_0^* = \beta_0 + \beta_1 \bar{x}$. The sum of squares due to error is given by

$$ S(\beta_0^*, \beta_1) = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \big[ y_i - \beta_0^* - \beta_1 (x_i - \bar{x}) \big]^2 . $$

Now solving

$$ \frac{\partial S(\beta_0^*, \beta_1)}{\partial \beta_0^*} = 0 \qquad \text{and} \qquad \frac{\partial S(\beta_0^*, \beta_1)}{\partial \beta_1} = 0 , $$


we get the direct regression least squares estimates of $\beta_0^*$ and $\beta_1$ as

$$ b_0^* = \bar{y} $$

and

$$ b_1 = \frac{s_{xy}}{s_{xx}} , $$

respectively.

Thus the form of the estimate of the slope parameter $\beta_1$ remains the same in the usual and centered models, whereas the form of the estimate of the intercept term changes between the usual and centered models.

Further, the Hessian matrix of the second-order partial derivatives of $S(\beta_0^*, \beta_1)$ with respect to $\beta_0^*$ and $\beta_1$ is positive definite at $\beta_0^* = b_0^*$ and $\beta_1 = b_1$, which ensures that $S(\beta_0^*, \beta_1)$ is minimized at $\beta_0^* = b_0^*$ and $\beta_1 = b_1$.

Under the assumptions that $E(\varepsilon_i) = 0$, $\operatorname{Var}(\varepsilon_i) = \sigma^2$ and $\operatorname{Cov}(\varepsilon_i, \varepsilon_j) = 0$ for all $i \neq j = 1, 2, \ldots, n$, it follows that

$$ E(b_0^*) = \beta_0^*, \qquad E(b_1) = \beta_1, $$
$$ \operatorname{Var}(b_0^*) = \frac{\sigma^2}{n}, \qquad \operatorname{Var}(b_1) = \frac{\sigma^2}{s_{xx}} . $$

In this case, the fitted model of $y_i = \beta_0^* + \beta_1 (x_i - \bar{x}) + \varepsilon_i$ is

$$ \hat{y} = \bar{y} + b_1 (x - \bar{x}) , $$

and the predicted values are

$$ \hat{y}_i = \bar{y} + b_1 (x_i - \bar{x}) \quad (i = 1, \ldots, n). $$

Note that in the centered model

$$ \operatorname{Cov}(b_0^*, b_1) = 0 . $$


No intercept term model:

Sometimes in practice, a model without an intercept term is used in those situations where $y_i = 0$ whenever $x_i = 0$ for all $i = 1, 2, \ldots, n$. A no-intercept model is

$$ y_i = \beta_1 x_i + \varepsilon_i \quad (i = 1, 2, \ldots, n). $$

For example, in analyzing the relationship between the velocity ($y$) of a car and its acceleration ($X$), the velocity is zero when the acceleration is zero.

Using the data $(x_i, y_i),\ i = 1, 2, \ldots, n$, the direct regression least-squares estimate of $\beta_1$ is obtained by minimizing

$$ S(\beta_1) = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} (y_i - \beta_1 x_i)^2 $$

and solving

$$ \frac{\partial S(\beta_1)}{\partial \beta_1} = 0 , $$

which gives the estimator of $\beta_1$ as

$$ b_1^* = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2} . $$
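A small sketch of the no-intercept estimator under the same illustrative assumptions; it simply replaces the centered sums by raw sums:

```python
b1_star = np.sum(x * y) / np.sum(x ** 2)   # slope of the line through the origin
```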

The second-order partial derivative of $S(\beta_1)$ with respect to $\beta_1$ at $\beta_1 = b_1^*$ is positive, which ensures that $b_1^*$ minimizes $S(\beta_1)$.

Using the assumptions that $E(\varepsilon_i) = 0$, $\operatorname{Var}(\varepsilon_i) = \sigma^2$ and $\operatorname{Cov}(\varepsilon_i, \varepsilon_j) = 0$ for all $i \neq j = 1, 2, \ldots, n$, the properties of $b_1^*$ can be derived as follows:

$$ E(b_1^*) = \frac{\sum_{i=1}^{n} x_i E(y_i)}{\sum_{i=1}^{n} x_i^2} = \frac{\beta_1 \sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n} x_i^2} = \beta_1 . $$

Thus $b_1^*$ is an unbiased estimator of $\beta_1$. The variance of $b_1^*$ is obtained as follows:


$$ \operatorname{Var}(b_1^*) = \frac{\sum_{i=1}^{n} x_i^2 \operatorname{Var}(y_i)}{\left( \sum_{i=1}^{n} x_i^2 \right)^2} = \sigma^2\, \frac{\sum_{i=1}^{n} x_i^2}{\left( \sum_{i=1}^{n} x_i^2 \right)^2} = \frac{\sigma^2}{\sum_{i=1}^{n} x_i^2} , $$

and an unbiased estimator of $\sigma^2$ is obtained as

$$ \hat{\sigma}^2 = \frac{\sum_{i=1}^{n} y_i^2 - b_1^* \sum_{i=1}^{n} x_i y_i}{n - 1} . $$

Maximum likelihood estimation


We assume that the $\varepsilon_i\ (i = 1, 2, \ldots, n)$ are independent and identically distributed following a normal distribution $N(0, \sigma^2)$. Now we use the method of maximum likelihood to estimate the parameters of the linear regression model

$$ y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \quad (i = 1, 2, \ldots, n); $$

the observations $y_i\ (i = 1, 2, \ldots, n)$ are then independently distributed as $N(\beta_0 + \beta_1 x_i,\ \sigma^2)$ for all $i = 1, 2, \ldots, n$.

The likelihood function of the given observations $(x_i, y_i)$ and the unknown parameters $\beta_0$, $\beta_1$ and $\sigma^2$ is

$$ L(x_i, y_i; \beta_0, \beta_1, \sigma^2) = \prod_{i=1}^{n} \left( \frac{1}{2\pi\sigma^2} \right)^{1/2} \exp\left[ -\frac{1}{2\sigma^2} (y_i - \beta_0 - \beta_1 x_i)^2 \right]. $$

The maximum likelihood estimates of $\beta_0$, $\beta_1$ and $\sigma^2$ can be obtained by maximizing $L(x_i, y_i; \beta_0, \beta_1, \sigma^2)$ or, equivalently, $\ln L(x_i, y_i; \beta_0, \beta_1, \sigma^2)$ where

$$ \ln L(x_i, y_i; \beta_0, \beta_1, \sigma^2) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln \sigma^2 - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2 . $$


The normal equations are obtained by partially differentiating the log-likelihood with respect to $\beta_0$, $\beta_1$ and $\sigma^2$ and equating them to zero as follows:

$$ \frac{\partial \ln L(x_i, y_i; \beta_0, \beta_1, \sigma^2)}{\partial \beta_0} = \frac{1}{\sigma^2} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i) = 0 , $$

$$ \frac{\partial \ln L(x_i, y_i; \beta_0, \beta_1, \sigma^2)}{\partial \beta_1} = \frac{1}{\sigma^2} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)\, x_i = 0 $$

and

$$ \frac{\partial \ln L(x_i, y_i; \beta_0, \beta_1, \sigma^2)}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2 = 0 . $$

The solution of these normal equations gives the maximum likelihood estimates of $\beta_0$, $\beta_1$ and $\sigma^2$ as

$$ \tilde{b}_0 = \bar{y} - \tilde{b}_1 \bar{x} , $$

$$ \tilde{b}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{s_{xy}}{s_{xx}} $$

and

$$ \tilde{s}^2 = \frac{\sum_{i=1}^{n} (y_i - \tilde{b}_0 - \tilde{b}_1 x_i)^2}{n} , $$

respectively.

It can be verified that the Hessian matrix of second-order partial derivatives of $\ln L$ with respect to $\beta_0$, $\beta_1$ and $\sigma^2$ is negative definite at $\beta_0 = \tilde{b}_0$, $\beta_1 = \tilde{b}_1$ and $\sigma^2 = \tilde{s}^2$, which ensures that the likelihood function is maximized at these values.

Note that the least-squares and maximum likelihood estimates of $\beta_0$ and $\beta_1$ are identical. The least-squares and maximum likelihood estimates of $\sigma^2$ are different. In fact, the least-squares estimate of $\sigma^2$ is

$$ s^2 = \frac{1}{n-2}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 , $$

so that it is related to the maximum likelihood estimate as

$$ \tilde{s}^2 = \frac{n-2}{n}\, s^2 . $$

Thus $\tilde{b}_0$ and $\tilde{b}_1$ are unbiased estimators of $\beta_0$ and $\beta_1$, whereas $\tilde{s}^2$ is a biased estimate of $\sigma^2$, but it is asymptotically unbiased. The variances of $\tilde{b}_0$ and $\tilde{b}_1$ are the same as those of $b_0$ and $b_1$ respectively, but $\operatorname{Var}(\tilde{s}^2) < \operatorname{Var}(s^2)$.
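A brief sketch of the relation between the two variance estimates, with the quantities assumed from the earlier snippets:

```python
s2_ols = SS_res / (n - 2)    # least-squares (unbiased) estimate of sigma^2
s2_mle = SS_res / n          # maximum likelihood estimate (biased)
# s2_mle equals (n - 2) / n * s2_ols
```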


Testing of hypotheses and confidence interval estimation for the slope parameter:

Now we consider the tests of hypotheses and confidence interval estimation for the slope parameter of the model under two cases, viz., when $\sigma^2$ is known and when $\sigma^2$ is unknown.

Case 1: When $\sigma^2$ is known:

Consider the simple linear regression model $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i\ (i = 1, 2, \ldots, n)$. It is assumed that the $\varepsilon_i$'s are independent and identically distributed and follow $N(0, \sigma^2)$.

First, we develop a test for the null hypothesis related to the slope parameter

$$ H_0: \beta_1 = \beta_{10} $$

where $\beta_{10}$ is some given constant.

Assuming $\sigma^2$ to be known, we know that $E(b_1) = \beta_1$, $\operatorname{Var}(b_1) = \dfrac{\sigma^2}{s_{xx}}$ and $b_1$ is a linear combination of the normally distributed $y_i$'s. So

$$ b_1 \sim N\left( \beta_1, \frac{\sigma^2}{s_{xx}} \right) $$

and so the following statistic can be constructed:

$$ Z_1 = \frac{b_1 - \beta_{10}}{\sqrt{\dfrac{\sigma^2}{s_{xx}}}} , $$

which is distributed as $N(0, 1)$ when $H_0$ is true.

A decision rule to test $H_1: \beta_1 \neq \beta_{10}$ can be framed as follows:

Reject $H_0$ if $|Z_1| \geq z_{\alpha/2}$

where $z_{\alpha/2}$ is the $\alpha/2$ percentage point of the normal distribution.

Similarly, the decision rule for a one-sided alternative hypothesis can also be framed.


The 100$(1-\alpha)$% confidence interval for $\beta_1$ can be obtained using the $Z_1$ statistic as follows:

$$ P\left( -z_{\alpha/2} \leq Z_1 \leq z_{\alpha/2} \right) = 1 - \alpha $$

$$ P\left( -z_{\alpha/2} \leq \frac{b_1 - \beta_1}{\sqrt{\dfrac{\sigma^2}{s_{xx}}}} \leq z_{\alpha/2} \right) = 1 - \alpha $$

$$ P\left( b_1 - z_{\alpha/2}\sqrt{\frac{\sigma^2}{s_{xx}}} \ \leq\ \beta_1 \ \leq\ b_1 + z_{\alpha/2}\sqrt{\frac{\sigma^2}{s_{xx}}} \right) = 1 - \alpha . $$

So the 100$(1-\alpha)$% confidence interval for $\beta_1$ is

$$ \left[ b_1 - z_{\alpha/2}\sqrt{\frac{\sigma^2}{s_{xx}}},\ \ b_1 + z_{\alpha/2}\sqrt{\frac{\sigma^2}{s_{xx}}} \right] $$

where $z_{\alpha/2}$ is the $\alpha/2$ percentage point of the $N(0, 1)$ distribution.

Case 2: When $\sigma^2$ is unknown:

When $\sigma^2$ is unknown, we proceed as follows. We know that

$$ \frac{SS_{res}}{\sigma^2} \sim \chi^2(n-2) $$

and

$$ E\left( \frac{SS_{res}}{n-2} \right) = \sigma^2 . $$

Further, $SS_{res}/\sigma^2$ and $b_1$ are independently distributed. This result will be proved formally later in the next module on multiple linear regression. It also follows from the result that, under the normal distribution, the maximum likelihood estimates, viz., the sample mean (estimator of the population mean) and the sample variance (estimator of the population variance), are independently distributed; so $b_1$ and $s^2$ are also independently distributed.

Thus the following statistic can be constructed:

$$ t_0 = \frac{b_1 - \beta_{10}}{\sqrt{\dfrac{\hat{\sigma}^2}{s_{xx}}}} = \frac{b_1 - \beta_{10}}{\sqrt{\dfrac{SS_{res}}{(n-2)\, s_{xx}}}} , $$

which follows a $t$-distribution with $(n-2)$ degrees of freedom, denoted as $t_{n-2}$, when $H_0$ is true.


A decision rule to test $H_1: \beta_1 \neq \beta_{10}$ is to

reject $H_0$ if $|t_0| \geq t_{n-2,\, \alpha/2}$

where $t_{n-2,\, \alpha/2}$ is the $\alpha/2$ percentage point of the $t$-distribution with $(n-2)$ degrees of freedom. Similarly, the decision rule for a one-sided alternative hypothesis can also be framed.

The 100$(1-\alpha)$% confidence interval for $\beta_1$ can be obtained using the $t_0$ statistic as follows. Consider

$$ P\left( -t_{\alpha/2} \leq t_0 \leq t_{\alpha/2} \right) = 1 - \alpha $$

$$ P\left( -t_{\alpha/2} \leq \frac{b_1 - \beta_1}{\sqrt{\dfrac{\hat{\sigma}^2}{s_{xx}}}} \leq t_{\alpha/2} \right) = 1 - \alpha $$

$$ P\left( b_1 - t_{\alpha/2}\sqrt{\frac{\hat{\sigma}^2}{s_{xx}}} \ \leq\ \beta_1 \ \leq\ b_1 + t_{\alpha/2}\sqrt{\frac{\hat{\sigma}^2}{s_{xx}}} \right) = 1 - \alpha . $$

So the 100$(1-\alpha)$% confidence interval for $\beta_1$ is

$$ \left[ b_1 - t_{n-2,\, \alpha/2}\sqrt{\frac{SS_{res}}{(n-2)\,s_{xx}}},\ \ b_1 + t_{n-2,\, \alpha/2}\sqrt{\frac{SS_{res}}{(n-2)\,s_{xx}}} \right]. $$
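A hedged sketch of this $t$-test and confidence interval for $\beta_1$; variable names continue the earlier snippets, and scipy is assumed to be available:

```python
from scipy import stats

se_b1 = np.sqrt(s2 / s_xx)                      # estimated standard error of b1
t0 = (b1 - 0.0) / se_b1                         # test of H0: beta1 = 0
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - 2)    # two-sided 5% critical value
ci_b1 = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
```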

Testing of hypotheses and confidence interval estimation for intercept term:


Now, we consider the tests of hypotheses and confidence interval estimation for the intercept term under two cases, viz., when $\sigma^2$ is known and when $\sigma^2$ is unknown.

Case 1: When $\sigma^2$ is known:

Suppose the null hypothesis under consideration is

$$ H_0: \beta_0 = \beta_{00} , $$

where $\sigma^2$ is known. Then, using the results that $E(b_0) = \beta_0$, $\operatorname{Var}(b_0) = \sigma^2\left( \dfrac{1}{n} + \dfrac{\bar{x}^2}{s_{xx}} \right)$ and that $b_0$ is a linear combination of normally distributed random variables, the following statistic

$$ Z_0 = \frac{b_0 - \beta_{00}}{\sqrt{\sigma^2\left( \dfrac{1}{n} + \dfrac{\bar{x}^2}{s_{xx}} \right)}} $$

has a $N(0, 1)$ distribution when $H_0$ is true.


A decision rule to test $H_1: \beta_0 \neq \beta_{00}$ can be framed as follows:

Reject $H_0$ if $|Z_0| \geq z_{\alpha/2}$

where $z_{\alpha/2}$ is the $\alpha/2$ percentage point of the normal distribution. Similarly, the decision rule for a one-sided alternative hypothesis can also be framed.

The 100$(1-\alpha)$% confidence interval for $\beta_0$ when $\sigma^2$ is known can be derived using the $Z_0$ statistic as follows:

$$ P\left( -z_{\alpha/2} \leq Z_0 \leq z_{\alpha/2} \right) = 1 - \alpha $$

$$ P\left( -z_{\alpha/2} \leq \frac{b_0 - \beta_0}{\sqrt{\sigma^2\left( \frac{1}{n} + \frac{\bar{x}^2}{s_{xx}} \right)}} \leq z_{\alpha/2} \right) = 1 - \alpha $$

$$ P\left( b_0 - z_{\alpha/2}\sqrt{\sigma^2\left( \frac{1}{n} + \frac{\bar{x}^2}{s_{xx}} \right)} \ \leq\ \beta_0 \ \leq\ b_0 + z_{\alpha/2}\sqrt{\sigma^2\left( \frac{1}{n} + \frac{\bar{x}^2}{s_{xx}} \right)} \right) = 1 - \alpha . $$

So the 100$(1-\alpha)$% confidence interval for $\beta_0$ is

$$ \left[ b_0 - z_{\alpha/2}\sqrt{\sigma^2\left( \frac{1}{n} + \frac{\bar{x}^2}{s_{xx}} \right)},\ \ b_0 + z_{\alpha/2}\sqrt{\sigma^2\left( \frac{1}{n} + \frac{\bar{x}^2}{s_{xx}} \right)} \right]. $$

Case 2: When $\sigma^2$ is unknown:

When $\sigma^2$ is unknown, the following statistic is constructed:

$$ t_0 = \frac{b_0 - \beta_{00}}{\sqrt{\dfrac{SS_{res}}{n-2}\left( \dfrac{1}{n} + \dfrac{\bar{x}^2}{s_{xx}} \right)}} , $$

which follows a $t$-distribution with $(n-2)$ degrees of freedom, i.e., $t_{n-2}$, when $H_0$ is true.

A decision rule to test $H_1: \beta_0 \neq \beta_{00}$ is as follows:

Reject $H_0$ whenever $|t_0| \geq t_{n-2,\, \alpha/2}$

where $t_{n-2,\, \alpha/2}$ is the $\alpha/2$ percentage point of the $t$-distribution with $(n-2)$ degrees of freedom. Similarly, the decision rule for a one-sided alternative hypothesis can also be framed.


The 100$(1-\alpha)$% confidence interval for $\beta_0$ can be obtained as follows. Consider

$$ P\left( -t_{n-2,\, \alpha/2} \leq t_0 \leq t_{n-2,\, \alpha/2} \right) = 1 - \alpha $$

$$ P\left( -t_{n-2,\, \alpha/2} \leq \frac{b_0 - \beta_0}{\sqrt{\dfrac{SS_{res}}{n-2}\left( \dfrac{1}{n} + \dfrac{\bar{x}^2}{s_{xx}} \right)}} \leq t_{n-2,\, \alpha/2} \right) = 1 - \alpha $$

$$ P\left( b_0 - t_{n-2,\, \alpha/2}\sqrt{\frac{SS_{res}}{n-2}\left( \frac{1}{n} + \frac{\bar{x}^2}{s_{xx}} \right)} \ \leq\ \beta_0 \ \leq\ b_0 + t_{n-2,\, \alpha/2}\sqrt{\frac{SS_{res}}{n-2}\left( \frac{1}{n} + \frac{\bar{x}^2}{s_{xx}} \right)} \right) = 1 - \alpha . $$

So the 100$(1-\alpha)$% confidence interval for $\beta_0$ is

$$ \left[ b_0 - t_{n-2,\, \alpha/2}\sqrt{\frac{SS_{res}}{n-2}\left( \frac{1}{n} + \frac{\bar{x}^2}{s_{xx}} \right)},\ \ b_0 + t_{n-2,\, \alpha/2}\sqrt{\frac{SS_{res}}{n-2}\left( \frac{1}{n} + \frac{\bar{x}^2}{s_{xx}} \right)} \right]. $$

Test of hypothesis for $\sigma^2$

We have considered two types of test statistics for testing hypotheses about the intercept term and the slope parameter: when $\sigma^2$ is known and when $\sigma^2$ is unknown. When dealing with the case of known $\sigma^2$, the value of $\sigma^2$ is known from some external source like past experience, long association of the experimenter with the experiment, past studies, etc. In such situations, the experimenter would like to test a hypothesis like $H_0: \sigma^2 = \sigma_0^2$ against $H_1: \sigma^2 \neq \sigma_0^2$, where $\sigma_0^2$ is specified. The test statistic is based on the result $SS_{res}/\sigma^2 \sim \chi^2_{n-2}$. So the test statistic is

$$ C_0 = \frac{SS_{res}}{\sigma_0^2} \sim \chi^2_{n-2} \quad \text{under } H_0 . $$

The decision rule is to reject $H_0$ if $C_0 > \chi^2_{n-2,\, \alpha/2}$ or $C_0 < \chi^2_{n-2,\, 1-\alpha/2}$.


Confidence interval for $\sigma^2$

A confidence interval for $\sigma^2$ can also be derived as follows. Since $SS_{res}/\sigma^2 \sim \chi^2_{n-2}$, consider

$$ P\left( \chi^2_{n-2,\, 1-\alpha/2} \leq \frac{SS_{res}}{\sigma^2} \leq \chi^2_{n-2,\, \alpha/2} \right) = 1 - \alpha $$

$$ P\left( \frac{SS_{res}}{\chi^2_{n-2,\, \alpha/2}} \leq \sigma^2 \leq \frac{SS_{res}}{\chi^2_{n-2,\, 1-\alpha/2}} \right) = 1 - \alpha . $$

The corresponding 100$(1-\alpha)$% confidence interval for $\sigma^2$ is

$$ \left[ \frac{SS_{res}}{\chi^2_{n-2,\, \alpha/2}},\ \ \frac{SS_{res}}{\chi^2_{n-2,\, 1-\alpha/2}} \right]. $$
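A short sketch of this chi-square interval under the same illustrative assumptions as the earlier snippets:

```python
chi2_hi = stats.chi2.ppf(1 - 0.05 / 2, df=n - 2)   # upper alpha/2 point
chi2_lo = stats.chi2.ppf(0.05 / 2, df=n - 2)       # lower alpha/2 point
ci_sigma2 = (SS_res / chi2_hi, SS_res / chi2_lo)   # 95% interval for sigma^2
```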

Joint confidence region for $\beta_0$ and $\beta_1$:

A joint confidence region for $\beta_0$ and $\beta_1$ can also be found. Such a region provides 100$(1-\alpha)$% confidence that both the estimates of $\beta_0$ and $\beta_1$ are correct. Consider the centered version of the linear regression model

$$ y_i = \beta_0^* + \beta_1 (x_i - \bar{x}) + \varepsilon_i $$

where $\beta_0^* = \beta_0 + \beta_1 \bar{x}$. The least squares estimators of $\beta_0^*$ and $\beta_1$ are

$$ b_0^* = \bar{y} \quad \text{and} \quad b_1 = \frac{s_{xy}}{s_{xx}} , $$

respectively.

Using the results that

$$ E(b_0^*) = \beta_0^*, \qquad E(b_1) = \beta_1, \qquad \operatorname{Var}(b_0^*) = \frac{\sigma^2}{n}, \qquad \operatorname{Var}(b_1) = \frac{\sigma^2}{s_{xx}} , $$

when $\sigma^2$ is known the statistics

$$ \frac{b_0^* - \beta_0^*}{\sqrt{\dfrac{\sigma^2}{n}}} \sim N(0, 1) \quad \text{and} \quad \frac{b_1 - \beta_1}{\sqrt{\dfrac{\sigma^2}{s_{xx}}}} \sim N(0, 1). $$


Moreover, both statistics are independently distributed. Thus

$$ \left( \frac{b_0^* - \beta_0^*}{\sqrt{\sigma^2/n}} \right)^2 \sim \chi^2_1 \quad \text{and} \quad \left( \frac{b_1 - \beta_1}{\sqrt{\sigma^2/s_{xx}}} \right)^2 \sim \chi^2_1 $$

are also independently distributed because $b_0^*$ and $b_1$ are independently distributed. Consequently, the sum of these two,

$$ \frac{n(b_0^* - \beta_0^*)^2 + s_{xx}(b_1 - \beta_1)^2}{\sigma^2} \sim \chi^2_2 . $$

Since

$$ \frac{SS_{res}}{\sigma^2} \sim \chi^2_{n-2} $$

and $SS_{res}$ is distributed independently of $b_0^*$ and $b_1$, the ratio

$$ \frac{\big[ n(b_0^* - \beta_0^*)^2 + s_{xx}(b_1 - \beta_1)^2 \big] / (2\sigma^2)}{SS_{res} / \big[(n-2)\sigma^2\big]} \sim F_{2,\, n-2} . $$

Substituting $b_0^* = b_0 + b_1 \bar{x}$ and $\beta_0^* = \beta_0 + \beta_1 \bar{x}$, we get

$$ \frac{n-2}{2} \cdot \frac{Q_f}{SS_{res}} \sim F_{2,\, n-2} $$

where

$$ Q_f = n(b_0 - \beta_0)^2 + 2\left( \sum_{i=1}^{n} x_i \right)(b_0 - \beta_0)(b_1 - \beta_1) + \left( \sum_{i=1}^{n} x_i^2 \right)(b_1 - \beta_1)^2 . $$

Since

$$ P\left[ \frac{n-2}{2} \cdot \frac{Q_f}{SS_{res}} \leq F_{2,\, n-2;\, 1-\alpha} \right] = 1 - \alpha $$

holds true for all values of $\beta_0$ and $\beta_1$, the 100$(1-\alpha)$% confidence region for $\beta_0$ and $\beta_1$ is

$$ \frac{n-2}{2} \cdot \frac{Q_f}{SS_{res}} \leq F_{2,\, n-2;\, 1-\alpha} . $$

This confidence region is an ellipse which gives 100$(1-\alpha)$% probability that $\beta_0$ and $\beta_1$ are contained simultaneously in this ellipse.


Analysis of variance:

The technique of analysis of variance is usually used for testing hypotheses related to the equality of more than one parameter, like population means or slope parameters. It is more meaningful in the case of the multiple regression model, where there is more than one slope parameter. The technique is discussed and illustrated here to explain the related basic concepts and fundamentals, which will be used in developing the analysis of variance in the next module on the multiple linear regression model, where the explanatory variables are more than two.

A test statistic for testing $H_0: \beta_1 = 0$ can also be formulated using the analysis of variance technique as follows.

On the basis of the identity

$$ y_i - \hat{y}_i = (y_i - \bar{y}) - (\hat{y}_i - \bar{y}) , $$

the sum of squared residuals is

$$ S(b) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - \bar{y})^2 + \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 - 2\sum_{i=1}^{n} (y_i - \bar{y})(\hat{y}_i - \bar{y}) . $$

Further, consider

$$ \sum_{i=1}^{n} (y_i - \bar{y})(\hat{y}_i - \bar{y}) = b_1 \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = b_1^2 \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 . $$

Thus we have

$$ \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 . $$

The term $\sum_{i=1}^{n} (y_i - \bar{y})^2$ is called the sum of squares about the mean, the corrected sum of squares of $y$ (i.e., $SS_{corrected}$), the total sum of squares, or $s_{yy}$.


The term $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$ describes the deviation of each observation from its predicted value, viz., the residual sum of squares,

$$ SS_{res} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 , $$

whereas the term $\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$ describes the variability explained by the regression,

$$ SS_{reg} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 . $$

If all observations $y_i$ are located on a straight line, then $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = 0$ and thus $SS_{corrected} = SS_{reg}$.

Note that $SS_{reg}$ is completely determined by $b_1$ and so has only one degree of freedom. The total sum of squares $s_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2$ has $(n-1)$ degrees of freedom due to the constraint $\sum_{i=1}^{n} (y_i - \bar{y}) = 0$, and $SS_{res}$ has $(n-2)$ degrees of freedom as it depends on the determination of $b_0$ and $b_1$.

All sums of squares are mutually independent and distributed as $\sigma^2 \chi^2_{df}$ with $df$ degrees of freedom if the errors are normally distributed.

The mean square due to regression is

$$ MS_{reg} = \frac{SS_{reg}}{1} $$

and the mean square due to residuals is

$$ MSE = \frac{SS_{res}}{n-2} . $$

The test statistic for testing $H_0: \beta_1 = 0$ is

$$ F_0 = \frac{MS_{reg}}{MSE} . $$

If $H_0: \beta_1 = 0$ is true, then $MS_{reg}$ and $MSE$ are independently distributed and thus

$$ F_0 \sim F_{1,\, n-2} . $$

The decision rule for $H_1: \beta_1 \neq 0$ is to reject $H_0$ if

$$ F_0 > F_{1,\, n-2;\, 1-\alpha} $$

at the $\alpha$ level of significance. The test procedure can be described in an analysis of variance table.

Analysis of variance for testing $H_0: \beta_1 = 0$

Source of variation    Sum of squares    Degrees of freedom    Mean square    F
Regression             $SS_{reg}$        $1$                   $MS_{reg}$     $MS_{reg}/MSE$
Residual               $SS_{res}$        $n-2$                 $MSE$
Total                  $s_{yy}$          $n-1$
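A minimal sketch of this ANOVA decomposition and $F$-test, assuming the quantities computed in the earlier snippets:

```python
SS_reg = np.sum((y_hat - y_bar) ** 2)       # regression sum of squares
MS_reg = SS_reg / 1
MSE = SS_res / (n - 2)
F0 = MS_reg / MSE
p_value = 1 - stats.f.cdf(F0, 1, n - 2)     # small p-value rejects H0: beta1 = 0
```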

Some other forms of $SS_{reg}$, $SS_{res}$ and $s_{yy}$ can be derived as follows.

The sample correlation coefficient may be written as

$$ r_{xy} = \frac{s_{xy}}{\sqrt{s_{xx}\, s_{yy}}} . $$

Moreover, we have

$$ b_1 = \frac{s_{xy}}{s_{xx}} = r_{xy}\sqrt{\frac{s_{yy}}{s_{xx}}} . $$

The estimator of $\sigma^2$ in this case may be expressed as

$$ s^2 = \frac{1}{n-2}\sum_{i=1}^{n} e_i^2 = \frac{1}{n-2}\, SS_{res} . $$

Various alternative formulations for $SS_{res}$ are in use as well:

$$
\begin{aligned}
SS_{res} &= \sum_{i=1}^{n} \big[ y_i - (b_0 + b_1 x_i) \big]^2 \\
&= \sum_{i=1}^{n} \big[ (y_i - \bar{y}) - b_1 (x_i - \bar{x}) \big]^2 \\
&= s_{yy} + b_1^2 s_{xx} - 2 b_1 s_{xy} \\
&= s_{yy} - b_1^2 s_{xx} \\
&= s_{yy} - \frac{(s_{xy})^2}{s_{xx}} .
\end{aligned}
$$


Using this result, we find that

$$ SS_{corrected} = s_{yy} $$

and

$$ SS_{reg} = s_{yy} - SS_{res} = \frac{(s_{xy})^2}{s_{xx}} = b_1^2\, s_{xx} = b_1\, s_{xy} . $$

Goodness of fit of regression

It can be noted that a fitted model can be said to be good when the residuals are small. Since $SS_{res}$ is based on residuals, a measure of the quality of a fitted model can be based on $SS_{res}$. When the intercept term is present in the model, a measure of goodness of fit of the model is given by

$$ R^2 = 1 - \frac{SS_{res}}{s_{yy}} = \frac{SS_{reg}}{s_{yy}} . $$

This is known as the coefficient of determination. This measure is based on the idea of how much of the variation in the $y$'s, stated by $s_{yy}$, is explainable by $SS_{reg}$ and how much of the unexplainable part is contained in $SS_{res}$. The ratio $SS_{reg}/s_{yy}$ describes the proportion of variability that is explained by the regression in relation to the total variability of $y$. The ratio $SS_{res}/s_{yy}$ describes the proportion of variability that is not explained by the regression.

It can be seen that

$$ R^2 = r_{xy}^2 $$

where $r_{xy}$ is the simple correlation coefficient between $x$ and $y$. Clearly $0 \leq R^2 \leq 1$, so a value of $R^2$ closer to one indicates a better fit and a value of $R^2$ closer to zero indicates a poor fit.
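With the same assumed quantities as the earlier snippets, the coefficient of determination is a one-liner:

```python
s_yy = np.sum((y - y_bar) ** 2)
R2 = 1 - SS_res / s_yy        # equivalently SS_reg / s_yy, or r_xy ** 2
```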

Prediction of values of study variable


An important use of linear regression modeling is to predict the average and actual values of the study

variable. The term prediction of the value of study variable corresponds to knowing the value of ( ) E y (in

case of average value) and value of y (in case of actual value) for a given value of the explanatory variable.

We consider both cases.


Case 1: Prediction of average value

Under the linear regression model $y = \beta_0 + \beta_1 x + \varepsilon$, the fitted model is $\hat{y} = b_0 + b_1 x$, where $b_0$ and $b_1$ are the OLS estimators of $\beta_0$ and $\beta_1$ respectively.

Suppose we want to predict the value of $E(y)$ for a given value of $x = x_0$. Then the predictor is given by

$$ \hat{\mu}_{y|x_0} = \widehat{E(y \mid x_0)} = b_0 + b_1 x_0 . $$

Prediction bias:

The prediction error is given by

$$ \hat{\mu}_{y|x_0} - E(y) = b_0 + b_1 x_0 - E(\beta_0 + \beta_1 x_0 + \varepsilon) = (b_0 - \beta_0) + (b_1 - \beta_1)\, x_0 . $$

Then

$$ E\big( \hat{\mu}_{y|x_0} - E(y) \big) = E(b_0 - \beta_0) + E(b_1 - \beta_1)\, x_0 = 0 . $$

Thus the predictor $\hat{\mu}_{y|x_0}$ is an unbiased predictor of $E(y)$.

Predictive variance:

The predictive variance of $\hat{\mu}_{y|x_0}$ is

$$
\begin{aligned}
PV(\hat{\mu}_{y|x_0}) &= \operatorname{Var}(b_0 + b_1 x_0) = \operatorname{Var}\big( \bar{y} + b_1 (x_0 - \bar{x}) \big) \\
&= \operatorname{Var}(\bar{y}) + (x_0 - \bar{x})^2 \operatorname{Var}(b_1) + 2 (x_0 - \bar{x})\operatorname{Cov}(\bar{y}, b_1) \\
&= \frac{\sigma^2}{n} + \frac{(x_0 - \bar{x})^2 \sigma^2}{s_{xx}} \\
&= \sigma^2 \left[ \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}} \right].
\end{aligned}
$$

Estimate of predictive variance:

The predictive variance can be estimated by substituting $\sigma^2$ by $\hat{\sigma}^2 = MSE$ as

$$ \widehat{PV}(\hat{\mu}_{y|x_0}) = \hat{\sigma}^2 \left[ \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}} \right] = MSE \left[ \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}} \right]. $$


Prediction interval estimation:

The 100$(1-\alpha)$% prediction interval for $E(y \mid x_0)$ is obtained as follows.

The predictor $\hat{\mu}_{y|x_0}$ is a linear combination of normally distributed random variables, so it is also normally distributed:

$$ \hat{\mu}_{y|x_0} \sim N\big( \beta_0 + \beta_1 x_0,\ PV(\hat{\mu}_{y|x_0}) \big). $$

So, if $\sigma^2$ is known, then the distribution of

$$ \frac{\hat{\mu}_{y|x_0} - E(y \mid x_0)}{\sqrt{PV(\hat{\mu}_{y|x_0})}} $$

is $N(0, 1)$. So the 100$(1-\alpha)$% prediction interval is obtained from

$$ P\left( -z_{\alpha/2} \leq \frac{\hat{\mu}_{y|x_0} - E(y \mid x_0)}{\sqrt{PV(\hat{\mu}_{y|x_0})}} \leq z_{\alpha/2} \right) = 1 - \alpha , $$

which gives the prediction interval for $E(y \mid x_0)$ as

$$ \left[ \hat{\mu}_{y|x_0} - z_{\alpha/2}\sqrt{\sigma^2\left( \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}} \right)},\ \ \hat{\mu}_{y|x_0} + z_{\alpha/2}\sqrt{\sigma^2\left( \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}} \right)} \right]. $$

When $\sigma^2$ is unknown, it is replaced by $\hat{\sigma}^2 = MSE$, and in this case the sampling distribution of

$$ \frac{\hat{\mu}_{y|x_0} - E(y \mid x_0)}{\sqrt{MSE\left( \dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{s_{xx}} \right)}} $$

is the $t$-distribution with $(n-2)$ degrees of freedom, i.e., $t_{n-2}$.

The 100$(1-\alpha)$% prediction interval in this case is obtained from

$$ P\left( -t_{\alpha/2,\, n-2} \leq \frac{\hat{\mu}_{y|x_0} - E(y \mid x_0)}{\sqrt{MSE\left( \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}} \right)}} \leq t_{\alpha/2,\, n-2} \right) = 1 - \alpha , $$

which gives the prediction interval as

$$ \left[ \hat{\mu}_{y|x_0} - t_{\alpha/2,\, n-2}\sqrt{MSE\left( \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}} \right)},\ \ \hat{\mu}_{y|x_0} + t_{\alpha/2,\, n-2}\sqrt{MSE\left( \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}} \right)} \right]. $$
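A sketch of this interval for the mean response at a hypothetical point x0 (names follow the earlier snippets):

```python
x0 = 2.5                                                    # hypothetical new x value
mu_hat = b0 + b1 * x0
se_mean = np.sqrt(MSE * (1 / n + (x0 - x_bar) ** 2 / s_xx))
ci_mean = (mu_hat - t_crit * se_mean, mu_hat + t_crit * se_mean)
```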


Note that the width of the prediction interval for $E(y \mid x_0)$ is a function of $x_0$. The interval width is minimum for $x_0 = \bar{x}$ and widens as $|x_0 - \bar{x}|$ increases. This is also expected, as the best estimates of $y$ are made at $x$-values near the center of the data, and the precision of estimation deteriorates as we move toward the boundary of the $x$-space.

Case 2: Prediction of actual value

If $x_0$ is the value of the explanatory variable, then the actual value predictor for $y$ is

$$ \hat{y}_0 = b_0 + b_1 x_0 . $$

The true value of $y$ in the prediction period is given by $y_0 = \beta_0 + \beta_1 x_0 + \varepsilon_0$, where $\varepsilon_0$ indicates the value that would be drawn from the distribution of the random error in the prediction period. Note that the form of the predictor is the same as that of the average value predictor, but its predictive error and other properties are different. This is the dual nature of the predictor.

Predictive bias:

The prediction error of $\hat{y}_0$ is given by

$$ \hat{y}_0 - y_0 = b_0 + b_1 x_0 - (\beta_0 + \beta_1 x_0 + \varepsilon_0) = (b_0 - \beta_0) + (b_1 - \beta_1)\, x_0 - \varepsilon_0 . $$

Thus, we find that

$$ E(\hat{y}_0 - y_0) = E(b_0 - \beta_0) + E(b_1 - \beta_1)\, x_0 - E(\varepsilon_0) = 0 , $$

which implies that $\hat{y}_0$ is an unbiased predictor of $y_0$.


Predictive variance:

Because the future observation $y_0$ is independent of $\hat{y}_0$, the predictive variance of $\hat{y}_0$ is

$$
\begin{aligned}
PV(\hat{y}_0) &= E(\hat{y}_0 - y_0)^2 \\
&= E\big[ (b_0 - \beta_0) + (b_1 - \beta_1)\, x_0 - \varepsilon_0 \big]^2 \\
&= \operatorname{Var}(b_0) + x_0^2\operatorname{Var}(b_1) + 2 x_0 \operatorname{Cov}(b_0, b_1) + \operatorname{Var}(\varepsilon_0) \\
&\qquad \text{(the remaining terms are 0, assuming the independence of } \varepsilon_0 \text{ with } \varepsilon_1, \ldots, \varepsilon_n) \\
&= \sigma^2\left( \frac{1}{n} + \frac{\bar{x}^2}{s_{xx}} \right) + \frac{x_0^2\, \sigma^2}{s_{xx}} - \frac{2 x_0 \bar{x}\, \sigma^2}{s_{xx}} + \sigma^2 \\
&= \sigma^2 \left[ 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}} \right].
\end{aligned}
$$

Estimate of predictive variance:

The estimate of the predictive variance can be obtained by replacing $\sigma^2$ by its estimate $\hat{\sigma}^2 = MSE$ as

$$ \widehat{PV}(\hat{y}_0) = \hat{\sigma}^2\left[ 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}} \right] = MSE\left[ 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}} \right]. $$

Prediction interval:

If $\sigma^2$ is known, then the distribution of

$$ \frac{\hat{y}_0 - y_0}{\sqrt{PV(\hat{y}_0)}} $$

is $N(0, 1)$. So the 100$(1-\alpha)$% prediction interval is obtained from

$$ P\left( -z_{\alpha/2} \leq \frac{\hat{y}_0 - y_0}{\sqrt{PV(\hat{y}_0)}} \leq z_{\alpha/2} \right) = 1 - \alpha , $$

which gives the prediction interval for $y_0$ as

$$ \left[ \hat{y}_0 - z_{\alpha/2}\sqrt{\sigma^2\left( 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}} \right)},\ \ \hat{y}_0 + z_{\alpha/2}\sqrt{\sigma^2\left( 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}} \right)} \right]. $$


When $\sigma^2$ is unknown, then

$$ \frac{\hat{y}_0 - y_0}{\sqrt{\widehat{PV}(\hat{y}_0)}} $$

follows a $t$-distribution with $(n-2)$ degrees of freedom. The 100$(1-\alpha)$% prediction interval for $y_0$ in this case is obtained from

$$ P\left( -t_{\alpha/2,\, n-2} \leq \frac{\hat{y}_0 - y_0}{\sqrt{\widehat{PV}(\hat{y}_0)}} \leq t_{\alpha/2,\, n-2} \right) = 1 - \alpha , $$

which gives the prediction interval

$$ \left[ \hat{y}_0 - t_{\alpha/2,\, n-2}\sqrt{MSE\left( 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}} \right)},\ \ \hat{y}_0 + t_{\alpha/2,\, n-2}\sqrt{MSE\left( 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}} \right)} \right]. $$

The prediction interval is of minimum width at $x_0 = \bar{x}$ and widens as $|x_0 - \bar{x}|$ increases.

The prediction interval for $\hat{y}_0$ is wider than the prediction interval for $\hat{\mu}_{y|x_0}$ because the prediction interval for $\hat{y}_0$ depends on both the error from the fitted model and the error associated with the future observation.
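The corresponding interval for a new observation at the same hypothetical x0 differs only by the extra 1 inside the variance factor (a continuation of the earlier sketch):

```python
se_new = np.sqrt(MSE * (1 + 1 / n + (x0 - x_bar) ** 2 / s_xx))
pi_new = (mu_hat - t_crit * se_new, mu_hat + t_crit * se_new)   # wider than ci_mean
```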

Reverse regression method


The reverse (or inverse) regression approach minimizes the sum of squares of horizontal distances between
the observed data points and the line in the following scatter diagram to obtain the estimates of regression
parameters.
[Figure: Reverse regression — the horizontal distances between the observed points $(x_i, y_i)$ and the line $Y = \beta_0 + \beta_1 X$ are minimized.]


The reverse regression has been advocated in the analysis of gender (or race) discrimination in salaries. For

example, if y denotes salary and x denotes qualifications, and we are interested in determining if there is

gender discrimination in salaries, we can ask:

“Whether men and women with the same qualifications (value of x) are getting the same salaries

(value of y). This question is answered by the direct regression.”

Alternatively, we can ask:

“Whether men and women with the same salaries (value of y) have the same qualifications (value of

x). This question is answered by the reverse regression, i.e., regression of x on y.”

The regression equation in the case of reverse regression can be written as

$$ x_i = \beta_0^* + \beta_1^* y_i + \delta_i \quad (i = 1, 2, \ldots, n) $$

where the $\delta_i$'s are the associated random error components and satisfy the assumptions as in the case of the usual simple linear regression model. The reverse regression estimates $\hat{\beta}_{0R}$ of $\beta_0^*$ and $\hat{\beta}_{1R}$ of $\beta_1^*$ for this model are obtained by interchanging the $x$ and $y$ in the direct regression estimators of $\beta_0$ and $\beta_1$. The estimates are obtained as

$$ \hat{\beta}_{0R} = \bar{x} - \hat{\beta}_{1R}\, \bar{y} $$

and

$$ \hat{\beta}_{1R} = \frac{s_{xy}}{s_{yy}} $$

for $\beta_0^*$ and $\beta_1^*$ respectively. The residual sum of squares in this case is

$$ SS_{res}^* = s_{xx} - \frac{s_{xy}^2}{s_{yy}} . $$

Note that

$$ \hat{\beta}_{1R}\, b_1 = \frac{s_{xy}^2}{s_{xx}\, s_{yy}} = r_{xy}^2 $$

where $b_1$ is the direct regression estimator of the slope parameter and $r_{xy}$ is the correlation coefficient between $x$ and $y$. Hence if $r_{xy}$ is close to 1, the two regression lines will be close to each other.

An important application of the reverse regression method is in solving the calibration problem.
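For illustration, a sketch of the reverse-regression slope obtained by interchanging the roles of x and y in the earlier snippet; the relation to $r_{xy}^2$ can be checked numerically:

```python
b1_reverse = s_xy / s_yy          # slope from regressing x on y
r2_check = b1 * b1_reverse        # equals the squared correlation r_xy ** 2
```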


Orthogonal regression method (or major axis regression method)


The direct and reverse regression methods of estimation assume that the errors in the observations are either

in x -direction or y -direction. In other words, the errors can be either in the dependent variable or

independent variable. There can be situations when uncertainties are involved in both the dependent and the independent variables. In such situations, orthogonal regression is more appropriate. In order to take care of

errors in both the directions, the least-squares principle in orthogonal regression minimizes the squared

perpendicular distance between the observed data points and the line in the following scatter diagram to

obtain the estimates of regression coefficients. This is also known as the major axis regression method.

The estimates obtained are called orthogonal regression estimates or major axis regression estimates of

regression coefficients.

[Figure: Orthogonal or major axis regression — the perpendicular distances between the observed points $(x_i, y_i)$ and the line $Y = \beta_0 + \beta_1 X$ are minimized.]

If we assume that the regression line to be fitted is $Y_i = \beta_0 + \beta_1 X_i$, then it is expected that all the observations $(x_i, y_i),\ i = 1, 2, \ldots, n$, lie on this line. But these points deviate from the line, and in such a case, the squared perpendicular distance of the observed data point $(x_i, y_i)\ (i = 1, 2, \ldots, n)$ from the line is given by

$$ d_i^2 = (X_i - x_i)^2 + (Y_i - y_i)^2 $$

where $(X_i, Y_i)$ denotes the $i$-th pair of observations without any error which lies on the line.


The objective is to minimize the sum of squared perpendicular distances $\sum_{i=1}^{n} d_i^2$ to obtain the estimates of $\beta_0$ and $\beta_1$. The observations $(x_i, y_i)\ (i = 1, 2, \ldots, n)$ are expected to lie on the line

$$ Y_i = \beta_0 + \beta_1 X_i , $$

so let

$$ E_i = Y_i - \beta_0 - \beta_1 X_i = 0 . $$

The regression coefficients are obtained by minimizing $\sum_{i=1}^{n} d_i^2$ under the constraints $E_i$ using the Lagrangian multiplier method. The Lagrangian function is

$$ L_0 = \sum_{i=1}^{n} d_i^2 - 2\sum_{i=1}^{n} \lambda_i E_i $$

where $\lambda_1, \ldots, \lambda_n$ are the Lagrangian multipliers. The set of equations is obtained by setting

$$ \frac{\partial L_0}{\partial X_i} = 0, \quad \frac{\partial L_0}{\partial Y_i} = 0, \quad \frac{\partial L_0}{\partial \beta_0} = 0 \quad \text{and} \quad \frac{\partial L_0}{\partial \beta_1} = 0 \quad (i = 1, 2, \ldots, n). $$

Thus we find (after dividing each equation by 2)

$$
\begin{aligned}
\frac{\partial L_0}{\partial X_i} &: \ (X_i - x_i) + \lambda_i \beta_1 = 0 \\
\frac{\partial L_0}{\partial Y_i} &: \ (Y_i - y_i) - \lambda_i = 0 \\
\frac{\partial L_0}{\partial \beta_0} &: \ \sum_{i=1}^{n} \lambda_i = 0 \\
\frac{\partial L_0}{\partial \beta_1} &: \ \sum_{i=1}^{n} \lambda_i X_i = 0 .
\end{aligned}
$$

Since

$$ X_i = x_i - \lambda_i \beta_1 \quad \text{and} \quad Y_i = y_i + \lambda_i , $$

substituting these values in $E_i$, we obtain

$$ E_i = y_i + \lambda_i - \beta_0 - \beta_1 (x_i - \lambda_i \beta_1) = 0 $$
$$ \Rightarrow\ \lambda_i = \frac{\beta_0 + \beta_1 x_i - y_i}{1 + \beta_1^2} . $$


Also, using this $\lambda_i$ in the equation $\sum_{i=1}^{n} \lambda_i = 0$, we get

$$ \sum_{i=1}^{n} \frac{\beta_0 + \beta_1 x_i - y_i}{1 + \beta_1^2} = 0 , $$

and using $X_i = x_i - \lambda_i \beta_1$ and $\sum_{i=1}^{n} \lambda_i X_i = 0$, we get

$$ \sum_{i=1}^{n} \lambda_i (x_i - \lambda_i \beta_1) = 0 . $$

Substituting $\lambda_i$ in this equation, we get

$$ \frac{\sum_{i=1}^{n} (\beta_0 + \beta_1 x_i - y_i)\, x_i}{1 + \beta_1^2} - \frac{\beta_1 \sum_{i=1}^{n} (\beta_0 + \beta_1 x_i - y_i)^2}{(1 + \beta_1^2)^2} = 0 . \qquad (1) $$

Solving the first of these equations provides an orthogonal regression estimate of $\beta_0$ as

$$ \hat{\beta}_{0OR} = \bar{y} - \hat{\beta}_{1OR}\, \bar{x} $$

where $\hat{\beta}_{1OR}$ is the orthogonal regression estimate of $\beta_1$.

Now, substituting $\hat{\beta}_{0OR}$ in equation (1), we get

$$ (1 + \beta_1^2)\sum_{i=1}^{n} \big( \bar{y} - \beta_1 \bar{x} + \beta_1 x_i - y_i \big)\, x_i - \beta_1 \sum_{i=1}^{n} \big( \bar{y} - \beta_1 \bar{x} + \beta_1 x_i - y_i \big)^2 = 0 $$

or

$$ (1 + \beta_1^2)\sum_{i=1}^{n} \big[ \beta_1 (x_i - \bar{x}) - (y_i - \bar{y}) \big]\, x_i - \beta_1 \sum_{i=1}^{n} \big[ \beta_1 (x_i - \bar{x}) - (y_i - \bar{y}) \big]^2 = 0 $$

or

$$ (1 + \beta_1^2)\sum_{i=1}^{n} (\beta_1 u_i - v_i)(u_i + \bar{x}) - \beta_1 \sum_{i=1}^{n} (\beta_1 u_i - v_i)^2 = 0 $$

where

$$ u_i = x_i - \bar{x}, \qquad v_i = y_i - \bar{y} . $$


Since $\sum_{i=1}^{n} u_i = 0$ and $\sum_{i=1}^{n} v_i = 0$, this reduces to

$$ \sum_{i=1}^{n} \big[ \beta_1^2\, u_i v_i + \beta_1 (u_i^2 - v_i^2) - u_i v_i \big] = 0 $$

or

$$ \beta_1^2\, s_{xy} + \beta_1 (s_{xx} - s_{yy}) - s_{xy} = 0 . $$

Solving this quadratic equation provides the orthogonal regression estimate of $\beta_1$ as

$$ \hat{\beta}_{1OR} = \frac{(s_{yy} - s_{xx}) + \operatorname{sign}(s_{xy})\sqrt{(s_{yy} - s_{xx})^2 + 4\, s_{xy}^2}}{2\, s_{xy}} $$

where $\operatorname{sign}(s_{xy})$ denotes the sign of $s_{xy}$, which can be positive or negative. So

$$ \operatorname{sign}(s_{xy}) = \begin{cases} +1 & \text{if } s_{xy} > 0 \\ -1 & \text{if } s_{xy} < 0 . \end{cases} $$

Notice that the quadratic gives two solutions for $\hat{\beta}_{1OR}$. We choose the solution which minimizes $\sum_{i=1}^{n} d_i^2$. The other solution maximizes $\sum_{i=1}^{n} d_i^2$ and is in the direction perpendicular to the optimal solution. The optimal solution can be chosen with the sign of $s_{xy}$.
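A compact sketch of this closed-form orthogonal (major axis) slope, reusing the illustrative $s_{xx}$, $s_{yy}$ and $s_{xy}$ computed earlier:

```python
disc = np.sqrt((s_yy - s_xx) ** 2 + 4 * s_xy ** 2)
b1_or = ((s_yy - s_xx) + np.sign(s_xy) * disc) / (2 * s_xy)   # orthogonal slope
b0_or = y_bar - b1_or * x_bar                                 # orthogonal intercept
```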


Reduced major axis regression method:


The direct, reverse and orthogonal methods of estimation minimize the errors in a particular direction, which is usually the distance between the observed data points and the line in the scatter diagram. Alternatively, one can consider the area extended by the data points in a certain neighbourhood, and instead of distances, the areas of the rectangles defined between the corresponding observed data points and the nearest points on the line in the following scatter diagram can be minimized. Such an approach is more appropriate when uncertainties are present in both the study and explanatory variables. This approach is termed reduced major axis regression.

[Figure: Reduced major axis method — the areas of the rectangles between the observed points $(x_i, y_i)$ and the line $Y = \beta_0 + \beta_1 X$ are minimized.]

Suppose the regression line is $Y_i = \beta_0 + \beta_1 X_i$, on which all the observed points are expected to lie. Suppose the points $(x_i, y_i),\ i = 1, 2, \ldots, n$, are observed which lie away from the line. The area of the rectangle extended between the $i$-th observed data point and the line is

$$ A_i = (X_i - x_i)(Y_i - y_i) \quad (i = 1, 2, \ldots, n) $$

where $(X_i, Y_i)$ denotes the $i$-th pair of observations without any error which lies on the line.

The total area extended by the $n$ data points is

$$ \sum_{i=1}^{n} A_i = \sum_{i=1}^{n} (X_i - x_i)(Y_i - y_i) . $$


All observed data points $(x_i, y_i)\ (i = 1, 2, \ldots, n)$ are expected to lie on the line

$$ Y_i = \beta_0 + \beta_1 X_i , $$

so let

$$ E_i^* = Y_i - \beta_0 - \beta_1 X_i = 0 . $$

So now the objective is to minimize the sum of areas under the constraints $E_i^*$ to obtain the reduced major axis estimates of the regression coefficients. Using the Lagrangian multiplier method, the Lagrangian function is

$$ L_R = \sum_{i=1}^{n} A_i - \sum_{i=1}^{n} \mu_i E_i^* = \sum_{i=1}^{n} (X_i - x_i)(Y_i - y_i) - \sum_{i=1}^{n} \mu_i E_i^* $$

where $\mu_1, \ldots, \mu_n$ are the Lagrangian multipliers. The set of equations is obtained by setting

$$ \frac{\partial L_R}{\partial X_i} = 0, \quad \frac{\partial L_R}{\partial Y_i} = 0, \quad \frac{\partial L_R}{\partial \beta_0} = 0, \quad \frac{\partial L_R}{\partial \beta_1} = 0 \quad (i = 1, 2, \ldots, n). $$

Thus

$$
\begin{aligned}
\frac{\partial L_R}{\partial X_i} &: \ (Y_i - y_i) + \mu_i \beta_1 = 0 \\
\frac{\partial L_R}{\partial Y_i} &: \ (X_i - x_i) - \mu_i = 0 \\
\frac{\partial L_R}{\partial \beta_0} &: \ \sum_{i=1}^{n} \mu_i = 0 \\
\frac{\partial L_R}{\partial \beta_1} &: \ \sum_{i=1}^{n} \mu_i X_i = 0 .
\end{aligned}
$$

Now

$$ X_i = x_i + \mu_i \quad \text{and} \quad Y_i = y_i - \mu_i \beta_1 , $$

and substituting these in $E_i^*$ gives

$$ E_i^* = y_i - \mu_i \beta_1 - \beta_0 - \beta_1 (x_i + \mu_i) = 0 \ \Rightarrow\ \mu_i = \frac{y_i - \beta_0 - \beta_1 x_i}{2\beta_1} . $$


Substituting $\mu_i$ in $\sum_{i=1}^{n} \mu_i = 0$, the reduced major axis regression estimate of $\beta_0$ is obtained as

$$ \hat{\beta}_{0RM} = \bar{y} - \hat{\beta}_{1RM}\, \bar{x} $$

where $\hat{\beta}_{1RM}$ is the reduced major axis regression estimate of $\beta_1$. Using $X_i = x_i + \mu_i$, $\mu_i$ and $\hat{\beta}_{0RM}$ in $\sum_{i=1}^{n} \mu_i X_i = 0$, we get

$$ \sum_{i=1}^{n} \frac{y_i - \bar{y} - \beta_1 (x_i - \bar{x})}{2\beta_1}\left( x_i + \frac{y_i - \bar{y} - \beta_1 (x_i - \bar{x})}{2\beta_1} \right) = 0 . $$

Let $u_i = x_i - \bar{x}$ and $v_i = y_i - \bar{y}$; then this equation can be re-expressed as

$$ \sum_{i=1}^{n} (v_i - \beta_1 u_i)\big[ 2\beta_1 (u_i + \bar{x}) + v_i - \beta_1 u_i \big] = 0 . $$

Using $\sum_{i=1}^{n} u_i = 0$ and $\sum_{i=1}^{n} v_i = 0$, we get

$$ \sum_{i=1}^{n} v_i^2 - \beta_1^2 \sum_{i=1}^{n} u_i^2 = 0 . $$

Solving this equation, the reduced major axis regression estimate of $\beta_1$ is obtained as

$$ \hat{\beta}_{1RM} = \operatorname{sign}(s_{xy})\sqrt{\frac{s_{yy}}{s_{xx}}} $$

where

$$ \operatorname{sign}(s_{xy}) = \begin{cases} +1 & \text{if } s_{xy} > 0 \\ -1 & \text{if } s_{xy} < 0 . \end{cases} $$

We choose the regression estimator which has the same sign as $s_{xy}$.
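The reduced major axis slope has an equally simple closed form; a one-line sketch with the quantities assumed earlier:

```python
b1_rm = np.sign(s_xy) * np.sqrt(s_yy / s_xx)   # reduced major axis slope
b0_rm = y_bar - b1_rm * x_bar
```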

Least absolute deviation regression method


The least-squares principle advocates the minimization of the sum of squared errors. The idea of squaring the errors is useful in place of simple errors because random errors can be positive as well as negative; consequently their sum can be close to zero, indicating that there is no error in the model, which can be misleading. Instead of the sum of random errors, the sum of absolute random errors can be considered, which avoids the problem caused by positive and negative random errors cancelling.


In the method of least squares, the estimates of the parameters $\beta_0$ and $\beta_1$ in the model $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i\ (i = 1, 2, \ldots, n)$ are chosen such that the sum of squared deviations $\sum_{i=1}^{n} \varepsilon_i^2$ is minimum. In the method of least absolute deviation (LAD) regression, the parameters $\beta_0$ and $\beta_1$ are estimated such that the sum of absolute deviations $\sum_{i=1}^{n} |\varepsilon_i|$ is minimum. It minimizes the absolute vertical sum of errors, as in the following scatter diagram:

[Figure: Least absolute deviation regression — the absolute vertical distances between the observed points $(x_i, y_i)$ and the line $Y = \beta_0 + \beta_1 X$ are minimized.]

The LAD estimates $\hat{\beta}_{0L}$ and $\hat{\beta}_{1L}$ are the estimates of $\beta_0$ and $\beta_1$, respectively, which minimize

$$ LAD(\beta_0, \beta_1) = \sum_{i=1}^{n} \big| y_i - \beta_0 - \beta_1 x_i \big| $$

for the given observations $(x_i, y_i)\ (i = 1, 2, \ldots, n)$.

Conceptually, the LAD procedure is more straightforward than the OLS procedure, because $|e|$ (absolute residuals) is a more straightforward measure of the size of the residual than $e^2$ (squared residuals). The LAD regression estimates of $\beta_0$ and $\beta_1$ are not available in closed form. Instead, they can be obtained numerically using algorithms. Moreover, this creates the problems of non-uniqueness and degeneracy in the estimates. The concept of non-uniqueness relates to more than one best line passing through a data point. The degeneracy concept describes the situation in which the best line through a data point also passes through more than one other data point. The non-uniqueness and degeneracy concepts are used in algorithms to judge the


quality of the estimates. The algorithm for finding the estimators generally proceeds in steps. At each step, the best line is found that passes through a given data point. The best line always passes through another data point, and this data point is used in the next step. When there is non-uniqueness, there is more than one best line. When there is degeneracy, the best line passes through more than one other data point. When either of these problems is present, there is more than one choice for the data point to be used in the next step, and the algorithm may go around in circles or make a wrong choice of the LAD regression line.

Exact tests of hypotheses and confidence intervals for the LAD regression estimates cannot be derived analytically. Instead, they are derived analogously to the tests of hypotheses and confidence intervals related to the ordinary least squares estimates.
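Since the LAD estimates have no closed form, one practical route (a generic numerical sketch, not the stepwise algorithm described above) is to minimize the sum of absolute deviations directly, using the data arrays assumed in the earlier snippets:

```python
from scipy.optimize import minimize

def lad_loss(params, x, y):
    b0_, b1_ = params
    return np.sum(np.abs(y - b0_ - b1_ * x))    # sum of absolute deviations

res = minimize(lad_loss, x0=[0.0, 1.0], args=(x, y), method="Nelder-Mead")
b0_lad, b1_lad = res.x
```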

Estimation of parameters when X is stochastic


In the usual linear regression model, the study variable is supposed to be random and the explanatory variables are assumed to be fixed. In practice, there may be situations in which the explanatory variable is also random.

Suppose both the dependent and independent variables are stochastic in the simple linear regression model

$$ y = \beta_0 + \beta_1 X + \varepsilon $$

where $\varepsilon$ is the associated random error component. The observations $(x_i, y_i),\ i = 1, 2, \ldots, n$, are assumed to be jointly distributed. Then statistical inferences can be drawn in such cases conditionally on $X$.

Assume the joint distribution of $X$ and $y$ to be bivariate normal $N(\mu_x, \mu_y, \sigma_x^2, \sigma_y^2, \rho)$, where $\mu_x$ and $\mu_y$ are the means of $X$ and $y$; $\sigma_x^2$ and $\sigma_y^2$ are the variances of $X$ and $y$; and $\rho$ is the correlation coefficient between $X$ and $y$. Then the conditional distribution of $y$ given $X = x$ is the univariate normal with conditional mean

$$ E(y \mid X = x) = \mu_{y|x} = \beta_0 + \beta_1 x $$

and the conditional variance of $y$ given $X = x$ is

$$ \operatorname{Var}(y \mid X = x) = \sigma^2_{y|x} = \sigma_y^2\, (1 - \rho^2) $$

where

$$ \beta_0 = \mu_y - \beta_1 \mu_x $$

and

$$ \beta_1 = \rho\, \frac{\sigma_y}{\sigma_x} . $$


When both $X$ and $y$ are stochastic, the problem of estimation of parameters can be reformulated as follows. Consider a conditional random variable $y \mid X = x$ having a normal distribution with mean equal to the conditional mean $\mu_{y|x}$ and variance equal to the conditional variance $\operatorname{Var}(y \mid X = x) = \sigma^2_{y|x}$. Obtain $n$ independently distributed observations $y_i \mid x_i,\ i = 1, 2, \ldots, n$, from $N(\mu_{y|x}, \sigma^2_{y|x})$ with nonstochastic $X$. Now the method of maximum likelihood can be used to estimate the parameters, which yields the estimates of $\beta_0$ and $\beta_1$ as earlier in the case of nonstochastic $X$, namely

$$ \tilde{b}_0 = \bar{y} - \tilde{b}_1 \bar{x} \quad \text{and} \quad \tilde{b}_1 = \frac{s_{xy}}{s_{xx}} , $$

respectively.

Moreover, the correlation coefficient

$$ \rho = \frac{E\big[(y - \mu_y)(X - \mu_x)\big]}{\sigma_y\, \sigma_x} $$

can be estimated by the sample correlation coefficient

$$ \hat{\rho} = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\ \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} = \frac{s_{xy}}{\sqrt{s_{xx}\, s_{yy}}} = \tilde{b}_1 \sqrt{\frac{s_{xx}}{s_{yy}}} . $$

Thus

$$ \hat{\rho}^2 = \tilde{b}_1^2\, \frac{s_{xx}}{s_{yy}} = \tilde{b}_1\, \frac{s_{xy}}{s_{yy}} = \frac{s_{yy} - \sum_{i=1}^{n} \hat{e}_i^2}{s_{yy}} = R^2 , $$

which is the same as the coefficient of determination. Thus $R^2$ has the same expression as in the case when $X$ is fixed. Thus $R^2$ again measures the goodness of the fitted model even when $X$ is stochastic.
