You are on page 1of 15

Chapter 8

Heteroskedasticity

In the multiple regression model


y X β +ε,
=
it is assumed that
V (ε ) = σ 2 I ,
i.e.,
Var (ε i2 ) = σ 2 ,
Cov(ε iε j )= 0, i ≠ j = 1, 2,..., n.

In this case, the diagonal elements of covariance matrix of ε are same indicating that the variance of each ε i

is same and off-diagonal elements of covariance matrix of ε are zero indicating that all disturbances are
pairwise uncorrelated. This property of constancy of variance is termed as homoskedasticity and disturbances
are called as homoskedastic disturbances.

In many situations, this assumption may not be plausible and the variances may not remain same. The
disturbances whose variances are not constant across the observations are called heteroskedastic disturbance
and this property is termed as heteroskedasticity. In this case
(ε i ) σ=
Var= 2
i , i 1, 2,..., n
and disturbances are pairwise uncorrelated.

The covariance matrix of disturbances is


 σ 12 0  0 
 
0 σ 22  0 
=V (ε ) diag
= (σ 1 , σ 2 ,..., σ n ) 
2 2 2
.
     
 
 0 0  σ n2 

Econometrics | Chapter 8 | Heteroskedasticity | Shalabh, IIT Kanpur


1
Graphically, the following pictures depict homoskedasticity and heteroskedasticity.

Homoskedasticity

Heteroskedasticity (Var(y) increases with x) Heteroskedasticity (Var(y) decreases with x)

Examples : Suppose in a simple linear regression model, x denote the income and y denotes the expenditure
on food. It is observed that as the income increases, the variation in expenditure on food increases because the
choice and varieties in food increase, in general, upto certain extent. So the variance of observations on y will
not remain constant as income changes. The assumption of homoscedasticity implies that the consumption
pattern of food will remain same irrespective of the income of the person. This may not generally be a
correct assumption in real situations. Rather the consumption pattern changes and hence the variance of y
and so the variances of disturbances will not remain constant. In general, it will be increasing as income
increases.

Econometrics | Chapter 8 | Heteroskedasticity | Shalabh, IIT Kanpur


2
In another example, suppose in a simple linear regression model, x denotes the number of hours of practice
for typing and y denotes the number of typing errors per page. It is expected that the number of typing
mistakes per page decreases as the person practices more. The homoskedastic disturbances assumption
implies that the number of errors per page will remain same irrespective of the number of hours of typing
practice which may not be true is practice.

Possible reasons for heteroskedasticity:


There are various reasons due to which the heteroskedasticity is introduced in the data. Some of them are as
follows:
1. The nature of phenomenon under study may have an increasing or decreasing trend. For example, the
variation in consumption pattern on food increases as income increases, similarly the number of typing
mistakes decreases as the number of hours of typing practice increases.

2. Sometimes the observations are in the form of averages and this introduces the heteroskedasticity in the
model. For example, it is easier to collect data on the expenditure on clothes for the whole family rather
than on a particular family member. Suppose in a simple linear regression model
yij =β 0 + β1 xij + ε ij , i =1, 2,..., n, j =1, 2,..., mi

yij denotes the expenditure on cloth for the j th family having m j members and xij denotes the age of

the i th person in j th family. It is difficult to record data for individual family member but it is easier to

get data for the whole family. So yij ' s are known collectively.

Then instead of per member expenditure, we find the data on average expenditure for each family
member as
mj
1
yi =
mj
∑y
j =1
ij

and the model becomes


yi =β 0 + β1 xi + ε i .

=
It we assume E (ε ij ) 0,=
Var (ε ij ) σ 2 , then

E (ε i ) = 0
σ2
Var (ε i ) =
mj

Econometrics | Chapter 8 | Heteroskedasticity | Shalabh, IIT Kanpur


3
which indicates that the resultant variance of disturbances does not remain constant but depends on the
number of members in a family m j . So heteroskedasticity enters in the data. The variance will remain

constant only when all m j ' s are same.

3. Sometimes the theoretical considerations introduces the heteroskedasticity in the data. For example,
suppose in the simple linear model
yi =β 0 + β1 xi + ε i , i =1, 2,..., n ,

yi denotes the yield of rice and xi denotes the quantity of fertilizer in an agricultural experiment. It is
observed that when the quantity of fertilizer increases, then yield increases. In fact, initially the yield
increases when quantity of fertilizer increases. Gradually, the rate of increase slows down and if
fertilizer is increased further, the crop burns. So notice that β1 changes with different levels of

fertilizer. In such cases, when β1 changes, a possible way is to express it as a random variable with

constant mean β1 and constant variance θ 2 like

β1i =β1 + vi , i =1, 2,..., n


with
= Var (vi ) θ 2=
E (vi ) 0,= , E (ε i vi ) 0.
So the complete model becomes
yi =β 0 + β1i xi + ε i
β1=i β1 + vi
⇒ yi = β 0 + β1 xi + (ε i + xi vi )
=β 0 + β1 xi + wi

where w=
i ε i + xi vi is like a new random error component. So
E ( wi ) = 0
Var ( wi ) = E ( wi2 )
=E (ε i2 ) + xi2 E (vi2 ) + 2 xi E (ε i vi )
=σ 2 + xi2θ 2 + 0
= σ 2 + xi2θ 2 .
So variance depends on i and thus heteroskedasticity is introduced in the model. Note that we assume
homoskedastic disturbances for the model
yi =β 0 + β1 xi + ε i , β1i =β1 + vi
but finally end up with heteroskedastic disturbances. This is due to theoretical considerations.
Econometrics | Chapter 8 | Heteroskedasticity | Shalabh, IIT Kanpur
4
4. The skewness in the distribution of one or more explanatory variables in the model also causes
heteroskedasticity in the model.
5. The incorrect data transformations and incorrect functional form of the model can also give rise to the
heteroskedasticity problem.

Tests for heteroskedasticity


The presence of heteroskedasticity affects the estimation and test of hypothesis. The heteroskedasticity can
enter into the data due to various reasons. The tests for heteroskedasticity assume a specific nature of
heteroskedasticity. Various tests are available in literature for testing the presence of heteroskedasticity, e.g.,
1. Bartlett test
2. Breusch Pagan test
3. Goldfeld Quandt test
4. Glesjer test
5. Test based on Spearman’s rank correlation coefficient
6. White test
7. Ramsey test
8. Harvey Phillips test
9. Szroeter test
10. Peak test (nonparametric) test
We discuss the first five tests.

1. Bartlett’s test
It is a test for testing the null hypothesis
H 0 : σ 12= σ 22= ...= σ i2= ...= σ n2
This hypothesis is termed as the hypothesis of homoskedasticity. This test can be used only when replicated
data is available.

Since in the model


y=
i β1 X i1 + β 2 X i 2 + ... + β k X ik + ε i , E (ε =
i) 0, Var (ε =
i) σ i2 ,=
i 1, 2,..., n,

only one observation yi is available to find σ i2 , so the usual tests can not be applied. This problem can be
overcome if replicated data is available. So consider the model of the form
yi* X i β + ε i*
=

Econometrics | Chapter 8 | Heteroskedasticity | Shalabh, IIT Kanpur


5
where yi* is a mi ×1 vector, X i is mi × k matrix, β is k ×1 vector and ε i* is mi ×1 vector. So replicated data

is now available for every yi* in the following way:

y1* X 1β + ε1* consists of m1 observations


=
y2* X 2 β + ε 2* consists of m2 observations
=

yn* X n β + ε n* consists of mn observations
=

All the individual model can be written as


 y1*   X1   ε1* 
 *    *
 y2 
=  X2  β + ε2 
       
 *     * 
 yn   Xn  εn 
y* X β + ε *
or =

 n   n   n 
where y * is a vector of order  ∑ mi  ×1, X is  ∑ mi  × k matrix, β is k ×1 vector and ε * is  ∑ mi  ×1
=  i 1=
 i1   i =1 
vector. Application of OLS to this model yields
βˆ = ( X ' X ) −1 X ' y *
and obtain the residual vector
e=
*
i yi* − X i βˆ .
Based on this, obtain
1
si2 = ei* ' ei*
mi − k
n

∑ (m − k ) s
i
2
i
s2 = i =1
n
.
∑ (m − k )
i =1
i

Now apply the Bartlett’s test as


1 n s2
=χ2 ∑ i
C i =1
( m − k ) log
si2

which has asymptotic χ 2 − distribution with (n − 1) degrees of freedom where

 
1  n
 1  1 
C=
1+ ∑   − .
3(n − 1)  i =1  mi − k  n
∑ (mi − k ) 
 i =1


Econometrics | Chapter 8 | Heteroskedasticity | Shalabh, IIT Kanpur


6
Another variant of Bartlett’s test
Another variant of Bartlett’s test is based on likelihood ratio test statistic. If there are m independent normal
random samples where there are ni observations in the i th sample. Then the likelihood ratio test statistic for

testing H=
0 σ=
2
1 σ=
2
2 ... σ n2 is
=
ni / 2
m
 s2 
u = ∑  i2 
i =1  s 

where
ni

∑( y − yi ) , i= 1, 2,..., m; j= 1, 2,..., ni
1 2
s=
2
i ij
ni j =1

1 m
s2 = ∑
n i =1
ni si2

m
n = ∑ ni .
i =1

To obtain an unbiased test and a modification of -2 ln u which is a closer approximation to


χ m2 −1 under H 0 , Bartlett test replaces ni by (ni − 1) and divide by a scalar constant. This leads to the statistic
m
(n − m) log σˆ 2 − ∑ (ni − 1) log σˆ i2
M= i =1

1   1  m
1 
1+ ∑  − 
3(m − 1)  i =1  ni − 1  n − m 

which has a χ 2 distribution with (m − 1) degrees of freedom under H 0 and

1 ni
=σˆ i2 ∑ ( yij − yi )2
n − 1 j =1
1 m
=σˆ 2 ∑
n − m i =1
(ni − 1)σˆ i2 .

In experimental sciences, it is easier to get replicated data and this test can be easily applied. In real life
applications, it is difficult to get replicated data and this test may not be applied. This difficulty is overcome in
Breusch Pagan test.

Econometrics | Chapter 8 | Heteroskedasticity | Shalabh, IIT Kanpur


7
2. Breusch Pagan test
This test can be applied when the replicated data is not available but only single observations are available.
When it is suspected that the variance is some function (but not necessarily multiplicative) of more than one
explanatory variable, then Breusch Pagan test can be used.

Assuming that under the alternative hypothesis σ i2 is expressible as

σ i2 h( Z=
= iγ )
'
ˆ h(γ 1 + Z i*γ * )

where h is some unspecified function and is independent of i ,


Z i' = (1, Z i*' )(1, Z i 2 , Z i 3 ,..., Z ip )

=
is the vector of observable explanatory variables with first element unity and γ ′ (γ=
1, γ i )
*
(γ 1 , γ 2 ,..., γ p ) is a

vector of unknown coefficients related to β with first element being the intercept term. The heterogencity is

defined by these p variables. These Z i ' s may also include some X ' s also.

Specifically, assume that


σ i2 = γ 1 + γ 2 Z i 2 + ... + γ p Z ip .
The null hypothesis
H 0 : σ 12= σ 22= ...= σ n2
can be expressed as
H 0 : γ 2= γ 3= ...= γ p= 0 .

If H 0 is accepted , it implies that Z i 2 , Z i 3 ,..., Z ip do not have any effect on σ i2 and we get σ i2 = γ 1 .

The test procedure is as follows:


1. Ignoring heteroskedasticity, apply OLS to
yi = β1 + β 2 X i1 + ... + β k X ik + ε i
and obtain residual
e= y − Xb
b = ( X ' X ) −1 X ' Y .
2. Construct the variables
ei2 nei2
=gi =
 n 2  SS res
 ∑ ei n 
 i =1 

Econometrics | Chapter 8 | Heteroskedasticity | Shalabh, IIT Kanpur


8
where SS res is the residual sum of squares based on ei ' s.
*
3. Run regression of g on Z1 , Z 2 ,..., Z p and get residual sum of squares SS res .

4. For testing, calculate the test statistic


1 n 2 * 
=Q  ∑
2  i =1
gi − SS res 

which is asymptotically distributed as χ 2 distribution with ( p − 1) degrees of freedom.

5. The decision rule is to reject H 0 if Q > χ12−α (m − 1).

• This test is very simple to perform.


• A fairly general form is assumed for heteroskedasticity, so it is a very general test.
• This is an asymptotic test.
• This test is quite powerful in the presence of heteroskedasticity.

3. Goldfeld Quandt test


This test is based on the assumption that σ i2 is positively related to X ij , i.e., one of the explanatory variable

explains the heteroskedasticity in the model. Let j th explanatory variable explains the heteroskedasticity, so

σ i2 ∝ X ij
or σ i2 = σ 2 X ij .

The test procedure is as follows:

1. Rank the observations according to the decreasing order of X j .

2. Split the observations into two equal parts leaving c observations in the middle.
n−c n−c
So each part contains observations provided > k.
2 2
3. Run two separate regression in the two parts using OLS and obtain the residual sum of squares SS res1

and SS res 2 .

SS res 2
4. The test statistic is F0 =
SS res1

 n−c n−c 
which follows a F − distribution, i.e., F  − k, − k  when H 0 true.
 2 2 

Econometrics | Chapter 8 | Heteroskedasticity | Shalabh, IIT Kanpur


9
 n−c n−c 
5. The decision rule is to reject H 0 whenever F0 > F1−α  − k, −k.
 2 2 
• This test is simple test but it is based on the assumption that one of the explanatory variable helps is
determining the heteroskedasticity.
• This test is an exact finite sample test.
• One difficulty in this test is that the choice of c is not obvious. If large value of c is chosen, then it
n−c n−c
reduces the degrees of freedom − k and the condition > k may be violated.
2 2
On the other hand, if smaller value of c is chosen, then the test may fail to reveal the heteroskedasticity. The
basic objective of ordering of observations and deletion of observations in the middle part may not reveal the
heteroskedasticity effect. Since the first and last values of σ i2 give the maximum discretion, so deletion of
smaller value may not give the proper idea of heteroskedasticity. Keeping those two points in view, the
n
working choice of c is suggested as c = .
3

Moreover, the choice of X ij is also difficult. Since σ i2 ∝ X ij , so if all important variables are included in the

model, then it may be difficult to decide that which of the variable is influencing the heteroskedasticity.

4. Glesjer test:
This test is based on the assumption that σ i2 is influenced by one variable Z , i.e., there is only one variable
which is influencing the heteroskedasticity. This variable could be either one of the explanatory variable or it
can be chosen from some extraneous sources also.

The test procedure is as follows:


1. Use OLS and obtain the residual vector e on the basis of available study and explanatory variables.
2. Choose Z and apply OLS to
δ 0 + δ1Z ih + vi
ei =

where vi is the associated disturbance term.

3. Test H 0 : δ1 = 0 using t -ratio test statistic.

1
4. Conduct the test for h =±1, ± . So the test procedure is repeated four times.
2
In practice, one can choose any value of h . For simplicity, we choose h = 1 .
• The test has only asymptotic justification and the four choices of h give generally satisfactory results.
• This test sheds light on the nature of heteroskedasticity.
Econometrics | Chapter 8 | Heteroskedasticity | Shalabh, IIT Kanpur
10
5. Spearman’s rank correlation test
It di denotes the difference in the ranks assigned to two different characteristics of the i th object or

phenomenon and n is the number of objects or phenomenon ranked, then the Spearman’s rank correlation
coefficient is defined as
 n 2 
 ∑ di 
r = 1 − 6  i =12  ; − 1 ≤ r ≤ 1.
 n(n − 1) 
 
 
This can be used for testing the hypothesis about the heteroskedasticity.
Consider the model
yi =β 0 + β1 X i + ε i .
1. Run the regression of y on X and obtain the residuals e .

2. Consider ei .

3. Rank both ei and X i (or yˆi ) in an ascending (or descending) order.

4. Compute rank correlation coefficient r based on ei and X i (or yˆi ) .

5. Assuming that the population rank correlation coefficient is zero and n > 8, use the test statistic

r n−2
t0 =
1− r2
which follows a t -distribution with (n − 2) degrees of freedom.
6. The decision rule is to reject the null hypothesis of heteroskedasticity whenever t0 ≥ t1−α (n − 2).
If there are more than one explanatory variables, then rank correlation coefficient can be computed
between ei and each of the explanatory variables separately and can be tested using t0 .

Econometrics | Chapter 8 | Heteroskedasticity | Shalabh, IIT Kanpur


11
Estimation under heteroskedasticity
Consider the model
y Xβ +ε
=
with k explanatory variables and assume that
E (ε ) = 0
 σ 12 0  0 
 
0 σ 22  0 
V (ε ) = Ω =  .
     
 
 0 0  σ n2 

The OLSE is
b = ( X ' X ) −1 X ' y.
Its estimation error is
b−β =
( X ' X ) −1 X ' ε
and
E (b − β ) ( X ' X
= = ) −1 X ' E (ε ) 0.
Thus OLSE remains unbiased even under heteroskedasticity.

The covariance matrix of b is


V (b) =E (b − β )(b − β ) '
= ( X ' X ) −1 X ' E (εε ') X ( X ' X ) −1
= ( X ' X ) −1 X ' ΩX ( X ' X ) −1
which is not the same as conventional expression. So OLSE is not efficient under heteroskedasticity as
compared under homokedasticity.

Now we check if E (ei2 ) = σ i2 or not where ei is the ith residual.


The residual vector is
e =y − Xb =H ε

e 
e 
ei = [ 0, 0,..., 0,1, 0,...0]  2 
 
 
en 
=  i ' H εε ' H  i

Econometrics | Chapter 8 | Heteroskedasticity | Shalabh, IIT Kanpur


12
where  i is a n × 1 vector with all elements zero except the i th element which is unity and

H = I − X ( X ' X ) −1 X ' . Then

=ei2 =
i ' e.e '  i  i ' H εε ' H  i
ei2 )  i ' HE (εε ') H
E (= =  i  i ' H ΩH  i

0
 
   h1i i 
 h11  h1n  0   
    h2i i 
=H  i =     1 
  
 hn1  hnn  0   
  
 hni i 
 
0 
 
σ 12 0  0   h1i i 
  
0 σ 22  0   h2i i 
E (ei ) =  h1i i, h2i i,..., hni i  
2
.
       
  
 0 0  σ n2   hn1 i 

Thus E (ei2 ) ≠ σ i2 and so ei2 becomes a biased estimator of σ i2 in the presence of heteroskedasticity.

In the presence of heteroskedasticity, use the generalized least squares estimation. The generalized least
squares estimator (GLSE) of β is

βˆ =Ω
( X ' −1 X ) −1 X ' Ω −1 y.
Its estimation error is obtained as

βˆ =( X ' Ω −1 X ) −1 X ' Ω −1 ( X β + ε )
βˆ − β = ( X ' Ω −1 X ) −1 X ' Ω −1ε ).
Thus
E ( βˆ − β ) = ( X ' Ω −1 X ) −1 X ' Ω −1E (ε ) = 0
V ( βˆ ) =E ( βˆ − β )( βˆ − β )
( X ' Ω −1 X ) −1 X ' Ω −1 E (εε )Ω −1 X ( X ' Ω −1 X ) −1
=
= ( X ' Ω −1 X ) −1 X ' Ω −1ΩΩ −1 X ( X ' Ω −1 X ) −1
= ( X ' Ω −1 X ) −1.

Econometrics | Chapter 8 | Heteroskedasticity | Shalabh, IIT Kanpur


13
Example: Consider a simple linear regression model
yi =β 0 + β1 xi + ε i , i =1, 2,..., n.

The variances of OLSE and GLSE of β are


n

∑ (x − x ) σi
2
i
2

Var (b) = i =1
n

∑ (x − x )
i =1
i
2

∑ (x − x )
i
2

Var ( βˆ ) = i =1

 2 2 2 1 
n n

∑ i ( x − x ) σ i   ∑ ( xi − x )
σ i2 
= i 1= i 1
x −x 
Square of the correlation coefficient betweene σ i ( xi − x )   and  i 
 σi 
≤1
⇒ Var ( βˆ ) ≤ Var (b).
( xi − x )
So efficient of OLSE and GLSE depends upon the correlation coefficient between ( xi − x )σ i and .
σi

The generalized least squares estimation assumes that Ω is known, i.e., the nature of heteroskedasticity is
completely specified. Based on this assumption, the possibilities of following two cases arise:
• Ω is completely specified or
• Ω is not completely specified .

We consider both the cases as follows:


Case 1: σ i2 ' s are pre-specified:
Suppose σ 12 , σ 22 ,..., σ n2 are completely known in the model

yi = β1 + β 2 X i 2 + ... + β k X ik + ε i .

Now deflate the model by σ i , i.e.,

yi 1 X X ε
= β1 + β 2 i 2 + ... + β k ik + i .
σi σi σiσi σi
ε σ i2
Let ε i* = i , then E (ε=
*
) 0, Var (ε=*
) = 1. Now OLS can be applied to this model and usual tools for
σi i i
σ i2
drawing statistical inferences can be used.
Note that when the model is deflated, the intercept term is lost as β1 / σ i is itself a variable. This point has to be
taken care in a software output.
Econometrics | Chapter 8 | Heteroskedasticity | Shalabh, IIT Kanpur
14
Case 2: Ω may not be completely specified
Let σ 12 , σ 22 ,..., σ n2 are partially known and suppose

σ i2 ∝ X ij2 λ

or σ i2 = σ 2 X ij2 λ

but σ 2 is not available. Consider the model


yi = β1 + β 2 X i 2 + ... + β k X ik + ε i

and deflate it by X ijλ as

yi β1 X X ε
λ
= λ
+ β 2 iλ2 + ... + β k ikλ + iλ .
X ij X ij X ij X ij X ij

Now apply OLS to this transformed model and use the usual statistical tools for drawing inferences.

A caution is to be kept is mind while doing so. This is illustrated in the following example with one
explanatory variable model.

Consider the model


yi =β 0 + β1 xi + ε i .

Deflate it by xi , so we get

yi β 0 ε
= + β1 + i .
xi xi xi

Note that the roles of β 0 and β1 in original and deflated models are interchanged. In original model, β 0 is

intercept term and β1 is slope parameter whereas in deflated model, β1 becomes the intercept term and β 0
becomes the slope parameter. So essentially, one can use OLS but need to be careful in identifying the intercept
term and slope parameter, particularly in the software output.

Econometrics | Chapter 8 | Heteroskedasticity | Shalabh, IIT Kanpur


15

You might also like