Applied Econometrics Heteroscedasticity

Applied Econometrics
Lecture 8: Heteroscedasticity
1) The nature of heteroscedasticity
To estimate the model $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, we assume that $\varepsilon_i \sim (0, \sigma^2)$. The constant variance
assumption $E(\varepsilon_i^2) = \sigma^2$ for all $i$ is called homoscedasticity. Violation of this assumption is the
problem of heteroscedasticity.
Heteroscedasticity is a non-constant error variance across the sample. The presence of
heteroscedasticity renders least squares estimators inefficient, but they remain unbiased. In other
words, they are linear unbiased estimators, but no longer best linear unbiased estimators. Moreover, the
standard formulae for the standard errors of the coefficients no longer apply and, hence, statistical
inferences based on the t test or F test are not valid.
Since we do not know the population line, we do not know the actual errors ($\varepsilon_i$), but we can estimate
them by the residuals ($e_i$). Hence a look at the residual plot is a first check for the presence of
heteroscedasticity.
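The idea of reading heteroscedasticity off the residuals can be tried on artificial data. The following is a minimal Python sketch using only the standard library; the data-generating process, the seed, and the helper name `ols` are assumptions made for illustration and have nothing to do with the lecture's data sets.

```python
import random

def ols(x, y):
    """Least squares fit of y = b0 + b1*x; returns (b0, b1, residuals)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    return b0, b1, resid

random.seed(1)
x = [20 + 40 * i / 199 for i in range(200)]          # hypothetical regressor, 20..60
# Assumed error s.d. grows with x, so the errors are heteroscedastic by construction.
y = [10 + 5 * xi + random.gauss(0, 0.5 * xi) for xi in x]

b0, b1, e = ols(x, y)
half = len(x) // 2
# Root mean squared residual in the low-x and high-x halves of the sample
spread_low = (sum(r * r for r in e[:half]) / half) ** 0.5
spread_high = (sum(r * r for r in e[half:]) / half) ** 0.5
print(f"slope estimate = {b1:.3f}")
print(f"residual spread, low-x half:  {spread_low:.2f}")
print(f"residual spread, high-x half: {spread_high:.2f}")
```

The slope estimate stays close to the value used to generate the data, while the residual spread differs between the two halves, which is the pattern a residual plot would show.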
2) An illustrative example: Urban weekly earnings against age of workers
We have data on the weekly earnings of 261 workers along with their age. The estimated regression of
weekly wage income on age is given below (standard errors in brackets):
$$\text{INCOME} = \underset{(21.13)}{8.647} + \underset{(0.61)}{4.883}\,\text{AGE}, \qquad R^2 = 0.196$$
Figure 1 presents the residual versus predicted plot. The points spread vertically wider and wider
with the increase in the predicted value of weekly wage income, indicating heteroscedasticity.
We may also draw the plots of absolute and squared residuals against the predicted values of weekly
wage income. The latter mainly draws our attention to the presence of outliers.
Written by Huynh Thanh Dien May 24, 2004
Figure 1: The Evidence of Heteroscedasticity
Residual versus Predicted Income
[Scatter plot not reproduced. Horizontal axis: Predicted Income (50 to 400); vertical axis: Residual (-400 to 800).]
Source: Survey of worker households in 1990 in an industrial town in southern India.
3) Detection of heteroscedasticity
A number of tests help us detect the presence of heteroscedasticity. What follows
is an illustration of a selection of commonly used ones.
Glejser's test
Glejser's test checks whether a systematic relation exists between the absolute residuals and the explanatory
variables. The test involves regressing the absolute residuals separately on $X$, $X^{-1}$, and $X^{1/2}$, and using t-tests of
the hypothesis that the slope coefficient is zero [1]. The hypothesis of homoscedasticity is rejected if any of the slope
coefficients turns out to be significantly different from zero.
The test involves the following steps:
1. Run the regression and calculate the residuals ($e$).
2. Convert the residuals to their absolute values [2].
3. Regress the absolute residuals on each regressor separately in the following functional
forms:
$$|e| = \beta_0 + \beta_1 X$$
$$|e| = \beta_0 + \beta_1 (1/X)$$
$$|e| = \beta_0 + \beta_1 \sqrt{X}$$
$$|e| = \beta_0 + \beta_1 (1/\sqrt{X})$$
4. Use a t-test to determine whether the slope coefficient is significantly different from zero.
[1] If there is more than one explanatory variable, this test is to be repeated for each of the explanatory
variables.
[2] In Excel we use the function =ABS(cell) on each residual, and in EViews we use GENR ee = abs(e).
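The steps of Glejser's test can be sketched in Python as follows. This is an illustration on simulated data, not the lecture's own computation: the data-generating process, the seed, and the helper name `ols_slope_t` are assumptions, and only the standard library is used.

```python
import math
import random

def ols_slope_t(x, y):
    """Fit y = b0 + b1*x by least squares; return (b0, b1, t-statistic of b1)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(rss / (n - 2) / sxx)
    return b0, b1, b1 / se_b1

random.seed(2)
x = [1 + 9 * i / 149 for i in range(150)]             # hypothetical regressor, 1..10
y = [2 + 3 * xi + random.gauss(0, xi) for xi in x]    # assumed error s.d. proportional to x

# Step 1: original regression and residuals
b0, b1, _ = ols_slope_t(x, y)
e = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
# Step 2: absolute residuals
abs_e = [abs(r) for r in e]
# Step 3: regress |e| on each functional form of X
forms = {
    "X": lambda v: v,
    "1/X": lambda v: 1 / v,
    "sqrt(X)": math.sqrt,
    "1/sqrt(X)": lambda v: 1 / math.sqrt(v),
}
t_stats = {}
for name, f in forms.items():
    _, slope, t = ols_slope_t([f(xi) for xi in x], abs_e)
    t_stats[name] = t
    print(f"|e| on {name:10} slope = {slope:8.3f}  t = {t:6.2f}")
# Step 4: compare each |t| with the t critical value for n - 2 = 148
# degrees of freedom at the chosen significance level.
```

Homoscedasticity would be rejected if any of the four t-statistics exceeds the critical value in absolute terms.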
White's test
White's test also checks whether there is any systematic relation between the squared residuals and the
explanatory variables. This is achieved by regressing the squared residuals $e^2$ on all the explanatory
variables and on their squares and cross products. Thus, if $X_1$ and $X_2$ are the explanatory variables,
then White's test involves regressing $e^2$ on $X_1$, $X_2$, $X_1^2$, $X_2^2$, and $X_1 X_2$ and using the overall F-test to check whether
the regression is significant.
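A minimal Python sketch of the auxiliary regression for two explanatory variables is given below. The simulated data, the seed, and the helper names `solve` and `ols` are assumptions for illustration; the regression is solved through the normal equations with plain Gaussian elimination so that no external library is needed.

```python
import random

def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (m[r][n] - sum(m[r][j] * z[j] for j in range(r + 1, n))) / m[r][r]
    return z

def ols(rows, y):
    """Least squares with an intercept. rows holds one tuple of regressors per
    observation. Returns (coefficients, residuals, R-squared)."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(xi[a] * xi[b] for xi in X) for b in range(k)] for a in range(k)]
    xty = [sum(xi[a] * yi for xi, yi in zip(X, y)) for a in range(k)]
    beta = solve(xtx, xty)
    resid = [yi - sum(bj * xj for bj, xj in zip(beta, xi)) for xi, yi in zip(X, y)]
    ybar = sum(y) / len(y)
    tss = sum((yi - ybar) ** 2 for yi in y)
    return beta, resid, 1 - sum(r * r for r in resid) / tss

random.seed(3)
n = 300
x1 = [random.uniform(1, 5) for _ in range(n)]
x2 = [random.uniform(1, 5) for _ in range(n)]
# Assumed error s.d. proportional to x1: heteroscedastic by construction.
y = [1 + 2 * a + 3 * b + random.gauss(0, a) for a, b in zip(x1, x2)]

# Original regression and squared residuals
_, e, _ = ols(list(zip(x1, x2)), y)
e2 = [r * r for r in e]

# Auxiliary regression of e^2 on levels, squares and the cross product
aux_rows = [(a, b, a * a, b * b, a * b) for a, b in zip(x1, x2)]
_, _, r2_aux = ols(aux_rows, e2)

q = 5  # number of slope coefficients in the auxiliary regression
f_stat = (r2_aux / q) / ((1 - r2_aux) / (n - q - 1))
print(f"auxiliary R-squared = {r2_aux:.3f}, overall F({q}, {n - q - 1}) = {f_stat:.2f}")
```

The overall F statistic is computed from the auxiliary $R^2$ and is compared with the critical value of the F distribution with $(q,\ n-q-1)$ degrees of freedom.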
Goldfeld-Quandt test
The test is commonly used when the heteroscedastic variance is suspected to be monotonically (i.e.,
consistently increasing or decreasing) related to one of the explanatory variables in the regression
model. We group the data with respect to different ranges of one of the explanatory variables and test
whether the conditional variances of the error term are the same.
The test involves the following steps:
1. Arrange the data in ascending order of the explanatory variable suspected to be related to
the error variance.
2. Drop a number of the middle observations, say $c$, so that $(n-c)$ is divisible by 2. A rule of
thumb is to drop about one quarter of the total observations from the middle. This omission
sharpens the test.
3. Estimate two separate regressions for the bottom and the top group of observations (equal
sample sizes).
4. Calculate the ratio of the higher to the lower residual sum of squares from the two
sub-sample regressions. If the sub-sample variances are the same (homoscedastic errors), the
ratio will approximately equal unity.
5. Compare the computed ratio with the critical value of the relevant F distribution with
$([(n-c)/2]-k;\ [(n-c)/2]-k)$ degrees of freedom at the desired level of significance, where $(n-c)/2$ is
the size of each sub-sample and $k$ is the number of regressors in the equation (including the
intercept). If the computed ratio exceeds the critical value, then the hypothesis of
homoscedasticity is rejected.
The Goldfeld-Quandt test is only valid under the assumption that the dependent variable is
normally distributed.
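The five steps can be sketched in Python for a regression with one explanatory variable. The function names `rss_simple` and `goldfeld_quandt`, the default of dropping a quarter of the observations, and the simulated data are assumptions made for illustration.

```python
import random

def rss_simple(x, y):
    """Residual sum of squares from the least squares fit of y = b0 + b1*x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

def goldfeld_quandt(x, y, drop_frac=0.25, k=2):
    """Return (RSS ratio, degrees of freedom of numerator and denominator)."""
    pairs = sorted(zip(x, y))                  # step 1: sort by x
    n = len(pairs)
    c = int(n * drop_frac)                     # step 2: drop about a quarter
    if (n - c) % 2:                            # make (n - c) divisible by 2
        c += 1
    m = (n - c) // 2                           # size of each sub-sample
    low, high = pairs[:m], pairs[n - m:]
    rss_low = rss_simple(*zip(*low))           # step 3: two separate regressions
    rss_high = rss_simple(*zip(*high))
    ratio = max(rss_low, rss_high) / min(rss_low, rss_high)   # step 4
    return ratio, m - k                        # step 5: compare with F(m-k, m-k)

random.seed(4)
x = [random.uniform(20, 60) for _ in range(120)]
# Assumed error s.d. proportional to x: heteroscedastic by construction.
y = [10 + 5 * xi + random.gauss(0, 0.5 * xi) for xi in x]

ratio, df = goldfeld_quandt(x, y)
print(f"RSS ratio = {ratio:.2f}, to be compared with F({df}, {df})")
```

With 120 observations and a quarter dropped, each sub-sample has 45 observations, so the ratio is compared with the F distribution with (43, 43) degrees of freedom.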
Bartlett's test
Bartlett's test can be applied to check for the equality of the variances of the dependent variable
across groups defined by an explanatory variable.
The conditional variance of $Y$ given $X$ is the same as the conditional variance of the error term, $\sigma_X^2$:
$$V(Y) = V(\beta_0 + \beta_1 X + \varepsilon_X)$$
Since $X$ is given, $\varepsilon_X$ is uncorrelated with $X$, i.e. $\mathrm{cov}(\varepsilon_X, X) = 0$, and the variance of
$(\beta_0 + \beta_1 X)$ is zero for any given $X$, so we have:
$$V(Y) = V(\beta_0 + \beta_1 X) + V(\varepsilon_X) = V(\varepsilon_X) = \sigma_X^2$$
Hence, one way of checking for heteroscedasticity is to test for the stability of the conditional
variance of $Y$ across the range of $X$ in the sample data. In practical situations, we generally do not
have multiple observations of $Y$ for a given $X$. Applying Bartlett's test therefore involves
first sorting the data in ascending order of the explanatory variable suspected of causing
the heteroscedastic pattern of the error term, dividing the sample into several groups, say
$k$ groups, based on this explanatory variable, and then testing the hypothesis of
homogeneous variances across the groups.
If $\sigma_i^2$ is the variance of $Y$ in the $i$th group, the null hypothesis we seek to test can then be stated as:
$$H_0: \sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \dots = \sigma_k^2 = \sigma^2$$
Now let $Y_{ij}$ = the $j$th $Y$ value in the $i$th group; $n_i$ = the number of observations in the $i$th group; $f_i = (n_i - 1)$;
and $f = \sum_{i=1}^{k} f_i$. The test is then performed as follows:
1. Compute the sample variance for each group $i$:
$$s_i^2 = \frac{1}{n_i - 1} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_i)^2, \quad \text{where} \quad \bar{Y}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} Y_{ij}$$
$s_i^2$ is the estimator of $\sigma_i^2$, $i = 1, 2, 3, \dots, k$.
2. Compute the pooled sample variance of all the groups together:
$$s^2 = \frac{\sum_{i=1}^{k} f_i s_i^2}{\sum_{i=1}^{k} f_i} = \frac{\sum_{i=1}^{k} f_i s_i^2}{f}$$
$s^2$ is the estimator of $\sigma^2$ under $H_0$.
3. Under the null hypothesis the ratio $A/B$ has approximately a chi-square distribution with $(k-1)$
degrees of freedom, where
$$A = f \ln(s^2) - \sum_{i=1}^{k} f_i \ln(s_i^2), \qquad B = 1 + \frac{1}{3(k-1)} \left[ \sum_{i=1}^{k} \frac{1}{f_i} - \frac{1}{f} \right]$$
Note that Bartlett's test is only valid under the assumption that the dependent variable is normally
distributed.
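The computation of the ratio $A/B$ can be sketched in Python as follows. The helper names `split_into_groups` and `bartlett`, the choice of four groups, and the simulated data are assumptions for illustration only.

```python
import math
import random

def split_into_groups(x, y, k):
    """Sort by x and cut the y values into k consecutive groups of near-equal size."""
    ys = [yi for _, yi in sorted(zip(x, y))]
    n = len(ys)
    bounds = [round(i * n / k) for i in range(k + 1)]
    return [ys[bounds[i]:bounds[i + 1]] for i in range(k)]

def bartlett(groups):
    """Return (A/B, k - 1) for a list of groups of Y values."""
    k = len(groups)
    f_i = [len(g) - 1 for g in groups]
    f = sum(f_i)
    # Step 1: sample variance of each group
    s2_i = []
    for g in groups:
        mean = sum(g) / len(g)
        s2_i.append(sum((v - mean) ** 2 for v in g) / (len(g) - 1))
    # Step 2: pooled sample variance
    s2 = sum(fi * si for fi, si in zip(f_i, s2_i)) / f
    # Step 3: test statistic A/B
    a = f * math.log(s2) - sum(fi * math.log(si) for fi, si in zip(f_i, s2_i))
    b = 1 + (sum(1 / fi for fi in f_i) - 1 / f) / (3 * (k - 1))
    return a / b, k - 1

random.seed(6)
x = [random.uniform(20, 60) for _ in range(120)]
# Assumed error s.d. proportional to x: heteroscedastic by construction.
y = [10 + 5 * xi + random.gauss(0, 0.5 * xi) for xi in x]

groups = split_into_groups(x, y, k=4)
stat, df = bartlett(groups)
print(f"A/B = {stat:.2f}, to be compared with chi-square({df})")
```

When all group variances are equal the statistic is zero; the more the group variances differ, the larger it becomes.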


4) Transformations towards homoscedasticity
Our tests for heteroscedasticity are tests on residuals as proxies for the errors. But the properties of the
residuals are determined by the model specification. The errors in the true model may be homoscedastic while
those in a misspecified model are not. Hence residual heteroscedasticity may be a symptom of model
misspecification (either incorrect functional form or omitted variables).

One of the most common reasons for heteroscedasticity is the skewness of the distribution of one or
more variables [3] involved in a regression with socioeconomic data. How do we find the appropriate
transformation to eliminate heteroscedasticity?
If the functional relationship between the variance of the dependent variable ($\sigma_Y^2$) and its mean is
known, a transformation exists which will make the variance approximately constant (Rawlings,
1988:309).
The common functional relationship between the variance of the dependent variable and its
conditional mean is:
$$\sigma_Y^2 = A^2 \mu_Y^{2k}$$
or, alternatively:
$$\sigma_Y = A \mu_Y^{k}$$
which can conveniently be re-expressed as a double-log equation as follows:
$$\ln(\sigma_Y) = \ln(A) + k \ln(\mu_Y)$$
where
$\sigma_Y^2$ is the conditional variance of $Y$
$\mu_Y$ is the conditional mean of $Y$
$A$ and $k$ are constants
The slope coefficient of the corresponding regression tells us which transformation may be most
appropriate.

If $k = 1$, the log transformation will approximately eliminate heteroscedasticity.
If $k \neq 1$, the power transformation $Y^{1-k}$ will approximately eliminate heteroscedasticity.
However, $\sigma_Y$ and $\mu_Y$ are unknown. What we can do is substitute the absolute residuals $|e|$ for
$\sigma_Y$ and the predicted values $Y_p$ for $\mu_Y$ in the above equation and use the data to estimate its slope
coefficient, $k$, by least squares. The corresponding fitted regression is given as follows:
$$\ln(|e|) = \beta_0 + \beta_1 \ln(Y_p)$$
[3] Taking the log transformation of both the dependent and independent variables may be appropriate to eliminate
heteroscedasticity (practice with INDFOOD).
We may use the t-test of the hypothesis that the slope coefficient equals one at the desired level of significance.
If $\beta_1 = 1$, the log transformation will approximately eliminate heteroscedasticity.
If $\beta_1 \neq 1$, the power transformation $Y^{1-k}$, with $k$ estimated by $\beta_1$, will approximately eliminate heteroscedasticity.
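The regression of the log absolute residuals on the log predicted values, and the t-test of a unit slope, can be sketched in Python as below. The data are simulated with an error standard deviation proportional to the mean (so the slope should be near one); the seed, the data-generating process, and the helper name `ols` are assumptions for illustration.

```python
import math
import random

def ols(x, y):
    """Fit y = b0 + b1*x by least squares; return (b0, b1, standard error of b1)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return b0, b1, math.sqrt(rss / (n - 2) / sxx)

random.seed(7)
x = [1 + 9 * i / 299 for i in range(300)]
# Assumed multiplicative error: the s.d. of Y is proportional to its mean (k = 1).
y = [(5 + 2 * xi) * (1 + random.gauss(0, 0.2)) for xi in x]

# Original regression: predicted values and residuals
b0, b1, _ = ols(x, y)
y_p = [b0 + b1 * xi for xi in x]
e = [yi - yp for yi, yp in zip(y, y_p)]

# Regress ln|e| on ln(Y_p); skip points where a logarithm is undefined
pairs = [(math.log(yp), math.log(abs(r)))
         for yp, r in zip(y_p, e) if yp > 0 and r != 0]
g0, g1, se_g1 = ols(*zip(*pairs))

t_for_one = (g1 - 1) / se_g1        # t-statistic for the hypothesis slope = 1
print(f"estimated slope = {g1:.2f}, t for slope = 1: {t_for_one:.2f}")
```

If the t-statistic is small in absolute value, a unit slope is not rejected and the log transformation is the natural candidate; otherwise the estimated slope suggests the power $1-k$.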
5) Weighted least squares
If we believe we are dealing with a case of genuine heteroscedasticity then, in some cases, the method of
weighted least squares allows us to derive efficient estimators of a regression model with
heteroscedastic errors and to make valid inferences.
If the heteroscedastic model is given as follows:
$$Y_i = \beta_1 + \beta_2 X_i + \varepsilon_i \qquad (5.1)$$
where $V(\varepsilon_i) = \sigma_i^2 = w_i^2 \sigma^2$, we can divide equation (5.1) by $w_i$ so as to get the following model:
$$\frac{Y_i}{w_i} = \beta_1 \frac{1}{w_i} + \beta_2 \frac{X_i}{w_i} + \frac{\varepsilon_i}{w_i}$$
This transformed regression has no constant term. The variance of the error term in the new
specification is homoscedastic because
$$V\left(\frac{\varepsilon_i}{w_i}\right) = \frac{1}{w_i^2} V(\varepsilon_i) = \frac{w_i^2 \sigma^2}{w_i^2} = \sigma^2$$
This procedure of estimation of the regression coefficients is called the weighted least squares
method of estimation. The crucial issue in practice is to find the appropriate weights, $w_i$.
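The transformation can be sketched in Python: divide every term by $w_i$ and fit the resulting regression without a constant. The function name `wls`, the choice $w_i = X_i$, and the simulated data are assumptions for illustration; in practice the weights have to be justified by the data.

```python
import random

def wls(x, y, w):
    """Weighted least squares for Y = b1 + b2*X with V(error_i) = w_i^2 * sigma^2.

    Divides through by w_i and fits Y/w = b1*(1/w) + b2*(X/w) with no constant
    by solving the two normal equations directly."""
    z1 = [1 / wi for wi in w]                       # transformed "intercept" regressor
    z2 = [xi / wi for xi, wi in zip(x, w)]          # transformed X
    ys = [yi / wi for yi, wi in zip(y, w)]          # transformed Y
    s11 = sum(a * a for a in z1)
    s12 = sum(a * b for a, b in zip(z1, z2))
    s22 = sum(b * b for b in z2)
    t1 = sum(a * c for a, c in zip(z1, ys))
    t2 = sum(b * c for b, c in zip(z2, ys))
    det = s11 * s22 - s12 * s12
    b1 = (s22 * t1 - s12 * t2) / det
    b2 = (s11 * t2 - s12 * t1) / det
    return b1, b2

random.seed(5)
x = [1 + 9 * i / 99 for i in range(100)]
# Assumed error s.d. proportional to x, so w_i = x_i are the appropriate weights here.
y = [3 + 2 * xi + random.gauss(0, 0.5 * xi) for xi in x]

b1, b2 = wls(x, y, w=x)
print(f"WLS estimates: intercept = {b1:.3f}, slope = {b2:.3f}")
```

Note that with $w_i = X_i$ the original slope becomes the constant of the transformed regression of $Y/X$ on $1/X$, while the original intercept becomes its slope.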
6) White's heteroscedasticity-consistent standard errors (HCSEs)
If the weighted least squares method is not possible, we may use heteroscedasticity-consistent
standard errors. In the two-variable case we see that:
$$\mathrm{Var}(b_2) = \frac{\sigma^2}{\sum (X_i - \bar{X})^2} \qquad (6.1)$$
where the formula is simplified using the assumption that $E(\varepsilon_i^2) = \sigma^2$ for all $i$. Where this
assumption is not valid (i.e. the errors are heteroscedastic) then:
$$\mathrm{Var}(b_2) = \frac{\sum (X_i - \bar{X})^2 \sigma_i^2}{\left[\sum (X_i - \bar{X})^2\right]^2} \qquad (6.2)$$
White (1980) showed that substituting the squared residuals ($e_i^2$) for $\sigma_i^2$ in equation (6.2) yields a
consistent estimate of the variance of the coefficient, and hence of its standard error. However, unlike weighted
least squares, least squares with these standard errors is still not the minimum-variance estimator.
Inspection of equation (6.2) shows that if the errors are homoscedastic then the expression simplifies
to that in equation (6.1). That is, the heteroscedasticity-consistent standard errors and those usually
reported will be approximately the same if there is no heteroscedasticity. A divergence between these two sets of
standard errors is thus a rough test for the presence of heteroscedasticity.
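The two variance formulas, equations (6.1) and (6.2), can be compared numerically with a short Python sketch. The function name `slope_standard_errors`, the use of $RSS/(n-2)$ as the estimate of $\sigma^2$ in (6.1), and the simulated data (with $X$ centred on zero and an error spread that grows with $|X|$) are assumptions for illustration.

```python
import math
import random

def slope_standard_errors(x, y):
    """Return (slope, conventional s.e., White heteroscedasticity-consistent s.e.)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    dev2 = [(xi - xbar) ** 2 for xi in x]
    sxx = sum(dev2)
    b2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b1 = ybar - b2 * xbar
    e = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]
    # Equation (6.1), with sigma^2 estimated by RSS / (n - 2)
    s2 = sum(r * r for r in e) / (n - 2)
    conventional = math.sqrt(s2 / sxx)
    # Equation (6.2), with each sigma_i^2 replaced by the squared residual e_i^2
    white = math.sqrt(sum(d * r * r for d, r in zip(dev2, e)) / sxx ** 2)
    return b2, conventional, white

random.seed(8)
x = [-5 + 10 * i / 399 for i in range(400)]
# Assumed error s.d. grows with |x|: heteroscedastic by construction.
y = [1 + 2 * xi + random.gauss(0, 1 + abs(xi)) for xi in x]

b2, se_conv, se_white = slope_standard_errors(x, y)
print(f"slope = {b2:.3f}")
print(f"conventional s.e. = {se_conv:.4f}, White s.e. = {se_white:.4f}")
```

In this design the error variance is largest where $(X_i - \bar{X})^2$ is largest, so the conventional formula understates the sampling variability of the slope and the two standard errors diverge.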
References
Maddala, G.S. (1992), Introduction to Econometrics, Macmillan Publishing Company, New York.
Rawlings, John O. (1988), Applied Regression Analysis: A Research Tool, Wadsworth & Brooks/Cole,
Pacific Grove, CA.
Mukherjee, Chandan, Howard White and Marc Wuyts (1998), Econometrics and Data Analysis for
Developing Countries, Routledge, London.
White, Halbert (1980), A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct
Test for Heteroskedasticity, Econometrica 48: 817-38.
Workshop 8: Heteroscedasticity
1) Use the data set INDIA,
1.1) estimate the regression line between the logarithm of wage income and the age of the
worker, compute the residuals, and plot the raw, absolute, and squared residuals against
the predicted values of wage income and against the age of workers. What do you
conclude about the presence or absence of heteroscedasticity?
1.2) test for heteroscedasticity in the model relating the logarithm of income to
the age of workers in the INDIA data set.
2) Use the data set SOCECON,
2.1) Regress energy consumption (E) on GNP per capita (Y); energy consumption (E) on
the degree of urbanization as measured by the percentage of the population living in
urban areas (U); and life expectancy (L) on GNP per capita (Y).
2.2) For each of these simple regressions between raw data, compare the plots of raw,
absolute, and squared residuals against the predicted values of the dependent variable or
against the regressor. In each case, check which plot is most revealing in terms of
detecting heteroscedasticity.
2.3) Using all four tests, check for the presence of heteroscedasticity.
3) Using the data file TPESANT (farm size and household size in Tanzania) estimate the
regression of landholding size on household size with weighted least squares. Do you think
that the resulting regression satisfies the assumptions of classical linear regression?

4) Use the data in data file INDFOOD to test for heteroscedasticity in the regression of
household food expenditure on total expenditure. Repeat the tests using the log of both
variables. Comment on your findings.
5) Using the data in data file LEACCESS, regress life expectancy on (a) income per capita; (b)
logged income per capita; and (c) logged income per capita and access to health. Test for
heteroscedasticity in each regression equation. Comment on your results.