ECON 413 Introductory Econometrics
Chapter 12 – Heteroskedasticity
November 21, 2011

This Lecture
We are going to relax the assumption that the error term is homoskedastic, i.e. that
$$\mathrm{Var}(u \mid x_1, x_2, \ldots, x_k) = \sigma^2$$
Notice that the assumption of homoskedasticity plays no role in the unbiasedness and consistency of the OLS estimator
However, it does affect statistical inference, i.e. the usual t-test, F-test and confidence intervals are constructed based on the assumption of homoskedasticity
Also, if heteroskedasticity is present, the OLS estimator is no longer the best linear unbiased estimator (BLUE)

Testing for Heteroskedasticity
Consider the following linear model:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + u$$
We would like to test the null hypothesis that:
$$H_0: \mathrm{Var}(u \mid x_1, x_2, \ldots, x_k) = \sigma^2$$
In order to test for violation of the homoskedasticity assumption, we want to test whether $u^2$ is related (in expected value) to one or more of the explanatory variables
A simple way is to assume a linear function:
$$u^2 = \delta_0 + \delta_1 x_1 + \delta_2 x_2 + \cdots + \delta_k x_k + v$$
We are testing the null hypothesis of:
$$H_0: \delta_1 = \delta_2 = \cdots = \delta_k = 0$$
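The link between the two ways of writing the null hypothesis can be spelled out in one step. The sketch below assumes the usual zero conditional mean condition $E(u \mid x_1, \ldots, x_k) = 0$ from the earlier OLS chapters, which this slide does not restate, and likewise $E(v \mid x_1, \ldots, x_k) = 0$ for the error $v$ of the auxiliary equation.

```latex
\begin{align*}
\mathrm{Var}(u \mid x_1,\ldots,x_k)
  &= E(u^2 \mid x_1,\ldots,x_k) - \left[E(u \mid x_1,\ldots,x_k)\right]^2 \\
  &= E(u^2 \mid x_1,\ldots,x_k)
     && \text{since } E(u \mid x_1,\ldots,x_k) = 0 \\
  &= \delta_0 + \delta_1 x_1 + \cdots + \delta_k x_k
     && \text{under the assumed linear form}
\end{align*}
```

Under this linear form the conditional variance is constant exactly when $\delta_1 = \cdots = \delta_k = 0$, in which case $\delta_0 = \sigma^2$.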

Testing Heteroskedasticity
We never know the actual errors in the population model, but we do have estimates of them: the OLS residual $\hat{u}_i$ is an estimate of the error $u_i$ for observation $i$
Thus, we can estimate the equation:
$$\hat{u}_i^2 = \delta_0 + \delta_1 x_1 + \delta_2 x_2 + \cdots + \delta_k x_k + \text{error}$$
Let $R^2_{\hat{u}^2}$ be the R-squared from estimating the above equation
Then the F statistic is:
$$F = \frac{R^2_{\hat{u}^2}/k}{(1 - R^2_{\hat{u}^2})/(n-k-1)} \sim F_{k,\,n-k-1}$$
where the F distribution holds (approximately) under the null hypothesis of homoskedasticity
The above procedure is called the Breusch-Pagan test for heteroskedasticity (BP test)

The Breusch-Pagan Test
1. Estimate the model by OLS as usual. Obtain the squared OLS residuals, $\hat{u}^2$
2. Run the regression of $\hat{u}^2$ on the independent variables $(x_1, x_2, \ldots, x_k)$. Keep the R-squared from this regression, $R^2_{\hat{u}^2}$
3. Form the F statistic and compute the p-value (using the $F_{k,\,n-k-1}$ distribution)
4. If the p-value is sufficiently small, that is, below the chosen significance level, then we reject the null hypothesis of homoskedasticity
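The steps above can be sketched in a few lines of code. The following is a minimal illustration in plain Python (not the Stata workflow used in this lecture), restricted to a model with a single regressor so that OLS has a closed form. The function names and the toy data are hypothetical: the errors are deterministic and alternate in sign, which keeps the example reproducible without random draws.

```python
def ols_simple(x, y):
    """Intercept and slope from an OLS regression of y on a constant and x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def r_squared(x, y):
    """R-squared from the OLS regression of y on a constant and x."""
    b0, b1 = ols_simple(x, y)
    y_bar = sum(y) / len(y)
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ssr / sst


def breusch_pagan_f(x, y):
    """F statistic of the Breusch-Pagan test for a model with k = 1 regressor."""
    n, k = len(x), 1
    b0, b1 = ols_simple(x, y)                                   # step 1: OLS
    u_sq = [(yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)]   # squared residuals
    r2 = r_squared(x, u_sq)                                     # step 2: auxiliary regression
    return (r2 / k) / ((1 - r2) / (n - k - 1))                  # step 3: F statistic


# Hypothetical toy data with n = 40 observations.
x = [float(i) for i in range(1, 41)]
sign = [1 if i % 2 == 0 else -1 for i in range(1, 41)]

# Error size proportional to x: heteroskedastic by construction.
y_het = [1 + 2 * xi + 0.5 * xi * s for xi, s in zip(x, sign)]
# Error size cycles through 2, 3, 1 regardless of x: spread unrelated to x.
y_hom = [1 + 2 * xi + s * (1 + i % 3)
         for i, (xi, s) in enumerate(zip(x, sign), start=1)]

f_het = breusch_pagan_f(x, y_het)
f_hom = breusch_pagan_f(x, y_hom)
print(f"F when the error spread grows with x:  {f_het:.2f}")
print(f"F when the error spread ignores x:     {f_hom:.2f}")
```

Step 4 would compare each statistic with the $F_{1,38}$ distribution, whose 5% critical value is roughly 4.1. The p-value itself needs an F distribution function, which the Python standard library does not provide, so it is left out of this sketch.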

STATA example
We use the housing price data in HPRICE1.DTA to test for heteroskedasticity in a simple house price equation
We first estimate the equation using the levels of all variables:
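. reg price lotsize sqrft bdrms

      Source |       SS       df       MS              Number of obs =      88
-------------+------------------------------           F(  3,    84) =   57.46
       Model |  617130.701     3  205710.234           Prob > F      =  0.0000
    Residual |  300723.805    84   3580.0453           R-squared     =  0.6724
-------------+------------------------------           Adj R-squared =  0.6607
       Total |  917854.506    87  10550.0518           Root MSE      =  59.833

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lotsize |   .0020677   .0006421     3.22   0.002     .0007908    .0033446
       sqrft |   .1227782   .0132374     9.28   0.000     .0964541    .1491022
       bdrms |   13.85252   9.010145     1.54   0.128    -4.065141    31.77018
       _cons |  -21.77031   29.47504    -0.74   0.462    -80.38466    36.84405
------------------------------------------------------------------------------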

STATA Example
To detect heteroskedasticity, we can first inspect the residual plot to see whether the variance of the error term changes in any systematic way with the fitted values
The residual plot can be generated with the command: rvfplot
[Figure: residual-versus-fitted plot for the levels regression; vertical axis: Residuals, horizontal axis: Fitted values]
In the residual plot, the residuals appear more dispersed as the fitted values become larger
We should then carry out the Breusch-Pagan test for heteroskedasticity

STATA Example
To test for heteroskedasticity using the Breusch-Pagan test, we first need to obtain the residuals from our regression with the command: predict newvar, residual (where newvar is the name chosen for the residual variable, res1 below)
Then we obtain the squared residuals with the command: gen
We then regress the squared residuals on the independent variables
The result shows that the p-value for the F statistic is 0.002, which is strong evidence against the null hypothesis of homoskedasticity
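. predict res1, residual
. gen res1sq = res1*res1
. reg res1sq lotsize sqrft bdrms

      Source |       SS       df       MS              Number of obs =      88
-------------+------------------------------           F(  3,    84) =    5.34
       Model |   701213780     3   233737927           Prob > F      =  0.0020
    Residual |  3.6775e+09    84  43780003.5           R-squared     =  0.1601
-------------+------------------------------
       Total |  4.3787e+09    87  50330276.7           Root MSE      =  6616.6

------------------------------------------------------------------------------
      res1sq |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lotsize |   .2015209   .0710091     2.84   0.006     .0603116    .3427302
       sqrft |   1.691037    1.46385     1.16   0.251    -1.219989    4.602063
       bdrms |    1041.76    996.381     1.05   0.299    -939.6526    3023.173
       _cons |  -5522.795   3259.478    -1.69   0.094    -12004.62    959.0348
------------------------------------------------------------------------------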

STATA Example
One way to reduce the problem of heteroskedasticity is to use logarithmic functional forms for the dependent and independent variables
Consider the same example, but now instead of running the regression in levels, we estimate the logged version of the regression (price, lotsize and sqrft enter in logs; bdrms stays in levels):
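. reg lprice llotsize lsqrft bdrms

      Source |       SS       df       MS              Number of obs =      88
-------------+------------------------------           F(  3,    84) =   50.42
       Model |  5.15504028     3  1.71834676           Prob > F      =  0.0000
    Residual |  2.86256324    84  .034078134           R-squared     =  0.6430
-------------+------------------------------
       Total |  8.01760352    87  .092156362           Root MSE      =   .1846

------------------------------------------------------------------------------
      lprice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    llotsize |   .1679667   .0382812     4.39   0.000     .0918404     .244093
      lsqrft |   .7002324   .0928652     7.54   0.000     .5155597    .8849051
       bdrms |   .0369584   .0275313     1.34   0.183    -.0177906    .0917074
       _cons |  -1.297042   .6512836    -1.99   0.050    -2.592191    -.001893
------------------------------------------------------------------------------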

STATA Example
We can inspect the residual plot:
[Figure: residual-versus-fitted plot for the log regression; vertical axis: Residuals, horizontal axis: Fitted values]
The heteroskedasticity appears to be less pronounced than in the levels regression

STATA Example
We can conduct the Breusch-Pagan Test:
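. predict res2, residual
. gen res2sq = res2*res2
. reg res2sq llotsize lsqrft bdrms

      Source |       SS       df       MS              Number of obs =      88
-------------+------------------------------           F(  3,    84) =    1.41
       Model |  .022620168     3  .007540056           Prob > F      =  0.2451
    Residual |  .448717194    84  .005341871           R-squared     =  0.0480
-------------+------------------------------
       Total |  .471337362    87  .005417671           Root MSE      =  .07309

------------------------------------------------------------------------------
      res2sq |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    llotsize |  -.0070156   .0151563    -0.46   0.645    -.0371556    .0231244
      lsqrft |  -.0627368   .0367673    -1.71   0.092    -.1358526    .0103791
       bdrms |   .0168407   .0109002     1.54   0.126    -.0048356     .038517
       _cons |    .509994    .257857     1.98   0.051    -.0027829    1.022771
------------------------------------------------------------------------------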
The p-value of the F-test is 0.245, which means we fail to reject the null hypothesis of homoskedasticity in the model with logarithmic functional forms
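As a check on the arithmetic, the F statistics reported by Stata for the two auxiliary regressions can be reproduced from their R-squared values using the formula $F = (R^2/k) / ((1 - R^2)/(n - k - 1))$. The short Python sketch below does only that; the helper name `f_from_r2` is hypothetical, and the inputs are the rounded R-squared values copied from the outputs above.

```python
def f_from_r2(r2, n, k):
    """F statistic for the joint significance of k regressors, given the R-squared."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


n, k = 88, 3                           # 88 houses, 3 regressors in each auxiliary regression

f_levels = f_from_r2(0.1601, n, k)     # R-squared from the res1sq regression
f_logs = f_from_r2(0.0480, n, k)       # R-squared from the res2sq regression

print(f"levels model: F(3, 84) = {f_levels:.2f}")   # Stata reports 5.34
print(f"log model:    F(3, 84) = {f_logs:.2f}")     # Stata reports 1.41
```

Both values agree with the `F( 3, 84)` lines in the Stata output to two decimal places.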

The White Test for Heteroskedasticity
The assumption of homoskedasticity can in fact be replaced by the weaker assumption that the squared error $u^2$ is uncorrelated with all the independent variables ($x_j$), the squares of the independent variables ($x_j^2$), and all the cross products ($x_j x_h$ for $j \neq h$)
This observation motivates the White test for heteroskedasticity
For a model with $k = 3$ independent variables, the White test is based on estimating
$$\hat{u}^2 = \delta_0 + \delta_1 x_1 + \delta_2 x_2 + \delta_3 x_3 + \delta_4 x_1^2 + \delta_5 x_2^2 + \delta_6 x_3^2 + \delta_7 x_1 x_2 + \delta_8 x_1 x_3 + \delta_9 x_2 x_3 + \text{error}$$
The difference between the White test and the Breusch-Pagan test is that the
former includes the squares and cross products of the independent variables
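To see how quickly the White regression grows, the regressors can be enumerated mechanically: $k$ levels, $k$ squares and $k(k-1)/2$ cross products. The Python sketch below is only an illustration; the function name `white_regressors` and the variable labels are hypothetical.

```python
from itertools import combinations


def white_regressors(names):
    """List the regressors of the White auxiliary regression for the given variables."""
    levels = list(names)
    squares = [f"{v}^2" for v in names]
    cross_products = [f"{a}*{b}" for a, b in combinations(names, 2)]
    return levels + squares + cross_products


terms = white_regressors(["x1", "x2", "x3"])
print(len(terms), terms)

# The count is k + k + k(k-1)/2, so it rises fast with the number of variables.
for k in (3, 6, 10):
    names = [f"x{j}" for j in range(1, k + 1)]
    print(k, "variables ->", len(white_regressors(names)), "regressors")
```

With three variables this gives the nine regressors of the equation above; with six variables the count is already 27, which uses up many degrees of freedom in a small sample.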

The White Test
The formulation of the White test above is cumbersome: with $k = 3$ independent variables, the auxiliary regression already has 9 regressors
However, we can preserve the spirit of the White test while conserving degrees of freedom by using the OLS fitted values in the test
Recall that the fitted values are defined as:
$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{1i} + \hat{\beta}_2 x_{2i} + \cdots + \hat{\beta}_k x_{ki}$$
Squaring the fitted values gives a particular function of all the squares and cross products of the independent variables
This suggests testing for heteroskedasticity by estimating the equation:
$$\hat{u}^2 = \delta_0 + \delta_1 \hat{y} + \delta_2 \hat{y}^2 + \text{error}$$
where $\hat{y}$ stands for the fitted values
It is important not to confuse $\hat{y}$ and $y$ in this equation
We use the fitted values because they are functions of the independent variables; using $y$ does not produce a valid test for heteroskedasticity

The White Test
We can use the F statistic for the null hypothesis $H_0: \delta_1 = 0, \delta_2 = 0$
We can view the above test as a special case of the White test
Procedure for the special case of the White test for heteroskedasticity:
1. Estimate the model by OLS as usual. Obtain the OLS residuals $\hat{u}$ and the fitted values $\hat{y}$. Compute the squared OLS residuals $\hat{u}^2$ and the squared fitted values $\hat{y}^2$
2. Run the regression of $\hat{u}^2$ on $\hat{y}$ and $\hat{y}^2$
3. Keep the R-squared from this regression, $R^2_{\hat{u}^2}$
4. Form the F statistic and compute the p-value (using the $F_{2,\,n-3}$ distribution)
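The procedure can be sketched as follows. This is a minimal plain-Python illustration, not the Stata workflow of the lecture: OLS is computed by solving the normal equations with a small Gauss-Jordan routine, which is adequate for toy data but not numerically robust. All function names and the deterministic toy data are hypothetical.

```python
def ols_fit(columns, y):
    """Coefficients (intercept first) from OLS of y on a constant and the given columns."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(rows[0])
    # Augmented normal equations [X'X | X'y]
    a = [[sum(r[j] * r[m] for r in rows) for m in range(p)]
         + [sum(r[j] * yi for r, yi in zip(rows, y))] for j in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(a[r][c]))
        a[c], a[pivot] = a[pivot], a[c]
        for r in range(p):
            if r != c:
                factor = a[r][c] / a[c][c]
                a[r] = [v - factor * w for v, w in zip(a[r], a[c])]
    return [a[j][p] / a[j][j] for j in range(p)]


def fitted_values(columns, y):
    """OLS fitted values of y given the regressor columns."""
    beta = ols_fit(columns, y)
    return [beta[0] + sum(b * col[i] for b, col in zip(beta[1:], columns))
            for i in range(len(y))]


def r_squared(columns, y):
    """R-squared from the OLS regression of y on a constant and the columns."""
    y_hat = fitted_values(columns, y)
    y_bar = sum(y) / len(y)
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ssr / sst


def white_special_case_f(columns, y):
    """F statistic from regressing squared residuals on y-hat and y-hat squared."""
    n = len(y)
    y_hat = fitted_values(columns, y)                        # step 1: fitted values
    u_sq = [(yi - fi) ** 2 for yi, fi in zip(y, y_hat)]      # step 1: squared residuals
    y_hat_sq = [fi ** 2 for fi in y_hat]
    r2 = r_squared([y_hat, y_hat_sq], u_sq)                  # steps 2-3: auxiliary R-squared
    return (r2 / 2) / ((1 - r2) / (n - 3))                   # step 4: F statistic


# Hypothetical toy data: two regressors, error size proportional to x1.
x1 = [float(i) for i in range(1, 41)]
x2 = [float((7 * i) % 11) for i in range(1, 41)]
u = [0.5 * i * (1 if i % 2 == 0 else -1) for i in range(1, 41)]
y = [1 + 2 * a + b + e for a, b, e in zip(x1, x2, u)]

f_white = white_special_case_f([x1, x2], y)
print(f"F = {f_white:.2f} with (2, {len(y) - 3}) degrees of freedom")
```

Whatever the number of regressors in the original model, the auxiliary regression here always has two regressors, which is where the saving in degrees of freedom comes from.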

Weighted Least Squares Estimation
If we can detect heteroskedasticity of a specific form, we can use the weighted least squares (WLS) method to correct for it
Suppose the heteroskedasticity is known up to a multiplicative constant, i.e.
$$\mathrm{Var}(u \mid x) = \sigma^2 h(x)$$
where $h(x)$ is some function of the explanatory variables that determines the heteroskedasticity
Since variances must be positive, $h(x) > 0$ for all possible values of the independent variables
For observation $i$, the heteroskedasticity takes the following form:
$$\sigma_i^2 = \mathrm{Var}(u_i \mid x_i) = \sigma^2 h(x_i) = \sigma^2 h_i$$

Consider the following simple savings function:
$$\mathit{sav}_i = \beta_0 + \beta_1 \mathit{inc}_i + u_i$$
$$\mathrm{Var}(u_i \mid \mathit{inc}_i) = \sigma^2 \mathit{inc}_i$$
Here, $h(x) = h(\mathit{inc}) = \mathit{inc}$, i.e. the variance of the error is proportional to the level of income
This means that, as income increases, the variability in savings increases
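How can we use the information in the form of heteroskedasticity to estimate the $\beta_j$?
Consider the original equation:
$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_k x_{ki} + u_i$$
Since $\mathrm{Var}(u_i \mid x_i) = E(u_i^2 \mid x_i) = \sigma^2 h_i$, the variance of $u_i/\sqrt{h_i}$ (conditional on $x_i$) is $\sigma^2$, i.e.
$$E\left[(u_i/\sqrt{h_i})^2 \mid x_i\right] = E(u_i^2 \mid x_i)/h_i = (\sigma^2 h_i)/h_i = \sigma^2$$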

We can divide the original equation by $\sqrt{h_i}$ to get
$$y_i/\sqrt{h_i} = \beta_0/\sqrt{h_i} + \beta_1 (x_{1i}/\sqrt{h_i}) + \beta_2 (x_{2i}/\sqrt{h_i}) + \cdots + \beta_k (x_{ki}/\sqrt{h_i}) + (u_i/\sqrt{h_i})$$
or
$$y_i^* = \beta_0 x_{0i}^* + \beta_1 x_{1i}^* + \cdots + \beta_k x_{ki}^* + u_i^*$$
where $x_{0i}^* = 1/\sqrt{h_i}$ and the other starred variables denote the corresponding original variables divided by $\sqrt{h_i}$
For our simple example of the savings function, the transformed equation is:
$$\mathit{sav}_i/\sqrt{\mathit{inc}_i} = \beta_0 (1/\sqrt{\mathit{inc}_i}) + \beta_1 \sqrt{\mathit{inc}_i} + u_i^*$$

Now, with this transformed equation, the assumption of homoskedasticity is satisfied
The OLS estimators from this transformed equation therefore have the appealing property of being BLUE (provided the other Gauss-Markov assumptions hold)
These estimators $\beta_0^*, \beta_1^*, \ldots, \beta_k^*$ will be different from the OLS estimators in the original equation
These estimators are referred to as weighted least squares (WLS) estimators, a special case of generalized least squares (GLS) estimators
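The transformation can be illustrated on the savings example with $h_i = \mathit{inc}_i$. The Python sketch below uses a handful of hypothetical income and savings figures (they are not from any data set used in this lecture) and hypothetical function names. It runs OLS without a constant on the transformed variables, then cross-checks the result against the direct weighted formula, which minimizes the sum of squared residuals weighted by $1/h_i$.

```python
from math import sqrt


def ols_no_constant(z1, z2, y):
    """OLS of y on z1 and z2 with no intercept, via the 2x2 normal equations."""
    s11 = sum(a * a for a in z1)
    s12 = sum(a * b for a, b in zip(z1, z2))
    s22 = sum(b * b for b in z2)
    s1y = sum(a * c for a, c in zip(z1, y))
    s2y = sum(b * c for b, c in zip(z2, y))
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def wls_by_transformation(x, y, h):
    """Divide y, the constant and x by sqrt(h), then run OLS without a constant."""
    root_h = [sqrt(v) for v in h]
    y_star = [yi / r for yi, r in zip(y, root_h)]
    x0_star = [1 / r for r in root_h]                 # transformed constant
    x1_star = [xi / r for xi, r in zip(x, root_h)]    # transformed regressor
    return ols_no_constant(x0_star, x1_star, y_star)


def wls_direct(x, y, h):
    """Minimize the sum of squared residuals weighted by 1/h (simple regression)."""
    w = [1 / v for v in h]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    slope = (sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
             / sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x)))
    return y_bar - slope * x_bar, slope


# Hypothetical data: error size proportional to sqrt(inc), so Var(u | inc) grows with inc.
inc = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
sign = [1, -1, 1, -1, 1, -1]
sav = [1 + 0.1 * v + s * 0.2 * sqrt(v) for v, s in zip(inc, sign)]

beta_transformed = wls_by_transformation(inc, sav, inc)   # h(inc) = inc
beta_direct = wls_direct(inc, sav, inc)
print("transformed regression:", beta_transformed)
print("direct weighted formula:", beta_direct)
```

The two approaches give the same estimates up to floating-point error, which is why the method is called weighted least squares: observations with a larger error variance receive less weight.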

STATA Example
In this example, we estimate an equation that explains net total financial wealth in terms of income
$$\mathit{nettfa} = \beta_0 + \beta_1 \mathit{inc} + u$$
We use the data on single people (fsize = 1) for the above regression:
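. keep if fsize == 1
(7258 observations deleted)

. reg nettfa inc

      Source |       SS       df       MS
-------------+------------------------------           F(  1,  2015) =  181.60
       Model |  377482.064     1  377482.064           Prob > F      =  0.0000
-------------+------------------------------
       Total |  4565965.05  2016  2264.86361           Root MSE      =  45.592

------------------------------------------------------------------------------
      nettfa |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         inc |   .8206815      .0609    13.48   0.000     .7012479     .940115
       _cons |  -10.57095   2.060678    -5.13   0.000    -14.61223   -6.529671
------------------------------------------------------------------------------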

STATA Example
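Suppose we assume that the error term has the variance: $\mathrm{Var}(u \mid \mathit{inc}) = \sigma^2 \mathit{inc}$
We can run the following WLS regression:
$$\mathit{nettfa}/\sqrt{\mathit{inc}} = \beta_0/\sqrt{\mathit{inc}} + \beta_1 \sqrt{\mathit{inc}} + u^*$$
where $u^* = u/\sqrt{\mathit{inc}}$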
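. gen nettfa1 = nettfa/sqrt(inc)
. gen invincsq = 1/sqrt(inc)
. gen inc1 = sqrt(inc)
. reg nettfa1 invincsq inc1, noc

      Source |       SS       df       MS
-------------+------------------------------           F(  2,  2015) =  138.24
       Model |   14410.242     2  7205.12101           Prob > F      =  0.0000
-------------+------------------------------
       Total |  119429.713  2017  59.2115585           Root MSE      =  7.2193

------------------------------------------------------------------------------
     nettfa1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    invincsq |  -9.580702   1.653284    -5.79   0.000    -12.82303   -6.338378
        inc1 |   .7870523   .0634814    12.40   0.000     .6625562    .9115484
------------------------------------------------------------------------------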
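To compare the two sets of results, recall that each t statistic is the coefficient divided by its standard error. The Python sketch below recomputes the t statistics for the income coefficient from the rounded figures in the two Stata outputs; the dictionary labels are hypothetical. In the WLS regression the slope $\beta_1$ is the coefficient on inc1, and the intercept $\beta_0$ is the coefficient on invincsq.

```python
# (Coef., Std. Err.) for the income coefficient, copied from the Stata outputs above.
income_results = {
    "OLS (inc)": (0.8206815, 0.0609),
    "WLS (inc1)": (0.7870523, 0.0634814),
}

# t statistic = coefficient / standard error
t_values = {label: coef / se for label, (coef, se) in income_results.items()}

for label, t in t_values.items():
    print(f"{label}: t = {t:.2f}")   # Stata reports 13.48 (OLS) and 12.40 (WLS)
```

The recomputed values match the t column of both outputs to two decimal places.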
