ECON 413 Introductory Econometrics

Chapter 12 – Heteroskedasticity

November 21, 2011

This Lecture

We are going to relax the assumption that the error terms are homoskedastic, i.e. that

$$\operatorname{Var}(u \mid x_1, x_2, \ldots, x_k) = \sigma^2$$

Notice that the homoskedasticity assumption plays no role in showing that the OLS estimator is unbiased and consistent
However, it does affect statistical inference
i.e. the usual t tests, F tests and confidence intervals are all constructed under the assumption of homoskedasticity
Also, if heteroskedasticity is present, the OLS estimator is no longer the best linear unbiased estimator (BLUE)

Testing for Heteroskedasticity

Consider the following linear model:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + u$$

We would like to test the null hypothesis that:

$$\operatorname{Var}(u \mid x_1, x_2, \ldots, x_k) = \sigma^2$$

In order to test for a violation of the homoskedasticity assumption, we want to test whether $u^2$ is related (in expected value) to one or more of the explanatory variables
A simple way is to assume a linear function:

$$u^2 = \delta_0 + \delta_1 x_1 + \delta_2 x_2 + \cdots + \delta_k x_k + v$$

We are testing the null hypothesis:

$$H_0: \delta_1 = \delta_2 = \cdots = \delta_k = 0$$

Testing Heteroskedasticity

We never observe the actual errors in the population model, but we do have estimates of them: the OLS residual $\hat{u}_i$ is an estimate of the error $u_i$ for observation $i$
Thus, we can estimate the equation:

$$\hat{u}_i^2 = \delta_0 + \delta_1 x_{1i} + \delta_2 x_{2i} + \cdots + \delta_k x_{ki} + \text{error}$$

Let $R^2_{\hat{u}^2}$ be the R-squared from estimating the above equation
Then the F statistic is:

$$F = \frac{R^2_{\hat{u}^2}/k}{\left(1 - R^2_{\hat{u}^2}\right)/(n - k - 1)} \sim F_{k,\,n-k-1}$$

The above procedure is called the Breusch-Pagan test for heteroskedasticity (BP test)

The Breusch-Pagan Test

1 Estimate the model by OLS as usual. Obtain the squared OLS residuals, $\hat{u}^2$
2 Regress $\hat{u}^2$ on the independent variables $(x_1, x_2, \ldots, x_k)$. Keep the R-squared from this regression, $R^2_{\hat{u}^2}$
3 Form the F statistic and compute the p-value (using the $F_{k,\,n-k-1}$ distribution)
4 If the p-value is sufficiently small, that is, below the chosen significance level, then we reject the null hypothesis of homoskedasticity
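The four steps above can be sketched in plain Python (standard library only), as an illustration outside STATA. This is a minimal sketch under stated assumptions, not STATA's implementation: the helper names `ols`, `predict`, `r_squared` and `breusch_pagan` are hypothetical, the data are made up so that the error spread grows with x, and OLS is solved through the normal equations, which is fine for a handful of regressors but not numerically robust in general. The function stops at the F statistic; the p-value then comes from the $F_{k,\,n-k-1}$ distribution (for example `scipy.stats.f.sf(F, k, n - k - 1)` if SciPy is available).

```python
# Minimal Breusch-Pagan (F version) sketch using only the standard library.
# Hypothetical helpers; OLS is solved through the normal equations X'X b = X'y.

def ols(X, y):
    """Return OLS coefficients for rows X (each row already includes any constant)."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def predict(X, b):
    """Fitted values X b."""
    return [sum(bj * xj for bj, xj in zip(b, row)) for row in X]


def r_squared(y, fitted):
    """1 - SSR/SST for a regression that includes a constant."""
    ybar = sum(y) / len(y)
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - ssr / sst


def breusch_pagan(x_rows, y):
    """x_rows: one list of the k regressors per observation (no constant)."""
    n, k = len(x_rows), len(x_rows[0])
    X = [[1.0] + list(row) for row in x_rows]
    # Step 1: OLS of y on the x's, then square the residuals
    b = ols(X, y)
    u2 = [(yi - fi) ** 2 for yi, fi in zip(y, predict(X, b))]
    # Step 2: regress the squared residuals on the same x's, keep R-squared
    d = ols(X, u2)
    r2 = r_squared(u2, predict(X, d))
    # Step 3: F statistic with (k, n - k - 1) degrees of freedom
    f = (r2 / k) / ((1.0 - r2) / (n - k - 1))
    return f, r2, (k, n - k - 1)


# Hypothetical data whose error spread grows with x
xs = [[float(i)] for i in range(1, 41)]
ys = [1.0 + 2.0 * i + i * (-1) ** i for i in range(1, 41)]
f_stat, r2_aux, df = breusch_pagan(xs, ys)
print(f"F = {f_stat:.2f} with df = {df}, auxiliary R-squared = {r2_aux:.3f}")
```

Here the error is $\pm x$, so the squared residuals grow roughly like $x^2$ and the test should reject homoskedasticity by a wide margin.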

STATA example

We use the housing price data in HPRICE1.DTA to test for heteroskedasticity in a simple house price equation
We first estimate the equation using the levels of all variables:

. reg price lotsize sqrft bdrms

      Source |       SS       df       MS              Number of obs =      88
-------------+------------------------------           F(  3,    84) =   57.46
       Model |  617130.701     3  205710.234           Prob > F      =  0.0000
    Residual |  300723.805    84   3580.0453           R-squared     =  0.6724
-------------+------------------------------           Adj R-squared =  0.6607
       Total |  917854.506    87  10550.0518           Root MSE      =  59.833

       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lotsize |   .0020677   .0006421     3.22   0.002     .0007908    .0033446
       sqrft |   .1227782   .0132374     9.28   0.000     .0964541    .1491022
       bdrms |   13.85252   9.010145     1.54   0.128    -4.065141    31.77018
       _cons |  -21.77031   29.47504    -0.74   0.462    -80.38466    36.84405

STATA Example
To detect heteroskedasticity, we can first inspect the residual plot to see whether the variance of the residuals changes systematically with the fitted values
The residual-versus-fitted plot can be generated with the command rvfplot

[Figure: residual-versus-fitted plot (rvfplot) for the levels regression; vertical axis: Residuals, horizontal axis: Fitted values]

In the residual plot, the residuals appear more dispersed as the fitted values become larger
We should therefore carry out the Breusch-Pagan test for heteroskedasticity

STATA Example

To test for heteroskedasticity using the Breusch-Pagan test, we first obtain the residuals from our regression with the command predict, giving the new variable a name and using the residual option
Then we obtain the squared residuals with the command gen
We then regress the squared residuals on the independent variables
The result shows that the p-value for the F statistic is 0.002, which is strong evidence against the null hypothesis of homoskedasticity

. predict res1, residual

. gen res1sq = res1*res1

. reg res1sq lotsize sqrft bdrms

      Source |       SS       df       MS              Number of obs =      88
-------------+------------------------------           F(  3,    84) =    5.34
       Model |   701213780     3   233737927           Prob > F      =  0.0020
    Residual |  3.6775e+09    84  43780003.5           R-squared     =  0.1601
-------------+------------------------------           Adj R-squared =  0.1301
       Total |  4.3787e+09    87  50330276.7           Root MSE      =  6616.6

      res1sq |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lotsize |   .2015209   .0710091     2.84   0.006     .0603116    .3427302
       sqrft |   1.691037    1.46385     1.16   0.251    -1.219989    4.602063
       bdrms |    1041.76    996.381     1.05   0.299    -939.6526    3023.173
       _cons |  -5522.795   3259.478    -1.69   0.094    -12004.62    959.0348
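As a check on the Breusch-Pagan F formula, we can recompute the F statistic and its p-value from the R-squared reported above ($R^2_{\hat{u}^2} = 0.1601$, $n = 88$, $k = 3$). The sketch below is an illustration, not STATA's code: it computes the upper-tail probability of the F distribution through the regularized incomplete beta function (the standard continued-fraction method from numerical references) instead of calling a statistics library, and the helper names are hypothetical. Because STATA rounds R-squared to four decimals, the recomputed values match the reported F = 5.34 and p-value = 0.0020 only up to rounding.

```python
from math import exp, lgamma, log


def _betacf(a, b, x, max_iter=200, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x)
    front = exp(log_front)
    # Evaluate the continued fraction on the side where it converges quickly
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_sf(f, d1, d2):
    """P(F > f) for F ~ F(d1, d2); equals I_x(d2/2, d1/2) with x = d2 / (d2 + d1 f)."""
    return betainc_reg(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


def f_from_r2(r2, n, k):
    """F statistic for H0: all k slope coefficients are zero, computed from R-squared."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


# Numbers from the auxiliary regression of res1sq on lotsize, sqrft, bdrms
f_levels = f_from_r2(0.1601, n=88, k=3)
print(f"F = {f_levels:.2f}, p-value = {f_sf(f_levels, 3, 84):.4f}")
```

The same two functions apply to any Breusch-Pagan auxiliary regression once its R-squared, $n$ and $k$ are known.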

STATA Example

One way to reduce heteroskedasticity is to use logarithmic functional forms for the dependent and independent variables
Consider the same example, but now, instead of running the regression in levels, we estimate the logged version of the regression:

. reg lprice llotsize lsqrft bdrms

      Source |       SS       df       MS              Number of obs =      88
-------------+------------------------------           F(  3,    84) =   50.42
       Model |  5.15504028     3  1.71834676           Prob > F      =  0.0000
    Residual |  2.86256324    84  .034078134           R-squared     =  0.6430
-------------+------------------------------           Adj R-squared =  0.6302
       Total |  8.01760352    87  .092156362           Root MSE      =   .1846

      lprice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    llotsize |   .1679667   .0382812     4.39   0.000     .0918404     .244093
      lsqrft |   .7002324   .0928652     7.54   0.000     .5155597    .8849051
       bdrms |   .0369584   .0275313     1.34   0.183    -.0177906    .0917074
       _cons |  -1.297042   .6512836    -1.99   0.050    -2.592191    -.001893

STATA Example

We can inspect the residual plot:

[Figure: residual-versus-fitted plot (rvfplot) for the log regression; vertical axis: Residuals, horizontal axis: Fitted values]

Heteroskedasticity appears to have been reduced

STATA Example

We can conduct the Breusch-Pagan Test:

. predict res2,residual

. gen res2sq = res2*res2

. reg res2sq llotsize lsqrft bdrms

      Source |       SS       df       MS              Number of obs =      88
-------------+------------------------------           F(  3,    84) =    1.41
       Model |  .022620168     3  .007540056           Prob > F      =  0.2451
    Residual |  .448717194    84  .005341871           R-squared     =  0.0480
-------------+------------------------------           Adj R-squared =  0.0140
       Total |  .471337362    87  .005417671           Root MSE      =  .07309

      res2sq |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    llotsize |  -.0070156   .0151563    -0.46   0.645    -.0371556    .0231244
      lsqrft |  -.0627368   .0367673    -1.71   0.092    -.1358526    .0103791
       bdrms |   .0168407   .0109002     1.54   0.126    -.0048356     .038517
       _cons |    .509994    .257857     1.98   0.051    -.0027829    1.022771

The p-value of the F test is 0.245, which means we fail to reject the null hypothesis of homoskedasticity in the model with logarithmic functional forms

The White Test for Heteroskedasticity

The homoskedasticity assumption can in fact be replaced by the weaker assumption that the squared error $u^2$ is uncorrelated with all the independent variables ($x_j$), the squares of the independent variables ($x_j^2$), and all the cross products ($x_j x_h$ for $j \neq h$)
This is the basis of the White test for heteroskedasticity
To test a model with $k = 3$ independent variables, the White test is based on estimating

$$\hat{u}^2 = \delta_0 + \delta_1 x_1 + \delta_2 x_2 + \delta_3 x_3 + \delta_4 x_1^2 + \delta_5 x_2^2 + \delta_6 x_3^2 + \delta_7 x_1 x_2 + \delta_8 x_1 x_3 + \delta_9 x_2 x_3 + \text{error}$$

and testing the joint significance of $\delta_1, \ldots, \delta_9$
The difference between the White test and the Breusch-Pagan test is that the former also includes the squares and cross products of the independent variables
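To see how quickly the full White test uses up degrees of freedom, the sketch below builds the regressors of the auxiliary regression for a single observation: the levels, the squares and the pairwise cross products. The function name `white_regressors` and the numbers are hypothetical; with $k = 3$ it returns the nine regressors $x_1, x_2, x_3, x_1^2, x_2^2, x_3^2, x_1 x_2, x_1 x_3, x_2 x_3$, in the same order as $\delta_1$ through $\delta_9$ in the equation above.

```python
from itertools import combinations


def white_regressors(row):
    """Levels, squares, and pairwise cross products of one observation's x's."""
    levels = list(row)
    squares = [x * x for x in row]
    cross = [a * b for a, b in combinations(row, 2)]  # x1*x2, x1*x3, x2*x3, ...
    return levels + squares + cross


# Hypothetical observation with x1 = 2, x2 = 3, x3 = 5
print(white_regressors([2.0, 3.0, 5.0]))
# [2.0, 3.0, 5.0, 4.0, 9.0, 25.0, 6.0, 10.0, 15.0]
```

In general there are $k + k + k(k-1)/2 = k(k+3)/2$ regressors besides the intercept, so a model with $k = 6$ would already need 27.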

The White Test

The formulation of the White test above uses many regressors: with $k = 3$ there are already nine, and the number grows quickly with $k$, using up degrees of freedom
However, we can preserve the spirit of the White test while conserving degrees of freedom by using the OLS fitted values in the test
Recall that the fitted values are defined as:

$$\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_{1i} + \hat\beta_2 x_{2i} + \cdots + \hat\beta_k x_{ki}$$

Squaring the fitted values gives a particular function of all the squares and cross products of the independent variables
This suggests testing for heteroskedasticity by estimating the equation:

$$\hat{u}^2 = \delta_0 + \delta_1 \hat{y} + \delta_2 \hat{y}^2 + \text{error}$$

where $\hat{y}$ stands for the fitted values
It is important not to confuse $\hat{y}$ and $y$ in this equation
We use the fitted values because they are functions of the independent variables; using $y$ does not produce a valid test for heteroskedasticity

The White Test

We can use the F statistic for the null hypothesis $H_0: \delta_1 = 0, \delta_2 = 0$
We can view the above test as a special case of the White test
The procedure for this special case of the White test for heteroskedasticity:
1 Estimate the model by OLS as usual. Obtain the OLS residuals $\hat{u}$ and the fitted values $\hat{y}$. Compute the squared OLS residuals $\hat{u}^2$ and the squared fitted values $\hat{y}^2$
2 Regress $\hat{u}^2$ on $\hat{y}$ and $\hat{y}^2$
3 Keep the R-squared from this regression, $R^2_{\hat{u}^2}$
4 Form the F statistic and compute the p-value (using the $F_{2,\,n-3}$ distribution)
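The four steps of this special case can be sketched in plain Python. The only change from the Breusch-Pagan procedure is that the auxiliary regression uses $\hat{y}$ and $\hat{y}^2$ instead of the x's, so the degrees of freedom are always $(2, n-3)$, however many regressors the model has. This is a minimal standard-library illustration, not STATA's implementation; the helper names (`ols`, `predict`, `r_squared`, `special_white`) are hypothetical and the data are made up.

```python
# Special case of the White test, standard library only (hypothetical helpers).

def ols(X, y):
    """OLS coefficients via the normal equations, Gauss-Jordan with pivoting."""
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def predict(X, b):
    return [sum(bj * xj for bj, xj in zip(b, row)) for row in X]


def r_squared(y, fitted):
    ybar = sum(y) / len(y)
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - ssr / sst


def special_white(x_rows, y):
    """Regress u-hat^2 on y-hat and y-hat^2; return F, R-squared and df."""
    n = len(y)
    X = [[1.0] + list(row) for row in x_rows]
    # Step 1: OLS residuals and fitted values from the original model
    b = ols(X, y)
    yhat = predict(X, b)
    u2 = [(yi - fi) ** 2 for yi, fi in zip(y, yhat)]
    # Steps 2-3: auxiliary regression on a constant, y-hat and y-hat^2
    Z = [[1.0, f, f * f] for f in yhat]
    r2 = r_squared(u2, predict(Z, ols(Z, u2)))
    # Step 4: F statistic with (2, n - 3) degrees of freedom
    f_stat = (r2 / 2) / ((1.0 - r2) / (n - 3))
    return f_stat, r2, (2, n - 3)


# Hypothetical data whose error spread grows with x
xs = [[float(i)] for i in range(1, 41)]
ys = [1.0 + 2.0 * i + i * (-1) ** i for i in range(1, 41)]
f_stat, r2_aux, df = special_white(xs, ys)
print(f"F = {f_stat:.2f} with df = {df}, auxiliary R-squared = {r2_aux:.3f}")
```

Adding more regressors to `x_rows` changes $\hat{y}$ but not the size of the auxiliary regression, which is the saving in degrees of freedom described above.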

Weighted Least Squares Estimation

If we can detect heteroskedasticity of a specific form, we can use weighted least squares to correct for it
Suppose the heteroskedasticity is known up to a multiplicative constant, i.e.

$$\operatorname{Var}(u \mid \mathbf{x}) = \sigma^2 h(\mathbf{x})$$

where $h(\mathbf{x})$ is some function of the explanatory variables that determines the heteroskedasticity
Since variances must be positive, $h(\mathbf{x}) > 0$ for all possible values of the independent variables
For observation $i$, we write the heteroskedasticity as:

$$\sigma_i^2 = \operatorname{Var}(u_i \mid \mathbf{x}_i) = \sigma^2 h(\mathbf{x}_i) = \sigma^2 h_i$$

Consider the following simple savings function:

$$sav_i = \beta_0 + \beta_1 inc_i + u_i, \qquad \operatorname{Var}(u_i \mid inc_i) = \sigma^2 inc_i$$

Here, $h(x) = h(inc) = inc$, i.e. the variance of the error is proportional to the level of income
This means that, as income increases, the variability in savings increases
How can we use the information on the form of the heteroskedasticity to estimate the $\beta_j$?
Consider the original equation:

$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_k x_{ki} + u_i$$

Since $\operatorname{Var}(u_i \mid \mathbf{x}_i) = E(u_i^2 \mid \mathbf{x}_i) = \sigma^2 h_i$, the variance of $u_i/\sqrt{h_i}$ (conditional on $\mathbf{x}_i$) is $\sigma^2$, i.e.

$$E\left[\left(u_i/\sqrt{h_i}\right)^2 \mid \mathbf{x}_i\right] = E(u_i^2 \mid \mathbf{x}_i)/h_i = (\sigma^2 h_i)/h_i = \sigma^2$$

We can divide the original equation by $\sqrt{h_i}$ to get

$$\frac{y_i}{\sqrt{h_i}} = \frac{\beta_0}{\sqrt{h_i}} + \beta_1 \frac{x_{1i}}{\sqrt{h_i}} + \beta_2 \frac{x_{2i}}{\sqrt{h_i}} + \cdots + \beta_k \frac{x_{ki}}{\sqrt{h_i}} + \frac{u_i}{\sqrt{h_i}}$$

or

$$y_i^* = \beta_0 x_{i0}^* + \beta_1 x_{i1}^* + \cdots + \beta_k x_{ik}^* + u_i^*$$

where $x_{i0}^* = 1/\sqrt{h_i}$ and the other starred variables denote the corresponding original variables divided by $\sqrt{h_i}$
In our simple savings example, the transformed equation is:

$$sav_i/\sqrt{inc_i} = \beta_0 \left(1/\sqrt{inc_i}\right) + \beta_1 \sqrt{inc_i} + u_i^*$$

With this transformed equation, the homoskedasticity assumption is satisfied: $\operatorname{Var}(u_i^* \mid \mathbf{x}_i) = \sigma^2$
The OLS estimators from the transformed equation therefore have the appealing property of being BLUE (given the other Gauss-Markov assumptions)
These estimators $\beta_0^*, \beta_1^*, \ldots, \beta_k^*$ will differ from the OLS estimators in the original equation
They are called weighted least squares (WLS) estimators, a special case of generalized least squares (GLS)
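The transformation can be sketched for a general, known $h(\mathbf{x})$: divide $y$, the constant and every $x_j$ by $\sqrt{h_i}$, then run OLS without a separate intercept, because the column $1/\sqrt{h_i}$ carries $\beta_0$. This is a minimal standard-library illustration assuming $h_i$ is known and positive; the function names and the savings/income numbers are hypothetical. Running OLS on the transformed data minimizes $\sum_i (y_i - b_0 - b_1 x_{1i} - \cdots - b_k x_{ki})^2 / h_i$, which is why the method is called weighted least squares with weights $1/h_i$.

```python
from math import sqrt


def ols(X, y):
    """OLS coefficients via the normal equations, Gauss-Jordan with pivoting."""
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def wls(x_rows, y, h):
    """WLS when Var(u|x) = sigma^2 * h(x): OLS on the equation divided by sqrt(h_i)."""
    X_star, y_star = [], []
    for row, yi, hi in zip(x_rows, y, h):
        s = sqrt(hi)  # requires h_i > 0
        X_star.append([1.0 / s] + [xj / s for xj in row])  # x*_{i0} = 1/sqrt(h_i)
        y_star.append(yi / s)
    # No separate constant: the 1/sqrt(h_i) column plays the role of the intercept
    return ols(X_star, y_star)


# Hypothetical savings/income data with h(inc) = inc
inc = [8.0, 12.0, 15.0, 20.0, 25.0, 30.0, 40.0, 55.0]
sav = [0.9, 1.1, 2.0, 1.6, 3.1, 2.4, 4.5, 3.8]
b0, b1 = wls([[v] for v in inc], sav, h=inc)
print(f"beta0 = {b0:.3f}, beta1 = {b1:.4f}")
```

Observations with larger $h_i$ (here, higher income) get less weight, because their errors are noisier.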

STATA Example

In this example, we estimate an equation that explains net total financial wealth in terms of income:

$$nettfa = \beta_0 + \beta_1 inc + u$$

We use the data on single people (fsize = 1) for the above regression:

. keep if fsize == 1
(7258 observations deleted)

. reg nettfa inc

      Source |       SS       df       MS              Number of obs =    2017
-------------+------------------------------           F(  1,  2015) =  181.60
       Model |  377482.064     1  377482.064           Prob > F      =  0.0000
    Residual |  4188482.98  2015   2078.6516           R-squared     =  0.0827
-------------+------------------------------           Adj R-squared =  0.0822
       Total |  4565965.05  2016  2264.86361           Root MSE      =  45.592

      nettfa |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         inc |   .8206815      .0609    13.48   0.000     .7012479     .940115
       _cons |  -10.57095   2.060678    -5.13   0.000    -14.61223   -6.529671

STATA Example
Suppose we assume that the error variance is $\operatorname{Var}(u \mid inc) = \sigma^2 inc$
We can then run the following WLS regression:

$$nettfa/\sqrt{inc} = \beta_0 \left(1/\sqrt{inc}\right) + \beta_1 \sqrt{inc} + u^*$$

. gen nettfa1 = nettfa/sqrt(inc)

. gen invincsq = 1/sqrt(inc)

. gen inc1 = sqrt(inc)

. reg nettfa1 invincsq inc1, noc

      Source |       SS       df       MS              Number of obs =    2017
-------------+------------------------------           F(  2,  2015) =  138.24
       Model |   14410.242     2  7205.12101           Prob > F      =  0.0000
    Residual |  105019.471  2015  52.1188444           R-squared     =  0.1207
-------------+------------------------------           Adj R-squared =  0.1198
       Total |  119429.713  2017  59.2115585           Root MSE      =  7.2193

     nettfa1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    invincsq |  -9.580702   1.653284    -5.79   0.000    -12.82303   -6.338378
        inc1 |   .7870523   .0634814    12.40   0.000     .6625562    .9115484
