Heteroscedasticity: L. Dhliwayo

Heteroscedasticity
Lecture 3
L. Dhliwayo
University of Zimbabwe
ldhliwayo@science.uz.ac.zw
October 5, 2018
L. Dhliwayo (UZ) Econometrics October 5, 2018 1 / 24

Overview
1 Heteroscedasticity
2 Causes or sources of heteroscedasticity

Mis-specification
Stratification
Data Treatment
Data Collection Procedures
Administrative Interference
3 Impact of heteroscedasticity on estimation and inference
4 Estimation and Testing for heteroscedasticity
5 Goldfeld-Quandt Test
Bartlett's Test

Heteroscedasticity
The main goal of econometric modelling is to obtain accurate and efficient estimates of the relationships among the variables of an economic system, on the basis of which the main aims of Econometrics, namely
prediction,
planning and
control,
can be achieved.
The degree of success or failure in achieving these goals depends, to a large extent, on the degree of success achieved at the specification, estimation and diagnostic stages of the model-building process. At the estimation and inferential stages, the efficiency of the parameter estimates and the validity of the resulting inference depend largely on whether or not the fundamental assumptions of the general linear model (GLM) are satisfied.

In the econometric regression model
$$Y = X\beta + u$$
one of the basic assumptions is that the disturbance or error terms $\{u_t\}$ are homoscedastic, i.e. $\sigma^2(t) = \operatorname{var}(u_t) = \sigma^2$ is a constant for all $t$, or equivalently
$$\Omega = E(uu') = \sigma^2 I$$
where $u = (u_1, u_2, \ldots, u_n)'$. This assumption ensures, among other things, that the least squares estimator
$$\hat\beta = (X'X)^{-1}X'Y$$
is an efficient (best linear unbiased) estimator of $\beta$ and that the conventional t-test and F-test used to make inference about the model are valid statistical procedures. If the error terms in the regression model have unequal variances, we say that there is heteroscedasticity.
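A minimal pure-Python sketch of the least squares estimator: rather than inverting $X'X$ explicitly, it solves the normal equations $(X'X)\hat\beta = X'Y$ by Gaussian elimination. The helper names (`ols`, `solve`, `matmul`, `transpose`) are illustrative only; in practice a numerical linear-algebra library would be used. The usage lines fit $Y$ on an intercept and $X$ using the first five ordered observations of the Goldfeld-Quandt example in this lecture.

```python
def transpose(a):
    # Columns of a (list of rows) become rows.
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    # Product of two matrices stored as lists of rows.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def solve(a, b):
    # Solve a x = b by Gaussian elimination with partial pivoting.
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(m[i][k]))
        m[k], m[piv] = m[piv], m[k]
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= f * m[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):                          # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ols(X, y):
    # beta_hat solves (X'X) beta = X'y; X is an n x p design matrix.
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(a * b for a, b in zip(col, y)) for col in Xt]
    return solve(XtX, Xty)

# Usage: intercept column plus one regressor.
x = [1, 6, 9, 14, 15]
y = [26, 26, 32, 20, 30]
X = [[1.0, xi] for xi in x]
beta_hat = ols(X, y)
print(beta_hat)   # approximately [27.2030, -0.0448]
```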

Definition of Heteroscedasticity
Definition
Let $\{u_t\}$ be a time series. Then the series is said to be heteroscedastic if
$$\sigma^2(t) \neq \sigma^2(s) \quad \text{for some } t \neq s.$$
Heteroscedasticity thus refers to a situation where the variances of the error terms $\{u_t\}$ are unequal. Since this is an undesirable phenomenon, the GLM assumption requiring its absence is called the assumption of homogeneous variances, i.e. equal variances.
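To make the definition concrete, the sketch below simulates a hypothetical error series whose standard deviation grows with $t$. The choice $\sigma(t) = 1 + t/100$ is an arbitrary illustrative assumption, not taken from the lecture; comparing the sample variances of the two halves shows $\sigma^2(t)$ is not constant.

```python
import random
import statistics

# Hypothetical error series: u_t ~ N(0, sigma(t)^2) with sigma(t) = 1 + t/100,
# so var(u_t) changes with t, i.e. sigma^2(t) != sigma^2(s) for t != s.
random.seed(1)            # fixed seed so the run is reproducible
n = 2000
u = [random.gauss(0.0, 1.0 + t / 100) for t in range(n)]

# Sample variances of the first and second halves of the series.
var_first = statistics.variance(u[: n // 2])
var_second = statistics.variance(u[n // 2:])
print(f"first half: {var_first:.1f}, second half: {var_second:.1f}")
```

A homoscedastic series (constant `sigma`) would give roughly equal variances in the two halves.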
Homogeneous Variance Assumption

Whenever there is heteroscedasticity in the error terms, all inference, namely estimation, hypothesis testing and forecasting, must take into account the effects of heteroscedasticity for the conclusions to be valid.
We examine below some common causes of heteroscedasticity and how they can be avoided or taken into account when making statistical inference.
Causes or sources of heteroscedasticity
The assumption of spherical disturbances, as indicated earlier, involves the double assumption that:
1. the error terms have equal variances, and
2. the error terms are uncorrelated.
It is possible, of course, for the error series to be uncorrelated but with unequal variances. In this case the covariance matrix of the error terms is diagonal, but its diagonal elements are not all equal.
Heteroscedasticity can be caused by a number of factors. These include the following.
1. Mis-specification
Some economic variables, such as the Consumer Price Index (CPI) or GDP, tend to increase linearly or exponentially over time. If such variables are omitted from the regression, they are absorbed into the error term $u_t$, which will then exhibit changing variance. For example, suppose a model of the form
$$Y_t = \beta_0 + \beta_1 X_{1t} + \beta_2 X_{2t} + u_t$$
is wrongly specified as
$$Y_t = \beta_0 + \beta_1 X_{1t} + v_t.$$
Then $v_t = \beta_2 X_{2t} + u_t$, so if $\{X_{2t}\}$ is increasing with time, so will $\{v_t\}$.
Heteroscedasticity due to mis-specification, whether by the exclusion of important explanatory variables or by assuming a linear relation when in fact a non-linear relationship exists, is quite common. If it is detected, the solution is simply to correct the specification. Other treatments of the problem are also possible, as we will see later.
2. Stratification
Different economic units or populations are rarely homogeneous. Data for two different groups of populations can exhibit unequal variances for many reasons. For example:
1. income figures for low- and high-income groups, in general, show different variability or spread of values;
2. data for small firms will not show as much variability as data for large firms, whose economic activities are on a larger scale.
Smaller firms are unlikely to engage in extensive and/or competitive research and development, since they may not have the leverage, i.e. assets, liquidity, economies of scale, etc. As there are also greater risks involved in these activities, we would expect variability to be more pronounced for larger firms.

3. Data Treatment
Data manipulation, such as aggregation and grouping techniques, tends to produce marked heterogeneity. The use of indices and the choice or change of base year can also cause heteroscedasticity.

4. Data Collection Procedures
Sampling procedures such as cluster sampling can easily generate unequal variances.

5. Administrative Interference
Sometimes, for socio-political reasons, statistical data are interfered with so that some (types or groups of) figures are changed to make them appear larger or smaller than they really are. In addition, statistical acts and their enforcement can result in marked differences in data, especially for data collected during different periods.

Impact of heteroscedasticity on estimation and inference
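We have already seen the main effects of non-spherical disturbances. In particular, we have seen that if $\hat\beta$ is the OLS estimate of $\beta$ in the GLM $Y = X\beta + u$, then
$$\operatorname{cov}(\hat\beta) = (X'X)^{-1}(X'\Omega X)(X'X)^{-1}.$$
This reduces to the familiar $\sigma^2(X'X)^{-1}$ only when $\Omega = \sigma^2 I$. Under heteroscedasticity the conventional variance estimator is therefore biased, which leads to a general lack of accuracy in all subsequent inference. In particular, heteroscedasticity implies that
1. the OLS coefficient estimates, although still unbiased, are no longer efficient;
2. the variances of the estimated coefficients are misestimated (often underestimated) by the conventional OLS formula, so the usual t-test and F-test are no longer valid;
3. predictions from the estimated model are less precise, and their conventional prediction intervals are unreliable.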
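To see how the sandwich formula differs from $\sigma^2(X'X)^{-1}$, the sketch below uses a hypothetical design $x = 1, \ldots, 10$ with $\operatorname{var}(u_i) = x_i^2$ (an illustrative assumption, not from the lecture). For the slope of a simple regression the sandwich reduces to $\sum (x_i - \bar x)^2 \sigma_i^2 / S_{xx}^2$, while the homoscedastic formula gives $\sigma^2 / S_{xx}$; here $\sigma^2$ is taken as the average of the $\sigma_i^2$, a simplification standing in for what the usual estimator targets.

```python
# Assumed (hypothetical) setup: x = 1..10 and var(u_i) = x_i^2.
x = list(range(1, 11))
sigma2 = [xi ** 2 for xi in x]          # diagonal of Omega (assumption)

xbar = sum(x) / len(x)
sxx = sum((xi - xbar) ** 2 for xi in x)

# beta1_hat = sum((x_i - xbar) * y_i) / Sxx, so its exact variance is the
# slope element of (X'X)^{-1}(X' Omega X)(X'X)^{-1}:
var_sandwich = sum((xi - xbar) ** 2 * s2 for xi, s2 in zip(x, sigma2)) / sxx ** 2

# Homoscedastic formula sigma^2 / Sxx with sigma^2 = average of the sigma_i^2.
var_naive = (sum(sigma2) / len(sigma2)) / sxx

print(f"sandwich: {var_sandwich:.4f}, naive: {var_naive:.4f}")
# prints: sandwich: 0.5442, naive: 0.4667
```

With the variance increasing in $x$, the homoscedastic formula understates the true slope variance in this example.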

Heteroscedasticity can take on various forms. Typical or standard heteroscedasticity takes the form
$$\Omega = \operatorname{cov}(u, u) = E(uu') = \begin{pmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n^2 \end{pmatrix}$$
It is, however, not enough simply to assume that $\Omega = \operatorname{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2)$. We do not have enough degrees of freedom, as there are $(p + n)$ parameters
$$\beta_0, \ldots, \beta_{p-1} \quad \text{and} \quad \sigma_1^2, \ldots, \sigma_n^2$$
to be estimated from only $n$ observations. So, to make the analysis feasible, we must impose further restrictions on the structure of $\Omega$.
Estimation and Testing for heteroscedasticity
As indicated above, in order to estimate the parameters of a model accurately in the presence of heteroscedasticity, it is necessary to make some simplifying but reasonable assumptions.
Suppose that it is possible to identify groups $G_1, G_2, \ldots, G_m$ such that the error variances are homogeneous, i.e. equal within a group, but possibly differing from group to group. Let $n_g$, $g = 1, 2, \ldots, m$, be the number of observations in the $g$th group, so that $n_1 + n_2 + \cdots + n_m = n$ and $\Omega$ involves only the $m$ group variances $\sigma_1^2, \ldots, \sigma_m^2$ instead of $n$ separate variances.
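The grouping restriction amounts to building the diagonal of $\Omega$ from only $m$ distinct values. A small sketch: the group sizes (4, 4, 4) mirror the three income groups used in the examples, but the variances are hypothetical placeholders, and the function name `grouped_omega_diagonal` is illustrative.

```python
def grouped_omega_diagonal(group_sizes, group_variances):
    """diag(Omega): sigma_g^2 repeated n_g times for each group g."""
    if len(group_sizes) != len(group_variances):
        raise ValueError("need one variance per group")
    diag = []
    for n_g, s2 in zip(group_sizes, group_variances):
        diag.extend([s2] * n_g)
    return diag

sizes = [4, 4, 4]            # n_1, n_2, n_3
variances = [1.0, 2.0, 5.0]  # hypothetical sigma_1^2, sigma_2^2, sigma_3^2
p = 2                        # regression coefficients beta_0, beta_1

diag = grouped_omega_diagonal(sizes, variances)
n = len(diag)
params_unrestricted = p + n            # one variance per observation
params_grouped = p + len(variances)    # one variance per group
print(n, params_unrestricted, params_grouped)
```

With 12 observations the unrestricted form has 14 unknowns, while the grouped form has only 5.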

Goldfeld-Quandt Test
Testing for grouped and increasing i.e. ordered heteroscedasticity

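The tests for heteroscedasticity discussed here assume that the disturbance terms $\{u_t\}$ are
(i) uncorrelated, and
(ii) normally distributed.
The hypothesis to be tested is
$$H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_m^2 \quad \text{versus} \quad H_1: \sigma_1^2 \le \sigma_2^2 \le \cdots \le \sigma_m^2,$$
with at least one strict inequality under $H_1$.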
The test applies to a situation where the error variance is a function of an exogenous (i.e. explanatory) variable. The procedure is as follows.
1. Order the observations by the magnitude of the explanatory variable $X_t$.
2. Omit, say, $c$ central or middle observations. The value of $c$ is usually chosen so that about 10%, or at most 25%, of the observations are omitted.
3. Perform two separate OLS regressions, one on the first $(n-c)/2$ observations and one on the last $(n-c)/2$ observations.
4. Let $SSE_1$ and $SSE_2$ denote the residual sums of squares from the first and second regressions respectively, and let
$$SSE_{\max} = \max(SSE_1, SSE_2), \qquad SSE_{\min} = \min(SSE_1, SSE_2).$$
Then the test statistic
$$F = \frac{SSE_{\max}/[(n-c-2p)/2]}{SSE_{\min}/[(n-c-2p)/2]} = \frac{SSE_{\max}}{SSE_{\min}}$$
has, under the assumption of equal variances, an F-distribution with $[(n-c-2p)/2,\ (n-c-2p)/2]$ degrees of freedom, where $p$ is the number of model parameters. $H_0$ is rejected at significance level $\alpha$ if $F$ exceeds the upper $\alpha$ point of this distribution.
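The four steps can be sketched for the two-parameter model used in the example ($p = 2$). This is an illustrative standard-library implementation with hypothetical function names; each sub-regression's SSE is computed as $S_{yy} - S_{xy}^2/S_{xx}$, which is valid only for a regression on an intercept and a single regressor. It is applied to the household data of the example that follows. (statsmodels offers `het_goldfeldquandt` in `statsmodels.stats.diagnostic`, though its conventions for the split and dropped observations may differ from those here.)

```python
P = 2  # number of regression parameters (b0, b1)

def sse_simple(x, y):
    """Residual sum of squares from the OLS fit of y on an intercept and x."""
    n = len(x)
    sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    sxx = sum(a * a for a in x) - sum(x) ** 2 / n
    syy = sum(b * b for b in y) - sum(y) ** 2 / n
    return syy - sxy ** 2 / sxx          # SSE = SST - SSR

def goldfeld_quandt(x, y, c):
    """Return (F, df): F = SSEmax/SSEmin on (df, df) degrees of freedom."""
    n = len(x)
    if (n - c) % 2:
        raise ValueError("n - c must be even so both sub-samples have (n - c)/2 points")
    pairs = sorted(zip(x, y))            # step 1: order by the explanatory variable
    k = (n - c) // 2                     # observations in each sub-sample
    low, high = pairs[:k], pairs[n - k:] # step 2: drop the c middle observations
    sse1 = sse_simple(*zip(*low))        # step 3: OLS on the first k observations
    sse2 = sse_simple(*zip(*high))       #         and on the last k observations
    f_stat = max(sse1, sse2) / min(sse1, sse2)  # step 4
    df = (n - c - 2 * P) // 2
    return f_stat, df

# Household data from the example (income X, consumption Y).
income = [29, 20, 14, 21, 15, 9, 1, 6, 16, 31, 26, 25]
consumption = [22, 22, 20, 24, 30, 32, 26, 26, 12, 8, 13, 25]
f_stat, df = goldfeld_quandt(income, consumption, c=2)
print(f"F = {f_stat:.3f} on ({df}, {df}) df")   # F = 1.516 on (3, 3) df
```

The resulting $F$ is then compared with the tabulated upper $\alpha$ point of $F(3, 3)$.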
Example
The following table gives consumption expenditure (Y) and income (X) for 12 households, classified into three income groups.
Household   Consumption (Y)   Income (X)   Income group
    1             22              29             1
    2             22              20             1
    3             20              14             1
    4             24              21             1
    5             30              15             2
    6             32               9             2
    7             26               1             2
    8             26               6             2
    9             12              16             3
   10              8              31             3
   11             13              26             3
   12             25              25             3
Test for heteroscedasticity in the linear regression model $Y = \beta_0 + \beta_1 X + u$ using the Goldfeld-Quandt test with $c = 2$.
Bartlett's Test
Another test for grouped heteroscedasticity, but one less frequently used in Econometrics, is Bartlett's test. The testing procedure is as follows:
1. The hypothesis is
$$H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_m^2$$
2. The test statistic is
$$Q = \frac{\lambda}{C}$$
where
$$C = 1 + \frac{1}{3(m-1)}\left[\sum_{g=1}^{m}\frac{1}{n_g-1} - \frac{1}{\sum_{g=1}^{m}(n_g-1)}\right]$$
$$\lambda = \sum_{g=1}^{m}(n_g-1)\ln s^2 - \sum_{g=1}^{m}(n_g-1)\ln s_g^2$$
$$s_g^2 = \frac{1}{n_g-1}\sum_{j=1}^{n_g}(y_{gj} - \bar y_g)^2, \qquad \bar y_g = \frac{1}{n_g}\sum_{j=1}^{n_g} y_{gj}$$
$$s^2 = \frac{1}{\sum_{g=1}^{m}(n_g-1)}\sum_{g=1}^{m}(n_g-1)s_g^2$$
3. Under $H_0$, $Q$ approximately follows a $\chi^2$-distribution with $m-1$ degrees of freedom. Thus $H_0$ is rejected at the $\alpha$ significance level if $Q > \chi^2_{m-1,\alpha}$.
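The steps above translate directly into a short function. This is a sketch using only the standard library (the name `bartlett` is illustrative), applied to the consumption values of the three income groups in the household example.

```python
import math

def bartlett(groups):
    """Return (Q, df) for Bartlett's test of equal variances across groups."""
    m = len(groups)
    dfs = [len(g) - 1 for g in groups]                      # n_g - 1
    s2 = []
    for g in groups:
        ybar = sum(g) / len(g)                              # group mean ybar_g
        s2.append(sum((v - ybar) ** 2 for v in g) / (len(g) - 1))  # s_g^2
    total = sum(dfs)                                        # sum of (n_g - 1)
    pooled = sum(d * v for d, v in zip(dfs, s2)) / total    # pooled s^2
    lam = total * math.log(pooled) - sum(d * math.log(v) for d, v in zip(dfs, s2))
    c = 1 + (sum(1 / d for d in dfs) - 1 / total) / (3 * (m - 1))
    return lam / c, m - 1

# Consumption (Y) by income group, from the household example.
groups = [[22, 22, 20, 24], [30, 32, 26, 26], [12, 8, 13, 25]]
q, df = bartlett(groups)
print(f"Q = {q:.4f} on {df} df")   # Q = 5.4396 on 2 df
```

If SciPy is installed, `scipy.stats.bartlett(*groups)` computes the same corrected statistic and also returns a p-value, which gives an independent check.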

Example
Using the household consumption and income data of the previous example, with the households classified by income group, test for heteroscedasticity in the linear regression model $Y = \beta_0 + \beta_1 X + u$ using Bartlett's test.
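Solution to the Goldfeld-Quandt example
Ordering the observations by income gives
X: 1 6 9 14 15 16 20 21 25 26 29 31
Y: 26 26 32 20 30 12 22 24 25 13 22 8
With $n = 12$ and $c = 2$, the middle observations $X = 16$ and $X = 20$ are omitted, leaving $(n-c)/2 = 5$ observations in each sub-sample; the sums below run over those 5 observations.
First regression ($X = 1, 6, 9, 14, 15$):
$$S_{xy} = \sum xy - \tfrac{1}{5}\left(\sum x\right)\left(\sum y\right) = 1200 - \tfrac{1}{5}(45)(134) = -6$$
$$S_{xx} = \sum x^2 - \tfrac{1}{5}\left(\sum x\right)^2 = 539 - \tfrac{1}{5}(45)^2 = 134$$
$$S_{yy} = \sum y^2 - \tfrac{1}{5}\left(\sum y\right)^2 = 3676 - \tfrac{1}{5}(134)^2 = 84.8$$
$$SST_1 = S_{yy} = 84.8, \qquad SSR_1 = \frac{S_{xy}^2}{S_{xx}} = \frac{(-6)^2}{134} = 0.2687$$
$$SSE_1 = SST_1 - SSR_1 = 84.8 - 0.2687 = 84.5313$$
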
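Second regression ($X = 21, 25, 26, 29, 31$):
$$S_{xy} = \sum xy - \tfrac{1}{5}\left(\sum x\right)\left(\sum y\right) = 2353 - \tfrac{1}{5}(132)(92) = -75.8$$
$$S_{xx} = \sum x^2 - \tfrac{1}{5}\left(\sum x\right)^2 = 3544 - \tfrac{1}{5}(132)^2 = 59.2$$
$$S_{yy} = \sum y^2 - \tfrac{1}{5}\left(\sum y\right)^2 = 1918 - \tfrac{1}{5}(92)^2 = 225.2$$
$$SST_2 = S_{yy} = 225.2, \qquad SSR_2 = \frac{S_{xy}^2}{S_{xx}} = \frac{(-75.8)^2}{59.2} = 97.0547$$
$$SSE_2 = SST_2 - SSR_2 = 225.2 - 97.0547 = 128.1453$$
Hence $F = SSE_{\max}/SSE_{\min} = 128.1453/84.5313 \approx 1.516$, with $(n-c-2p)/2 = (12-2-4)/2 = 3$ and 3 degrees of freedom. Since this is well below the tabulated 5% point $F_{3,3;0.05} \approx 9.28$, we fail to reject $H_0$ at the 5% level.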
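The hand computations above can be checked in exact rational arithmetic, which also shows that the rounded values 0.2687 and 97.0547 come from the exact fractions 18/67 and 143641/1480. A sketch using Python's standard `fractions` module; the function name `sums_of_squares` is illustrative.

```python
from fractions import Fraction

def sums_of_squares(x, y):
    """Sxy, Sxx, Syy, SSR and SSE for OLS of y on an intercept and x, exactly."""
    n = len(x)
    sxy = sum(a * b for a, b in zip(x, y)) - Fraction(sum(x) * sum(y), n)
    sxx = sum(a * a for a in x) - Fraction(sum(x) ** 2, n)
    syy = sum(b * b for b in y) - Fraction(sum(y) ** 2, n)
    ssr = sxy ** 2 / sxx                 # SSR = Sxy^2 / Sxx
    return {"Sxy": sxy, "Sxx": sxx, "Syy": syy, "SSR": ssr, "SSE": syy - ssr}

first = sums_of_squares([1, 6, 9, 14, 15], [26, 26, 32, 20, 30])
last = sums_of_squares([21, 25, 26, 29, 31], [24, 25, 13, 22, 8])

for name, s in (("first", first), ("last", last)):
    # Round only for display; the stored values are exact fractions.
    print(name, {k: round(float(v), 4) for k, v in s.items()})
```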

               Group 1   Group 2   Group 3
               (Y1)      (Y2)      (Y3)
                 22        30        12
                 22        32         8
                 20        26        13
                 24        26        25
n_g − 1           3         3         3
s_g²         2.6667         9   53.6667
ln s_g²      0.9808    2.1972    3.9828
$$s^2 = \frac{1}{\sum_{g=1}^{m}(n_g-1)}\sum_{g=1}^{m}(n_g-1)s_g^2 = \frac{1}{9}\left[3(2.6667) + 3(9) + 3(53.6667)\right] = 21.7778$$
$$\ln s^2 = \ln(21.7778) = 3.08089$$

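$$C = 1 + \frac{1}{3(m-1)}\left[\sum_{g=1}^{m}\frac{1}{n_g-1} - \frac{1}{\sum_{g=1}^{m}(n_g-1)}\right] = 1 + \frac{1}{(3)(2)}\left[\left(\frac{1}{3} + \frac{1}{3} + \frac{1}{3}\right) - \frac{1}{3+3+3}\right] = \frac{31}{27}$$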
$$\lambda = \sum_{g=1}^{m}(n_g-1)\ln s^2 - \sum_{g=1}^{m}(n_g-1)\ln s_g^2 = \left[3(3.08089) + 3(3.08089) + 3(3.08089)\right] - \left[3(0.9808) + 3(2.1972) + 3(3.9828)\right] = 6.2455$$
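$$Q = \frac{\lambda}{C} = \frac{6.2455}{31/27} = 5.4396, \qquad \chi^2_{m-1,\alpha} = \chi^2_{2,0.05} = 5.991$$

Since $Q = 5.4396 < 5.991$, we fail to reject $H_0$ at the 5% level: the data give no significant evidence that the variances of the three income groups differ.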
The End
