
Heteroscedasticity

Lecture 3

L. Dhliwayo

University of Zimbabwe
ldhliwayo@science.uz.ac.zw

October 5, 2018

L. Dhliwayo (UZ) Econometrics October 5, 2018 1 / 24


Overview

1 Heteroscedasticity

2 Causes or sources of heteroscedasticity


Mis-specification
Stratification
Data Treatment
Data Collection Procedures
Administrative Interference

3 Impact of heteroscedasticity on estimation and inference

4 Estimation and Testing for heteroscedasticity

5 Goldfeld-Quandt Test
Bartlett's Test



Heteroscedasticity

The main goal or objective of econometric modelling is to obtain accurate
and efficient estimates of relationships among the variables of an economic
system, on the basis of which the main aims of Econometrics, namely

prediction,
planning and
control, can be effected.

The degree of success or failure in achieving these goals depends, to a large
extent, on the degree of success achieved at the specification, estimation
and diagnostic stages of the model building process. At the estimation and
inferential stages, the efficiency of parameter estimates and the validity of
the resulting inference depend largely on whether or not the
fundamental assumptions of the GLM are satisfied.



In the econometric regression model

\[ Y_t = X_t'\beta + u_t, \qquad t = 1, 2, \ldots, n \]

one of the basic assumptions is that the disturbance or error terms $\{u_t\}$
are homoscedastic, i.e. $\sigma^2(t) = \mathrm{var}(u_t) = \sigma^2$ is a constant for all $t$, or
equivalently
\[ \Omega = E(uu') = \sigma^2 I \]
where $u = (u_1, u_2, \ldots, u_n)'$. This assumption implies, among other things, that the
least squares estimator
\[ \hat{\beta} = (X'X)^{-1} X'Y \]
(with $X$ the matrix whose rows are $X_t'$) is an efficient estimator of $\beta$ and that
the conventional t-test and F-test used to make inference about the model are valid
statistical procedures. If the error terms in the regression model have unequal
variances we say that there is heteroscedasticity.



Definition of Heteroscedasticity
Definition
Let $\{u_t\}$ be a time series. Then the series is said to be heteroscedastic if

\[ \sigma^2(t) \neq \sigma^2(s) \quad \text{for some } t \neq s. \]

Heteroscedasticity refers to a situation where the variances of the error
terms $\{u_t\}$ are unequal. Since this is an undesirable phenomenon, the
GLM assumption referring to the absence of this phenomenon is called the
assumption of homogeneous variances, i.e. equal variances.

Homogeneous Variance Assumption


Whenever there is heteroscedasticity in the error terms all inference
namely estimation, hypothesis testing and forecasting must take into
account the effects of heteroscedasticity for the conclusions to be valid.

We examine below some common causes of heteroscedasticity and how
they can be avoided or taken into account when making statistical
inference.

Causes or sources of heteroscedasticity

The assumption of spherical disturbances, as indicated earlier, involves the
double assumption that:
1. the error terms have equal variances, and
2. the error terms are uncorrelated.
It is possible, of course, for the error series to be uncorrelated but with
unequal variances. In this case the covariance matrix for the error terms
will be diagonal.

Heteroscedasticity can be caused by a number of factors. These include,



1. Mis-specification
Some economic variables such as the Consumer Price Index (CPI) or GDP tend
to increase linearly or exponentially. If such variables are omitted from the
regression they will be absorbed in the error term $u_t$, which will then
exhibit changing variance. For example, suppose a model of the form

\[ Y_t = \beta_0 + \beta_1 X_{1t} + \beta_2 X_{2t} + u_t \]

is wrongly specified as

\[ Y_t = \beta_0 + \beta_1 X_{1t} + v_t. \]

Then $v_t = \beta_2 X_{2t} + u_t$, so if $\{X_{2t}\}$ is increasing with time, so will $\{v_t\}$.

Heteroscedasticity due to mis-specification, whether by exclusion of important
explanatory variables or by assuming a linear relation when in fact a
non-linear relationship exists, is quite common. The solution to the
problem, if detected, is simply to correct the specification. Other
treatments of the problem are also possible, as we will see later.
2. Stratification

Different economic units or populations are hardly homogeneous. Data for
two different groups of populations can exhibit unequal variances for many
reasons. For example:
1. income figures for low and high income groups, in general, show
different variability or spread of values.
2. data for small firms will not show as much variability as data for large
firms, whose economic activities are on a larger scale.

Smaller firms are unlikely to engage in extensive and/or competitive
research and development since they may not have the leverage, i.e.
assets, liquidity, economies of scale etc. As there are also greater risks
involved in these activities, we would expect variability to be more
pronounced for larger firms.



3. Data Treatment

Data manipulation such as data aggregation and grouping techniques tends
to produce marked heterogeneity. Use of indices and choice or change of
base year can cause heteroscedasticity.



4. Data Collection Procedures

Sampling procedures such as cluster sampling can easily generate unequal
variances.



5. Administrative Interference

Sometimes, and for some socio-political reasons, statistical data are
interfered with so that some (types or groups of) figures are changed so as
to make them appear larger or smaller than they really are. In
addition, statistical acts and their enforcement can result in marked
differences in data, especially for data collected during different periods.



Impact of heteroscedasticity on estimation and inference

We have already seen the main effects of non-spherical disturbances. In
particular we have seen that if $\hat{\beta}$ is the OLS estimate of $\beta$ in the GLM
$Y = X\beta + u$, then

\[ \mathrm{cov}(\hat{\beta}) = (X'X)^{-1} (X'\Omega X) (X'X)^{-1} \]

which reduces to the conventional $\sigma^2 (X'X)^{-1}$ only when $\Omega = \sigma^2 I$.
Using the conventional formula therefore implies biased variance estimation
and hence a general lack of accuracy in all subsequent inference. In particular,
heteroscedasticity implies that
1. Model coefficients will be inaccurate.
2. Error variance $\sigma^2$ is underestimated by OLS estimation.
3. The estimated model has low predictive power.
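
The covariance formula above can be checked numerically. The following is a minimal sketch for a simple regression with an intercept and one regressor, assuming a diagonal Ω; the regressor values, the error variances and all function names are hypothetical and chosen only for illustration.

```python
def matmul2(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def xtx_inverse(x):
    """(X'X)^{-1} for the design matrix whose rows are (1, x_t)."""
    n, sx, sxx = len(x), sum(x), sum(v * v for v in x)
    det = n * sxx - sx * sx
    return [[sxx / det, -sx / det], [-sx / det, n / det]]


def sandwich_cov(x, variances):
    """(X'X)^{-1} (X' Omega X) (X'X)^{-1} with Omega = diag(variances)."""
    inv = xtx_inverse(x)
    m00 = sum(variances)
    m01 = sum(w * v for w, v in zip(variances, x))
    m11 = sum(w * v * v for w, v in zip(variances, x))
    return matmul2(matmul2(inv, [[m00, m01], [m01, m11]]), inv)


def conventional_cov(x, sigma2):
    """sigma^2 (X'X)^{-1}, valid only under homoscedasticity."""
    return [[sigma2 * e for e in row] for row in xtx_inverse(x)]


# Hypothetical regressor values and error variances that grow with x.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
variances = [v * v for v in x]
avg_var = sum(variances) / len(variances)

true_cov = sandwich_cov(x, variances)
naive_cov = conventional_cov(x, avg_var)
print("slope variance, sandwich form:    ", true_cov[1][1])
print("slope variance, conventional form:", naive_cov[1][1])
```

When all the variances are equal the two functions return the same matrix; with unequal variances they generally differ, which is the source of the invalid inference described above.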



Heteroscedasticity can take on various forms. Typical or standard
heteroscedasticity takes the form

\[ \Omega = \mathrm{cov}(u, u) = E(uu') =
\begin{pmatrix}
\sigma_1^2 & 0 & \cdots & 0 \\
0 & \sigma_2^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & \cdots & \cdots & \sigma_n^2
\end{pmatrix}. \]

It is, however, not enough to just assume that $\Omega$ has this diagonal form.
We do not have enough degrees of freedom, as there are $(p + n)$ parameters

\[ \beta_0, \ldots, \beta_{p-1} \quad \text{and} \quad \sigma_1^2, \ldots, \sigma_n^2 \]

to be estimated from only $n$ observations. So to make the study or analysis
feasible we must impose further restrictions on the structure of $\Omega$.
Estimation and Testing for heteroscedasticity

As indicated above, in order to be able to estimate accurately the parameters
of a model in the presence of heteroscedasticity, it is necessary to make
some simplifying but attainable assumptions.
Suppose that it is possible to identify groups $G_1, G_2, \ldots, G_m$ such that
error variances are homogeneous, i.e. equal, within a group but possibly
differing from group to group. Let $n_g$, $g = 1, 2, \ldots, m$, be the number of
observations in the $g$-th group.
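
Under this grouping assumption the covariance matrix can be written in block-diagonal form. The sketch below assumes the observations are ordered by group; writing $\sigma_g^2$ for the common variance in group $g$ and $I_{n_g}$ for the identity matrix of order $n_g$ is notation introduced here for illustration.

```latex
\Omega =
\begin{pmatrix}
\sigma_1^2 I_{n_1} & & \\
 & \ddots & \\
 & & \sigma_m^2 I_{n_m}
\end{pmatrix},
\qquad \sum_{g=1}^{m} n_g = n
```

With this restriction the number of unknown parameters falls from $p + n$ to $p + m$, which is estimable whenever $m$ is small relative to $n$.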



Goldfeld-Quandt Test

Testing for grouped and increasing i.e. ordered heteroscedasticity


The tests for heteroscedasticity discussed here assume that the disturbance
terms $\{u_t\}$ are
(i) uncorrelated
(ii) normally distributed
The hypothesis to be tested is

\[ H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_m^2 \quad \text{versus} \quad H_1: \sigma_1^2 \le \sigma_2^2 \le \cdots \le \sigma_m^2 \]

with at least one strict inequality under $H_1$.

The test applies to a situation where the variance is a function of an
exogenous, i.e. explanatory, variable. The procedure is as follows.



1. Order the data by the magnitude of the explanatory variable $X_t$.
2. Omit, say, $c$ central or middle values. The value of $c$ is usually chosen
so that about 10%, or at most 25%, of the observations are omitted.
3. Perform two separate OLS regressions, one on the first and one on the last
$(n - c)/2$ observations.
4. Let $SSE_1$ and $SSE_2$ denote the residual sums of squares from the first
and second regressions respectively. Let also
\[ SSE_{\max} = \max(SSE_1, SSE_2) \quad \text{and} \quad SSE_{\min} = \min(SSE_1, SSE_2). \]
Then the test statistic, which is given by
\[ F = \frac{SSE_{\max} / [(n - c - 2p)/2]}{SSE_{\min} / [(n - c - 2p)/2]} = \frac{SSE_{\max}}{SSE_{\min}} \]
has, under the assumption of equal variances, an F-distribution with
$[(n - c - 2p)/2, (n - c - 2p)/2]$ degrees of freedom, where $p$ is the
number of model parameters.
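
The four steps above can be sketched in code. This is a minimal illustration for a simple regression with an intercept and one regressor (so $p = 2$), using only the standard library; the function names and the demonstration data are hypothetical, and the critical value still has to be looked up in an F table.

```python
def sse_simple_ols(x, y):
    """Residual sum of squares from regressing y on an intercept and x."""
    n = len(x)
    sxx = sum(v * v for v in x) - sum(x) ** 2 / n
    syy = sum(v * v for v in y) - sum(y) ** 2 / n
    sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    return syy - sxy ** 2 / sxx   # SSE = SST - SSR


def goldfeld_quandt(x, y, c, p=2):
    """Return (F, df) where F = SSE_max / SSE_min and df applies to both tails."""
    if (len(x) - c) % 2 != 0:
        raise ValueError("n - c must be even so both subsamples have equal size")
    m = (len(x) - c) // 2               # size of each retained subsample
    if m <= p:
        raise ValueError("each subsample needs more than p observations")
    pairs = sorted(zip(x, y))           # step 1: order by the regressor
    low, high = pairs[:m], pairs[-m:]   # step 2: drop the c central observations
    sse1 = sse_simple_ols(*zip(*low))   # step 3: OLS on the first subsample
    sse2 = sse_simple_ols(*zip(*high))  #         and on the last subsample
    f_stat = max(sse1, sse2) / min(sse1, sse2)   # step 4
    return f_stat, m - p                # m - p equals (n - c - 2p) / 2


# Hypothetical data whose spread grows with x.
x_demo = list(range(1, 11))
y_demo = [2, 3, 5, 4, 6, 9, 7, 12, 8, 15]
f_stat, df = goldfeld_quandt(x_demo, y_demo, c=2)
print("F =", f_stat, "with", (df, df), "degrees of freedom")
```

The returned statistic is compared with the upper critical value of the F-distribution with `(df, df)` degrees of freedom.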
Example
The following table shows household expenditure data.

Household   Consumption (Y)   Income (X)   Income group
 1          22                29           1
 2          22                20           1
 3          20                14           1
 4          24                21           1
 5          30                15           2
 6          32                 9           2
 7          26                 1           2
 8          26                 6           2
 9          12                16           3
10           8                31           3
11          13                26           3
12          25                25           3

Test for heteroscedasticity in the linear regression model
$Y = \beta_0 + \beta_1 X + u$, using the Goldfeld-Quandt test with $c = 2$.
Bartlett's Test
Another test for grouped heteroscedasticity, but less frequently used in
Econometrics, is Bartlett's test. The testing procedure is as follows:

1. The hypothesis is
\[ H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_m^2 \]
2. The test statistic is
\[ Q = \frac{\lambda}{C} \]
where
\[ C = 1 + \frac{1}{3(m-1)} \left[ \sum_{g=1}^{m} \frac{1}{n_g - 1} - \frac{1}{\sum_{g=1}^{m} (n_g - 1)} \right] \]

\[ \lambda = \left[ \sum_{g=1}^{m} (n_g - 1) \right] \ln s^2 - \sum_{g=1}^{m} (n_g - 1) \ln s_g^2 \]

\[ s_g^2 = \frac{1}{n_g - 1} \sum_{j=1}^{n_g} (y_{gj} - \bar{y}_g)^2, \qquad \bar{y}_g = \frac{1}{n_g} \sum_{j=1}^{n_g} y_{gj} \]

\[ s^2 = \frac{1}{\sum_{g=1}^{m} (n_g - 1)} \sum_{g=1}^{m} (n_g - 1) s_g^2 \]
3. Under $H_0$, $Q$ follows an approximate $\chi^2$-distribution with $m - 1$ degrees
of freedom. Thus $H_0$ is rejected at the $\alpha$ significance level if
$Q > \chi^2_{m-1,\alpha}$.
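
The procedure above can be sketched in code as follows. This is a minimal standard-library illustration; the function names and the demonstration samples are hypothetical. The closed-form critical value $-2 \ln \alpha$ holds only for 2 degrees of freedom (three groups); for other values of $m$ a $\chi^2$ table is needed.

```python
import math


def bartlett_q(groups):
    """Return (Q, df) for Bartlett's test; groups is a list of samples."""
    m = len(groups)
    dfs = [len(g) - 1 for g in groups]                   # n_g - 1
    means = [sum(g) / len(g) for g in groups]            # group means
    s2 = [sum((y - mu) ** 2 for y in g) / d              # group variances s_g^2
          for g, mu, d in zip(groups, means, dfs)]
    total_df = sum(dfs)
    pooled = sum(d * v for d, v in zip(dfs, s2)) / total_df      # pooled s^2
    lam = (total_df * math.log(pooled)
           - sum(d * math.log(v) for d, v in zip(dfs, s2)))      # lambda
    c = 1 + (sum(1 / d for d in dfs) - 1 / total_df) / (3 * (m - 1))
    return lam / c, m - 1


def chi2_upper_2df(alpha):
    """Upper-alpha critical value of the chi-square distribution with 2 df."""
    return -2 * math.log(alpha)


# Hypothetical samples for three groups.
demo = [[5, 7, 6, 8], [10, 14, 9, 15], [20, 30, 25, 41]]
q, df = bartlett_q(demo)
print("Q =", q, "df =", df, "5% critical value =", chi2_upper_2df(0.05))
```

$H_0$ is rejected when the returned `Q` exceeds the critical value for `df` degrees of freedom.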



Example
Using the same household expenditure data as in the Goldfeld-Quandt example
(12 households in three income groups), test for heteroscedasticity in the
linear regression model $Y = \beta_0 + \beta_1 X + u$ using Bartlett's test.
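For the Goldfeld-Quandt test, the data ordered by $X$ are:

X    1    6    9   14   15   16   20   21   25   26   29   31
Y   26   26   32   20   30   12   22   24   25   13   22    8

With $c = 2$ the two middle observations ($X = 16, 20$) are omitted. For the
first five observations ($n = 5$):

\[ S_{xy} = \sum_{i=1}^{n} xy - \frac{1}{n} \Big( \sum_{i=1}^{n} x \Big) \Big( \sum_{i=1}^{n} y \Big) = 1200 - \frac{1}{5}(45)(134) = -6 \]

\[ S_{xx} = \sum_{i=1}^{n} x^2 - \frac{1}{n} \Big( \sum_{i=1}^{n} x \Big)^2 = 539 - \frac{1}{5}(45)^2 = 134 \]

\[ S_{yy} = \sum_{i=1}^{n} y^2 - \frac{1}{n} \Big( \sum_{i=1}^{n} y \Big)^2 = 3676 - \frac{1}{5}(134)^2 = 84.8 \]

\[ SST_1 = S_{yy} = 84.8, \qquad SSR_1 = \frac{S_{xy}^2}{S_{xx}} = \frac{(-6)^2}{134} = 0.2687 \]

\[ SSE_1 = SST_1 - SSR_1 = 84.8 - 0.2687 = 84.5313 \]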



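For the last five observations ($X = 21, 25, 26, 29, 31$; $n = 5$):

\[ S_{xy} = \sum_{i=1}^{n} xy - \frac{1}{n} \Big( \sum_{i=1}^{n} x \Big) \Big( \sum_{i=1}^{n} y \Big) = 2353 - \frac{1}{5}(132)(92) = -75.8 \]

\[ S_{xx} = \sum_{i=1}^{n} x^2 - \frac{1}{n} \Big( \sum_{i=1}^{n} x \Big)^2 = 3544 - \frac{1}{5}(132)^2 = 59.2 \]

\[ S_{yy} = \sum_{i=1}^{n} y^2 - \frac{1}{n} \Big( \sum_{i=1}^{n} y \Big)^2 = 1918 - \frac{1}{5}(92)^2 = 225.2 \]

\[ SST_2 = S_{yy} = 225.2, \qquad SSR_2 = \frac{S_{xy}^2}{S_{xx}} = \frac{(-75.8)^2}{59.2} = 97.0547 \]

\[ SSE_2 = SST_2 - SSR_2 = 225.2 - 97.0547 = 128.1453 \]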
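
The slides stop at the two residual sums of squares. One way to complete the test, following the statistic defined for the Goldfeld-Quandt procedure with $n = 12$, $c = 2$ and $p = 2$, is sketched below. The 5% critical value quoted is the standard tabulated value for $F(3, 3)$ and is not taken from the original slides.

```latex
F = \frac{SSE_{\max}}{SSE_{\min}} = \frac{128.1453}{84.5313} \approx 1.516,
\qquad
\frac{n - c - 2p}{2} = \frac{12 - 2 - 4}{2} = 3,
\qquad
F_{0.05}(3, 3) \approx 9.28
```

Since $1.516 < 9.28$, $H_0$ would not be rejected at the 5% level, so on this reading the data give no evidence of heteroscedasticity related to income.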



For Bartlett's test, the consumption values by income group are:

               Group 1   Group 2   Group 3
               Y1        Y2        Y3
               22        30        12
               22        32         8
               20        26        13
               24        26        25
n_g - 1         3         3         3
s_g^2           2.6667    9        53.6667
ln s_g^2        0.9808    2.1972    3.9828

\[ s^2 = \frac{1}{\sum_{g=1}^{m} (n_g - 1)} \sum_{g=1}^{m} (n_g - 1) s_g^2
= \frac{1}{9} [3(2.6667) + 3(9) + 3(53.6667)] = 21.7778 \]

\[ \ln s^2 = \ln(21.7778) = 3.08089 \]



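\[ C = 1 + \frac{1}{3(m-1)} \left[ \sum_{g=1}^{m} \frac{1}{n_g - 1} - \frac{1}{\sum_{g=1}^{m} (n_g - 1)} \right]
= 1 + \frac{1}{(3)(2)} \left[ \Big( \frac{1}{3} + \frac{1}{3} + \frac{1}{3} \Big) - \frac{1}{3 + 3 + 3} \right] = \frac{31}{27} \]

\[ \lambda = \left[ \sum_{g=1}^{m} (n_g - 1) \right] \ln s^2 - \sum_{g=1}^{m} (n_g - 1) \ln s_g^2 \]

\[ = [3(3.08089) + 3(3.08089) + 3(3.08089)] - [3(0.9808) + 3(2.1972) + 3(3.9828)] = 6.2455 \]

\[ Q = \frac{\lambda}{C} = \frac{6.2455}{31/27} = 5.4396 \]

\[ \chi^2_{m-1,\alpha} = \chi^2_{2,0.05} = 5.991 \]

Since $Q = 5.4396 < 5.991$ we fail to reject $H_0$ and conclude that there is no
significant evidence that the group variances differ.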
The End
