
Basic Econometrics in Transportation

Heteroscedasticity

Amir Samimi
Civil Engineering Department
Sharif University of Technology

Primary Source: Basic Econometrics (Gujarati)

Outline

• What is the nature of heteroscedasticity?
• What are its consequences?
• How does one detect it?
• What are the remedial measures?

Nature of Heteroscedasticity

• An important assumption in CLRM is that $E(u_i^2) = \sigma^2$.
• This is the assumption of equal (homo) spread (scedasticity).
  • Example: the higher-income families on the average save more than the lower-income families, but there is also more variability in their savings.

Possible Reasons

1. As people learn, their errors of behavior become smaller over time.
  • As the number of hours of typing practice increases, the average number of typing errors as well as their variance decreases.
2. As incomes grow, people have more choices about the disposition of their income.
  • Rich people have more choices about their savings behavior.
3. As data collecting techniques improve, $\sigma_i^2$ is likely to decrease.
  • Banks that have sophisticated data processing equipment are likely to commit fewer errors.

Possible Reasons

4. Heteroscedasticity can arise when there are outliers.
  • An outlier is an observation that is much different from the other observations in the sample.
5. Heteroscedasticity arises when the model is not correctly specified.
  • Very often what looks like heteroscedasticity may be due to the fact that some important variables are omitted from the model.
6. Skewness in the distribution of a regressor is another source.
  • The distribution of income and wealth in most societies is uneven, with the bulk of the income and wealth being owned by a few at the top.
7. Other sources of heteroscedasticity:
  • Incorrect data transformation (ratio or first difference transformations).
  • Incorrect functional form (linear versus log-linear models).

Cross-sectional and Time Series Data

• Heteroscedasticity is likely to be more common in cross-sectional than in time series data.
  • In cross-sectional data, one usually deals with members of a population at a given point in time. These members may be of different sizes, income, etc.
  • In time series data, the variables tend to be of similar orders of magnitude because one generally collects the data for the same entity over a period of time.
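
The savings example can be made concrete with a small simulation. This is a minimal sketch, not part of the original slides: it assumes a hypothetical cross-section in which the error standard deviation is proportional to income. The coefficients, sample size, and seed are illustrative choices.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical cross-section of 400 families; incomes in arbitrary units.
incomes = [random.uniform(10, 100) for _ in range(400)]

# Assumed data-generating process (illustrative numbers only):
# savings = 1 + 0.2 * income + u, with sd(u) = 0.05 * income.
errors = [random.gauss(0, 0.05 * x) for x in incomes]
savings = [1 + 0.2 * x + u for x, u in zip(incomes, errors)]

# Split the families at the median income and compare the two halves.
cutoff = statistics.median(incomes)
low = [(s, u) for x, s, u in zip(incomes, savings, errors) if x <= cutoff]
high = [(s, u) for x, s, u in zip(incomes, savings, errors) if x > cutoff]

mean_low = statistics.mean(s for s, _ in low)
mean_high = statistics.mean(s for s, _ in high)
var_low = statistics.pvariance([u for _, u in low])
var_high = statistics.pvariance([u for _, u in high])

print(f"mean savings:   low = {mean_low:.2f}, high = {mean_high:.2f}")
print(f"error variance: low = {var_low:.2f}, high = {var_high:.2f}")
```

Under these assumptions the higher-income half shows both a larger mean and a larger error variance, which is the unequal spread that the homoscedasticity assumption rules out.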

OLS Estimation with Heteroscedasticity

• OLS estimators and their variances when $E(u_i^2) = \sigma_i^2$:
  • $\hat{\beta}_2 = \frac{\sum x_i y_i}{\sum x_i^2}$
  • $\operatorname{var}(\hat{\beta}_2) = \frac{\sum x_i^2 \sigma_i^2}{(\sum x_i^2)^2}$, where $x_i$ and $y_i$ are deviations from the sample means.
• Is it still BLUE when we drop only the homoscedasticity assumption?
  • We can easily prove that it is still linear and unbiased.
  • We can also show that it is a consistent estimator.
  • It is no longer best, and the minimum variance is not given by the equation above.
• What is BLUE in the presence of heteroscedasticity?

Method of Generalized Least Squares

• Ideally, we would like to give less weight to the observations coming from populations with greater variability.
• Consider: $Y_i = \beta_1 + \beta_2 X_i + u_i = \beta_1 X_{0i} + \beta_2 X_i + u_i$, where $X_{0i} = 1$ for each $i$.
• Assume the heteroscedastic variances $\sigma_i^2$ are known, and divide through by $\sigma_i$:
  $\frac{Y_i}{\sigma_i} = \beta_1 \frac{X_{0i}}{\sigma_i} + \beta_2 \frac{X_i}{\sigma_i} + \frac{u_i}{\sigma_i}$, written as $Y_i^* = \beta_1^* X_{0i}^* + \beta_2^* X_i^* + u_i^*$
• Variance of the transformed disturbance term is now homoscedastic:
  $\operatorname{var}(u_i^*) = E(u_i^{*2}) = \frac{1}{\sigma_i^2} E(u_i^2) = 1$
• Apply OLS to the transformed model and get BLUE estimators.

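
The transformation can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the course's own code: the data are simulated, the true coefficients (2 and 3) and the known standard deviations $\sigma_i = 0.5 X_i$ are hypothetical choices, and `ols_two_regressors` is a helper written for this example.

```python
import random

def ols_two_regressors(y, x0, x1):
    """Least squares for y = b0*x0 + b1*x1 (no separate intercept),
    solved through the 2x2 normal equations."""
    s00 = sum(a * a for a in x0)
    s01 = sum(a * b for a, b in zip(x0, x1))
    s11 = sum(b * b for b in x1)
    s0y = sum(a * c for a, c in zip(x0, y))
    s1y = sum(b * c for b, c in zip(x1, y))
    det = s00 * s11 - s01 * s01
    b0 = (s11 * s0y - s01 * s1y) / det
    b1 = (s00 * s1y - s01 * s0y) / det
    return b0, b1

random.seed(1)
n = 200
X = [random.uniform(1, 10) for _ in range(n)]
sigma = [0.5 * xi for xi in X]  # assumed known sd of u_i (hypothetical)
Y = [2 + 3 * xi + random.gauss(0, si) for xi, si in zip(X, sigma)]
ones = [1.0] * n                # X_0i = 1 for every observation

# Ordinary least squares on the original data.
b1_ols, b2_ols = ols_two_regressors(Y, ones, X)

# GLS: divide every variable by sigma_i, then apply OLS to the starred data.
Y_star = [yi / si for yi, si in zip(Y, sigma)]
X0_star = [1.0 / si for si in sigma]
X_star = [xi / si for xi, si in zip(X, sigma)]
b1_gls, b2_gls = ols_two_regressors(Y_star, X0_star, X_star)

print(f"OLS: beta1 = {b1_ols:.3f}, beta2 = {b2_ols:.3f}")
print(f"GLS: beta1 = {b1_gls:.3f}, beta2 = {b2_gls:.3f}")
```

Both estimators are unbiased, so in a single sample either one may land closer to the true values. The gain from GLS is a smaller sampling variance across repeated samples.
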
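GLS Estimators

• Minimize $\sum w_i \hat{u}_i^2 = \sum w_i (Y_i - \hat{\beta}_1^* - \hat{\beta}_2^* X_i)^2$, where $w_i = 1/\sigma_i^2$.
• Following the standard calculus techniques, we have:
  $\hat{\beta}_2^* = \frac{(\sum w_i)(\sum w_i X_i Y_i) - (\sum w_i X_i)(\sum w_i Y_i)}{(\sum w_i)(\sum w_i X_i^2) - (\sum w_i X_i)^2}$
  $\operatorname{var}(\hat{\beta}_2^*) = \frac{\sum w_i}{(\sum w_i)(\sum w_i X_i^2) - (\sum w_i X_i)^2}$

Consequences of Using OLS

• The usual OLS variance estimator, $\hat{\sigma}^2 / \sum x_i^2$, is a biased estimator of the true $\operatorname{var}(\hat{\beta}_2)$.
  • It overestimates or underestimates, on average.
  • We cannot tell in general whether the bias is positive or negative.
  • We can no longer rely on confidence intervals, t and F tests.
• If we persist in using the usual testing procedures despite heteroscedasticity, whatever conclusions we draw may be very misleading.
• Heteroscedasticity is potentially a serious problem and the researcher needs to know whether it is present in a given situation.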
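
The bias of the usual variance formula can be illustrated with a small deterministic calculation. This sketch is not from the slides. It assumes a hypothetical design ($X = 1, \dots, 10$ with $\sigma_i = 0.5 X_i$) and uses the standard result $E(\text{RSS}) = \sum \sigma_i^2 (1 - h_{ii})$, where $h_{ii} = 1/n + x_i^2 / \sum x_i^2$ is the leverage of observation $i$ in a simple regression.

```python
# Hypothetical design: X = 1..10 with known error sd proportional to X.
X = list(range(1, 11))
sigma2 = [(0.5 * xi) ** 2 for xi in X]  # assumed sigma_i^2
n = len(X)

x_bar = sum(X) / n
x = [xi - x_bar for xi in X]            # deviations from the mean
sxx = sum(d * d for d in x)

# True variance of the OLS slope under heteroscedasticity.
true_var = sum(d * d * s2 for d, s2 in zip(x, sigma2)) / sxx ** 2

# Expected value of the usual estimator sigma_hat^2 / sum(x_i^2),
# with sigma_hat^2 = RSS / (n - 2) and E[RSS] = sum sigma_i^2 * (1 - h_ii).
leverage = [1 / n + d * d / sxx for d in x]
expected_rss = sum(s2 * (1 - h) for s2, h in zip(sigma2, leverage))
expected_usual_var = expected_rss / (n - 2) / sxx

print(f"true var(beta2_hat):          {true_var:.4f}")
print(f"expected usual OLS estimate:  {expected_usual_var:.4f}")
```

In this particular design the usual formula understates the true variance on average (roughly 0.114 against 0.136). Other variance patterns can push the bias in the opposite direction.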

Detection

• There are no hard-and-fast rules for detecting heteroscedasticity, only a few rules of thumb.
  • This is inevitable because $\sigma_i^2$ can be known only if we have the entire Y population corresponding to the chosen X’s.
  • More often than not, there is only one sample Y value corresponding to a particular value of X. And there is no way one can know $\sigma_i^2$ from just one Y observation.
  • Thus, heteroscedasticity may be a matter of intuition, educated guesswork, or prior empirical experience.
• Most of the detection methods are based on examination of OLS residuals.
  • Those are the ones we observe, and not $u_i$. We hope they are good estimates.
  • This hope may be fulfilled if the sample size is fairly large.

Informal Methods

• Nature of the Problem
  • The nature of the problem may suggest that heteroscedasticity is likely to be encountered.
  • Residual variance around the regression of consumption on income increases with income.
• Graphical Method
  • Estimated $\hat{u}_i^2$ are plotted against estimated $\hat{Y}_i$.
  • Is the estimated mean value of Y systematically related to the squared residual?
  • a) no systematic pattern, perhaps no heteroscedasticity.
  • b-e) definite pattern, perhaps no homoscedasticity.
  • Using such knowledge, one may transform the data to alleviate the problem.

Formal Methods

• Park Test
  • He formalizes the graphical method by suggesting a log-linear model: $\ln \sigma_i^2 = \ln \sigma^2 + \beta \ln X_i + v_i$
  • Since $\sigma_i^2$ is generally unknown, Park suggests using $\hat{u}_i^2$ as a proxy: $\ln \hat{u}_i^2 = \ln \sigma^2 + \beta \ln X_i + v_i$
  • If $\beta$ turns out to be insignificant, the homoscedasticity assumption may be accepted.
  • The particular functional form chosen by Park is only suggestive.
  • Note: the error term $v_i$ may not satisfy the OLS assumptions.
• Glejser Test
  • Glejser suggests regressing the absolute value of the estimated error term on the X variable.
  • The following functional forms are suggested:
    o $|\hat{u}_i| = \beta_1 + \beta_2 X_i + v_i$
    o $|\hat{u}_i| = \beta_1 + \beta_2 \sqrt{X_i} + v_i$
    o $|\hat{u}_i| = \beta_1 + \beta_2 \frac{1}{X_i} + v_i$
    o $|\hat{u}_i| = \beta_1 + \beta_2 \frac{1}{\sqrt{X_i}} + v_i$
    o $|\hat{u}_i| = \sqrt{\beta_1 + \beta_2 X_i} + v_i$
    o $|\hat{u}_i| = \sqrt{\beta_1 + \beta_2 X_i^2} + v_i$
  • For large samples the first four give generally satisfactory results.
  • The last two models are nonlinear in the parameters.
  • Note: some argued that $v_i$ does not have a zero expected value, it is serially correlated, and heteroscedastic.
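
Both tests are two-stage regressions on the OLS residuals, which can be sketched as follows. This is a minimal illustration on simulated data, not the lecture's own code: the data-generating process ($\sigma_i = 0.5 X_i$), the seed, and the helper `simple_ols` are assumptions made for the example, and only the first (linear) Glejser form is shown.

```python
import math
import random

def simple_ols(x, y):
    """Fit y = a + b*x; return (a, b, t statistic of b, residuals)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    sigma2_hat = sum(e * e for e in resid) / (n - 2)
    t_slope = slope / math.sqrt(sigma2_hat / sxx)
    return intercept, slope, t_slope, resid

random.seed(2)
# Hypothetical heteroscedastic data: sd(u_i) grows in proportion to X_i.
X = [random.uniform(1, 10) for _ in range(200)]
Y = [2 + 3 * xi + random.gauss(0, 0.5 * xi) for xi in X]

# Stage 1: original regression, keep the residuals.
_, _, _, u_hat = simple_ols(X, Y)

# Park: regress ln(u_hat^2) on ln(X). (Fails only if a residual is exactly 0.)
ln_u2 = [math.log(e * e) for e in u_hat]
ln_x = [math.log(xi) for xi in X]
_, park_beta, park_t, _ = simple_ols(ln_x, ln_u2)

# Glejser, first functional form: regress |u_hat| on X.
abs_u = [abs(e) for e in u_hat]
_, glejser_beta, glejser_t, _ = simple_ols(X, abs_u)

print(f"Park:    beta = {park_beta:.3f}, t = {park_t:.2f}")
print(f"Glejser: beta = {glejser_beta:.3f}, t = {glejser_t:.2f}")
```

A slope t statistic that is large relative to the critical t value suggests heteroscedasticity. The caveats above about $v_i$ apply to these second-stage t statistics as well.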

Formal Methods

• Spearman’s Rank Correlation Test
  • Fit the regression to the data on Y and X and estimate the residuals.
  • Rank both the absolute value of the residuals and $X_i$ (or estimated $\hat{Y}_i$) and compute the Spearman’s rank correlation coefficient: $r_s = 1 - 6 \left[ \frac{\sum d_i^2}{n(n^2 - 1)} \right]$
    o $d_i$ = difference in the ranks for the ith observation.
  • Assuming that the population rank correlation coefficient is zero and n > 8, the significance of the sample $r_s$ can be tested by the t test, with df = n − 2: $t = \frac{r_s \sqrt{n - 2}}{\sqrt{1 - r_s^2}}$
  • If the computed t value exceeds the critical t value, we may accept the hypothesis of heteroscedasticity.
• Goldfeld-Quandt Test
  • Rank the observations according to $X_i$ values.
  • Omit c central observations, and divide the remaining observations into two groups each of (n − c) / 2 observations.
  • Fit separate OLS regressions to the first and last set of observations, and obtain the residual sums of squares $RSS_1$ and $RSS_2$.
  • Compute the ratio $\lambda = \frac{RSS_2 / df}{RSS_1 / df}$, where df = (n − c) / 2 − k and k is the number of parameters estimated.
  • If $u_i$ are assumed to be normally distributed, and if the assumption of homoscedasticity is valid, then it can be shown that $\lambda$ follows the F distribution with df in both the numerator and the denominator.
  • The power of the test depends on how c is chosen.
  • Goldfeld and Quandt suggest that c = 8 if n = 30, c = 16 if n = 60.
  • Judge et al. note that c = 4 if n = 30 and c = 10 if n is about 60.

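
The two procedures above can be sketched step by step. This is an illustrative sketch on simulated data, not the lecture's own code: the data-generating process, the seed, and the helpers `fit_line` and `ranks` are assumptions for the example, and `ranks` ignores ties (which do not occur with continuous simulated data). The choice c = 10 for n = 60 follows the Judge et al. suggestion quoted above.

```python
import math
import random

def fit_line(x, y):
    """Return the residuals of the simple OLS regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    slope = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
             / sum((a - x_bar) ** 2 for a in x))
    intercept = y_bar - slope * x_bar
    return [b - intercept - slope * a for a, b in zip(x, y)]

def ranks(values):
    """Ascending ranks 1..n (assumes there are no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

random.seed(3)
n = 60
X = [random.uniform(1, 10) for _ in range(n)]
Y = [2 + 3 * xi + random.gauss(0, 0.5 * xi) for xi in X]  # hypothetical data

# Spearman: rank |u_hat| and X, then apply the r_s and t formulas.
u_hat = fit_line(X, Y)
d = [ra - rb for ra, rb in zip(ranks([abs(e) for e in u_hat]), ranks(X))]
r_s = 1 - 6 * sum(di ** 2 for di in d) / (n * (n ** 2 - 1))
t_stat = r_s * math.sqrt(n - 2) / math.sqrt(1 - r_s ** 2)

# Goldfeld-Quandt: order by X, drop c central observations, compare RSS.
c, k = 10, 2
m = (n - c) // 2                      # observations in each group
ordered = sorted(zip(X, Y))
x1, y1 = zip(*ordered[:m])            # smallest X values
x2, y2 = zip(*ordered[-m:])           # largest X values
rss1 = sum(e * e for e in fit_line(x1, y1))
rss2 = sum(e * e for e in fit_line(x2, y2))
df = m - k
lam = (rss2 / df) / (rss1 / df)

print(f"Spearman: r_s = {r_s:.3f}, t = {t_stat:.2f} (df = {n - 2})")
print(f"Goldfeld-Quandt: lambda = {lam:.2f} (df = {df}, {df})")
```

Each statistic is then compared with the tabulated critical value (t with n − 2 df, F with (df, df) degrees of freedom) at the chosen significance level.
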
Formal Methods

• Breusch–Pagan–Godfrey Test
  • Success of the GQ test depends on c and on the X variable with which the observations are ordered.
  • Estimate $Y_i = \beta_1 + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + u_i$ by OLS and obtain the residuals.
  • Obtain $\tilde{\sigma}^2 = \sum \hat{u}_i^2 / n$ (ML estimator of $\sigma^2$).
  • Construct variables $p_i$ defined as $p_i = \hat{u}_i^2 / \tilde{\sigma}^2$.
  • Regress $p_i$ on the Z’s as $p_i = \alpha_1 + \alpha_2 Z_{2i} + \cdots + \alpha_m Z_{mi} + v_i$
    o $\sigma_i^2$ is assumed to be a linear function of the Z’s.
    o Some or all of the X’s can serve as Z’s.
  • Obtain the ESS (explained sum of squares) and define $\Theta = 0.5 \, ESS$.
  • Assuming $u_i$ are normally distributed, one can show that if there is homoscedasticity and if the sample size n increases indefinitely, then $\Theta \sim \chi^2_{m-1}$.
  • BPG test is an asymptotic, or large-sample, test.
• White’s General Heteroscedasticity Test
  • Does not rely on the normality assumption and is easy to implement.
  • Estimate $Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$ and obtain the residuals.
  • Run the following auxiliary regression: $\hat{u}_i^2 = \alpha_1 + \alpha_2 X_{2i} + \alpha_3 X_{3i} + \alpha_4 X_{2i}^2 + \alpha_5 X_{3i}^2 + \alpha_6 X_{2i} X_{3i} + v_i$
  • Higher powers of regressors can also be introduced.
  • Under the null hypothesis (homoscedasticity), if the sample size n increases indefinitely, it can be shown that $nR^2 \sim \chi^2$ (df = number of regressors in the auxiliary regression, excluding the constant).
  • If the chi-square value exceeds the critical value, the conclusion is that there is heteroscedasticity.
  • If it does not, there is no heteroscedasticity, which is to say that $\alpha_2 = \alpha_3 = \alpha_4 = \alpha_5 = \alpha_6 = 0$.
  • It has been argued that if cross-product terms are present, then it is a test of heteroscedasticity and specification bias.
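
Both statistics can be computed with a small least-squares helper. This is a sketch under stated assumptions rather than the lecture's own code. It uses simulated data with a single regressor, so the BPG regression has one Z (Z = X, df = m − 1 = 1), and White's auxiliary regression reduces to $\hat{u}_i^2$ on $X_i$ and $X_i^2$ with no cross-product term (df = 2). The helpers `ols_fit` and `fitted` are written for this example.

```python
import random

def ols_fit(y, columns):
    """Least-squares coefficients of y on the given columns (pass a column
    of ones for the intercept), via Gauss-Jordan on the normal equations."""
    k = len(columns)
    a = [[sum(p * q for p, q in zip(columns[r], columns[s])) for s in range(k)]
         for r in range(k)]
    c = [sum(p * q for p, q in zip(columns[r], y)) for r in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        c[col], c[pivot] = c[pivot], c[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
                c[r] -= f * c[col]
    return [c[r] / a[r][r] for r in range(k)]

def fitted(columns, coef):
    """Fitted values for the given columns and coefficients."""
    return [sum(b * col[i] for b, col in zip(coef, columns))
            for i in range(len(columns[0]))]

random.seed(4)
n = 200
X = [random.uniform(1, 10) for _ in range(n)]
Y = [2 + 3 * xi + random.gauss(0, 0.5 * xi) for xi in X]  # hypothetical data
ones = [1.0] * n

# Original regression and squared residuals.
beta = ols_fit(Y, [ones, X])
u2 = [(yi - fi) ** 2 for yi, fi in zip(Y, fitted([ones, X], beta))]

# BPG: p_i = u_hat_i^2 / sigma_tilde^2, regress p on Z = X, Theta = ESS / 2.
sigma2_ml = sum(u2) / n
p = [v / sigma2_ml for v in u2]
p_hat = fitted([ones, X], ols_fit(p, [ones, X]))
p_bar = sum(p) / n
bpg_stat = 0.5 * sum((v - p_bar) ** 2 for v in p_hat)

# White: regress u_hat^2 on X and X^2, then use n * R^2.
aux = [ones, X, [xi ** 2 for xi in X]]
u2_hat = fitted(aux, ols_fit(u2, aux))
u2_bar = sum(u2) / n
r2 = (sum((v - u2_bar) ** 2 for v in u2_hat)
      / sum((v - u2_bar) ** 2 for v in u2))
white_stat = n * r2

print(f"BPG:   Theta = {bpg_stat:.2f}  (5% critical chi-square, 1 df: 3.841)")
print(f"White: n*R^2 = {white_stat:.2f}  (5% critical chi-square, 2 df: 5.991)")
```

A statistic above its critical value leads to rejecting homoscedasticity. With more regressors, the same helper accepts the extra columns (levels, squares, and cross-products) and the degrees of freedom grow accordingly.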

Remedial Measures

• Heteroscedasticity does not destroy unbiasedness and consistency.
• But OLS estimators are no longer efficient, not even asymptotically.
• There are two approaches to remediation:
  • When $\sigma_i^2$ is known, and
  • When $\sigma_i^2$ is not known.

Remedial Measures

• When $\sigma_i^2$ is known:
  • The most straightforward method of correcting heteroscedasticity is by means of weighted least squares.
  • WLS method provides BLUE estimators.
• When $\sigma_i^2$ is unknown:
  • Is there a way of obtaining consistent estimates of the variances and covariances of OLS estimators even if there is heteroscedasticity? The answer is yes.

White’s Correction

• White has suggested a procedure by which asymptotically valid statistical inferences can be made about the true parameter values.
• Several computer packages present White’s heteroscedasticity-corrected variances and standard errors along with the usual OLS variances and standard errors.
• White’s heteroscedasticity-corrected standard errors are also known as robust standard errors.

White’s Procedure

• For a 2-variable regression model $Y_i = \beta_1 + \beta_2 X_{2i} + u_i$ we showed: $\operatorname{var}(\hat{\beta}_2) = \frac{\sum x_i^2 \sigma_i^2}{(\sum x_i^2)^2}$, where $x_i = X_{2i} - \bar{X}_2$.
• White has shown that $\frac{\sum x_i^2 \hat{u}_i^2}{(\sum x_i^2)^2}$ is a consistent estimator of $\operatorname{var}(\hat{\beta}_2)$.
• For $Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki} + u_i$ we have: $\operatorname{var}(\hat{\beta}_j) = \frac{\sum \hat{w}_{ji}^2 \hat{u}_i^2}{(\sum \hat{w}_{ji}^2)^2}$
  • $\hat{u}_i$ are the residuals obtained from the original regression.
  • $\hat{w}_j$ are the residuals obtained from the auxiliary regression of the regressor $X_j$ on the remaining regressors.
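
For the 2-variable model, the robust variance is a one-line change from the usual formula. The following sketch compares the two on simulated data. The data-generating process and seed are hypothetical, and the code covers only the simple-regression case, not the general k-variable formula.

```python
import math
import random

random.seed(5)
n = 200
X = [random.uniform(1, 10) for _ in range(n)]
Y = [2 + 3 * xi + random.gauss(0, 0.5 * xi) for xi in X]  # hypothetical data

# OLS for Y = beta1 + beta2 * X + u, in deviation form.
x_bar = sum(X) / n
y_bar = sum(Y) / n
x = [xi - x_bar for xi in X]
sxx = sum(d * d for d in x)
beta2 = sum(d * (yi - y_bar) for d, yi in zip(x, Y)) / sxx
beta1 = y_bar - beta2 * x_bar
u_hat = [yi - beta1 - beta2 * xi for xi, yi in zip(X, Y)]

# Usual OLS variance: sigma_hat^2 / sum(x_i^2).
usual_var = (sum(e * e for e in u_hat) / (n - 2)) / sxx

# White's estimator: sum(x_i^2 * u_hat_i^2) / (sum(x_i^2))^2.
white_var = sum((d * e) ** 2 for d, e in zip(x, u_hat)) / sxx ** 2

usual_se = math.sqrt(usual_var)
white_se = math.sqrt(white_var)
print(f"beta2_hat = {beta2:.3f}")
print(f"usual OLS se = {usual_se:.4f}, White robust se = {white_se:.4f}")
```

The point estimate is the same under both approaches; only the standard error, and hence the t ratio and confidence interval, changes.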

Example

Y = per capita expenditure on public schools by state in 1979
Income = per capita income by state in 1979

• Both the regressors are statistically significant at the 5 percent level on the basis of the usual OLS standard errors, whereas on the basis of White estimators they are not.
• Since robust standard errors are now available in established regression packages, it is recommended to report them.
• The WHITE option can be used to compare the output with regular OLS output as a check for heteroscedasticity.

Reasonable Heteroscedasticity Patterns

• Apart from being a large-sample procedure, one drawback of the White procedure is that the estimators thus obtained may not be so efficient as those obtained by methods that transform data to reflect specific types of heteroscedasticity.
• We may consider several assumptions about the pattern of heteroscedasticity.

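Reasonable Heteroscedasticity Patterns

• Assumption 1: if $E(u_i^2) = \sigma^2 X_i^2$, divide the model by $X_i$: $\frac{Y_i}{X_i} = \frac{\beta_1}{X_i} + \beta_2 + \frac{u_i}{X_i}$
• Assumption 2: if $E(u_i^2) = \sigma^2 X_i$, divide the model by $\sqrt{X_i}$: $\frac{Y_i}{\sqrt{X_i}} = \frac{\beta_1}{\sqrt{X_i}} + \beta_2 \sqrt{X_i} + \frac{u_i}{\sqrt{X_i}}$
• Assumption 3: if $E(u_i^2) = \sigma^2 [E(Y_i)]^2$, divide the model by $E(Y_i)$ (in practice, by the estimated $\hat{Y}_i$): $\frac{Y_i}{E(Y_i)} = \frac{\beta_1}{E(Y_i)} + \beta_2 \frac{X_i}{E(Y_i)} + \frac{u_i}{E(Y_i)}$
• Assumption 4:
  • A log transformation such as $\ln Y_i = \beta_1 + \beta_2 \ln X_i + u_i$ very often reduces heteroscedasticity.

Homework 5

Basic Econometrics (Gujarati, 2003)

1. Chapter 11, Problem 15 [50 points]
2. Chapter 11, Problem 16 [50 points]

Assignment weight factor = 0.5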
