
HETEROSKEDASTICITY

10-1
Pure Heteroskedasticity
• Pure heteroskedasticity—often referred to simply as
heteroskedasticity—occurs when Classical Assumption V
is violated in a correctly specified equation.
• Classical Assumption V assumes homoskedasticity:

$\text{Var}(\varepsilon_i) = \sigma^2 = \text{a constant}, \quad i = 1, 2, \ldots, N$   (10.1)

• With heteroskedasticity, the variance of the error term is not
constant and depends on the observation:

$\text{Var}(\varepsilon_i) = \sigma_i^2, \quad i = 1, 2, \ldots, N$   (10.2)

10-2
Pure Heteroskedasticity (continued)
• Heteroskedasticity often occurs in data sets in which
there is a large disparity between the largest and smallest
observed values of the dependent variable.

• The simplest way to visualize pure heteroskedasticity is to
picture a world in which the observations of the error term can
be grouped into two distributions: “wide” and “narrow.”

• Both distributions can be centered on zero, but one has a
larger variance (Figure 10.1).

10-4
Pure Heteroskedasticity (continued)
• Heteroskedasticity can take on many complex forms.
• One model has the variance of the error term related to
an exogenous variable $Z_i$.
• For a typical regression equation:
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$   (10.3)

• The variance of the error term might be equal to:

$\text{Var}(\varepsilon_i) = \sigma^2 Z_i$   (10.4)

• Z is called a proportionality factor.
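
• To make Equation 10.4 concrete, the following small Stata sketch (an
illustration, not from the original slides; the variable names and numbers
are made up) generates errors whose variance is proportional to Z, so
their spread visibly widens as Z grows:

* simulate errors with Var(eps_i) = sigma^2 * Z_i
clear
set obs 200
set seed 12345
gen Z = 1 + 9*runiform()             // proportionality factor between 1 and 10
gen eps = sqrt(Z)*rnormal(0, 2)      // sd_i = 2*sqrt(Z_i), so Var(eps_i) = 4*Z_i
gen X1 = runiform()
gen X2 = runiform()
gen Y = 1 + 2*X1 + 3*X2 + eps        // a version of Equation 10.3 with known betas
graph twoway (scatter eps Z), yline(0)   // spread of eps fans out as Z increases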


10-7
Pure Heteroskedasticity (continued)
• Heteroskedasticity can occur in a time-series model with
a significant amount of change in the dependent variable.
• It can also occur in any model, time series or cross-
sectional, where the quality of data collection changes
dramatically.
• As measurement errors decrease in size, so should the
variance of the error term.
• Heteroskedasticity caused by an error in specification is
referred to as impure heteroskedasticity.

10-8
The Consequences of Heteroskedasticity
• There are three major consequences of
heteroskedasticity:
1. Pure heteroskedasticity does not cause bias in the
coefficient estimates.
2. Heteroskedasticity typically causes OLS to no longer
be the minimum variance estimator (of all the linear
unbiased estimators).
3. Heteroskedasticity causes the OLS estimates of the
standard errors to be biased, leading to unreliable
hypothesis testing and confidence intervals.
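• As a hedged illustration of consequences 1 and 3 (a sketch, not from the
textbook; the program name hetsim and all numbers are made up), a small
Stata simulation shows the slope estimate staying centered on its true
value while the conventional OLS standard error misstates the estimate's
actual variability:

* Monte Carlo sketch: unbiased coefficients, misleading OLS standard errors
clear all
set seed 2024
program define hetsim, rclass
    clear
    set obs 100
    gen X = 1 + 9*runiform()
    gen eps = sqrt(X)*rnormal(0, 3)      // error variance rises with X
    gen Y = 1 + 2*X + eps
    regress Y X
    return scalar b = _b[X]
    return scalar se = _se[X]
end
simulate b=r(b) se=r(se), reps(500) nodots: hetsim
summarize b se
* the mean of b stays near the true value 2 (no bias), but the average
* reported se differs from the actual dispersion of b across replications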
10-9
Testing for Heteroskedasticity
• There are many tests for heteroskedasticity; three popular ones are:
1. Park test
2. Goldfeld-Quandt test
3. Breusch-Pagan test
• Before testing for heteroskedasticity, start by asking:
1. Are there any obvious specification errors?
2. Are there any early warning signs of
heteroskedasticity?
3. Does a graph of the residuals show any evidence of
heteroskedasticity?
10-12
Testing for Heteroskedasticity (continued)
• To test for heteroskedasticity, we will use data from the
UCLA website.
• In Stata, type
. use http://stats.idre.ucla.edu/stat/stata/ado/analysis/hetdata.dta, clear

to download the data (make sure that your computer is online).

10-13
Testing for Heteroskedasticity (continued)
• The model:

$exp_i = \alpha_0 + \alpha_1\, age_i + \alpha_2\, ownrent_i + \alpha_3\, income_i + \alpha_4\, income_i^2 + u_i$   (7.1)

where:
exp = expenditure
age = age
ownrent = own/rent indicator (takes values 0 and 1)
income = income

10-14
Testing for Heteroskedasticity (continued)
• Let's regress Eq. (7.1):
regress exp age ownrent income incomesq

      Source |       SS       df       MS              Number of obs =      72
-------------+------------------------------           F(  4,    67) =    5.39
       Model |  1749357.01     4  437339.252           Prob > F      =  0.0008
    Residual |  5432562.03    67  81083.0153           R-squared     =  0.2436
-------------+------------------------------           Adj R-squared =  0.1984
       Total |  7181919.03    71  101153.789           Root MSE      =  284.75

         exp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -3.081814   5.514717    -0.56   0.578    -14.08923    7.925606
     ownrent |   27.94091   82.92232     0.34   0.737    -137.5727    193.4546
      income |    234.347   80.36595     2.92   0.005     73.93593    394.7581
    incomesq |  -14.99684   7.469337    -2.01   0.049     -29.9057   -.0879857
       _cons |  -237.1465   199.3517    -1.19   0.238    -635.0541    160.7611

• Generate the residuals:

predict uhat,resid

10-15
Testing for Heteroskedasticity (continued)
• We plot the residuals against each variable to see
whether there is a clear pattern between them.
graph twoway (scatter uhat age),yline(0)
[Scatter plot of uhat against age, with a horizontal reference line at uhat = 0.]

10-16
Testing for Heteroskedasticity (continued)
graph twoway (scatter uhat ownrent),yline(0)
[Scatter plot of uhat against ownrent (values 0 and 1), with a horizontal reference line at uhat = 0.]

10-17
Testing for Heteroskedasticity (continued)
graph twoway (scatter uhat income),yline(0)
[Scatter plot of uhat against income, with a horizontal reference line at uhat = 0.]

10-18
Testing for Heteroskedasticity (continued)
• It seems that the income variable is the source of the
heteroskedasticity; we call it the Z factor.

10-19
Park Test
• The challenge of the Park (1966) test is to determine
which variable to use as Z, the size factor.
• The Park test is useful because it tells us whether we
have successfully identified the correct Z.

10-20
Park Test (continued)
• Steps:
1. Decide what variable should serve as Z (usually something that
measures the relative size of the observations).
2. Run the original regression:
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + u$   (5.1)

3. Obtain the residuals (the observed error terms):

$\hat{u} = Y - \hat{\beta}_0 - \hat{\beta}_1 X_1 - \hat{\beta}_2 X_2 - \cdots - \hat{\beta}_k X_k$   (5.2)

4. Square the residuals, $\hat{u}^2$.

5. Take the natural log of both Z and $\hat{u}^2$, then run the following
regression:
$\ln \hat{u}^2 = \beta_0 + \beta_1 \ln Z + e$   (5.3)
10-21
Park Test (continued)
6. Conduct a t-test on $\hat{\beta}_1$. If $\hat{\beta}_1$ is significantly different from zero at,
say, the 5% level, we conclude there is heteroskedasticity.
Otherwise, there is no heteroskedasticity. We may want to try the Park
test more than once if there is more than one good measure of
size that could be used as Z.

• If there is no heteroskedasticity, then Z should not be related
to the variance of the error term.
• In the Park test, $\hat{u}^2$ represents an estimate of the error term
variance.
• If Z and $\hat{u}^2$ are unrelated, then the t-test should be
insignificant and there is no heteroskedasticity.

10-22
Park Test (continued)
• Performing the Park test with Stata:

o Estimate Eq. (7.1) by OLS.
o Choose the Z factor. Assume that Z = income.
o Generate the residuals from Eq. (7.1):
predict uhat,residual
o Square the residuals ($\hat{u}^2$):
gen uhatsq=uhat^2
o Transform $\hat{u}^2$ and income into log form:
gen lnuhatsq=log(uhatsq)
gen lnincome=log(income)

10-23
Park Test (continued)
o Estimate the model
$\ln \hat{u}^2 = \beta_0 + \beta_1 \ln income + e$   (7.2)
regress lnuhatsq lnincome

      Source |       SS       df       MS              Number of obs =      72
-------------+------------------------------           F(  1,    70) =    1.70
       Model |  8.44316238     1  8.44316238           Prob > F      =  0.1962
    Residual |  347.133348    70  4.95904782           R-squared     =  0.0237
-------------+------------------------------           Adj R-squared =  0.0098
       Total |   355.57651    71  5.00811986           Root MSE      =  2.2269

    lnuhatsq |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lnincome |    .819264   .6278711     1.30   0.196    -.4329853    2.071513
       _cons |   8.156698    .762231    10.70   0.000     6.636476     9.67692

• The coefficient of lnincome is not statistically significant.
• This means that there is no heteroskedasticity based on Z =
income.
10-24
Goldfeld-Quandt Test
• The most popular method.
• Applicable if we assume the heteroskedastic variance $\sigma_i^2$ is
positively related to one of the independent variables.
• Consider the two-variable model:
$Y_i = \beta_0 + \beta_1 X_i + u_i$   (5.4)

• Suppose $\sigma_i^2$ is positively related to $X_i$:

$\sigma_i^2 = \sigma^2 X_i^2$   (5.5)

• where $\sigma^2$ is a constant.

10-25
Goldfeld-Quandt Test (continued)
• Assumption Eq. (5.5) postulates that $\sigma_i^2$ is proportional to the
square of the X variable.
• It means that the larger the value of $X_i$, the larger $\sigma_i^2$ becomes.
• If this happens, heteroskedasticity exists in the model.
• To test this explicitly, Goldfeld and Quandt suggest the following steps:
1. Order or rank the observations according to the value of X, beginning
with the lowest value of X.
2. Omit c central observations, and divide the remaining (n − c) observations
into two groups, each of (n − c)/2 observations.
3. Run separate OLS regressions on the first group and the second group.

10-26
Goldfeld-Quandt Test (continued)
4. Obtain the variances from the two groups: $\hat{\sigma}_1^2$ for group 1 and
$\hat{\sigma}_2^2$ for group 2, where $\hat{\sigma}_1^2$ refers to the small-variance group and
$\hat{\sigma}_2^2$ refers to the large-variance group.
5. Compute the ratio
$\lambda = \dfrac{\hat{\sigma}_2^2/(n_2 - k)}{\hat{\sigma}_1^2/(n_1 - k)}$   (5.6)
where $\lambda$ follows the F distribution with numerator and denominator
degrees of freedom $(n_2 - k)$ and $(n_1 - k)$.
6. If the computed $\lambda\,(= F)$ is greater than the critical F at the
chosen level of significance, we reject the hypothesis of
homoskedasticity, meaning that there is heteroskedasticity in the
estimated model.

10-27
Goldfeld-Quandt Test (continued)
• Performing the Goldfeld-Quandt test with Stata:

• Assume that the Z factor is income.

o Order or rank the observations according to the values of
Z (= income), beginning with the lowest income:
gsort income

o Divide the observations into two groups. The data have 72 obs, so
each group would have 36 obs. Drop the middle obs (obs 35, 36, 37, 38).

o Group n1 contains obs 1 to 34 and group n2 contains obs 39 to 72.

10-28
Goldfeld-Quandt Test (continued)
o Regress the data for group n1 (obs 1 to 34) based on Eq. (7.1) and store
the root MSE and the residual degrees of freedom:
regress exp age income ownrent incomesq in 1/34

      Source |       SS       df       MS              Number of obs =      34
-------------+------------------------------           F(  4,    29) =    1.66
       Model |  72255.6521     4   18063.913           Prob > F      =  0.1860
    Residual |   315378.54    29  10875.1221           R-squared     =  0.1864
-------------+------------------------------           Adj R-squared =  0.0742
       Total |  387634.192    33  11746.4907           Root MSE      =  104.28

         exp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -3.906595   2.739347    -1.43   0.165    -9.509188    1.695998
      income |   256.8735   579.7199     0.44   0.661    -928.7869    1442.534
     ownrent |   105.6887   46.12162     2.29   0.029     11.35942     200.018
    incomesq |  -54.67354   134.2711    -0.41   0.687    -329.2887    219.9417
       _cons |  -90.54211    604.161    -0.15   0.882     -1326.19    1145.106

scalar rmse_n1=e(rmse)
scalar df_n1=e(df_r)

10-29
Goldfeld-Quandt Test (continued)
o Regress the data for group n2 (obs 39 to 72) based on Eq. (7.1) and
store the root MSE and the residual degrees of freedom:
regress exp age income ownrent incomesq in 39/72

      Source |       SS       df       MS              Number of obs =      34
-------------+------------------------------           F(  4,    29) =    0.38
       Model |  256113.216     4   64028.304           Prob > F      =  0.8179
    Residual |  4829752.59    29  166543.193           R-squared     =  0.0504
-------------+------------------------------           Adj R-squared = -0.0806
       Total |   5085865.8    33  154117.145           Root MSE      =   408.1

         exp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -2.575317    12.4036    -0.21   0.837    -27.94353     22.7929
      income |     217.47   239.8188     0.91   0.372    -273.0146    707.9546
     ownrent |  -61.23771   168.3738    -0.36   0.719    -405.6008    283.1254
    incomesq |  -13.72955   18.76875    -0.73   0.470    -52.11594    24.65685
       _cons |  -129.9911   687.2023    -0.19   0.851    -1535.478    1275.495

scalar rmse_n2=e(rmse)
scalar df_n2=e(df_r)

10-30
Goldfeld-Quandt Test (continued)
o Compute the ratio:
$F = \dfrac{\hat{\sigma}_2^2/(n_2 - k)}{\hat{\sigma}_1^2/(n_1 - k)}$   (7.3)

* calculate the F-statistic
scalar Fstat = rmse_n1^2/rmse_n2^2
* calculate the F-critical value at the 5% significance level
scalar Fcrit=invFtail(df_n2,df_n1,0.05)
* display the F-statistic and F-critical value
scalar list Fstat Fcrit

Fstat = .06529911
Fcrit = 1.8608114
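
• As an arithmetic check, Root MSE is 104.28 for group n1 and 408.1 for
group n2, so Fstat = 104.28² / 408.1² ≈ 0.065, which matches the
scalar list output above.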

10-31
Goldfeld-Quandt Test (continued)

o The null hypothesis is homoskedasticity.
o The results show that the F-statistic is lower than the F-critical value
at the 5% significance level, meaning that we fail to reject H0.
o Conclusion: heteroskedasticity does not exist when Z = income.

10-32
The Breusch-Pagan Test
• The Breusch-Pagan test investigates whether the squared
residuals can be explained by possible proportionality
factors.

• It has three steps:

Step 1: Obtain the residuals from the estimated
equation. For an equation with two independent
variables:
$e_i = Y_i - \hat{Y}_i = Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i}$   (10.6)

10-33
The Breusch-Pagan Test (continued)
Step 2: Use the squared residuals as the dependent
variable in an auxiliary equation:

$e_i^2 = \alpha_0 + \alpha_1 X_{1i} + \alpha_2 X_{2i} + u_i$   (10.7)

Step 3: Test the overall significance of Equation 10.7
with a chi-square test.
$H_0: \alpha_1 = \alpha_2 = 0$ (homoskedasticity)
$H_A:$ at least one of $\alpha_1, \alpha_2$ is nonzero (heteroskedasticity)

• The test statistic is $NR^2$, and the degrees of freedom equal the
number of slope coefficients in the auxiliary equation.
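
• As a sketch (not part of the original slides), the three steps can be run
by hand in Stata for Eq. (7.1); the residual names ehat and ehatsq are
made up for this illustration:

* Step 1: estimate the equation and obtain the residuals
regress exp age ownrent income incomesq
predict ehat, resid
gen ehatsq = ehat^2
* Step 2: auxiliary regression of the squared residuals on the explanatory variables
regress ehatsq age ownrent income incomesq
* Step 3: LM statistic N*R-squared, compared with the chi-square distribution
*         (df = number of slope coefficients in the auxiliary equation)
scalar LM = e(N)*e(r2)
scalar pval = chi2tail(e(df_m), LM)
scalar list LM pval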

10-34
The Breusch-Pagan Test (continued)
Example: Woody’s restaurant location

• Auxiliary equation: [not reproduced in this extract]

• R² = 0.0441 and N = 33, so the chi-square statistic is NR² = 33 × 0.0441 = 1.455.
• The 5-percent critical value (3 degrees of freedom) is 7.81.
• We can't reject the null hypothesis.
• There is no evidence of heteroskedasticity.
10-35
The Breusch-Pagan Test (continued)
• The Breusch-Pagan test with Stata:
regress exp age ownrent income incomesq
estat hettest, normal
Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
Ho: Constant variance
Variables: fitted values of exp

chi2(1) = 29.23
Prob > chi2 = 0.0000

• The null hypothesis is homoskedasticity (constant variance).
• Our results show that Prob > chi2 is 0.0000.
• This means that even at the 1% significance level, we reject the null
hypothesis.
• Heteroskedasticity exists in our model.
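
• As an aside (not shown in the original output), estat hettest run without
arguments uses the fitted values of exp, as reported above; assuming the
same estimation sample, it can also be pointed at candidate Z factors
directly:
estat hettest, rhs          (test using all right-hand-side variables)
estat hettest income        (test using income as the proportionality factor)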


10-36
Remedies for Heteroskedasticity
• As we have seen, heteroskedasticity does not
destroy the unbiasedness and consistency
properties of the OLS estimators, but they are no longer
efficient.
• This lack of efficiency makes the usual hypothesis-testing
procedures of dubious value.
• Therefore, remedial measures may be called for.
• There are two approaches to remediation: when $\sigma_i^2$
is known and when $\sigma_i^2$ is not known.

10-37
Remedies for Heteroskedasticity
When $\sigma_i^2$ is known
• Generalized least squares (GLS) is a popular approach
to treating heteroskedasticity and is capable of producing
estimators that are BLUE.
• To see how this is accomplished, let us continue with the
two-variable model:
$Y_i = \beta_0 + \beta_1 X_i + u_i$   (6.1)

• which, for ease of algebraic manipulation, we write as

$Y_i = \beta_0 X_{0i} + \beta_1 X_i + u_i$   (6.2)

10-38
Remedies for Heteroskedasticity
When $\sigma_i^2$ is known
where $X_{0i} = 1$ for each $i$. We can see that these two
formulations are identical.
• Now, assume that the heteroskedastic variances $\sigma_i^2$ are
known.
• Divide Eq. (6.2) by $\sigma_i$ to obtain:

$\dfrac{Y_i}{\sigma_i} = \beta_0 \dfrac{X_{0i}}{\sigma_i} + \beta_1 \dfrac{X_i}{\sigma_i} + \dfrac{u_i}{\sigma_i}$   (6.3)

10-39
Remedies for Heteroskedasticity
When $\sigma_i^2$ is known
which for ease of exposition we write as
$Y_i^* = \beta_0^* X_{0i}^* + \beta_1^* X_i^* + u_i^*$   (6.4)

• where the starred, or transformed, variables are the original
variables divided by (the known) $\sigma_i$.
• We use the notation $\beta_0^*$ and $\beta_1^*$, the parameters of the transformed
model, to distinguish them from the usual OLS parameters $\beta_0$
and $\beta_1$.

10-40
Remedies for Heteroskedasticity
When $\sigma_i^2$ is known
• What is the purpose of transforming the original model?
• To see this, notice the following feature of the transformed
error term $u_i^*$:
$\text{var}(u_i^*) = E(u_i^*)^2 = E\left(\dfrac{u_i}{\sigma_i}\right)^2$
$= \dfrac{1}{\sigma_i^2} E(u_i^2)$   (since $\sigma_i^2$ is known)
$= \dfrac{1}{\sigma_i^2}\, \sigma_i^2$   (since $E(u_i^2) = \sigma_i^2$)
$= 1$
which is a constant.
10-41
Remedies for Heteroskedasticity
When $\sigma_i^2$ is known
• That is, the variance of the transformed disturbance term $u_i^*$ is
now homoskedastic.
• Since we are still retaining the other assumptions of the
classical model, the finding that it is $u^*$ that is homoskedastic
suggests that if we apply OLS to the transformed model, Eq. (6.3), it
will produce estimators that are BLUE.
• In short, the estimated $\beta_0^*$ and $\beta_1^*$ are now BLUE, and not the
OLS estimators $\beta_0$ and $\beta_1$.

10-42
Remedies for Heteroskedasticity
When $\sigma_i^2$ is known
• This procedure, of transforming the original variables so that the
transformed variables satisfy the assumptions of the CLRM
and then applying OLS to them, is known as the method of
generalized least squares (GLS).
• In short, GLS is OLS on the transformed variables that satisfy
the standard least-squares assumptions.
• The estimators thus obtained are known as GLS estimators,
and it is these estimators that are BLUE.
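
• As a concrete sketch (not from the original slides), assuming the earlier
finding that income acts as the proportionality factor, i.e.
$\text{Var}(u_i) = \sigma^2 \cdot income_i$, the transformation can be carried out in
Stata for Eq. (7.1); the transformed-variable names (exp_t, cons_t, ...)
are made up for this illustration:

* divide every variable, including the intercept term X0i = 1, by sigma_i ∝ sqrt(income_i)
gen w = sqrt(income)
gen exp_t = exp/w
gen cons_t = 1/w
gen age_t = age/w
gen ownrent_t = ownrent/w
gen income_t = income/w
gen incomesq_t = incomesq/w
* OLS on the transformed variables is GLS; suppress Stata's own constant
regress exp_t cons_t age_t ownrent_t income_t incomesq_t, noconstant
* equivalent shortcut with analytic weights (variance proportional to 1/weight):
* regress exp age ownrent income incomesq [aweight=1/income]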

10-43

CHAPTER 10: the end
