Pure Heteroskedasticity
• Pure heteroskedasticity—often referred to simply as
heteroskedasticity—occurs when Classical Assumption V is violated in a
correctly specified equation.
• Classical Assumption V assumes homoskedasticity: the error term has a
constant variance, VAR(εi) = σ², for all observations.
Pure Heteroskedasticity (continued)
• Heteroskedasticity often occurs in data sets in which
there is a large disparity between largest and smallest
observed values of the dependent variable.
Pure Heteroskedasticity (continued)
• Heteroskedasticity can take on many complex forms.
• One model has the variance of the error term related to
an exogenous variable Zi.
• For a typical regression equation:
Yi = β0 + β1X1i + β2X2i + εi (10.3)
Pure Heteroskedasticity (continued)
• Heteroskedasticity can occur in a time-series model with
a significant amount of change in the dependent variable.
• It can also occur in any model, time series or cross-
sectional, where the quality of data collection changes
dramatically.
• As measurement errors decrease in size, so should the
variance of the error term.
• Heteroskedasticity caused by an error in specification is
referred to as impure heteroskedasticity.
The Consequences of Heteroskedasticity
• There are three major consequences of
heteroskedasticity:
1. Pure heteroskedasticity does not cause bias in the
coefficient estimates.
2. Heteroskedasticity typically causes OLS to no longer
be the minimum variance estimator (of all the linear
unbiased estimators).
3. Heteroskedasticity causes the OLS estimates of the
standard errors to be biased, leading to unreliable
hypothesis testing and confidence intervals.
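Consequence 1 can be illustrated with a small Monte Carlo sketch in pure Python. The data-generating process below is hypothetical (not from the lecture's data set): the error standard deviation grows with X, yet the OLS slope estimates still average out to the true value.

```python
import random

random.seed(42)

def ols_slope(xs, ys):
    """Closed-form OLS slope for a simple regression of y on x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

TRUE_SLOPE = 3.0
slopes = []
for _ in range(400):
    xs = [random.uniform(1, 10) for _ in range(100)]
    # pure heteroskedasticity: error standard deviation proportional to x
    ys = [2.0 + TRUE_SLOPE * x + random.gauss(0, 0.5 * x) for x in xs]
    slopes.append(ols_slope(xs, ys))

mean_slope = sum(slopes) / len(slopes)
print(round(mean_slope, 2))  # close to the true slope of 3: no bias
```

The individual estimates are noisier than under homoskedasticity, but their average stays on target, which is exactly the distinction between consequences 1 and 2 above.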
Testing for Heteroskedasticity
• There are many tests for heteroskedasticity; three popular ones are:
1. the Park test
2. the Goldfeld-Quandt test
3. the Breusch-Pagan test
• Before testing for heteroskedasticity, start by asking:
1. Are there any obvious specification errors?
2. Are there any early warning signs of
heteroskedasticity?
3. Does a graph of the residuals show any evidence of
heteroskedasticity?
Testing for Heteroskedasticity (continued)
• To test for heteroskedasticity, we will use data from the UCLA website.
• In Stata, type
. use http://stats.idre.ucla.edu/stat/stata/ado/analysis/hetdata.dta, clear
Testing for Heteroskedasticity (continued)
• The model (Eq. 7.1):
exp = β0 + β1 age + β2 ownrent + β3 income + β4 incomesq + u
where:
exp = expenditure
age = age
ownrent = home-ownership dummy (1 = owns, 0 = rents)
income = income
incomesq = income squared
Testing for Heteroskedasticity (continued)
• Let us estimate Eq. (7.1):
regress exp age ownrent income incomesq
      Source |       SS       df       MS          Number of obs =      72
-------------+------------------------------       F(  4,    67) =    5.39
       Model |  1749357.01     4  437339.252       Prob > F      =  0.0008
    Residual |  5432562.03    67  81083.0153       R-squared     =  0.2436
-------------+------------------------------       Adj R-squared =  0.1984
       Total |  7181919.03    71  101153.789       Root MSE      =  284.75
Testing for Heteroskedasticity (continued)
• We plot the residuals against each explanatory variable to see whether
there is a clear pattern.
predict uhat, residual
graph twoway (scatter uhat age), yline(0)
[Scatter plot: residuals (uhat) plotted against age, with a horizontal line at zero]
Testing for Heteroskedasticity (continued)
graph twoway (scatter uhat ownrent),yline(0)
[Scatter plot: residuals (uhat) plotted against ownrent, with a horizontal line at zero]
Testing for Heteroskedasticity (continued)
graph twoway (scatter uhat income),yline(0)
[Scatter plot: residuals (uhat) plotted against income, with a horizontal line at zero; the spread of the residuals widens as income increases]
Testing for Heteroskedasticity (continued)
• It seems that income is the source of the heteroskedasticity; we call it
the Z factor.
Park Test
• The challenge of the Park (1966) test is to determine
which variable to use as Z, the size factor.
• The Park test is useful because it tells us whether we have successfully
identified the correct Z.
Park Test (continued)
• Steps:
1. Decide what variable should serve as Z (usually something that
measures the relative size of the observations).
2. Run the original regression:
Y = β0 + β1X1 + β2X2 + ⋯ + βkXk + u (5.1)
3. Regress the log of the squared residuals on the log of Z, and test
whether the slope on ln Z is statistically significant.
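The steps above can be sketched in pure Python. The data-generating process is hypothetical (error variance proportional to Z², with Z equal to the regressor), not the lecture's Stata data set:

```python
import math
import random

random.seed(1)

def ols(xs, ys):
    """Return (intercept, slope) of an OLS fit of y on x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

# Steps 1-2: choose Z (= X here), run the original regression, keep residuals
xs = [random.uniform(1, 10) for _ in range(500)]
ys = [2.0 + 3.0 * x + random.gauss(0, x) for x in xs]  # error sd = x
a, b = ols(xs, ys)
resid = [y - (a + b * x) for x, y in zip(xs, ys)]

# Step 3: regress ln(e^2) on ln(Z); a significant slope flags heteroskedasticity
_, park_slope = ols([math.log(x) for x in xs],
                    [math.log(e ** 2) for e in resid])
print(round(park_slope, 2))  # near 2 when var(e) is proportional to Z^2
```

If Z were unrelated to the error variance, the auxiliary slope would be close to zero; a clearly positive slope indicates Z was identified correctly.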
Park Test (continued)
• Performing the Park test in Stata:
Park Test (continued)
o Estimate the model
ln(û²) = β0 + β1 ln(income) + e (7.2)
gen lnuhatsq = ln(uhat^2)
gen lnincome = ln(income)
regress lnuhatsq lnincome
      Source |       SS       df       MS          Number of obs =      72
-------------+------------------------------       F(  1,    70) =    1.70
       Model |  8.44316238     1  8.44316238       Prob > F      =  0.1962
    Residual |  347.133348    70  4.95904782       R-squared     =  0.0237
-------------+------------------------------       Adj R-squared =  0.0098
       Total |   355.57651    71  5.00811986       Root MSE      =  2.2269
• Since the slope on lnincome is insignificant (Prob > F = 0.1962), the
Park test fails to reject the null hypothesis that σ² is constant.
Goldfeld-Quandt Test
• The assumption in Eq. (5.5) postulates that σi² is proportional to the
square of the X variable.
• That is, the larger the value of Xi, the larger σi² becomes.
• If this happens, heteroskedasticity exists in the model.
• To test this explicitly, Goldfeld and Quandt suggest the following steps:
1. Order or rank the observations according to the value of X, beginning
with the lowest value of X.
2. Omit c central observations, and divide the remaining (n − c)
observations into two groups, each of (n − c)/2 observations.
3. Run separate OLS regressions on the first group and the second group.
Goldfeld-Quandt Test (continued)
4. Obtain the residual sums of squares from the two regressions: RSS1 for
group 1 and RSS2 for group 2, where group 1 is the small-variance group
and group 2 is the large-variance group.
5. Compute the ratio
λ = [RSS2/(n2 − k)] / [RSS1/(n1 − k)] (5.6)
where λ follows the F distribution with numerator and denominator
degrees of freedom (n2 − k) and (n1 − k), and nj = (n − c)/2.
6. If the computed λ (= F) is greater than the critical F at the chosen level
of significance, we reject the hypothesis of homoskedasticity; that is,
there is heteroskedasticity in the estimated model.
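A minimal pure-Python sketch of steps 1 to 6, using a hypothetical heteroskedastic sample (simple regression, so k = 2):

```python
import random

random.seed(7)

def rss(xs, ys):
    """Residual sum of squares from a simple OLS regression of y on x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

n, c, k = 100, 8, 2
data = sorted(
    ((x, 2.0 + 3.0 * x + random.gauss(0, x)) for x in
     (random.uniform(1, 10) for _ in range(n))),
    key=lambda p: p[0])  # step 1: rank observations by X

half = (n - c) // 2
low, high = data[:half], data[-half:]  # step 2: drop the c central obs

# steps 3-4: separate regressions, residual sums of squares
rss1 = rss([p[0] for p in low], [p[1] for p in low])    # small-variance group
rss2 = rss([p[0] for p in high], [p[1] for p in high])  # large-variance group

# step 5: the G-Q statistic (equal degrees of freedom in both groups)
df = half - k
lam = (rss2 / df) / (rss1 / df)
print(round(lam, 2))  # step 6: compare with the critical F(df, df) value
```

With the error standard deviation proportional to X, the high-X group has a much larger RSS, so λ lands well above typical F critical values and the null of homoskedasticity is rejected.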
Goldfeld-Quandt Test (continued)
• Perform the Goldfeld-Quandt test with Stata:
o Divide the observations into two groups. The data have 72 observations;
drop the c = 4 central observations (observations 35 to 38), leaving two
groups of 34 observations each.
Goldfeld-Quandt Test (continued)
o Regress group n1 (observations 1 to 34) based on Eq. (7.1) and store the
root MSE and residual degrees of freedom:
regress exp age income ownrent incomesq in 1/34
      Source |       SS       df       MS          Number of obs =      34
-------------+------------------------------       F(  4,    29) =    1.66
       Model |  72255.6521     4   18063.913       Prob > F      =  0.1860
    Residual |   315378.54    29  10875.1221       R-squared     =  0.1864
-------------+------------------------------       Adj R-squared =  0.0742
       Total |  387634.192    33  11746.4907       Root MSE      =  104.28
scalar rmse_n1=e(rmse)
scalar df_n1=e(df_r)
Goldfeld-Quandt Test (continued)
o Regress group n2 (observations 39 to 72) based on Eq. (7.1) and store
the root MSE and residual degrees of freedom:
regress exp age income ownrent incomesq in 39/72
      Source |       SS       df       MS          Number of obs =      34
-------------+------------------------------       F(  4,    29) =    0.38
       Model |  256113.216     4   64028.304       Prob > F      =  0.8179
    Residual |  4829752.59    29  166543.193       R-squared     =  0.0504
-------------+------------------------------       Adj R-squared = -0.0806
       Total |   5085865.8    33  154117.145       Root MSE      =   408.1
scalar rmse_n2=e(rmse)
scalar df_n2=e(df_r)
Goldfeld-Quandt Test (continued)
o Compute the ratio:
F = [σ̂₂²/(n₂ − k)] / [σ̂₁²/(n₁ − k)] (7.3)
* calculate the F-statistic; Eq. (7.3) puts the large-variance group in the numerator
scalar Fstat = rmse_n2^2/rmse_n1^2
* calculate the F-critical value at the 5% significance level
scalar Fcrit = invFtail(df_n2,df_n1,0.05)
Fstat = 15.314143
Fcrit = 1.8608114
o Since Fstat > Fcrit, we reject the null hypothesis of homoskedasticity.
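As a quick arithmetic check, the Goldfeld-Quandt statistic can be recomputed from the two reported Root MSE values; note that Eq. (7.3) places the large-variance group (n2, Root MSE = 408.1) in the numerator:

```python
rmse_n1 = 104.28  # Root MSE for group n1 (observations 1-34)
rmse_n2 = 408.1   # Root MSE for group n2 (observations 39-72)

# Eq. (7.3): large-variance group in the numerator; the equal df cancel
F = rmse_n2 ** 2 / rmse_n1 ** 2
print(round(F, 2))  # about 15.32, far above the 5% critical value of 1.86
```

Since this ratio far exceeds the 5% critical value of 1.86, the Goldfeld-Quandt test rejects homoskedasticity, consistent with the widening residual pattern against income.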
The Breusch-Pagan Test
• The Breusch-Pagan test investigates whether the squared residuals can
be explained by possible proportionality factors.
The Breusch-Pagan Test (continued)
Step 1: Estimate the original equation and obtain the residuals.
Step 2: Use the squared residuals as the dependent variable in an
auxiliary equation.
• The test statistic is NR², and the degrees of freedom equal the number
of slope coefficients in the auxiliary equation.
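A pure-Python sketch of this LM statistic on hypothetical data, with a single proportionality factor Z (so the auxiliary equation has one slope and the statistic is compared with a chi-squared distribution with 1 degree of freedom):

```python
import random

random.seed(3)

def ols_fit(xs, ys):
    """Intercept and slope of an OLS regression of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b

def r_squared(xs, ys):
    """R-squared of a simple OLS regression of y on x."""
    a, b = ols_fit(xs, ys)
    my = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

n = 500
zs = [random.uniform(1, 10) for _ in range(n)]
ys = [2.0 + 3.0 * z + random.gauss(0, z) for z in zs]  # error sd grows with Z

# Step 1: original regression, squared residuals
a, b = ols_fit(zs, ys)
e2 = [(y - (a + b * z)) ** 2 for z, y in zip(zs, ys)]

# Step 2: auxiliary regression of e^2 on Z; LM = N * R^2
lm = n * r_squared(zs, e2)
print(round(lm, 1))  # compare with the 5% chi2(1) critical value of 3.84
```

A statistic well above 3.84 rejects homoskedasticity at the 5% level.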
The Breusch-Pagan Test (continued)
Example: Woody's restaurant location
• Auxiliary-equation test result:
chi2(1) = 29.23
Prob > chi2 = 0.0000
• We reject the null hypothesis of homoskedasticity.
Remedies for Heteroskedasticity
When 𝝈𝟐𝒊 is known
• Generalized least squares (GLS) is a popular approach to treating
heteroskedasticity, and it is capable of producing estimators that are
BLUE.
• To see how this is accomplished, let us continue with the two-variable
model:
Yi = β0 + β1Xi + ui (6.1)
• Eq. (6.1) can be written as
Yi = β0X0i + β1Xi + ui (6.2)
where X0i = 1 for each i. We can see that these two formulations are
identical.
• Now, assume that the heteroskedastic variances σi² are known.
• Divide Eq. (6.2) by σi to obtain:
Yi/σi = β0(X0i/σi) + β1(Xi/σi) + ui/σi (6.3)
which for ease of exposition we write as:
Yi* = β0*X0i* + β1*Xi* + ui* (6.4)
• What is the purpose of transforming the original model?
• To see this, notice the following feature of the transformed error
term ui*:
var(ui*) = E(ui*²) = E[(ui/σi)²]
= (1/σi²) E(ui²) since σi² is known
= (1/σi²) σi² since E(ui²) = σi²
= 1
which is a constant.
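The derivation can be checked numerically: drawing ui with known standard deviation σi and dividing through, the sample variance of the transformed errors ui* comes out close to 1 (the σi values below are hypothetical):

```python
import random
import statistics

random.seed(11)

n = 10000
sigmas = [random.uniform(0.5, 5.0) for _ in range(n)]  # known sigma_i
u = [random.gauss(0, s) for s in sigmas]               # heteroskedastic errors
u_star = [ui / si for ui, si in zip(u, sigmas)]        # transformed errors

print(round(statistics.pvariance(u_star), 2))  # close to 1: homoskedastic
```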
• That is, the variance of the transformed disturbance term ui* is now
homoskedastic.
• Since we are still retaining the other assumptions of the classical model,
the finding that ui* is homoskedastic suggests that if we apply OLS to the
transformed model, Eq. (6.3), it will produce estimators that are BLUE.
• In short, the estimators β0* and β1* of the transformed model are BLUE,
while the OLS estimators β0 and β1 of the original model are not.
• This procedure of transforming the original variables so that the
transformed variables satisfy the assumptions of the CLRM, and then
applying OLS to them, is known as the method of generalized least
squares (GLS).
• In short, GLS is OLS on transformed variables that satisfy the standard
least-squares assumptions.
• The estimators thus obtained are known as GLS estimators, and it is
these estimators that are BLUE.
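A final sketch compares OLS and GLS, implemented here as weighted least squares with known σi, on a hypothetical two-variable model. Both estimators are centred on the true slope across repeated samples, but the GLS estimates are noticeably less dispersed, reflecting the BLUE property:

```python
import random
import statistics

random.seed(5)

def slope_ols(xs, ys):
    """OLS slope of a simple regression of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def slope_wls(xs, ys, sigmas):
    """GLS with known sigma_i: weighted least squares, weights 1/sigma_i^2."""
    w = [1 / s ** 2 for s in sigmas]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    return sxy / sxx

ols_est, gls_est = [], []
for _ in range(300):
    xs = [random.uniform(1, 10) for _ in range(100)]
    sigmas = list(xs)  # sigma_i known and proportional to x_i
    ys = [2.0 + 3.0 * x + random.gauss(0, s) for x, s in zip(xs, sigmas)]
    ols_est.append(slope_ols(xs, ys))
    gls_est.append(slope_wls(xs, ys, sigmas))

# GLS estimates cluster more tightly around the true slope than OLS
print(statistics.pvariance(gls_est) < statistics.pvariance(ols_est))  # True
```

Dividing every variable by σi (the transformation in Eq. 6.3) is algebraically the same as weighting each observation by 1/σi² in the least-squares objective, which is why WLS is the form GLS takes when σi² is known.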