
Introductory Econometrics Wooldridge Notes

Rajat Goyal

May 28, 2019

1 Chapter 1
1. An important feature of cross-sectional data is that we can often assume that they have been obtained
by random sampling from the underlying population.

(a) A possible violation is sample selection bias, where, for example, families with high wealth do not disclose their wealth.

2. A key feature of time series data that makes it more difficult to analyze than cross-sectional data is the fact that economic observations can rarely, if ever, be assumed to be independent across time.

(a) Seasonal patterns in the data need to be handled properly

3. Pooled data consists of several independent cross-sectional data sets collected over time and put together in one dataset. This differs from panel data, where the same sample units from one cross-sectional data set are observed over time.

2 Chapter 2: Simple Linear Regression


1. SLR Assumptions

(a) Linear in parameters: y = β0 + β1 x + u


(b) Random sampling

(c) Equal (which implies zero) conditional mean given any values of the independent variables, that is,
E(u|x) = E(u) = 0 for all x
(d) Sample variation in the independent variable, that is, not all xi are the same
(e) Homoskedasticity: Var(u|x) = σ²

2. The zero conditional mean assumption, coupled with the random sampling assumption, allows for a convenient technical simplification. In particular, we can derive the statistical properties of the OLS estimators as conditional on the values of the xi in our sample. Technically, in statistical derivations, conditioning on the sample values of the independent variable is the same as treating the xi as fixed in repeated samples. In nonexperimental contexts, fixed in repeated sampling is not realistic: for example, think about fixing the levels of education (xi) and then going out looking for wages of men with that level of education. This is not realistic. Instead, we randomly select men and note both education and wage (random sampling).

3. Spurious Correlation: Using simple regression when u contains factors affecting y that are also correlated with x can result in spurious correlation: that is, we find a relationship between y and x that is really due to other unobserved factors that affect y and also happen to be correlated with x.
4. Difference Between Errors and Residuals: The errors ui are never observable, while the residuals ûi are
computed from the data.
yi = β0 + β1 xi + ui
yi = βˆ0 + βˆ1 xi + ûi

5. Algebraic properties of linear regression are

(a) The sample average of the residuals is zero.

(b) The sample covariance between each independent variable and the OLS residuals is zero. Consequently, the sample covariance between the OLS fitted values and the OLS residuals is zero.

(c) The point (x̄, ȳ) lies on the regression line.

6. SST = SSE + SSR (Total = Explained + Residual)

SST = Σ_{i=1}^{n} (yi − ȳ)²

SSE = Σ_{i=1}^{n} (ŷi − ȳ)²

SSR = Σ_{i=1}^{n} (yi − ŷi)² = Σ_{i=1}^{n} ûi²
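A quick numerical check of this decomposition is sketched below (a minimal example with simulated data, assuming numpy is available; none of the numbers come from the text). It fits the simple regression by the usual OLS formulas and verifies SST = SSE + SSR along with the zero-mean-residual property from item 5.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.normal(size=200)                 # simulated data: beta0 = 1, beta1 = 2

beta1_hat = np.cov(x, y, bias=True)[0, 1] / np.var(x)    # OLS slope = sample cov / sample var
beta0_hat = y.mean() - beta1_hat * x.mean()              # OLS intercept
y_hat = beta0_hat + beta1_hat * x
u_hat = y - y_hat

SST = np.sum((y - y.mean()) ** 2)
SSE = np.sum((y_hat - y.mean()) ** 2)
SSR = np.sum(u_hat ** 2)
print(np.isclose(SST, SSE + SSR))    # True: SST = SSE + SSR
print(np.isclose(u_hat.mean(), 0))   # True: residuals average to zero (property 5a)
print("R-squared:", SSE / SST)
```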

3 Chapter 3: Multiple Linear Regression


1. Regression Through the Origin

(a) Drawback is that if the intercept is not truly zero, the slope estimators will be biased.

(b) Advantage is that if the intercept is truly zero, the variance of the slope estimators will be smaller.

2. By minimizing the sum of squared residuals SSR, we get

β̂ = (XᵀX)⁻¹Xᵀy

ŷ = Xβ̂ = X(XᵀX)⁻¹Xᵀy = P y

û = y − ŷ = (I − P)y = Q y

where P = X(XᵀX)⁻¹Xᵀ and Q = I − P
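A minimal numpy sketch of these matrix formulas (simulated data; in practice one would use a linear solver rather than an explicit inverse):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # intercept + two regressors
y = X @ np.array([1.0, 0.5, -2.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y          # beta-hat = (X'X)^{-1} X'y
P = X @ XtX_inv @ X.T                 # projection ("hat") matrix
Q = np.eye(n) - P

y_hat = P @ y                         # fitted values
u_hat = Q @ y                         # residuals
print(beta_hat)
print(np.allclose(y, y_hat + u_hat))  # True: y decomposes into fitted values + residuals
print(np.allclose(X.T @ u_hat, 0))    # True: residuals orthogonal to every column of X
```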
3. MLR Assumptions

(a) MLR.1: Linear in parameters: y = β~ T ~x + u


(b) MLR.2: Random sampling

(c) MLR.3: Equal (which implies zero) conditional mean given any values of the independent variables,
that is, E(u|~x) = E(u) = 0 for all ~x
i. Fails if

A. The functional relationship between the explained and explanatory variables is misspecified

B. Omitting an important factor that is correlated with any of the explanatory variables

C. Measurement error in an explanatory variable

ii. If the assumption holds, we say that we have exogenous explanatory variables. If xi is correlated
with u for any reason, then xi is said to be an endogenous explanatory variable.

(d) MLR.4: No perfect collinearity: In the sample (and therefore in the population), none of the
independent variables is constant, and there are no exact linear relationships among the independent
variables

(e) MLR.5: Homoskedasticity: Var(u|~x) = σ²


i. Together, assumptions MLR.1 to MLR.5 are known as the Gauss-Markov assumptions for
cross-sectional regressions

4. Under assumptions MLR.1 to MLR.4, estimators for parameters are unbiased.

5. Overspecification: Estimators are still unbiased although variance of the OLS estimators can be higher.

6. Underspecification or Omitted Variable Bias: Estimators are biased. In the simple case where the true model is y = β0 + β1x1 + β2x2 + u but x2 is omitted,

E(β̃1) = β1 + β2δ̂1

where δ̂1 is the OLS slope from the regression x2 = δ0 + δ1x1 + v

(a) This means that if x1 and x2 are uncorrelated in the sample, then β̃1 is unbiased, since δ̂1 = 0

(b) In general, if a variable is omitted, all OLS estimators are biased even if the corresponding explanatory variable is uncorrelated with the missing variable

7. Under the Gauss-Markov assumptions, Cov(β̂) = σ²(XᵀX)⁻¹. For the individual variances, this can also be written as

Var(β̂j) = σ² / (SSTj(1 − Rj²))

where Rj² is the R-squared obtained by regressing xj on the other independent variables.

(a) The larger the error variance, the larger the variance of the OLS estimators

(b) The more sample variation in the explanatory variables, the smaller the variance of the OLS estimators

(c) The larger Rj² (more multicollinearity), the larger the variance of the corresponding OLS estimator

(d) Further, under the Gauss-Markov assumptions

Cov(ŷ) = σ²P
Cov(û) = σ²Q
Cov(ŷ, û) = σ²PQ = 0
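The component form of this variance can be checked against a fitted model: SSTj and Rj² come from the auxiliary regression of xj on the other regressors. A sketch using statsmodels with simulated, deliberately correlated regressors (all names and numbers are illustrative):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 500
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)                  # correlated with x1, so R_1^2 > 0
y = 1 + 0.5 * x1 - 0.3 * x2 + rng.normal(size=n)
X = sm.add_constant(np.column_stack([x1, x2]))

res = sm.OLS(y, X).fit()
sigma2_hat = res.ssr / res.df_resid                 # sigma^2-hat = SSR / (n - k)

# auxiliary regression of x1 on the remaining regressors (constant and x2)
aux = sm.OLS(x1, sm.add_constant(x2)).fit()
SST1 = np.sum((x1 - x1.mean()) ** 2)
var_b1 = sigma2_hat / (SST1 * (1 - aux.rsquared))   # formula from item 7

print(np.sqrt(var_b1))   # matches the usual OLS standard error reported for x1:
print(res.bse[1])
```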

8. Omitted Variable Bias: The above formula shows that the variance of the OLS estimator β̂1 is smaller if the variable x2 is excluded, but the estimator is then biased. In large samples we should definitely include x2, since the bias remains while the difference in the variances shrinks.

9. Under the Gauss-Markov assumptions, σ̂² = SSR/(n − k), where k is the number of parameters, is an unbiased estimator of σ²

(a) Addition of an explanatory variable can either increase or decrease σ̂ since SSR reduces but k
increases by 1

(b) Note that σ̂ is not an unbiased estimator of σ

10. Gauss-Markov theorem: Under Gauss-Markov assumptions, OLS is BLUE

(a) A linear estimator is one that can be expressed as a linear function of the data on the dependent variable, that is, β̂j = Σ_{i=1}^{n} wij yi

4 Chapter 4: Inference
1. Make an additional assumption for MLR

(a) MLR.6: Normality: The population error u is independent of the explanatory variables and is normally distributed with zero mean and variance σ²
(b) The first five assumptions + normality are called the classical linear model (CLM) assumptions

2. Under the CLM assumptions, the OLS estimator is the minimum variance unbiased estimator (MVUE)

3. Under MLR.6, the OLS estimators are also normally distributed, since they are linear combinations of the (normally distributed) errors.
β̂ ∼ N(β, σ²(XᵀX)⁻¹)

ŷ = P y ∼ N(Xβ, σ²P)

û = Q y ∼ N(0, σ²Q)

ŷ and û are independent, since they are jointly normal and have zero covariance.

4. t-test:

(β̂j − βj) / se(β̂j) = (β̂j − βj) / √((σ̂²(XᵀX)⁻¹)jj) ∼ t_{n−k}

(n − k)σ̂² / σ² ∼ χ²_{n−k}
5. Recommended practice is to use smaller (1% instead of 5%) significance levels as the sample size increases, to counter the fact that standard errors also shrink.

6. Large standard errors are normally caused by small sample sizes. Remember that large standard errors
can also be a result of multicollinearity (high correlation among some of the independent variables), even
if the sample size seems fairly large.

7. F-test: To test whether a subset of q of the parameters is jointly different from zero, use

F = [(SSRr − SSRur)/q] / [SSRur/(n − k)] = [(Rur² − Rr²)/q] / [(1 − Rur²)/(n − k)] ∼ F_{q,n−k}

In vector form,

(β̂ − β)ᵀ(XᵀX)(β̂ − β) / (kσ̂²) ∼ F_{k,n−k}

To test a hypothesis on a set of q linear combinations of the parameters,

(η̂ − η)ᵀ Σ⁻¹ (η̂ − η) / (qσ̂²) ∼ F_{q,n−k}

where η = Aβ and Σ = A(XᵀX)⁻¹Aᵀ
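A sketch of the R²-form of the F statistic for q exclusion restrictions, checked against statsmodels' built-in comparison (the data and the choice of which regressors are "excluded" are purely illustrative):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(3)
n = 300
X = sm.add_constant(rng.normal(size=(n, 4)))            # constant + x1..x4
y = X @ np.array([1.0, 0.5, 0.0, 0.0, -0.2]) + rng.normal(size=n)   # x2, x3 irrelevant

unrestricted = sm.OLS(y, X).fit()
restricted = sm.OLS(y, X[:, [0, 1, 4]]).fit()           # H0: coefficients on x2, x3 are zero

q, k = 2, X.shape[1]
F = ((unrestricted.rsquared - restricted.rsquared) / q) / (
    (1 - unrestricted.rsquared) / (n - k))
print(F, stats.f.sf(F, q, n - k))                       # statistic and p-value
print(unrestricted.compare_f_test(restricted))          # same F and p-value from statsmodels
```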

8. Confidence and prediction intervals: For an observation in the sample,

(ŷi − E(yi)) / (σ̂ √pii) ∼ t_{n−k}

where pii is the ith diagonal element of P. If y0 is a new observation not in the training dataset, then

(ŷ0 − E(y0)) / (σ̂ √(x0ᵀ(XᵀX)⁻¹x0)) ∼ t_{n−k}

(ŷ0 − y0) / (σ̂ √(1 + x0ᵀ(XᵀX)⁻¹x0)) ∼ t_{n−k}
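A sketch computing these intervals directly from the matrix quantities (simulated data; x0 is an arbitrary new point and 95% coverage is assumed):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 120
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([2.0, 1.5]) + rng.normal(size=n)
k = X.shape[1]

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
sigma2_hat = np.sum((y - X @ beta_hat) ** 2) / (n - k)

x0 = np.array([1.0, 0.7])                       # regressor values for the new observation
y0_hat = x0 @ beta_hat
h0 = x0 @ XtX_inv @ x0                          # x0' (X'X)^{-1} x0
t_crit = stats.t.ppf(0.975, n - k)

se_mean = np.sqrt(sigma2_hat * h0)              # standard error for estimating E(y0)
se_pred = np.sqrt(sigma2_hat * (1 + h0))        # standard error of the prediction error
print("95% CI for E(y0):", (y0_hat - t_crit * se_mean, y0_hat + t_crit * se_mean))
print("95% prediction interval:", (y0_hat - t_crit * se_pred, y0_hat + t_crit * se_pred))
```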

5 Chapter 5: OLS Asymptotics


1. One practically important finding is that even without the normality assumption (Assumption MLR.6), the t and F statistics have approximately t and F distributions, at least in large samples

2. Under assumptions MLR.1 through MLR.4, the OLS estimators are consistent

(a) This theorem holds even if we replace assumption MLR.3 with the weaker assumption of E(u) = 0
and Cov(xj , u) = 0 for all j
(b) If the error is correlated with any of the explanatory variables, then all of the OLS estimators are
biased and inconsistent

3. Asymptotic Normality: Under Gauss-Markov assumptions, OLS estimators are asymptotically normally
distributed

(a) σ̂² is a consistent estimator of σ²

(b) σ̂ is a consistent estimator of σ

4. Lagrange Multiplier (LM) Statistic for q Exclusion Restrictions

(a) Regress y on the restricted set of independent variables and save the residuals

(b) Regress the residuals on all (for technical reasons) of the independent variables and obtain the R-squared, say Ru²
(c) Compute LM = nRu²
(d) Compare LM to the appropriate critical value, c, in a χ²q distribution. If LM > c, the null hypothesis
is rejected

(e) If the null hypothesis (that the q excluded parameters are zero) is true, then Ru² should be close to zero, since the residuals from the restricted regression are then (approximately) uncorrelated with the excluded explanatory variables
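A sketch of the LM (nR²) test for two exclusion restrictions, with simulated data (the variable names and the choice q = 2 are illustrative):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(5)
n = 400
x = rng.normal(size=(n, 3))
y = 1 + 0.4 * x[:, 0] + rng.normal(size=n)          # x2 and x3 are irrelevant in truth

# (a) restricted regression (x2 and x3 excluded) and its residuals
u_tilde = sm.OLS(y, sm.add_constant(x[:, 0])).fit().resid

# (b) regress the residuals on ALL independent variables
R2_u = sm.OLS(u_tilde, sm.add_constant(x)).fit().rsquared

# (c)-(d) LM statistic and chi-squared p-value with q = 2 degrees of freedom
q = 2
LM = n * R2_u
print(LM, stats.chi2.sf(LM, q))
```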

6 Chapter 6: Further Issues


1. Adjusted R-Squared:

R² = 1 − SSR/SST

Adj R² = 1 − σ̂u²/σ̂y² = 1 − [SSR/(n − k)] / [SST/(n − 1)]

(a) Adjusted R² is useful for comparing non-nested alternatives for a model

(b) Neither R² nor adjusted R² can be used to compare models with different dependent variables

   
2. Predicting y when log(y) is the dependent variable: letting m̂i denote the exponential of the fitted value of log(yi) from the log regression, predict ŷi = exp(σ̂²/2) · m̂i

(a) This prediction is consistent but not unbiased

(b) We rely on normality of u to reach the above conclusion. Since that is not always true, a better method is ŷi = α̂0 · m̂i, where α̂0 is an estimator of E(exp(u)). A consistent estimator α̂0 is obtained as the slope estimate from the regression (through the origin) of yi on m̂i
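A sketch of both corrections on simulated log-linear data (m_hat plays the role of the exponentiated fitted values defined above; all numbers are arbitrary):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 500
x = rng.normal(size=n)
logy = 0.5 + 0.8 * x + rng.normal(scale=0.6, size=n)
y = np.exp(logy)

X = sm.add_constant(x)
log_res = sm.OLS(logy, X).fit()
m_hat = np.exp(log_res.fittedvalues)             # exponentiated fitted values of log(y)

# normality-based correction: y-hat = exp(sigma^2-hat / 2) * m-hat
sigma2_hat = log_res.ssr / log_res.df_resid
y_hat_normal = np.exp(sigma2_hat / 2) * m_hat

# regression-through-the-origin correction: alpha0-hat from regressing y on m-hat
alpha0_hat = sm.OLS(y, m_hat).fit().params[0]
y_hat_alpha = alpha0_hat * m_hat
print(np.exp(sigma2_hat / 2), alpha0_hat)        # the two correction factors are close here
```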

7 Chapter 7: Multiple Regression Analysis with Qualitative Information: Binary (or Dummy) Variables
1. Linear Probability Model: If y is a binary 0/1 variable, the standard regression model gives us the
probability E(y|x) = P (y = 1|x)

(a) Due to the binary nature of y, the linear probability model does violate one of the Gauss-Markov
assumptions. When y is a binary variable, its variance, conditional on x, is

V ar(y|x) = E(y|x) (1 − E(y|x))

which is not independent of x


(b) Drawbacks are that it can give a predicted probability greater than one or less than zero, and it implies a constant marginal effect of each explanatory variable that appears in its original form

8 Chapter 8: Heteroskedasticity
1. Heteroskedasticity does not cause bias or inconsistency in the OLS estimators

2. Heteroskedasticity-robust standard errors: They are valid, at least in large samples, whether or not the errors have constant variance, and we do not need to know which is the case.

Var(β̂j) = [Σ_{i=1}^{n} r̂ij² ûi²] / SSRj²

where r̂ij denotes the ith residual from regressing xj on all the other independent variables, and SSRj is the sum of squared residuals from that regression

(a) The robust standard errors and robust t statistics are justified only as the sample size becomes large. With small sample sizes, the robust t statistics can have distributions that are not very close to the t distribution, which could throw off our inference.
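The formula above is the White (HC0) robust variance. A sketch computing it by hand for one coefficient and comparing it with statsmodels' HC0 option (simulated heteroskedastic data):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 400
x1, x2 = rng.normal(size=n), rng.normal(size=n)
u = rng.normal(size=n) * (0.5 + np.abs(x1))          # error variance depends on x1
y = 1 + 0.6 * x1 - 0.4 * x2 + u
X = sm.add_constant(np.column_stack([x1, x2]))

res = sm.OLS(y, X).fit()
u_hat = res.resid

# robust variance of beta1-hat from the formula in the notes
aux = sm.OLS(x1, sm.add_constant(x2)).fit()          # regress x1 on the other regressors
r_hat = aux.resid
var_b1_robust = np.sum(r_hat**2 * u_hat**2) / aux.ssr**2

print(np.sqrt(var_b1_robust))                        # hand-computed robust SE for x1
print(sm.OLS(y, X).fit(cov_type="HC0").bse[1])       # statsmodels HC0 robust SE (matches)
```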

3. Heteroskedasticity-Robust LM Statistic for q Exclusion Restrictions

(a) Obtain the residuals û from the restricted model

(b) Regress each of the independent variables excluded under the null on all of the included independent variables; if there are q excluded variables, this leads to q sets of residuals r̂1, r̂2, ..., r̂q

(c) Find the products of each set of residuals r̂j with û (observation by observation)

(d) Run the regression of 1 on the q products r̂j · û, without an intercept. The heteroskedasticity-robust LM statistic is LM = n − SSR1, where SSR1 is just the usual sum of squared residuals from this final regression. Under the null hypothesis, LM is distributed approximately as χ²q

4. Tests for Heteroskedasticity

(a) Breusch-Pagan test: Run û² = δ0 + δ1x1 + ... + δ_{k−1}x_{k−1} + ν and test H0: δ1 = δ2 = ... = δ_{k−1} = 0

(b) White test: Add the squares xj² and cross-products xixj to the regression in the above test

(c) Modified White test: Run û² = δ0 + δ1ŷ + δ2ŷ² + ν and test H0: δ1 = δ2 = 0


i. Remember that using y instead of ŷ does not produce a valid test

ii. Advantage is that it increases degrees of freedom but disadvantage is that it imposes restrictions
on the parameters of the White test

(d) We have interpreted a rejection using one of the heteroskedasticity tests as evidence of heteroskedasticity. This is appropriate provided we maintain Assumptions MLR.1 through MLR.4. But if MLR.3 is violated, in particular if the functional form of E(y|x) is misspecified, then a test for heteroskedasticity can reject the null hypothesis even if Var(y|x) is constant
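A sketch of the Breusch-Pagan LM statistic computed by hand and via statsmodels, plus the fitted-values (modified White) version (simulated heteroskedastic data; the regressors are arbitrary):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan
from scipy import stats

rng = np.random.default_rng(8)
n = 500
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1 + x1 - x2 + rng.normal(size=n) * (0.5 + x1**2)     # heteroskedastic errors
X = sm.add_constant(np.column_stack([x1, x2]))

res = sm.OLS(y, X).fit()
u2 = res.resid ** 2

# Breusch-Pagan by hand: LM = n * R^2 from regressing u-hat^2 on the regressors
LM = n * sm.OLS(u2, X).fit().rsquared
print(LM, stats.chi2.sf(LM, 2))
print(het_breuschpagan(res.resid, X)[:2])                # statsmodels: (LM, p-value)

# modified White test: regress u-hat^2 on y-hat and y-hat^2
Z = sm.add_constant(np.column_stack([res.fittedvalues, res.fittedvalues**2]))
LM_w = n * sm.OLS(u2, Z).fit().rsquared
print(LM_w, stats.chi2.sf(LM_w, 2))
```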

5. Weighted Least Squares: If we are given that Var(y|x) = σ²h(x), where h(x) is known, then we divide the equation through by √h(x) and run the regression y/√h(x) = β0·(1/√h(x)) + β1·x1/√h(x) + ... + β_{k−1}·x_{k−1}/√h(x) + u/√h(x). The transformed equation satisfies the classical linear model assumptions (MLR.1 through MLR.6) if the original model does so, except for the homoskedasticity assumption

6. Feasible GLS: Model Var(u|x) = σ² exp(δ0 + δ1x1 + ... + δ_{k−1}x_{k−1})

(a) Obtain the residuals û from the standard OLS regression

(b) Run the regression of log(û²) on the independent variables (as an alternative, we can also use ŷ and ŷ² as the independent variables) and obtain the fitted values ĝ

(c) Exponentiate the fitted values: ĥ = exp(ĝ)

(d) Estimate the standard equation by WLS, using ĥi in place of h(xi)

(e) The FGLS estimator is not unbiased and hence not BLUE. Nevertheless, the FGLS estimator is consistent and asymptotically more efficient than OLS
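A sketch of this FGLS procedure using statsmodels' WLS (simulated data in which the variance really does have the exponential form; all names and coefficients are illustrative):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
n = 600
x = rng.normal(size=n)
h = np.exp(-0.5 + 1.0 * x)                          # true variance function h(x)
y = 2 + 0.7 * x + rng.normal(size=n) * np.sqrt(h)
X = sm.add_constant(x)

u_hat = sm.OLS(y, X).fit().resid                    # (a) OLS residuals

g_hat = sm.OLS(np.log(u_hat**2), X).fit().fittedvalues   # (b) regress log(u-hat^2) on x
h_hat = np.exp(g_hat)                               # (c) exponentiate the fitted values

fgls = sm.WLS(y, X, weights=1.0 / h_hat).fit()      # (d) WLS with weights 1 / h-hat
print(fgls.params)
print(fgls.bse)
```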

9 Chapter 9: More on Specification and Data Problems


1. Functional Form Misspecification:

(a) Regression Specification Error Test (RESET): Run y = β0 + β1x1 + ... + β_{k−1}x_{k−1} + δ1ŷ² + δ2ŷ³ + u and test H0: δ1 = δ2 = 0 (ŷ itself cannot be added, since it is an exact linear function of the regressors)
(b) Testing Against Non-Nested Alternatives: The first approach is to construct a comprehensive model that contains each model as a special case and then to test the restrictions that lead to each of the models. The other approach is to use the fitted values ŷ from one model as an extra explanatory variable in the other model and test whether the coefficient on ŷ is zero

i. If both models are accepted, use adjusted R² to choose

ii. If both models are rejected, more work is needed

2. Omitted Variables Bias: How do we solve the problem of knowing that y depends on some explanatory variable for which we have no data, either because the variable is unobservable or because data on it were not collected? Assume a three-variable model y = β0 + β1x1 + β2x2 + β3x3* + u, where x3* is the unobserved variable. Replace x3* with a proxy variable x3, and assume x3* = δ0 + δ3x3 + v3. This procedure gives consistent estimators for β1 and β2 if

(a) E(u|x1 , x2 , x∗3 , x3 ) = 0


(b) E(x∗3 |x1 , x2 , x3 ) = E(x∗3 |x3 ) = δ0 + δ3 x3 , that is, once x3 is controlled for, the expected value of x∗3
does not depend on x1 or x2

3. Measurement Error in the Dependent Variable: Measurement error in the dependent variable can cause
biases in OLS if it is systematically related to one or more of the explanatory variables. If the measure-
ment error is just a random reporting error that is independent of the explanatory variables, as is often
assumed, then OLS is perfectly appropriate

(a) If the expectation of the measurement error is not zero, we get a biased estimate of the intercept, but the slope estimators are still unbiased. This is rarely a problem

(b) The variance of the error term becomes larger since it also includes variance in the measurement
error now. So the variance in the OLS estimators becomes larger but this is as expected

4. Measurement Error in an Explanatory Variable: Say the true model is y = β0 + β1x1* + u, but we only observe x1 in place of x1*. Define the measurement error to be e1 = x1 − x1*. Then y = β0 + β1x1 + (u − β1e1). Assume that E(e1) = 0 and E(u|x1*, x1) = E(u|x1*) = 0. Both these assumptions are uncontroversial and hold almost by definition.

(a) If we assume that Cov(x1, e1) = 0, then OLS using x1 gives unbiased and consistent estimators of β0 and β1

(b) The classical errors-in-variables (CEV) assumption is instead that Cov(x1*, e1) = 0. Then

Cov(x1, e1) = σ²_{e1}

Cov(x1, u − β1e1) = −β1σ²_{e1}

Thus, in the CEV case, the OLS regression of y on x1 gives a biased and inconsistent estimator. One can show that

plim(β̂1) = β1 · σ²_{x1*} / (σ²_{x1*} + σ²_{e1})

so plim(β̂1) is always closer to zero than β1 (attenuation bias)
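A small simulation of the attenuation result (all variances and coefficients are arbitrary choices; the observed slope shrinks toward zero by exactly the factor above):

```python
import numpy as np

rng = np.random.default_rng(10)
n, beta1 = 100_000, 2.0
sig2_xstar, sig2_e = 1.0, 0.5

x_star = rng.normal(scale=np.sqrt(sig2_xstar), size=n)   # true, unobserved regressor
e1 = rng.normal(scale=np.sqrt(sig2_e), size=n)           # CEV: e1 uncorrelated with x_star
x1 = x_star + e1                                         # observed, mismeasured regressor
y = 1.0 + beta1 * x_star + rng.normal(size=n)

b1_hat = np.cov(x1, y)[0, 1] / np.var(x1, ddof=1)        # OLS slope of y on x1
print(b1_hat)                                            # roughly 2 * 1.0/1.5 = 1.33, not 2
print(beta1 * sig2_xstar / (sig2_xstar + sig2_e))        # the plim predicted above
```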

5. Violations of the Random Sampling Assumption MLR.2

(a) Missing Data: Some of the numbers in the data matrix are missing. Ignore the observations that
have any missing data

i. If the data are missing at random, then the size of the random sample available from the
population is simply reduced but MLR.2 continues to hold

ii. There are ways to use the information on observations where only some variables are missing,
but this is not often done in practice. The improvement in the estimators is usually slight,
while the methods are somewhat complicated. In most cases, we just ignore the observations
that have missing information.

(b) Nonrandom Samples: Exogenous sample selection (selection is based on the values of the indepen-
dent variables) is not a problem whereas endogenous sample selection (selection is based on the
value of the dependent variable) is a problem

(c) Outlying Observations: Rather than trying to find outlying observations in the data before applying least squares, we can use an estimation method that is less sensitive to outliers than OLS. This obviates the need to explicitly search for outliers before estimation. One such method is called least absolute deviations, or LAD. The LAD estimator minimizes the sum of the absolute values of the residuals, rather than the sum of squared residuals. Compared with OLS, LAD gives less weight to large residuals. Thus, it is less influenced by changes in a small number of observations.

10 Chapter 10: Basic Regression Analysis with Time Series Data
1. Formally, a sequence of random variables indexed by time is called a stochastic process or a time series
process. When we collect a time series data set, we obtain one possible outcome, or realization, of the
stochastic process. We can only see a single realization, because we cannot go back in time and start the process over again. (This is analogous to cross-sectional analysis where we can collect only one random sample.) However, if certain conditions in history had been different, we would generally obtain a different realization for the stochastic process, and this is why we think of time series data as the
outcome of random variables. The set of all possible realizations of a time series process plays the role
of the population in cross-sectional analysis.

2. Static/Contemporaneous Model: yt = β0 + β1 zt + ut
3. Finite Distributed Lag Model of order q : yt = α0 + δ0 zt + δ1 zt−1 + ... + δq zt−q + ut

(a) The lag distribution summarizes the dynamic effect that a temporary increase in z has on y
(b) The Long Run Propensity (LRP) or Long Run Multiplier is the dynamic effect on y of a permanent increase in z: LRP = δ0 + δ1 + ... + δq

4. TS Assumptions

(a) TS.1: Linear in Parameters

(b) TS.2: Zero Conditional Mean: E(ut |X) = 0 for all t


i. If E(ut |x~t ) = 0, then we say that X is contemporaneously exogenous

ii. TS.2 is called strict exogeneity

iii. In the cross-sectional case, we did not explicitly state how the error term ui is related to the
explanatory variables for other observations in the sample. The reason this was unnecessary
is that, with random sampling (Assumption MLR.2), ui is automatically independent of the
explanatory variables for observations other than i. In a time series context, random sampling
is almost never appropriate, so we must explicitly assume that the expected value of ut is not
related to the explanatory variables in any time periods

iv. Assumption TS.2 requires not only that ut and zt are uncorrelated, but that ut is also uncorrelated with past and future values of z. This has two implications. First, z can have no lagged effect on y. If z does have a lagged effect on y, then we should estimate a distributed lag model. A more subtle point is that strict exogeneity excludes the possibility that changes in the error term today can cause future changes in z. This effectively rules out feedback from y to future values of z
(c) TS.3: No Perfect Collinearity: In the sample (and therefore in the underlying time series process),
no independent variable is constant or a perfect linear combination of the others

(d) TS.4: Homoskedasticity: Conditional on X, the variance of ut is the same for all t

V ar(ut |X) = σ 2

(e) TS.5: No Serial Correlation: Conditional on X, the errors in two different time periods are uncorrelated: Corr(ut, us|X) = 0 for all t ≠ s
(f ) TS.6: Normality: The errors ut are independent of X and are independently and identically dis-
tributed as N (0, σ 2 )
(g) TS.1 through TS.5 are the appropriate Gauss-Markov assumptions for time series applications

(h) TS.1 through TS.6 are the Classical Linear Model (CLM) assumptions for time series applications

5. Unbiasedness: Under assumptions TS.1, TS.2, and TS.3, the OLS estimators are unbiased conditional on X, and therefore unconditionally as well: E(β̂) = β

6. Under assumptions TS.1 through TS.5,

Var(β̂j|X) = σ² / (SSTj(1 − Rj²))

7. Under assumptions TS.1 through TS.5, σ̂² = SSR/(n − k) is an unbiased estimator of σ²
8. Gauss-Markov theorem: Under the Gauss-Markov assumptions, OLS is BLUE

9. Under assumptions TS.1 through TS.6, all sampling distributions are normal and the standard t, F, chi
squared tests hold

10. Spurious Regression: The phenomenon of finding a relationship between two or more trending variables simply because each is growing over time is an example of spurious regression

11. Computing R² When the Dependent Variable is Trending: Estimating the error variance when yt is trending is no problem, provided a time trend is included in the regression. However, when E(yt) follows, say, a linear time trend, SST/(n − 1) is no longer an unbiased or consistent estimator of Var(yt). In fact, SST/(n − 1) can substantially overestimate the variance in yt, because it does not account for the trend in yt. Instead, compute a detrended series ỹt by regressing yt on t, and then use that detrended series when computing the R² measure for the regression with all the explanatory variables plus t

(a) Rather than including a time trend in a regression, we can instead dierence those variables that
show obvious trends.

11 Chapter 11: Further Issues in Using OLS with Time Series Data
1. Strict Sense Stationary: A stationary time series process is one whose probability distributions are stable
over time in the following sense: if we take any collection of random variables in the sequence and then
shift that sequence ahead h time periods, the joint probability distribution must remain unchanged

2. Covariance Stationary: A stochastic process with finite second moment is covariance stationary if (i) E(xt) is constant; (ii) Var(xt) is constant; (iii) Cov(xt, xt+h) depends only on h and not on t

3. Why is Stationarity Useful? On a practical level, if we want to understand the relationship between two
or more variables using regression analysis, we need to assume some sort of stability over time. If we
allow the relationship between two variables to change arbitrarily in each time period, then we cannot
hope to learn much about how a change in one variable affects the other variable if we only have access
to a single time series realization

4. Weakly Dependent Time Series: Loosely speaking, a stationary time series process is said to be weakly
dependent if xt and xt+h are almost independent as h increases without bound. A similar statement
holds true if the sequence is nonstationary, but then we must assume that the concept of being almost
independent does not depend on the starting point, t.

(a) This is a completely different concept from stationarity

5. Asymptotically Uncorrelated Time Series: Covariance stationary sequences where Corr(xt , xt+h ) → 0 as
h→∞ are said to be asymptotically uncorrelated. Intuitively, this is how we will usually characterize
weak dependence

6. Why is weak dependence important for regression analysis? Essentially, it replaces the assumption of
random sampling in implying that the law of large numbers (LLN) and the central limit theorem (CLT)
hold. The most well-known central limit theorem for time series data requires stationarity and some
form of weak dependence. Time series that are not weakly dependent do not generally satisfy the CLT,
which is why their use in multiple regression analysis can be tricky. Examples of weakly dependent time
series are

(a) MA(1): xt = α1e_{t−1} + et

(b) Stable AR(1): yt = ρ1y_{t−1} + et with |ρ1| < 1 (stability condition)

7. Trend Stationary Process: A series that is stationary about its time trend, as well as weakly dependent,
is often called a trend-stationary process. (Notice that the name is not completely descriptive because
we assume weak dependence along with stationarity.) Such processes can be used in regression analysis,
provided appropriate time trends are included in the model

8. Assumptions

(a) TS.1': Assumption TS.1 + Weak Dependence: In other words, the law of large numbers and the
central limit theorem can be applied to sample averages

(b) TS.2': Contemporaneous Exogeneity

(c) TS.3': No Perfect Collinearity

(d) TS.4': Homoskedasticity: Var(ut|xt) = σ²


(e) TS.5': No Serial Correlation: Corr(ut , us |x~t , x~s ) = 0
(f ) Note that stationarity is not assumed anywhere but it is a useful property to have

9. Under assumptions TS.1' through TS.3', the OLS estimators are consistent

10. Under assumptions TS.1' through TS.5', the OLS estimators are asymptotically normally distributed

11. Transformations on Highly Persistent Time Series: Weakly dependent processes are said to be integrated
of order zero, I(0). Practically, this means that nothing needs to be done to such series before using them
in regression analysis: averages of such sequences already satisfy the standard limit theorems. Unit root
processes, such as a random walk (with or without drift), are said to be integrated of order one, or I(1).
This means that the first difference of the process is weakly dependent (and often stationary)

12. When models have complete dynamics in the sense that no further lags of any variable are needed in
the equation, the errors will be serially uncorrelated (assumption TS.5' is satisfied). Formally, the model
yt = β0 + β1 xt1 + ... + βk−1 xt,k−1 + ut is dynamically complete if E(yt |xt , yt−1 , xt−1 , yt−2 , xt−2 , ...) =
E(yt |xt ). In other words, whatever is in xt , enough lags have been included so that further lags of y and
the explanatory variables do not matter for explaining yt . Since specifying a dynamically complete model
means that there is no serial correlation, does it follow that all models should be dynamically complete?
For forecasting purposes, the answer is yes. In static and distributed lag models, the dynamically
complete assumption is often false, which generally means the errors will be serially correlated

12 Chapter 12: Serial Correlation and Heteroskedasticity in Time Series Regressions
1. We saw in Chapter 11 that when, in an appropriate sense, the dynamics of a model have been completely
specied, the errors will not be serially correlated

2. Unbiasedness and Consistency: Assumptions TS.1 through TS.3 imply unbiasedness regardless of serial
correlation and regardless of heteroskedasticity. Assumptions TS.1' through TS.3' imply consistency
(though not necessarily unbiasedness) regardless of serial correlation and regardless of heteroskedasticity

3. Efficiency and Inference: OLS is no longer BLUE in the presence of serial correlation. OLS standard
error of estimates can seriously underestimate true standard error in the presence of serial correlation.
Hence, OLS statistics are invalid for testing purposes.

4. Serial correlation in the presence of lagged dependent variables:

(a) The serial correlation in the errors will cause the usual OLS statistics to be invalid for testing
purposes, but it will not aect consistency.

(b) So when is OLS inconsistent if the errors are serially correlated and the regressors contain a lagged
dependent variable? This happens when we assume that ut follows a stable AR(1) model ut =
ρut−1 + et . In this case, Cov(yt−1 , ut ) = ρ · Cov(yt−1 , ut−1 ).
(c) Often serial correlation in the errors of a dynamic model simply indicates that the dynamic regression
function has not been completely specied.

5. t-test for AR(1) serial correlation ut = ρut−1 + et with strictly exogenous regressors. Recall that this
requires the error, ut , to be uncorrelated with the regressors in all time periods, and so, among other
things, it rules out models with lagged dependent variables.

(a) Run the standard OLS regression and obtain the residuals ût
(b) Regress ût on û_{t−1}, obtaining the coefficient ρ̂ on û_{t−1} and its t-statistic t_ρ̂
(c) Use the t-statistic to test H0: ρ = 0 against H1: ρ ≠ 0 in the usual way

6. Durbin-Watson test under the CLM assumptions: the DW statistic is

DW = Σ_{t=2}^{n} (ût − û_{t−1})² / Σ_{t=1}^{n} ût²

We can show that DW ≈ 2(1 − ρ̂)

(a) Usually the DW test is computed for the alternative H1 : ρ > 0


(b) The fact that an exact sampling distribution for DW can be tabulated is the only advantage that DW
has over the t-test. Given that the tabulated critical values are exactly valid only under the full set of
CLM assumptions and that they can lead to a wide inconclusive region, the practical disadvantages
of the DW are substantial. The t statistic is simple to compute and asymptotically valid without
normally distributed errors. The t statistic is also valid in the presence of heteroskedasticity; and
it is easy to make it robust to any form of heteroskedasticity.

7. Testing for AR(1) serial correlation without strictly exogenous regressors: Use the t-test above, except that in step (b) we regress the residual on all the independent variables, including an intercept, and the lagged residual û_{t−1}. This allows each x_{tj} to be correlated with û_{t−1}, and this ensures that t_ρ̂ has an approximate t distribution in large samples.

8. Testing for AR(q) serial correlation:

(a) Run the standard OLS regression and obtain the residuals ût
(b) Regress ût on all the independent variables, including an intercept, and û_{t−1}, ..., û_{t−q}, obtaining the coefficients ρ̂1, ..., ρ̂q
(c) Use the F-test for joint significance to test H0: ρ1 = ρ2 = ... = ρq = 0

9. Correcting for AR(1) serial correlation with strictly exogenous regressors (FGLS estimation of the AR(1)
model):

(a) Run the standard OLS regression and obtain residuals uˆt
(b) Regress ût on û_{t−1}, obtaining the coefficient ρ̂ on û_{t−1}
(c) Compute the quasi-differenced data: ỹt = yt − ρ̂y_{t−1} and x̃_{t,i} = x_{t,i} − ρ̂x_{t−1,i} for t ≥ 2, with ỹ1 = (1 − ρ̂²)^{1/2} y1 and x̃_{1,i} = (1 − ρ̂²)^{1/2} x_{1,i}
(d) Run OLS on ỹt = β0x̃_{t,0} + β1x̃_{t,1} + ... + βkx̃_{t,k} + errort (see the sketch below). The usual standard errors, t statistics, and F statistics are asymptotically valid.
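A sketch of this quasi-differencing (Prais-Winsten style) procedure with one regressor and simulated AR(1) errors; statsmodels' GLSAR class performs essentially the same calculation, but the steps are written out here to mirror the recipe above:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
n, rho = 300, 0.6
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):                                   # AR(1) errors
    u[t] = rho * u[t - 1] + rng.normal()
y = 1 + 0.5 * x + u
X = sm.add_constant(x)

# (a)-(b): estimate rho from a regression of the OLS residuals on their own lag
u_hat = sm.OLS(y, X).fit().resid
rho_hat = sm.OLS(u_hat[1:], u_hat[:-1]).fit().params[0]

# (c): quasi-difference y and every column of X (Prais-Winsten treatment of t = 1)
w = np.sqrt(1 - rho_hat**2)
y_tilde = np.r_[w * y[0], y[1:] - rho_hat * y[:-1]]
X_tilde = np.vstack([w * X[0], X[1:] - rho_hat * X[:-1]])

# (d): OLS on the transformed data; the usual t and F statistics are asymptotically valid
fgls = sm.OLS(y_tilde, X_tilde).fit()
print(rho_hat, fgls.params, fgls.bse)
```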

10. The cost of using ρ̂ instead of ρ is that the FGLS estimator has no tractable finite sample properties. In particular, it is not unbiased, although it is consistent when the data are weakly dependent.

(a) Since it is not unbiased, we cannot say it is BLUE

(b) Nevertheless, it is asymptotically more efficient than the OLS estimator when the AR(1) model for serial correlation holds and the explanatory variables are strictly exogenous

11. Comparing OLS and FGLS: Consider the simple regression model yt = β0 + β1 xt + ut where the time
series processes are stationary. Now, assuming that the law of large numbers holds, consistency of OLS
for β1 holds if Cov(xt, ut ) = 0. Earlier we asserted that FGLS was consistent under the strict exogeneity
assumption which is more restrictive. It can be shown that the weakest assumption that must hold in
addition for FGLS to be consistent is Cov(xt−1 + xt+1 , ut ) = 0.

(a) Consistency and asymptotic normality of OLS and FGLS rely heavily on the time series processes
yt and xt,i being weakly dependent.

(b) Strange things can happen if we apply either OLS or FGLS when some processes have unit roots.
We discuss this further in Chapter 18.

12. Another benet of dierencing in addition to converting I(1) processes to I(0) is that it removes serial
correlation

13. Serial correlation (and heteroskedasticity) robust standard error for β̂1

(a) Estimate yt = β0 + β1x_{t,1} + ... + βkx_{t,k} + errort by OLS, which yields ”se(β̂1)”, σ̂, and the OLS residuals {ût : t = 1, ..., n}
(b) Compute the residuals {r̂t , t = 1, ..., n} from the auxiliary regression xt,1 on xt,2 , ..., xt,k and com-
pute ât = r̂t ût for each t

(c) For your choice of g, compute v̂. (The formula for v̂ is not written here.)

i. Newey and West (1987) recommend taking g to be the integer part of 4(n/100)^{2/9}. Others have suggested the integer part of n^{1/4}

(d) Compute se(β̂1) = [”se(β̂1)”/σ̂]² √v̂
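In practice these serial-correlation-robust (Newey-West/HAC) standard errors can be requested directly from statsmodels instead of being assembled by hand; a sketch with simulated AR(1) errors and the bandwidth rule quoted above:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(12)
n = 300
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):                                 # serially correlated errors
    u[t] = 0.5 * u[t - 1] + rng.normal()
y = 1 + 0.5 * x + u
X = sm.add_constant(x)

g = int(4 * (n / 100) ** (2 / 9))                     # Newey-West's suggested truncation lag
usual = sm.OLS(y, X).fit()                            # usual (nonrobust) standard errors
hac = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": g})
print(usual.bse[1], hac.bse[1])                       # the HAC standard error is larger here
```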

14. Heteroskedasticity in Time Series Regressions

(a) Any serial correlation will generally invalidate a test for heteroskedasticity. Thus, it makes sense
to test for serial correlation first, using a heteroskedasticity-robust test if heteroskedasticity is sus-
pected. Then, after something has been done to correct for serial correlation, we can test for
heteroskedasticity.

15. Autoregressive Conditional Heteroskedasticity (ARCH)

(a) Consider a simple static regression model yt = β0 + β1zt + ut. The homoskedasticity assumption says that Var(ut|Z) should be constant. Even if the variance of ut given Z is constant, there are other ways heteroskedasticity can arise. Engle (1982) suggested looking at the conditional variance of ut given past errors (where the conditioning on Z is left implicit).

(b) First-order ARCH process: u²t = α0 + α1u²_{t−1} + vt, where the expected value of vt (given u_{t−1}, u_{t−2}, ...) is zero by definition

(c) The presence of ARCH does not aect consistency of OLS, and the usual heteroskedasticity-robust
standard errors and test statistics are valid. (Remember, these are valid for any form of het-
eroskedasticity, and ARCH is just one particular form of heteroskedasticity.)

16. Feasible GLS with Heteroskedasticity and AR(1) Serial Correlation

(a) Nothing rules out the possibility of both heteroskedasticity and serial correlation being present in
a regression model. If we are unsure, we can always use OLS and compute fully robust standard
errors, as described above.

(b) Alternatively, we can model heteroskedasticity and serial correlation, and correct for both through
a combined weighted least squares AR(1) procedure.

(c) Assume that we are trying to estimate the model yt = β0 + β1x_{t,1} + ... + βkx_{t,k} + ut, with ut = √(ht)·vt, vt = ρv_{t−1} + et, |ρ| < 1
(d) Process:

i. Estimate the yt = ... equation by OLS and save the residuals ût
ii. Regress log(û²t) on x_{t,1}, ..., x_{t,k} and obtain the fitted values, say ĝt
iii. Obtain the estimates of ht : ĥt = exp(ĝt )
−1/2 −1/2 −1/2 −1/2
iv. Estimate the transformed equation ĥt yt = ĥt β0 +β1 ĥt xt,1 +....+βk ĥt xt,k +errort
by standard serial correlation FGLS described above

(e) These feasible GLS estimators are asymptotically efficient. More importantly, all standard errors and test statistics from the Cochrane-Orcutt (CO) or Prais-Winsten (PW) methods are asymptotically valid.

13 Chapter 18: Advanced Time Series Topics


1. Infinite Distributed Lag (IDL) Models:

(a) Relates yt to current and all past values of z: yt = α + δ0zt + δ1z_{t−1} + δ2z_{t−2} + ... + ut, where the sum on lagged z extends back to the indefinite past

(b) The Geometric (or Koyck) Distributed Lag: δj = γρ^j for all j = 0, 1, 2, ..., with |ρ| < 1

i. A simple subtraction yields yt = α(1 − ρ) + γzt + ρy_{t−1} + ut − ρu_{t−1}. Define vt = ut − ρu_{t−1}

ii. However, note that Corr(y_{t−1}, vt) ≠ 0 in general, which means standard OLS estimates would
be biased and inconsistent.

iii. If we assume that ut follows a standard AR(1) process with the same correlation ρ as in the IDL process, i.e., ut = ρu_{t−1} + et where E(et|zt, y_{t−1}, z_{t−1}, ...) = 0, then yt = α(1 − ρ) + γzt + ρy_{t−1} + et becomes a dynamically complete model and we can obtain consistent, asymptotically normally distributed estimators of the parameters by OLS.

2. Unit Root Testing:

(a) Assume an AR(1) process yt = α + ρy_{t−1} + et, where et satisfies E(et|y_{t−1}, y_{t−2}, ..., y0) = 0 (a martingale difference sequence; an i.i.d. zero-mean sequence is the leading case)
(b) Null hypothesis H0 : ρ = 1 v/s alternate hypothesis H1 : 0 < ρ < 1. In words, the null is that yt is
I(1) against the alternate of I(0). The reason we do not take the null to be I(0) in this setup is that
{yt } is I(0) for any value of ρ strictly between -1 and 1, something that classical hypothesis testing
does not handle easily.

(c) We rewrite the AR(1) process as ∆yt = α + θy_{t−1} + et, where θ = ρ − 1. Now, H0: θ = 0. The problem is
that, under H0 , {yt } is I(1), and so the usual central limit theorem that underlies the asymptotic
standard normal distribution for the t statistic does not apply: the t statistic does not have an
approximate standard normal distribution even in large sample sizes. The asymptotic distribution
of the t statistic under H0 has come to be known as the Dickey-Fuller distribution after Dickey and
Fuller (1979).

(d) Use the usual t statistic for θ̂ but with the appropriate Dickey-Fuller critical values. This is known
as the Dickey-Fuller (DF) test for a unit root.

(e) We reject the null hypothesis H0: θ = 0 against H1: θ < 0 if t_θ̂ < c, where c is one of the negative values in Table 18.2. For example, to carry out the test at the 5% significance level, we reject if t_θ̂ < −2.86
(f) We also need to test for unit roots in models with more complicated dynamics. More generally, we can add p lags of ∆yt to the equation to account for the dynamics in the process. The way we test the null hypothesis of a unit root is very similar: we run the regression of ∆yt on y_{t−1}, ∆y_{t−1}, ..., ∆y_{t−p}. The inclusion of the lagged changes is intended to clean up any serial correlation in ∆yt. The critical values and rejection rule are the same as before. This is called the augmented Dickey-Fuller test.

(g) For series that have clear time trends, we need to modify the test for unit roots. A trend-stationary process, which has a linear trend in its mean but is I(0) about its trend, can be mistaken for a unit root process if we do not control for a time trend in the Dickey-Fuller regression. We change the basic equation to ∆yt = α + δt + θy_{t−1} + et. When we include a time trend in the regression, the critical values of the test change. Intuitively, this is because detrending a unit root process tends to make it look more like an I(0) process. Therefore, we require a larger magnitude for the t statistic in order to reject H0.
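A sketch of the (augmented) Dickey-Fuller test using statsmodels, applied to a simulated random walk and a simulated stable AR(1); regression='c' includes an intercept, and 'ct' would add the time trend discussed in (g):

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(13)
n = 500
random_walk = np.cumsum(rng.normal(size=n))          # unit root: should fail to reject
stable_ar1 = np.zeros(n)
for t in range(1, n):                                # I(0): should reject the unit-root null
    stable_ar1[t] = 0.5 * stable_ar1[t - 1] + rng.normal()

for name, series in [("random walk", random_walk), ("stable AR(1)", stable_ar1)]:
    stat, pvalue, usedlag, nobs, crit, _ = adfuller(series, regression="c")
    print(name, round(stat, 2), round(pvalue, 3), "5% critical value:", round(crit["5%"], 2))
# reject H0 (unit root) only when the statistic is below the negative critical value
```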

3. Spurious Regression:

(a) In a cross-sectional environment, we use the phrase spurious correlation to describe a situation
where two variables are related through their correlation with a third variable. In particular, if we
regress y on x, we find a significant relationship. But when we control for another variable, say z, the partial effect of x on y becomes zero. Naturally, this can also happen in time series contexts
with I(0) variables.

(b) Secondly, it is possible to find a spurious relationship between time series that have increasing or decreasing trends. Provided the series are weakly dependent about their time trends, the problem is effectively solved by including a time trend in the regression model.

(c) Thirdly, when we are dealing with processes that are integrated of order one, there is an additional complication. Even if the two series have means that are not trending, a simple regression involving two independent I(1) series will often result in a significant t statistic. Let {xt} and {yt} be random walks generated by xt = x_{t−1} + at and yt = y_{t−1} + et, where {at} and {et} are MDS sequences. Assume further that {at} and {et} are independent processes, which implies that {xt} and {yt} are also independent. If we run the simple regression yt = β0 + β1xt, β̂1 is significant a large percentage of the time, much larger than the nominal significance level. Granger and Newbold called this the spurious regression problem: there is no sense in which y and x are related, but an OLS regression using the usual t statistics will often indicate a relationship.

i. Including a time trend does not really change the conclusion.

ii. The same considerations arise with multiple independent variables, each of which may be I(1)
or some of which may be I(0). If {yt } is I(1) and at least some of the explanatory variables are
I(1), the regression results may be spurious.

4. Cointegration:

(a) In earlier chapters, we suggested that I(1) variables should be differenced before they are used in linear regression models, whether they are estimated by OLS or instrumental variables.

(b) If {yt : t = 0, 1, ...} and {xt : t = 0, 1, ...} are two I(1) processes, then, in general, yt − βxt is an I(1) process for any number β. Nevertheless, it is possible that for some β, yt − βxt is an I(0) process, which means it has constant mean, constant variance, autocorrelations that depend only on the time distance between any two variables in the series, and it is asymptotically uncorrelated. If such a β exists, we say that y and x are cointegrated, and we call β the cointegration parameter.

(c) Testing for cointegration:

i. If we have a hypothesized value for β, then construct st = yt − βxt and apply DF test to {st }.
If we reject a unit root in {st} in favor of the I(0) alternative, then we find that yt and xt are
cointegrated. In other words, the null hypothesis is that yt and xt are not cointegrated.

ii. Testing for cointegration is more difficult when the (potential) cointegration parameter is unknown. We must first estimate β. Running an OLS regression yt = α̂ + β̂xt + ût is spurious under the null hypothesis. Fortunately, it is possible to tabulate critical values even when β is estimated, where we apply the Dickey-Fuller or augmented Dickey-Fuller test to the residuals ût. The only difference is that the critical values account for estimation of β.
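statsmodels implements this residual-based Engle-Granger test (with the adjusted critical values) as coint; a minimal sketch on simulated series, one pair cointegrated and one pair not:

```python
import numpy as np
from statsmodels.tsa.stattools import coint

rng = np.random.default_rng(14)
n = 500
x = np.cumsum(rng.normal(size=n))              # I(1) process
y = 2.0 * x + rng.normal(size=n)               # cointegrated with x (beta = 2)
z = np.cumsum(rng.normal(size=n))              # independent I(1) process

t_stat, pvalue, _ = coint(y, x)                # null hypothesis: no cointegration
print("y and x:", round(t_stat, 2), round(pvalue, 4))   # small p-value: reject the null
t_stat, pvalue, _ = coint(y, z)
print("y and z:", round(t_stat, 2), round(pvalue, 4))   # large p-value: cannot reject
```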
(d) Tests for β when two processes are known to be cointegrated:

i. When yt and xt are I(1) and cointegrated, we can write yt = α + βxt + ut , where ut is a
zero mean, I(0) process. Generally, {ut } contains serial correlation, but we know from Chapter
11 that this does not affect consistency of OLS. Unfortunately, because xt is I(1), the usual
inference procedures do not necessarily apply: OLS is not asymptotically normally distributed,
and the t statistic for β̂ does not necessarily have an approximate t distribution. We do
know from Chapter 10 that, if {xt } is strictly exogenous and the errors are homoskedastic,
serially uncorrelated, and normally distributed, then the OLS estimator is also normally distributed
(conditional on the explanatory variables), and the t statistic has an exact t distribution.
Unfortunately, these assumptions are too strong to apply to most situations.

ii. Fortunately, the lack of strict exogeneity of {xt} can be fixed. Because xt is I(1), the proper notion of strict exogeneity is that ut is uncorrelated with ∆xs, for all t and s. We can always arrange this for a new set of errors, at least approximately, by writing ut as a function of the ∆xs for all s close to t. This gives us yt = α + βxt + φ0∆xt + φ1∆x_{t−1} + φ2∆x_{t−2} + γ1∆x_{t+1} + γ2∆x_{t+2} + et. The strict exogeneity assumption is the important condition needed to obtain an approximately normal t statistic for β̂. The OLS estimator of β from this equation is called the leads and lags estimator of β.

5. Error Correction Models:

(a) If yt and xt are I(1) processes that are not cointegrated, we might estimate a dynamic model in first differences such as ∆yt = α0 + α1∆y_{t−1} + γ0∆xt + γ1∆x_{t−1} + ut, where E(ut|∆xt, ∆y_{t−1}, ∆x_{t−1}, ...) =
0.
(b) If yt and xt are cointegrated with parameter β, then we have additional I(0) variables which we can include. Let st = yt − βxt and assume that st has zero mean. Then we can define ∆yt = α0 + α1∆y_{t−1} + γ0∆xt + γ1∆x_{t−1} + δs_{t−1} + ut, where E(ut|I_{t−1}) = 0 and I_{t−1} contains information on ∆xt and all past values of x and y. The term δ(y_{t−1} − βx_{t−1}) is called the error correction term and this model is an example of an error correction model.

(c) How do we estimate the parameters? If we know β, this is easy. We just need to run an OLS
regression. In many other examples, the cointegrating parameter must be estimated. Then, we
replace st−1 with ŝt−1 = yt−1 − β̂xt−1 . This raises the issue about how sampling variation in β̂
affects inference on the other parameters in the error correction model. Fortunately, as shown by
Engle and Granger (1987), we can ignore the preliminary estimation of β (asymptotically). This
is very convenient. The procedure of replacing β with β̂ is called the Engle-Granger two-step
procedure.

6. Forecasting: We assume in this section that the primary focus is on forecasting future values of a time
series process and not necessarily on estimating causal or structural economic models.

(a) Let ft denote the forecast of yt+1 made at time t. We call ft a one-step-ahead forecast. The forecast
error is et+1 = yt+1 − ft , which we observe once the outcome on yt+1 is observed.

(b) A basic fact from probability says that if we wish to minimize the expected squared forecast error
given information at time t, our forecast should be the expected value of yt+1 given variables we
know at time t.

(c) Vector Autoregressive Model (VAR): We know what an autoregressive model is from Chapter 11:
we model a single series, {yt }, in terms of its own past. In vector autoregressive models, we model
several series. If we have two series, yt and zt , a vector autoregression consists of equations that
look like :
yt = δ0 + α1 yt−1 + γ1 zt−1 + α2 yt−2 + γ2 zt−2 + ...
zt = η0 + β1 yt−1 + ρ1 zt−1 + β2 yt−2 + ρ2 zt−2 + ...
(d) Granger Causality: Generally, we say that z Granger causes y if E(yt|I_{t−1}) ≠ E(yt|J_{t−1}), where
It−1 contains past information on y and z, and Jt−1 contains only information on past y . This
means that past z is useful, in addition to past y , for predicting yt . Note that Granger causality
has nothing to say about contemporaneous causality, so it does not allow us to determine whether
zt is an exogenous or endogenous variable in an equation relating yt to zt .
i. Test for Granger Causality: Suppose that E(yt |yt−1 , yt−2 , yt−3 , ...) depends only on three lags:
yt = δ0 + α1 yt−1 + α2 yt−2 + α3 yt−3 + ut where E(ut |yt−1 , yt−2 , yt−3 , ...) = 0. Now, under the
null hypothesis that z does not Granger cause y , any lags of z that we add to the equation
should have zero population coecients. We can use t-test if only one lag of z is added or F-test
for multiple lags to test for this.

ii. Let {wt } be a third series (or, it could represent several additional series). Then, z Granger
causes y conditional on w if E(yt|I_{t−1}) ≠ E(yt|J_{t−1}) holds, but now I_{t−1} contains past infor-
mation on y , z , and w , while Jt−1 contains past information on y and w . It is certainly possible
that z Granger causes y , but z does not Granger cause y conditional on w . A test of the null
that z does not Granger cause y conditional on w is obtained by testing for significance of
lagged z in a model for y that also depends on lagged y and lagged w .
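A sketch of the Granger causality F-test using statsmodels (grangercausalitytests tests whether the series in the second column of the array helps predict the series in the first column; the data are simulated so that past z really does matter for y):

```python
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(15)
n = 500
z = np.zeros(n)
y = np.zeros(n)
for t in range(1, n):
    z[t] = 0.5 * z[t - 1] + rng.normal()
    y[t] = 0.3 * y[t - 1] + 0.4 * z[t - 1] + rng.normal()   # past z helps predict y

data = np.column_stack([y, z])                   # column order: [predicted, predictor]
results = grangercausalitytests(data, maxlag=3)  # prints chi-squared and F tests for lags 1-3
fstat, pvalue, _, _ = results[1][0]["ssr_ftest"] # e.g. extract the lag-1 F test
print(fstat, pvalue)                             # small p-value: reject "z does not Granger cause y"
```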

14 Chapter 19: Carrying Out an Empirical Project


1. Posing a question

2. Literature review

3. Data collection

4. Econometric analysis

5. Writing a paper

