Francis Diebold - Forecasting Slides
Francis X. Diebold
University of Pennsylvania
1 / 281
Copyright © 2013-2019, by Francis X. Diebold.
All materials are freely available for your use, but be warned: they are highly
preliminary, significantly incomplete, and rapidly evolving. All are licensed
under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0
International License. (Briefly: I retain copyright, but you can use, copy and
distribute non-commercially, so long as you give me attribution and do not
modify. To view a copy of the license, visit
http://creativecommons.org/licenses/by-nc-nd/4.0/.) In return I ask
that you please cite the books whenever appropriate, as: “Diebold, F.X. (year
here), Book Title Here, Department of Economics, University of Pennsylvania,
http://www.ssc.upenn.edu/~fdiebold/Textbooks.html.”
2 / 281
Introduction to Econometrics
3 / 281
Numerous Communities Use Econometrics
4 / 281
Econometrics is Special
5 / 281
Let’s Elaborate on the “Emphasis on Prediction”...
Q: What is econometrics about, broadly?
6 / 281
Types/Arrangements of Economic Data
– Cross section
– Time series
7 / 281
A Few Leading Econometrics Web Data Resources
(Clickable)
Indispensable:
I Resources for Economists (AEA)
More specialized:
I National Bureau of Economic Research
I Many more
8 / 281
A Few Leading Econometrics Software Environments
(Clickable)
9 / 281
Graphics Review
10 / 281
Graphics Help us to:
11 / 281
Histogram Revealing Distributional Shape:
1-Year Government Bond Yield
12 / 281
Time Series Plot Revealing Dynamics:
1-Year Government Bond Yield
13 / 281
Scatterplot Revealing Relationship:
1-Year and 10-Year Government Bond Yields
14 / 281
Some Principles of Graphical Style
15 / 281
Probability and Statistics Review
16 / 281
Moments, Sample Moments and Their Sampling
Distributions
17 / 281
Population Moments: Expectations of Powers of R.V.’s
σ² = var(y) = E(y − µ)².
18 / 281
More Population Moments
S = E(y − µ)³ / σ³
K = E(y − µ)⁴ / σ⁴
19 / 281
Covariance and Correlation
So does correlation:
corr(y, x) = cov(y, x) / (σ_y σ_x).
20 / 281
Sampling and Estimation
Sample: {y_i}_{i=1}^N ∼ iid f(y)
Sample mean:
ȳ = (1/N) Σ_{i=1}^N y_i
Sample variance:
σ̂² = Σ_{i=1}^N (y_i − ȳ)² / N
Unbiased sample variance:
s² = Σ_{i=1}^N (y_i − ȳ)² / (N − 1)
21 / 281
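The three estimators above translate directly into code; a minimal sketch (NumPy, with an illustrative four-observation sample):

```python
import numpy as np

def mean_and_variances(y):
    """Sample mean, biased sample variance (divide by N), and unbiased
    sample variance (divide by N - 1)."""
    y = np.asarray(y, dtype=float)
    N = y.size
    ybar = y.sum() / N
    d2 = ((y - ybar) ** 2).sum()
    return ybar, d2 / N, d2 / (N - 1)

ybar, sig2_hat, s2 = mean_and_variances([1.0, 2.0, 3.0, 4.0])  # illustrative sample
```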
More Sample Moments
Sample skewness:
Ŝ = [(1/N) Σ_{i=1}^N (y_i − ȳ)³] / σ̂³
Sample kurtosis:
K̂ = [(1/N) Σ_{i=1}^N (y_i − ȳ)⁴] / σ̂⁴
22 / 281
Still More Sample Moments
Sample covariance:
ĉov(y, x) = (1/N) Σ_{i=1}^N [(y_i − ȳ)(x_i − x̄)]
Sample correlation:
ĉorr(y, x) = ĉov(y, x) / (σ̂_y σ̂_x)
23 / 281
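The higher sample moments and the sample covariance/correlation can be sketched the same way (the example data are illustrative, not from the slides):

```python
import numpy as np

def sample_skew_kurt(y):
    """S-hat = mean((y - ybar)^3) / sigma-hat^3, K-hat = mean((y - ybar)^4) / sigma-hat^4,
    using the biased sigma-hat (divide by N)."""
    y = np.asarray(y, dtype=float)
    d = y - y.mean()
    sig = np.sqrt((d ** 2).mean())
    return (d ** 3).mean() / sig ** 3, (d ** 4).mean() / sig ** 4

def sample_cov_corr(y, x):
    """cov-hat = (1/N) sum (y_i - ybar)(x_i - xbar); corr-hat = cov-hat / (sig_y sig_x)."""
    y, x = np.asarray(y, dtype=float), np.asarray(x, dtype=float)
    c = ((y - y.mean()) * (x - x.mean())).mean()
    return c, c / (y.std() * x.std())

S, K = sample_skew_kurt([1.0, 2.0, 3.0, 4.0])   # symmetric data, so S-hat = 0
c, r = sample_cov_corr([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0])  # perfect correlation
```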
Exact Finite-Sample Distribution of the Sample Mean
(Requires iid Normality)
26 / 281
Wages: Sample Statistics
27 / 281
Regression
28 / 281
Regression
29 / 281
Regression as Curve Fitting
30 / 281
Distributions of Log Wage, Education and Experience
31 / 281
Scatterplot: Log Wage vs. Education
32 / 281
Curve Fitting
Fit a line:
ŷ_i = β_1 + β_2 x_i
Solve:
min_{β_1, β_2} Σ_{i=1}^N (y_i − β_1 − β_2 x_i)²
“quadratic loss”
33 / 281
Actual Values, Fitted Values and Residuals
34 / 281
Log Wage vs. Education with Superimposed Regression
Line
LWAGE-hat = 1.273 + .081 EDUC
35 / 281
Multiple Linear Regression (K RHS Variables)
Solve:
min_{β_1, ..., β_K} Σ_{i=1}^N (y_i − β_1 − β_2 x_{i2} − ... − β_K x_{iK})²
Fitted hyperplane:
ŷ_i = β̂_1 + β̂_2 x_{i2} + ... + β̂_K x_{iK}
More compactly:
ŷ_i = Σ_{k=1}^K β̂_k x_{ik}, where x_{i1} = 1 for all i
Wage dataset:
LWAGE-hat = .867 + .093 EDUC + .013 EXPER
36 / 281
Regression as a Probability Model
37 / 281
An Ideal Situation (“The Ideal Conditions”, or IC)
1. The data-generating process (DGP) is:
y = Xβ + ε,
where
y = (y_1, y_2, ..., y_N)′, β = (β_1, β_2, ..., β_K)′, ε = (ε_1, ε_2, ..., ε_N)′,
and X is the N × K matrix whose i-th row is (1, x_{i2}, x_{i3}, ..., x_{iK}).
39 / 281
Elementary Matrices and Matrix Operations
0 is the matrix with every element zero; I is the identity matrix, with ones on the main diagonal and zeros elsewhere.
40 / 281
The DGP in Matrix Form, Written Out
Row by row:
y_i = β_1 + β_2 x_{i2} + β_3 x_{i3} + ... + β_K x_{iK} + ε_i, i = 1, ..., N,
with
(ε_1, ε_2, ..., ε_N)′ ∼ N(0, σ²I), i.e. the covariance matrix is diagonal with σ² in every diagonal position.
Compactly:
y = Xβ + ε
ε ∼ iid N(0, σ²I)
41 / 281
Three Notations
Original form:
y_i = β_1 + β_2 x_{i2} + ... + β_K x_{iK} + ε_i, ε_i ∼ iid N(0, σ²)
i = 1, 2, ..., N
Intermediate form:
y_i = x_i′β + ε_i, ε_i ∼ iid N(0, σ²)
i = 1, 2, ..., N
Matrix form:
y = Xβ + ε, ε ∼ iid N(0, σ²I)
42 / 281
Ideal Conditions Redux
y = X β + ε, ε ∼ N(0, σ 2 I )
43 / 281
The OLS Estimator in Matrix Notation:
As always, the LS estimator solves:
min_β Σ_{i=1}^N (y_i − β_1 − β_2 x_{i2} − ... − β_K x_{iK})²
In matrix notation:
min_β (y − Xβ)′(y − Xβ)
β̂_LS = (X′X)⁻¹ X′y
44 / 281
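A minimal sketch of the closed-form estimator β̂_LS = (X′X)⁻¹X′y, solving the normal equations rather than inverting X′X explicitly (the toy design matrix is an illustrative assumption):

```python
import numpy as np

def ols(X, y):
    """beta-hat_LS = (X'X)^{-1} X'y, computed by solving the normal
    equations (X'X) beta = X'y instead of forming the inverse."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    return np.linalg.solve(X.T @ X, X.T @ y)

# illustrative noiseless data: beta-hat recovers the true beta exactly
X = np.column_stack([np.ones(5), np.arange(5.0)])
beta_true = np.array([1.0, 2.0])
beta_hat = ols(X, X @ beta_true)
```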
Recall the Large-Sample Distribution of the Sample Mean
ȳ ∼ N(µ, v) asymptotically.
45 / 281
Large-Sample Distribution of β̂LS Under the IC
β̂_LS ∼ N(β, V) asymptotically.
46 / 281
Sample Mean, Regression on an Intercept, and Properties
of Residuals
47 / 281
Conditional Implications of the DGP
Conditional mean:
E(y_i | x_i = x*) = x*′β
Conditional variance:
var(y_i | x_i = x*) = σ²
y_i | x_i = x* ∼ N(x*′β, σ²)
48 / 281
Why All the Talk About Conditional Implications?:
“Point Prediction”
A major goal in econometrics is predicting y. The question is: “If a
new person i arrives with characteristics x_i = x*, what is my best
prediction of her y_i?” The answer is E(y_i | x_i = x*) = x*′β.
49 / 281
“Interval Prediction”
Non-operational:
Operational:
50 / 281
“Density Prediction”
Non-operational version:
y_i | x_i = x* ∼ N(x*′β, σ²)
Operational version:
y_i | x_i = x* ∼ N(x*′β̂_LS, s²)
51 / 281
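The operational point, interval, and density predictions can be sketched as follows; the 95% interval uses the Gaussian ±1.96·s rule and, like the slide's operational version, ignores parameter-estimation uncertainty (the numeric inputs are illustrative):

```python
import numpy as np

def predict_y(beta_hat, s2, x_star):
    """Operational predictions of y given x = x*: point prediction x*'beta-hat,
    95% interval point +/- 1.96*s, and the N(point, s^2) density parameters."""
    point = float(x_star @ beta_hat)
    half = 1.96 * np.sqrt(s2)
    return point, (point - half, point + half), (point, s2)

# illustrative inputs: beta-hat = (1, 2)', s^2 = 0.25, x* = (1, 3)'
point, interval, density = predict_y(np.array([1.0, 2.0]), 0.25, np.array([1.0, 3.0]))
```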
“Typical” Regression Analysis of Wages, Education and
Experience
52 / 281
“Top Matter”: Background Information
I Dependent variable
I Method
I Date
I Sample
I Included observations
53 / 281
“Middle Matter”: Estimated Regression Function
I Variable
I p-value
54 / 281
Predictive Perspectives
– OLS coefficient signs and sizes give the weights put on the
various x variables in forming the best in-sample prediction of y .
55 / 281
“Bottom Matter: Statistics”
56 / 281
Regression Statistics: Mean dependent var 2.342
ȳ = (1/N) Σ_{i=1}^N y_i
57 / 281
Predictive Perspectives
58 / 281
Regression Statistics: S.D. dependent var .561
SD = sqrt( Σ_{i=1}^N (y_i − ȳ)² / (N − 1) )
59 / 281
Predictive Perspectives
60 / 281
Regression Statistics: Sum squared resid 319.938
SSR = Σ_{i=1}^N e_i²
61 / 281
Predictive Perspectives
– The OLS fitted values, ŷ_i = x_i′β̂, are effectively in-sample
regression predictions.
“squared-error loss”
“quadratic loss”
F = [(SSR_res − SSR)/(K − 1)] / [SSR/(N − K)]
63 / 281
Predictive Perspectives
64 / 281
Regression Statistics: S.E. of regression .492
s² = Σ_{i=1}^N e_i² / (N − K)
SER = √s² = sqrt( Σ_{i=1}^N e_i² / (N − K) )
65 / 281
Predictive Perspectives
66 / 281
Regression Statistics:
R-squared .232, Adjusted R-squared .231
R² = 1 − [(1/N) Σ_{i=1}^N e_i²] / [(1/N) Σ_{i=1}^N (y_i − ȳ)²]
R̄² = 1 − [(1/(N−K)) Σ_{i=1}^N e_i²] / [(1/(N−1)) Σ_{i=1}^N (y_i − ȳ)²]
67 / 281
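A sketch computing these goodness-of-fit statistics from residuals (the example residual vector is illustrative):

```python
import numpy as np

def fit_statistics(y, e, K):
    """R^2, adjusted R-bar^2, s^2, and the SER, from residuals e of a
    K-regressor model fit to y."""
    y, e = np.asarray(y, dtype=float), np.asarray(e, dtype=float)
    N = y.size
    ssr = (e ** 2).sum()
    tss = ((y - y.mean()) ** 2).sum()
    R2 = 1.0 - ssr / tss
    R2bar = 1.0 - (ssr / (N - K)) / (tss / (N - 1))
    s2 = ssr / (N - K)
    return R2, R2bar, s2, np.sqrt(s2)

R2, R2bar, s2, ser = fit_statistics([1.0, 2.0, 3.0, 4.0], [0.5, -0.5, 0.5, -0.5], K=2)
```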
Predictive Perspectives
68 / 281
R²_k and “Multicollinearity”
(not shown in the computer output)
69 / 281
Predictive Perspectives
70 / 281
Regression Statistics: Log likelihood -938.236
71 / 281
Detail: Maximum-Likelihood Estimation
yi |xi ∼ iidN(xi0 β, σ 2 ),
so that
f(y_i|x_i) = (2πσ²)^(−1/2) exp( −(1/(2σ²)) (y_i − x_i′β)² )
72 / 281
Detail: Log Likelihood
ln L = ln[(2πσ²)^(−N/2)] − (1/(2σ²)) Σ_{i=1}^N (y_i − x_i′β)²
     = −(N/2) ln(2π) − (N/2) ln σ² − (1/(2σ²)) Σ_{i=1}^N (y_i − x_i′β)²
– Log turns the product into a sum and eliminates the exponential
– “MLE and OLS coincide for linear regression under the IC”
(Normality, in particular, is crucial)
73 / 281
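A small numerical check of the “MLE and OLS coincide” point: for any fixed σ², the Gaussian log likelihood is decreasing in the sum of squared residuals, so it is maximized over β exactly at the OLS β̂ (data simulated under assumed parameters):

```python
import numpy as np

def gaussian_loglik(beta, sigma2, y, X):
    """ln L = -(N/2) ln(2*pi) - (N/2) ln(sigma^2) - (1/(2*sigma^2)) sum_i (y_i - x_i'beta)^2"""
    e = y - X @ beta
    N = y.size
    return -0.5 * N * np.log(2 * np.pi) - 0.5 * N * np.log(sigma2) - (e @ e) / (2 * sigma2)

# simulate y = X beta + eps under assumed beta = (1, 2)' and sigma = 0.1
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), np.linspace(0.0, 1.0, 50)])
y = X @ np.array([1.0, 2.0]) + 0.1 * rng.standard_normal(50)

# ln L at the OLS beta-hat beats ln L at any perturbed beta
b_ols = np.linalg.solve(X.T @ X, X.T @ y)
ll_at_ols = gaussian_loglik(b_ols, 0.01, y, X)
ll_perturbed = gaussian_loglik(b_ols + np.array([0.05, 0.0]), 0.01, y, X)
```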
Detail: Likelihood-Ratio Tests
74 / 281
Predictive Perspectives
75 / 281
Regression Statistics: Schwarz criterion 1.435
Akaike info criterion 1.422
76 / 281
Regression Statistics: Durbin-Watson stat. 1.926
77 / 281
Residual Scatter
78 / 281
Residual Plot
79 / 281
Predictive Perspectives
– The LS fitted values, ŷi = xi0 β̂, are effectively best in-sample
predictions.
For example:
1. The true DGP may be nonlinear
2. ε may be non-Gaussian
3. ε may have non-constant variance
80 / 281
Misspecification and Model Selection
81 / 281
Regression Statistics:
Akaike info criterion 1.422, Schwarz criterion 1.435
SSR versions:
AIC = e^(2K/N) · (Σ_{i=1}^N e_i²) / N
SIC = N^(K/N) · (Σ_{i=1}^N e_i²) / N
AIC = −2 ln L + 2K
82 / 281
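The SSR-based criteria can be sketched directly (the toy residuals are illustrative):

```python
import numpy as np

def aic_sic(e, K):
    """SSR-based criteria: AIC = e^{2K/N} * SSR/N, SIC = N^{K/N} * SSR/N.
    Smaller is better; SIC's degrees-of-freedom penalty is the harsher
    of the two once N is at least 8 or so."""
    e = np.asarray(e, dtype=float)
    N = e.size
    mse = (e ** 2).sum() / N
    return np.exp(2.0 * K / N) * mse, N ** (K / N) * mse

aic, sic = aic_sic(np.ones(10), K=2)   # illustrative residuals with SSR/N = 1
```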
Penalties
83 / 281
Predictive Perspectives
– Estimate out-of-sample forecast accuracy (which is what we
really care about) on the basis of in-sample forecast accuracy. (We
want to select a forecasting model that will perform well for
out-of-sample forecasting, quite apart from its in-sample fit.)
“Oracle property”
84 / 281
Non-Normality and Outliers
85 / 281
What We’ll Do
86 / 281
Large-Sample Distribution of β̂LS
Under the Ideal Conditions (Except Normality)
β̂_LS ∼ N(β, V) asymptotically,
87 / 281
Jarque-Bera Normality Test
– Many more
88 / 281
Recall Our OLS Wage Regression
89 / 281
OLS Residual Histogram and Statistics
90 / 281
QQ Plots
I To the extent that the QQ plot does not match the 45-degree
line, the nature of the divergence can be very informative,
for example in indicating fat tails
91 / 281
OLS Wage Regression Residual QQ Plot
92 / 281
Residual Scatter
93 / 281
OLS Residual Plot
94 / 281
Leave-One-Out Plot
Consider:
β̂^(−i) − β̂, i = 1, ..., N
“Leave-one-out plot”
95 / 281
Wage Regression
96 / 281
Robust Estimation: Least Absolute Deviations (LAD)
The LAD estimator, β̂LAD , solves:
min_β Σ_{i=1}^N |ε_i|
– The two are equal with symmetric disturbances (as under the FIC), but
not with asymmetric disturbances, in which case the median is a
more robust measure of central tendency
97 / 281
LAD Wage Regression Estimation
98 / 281
Digging into Prediction (Much) More Deeply (Again)
y_i = x_i′β + ε_i, i = 1, ..., N,
ε_i ∼ iid D(0, σ²)
99 / 281
Simulation Algorithm for
Feasible Density Prediction With Normality
100 / 281
Now: Simulation Algorithm for
Feasible Density Prediction Without Normality
[If desired, form a 95% interval forecast by sorting the output from
step 2 and taking the left and right interval endpoints as the
.025 and .975 quantiles.]
101 / 281
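A sketch of the normality-free algorithm: resample the OLS residuals to build the simulated density, then sort (here via percentiles) for the interval endpoints. The function name and inputs are illustrative assumptions:

```python
import numpy as np

def bootstrap_density_forecast(beta_hat, resid, x_star, R=10000, seed=0):
    """Simulate R draws of y | x = x*: the point prediction x*'beta-hat
    plus disturbances resampled from the OLS residuals, with no
    normality assumption. Sorting the draws gives interval endpoints."""
    rng = np.random.default_rng(seed)
    draws = float(x_star @ beta_hat) + rng.choice(resid, size=R, replace=True)
    lo, hi = np.percentile(draws, [2.5, 97.5])
    return draws, (lo, hi)

resid = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])            # illustrative residuals
draws, (lo, hi) = bootstrap_density_forecast(
    np.array([1.0, 2.0]), resid, np.array([1.0, 1.0]))   # x*'beta-hat = 3
```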
Indicator Variables in Cross Sections:
Group Effects
102 / 281
Dummy Variables for Group Effects
“Intercept dummies”
103 / 281
Histograms for Wage Covariates
105 / 281
Basic Wage Regression Results
106 / 281
Introducing Sex, Race, and Union Status
in the Wage Regression
Now:
107 / 281
Wage Regression on Education, Experience, and Group
Dummies
108 / 281
Predictive Perspectives
Basic Wage Regression
– Conditions only on education and experience.
– Intercept is a mongrel combination of those for men, women,
union, non-union, white, non-white.
– Comparatively sparse “information set”. Forecasting performance
could be improved.
Wage Regression With Dummies
– Conditions on education, experience,
AND sex, race, and union status.
– Now we have different, “customized”, intercepts
by sex, race, and union status.
– Comparatively rich information set.
Forecasting performance should be better.
e.g., knowing that someone is female, non-white, and non-union
would be very valuable (in addition to education and experience)
for predicting her wage!
109 / 281
Nonlinearity
110 / 281
Anscombe’s Quartet
111 / 281
Anscombe’s Quartet: Regressions
112 / 281
Anscombe’s Quartet Graphics: Dataset 1
113 / 281
Anscombe’s Quartet Graphics: Dataset 2
114 / 281
Anscombe’s Quartet Graphics: Dataset 3
115 / 281
Anscombe’s Quartet Graphics: Dataset 4
116 / 281
Log-Log Regression
ln y_i = β_1 + β_2 ln x_i + ε_i
ln y_i = β x_i + ε_i
118 / 281
Intrinsically Non-Linear Models
(0 < r < 1)
119 / 281
Taylor Series Expansions
yi = g (xi , εi )
yi = f (xi ) + εi
120 / 281
Taylor Series, Continued
f(x_i) ≈ β_1 + β_2 x_i
f(x_i) ≈ β_1 + β_2 x_i + β_3 x_i²
121 / 281
A Key Insight
– So non-linearity is ultimately
an omitted-variables problem
122 / 281
Predictive Perspectives
123 / 281
Assessing Non-Linearity
124 / 281
Linear Wage Regression (Actually Log-Lin)
125 / 281
Quadratic Wage Regression
126 / 281
Dummy Interactions?
127 / 281
Everything
128 / 281
So Drop Dummy Interactions and Tighten the Rest
129 / 281
Heteroskedasticity in Cross-Sections
130 / 281
Heteroskedasticity is Another Type of Violation of the IC
(This time it’s “non-constant disturbance variances”)
Unconditional heteroskedasticity not usually so relevant...
Consider this part of the IC:
ε ∼ N(0, σ 2 I )
Unconditional heteroskedasticity occurs when the unconditional
disturbance variance varies across people for some unknown reason.
Hence Ω = cov(ε) is diagonal but Ω ≠ σ²I.
Instead:
Ω = diag(σ_1², σ_2², ..., σ_N²)
Consider IC2.2:
132 / 281
Consequences for Estimation and Inference
133 / 281
Consequences for Prediction
134 / 281
Detecting Conditional Heteroskedasticity
135 / 281
Graphical Diagnostics
136 / 281
Recall Our “Final” Wage Regression
137 / 281
Squared Residual vs. EDUC
138 / 281
The Breusch-Pagan-Godfrey Test (BPG)
BPG test:
139 / 281
BPG Test
140 / 281
White’s Test
141 / 281
White’s Test
142 / 281
Dealing with Heteroskedasticity
143 / 281
Adjusting Standard Errors
Using advanced methods, one can obtain estimators for standard
errors that are consistent even when heteroskedasticity is present.
V̂ = s²(X′X)⁻¹,
where s² = Σ_{i=1}^N e_i² / (N − K).
145 / 281
Adjusting Density Forecasts
Recall non-operational version for Gaussian homoskedastic
disturbances:
y_i | x_i = x* ∼ N(x*′β, σ²)
Operational version:
y_i | x_i = x* ∼ N(x*′β̂_LS, s²)
(Nonlinear, non-Gaussian)
147 / 281
Many Names
“classification models”
148 / 281
Framework
Left-hand-side variable is y_i = I_i(z), where the “indicator variable”
I_i(z) indicates whether event z occurs; that is,
I_i(z) = 1 if event z occurs, 0 otherwise.
In that case we have
E(y_i|x_i) = E(I_i(z)|x_i) = x_i′β.
A key insight, however, is that
E(I_i(z)|x_i) = P(I_i(z) = 1|x_i),
so the statement is really
P(I_i(z) = 1|x_i) = x_i′β. (1)
– Leading examples: recessions, bankruptcies, loan or credit card
defaults, financial market crises, consumer choices, ...
149 / 281
The “Linear Probability Model”
150 / 281
Squashing Functions
151 / 281
Logit as a Nonlinear Model for the Event Probability
P(I_i(z) = 1|x_i) = e^(x_i′β) / (1 + e^(x_i′β)).
152 / 281
Logit as a Linear Model for the Log Odds
Solving the log odds for P(Ii (z) = 1 | xi ) yields the logit model,
P(I_i(z) = 1 | x_i) = 1 / (1 + e^(−x_i′β)) = e^(x_i′β) / (1 + e^(x_i′β)).
So logit is just linear regression for log odds.
153 / 281
Estimation
y_i|x_i ∼ N(x_i′β, σ²),
Now we have:
y_i|x_i ∼ Bernoulli( e^(x_i′β) / (1 + e^(x_i′β)) ),
154 / 281
Marginal Effects for Binary Regression
Logit marginal effects ∂E (yi |xi )/∂xik are not simply given by the
βk ’s. Instead we have
155 / 281
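The slide's formula is truncated in this extraction; for the logit, differentiating P = Λ(x′β) gives ∂P/∂x_k = Λ(x′β)[1 − Λ(x′β)]β_k, which depends on x. A minimal sketch:

```python
import numpy as np

def logit_prob(x, beta):
    """P(I(z) = 1 | x) = e^{x'beta} / (1 + e^{x'beta})"""
    return 1.0 / (1.0 + np.exp(-(x @ beta)))

def logit_marginal_effects(x, beta):
    """dP/dx_k = P(1 - P) * beta_k: largest at P = 0.5, shrinking in the tails,
    unlike the constant effects of the linear probability model."""
    p = logit_prob(x, beta)
    return p * (1.0 - p) * beta

# at x'beta = 0: P = 0.5 and the marginal effect is 0.25 * beta_k
x = np.array([1.0, 0.0])            # illustrative: intercept plus one regressor
beta = np.array([0.0, 2.0])
me = logit_marginal_effects(x, beta)
```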
R 2 for Binary Regression
156 / 281
Classification and “0-1 Forecasting”
157 / 281
Example: Predicting High-Wage Individuals
Using 1995 CPS data, let’s suppose that we don’t observe exact
wages. Instead, we observe whether a person’s wage is high
(WAGE ≥ 15) or low (WAGE < 15).
158 / 281
Logit Regression of HIGHWAGE on EDUC and EXPER
Dependent variable: HIGHWAGE
EDUC: 0.347*** (0.028)
EXPER: 0.046*** (0.006)
Constant: −6.608*** (0.455)
159 / 281
Decision Boundary Scatterplot for Logit Bayes Classifier
(Figure: scatterplot of EDUC against EXPER with the logit decision boundary superimposed)
161 / 281
Misspecification and Model Selection
162 / 281
Non-Normality and Outliers
163 / 281
Indicator Variables in Time Series I:
Trend
164 / 281
Liquor Sales
165 / 281
Log Liquor Sales
166 / 281
Linear Trend
Trendt = β1 + β2 TIMEt
where TIMEt = t
167 / 281
Various Linear Trends
168 / 281
Linear Trend Estimation
169 / 281
Residual Plot
170 / 281
Indicator Variables in Time Series II:
Seasonality
171 / 281
Seasonal Dummies
Seasonal_t = Σ_{s=1}^S β_s SEAS_{s,t} (S seasons per year)
where SEAS_{s,t} = 1 if observation t falls in season s, 0 otherwise
173 / 281
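Building the S dummy columns is a one-liner; a sketch assuming the observations cycle through the seasons in order (monthly example):

```python
import numpy as np

def seasonal_dummies(T, S):
    """T x S matrix of intercept dummies: SEAS[t, s] = 1 if observation t
    falls in season s, assuming t = 0, 1, 2, ... cycles through the S seasons."""
    D = np.zeros((T, S))
    D[np.arange(T), np.arange(T) % S] = 1.0
    return D

D = seasonal_dummies(24, 12)   # two years of monthly data, S = 12
```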
Seasonal Pattern
174 / 281
Residual Plot
175 / 281
Nonlinearity in Time Series
176 / 281
Non-Linear Trend: Exponential (Log-Linear)
Trend_t = β_1 e^(β_2 TIME_t)
177 / 281
Figure: Various Exponential Trends
178 / 281
Non-Linear Trend: Quadratic
179 / 281
Figure: Various Quadratic Trends
180 / 281
Log-Quadratic Liquor Sales Trend Estimation
181 / 281
Residual Plot
182 / 281
Log-Quadratic Liquor Sales Trend Estimation
with Seasonal Dummies
184 / 281
Serial Correlation
185 / 281
Serial Correlation is Another Type of Violation of the IC
(This time it’s “correlated disturbances”.)
Consider: ε ∼ N(0, Ω)
186 / 281
Serially Correlated Regression Disturbances
Leading example
(“AR(1)” disturbance serial correlation):
y_t = x_t′β + ε_t
ε_t = φε_{t−1} + v_t
187 / 281
Consequences for β Estimation and Inference:
As with Heteroskedasticity, Point Estimation is OK,
but Inference is Damaged
188 / 281
Consequences for y Prediction:
Unlike With Heteroskedasticity,
Even Point Predictions are Damaged!
Serial correlation is a bigger problem for prediction than
heteroskedasticity.
Put differently:
Serial correlation in forecast errors means that you can forecast
your forecast errors! So something is wrong and can be improved...
189 / 281
Some Important Language and Tools
For Characterizing Serial Correlation
“Autocovariances”: γ_ε(τ) = cov(ε_t, ε_{t−τ}), τ = 1, 2, ...
190 / 281
The Precise Form of Ω Under Serial Correlation
Ω =
γ_ε(0)      γ_ε(1)      ...  γ_ε(T−1)
γ_ε(1)      γ_ε(0)      ...  γ_ε(T−2)
...         ...         ...  ...
γ_ε(T−1)    γ_ε(T−2)    ...  γ_ε(0)
191 / 281
White Noise Disturbances
Independent (strong) white noise: ε_t ∼ iid (0, σ²)
Gaussian white noise: ε_t ∼ iid N(0, σ²)
We write:
ε_t ∼ WN(0, σ²)
192 / 281
193 / 281
195 / 281
AR(1) Disturbances
ε_t = φε_{t−1} + v_t, v_t ∼ WN(0, σ²)
196 / 281
Realizations of Two AR(1) Processes (N(0, 1) shocks)
197 / 281
ρ(τ) = φ^τ
198 / 281
ρ(τ) = φ^τ
199 / 281
Detecting Serial Correlation
I Graphical diagnostics
I Residual plot
I Residual scatterplot of (et vs. et−τ )
I Residual correlogram
I Formal tests
I Durbin-Watson
I Breusch-Godfrey
200 / 281
Recall Our Log-Quadratic Liquor Sales Model
Dependent Variable: LSALES
Method: Least Squares
Date: 08/08/13 Time: 08:53
Sample: 1987M01 2014M12
Included observations: 336
201 / 281
Residual Plot
202 / 281
Residual Scatterplot (et vs. et−1 )
203 / 281
Residual Correlogram
Bartlett standard error (= 1/√T = 1/√336) ≈ .055
95% Bartlett band (= ±2/√T) ≈ ±.11
204 / 281
Formal Tests: Durbin-Watson (0.59!)
yt = xt0 β + εt
εt = φεt−1 + vt
vt ∼ iid N(0, σ 2 )
We want to test H0: φ = 0 against H1: φ ≠ 0
Then form:
DW = Σ_{t=2}^T (e_t − e_{t−1})² / Σ_{t=1}^T e_t²
205 / 281
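The DW statistic is simple to compute from residuals; a sketch with two extreme illustrative residual sequences:

```python
import numpy as np

def durbin_watson(e):
    """DW = sum_{t=2}^T (e_t - e_{t-1})^2 / sum_{t=1}^T e_t^2.
    Near 2: no AR(1) correlation; toward 0: strong positive
    correlation; toward 4: strong negative correlation."""
    e = np.asarray(e, dtype=float)
    return float((np.diff(e) ** 2).sum() / (e ** 2).sum())

dw_pos = durbin_watson([1.0, 1.0, 1.0, 1.0])    # perfectly positively correlated
dw_neg = durbin_watson([1.0, -1.0, 1.0, -1.0])  # strongly negatively correlated
```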
Understanding the Durbin-Watson Statistic
DW = Σ_{t=2}^T (e_t − e_{t−1})² / Σ_{t=1}^T e_t²
   = [(1/T) Σ_{t=2}^T (e_t − e_{t−1})²] / [(1/T) Σ_{t=1}^T e_t²]
   = [(1/T) Σ_{t=2}^T e_t² + (1/T) Σ_{t=2}^T e_{t−1}² − 2(1/T) Σ_{t=2}^T e_t e_{t−1}] / [(1/T) Σ_{t=1}^T e_t²]
Hence as T → ∞:
DW ≈ [σ² + σ² − 2 cov(ε_t, ε_{t−1})] / σ² = 1 + 1 − 2 corr(ε_t, ε_{t−1}) = 2(1 − corr(ε_t, ε_{t−1}))
⟹ DW ∈ [0, 4], DW → 2 as φ → 0, and DW → 0 as φ → 1
206 / 281
Formal Tests: Breusch-Godfrey
yt = xt0 β + εt
207 / 281
BG for AR(1) Disturbances
(TR 2 = 168.5, p = 0.0000)
208 / 281
BG for AR(4) Disturbances
(TR 2 = 216.7, p = 0.0000)
209 / 281
BG for AR(8) Disturbances
(TR 2 = 219.0, p = 0.0000)
210 / 281
HAC Estimation for Time Series
211 / 281
Modeling Serial Correlation:
Including Lags of y as Regressors
Serial correlation in disturbances means that
the included x’s don’t fully account for the y dynamics.
Simple to fix by modeling the y dynamics directly:
Just include lags of y as additional regressors.
“Lagged dependent variables”
More precisely, AR(p) disturbances “fixed” by
including p lags of y and x.
(Select p using the usual SIC , etc.)
Illustration:
Convert the DGP below to one with white noise disturbances.
yt = β1 + β2 xt + εt
εt = φεt−1 + vt
vt ∼ iid N(0, σ 2 )
212 / 281
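The conversion requested above can be sketched algebraically: substituting the disturbance into its own AR(1) recursion yields a regression with white-noise disturbances, one lag of y, and one lag of x.

```latex
% DGP: y_t = \beta_1 + \beta_2 x_t + \varepsilon_t, \quad
%      \varepsilon_t = \phi\,\varepsilon_{t-1} + v_t, \quad v_t \sim \text{iid } N(0,\sigma^2)
% Substitute \varepsilon_t = y_t - \beta_1 - \beta_2 x_t on both sides:
y_t - \beta_1 - \beta_2 x_t = \phi\left(y_{t-1} - \beta_1 - \beta_2 x_{t-1}\right) + v_t
% Rearranging gives a white-noise-disturbance regression:
y_t = \beta_1(1-\phi) + \phi\, y_{t-1} + \beta_2\, x_t - \phi\beta_2\, x_{t-1} + v_t
```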
Liquor Sales: Everything Consistent With AR(4) Dynamics
213 / 281
Trend + Seasonal Model with Four Lags of y
214 / 281
Trend + Seasonal Model with Four Lags of y
Residual Plot
215 / 281
Trend + Seasonal Model with Four Lags of y
Residual Scatterplot
216 / 281
Trend + Seasonal Model with Four Lags of y
Residual Autocorrelations
217 / 281
Trend + Seasonal Model with Four Lags of y
Residual Histogram and Normality Test
218 / 281
Forecasting Time Series
219 / 281
The “Forecasting the Right-Hand-Side Variables Problem”
y_t = x_t′β + ε_t
⟹ y_{t+h} = x_{t+h}′β + ε_{t+h}
Projecting on current information,
y_{t+h,t} = x_{t+h,t}′β
220 / 281
But FRVP is not a Problem for Us!
FRVP no problem for trends. Why?
FRVP no problem for seasonals. Why?
FRVP also no problem for autoregressive effects
(lagged dependent variables)
e.g., consider a pure AR(1)
yt = φyt−1 + εt
yt+h = φyt+h−1 + εt+h
yt+h,t = φyt+h−1,t
No FRVP for h = 1. There seems to be an FRVP for h > 1.
But there’s not...
We build the multi-step forecast recursively.
First 1-step, then 2-step, etc.
“Wold’s chain rule of forecasting”
221 / 281
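Wold's chain rule for a pure AR(1) can be sketched in a few lines; each step's forecast feeds the next:

```python
import numpy as np

def ar1_chain_forecast(phi, y_T, H):
    """Wold's chain rule for an AR(1): y_{T+1,T} = phi*y_T, then
    y_{T+h,T} = phi*y_{T+h-1,T} -- each unknown future y is replaced
    by its own forecast. The result equals phi**h * y_T for h = 1..H."""
    forecasts = []
    prev = y_T
    for _ in range(H):
        prev = phi * prev
        forecasts.append(prev)
    return np.array(forecasts)

f = ar1_chain_forecast(phi=0.5, y_T=8.0, H=3)
```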
Interval and Density Forecasting
222 / 281
Interval and Density Forecasting
(1-Step-Ahead, AR(1))
y_t = φy_{t−1} + ε_t, ε_t ∼ WN(0, σ²)
Back substitution yields the point forecast y_{t+1,t} = φy_t,
with error e_{t+1,t} = ε_{t+1} and variance
σ²_{t+1,t} = var(e_{t+1,t}) = σ²
223 / 281
Interval and Density Forecasting
(h-Step-Ahead, AR(p))
225 / 281
Now With Realization Superimposed...
226 / 281
SER’s
LSALESt → c, TIMEt
SER = .160
227 / 281
Structural Change in Time Series:
Evolution or Breaks in Any or all Parameters
228 / 281
Structural Change
Single Sharp Break, Exogenously Known
For simplicity of exposition, consider a bivariate regression:
Distributed F under the no-break null (and the rest of the IC)
230 / 281
Structural Change
Sharp Breakpoint, Endogenously Identified
231 / 281
Recursive Parameter Estimates
β̂1:t
for t = 1, ..., T
232 / 281
Recursive Residuals
233 / 281
Standardized Recursive Residuals and CUSUM
ŵ_{t+1,t} ≡ ê_{t+1,t} / (σ √r_{t+1,t}),
t = 1, ..., T − 1 (leaving room for startup)
235 / 281
Recursive Analysis, Breaking-Parameter DGP
236 / 281
Liquor Sales Model: Recursive Parameter Estimates
237 / 281
Liquor Sales Model: Recursive Residuals With Two
Standard Error Bands
238 / 281
Liquor Sales Model: CUSUM
239 / 281
Vector Autoregressions
240 / 281
Basic Framework
e.g., bivariate (2-variable) VAR(1)
242 / 281
Starts Sample Autocorrelations
243 / 281
Starts Sample Partial Autocorrelations
244 / 281
Completions Sample Autocorrelations
245 / 281
Completions Sample Partial Autocorrelations
246 / 281
Starts and Completions: Sample Cross Correlations
247 / 281
VAR Starts Equation
248 / 281
VAR Starts Equations Residual Plot
249 / 281
VAR Starts Equation Residual Sample Autocorrelations
250 / 281
VAR Starts Equation Residual Sample Partial
Autocorrelations
251 / 281
VAR Completions Equation
252 / 281
VAR Completions Equation Residual Plot
253 / 281
VAR Completions Equation Residual Sample
Autocorrelations
254 / 281
VAR Completions Equation Residual Sample Partial
Autocorrelations
255 / 281
Predictive Causality Analysis
256 / 281
Starts History and Forecast
257 / 281
...Now With Starts Realization
258 / 281
Completions History and Forecast
259 / 281
...Now With Completions Realization
260 / 281
Heteroskedasticity in Time Series
261 / 281
Dynamic Volatility is the Key to Finance and Financial
Economics
I Risk management
I Portfolio allocation
I Asset pricing
I Hedging
I Trading
262 / 281
Data: Financial Asset Returns
263 / 281
Data: Returns are Approximately Serially Uncorrelated
264 / 281
Corresponding Theory: Conditional Mean of Returns
E (rt |Ωt−1 ) = 0
265 / 281
Data: Returns are not Gaussian
266 / 281
Corresponding Theory: Recall the Varieties of White Noise
Independent (strong) white noise: ε_t ∼ iid (0, σ²)
Gaussian white noise: ε_t ∼ iid N(0, σ²)
267 / 281
Data: Returns are Conditionally Heteroskedastic I
268 / 281
Data: Returns are Conditionally Heteroskedastic II
269 / 281
We Need New Models
270 / 281
Standard Models (e.g., AR(1)) Fail...
rt = φrt−1 + εt , εt ∼ iidN(0, σ 2 )
Conditional mean:
E (rt | Ωt−1 ) = φrt−1 (varies)
Conditional variance:
var (rt | Ωt−1 ) = σ 2 (constant)
271 / 281
...So Introduce Special Heteroskedastic Disturbances
rt = φrt−1 + εt , εt ∼ iidN(0, ht )
rt | Ωt−1 ∼ N(0, ht )
272 / 281
Unified Theoretical Framework
and Tractable Empirical Framework
So take:
r_t = √h_t · ( t_d / std(t_d) )
274 / 281
Asymmetric Response and the Leverage Effect: T-GARCH
Standard GARCH: h_t = ω + α r²_{t−1} + β h_{t−1}
T-GARCH: h_t = ω + α r²_{t−1} + γ r²_{t−1} D_{t−1} + β h_{t−1}
where D_t = 1 if r_t < 0, 0 otherwise
275 / 281
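The variance recursions above can be sketched directly; setting γ = 0 recovers standard GARCH(1,1) (the parameter values below are illustrative):

```python
import numpy as np

def tgarch_variances(r, omega, alpha, gamma, beta, h0):
    """h_t = omega + alpha*r_{t-1}^2 + gamma*r_{t-1}^2*D_{t-1} + beta*h_{t-1},
    with D_t = 1 if r_t < 0 (the leverage effect). gamma = 0 gives
    standard GARCH(1,1)."""
    h = [h0]
    for r_prev in r[:-1]:
        D = 1.0 if r_prev < 0.0 else 0.0
        h.append(omega + alpha * r_prev**2 + gamma * r_prev**2 * D + beta * h[-1])
    return np.array(h)

# the negative return at t = 1 raises next-period variance more than the
# equal-sized positive return at t = 0 did
r = np.array([0.01, -0.01, 0.0])
h = tgarch_variances(r, omega=1e-5, alpha=0.05, gamma=0.1, beta=0.9, h0=1e-4)
```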
Back to Empirical Work:
GARCH Modeling of Daily NYSE Returns
276 / 281
Fitted GARCH Volatility
277 / 281
A Useful Specification Diagnostic
r_t|Ω_{t−1} ∼ N(0, h_t)
r_t = √h_t ε_t, ε_t ∼ iid N(0, 1)
r_t / √h_t = ε_t, ε_t ∼ iid N(0, 1)
278 / 281
GARCH Model Diagnostics
279 / 281
GARCH Volatility Forecast
(σ̂t+h,t , h = 1, 2, ...)
280 / 281
Volatility Forecasts Feed Into Return Density Forecasts
Now we have:
y_{t+h} ∼ N(y_{t+h,t}, σ²_{t+h,t})
281 / 281