TIME SERIES SUMMARY
by
[1]
0. CONTENTS
1 Handout I
   1.1 Examples of time series and objectives of time series analysis
       1.1.1 Objectives of Time Series Analysis
       1.1.2 General approach
   1.2 Financial Time Series
   1.3 Recap Covariance and Correlation
       1.3.1 Covariance
       1.3.2 Correlation Coefficient
   1.4 Stationary Models and measuring dependence using correlations
       1.4.1 Stationarity
       1.4.2 Autocovariance and Autocorrelation
       1.4.3 Estimating moments using sample moments
       1.4.4 Time series models used for this course
       1.4.5 IID noise
       1.4.6 White Noise
       1.4.7 Random Walk
       1.4.8 First-order moving average process
       1.4.9 First-order autoregressive process
   1.5 Removing Trend and Seasonal Components
       1.5.1 Removing trend components
   1.6 Testing the estimated noise terms
       1.6.1 Testing for independence
       1.6.2 Testing for Stationarity
       1.6.3 Testing for White Noise
       1.6.4 Testing for normality: Gaussian QQ-Plot
2 Handout II
   2.1 Linear Processes
       2.1.1 MA(q)-models
       2.1.2 AR(p)-models
       2.1.3 Estimation of AR(p)-models
   2.2 ARMA(p, q)-models
       2.2.1 Conclusion
   2.3 ARIMA(p, d, q)-models
   2.4 Conditional probability and expectation
   2.5 Forecasting Stationary Time Series
       2.5.1 Introduction to prediction
       2.5.2 One-step-ahead prediction for AR(p)
       2.5.3 Two-step-ahead prediction for AR(p)
       2.5.4 One-step-ahead prediction for MA(1)
       2.5.5 Two-step-ahead prediction for MA(1)
       2.5.6 l-step-ahead prediction for MA(1)
       2.5.7 Introduction to best linear predictor
       2.5.8 Best linear predictor
   2.6 Partial Autocorrelation Function (PACF)
       2.6.1 Formal Definition
       2.6.2 Equivalent Definition
       2.6.3 Distribution of sample PACF
3 Handout III
   3.1 Modeling Volatility
       3.1.1 Financial Time Series
       3.1.2 Heavy tails: Kurtosis
   3.2 ARCH-models
       3.2.1 ARCH(1)-model
       3.2.2 Properties of ARCH(1) model
       3.2.3 Squared ARCH(1) process
       3.2.4 Existence of a stationary ARCH(1)-process
       3.2.5 Summary ARCH(1)
       3.2.6 Estimation of ARCH(1)-processes
       3.2.7 Standardized residuals
       3.2.8 Forecasting with ARCH(1)
       3.2.9 Recursion relation for predicting ARCH(1)
       3.2.10 Kurtosis of stationary ARCH(1)
       3.2.11 Choice for IID-sequence {Z_t}
       3.2.12 Weaknesses of ARCH models
       3.2.13 Extension to ARCH(m)
   3.3 GARCH-models
       3.3.1 Properties of GARCH(1,1) model
       3.3.2 ARMA(1,1) representation of GARCH(1,1)
       3.3.3 Further properties and extensions of GARCH(1,1)
1. HANDOUT I
• Finding a suitable probability model that captures the uncertainty in a time series
• Specifying the joint distribution of (X_t, 0 ≤ t ≤ n)
Afterwards, the trend and seasonal components are removed so the remaining residuals are stationary. Then
a probability model is fitted on the residuals, which can be used to predict future values of the time series.
l_t = log(P_t / P_{t−1}) = log(1 + (P_t − P_{t−1})/P_{t−1}) = log(1 + R_t) ≈ R_t  (1.3)
Cov(X_1, X_2) = E[(X_1 − E[X_1])(X_2 − E[X_2])] = E[X_1 ⋅ X_2] − E[X_1]E[X_2]  (1.4)
If X_1 and X_2 are independent, then Cov(X_1, X_2) = 0. The converse does not hold in general; an exception is the bivariate normal distribution, for which zero covariance does imply independence.
Cov(aX + bY + c, Z) = a ⋅ Cov(X, Z) + b ⋅ Cov(Y, Z)  (1.5)
−1 ≤ ρ(X_1, X_2) ≤ 1
The correlation measures the linear dependence between X_1 and X_2.
A time series {X_t} is (weakly) stationary if:
1. E[X_t] = µ is independent of t.
2. γ_X(h) = Cov(X_{t+h}, X_t) is independent of t for each lag h.
γ_X(h) = Cov(X_{t+h}, X_t)  (1.7)
ρ_X(h) = γ_X(h) / γ_X(0) = ρ(X_{t+h}, X_t)  (1.8)
Note that these functions are even, so that γ_X(h) = γ_X(−h) and ρ_X(h) = ρ_X(−h).
Sample Autocovariance:
γ̂(h) := (1/n) ∑_{t=1}^{n−|h|} (x_{t+h} − x̄)(x_t − x̄)  (1.10)
Sample Autocorrelation:
ρ̂(h) = γ̂(h) / γ̂(0) = ∑_{t=1}^{n−|h|} (x_{t+h} − x̄)(x_t − x̄) / ∑_{t=1}^{n} (x_t − x̄)²  (1.11)
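As an illustration, the following is a minimal Python sketch of (1.10) and (1.11); the helper names (sample_acvf, sample_acf) are illustrative and not from the notes, and the course's own tooling is R.

```python
# Minimal sketch of the sample autocovariance (1.10) and autocorrelation (1.11).
import numpy as np

def sample_acvf(x, h):
    """gamma_hat(h) = (1/n) * sum_{t=1}^{n-|h|} (x_{t+h} - xbar)(x_t - xbar)."""
    x = np.asarray(x, dtype=float)
    n, h = len(x), abs(h)
    xbar = x.mean()
    return np.sum((x[h:] - xbar) * (x[:n - h] - xbar)) / n

def sample_acf(x, h):
    """rho_hat(h) = gamma_hat(h) / gamma_hat(0)."""
    return sample_acvf(x, h) / sample_acvf(x, 0)

rng = np.random.default_rng(0)
z = rng.standard_normal(500)          # IID N(0, 1) noise
print([round(sample_acf(z, h), 3) for h in range(4)])  # approx [1, 0, 0, 0]
```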
Time series models used for this course:
1. IID noise
2. White noise
3. Random walk
4. First-order moving average (MA(1)) process
5. First-order autoregressive (AR(1)) process
6. ARMA processes
7. ARCH processes
8. GARCH processes
For IID noise:
Cov(X_s, X_t) = σ² if s = t, and 0 otherwise  (1.12)
ρ_X(h) = 1 if h = 0, and 0 if h ≠ 0  (1.14)
For white noise {X_t} ~ WN(0, σ²):
γ_X(h) = σ² if h = 0, and 0 if h ≠ 0  (1.15)
and
ρ_X(h) = 1 if h = 0, and 0 if h ≠ 0  (1.16)
This looks similar to IID noise, but it is not equivalent. A sequence {X_t} can be WN but not IID, for example when it is not independently distributed, such as:
X_t = Z_t if t is even, and X_t = (Z²_{t−1} − 1)/√2 if t is odd
with {Z_t} ~ IID N(0, 1). This sequence is clearly not independent (for odd t, X_t depends on Z_{t−1}) and is therefore not IID. It is, however, WN(0, 1).
Distribution of the sample ACF for white noise: Identifying white noise from data can be done by looking
at the autocorrelation plot.
If {X_t} ~ WN(0, σ²), then for all h ≥ 1:
√n ρ̂_n(h) is approximately N(0, 1) as n → ∞.
Thus ρ̂_n(h) ≈ N(0, 1/n) for sufficiently large n, and with probability 0.95 (the 2σ bound):
−1.96/√n ≤ ρ̂_n(h) ≤ 1.96/√n  (1.17)
Note: these are the two lines that appear in ACF plots in R. If the autocorrelations for all h ≥ 1 fall between
these two lines, you can say with 95% confidence that the sequence is white noise.
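A small Python sketch of this check, assuming numpy: under the white-noise hypothesis about 5% of the sample autocorrelations (roughly 1 in 20) fall outside the bounds by chance, so counting exceedances is more honest than demanding that none occur. The name acf_exceedances is illustrative.

```python
import numpy as np

def acf_exceedances(x, max_lag=20):
    """Count how many of the first max_lag sample autocorrelations fall
    outside the 95% white-noise bounds +/- 1.96/sqrt(n) from (1.17)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    gamma0 = np.sum(xc * xc) / n
    bound = 1.96 / np.sqrt(n)
    rho = [np.sum(xc[h:] * xc[:n - h]) / (n * gamma0) for h in range(1, max_lag + 1)]
    return sum(abs(r) > bound for r in rho)

rng = np.random.default_rng(1)
print(acf_exceedances(rng.standard_normal(1000)))            # ~1 of 20 by chance
print(acf_exceedances(np.cumsum(rng.standard_normal(1000)))) # many: not white noise
```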
For the random walk X_t = Z_1 + Z_2 + ... + Z_t with {Z_t} ~ IID(0, σ²):
E[X_t] = 0
Var(X_t) = Var(Z_1) + Var(Z_2) + ... + Var(Z_t) = tσ²  (1.19)
Hence, for h > 0:
Cov(X_{t+h}, X_t) = Cov(X_t + Z_{t+1} + ... + Z_{t+h}, X_t)
= Cov(X_t, X_t) + Cov(Z_{t+1}, X_t) + ... + Cov(Z_{t+h}, X_t)
= Cov(X_t, X_t) = Var(X_t) = tσ²  (1.20)
Since the variance and autocovariance depend on t, the random walk is not stationary.
For the first-order moving average process X_t = Z_t + θZ_{t−1} with {Z_t} ~ WN(0, σ²):
Cov(X_t, X_{t+h}) = σ²(1 + θ²) if h = 0, σ²θ if h = ±1, and 0 if |h| ≥ 2  (1.22)
Therefore:
ρ_X(h) = 1 if h = 0, θ/(1 + θ²) if h = ±1, and 0 if |h| ≥ 2  (1.23)
X_t = φX_{t−1} + Z_t
= φ(φX_{t−2} + Z_{t−1}) + Z_t
= φ²X_{t−2} + φZ_{t−1} + Z_t
= φ^k X_{t−k} + ∑_{j=0}^{k−1} φ^j Z_{t−j}  (1.25)
→ ∑_{j=0}^{∞} φ^j Z_{t−j} as k → ∞, provided |φ| < 1.
Then, for h ≥ 0:
γ_X(h) = Cov(X_t, X_{t+h}) = σ² φ^h / (1 − φ²) = φ^h γ_X(0)  (1.26)
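A hedged sketch, assuming numpy: simulate an AR(1) with φ = 0.7 and compare the sample ACF with the theoretical value φ^h from (1.26).

```python
import numpy as np

rng = np.random.default_rng(2)
phi, n = 0.7, 5000
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.standard_normal()   # X_t = phi X_{t-1} + Z_t

xc = x - x.mean()
gamma0 = np.sum(xc * xc) / n
for h in range(1, 5):
    rho_hat = np.sum(xc[h:] * xc[:n - h]) / (n * gamma0)
    print(h, round(rho_hat, 3), round(phi ** h, 3))  # sample ACF vs phi^h
```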
X_t = m_t + s_t + Y_t  (1.27)
with:
• m_t a trend component
• s_t a seasonal component
• Y_t a stationary random noise component
The analysis is then done by estimating and extracting the trend and seasonal components, and then finding a suitable probabilistic model for Y_t.
The removal of trend components can be done in several ways, including:
• Polynomial fitting
• Differencing
POLYNOMIAL FITTING
A polynomial trend is fitted (e.g. by least squares) and subtracted:
m_t = a_0 + a_1 t + ... + a_k t^k  (1.28)
DIFFERENCING
The backward shift operator B is defined by:
B X t = X t −1 (1.31)
∇( X t ) = X t − X t −1 = (1 − B ) X t (1.32)
Differencing once removes linear trends, while differencing twice removes quadratic trends.
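A minimal sketch of differencing with numpy's np.diff; note how the linear trend becomes a constant level equal to the slope.

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.arange(200, dtype=float)
x = 0.5 * t + rng.standard_normal(200)   # linear trend plus noise

dx = np.diff(x)           # (1 - B) X_t = X_t - X_{t-1}
d2x = np.diff(x, n=2)     # (1 - B)^2 X_t, would also remove a quadratic trend

# differencing turns the linear trend into a constant level (the slope 0.5)
print(round(dx.mean(), 2))                      # approx 0.5
print(round(np.polyfit(t[1:], dx, 1)[0], 4))    # remaining slope approx 0
```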
Each triple of consecutive observations can appear in six possible orderings; if the time series is IID, each ordering has equal probability. 2/3 of these orderings are turning points, so in a series of n points the expected number of turning points T_n is:
E[T_n] = (2/3)(n − 2)  (1.33)
For large n, T_n is approximately normal:
U_n := (T_n − 2(n − 2)/3) / √((16n − 29)/90) ~ N(0, 1)  (1.34)
H_0, the null hypothesis that the series is IID, is rejected for large or small values of U_n (typically outside the 2σ or 3σ bounds).
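A sketch of the turning point test, assuming numpy; turning_point_stat is an illustrative name.

```python
import numpy as np

def turning_point_stat(x):
    """Compute U_n from (1.34): standardized count of turning points."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # t is a turning point if x_t is a strict local maximum or minimum
    mid, left, right = x[1:-1], x[:-2], x[2:]
    tn = np.sum(((mid > left) & (mid > right)) | ((mid < left) & (mid < right)))
    mean = 2.0 * (n - 2) / 3.0
    var = (16.0 * n - 29.0) / 90.0
    return (tn - mean) / np.sqrt(var)

rng = np.random.default_rng(4)
print(abs(turning_point_stat(rng.standard_normal(500))) < 1.96)  # IID: likely True
```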
LJUNG-BOX-TEST
The Ljung-Box statistic is:
Q_LB = n(n + 2) ∑_{h=1}^{H} ρ̂_n(h)² / (n − h)  (1.36)
If {X_t} ~ WN(0, σ²), then Q_LB ≈ χ²(H) (for n sufficiently large). Critical values for the test are determined from the significance level and the χ²-distribution with H degrees of freedom.
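A from-scratch sketch of (1.36), assuming numpy and scipy are available (statistical packages also ship ready-made versions of this test).

```python
import numpy as np
from scipy.stats import chi2

def ljung_box(x, H=10):
    """Return the Ljung-Box statistic Q_LB and its chi-squared p-value."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    gamma0 = np.sum(xc * xc) / n
    q = 0.0
    for h in range(1, H + 1):
        rho_h = np.sum(xc[h:] * xc[:n - h]) / (n * gamma0)
        q += rho_h ** 2 / (n - h)
    q *= n * (n + 2)
    return q, chi2.sf(q, df=H)

rng = np.random.default_rng(5)
q, p = ljung_box(rng.standard_normal(1000))
print(round(q, 2), round(p, 3))  # large p: no evidence against white noise
```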
With F_n the empirical distribution function and X_(i) the i-th order statistic:
F(X_(i)) ≈ F_n(X_(i)) = i/n  (1.38)
so that
X_(i) ≈ F^{−1}(i/n) = µ + σΦ^{−1}(i/n)
So, if the hypothesis that {X_t} is normally distributed holds, then the points
(Φ^{−1}(i/(n + 1)), X_(i))  (1.39)
should lie approximately on a straight line, with intercept µ and slope σ.
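A sketch of the QQ construction (1.39) in Python, assuming scipy's norm.ppf for Φ^{−1}; fitting a line through the points recovers µ and σ approximately.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(6)
x = rng.normal(loc=2.0, scale=3.0, size=300)

x_sorted = np.sort(x)                        # order statistics X_(i)
i = np.arange(1, len(x) + 1)
q = norm.ppf(i / (len(x) + 1))               # theoretical quantiles Phi^{-1}(i/(n+1))

slope, intercept = np.polyfit(q, x_sorted, 1)
print(round(intercept, 2), round(slope, 2))  # approx mu = 2 and sigma = 3
```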
Alternative formal tests for normality include:
• Shapiro-Wilk test
• Jarque-Bera test
• etc.
2. HANDOUT II
A linear process {X_t} has the form:
X_t = ∑_{j=−∞}^{∞} ψ_j Z_{t−j}  (2.1)
where {Z_t} ~ WN(0, σ²) and ∑_{j=−∞}^{∞} |ψ_j| < ∞.
A process is called causal if ψ_j = 0 for j < 0, and it can then be written as follows:
X_t = Ψ(B)Z_t with Ψ(B) = ψ_0 + ψ_1 B + ψ_2 B² + ...  (2.2)
X_t = Z_t + θ_1 Z_{t−1} + ... + θ_q Z_{t−q}  (2.3)
where {Z_t} ~ WN(0, σ²) and θ_q ≠ 0.
We write:
X_t = Θ(B)Z_t with Θ(B) = 1 + θ_1 B + θ_2 B² + ... + θ_q B^q  (2.5)
γ_X(h) = σ² ∑_{j=0}^{q−|h|} θ_j θ_{j+|h|} if |h| ≤ q, and 0 if |h| > q  (2.6)
with the convention θ_0 = 1.
Conversely: if {X_t} is a stationary time series with mean zero for which γ(h) = 0 for |h| > q, it can be represented as an MA(q) process.
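A sketch of (2.6) with the convention θ_0 = 1; ma_acvf is an illustrative helper name. The cutoff of γ_X(h) after lag q is how MA orders are identified in practice.

```python
import numpy as np

def ma_acvf(theta, sigma2, h):
    """gamma_X(h) for X_t = Z_t + theta_1 Z_{t-1} + ... + theta_q Z_{t-q}."""
    th = np.concatenate(([1.0], np.asarray(theta, dtype=float)))  # theta_0 = 1
    h = abs(h)
    if h > len(th) - 1:
        return 0.0                       # zero beyond lag q
    return sigma2 * np.sum(th[: len(th) - h] * th[h:])

theta = [0.5, -0.3]                      # an MA(2) example
print([round(ma_acvf(theta, 1.0, h), 3) for h in range(4)])
# gamma(0) = 1.34, gamma(1) = 0.35, gamma(2) = -0.3, gamma(3) = 0
```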
X_t = ϕ_1 X_{t−1} + ϕ_2 X_{t−2} + ... + ϕ_p X_{t−p} + Z_t  (2.7)
where {Z_t} ~ WN(0, σ²).
We write: Φ(B)X_t = Z_t
with Φ(B) the autoregressive operator: Φ(B) = 1 − ϕ_1 B − ϕ_2 B² − ... − ϕ_p B^p
AR(1)-MODELS
X_t = ϕX_{t−1} + Z_t = ∑_{j=0}^{∞} ϕ^j Z_{t−j} if |ϕ| < 1, and X_t = −∑_{j=1}^{∞} ϕ^{−j} Z_{t+j} if |ϕ| > 1  (2.8)
It can be seen that the solution for ∣ϕ∣ > 1 is not useful for prediction, since it is non-causal and depends on
the future.
The solution for ∣ϕ∣ < 1 is the unique causal solution to the AR defining equation.
The coefficients {ψ_j} can be found by matching coefficients in the relation Φ(B)Ψ(B) = 1.
After fitting, the residuals (or standardized residuals) should be tested for whiteness and normality. If the residuals are not white noise or not normal, the model is inappropriate for the data. The question then becomes: how do we select the right AR model? The goal is to find the most parsimonious model (the model that uses the fewest parameters) that still fits the data well.
This is done using the Akaike Information Criterion (AIC) or its corrected version (AICc).
An ARMA(p, q) process Φ(B)X_t = Θ(B)Z_t is causal if the roots of Φ(z) are outside the unit circle (so the roots satisfy |z| > 1); invertibility requires the same of the roots of Θ(z). A causal ARMA process can be written as a linear process:
X_t = Ψ(B)Z_t with Ψ(B) = Θ(B)/Φ(B)  (2.20)
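A sketch of computing the ψ-weights, using the standard recursion ψ_0 = 1, ψ_j = θ_j + ∑_{k=1}^{min(j,p)} ϕ_k ψ_{j−k} implied by Φ(B)Ψ(B) = Θ(B); the function name is illustrative.

```python
import numpy as np

def psi_weights(phi, theta, n_weights=10):
    """psi-weights of a causal ARMA: Psi(B) = Theta(B) / Phi(B)."""
    phi = np.asarray(phi, dtype=float)      # AR coefficients phi_1..phi_p
    theta = np.asarray(theta, dtype=float)  # MA coefficients theta_1..theta_q
    psi = np.zeros(n_weights)
    psi[0] = 1.0
    for j in range(1, n_weights):
        th_j = theta[j - 1] if j - 1 < len(theta) else 0.0   # theta_j (0 past q)
        ar_part = sum(phi[k - 1] * psi[j - k]
                      for k in range(1, min(j, len(phi)) + 1))
        psi[j] = th_j + ar_part
    return psi

# ARMA(1,1) with phi = 0.5, theta = 0.4: psi_j = (phi + theta) * phi^(j-1), j >= 1
print(np.round(psi_weights([0.5], [0.4], 5), 4))  # [1, 0.9, 0.45, 0.225, 0.1125]
```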
2.2.1. CONCLUSION
Type of process ACF PACF
AR(p) Exponential decay Zero after lag p
MA(q) Zero after lag q Exponential decay
ARMA(p,q) Exponential decay Exponential decay
2. Apply transformations (differencing, polynomial fitting, ...) to turn the data into a stationary time series.
If multiple ARMA models seem reasonable, compare them using either the smallest ψ-weights or the smallest
AIC(C)-value.
For example, {X_t} is an ARIMA(2, 1, 1) process if the differenced series
Y_t = (1 − B)X_t = X_t − X_{t−1}  (2.23)
satisfies the ARMA(2, 1) equation:
Y_t − ϕ_1 Y_{t−1} − ϕ_2 Y_{t−2} = Z_t + θ_1 Z_{t−1}  (2.24)
P_{Y|X}(y|x) = P(Y = y | X = x)  (2.25)
whenever P(X = x) > 0. It follows from the definition that:
P_{Y|X}(y|x) = P(X = x, Y = y) / P(X = x)  (2.26)
If X and Y are independent, then P(X = x, Y = y) = P(X = x)P(Y = y), so that:
P_{Y|X}(y|x) = P_Y(y)  (2.27)
The conditional expectation of Y given X = x is denoted by E[Y | X = x] and defined as:
E[Y | X = x] = ∑_y y ⋅ p_{Y|X}(y|x)  (2.28)
If X and Y are independent, then:
E[Y | X = x] = E[Y]  (2.29)
The law of total expectation (tower property) states:
E[Y] = ∑_x E[Y | X = x]P(X = x) = E[E[Y | X]]  (2.30)
3. Taking out what is known:
E [ f ( X )g ( Z )∣ X ] = f ( X )E [g ( Z )∣ X ] (2.31)
Suppose that X_1, X_2, ... is a sequence of random variables with E[X_i²] < ∞ and E[X_i] = µ. The random variable f(X_1, X_2, ..., X_n) that minimizes the mean squared error
E[(X_{n+1} − f(X_1, ..., X_n))²]  (2.32)
is given by the conditional expectation:
f(X_1, ..., X_n) = E[X_{n+1} | F_n]  (2.33)
This best predictor is denoted by:
P_n X_{n+1} = E[X_{n+1} | F_n]  (2.34)
For an AR(p) process:
X_{n+1} = ϕ_1 X_n + ϕ_2 X_{n−1} + ... + ϕ_p X_{n−p+1} + Z_{n+1} = ∑_{i=1}^{p} ϕ_i X_{n+1−i} + Z_{n+1}  (2.35)
where we assume that {Z_t} ~ IID(0, σ²). Since E[Z_{n+1} | F_n] = 0, the best predictor is P_n X_{n+1} = ∑_{i=1}^{p} ϕ_i X_{n+1−i}. The one-step-ahead forecast error is then:
e_n(1) = X_{n+1} − P_n X_{n+1} = Z_{n+1}  (2.37)
so the forecast error is exactly the new noise term Z_{n+1}.
X_{n+2} = ϕ_1 X_{n+1} + ϕ_2 X_n + ... + ϕ_p X_{n+2−p} + Z_{n+2}  (2.40)
The best predictor then becomes:
P_n X_{n+2} = E[X_{n+2} | F_n] = ϕ_1 P_n X_{n+1} + ∑_{i=2}^{p} ϕ_i X_{n+2−i}  (2.41)
e_n(2) = X_{n+2} − P_n X_{n+2}
= ϕ_1 (X_{n+1} − P_n X_{n+1}) + Z_{n+2}  (2.42)
= ϕ_1 Z_{n+1} + Z_{n+2}
Note that Var(e_n(2)) ≥ Var(e_n(1)). This method can be extended to an l-step-ahead forecast, as sketched below.
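A minimal sketch of this iterated ("plug-in") AR forecast, assuming known coefficients; ar_forecast is an illustrative name.

```python
import numpy as np

def ar_forecast(x, phi, steps=2):
    """Iterated best predictor for an AR(p) with known coefficients phi_1..phi_p."""
    hist = list(np.asarray(x, dtype=float))
    preds = []
    for _ in range(steps):
        p = len(phi)
        pred = sum(phi[i] * hist[-1 - i] for i in range(p))
        preds.append(pred)
        hist.append(pred)   # plug the forecast in for the unknown future value
    return preds

x = [0.2, -0.1, 0.5, 0.3]
print(ar_forecast(x, phi=[0.6, -0.2], steps=2))
# P_n X_{n+1} = 0.6*0.3 - 0.2*0.5 = 0.08; P_n X_{n+2} = 0.6*0.08 - 0.2*0.3 = -0.012
```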
For the MA(1) model, the two-step and one-step-ahead forecast errors are:
e_n(2) = X_{n+2} − P_n X_{n+2} = Z_{n+2} + θ_1 Z_{n+1}  (2.45)
e_n(1) = X_{n+1} − P_n X_{n+1} = Z_{n+1}  (2.48)
The best linear predictor of X_{n+h} has the form P_n X_{n+h} = a_0 + a_1 X_n + ... + a_n X_1, where the coefficients a_0, a_1, ..., a_n minimize the mean squared error:
E[(X_{n+h} − a_0 − a_1 X_n − ... − a_n X_1)²]
Setting the partial derivatives ∂/∂a_j to zero yields:
0 = E[X_{n+h} − a_0 − ∑_{i=1}^{n} a_i X_{n+1−i}] if j = 0
0 = E[(X_{n+h} − a_0 − ∑_{i=1}^{n} a_i X_{n+1−i}) X_{n+1−j}] if j = 1, 2, ..., n  (2.56)
Writing a_n = (a_1, ..., a_n)ᵀ, Γ_n = [γ(i − j)]_{i,j=1}^{n} and γ_n(h) = (γ(h), γ(h+1), ..., γ(h+n−1))ᵀ, the second set of equations reads Γ_n a_n = γ_n(h). If Γ_n is non-singular, then:
a_n = Γ_n^{−1} ⋅ γ_n(h)  (2.58)
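A sketch of (2.58) using scipy's solve_toeplitz, which exploits the Toeplitz structure of Γ_n; the AR(1) autocovariance (1.26) serves as a check, since there the best linear one-step predictor uses only X_n.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

phi, sigma2, n, h = 0.7, 1.0, 5, 1
gamma = lambda k: sigma2 * phi ** abs(k) / (1 - phi ** 2)   # AR(1) ACVF (1.26)

col = np.array([gamma(k) for k in range(n)])       # first column of Gamma_n
rhs = np.array([gamma(h + k) for k in range(n)])   # gamma_n(h)
a = solve_toeplitz(col, rhs)                       # a_n = Gamma_n^{-1} gamma_n(h)
print(np.round(a, 6))   # approx [0.7, 0, 0, 0, 0]: only X_n matters for AR(1)
```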
α(h) = ρ(X_{t+h} − P_I X_{t+h}, X_t − P_I X_t)  (2.59)
where P_I denotes the best linear prediction based on the observations with indices in
I = I_{h,t} = {t + 1, ..., t + h − 1}  (2.60)
BLP(X_t | X_{t−1}) = ϕ_{1,1} X_{t−1}
BLP(X_t | X_{t−1}, X_{t−2}) = ϕ_{2,1} X_{t−1} + ϕ_{2,2} X_{t−2}
BLP(X_t | X_{t−1}, X_{t−2}, X_{t−3}) = ϕ_{3,1} X_{t−1} + ϕ_{3,2} X_{t−2} + ϕ_{3,3} X_{t−3}  (2.61)
⋮
Then:
α(h) = ϕ_{h,h}  (2.62)
BLP(X_2 | X_1) = ϕ_{1,1} X_1
BLP(X_3 | X_2, X_1) = ϕ_{2,1} X_2 + ϕ_{2,2} X_1
BLP(X_4 | X_3, X_2, X_1) = ϕ_{3,1} X_3 + ϕ_{3,2} X_2 + ϕ_{3,3} X_1  (2.63)
⋮
This implies α(h) = ϕ_{hh} is the coefficient of X_1 in the best linear predictor P_{1:h} X_{h+1}, i.e. ϕ_{hh} is uniquely determined by:
[ γ(0)     γ(1)     ⋯  γ(h−1) ] [ ϕ_{h1} ]   [ γ(1) ]
[ γ(1)     γ(0)     ⋯  γ(h−2) ] [ ϕ_{h2} ] = [ γ(2) ]
[  ⋮         ⋮       ⋱     ⋮   ] [   ⋮    ]   [   ⋮   ]
[ γ(h−1)  γ(h−2)   ⋯   γ(0)  ] [ ϕ_{hh} ]   [ γ(h) ]    (2.64)
This result can be used for order selection of AR processes using the following relations:
|α̂(h)| > 1.96/√n for 0 ≤ h ≤ p
|α̂(h)| ≤ 1.96/√n for h > p  (2.66)
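A sketch of order selection via (2.64) and (2.66): solve the Yule-Walker-type system at each lag with sample autocovariances and compare ϕ̂_hh with the bounds; assumes numpy and scipy, and the function name is illustrative.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def sample_pacf(x, max_lag):
    """alpha_hat(h) = phi_hh, the last coefficient of the solution of (2.64)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    g = np.array([np.sum(xc[h:] * xc[:n - h]) / n for h in range(max_lag + 1)])
    alpha = []
    for h in range(1, max_lag + 1):
        phis = solve_toeplitz(g[:h], g[1:h + 1])   # Gamma_h phi = gamma(1..h)
        alpha.append(phis[-1])
    return np.array(alpha)

rng = np.random.default_rng(7)
x = np.zeros(2000)
for t in range(1, 2000):              # AR(1) with phi = 0.6
    x[t] = 0.6 * x[t - 1] + rng.standard_normal()

alpha = sample_pacf(x, 5)
print(np.abs(alpha) > 1.96 / np.sqrt(len(x)))
# expected: True only at lag 1 (up to sampling noise), suggesting p = 1
```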
3. HANDOUT III
R_t ≈ l_t = ∇[log(P_t)]  (3.2)
Let F_t = σ(X_1, ..., X_t) denote the information up to and including time t.
κ_Y = E[(Y − µ)⁴] / (E[(Y − µ)²])²  (3.3)
The sample kurtosis is:
κ̂_n = (1/n) ∑_{i=1}^{n} (Y_i − Ȳ_n)⁴ / ((1/n) ∑_{i=1}^{n} (Y_i − Ȳ_n)²)²  (3.4)
X_t = σ_t Z_t
σ_t² = ω + αX²_{t−1}  (3.5)
We assume ω, α ≥ 0. Suppose {Z_t} ~ IID N(0, 1); then:
X_t | X_{t−1} ~ N(0, ω + αX²_{t−1})  (3.6)
This conditional distribution has non-constant variance, so the process is called Conditionally Heteroscedastic (CH).
X_t ∈ F_t and σ_t² ∈ F_{t−1}.
Conditional Properties:
1. E[X_t | F_{t−1}] = 0
2. Var(X_t | F_{t−1}) = σ_t²
Unconditional Properties:
1. E [ X t ] = 0
2. for h > 0, C ov ( X t , X t +h ) = 0
3. V ar ( X t ) = ω + αV ar ( X t −1 )
• A causal stationary process exists if ω ≥ 0 and α ∈ [0, 1). In that case { X t } ~W N (0, ω/(1 − α))
• The squared process { X t2 } is AR(1) with nonzero mean and non-Gaussian noise terms.
X_t² = ω + αX²_{t−1} + V_t where V_t = X_t² − σ_t²  (3.11)
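A simulation sketch of a stationary ARCH(1), checking the unconditional variance ω/(1 − α) quoted above; the parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(8)
omega, alpha, n = 0.5, 0.4, 100_000
x = np.zeros(n)
for t in range(1, n):
    sigma_t = np.sqrt(omega + alpha * x[t - 1] ** 2)  # sigma_t^2 = omega + alpha X_{t-1}^2
    x[t] = sigma_t * rng.standard_normal()

print(round(x.var(), 3), round(omega / (1 - alpha), 3))  # both near 0.833
```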
If {Z_t} ~ N(0, 1), then:
f_{ω,α}(x_t | x_{t−1}) = (1/√(2π(ω + αx²_{t−1}))) ⋅ exp(−x_t² / (2(ω + αx²_{t−1})))  (3.13)
Then find (ω̂, α̂) that maximize L. (Note that unless a large sample size is used, this likelihood function tends
to be flat.)
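A sketch of conditional maximum likelihood based on (3.13), minimizing the negative log-likelihood with scipy; the flatness caveat above is why a fairly large sample is used here.

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglik(params, x):
    """Negative conditional Gaussian log-likelihood of an ARCH(1)."""
    omega, alpha = params
    s2 = omega + alpha * x[:-1] ** 2          # conditional variances for t = 2..n
    return 0.5 * np.sum(np.log(2 * np.pi * s2) + x[1:] ** 2 / s2)

rng = np.random.default_rng(9)
omega, alpha, n = 0.5, 0.4, 20_000
x = np.zeros(n)
for t in range(1, n):
    x[t] = np.sqrt(omega + alpha * x[t - 1] ** 2) * rng.standard_normal()

res = minimize(neg_loglik, x0=[0.1, 0.1], args=(x,),
               bounds=[(1e-6, None), (0.0, 0.999)])
print(np.round(res.x, 3))   # estimates close to (0.5, 0.4)
```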
With { Z t } ~IID(0,1).
After estimation using maximum likelihood this gives (µ̂, ω̂, α̂).
The predicted volatility is defined by: σ̂_t = √(ω̂ + α̂X²_{t−1})
This means that the forecast is always zero, however, the prediction bands change over time.
As h → ∞, the h-step-ahead predicted variance converges to the stationary variance:
σ̂²_n(h) → ω/(1 − α)  (3.17)
KURTOSIS OF STATIONARY ARCH(1)
κ_X = κ_Z (1 − α²)/(1 − να²) > κ_Z  (3.21)
where ν = E[Z_t⁴] = κ_Z and provided να² < 1: the ARCH(1) process has heavier tails than its driving noise.
A common choice is a standardized t-distribution for {Z_t}, with density:
f(x) = Γ((ν + 1)/2) / (Γ(ν/2)√((ν − 2)π)) ⋅ (1 + x²/(ν − 2))^{−(ν+1)/2} for 2 < ν ≤ ∞
For ν = ∞ this corresponds to N(0, 1), while finite ν > 2 yields heavier tails.
• The ARCH model provides a way to describe conditional variance. It does not give indications on what
causes such behavior.
• The ARCH model is restrictive, i.e., constraints, such as 3α2 < 1, limit the ability of ARCH models with
conditional Gaussian innovations to capture excess kurtosis.
X_t = σ_t Z_t
σ_t² = ω + α_1 X²_{t−1} + ... + α_m X²_{t−m}  (3.22)
If { X t } ~ARCH(m), then { X t2 } ~AR(m). So the PACF of { X t2 } can be used to determine the order m.
X_t = σ_t Z_t
σ_t² = ω + αX²_{t−1} + βσ²_{t−1}  (3.23)
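A simulation sketch of GARCH(1,1) with α + β < 1, checking the stationary variance ω/(1 − α − β) derived below; the parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(10)
omega, alpha, beta, n = 0.1, 0.1, 0.8, 200_000
x = np.zeros(n)
s2 = np.full(n, omega / (1 - alpha - beta))   # start at the stationary variance
for t in range(1, n):
    s2[t] = omega + alpha * x[t - 1] ** 2 + beta * s2[t - 1]
    x[t] = np.sqrt(s2[t]) * rng.standard_normal()

print(round(x.var(), 3), round(omega / (1 - alpha - beta), 3))  # both near 1.0
```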
As before, X_t ∈ F_t and σ_t² ∈ F_{t−1}.
Conditional properties (as for ARCH(1)):
1. E[X_t | F_{t−1}] = 0
2. Var(X_t | F_{t−1}) = σ_t²
Unconditional properties:
1. E[X_t] = 0
2. Cov(X_t, X_{t+h}) = 0 for h > 0
3. Var(X_t) = ω/(1 − α − β) (provided α + β < 1)
Proof: From X_t = σ_t Z_t and V_t := X_t² − σ_t² we get:
X_t² = σ_t² + V_t = ω + αX²_{t−1} + βσ²_{t−1} + V_t
So, using β(X²_{t−1} − σ²_{t−1}) = βV_{t−1}:
X_t² = ω + (α + β)X²_{t−1} + V_t − βV_{t−1}
i.e. {X_t²} satisfies an ARMA(1,1) equation with noise terms {V_t}.
Y_t = µ + ϕY_{t−1} + X_t
with:
X_t = σ_t Z_t and σ_t² = ω + αX²_{t−1} + βσ²_{t−1}
• Modeling multiple time-series is very interesting from a practical point of view as well.