
Ruprecht-Karls-Universität Heidelberg

Fakultät für Mathematik und Informatik

Bachelor's thesis

submitted for the academic degree of

Bachelor of Science (B. Sc.)

GARCH(1,1) models

submitted by

Brandon Williams

15 July 2011

Supervisor: Prof. Dr. Rainer Dahlhaus



Abstract
In this thesis, GARCH(1,1)-models for the analysis of financial time series are investigated. First,
sufficient and necessary conditions will be given for the process to have a stationary solution.
Then, asymptotic results for relevant estimators will be derived and used to develop parametric
tests. Finally, the methods will be illustrated with an empirical example.
Contents

1 Introduction
2 Stationarity
3 A central limit theorem
4 Parameter estimation
5 Tests
6 Variants of the GARCH(1,1) model
7 GARCH(1,1) in continuous time
8 Example with MATLAB
9 Discussion

1 Introduction
Modelling financial time series is a major application and area of research in probability theory and
statistics. One of the challenges particular to this field is the presence of heteroskedastic effects,
meaning that the volatility of the considered process is generally not constant. Here the volatility
is the square root of the conditional variance of the log return process given its previous values.
That is, if P_t is the time series evaluated at time t, one defines the log returns

X_t = log P_{t+1} − log P_t

and the volatility σ_t, where

σ_t² = Var[X_t | F_{t−1}],

and F_{t−1} is the σ-algebra generated by X_0, ..., X_{t−1}. Heuristically, it makes sense that the volatility
of such processes should change over time, due to any number of economic and political factors,
and this is one of the well known “stylized facts” of mathematical finance.

The presence of heteroskedasticity is ignored in some financial models such as the Black-Scholes
model, which is widely used to determine the fair pricing of European-style options. While this leads
to an elegant closed-form formula, it makes assumptions about the distribution and stationarity
of the underlying process which are unrealistic in general. Another commonly used homoskedas-
tic model is the Ornstein-Uhlenbeck process, which is used in finance to model interest rates and
credit markets. This application is known as the Vasicek model and suffers from the homoskedastic
assumption as well.

ARCH (autoregressive conditional heteroskedasticity) models were introduced by Robert Engle


in a 1982 paper to account for this behavior. Here the conditional variance process is given an au-
toregressive structure and the log returns are modelled as a white noise multiplied by the volatility:

X_t = e_t σ_t

σ_t² = ω + α_1 X_{t−1}² + ... + α_p X_{t−p}²,
where et (the ’innovations’) are i.i.d. with expectation 0 and variance 1 and are assumed indepen-
dent from σk for all k ≤ t. The lag length p ≥ 0 is part of the model specification and may be
determined using the Box-Pierce or similar tests for autocorrelation significance, where the case
p = 0 corresponds to a white noise process. To ensure that σt2 remains positive, ω, αi ≥ 0 ∀i is
required.
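The lag-selection step mentioned above can be sketched with the Box–Pierce statistic Q = n Σ_{k=1}^h r_k², which is asymptotically χ²_h-distributed under the white-noise null (a minimal Python sketch; the demo series is synthetic, not data from this thesis):

```python
import random

def autocorr(x, k):
    """Sample autocorrelation of x at lag k (mean-corrected)."""
    n = len(x)
    m = sum(x) / n
    c0 = sum((v - m) ** 2 for v in x) / n
    ck = sum((x[t] - m) * (x[t + k] - m) for t in range(n - k)) / n
    return ck / c0

def box_pierce(x, h):
    """Box-Pierce statistic Q = n * sum of squared autocorrelations up to lag h.
    Under the null of white noise, Q is asymptotically chi^2 with h degrees of freedom."""
    n = len(x)
    return n * sum(autocorr(x, k) ** 2 for k in range(1, h + 1))

random.seed(0)
x = [random.gauss(0, 1) for _ in range(500)]  # white noise: Q should stay small
q = box_pierce(x, h=10)
print(q)  # compare against the chi^2_10 95% quantile, about 18.31
```

A large Q relative to the χ²_h quantile suggests autocorrelation in the (squared) series, motivating a nonzero lag length p.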

Tim Bollerslev (1986) extended the ARCH model to allow σt2 to have an additional autoregres-
sive structure within itself. The GARCH(p,q) (generalized ARCH) model is given by

X_t = e_t σ_t

σ_t² = ω + α_1 X_{t−1}² + ... + α_p X_{t−p}² + β_1 σ_{t−1}² + ... + β_q σ_{t−q}².
This model, in particular the simpler GARCH(1,1) model, has become widely used in financial
time series modelling and is implemented in most statistics and econometric software packages.
GARCH(1,1) models are favored over other stochastic volatility models by many economists due

to their relatively simple implementation: since they are given by stochastic difference equations
in discrete time, and since financial data is generally gathered at discrete intervals, the likelihood
function is easier to handle than that of continuous-time models.
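The GARCH(1,1) recursion itself is only a few lines of code (a sketch with illustrative parameter values, not values fitted to any data):

```python
import random

def simulate_garch11(n, omega, alpha, beta, seed=0):
    """Simulate X_t = e_t * sigma_t with sigma_t^2 = omega + alpha*X_{t-1}^2 + beta*sigma_{t-1}^2,
    starting sigma_0^2 at the stationary mean omega / (1 - alpha - beta)."""
    rng = random.Random(seed)
    sigma2 = omega / (1.0 - alpha - beta)   # requires alpha + beta < 1
    xs, sig2s = [], []
    for _ in range(n):
        e = rng.gauss(0.0, 1.0)             # i.i.d. innovations, mean 0, variance 1
        x = e * sigma2 ** 0.5
        xs.append(x)
        sig2s.append(sigma2)
        sigma2 = omega + alpha * x * x + beta * sigma2
    return xs, sig2s

xs, sig2s = simulate_garch11(10_000, omega=0.1, alpha=0.1, beta=0.85)
print(min(sig2s))  # conditional variance stays positive since omega > 0 and alpha, beta >= 0
```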

However, there are also improvements to be made on the standard GARCH model. A notable
problem is the inability to react differently to positive and negative innovations, where in reality,
volatility tends to increase more after a large negative shock than an equally large positive shock.
This is known as the leverage effect and possible solutions to this problem are discussed further in
section 6.

Without loss of generality, the time t will be assumed in the following sections to take values
in either N0 or in Z.

2 Stationarity
The first task is to determine suitable parameter sets for the model. In the introduction, we
considered that ω, α, β ≥ 0 is necessary to ensure that the conditional variance σt2 remains non-
negative at all times t. It is also important to find parameters ω, α, β which ensure that σt2 has finite
expected value or higher moments. Another consideration which will be important when studying
the asymptotic properties of GARCH models is whether σt2 converges to a stationary distribution.
Unfortunately, we will see that these conditions translate to rather severe restrictions on the choice
of parameters.
Definition 1. A process X_t is called (strictly) stationary if for all times t_1, ..., t_n and all h ∈ Z:

F_X(x_{t_1+h}, ..., x_{t_n+h}) = F_X(x_{t_1}, ..., x_{t_n}),

where F_X(x_{t_1}, ..., x_{t_n}) is the joint cumulative distribution function of X_{t_1}, ..., X_{t_n}.

Theorem 2. Let ω > 0 and α, β ≥ 0. Then the GARCH(1,1) equations have a stationary solution if and only if E[log(αe_t² + β)] < 0. In this case the solution is uniquely given by

σ_t² = ω(1 + Σ_{j=1}^∞ Π_{i=1}^j (αe_{t−i}² + β)).

Proof. With the equation σ_t² = ω + (αe_{t−1}² + β)σ_{t−1}², by repeated substitution for σ_{t−1}², etc., we arrive at the equation

σ_t² = ω(1 + Σ_{j=1}^k Π_{i=1}^j (αe_{t−i}² + β)) + (Π_{i=1}^{k+1} (αe_{t−i}² + β)) σ_{t−k−1}²,

which is valid for all k ∈ N. In particular,


σ_t² ≥ ω(1 + Σ_{j=1}^k Π_{i=1}^j (αe_{t−i}² + β)),

since α, β ≥ 0. Assume that σt2 is a stationary solution and that E[log(αe2t + β)] ≥ 0. We have
log E[Π_{i=1}^j (αe_{t−i}² + β)] ≥ E[log Π_{i=1}^j (αe_{t−i}² + β)] = Σ_{i=1}^j E[log(αe_{t−i}² + β)],

and therefore, if E[log(αe_t² + β)] > 0, then the product Π_{i=1}^j (αe_{t−i}² + β) diverges a.s. by the strong law of large numbers. In the case that E[log(αe_t² + β)] = 0, the sum Σ_{i=1}^j log(αe_{t−i}² + β) is a random walk process, so that

lim sup_{j→∞} Σ_{i=1}^j log(αe_{t−i}² + β) = ∞ a.s.,
so that in both cases we have

lim sup_{j→∞} Π_{i=1}^j (αe_{t−i}² + β) = ∞ a.s.

Since all terms of the series are nonnegative, we then have

σ_t² ≥ lim sup_{j→∞} ω Π_{i=1}^j (αe_{t−i}² + β) = ∞ a.s.,

which is impossible; therefore, E[log(αe_t² + β)] < 0 is necessary for the existence of a stationary solution. On the other hand, let E[log(αe_t² + β)] < 0. Then there exists a ξ > 1 with log ξ + E[log(αe_t² + β)] < 0. For this ξ we have by the strong law of large numbers:

log ξ + (1/n) Σ_{i=1}^n log(αe_{t−i}² + β) → log ξ + E[log(αe_t² + β)] < 0 a.s.,

so

log(ξ^n Π_{i=1}^n (αe_{t−i}² + β)) = n(log ξ + (1/n) Σ_{i=1}^n log(αe_{t−i}² + β)) → −∞ a.s.,

and

ξ^n Π_{i=1}^n (αe_{t−i}² + β) → 0 a.s.
Therefore, the series

ω(1 + Σ_{j=1}^∞ Π_{i=1}^j (αe_{t−i}² + β))

converges a.s. To show uniqueness, assume that σ_t² and σ̂_t² are both stationary solutions; then

|σ_t² − σ̂_t²| = (αe_{t−1}² + β)|σ_{t−1}² − σ̂_{t−1}²| = (ξ^n Π_{i=1}^n (αe_{t−i}² + β)) ξ^{−n} |σ_{t−n}² − σ̂_{t−n}²| → 0 in probability,

since the first factor tends to zero a.s. and ξ^{−n}|σ_{t−n}² − σ̂_{t−n}²| tends to zero in probability by stationarity. This means that P(|σ_t² − σ̂_t²| > ε) = 0 for all ε > 0, so σ_t² = σ̂_t² a.s.
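The representation in Theorem 2 can be checked against the defining recursion term by term: if S_t^{(J)} denotes the series truncated after J products, then ω + (αe_{t−1}² + β) S_{t−1}^{(J)} = S_t^{(J+1)} exactly, by reindexing the products. A small sketch (illustrative parameters):

```python
import random

rng = random.Random(1)
omega, alpha, beta = 0.1, 0.1, 0.85
e2 = [rng.gauss(0, 1) ** 2 for _ in range(600)]  # squared innovations e_t^2

def sigma2_series(t, J):
    """Truncated series omega * (1 + sum_{j=1}^{J} prod_{i=1}^{j} (alpha*e_{t-i}^2 + beta))."""
    total, prod = 1.0, 1.0
    for j in range(1, J + 1):
        prod *= alpha * e2[t - j] + beta
        total += prod
    return omega * total

t, J = 500, 200
lhs = omega + (alpha * e2[t - 1] + beta) * sigma2_series(t - 1, J)
rhs = sigma2_series(t, J + 1)
print(abs(lhs - rhs))  # agrees up to floating-point rounding
```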

Corollary 3. The GARCH(1,1) equations with ω > 0 and α, β ≥ 0 have a stationary solution with finite expected value if and only if α + β < 1, and in this case E[σ_t²] = ω/(1−α−β).

Proof. Since E[log(αe_t² + β)] ≤ log(E[αe_t² + β]) = log(α + β) < 0, the conditions of Theorem 2 are fulfilled. We have

E[σ_t²] = E[ω(1 + Σ_{j=1}^∞ Π_{i=1}^j (αe_{t−i}² + β))] = ω(1 + Σ_{j=1}^∞ E[Π_{i=1}^j (αe_{t−i}² + β)]) = ω(1 + Σ_{j=1}^∞ (α+β)^j) = ω/(1−α−β)

if this series converges, that is, if α + β < 1, and ∞ otherwise.

Remark 4. This theorem shows that strictly stationary IGARCH(1,1) processes (those where α + β = 1) exist. For example, if e_t is standard normally distributed, and α = 1, β = 0, then

E[log(αe_t² + β)] = E[log e_t²] = −(γ + log 2) < 0,

where γ ≈ 0.577 is the Euler–Mascheroni constant. Therefore, the equations X_t = e_t σ_t; σ_t² = X_{t−1}², or equivalently X_t = e_t |X_{t−1}|, define a stationary process which has infinite variance at every t. On the other hand, σ_t² = ω + σ_{t−1}² (the case α = 0, β = 1) has no stationary solution.
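The value E[log e_t²] = −(γ + log 2) for standard normal innovations can be checked by simulation (a sketch; the sample size is chosen so the Monte Carlo error is far below the tolerance):

```python
import math
import random

rng = random.Random(0)
N = 1_000_000
est = sum(math.log(rng.gauss(0, 1) ** 2) for _ in range(N)) / N

gamma = 0.5772156649015329  # Euler-Mascheroni constant
print(est, -(gamma + math.log(2)))  # both approximately -1.27
```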
In some applications, we may require that the GARCH process have finite higher-order moments;
for example, when studying its tail behavior it is useful to study its excess kurtosis, which requires
the fourth moment to exist and be finite. This leads to further restrictions on the coefficients α
and β.
For a stationary GARCH process,

E[X_t⁴] = E[e_t⁴] E[σ_t⁴] = ω² E[e_t⁴] E[(1 + Σ_{j=1}^∞ Π_{i=1}^j (αe_{t−i}² + β))²]
= ω² E[e_t⁴] (1 + 2 Σ_{j=1}^∞ (α+β)^j + Σ_{k=1}^∞ Σ_{l=1}^∞ E[(αe_t² + β)²]^{k∧l} (α+β)^{k∨l − k∧l}),

which is finite if and only if ρ² := E[(αe_t² + β)²] < 1. In this case, using the recursion σ_t² = ω + σ_{t−1}²(αe_{t−1}² + β),

E[σ_t⁴] = ω² + 2ω E[αe_{t−1}² + β] E[σ_{t−1}²] + E[(αe_{t−1}² + β)²] E[σ_{t−1}⁴] = ω² + 2ω² (α+β)/(1−α−β) + ρ² E[σ_t⁴],

so

E[X_t⁴] = E[σ_t⁴] E[e_t⁴] = ω² E[e_t⁴] (1+α+β)/((1−ρ²)(1−α−β)).
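The double series above can be cross-checked numerically against the closed form E[σ_t⁴] = ω²(1+α+β)/((1−ρ²)(1−α−β)) (a sketch; illustrative parameters, normal innovations so E[e_t⁴] = 3):

```python
omega, alpha, beta = 0.1, 0.1, 0.8
pi = alpha + beta                                   # persistence alpha + beta
kappa = 3.0                                         # E[e^4] for standard normal innovations
rho2 = alpha**2 * kappa + 2 * alpha * beta + beta**2  # rho^2 = E[(alpha*e^2 + beta)^2]

K = 400  # truncation; pi^K and rho2^K are negligible here
series = 1.0
series += 2 * sum(pi**j for j in range(1, K))
series += sum(rho2**min(k, l) * pi**(max(k, l) - min(k, l))
              for k in range(1, K) for l in range(1, K))
lhs = omega**2 * series
rhs = omega**2 * (1 + pi) / ((1 - rho2) * (1 - pi))
print(lhs, rhs)  # the truncated series matches the closed form
```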
In the case of normally distributed innovations e_t, the condition ρ² < 1 reads

E[(αe_t² + β)²] = α² E[e_t⁴] + 2αβ E[e_t²] + β² = 3α² + 2αβ + β² = (α+β)² + 2α² < 1.

The excess kurtosis of X_t with normally distributed innovations is then

E[X_t⁴]/Var[X_t]² − 3 = 3ω² (1+α+β)/((1−α−β)(1−2α²−(α+β)²)) · (1−α−β)²/ω² − 3 = 3(1−(α+β)²)/(1−2α²−(α+β)²) − 3 = 6α²/(1−2α²−(α+β)²) > 0,

which means that Xt is leptokurtic, or heavy-tailed. This implies that outliers in the GARCH
model should occur more frequently than they would with a process of i.i.d. normally distributed
variables, which is consistent with empirical studies of financial processes.
More generally, for Xt to have a finite 2n-th moment (n ∈ N) a necessary and sufficient condition
is that E[(αe2t + β)n ] < 1.
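The fourth-moment condition and the resulting heavy tails can be checked numerically from the formulas above (a sketch; normal innovations, so E[e_t⁴] = 3, and the parameter values are illustrative):

```python
omega, alpha, beta = 0.1, 0.08, 0.85
pi = alpha + beta
rho2 = pi**2 + 2 * alpha**2        # 3a^2 + 2ab + b^2 = (a+b)^2 + 2a^2
assert rho2 < 1                    # fourth moment exists

ex2 = omega / (1 - pi)                                    # E[X^2]
ex4 = 3 * omega**2 * (1 + pi) / ((1 - rho2) * (1 - pi))   # E[X^4]
ratio = ex4 / ex2**2
excess = 6 * alpha**2 / (1 - 2 * alpha**2 - pi**2)        # simplified excess kurtosis
print(ratio - 3, excess)  # both agree and are positive: the process is leptokurtic
```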

Another interesting feature of GARCH processes is the extent to which an innovation e_t at time t persists in the conditional variance σ_{t+h}² at a later time. To consider this mathematically we will use the following definition. For the GARCH(1,1) process X = (X_t), define

π_X(t, n) = σ_0² Π_{i=1}^{t+n} (αe_{t+n−i}² + β) + ω Σ_{k=n}^{t+n−1} Π_{j=1}^k (αe_{t+n−j}² + β).

Definition 5. The innovation e_t does not persist in X in L¹ iff

E[π_X(t, n)] → 0 (n → ∞),

and does not persist almost surely (a.s.) iff

π_X(t, n) → 0 a.s. (n → ∞).

If every innovation e_t persists in X, then we call X persistent.
To see how this definition reflects the heuristic meaning of a shock innovation persisting in the conditional variance, consider that for a GARCH time series with finite variance,

E[π_X(t, n)] = E[(E[σ_{t+n}²] − E[σ_{t+n}² | e_t]) (1−α−β)/(ω(α+β)^{n−1}) Π_{i=1}^{n−1} (αe_{t+n−i}² + β)] = (E[σ_{t+n}²] − E[σ_{t+n}² | e_t]) (1−α−β)/ω,

which tends to zero if and only if E[σ_{t+n}²] − E[σ_{t+n}² | e_t] tends to zero as well.
Theorem 6. (i) If et persists in X in L1 for any t, then et persists in X in L1 for all t. This is
the case if and only if α + β ≥ 1.
(ii) If et persists in X a.s. for any t, then et persists in X a.s. for all t. This is the case if and
only if E[log(αe2t + β)] ≥ 0.
Proof. (i) First,

E[π_X(t, n)] = σ_0² Π_{i=1}^{t+n} E[αe_{t+n−i}² + β] + ω Σ_{k=n}^{t+n−1} Π_{j=1}^k E[αe_{t+n−j}² + β].

For this value to converge to zero (that is, for e_t not to persist), we need E[σ_0²] to be finite, which means α + β < 1. On the other hand, let α + β < 1. Then, with E[σ_0²] = ω/(1−α−β), we have

E[π_X(t, n)] = (ω/(1−α−β))(α+β)^{t+n} + ω Σ_{k=n}^{t+n−1} (α+β)^k = ω(α+β)^n/(1−α−β) → 0 (n → ∞),

so et does not persist.
(ii) Let E[log(αe_t² + β)] < 0. By the strong law of large numbers,

(1/n) Σ_{i=1}^n log(αe_{t−i}² + β) → E[log(αe_t² + β)] < 0 a.s.,

so

log Π_{i=1}^n (αe_{t−i}² + β) = n((1/n) Σ_{i=1}^n log(αe_{t−i}² + β)) → −∞ a.s.

and therefore

Π_{i=1}^n (αe_{t−i}² + β) → 0 a.s.

This means that both terms of

π_X(t, n) = σ_0² Π_{i=1}^{t+n} (αe_{t+n−i}² + β) + ω Σ_{k=n}^{t+n−1} Π_{j=1}^k (αe_{t+n−j}² + β)

tend to zero a.s., so π_X(t, n) → 0 a.s. On the other hand, let E[log(αe_t² + β)] ≥ 0. Then by the argument in the proof of Theorem 2, we have

lim sup_{j→∞} Π_{i=1}^j (αe_{t−i}² + β) = ∞ a.s.,

so that π_X(t, n) cannot converge to zero.
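For α + β < 1 the decay of E[π_X(t, n)] in part (i) can be verified directly: the two geometric sums collapse to ω(α+β)ⁿ/(1−α−β). A small sketch (illustrative parameters):

```python
omega, alpha, beta = 0.2, 0.1, 0.7
pi = alpha + beta
t = 5

def expected_persistence(n):
    """E[pi_X(t,n)] with E[sigma_0^2] = omega/(1-pi), as in the proof of Theorem 6(i)."""
    first = (omega / (1 - pi)) * pi ** (t + n)
    second = omega * sum(pi ** k for k in range(n, t + n))
    return first + second

for n in (1, 5, 20):
    closed = omega * pi ** n / (1 - pi)
    assert abs(expected_persistence(n) - closed) < 1e-12
print("E[pi_X(t,n)] decays like pi^n")
```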

It is a peculiar property of GARCH(1,1) models that the concept of persistence depends strongly on the type of convergence used in the definition. Persistence in L¹ is the more intuitive sense, since it excludes pathological volatility processes such as σ_t² = 3X_{t−1}² (the case α = 3, β = 0), which is strictly stationary since E[log(3e_t² + 0)] = −(γ + log 2) + log 3 < 0, yet in which every innovation persists in L¹ because α + β ≥ 1.

Definition 7. We call π := α + β the persistence of a GARCH(1,1) model with parameters ω, α, β.

As we have seen earlier, the persistence of the model limits the kurtosis the process can take.
Since the estimated best-fit GARCH process to a time series often has persistence close to 1, this
severely limits the value of α to ensure the existence of the fourth moment. From the representation
of σ_t² in Theorem 2, we immediately have

Theorem 8. The autocorrelation function (ACF) of {Xn2 } decays exponentially to zero with rate
π if π < 1.

3 A central limit theorem
Having derived the admissible parameter space, we consider the task of estimating the parameters
and predicting the values of X_t at future times t. Since X_t is centered at every t, a natural estimator for its variance is the average of the squares (1/n) Σ_{t=1}^n X_t². The following theorem will show that, under a stationarity and moment assumption, this is a consistent and asymptotically normal estimator.
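On simulated data the estimator lands close to the true value ω/(1−α−β) (a sketch; fixed seed, with the sample size chosen so the sampling error is well inside the tolerance):

```python
import random

omega, alpha, beta = 0.2, 0.05, 0.7
rng = random.Random(42)
sigma2 = omega / (1 - alpha - beta)   # start at the stationary mean
xs = []
for _ in range(100_000):
    x = rng.gauss(0, 1) * sigma2 ** 0.5
    xs.append(x)
    sigma2 = omega + alpha * x * x + beta * sigma2

mean_sq = sum(v * v for v in xs) / len(xs)   # (1/n) * sum of X_t^2
print(mean_sq, omega / (1 - alpha - beta))   # both close to 0.8
```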
Theorem 9. For a wide-sense stationary GARCH(1,1) process X_t with Var[X_t²] < ∞, E[e_t⁴] < ∞ and parameters ω, α, β, the following holds:

(1/√n) Σ_{t=1}^n (X_t² − E[X_t²]) →ᴰ N(0, ω² (1+α+β)/(1−α−β)² ((E[e_t⁴](1+α−β) + 2β)/(1−ρ²) − 1/(1−α−β))),

where ρ² := E[(αe_t² + β)²] as in Section 2.


Proof. By Corollary 3 the condition E[X_t²] < ∞ implies that α + β < 1 and we have

E[X_t²] = E[σ_t²] = ω/(1−α−β).

Similarly, we have seen that E[X_t⁴] is finite if and only if ρ² < 1, and in this case one has

E[X_t⁴] = E[σ_t⁴] E[e_t⁴] = ω² E[e_t⁴] (1+α+β)/((1−ρ²)(1−α−β)).

Define Y_t := X_t² − E[X_t²]. Then the variables {Y_1, Y_2, ...} are weakly dependent in the following sense:

Lemma 10. Let s_1 < s_2 < ... < s_u < s_u + r = t and let f : Rᵘ → R be square-integrable and measurable. Then

|Cov[f(Y_{s_1}, ..., Y_{s_u}), Y_t]| ≤ C √(E[f²(Y_{s_1}, ..., Y_{s_u})]) ρʳ

for a constant C which is independent of s_1, ..., s_u, r, where ρ := √(E[(αe_t² + β)²]).

Proof. By Theorem 2 we can write

Y_t = ω e_t² (1 + Σ_{j=1}^∞ Π_{i=1}^j (αe_{t−i}² + β)) − E[X_t²].

We define the helper variable

Ỹ_t = ω e_t² (1 + Σ_{j=1}^{r−1} Π_{i=1}^j (αe_{t−i}² + β)) − E[X_t²].

Then Ỹ_t is independent of (Y_{s_1}, ..., Y_{s_u}) and by the Cauchy–Schwarz inequality:

|Cov[f(Y_{s_1}, ..., Y_{s_u}), Y_t]| = |Cov[f(Y_{s_1}, ..., Y_{s_u}), Y_t − Ỹ_t]| ≤ √(E[f²(Y_{s_1}, ..., Y_{s_u})]) √(E[(Y_t − Ỹ_t)²]).

However, we have

E[(Y_t − Ỹ_t)²] = ω² E[e_t⁴] E[(Σ_{j=r}^∞ Π_{i=1}^j (αe_{t−i}² + β))²]
= ω² E[e_t⁴] E[Π_{k=1}^r (αe_{t−k}² + β)²] E[(1 + Σ_{j=r+1}^∞ Π_{i=r+1}^j (αe_{t−i}² + β))²]
= ρ^{2r} E[e_t⁴] E[σ_t⁴] = ρ^{2r} E[X_t⁴],

and therefore

|Cov[f(Y_{s_1}, ..., Y_{s_u}), Y_t]| ≤ √(E[X_t⁴]) √(E[f²(Y_{s_1}, ..., Y_{s_u})]) ρʳ,

so the claim holds with C = √(E[X_t⁴]).

Similarly, we will need an additional inequality:

Lemma 11. Let s_1 < s_2 < ... < s_u < s_u + r = t and let f : Rᵘ → R be bounded and measurable. Then

|Cov[f(Y_{s_1}, ..., Y_{s_u}), Y_t Y_{t+h}]| ≤ C ‖f‖_∞ ρʳ

for any h > 0.

Proof. Define Ỹ_t and Ỹ_{t+h} as before. Then by Hölder's inequality,

|Cov[f(Y_{s_1}, ..., Y_{s_u}), Y_t Y_{t+h}]| = |Cov[f(Y_{s_1}, ..., Y_{s_u}), Y_t Y_{t+h} − Ỹ_t Ỹ_{t+h}]| ≤ 2‖f‖_∞ E[|Y_t Y_{t+h} − Ỹ_t Ỹ_{t+h}|].

Using the triangle and Cauchy–Schwarz inequalities, we have

E[|Y_t Y_{t+h} − Ỹ_t Ỹ_{t+h}|] ≤ E[|Y_t − Ỹ_t| |Y_{t+h}|] + E[|Y_{t+h} − Ỹ_{t+h}| |Ỹ_t|]
≤ √(E[(Y_t − Ỹ_t)²]) √(E[Y_{t+h}²]) + √(E[(Y_{t+h} − Ỹ_{t+h})²]) √(E[Ỹ_t²])
≤ ρʳ √(E[X_t⁴]) (√(E[Y_{t+h}²]) + √(E[Ỹ_t²])),

so that

|Cov[f(Y_{s_1}, ..., Y_{s_u}), Y_t Y_{t+h}]| ≤ 2√(E[X_t⁴]) (√(E[Y_t²]) + √(E[Ỹ_t²])) ‖f‖_∞ ρʳ =: C ‖f‖_∞ ρʳ.

The theorem to be proved is now

(1/√n) Σ_{t=1}^n Y_t →ᴰ N(0, ω² (1+α+β)/(1−α−β)² ((E[e_t⁴](1+α−β) + 2β)/(1−ρ²) − 1/(1−α−β))).

Define

σ² := Σ_{k=−∞}^∞ E[Y_0 Y_k] = E[Y_0²] + 2 Σ_{k=1}^∞ E[Y_0 Y_k]

and

σ̃_n² := Var[(1/√n) Σ_{t=1}^n Y_t].

Then we have

σ² = E[X_0⁴] − E[X_0²]² + 2 Σ_{k=1}^∞ (E[X_0² X_k²] − E[X_0²] E[X_k²]).

However, for k ≥ 1, using the recursion from the proof of Theorem 2,

E[X_0² X_k²] = E[σ_0² e_0² σ_k²] = E[σ_0² e_0² (ω(1 + Σ_{j=1}^{k−1} Π_{i=1}^j (αe_{k−i}² + β)) + σ_0² Π_{i=1}^k (αe_{k−i}² + β))]
= ω (1−(α+β)^k)/(1−α−β) E[σ_0²] + (αE[e_t⁴] + β)(α+β)^{k−1} E[σ_0⁴]
= ω² (1−(α+β)^k)/(1−α−β)² + (αE[e_t⁴] + β)(α+β)^{k−1} E[σ_0⁴],

so

σ² = E[e_t⁴] E[σ_t⁴] − ω²/(1−α−β)² + 2 Σ_{k=1}^∞ ((αE[e_t⁴] + β)(α+β)^{k−1} E[σ_t⁴] − ω² (α+β)^k/(1−α−β)²)
= E[σ_t⁴] (E[e_t⁴](1+α−β) + 2β)/(1−α−β) − ω² (1+α+β)/(1−α−β)³
= ω² (1+α+β)/(1−α−β)² ((E[e_t⁴](1+α−β) + 2β)/(1−ρ²) − 1/(1−α−β)).
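As a sanity check, the covariance series can be summed numerically. Writing κ := E[e_t⁴], the moment formulas of Section 2 give Cov[X_0², X_k²] = (ακ + β)(α+β)^{k−1} E[σ_t⁴] − ω²(α+β)^k/(1−α−β)², and summing this series reproduces the closed form (a sketch with illustrative parameters):

```python
omega, alpha, beta = 0.1, 0.05, 0.9
pi = alpha + beta
kappa = 3.0                                        # E[e^4], normal innovations
rho2 = 3 * alpha**2 + 2 * alpha * beta + beta**2   # rho^2 = E[(alpha*e^2 + beta)^2]

e_s4 = omega**2 * (1 + pi) / ((1 - rho2) * (1 - pi))   # E[sigma^4]
var_y0 = kappa * e_s4 - (omega / (1 - pi)) ** 2        # Var[X_0^2]

def cov(k):
    """Cov[X_0^2, X_k^2] from the moment formulas."""
    return (alpha * kappa + beta) * pi ** (k - 1) * e_s4 - omega**2 * pi**k / (1 - pi) ** 2

series = var_y0 + 2 * sum(cov(k) for k in range(1, 3000))
closed = (omega**2 * (1 + pi) / (1 - pi) ** 2
          * ((kappa * (1 + alpha - beta) + 2 * beta) / (1 - rho2) - 1 / (1 - pi)))
print(series, closed)  # the two agree
```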

Since Y_t is centered for every t, we have

|σ̃_n² − σ²| = |(1/n) Σ_{s,t=1}^n E[Y_s Y_t] − σ²| = |E[Y_0²] + 2 Σ_{k=1}^{n−1} (1 − k/n) E[Y_0 Y_k] − σ²|
≤ 2 Σ_{k=n}^∞ |E[Y_0 Y_k]| + 2 Σ_{k=1}^{n−1} (k/n) |E[Y_0 Y_k]|
≤ 2C √(E[Y_0²]) Σ_{k=n}^∞ ρᵏ + 2C √(E[Y_0²]) Σ_{k=1}^{n−1} kρᵏ/n → 0 (n → ∞),

using the weak dependence of the variables {Y_k} (Lemma 10); the first sum tends to zero as the tail of a convergent series, and the limit of the second sum is due to Kronecker's lemma since Σ_{k=1}^∞ ρᵏ < ∞. This means that σ̃_n² → σ² for n → ∞.

We now define

W_{n,t} := Y_t/(σ̃_n √n) = (X_t² − E[X_t²])/(σ̃_n √n) (t ≤ n).

By Slutsky's theorem, it is enough to show that Σ_{t=1}^n W_{n,t} →ᴰ N(0, 1). For k ≤ n define

v_{n,k} := Var[Σ_{t=1}^k W_{n,t}] − Var[Σ_{t=1}^{k−1} W_{n,t}].

Then we have

v_{n,k} = Var[W_{n,k}] + 2 Σ_{t=1}^{k−1} Cov[W_{n,t}, W_{n,k}] = (1/(nσ̃_n²)) (E[Y_0²] + 2 Σ_{t=1}^{k−1} E[Y_0 Y_t]) = (1/(nσ̃_n²)) (σ² − 2 Σ_{t=k}^∞ E[Y_0 Y_t]),

where the last sum tends to zero, so nv_{n,k} tends to 1 as n → ∞ for all k. In particular we have max_{k≤n} v_{n,k} → 0 (n → ∞), and v_{n,k} is positive for all n ≥ k > N_0 for a large enough N_0 ∈ N. Without loss of generality, we may assume that v_{n,k} > 0 holds for all n, k.
For every n ∈ N let Z_{n,1}, Z_{n,2}, ..., Z_{n,n} be independent random variables, also independent from W_{n,k} for every k, and such that Z_{n,k} ∼ N(0, v_{n,k}). Since the v_{n,k} telescope,

Z_{n,1} + ... + Z_{n,n} ∼ N(0, Σ_{k=1}^n v_{n,k}) = N(0, 1),

it is enough to show that

E[f(W_{n,1} + ... + W_{n,n}) − f(Z_{n,1} + ... + Z_{n,n})] → 0 (n → ∞)

for any bounded function f ∈ C³(R) with bounded derivatives. Let f be any such function. For 1 ≤ k ≤ n we define the partial sums

S_{n,k} := W_{n,1} + ... + W_{n,k−1},  T_{n,k} := Z_{n,k+1} + ... + Z_{n,n}

(where S_{n,1} = T_{n,n} = 0), as well as

Δ_{n,k} := E[f(S_{n,k} + W_{n,k} + T_{n,k}) − f(S_{n,k} + Z_{n,k} + T_{n,k})].

Clearly,

E[f(W_{n,1} + ... + W_{n,n}) − f(Z_{n,1} + ... + Z_{n,n})] = Σ_{k=1}^n Δ_{n,k}.
We now split Δ_{n,k} = Δ_{n,k}^{(1)} − Δ_{n,k}^{(2)} by defining

Δ_{n,k}^{(1)} := E[f(S_{n,k} + W_{n,k} + T_{n,k})] − E[f(S_{n,k} + T_{n,k})] − (v_{n,k}/2) E[f″(S_{n,k} + T_{n,k})],
Δ_{n,k}^{(2)} := E[f(S_{n,k} + Z_{n,k} + T_{n,k})] − E[f(S_{n,k} + T_{n,k})] − (v_{n,k}/2) E[f″(S_{n,k} + T_{n,k})],

and consider each term separately. By applying Taylor's theorem to f as a function of Z_{n,k} around 0, there exists a random variable ξ_{n,k} ∈ (0, 1) with

Δ_{n,k}^{(2)} = E[Z_{n,k} f′(S_{n,k} + T_{n,k})] + E[(Z_{n,k}²/2) f″(S_{n,k} + ξ_{n,k}Z_{n,k} + T_{n,k})] − (v_{n,k}/2) E[f″(S_{n,k} + T_{n,k})],

where the first term is zero because Z_{n,k} is centered and independent of S_{n,k}, T_{n,k}. By the mean value theorem,

|Σ_{k=1}^n Δ_{n,k}^{(2)}| ≤ C Σ_{k=1}^n E[|Z_{n,k}|³] = C′ Σ_{k=1}^n v_{n,k}^{3/2} ≤ C′ max_{1≤k≤n} v_{n,k}^{1/2} Σ_{k=1}^n v_{n,k} → 0 (n → ∞),

since Σ_{k=1}^n v_{n,k} = 1.

Showing this for Δ_{n,k}^{(1)} is somewhat more difficult. Similarly to above, we find by Taylor's theorem a random variable τ_{n,k} ∈ (0, 1) so that

Δ_{n,k}^{(1)} = E[W_{n,k} f′(S_{n,k} + T_{n,k})] + E[(W_{n,k}²/2) f″(S_{n,k} + τ_{n,k}W_{n,k} + T_{n,k})] − (v_{n,k}/2) E[f″(S_{n,k} + T_{n,k})],

where

E[W_{n,k} f′(S_{n,k} + T_{n,k})] = Σ_{j=1}^{k−1} E[W_{n,k}(f′(S_{n,j+1} + T_{n,k}) − f′(S_{n,j} + T_{n,k}))] = Σ_{j=1}^{k−1} E[W_{n,k} W_{n,j} f″(S_{n,j} + ξ_{n,k,j}W_{n,j} + T_{n,k})]

for random variables ξ_{n,k,j} ∈ (0, 1), again by Taylor's theorem. Since v_{n,k} = E[W_{n,k}²] + 2 Σ_{j=1}^{k−1} E[W_{n,k}W_{n,j}], we then have

Δ_{n,k}^{(1)} = Σ_{j=1}^{k−1} E[W_{n,k}W_{n,j}(f″(S_{n,j} + ξ_{n,k,j}W_{n,j} + T_{n,k}) − E[f″(S_{n,k} + T_{n,k})])]  (=: Δ_{n,k}^{(1,1)})
+ (1/2) E[W_{n,k}²(f″(S_{n,k} + τ_{n,k}W_{n,k} + T_{n,k}) − E[f″(S_{n,k} + T_{n,k})])]  (=: Δ_{n,k}^{(1,2)}).

For every d ∈ N we split Δ_{n,k}^{(1,2)} as follows (the factor 1/2 is absorbed into the constants below):

Δ_{n,k}^{(1,2)} = Δ_{n,k,d}^{(1,2,1)} + Δ_{n,k,d}^{(1,2,2)} + Δ_{n,k,d}^{(1,2,3)},

where

Δ_{n,k,d}^{(1,2,1)} = E[W_{n,k}²(f″(S_{n,k} + τ_{n,k}W_{n,k} + T_{n,k}) − f″(S_{n,k−d} + T_{n,k}))],
Δ_{n,k,d}^{(1,2,2)} = E[W_{n,k}²(f″(S_{n,k−d} + T_{n,k}) − E[f″(S_{n,k−d} + T_{n,k})])],
Δ_{n,k,d}^{(1,2,3)} = E[W_{n,k}²(E[f″(S_{n,k−d} + T_{n,k})] − E[f″(S_{n,k} + T_{n,k})])].
From the second weak dependence lemma (Lemma 11, whose argument applies equally to Y_k²), it follows that

|Δ_{n,k,d}^{(1,2,2)}| = (1/(nσ̃_n²)) |E[Y_k²(f″(S_{n,k−d} + T_{n,k}) − E[f″(S_{n,k−d} + T_{n,k})])]| = (1/(nσ̃_n²)) |Cov[f″(S_{n,k−d} + T_{n,k}), Y_k²]| ≤ (1/(nσ̃_n²)) C ‖f″‖_∞ ρᵈ,

and therefore for every δ > 0 there exists a D_0 ∈ N so that for all d > D_0:

Σ_{k=1}^n |Δ_{n,k,d}^{(1,2,2)}| ≤ (C/σ̃_n²) ‖f″‖_∞ ρᵈ ≤ δ.

For any such d, using the Cauchy–Schwarz inequality and the fact that f″ is bounded, for any ε > 0:

|Σ_{k=1}^n Δ_{n,k,d}^{(1,2,1)}| ≤ C (Σ_{k=1}^n E[W_{n,k}² 1_{|Y_k| ≥ ε√n}] + Σ_{k=1}^n E[W_{n,k}² 1_{|Y_k| ≤ ε√n} Σ_{j=k−d}^k |W_{n,j}|])
≤ (C/σ̃_n²) E[Y_1² 1_{|Y_1| ≥ ε√n}] + (Cε/σ̃_n) Σ_{k=1}^n E[|W_{n,k}| Σ_{j=k−d}^k |W_{n,j}|]

for an appropriate constant C > 0. Using the Cauchy–Schwarz inequality on every term in the above sum, we have

|Σ_{k=1}^n Δ_{n,k,d}^{(1,2,1)}| ≤ (C/σ̃_n²) E[Y_1² 1_{|Y_1| ≥ ε√n}] + (Cε/σ̃_n) Σ_{k=1}^n Σ_{j=k−d}^k √(E[W_{n,k}²] E[W_{n,j}²]) ≤ (C/σ̃_n²) E[Y_1² 1_{|Y_1| ≥ ε√n}] + (Cε(d+1)/σ̃_n³) E[Y_1²] ≤ δ

for large enough n and small enough ε. In addition, we have


|Δ_{n,k,d}^{(1,2,3)}| = E[W_{n,k}²] |E[f″(S_{n,k−d} + T_{n,k})] − E[f″(S_{n,k} + T_{n,k})]| ≤ C E[W_{n,k}²] Σ_{j=k−d}^{k−1} E[|W_{n,j}|] ≤ (Cd E[Y_1²] E[|Y_1|])/(n^{3/2} σ̃_n³),

so that Σ_{k=1}^n |Δ_{n,k,d}^{(1,2,3)}| ≤ (Cd E[Y_1²] E[|Y_1|])/(√n σ̃_n³) → 0 (n → ∞). With δ → 0 we have

Σ_{k=1}^n Δ_{n,k}^{(1,2)} → 0 (n → ∞).
For the term Δ_{n,k}^{(1,1)} we use the triangle inequality and obtain for any d between 1 and k:

|Δ_{n,k}^{(1,1)}| ≤ |Σ_{j=1}^{k−d} E[W_{n,k}W_{n,j}(f″(S_{n,j} + ξ_{n,k,j}W_{n,j} + T_{n,k}) − E[f″(S_{n,k} + T_{n,k})])]|
+ |Σ_{j=k−d+1}^{k−1} E[W_{n,k}W_{n,j}(f″(S_{n,j} + ξ_{n,k,j}W_{n,j} + T_{n,k}) − E[f″(S_{n,k} + T_{n,k})])]|.

The first term tends to zero: for any δ > 0,

|Σ_{j=1}^{k−d} E[W_{n,k}W_{n,j}(f″(S_{n,j} + ξ_{n,k,j}W_{n,j} + T_{n,k}) − E[f″(S_{n,k} + T_{n,k})])]| ≤ (C‖f″‖_∞/(nσ̃_n²)) Σ_{j=1}^{k−d} ρ^{k−j} ≤ (C‖f″‖_∞ ρᵈ)/(n(1−ρ)σ̃_n²) ≤ δ/n

for large enough d (and n ≥ d). The other term is split into three parts:

Δ_{n,k,d}^{(1,1,1)} := Σ_{j=k−d+1}^{k−1} E[W_{n,k}W_{n,j}(f″(S_{n,j} + ξ_{n,k,j}W_{n,j} + T_{n,k}) − f″(S_{n,j−d} + T_{n,k}))],

Δ_{n,k,d}^{(1,1,2)} := Σ_{j=k−d+1}^{k−1} E[W_{n,k}W_{n,j}(f″(S_{n,j−d} + T_{n,k}) − E[f″(S_{n,j−d} + T_{n,k})])],

Δ_{n,k,d}^{(1,1,3)} := Σ_{j=k−d+1}^{k−1} E[W_{n,k}W_{n,j}(E[f″(S_{n,j−d} + T_{n,k})] − E[f″(S_{n,k} + T_{n,k})])].

Using the weak dependence shown in Lemma 11, for large enough d:

|Δ_{n,k,d}^{(1,1,2)}| ≤ (1/(nσ̃_n²)) Σ_{j=k−d+1}^{k−1} |Cov[f″(S_{n,j−d} + T_{n,k}), Y_j Y_k]| ≤ ‖f″‖_∞ (dρᵈ)/(nσ̃_n²),

so that Σ_{k=1}^n |Δ_{n,k,d}^{(1,1,2)}| ≤ ‖f″‖_∞ dρᵈ/σ̃_n² ≤ δ.

Additionally, for every ε > 0,

|Σ_{k=1}^n Δ_{n,k,d}^{(1,1,1)}| ≤ C Σ_{k=1}^n Σ_{j=k−d+1}^{k−1} E[|W_{n,k}W_{n,j}| 1_{|Y_kY_j| ≥ εn²}] + C Σ_{k=1}^n Σ_{j=k−d+1}^{k−1} E[|W_{n,k}W_{n,j}| 1_{|Y_kY_j| < εn²} Σ_{i=j−d}^k |W_{n,i}|]
≤ (Cd/σ̃_n²) max_{1≤h≤d} E[|Y_0Y_h| 1_{|Y_0Y_h| ≥ εn²}] + (C√ε/σ̃_n) Σ_{k=1}^n Σ_{j=k−d+1}^{k−1} Σ_{i=j−d}^k √(E[|W_{n,k}W_{n,j}|] E[W_{n,i}²])
≤ (Cd/σ̃_n²) max_{1≤h≤d} E[|Y_0Y_h| 1_{|Y_0Y_h| ≥ εn²}] + (C√ε d² E[Y_1²])/σ̃_n³ ≤ δ,

where the second term is made smaller than δ/2 by choosing ε small, and then, for fixed ε, the first term tends to zero since E[|Y_0Y_h|] < ∞; this holds for large enough n with an appropriate ε = ε(n). Finally,
using Lemma 10 and E[|W_{n,i}|] ≤ √(E[Y_1²])/(√n σ̃_n),

|Δ_{n,k,d}^{(1,1,3)}| ≤ Σ_{j=k−d+1}^{k−1} |E[W_{n,k}W_{n,j}]| |E[f″(S_{n,j−d} + T_{n,k}) − f″(S_{n,k} + T_{n,k})]|
≤ C Σ_{j=k−d+1}^{k−1} (E[Y_1²] ρ^{k−j}/(nσ̃_n²)) (2d √(E[Y_1²])/(√n σ̃_n))
≤ (Cd E[Y_1²]^{3/2})/((1−ρ) n^{3/2} σ̃_n³),

so that

Σ_{k=1}^n |Δ_{n,k,d}^{(1,1,3)}| ≤ (Cd E[Y_1²]^{3/2})/((1−ρ) √n σ̃_n³) → 0 (n → ∞).

Altogether, collecting the bounds for the three parts, we have

Σ_{k=1}^n |Δ_{n,k}^{(1,1)}| ≤ δ + (dρᵈ ‖f″‖_∞)/σ̃_n² + (C√ε d² E[Y_1²])/σ̃_n³ + (Cd E[Y_1²]^{3/2})/((1−ρ)√n σ̃_n³) ≤ 3δ + (Cd E[Y_1²]^{3/2})/((1−ρ)√n σ̃_n³) → 0,

letting n tend to ∞ and δ to zero. Altogether, we now have

|Σ_{k=1}^n Δ_{n,k}^{(1)}| → 0 (n → ∞)

and the theorem is proved.

4 Parameter estimation
Before attempting to estimate the parameters ω, α, β of a GARCH(1,1) process, we first have to show that they are unique, i.e. that they are the only parameters capable of defining the process.
Lemma 12. Let X_t be a strictly stationary GARCH(1,1) process with α + β < 1. Then

σ_t² = ω/(1−β) + α Σ_{k=1}^∞ β^{k−1} X_{t−k}².
Proof. Since β < 1 and the process is stationary, it is clear that the series converges with probability 1. Let

Z_t := ω/(1−β) + α Σ_{k=1}^∞ β^{k−1} X_{t−k}².

Then one has

ω + αX_{t−1}² + βZ_{t−1} = ω + αX_{t−1}² − ω + ω/(1−β) + α Σ_{k=1}^∞ β^k X_{t−(k+1)}² = ω/(1−β) + α Σ_{k=0}^∞ β^k X_{t−(k+1)}² = Z_t,

so Z_t fulfills the same recursive equation as σ_t². However, by Theorem 2 the strictly stationary solution is unique, so we must have Z_t = σ_t² for all t.
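Lemma 12 can be verified on a simulated path: after a burn-in, the recursive σ_t² and the truncated ARCH(∞) sum agree to machine precision, since the initialization error decays like βᵗ (a sketch with illustrative parameters):

```python
import random

omega, alpha, beta = 0.1, 0.1, 0.8
rng = random.Random(7)
n = 1200
sigma2 = [omega / (1 - alpha - beta)]   # sigma_0^2 at the stationary mean
x = []
for t in range(n):
    xt = rng.gauss(0, 1) * sigma2[-1] ** 0.5
    x.append(xt)
    sigma2.append(omega + alpha * xt * xt + beta * sigma2[-1])

t = 1000  # deep enough that the beta^t initialization error is negligible
arch_inf = omega / (1 - beta) + alpha * sum(beta ** (k - 1) * x[t - k] ** 2
                                            for k in range(1, t + 1))
print(abs(sigma2[t] - arch_inf))  # essentially zero
```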

Theorem 13. Let X_t be a GARCH(1,1) model with α + β < 1, ω > 0 and where e_t² is not a.s. constant. Then the parameters ω, α, β are unique.

Proof. Assume that X_t has two representations as a GARCH(1,1) process, with parameters ω, α, β and ω̂, α̂, β̂. By the above lemma, we can write

σ_t² = ω/(1−β) + α Σ_{k=1}^∞ β^{k−1} X_{t−k}² = ω̂/(1−β̂) + α̂ Σ_{k=1}^∞ β̂^{k−1} X_{t−k}².

This means that αβ^{k−1} = α̂β̂^{k−1} for all k: assuming otherwise, let k_0 be the smallest k with αβ^{k_0−1} ≠ α̂β̂^{k_0−1}. Then

e_{t−k_0}² = (ω/(1−β) − ω̂/(1−β̂) + Σ_{j=k_0+1}^∞ (αβ^{j−1} − α̂β̂^{j−1}) X_{t−j}²) / (σ_{t−k_0}² (αβ^{k_0−1} − α̂β̂^{k_0−1})).

However, since e_{t−k_0}² is also stochastically independent from the right side of this equation, it must be a.s. constant, which contradicts our assumption. Since αβ^{k−1} = α̂β̂^{k−1} for all k, we have

αz/(1−βz) = Σ_{k=1}^∞ αβ^{k−1} z^k = Σ_{k=1}^∞ α̂β̂^{k−1} z^k = α̂z/(1−β̂z) for all |z| ≤ 1,

and by analytic extension we have

α(1−β̂z) = α̂(1−βz) for all z.

Letting z = 0 we obtain α = α̂, which immediately leads to β = β̂ and ω = ω̂.

Due to the complexity of the maximum likelihood function for GARCH(1,1) models, one generally uses an approximating function. Let θ = (ω, α, β) and let (F_t) be the filtration on Ω generated by {X_s : s ≤ t}. Then one has

L(θ | x_0, ..., x_n) = f(x_0, ..., x_n | θ) = f(x_n | F_{n−1}) f(x_{n−1} | F_{n−2}) ⋯ f(x_1 | F_0) f(x_0 | θ),

so

−log L(θ | x_0, ..., x_n) = −log f(x_0 | θ) − Σ_{k=1}^n log f(x_k | F_{k−1}),

where f(x_0, ..., x_n | θ) is the joint probability density of {X_0, ..., X_n} in a GARCH model with parameters θ. With a large enough sample size, the contribution from f(x_0 | θ) is commonly assumed to be relatively small and is dropped, resulting in the quasi log-likelihood function

QL(θ | x_0, ..., x_n) = −Σ_{k=1}^n log f(x_k | F_{k−1}),

which is to be minimized. In the case of normally distributed innovations e_t, we have

f(x_k | F_{k−1}) = (1/√(2πσ_k²)) exp(−x_k²/(2σ_k²)),

so

QL(θ | x_0, ..., x_n) = (n/2) log 2π + (1/2) Σ_{k=1}^n (log σ_k²(θ) + x_k²/σ_k²(θ)).

The parameter θ̂_n = (ω̂_n, α̂_n, β̂_n) which minimizes this function given observations X_0 = x_0, ..., X_n = x_n, or equivalently which minimizes

l_n(θ) = (1/n) Σ_{k=1}^n (log σ_k²(θ) + X_k²/σ_k²(θ)),

is called the quasi-maximum-likelihood (QML) estimator.
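The objective l_n(θ) is straightforward to evaluate (a sketch; the actual minimization would be done with a numerical optimizer, which is omitted here, and the comparison parameters below are arbitrary):

```python
import math
import random

def qml_objective(theta, x):
    """l_n(theta) = (1/n) * sum_k (log sigma_k^2(theta) + x_k^2 / sigma_k^2(theta)),
    with sigma_0^2 initialized at omega / (1 - alpha - beta)."""
    omega, alpha, beta = theta
    sigma2 = omega / (1 - alpha - beta)
    total = 0.0
    for xk in x:
        total += math.log(sigma2) + xk * xk / sigma2
        sigma2 = omega + alpha * xk * xk + beta * sigma2
    return total / len(x)

# simulate data from known parameters
true_theta = (0.1, 0.1, 0.8)
rng = random.Random(3)
sigma2, x = 1.0, []
for _ in range(4000):
    xt = rng.gauss(0, 1) * sigma2 ** 0.5
    x.append(xt)
    sigma2 = 0.1 + 0.1 * xt * xt + 0.8 * sigma2

print(qml_objective(true_theta, x), qml_objective((1.0, 0.1, 0.8), x))
# the objective is smaller at the true parameters than at a badly misspecified omega
```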

Theorem 14 (QMLE, strong consistency). Let X_t be a GARCH time series with parameters θ_0 = (ω_0, α_0, β_0). Under the conditions
(i) θ_0 lies in a compact set K ⊂ (0, ∞) × [0, ∞) × [0, 1),
(ii) e_t² is not a.s. constant,
(iii) β_0 < 1, α_0 ≠ β_0, α_0 + β_0 < 1, ω_0 > 0,
the QML estimator is strongly consistent: θ̂_n → θ_0 a.s. for n → ∞.

Proof. By Theorem 13, θ_0 is identifiable. For each parameter θ, we define recursively σ_0²(θ) = ω/(1−α−β) and

σ_k²(θ) = ω + αX_{k−1}² + βσ_{k−1}²(θ)

for k = 1, ..., n. Clearly, σ_k²(θ_0) coincides with the true volatility. Since α_0 + β_0 < 1, the expectations E_{θ_0}[X_k²] = E_{θ_0}[σ_k²(θ_0)] are finite for all k, and therefore

E_{θ_0}[|l_n(θ_0)|] ≤ 1 + (1/n) Σ_{k=1}^n E_{θ_0}[|log σ_k²(θ_0)|]
is also finite. Using the inequality log x ≤ x − 1 (for x > 0),

E_{θ_0}[l_n(θ)] − E_{θ_0}[l_n(θ_0)] = (1/n) Σ_{k=1}^n E_{θ_0}[log σ_k²(θ) − log σ_k²(θ_0) + σ_k²(θ_0)e_k²/σ_k²(θ) − e_k²]
= (1/n) Σ_{k=1}^n E_{θ_0}[−log(σ_k²(θ_0)/σ_k²(θ)) + σ_k²(θ_0)/σ_k²(θ) − 1] ≥ 0,

with equality only when σ_k²(θ_0)/σ_k²(θ) = 1 P_{θ_0}-a.s. for all k. Since θ_0 is identifiable, this happens only
when θ = θ_0. Since the parameter set K is compact, we have sup_{θ∈K} β < 1. Let σ̃_k²(θ) = ω/(1−β) + α Σ_{j=1}^∞ β^{j−1} X_{k−j}² denote the volatility built on the infinite past as in Lemma 12, and define

l̃_n(θ) = (1/n) Σ_{k=1}^n (log σ̃_k²(θ) + X_k²/σ̃_k²(θ)) = (1/n) Σ_{k=1}^n (log σ̃_k²(θ) + e_k²σ_k²(θ_0)/σ̃_k²(θ)).

Then

sup_{θ∈K} |σ_k²(θ) − σ̃_k²(θ)| ≤ C sup_{θ∈K} βᵏ.

Using the inequality |log(x/y)| ≤ |x−y|/min(x, y) for x, y > 0, we have

sup_{θ∈K} |l_n(θ) − l̃_n(θ)| ≤ (1/n) Σ_{k=1}^n sup_{θ∈K} ((|σ_k²(θ) − σ̃_k²(θ)|/(σ_k²(θ)σ̃_k²(θ))) X_k² + |log(σ_k²(θ)/σ̃_k²(θ))|)
≤ C sup_{θ∈K} (1/ω²)(1/n) Σ_{k=1}^n βᵏ X_k² + C sup_{θ∈K} (1/ω)(1/n) Σ_{k=1}^n βᵏ,

where sup_{θ∈K} βᵏ X_k² → 0 a.s., since sup_{θ∈K} β < 1 and X_k² has a finite moment of some order greater than 0. Using Kronecker's lemma, we see that

sup_{θ∈K} |l_n(θ) − l̃_n(θ)| → 0 a.s.

Finally, for every θ ∈ K and r > 0 let B_r(θ) be the open ball with center θ and radius r. Then we have

lim inf_{n→∞} inf_{θ̂∈B_r(θ)∩K} l̃_n(θ̂) ≥ lim inf_{n→∞} inf_{θ̂∈B_r(θ)∩K} l_n(θ̂) − lim sup_{n→∞} sup_{θ̂∈K} |l_n(θ̂) − l̃_n(θ̂)|
≥ lim inf_{n→∞} (1/n) Σ_{k=1}^n inf_{θ̂∈B_r(θ)∩K} (log σ_k²(θ̂) + e_k²σ_k²(θ_0)/σ_k²(θ̂))
= E_{θ_0}[inf_{θ̂∈B_r(θ)∩K} (log σ_1²(θ̂) + e_1²σ_1²(θ_0)/σ_1²(θ̂))]

by Birkhoff's ergodic theorem, using the fact that X_t is ergodic by condition (iii). By the monotone convergence theorem (Beppo Levi), we have

E_{θ_0}[inf_{θ̂∈B_r(θ)∩K} (log σ_1²(θ̂) + e_1²σ_1²(θ_0)/σ_1²(θ̂))] → E_{θ_0}[log σ_1²(θ) + e_1²σ_1²(θ_0)/σ_1²(θ)]
as r → 0. This means that for every θ ≠ θ_0 in K, there is an open neighborhood U(θ) of θ in K such that

lim inf_{n→∞} inf_{θ̂∈U(θ)} l̃_n(θ̂) > E_{θ_0}[log σ_1²(θ_0) + e_1²σ_1²(θ_0)/σ_1²(θ_0)].

Since K is compact, by the Heine–Borel theorem K is covered by a finite set of these open neighborhoods U(θ_1), ..., U(θ_k) together with B_r(θ_0), for any r > 0. Since

lim sup_{n→∞} inf_{θ̂∈B_r(θ_0)} l̃_n(θ̂) ≤ lim_{n→∞} l̃_n(θ_0) = lim_{n→∞} l_n(θ_0) = E_{θ_0}[log σ_1²(θ_0) + e_1²σ_1²(θ_0)/σ_1²(θ_0)]

for any r > 0, θ̂_n must lie in B_r(θ_0) for large enough n, and the theorem is proved.

Theorem 15 (QMLE, asymptotic distribution). Let X_t be a GARCH time series with parameters θ_0 = (ω_0, α_0, β_0). Define M = R × M_α × M_β ⊂ R³, where M_α = [0, ∞) if α_0 = 0 and R otherwise, and M_β analogously. Under the conditions
(i) θ_0 lies in a compact set K ⊂ (0, ∞) × [0, ∞) × [0, 1);
(ii) e_t is not a.s. constant;
(iii) β_0 < 1, α_0 ≠ β_0, α_0 + β_0 < 1, ω_0 > 0;
(iv) κ := E[e_t⁴] < ∞;
(v) E[X_t⁶] < ∞;
we have √n(θ̂_n − θ_0) →ᴰ ξ, where

ξ = arg inf_{t∈M} (t − Z)ᵀ J(θ_0)(t − Z),  Z ∼ N(0, (κ − 1)J(θ_0)⁻¹),

and

J(θ_0) = E_{θ_0}[(1/σ_1⁴(θ_0)) (∂σ_1²/∂θ)(θ_0) (∂σ_1²/∂θ)(θ_0)ᵀ]

is a positive definite matrix.
The proof of this theorem can be found in [4], Chapter 8. However, it is useful to see the following equality. Let f_t(θ) = log σ_t²(θ) + X_t²/σ_t²(θ), so that l_n(θ) = (1/n) Σ_{k=1}^n f_k(θ). Since e_t² = X_t²/σ_t²(θ_0) is independent of σ_t²(θ_0) and of all derivatives of σ_t² at θ_0, we have

E_{θ_0}[(∂f_t/∂θ)(θ_0)] = E_{θ_0}[1 − e_t²] · E_{θ_0}[(1/σ_t²(θ_0)) (∂σ_t²/∂θ)(θ_0)] = 0,

where the derivatives are understood as one-sided in case θ_0 lies on the boundary of (0, ∞) × [0, ∞) × [0, 1). Additionally,

E_{θ_0}[(∂²f_t/∂θ²)(θ_0)] = E_{θ_0}[1 − e_t²] · E_{θ_0}[(1/σ_t²) (∂²σ_t²/∂θ²)(θ_0)] + E_{θ_0}[2e_t² − 1] · E_{θ_0}[(1/σ_t⁴(θ_0)) (∂σ_t²/∂θ)(θ_0) (∂σ_t²/∂θ)(θ_0)ᵀ] = J(θ_0),

so we may equivalently write

J(θ_0) = E_{θ_0}[(∂²f_t/∂θ²)(θ_0)].

Note that when θ_0 lies in the interior of K, M = R³ and ξ = Z is normally distributed.

5 Tests
We can use the QML estimator from section 4 to construct tests for the significance of the coefficients
in the model; for example, to test between the hypothesis and alternative

H0 : α0 = 0 H1 : α0 > 0.

However, for technical reasons, it is difficult to test α0 = β0 = 0 simultaneously.

The LM test

Let
\[
l_n(\theta) := \sum_{k=1}^n \Big(\log\sigma_k^2(\theta) + \frac{X_k^2}{\sigma_k^2(\theta)}\Big)
\]
be the quasi-log-likelihood criterion from section 4 (dropping the factor 1/n for convenience in the equations below). We construct θ̃n as the QML estimator under the constraint that α = 0. First, we assume that the innovations et actually are standard normally distributed, so that the QML estimator is the true maximum likelihood estimator. Under the assumptions that θ0 is identifiable and Xt has finite moments up to 6th order, one has the convergence
\[
\frac{1}{\sqrt n}\frac{\partial l_n(\theta_0)}{\partial\theta} \xrightarrow{D} N(0, I)
\]
and
\[
\sqrt n\,(\hat\theta_n - \theta_0) \xrightarrow{D} N(0, I^{-1}),
\]
where I = lim_{n→∞} (1/n) ∂²ln(θ0)/∂θ², the limit existing almost surely. The constrained estimator is derived through the method of Lagrange multipliers: define Λ(θ, λ) = ln(θ) − λα. Then (θ̃n, λ̃) = arg sup Λ(θ, λ). We have ∇Λ(θ̃n, λ̃) = 0 and therefore
\[
\tilde\alpha = 0, \qquad \frac{\partial l_n(\tilde\theta_n)}{\partial\theta} = \tilde\lambda\,(0,\,1,\,0)^T.
\]

Let R = (0, 1, 0). Then, since Rθ̃n = α̃ = 0 = Rθ0, we have
\[
\sqrt n\,R(\hat\theta_n - \tilde\theta_n) = \sqrt n\,R(\hat\theta_n - \theta_0) \xrightarrow{D} N(0,\,RI^{-1}R^T).
\]

We have
\[
0 = \frac{1}{\sqrt n}\frac{\partial l_n(\hat\theta_n)}{\partial\theta} = \frac{1}{\sqrt n}\frac{\partial l_n(\theta_0)}{\partial\theta} - I\sqrt n\,(\hat\theta_n - \theta_0) + o(1)
\]
and
\[
\frac{1}{\sqrt n}\frac{\partial l_n(\tilde\theta_n)}{\partial\theta} = \frac{1}{\sqrt n}\frac{\partial l_n(\theta_0)}{\partial\theta} - I\sqrt n\,(\tilde\theta_n - \theta_0) + o(1),
\]
so that
\[
\sqrt n\,(\hat\theta_n - \tilde\theta_n) = I^{-1}\frac{1}{\sqrt n}\frac{\partial l_n(\tilde\theta_n)}{\partial\theta} + o(1) = I^{-1}\frac{1}{\sqrt n}R^T\tilde\lambda + o(1)
\]
and therefore
\[
\frac{\tilde\lambda}{\sqrt n} = (RI^{-1}R^T)^{-1}\sqrt n\,R(\hat\theta_n - \theta_0) + o(1) \xrightarrow{D} N\big(0,\,(RI^{-1}R^T)^{-1}\big).
\]

This means that
\[
\frac{1}{\sqrt n}\frac{\partial l_n(\tilde\theta_n)}{\partial\theta} = R^T\frac{\tilde\lambda}{\sqrt n} \xrightarrow{D} N\big(0,\,R^T(RI^{-1}R^T)^{-1}R\big).
\]
Let Î_n = (1/n) ∂²ln(θ̃n)/∂θ², so that Î_n is a strongly consistent estimator of I. Then the LM statistic
\[
LM(n) = \frac{1}{n}\Big(\frac{\partial l_n(\tilde\theta_n)}{\partial\theta}\Big)^T \hat I_n^{-1}\,\frac{\partial l_n(\tilde\theta_n)}{\partial\theta}
\]
is asymptotically χ²-distributed with rank(R) = 1 degree of freedom.
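The mechanics of the statistic can be sketched numerically. The following Python fragment (all numbers are hypothetical placeholders, not estimates from data) evaluates the quadratic form LM(n) and compares it with the 95% quantile of the χ²₁ distribution, which is approximately 3.84:

```python
import numpy as np

def lm_statistic(score, I_hat, n):
    # LM(n) = (1/n) * score' * inv(I_hat) * score
    return float(score @ np.linalg.solve(I_hat, score)) / n

# Hypothetical score and information estimate at the constrained QMLE:
n = 500
score = np.array([1.2, 8.5, -0.7])
I_hat = np.array([[2.0, 0.4, 0.3],
                  [0.4, 3.0, 0.5],
                  [0.3, 0.5, 1.5]])

lm = lm_statistic(score, I_hat, n)
reject = lm > 3.84  # 95% quantile of the chi^2(1) distribution
```

With these placeholder values the statistic is far below the critical value, so the constraint α = 0 would not be rejected.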
In the case where the et are not normally distributed, one has in general
\[
J := \lim_{n\to\infty}\operatorname{Var}_{\theta_0}\Big[\frac{1}{\sqrt n}\frac{\partial l_n(\theta_0)}{\partial\theta}\Big] \ne I,
\]
and it can be shown that the following limits hold:
\[
\frac{1}{\sqrt n}\frac{\partial l_n(\theta_0)}{\partial\theta} \xrightarrow{D} N(0, J)
\]
and
\[
\sqrt n\,(\hat\theta_n - \theta_0) \xrightarrow{D} N(0,\,I^{-1}JI^{-1}),
\]
and therefore the LM statistic becomes
\[
LM_n = \frac{1}{n}\Big(\frac{\partial l_n(\tilde\theta_n)}{\partial\theta}\Big)^T \hat I_n^{-1}R^T\big(R\hat I_n^{-1}\hat J_n\hat I_n^{-1}R^T\big)^{-1}R\hat I_n^{-1}\,\frac{\partial l_n(\tilde\theta_n)}{\partial\theta}.
\]
It can be shown that this statistic is also asymptotically χ²-distributed with rank(R) = 1 degree of freedom.

First, we illustrate the problem that appears when attempting to test α0 = β0 = 0 jointly. Applying the previous results to this case, we have θ0 = (ω0, 0, 0) under the hypothesis H0 and therefore
\[
\frac{1}{\sqrt n}\frac{\partial l_n(\theta_0)}{\partial\theta}
= \frac{1}{2\sqrt n}\sum_{k=1}^n \frac{X_k^2 - \sigma_k^2(\theta_0)}{\sigma_k^4(\theta_0)}\,\frac{\partial\sigma_k^2(\theta_0)}{\partial\theta}
= \frac{1}{2\sqrt n}\sum_{k=1}^n \frac{e_k^2 - 1}{\omega_0}\begin{pmatrix}1\\ X_{k-1}^2\\ \omega_0\end{pmatrix}
\xrightarrow{D} N(0, J),
\]
where, letting κ = E[e_k⁴],
\[
J = \frac{1}{4\omega_0^2}\,E\Big[(e_k^2-1)^2\begin{pmatrix}1\\ X_{k-1}^2\\ \omega_0\end{pmatrix}\begin{pmatrix}1 & X_{k-1}^2 & \omega_0\end{pmatrix}\Big]
= \frac{\kappa-1}{4\omega_0^2}\begin{pmatrix}1 & \omega_0 & \omega_0\\ \omega_0 & \kappa\omega_0^2 & \omega_0^2\\ \omega_0 & \omega_0^2 & \omega_0^2\end{pmatrix}.
\]
The third row of J is ω0 times the first, so J is singular and no LM test can immediately be constructed.
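This singularity is easy to confirm numerically. Below is a small Python check (the values of ω0 and κ are arbitrary illustrative choices) that builds the matrix J from the display above and verifies that its third row is ω0 times its first, so that J has rank 2:

```python
import numpy as np

def J_joint(omega0, kappa):
    # J for theta0 = (omega0, 0, 0) as derived above
    w = omega0
    M = np.array([[1.0, w,            w   ],
                  [w,   kappa * w**2, w**2],
                  [w,   w**2,         w**2]])
    return (kappa - 1) / (4 * w**2) * M

J = J_joint(0.5, 3.0)                  # kappa = 3 is the Gaussian value of E[e^4]
assert np.allclose(J[2], 0.5 * J[0])   # row 3 = omega0 * row 1
assert np.linalg.matrix_rank(J) == 2   # hence J is singular
```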

Testing α0 = 0, we have θ0 = (ω0, 0, β0) under H0 and therefore E[σk²(θ0)] = ω0/(1−β0). We then have
\[
\frac{1}{\sqrt n}\frac{\partial l_n(\theta_0)}{\partial\theta}
= \frac{1}{2\sqrt n}\sum_{k=1}^n \frac{e_k^2 - 1}{\sigma_k^2(\theta_0)}\begin{pmatrix}1\\ X_{k-1}^2\\ \sigma_{k-1}^2(\theta_0)\end{pmatrix}
\xrightarrow{D} N(0, J),
\]
where σk²(θ0) = ω0(1−β0^k)/(1−β0), so that σ_{k-1}²(θ0)/σk²(θ0) = (1−β0^{k-1})/(1−β0^k) =: γk, and
\[
J = \lim_{n\to\infty}\operatorname{Var}\Big[\frac{1}{\sqrt n}\frac{\partial l_n(\theta_0)}{\partial\theta}\Big]
= \lim_{n\to\infty}\frac{\kappa-1}{4n}\sum_{k=1}^n
\begin{pmatrix}
1 & \gamma_k & \gamma_k\\
\gamma_k & \kappa\gamma_k\sigma_{k-1}^2(\theta_0) & \gamma_k\sigma_{k-1}^2(\theta_0)\\
\gamma_k & \gamma_k\sigma_{k-1}^2(\theta_0) & \gamma_k\sigma_{k-1}^2(\theta_0)
\end{pmatrix},
\]
and with g_n = Σ_{k=1}^n γk and s_n = Σ_{k=1}^n γk σ_{k-1}²(θ0),
\[
J^{-1} = \lim_{n\to\infty}\frac{4n}{\kappa-1}
\begin{pmatrix}
\dfrac{s_n}{n s_n - g_n^2} & 0 & \dfrac{-g_n}{n s_n - g_n^2}\\[4pt]
0 & \dfrac{1}{s_n(\kappa-1)} & \dfrac{-1}{s_n(\kappa-1)}\\[4pt]
\dfrac{-g_n}{n s_n - g_n^2} & \dfrac{-1}{s_n(\kappa-1)} & \dfrac{n\kappa s_n - g_n^2}{s_n(n s_n - g_n^2)(\kappa-1)}
\end{pmatrix},
\]
where the limit in each matrix entry exists. To see this, consider for example that with 0 < β0 < 1,
\[
g_n = \sum_{k=1}^n \frac{1-\beta_0^{k-1}}{1-\beta_0^k} = \frac{1-\beta_0}{\beta_0\log\beta_0}\,\psi_{\beta_0}(n) + n + O(1),
\]
where ψ_{β0} is the β0-digamma function and
\[
\lim_{z\to\infty}\frac{\psi_{\beta_0}(z)}{z} = \log\beta_0\,\lim_{z\to\infty}\sum_{n=0}^\infty \frac{\beta_0^{n+z}}{z(1-\beta_0^{n+z})} = 0
\]
by uniform convergence, so that lim_{n→∞} g_n/n = 1. Additionally, we have
\[
s_n = \sum_{k=1}^n \frac{\omega_0(1-\beta_0^{k-1})^2}{(1-\beta_0)(1-\beta_0^k)}
= \omega_0\,\frac{\beta_0^n - \beta_0^2(n-1) + \beta_0(n-2)}{(\beta_0-1)^2\beta_0} + \omega_0\,\frac{(\beta_0-1)\psi_{\beta_0}(n+1)}{\beta_0^2\log\beta_0} + O(1),
\]
and s_n/n → ω0/(1−β0) follows. Thus each entry above converges and the limit J is indeed invertible; its inverse is obtained by substituting g_n/n → 1 and s_n/n → ω0/(1−β0) in the matrix above.

On the other hand,
\[
I_n := \frac{1}{n}\frac{\partial^2 l_n(\theta_0)}{\partial\theta^2}
= \frac{1}{n}\sum_{k=1}^n \frac{\partial}{\partial\theta}\Big[\frac{X_k^2 - \sigma_k^2(\theta_0)}{2\sigma_k^4(\theta_0)}\,\frac{\partial\sigma_k^2(\theta_0)}{\partial\theta}\Big]
= \frac{1}{n}\sum_{k=1}^n \frac{2X_k^2 - \sigma_k^2(\theta_0)}{2\sigma_k^6(\theta_0)}\,\frac{\partial\sigma_k^2(\theta_0)}{\partial\theta}\Big(\frac{\partial\sigma_k^2(\theta_0)}{\partial\theta}\Big)^T + o(1)
\]
\[
= \frac{1}{n}\sum_{k=1}^n \frac{2e_k^2 - 1}{2\sigma_k^4(\theta_0)}
\begin{pmatrix}
1 & X_{k-1}^2 & \sigma_{k-1}^2(\theta_0)\\
X_{k-1}^2 & X_{k-1}^4 & X_{k-1}^2\sigma_{k-1}^2(\theta_0)\\
\sigma_{k-1}^2(\theta_0) & X_{k-1}^2\sigma_{k-1}^2(\theta_0) & \sigma_{k-1}^4(\theta_0)
\end{pmatrix} + o(1),
\]
which converges almost surely to (2/(κ−1))·J due to the averaging factor 1/n. Defining Ĵ_n = ((κ−1)/2)·Î_n, where
\[
\hat I_n = \frac{1}{n}\sum_{k=1}^n \frac{2X_k^2 - \sigma_k^2(\tilde\theta_n)}{2\sigma_k^6(\tilde\theta_n)}
\begin{pmatrix}1\\ X_{k-1}^2\\ \sigma_{k-1}^2(\tilde\theta_n)\end{pmatrix}
\begin{pmatrix}1 & X_{k-1}^2 & \sigma_{k-1}^2(\tilde\theta_n)\end{pmatrix},
\]
we have
\[
R\hat I_n^{-1}\hat J_n\hat I_n^{-1}R^T = \frac{\kappa-1}{2}\,R\hat I_n^{-1}R^T = \frac{2n}{s_n(\kappa-1)},
\]
and considering that
\[
\frac{\partial l_n(\tilde\theta_n)}{\partial\theta} = \frac12\sum_{k=1}^n \frac{X_k^2 - \sigma_k^2(\tilde\theta_n)}{\sigma_k^4(\tilde\theta_n)}\begin{pmatrix}1\\ X_{k-1}^2\\ \sigma_{k-1}^2(\tilde\theta_n)\end{pmatrix},
\]
the LM statistic can be calculated.
Testing for covariance stationarity

The GARCH(1,1) process is covariance stationary if and only if α0 + β0 < 1. The problem of determining whether the process has a covariance stationary solution may therefore be addressed with a parametric test. Let

H0: α0 + β0 ≥ 1        H1: α0 + β0 < 1.

Here, θ0 lies in the interior of a compact parameter space K, so that the QML estimator is asymptotically normal:
\[
\sqrt n\,(\hat\theta_n - \theta_0) \xrightarrow{D} N\big(0,\,(\kappa-1)J^{-1}\big),
\]
and with R = (0, 1, 1), if Rθ0 = 1, we have
\[
\sqrt n\,(R\hat\theta_n - 1) \xrightarrow{D} N\big(0,\,(\kappa-1)RJ^{-1}R^T\big).
\]
This leads to the asymptotic normality of the Wald statistic:
\[
W_n = \frac{\sqrt n\,(\hat\alpha_n + \hat\beta_n - 1)}{\sqrt{(\kappa-1)R\hat J_n^{-1}R^T}} \xrightarrow{D} N(0, 1),
\]
and H0 is rejected if Wn < uα, where uα is the α-quantile of the standard normal distribution.
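As a quick numerical illustration (with made-up estimates, not the data example of section 8), the statistic can be computed as follows in Python:

```python
import numpy as np

def wald_stationarity(alpha_hat, beta_hat, n, kappa_hat, J_inv_hat):
    # W_n = sqrt(n)(alpha_hat + beta_hat - 1) / sqrt((kappa-1) R J^-1 R')
    R = np.array([0.0, 1.0, 1.0])
    denom = np.sqrt((kappa_hat - 1.0) * (R @ J_inv_hat @ R))
    return np.sqrt(n) * (alpha_hat + beta_hat - 1.0) / denom

# Hypothetical inputs:
J_inv = np.diag([0.5, 2.0, 3.0])
W = wald_stationarity(0.05, 0.90, n=2000, kappa_hat=3.0, J_inv_hat=J_inv)
# H0: alpha0 + beta0 >= 1 is rejected at level alpha if W < u_alpha;
# for alpha = 0.05, u_alpha is approximately -1.645
reject = W < -1.645
```

With these placeholder numbers W is negative but not extreme, so H0 would not be rejected.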

6 Variants of the GARCH(1,1) model
While the standard GARCH(1,1) and related GARCH(p,q) models are useful tools in econometrics,
they are unable to describe certain aspects often found in financial data. An important weakness
is their inability to react differently to positive and negative innovations - the conditional variance
considers only the squares of the innovations. However, many datasets display a leverage effect,
where negative returns correspond to higher increases in volatility than positive returns. Another
problem is the lack of clarity with regard to stationarity and persistence, where shocks may persist
in one norm but not in another in the GARCH model. The existence of almost-surely stationary
GARCH(1,1)-processes with infinite variance at every time t is inconvenient.

A notable variant of the GARCH model which addresses these problems is the Exponential ARCH (EARCH) model due to Nelson (1989). This has the additional advantage of greater flexibility in the parameters, since the autoregressive relationship is imposed on log σt², which may take negative values. The general form of the EARCH(1) model is
\[
\log\sigma_t^2 = \omega + \beta\log\sigma_{t-1}^2 + \alpha\big(|e_{t-1}| - E[|e_{t-1}|]\big) + \gamma e_{t-1}.
\]
It can also be shown that, unlike in the GARCH(1,1) model, the conditions for strict (almost sure) stationarity and for covariance stationarity coincide; a necessary and sufficient condition for both is |β| < 1. However, the asymptotic properties of QML estimation for EARCH models are not as well understood as in the GARCH case.
Another possible extension of the GARCH(1,1) model allowing for asymmetry is the QGARCH(1,1) model of Sentana (1995):
\[
\sigma_t^2 = \omega + \alpha X_{t-1}^2 + \beta\sigma_{t-1}^2 + \gamma X_{t-1},
\]
where appropriate restrictions are necessary to ensure that σt² remains positive; these are given by ω, α, β > 0 and |γ| ≤ 2√(αω). Many of the properties of the GARCH(1,1) model carry over immediately to QGARCH. For example, the QGARCH model has unconditional variance ω/(1−α−β) if α + β < 1 (and no finite unconditional variance otherwise), and α + β < 1 is also necessary for covariance stationarity.

The AGARCH(1,1) (asymmetric GARCH) model developed by Engle and Ng (1993) is another approach to letting the GARCH model react asymmetrically. It is defined by
\[
X_t = e_t\sigma_t, \qquad \sigma_t^2 = \omega + \alpha(X_{t-1} + \gamma)^2 + \beta\sigma_{t-1}^2,
\]
where γ is the noncentrality parameter.


Another interesting model is the TGARCH(1,1) (threshold GARCH) model developed by Jean-Michel Zakoïan. Here, the autoregressive specification is given for the conditional standard deviation instead of the variance:
\[
X_t = e_t\sigma_t, \qquad \sigma_t = \omega + \alpha_+ X_{t-1}^+ + \alpha_- X_{t-1}^- + \beta\sigma_{t-1},
\]
where X_t⁺ = max{0, Xt} is the positive part of Xt and X_t⁻ = min{0, Xt} the negative part. This is another model developed to account for asymmetric reactions to shocks; it has the advantage of being closer to the original GARCH formulation, but also requires some non-negativity assumptions on the parameters.
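The asymmetry of these variants can be made concrete by comparing their "news impact": the response of the conditional variance (or standard deviation) to a previous shock of size ±x, holding the previous volatility fixed. The Python sketch below uses arbitrary illustrative parameter values and, for the threshold model, the common sign convention in which both coefficients are non-negative and the negative part enters with a minus sign:

```python
import numpy as np

omega, alpha, beta = 0.1, 0.1, 0.8   # hypothetical parameters
gamma = -0.08                        # QGARCH asymmetry term; gamma < 0 gives a leverage effect
sigma_prev2 = 1.0                    # previous conditional variance, held fixed

def garch_var(x):
    return omega + alpha * x**2 + beta * sigma_prev2

def qgarch_var(x):
    return omega + alpha * x**2 + beta * sigma_prev2 + gamma * x

def tgarch_sd(x, a_plus=0.05, a_minus=0.15):
    # threshold model: specification on the standard deviation
    return omega + a_plus * max(x, 0.0) - a_minus * min(x, 0.0) + beta * np.sqrt(sigma_prev2)

x = 1.0
sym = garch_var(-x) - garch_var(x)      # 0: GARCH reacts symmetrically
lev_q = qgarch_var(-x) - qgarch_var(x)  # -2*gamma > 0: negative shocks raise variance more
lev_t = tgarch_sd(-x) - tgarch_sd(x)    # a_minus - a_plus > 0: same asymmetry
```

The standard GARCH response is exactly symmetric, while both asymmetric variants assign the negative shock a larger volatility response.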

7 GARCH(1,1) in continuous time
The diffusion limit

Heuristically, we will consider in this section whether increasingly frequent observations of the
time series will lead to a better model. Although this may seem intuitive, it is easy to see that this
should not be the case in general: even in a non-stochastic setting, increasingly fine interpolation
can lead to large errors without the assumption of differentiability. However, we will show that
highly frequent observations will lead to a more accurate model in certain cases, where the precise
meaning of “leading to a better model” will be the weak convergence of processes.

Nelson (1990) studied the relationship between GARCH(1,1) and similar models and stochastic differential equations in continuous time. His main result is that in a sequence of GARCH(1,1) models corresponding to increasingly short time intervals, under certain assumptions on the parameters, the conditional variance process converges in distribution to the solution of a stochastic differential equation with an inverse-gamma stationary distribution. This means that for sufficiently short time intervals the GARCH log returns can be approximately modelled with a Student's t distribution. Since a GARCH(1,1) process is Markovian, it is enough to consider convergence of Markov chains.
The following theorem gives conditions under which a sequence of Markov processes X^{(h)}_{hk} (k ∈ N) converges weakly to an Itô process as h tends to zero. Let D be the Skorokhod space of càdlàg mappings from [0, ∞) into R^n, endowed with the Skorokhod metric. For every h > 0 let F^{(h)}_{kh} be the σ-algebra generated by X^{(h)}_0, ..., X^{(h)}_{kh}, let P^{(h)} be a probability measure on B^n, the Borel σ-algebra over R^n, and for every h > 0 and k ∈ N0 let K^{(h)}_{kh} be a transition function on R^n; that means:
(i) for every x ∈ R^n, K^{(h)}_{kh}(x, ·) is a probability measure on B^n, and
(ii) for every A ∈ B^n, K^{(h)}_{kh}(·, A) is a measurable function,
and such that we have P^{(h)}(X^{(h)}_{(k+1)h} ∈ A | F^{(h)}_{kh}) = K^{(h)}_{kh}(X^{(h)}_{kh}, A) P^{(h)}-a.s. for all A ∈ B^n, as well as P^{(h)}(X^{(h)}_t = X^{(h)}_{kh}, kh ≤ t < (k+1)h) = 1.
We define X^{(h)}_t as the extension of X^{(h)}_{kh} into continuous time; that is, X^{(h)}_t is a step function with discontinuities at kh for all k. For h > 0 and ε > 0 define
\[
a_h(x, t) = \frac{1}{h}\int_{\mathbb{R}^n} (y - x)(y - x)^T\,K^{(h)}_{h\lfloor t/h\rfloor}(x, dy),
\]
\[
b_h(x, t) = \frac{1}{h}\int_{\mathbb{R}^n} (y - x)\,K^{(h)}_{h\lfloor t/h\rfloor}(x, dy),
\]
and for each i = 1, ..., n define
\[
c_{h,i,\varepsilon}(x, t) = \frac{1}{h}\int_{\mathbb{R}^n} |(y - x)_i|^{2+\varepsilon}\,K^{(h)}_{h\lfloor t/h\rfloor}(x, dy),
\]
where a_h and b_h are finite if c_{h,i,ε} is finite for all i for some ε > 0.
Theorem 16. Let the following assumptions be fulfilled:
(i) There is an ε > 0 so that for every R > 0, T > 0 and i = 1, ..., n,
\[
\lim_{h\to 0}\ \sup_{\|x\|\le R,\ 0\le t\le T} c_{h,i,\varepsilon}(x, t) = 0,
\]
and there are continuous functions a : R^n × [0, ∞) → R^{n×n}, b : R^n × [0, ∞) → R^n so that
\[
\lim_{h\to 0}\ \sup_{\|x\|\le R,\ 0\le t\le T} \|a_h(x, t) - a(x, t)\|_F = 0,
\]
\[
\lim_{h\to 0}\ \sup_{\|x\|\le R,\ 0\le t\le T} \|b_h(x, t) - b(x, t)\| = 0,
\]
where ‖·‖F denotes the Frobenius norm.
(ii) There exists a continuous function σ : R^n × [0, ∞) → R^{n×n} so that a(x, t) = σ(x, t)σ(x, t)^T for all x ∈ R^n, t ≥ 0.
(iii) There exists a random variable X0 with distribution P0 on (R^n, B^n) so that X^{(h)}_0 →D X0.
(iv) a, b and P0 uniquely determine the distribution of a diffusion process Xt with starting distribution P0, diffusion matrix a(x, t) and drift b(x, t).
Then X^{(h)}_t ⇒ Xt, where Xt is given by the stochastic differential equation
\[
X_t = X_0 + \int_0^t b(X_s, s)\,ds + \int_0^t \sigma(X_s, s)\,dB_{n,s},
\]
where B_{n,t} is an n-dimensional Brownian motion independent of X0. Xt does not depend on the choice of the matrix square root σ. In addition, for every T > 0, we have
\[
P\Big(\sup_{0\le t\le T}\|X_t\| < \infty\Big) = 1.
\]

This theorem can be applied to the GARCH(1,1) model with normally distributed innovations by letting the parameters α, β, ω depend on h and making the innovation variance proportional to h. Let Yt = Σ_{s<t} Xs denote the cumulated log-return process. Then one has the difference equations
\[
Y^{(h)}_{hk} = Y^{(h)}_{h(k-1)} + \sigma^{(h)}_{hk}\,e^{(h)}_{hk},
\]
\[
(\sigma^{(h)}_{hk})^2 = \omega_h + (\sigma^{(h)}_{h(k-1)})^2\Big(\beta_h + \frac{1}{h}\,\alpha_h\,(e^{(h)}_{h(k-1)})^2\Big),
\]
with i.i.d. random variables e^{(h)}_{hk} ∼ N(0, h) (k = 0, 1, ...). For B ∈ B² let νh(B) = P((Y0, σ0²) ∈ B), where we may assume that νh satisfies condition (iii) of the theorem.
Let F^{(h)}_{hk} be the σ-algebra generated by Y^{(h)}_0, ..., Y^{(h)}_{h(k-1)}, σ^{(h)}_0, ..., σ^{(h)}_{hk}. Then
\[
E\big[h^{-1}(Y^{(h)}_{hk} - Y^{(h)}_{h(k-1)}) \mid F^{(h)}_{hk}\big] = h^{-1}\sigma^{(h)}_{hk}\,E[e^{(h)}_{hk}] = 0
\]
and
\[
E\big[h^{-1}\big((\sigma^{(h)}_{h(k+1)})^2 - (\sigma^{(h)}_{hk})^2\big) \mid F^{(h)}_{hk}\big] = \frac{1}{h}\big(\omega_h + (\beta_h + \alpha_h - 1)(\sigma^{(h)}_{hk})^2\big),
\]
so the limit condition in (i) can be fulfilled only if the limits
\[
\lim_{h\to 0}\frac{\omega_h}{h} = \omega \ge 0, \qquad \lim_{h\to 0}\frac{1 - \alpha_h - \beta_h}{h} = \theta
\]

exist. In addition, assuming that lim_{h→0} αh²/h = α²/2 exists and is finite,
\[
E\big[h^{-1}\{(\sigma^{(h)}_{h(k+1)})^2 - (\sigma^{(h)}_{hk})^2\}^2 \mid F^{(h)}_{hk}\big]
= h^{-1}\Big(\omega_h^2 - 2\omega_h(1 - \alpha_h - \beta_h)(\sigma^{(h)}_{hk})^2 + (1 - \alpha_h - \beta_h)^2(\sigma^{(h)}_{hk})^4 + 2\alpha_h^2(\sigma^{(h)}_{hk})^4\Big)
= \alpha^2(\sigma^{(h)}_{hk})^4 + o(1),
\]
and
\[
E\big[h^{-1}(Y^{(h)}_{hk} - Y^{(h)}_{h(k-1)})^2 \mid F^{(h)}_{hk}\big] = (\sigma^{(h)}_{hk})^2,
\]
and
\[
E\big[h^{-1}(Y^{(h)}_{hk} - Y^{(h)}_{h(k-1)})\big((\sigma^{(h)}_{h(k+1)})^2 - (\sigma^{(h)}_{hk})^2\big) \mid F^{(h)}_{hk}\big] = 0,
\]
for a term o(1) which tends to zero uniformly on compacta. Since
\[
E\big[h^{-1}(Y^{(h)}_{hk} - Y^{(h)}_{h(k-1)})^4 \mid F^{(h)}_{hk}\big] = 3h\,(\sigma^{(h)}_{hk})^4 \to 0
\]
and
\[
E\big[h^{-1}\{(\sigma^{(h)}_{h(k+1)})^2 - (\sigma^{(h)}_{hk})^2\}^4 \mid F^{(h)}_{hk}\big] \to 0,
\]
we have, with ε = 2, that c_{h,i,ε}(x, t) → 0;
and setting
\[
a(y, \sigma^2) = \begin{pmatrix}\sigma^2 & 0\\ 0 & \alpha^2\sigma^4\end{pmatrix}, \qquad
b(y, \sigma^2) = \begin{pmatrix}0\\ \omega - \theta\sigma^2\end{pmatrix},
\]
and assuming the limit conditions on αh, βh, ωh, condition (i) is satisfied. Letting
\[
\tau(y, \sigma^2) = \begin{pmatrix}\sigma & 0\\ 0 & \alpha\sigma^2\end{pmatrix}
\]
(the role of σ in Theorem 16) fulfills condition (ii). Under the assumption of distributional uniqueness, Theorem 16 yields the diffusion limit
\[
dY_t = \sigma_t\,dB_t,
\]
\[
d\sigma_t^2 = (\omega - \theta\sigma_t^2)\,dt + \alpha\sigma_t^2\,dW_t,
\]
for independent Brownian motions Bt, Wt. Since the drift and diffusion terms are Lipschitz-continuous, there exists a unique strong solution and therefore a unique solution in distribution.
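A pathwise impression of this limit can be obtained with a simple Euler-Maruyama discretization of the two equations. The parameter values below are purely illustrative; the scheme can overshoot into slightly negative variances, so a small floor is applied:

```python
import numpy as np

rng = np.random.default_rng(0)
omega, theta, alpha = 0.1, 0.5, 0.3   # illustrative; note 2*theta/alpha**2 + 1 > 0
dt, n_steps = 1e-3, 100_000

dB = rng.normal(0.0, np.sqrt(dt), size=n_steps)  # independent Brownian increments
dW = rng.normal(0.0, np.sqrt(dt), size=n_steps)

sig2 = np.empty(n_steps)
sig2[0] = omega / theta               # start at the stationary mean of sigma_t^2
y = np.empty(n_steps)
y[0] = 0.0
for k in range(1, n_steps):
    y[k] = y[k-1] + np.sqrt(sig2[k-1]) * dB[k]
    step = sig2[k-1] + (omega - theta * sig2[k-1]) * dt + alpha * sig2[k-1] * dW[k]
    sig2[k] = max(step, 1e-12)        # guard against discretization overshoot
```

Over a long horizon the time average of `sig2` settles near the stationary mean ω/θ of the variance equation.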
The corresponding Fokker-Planck equation for σt² is
\[
\frac{\partial}{\partial t} f(x, t) = -\frac{\partial}{\partial x}\big[(\omega - \theta x)f\big] + \frac12\frac{\partial^2}{\partial x^2}\big(\alpha^2 x^2 f\big),
\]
where f(x, t) is the probability density of σt² given σ0². A stationary distribution with p.d.f. g for σt² must satisfy g(x) = lim_{t→∞} f(x, t) and therefore
\[
\frac{d}{dx}\big(\alpha^2 x^2 g(x)\big) = 2(\omega - \theta x)g(x),
\]
so
\[
g'(x) = \frac{2}{\alpha^2 x^2}\big(\omega - \theta x - \alpha^2 x\big)\,g(x),
\]
with solution
\[
g(x) = C\,x^{-2\theta/\alpha^2 - 2}\exp\Big(\frac{-2\omega}{\alpha^2 x}\Big)
\]
for a normalizing constant C. Up to the constant factor this is the p.d.f. of an inverse gamma distribution, and therefore
\[
C = \frac{(2\omega/\alpha^2)^{2\theta/\alpha^2 + 1}}{\Gamma(2\theta/\alpha^2 + 1)},
\]
so under the assumption that 2θ/α² + 1 > 0,
\[
\sigma_t^2 \xrightarrow{D} \Gamma^{-1}\big(2\theta/\alpha^2 + 1,\ 2\omega/\alpha^2\big).
\]
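The normalizing constant can be checked against a library implementation: with λ = 2θ/α² + 1 and µ = 2ω/α², the density g above is exactly the inverse-gamma density with shape λ and scale µ. A short Python sketch with illustrative parameter values:

```python
import numpy as np
from math import gamma as Gamma
from scipy.stats import invgamma

omega, theta, alpha = 0.1, 0.5, 0.3    # illustrative; 2*theta/alpha**2 + 1 > 0
lam = 2 * theta / alpha**2 + 1         # shape parameter lambda
mu = 2 * omega / alpha**2              # scale parameter mu
C = mu**lam / Gamma(lam)               # the normalizing constant derived above

def g(x):
    return C * x**(-lam - 1) * np.exp(-mu / x)

for x in (0.1, 0.2, 0.5, 1.0):
    assert np.isclose(g(x), invgamma.pdf(x, lam, scale=mu))

# the stationary mean is mu/(lam - 1) = omega/theta
assert np.isclose(mu / (lam - 1), omega / theta)
```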

Theorem 17. Assume that 2θ/α² + 1 > 0 with θ, α defined as above, and that
\[
(\sigma^{(h)}_0)^2 \xrightarrow{D} \Gamma^{-1}\big(2\theta/\alpha^2 + 1,\ 2\omega/\alpha^2\big) \qquad (h \to 0).
\]
Then
\[
(\sigma^{(h)}_{hk})^2 \xrightarrow{D} \Gamma^{-1}\big(2\theta/\alpha^2 + 1,\ 2\omega/\alpha^2\big)
\]
and
\[
\sqrt{\frac{(2\theta + \alpha^2)/2\omega}{h}}\;e^{(h)}_{hk}\,\sigma^{(h)}_{hk} \xrightarrow{D} t\big(2 + 4\theta/\alpha^2\big)
\]
as h → 0 while kh remains constant. Here, t(2 + 4θ/α²) is the Student's t-distribution with 2 + 4θ/α² degrees of freedom.
Proof. The first statement follows immediately from the above considerations, since (σ^{(h)}_{hk})² converges in distribution to σt² for h → 0 and k = t/h. For the second statement, consider that h^{-1/2} e^{(h)}_{hk} is standard normally distributed and independent of σ^{(h)}_{hk} for every h, k. Therefore we may assume without loss of generality that the process is stationary, that is,
\[
(\sigma^{(h)}_{hk})^2 \sim \Gamma^{-1}(\lambda, \mu)
\]
for every h, k, defining λ = 2θ/α² + 1, µ = 2ω/α². Since the density of (σ^{(h)}_{hk})² is
\[
g(x) = C\,x^{-\lambda - 1}\exp\Big(\frac{-\mu}{x}\Big)
\]
with C defined as earlier, the density of σ^{(h)}_{hk} is
\[
f_\sigma(x) = g(x^2)\,\Big|\frac{dx^2}{dx}\Big| = 2C\,x^{-2\lambda - 1}\exp\Big(\frac{-\mu}{x^2}\Big),
\]
so that the density of the product h^{-1/2} e^{(h)}_{hk} σ^{(h)}_{hk} is
\[
f(y) = \int_0^\infty \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}\,\frac{1}{x}\,f_\sigma\Big(\frac{y}{x}\Big)\,dx
= \frac{2C}{\sqrt{2\pi}}\,y^{-2\lambda-1}\int_0^\infty x^{2\lambda}\,e^{-x^2(\frac12 + \frac{\mu}{y^2})}\,dx
\]
\[
= \frac{2C}{\sqrt{2\pi}}\,(y^2)^{-\lambda-\frac12}\,\frac12\Big(\frac12 + \frac{\mu}{y^2}\Big)^{-\lambda-\frac12}\Gamma\Big(\lambda + \frac12\Big)
= \mu^\lambda\Big(\mu + \frac{y^2}{2}\Big)^{-\lambda-\frac12}\frac{\Gamma(\lambda + \frac12)}{\sqrt{2\pi}\,\Gamma(\lambda)},
\]
so that the density of the product
\[
\sqrt{\frac{(2\theta + \alpha^2)/2\omega}{h}}\;e^{(h)}_{hk}\,\sigma^{(h)}_{hk} = \sqrt{\frac{\lambda}{\mu}}\,h^{-1/2}\,e^{(h)}_{hk}\,\sigma^{(h)}_{hk}
\]
is
\[
f(y) = \sqrt{\frac{\mu}{\lambda}}\,\mu^\lambda\Big(\mu + \frac{y^2\mu}{2\lambda}\Big)^{-\lambda-\frac12}\frac{\Gamma(\lambda+\frac12)}{\sqrt{2\pi}\,\Gamma(\lambda)}
= \Big(1 + \frac{y^2}{2\lambda}\Big)^{-\frac{2\lambda+1}{2}}\frac{\Gamma\big(\frac{2\lambda+1}{2}\big)}{\sqrt{\pi\cdot 2\lambda}\,\Gamma\big(\frac{2\lambda}{2}\big)},
\]
which is the density of a t-distributed random variable with 2λ = 2 + 4θ/α² degrees of freedom. The theorem is proved. □
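The last display can be cross-checked numerically: it is precisely the Student's t density with ν = 2λ degrees of freedom. In Python (λ chosen arbitrarily for illustration):

```python
import numpy as np
from math import gamma as G

lam = 3.0           # lambda = 2*theta/alpha**2 + 1; arbitrary illustrative value
nu = 2 * lam        # degrees of freedom, nu = 2 + 4*theta/alpha**2

def f(y):
    # the density obtained at the end of the proof
    return (1 + y**2 / (2 * lam))**(-(2 * lam + 1) / 2) * G((2 * lam + 1) / 2) \
        / (np.sqrt(np.pi * 2 * lam) * G(lam))

def student_t_pdf(y, df):
    # standard Student's t density with df degrees of freedom
    return G((df + 1) / 2) / (np.sqrt(df * np.pi) * G(df / 2)) \
        * (1 + y**2 / df)**(-(df + 1) / 2)

for y in (-2.5, 0.0, 0.7, 4.0):
    assert np.isclose(f(y), student_t_pdf(y, nu), rtol=1e-12)
```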

The COGARCH(1,1) model

Recall that in the GARCH(1,1) process with ω > 0, α, β ≥ 0 (see Theorem 3.1), we have the representation
\[
\sigma_t^2 = \omega\sum_{j=0}^{t-1}\prod_{i=1}^{j}(\alpha e_{t-i}^2 + \beta) + \sigma_0^2\prod_{k=0}^{t-1}(\alpha e_{t-k}^2 + \beta)
= \omega\sum_{j=0}^{t-1}\exp\Big(\sum_{i=1}^{j}\log(\alpha e_{t-i}^2 + \beta)\Big) + \sigma_0^2\exp\Big(\sum_{k=0}^{t-1}\log(\alpha e_{t-k}^2 + \beta)\Big).
\]
This motivates the following continuous-time GARCH(1,1) variant due to Klüppelberg, Lindner and Maller:

Definition 18. Let (Lt)t≥0 be a Lévy process adapted to a filtered probability space (Ω, F, P, (Ft)) satisfying the usual conditions: Ft contains every P-null set for every t, and Ft = ∩_{s>t} Fs. Define the càdlàg process
\[
X_t = -t\log\beta - \sum_{s\le t}\log\Big(1 + \frac{\alpha(\Delta L_s)^2}{\beta}\Big) \qquad (t \ge 0)
\]

and, with a random variable σ0 independent of (Lt)t≥0, the càglàd volatility process
\[
\sigma_t^2 = \Big(\omega\int_0^t e^{X_s}\,ds + \sigma_0^2\Big)e^{-X_{t-}} \qquad (t \ge 0),
\]
and the càdlàg (integrated) COGARCH(1,1) process
\[
G_t = \int_0^t \sigma_s\,dL_s \quad (t > 0), \qquad G_0 = 0.
\]
Gt plays the same role as the cumulated log returns Yt = X0 + ... + Xt from the GARCH model described in section 1. The simplest example of a COGARCH(1,1) process is driven by a Brownian motion: Lt = Bt. Since Bt is almost surely continuous, we have Xt = −t log β, so
\[
\sigma_t^2 = \Big(\omega\int_0^t \beta^{-s}\,ds + \sigma_0^2\Big)\beta^t = \frac{\omega(\beta^t - 1)}{\log\beta} + \sigma_0^2\beta^t.
\]
In this case, σt² is deterministic. This is not surprising: in general, the jumps ΔLs are the analogues of the innovations en in discrete time.
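Differentiating the closed form above shows that in the Brownian case the volatility solves the deterministic equation dσt²/dt = ω + (log β)σt². A quick Python verification with illustrative parameters:

```python
import numpy as np

omega, beta, sig0_2 = 0.2, 0.9, 1.0   # illustrative parameters, 0 < beta < 1

def sig2(t):
    # closed form for the Brownian-driven COGARCH volatility
    return omega * (beta**t - 1) / np.log(beta) + sig0_2 * beta**t

# check d(sig2)/dt = omega + log(beta) * sig2 by central differences
h = 1e-6
for t in (0.5, 1.0, 3.0):
    lhs = (sig2(t + h) - sig2(t - h)) / (2 * h)
    rhs = omega + np.log(beta) * sig2(t)
    assert abs(lhs - rhs) < 1e-6
```

As t → ∞, σt² tends to −ω/log β, which is positive since log β < 0; this is the deterministic analogue of a stationary volatility level.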

Lemma 19. Let Xt and σt be given as above. Then σt² solves the stochastic differential equation
\[
d\sigma_{t+}^2 = \omega\,dt + \sigma_t^2\,e^{X_{t-}}\,d(e^{-X_t})
\]
and therefore
\[
\sigma_t^2 = \omega t + \log\beta\int_0^t \sigma_s^2\,ds + \frac{\alpha}{\beta}\sum_{0<s<t}\sigma_s^2(\Delta L_s)^2 + \sigma_0^2.
\]
Proof. Let Yt = Π_{s≤t}(1 + α(ΔLs)²/β). Then
\[
e^{-X_t} = \beta^t\prod_{s\le t}\Big(1 + \frac{\alpha(\Delta L_s)^2}{\beta}\Big) = \beta^t Y_t,
\]
and defining f(t, x) = β^t x, we have by Itô's lemma
\[
df(t, Y_t) = (\beta^t Y_t\log\beta)\,dt + \beta^t\,dY_t,
\]
and therefore
\[
e^{-X_t} = 1 + \log\beta\int_0^t e^{-X_s}\,ds + \frac{\alpha}{\beta}\sum_{s\le t} e^{-X_{s-}}(\Delta L_s)^2,
\]
using the fact that Yt has bounded variation. By integration by parts,
\[
e^{-X_t}\int_0^t e^{X_s}\,ds = \int_0^t e^{-X_s}e^{X_s}\,ds + \int_0^t\Big(\int_0^s e^{X_r}\,dr\Big)\,d(e^{-X_s}) + \Big[e^{-X},\,\int_0^{\cdot} e^{X_s}\,ds\Big]_t,
\]
and the covariation term vanishes because s ↦ ∫₀^s e^{X_r} dr is continuous and of bounded variation, so
\[
e^{-X_t}\int_0^t e^{X_s}\,ds = t + \int_0^t\Big(\int_0^s e^{X_r}\,dr\Big)\,d(e^{-X_s}),
\]
and therefore
\[
d\sigma_t^2 = \omega\,d\Big(e^{-X_t}\int_0^t e^{X_s}\,ds\Big) + \sigma_0^2\,d(e^{-X_t})
= \omega\,dt + \omega\int_0^t e^{X_s}\,ds\;d(e^{-X_t}) + \sigma_0^2\,d(e^{-X_t})
= \omega\,dt + \sigma_t^2\,e^{X_{t-}}\,d(e^{-X_t}). \qquad \square
\]

8 Example with MATLAB
In this section, we use the GARCH methodology to analyze the exchange rate between the U.S.
dollar and the euro since its full introduction in 2002.

The above graph shows the daily average value of USD in euros from January 1st, 2002 until July 9th, 2011. First, the data are transformed into log returns. With the original time series named "data", the MATLAB code is simple:

for i = 1:length(data)-1
    temp(i) = log(data(i+1) / data(i));
end
global log_returns;
log_returns = temp;
plot(1:length(log_returns), log_returns);

where the log returns are saved as a global variable so that they can be accessed easily in other
functions later on.

This returns the graph below:

Alternating periods of volatility and relative quiet are visible, as well as a period of intense volatility
in late 2008 (around 2500 days from the start) which likely corresponds to the subprime mortgage
crisis. At first glance, the data appear to have heteroskedastic effects. We now estimate the parameters using MATLAB's optimization toolbox. The Gaussian quasi-maximum likelihood function is
implemented and stored separately in a file called QMLE.m:

function y = QMLE(param)
    global log_returns;
    sigma2(1) = param(1)/(1 - (param(2) + param(3)));
    y = log(sigma2(1)) + (log_returns(1)^2)/sigma2(1);
    for i=2:length(log_returns)
        sigma2(i) = param(1) + param(2)*log_returns(i-1)^2...
            + param(3)*sigma2(i-1);
        assert(sigma2(i) > 0);
        y = y + log(sigma2(i)) + (log_returns(i)^2)/sigma2(i);
    end

where param is a vector (ω, α, β), sigma2(i) is σi2 and log returns(i) is Xi .
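For readers without MATLAB, the same criterion can be written in Python. This is a sketch of the recursion only (NumPy assumed, synthetic data for illustration), not a drop-in replacement for the optimization toolbox call used below:

```python
import numpy as np

def qmle_objective(param, log_returns):
    # Gaussian quasi-log-likelihood criterion, mirroring QMLE.m
    w, a, b = param
    s2 = w / (1.0 - (a + b))                 # initialize at the stationary variance
    y = np.log(s2) + log_returns[0]**2 / s2
    for x_prev, x in zip(log_returns[:-1], log_returns[1:]):
        s2 = w + a * x_prev**2 + b * s2      # GARCH(1,1) variance recursion
        assert s2 > 0
        y += np.log(s2) + x**2 / s2
    return y

rng = np.random.default_rng(42)
x = 0.01 * rng.standard_normal(200)          # synthetic returns, for illustration
val = qmle_objective((2e-7, 0.03, 0.96), x)
```

This function could then be minimized, for example with `scipy.optimize.minimize` under bounds on (ω, α, β) and the constraint α + β < 1, playing the role of fmincon.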

The function QMLE is now minimized over the compact set

[10⁻¹⁵, 5] × [10⁻¹⁵, 1] × [10⁻¹⁵, 1]

under the additional constraint that α + β ≤ 1 − 10⁻¹⁵. The reason for these strict-looking bounds is that fmincon accepts constraints only in the form Ax ≤ b, not Ax < b. The optimization is done with

[param, fval] = fmincon(@QMLE,[0.0000002,0.03,0.96],...
    [0,1,1],1-10^-15,[],[],lb,ub,[],options)

where lb = 10^-15*[1; 1; 1] is the lower bound and ub = [5; 1; 1] the upper bound for the parameters.
MATLAB generates the output

param =
    0.0000    0.0248    0.9744
fval =
    -3.4017e+004

and we find (after increasing the display precision) the QML estimate

ω̂ = 2.27 · 10⁻⁸,  α̂ = 0.0248,  β̂ = 0.9744.

The function also generates the reconstructed volatility process σt²(θ̂n) described in section 4; in the resulting plot, the dotted line represents the stationary volatility.

We can now simulate the process for the following year. Since the parameters were estimated using Gaussian quasi-maximum likelihood, it is natural to use normally distributed innovations for the simulation as well. Normally distributed pseudorandom numbers are generated in MATLAB with randn. We define the function

function [sim,sigma2] = simul(param,last_sigma2,last_observ,t)
    assert(length(param) == 3);
    assert(last_sigma2 > 0);
    assert(all(param > 0));
    innov = randn([1,t]);
    sigma2(1) = param(1) + param(2)*last_observ^2...
        + param(3)*last_sigma2;
    sim(1) = innov(1)*sqrt(sigma2(1));
    for i=2:t
        sigma2(i) = param(1) + param(2)*sim(i-1)^2...
            + param(3)*sigma2(i-1);
        sim(i) = innov(i)*sqrt(sigma2(i));
    end
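The same simulation can be sketched in Python (the parameter values are the estimates found above; the starting variance and observation are placeholders):

```python
import numpy as np

def simulate_garch(param, last_sigma2, last_observ, t, rng):
    # Python analogue of simul.m: simulate t further steps of the fitted model
    w, a, b = param
    assert last_sigma2 > 0 and all(p > 0 for p in param)
    innov = rng.standard_normal(t)
    sim = np.empty(t)
    sigma2 = np.empty(t)
    sigma2[0] = w + a * last_observ**2 + b * last_sigma2
    sim[0] = innov[0] * np.sqrt(sigma2[0])
    for i in range(1, t):
        sigma2[i] = w + a * sim[i-1]**2 + b * sigma2[i-1]
        sim[i] = innov[i] * np.sqrt(sigma2[i])
    return sim, sigma2

sim, sigma2 = simulate_garch((2.27e-8, 0.0248, 0.9744),
                             last_sigma2=1e-5, last_observ=1e-3,
                             t=250, rng=np.random.default_rng(7))
```

Since ω, α, β > 0 and the starting variance is positive, every simulated conditional variance is positive by construction.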

Four simulated volatilities are shown below:

which correspond to the following predicted exchange rates:

9 Discussion
In this thesis, we have considered the strengths and weaknesses of the GARCH(1,1) model in math-
ematical finance, as well as the practical questions of parameter estimation and implementation.
GARCH models and variants have become ubiquitous in the theory of economic time series since their introduction roughly 25 years ago. This is due to the relative simplicity of the model and the wide range of processes it can approximate. Related models such as the EARCH and EGARCH models of Nelson (1989) provide the ability to account for empirical phenomena such as the leverage effect, at the cost of a more complex asymptotic theory for the typical estimators.

The investigation of multivariate GARCH models remains an active area of research. The need
for such models arises when one considers a set of time series with significant interdependence; an
example of this is the stock price of a manufacturing firm and the commodity prices for the resources it requires.
coefficients as well as a log-returns vector Xt and a vectorized volatility matrix σt (that is, such
that σt2 is the conditional covariance of Xt ). This is known as the Vec model. However, this can
be very difficult to work with, as necessary and sufficient conditions to ensure that σt2 is positive
definite are difficult or impossible to derive. Therefore, the Vec model is often restricted to models
such as the BEKK model. The theory of these models is beyond the scope of this paper.

Another area of further research is the connection between GARCH models and stochastic volatil-
ity models. Two examples of continuous-time processes which are related to the discrete GARCH
equations are mentioned in section 7; these are the only such classes of processes known at this
time. The stationary distribution of the diffusion limit may also contain information about the
stationary distribution of the GARCH model and thus its long-term behavior. In addition, the
convergence to the diffusion limit is weak and does not necessarily imply that the GARCH model
and the diffusion limit must be asymptotically equivalent.

References
[1] Bollerslev, T., 1986, Generalized autoregressive conditional heteroskedasticity, Journal of Econometrics, 31, 307-327

[2] Engle, R.F., 1982, Autoregressive conditional heteroskedasticity with estimates of the variance
of U.K. inflation, Econometrica, 50, 987-1008

[3] Engle, R.F. and Ng, V.K., 1993, Measuring and testing the impact of news on volatility, Journal
of Finance, 48, 1749-1778

[4] Francq, C. and Zakoı̈an, J-M. Modèles GARCH: structure, inférence statistique et applications
financières, Economica: Paris, 2009

[5] Kreiss, J-P. and Neuhaus, G., Einführung in die Zeitreihenanalyse, Berlin, Heidelberg: Springer-
Verlag, 2006

[6] Klüppelberg, C., Lindner, A. and Maller, R., 2004, A continuous time GARCH(1,1) process driven by a Lévy process: stationarity and second order behaviour, Journal of Applied Probability, 41, 1-22

[7] MATLAB version 7.11.0. Natick, Massachusetts: The MathWorks Inc., 2010.

[8] Nelson, D.B., 1988, Stationarity and persistence in the GARCH(1,1) model, Econometric The-
ory, 6, 318-334

[9] Nelson, D.B., 1989, Conditional heteroskedasticity in asset returns: A new approach, Econo-
metrica, 59, 347-370

[10] Nelson, D.B., 1990, ARCH models as diffusion approximations, Journal of Econometrics, 45, 7-28

[11] Sentana, E., 1995, Quadratic ARCH models, Review of Economic Studies, 62, 639-661

[12] Stroock, D.W. and Varadhan, S.R.S., Multidimensional Diffusion Processes, Berlin: Springer-
Verlag, 2006

[13] Zakoı̈an, J-M., 1994, Threshold heteroskedastic models, Journal of Economic Dynamics and
Control, 18, 931-955
