
Time Series Analysis

Andrea Beccarini

Center for Quantitative Economics

Winter 2013/2014



Introduction
Objectives

Time series are ubiquitous in economics, and very important in macroeconomics and financial economics: GDP, inflation rates, unemployment, interest rates, stock prices

You will learn . . .
the formal mathematical treatment of time series and stochastic processes
what the most important standard models in economics are
how to fit models to real-world time series



Introduction
Prerequisites

Descriptive Statistics
Probability Theory
Statistical Inference



Introduction
Class and material

Class
Class teacher: Sarah Meyer
Time: Tu., 12:00-14:00
Location: CAWM 3
Start: 22 October 2013
Material
Course page on Blackboard
Slides and class material are (or will be) downloadable



Introduction
Literature

Neusser, Klaus (2011), Zeitreihenanalyse in den Wirtschaftswissenschaften, 3rd ed., Teubner, Wiesbaden (available online in the RUB-Netz)
Hamilton, James D. (1994), Time Series Analysis, Princeton University Press, Princeton
Pfaff, Bernhard (2006), Analysis of Integrated and Cointegrated Time Series with R, Springer, New York
Schlittgen, Rainer and Streitberg, Bernd (1997), Zeitreihenanalyse, 7th ed., Oldenbourg, München



Basics
Definition

Definition: Time series


A sequence of observations ordered by time is called a time series

Time series can be univariate or multivariate
Time can be discrete or continuous
The states can be discrete or continuous



Basics
Definition

Typical notations
$x_1, x_2, \ldots, x_T$
or $x(1), x(2), \ldots, x(T)$
or $x_t$, $t = 1, \ldots, T$
or $(x_t)_{t \ge 0}$
This course is about . . .
univariate time series
in discrete time
with continuous states



Basics
Examples

Quarterly GDP Germany, 1991 I to 2012 II



[Figure: GDP (in current billion Euro) plotted against Time, 1995 to 2010]



Basics
Examples

DAX index and log(DAX), 31.12.1964 to 6.4.2009


[Figure: two panels, DAX and logarithm of DAX, plotted against Time, 1970 to 2010]



Basics
Definition

Definition: Stochastic process


A sequence $(X_t)_{t \in T}$ of random variables, all defined on the same probability space $(\Omega, \mathcal{A}, P)$, is called a stochastic process with discrete time parameter (usually $T = \mathbb{N}$ or $T = \mathbb{Z}$)

Short version: A stochastic process is a sequence of random variables


A stochastic process depends on both chance and time



Basics
Definition

Distinguish four cases: both time $t$ and chance $\omega$ can be fixed or variable

$t$ fixed, $\omega$ fixed: $X_t(\omega)$ is a real number
$t$ variable, $\omega$ fixed: $(X_t(\omega))_t$ is a sequence of real numbers (path, realization, trajectory)
$t$ fixed, $\omega$ variable: $X_t(\omega)$ is a random variable
$t$ variable, $\omega$ variable: $(X_t(\omega))_t$ is a stochastic process

process.R



Basics
Examples

Example 1: White noise

$\epsilon_t \sim NID(0, \sigma^2)$

Example 2: Random walk

$X_t = X_{t-1} + \epsilon_t$ with $X_0 = 0$
$\epsilon_t \sim NID(0, \sigma^2)$

Example 3: A random constant

$X_t = Z$
$Z \sim N(0, \sigma^2)$


Basics
Moment functions

Definition: Moment functions


The following functions of time are called moment functions:

$\mu(t) = E(X_t)$ (expectation function)
$\sigma^2(t) = Var(X_t)$ (variance function)
$\gamma(s, t) = Cov(X_s, X_t)$ (covariance function)

Correlation function (autocorrelation function)

$\rho(s, t) = \dfrac{\gamma(s, t)}{\sqrt{\sigma^2(s)}\sqrt{\sigma^2(t)}}$

moments.R [1]



Basics
Estimation of moment functions

Usually, the moment functions are unknown and have to be estimated


Problem: Only a single path (realization) can be observed
$X_1^{(1)} \quad X_1^{(2)} \quad \ldots \quad X_1^{(n)}$
$X_2^{(1)} \quad X_2^{(2)} \quad \ldots \quad X_2^{(n)}$
$\;\vdots \qquad\; \vdots \qquad \ldots \qquad \vdots$
$X_T^{(1)} \quad X_T^{(2)} \quad \ldots \quad X_T^{(n)}$

Can we still estimate the expectation function $\mu(t)$ and the autocovariance function $\gamma(s, t)$? Under which conditions?



Basics
Estimation of moment functions

$X_1^{(1)} \quad X_1^{(2)} \quad \ldots \quad X_1^{(n)}$
$X_2^{(1)} \quad X_2^{(2)} \quad \ldots \quad X_2^{(n)}$
$\;\vdots \qquad\; \vdots \qquad \ldots \qquad \vdots$
$X_T^{(1)} \quad X_T^{(2)} \quad \ldots \quad X_T^{(n)}$

Usually, the expectation function $\mu(t)$ should be estimated by averaging over realizations,

$\hat\mu(t) = \dfrac{1}{n} \sum_{i=1}^{n} X_t^{(i)}$



Basics
Estimation of moment functions

$X_1^{(1)} \quad X_1^{(2)} \quad \ldots \quad X_1^{(n)}$
$X_2^{(1)} \quad X_2^{(2)} \quad \ldots \quad X_2^{(n)}$
$\;\vdots \qquad\; \vdots \qquad \ldots \qquad \vdots$
$X_T^{(1)} \quad X_T^{(2)} \quad \ldots \quad X_T^{(n)}$

Under certain conditions, $\mu(t)$ can be estimated by averaging over time,

$\hat\mu = \dfrac{1}{T} \sum_{t=1}^{T} X_t^{(1)}$



Basics
Estimation of moment functions

$X_1^{(1)} \quad X_1^{(2)} \quad \ldots \quad X_1^{(n)}$
$X_2^{(1)} \quad X_2^{(2)} \quad \ldots \quad X_2^{(n)}$
$\;\vdots \qquad\; \vdots \qquad \ldots \qquad \vdots$
$X_T^{(1)} \quad X_T^{(2)} \quad \ldots \quad X_T^{(n)}$

Usually, the autocovariance $\gamma(t, t+h)$ should be estimated by averaging over realizations,

$\hat\gamma(t, t+h) = \dfrac{1}{n} \sum_{i=1}^{n} (X_t^{(i)} - \hat\mu(t))(X_{t+h}^{(i)} - \hat\mu(t+h))$



Basics
Estimation of moment functions

$X_1^{(1)} \quad X_1^{(2)} \quad \ldots \quad X_1^{(n)}$
$X_2^{(1)} \quad X_2^{(2)} \quad \ldots \quad X_2^{(n)}$
$\;\vdots \qquad\; \vdots \qquad \ldots \qquad \vdots$
$X_T^{(1)} \quad X_T^{(2)} \quad \ldots \quad X_T^{(n)}$

Under certain conditions, $\gamma(t, t+h)$ can be estimated by averaging over time,

$\hat\gamma(t, t+h) = \dfrac{1}{T} \sum_{t=1}^{T-h} (X_t^{(1)} - \hat\mu)(X_{t+h}^{(1)} - \hat\mu)$



Basics
Definition

Moment functions cannot be estimated without additional assumptions since only one path is observed
There are restrictions that allow us to estimate the moment functions
Restriction of the time heterogeneity:
The distribution of $(X_t(\omega))_{t \in T}$ must not be completely different for each $t \in T$
Restriction of the memory:
If the values of the process are coupled too closely over time, the individual observations do not supply any (or only insufficient) information about the distribution



Basics
Restriction of time heterogeneity: Stationarity

Definition: Strong stationarity


Let $(X_t)_{t \in T}$ be a stochastic process, and let $t_1, \ldots, t_n \in T$ be an arbitrary number $n \in \mathbb{N}$ of arbitrary time points.
$(X_t)_{t \in T}$ is called strongly stationary if for arbitrary $h \in \mathbb{Z}$

$P(X_{t_1} \le x_1, \ldots, X_{t_n} \le x_n) = P(X_{t_1+h} \le x_1, \ldots, X_{t_n+h} \le x_n)$

Implication: all univariate marginal distributions are identical



Basics
Restriction of time heterogeneity: Stationarity

Definition: Weak stationarity


$(X_t)_{t \in T}$ is called weakly stationary if
1 the expectation exists and is constant: $E(X_t) = \mu < \infty$ for all $t \in T$
2 the variance exists and is constant: $Var(X_t) = \sigma^2 < \infty$ for all $t \in T$
3 for all $t, s, r \in \mathbb{Z}$ (in the admissible range)

$\gamma(t, s) = \gamma(t + r, s + r)$

Simplified notation for covariance and correlation functions

$\gamma(h) = \gamma(t, t + h)$
$\rho(h) = \rho(t, t + h)$



Basics
Restriction of time heterogeneity: Stationarity

Strong stationarity implies weak stationarity (but only if the first two moments exist)
A stochastic process is called Gaussian if the joint distribution of $X_{t_1}, \ldots, X_{t_n}$ is multivariate normal
For Gaussian processes, weak and strong stationarity coincide
Intuition: An observed time series can be regarded as a realization of a stationary process if a gliding window of appropriate width always displays qualitatively the same picture

stationary.R
Examples [2]



Basics
Restriction of memory: Ergodicity

Definition: Ergodicity (I)


Let $(X_t)_{t \in T}$ be a weakly stationary stochastic process with expectation $\mu$ and autocovariance $\gamma(h)$; define

$\hat\mu_T = \dfrac{1}{T} \sum_{t=1}^{T} X_t$

$(X_t)_{t \in T}$ is called (expectation) ergodic if

$\lim_{T \to \infty} E\left[(\hat\mu_T - \mu)^2\right] = 0$
T



Basics
Restriction of memory: Ergodicity

Definition: Ergodicity (II)


Let $(X_t)_{t \in T}$ be a weakly stationary stochastic process with expectation $\mu$ and autocovariance $\gamma(h)$; define

$\hat\gamma(h) = \dfrac{1}{T} \sum_{t=1}^{T-h} (X_t - \mu)(X_{t+h} - \mu)$

$(X_t)_{t \in T}$ is called (covariance) ergodic if for all $h \in \mathbb{Z}$

$\lim_{T \to \infty} E\left[(\hat\gamma(h) - \gamma(h))^2\right] = 0$
T



Basics
Restriction of memory: Ergodicity

Ergodicity is consistency (in quadratic mean) of the estimators $\hat\mu$ of $\mu$ and $\hat\gamma(h)$ of $\gamma(h)$ for dependent observations
The process $(X_t)_{t \in T}$ is expectation ergodic if $(\gamma(h))_{h \in \mathbb{Z}}$ is absolutely summable, i.e.

$\sum_{h=-\infty}^{\infty} |\gamma(h)| < \infty$

The dependence between far-away observations must be sufficiently small



Basics
Restriction of memory: Ergodicity

Ergodicity condition (for autocovariance): A stationary Gaussian process $(X_t)_{t \in T}$ with absolutely summable autocovariance function $\gamma(h)$ is (autocovariance) ergodic
Under ergodicity, the law of large numbers holds even if the observations are dependent
If the dependence $\gamma(h)$ does not diminish fast enough, the estimators are no longer consistent
Examples [3]



Basics
Estimation of moment functions

Summary of estimators (electricity.R)

$\hat\mu = \bar X_T = \dfrac{1}{T} \sum_{t=1}^{T} X_t$

$\hat\gamma(h) = \dfrac{1}{T} \sum_{t=1}^{T-h} (X_t - \hat\mu)(X_{t+h} - \hat\mu)$

$\hat\rho(h) = \dfrac{\hat\gamma(h)}{\hat\gamma(0)}$

Sometimes, $\hat\gamma(h)$ is defined with factor $1/(T - h)$ instead of $1/T$
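In R, these estimators correspond to mean() and acf(); a short sketch (the course script electricity.R is not reproduced here, so the series is a stand-in):

x <- as.numeric(LakeHuron)                               # any univariate series will do
mu_hat    <- mean(x)                                     # expectation estimator
gamma_hat <- acf(x, type = "covariance", plot = FALSE)   # gamma_hat(h), 1/T convention
rho_hat   <- acf(x, plot = FALSE)                        # rho_hat(h) = gamma_hat(h)/gamma_hat(0)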



Basics
Estimation of moment functions

A closer look at the expectation estimator $\hat\mu$

The estimator is unbiased, i.e. $E(\hat\mu) = \mu$ [4]

The variance of $\hat\mu$ is [5]

$Var(\hat\mu) = \dfrac{\gamma(0)}{T} + \dfrac{2}{T} \sum_{h=1}^{T-1} \left(1 - \dfrac{h}{T}\right) \gamma(h)$

Under ergodicity, for $T \to \infty$

$T \cdot Var(\hat\mu) \to \gamma(0) + 2\sum_{h=1}^{\infty} \gamma(h) = \sum_{h=-\infty}^{\infty} \gamma(h)$



Basics
Estimation of moment functions

For Gaussian processes, $\hat\mu$ is normally distributed,

$\hat\mu \sim N(\mu, Var(\hat\mu))$

and asymptotically

$\sqrt{T}(\hat\mu - \mu) \xrightarrow{d} Z \sim N\left(0, \gamma(0) + 2\sum_{h=1}^{\infty} \gamma(h)\right)$

For non-Gaussian processes, $\hat\mu$ is (often) asymptotically normal,

$\sqrt{T}(\hat\mu - \mu) \xrightarrow{d} Z \sim N\left(0, \gamma(0) + 2\sum_{h=1}^{\infty} \gamma(h)\right)$



Basics
Estimation of moment functions

A closer look at the autocovariance estimators $\hat\gamma(h)$

For Gaussian processes with absolutely summable covariance function,

$\left(\sqrt{T}(\hat\gamma(0) - \gamma(0)), \ldots, \sqrt{T}(\hat\gamma(K) - \gamma(K))\right)'$

is multivariate normal with expectation vector $(0, \ldots, 0)'$ and

$T \cdot Cov(\hat\gamma(h_1), \hat\gamma(h_2)) \to \sum_{r=-\infty}^{\infty} \left(\gamma(r)\gamma(r + h_1 - h_2) + \gamma(r - h_2)\gamma(r + h_1)\right)$



Basics
Estimation of moment functions

A closer look at the autocorrelation estimators $\hat\rho(h)$

For Gaussian processes with absolutely summable covariance function, the random vector

$\left(\sqrt{T}(\hat\rho(1) - \rho(1)), \ldots, \sqrt{T}(\hat\rho(K) - \rho(K))\right)'$

is multivariate normal with expectation vector $(0, \ldots, 0)'$ and a complicated covariance matrix
Be careful: For small to medium sample sizes the autocovariance and autocorrelation estimators are biased!
autocorr.R



Basics
Estimation of moment functions

An important special case for autocorrelation estimators:
Let $(\epsilon_t)$ be a white-noise process with $Var(\epsilon_t) = \sigma^2 < \infty$; then

$E(\hat\rho(h)) = -T^{-1} + O(T^{-2})$

$Cov(\hat\rho(h_1), \hat\rho(h_2)) = \begin{cases} T^{-1} + O(T^{-2}) & \text{for } h_1 = h_2 \\ O(T^{-2}) & \text{else} \end{cases}$

For white-noise processes and long time series, the empirical autocorrelations are approximately independent normal random variables with expectation $-T^{-1}$ and variance $T^{-1}$



Mathematical digression (I)
Complex numbers

Some quadratic equations do not have real solutions, e.g.

$x^2 + 1 = 0$

Still it is possible (and sensible) to define solutions to such equations

The definition in common notation is

$i = \sqrt{-1}$

where $i$ is the number which, when squared, equals $-1$

The number $i$ is called imaginary (i.e. not real)



Mathematical digression (I)
Complex numbers

Other imaginary numbers follow from this definition, e.g.

$\sqrt{-16} = \sqrt{16}\sqrt{-1} = 4i$
$\sqrt{-5} = \sqrt{5}\sqrt{-1} = \sqrt{5}\,i$

Further, it is possible to define numbers that contain both a real part and an imaginary part, e.g. $5 - 8i$ or $a + bi$
Such numbers are called complex and the set of complex numbers is denoted as $\mathbb{C}$
The pair $a + bi$ and $a - bi$ is called conjugate complex



Mathematical digression (I)
Complex numbers

Geometric interpretation:

[Figure: the complex number $a + bi$ as a point in the plane, with real part $a$ on the real axis, imaginary part $b$ on the imaginary axis, and absolute value $r$ as the distance from the origin]



Mathematical digression (I)
Complex numbers

Polar coordinates and Cartesian coordinates

$z = a + bi = r(\cos\phi + i\sin\phi) = re^{i\phi}$

$a = r\cos\phi$
$b = r\sin\phi$
$r = \sqrt{a^2 + b^2}$
$\phi = \arctan\left(\dfrac{b}{a}\right)$



Mathematical digression (I)
Complex numbers

Rules of calculus:
Addition

$(a + bi) + (c + di) = (a + c) + (b + d)i$

Multiplication (Cartesian coordinates)

$(a + bi)(c + di) = (ac - bd) + (ad + bc)i$

Multiplication (polar coordinates)

$r_1 e^{i\phi_1} \cdot r_2 e^{i\phi_2} = r_1 r_2 e^{i(\phi_1 + \phi_2)}$





Mathematical digression (I)
Complex numbers

Addition:

[Figure: vector addition in the complex plane; $a + bi$ and $c + di$ add to $(a + c) + (b + d)i$]





Mathematical digression (I)
Complex numbers

Multiplication:

[Figure: multiplication in the complex plane; the moduli multiply, $r = r_1 r_2$, and the angles add, $\phi = \phi_1 + \phi_2$]



Mathematical digression (I)
Complex numbers

The quadratic equation

$x^2 + px + q = 0$

has the solutions

$x_{1,2} = -\dfrac{p}{2} \pm \sqrt{\dfrac{p^2}{4} - q}$

If $\dfrac{p^2}{4} - q < 0$ the solutions are complex (and conjugate)



Mathematical digression (I)
Complex numbers

Example: The solutions of

$x^2 - 2x + 5 = 0$

are

$x_1 = -\dfrac{(-2)}{2} + \sqrt{\dfrac{(-2)^2}{4} - 5} = 1 + 2i$

and

$x_2 = -\dfrac{(-2)}{2} - \sqrt{\dfrac{(-2)^2}{4} - 5} = 1 - 2i$



Mathematical digression (II)
Linear difference equations

First order difference equation with initial value $x_0$:

$x_t = c + \phi_1 x_{t-1}$

$p$-th order difference equation with initial value $x_0$:

$x_t = c + \phi_1 x_{t-1} + \ldots + \phi_p x_{t-p}$

A sequence $(x_t)_{t=0,1,\ldots}$ that satisfies the difference equation is called a solution of the difference equation
Examples (diffequation.R)



Mathematical digression (II)
Linear difference equations

We only consider the homogeneous case, i.e. $c = 0$

The general solution of the first-order difference equation

$x_t = \phi_1 x_{t-1}$

is

$x_t = A\phi_1^t$

with arbitrary constant $A$, since $x_t = A\phi_1^t = \phi_1 \cdot A\phi_1^{t-1} = \phi_1 x_{t-1}$
The constant is determined by the initial condition, $A = x_0$
The sequence $x_t = A\phi_1^t$ is convergent if and only if $|\phi_1| < 1$



Mathematical digression (II)
Linear difference equations

Solution of the $p$-th order difference equation

$x_t = \phi_1 x_{t-1} + \ldots + \phi_p x_{t-p}$

Let $x_t = Az^{-t}$; then

$Az^{-t} = \phi_1 Az^{-(t-1)} + \ldots + \phi_p Az^{-(t-p)}$
$z^{-t} = \phi_1 z^{-(t-1)} + \ldots + \phi_p z^{-(t-p)}$

and thus

$1 - \phi_1 z - \ldots - \phi_p z^p = 0$

Characteristic polynomial, characteristic equation



Mathematical digression (II)
Linear difference equations

There are $p$ (possibly complex, possibly nondistinct) solutions of the characteristic equation
Denote the solutions (called roots) by $z_1, \ldots, z_p$
If all roots are real and distinct, then

$x_t = A_1 z_1^{-t} + \ldots + A_p z_p^{-t}$

is a solution of the homogeneous difference equation

If there are complex roots, the solution is oscillating
The constants $A_1, \ldots, A_p$ are determined by $p$ initial conditions $(x_0, x_1, \ldots, x_{p-1})$



Mathematical digression (II)
Linear difference equations

Stability condition: The linear difference equation

$x_t = \phi_1 x_{t-1} + \ldots + \phi_p x_{t-p}$

is stable (i.e. convergent) if and only if all roots of the characteristic polynomial

$1 - \phi_1 z - \ldots - \phi_p z^p = 0$

are outside the unit circle, i.e. $|z_i| > 1$ for all $i = 1, \ldots, p$
In R, the stability condition can be checked easily using the commands polyroot (base package) or ArmaRoots (fArma package)
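A minimal sketch with base R's polyroot(); the AR(2) coefficients are made up for illustration:

phi <- c(0.5, 0.3)              # x_t = 0.5 x_{t-1} + 0.3 x_{t-2}
roots <- polyroot(c(1, -phi))   # roots of 1 - 0.5 z - 0.3 z^2 (coefficients in increasing powers)
Mod(roots)                      # moduli of the (possibly complex) roots
all(Mod(roots) > 1)             # TRUE: all roots outside the unit circle, so stable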



ARMA models
Definition

Definition: ARMA process

Let $(\epsilon_t)_{t \in T}$ be a white noise process; the stochastic process

$X_t = \phi_1 X_{t-1} + \ldots + \phi_p X_{t-p} + \epsilon_t + \theta_1 \epsilon_{t-1} + \ldots + \theta_q \epsilon_{t-q}$

with $\phi_p, \theta_q \ne 0$ is called an ARMA(p, q) process

AutoRegressive Moving Average process

ARMA processes are important since every stationary process can be approximated by an ARMA process



ARMA models
Lag operator and lag polynomial

The lag operator is a convenient notational tool

The lag operator $L$ shifts the time index of a stochastic process

$L(X_t)_{t \in T} = (X_{t-1})_{t \in T}$
$LX_t = X_{t-1}$

Rules

$L^2 X_t = L(LX_t) = X_{t-2}$
$L^n X_t = X_{t-n}$
$L^{-1} X_t = X_{t+1}$
$L^0 X_t = X_t$



ARMA models
Lag operator and lag polynomial

Lag polynomial

$A(L) = a_0 + a_1 L + a_2 L^2 + \ldots + a_p L^p$

Example: Let $A(L) = 1 - 0.5L$ and $B(L) = 1 + 4L^2$; then

$C(L) = A(L)B(L) = (1 - 0.5L)(1 + 4L^2) = 1 - 0.5L + 4L^2 - 2L^3$

Lag polynomials can be treated in the same way as ordinary polynomials



ARMA models
Lag operator and lag polynomial

Define the lag polynomials

$\phi(L) = 1 - \phi_1 L - \ldots - \phi_p L^p$
$\theta(L) = 1 + \theta_1 L + \ldots + \theta_q L^q$

The ARMA(p, q) process can be written compactly as

$\phi(L)X_t = \theta(L)\epsilon_t$

Important special cases

MA(q) process: $X_t = \epsilon_t + \theta_1 \epsilon_{t-1} + \ldots + \theta_q \epsilon_{t-q}$
AR(1) process: $X_t = \phi_1 X_{t-1} + \epsilon_t$
AR(p) process: $X_t = \phi_1 X_{t-1} + \ldots + \phi_p X_{t-p} + \epsilon_t$



ARMA models
MA(q) process

The MA(q) process is

$X_t = \theta(L)\epsilon_t$
$X_t = \epsilon_t + \theta_1 \epsilon_{t-1} + \ldots + \theta_q \epsilon_{t-q}$

with $\epsilon_t \sim NID(0, \sigma^2)$
Expectation function

$E(X_t) = E(\epsilon_t + \theta_1 \epsilon_{t-1} + \ldots + \theta_q \epsilon_{t-q}) = E(\epsilon_t) + \theta_1 E(\epsilon_{t-1}) + \ldots + \theta_q E(\epsilon_{t-q}) = 0$



ARMA models
MA(q) process

Autocovariance function

$\gamma(s, t) = E\left[(\epsilon_s + \theta_1 \epsilon_{s-1} + \ldots + \theta_q \epsilon_{s-q})(\epsilon_t + \theta_1 \epsilon_{t-1} + \ldots + \theta_q \epsilon_{t-q})\right]$
$= E[\epsilon_s\epsilon_t + \theta_1\epsilon_s\epsilon_{t-1} + \theta_2\epsilon_s\epsilon_{t-2} + \ldots + \theta_q\epsilon_s\epsilon_{t-q}$
$\quad + \theta_1\epsilon_{s-1}\epsilon_t + \theta_1^2\epsilon_{s-1}\epsilon_{t-1} + \theta_1\theta_2\epsilon_{s-1}\epsilon_{t-2} + \ldots + \theta_1\theta_q\epsilon_{s-1}\epsilon_{t-q}$
$\quad + \ldots$
$\quad + \theta_q\epsilon_{s-q}\epsilon_t + \theta_1\theta_q\epsilon_{s-q}\epsilon_{t-1} + \theta_2\theta_q\epsilon_{s-q}\epsilon_{t-2} + \ldots + \theta_q^2\epsilon_{s-q}\epsilon_{t-q}]$

The expectations of the cross products are

$E(\epsilon_s \epsilon_t) = \begin{cases} 0 & \text{for } s \ne t \\ \sigma^2 & \text{for } s = t \end{cases}$



ARMA models
MA(q) process

Define $\theta_0 = 1$; then

$\gamma(t, t) = \sigma^2 \sum_{i=0}^{q} \theta_i^2$
$\gamma(t - 1, t) = \sigma^2 \sum_{i=0}^{q-1} \theta_i \theta_{i+1}$
$\gamma(t - 2, t) = \sigma^2 \sum_{i=0}^{q-2} \theta_i \theta_{i+2}$
$\vdots$
$\gamma(t - q, t) = \sigma^2 \theta_0 \theta_q = \sigma^2 \theta_q$
$\gamma(s, t) = 0$ for $s < t - q$

Hence, MA(q) processes are always stationary

Simulation of MA(q) processes (maqsim.R)



ARMA models
AR(1) process

The AR(1) process is

$\phi(L)X_t = \epsilon_t$
$(1 - \phi_1 L)X_t = \epsilon_t$
$X_t = \phi_1 X_{t-1} + \epsilon_t$

with $\epsilon_t \sim NID(0, \sigma^2)$
Expectation and variance function [6]

Stability condition: AR(1) processes are stable if $|\phi_1| < 1$



ARMA models
AR(1) process

Stationarity: Stable AR(1) processes are weakly stationary if [7]

$E(X_0) = 0$
$Var(X_0) = \dfrac{\sigma^2}{1 - \phi_1^2}$

Nonstationary stable processes converge towards stationarity [8]

It is common parlance to call stable processes stationary

Covariance function of a stationary AR(1) process [9]



ARMA models
AR(p) process

The AR(p) process is

$\phi(L)X_t = \epsilon_t$
$X_t = \phi_1 X_{t-1} + \ldots + \phi_p X_{t-p} + \epsilon_t$

with $\epsilon_t \sim NID(0, \sigma^2)$
Assumption: $\epsilon_t$ is independent of $X_{t-1}, X_{t-2}, \ldots$ (innovations)
Expectation function [10]

The covariance function is complicated (ar2autocov.R)



ARMA models
AR(p) process

AR(p) processes are stable if all roots of the characteristic equation

$\phi(z) = 0$

are larger than 1 in absolute value, $|z_i| > 1$ for $i = 1, \ldots, p$

An AR(p) process is weakly stationary if the joint distribution of the $p$ initial values $(X_0, X_{-1}, \ldots, X_{-(p-1)})$ is appropriate

Stable AR(p) processes converge towards stationarity; they are often called stationary
Simulation of AR(p) processes (arpsim.R)



ARMA models
Invertibility

AR and MA processes can be inverted (into each other)

Example: Consider the stable AR(1) process with $|\phi_1| < 1$

$X_t = \phi_1 X_{t-1} + \epsilon_t$
$= \phi_1(\phi_1 X_{t-2} + \epsilon_{t-1}) + \epsilon_t$
$= \phi_1^2 X_{t-2} + \phi_1 \epsilon_{t-1} + \epsilon_t$
$\vdots$
$= \phi_1^n X_{t-n} + \phi_1^{n-1} \epsilon_{t-(n-1)} + \ldots + \phi_1^2 \epsilon_{t-2} + \phi_1 \epsilon_{t-1} + \epsilon_t$



ARMA models
Invertibility

Since $|\phi_1| < 1$,

$X_t = \sum_{i=0}^{\infty} \phi_1^i \epsilon_{t-i} = \epsilon_t + \psi_1 \epsilon_{t-1} + \psi_2 \epsilon_{t-2} + \ldots$

with $\psi_i = \phi_1^i$
A stable AR(1) process can be written as an MA($\infty$) process (the same is true for stable AR(p) processes)



ARMA models
Invertibility

Using lag polynomials this can be written as

$(1 - \phi_1 L)X_t = \epsilon_t$
$X_t = (1 - \phi_1 L)^{-1} \epsilon_t$
$X_t = \sum_{i=0}^{\infty} (\phi_1 L)^i \epsilon_t$

General compact and elegant notation

$\phi(L)X_t = \epsilon_t$
$X_t = (\phi(L))^{-1} \epsilon_t = \psi(L)\epsilon_t$



ARMA models
Invertibility

An MA(q) process can be written as AR($\infty$) if all roots of $\theta(z) = 0$ are larger than 1 in absolute value (invertibility condition)
Example: MA(1) with $|\theta_1| < 1$; from

$X_t = \epsilon_t + \theta_1 \epsilon_{t-1}$
$\theta_1 X_{t-1} = \theta_1 \epsilon_{t-1} + \theta_1^2 \epsilon_{t-2}$

we find $X_t = \theta_1 X_{t-1} + \epsilon_t - \theta_1^2 \epsilon_{t-2}$
Repeated substitution of the $\epsilon_{t-i}$ terms yields

$X_t = \sum_{i=1}^{\infty} \pi_i X_{t-i} + \epsilon_t$ with $\pi_i = (-1)^{i+1} \theta_1^i$



ARMA models
Invertibility

Summary

ARMA(p, q) processes are stable if all roots of

$\phi(z) = 0$

are larger than 1 in absolute value

ARMA(p, q) processes are invertible if all roots of

$\theta(z) = 0$

are larger than 1 in absolute value



ARMA models
Invertibility

Sometimes (e.g. for proofs) it is useful to write an ARMA(p, q) process either as AR($\infty$) or as MA($\infty$)
ARMA(p, q) can be written as MA($\infty$) or AR($\infty$):

$\phi(L)X_t = \theta(L)\epsilon_t$
$X_t = (\phi(L))^{-1}\theta(L)\epsilon_t$
$(\theta(L))^{-1}\phi(L)X_t = \epsilon_t$



ARMA models
Deterministic components

Until now we only considered processes with zero expectation

Many processes have both a zero-expectation stochastic component $(Y_t)$ and a non-zero deterministic component $(D_t)$
Examples:
linear trend $D_t = a + bt$
exponential trend $D_t = ab^t$
seasonal patterns
Let $(X_t)_{t \in \mathbb{Z}}$ be a stochastic process with deterministic component $D_t$ and define $Y_t = X_t - D_t$



ARMA models
Deterministic components

Then $E(Y_t) = 0$ and

$Cov(Y_t, Y_s) = E[(Y_t - E(Y_t))(Y_s - E(Y_s))]$
$= E[(X_t - D_t - E(X_t - D_t))(X_s - D_s - E(X_s - D_s))]$
$= E[(X_t - E(X_t))(X_s - E(X_s))]$
$= Cov(X_t, X_s)$

The covariance function does not depend on the deterministic component
To derive the covariance function of a stochastic process, simply drop the deterministic component



ARMA models
Deterministic components

Special case: $D_t = \mu_t = \mu$ gives an ARMA(p, q) process with constant (non-zero) expectation

$X_t - \mu = \phi_1(X_{t-1} - \mu) + \ldots + \phi_p(X_{t-p} - \mu) + \epsilon_t + \theta_1 \epsilon_{t-1} + \ldots + \theta_q \epsilon_{t-q}$

The process can also be written as

$X_t = c + \phi_1 X_{t-1} + \ldots + \phi_p X_{t-p} + \epsilon_t + \theta_1 \epsilon_{t-1} + \ldots + \theta_q \epsilon_{t-q}$

where $c = \mu(1 - \phi_1 - \ldots - \phi_p)$



ARMA models
Deterministic components

Wold's representation theorem: Every stationary stochastic process $(X_t)_{t \in T}$ can be represented as

$X_t = \sum_{h=0}^{\infty} \psi_h \epsilon_{t-h} + D_t$

with $\psi_0 = 1$, $\sum_{h=0}^{\infty} \psi_h^2 < \infty$, and $\epsilon_t$ white noise with variance $\sigma^2 > 0$
Stationary stochastic processes can be written as the sum of a deterministic process and an MA($\infty$) process
Often, low order ARMA(p, q) processes can approximate MA($\infty$) processes well



ARMA models
Linear processes and filter

Definition: Linear process

Let $(\epsilon_t)_{t \in \mathbb{Z}}$ be a white noise process; a stochastic process $(X_t)_{t \in \mathbb{Z}}$ is called linear if it can be written as

$X_t = \sum_{h=-\infty}^{\infty} \psi_h \epsilon_{t-h} = \psi(L)\epsilon_t$

where the coefficients are absolutely summable, i.e. $\sum_{h=-\infty}^{\infty} |\psi_h| < \infty$.
The lag polynomial $\psi(L)$ is called a (linear) filter



ARMA models
Linear processes and filter

Some special filters

Change from previous period (difference filter)

$\psi(L) = 1 - L$

Change from last year (for quarterly or monthly data)

$\psi(L) = 1 - L^4$
$\psi(L) = 1 - L^{12}$

Elimination of seasonal influences (quarterly data)

$\psi(L) = (1 + L + L^2 + L^3)/4$
$\psi(L) = 0.125L^{-2} + 0.25L^{-1} + 0.25 + 0.25L + 0.125L^2$



ARMA models
Linear processes and filter

Hodrick-Prescott filter (an important tool in empirical macroeconomics)

Decompose a time series $(X_t)$ into a long-term growth component $(G_t)$ and a short-term cyclical component $(C_t)$

$X_t = G_t + C_t$

Trade-off between goodness-of-fit and smoothness of $G_t$

Minimize the criterion function

$\sum_{t=1}^{T} (X_t - G_t)^2 + \lambda \sum_{t=2}^{T-1} \left[(G_{t+1} - G_t) - (G_t - G_{t-1})\right]^2$

with respect to $G_t$ for given smoothness parameter $\lambda$



ARMA models
Linear processes and filter

The FOCs of the minimization problem are

$(G_1, \ldots, G_T)' = A(X_1, \ldots, X_T)'$

where $A = (I + \lambda K'K)^{-1}$ with

$K = \begin{pmatrix} 1 & -2 & 1 & 0 & 0 & \ldots & 0 & 0 & 0 \\ 0 & 1 & -2 & 1 & 0 & \ldots & 0 & 0 & 0 \\ 0 & 0 & 1 & -2 & 1 & \ldots & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & 0 & \ldots & 1 & -2 & 1 \end{pmatrix}$


ARMA models
Linear processes and filter

The HP filter is a linear filter

Typical values for the smoothing parameter $\lambda$

$\lambda = 10$ annual data
$\lambda = 1600$ quarterly data
$\lambda = 14400$ monthly data

Implementation in R (code by Olaf Posch)

Empirical examples (hpfilter.R)



Estimation of ARMA models
The estimation problem

Problem: The parameters $\phi_1, \ldots, \phi_p, \theta_1, \ldots, \theta_q, \sigma^2$ of an ARMA(p, q) process are usually unknown
They have to be estimated from an observed time series $X_1, \ldots, X_T$
Standard estimation methods:
Least squares (OLS)
Maximum likelihood (ML)
Assumption: the lag orders p and q are known



Estimation of ARMA models
Least squares estimation of AR(p) models

The AR(p) model with non-zero constant expectation

$X_t = c + \phi_1 X_{t-1} + \ldots + \phi_p X_{t-p} + \epsilon_t$

can be written in matrix notation

$\begin{pmatrix} X_{p+1} \\ X_{p+2} \\ \vdots \\ X_T \end{pmatrix} = \begin{pmatrix} 1 & X_p & X_{p-1} & \ldots & X_1 \\ 1 & X_{p+1} & X_p & \ldots & X_2 \\ \vdots & \vdots & \vdots & \ldots & \vdots \\ 1 & X_{T-1} & X_{T-2} & \ldots & X_{T-p} \end{pmatrix} \begin{pmatrix} c \\ \phi_1 \\ \vdots \\ \phi_p \end{pmatrix} + \begin{pmatrix} \epsilon_{p+1} \\ \epsilon_{p+2} \\ \vdots \\ \epsilon_T \end{pmatrix}$

Compact notation: $y = X\beta + u$
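A minimal sketch of this regression in R, building the lagged regressor matrix with embed() (the data x and the order are placeholders):

ar_ols <- function(x, p) {
  M <- embed(x, p + 1)     # column 1 is X_t, columns 2..(p+1) are X_{t-1},...,X_{t-p}
  lm(M[, 1] ~ M[, -1])     # intercept c and coefficients phi_1,...,phi_p
}
# ar_ols(x, p = 2)         # example call for an AR(2)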



Estimation of ARMA models
Least squares estimation of AR(p) models

The standard least squares estimator is

$\hat\beta = (X'X)^{-1}X'y$

The matrix of exogenous variables $X$ is stochastic, so the usual results for OLS regression do not hold
But: There is no contemporaneous correlation between the error term and the exogenous variables
Hence, the OLS estimators are consistent and asymptotically efficient



Estimation of ARMA models
Least squares estimation of ARMA models

Solve the ARMA equation

$X_t = c + \phi_1 X_{t-1} + \ldots + \phi_p X_{t-p} + \epsilon_t + \theta_1 \epsilon_{t-1} + \ldots + \theta_q \epsilon_{t-q}$

for $\epsilon_t$:

$\epsilon_t = X_t - c - \phi_1 X_{t-1} - \ldots - \phi_p X_{t-p} - \theta_1 \epsilon_{t-1} - \ldots - \theta_q \epsilon_{t-q}$

Define the residuals as functions of the unknown parameters

$\hat\epsilon_t(d, f_1, \ldots, f_p, g_1, \ldots, g_q) = X_t - d - f_1 X_{t-1} - \ldots - f_p X_{t-p} - g_1 \hat\epsilon_{t-1} - \ldots - g_q \hat\epsilon_{t-q}$



Estimation of ARMA models
Least squares estimation of ARMA models

Define the sum of squared residuals

$S(d, f_1, \ldots, f_p, g_1, \ldots, g_q) = \sum_{t=1}^{T} \left(\hat\epsilon_t(d, f_1, \ldots, f_p, g_1, \ldots, g_q)\right)^2$

The least squares estimators are

$(\hat c, \hat\phi_1, \ldots, \hat\phi_p, \hat\theta_1, \ldots, \hat\theta_q) = \arg\min S(d, f_1, \ldots, f_p, g_1, \ldots, g_q)$

Since the residuals are defined recursively, one needs starting values $\hat\epsilon_0, \ldots, \hat\epsilon_{-q+1}$ and $X_0, \ldots, X_{-p+1}$ to calculate $\hat\epsilon_1$
Easiest way: Set all starting values to zero (conditional estimation)


Estimation of ARMA models
Least squares estimation of ARMA models

The first order conditions are a nonlinear equation system which cannot be solved easily
Minimization by standard numerical methods (implemented in all usual statistical packages)
Either solve the nonlinear system of first order conditions or minimize $S$ directly
Simple special case: ARMA(1, 1)
arma11.R



Estimation of ARMA models
Maximum likelihood estimation

Additional assumption: The innovations $\epsilon_t$ are normally distributed

Implication: ARMA processes are Gaussian
The joint distribution of $X_1, \ldots, X_T$ is multivariate normal

$X = \begin{pmatrix} X_1 \\ \vdots \\ X_T \end{pmatrix} \sim N(\mu, \Gamma)$



Estimation of ARMA models
Maximum likelihood estimation

Expectation vector

$\mu = E\begin{pmatrix} X_1 \\ \vdots \\ X_T \end{pmatrix} = \begin{pmatrix} c/(1 - \phi_1 - \ldots - \phi_p) \\ \vdots \\ c/(1 - \phi_1 - \ldots - \phi_p) \end{pmatrix}$

Covariance matrix

$\Gamma = Cov\begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_T \end{pmatrix} = \begin{pmatrix} \gamma(0) & \gamma(1) & \ldots & \gamma(T-1) \\ \gamma(1) & \gamma(0) & \ldots & \gamma(T-2) \\ \vdots & \vdots & \ddots & \vdots \\ \gamma(T-1) & \gamma(T-2) & \ldots & \gamma(0) \end{pmatrix}$



Estimation of ARMA models
Maximum likelihood estimation

The expectation vector and the covariance matrix contain all unknown parameters $\theta = (\phi_1, \ldots, \phi_p, \theta_1, \ldots, \theta_q, c, \sigma^2)$
The likelihood function is

$L(\theta; X) = (2\pi)^{-T/2} (\det\Gamma)^{-1/2} \exp\left(-\dfrac{1}{2}(X - \mu)'\Gamma^{-1}(X - \mu)\right)$

and the loglikelihood function is

$\ln L(\theta; X) = -\dfrac{T}{2}\ln(2\pi) - \dfrac{1}{2}\ln(\det\Gamma) - \dfrac{1}{2}(X - \mu)'\Gamma^{-1}(X - \mu)$

The ML estimators are $\hat\theta = \arg\max \ln L(\theta; X)$



Estimation of ARMA models
Maximum likelihood estimation

The loglikelihood function has to be maximized by numerical methods


Standard properties of ML estimators:
1 consistency
2 asymptotic efficiency
3 asymptotically jointly normally distributed
4 the covariance matrix of the estimators can be consistently estimated
Example: ML estimation of an ARMA(3, 3) model for the interest
rate spread (arma33.R)
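As a sketch, the same kind of model can be fit with base R's arima() (the course script arma33.R is not reproduced here; the series x is a placeholder):

fit <- arima(x, order = c(3, 0, 3), method = "ML")  # ARMA(3,3) with constant, d = 0
fit   # ML estimates, standard errors, sigma^2, and the log-likelihood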



Estimation of ARMA models
Hypothesis tests

Since the estimation method is maximum likelihood, the classical tests (Wald, LR, LM) are applicable
General null and alternative hypotheses

$H_0: g(\theta) = 0$
$H_1: \text{not } H_0$

where $g(\theta)$ is an $m$-valued function of the parameters

Example: If $H_0: \phi_1 = 0$ then $m = 1$ and $g(\theta) = \phi_1$



Estimation of ARMA models
Hypothesis tests

Likelihood ratio test statistic

$LR = 2(\ln L(\hat\theta_{ML}) - \ln L(\hat\theta_R))$

where $\hat\theta_{ML}$ and $\hat\theta_R$ are the unrestricted and restricted estimators

Under the null hypothesis

$LR \xrightarrow{d} \chi^2_m$

and $H_0$ is rejected at significance level $\alpha$ if $LR > \chi^2_{m;1-\alpha}$

Disadvantage: Two models must be estimated



Estimation of ARMA models
Hypothesis tests

For the Wald test we only consider $g(\theta) = \theta - \theta_0$, i.e.

$H_0: \theta = \theta_0$
$H_1: \text{not } H_0$

Test statistic

$W = (\hat\theta - \theta_0)'\left[\widehat{Cov}(\hat\theta)\right]^{-1}(\hat\theta - \theta_0)$

If the null hypothesis is true then $W \xrightarrow{d} \chi^2_m$
The asymptotic covariance matrix can be estimated consistently as $\widehat{Cov}(\hat\theta) = \hat H^{-1}$, where $\hat H$ is the Hessian matrix returned by the maximization procedure



Estimation of ARMA models
Hypothesis tests

Test example 1:

$H_0: \phi_1 = 0$
$H_1: \phi_1 \ne 0$

Test example 2:

$H_0: \theta = \theta_0$
$H_1: \text{not } H_0$

Illustration (arma33.R)



Estimation of ARMA models
Model selection

Usually, the lag orders p and q of an ARMA model are unknown

Trade-off: Goodness-of-fit against parsimony
Akaike's information criterion for the model with non-zero expectation

$AIC = \underbrace{\ln\hat\sigma^2}_{\text{goodness-of-fit}} + \underbrace{2(p + q + 1)/T}_{\text{penalty}}$

Choose the model with the smallest AIC



Estimation of ARMA models
Model selection

Bayesian information criterion BIC (Schwarz information criterion)

$BIC = \ln\hat\sigma^2 + (p + q + 1)\ln T / T$

Hannan-Quinn information criterion

$HQ = \ln\hat\sigma^2 + 2(p + q + 1)\ln(\ln T)/T$

Both BIC and HQ are consistent while the AIC tends to overfit
Illustration (arma33.R)
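A hedged sketch of such an order search in R using arima() with AIC() and BIC() (the grid bounds and the series x are ours):

grid <- expand.grid(p = 0:3, q = 0:3)
grid$aic <- grid$bic <- NA
for (i in seq_len(nrow(grid))) {
  fit <- try(arima(x, order = c(grid$p[i], 0, grid$q[i])), silent = TRUE)
  if (!inherits(fit, "try-error")) { grid$aic[i] <- AIC(fit); grid$bic[i] <- BIC(fit) }
}
grid[which.min(grid$aic), ]   # order chosen by AIC
grid[which.min(grid$bic), ]   # order chosen by BIC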



Estimation of ARMA models
Model selection

Another illustration: The true model is ARMA(2, 1) with $X_t = 0.5X_{t-1} + 0.3X_{t-2} + \epsilon_t + 0.7\epsilon_{t-1}$; 1000 samples of size n = 500 were generated; the table shows the model orders p and q as selected by AIC and BIC

      # orders selected by AIC         # orders selected by BIC
p\q    0    1    2    3    4    5       0    1    2    3    4    5
0      0    0    0    0    0    0       0    0    0    0    0    0
1      0   18   64   23   14    6       0  310  167    4    0    0
2      0  171   21   16    5    7       0  503    3    1    0    0
3      0    7   35   58   80   45       1    0    2    1    0    0
4      9    2   12  139   37   44       6    1    0    0    0    0
5     11    6   12   56   46   56       1    0    0    0    0    0



Integrated processes
Difference operator

Define the difference operator

$\Delta = 1 - L,$

then

$\Delta X_t = X_t - X_{t-1}$

Second order differences

$\Delta^2 = \Delta(\Delta) = (1 - L)^2 = 1 - 2L + L^2$

Higher orders $\Delta^n$ are defined in the same way; note that $\Delta^n \ne 1 - L^n$
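In R, these operators correspond to diff(); a short sketch:

dx  <- diff(x)                    # (1 - L) x_t
d2x <- diff(x, differences = 2)   # Delta^2 x_t = (1 - L)^2 x_t
dyx <- diff(x, lag = 4)           # (1 - L^4) x_t: change from last year, quarterly data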



Integrated processes
Definition

Definition: Integrated process

A stochastic process is called integrated of order 1 if

$\Delta X_t = \mu + \psi(L)\epsilon_t$

where $\epsilon_t$ is white noise, $\psi(1) \ne 0$, and $\sum_{j=0}^{\infty} j|\psi_j| < \infty$

Common notation: $X_t \sim I(1)$

I(1) processes are also called difference stationary or unit root processes
Stochastic and deterministic trends
Trend stationary processes are not I(1) (since $\psi(1) = 0$)



Integrated processes
Definition

Stationary processes are sometimes called I(0)

Higher order integration is possible, e.g.

$X_t \sim I(2)$
$\Delta^2 X_t \sim I(0)$

In general, $X_t \sim I(d)$ means that $\Delta^d X_t \sim I(0)$

Most economic time series are either I(0) or I(1)
Some economic time series may be I(2)



Integrated processes
Definition

Example 1: The random walk with drift, $X_t = b + X_{t-1} + \epsilon_t$, is I(1) because

$\Delta X_t = X_t - X_{t-1} = b + \epsilon_t = b + \psi(L)\epsilon_t$

where $\psi_0 = 1$ and $\psi_j = 0$ for $j \ne 0$



Integrated processes
Definition

Example 2: The trend stationary process, $X_t = a + bt + \epsilon_t$, is not I(1) because

$\Delta X_t = b + \epsilon_t - \epsilon_{t-1} = b + \psi(L)\epsilon_t$

with $\psi_0 = 1$, $\psi_1 = -1$ and $\psi_j = 0$ for all other $j$, so that $\psi(1) = 0$



Integrated processes
Definition

Example 3: The AR(2) process

$X_t = b + (1 + \phi)X_{t-1} - \phi X_{t-2} + \epsilon_t$
$(1 - L)(1 - \phi L)X_t = b + \epsilon_t$

is I(1) if $|\phi| < 1$ because $\Delta X_t = \psi(L)(b + \epsilon_t)$ with

$\psi(L) = (1 - \phi L)^{-1} = 1 + \phi L + \phi^2 L^2 + \phi^3 L^3 + \phi^4 L^4 + \ldots$

and thus $\psi(1) = \sum_{i=0}^{\infty} \phi^i = \dfrac{1}{1 - \phi} \ne 0$. The roots of the characteristic equation are $z = 1$ and $z = 1/\phi$


Integrated processes
Definition

Example 4: The process

$X_t = 0.5X_{t-1} - 0.4X_{t-2} + \epsilon_t$

is a stationary (stable) zero expectation AR(2) process; the process

$Y_t = a + bt + X_t$

is trend stationary and I(0) since

$\Delta Y_t = b + \Delta X_t$

with $\Delta X_t = \psi(L)\epsilon_t = (1 - L)(1 - 0.5L + 0.4L^2)^{-1}\epsilon_t$ and therefore $\psi(1) = 0$ (i0andi1.R)



Integrated processes
Definition

Definition: ARIMA process

Let $(\epsilon_t)_{t \in T}$ be a white noise process; the stochastic process $(X_t)_{t \in \mathbb{Z}}$ is called an integrated autoregressive moving-average process of orders p, d and q, or ARIMA(p, d, q), if $\Delta^d X_t$ is an ARMA(p, q) process

$\phi(L)\Delta^d X_t = \theta(L)\epsilon_t$

For $d > 0$ the process is nonstationary (I(d)) even if all roots of $\phi(z) = 0$ are outside the unit circle
Simulation of an ARIMA(p, d, q) process (arimapdqsim.R)



Integrated processes
Deterministic versus stochastic trends

Why is it important to distinguish deterministic and stochastic trends?

Reason 1: Long-term forecasts and forecasting errors


Deterministic trend: The forecasting error variance is bounded
Stochastic trend: The forecasting error variance is unbounded
Illustrations
i0andi1.R



Integrated processes
Deterministic versus stochastic trends

Why is it important to distinguish deterministic and stochastic trends?

Reason 2: Spurious regression


OLS regressions will show spurious relationships between
time series with (deterministic or stochastic) trends
Detrending works if the series have deterministic trends,
but it does not help if the series are integrated
Illustrations
spurious1.R
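A minimal sketch of the spurious regression experiment (spurious1.R itself is not reproduced here; the details below are ours):

set.seed(1)
T <- 500
x <- cumsum(rnorm(T))   # two independent random walks
y <- cumsum(rnorm(T))
summary(lm(y ~ x))      # the coefficient on x is typically highly "significant"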



Integrated processes
Integrated processes and parameter estimation

OLS estimators (and ML estimators) are consistent and asymptotically normal for stationary processes
The asymptotic normality is lost if the processes are integrated
We only look at the very special case

$X_t = \phi_1 X_{t-1} + \epsilon_t$

with $\epsilon_t \sim NID(0, 1)$ and $X_0 = 0$

The AR(1) process is stationary if $|\phi_1| < 1$ and has a unit root if $\phi_1 = 1$


Integrated processes
Integrated processes and parameter estimation

The usual OLS estimator of $\phi_1$ is

$\hat\phi_1 = \dfrac{\sum_{t=1}^{T} X_t X_{t-1}}{\sum_{t=1}^{T} X_{t-1}^2}$

What does the distribution of $\hat\phi_1$ look like?

Influence of $\phi_1$ and $T$
Consistency?
Asymptotic normality?
Illustration (phihat.R)



Integrated processes
Integrated processes and parameter estimation

Consistency and asymptotic normality for I(0) processes ($|\phi_1| < 1$)

$\text{plim}\,\hat\phi_1 = \phi_1$
$\sqrt{T}(\hat\phi_1 - \phi_1) \xrightarrow{d} Z \sim N(0, 1 - \phi_1^2)$

Consistency and asymptotic normality for I(1) processes ($\phi_1 = 1$)

$\text{plim}\,\hat\phi_1 = 1$
$T(\hat\phi_1 - 1) \xrightarrow{d} V$

where $V$ is a nondegenerate, nonnormal random variable

Root-T-consistency and superconsistency

Integrated processes
Unit root tests

It is important to distinguish between trend stationarity and difference stationarity
Test of the hypothesis that a process has a unit root (i.e. is I(1))
Classical approaches: (Augmented) Dickey-Fuller test, Phillips-Perron test
Basic tool: Linear regression

$X_t = \text{deterministics} + \phi X_{t-1} + \epsilon_t$
$\Delta X_t = \text{deterministics} + \underbrace{(\phi - 1)}_{=:\delta} X_{t-1} + \epsilon_t$
Integrated processes
Unit root tests

Null and alternative hypothesis

$H_0: \phi = 1$ (unit root)
$H_1: |\phi| < 1$ (no unit root)

or, equivalently,

$H_0: \delta = 0$ (unit root)
$H_1: \delta < 0$ (no unit root)

Unit root tests are one-sided; explosive processes are ruled out
Rejecting the null hypothesis is evidence in favour of stationarity
If the null hypothesis is not rejected, there could be a unit root

Integrated processes
DF test and ADF test

Dickey-Fuller (DF) and Augmented Dickey-Fuller (ADF) tests

Possible regressions

$X_t = \phi X_{t-1} + \epsilon_t$ or $\Delta X_t = \delta X_{t-1} + \epsilon_t$
$X_t = a + \phi X_{t-1} + \epsilon_t$ or $\Delta X_t = a + \delta X_{t-1} + \epsilon_t$
$X_t = a + bt + \phi X_{t-1} + \epsilon_t$ or $\Delta X_t = a + bt + \delta X_{t-1} + \epsilon_t$

Assumption for the Dickey-Fuller test: no autocorrelation in $\epsilon_t$

If there is autocorrelation in $\epsilon_t$, use the augmented DF test

Integrated processes
DF test and ADF test

Dickey-Fuller regression, case 1: no constant, no trend

$\Delta X_t = \delta X_{t-1} + \epsilon_t$

Null and alternative hypotheses

$H_0: \delta = 0$
$H_1: \delta < 0$

Null hypothesis: stochastic trend without drift

Alternative hypothesis: stationary process around zero

Integrated processes
DF test and ADF test

Dickey-Fuller regression, case 2: constant, no trend

$\Delta X_t = a + \delta X_{t-1} + \epsilon_t$

Null and alternative hypotheses

$H_0: \delta = 0$  or  $H_0: \delta = 0, a = 0$
$H_1: \delta < 0$  or  $H_1: \delta < 0, a \ne 0$

Null hypothesis: stochastic trend without drift

Alternative hypothesis: stationary process around a constant

Integrated processes
DF test and ADF test

Dickey-Fuller regression, case 3: constant and trend

$\Delta X_t = a + bt + \delta X_{t-1} + \epsilon_t$

Null and alternative hypotheses

$H_0: \delta = 0$  or  $\delta = 0, b = 0$
$H_1: \delta < 0$  or  $\delta < 0, b \ne 0$

Null hypothesis: stochastic trend with drift

Alternative hypothesis: trend stationary process

Integrated processes
DF test and ADF test

Dickey-Fuller test statistics for single hypotheses

$\rho$-test: $T\hat\delta$
$\tau$-test: $\hat\delta / \hat\sigma_{\hat\delta}$

The $\tau$-test statistic is computed in the same way as the usual t-test statistic
Reject the null hypothesis if the test statistics are too small
The critical values are not the quantiles of the t-distribution
There are tables with the correct critical values (e.g. Hamilton, table B.6)
Integrated processes
DF test and ADF test

The Dickey-Fuller test statistics for the joint hypotheses are computed in the same way as the usual F-test statistics
Reject the null hypothesis if the test statistic is too large
The critical values are not the quantiles of the F-distribution
There are tables with the correct critical values (e.g. Hamilton, table B.7)
Illustrations (dftest.R)

Integrated processes
DF test and ADF test

If there is autocorrelation in $\epsilon_t$, the DF test does not work (dftest.R)

Augmented Dickey-Fuller test (ADF test) regressions:

$\Delta X_t = c_1 \Delta X_{t-1} + \ldots + c_p \Delta X_{t-p} + \delta X_{t-1} + \epsilon_t$
$\Delta X_t = a + c_1 \Delta X_{t-1} + \ldots + c_p \Delta X_{t-p} + \delta X_{t-1} + \epsilon_t$
$\Delta X_t = a + bt + c_1 \Delta X_{t-1} + \ldots + c_p \Delta X_{t-p} + \delta X_{t-1} + \epsilon_t$

The added lagged differences capture the autocorrelation

The number of lags p must be large enough to make $\epsilon_t$ white noise
The critical values remain the same as in the no-correlation case
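A hedged sketch in R with the urca package (by Pfaff, see the literature list); the type and lag choice are illustrative, not a recommendation:

library(urca)
adf <- ur.df(x, type = "trend", lags = 4)  # case 3: constant and trend
summary(adf)                               # tau statistic and critical values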

Integrated processes
DF test and ADF test

Further interesting topics (but we skip these)

Phillips-Perron test
Structural breaks and unit roots
KPSS test of stationarity

$H_0: X_t \sim I(0)$
$H_1: X_t \sim I(1)$
Integrated processes
Regression with integrated processes

Spurious regression: If $X_t$ and $Y_t$ are independent but both I(1), then the regression

$Y_t = \alpha + \beta X_t + u_t$

will result in an estimated coefficient $\hat\beta$ that is significantly different from 0 with probability approaching 1 as $T \to \infty$
BUT: The regression

$Y_t = \alpha + \beta X_t + u_t$

may be sensible even though $X_t$ and $Y_t$ are I(1)

Cointegration
Integrated processes
Regression with integrated processes

Definition: Cointegration
Two stochastic processes $(X_t)_{t \in T}$ and $(Y_t)_{t \in T}$ are cointegrated if both processes are I(1) and there is a constant $\beta$ such that the process $(Y_t - \beta X_t)$ is I(0)

If $\beta$ is known, cointegration can be tested using a standard unit root test on the process $(Y_t - \beta X_t)$
If $\beta$ is unknown, it can be estimated from the linear regression

$Y_t = \alpha + \beta X_t + u_t$

and cointegration is tested using a modified unit root test on the residual process $(\hat u_t)_{t=1,\ldots,T}$
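A minimal sketch of this two-step procedure, assuming x and y are I(1) series (all names are ours); note that the residual-based test needs cointegration critical values, not the plain DF tables:

step1 <- lm(y ~ x)           # step 1: estimate alpha and beta by OLS
u_hat <- residuals(step1)    # residual process
library(urca)
summary(ur.df(u_hat, type = "none", lags = 4))  # step 2: unit root test on the residuals;
# compare the statistic with the modified (cointegration) critical values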

GARCH models
Conditional expectation

Let $(X, Y)$ be a bivariate random variable with a joint density function; then

$E(X|Y = y) = \int x\, f_{X|Y=y}(x)\, dx$

is the conditional expectation of $X$ given $Y = y$

$E(X|Y)$ denotes a random variable with realization $E(X|Y = y)$ if the random variable $Y$ realizes as $y$
Both $E(X|Y)$ and $E(X|Y = y)$ are called conditional expectation

GARCH models
Conditional variance

Let $(X, Y)$ be a bivariate random variable with a joint density function; then

$Var(X|Y = y) = \int (x - E(X|Y = y))^2 f_{X|Y=y}(x)\, dx$

is the conditional variance of $X$ given $Y = y$

$Var(X|Y)$ denotes a random variable with realization $Var(X|Y = y)$ if the random variable $Y$ realizes as $y$
Both $Var(X|Y = y)$ and $Var(X|Y)$ are called conditional variance

GARCH models
Rules for conditional expectations

1 Law of iterated expectations: $E(E(X|Y)) = E(X)$
2 If $X$ and $Y$ are independent, then $E(X|Y) = E(X)$
3 The condition can be treated like a constant: $E(XY|Y) = Y \cdot E(X|Y)$
4 The conditional expectation is a linear operator. For $a_1, \ldots, a_n \in \mathbb{R}$

$E\left(\sum_{i=1}^{n} a_i X_i \,\middle|\, Y\right) = \sum_{i=1}^{n} a_i E(X_i|Y)$

GARCH models
Basics

Some economic time series show volatility clusters, e.g. stock returns, commodity price changes, inflation rates, . . .
Simple autoregressive models cannot capture volatility clusters since their conditional variance is constant
Example: Stationary AR(1) process, $X_t = \phi X_{t-1} + \epsilon_t$ with $|\phi| < 1$; then

$Var(X_t) = \sigma_X^2 = \dfrac{\sigma^2}{1 - \phi^2},$

and the conditional variance is

$Var(X_t|X_{t-1}) = \sigma^2$

GARCH models
Basics

In the following, we will focus on stock returns


Empirical fact: squared (or absolute) returns are positively
autocorrelated
Implication: Returns are not independent over time
The dependence is nonlinear
How can we model this kind of dependence?

GARCH models
ARCH(1)-process

Definition: ARCH(1) process
The stochastic process $(X_t)_{t \in \mathbb{Z}}$ is called an ARCH(1) process if

$E(X_t|X_{t-1}) = 0$
$Var(X_t|X_{t-1}) = \sigma_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2$

for all $t \in \mathbb{Z}$, with $\alpha_0, \alpha_1 > 0$

Often, an additional assumption is

$X_t | (X_{t-1} = x_{t-1}) \sim N(0, \alpha_0 + \alpha_1 x_{t-1}^2)$

GARCH models
ARCH(1)-process

The unconditional distribution of $X_t$ is a non-normal distribution

Leptokurtosis: The tails are heavier than the tails of the normal distribution
Example of an ARCH(1) process:

$X_t = \sigma_t \epsilon_t$

where $(\epsilon_t)_{t \in \mathbb{Z}}$ is white noise with $\sigma_\epsilon^2 = 1$ and

$\sigma_t = \sqrt{\alpha_0 + \alpha_1 X_{t-1}^2}$

GARCH models
ARCH(1)-process

One can show that [11]

$E(X_t|X_{t-1}) = 0$
$E(X_t) = 0$
$Var(X_t|X_{t-1}) = \alpha_0 + \alpha_1 X_{t-1}^2$
$Var(X_t) = \alpha_0/(1 - \alpha_1)$
$Cov(X_t, X_{t-i}) = 0$ for $i > 0$

Stationarity condition: $0 < \alpha_1 < 1$

The unconditional kurtosis is $3(1 - \alpha_1^2)/(1 - 3\alpha_1^2)$ if $\epsilon_t \sim N(0, 1)$.
If $\alpha_1 > \sqrt{1/3} = 0.57735$, the kurtosis does not exist. [12]

GARCH models
ARCH(1)-process

Squared returns follow [13]

$X_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + v_t$

with $v_t = \sigma_t^2(\epsilon_t^2 - 1)$
Thus, squared returns of an ARCH(1) process follow an AR(1)
The process $(v_t)_{t \in \mathbb{Z}}$ is white noise:

$E(v_t) = 0$
$Var(v_t) = E(v_t^2) = \text{const.}$
$Cov(v_t, v_{t-i}) = 0 \quad (i = 1, 2, \ldots)$

GARCH models
ARCH(1)-process

Simulation of an ARCH(1) process for $t = 1, \ldots, 2500$

Parameters: $\alpha_0 = 0.05$, $\alpha_1 = 0.95$, start value $X_0 = 0$
Conditional distribution: $\epsilon_t \sim N(0, 1)$
archsim.R
Check whether the simulated time series shows the typical stylized facts of return distributions
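A minimal sketch of such a simulation (archsim.R is not reproduced here):

set.seed(1)
n <- 2500; alpha0 <- 0.05; alpha1 <- 0.95
x <- numeric(n); xlag <- 0                         # start value X_0 = 0
eps <- rnorm(n)                                    # epsilon_t ~ N(0, 1)
for (t in 1:n) {
  x[t] <- sqrt(alpha0 + alpha1 * xlag^2) * eps[t]  # X_t = sigma_t * epsilon_t
  xlag <- x[t]
}
acf(x^2)   # squared returns should be positively autocorrelated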

GARCH models
Estimation of an ARCH(1)-process

Of course, we do not know the true values of the model parameters $\alpha_0$ and $\alpha_1$
How can we estimate the unknown parameters $\alpha_0$ and $\alpha_1$ from observations $X_1, \ldots, X_T$?
Because of

$X_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + v_t$

a possible estimation method is OLS

GARCH models
Estimation of an ARCH(1)-process

OLS estimator of $\alpha_1$

$\hat\alpha_1 = \dfrac{\sum_{t=2}^{T} (X_t^2 - \overline{X^2})(X_{t-1}^2 - \overline{X^2})}{\sum_{t=2}^{T} (X_{t-1}^2 - \overline{X^2})^2} = \hat\rho(X_t^2, X_{t-1}^2)$

Careful: These estimators are only consistent if the kurtosis exists (i.e. if $\alpha_1 < \sqrt{1/3}$)
Test of ARCH effects:

$H_0: \alpha_1 = 0$
$H_1: \alpha_1 > 0$

GARCH models
Estimation of an ARCH(1)-process

For $T$ large, under $H_0$

$\sqrt{T}\hat\alpha_1 \approx N(0, 1)$

Reject $H_0$ if $\sqrt{T}\hat\alpha_1 > \Phi^{-1}(1 - \alpha)$
Second version of this test: Consider the $R^2$ of the regression

$X_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + v_t;$

then under $H_0$, approximately,

$T\hat\alpha_1^2 \approx TR^2 \sim \chi_1^2$

Reject $H_0$ if $TR^2 > F_{\chi_1^2}^{-1}(1 - \alpha)$

GARCH models
ARCH(p)-process

Definition: ARCH(p) process
The stochastic process $(X_t)_{t \in \mathbb{Z}}$ is called an ARCH(p) process if

$E(X_t|X_{t-1}, \ldots, X_{t-p}) = 0$
$Var(X_t|X_{t-1}, \ldots, X_{t-p}) = \sigma_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + \ldots + \alpha_p X_{t-p}^2$

for $t \in \mathbb{Z}$, where $\alpha_i \ge 0$ for $i = 0, 1, \ldots, p - 1$ and $\alpha_p > 0$

Often, an additional assumption is that

$X_t|(X_{t-1} = x_{t-1}, \ldots, X_{t-p} = x_{t-p}) \sim N(0, \sigma_t^2)$

GARCH models
ARCH(p)-process

Example of an ARCH(p) process:

$X_t = \sigma_t \epsilon_t$

where $(\epsilon_t)_{t \in \mathbb{Z}}$ is white noise with $\sigma_\epsilon^2 = 1$ and

$\sigma_t = \sqrt{\alpha_0 + \alpha_1 X_{t-1}^2 + \ldots + \alpha_p X_{t-p}^2}$

An ARCH(p) process is weakly stationary if all roots of $1 - \alpha_1 z - \alpha_2 z^2 - \ldots - \alpha_p z^p = 0$ are outside the unit circle
Then, for all $t \in \mathbb{Z}$, $E(X_t) = 0$ and

$Var(X_t) = \dfrac{\alpha_0}{1 - \sum_{i=1}^{p} \alpha_i}$

GARCH models
ARCH(p)-process

If $(X_t)_{t \in \mathbb{Z}}$ is a stationary ARCH(p) process, then $(X_t^2)_{t \in \mathbb{Z}}$ is a stationary AR(p) process

$X_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + \ldots + \alpha_p X_{t-p}^2 + v_t$

As to the error term,

$E(v_t) = 0$
$Var(v_t) = \text{const.}$
$Cov(v_t, v_{t-i}) = 0$ for $i = 1, 2, \ldots$

Simulating an ARCH(p) process is easy

GARCH models
Estimation of ARCH(p) models

OLS estimation of

$X_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + \ldots + \alpha_p X_{t-p}^2 + v_t$

Test of ARCH effects:

$H_0: \alpha_1 = \alpha_2 = \ldots = \alpha_p = 0$ vs $H_1: \text{not } H_0$

Let $R^2$ denote the coefficient of determination of the regression

Under $H_0$, the test statistic $TR^2 \sim \chi_p^2$ (approximately); thus reject $H_0$ if $TR^2 > F_{\chi_p^2}^{-1}(1 - \alpha)$

GARCH models
Maximum likelihood estimation

Basic idea of the maximum likelihood estimation method:

Choose the parameters such that the joint density of the observations

$f_{X_1,\ldots,X_T}(x_1, \ldots, x_T)$

is maximized
Let $X_1, \ldots, X_T$ denote a random sample from $X$
The density $f_X(x; \theta)$ depends on $R$ unknown parameters $\theta = (\theta_1, \ldots, \theta_R)$

GARCH models
Maximum likelihood estimation

ML estimation of $\theta$: Maximize the (log)likelihood function

$L(\theta) = f_{X_1,\ldots,X_T}(x_1, \ldots, x_T; \theta) = \prod_{t=1}^{T} f_X(x_t; \theta)$

$\ln L(\theta) = \sum_{t=1}^{T} \ln f_X(x_t; \theta)$

ML estimate

$\hat\theta = \arg\max_\theta \left[\ln L(\theta)\right]$

GARCH models
Maximum likelihood estimation

Since observations are independent in random samples,

$f_{X_1,\ldots,X_T}(x_1, \ldots, x_T) = \prod_{t=1}^{T} f_{X_t}(x_t)$

or

$\ln f_{X_1,\ldots,X_T}(x_1, \ldots, x_T) = \sum_{t=1}^{T} \ln f_{X_t}(x_t) = \sum_{t=1}^{T} \ln f_X(x_t)$

But: ARCH returns are not independent!

GARCH models
Maximum likelihood estimation

Factorization with dependent observations:

$f_{X_1,\ldots,X_T}(x_1, \ldots, x_T) = f_{X_1}(x_1) \prod_{t=2}^{T} f_{X_t|X_{t-1},\ldots,X_1}(x_t|x_{t-1}, \ldots, x_1)$

or

$\ln f_{X_1,\ldots,X_T}(x_1, \ldots, x_T) = \ln f_{X_1}(x_1) + \sum_{t=2}^{T} \ln f_{X_t|X_{t-1},\ldots,X_1}(x_t|x_{t-1}, \ldots, x_1)$

Hence, for an ARCH(1) process

$f_{X_1,\ldots,X_T}(x_1, \ldots, x_T) = f_{X_1}(x_1) \prod_{t=2}^{T} \dfrac{1}{\sqrt{2\pi\sigma_t^2}} \exp\left(-\dfrac{1}{2}\left(\dfrac{x_t}{\sigma_t}\right)^2\right)$

GARCH models
Maximum likelihood estimation

The marginal density of $X_1$ is complicated but becomes negligible for large $T$ and, therefore, will be dropped from now on
Log-likelihood function (without the initial marginal density)

$\ln L(\alpha_0, \alpha_1|x_1, \ldots, x_T) = -\dfrac{T}{2}\ln 2\pi - \dfrac{1}{2}\sum_{t=2}^{T} \ln\sigma_t^2 - \dfrac{1}{2}\sum_{t=2}^{T} \left(\dfrac{x_t}{\sigma_t}\right)^2$

where $\sigma_t^2 = \alpha_0 + \alpha_1 x_{t-1}^2$

ML estimation of $\alpha_0$ and $\alpha_1$ by numerical maximization of $\ln L(\alpha_0, \alpha_1)$ with respect to $\alpha_0$ and $\alpha_1$
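A hedged sketch of this conditional ML estimation in R with optim() (the function name, starting values, and bounds are ours):

arch1_negloglik <- function(par, x) {
  alpha0 <- par[1]; alpha1 <- par[2]
  T <- length(x)
  sigma2 <- alpha0 + alpha1 * x[1:(T - 1)]^2            # sigma_t^2 for t = 2,...,T
  -sum(dnorm(x[2:T], sd = sqrt(sigma2), log = TRUE))    # negative log-likelihood
}
fit <- optim(c(0.1, 0.5), arch1_negloglik, x = x, method = "L-BFGS-B",
             lower = c(1e-6, 0), upper = c(Inf, 0.999)) # optim() minimizes
fit$par   # estimates of alpha_0 and alpha_1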

GARCH models
GARCH(p,q)-process

Definition: GARCH(p, q) process
The stochastic process $(X_t)_{t \in \mathbb{Z}}$ is called a GARCH(p, q) process if

$E(X_t|X_{t-1}, X_{t-2}, \ldots) = 0$
$Var(X_t|X_{t-1}, X_{t-2}, \ldots) = \sigma_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + \ldots + \alpha_p X_{t-p}^2 + \beta_1 \sigma_{t-1}^2 + \ldots + \beta_q \sigma_{t-q}^2$

for $t \in \mathbb{Z}$ with $\alpha_i, \beta_i \ge 0$

Often, an additional assumption is that

$(X_t|X_{t-1} = x_{t-1}, X_{t-2} = x_{t-2}, \ldots) \sim N(0, \sigma_t^2)$

GARCH models
GARCH(p,q)-process

Conditional variance of a GARCH(1, 1) process

$Var(X_t|X_{t-1}, X_{t-2}, \ldots) = \sigma_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + \beta_1 \sigma_{t-1}^2 = \dfrac{\alpha_0}{1 - \beta_1} + \alpha_1 \sum_{i=1}^{\infty} \beta_1^{i-1} X_{t-i}^2$

Unconditional variance

$Var(X_t) = \dfrac{\alpha_0}{1 - \sum_{i=1}^{p} \alpha_i - \sum_{j=1}^{q} \beta_j}$

GARCH models
GARCH(p,q)-process

Necessary condition for weak stationarity:

$\sum_{i=1}^{p} \alpha_i + \sum_{j=1}^{q} \beta_j < 1$

$(X_t)_{t \in \mathbb{Z}}$ has no autocorrelation

GARCH processes can be written as ARMA(max(p, q), q) processes in the squared returns
Example: GARCH(1, 1) process with $X_t = \sigma_t \epsilon_t$ and $\sigma_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + \beta_1 \sigma_{t-1}^2$

GARCH models
Estimation of GARCH(p,q)-processes

Estimation of the ARMA(max(p, q), q) process in the squared returns

Alternative (and better) method: Maximum likelihood
For a GARCH(1, 1) process

$f_{X_1,\ldots,X_T}(x_1, \ldots, x_T) = f_{X_1}(x_1) \prod_{t=2}^{T} \dfrac{1}{\sqrt{2\pi\sigma_t^2}} \exp\left(-\dfrac{1}{2}\left(\dfrac{x_t}{\sigma_t}\right)^2\right)$

GARCH models
Estimation of GARCH(p,q)-processes

Again, the density of $X_1$ can be neglected

Log-likelihood function

$\ln L(\alpha_0, \alpha_1, \beta_1|x_1, \ldots, x_T) = -\dfrac{T}{2}\ln 2\pi - \dfrac{1}{2}\sum_{t=2}^{T} \ln\sigma_t^2 - \dfrac{1}{2}\sum_{t=2}^{T} \left(\dfrac{x_t}{\sigma_t}\right)^2$

with $\sigma_t^2 = \alpha_0 + \alpha_1 x_{t-1}^2 + \beta_1 \sigma_{t-1}^2$ and $\sigma_1^2 = 0$
ML estimation of $\alpha_0$, $\alpha_1$ and $\beta_1$ by numerical maximization
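In practice a packaged optimizer is convenient; a hedged sketch with the fGarch package (assuming it is installed; the course scripts may use different tools):

library(fGarch)
fit <- garchFit(~ garch(1, 1), data = x, trace = FALSE)  # ML estimation of a GARCH(1,1)
coef(fit)   # mu, omega (= alpha_0), alpha1, beta1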

GARCH models
Estimation of GARCH(p,q)-processes

Conditional h-step forecast of the volatility $\sigma_{t+h}^2$ in a GARCH(1, 1) model

$E(\sigma_{t+h}^2|X_t, X_{t-1}, \ldots) = (\alpha_1 + \beta_1)^h \left(\sigma_t^2 - \dfrac{\alpha_0}{1 - \alpha_1 - \beta_1}\right) + \dfrac{\alpha_0}{1 - \alpha_1 - \beta_1}$

If the process is stationary,

$\lim_{h \to \infty} E(\sigma_{t+h}^2|X_t, X_{t-1}, \ldots) = \dfrac{\alpha_0}{1 - \alpha_1 - \beta_1}$

Simulation of GARCH processes is easy; the estimation can be computer intensive
GARCH models
Residuals of an estimated GARCH(1,1) model

Careful: Residuals are slightly different from what you know from OLS regressions
Estimates: $\hat\alpha_0, \hat\alpha_1, \hat\beta_1, \hat\mu$
From $\sigma_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + \beta_1 \sigma_{t-1}^2$ and $X_t = \mu + \sigma_t \epsilon_t$ we calculate the standardized residuals

$\hat\epsilon_t = \dfrac{X_t - \hat\mu}{\hat\sigma_t} = \dfrac{X_t - \hat\mu}{\sqrt{\hat\alpha_0 + \hat\alpha_1 X_{t-1}^2 + \hat\beta_1 \hat\sigma_{t-1}^2}}$

Histogram of the standardized residuals
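With the fGarch fit sketched above, these standardized residuals are available directly (an assumption about the workflow, not the course code):

res_std <- residuals(fit, standardize = TRUE)  # (X_t - mu_hat) / sigma_hat_t
hist(res_std, breaks = 50, freq = FALSE)       # compare with the N(0,1) density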

GARCH models
AR(p)-ARCH(q)-models

Definition: $(X_t)_{t \in \mathbb{Z}}$ is called an AR(p)-ARCH(q) process if

$X_t = \mu + \phi_1 X_{t-1} + \epsilon_t$
$\sigma_t^2 = \alpha_0 + \alpha_1 \epsilon_{t-1}^2$

where $\epsilon_t \sim N(0, \sigma_t^2)$ (here with p = q = 1)
Mean equation / variance equation
Maximum likelihood estimation

GARCH models
Extensions of the GARCH model

There are a number of possible extensions to the GARCH model:

Empirical fact: Negative shocks have a larger impact on volatility than positive shocks (leverage effect)
News impact curve
Nonnormal innovations, e.g. $t$-distributed $\epsilon_t$

