
Time Series Analysis

Andrea Beccarini

Center for Quantitative Economics

Winter 2013/2014



Introduction
Objectives

Time series are ubiquitous in economics, and very important in macroeconomics and financial economics: GDP, inflation rates, unemployment, interest rates, stock prices

You will learn . . .
the formal mathematical treatment of time series and stochastic processes
what the most important standard models in economics are
how to fit models to real-world time series



Introduction
Prerequisites

Descriptive Statistics
Probability Theory
Statistical Inference



Introduction
Class and material

Class
Class teacher: Sarah Meyer
Time: Tu., 12:00-14:00
Location: CAWM 3
Start: 22 October 2013
Material
Course page on Blackboard
Slides and class material are (or will be) downloadable



Introduction
Literature

Neusser, Klaus (2011), Zeitreihenanalyse in den Wirtschaftswissenschaften, 3rd ed., Teubner, Wiesbaden (available online in the RUB-Netz)
Hamilton, James D. (1994), Time Series Analysis, Princeton University Press, Princeton
Pfaff, Bernhard (2006), Analysis of Integrated and Cointegrated Time Series with R, Springer, New York
Schlittgen, Rainer and Streitberg, Bernd (1997), Zeitreihenanalyse, 7th ed., Oldenbourg, München



Basics
Definition

Definition: Time series


A sequence of observations ordered by time is called a time series

Time series can be univariate or multivariate
Time can be discrete or continuous
The states can be discrete or continuous



Basics
Definition

Typical notations
$x_1, x_2, \ldots, x_T$
or $x(1), x(2), \ldots, x(T)$
or $x_t$, $t = 1, \ldots, T$
or $(x_t)_{t \ge 0}$
This course is about . . .
univariate time series
in discrete time
with continuous states



Basics
Examples

Quarterly GDP Germany, 1991 I to 2012 II



[Figure: GDP (in current billion Euro) plotted against Time, 1995 to 2010]



Basics
Examples

DAX index and log(DAX), 31.12.1964 to 6.4.2009


[Figure: two panels, DAX and logarithm of DAX, plotted against Time, 1970 to 2010]



Basics
Definition

Definition: Stochastic process


A sequence $(X_t)_{t \in T}$ of random variables, all defined on the same probability space $(\Omega, \mathcal{A}, P)$, is called a stochastic process with discrete time parameter (usually $T = \mathbb{N}$ or $T = \mathbb{Z}$)

Short version: A stochastic process is a sequence of random variables


A stochastic process depends on both chance and time



Basics
Definition

Distinguish four cases: both time $t$ and chance $\omega$ can be fixed or variable

$t$ fixed, $\omega$ fixed: $X_t(\omega)$ is a real number
$t$ variable, $\omega$ fixed: $(X_t(\omega))_t$ is a sequence of real numbers (path, realization, trajectory)
$t$ fixed, $\omega$ variable: $X_t(\omega)$ is a random variable
$t$ variable, $\omega$ variable: $(X_t(\omega))_t$ is a stochastic process

process.R



Basics
Examples

Example 1: White noise

$\epsilon_t \sim NID(0, \sigma^2)$

Example 2: Random walk

$X_t = X_{t-1} + \epsilon_t$ with $X_0 = 0$
$\epsilon_t \sim NID(0, \sigma^2)$

Example 3: A random constant

$X_t = Z$
$Z \sim N(0, \sigma^2)$


Basics
Moment functions

Definition: Moment functions


The following functions of time are called moment functions:

$\mu(t) = E(X_t)$ (expectation function)
$\sigma^2(t) = Var(X_t)$ (variance function)
$\gamma(s, t) = Cov(X_s, X_t)$ (covariance function)

Correlation function (autocorrelation function)

$\rho(s, t) = \dfrac{\gamma(s, t)}{\sqrt{\sigma^2(s)}\sqrt{\sigma^2(t)}}$

moments.R [1]



Basics
Estimation of moment functions

Usually, the moment functions are unknown and have to be estimated


Problem: Only a single path (realization) can be observed
$X_1^{(1)} \quad X_1^{(2)} \quad \ldots \quad X_1^{(n)}$
$X_2^{(1)} \quad X_2^{(2)} \quad \ldots \quad X_2^{(n)}$
$\;\vdots \qquad\; \vdots \qquad \ldots \qquad \vdots$
$X_T^{(1)} \quad X_T^{(2)} \quad \ldots \quad X_T^{(n)}$

Can we still estimate the expectation function $\mu(t)$ and the autocovariance function $\gamma(s, t)$? Under which conditions?



Basics
Estimation of moment functions

$X_1^{(1)} \quad X_1^{(2)} \quad \ldots \quad X_1^{(n)}$
$X_2^{(1)} \quad X_2^{(2)} \quad \ldots \quad X_2^{(n)}$
$\;\vdots \qquad\; \vdots \qquad \ldots \qquad \vdots$
$X_T^{(1)} \quad X_T^{(2)} \quad \ldots \quad X_T^{(n)}$

Usually, the expectation function $\mu(t)$ should be estimated by averaging over realizations,

$\hat\mu(t) = \dfrac{1}{n} \sum_{i=1}^{n} X_t^{(i)}$



Basics
Estimation of moment functions

$X_1^{(1)} \quad X_1^{(2)} \quad \ldots \quad X_1^{(n)}$
$X_2^{(1)} \quad X_2^{(2)} \quad \ldots \quad X_2^{(n)}$
$\;\vdots \qquad\; \vdots \qquad \ldots \qquad \vdots$
$X_T^{(1)} \quad X_T^{(2)} \quad \ldots \quad X_T^{(n)}$

Under certain conditions, $\mu(t)$ can be estimated by averaging over time,

$\hat\mu = \dfrac{1}{T} \sum_{t=1}^{T} X_t^{(1)}$



Basics
Estimation of moment functions

$X_1^{(1)} \quad X_1^{(2)} \quad \ldots \quad X_1^{(n)}$
$X_2^{(1)} \quad X_2^{(2)} \quad \ldots \quad X_2^{(n)}$
$\;\vdots \qquad\; \vdots \qquad \ldots \qquad \vdots$
$X_T^{(1)} \quad X_T^{(2)} \quad \ldots \quad X_T^{(n)}$

Usually, the autocovariance $\gamma(t, t+h)$ should be estimated by averaging over realizations,

$\hat\gamma(t, t+h) = \dfrac{1}{n} \sum_{i=1}^{n} (X_t^{(i)} - \hat\mu(t))(X_{t+h}^{(i)} - \hat\mu(t+h))$



Basics
Estimation of moment functions

$X_1^{(1)} \quad X_1^{(2)} \quad \ldots \quad X_1^{(n)}$
$X_2^{(1)} \quad X_2^{(2)} \quad \ldots \quad X_2^{(n)}$
$\;\vdots \qquad\; \vdots \qquad \ldots \qquad \vdots$
$X_T^{(1)} \quad X_T^{(2)} \quad \ldots \quad X_T^{(n)}$

Under certain conditions, $\gamma(t, t+h)$ can be estimated by averaging over time,

$\hat\gamma(t, t+h) = \dfrac{1}{T} \sum_{t=1}^{T-h} (X_t^{(1)} - \hat\mu)(X_{t+h}^{(1)} - \hat\mu)$



Basics
Definition

Moment functions cannot be estimated without additional assumptions since only one path is observed
There are restrictions that allow us to estimate the moment functions
Restriction of the time heterogeneity:
The distribution of $(X_t(\omega))_{t \in T}$ must not be completely different for each $t \in T$
Restriction of the memory:
If the values of the process are coupled too closely over time, the individual observations do not supply any (or only insufficient) information about the distribution



Basics
Restriction of time heterogeneity: Stationarity

Definition: Strong stationarity


Let $(X_t)_{t \in T}$ be a stochastic process, and let $t_1, \ldots, t_n \in T$ be an arbitrary number $n \in \mathbb{N}$ of arbitrary time points.
$(X_t)_{t \in T}$ is called strongly stationary if for arbitrary $h \in \mathbb{Z}$

$P(X_{t_1} \le x_1, \ldots, X_{t_n} \le x_n) = P(X_{t_1+h} \le x_1, \ldots, X_{t_n+h} \le x_n)$

Implication: all univariate marginal distributions are identical



Basics
Restriction of time heterogeneity: Stationarity

Definition: Weak stationarity


$(X_t)_{t \in T}$ is called weakly stationary if
1 the expectation exists and is constant: $E(X_t) = \mu < \infty$ for all $t \in T$
2 the variance exists and is constant: $Var(X_t) = \sigma^2 < \infty$ for all $t \in T$
3 for all $t, s, r \in \mathbb{Z}$ (in the admissible range)

$\gamma(t, s) = \gamma(t + r, s + r)$

Simplified notation for covariance and correlation functions

$\gamma(h) = \gamma(t, t + h)$
$\rho(h) = \rho(t, t + h)$



Basics
Restriction of time heterogeneity: Stationarity

Strong stationarity implies weak stationarity (but only if the first two moments exist)
A stochastic process is called Gaussian if the joint distribution of $X_{t_1}, \ldots, X_{t_n}$ is multivariate normal
For Gaussian processes, weak and strong stationarity coincide
Intuition: An observed time series can be regarded as a realization of a stationary process if a gliding window of appropriate width always displays qualitatively the same picture

stationary.R
Examples [2]



Basics
Restriction of memory: Ergodicity

Definition: Ergodicity (I)


Let $(X_t)_{t \in T}$ be a weakly stationary stochastic process with expectation $\mu$ and autocovariance $\gamma(h)$; define

$\hat\mu_T = \dfrac{1}{T} \sum_{t=1}^{T} X_t$

$(X_t)_{t \in T}$ is called (expectation) ergodic if

$\lim_{T \to \infty} E\left[(\hat\mu_T - \mu)^2\right] = 0$
T



Basics
Restriction of memory: Ergodicity

Definition: Ergodicity (II)


Let $(X_t)_{t \in T}$ be a weakly stationary stochastic process with expectation $\mu$ and autocovariance $\gamma(h)$; define

$\hat\gamma(h) = \dfrac{1}{T} \sum_{t=1}^{T-h} (X_t - \mu)(X_{t+h} - \mu)$

$(X_t)_{t \in T}$ is called (covariance) ergodic if for all $h \in \mathbb{Z}$

$\lim_{T \to \infty} E\left[(\hat\gamma(h) - \gamma(h))^2\right] = 0$
T



Basics
Restriction of memory: Ergodicity

Ergodicity is consistency (in quadratic mean) of the estimators $\hat\mu$ of $\mu$ and $\hat\gamma(h)$ of $\gamma(h)$ for dependent observations
The process $(X_t)_{t \in T}$ is expectation ergodic if $(\gamma(h))_{h \in \mathbb{Z}}$ is absolutely summable, i.e.

$\sum_{h=-\infty}^{\infty} |\gamma(h)| < \infty$

The dependence between far-away observations must be sufficiently small



Basics
Restriction of memory: Ergodicity

Ergodicity condition (for autocovariance): A stationary Gaussian process $(X_t)_{t \in T}$ with absolutely summable autocovariance function $\gamma(h)$ is (autocovariance) ergodic
Under ergodicity, the law of large numbers holds even if the observations are dependent
If the dependence $\gamma(h)$ does not diminish fast enough, the estimators are no longer consistent
Examples [3]



Basics
Estimation of moment functions

Summary of estimators (electricity.R)

$\hat\mu = \bar X_T = \dfrac{1}{T} \sum_{t=1}^{T} X_t$

$\hat\gamma(h) = \dfrac{1}{T} \sum_{t=1}^{T-h} (X_t - \hat\mu)(X_{t+h} - \hat\mu)$

$\hat\rho(h) = \dfrac{\hat\gamma(h)}{\hat\gamma(0)}$

Sometimes, $\hat\gamma(h)$ is defined with factor $1/(T - h)$ instead of $1/T$
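In R, these estimators correspond to mean() and acf(); a short sketch (the course script electricity.R is not reproduced here, so the series is a stand-in):

x <- as.numeric(LakeHuron)                               # any univariate series will do
mu_hat    <- mean(x)                                     # expectation estimator
gamma_hat <- acf(x, type = "covariance", plot = FALSE)   # gamma_hat(h), 1/T convention
rho_hat   <- acf(x, plot = FALSE)                        # rho_hat(h) = gamma_hat(h)/gamma_hat(0)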



Basics
Estimation of moment functions

A closer look at the expectation estimator $\hat\mu$

The estimator is unbiased, i.e. $E(\hat\mu) = \mu$ [4]

The variance of $\hat\mu$ is [5]

$Var(\hat\mu) = \dfrac{\gamma(0)}{T} + \dfrac{2}{T} \sum_{h=1}^{T-1} \left(1 - \dfrac{h}{T}\right) \gamma(h)$

Under ergodicity, for $T \to \infty$

$T \cdot Var(\hat\mu) \to \gamma(0) + 2\sum_{h=1}^{\infty} \gamma(h) = \sum_{h=-\infty}^{\infty} \gamma(h)$



Basics
Estimation of moment functions

For Gaussian processes, $\hat\mu$ is normally distributed,

$\hat\mu \sim N(\mu, Var(\hat\mu))$

and asymptotically

$\sqrt{T}(\hat\mu - \mu) \xrightarrow{d} Z \sim N\left(0, \gamma(0) + 2\sum_{h=1}^{\infty} \gamma(h)\right)$

For non-Gaussian processes, $\hat\mu$ is (often) asymptotically normal,

$\sqrt{T}(\hat\mu - \mu) \xrightarrow{d} Z \sim N\left(0, \gamma(0) + 2\sum_{h=1}^{\infty} \gamma(h)\right)$



Basics
Estimation of moment functions

A closer look at the autocovariance estimators $\hat\gamma(h)$

For Gaussian processes with absolutely summable covariance function,

$\left(\sqrt{T}(\hat\gamma(0) - \gamma(0)), \ldots, \sqrt{T}(\hat\gamma(K) - \gamma(K))\right)'$

is multivariate normal with expectation vector $(0, \ldots, 0)'$ and

$T \cdot Cov(\hat\gamma(h_1), \hat\gamma(h_2)) \to \sum_{r=-\infty}^{\infty} \left(\gamma(r)\gamma(r + h_1 - h_2) + \gamma(r - h_2)\gamma(r + h_1)\right)$



Basics
Estimation of moment functions

A closer look at the autocorrelation estimators $\hat\rho(h)$

For Gaussian processes with absolutely summable covariance function, the random vector

$\left(\sqrt{T}(\hat\rho(1) - \rho(1)), \ldots, \sqrt{T}(\hat\rho(K) - \rho(K))\right)'$

is multivariate normal with expectation vector $(0, \ldots, 0)'$ and a complicated covariance matrix
Be careful: For small to medium sample sizes the autocovariance and autocorrelation estimators are biased!
autocorr.R



Basics
Estimation of moment functions

An important special case for autocorrelation estimators:
Let $(\epsilon_t)$ be a white-noise process with $Var(\epsilon_t) = \sigma^2 < \infty$; then

$E(\hat\rho(h)) = -T^{-1} + O(T^{-2})$

$Cov(\hat\rho(h_1), \hat\rho(h_2)) = \begin{cases} T^{-1} + O(T^{-2}) & \text{for } h_1 = h_2 \\ O(T^{-2}) & \text{else} \end{cases}$

For white-noise processes and long time series, the empirical autocorrelations are approximately independent normal random variables with expectation $-T^{-1}$ and variance $T^{-1}$



Mathematical digression (I)
Complex numbers

Some quadratic equations do not have real solutions, e.g.

$x^2 + 1 = 0$

Still it is possible (and sensible) to define solutions to such equations

The definition in common notation is

$i = \sqrt{-1}$

where $i$ is the number which, when squared, equals $-1$

The number $i$ is called imaginary (i.e. not real)



Mathematical digression (I)
Complex numbers

Other imaginary numbers follow from this definition, e.g.

$\sqrt{-16} = \sqrt{16}\sqrt{-1} = 4i$
$\sqrt{-5} = \sqrt{5}\sqrt{-1} = \sqrt{5}\,i$

Further, it is possible to define numbers that contain both a real part and an imaginary part, e.g. $5 - 8i$ or $a + bi$
Such numbers are called complex and the set of complex numbers is denoted as $\mathbb{C}$
The pair $a + bi$ and $a - bi$ is called conjugate complex



Mathematical digression (I)
Complex numbers

Geometric interpretation:

[Figure: the complex number $a + bi$ as a point in the plane, with real part $a$ on the real axis, imaginary part $b$ on the imaginary axis, and absolute value $r$ as the distance from the origin]



Mathematical digression (I)
Complex numbers

Polar coordinates and Cartesian coordinates

$z = a + bi = r(\cos\phi + i\sin\phi) = re^{i\phi}$

$a = r\cos\phi$
$b = r\sin\phi$
$r = \sqrt{a^2 + b^2}$
$\phi = \arctan\left(\dfrac{b}{a}\right)$



Mathematical digression (I)
Complex numbers

Rules of calculus:
Addition

$(a + bi) + (c + di) = (a + c) + (b + d)i$

Multiplication (Cartesian coordinates)

$(a + bi)(c + di) = (ac - bd) + (ad + bc)i$

Multiplication (polar coordinates)

$r_1 e^{i\phi_1} \cdot r_2 e^{i\phi_2} = r_1 r_2 e^{i(\phi_1 + \phi_2)}$





Mathematical digression (I)
Complex numbers

Addition:

[Figure: vector addition in the complex plane; $a + bi$ and $c + di$ add to $(a + c) + (b + d)i$]





Mathematical digression (I)
Complex numbers

Multiplication:

[Figure: multiplication in the complex plane; the moduli multiply, $r = r_1 r_2$, and the angles add, $\phi = \phi_1 + \phi_2$]



Mathematical digression (I)
Complex numbers

The quadratic equation

$x^2 + px + q = 0$

has the solutions

$x_{1,2} = -\dfrac{p}{2} \pm \sqrt{\dfrac{p^2}{4} - q}$

If $\dfrac{p^2}{4} - q < 0$ the solutions are complex (and conjugate)



Mathematical digression (I)
Complex numbers

Example: The solutions of

$x^2 - 2x + 5 = 0$

are

$x_1 = -\dfrac{(-2)}{2} + \sqrt{\dfrac{(-2)^2}{4} - 5} = 1 + 2i$

and

$x_2 = -\dfrac{(-2)}{2} - \sqrt{\dfrac{(-2)^2}{4} - 5} = 1 - 2i$



Mathematical digression (II)
Linear difference equations

First order difference equation with initial value $x_0$:

$x_t = c + \phi_1 x_{t-1}$

$p$-th order difference equation with initial value $x_0$:

$x_t = c + \phi_1 x_{t-1} + \ldots + \phi_p x_{t-p}$

A sequence $(x_t)_{t=0,1,\ldots}$ that satisfies the difference equation is called a solution of the difference equation
Examples (diffequation.R)



Mathematical digression (II)
Linear difference equations

We only consider the homogeneous case, i.e. $c = 0$

The general solution of the first-order difference equation

$x_t = \phi_1 x_{t-1}$

is

$x_t = A\phi_1^t$

with arbitrary constant $A$, since $x_t = A\phi_1^t = \phi_1 \cdot A\phi_1^{t-1} = \phi_1 x_{t-1}$
The constant is determined by the initial condition, $A = x_0$
The sequence $x_t = A\phi_1^t$ is convergent if and only if $|\phi_1| < 1$



Mathematical digression (II)
Linear difference equations

Solution of the $p$-th order difference equation

$x_t = \phi_1 x_{t-1} + \ldots + \phi_p x_{t-p}$

Let $x_t = Az^{-t}$; then

$Az^{-t} = \phi_1 Az^{-(t-1)} + \ldots + \phi_p Az^{-(t-p)}$
$z^{-t} = \phi_1 z^{-(t-1)} + \ldots + \phi_p z^{-(t-p)}$

and thus

$1 - \phi_1 z - \ldots - \phi_p z^p = 0$

Characteristic polynomial, characteristic equation



Mathematical digression (II)
Linear difference equations

There are $p$ (possibly complex, possibly nondistinct) solutions of the characteristic equation
Denote the solutions (called roots) by $z_1, \ldots, z_p$
If all roots are real and distinct, then

$x_t = A_1 z_1^{-t} + \ldots + A_p z_p^{-t}$

is a solution of the homogeneous difference equation

If there are complex roots, the solution is oscillating
The constants $A_1, \ldots, A_p$ are determined by $p$ initial conditions $(x_0, x_1, \ldots, x_{p-1})$



Mathematical digression (II)
Linear difference equations

Stability condition: The linear difference equation

$x_t = \phi_1 x_{t-1} + \ldots + \phi_p x_{t-p}$

is stable (i.e. convergent) if and only if all roots of the characteristic polynomial

$1 - \phi_1 z - \ldots - \phi_p z^p = 0$

are outside the unit circle, i.e. $|z_i| > 1$ for all $i = 1, \ldots, p$
In R, the stability condition can be checked easily using the commands polyroot (base package) or ArmaRoots (fArma package)
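A minimal sketch with base R's polyroot(); the AR(2) coefficients are made up for illustration:

phi <- c(0.5, 0.3)              # x_t = 0.5 x_{t-1} + 0.3 x_{t-2}
roots <- polyroot(c(1, -phi))   # roots of 1 - 0.5 z - 0.3 z^2 (coefficients in increasing powers)
Mod(roots)                      # moduli of the (possibly complex) roots
all(Mod(roots) > 1)             # TRUE: all roots outside the unit circle, so stable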



ARMA models
Definition

Definition: ARMA process

Let $(\epsilon_t)_{t \in T}$ be a white noise process; the stochastic process

$X_t = \phi_1 X_{t-1} + \ldots + \phi_p X_{t-p} + \epsilon_t + \theta_1 \epsilon_{t-1} + \ldots + \theta_q \epsilon_{t-q}$

with $\phi_p, \theta_q \ne 0$ is called an ARMA(p, q) process

AutoRegressive Moving Average process

ARMA processes are important since every stationary process can be approximated by an ARMA process



ARMA models
Lag operator and lag polynomial

The lag operator is a convenient notational tool

The lag operator $L$ shifts the time index of a stochastic process

$L(X_t)_{t \in T} = (X_{t-1})_{t \in T}$
$LX_t = X_{t-1}$

Rules

$L^2 X_t = L(LX_t) = X_{t-2}$
$L^n X_t = X_{t-n}$
$L^{-1} X_t = X_{t+1}$
$L^0 X_t = X_t$



ARMA models
Lag operator and lag polynomial

Lag polynomial

$A(L) = a_0 + a_1 L + a_2 L^2 + \ldots + a_p L^p$

Example: Let $A(L) = 1 - 0.5L$ and $B(L) = 1 + 4L^2$; then

$C(L) = A(L)B(L) = (1 - 0.5L)(1 + 4L^2) = 1 - 0.5L + 4L^2 - 2L^3$

Lag polynomials can be treated in the same way as ordinary polynomials



ARMA models
Lag operator and lag polynomial

Define the lag polynomials

$\phi(L) = 1 - \phi_1 L - \ldots - \phi_p L^p$
$\theta(L) = 1 + \theta_1 L + \ldots + \theta_q L^q$

The ARMA(p, q) process can be written compactly as

$\phi(L)X_t = \theta(L)\epsilon_t$

Important special cases

MA(q) process: $X_t = \epsilon_t + \theta_1 \epsilon_{t-1} + \ldots + \theta_q \epsilon_{t-q}$
AR(1) process: $X_t = \phi_1 X_{t-1} + \epsilon_t$
AR(p) process: $X_t = \phi_1 X_{t-1} + \ldots + \phi_p X_{t-p} + \epsilon_t$



ARMA models
MA(q) process

The MA(q) process is

$X_t = \theta(L)\epsilon_t$
$X_t = \epsilon_t + \theta_1 \epsilon_{t-1} + \ldots + \theta_q \epsilon_{t-q}$

with $\epsilon_t \sim NID(0, \sigma^2)$
Expectation function

$E(X_t) = E(\epsilon_t + \theta_1 \epsilon_{t-1} + \ldots + \theta_q \epsilon_{t-q}) = E(\epsilon_t) + \theta_1 E(\epsilon_{t-1}) + \ldots + \theta_q E(\epsilon_{t-q}) = 0$



ARMA models
MA(q) process

Autocovariance function

$\gamma(s, t) = E\left[(\epsilon_s + \theta_1 \epsilon_{s-1} + \ldots + \theta_q \epsilon_{s-q})(\epsilon_t + \theta_1 \epsilon_{t-1} + \ldots + \theta_q \epsilon_{t-q})\right]$
$= E[\epsilon_s\epsilon_t + \theta_1\epsilon_s\epsilon_{t-1} + \theta_2\epsilon_s\epsilon_{t-2} + \ldots + \theta_q\epsilon_s\epsilon_{t-q}$
$\quad + \theta_1\epsilon_{s-1}\epsilon_t + \theta_1^2\epsilon_{s-1}\epsilon_{t-1} + \theta_1\theta_2\epsilon_{s-1}\epsilon_{t-2} + \ldots + \theta_1\theta_q\epsilon_{s-1}\epsilon_{t-q}$
$\quad + \ldots$
$\quad + \theta_q\epsilon_{s-q}\epsilon_t + \theta_1\theta_q\epsilon_{s-q}\epsilon_{t-1} + \theta_2\theta_q\epsilon_{s-q}\epsilon_{t-2} + \ldots + \theta_q^2\epsilon_{s-q}\epsilon_{t-q}]$

The expectations of the cross products are

$E(\epsilon_s \epsilon_t) = \begin{cases} 0 & \text{for } s \ne t \\ \sigma^2 & \text{for } s = t \end{cases}$



ARMA models
MA(q) process

Define $\theta_0 = 1$; then

$\gamma(t, t) = \sigma^2 \sum_{i=0}^{q} \theta_i^2$
$\gamma(t - 1, t) = \sigma^2 \sum_{i=0}^{q-1} \theta_i \theta_{i+1}$
$\gamma(t - 2, t) = \sigma^2 \sum_{i=0}^{q-2} \theta_i \theta_{i+2}$
$\vdots$
$\gamma(t - q, t) = \sigma^2 \theta_0 \theta_q = \sigma^2 \theta_q$
$\gamma(s, t) = 0$ for $s < t - q$

Hence, MA(q) processes are always stationary

Simulation of MA(q) processes (maqsim.R)



ARMA models
AR(1) process

The AR(1) process is

$\phi(L)X_t = \epsilon_t$
$(1 - \phi_1 L)X_t = \epsilon_t$
$X_t = \phi_1 X_{t-1} + \epsilon_t$

with $\epsilon_t \sim NID(0, \sigma^2)$
Expectation and variance function [6]

Stability condition: AR(1) processes are stable if $|\phi_1| < 1$



ARMA models
AR(1) process

Stationarity: Stable AR(1) processes are weakly stationary if [7]

$E(X_0) = 0$
$Var(X_0) = \dfrac{\sigma^2}{1 - \phi_1^2}$

Nonstationary stable processes converge towards stationarity [8]

It is common parlance to call stable processes stationary

Covariance function of a stationary AR(1) process [9]



ARMA models
AR(p) process

The AR(p) process is

$\phi(L)X_t = \epsilon_t$
$X_t = \phi_1 X_{t-1} + \ldots + \phi_p X_{t-p} + \epsilon_t$

with $\epsilon_t \sim NID(0, \sigma^2)$
Assumption: $\epsilon_t$ is independent of $X_{t-1}, X_{t-2}, \ldots$ (innovations)
Expectation function [10]

The covariance function is complicated (ar2autocov.R)



ARMA models
AR(p) process

AR(p) processes are stable if all roots of the characteristic equation

$\phi(z) = 0$

are larger than 1 in absolute value, $|z_i| > 1$ for $i = 1, \ldots, p$

An AR(p) process is weakly stationary if the joint distribution of the $p$ initial values $(X_0, X_{-1}, \ldots, X_{-(p-1)})$ is appropriate

Stable AR(p) processes converge towards stationarity; they are often called stationary
Simulation of AR(p) processes (arpsim.R)



ARMA models
Invertibility

AR and MA processes can be inverted (into each other)

Example: Consider the stable AR(1) process with $|\phi_1| < 1$

$X_t = \phi_1 X_{t-1} + \epsilon_t$
$= \phi_1(\phi_1 X_{t-2} + \epsilon_{t-1}) + \epsilon_t$
$= \phi_1^2 X_{t-2} + \phi_1 \epsilon_{t-1} + \epsilon_t$
$\vdots$
$= \phi_1^n X_{t-n} + \phi_1^{n-1} \epsilon_{t-(n-1)} + \ldots + \phi_1^2 \epsilon_{t-2} + \phi_1 \epsilon_{t-1} + \epsilon_t$



ARMA models
Invertibility

Since $|\phi_1| < 1$,

$X_t = \sum_{i=0}^{\infty} \phi_1^i \epsilon_{t-i} = \epsilon_t + \psi_1 \epsilon_{t-1} + \psi_2 \epsilon_{t-2} + \ldots$

with $\psi_i = \phi_1^i$
A stable AR(1) process can be written as an MA($\infty$) process (the same is true for stable AR(p) processes)



ARMA models
Invertibility

Using lag polynomials this can be written as

$(1 - \phi_1 L)X_t = \epsilon_t$
$X_t = (1 - \phi_1 L)^{-1} \epsilon_t$
$X_t = \sum_{i=0}^{\infty} (\phi_1 L)^i \epsilon_t$

General compact and elegant notation

$\phi(L)X_t = \epsilon_t$
$X_t = (\phi(L))^{-1} \epsilon_t = \psi(L)\epsilon_t$



ARMA models
Invertibility

An MA(q) process can be written as AR($\infty$) if all roots of $\theta(z) = 0$ are larger than 1 in absolute value (invertibility condition)
Example: MA(1) with $|\theta_1| < 1$; from

$X_t = \epsilon_t + \theta_1 \epsilon_{t-1}$
$\theta_1 X_{t-1} = \theta_1 \epsilon_{t-1} + \theta_1^2 \epsilon_{t-2}$

we find $X_t = \theta_1 X_{t-1} + \epsilon_t - \theta_1^2 \epsilon_{t-2}$
Repeated substitution of the $\epsilon_{t-i}$ terms yields

$X_t = \sum_{i=1}^{\infty} \pi_i X_{t-i} + \epsilon_t$ with $\pi_i = (-1)^{i+1} \theta_1^i$



ARMA models
Invertibility

Summary

ARMA(p, q) processes are stable if all roots of

$\phi(z) = 0$

are larger than 1 in absolute value

ARMA(p, q) processes are invertible if all roots of

$\theta(z) = 0$

are larger than 1 in absolute value



ARMA models
Invertibility

Sometimes (e.g. for proofs) it is useful to write an ARMA(p, q) process either as AR($\infty$) or as MA($\infty$)
ARMA(p, q) can be written as MA($\infty$) or AR($\infty$):

$\phi(L)X_t = \theta(L)\epsilon_t$
$X_t = (\phi(L))^{-1}\theta(L)\epsilon_t$
$(\theta(L))^{-1}\phi(L)X_t = \epsilon_t$



ARMA models
Deterministic components

Until now we only considered processes with zero expectation

Many processes have both a zero-expectation stochastic component $(Y_t)$ and a non-zero deterministic component $(D_t)$
Examples:
linear trend $D_t = a + bt$
exponential trend $D_t = ab^t$
seasonal patterns
Let $(X_t)_{t \in \mathbb{Z}}$ be a stochastic process with deterministic component $D_t$ and define $Y_t = X_t - D_t$



ARMA models
Deterministic components

Then $E(Y_t) = 0$ and

$Cov(Y_t, Y_s) = E[(Y_t - E(Y_t))(Y_s - E(Y_s))]$
$= E[(X_t - D_t - E(X_t - D_t))(X_s - D_s - E(X_s - D_s))]$
$= E[(X_t - E(X_t))(X_s - E(X_s))]$
$= Cov(X_t, X_s)$

The covariance function does not depend on the deterministic component
To derive the covariance function of a stochastic process, simply drop the deterministic component



ARMA models
Deterministic components

Special case: $D_t = \mu_t = \mu$ gives an ARMA(p, q) process with constant (non-zero) expectation

$X_t - \mu = \phi_1(X_{t-1} - \mu) + \ldots + \phi_p(X_{t-p} - \mu) + \epsilon_t + \theta_1 \epsilon_{t-1} + \ldots + \theta_q \epsilon_{t-q}$

The process can also be written as

$X_t = c + \phi_1 X_{t-1} + \ldots + \phi_p X_{t-p} + \epsilon_t + \theta_1 \epsilon_{t-1} + \ldots + \theta_q \epsilon_{t-q}$

where $c = \mu(1 - \phi_1 - \ldots - \phi_p)$



ARMA models
Deterministic components

Wold's representation theorem: Every stationary stochastic process $(X_t)_{t \in T}$ can be represented as

$X_t = \sum_{h=0}^{\infty} \psi_h \epsilon_{t-h} + D_t$

with $\psi_0 = 1$, $\sum_{h=0}^{\infty} \psi_h^2 < \infty$, and $\epsilon_t$ white noise with variance $\sigma^2 > 0$
Stationary stochastic processes can be written as the sum of a deterministic process and an MA($\infty$) process
Often, low order ARMA(p, q) processes can approximate MA($\infty$) processes well



ARMA models
Linear processes and filter

Definition: Linear process

Let $(\epsilon_t)_{t \in \mathbb{Z}}$ be a white noise process; a stochastic process $(X_t)_{t \in \mathbb{Z}}$ is called linear if it can be written as

$X_t = \sum_{h=-\infty}^{\infty} \psi_h \epsilon_{t-h} = \psi(L)\epsilon_t$

where the coefficients are absolutely summable, i.e. $\sum_{h=-\infty}^{\infty} |\psi_h| < \infty$.
The lag polynomial $\psi(L)$ is called a (linear) filter



ARMA models
Linear processes and filter

Some special filters

Change from previous period (difference filter)

$\psi(L) = 1 - L$

Change from last year (for quarterly or monthly data)

$\psi(L) = 1 - L^4$
$\psi(L) = 1 - L^{12}$

Elimination of seasonal influences (quarterly data)

$\psi(L) = (1 + L + L^2 + L^3)/4$
$\psi(L) = 0.125L^{-2} + 0.25L^{-1} + 0.25 + 0.25L + 0.125L^2$



ARMA models
Linear processes and filter

Hodrick-Prescott filter (an important tool in empirical macroeconomics)

Decompose a time series $(X_t)$ into a long-term growth component $(G_t)$ and a short-term cyclical component $(C_t)$

$X_t = G_t + C_t$

Trade-off between goodness-of-fit and smoothness of $G_t$

Minimize the criterion function

$\sum_{t=1}^{T} (X_t - G_t)^2 + \lambda \sum_{t=2}^{T-1} \left[(G_{t+1} - G_t) - (G_t - G_{t-1})\right]^2$

with respect to $G_t$ for given smoothness parameter $\lambda$



ARMA models
Linear processes and filter

The FOCs of the minimization problem are

$(G_1, \ldots, G_T)' = A(X_1, \ldots, X_T)'$

where $A = (I + \lambda K'K)^{-1}$ with

$K = \begin{pmatrix} 1 & -2 & 1 & 0 & 0 & \ldots & 0 & 0 & 0 \\ 0 & 1 & -2 & 1 & 0 & \ldots & 0 & 0 & 0 \\ 0 & 0 & 1 & -2 & 1 & \ldots & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & 0 & \ldots & 1 & -2 & 1 \end{pmatrix}$


ARMA models
Linear processes and filter

The HP filter is a linear filter

Typical values for the smoothing parameter $\lambda$

$\lambda = 10$ annual data
$\lambda = 1600$ quarterly data
$\lambda = 14400$ monthly data

Implementation in R (code by Olaf Posch)

Empirical examples (hpfilter.R)



Estimation of ARMA models
The estimation problem

Problem: The parameters $\phi_1, \ldots, \phi_p, \theta_1, \ldots, \theta_q, \sigma^2$ of an ARMA(p, q) process are usually unknown
They have to be estimated from an observed time series $X_1, \ldots, X_T$
Standard estimation methods:
Least squares (OLS)
Maximum likelihood (ML)
Assumption: the lag orders p and q are known



Estimation of ARMA models
Least squares estimation of AR(p) models

The AR(p) model with non-zero constant expectation

$X_t = c + \phi_1 X_{t-1} + \ldots + \phi_p X_{t-p} + \epsilon_t$

can be written in matrix notation

$\begin{pmatrix} X_{p+1} \\ X_{p+2} \\ \vdots \\ X_T \end{pmatrix} = \begin{pmatrix} 1 & X_p & X_{p-1} & \ldots & X_1 \\ 1 & X_{p+1} & X_p & \ldots & X_2 \\ \vdots & \vdots & \vdots & \ldots & \vdots \\ 1 & X_{T-1} & X_{T-2} & \ldots & X_{T-p} \end{pmatrix} \begin{pmatrix} c \\ \phi_1 \\ \vdots \\ \phi_p \end{pmatrix} + \begin{pmatrix} \epsilon_{p+1} \\ \epsilon_{p+2} \\ \vdots \\ \epsilon_T \end{pmatrix}$

Compact notation: $y = X\beta + u$
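A minimal sketch of this regression in R, building the lagged regressor matrix with embed() (the data x and the order are placeholders):

ar_ols <- function(x, p) {
  M <- embed(x, p + 1)     # column 1 is X_t, columns 2..(p+1) are X_{t-1},...,X_{t-p}
  lm(M[, 1] ~ M[, -1])     # intercept c and coefficients phi_1,...,phi_p
}
# ar_ols(x, p = 2)         # example call for an AR(2)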



Estimation of ARMA models
Least squares estimation of AR(p) models

The standard least squares estimator is

$\hat\beta = (X'X)^{-1}X'y$

The matrix of exogenous variables $X$ is stochastic, so the usual results for OLS regression do not hold
But: There is no contemporaneous correlation between the error term and the exogenous variables
Hence, the OLS estimators are consistent and asymptotically efficient



Estimation of ARMA models
Least squares estimation of ARMA models

Solve the ARMA equation

$X_t = c + \phi_1 X_{t-1} + \ldots + \phi_p X_{t-p} + \epsilon_t + \theta_1 \epsilon_{t-1} + \ldots + \theta_q \epsilon_{t-q}$

for $\epsilon_t$:

$\epsilon_t = X_t - c - \phi_1 X_{t-1} - \ldots - \phi_p X_{t-p} - \theta_1 \epsilon_{t-1} - \ldots - \theta_q \epsilon_{t-q}$

Define the residuals as functions of the unknown parameters

$\hat\epsilon_t(d, f_1, \ldots, f_p, g_1, \ldots, g_q) = X_t - d - f_1 X_{t-1} - \ldots - f_p X_{t-p} - g_1 \hat\epsilon_{t-1} - \ldots - g_q \hat\epsilon_{t-q}$



Estimation of ARMA models
Least squares estimation of ARMA models

Define the sum of squared residuals

$S(d, f_1, \ldots, f_p, g_1, \ldots, g_q) = \sum_{t=1}^{T} \left(\hat\epsilon_t(d, f_1, \ldots, f_p, g_1, \ldots, g_q)\right)^2$

The least squares estimators are

$(\hat c, \hat\phi_1, \ldots, \hat\phi_p, \hat\theta_1, \ldots, \hat\theta_q) = \arg\min S(d, f_1, \ldots, f_p, g_1, \ldots, g_q)$

Since the residuals are defined recursively, one needs starting values $\hat\epsilon_0, \ldots, \hat\epsilon_{-q+1}$ and $X_0, \ldots, X_{-p+1}$ to calculate $\hat\epsilon_1$
Easiest way: Set all starting values to zero (conditional estimation)


Estimation of ARMA models
Least squares estimation of ARMA models

The first order conditions are a nonlinear equation system which cannot be solved easily
Minimization by standard numerical methods (implemented in all usual statistical packages)
Either solve the nonlinear system of first order conditions or minimize $S$ directly
Simple special case: ARMA(1, 1)
arma11.R



Estimation of ARMA models
Maximum likelihood estimation

Additional assumption: The innovations $\epsilon_t$ are normally distributed

Implication: ARMA processes are Gaussian
The joint distribution of $X_1, \ldots, X_T$ is multivariate normal

$X = \begin{pmatrix} X_1 \\ \vdots \\ X_T \end{pmatrix} \sim N(\mu, \Gamma)$



Estimation of ARMA models
Maximum likelihood estimation

Expectation vector

$\mu = E\begin{pmatrix} X_1 \\ \vdots \\ X_T \end{pmatrix} = \begin{pmatrix} c/(1 - \phi_1 - \ldots - \phi_p) \\ \vdots \\ c/(1 - \phi_1 - \ldots - \phi_p) \end{pmatrix}$

Covariance matrix

$\Gamma = Cov\begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_T \end{pmatrix} = \begin{pmatrix} \gamma(0) & \gamma(1) & \ldots & \gamma(T-1) \\ \gamma(1) & \gamma(0) & \ldots & \gamma(T-2) \\ \vdots & \vdots & \ddots & \vdots \\ \gamma(T-1) & \gamma(T-2) & \ldots & \gamma(0) \end{pmatrix}$



Estimation of ARMA models
Maximum likelihood estimation

The expectation vector and the covariance matrix contain all unknown parameters $\theta = (\phi_1, \ldots, \phi_p, \theta_1, \ldots, \theta_q, c, \sigma^2)$
The likelihood function is

$L(\theta; X) = (2\pi)^{-T/2} (\det\Gamma)^{-1/2} \exp\left(-\dfrac{1}{2}(X - \mu)'\Gamma^{-1}(X - \mu)\right)$

and the loglikelihood function is

$\ln L(\theta; X) = -\dfrac{T}{2}\ln(2\pi) - \dfrac{1}{2}\ln(\det\Gamma) - \dfrac{1}{2}(X - \mu)'\Gamma^{-1}(X - \mu)$

The ML estimators are $\hat\theta = \arg\max \ln L(\theta; X)$



Estimation of ARMA models
Maximum likelihood estimation

The loglikelihood function has to be maximized by numerical methods


Standard properties of ML estimators:
1 consistency
2 asymptotic efficiency
3 asymptotically jointly normally distributed
4 the covariance matrix of the estimators can be consistently estimated
Example: ML estimation of an ARMA(3, 3) model for the interest
rate spread (arma33.R)
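As a sketch, the same kind of model can be fit with base R's arima() (the course script arma33.R is not reproduced here; the series x is a placeholder):

fit <- arima(x, order = c(3, 0, 3), method = "ML")  # ARMA(3,3) with constant, d = 0
fit   # ML estimates, standard errors, sigma^2, and the log-likelihood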



Estimation of ARMA models
Hypothesis tests

Since the estimation method is maximum likelihood, the classical tests (Wald, LR, LM) are applicable
General null and alternative hypotheses

$H_0: g(\theta) = 0$
$H_1: \text{not } H_0$

where $g(\theta)$ is an $m$-valued function of the parameters

Example: If $H_0: \phi_1 = 0$ then $m = 1$ and $g(\theta) = \phi_1$



Estimation of ARMA models
Hypothesis tests

Likelihood ratio test statistic

$LR = 2(\ln L(\hat\theta_{ML}) - \ln L(\hat\theta_R))$

where $\hat\theta_{ML}$ and $\hat\theta_R$ are the unrestricted and restricted estimators

Under the null hypothesis

$LR \xrightarrow{d} \chi^2_m$

and $H_0$ is rejected at significance level $\alpha$ if $LR > \chi^2_{m;1-\alpha}$

Disadvantage: Two models must be estimated



Estimation of ARMA models
Hypothesis tests

For the Wald test we only consider $g(\theta) = \theta - \theta_0$, i.e.

$H_0: \theta = \theta_0$
$H_1: \text{not } H_0$

Test statistic

$W = (\hat\theta - \theta_0)'\left[\widehat{Cov}(\hat\theta)\right]^{-1}(\hat\theta - \theta_0)$

If the null hypothesis is true then $W \xrightarrow{d} \chi^2_m$
The asymptotic covariance matrix can be estimated consistently as $\widehat{Cov}(\hat\theta) = \hat H^{-1}$, where $\hat H$ is the Hessian matrix returned by the maximization procedure



Estimation of ARMA models
Hypothesis tests

Test example 1:

$H_0: \phi_1 = 0$
$H_1: \phi_1 \ne 0$

Test example 2:

$H_0: \theta = \theta_0$
$H_1: \text{not } H_0$

Illustration (arma33.R)



Estimation of ARMA models
Model selection

Usually, the lag orders p and q of an ARMA model are unknown

Trade-off: Goodness-of-fit against parsimony
Akaike's information criterion for the model with non-zero expectation

$AIC = \underbrace{\ln\hat\sigma^2}_{\text{goodness-of-fit}} + \underbrace{2(p + q + 1)/T}_{\text{penalty}}$

Choose the model with the smallest AIC



Estimation of ARMA models
Model selection

Bayesian information criterion BIC (Schwarz information criterion)

$BIC = \ln\hat\sigma^2 + (p + q + 1)\ln T / T$

Hannan-Quinn information criterion

$HQ = \ln\hat\sigma^2 + 2(p + q + 1)\ln(\ln T)/T$

Both BIC and HQ are consistent while the AIC tends to overfit
Illustration (arma33.R)
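A hedged sketch of such an order search in R using arima() with AIC() and BIC() (the grid bounds and the series x are ours):

grid <- expand.grid(p = 0:3, q = 0:3)
grid$aic <- grid$bic <- NA
for (i in seq_len(nrow(grid))) {
  fit <- try(arima(x, order = c(grid$p[i], 0, grid$q[i])), silent = TRUE)
  if (!inherits(fit, "try-error")) { grid$aic[i] <- AIC(fit); grid$bic[i] <- BIC(fit) }
}
grid[which.min(grid$aic), ]   # order chosen by AIC
grid[which.min(grid$bic), ]   # order chosen by BIC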



Estimation of ARMA models
Model selection

Another illustration: The true model is ARMA(2, 1) with $X_t = 0.5X_{t-1} + 0.3X_{t-2} + \epsilon_t + 0.7\epsilon_{t-1}$; 1000 samples of size n = 500 were generated; the table shows the model orders p and q as selected by AIC and BIC

      # orders selected by AIC         # orders selected by BIC
p\q    0    1    2    3    4    5       0    1    2    3    4    5
0      0    0    0    0    0    0       0    0    0    0    0    0
1      0   18   64   23   14    6       0  310  167    4    0    0
2      0  171   21   16    5    7       0  503    3    1    0    0
3      0    7   35   58   80   45       1    0    2    1    0    0
4      9    2   12  139   37   44       6    1    0    0    0    0
5     11    6   12   56   46   56       1    0    0    0    0    0



Integrated processes
Difference operator

Define the difference operator

$\Delta = 1 - L,$

then

$\Delta X_t = X_t - X_{t-1}$

Second order differences

$\Delta^2 = \Delta(\Delta) = (1 - L)^2 = 1 - 2L + L^2$

Higher orders $\Delta^n$ are defined in the same way; note that $\Delta^n \ne 1 - L^n$
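In R, these operators correspond to diff(); a short sketch:

dx  <- diff(x)                    # (1 - L) x_t
d2x <- diff(x, differences = 2)   # Delta^2 x_t = (1 - L)^2 x_t
dyx <- diff(x, lag = 4)           # (1 - L^4) x_t: change from last year, quarterly data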



Integrated processes
Definition

Definition: Integrated process

A stochastic process is called integrated of order 1 if

$\Delta X_t = \mu + \psi(L)\epsilon_t$

where $\epsilon_t$ is white noise, $\psi(1) \ne 0$, and $\sum_{j=0}^{\infty} j|\psi_j| < \infty$

Common notation: $X_t \sim I(1)$

I(1) processes are also called difference stationary or unit root processes
Stochastic and deterministic trends
Trend stationary processes are not I(1) (since $\psi(1) = 0$)



Integrated processes
Definition

Stationary processes are sometimes called I(0)

Higher order integration is possible, e.g.

$X_t \sim I(2)$
$\Delta^2 X_t \sim I(0)$

In general, $X_t \sim I(d)$ means that $\Delta^d X_t \sim I(0)$

Most economic time series are either I(0) or I(1)
Some economic time series may be I(2)



Integrated processes
Definition

Example 1: The random walk with drift, $X_t = b + X_{t-1} + \epsilon_t$, is I(1) because

$\Delta X_t = X_t - X_{t-1} = b + \epsilon_t = b + \psi(L)\epsilon_t$

where $\psi_0 = 1$ and $\psi_j = 0$ for $j \ne 0$



Integrated processes
Definition

Example 2: The trend stationary process, $X_t = a + bt + \epsilon_t$, is not I(1) because

$\Delta X_t = b + \epsilon_t - \epsilon_{t-1} = b + \psi(L)\epsilon_t$

with $\psi_0 = 1$, $\psi_1 = -1$ and $\psi_j = 0$ for all other $j$, so that $\psi(1) = 0$



Integrated processes
Definition

Example 3: The AR(2) process

$X_t = b + (1 + \phi)X_{t-1} - \phi X_{t-2} + \epsilon_t$
$(1 - L)(1 - \phi L)X_t = b + \epsilon_t$

is I(1) if $|\phi| < 1$ because $\Delta X_t = \psi(L)(b + \epsilon_t)$ with

$\psi(L) = (1 - \phi L)^{-1} = 1 + \phi L + \phi^2 L^2 + \phi^3 L^3 + \phi^4 L^4 + \ldots$

and thus $\psi(1) = \sum_{i=0}^{\infty} \phi^i = \dfrac{1}{1 - \phi} \ne 0$. The roots of the characteristic equation are $z = 1$ and $z = 1/\phi$


Integrated processes
Definition

Example 4: The process

$X_t = 0.5X_{t-1} - 0.4X_{t-2} + \epsilon_t$

is a stationary (stable) zero expectation AR(2) process; the process

$Y_t = a + bt + X_t$

is trend stationary and I(0) since

$\Delta Y_t = b + \Delta X_t$

with $\Delta X_t = \psi(L)\epsilon_t = (1 - L)(1 - 0.5L + 0.4L^2)^{-1}\epsilon_t$ and therefore $\psi(1) = 0$ (i0andi1.R)



Integrated processes
Definition

Definition: ARIMA process

Let $(\epsilon_t)_{t \in T}$ be a white noise process; the stochastic process $(X_t)_{t \in \mathbb{Z}}$ is called an integrated autoregressive moving-average process of orders p, d and q, or ARIMA(p, d, q), if $\Delta^d X_t$ is an ARMA(p, q) process

$\phi(L)\Delta^d X_t = \theta(L)\epsilon_t$

For $d > 0$ the process is nonstationary (I(d)) even if all roots of $\phi(z) = 0$ are outside the unit circle
Simulation of an ARIMA(p, d, q) process (arimapdqsim.R)



Integrated processes
Deterministic versus stochastic trends

Why is it important to distinguish deterministic and stochastic trends?

Reason 1: Long-term forecasts and forecasting errors


Deterministic trend: The forecasting error variance is bounded
Stochastic trend: The forecasting error variance is unbounded
Illustrations
i0andi1.R



Integrated processes
Deterministic versus stochastic trends

Why is it important to distinguish deterministic and stochastic trends?

Reason 2: Spurious regression


OLS regressions will show spurious relationships between
time series with (deterministic or stochastic) trends
Detrending works if the series have deterministic trends,
but it does not help if the series are integrated
Illustrations
spurious1.R
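A minimal sketch of the spurious regression experiment (spurious1.R itself is not reproduced here; the details below are ours):

set.seed(1)
T <- 500
x <- cumsum(rnorm(T))   # two independent random walks
y <- cumsum(rnorm(T))
summary(lm(y ~ x))      # the coefficient on x is typically highly "significant"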



Integrated processes
Integrated processes and parameter estimation

OLS estimators (and ML estimators) are consistent and asymptotically normal for stationary processes
The asymptotic normality is lost if the processes are integrated
We only look at the very special case

$X_t = \phi_1 X_{t-1} + \epsilon_t$

with $\epsilon_t \sim NID(0, 1)$ and $X_0 = 0$

The AR(1) process is stationary if $|\phi_1| < 1$ and has a unit root if $\phi_1 = 1$


Integrated processes
Integrated processes and parameter estimation

The usual OLS estimator of $\phi_1$ is

$\hat\phi_1 = \dfrac{\sum_{t=1}^{T} X_t X_{t-1}}{\sum_{t=1}^{T} X_{t-1}^2}$

What does the distribution of $\hat\phi_1$ look like?

Influence of $\phi_1$ and $T$
Consistency?
Asymptotic normality?
Illustration (phihat.R)



Integrated processes
Integrated processes and parameter estimation

Consistency and asymptotic normality for I(0) processes ($|\phi_1| < 1$)

$\text{plim}\,\hat\phi_1 = \phi_1$
$\sqrt{T}(\hat\phi_1 - \phi_1) \xrightarrow{d} Z \sim N(0, 1 - \phi_1^2)$

Consistency and asymptotic normality for I(1) processes ($\phi_1 = 1$)

$\text{plim}\,\hat\phi_1 = 1$
$T(\hat\phi_1 - 1) \xrightarrow{d} V$

where $V$ is a nondegenerate, nonnormal random variable

Root-T-consistency and superconsistency

Integrated processes
Unit root tests

It is important to distinguish between trend stationarity and difference stationarity
Test of the hypothesis that a process has a unit root (i.e. is I(1))
Classical approaches: (Augmented) Dickey-Fuller test, Phillips-Perron test
Basic tool: Linear regression

$X_t = \text{deterministics} + \phi X_{t-1} + \epsilon_t$
$\Delta X_t = \text{deterministics} + \underbrace{(\phi - 1)}_{=:\delta} X_{t-1} + \epsilon_t$
Integrated processes
Unit root tests

Null and alternative hypothesis

$H_0: \phi = 1$ (unit root)
$H_1: |\phi| < 1$ (no unit root)

or, equivalently,

$H_0: \delta = 0$ (unit root)
$H_1: \delta < 0$ (no unit root)

Unit root tests are one-sided; explosive processes are ruled out
Rejecting the null hypothesis is evidence in favour of stationarity
If the null hypothesis is not rejected, there could be a unit root

Integrated processes
DF test and ADF test

Dickey-Fuller (DF) and Augmented Dickey-Fuller (ADF) tests

Possible regressions

$X_t = \phi X_{t-1} + \epsilon_t$ or $\Delta X_t = \delta X_{t-1} + \epsilon_t$
$X_t = a + \phi X_{t-1} + \epsilon_t$ or $\Delta X_t = a + \delta X_{t-1} + \epsilon_t$
$X_t = a + bt + \phi X_{t-1} + \epsilon_t$ or $\Delta X_t = a + bt + \delta X_{t-1} + \epsilon_t$

Assumption for the Dickey-Fuller test: no autocorrelation in $\epsilon_t$

If there is autocorrelation in $\epsilon_t$, use the augmented DF test

Integrated processes
DF test and ADF test

Dickey-Fuller regression, case 1: no constant, no trend

$\Delta X_t = \delta X_{t-1} + \epsilon_t$

Null and alternative hypotheses

$H_0: \delta = 0$
$H_1: \delta < 0$

Null hypothesis: stochastic trend without drift

Alternative hypothesis: stationary process around zero

Integrated processes
DF test and ADF test

Dickey-Fuller regression, case 2: constant, no trend

$\Delta X_t = a + \delta X_{t-1} + \epsilon_t$

Null and alternative hypotheses

$H_0: \delta = 0$  or  $H_0: \delta = 0, a = 0$
$H_1: \delta < 0$  or  $H_1: \delta < 0, a \ne 0$

Null hypothesis: stochastic trend without drift

Alternative hypothesis: stationary process around a constant

Integrated processes
DF test and ADF test

Dickey-Fuller regression, case 3: constant and trend

$\Delta X_t = a + bt + \delta X_{t-1} + \epsilon_t$

Null and alternative hypotheses

$H_0: \delta = 0$  or  $\delta = 0, b = 0$
$H_1: \delta < 0$  or  $\delta < 0, b \ne 0$

Null hypothesis: stochastic trend with drift

Alternative hypothesis: trend stationary process

Integrated processes
DF test and ADF test

Dickey-Fuller test statistics for single hypotheses

$\rho$-test: $T\hat\delta$
$\tau$-test: $\hat\delta / \hat\sigma_{\hat\delta}$

The $\tau$-test statistic is computed in the same way as the usual t-test statistic
Reject the null hypothesis if the test statistics are too small
The critical values are not the quantiles of the t-distribution
There are tables with the correct critical values (e.g. Hamilton, table B.6)
Integrated processes
DF test and ADF test

The Dickey-Fuller test statistics for the joint hypotheses are computed in the same way as the usual F-test statistics
Reject the null hypothesis if the test statistic is too large
The critical values are not the quantiles of the F-distribution
There are tables with the correct critical values (e.g. Hamilton, table B.7)
Illustrations (dftest.R)

Integrated processes
DF test and ADF test

If there is autocorrelation in $\epsilon_t$, the DF test does not work (dftest.R)

Augmented Dickey-Fuller test (ADF test) regressions:

$\Delta X_t = c_1 \Delta X_{t-1} + \ldots + c_p \Delta X_{t-p} + \delta X_{t-1} + \epsilon_t$
$\Delta X_t = a + c_1 \Delta X_{t-1} + \ldots + c_p \Delta X_{t-p} + \delta X_{t-1} + \epsilon_t$
$\Delta X_t = a + bt + c_1 \Delta X_{t-1} + \ldots + c_p \Delta X_{t-p} + \delta X_{t-1} + \epsilon_t$

The added lagged differences capture the autocorrelation

The number of lags p must be large enough to make $\epsilon_t$ white noise
The critical values remain the same as in the no-correlation case
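A hedged sketch in R with the urca package (by Pfaff, see the literature list); the type and lag choice are illustrative, not a recommendation:

library(urca)
adf <- ur.df(x, type = "trend", lags = 4)  # case 3: constant and trend
summary(adf)                               # tau statistic and critical values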

Integrated processes
DF test and ADF test

Further interesting topics (but we skip these)

Phillips-Perron test
Structural breaks and unit roots
KPSS test of stationarity

$H_0: X_t \sim I(0)$
$H_1: X_t \sim I(1)$
Integrated processes
Regression with integrated processes

Spurious regression: If $X_t$ and $Y_t$ are independent but both I(1), then the regression

$Y_t = \alpha + \beta X_t + u_t$

will result in an estimated coefficient $\hat\beta$ that is significantly different from 0 with probability approaching 1 as $T \to \infty$
BUT: The regression

$Y_t = \alpha + \beta X_t + u_t$

may be sensible even though $X_t$ and $Y_t$ are I(1)

Cointegration
Integrated processes
Regression with integrated processes

Definition: Cointegration
Two stochastic processes $(X_t)_{t \in T}$ and $(Y_t)_{t \in T}$ are cointegrated if both processes are I(1) and there is a constant $\beta$ such that the process $(Y_t - \beta X_t)$ is I(0)

If $\beta$ is known, cointegration can be tested using a standard unit root test on the process $(Y_t - \beta X_t)$
If $\beta$ is unknown, it can be estimated from the linear regression

$Y_t = \alpha + \beta X_t + u_t$

and cointegration is tested using a modified unit root test on the residual process $(\hat u_t)_{t=1,\ldots,T}$
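A minimal sketch of this two-step procedure, assuming x and y are I(1) series (all names are ours); note that the residual-based test needs cointegration critical values, not the plain DF tables:

step1 <- lm(y ~ x)           # step 1: estimate alpha and beta by OLS
u_hat <- residuals(step1)    # residual process
library(urca)
summary(ur.df(u_hat, type = "none", lags = 4))  # step 2: unit root test on the residuals;
# compare the statistic with the modified (cointegration) critical values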

GARCH models
Conditional expectation

Let $(X, Y)$ be a bivariate random variable with a joint density function; then

$E(X|Y = y) = \int x\, f_{X|Y=y}(x)\, dx$

is the conditional expectation of $X$ given $Y = y$

$E(X|Y)$ denotes a random variable with realization $E(X|Y = y)$ if the random variable $Y$ realizes as $y$
Both $E(X|Y)$ and $E(X|Y = y)$ are called conditional expectation

GARCH models
Conditional variance

Let $(X, Y)$ be a bivariate random variable with a joint density function; then

$Var(X|Y = y) = \int (x - E(X|Y = y))^2 f_{X|Y=y}(x)\, dx$

is the conditional variance of $X$ given $Y = y$

$Var(X|Y)$ denotes a random variable with realization $Var(X|Y = y)$ if the random variable $Y$ realizes as $y$
Both $Var(X|Y = y)$ and $Var(X|Y)$ are called conditional variance

GARCH models
Rules for conditional expectations

1 Law of iterated expectations: $E(E(X|Y)) = E(X)$
2 If $X$ and $Y$ are independent, then $E(X|Y) = E(X)$
3 The condition can be treated like a constant: $E(XY|Y) = Y \cdot E(X|Y)$
4 The conditional expectation is a linear operator. For $a_1, \ldots, a_n \in \mathbb{R}$

$E\left(\sum_{i=1}^{n} a_i X_i \,\middle|\, Y\right) = \sum_{i=1}^{n} a_i E(X_i|Y)$

GARCH models
Basics

Some economic time series show volatility clusters, e.g. stock returns, commodity price changes, inflation rates, . . .
Simple autoregressive models cannot capture volatility clusters since their conditional variance is constant
Example: Stationary AR(1) process, $X_t = \phi X_{t-1} + \epsilon_t$ with $|\phi| < 1$; then

$Var(X_t) = \sigma_X^2 = \dfrac{\sigma^2}{1 - \phi^2},$

and the conditional variance is

$Var(X_t|X_{t-1}) = \sigma^2$

GARCH models
Basics

In the following, we will focus on stock returns


Empirical fact: squared (or absolute) returns are positively
autocorrelated
Implication: Returns are not independent over time
The dependence is nonlinear
How can we model this kind of dependence?

GARCH models
ARCH(1)-process

Definition: ARCH(1) process
The stochastic process $(X_t)_{t \in \mathbb{Z}}$ is called an ARCH(1) process if

$E(X_t|X_{t-1}) = 0$
$Var(X_t|X_{t-1}) = \sigma_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2$

for all $t \in \mathbb{Z}$, with $\alpha_0, \alpha_1 > 0$

Often, an additional assumption is

$X_t | (X_{t-1} = x_{t-1}) \sim N(0, \alpha_0 + \alpha_1 x_{t-1}^2)$

GARCH models
ARCH(1)-process

The unconditional distribution of $X_t$ is a non-normal distribution

Leptokurtosis: The tails are heavier than the tails of the normal distribution
Example of an ARCH(1) process:

$X_t = \sigma_t \epsilon_t$

where $(\epsilon_t)_{t \in \mathbb{Z}}$ is white noise with $\sigma_\epsilon^2 = 1$ and

$\sigma_t = \sqrt{\alpha_0 + \alpha_1 X_{t-1}^2}$

GARCH models
ARCH(1)-process

One can show that [11]

$E(X_t|X_{t-1}) = 0$
$E(X_t) = 0$
$Var(X_t|X_{t-1}) = \alpha_0 + \alpha_1 X_{t-1}^2$
$Var(X_t) = \alpha_0/(1 - \alpha_1)$
$Cov(X_t, X_{t-i}) = 0$ for $i > 0$

Stationarity condition: $0 < \alpha_1 < 1$

The unconditional kurtosis is $3(1 - \alpha_1^2)/(1 - 3\alpha_1^2)$ if $\epsilon_t \sim N(0, 1)$.
If $\alpha_1 > \sqrt{1/3} = 0.57735$, the kurtosis does not exist. [12]

GARCH models
ARCH(1)-process

Squared returns follow [13]

$X_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + v_t$

with $v_t = \sigma_t^2(\epsilon_t^2 - 1)$
Thus, squared returns of an ARCH(1) process follow an AR(1)
The process $(v_t)_{t \in \mathbb{Z}}$ is white noise:

$E(v_t) = 0$
$Var(v_t) = E(v_t^2) = \text{const.}$
$Cov(v_t, v_{t-i}) = 0 \quad (i = 1, 2, \ldots)$

GARCH models
ARCH(1)-process

Simulation of an ARCH(1) process for $t = 1, \ldots, 2500$

Parameters: $\alpha_0 = 0.05$, $\alpha_1 = 0.95$, start value $X_0 = 0$
Conditional distribution: $\epsilon_t \sim N(0, 1)$
archsim.R
Check whether the simulated time series shows the typical stylized facts of return distributions
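A minimal sketch of such a simulation (archsim.R is not reproduced here):

set.seed(1)
n <- 2500; alpha0 <- 0.05; alpha1 <- 0.95
x <- numeric(n); xlag <- 0                         # start value X_0 = 0
eps <- rnorm(n)                                    # epsilon_t ~ N(0, 1)
for (t in 1:n) {
  x[t] <- sqrt(alpha0 + alpha1 * xlag^2) * eps[t]  # X_t = sigma_t * epsilon_t
  xlag <- x[t]
}
acf(x^2)   # squared returns should be positively autocorrelated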

GARCH models
Estimation of an ARCH(1)-process

Of course, we do not know the true values of the model parameters $\alpha_0$ and $\alpha_1$
How can we estimate the unknown parameters $\alpha_0$ and $\alpha_1$ from observations $X_1, \ldots, X_T$?
Because of

$X_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + v_t$

a possible estimation method is OLS

GARCH models
Estimation of an ARCH(1)-process

OLS estimator of $\alpha_1$

$\hat\alpha_1 = \dfrac{\sum_{t=2}^{T} (X_t^2 - \overline{X^2})(X_{t-1}^2 - \overline{X^2})}{\sum_{t=2}^{T} (X_{t-1}^2 - \overline{X^2})^2} = \hat\rho(X_t^2, X_{t-1}^2)$

Careful: These estimators are only consistent if the kurtosis exists (i.e. if $\alpha_1 < \sqrt{1/3}$)
Test of ARCH effects:

$H_0: \alpha_1 = 0$
$H_1: \alpha_1 > 0$

GARCH models
Estimation of an ARCH(1)-process

For $T$ large, under $H_0$

$\sqrt{T}\hat\alpha_1 \approx N(0, 1)$

Reject $H_0$ if $\sqrt{T}\hat\alpha_1 > \Phi^{-1}(1 - \alpha)$
Second version of this test: Consider the $R^2$ of the regression

$X_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + v_t;$

then under $H_0$, approximately,

$T\hat\alpha_1^2 \approx TR^2 \sim \chi_1^2$

Reject $H_0$ if $TR^2 > F_{\chi_1^2}^{-1}(1 - \alpha)$

GARCH models
ARCH(p)-process

Definition: ARCH(p) process
The stochastic process $(X_t)_{t \in \mathbb{Z}}$ is called an ARCH(p) process if

$E(X_t|X_{t-1}, \ldots, X_{t-p}) = 0$
$Var(X_t|X_{t-1}, \ldots, X_{t-p}) = \sigma_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + \ldots + \alpha_p X_{t-p}^2$

for $t \in \mathbb{Z}$, where $\alpha_i \ge 0$ for $i = 0, 1, \ldots, p - 1$ and $\alpha_p > 0$

Often, an additional assumption is that

$X_t|(X_{t-1} = x_{t-1}, \ldots, X_{t-p} = x_{t-p}) \sim N(0, \sigma_t^2)$

GARCH models
ARCH(p)-process

Example of an ARCH(p) process:

$X_t = \sigma_t \epsilon_t$

where $(\epsilon_t)_{t \in \mathbb{Z}}$ is white noise with $\sigma_\epsilon^2 = 1$ and

$\sigma_t = \sqrt{\alpha_0 + \alpha_1 X_{t-1}^2 + \ldots + \alpha_p X_{t-p}^2}$

An ARCH(p) process is weakly stationary if all roots of $1 - \alpha_1 z - \alpha_2 z^2 - \ldots - \alpha_p z^p = 0$ are outside the unit circle
Then, for all $t \in \mathbb{Z}$, $E(X_t) = 0$ and

$Var(X_t) = \dfrac{\alpha_0}{1 - \sum_{i=1}^{p} \alpha_i}$

GARCH models
ARCH(p)-process

If $(X_t)_{t \in \mathbb{Z}}$ is a stationary ARCH(p) process, then $(X_t^2)_{t \in \mathbb{Z}}$ is a stationary AR(p) process

$X_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + \ldots + \alpha_p X_{t-p}^2 + v_t$

As to the error term,

$E(v_t) = 0$
$Var(v_t) = \text{const.}$
$Cov(v_t, v_{t-i}) = 0$ for $i = 1, 2, \ldots$

Simulating an ARCH(p) process is easy

GARCH models
Estimation of ARCH(p) models

OLS estimation of

$X_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + \ldots + \alpha_p X_{t-p}^2 + v_t$

Test of ARCH effects:

$H_0: \alpha_1 = \alpha_2 = \ldots = \alpha_p = 0$ vs $H_1: \text{not } H_0$

Let $R^2$ denote the coefficient of determination of the regression

Under $H_0$, the test statistic $TR^2 \sim \chi_p^2$ (approximately); thus reject $H_0$ if $TR^2 > F_{\chi_p^2}^{-1}(1 - \alpha)$

GARCH models
Maximum likelihood estimation

Basic idea of the maximum likelihood estimation method:

Choose the parameters such that the joint density of the observations

$f_{X_1,\ldots,X_T}(x_1, \ldots, x_T)$

is maximized
Let $X_1, \ldots, X_T$ denote a random sample from $X$
The density $f_X(x; \theta)$ depends on $R$ unknown parameters $\theta = (\theta_1, \ldots, \theta_R)$

GARCH models
Maximum likelihood estimation

ML estimation of $\theta$: Maximize the (log)likelihood function

$L(\theta) = f_{X_1,\ldots,X_T}(x_1, \ldots, x_T; \theta) = \prod_{t=1}^{T} f_X(x_t; \theta)$

$\ln L(\theta) = \sum_{t=1}^{T} \ln f_X(x_t; \theta)$

ML estimate

$\hat\theta = \arg\max_\theta \left[\ln L(\theta)\right]$

GARCH models
Maximum likelihood estimation

Since observations are independent in random samples,

$f_{X_1,\ldots,X_T}(x_1, \ldots, x_T) = \prod_{t=1}^{T} f_{X_t}(x_t)$

or

$\ln f_{X_1,\ldots,X_T}(x_1, \ldots, x_T) = \sum_{t=1}^{T} \ln f_{X_t}(x_t) = \sum_{t=1}^{T} \ln f_X(x_t)$

But: ARCH returns are not independent!

GARCH models
Maximum likelihood estimation

Factorization with dependent observations:

$f_{X_1,\ldots,X_T}(x_1, \ldots, x_T) = f_{X_1}(x_1) \prod_{t=2}^{T} f_{X_t|X_{t-1},\ldots,X_1}(x_t|x_{t-1}, \ldots, x_1)$

or

$\ln f_{X_1,\ldots,X_T}(x_1, \ldots, x_T) = \ln f_{X_1}(x_1) + \sum_{t=2}^{T} \ln f_{X_t|X_{t-1},\ldots,X_1}(x_t|x_{t-1}, \ldots, x_1)$

Hence, for an ARCH(1) process

$f_{X_1,\ldots,X_T}(x_1, \ldots, x_T) = f_{X_1}(x_1) \prod_{t=2}^{T} \dfrac{1}{\sqrt{2\pi\sigma_t^2}} \exp\left(-\dfrac{1}{2}\left(\dfrac{x_t}{\sigma_t}\right)^2\right)$

GARCH models
Maximum likelihood estimation

The marginal density of $X_1$ is complicated but becomes negligible for large $T$ and, therefore, will be dropped from now on
Log-likelihood function (without the initial marginal density)

$\ln L(\alpha_0, \alpha_1|x_1, \ldots, x_T) = -\dfrac{T}{2}\ln 2\pi - \dfrac{1}{2}\sum_{t=2}^{T} \ln\sigma_t^2 - \dfrac{1}{2}\sum_{t=2}^{T} \left(\dfrac{x_t}{\sigma_t}\right)^2$

where $\sigma_t^2 = \alpha_0 + \alpha_1 x_{t-1}^2$

ML estimation of $\alpha_0$ and $\alpha_1$ by numerical maximization of $\ln L(\alpha_0, \alpha_1)$ with respect to $\alpha_0$ and $\alpha_1$
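A hedged sketch of this conditional ML estimation in R with optim() (the function name, starting values, and bounds are ours):

arch1_negloglik <- function(par, x) {
  alpha0 <- par[1]; alpha1 <- par[2]
  T <- length(x)
  sigma2 <- alpha0 + alpha1 * x[1:(T - 1)]^2            # sigma_t^2 for t = 2,...,T
  -sum(dnorm(x[2:T], sd = sqrt(sigma2), log = TRUE))    # negative log-likelihood
}
fit <- optim(c(0.1, 0.5), arch1_negloglik, x = x, method = "L-BFGS-B",
             lower = c(1e-6, 0), upper = c(Inf, 0.999)) # optim() minimizes
fit$par   # estimates of alpha_0 and alpha_1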

GARCH models
GARCH(p,q)-process

Definition: GARCH(p, q) process
The stochastic process $(X_t)_{t \in \mathbb{Z}}$ is called a GARCH(p, q) process if

$E(X_t|X_{t-1}, X_{t-2}, \ldots) = 0$
$Var(X_t|X_{t-1}, X_{t-2}, \ldots) = \sigma_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + \ldots + \alpha_p X_{t-p}^2 + \beta_1 \sigma_{t-1}^2 + \ldots + \beta_q \sigma_{t-q}^2$

for $t \in \mathbb{Z}$ with $\alpha_i, \beta_i \ge 0$

Often, an additional assumption is that

$(X_t|X_{t-1} = x_{t-1}, X_{t-2} = x_{t-2}, \ldots) \sim N(0, \sigma_t^2)$

GARCH models
GARCH(p,q)-process

Conditional variance of a GARCH(1, 1) process

$Var(X_t|X_{t-1}, X_{t-2}, \ldots) = \sigma_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + \beta_1 \sigma_{t-1}^2 = \dfrac{\alpha_0}{1 - \beta_1} + \alpha_1 \sum_{i=1}^{\infty} \beta_1^{i-1} X_{t-i}^2$

Unconditional variance

$Var(X_t) = \dfrac{\alpha_0}{1 - \sum_{i=1}^{p} \alpha_i - \sum_{j=1}^{q} \beta_j}$

GARCH models
GARCH(p,q)-process

Necessary condition for weak stationarity:

$\sum_{i=1}^{p} \alpha_i + \sum_{j=1}^{q} \beta_j < 1$

$(X_t)_{t \in \mathbb{Z}}$ has no autocorrelation

GARCH processes can be written as ARMA(max(p, q), q) processes in the squared returns
Example: GARCH(1, 1) process with $X_t = \sigma_t \epsilon_t$ and $\sigma_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + \beta_1 \sigma_{t-1}^2$

GARCH models
Estimation of GARCH(p,q)-processes

Estimation of the ARMA(max(p, q), q) process in the squared returns

Alternative (and better) method: Maximum likelihood
For a GARCH(1, 1) process

$f_{X_1,\ldots,X_T}(x_1, \ldots, x_T) = f_{X_1}(x_1) \prod_{t=2}^{T} \dfrac{1}{\sqrt{2\pi\sigma_t^2}} \exp\left(-\dfrac{1}{2}\left(\dfrac{x_t}{\sigma_t}\right)^2\right)$

GARCH models
Estimation of GARCH(p,q)-processes

Again, the density of $X_1$ can be neglected

Log-likelihood function

$\ln L(\alpha_0, \alpha_1, \beta_1|x_1, \ldots, x_T) = -\dfrac{T}{2}\ln 2\pi - \dfrac{1}{2}\sum_{t=2}^{T} \ln\sigma_t^2 - \dfrac{1}{2}\sum_{t=2}^{T} \left(\dfrac{x_t}{\sigma_t}\right)^2$

with $\sigma_t^2 = \alpha_0 + \alpha_1 x_{t-1}^2 + \beta_1 \sigma_{t-1}^2$ and $\sigma_1^2 = 0$
ML estimation of $\alpha_0$, $\alpha_1$ and $\beta_1$ by numerical maximization
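In practice a packaged optimizer is convenient; a hedged sketch with the fGarch package (assuming it is installed; the course scripts may use different tools):

library(fGarch)
fit <- garchFit(~ garch(1, 1), data = x, trace = FALSE)  # ML estimation of a GARCH(1,1)
coef(fit)   # mu, omega (= alpha_0), alpha1, beta1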

GARCH models
Estimation of GARCH(p,q)-processes

Conditional h-step forecast of the volatility $\sigma_{t+h}^2$ in a GARCH(1, 1) model

$E(\sigma_{t+h}^2|X_t, X_{t-1}, \ldots) = (\alpha_1 + \beta_1)^h \left(\sigma_t^2 - \dfrac{\alpha_0}{1 - \alpha_1 - \beta_1}\right) + \dfrac{\alpha_0}{1 - \alpha_1 - \beta_1}$

If the process is stationary,

$\lim_{h \to \infty} E(\sigma_{t+h}^2|X_t, X_{t-1}, \ldots) = \dfrac{\alpha_0}{1 - \alpha_1 - \beta_1}$

Simulation of GARCH processes is easy; the estimation can be computer intensive
GARCH models
Residuals of an estimated GARCH(1,1) model

Careful: Residuals are slightly different from what you know from OLS regressions
Estimates: $\hat\alpha_0, \hat\alpha_1, \hat\beta_1, \hat\mu$
From $\sigma_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + \beta_1 \sigma_{t-1}^2$ and $X_t = \mu + \sigma_t \epsilon_t$ we calculate the standardized residuals

$\hat\epsilon_t = \dfrac{X_t - \hat\mu}{\hat\sigma_t} = \dfrac{X_t - \hat\mu}{\sqrt{\hat\alpha_0 + \hat\alpha_1 X_{t-1}^2 + \hat\beta_1 \hat\sigma_{t-1}^2}}$

Histogram of the standardized residuals
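With the fGarch fit sketched above, these standardized residuals are available directly (an assumption about the workflow, not the course code):

res_std <- residuals(fit, standardize = TRUE)  # (X_t - mu_hat) / sigma_hat_t
hist(res_std, breaks = 50, freq = FALSE)       # compare with the N(0,1) density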

GARCH models
AR(p)-ARCH(q)-models

Definition: $(X_t)_{t \in \mathbb{Z}}$ is called an AR(p)-ARCH(q) process if

$X_t = \mu + \phi_1 X_{t-1} + \epsilon_t$
$\sigma_t^2 = \alpha_0 + \alpha_1 \epsilon_{t-1}^2$

where $\epsilon_t \sim N(0, \sigma_t^2)$ (here with p = q = 1)
Mean equation / variance equation
Maximum likelihood estimation

GARCH models
Extensions of the GARCH model

There are a number of possible extensions to the GARCH model:

Empirical fact: Negative shocks have a larger impact on volatility than positive shocks (leverage effect)
News impact curve
Nonnormal innovations, e.g. $t$-distributed $\epsilon_t$

