# 6.3.

The purpose of forecasting is to predict future values of a TS based on the data collected to the present. In this section we will discuss linear functions of $\boldsymbol{X} = (X_n, X_{n-1}, \ldots, X_1)^T$ predicting a future value $X_{n+m}$ for $m = 1, 2, \ldots$. We call a function
$$ f^{(n)}(\boldsymbol{X}) = \alpha_0 + \alpha_1 X_n + \ldots + \alpha_n X_1 = \alpha_0 + \sum_{i=1}^{n} \alpha_i X_{n+1-i} $$
the best linear predictor (BLP) of $X_{n+m}$ if it minimizes the prediction error
$$ S(\boldsymbol{\alpha}) = E\big[X_{n+m} - f^{(n)}(\boldsymbol{X})\big]^2, $$
where $\boldsymbol{\alpha}$ is the vector of the coefficients $\alpha_i$ and $\boldsymbol{X}$ is the vector of variables $X_{n+1-i}$.
Since $S(\boldsymbol{\alpha})$ is a quadratic function of $\boldsymbol{\alpha}$ and is bounded below by zero, there is at least one value of $\boldsymbol{\alpha}$ that minimizes $S(\boldsymbol{\alpha})$. It satisfies the equations
$$ \frac{\partial S(\boldsymbol{\alpha})}{\partial \alpha_i} = 0, \qquad i = 0, 1, \ldots, n. $$

Evaluation of the derivatives gives the so-called prediction equations
$$ \frac{\partial S(\boldsymbol{\alpha})}{\partial \alpha_0} = -2\,E\Big[X_{n+m} - \alpha_0 - \sum_{i=1}^{n} \alpha_i X_{n+1-i}\Big] = 0, \tag{6.19} $$
$$ \frac{\partial S(\boldsymbol{\alpha})}{\partial \alpha_j} = -2\,E\Big[\Big(X_{n+m} - \alpha_0 - \sum_{i=1}^{n} \alpha_i X_{n+1-i}\Big) X_{n+1-j}\Big] = 0, \qquad j = 1, \ldots, n. $$
Assuming that $E(X_t) = \mu$, the first equation can be written as
$$ \mu - \alpha_0 - \mu \sum_{i=1}^{n} \alpha_i = 0, $$
which gives
$$ \alpha_0 = \mu \Big(1 - \sum_{i=1}^{n} \alpha_i\Big). \tag{6.20} $$

## CHAPTER 6. ARMA MODELS

Using (6.20), the remaining equations in (6.19) give, for $j = 1, \ldots, n$,
$$\begin{aligned}
0 &= E(X_{n+m} X_{n+1-j}) - \alpha_0 \mu - \sum_{i=1}^{n} \alpha_i E(X_{n+1-i} X_{n+1-j}) \\
  &= E(X_{n+m} X_{n+1-j}) - \mu^2 \Big(1 - \sum_{i=1}^{n} \alpha_i\Big) - \sum_{i=1}^{n} \alpha_i E(X_{n+1-i} X_{n+1-j}) \\
  &= \gamma\big(m - (1 - j)\big) - \sum_{i=1}^{n} \alpha_i \gamma(i - j).
\end{aligned}$$

That is, we obtain the following form of the prediction equations:
$$ \gamma(m - 1 + j) = \sum_{i=1}^{n} \alpha_i \gamma(i - j), \qquad j = 1, \ldots, n. \tag{6.21} $$

We obtain the same set of equations when $E(X_t) = 0$. Hence, we assume further that the TS is a zero-mean stationary process. Then $\alpha_0 = 0$ too.

Given $\{X_1, \ldots, X_n\}$ we want to forecast the value of $X_{n+1}$. The BLP of $X_{n+1}$ is
$$ f^{(n)}(\boldsymbol{X}) = \sum_{i=1}^{n} \alpha_i X_{n+1-i}. $$

The coefficients $\alpha_i$ satisfy (6.21) with $m = 1$, that is,
$$ \sum_{i=1}^{n} \alpha_i \gamma(i - j) = \gamma(j), \qquad j = 1, 2, \ldots, n. $$

A convenient way of writing these equations is in matrix notation. We have
$$ \Gamma_n \boldsymbol{\alpha}_n = \boldsymbol{\gamma}_n, \tag{6.22} $$
where
$$ \Gamma_n = \{\gamma(i - j)\}_{i,j=1,\ldots,n}, \qquad \boldsymbol{\alpha}_n = (\alpha_1, \ldots, \alpha_n)^T, \qquad \boldsymbol{\gamma}_n = (\gamma(1), \ldots, \gamma(n))^T. $$
If $\Gamma_n$ is nonsingular, then the unique solution to (6.22) exists and is equal to
$$ \boldsymbol{\alpha}_n = \Gamma_n^{-1} \boldsymbol{\gamma}_n. \tag{6.23} $$

Hence, the one-step-ahead predictor is
$$ \hat{X}_{n+1}^{(n)} = \boldsymbol{\alpha}_n^T \boldsymbol{X}, \tag{6.24} $$
and its mean square prediction error is
$$\begin{aligned}
P_{n+1}^{(n)} &= E\big(X_{n+1} - \hat{X}_{n+1}^{(n)}\big)^2 \\
  &= E\big(X_{n+1} - \boldsymbol{\alpha}_n^T \boldsymbol{X}\big)^2 \\
  &= E\big(X_{n+1} - \boldsymbol{\gamma}_n^T \Gamma_n^{-1} \boldsymbol{X}\big)^2 \\
  &= E\big(X_{n+1}^2 - 2\,\boldsymbol{\gamma}_n^T \Gamma_n^{-1} \boldsymbol{X} X_{n+1} + \boldsymbol{\gamma}_n^T \Gamma_n^{-1} \boldsymbol{X}\boldsymbol{X}^T \Gamma_n^{-1} \boldsymbol{\gamma}_n\big) \\
  &= \gamma(0) - 2\,\boldsymbol{\gamma}_n^T \Gamma_n^{-1} \boldsymbol{\gamma}_n + \boldsymbol{\gamma}_n^T \Gamma_n^{-1} \Gamma_n \Gamma_n^{-1} \boldsymbol{\gamma}_n \\
  &= \gamma(0) - \boldsymbol{\gamma}_n^T \Gamma_n^{-1} \boldsymbol{\gamma}_n. \tag{6.25}
\end{aligned}$$
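The solution (6.23) and the error (6.25) are easy to compute numerically. The following sketch (an illustration, not part of the text) solves the system (6.22) for a given ACVF; the AR(1) autocovariance $\gamma(h) = \sigma^2 \phi^{|h|}/(1-\phi^2)$ used in the demonstration is a standard fact assumed here for checking, since then the BLP reduces to $\phi X_n$ with error $\sigma^2$.

```python
import numpy as np

def blp_one_step(gamma, x):
    """One-step best linear predictor and its MSE from an ACVF.

    gamma : autocovariances gamma(0), ..., gamma(n)
    x     : observed values x_1, ..., x_n (oldest first)
    Solves Gamma_n alpha_n = gamma_n, (6.22)-(6.23), and returns
    (x_hat_{n+1}, P_{n+1}) as in (6.24)-(6.25).
    """
    n = len(x)
    Gam = np.array([[gamma[abs(i - j)] for j in range(n)] for i in range(n)])
    gam = np.array([gamma[h] for h in range(1, n + 1)])
    alpha = np.linalg.solve(Gam, gam)
    # X = (X_n, X_{n-1}, ..., X_1): reverse the sample so the newest is first
    x_hat = alpha @ x[::-1]
    mse = gamma[0] - gam @ np.linalg.solve(Gam, gam)
    return x_hat, mse

# Check with an AR(1) ACVF: gamma(h) = sigma^2 * phi^h / (1 - phi^2);
# the BLP should be phi * x_n = 0.48 with MSE sigma^2 = 1.0
phi, sigma2 = 0.6, 1.0
gamma = [sigma2 * phi**h / (1 - phi**2) for h in range(6)]
x = np.array([0.5, -0.2, 1.1, 0.3, 0.8])
print(blp_one_step(gamma, x))
```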

## Example 6.6. Prediction for an AR(2)

Let
$$ X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + Z_t $$
be a causal AR(2) process. Suppose we have one observation $X_1$. Then the one-step-ahead prediction function is
$$ f^{(1)}(\boldsymbol{X}) = \alpha_1 X_1, $$
where
$$ \alpha_1 = \Gamma_1^{-1} \gamma_1 = \frac{\gamma(1)}{\gamma(0)} = \rho(1) = \phi_{11}, $$
and we obtain
$$ \hat{X}_2^{(1)} = \rho(1) X_1 = \phi_{11} X_1. $$
To predict $X_3$ based on $X_2$ and $X_1$ we need to calculate $\alpha_1$ and $\alpha_2$ in the prediction function
$$ f^{(2)}(\boldsymbol{X}) = \alpha_1 X_2 + \alpha_2 X_1. $$

These can be obtained from (6.23) as
$$\begin{aligned}
\begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix}
&= \begin{pmatrix} \gamma(0) & \gamma(1) \\ \gamma(1) & \gamma(0) \end{pmatrix}^{-1}
   \begin{pmatrix} \gamma(1) \\ \gamma(2) \end{pmatrix}
 = \frac{1}{\gamma^2(0) - \gamma^2(1)}
   \begin{pmatrix} \gamma(0) & -\gamma(1) \\ -\gamma(1) & \gamma(0) \end{pmatrix}
   \begin{pmatrix} \gamma(1) \\ \gamma(2) \end{pmatrix} \\[1ex]
&= \frac{1}{\gamma^2(0) - \gamma^2(1)}
   \begin{pmatrix} \gamma(0)\gamma(1) - \gamma(1)\gamma(2) \\ \gamma(0)\gamma(2) - \gamma^2(1) \end{pmatrix}
 = \begin{pmatrix} \dfrac{\gamma(1)\big(\gamma(0) - \gamma(2)\big)}{\gamma^2(0) - \gamma^2(1)} \\[2ex] \dfrac{\gamma(0)\gamma(2) - \gamma^2(1)}{\gamma^2(0) - \gamma^2(1)} \end{pmatrix}
 = \begin{pmatrix} \dfrac{\rho(1)\big(1 - \rho(2)\big)}{1 - \rho^2(1)} \\[2ex] \dfrac{\rho(2) - \rho^2(1)}{1 - \rho^2(1)} \end{pmatrix}.
\end{aligned}$$

From the difference equations (6.17) calculated in Example 6.4 we know that
$$ \rho(1) = \frac{\phi_1}{1 - \phi_2}, \qquad \gamma(2) - \phi_1 \gamma(1) - \phi_2 \gamma(0) = 0. $$
That is,
$$ \rho(2) = \phi_1 \rho(1) + \phi_2. $$
Substituting these into the expressions above finally gives
$$ \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} = \begin{pmatrix} \phi_1 \\ \phi_2 \end{pmatrix}. $$
In fact, we can obtain this result directly from the model by taking
$$ \hat{X}_3^{(2)} = \phi_1 X_2 + \phi_2 X_1, $$
which satisfies the prediction equations, namely
$$ E[(X_3 - \phi_1 X_2 - \phi_2 X_1) X_1] = E[Z_3 X_1] = 0, $$
$$ E[(X_3 - \phi_1 X_2 - \phi_2 X_1) X_2] = E[Z_3 X_2] = 0. $$
In general, for $n \geq 2$, we have
$$ \hat{X}_{n+1}^{(n)} = \phi_1 X_n + \phi_2 X_{n-1}, \tag{6.26} $$
i.e., $\alpha_j = 0$ for $j = 3, \ldots, n$.
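This result can be checked numerically. The sketch below (my own illustration, not from the text) builds the AR(2) autocorrelations from the difference equations and solves the prediction equations (6.22) for $n = 5$; only the first two coefficients survive.

```python
import numpy as np

phi1, phi2 = 0.7, -0.1  # illustrative AR(2) parameters

# ACF from the difference equations: rho(1) = phi1/(1 - phi2),
# rho(h) = phi1*rho(h-1) + phi2*rho(h-2) for h >= 2
n = 5
rho = [1.0, phi1 / (1 - phi2)]
for h in range(2, n + 1):
    rho.append(phi1 * rho[h - 1] + phi2 * rho[h - 2])

# Solve Gamma_n alpha_n = gamma_n; the scale gamma(0) cancels, so the ACF suffices
R = np.array([[rho[abs(i - j)] for j in range(n)] for i in range(n)])
alpha = np.linalg.solve(R, np.array(rho[1:n + 1]))
print(np.round(alpha, 10))  # first two entries equal phi1, phi2; the rest vanish
```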

Similarly, for a causal AR(p) process the one-step-ahead predictor is
$$ \hat{X}_{n+1}^{(n)} = \phi_1 X_n + \phi_2 X_{n-1} + \ldots + \phi_p X_{n-p+1}, \qquad \text{for } n \geq p. \tag{6.27} $$

Remark 6.8. An interesting connection between the PACF and the vector $\boldsymbol{\alpha}_n$ is that in fact $\alpha_{nn} = \phi_{nn}$, the last element of the vector. For this reason the vector $\boldsymbol{\alpha}_n$ is usually denoted by $\boldsymbol{\phi}_n$ in the following way:
$$ \boldsymbol{\alpha}_n = \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{pmatrix} = \begin{pmatrix} \phi_{n1} \\ \phi_{n2} \\ \vdots \\ \phi_{nn} \end{pmatrix} = \boldsymbol{\phi}_n. $$
The prediction equation (6.22) for a general ARMA(p,q) model is more difficult to solve, particularly for large values of $n$, when we would have to calculate the inverse of a matrix $\Gamma_n$ of large dimension. Hence some recursive solutions to calculate the predictor (6.24) and the mean square error (6.25) were proposed, one by Levinson in 1947 and another by Durbin in 1960. The method is known as the Durbin-Levinson Algorithm. Its steps are the following:

Step 1 Put $\phi_{00} = 0$, $P_1^{(0)} = \gamma(0)$.

Step 2 For $n \geq 1$ calculate
$$ \phi_{nn} = \frac{\rho(n) - \sum_{k=1}^{n-1} \phi_{n-1,k}\, \rho(n-k)}{1 - \sum_{k=1}^{n-1} \phi_{n-1,k}\, \rho(k)}, \tag{6.28} $$
where, for $n \geq 2$,
$$ \phi_{nk} = \phi_{n-1,k} - \phi_{nn}\, \phi_{n-1,n-k}, \qquad k = 1, 2, \ldots, n-1. $$

Step 3 For $n \geq 1$ calculate
$$ P_{n+1}^{(n)} = P_n^{(n-1)} \big(1 - \phi_{nn}^2\big). \tag{6.29} $$

Remark 6.9. Note that the Durbin-Levinson algorithm gives an iterative method to calculate the PACF of a stationary process.
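The steps above can be sketched in code as follows (an illustration only; the demo parameter values $\phi_1 = 0.7$, $\phi_2 = -0.1$ are assumptions chosen to match the AR(2) examples in this chapter):

```python
import numpy as np

def durbin_levinson(rho):
    """Durbin-Levinson recursion (6.28) from the ACF values
    rho(0) = 1, rho(1), ..., rho(N). Returns the PACF phi_11, ..., phi_NN
    and the last-level coefficient vector (phi_N1, ..., phi_NN).
    """
    N = len(rho) - 1
    phi_prev = np.array([])            # level 0: phi_00 = 0, no coefficients yet
    pacf = []
    for n in range(1, N + 1):
        # phi_nn = [rho(n) - sum_k phi_{n-1,k} rho(n-k)] /
        #          [1 - sum_k phi_{n-1,k} rho(k)]
        num = rho[n] - sum(phi_prev[k] * rho[n - 1 - k] for k in range(n - 1))
        den = 1.0 - sum(phi_prev[k] * rho[k + 1] for k in range(n - 1))
        phi_nn = num / den
        # phi_nk = phi_{n-1,k} - phi_nn * phi_{n-1,n-k}, k = 1, ..., n-1
        phi_prev = np.append(phi_prev - phi_nn * phi_prev[::-1], phi_nn)
        pacf.append(phi_nn)
    return pacf, phi_prev

# AR(2) with phi1 = 0.7, phi2 = -0.1: the PACF should cut off after lag 2
phi1, phi2 = 0.7, -0.1
rho = [1.0, phi1 / (1 - phi2)]
for h in range(2, 5):
    rho.append(phi1 * rho[h - 1] + phi2 * rho[h - 2])
pacf, _ = durbin_levinson(rho)
print(np.round(pacf, 6))
```

The error recursion (6.29) follows by accumulating $P_{n+1}^{(n)} = P_n^{(n-1)}(1 - \phi_{nn}^2)$ starting from $\gamma(0)$.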


Remark 6.10. When we predict a value of the TS based on only one preceding datum, that is $n = 1$, we obtain
$$ \phi_{11} = \rho(1), $$
and hence the predictor $\hat{X}_2^{(1)} = \rho(1) X_1$, or in general
$$ \hat{X}_{n+1}^{(1)} = \rho(1) X_n, $$
and its mean square error
$$ P_2^{(1)} = \gamma(0)\big(1 - \phi_{11}^2\big). $$

When we predict $X_{n+1}$ based on two preceding values, that is $n = 2$, we obtain
$$ \phi_{22} = \frac{\rho(2) - \phi_{11}\rho(1)}{1 - \phi_{11}\rho(1)} = \frac{\rho(2) - \rho^2(1)}{1 - \rho^2(1)}, $$
which we have also obtained by solving the matrix equation (6.22) for $n = 2$, and
$$ \phi_{21} = \phi_{11} - \phi_{22}\phi_{11} = \rho(1)\big(1 - \phi_{22}\big). $$
Then the predictor is
$$ \hat{X}_{n+1}^{(2)} = \phi_{21} X_n + \phi_{22} X_{n-1} $$
and its mean square error is
$$ P_3^{(2)} = P_2^{(1)}\big(1 - \phi_{22}^2\big) = \gamma(0)\big(1 - \phi_{11}^2\big)\big(1 - \phi_{22}^2\big). $$

Using the Durbin-Levinson algorithm for the AR(2) process we obtain
$$ \phi_{11} = \rho(1) = \frac{\phi_1}{1 - \phi_2}, $$
$$ \phi_{22} = \frac{\rho(2) - \rho^2(1)}{1 - \rho^2(1)} = \phi_2, \qquad \phi_{21} = \rho(1)\big(1 - \phi_{22}\big) = \phi_1, $$
$$ \phi_{33} = \frac{\rho(3) - \phi_1 \rho(2) - \phi_2 \rho(1)}{1 - \phi_1 \rho(1) - \phi_2 \rho(2)} = 0, $$
$$ \phi_{31} = \phi_{21} - \phi_{33}\phi_{22} = \phi_1, \qquad \phi_{32} = \phi_{22} - \phi_{33}\phi_{21} = \phi_2, $$
$$ \phi_{44} = \frac{\rho(4) - \phi_1 \rho(3) - \phi_2 \rho(2)}{1 - \phi_1 \rho(1) - \phi_2 \rho(2)} = 0. $$
The results for $\phi_{33}$ and $\phi_{44}$ come from the fact that in each numerator we have a difference which is zero by the difference equations satisfied by the ACF.

Hence, the one-step-ahead predictor for AR(2) is based only on the two preceding values, as there are only two nonzero coefficients in the prediction function. As before, we obtain the result
$$ \hat{X}_{n+1}^{(n)} = \phi_1 X_n + \phi_2 X_{n-1}. $$
Remark 6.11. The PACF of an AR(2) process is
$$ \phi_{11} = \frac{\phi_1}{1 - \phi_2}, \qquad \phi_{22} = \phi_2, \qquad \phi_{hh} = 0 \ \text{ for } h \geq 3. \tag{6.30} $$

Given values of the variables $\{X_1, \ldots, X_n\}$, the m-step-ahead predictor is
$$ \hat{X}_{n+m}^{(n)} = \phi_{n1}^{(m)} X_n + \phi_{n2}^{(m)} X_{n-1} + \ldots + \phi_{nn}^{(m)} X_1, \tag{6.31} $$
where the coefficients $\phi_{nj}^{(m)} = \alpha_j$ satisfy the prediction equations (6.21). In matrix notation the prediction equations are
$$ \Gamma_n \boldsymbol{\phi}_n^{(m)} = \boldsymbol{\gamma}_n^{(m)}, \tag{6.32} $$
where
$$ \boldsymbol{\gamma}_n^{(m)} = \big(\gamma(m), \gamma(m+1), \ldots, \gamma(m+n-1)\big)^T $$
and
$$ \boldsymbol{\phi}_n^{(m)} = \big(\phi_{n1}^{(m)}, \phi_{n2}^{(m)}, \ldots, \phi_{nn}^{(m)}\big)^T. $$

The mean square m-step-ahead prediction error is
$$ P_{n+m}^{(n)} = E\big[X_{n+m} - \hat{X}_{n+m}^{(n)}\big]^2 = \gamma(0) - \big(\boldsymbol{\gamma}_n^{(m)}\big)^T \Gamma_n^{-1} \boldsymbol{\gamma}_n^{(m)}. \tag{6.33} $$

The mean square prediction error assesses the precision of the forecast and is used to calculate the so-called prediction interval (PI). When the process is Gaussian the PI is
$$ \hat{X}_{n+m}^{(n)} \pm u_{\alpha/2} \sqrt{\hat{P}_{n+m}^{(n)}}, \tag{6.34} $$
where $u_{\alpha/2}$ is such that $P(|U| < u_{\alpha/2}) = 1 - \alpha$ and $U$ is a standard normal r.v. For $\alpha = 0.05$ we have $u_{\alpha/2} \approx 1.96$ and the 95% prediction interval boundaries are
$$ \Big[\, \hat{X}_{n+m}^{(n)} - 1.96\sqrt{\hat{P}_{n+m}^{(n)}},\ \ \hat{X}_{n+m}^{(n)} + 1.96\sqrt{\hat{P}_{n+m}^{(n)}} \,\Big]. $$
Here we have used the hat notation since usually we do not know the values of the model parameters and have to use their estimators. We will discuss model parameter estimation in the next section.
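The m-step computation (6.31)-(6.34) can be sketched as follows (an illustration only), assuming the true ACVF is known; in practice the estimated versions in (6.34) are used. The AR(1) check relies on the standard facts that the 2-step predictor is $\phi^2 X_n$ with error $\sigma^2(1 + \phi^2)$.

```python
import numpy as np

def m_step_forecast(gamma, x, m):
    """m-step-ahead predictor (6.31)-(6.33) with a 95% Gaussian PI (6.34).

    gamma : autocovariances gamma(0), ..., gamma(m + n - 1)
    x     : observations x_1, ..., x_n (oldest first)
    """
    n = len(x)
    Gam = np.array([[gamma[abs(i - j)] for j in range(n)] for i in range(n)])
    gam_m = np.array([gamma[m + k] for k in range(n)])   # (gamma(m), ..., gamma(m+n-1))
    phi_m = np.linalg.solve(Gam, gam_m)                  # (6.32)
    x_hat = phi_m @ x[::-1]                              # (6.31), X = (X_n, ..., X_1)
    P = gamma[0] - gam_m @ np.linalg.solve(Gam, gam_m)   # (6.33)
    half = 1.96 * np.sqrt(P)                             # u_{alpha/2} for alpha = 0.05
    return x_hat, P, (x_hat - half, x_hat + half)

# AR(1) check: the 2-step predictor is phi^2 * x_n = 0.288 with
# MSE sigma^2 (1 - phi^4)/(1 - phi^2) = sigma^2 (1 + phi^2) = 1.36
phi, sigma2 = 0.6, 1.0
gamma = [sigma2 * phi**h / (1 - phi**2) for h in range(8)]
x = np.array([0.5, -0.2, 1.1, 0.3, 0.8])
print(m_step_forecast(gamma, x, 2))
```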


## 6.3.3 Parameter Estimation

In this section we will discuss methods of parameter estimation for ARMA(p,q) models, assuming that the orders $p$ and $q$ are known.

### Method of Moments

In this method we equate the population moments with the sample moments to obtain a set of equations whose solution gives the required estimators. For example, the first population moment is $\mu_1 = E(X)$ and its sample counterpart is $m_1 = \bar{X}$. This immediately gives
$$ \hat{\mu} = \bar{X}. $$
The method of moments gives good estimators for AR models but less efficient estimators for MA or ARMA processes. Hence we will present the method for AR time series. As usual, we denote an AR(p) model by
$$ X_t = \phi_1 X_{t-1} + \ldots + \phi_p X_{t-p} + Z_t. $$
This is a zero-mean model, but the estimation of the mean is straightforward and we will not discuss it further. Here we use the difference equations, where we replace the population autocovariance (central moment of order two) with the sample autocovariance. The first $p + 1$ difference equations are
$$ \gamma(0) = \phi_1 \gamma(1) + \ldots + \phi_p \gamma(p) + \sigma^2, $$
$$ \gamma(\tau) = \phi_1 \gamma(\tau - 1) + \ldots + \phi_p \gamma(\tau - p), \qquad \tau = 1, 2, \ldots, p. $$
Note that $q = 0$, so the sum on the right-hand side of (6.16) is zero. In matrix notation we can write
$$ \sigma^2 = \gamma(0) - \boldsymbol{\phi}^T \boldsymbol{\gamma}_p, \qquad \Gamma_p \boldsymbol{\phi} = \boldsymbol{\gamma}_p, $$
where
$$ \Gamma_p = \{\gamma(i - j)\}_{i,j=1,\ldots,p}, \qquad \boldsymbol{\phi} = (\phi_1, \ldots, \phi_p)^T, \qquad \boldsymbol{\gamma}_p = (\gamma(1), \ldots, \gamma(p))^T. $$
Replacing $\gamma(\tau)$ by the sample ACVF
$$ \hat{\gamma}(\tau) = \frac{1}{n} \sum_{t=1}^{n-\tau} (X_{t+\tau} - \bar{X})(X_t - \bar{X}), $$
we obtain the solution
$$ \hat{\sigma}^2 = \hat{\gamma}(0) - \hat{\boldsymbol{\gamma}}_p^T\, \hat{\Gamma}_p^{-1} \hat{\boldsymbol{\gamma}}_p, \qquad \hat{\boldsymbol{\phi}} = \hat{\Gamma}_p^{-1} \hat{\boldsymbol{\gamma}}_p. \tag{6.35} $$

These equations are called the Yule-Walker estimators. They are often expressed in terms of the autocorrelation function rather than the autocovariance function. Then we have
$$ \hat{\sigma}^2 = \hat{\gamma}(0)\big[1 - \hat{\boldsymbol{\rho}}_p^T\, \hat{R}_p^{-1} \hat{\boldsymbol{\rho}}_p\big], \qquad \hat{\boldsymbol{\phi}} = \hat{R}_p^{-1} \hat{\boldsymbol{\rho}}_p, \tag{6.36} $$
where
$$ \hat{R}_p = \{\hat{\rho}(i - j)\}_{i,j=1,\ldots,p} $$
is the matrix of the sample autocorrelations and
$$ \hat{\boldsymbol{\rho}}_p = \big(\hat{\rho}(1), \ldots, \hat{\rho}(p)\big)^T $$
is the vector of sample autocorrelations.
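The estimators (6.35)-(6.36) translate directly into code. The following sketch (my own illustration; the simulation settings are assumptions, not from the text) estimates an AR(2) from simulated data:

```python
import numpy as np

def yule_walker(x, p):
    """Yule-Walker estimates (6.35)/(6.36) for an AR(p) model.

    Returns (phi_hat, sigma2_hat). A sketch built from the formulas above;
    libraries such as statsmodels offer similar routines, but none of
    their internals are assumed here.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    # sample ACVF: gamma_hat(tau) = (1/n) sum_{t=1}^{n-tau} (x_{t+tau}-xbar)(x_t-xbar)
    gam = np.array([xc[tau:] @ xc[:n - tau] / n for tau in range(p + 1)])
    rho = gam[1:] / gam[0]
    R = np.array([[gam[abs(i - j)] for j in range(p)] for i in range(p)]) / gam[0]
    phi_hat = np.linalg.solve(R, rho)              # phi_hat = R_p^{-1} rho_p
    sigma2_hat = gam[0] * (1.0 - rho @ phi_hat)    # gamma_hat(0)[1 - rho^T phi_hat]
    return phi_hat, sigma2_hat

# Demo on a simulated AR(2) with phi = (0.7, -0.1) and unit-variance Gaussian noise
rng = np.random.default_rng(42)
z = rng.standard_normal(1200)
xs = np.zeros(1200)
for t in range(2, 1200):
    xs[t] = 0.7 * xs[t - 1] - 0.1 * xs[t - 2] + z[t]
phi_hat, s2 = yule_walker(xs[200:], 2)  # drop a burn-in of 200 values
print(phi_hat, s2)
```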

Proposition 6.3. The distribution of the Yule-Walker estimators $\hat{\boldsymbol{\phi}}$ of the model parameters of a causal AR(p) process
$$ X_t = \phi_1 X_{t-1} + \ldots + \phi_p X_{t-p} + Z_t $$
is asymptotically (as $n \to \infty$) normal, in the sense that
$$ \sqrt{n}\,\big(\hat{\boldsymbol{\phi}} - \boldsymbol{\phi}\big) \xrightarrow{d} N\big(0, \sigma^2 \Gamma_p^{-1}\big) $$
and
$$ \hat{\sigma}^2 \xrightarrow{P} \sigma^2. $$

Remark 6.12. Note that the matrix equation (6.23) is of the same form as (6.36). Hence, we can use the Durbin-Levinson algorithm to calculate the estimates. This will give us the values of the sample PACF as well as the estimates of $\boldsymbol{\phi}$.

Proposition 6.4. The distribution of the sample PACF of a causal AR(p) process is asymptotically normal, that is,
$$ \sqrt{n}\, \hat{\phi}_{hh} \xrightarrow{d} N(0, 1) \qquad \text{for } h > p. $$

## Example 6.8.

Consider an AR(2) zero-mean causal process
$$ X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + Z_t. $$
Then the Yule-Walker estimators are
$$ \hat{\sigma}^2 = \hat{\gamma}(0)\big[1 - \hat{\boldsymbol{\rho}}_2^T\, \hat{R}_2^{-1} \hat{\boldsymbol{\rho}}_2\big], \qquad \hat{\boldsymbol{\phi}} = \hat{R}_2^{-1} \hat{\boldsymbol{\rho}}_2, $$
where
$$ \hat{R}_2 = \begin{pmatrix} 1 & \hat{\rho}(1) \\ \hat{\rho}(1) & 1 \end{pmatrix} $$
and
$$ \hat{\boldsymbol{\rho}}_2 = \big(\hat{\rho}(1), \hat{\rho}(2)\big)^T, \qquad \hat{\boldsymbol{\phi}} = \big(\hat{\phi}_1, \hat{\phi}_2\big)^T. $$
We can easily invert a $2 \times 2$ matrix and calculate the estimators, or we can use the Durbin-Levinson algorithm directly to obtain
$$ \hat{\phi}_{11} = \hat{\rho}(1) = \frac{\hat{\phi}_1}{1 - \hat{\phi}_2}, $$
$$ \hat{\phi}_{22} = \frac{\hat{\rho}(2) - \hat{\rho}^2(1)}{1 - \hat{\rho}^2(1)} = \hat{\phi}_2, \qquad \hat{\phi}_{21} = \hat{\rho}(1)\big[1 - \hat{\phi}_{22}\big] = \hat{\phi}_1. $$
Also, we get
$$ \hat{\sigma}^2 = \hat{\gamma}(0)\left[1 - \big(\hat{\rho}(1), \hat{\rho}(2)\big) \begin{pmatrix} \hat{\phi}_1 \\ \hat{\phi}_2 \end{pmatrix}\right] = \hat{\gamma}(0)\big[1 - \big(\hat{\rho}(1)\hat{\phi}_1 + \hat{\rho}(2)\hat{\phi}_2\big)\big]. $$

Furthermore, from Proposition 6.3 we can derive confidence intervals for the $\phi_i$. The proposition says that
$$ \sqrt{n}\,\big(\hat{\boldsymbol{\phi}} - \boldsymbol{\phi}\big) \xrightarrow{d} N\big(0, \sigma^2 \Gamma_p^{-1}\big), $$
that is, the variance of $\sqrt{n}\,(\hat{\phi}_i - \phi_i)$ is approximately the $i$-th diagonal element of the matrix $\sigma^2 \Gamma_p^{-1}$, say $v_{ii}$. Hence,
$$ \operatorname{var}\big(\hat{\phi}_i\big) \approx \frac{1}{n}\, v_{ii}, $$
and the approximate $(1 - \alpha)$ confidence interval for $\phi_i$ is
$$ \left[\, \hat{\phi}_i - u_{\alpha/2}\sqrt{\frac{1}{n} v_{ii}},\ \ \hat{\phi}_i + u_{\alpha/2}\sqrt{\frac{1}{n} v_{ii}} \,\right]. $$

Also, from Proposition 6.4 we have
$$ \sqrt{n}\, \hat{\phi}_{hh} \xrightarrow{d} N(0, 1), $$
that is,
$$ \hat{\phi}_{hh} \sim AN\Big(0, \frac{1}{n}\Big). $$
This gives the asymptotic result
$$ \operatorname{var}\big(\hat{\phi}_{hh}\big) \approx \frac{1}{n}. $$
However, we know that the PACF for $h > p$ is zero. It means that with probability $1 - \alpha$ we have
$$ -u_{\alpha/2} < \frac{\hat{\phi}_{hh} - 0}{\sqrt{1/n}} < u_{\alpha/2}. $$
It can be interpreted that the estimate of the PACF indicates a non-significant value of $\phi_{hh}$ if it falls in the interval
$$ \big[-u_{\alpha/2}/\sqrt{n},\ \ u_{\alpha/2}/\sqrt{n}\big]. $$
We will do the calculations for the simulated AR(2) process given in Figure 6.3. For these data we have the following values of the sample variance $\hat{\gamma}(0)$ and the sample autocorrelations $\hat{\rho}(1)$ and $\hat{\rho}(2)$:
$$ \hat{\gamma}(0) = 1.947669, \qquad \hat{\rho}(1) = 0.66018, \qquad \hat{\rho}(2) = 0.33751. $$
Then the matrix $\hat{R}_2$ is equal to
$$ \hat{R}_2 = \begin{pmatrix} 1 & 0.66018 \\ 0.66018 & 1 \end{pmatrix} $$
and its inverse is
$$ \hat{R}_2^{-1} = \begin{pmatrix} 1.77254 & -1.17020 \\ -1.17020 & 1.77254 \end{pmatrix}. $$
Hence
$$ \begin{pmatrix} \hat{\phi}_1 \\ \hat{\phi}_2 \end{pmatrix} = \begin{pmatrix} 1.77254 & -1.17020 \\ -1.17020 & 1.77254 \end{pmatrix} \begin{pmatrix} 0.66018 \\ 0.33751 \end{pmatrix} = \begin{pmatrix} 0.775243 \\ -0.174290 \end{pmatrix} $$

and
$$ \hat{\sigma}^2 = 1.947669\big[1 - \big(0.66018 \cdot 0.775243 + 0.33751 \cdot (-0.174290)\big)\big] = 1.06542. $$

The series was simulated with $\phi_1 = 0.7$ and $\phi_2 = -0.1$ and a Gaussian white noise with zero mean and variance equal to 1. These estimates are not far from the true values. Had we not known the true values, we would have liked to calculate confidence intervals for them. There are 200 observations, i.e. $n = 200$, which is big enough to use the asymptotic result given in Proposition 6.3. To calculate $v_{ii}$ note that
$$ \Gamma_p = \gamma(0) R_p, $$
which gives
$$ \Gamma_p^{-1} = \frac{1}{\gamma(0)} R_p^{-1}. $$

Hence
$$ \hat{\sigma}^2 \hat{\Gamma}_2^{-1} = \frac{\hat{\sigma}^2}{\hat{\gamma}(0)}\, \hat{R}_2^{-1} = \frac{1.06542}{1.947669} \begin{pmatrix} 1.77254 & -1.17020 \\ -1.17020 & 1.77254 \end{pmatrix} = \begin{pmatrix} 0.969623 & -0.640129 \\ -0.640129 & 0.969623 \end{pmatrix} $$

and we obtain the estimate of the variance of the parameter estimators
$$ \widehat{\operatorname{var}}\big(\hat{\phi}_i\big) = \frac{1}{n}\, v_{ii} = \frac{1}{200} \cdot 0.969623 = 0.0048481. $$
The 95% approximate confidence intervals for the model parameters $\phi_1$ and $\phi_2$ are, respectively,
$$ \big[0.775243 - 1.96\sqrt{0.0048481},\ \ 0.775243 + 1.96\sqrt{0.0048481}\big] = [0.638771,\ 0.911714] $$
and
$$ \big[-0.174290 - 1.96\sqrt{0.0048481},\ \ -0.174290 + 1.96\sqrt{0.0048481}\big] = [-0.310761,\ -0.037818]. $$
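The calculations of this example can be reproduced from the three reported sample statistics (a sketch; tiny differences in the last digits come from carrying full precision instead of the rounded inverse):

```python
import numpy as np

# Reported sample statistics from Example 6.8
g0, r1, r2 = 1.947669, 0.66018, 0.33751

R2 = np.array([[1.0, r1], [r1, 1.0]])
phi_hat = np.linalg.solve(R2, np.array([r1, r2]))       # (6.36)
sigma2_hat = g0 * (1.0 - np.array([r1, r2]) @ phi_hat)

# Asymptotic variances: sigma^2 Gamma_2^{-1} = (sigma^2 / gamma(0)) R_2^{-1}
V = (sigma2_hat / g0) * np.linalg.inv(R2)
se = np.sqrt(V[0, 0] / 200)                             # n = 200
ci1 = (phi_hat[0] - 1.96 * se, phi_hat[0] + 1.96 * se)
ci2 = (phi_hat[1] - 1.96 * se, phi_hat[1] + 1.96 * se)
print(phi_hat, sigma2_hat)  # approx [0.77524, -0.17429], 1.0654
print(ci1, ci2)
```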

### Maximum Likelihood Estimation

The method of Maximum Likelihood Estimation applies to any ARMA(p,q) model
$$ X_t - \phi_1 X_{t-1} - \ldots - \phi_p X_{t-p} = Z_t + \theta_1 Z_{t-1} + \ldots + \theta_q Z_{t-q}. $$
This method requires an assumption on the distribution of the random vector $\boldsymbol{X} = (X_1, \ldots, X_n)^T$. The usual assumption is that the process is Gaussian. Let us denote the p.d.f. of $\boldsymbol{X}$ by
$$ f_{\boldsymbol{X}}\big(x_1, \ldots, x_n; \boldsymbol{\beta}, \sigma^2\big), $$
where
$$ \boldsymbol{\beta} = (\phi_1, \ldots, \phi_p, \theta_1, \ldots, \theta_q)^T. $$
Given the values of $\boldsymbol{X}$, the p.d.f. becomes a function of the parameters. It is then denoted by
$$ L\big(\boldsymbol{\beta}, \sigma^2 \mid x_1, \ldots, x_n\big), $$
and for a Gaussian process it is
$$ L\big(\boldsymbol{\beta}, \sigma^2 \mid x_1, \ldots, x_n\big) = \frac{1}{\sqrt{(2\pi)^n \det(\Gamma_n)}} \exp\Big(-\frac{1}{2}\, \boldsymbol{X}^T \Gamma_n^{-1} \boldsymbol{X}\Big). $$

A more convenient form can be obtained by taking the natural logarithm. Then
$$ l\big(\boldsymbol{\beta}, \sigma^2 \mid x_1, \ldots, x_n\big) = \ln L\big(\boldsymbol{\beta}, \sigma^2 \mid x_1, \ldots, x_n\big) = -\frac{n}{2}\ln(2\pi) - \frac{1}{2}\ln\det(\Gamma_n) - \frac{1}{2}\, \boldsymbol{X}^T \Gamma_n^{-1} \boldsymbol{X}. $$
The Maximum Likelihood Estimates are the values of $\boldsymbol{\beta}$ and $\sigma^2$ which maximize the function $l(\boldsymbol{\beta}, \sigma^2 \mid x_1, \ldots, x_n)$. Intuitively, the MLE is the parameter value for which the observed sample is most likely. The estimates are usually found numerically using iterative numerical optimization routines. We will not discuss them here.
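To make the likelihood concrete, the sketch below evaluates $l(\boldsymbol{\beta}, \sigma^2 \mid x)$ for an AR(1) model, where $\Gamma_n$ has the closed form $\gamma(i-j) = \sigma^2 \phi^{|i-j|}/(1-\phi^2)$; a crude grid search over $\phi$ (with $\sigma^2$ fixed at its true value) stands in for the iterative optimization routines mentioned above. This is an illustration only, and the simulation settings are my own assumptions.

```python
import numpy as np

def ar1_loglik(phi, sigma2, x):
    """Gaussian log-likelihood l(phi, sigma2 | x) for an AR(1) model,
    using Gamma_n with gamma(h) = sigma2 * phi^|h| / (1 - phi^2)."""
    n = len(x)
    H = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    Gam = sigma2 * phi**H / (1.0 - phi**2)
    _, logdet = np.linalg.slogdet(Gam)
    quad = x @ np.linalg.solve(Gam, x)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + quad)

# Simulate an AR(1) series with phi = 0.6 and unit-variance Gaussian noise
rng = np.random.default_rng(1)
z = rng.standard_normal(300)
x = np.zeros(300)
for t in range(1, 300):
    x[t] = 0.6 * x[t - 1] + z[t]

# Crude grid search standing in for a numerical optimizer
grid = np.arange(-0.95, 0.96, 0.05)
phi_hat = max(grid, key=lambda p: ar1_loglik(p, 1.0, x))
print(phi_hat)  # should land near the true value 0.6
```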
The MLE has the property of being asymptotically normally distributed, as stated in the following proposition.

Proposition 6.5. The distribution of the MLE $\hat{\boldsymbol{\beta}}$ of a causal and invertible ARMA(p,q) process is asymptotically normal, in the sense that
$$ \sqrt{n}\,\big(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}\big) \xrightarrow{d} N\big(0, \sigma^2 \Gamma_{p+q}^{-1}\big), \tag{6.37} $$
where the $(p+q) \times (p+q)$-dimensional matrix $\Gamma_{p+q}$ depends on the model parameters.

### Some Specific Asymptotic Distributions

AR(1): $X_t - \phi X_{t-1} = Z_t$
$$ \hat{\phi} \sim AN\Big(\phi,\ \frac{1}{n}\big(1 - \phi^2\big)\Big) $$

AR(2): $X_t - \phi_1 X_{t-1} - \phi_2 X_{t-2} = Z_t$
$$ \begin{pmatrix} \hat{\phi}_1 \\ \hat{\phi}_2 \end{pmatrix} \sim AN\left( \begin{pmatrix} \phi_1 \\ \phi_2 \end{pmatrix},\ \frac{1}{n} \begin{pmatrix} 1 - \phi_2^2 & -\phi_1(1 + \phi_2) \\ -\phi_1(1 + \phi_2) & 1 - \phi_2^2 \end{pmatrix} \right) $$

MA(1): $X_t = Z_t + \theta Z_{t-1}$
$$ \hat{\theta} \sim AN\Big(\theta,\ \frac{1}{n}\big(1 - \theta^2\big)\Big) $$

MA(2): $X_t = Z_t + \theta_1 Z_{t-1} + \theta_2 Z_{t-2}$
$$ \begin{pmatrix} \hat{\theta}_1 \\ \hat{\theta}_2 \end{pmatrix} \sim AN\left( \begin{pmatrix} \theta_1 \\ \theta_2 \end{pmatrix},\ \frac{1}{n} \begin{pmatrix} 1 - \theta_2^2 & \theta_1(1 + \theta_2) \\ \theta_1(1 + \theta_2) & 1 - \theta_2^2 \end{pmatrix} \right) $$

ARMA(1,1): $X_t - \phi X_{t-1} = Z_t + \theta Z_{t-1}$
$$ \begin{pmatrix} \hat{\phi} \\ \hat{\theta} \end{pmatrix} \sim AN\left( \begin{pmatrix} \phi \\ \theta \end{pmatrix},\ \frac{1 + \phi\theta}{n(\phi + \theta)^2} \begin{pmatrix} (1 - \phi^2)(1 + \phi\theta) & -(1 - \phi^2)(1 - \theta^2) \\ -(1 - \phi^2)(1 - \theta^2) & (1 - \theta^2)(1 + \phi\theta) \end{pmatrix} \right) $$
Using these results we can construct approximate confidence intervals for the
model parameters as in the method of moments.
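These displays translate directly into approximate standard errors. A sketch for the ARMA(1,1) case (the parameter values are illustrative assumptions; the formula requires $\phi + \theta \neq 0$, otherwise the model is not identifiable):

```python
import numpy as np

def arma11_asymptotic_cov(phi, theta, n):
    """Approximate covariance of (phi_hat, theta_hat) for a causal,
    invertible ARMA(1,1), per the asymptotic display above."""
    c = (1 + phi * theta) / (n * (phi + theta) ** 2)
    M = np.array([
        [(1 - phi**2) * (1 + phi * theta), -(1 - phi**2) * (1 - theta**2)],
        [-(1 - phi**2) * (1 - theta**2), (1 - theta**2) * (1 + phi * theta)],
    ])
    return c * M

# Illustrative values: phi = 0.5, theta = 0.4, n = 200 observations
V = arma11_asymptotic_cov(0.5, 0.4, 200)
se = np.sqrt(np.diag(V))
print(se)  # approximate standard errors for phi_hat and theta_hat
```

Pairing these standard errors with $u_{\alpha/2}$ gives the approximate confidence intervals mentioned above.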
