Statistics 626

10 Linear Prediction
If we have a realization x(1), ..., x(n) from a time series X, we would like a rule for how to take the x's and the probability properties of X to find the function of the data that best predicts future values in some optimal way.

We visualize taking all realizations in the ensemble of realizations that have values x(1), ..., x(n) for X(1), ..., X(n), using our rule to predict what the value at time n + h would be, and choosing the rule that gives the right answer on average and the smallest average squared distance of the predicted value from the actual value, no matter what values we have for x(1), ..., x(n). Such a rule is called the best unbiased predictor (BUP) of X(n + h) from X(1), ..., X(n) and is calculated by E(X(n + h)|X(1), ..., X(n)), which is called the conditional expectation of X(n + h) given X(1), ..., X(n). The technical definition of conditional expectation is very complicated (see the text), but for our purposes it follows the same rules as does the usual expectation.


If X(1), ..., X(n) are jointly normally distributed, then the BUP becomes a linear function of x(1), ..., x(n) and can be easily calculated (as we will see below). If they are not normally distributed, then in general finding the BUP is next to impossible, so we will restrict ourselves to finding the best linear unbiased predictor (BLUP). Thus we need to study BLUPs carefully.

10.1 BLUPs for Covariance Stationary Time Series

If X is a covariance stationary time series with autocovariance function $R$, then

1. The BLUP of $X(n + h)$ given $X(1), \ldots, X(n)$ is

$\hat X_{n,h} = \phi_1 X(n) + \cdots + \phi_n X(1),$

where the vector $\phi$ of coefficients satisfies the prediction normal equations

$\Gamma \phi = r,$

where $\Gamma$ is the $(n \times n)$ Toeplitz matrix having $R(|j - k|)$ as its $jk$th element, and the vector $r$ of length $n$ is $(R(h), \ldots, R(n + h - 1))^T$.

2. The variance of the $h$-step-ahead prediction error is given by

$\sigma^2_{n,h} = R(0) - r^T \Gamma^{-1} r.$
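As an illustration, here is a minimal NumPy sketch of solving the prediction normal equations directly; the function name `blup` and its interface are my own choices, not notation from these notes.

```python
import numpy as np

def blup(R, n, h):
    """Sketch: solve the prediction normal equations Gamma phi = r for the
    BLUP of X(n + h) from X(1), ..., X(n).  R is a function returning the
    autocovariance R(k).  (Name and interface are illustrative only.)"""
    # Gamma is the (n x n) Toeplitz matrix with R(|j - k|) as its (j, k)th element
    Gamma = np.array([[R(abs(j - k)) for k in range(n)] for j in range(n)])
    # r = (R(h), ..., R(n + h - 1))^T
    r = np.array([R(h + j) for j in range(n)])
    phi = np.linalg.solve(Gamma, r)   # phi[0] multiplies X(n), ..., phi[n-1] multiplies X(1)
    pred_var = R(0) - r @ phi         # sigma^2_{n,h} = R(0) - r^T Gamma^{-1} r
    return phi, pred_var
```

For the MA(1) example below, `blup(lambda k: {0: 5.0, 1: 2.0}.get(k, 0.0), n=2, h=1)` returns the coefficients $(10/21, -4/21)$ and prediction error variance $85/21$.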


Ex: Consider a realization of length two from $X \sim \mathrm{MA}(1)$ with $\theta = 2$ and $\sigma^2 = 1$. Then, since $R(0) = 5$, $R(1) = 2$, and all other $R$'s are zero, we have

$\hat X_{2,1} = \phi_1 X(2) + \phi_2 X(1),$

where

$\begin{pmatrix} 5 & 2 \\ 2 & 5 \end{pmatrix} \begin{pmatrix} \phi_1 \\ \phi_2 \end{pmatrix} = \begin{pmatrix} 2 \\ 0 \end{pmatrix},$

which gives $\phi_1 = 10/21$ and $\phi_2 = -4/21$, so

$\hat X_{2,1} = \frac{10}{21} X(2) - \frac{4}{21} X(1).$

Further, if we wanted to get a prediction interval for $X(3)$, we would find

$\sigma^2_{2,1} = R(0) - r^T \Gamma^{-1} r = 5 - \begin{pmatrix} 2 & 0 \end{pmatrix} \begin{pmatrix} 5 & 2 \\ 2 & 5 \end{pmatrix}^{-1} \begin{pmatrix} 2 \\ 0 \end{pmatrix} = 85/21,$

and then we could say that 95% of the values of $X(3)$ are in the interval

$\hat X_{2,1} \pm 1.96\sqrt{85/21}.$
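A quick numerical check of this example (again only a NumPy sketch; the variable names are mine):

```python
import numpy as np

# Autocovariances of the MA(1) example: R(0) = 5, R(1) = 2, R(k) = 0 otherwise
Gamma = np.array([[5.0, 2.0],
                  [2.0, 5.0]])          # 2 x 2 Toeplitz matrix
r = np.array([2.0, 0.0])                # (R(1), R(2))^T for h = 1

phi = np.linalg.solve(Gamma, r)         # prediction coefficients
print(phi * 21)                         # -> [10. -4.], i.e. 10/21 and -4/21

sigma2 = 5.0 - r @ phi                  # R(0) - r^T Gamma^{-1} r = 85/21
halfwidth = 1.96 * np.sqrt(sigma2)      # half-width of the 95% interval
print(sigma2, halfwidth)                # -> 4.0476...  3.943...
```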


10.2 Levinson's Algorithm and Partial Autocorrelations

From $X(1), \ldots, X(n)$, to get the $h$-step-ahead predictor $\hat X_{n,h}$ of $X(n + h)$ and its prediction error variance $\sigma^2_{n,h} = \mathrm{Var}(X(n + h) - \hat X_{n,h})$, we must solve

$\Gamma_n \phi_{n,h} = r_{n,h},$

where

$\Gamma_n = \mathrm{Toepl}(R(0), R(1), \ldots, R(n - 1)),$
$\phi_{n,h} = (\phi_{n,h}(1), \ldots, \phi_{n,h}(n))^T,$
$r_{n,h} = (R(h), \ldots, R(n + h - 1))^T,$

and then

$\hat X_{n,h} = \phi_{n,h}(1) X(n) + \cdots + \phi_{n,h}(n) X(1),$
$\sigma^2_{n,h} = R(0) - r_{n,h}^T \Gamma_n^{-1} r_{n,h}.$

This appears to be a massive problem, both in storing the $(n \times n)$ matrix $\Gamma_n$, since $n$ could easily be in the thousands, and in the number of numerical operations needed to solve the system (it takes on the order of $n^3$ operations to solve a general system of equations).

Fortunately, there exists a variety of remarkably effective computational tricks to solve these problems, including an algorithm called Levinson's


recursion that applies when $h = 1$, that is, for doing one-step-ahead prediction.
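For a sense of how the Toeplitz structure is exploited in practice, SciPy's `scipy.linalg.solve_toeplitz` uses a Levinson-type recursion and never forms the full $(n \times n)$ matrix. The snippet below is only an illustration of the idea, not code from these notes.

```python
import numpy as np
from scipy.linalg import solve_toeplitz, toeplitz

# Autocovariances R(0), ..., R(n - 1) for the MA(1) example, padded to n = 4
R = np.array([5.0, 2.0, 0.0, 0.0])
rhs = np.array([2.0, 0.0, 0.0, 0.0])            # (R(1), ..., R(n))^T for h = 1

phi_direct = np.linalg.solve(toeplitz(R), rhs)  # generic O(n^3) solve on the full matrix
phi_fast = solve_toeplitz(R, rhs)               # Levinson-type O(n^2) solve, matrix never formed

print(np.allclose(phi_direct, phi_fast))        # -> True
```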

Levinson's Recursions: If we denote $\phi_{n,h}$ and $\sigma^2_{n,h}$ by $\phi_n$ and $\sigma^2_n$ when $h = 1$, we have

$\phi_1(1) = \rho(1) = R(1)/R(0), \qquad \sigma_1^2 = R(0)\left(1 - \phi_1^2(1)\right),$

and then for $j = 2, \ldots, n$:

$\phi_j(j) = \dfrac{R(j) - \sum_{k=1}^{j-1} \phi_{j-1}(k)\, R(j - k)}{\sigma_{j-1}^2},$

$\phi_j(k) = \phi_{j-1}(k) - \phi_j(j)\, \phi_{j-1}(j - k), \qquad k = 1, \ldots, j - 1,$

$\sigma_j^2 = \sigma_{j-1}^2\left(1 - \phi_j^2(j)\right).$
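Here is a minimal Python sketch of the recursion as stated above; the function name, return layout, and 0-based indexing are my own choices.

```python
import numpy as np

def levinson(R):
    """Sketch of Levinson's recursion for one-step-ahead prediction (h = 1).

    R is the array (R(0), R(1), ..., R(n)) of autocovariances.  Returns the
    final coefficients phi_n(1), ..., phi_n(n), the prediction error
    variances sigma_1^2, ..., sigma_n^2, and the partial autocorrelations
    phi_1(1), ..., phi_n(n)."""
    n = len(R) - 1
    phi = np.zeros(n)
    sigma2 = np.zeros(n)
    pacf = np.zeros(n)

    phi[0] = R[1] / R[0]                      # phi_1(1) = rho(1)
    sigma2[0] = R[0] * (1.0 - phi[0] ** 2)    # sigma_1^2 = R(0)(1 - phi_1(1)^2)
    pacf[0] = phi[0]

    for j in range(2, n + 1):
        prev = phi[: j - 1].copy()            # phi_{j-1}(1), ..., phi_{j-1}(j-1)
        # phi_j(j) = (R(j) - sum_k phi_{j-1}(k) R(j-k)) / sigma_{j-1}^2
        num = R[j] - sum(prev[k - 1] * R[j - k] for k in range(1, j))
        phi_jj = num / sigma2[j - 2]
        # phi_j(k) = phi_{j-1}(k) - phi_j(j) phi_{j-1}(j-k),  k = 1, ..., j-1
        for k in range(1, j):
            phi[k - 1] = prev[k - 1] - phi_jj * prev[j - k - 1]
        phi[j - 1] = phi_jj
        sigma2[j - 1] = sigma2[j - 2] * (1.0 - phi_jj ** 2)
        pacf[j - 1] = phi_jj
    return phi, sigma2, pacf
```

With the MA(1) autocovariances R = [5, 2, 0] this returns the coefficients $(10/21, -4/21)$ and $\sigma_2^2 = 85/21$ found in the earlier example.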

Remarks:

1. This algorithm takes only on the order of $n^2$ numerical operations, and one need only store two of the $\phi_j$ vectors at any given point in the recursion (we only need the one for $j - 1$ to get the one for $j$).

2. It can be shown that $\phi_j(j)$ is the correlation between the errors in predicting $X(t)$ from the next $j - 1$ $X$'s and in predicting $X(t + j)$ from the previous $j - 1$ $X$'s, and thus $\phi(j) = \phi_j(j)$ is defined to be the partial autocorrelation of lag $j$.


3. At the $j$th step of the recursion, we have $\phi_j(1), \ldots, \phi_j(j)$, which are the coefficients needed to find the one-step-ahead predictor of $X(j + 1)$ given $X(1), \ldots, X(j)$. Thus a common procedure is to use the first $X$ to predict the second, the first two to predict the third, and so on. Then we could calculate the set of one-step-ahead prediction errors

$e(2) = X(2) - \hat X_{1,1}, \quad \ldots, \quad e(n) = X(n) - \hat X_{n-1,1},$

and, taking the best predictor of $X(1)$ given no data to be the mean of the time series, we have $e(1) = X(1)$ if the mean is zero (as is usually assumed).
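To make Remark 3 concrete, here is a sketch that computes the one-step-ahead prediction errors by solving each set of normal equations directly; in practice Levinson's recursion would do the same work far more efficiently. The name `one_step_errors`, its interface, and the data values are illustrative only.

```python
import numpy as np

def one_step_errors(x, R):
    """Sketch of Remark 3: one-step-ahead prediction errors e(1), ..., e(n)
    for mean-zero data x(1), ..., x(n), where R(k) is the autocovariance at
    lag k.  Each predictor solves the j x j normal equations directly."""
    n = len(x)
    e = np.empty(n)
    e[0] = x[0]  # best predictor of X(1) given no data is the mean, assumed zero
    for j in range(1, n):
        Gamma = np.array([[R(abs(a - b)) for b in range(j)] for a in range(j)])
        r = np.array([R(k) for k in range(1, j + 1)])
        phi = np.linalg.solve(Gamma, r)   # phi_j(1), ..., phi_j(j)
        pred = phi @ x[j - 1::-1]         # phi_j(1) x(j) + ... + phi_j(j) x(1)
        e[j] = x[j] - pred
    return e

# Example with the MA(1) autocovariances used earlier (data values are made up)
R = lambda k: {0: 5.0, 1: 2.0}.get(abs(k), 0.0)
print(one_step_errors(np.array([1.0, -0.5, 2.0]), R))
```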

Copyright © 1999 by H.J. Newton
