
Chapter 9: Nonseasonal Box-Jenkins Models

The concepts of ‘stationary time series’ and ‘nonstationary time series’ are important in the Box-Jenkins methodology.

Stationary time series

A time series $\{y_t\}$ is said to be stationary if the following two conditions are satisfied:

(a) the mean function is constant over time, i.e.,
$$\mu_t = E(y_t) = c \quad \text{for all } t;$$

(b) the autocorrelations
$$\rho_{t,s} = \frac{\operatorname{cov}(y_t, y_s)}{\sqrt{\operatorname{var}(y_t)\,\operatorname{var}(y_s)}}$$
are not functions of time, i.e., $\rho_{t,t-k} = \rho_{0,k} = \rho_k$ for all time $t$ and lag $k$. The condition is usually stated in terms of the autocovariances $\gamma_{t,s} = \operatorname{cov}(y_t, y_s)$, which must also be independent of time $t$, i.e., $\gamma_{t,t-k} = \gamma_{0,k} = \gamma_k$ for all $t$ and lag $k$.

1 Ch9
In other words, both the autocorrelations $\rho_{t,s}$ and the autocovariances $\gamma_{t,s}$ depend only on the distance between the two time points $s$ and $t$, not on the actual positions of $s$ and $t$.
Note: since $\gamma_{t,t} = \operatorname{cov}(y_t, y_t) = \operatorname{var}(y_t)$, a stationary time series must also have a variance that is constant with respect to $t$.

Nonstationary Time Series

If the $n$ observed values of $y_t$ do not fluctuate around a constant mean, or do not fluctuate with constant variation, then it is reasonable to believe that the time series is not stationary.

[Figure: Random walk with zero mean]

A nonstationary series can often be transformed into a stationary one by first differencing:
$$z_t = \nabla y_t = y_t - y_{t-1}.$$
The Minitab command for differencing is
Stat > Time Series > Difference (lag 1)

(Differencing is like differentiation in calculus.)

$$\nabla y_t = y_t - y_{t-1} \;\Longrightarrow\; \nabla y_t = \frac{\nabla y_t}{1} = \frac{y_t - y_{t-1}}{t - (t-1)},$$
which is similar to the definition of the derivative of a function $f(t)$:
$$f'(t) = \lim_{\Delta \to 0} \frac{f(t+\Delta) - f(t)}{(t+\Delta) - t} = \lim_{\Delta \to 0} \frac{f(t+\Delta) - f(t)}{\Delta}.$$
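To make the differencing step concrete, here is a minimal Python/NumPy sketch. It is our own illustration: the helper name `first_difference`, the seed and the simulated data are assumptions, not part of the notes or of Minitab. It simulates a zero-mean random walk $y_t = y_{t-1} + a_t$ and shows that first differencing recovers the white-noise shocks $a_t$, losing one observation in the process.

```python
import numpy as np

def first_difference(y):
    """Return z_t = y_t - y_{t-1} for t = 2, ..., n (one observation is lost)."""
    y = np.asarray(y, dtype=float)
    return y[1:] - y[:-1]

# Illustrative zero-mean random walk: y_t = y_{t-1} + a_t, starting from y_0 = 0
rng = np.random.default_rng(seed=1)
a = rng.normal(loc=0.0, scale=1.0, size=150)   # white-noise shocks a_1, ..., a_150
y = np.cumsum(a)                               # y_t = a_1 + a_2 + ... + a_t

z = first_difference(y)
# The first differences of a random walk are just the shocks a_2, ..., a_n
print(np.allclose(z, a[1:]))   # True
print(len(y), len(z))          # 150 149
```

`np.diff(y)` computes the same first differences; the explicit slicing simply mirrors $y_t - y_{t-1}$.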

[Figure: Time series plot of paper towel sales]

After first differencing:

[Figure: Time series plot of the first differences]

If this is not sufficient, taking second differences (the first differences of the first differences) of the original series values should normally do the job:
$$z_t = \nabla^2 y_t = \nabla y_t - \nabla y_{t-1} = (y_t - y_{t-1}) - (y_{t-1} - y_{t-2}).$$
If a time series plot indicates increasing variability, it is common to first transform the series using a square-root, quartic-root or logarithmic transformation, and then take first differences.
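The sketch below (Python/NumPy) applies the differencing operator repeatedly and illustrates the log-then-difference idea. The series are made-up deterministic examples, not data from the notes, and `difference` is a hypothetical helper name: a quadratic trend needs two differences, while exponential growth becomes constant after taking logs and one difference.

```python
import numpy as np

def difference(y, d=1):
    """Apply the differencing operator d times, returning nabla^d y (d observations are lost)."""
    z = np.asarray(y, dtype=float)
    for _ in range(d):
        z = z[1:] - z[:-1]
    return z

t = np.arange(1, 11, dtype=float)

# A quadratic trend: first differences still trend, second differences are constant
quad = t ** 2
print(difference(quad, d=1))   # 3, 5, 7, ..., 19 (still trending)
print(difference(quad, d=2))   # every value equals 2

# Exponential growth: the spread grows with the level, so take logs first, then difference
growth = 100.0 * 1.05 ** t
dlog = difference(np.log(growth), d=1)
print(np.allclose(dlog, np.log(1.05)))   # True: a constant growth rate
```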

Consider the following NCR (New Company Registrations) rates data:

[Figure: Time series plot of NCR]

The series is clearly not stationary: it has a trend and increasing variability, which means that both $E(y_t)$ and $\operatorname{var}(y_t)$ depend on the time variable $t$.

[Figure: Time series plot of lnNCR]

Clearly the log transformation has stabilised the variance somewhat.

Applying differencing to the logged series:

[Figure: Time series plot of d1lnNCR (first differences of lnNCR)]
It now appears that the resulting series is stationary.

Working Series
The textbook uses $z_b, z_{b+1}, \ldots, z_n$ as the ‘working series’ obtained from the original series by transformation or differencing. For example, $b = 2$ if $z_t = y_t - y_{t-1}$, since the first difference is not available at $t = 1$.

Sample autocorrelation coefficient (SAC)

The sample autocorrelation at lag $k$ is
$$r_k = \frac{\sum_{t=b}^{n-k} (z_t - \bar z)(z_{t+k} - \bar z)}{\sum_{t=b}^{n} (z_t - \bar z)^2}, \qquad \text{where } \bar z = \frac{\sum_{t=b}^{n} z_t}{n - b + 1}.$$

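The standard error of $r_k$ is
$$s_{r_k} = \begin{cases} \dfrac{1}{(n - b + 1)^{1/2}}, & \text{if } k = 1,\\[2ex] \dfrac{\left(1 + 2\sum_{j=1}^{k-1} r_j^2\right)^{1/2}}{(n - b + 1)^{1/2}}, & \text{if } k = 2, 3, \ldots \end{cases}$$

The $t_{r_k}$-statistic is
$$t_{r_k} = \frac{r_k}{s_{r_k}}.$$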
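The SAC, its standard errors and the $t_{r_k}$-statistics can be computed directly from the formulas above. The sketch below is a minimal NumPy version; the function names are hypothetical, and the five-point series is a toy example whose answers are easy to check by hand ($\bar z = 3$, $r_1 = 4/10 = 0.4$, $r_2 = -1/10 = -0.1$).

```python
import numpy as np

def sac(z, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag of the working series z_b, ..., z_n."""
    z = np.asarray(z, dtype=float)
    dev = z - z.mean()                  # z_t - zbar, with zbar averaged over n - b + 1 values
    denom = np.sum(dev ** 2)
    # numerator for lag k pairs z_t with z_{t+k}, for t = b, ..., n - k
    return np.array([np.sum(dev[:-k] * dev[k:]) / denom for k in range(1, max_lag + 1)])

def sac_standard_errors(r, n_working):
    """s_{r_k} = sqrt(1 + 2 * sum_{j<k} r_j^2) / sqrt(m), where m = n - b + 1 (the sum is 0 for k = 1)."""
    r = np.asarray(r, dtype=float)
    cum = np.concatenate(([0.0], np.cumsum(r ** 2)[:-1]))   # sum_{j=1}^{k-1} r_j^2 for each k
    return np.sqrt(1.0 + 2.0 * cum) / np.sqrt(n_working)

z = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # toy working series, so n - b + 1 = 5
r = sac(z, max_lag=2)
s = sac_standard_errors(r, n_working=len(z))
t = r / s                                  # |t| > 2 would indicate a spike
print(r)   # r_1 = 0.4, r_2 = -0.1
print(s)
print(t)
```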

The SAC graph is a graph of the sample autocorrelations (Minitab calls it the ACF plot):

[Figure: Autocorrelation Function for y (original towel sales), with 5% significance limits for the autocorrelations]
We say that a spike at lag $k$ exists if $r_k$ is statistically large, say $|t_{r_k}| = |r_k / s_{r_k}| > 2$.

In the Minitab ACF graph, any $r_k$ that lies above or below the significance limits is considered to be a spike, so you do not need to compute $t_{r_k}$.

Cuts off after k

We say that the SAC cuts off after lag $k$ if there are no spikes in the SAC at lags greater than $k$.

Using the SAC to find a stationary time series

For nonseasonal data:
(i) If the SAC of the time series either cuts off fairly quickly or dies down fairly quickly, then the series should be considered stationary.
(ii) If the SAC of the time series dies down extremely slowly, then the series should be considered nonstationary.

Note that the SAC of the towel sales series refuses to die down quickly, which is a clear sign that the series is nonstationary.
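To see what "dies down extremely slowly" looks like, the following sketch compares the SAC of a simulated random walk with the SAC of its first differences (simulated data with an arbitrary seed, not the towel sales series). The $2/\sqrt{n}$ threshold is only a rough lag-1 guide, consistent with $s_{r_1} = 1/(n-b+1)^{1/2}$ and the $|t_{r_k}| > 2$ rule.

```python
import numpy as np

def sac(z, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag (same formula as in the notes)."""
    z = np.asarray(z, dtype=float)
    dev = z - z.mean()
    denom = np.sum(dev ** 2)
    return np.array([np.sum(dev[:-k] * dev[k:]) / denom for k in range(1, max_lag + 1)])

rng = np.random.default_rng(seed=7)
y = np.cumsum(rng.normal(size=500))   # simulated random walk (nonstationary)
z = np.diff(y)                        # its first differences (white noise)

r_y = sac(y, max_lag=10)
r_z = sac(z, max_lag=10)
limit = 2 / np.sqrt(len(z))           # rough spike threshold at lag 1

print(np.round(r_y, 2))   # expected: near 1 and decaying very slowly
print(np.round(r_z, 2))   # expected: mostly within +/- limit
print(round(limit, 3))
```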

Sample partial autocorrelation rkk

The sample partial autocorrelation at lag $k$, $r_{kk}$, can be thought of as the sample autocorrelation of time series observations separated by a lag of $k$ time units, with the effects of the intervening observations eliminated.

In other words, this measure of correlation identifies the extent of the relationship between current values of a variable and earlier values of the same variable (at a given time lag), while holding the effects of all other time lags constant.
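The notes do not give a formula for $r_{kk}$. One standard way to compute partial autocorrelations from autocorrelations is the Durbin-Levinson recursion, sketched below in NumPy (our choice of method; the function name is hypothetical). Fed with sample autocorrelations $r_k$ it gives $r_{kk}$; fed with theoretical values it gives the theoretical partial autocorrelations, which lets us check that an AR(1) with $\rho_k = 0.6^k$ has a partial autocorrelation function that cuts off after lag 1.

```python
import numpy as np

def pacf_from_acf(r):
    """Partial autocorrelations r_kk, k = 1..len(r), via the Durbin-Levinson recursion.

    r[0] holds r_1, r[1] holds r_2, and so on (lag 0 is not included).
    """
    r = np.asarray(r, dtype=float)
    pacf = np.zeros(len(r))
    phi_prev = np.zeros(0)                     # r_{k-1,j}, j = 1..k-1
    for k in range(1, len(r) + 1):
        if k == 1:
            rkk = r[0]
            phi = np.array([rkk])
        else:
            num = r[k - 1] - np.dot(phi_prev, r[k - 2::-1])   # r_k - sum_j r_{k-1,j} r_{k-j}
            den = 1.0 - np.dot(phi_prev, r[:k - 1])           # 1 - sum_j r_{k-1,j} r_j
            rkk = num / den
            # update r_{k,j} = r_{k-1,j} - r_kk * r_{k-1,k-j}, then append r_{k,k} = r_kk
            phi = np.append(phi_prev - rkk * phi_prev[::-1], rkk)
        pacf[k - 1] = rkk
        phi_prev = phi
    return pacf

print(np.round(pacf_from_acf(0.6 ** np.arange(1, 6)), 6))   # AR(1): 0.6 then zeros
rho1 = -0.4 / (1 + 0.4 ** 2)                                  # an MA(1) autocorrelation
print(np.round(pacf_from_acf([rho1, 0.0, 0.0, 0.0]), 4))    # dies down instead
```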

Consider now the differenced series of the towel sales:

[Figure: Autocorrelation Function for z (differenced series), with 5% significance limits for the autocorrelations]

Here, the SAC cuts off after lag 1, so the differenced series is stationary.

Simple Stationary Time Series Models: ARMA

Let $\{a_t\}$ be a sequence of random shocks that describe the effect on $z_t$ of all factors other than $z_{t-1}$. Roughly speaking, $a_t$ is the residual error of the forecast (but if the residuals $e_t$ are not independent, then we cannot treat $e_t$ as $a_t$).
Note: most textbooks call $\{a_t\}$ white noise.

Properties of $\{a_t\}$:
(i) $a_1, a_2, a_3, \ldots$ are independent;
(ii) $a_i \sim N(0, \sigma_a^2)$;
(iii) $a_{t+1}$ is independent of $y_t, y_{t-1}, \ldots$.

$\{a_t\}$ plays a very important role in the Box-Jenkins methodology. Essentially, every stationary Box-Jenkins model can be expressed in terms of the white noise process.

Simple Box-Jenkins Models

Moving Average Models

We write
$$z_t = a_t - \theta_1 a_{t-1} - \cdots - \theta_q a_{t-q}$$
and refer to it as a moving average process of order $q$, denoted by MA(q). (Note that, structurally speaking, MA(q) is a weighted average of the $a_t$ terms, apart from the negative signs.)

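The special case MA(1):
$$z_t = a_t - \theta_1 a_{t-1}$$
$$E(z_t) = 0, \quad \operatorname{var}(z_t) = \sigma_a^2(1 + \theta_1^2), \quad \operatorname{cov}(z_t, z_{t+1}) = -\theta_1 \sigma_a^2, \quad \operatorname{cov}(z_t, z_{t+k}) = 0 \text{ for } k \ge 2.$$
Thus
$$\rho_1 = \frac{-\theta_1}{1 + \theta_1^2}$$
and all other $\rho_k$ ($k \ge 2$) are zero. (Make sure you know how to derive the above.)

Hence the TAC (theoretical autocorrelation function) of an MA(1) “cuts off” after lag 1.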
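For reference, here is one way to derive the MA(1) results, using only the properties of $\{a_t\}$ (independent, mean 0, variance $\sigma_a^2$):

$$E(z_t) = E(a_t) - \theta_1 E(a_{t-1}) = 0$$
$$\operatorname{var}(z_t) = \operatorname{var}(a_t) + \theta_1^2 \operatorname{var}(a_{t-1}) = \sigma_a^2(1 + \theta_1^2) \quad \text{(independence removes the cross term)}$$
$$\operatorname{cov}(z_t, z_{t+1}) = E\left[(a_t - \theta_1 a_{t-1})(a_{t+1} - \theta_1 a_t)\right] = -\theta_1 E(a_t^2) = -\theta_1 \sigma_a^2$$

For $k \ge 2$, $z_t$ involves only $a_t, a_{t-1}$ and $z_{t+k}$ involves only $a_{t+k}, a_{t+k-1}$; they share no shocks, so $\operatorname{cov}(z_t, z_{t+k}) = 0$. Dividing the lag-1 covariance by the variance gives

$$\rho_1 = \frac{-\theta_1 \sigma_a^2}{\sigma_a^2(1 + \theta_1^2)} = \frac{-\theta_1}{1 + \theta_1^2}.$$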


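$$z_t = a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2}$$
$$E(z_t) = 0, \qquad \operatorname{var}(z_t) = \sigma_a^2(1 + \theta_1^2 + \theta_2^2),$$
$$\operatorname{cov}(z_t, z_{t+1}) = (-\theta_1 + \theta_1\theta_2)\sigma_a^2, \quad \operatorname{cov}(z_t, z_{t+2}) = -\theta_2 \sigma_a^2, \quad \operatorname{cov}(z_t, z_{t+k}) = 0 \text{ for } k \ge 3,$$
$$\rho_1 = \frac{-\theta_1 + \theta_1\theta_2}{1 + \theta_1^2 + \theta_2^2}, \qquad \rho_2 = \frac{-\theta_2}{1 + \theta_1^2 + \theta_2^2},$$
and all other $\rho_k$ are zero.
Thus the TAC of an MA(2) “cuts off” after lag 2.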

In general, for MA(q):

(i) $\rho_k \ne 0$ for $k = 1, 2, \ldots, q$, and $\rho_k = 0$ for $k > q$;

(ii) the TPAC (theoretical partial autocorrelation function) dies down.
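The cut-off property can be checked numerically. The sketch below (NumPy; `ma_acf` is a hypothetical helper) writes $z_t = \sum_j c_j a_{t-j}$ with $c_0 = 1$ and $c_j = -\theta_j$, so that $\operatorname{cov}(z_t, z_{t+k}) = \sigma_a^2 \sum_j c_j c_{j+k}$. It reproduces the MA(2) formulas above for the arbitrary values $\theta_1 = 0.5$, $\theta_2 = 0.3$.

```python
import numpy as np

def ma_acf(theta, max_lag):
    """Theoretical rho_1..rho_max_lag of z_t = a_t - theta_1 a_{t-1} - ... - theta_q a_{t-q}."""
    c = np.concatenate(([1.0], -np.asarray(theta, dtype=float)))   # weights on a_t, a_{t-1}, ...
    gamma0 = np.sum(c ** 2)                                       # var(z_t) / sigma_a^2
    rho = []
    for k in range(1, max_lag + 1):
        # cov(z_t, z_{t+k}) / sigma_a^2 = sum_j c_j c_{j+k}, which is zero once k > q
        rho.append(np.sum(c[:-k] * c[k:]) / gamma0 if k < len(c) else 0.0)
    return np.array(rho)

theta1, theta2 = 0.5, 0.3
rho = ma_acf([theta1, theta2], max_lag=4)
denom = 1 + theta1 ** 2 + theta2 ** 2
print(np.allclose(rho[:2], [(-theta1 + theta1 * theta2) / denom, -theta2 / denom]))  # True
print(rho[2:])   # zeros: the TAC cuts off after lag q = 2
```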

Autoregressive Models

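$$z_t = \phi_1 z_{t-1} + \phi_2 z_{t-2} + \cdots + \phi_p z_{t-p} + a_t$$

Here $z_t$ is regressed on itself (hence, of course, the name), but lagged by various amounts. The simplest case is the first-order model, denoted AR(1), which takes the form
$$z_t = \phi_1 z_{t-1} + a_t,$$
$$E(z_t) = 0, \qquad \operatorname{var}(z_t) = \gamma_0 = \frac{\sigma_a^2}{1 - \phi_1^2},$$
so we need $|\phi_1| < 1$ to ensure stationarity (otherwise the variance would not be finite and positive), and
$$\gamma_k = \phi_1 \gamma_{k-1}, \qquad \rho_k = \phi_1^k.$$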

Thus $\rho_k$ “dies down” exponentially as $k$ increases, oscillating in sign if $\phi_1 < 0$. Hence, if the TAC of a series dies down rather than cuts off, we suspect it to be an AR rather than an MA process.
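A quick simulation check of the AR(1) results, in Python/NumPy. The helper names, $\phi_1 = -0.7$, the seed and the sample size are arbitrary choices for illustration. With a negative $\phi_1$ the sample autocorrelations should alternate in sign while shrinking roughly like $\phi_1^k$, and the sample variance should be close to $\sigma_a^2/(1 - \phi_1^2)$.

```python
import numpy as np

def simulate_ar1(phi, n, sigma_a=1.0, burn_in=500, seed=0):
    """Simulate z_t = phi z_{t-1} + a_t, discarding a burn-in so the start value z_0 = 0 is forgotten."""
    rng = np.random.default_rng(seed)
    a = rng.normal(0.0, sigma_a, size=n + burn_in)
    z = np.zeros(n + burn_in)
    for t in range(1, n + burn_in):
        z[t] = phi * z[t - 1] + a[t]
    return z[burn_in:]

def sac(z, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag."""
    dev = z - z.mean()
    denom = np.sum(dev ** 2)
    return np.array([np.sum(dev[:-k] * dev[k:]) / denom for k in range(1, max_lag + 1)])

phi = -0.7
z = simulate_ar1(phi, n=20000, seed=42)
print(np.round(sac(z, 4), 3))        # compare with phi**k = -0.7, 0.49, -0.343, 0.2401
print(round(z.var(), 3), round(1 / (1 - phi ** 2), 3))   # sample vs theoretical variance
```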

Note that AR and MA series are not entirely unrelated. It can be shown that an AR(1) can be expressed as an “infinite” MA series, much like the general linear process. The MA(1) can similarly be expressed as an “infinite” AR series (provided $|\theta_1| < 1$).

Note: a general linear process is a time series of the form
$$y_t = a_t + \psi_1 a_{t-1} + \psi_2 a_{t-2} + \cdots$$
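The $\psi$-weights of this linear-process form can be computed for any ARMA model by matching powers of $B$ in $\phi_p(B)\psi(B) = \theta_q(B)$, which gives $\psi_0 = 1$ and $\psi_j = -\theta_j + \sum_{i=1}^{\min(j,p)} \phi_i \psi_{j-i}$ (with $\theta_j = 0$ for $j > q$). The NumPy sketch below (hypothetical helper name, textbook sign convention for $\theta$) confirms that the AR(1) becomes an infinite MA with $\psi_j = \phi_1^j$, while the MA(1) is already its own finite expansion.

```python
import numpy as np

def psi_weights(phi, theta, n_weights):
    """psi_0..psi_{n_weights-1} for phi(B) z_t = theta(B) a_t, where theta(B) = 1 - theta_1 B - ..."""
    phi = np.asarray(phi, dtype=float)
    theta = np.asarray(theta, dtype=float)
    psi = np.zeros(n_weights)
    psi[0] = 1.0
    for j in range(1, n_weights):
        ma_part = -theta[j - 1] if j <= len(theta) else 0.0             # -theta_j (0 beyond q)
        ar_part = sum(phi[i - 1] * psi[j - i] for i in range(1, min(j, len(phi)) + 1))
        psi[j] = ma_part + ar_part
    return psi

print(psi_weights([0.6], [], 5))    # AR(1) as an "infinite" MA: 1, 0.6, 0.36, 0.216, ...
print(psi_weights([], [0.4], 4))    # MA(1): 1, -0.4, 0, 0
```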

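The AR(2) can be written as
$$z_t = \phi_1 z_{t-1} + \phi_2 z_{t-2} + a_t,$$
with
$$\rho_1 = \frac{\phi_1}{1 - \phi_2}, \qquad \rho_2 = \phi_1\rho_1 + \phi_2, \qquad \rho_3 = \phi_1\rho_2 + \phi_2\rho_1, \qquad \ldots$$

Thus again the TAC dies down rather than cuts off, though at times it is difficult to tell the difference between the TACs of an AR(1) and an AR(2).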
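The AR(2) pattern is easy to generate with the recursion $\rho_k = \phi_1 \rho_{k-1} + \phi_2 \rho_{k-2}$ for $k \ge 2$, starting from $\rho_0 = 1$ and $\rho_1 = \phi_1/(1 - \phi_2)$. Below is a minimal sketch (hypothetical helper name; $\phi_1 = 0.5$, $\phi_2 = 0.3$ are arbitrary stationary values). Setting $\phi_2 = 0$ recovers the AR(1) result $\rho_k = \phi_1^k$.

```python
import numpy as np

def ar2_acf(phi1, phi2, max_lag):
    """Theoretical rho_1..rho_max_lag of z_t = phi1 z_{t-1} + phi2 z_{t-2} + a_t (assumed stationary)."""
    rho = [1.0, phi1 / (1.0 - phi2)]                           # rho_0, rho_1
    for k in range(2, max_lag + 1):
        rho.append(phi1 * rho[k - 1] + phi2 * rho[k - 2])      # rho_k = phi1 rho_{k-1} + phi2 rho_{k-2}
    return np.array(rho[1:max_lag + 1])

print(np.round(ar2_acf(0.5, 0.3, 6), 4))   # dies down rather than cutting off
print(np.round(ar2_acf(0.6, 0.0, 4), 4))   # phi2 = 0 gives the AR(1): 0.6, 0.36, 0.216, 0.1296
```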

The TPAC of an AR(2) has nonzero partial autocorrelations at lags 1 and 2 and zero at all lags after lag 2, i.e., it cuts off after lag 2.

In general, for AR(p), the TAC dies down and the TPAC cuts off after lag $p$.

ARMA(p, q)
Mixed autoregressive-moving average models

The model can be written as
$$z_t = \phi_1 z_{t-1} + \phi_2 z_{t-2} + \cdots + \phi_p z_{t-p} + a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2} - \cdots - \theta_q a_{t-q}$$
or
$$z_t - \phi_1 z_{t-1} - \phi_2 z_{t-2} - \cdots - \phi_p z_{t-p} = a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2} - \cdots - \theta_q a_{t-q},$$
i.e., we move the autoregressive part to the left-hand side and keep the moving average part on the right.

ARMA(1, 1)

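$$z_t = \phi_1 z_{t-1} + a_t - \theta_1 a_{t-1}$$
$$\rho_k = \frac{(1 - \theta_1\phi_1)(\phi_1 - \theta_1)}{1 - 2\theta_1\phi_1 + \theta_1^2}\,\phi_1^{k-1}, \qquad k \ge 1,$$
i.e., the TAC dies down exponentially from $\rho_1$ (not from $\rho_0 = 1$).

The TPAC also dies down exponentially.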
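A small sketch of the ARMA(1, 1) autocorrelation formula (NumPy; hypothetical helper name, arbitrary parameter values). Two sanity checks follow from the formula itself: with $\theta_1 = 0$ it reduces to the AR(1) result $\rho_k = \phi_1^k$, and with $\phi_1 = 0$ it reduces to the MA(1) result $\rho_1 = -\theta_1/(1 + \theta_1^2)$, $\rho_k = 0$ for $k \ge 2$. For $k \ge 2$ each autocorrelation is $\phi_1$ times the previous one.

```python
import numpy as np

def arma11_acf(phi1, theta1, max_lag):
    """Theoretical rho_1..rho_max_lag of z_t = phi1 z_{t-1} + a_t - theta1 a_{t-1} (|phi1| < 1 assumed)."""
    rho1 = (1 - theta1 * phi1) * (phi1 - theta1) / (1 - 2 * theta1 * phi1 + theta1 ** 2)
    k = np.arange(1, max_lag + 1)
    return rho1 * phi1 ** (k - 1)     # dies down geometrically from rho_1, not from rho_0 = 1

print(np.round(arma11_acf(0.8, 0.3, 5), 4))
print(np.round(arma11_acf(0.6, 0.0, 3), 4))   # AR(1) special case: 0.6, 0.36, 0.216
print(np.round(arma11_acf(0.0, 0.5, 3), 4))   # MA(1) special case: -0.4, 0, 0
```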


We can therefore tentatively produce a Model Identification Chart, as follows, based on the behaviours of the SAC and SPAC (sample partial autocorrelation function) of a stationary series:

| SAC behaviour | SPAC behaviour | Tentative model |
|---|---|---|
| Cuts off after lag 1 | Dies down | MA(1) |
| Cuts off after lag 2 | Dies down | MA(2) |
| Dies down | Cuts off after lag 1 | AR(1) |
| Dies down | Cuts off after lag 2 | AR(2) |
| Dies down | Dies down | ARMA(1, 1) |

This looks relatively obvious, but it is not as easy in practice as it appears. Note that no ARMA process (apart from pure white noise) has an ACF and a PACF that both cut off.

Box-Jenkins Models with a nonzero constant

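For an MA(q) with a constant term,
$$z_t = \delta + a_t - \theta_1 a_{t-1} - \cdots - \theta_q a_{t-q}, \qquad E(z_t) = \mu = \delta.$$

For an AR(p) with a constant term,
$$z_t = \delta + \phi_1 z_{t-1} + \phi_2 z_{t-2} + \cdots + \phi_p z_{t-p} + a_t, \qquad \delta = \mu(1 - \phi_1 - \phi_2 - \cdots - \phi_p),$$
so that, for example, for an AR(2), $\mu = \delta/(1 - \phi_1 - \phi_2)$.

For an ARMA(p, q) with a constant term,
$$z_t = \delta + \phi_1 z_{t-1} + \phi_2 z_{t-2} + \cdots + \phi_p z_{t-p} + a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2} - \cdots - \theta_q a_{t-q}, \qquad \delta = \mu(1 - \phi_1 - \phi_2 - \cdots - \phi_p).$$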
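The relation between $\delta$ and $\mu$ comes from taking expectations of both sides of the model: for a stationary series $E(z_t) = \mu$ for every $t$ and $E(a_t) = 0$, so $\mu = \delta + \phi_1 \mu + \cdots + \phi_p \mu$. The simulation sketch below checks this for an AR(2). The values $\delta = 3$, $\phi_1 = 0.5$, $\phi_2 = 0.2$ (so $\mu = 10$), the seed and the sample size are arbitrary assumptions.

```python
import numpy as np

delta, phi1, phi2 = 3.0, 0.5, 0.2
mu = delta / (1 - phi1 - phi2)        # mean implied by delta = mu (1 - phi1 - phi2)

rng = np.random.default_rng(seed=3)
n, burn_in = 50_000, 1_000
a = rng.normal(size=n + burn_in)
z = np.full(n + burn_in, mu)          # start at the mean so the burn-in is short
for t in range(2, n + burn_in):
    z[t] = delta + phi1 * z[t - 1] + phi2 * z[t - 2] + a[t]

print(round(mu, 4), round(z[burn_in:].mean(), 2))   # the sample mean should be close to mu
```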

Time Series Operations and Representation of ARMA(p, q) Models

Backshift Operator
$$B y_t = y_{t-1}$$
($B$ pushes the time series back to the previous time point.)

Difference operator

$\nabla = 1 - B$, so $\nabla y_t = (1 - B) y_t = y_t - y_{t-1}$. Thus $\nabla$ is generally known as the differencing operator.
$$\nabla^2 y_t = \nabla(\nabla y_t) = \nabla(y_t - y_{t-1}) = (y_t - y_{t-1}) - (y_{t-1} - y_{t-2}) = y_t - 2y_{t-1} + y_{t-2}$$
Also $\nabla^d = (1 - B)^d$.
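Because $B$ behaves algebraically like a variable, lag polynomials can be multiplied like ordinary polynomials. The NumPy sketch below (hypothetical helper names) expands $(1 - B)^d$ by repeated convolution with the coefficients of $1 - B$, then applies the resulting polynomial to a short made-up series; for $d = 2$ this reproduces $y_t - 2y_{t-1} + y_{t-2}$, the same as `np.diff(y, n=2)`.

```python
import numpy as np

def diff_operator_coeffs(d):
    """Coefficients of (1 - B)^d, lowest power of B first: [c_0, c_1, ..., c_d]."""
    coeffs = np.array([1.0])
    for _ in range(d):
        coeffs = np.convolve(coeffs, [1.0, -1.0])   # multiply the polynomial by (1 - B)
    return coeffs

def apply_lag_polynomial(coeffs, y):
    """Compute sum_j c_j B^j y_t = sum_j c_j y_{t-j} for every t at which all lags exist."""
    y = np.asarray(y, dtype=float)
    m = len(coeffs) - 1
    return np.array([sum(c * y[t - j] for j, c in enumerate(coeffs)) for t in range(m, len(y))])

print(diff_operator_coeffs(2))        # 1, -2, 1, i.e. y_t - 2 y_{t-1} + y_{t-2}
y = np.array([3.0, 5.0, 10.0, 18.0, 29.0])
print(apply_lag_polynomial(diff_operator_coeffs(2), y))   # 3, 3, 3 (same as np.diff(y, n=2))
```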

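Representation of an ARMA(p, q) model:

AR(p) – autoregressive model of order p

The model is written as
$$z_t = \delta + \phi_1 z_{t-1} + \cdots + \phi_p z_{t-p} + a_t \iff z_t - \phi_1 z_{t-1} - \cdots - \phi_p z_{t-p} = \delta + a_t,$$
which can also be written as
$$(1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p)\, z_t = \delta + a_t.$$
Defining $\phi_p(B) = 1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p$, we have
$$\phi_p(B)\, z_t = \delta + a_t.$$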

MA(q) – moving average model of order q

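The model is written as
$$z_t = \delta + a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2} - \cdots - \theta_q a_{t-q},$$
which can also be written as
$$z_t = \delta + (1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q)\, a_t.$$
Defining $\theta_q(B) = 1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q$, we have
$$z_t = \delta + \theta_q(B)\, a_t.$$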

ARMA(p, q) – mixed autoregressive-moving average model of order (p, q):

$$z_t = \delta + \phi_1 z_{t-1} + \phi_2 z_{t-2} + \cdots + \phi_p z_{t-p} + a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2} - \cdots - \theta_q a_{t-q}$$
$$\iff z_t - \phi_1 z_{t-1} - \phi_2 z_{t-2} - \cdots - \phi_p z_{t-p} = \delta + a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2} - \cdots - \theta_q a_{t-q}$$
$$\iff (1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p)\, z_t = \delta + (1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q)\, a_t$$
$$\phi_p(B)\, z_t = \delta + \theta_q(B)\, a_t \qquad (*)$$
where $\theta_q(B) = 1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q$.

In this notation, ARMA(p, 0) = AR(p) and ARMA(0, q) = MA(q). In such cases one would prefer to write AR(p) and MA(q) instead of ARMA(p, 0) and ARMA(0, q).

Point Estimates of the Model Parameters
Having identified a tentative ARMA model, we must now fit it to the dataset concerned, and in so doing obtain estimates of the parameters defined by the model. For the ARMA(p, q) model, the parameters are $\theta_i$, $\phi_i$ and $\delta$ (if the constant term is included).

These parameters are commonly estimated by the least squares method (as we understand it, both Minitab and SAS use this approach). The least squares method essentially finds the estimates that minimise
$$SSE = \sum_t (y_t - \hat y_t)^2.$$

You do not need to know the detailed algorithm.

Isn’t it nice that the computer packages do it for us?
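For the curious, here is a simplified conditional least-squares sketch for an MA(1), not necessarily what Minitab or SAS do internally. Assumptions: the residuals are rebuilt recursively as $a_t = z_t + \theta_1 a_{t-1}$ starting from $a_0 = 0$, $\theta_1$ is chosen by a grid search over $(-0.95, 0.95)$, and the data are simulated with $\theta_1 = -0.35$ rather than taken from the towel example. The helper names are hypothetical.

```python
import numpy as np

def ma1_residuals(z, theta1):
    """Residuals a_t = z_t + theta1 * a_{t-1} for z_t = a_t - theta1 a_{t-1}, starting from a_0 = 0."""
    a = np.zeros(len(z))
    prev = 0.0
    for t, zt in enumerate(z):
        a[t] = zt + theta1 * prev
        prev = a[t]
    return a

def fit_ma1_cls(z, grid=None):
    """Choose theta1 on a grid to minimise SSE = sum of squared residuals."""
    if grid is None:
        grid = np.linspace(-0.95, 0.95, 381)   # step 0.005
    sse = [np.sum(ma1_residuals(z, th) ** 2) for th in grid]
    return grid[int(np.argmin(sse))]

# Simulated MA(1) data (illustrative only)
theta_true = -0.35
rng = np.random.default_rng(seed=11)
a = rng.normal(size=2001)
z = a[1:] - theta_true * a[:-1]          # z_t = a_t - theta1 a_{t-1}

print(fit_ma1_cls(z))                    # should be near -0.35
```

The effect of the arbitrary start-up value $a_0 = 0$ dies out geometrically when $|\theta_1| < 1$, which is why this conditional approach works reasonably well for long series.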

What is the meaning of forecasting?
$\hat y_{t+\tau}(t)$ is a point forecast of the series at time $t + \tau$, given that the series has been observed from time 1 to time $t$. Statistically speaking,
$$\hat y_{t+\tau}(t) = E(y_{t+\tau} \mid y_1, y_2, \ldots, y_t).$$

Since ARMA models are built upon the series $\{a_t\}$, the properties of $\{a_t\}$ need to be revisited. In particular, $a_1, a_2, a_3, \ldots$ are independent, and future values of the $a$'s are independent of the present and past values of the $y$'s, i.e., $a_{t+1}$ is independent of $y_t, y_{t-1}, \ldots$.

Example: Paper Towel Sales

It is found that the differenced series can be fitted by an MA(1) model, so
$$z_t = a_t - \theta_1 a_{t-1}$$
(assuming $\delta = 0$). Since $z_t = y_t - y_{t-1}$,
$$y_t - y_{t-1} = a_t - \theta_1 a_{t-1} \iff y_t = y_{t-1} + a_t - \theta_1 a_{t-1}.$$
(This is known as the difference-equation form of the model.)

One-step forecast:

First, we have $y_{t+1} = y_t + a_{t+1} - \theta_1 a_t$, so
$$\hat y_{t+1}(t) = E(y_{t+1} \mid y_1, y_2, \ldots, y_t) = E(y_t + a_{t+1} - \theta_1 a_t \mid y_1, \ldots, y_t) = y_t + 0 - \hat\theta_1 \hat a_t = y_t - \hat\theta_1 \hat a_t,$$
since $a_{t+1}$ is independent of $y_1, \ldots, y_t$, so $E(a_{t+1} \mid y_1, y_2, \ldots, y_t) = E(a_{t+1}) = 0$.

Let $t = 120$ and $\tau = 1$, so
$$\hat y_{121}(120) = y_{120} - \hat\theta_1 \hat a_{120}.$$

In the absorbent towel sales example given in Table 9.1, Minitab gives $\hat\theta_1 = -0.3544$:

Final Estimates of Parameters

| Type | Coef | SE Coef | T | P |
|---|---|---|---|---|
| MA 1 | -0.3544 | 0.0864 | -4.10 | 0.000 |

Differencing: 1 regular difference
Number of observations: original series 120, after differencing 119
Residuals: SS = 127.367 (backforecasts excluded), MS = 1.079, DF = 118

Modified Box-Pierce (Ljung-Box) Chi-Square statistic

| Lag | 12 | 24 | 36 | 48 |
|---|---|---|---|---|
| Chi-Square | 10.3 | 18.6 | 27.5 | 41.2 |
| DF | 11 | 23 | 35 | 47 |
| P-Value | 0.500 | 0.725 | 0.815 | 0.710 |

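The last two residuals are $e_{119} = -1.0890$ and $e_{120} = 0.6903$, so $\hat a_{119} = -1.0890$ and $\hat a_{120} = 0.6903$. Hence
$$\hat y_{121}(120) = y_{120} - \hat\theta_1 \hat a_{120} = 15.6453 + 0.3544 \times 0.6903 = 15.8899.$$

Using Minitab to forecast, we get:

Forecasts from period 120

| Period | Forecast | 95% Lower | 95% Upper | Actual |
|---|---|---|---|---|
| 121 | 15.8899 | 13.8532 | 17.9267 | |

The Minitab forecast is identical to our hand calculation.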
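The hand calculation is simple enough to script. The sketch below (plain Python; the function name is hypothetical) plugs in the values from the Minitab output and the residuals quoted above; $y_{120} = 15.6453$ is the value implied by the calculation $\hat y_{121}(120) = y_{120} - \hat\theta_1 \hat a_{120}$.

```python
theta1_hat = -0.3544   # Minitab estimate of theta_1
y_120 = 15.6453        # last observed value, as used in the hand calculation
a_hat_120 = 0.6903     # last residual e_120, used as the estimate of a_120

def one_step_forecast(y_t, a_hat_t, theta1):
    """One-step forecast for y_t = y_{t-1} + a_t - theta1 a_{t-1}: yhat_{t+1}(t) = y_t - theta1 * a_hat_t."""
    return y_t - theta1 * a_hat_t

print(round(one_step_forecast(y_120, a_hat_120, theta1_hat), 4))   # 15.8899
```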

Two-step forecast:
$$y_{t+2} = y_{t+1} + a_{t+2} - \theta_1 a_{t+1} \implies \hat y_{t+2}(t) = \hat y_{t+1}(t) + E(a_{t+2}) - \hat\theta_1 E(a_{t+1}) = \hat y_{t+1}(t).$$

Again, let $t = 120$; then $\hat y_{122}(120) = \hat y_{121}(120) = 15.8899$.

However, the prediction interval is wider:

Forecasts from period 120

| Period | Forecast | 95% Lower | 95% Upper | Actual |
|---|---|---|---|---|
| 121 | 15.8899 | 13.8532 | 17.9267 | |
| 122 | 15.8899 | 12.4609 | 19.3189 | |
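Why the interval widens can be sketched with the $\psi$-weights of the fitted model. Writing $y_t = y_{t-1} + a_t - \theta_1 a_{t-1}$ in linear-process form gives $\psi_0 = 1$ and $\psi_j = 1 - \theta_1$ for $j \ge 1$, so the $\tau$-step forecast error has variance $\sigma_a^2\left[1 + (\tau - 1)(1 - \theta_1)^2\right]$. The sketch below assumes this standard formula, uses the residual MS = 1.079 as the estimate of $\sigma_a^2$ and a normal multiplier of 1.96; it reproduces the Minitab limits to about two decimal places, though Minitab's exact computation may differ slightly.

```python
import math

theta1_hat = -0.3544
mse = 1.079            # residual MS from the Minitab output, used as the estimate of sigma_a^2
forecast = 15.8899     # yhat_121(120) = yhat_122(120)

def arima011_half_width(tau, theta1, sigma2, z=1.96):
    """Half-width z * sqrt(sigma2 * (1 + (tau - 1) * (1 - theta1)^2)) of a tau-step-ahead interval."""
    psi = 1.0 - theta1                        # psi_j = 1 - theta1 for j >= 1 (psi_0 = 1)
    var = sigma2 * (1.0 + (tau - 1) * psi ** 2)
    return z * math.sqrt(var)

for tau in (1, 2):
    h = arima011_half_width(tau, theta1_hat, mse)
    print(120 + tau, round(forecast - h, 4), round(forecast + h, 4))
```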

Finally, in ARIMA notation, the model that fits the original series is an ARIMA(0, 1, 1) model:
$$(1 - B)\, y_t = (1 - \theta_1 B)\, a_t, \qquad \hat\theta_1 = -0.3544.$$


