Chapter 3, Part IV: The Box-Jenkins Approach to Model Building

The ARMA models have been found to be quite useful for describing stationary nonseasonal time
series. A partial explanation for this fact is provided by Wold's Theorem: "Any stationary series can be
expressed as the sum of two components: a perfectly forecastable series and a moving average of possibly infinite order." In practice, the only perfectly forecastable aspect of an economic series is the seasonal component, if any. Thus, stationary nonseasonal series can always be represented by an MA($\infty$) model,
which in turn can usually be approximated by an ARMA(p,q) model with p + q small (i.e., with a small
number of parameters). Thus, the ARMA models can typically provide an accurate yet parsimonious
description of stationary nonseasonal series.
In fact, most economic series are nonstationary and have a seasonal component. This does not
degrade the usefulness of ARMA models, however, since the raw data may typically be processed
(often by some form of differencing) to produce an approximately stationary, nonseasonal series. This
series may be forecast by fitting an appropriate ARMA model. Forecasts of the original series may then
be obtained by reversing the processing operation.
Specifically, the processing proceeds as follows. Seasonal components may be removed by a technique called "seasonal differencing", discussed in Chapter 4. Nonstationarity can often be classified as
a "trend in mean" or a "trend in variance". Trends in mean can usually be handled by ordinary
differencing. An example is the series $x_t = (a + bt) + \varepsilon_t$. Trends in variance can often be converted
into trends in mean by taking logarithms, as with the series $x_t = \exp(a + bt)\exp(\varepsilon_t)$. The trend in
mean of $\log x_t$ can then be removed by differencing. Since the techniques just described are reasonably
effective, we can safely assume that our data (after being suitably processed) form a stationary nonseasonal time series.
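The processing and its reversal can be sketched in a few lines. This is only an illustration under stated assumptions: the values of a, b, and the noise scale are made up, and the helper names `log_difference` and `undo_log_difference` are hypothetical, not part of any package used in the course.

```python
import numpy as np

def log_difference(x):
    """Return the first difference of log(x); x must be strictly positive."""
    return np.diff(np.log(np.asarray(x, dtype=float)))

def undo_log_difference(first_value, diffs):
    """Rebuild the original series from its first value and the log differences."""
    log_x = np.log(first_value) + np.concatenate(([0.0], np.cumsum(diffs)))
    return np.exp(log_x)

# Illustrative series x_t = exp(a + b t) exp(eps_t) with made-up a = 0.5, b = 0.02.
rng = np.random.default_rng(0)
t = np.arange(1, 201)
eps = rng.normal(scale=0.1, size=t.size)
x = np.exp(0.5 + 0.02 * t) * np.exp(eps)

w = log_difference(x)                   # w_t = b + eps_t - eps_{t-1}: constant mean b
x_back = undo_log_difference(x[0], w)   # reversing the processing recovers x
```

The same reversal (cumulative sum, then exponential) would be applied to forecasts of the processed series to obtain forecasts of the original series.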

What Is Model Building?

So far, in our discussions of forecasting for stationary series, we have assumed that the series
actually obeys an ARMA(p,q) model, that the model orders (i.e., p and q ) are known, and that the
corresponding parameter values are known as well. In practice, we will simply have a series of data
values, and none of these assumptions will be valid. Indeed, it is highly doubtful that our stationary
series obeys an exact ARMA model. The main justification for using such a model is not that we
believe it actually holds, but instead that we believe it can provide an accurate, parsimonious description of the data, as discussed above. Still, some important questions remain: What are the appropriate
values for (p, q), and how should we estimate the corresponding parameter values? Box and Jenkins
refer to these respectively as the identification and estimation stages of model building. We will
describe how these two stages are implemented. Note that once a model has been identified and its
parameters estimated, the result is taken to be the true model and forecasts are obtained accordingly. It
is worth remembering, however, that the fitted model is almost certainly not identical to the true model.
This can result in a type of forecasting error (essentially ignored by most authors) which cannot be
easily gauged, and which can in fact be quite devastating. As a minimum protection against such problems, we must check that the fitted model is (or at least seems to be) adequate. Such diagnostic checking is the final stage of the Box-Jenkins approach, and will be described.

Model Identification: The Correlogram and Partial Correlogram

The class of ARMA models is quite large, and in practice we must decide which of these models
is most appropriate for the data at hand, $x_1, x_2, \ldots, x_n$. The correlogram and partial correlogram are
two simple diagrams which can help us to make this decision (i.e., to "identify the model").
We first describe the correlogram, since it is conceptually the simplest. The theoretical correlogram is a plot of the theoretical autocorrelations
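$$\rho_k = \mathrm{corr}(x_t, x_{t-k})$$
against k. The sample correlogram is a plot against k of the estimated autocorrelations
$$r_k = \frac{\sum_{t=k+1}^{n} (x_t - \bar{x})(x_{t-k} - \bar{x})}{\sum_{t=1}^{n} (x_t - \bar{x})^2},$$
where $\bar{x}$ is the sample mean of $x_1, \ldots, x_n$.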

If the series were actually MA(q), its theoretical correlogram would "cut off" (i.e., take the value
zero) for k > q . Thus, we would expect that the sample correlogram would have a similar (though not
identical) shape to the theoretical correlogram, and would therefore stay reasonably close to zero for
k > q. Reversing this reasoning, we get the following rule: If the correlogram seems to cut off for
k > q, then the appropriate model is MA(q).
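As a rough illustration of the cutoff, the sample autocorrelations can be computed directly from their definition and applied to a simulated MA(1) series. The function name `sample_autocorrelations`, the coefficient 0.8, the sample size, and the seed are all assumptions chosen for this sketch.

```python
import numpy as np

def sample_autocorrelations(x, max_lag):
    """Return r_1, ..., r_max_lag: lagged cross-products of deviations from the
    sample mean, divided by the total sum of squared deviations."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    denom = np.sum(d * d)
    # d[k:] holds x_t - xbar for t = k+1..n; d[:-k] holds x_{t-k} - xbar.
    return np.array([np.sum(d[k:] * d[:-k]) / denom for k in range(1, max_lag + 1)])

# Simulated MA(1): x_t = eps_t + 0.8 eps_{t-1} (coefficient chosen for illustration).
rng = np.random.default_rng(0)
eps = rng.normal(size=5001)
x = eps[1:] + 0.8 * eps[:-1]

r = sample_autocorrelations(x, 5)
print(np.round(r, 3))   # r_1 should be clearly nonzero, r_2..r_5 close to zero
```

For this model the theoretical value of the first autocorrelation is 0.8 / (1 + 0.8^2), about 0.49, and all higher autocorrelations are zero, so the printed values should show the cutoff after lag 1 up to sampling error.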
For AR(p) models, the autocorrelations $\rho_k$ are approximately (for large enough k) $\rho_k = A\theta^k$,
where $|\theta| < 1$. Thus, for k large (say $k \geq p$), the correlogram would be expected to decline steadily (if
$\theta > 0$) or be bounded by a pair of declining curves (if $\theta < 0$). This pattern of decline can often be distinguished from the "cutoff" described earlier, and should be taken as evidence that the correct model is
not MA. To actually identify an AR model, however, we need a diagram which will have a more distinctive shape when the series is actually AR. The partial correlogram is such a diagram.
To define partial correlations, suppose we fit an AR(k) model to our data:
$$x_t = a_{k1} x_{t-1} + a_{k2} x_{t-2} + \cdots + a_{kk} x_{t-k} + \varepsilon_t$$

Then $a_{kk}$ is the estimate of the coefficient of $x_{t-k}$ when a kth-order AR is fitted. Rewriting this as
$$x_t - [a_{k1} x_{t-1} + \cdots + a_{k(k-1)} x_{t-(k-1)}] = a_{kk} x_{t-k} + \varepsilon_t$$

we see that $a_{kk}$ is a plausible estimate of the correlation between $x_{t-k}$ and that part of $x_t$ which cannot
be forecast from $x_{t-1}, \ldots, x_{t-(k-1)}$. $a_{kk}$ is called the partial correlation between $x_t$ and $x_{t-k}$. It is the
estimated correlation between $x_t$ and $x_{t-k}$ after the effects of all intermediate x's on this correlation are
taken out.
Clearly, if the series is actually AR(p), then the theoretical partial correlations $a_{kk}$ will be zero for
k > p . Thus, we can use the partial correlogram (i.e., a plot of the estimated partial correlation
coefficients) to identify AR models: If the partial correlogram cuts off for k > p , then the appropriate
model is AR(p).
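The cutoff can be checked numerically. The sketch below assumes one particular way of "fitting an AR(k)": solving the linear system that links the autocorrelations to the AR coefficients, for k = 1, 2, ..., and keeping the last coefficient each time. The text does not prescribe a fitting method at this point, and the function name `partial_autocorrelations` is hypothetical.

```python
import numpy as np

def partial_autocorrelations(rho):
    """Given rho_1..rho_K, return a_11, a_22, ..., a_KK.

    For each k, the AR(k) coefficients solve the k-by-k linear system
    sum_j a_kj * rho_|i-j| = rho_i (i = 1..k); a_kk is the last coefficient.
    """
    rho = np.asarray(rho, dtype=float)
    full = np.concatenate(([1.0], rho))          # full[j] = rho_j, with rho_0 = 1
    pacf = []
    for k in range(1, len(rho) + 1):
        idx = np.abs(np.subtract.outer(np.arange(k), np.arange(k)))
        R = full[idx]                            # R[i, j] = rho_|i-j|
        coeffs = np.linalg.solve(R, full[1:k + 1])
        pacf.append(coeffs[-1])
    return np.array(pacf)

# Theoretical autocorrelations of an AR(1) with a = 0.6 are rho_k = 0.6**k.
rho_ar1 = 0.6 ** np.arange(1, 6)
pacf_ar1 = partial_autocorrelations(rho_ar1)     # 0.6 at lag 1, then zeros
```

Passing estimated autocorrelations $r_1, \ldots, r_K$ instead of theoretical ones gives the values plotted in the sample partial correlogram, under the same assumption about the fitting method.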
There is an interesting duality (symmetry) between the properties of the correlogram and partial
correlogram for pure AR and pure MA models. The behavior of a given diagram for a given model
type is the same as the behavior of the other diagram for the other model type. We have already seen
some evidence of this: The correlogram for an MA model and the partial correlogram for an AR model
both cut off. As we know, the correlogram for an AR model dies down (but does not cut off). It can be
shown that the partial correlogram for an MA model dies down as well.


A still unanswered question is how we can identify a mixed ARMA model. In this case, it can be
shown that the correlogram and partial correlogram both die down (but do not cut off). Thus, if both
diagrams die down, we can conclude that the appropriate model is ARMA. Unfortunately, though, the
diagrams do not in this case help us to decide on the order (p, q) of the mixed model.
The following table summarizes the behavior of the diagrams.
Behavior of Correlogram and Partial Correlogram for Various Models
Model    Correlogram    Partial Correlogram
AR       Dies Down      Cuts Off
MA       Cuts Off       Dies Down
ARMA     Dies Down      Dies Down

After examining the correlogram and partial correlogram in the light of the above described properties, we should be able to select a few models which seem appropriate. (Unfortunately, the observed
patterns are often not so clear as to unambiguously point to a single model.) Another guiding principle
in model identification is that of parsimony: The total number of parameters in the model should be as
small as possible (e.g., 3 or fewer, in the view of Box and Jenkins), subject to the restriction that the
model provide an adequate description of the data. If two models appear to fit the data equally well, the
one with fewer parameters will always be preferred. Indeed, in this case the one with fewer
parameters will almost certainly produce the better forecasts. One reason is that we can obtain more precise (stable) parameter estimates if the number of parameters is small.
Besides facilitating the identification of models for stationary series, the correlogram can also
diagnose nonstationarity. If a series is nonstationary (and needs to be differenced to produce a stationary
series) then the theoretical autocorrelations will be nearly 1 for all k . Thus, if the estimated correlogram
fails to die down (or dies down very slowly), then the series should be differenced. If the estimated
correlogram for the differenced series still fails to die down, then the series should be differenced once
more. Note, however, that economic series typically need to be differenced only once. If the series
needs to be differenced d times before an ARMA(p,q) model can be identified, the original series is
said to be an integrated mixed autoregressive-moving average series, denoted ARIMA(p,d,q).
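A simulated random walk gives a quick picture of this diagnostic. The sketch below is an assumption-laden illustration (seed, sample size, and the helper name `sample_autocorrelations` are arbitrary choices): the correlogram of the raw series stays near 1, while that of the differenced series is near zero at every lag.

```python
import numpy as np

def sample_autocorrelations(x, max_lag):
    """Return the estimated autocorrelations r_1, ..., r_max_lag."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    denom = np.sum(d * d)
    return np.array([np.sum(d[k:] * d[:-k]) / denom for k in range(1, max_lag + 1)])

# Random walk: the cumulative sum of white noise, so one difference (d = 1) is needed.
rng = np.random.default_rng(1)
walk = np.cumsum(rng.normal(size=2000))

r_raw = sample_autocorrelations(walk, 10)            # fails to die down
r_diff = sample_autocorrelations(np.diff(walk), 10)  # close to zero at all lags

print(np.round(r_raw, 2))
print(np.round(r_diff, 2))
```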

The model identification method just described is the one advocated by Box and Jenkins, and
Granger (among others). Its usefulness has been amply demonstrated on actual data, economic and otherwise. It is the method that we will use in this course. The method does have some serious drawbacks,
however: It is not entirely objective, its implementation requires careful examination of the data by a
knowledgeable and experienced analyst, and it may fail to unambiguously identify a model. Since the
publication of Box-Jenkins and Granger, several objective methods have been proposed and tested.
These methods automatically select a model without any intervention from the user. Although there is
no universal agreement as to the superiority of the objective methods over the Box-Jenkins
method, the potential advantages of a high-quality automated method are considerable. Still, if an
experienced analyst is available, considerable insight may be gained through examination of the correlogram and partial correlogram, even if an automated method is ultimately used. We will discuss the new
methods more fully if time permits.

Model Estimation

In the last section, we described ways of choosing an appropriate model. Strictly speaking, however, "model identification" consists merely of selecting the form of the model, but not the numerical
values of its parameters. Suppose, for example, we have decided to fit an AR(1) model $x_t = a x_{t-1} + \varepsilon_t$.
Since the value of the parameter a is not known, it must somehow be estimated from the data. Here,
we describe methods of estimating the parameters of ARMA models.
For pure AR models, there exist simple estimation techniques, since there is a linear relationship
between the autocorrelations and the AR parameters. This relationship can be inverted, and then the
theoretical autocorrelations can be replaced by their estimates, to yield estimates of the AR parameters.
In the AR(1) case, for example, we know that $\rho_1 = a$. Thus, we may estimate a by $\hat{a} = r_1$. In general, for
the AR(p) model
$$x_t = a_1 x_{t-1} + a_2 x_{t-2} + \cdots + a_p x_{t-p} + \varepsilon_t$$
we obtain a system of linear equations called the Yule-Walker equations by multiplying both sides by
$x_{t-k}$ (k = 1, ..., p), taking expectations and then normalizing (dividing by the variance of $x_t$). The kth equation in the system is
$$\rho_k = a_1 \rho_{k-1} + a_2 \rho_{k-2} + \cdots + a_p \rho_{k-p}$$

The estimates $\hat{a}_1, \ldots, \hat{a}_p$ of the AR parameters are obtained by solving this linear system, thereby
obtaining a formula for $a_1, \ldots, a_p$ in terms of $\rho_1, \ldots, \rho_p$, and then replacing $\rho_1, \ldots, \rho_p$ by their
estimates $r_1, \ldots, r_p$ in this formula. This procedure is equivalent to solving the system
$$r_k = \hat{a}_1 r_{k-1} + \hat{a}_2 r_{k-2} + \cdots + \hat{a}_p r_{k-p} \qquad (k = 1, \ldots, p)$$

for $\hat{a}_1, \ldots, \hat{a}_p$, where $r_0 = 1$ and $r_{-j} = r_j$. The resulting values are called the Yule-Walker estimates. It can be shown that the
Yule-Walker estimated AR parameters always correspond to a stationary AR model.
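The Yule-Walker system is small enough to solve directly. The following is a minimal sketch, assuming the autocorrelation estimates $r_1, \ldots, r_p$ have already been computed; the function name `yule_walker` is hypothetical and the AR(2) coefficients in the check are made up.

```python
import numpy as np

def yule_walker(r):
    """Solve r_k = sum_j a_j r_{k-j} (k = 1..p) for a_1..a_p, given r_1..r_p."""
    r = np.asarray(r, dtype=float)
    p = len(r)
    full = np.concatenate(([1.0], r))            # r_0 = 1, and r_{-j} = r_j
    idx = np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    return np.linalg.solve(full[idx], r)         # matrix entries are r_|i-j|

# Check on the theoretical autocorrelations of an AR(2) with a_1 = 0.5, a_2 = 0.3:
# rho_1 = a_1 / (1 - a_2) and rho_2 = a_1 * rho_1 + a_2.
rho1 = 0.5 / (1 - 0.3)
rho2 = 0.5 * rho1 + 0.3
a_hat = yule_walker([rho1, rho2])                # recovers (0.5, 0.3)
```

With p = 1 the system reduces to the AR(1) estimate given above, the first estimated autocorrelation.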
The situation for MA models is considerably more complicated. The theoretical relationship
between the parameters and autocorrelations is not linear. For example, in the MA(1) model $x_t = \varepsilon_t + b\varepsilon_{t-1}$,
we have
$$\rho_1 = \frac{b}{1 + b^2}$$

In this case, we get a quadratic equation for b, namely $\rho_1 b^2 - b + \rho_1 = 0$, which has the two solutions
$$b = \frac{1 \pm \sqrt{1 - 4\rho_1^2}}{2\rho_1}$$

It can be shown that $|\rho_1| \leq 0.5$ for any MA(1) model, so the solutions will both be real. The corresponding estimates of b are
$$\hat{b} = \frac{1 \pm \sqrt{1 - 4 r_1^2}}{2 r_1}$$

and two problems arise here. First, there is no guarantee that $1 - 4 r_1^2 > 0$. Second, how do we decide
which of the two solutions to use?
To answer this second question we must define invertibility. An MA model is said to be invertible if it can be represented as (i.e., "inverted to") a stationary infinite-order autoregression, AR($\infty$).
Consider, for example, the MA(1) model $x_t = \varepsilon_t + b\varepsilon_{t-1}$. If we consider this as a difference equation for
$\varepsilon_t$, we obtain the solution

$$\varepsilon_t = x_t - b x_{t-1} + b^2 x_{t-2} + \cdots + (-b)^k x_{t-k} + \cdots$$

If $|b| > 1$, an explosive series results and the current $\varepsilon_t$ cannot be estimated from past values of $x_t$. Thus, to be
useful for forecasting, the MA model must be invertible. For the MA(q) model, the invertibility condition is that the root of largest magnitude of the equation $z^q + b_1 z^{q-1} + \cdots + b_q = 0$ should have magnitude less than one.
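The root condition is easy to check numerically. The sketch below assumes the MA coefficients are supplied in the order $b_1, \ldots, b_q$; the function names `is_invertible` and `ar_infinity_weights` and the example coefficients are illustrative choices, not taken from the text.

```python
import numpy as np

def is_invertible(b):
    """Check that every root of z^q + b_1 z^(q-1) + ... + b_q = 0 has magnitude < 1."""
    coeffs = np.concatenate(([1.0], np.asarray(b, dtype=float)))
    roots = np.roots(coeffs)
    return bool(np.all(np.abs(roots) < 1.0))

def ar_infinity_weights(b, n_terms):
    """Coefficients (-b)^k of x_{t-k}, k = 0..n_terms-1, in the inverted MA(1)."""
    return (-float(b)) ** np.arange(n_terms)

print(is_invertible([0.5]))           # MA(1) with |b| < 1
print(is_invertible([2.0]))           # MA(1) with |b| > 1
print(is_invertible([-1.0, 0.24]))    # MA(2): roots of z^2 - z + 0.24 are 0.6 and 0.4
print(ar_infinity_weights(0.5, 4))    # weights shrink geometrically when |b| < 1
```

When the magnitude of b exceeds 1, the same weights grow without bound, which is the explosive behavior described above.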
Returning now to the issue of which solution to choose for b in the MA(1) case, it can be shown
that of the two possible solutions, only one gives an invertible model. Estimation for MA(q) models
proceeds similarly. From the expressions for $\rho_1, \ldots, \rho_q$, we obtain a system of nonlinear equations
for the parameters $b_1, \ldots, b_q$. This system will have many solutions, but only one will give an invertible model. Computer programs for fitting MA models will always choose this invertible model.
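For the MA(1) case the choice between the two solutions can be written out explicitly. This is a hedged sketch rather than what any particular fitting program does: the function name `ma1_estimate` is hypothetical, and returning `None` when no real solution exists is a design choice made here for illustration.

```python
import math

def ma1_estimate(r1):
    """Return the smaller-magnitude root of r1*b**2 - b + r1 = 0, or None if
    the roots are not real (which happens when |r1| > 0.5)."""
    if r1 == 0.0:
        return 0.0                       # the equation reduces to b = 0
    disc = 1.0 - 4.0 * r1 * r1
    if disc < 0.0:
        return None                      # first problem: no real solution
    root_minus = (1.0 - math.sqrt(disc)) / (2.0 * r1)
    root_plus = (1.0 + math.sqrt(disc)) / (2.0 * r1)
    # The two roots multiply to 1, so at most one has magnitude below 1.
    return root_minus if abs(root_minus) <= abs(root_plus) else root_plus

print(ma1_estimate(0.4))    # roots are 0.5 and 2.0; the invertible choice is kept
print(ma1_estimate(0.6))    # no real root
```

Because the product of the two roots is 1, picking the root of smaller magnitude is the same as picking the invertible solution, except in the boundary case where the estimated autocorrelation has magnitude exactly 0.5 and both roots have magnitude 1.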
Estimation for mixed ARMA models proceeds by nonlinear methods. The programs used will
always choose a stationary, invertible model.
The methods just described are those given in Granger. All of these exploit the connection
between the autocorrelations and the parameters. In fact, there exist many other estimation techniques,
including the very popular maximum likelihood method. This method assumes that the innovations are
normally distributed, and then exploits this assumption as fully as possible. Another popular method is
least squares, in which the sum of squared errors of the fitted model (i.e., the sum of squares of the
estimated innovations) is made as small as possible. Assuming normal innovations, the maximum likelihood and least squares methods are generally superior to the methods described in Granger, particularly
when the model is near the nonstationarity boundary (i.e., when the largest root of the stationarity equation has magnitude close to 1).
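For the AR(1) model the least squares estimate has a closed form, which makes the idea concrete. The sketch below assumes a zero-mean series and treats the first observation as given (often called conditional least squares); the text does not say how initial values are handled, so that choice and the function name `ar1_least_squares` are assumptions.

```python
import numpy as np

def ar1_least_squares(x):
    """Minimize the sum over t of (x_t - a * x_{t-1})**2 with respect to a.

    Setting the derivative to zero gives
    a = sum(x_t * x_{t-1}) / sum(x_{t-1}**2).
    """
    x = np.asarray(x, dtype=float)
    return float(np.sum(x[1:] * x[:-1]) / np.sum(x[:-1] ** 2))

# Noise-free check: x_t = 0.5 * x_{t-1} exactly, so the estimate is 0.5.
print(ar1_least_squares([8.0, 4.0, 2.0, 1.0]))
```

This differs from the autocorrelation-based estimate only in how the end points of the series enter the sums, a difference that tends to matter most when the series is short or the parameter is close to the nonstationarity boundary.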

Diagnostic Checking
Once a model has been identified and estimated, it is usually taken to be the true model and
forecasts can be obtained accordingly. As mentioned earlier, it is virtually certain that the estimated
model is not the true model. To protect against disastrous forecasting errors, the least we can do is to
check that the fitted model is a satisfactory one. This is done by the use of diagnostic checks. If we
had a large amount of data, it would be feasible to break the data into two parts, identify and estimate
the model on the first part and check the quality of the forecasts on the second part. This method,
known as cross-validation, gives one of the few ways of obtaining an honest estimate of forecasting
error. Unfortunately, there is typically not enough data for cross-validation to be used, so that models
are identified, estimated, and diagnostically checked on the same data set. The most commonly used
method is to examine the correlogram of the residuals from the fitted model to see if the residuals are
white noise (as they should be, if the model is correct). For example, the Box-Pierce test statistic is
based on the sum of squares of the residual autocorrelations. If this test statistic exceeds some critical
value (found in a table), then the model in question is declared to be inadequate. Unfortunately, this test
has low power: it is not very likely to flag inadequately fitting models. Furthermore, even if a model is not found to be
inadequate, the method provides no assessment of the probable contribution to forecast error due to the
identification and estimation stages, and due to the difference between the identified and actual models.
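The Box-Pierce statistic can be computed in a few lines. The sketch below uses the usual form, n times the sum of the first m squared residual autocorrelations; the number of lags m is left to the analyst, and the critical value is not computed here but would be looked up in a chi-square table (commonly with m - p - q degrees of freedom for a fitted ARMA(p,q) model). The function name `box_pierce` is hypothetical.

```python
import numpy as np

def box_pierce(residuals, m):
    """Return Q = n * sum_{k=1}^{m} r_k**2, with r_k the residual autocorrelations."""
    e = np.asarray(residuals, dtype=float)
    d = e - e.mean()
    denom = np.sum(d * d)
    r = np.array([np.sum(d[k:] * d[:-k]) / denom for k in range(1, m + 1)])
    return float(len(e) * np.sum(r ** 2))

# Tiny hand-checkable example: residuals alternate in sign, so r_1 = -3/4
# and Q = 4 * (3/4)**2 = 2.25 with m = 1.
print(box_pierce([1.0, -1.0, 1.0, -1.0], 1))
```

A large value of Q relative to the tabulated critical value would lead to the model being declared inadequate, subject to the low-power caveat noted above.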