
Journal of Forecasting, Vol. 10, 579-595 (1991)

Forecasting for Business Planning: A Case Study of IBM Product Sales
L. S.-Y. WU, N. RAVISHANKER AND J. R. M. HOSKING
IBM Research Division, U.S.A.

ABSTRACT
This is a case study of a closely managed product. Its purpose is to
determine whether time-series methods can be appropriate for business
planning. By appropriate, we mean two things: whether these methods can
model and estimate the special events or features that are often present in
sales data; and whether they can forecast accurately enough one, two and
four quarters ahead to be useful for business planning. We use two
time-series methods, Box-Jenkins modeling and Holt-Winters adaptive
forecasting, to obtain forecasts of shipments of a closely managed
product. We show how Box-Jenkins transfer-function models can
account for the special events in the data. We develop criteria for choosing
a final model which differ from the usual methods and are specifically
directed towards maximizing the accuracy of next-quarter, next-half-year
and next-full-year forecasts. We find that the best Box-Jenkins models
give forecasts which are clearly better than those obtained from
Holt-Winters forecast functions, and are also better than the judgmental
forecasts of IBM’s own planners. In conclusion, we judge that
Box-Jenkins models can be appropriate for business planning, in
particular for determining at the end of the year baseline business-as-usual
annual and monthly forecasts for the next year, and in mid-year for
resetting the remaining monthly forecasts.

KEY WORDS  Adaptive smoothing  ARIMA models  Transfer-function models

FORECASTING AND BUSINESS PLANNING

Business planning often starts with an annual financial goal from which annual targets or plans
for financial measurements and sales of individual products must be generated. Business
planners take each annual plan and further apportion it into monthly numbers which represent
their expectation and a feasible way of reaching the annual plan, in terms of manufacturing
schedules and marketing support. The monthly planning numbers are called the track. They
are based on historical patterns of seasonality in the data and on the planner’s judgment of
such matters as manufacturing capability, product announcements and price cuts.
One important aspect of business planning is assessing, as the year progresses, whether the
annual plan is attainable or whether action needs to be taken to achieve the target; and, when
0277-6693/91/060579-17$08.50 Received July 1989
© 1991 by John Wiley & Sons, Ltd. Revised December 1989
580 Journal of Forecasting Vol. 10, Iss. No. 6

action is required, judging the likely effect of feasible choices. Wu (1988) has described how
to model and quantify the uncertainties in this assessment process. She showed how a planning
model, the ‘Track Uncertainty Model’, can be used to construct planning charts (called the
WINEGLASS, SHIPWRECK and OUTLOOK charts) to determine whether we are on track,
to assess the attainability of the annual plan and to make outlooks for the current year; and,
if we are off track, to find where the problem is. In the Track Uncertainty Model outlooks are
based on the year-to-date estimate of the ratio of annual sales to the annual plan. This ratio
is assumed to vary from year to year and is estimated with data from the current year only.
Therefore outlooks made from this model are for the current year only.
Two other areas important to business planning are making forecasts across years and
measuring the effect of special marketing and manufacturing events on sales. Developing
forecasts for the next year is important for several reasons: for cross-checking the
reasonableness of annual plans developed from the financial goals; towards the end of a year,
for setting manufacturing schedules for the beginning of the next year; and for establishing
a baseline business-as-usual forecast for the next year. The modeling and forecasting of sales
data is difficult because of discontinuities caused by special events such as the introduction of
new products and marketing programs. Failure to account for these discontinuities will lead
to erratic and unreliable forecasts. However, if good forecasts are available then the effect of
a special event can be estimated, as the difference between the forecast and actuality. In this
way, one can obtain a reasonable value for the annual plan for the next year, and a track for
reaching it. Start with the baseline business-as-usual forecast. Next, consider any special events
which are expected in the next year, and estimate their effects to be what has been measured
for analogous events in the past. Add these effects to the baseline forecast to obtain the track.
In this paper we examine two time-series forecasting methods: the Box-Jenkins model and
the Holt-Winters forecasting function. We describe the criteria and steps used to construct our
forecasting models; we show how the effect of one particular special event, a constraint on
shipments, can be estimated; and we assess how well these methods forecast the next quarter,
the next half-year and the next full year. The methods are applied to a time series of shipments
of a closely managed product, which we call Boxes. By ‘closely managed’ we mean that there
is a monthly sales objective, the track, and that shipments are managed with the aim of
achieving it. At first, one might not expect time-series forecasts to be as accurate as the track
for predicting future shipments. However, we found that this was not the case. Box-Jenkins
modeling yielded better forecasts than the planner tracks, and we judge it to be an appropriate
method for planning and forecasting sales of Boxes.
The structure of the paper is as follows. The next section contains a description of the data.
Our Box-Jenkins and Holt-Winters modeling procedures are described in the third and fourth
sections, respectively. The final section contains a summary and conclusions, and compares
our forecasts with the planner track.

DATA DESCRIPTION

Data on shipments of Boxes consists of monthly observations from January 1982. The data
set is listed in Table I. To maintain confidentiality, the numbers have been rescaled by a
constant which we do not disclose. This rescaling does not, of course, affect the dependence
structure of the time series. In our analysis we use the 72 observations between January 1982
and December 1987 to build a time-series model to best describe the data, and then we use this
period to forecast the number of shipments for 1988. We denote the 1982-7 Boxes data by
L . S.-Y. Wu, N . Ravishanker and J. R. M . Hosking IBM Product Sales 581

Table I. Shipments of Boxes, 1982-8

          1982    1983    1984    1985    1986    1987    1988

Jan.      3300    2100    3400    1100    4400    1300    2000
Feb.      3500    2600    4500    1300    2300    3000    3900
March     7300    9400    7700    3400    7500   11300   15500
April     4300    2900    4200    1900    2500    3200    1900
May       4600    4700    5400    3200    4700    5300    5600
June      5600   10000    6800    5900   11000   12700   16600
July      2600    4400    2400    1200    3300    1400    1300
Aug.      4400    5100    5100    2800    5300    2800    6400
Sept.     6900    7900    4700   10900   11700   19000   22400
Oct.      3700    5200    3800    5100    4300    2500    2000
Nov.      6400    8800    5700   10200    7300    9600    7700
Dec.      8800   10200   12600   14100   16800   22900   24900

Figure 1. Monthly data on Boxes from January 1982 to December 1987

$X_t$, $t = 1, 2, \ldots, 72$. The series is shown in Figure 1. The mean of the observations is 6100 and
the variance is $1.82 \times 10^7$.
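These summary statistics can be checked against Table I. The sketch below is an illustration only: the variable names are ours, and it assumes that the quoted variance is the ordinary sample variance (divisor n - 1) of the rescaled, rounded values.

```python
from statistics import mean, variance

# Monthly shipments of Boxes, 1982-7, copied from Table I (rescaled units).
BOXES = {
    1982: [3300, 3500, 7300, 4300, 4600, 5600, 2600, 4400, 6900, 3700, 6400, 8800],
    1983: [2100, 2600, 9400, 2900, 4700, 10000, 4400, 5100, 7900, 5200, 8800, 10200],
    1984: [3400, 4500, 7700, 4200, 5400, 6800, 2400, 5100, 4700, 3800, 5700, 12600],
    1985: [1100, 1300, 3400, 1900, 3200, 5900, 1200, 2800, 10900, 5100, 10200, 14100],
    1986: [4400, 2300, 7500, 2500, 4700, 11000, 3300, 5300, 11700, 4300, 7300, 16800],
    1987: [1300, 3000, 11300, 3200, 5300, 12700, 1400, 2800, 19000, 2500, 9600, 22900],
}

# Flatten into the series X_1, ..., X_72 (January 1982 is t = 1).
series = [x for year in sorted(BOXES) for x in BOXES[year]]

series_mean = mean(series)
series_var = variance(series)  # sample variance, divisor n - 1

print(len(series), round(series_mean), f"{series_var:.3g}")
```

To the precision quoted in the text, the mean rounds to 6100 and the variance is of order $1.8 \times 10^7$.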
The series displays a strong seasonal pattern, with peaks in the last month of a quarter and
a particularly large peak in December. There are also two special features in the data which
are important for building a time-series model:

(1) The variability in the data increased at about the end of 1984: this period corresponds to
the introduction of a new model. The point at which the variance change occurs is difficult
to judge exactly; we take the change as occurring at time point $t = 33$, corresponding to
September 1984. The variance in the data after the introduction of the new model is about
five times the variance before its introduction. To account for this change in variance, we
transform the data to stabilize the variance and use the transformed data for modeling
and forecasting. The transformation is discussed in the next section.
(2) The data in the third quarter of 1987 are affected by an external constraint on shipments
during July and August 1987. Boxes which would have been shipped in July and August
were not shipped until September 1987. This means that September 1987 was unusually
high at the expense of July and August 1987. To correct this anomaly, a reallocation of
shipments within these three months is called for. This is especially important, because
these observations greatly influence the values of the forecasts obtained for 1988.

FORECASTING THE BOXES SERIES USING BOX-JENKINS MODELS

Identification of a seasonal model


Box and Jenkins (1976) have described methods of time-series modeling and forecasting which
use the seasonal autoregressive integrated moving-average (ARIMA) $(p, d, q) \times (P, D, Q)_s$
model:

$$\phi(B)\,\Phi(B^s)\,\{(1 - B)^d (1 - B^s)^D Z_t - \mu\} = \theta(B)\,\Theta(B^s)\,\epsilon_t \qquad (1)$$

Here $Z_t$ is the observed time series, $B$ denotes the backward shift operator defined by
$BZ_t = Z_{t-1}$, the $\epsilon_t$ are independent and identically distributed random variables with mean zero
and variance $\sigma^2$, $\mu$ is the mean of the process after seasonal and nonseasonal differencing,
$\phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^p$ and $\theta(B) = 1 - \theta_1 B - \cdots - \theta_q B^q$ are polynomials in $B$ of degrees $p$
and $q$, respectively, $\Phi(B^s)$ and $\Theta(B^s)$ are polynomials in $B^s$ of degrees $P$ and $Q$, respectively,
and $s$ is the seasonal period. We will apply Box-Jenkins methods to the Boxes data. In our
final model $Z_t$ will not be the raw data series $X_t$ but a transformed version of it, which
accounts for the special features in the Boxes series noted in the previous section. However, for
reasons of computational simplicity, our initial identification of the model orders
$(p, d, q, P, D, Q)$ uses the untransformed $X_t$ series.
The Boxes series exhibits an increase in variance at about t = 33. The dependence structure
of the Boxes series is similar for the two segments of the data before and after t = 33.
Figures 2(a) and 2(b) show the autocorrelations for the Boxes series for t < 33 and for t > 33,
respectively. Figures 2(c) and 2(d) depict the partial autocorrelations for the two segments;
Figures 2(e) and 2(f) show the normalized spectral density function for the two segments. It
can be seen that they are very similar. Hence it is reasonable to use a single ARIMA model
for the entire Boxes series.
Plots of the sample autocorrelations and partial autocorrelations for the whole Boxes series,
together with 95% marginal confidence intervals for each correlation coefficient, and of the
estimated spectral density function, are shown in Figure 3. The autocorrelations exhibit a
strong seasonal pattern with seasonal period $s = 3$, and the spectral density function has a large
peak at a frequency of 0.33, which corresponds to a period of 3. This suggests that the Boxes
series is seasonally nonstationary. The nonstationarity is removed by seasonal differencing of
period 3 and degree 1. Figure 4 shows the differenced Boxes series $Y_t = (1 - B^3)X_t$. The mean
of the series is very close to zero: we take it as being exactly zero. The autocorrelations, partial
autocorrelations and spectral density function of $Y_t$ are shown in Figure 5. Some strong
correlations and periodicities remain in the series, but the autocorrelations and partial
autocorrelations die away sufficiently rapidly that it is reasonable to treat the series $Y_t$ as
stationary. The autocorrelations and partial autocorrelations at lags that are multiples of 3
suggest $P = 3$ or $P = 4$ and $Q = 3$ or $Q = 4$ as possible choices for the seasonal model orders.
Further, the presence of sizable partial autocorrelations at lags 1 and 2 suggests $p = 2$ as a
possible choice.
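The seasonal differencing step $Y_t = (1 - B^3)X_t$ simply subtracts from each observation the value three months earlier. A minimal sketch follows; the function name is hypothetical and the input is the first half of 1982 from Table I.

```python
def seasonal_difference(x, period=3):
    """Return Y_t = X_t - X_{t-period}; the first `period` values are lost."""
    return [x[t] - x[t - period] for t in range(period, len(x))]


# January-June 1982 from Table I.
x = [3300, 3500, 7300, 4300, 4600, 5600]
print(seasonal_difference(x))  # [1000, 1100, -1700]
```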
Thus we are led to consider several possible models. Three models among these candidates
have particularly small residual variances. These have orders $(2,0,0) \times (3,1,3)_3$,
$(0,0,0) \times (3,1,3)_3$ and $(0,0,0) \times (4,1,0)_3$. We shall refer to them as Model 1, Model 2 and
Model 3, respectively.

Modeling the change in variance


As noted in the “Data description” section, the Boxes data exhibit a variance change at about
the end of 1984. The change in variance can be modeled and estimated by rescaling the data.
Figure 2. Autocorrelations, partial autocorrelations and spectral density of the Boxes series for the two segments of the data, panels (a)-(f) [plots not reproduced]

Figure 3. Autocorrelations, partial autocorrelations and spectral density function of $X_t$

Figure 4. The differenced Boxes series, $Y_t = (1 - B^3)X_t$

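Let $Y_t = (1 - B^3)X_t$ denote the seasonally differenced Boxes series. Let the parameter $k$ denote
the ratio of the variance after the change point $c$ to the variance before the change point $c$,
where $c$ corresponds to $t = 33$. Then $Y_t$ is assumed to follow an ARMA model:

$$\phi(B)\,Y_t = \theta(B)\,a_t \qquad (2)$$

with innovations $a_t$ which satisfy

$$\operatorname{var}(a_t) = \begin{cases} \sigma_a^2 / k, & t \le c \\ \sigma_a^2, & t > c \end{cases} \qquad (3)$$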

Figure 5. Autocorrelations, partial autocorrelations and spectral density function of $Y_t$

Now let

$$Z_t^{(k)} = \begin{cases} k^{1/2}\,Y_t, & t \le c \\ Y_t, & t > c \end{cases} \qquad (4)$$

In the Appendix we show that the log-likelihoods of model (2) and of an ARMA$(p, q)$ model
fitted with constant innovation variance to $Z_t^{(k)}$ are approximately equal, apart from a term
$\tfrac{1}{2} c \log k$. We use this result when fitting a full transfer-function model for the Boxes series,
described later.
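A change-of-variables argument shows where a term of the form $\tfrac{1}{2} c \log k$ comes from. The following is a sketch and not the Appendix argument itself: it assumes that the rescaling multiplies the first $c$ values of $Y_t$ by $k^{1/2}$ and leaves the remaining values unchanged, and it ignores the effect of the rescaling on the dependence across the change point, which is why the equality is only approximate.

```latex
% Change of variables from (Y_1, ..., Y_n) to (Z_1, ..., Z_n), where
% Z_t = k^{1/2} Y_t for t <= c and Z_t = Y_t for t > c (assumed form).
\begin{align*}
\left| \frac{\partial (Z_1, \ldots, Z_n)}{\partial (Y_1, \ldots, Y_n)} \right|
  &= \prod_{t=1}^{c} k^{1/2} = k^{c/2}, \\
f_Y(y_1, \ldots, y_n) &= f_Z(z_1, \ldots, z_n)\, k^{c/2}, \\
\log L_Y &= \log L_Z + \tfrac{1}{2}\, c \log k .
\end{align*}
```

With $c = 33$ the correction is $16.5 \log k$, so larger values of $k$ are favored unless the fit to the rescaled series deteriorates.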

Modeling the constraint on shipments


Due to a constraint on shipments in the third quarter of 1987, the observed series $X_t$ has an
artificially high value in September 1987, and correspondingly lower values in July and August,
as the units that could not be shipped in July and August were made up in September. Hence,
an unconstrained process would have had higher values in July and August and a lower value
in September compared to the data actually observed. However, the total for the third quarter
would be the same for the observed process as for the underlying unconstrained process.
We want to base our forecasts of future shipments on what we believe the underlying
unconstrained process to have been, rather than on the artificially disturbed observed data.
This can be achieved by reallocating the observed third quarter total between the three months
in the quarter. The months July, August and September 1987 correspond, respectively, to

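$t = 67$, 68 and 69. To effect the reallocation from September into July and August, define

$$U_{t,1} = \begin{cases} 1, & t = 67 \\ -1, & t = 69 \\ 0, & \text{otherwise} \end{cases} \qquad (5)$$

and

$$U_{t,2} = \begin{cases} 1, & t = 68 \\ -1, & t = 69 \\ 0, & \text{otherwise} \end{cases} \qquad (6)$$

The series which we want to use as the basis for forecasting is not $X_t$ but $X_t + \alpha U_{t,1} + \beta U_{t,2}$.
The parameters $\alpha$ and $\beta$ may be regarded as the numbers of Boxes reallocated from September
1987 into July and August, respectively, and are as yet undetermined.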
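The effect of the two input series can be illustrated numerically. The sketch below assumes that each input takes the value +1 in the month receiving shipments and -1 in September, so that the quarterly total is preserved. The function names are hypothetical, and the values of $\alpha$ and $\beta$ are the Model 2 estimates reported in Table II, used here purely as an example.

```python
def indicator(t, plus, minus):
    """Deterministic input: +1 at t == plus, -1 at t == minus, 0 otherwise."""
    return 1 if t == plus else -1 if t == minus else 0


def reallocate(x_by_t, alpha, beta):
    """Return X_t + alpha * U_{t,1} + beta * U_{t,2} for each t in x_by_t."""
    return {
        t: x + alpha * indicator(t, 67, 69) + beta * indicator(t, 68, 69)
        for t, x in x_by_t.items()
    }


# Observed July-September 1987 shipments from Table I (t = 67, 68, 69).
observed = {67: 1400, 68: 2800, 69: 19000}
adjusted = reallocate(observed, alpha=318, beta=2545)
print(adjusted)  # {67: 1718, 68: 5345, 69: 16137}
```

The adjusted values sum to 23200, the same third-quarter total as the observed values.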

The full transfer-function model for the Boxes series


Incorporating the variance transformation and the reallocation for the third quarter of 1987,
the seasonally differenced observed series $Y_t$ follows the model

$$Z_t^{(k)} + \alpha (1 - B^3) U_{t,1} + \beta (1 - B^3) U_{t,2} = N_t \qquad (7)$$

$$\phi(B)\,\Phi(B^3)\,N_t = \theta(B)\,\Theta(B^3)\,\epsilon_t \qquad (8)$$

where $Z_t^{(k)}$, $U_{t,1}$ and $U_{t,2}$ are as defined in equations (4), (5) and (6), respectively. This is a
transfer-function model for $Z_t^{(k)}$, a transformation of $Y_t$. The input series $U_{t,1}$ and $U_{t,2}$ are
wholly deterministic, so the model resembles those used in intervention analysis (Box and Tiao,
1975). Estimation of the model involves the simultaneous estimation of all the model
parameters: $k$, $\alpha$, $\beta$ and the $p + q + P + Q$ coefficients of the $\phi$, $\theta$, $\Phi$ and $\Theta$ polynomials. This
is done using the following two-step procedure.

Step 1.
For each of a grid of values of $k$, in this case $k = 1.0, 1.2, \ldots, 5.0$, the transformed series $Z_t^{(k)}$
is formed and the parameters $\alpha$ and $\beta$ and the coefficients of the $\phi$, $\theta$, $\Phi$ and $\Theta$ polynomials are
estimated by the method of maximum likelihood for a transfer-function model. This
estimation procedure is available in standard software, e.g. Genstat (Genstat 5 Committee,
1987). The maximized log-likelihood, $L(k)$, is computed by substituting the maximum-likelihood
estimates for the parameters and the chosen value of $k$ into the log-likelihood
function of the transfer-function model.

Step 2.
The optimal value of $k$ is taken to be that value which maximizes the function $L(k) + \tfrac{1}{2} c \log k$,
with $c = 33$. This value, $\hat{k}$, and the estimates for the other parameters at this value of $k$, are
the final simultaneously estimated parameters of the transfer-function model.
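The two steps amount to a profile search over $k$. The sketch below shows only the search logic. The callable `fit_loglik` is a hypothetical placeholder for whatever routine fits the transfer-function model at a given $k$ and returns the maximized log-likelihood, and the toy function passed to it at the end is made up solely to exercise the code.

```python
import math


def select_k(fit_loglik, c=33, k_grid=None):
    """Maximize L(k) + 0.5 * c * log(k) over a grid of k values.

    fit_loglik(k) is a caller-supplied (hypothetical) function that fits the
    model to the series rescaled with ratio k and returns the maximized
    log-likelihood L(k).
    """
    if k_grid is None:
        # k = 1.0, 1.2, ..., 5.0 (21 values); round() avoids float drift.
        k_grid = [round(1.0 + 0.2 * i, 1) for i in range(21)]
    profile = {k: fit_loglik(k) + 0.5 * c * math.log(k) for k in k_grid}
    k_hat = max(profile, key=profile.get)
    return k_hat, profile


# Toy stand-in for the model fit, used only to exercise the search.
k_hat, profile = select_k(lambda k: -10.0 * (k - 2.0) ** 2)
print(k_hat)  # 2.4
```

In the toy example the uncorrected log-likelihood peaks at $k = 2.0$, but the $\tfrac{1}{2} c \log k$ term moves the selected grid value up to 2.4.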
Estimated parameters of transfer-function models, using each of the candidate seasonal
models identified above, are shown in Table II. Some comments on these estimated parameters
are called for. The estimates of $\alpha$ and $\beta$ in all three models are not statistically significant.
Nonetheless, we consider these parameters essential to the model because we know a priori that
there was a constraint on shipments: we do not need to diagnose the presence of a constraint
from the data alone. All the fitted models are stationary and invertible, though some roots of

Table II. Estimates of transfer-function model parameters. Standard errors of the estimates are given in
parentheses

Parameter             Model 1        Model 2        Model 3

$k$                   2.2            3.8            3.8
$\alpha$              224            318            763
                      (1646)         (1728)         (2003)
$\beta$               2697           2545           3160
                      (1418)         (1743)         (1993)
$\phi_1$              0.280
                      (0.144)
$\phi_2$              0.064
                      (0.151)
$\Phi_1$              -0.864         -1.016         -0.457
                      (0.137)        (0.092)        (0.128)
$\Phi_2$              -0.717         -0.954         -0.268
                      (0.172)        (0.122)        (0.141)
$\Phi_3$              -0.845         -0.930         -0.255
                      (0.107)        (0.086)        (0.142)
$\Phi_4$                                            0.366
                                                    (0.129)
$\Theta_1$            -0.266         -0.714
                      (0.153)        (0.162)
$\Theta_2$            -0.277         -0.544
                      (0.151)        (0.197)
$\Theta_3$            -0.815         -0.638
                      (0.147)        (0.161)
$\hat{\sigma}_a^2$    3.78 x 10^6    5.26 x 10^6    7.37 x 10^6
AIC                   524.16         530.34         541.12

the autoregressive and moving-average polynomials lie close to the unit circle. For example,
the seasonal autoregressive operator of Model 2, $1 + 1.016B^3 + 0.954B^6 + 0.930B^9$, factorizes
as $(1 + 0.996B^3)(1 + 0.020B^3 + 0.934B^6)$, which may be regarded as containing a factor of
$1 + B^3$ or even $(1 + B^3)(1 + B^6)$. This would suggest, since the seasonal differencing operator
$1 - B^3$ is also present in this model, that seasonal differencing with period
6 ($1 - B^6 = (1 - B^3)(1 + B^3)$) or even period 12 ($1 - B^{12} = (1 - B^3)(1 + B^3)(1 + B^6)$) may be
appropriate. We found, however, that models containing seasonal differencing with periods 6
and 12 gave inferior forecasts compared with Model 2.
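The factorization and the two differencing identities can be verified by multiplying out the polynomials. The sketch below works in powers of $x = B^3$; the helper name is hypothetical.

```python
def poly_mul(a, b):
    """Multiply polynomials given as coefficient lists in ascending powers."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


# Seasonal AR operator of Model 2, coefficients in powers of x = B^3.
product = poly_mul([1, 0.996], [1, 0.020, 0.934])
print([round(c, 3) for c in product])  # [1.0, 1.016, 0.954, 0.93]

# (1 - x)(1 + x) = 1 - x^2, i.e. 1 - B^6.
period_6 = poly_mul([1, -1], [1, 1])
# (1 - x)(1 + x)(1 + x^2) = 1 - x^4, i.e. 1 - B^12.
period_12 = poly_mul(period_6, [1, 0, 1])
```

The product agrees with the fitted operator to the three decimal places reported in Table II.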
Autocorrelations of the residual series from fitting each of the models are given in Figure 6.
Those of Model 1 are not significantly different from zero, but there is some dependence in
the residuals from Models 2 and 3. Therefore, based on the residual diagnostics, it may be
doubtful whether Models 2 and 3 adequately fit the data. However, Model 1 differs from
Model 2 only by the inclusion of autoregressive terms at lags 1 and 2, parameterized by $\phi_1$ and
$\phi_2$, and neither of these parameters is strongly significant in Model 1.

Figure 6. Autocorrelations of residuals from Models 1, 2 and 3

Choice of a final model


Of the Box-Jenkins models which we have considered, Model 1 gives the best fit to the data
in the sense of having the smallest residual variance and the smallest AIC. However, our choice
of a final model is not based primarily on how well the candidate models fit the data; more
important is how well the models forecast future shipments. We shall choose as the best model
the one which produces the most consistently accurate forecasts over the recent past as the
forecast origin runs quarter by quarter from December 1986, and forecasts are made for the
next quarter, the next half-year, and the next full year. All our forecasts are 'out-of-sample':
for each forecast origin the model parameters are re-estimated using only data up to the
forecast origin.
We judge forecasts by a 'quarterly forecast criterion' (QFC), defined by

$$\mathrm{QFC} = \operatorname*{Mean}_{i \in I} \left( \frac{\sum_{t \in Q_i} |\hat{X}_t - X_t|}{\sum_{t \in Q_i} X_t} \right) \times 100 \qquad (9)$$

where $Q_i$ denotes the $i$th quarter, and $\hat{X}_t$ and $X_t$ the forecast and the actual value, respectively,
at time $t$. $I$ is an index set over which the mean is taken. We prefer QFC to more familiar
criteria such as MAPE (Mean Absolute Percentage Error, used, for example, by Makridakis
et al., 1982) so as not to give too much influence to forecast errors in months when shipments
are low.
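As an illustration of equation (9), the sketch below computes the QFC for monthly forecasts grouped into consecutive quarters. The function name and the sample numbers are invented for the example, and the index set $I$ is taken to be all quarters supplied.

```python
def qfc(forecasts, actuals):
    """Quarterly forecast criterion of equation (9), in percent.

    forecasts, actuals: monthly values covering whole quarters, in time
    order, so consecutive blocks of three months form the quarters Q_i.
    """
    if len(forecasts) != len(actuals) or len(actuals) % 3 != 0:
        raise ValueError("need matching series covering whole quarters")
    ratios = []
    for start in range(0, len(actuals), 3):
        f_q = forecasts[start:start + 3]
        x_q = actuals[start:start + 3]
        abs_err = sum(abs(f - x) for f, x in zip(f_q, x_q))
        ratios.append(abs_err / sum(x_q))
    return 100.0 * sum(ratios) / len(ratios)


# Made-up numbers for two quarters, purely to show the arithmetic.
actual = [100, 100, 100, 100, 50, 50]
forecast = [110, 90, 100, 50, 50, 50]
print(round(qfc(forecast, actual), 2))  # 15.83
```

Because each quarter's absolute errors are divided by that quarter's total shipments, an error in a low-volume month counts for no more than the same error in a peak month of that quarter. A monthly percentage error would instead inflate it.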
QFCs for the three candidate models are shown in Table III. In these QFCs, the index set
I is the forecast period (one quarter for next-quarter forecasts, two quarters for next-half-year

Table III. QFCs for next-quarter, next-half-year and next-full-year forecasts from Box-Jenkins models
(1, 2, 3) and Holt-Winters forecast functions (A, B, C)

Forecast   Forecast     Model   Model   Model   Function   Function   Function
origin     period (a)   1       2       3       A          B          C

Next quarter
12/86      1Q87         27.0    23.1    18.7    40.6       40.9       44.4
3/87       2Q87         14.0    14.4    11.4     9.4        3.3        5.9
6/87       3Q87 (b)     -       -       -       -          -          -
9/87       4Q87         27.3    28.7    33.7    22.4       16.0       20.2
12/87      1Q88         13.4     6.9    16.2    27.9       15.5       27.3
3/88       2Q88          9.2    11.0     8.1    18.9       17.2       18.8
6/88       3Q88         19.8    17.4    21.0    35.4       30.8       33.1
9/88       4Q88          7.8     7.8     5.6    22.8       15.5       21.8

Next half-year
12/86      1H87         15.7    18.5    11.6    24.7       23.2       26.9
6/87       2H87         31.8    37.1    39.9    35.7       29.7       33.5
12/87      1H88         14.1     7.8    14.0    24.6       16.5       24.0
6/88       2H88         14.5    15.7    16.2    28.3       24.0       26.4

Next full year
12/86      87           26.3    26.2    25.9    30.5       26.2       30.9
12/87      88           15.3    10.4    15.2    26.6       20.6       25.3

(a) 1Q87 denotes first quarter of 1987, 1H87 denotes first half of 1987, etc.
(b) QFC is not given for the third quarter of 1987, since the monthly data for this quarter are unreliable.

forecasts, four quarters for next-full-year forecasts). For forecasts of 1988 shipments, Model
2 yields, on average, the lowest QFCs at all three lead times. For the 1987-8 forecasts as a
whole there is less to choose between the three models, but Model 2 still performs
competitively. We therefore choose Model 2, the transfer-function model with an
ARIMA $(0,0,0) \times (3,1,3)_3$ seasonal model, as the final model for forecasting the Boxes series.
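The comparison for 1988 can be reproduced by averaging the relevant rows of Table III. In the sketch below the QFC values are copied from the table, while the data layout and function name are ours.

```python
# QFCs for forecasts of 1988 shipments, copied from Table III.
QFC_1988 = {
    "next quarter": {
        "Model 1": [13.4, 9.2, 19.8, 7.8],
        "Model 2": [6.9, 11.0, 17.4, 7.8],
        "Model 3": [16.2, 8.1, 21.0, 5.6],
    },
    "next half-year": {
        "Model 1": [14.1, 14.5],
        "Model 2": [7.8, 15.7],
        "Model 3": [14.0, 16.2],
    },
    "next full year": {
        "Model 1": [15.3],
        "Model 2": [10.4],
        "Model 3": [15.2],
    },
}


def best_model(qfcs_by_model):
    """Return (name, mean QFC) of the model with the lowest mean QFC."""
    means = {m: sum(v) / len(v) for m, v in qfcs_by_model.items()}
    name = min(means, key=means.get)
    return name, means[name]


for horizon, table in QFC_1988.items():
    name, avg = best_model(table)
    print(f"{horizon}: {name} (mean QFC {avg:.1f})")
```

At each of the three lead times the lowest average belongs to Model 2.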

FORECASTING THE BOXES SERIES USING HOLT-WINTERS FORECASTING FUNCTIONS

Summary of the model


In this section we obtain forecasts of the Boxes series using the Holt-Winters (HW) adaptive
smoothing method. For good discussions of this method, see Montgomery and Johnson (1976)
and Abraham and Ledolter (1983). Chatfield and Yar (1988) discuss some of the practical
issues. The HW method is easier to understand than Box-Jenkins modeling. It is widely used
and has been found, in forecasting competitions where the same sets of data have been forecast
using different methods, to yield forecast accuracy comparable to that of Box-Jenkins models
(Chatfield and Prothero, 1973; Makridakis et al., 1982, 1984). For these reasons it made sense
to us to construct a HW procedure to forecast the Boxes series.
The HW multiplicative seasonal procedure for monthly data $X_t$ assumes that at month $t$ the

future evolution of the process can be described by the equation

$$X_{t+j} = (\mu_t + \beta_t j)\,S_{t+j} + \epsilon_{t+j} \qquad (10)$$

where $\mu_t$ and $\beta_t$ are, respectively, the local mean and local slope of the process at time $t$, and
the $S_t$ are seasonal factors which satisfy $S_t = S_{t+12}$ and $S_1 + \cdots + S_{12} = 12$. The estimates of $\mu_t$,
$\beta_t$ and $S_t$ are denoted by $\hat{\mu}_t$, $\hat{\beta}_t$ and $\hat{S}_t$, and are updated each month after observing $X_t$. The
updating formulae, equations (11)-(13), use three smoothing parameters $\alpha_1$, $\alpha_2$ and $\alpha_3$
which lie between 0 and 1. The forecast made
at month $t$ for $l$ months ahead, $\hat{X}_t(l)$, is computed as

$$\hat{X}_t(l) = (\hat{\mu}_t + \hat{\beta}_t l)\,\hat{S}_{t+l-12}, \qquad l = 1, \ldots, 12 \qquad (14)$$
To produce forecasts, three components are needed: initial values for the level, slope and
seasonal factors, a set of smoothing parameters, and a set of data over which to run the
updating equations to produce estimates of the current level, slope and seasonal factors. These
three components constitute a HW forecast function or HW function.
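To make the procedure concrete, the sketch below implements one update step and the forecast of equation (14). The update recursions used are the standard textbook form of multiplicative Holt-Winters smoothing. That form, the assignment of $\alpha_1$, $\alpha_2$ and $\alpha_3$ to level, slope and seasonal factors, and all function names are assumptions made for illustration. They are not taken from the paper's equations (11)-(13).

```python
def hw_update(level, slope, seasonal, x, month, a1, a2, a3):
    """One multiplicative Holt-Winters update after observing x.

    seasonal: list of 12 seasonal factors indexed by calendar month 0-11;
    seasonal[month] holds the estimate made 12 months earlier.
    Assumed textbook recursions, with a1, a2 and a3 smoothing the level,
    slope and seasonal factors respectively.
    """
    new_level = a1 * x / seasonal[month] + (1 - a1) * (level + slope)
    new_slope = a2 * (new_level - level) + (1 - a2) * slope
    new_seasonal = list(seasonal)
    new_seasonal[month] = a3 * x / new_level + (1 - a3) * seasonal[month]
    return new_level, new_slope, new_seasonal


def hw_forecast(level, slope, seasonal, month, horizon=12):
    """Forecasts for 1..horizon (at most 12) months ahead of `month`."""
    return [
        (level + slope * l) * seasonal[(month + l) % 12]
        for l in range(1, horizon + 1)
    ]


# Toy check: a flat series with a fixed seasonal pattern is reproduced.
factors = [0.5, 0.7, 1.6, 0.6, 0.9, 1.7, 0.4, 0.8, 1.8, 0.6, 1.0, 1.4]
level, slope, seasonal = 100.0, 0.0, list(factors)
for t in range(24):
    month = t % 12
    level, slope, seasonal = hw_update(
        level, slope, seasonal, 100.0 * factors[month], month, 0.3, 0.1, 0.65
    )
forecasts = hw_forecast(level, slope, seasonal, month=11, horizon=3)
```

In the toy run the initial values already match the data, so the level stays at 100, the slope at 0, and the three forecasts are approximately 50, 70 and 160.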

Modeling the special features in the data


In the data description section we describe two special features of our data: an increase in
variance at the end of 1984 and a constraint on shipments during July and August 1987. The
increase in variance arises from an intensification of the pattern of seasonality in the data. This
is handled automatically in the multiplicative seasonal HW procedure by the updating of the
slope and seasonal factors, and is why we prefer a multiplicative to an additive seasonal
structure. The change in variance may affect the amount of data which should be used in the
smoothing procedure. One approach would use only the data after the change occurred. This
has the advantage of not having to adapt to the change. However, an advantage of using all
the data is that with more data the choice of smoothing parameters would be less affected by
short-term aberrations in the data, and may well have a more consistent performance over
time. We felt that there was sufficient reason to try both approaches.
The constraint on shipments cannot be straightforwardly dealt with in the HW procedure.
Our approach is to reallocate the shipments in the third quarter of 1987 so as to minimize the
errors made in forecasting the third quarter of 1987 from a forecast origin of June 1987. The
reallocated values are the forecasts for the third quarter of 1987, made from a forecast origin
of June 1987, rescaled so that the sum of the reallocated values is equal to the actual total
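shipments for the third quarter of 1987. Denoting June 1987 by $J$ and the reallocated
shipments by $\tilde{X}_{J+1}$, $\tilde{X}_{J+2}$ and $\tilde{X}_{J+3}$, we define

$$\tilde{X}_{J+l} = \hat{X}_J(l)\, \frac{X_{J+1} + X_{J+2} + X_{J+3}}{\hat{X}_J(1) + \hat{X}_J(2) + \hat{X}_J(3)}, \qquad l = 1, 2, 3 \qquad (15)$$

where $\hat{X}_J(l)$ is the $l$-month-ahead forecast made at $J$ and given by equation (14). We then
replace the shipments $X_{J+1}$, $X_{J+2}$ and $X_{J+3}$ by the reallocated $\tilde{X}_{J+1}$, $\tilde{X}_{J+2}$ and $\tilde{X}_{J+3}$ in the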

updating equations. This reallocation is based on the forecast from June 1987 alone, an
extrapolation from the past data. In contrast, the transfer-function reallocation described in
the previous section is based not only on the forecast from June 1987 but also on how well
the reallocated data forecasts October through December 1987, and may be regarded as an
interpolation using all the available data. Because the transfer-function reallocation uses more
information than the HW reallocation, we expect it to be more accurate.
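The HW reallocation is a proportional rescaling of the three forecasts so that they add up to the actual quarterly total. A minimal sketch: the function name is hypothetical, the actual values are the July-September 1987 shipments from Table I, and the forecast values are made up to stand in for the June 1987 forecasts.

```python
def hw_reallocate(forecasts, actuals):
    """Rescale forecasts so that they sum to the actual quarterly total."""
    scale = sum(actuals) / sum(forecasts)
    return [f * scale for f in forecasts]


actual_q3 = [1400, 2800, 19000]      # July-September 1987, Table I
forecast_q3 = [3000, 5000, 12000]    # made-up forecasts, illustration only
reallocated = hw_reallocate(forecast_q3, actual_q3)
```

With these illustrative forecasts the reallocated values are approximately 3480, 5800 and 13920, which keep the forecast proportions and sum to the actual total of 23200.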

Building Holt-Winters forecast functions


To start the updating equations (11)-(13), we need to specify starting values $\hat{\mu}_0$, $\hat{\beta}_0$ and $\hat{S}_{j-12}$,
$j = 1, \ldots, 12$. These starting values are obtained from the first two years of data, using one of
the standard ways to initialize the HW procedure (Montgomery and Johnson, 1976, p. 102;
Abraham and Ledolter, 1983, p. 171).
Smoothing parameters of HW functions are commonly chosen either by specifying
parameter values a priori or by minimizing the sum of squared one-step-ahead forecast errors
over a suitable fitting period (Abraham and Ledolter, 1983; Chatfield and Yar, 1988). We
employ a different procedure, similar to that used to select our Box-Jenkins model. We
construct several (in our case, three) sensible candidate HW forecast functions and choose
from these candidates the one that performs best over the most recent data. Our criterion for
choosing the smoothing parameters for each candidate HW function is the QFC of next-
quarter forecasts (equation (9)) averaged over a suitable period of time. If a QFC is averaged
over a time period which includes 1987, we exclude from that time period the third quarter
of 1987. This is because we know that the data for this quarter are unreliable, and in
consequence the forecast errors for this quarter are not meaningful.
Our three candidate HW functions include one that uses all the data (1982-7), one that uses
only the portion since the variance change occurred (1985-7), and one that uses all the data
with more weight placed on the performance of the recent 1987 forecasts.
Function A uses data from 1982 through 1987. Starting values for local mean, slope and
seasonality are calculated using the data for 1982 and 1983. The smoothing parameters are
chosen as follows. For many sets of $\alpha_1$, $\alpha_2$ and $\alpha_3$ values we reallocate the third quarter of
1987 as described above, and we compute the QFC of next-quarter forecasts for 1984 through
1987 (i.e. the set $I$ in the definition of QFC (equation (9)) consists of the 16 quarters 1Q84
through 4Q87). This QFC is computed for all possible values of $\alpha_1$, $\alpha_2$ and $\alpha_3$, where each $\alpha_i$
is a multiple of 0.05 between 0 and 1. The smoothing parameters are chosen to minimize QFC.
They are $\alpha_1 = 0.3$, $\alpha_2 = 0$, $\alpha_3 = 0.65$.
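The search over smoothing parameters is an exhaustive evaluation of a 21 x 21 x 21 grid. The sketch below shows only the search. The `score` callable is a hypothetical placeholder for the averaged next-quarter QFC of a HW function run with the given parameters, and the toy score at the end is made up so that its minimum is known.

```python
from itertools import product


def grid_search(score, step=0.05):
    """Return the (a1, a2, a3) on the grid that minimizes score(a1, a2, a3).

    score is a caller-supplied (hypothetical) function, e.g. the averaged
    next-quarter QFC obtained by running a HW function with those values.
    """
    n = round(1 / step)
    grid = [round(i * step, 2) for i in range(n + 1)]  # 0, 0.05, ..., 1
    return min(product(grid, repeat=3), key=lambda a: score(*a))


# Toy score with a known minimum, used only to exercise the search.
best = grid_search(lambda a1, a2, a3: (a1 - 0.3) ** 2 + a2 ** 2 + (a3 - 0.65) ** 2)
print(best)  # (0.3, 0.0, 0.65)
```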
Function B uses data from 1985 through 1987. Starting values for local mean, slope and
seasonality are calculated using the data for 1985 and 1986. The smoothing parameters are
chosen as for Function A, except that QFC is calculated for 1987 only. The smoothing
parameters, chosen to minimize QFC, are $\alpha_1 = 0.05$, $\alpha_2 = 0.05$, $\alpha_3 = 0.85$.
Function C uses data from 1982 through 1987. The difference between Functions A and C
lies in the choice of smoothing parameters. For Function A, parameters are chosen so as to
obtain forecasts that performed consistently well between 1984 and 1987. For Function C,
parameters are chosen so as to obtain forecasts that perform consistently well between 1984
and 1986, but particularly well in 1987. We do this by a two-step procedure. First, we choose
the 30 best sets of $\alpha_1$, $\alpha_2$ and $\alpha_3$ based on the QFC of next-quarter forecasts for 1984 through
1986. Then for each of the 30 candidate sets we reallocate the third quarter of 1987 and
compute the QFC of the next-quarter forecasts for 1987. The smoothing parameters are chosen
to minimize the QFC for 1987. They are $\alpha_1 = 0.25$, $\alpha_2 = 0$, $\alpha_3 = 0.7$.
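The two-step choice for Function C is a shortlist followed by a final selection. In the sketch below both scoring functions are hypothetical placeholders for the 1984-6 QFC and the 1987 QFC, and integers stand in for parameter sets so that the result is easy to follow.

```python
def two_stage_select(candidates, score_past, score_recent, shortlist=30):
    """Shortlist by past score, then pick the best recent score (lower is better)."""
    best_past = sorted(candidates, key=score_past)[:shortlist]
    return min(best_past, key=score_recent)


# Toy example with integer "parameter sets"; both scores are made up.
chosen = two_stage_select(
    candidates=range(100),
    score_past=lambda c: abs(c - 50),    # shortlist: the 30 values nearest 50
    score_recent=lambda c: abs(c - 70),  # final pick: nearest 70 in the shortlist
    shortlist=30,
)
print(chosen)  # 64
```

The final choice is limited to candidates that already performed well over the earlier period, so a parameter set cannot win on its 1987 performance alone.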

Choice of a final model


Table III gives QFCs of next-quarter, next-half-year and next-full-year forecasts from the three
candidate HW forecast functions. Again the index set I in equation (9) is the forecast period.
We choose as our final HW function the one that performed best in 1988. This is Function
B. To check for consistency in the recent forecast performance of Function B, we compare its
1987 performance with Functions A and C. Function B clearly has the best 1987 performance
as well. Therefore Function B is our final HW forecast function.
Comparing the performance of HW Function B with that of Box-Jenkins Models 1, 2 and
3, we see from Table III that each of the Box-Jenkins models performs far better than HW
Function B in both 1987 and 1988.

CONCLUSIONS

We have attempted to decide whether time-series methods can be appropriate for business
planning. By ‘appropriate’ we mean two things: (1) whether these methods can model and
estimate the special events or features often present in sales data; and (2) whether they can
forecast accurately enough one, two and four quarters ahead to be useful for business
planning.
The Boxes series studied in this report has some special features which standard time-series
models do not account for: a change in variance near the end of 1984 and a constraint on
shipments in 1987. We have shown that the Box-Jenkins transfer-function model provides an
excellent framework for modeling and simultaneously estimating all the parameters in the
model. The Holt-Winters forecast function can also deal with a constraint on shipments,
though in a somewhat ad hoc way. We found that the accuracy of our Holt-Winters forecasts
depended strongly on the amount of data used to build the forecast function. This shows that
when using a Holt-Winters procedure we cannot assume that sudden changes in the slope or
variance of the series can be taken care of by the adaptive updates. Instead, we need to pay
close attention to the amount of data used.
For the Boxes series, the forecasts obtained from our best Box-Jenkins models were clearly
better than those obtained from our best Holt-Winters forecast functions. This is true for all
of the forecasts that we considered: one, two and four quarters ahead. We therefore choose
a Box-Jenkins model, specifically Model 2 defined in the third section, as our best time-series
forecasting model.
Finally, we need to determine whether our Box-Jenkins model is accurate enough to be
useful for business planning. We have compared the 1988 performance of each of our
Box-Jenkins models with the 1988 track. The track was constructed by IBM planners as a
feasible path for reaching the annual plan, and shipments are managed with the aim of
achieving it. The track is therefore intended as a monthly sales objective and not as a direct
forecast of actual shipments. However, it is a reasonable yardstick by which to measure
whether the Box-Jenkins model can be useful for business planning. Figure 7 shows the track
for 1988, established in December 1987, together with the Box-Jenkins forecasts for 1988,
made at origin December 1987 using Model 2. The QFC of these next-full-year forecasts from
Model 2 for 1988 is 10.4. This is much lower than the corresponding QFC of the track, which
is 23.1. Indeed, QFCs of forecasts from Box-Jenkins Models 1, 2 and 3 are all markedly lower
than the QFC of the track. We therefore conclude that Box-Jenkins models can be
appropriate for business planning. They should be particularly useful at the end of a year for
determining a baseline business-as-usual annual plan and track for the year ahead, and in mid-
year for correcting the annual plan and resetting the track.

[Figure: monthly values for January to December 1988, showing the track (dotted line) and the Model 2 forecasts.]
Figure 7. Planner track and time-series forecasts (origin 12/87) for 1988

APPENDIX

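Suppose that the time series $Y_t$ follows the ARMA model

$$\phi(B)\,\Phi(B^s)\,Y_t = \theta(B)\,\Theta(B^s)\,a_t \tag{A1}$$

with innovations which satisfy

$$\operatorname{var}(a_t) = \begin{cases} \sigma_a^2 / k, & t \le c \\ \sigma_a^2, & t > c \end{cases} \tag{A2}$$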

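Asymptotically, the log-likelihood for $Y_t$ is

$$L_a(k) = C + \frac{c}{2}\log k - \frac{n}{2}\log\sigma_a^2 - \frac{k}{2\sigma_a^2}\sum_{t=1}^{c} a_t^2 - \frac{1}{2\sigma_a^2}\sum_{t=c+1}^{n} a_t^2 \tag{A3}$$

where $C$ is a constant. This is a function of the $Y_t$ through equation (A1), which relates $a_t$ to
$Y_t$. Our goal is to find a way to estimate $k$ and the parameters of the Box-Jenkins model
simultaneously so as to maximize $L_a(k)$. For known $k$, $\sigma_a^2$ is estimated by

$$\hat\sigma_a^2 = \frac{1}{n}\left( k\sum_{t=1}^{c} a_t^2 + \sum_{t=c+1}^{n} a_t^2 \right) \tag{A4}$$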

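Substituting $\hat\sigma_a^2$ in $L_a(k)$, the maximized likelihood for known $k$ is

$$L_a(k) = C + \frac{c}{2}\log k - \frac{n}{2}\log\hat\sigma_a^2 - \frac{n}{2} \tag{A5}$$

Now let

$$Z_t^{(k)} = \begin{cases} \sqrt{k}\,Y_t, & t \le c \\ Y_t, & t > c \end{cases} \tag{A6}$$

and consider fitting the model

$$\phi(B)\,\Phi(B^s)\,Z_t^{(k)} = \theta(B)\,\Theta(B^s)\,b_t \tag{A7}$$

The log-likelihood function for $Z_t^{(k)}$ for known $k$ is asymptotically

$$L_b(k) = C - \frac{n}{2}\log\sigma_b^2 - \frac{1}{2\sigma_b^2}\sum_{t=1}^{n} b_t^2 \tag{A8}$$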

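which, concentrated over $\sigma_b^2$, is

$$L_b(k) = C - \frac{n}{2}\log\hat\sigma_b^2 - \frac{n}{2} \tag{A9}$$

But approximately

$$b_t = \begin{cases} \sqrt{k}\,a_t, & t \le c \\ a_t, & t > c \end{cases} \tag{A10}$$

so that, by equation (A4),

$$\hat\sigma_b^2 \approx \hat\sigma_a^2 \tag{A11}$$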

Therefore the two likelihoods $L_a(k)$ and $L_b(k)$ are identical apart from the term $\tfrac{1}{2}c\log k$, i.e.

$$L_a(k) = L_b(k) + \frac{c}{2}\log k \tag{A12}$$

This implies that for a given value of $k$, if we fit a model (A7) that maximizes $L_b(k)$, then the
same model also maximizes $L_a(k)$. If we choose $k$ to maximize $L_b(k) + \tfrac{1}{2}c\log k$, then this
choice also maximizes $L_a(k)$, giving the maximum-likelihood estimate of model (A1).
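
Relation (A12) suggests a simple profile search over $k$. The following Python sketch illustrates the idea under strong simplifying assumptions that are not taken from the paper: the model is a zero-mean AR(1) fitted by conditional least squares rather than a full Box-Jenkins model, the change point $c$ is known, the data are simulated, and all function names and the grid of $k$ values are hypothetical.

```python
import numpy as np


def profile_loglik(y, c, k):
    """Concentrated log-likelihood (up to a constant) for a given k.

    Scales the first c observations by sqrt(k), fits a zero-mean AR(1)
    by conditional least squares, and adds the (c/2) log k term.
    """
    z = np.asarray(y, dtype=float).copy()
    z[:c] *= np.sqrt(k)
    phi = np.dot(z[1:], z[:-1]) / np.dot(z[:-1], z[:-1])
    b = z[1:] - phi * z[:-1]          # residuals for times 2..n
    sigma2_b = np.mean(b ** 2)
    m = b.size                        # n - 1 residuals are available
    value = -0.5 * m * np.log(sigma2_b) + 0.5 * (c - 1) * np.log(k)
    return value, phi


def estimate_k(y, c, grid):
    """Return the grid value of k with the largest profile log-likelihood."""
    values = [profile_loglik(y, c, k)[0] for k in grid]
    return float(grid[int(np.argmax(values))])


# Simulated AR(1) series whose innovation variance is sigma^2 / k before
# the change point and sigma^2 after it (illustrative values only).
rng = np.random.default_rng(0)
n, c, k_true, phi_true = 2000, 1000, 4.0, 0.6
sd = np.where(np.arange(n) < c, 1.0 / np.sqrt(k_true), 1.0)
a = rng.normal(0.0, 1.0, n) * sd
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi_true * y[t - 1] + a[t]

grid = np.linspace(0.5, 10.0, 96)     # candidate values of k, step 0.1
k_hat = estimate_k(y, c, grid)
phi_hat = profile_loglik(y, c, k_hat)[1]
print(k_hat, phi_hat)
```

Because the first residual is lost to conditioning, the sketch uses $n - 1$ and $c - 1$ in place of $n$ and $c$; for long series the difference is negligible.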
Tsay (1988) has suggested a method of dealing with a variance change in a time series both
when the change point is known as well as when it is unknown. Tsay’s method estimates the
variance ratio from the residuals of a time-series model fitted to the original data, transforms
the data by this ratio and fits an ARIMA model to the transformed data. Like the method
described above, it relies on the approximation (A10) but the details of the calculations are
different, and the two methods would, in general, give slightly different results.
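
For contrast, a two-stage calculation of the kind just described can be sketched as follows. This is an illustration in the spirit of the description above, not Tsay's exact procedure: it assumes a known change point, a zero-mean AR(1) fitted by conditional least squares, and simulated data, and the function names are hypothetical.

```python
import numpy as np


def fit_ar1(x):
    """Conditional least-squares fit of a zero-mean AR(1); returns (phi, residuals)."""
    phi = np.dot(x[1:], x[:-1]) / np.dot(x[:-1], x[:-1])
    return phi, x[1:] - phi * x[:-1]


def two_stage_variance_ratio(y, c):
    """Estimate the variance ratio from residuals, rescale, and refit."""
    y = np.asarray(y, dtype=float)
    _, resid = fit_ar1(y)                 # stage 1: model fitted to the original data
    var_pre = np.mean(resid[:c - 1] ** 2)   # residuals at times up to the change point
    var_post = np.mean(resid[c - 1:] ** 2)  # residuals after the change point
    k = var_post / var_pre
    z = y.copy()
    z[:c] *= np.sqrt(k)                   # transform the data by the estimated ratio
    phi, _ = fit_ar1(z)                   # stage 2: model fitted to the transformed data
    return k, phi


# Simulated series with a variance change at a known point (illustrative values).
rng = np.random.default_rng(1)
n, c, k_true, phi_true = 2000, 1000, 4.0, 0.6
sd = np.where(np.arange(n) < c, 1.0 / np.sqrt(k_true), 1.0)
a = rng.normal(0.0, 1.0, n) * sd
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi_true * y[t - 1] + a[t]

k_est, phi_est = two_stage_variance_ratio(y, c)
print(k_est, phi_est)
```

Here the variance ratio is computed once from first-stage residuals, whereas the likelihood approach refits the model for every candidate $k$, which is one reason the two can give slightly different answers.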

ACKNOWLEDGEMENTS

We are grateful to the referees for many helpful comments.

REFERENCES

Abraham, B. and Ledolter, J., Statistical Methods for Forecasting, New York: John Wiley (1983).
Box, G. E. P. and Jenkins, G. M., Time Series Analysis: Forecasting and Control, 2nd edn, San
Francisco: Holden-Day (1976).
Box, G. E. P. and Tiao, G. C., ‘Intervention analysis with applications to economic and environmental
problems’, Journal of the American Statistical Association, 70 (1975), 70-79.
Chatfield, C. and Prothero, D. L., ‘Box-Jenkins seasonal forecasting: problems in a case study’, Journal
of the Royal Statistical Society, Series A, 136 (1973), 295-336.
Chatfield, C. and Yar, M., ‘Holt-Winters forecasting: some practical issues’, The Statistician, 37 (1988),
129-40.
Genstat 5 Committee, Genstat 5 Reference Manual, Oxford: Clarendon Press (1987).
Makridakis, S., Andersen, A., Carbone, R., Fildes, R., Hibon, M., Lewandowski, R., Newton, J.,
Parzen, E. and Winkler, R., ‘The accuracy of extrapolation (time series) methods: results of a
forecasting competition’, Journal of Forecasting, 1 (1982), 111-53.
Makridakis, S., Andersen, A., Carbone, R., Fildes, R., Hibon, M., Lewandowski, R., Newton, J.,
Parzen, E. and Winkler, R., The Forecasting Accuracy of Major Time Series Methods, New York:
Wiley (1984).
Montgomery, D. C. and Johnson, L. A., Forecasting and Time Series Analysis, New York:
McGraw-Hill (1976).
Tsay, R. S., ‘Outliers, level shifts and variance changes in time series’, Journal of Forecasting, 7 (1988),
1-20.
Wu, L. S.-Y., ‘Business planning under uncertainty: quantifying variability’, The Statistician, 37 (1988),
141-52.

Authors’ biographies:
Lilian Shiao-Yen Wu has been a Research Staff Member at the T. J. Watson Research Center of IBM
Research Division since 1974. She received a PhD in Applied Mathematics from Cornell University in
1974. Her interests include forecasting, business planning and data analysis.
Nalini Ravishanker is an Assistant Professor in the Department of Statistics at the University of
Connecticut at Storrs. She received a PhD in Statistics from New York University in 1987. Her interests
include simultaneous confidence intervals and the differential geometry of time-series models.
J. R. M. Hosking has been a Research Staff Member at the T. J. Watson Research Center of IBM
Research Division since 1986. He received a PhD in Statistics from the University of Southampton in
1979. His interests include time-series analysis, distribution theory and flood frequency analysis.

Authors’ addresses:
Lilian Shiao-Yen Wu, IBM T. J. Watson Research Center, PO Box 218, Yorktown Heights, NY 10598,
USA.
Nalini Ravishanker, Department of Statistics, University of Connecticut, Storrs, CT 06268, USA.
J. R. M. Hosking, IBM T. J. Watson Research Center, PO Box 218, Yorktown Heights, NY 10598,
USA.
