
Journal of Intelligent Manufacturing (1992) 3, 317-323

Connectionist approach to time series prediction: an empirical test

RAMESH SHARDA¹ and RAJENDRA B. PATIL²

¹Oklahoma State University, Stillwater, Oklahoma 74078-0555, USA
²Department of Computer Science, New Mexico State University, Las Cruces, NM 88003, USA

Received May 1990 and accepted October 1991

Among the various potential applications of neural networks, forecasting is considered to be a major application. Several researchers have reported their experiences with the use of neural networks in forecasting, and the evidence is inconclusive. This paper presents the results of a forecasting competition between a neural network model and a Box-Jenkins automatic forecasting expert system. Seventy-five series, a subset of data series which have been used for comparison of various forecasting techniques, were analysed using the Box-Jenkins approach and a neural network implementation. The results show that the simple neural net model tested on this set of time series could forecast about as well as the Box-Jenkins forecasting system.

Keywords: Forecasting, time-series, neural networks, connectionist expert systems, back propagation applications

1. Introduction

One of the major problems in manufacturing is forecasting. The need for forecasting arises in all aspects of manufacturing, from inventory planning to resource requirements planning to industrial process control. Forecasting in each of these areas has traditionally been accomplished using various techniques: for example, smoothing models are commonly used for inventory modeling, while time series approaches are used for process control.

One of the most promising applications of artificial neural networks may indeed be in forecasting problems in manufacturing and other areas. The autoassociative memory of certain neural network models can be tapped in prediction problems. Smolensky (1986) specifies a dynamic feed-forward network in the following way:

$u_i(t+1) = F\left[\sum_k W_{ki}\, G(u_k(t))\right]$

where $u_i(t)$ is the activation of unit $i$ at time $t$, $F$ is a nonlinear sigmoid transfer function, $G$ is a nonlinear threshold function, and $W_{ki}$ is the connection strength or weight from unit $k$ to unit $i$. This relationship can, in principle at least, be used for predicting future values of variables.
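Written out as code, the update is a single pass over the weight matrix. The following is an illustrative sketch of ours, not Smolensky's specification: the logistic function stands in for F, a simple cutoff stands in for G, and all names are assumptions.

    import numpy as np

    def logistic(x):
        # F: a nonlinear sigmoid transfer function.
        return 1.0 / (1.0 + np.exp(-x))

    def threshold(x, theta=0.0):
        # G: a nonlinear threshold function (zero below theta).
        return np.where(x > theta, x, 0.0)

    def step(u, W):
        # One network update: u_i(t+1) = F[ sum_k W_ki * G(u_k(t)) ]
        return logistic(threshold(u) @ W)

    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.5, size=(3, 3))  # W[k, i]: weight from unit k to unit i
    u = np.array([0.2, 0.8, 0.6])           # activations at time t
    u_next = step(u, W)                     # activations at time t + 1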
Several authors have attempted to apply this idea to forecasting a time series. Werbos (1988) states that he laid the foundations for the use of back propagation in forecasting in his doctoral dissertation (Werbos, 1974). In one of his papers (Werbos, 1988) he describes an application of back propagation to locate sources of forecast uncertainty in a recurrent gas market model.

Lapedes and Farber (1987) used a multilayered perceptron to predict the values of a nonlinear dynamic system with chaotic behavior. They illustrated the method by selecting two common topics in signal processing, prediction and system modeling, and showed that such nonlinear applications can be handled extremely well by using nonlinear neural networks. They reported that neural networks gave superior prediction for their dynamic system.

Sutton (1988) introduces a class of incremental learning procedures specialized for prediction, that is, for using experience with an incompletely-known system to predict its future behavior. Whereas conventional prediction-learning methods assign credit by the difference between predicted and actual results, the method used by the author assigns credit by the difference between temporally successive predictions. He argues that most problems to which supervised learning is now applied are
really prediction problems of some sort to which temporal-difference methods can be applied to advantage.

Fozzard et al. (1989) discuss a neural-net-based expert system for solar flare forecasting, and claim that its performance is superior to that of human experts. Tang et al. (1990) discuss the results of a test of the performance of neural networks and conventional methods in forecasting time series. The authors experimented with three time series of different complexity using different feedforward back propagation models and the Box-Jenkins model. Their experiments demonstrated that for time series with long memory, both methods produced comparable results; however, for series with short memory, neural networks outperformed the Box-Jenkins model. The authors conclude that neural networks are robust, parsimonious in their data requirements and provide good long-term forecasting. All of these results are based on a comparison of the techniques using three time series.

However, the experiences with neural networks in forecasting are not all positive. Fishwick (1989), for example, reports that the forecasting ability of neural networks was inferior to that of simple linear regression and surface response methods. There are some trade magazine articles about the use of neural networks in stock price forecasting, but no concrete descriptions can be found (perhaps for confidentiality reasons). Even when the use of neural networks in forecasting has been shown to be positive, the evidence is usually based on test data sets from a particular problem domain.

Sharda and Patil (1990) reported the results of an empirical test suggesting that neural networks may be used for time series forecasting, at least for single-period forecasting problems. The authors tested and compared a sample of 75 data series containing annual, quarterly and monthly observations using neural network models and traditional Box-Jenkins forecasting models. The simple neural network models tested on the 75 data series could forecast about as well as an automatic Box-Jenkins ARIMA modeling system; each method outperformed the other in about half of the tests. These tests were based on one particular set of learning parameters and one architecture.
This paper also reports on a forecasting competition between a neural network model and a traditional forecasting technique, Box-Jenkins forecasting. Several data series from a comprehensive forecasting competition were analyzed using a neural network model and the Box-Jenkins time series forecasting technique. The data series came from a variety of sources. This paper reports the performance of a neural network model using several different learning parameters. Further, it compares the forecasting ability of a neural net model and the Box-Jenkins technique in forecasting multiple horizons ahead; previous research has not studied the issue of single-step versus multiple-step-ahead forecasting with neural networks.

2. Data, models and methods

2.1. Data

The time series were selected from the famous M-Competition (Makridakis et al., 1982) to compare the performance of various forecasting techniques. Out of the 1001 series collected, only 111 series were analyzed in the M-Competition using the Box-Jenkins methodology. This was done because the Box-Jenkins approach requires an analyst's intervention and is thus quite time consuming. Pack and Downing (1983) examined this 111-series subset and concluded that several series were not appropriate for forecasting using the Box-Jenkins technique. Sharda and Patil (1990) took a sample of 75 series from this 111-series set after considering Pack and Downing's recommendations. The tests reported here are based on the full 111 series. Of course, the comparisons between the Box-Jenkins technique and the neural network can only be made using the subset where both techniques were able to build a model. Our test set contains 13 annual, 20 quarterly and 68 monthly series. In Table 1, series numbers under 112 are annual, series numbers over 112 and under 382 are quarterly, and the rest are monthly.
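For reference, this numbering convention can be expressed as a small helper (a sketch; the function name is ours, and the text leaves the boundary case 112 slightly ambiguous):

    def periodicity(series_number: int) -> str:
        # Table 1 numbering: below 112 annual, between 112 and 382
        # quarterly, 382 and above monthly. The text places the
        # boundary case 112 ambiguously; it is treated as annual here.
        if series_number <= 112:
            return "annual"
        if series_number < 382:
            return "quarterly"
        return "monthly"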
2.2. Method

One hundred and eleven data sets were analyzed using the following approach. For each data set, n - k observations were used to build the forecast model (that is, to train the network), and the model (the trained neural network) was then used to forecast the future k values, where k = 6, 8 and 18 for annual, quarterly and monthly series respectively. These holdout lengths are well established for such comparisons in the forecasting literature. The generated forecasts were compared with the actual values for the k periods, and the mean absolute percent error (MAPE) was computed for each series. We also computed the absolute percent error by forecast horizon for the monthly series.
forecasting ability of a neural net model and the Box-
autocorrelations and identifies models of the form
Jenkins technique in forecasting multiple horizons ahead.
Previous research has not studied the issue of single step ,~(B)4,(~)VdVs~ c) = O(B)O(BS)a,
where B is the back-shift operator (i.e. $Bz_t = z_{t-1}$);
$\nabla = 1 - B$; $s$ = seasonality; $a_t$ = white noise; $d$ and $D$ are the degrees of nonseasonal and seasonal differencing;
$\phi(B)$ and $\Phi(B^s)$ are the nonseasonal and seasonal autoregressive polynomials respectively;
$\theta(B)$ and $\Theta(B^s)$ are the nonseasonal and seasonal moving average polynomials respectively;
$z_t$ = the series (transformed if necessary) to be modeled.

After identification of several candidate models, the analyst can iterate through the process of estimation and diagnostic checking. Once the final model has been selected, the forecasting process can begin.
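A present-day reader could fit one member of this model family with, for example, the statsmodels library; the sketch below could be plugged into the evaluate sketch of Section 2.2. The orders shown are placeholders of ours, since in the Box-Jenkins approach they would be identified from the autocorrelation analysis described above, and AUTOBOX's own identification logic is not reproduced here.

    import numpy as np
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    def fit_and_forecast(train, k, order=(1, 1, 1),
                         seasonal_order=(0, 1, 1, 12)):
        # One candidate from the phi(B) Phi(B^s) ... model family;
        # the (p, d, q) x (P, D, Q)s orders here are illustrative.
        model = SARIMAX(np.asarray(train, dtype=float),
                        order=order, seasonal_order=seasonal_order)
        return model.fit(disp=False).forecast(steps=k)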
The process of model identification, estimation and diagnostic checking has been automated and is available in the form of a forecasting expert system. The performance of such an automatic expert system has been reported to be comparable to that of real experts (Sharda and Ireland, 1987). For our tests, we used such an automatic Box-Jenkins modeling expert system, AUTOBOX (AFS Inc., 1988). This program can take a data set and iterate through the model identification, estimation and diagnostics process to develop the best model. The series were run in AUTOBOX using its default settings with no intervention detection.
2.3.2. Neural network model

A back propagation rule was used to train a multilayered perceptron network with one hidden layer. The numbers of neurons in the input, hidden and output layers were based on a pilot test described shortly. Different architectures, with an increasing number of hidden layer neurons, were trained over different values of the learning parameters, to find the optimal learning parameters first and then the optimal architecture for a given class of data series. MAPE and MeAPE (median absolute percent error) were used as the measures of performance. Nine combinations of three learning rates (0.1, 0.5, 0.9) and three momentum values (0.1, 0.5, 0.9) were tried.

Ten data series were trained over the different architectures and learning parameter sets. Using MAPE as the error criterion, the best learning parameters and architecture from this test set were taken to model the remaining data series. Each hidden and output neuron had a nonlinear sigmoidal transfer function.
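The pilot search over the nine parameter pairs amounts to a plain grid search scored by MAPE. A sketch (the names are ours; train_network is a placeholder for a PDP training run that returns the MAPE of the trained network on its held-back values):

    from itertools import product

    RATES = (0.1, 0.5, 0.9)
    MOMENTA = (0.1, 0.5, 0.9)

    def pilot_search(pilot_series, train_network):
        # Score all nine (learning rate, momentum) pairs on the pilot
        # series and keep the pair with the lowest average MAPE.
        best_pair, best_error = None, float("inf")
        for lr, mom in product(RATES, MOMENTA):
            errors = [train_network(s, learning_rate=lr, momentum=mom)
                      for s in pilot_series]
            average = sum(errors) / len(errors)
            if average < best_error:
                best_pair, best_error = (lr, mom), average
        return best_pair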
The neural network software used for this test was the popular PDP software (McClelland and Rumelhart, 1988). This program requires that the data be normalized. The data series were normalized by row: each number was normalized using the minimum and maximum values over the corresponding complete pattern, and each pattern had different minimum and maximum values. While normalizing, the output was written to two different files, a training file and a test file, whose format was the same as that of the unnormalized data file. The numbers of patterns written to the two files were:

number of test patterns = 18, 8 and 6 for monthly, quarterly and yearly data series respectively;
number of training patterns = total patterns - number of test patterns.

Slightly different normalizing techniques were used for the test file and the training file. In the training file, the minimum and maximum were found over each complete pattern (input and output parts), and each number in the pattern was normalized using:

normalized value = (original value - min)/(max - min)

Note that the maximum and minimum values are over all numbers of the input and output parts of each training pattern, and each pattern was normalized by its own minimum and maximum values. This approach to normalizing data runs the risk of not capturing any global patterns; however, it approximates the learning behavior of someone who would observe each pattern containing historical data and target forecasts. Further tests could be performed to test the effectiveness of global normalization. In the case of the test file, the maximum and minimum values were taken only over the input part of the testing patterns, because the output part of the pattern is not supposed to be known, but it is needed in the pattern for calculating the pattern error and the total sum of squares error (tss) during testing. The same formula was used for normalizing the test file, and the minimum and maximum values of the test patterns were saved for denormalization.
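A sketch of this per-pattern scaling (the function names are ours; training patterns are scaled by the extremes of their input and output parts together, test patterns by their input part only, with the saved extremes reused later to denormalize the network's outputs):

    import numpy as np

    def normalize_train(pattern):
        # Training file: min and max are taken over the complete
        # pattern (input and output parts together); each pattern is
        # scaled by its own extremes.
        lo, hi = pattern.min(), pattern.max()
        return (pattern - lo) / (hi - lo), (lo, hi)

    def normalize_test(inputs, outputs):
        # Test file: min and max are taken over the input part only,
        # since the output part is treated as unknown; the outputs are
        # still scaled so that a pattern error (tss) can be computed.
        lo, hi = inputs.min(), inputs.max()
        return ((inputs - lo) / (hi - lo),
                (outputs - lo) / (hi - lo),
                (lo, hi))

    def denormalize(activations, extremes):
        # Map the output neurons' activations back to original units.
        lo, hi = extremes
        return activations * (hi - lo) + lo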
After this preprocessing, the neural network model was trained with learning rate = 0.1 and momentum = 0.1; these values were found to be the best in the pilot tests described earlier. The maximum number of cycles used for training was 1000. Training was stopped early if the total sum of squares error reached 0.04 before 1000 cycles, so if a series converged in fewer than 1000 cycles, the total sum of squares error over all the training patterns was less than 0.04. The same seed was used in all our tests to initialize the network, in order to facilitate a more appropriate comparison of performances. While the choice of seed is known to affect a neural network's performance, we chose to generate a seed randomly and use the same seed across all series. This emulates a novice forecaster considering the use of neural networks for forecasting; the results generated using this approach could only be improved by selecting the seed, as well as the other parameters of a neural network model, after careful problem analysis.
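The training regime reduces to a bounded loop with an early-exit criterion. This is a schematic sketch of ours, not the PDP software's actual code; backprop_epoch is a placeholder for one back propagation pass over all training patterns that returns the total sum of squares error.

    MAX_CYCLES = 1000
    TSS_TARGET = 0.04

    def train(network, patterns, backprop_epoch,
              learning_rate=0.1, momentum=0.1):
        # Up to 1000 cycles of back propagation; stop early once the
        # total sum of squares error over all training patterns falls
        # to 0.04. The network is assumed to have been initialized
        # beforehand from a fixed random seed, the same for every series.
        for _ in range(MAX_CYCLES):
            tss = backprop_epoch(network, patterns, learning_rate, momentum)
            if tss <= TSS_TARGET:
                break
        return network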
Once a network was trained, the test file was submitted to the network; the activations of the output neurons were logged to a file and denormalized using the normalization parameters saved during normalization of the test file. This produced the forecasts generated by the trained network for the test file.
3. Results and discussion

Table 1 exhibits the MAPEs of AUTOBOX and the neural network approach. These are the averages of the absolute percent errors over all forecasting horizons for each series. It shows that the neural network approach, with its simple training algorithm, performed about as well as a forecasting expert system. The mean of the MAPEs for the neural net model is slightly less than that for the Box-Jenkins modeling system; however, due to a large standard deviation, the difference is insignificant. A pairwise means test also indicates the same result. Forecasts using AUTOBOX resulted in lower MAPEs for 22 series, and thus the neural network model was able to do better on the other 50 series. However, the large standard deviations still make these differences insignificant.

When the series are grouped on the basis of periodicity, the MAPEs are still insignificantly different between the two approaches. This suggests that the periodicity of the series being modeled does not affect a technique's performance. It was quite interesting, at least for us, that the neural network model was able to incorporate seasonality automatically, just as AUTOBOX is able to do.
Table 1. MAPE comparison of Box-Jenkins (AUTOBOX) and neural network performance as forecasting experts.

Series   Autobox  NN-PDP    Series   Autobox  NN-PDP
SER4     23.53    6.43      SER499   13.50    10.02
SER13    6.58     2.59      SER508   16.11    5.89
SER31    18.07    0.55      SER526   9.95     14.75
SER40    10.16    1.03      SER544   2.72     3.98
SER49    24.77    7.65      SER571   5.57     5.82
SER58    72.57    1.34      SER580   3.03     3.28
SER85    37.00    0.29      SER589   9.66     8.20
SER112   4.55     0.10      SER598   23.49    6.36
SER184   20.61    6.92      SER616   7.22     4.62
SER193   44.43    32.52     SER634   13.70    18.67
SER202   21.51    3.95      SER643   18.92    20.29
SER211   26.31    36.03     SER652   15.23    15.54
SER220   37.10    47.59     SER661   19.80    20.52
SER229   42.19    4.19      SER670   28.39    30.27
SER238   14.22    4.79      SER679   20.42    40.01
SER265   21.15    5.52      SER688   12.86    11.12
SER292   12.67    13.94     SER697   4.13     3.63
SER301   2.91     2.25      SER706   20.11    9.66
SER310   4.46     2.90      SER715   61.31    83.83
SER319   14.30    4.39      SER724   17.43    22.44
SER328   8.02     3.14      SER733   14.21    21.52
SER337   8.52     3.86      SER742   5.00     2.05
SER346   20.23    11.15     SER751   5.19     8.17
SER355   4.75     3.92      SER760   7.75     8.05
SER364   3.73     1.25      SER787   3.17     1.87
SER382   32.78    18.25     SER796   17.68    15.96
SER400   14.14    11.42     SER805   7.93     1.18
SER409   42.80    32.09     SER823   1.29     0.65
SER418   19.23    27.37     SER832   7.96     1.29
SER427   9.89     8.53      SER877   3.73     3.38
SER436   7.09     1.85      SER904   3.88     3.22
SER445   25.71    12.87     SER913   58.53    48.90
SER454   8.48     6.85      SER922   26.35    5.22
SER463   6.19     19.95     SER958   13.67    11.36
SER472   19.21    20.76     SER967   38.74    20.68
SER481   32.36    31.00     mean     15.94    14.67
SER490   7.90     8.71      stdev    15.18    15.39
Figure 1 shows a graphical comparison of the MAPEs of the 43 monthly series included in Table 1. This chart indicates that the pattern of MAPEs for the two approaches is quite similar.

These results are quite encouraging for the proponents of neural networks as a forecasting tool. Obviously this work needs to be replicated to assess the full potential of neural networks for forecasting. Possible use of other neural network architectures and more sophisticated training algorithms may improve the results.
[Figure 1: line chart of MAPE by series for the 43 monthly series; legend: NN-PDP, Autobox. The plotted curves are not recoverable from this copy.]

Fig. 1. MAPEs for monthly series.

3.1. Different parameter effects

One of the objectives here was to examine the effect of different architectural and learning parameters on the performance of neural networks in time series modeling. This test was carried out with monthly data. Different values of learning rate and momentum were used to train 12-12-1, 12-12-2, 12-12-4, 12-12-6, 12-12-8 and 12-12-12 architectures, generating forecasts over 1, 2, 4, 6, 8 and 12 periods ahead respectively. These architectures were trained over nine different combinations of learning rate (0.1, 0.5, 0.9) and momentum (0.1, 0.5, 0.9) values.

Observations from this test gave the optimal learning parameters. These optimal learning parameters were then used to train the remaining monthly data series over different architectures (12-6-1, 12-12-1, 12-18-1 and 12-24-1) to observe the effect of a different number of hidden neurons. The architecture which performed well in the average analysis was then considered the optimal architecture. The optimal architecture test was carried out only over a one-step-ahead forecast.

Figure 2 shows the average one-step-ahead forecast error with different network architectures. These results are over the SER400 and SER409 data series. It is interesting to observe the effect of increasing the number of output neurons: as the number of output neurons was increased from 1 to 12, the n + 1 (single-step) forecast improved. Figure 2 also shows the effect of different learning parameters on network performance. The learning parameters learning rate = 0.1 and momentum = 0.1 gave the best results; performance was poor whenever the momentum value was high.

[Figure 2: average forecast error plotted against network architecture and momentum; graphics not recoverable from this copy.]

Fig. 2. Effect of learning parameters and architecture.

The optimal learning parameters found in the above test were then used to train the remaining set of data series to find the optimal architecture. This test was carried out only for single-step forecasts. In the average analysis, 12-12-1 appeared to be the best architecture, and this architectural parameter (an equal number of input and hidden neurons) was then used in further tests.
3.2. Multiple horizon forecasting

We analyzed the 43 monthly series further using AUTOBOX and the neural approach. As before, 18 observations were held back; the remaining data were used to build a model, and forecasts were made from one origin so that the performance of a technique could be evaluated in terms of forecasting multiple horizons. AUTOBOX was used in its default mode. The neural network model was a simple 12-12-18 network. Table 2 gives the average absolute percent error for the 43 series at each forecasting horizon, and Figure 3 depicts these same results graphically. It is apparent that AUTOBOX had a lower MAPE than the neural network model at most of the forecasting horizons. However, the MAPE for the neural networks is more stable than that for AUTOBOX. Again, the closeness of the errors of these two approaches suggests that neural network models should be investigated further.

[Figure 3: mean APE plotted against forecasting horizon (2-18); graphics not recoverable from this copy.]

Fig. 3. Mean APE by forecasting horizon (43 monthly series).
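The horizon-wise scores in Table 2 follow from comparing each of the network's 18 outputs with the corresponding held-back month and averaging down the series. A sketch (the names are ours; actuals and forecasts stand for hypothetical (43, 18) arrays of held-back values and denormalized network outputs):

    import numpy as np

    def mape_by_horizon(actuals, forecasts):
        # actuals, forecasts: (n_series, 18) arrays. Column h holds
        # the h-step-ahead values, so averaging the absolute percent
        # errors down each column gives one MAPE per forecasting
        # horizon (the LAG column of Table 2).
        ape = 100.0 * np.abs((actuals - forecasts) / actuals)
        return ape.mean(axis=0)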
Table 2. MAPEs of AUTOBOX and NN model at various horizons (43 series)

LAG      Autobox  NN-PDP
1        10.75    15.35
2        18.19    14.97
3        10.31    15.17
4        14.00    15.49
5        13.32    16.30
6        16.79    17.16
7        15.17    17.30
8        11.53    17.21
9        16.53    17.22
10       12.57    17.75
11       12.72    17.36
12       13.41    17.10
13       16.48    18.57
14       28.45    19.60
15       14.84    19.12
16       14.61    19.08
17       14.34    19.46
18       17.78    19.36
Average  15.10    17.42
4. Conclusions and future research directions

Neural networks appear to provide a promising alternative approach to time series forecasting. The neural network's ability to forecast in a fuzzy sense makes it more appropriate than the other forecasting methods. Our present study was limited to univariate forecasting, but it is expected that neural networks can be used for multivariate forecasting also. For time series with long memory, both Box-Jenkins models and neural networks perform well, with Box-Jenkins models slightly better for short-term forecasting; with short memory, neural networks outperform Box-Jenkins (Tang et al., 1990).

Neural networks can be trained to approximate the underlying mapping of a time series, albeit the accuracy of the approximation depends on a number of factors such as the neural network structure, the learning method and the training procedure. Without a hidden layer, the linear neural network model is functionally similar to the Box-Jenkins ARIMA model.

The neural network structure and training procedure have a great impact on forecasting performance. This fact is evident from the present work. The learning algorithms used in our study are by no means the best, and we believe that there is still much room for improvement in neural network forecasting.

References

AFS Inc. (1988) AUTOBOX Software User Manual, Automatic Forecasting Systems, Hatboro, PA.
Box, G. E. P. and Jenkins, G. M. (1976) Time Series Analysis: Forecasting and Control, Holden-Day, San Francisco, CA.
Fishwick, P. (1989) Neural network models in simulation: a comparison with traditional modeling approaches, in Proceedings of the Winter Simulation Conference, Washington, DC, pp. 702-10.
Fozzard, R., Bradshaw, G. and Ceci, L. (1989) A connectionist expert system for solar flare forecasting, in Advances in Neural Information Processing Systems I, Touretzky, D. S. (ed.), Morgan Kaufmann Publishers, Inc., San Mateo, CA, pp. 264-71.
Lapedes, A. and Farber, R. (1987) Nonlinear signal processing using neural networks: prediction and system modeling, Los Alamos National Laboratory Technical Report LA-UR-87-2261, July.
Makridakis, S., Andersen, A., Carbone, R., Fildes, R., Hibon, M., Lewandowski, R., Newton, J., Parzen, E. and Winkler, R. (1982) The accuracy of extrapolation (time series) methods: results of a forecasting competition. Journal of Forecasting, 1, 111-53.
McClelland, J. and Rumelhart, D. (1988) Explorations in Parallel Distributed Processing: A Handbook of Models, Programs and Exercises, The MIT Press, Cambridge, MA.
Pack, D. J. and Downing, D. J. (1983) Why didn't Box-Jenkins win (again)?, in 3rd International Symposium on Forecasting, Philadelphia.
Sharda, R. and Ireland, T. (1987) An empirical test of automatic forecasting systems, ORSA/TIMS Meeting, New Orleans, May.
Sharda, R. and Patil, R. (1990) Neural networks as forecasting experts: an empirical test, in Proceedings of the International Joint Conference on Neural Networks, IJCNN-WASH-D.C., Jan. 15-19, Vol. II, pp. 491-4.
Smolensky, P. (1986) Neural and conceptual interpretation of PDP models, in Parallel Distributed Processing, Vol. 2, McClelland, J. L., Rumelhart, D. E. and the PDP Research Group (eds), MIT Press, Cambridge, MA, p. 397.
Sutton, R. S. (1988) Learning to predict by the methods of temporal differences. Machine Learning, 3, 9-44.
Tang, Z., de Almeida, C. and Fishwick, P. A. (1990) Time series forecasting using neural networks vs. Box-Jenkins methodology, International Workshop on Neural Networks, Feb. 2-4, Auburn, AL.
Werbos, P. (1974) Beyond regression: new tools for prediction and analysis in the behavioral sciences, PhD thesis, Harvard University, Cambridge, MA.
Werbos, P. (1988) Generalization of back propagation with application to a recurrent gas market model. Neural Networks, 1, 339-56.
