

NEURAL NETWORKS FOR
TIME-SERIES FORECASTING

William Remus
Department of Decision Science, University of Hawaii
Marcus O'Connor
School of Information Systems, University of New South Wales

ABSTRACT
Neural networks perform best when used for (1) monthly and quarterly
time series, (2) discontinuous series, and (3) forecasts that are several pe-
riods out on the forecast horizon. Neural networks require the same good
practices associated with developing traditional forecasting models, plus
they introduce new complexities. We recommend cleaning data (includ-
ing handling outliers), scaling and deseasonalizing the data, building
plausible neural network models, pruning the neural networks, avoiding
overfitting, and good implementation strategies.

Keywords: Discontinuities, forecasting, neural networks, principles, seasonality.

Research has given us many methods for forecasting, many of which rely on statistical tech-
niques. Since 1980, much research has focused on determining the conditions under which
various methods perform the best (Makridakis et al. 1982; Makridakis et al. 1993). In general,
no single method dominates all other methods, but simple and parsimonious methods seem to
perform best in many of the competitive studies.
In the early 1980s, researchers proposed a new forecasting methodology to forecast time
series, neural networks. We provide principles for the use and estimation of neural networks
for time-series forecasting and review support for their merits, which ranges from mathemati-
cal proofs to empirical comparisons.

USING NEURAL NETWORKS


Neural networks are mathematical models inspired by the functioning of biological neurons.
There are many neural network models. In some cases, these models correspond closely to


biological neurons, and in other cases, the models depart from biological functioning in sig-
nificant ways. The most prominent, back propagation, is estimated to be used in over 80 per-
cent of the applications of neural networks (Kaastra and Boyd 1996); this model is explained
in the Appendix. Rumelhart and McClelland (1986) discuss most of the neural network mod-
els in detail.
Given sufficient data, neural networks are well suited to the task of forecasting. They excel
at pattern recognition and forecasting from pattern clusters. The key issue to be addressed is
in which situations do neural networks perform better than traditional models. Researchers
suggest that neural networks have several advantages over traditional statistical methods.
Neural networks have been mathematically shown to be universal approximators of func-
tions (Cybenko 1989; Funahashi 1989; Hornik, Stinchcombe and White 1989) and their de-
rivatives (White, Hornik and Stinchcombe 1992). This means that neural networks can ap-
proximate whatever functional form best characterizes the time series. While this universal
approximation property offers little value if the functional form is simple (e.g., linear), it al-
lows neural networks to better model forecasting data with complex underlying functional
forms. For example, in a simulation study, Dorsey and Sen (1998) found that neural networks
gave comparable levels of model fit to properly specified polynomial regression models.
Neural networks, however, did much better when the polynomial form of a series was not
known.
Theoretically, neural networks should be able to model data as well as traditional statistical
methods because neural networks can approximate traditional statistical methods. For exam-
ple, neural networks have been shown to approximate ordinary least squares and nonlinear
least-squares regression (White 1992b, White and Stinchcombe 1992), nonparametric regres-
sion (White 1992a), and Fourier series analysis (White and Gallant 1992).
Neural networks are inherently nonlinear (Rumelhart and McClelland 1986; Wasserman
1989). That means that they estimate nonlinear functions well (White 1992a, 1992b; White
and Gallant 1992; White and Stinchcombe 1992).
Neural networks can partition the sample space and build different functions in different
portions of that space. The neural network model for the Boolean exclusive OR function is a
good example of such a model (Wasserman 1989, pp. 30-33). Thus, neural networks have a
capability for building piecewise nonlinear models, such as forecasting models that incorpo-
rate discontinuities.
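To make the exclusive-OR example concrete, the following sketch (ours, in Python; the weights are one textbook choice among many) shows a 2-2-1 threshold network that reproduces the function exactly:

    def step(z):
        # Threshold activation: fire (1) if the weighted sum is positive, else 0.
        return 1 if z > 0 else 0

    def xor_network(x1, x2):
        h1 = step(x1 + x2 - 0.5)    # fires when at least one input is on
        h2 = step(x1 + x2 - 1.5)    # fires only when both inputs are on
        return step(h1 - h2 - 0.5)  # fires when exactly one input is on

    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "->", xor_network(a, b))

Each hidden unit carves the input space with a different linear boundary, and the output unit combines the two pieces, which is exactly the piecewise behavior described above.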
It might seem that because of their universal approximation properties neural networks
should supersede the traditional forecasting techniques. That is not true for several reasons.
First, universal approximation on a data set does not necessarily lead to good out-of-sample
forecasts (Armstrong 2001). Second, if the data fit the assumptions of a traditional model,
generally the traditional model will be easier to develop and use. Thus, while neural networks
seem a promising alternative to traditional forecasting models, we need to examine the em-
pirical literature on their forecasting performance.
Researchers have compared point-estimate forecasts from neural networks and traditional
time series techniques (neural networks provide point-estimate forecasts but not prediction
intervals). Sharda and Patil (1990) used 75 of a 111 time-series subset of the M-Competition
data and found that neural network models were as accurate as the automatic Box-Jenkins
(Autobox) procedure. The 36 deleted series did not contain enough data to estimate either of
the models. Foster, Collopy, and Ungar (1992) also used the M-Competition data. They found
neural networks to be inferior to Holt's, Brown's, and the least-squares statistical models for

time series of yearly data, but comparable with quarterly data; they did not compare the mod-
els on monthly data.
Kang (1991) compared neural networks and Autobox on the 50 M-Competition series.
Overall, Kang found Autobox to have superior or equivalent mean absolute percentage error
(MAPE) to that for 18 different neural network architectures. In addition Kang compared the
18 neural network architectures and Autobox models on seven sets of simulated time-series
patterns. Kang found that the MAPE for the 18 neural network architectures was superior when
the data included trend and seasonal patterns. Kang also found that neural networks often
performed better when predicting points on the forecasting horizon beyond the first few peri-
ods ahead.
These results are mixed; thus, we were inspired to attempt a more comprehensive com-
parison of neural networks and traditional models (Hill, O'Connor and Remus 1996). The
traditional models we considered were Box-Jenkins and deseasonalized exponential
smoothing. Deseasonalized exponential smoothing was found to be one of the most accu-
rate methods and Box-Jenkins a bit less accurate in the two major comparative studies of
traditional forecasting methods (Makridakis et al. 1982, Makridakis et al. 1993). In addi-
tion, we used the method based on combining the forecasts from six other methods from
the first competition and a naive model. The data was a systematic sample of the Makrida-
kis et al. (1982) competition data. We standardized many other procedural differences
between the earlier studies discussed.
Exhibit 1 shows the results from Hill, O'Connor and Remus (1996), the MAPE for the
neural networks and several other reference methods. They were calculated on the holdout
data sets from the Makridakis et al. (1982) competition; the forecast horizons are as in the
competition.
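For readers unfamiliar with the error measure, here is a minimal sketch of the MAPE computation (the function and the example numbers are ours, not from the competition data):

    def mape(actuals, forecasts):
        # Mean absolute percentage error, in percent; zero actuals are
        # skipped here for simplicity.
        errors = [abs(a - f) / abs(a) for a, f in zip(actuals, forecasts) if a != 0]
        return 100.0 * sum(errors) / len(errors)

    # Illustrative actuals and forecasts over a six-period holdout horizon.
    print(mape([120, 135, 128, 140, 150, 160],
               [115, 138, 131, 133, 158, 152]))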

Exhibit 1
MAPE for neural networks and other reference methods
(number of series)

                                          Annual   Quarterly   Monthly
                                           (16)      (19)       (63)
  Neural networks                          14.2      15.3       13.6
  Deseasonalized exponential smoothing     15.9      18.7       15.2
  Box-Jenkins                              15.7      20.6       16.4
  Judgment                                 12.5      20.5       16.3
  Combined methods                         12.6      21.2       16.7

• Neural networks may be as accurate or more accurate than traditional forecasting
methods for monthly and quarterly time series.

Neural networks may be better than traditional forecasting methods for monthly and quar-
terly time series. The M-Competition data contained annual data, quarterly data, and monthly
series; thus, the models were compared across the data period used. Foster, Collopy and Un-
gar (1992) found neural networks to be inferior to traditional models for annual data but
comparable for quarterly data; they did not compare the models on monthly data.
We found that neural networks outperformed the traditional models (including Box-
Jenkins) in forecasting monthly and quarterly data series; however, they were not superior to
traditional models with annual series (Hill, O'Connor and Remus, 1996) (see Exhibit 1).

• Neural networks may be better than traditional extrapolative forecasting methods
for discontinuous series and often are as good as traditional forecasting methods in
other situations.

Some of the M-Competition series had nonlinearities and discontinuities in the model-
estimation data (Armstrong and Collopy 1992; Carbone and Makridakis 1986; Collopy and
Armstrong 1992; Hill, O'Connor and Remus 1996). For example, in the monthly series used
by Hill, O'Connor and Remus (1996), only 57 percent of the series were linear; the remaining
43 percent included nonlinearities or discontinuities or both. We compared the effectiveness
of the forecasting models with linear, nonlinear, and discontinuous series. Hill, O'Connor and
Remus (1996) found that nonlinearities and discontinuities in the model estimation data af-
fected the forecasting accuracy of the neural networks. In particular, although neural net-
works performed well overall for all monthly series, they seemed to perform better in series
with discontinuities in estimation data.

• Neural networks are better than traditional extrapolative forecasting methods for
long-term forecast horizons but are often no better than traditional forecasting
methods for shorter forecast horizons.
Some models, such as exponential smoothing, are recommended for short-term forecast-
ing, while regression models are often recommended for long-term forecasting. Sharda and
Patil (1992) and Tang, de Almeida and Fishwick (1990) found that for time series with a long
history, neural network models and Box-Jenkins models produced comparable results.
Hill, O'Connor and Remus (1996) compared neural network models with the traditional
models across the 18 periods in the forecast horizon. The neural network model generally
performed better than traditional models in the later periods of the forecast horizon; these
findings are consistent with Kang's (1991). In a simulation study, Dorsey and Sen (1998) also
found that neural networks strongly dominated polynomial regression models in the later
periods of the forecast horizon when estimating series with polynomial features.

• To estimate the parameters characterizing neural networks, many observations
may be required. Thus, simpler traditional models (e.g., exponential smoothing)
may be preferred for small data sets.
Many observations are often required to estimate neural networks. Particularly in the
quarterly and monthly M-Competition series, the number of observations for model estima-
tion varied widely. In many cases, there may not be enough observations to estimate the
model (Sharda and Patil 1990). The reason for this is simple; neural networks have more
parameters to estimate than most traditional time-series forecasting models.

ESTIMATING NEURAL NETWORKS


We adapted our principles for estimating neural networks from Armstrong's principles for
estimating forecasting models (2001) and from results specific to neural networks. All of
the general principles Armstrong presented apply to neural networks. The following
principles are of critical importance:

• Clean the data prior to estimating the neural network model.


Data should be inspected for outliers prior to model building. This principle applies
equally to neural networks and other forecasting models (Refenes 1995, pp. 56-60). Outliers
make it difficult for neural networks to model the true underlying functional form.
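As one hedged illustration of such cleaning (our own sketch, not a procedure from Refenes 1995), a simple rule flags points that lie far from the series median and pulls them back to a limit:

    import numpy as np

    def clean_outliers(series, k=3.0):
        # Winsorize points more than k robust standard deviations from the
        # median; 1.4826 scales the median absolute deviation so that it is
        # comparable to a standard deviation for roughly normal data.
        x = np.asarray(series, dtype=float)
        med = np.median(x)
        mad = 1.4826 * np.median(np.abs(x - med))
        lower, upper = med - k * mad, med + k * mad
        return np.clip(x, lower, upper)

    print(clean_outliers([10, 11, 9, 12, 10, 95, 11, 10]))

Whether to clip, replace, or simply flag suspect points is a judgment call that depends on the series.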

• Scale and deseasonalize data prior to estimating the model.


Scale the data prior to estimating the model to help the neural network learn the patterns in
the data (Kaastra and Boyd 1996). As Hill, O'Connor and Remus (1996) did, modelers usu-
ally scale data between values of plus one and minus one. As in regression modeling, other
transformations are occasionally applied to facilitate the modeling; Kaastra and Boyd (1996)
give several examples.
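A minimal sketch of the linear rescaling to the interval from minus one to plus one (the function names are ours):

    import numpy as np

    def scale(series):
        # Linearly rescale to [-1, +1], returning the original minimum and
        # maximum so the forecasts can be transformed back later.
        x = np.asarray(series, dtype=float)
        lo, hi = x.min(), x.max()
        return 2.0 * (x - lo) / (hi - lo) - 1.0, lo, hi

    def unscale(scaled, lo, hi):
        return (np.asarray(scaled) + 1.0) / 2.0 * (hi - lo) + lo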
Often, a time series contains significant seasonality and deseasonalizing the data prior to
forecasting model estimation is the standard approach. Wheelwright and Makridakis
(1985) found that prior deseasonalization improved the accuracy of traditional statistical
forecasting methods for the M-Competition quarterly and monthly data. Deseasonalization
is commonly done with neural networks. Hill, O'Connor and Remus (1996) statistically
deseasonalized their time series prior to applying the technique.
Is deseasonalization necessary or can neural networks model the seasonality that is likely
to be present in a time series? Given that neural networks have been shown to be universal
approximators of functions (Cybenko 1989), it seems reasonable to expect them to be able to
model the patterns of seasonality in a time series. On the other hand, Kolarik and Rudorfer
(1994) found neural networks had difficulty modeling seasonal patterns in time series.
Nelson et al. (1999) used data from the M-Competition to investigate the ability of neural
networks to model the seasonality in the series. They partitioned a systematic sample of 64
monthly series into two subsets based on the Makridakis et al. (1982) assessment of the exis-
tence of seasonality in those series. In those series with seasonality (n = 49), the MAPE for
neural networks based on deseasonalized data (12.3%) was significantly more accurate than
neural networks based on nondeseasonalized data (15.4%). In those series without seasonal-
ity (n = 15), the MAPE for neural networks based on deseasonalized data (16.9%) was not
significantly more accurate than neural networks based on nondeseasonalized data (16.4%).
Nelson et al. (1999) also performed post-hoc testing to establish that the above findings are
valid across the functional form of the time series, the number of historical data points, and
the periods in the forecast horizon. These results suggest that neural networks may benefit
from deseasonalizing data just as statistical methods do (Wheelwright and Makridakis
1985, p. 275).
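One simple multiplicative seasonal-index scheme is sketched below (ours; Hill, O'Connor and Remus 1996 used a statistical deseasonalization, not necessarily this one):

    import numpy as np

    def deseasonalize(series, period=12):
        # Estimate one seasonal index per month (or quarter) as the mean of
        # that month divided by the overall mean, then divide the series by
        # its index. Returns the adjusted series and the indices.
        x = np.asarray(series, dtype=float)
        overall = x.mean()
        indices = np.array([x[m::period].mean() / overall for m in range(period)])
        season = np.tile(indices, len(x) // period + 1)[:len(x)]
        return x / season, indices

    def reseasonalize(adjusted, indices, start=0):
        # Restore the seasonality, e.g., after forecasting the adjusted series.
        period = len(indices)
        x = np.asarray(adjusted, dtype=float)
        season = np.array([indices[(start + t) % period] for t in range(len(x))])
        return x * season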

• Use appropriate methods to choose the right starting point.


The most commonly used estimation method for neural networks, backpropagation, is
basically a gradient descent of a nonlinear error, cost, or profit surface. This means that

finding the best starting point weights for the descent is crucial to reaching the global
optimum and avoiding local optima; this has been noted by many researchers in-
cluding, most recently, Faraway and Chatfield (1998). Typically, researchers choose the
neural network starting point weights randomly. It is much better to choose an algorithm to
help one find good starting points. As shown by Marquez (1992), one such method is the
downhill simplex method of Nelder and Mead (1965); the necessary computer code can be
found in Press et al. (1988).
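A rough sketch of the idea, using SciPy's implementation of the Nelder-Mead simplex in place of the Press et al. (1988) routines (the network, loss function, and settings here are our own illustration):

    import numpy as np
    from scipy.optimize import minimize

    def sse(weights, X, y, n_hidden):
        # Sum of squared errors for a one-hidden-layer network whose weights
        # are packed into a single flat vector.
        n_in = X.shape[1]
        W1 = weights[:n_in * n_hidden].reshape(n_in, n_hidden)
        b1 = weights[n_in * n_hidden:n_in * n_hidden + n_hidden]
        W2 = weights[-(n_hidden + 1):-1]
        b2 = weights[-1]
        pred = np.tanh(X @ W1 + b1) @ W2 + b2
        return np.sum((y - pred) ** 2)

    def good_starting_point(X, y, n_hidden=3, tries=10, seed=0):
        # Run a short downhill-simplex search from several random weight
        # vectors and keep the best, before handing the result to back
        # propagation or another optimizer.
        rng = np.random.default_rng(seed)
        n_w = X.shape[1] * n_hidden + 2 * n_hidden + 1
        best = None
        for _ in range(tries):
            w0 = rng.normal(scale=0.5, size=n_w)
            res = minimize(sse, w0, args=(X, y, n_hidden), method='Nelder-Mead',
                           options={'maxiter': 200})
            if best is None or res.fun < best.fun:
                best = res
        return best.x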

• Use specialized methods to avoid local optima.

When estimating neural network models, it is possible to end up at a local optimum or
not to converge to an optimum at all. One can use many techniques to avoid these prob-
lems. Our preference is the downhill simplex method of Nelder and Mead (1965) for over-
coming these problems; Marquez (1992) gives an example of its use. Thus, one can use the
downhill simplex method both initially and to escape local optima. Researchers have suggested
many other methods to deal with this problem, including using a momentum term in gradi-
ent descent rule (Rumelhart and McClelland 1986), using genetic algorithms (Sexton, Dor-
sey and Johnson 1998), local fitting of the network (Sanzogni and Vaccaro 1993), and
using a dynamically adjusted learning rate (Marquez 1992).
This principle and the previous one deal with problems associated with any gradient de-
scent algorithm (e.g., back propagation). Some researchers prefer to use nonlinear program-
ming algorithms to try to avoid these problems. Eventually, such an algorithm will replace the
currently popular back-propagation algorithm.

• Expand the network until there is no significant improvement in fit.

As noted by many researchers, including most recently Faraway and Chatfield (1998), a lot
of the art of building a successful model is selecting a good neural-network design. Since it
has been shown mathematically that only one hidden layer is necessary for a network to
fit any function optimally (Funahashi 1989), we generally use only one hidden layer. If the
network has n input nodes, Hecht-Nielsen (1989) has mathematically shown that there need
be no more than 2n + 1 hidden layer nodes.
To select the number of input nodes in time-series forecasting, we generally start with at
least as many input nodes as there are periods in one cycle of the time series (e.g., at least 12
for monthly data). We then expand the network by incrementally increasing the number of
input nodes until there is no improvement in fit. Then we prune the network back. This is the
easiest way to build the neural-network model while avoiding overfitting. It is also common
to start with a large network and reduce it to an appropriate size using the pruning methods
(Kaastra and Boyd 1996 discuss this approach). If one identifies a clear lag structure using
traditional means, one can use the structure to set the number of nodes.
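A rough sketch of the expansion step for monthly data, using scikit-learn's MLPRegressor as a stand-in for purpose-built software (the helper names, holdout size, and settings are ours):

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    def make_lagged(series, n_lags):
        # Design matrix whose columns are the n_lags previous observations.
        x = np.asarray(series, dtype=float)
        X = np.column_stack([x[i:len(x) - n_lags + i] for i in range(n_lags)])
        return X, x[n_lags:]

    def choose_input_nodes(series, cycle=12, max_extra=12, holdout=24, seed=0):
        # Start with one seasonal cycle of input nodes and add lags one at a
        # time, keeping the count that minimizes error on a holdout split.
        best_lags, best_err = None, np.inf
        for n_lags in range(cycle, cycle + max_extra + 1):
            X, y = make_lagged(series, n_lags)
            net = MLPRegressor(hidden_layer_sizes=(2 * n_lags + 1,),
                               max_iter=2000, random_state=seed)
            net.fit(X[:-holdout], y[:-holdout])
            err = np.mean(np.abs(y[-holdout:] - net.predict(X[-holdout:])))
            if err < best_err:
                best_lags, best_err = n_lags, err
        return best_lags

The 2n + 1 hidden nodes used here simply follow the Hecht-Nielsen bound mentioned above; in practice the hidden layer would then be pruned back.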
Hill, O'Connor and Remus (1996) used one output node to make a forecast; they used this
forecast value to create another forecast further into the future. They did this iteratively (as in
the Box-Jenkins model); this is often called a moving-window approach. Zhang, Patuwo and
Hu (1998) make compelling arguments for developing neural-network models that forecast
several periods ahead simultaneously. Hill, O'Connor and Remus (1996) initially used the
simultaneous forecasting method but changed to the iterative method to avoid overfitting
problems. We suspect that many forecasters face similar problems that will lead them to use

network structures like those used by Hill, O'Connor and Remus (1996). When there is no
overfitting problem, the capability to generate multiple forecasts may be useful.
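A minimal sketch of the iterative (moving-window) scheme, assuming a fitted single-output model with a scikit-learn-style predict method (such as the one sketched earlier):

    import numpy as np

    def iterative_forecast(model, history, n_lags, horizon):
        # The network forecasts one period ahead; that forecast is appended
        # to the input window and used to forecast the next period, and so on.
        window = list(np.asarray(history, dtype=float)[-n_lags:])
        forecasts = []
        for _ in range(horizon):
            x = np.array(window[-n_lags:]).reshape(1, -1)
            yhat = float(model.predict(x)[0])
            forecasts.append(yhat)
            window.append(yhat)
        return forecasts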

• Use pruning techniques when estimating neural networks and use holdout samples
when evaluating neural networks.
Overfitting is a major concern in the design of a neural network, especially for small data
sets. When the number of parameters in a network is too large relative to the size of the esti-
mation data set, the neural network tends to "memorize" the data rather than to "generalize"
from it. The risk of overfitting grows with the size of the neural network. Thus, one way to
avoid overfitting is to keep the neural network small.
In general, it is useful to start with one hidden layer using at least as many input nodes as
are in one seasonal cycle; there are mathematical proofs to show no fitting advantage from
using more than one hidden layer. If the seasonal cycles are not stable, one can increase the
starting number of input nodes. Then one prunes the network to a small size. For example,
Marquez (1992) used Sietsma and Dow's (1991) indicators to determine where in the net-
work to prune and then pruned the network using the methods of Weigend, Hubermann and
Rumelhart (1990). Even small neural networks can often be reduced in size. For example, if
a neural network has four input nodes, three intermediate nodes, and one output node, the
fully connected network would have 23 parameters; many more than 23 observations
would be needed to avoid overfitting. Larger networks would require hundreds of data
points to avoid overfitting. Refenes (1995, pp. 28, 33-54) discusses details of pruning and
alternative approaches.
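As a crude illustration only, the sketch below zeroes out the smallest hidden-layer weights of a fitted scikit-learn network; magnitude pruning is a simple stand-in for, not a reproduction of, the Sietsma and Dow (1991) indicators or the Weigend, Hubermann and Rumelhart (1990) weight-elimination method:

    import numpy as np

    def prune_smallest_weights(net, fraction=0.2):
        # Zero the smallest fraction (by magnitude) of input-to-hidden
        # weights in a fitted sklearn MLPRegressor; the pruned network should
        # then be re-evaluated on holdout data to check that accuracy holds up.
        W = net.coefs_[0]
        cutoff = np.quantile(np.abs(W), fraction)
        W[np.abs(W) < cutoff] = 0.0
        return net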
One needs holdout (out-of-sample) data to compare models. Should any overfitting have
occurred, the comparative measures of fit on the holdout data would not be overestimated
since overfitting affects only measures based on the estimation sample.

• Obtain software that has built-in features to address the previously described
problems.
The highest cost to Hill, O'Connor and Remus (1996) and to many other neural network
researchers was the effort expended to develop custom software. We spent many hours
building the software and developing procedures to make the forecasts. Fortunately most of
these problems are now solved in off-the-shelf neural-network software packages. The capa-
bilities of the packages are always improving, so one should consult recent reviews of the
major packages.
In looking over software specifications, look for built-in support procedures, such as pro-
cedures for finding good starting points, avoiding local optima, performing pruning, and sim-
plifying neural networks.

• Build plausible neural networks to gain model acceptance.


Neural networks suffer from the major handicap that their forecasts seem to come from a
black box. That is, examining the model parameters often does not reveal why the model
made good predictions. This makes neural-network models hard to understand and difficult
for some managers to accept.
Some work has been done to make these models more understandable. For example, Be-
nitez, Castro and Requena (1997) have mathematically shown that neural networks can be
thought of as rule-based systems. However, the best approach is to carefully reduce the net-

work size so that resulting network structures are causally plausible and interpretable. This
requires selecting good software to support the model estimation.

• Use three approaches to ensure that the neural-network model is valid.


Adya and Collopy (1998) describe three validation criteria: (1) comparing the neural net-
work forecasts to the forecasts of other well-accepted reference models, (2) comparing the
neural network and traditional forecasts' ex ante (out-of-sample) performance, and (3) mak-
ing enough forecasts to draw inferences (they suggest 40 forecasts). Armstrong (2001) gives
more details. Because neural networks are prone to overfitting, one must always validate the
neural network models using at least these three validation criteria.
Neural-network researchers often partition their data into three parts rather than just two.
One portion is for model estimation, the second portion is for model testing, and the third is
for validation. This requires a lot of data.
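A minimal sketch of such a chronological three-way split (the proportions are arbitrary and ours):

    import numpy as np

    def three_way_split(series, test_frac=0.2, valid_frac=0.2):
        # Time order must be preserved, so the most recent observations form
        # the testing and validation portions.
        x = np.asarray(series, dtype=float)
        n = len(x)
        n_valid, n_test = int(n * valid_frac), int(n * test_frac)
        estimation = x[:n - n_test - n_valid]
        testing = x[n - n_test - n_valid:n - n_valid]
        validation = x[n - n_valid:]
        return estimation, testing, validation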

CONCLUSIONS
Neural networks are not a panacea, but they do perform well in many situations. They per-
form best when the estimation data contain discontinuities. They may be more effective for
monthly and quarterly series than for annual series. Also, neural networks perform better than
statistical methods do for forecasting three or more periods out on the forecast horizon. An-
other strength of neural networks is that they can be automated.
Neural networks might be superior to traditional extrapolation models when nonlinearities
and discontinuities occur. Neural networks may be better suited to some task domains than
others, and we need more research to define these conditions.
We have given some guidelines on the issues and pitfalls forecasters face in estimating
neural network models, which are similar to those they face with traditional extrapolation
models. Forecasters need to take time to master neural networks and they need good software.
The research cited above on neural networks is largely based on experience with time se-
ries forecasting tasks. These principles should generalize to many non-time series forecasting
models since neural networks have been mathematically shown to be universal approximators
of functions and their derivatives and to approximate ordinary linear and nonlinear least-
squares regression and nonparametric regression.
Research on neural networks is growing exponentially. Concerned practitioners should
read the periodic reviews of the emerging literature like that of Zhang, Patuwo and Hu
(1998). However, the standards many researchers use fall short of those discussed by Adya
and Collopy (1998) and Armstrong (2001). Thus, practitioners should apply the standards
of Adya and Collopy (1998) and Armstrong (2001) when evaluating the emerging literature.

APPENDIX: WHAT ARE NEURAL NETWORKS?


Neural networks consist of interconnected nodes, termed neurons, whose design is suggested
by their biological counterparts. Each neuron has one or more incoming paths (Exhibit 2).
Each incoming path i has a signal on it (X_i), and the strength of the path is characterized by a
weight (W_i). The neuron sums the path weight times the input signal over all paths; in addi-
tion, the node may be biased by an amount (Q). Mathematically, the sum is expressed as
follows:

sum = Σ W_i X_i + Q

Exhibit 2
A neuron

[Diagram: input signals X_1 through X_n enter the neuron along weighted paths; the neuron sums them, applies a transform, and produces the output.]

The output (Y) of the node is usually a sigmoid-shaped logistic transformation of the sum
when the signals are continuous variables. This transformation is shown below:

Y = 1 / (1 + e^(-sum))
Learning occurs through the adjustment of the path weights (W_i) and node bias (Q). The most
common method used for the adjustment is called back propagation. In this method, the fore-
caster adjusts the weights to minimize the squared difference between the model output and
the desired output. The adjustments are usually based on a gradient descent algorithm.
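A minimal sketch of these formulas for a single neuron (ours; real back propagation applies the same chain rule through every layer of the network):

    import math, random

    def train_neuron(samples, n_inputs, rate=0.5, epochs=1000, seed=0):
        # One logistic neuron trained by gradient descent on squared error,
        # using sum = W_1*X_1 + ... + W_n*X_n + Q and Y = 1/(1 + exp(-sum)).
        rng = random.Random(seed)
        w = [rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
        q = rng.uniform(-0.5, 0.5)
        for _ in range(epochs):
            for x, target in samples:
                s = sum(wi * xi for wi, xi in zip(w, x)) + q
                y = 1.0 / (1.0 + math.exp(-s))
                # Gradient of 0.5*(target - y)^2 with respect to the sum.
                delta = (y - target) * y * (1.0 - y)
                w = [wi - rate * delta * xi for wi, xi in zip(w, x)]
                q -= rate * delta
        return w, q

    # Example: learn the Boolean AND function, which one neuron can represent.
    weights, bias = train_neuron([((0, 0), 0), ((0, 1), 0),
                                  ((1, 0), 0), ((1, 1), 1)], n_inputs=2)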
Many neurons combine to form a network (Exhibit 3). The network consists of an input
layer, an output layer, and perhaps one or more intervening layers; the latter are termed hid-
den layers. Each layer consists of multiple neurons and these neurons are connected to other
neurons in adjacent layers. Since these networks contain many interacting nonlinear neurons,
the networks can capture fairly complex phenomena.

Exhibit 3
A neural network

[Diagram: neurons arranged in an input layer, a hidden layer, and an output layer, with connections between neurons in adjacent layers.]

REFERENCES
Adya, M. & F. Collopy (1998), "How effective are neural networks at forecasting and
prediction? A review and evaluation," Journal of Forecasting, 17, 451-461. (Full text
at hops.wharton.upenn.edu/forecast)
Armstrong, J. S. (2001), "Evaluating forecasting methods," in J. S. Armstrong (ed.), Prin-
ciples of Forecasting. Norwell, MA: Kluwer Academic Publishers.
Armstrong, J. S. & F. Collopy (1992), "Error measures for generalizing about forecasting
methods: Empirical comparisons," International Journal of Forecasting, 8, 69-80.
(Full text at hops.wharton.upenn.edu/forecast)
Benitez, J. M., J. L. Castro & I. Requena (1997), "Are artificial neural networks black
boxes?" IEEE Transactions on Neural Networks, 8, 1156-1164.
Carbone, R. & S. Makridakis (1986), "Forecasting when pattern changes occur beyond the
historical data," Management Science, 32, 257-271.
Collopy, F. & J. S. Armstrong (1992), "Rule-based forecasting: Development and valida-
tion of an expert systems approach to combining time series extrapolations," Manage-
ment Science, 38, 1394-1414.
Cybenko, G. (1989), "Approximation by superpositions of a sigmoidal function," Mathe-
matics of Control, Signals, and Systems, 2, 303-314.
Dorsey, R. E. & S. Sen (1998), "Flexible form estimation: A comparison of polynomial
regression with artificial neural networks," Working paper: University of Mississippi.
Faraway, J. & C. Chatfield (1998), "Time series forecasting with neural networks: A com-
parative study using the airline data," Applied Statistics, 47, Part 2, 231-250.
Foster, B., F. Collopy & L. Ungar (1992), "Neural network forecasting of short, noisy time
series," Computers and Chemical Engineering, 16, 293-297.
Funahashi, K. (1989), "On the approximate realization of continuous mappings by neural
networks," Neural Networks, 2, 183-192.
Hecht-Nielsen, R. (1989), "Theory of the backpropagation neural network," Proceedings of
the International Joint Conference on Neural Networks, Washington, DC, I, 593-605.
Hill, T., M. O'Connor & W. Remus (1996), "Neural network models for time series fore-
casts," Management Science, 42, 1082-1092.
Hornik, K., M. Stinchcombe & H. White (1989), "Multilayer feedforward networks are
universal approximators," Neural Networks, 2, 359-366.
Kaastra, I. & M. Boyd (1996), "Designing a neural network for forecasting financial and
economic time series," Neurocomputing, 10, 215-236.
Kang, S. (1991), An investigation of the use of feedforward neural networks for forecast-
ing, Ph.D. Dissertation, Kent, Ohio: Kent State University.
Kolarik, T. & G. Rudorfer (1994), "Time series forecasting using neural networks," APL
Quote Quad, 25, 86-94.
Makridakis, S., A. Andersen, R. Carbone, R. Fildes, M. Hibon, R. Lewandowski, J. New-
ton, E. Parzen & R. Winkler (1982), "The accuracy of extrapolation (time series)
methods: Results of a forecasting competition," Journal of Forecasting, 1, 111-153.
Makridakis, S., C. Chatfield, M. Hibon, M. J. Lawrence, T. Mills, K. Ord & L. F. Simmons
(1993), "The M2-Competition: A real-time judgmentally based forecasting competi-
tion," Journal of Forecasting, 9, 5-22.
Makridakis, S., M. Hibon, E. Lusk & M. Belhadjali (1987), "Confidence intervals: An
empirical investigation of the series in the M-Competition," International Journal of
Forecasting, 3, 489-508.

Marquez, L. (1992), Function approximation using neural networks: A simulation study,
Ph.D. Dissertation, Honolulu, Hawaii: University of Hawaii.
Nelder, J. & R. Mead (1965), "The downhill simplex method," Computer Journal, 7, 308-
310.
Nelson, M., T. Hill, W. Remus & M. O'Connor (1999), "Time series forecasting using
neural networks: Should the data be deseasonalized first?" Journal of Forecasting, 18,
359-370.
Press, W., B. Flannery, S. Teukolsky & W. Vetterling (1988), Numerical Recipes in C: The
Art of Scientific Computing. Cambridge, UK: Cambridge University Press.
Refenes, A. P. (1995), Neural Networks in the Capital Markets. Chichester, UK: Wiley.
Rumelhart, D. & J. McClelland (1986), Parallel Distributed Processing. Cambridge, MA:
MIT Press.
Sanzogni, L. & J. A. Vaccaro (1993), "Use of weighting functions for focusing of learning
in artificial neural networks," Neurocomputing, 5, 175-184.
Sietsma, J. & R. Dow (1991), "Creating artificial neural networks that generalize," Neural
Networks, 4, 67-79.
Sexton, R. S., R. E. Dorsey & J. D. Johnson (1998), "Toward global optimization of neural
networks: A comparison of the genetic algorithms and backpropagation," Decision
Support Systems, 22, 171-185.
Sharda, R. & R. Patil (1990), "Neural networks as forecasting experts: An empirical test,"
Proceedings of the 1990 IJCNN Meeting, 2, 491-494.
Sharda, R. & R. Patil (1992), "Connectionist approach to time series prediction: An em-
pirical test," Journal of Intelligent Manufacturing, 3, 317-323.
Tang, Z., C. de Almeida & P. Fishwick (1990), "Time series forecasting using neural net-
works vs. Box-Jenkins methodology," Simulation, 57, 303-310.
Wasserman, P. D. (1989), Neural Computing: Theory and Practice. New York: Van
Nostrand Reinhold.
Weigend, A., B. Hubermann & D. Rumelhart (1990), "Predicting the future: A
connectionist approach," International Journal of Neural Systems, 1, 193-209.
Wheelwright, S. & S. Makridakis (1985), Forecasting Methods for Management, 4th ed.,
New York: Wiley.
White, H. (1992a), "Connectionist nonparametric regression: Multilayer feedforward net-
works can learn arbitrary mappings," in H. White (ed.), Artificial Neural Networks:
Approximations and Learning Theory. Oxford, UK: Blackwell.
White, H. (1992b), "Consequences and detection of nonlinear regression models," in H.
White (ed.), Artificial Neural Networks: Approximations and Learning Theory. Ox-
ford, UK: Blackwell.
White, H. & A. R. Gallant (1992), "There exists a neural network that does not make
avoidable mistakes," in H. White (ed.), Artificial Neural Networks: Approximations
and Learning Theory. Oxford, UK: Blackwell.
White, H., K. Hornik & M. Stinchcombe (1992), "Universal approximation of an unknown
mapping and its derivatives," in H. White (ed.), Artificial Neural Networks: Approxi-
mations and Learning Theory. Oxford, UK: Blackwell.
White, H. & M. Stinchcombe (1992), "Approximating and learning unknown mappings
using multilayer feedforward networks with bounded weights," in H. White (ed.), Ar-
tificial Neural Networks: Approximations and Learning Theory. Oxford, UK: Black-
well.

Zhang, G., B. E. Patuwo & M. Y. Hu (1998), "Forecasting with artificial neural networks:
The state of the art," International Journal of Forecasting, 14, 35-62.

Acknowledgments: We appreciate the valuable comments on our paper made by Sandy
Balkin, Chris Chatfield, Wilpen Gorr, and others at the 1998 International Symposium on
Forecasting in Edinburgh.
