
ARTICLE IN PRESS

Tourism Management 27 (2006) 781–790


www.elsevier.com/locate/tourman

Designing an artificial neural network for forecasting tourism time series
Alfonso Palmer, Juan José Montaño, Albert Sesé
Facultad de Psicología, Universidad de las Islas Baleares, Ctra. Valldemossa Km. 7,5, 07122 Palma de Mallorca, Spain
Received 23 July 2003; accepted 19 May 2005

Abstract

This paper aims to provide, on one hand, an introduction to the theoretical principles of artificial neural networks (ANN) and on
the other, a step-by-step methodology for designing a neural network for tourism time series forecasting. The time series
corresponding to tourism expenditure in the Balearic Islands (Spain), one of the world’s major tourist destinations, has been used as
data to illustrate this process. A number of practical rules and discussion points between authors, comprehensible to both academic
researchers and practitioners, have been included throughout the paper so that ANN can be applied successfully. Lastly, the results
obtained in this study provide information for researchers interested in applying ANN to tourism data forecasting.
© 2005 Elsevier Ltd. All rights reserved.

Keywords: Neural networks; Tourism forecasting

Corresponding author. Tel.: +34 971 17 34 32; fax: +34 971 17 31 90.
E-mail addresses: alfonso.palmer@uib.es (A. Palmer), juanjo.montano@uib.es (J. José Montaño), albert.sese@uib.es (A. Sesé).
0261-5177/$ - see front matter © 2005 Elsevier Ltd. All rights reserved.
doi:10.1016/j.tourman.2005.05.006

1. Introduction

Due to the perishable nature of the tourism industry, the need to devise accurate forecasts has become crucial (Chandra & Menezes, 2001; Law, 2000a; Law & Au, 1999). Thus, researchers, practitioners and policy makers have openly acknowledged the need for accurate forecasts in the field of tourism (Sheldon & Var, 1985). In the case of tourism demand, better forecasts would help directors and investors make operational, tactical and strategic decisions, examples of which are scheduling and staffing, preparing tour brochures and hotel investments, respectively. Similarly, government bodies need accurate tourism demand forecasts to plan required tourism infrastructures, such as accommodation site planning and transportation development, among other needs. Forecasting tourism expenditure is also of value in ascertaining the relative contributions of tourism to production, income and employment in tourist destinations (Bull, 1995). Thus, the large quantity of academic literature which has been generated in this area is not surprising (Morley, 2000).

Despite the consensus on the need to develop accurate forecasts and the recognition of their corresponding benefits, there is no one model that stands out in terms of forecasting accuracy (Law & Au, 1999; Witt & Witt, 1995). In this respect, one of the most widely used procedures in time series forecasting is the Box–Jenkins methodology (Box & Jenkins, 1976), which is based on the fit of a special type of linear statistical model known as Autoregressive Integrated Moving Average (ARIMA). One problem that makes developing and implementing this type of time series model difficult is that the model must be formally specified and a probability distribution for data must be assumed (Hansen, McDonald & Nelson, 1999).

In recent years, the study of artificial neural networks (ANN) has aroused great interest in fields as diverse as biology, psychology, medicine, economics, mathematics, statistics and computers. The reason behind this interest

is that ANN are universal function approximators capable of mapping any linear or non-linear function (Cybenko, 1989; Funahashi, 1989; Hornik, Stinchcombe & White, 1989; Wasserman, 1989). Due to their flexibility in function approximation, ANN are powerful methods in tasks involving pattern classification, estimating continuous variables and forecasting (Kaastra & Boyd, 1996). In the last case, neural networks offer several potential advantages over alternative methods, mainly the ARIMA time series models, when handling problems with non-normal and non-linear data (Hansen et al., 1999). The first advantage is that ANN are very versatile and require neither formal specification of the model nor the assumption of a particular probability distribution for the data. As for the second advantage, Masters (1995) demonstrates that ANN are capable of tolerating the presence of chaotic components better than most alternative methods. This capacity is particularly important, as many relevant time series possess significant chaotic components.

ANN have been applied in the many fields mentioned above and have been a pioneer in the field of tourism data analysis (Chandra & Menezes, 2001). Thus, neural network models have recently been used as a statistical technique in the main fields of tourism research, such as demand and consumer behaviour forecasting (Burger, Dohnal, Kathrada & Law, 2001; Cho, 2003; Jeng & Fesenmaier, 1996; Law, 1998, 2000a, b, 2001; Law & Au, 1999; Mohsin & Ryan, 1999; Pattie & Snyder, 1996; Sirakaya, Delen & Choi, 2005; Tsaur, Chiu & Huang, 2002; Uysal & El Roubi, 1999; Wang, 2004), market segmentation and positioning analysis (Bloom, 2002, 2004, 2005; Dolnicar & Fluker, 2003; Ganglmair & Wooliscroft, 2000; Kim, Wei & Ruys, 2003; Mazanec, 1992, 1995, 1999; Wallace, Maglogiannis, Karpouzis, Kormentzas & Kollias, 2003).

These studies indicate a growing interest in using ANN to represent complex, non-linear cognitive activities that are often studied in tourism management (Morley, 2000). Nevertheless, due to the relatively recent introduction of neural networks in tourism, the number of published articles which clearly explain a methodology that obtains reliable tourism data modelling through neural networks continues to be limited in comparison with other statistical methods (Uysal & El Roubi, 1999). On the other hand, due to their flexibility, neural networks lack a systematic procedure for model building. Therefore, obtaining a reliable neural model involves selecting a large number of parameters experimentally through trial and error.

Thus, this paper aims to provide both an introduction to the main principles of ANN and a step-by-step methodology for designing a neural network for tourism time series forecasting. Several practical rules and discussion points between authors, comprehensible to both academic researchers and practitioners, have been included throughout the paper for successful application. Lastly, the results obtained in this study provide valuable information on applying ANN in tourism data forecasting.

The time series corresponding to tourism expenditure in the Balearic Islands, one of Spain’s most internationally relevant tourism destinations, shall be used as data to illustrate the application of this technique. Spain is the world’s second major tourism destination, with a total of 49.5 million tourists in 2001 (Ivars, 2003). The Balearic Islands, which cover a surface area of 4968.36 km² and had 878,627 inhabitants in 2001, are considered an exceptional tourism centre, one of the top-ranking regions in the world in terms of affluence, with a volume of 8.4 million arrivals in 2001, a 1.2% share of the total world market. At the same time, the significance of tourism to its economy can be determined by observing the Balearic Islands’ own statistics: an estimated €5096.2 million of revenue generated in 2001. Additionally, tourism added value (TVA) amounted to 21.2% of the Balearic GDP in 1997, while the value added of tourism industries (VATI) was estimated at 31.2% (Palmer & Riera, 2003).

Having discussed the role that ANN can play in tourism time series forecasting and the objectives of this research, the remaining sections of this paper are organised as follows: Section 2 briefly introduces the basic foundations of ANN and describes artificial neuron operations. Section 3 presents the neural network model most widely used in time series forecasting: the multi-layer perceptron (MLP). Section 4 describes a step-by-step methodology for applying a neural network to tourism expenditure forecasting in the Balearic Islands. The paper’s final section concludes with a discussion of the results obtained and suggestions for future lines of research.

2. The artificial neuron

ANN can be defined as information processing systems whose structure and functioning are inspired by biological neural networks. They have three fundamental features (parallel processing, distributed memory and adaptability) that provide them with a series of advantages compared to other processing systems, such as robustness and a tolerance to error and noise.

In general, an ANN is made up of a large number of simple processing elements known as nodes or neurons, which are organised in layers. Each neuron is connected to other neurons by communication links, each of which has an associated numerical value known as a “weight”. Weights contain the knowledge or information that ANN possess about a specific problem.

The task of an artificial neuron $j$ is simple and consists of receiving input signals ($x_i$), weighted by connection weights ($w_{ij}$), from neighbouring neurons. The sum of

these weighted signals provides the neuron’s total or net input ($net_j$). Then, the activation threshold of neuron $j$, represented by a positive or negative value $\theta_j$, is added to the net input, and by applying a mathematical function $f(\cdot)$ (generally non-linear and known as an activation function) to the net input, the output value $y_j$ is computed and sent to other neurons (see Fig. 1).

3. The MLP

MLP (also known as a feedforward neural network) is the neural network model most widely used in applied work (Sarle, 2002), because it is capable of resolving a wide variety of problems. For the purposes of this paper, the MLP network is also the type of neural network most commonly used in time series forecasting (Bishop, 1995; Kaastra & Boyd, 1996).

An MLP network is made up of an input layer, an output layer and one or more hidden layers of neurons. In this type of architecture, data are always transmitted from the input layer to the output layer. From a statistical point of view, each input neuron represents one of the independent variables, while the output neuron(s) represent dependent variable(s) or MLP network forecasts.

Two stages may be considered in the MLP network: the running stage, in which an input pattern is presented to the trained network and transmitted through successive layers of neurons until reaching an output, and the training or learning stage, in which the weights or parameters of the network are iteratively modified on the basis of a set of input–output patterns known as a training set, in order to minimise the deviance or error between the output obtained by the network and the user’s desired output. This is why MLP network learning is said to be supervised. The learning rule commonly used in this type of network is the back propagation algorithm or gradient descent method, developed and disseminated by Rumelhart, Hinton and Williams (1986).

Univariate time series modelling is normally carried out through this neural network by using a given number of the time series’ lagged terms as inputs and forecasts as output (Bishop, 1995). The number of input neurons determines the number of prior time points to be used in each forecast, while the number of output neurons determines the forecast horizon. Thus, a one-step-ahead forecast can be performed through a neural network with one output unit and a k-step-ahead forecast can be performed with k output units. Fig. 2 illustrates an MLP forecasting model in which four past terms in the series are used to forecast the value of the series in a one-step-ahead forecast.

4. Designing an ANN model for tourism time series forecasting

Resolving a forecasting task through an MLP network means applying a methodology that not only has aspects in common with conventional statistical modelling techniques, but also has other singular aspects that can only be found in the ANN field. The application procedure is described below.

4.1. Data preprocessing

This research worked with data on tourism expenditure in the Balearic Islands (in millions of pesetas) from each quarter of 1986 through to the year 2000. Therefore, this time series is composed of 60 time points. The data were provided by the Ministry of the Economy. Fig. 3(a) shows the time series which was used. In the first place, a marked seasonality can be seen in which the first and fourth quarters (October–March) make up the ‘low season’ and coincide with autumn and winter, and the second and third quarters (from April to

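Fig. 1. General functioning of an artificial neuron. Neurons $i$ send inputs $x_1, x_2, \ldots, x_N$ through weights $w_{1j}, w_{2j}, \ldots, w_{Nj}$ to neuron $j$, which computes $net_j = \sum_{i=1}^{N} w_{ij} x_i + \theta_j$ and outputs $y_j = f(net_j)$.

Fig. 2. MLP network for one-step-ahead forecasting based on four lagged terms. The input layer receives $t_{n-4}$, $t_{n-3}$, $t_{n-2}$ and $t_{n-1}$; after the hidden layer, the output layer produces $t_n$.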
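The computation summarised in Fig. 1, and its use in a network of the kind shown in Fig. 2, can be sketched in a few lines of Python. This is a minimal illustration, not the software used in the paper: the function names `neuron_output` and `mlp_forecast` are hypothetical, the weights and thresholds are arbitrary numbers chosen for the example rather than fitted values, and the hyperbolic tangent is assumed as the activation function.

```python
import math


def neuron_output(inputs, weights, theta, activation=math.tanh):
    """Return y_j = f(net_j), where net_j = sum_i w_ij * x_i + theta_j."""
    net = sum(w * x for w, x in zip(weights, inputs)) + theta
    return activation(net)


def mlp_forecast(lagged_terms, hidden_layer, output_neuron):
    """One-step-ahead forecast from an MLP with a single hidden layer.

    hidden_layer: list of (weights, theta) pairs, one per hidden neuron.
    output_neuron: a (weights, theta) pair for the single output neuron.
    """
    hidden = [neuron_output(lagged_terms, w, th) for w, th in hidden_layer]
    out_weights, out_theta = output_neuron
    return neuron_output(hidden, out_weights, out_theta)


# Hypothetical values for illustration only (assumption, not from the paper).
lags = [0.10, -0.20, 0.05, 0.30]  # t_{n-4}, t_{n-3}, t_{n-2}, t_{n-1}
hidden_layer = [
    ([0.5, -0.3, 0.8, 0.1], 0.0),
    ([-0.2, 0.4, 0.1, 0.6], 0.1),
]
output_neuron = ([0.7, -0.5], 0.05)
forecast = mlp_forecast(lags, hidden_layer, output_neuron)
```

Because the output neuron also uses the hyperbolic tangent here, the forecast lies strictly between -1 and 1 and would have to be mapped back to the original units before being interpreted.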

Fig. 3. Graphic representation of raw and preprocessed data. Raw data: (a) time series (millions of pesetas by year, 1986 to 2000) and (b) histogram (frequency by millions of pesetas). Preprocessed data: (c) time series (difference by year, 1987 to 2000) and (d) histogram (frequency by difference).
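The preprocessing behind panels (c) and (d) of Fig. 3, as described in Section 4.1 (natural logarithm, subtraction of the same quarter of the previous year, then linear scaling to [-1, 1]), can be sketched as follows, together with the construction of lagged input and output patterns. This is a hedged sketch: the function names are hypothetical, and plain min-max scaling is assumed as a stand-in for the linear transformation of Masters (1993), whose exact form is not given in the text.

```python
import math

SEASON = 4  # quarterly data: the same quarter recurs every four observations


def log_seasonal_difference(series, season=SEASON):
    """Natural log, then subtract the same quarter of the previous year."""
    logs = [math.log(v) for v in series]
    return [logs[i] - logs[i - season] for i in range(season, len(logs))]


def scale_to_interval(values):
    """Linearly map values onto [-1, 1] (min-max scaling, an assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant series")
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


def lag_matrix(values, n_lags, horizon=1):
    """Build (inputs, outputs) rows: n_lags past terms -> next `horizon` terms."""
    rows = []
    for start in range(len(values) - n_lags - horizon + 1):
        inputs = values[start:start + n_lags]
        outputs = values[start + n_lags:start + n_lags + horizon]
        rows.append((inputs, outputs))
    return rows
```

With 60 time points, four lagged terms and a one-step horizon, `lag_matrix` yields 56 rows, from (t1, t2, t3, t4) -> t5 up to (t56, t57, t58, t59) -> t60.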

September) make up the ‘high season’ and coincide with spring and summer, as tourists visiting the Islands are mainly attracted by the sun-and-sand offer. In the second place, the time series shows a slightly rising trend from approximately 1992, which indicates the growth that tourism expenditure in the Balearic Islands has experienced in recent years. Fig. 3(b) shows the histogram of data distribution in which a marked asymmetry to the right can be seen.

Preprocessing data refers to analysing and transforming input and output variables in order to detect trends, minimise noise, underline important relationships and flatten the variable’s distribution. These analyses and transformations help the model learn relevant patterns. Logarithmic transformation (natural log) and differencing are the two preprocessing techniques most commonly used in both traditional and neural network forecasting (Kaastra & Boyd, 1996). Logarithmic transformation is useful in correcting the asymmetry to the right present in data distribution. This transformation also allows multiplicative relations to be converted into additive relations, which simplifies and improves data modelling (Masters, 1993). Differencing, or the use of changes in a variable, can be used to eliminate both linear trend and seasonality in a set of data (Masters, 1993).

Lastly, when applying MLP networks it is recommended, although not strictly required, that the variables’ range of values be limited to the working interval of the activation function used in the hidden and output layers of neural networks (Masters, 1993; Sarle, 2002). This measure considerably accelerates weight learning and avoids saturation or overflow of the hidden and output neurons, whose activation values generally fall within the [0, 1] or [-1, 1] interval.

In order to analyse the effect that data preprocessing has on MLP network performance, two sets of data were used. The first set (which shall be called raw

data) is made up of the time series data limited to the [-1, 1] interval through the linear transformation proposed by Masters (1993). The second set (which shall be called preprocessed data) is made up of the time series data to which two consecutive transformations have been applied with the twofold aim of preprocessing the data and eliminating the two deterministic components identified above, linear trend and seasonality. Thus, the logarithmic transformation was applied, followed by a differencing that consists of subtracting the value of the same quarter of the previous year from the value of each quarter (e.g., the 3rd quarter of 1998 minus the 3rd quarter of 1997). Fig. 3(c) shows the preprocessed time series from which linear trend and seasonality have been eliminated, while Fig. 3(d) shows the histogram, which reflects a more symmetrical distribution after preprocessing. Finally, the differenced values were limited to the [-1, 1] interval through linear transformation (Masters, 1993).

Next, different data matrices were generated for both sets of data on the basis of the number of lagged terms (from 1 to 8) that would act as input and the number of steps ahead to forecast (1 and 2) that would act as the neural network’s output. To illustrate, Fig. 4 shows the structure of the data matrix corresponding to the MLP network in Fig. 2, in which each time point is to be forecast on the basis of the four previous time points.

Inputs                  Outputs
t1    t2    t3    t4    t5
t2    t3    t4    t5    t6
t3    t4    t5    t6    t7
...
t56   t57   t58   t59   t60

Fig. 4. Structure of the data matrix corresponding to the MLP network in Fig. 2.

4.2. Creating training, validation and test sets

In ANN methodology, data samples are frequently subdivided into three sets (Bishop, 1995; Ripley, 1996), namely training, validation and test sets, in order to obtain a network which is capable of generalising and performing well with new cases.

During the network’s learning stage, the weights are iteratively modified on the basis of the training set’s values, in order to minimise the error between the network output and the user’s desired output. Nevertheless, an excessive number of parameters or weights in relation to the problem at hand and to the number of training data may lead to overfitting. This phenomenon occurs when the model fits the irrelevant features present in training data too closely instead of fitting the underlying function which relates inputs and outputs, losing its capacity to generalise learning to new cases (Baum & Haussler, 1989).

To avoid the problem of overfitting, it is recommended to use a second set of data, the validation set, which permits the learning process to be controlled. During learning, the network modifies weights on the basis of the training data and, alternately, the network error made with validation data is obtained. Thus, the optimum number of weights can be ascertained on the basis of the architecture that has performed best with the validation data. The value of other parameters that play a part in network learning can also be determined through the validation set, as can be seen below.

Lastly, if the final effectiveness of the system built is to be measured in a completely objective manner, the error committed with validation data should not be used as a basis, as to some degree these data have participated in the training process. A third set of independent data must be used, the test set, which provides an unbiased estimate of the generalisation error.

There is no precise rule on the optimum size of the three sets of data, although authors agree that the training set must be the largest (Kaastra & Boyd, 1996; West, Brockett & Golden, 1997). In this research, data from the years between 1986 and 1996 were used as the training set (73.3%), data from 1997 and 1998 were used as the validation set (13.3%) and data from 1999 and 2000 were used as the test set (13.3%).

4.3. ANN model building

The following is practical advice on four sets of parameters involved in creating an MLP network through the back propagation learning rule: network architecture, learning rate and momentum factor, activation function of the hidden and output layers and number of iterations.

As for the MLP network architecture, it is known that using one sole hidden layer of neurons will be sufficient for most practical problems (Funahashi, 1989; Hornik et al., 1989). The number of hidden neurons determines the MLP network’s capacity to learn. Despite its importance, there is no rule that indicates the optimum number of hidden neurons for any given problem. Bearing in mind the overfitting problem, the most common recommendation is to select the network which performs best with the validation set using the least possible number of hidden neurons (Masters, 1993; Smith, 1993; Rzempoluck, 1998). The input layer is in charge of receiving a given number of the time series’ lagged terms to carry out the forecast. Normally, the selection of lagged terms is determined experimentally through trial and error, on the basis of the minimum set of terms that obtains the least error with the validation set. Two alternatives to the experimental selection of input

variables have been suggested. First, applying a sensitivity analysis to the network model provides information on the significance that each input has on output, and as a result, the lagged terms most relevant to forecasting can be selected (Montaño & Palmer, 2003). Second, selecting the lagged terms in the MLP network can be carried out on the basis of the autoregressive or lagged terms obtained by applying the Box–Jenkins methodology (Tang & Fishwick, 1993). Lastly, the output layer determines the neural network’s forecast horizon. As for the scope of the forecast horizon, authors such as Masters (1993) and Kaastra and Boyd (1996) recommend limiting the forecast to a one-step-ahead prediction, assuming that the use of broader forecast horizons has very serious repercussions on network performance. Nevertheless, Tang and Fishwick (1993) proved in a systematic study that expanding the forecast horizon does not imply a decrease in the neural network’s performance.

A set of models based on the combination of different values for the number of input (from 1 to 8), hidden (from 1 to 3) and output (1 and 2) neurons were constructed for this study in order to analyse the effect that the type of architecture has on MLP network performance and in consonance with the structure of matrices generated in the data preprocessing stage.

The learning rate value plays a crucial role in the MLP network training process, as it controls the size of the changes in weight in each iteration. Changes that are either too small or too large must be avoided in order to obtain optimum weight configurations. A learning rate between 0.05 and 0.5 provides good results in most practical cases (Rumelhart et al., 1986). The momentum factor determines the effect of past changes in weights on current changes in weights and allows the speed of learning to be increased by filtering the oscillations caused by the learning rate. The momentum factor usually has a value close to 1, e.g., 0.9 (Rumelhart et al., 1986). A learning rate of 0.25 and a momentum factor of 0.8 were held constant in all the MLP networks in our study, although it was shown that similar results were obtained for a wide range of values in both parameters.

In a standard MLP network, the input layer neurons use a linear activation function, while the hidden and output layer neurons use a sigmoid activation function. In this sense, the two sigmoid functions most used are the logistic (providing continuous values between 0 and 1) and hyperbolic tangent (providing continuous values between -1 and 1) functions. In this study, the hyperbolic tangent function was used in the hidden and output layers of the MLP networks, as it considerably accelerates weight learning in comparison to the logistic function (Fahlman, 1988; Fausett, 1994). It should be recalled that the data were scaled to the [-1, 1] interval in the preprocessing stage, consistent with the activation function used in the hidden and output layers of the models built.

Lastly, MLP network training must conclude when the weights converge. This point occurs when the error committed with the training data stops decreasing. In most studies, the number of training iterations used until convergence was reached ranged between 100 and 200,000 iterations (Kaastra & Boyd, 1996). A total of 10,000 iterations in each training run was sufficient in this study to guarantee convergence of the weights obtained.

At present there are many free and commercial computer programs that allow MLP networks to be simulated through the manipulation of all the parameters analysed in this section without the need for deep knowledge of the underlying mathematical algorithms. The MLP networks used in this paper were generated by the Neural Connection 2.1 computer program (SPSS Inc., 1998).

4.4. Evaluation and selection of ANN models

A total of 96 MLP networks were obtained on the basis of combining the following parameters: raw/preprocessed data and the number of input (from 1 to 8), hidden (from 1 to 3) and output (1 or 2) neurons. The models were evaluated with the validation data through three forecasting accuracy measures: root mean squared error (RMSE), mean absolute percentage error (MAPE) and Theil’s U coefficient.

The MLP architecture selected, that is, the one which presented the best forecasting accuracy with the validation data, was composed of eight input neurons, one hidden neuron and one output neuron (in abbreviated form, an 8-1-1 architecture), the time series having been previously preprocessed. The values obtained through this MLP network with the validation data as regards accuracy measures were RMSE = 6270.57, MAPE = 3.32 and U-Theil = 0.016.

These three accuracy measures were applied to the 96 MLP networks using the test set as data in order to analyse the level of generalisation of the different models created. Table 1 shows the results of a selection of 28 MLP networks and reflects the effect that manipulating different parameters has on performance. In the first place, one of the most striking results is that neural network performance clearly improved in the three measures used when data were preprocessed. In this case, all the architectures, including the architecture selected in the validation stage, obtained MAPE values of less than 5 percent, which can be considered highly accurate forecasting (Witt & Witt, 1992). This implies that data preprocessing which consists of eliminating the deterministic components of the time series (linear trend and seasonality) has positive repercussions on the goodness of fit of MLP networks. In the second place, it can be seen that two-steps-ahead forecasting

Table 1
Forecasting accuracy of MLP networks with test data

Network architecture     Raw data                         Preprocessed data
(input-hidden-output)    RMSE        MAPE     U-Theil     RMSE        MAPE    U-Theil

2-2-1                    94077.38    29.30    0.266       10556.78    4.20    0.023
3-2-1                    28473.71    11.45    0.068        8235.75    3.72    0.018
4-2-1                    16531.10     8.87    0.039       10578.32    4.11    0.023
5-2-1                    16211.88     9.70    0.038       11842.94    4.14    0.026
6-1-1                    24292.66    17.34    0.058        9688.92    3.88    0.021
7-2-1                    22574.47    15.44    0.054        7412.12    3.40    0.016
8-1-1                    27603.08    21.07    0.065        8181.41    3.44    0.018
2-3-2                    11482.83     7.26    0.026        8982.47    3.97    0.020
3-1-2                    33268.98    18.51    0.081        6592.16    4.32    0.015
4-1-2                    45808.06    31.98    0.115        8160.88    4.60    0.018
5-2-2                    20697.03     8.04    0.049        6386.23    4.22    0.014
6-1-2                    64520.22    46.03    0.164        9781.20    4.11    0.022
7-2-2                    13863.27     8.78    0.032        7423.22    4.12    0.017
8-2-2                    18417.48     8.92    0.044        9308.34    3.69    0.021
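The three accuracy measures reported in Table 1 can be sketched as below. The text names the measures but does not print their formulas, so this is an illustration under stated assumptions: MAPE is expressed in percent, and Theil's U is taken to be the bounded variant (values between 0 and 1) that divides the RMSE by the sum of the root mean squares of the actual and forecast series, which is an assumption suggested by the small U values in the table. The function names are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n


def theil_u(actual, predicted):
    """Theil's U, bounded variant (assumption): 0 means a perfect forecast."""
    n = len(actual)
    rms_actual = math.sqrt(sum(a * a for a in actual) / n)
    rms_predicted = math.sqrt(sum(p * p for p in predicted) / n)
    return rmse(actual, predicted) / (rms_actual + rms_predicted)
```

All three functions take the actual and forecast values on the same scale, so forecasts produced on scaled or differenced data would need to be transformed back before the measures are compared with figures in the original units.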

obtains very similar results to those provided by one-step-ahead forecasting when data have been preprocessed. It can therefore be said that expanding the forecast horizon did not lead to a noticeable decrease in the models’ performance.

5. Conclusions

This paper has presented an introduction to the theoretical principles of ANN and a step-by-step methodology to design a neural network for tourism time series forecasting.

ANN can be considered flexible, non-linear, all-purpose statistical tools, capable of learning the complex relations that occur in the social processes associated with tourism. This technology presents a series of advantages compared to classic statistical models. On one hand, ANN do not depend on meeting statistical conditions such as the type of relation between variables or the type of data distribution, for example. On the other hand, as universal function approximators, they are capable of fitting both linear and non-linear functions without the need to know the form of the underlying function a priori. Thus, in the field of tourism forecasting ANN have presented a better fit in comparison to classic statistical models such as the multiple linear regression model (Burger et al., 2001; Law, 1998, 2000a, b; Law & Au, 1999; Pattie & Snyder, 1996; Uysal & El Roubi, 1999), ARIMA models (Burger et al., 2001; Cho, 2003; Law, 2000a; Law & Au, 1999; Pattie & Snyder, 1996) and the single exponential smoothing model (Burger et al., 2001; Cho, 2003; Law, 2000b; Law & Au, 1999; Pattie & Snyder, 1996).

As for their limitations and the criticisms levelled against them, ANN lack a theoretical background and a systematic procedure for model building, in contrast to classic approximations such as the Box–Jenkins methodology (Box & Jenkins, 1976). As a consequence, the model building stage involves the experimental selection of a large number of parameters through trial and error. The use of classic statistical procedures to determine the parameters of a neural network in forecasting time series can help overcome this limitation. Thus, Hansen et al. (1999) suggest using time series methods based on interpreting correlograms and periodograms to determine the number of lagged terms that will serve as input variables in the neural network. The field of statistics and numerical analysis has generated a set of non-linear optimisation algorithms (Bertsekas & Tsitsiklis, 1996) which estimate the weights of the neural network more quickly and efficiently than the back propagation algorithm, without the need to use parameters such as the learning rate and the momentum factor. Furthermore, a systematic methodology and a series of practical rules that together guarantee obtaining a network model fitted to reality have been provided throughout this paper. Nevertheless, the most criticised aspect of applying ANN is the study of the effect or significance of the input variables on an MLP network, as the values of the parameters obtained by the network do not have a practical interpretation, in contrast to classical statistical models. As a result, ANN have been presented to users as a kind of ‘black box’ on the basis of which it is not possible to analyse the role played by each input variable in the forecasting carried out. In recent years, various methods to interpret learning by an MLP network have been proposed (Montaño & Palmer, 2003). Most of these procedures fall under the generic

name of sensitivity analysis and have been applied in a wide range of fields of knowledge. In the tourism field, Tsaur et al. (2002) applied sensitivity analysis to an MLP network to establish a ranking of importance of the different service attributes used in forecasting guest loyalty to international tourist hotels.

The time series corresponding to tourism expenditure in the Balearic Islands used in this paper is well suited to illustrating the application of ANN to tourism forecasting, as it meets a twofold prerequisite. From the applied point of view, the Balearic Islands is one of the most relevant destinations in the international tourism field. From the methodological point of view, this time series is representative of the time series typically found in the field of tourism because it has both a linear trend and seasonality. Tourism is undoubtedly one of the activities with the highest growth rate in recent decades (Palmer & Riera, 2003). This is reflected in the ascending linear trend which is easily observable in most of the tourism time series which habitually measure tourism demand or tourism expenditure in a given destination (Burger et al., 2001; Law, 1998, 2000a, b; Law & Au, 1999; Uysal & El Roubi, 1999; Wang, 2004). As for seasonality, it is an inherent characteristic of the tourism industry and is present in many international tourism destinations (Burger et al., 2001; Cho, 2003; Pattie & Snyder, 1996; Uysal & El Roubi, 1999).

As for the results obtained in our research, in the first place it has been shown that MLP networks provide more accurate forecasts when the time series involved has been detrended and deseasonalised. A review of the literature shows that authors do not agree on the need to eliminate deterministic components in the time series when ANN are applied. Some authors suggest that neural networks can effectively fit linear trend and seasonality on the basis of their capacity to model any arbitrary function (Franses & Draisma, 1995; Gorr, 1994; Kang, 1991; Marseguerra, Minoggio, Rossi & Zio, 1992; Tang, Almeida & Fishwick, 1991). Other authors maintain that despite being universal function approximators, neural networks can benefit from the prior elimination of trend and seasonality on the assumption that the model can thus focus on learning more complex behaviours (Chakraborty, Mehrotra, Mohan & Ranka, 1992; Jurik, 1992; Kolarik & Rudorfer, 1994; Nelson, Hill, Remus & O’Connor, 1999; Pattie & Snyder, 1996). Our results seem to support this last group of researchers. One possible explanation of our findings is that adequate modelling of a time series with trend and seasonality requires the use of an MLP network with a large number of hidden neurons. Taking into account that the amount of data available in the tourism field is normally very limited, a network model with too many parameters will most probably cause overfitting. The most effective solution consists of first eliminating the deterministic components of the time series and then fitting a simple MLP network that concentrates on learning the non-deterministic or chaotic components of the data.

In the second place, it has been shown that expanding the forecast horizon does not lead to a noticeable decrease in the forecasting accuracy of the MLP networks when data have been duly preprocessed. This result suggests that ANN can be of great use in those cases in which carrying out long-term forecasting is desired and coincides with the results obtained by Pattie and Snyder (1996) and Burger et al. (2001) in long-term forecasting of demand in several tourism destinations.

Lastly, it has been shown that the MLP network selected in the validation stage provides highly accurate forecasting with test data. The 3.44% MAPE value obtained through this architecture is comparable to those obtained by MLP networks designed in the field of tourism forecasting by Pattie and Snyder (1996) (2.69%), Law (1998) (3.10%), Uysal and El Roubi (1999) (3.23%), Law (2000a) (2.76%), Law (2000b) (7.17%) and Burger et al. (2001) (5.07%).

This set of results indicates that ANN are effective and flexible instruments for researchers interested in forecasting the behaviours which occur in the field of tourism. This paper offers a series of contributions to knowledge in the field of tourism research. In the first place, no didactic explanations exist in available literature on how to apply a neural network model to the field of tourism research, as opposed to other statistical models such as structural equation modelling (Reisinger & Turner, 1999), cluster analysis (Jurowski & Reich, 2000), logistic regression analysis (Mitchell, 2001) or time series analysis (Lim & McAleer, 2002). This paper offers a practical guide that makes good use of ANN possible. In the second place, the study shows that the use of preprocessed data substantially improves the ANN’s performance compared to the use of raw data, measured by different numerical indices. In general, there are no indications that this work procedure has been used in the field of tourism. In the third place, it has been demonstrated that the ANN’s forecasting accuracy does not decrease when the forecasting horizon is increased from a one-step-ahead to a two-step-ahead forecast.

Future lines of research should be aimed at overcoming the two limitations involved in ANN that have been mentioned, i.e., the selection of MLP network parameters and the study of the effect or significance of the input variables on the forecast carried out. This paper indicates some possible directions to be taken. It is also necessary to apply ANN to other tourism databases to ascertain the degree of generalisation of the results obtained in this paper. Lastly, a comparison in terms of forecasting accuracy between ANN and classic statistical models will permit ascertaining the conditions in
which using ANN in forecasting tourism data is preferable.

Acknowledgements

This research was partially supported by Grant BSO2001-0369 from the Ministry of Science and Technology, Spain. We also wish to thank the anonymous reviewers and the editor for their helpful comments and suggestions.

References

Baum, E. B., & Haussler, D. (1989). What size net gives valid generalization? Neural Computation, 1(1), 151–160.
Bertsekas, D. P., & Tsitsiklis, J. N. (1996). Neuro-dynamic programming. Belmont, MA: Athena Scientific.
Bishop, C. M. (1995). Neural networks for pattern recognition. Oxford: Oxford University Press.
Bloom, J. Z. (2002). The sequencing of neural networks for segmenting the market of a tourist destination. Tourism, 50(4), 325–338.
Bloom, J. Z. (2004). Tourist market segmentation with linear and non-linear techniques. Tourism Management, 25, 723–733.
Bloom, J. Z. (2005). Market segmentation: A neural network application. Annals of Tourism Research, 32(1), 93–111.
Box, G. E. P., & Jenkins, G. M. (1976). Time series analysis: Forecasting and control. San Francisco: Holden Day.
Bull, A. (1995). The economics of travel and tourism. Melbourne: Longman.
Burger, C., Dohnal, M., Kathrada, M., & Law, R. (2001). A practitioners guide to time-series methods for tourism demand forecasting: A case study of Durban, South Africa. Tourism Management, 22, 403–409.
Chakraborty, K., Mehrotra, K., Mohan, C. K., & Ranka, S. (1992). Forecasting the behavior of multivariate time series using neural networks. Neural Networks, 5, 961–970.
Chandra, S., & Menezes, D. (2001). Applications of multivariate analysis in international tourism research: The marketing strategy perspective of NTOs. Journal of Economic and Social Research, 3(1), 77–98.
Cho, V. (2003). A comparison of three different approaches to tourist arrival forecasting. Tourism Management, 24, 323–330.
Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Mathematical Control, Signal and Systems, 2, 303–314.
Dolnicar, S., & Fluker, M. (2003). Behavioural market segments among surf tourists: Investigating past destination choice. Journal of Sport Tourism, 8(3), 186–196.
Fahlman, S. E. (1988). Faster-learning variations on back-propagation: An empirical study. In: Touretsky, D., Hinton, G. E., Sejnowski, T. J. (Eds.), Proceedings of the 1988 connectionist models summer school. (pp. 38–51). San Mateo: Morgan Kaufmann.
Fausett, L. (1994). Fundamentals of neural networks. NJ: Prentice-Hall.
Franses, P. H., & Draisma, G. (1995). Recognizing changing seasonal patterns using artificial neural networks. Working paper, Economic Institute, Erasmus University Rotterdam.
Funahashi, K. (1989). On the approximate realization of continuous mappings by neural networks. Neural Networks, 2, 183–192.
Ganglmair, A., & Wooliscroft, B. (2000). K-means vs. topology representing networks: Comparing ease of use for gaining optimal results with reference to data input order. Tourism Analysis, 5, 157–162.
Gorr, W. L. (1994). Research prospective on neural network forecasting. International Journal of Forecasting, 10, 1–4.
Hansen, J. V., McDonald, J. B., & Nelson, R. D. (1999). Time series prediction with genetic-algorithm designed neural networks: An empirical comparison with modern statistical models. Computational Intelligence, 15(3), 171–184.
Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2(5), 359–366.
Ivars, J. A. (2003). Regional development policies: An assessment of their evolution and effects on the Spanish tourist model. Tourism Management, 24, 655–663.
Jeng, J. M., & Fesenmaier, D. R. (1996). A neural network approach to discrete choice modeling. Journal of Travel and Tourism Marketing, 5, 119–144.
Jurik, M. (1992). Trading techniques: The care and feeding of a neural network. Futures: the Magazine of Commodities & Options, 21(12), 40–44.
Jurowski, C., & Reich, A. Z. (2000). An explanation and illustration of cluster analysis for identifying hospitality market segments. Journal of Hospitality and Tourism Research, 24(1), 67–91.
Kaastra, I., & Boyd, M. (1996). Designing a neural network for forecasting financial and economic time series. Neurocomputing, 10, 215–236.
Kang, S. (1991). An investigation of the use of feedforward neural networks for forecasting. Unpublished Doctoral thesis, Kent State University.
Kim, J., Wei, S., & Ruys, H. (2003). Segmenting the market of West Australian senior tourists using an artificial neural network. Tourism Management, 24, 25–34.
Kolarik, T., & Rudorfer, G. (1994). Time series forecasting using neural networks. APL Quote Quad, 25(1), 86–94.
Law, R. (1998). Room occupancy rate forecasting: A neural network approach. International Journal of Contemporary Hospitality Management, 10(6), 234–239.
Law, R. (2000a). Back-propagation learning in improving the accuracy of neural network-based tourism demand forecasting. Tourism Management, 21, 331–340.
Law, R. (2000b). Demand for hotel spending by visitors to Hong Kong: A study of various forecasting techniques. Journal of Hospitality & Leisure Marketing, 6(4), 17–29.
Law, R. (2001). The impact of the Asian financial crisis on Japanese demand for travel to Hong Kong: a study of various forecasting techniques. Journal of Travel & Tourism Marketing, 10(2–3), 47–66.
Law, R., & Au, N. (1999). A neural network model to forecast Japanese demand for travel to Hong Kong. Tourism Management, 20, 89–97.
Lim, C., & McAleer, M. (2002). Time series forecasts of international travel demand for Australia. Tourism Management, 23, 389–396.
Marseguerra, M., Minoggio, S., Rossi, A., & Zio, E. (1992). Neural networks prediction and fault diagnosis applied to stationary and non-stationary ARMA modeled time series. Progress in Nuclear Energy, 27(1), 25–36.
Masters, T. (1993). Practical neural networks recipes in C++. London: Academic Press.
Masters, T. (1995). Advanced algorithms for neural networks: A C++ sourcebook. New York: Wiley.
Mazanec, J. A. (1992). Classifying tourists into market segments: A neural network approach. Journal of Travel and Tourism Marketing, 1(1), 39–59.
Mazanec, J. A. (1995). Positioning analysis with self-organizing maps: A exploratory study on luxury hotels. Cornell Hotel and Restaurant Administration Quarterly, 36(6), 80–95.
Mazanec, J. A. (1999). Simultaneous positioning and segmentation analysis with topologically ordered feature maps: A tour operator example. Journal of Retailing and Consumer Services, 6, 219–235.
Mitchell, R. E. (2001). Predictability of hunting: A logistic regression analysis of western Canadian hunters. Crossing Boundaries, 1(1), 107–117.
Mohsin, A., & Ryan, C. (1999). Perceptions of the northern territory by travel agents in Kuala Lumpur. Asia Pacific Journal of Tourism Research, 3(2), 41–46.
Montaño, J. J., & Palmer, A. (2003). Numeric sensitivity analysis applied to feedforward neural networks. Neural Computing & Applications, 12, 119–125.
Morley, C. (2000). Demand modelling methodologies: Integration and other issues. Tourism Economics, 6(1), 5–19.
Nelson, M., Hill, T., Remus, W., & O’Connor, M. (1999). Time series forecasting using neural networks: Should the data be deseasonalized first? Journal of Forecasting, 18, 359–367.
Palmer, T., & Riera, A. (2003). Tourism and environmental taxes. With special reference to the “Balearic ecotax”. Tourism Management, 24, 665–674.
Pattie, D. C., & Snyder, J. (1996). Using a neural network to forecast visitor behavior. Annals of Tourism Research, 23(1), 151–164.
Reisinger, Y., & Turner, L. (1999). Structural equation modeling with Lisrel: Application in tourism. Tourism Management, 20, 71–88.
Ripley, B. D. (1996). Pattern recognition and neural networks. Cambridge: Cambridge University Press.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning internal representations by error propagation. In D. E. Rumelhart, & J. L. McClelland (Eds.), Parallel distributed processing (pp. 318–362). Cambridge, MA: MIT Press.
Rzempoluck, E. J. (1998). Neural network data analysis using Simulnet. New York: Springer.
Sarle, W. S. (2002). Neural network FAQ. Retrieved 11 February 2003, from ftp://ftp.sas.com/pub/neural/FAQ.html.
Sheldon, P. J., & Var, T. (1985). Tourism forecasting: A review of empirical research. Journal of Forecasting, 4(2), 183–195.
Sirakaya, E., Delen, D., & Choi, H. S. (2005). Forecasting gaming referenda. Annals of Tourism Research, 32(1), 127–149.
Smith, M. (1993). Neural networks for statistical modeling. New York: Van Nostrand Reinhold.
SPSS Inc. (1998). Neural Connection 2.1. Chicago: SPSS Inc.
Tang, Z., Almeida, C., & Fishwick, P. A. (1991). Time series forecasting using neural networks vs. Box–Jenkins methodology. Simulation, 57(5), 303–310.
Tang, Z., & Fishwick, P. A. (1993). Feedforward neural nets as models for time series forecasting. ORSA Journal on Computing, 5(4), 374–385.
Tsaur, S. H., Chiu, Y. C., & Huang, C. H. (2002). Determinants of guest loyalty to international tourist hotels: A neural network approach. Tourism Management, 23, 397–405.
Uysal, M., & El Roubi, M. S. (1999). Artificial neural networks versus multiple regression in tourism demand analysis. Journal of Travel Research, 38, 111–118.
Wallace, M., Maglogiannis, I., Karpouzis, K., Kormentzas, G., & Kollias, S. (2003). Intelligent one-stop-shop travel recommendations using an adaptive neural network and clustering of history. Information Technology & Tourism, 6(3), 181–194.
Wang, C. H. (2004). Predicting tourism demand using fuzzy time series and hybrid grey theory. Tourism Management, 25, 367–374.
Wasserman, P. D. (1989). Neural computing: Theory and practice. New York: Van Nostrand Reinhold.
West, P. M., Brockett, P. L., & Golden, L. L. (1997). A comparative analysis of neural networks and statistical methods for predicting consumer choice. Marketing Science, 16(4), 370–391.
Witt, S. F., & Witt, C. A. (1992). Modeling and forecasting demand in tourism. London: Academic Press.
Witt, S. F., & Witt, C. A. (1995). Forecasting tourism demand: A review of empirical research. International Journal of Forecasting, 11(3), 447–475.
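
As a supplementary illustration of the discussion, the following is a minimal sketch of the preprocessing idea it argues for: the linear trend and the seasonality are removed first, so that a small MLP would only need to learn the remaining residual, and accuracy is then summarised with MAPE. Everything here is an assumption made for illustration and not the procedure used in the paper: the series is synthetic (not the Balearic expenditure data), the trend is fitted by ordinary least squares, seasonality is treated as additive with a known period, and the function names `mape`, `fit_trend_and_season` and `deterministic_part` are hypothetical. The MLP itself is omitted and replaced by a zero forecast of the residual.

```python
import numpy as np


def mape(actual, forecast):
    """Mean absolute percentage error, expressed in percent."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs((actual - forecast) / actual)) * 100.0)


def fit_trend_and_season(y, period):
    """Estimate a linear trend (least squares) and additive seasonal offsets."""
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y))
    slope, intercept = np.polyfit(t, y, 1)      # degree-1 fit: [slope, intercept]
    detrended = y - (intercept + slope * t)
    # Average the detrended values that share the same position in the cycle.
    seasonal = np.array([detrended[k::period].mean() for k in range(period)])
    return slope, intercept, seasonal


def deterministic_part(t, slope, intercept, seasonal):
    """Trend plus seasonal offset at the integer time indices t."""
    t = np.asarray(t)
    return intercept + slope * t + seasonal[t % len(seasonal)]


# Synthetic quarterly series (assumption): upward trend + seasonality + noise.
rng = np.random.default_rng(0)
period = 4
t_all = np.arange(48)
series = 100 + 2 * t_all + np.tile([-10.0, 5.0, 20.0, -15.0], 12) + rng.normal(0, 1, 48)

n_train = 40
train, test = series[:n_train], series[n_train:]

slope, intercept, seasonal = fit_trend_and_season(train, period)

# What remains after removing the deterministic components; this residual is
# the part a simple MLP would be trained on.
residual_train = train - deterministic_part(np.arange(n_train), slope, intercept, seasonal)

# Placeholder for the MLP output (assumption): forecast a zero residual.
residual_forecast = np.zeros(len(test))

# Add the deterministic components back to obtain forecasts on the original scale.
t_test = np.arange(n_train, len(series))
forecast = deterministic_part(t_test, slope, intercept, seasonal) + residual_forecast

test_mape = mape(test, forecast)
print(f"Test MAPE: {test_mape:.2f}%")
```

In a fuller version, the zero residual forecast would be replaced by the output of an MLP trained on lagged values of `residual_train`, and the same `mape` function could be evaluated at one-step and two-step horizons to compare accuracy across forecast horizons.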