time series

Alfonso Palmer, Juan José Montaño, Albert Sesé

Facultad de Psicología, Universidad de las Islas Baleares, Ctra. Valldemossa Km. 7,5, 07122 Palma de Mallorca, Spain

Received 23 July 2003; accepted 19 May 2005

Abstract

This paper aims to provide, on one hand, an introduction to the theoretical principles of artificial neural networks (ANN) and on the other, a step-by-step methodology for designing a neural network for tourism time series forecasting. The time series corresponding to tourism expenditure in the Balearic Islands (Spain), one of the world's major tourist destinations, has been used as data to illustrate this process. A number of practical rules and discussion points between authors, comprehensible to both academic researchers and practitioners, have been included throughout the paper so that ANN can be applied successfully. Lastly, the results obtained in this study provide information for researchers interested in applying ANN to tourism data forecasting.

© 2005 Elsevier Ltd. All rights reserved.

Due to the perishable nature of the tourism industry, the need to devise accurate forecasts has become crucial (Chandra & Menezes, 2001; Law, 2000a; Law & Au, 1999). Thus, researchers, practitioners and policy makers have openly acknowledged the need for accurate forecasts in the field of tourism (Sheldon & Var, 1985). In the case of tourism demand, better forecasts would help directors and investors make operational, tactical and strategic decisions, examples of which are scheduling and staffing, preparing tour brochures and hotel investments, respectively. Similarly, government bodies need accurate tourism demand forecasts to plan required tourism infrastructures, such as accommodation site planning and transportation development, among other needs. Forecasting tourism expenditure is [...] tourism to production, income and employment in tourist destinations (Bull, 1995). Thus, the large quantity of academic literature which has been generated in this area is not surprising (Morley, 2000).

Despite the consensus on the need to develop accurate forecasts and the recognition of their corresponding benefits, there is no one model that stands out in terms of forecasting accuracy (Law & Au, 1999; Witt & Witt, 1995). In this respect, one of the most widely used procedures in time series forecasting is the Box–Jenkins methodology (Box & Jenkins, 1976), which is based on the fit of a special type of linear statistical model known as Autoregressive Integrated Moving Average (ARIMA). One problem that makes developing and implementing this type of time series model difficult is that the model must be formally specified and a probability distribution for data must be assumed (Hansen, McDonald & Nelson, 1999).

A. Palmer et al. / Tourism Management 27 (2006) 781–790

[...] is that ANN are universal function approximators capable of mapping any linear or non-linear function (Cybenko, 1989; Funahashi, 1989; Hornik, Stinchcombe & White, 1989; Wasserman, 1989). Due to their flexibility in function approximation, ANN are powerful methods in tasks involving pattern classification, estimating continuous variables and forecasting (Kaastra & Boyd, 1996). In the last case, neural networks offer several potential advantages over alternative methods (mainly the ARIMA time series models) when handling problems with non-normal and non-linear data (Hansen et al., 1999). The first advantage is that ANN are very versatile and do not require formal specification of the model nor acceptance of a determined probability distribution for data. As for the second advantage, Masters (1995) demonstrates that ANN are capable of tolerating the presence of chaotic components better than most alternative methods. This capacity is particularly important, as many relevant time series possess significant chaotic components.

ANN have been applied in the many fields mentioned above and have been a pioneer in the field of tourism data analysis (Chandra & Menezes, 2001). Thus, neural network models have recently been used as a statistical technique in the main fields of tourism research, such as demand and consumer behaviour forecasting (Burger, Dohnal, Kathrada & Law, 2001; Cho, 2003; Jeng & Fesenmaier, 1996; Law, 1998, 2000a, b, 2001; Law & Au, 1999; Mohsin & Ryan, 1999; Pattie & Snyder, 1996; Sirakaya, Delen & Choi, 2005; Tsaur, Chiu & Huang, 2002; Uysal & El Roubi, 1999; Wang, 2004), market segmentation and positioning analysis (Bloom, 2002, 2004, 2005; Dolnicar & Fluker, 2003; Ganglmair & Wooliscroft, 2000; Kim, Wei & Ruys, 2003; Mazanec, 1992, 1995, 1999; Wallace, Maglogiannis, Karpouzis, Kormentzas & Kollias, 2003).

These studies indicate a growing interest in using ANN to represent complex, non-linear cognitive activities that are often studied in tourism management (Morley, 2000). Nevertheless, due to the relatively recent introduction of neural networks in tourism, the number of published articles which clearly explain a methodology that obtains reliable tourism data modelling through neural networks continues to be limited in comparison with other statistical methods (Uysal & El Roubi, 1999). On the other hand, due to their flexibility, neural networks lack a systematic procedure for model building. Therefore, obtaining a reliable neural model involves selecting a large number of parameters experimentally through trial and error.

Thus, this paper aims to provide both an introduction to the main principles of ANN and a step-by-step methodology for designing a neural network for tourism time series forecasting. Several practical rules and discussion points between authors, comprehensible both to academic researchers as well as practitioners, have been included throughout the paper for successful application. Lastly, the results obtained in this study provide valuable information on applying ANN in tourism data forecasting.

The time series corresponding to tourism expenditure in the Balearic Islands, one of Spain's most internationally relevant tourism destinations, shall be used as data to illustrate the application of this technique. Spain is the world's second major tourism destination, with a total of 49.5 million tourists in 2001 (Ivars, 2003). The Balearic Islands, which covers a surface area of 4968.36 km² and had 878,627 inhabitants in 2001, is considered an exceptional tourism centre, one of the top-ranking regions in the world in terms of affluence, with a volume of 8.4 million arrivals in 2001, a 1.2% share of the total world market. At the same time, the significance of tourism to its economy can be determined by observing the Balearic Islands' own statistics: an estimated €5096.2 million of revenue generated in 2001. Additionally, tourism added value (TVA) amounted to 21.2% of the Balearic GDP in 1997, while the value added of tourism industries (VATI) was estimated at 31.2% (Palmer & Riera, 2003).

Having discussed the role that ANN can play in tourism time series forecasting and the objectives of this research, the remaining sections of this paper are organised as follows: Section 2 briefly introduces the basic foundations of ANN and describes artificial neuron operations. Section 3 presents the neural network model most widely used in time series forecasting: the multi-layer perceptron (MLP). Section 4 describes a step-by-step methodology for applying a neural network to tourism expenditure forecasting in the Balearic Islands. The paper's final section concludes with a discussion of the results obtained and suggestions for future lines of research.

2. The artificial neuron

ANN can be defined as information processing systems whose structure and functioning are inspired by biological neural networks. They have three fundamental features (parallel processing, distributed memory and adaptability) that provide them with a series of advantages compared to other processing systems, such as robustness and a tolerance to error and noise.

In general, an ANN is made up of a large number of simple processing elements known as nodes or neurons, which are organised in layers. Each neuron is connected to other neurons by communication links, each of which has an associated numerical value known as a "weight". Weights contain the knowledge or information that ANN possess about a specific problem.

The task of an artificial neuron j is simple and consists of receiving input signals (xi) (weighted by connection weights (wij)) from neighbouring neurons. The sum of


these weighted signals provides the neuron's total or net input (netj). Then, the activation threshold of neuron j, represented by a positive or negative θj value, is added to net input and through applying a mathematical function f(·) (generally non-linear and known as an activation function) to net input, output value yj is computed and sent to other neurons (see Fig. 1).

3. The MLP

MLP (also known as a feedforward neural network) is the neural network model most widely used in applied work (Sarle, 2002), because it is capable of resolving a wide variety of problems. More importantly for the purposes of this paper, the MLP network is also the type of neural network most commonly used in time series forecasting (Bishop, 1995; Kaastra & Boyd, 1996).

An MLP network is made up of an input layer, an output layer and one or more hidden layers of neurons. In this type of architecture, data are always transmitted from the input layer to the output layer. From a statistical point of view, each input neuron represents one of the independent variables, while the output neuron(s) represent dependent variable(s) or MLP network forecasts.

Two stages may be considered in the MLP network: the running stage, in which an input pattern is presented to the trained network and transmitted through successive layers of neurons until reaching an output, and the training or learning stage in which the weights or parameters of the network are iteratively modified on the basis of a set of input–output patterns known as a training set, in order to minimise the deviance or error between the output obtained by the network and the user's desired output. This is why MLP network learning is said to be supervised. The learning rule commonly used in this type of network is the back propagation algorithm or gradient descent method, developed and disseminated by Rumelhart, Hinton and Williams (1986).

Univariate time series modelling is normally carried out through this neural network by using a given number of the time series' lagged terms as inputs and forecasts as output (Bishop, 1995). The number of input neurons determines the number of prior time points to be used in each forecast, while the number of output neurons determines the forecast horizon. Thus, a one-step-ahead forecast can be performed through a neural network with one output unit and a k-step-ahead forecast can be performed with k output units. Fig. 2 illustrates an MLP forecasting model in which four past terms in the series are used to forecast the value of the series in a one-step-ahead forecast.

4. Designing an ANN model for tourism time series forecasting

Resolving a forecasting task through an MLP network means applying a methodology that not only has aspects in common with conventional statistical modelling techniques, but also has other singular aspects that can only be found in the ANN field. The application procedure is described below.

4.1. Data preprocessing

This research worked with data on tourism expenditure in the Balearic Islands (in millions of pesetas) from each quarter of 1986 through to the year 2000. Therefore, this time series is composed of 60 time points. The data were provided by the Ministry of the Economy. Fig. 3(a) shows the time series which was used. In the first place, a marked seasonality can be seen in which the first and fourth quarters (October–March) make up the 'low season' and coincide with autumn and winter, and the second and third quarters (from April to

Fig. 1. General functioning of an artificial neuron. Neuron j receives the inputs x_1, ..., x_N from neurons i, weighted by w_{1j}, ..., w_{Nj}, and computes

net_j = \sum_{i=1}^{N} w_{ij} x_i + \theta_j, \qquad y_j = f(net_j).

Fig. 2. MLP network for one-step-ahead forecasting based on four lagged terms. Input layer: t_{n-4}, t_{n-3}, t_{n-2}, t_{n-1}; hidden layer; output layer: t_n.


[Fig. 3. Raw data: (a) time series, millions of pesetas by year (1986–2000); (b) histogram, frequency by millions of pesetas. Preprocessed data: (c) time series, difference by year (1987–2000); (d) histogram, frequency by difference.]
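The preprocessing behind panels (c) and (d) of Fig. 3 (natural log, then subtracting the same quarter of the previous year, then linear scaling to [-1, 1]) can be sketched as follows. The function names are hypothetical, and min-max scaling is an assumption: the paper cites the linear transformation of Masters (1993) without giving its formula.

```python
import math


def log_seasonal_difference(series, period=4):
    """Take natural logs, then subtract the value `period` steps earlier.

    With quarterly data and period=4 each quarter is compared with the
    same quarter of the previous year, so the first year is lost.
    """
    logs = [math.log(v) for v in series]
    return [logs[t] - logs[t - period] for t in range(period, len(logs))]


def scale_linear(values, low=-1.0, high=1.0):
    """Min-max scaling to [low, high] (assumed form of the linear transformation)."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        raise ValueError("cannot scale a constant series")
    return [low + (high - low) * (v - vmin) / (vmax - vmin) for v in values]
```

Applied to the 60 quarterly observations, `log_seasonal_difference` returns 56 values, which is consistent with panel (c) starting in 1987.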

September) make up the 'high season' and coincide with spring and summer, as tourists visiting the Islands are mainly attracted by the sun and sand offer. In the second place, the time series shows a slightly rising trend from approximately 1992, which indicates the growth that tourism expenditure in the Balearic Islands has experienced in recent years. Fig. 3(b) shows the histogram of data distribution in which a marked asymmetry to the right can be seen.

Preprocessing data refers to analysing and transforming input and output variables in order to detect trends, minimise noise, underline important relationships and flatten the variable's distribution. These analyses and transformations help the model learn relevant patterns. Logarithmic transformation (natural log) and differencing are the two preprocessing techniques most commonly used in both traditional and neural networks forecasting (Kaastra & Boyd, 1996). Logarithmic transformation is useful in correcting the asymmetry to the right present in data distribution. This transformation also allows multiplicative relations to be converted into additive relations, which simplifies and improves data modelling (Masters, 1993). Differencing or the use of changes in a variable can be used to eliminate both linear trend as well as seasonality in a set of data (Masters, 1993).

Lastly, when applying MLP networks it is recommended, although not strictly required, that the variables' range of values be limited in the work interval of the activation function used in the hidden and output layers of neural networks (Masters, 1993; Sarle, 2002). This measure considerably accelerates weight learning and avoids saturation or overflow of the hidden and output neurons whose activation values generally fall between the [0, 1] or [-1, 1] interval.

In order to analyse the effect that the data preprocessing has on MLP networks performance, two sets of data were used. The first set (which shall be called raw


data) is made up of the time series data limited by the [-1, 1] interval through the linear transformation proposed by Masters (1993). The second set (which shall be called preprocessed data) is made up of the time series data to which two consecutive transformations have been applied with the twofold aim of preprocessing the data and eliminating the two deterministic components identified above, linear trend and seasonality. Thus, the logarithmic transformation was applied and then a differencing step that consists in subtracting the value of the previous year's quarter from the value of each quarter (e.g., the 3rd quarter of 1998 minus the 3rd quarter of 1997) was applied. Fig. 3(c) shows the preprocessed time series from which linear trend and seasonality have been eliminated, while Fig. 3(d) shows the histogram which reflects a more symmetrical distribution after preprocessing. Finally, the differenced values were limited by the [-1, 1] interval through linear transformation (Masters, 1993).

Next, different data matrices were generated for both sets of data on the basis of the number of lagged terms (from 1 to 8) that would act as input and the number of step-ahead forecasts (1 and 2) that would act as the neural network's output. To illustrate, Fig. 4 shows the structure of the data matrix corresponding to the MLP network in Fig. 2, in which each time point is to be forecast on the basis of the four previous time points.

4.2. Creating training, validation and test sets

In ANN methodology, data samples are frequently subdivided into three sets (training, validation and test sets) in order to obtain a network which is capable of generalising and performing well with new cases (Bishop, 1995; Ripley, 1996).

During the network's learning stage, the weights are iteratively modified on the basis of the training set's values, in order to minimise the error between the network output and the user's desired output. Nevertheless, an excessive number of parameters or weights in relation to the problem at hand and to the number of training data may lead to overfitting. This phenomenon occurs when the model fits the irrelevant features present in training data too closely instead of fitting the underlying function which relates inputs and outputs, losing its capacity to generalise learning to new cases (Baum & Haussler, 1989).

To avoid the problem of overfitting, it is recommended to use a second set of data, the validation set, which permits the learning process to be controlled. During learning, the network modifies weights on the basis of the training data and, alternately, the network error made with validation data is obtained. Thus, the optimum number of weights can be ascertained on the basis of the architecture that has performed best with the validation data. The value of other parameters that play a part in network learning can also be determined through the validation set, as can be seen below.

Lastly, if the final effectiveness of the system built is to be measured in a completely objective manner, the error committed with validation data should not be used as a basis, as to some degree, these data have participated in the training process. A third set of independent data must be used, the test set, which provides an unbiased estimate of the generalisation error.

There is no precise rule on the optimum size of the three sets of data, although authors agree that the training set must be the largest (Kaastra & Boyd, 1996; West, Brockett & Golden, 1997). In this research, data from the years between 1986 and 1996 were used as the training set (73.3%), data from 1997 and 1998 were used as the validation set (13.3%) and data from 1999 and 2000 were used as the test set (13.3%).

4.3. ANN model building

The following is practical advice on four sets of parameters involved in creating an MLP network through the back propagation learning rule: network architecture, learning rate and momentum factor, activation function of the hidden and output layers and number of iterations.

As for the MLP network architecture, it is known that using one sole hidden layer of neurons will be sufficient for most practical problems (Funahashi, 1989; Hornik et al., 1989). The number of hidden neurons determines the MLP network's capacity to learn. Despite its importance, there is no rule that indicates the optimum number of hidden neurons for any given problem.
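The chronological split reported in Section 4.2 (1986 to 1996 for training, 1997 and 1998 for validation, 1999 and 2000 for testing) can be sketched as below. The function name and its parameters are hypothetical, and the sketch assumes one value per quarter with no missing observations.

```python
def chronological_split(series, first_year=1986, per_year=4,
                        train_end=1996, val_end=1998):
    """Split a series in time order into training, validation and test sets.

    The order of the observations is preserved, so the validation and test
    sets always contain the most recent data.
    """
    n_train = (train_end - first_year + 1) * per_year
    n_val = (val_end - train_end) * per_year
    train = series[:n_train]
    val = series[n_train:n_train + n_val]
    test = series[n_train + n_val:]
    return train, val, test
```

For the 60 quarterly points this gives 44, 8 and 8 observations, matching the 73.3%, 13.3% and 13.3% shares quoted in the text.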

Bearing in mind the overfitting problem, selecting the network which performs best with the validation set using the least possible number of hidden neurons (Masters, 1993; Smith, 1993; Rzempoluck, 1998) is most recommended. The input layer is in charge of receiving a determined number of the time series' lagged terms to carry out the forecast. Normally, the selection of lagged terms is determined experimentally through trial and error, on the basis of the minimum set of terms that obtains the least error with the validation set. Two alternatives to the experimental selection of input

Inputs                  Outputs
t1   t2   t3   t4       t5
t2   t3   t4   t5       t6
t3   t4   t5   t6       t7
...
t56  t57  t58  t59      t60

Fig. 4. Structure of the data matrix corresponding to the MLP network in Fig. 2.
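A data matrix with the structure of Fig. 4 can be built with a sliding window over the series. The sketch below is illustrative; the function name and its arguments are hypothetical, with `n_lags` standing for the number of input neurons and `horizon` for the number of output neurons.

```python
def lag_matrix(series, n_lags=4, horizon=1):
    """Return a list of (inputs, outputs) rows for an MLP forecaster.

    Each row pairs n_lags consecutive values with the `horizon` values
    that immediately follow them.
    """
    rows = []
    for start in range(len(series) - n_lags - horizon + 1):
        inputs = series[start:start + n_lags]
        outputs = series[start + n_lags:start + n_lags + horizon]
        rows.append((inputs, outputs))
    return rows
```

With 60 time points, four lags and a one-step horizon this yields 56 rows, the first pairing t1 to t4 with t5 and the last pairing t56 to t59 with t60, as in Fig. 4.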


variables have been suggested. First, applying a sensitivity analysis to the network model provides information on the significance that each input has on output, and as a result, the lagged terms most relevant to forecasting can be selected (Montaño & Palmer, 2003). Second, selecting the lagged terms in the MLP network can be carried out on the basis of the autoregressive or lagged terms obtained by applying the Box–Jenkins methodology (Tang & Fishwick, 1993). Lastly, the output layer determines the neural network's forecast horizon. As for the scope of the forecast horizon, authors such as Masters (1993) and Kaastra and Boyd (1996) recommend limiting the forecast to a one-step-ahead prediction, assuming that the use of broader forecast horizons has very serious repercussions on network performance. Nevertheless, Tang and Fishwick (1993) proved in a systematic study that expanding the forecast horizon does not imply a decrease in the neural network's performance.

A set of models based on the combination of different values for the number of input (from 1 to 8), hidden (from 1 to 3) and output (1 and 2) neurons were constructed for this study in order to analyse the effect that the type of architecture has on MLP networks performance and in consonance with the structure of matrices generated in the data preprocessing stage.

The learning rate value plays a crucial role in the MLP networks training process, as it controls the size of the changes in weight in each iteration. Both a too-small change in size as well as a too-large change in size must be avoided in order to obtain optimum weight configurations. A learning rate between 0.05 and 0.5 provides good results in most practical cases (Rumelhart et al., 1986). The momentum factor determines the effect of past changes in weights on current changes in weights and allows the speed of learning to be increased by filtering the oscillations caused by the learning rate. The momentum factor usually has a value of close to 1, e.g., 0.9 (Rumelhart et al., 1986). A learning rate of 0.25 and a momentum factor of 0.8 were constantly used in all the MLP networks in our study, although it was shown that similar results were obtained for a wide range of values in both parameters.

In a standard MLP network, the input layer neurons use a linear activation function, while the hidden and output layer neurons use a sigmoid activation function. In this sense, the two sigmoid functions most used are the logistic (providing continuous values between 0 and 1) and hyperbolic tangent (providing continuous values between -1 and 1) functions. In this study, the hyperbolic tangent function was used in the hidden and output layers of the MLP networks, as it considerably accelerates weight learning in comparison to the logistic function (Fahlman, 1988; Fausett, 1994). It should be recalled that the data were scaled to the interval [-1, 1] in the preprocessing stage, coherent with the activation function used in the hidden and output layers of the models built.

Lastly, MLP network training must conclude when the weights converge. This point occurs when the error committed with the training data stops decreasing. In most studies, the number of training iterations used until convergence was reached oscillated between 100 and 200,000 iterations (Kaastra & Boyd, 1996). A total of 10,000 iterations in each training was sufficient in this study to guarantee convergence of the weights obtained.

At present there are many free computer programs and commercial computer programs that allow MLP networks to be simulated through the manipulation of all the parameters analysed in this section without the need for deep knowledge of the underlying mathematical algorithms. The MLP networks used in this paper were generated by the Neural Connection 2.1 computer program (SPSS Inc., 1998).

4.4. Evaluation and selection of ANN models

A total of 96 MLP networks were obtained on the basis of combining the following parameters: raw/preprocessed data and the number of input (from 1 to 8), hidden (from 1 to 3) and output (1 or 2) neurons. The models were evaluated with the validation data through three forecasting accuracy measures: root mean squared error (RMSE), mean absolute percentage error (MAPE) and Theil's U coefficient.

The MLP architecture which was selected, and therefore, the architecture which presented the best forecasting accuracy with the validation data, was composed of eight inputs, one hidden and one output neurons (in abbreviated form, an 8-1-1 architecture), the time series having been previously preprocessed. The values obtained through this MLP network with the validation data as regards accuracy measures were RMSE = 6270.57, MAPE = 3.32 and U-Theil = 0.016.

These three accuracy measures were applied to the 96 MLP networks using the test set as data in order to analyse the different created models' level of generalisation. Table 1 shows the results of a selection of 28 MLP networks and reflects the effect that manipulating different parameters has on performance. In the first place, one of the most striking results is that neural networks performance clearly improved in the three measures used when data were preprocessed. In this case, all the architectures, including the architecture selected in the validation stage, obtained MAPE values of less than 5 percent, which can be considered highly accurate forecasting (Witt & Witt, 1992). This implies that data preprocessing which consists of eliminating the deterministic components of the time series (linear trend and seasonality) has positive repercussions on the goodness of fit of MLP networks. In the second place, it can be seen that two-steps-ahead forecasting


Table 1
Forecasting accuracy of MLP networks with test data

Architecture             Raw data                        Preprocessed data
(input-hidden-output)    RMSE        MAPE    U-Theil     RMSE        MAPE    U-Theil
3-2-1                    28473.71    11.45   0.068       8235.75     3.72    0.018
4-2-1                    16531.10     8.87   0.039       10578.32    4.11    0.023
5-2-1                    16211.88     9.70   0.038       11842.94    4.14    0.026
6-1-1                    24292.66    17.34   0.058       9688.92     3.88    0.021
7-2-1                    22574.47    15.44   0.054       7412.12     3.40    0.016
8-1-1                    27603.08    21.07   0.065       8181.41     3.44    0.018
2-3-2                    11482.83     7.26   0.026       8982.47     3.97    0.020
3-1-2                    33268.98    18.51   0.081       6592.16     4.32    0.015
4-1-2                    45808.06    31.98   0.115       8160.88     4.60    0.018
5-2-2                    20697.03     8.04   0.049       6386.23     4.22    0.014
6-1-2                    64520.22    46.03   0.164       9781.20     4.11    0.022
7-2-2                    13863.27     8.78   0.032       7423.22     4.12    0.017
8-2-2                    18417.48     8.92   0.044       9308.34     3.69    0.021
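The three accuracy measures reported in Table 1 can be computed as in the following sketch. The function names are hypothetical. The paper does not state which variant of Theil's U it uses; the form below (RMSE divided by the sum of the root mean squares of the actual and forecast values, bounded between 0 and 1) is an assumption, chosen because it is consistent in magnitude with the reported values.

```python
import math


def rmse(actual, forecast):
    """Root mean squared error."""
    n = len(actual)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)


def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    n = len(actual)
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / n


def theil_u(actual, forecast):
    """Theil's U, assumed here to be the bounded U1 form (0 = perfect forecast)."""
    n = len(actual)
    rms_actual = math.sqrt(sum(a * a for a in actual) / n)
    rms_forecast = math.sqrt(sum(f * f for f in forecast) / n)
    return rmse(actual, forecast) / (rms_actual + rms_forecast)
```

These measures are meant to be evaluated on the original scale of the series (millions of pesetas), so forecasts made on preprocessed data would first have to be transformed back.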

obtains very similar results to those provided by one-step-ahead forecasting when data have been preprocessed. It can therefore be said that expanding the forecast horizon did not lead to a noticeable decrease in the models' performance.

5. Conclusions

This paper has presented an introduction to the theoretical principles of ANN and a step-by-step methodology to design a neural network for tourism time series forecasting.

ANN can be considered flexible, non-linear, all-purpose statistical tools, capable of learning the complex relations that occur in the social processes associated with tourism. This technology presents a series of advantages compared to classic statistical models. On one hand, ANN do not depend on meeting statistical conditions such as the type of relation between variables or the type of data distribution, for example. On the other hand, as universal function approximators, they are capable of fitting both linear and non-linear functions without the need to know the form of the underlying function a priori. Thus, in the field of tourism forecasting ANN have presented a better fit in comparison to classic statistical models such as the multiple linear regression model (Burger et al., 2001; Law, 1998, 2000a, b; Law & Au, 1999; Pattie & Snyder, 1996; Uysal & El Roubi, 1999), ARIMA models (Burger et al., 2001; Cho, 2003; Law, 2000a; Law & Au, 1999; Pattie & Snyder, 1996) and the single exponential smoothing model (Burger et al., 2001; Cho, 2003; Law, 2000b; Law & Au, 1999; Pattie & Snyder, 1996).

As for their limitations and the criticisms levelled against them, ANN lack a theoretical background and a systematic procedure for model building, in contrast to classic approximations such as the Box–Jenkins methodology (Box & Jenkins, 1976). As a consequence, the model building stage involves the experimental selection of a large number of parameters through trial and error. The use of classic statistical procedures to determine the parameters of a neural network in forecasting time series can help overcome this limitation. Thus, Hansen et al. (1999) suggest using time series methods based on interpreting correlograms and periodograms to determine the number of lagged terms that will serve as input variables in the neural network. The field of statistics and numerical analysis has generated a set of non-linear optimisation algorithms (Bertsekas & Tsitsiklis, 1996) which estimate the weights of the neural network more quickly and efficiently than the back propagation algorithm without the need to use parameters such as the learning rate and the momentum factor. Furthermore, a systematic methodology and a series of practical rules that together guarantee obtaining a network model fitted to reality have been provided throughout this paper. Nevertheless, the most criticised aspect of applying ANN is the study of the effect or significance of the input variables on an MLP network, as the values of the parameters obtained by the network do not have a practical interpretation, in contrast to classical statistical models. As a result, ANN have been presented to users as a kind of 'black box' on the basis of which it is not possible to analyse the role played by each input variable in the forecasting carried out. In recent years, various methods to interpret learning by an MLP network have been proposed (Montaño & Palmer, 2003). Most of these procedures fall under the generic


name of sensitivity analysis and have been applied in a wide range of fields of knowledge. In the tourism field, Tsaur et al. (2002) applied sensitivity analysis to an MLP network to establish a ranking of importance of the different service attributes used in forecasting guest loyalty to international tourist hotels.

The time series corresponding to tourism expenditure in the Balearic Islands used in this paper is well suited to illustrating the application of ANN to tourism forecasting, as it meets a twofold prerequisite. From the applied point of view, the Balearic Islands is one of the most relevant destinations in the international tourism field. From the methodological point of view, this time series is representative of the time series typically found in the field of tourism because it has both a linear trend and seasonality. Tourism is undoubtedly one of the activities with the highest growth rate in recent decades (Palmer & Riera, 2003). This is reflected in the ascending linear trend which is easily observable in most of the tourism time series which habitually measure tourism demand or tourism expenditure in a given destination (Burger et al., 2001; Law, 1998, 2000a, b; Law & Au, 1999; Uysal & El Roubi, 1999; Wang, 2004). As for seasonality, it is an inherent characteristic of the tourism industry and is present in many international tourism destinations (Burger et al., 2001; Cho, 2003; Pattie & Snyder, 1996; Uysal & El Roubi, 1999).

As for the results obtained in our research, in the first place it has been shown that MLP networks provide more accurate forecasts when the time series involved has been detrended and deseasonalised. A review of the literature shows that authors do not agree on the need to eliminate deterministic components in the time series when ANN are applied. Some authors suggest that neural networks can effectively fit linear trend and seasonality on the basis of their capacity to model any arbitrary function (Franses & Draisma, 1995; Gorr, 1994; Kang, 1991; Marseguerra, Minoggio, Rossi & Zio, 1992; Tang, Almeida & Fishwick, 1991). Other authors maintain that despite being universal function approximators, neural networks can benefit from the prior elimination of trend and seasonality on the assumption that the model can thus focus on learning more complex behaviours (Chakraborty, Mehrotra, Mohan & Ranka, 1992; Jurik, 1992; Kolarik & Rudorfer, 1994; Nelson, Hill, Remus & O’Connor, 1999; Pattie & Snyder, 1996). Our results seem to support this last group of researchers. One possible explanation of our findings is that adequate modelling of a time series with trend and seasonality requires the use of an MLP network with a large number of hidden neurons. Taking into account that the amount of data available in the tourism field is normally very limited, a network model with too many parameters will most probably cause overfitting. The most effective solution consists of first eliminating the deterministic components of the time series and then fitting a simple MLP network that concentrates on learning the non-deterministic or chaotic components of the data.

In the second place, it has been shown that expanding the forecast horizon does not lead to a noticeable decrease in the forecasting accuracy of the MLP networks when data have been duly preprocessed. This result suggests that ANN can be of great use in those cases in which long-term forecasting is desired, and it coincides with the results obtained by Pattie and Snyder (1996) and Burger et al. (2001) in long-term forecasting of demand in several tourism destinations.

Lastly, it has been shown that the MLP network selected in the validation stage provides highly accurate forecasting with test data. The 3.44% MAPE value obtained through this architecture is comparable to those obtained by MLP networks designed in the field of tourism forecasting by Pattie and Snyder (1996) (2.69%), Law (1998) (3.10%), Uysal and El Roubi (1999) (3.23%), Law (2000a) (2.76%), Law (2000b) (7.17%) and Burger et al. (2001) (5.07%).

This set of results indicates that ANN are effective and flexible instruments for researchers interested in forecasting the behaviours which occur in the field of tourism. This paper offers a series of contributions to knowledge in the field of tourism research. In the first place, no didactic explanations exist in the available literature on how to apply a neural network model to the field of tourism research, as opposed to other statistical models such as structural equation modelling (Reisinger & Turner, 1999), cluster analysis (Jurowski & Reich, 2000), logistic regression analysis (Mitchell, 2001) or time series analysis (Lim & McAleer, 2002). This paper offers a practical guide that makes effective use of ANN possible. In the second place, the study shows that the use of preprocessed data substantially improves the ANN’s performance compared to the use of raw data, as measured by different numerical indices. In general, there are no indications that this work procedure has been used in the field of tourism. In the third place, it has been demonstrated that the ANN’s forecasting accuracy does not decrease when the forecasting horizon is increased from a one-step-ahead to a two-step-ahead forecast.

Future lines of research should be aimed at overcoming the two limitations involved in ANN that have been mentioned, i.e., the selection of MLP network parameters and the study of the effect or significance of the input variables on the forecast carried out. This paper indicates some possible directions to be taken. It is also necessary to apply ANN to other tourism databases to ascertain the degree of generalisation of the results obtained in this paper. Lastly, a comparison in terms of forecasting accuracy between ANN and classic statistical models will permit ascertaining the conditions in


