
Decision Support Systems 54 (2013) 1404–1416


Decision Support Systems


journal homepage: www.elsevier.com/locate/dss

A demand forecast model using a combination of surrogate data analysis and optimal
neural network approach
H.C.W. Lau a,⁎, G.T.S. Ho b, Yi Zhao b
a CInIS & School of Management, University of Western Sydney, Australia
b Department of Industrial and System Engineering, The Hong Kong Polytechnic University, Kowloon, Hong Kong

Article info

Article history:
Received 2 June 2010
Received in revised form 8 August 2012
Accepted 6 December 2012
Available online 14 December 2012

Keywords:
Demand forecast
Minimum description length (MDL)
Optimal neural network
Stochastic factors
Surrogate data

Abstract

As rough or inaccurate estimation of demands is one of the main causes of the bullwhip effect harming the entire supply chain, we have developed a mathematical approach, the minimum description length (MDL), to determine the optimal artificial neural network (ANN) that can provide accurate demand forecasts. Two types of simulated customer demand and one practical demand data set are employed to validate the capability of the MDL method. Since stochastic factors hidden in the demand data disturb the prediction, the surrogate data method is proposed for identifying the characteristics of the demand data. This method excludes demands that are totally stochastic when forecasting. We demonstrate how optimal models estimated by MDL are consistent with the dynamics of demand data identified by the surrogate data method. The complementary approach of the surrogate data method and neural network constitutes a comprehensive framework for making various demand predictions. This framework is applicable to a wide variety of real-world data.
© 2012 Elsevier B.V. All rights reserved.

⁎ Corresponding author. Tel.: +61 2 9685 9488; fax: +61 2 9685 9593.
E-mail address: H.lau@uws.edu.au (H.C.W. Lau).

0167-9236/$ – see front matter © 2012 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.dss.2012.12.008

1. Introduction

The operation of the supply chain has undergone significant changes during the past decade [3]. Within the supply chain, enterprises usually adopt a strategy of keeping inventories low at all levels, including the end-sale retailers, in order to reduce their costs [22,27]. Meanwhile, the retail sectors have to face uncertainties emerging in their supply chains. Customer demands depend on uncertain, stochastic factors, which make it difficult for supply chain participants to estimate future demands accurately. This problem is aggravated by variation amplification, known as the bullwhip effect, which leaves the parties involved struggling with inventory management because they receive faulty notifications. The extra stock keeping results in excess production at the upstream levels, since the producer aims to fulfill the over-estimated demand. Likewise, underestimated demand causes upstream players to produce too little to fulfill actual demand. Both scenarios lead to inefficiency in supply chain management. Thus, the challenge for a participant in the supply chain is to determine the appropriate quantity on the basis of an accurate demand forecast.

As the bullwhip effect has been recognized as a forecast-driven problem throughout supply chains, it is necessary to develop advanced techniques for forecasting customer demand and to extend the visibility of customer demands as far as possible. However, accurate and truthful demand forecasting is not straightforward. The difficulty is that customer demands depend on many environmental factors, while the pattern hidden in those demands is unknown or is too complicated for managers along the supply chain to understand.

In this paper, an intelligent system based on the MDL-optimal neural network is developed for "learning" the underlying pattern and predicting future demands. Neural networks are considered the primary and most popular technique for demand forecasting in supply chain management, and in particular the multi-layer feed-forward neural network is able to approximate any nonlinear or linear function under certain conditions [2,13,20]. Moreover, the high degree of freedom in the neural network architecture provides the potential to model any function but, unfortunately, also results in a very high probability of overfitting [38]. The crucial issue in developing a neural network is therefore the generalization of the network, which has nevertheless been ignored in published works. We take an alternative, novel approach that determines the optimal neural network with generalization in mind, and we refer to the result as the MDL-optimal neural network for demand forecasting.

Furthermore, there is little in the literature that focuses on studying the nature of customer demand. Most articles about demand forecasting maintain that the given data should be predicted by their approaches no matter whether the data is stochastic, linear or nonlinear, and no matter whether the forecasting techniques are suitable for modeling the data or not. Actual demands depend on many stochastic elements, which may well make the demands completely stochastic. Meanwhile, some kinds of demand data in which deterministic patterns dominate appear to be random, so people are very likely to neglect the investigation of those demands. To address this issue, we employ the surrogate data method and examine the dynamics of the specific simulated

customer demands. We notice that the results of the surrogate data method are also confirmed by the MDL-optimal model. We therefore use hypothesis testing with the surrogate data method to confirm the modeling technique, so as to provide a comprehensive solution to problems of customer demand prediction in terms of identifying data features and selecting the optimal network setting from one model to another. The contribution of the proposed techniques is also validated in experiments according to their accuracy and flexibility in comparison with commonly used mathematical approaches.

The rest of the paper is organized as follows: Section 2 contains a brief review of demand forecasting techniques. This is followed by a description of the developed optimal neural network, as well as the surrogate data method for identifying data characteristics. In Section 4 the application of the proposed technique is described and benchmarked against existing approaches through simulated models and a practical study. In the last section a conclusion on the approaches that have been developed is given and suggestions for future research are provided.

2. Literature review

Uncertainty in a supply chain can be defined as unpredictable events that affect its planned performance [24]. Demand uncertainty, caused by inaccurate forecasting, feeds into the information exchange network, which in turn results in the bullwhip effect [30]. Such an effect may decrease the ability of the chain to meet the expected target for the delivery of products and services.

The bullwhip effect, also known as the Forrester or whiplash effect, is one of the key areas of research in supply chain management (SCM) applications, and typical cases can be found in the commercial operation of Campbell's Soup [15], HP and Procter & Gamble [25], and a garment supply chain [12]. High inventory levels and poor customer service along the supply chain are symptoms of the bullwhip effect [9].

In terms of management science techniques, Yao, [39] Paik, and SeungKuk [31] identified demand forecasting as one of the significant variables for bullwhip control, and Miyaoka found that improved forecasting could reduce fluctuations in manufacturing production levels [29]. Demand forecasting is of vital importance as it is closely related to the reordering system [6], inventory cost [26], decision making [28], profit maximization [18], etc. Gurnani et al. [18] found that retailers have to strike a balance between an accurate demand forecast and unit cost uncertainty before the selling season starts. Different situations require different forecast models or methods. Sani and Kingsman [34] compared five forecasting methods in terms of cost, service level, and then both together, and found that they perform slightly differently from each other. To forecast telephone demand in Australia and Chile, different techniques were also adopted by Bhattacharyya and Wellenius respectively [5]. From the above works, it can be concluded that optimal forecasting methods have to be designed or selected carefully [14].

To address this far-reaching factor, the moving average and the single exponential smoothing methods have been used by Graves [17] and Chen et al. [7,8]. Multiple correlations and a regression equation with weights can also be applied for optimal prediction [10]. It is a conventional technique in demand forecasting [33]. According to a comparative study, ANN can produce better predictions than the multiple regression method [16].

Neural networks have developed rapidly and are widely used in operations management [23,32,35]. Demand forecasting problems deploying this artificial intelligence technique have been studied, following classification, simulation, and decision making. Hill et al. [21] stated that neural networks can perform significantly well in forecasting tasks. Possible reasons for the poor forecasts of classical decomposition have been identified, and Hansen and Nelson [19] concluded that the use of neural networks is a way out. More studies comparing and combining traditional and neural network based forecasting methodology suggest that neural networks can offer some improvement in performance and that combining them is feasible [11]. In this paper, we focus on how the MDL-optimal neural network approach tackles linear and nonlinear simulated customer demand forecasting.

3. Framework of the MDL-optimal neural network model

Simulation logic, along with a flowchart, is shown in Fig. 1. To verify that the program does actually perform as intended, the conceptual model is divided into three parts: demand generation, forecasting and calculation of prediction accuracy. The neural network used in this paper is the three-layer feedforward neural network, with a single hidden layer, sigmoid activation functions, and one linear output, as illustrated in Fig. 1. Given the input vector $(x_{t-1}, x_{t-2}, \ldots, x_{t-d})$, the transfer function of the neural network, $f$, can be mathematically expressed as

$$f(x_{t-1}, x_{t-2}, \ldots, x_{t-d}) = b_0 + \sum_{i=1}^{k} v_i \, \phi\left( \sum_{j=1}^{d} \omega_{i,j} x_{t-j} + b_i \right) \tag{1}$$

where $\{v_i, \omega_{i,j}, b_i\}$ are weights and biases respectively, $\phi$ is the tan-sigmoid transfer function, $k$ is the number of neurons, and $d$ is the number of inputs. The Levenberg–Marquardt algorithm is used to train the neural network.

Overfitting has long been recognized as an endemic problem of neural networks that have a large number of parameters. The biological nature of ANN (i.e. a massive, highly connected array of nonlinear excitatory "neurons") promotes the construction of neural networks with a large number of neurons. Correspondingly, the resulting models easily become overfitted. Therefore, adequate generalization of the ANN for a specific application is a primary element in ensuring successful application in practice. We therefore utilize a novel approach, MDL, to directly determine the optimal neural network (i.e. the number of neurons in the neural network) with the focus on prediction accuracy.

3.1. The minimum description length

The MDL principle is based on a trade-off strategy: the optimal model is estimated as the one with the minimum total description length. The description length of a model is composed of two parts: the cost of describing the model parameters, and the cost of describing its prediction errors.

Definition 1. N(s) is the description length of the parameters of a neural network whose neuron number is s.

Definition 2. E(s) is the description length of the prediction error of the same model.

The total description length with respect to this model, denoted by D(s), is then given by the sum of both parts. The MDL principle states that the optimal model is the one that minimizes D(s).

Let $\{d_i\}_{i=1}^{N}$ be customer demands of N time units (for example weeks) and $f(d_{i-1}, d_{i-2}, \ldots, d_{i-k}; \Lambda_s)$ be a prediction of the neural network given the previous k inputs and the neural network parameters $\Lambda_s$ with respect to s neurons. The prediction error at the ith time unit is then given by $e_i = f(d_{i-1}, d_{i-2}, \ldots, d_{i-k}; \Lambda_s) - d_i$. The description length of the neural network $f(\cdot; \Lambda_s)$ is given by the description length of all the parameters $\Lambda_s$ [40]:

$$N(s) = L(\Lambda_s) = \sum_{j=1}^{s} \ln \frac{\gamma}{\delta_j} \tag{2}$$

[Figure 1: flowchart. Start → Demand generation → Demand forecast using different NN models (each with input, hidden and output layers) → MDL approach to shortlist demand forecast model → Select MDL-optimal neural networks as forecast model → End]

Fig. 1. Flowchart of simulation model.
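As an illustration of the network form in Eq. (1), the following minimal NumPy sketch evaluates a single-hidden-layer network with a tanh (tan-sigmoid) hidden layer and one linear output. The function name `feedforward`, the array shapes and the random example parameters are assumptions made for this sketch; training (the paper uses Levenberg–Marquardt) is not shown.

```python
import numpy as np

def feedforward(x_lags, v, omega, b, b0):
    """Evaluate the form of Eq. (1) for one input vector.

    x_lags : shape (d,)   lagged inputs (x_{t-1}, ..., x_{t-d})
    v      : shape (k,)   output weights v_i
    omega  : shape (k, d) hidden weights omega_{i,j}
    b      : shape (k,)   hidden biases b_i
    b0     : float        output bias b_0
    """
    hidden = np.tanh(omega @ x_lags + b)  # tan-sigmoid activation phi
    return float(b0 + v @ hidden)         # single linear output

# Hypothetical example: d = 4 lagged inputs, k = 3 hidden neurons.
rng = np.random.default_rng(0)
d, k = 4, 3
x = rng.normal(size=d)
y = feedforward(x, v=rng.normal(size=k), omega=rng.normal(size=(k, d)),
                b=rng.normal(size=k), b0=0.5)
```

With k neurons and d inputs, this form has k(d + 2) + 1 parameters, which is why the number of neurons is the natural complexity knob for model selection.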

where $\gamma$ is a constant. $(\delta_0, \delta_1, \ldots, \delta_s)$ are defined as the solution of

$$\left( Q \begin{bmatrix} \delta_0 \\ \delta_1 \\ \delta_2 \\ \vdots \\ \delta_k \end{bmatrix} \right)_j = \frac{1}{\delta_{j-1}} \tag{3}$$

where Q is the second derivative of E(s) and $(\cdot)_j$ denotes the jth element of the vector $(\cdot)$ [40].

E(s) is the negative logarithm of the likelihood of the errors $e = \{e_i\}_{i=1}^{N}$ under the given probability distribution of those errors. With the assumption that these errors follow the standard Gaussian distribution, the description length of the model prediction errors is approximated by [40]

$$E(s) = \frac{N}{2} + \ln\left(\frac{2\pi}{N}\right)^{N/2} + \ln\left(\sum_{i=1}^{N} e_i^2\right)^{N/2}. \tag{4}$$

In this paper we further study the prediction of customer demands by using the related environmental processes. Since the end customer demand is related to many environmental factors, it can be regarded as an unknown function of those factors. Given the time series of K environmental factors at the jth time unit as the inputs, $\{p_i^j\}_{i=1}^{K}$, the neural network gives its prediction, $f(p_1^j, p_2^j, \ldots, p_K^j; \Lambda_s)$, and correspondingly the prediction error at the jth time unit is $e_j = f(p_1^j, p_2^j, \ldots, p_K^j; \Lambda_s) - d_j$. We then employ the equations for N(s) and E(s) to compute the description length of the neural network in this scenario.

3.2. Surrogate data method

The surrogate data method was suggested and standardized by Theiler [37]. The rationale for surrogate data hypothesis testing is to generate an ensemble of surrogate data (surrogates for short) that preserve certain properties of the original data (i.e. are consistent with some null hypothesis) [36].

There are three typical null hypotheses, NH0 (the data is random noise), NH1 (the data is linearly filtered noise) and NH2 (the data is a static monotonic nonlinear transformation of linearly filtered noise) [42]. Correspondingly, Algorithm 0, Algorithm 1, and Algorithm 2 produce surrogates that are consistent with these hypotheses. One then applies some test statistic to both the surrogates and the original data. If the test statistic value for the data lies outside the distribution formed by the values estimated for the surrogates, the given hypothesis is rejected as being the likely origin of the data. If the statistic value for the data lies within the distribution formed from the surrogates, this suggests that the data is consistent with the given hypothesis.

For example, we test the demand data against the hypothesis NH1. That is, we wish to examine whether the demand data we are interested in originates from a linear process. If so, a linear model is more suitable for modeling such data. The surrogate data method thus acts as a guide to the subsequent modeling technique. We then check whether the optimal neural network selected by the MDL is equivalent to a linear model, so as to further validate the capability of our model selection technique. If the given data is consistent with a nonlinear process, nonlinear modeling techniques such as neural networks should outperform their linear counterparts. However, nonlinear models are sensitive to data sets and become overfitted more easily than linear models do. We then observe whether the MDL-optimal model can avoid overfitting and provide an accurate prediction. This is discussed at length in the next section.

4. Experimental results

First of all, the customers follow a linear demand process with seasonal swings. The customer demands in the simulation model are generated using the following formula [41],

$$D_t = \big(\mathrm{base}(t) + \mathrm{slope}(t)\big) \times \frac{\mathrm{season}(t) + \mathrm{period}(2\pi/52 \times t)}{\mathrm{season}(t)} + \mathrm{noise} \times \mathrm{rand\_normal}(\cdot), \tag{5}$$

where $D_t$ is the demand at the tth week, rand_normal(⋅) is a standard normal random generator between zero and one, noise represents the amplitude of the contaminated normal random noise, and base, slope, period and season are variable coefficients that depend on the time, measured in weeks.

The same form is also used by Bayraktar et al. [4], but they set base, slope and season to constants. The underlying tendency in their simulated data is easy to follow, so we replace the constants with time-variant coefficients and add stochastic factors to the generation of these coefficients. The fluctuations of base and the peaks of slope appear randomly, and strong random noise is added to the variable season, all of which is intended to reflect practical factors. As a result, the generated customer demand is more difficult to forecast.

Base, slope, season and period are specified by the simulated data, as shown in Fig. 2. These four data sets, contaminated with observational noise, simulate practical environmental factors related to the customer demand. The demand data per week is then generated using the formula given above. In order to examine the suitability of the MDL approach for demand forecasting, several forecasting techniques, namely the MSE-optimal neural network, exponential smoothing and multiple regression, are used for comparison.

4.1. Identification of stochastic customer demands

Prior to the prediction of the simulated customer demands, it is necessary to determine whether the given data is predictable or not. It is not meaningful to predict customer demands that are consistent with stochastic noise. The surrogate data method with the hypotheses NH0 and NH1 is used to test the dynamic property of the given demand data (i.e. predictability and linearity). The results are presented in Fig. 3.

100 surrogates of the original customer demand are generated, which are consistent with NH0 and NH1. One popular statistical criterion, complexity [42], is applied to the given simulated data. Green bars in Fig. 3 show the probability distribution of the statistic values for all the surrogates, and the red star is the complexity of the original demand. Note that the complexity of the original time series is not shown in the top panel of Fig. 3, as its value is far away from the range of the surrogates' results (i.e. 0.99–1.10).

A Gaussian distribution is employed to fit the distribution of the statistic values for the surrogate data. With this known distribution, we can determine the confidence level of rejecting the given hypothesis. By the central limit theorem, the statistic values of a large number of independent surrogates are distributed approximately normally. The normal distribution is therefore appropriate for modeling the distribution of the statistic values of the surrogates, as 100 surrogates are generated independently.

We observe that the complexity of the original demand data is even larger than the mean of the fitted Gaussian distribution plus three times its standard deviation. This suggests that the given customer demand is inconsistent with random noise, with almost 100% confidence. Furthermore, in the bottom

[Figure 2: four time-series panels (a)–(d); horizontal axis from 0 to 500]

Fig. 2. Time series of environmental factors, base (a), slope (b), season (c) and period (d).

[Figure 3: two histogram panels; horizontal axis: Complexity; vertical axis: Distribution]

Fig. 3. Application of the surrogate data method with hypotheses of NH0 (top panel) and NH1 (bottom panel) to the simulated demand data.

panel the statistical value of the original demand data is in the center of the distribution of the surrogates. This result indicates that the given customer demand is consistent with linear dynamics. Thus both conclusions indicate that the given data itself is not of a stochastic character and is suitable for prediction by linear modeling techniques.

4.2. Prediction of customer demands with historical demands

Neural networks with one to twenty neurons are employed to forecast the demand data generated by the formula above, of which the first 400 points are selected to train the network, and the rest are the test data. Typical predictions obtained by the

[Figure 4: six panels (a)–(f)]

Fig. 4. Prediction of various neural networks of the customer demands by using historical demand data.

neural networks with one, five, seven, ten, sixteen, and nineteen neurons are shown in Fig. 4, where the red curves are the predictions and the blue curves are the original demand. The neural network with only one neuron predicts the future customer demand accurately. Note that some predicted values are negative, as shown in Fig. 4(c) and (d), since they are obtained automatically by the given neural network. The negative values indicate that the neural network has become overfitted. The description length curve of all twenty neural networks is shown in Fig. 5. The neural network candidate with one neuron minimizes the description length, and this neural network is therefore the MDL-optimal model. The neural network with one neuron is equivalent to a linear model. This agrees with the earlier conclusion drawn from the surrogate data method that a linear model is suitable for predicting the given simulated demand.

Here the standard deviation of the test data is set as the threshold for assessing the accuracy of the prediction. That is, if the deviation between the original data and its prediction is lower than the threshold, the prediction is regarded as accurate; otherwise, it is regarded as inaccurate. Finally, a curve of the prediction accuracy for all twenty models is obtained, as shown in the bottom panel of Fig. 5. Again, the MDL-optimal model gives the most accurate prediction.

As a comparison, we illustrate the mean square errors of the training and test data, as presented in Fig. 6. The mean square errors of the training data keep decreasing while the mean square errors of the test data keep increasing, with fluctuations. The mean square errors indicate that the neural networks with more neurons are inclined to become overfitted, but they cannot be used to estimate the optimal neural network. The mean square errors of the test data can be regarded as evidence of overfitting, rather than as a criterion for model selection.

The performance of the selected forecasting techniques is given in Table 1. The mean square errors of all four methods are negligible, and the neural network approaches predict slightly better than exponential smoothing and multiple regression, with an overall accuracy of about 0.8.

4.3. Prediction of customer demands with environmental factors

The records of base, slope, season and period are fed to the neural network to give the corresponding demand data. Obviously, the generation of demand data is a simple linear function with respect to these parameters. Using the environmental factors related to the practical customer demand is the best way to forecast such demand.
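The accuracy criterion described in Section 4.2 can be sketched in a few lines. This is a minimal reading of the text under two assumptions that the paper does not state explicitly: the reported "prediction accuracy" is taken to be the fraction of test points whose absolute deviation is below the threshold, and the threshold is the population standard deviation of the test data. The function name and the example numbers are hypothetical.

```python
import numpy as np

def prediction_accuracy(actual, predicted):
    """Fraction of predictions whose absolute deviation from the actual
    value is below the standard deviation of the actual (test) data."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    threshold = np.std(actual)                 # assumed: population std (ddof=0)
    hits = np.abs(actual - predicted) < threshold
    return float(np.mean(hits))

# Hypothetical test data and predictions.
actual = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.5, 2.0, 6.0, 4.2, 3.0]
acc = prediction_accuracy(actual, predicted)   # 3 of 5 deviations are below the threshold
```

Because the threshold scales with the spread of the test data, this measure is comparable across data sets of different magnitude, unlike the raw mean square error.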

Fig. 5. Description length and prediction accuracy of the twenty neural networks when employing historical demand data.
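The description length curve of Fig. 5 combines the two parts defined in Section 3.1. The sketch below computes the error part E(s) from Eq. (4) and picks the minimizer of D(s) = N(s) + E(s). It is a sketch only: the parameter part N(s) is taken as a given input here, because evaluating Eqs. (2) and (3) requires the second derivative Q of the fitted network, which is outside the scope of this illustration. The function names and the numbers in the example are hypothetical.

```python
import numpy as np

def error_description_length(errors):
    """E(s) as in Eq. (4): N/2 + (N/2) ln(2*pi/N) + (N/2) ln(sum of e_i^2)."""
    e = np.asarray(errors, dtype=float)
    n = e.size
    return float(0.5 * n * (1.0 + np.log(2.0 * np.pi / n) + np.log(np.sum(e ** 2))))

def select_mdl_optimal(param_lengths, error_lengths):
    """Return (s_opt, D), where D(s) = N(s) + E(s) for s = 1, 2, ... and
    s_opt is the number of neurons that minimizes D."""
    total = (np.asarray(param_lengths, dtype=float)
             + np.asarray(error_lengths, dtype=float))
    return int(np.argmin(total)) + 1, total

# Hypothetical N(s) and E(s) values for candidates with s = 1, 2, 3 neurons.
s_opt, d_curve = select_mdl_optimal([1.0, 2.0, 3.0], [10.0, 5.0, 6.0])
```

The trade-off is visible in the two inputs: N(s) grows as neurons are added, while E(s) typically shrinks on the training data, so the minimum of their sum penalizes models that reduce training error only by adding parameters.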

Fig. 6. Mean square errors of both the training (the top panel) and test data (the bottom panel) for all the twenty neural networks.

Table 1
Prediction accuracy of customer demands with historical demands.

                      MDL-optimal NN   MSE-optimal NN   Exponential smoothing   Multiple regression
MSE                   1.75 × 10^−3     1.51 × 10^−3     3.46 × 10^−3            4.31 × 10^−3
Prediction accuracy   0.88             0.84             0.77                    0.78

Fig. 7 gives the description length curve of twenty neural networks and their prediction accuracy.

The description length curve shows that for the same type of simulated demand data, the MDL-optimal neural network is again the neural network candidate with one neuron, which achieves the highest prediction accuracy. The curves in Fig. 7 and the calculated mean square error also show that overfitting occurs in the network models with more than one neuron. Hence, the MDL-optimal model can provide an accurate prediction for the linear demand data in both scenarios.

With environmental factors as inputs, Table 2 again shows that the MDL-optimal neural network performs best among the four demand forecasting methods. It is worth mentioning that the prediction accuracy of the MDL-optimal neural network with environmental factors is about 37% and 20% higher than those of exponential smoothing and multiple regression respectively.

4.4. Prediction of nonlinear customer demands

We now introduce a chaotic time series, the Ikeda map, which appears to be periodic. The equation of the Ikeda map is given by

$$\begin{cases} x_{n+1} = 1 + \mu \,(x_n \cos t_n - y_n \sin t_n) \\ y_{n+1} = \mu \,(x_n \sin t_n + y_n \cos t_n) \end{cases} \tag{6}$$

where $\mu = 0.7$ and $t_n = 0.4 - 6/(1 + x_n^2 + y_n^2)$.

It is known that with the parameters above, the data generated is nonlinear. Certainly, other known nonlinear data can also be used. Here we choose the x-component data of the Ikeda map, denoted by Ikeda_x. We therefore simulate the nonlinear demand by using the formula $D_t = \mathrm{base} + \mathrm{slope} + \mathrm{season} + \mathrm{Ikeda\_x}$.

The base, slope and season are the same as those of the previous function, and are contaminated with stochastic noise. The demand that is generated is also a linear function with respect to these four coefficients, but the data itself is nonlinear. The simulated nonlinear demand data is tested by the surrogate data method, as shown in Fig. 8. The dashed curves are a Gaussian distribution fitted to the

Fig. 7. Description length and prediction accuracy of the twenty neural network candidates, using environmental factors.
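For readers who wish to reproduce the chaotic component used in Section 4.4, the Ikeda map of Eq. (6) can be iterated as in the following sketch. The initial condition, the number of discarded transient iterations and the function name `ikeda_x` are assumptions made for this illustration; the paper does not specify them.

```python
import numpy as np

def ikeda_x(n_points, mu=0.7, x0=0.1, y0=0.1, n_discard=1000):
    """Return n_points values of the x-component of the Ikeda map, Eq. (6).

    x0, y0 and n_discard are hypothetical choices: the first n_discard
    iterates are dropped so that the series starts on the attractor.
    """
    x, y = x0, y0
    out = np.empty(n_points)
    for i in range(n_discard + n_points):
        t = 0.4 - 6.0 / (1.0 + x * x + y * y)
        # Both components are updated simultaneously from the old (x, y).
        x, y = (1.0 + mu * (x * np.cos(t) - y * np.sin(t)),
                mu * (x * np.sin(t) + y * np.cos(t)))
        if i >= n_discard:
            out[i - n_discard] = x
    return out

series = ikeda_x(500)   # 500 points, matching the series length used in Section 4.4
```

Since each step maps a point at distance r from the origin to a point at distance at most 1 + μr, the series stays bounded for μ < 1, which makes it convenient to add to the other demand components.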

distribution of two kinds of surrogates. In the figure, the statistical value of the original simulated demand data is 0.721, which is lower than the mean complexity of the surrogates minus their standard deviation. We therefore conclude that the given demand data is not random noise and is not suitable for linear models.

Table 2
Prediction accuracy of customer demands with environmental factors.

                      MDL-optimal NN   MSE-optimal NN   Exponential smoothing   Multiple regression
MSE                   3.33 × 10^−3     3.18 × 10^−3     7.56 × 10^−3            7.40 × 10^−3
Prediction accuracy   0.85             0.84             0.62                    0.71

We also generate 500 points of this system, of which 400 points are selected as training data and the rest are used as testing data. Typical predictions obtained by neural networks with one, five, nine, thirteen, sixteen, and twenty neurons are presented in Fig. 9, where the notation of the curves is the same as that in Fig. 4. The description length of these twenty neural network candidates is plotted in the top panel of Fig. 10 and the bottom panel shows the accuracy of the predictions.

Referring to Fig. 10, the MDL-optimal neural network is the neural network candidate with five neurons, which achieves the highest prediction accuracy of all. From Fig. 9, we observe that the MDL-optimal neural network accurately predicts the next customer demand. The neural network with one neuron cannot capture the underlying dynamics of this demand data, although the amplitude of its prediction is lower than that of the original (i.e. the variance of the prediction is also lower than that of the original).

Comparing all the prediction accuracies, the accuracy of the MDL-optimal neural network is as high as 0.92, which is approximately 1.4 times those of exponential smoothing and multiple regression, as shown in Table 3.

We also implement the prediction for this nonlinear demand data by using the four coefficients. As we expected, the description length curve shows that the neural network with one neuron is the optimal model in this case. For the sake of brevity, we do not show these figures. In addition, we replace the time series of the Ikeda map with periodic data and then repeat the prediction in the same way. We notice that the MDL-optimal model, as well as the other models, fails to follow the future tendency of the demand data. This suggests that one should be cautious when using environmental factors which are possibly related to customer demands when making

Fig. 8. Application of the surrogate data method with hypotheses of NH0 (the top panel) and NH1 (the bottom panel) to the nonlinear demand data.

predictions. It is quite difficult to identify the environmental factors that affect customer demands, and wrong employment of irrelevant environmental records would result in poor predictions. In contrast, prediction using historical demand data is preferable, but it is suggested that the data characteristics be investigated prior to the prediction by using the surrogate data method, as discussed in this section.

4.5. Prediction of practical demand data

Here we employ the practical demand data, Monthly gasoline demand Ontario gallons in millions from 1960 to 1975, to validate the proposed modeling technique and the framework [1]. We follow the same procedure as in previous cases. The surrogate data method,

[Figure 9: six panels (a)–(f)]

Fig. 9. Prediction of various neural networks of nonlinear customer demands by using historical demand data.

Fig. 10. Description length of twenty neural networks and their prediction accuracy.

as shown in Fig. 11, suggests, with a weak confidence level, that the practical demand data are consistent with a linear process. There are 192 data points, of which 102 are selected to train the neural networks and the remaining 90 are used as testing data. Typical predictions obtained by various neural networks, including the MDL-optimal one, are presented in Fig. 12. We observe that the MDL-optimal neural network outperforms the other candidates. Fig. 13 exhibits the description length of the twenty neural network candidates and their prediction accuracy, which suggests that the network with the minimum description length is the optimal model.

According to Table 4, the MDL-optimal neural network again performs best, with a prediction accuracy of 0.84. This is 31%, 50% and 55.6% higher than the accuracy of the MSE-optimal neural network, exponential smoothing and multiple regression, respectively (for example, 0.84/0.64 ≈ 1.31 for the MSE-optimal network). Note that the large magnitude of the historical demand values leads to relatively large MSEs. In conclusion, the MDL-optimal neural network shows high accuracy and adaptability across the demand types considered.

Table 3
Prediction accuracy of nonlinear customer demands.

                      MDL-optimal NN   MSE-optimal NN   Exponential smoothing   Multiple regression
MSE                   10.10            7.69             13.02                   9.62
Prediction accuracy   0.92             0.85             0.64                    0.69

5. Conclusion

This study has analyzed the neural network forecasting technique when applied to a linear demand structure with seasonal swings and to a non-linear demand structure. Although earlier researchers examined a similar demand forecasting problem analytically [4], they did not consider the dynamic properties of the demand data in their predictions. The complementary approach developed here examines the characteristics of several demand processes by using statistical hypothesis testing, so that totally stochastic demands are excluded from prediction.
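The idea of screening out totally stochastic demands by hypothesis testing can be illustrated with a minimal sketch. It assumes the null hypothesis that the demand series is i.i.d. noise and uses shuffled surrogates, which keep the value distribution but destroy temporal order. The test statistic here is the lag-1 autocorrelation, chosen only for simplicity; it is an assumption of this sketch and not the complexity measure used in this paper. The function names, the two synthetic demand series and the number of surrogates are likewise hypothetical.

```python
import math
import random


def lag1_autocorr(x):
    """Lag-1 autocorrelation, used here as a stand-in test statistic (assumption)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    if var == 0:
        return 0.0
    cov = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1))
    return cov / var


def shuffle_surrogate_test(series, n_surrogates=99, statistic=lag1_autocorr, seed=0):
    """Rank-based test of the null hypothesis 'series is i.i.d. noise'.

    Returns (reject_null, original_statistic, surrogate_statistics).
    """
    rng = random.Random(seed)
    original = statistic(series)
    surrogate_stats = []
    for _ in range(n_surrogates):
        shuffled = list(series)
        rng.shuffle(shuffled)  # destroys temporal order, keeps the value distribution
        surrogate_stats.append(statistic(shuffled))
    # Reject only when the original statistic is more extreme than every surrogate.
    reject = all(abs(original) > abs(s) for s in surrogate_stats)
    return reject, original, surrogate_stats


# Hypothetical demand with a seasonal pattern plus small noise.
rng = random.Random(1)
seasonal = [100 + 20 * math.sin(2 * math.pi * t / 12) + rng.gauss(0, 2) for t in range(120)]
# Hypothetical purely stochastic demand.
noise = [rng.gauss(100, 20) for _ in range(120)]

seasonal_rejected, seasonal_stat, _ = shuffle_surrogate_test(seasonal)
noise_rejected, noise_stat, _ = shuffle_surrogate_test(noise)
print("seasonal demand: reject i.i.d. null =", seasonal_rejected)
print("noise demand:    reject i.i.d. null =", noise_rejected)
```

A series for which the null hypothesis cannot be rejected would be treated as totally stochastic and left out of the prediction step. With 99 surrogates, a purely random series is still rejected by chance about once in a hundred trials, so the outcome for the noise series should be read as probabilistic.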

Fig. 11. Application of the surrogate data method with hypotheses of NH1 to the practical monthly gasoline demand (the star represents the complexity of the original demand).

Based on the simulation analysis, this study finds that the description length criterion can be adapted to select the optimal neural network, and that the selected network is consistent with the demand structure identified by the surrogate data method. The MDL-optimal neural networks give accurate predictions for typical demand data and outperform their counterparts. The surrogate data method and the MDL method corroborate each other's findings. Together, the two methods form a framework that gives insight into various demand predictions. Here we do not consider the ordering policy in the supply chain, nor an up-to replenishment policy, that is, one in which the stocks in the retail sectors are kept “up to” a certain level by replenishing products consumed by customers (i.e. their demands). Because they aim at accurate predictions, the MDL method and the MDL-optimal neural network should also be applicable under other replenishment policies.

This study may be further extended to assess the impact of the bullwhip effect on the performance measures of the supply chain (e.g., total inventory cost and service level of the chain). Given that the bullwhip effect has a deteriorating impact

Fig. 12. Prediction of various neural networks on the practical demands by using historical demand data (panels a to f).

Fig. 13. Description length of twenty neural networks and their prediction accuracy.

on the operation cost of the whole chain, the direct relationship between the bullwhip effect and the performance of the proposed prediction techniques is an interesting area for future research. In addition, the structural configuration of the prediction system could be studied as a way to integrate various modules and techniques and thereby enhance the efficiency of the whole system.

Table 4
Prediction accuracy of practical demand data.

                      MDL-optimal NN   MSE-optimal NN   Exponential smoothing   Multiple regression
MSE                   1.48 × 10^4      1.03 × 10^4      4.76 × 10^4             6.02 × 10^4
Prediction accuracy   0.84             0.64             0.56                    0.54

Acknowledgment

The authors wish to thank the Research Committee of the Hong Kong Polytechnic University and the CInIS research group of the University of Western Sydney for their support of this project.

References

[1] B. Abraham, J. Ledolter, Statistical Methods for Forecasting, John Wiley, New York, 1983.
[2] L. Aburto, R. Webber, Demand forecast in a supermarket using a hybrid intelligent system, in: Design and application of hybrid intelligent systems, IOS, Berlin, 2003.
[3] L. Aburto, R. Webber, Improved supply chain management based on hybrid demand forecasts, Applied Soft Computing 7 (Jan. 2007) 136–144.
[4] E. Bayraktar, S. Koh, A. Gunasekaran, K. Sari, E. Tatoglu, The role of forecasting on bullwhip effect for E-SCM applications, International Journal of Production Economics 113 (2008) 193–204.
[5] M.N. Bhattacharyya, Forecasting the demand for telephones in Australia, Journal of the Royal Statistical Society: Series C: Applied Statistics 23 (1) (1974) 1–10.
[6] M. Caputo, V. Mininno, Internal, vertical and horizontal logistics integration in Italian grocery distribution, International Journal of Physical Distribution and Logistics Management 26 (9) (1996) 64–90.
[7] F. Chen, J.K. Ryan, D. Simchi-Levi, The impact of exponential smoothing forecasts on the bullwhip effect, Naval Research Logistics 47 (2000) 269–286.
[8] F. Chen, Z. Drezner, J.K. Ryan, D. Simchi-Levi, Quantifying the bullwhip effect in a simple supply chain: the impact of forecasting, lead times, and information, Management Science 46 (3) (2000) 436–443.
[9] S. Chopra, P. Meindl, Supply Chain Management, Prentice-Hall, Englewood Cliffs, NJ, 2001.
[10] J. Cohen, Multiple regression as a general data-analytic system, Psychological Bulletin 70 (6) (1968) 426–443.
[11] M.C.M. de Carvalho, M.S. Dougherty, A.S. Fowkes, M.R. Wardman, Forecasting travel demand: a comparison of logit and artificial neural network methods, The Journal of the Operational Research Society 49 (7) (1998) 717–722.

[12] S.M. Disney, D.R. Towill, The effect of vendor managed inventory (VMI) dynamics on the bullwhip effect in supply chains, International Journal of Production Economics 85 (2003) 199–215.
[13] J. Faraway, C. Chatfield, Time series forecasting with neural networks: a comparative study using the airline data, Applied Statistics 47 (2) (1998) 231–250.
[14] R. Fildes, Evaluation of aggregate and individual forecast method selection rules, Management Science 35 (9) (1989) 1056–1065.
[15] M. Fisher, J. Hammond, W. Obermeyer, A. Raman, Configuring a supply chain to reduce the cost of demand uncertainty, Production and Operations Management 6 (1997) 211–225.
[16] I. Flood, A neural network approach to the sequencing of construction tasks, in: Proceedings of the 6th International Symposium on Automation and Robotics in Construction, Construction Industry Institute, Austin, TX, 1989, pp. 204–211.
[17] S.C. Graves, A single-item inventory model for a non-stationary demand process, Manufacturing and Service Operations Management 1 (1) (1999) 50–61.
[18] H. Gurnani, C.S. Tang, Optimal ordering decisions with uncertain cost and demand forecast updating, Management Science 45 (10) (1999) 1456–1462.
[19] J.V. Hansen, R.D. Nelson, Forecasting and recombining time-series components by using neural networks, The Journal of the Operational Research Society 54 (3) (2003) 307–317.
[20] T. Hill, M. O'Connor, W. Remus, Neural networks for time series forecasts, Management Science 42 (7) (1996) 1082–1092.
[21] T. Hill, M. O'Connor, W. Remus, Neural networks models for time series forecasts, Management Science 42 (7) (1996) 1082–1092.
[22] R.H. Hollier, K.L. Mak, K.K. Lai, Computing optimal (s, S) policies for inventory systems with a cut-off transaction size and the option of joint replenishment, International Journal of Production Research 40 (14) (2002) 3375–3389.
[23] H. James, Software for studying and developing applications of artificial neural networks, The Economic Journal 104 (422) (1994) 181–196.
[24] S. Koh, A. Gunasekaran, A knowledge management approach for managing uncertainty in manufacturing, Industrial Management and Data Systems 106 (2006) 439–459.
[25] H. Lee, V. Padmanabhan, W. Seungjin, The bullwhip effect in supply chains, Sloan Management Review 38 (1997) 93–102.
[26] W. Liang, C. Huang, Agent-based demand forecast in multi-echelon supply chain, Decision Support Systems 42 (1) (2006) 390–407.
[27] J.M. Masters, Determination of near optimal stock levels for multi-echelon distribution inventories, Journal of Business Logistics 14 (2) (1993) 165–195.
[28] R. Metters, Quantifying the bullwhip effect in supply chains, Journal of Operations Management 15 (2) (1997) 89–100.
[29] J. Miyaoka, W. Hausman, How a base stock policy using ‘stale’ forecasts provides supply chain benefits, Manufacturing and Service Operations Management 6 (2) (2004) 149–162.
[30] T. Moyaux, B. Chaib-draa, S. D'Amours, Information sharing as a coordination mechanism for reducing the bullwhip effect in a supply chain, IEEE Transactions on Systems, Man, and Cybernetics Part C 37 (3) (2007) 396–409.
[31] S.K. Paik, Analysis of the causes of ‘bullwhip’ effect in a supply chain: A simulation approach, Ph.D. Dissertation, The George Washington University, 2003.
[32] N.C. Proudlove, Intelligent management systems in operations: a review, The Journal of the Operational Research Society 49 (7) (1998) 682–699.
[33] M. Ranasinghe, G.B. Hua, T. Barathithaasan, A Comparative Study of Artificial Neural Networks and Multiple Regression Analysis in Estimating Willingness to Pay for Urban Water Supply, Department of Civil Engineering, University of Moratuwa, Sri Lanka, 2001.
[34] B. Sani, B.G. Kingsman, Selecting the best periodic inventory control and demand forecasting methods for low demand items, The Journal of the Operational Research Society 48 (7) (1997) 700–713.
[35] S. Schocken, G. Ariav, Neural networks for decision support: problems and opportunities, Decision Support Systems 11 (5) (1994) 393–414.
[36] M. Small, C.K. Tse, Detecting determinism in time series: the method of surrogate data, IEEE Transactions on Circuits and Systems I 50 (2003) 663–672.
[37] J. Theiler, S. Eubank, A. Longtin, B. Galdrikian, J.D. Farmer, Testing for nonlinearity in time series: the method of surrogate data, Physica D: Nonlinear Phenomena 58 (1–4) (1992) 77–94.
[38] J.M. Twomey, A.E. Smith, Bias and variance of validation methods for function approximation neural networks under conditions of sparse data, IEEE Transactions on Systems, Man, and Cybernetics Part C 28 (3) (1998) 417–430.
[39] D.Q. Yao, Study of bullwhip effect and channel design in supply chains, Ph.D. Dissertation, the University of Wisconsin-Milwaukee, 2001.
[40] Y. Zhao, M. Small, Minimum description length criterion for modeling of chaotic attractors with multilayer perceptron networks, IEEE Transactions on Circuits and Systems I 53 (3) (2006) 722–732.
[41] X. Zhao, J. Xie, J. Leung, The impact of forecasting model selection on the value of information sharing in a supply chain, European Journal of Operational Research 142 (2002) 321–344.
[42] Y. Zhao, J.F. Sun, M. Small, Evidence consistent with deterministic chaos in human cardiac data: surrogate and nonlinear dynamical modeling, International Journal of Bifurcation and Chaos 18 (2) (2008) 141–160.

H.C.W. Lau received his M.Sc. degree from Aston University, Birmingham, U.K., in 1981 and the Ph.D. degree from the University of Adelaide, Adelaide, Australia, in 1995. He is currently an Associate Professor with the Department of Industrial and Systems Engineering, Hong Kong Polytechnic University, involved in research and teaching activities. His current research areas cover manufacturing, data management, workflow automation, and artificial intelligence applications.

G.T.S. Ho received his Bachelor degree and PhD at the Hong Kong Polytechnic University. He is currently a lecturer in the Department of Industrial and Systems Engineering at the Hong Kong Polytechnic University. His research interests include supply chain management, integrated quality enhancement systems and artificial intelligence applications.

Yi Zhao received his M.Eng. degree from Zhejiang University, Hangzhou, China, in 2003 and the Ph.D. degree from Hong Kong Polytechnic University, Hong Kong, China, in 2007. He is currently a Postdoctoral Fellow at Hong Kong Polytechnic University. His research interests include time series analysis, nonlinear system modeling, and neural networks.
