
International Journal of Electronic Business Management, Vol. 9, No. 2, pp. 107-121 (2011)

CONSTRUCTING A SALES FORECASTING MODEL BY INTEGRATING GRA AND ELM: A CASE STUDY FOR RETAIL INDUSTRY

Fei-Long Chen and Tsung-Yin Ou*
Department of Industrial Engineering and Engineering Management
National Tsing Hua University
Hsinchu (300), Taiwan

ABSTRACT
Due to strong competition and economic hardship, sales forecasting is a challenging problem, as demand fluctuation is influenced by many factors. A good forecasting model improves customer satisfaction, reduces the destruction of fresh food, increases sales revenue and makes production planning efficient. In this study, the GELM forecasting model integrates Grey Relation Analysis (GRA) and the extreme learning machine (ELM) to support purchasing decisions in the retail industry. GRA sieves out the more influential factors from raw data and transforms them into the input data of a novel neural network, the ELM, which avoids slow gradient-based learning and iterative parameter tuning. The proposed system was evaluated on real sales data of fresh food in the retail industry. The experimental results indicate that the GELM model outperforms other time series forecasting models, such as the GARCH, GBPN and GMFLN models, in prediction accuracy and training speed. Moreover, the different activation functions of the GELM model showed significant differences in training time and performance in our experiments.

Keywords: Sales Forecasting, Grey Relation Analysis, Extreme Learning Machine, Retail
Industry, Activation Functions

1. INTRODUCTION

In actual retail operations, sales forecasting plays an increasingly prominent role in the commercial enterprise. However, sales forecasting is usually a highly complex problem due to the influence of internal and external factors. If decision-makers could estimate their sales quantities properly, the demands of customers would be satisfied and the cost of spoiled fresh food would be substantially reduced. In practice, variations in consumer demand are caused by many factors such as price, promotion, changing consumer preferences or weather changes, especially for fresh food [34]. Both shortages and surpluses of fresh items, which can only be sold for a limited period, lead to lost revenue for the retail company. An effective and timely forecasting model is therefore an urgent and indispensable tool for handling the inventory level in the retail business. On the other hand, poor forecasting methods result in redundant or insufficient stock that directly affects income and competitive advantage. It is therefore a critical issue to identify the influential factors and obtain accurate forecasting results for fresh food within a modern retail business.

Since managers in retail usually lack an accurate forecasting tool, they have to rely on their own experience or consult the point-of-sale (POS) system to predict future demand and place purchasing orders. Few decision-makers adopt statistical methods, such as the moving average method or exponential smoothing, to deal with the daily problems. LeVee [27] indicated that accurate sales forecasting is obtainable and that it can help decision-makers to calculate production and material costs and determine the sales price. In fact, most conventional sales forecasting methods use either factors or time series data to determine the sales prediction. The relationship between the past time series data (independent variables) and the sales prediction (dependent variable) is often too complicated for unsuited statistical approaches to yield advantageous ordering suggestions. In practice, the POS system does provide some forecasting suggestions for the managers to place orders. However, most decision-makers still prefer to place the same quantity as usual or depend on their own

* Corresponding author: d927810@oz.nthu.edu.tw

intuition instead of model-based approaches. In this paper, we present a relatively novel neural network methodology, Grey relation analysis integrated with the extreme learning machine (GELM), to construct a forecasting model for the fresh food sector of the retail industry.

Sales in the retail sector exhibit strong seasonal variations. Historically, modeling and forecasting seasonal data has been one of the major research efforts, and many theoretical and heuristic methods have been developed over the last several decades. The available traditional quantitative approaches include heuristic methods such as time series decomposition and exponential smoothing, as well as time series regression and autoregressive integrated moving average (ARIMA) models that have formal statistical foundations [7]. Nevertheless, their forecasting ability is limited by their assumption of linear behavior and is thus not always satisfactory [37]. Recently, artificial neural networks (ANNs) have been applied comprehensively in sales forecasting [17,31], pattern recognition [26], aggregate retail sales [7] and the PCB industry [11]. Most studies indicate that ANNs perform better than conventional methodologies [23,24]. This flexible data-driven modeling property has made the ANN model an attractive tool for many forecasting tasks. However, most ANNs and their varieties use gradient-based learning algorithms, such as the back-propagation network (BPN), and face many difficulties with stopping criteria, learning rates, learning epochs, over-tuning, local minima and long computing times. A new learning algorithm for the single-hidden-layer feed-forward neural network (SLFN), called the extreme learning machine (ELM), has been proposed recently and overcomes the previously mentioned disadvantages [18,19,30,32,34].

The rest of this study illustrates the GELM model for improving the accuracy of forecasting fresh foods in the retail industry. Section 2 reviews the related sales forecasting literature, including the traditional statistical models and the ANN model. Section 3 presents the methodology of this study for solving the real forecasting problems. Section 4 describes the development of the various forecasting models and discusses the comparison results. The conclusion is provided in Section 5.

2. LITERATURE REVIEW

The available traditional time series forecasting approaches are divided into two groups, i.e., the univariate time series model and the multivariate time series model. One of the major limitations of traditional statistical methods is that they are essentially linear methods. The sales status of fresh food is often influenced by uncertain factors such as weather, promotion, the competitive market, etc. Therefore, traditional methodologies require some improvements to provide better forecasting suggestions.

Next, we briefly introduce the traditional statistical forecasting models and the ANN model in sales forecasting applications.

2.1 Traditional Statistical Model for Time Series Data Forecasting
In the past several decades, many researchers have used many kinds of forecasting methods to study time series events. Univariate time series models include the moving average model, the exponential smoothing model, and the auto-regressive integrated moving average (ARIMA) model. Box and Jenkins [9] developed ARIMA; a basic principle of this model is the assumption of linearity among the variables. However, many time series events may not hold to the linearity assumption. Clearly, ARIMA models cannot be used effectively to capture and explain non-linear relationships, especially for handling actual sales forecasting problems. When they are applied to processes that are non-linear, forecasting errors often increase greatly as the forecasting horizon becomes longer. To improve the forecasting of non-linear time series events, many researchers have developed alternative modeling approaches. These approaches include non-linear regression models, the bilinear model, the threshold auto-regressive model, the auto-regressive conditional heteroscedasticity (ARCH) model [16] and the generalized auto-regressive conditional heteroscedasticity (GARCH) model [4].

Although the traditional methods have proved somewhat effective, they still have certain shortcomings. Zhang [36] indicated that although these methods had displayed some improvements over the linear models in some specific cases, they tended to be applied to special events, lacked generality and were poorly implemented.

2.2 ANN Model in Time Series Data Forecasting
The ANN model is a model-free approach that has recently been applied in forecasting due to its competent performance in forecasting and pattern recognition. In general, it consists of a collection of simple non-linear computing elements whose inputs and outputs are tied together to form a network. Many studies have attempted to apply the ANN model to time series forecasting. Weigend et al. [35] introduced the "weight-elimination" back-propagation learning procedure and applied it to sunspot and exchange-rate time series. Tang and Han [33] compared the ANN model with the ARIMA model using international airline passenger traffic, domestic car sales and foreign car sales in the USA. Chakraborty et al. [10] presented an ANN approach based on multivariate time-series analysis, which can

accurately predict the flour prices in three cities in the USA. Lachtermacher et al. [20] developed a calibrated ANN model in which the Box-Jenkins methods are used to determine the lag components of the input data; moreover, it employs a heuristic method to choose the number of hidden units.

Ansuj et al. [5] presented a comparison of a time series model with interventions and an ANN model for analyzing the sales behavior of a medium-size enterprise. The results showed that the ANN model was more accurate. Furthermore, Bigus [7] used promotion, time of year, end-of-month age, and weekly sales as inputs for the ANN model to forecast the weekly demand, with promising results. Kuo and Chen [22] believed that the traditional statistical approaches perform well when dealing with data showing seasonality and trends, but that they are inappropriate for unexpected situations.

In the ELM method, the input weights (linking the input layer to the hidden layer) and hidden biases are randomly chosen, and the output weights (linking the hidden layer to the output layer) are analytically determined by using the Moore-Penrose (MP) generalized inverse. As this new learning algorithm can be easily implemented, it tends to identify the smallest training error, obtains the smallest norm of weights and good generalization performance, and runs extremely fast.

2.3 Demand Forecasting of the Retail Industry
Chu and Zhang [13] and Alon et al. [4] developed artificial networks for forecasting aggregate retail sales. Alon et al. [21] compared them with traditional methods including Winters exponential smoothing, the Box-Jenkins ARIMA model, and multivariate regression. The derivative analysis shows that the nonlinear neural network model is able to capture the dynamic nonlinear trend and seasonal patterns, as well as the interactions between them. Chu et al. [7] found that non-linear models are able to outperform their linear counterparts in out-of-sample forecasting, and that prior seasonal adjustment of the data can significantly improve the performance of the neural network model; the overall best model is the neural network built on deseasonalized time series data. Doganis et al. [15] also presented an evolutionary sales forecasting model that combines two artificial intelligence technologies, namely radial basis functions and a genetic algorithm. The methodology was applied successfully to sales data of fresh milk provided by a major manufacturer of dairy products. Aburto and Weber [1] presented a hybrid intelligent system combining an ARIMA model and MLP neural networks for demand forecasting. It shows improvements in forecasting accuracy, and a replenishment system for a Chilean supermarket, which leads simultaneously to fewer sales failures and lower inventory levels. Au et al. [6] and Sun et al. [37] developed different sales forecasting models in fashion retailing. Au et al. [6] illustrated an evolutionary neural network for sales forecasting and showed that, when guided by the BIC and the pre-search approach, the non-fully connected neural network converges faster and forecasts time series more accurately than the fully connected neural network and the traditional SARIMA model. Forecasting is often time-critical, and the improvement in convergence speed makes the approach widely applicable to decision-making problems. Sun et al. [37] applied an ELM neural network model to investigate the relationship between the sales amount and some significant factors that affect demand. The experimental results demonstrate that the proposed methods outperform the back-propagation neural network model. Ali et al. [3] explored the forecasting accuracy versus data and model complexity tradeoff in the grocery retailing sales forecasting problem by considering a wide spectrum of data and technique complexity. The experimental results indicated that simple time series techniques perform very well for periods without promotions; however, for periods with promotions, regression trees with explicit features improve accuracy substantially, and more sophisticated input is only beneficial when advanced techniques are used. Chen et al. [12] developed the GMFLN forecasting model by integrating GRA and MFLN neural networks. GRA sieves out the more influential factors from raw data and then transforms them into the input data of the MFLN model. The experimental results indicated that the proposed forecasting model outperforms the MA, ARIMA and GARCH forecasting models for the retail goods.

According to the above literature review, retail forecasting problems are usually both time- and accuracy-critical. This paper aims to construct a more efficient sales forecasting model that performs more accurately and faster than the univariate and multivariate time series models for retail goods. As we know, sales are affected by many dynamic factors. GRA and expert knowledge sieve the more influential factors out as the input variables of the ELM model. Providing an improved forecasting method that can help managers make decisions for ordering the appropriate amounts is the focal point of this research.

3. METHODOLOGY

The following section presents the proposed sales forecasting model that integrates GRA and ELM. The GRA computes the Grey Relation Grades (GRG), which measure the influential degree of a compared series by relative distance. Subsequently, the data composed of these input and output pairs are divided into training, testing and predicting data. All

the data sets should be normalized into a specific range [-1, 1]. The ELM then offers the predicted results and processes the unnormalization step to convert the data back into unnormalized outcomes.

3.1 Grey Relation Analysis (GRA)
Deng [14] proposed the mathematics of Grey Relation Analysis (GRA). It has been successfully applied in many fields such as management, economics, and engineering. The Grey Relation Grade (GRG) is the influence degree of a compared series on the reference series and can be represented by the relative distance: the smaller the distance, the greater the influence. The degree of influence describes the relative variations between two factors, indicating the magnitude and gradient in a given system. The GRG between two series, the compared series and the reference series, is built from the relational coefficient r(x0(k), xi(k)). Before calculating the Grey relational coefficients, each data series must be normalized by dividing the respective data of the original series by its average.

After performing this Grey data processing, the transformed reference sequence is x0 = {x0(1), x0(2), ..., x0(n)}. The compared sequences are denoted by xi = {xi(1), xi(2), ..., xi(n)}, i = 1 to m. The relational coefficient r(x0(k), xi(k)) between the reference series x0(t) and the compared series xi(t) at time t = k can be calculated using the following equation [20]:

    r(x0(k), xi(k)) = [min_i min_k |x0(k) - xi(k)| + ζ max_i max_k |x0(k) - xi(k)|]
                      / [|x0(k) - xi(k)| + ζ max_i max_k |x0(k) - xi(k)|],
    k = 1, 2, ..., n;  i = 1, 2, ..., m                                          (1)

Here ζ is a distinguishing coefficient (0 < ζ ≤ 1) used to adjust the range of the comparison environment and to control the level of differences among the relational coefficients. When ζ = 1, the comparison environment is unaltered; when ζ = 0, the comparison environment disappears. When the data variation is large, ζ usually ranges from 0.1 to 0.5 to reduce the influence of an extremely large max_i max_k |x0(k) - xi(k)|. The term Δ0i(k) = |x0(k) - xi(k)| denotes the absolute difference between the two sequences after data transformation, and min_i min_k Δ0i(k) (max_i max_k Δ0i(k)) is the minimum (maximum) such distance over all times in all compared sequences, which form the comparison environment. Since the transformed series crisscross at a certain point, min_i min_k |x0(k) - xi(k)| equals zero and Equation (1) simplifies to

    r(x0(k), xi(k)) = ζ max_i max_k |x0(k) - xi(k)| / [Δ0i(k) + ζ max_i max_k |x0(k) - xi(k)|]   (2)

3.2 Normalization and Unnormalization
The normalization method for the input and output data sets is described as follows:

    X_normalize = [(X_ij - Max{X_ij}) + (X_ij - Min{X_ij})] / (Max{X_ij} - Min{X_ij}),
    i = 1, 2, ..., n;  j = 1, 2, ..., N                                          (3)

The unnormalization method for the predicted results is described as follows:

    P_unnormalize = [P_ij (Max{X_ij} - Min{X_ij}) + Max{X_ij} + Min{X_ij}] / 2,
    i = 1, 2, ..., n;  j = 1, 2, ..., N                                          (4)

3.3 Extreme Learning Machine (ELM)
The ELM is a single-hidden-layer feed-forward neural network (SLFN). It randomly chooses the input weight matrix W and analytically determines the output weight matrix of the SLFN. Suppose that we are training an SLFN with K hidden neurons and an activation function vector g(x) = [g1(x), g2(x), ..., gK(x)] to learn N distinct samples (xi, ti), where xi = [xi1, xi2, ..., xin]^T ∈ R^n and ti = [ti1, ti2, ..., tim]^T ∈ R^m. If the SLFN can approximate the N samples with zero error, then we have

    Σ_{j=1}^{N} ||y_j - t_j|| = 0                                                (5)

where y_j is the actual output value of the SLFN. There then exist parameters β_i, w_i and b_i such that

    Σ_{i=1}^{K} β_i g(w_i · x_j + b_i) = t_j,   j = 1, 2, ..., N                 (6)

where w_i = [w_i1, w_i2, ..., w_in]^T is the weight vector connecting the ith hidden node and the input nodes, β_i = [β_i1, β_i2, ..., β_im]^T is the weight vector connecting the ith hidden node and the output nodes, and b_i is the threshold of the ith hidden node. The operation w_i · x_j in Equation (6) denotes the inner product of w_i and x_j. The above N equations can be written compactly as Hβ = T, with H, β and T spelled out in Equation (7).
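To make the factor-screening step concrete, the relational coefficient of Equation (1) and the grade computation can be sketched in Python. The sketch assumes the common GRA convention that a series' grade is the average of its relational coefficients over all time points; the function name and interface are illustrative, not the authors' code.

```python
import numpy as np

def grey_relational_grades(x0, X, zeta=0.5):
    """Return one GRG per compared series. x0 is the reference series,
    X is an (m, n) array holding m compared series of length n."""
    # Grey data processing: divide each series by its own average.
    x0 = np.asarray(x0, dtype=float)
    x0 = x0 / x0.mean()
    X = np.asarray(X, dtype=float)
    X = X / X.mean(axis=1, keepdims=True)

    # Absolute differences between the reference and each compared series.
    delta = np.abs(x0 - X)                  # shape (m, n)
    d_min, d_max = delta.min(), delta.max()

    # Relational coefficients, Equation (1).
    r = (d_min + zeta * d_max) / (delta + zeta * d_max)

    # Grade: average coefficient over all time points (a common convention).
    return r.mean(axis=1)
```

A series identical to the reference gets a grade of 1, and grades shrink toward zero as the (mean-normalized) series diverges, which is exactly the ranking used in Step 4.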


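The [-1, 1] normalization of Equation (3) and its inverse, Equation (4), amount to a two-line min-max mapping; a minimal sketch with illustrative helper names:

```python
import numpy as np

def normalize(x):
    """Equation (3): map a series into [-1, 1]."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    return ((x - hi) + (x - lo)) / (hi - lo)

def unnormalize(p, lo, hi):
    """Equation (4): map a predicted value back to the original scale,
    given the Min and Max the series was normalized with."""
    return (p * (hi - lo) + hi + lo) / 2.0
```

Note that `unnormalize` needs the Min and Max of the original series, so those two values must be stored alongside each normalized series.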

Here

    H(w_1, ..., w_K, b_1, ..., b_K, x_1, ..., x_N)
      = [ g(w_1 · x_1 + b_1)  ...  g(w_K · x_1 + b_K) ]
        [        ...          ...          ...        ]
        [ g(w_1 · x_N + b_1)  ...  g(w_K · x_N + b_K) ]  (N x K),

    β = [β_1^T; ...; β_K^T]  (K x m)   and   T = [t_1^T; ...; t_N^T]  (N x m)    (7)

In the ELM, the input weights and hidden biases are randomly generated instead of tuned. Thus the determination of the output weights is as simple as finding the least-squares (LS) solution to the linear system Hβ = T:

    β̂ = H†T                                                                     (8)

where H† is the Moore-Penrose (MP) generalized inverse of the matrix H. This minimum-norm LS solution is unique and has the smallest norm among all the LS solutions.

3.4 Steps of Constructing the GELM Forecasting Model
This section describes systematically how to construct the forecasting model that integrates Grey relation analysis and the Extreme Learning Machine (GELM). The basic elements of the present study are presented in Figure 1 and can be briefly described as follows:
Step 1: Data collection
Collect the daily sales and price data from the target store, together with the other related series data provided by neighboring stores or some government agencies, as the forecasting references. One series is the forecasting target (x0 ∈ X) and the others are the m comparison series (xi ∈ X, i = 1, 2, ..., m), where X = {xi | i = 0, 1, 2, ..., m}.
Step 2: Normalize the initial data
All initial data are composed of a moving window of fixed length along the series, and the input data are normalized by Equation (3). After normalizing all the collected data, each data set falls into the interval between -1 and 1.
Step 3: Calculate the grey relation grades (GRG)
The grey relational grade between two series at a certain time point t = k is represented by the grey relational coefficient r(x0(k), xi(k)) defined in Equation (1). The range of the GRG is the closed interval between 0 and 1: the greater the GRG between two data sets, the closer the relationship between these data sets.
Step 4: Select the more effective factors
According to the ranking of the GRG, an expert who owns the domain knowledge can select the important factors that affect the sales amounts most significantly. Steps 2 and 3 not only provide a rational analysis but also avoid the preconceived opinions of experts.
Step 5: Divide the input and output data into training data, testing data and predicting data
The ELM is an SLFN with three main layers: the input layer, the hidden layer and the output layer. Different from traditional learning algorithms, the proposed learning algorithm tends to reach the smallest training error and obtains the smallest norm of weights. The ELM can be summarized as follows.
Algorithm ELM: Given a training set {(xi, ti) | xi ∈ R^n, ti ∈ R^m, i = 1, ..., N}, an activation function g(x), and a hidden node number K:
Step 5.1: Randomly assign the input weights wi and biases bi, i = 1, 2, ..., K.
Step 5.2: Calculate the hidden-layer output matrix H.
Step 5.3: Calculate the output weight β = H†T, where T = [t1, ..., tN]^T and H† is the MP generalized inverse of the matrix H (see Appendix).
Step 6: Select different activation functions and numbers of hidden nodes
The ELM randomly chooses hidden nodes and analytically determines the output weights. Three activation functions (sigmoidal, sine, hardlim) and four numbers of hidden nodes (20, 50, 100, 200) can be selected.
Step 7: Input the training and testing data and predict the future sales amounts
Obtain the predicted results of the training and testing data, then unnormalize the outcomes by Equation (4).
As discussed in the Appendix, we have the following important properties:
1. Minimum training error
The special solution β̂ = H†T is one of the least-squares solutions of the general linear system Hβ = T, meaning that the smallest training error can be reached by this special solution:

    ||Hβ̂ - T|| = ||HH†T - T|| = min_β ||Hβ - T||                                 (9)

Although almost all learning algorithms wish to reach the minimum training error, most of them cannot, because of local minima or because an infinite number of training iterations is not allowed in applications.
2. Smallest norm of weights
Further, the special solution β̂ = H†T has the smallest norm among all the least-squares solutions of Hβ = T, as stated in Equation (10).
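Steps 5.1-5.3 can be sketched in a few lines of Python. This is an illustrative reimplementation rather than the authors' code: numpy's `pinv` supplies the Moore-Penrose inverse of Equation (8), and `tanh` stands in for the sigmoidal activation.

```python
import numpy as np

def train_elm(X, T, n_hidden=20, activation=np.tanh, seed=0):
    """Steps 5.1-5.3: random input weights and biases, hidden-layer
    output matrix H, and output weights beta = pinv(H) @ T (Eq. (8))."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(n_hidden, X.shape[1]))  # input weights w_i
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # hidden biases b_i
    H = activation(X @ W.T + b)           # N x K hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T          # K x m output weights
    return W, b, beta

def elm_predict(X, W, b, beta, activation=np.tanh):
    return activation(X @ W.T + b) @ beta
```

Because the input weights are never tuned, the only fitted quantity is `beta`, which is why training reduces to a single pseudoinverse instead of an iterative gradient descent.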

    ||β̂|| = ||H†T|| ≤ ||β||,
    for all β ∈ { β : ||Hβ - T|| ≤ ||Hz - T||, for all z ∈ R^{K×N} }            (10)

The minimum-norm least-squares solution of Hβ = T is unique, and it is β̂ = H†T.
Step 8: Measure the accuracy of the forecasting results
Measure the accuracy of the forecasting results by the MSE and MAD criteria.
1. MSE (Mean Square Error)

    MSE = Σ_{t=1}^{N} (A_t - F_t)² / (N - 1)                                     (11)

2. MAD (Mean Absolute Deviation)

    MAD = (1/N) Σ_{t=1}^{N} |A_t - F_t|                                          (12)

where A_t and F_t are the actual amount and the forecast amount at time t, respectively.
Step 9: Repeat Steps 6~8 for the same data
The GELM model offers the best predicted results, and the accuracy of those results is then measured. We perform statistical tests (paired t-tests) on the results obtained with the sigmoidal, sine and hardlim activation functions.

Figure 1: Outline of present study

Figure 2 shows the framework and non-linear transformation of the GELM network, which incorporates the input layer, hidden layers and output layer. Generally, the GELM model in practical applications divides the raw data into a training set and a testing set. The training set is used for neural network construction while the testing set is used for measuring the model's predictive ability. The training process determines the fitted function through the linking arc weights of the network. The structural size of the GELM model depends on the number of hidden nodes. The input data should be strongly representative after the GRA and the opinions drawn from expert knowledge.

Figure 2: The framework and non-linear transformation of the GELM network

3.5 The Properties of the GELM Forecasting Model of the Retail Industry
The GELM forecasting model combines the Grey relation analysis (GRA) and Extreme learning machine (ELM) methodologies. GRA in the grey system is an important problem-solving method used when dealing with similarity measures of complex relations. The main purpose of GRA in the proposed hybrid forecasting model is to realize the relationship between two sets of time series data in relational space [25]. The Grey relational grade (GRG) is a globalized measure adopted for GRA; it is used to describe and explain the relation between two sets. If the data of the two sets at all individual time points were the same, then all the relational coefficients would equal one. The greater the GRG between two sets, the closer the relationship between the sets. The candidate data sets with higher GRG are delegated as the input data sets of the GELM model to enhance its predictive ability.
The learning speed of feedforward neural networks is far slower than required, and this has been a major bottleneck in their practical application for decades. This study applies the ELM for single-hidden-layer feed-forward neural networks, which randomly chooses hidden nodes and analytically determines the output weights of the networks. The major property of the ELM is that it abandons the slow gradient-based learning and the iterative parameter tuning extensively used to train neural networks, yet provides good generalization forecasting performance at extremely fast learning speed.
The limitation of the proposed GELM forecasting model is that it does not consider the influence of the financial crisis, free trade agreements,

consumers' behavior and advertisements. Besides, it is more suitable for forecasting mature products than newly announced products on the market.

4. EXPERIMENT RESULTS AND DISCUSSIONS

In operational management in the retail industry, it is indispensable to forecast the future demand and place orders at various times of the day. A system that offers more accurate prediction functions can assist managers to cater for the demand of customers and reduce the scrapped quantities of fresh food. Using the GELM model to predict sales amounts can increase the accuracy of the proposed system. The procedures of the experiments and the results are described sequentially in the following subsections.

This study compared the GELM forecasting model with multivariate statistical forecasting methods such as the GARCH model and the Back-Propagation Network (BPN), as well as with the GBPN and GMFLN models, by forecasting 120 days of sales. The GBPN model integrates GRA and Back-Propagation Networks, and the GMFLN model integrates GRA and Multilayer Functional Link Networks. The GARCH model is built in E-Views, and the simulations of the related BPN models are conducted in MATLAB running on an ordinary notebook with a 1.4 GHz CPU and 760 MB of RAM.

4.1 Data Collection and Analysis
Well-known retailers and a government organization in Taiwan provided the initial data, which can be separated into three different groups. Firstly, the target store collected the daily sales data and price of 960 ml containers of milk; the total number of data points was 334, as shown in Figure 3. We also collected the sales amounts of two other brands and their prices. Ordinarily, the sales price is not a fixed number, as it is adjusted for many reasons such as promotions, the hot/cold season or some specific activities. Secondly, sales data were also obtained from two neighboring stores. Those neighboring stores are in the same distribution area; the stores were close to each other and serviced the same customers. We also collected the sales amount and price data from these other stores. Thirdly, the Central Weather Bureau provided the local weather records.

Figure 3: The sale quantities of the target brand

As we know, many factors affect consumer behavior in the actual retail industry. This subsection describes how the most influential indices are selected by the analytical methodology to serve as the input data of the ELM model. After normalizing the raw data and calculating the GRG of each index, the expert selected the three factors with the highest GRG to be the input data of the multivariate time series models and the ELM model. The GRG of each factor is shown in Table 1. The selected factors represent the more influential factors in the sales amounts of fresh food. The three selected factors are W, TAs and TBS.

4.2 Experiment Results
The experimental algorithms of the GARCH, GBPN, GMFLN and GELM models import the same data sets, including the three indices (W, TAs and TBS) selected by GRA and the last 7 days of lagged data.

4.2.1 GARCH Forecasting Model
Bollerslev [8] proposed the GARCH (Generalized ARCH) conditional variance specification, which allows for a parsimonious parameterization of the lag structure. In analyzing the time series model, several suitable models could explain the input data. We adopt two statistics as the criteria for choosing the best statistical forecasting model.
1. AIC (Akaike's Information Criterion)
Akaike [2] provided the following criterion to evaluate the fitness of the proposed statistical models (a data set fitted by P parameters of the statistical model):

    AIC(P) = n Ln(σ̂_a²) + 2P                                                    (13)

2. SBC (Schwartz's Bayesian Criterion)
Schwartz [28] provided a similar criterion to evaluate the fitness of the statistical models:

    SBC(P) = n Ln(σ̂_a²) + P Ln(n)                                               (14)

The best GARCH forecasting model will use the same time series data and the three indices (W, TAs

and TBS) to predict the 120 days' demand. After examining the AIC (-0.75645) and SBC (-0.60712), the best adapted model is described below.

    y_t = 0.83341y_{t-1} - 0.84852y_{t-2} + 0.79288y_{t-3} + 0.37308W_t
          + 0.06524TAs + 0.06403TBS - 0.81732ε_{t-1} + 0.93919ε_{t-2}
          - 0.69544ε_{t-3} + 0.14682ε_{t-5} + ε_t,
    ε_t ~ N(0, σ²_t),
    σ²_t = 0.02877 - 0.09753σ²_{t-2}                                             (15)
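Read literally, the mean equation of model (15) turns three lagged sales values, the current values of the three indices, and the recent residuals into a one-step-ahead forecast. A sketch (the helper name and argument layout are ours, not the paper's):

```python
def garch_mean_forecast(y_lags, w, tas, tbs, e_lags):
    """Conditional mean of Equation (15).
    y_lags = (y[t-1], y[t-2], y[t-3]);
    e_lags = (e[t-1], e[t-2], e[t-3], e[t-4], e[t-5]); e[t-4] does not
    appear in Equation (15), so it carries no coefficient here."""
    y1, y2, y3 = y_lags
    e1, e2, e3, _, e5 = e_lags
    return (0.83341 * y1 - 0.84852 * y2 + 0.79288 * y3
            + 0.37308 * w + 0.06524 * tas + 0.06403 * tbs
            - 0.81732 * e1 + 0.93919 * e2 - 0.69544 * e3 + 0.14682 * e5)
```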

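The model selection above relies on Equations (13) and (14); both are one-liners once the residual variance is estimated. The sketch below assumes σ̂_a² is the mean squared residual, which the paper does not spell out:

```python
import math

def aic(residuals, p):
    """Equation (13): AIC(P) = n * ln(sigma_a^2) + 2P."""
    n = len(residuals)
    var = sum(e * e for e in residuals) / n
    return n * math.log(var) + 2 * p

def sbc(residuals, p):
    """Equation (14): SBC(P) = n * ln(sigma_a^2) + P * ln(n)."""
    n = len(residuals)
    var = sum(e * e for e in residuals) / n
    return n * math.log(var) + p * math.log(n)
```

Since Ln(n) exceeds 2 once n > e² ≈ 7.4, SBC penalizes extra parameters more heavily than AIC on all but tiny samples, which is why the two criteria can disagree on the best lag structure.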
Table 1: The GRG of collected data

I. Data from target store
    Target Brand:            Sales amount TS    --       Price TP    0.6232
    Competitive Brand 1:     Sales amount C1S   0.6786   Price C1P   0.6935
    Competitive Brand 2:     Sales amount C2S   0.6716   Price C2P   0.6896

II. Data from the two neighboring stores
    Target Brand in Neighboring A:         Sales amount TAS   0.7560   Price TAp   0.6843
    Competitive Brand 1 in Neighboring A:  Sales amount C1AS  0.7109   Price C1Ap  0.6233
    Competitive Brand 2 in Neighboring A:  Sales amount C2AS  0.7322   Price C2Ap  0.6571
    Target Brand in Neighboring B:         Sales amount TBS   0.7567   Price TBp   0.6056
    Competitive Brand 1 in Neighboring B:  Sales amount C1BS  0.6985   Price C1Bp  0.7012
    Competitive Brand 2 in Neighboring B:  Sales amount C2BS  0.7345   Price C2Bp  0.7021

III. Weather data
    Weather records W   0.7737
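The accuracy comparisons in the following subsections use the Step 8 criteria. A small sketch, assuming Equation (11)'s divisor is N - 1 as printed, while MAD divides by N:

```python
def mse(actual, forecast):
    """Equation (11): mean square error with an N - 1 divisor."""
    errs = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return sum(errs) / (len(errs) - 1)

def mad(actual, forecast):
    """Equation (12): mean absolute deviation."""
    errs = [abs(a - f) for a, f in zip(actual, forecast)]
    return sum(errs) / len(errs)
```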

4.2.2 GBPN Forecasting Model
Generally, the BPN is a typical type of artificial neural network model, a class of generalized non-linear nonparametric models inspired by studies of the brain and nervous system. A BPN is composed of several layers of input, hidden and output nodes. It is a challenge to develop an appropriately sized BPN model for combining the available data in the training set and the testing set. The structural size of the model depends on the number of input nodes and the number of hidden nodes. There are no systematic reports on the choice of input and hidden nodes, yet different input and hidden nodes have a significant impact on the learning and prediction ability of the network.

As mentioned before, the purpose of GRA is to realize the relationship between two sets of time series data in a relational space. In the GBPN model, the input nodes of the neural network are usually the past, lagged observations and the more influential factors that will affect the sales amounts, and the output node is the real sales data. We expect to obtain an applicable GBPN forecasting model that has generalization and good forecasting capability.

4.2.3 GMFLN Forecasting Model
The MFLN incorporates basic input nodes, logarithmic input nodes, and exponential input nodes in the input layer to improve the forecasting ability and reduce the learning cycle time of the neural networks [12]. It is composed of one or two hidden layers that can represent continuous functions in a theoretical time series. In analogous models, the hidden nodes are used to capture the non-linear structures. Deciding how many hidden nodes should be used is another difficult issue in the neural network forecasting model construction process. In practice, the numbers of hidden nodes are chosen through experiments or by trial-and-error, without any theoretical basis to guide the decision.

Some theories suggest that more hidden nodes can increase the accuracy in approximating a functional relationship, but this can also cause the over-fitting problem. This problem is more likely to happen in the GMFLN model than in other statistical models. One solution to the over-fitting problem is to find a parsimonious model that fits the data well. Another way to tackle it is to divide the time series into three sets: training, testing and validation [21]. The first two sets are used for model building and the last is used for model validation or evaluation. The best GMFLN model is the one that gives the best results in the predicting set.

4.2.4 GELM Forecasting Model
The learning speed of ELM is faster than that of traditional classic gradient-based learning algorithms. This advantage has already been recognized in many further studies. In order to obtain higher prediction accuracy, we designed experiments with different activation functions and numbers of hidden nodes. In the GELM forecasting model, we compare the accuracy of the sigmoidal, sine and hardlim activation functions for different numbers of hidden nodes, selected from 20, 50, 100 and 200. The GELM forecasting model uses the same time series data and three indices (W, TAs and TBS) to predict the 120 days demand.

Table 2 shows the training time of GELM, GBPN and GMFLN. The GELM learning algorithm spent 0.3705 s of CPU time with the sigmoidal activation function and 200 hidden nodes. Traditional gradient-based learning algorithms such as GBPN and GMFLN cost much more training time compared with GELM.

Table 2: Training time (in seconds) of different algorithms

GELM
                       Activation function
Hidden nodes   Sigmoidal (Sig.)   Sine (Sin.)   Hardlim (Har.)
 20            0.0100             0.0100        0.0100
 50            0.0200             0.0200        0.0200
100            0.0801             0.0701        0.0801
200            0.3705             0.3805        0.3805

GBPN:  11.573 (50 hidden nodes)      GMFLN:  4.216 (50 hidden nodes)

Table 3 shows the performance of the GELM forecasting model with different activation functions and numbers of hidden nodes. More hidden nodes give a better ability to predict the sales amounts. The best forecasting results have a MAD of 0.07039 and an MSE of 0.00907, obtained with the sigmoidal activation function and 200 hidden nodes.

In the GELM model, the input weights and hidden biases are randomly chosen and the output weights are analytically determined by using the Moore-Penrose generalized inverse. In order to compare the training time and performance of the different activation functions, we ran each setting 30 times and applied paired t-tests to the obtained results to examine statistical significance. The paired t-test is a widely used method to examine whether the average difference in performance between two methods over various data sets is significantly different from zero. If the p-value generated by a paired t-test is lower than the significance level (0.05), the difference between the two methods is significant.
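The testing procedure can be illustrated with a short sketch. The MAD values below are synthetic stand-ins for the 30 recorded runs, and this stdlib-plus-NumPy version computes the paired t statistic directly and compares it against the two-sided 5% critical value rather than reporting a p-value:

```python
import numpy as np

# Hypothetical MAD scores from 30 paired runs of two activation functions.
rng = np.random.default_rng(7)
mad_sigmoid = rng.normal(0.070, 0.005, 30)
mad_sine    = rng.normal(0.077, 0.005, 30)

# Paired t statistic on the per-run differences; H0: mean difference == 0.
d = mad_sigmoid - mad_sine
t_stat = d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))

# |t| beyond the two-sided 5% critical value for df = 29 (about 2.045)
# rejects H0, which corresponds to a p-value below 0.05.
print(abs(t_stat) > 2.045)
```

Pairing by run matters: each difference compares the two activation functions under the same random draw, which removes run-to-run variation from the test.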

Table 3: Performance of different activation functions and hidden nodes

                                          Hidden nodes
Activation function   Criterion   20        50        100       200
Sigmoidal (Sig.)      MAD         0.13192   0.13032   0.11908   0.07039
                      MSE         0.03027   0.02789   0.02221   0.00907
Sine (Sin.)           MAD         0.15315   0.14177   0.12049   0.07691
                      MSE         0.04030   0.03113   0.02442   0.01000
Hardlim (Har.)        MAD         0.14310   0.12870   0.11234   0.07264
                      MSE         0.03548   0.02581   0.02126   0.00998
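The MAD and MSE criteria reported throughout Tables 3-7 can be stated in a few lines; the five-point series below is a toy example, not the paper's data:

```python
import numpy as np

def mad(actual, predicted):
    """Mean absolute deviation between actual and predicted series."""
    return np.mean(np.abs(np.asarray(actual) - np.asarray(predicted)))

def mse(actual, predicted):
    """Mean squared error between actual and predicted series."""
    return np.mean((np.asarray(actual) - np.asarray(predicted)) ** 2)

# Toy check on a 5-point series.
actual    = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
print(round(mad(actual, predicted), 3))  # 0.12
print(round(mse(actual, predicted), 3))  # 0.02
```

MAD penalizes all errors linearly, while MSE weights large misses more heavily, which is why the two criteria can rank models differently.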

Table 4 shows the training time of the GELM with different activation functions and different numbers of hidden nodes. There is no significant difference when the number of hidden nodes is 20 or 50. With 100 hidden nodes, the training time of the hardlim activation function differs significantly from both the sigmoidal and the sine activation functions, while the sigmoidal and sine activation functions show no difference. With 200 hidden nodes, all three activation functions differ significantly: the hardlim activation function is better than the sigmoidal activation function, and the sigmoidal activation function is better than the sine activation function.

Table 5 shows the MAD of the GELM with different activation functions and different numbers of hidden nodes. The p-value between the sigmoidal and sine activation functions is always lower than 0.05, which means there is a significant difference between these two activation functions, and the performance of the sigmoidal activation function is always better than that of the sine activation function. With 20, 50 and 100 hidden nodes, the p-values between the sigmoidal and hardlim activation functions are lower than 0.05, so the sigmoidal activation function is significantly better than the hardlim activation function. With 20, 100 and 200 hidden nodes, the p-values between the sine and hardlim activation functions are lower than 0.05, so the hardlim activation function is significantly better than the sine activation function.

Table 4: Paired t-test of training time between different activation functions

                                     Paired Differences
Hidden                                        95% CI of the Difference
nodes   Methods      Mean       StDev      Lower       Upper          t     P value
20      Sig.-Sin.    0.00020    0.00142    -0.00033    0.00073     0.77     0.448
        Sig.-Har.    0.00000    0.00158    -0.00059    0.00059     0.00     1.000
        Sin.-Har.   -0.00020    0.00110    -0.00061    0.00021    -1.00     0.326
50      Sig.-Sin.    0.00000    0.00455    -0.00170    0.00170     0.00     1.000
        Sig.-Har.    0.00000    0.00643    -0.00240    0.00240     0.00     1.000
        Sin.-Har.    0.00000    0.00643    -0.00240    0.00240     0.00     1.000
100     Sig.-Sin.   -0.00200    0.00610    -0.00428    0.00028    -1.80     0.083
        Sig.-Har.    0.00400    0.00498     0.00214    0.00586     4.40     0.000*
        Sin.-Har.    0.00600    0.00498     0.00414    0.00786     6.60     0.000*
200     Sig.-Sin.   -0.00900    0.01062    -0.01297   -0.00503    -4.64     0.000*
        Sig.-Har.    0.01702    0.01647     0.01087    0.02317     5.66     0.000*
        Sin.-Har.    0.02602    0.01306     0.02115    0.03089    10.92     0.000*

* p-value below the 0.05 significance level.

Table 5: Paired t-test results of predicting between different activation functions in MAD

                                     Paired Differences
Hidden                                        95% CI of the Difference
nodes   Methods      Mean       StDev      Lower       Upper          t     P value
20      Sig.-Sin.   -0.00735    0.00756    -0.01017   -0.00452    -5.32     0.000*
        Sig.-Har.   -0.00272    0.00435    -0.00434   -0.00110    -3.43     0.002*
        Sin.-Har.    0.00463    0.00832     0.00152    0.00773     3.05     0.005*
50      Sig.-Sin.   -0.00538    0.00971    -0.00901   -0.00176    -3.04     0.005*
        Sig.-Har.   -0.00413    0.00925    -0.00758   -0.00068    -2.45     0.021*
        Sin.-Har.    0.00125    0.01081    -0.00278    0.00529     0.64     0.530
100     Sig.-Sin.   -0.00837    0.00620    -0.01068   -0.00605    -7.40     0.000*
        Sig.-Har.   -0.00378    0.00466    -0.00552   -0.00203    -4.43     0.000*
        Sin.-Har.    0.00459    0.00737     0.00184    0.00733     3.42     0.002*
200     Sig.-Sin.   -0.00485    0.00532    -0.00684   -0.00286    -4.99     0.000*
        Sig.-Har.    0.01204    0.04376    -0.00430    0.02838     1.51     0.143
        Sin.-Har.    0.01689    0.04478     0.00017    0.03361     2.07     0.048*

* p-value below the 0.05 significance level.

Table 6 shows the MSE of the GELM with different activation functions and different numbers of hidden nodes. The p-values between the sigmoidal and sine activation functions are always lower than 0.05, which means there is a significant difference between these two activation functions, and the performance of the sigmoidal activation function is always better than that of the sine activation function. With 50 and 100 hidden nodes, the p-values between the sigmoidal and hardlim activation functions are lower than 0.05, so the sigmoidal activation function is significantly better than the hardlim activation function. With 20 and 200 hidden nodes, the p-values between the sine and hardlim activation functions are lower than 0.05, so the hardlim activation function is significantly better than the sine activation function. From the above results, the sigmoidal activation function differs significantly from the sine activation function, but there is no consistent significant difference between sigmoidal vs. hardlim or sine vs. hardlim.

4.3 Discussion
Table 7 presents the results of the different forecasting models. The best GARCH model has a MAD of 0.13876 and an MSE of 0.03191. The best GBPN model has a MAD of 0.09837 and an MSE of 0.01979. The best GMFLN model has a MAD of 0.08911 and an MSE of 0.01883. The best GELM model has a MAD of 0.07039 and an MSE of 0.00907. The GELM forecasting model we proposed has the smallest predicting errors, and its learning speed is extremely fast compared with the others.

Table 6: Paired t-test results of predicting between different activation functions in MSE

                                     Paired Differences
Hidden                                        95% CI of the Difference
nodes   Methods      Mean       StDev      Lower       Upper          t     P value
20      Sig.-Sin.   -0.00321    0.00316    -0.00439   -0.00203    -5.56     0.000*
        Sig.-Har.   -0.00051    0.00174    -0.00116    0.00014    -1.62     0.116
        Sin.-Har.    0.00269    0.00072     0.00122    0.00417     3.74     0.001*
50      Sig.-Sin.   -0.00184    0.00329    -0.00307   -0.00061    -3.07     0.005*
        Sig.-Har.   -0.00176    0.00280    -0.00280   -0.00071    -3.43     0.002*
        Sin.-Har.    0.00009    0.00375    -0.00131    0.00149     0.13     0.900
100     Sig.-Sin.   -0.00213    0.00250    -0.00306   -0.00120    -4.67     0.000*
        Sig.-Har.   -0.00138    0.00167    -0.00200   -0.00075    -4.52     0.000*
        Sin.-Har.    0.00075    0.00246    -0.00016    0.00167     1.68     0.104
200     Sig.-Sin.   -0.00083    0.00106    -0.00123   -0.00044    -4.29     0.000*
        Sig.-Har.    0.00168    0.00578    -0.00047    0.00384     1.60     0.121
        Sin.-Har.    0.00252    0.00597     0.00029    0.00475     2.31     0.028*

* p-value below the 0.05 significance level.

Table 7: The compared results of different forecasting models

Model type                         Model    MAD       MSE       Training time (s)
Statistical time series model      GARCH    0.13876   0.03191
Artificial neural network model    GBPN     0.09837   0.01979   11.573
                                   GMFLN    0.08911   0.01883    4.216
                                   GELM     0.07039   0.00907    0.3705

5. CONCLUSIONS

Recently, many researchers and industrial managers have become interested in applying data mining and artificial intelligence algorithms to deal with routine problems. Sales forecasting plays a more and more important role in the operating management of commercial enterprises, especially in the retail industry. In this paper, we present a relatively novel neural network methodology, Grey relation analysis integrated with the extreme learning machine (GELM), to construct a forecasting model for fresh food. The proposed GELM model has the following major characteristics:
(1) This study applied GRA, a problem-solving method used when dealing with similarity measures of complex relations. The main purpose of GRA in this model is to realize the relationship between two sets of time series data in the relational space and to sieve out the more influential factors as the input data to the ELM.
(2) The learning speed of GELM is extremely fast compared with GBPN and GMFLN. The learning phase of GELM can be completed in less than a second across the different activation functions and numbers of hidden nodes.
(3) The proposed GELM has better generalization performance than gradient-based algorithms such as GBPN and GMFLN.
(4) The GELM method can avoid many harmful issues that occur in traditional gradient-based algorithms, such as stopping criteria, local minima, improper learning rates and over-fitting problems.
(5) The GELM tends to reach solutions directly, without trivial side issues, and looks much simpler than most feed-forward neural network algorithms.

The experimental results demonstrated that the effectiveness of the GELM was superior to that of the other forecasting models. In summary, this research provides the following contributions to practical forecasting problems in the retail industry.
(1) Influential factor selection: Grey relation analysis (GRA) is able to identify the appropriate factors for forecasting future values. These influential factors can be elucidated and incorporated into the input data.
(2) Forecasting efficiency: The efficiency of GELM is better than that of the GBPN or GMFLN methods. Because the demand for fresh food fluctuates frequently, the faster learning speed can provide timely and frequent forecasting results for the manager's reference. When the number of hidden nodes is larger, the learning speed of the hardlim activation function is better than that of the sigmoidal activation function, and the sigmoidal activation function is better than the sine activation function.
(3) Forecasting performance: This research applies several forecasting models as comparison benchmarks. According to the results, the GELM model has smaller MAD and MSE than the GARCH, GBPN, and GMFLN models.

Therefore, GELM is a valid and effective forecasting tool that can be further applied in similar fields of application.

Examining the performance of the different activation functions by paired t-tests, the sigmoidal activation function differs significantly from the sine activation function on both the MAD and MSE criteria.

In this paper, our experiments have successfully demonstrated that the GELM can be well employed in sales forecasting for the retail industry. It not only provides smaller predicting errors but also improves the training speed more than other forecasting models. Future research will focus on the different temperature levels of fresh food in the retail industry and on improving the stability and learning speed of the GELM model.

REFERENCES

1. Aburto, L. and Weber, R., 2007, Improved supply chain management based on hybrid demand forecasts, Applied Soft Computing, Vol. 7, No. 1, pp. 126-144.
2. Akaike, H., 1974, A new look at the statistical model identification, IEEE Transactions on Automatic Control, Vol. 19, No. 6, pp. 716-723.
3. Ali, Ö. G., Sayin, S., Woensel, T. V. and Fransoo, J., 2009, SKU demand forecasting in the presence of promotions, Expert Systems with Applications, Vol. 36, No. 10, pp. 12340-12348.
4. Alon, I., Qi, M. and Sadowski, R. J., 2001, Forecasting aggregate retail sales: A comparison of artificial neural networks and traditional methods, Journal of Retailing and Consumer Services, Vol. 8, No. 3, pp. 147-156.
5. Ansuj, A. P., Camargo, M. E., Radharamanan, R. and Petry, D. G., 1996, Sales forecasting using time series and neural networks, Computers and Industrial Engineering, Vol. 31, No. 1-2, pp. 421-424.
6. Au, K. F., Choi, T. M. and Yu, Y., 2008, Fashion retail forecasting by evolutionary neural networks, International Journal of Production Economics, Vol. 114, No. 2, pp. 615-630.
7. Bigus, J. P., 1996, Data Mining with Neural Networks: Solving Business Problems - From Application Development to Decision Support, McGraw-Hill, New York.
8. Bollerslev, T., 1986, Generalized autoregressive conditional heteroskedasticity, Journal of Econometrics, Vol. 31, No. 3, pp. 307-327.
9. Box, G. E. P. and Jenkins, G. M., 1976, Time series analysis forecasting and control, Management Science, Vol. 17, No. 4, pp. 141-164.
10. Chakraborty, K., Mehrotra, K. and Mohan, C. K., 1992, Forecasting the behavior of multivariate time series using neural networks, Neural Networks, Vol. 5, No. 6, pp. 961-970.
11. Chang, P. C. and Wang, Y. W., 2006, Fuzzy Delphi and back-propagation model for sales forecasting in PCB industry, Expert Systems with Applications, Vol. 30, No. 4, pp. 715-726.
12. Chen, F. L. and Ou, T. Y., 2009, Grey relation analysis and multilayer function link network sales forecasting model for perishable food in convenience store, Expert Systems with Applications, Vol. 36, No. 3, pp. 7054-7063.
13. Chu, C. W. and Zhang, G. P., 2003, A comparative study of linear and nonlinear models for aggregate retail sales forecasting, International Journal of Production Economics, Vol. 86, No. 3, pp. 217-231.
14. Deng, J. L., 1982, Control problems of Grey systems, Systems & Control Letters, Vol. 1, No. 4, pp. 288-294.
15. Doganis, P., Alexandridis, A., Patrinos, P. and Sarimveis, H., 2006, Time series sales forecasting for short shelf-life food products based on artificial neural networks and evolutionary computing, Journal of Food Engineering, Vol. 75, No. 2, pp. 196-204.
16. Engle, R. F., 1982, Autoregressive conditional heteroskedasticity with estimates of the variance of U.K. inflation, Econometrica, Vol. 50, No. 4, pp. 987-1008.
17. Frank, C., Garg, A., Sztandera, L. and Raheja, A., 2003, Forecasting women's apparel sales using mathematical modeling, International Journal of Clothing Science and Technology, Vol. 15, No. 2, pp. 107-125.
18. Huang, G. B., 2003, Learning capability and storage capacity of two-hidden-layer feedforward networks, IEEE Transactions on Neural Networks, Vol. 14, No. 2, pp. 274-281.
19. Huang, G. B., Zhu, Q. Y. and Siew, C. K., 2006, Extreme learning machine: Theory and applications, Neurocomputing, Vol. 70, No. 1-3, pp. 489-501.
20. Huang, S. T., Chiu, N. H. and Chen, L. W., 2008, Integration of grey relational analysis with genetic algorithm for software effort estimation, European Journal of Operational Research, Vol. 188, No. 3, pp. 898-909.
21. Kaastra, I. and Boyd, M., 1996, Designing a neural network for forecasting financial and economic time series, Neurocomputing, Vol. 10, No. 3, pp. 215-236.
22. Kuo, R. J. and Chen, J. A., 2004, A decision support system for order selection in electronic commerce based on fuzzy neural network supported by real-coded genetic algorithm, Expert Systems with Applications, Vol. 26, No. 2, pp. 141-154.
23. Kuo, R. J., 2001, A sales forecasting system based on fuzzy neural network with initial weights generated by genetic algorithm, European Journal of Operational Research, Vol. 129, No. 3, pp. 496-517.
24. Lachtermacher, G. and Fuller, J. D., 1995, Back-propagation in time series forecasting, Journal of Forecasting, Vol. 14, No. 4, pp. 381-393.
25. Lai, H. H., Lin, Y. C. and Yeh, C. H., 2005, Form design of product image using grey relational analysis and neural network models, Computers & Operations Research, Vol. 32, No. 10, pp. 2689-2711.
26. Leigh, W., Purvis, R. and Ragusa, J. M., 2002, Forecasting the NYSE composite index with technical analysis, pattern recognizer, neural network, and genetic algorithm: A case study in romantic decision support, Decision Support Systems, Vol. 32, No. 4, pp. 361-377.
27. LeVee, G. S., 1993, The key to understanding the forecasting process, Journal of Business Forecasting, Vol. 11, No. 4, pp. 12-16.
28. Schwarz, G., 1978, Estimating the dimension of a model, Annals of Statistics, Vol. 6, No. 2, pp. 461-464.
29. Serre, D., 2002, Matrices: Theory and Applications, Springer, New York.
30. Sun, Z. L., Choi, T. M., Au, K. F. and Yu, Y., 2008, Sales forecasting using extreme learning machine with applications in fashion retailing, Decision Support Systems, Vol. 46, No. 1, pp. 411-419.
31. Sztandera, L. M., Frank, C. and Vemulapali, B., 2004, Predicting women's apparel sales by soft computing, Lecture Notes in Artificial Intelligence, Vol. 3070, pp. 1193-1198.
32. Tang, X. and Han, M., 2009, Partial Lanczos extreme learning machine for single-output regression problems, Neurocomputing, Vol. 72, No. 13-15, pp. 3066-3076.
33. Tang, Z., Almeida, C. and Fishwick, P. A., 1991, Time series forecasting using neural networks vs. Box-Jenkins methodology, Simulation, Vol. 57, No. 5, pp. 303-310.
34. Van der Vorst, J. G. A. J., Beulens, A. J. M., De Wit, W. and Van Beek, P., 1998, Supply chain management in food chains: Improving performance by reducing uncertainty, International Transactions in Operational Research, Vol. 5, No. 6, pp. 487-499.
35. Weigend, A. S., Rumelhart, D. E. and Huberman, B. A., 1991, Generalization by weight-elimination with application to forecasting, Advances in Neural Information Processing Systems, Vol. 3, pp. 875-882.
36. Zhang, G. P., 2001, An investigation of neural networks for linear time-series forecasting, Computers and Operations Research, Vol. 28, No. 12, pp. 1183-1202.
37. Zhang, G. P., 2003, Time series forecasting using a hybrid ARIMA and neural network model, Neurocomputing, Vol. 50, pp. 159-175.

ABOUT THE AUTHORS

Fei-Long Chen is a Professor of Industrial Engineering and Engineering Management at National Tsing-Hua University (NTHU), Hsinchu, Taiwan. He received the B.S. degree in Industrial Engineering from National Tsing-Hua University, Taiwan, in 1982, and the M.S. and Ph.D. degrees in Industrial Engineering from Auburn University, USA, in 1988 and 1991, respectively. His current research interests include statistical process control, total quality management, six sigma, engineering data analysis, enterprise integration, enterprise resource planning, and global logistics management. He is currently on temporary transfer to Liteon Corp., where he serves as the Dean of the IE Academy.

Tsung-Yin Ou is currently a Ph.D. candidate in the Department of Industrial Engineering and Engineering Management at National Tsing Hua University, Taiwan. He received his B.S. degree from National Chiao Tung University and his M.S. degree from Tunghai University in Taichung. He is also an engineer in the IE Department of China Steel Corporation, Taiwan. His research interests include data mining, operations management and ERP.

(Received September 2009, revised December 2009, accepted December 2009)

APPENDIX

Appendix 1: Moore-Penrose Generalized Inverse
The resolution of a general linear system Ax = y, where A may be singular and may even not be square, can be made very simple by the use of the Moore-Penrose generalized inverse [29].
Definition 1: A matrix G of order n × m is the Moore-Penrose generalized inverse of a matrix A of order m × n, if

AGA = A,  GAG = G,  (AG)^T = AG,  (GA)^T = GA    (14)

For the sake of convenience, the Moore-Penrose generalized inverse of matrix A will be denoted by A⁺.

Appendix 2: Minimum Norm Least-Square Solutions of a General Linear System
For the general linear system Ax = y, we say that x̂ is a least-square solution if

‖Ax̂ - y‖ = min_x ‖Ax - y‖, where ‖·‖ is a norm in Euclidean space    (15)
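NumPy exposes the Moore-Penrose inverse directly, so the four Penrose conditions of Appendix 1 and the least-square property of Appendix 2 can be checked numerically on a small example (illustrative matrices, not taken from the paper):

```python
import numpy as np

# A general linear system Ax = y with a non-square A:
# three equations, two unknowns (overdetermined).
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
y = np.array([1.0, 2.0, 3.0])

G = np.linalg.pinv(A)     # Moore-Penrose generalized inverse A+
x = G @ y                 # minimum norm least-square solution

# Verify the four Penrose conditions numerically.
print(np.allclose(A @ G @ A, A))        # AGA = A
print(np.allclose(G @ A @ G, G))        # GAG = G
print(np.allclose((A @ G).T, A @ G))    # (AG)^T = AG
print(np.allclose((G @ A).T, G @ A))    # (GA)^T = GA

# x minimizes ||Ax - y||; it matches the direct least-squares solver.
x_ls, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.allclose(x, x_ls))
```

This is exactly the solve used for the ELM output weights: the hidden-layer output matrix H plays the role of A and the observed sales play the role of y.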