
CHAPTER 6

ARTIFICIAL NEURAL NETWORKS: THEORY AND APPLICATIONS

6.1 Introduction

As discussed in Chapter 4, the relationship between TFP growth and the factors (variables) affecting it is complex. First, the factors affecting TFP growth are intricately interrelated. Second, the relationship between these factors and TFP growth may be non-linear. This suggests that a conventional statistical forecasting model is not capable of modelling such an arbitrarily complex non-linear mapping. In the last decade, it has been widely recognized that Artificial Neural Networks (ANNs) are superior to traditional statistical models when the relationship between output and input variables is implicit, complex and nonlinear. For this reason, this study applies ANN technology to develop the forecasting model, and it is therefore necessary to review the theory and applications of ANNs here.

Section 6.2 provides an overview of ANNs, including the definition of ANNs, areas of application and the advantages of ANNs over traditional mathematical methods. Section 6.3 reviews the applications of ANNs in the area of construction management and economics. Basic ANN components and theories, such as artificial neural systems, Processing Elements (PEs), topology, threshold functions, learning rules and convergence rules, are explained with illustrations in Section 6.4.

Section 6.5 discusses the overfitting problem and regularization. The overfitting problem, the most common problem encountered by ANNs when the dataset is small, is explained in Section 6.5.1. Regularization, a technique applied to overcome overfitting, is discussed in Section 6.5.2.

Bayesian neural networks (BNNs), the combination of the well-established Bayesian Regularization and neural networks, are discussed in Section 6.6. A review of the applications of BNNs is provided in Section 6.6.1. Section 6.6.2 explains the theory of BNNs, focusing on the main objective, that is, to optimise the regularization parameters.

Section 6.7 reviews the applications of ANNs to time-series forecasting. It evaluates the advantages and disadvantages of different neural network models applied to time-series forecasting, including feedforward neural networks, recurrent networks, evolutionary ANNs, neuro-fuzzy networks, neuro-wavelet networks and Bayesian neural networks.

Section 6.8 explains how to carry out empirical ANN modelling. It concentrates on the know-how of developing a multilayer feedforward network. This involves the design of the architecture of the multilayer feedforward network and the selection of the transfer function, training algorithm, data normalization method, training and testing samples, and performance function.

Justification for the choice of ANNs, in particular Bayesian neural networks (BNNs), to predict TFP growth is provided in Section 6.9.

6.2 An overview of ANNs

In the last decade, Artificial Intelligence (AI) techniques such as Artificial Neural Networks (ANNs) have received a great deal of attention. In essence, an ANN is an information technology that mimics the human brain and nervous system: it learns from experience and generalizes from previous examples to generate new outputs by abstracting essential characteristics from inputs and encoding them in the pattern of variable interconnection weights among the processing elements. ANNs are more powerful than traditional methods in situations where the problem requires qualitative or complex quantitative reasoning that conventional statistical and mathematical methods handle inadequately, or where the parameters are highly interdependent and the data are intrinsically noisy, incomplete or error-prone (Bailey and Thompson, 1990).

ANNs have many advantages over traditional methods of modelling. Firstly, as opposed to traditional mathematical and statistical methods, ANNs are data-driven, self-adaptive methods, which can capture subtle functional relationships among the data even if the underlying relationships are unknown or hard to describe. Secondly, ANNs are able to capture complex non-linear relationships with better accuracy (Rumelhart et al., 1994). Thirdly, the most important advantage of ANNs over mathematical and statistical models is their adaptability: ANN systems can automatically adjust their weights to optimise their behaviour (Boussabaine, 1996). Neural networks have been utilized for classification, clustering, vector quantization, pattern association, function approximation, control, optimisation and search.

6.3 Applications of ANNs in the construction industry

Moselhi et al. (1991) discussed the potential applications of ANNs in the construction industry in the early 1990s and, in 1996, Boussabaine (1996) reviewed the use of ANNs in construction management. So far, ANNs have been used for prediction, risk analysis, decision-making, resource optimization, classification and selection.

The most common application of ANNs in the construction management area is

prediction. ANNs have been applied to predict tender bids (Gaarslev, 1991; McKim,

1993; Li and Love, 1999), construction cost (Williams, 1994, 2002; Adeli and Wu,

1998; Hegazy and Ayed, 1998; Emsley, 2002), construction budget performance

(Chua et al., 1997), project cash flow (Boussabaine and Kaka, 1998), construction

demand (Goh 1996; 2000), labour productivity (Chao and Skibniewski, 1994; Portas

and AbouRizk, 1997; Savin and Fazio, 1998; AbouRizk et al., 2001), earthmoving

operation (Shi, 1999), the acceptability of a new technology (Chao and Skibniewski,

1995), organizational effectiveness (Sinha and Mckim, 2001), contractor

prequalification (Lam et al., 2001) and hoisting time of tower cranes (Tam et al., 2002).

Multi-layer feedforward networks and the Backpropagation (BP) training algorithm were the most popular topology and learning method for prediction. However, several other neural networks were developed to cope with different data problems. The regularization neural network was used by Adeli and Wu (1998) to deal with the noise in highway construction costs. The regularization neural network has advantages over BP in that the result of the estimation depends only on the training examples and that it can overcome the overfitting problem. When the predicted dependent variables are subject to uncertainty and based on subjective judgement, fuzzy neural network (FNN) models, which combine fuzzy set and neural network techniques, have been developed to improve the objectivity of the prediction. Successful applications of FNNs include those by Portas and AbouRizk (1997), Lam et al. (2001) and AbouRizk (2001). Their studies reveal the benefit of FNN models over the general feedforward neural network (GFNN) in producing more accurate models.

However, the selection of an appropriate topology for a multilayer network used to be conducted by trial and error. To automate the search for an optimal architecture for ANNs, one solution was to combine genetic algorithms (GAs) with neural networks (Goh, 2000). GAs are artificial intelligence search methods based on the theories of genetics and natural selection developed by Holland (1975). The combined technique was found to be able to produce more accurate forecasts than the ANN technique alone.

Another important application of ANNs in construction management is optimisation. So far, two types of optimisation algorithms have been used to find a global minimum in order to avoid the local minima that NNs are prone to. One is GAs and the other is simulated annealing (SA). Yeh (1995) employed SA and a Hopfield neural network to optimise construction-site layout. SA is a probabilistic hill-climbing search

algorithm which can find a global minimum of the performance function by combining

gradient descent with a random process. However, the drawback of the SA is that it is

very slow. Contrasted with the SA, GAs are less susceptible to being stuck at the local

minimum and can quickly locate high performance regions in extremely complex

search spaces. GAs have three major applications: to optimise weights in NNs; to

specify the topology for NNs; and to select optimum smoothing factors for adaptive

probabilistic neural networks (APNNs). Hegazy and Ayed (1998) applied GAs to

optimise the network weights when developing a parametric cost-estimating model for

highway projects. Goh (2000) used GAs to seek the optimum architecture of NNs.

Sawhney and Mund (2001) used GAs to select optimum smoothing factors in APNNs

to develop an integrated crane type and model selection system.

For classification or selection, a multilayer neural network was used by Cheung et al. (2000) to conduct project dispute satisfaction classification. Sawhney and Mund (2001) used APNNs based on the Bayesian classifier method to conduct crane type and model selection. APNNs can model any non-linear function using a single hidden layer with as many PEs as there are training cases.

6.4 Basic Concepts of ANNs

An artificial neural network is a computational model defined by four parameters: type

of neurons, connection architecture, learning algorithm and recall algorithm (Mehrotra,

et al., 1997).

6.4.1 Artificial neural systems

An ANN is an information processing technology that simulates the human brain and nervous system. It is built on three basic components: processing elements (PEs), which are an artificial model of the human neuron; interconnections, whose function is similar to that of the axon; and synapses, which are the junctions where an interconnection
meets a PE. Each PE receives signals from other PEs that constitute an input pattern.

This input pattern stimulates the PE to reach some level of activity. If the activity is

strong enough, the PE generates a single output signal that is transmitted to other PEs

through an interconnection.

6.4.2 Processing elements

Figure 6.1 describes a typical artificial neuron. The input signals come from either the environment or the outputs of other PEs and form an input vector:

$A = (a_1, \ldots, a_i, \ldots, a_n)$  (6-1)

where $a_i$ is the activity level of the $i$th PE or input. There are weights attached to the input connections: $w_{1j}, w_{2j}, \ldots, w_{nj}$. The neuron has a bias $b_j$. The sum of the weighted inputs and the bias forms the net input signal $X$:

$X = \sum_{i=1}^{n} w_{ij} a_i + b_j = W \cdot A + b_j$  (6-2)

The net input signal is then sent to a transfer function, which serves as a non-linear threshold. The transfer function calculates the output signal of PE(j) as:

$O_j = f(X)$  (6-3)

where $O_j$ is the output signal from PE(j); $f$ is a transfer function; and $X$ is the net input signal to PE(j).
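A minimal sketch of equations (6-2) and (6-3) for a single processing element is given below, assuming a log-sigmoid transfer function; the input, weight and bias values are illustrative and not taken from the text.

```python
import numpy as np

def processing_element(a, w, b):
    """Compute the output of one PE: O = f(sum_i w_i * a_i + b)."""
    x = np.dot(w, a) + b             # net input signal X (equation 6-2)
    return 1.0 / (1.0 + np.exp(-x))  # log-sigmoid transfer function f (equation 6-3)

a = np.array([0.5, -0.2, 0.8])       # input vector A = (a1, a2, a3)
w = np.array([0.1, 0.4, -0.3])       # connection weights
b = 0.05                             # bias
print(processing_element(a, w, b))
```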

[Figure 6.1 A generic processing element: inputs $a_1, \ldots, a_n$ weighted by $w_{1j}, \ldots, w_{nj}$, bias $b_j$, and output $O_j = f(\sum_i a_i w_{ij} + b_j)$.]

6.4.3 Threshold functions

There are many threshold functions adopted in ANNs. The two most commonly used transfer functions are the linear and the sigmoid functions.

The linear threshold function:

$f(x) = x$  (6-4)

The sigmoid functions. The Log-Sigmoid and Tan-Sigmoid transfer functions are commonly used in backpropagation networks, partly because in backpropagation it is important to be able to calculate the derivatives of any transfer function used (Demuth and Beale, 2000). They can be expressed as the following equations:

logistic function: $f(x) = (1 + e^{-x})^{-1}$  (6-5)

hyperbolic tangent: $f(x) = \tanh(x)$  (6-6)
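The three transfer functions in equations (6-4) to (6-6) can be written directly; the sketch below is a hedged illustration and the function names are descriptive labels, not toolbox identifiers.

```python
import numpy as np

def linear(x):
    return x                          # linear threshold function, equation (6-4)

def logsig(x):
    return 1.0 / (1.0 + np.exp(-x))   # logistic (Log-Sigmoid), equation (6-5)

def tansig(x):
    return np.tanh(x)                 # hyperbolic tangent (Tan-Sigmoid), equation (6-6)

x = np.linspace(-3, 3, 7)
print(linear(x), logsig(x), tansig(x), sep="\n")
```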

6.4.4 Architecture of ANNs

Architecture of an ANN is the organisation that assembles PEs into layers and links

them with weighted interconnections. The architecture determines how computations

proceed. A common ANN architecture is determined by three distinguishing

characteristics: connection types, connection schemes and layer configurations.

The most commonly used ANN paradigm is the multilayer perceptron (MLP). An MLP consists of an input layer, at least one hidden layer, and one output layer. The neurons in each layer are usually fully connected to the neurons in the adjacent layers. Among MLPs, the three-layer feedforward network is the most popular. A feedforward network is a type of network in which connections are allowed from a node in layer i only to nodes in layer i + 1. The three layers are the input layer, the hidden layer and the output layer. The input layer receives input signals from the environment, the output layer emits signals to the environment, and hidden layers are the layers between the input and output layers.

6.4.5 Learning rules

Learning makes it possible to modify behaviour in response to the environment. A learning rule is a procedure for modifying the weights of the connections between the nodes and the biases of a network. There are three broad learning categories: supervised learning, unsupervised learning and reinforcement learning.

6.4.6 Convergence

Convergence is the eventual minimization of the error between the desired and computed PE outputs. One common convergence criterion is convergence in the mean-square sense:

$\lim_{n \to \infty} E\{\| x_n - x \|^2\} = 0$  (6-7)

where $E\{\cdot\}$ denotes the expected value.

6.5 Overfitting problems and regularization

As stated before, BP is the most commonly used ANN learning technique. Standard backpropagation is a gradient descent algorithm in which the network weights and biases are modified in the direction in which the performance function decreases most rapidly. Multilayer feedforward networks with BP are capable of performing arbitrary linear or multivariate non-linear computations and can approximate any continuous function to a desired accuracy. However, the BP algorithm is slow to converge and may cause overfitting. To speed up the BP training process, several faster BP variants have been developed. Among them, the Levenberg-Marquardt algorithm generally has the fastest convergence and is able to obtain lower mean squared errors than other algorithms for function approximation problems (Demuth and Beale, 2000).
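The core of the gradient descent idea mentioned above can be illustrated with a minimal sketch: the parameters are repeatedly moved a small step in the direction of steepest decrease of a performance function. The quadratic performance function and the learning rate below are hypothetical placeholders, not the network's actual error surface.

```python
import numpy as np

def gradient_descent_step(w, grad_F, learning_rate=0.1):
    """One BP-style update: w_new = w - eta * dF/dw."""
    return w - learning_rate * grad_F(w)

# Toy performance function F(w) = ||w||^2, whose gradient is 2w.
grad_F = lambda w: 2.0 * w
w = np.array([1.0, -2.0])
for _ in range(50):
    w = gradient_descent_step(w, grad_F)
print(w)  # approaches the minimum at the origin
```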

6.5.1 Overfitting problems

The goal of neural network training is to minimize the errors while ensuring that the trained neural network responds properly when presented with new inputs. Overfitting is a phenomenon whereby the neural network has memorized the training examples so that it fails to generalize to new situations. Overfitting may occur when the data set for training is small. The larger the network, the more complex the functions it can create. However, the more complex the network, the more likely it is to mistakenly model the noise in the data as part of the non-linear relationship, leading to over-fitting the data. One solution to overfitting is to add more training examples. However, it is difficult to know how large the network should be for a specific application, and it is also difficult to overcome overfitting if the training examples are in limited supply. Fortunately, there are two other useful techniques to overcome this problem: early stopping and regularization.

According to Sarle (1995) and Demuth and Beale (2000), when conducting function approximation training, Bayesian Regularization provides better generalization performance than early stopping. This is because, unlike early stopping, which separates validation data from the training data, Bayesian Regularization uses all the data for training. When the size of the data set is small or there is little noise in the data set, the advantage of Bayesian Regularization over early stopping is even more marked: their experiments show that, on average, the MSE obtained from Bayesian Regularization is only around one fifth of that of early stopping. Therefore, this study will apply Bayesian Regularization to avoid over-fitting.

6.5.2 Regularization

Regularization improves generalization by constraining the size of the network weights. When the weights are small, the network response will be smooth. According to Foresee and Hagan (1997), with regularization, any modestly oversized network should be able to sufficiently represent the true function.

The typical performance function used for training a multilayer feedforward network is the mean sum of squares of the network errors (MSE):

$F = \mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}(t_i - o_i)^2$  (6-8)

To improve generalization, the performance function is modified by adding a term that consists of the mean of the sum of squares of the network weights and biases (MSW):

$\mathrm{MSEreg} = \gamma\,\mathrm{MSE} + (1-\gamma)\,\mathrm{MSW}$, with $\mathrm{MSW} = \frac{1}{n}\sum_{j=1}^{n} w_j^2$  (6-9)

where $\gamma$ is the performance ratio; MSEreg is the performance function for regularization; MSE is the mean sum of squares of the network errors; and MSW is the mean of the sum of squares of the network weights and biases.
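A minimal sketch of the regularized performance function in equations (6-8) and (6-9) is shown below, assuming a performance ratio gamma in [0, 1]; the target, output and weight arrays are illustrative only.

```python
import numpy as np

def mse_reg(targets, outputs, weights, gamma=0.9):
    mse = np.mean((targets - outputs) ** 2)   # equation (6-8)
    msw = np.mean(weights ** 2)               # mean of the squared weights
    return gamma * mse + (1.0 - gamma) * msw  # equation (6-9)

t = np.array([1.0, 0.5, -0.3])                # targets
o = np.array([0.9, 0.6, -0.1])                # network outputs
w = np.array([0.2, -0.7, 1.1, 0.05])          # network weights and biases
print(mse_reg(t, o, w))
```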

The improved performance function will cause the network to have smaller weights and biases and, hence, will result in a smoother network response that is less likely to overfit. However, it is difficult to determine the optimum value for the performance

ratio parameter. To overcome this difficulty, MacKay (1992) introduced Bayesian Regularization. In this technique, the weights and biases of the network are assumed to

be random variables with specific distributions. The regularization parameters are

related to the unknown variances associated with these distributions. Then statistical

techniques can be used to estimate these parameters. Application of the Bayesian

Regularization will be discussed in the next section.

6.6 Bayesian Neural Networks

Bayesian Neural Networks (BNNs) are a combination of Bayesian rules and neural networks used to automatically determine the optimal regularization parameters. MacKay (1992) was the first to introduce the Bayesian approach to neural network training and to optimize regularization based on a Gaussian approximation. Neal (1993) adopted the Monte Carlo method as the computational technique to implement Bayesian neural nets. Foresee and Hagan (1997) used a Gauss-Newton approximation to the Hessian matrix and the Levenberg-Marquardt algorithm to implement Bayesian Regularization to train feedforward neural networks.

6.6.1 Applications of BNNs

BNNs have been utilized in many areas, but not yet in construction. A BNN model was used by Cool et al. (1997) for predicting yield and ultimate tensile strength in welds. Cherian et al. (2000) used BNNs to predict the mechanical properties of ferrous powder materials, and the model was found to produce good prediction accuracy. A BNN-based model for determining the main particulars of a ship at the initial design stage is described by Clausen et al. (2001). BNNs have also been used by Somers (2001) in assessing nonlinearities in the relationship between work attitudes and job performance. Aminian (2001) developed an analog circuit fault diagnostic system applying BNNs.

6.6.2 Theory of Bayesian Regularization

The main objective of BNNs is to model the relationship in the data without overfitting the noise, by optimizing the regularization parameters. As discussed before, one of the drawbacks of ANNs is that the optimal architecture has to be chosen by trial and error. Compared with conventional neural networks, BNNs can automatically control model complexity by estimating the effective number of parameters of the network.

According to MacKay (1997), Bayesian probability theory offers several benefits in data modelling:

- The overfitting problem can be solved by using Bayesian methods to control model complexity.
- Probabilistic modelling handles uncertainty in a natural manner; there is a unique prescription for incorporating uncertainty about parameters into predictions.
- One can define more sophisticated probabilistic models which are able to extract more information from the data.

According to MacKay (1997), optimization of the model control parameters has four important advantages:

- No test set or validation set is involved, so all available training data can be devoted to both model fitting and model comparison.
- Regularization constants can be optimized on-line, i.e. simultaneously with the optimization of the ordinary model parameters.
- The Bayesian objective function is not noisy, in contrast to a cross-validation measure.
- The gradient of the evidence with respect to the control parameters can be evaluated, making it possible to simultaneously optimize a large number of control parameters.

Bayesian approaches are mostly implemented for multilayer feedforward neural networks. A network is trained using a data set of inputs and targets D by adjusting the weights w so as to minimize an error function:

$D = \{(x_1, t_1), (x_2, t_2), \ldots, (x_n, t_n)\}$  (6-10)

where D is the training set, $x_i$ is the $i$th set of inputs and $t_i$ is the $i$th target output. It is assumed that the $i$th target of the network is generated by

$t_i = g(x_i) + \epsilon_i$  (6-11)

where $g(x_i)$ is an unknown function and $\epsilon_i$ is independent Gaussian noise. The objective of the training is to minimize the sum of squares of the network errors:

$E_D(\mathbf{w}) = \sum_{i=1}^{n} (t_i - o_i)^2$  (6-12)

where $o_i$ is the network output.

It is possible to improve generalization by adding a term; the objective function is modified as:

$F = \beta E_D + \alpha E_W$  (6-13)

where F is the modified performance function; $E_D$ is the sum of squares of the network errors; $E_W = \sum_i w_i^2$ is the sum of squares of the network weights; and $\alpha$ and $\beta$ are objective function parameters which determine the complexity of the model. $\alpha$ controls the weight distribution in the network and, hence, its nonlinear mapping ability. Noise in the data is represented by $\beta$, which is the inverse of the variance due to the noise. If $\alpha \gg \beta$, training will emphasize weight size reduction and produce a smoother network response. If $\beta \gg \alpha$, the training algorithm will drive the errors smaller. The objective of regularization is to optimise the parameters $\alpha$ and $\beta$.

6.6.2.1 Inferring the weights w for given values of $\alpha$ and $\beta$

In the Bayesian framework, the weights of the network are considered random variables. Consider the objective function $F = \beta E_D + \alpha E_W$. After the data D are observed, the density function for the set of weights w can be updated by applying Bayes' rule:

$P(\mathbf{w} \mid D, \alpha, \beta, M) = \dfrac{P(D \mid \mathbf{w}, \beta, M)\, P(\mathbf{w} \mid \alpha, M)}{P(D \mid \alpha, \beta, M)}$  (6-14)

where M is the specific functional form of the neural network model used; $P(\mathbf{w} \mid D, \alpha, \beta, M)$ is the posterior probability of w; $P(\mathbf{w} \mid \alpha, M)$ is the prior probability (density) of w; $P(D \mid \mathbf{w}, \beta, M)$ is the likelihood function of w; and $P(D \mid \alpha, \beta, M)$ is a normalization factor, or the evidence for $\alpha$ and $\beta$.

Under the assumption that the distribution of the noise in the target variable t is Gaussian¹ and that the prior probability distribution for the weights is Gaussian, the likelihood function and prior density can be represented as:

¹ The assumption of Gaussian distributions simplifies the calculations involved in arriving at the equations and reduces the computational burden in the on-line optimisation of the hyper-parameters. In real cases, these assumptions give satisfactory results (MacKay, 1992).

$P(D \mid \mathbf{w}, \beta, M) = \dfrac{1}{Z_D(\beta)} \exp(-\beta E_D)$ and $P(\mathbf{w} \mid \alpha, M) = \dfrac{1}{Z_W(\alpha)} \exp(-\alpha E_W)$  (6-15)

where

$Z_D(\beta) = (\pi/\beta)^{n/2}$ and $Z_W(\alpha) = (\pi/\alpha)^{N/2}$  (6-16)

The posterior probability can then be written as:

$P(\mathbf{w} \mid D, \alpha, \beta, M) = \dfrac{\frac{1}{Z_D(\beta)}\frac{1}{Z_W(\alpha)} \exp\!\big(-(\beta E_D + \alpha E_W)\big)}{\text{Normalization factor}} = \dfrac{1}{Z_F(\alpha, \beta)} \exp\!\big(-F(\mathbf{w})\big)$  (6-17)

The optimal weights are inferred by maximising the posterior probability $P(\mathbf{w} \mid D, \alpha, \beta, M)$, which is equivalent to minimising the regularized objective function $F = \beta E_D + \alpha E_W$.

6.6.2.2 Optimising the regularization parameters $\alpha$ and $\beta$

The control parameters $\alpha$ and $\beta$ determine the complexity of the model. To infer $\alpha$ and $\beta$, Bayes' rule is applied again, and the posterior probability of the parameters $\alpha$ and $\beta$ can be written as:

$P(\alpha, \beta \mid D, M) = \dfrac{P(D \mid \alpha, \beta, M)\, P(\alpha, \beta \mid M)}{P(D \mid M)}$  (6-18)

Assuming a uniform prior density $P(\alpha, \beta \mid M)$ for the regularization parameters $\alpha$ and $\beta$, optimising the posterior probability of the parameters can be achieved by maximizing the likelihood function $P(D \mid \alpha, \beta, M)$.

From equation 6-14, the normalization factor can be solved as:

$P(D \mid \alpha, \beta, M) = \dfrac{P(D \mid \mathbf{w}, \beta, M)\, P(\mathbf{w} \mid \alpha, M)}{P(\mathbf{w} \mid D, \alpha, \beta, M)} = \dfrac{\frac{1}{Z_D(\beta)}\exp(-\beta E_D)\, \frac{1}{Z_W(\alpha)}\exp(-\alpha E_W)}{\frac{1}{Z_F(\alpha, \beta)}\exp(-F(\mathbf{w}))} = \dfrac{Z_F(\alpha, \beta)}{Z_D(\beta)\, Z_W(\alpha)}$  (6-19)

In the above equation, only $Z_F(\alpha, \beta)$ is unknown. To estimate it, a Taylor series expansion is used. Since the objective function has the shape of a quadratic in a small area surrounding a minimum point, $F(\mathbf{w})$ is expanded around the minimum point of the posterior density, $\mathbf{w}^{MP}$, where the gradient is zero. Solving for the normalizing constant, one obtains:

$Z_F(\alpha, \beta) \approx (2\pi)^{N/2}\, \big(\det(\mathbf{H}^{MP})\big)^{-1/2} \exp\!\big(-F(\mathbf{w}^{MP})\big)$  (6-20)

where $\mathbf{H} = \beta \nabla^2 E_D + \alpha \nabla^2 E_W$ is the Hessian matrix of the objective function and $\mathbf{w}^{MP}$ is the parameter vector which minimises the objective function $F = \beta E_D + \alpha E_W$.

By substituting equation 6-20 into equation 6-19, taking the derivative of the logarithm of equation 6-19 with respect to each parameter and setting the derivatives to zero, the optimal values of $\alpha$ and $\beta$ at the minimum point are obtained. They satisfy, respectively:

$\alpha^{MP} = \dfrac{\gamma}{2 E_W(\mathbf{w}^{MP})}$ and $\beta^{MP} = \dfrac{n - \gamma}{2 E_D(\mathbf{w}^{MP})}$  (6-21)

where $\gamma = N - 2\alpha^{MP}\, \mathrm{tr}(\mathbf{H}^{MP})^{-1}$ is the effective number of parameters in the neural network used in reducing the error function, with values between 0 and N, and N is the total number of parameters in the network.

To compute the Hessian matrix $\mathbf{H}^{MP}$ of $F(\mathbf{w})$ at the minimum point $\mathbf{w}^{MP}$, two alternative methods have been used. One is the Gauss-Newton approximation and the other is the Monte Carlo method developed by Neal (1996). The Gauss-Newton approximation to the Hessian matrix is widely used because it is readily available if the Levenberg-Marquardt (LM) optimisation algorithm is used to find the minimum point.

The Levenberg-Marquardt algorithm is a modification of the Gauss-Newton method. Consider a function $V(\mathbf{x})$; to minimize it with respect to the parameter vector $\mathbf{x}$, Newton's method gives:

$\Delta \mathbf{x} = -[\nabla^2 V(\mathbf{x})]^{-1}\, \nabla V(\mathbf{x})$  (6-22)

where $\nabla^2 V(\mathbf{x})$ is the Hessian matrix and $\nabla V(\mathbf{x})$ is the gradient. Assume $V(\mathbf{x})$ is a sum of squares function:

$V(\mathbf{x}) = \sum_{i=1}^{N} e_i^2(\mathbf{x})$  (6-23)

Then the following is obtained:

$\nabla V(\mathbf{x}) = \mathbf{J}^T(\mathbf{x})\, \mathbf{e}(\mathbf{x})$  (6-24)

$\nabla^2 V(\mathbf{x}) = \mathbf{J}^T(\mathbf{x})\mathbf{J}(\mathbf{x}) + \mathbf{S}(\mathbf{x})$  (6-25)

where $\mathbf{J}(\mathbf{x})$ is the Jacobian matrix,

$\mathbf{J}(\mathbf{x}) = \begin{bmatrix} \dfrac{\partial e_1(\mathbf{x})}{\partial x_1} & \dfrac{\partial e_1(\mathbf{x})}{\partial x_2} & \cdots & \dfrac{\partial e_1(\mathbf{x})}{\partial x_N} \\ \dfrac{\partial e_2(\mathbf{x})}{\partial x_1} & \dfrac{\partial e_2(\mathbf{x})}{\partial x_2} & \cdots & \dfrac{\partial e_2(\mathbf{x})}{\partial x_N} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial e_N(\mathbf{x})}{\partial x_1} & \dfrac{\partial e_N(\mathbf{x})}{\partial x_2} & \cdots & \dfrac{\partial e_N(\mathbf{x})}{\partial x_N} \end{bmatrix}$  (6-26)

and

$\mathbf{S}(\mathbf{x}) = \sum_{i=1}^{N} e_i(\mathbf{x})\, \nabla^2 e_i(\mathbf{x})$  (6-27)

For the Gauss-Newton method, it is assumed that $\mathbf{S}(\mathbf{x}) \approx 0$; thus,

$\Delta \mathbf{x} = -[\mathbf{J}^T(\mathbf{x})\mathbf{J}(\mathbf{x})]^{-1}\, \mathbf{J}^T(\mathbf{x})\mathbf{e}(\mathbf{x})$  (6-28)

$\mathbf{H} = \nabla^2 V(\mathbf{x}) \approx \mathbf{J}^T(\mathbf{x})\mathbf{J}(\mathbf{x})$  (6-29)

The Levenberg-Marquardt modification to the Gauss-Newton method is:

$\Delta \mathbf{x} = -[\mathbf{J}^T(\mathbf{x})\mathbf{J}(\mathbf{x}) + \mu \mathbf{I}]^{-1}\, \mathbf{J}^T(\mathbf{x})\mathbf{e}(\mathbf{x})$  (6-30)

When $\mu$ is small, Levenberg-Marquardt becomes Gauss-Newton.
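A hedged numerical sketch of the Levenberg-Marquardt update in equation (6-30) is given below, applied to a made-up exponential curve-fitting problem; the fixed damping factor mu, the residual function and the starting values are illustrative assumptions rather than part of the derivation above.

```python
import numpy as np

def lm_step(J, e, mu):
    """Delta x = -(J^T J + mu I)^{-1} J^T e, as in equation (6-30)."""
    n_params = J.shape[1]
    return -np.linalg.solve(J.T @ J + mu * np.eye(n_params), J.T @ e)

# Fit y = x0 * exp(x1 * t) to synthetic data; residuals are e_i = model(t_i) - y_i.
t = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(-1.5 * t)
x = np.array([1.5, -1.0])                              # initial parameter guess
for _ in range(20):
    model = x[0] * np.exp(x[1] * t)
    e = model - y
    J = np.column_stack([np.exp(x[1] * t),             # d e / d x0
                         x[0] * t * np.exp(x[1] * t)])  # d e / d x1
    x = x + lm_step(J, e, mu=0.01)
print(x)  # parameters move towards [2.0, -1.5]
```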

6.6.2.3 Gauss-Newton approximation to Bayesian Regularization

Before training, the training data need to be normalized into the range [-1, 1] so as to achieve better results. Based on the method described above for inferring the weights w and optimising the regularization parameters $\alpha$ and $\beta$, the steps for Bayesian Regularization using the Gauss-Newton approximation to the Hessian matrix are:

1. Initialise $\alpha$, $\beta$ and the weights. Set $\alpha = 0$ and $\beta = 1$, and use the Nguyen-Widrow method of initialising the weights.

2. Take one step of the Levenberg-Marquardt algorithm to minimize the objective function $F = \beta E_D + \alpha E_W$.

3. Compute the effective number of parameters $\gamma = N - 2\alpha^{MP}\, \mathrm{tr}(\mathbf{H})^{-1}$. To compute the Hessian matrix H, the Gauss-Newton approximation available in the Levenberg-Marquardt training algorithm is used: $\mathbf{H} = \nabla^2 F(\mathbf{w}) \approx 2\beta \mathbf{J}^T\mathbf{J} + 2\alpha \mathbf{I}_N$, where J is the Jacobian matrix of the training-set errors. To compute the Jacobian matrix J, refer to Hagan and Menhaj (1994).

4. Compute new estimates for the objective function parameters:

$\alpha = \dfrac{\gamma}{2 E_W(\mathbf{w})}$ and $\beta = \dfrac{n - \gamma}{2 E_D(\mathbf{w})}$

5. Iterate steps 2 through 4 until convergence.
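As a numerical illustration of steps 3 and 4, the following sketch re-estimates alpha, beta and gamma from the Gauss-Newton Hessian $H \approx 2\beta J^T J + 2\alpha I$; the Jacobian, errors and weights here are randomly generated placeholders standing in for the outputs of an actual Levenberg-Marquardt step, so the numbers themselves carry no meaning.

```python
import numpy as np

def update_hyperparameters(J, e, w, alpha, beta):
    """Steps 3-4: effective parameters gamma and new alpha, beta (equation 6-21)."""
    n, N = J.shape                                 # n training errors, N network parameters
    H = 2.0 * beta * (J.T @ J) + 2.0 * alpha * np.eye(N)
    gamma = N - 2.0 * alpha * np.trace(np.linalg.inv(H))
    E_W = np.sum(w ** 2)                           # sum of squared weights
    E_D = np.sum(e ** 2)                           # sum of squared errors
    alpha_new = gamma / (2.0 * E_W)
    beta_new = (n - gamma) / (2.0 * E_D)
    return alpha_new, beta_new, gamma

rng = np.random.default_rng(0)
J = rng.normal(size=(30, 5))                       # placeholder Jacobian of the errors
e = rng.normal(scale=0.1, size=30)                 # placeholder training errors
w = rng.normal(scale=0.5, size=5)                  # placeholder weight vector
print(update_hyperparameters(J, e, w, alpha=0.0, beta=1.0))
```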

With each re-estimation of the objective function parameters, the objective function changes and, therefore, the minimum point moves. If the search generally moves towards the next minimum point, then the new estimates for the objective function parameters will be more precise. Eventually, the objective function will not change significantly in subsequent iterations, which indicates that the precision is good and the training has converged.

6.7 Applications of ANNs to time-series forecasting

Time series forecasting is an important task that has long been conducted in many disciplines, particularly in economics and finance. The techniques applied include traditional statistical methods, such as the Box-Jenkins method and threshold autoregressive models, as well as genetic algorithms and neural networks. Among them, neural networks have been demonstrated to be the most powerful for time-series forecasting (Goh, 1998; Zhang and Fukushige, 2002).

One of the first successful applications of ANNs in forecasting is by Lapedes and Farber (1987). Applying a feedforward neural network to two deterministic chaotic time series, they developed a model that can forecast nonlinear systems with very high accuracy. After Lapedes and Farber's pilot work, many neural networks were developed for time-series prediction. Among them are feedforward neural networks (NNs), recurrent NNs, neuro-fuzzy networks, neuro-wavelet networks, Bayesian NNs and Bayesian evolutionary neural trees. The following sections review each of them briefly.

6.7.1 Feedforward neural networks in time-series forecasting

Feedforward multilayer networks are the most widely used ANNs for forecasting time series because of their straightforwardness. Previous works include Lapedes and Farber (1987), Sharda and Patil (1992), and Tang and Fishwick (1993). Feedforward NNs are capable of conducting stationary time series forecasting with high accuracy. However, the method can only learn an input-output mapping that is static, and it may fail when temporal contingencies span unknown intervals (Zhang and Fukushige, 2002).
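The static input-output mapping mentioned above is typically set up by turning the series into lagged input vectors and one-step-ahead targets. The sketch below illustrates this construction; the synthetic sine series and the lag order p are illustrative assumptions.

```python
import numpy as np

def make_lagged_dataset(series, p):
    """Return inputs X[t] = (y[t-p], ..., y[t-1]) and targets y[t]."""
    X = np.array([series[t - p:t] for t in range(p, len(series))])
    y = np.array(series[p:])
    return X, y

series = np.sin(np.linspace(0, 6 * np.pi, 60))   # toy time series
X, y = make_lagged_dataset(series, p=4)
print(X.shape, y.shape)                          # (56, 4) inputs, (56,) targets
```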

6.7.2 Recurrent networks in time series forecasting

To overcome this disadvantage of feedforward NNs, recurrent networks were developed. Recurrent networks are networks with one or more cycles that apply to time series data and that use the outputs of network units at time t as inputs to other units at time t+1. Recurrent networks are superior to feedforward NNs in dealing with complex stochastic time series. However, one drawback of recurrent networks is that they are very difficult to train and do not generalize reliably (Mitchell, 1997). The design of an efficient architecture and the choice of the parameters require longer processing time (Zhang and Fukushige, 2002). Nevertheless, recurrent networks are very important in time series prediction because of their representational power.

To solve the difficulty in choosing the architecture and parameters of recurrent networks, genetic algorithms (GAs), especially the more powerful Breeder Genetic Algorithms (BGAs), are utilized to optimise the architecture of neural networks and the related parameters. BGAs are especially powerful in designing neural networks for nonlinear systems.

6.7.3 Evolutionary ANNs

The efficient use of GAs (BGAs) to optimise network topology inspired many research studies in evolving ANNs by evolutionary search procedures such as GAs. Evolutionary ANNs (EANNs) were, consequently, developed. EANNs are networks that combine ANNs and evolutionary search procedures. EANNs not only learn, but also adapt to a changing environment. They are adaptive systems that can change their architecture and learning rules appropriately without human intervention.

6.7.4 Neuro-fuzzy networks

There is considerable interest in combining neural networks and fuzzy logic to develop fuzzy neural networks (FNNs) for time series analysis. In FNNs, fuzzy reasoning is used to handle uncertain information and the neural network is used to deal with information related to real data. There are fewer practical applications of FNNs to time series prediction, and most of them are applied to chaotic time series.

6.7.5 Neuro-wavelet networks

Neuro-wavelet networks are a combination of the Dynamical Recurrent Neural Network (DRNN) and the wavelet transform technique. Neuro-wavelet networks have demonstrated the capability to improve the prediction accuracy of conventional neural networks in time series prediction. In neuro-wavelet networks, the wavelet transform is first used to decompose the time series into varying scales of temporal resolution so that the temporal structure of the original time series becomes more tractable. Then, a DRNN is trained on each resolution scale by the temporal recurrent backpropagation (TRBP) algorithm. Finally, the forecasts from each wavelet scale are combined to compute the current estimate.

6.7.6 Bayesian Neural Networks

Conventional neural networks have difficulties in controlling the complexity of the model and lack tools for analyzing output results such as confidence intervals and levels. Bayesian Neural Networks are a combination of the Bayesian approach and neural networks, and are mainly used to solve the overfitting problem in the case of insufficient data series. The Bayesian approach uses probability to quantify uncertainty in inference. The result of Bayesian learning is a probability distribution, and predictions are made by integrating over the posterior distribution. The main advantages of Bayesian neural networks include (Lampinen and Vehtari, 2000): (1) automatic complexity control; (2) the possibility of using prior information and hierarchical models for hyper-parameters; and (3) a predictive distribution of the output.

6.8 Developing a multilayer feedforward network for forecasting

Generally, developing a neural network involves the design of an appropriate architecture, the selection of the activation functions of the hidden and output nodes, the training algorithm and its parameters, data normalization methods, training and test datasets, and performance measures. The following sections focus on how to develop a multilayer feedforward network specifically for forecasting. To develop a multilayer feedforward network, the decisions include: (1) designing the appropriate architecture, that is, the number of layers, the number of nodes in each layer, and the number of arcs which interconnect the nodes; (2) selecting the transfer functions of the hidden and output nodes; (3) selecting the training algorithm; (4) data normalization methods; (5) training and test sets; and (6) performance measures.

6.8.1 Architecture of multilayer feedforward network

In the typical multilayer feedforward network, there are one input layer, one output layer and one or more hidden layers, with each node fully connected to the nodes of the adjacent layers. The first step in designing a multilayer feedforward network is to determine the number of input nodes, the number of hidden layers and hidden nodes, and the number of output nodes. The selection of these parameters is basically problem-dependent and there is no simple, clear-cut method for their determination.

6.8.1.1 Number of input nodes

The number of input nodes corresponds to the number of variables used to forecast future values. In time series forecasting, the number of input nodes corresponds to the number of lagged observations used to discover the underlying pattern in the time series. Currently there is no systematic way to determine this number. However, too few or too many input nodes can affect either the learning or the prediction capability of the network.

6.8.1.2 Number of hidden layers and nodes

Hidden nodes in the hidden layer allow neural networks to capture the features in the data and to perform complicated nonlinear mapping between the input and output variables. Regarding the number of hidden layers in forecasting problems, usually one hidden layer is enough for ANNs to approximate any complex nonlinear function with any desired accuracy (Hornik et al., 1989). However, for some specific problems, using two hidden layers may give more accurate results, especially when a one-hidden-layer network requires too many hidden nodes to give satisfactory results.

Determining the number of hidden nodes is done by trial and error. As discussed before, networks with fewer hidden nodes are preferable as they usually have better generalization ability and less overfitting. But networks with too few hidden nodes may not have enough power to model and learn the data. If there is only one hidden layer, a suitable initial size is 75% of the size of the input layer (Bailey and Thompson, 1990).

6.8.1.3 Number of output nodes

For a time series forecasting problem, the number of output nodes often corresponds to the forecasting horizon. There are two types of forecasting: one-step-ahead forecasting, which uses one output node, and multi-step-ahead forecasting. There are two ways of making multi-step forecasts. The first is iterative forecasting, in which the forecast values are iteratively fed back as inputs for the next forecasts; in this case, only one output node is necessary. The second, called the direct method, is to let the neural network have several output nodes that directly forecast each step into the future. Results from Zhang (1994) show that direct prediction is much better than the iterated method.
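A hedged sketch of the iterative strategy described above is given below; `one_step_model` stands in for a trained one-output network and is replaced here by a hypothetical placeholder that simply averages the lagged inputs.

```python
import numpy as np

def iterative_forecast(one_step_model, history, p, horizon):
    """Feed each forecast back in as an input for the next step (iterative method)."""
    window = list(history[-p:])
    forecasts = []
    for _ in range(horizon):
        y_hat = one_step_model(np.array(window))
        forecasts.append(y_hat)
        window = window[1:] + [y_hat]        # slide the lag window forward
    return forecasts

one_step_model = lambda x: float(np.mean(x))  # placeholder for a trained network
history = [1.0, 1.2, 1.1, 1.3, 1.25]
print(iterative_forecast(one_step_model, history, p=3, horizon=4))
# The direct method would instead train a network with `horizon` output nodes.
```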

6.8.2 Transfer function

Backpropagation algorithms require that the transfer function be differentiable at all points. Thus sigmoidal and linear transfer functions are most commonly used for multilayer feedforward networks that use BP algorithms. If the output layer of a multilayer feedforward network uses a sigmoid transfer function, then the outputs of the network are limited to a small range. If linear neurons are used for the output layer, the output of the network can take on any value. In multilayer feedforward networks, the hidden layers usually use a sigmoid transfer function and the output layer uses a linear transfer function, especially when carrying out a forecasting or function approximation task.

6.8.3 Training algorithm

The selection of the training algorithm depends on the task type, the neural network size, time constraints, memory requirements, accuracy requirements, and other factors. For a forecasting task, which belongs to function approximation, the most popular training algorithms are those of the Backpropagation family. Among them, the Levenberg-Marquardt algorithm generally has the fastest convergence and is able to obtain lower mean squared errors than other algorithms for function approximation problems. If the data set is small, the Bayesian Regularization training algorithm is preferred in order to overcome overfitting.

6.8.4 Data normalization

To make training more efficient, it is necessary to scale the inputs and outputs to within the range [-1, 1] before training. Normalization can be realized using the functions premnmx, postmnmx and tramnmx in MATLAB.
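A hedged Python equivalent of this [-1, 1] scaling is sketched below (the text itself refers to MATLAB's premnmx/postmnmx/tramnmx); the minimum and maximum should be taken from the training data, and the sample values are illustrative.

```python
import numpy as np

def scale_to_pm1(x, x_min, x_max):
    """Map values linearly from [x_min, x_max] to [-1, 1]."""
    return 2.0 * (x - x_min) / (x_max - x_min) - 1.0

def unscale_from_pm1(x_scaled, x_min, x_max):
    """Invert the mapping back to the original units."""
    return (x_scaled + 1.0) * (x_max - x_min) / 2.0 + x_min

train = np.array([3.0, 7.0, 5.0, 9.0])
lo, hi = train.min(), train.max()
scaled = scale_to_pm1(train, lo, hi)
print(scaled, unscale_from_pm1(scaled, lo, hi))
```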

6.8.5 Training sample and test sample

To develop a forecasting ANN model, a training data set and a test data set are typically required. The training sample is used for developing the model and the test sample is used for evaluating the forecasting ability of the model. In early stopping techniques, a validation sample is also utilized to determine the stopping point of the training process. In Bayesian Regularization, only a test set is used, serving both validation and testing purposes. To separate the data into training and test sets, it is necessary to consider factors such as the problem requirements, the data type and the size of the available data, and it is critical to have both the training and test sets representative of the population. For time series forecasting problems, this is particularly important. Inappropriate separation of the training and test sets will affect the selection of the optimal ANN structure and the evaluation of ANN forecasting performance. Most authors select them based on the rule of 90% vs. 10% (Zhang et al., 1998). Some choose them based on their particular problems.
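For a time series, the split is usually chronological: the test set is the most recent segment, so no shuffling is performed. The sketch below illustrates a 90%/10% split under these assumptions; the 59-point series merely echoes the size of the dataset used later in this study and is otherwise a placeholder.

```python
import numpy as np

def chronological_split(series, train_fraction=0.9):
    """Split a time series into earlier (training) and later (test) segments."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]

series = np.arange(59, dtype=float)       # placeholder series of 59 observations
train, test = chronological_split(series)
print(len(train), len(test))              # 53 training points, 6 test points
```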

6.8.6 Performance measures

The performance of the model is normally measured in terms of accuracy. There are a number of measures of accuracy in the forecasting literature and each has advantages and limitations. The most frequently used is the mean absolute percentage error (MAPE) (Zhang et al., 1998). Others include the mean absolute deviation (MAD), the sum of squares of the network errors (SSE) on the training set, the mean squared error (MSE) and the root mean squared error (RMSE). The latter four measures are absolute measures and are of limited value when used to compare different time series.
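The measures listed above have straightforward definitions; the sketch below gives hedged implementations with illustrative target and forecast arrays (targets must be non-zero for MAPE to be defined).

```python
import numpy as np

def mape(t, o):
    return np.mean(np.abs((t - o) / t)) * 100.0   # mean absolute percentage error

def mad(t, o):
    return np.mean(np.abs(t - o))                 # mean absolute deviation

def sse(t, o):
    return np.sum((t - o) ** 2)                   # sum of squared errors

def mse(t, o):
    return np.mean((t - o) ** 2)                  # mean squared error

def rmse(t, o):
    return np.sqrt(mse(t, o))                     # root mean squared error

t = np.array([2.0, 3.0, 5.0, 4.0])                # targets
o = np.array([2.2, 2.7, 5.1, 3.6])                # forecasts
print(mape(t, o), mad(t, o), sse(t, o), mse(t, o), rmse(t, o))
```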

6.9 Justification for the choice of ANN to predict TFP growth

As discussed in Chapter 4, the factors influencing construction industry-level TFP growth are highly interactive. The underlying relationships between TFP growth and the factors affecting TFP growth in the construction industry of Singapore are very complex and have not yet been clearly understood. Traditional regression methods, which require an explicit representation of the relationship in a mathematical or statistical model, are not well suited to such complex multi-attribute non-linear mappings. Besides, traditional models lack the ability to learn by themselves in order to respond adequately to highly correlated, incomplete or previously unknown data. In contrast, neural networks are superior to traditional methods for determining complex relationships in a set of data where the relationships remain largely unknown. They are also able to model complex non-linear relationships with higher accuracy (Goh, 1996, 1998; Portas and AbouRizk, 1997; Boussabaine and Kaka, 1998; Emsley, 2002).

The advantages of ANNs over traditional statistical methods in time-series forecasting are particularly remarkable (Zhang et al., 1998). One of the most widely used traditional models for time-series prediction, the Box-Jenkins or Autoregressive Integrated Moving Average (ARIMA) method (Box and Jenkins, 1976), is linear, whereas real-world systems are often nonlinear (Granger and Terasvirta, 1993). Nonlinear time series models such as the bilinear model, the threshold autoregressive (TAR) model and the autoregressive conditional heteroscedastic (ARCH) model are still subject to the assumption of an explicit formulation for the data series, despite the fact that the underlying relationship for the data series may not be clear, and a pre-specified nonlinear model may not be general enough to capture all the important features.

ANNs, which are nonlinear data-driven approaches as opposed to the above model-based nonlinear methods, are capable of performing nonlinear modelling without a priori knowledge about the relationships between the input and output variables. Studies in many fields indicate that neural networks can predict nonlinear time series with higher accuracy than traditional statistical and mathematical models (e.g. Lapedes and Farber, 1987; Deppisch et al., 1991; Li et al., 1990; De Groot and Wurtz, 1991; Goh, 1996, 1998).

As ANNs are a more flexible modelling tool for forecasting, this study will use this method to forecast TFP growth of the construction industry in Singapore. As the data set of this study is small (only 59 time-series observations), an overfitting problem is highly possible. In order to avoid overfitting, a Bayesian Neural Network is used in predicting TFP growth. BNNs can solve the overfitting problem by automatically controlling the model complexity. Moreover, a BNN is also superior to conventional neural networks in analyzing output results, such as the predictive distribution of the output, confidence intervals and confidence levels. Therefore, a BNN-based model is applied to predict TFP growth in this study.

6.10 Chapter summary

A comprehensive review of the theory and applications of ANN, in particular BNNs,

was carried out in this chapter. It consists of: (1) an overview of ANNs; (2) basic

concepts of ANN; (3) overfitting problem and regularization; (4) Bayesian Neural

Networks (BNNs); (5) application of ANNs to time-series forecasting; and (6)

developing ANNs for training.

An overview of ANNs was given in Section 6.2. The formal definition, application areas and advantages of ANNs were discussed. It was highlighted that the main advantage of ANNs is that they can perform complex non-linear mappings with higher accuracy than traditional statistical models, especially when the relationships among the data cannot be explicitly represented.

Next, the applications of ANNs in the field of construction management and economics were reviewed in Section 6.3. It was found that ANNs are most frequently used for forecasting work and that the three-layer feedforward neural network and the Backpropagation algorithm are the most commonly adopted topology and training algorithm.

The fundamentals of ANNs were explained in Section 6.4, covering artificial neural systems, processing elements, threshold functions, network topology, learning rules, training methods and convergence rules.

The overfitting problem and regularization were discussed in Section 6.5. It was highlighted that if the training data set is small, the neural network tends to memorize the examples and cannot generalize well to new cases. This common overfitting problem can be tackled effectively by two methods: early stopping and regularization, in particular Bayesian Regularization. Regularization copes with the overfitting problem better than early stopping.

Section 6.6, therefore, focused on the Bayesian Regularization technique. It first reviewed the applications of BNNs and then explained how to apply Bayes' rule and the Gauss-Newton approximation to optimise the neural network parameters. The major advantage of BNNs is that they can automatically control model complexity without the need for trial and error.

Section 6.7 reviewed the applications of ANNs for time-series forecasting. A critical

study of different neural networks for time-series forecasting was carried out. The

review covered the feedforward neural network, recurrent networks, evolutionary ANNs, neuro-fuzzy networks, neuro-wavelet networks and Bayesian neural networks.

Section 6.8 explained how to develop a multilayer feedforward network. Rules for designing the architecture of the multilayer feedforward network and for selecting the transfer function, training algorithm, data normalization method, training and testing samples, and performance function were discussed in turn.

Finally, Section 6.9 investigated the feasibility of ANNs, in particular BNNs, for forecasting construction industry-level TFP growth. Two key reasons were highlighted for the choice of BNNs. First, the underlying relationship between TFP growth and the factors affecting it is very complex. Second, the dataset of this study is small, which makes an overfitting problem highly likely.
