6.1 Introduction
relationships among factors affecting TFP growth. Second, there may be non-linear
relationships between the factors affecting TFP and TFP growth. These may imply a
complex non-linear mapping. In the last decade, it has been widely recognized that
Artificial Neural Networks (ANNs) are superior to traditional statistical models
when the relationship between output and input variables is implicit, complex and
nonlinear. For this reason, this study will apply ANN technology to develop the
forecasting model.
Section 6.2 provides an overview of ANNs, including the definition of ANNs, areas of
application and the advantages of ANNs. Section 6.3 reviews the application of ANNs
in the area of construction management and
economics. Basic ANN components and theories, such as artificial neural systems,
Section 6.5 discusses the overfitting problem and regularization. The overfitting
problem, the most common problem encountered by ANNs when the dataset is small, is
introduced together with the techniques used to overcome it. Bayesian Neural Networks
(BNNs), which combine Bayesian Regularization and neural networks, are discussed in
Section 6.6. A review of the applications of BNNs is provided under Section 6.6.1.
Section 6.6.2 explains the theory of BNNs. It focuses on the main objective, that is,
to optimise the regularization parameters.
Section 6.7 reviews the applications of ANNs to time-series forecasting, including
the advantages and disadvantages of different neural network models applied to
time-series data, such as feedforward, recurrent, evolutionary and fuzzy
neural networks.
Section 6.8 explains how to carry out empirical ANN modelling. It concentrates on
how to develop a multilayer feedforward network. This involves the design of the
architecture and the selection of the transfer function, training algorithm, data
normalization, training and testing samples and performance function.
Justification for the choice of ANNs to predict TFP growth, in particular Bayesian
neural networks, is given in Section 6.9.
6.2 An overview of ANNs
In the last decade, Artificial Intelligence (AI) techniques such as Artificial Neural
Networks (ANNs) have gained wide acceptance. An ANN is an information technology that
mimics the human brain and nervous system in learning from experience and generalizing
from previous examples to generate new outputs by adjusting the interconnection
weights among the processing elements. ANNs are more powerful than traditional methods
in situations where the problem requires qualitative reasoning, where conventional
mathematical methods are inadequate, or where the parameters are highly interdependent
and the data is noisy or incomplete. Firstly, as opposed to the traditional
mathematical and statistical methods, ANNs are data-driven self-adaptive methods,
which can capture subtle functional relationships among the data even if the
underlying relationships are unknown or hard to describe. Secondly, ANNs are able to
capture complex non-linear relationships with better accuracy (Rumelhart et al. 1994).
Thirdly, and most importantly, ANNs can generalize from the examples presented to
them. Neural networks have been utilized for classification, clustering, vector
quantization and prediction.
6.3 Applications of ANNs in the construction industry
ANNs were introduced to the construction industry in the early 1990s and, in 1996,
Boussabaine (1996) reviewed the use of ANNs in construction management. So far, ANNs
have been used for tasks such as prediction, classification and selection. The most
common application is
prediction. ANNs have been applied to predict tender bids (Gaarslev, 1991; McKim,
1993; Li and Love, 1999), construction cost (Williams, 1994, 2002; Adeli and Wu,
1998; Hegazy and Ayed, 1998; Emsley, 2002), construction budget performance
(Chua et al., 1997), project cash flow (Boussabaine and Kaka, 1998), construction
demand (Goh 1996; 2000), labour productivity (Chao and Skibniewski, 1994; Portas
and AbouRizk, 1997; Savin and Fazio, 1998; AbouRizk et al., 2001), earthmoving
operation (Shi, 1999), the acceptability of a new technology (Chao and Skibniewski,
prequalification (Lam et al., 2001) and hoisting time of tower cranes (Tam et al., 2002).
The three-layered feedforward neural network and backpropagation (BP) have been
the most popular topology and learning methods for prediction. However, several
other neural networks other than BP were developed to cope with different data
problems. The regularization neural network was used by Adeli and Wu (1998) to
deal with the noise in highway construction costs. The regularization neural network
has advantages over BP in that the result of the estimation from the regularization
neural network depends only on the training examples and that it can overcome the overfitting
problem. When the prediction dependent variables are subject to uncertainty and based
on subjective judgement, the fuzzy neural network (FNN) model, which combines
fuzzy set and neural network techniques, has been developed to improve the
prediction. FNN models have been applied by Portas and AbouRizk (1997), Lam et al.
(2001) and AbouRizk (2001). Their studies reveal the benefit of FNN models over the
general feedforward neural network. The design of a suitable network architecture is
usually conducted by trial and error. To automate the search for an optimal
architecture for ANNs, the solution was to combine genetic algorithms (GAs) with
neural networks (Goh, 2000). GAs are artificial intelligence search methods based on
the theories of genetics and natural selection developed by Holland (1975). The
combined technique was found to be able to produce more accurate forecasts than the
ANN technique alone.
So far, two types of optimisation algorithms have been used to find a global minimum
in order to avoid the local minima which NNs are prone to. One is GAs and the other
is simulated annealing (SA). Yeh (1995) employed SA with a Hopfield neural network.
SA is an algorithm which can find a global minimum of the performance function by
combining gradient descent with a random process. However, the drawback of SA is
that it is very slow. Contrasted with SA, GAs are less susceptible to being stuck at
a local minimum and can quickly locate high-performance regions in extremely complex
search spaces. GAs have three major applications: to optimise weights in NNs; to
specify the topology for NNs; and to select optimum smoothing factors for adaptive
probabilistic neural networks (APNNs). Hegazy and Ayed (1998) applied GAs to
optimise the network weights when developing a parametric cost-estimating model for
highway projects. Goh (2000) used GAs to seek the optimum architecture of NNs.
Sawhney and Mund (2001) used GAs to select optimum smoothing factors in APNNs.
For classification or selection, a multilayer neural network was used by Cheung et
al. (2001). APNNs based on the Bayesian classifier method have also been used to
conduct crane type and model selection. APNNs can model any non-linear function
using a single hidden layer (et al., 1997).
An artificial neural system is modelled on the biological nervous system. It is built
on three basic components: processing elements (PEs), which are an artificial model
of the human neuron; interconnections, whose functions are similar to the axon; and
synapses, which are the junctions where an interconnection
meets a PE. Each PE receives signals from other PEs that constitute an input pattern.
This input pattern stimulates the PE to reach some level of activity. If the activity is
strong enough, the PE generates a single output signal that is transmitted to other PEs
through an interconnection.
Figure 6.1 describes a typical artificial neuron. The input signals come from either
the environment or the outputs of other PEs, forming an input pattern
A = (a_1, a_2, \ldots, a_n), where a_i is the activity level of the i-th PE or input.
There are weights bound to the input connections: w_{1j}, w_{2j}, \ldots, w_{nj}, and
the neuron has a bias b_j. The sum of the weighted inputs and the bias gives the net
input to PE(j):

X_j = \sum_{i=1}^{n} w_{ij} a_i + b_j = W \cdot A + b_j   (6-2)

The net input is then sent to a transfer function, which serves as a non-linear
threshold. The transfer function calculates the output signal of PE(j) as:

O_j = f(X_j)   (6-3)

where O_j is the output signal from PE(j); f is a transfer function; and X_j is the
net input.
[Figure 6.1: A typical artificial neuron. Inputs a_1, ..., a_i, ..., a_n are weighted
by w_{1j}, ..., w_{ij}, ..., w_{nj}, summed together with the bias b_j, and passed
through the transfer function f(\sum_i a_i w_{ij} + b_j) to produce the output O_j.]
There are many threshold functions adopted in ANNs. The two most commonly used are
the sigmoid and the linear transfer functions (Demuth and Beale, 2000). They can be
expressed as the following equations:

f(x) = \frac{1}{1 + e^{-x}}   (log-sigmoid)

f(x) = x   (linear)
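As an illustrative sketch (not part of the thesis), the computations of equations 6-2 and 6-3 for a single processing element can be expressed in Python; the transfer-function names follow the MATLAB toolbox convention cited above:

```python
import math

def processing_element(inputs, weights, bias, transfer):
    # Net input X_j = sum_i w_ij * a_i + b_j  (equation 6-2)
    x = sum(w * a for w, a in zip(weights, inputs)) + bias
    # Output O_j = f(X_j)  (equation 6-3)
    return transfer(x)

def logsig(x):
    # Log-sigmoid transfer function: squashes the net input into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def purelin(x):
    # Linear transfer function: output is unbounded
    return x

# A PE with two inputs
out = processing_element([0.5, -1.0], [0.8, 0.2], bias=0.1, transfer=logsig)
```

With a linear transfer function the same PE would simply emit the net input, here 0.3; the log-sigmoid maps it into (0, 1).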
6.4.4 Architecture of ANNs
The architecture of an ANN is the organisation that assembles PEs into layers and
links them with weighted connections. The most commonly used ANN paradigm is the
multilayer perceptron (MLP). An MLP consists of an input layer, at least one hidden
layer, and one output layer. The neurons in each layer are usually fully connected
to the neurons in the adjacent layer. Among MLPs, the most widely used is the
multilayer feedforward network, in which connection is allowed from a node in layer
i only to nodes in layer i + 1. The three layers are the input layer, the hidden
layer and the output layer. The input layer is the
layer that receives input signals from the environment. Output layer is the layer that
emits signals to the environment. Hidden layers are layers between the input and
output layers.
6.4.5 Learning rules
A learning rule is a procedure for modifying the weights of the connections between
the nodes and the biases of a network. There are three broad learning categories:
supervised learning, unsupervised learning and reinforcement learning.
6.4.6 Convergence
Convergence is the eventual minimization of the error between the desired and
computed outputs, usually in the mean-square sense:

\lim_{n \to \infty} E\{ \| x_n - x^* \|^2 \} = 0   (6-7)
As stated before, BP is the most commonly used ANN learning technique. The weights
and biases are modified in the direction in which the performance function decreases
most rapidly. Multilayer feedforward networks with BP are capable of performing both
linear and non-linear mappings. However, BP converges slowly and may cause the
overfitting problem. To speed up the BP training process, some faster BP algorithms
have been developed. Among them, the Levenberg-Marquardt algorithm generally has the
fastest convergence and is able to obtain lower mean square errors than other
algorithms for function approximation problems.

6.5 Overfitting and regularization

6.5.1 Overfitting

The goal of neural network training is to minimize the errors while ensuring that
the trained neural network can respond properly when presented with new inputs.
Overfitting is a
phenomenon whereby the neural network has memorized the training examples so that it
fails to generalize to new situations. Overfitting may occur when the data set for
training is small. The larger the network used, the more complex the functions the
network can create. However, the more complex the network is, the more likely the
network may mistakenly model the noise in the data as part of the underlying
non-linear relationship. One way to reduce overfitting is to use a network just
large enough for the problem; another is to add more training examples. However, it
is difficult to know beforehand how large the network should be, and in many cases
training examples are in limited supply. Fortunately, there are two other useful
techniques to overcome this problem. They are early stopping and regularization.
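The early-stopping rule can be sketched as follows (a generic illustration; the error values are hypothetical): training halts once the error on a held-out validation set stops improving.

```python
def early_stopping_point(val_errors, patience=2):
    """Return the epoch at which training should stop: the epoch with the
    lowest validation error before it fails to improve for `patience` epochs."""
    best_err, best_epoch, waited = float("inf"), 0, 0
    for epoch, err in enumerate(val_errors):
        if err < best_err:
            best_err, best_epoch, waited = err, epoch, 0
        else:
            waited += 1
            if waited >= patience:   # validation error rising: overfitting begins
                break
    return best_epoch

# Validation error falls, then rises as the network starts to overfit
stop = early_stopping_point([0.9, 0.5, 0.3, 0.35, 0.4, 0.6])
```

Here training would be stopped at epoch 2, where validation error bottomed out, even though training error would keep falling.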
According to Sarle (1995) and Demuth and Beale (2000), when conducting a function
approximation, Bayesian Regularization provides better generalization performance
than early stopping. This is because, unlike early stopping, which separates
validation data from the training data, Bayesian Regularization uses all the data
for training. When the size of the data set is small or there is little noise in the
data set, this advantage is even more evident. Their experiments show that, on
average, the MSE obtained from Bayesian Regularization is only around 1/5 that of
early stopping. Therefore, this study will apply the Bayesian Regularization technique.
6.5.2 Regularization

Regularization improves generalization by constraining the size of the network
weights. When the weights are small, the network response will be smooth. According
to Foresee and Hagan (1997), with regularization, any modestly oversized network
should be able to represent the true underlying function adequately.
The typical performance function used for training a multilayer feedforward network
is the mean sum of squares of the network errors (MSE):

F = MSE = \frac{1}{N} \sum_{i=1}^{N} (t_i - o_i)^2   (6-8)

The improved performance function adds a term that consists of the mean of the sum
of squares of the network weights and biases (MSW):

MSW = \frac{1}{n} \sum_{j=1}^{n} w_j^2   (6-9)

so that F = \gamma MSE + (1 - \gamma) MSW, where \gamma is the performance ratio
parameter for regularization; MSE is the mean sum of squares of the network errors;
and MSW is the mean sum of squares of the network weights and biases. The improved
performance function will cause the network to have smaller weights and biases and,
hence, result in a smoother network response which is less likely to
overfit. However, it is difficult to determine the optimum value for the performance
ratio parameter. To overcome this difficulty, Mackay (1992) introduced the Bayesian
Regularization. In this technique, the weights and biases of the network are assumed
to be random variables with specified distributions. The regularization parameters
are related to the unknown variances associated with these distributions and can
then be estimated using statistical techniques.
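A hypothetical numerical sketch of the regularized performance function (following equations 6-8 and 6-9, with the performance ratio denoted gamma; the values are illustrative only):

```python
def mse(targets, outputs):
    # Mean sum of squares of the network errors (equation 6-8)
    return sum((t - o) ** 2 for t, o in zip(targets, outputs)) / len(targets)

def msw(weights):
    # Mean sum of squares of the network weights and biases (equation 6-9)
    return sum(w ** 2 for w in weights) / len(weights)

def performance(targets, outputs, weights, gamma=0.5):
    # Regularized performance: F = gamma*MSE + (1 - gamma)*MSW.
    # gamma is the performance ratio; Bayesian Regularization effectively
    # estimates this trade-off automatically instead of fixing it by hand.
    return gamma * mse(targets, outputs) + (1 - gamma) * msw(weights)

F = performance([1.0, 2.0], [1.1, 1.9], [0.3, -0.4, 0.5], gamma=0.8)
```

Larger weights raise MSW and hence F, so minimizing F pushes the network toward small weights and a smooth response.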
6.6 Bayesian Neural Networks
MacKay (1992) was the first to introduce the Bayesian approach to neural network
training, and Foresee and Hagan (1997) used a Gauss-Newton approximation to the
Hessian matrix to make the method practical.

6.6.1 Applications of BNNs

BNNs have been utilized in many areas, but not yet in construction. A BNN model
was used by Cool et al. (1997) for predicting yield and ultimate tensile strength in
welds. Cherian et al. (2000) used BNNs to predict mechanical properties of ferrous
powder materials and the model was found to produce good prediction accuracy. A
BNN-based model for determining main particulars of a ship at the initial design stage
is described by Clausen et al. (2001). BNNs have also been applied by Somers (2001),
and Aminian (2001) developed an analog circuit fault diagnostic system applying BNNs.
6.6.2 Theory of Bayesian Regularization
The main objective of BNNs is to model the relationship from the data without
overfitting. As stated before, one of the drawbacks of ANNs is that of choosing the
optimal architecture of ANNs by trial and error. Compared with conventional neural
networks, BNNs can automatically control model complexity, avoiding the
trial-and-error search required for conventional networks.
According to Mackay (1997), the Bayesian probability theory offers several benefits
in data modelling:

- Bayesian model comparison provides an objective means of controlling
model complexity.
- Bayesian inference provides error bars on the network
predictions.
- One can define more sophisticated probabilistic models which are able to
extract more information from the data.

In practice, Bayesian Regularization of neural networks offers several
important advantages:

- No test set or validation set is involved, so all available training data can be
used for training.
- The Bayesian objective function is not noisy, in contrast to a cross-validation
measure.
- The gradient of the evidence with respect to the control parameters can be
evaluated cheaply, making it feasible to optimise a large number of
control parameters.
This section explains the theory of Bayesian Regularization as applied to neural
networks. A network is trained using a data set of inputs and targets D by adjusting
its weights w, where

D = \{(x_1, t_1), (x_2, t_2), \ldots, (x_n, t_n)\}   (6-10)

where D is the training set; x_i is the i-th set of inputs; and t_i is the i-th
target output. It is assumed that the targets are generated by an underlying
function g(x) corrupted by additive noise \varepsilon_i:

t_i = g(x_i) + \varepsilon_i   (6-11)

In conventional training, the objective is to minimize the sum of squares of the
network errors:

E_D(w) = \sum_{i=1}^{n} (t_i - o_i)^2   (6-12)

Under regularization, the performance function is modified as:

F = \beta E_D + \alpha E_W   (6-13)

where F is the modified performance function; E_D is the sum of squares of the
network errors; E_W = \sum_i w_i^2 is the sum of squares of the network weights; and
\alpha and \beta are the regularization parameters. The parameter \alpha controls
the weight distribution in the network and, hence, its nonlinear mapping ability.
Noise in the data is represented by \beta, which is the inverse of the variance due
to noise. If \alpha \gg \beta, training will emphasize weight size reduction and
produce a smoother network response. If \beta \gg \alpha, the training algorithm
will drive the network errors smaller, at the risk of overfitting.
In the Bayesian framework, the weights of the network are considered random
variables. After the data is taken, the density function for the set of weights w
can be updated by applying Bayes' rule:

P(w \mid D, \alpha, \beta, M) = \frac{P(D \mid w, \beta, M) \, P(w \mid \alpha, M)}{P(D \mid \alpha, \beta, M)}   (6-14)

where M is the specific functional form of the neural network model used.
Under the assumption that the distribution of the noise in the target variable t is
Gaussian¹ and that the prior probability distribution for the weights is Gaussian,
the likelihood and the prior can be written as:

¹ The Gaussian assumption simplifies the calculations involved in arriving at the
equations and reduces the computational burden in the on-line optimisation of the
hyper-parameters. In real cases, these assumptions give satisfactory results
(MacKay, 1992).
P(D \mid w, \beta, M) = \frac{1}{Z_D(\beta)} \exp(-\beta E_D) \quad \text{and} \quad P(w \mid \alpha, M) = \frac{1}{Z_W(\alpha)} \exp(-\alpha E_W)   (6-15)

where

Z_D(\beta) = (\pi / \beta)^{n/2} \quad \text{and} \quad Z_W(\alpha) = (\pi / \alpha)^{N/2}   (6-16)

Substituting these into equation 6-14, the posterior density of the weights becomes:

P(w \mid D, \alpha, \beta, M) = \frac{\frac{1}{Z_D(\beta) Z_W(\alpha)} \exp(-(\beta E_D + \alpha E_W))}{\text{Normalization factor}} = \frac{1}{Z_F(\alpha, \beta)} \exp(-F(w))   (6-17)

where F = \beta E_D + \alpha E_W, n is the number of training examples and N is the
total number of network weights and biases.
The control parameters \alpha and \beta determine the complexity of the model. To
infer \alpha and \beta, Bayes' rule is applied again, and the posterior probability
of the parameters \alpha and \beta is:

P(\alpha, \beta \mid D, M) = \frac{P(D \mid \alpha, \beta, M) \, P(\alpha, \beta \mid M)}{P(D \mid M)}   (6-18)
Assuming a uniform prior density P(\alpha, \beta \mid M) for the regularization
parameters, maximizing the posterior is achieved by maximizing the likelihood
P(D \mid \alpha, \beta, M). From equation 6-14, this normalization factor can be
solved as:

P(D \mid \alpha, \beta, M) = \frac{P(D \mid w, \beta, M) \, P(w \mid \alpha, M)}{P(w \mid D, \alpha, \beta, M)}
= \frac{\frac{1}{Z_D(\beta)} \exp(-\beta E_D) \cdot \frac{1}{Z_W(\alpha)} \exp(-\alpha E_W)}{\frac{1}{Z_F(\alpha, \beta)} \exp(-F(w))}
= \frac{Z_F(\alpha, \beta)}{Z_D(\beta) \, Z_W(\alpha)}   (6-19)
The only unknown term in equation 6-19 is Z_F(\alpha, \beta), which is estimated by
Taylor series expansion. As the objective function has the shape of a quadratic in a
small area surrounding a minimum point, F(w) is expanded around the minimum point of
the posterior density w_{MP}, where the gradient is zero. Solving for the
normalizing constant, one obtains:

Z_F(\alpha, \beta) \approx (2\pi)^{N/2} \left( \det\left( (H^{MP})^{-1} \right) \right)^{1/2} \exp(-F(w_{MP}))   (6-20)

where H = \beta \nabla^2 E_D + \alpha \nabla^2 E_W is the Hessian matrix of the
objective function.
By substituting equation 6-20 into equation 6-19, the optimal values of \alpha and
\beta at the minimum point are obtained by taking the derivative of the log of
equation 6-19 with respect to each parameter and setting it equal to zero:

\alpha^{MP} = \frac{\gamma}{2 E_W(w_{MP})} \quad \text{and} \quad \beta^{MP} = \frac{n - \gamma}{2 E_D(w_{MP})}   (6-21)

where \gamma = N - 2\alpha^{MP} \operatorname{tr}(H^{MP})^{-1} is called the
effective number of parameters, a measure of how many parameters of the network are
used in reducing the error function, with values between 0 and N, where N is the
total number of parameters in the network.
To compute the Hessian matrix H^{MP} of F(w) at the minimum point w_{MP}, two
alternative methods can be used. One is the Gauss-Newton approximation and the other
is the exact computation of the Hessian. The Gauss-Newton approximation to the
Hessian matrix is widely used because it is readily available if the
Levenberg-Marquardt optimisation algorithm is used to locate the minimum point
(Foresee and Hagan, 1997). Newton's method updates the parameter vector x as:

\Delta x = -[\nabla^2 V(x)]^{-1} \nabla V(x)   (6-22)

where V(x) is a sum-of-squares performance function:

V(x) = \sum_{i=1}^{N} e_i^2(x)   (6-23)

It can then be shown that

\nabla V(x) = J^T(x) e(x)   (6-24)

\nabla^2 V(x) = J^T(x) J(x) + S(x)   (6-25)

where J(x) is the Jacobian matrix

J(x) = \begin{bmatrix} \partial e_1/\partial x_1 & \partial e_1/\partial x_2 & \cdots & \partial e_1/\partial x_N \\ \partial e_2/\partial x_1 & \partial e_2/\partial x_2 & \cdots & \partial e_2/\partial x_N \\ \vdots & \vdots & & \vdots \\ \partial e_N/\partial x_1 & \partial e_N/\partial x_2 & \cdots & \partial e_N/\partial x_N \end{bmatrix}   (6-26)

and

S(x) = \sum_{i=1}^{N} e_i(x) \nabla^2 e_i(x)   (6-27)

For the Gauss-Newton method, it is assumed that S(x) \approx 0, so the update
becomes:

\Delta x = -[J^T(x) J(x)]^{-1} J^T(x) e(x)   (6-28)

and the Hessian is approximated as:

H = \nabla^2 V(x) \approx J^T(x) J(x)   (6-29)

The Levenberg-Marquardt modification of the Gauss-Newton method is:

\Delta x = -[J^T(x) J(x) + \mu I]^{-1} J^T(x) e(x)   (6-30)

where \mu is an adaptive parameter that is increased when a step would increase
V(x) and decreased otherwise.
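Equation 6-30 can be illustrated on a toy one-parameter least-squares problem (a sketch, not the thesis's implementation). Because the model below is linear in its single parameter, one step with a small mu essentially reaches the minimum:

```python
def lm_step(a, xs, ts, mu):
    """One Levenberg-Marquardt step (equation 6-30) for the scalar model y = a*x.
    Residuals are e_i = t_i - a*x_i, so the Jacobian entries are de_i/da = -x_i
    and J^T J and J^T e reduce to simple sums."""
    errors = [t - a * x for x, t in zip(xs, ts)]
    jtj = sum(x * x for x in xs)                    # J^T J (a 1x1 "matrix")
    jte = -sum(x * e for x, e in zip(xs, errors))   # J^T e
    delta = -jte / (jtj + mu)                       # eq. 6-30 with one parameter
    return a + delta

xs = [1.0, 2.0, 3.0]
ts = [2.0 * x for x in xs]          # data generated with a = 2
a1 = lm_step(0.0, xs, ts, mu=1e-8)  # small mu: nearly a Gauss-Newton step
```

With a large mu the update shrinks toward a small gradient-descent-like step, which is the safeguard that makes the algorithm robust far from the minimum.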
Before training, the training data needs to be normalized into the range [-1, 1] so
as to achieve better results. Based on the method described above for inferring the
weights w and optimising the regularization parameters \alpha and \beta, the
Bayesian Regularization training process consists of the following steps (Foresee
and Hagan, 1997):

1. Initialize \alpha, \beta and the weights.
2. Take one step of the Levenberg-Marquardt algorithm to minimize the objective
function F = \beta E_D + \alpha E_W.
3. Compute the effective number of parameters \gamma = N - 2\alpha \operatorname{tr}(H)^{-1},
making use of the Gauss-Newton approximation to the Hessian available in the
Levenberg-Marquardt training algorithm. To compute the Jacobian matrix J, refer to
Hagan and Menhaj (1994).
4. Compute new estimates for the objective function parameters:

\alpha = \frac{\gamma}{2 E_W(w)} \quad \text{and} \quad \beta = \frac{n - \gamma}{2 E_D(w)}

5. Iterate steps 2 to 4 until convergence.
With each re-estimation of the objective function parameters, the objective function
changes and, therefore, the minimum point moves. If the search generally moves
towards the next minimum point, then the new estimates for the objective function
parameters will be more precise. Eventually, the objective function will not
significantly change in subsequent iterations, which indicates that convergence
has been reached.
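The re-estimation of alpha and beta (equation 6-21) can be sketched in Python. This is an illustrative fragment: the trace of the inverse Hessian is supplied directly, whereas a real implementation would obtain it from the Gauss-Newton approximation inside Levenberg-Marquardt training:

```python
def effective_parameters(alpha, trace_h_inv, N):
    # gamma = N - 2*alpha*tr(H^-1): effective number of parameters
    return N - 2.0 * alpha * trace_h_inv

def update_hyperparameters(alpha, trace_h_inv, E_W, E_D, n, N):
    """Re-estimate alpha and beta (equation 6-21) at the current minimum point.
    E_W: sum of squared weights; E_D: sum of squared errors;
    n: number of training examples; N: total number of weights and biases."""
    gamma = effective_parameters(alpha, trace_h_inv, N)
    new_alpha = gamma / (2.0 * E_W)
    new_beta = (n - gamma) / (2.0 * E_D)
    return new_alpha, new_beta, gamma

# Hypothetical values: diagonal Hessian diag(4, 2) gives tr(H^-1) = 0.25 + 0.5
a_new, b_new, g = update_hyperparameters(alpha=0.1, trace_h_inv=0.75,
                                         E_W=0.5, E_D=1.0, n=10, N=2)
```

In a full training loop these updated values would feed back into the objective F = beta*E_D + alpha*E_W for the next Levenberg-Marquardt step.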
6.7 Applications of ANNs to time-series forecasting

Time series forecasting is an important task that has long been conducted in many
fields using techniques such as autoregressive models, genetic algorithms and neural
networks. Among them, neural networks have been demonstrated to be the most powerful
for time-series forecasting (Goh, 1998). The pioneering application of neural
networks to time-series forecasting was by Lapedes and Farber (1987). Applying a
feedforward neural network to two deterministic chaotic time series, they developed
a model that can forecast nonlinear systems with very high accuracy. After Lapedes
and Farber's pilot work, many neural networks were developed for time-series
prediction. Among them are the feedforward neural networks, recurrent networks,
evolutionary artificial neural networks, fuzzy neural networks and the Bayesian
evolutionary neural tree. The following sections review each of them briefly.
6.7.1 Feedforward networks in time series forecast

Feedforward multilayer networks are the most widely used ANNs for forecasting
time series due to their straightforwardness. Previous works include Lapedes and
Farber (1987), Sharda and Patil (1992), and Tang and Fishwick (1993). Feedforward
NNs are capable of conducting stationary time series forecasting with high accuracy.
However, the method can only learn an input-output mapping which is static and may
fail when the time series is non-stationary.
6.7.2 Recurrent networks in time series forecast

To cope with non-stationary time series, recurrent networks were developed.
Recurrent networks are networks with one or more cycles that apply to time series
data and that use outputs of network units at time t as input to other units at
time t+1. Recurrent networks are superior to feedforward NNs in dealing with complex
stochastic time series. But one drawback of recurrent networks is that they are very
difficult to train and do not generalize reliably (Mitchell, 1997). The design of an
efficient architecture and the choice of the parameters require longer processing
time (Zhang and Fukushige, 2002). Nevertheless, recurrent networks are very
important in time series forecasting. To ease their design, genetic algorithms
(GAs), especially the more powerful Breeder Genetic Algorithms (BGAs), are utilized
to optimise the architecture of neural networks and related parameters. BGAs are
especially powerful in designing neural networks for nonlinear systems.

6.7.3 Evolutionary artificial neural networks

The efficient use of GAs (BGAs) to optimise network topology inspired much research
that combines ANNs and evolutionary search procedures, known as evolutionary
artificial neural networks (EANNs). EANNs not only learn, but also adapt to a
changing environment. EANNs are adaptive systems that can change their structures
and learning rules.
6.7.4 Neuron fuzzy networks

There is considerable interest in combining neural networks and fuzzy logic to
develop fuzzy neural networks (FNNs) for time series analysis. In FNNs, fuzzy
reasoning is used to handle uncertain information and the neural network is used to
deal with information related to real data. There are fewer practical applications
of FNNs to time series forecasting. In one hybrid approach, a decomposition
technique is used to decompose the time series into varying scales of temporal
resolution so that the temporal structure of the original time series becomes more
tractable; a dynamic recurrent neural network (DRNN) is then applied to the
decomposed series.

6.7.5 Bayesian neural networks in time series forecast

Conventional neural networks have difficulties in controlling the complexity of the
model and lack tools for analyzing output results, such as confidence intervals and
levels. Bayesian Neural Networks are a combination of the Bayesian approach and
neural networks. They are mainly used for solving the overfitting problem in the case of
insufficient data. The Bayesian approach is applied by using probability to
represent uncertainty in the network weights: a prior distribution over the weights
is updated into a posterior distribution, and prediction is made by integrating over
the posterior distribution. The main advantages of Bayesian neural networks include
(Lampinen and Vehtari, 2000): (1) automatic complexity control; (2) the possibility
to use prior information and hierarchical models for hyper-parameters; and (3) a
predictive distribution for the output.
6.8 Empirical ANN modelling

Developing an empirical ANN model involves the design of the network architecture,
the selection of activation functions of the hidden and output nodes, the training
algorithm and parameters, data normalization methods, training and test datasets,
and performance measures. The following sections focus on how to develop a
multilayer feedforward network. The decisions include: (1) designing the appropriate
architecture, that is, the number of layers, the number of nodes in each layer, and
the number of arcs which interconnect the nodes; (2) selection of the transfer
functions of the hidden and output nodes; (3) selection of the training algorithm;
(4) data normalization methods; (5) training and test samples; and (6) performance
measures.

In the typical multilayer feedforward network, there are one input layer, one output
layer and one or more hidden layers, with each node fully connected to the nodes of
the adjacent layers. The design of the architecture is to
determine the number of input nodes, the number of hidden layers and hidden nodes,
and the number of output nodes. These choices are problem-dependent and there is no
simple clear-cut method for the determination of these parameters.
The number of input nodes corresponds to the number of variables used to forecast
future values. In time series forecasting, the number of input nodes corresponds to
the number of lagged observations used to discover the underlying pattern in the
series. However, too few or too many input nodes can affect either the learning or
the prediction capability of the network.
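For illustration (a generic sketch, not the thesis's code), the lagged observations described above can be arranged into supervised training pairs as follows:

```python
def make_lagged_dataset(series, p):
    """Build (input, target) pairs where each input is the previous p
    observations and the target is the next value in the series.
    p is the number of input nodes of the forecasting network."""
    inputs, targets = [], []
    for i in range(p, len(series)):
        inputs.append(series[i - p:i])   # p lagged observations
        targets.append(series[i])        # value to forecast
    return inputs, targets

X, y = make_lagged_dataset([10, 11, 13, 16, 20, 25], p=3)
```

Here the first training pattern uses [10, 11, 13] as inputs with 16 as the target, so the choice of p directly fixes the network's input layer size.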
Hidden nodes in the hidden layer allow neural networks to capture the feature in the
data and to perform complicated nonlinear mapping between input and output
variables. As for the number of hidden layers in forecasting problems, usually one
hidden layer is enough for ANNs to approximate any complex nonlinear function with
any desired accuracy (Hornik et al., 1989). However, for some specific problems,
using two hidden layers may give more accurate results, especially when one hidden
layer would require an excessive number of hidden nodes.
Determining the number of hidden nodes is done by trial and error. As discussed
before, networks with fewer hidden nodes are preferable as they usually have better
generalization ability and less of an overfitting problem. But networks with too few
hidden nodes may not have enough power to model and learn the data. If there is only
one hidden layer, a suitable initial size is 75% of the size of the input layer
(Bailey and Thompson, 1990).
For a time series forecasting problem, the number of output nodes often corresponds to
the forecasting horizon. There are two types of forecasting: one-step-ahead, which uses
one output node, and multi-step-ahead forecasting. There are two ways of making
multi-step forecasts: The first is called the iterative forecasting, in which the forecast
values are iteratively used as inputs for the next forecasts. In this case, only one output
node is necessary. The second, called the direct method, is to let the neural
network have several output nodes to directly forecast each step into the future.
Results from
Zhang (1994) show that the direct prediction is much better than the iterated method.
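The iterative method can be sketched as follows; `one_step_model` below is a hypothetical stand-in for a trained one-output network:

```python
def iterative_forecast(history, one_step_model, horizon):
    """Multi-step forecasting with a single-output model: each forecast
    is fed back as an input for the next step."""
    window = list(history)
    forecasts = []
    for _ in range(horizon):
        yhat = one_step_model(window)
        forecasts.append(yhat)
        window.append(yhat)          # feed the forecast back as an input
    return forecasts

# Toy stand-in for a trained network: predicts the mean of the last 2 values
def one_step_model(window):
    return sum(window[-2:]) / 2.0

preds = iterative_forecast([1.0, 3.0], one_step_model, horizon=3)
```

The direct method would instead train a network with `horizon` output nodes, avoiding the accumulation of errors that feeding forecasts back into the inputs can cause.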
The transfer function must be differentiable at all points. Thus, sigmoidal and
linear transfer functions are the most commonly used for multilayer feedforward
networks. If the output layer of a multilayer feedforward network uses a sigmoid
transfer function, then the outputs of the network are limited to a small range. If
linear neurons are used for the output layer, the output of the network can take on
any value. In multilayer feedforward networks, the hidden layers therefore usually
use sigmoid transfer functions and the output layer uses a linear transfer function.
Selection of the training algorithm depends on the task type, the neural network
size, and time and memory constraints. For a forecasting task, which belongs to
function approximation, the most popular choice is the Levenberg-Marquardt
algorithm. This training algorithm generally has the fastest convergence and is able
to obtain lower mean square errors than other algorithms for function approximation
problems. If the training data set is small, Bayesian Regularization can be combined
with the Levenberg-Marquardt algorithm to overcome overfitting.
To make training more efficient, it is necessary to scale the inputs and outputs to
within the range [-1, 1] before training. Normalization can be realized using simple
linear scaling functions.
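A minimal sketch of such a normalization (min-max scaling to [-1, 1], together with the inverse map needed to convert forecasts back to original units):

```python
def scale_to_range(values, lo=-1.0, hi=1.0):
    """Linearly map values into [lo, hi] based on their min and max."""
    vmin, vmax = min(values), max(values)
    scaled = [lo + (hi - lo) * (v - vmin) / (vmax - vmin) for v in values]
    # Keep vmin/vmax so the transform can be inverted after forecasting
    return scaled, (vmin, vmax)

def unscale(scaled_value, bounds, lo=-1.0, hi=1.0):
    """Invert the scaling, e.g. to express forecasts in original units."""
    vmin, vmax = bounds
    return vmin + (scaled_value - lo) * (vmax - vmin) / (hi - lo)

scaled, bounds = scale_to_range([2.0, 4.0, 6.0])
```

Note that the saved bounds come from the training data only; applying them to test data keeps the two sets on the same scale.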
To develop a forecasting ANN model, a training data and a test data are typically
required. The training sample is used for developing the model and the test sample is
used for evaluating the forecasting ability of the model. In early stopping techniques
the validation sample is utilized to determine the stopping point of the training process.
In Bayesian regularization, a separate validation set is not required; only a test
set is used for evaluation. To separate data into the training and test sets, it is
necessary to consider
factors such as the problem requirement, data type and size of the available data and it
is critical to have both the training and test sets representative of the
population. The separation of the training and test sets will affect the selection
of the optimal ANN structure and the evaluation of ANN forecasting performance. Most
authors select them based on the rule of 90% vs. 10% (Zhang et al., 1998). Some
choose them based on other proportions. There are a
number of measures of accuracy in the forecasting literature and each has advantages
and limitations. The most frequently used is the mean absolute percentage error
(MAPE) (Zhang, et al., 1998). Others include the mean absolute deviation (MAD), the
sum squares of the network errors (SSE) on the training set, the mean squared error
(MSE), the root mean squared error (RMSE). The latter four measures are absolute
measures and are of limited value when used to compare different time series.
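These measures can be computed as follows (a standard formulation; exact definitions vary slightly across the literature):

```python
def forecast_errors(targets, forecasts):
    """Return common forecast accuracy measures for a test set."""
    n = len(targets)
    errs = [t - f for t, f in zip(targets, forecasts)]
    mad = sum(abs(e) for e in errs) / n                # mean absolute deviation
    sse = sum(e ** 2 for e in errs)                    # sum of squared errors
    mse = sse / n                                      # mean squared error
    rmse = mse ** 0.5                                  # root mean squared error
    # MAPE is relative, so it can compare accuracy across different series
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errs, targets)) / n
    return {"MAD": mad, "SSE": sse, "MSE": mse, "RMSE": rmse, "MAPE": mape}

m = forecast_errors([100.0, 200.0], [110.0, 190.0])
```

The absolute measures (MAD, SSE, MSE, RMSE) depend on the scale of the series, which is why MAPE is usually preferred when comparing forecasts across series.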
6.9 Justification for the choice of ANN to predict TFP growth
The factors affecting TFP growth are highly interactive. The underlying
relationships between TFP growth and the factors affecting TFP growth in the
construction industry of Singapore are very complex and have not yet been clearly
understood. The traditional regression methods, which require the functional form to
be specified in advance, are not conducive to such complex multi-attribute
non-linear mappings. Besides, neural networks are superior to traditional methods
for the purpose of determining more complex relationships in a set of data where the
relationships between the data largely remain unknown. They are also able to model
complex non-linear relationships with higher accuracy (Goh, 1996, 1998; Portas and
AbouRizk, 1997). For time-series forecasting, the capabilities of ANNs are
particularly remarkable (Zhang et al., 1998). One of the widely used traditional
time-series models, the Autoregressive Integrated Moving Average (ARIMA) method (Box
and Jenkins, 1976), is linear. However, real
world systems are often nonlinear (Granger and Terasvirta, 1993). But nonlinear time
series models such as the bilinear model, the threshold autoregressive (TAR) model,
and the autoregressive conditional heteroscedastic (ARCH) model are still subject to
the assumption of an explicit formulation for the data series, despite the fact that
the underlying relationship for the data series may not be clear, and a pre-specified
nonlinear model may not be general enough to capture all the important features.
ANNs, which are nonlinear data-driven approaches as opposed to the above model-based
methods, do not require a priori knowledge about the relationships between input and
output variables. Studies in many fields indicate that neural networks can predict
nonlinear time series with higher accuracy than traditional statistical and
mathematical models (e.g. Lapedes and Farber, 1987; Deppisch et al., 1991; Li et
al., 1990; De Groot and Wurtz, 1991; Goh, 1996, 1998).
As ANNs are a more flexible modelling tool for forecasting, this study will use this
method to forecast TFP growth of the construction industry in Singapore. As the data
set of this study is small (only 59 time-series observations), an overfitting
problem is highly likely to occur; BNNs are therefore chosen for predicting TFP
growth. BNNs can solve the overfitting problem through automatically controlling the
model complexity. Moreover, a BNN is also superior to a conventional neural network
in that no separate validation set is required.

6.10 Summary

A review of ANNs was carried out in this chapter. It consists of: (1) an overview of
ANNs; (2) basic concepts of ANNs; (3) the overfitting problem and regularization;
(4) Bayesian Neural Networks; (5) applications of ANNs to time-series forecasting;
and (6) empirical ANN modelling.
An overview of ANNs was given in Section 6.2. Formal definition, application areas
and advantages of ANNs were discussed. It highlighted that the main advantage of
ANNs is that they can perform complex non-linear mappings with higher accuracy
than traditional statistical models, especially when the relationships among the
data are unknown or hard to describe. The applications of ANNs in construction
management and economics were reviewed in Section 6.3. It was found that ANNs are
most frequently
used for forecasting work and that the three-layered feedforward neural network and
Backpropagation algorithms are the most common topology and training algorithm
adopted.
Section 6.5 discussed the overfitting problem and regularization. It was noted that
if the training data set is small, the neural network tends to memorize the examples
and cannot generalize well to new cases. This common overfitting problem can be
overcome by early stopping or regularization, with regularization the more effective
remedy for small data sets.

Section 6.6
reviewed the applications of BNNs and then explained how to apply Bayes rules and
Gauss-Newton approximation to optimise the neural network parameters. The major
advantage of BNNs is that they can automatically control model complexity without
resorting to a trial-and-error search.
Section 6.7 reviewed the applications of ANNs for time-series forecasting. A
critical study of different neural networks for time-series forecasting was carried
out, covering feedforward, recurrent, evolutionary, fuzzy and Bayesian neural
networks.
Section 6.8 explained how to carry out empirical ANN modelling, including how to
design the architecture of the multilayer feedforward network and the selection of
the transfer function, training algorithm, data normalization, training and testing
samples, and performance measures.
Finally, Section 6.9 investigated the feasibility of ANNs, in particular BNNs, for
forecasting construction industry-level TFP growth. Two key reasons were highlighted
for the choice of BNNs. First, the underlying relationship between the factors
affecting TFP growth and TFP growth is very complex. Second, the dataset of this
study is small, which makes overfitting highly likely; BNNs can overcome this
through automatic control of model complexity.