
CHAPTER 6

ARTIFICIAL NEURAL NETWORKS: THEORY AND APPLICATIONS

6.1 Introduction

As discussed in Chapter 4, the relationship between TFP growth and the factors (variables) affecting it is complex. First, the factors affecting TFP growth are intricately interrelated. Second, the relationship between these factors and TFP growth may be non-linear. This suggests that a conventional statistical forecasting model is not capable of modelling such an arbitrarily complex non-linear mapping. In the last decade, it has been widely recognized that Artificial Neural Networks (ANNs) are superior to traditional statistical models when the relationship between output and input variables is implicit, complex and nonlinear. For this reason, this study applies ANN technology to develop the forecasting model, and it is therefore necessary to review the theory and applications of ANNs here.

Section 6.2 provides an overview of ANNs, including the definition of ANNs, areas of application and the advantages of ANNs over traditional mathematical methods. Section 6.3 reviews the applications of ANNs in the area of construction management and economics. Basic ANN components and theories, such as artificial neural systems, Processing Elements (PEs), topology, threshold functions, learning rules and convergence rules, are explained with illustrations in Section 6.4.

Section 6.5 discusses the overfitting problem and regularization. The overfitting problem, the most common problem encountered by ANNs when the dataset is small, is explained in Section 6.5.1. Regularization, a technique applied to overcome overfitting, is discussed in Section 6.5.2.

Bayesian neural networks (BNNs), the combination of the well-established Bayesian Regularization and neural networks, are discussed in Section 6.6. A review of the applications of BNNs is provided in Section 6.6.1. Section 6.6.2 explains the theory of BNNs, focusing on the main objective, that is, to optimise the regularization parameters.

Section 6.7 reviews the applications of ANNs to time-series forecasting. It evaluates the advantages and disadvantages of different neural network models applied to time-series forecasting, including feedforward neural networks, recurrent networks, evolutionary ANNs, neuro-fuzzy networks, neuro-wavelet networks and Bayesian neural networks.

Section 6.8 explains how to carry out empirical ANN modelling. It concentrates on the know-how of developing a multilayer feedforward network. This involves the design of the architecture of the multilayer feedforward network and the selection of the transfer function, training algorithm, data normalization method, training and testing samples, and performance function.

Justification for the choice of ANNs, in particular Bayesian neural networks (BNNs), to predict TFP growth is provided in Section 6.9.

6.2 An overview of ANNs

In the last decade, Artificial Intelligence (AI) techniques such as Artificial Neural Networks (ANNs) have received a great deal of attention. In essence, an ANN is an information technology that mimics the human brain and nervous system: it learns from experience and generalizes from previous examples to generate new outputs by abstracting essential characteristics from inputs and encoding them in the pattern of variable interconnection weights among the processing elements. ANNs are more powerful than traditional methods in situations where the problem requires qualitative or complex quantitative reasoning that conventional statistical and mathematical methods handle inadequately, or where the parameters are highly interdependent and the data are intrinsically noisy, incomplete or error-prone (Bailey and Thompson, 1990).

ANNs have many advantages over traditional methods of modelling. Firstly, as opposed to traditional mathematical and statistical methods, ANNs are data-driven, self-adaptive methods, which can capture subtle functional relationships among the data even if the underlying relationships are unknown or hard to describe. Secondly, ANNs are able to capture complex non-linear relationships with better accuracy (Rumelhart et al., 1994). Thirdly, the most important advantage of ANNs over mathematical and statistical models is their adaptability: ANN systems can automatically adjust their weights to optimise their behaviour (Boussabaine, 1996). Neural networks have been utilized for classification, clustering, vector quantization, pattern association, function approximation, control, optimisation and search.

6.3 Applications of ANNs in the construction industry

Moselhi et al. (1991) discussed the potential applications of ANNs in the construction industry in the early 1990s and, in 1996, Boussabaine (1996) reviewed the use of ANNs in construction management. So far, ANNs have been used for prediction, risk analysis, decision-making, resource optimization, classification and selection.

The most common application of ANNs in the construction management area is

prediction. ANNs have been applied to predict tender bids (Gaarslev, 1991; McKim,

1993; Li and Love, 1999), construction cost (Williams, 1994, 2002; Adeli and Wu,

1998; Hegazy and Ayed, 1998; Emsley, 2002), construction budget performance

(Chua et al., 1997), project cash flow (Boussabaine and Kaka, 1998), construction

demand (Goh 1996; 2000), labour productivity (Chao and Skibniewski, 1994; Portas

and AbouRizk, 1997; Savin and Fazio, 1998; AbouRizk et al., 2001), earthmoving

operation (Shi, 1999), the acceptability of a new technology (Chao and Skibniewski,

1995), organizational effectiveness (Sinha and Mckim, 2001), contractor

prequalification (Lam et al., 2001) and hoisting time of tower cranes (Tam et al., 2002).

Multi-layer feedforward networks and the Backpropagation (BP) training algorithm were the most popular topology and learning method for prediction. However, several other neural networks were developed to cope with different data problems. The regularization neural network was used by Adeli and Wu (1998) to deal with the noise in highway construction costs. The regularization neural network has advantages over BP in that the result of the estimation depends only on the training examples and that it can overcome the overfitting problem. When the predicted dependent variables are subject to uncertainty and based on subjective judgement, fuzzy neural network (FNN) models, which combine fuzzy set and neural network techniques, have been developed to improve the objectivity of the prediction. Successful applications of FNNs include those by Portas and AbouRizk (1997), Lam et al. (2001) and AbouRizk (2001). Their studies reveal the benefit of FNN models over the general feedforward neural network (GFNN) in producing more accurate models.

However, the selection of an appropriate topology for a multilayer network used to be conducted by trial and error. To automate the search for an optimal architecture for ANNs, one solution was to combine genetic algorithms (GAs) with neural networks (Goh, 2000). GAs are artificial intelligence search methods based on the theories of genetics and natural selection developed by Holland (1975). The combined technique was found to be able to produce more accurate forecasts than the ANN technique alone.

Another important application of ANNs in construction management is optimisation. So far, two types of optimisation algorithms have been used to find a global minimum in order to avoid the local minima that NNs are prone to. One is GAs and the other is simulated annealing (SA). Yeh (1995) employed SA and a Hopfield neural network to optimise construction-site layout. SA is a probabilistic hill-climbing search

algorithm which can find a global minimum of the performance function by combining

gradient descent with a random process. However, the drawback of the SA is that it is

very slow. Contrasted with the SA, GAs are less susceptible to being stuck at the local

minimum and can quickly locate high performance regions in extremely complex

search spaces. GAs have three major applications: to optimise weights in NNs; to

specify the topology for NNs; and to select optimum smoothing factors for adaptive

probabilistic neural networks (APNNs). Hegazy and Ayed (1998) applied GAs to

optimise the network weights when developing a parametric cost-estimating model for

highway projects. Goh (2000) used GAs to seek the optimum architecture of NNs.

Sawhney and Mund (2001) used GAs to select optimum smoothing factors in APNNs

to develop an integrated crane type and model selection system.

For classification or selection, a multilayer neural network was used by Cheung et al. (2000) to conduct project dispute satisfaction classification. Sawhney and Mund (2001) used APNNs based on the Bayesian classifier method to conduct crane type and model selection. APNNs can model any non-linear function using a single hidden layer with as many PEs as there are training cases.

6.4 Basic Concepts of ANNs

An artificial neural network is a computational model defined by four parameters: type

of neurons, connection architecture, learning algorithm and recall algorithm (Mehrotra,

et al., 1997).

6.4.1 Artificial neural systems

An ANN is an information processing technology that simulates the human brain and nervous system. It is built on three basic components: processing elements (PEs), which are an artificial model of the human neuron; interconnections, whose function is similar to that of the axon; and synapses, which are the junctions where an interconnection
meets a PE. Each PE receives signals from other PEs that constitute an input pattern.

This input pattern stimulates the PE to reach some level of activity. If the activity is

strong enough, the PE generates a single output signal that is transmitted to other PEs

through an interconnection.

6.4.2 Processing elements

Figure 6.1 describes a typical artificial neuron. The input signals come from either the environment or the outputs of other PEs and form an input vector:

$A = (a_1, \ldots, a_i, \ldots, a_n)$  (6-1)

where $a_i$ is the activity level of the $i$th PE or input. There are weights attached to the input connections: $w_{1j}, w_{2j}, \ldots, w_{nj}$. The neuron has a bias $b_j$. The sum of the weighted inputs and the bias forms the net input signal $X$:

$X = \sum_{i=1}^{n} w_{ij} a_i + b_j = W \cdot A + b_j$  (6-2)

The net input signal is then sent to a transfer function, which serves as a non-linear threshold. The transfer function calculates the output signal of PE(j) as:

$O_j = f(X)$  (6-3)

where $O_j$ is the output signal from PE(j); $f$ is a transfer function; and $X$ is the net input signal to PE(j).
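A minimal sketch of equations (6-2) and (6-3) for a single processing element is given below, assuming a log-sigmoid transfer function; the input, weight and bias values are illustrative and not taken from the text.

```python
import numpy as np

def processing_element(a, w, b):
    """Compute the output of one PE: O = f(sum_i w_i * a_i + b)."""
    x = np.dot(w, a) + b             # net input signal X (equation 6-2)
    return 1.0 / (1.0 + np.exp(-x))  # log-sigmoid transfer function f (equation 6-3)

a = np.array([0.5, -0.2, 0.8])       # input vector A = (a1, a2, a3)
w = np.array([0.1, 0.4, -0.3])       # connection weights
b = 0.05                             # bias
print(processing_element(a, w, b))
```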

[Figure 6.1 A generic processing element: inputs $a_1, \ldots, a_n$ weighted by $w_{1j}, \ldots, w_{nj}$, bias $b_j$, and output $O_j = f(\sum_i a_i w_{ij} + b_j)$.]

6.4.3 Threshold functions

There are many threshold functions adopted in ANNs. The two most commonly used transfer functions are the linear and the sigmoid functions.

The linear threshold function:

$f(x) = x$  (6-4)

The sigmoid functions. The Log-Sigmoid and Tan-Sigmoid transfer functions are commonly used in backpropagation networks, partly because in backpropagation it is important to be able to calculate the derivatives of any transfer function used (Demuth and Beale, 2000). They can be expressed as the following equations:

logistic function: $f(x) = (1 + e^{-x})^{-1}$  (6-5)

hyperbolic tangent: $f(x) = \tanh(x)$  (6-6)
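The three transfer functions in equations (6-4) to (6-6) can be written directly; the sketch below is a hedged illustration and the function names are descriptive labels, not toolbox identifiers.

```python
import numpy as np

def linear(x):
    return x                          # linear threshold function, equation (6-4)

def logsig(x):
    return 1.0 / (1.0 + np.exp(-x))   # logistic (Log-Sigmoid), equation (6-5)

def tansig(x):
    return np.tanh(x)                 # hyperbolic tangent (Tan-Sigmoid), equation (6-6)

x = np.linspace(-3, 3, 7)
print(linear(x), logsig(x), tansig(x), sep="\n")
```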

6.4.4 Architecture of ANNs

Architecture of an ANN is the organisation that assembles PEs into layers and links

them with weighted interconnections. The architecture determines how computations

proceed. A common ANN architecture is determined by three distinguishing

characteristics: connection types, connection schemes and layer configurations.

The most commonly used ANN paradigm is the multilayer perceptron (MLP). An MLP consists of an input layer, at least one hidden layer, and one output layer. The neurons in each layer are usually fully connected to the neurons in the adjacent layers. Among MLPs, the three-layer feedforward network is the most popular. A feedforward network is a type of network in which connections are allowed from a node in layer i only to nodes in layer i + 1. The three layers are the input layer, the hidden layer and the output layer. The input layer receives input signals from the environment, the output layer emits signals to the environment, and hidden layers are the layers between the input and output layers.

6.4.5 Learning rules

Learning makes it possible to modify behaviour in response to the environment. A learning rule is a procedure for modifying the weights of the connections between the nodes and the biases of a network. There are three broad learning categories: supervised learning, unsupervised learning and reinforcement learning.

6.4.6 Convergence

Convergence is the eventual minimization of the error between the desired and computed PE outputs. One common convergence criterion is convergence in the mean-square sense:

$\lim_{n \to \infty} E\{\| x_n - x \|^2\} = 0$  (6-7)

where $E\{\cdot\}$ denotes the expected value.

6.5 Overfitting problems and regularization

As stated before, BP is the most commonly used ANN learning technique. Standard backpropagation is a gradient descent algorithm in which the network weights and biases are modified in the direction in which the performance function decreases most rapidly. Multilayer feedforward networks with BP are capable of performing arbitrary linear or multivariate non-linear computations and can approximate any continuous function to a desired accuracy. However, the BP algorithm is slow to converge and may cause overfitting. To speed up the BP training process, several faster BP variants have been developed. Among them, the Levenberg-Marquardt algorithm generally has the fastest convergence and is able to obtain lower mean squared errors than other algorithms for function approximation problems (Demuth and Beale, 2000).
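The core of the gradient descent idea mentioned above can be illustrated with a minimal sketch: the parameters are repeatedly moved a small step in the direction of steepest decrease of a performance function. The quadratic performance function and the learning rate below are hypothetical placeholders, not the network's actual error surface.

```python
import numpy as np

def gradient_descent_step(w, grad_F, learning_rate=0.1):
    """One BP-style update: w_new = w - eta * dF/dw."""
    return w - learning_rate * grad_F(w)

# Toy performance function F(w) = ||w||^2, whose gradient is 2w.
grad_F = lambda w: 2.0 * w
w = np.array([1.0, -2.0])
for _ in range(50):
    w = gradient_descent_step(w, grad_F)
print(w)  # approaches the minimum at the origin
```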

6.5.1 Overfitting problems

The goal of neural network training is to minimize the errors while ensuring that the trained neural network responds properly when presented with new inputs. Overfitting is a phenomenon whereby the neural network has memorized the training examples so that it fails to generalize to new situations. Overfitting may occur when the data set for training is small. The larger the network, the more complex the functions it can create. However, the more complex the network, the more likely it is to mistakenly model the noise in the data as part of the non-linear relationship, leading to over-fitting the data. One solution to overfitting is to add more training examples. However, it is difficult to know how large the network should be for a specific application, and it is also difficult to overcome overfitting if the training examples are in limited supply. Fortunately, there are two other useful techniques to overcome this problem: early stopping and regularization.

According to Sarle (1995) and Demuth and Beale (2000), when conducting function approximation training, Bayesian Regularization provides better generalization performance than early stopping. This is because, unlike early stopping, which separates validation data from the training data, Bayesian Regularization uses all the data for training. When the size of the data set is small or there is little noise in the data set, the advantage of Bayesian Regularization over early stopping is even more marked: their experiments show that, on average, the MSE obtained from Bayesian Regularization is only around one fifth of that of early stopping. Therefore, this study will apply Bayesian Regularization to avoid over-fitting.

6.5.2 Regularization

Regularization improves generalization by constraining the size of the network weights. When the weights are small, the network response will be smooth. According to Foresee and Hagan (1997), with regularization, any modestly oversized network should be able to sufficiently represent the true function.

The typical performance function used for training a multilayer feedforward network is the mean sum of squares of the network errors (MSE):

$F = \mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}(t_i - o_i)^2$  (6-8)

To improve generalization, the performance function is modified by adding a term that consists of the mean of the sum of squares of the network weights and biases (MSW):

$\mathrm{MSEreg} = \gamma\,\mathrm{MSE} + (1-\gamma)\,\mathrm{MSW}$, with $\mathrm{MSW} = \frac{1}{n}\sum_{j=1}^{n} w_j^2$  (6-9)

where $\gamma$ is the performance ratio; MSEreg is the performance function for regularization; MSE is the mean sum of squares of the network errors; and MSW is the mean of the sum of squares of the network weights and biases.
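A minimal sketch of the regularized performance function in equations (6-8) and (6-9) is shown below, assuming a performance ratio gamma in [0, 1]; the target, output and weight arrays are illustrative only.

```python
import numpy as np

def mse_reg(targets, outputs, weights, gamma=0.9):
    mse = np.mean((targets - outputs) ** 2)   # equation (6-8)
    msw = np.mean(weights ** 2)               # mean of the squared weights
    return gamma * mse + (1.0 - gamma) * msw  # equation (6-9)

t = np.array([1.0, 0.5, -0.3])                # targets
o = np.array([0.9, 0.6, -0.1])                # network outputs
w = np.array([0.2, -0.7, 1.1, 0.05])          # network weights and biases
print(mse_reg(t, o, w))
```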

The improved performance function will cause the network to have smaller weights and biases and, hence, will result in a smoother network response that is less likely to overfit. However, it is difficult to determine the optimum value for the performance

ratio parameter. To overcome this difficulty, MacKay (1992) introduced Bayesian Regularization. In this technique, the weights and biases of the network are assumed to

be random variables with specific distributions. The regularization parameters are

related to the unknown variances associated with these distributions. Then statistical

techniques can be used to estimate these parameters. Application of the Bayesian

Regularization will be discussed in the next section.

6.6 Bayesian Neural Networks

Bayesian Neural Networks (BNNs) are a combination of Bayesian rules and neural networks used to automatically determine the optimal regularization parameters. MacKay (1992) was the first to introduce the Bayesian approach to neural network training and to optimize regularization based on a Gaussian approximation. Neal (1993) adopted the Monte Carlo method as the computational technique to implement Bayesian neural nets. Foresee and Hagan (1997) used a Gauss-Newton approximation to the Hessian matrix and the Levenberg-Marquardt algorithm to implement Bayesian Regularization to train feedforward neural networks.

6.6.1 Applications of BNNs

BNNs have been utilized in many areas, but not yet in construction. A BNN model was used by Cool et al. (1997) for predicting yield and ultimate tensile strength in welds. Cherian et al. (2000) used BNNs to predict the mechanical properties of ferrous powder materials, and the model was found to produce good prediction accuracy. A BNN-based model for determining the main particulars of a ship at the initial design stage is described by Clausen et al. (2001). BNNs have also been used by Somers (2001) in assessing nonlinearities in the relationship between work attitudes and job performance. Aminian (2001) developed an analog circuit fault diagnostic system applying BNNs.

6.6.2 Theory of Bayesian Regularization

The main objective of BNNs is to model the relationship in the data without overfitting the noise, by optimizing the regularization parameters. As discussed before, one of the drawbacks of ANNs is that the optimal architecture has to be chosen by trial and error. Compared with conventional neural networks, BNNs can automatically control model complexity by estimating the effective number of parameters of the network.

According to MacKay (1997), Bayesian probability theory offers several benefits in data modelling:

- The overfitting problem can be solved by using Bayesian methods to control model complexity.
- Probabilistic modelling handles uncertainty in a natural manner; there is a unique prescription for incorporating uncertainty about parameters into predictions.
- One can define more sophisticated probabilistic models which are able to extract more information from the data.

According to MacKay (1997), optimization of the model control parameters has four important advantages:

- No test set or validation set is involved, so all available training data can be devoted to both model fitting and model comparison.
- Regularization constants can be optimized on-line, i.e. simultaneously with the optimization of the ordinary model parameters.
- The Bayesian objective function is not noisy, in contrast to a cross-validation measure.
- The gradient of the evidence with respect to the control parameters can be evaluated, making it possible to simultaneously optimize a large number of control parameters.

Bayesian approaches are mostly implemented for multilayer feedforward neural networks. A network is trained using a data set of inputs and targets D by adjusting the weights w so as to minimize an error function:

$D = \{(x_1, t_1), (x_2, t_2), \ldots, (x_n, t_n)\}$  (6-10)

where D is the training set, $x_i$ is the $i$th set of inputs and $t_i$ is the $i$th target output. It is assumed that the $i$th target of the network is generated by

$t_i = g(x_i) + \epsilon_i$  (6-11)

where $g(x_i)$ is an unknown function and $\epsilon_i$ is independent Gaussian noise. The objective of the training is to minimize the sum of squares of the network errors:

$E_D(\mathbf{w}) = \sum_{i=1}^{n} (t_i - o_i)^2$  (6-12)

where $o_i$ is the network output.

It is possible to improve generalization by adding a term; the objective function is modified as:

$F = \beta E_D + \alpha E_W$  (6-13)

where F is the modified performance function; $E_D$ is the sum of squares of the network errors; $E_W = \sum_i w_i^2$ is the sum of squares of the network weights; and $\alpha$ and $\beta$ are objective function parameters which determine the complexity of the model. $\alpha$ controls the weight distribution in the network and, hence, its nonlinear mapping ability. Noise in the data is represented by $\beta$, which is the inverse of the variance due to the noise. If $\alpha \gg \beta$, training will emphasize weight size reduction and produce a smoother network response. If $\beta \gg \alpha$, the training algorithm will drive the errors smaller. The objective of regularization is to optimise the parameters $\alpha$ and $\beta$.

6.6.2.1 Inferring the weights w for given values of $\alpha$ and $\beta$

In the Bayesian framework, the weights of the network are considered random variables. Consider the objective function $F = \beta E_D + \alpha E_W$. After the data D are observed, the density function for the set of weights w can be updated by applying Bayes' rule:

$P(\mathbf{w} \mid D, \alpha, \beta, M) = \dfrac{P(D \mid \mathbf{w}, \beta, M)\, P(\mathbf{w} \mid \alpha, M)}{P(D \mid \alpha, \beta, M)}$  (6-14)

where M is the specific functional form of the neural network model used; $P(\mathbf{w} \mid D, \alpha, \beta, M)$ is the posterior probability of w; $P(\mathbf{w} \mid \alpha, M)$ is the prior probability (density) of w; $P(D \mid \mathbf{w}, \beta, M)$ is the likelihood function of w; and $P(D \mid \alpha, \beta, M)$ is a normalization factor, or the evidence for $\alpha$ and $\beta$.

Under the assumption that the distribution of the noise in the target variable t is Gaussian¹ and that the prior probability distribution for the weights is Gaussian, the likelihood function and prior density can be represented as:

¹ The assumption of Gaussian distributions simplifies the calculations involved in arriving at the equations and reduces the computational burden in the on-line optimisation of the hyper-parameters. In real cases, these assumptions give satisfactory results (MacKay, 1992).

$P(D \mid \mathbf{w}, \beta, M) = \dfrac{1}{Z_D(\beta)} \exp(-\beta E_D)$ and $P(\mathbf{w} \mid \alpha, M) = \dfrac{1}{Z_W(\alpha)} \exp(-\alpha E_W)$  (6-15)

where

$Z_D(\beta) = (\pi/\beta)^{n/2}$ and $Z_W(\alpha) = (\pi/\alpha)^{N/2}$  (6-16)

The posterior probability can then be written as:

$P(\mathbf{w} \mid D, \alpha, \beta, M) = \dfrac{\frac{1}{Z_D(\beta)}\frac{1}{Z_W(\alpha)} \exp\!\big(-(\beta E_D + \alpha E_W)\big)}{\text{Normalization factor}} = \dfrac{1}{Z_F(\alpha, \beta)} \exp\!\big(-F(\mathbf{w})\big)$  (6-17)

The optimal weights are inferred by maximising the posterior probability $P(\mathbf{w} \mid D, \alpha, \beta, M)$, which is equivalent to minimising the regularized objective function $F = \beta E_D + \alpha E_W$.

6.6.2.2 Optimising the regularization parameters $\alpha$ and $\beta$

The control parameters $\alpha$ and $\beta$ determine the complexity of the model. To infer $\alpha$ and $\beta$, Bayes' rule is applied again, and the posterior probability of the parameters $\alpha$ and $\beta$ can be written as:

$P(\alpha, \beta \mid D, M) = \dfrac{P(D \mid \alpha, \beta, M)\, P(\alpha, \beta \mid M)}{P(D \mid M)}$  (6-18)

Assuming a uniform prior density $P(\alpha, \beta \mid M)$ for the regularization parameters $\alpha$ and $\beta$, optimising the posterior probability of the parameters can be achieved by maximizing the likelihood function $P(D \mid \alpha, \beta, M)$.

From equation 6-14, the normalization factor can be solved as:

$P(D \mid \alpha, \beta, M) = \dfrac{P(D \mid \mathbf{w}, \beta, M)\, P(\mathbf{w} \mid \alpha, M)}{P(\mathbf{w} \mid D, \alpha, \beta, M)} = \dfrac{\frac{1}{Z_D(\beta)}\exp(-\beta E_D)\, \frac{1}{Z_W(\alpha)}\exp(-\alpha E_W)}{\frac{1}{Z_F(\alpha, \beta)}\exp(-F(\mathbf{w}))} = \dfrac{Z_F(\alpha, \beta)}{Z_D(\beta)\, Z_W(\alpha)}$  (6-19)

In the above equation, only $Z_F(\alpha, \beta)$ is unknown. To estimate it, a Taylor series expansion is used. Since the objective function has the shape of a quadratic in a small area surrounding a minimum point, $F(\mathbf{w})$ is expanded around the minimum point of the posterior density, $\mathbf{w}^{MP}$, where the gradient is zero. Solving for the normalizing constant, one obtains:

$Z_F(\alpha, \beta) \approx (2\pi)^{N/2}\, \big(\det(\mathbf{H}^{MP})\big)^{-1/2} \exp\!\big(-F(\mathbf{w}^{MP})\big)$  (6-20)

where $\mathbf{H} = \beta \nabla^2 E_D + \alpha \nabla^2 E_W$ is the Hessian matrix of the objective function and $\mathbf{w}^{MP}$ is the parameter vector which minimises the objective function $F = \beta E_D + \alpha E_W$.

By substituting equation 6-20 into equation 6-19, taking the derivative of the logarithm of equation 6-19 with respect to each parameter and setting the derivatives to zero, the optimal values of $\alpha$ and $\beta$ at the minimum point are obtained. They satisfy, respectively:

$\alpha^{MP} = \dfrac{\gamma}{2 E_W(\mathbf{w}^{MP})}$ and $\beta^{MP} = \dfrac{n - \gamma}{2 E_D(\mathbf{w}^{MP})}$  (6-21)

where $\gamma = N - 2\alpha^{MP}\, \mathrm{tr}(\mathbf{H}^{MP})^{-1}$ is the effective number of parameters in the neural network used in reducing the error function, with values between 0 and N, and N is the total number of parameters in the network.

To compute the Hessian matrix $\mathbf{H}^{MP}$ of $F(\mathbf{w})$ at the minimum point $\mathbf{w}^{MP}$, two alternative methods have been used. One is the Gauss-Newton approximation and the other is the Monte Carlo method developed by Neal (1996). The Gauss-Newton approximation to the Hessian matrix is widely used because it is readily available if the Levenberg-Marquardt (LM) optimisation algorithm is used to find the minimum point.

The Levenberg-Marquardt algorithm is a modification of the Gauss-Newton method. Consider a function $V(\mathbf{x})$; to minimize it with respect to the parameter vector $\mathbf{x}$, Newton's method gives:

$\Delta \mathbf{x} = -[\nabla^2 V(\mathbf{x})]^{-1}\, \nabla V(\mathbf{x})$  (6-22)

where $\nabla^2 V(\mathbf{x})$ is the Hessian matrix and $\nabla V(\mathbf{x})$ is the gradient. Assume $V(\mathbf{x})$ is a sum of squares function:

$V(\mathbf{x}) = \sum_{i=1}^{N} e_i^2(\mathbf{x})$  (6-23)

Then the following is obtained:

$\nabla V(\mathbf{x}) = \mathbf{J}^T(\mathbf{x})\, \mathbf{e}(\mathbf{x})$  (6-24)

$\nabla^2 V(\mathbf{x}) = \mathbf{J}^T(\mathbf{x})\mathbf{J}(\mathbf{x}) + \mathbf{S}(\mathbf{x})$  (6-25)

where $\mathbf{J}(\mathbf{x})$ is the Jacobian matrix,

$\mathbf{J}(\mathbf{x}) = \begin{bmatrix} \dfrac{\partial e_1(\mathbf{x})}{\partial x_1} & \dfrac{\partial e_1(\mathbf{x})}{\partial x_2} & \cdots & \dfrac{\partial e_1(\mathbf{x})}{\partial x_N} \\ \dfrac{\partial e_2(\mathbf{x})}{\partial x_1} & \dfrac{\partial e_2(\mathbf{x})}{\partial x_2} & \cdots & \dfrac{\partial e_2(\mathbf{x})}{\partial x_N} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial e_N(\mathbf{x})}{\partial x_1} & \dfrac{\partial e_N(\mathbf{x})}{\partial x_2} & \cdots & \dfrac{\partial e_N(\mathbf{x})}{\partial x_N} \end{bmatrix}$  (6-26)

and

$\mathbf{S}(\mathbf{x}) = \sum_{i=1}^{N} e_i(\mathbf{x})\, \nabla^2 e_i(\mathbf{x})$  (6-27)

For the Gauss-Newton method, it is assumed that $\mathbf{S}(\mathbf{x}) \approx 0$; thus,

$\Delta \mathbf{x} = -[\mathbf{J}^T(\mathbf{x})\mathbf{J}(\mathbf{x})]^{-1}\, \mathbf{J}^T(\mathbf{x})\mathbf{e}(\mathbf{x})$  (6-28)

$\mathbf{H} = \nabla^2 V(\mathbf{x}) \approx \mathbf{J}^T(\mathbf{x})\mathbf{J}(\mathbf{x})$  (6-29)

The Levenberg-Marquardt modification to the Gauss-Newton method is:

$\Delta \mathbf{x} = -[\mathbf{J}^T(\mathbf{x})\mathbf{J}(\mathbf{x}) + \mu \mathbf{I}]^{-1}\, \mathbf{J}^T(\mathbf{x})\mathbf{e}(\mathbf{x})$  (6-30)

When $\mu$ is small, Levenberg-Marquardt becomes Gauss-Newton.
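A hedged numerical sketch of the Levenberg-Marquardt update in equation (6-30) is given below, applied to a made-up exponential curve-fitting problem; the fixed damping factor mu, the residual function and the starting values are illustrative assumptions rather than part of the derivation above.

```python
import numpy as np

def lm_step(J, e, mu):
    """Delta x = -(J^T J + mu I)^{-1} J^T e, as in equation (6-30)."""
    n_params = J.shape[1]
    return -np.linalg.solve(J.T @ J + mu * np.eye(n_params), J.T @ e)

# Fit y = x0 * exp(x1 * t) to synthetic data; residuals are e_i = model(t_i) - y_i.
t = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(-1.5 * t)
x = np.array([1.5, -1.0])                              # initial parameter guess
for _ in range(20):
    model = x[0] * np.exp(x[1] * t)
    e = model - y
    J = np.column_stack([np.exp(x[1] * t),             # d e / d x0
                         x[0] * t * np.exp(x[1] * t)])  # d e / d x1
    x = x + lm_step(J, e, mu=0.01)
print(x)  # parameters move towards [2.0, -1.5]
```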

6.6.2.3 Gauss-Newton approximation to Bayesian Regularization

Before training, the training data need to be normalized into the range [-1, 1] so as to achieve better results. Based on the method described above for inferring the weights w and optimising the regularization parameters $\alpha$ and $\beta$, the steps for Bayesian Regularization using the Gauss-Newton approximation to the Hessian matrix are:

1. Initialise $\alpha$, $\beta$ and the weights. Set $\alpha = 0$ and $\beta = 1$, and use the Nguyen-Widrow method of initialising the weights.

2. Take one step of the Levenberg-Marquardt algorithm to minimize the objective function $F = \beta E_D + \alpha E_W$.

3. Compute the effective number of parameters $\gamma = N - 2\alpha^{MP}\, \mathrm{tr}(\mathbf{H})^{-1}$. To compute the Hessian matrix H, the Gauss-Newton approximation available in the Levenberg-Marquardt training algorithm is used: $\mathbf{H} = \nabla^2 F(\mathbf{w}) \approx 2\beta \mathbf{J}^T\mathbf{J} + 2\alpha \mathbf{I}_N$, where J is the Jacobian matrix of the training-set errors. To compute the Jacobian matrix J, refer to Hagan and Menhaj (1994).

4. Compute new estimates for the objective function parameters:

$\alpha = \dfrac{\gamma}{2 E_W(\mathbf{w})}$ and $\beta = \dfrac{n - \gamma}{2 E_D(\mathbf{w})}$

5. Iterate steps 2 through 4 until convergence.
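As a numerical illustration of steps 3 and 4, the following sketch re-estimates alpha, beta and gamma from the Gauss-Newton Hessian $H \approx 2\beta J^T J + 2\alpha I$; the Jacobian, errors and weights here are randomly generated placeholders standing in for the outputs of an actual Levenberg-Marquardt step, so the numbers themselves carry no meaning.

```python
import numpy as np

def update_hyperparameters(J, e, w, alpha, beta):
    """Steps 3-4: effective parameters gamma and new alpha, beta (equation 6-21)."""
    n, N = J.shape                                 # n training errors, N network parameters
    H = 2.0 * beta * (J.T @ J) + 2.0 * alpha * np.eye(N)
    gamma = N - 2.0 * alpha * np.trace(np.linalg.inv(H))
    E_W = np.sum(w ** 2)                           # sum of squared weights
    E_D = np.sum(e ** 2)                           # sum of squared errors
    alpha_new = gamma / (2.0 * E_W)
    beta_new = (n - gamma) / (2.0 * E_D)
    return alpha_new, beta_new, gamma

rng = np.random.default_rng(0)
J = rng.normal(size=(30, 5))                       # placeholder Jacobian of the errors
e = rng.normal(scale=0.1, size=30)                 # placeholder training errors
w = rng.normal(scale=0.5, size=5)                  # placeholder weight vector
print(update_hyperparameters(J, e, w, alpha=0.0, beta=1.0))
```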

With each re-estimation of the objective function parameters, the objective function changes and, therefore, the minimum point moves. If the search generally moves towards the next minimum point, then the new estimates for the objective function parameters will be more precise. Eventually, the objective function will not change significantly in subsequent iterations, which indicates that the precision is good and the training has converged.

6.7 Applications of ANNs to time-series forecasting

Time series forecasting is an important task that has long been conducted in many disciplines, particularly in economics and finance. The techniques applied include traditional statistical methods, such as the Box-Jenkins method and threshold autoregressive models, as well as genetic algorithms and neural networks. Among them, neural networks have been demonstrated to be the most powerful for time-series forecasting (Goh, 1998; Zhang and Fukushige, 2002).

One of the first successful applications of ANNs in forecasting is by Lapedes and Farber (1987). Applying a feedforward neural network to two deterministic chaotic time series, they developed a model that can forecast nonlinear systems with very high accuracy. After Lapedes and Farber's pilot work, many neural networks were developed for time-series prediction. Among them are feedforward neural networks (NNs), recurrent NNs, neuro-fuzzy networks, neuro-wavelet networks, Bayesian NNs and Bayesian evolutionary neural trees. The following sections review each of them briefly.

6.7.1 Feedforward neural networks in time-series forecasting

Feedforward multilayer networks are the most widely used ANNs for forecasting time series because of their straightforwardness. Previous works include Lapedes and Farber (1987), Sharda and Patil (1992), and Tang and Fishwick (1993). Feedforward NNs are capable of conducting stationary time series forecasting with high accuracy. However, the method can only learn an input-output mapping that is static, and it may fail when temporal contingencies span unknown intervals (Zhang and Fukushige, 2002).
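The static input-output mapping mentioned above is typically set up by turning the series into lagged input vectors and one-step-ahead targets. The sketch below illustrates this construction; the synthetic sine series and the lag order p are illustrative assumptions.

```python
import numpy as np

def make_lagged_dataset(series, p):
    """Return inputs X[t] = (y[t-p], ..., y[t-1]) and targets y[t]."""
    X = np.array([series[t - p:t] for t in range(p, len(series))])
    y = np.array(series[p:])
    return X, y

series = np.sin(np.linspace(0, 6 * np.pi, 60))   # toy time series
X, y = make_lagged_dataset(series, p=4)
print(X.shape, y.shape)                          # (56, 4) inputs, (56,) targets
```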

6.7.2 Recurrent networks in time series forecasting

To overcome this disadvantage of feedforward NNs, recurrent networks were developed. Recurrent networks are networks with one or more cycles that apply to time series data and that use the outputs of network units at time t as inputs to other units at time t+1. Recurrent networks are superior to feedforward NNs in dealing with complex stochastic time series. However, one drawback of recurrent networks is that they are very difficult to train and do not generalize reliably (Mitchell, 1997). The design of an efficient architecture and the choice of the parameters require longer processing time (Zhang and Fukushige, 2002). Nevertheless, recurrent networks are very important in time series prediction because of their representational power.

To solve the difficulty in choosing the architecture and parameters of recurrent networks, genetic algorithms (GAs), especially the more powerful Breeder Genetic Algorithms (BGAs), are utilized to optimise the architecture of neural networks and the related parameters. BGAs are especially powerful in designing neural networks for nonlinear systems.

6.7.3 Evolutionary ANNs

The efficient use of GAs (BGAs) to optimise network topology inspired many research studies in evolving ANNs by evolutionary search procedures such as GAs. Evolutionary ANNs (EANNs) were, consequently, developed. EANNs are networks that combine ANNs and evolutionary search procedures. EANNs not only learn, but also adapt to a changing environment. They are adaptive systems that can change their architecture and learning rules appropriately without human intervention.

6.7.4 Neuro-fuzzy networks

There is considerable interest in combining neural networks and fuzzy logic to develop fuzzy neural networks (FNNs) for time series analysis. In FNNs, fuzzy reasoning is used to handle uncertain information and the neural network is used to deal with information related to real data. There are fewer practical applications of FNNs to time series prediction, and most of them are applied to chaotic time series.

6.7.5 Neuro-wavelet networks

Neuro-wavelet networks are a combination of the Dynamical Recurrent Neural Network (DRNN) and the wavelet transform technique. Neuro-wavelet networks have demonstrated the capability to improve the prediction accuracy of conventional neural networks in time series prediction. In neuro-wavelet networks, the wavelet transform is first used to decompose the time series into varying scales of temporal resolution so that the temporal structure of the original time series becomes more tractable. Then, a DRNN is trained on each resolution scale by the temporal recurrent backpropagation (TRBP) algorithm. Finally, the forecasts from each wavelet scale are combined to compute the current estimate.

6.7.6 Bayesian Neural Networks

Conventional neural networks have difficulties in controlling the complexity of the model and lack tools for analyzing output results such as confidence intervals and levels. Bayesian Neural Networks are a combination of the Bayesian approach and neural networks, and are mainly used to solve the overfitting problem in the case of insufficient data series. The Bayesian approach uses probability to quantify uncertainty in inference. The result of Bayesian learning is a probability distribution, and predictions are made by integrating over the posterior distribution. The main advantages of Bayesian neural networks include (Lampinen and Vehtari, 2000): (1) automatic complexity control; (2) the possibility of using prior information and hierarchical models for hyper-parameters; and (3) a predictive distribution of the output.

6.8 Developing a multilayer feedforward network for forecasting

Generally, developing a neural network involves the design of an appropriate architecture, the selection of the activation functions of the hidden and output nodes, the training algorithm and its parameters, data normalization methods, training and test datasets, and performance measures. The following sections focus on how to develop a multilayer feedforward network specifically for forecasting. To develop a multilayer feedforward network, the decisions include: (1) designing the appropriate architecture, that is, the number of layers, the number of nodes in each layer, and the number of arcs which interconnect the nodes; (2) selecting the transfer functions of the hidden and output nodes; (3) selecting the training algorithm; (4) data normalization methods; (5) training and test sets; and (6) performance measures.

6.8.1 Architecture of multilayer feedforward network

In the typical multilayer feedforward network, there are one input layer, one output layer and one or more hidden layers, with each node fully connected to the nodes of the adjacent layers. The first step in designing a multilayer feedforward network is to determine the number of input nodes, the number of hidden layers and hidden nodes, and the number of output nodes. The selection of these parameters is basically problem-dependent and there is no simple, clear-cut method for their determination.

6.8.1.1 Number of input nodes

The number of input nodes corresponds to the number of variables used to forecast future values. In time series forecasting, the number of input nodes corresponds to the number of lagged observations used to discover the underlying pattern in the time series. Currently there is no systematic way to determine this number. However, too few or too many input nodes can affect either the learning or the prediction capability of the network.

6.8.1.2 Number of hidden layers and nodes

Hidden nodes in the hidden layer allow neural networks to capture the features in the data and to perform complicated nonlinear mapping between the input and output variables. Regarding the number of hidden layers in forecasting problems, usually one hidden layer is enough for ANNs to approximate any complex nonlinear function with any desired accuracy (Hornik et al., 1989). However, for some specific problems, using two hidden layers may give more accurate results, especially when a one-hidden-layer network requires too many hidden nodes to give satisfactory results.

Determining the number of hidden nodes is done by trial and error. As discussed before, networks with fewer hidden nodes are preferable as they usually have better generalization ability and less overfitting. But networks with too few hidden nodes may not have enough power to model and learn the data. If there is only one hidden layer, a suitable initial size is 75% of the size of the input layer (Bailey and Thompson, 1990).

6.8.1.3 Number of output nodes

For a time series forecasting problem, the number of output nodes often corresponds to the forecasting horizon. There are two types of forecasting: one-step-ahead forecasting, which uses one output node, and multi-step-ahead forecasting. There are two ways of making multi-step forecasts. The first is iterative forecasting, in which the forecast values are iteratively fed back as inputs for the next forecasts; in this case, only one output node is necessary. The second, called the direct method, is to let the neural network have several output nodes that directly forecast each step into the future. Results from Zhang (1994) show that direct prediction is much better than the iterated method.
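A hedged sketch of the iterative strategy described above is given below; `one_step_model` stands in for a trained one-output network and is replaced here by a hypothetical placeholder that simply averages the lagged inputs.

```python
import numpy as np

def iterative_forecast(one_step_model, history, p, horizon):
    """Feed each forecast back in as an input for the next step (iterative method)."""
    window = list(history[-p:])
    forecasts = []
    for _ in range(horizon):
        y_hat = one_step_model(np.array(window))
        forecasts.append(y_hat)
        window = window[1:] + [y_hat]        # slide the lag window forward
    return forecasts

one_step_model = lambda x: float(np.mean(x))  # placeholder for a trained network
history = [1.0, 1.2, 1.1, 1.3, 1.25]
print(iterative_forecast(one_step_model, history, p=3, horizon=4))
# The direct method would instead train a network with `horizon` output nodes.
```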

6.8.2 Transfer function

Backpropagation algorithms require that the transfer function be differentiable at all points. Thus sigmoidal and linear transfer functions are most commonly used for multilayer feedforward networks that use BP algorithms. If the output layer of a multilayer feedforward network uses a sigmoid transfer function, then the outputs of the network are limited to a small range. If linear neurons are used for the output layer, the output of the network can take on any value. In multilayer feedforward networks, the hidden layers usually use a sigmoid transfer function and the output layer uses a linear transfer function, especially when carrying out a forecasting or function approximation task.

6.8.3 Training algorithm

The selection of the training algorithm depends on the task type, the neural network size, time constraints, memory requirements, accuracy requirements, and other factors. For a forecasting task, which belongs to function approximation, the most popular training algorithms are those of the Backpropagation family. Among them, the Levenberg-Marquardt algorithm generally has the fastest convergence and is able to obtain lower mean squared errors than other algorithms for function approximation problems. If the data set is small, the Bayesian Regularization training algorithm is preferred in order to overcome overfitting.

6.8.4 Data normalization

To make training more efficient, it is necessary to scale the inputs and outputs to within the range [-1, 1] before training. Normalization can be realized using the functions premnmx, postmnmx and tramnmx in MATLAB.
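A hedged Python equivalent of this [-1, 1] scaling is sketched below (the text itself refers to MATLAB's premnmx/postmnmx/tramnmx); the minimum and maximum should be taken from the training data, and the sample values are illustrative.

```python
import numpy as np

def scale_to_pm1(x, x_min, x_max):
    """Map values linearly from [x_min, x_max] to [-1, 1]."""
    return 2.0 * (x - x_min) / (x_max - x_min) - 1.0

def unscale_from_pm1(x_scaled, x_min, x_max):
    """Invert the mapping back to the original units."""
    return (x_scaled + 1.0) * (x_max - x_min) / 2.0 + x_min

train = np.array([3.0, 7.0, 5.0, 9.0])
lo, hi = train.min(), train.max()
scaled = scale_to_pm1(train, lo, hi)
print(scaled, unscale_from_pm1(scaled, lo, hi))
```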

6.8.5 Training sample and test sample

To develop a forecasting ANN model, a training data set and a test data set are typically required. The training sample is used for developing the model and the test sample is used for evaluating the forecasting ability of the model. In early stopping techniques, a validation sample is also utilized to determine the stopping point of the training process. In Bayesian Regularization, only a test set is used, serving both validation and testing purposes. To separate the data into training and test sets, it is necessary to consider factors such as the problem requirements, the data type and the size of the available data, and it is critical to have both the training and test sets representative of the population. For time series forecasting problems, this is particularly important. Inappropriate separation of the training and test sets will affect the selection of the optimal ANN structure and the evaluation of ANN forecasting performance. Most authors select them based on the rule of 90% vs. 10% (Zhang et al., 1998). Some choose them based on their particular problems.
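For a time series, the split is usually chronological: the test set is the most recent segment, so no shuffling is performed. The sketch below illustrates a 90%/10% split under these assumptions; the 59-point series merely echoes the size of the dataset used later in this study and is otherwise a placeholder.

```python
import numpy as np

def chronological_split(series, train_fraction=0.9):
    """Split a time series into earlier (training) and later (test) segments."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]

series = np.arange(59, dtype=float)       # placeholder series of 59 observations
train, test = chronological_split(series)
print(len(train), len(test))              # 53 training points, 6 test points
```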

6.8.6 Performance measures

The performance of the model is normally measured in terms of accuracy. There are a number of measures of accuracy in the forecasting literature and each has advantages and limitations. The most frequently used is the mean absolute percentage error (MAPE) (Zhang et al., 1998). Others include the mean absolute deviation (MAD), the sum of squares of the network errors (SSE) on the training set, the mean squared error (MSE) and the root mean squared error (RMSE). The latter four measures are absolute measures and are of limited value when used to compare different time series.
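The measures listed above have straightforward definitions; the sketch below gives hedged implementations with illustrative target and forecast arrays (targets must be non-zero for MAPE to be defined).

```python
import numpy as np

def mape(t, o):
    return np.mean(np.abs((t - o) / t)) * 100.0   # mean absolute percentage error

def mad(t, o):
    return np.mean(np.abs(t - o))                 # mean absolute deviation

def sse(t, o):
    return np.sum((t - o) ** 2)                   # sum of squared errors

def mse(t, o):
    return np.mean((t - o) ** 2)                  # mean squared error

def rmse(t, o):
    return np.sqrt(mse(t, o))                     # root mean squared error

t = np.array([2.0, 3.0, 5.0, 4.0])                # targets
o = np.array([2.2, 2.7, 5.1, 3.6])                # forecasts
print(mape(t, o), mad(t, o), sse(t, o), mse(t, o), rmse(t, o))
```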

6.9 Justification for the choice of ANN to predict TFP growth

As discussed in Chapter 4, the factors influencing construction industry-level TFP growth are highly interactive. The underlying relationships between TFP growth and the factors affecting TFP growth in the construction industry of Singapore are very complex and have not yet been clearly understood. Traditional regression methods, which require an explicit representation of the relationship in a mathematical or statistical model, are not well suited to such complex multi-attribute non-linear mappings. Besides, traditional models lack the ability to learn by themselves in order to respond adequately to highly correlated, incomplete or previously unknown data. In contrast, neural networks are superior to traditional methods for determining complex relationships in a set of data where the relationships remain largely unknown. They are also able to model complex non-linear relationships with higher accuracy (Goh, 1996, 1998; Portas and AbouRizk, 1997; Boussabaine and Kaka, 1998; Emsley, 2002).

The advantages of ANNs over traditional statistical methods in time-series forecasting are particularly remarkable (Zhang et al., 1998). One of the most widely used traditional models for time-series prediction, the Box-Jenkins or Autoregressive Integrated Moving Average (ARIMA) method (Box and Jenkins, 1976), is linear, whereas real-world systems are often nonlinear (Granger and Terasvirta, 1993). Nonlinear time series models such as the bilinear model, the threshold autoregressive (TAR) model and the autoregressive conditional heteroscedastic (ARCH) model are still subject to the assumption of an explicit formulation for the data series, despite the fact that the underlying relationship for the data series may not be clear, and a pre-specified nonlinear model may not be general enough to capture all the important features.

ANNs, which are nonlinear data-driven approaches as opposed to the above model-based nonlinear methods, are capable of performing nonlinear modelling without a priori knowledge about the relationships between the input and output variables. Studies in many fields indicate that neural networks can predict nonlinear time series with higher accuracy than traditional statistical and mathematical models (e.g. Lapedes and Farber, 1987; Deppisch et al., 1991; Li et al., 1990; De Groot and Wurtz, 1991; Goh, 1996, 1998).

As ANNs are a more flexible modelling tool for forecasting, this study will use this method to forecast TFP growth of the construction industry in Singapore. As the data set of this study is small (only 59 time-series observations), an overfitting problem is highly possible. In order to avoid overfitting, a Bayesian Neural Network is used in predicting TFP growth. BNNs can solve the overfitting problem by automatically controlling the model complexity. Moreover, a BNN is also superior to conventional neural networks in analyzing output results, such as the predictive distribution of the output, confidence intervals and confidence levels. Therefore, a BNN-based model is applied to predict TFP growth in this study.

6.10 Chapter summary

A comprehensive review of the theory and applications of ANN, in particular BNNs,

was carried out in this chapter. It consists of: (1) an overview of ANNs; (2) basic

concepts of ANN; (3) overfitting problem and regularization; (4) Bayesian Neural

Networks (BNNs); (5) application of ANNs to time-series forecasting; and (6)

developing ANNs for training.

An overview of ANNs was given in Section 6.2. The formal definition, application areas and advantages of ANNs were discussed. It was highlighted that the main advantage of ANNs is that they can perform complex non-linear mappings with higher accuracy than traditional statistical models, especially when the relationships among the data cannot be explicitly represented.

Next, the applications of ANNs in the field of construction management and economics were reviewed in Section 6.3. It was found that ANNs are most frequently used for forecasting work and that the three-layer feedforward neural network and the Backpropagation algorithm are the most commonly adopted topology and training algorithm.

The fundamentals of ANNs were explained in Section 6.4, covering artificial neural systems, processing elements, threshold functions, network topology, learning rules, training methods and convergence rules.

The overfitting problem and regularization were discussed in Section 6.5. It was highlighted that if the training data set is small, the neural network tends to memorize the examples and cannot generalize well to new cases. This common overfitting problem can be tackled effectively by two methods: early stopping and regularization, in particular Bayesian Regularization. Regularization copes with the overfitting problem better than early stopping.

Section 6.6, therefore, focused on the Bayesian Regularization technique. It first reviewed the applications of BNNs and then explained how to apply Bayes' rule and the Gauss-Newton approximation to optimise the neural network parameters. The major advantage of BNNs is that they can automatically control model complexity without the need for trial and error.

Section 6.7 reviewed the applications of ANNs for time-series forecasting. A critical

study of different neural networks for time-series forecasting was carried out. The

review covered the feedforward neural network, recurrent networks, evolutionary ANNs, neuro-fuzzy networks, neuro-wavelet networks and Bayesian neural networks.

Section 6.8 explained how to develop a multilayer feedforward network. Rules for designing the architecture of the multilayer feedforward network and for selecting the transfer function, training algorithm, data normalization method, training and testing samples, and performance function were discussed in turn.

Finally, Section 6.9 investigated the feasibility of ANNs, in particular BNNs, for forecasting construction industry-level TFP growth. Two key reasons were highlighted for the choice of BNNs. First, the underlying relationship between TFP growth and the factors affecting it is very complex. Second, the dataset of this study is small, which makes an overfitting problem highly likely.
