net/publication/327573940
All content following this page was uploaded by Layth Abdulbari Al-Jaberi on 11 September 2018.
Chapter Three
ARTIFICIAL NEURAL NETWORKS
3.1 General:
The origins of artificial neural networks (ANN) are in the field of biology.
The biological brain consists of billions of highly interconnected neurons
forming a neural network. Human information processing depends on this
connectionist system of nerve cells. Modeled on this system, artificial neural
networks can exploit the massively parallel local processing and distributed
storage properties of the brain.
A classical comparison of information processing by a human and a computer
focuses on the ability to recognize patterns and to learn. A computer can
perform calculations on large numbers at high speed, but it struggles with tasks
such as classification, reading written text, data compression, and learning.
A human, by contrast, handles these challenges easily by processing
information with highly distributed transformations through thousands of
interconnected neurons in the brain {Jeng et al (2003) [97]}.
3.2 Processing Units:
The fundamental processing element of a neural network is a neuron. This
building block of human awareness encompasses a few general capabilities.
Basically, a biological neuron receives inputs from other sources, combines them
in some way, performs a generally nonlinear operation on the result, and then
outputs the final result {Anderson and McNeill (1992) [98]}.
Hu and Hwang (2002) [100] stated that an artificial neural network is a general
mathematical computing paradigm that models the operations of biological
neural systems. The numerous artificial neural network models that have
been proposed over the years all share a common building block known as
“nodes”, “neurons”, “cells”, or just “units”, depending on who is describing the
network. The most widely used neuron model is based on McCulloch and Pitts’
work {McCulloch and Pitts (1943) [99]}.
Chapter Three___________________________________ Artificial Neural Networks
Rumelhart et al. (1986) [101] stated that processing units may represent:
• A specific concept, such as features, letters, words, etc. (the idea is that
  one processing unit is equal to one concept); or
• An indescribable part of a larger concept, such as a sub-pattern within a
  larger pattern (i.e., the idea is that many processing units may together
  equal one concept).
The main tasks of a processing unit are to receive input from the neighbors
that provide incoming activation, compute an output, and send that output to
the neighbors that receive it. Such a system is inherently parallel, because
many processing units can carry out their computations at the same time. The
processing units in a neural network can be classified as one of three types
[100]:
1. Input processing units, which receive input from external sources,
compute their activation level, compute their output as a function of
an activation level, and transmit this output to the rest of the network;
2. Output processing units, which upon receipt of input from the rest
of the network, compute and broadcast their output to external
receivers or feed their output back to the input layer of the network
for further processing; and
3. Hidden processing units, which only receive input from, and broadcast
their computed output to, processing units within the network (i.e.,
no outside contact).
3.3 Weighting Factors:
A neuron usually receives many simultaneous inputs. Each input has its own
relative weight which gives the input the impact that it needs on the processing
element's summation function. These weights perform the same type of function
as do the varying synaptic strengths of biological neurons. In both cases, some
inputs are made more important than others so that they have a greater effect on
the processing element as they combine to produce a neural response [98].
Weights are adaptive coefficients within the network that determine the
intensity of the input signal as registered by the artificial neuron. They are a
measure of an input's connection strength. These strengths can be modified in
response to various training sets and according to a network’s specific topology
or through its learning rules.
3.4 Summation (or Net) Function:
The net function determines how the network inputs are combined inside the
neuron. In Figure 3.1, a weighted linear combination is adopted. The amount of
information about the input that is required to solve a problem is stored in the
form of weights. Each signal is multiplied by an associated weight w1, w2, w3,
…, wn before it is applied to the summing block. In addition, the artificial
neuron has a bias term w0, a threshold value “θ” that has to be reached or
exceeded for the neuron to produce a signal, a linear or nonlinear function “F”
that acts on the produced signal “net”, and an output “y” after this function.
It should be noted that the input to the bias neuron in Figure 3.1 is assumed
to be 1.
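[Figure 3.1: model of an artificial neuron. The inputs x1, x2, …, xn (weights w1, w2, …, wn) and the bias input x0 = 1 (weight w0) enter the summing block ∑; the function F(net) gives the output y.]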
Where,
\[ net = w_0 x_0 + w_1 x_1 + w_2 x_2 + \cdots + w_n x_n \]   Eq (3.2)
or
\[ net = \sum_{i=0}^{n} w_i x_i \]   Eq (3.3)
or
[for nonlinear activation function]
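The summation and activation described in this section can be sketched in a few lines of Python. This is only an illustration: the logistic function is assumed as one possible nonlinear F, and the input values, weights and function names are made up for the example.

```python
import math

def net_input(inputs, weights, bias_weight):
    """Weighted linear combination: net = w0 * 1 + sum(w_i * x_i)."""
    return bias_weight * 1.0 + sum(w * x for w, x in zip(weights, inputs))

def logistic(net):
    """One common nonlinear choice for F (an assumption, not taken from the text)."""
    return 1.0 / (1.0 + math.exp(-net))

def neuron_output(inputs, weights, bias_weight, activation=logistic):
    """Output y = F(net) of a single artificial neuron."""
    return activation(net_input(inputs, weights, bias_weight))

# Illustrative values only
x = [1.0, 2.0, 3.0]
w = [0.5, -0.25, 0.1]
w0 = 0.2
net = net_input(x, w, w0)      # 0.2 + 0.5 - 0.5 + 0.3
y = neuron_output(x, w, w0)    # logistic(net)
```

Passing a different `activation` (for example `lambda v: v`) gives the linear case of F.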
hidden neurons raise the network’s ability to extract higher-order statistics from
(input) data. This is a crucial quality, especially if there is a large input layer.
The signal flow and the way in which neurons are connected to each other
identify the type of the net. Accordingly, ANNs have two types of
architecture: feed-forward networks and recurrent (or feed-back) networks.
In feed-forward networks, the signal flow is from input to output units, strictly
in a feed-forward direction, from the input nodes, through the hidden nodes (if
any) and to the output nodes. There are no cycles or loops in the network (see
Figure 3.2). Single Layer Perceptron, Multi-Layer Perceptron (MLP), and Radial
Basis Function Nets are three types of Feed-Forward Networks.
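A feed-forward pass can be sketched as below. The 2-3-1 layout, the logistic activation and all weight values are hypothetical; the point is only that the signal moves from layer to layer with no loops.

```python
import math

def logistic(v):
    return 1.0 / (1.0 + math.exp(-v))

def layer_forward(inputs, weights, biases):
    """One layer: each row of `weights` holds the incoming weights of one unit."""
    return [logistic(b + sum(w * x for w, x in zip(row, inputs)))
            for row, b in zip(weights, biases)]

def feed_forward(inputs, layers):
    """Pass the signal strictly forward through a list of (weights, biases) layers."""
    signal = inputs
    for weights, biases in layers:
        signal = layer_forward(signal, weights, biases)
    return signal

# Hypothetical 2-3-1 network with made-up weights
hidden = ([[0.1, 0.2], [-0.3, 0.4], [0.5, -0.6]], [0.0, 0.1, -0.1])
output = ([[0.7, -0.8, 0.9]], [0.05])
y = feed_forward([1.0, 0.5], [hidden, output])
```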
set provided to the system, the corresponding desired output set is provided as
well. In most applications, actual data must be used.
3.7.2 Unsupervised Training
The other type of training is called unsupervised training. In unsupervised
training, the network is provided with inputs but not with desired outputs. The
system itself must then decide what features it will use to group the input data.
This is often referred to as self-organization or adaptation.
In this method of training, input vectors of similar types are grouped
without the use of training data to specify what a typical member of each group
looks like or to which group a member belongs. During training the neural
network receives input patterns and organizes these patterns into categories.
When a new input pattern is applied, the neural network provides an output response
indicating the class to which the input pattern belongs. If a class cannot be
found for the input pattern, a new class is generated {Sivanandam and Paulraj
(2004) [104]}.
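The grouping behaviour described above, including the creation of a new class when no existing one fits, can be illustrated with a very simple distance-based scheme. This is not a neural network and not the method of [104]; the `radius` threshold, the fixed prototypes and the sample points are assumptions made for the sketch.

```python
import math

def assign_or_create(pattern, prototypes, radius):
    """Return the class index for `pattern`; create a new class if none is close enough."""
    best, best_dist = None, float("inf")
    for idx, proto in enumerate(prototypes):
        dist = math.dist(pattern, proto)
        if dist < best_dist:
            best, best_dist = idx, dist
    if best is None or best_dist > radius:
        prototypes.append(list(pattern))   # no class found: generate a new one
        return len(prototypes) - 1
    return best

prototypes = []   # no desired outputs are supplied, only input patterns
patterns = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (0.1, -0.1)]
labels = [assign_or_create(p, prototypes, radius=1.0) for p in patterns]
```

In this sketch each prototype stays fixed at the first pattern of its class; a self-organizing network would also adapt it as more patterns arrive.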
3.7.3 Reinforced Training
Reinforced training is similar to supervised training. However, in this method
the teacher does not indicate how close the actual output is to the desired
output, but yields only a pass or a fail indication. Thus, the error signal
generated during reinforced training is binary [104].
3.8 Training Algorithms:
In artificial neural networks, a training algorithm is the method of
modifying the weights of the connections between the nodes of a specified
network: the error is found by comparing the output value of the network with
the target value, and this difference (error) is then minimized by modifying
the weights. There are many different training algorithms for the MLP.
Many algorithms are used in ANNs. Donald Hebb introduced the first rule,
which is Hebb’s rule. Hebb’s original statement was: “When an axon of cell A is
near enough to excite a cell B and repeatedly or persistently takes part in firing
it, some growth process or metabolic change takes place in one or both cells so
that A’s efficiency, as one of the cells firing B, is increased”. Hopfield (1982)
[105] stated that if the desired output and the input are either both active or
both inactive, increment the connection weight by the learning rate, otherwise
decrement the weight by the learning rate. Anderson and McNeill [98] stated that
the delta rule (also referred to as the Widrow-Hoff rule or the Least Mean
Square (LMS) learning rule) is a further variation of Hebb's rule.
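The two update rules mentioned above can be sketched as follows. The function names, the boolean encoding of "active" and the choice of a linear unit for the delta rule are assumptions made for illustration.

```python
def hopfield_update(weight, input_active, desired_active, learning_rate):
    """Increment when input and desired output agree (both active or both inactive), else decrement."""
    if input_active == desired_active:
        return weight + learning_rate
    return weight - learning_rate

def delta_rule_update(weights, inputs, target, learning_rate):
    """Widrow-Hoff / LMS step for a linear unit: w_i += lr * (target - output) * x_i."""
    output = sum(w * x for w, x in zip(weights, inputs))
    error = target - output
    return [w + learning_rate * error * x for w, x in zip(weights, inputs)]

# Repeated delta-rule steps on one pattern drive the output toward the target.
weights = [0.0, 0.0]
for _ in range(100):
    weights = delta_rule_update(weights, [1.0, 2.0], target=1.0, learning_rate=0.1)
```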
Rabuñal and Dorado (2006) [106] defined Gradient Descent process as “The
process of making changes to weights and biases, where the changes are
proportional to the derivatives of the ANN error with respect to those weights
and biases. This is done to minimise the ANN error”. Anderson and McNeill
[98] stated that this rule is similar to the Delta Rule in that the derivative of the
transfer function is still used to modify the delta error before it is applied to the
connection weights.
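As a minimal sketch of gradient descent on a single unit, assume a logistic transfer function and the error E = 0.5 (target − y)²; both are assumptions, as are the learning rate and the training pattern. The error term is multiplied by the derivative of the transfer function, y(1 − y), before it changes the weight and bias.

```python
import math

def logistic(v):
    return 1.0 / (1.0 + math.exp(-v))

def squared_error(w, b, x, target):
    return 0.5 * (target - logistic(w * x + b)) ** 2

def gradient_step(w, b, x, target, lr):
    """One step against the gradient of E = 0.5 * (target - y)^2 with y = logistic(w*x + b)."""
    y = logistic(w * x + b)
    delta = (target - y) * y * (1.0 - y)   # error scaled by the transfer-function derivative
    return w + lr * delta * x, b + lr * delta

w, b = 0.0, 0.0
before = squared_error(w, b, 1.0, 0.9)
for _ in range(500):
    w, b = gradient_step(w, b, x=1.0, target=0.9, lr=0.5)
after = squared_error(w, b, 1.0, 0.9)
```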
Back-propagation (BP) is the best-known supervised learning algorithm for
training ANNs. It was developed by several independent
sources {Werbos (1974); Parker (1982); LeCun (1985); Rumelhart, Hinton and
Williams (1985)}. This independent co-development was the result of a
proliferation of articles and talks at various conferences which stimulated the
entire industry.
The simplest implementation of the BP algorithm updates the network weights
and biases in the direction in which the performance function decreases most
rapidly (the negative of the gradient). BP learning can be divided into two
phases. In the forward phase, a training pattern is input into the network, the
output of each neuron is computed, and the error at the output is calculated.
In the backward phase, the error is propagated backward through the network
and the weights and biases are adjusted automatically using the following formula:
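The two phases can be sketched for a tiny 2-2-1 network. The sketch assumes sigmoid units, a squared-error performance function and the plain gradient-descent update "weight += learning rate × delta × input"; the learning rate, initial weights and training pattern are illustrative and not taken from the source.

```python
import math

def logistic(v):
    return 1.0 / (1.0 + math.exp(-v))

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward phase: compute the hidden activations and the network output."""
    h = [logistic(b + sum(w * xi for w, xi in zip(row, x)))
         for row, b in zip(w_hidden, b_hidden)]
    y = logistic(b_out + sum(w * hi for w, hi in zip(w_out, h)))
    return h, y

def backprop_step(x, target, w_hidden, b_hidden, w_out, b_out, lr):
    """Backward phase: propagate the output error back and adjust the weights in place."""
    h, y = forward(x, w_hidden, b_hidden, w_out, b_out)
    delta_out = (target - y) * y * (1.0 - y)
    # Hidden deltas use the output weights before they are updated.
    delta_hidden = [delta_out * w_out[j] * h[j] * (1.0 - h[j]) for j in range(len(h))]
    for j in range(len(h)):
        w_out[j] += lr * delta_out * h[j]
        b_hidden[j] += lr * delta_hidden[j]
        for i in range(len(x)):
            w_hidden[j][i] += lr * delta_hidden[j] * x[i]
    return b_out + lr * delta_out   # the new output bias is returned to the caller

w_hidden = [[0.15, -0.2], [0.25, 0.3]]
b_hidden = [0.1, -0.1]
w_out = [0.4, -0.45]
b_out = 0.05
x, target = [1.0, 0.0], 1.0
_, y_before = forward(x, w_hidden, b_hidden, w_out, b_out)
for _ in range(200):
    b_out = backprop_step(x, target, w_hidden, b_hidden, w_out, b_out, lr=0.5)
_, y_after = forward(x, w_hidden, b_hidden, w_out, b_out)
```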
Newton’s method is a local algorithm that makes use of the Hessian matrix* of
the objective function; in this sense it is a second-order method. In
optimization, the function whose root is sought is the gradient of the
objective, so its derivative is the Hessian. The idea of the method is as
follows: one starts with an initial guess which is reasonably close to the true
root, then the function is approximated by its tangent line (which can be
computed using the tools of calculus), and one computes the x-intercept of this
tangent line (which is easily done with elementary algebra). This x-intercept
will typically be a better approximation to the function's root than the
original guess, and the method can be iterated.
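The tangent-line iteration can be sketched for a one-variable function. The example function x² − 2, the starting guess and the tolerance are chosen only for illustration.

```python
def newton_root(f, f_prime, x0, tol=1e-12, max_iter=50):
    """Repeatedly replace x by the x-intercept of the tangent line at x."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            break
        x = x - fx / f_prime(x)
    return x

# Root of x^2 - 2 starting from the guess x0 = 1
root = newton_root(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)
```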
Conjugate gradient (CG) is a local algorithm for an objective function whose
gradient can be computed, and for that reason it belongs to the class of
first-order methods. According to its behavior, it can be described as a
deterministic method. The CG method can be regarded as being somewhat
intermediate between the method of gradient descent and Newton’s method
{Luenberger (1984) [107]}. In the conjugate gradient algorithm the search is
performed along conjugate directions, which generally produces faster
convergence than steepest descent directions {Demuth and Beale (2002) [108]}.
These search directions are conjugate with respect to the Hessian matrix.
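For a quadratic objective the idea can be shown compactly. The sketch below is the linear conjugate gradient method for minimising 0.5 xᵀAx − bᵀx with a symmetric positive-definite A (which is then the Hessian); training a network would need a nonlinear variant with a line search. The matrix, vector and starting point are illustrative.

```python
def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def conjugate_gradient(A, b, x0, tol=1e-12):
    """Minimise 0.5*x^T A x - b^T x for a symmetric positive-definite matrix A."""
    x = list(x0)
    r = [bi - ai for bi, ai in zip(b, mat_vec(A, x))]   # negative gradient at x
    d = list(r)                                         # first direction: steepest descent
    for _ in range(len(b)):
        rr = dot(r, r)
        if rr < tol:
            break
        Ad = mat_vec(A, d)
        alpha = rr / dot(d, Ad)                         # exact step along d
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [ri - alpha * adi for ri, adi in zip(r, Ad)]
        beta = dot(r, r) / rr
        d = [ri + beta * di for ri, di in zip(r, d)]    # next direction, conjugate to d w.r.t. A
    return x

solution = conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0], [2.0, 1.0])
```

For an n-variable quadratic the loop needs at most n directions, here two.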
In optimization, quasi-Newton methods (also known as variable metric
methods) are algorithms for finding local maxima and minima of functions.
Quasi-Newton methods are based on Newton's method to find the stationary
point of a function, where the gradient is zero. Newton's method assumes that
the function can be locally approximated as a quadratic in the region around
the optimum, and uses the first and second derivatives (gradient and Hessian)
to find the stationary point. In quasi-Newton methods the Hessian matrix of
second derivatives of the function to be minimized does not need to be
computed. The Hessian is updated by analyzing successive gradient vectors
instead. The quasi-Newton method can be classified as a local, first-order and
deterministic training algorithm for the MLP.
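In one variable, the idea of replacing the second derivative by information from successive gradients reduces to the secant method applied to the gradient. The sketch below uses that simplification; the test function f(x) = eˣ − 2x and the two starting points are assumptions.

```python
import math

def quasi_newton_1d(grad, x0, x1, tol=1e-10, max_iter=50):
    """Find a stationary point using a curvature estimate built from successive gradients."""
    g0, g1 = grad(x0), grad(x1)
    for _ in range(max_iter):
        if abs(g1) < tol or x1 == x0 or g1 == g0:
            break
        curvature = (g1 - g0) / (x1 - x0)   # stands in for the second derivative
        x0, g0 = x1, g1
        x1 = x1 - g1 / curvature            # Newton-like step with the estimate
        g1 = grad(x1)
    return x1

# f(x) = exp(x) - 2x has gradient exp(x) - 2 and its minimum at x = ln 2
x_min = quasi_newton_1d(lambda x: math.exp(x) - 2.0, x0=0.0, x1=1.0)
```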
* In mathematics, the Hessian matrix (or simply the Hessian) is the square matrix of second-order partial
derivatives of a function; that is, it describes the local curvature of a function of many variables.
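…, and \(g_{k+1} = \nabla f(x_{k+1})\). If \(g_{k+1} = 0\), stop.
Step 4: Set
\[ B_{k+1} = B_k - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k} + \frac{y_k y_k^T}{s_k^T y_k}, \]
where
\[ s_k = \alpha_k d_k, \qquad y_k = g_{k+1} - g_k. \]
Step 5: k := k + 1; go to Step 2.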
The line search in Step 3 requires the step-length αk to meet certain
conditions. If exact line search is used, αk satisfies
\[ f(x_k + \alpha_k d_k) = \min_{\alpha > 0} f(x_k + \alpha d_k). \]
In the implementations of the BFGS algorithm, one normally requires that the
step-length αk satisfies the Wolfe (1969) [111] conditions:
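\[ f(x_k + \alpha_k d_k) - f(x_k) \le \delta_1 \alpha_k g_k^T d_k, \]
\[ g(x_k + \alpha_k d_k)^T d_k \ge \delta_2 g_k^T d_k, \]
where \(\delta_1 \le \delta_2\) are constants in (0, 1). For convenience, the line search that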
satisfies the Wolfe conditions is called the Wolfe line search.
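A step-length can be checked against the Wolfe conditions as sketched below, assuming their usual form (a sufficient-decrease test and a curvature test). The names delta1 and delta2 for the two constants in (0, 1), their default values and the quadratic test function are assumptions.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def satisfies_wolfe(f, grad, x, d, alpha, delta1=1e-4, delta2=0.9):
    """Check both Wolfe conditions for the step-length alpha along direction d."""
    g_d = dot(grad(x), d)                                # directional derivative at x
    x_new = [xi + alpha * di for xi, di in zip(x, d)]
    sufficient_decrease = f(x_new) - f(x) <= delta1 * alpha * g_d
    curvature = dot(grad(x_new), d) >= delta2 * g_d
    return sufficient_decrease and curvature

f = lambda x: x[0] ** 2 + x[1] ** 2
grad = lambda x: [2.0 * x[0], 2.0 * x[1]]
x = [1.0, 1.0]
d = [-2.0, -2.0]   # steepest-descent direction at x

exact_step = satisfies_wolfe(f, grad, x, d, alpha=0.5)     # lands on the minimum
too_long = satisfies_wolfe(f, grad, x, d, alpha=1.0)       # no decrease in f
too_short = satisfies_wolfe(f, grad, x, d, alpha=0.01)     # slope barely changes
```

The first test rejects steps that are too long, and the second rejects steps that are too short.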
3.9 Networks for Prediction
Predicting is making claims about something that will happen, often based on
information from the past and from the current state. Everyone solves the
problem of prediction every day with various degrees of success.
Neural networks can be used for prediction with various levels of success.
Their advantage is that they learn dependencies automatically from measured
data alone, without any need to add further information (such as the type of
dependency, as in regression). The neural network is trained on historical
data in the hope that it will discover hidden dependencies and that it will be
able to use them for predicting into the future. In other words, a neural
network is not represented by an explicitly given model. It is more a black box
that is able to learn something. The basic idea is to train a neural network
with past data and then use this network to predict future values.
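One common way to prepare past data for such training is to cut the series into pairs of "recent values" and "next value". The window length and the sample series below are illustrative assumptions, and any network can then be trained on the pairs.

```python
def make_windows(series, n_lags):
    """Turn a series into (past values, next value) training pairs."""
    return [(series[i:i + n_lags], series[i + n_lags])
            for i in range(len(series) - n_lags)]

history = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
pairs = make_windows(history, n_lags=3)
# Each pair holds three past values as network inputs and the following value as the target.
```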
The advantage of using neural networks for prediction is that they are
able to learn from examples only and that, once learning is finished, they are
able to capture hidden and strongly non-linear dependencies, even when there is
significant noise in the training set. The disadvantage is that NNs can learn
only the dependency that is valid in a certain period. The prediction error
cannot generally be estimated.
3.10 ANNs Predicting Models in Concrete Technology
Unlike traditional parametric models, ANN models are able to construct a
possibly complex relationship between input and output variables with an
excellent level of accuracy compared with that of conventional methods [98].
The main advantage of ANNs is that one does not have to assume an explicit
model form, which is a prerequisite in the parametric approaches. Indeed, in
ANN models, a relationship of a possibly complicated nature between input and
output variables is generated by the data points. In comparison to parametric
methods, ANNs can deal with relatively imprecise or incomplete data and
approximate results, and are less vulnerable to outliers. They are highly parallel,
that is, their numerous independent operations can be executed simultaneously
[103].
These advances in the field of artificial intelligence continue to have a
strong influence on civil engineering. New methods and algorithms are
emerging that enable civil engineers to use computing in different ways. One
such area is concrete technology.
Concrete, as a non-homogeneous material, consists of separate phases:
hydrated cement paste, transition zone and aggregate. Although most of the
characteristics of concrete are associated with the average characteristics of
its component microstructure, the compressive strength and failure of concrete
are related to the weakest part of the microstructure. Experimental work
remains the most reliable way to determine these characteristics. However,
since experimental work needs a lot of effort, time and money, the need for
new methodologies and techniques to reduce this effort, save time and money
(while at the same time preserving high accuracy) is pressing. Artificial
intelligence has proven its capability in simulating and predicting the
behavior of different physical phenomena in most engineering fields, such as
inspection, design, environment, hydrology, geotechnical engineering, and
concrete technology {[106], Rasa1 et al (2009) [112], Abdeen & Hodhod (2010)
[113], and Razavi et al (2011) [114]}.
3.11 Previous Studies in Predicting Properties of Concrete using ANNs:
The use of the ANN approach to describe and predict the properties of fresh
and hardened concrete is not new. A number of applications for
predicting one or more properties of concrete have been proposed by
several researchers {Ref. [115]-[172]}. However, few of them deal
specifically with the rheology and compressive strength of SCC. A brief
explanation of these studies follows.
1. Nehdi et al (2001) [130]
This study is considered the first attempt to predict the performance of
SCC mixtures using ANN modeling. In this research, artificial neural
networks were developed to predict SCC performance based on mixture
proportions. The values of slump flow, filling capacity, segregation
resistance, and 28-day compressive strength were modeled.
Each SCC property was modeled separately but using the same network
architecture. In addition to the input and output layers, the final network
contains two hidden layers. The first hidden layer has 10 processing units,
while the second layer has only five processing units. A sigmoid function,
(logsig), was employed as an activation function for all processing units
with full connection adopted between units in different layers within the
network, as shown in Figure 3.3.
The input layer of this network model consists of an external input vector
of 10 elements, (cement, water, fly ash, slag, silica fume, limestone filler,
sand, gravel, viscosity-enhancing admixture, and high-range water-
reducing admixture). These were selected to represent adequately SCC
Table 3.3: Data Sets and Their Sources used in Zaid [148]
Model | Training Sets | Testing Sets | Sources
Yield Stress | 24 | 4 | Khayat [64], Khayat et al (2002) [173], and Sonebi (2004) [174]
Plastic Viscosity | 24 | 4 | Khayat [64], Khayat et al (2002) [173], and Sonebi (2004) [174]
Slump Flow | 79 | 12 | Kim et al [122], Nehdi et al [130], Kim et al (1998) [175], Sonebi (2004) [176], Zhu and Bartos (2003) [177], Ravindrarajah et al (2003) [178], Van et al (1998) [179], Troli et al (2003) [180], Shinoh & Matsuka (2003) [181], and Persson (2004) [182]
L-Box | 52 | 6 | Nehdi et al [130], Sonebi [176], Van et al [179], and Persson [182]
28 days Compressive Strength | 24 | 4 | Nehdi et al [130], Kim et al [175], Persson [182], Bouzoubaa and Lachemi (2001) [183], and Ambroise et al (ACI SP) [184]
Splitting-Tensile Strength | 20 | 3 | Kim et al [175], and Ma & Dietz (2002) [185]
While in both studies (Nehdi et al [130] and Zaid [148]) each SCC
property was modeled separately, Zaid [148] constructed several different
network architectures. For yield stress and plastic viscosity the models
were the same: the input layer had (11) processing units representing
cement, fine aggregate, coarse aggregate, water, silica fume, fly ash, blast
furnace slag, set-retarding agent, air-entraining agent, viscosity-modifying
agent, and high-range water-reducing agent. The output layer
contained one processing element that represented the yield stress or plastic
viscosity. One hidden layer with (12) processing units was used.
The slump flow model contained an input layer with (9) processing units that
represented cement, fine aggregate, coarse aggregate, water, limestone, fly
ash, slag, viscosity-modifying agent, and high-range water-reducing agent.
The output layer contained a single element representing the value
of slump flow. This network contained three hidden layers with (24)
processing units in total, divided as follows: (12) elements in the first
hidden layer and (6) elements in each of the second and third layers.
The L-Box model contained an input layer with (8) processing units
representing cement, fine aggregate, coarse aggregate, water, limestone, fly
ash, viscosity-modifying agent, and high-range water-reducing agent. The
output layer contained a single element representing the L-Box
value. One hidden layer with (25) processing units was used.
The input layer of the 28-day compressive strength model contained (8)
processing units representing cement, aggregate, water, limestone, fly ash,
blast furnace slag, air-entraining agent, and high-range water-reducing
admixture. The output layer had a single element representing this property
value. Two hidden layers with (3) and (5) elements in the first and second
layer, respectively, were adopted.
The splitting-tensile strength model had the same architecture as the 28-day
compressive strength model; however, the eight input elements differed. They
were cement, aggregate, water, silica fume, fly ash, quartz powder, high-
range water-reducing admixture, and age of test (days).
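The layer sizes reported above for Zaid [148] can be collected in one place, and the number of adjustable parameters in each model follows from them. The count below assumes full connection between consecutive layers and one bias per hidden and output unit; the function name and the dictionary keys are illustrative.

```python
def count_parameters(layer_sizes):
    """Weights plus one bias per non-input unit in a fully connected feed-forward net."""
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# Layer sizes as reported in the text: input, hidden layer(s), output
architectures = {
    "yield stress / plastic viscosity": [11, 12, 1],
    "slump flow": [9, 12, 6, 6, 1],
    "L-Box": [8, 25, 1],
    "28-day compressive strength": [8, 3, 5, 1],
    "splitting-tensile strength": [8, 3, 5, 1],
}
counts = {name: count_parameters(sizes) for name, sizes in architectures.items()}
```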
Specialized computer software named Pythia-The Neural Network
Designer was used to construct and train the models. A sigmoid function
was used in all models. All models had a learning rate parameter equal to
(0.1), automatically adjusted. The number of learning cycles was (3000) for
all models except the slump flow model, for which it was (5000). The sets of
data that were used in those models and their sources are listed in Table 3.3.
All of the data used in the input layers (in all of the models) were
normalized as a percentage of the density of normal weight concrete
(2400 kg/m3). Data used for output layers were small enough to be compatible
with the limits of the sigmoid function. The experimental work of this study
consisted of designing, preparing, mixing, and testing (5) SCC mixes. These
mixes contained
The results obtained from this study indicated that ANN is a usable method for
determining the rheological properties (Bingham model) of fresh concrete.