Artificial Neural Networks

Publication page: https://www.researchgate.net/publication/327573940
Chapter · September 2018
1 author:
Layth Abdulbari Al-Jaberi

Al-Mustansiriya University
All content following this page was uploaded by Layth Abdulbari Al-Jaberi on 11 September 2018.

Chapter Three
ARTIFICIAL NEURAL NETWORKS
3.1 General:
The origins of artificial neural networks (ANNs) lie in the field of biology.
The biological brain consists of billions of highly interconnected neurons
forming a neural network, and human information processing depends on this
connectionist system of nerve cells. Artificial neural networks borrow this
principle: they exploit the massively parallel local processing and distributed
storage properties of the brain.
A classical comparison of information processing by a human and a computer
focuses on the ability to recognize patterns and to learn. A computer can
perform calculations on large numbers at high speed, but it struggles with tasks
such as classification, recognition of written text, data compression, and
learning. A human, in contrast, handles these tasks easily by processing
information with highly distributed transformations through thousands of
interconnected neurons in the brain {Jeng et al (2003) [97]}.
3.2 Processing Units:
The fundamental processing element of a neural network is a neuron. This
building block of human awareness encompasses a few general capabilities.
Basically, a biological neuron receives inputs from other sources, combines them
in some way, performs a generally nonlinear operation on the result, and then
outputs the final result {Anderson and McNeill (1992) [98]}.
Hu and Hwang (2002) [100] stated that an artificial neural network is a general
mathematical computing paradigm that models the operations of biological
neural systems. The numerous artificial neural network models that have been
proposed over the years all share a common building block known as
“nodes”, “neurons”, “cells”, or just “units”, depending on who is describing the
network. The most widely used neuron model is based on McCulloch and Pitts’
work {McCulloch and Pitts (1943) [99]}.
Rumelhart et al. (1986) [101] stated that processing units may represent:
• A specific concept, such as features, letters, or words (the idea is that
one processing unit is equal to one concept); or
• An indescribable part of a larger concept, such as a sub-pattern within a
larger pattern (i.e., the idea is that many processing units may together
equal one concept).
The main tasks of a processing unit are to receive input from the neighbors
that provide incoming activation, compute an output, and send that output to
the neighbors that receive it. Such a system is inherently parallel, because
many processing units can carry out their computations at the same time. The
processing units in a neural network can be classified as one of three types
[100]:
1. Input processing units, which receive input from external sources,
compute their activation level, compute their output as a function of
an activation level, and transmit this output to the rest of the network;
2. Output processing units, which upon receipt of input from the rest
of the network, compute and broadcast their output to external
receivers or feed their output back to the input layer of the network
for further processing; and
3. Hidden processing units, which only receive input from, and broadcast
their computed output to, processing units within the network (i.e.,
no outside contact).
3.3 Weighting Factors:
A neuron usually receives many simultaneous inputs. Each input has its own
relative weight which gives the input the impact that it needs on the processing
element's summation function. These weights perform the same type of function
as do the varying synaptic strengths of biological neurons. In both cases, some
inputs are made more important than others so that they have a greater effect on
the processing element as they combine to produce a neural response [98].
Weights are adaptive coefficients within the network that determine the
intensity of the input signal as registered by the artificial neuron. They are a
measure of an input's connection strength. These strengths can be modified in
response to various training sets and according to a network’s specific topology
or through its learning rules.
3.4 Summation (or Net) Function:
The net function determines how the network inputs are combined inside the
neuron. In Figure 3.1, a weighted linear combination is adopted. The
information about the input that is required to solve a problem is stored in the
form of weights. Each input signal is multiplied by an associated weight w1, w2,
w3, …, wn before it is applied to the summing block. In addition, the artificial
neuron has a bias term w0, a threshold value “θ” that has to be reached or
exceeded for the neuron to produce a signal, a linear or a nonlinear function “F”
that acts on the produced signal “net”, and an output “y” of this function. It
should be noted that the input x0 to the bias weight in Figure 3.1 is assumed to
be 1.
Figure 3.1: Basic Neuron Model (inputs x0 = 1, x1, x2, …, xn with weights w0, w1, w2, …, wn feed the summing block ∑, and F(net) gives the output y)

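The following relation describes the transfer function of the basic neuron
model:
$y = F(\mathit{net})$    Eq (3.1)
where
$\mathit{net} = w_0 x_0 + w_1 x_1 + w_2 x_2 + \cdots + w_n x_n$    Eq (3.2)
or
$\mathit{net} = \sum_{i=0}^{n} w_i x_i$    Eq (3.3)
and the neuron firing condition is:
$\sum_{i=0}^{n} w_i x_i \ge \theta$ [for linear activation function], $x_0 = 1$
or
$F(\mathit{net}) \ge \theta$ [for nonlinear activation function]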
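The basic neuron model above can be sketched in a few lines of Python. This is only an illustration: the function names `net_input` and `neuron_output`, the example weights, and the choice of a logistic function for F are assumptions and are not taken from the cited sources.

```python
import math

def net_input(weights, inputs):
    """Weighted linear combination; weights[0] is the bias w0 and x0 is fixed at 1."""
    xs = [1.0] + list(inputs)
    if len(xs) != len(weights):
        raise ValueError("need one weight per input plus one bias weight")
    return sum(w * x for w, x in zip(weights, xs))

def logistic(net):
    """One possible nonlinear F (assumed here for illustration)."""
    return 1.0 / (1.0 + math.exp(-net))

def neuron_output(weights, inputs, activation=logistic):
    """y = F(net) for a single neuron."""
    return activation(net_input(weights, inputs))

weights = [0.5, 1.0, -2.0]          # hypothetical w0, w1, w2
inputs = [2.0, 1.0]                 # x1, x2
net = net_input(weights, inputs)    # 0.5*1 + 1.0*2.0 + (-2.0)*1.0
y = neuron_output(weights, inputs)
```

Passing a different `activation` (for example `lambda v: v` for a linear unit) changes F without touching the summation step, which mirrors the separation between the net function and the activation function in the text.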
3.5 Activation Function:

The purpose of a linear or nonlinear activation function is to ensure that the
neuron’s response is bounded; that is, the actual response of the neuron is
conditioned or damped as a result of large or small activating stimuli and is
thus controllable. Further, in order to achieve the advantages of multilayer
nets compared with the limited capabilities of single layer networks, nonlinear
functions are used, depending upon the paradigm and the algorithm used for
training the network. The most commonly used activation functions are
summarized in Table 3.1.
Linear (straight-line) functions are limited because the output is simply
proportional to the input, so a network built only from linear units can
represent only linear mappings. That was the problem in the earliest network
models as noted in Minsky and Papert's book Perceptrons (1969) [102].
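The Binary Step function acts as a hard limiter: its output is 0 or 1 depending
on whether the input reaches the threshold. A related ramp function mirrors the
input within a given range and still acts as a hard limiter outside that range.
It is a linear function that has been clipped to minimum and maximum values,
making it non-linear.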
A sigmoid, or S-shaped, curve approaches a minimum and a maximum value at its
asymptotes. It is common for this curve to be called a sigmoid when it ranges
between 0 and 1, and a hyperbolic tangent when it ranges between -1 and 1.
Table 3.1: Most Commonly used Neuron Activation Functions

| Activation Function | Formula y = f(x) | Description |
|---|---|---|
| Linear | $f(x) = x$, for all $x$ | The activation of the neuron is passed on directly as the output. |
| Binary Step | $f(x) = 1$ if $x \ge \theta$; $f(x) = 0$ otherwise | Output value is 0 or 1. |
| Logistic, or Sigmoidal | $f(x) = \frac{1}{1 + e^{-x}}$ | An S-shaped curve, very popular because it is monotonic and has a simple derivative. The range of the logistic or sigmoid function is from 0 to 1. |
| Hyperbolic Tangent | $f(x) = \frac{2}{1 + e^{-2x}} - 1$ | A sigmoid curve similar to the logistic function. Often performs better than the logistic function because of its symmetry. Ideal for multilayer perceptrons, particularly the hidden layers. Output value is between -1 and +1. |
| Exponential | $f(x) = e^{-x}$ | The negative exponential function. Ideal for use with radial units. The combination of radial synaptic function and negative exponential activation function produces units that model a Gaussian (bell-shaped) function centered at the weight vector. The standard deviation of the Gaussian is determined by the "deviation" d of the unit, which is stored in the unit's threshold. |
| Gaussian | $f(x) = \exp\left(-\lVert x - m \rVert^2 / \sigma^2\right)$ | Used for radial basis neural network; m and σ2 are radial basis parameters to be specified. |
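As a minimal sketch, the activation functions of Table 3.1 could be written in Python as follows. The threshold default `theta=0.0` for the step function and the exact scaling of the Gaussian, exp(-||x - m||^2 / sigma^2), are assumptions made for illustration; other conventions exist.

```python
import math

def linear(x):
    """Passes the activation on directly."""
    return x

def binary_step(x, theta=0.0):
    """Output is 0 or 1; theta = 0 is an assumed default threshold."""
    return 1.0 if x >= theta else 0.0

def logistic(x):
    """S-shaped curve with range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def hyperbolic_tangent(x):
    """S-shaped curve with range (-1, +1); identical to tanh(x)."""
    return 2.0 / (1.0 + math.exp(-2.0 * x)) - 1.0

def negative_exponential(x):
    """Negative exponential, typically paired with radial units."""
    return math.exp(-x)

def gaussian(x, m, sigma2):
    """Radial basis response for vectors x and m (assumed scaling)."""
    dist2 = sum((xi - mi) ** 2 for xi, mi in zip(x, m))
    return math.exp(-dist2 / sigma2)
```

The logistic function also has the simple derivative f(x)(1 - f(x)), which is the property the table refers to and which makes it convenient for gradient-based training.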
3.6 Artificial Neural Network Architectures

The arrangement of neurons into layers and the connection patterns within and
between layers is called the net architecture. The importance of the network
design is not to be underestimated: there is a tight relationship between the
learning algorithm and the network structure, which makes the design central
{Haykin (1994) [103]}.
A typical neural network consists of layers. In a single-layer network there is
an input layer of source nodes and an output layer of neurons. A multi-layer
network has, in addition, one or more hidden layers of hidden neurons. More
hidden neurons raise the network’s ability to extract higher-order statistics from
the (input) data. This is a crucial quality, especially if there is a large input
layer.
The signal flow and the way in which neurons are connected to each other
identify the type of the net. Accordingly, ANNs have two types of architecture:
feed-forward networks and recurrent (or feed-back) networks.
In feed-forward networks, the signal flow is from input to output units, strictly
in a feed-forward direction, from the input nodes, through the hidden nodes (if
any) and to the output nodes. There are no cycles or loops in the network (see
Figure 3.2). Single Layer Perceptron, Multi-Layer Perceptron (MLP), and Radial
Basis Function Nets are three types of Feed-Forward Networks.
Figure 3.2: Feed-Forward Network.

3.7 Training an Artificial Neural Network
In order to approximate a given target function, it is necessary to find a good
set of weights. The problem is that changing one weight is likely to alter the
output of the function on the whole input space, so it is not as easy as using a
grid-based approximation. One possible solution consists in minimizing an error
function that measures how bad an approximation is. The process of finding
weights that minimize the error function is called training or learning by artificial
intelligence researchers. Learning or training methods can be categorized as:
3.7.1 Supervised Training [98]
With supervised learning, the artificial neural network must be trained before
it becomes useful. Training consists of presenting input and output data to the
network. This data is often referred to as the training set. That is, for each
input set provided to the system, the corresponding desired output set is
provided as well. In most applications, actual data must be used.
3.7.2 Unsupervised Training
The other type of training is called unsupervised training. In unsupervised
training, the network is provided with inputs but not with desired outputs. The
system itself must then decide what features it will use to group the input data.
This is often referred to as self-organization or adaptation.
In this method of training, input vectors of similar types are grouped
without the use of training data to specify what a typical member of each group
looks like or to which group a member belongs. During training the neural
network receives input patterns and organizes these patterns into categories.
When a new input pattern is applied, the neural network provides an output
response indicating the class to which the input pattern belongs. If a class
cannot be found for the input pattern, a new class is generated {Sivanandam and
Paulraj (2004) [104]}.
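The grouping behavior described above (assign a pattern to an existing class, or generate a new class when none fits) can be illustrated with a very simple distance-threshold scheme. This is a plain "leader" clustering sketch, not a specific neural network model from the cited reference; the function name `assign_or_create`, the Euclidean distance, and the `radius` parameter are all assumptions.

```python
def assign_or_create(pattern, prototypes, radius):
    """Return the index of the nearest prototype within `radius`;
    otherwise store the pattern as the prototype of a new class."""
    best, best_dist = None, None
    for idx, proto in enumerate(prototypes):
        dist = sum((a - b) ** 2 for a, b in zip(pattern, proto)) ** 0.5
        if best_dist is None or dist < best_dist:
            best, best_dist = idx, dist
    if best is not None and best_dist <= radius:
        return best
    prototypes.append(list(pattern))      # no suitable class: generate a new one
    return len(prototypes) - 1

patterns = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [0.1, 0.3]]
prototypes = []
labels = [assign_or_create(p, prototypes, radius=1.0) for p in patterns]
```

No desired outputs are supplied anywhere; the class labels emerge only from the similarity of the inputs, which is the defining feature of unsupervised training.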
3.7.3 Reinforced Training
Reinforced training is similar to supervised training. However, in this method
the teacher does not indicate how close the actual output is to the desired
output, but yields only a pass or a fail indication. Thus, the error signal
generated during reinforced training is binary [104].
3.8 Training Algorithms:
In artificial neural networks, a training algorithm is the method of modifying
the weights of the connections between the nodes of a specified network: the
error is found by comparing the output value of the network with the target
value, and this difference (error) is then minimized by modifying the weights.
There are many different training algorithms for the MLP.
Donald Hebb introduced the first learning rule, known as Hebb’s rule. Hebb’s
original statement was: “When an axon of cell A is near enough to excite a cell
B and repeatedly or persistently takes part in firing it, some growth process or
metabolic change takes place in one or both cells such that A’s efficiency, as
one of the cells firing B, is increased”. Hopfield (1982) [105] stated that if
the desired output and the input are both active or both inactive, the
connection weight is incremented by the learning rate; otherwise the weight is
decremented by the learning rate. Anderson and McNeill [98] stated that the
delta rule (which is also referred to as the Widrow-Hoff rule or the Least Mean
Square (LMS) Learning Rule) is a further variation of Hebb's Rule.
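The two rules just described can be sketched as follows. The first function implements the increment/decrement rule as worded in the text; the second is the usual textbook form of the delta (Widrow-Hoff, LMS) rule for a single linear unit. Function names, the learning rate of 0.1, and the example pattern are illustrative assumptions.

```python
def hebb_update(weight, input_active, output_active, learning_rate=0.1):
    """Increment the weight when input and desired output agree
    (both active or both inactive); otherwise decrement it."""
    if input_active == output_active:
        return weight + learning_rate
    return weight - learning_rate

def delta_rule_update(weights, inputs, target, learning_rate=0.1):
    """One LMS step for a linear unit: w_i += lr * (target - output) * x_i."""
    output = sum(w * x for w, x in zip(weights, inputs))
    error = target - output
    new_weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
    return new_weights, error

# Repeated LMS steps on one pattern shrink the error at every step.
w = [0.0, 0.0]
errors = []
for _ in range(20):
    w, err = delta_rule_update(w, [1.0, 2.0], target=1.0)
    errors.append(abs(err))
```

Unlike the sign-only Hebbian update, the delta rule scales each weight change by the size of the error, so the corrections become smaller as the output approaches the target.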
Rabuñal and Dorado (2006) [106] defined the gradient descent process as “The
process of making changes to weights and biases, where the changes are
proportional to the derivatives of the ANN error with respect to those weights
and biases. This is done to minimise the ANN error”. Anderson and McNeill
[98] stated that this rule is similar to the Delta Rule in that the derivative of the
transfer function is still used to modify the delta error before it is applied to the
connection weights.
Back-Propagation (BP) is one of the best-known supervised learning algorithms
for training ANNs. It was developed independently by several sources {Werbos
(1974); Parker (1982); LeCun (1985); Rumelhart, Hinton and Williams (1985)}.
This independent co-development was the result of a proliferation of articles
and talks at various conferences which stimulated the entire industry.
The simplest implementation of the BP algorithm updates the network weights
and biases in the direction in which the performance function decreases most
rapidly (the negative of the gradient). BP learning can be divided into two
phases. In the forward phase, a training pattern is fed into the network, and
the network output and its error are calculated. In the backward phase, the
error is propagated backward through the network and the weights and biases are
adjusted automatically using the following formula:
$x_{k+1} = x_k - \alpha_k g_k$
where $x_k$ is a vector of current weights and biases, $g_k$ is the current
gradient, and $\alpha_k$ is the learning rate. These two phases are repeated
until the performance is good enough.
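The two phases can be sketched for a very small network. The example below is an assumption-laden illustration, not the procedure of any cited work: a 2-3-1 network with logistic units, a sum-of-squares error, the XOR patterns as toy data, a fixed learning rate of 0.5, and full-batch updates of the form new weights = old weights - learning rate × gradient.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_xor(epochs=2000, learning_rate=0.5, hidden=3, seed=0):
    rng = random.Random(seed)
    data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
            ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
    # w1[j] = [bias, weight from x1, weight from x2] of hidden unit j
    w1 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(hidden)]
    # w2 = [bias, weights from the hidden units] of the output unit
    w2 = [rng.uniform(-1, 1) for _ in range(hidden + 1)]
    losses = []
    for _ in range(epochs):
        g1 = [[0.0] * 3 for _ in range(hidden)]
        g2 = [0.0] * (hidden + 1)
        loss = 0.0
        for x, t in data:
            # Forward phase: compute the network output and its error.
            xs = [1.0] + x
            h = [sigmoid(sum(w * v for w, v in zip(row, xs))) for row in w1]
            hs = [1.0] + h
            y = sigmoid(sum(w * v for w, v in zip(w2, hs)))
            loss += 0.5 * (y - t) ** 2
            # Backward phase: propagate the error back and accumulate gradients.
            delta_out = (y - t) * y * (1.0 - y)
            for j in range(hidden + 1):
                g2[j] += delta_out * hs[j]
            for j in range(hidden):
                delta_h = delta_out * w2[j + 1] * h[j] * (1.0 - h[j])
                for i in range(3):
                    g1[j][i] += delta_h * xs[i]
        losses.append(loss)
        # Gradient descent step on all weights and biases.
        w2 = [w - learning_rate * g for w, g in zip(w2, g2)]
        w1 = [[w - learning_rate * g for w, g in zip(row, grow)]
              for row, grow in zip(w1, g1)]
    return losses

losses = train_xor()
```

`losses` records the error before each update, so comparing its first and last entries shows whether the repeated forward and backward phases are reducing the error.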
Newton’s method is a local algorithm that makes use of the Hessian matrix* of
the objective function; in this sense it is a second-order method. The idea of
the method is as follows: one starts with an initial guess which is reasonably
close to the true root, then the function is approximated by its tangent line
(which can be computed using the tools of calculus), and one computes the
x-intercept of this tangent line (which is easily done with elementary algebra).
This x-intercept will typically be a better approximation to the function's root
than the original guess, and the method can be iterated. In training, the
function whose root is sought is the gradient of the error function, so the
tangent is built from the second derivatives (the Hessian).
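The tangent-line iteration can be sketched in one dimension as follows. The function name `newton_root`, the tolerance, and the example function x^2 - 2 are illustrative assumptions.

```python
def newton_root(f, df, x0, tol=1e-12, max_iter=50):
    """Find a root of f by repeatedly jumping to the x-intercept of the tangent."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            break
        slope = df(x)
        if slope == 0.0:
            raise ZeroDivisionError("tangent is horizontal; no x-intercept")
        x = x - fx / slope      # x-intercept of the tangent line at x
    return x

# Root of x^2 - 2 starting from the guess 1.5 (converges to the square root of 2).
root = newton_root(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.5)
```

For minimization, `f` would be the first derivative of the error and `df` its second derivative, so the root found is a stationary point of the error.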
Conjugate gradient (CG) is a local algorithm for an objective function whose
gradient can be computed, and for that reason it belongs to the class of first
order methods. According to its behavior, it can be described as a deterministic
method. The CG method can be regarded as being somewhat intermediate
between the method of gradient descent and Newton’s method {Luenberger
(1984) [107]}. In the conjugate gradient algorithm the search is performed along
conjugate directions, which generally produces faster convergence than steepest
descent directions {Demuth and Beale (2002) [108]}. These search directions are
conjugate with respect to the Hessian matrix.
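A minimal sketch of the idea, restricted to a quadratic objective f(x) = 0.5 x^T A x - b^T x whose Hessian is the symmetric positive definite matrix A. The restriction to a quadratic, the Fletcher-Reeves form of the direction update, and the example matrix are assumptions made to keep the sketch short; training a network requires a nonlinear variant with a line search.

```python
def mat_vec(a, v):
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def conjugate_gradient(a, b, x0, tol=1e-12):
    """Minimize 0.5*x^T A x - b^T x; returns the minimizer and the directions used."""
    x = list(x0)
    r = [bi - ci for bi, ci in zip(b, mat_vec(a, x))]   # negative gradient
    d = list(r)                                          # first direction: steepest descent
    directions = []
    for _ in range(len(b)):
        rr = dot(r, r)
        if rr < tol:                   # squared gradient norm is already negligible
            break
        ad = mat_vec(a, d)
        alpha = rr / dot(d, ad)        # exact line search along d
        x = [xi + alpha * di for xi, di in zip(x, d)]
        directions.append(d)
        r = [ri - alpha * adi for ri, adi in zip(r, ad)]
        beta = dot(r, r) / rr          # Fletcher-Reeves coefficient
        d = [ri + beta * di for ri, di in zip(r, d)]
    return x, directions

A = [[4.0, 1.0], [1.0, 3.0]]   # hypothetical Hessian
b = [1.0, 2.0]
x_min, dirs = conjugate_gradient(A, b, [0.0, 0.0])
```

For this two-variable quadratic the method needs only two search directions, and `dot(dirs[0], mat_vec(A, dirs[1]))` is zero up to rounding, which is what "conjugate with respect to the Hessian" means.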
In optimization, quasi-Newton methods (also known as variable metric
methods) are algorithms for finding local maxima and minima of functions.
Quasi-Newton methods are based on Newton's method to find the stationary
point of a function, where the gradient is “0”. Newton's method assumes that the
function can be locally approximated as a quadratic in the region around the
optimum, and uses the first and second derivatives (gradient and Hessian) to
find the stationary point. In quasi-Newton methods the Hessian matrix of
second derivatives of the function to be minimized does not need to be
computed. The Hessian is updated by analyzing successive gradient vectors
instead. The quasi-Newton method can be classified as a local, first order and
deterministic training algorithm for the MLP.
* In mathematics, the Hessian matrix (or simply the Hessian) is the square matrix of second-order partial
derivatives of a function; that is, it describes the local curvature of a function of many variables.
The BFGS algorithm, proposed independently by Broyden, Fletcher, Goldfarb,
and Shanno, is one of the best-known quasi-Newton algorithms. In numerical
optimization, the BFGS method is a method for solving unconstrained nonlinear
optimization problems.
The BFGS method approximates Newton's method, a class of hill-climbing
optimization techniques that seeks a stationary point of a (preferably twice
continuously differentiable) function. For such problems, a necessary condition
for optimality is that the gradient be zero. Newton's method and the BFGS
method are not guaranteed to converge unless the function has a quadratic Taylor
expansion near an optimum. Newton's method uses the first and second
derivatives, whereas BFGS uses the first derivatives and an approximation of
the second. However, BFGS has shown good performance even for non-smooth
optimization problems.
In quasi-Newton methods, the Hessian matrix of second derivatives need not be
evaluated directly. Instead, the Hessian matrix is approximated using low-rank
updates (rank two in the case of BFGS) specified by gradient evaluations (or
approximate gradient evaluations). Quasi-Newton methods are a generalization of
the secant method to find the root of the first derivative for multidimensional
problems. In multiple dimensions the secant equation does not specify a unique
solution, and quasi-Newton methods differ in how they constrain the solution.
The BFGS method is one of the most popular members of this class {Nocedal &
Wright (2006) [109]}. Also in common use is L-BFGS, which is a limited-memory
version of BFGS that is particularly suited to problems with very large numbers
of variables (e.g., more than 1000). The L-BFGS-B variant handles simple box
constraints {Byrd, Lu and Nocedal (1995) [110]}.
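Training Algorithm
Step 1: Given $\min f(x)$; $x_1 \in \mathbb{R}^n$; $B_1 \in \mathbb{R}^{n \times n}$ positive definite;
Compute $g_1 = \nabla f(x_1)$. If $g_1 = 0$, stop; otherwise, set $k := 1$.
Step 2: Set $d_k = -B_k^{-1} g_k$.
Step 3: Carry out a line search along $d_k$, getting $\alpha_k > 0$,
$x_{k+1} = x_k + \alpha_k d_k$, and $g_{k+1} = \nabla f(x_{k+1})$. If $g_{k+1} = 0$, stop.
Step 4: Set
$B_{k+1} = B_k - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k} + \frac{y_k y_k^T}{s_k^T y_k}$
where
$s_k = \alpha_k d_k$, $y_k = g_{k+1} - g_k$.
Step 5: $k := k + 1$; go to Step 2.
The line search in Step 3 requires the step-length $\alpha_k$ to meet certain
conditions. If exact line search is used, $\alpha_k$ satisfies
$f(x_k + \alpha_k d_k) = \min_{\alpha > 0} f(x_k + \alpha d_k)$
In implementations of the BFGS algorithm, one normally requires that the
step-length $\alpha_k$ satisfies the Wolfe (1969) [111] conditions:
$f(x_k + \alpha_k d_k) - f(x_k) \le \delta_1 \alpha_k g_k^T d_k$,
$g(x_k + \alpha_k d_k)^T d_k \ge \delta_2 g_k^T d_k$,
where $\delta_1 < \delta_2$ are constants in (0, 1). For convenience, the line
search that satisfies the Wolfe conditions is called the Wolfe line search.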
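The steps above can be sketched in plain Python. This is a hedged illustration rather than a reference implementation: the helper names (`solve`, `wolfe_step`, `bfgs`), the choice of the identity matrix as the initial positive definite matrix, the constants 1e-4 and 0.9 in the Wolfe conditions, the bisection/expansion line search, and the quadratic test function are all assumptions.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve(b_mat, rhs):
    """Solve b_mat * d = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [r] for row, r in zip(b_mat, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= factor * m[c][k]
    d = [0.0] * n
    for r in range(n - 1, -1, -1):
        d[r] = (m[r][n] - sum(m[r][k] * d[k] for k in range(r + 1, n))) / m[r][r]
    return d

def wolfe_step(f, grad, x, d, delta1=1e-4, delta2=0.9, max_iter=60):
    """Bisection/expansion search for a step length meeting both Wolfe conditions."""
    fx = f(x)
    slope = dot(grad(x), d)              # g_k^T d_k, negative for a descent direction
    low, high, alpha = 0.0, None, 1.0
    for _ in range(max_iter):
        trial = [xi + alpha * di for xi, di in zip(x, d)]
        if f(trial) - fx > delta1 * alpha * slope:
            high = alpha                 # sufficient decrease fails: shrink the step
        elif dot(grad(trial), d) < delta2 * slope:
            low = alpha                  # curvature condition fails: lengthen the step
        else:
            return alpha
        alpha = 0.5 * (low + high) if high is not None else 2.0 * alpha
    return alpha                         # fallback if no Wolfe point was found

def bfgs(f, grad, x1, tol=1e-8, max_iter=200):
    n = len(x1)
    x = list(x1)
    b_mat = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # B_1 = I
    g = grad(x)
    for _ in range(max_iter):
        if dot(g, g) ** 0.5 < tol:                   # gradient (numerically) zero: stop
            break
        d = solve(b_mat, [-gi for gi in g])          # Step 2: d_k = -B_k^{-1} g_k
        alpha = wolfe_step(f, grad, x, d)            # Step 3: line search
        s = [alpha * di for di in d]
        x = [xi + si for xi, si in zip(x, s)]
        g_new = grad(x)
        y = [a - b for a, b in zip(g_new, g)]
        sy = dot(s, y)
        if sy > 1e-12:                               # Step 4: update only if s^T y > 0
            bs = [dot(row, s) for row in b_mat]
            sbs = dot(s, bs)
            b_mat = [[b_mat[i][j] - bs[i] * bs[j] / sbs + y[i] * y[j] / sy
                      for j in range(n)] for i in range(n)]
        g = g_new                                    # Step 5: next iteration
    return x

def f(x):
    return (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2

def grad_f(x):
    return [2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)]

x_star = bfgs(f, grad_f, [0.0, 0.0])
```

The guard on `sy` is a practical safeguard added here: the Wolfe curvature condition guarantees that s^T y is positive, which keeps the updated matrix positive definite, but the fallback return of the line search does not.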
3.9 Networks for Prediction
Predicting is making claims about something that will happen, often based on
information from the past and from the current state. Everyone solves the
problem of prediction every day with various degrees of success.
Neural networks can be used for prediction with various levels of success. Their
advantages include automatic learning of dependencies from measured data alone,
without any need to add further information (such as the type of dependency, as
is required in regression). The neural network is trained on historical data in
the hope that it will discover hidden dependencies and that it will be able to
use them for predicting the future. In other words, a neural network is not
represented by an explicitly given model. It is more of a black box that is able
to learn something. The basic idea is to train a neural network with past data
and then use this network to predict future values.
A further advantage of using neural networks for prediction is that they are
able to learn from examples only and that, after their learning is finished,
they are able to catch hidden and strongly non-linear dependencies, even when
there is significant noise in the training set. The disadvantage is that NNs can
learn only the dependency that is valid in a certain period. The error of
prediction cannot generally be estimated.
3.10 ANNs Predicting Models in Concrete Technology
Unlike traditional parametric models, ANN models are able to construct a
possibly complex relationship between input and output variables with an
excellent level of accuracy compared with that of conventional methods [98].
The main advantage of ANNs is that one does not have to assume an explicit
model form, which is a prerequisite in the parametric approaches. Indeed, in
ANN models, a relationship of a possibly complicated nature between input and
output variables is generated by the data points. In comparison to parametric
methods, ANNs can deal with relatively imprecise or incomplete data and
approximate results, and are less vulnerable to outliers. They are highly parallel,
that is, their numerous independent operations can be executed simultaneously
[103].
These advances in the field of artificial intelligence continue to have a strong
influence on civil engineering. New methods and algorithms are emerging that
enable civil engineers to use computing in different ways. One of them is in the
area of concrete technology.
Concrete, as a non-homogeneous material, consists of separate phases:
hydrated cement paste, transition zone, and aggregate. Although most of the
characteristics of concrete are associated with the average characteristics of a
component microstructure, the compressive strength and failure of concrete are
related to the weakest part of the microstructure. Experimental work remains the
most reliable way to determine these characteristics. However, since
experimental work needs a lot of effort, time, and money, there is a pressing
need for new methodologies and techniques that reduce this effort and save time
and money while preserving high accuracy. Artificial intelligence has proven its
capability in simulating and predicting the behavior of different physical
phenomena in most engineering fields, such as inspection, design, environment,
hydrology, geotechnical engineering, and concrete technology {[106], Rasa et al
(2009) [112], Abdeen & Hodhod (2010) [113], and Razavi et al (2011) [114]}.
3.11 Previous Studies in Predicting Properties of Concrete using ANNs:
The use of the ANN approach to describe and predict the properties of fresh
and hardened concrete is not new. A number of applications predicting one or
more properties of concrete have been proposed by several researchers {Ref.
[115]-[172]}. However, few of them deal specifically with the rheology and
compressive strength of SCC. A brief explanation of these studies follows.
1. Nehdi et al (2001) [130]
This study is considered the first attempt to predict the performance of
SCC mixtures using ANN modeling. In this research, artificial neural
networks were developed to predict SCC performance based on mixture
proportions. The values of slump flow, filling capacity, segregation
resistance, and 28-day compressive strength were modeled.
Each SCC property was modeled separately but using the same network
architecture. In addition to the input and output layers, the final network
contains two hidden layers. The first hidden layer has 10 processing units,
while the second layer has only five processing units. A sigmoid function
(logsig) was employed as the activation function for all processing units,
with full connection adopted between units in different layers within the
network, as shown in Figure 3.3.
The input layer of this network model consists of an external input vector
of 10 elements (cement, water, fly ash, slag, silica fume, limestone filler,
sand, gravel, viscosity-enhancing admixture, and high-range water-reducing
admixture). These were selected to represent adequately the SCC mixtures in
the database described as follows. The output layer contains one processing
unit that represents the network’s output (one SCC property) for each input
vector.
The researchers made an important point: while it is possible to include
several processing units in the output layer representing various properties
of SCC, the limited number of published experimental data that
simultaneously include several properties of SCC makes it difficult for the
network to capture the relationships between the mixture components and SCC
properties. Therefore, in this study each SCC property was modeled
separately using the same network architecture.
The researchers stated that the degree of success of the neural network
model in predicting the behavior of SCC mixtures largely depends on how
comprehensive the training data is. In other words, it depends on the
availability of a large variety of preexisting experimental data, capable of
teaching the network all aspects of the relationship between the mixture
variables of SCC and its measured properties. An extensive literature
review has identified a great deal of published data on SCC. The exclusion
of one or more of SCC properties in some studies and the ambiguity of
mixture proportions and testing methods in others, however, have reduced
significantly the number of adequate experimental data to train the
network. To avoid any further complexity, only experimental data having
mixture components with comparable physical and chemical properties
were identified for the training and testing of the network. With the
aforementioned criteria enforced, a number of data sets were selected from
different studies to train and test the network model, as summarized in
Table 3.2.
For the back-propagation neural network used in this study, the following
parameter values were used: learning parameter = 0.05; minimum gradient
= 1E-10; and desired error at the output layer = 1E-5. Weights and biases
are often initialized randomly.
Figure 3.3: Architecture of neural network model [130].

The nineteen experimental data sets used for network training and testing
contain sets of pairs. Each pair consists of an input vector of 10 elements
(mixture variables) and an output vector of one element (SCC property). To
simplify the learning process, each element in an input vector (cement,
water, fly ash, slag, silica fume, limestone filler, sand, gravel, viscosity-
enhancing admixture (VEA), and high-range water-reducing admixture)
was normalized as a percentage of the total weight of the SCC mixture.
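The architecture and input normalization described for this study can be sketched as follows. The sketch only reproduces the shape of the network (10 inputs, hidden layers of 10 and 5 logsig units, one logsig output) and the percentage-of-total-weight normalization; the weights are random and untrained, and the mixture quantities, function names, and weight range are hypothetical values chosen for illustration.

```python
import math
import random

def logsig(z):
    return 1.0 / (1.0 + math.exp(-z))

def normalize_mixture(masses):
    """Express each ingredient as a percentage of the total mixture weight."""
    total = sum(masses)
    return [100.0 * m / total for m in masses]

def init_layer(n_in, n_out, rng):
    # Each row holds [bias, w_1, ..., w_n_in] for one processing unit.
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]

def forward(layers, x):
    """Feed-forward pass through fully connected logsig layers."""
    a = list(x)
    for layer in layers:
        a = [logsig(row[0] + sum(w * v for w, v in zip(row[1:], a))) for row in layer]
    return a

rng = random.Random(0)
sizes = [10, 10, 5, 1]          # input, hidden 1, hidden 2, output
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes[:-1], sizes[1:])]

# Hypothetical mixture (kg per cubic metre): cement, water, fly ash, slag,
# silica fume, limestone filler, sand, gravel, VEA, high-range water reducer.
mixture = [400.0, 180.0, 60.0, 0.0, 20.0, 0.0, 800.0, 850.0, 0.5, 5.0]
x = normalize_mixture(mixture)
y = forward(layers, x)
```

Because the output unit is a logsig, `y[0]` always lies between 0 and 1, so a measured property such as slump flow would have to be scaled into that range before training; how the original study scaled its outputs is not stated in the text above.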
In conclusion, this study showed that the ANN approach can be used as
a new modeling technique to predict the rheological behavior and
mechanical performance of SCC mixtures. This approach performed very
well in predicting not only the rheological properties and compressive
strength of SCC mixtures used in the training process of the model, but
also those of test mixtures that were unfamiliar to the neural network.
The model was able to predict slump flow, filling capacity, segregation,
and 28-d compressive strength values of SCC mixtures made by various
researchers with an average absolute error of 4, 5, 7, and 7%, respectively.
It is clear that the ANN model captured the effect of SCC mixture variables
on the slump flow, filling capacity, segregation, and compressive strength.
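The text does not define how the average absolute error was computed. One common reading, assumed here purely for illustration, is the mean of the absolute differences between predicted and measured values, each expressed as a percentage of the measured value. The function name and the sample numbers are hypothetical.

```python
def average_absolute_error_percent(measured, predicted):
    """Mean of |predicted - measured| / |measured|, in percent (assumed definition)."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    ratios = [abs(p - m) / abs(m) for m, p in zip(measured, predicted)]
    return 100.0 * sum(ratios) / len(ratios)

measured = [650.0, 700.0, 600.0]     # hypothetical slump flow values, mm
predicted = [624.0, 728.0, 612.0]    # hypothetical network predictions, mm
aae = average_absolute_error_percent(measured, predicted)
```

Here the individual errors are 4%, 4%, and 2%, so the average is about 3.3%.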
Again, the success of the model was limited by the amount of data used
to train the model. It was suggested that the model could be used in
mixture proportioning to limit the number of laboratory trial batches.
Mixture proportions could be created and tested in the artificial neural
network model to select mixtures that achieve the required properties.
Table 3.2: Database of Ref. [130]
2. Zaid (2007) [148]

This work focused on the efficiency of the ANN technique in predicting the
performance of SCC. The researcher stated that the ANN models developed in
his work were based solely on the experimental results available in the open
literature. The experimental part of this study was based on preparing,
mixing, and testing (5) SCC mixtures. The results obtained from the
experimental part were used to validate the ANN models.
As in the previous work of Nehdi et al [130], the researcher mentioned the
problem of the limited available data. Thus, he followed the same path as
Nehdi et al [130] in constructing models that contain one processing unit
representing the network’s output (one SCC property). However, the values
of yield stress, plastic viscosity, slump flow, L-Box, 28-day compressive
strength, and splitting tensile strength were modeled.
Table 3.3: Data Sets and Their Sources used in Zaid [148]

| Model | Training Sets | Testing Sets | Sources |
|---|---|---|---|
| Yield Stress | 24 | 4 | Khayat [64], Khayat et al (2002) [173], and Sonebi (2004) [174] |
| Plastic Viscosity | 24 | 4 | Khayat [64], Khayat et al (2002) [173], and Sonebi (2004) [174] |
| Slump Flow | 79 | 12 | Kim et al [122], Nehdi et al [130], Kim et al (1998) [175], Sonebi (2004) [176], Zhu and Bartos (2003) [177], Ravindrarajah et al (2003) [178], Van et al (1998) [179], Troli et al (2003) [180], Shinoh & Matsuka (2003) [181], and Persson (2004) [182] |
| L-Box | 52 | 6 | Nehdi et al [130], Sonebi [176], Van et al [179], and Persson [182] |
| 28 days Compressive Strength | 24 | 4 | Nehdi et al [130], Kim et al [175], Persson [182], Bouzoubaa and Lachemi (2001) [183], and Ambroise et al (ACI SP) [184] |
| Splitting-Tensile Strength | 20 | 3 | Kim et al [175], and Ma & Dietz (2002) [185] |
While in both studies (Nehdi et al [130] and Zaid [148]) each SCC
property was modeled separately, Zaid [148] constructed several different
network architectures. For yield stress and plastic viscosity the models
were the same: (11) processing units in the input layer represented
cement, fine aggregate, coarse aggregate, water, silica fume, fly ash, blast
furnace slag, set-retarding agent, air-entraining agent, viscosity-modifying
agent, and high-range water-reducing agent. The output layer contained one
processing element that represented the yield stress or plastic viscosity.
One hidden layer with (12) processing units was used.
The slump flow model contained an input layer with (9) processing units that
represented cement, fine aggregate, coarse aggregate, water, limestone, fly
ash, slag, viscosity-modifying agent, and high-range water-reducing agent.
The output layer contained a single element that represented the value
of slump flow. This network contained three hidden layers with (24)
processing units in total, divided as follows: (12) elements in the first
hidden layer and (6) elements in each of the second and third layers.
L-Box model contained an input layer with (8) processing units
represented cement, fine aggregate, coarse aggregate, water, limestone, fly
ash, viscosity-modifying agent, and high-range water-reducing agent. The
output layer contained single element represented the value of L-Box
value. One hidden layer with (25) processing units were used.
The input layer for 28 days compressive strength contained (8)
processing units represented cement, aggregate, water, limestone, fly ash,
blast furnace slag, air-entraining agent, and high-range water-reducing
admixture. The output layer with single element represented this property
value. Two hidden layers with (3) and (5) elements in the 1st and 2nd layer
respectively were adopted.
The splitting tensile strength model had the same architecture as the
28-day compressive strength model; however, the eight input elements
differed. They were cement, aggregate, water, silica fume, fly ash, quartz
powder, high-range water-reducing admixture, and age at testing (days).
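The architectures described above can be collected in one place and compared by size. The sketch below lists the layer sizes as given in the text and counts the adjustable weights of each network. Including one bias per hidden and output unit is an assumption, since the text does not state how the software handled biases, and the dictionary keys are illustrative names.

```python
# Layer sizes (input, hidden layers..., output) as described in the text for Zaid [148]
ARCHITECTURES = {
    "yield_stress": [11, 12, 1],
    "plastic_viscosity": [11, 12, 1],
    "slump_flow": [9, 12, 6, 6, 1],
    "l_box": [8, 25, 1],
    "compressive_strength_28d": [8, 3, 5, 1],
    "splitting_tensile_strength": [8, 3, 5, 1],
}

def count_parameters(sizes, with_bias=True):
    """Count connection weights (and, optionally, one bias per non-input unit)."""
    total = 0
    for n_in, n_out in zip(sizes, sizes[1:]):
        total += n_in * n_out      # weights between two consecutive layers
        if with_bias:
            total += n_out         # assumed: one bias per receiving unit
    return total

PARAM_COUNTS = {name: count_parameters(sizes) for name, sizes in ARCHITECTURES.items()}

for name, sizes in ARCHITECTURES.items():
    print(f"{name:28s} layers={sizes} parameters={PARAM_COUNTS[name]}")
```

A count of this kind gives a rough measure of model size that can be set against the number of data sets available for training each model.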
Specialized computer software named Pythia-The Neural Network Designer
was used to construct and train the models. The sigmoid function was used
in all models. All models had a learning rate parameter of (0.1), which
was adjusted automatically. The number of learning cycles was (3000) for
all models except the slump flow model, for which it was (5000). The sets
of data used in those models and their sources are listed in Table 3.3.
All of the data used in the input layers (in all of the models) were
normalized as a percentage of the density of normal-weight concrete
(2400 kg/m³). The data used for the output layers were small enough to be
compatible with the limits of the sigmoid function.
The experimental work of this study consisted of designing, preparing,
mixing, and testing (5) SCC mixes. These mixes contained water, cement,
fine aggregates, coarse aggregates, and high-range water-reducing
admixture. The mix design used in this part was adopted from EFNARC [3].
The concrete mixture proportions were selected to investigate the effect
of varying fine and coarse aggregate contents on the filling and passing
abilities of SCC mixes. Slump flow, L-Box ratio, compressive strength, and
splitting tensile strength tests were adopted for this part of the study,
whose aim was to investigate the predictive ability of the ANN models
developed for slump flow, L-Box, 28-day compressive strength, and
splitting tensile strength. No experimental tests were carried out for the
rheological parameters (yield stress and plastic viscosity).
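The input normalization described above (constituent contents expressed relative to the 2400 kg/m3 density of normal-weight concrete) can be sketched as follows. The mix values and the function name are hypothetical. Whether the study stored the result as a fraction or as a percentage is not stated, so the sketch uses fractions between 0 and 1; multiplying by 100 gives percentages.

```python
REFERENCE_DENSITY = 2400.0  # kg/m3, density of normal-weight concrete quoted in the text

def normalize_mix(mix_kg_per_m3, reference=REFERENCE_DENSITY):
    """Express each constituent content (kg/m3) as a fraction of the reference density."""
    return {name: mass / reference for name, mass in mix_kg_per_m3.items()}

# Hypothetical SCC mix proportions in kg/m3 (illustrative only, not study data)
mix = {
    "cement": 480.0,
    "fine_aggregate": 840.0,
    "coarse_aggregate": 720.0,
    "water": 192.0,
    "hrwra": 7.2,
}

scaled = normalize_mix(mix)
for name, value in scaled.items():
    print(f"{name:18s} {value:.4f}")
```

Scaling every input by the same reference keeps the relative proportions of the constituents intact while bringing all inputs into a range that suits sigmoid units.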
The results of this study showed that:
1- The ANN models were able to predict the rheological, fresh, and
hardened properties with an average absolute error of 9.5% for yield
stress, 9.8% for plastic viscosity, 3.8% for slump flow, 8% for L-Box
ratio, 5% for compressive strength, and 5% for splitting tensile
strength.
2- The effects of the mix constituents on the rheological and fresh
performance agreed with the literature discussed in Chapter 2.
As a recommendation for future work, the researcher advised creating a
predictive ANN model that produces several outputs at the same time.
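The average absolute error figures quoted above are percentages. One common way to compute such a figure is the mean of the absolute relative errors, sketched below. This definition is an assumption, since the text does not give the exact formula used in [148], and the measured and predicted values are hypothetical.

```python
def average_absolute_percentage_error(measured, predicted):
    """Mean of |predicted - measured| / |measured|, expressed in percent."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("measured and predicted must be non-empty and of equal length")
    relative_errors = [abs(p - m) / abs(m) for m, p in zip(measured, predicted)]
    return 100.0 * sum(relative_errors) / len(relative_errors)

# Hypothetical slump flow values in mm (illustrative only, not study data)
measured = [650.0, 700.0, 720.0, 680.0, 750.0]
predicted = [630.0, 710.0, 745.0, 690.0, 735.0]

aae = average_absolute_percentage_error(measured, predicted)
print(f"average absolute error = {aae:.2f}%")
```

Note that this measure is undefined when a measured value is zero, so it is only suitable for strictly positive properties such as slump flow or strength.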
3. Ahmet Bilgil (2010) [161]
In this study, an artificial neural network (ANN) is used to determine the
rheological properties of fresh concrete. The experimental slump, yield
stress, and viscosity data of Ferraris and de Larrard (1998) [186],
obtained from concretes of different compositions, are used. Slump, yield
stress, and viscosity are estimated with respect to the mixture design
parameters.
The network used in this study is shown in Figure 3.4. It has a three-layer
structure with six inputs and three outputs. The inputs are gravel, sand,
fine sand, cement, water, and superplasticizer (SP). The outputs are slump,
yield stress, and viscosity. The number of cells in the hidden layer is up
to ten, based on the experiments conducted during training. To test the
accuracy of the trained network, the coefficient of determination R² was
adopted. This coefficient is a measure of how well the considered
independent variables account for the measured dependent variable. The
higher the R² value, the better the prediction relationship.
Figure 3.4: Structure of application artificial neural network [161].
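The coefficient of determination mentioned above can be computed from measured and predicted values as sketched below. The sketch assumes the common definition R² = 1 - SS_res / SS_tot; the text does not state which formula was used in [161], and the sample values are hypothetical.

```python
def r_squared(measured, predicted):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("measured and predicted must be non-empty and of equal length")
    mean_measured = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    if ss_tot == 0.0:
        raise ValueError("R² is undefined when all measured values are equal")
    return 1.0 - ss_res / ss_tot

# Hypothetical measured and predicted values (illustrative only, not study data)
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

r2 = r_squared(measured, predicted)
print(f"R² = {r2:.3f}")
```

With this definition a perfect prediction gives exactly 1, and a model that does no better than predicting the mean of the measured values gives 0.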

The ANN is trained using various training settings. (58) mixes are used as
the data for the model. The mixture ingredients that comprise the concrete
are sorted into two groups, based on their “Dry mixture masses” and
“Compositions”, as shown in Table 3.5. Additionally, from these groups of
data, three sub-groups have been constructed as shown in Table 3.4. The ANN
is trained using the Levenberg-Marquardt method, which is implemented in
Matlab as the “trainlm” function. This function was selected because it
gave a smaller training error than the other training algorithms available
in Matlab. Part of the data is used to train the ANN, and its performance
is evaluated using the rest of the data, which is not used in the training
session. In all training and test processes the groups of data shown in
Table 3.5 were used.
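The division of the 58 mixes into a training part and a held-out test part can be sketched as a simple random split. The 75% training fraction, the random seed, and the function name are assumptions for illustration; the text does not state how many mixes were placed in each part or how they were selected.

```python
import random

def holdout_split(n_samples, train_fraction, seed=0):
    """Randomly divide sample indices into disjoint training and test sets."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # reproducible shuffle
    n_train = int(n_samples * train_fraction)  # truncate towards zero
    return sorted(indices[:n_train]), sorted(indices[n_train:])

# 58 mixes as in the study; the 75/25 proportion is an assumed example
train_idx, test_idx = holdout_split(58, train_fraction=0.75)
print(len(train_idx), "training mixes,", len(test_idx), "test mixes")
```

Keeping the test indices completely separate from the training indices is what allows the test error to be read as an estimate of performance on unseen mixes.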
Table 3.4 Input data groups for ANN [161].
The results obtained from this study indicated that ANN is a usable method
for determining the rheological properties (Bingham model) of fresh
concrete.
3.12 Concluding Remarks:

1. Artificial neural networks are a branch of artificial intelligence. They are
modeled after the human brain, which consists of brain cells and connections. As
in the human brain, these networks are capable of learning from examples. Neural
networks learn by adjusting their connection weights. Most networks are based on
supervised learning algorithms in which pairs of input and desired output are
shown to them during a training session.
2. Neural networks can be used for prediction with various levels of success.
Their advantages include automatic learning of dependencies from measured data
alone, without any need for further information (such as the type of dependency,
as required in regression). The neural network is trained on historical data
with the expectation that it will discover hidden dependencies and be able to
use them for prediction.
3. Because of its complex mixture proportions, research on SCC has been highly
empirical, and no models with reliable predictive capabilities for its behavior
have been developed. Thus, its rheological and mechanical properties are often
described using traditional regression analysis and statistical methods. Based
on its abilities, ANN can be used to predict the performance of SCC mixtures
effectively. It can capture complex interactions among input/output variables in
a system without any prior knowledge of the nature of these interactions, and
without having to explicitly assume a model form. Indeed, such a model form is
generated by the data points themselves.
4. A number of applications for predicting one or more properties of concrete
have been proposed by several researchers. However, few of them deal
specifically with the rheology and compressive strength of SCC. All of those
studies depend on a limited range of data and deal with one, two, or three
parameters. No models are available that cover a very wide range of data or that
are designed to capture all of the rheological, fresh, and hardened parameters
of SCC at the same time.