
ARTIFICIAL NEURAL

NETWORKS –
ARCHITECTURES AND
APPLICATIONS

Edited by Kenji Suzuki


Artificial Neural Networks – Architectures and Applications
http://dx.doi.org/10.5772/3409
Edited by Kenji Suzuki

Contributors
Eduardo Bianchi, Thiago M. Geronimo, Carlos E. D. Cruz, Fernando de Souza Campos, Paulo Roberto De Aguiar, Yuko
Osana, Francisco Garcia Fernandez, Ignacio Soret Los Santos, Francisco Llamazares Redondo, Santiago Izquierdo
Izquierdo, José Manuel Ortiz-Rodríguez, Hector Rene Vega-Carrillo, José Manuel Cervantes-Viramontes, Víctor Martín
Hernández-Dávila, Maria Del Rosario Martínez-Blanco, Giovanni Caocci, Amr Radi, Joao Luis Garcia Rosa, Jan Mareš,
Lucie Grafova, Ales Prochazka, Pavel Konopasek, Siti Mariyam Shamsuddin, Hazem M. El-Bakry, Ivan Nunes Da Silva, Da
Silva

Published by InTech
Janeza Trdine 9, 51000 Rijeka, Croatia

Copyright © 2013 InTech


All chapters are Open Access distributed under the Creative Commons Attribution 3.0 license, which allows users to
download, copy and build upon published articles even for commercial purposes, as long as the author and publisher
are properly credited, which ensures maximum dissemination and a wider impact of our publications. After this work
has been published by InTech, authors have the right to republish it, in whole or part, in any publication of which they
are the author, and to make other personal use of the work. Any republication, referencing or personal use of the
work must explicitly identify the original source.

Notice
Statements and opinions expressed in the chapters are those of the individual contributors and not necessarily those
of the editors or publisher. No responsibility is accepted for the accuracy of information contained in the published
chapters. The publisher assumes no responsibility for any damage or injury to persons or property arising out of the
use of any materials, instructions, methods or ideas contained in the book.

Publishing Process Manager Iva Lipovic


Technical Editor InTech DTP team
Cover InTech Design team

First published January, 2013


Printed in Croatia

A free online edition of this book is available at www.intechopen.com


Additional hard copies can be obtained from orders@intechopen.com

Artificial Neural Networks – Architectures and Applications, Edited by Kenji Suzuki


p. cm.
ISBN 978-953-51-0935-8

Free online editions of InTech books and journals can be found at
www.intechopen.com
Contents

Preface VII

Section 1 Architecture and Design 1

Chapter 1 Improved Kohonen Feature Map Probabilistic Associative
Memory Based on Weights Distribution 3
Shingo Noguchi and Osana Yuko

Chapter 2 Biologically Plausible Artificial Neural Networks 25
João Luís Garcia Rosa

Chapter 3 Weight Changes for Learning Mechanisms in Two-Term
Back-Propagation Network 53
Siti Mariyam Shamsuddin, Ashraf Osman Ibrahim and Citra
Ramadhena

Chapter 4 Robust Design of Artificial Neural Networks Methodology in
Neutron Spectrometry 83
José Manuel Ortiz-Rodríguez, Ma. del Rosario Martínez-Blanco, José
Manuel Cervantes Viramontes and Héctor René Vega-Carrillo

Section 2 Applications 113

Chapter 5 Comparison Between an Artificial Neural Network and Logistic
Regression in Predicting Long Term Kidney
Transplantation Outcome 115
Giovanni Caocci, Roberto Baccoli, Roberto Littera, Sandro Orrù,
Carlo Carcassi and Giorgio La Nasa

Chapter 6 Edge Detection in Biomedical Images Using
Self-Organizing Maps 125
Lucie Gráfová, Jan Mareš, Aleš Procházka and Pavel Konopásek

Chapter 7 MLP and ANFIS Applied to the Prediction of Hole Diameters in
the Drilling Process 145
Thiago M. Geronimo, Carlos E. D. Cruz, Fernando de Souza Campos,
Paulo R. Aguiar and Eduardo C. Bianchi

Chapter 8 Integrating Modularity and Reconfigurability for Perfect
Implementation of Neural Networks 163
Hazem M. El-Bakry

Chapter 9 Applying Artificial Neural Network Hadron - Hadron
Collisions at LHC 183
Amr Radi and Samy K. Hindawi

Chapter 10 Applications of Artificial Neural Networks in Chemical
Problems 203
Vinícius Gonçalves Maltarollo, Káthia Maria Honório and Albérico
Borges Ferreira da Silva

Chapter 11 Recurrent Neural Network Based Approach for Solving
Groundwater Hydrology Problems 225
Ivan N. da Silva, José Ângelo Cagnon and Nilton José Saggioro

Chapter 12 Use of Artificial Neural Networks to Predict The Business
Success or Failure of Start-Up Firms 245
Francisco Garcia Fernandez, Ignacio Soret Los Santos, Javier Lopez
Martinez, Santiago Izquierdo Izquierdo and Francisco Llamazares
Redondo
Preface

Artificial neural networks are probably the single most successful technology of the last two decades and have been widely used in a large variety of applications in various areas. An artificial neural network, often just called a neural network, is a mathematical or computational model inspired by the structure and function of biological neural networks in the brain. An artificial neural network consists of a number of artificial neurons (i.e., nonlinear processing units) which are connected to each other via synaptic weights (or simply just "weights"). An artificial neural network can "learn" a task by adjusting weights. There are supervised and unsupervised models. A supervised model requires a "teacher" or desired (ideal) output to learn a task. An unsupervised model does not require a "teacher," but it learns a task based on a cost function associated with the task. An artificial neural network is a powerful, versatile tool. Artificial neural networks have been successfully used in various applications such as biological, medical, industrial, control engineering, software engineering, environmental, economical, and social applications. The high versatility of artificial neural networks comes from their high capability and learning function. It has been theoretically proved that an artificial neural network can approximate any continuous mapping to arbitrary precision. A desired continuous mapping or a desired task is acquired in an artificial neural network by learning.

The purpose of this book is to provide recent advances in architectures, methodologies, and applications of artificial neural networks. The book consists of two parts: architectures and applications. The architecture part covers architectures, design, optimization, and analysis of artificial neural networks. The fundamental concepts, principles, and theory in this section help the reader understand and use an artificial neural network in a specific application properly and effectively. The applications part covers applications of artificial neural networks in a wide range of areas including biomedical applications, industrial applications, physics applications, chemistry applications, and financial applications.

Thus, this book will be a fundamental source of recent advances and applications of artificial neural networks in a wide variety of areas. The target audience of this book includes professors, college students, graduate students, and engineers and researchers in companies. I hope this book will be a useful source for readers.

Kenji Suzuki, Ph.D.


University of Chicago
Chicago, Illinois, USA
Section 1

Architecture and Design


Chapter 1

Improved Kohonen Feature Map Probabilistic


Associative Memory Based on Weights
Distribution

Shingo Noguchi and Osana Yuko

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/51581

1. Introduction

Recently, neural networks have been drawing much attention as a method to realize flexible information processing. Neural networks model groups of neurons in the brain of living creatures and imitate them technologically. Neural networks have several features; one of the most important is that they can learn to acquire the ability of information processing.

In the field of neural networks, many models have been proposed such as the Back Propagation algorithm [1], the Kohonen Feature Map (KFM) [2], the Hopfield network [3], and the Bidirectional Associative Memory [4]. In these models, the learning process and the recall process are divided, and therefore they need all information to learn in advance.

However, in the real world, it is very difficult to get all information to learn in advance, so we need a model whose learning process and recall process are not divided. As such a model, Grossberg and Carpenter proposed the ART (Adaptive Resonance Theory) [5]. However, the ART is based on the local representation, and therefore it is not robust against damaged neurons in the Map Layer. In the field of associative memories, some models have been proposed [6-8]. Since these models are based on the distributed representation, they have robustness against damaged neurons. However, their storage capacities are small because their learning algorithm is based on Hebbian learning.
On the other hand, the Kohonen Feature Map (KFM) associative memory [9] has been proposed. Although the KFM associative memory is based on the local representation similarly to the ART [5], it can learn new patterns successively [10], and its storage capacity is larger than that of the models in refs. [6-8]. It can deal with auto and hetero associations and with associations for plural sequential patterns including common terms [11, 12]. Moreover, the KFM associative memory with area representation [13] has been proposed. In that model, the area representation [14] was introduced to the KFM associative memory, and it has robustness against damaged neurons. However, it can not deal with one-to-many associations or with associations of analog patterns. As a model which can deal with analog patterns and one-to-many associations, the Kohonen Feature Map Associative Memory with Refractoriness based on Area Representation [15] has been proposed. In that model, one-to-many associations are realized by the refractoriness of neurons. Moreover, by an improvement of the calculation of the internal states of the neurons in the Map Layer, it has enough robustness against damaged neurons when analog patterns are memorized. However, none of these models can realize probabilistic association for a training set including one-to-many relations.

Figure 1. Structure of conventional KFMPAM-WD.

As a model which can realize probabilistic association for a training set including one-to-many relations, the Kohonen Feature Map Probabilistic Associative Memory based on Weights Distribution (KFMPAM-WD) [16] has been proposed. However, in this model, the weights are updated only in the area corresponding to the input pattern, so learning that takes the neighborhood into account is not carried out.

In this paper, we propose an Improved Kohonen Feature Map Probabilistic Associative Memory based on Weights Distribution (IKFMPAM-WD). This model is based on the conventional Kohonen Feature Map Probabilistic Associative Memory based on Weights Distribution [16]. The proposed model can realize probabilistic association for a training set including one-to-many relations. Moreover, this model has enough robustness against noisy input and damaged neurons. In addition, learning that takes the neighborhood into account can be realized.

2. KFM Probabilistic Associative Memory based on Weights Distribution

Here, we explain the conventional Kohonen Feature Map Probabilistic Associative Memory based on Weights Distribution (KFMPAM-WD).

2.1. Structure

Figure 1 shows the structure of the conventional KFMPAM-WD. As shown in Fig. 1, this model has two layers: the Input/Output Layer and the Map Layer, and the Input/Output Layer is divided into some parts.

2.2. Learning process

In the learning algorithm of the conventional KFMPAM-WD, the connection weights are learned as follows:

1. The initial values of the weights are chosen randomly.

2. The Euclidean distance between the learning vector $X^{(p)}$ and the connection weights vector $W_i$, $d(X^{(p)}, W_i)$, is calculated.

3. If $d(X^{(p)}, W_i) > \theta_t^{learn}$ is satisfied for all neurons, the input pattern $X^{(p)}$ is regarded as an unknown pattern. If the input pattern is regarded as a known pattern, go to (8).

4. The neuron which is the center of the learning area, r, is determined as follows:

$$ r = \arg\min_{i \,:\, D_{iz} + D_{zi} < d_{iz}\ (\forall z \in F)} d\bigl(X^{(p)}, W_i\bigr) \qquad (1) $$

where F is the set of the neurons whose connection weights are fixed, and $d_{iz}$ is the distance between the neuron i and the neuron z whose connection weights are fixed. In Eq. (1), $D_{ij}$ is the radius of the ellipse area whose center is the neuron i for the direction to the neuron j, and is given by

$$ D_{ij} = \begin{cases} a_i, & d_{ijy} = 0 \\ b_i, & d_{ijx} = 0 \\ \dfrac{a_i b_i \sqrt{m_{ij}^2 + 1}}{\sqrt{b_i^2 + m_{ij}^2 a_i^2}}, & \text{otherwise} \end{cases} \qquad (2) $$

where $a_i$ is the long radius of the ellipse area whose center is the neuron i and $b_i$ is the short radius of the ellipse area whose center is the neuron i. In the KFMPAM-WD, $a_i$ and $b_i$ can be set for each training pattern. $m_{ij}$ is the slope of the line through the neurons i and j. In Eq. (1), the neuron whose Euclidean distance between its connection weights and the learning vector is minimum is selected among the neurons which can take areas without overlaps to the areas corresponding to the patterns which are already trained. In Eq. (2), $a_i$ and $b_i$ are used as the size of the area for the learning vector.

5. If $d(X^{(p)}, W_r) > \theta_t^{learn}$ is satisfied, the connection weights of the neurons in the ellipse whose center is the neuron r are updated as follows:

$$ W_i(t+1) = \begin{cases} W_i(t) + \alpha(t)\bigl(X^{(p)} - W_i(t)\bigr), & d_{ri} \le D_{ri} \\ W_i(t), & \text{otherwise} \end{cases} \qquad (3) $$

where $\alpha(t)$ is the learning rate and is given by

$$ \alpha(t) = \frac{-\alpha_0 (t - T)}{T}. \qquad (4) $$

Here, $\alpha_0$ is the initial value of $\alpha(t)$ and T is the upper limit of the learning iterations.

6. (5) is iterated until $d(X^{(p)}, W_r) \le \theta_t^{learn}$ is satisfied.

7. The connection weights of the neuron r, $W_r$, are fixed.

8. (1)–(7) are iterated when a new pattern set is given.
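To make the area-restricted learning above concrete, the following Python sketch implements steps (4)–(7) under simplifying assumptions: a two-dimensional map with coordinates `pos`, an axis-aligned elliptical area in place of the direction-dependent radius of Eq. (2), and a crude non-overlap test. All function and variable names are illustrative, not taken from the chapter.

```python
import numpy as np

def kfmpam_wd_learn_pattern(W, fixed, pos, x, a, b, alpha0=0.5, T=100, theta_t=1e-4):
    """Hedged sketch of KFMPAM-WD learning steps (4)-(7) for one training
    vector x.  W: (n_map, n_in) weights, fixed: boolean mask of weight-fixed
    neurons, pos: (n_map, 2) map coordinates, a/b: long and short radii of
    the elliptical learning area.  Simplified, illustrative code only."""
    d = np.linalg.norm(W - x, axis=1)              # d(X^(p), W_i) for every map neuron

    # Step (4): pick the closest neuron whose area would not overlap the
    # areas of already-fixed neurons (overlap test reduced to a circular
    # check with radius a instead of Eq. (2)).
    r = None
    for i in np.argsort(d):
        if not fixed.any():
            r = i
            break
        if np.all(np.linalg.norm(pos[fixed] - pos[i], axis=1) > 2 * a):
            r = i
            break
    if r is None:
        r = int(np.argmin(d))                      # fallback if every candidate overlaps (not in the original)

    # Steps (5)-(6): repeat the area-restricted update until W_r matches x.
    for t in range(T):
        if np.linalg.norm(W[r] - x) <= theta_t:
            break
        alpha = -alpha0 * (t - T) / T              # learning rate alpha(t), Eq. (4)
        dx = (pos[:, 0] - pos[r, 0]) / a           # normalized ellipse coordinates
        dy = (pos[:, 1] - pos[r, 1]) / b
        inside = dx ** 2 + dy ** 2 <= 1.0          # neurons inside the ellipse around r
        W[inside] += alpha * (x - W[inside])       # Eq. (3)

    fixed[r] = True                                # Step (7): fix the weights of neuron r
    return W, fixed, r
```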

2.3. Recall process

In the recall process of the KFMPAM-WD, when the pattern X is given to the Input/Output Layer, the output of the neuron i in the Map Layer, $x_i^{map}$, is calculated by

$$ x_i^{map} = \begin{cases} 1, & i = r \\ 0, & \text{otherwise} \end{cases} \qquad (5) $$

where r is selected randomly from the neurons which satisfy

$$ \frac{1}{N^{in}} \sum_{k \in C} g\bigl(X_k - W_{ik}\bigr) > \theta^{map} \qquad (6) $$

where $\theta^{map}$ is the threshold of the neurons in the Map Layer, and $g(\cdot)$ is given by

$$ g(b) = \begin{cases} 1, & |b| < \theta^{d} \\ 0, & \text{otherwise.} \end{cases} \qquad (7) $$

In the KFMPAM-WD, one of the neurons whose connection weights are similar to the input pattern is selected randomly as the winner neuron. So, the probabilistic association can be realized based on the weights distribution.

When the binary pattern X is given to the Input/Output Layer, the output of the neuron k in the Input/Output Layer, $x_k^{io}$, is given by

$$ x_k^{io} = \begin{cases} 1, & W_{rk} \ge \theta^{bin} \\ 0, & \text{otherwise} \end{cases} \qquad (8) $$

where $\theta^{bin}$ is the threshold of the neurons in the Input/Output Layer.

When the analog pattern X is given to the Input/Output Layer, the output of the neuron k in the Input/Output Layer, $x_k^{io}$, is given by

$$ x_k^{io} = W_{rk}. \qquad (9) $$
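A compact Python sketch of this recall procedure follows; it assumes the probe pattern occupies one known part of the Input/Output Layer (indexed by `in_part`) and uses the thresholds of Table 1 as defaults. The fallback used when no neuron exceeds the Map-Layer threshold is an addition for robustness, not part of the original model.

```python
import numpy as np

def kfmpam_wd_recall(W, x_in, in_part, theta_map=0.75, theta_d=0.004,
                     binary=False, theta_bin=0.5):
    """Hedged sketch of the KFMPAM-WD recall of Eqs. (5)-(9)."""
    n_in = in_part.size
    diff_ok = np.abs(x_in - W[:, in_part]) < theta_d       # g(.) of Eq. (7)
    score = diff_ok.sum(axis=1) / n_in                     # left-hand side of Eq. (6)

    winners = np.where(score > theta_map)[0]               # neurons satisfying Eq. (6)
    if winners.size == 0:
        winners = np.array([score.argmax()])               # fallback, not part of the model
    r = int(np.random.choice(winners))                     # Eq. (5): winner chosen at random

    out = W[r].copy()                                      # Eq. (9): analog output
    if binary:
        out = (out >= theta_bin).astype(float)             # Eq. (8): binary output
    return r, out
```

Because the winner is drawn uniformly from all neurons whose stored pattern matches the probe, patterns stored with larger areas are recalled more often, which is exactly the weights-distribution-based probabilistic association described above.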

3. Improved KFM Probabilistic Associative Memory based on Weights
Distribution

Here, we explain the proposed Improved Kohonen Feature Map Probabilistic Associative Memory based on Weights Distribution (IKFMPAM-WD). The proposed model is based on the conventional Kohonen Feature Map Probabilistic Associative Memory based on Weights Distribution (KFMPAM-WD) [16] described in Section 2.

3.1. Structure

Figure 2 shows the structure of the proposed IKFMPAM-WD. As shown in Fig. 2, the proposed model has two layers: the Input/Output Layer and the Map Layer, and the Input/Output Layer is divided into some parts, similarly to the conventional KFMPAM-WD.

3.2. Learning process

In the learning algorithm of the proposed IKFMPAM-WD, the connection weights are learned as follows:

1. The initial values of the weights are chosen randomly.

2. The Euclidean distance between the learning vector $X^{(p)}$ and the connection weights vector $W_i$, $d(X^{(p)}, W_i)$, is calculated.

3. If $d(X^{(p)}, W_i) > \theta_t^{learn}$ is satisfied for all neurons, the input pattern $X^{(p)}$ is regarded as an unknown pattern. If the input pattern is regarded as a known pattern, go to (8).

4. The neuron which is the center of the learning area, r, is determined by Eq. (1). In Eq. (1), the neuron whose Euclidean distance between its connection weights and the learning vector is minimum is selected among the neurons which can take areas without overlaps to the areas corresponding to the patterns which are already trained. In Eq. (2), $a_i$ and $b_i$ are used as the size of the area for the learning vector.

5. If $d(X^{(p)}, W_r) > \theta_t^{learn}$ is satisfied, the connection weights of the neurons in the ellipse whose center is the neuron r are updated as follows:

$$ W_i(t+1) = \begin{cases} X^{(p)}, & \theta_1^{learn} \le H(\bar{d}_{ri}) \\ W_i(t) + H(\bar{d}_{ri})\bigl(X^{(p)} - W_i(t)\bigr), & \theta_2^{learn} \le H(\bar{d}_{ri}) < \theta_1^{learn} \ \text{and}\ H(\bar{d}_{i^*i}) < \theta_2^{learn} \\ W_i(t), & \text{otherwise} \end{cases} \qquad (10) $$

where $\theta_1^{learn}$ and $\theta_2^{learn}$ are thresholds. $H(\bar{d}_{ri})$ and $H(\bar{d}_{i^*i})$ are given by Eq. (11); these are semi-fixed functions. Especially, $H(\bar{d}_{ri})$ behaves as the neighborhood function. Here, $i^*$ denotes the nearest weight-fixed neuron from the neuron i.

$$ H(\bar{d}_{ij}) = \frac{1}{1 + \exp\!\left(\dfrac{\bar{d}_{ij} - D}{\varepsilon}\right)} \qquad (11) $$

where $\bar{d}_{ij}$ is the normalized radius of the ellipse area whose center is the neuron i for the direction to the neuron j, and is given by

$$ \bar{d}_{ij} = \frac{d_{ij}}{D_{ij}}. \qquad (12) $$

In Eq. (11), D is the constant that decides the neighborhood area size and $\varepsilon$ is the steepness parameter. If there is no weight-fixed neuron,

$$ H(\bar{d}_{i^*i}) = 0 \qquad (13) $$

is used.

6. (5) is iterated until $d(X^{(p)}, W_r) \le \theta_t^{learn}$ is satisfied.

7. The connection weights of the neuron r, $W_r$, are fixed.

8. (1)–(7) are iterated when a new pattern set is given.
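The following Python sketch shows how the semi-fixed neighborhood function of Eq. (11) and the three-case update of Eq. (10) could be realized; the distances, radii, and array shapes are illustrative assumptions, and the Table 1 values are used as defaults.

```python
import numpy as np

def neighborhood(d_norm, D=3.0, eps=0.91):
    """Semi-fixed neighborhood function H of Eq. (11), with the Table 1
    defaults for the area size D and steepness eps."""
    return 1.0 / (1.0 + np.exp((d_norm - D) / eps))

def ikfmpam_wd_update(W, x, d_to_r, D_r, d_to_fixed, D_fixed, th1=0.9, th2=0.1):
    """One application of the IKFMPAM-WD update of Eq. (10).  d_to_r / D_r:
    distance and area radius toward the center neuron r; d_to_fixed /
    D_fixed: the same toward the nearest weight-fixed neuron i*.  If no
    weight-fixed neuron exists, pass d_to_fixed = np.inf so that H = 0,
    as in Eq. (13).  Illustrative sketch only."""
    h_r = neighborhood(d_to_r / D_r)                       # H(d_ri bar)
    h_fix = neighborhood(d_to_fixed / D_fixed)             # H(d_i*i bar)

    full = th1 <= h_r                                      # case 1: copy the training vector
    partial = (th2 <= h_r) & (h_r < th1) & (h_fix < th2)   # case 2: neighborhood learning

    W_new = W.copy()
    W_new[full] = x
    W_new[partial] += h_r[partial, None] * (x - W[partial])
    return W_new
```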



Figure 2. Structure of proposed IKFMPAM-WD.

3.3. Recall process

The recall process of the proposed IKFMPAM-WD is the same as that of the conventional KFMPAM-WD described in Section 2.3.

4. Computer experiment results

Here, we show the computer experiment results to demonstrate the effectiveness of the proposed IKFMPAM-WD.

4.1. Experimental conditions

Table 1 shows the experimental conditions used in the experiments of Sections 4.2 ∼ 4.6.

4.2. Association results

4.2.1. Binary patterns

In this experiment, the binary patterns including one-to-many relations shown in Fig. 3 were memorized in the network composed of neurons in the Input/Output Layer and neurons in the Map Layer. Figure 4 shows a part of the association result when "crow" was given to the Input/Output Layer. As shown in Fig. 4, when "crow" was given to the network, "mouse", "monkey" and "lion" were recalled. Figure 5 shows a part of the association result when "duck" was given to the Input/Output Layer. In this case, "dog", "cat" and "penguin" were recalled. From these results, we can confirm that the proposed model can recall binary patterns including one-to-many relations.

Parameters for Learning

Threshold for Learning                                    θ_t^learn    10^-4
Neighborhood Area Size                                    D            3
Steepness Parameter in Neighborhood Function              ε            0.91
Threshold of Neighborhood Function (1)                    θ_1^learn    0.9
Threshold of Neighborhood Function (2)                    θ_2^learn    0.1

Parameters for Recall (Common)

Threshold of Neurons in Map Layer                         θ^map        0.75
Threshold of Difference between Weight Vector
and Input Vector                                          θ^d          0.004

Parameter for Recall (Binary)

Threshold of Neurons in Input/Output Layer                θ^bin        0.5

Table 1. Experimental Conditions.

Figure 3. Training Patterns including One-to-Many Relations (Binary Pattern).



Figure 4. One-to-Many Associations for Binary Patterns (When "crow" was Given).

Figure 5. One-to-Many Associations for Binary Patterns (When "duck" was Given).

Figure 6 shows the Map Layer after the pattern pairs shown in Fig. 3 were memorized. In Fig. 6, red neurons show the center neuron in each area, blue neurons show the neurons in the areas for the patterns including "crow", and green neurons show the neurons in the areas for the patterns including "duck". As shown in Fig. 6, the proposed model can learn each learning pattern with an area of a different size. Moreover, since the connection weights are updated not only in the area but also in the neighborhood area in the proposed model, the areas corresponding to the pattern pairs including "crow"/"duck" are arranged near each other.

Learning Pattern    Long Radius a_i    Short Radius b_i


crow – lion         2.5                1.5

crow – monkey       3.5                2.0

crow – mouse        4.0                2.5

duck – penguin      2.5                1.5

duck – dog          3.5                2.0

duck – cat          4.0                2.5

Table 2. Area Size corresponding to Patterns in Fig. 3.

Figure 6. Area Representation for Learning Pattern in Fig. 3.

Input Pattern    Output Pattern    Area Size    Recall Times


crow             lion              11 (1.0)     43 (1.0)

                 monkey            23 (2.1)     87 (2.0)

                 mouse             33 (3.0)     120 (2.8)

duck             penguin           11 (1.0)     39 (1.0)

                 dog               23 (2.1)     79 (2.0)

                 cat               33 (3.0)     132 (3.4)

Table 3. Recall Times for Binary Pattern corresponding to "crow" and "duck".

Table 3 shows the recall times of each pattern in the trials of Fig. 4 and Fig. 5. In this table, normalized values are also shown in parentheses. From these results, we can confirm that the proposed model can realize probabilistic associations based on the weights distribution.
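Since the winner neuron is drawn uniformly from the neurons whose weights match the input (Eq. (6)), the expected recall frequency of each output pattern should be roughly proportional to its area size. As a rough consistency check (not a computation reported in the chapter): with the areas 11, 23 and 33 of Table 3, the expected proportions are 11/67 ≈ 0.16, 23/67 ≈ 0.34 and 33/67 ≈ 0.49, i.e. ratios of about 1 : 2.1 : 3.0, which agree well with the normalized recall times 1.0 : 2.0 : 2.8 observed for "crow".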

4.2.2. Analog patterns

In this experiment, the analog patterns including one-to-many relations shown in Fig. 7 were memorized in the network composed of neurons in the Input/Output Layer and neurons in the Map Layer. Figure 8 shows a part of the association result when "bear" was given to the Input/Output Layer. As shown in Fig. 8, when "bear" was given to the network, "lion", "raccoon dog" and "penguin" were recalled. Figure 9 shows a part of the association result when "mouse" was given to the Input/Output Layer. In this case, "monkey", "hen" and "chick" were recalled. From these results, we can confirm that the proposed model can recall analog patterns including one-to-many relations.

Figure 7. Training Patterns including One-to-Many Relations (Analog Pattern).

Figure 8. One-to-Many Associations for Analog Patterns (When "bear" was Given).

Figure 9. One-to-Many Associations for Analog Patterns (When "mouse" was Given).

Learning Pattern      Long Radius a_i    Short Radius b_i


bear – lion           2.5                1.5

bear – raccoon dog    3.5                2.0

bear – penguin        4.0                2.5

mouse – chick         2.5                1.5

mouse – hen           3.5                2.0

mouse – monkey        4.0                2.5

Table 4. Area Size corresponding to Patterns in Fig. 7.

Figure 10. Area Representation for Learning Pattern in Fig. 7.



Input Pattern    Output Pattern    Area Size    Recall Times


bear             lion              11 (1.0)     40 (1.0)

                 raccoon dog       23 (2.1)     90 (2.3)

                 penguin           33 (3.0)     120 (3.0)

mouse            chick             11 (1.0)     38 (1.0)

                 hen               23 (2.1)     94 (2.5)

                 monkey            33 (3.0)     118 (3.1)

Table 5. Recall Times for Analog Pattern corresponding to "bear" and "mouse".

Figure 10 shows the Map Layer after the pattern pairs shown in Fig. 7 were memorized. In Fig. 10, red neurons show the center neuron in each area, blue neurons show the neurons in the areas for the patterns including "bear", and green neurons show the neurons in the areas for the patterns including "mouse". As shown in Fig. 10, the proposed model can learn each learning pattern with an area of a different size.

Table 5 shows the recall times of each pattern in the trials of Fig. 8 and Fig. 9. In this table, normalized values are also shown in parentheses. From these results, we can confirm that the proposed model can realize probabilistic associations based on the weights distribution.

Figure 11. Storage Capacity of Proposed Model (Binary Patterns).



Figure 12. Storage Capacity of Proposed Model (Analog Patterns).

4.3. Storage capacity

Here, we examined the storage capacity of the proposed model. Figures 11 and 12 show the storage capacity of the proposed model. In this experiment, we used networks composed of neurons in the Input/Output Layer and two different numbers of neurons in the Map Layer, and one-to-P random pattern pairs were memorized, each with the same fixed area size ($a_i$ and $b_i$). Figures 11 and 12 show the average over the trials, and the storage capacities of the conventional model [16] are also shown for reference in Figs. 13 and 14. From these results, we can confirm that the storage capacity of the proposed model is almost the same as that of the conventional model [16]. As shown in Figs. 11 and 12, the storage capacity of the proposed model does not depend on whether the patterns are binary or analog, and it does not depend on P in the one-to-P relations; it depends on the number of neurons in the Map Layer.

4.4. Robustness for noisy input

4.4.1. Association result for noisy input

Figures 15 and 16 show parts of the association results of the proposed model when the patterns "cat" and "crow" were given with noise. As shown in these figures, the proposed model can recall the correct patterns even when a noisy input is given.

Figure 13. Storage Capacity of Conventional Model [16] (Binary Patterns).

Figure 14. Storage Capacity of Conventional Model [16] (Analog Patterns).

Figure 15. Association Result for Noisy Input (When "crow" was Given).

Figure 16. Association Result for Noisy Input (When "duck" was Given).

Figure 17. Robustness for Noisy Input (Binary Patterns).

Figure 18. Robustness for Noisy Input (Analog Patterns).



4.4.2. Robustness for noisy input

Figures 17 and 18 show the robustness for noisy input of the proposed model. In this experiment, random patterns in one-to-one relations were memorized in the network composed of neurons in the Input/Output Layer and neurons in the Map Layer. Figures 17 and 18 are averages over the trials. As shown in these figures, the proposed model has robustness for noisy input similar to that of the conventional model [16].

4.5. Robustness for damaged neurons

4.5.1. Association result when some neurons in the map layer are damaged

Figure 19 shows a part of the association result of the proposed model when the pattern "bear" was given. Figure 20 shows a part of the association result of the proposed model when the pattern "mouse" was given. In these experiments, a network in which part of the neurons in the Map Layer are damaged was used. As shown in these figures, the proposed model can recall correct patterns even when some neurons in the Map Layer are damaged.

4.5.2. Robustness for damaged neurons

Figures 21 and 22 show the robustness of the proposed model when the winner neurons are damaged. In this experiment, random patterns in one-to-one relations were memorized in the network composed of neurons in the Input/Output Layer and neurons in the Map Layer. Figures 21 and 22 are averages over the trials. As shown in these figures, the proposed model has robustness when the winner neurons are damaged, similar to the conventional model [16].

Figure 19. Association Result for Damaged Neurons (When "bear" was Given).

Figure 20. Association Result for Damaged Neurons (When "mouse" was Given).

Figure 21. Robustness of Damaged Winner Neurons (Binary Patterns).

Figure 22. Robustness of Damaged Winner Neurons (Analog Patterns).



Figure 23. Robustness for Damaged Neurons (Binary Patterns).

Figure 24. Robustness for Damaged Neurons (Analog Patterns).

Figures 23 and 24 show the robustness for damaged neurons in the proposed model. In this experiment, random patterns in one-to-one relations were memorized in the network composed of neurons in the Input/Output Layer and neurons in the Map Layer. Figures 23 and 24 are averages over the trials. As shown in these figures, the proposed model has robustness for damaged neurons similar to that of the conventional model [16].

4.6. Learning speed

Here, we examined the learning speed of the proposed model. In this experiment, random patterns were memorized in the network composed of neurons in the Input/Output Layer and neurons in the Map Layer. Table 6 shows the learning time of the proposed model and the conventional model [16]. These results are averages over the trials on a personal computer (Intel Pentium, FreeBSD, gcc). As shown in Table 6, the learning time of the proposed model is shorter than that of the conventional model.

5. Conclusions

In this paper, we have proposed the Improved Kohonen Feature Map Probabilistic Associative Memory based on Weights Distribution. This model is based on the conventional Kohonen Feature Map Probabilistic Associative Memory based on Weights Distribution. The proposed model can realize probabilistic association for the training set including one-to-many relations. Moreover, this model has enough robustness for noisy input and damaged neurons. We carried out a series of computer experiments and confirmed the effectiveness of the proposed model.

                                              Learning Time (seconds)


Proposed Model (Binary Patterns)              0.87

Proposed Model (Analog Patterns)              0.92

Conventional Model [16] (Binary Patterns)     1.01

Conventional Model [16] (Analog Patterns)     1.34

Table 6. Learning Speed.

Author details

Shingo Noguchi and Osana Yuko*

*Address all correspondence to: osana@cs.teu.ac.jp

Tokyo University of Technology, Japan

References

[1] Rumelhart, D. E., McClelland, J. L., & the PDP Research Group. Parallel Distributed Processing, Exploitations in the Microstructure of Cognition, Vol. 1: Foundations. The MIT Press.

[2] Kohonen, T. Self-Organizing Maps. Springer.

[3] Hopfield, J. J. Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences USA.

[4] Kosko, B. Bidirectional associative memories. IEEE Transactions on Neural Networks.

[5] Carpenter, G. A., & Grossberg, S. Pattern Recognition by Self-organizing Neural Networks. The MIT Press.

[6] Watanabe, M., Aihara, K., & Kondo, S. Automatic learning in chaotic neural networks. IEICE-A (in Japanese).

[7] Arai, T., & Osana, Y. Hetero chaotic associative memory for successive learning with give up function -- One-to-many associations --. Proceedings of IASTED Artificial Intelligence and Applications, Innsbruck.

[8] Ando, M., Okuno, Y., & Osana, Y. Hetero chaotic associative memory for successive learning with multi-winners competition. Proceedings of IEEE and INNS International Joint Conference on Neural Networks, Vancouver.

[9] Ichiki, H., Hagiwara, M., & Nakagawa, M. Kohonen feature maps as a supervised learning machine. Proceedings of IEEE International Conference on Neural Networks.

[10] Yamada, T., Hattori, M., Morisawa, M., & Ito, H. Sequential learning for associative memory using Kohonen feature map. Proceedings of IEEE and INNS International Joint Conference on Neural Networks, Washington D.C.

[11] Hattori, M., Arisumi, H., & Ito, H. Sequential learning for SOM associative memory with map reconstruction. Proceedings of International Conference on Artificial Neural Networks, Vienna.

[12] Sakurai, N., Hattori, M., & Ito, H. SOM associative memory for temporal sequences. Proceedings of IEEE and INNS International Joint Conference on Neural Networks, Honolulu.

[13] Abe, H., & Osana, Y. Kohonen feature map associative memory with area representation. Proceedings of IASTED Artificial Intelligence and Applications, Innsbruck.

[14] Ikeda, N., & Hagiwara, M. A proposal of novel knowledge representation (Area representation) and the implementation by neural network. International Conference on Computational Intelligence and Neuroscience, III.

[15] Imabayashi, T., & Osana, Y. Implementation of association of one-to-many associations and the analog pattern in Kohonen feature map associative memory with area representation. Proceedings of IASTED Artificial Intelligence and Applications, Innsbruck.

[16] Koike, M., & Osana, Y. Kohonen feature map probabilistic associative memory based on weights distribution. Proceedings of IASTED Artificial Intelligence and Applications, Innsbruck.
Chapter 2

Biologically Plausible Artificial Neural Networks

João Luís Garcia Rosa

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/54177

1. Introduction
Artificial Neural Networks (ANNs) are based on an abstract and simplified view of the
neuron. Artificial neurons are connected and arranged in layers to form large networks,
where learning and connections determine the network function. Connections can be formed
through learning and do not need to be ’programmed.’ Recent ANN models lack many
physiological properties of the neuron, because they are more oriented to computational
performance than to biological credibility [41].
According to the fifth edition of Gordon Shepherd book, The Synaptic Organization of the Brain,
“information processing depends not only on anatomical substrates of synaptic circuits, but
also on the electrophysiological properties of neurons” [51]. In the literature of dynamical
systems, it is widely believed that knowing the electrical currents of nerve cells is sufficient
to determine what the cell is doing and why. Indeed, this somewhat contradicts the
observation that cells that have similar currents may exhibit different behaviors. But in
the neuroscience community, this fact was ignored until recently when the difference in
behavior was shown to be due to different mechanisms of excitability bifurcation [35].
A bifurcation of a dynamical system is a qualitative change in its dynamics produced by
varying parameters [19].
The type of bifurcation determines the most fundamental computational properties of
neurons, such as the class of excitability, the existence or nonexistence of the activation
threshold, all-or-none action potentials (spikes), sub-threshold oscillations, bi-stability of rest
and spiking states, whether the neuron is an integrator or resonator etc. [25].
A biologically inspired connectionist approach should present a neurophysiologically
motivated training algorithm, a bi-directional connectionist architecture, and several other
features, e. g., distributed representations.


1.1. McCulloch-Pitts neuron


The McCulloch-Pitts neuron (1943) was the first mathematical model [32]. Its properties:

• neuron activity is an "all-or-none" process;


• a certain fixed number of synapses are excited within a latent addition period in order to
excite a neuron: independent of previous activity and of neuron position;
• synaptic delay is the only significant delay in the nervous system;
• activity of any inhibitory synapse prevents neuron from firing;
• network structure does not change along time.

The McCulloch-Pitts neuron represents a simplified mathematical model for the neuron,
where xi is the i-th binary input and wi is the synaptic (connection) weight associated with
the input xi . The computation occurs in soma (cell body). For a neuron with p inputs:

$$ a = \sum_{i=1}^{p} x_i w_i \qquad (1) $$

with x0 = 1 and w0 = β = −θ. β is the bias and θ is the activation threshold. See figures 1
and 2. There are p binary inputs in the schema of figure 2. Xi is the i-th input, Wi is the
connection (synaptic) weight associated with input i. The synaptic weights are real numbers,
because the synapses can inhibit (negative signal) or excite (positive signal) and have different
intensities. The weighted inputs (Xi × Wi ) are summed in the cell body, providing a signal a.
After that, the signal a is input to an activation function ( f ), giving the neuron output.

Figure 1. The typical neuron. Figure 2. The neuron model.

The activation function can be: (1) hard limiter, (2) threshold logic, and (3) sigmoid, which is
considered the biologically more plausible activation function.
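As a minimal illustration of Eq. (1) and the three activation functions just listed, the following Python sketch computes the output of a single McCulloch-Pitts-style neuron; the function names and the AND example are illustrative, not taken from the chapter.

```python
import numpy as np

def mp_neuron(x, w, bias, activation="hard"):
    """a = sum_i x_i w_i with x_0 = 1 and w_0 = beta = -theta (Eq. (1)),
    followed by one of the activation functions mentioned in the text."""
    a = bias + np.dot(x, w)                     # weighted sum plus bias
    if activation == "hard":                    # (1) hard limiter
        return 1.0 if a >= 0 else 0.0
    if activation == "threshold":               # (2) threshold logic (piecewise linear)
        return float(np.clip(a, 0.0, 1.0))
    return 1.0 / (1.0 + np.exp(-a))             # (3) sigmoid

# A 2-input neuron computing logical AND with a hard limiter:
print(mp_neuron(np.array([1, 1]), np.array([1.0, 1.0]), bias=-1.5))  # 1.0
print(mp_neuron(np.array([1, 0]), np.array([1.0, 1.0]), bias=-1.5))  # 0.0
```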

1.2. The perceptron


Rosenblatt’s perceptron [47] takes a weighted sum of neuron inputs, and sends output 1
(spike) if this sum is greater than the activation threshold. It is a linear discriminator: given
2 points, a straight line is able to discriminate them. For some configurations of m points, a
straight line is able to separate them in two classes (figures 3 and 4).

Figure 3. Set of linearly separable points. Figure 4. Set of non-linearly separable points.

The limitations of the perceptron are that it is a one-layer feed-forward network (non-recurrent); it is only capable of learning solutions of linearly separable problems; and its learning algorithm (delta rule) does not work with networks of more than one layer.
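A minimal sketch of perceptron training with the delta rule mentioned above, on a linearly separable problem (logical OR); names and constants are illustrative, not taken from the chapter.

```python
import numpy as np

def train_perceptron(X, y, epochs=50, lr=0.1):
    """Rosenblatt-style perceptron trained with the delta rule on
    two classes labeled 0/1.  Illustrative sketch, not the book's code."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            out = 1.0 if np.dot(w, xi) + b > 0 else 0.0   # spike if above threshold
            w += lr * (target - out) * xi                  # delta rule update
            b += lr * (target - out)
    return w, b

# OR is linearly separable, so a single perceptron can learn it:
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 1], dtype=float)
w, b = train_perceptron(X, y)
```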

1.3. Neural network topology


In cerebral cortex, neurons are disposed in columns, and most synapses occur between
different columns. See the famous drawing by Ramón y Cajal (figure 5). In the extremely
simplified mathematical model, neurons are disposed in layers (representing columns), and
there is communication between neurons in different layers (see figure 6).

Figure 5. Drawing by Santiago Ramón y Cajal of neurons in the pigeon cerebellum. (A) denotes Purkinje cells, an example of a
multipolar neuron, while (B) denotes granule cells, which are also multipolar [57].

Figure 6. A 3-layer neural network. Notice that there are A + 1 input units, B + 1 hidden units, and C output units. w1 and
w2 are the synaptic weight matrices between input and hidden layers and between hidden and output layers, respectively. The
“extra” neurons in input and hidden layers, labeled 1, represent the presence of bias: the ability of the network to fire even in
the absence of input signal.

1.4. Classical ANN models


Classical artificial neural networks models are based upon a simple description of the
neuron, taking into account the presence of presynaptic cells and their synaptic potentials,
the activation threshold, and the propagation of an action potential. So, they represent
impoverished explanation of human brain characteristics.
As advantages, we may say that ANNs are naturally parallel solutions, robust, and fault tolerant; they allow integration of information from different sources or kinds; they are adaptive systems, that is, capable of learning; they show a certain degree of autonomy in learning; and they display a very fast recognizing performance.
There are also many limitations of ANNs. Among them, it is still very hard to explain their behavior, because of lack of transparency; their solutions do not scale well; they are computationally expensive for big problems; and they are still very far from biological reality.
ANNs do not focus on real neuron details. The conductivity delays are neglected. The output
signal is either discrete (e.g., 0 or 1) or a real number (e.g., between 0 and 1). The network
input is calculated as the weighted sum of input signals, and it is transformed in an output
signal via a simple function (e.g., a threshold function). See the main differences between the
biological neural system and the conventional computer on table 1.
Andy Clark proposes three types of connectionism [2]: (1) the first-generation consisting
of perceptron and cybernetics of the 1950s. They are simple neural structures of limited
applications [30]; (2) the second generation deals with complex dynamics with recurrent
networks in order to deal with spatio-temporal events; (3) the third generation takes into
account more complex dynamic and time properties. For the first time, these systems use
biological inspired modular architectures and algorithms. We may add a fourth type: a
network which considers populations of neurons instead of individual ones and the existence
of chaotic oscillations, perceived by electroencephalogram (EEG) analysis. The K-models are
examples of this category [30].

                          Von Neumann computer                 Biological neural system


Processor                 Complex                              Simple
                          High speed                           Low speed
                          One or few                           A large number
Memory                    Separated from processor             Integrated with processor
                          Localized                            Distributed
                          Non-content addressable              Content addressable
Computing                 Centralized                          Distributed
                          Sequential                           Parallel
                          Stored programs                      Self-learning
Reliability               Very vulnerable                      Robust
Expertise                 Numeric and symbolic manipulations   Perceptual problems
Operational environment   Well-defined, well-constrained       Poorly defined, unconstrained

Table 1. Von Neumann's computer versus biological neural system [26].

1.5. Learning
The Canadian psychologist Donald Hebb established the bases for current connectionist
learning algorithms: “When an axon of cell A is near enough to excite a cell B and repeatedly
or persistently takes part in firing it, some growth process or metabolic change takes place in
one or both cells such that A’s efficiency, as one of the cells firing B, is increased” [21]. Also,
the word “connectionism” appeared for the first time: “The theory is evidently a form of
connectionism, one of the switchboard variety, though it does not deal in direct connections
between afferent and efferent pathways: not an ’S-R’ psychology, if R means a muscular
response. The connections serve rather to establish autonomous central activities, which
then are the basis of further learning” [21].
According to Hebb, knowledge is revealed by associations, that is, the plasticity in Central
Nervous System (CNS) allows synapses to be created and destroyed. Synaptic weights
change values, therefore allow learning, which can be through internal self-organizing:
encoding of new knowledge and reinforcement of existent knowledge. How to supply a
neural substrate to association learning among world facts? Hebb proposed a hypothesis:
connections between two nodes highly activated at the same time are reinforced. This kind of
rule is a formalization of the associationist psychology, in which associations are accumulated
among things that happen together. This hypothesis permits to model the CNS plasticity,
adapting it to environmental changes, through excitatory and inhibitory strength of existing
synapses, and its topology. This way, it allows that a connectionist network learns correlation
among facts.
Connectionist networks learn through synaptic weight change, in most cases: it reveals
statistical correlations from the environment. Learning may happen also through network
topology change (in a few models). This is a case of probabilistic reasoning without a
statistical model of the problem. Basically, two learning methods are possible with Hebbian
learning: unsupervised learning and supervised learning. In unsupervised learning there is
no teacher, so the network tries to find out regularities in the input patterns. In supervised
learning, the input is associated with the output. If they are equal, learning is called
auto-associative; if they are different, hetero-associative.
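As a toy illustration of the Hebbian principle described above (connections between units that are highly active at the same time are reinforced), the following sketch strengthens the weights between simultaneously active units; the rule and names are a generic simplification, not a specific model from the chapter.

```python
import numpy as np

def hebbian_update(W, pre, post, lr=0.01):
    """Plain Hebbian step: delta w_ij = lr * post_i * pre_j, so weights grow
    between units that are active at the same time."""
    return W + lr * np.outer(post, pre)

# Repeatedly pairing two activity patterns builds a weight matrix that
# associates them (hetero-associative here, since pre and post differ).
pre = np.array([1.0, 0.0, 1.0])
post = np.array([0.0, 1.0, 1.0])
W = np.zeros((3, 3))
for _ in range(10):
    W = hebbian_update(W, pre, post)
```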

1.6. Back-propagation
Back-propagation (BP) is a supervised algorithm for multilayer networks. It applies the
generalized delta rule, requiring two passes of computation: (1) activation propagation
(forward pass), and (2) error back propagation (backward pass). Back-propagation works
in the following way: it propagates the activation from input to hidden layer, and from
hidden to output layer; calculates the error for output units, then back propagates the error
to hidden units and then to input units.
BP has a universal approximation power, that is, given a continuous function, there is a two-layer network (one hidden layer) that can be trained by Back-propagation in order to approximate this function as closely as desired. Besides, it is the most used algorithm.
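The two passes described above can be written compactly; the sketch below performs one forward and one backward pass of the generalized delta rule on a two-layer (one hidden layer) network with sigmoid units. Biases are omitted and all names are illustrative assumptions, not the book's code.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def backprop_step(x, target, W1, W2, lr=0.5):
    """One activation propagation (forward pass) followed by one error
    back propagation (backward pass) with the generalized delta rule."""
    h = sigmoid(W1 @ x)                      # input -> hidden
    o = sigmoid(W2 @ h)                      # hidden -> output
    delta_o = (o - target) * o * (1 - o)     # output error term
    delta_h = (W2.T @ delta_o) * h * (1 - h) # error propagated back to hidden units
    W2 -= lr * np.outer(delta_o, h)          # weight updates
    W1 -= lr * np.outer(delta_h, x)
    return W1, W2, o
```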
Although Back-propagation is a well-known and the most used connectionist training algorithm, it is computationally expensive (slow), it does not satisfactorily solve big-size problems, and
sometimes, the solution found is a local minimum - a locally minimum value for the error
function.
BP is based on the error back propagation: while stimulus propagates forwardly, the error
(difference between the actual and the desired outputs) propagates backwardly. In the
cerebral cortex, the stimulus generated when a neuron fires crosses the axon towards its end
in order to make a synapse onto another neuron input. Suppose that BP occurs in the brain;
in this case, the error must have to propagate back from the dendrite of the postsynaptic
neuron to the axon and then to the dendrite of the presynaptic neuron. It sounds unrealistic
and improbable. Synaptic “weights” have to be modified in order to make learning possible,
but certainly not in the way BP does. Weight change must use only local information in the
synapse where it occurs. That’s why BP seems to be so biologically implausible.

2. Dynamical systems
Neurons may be treated as dynamical systems, as the main result of Hodgkin-Huxley
model [23]. A dynamical system consists of a set of variables that describe its state and
a law that describes the evolution of state variables with time [25]. The Hodgkin-Huxley
model is a dynamical system of four dimensions, because its state is determined solely
by the membrane potential V and the variables n, m and h for the opening (activation) and closing (deactivation)
of ion channels for persistent K+ and transient Na+ currents [1, 27, 28]. The law
of evolution is given by a four-dimensional system of ordinary differential equations (ODE).
Principles of neurodynamics describe the basis for the development of biologically plausible
models of cognition [30].
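To make the notion of "state variables plus a law of evolution" concrete, the short sketch below integrates a two-variable caricature of neuronal dynamics (the FitzHugh-Nagumo reduction) with Euler steps; it is only an illustration of a dynamical system, not the four-dimensional Hodgkin-Huxley model itself.

```python
import numpy as np

def simulate_fhn(I=0.5, dt=0.01, steps=5000):
    """Euler integration of the FitzHugh-Nagumo equations, a reduced
    two-variable neuron model used here only to illustrate state variables
    evolving under ordinary differential equations."""
    v, w = -1.0, 1.0                      # state: potential-like v, recovery w
    trace = np.empty(steps)
    for k in range(steps):
        dv = v - v ** 3 / 3.0 - w + I     # fast "membrane potential" equation
        dw = 0.08 * (v + 0.7 - 0.8 * w)   # slow recovery equation
        v, w = v + dt * dv, w + dt * dw   # one Euler step of the evolution law
        trace[k] = v
    return trace                          # periodic spiking for sufficiently large I
```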
All variables that describe the neuronal dynamics can be classified into four classes according
to their function and time scale [25]:

1. Membrane potential.
2. Excitation variables, such as activation of a Na+ current. They are responsible for lifting
the action potential.
3. Recovery variables, such as the inactivation of a current Na+ and activation of a rapid
current K + . They are responsible for re-polarization (lowering) of the action potential.

4. Adaptation variables, such as the activation of low voltage or current dependent on Ca2+ .
They build prolonged action potentials and can affect the excitability over time.

2.1. The neurons are different


The currents define the type of neuronal dynamical system [20]. There are millions of
different electrophysiological spike generation mechanisms. Axons are filaments (there are
72 km of fiber in the brain) that can reach from 100 microns (typical granule cell), up to
4.5 meters (giraffe primary afferent). And communication via spikes may be stereotypical
(common pyramidal cells), or no communication at all (horizontal cells of the retina). The
speed of the action potential (spike) ranges from 2 to 400 km/h. The number of input connections ranges
from 500 (retinal ganglion cells) to 200,000 (Purkinje cells). Among the roughly 100 billion neurons in
the human brain, there are hundreds of thousands of different types of neurons and at least
one hundred neurotransmitters. Each neuron makes on average 1,000 synapses on other
neurons [8].

2.2. Phase portraits


The power of dynamical systems approach to neuroscience is that we can say many things
about a system without knowing all the details that govern its evolution.
Consider a quiescent neuron whose membrane potential is at rest. Since there are no
changes in its state variables, it is at an equilibrium point. All incoming currents that
depolarize the neuron are balanced or equilibrated by hyper-polarization output currents:
stable equilibrium (figure 7(a) - top). Depending on the starting point, the system may have
many trajectories, as those shown at the bottom of the figure 7. One can imagine the time
along each trajectory. All of them are attracted to the equilibrium state denoted by the black
dot, called attractor [25]. It is possible to predict the itinerant behavior of neurons through
observation [10].
Regarding Freeman’s neurodynamics (see section 2.5) the most useful state variables are
derived from electrical potentials generated by a neuron. Their recordings allow the
definition of one state variable for axons and another one for dendrites, which are very
different. The axon expresses its state in frequency of action potentials (pulse rate), and
dendrite expresses in intensity of its synaptic current (wave amplitude) [10].
The description of the dynamics can be obtained from a study of system phase portraits,
which shows certain special trajectories (equilibria, separatrices, limit cycles) that determine
the behavior of all other topological trajectory through the phase space.
The excitability is illustrated in figure 7(b). When the neuron is at rest (phase portrait = stable
equilibrium), small perturbations, such as A, result in small excursions from equilibrium,
denoted by PSP (post-synaptic potential). Major disturbances, such as B, are amplified by
the intrinsic dynamics of neuron and result in the response of the action potential.
If a current strong enough is injected into the neuron, it will be brought to a pacemaker mode,
which displays periodic spiking activity (figure 7(c)): this state is called the stable limit cycle,
or stable periodic orbit. The details of the electrophysiological neuron only determine the
position, shape and period of limit cycle.

Figure 7. The neuron states: rest (a), excitable (b), and activity of periodic spiking (c). At the bottom, we see the trajectories
of the system, depending on the starting point. Figure taken from [25], available at http://www.izhikevich.org/publications/dsn.
pdf.

2.3. Bifurcations
Apparently, there is an injected current that corresponds to the transition from rest to
continuous spiking, i.e. from the phase portrait of figure 7(b) to 7(c). From the point of view
of dynamical systems, the transition corresponds to a bifurcation of the neuron dynamics, that is,
a qualitative change of the phase portrait of the system.
In general, neurons are excitable because they are close to bifurcations from rest to spiking
activity. The type of bifurcation depends on the electrophysiology of the neuron and
determines its excitable properties. Interestingly, although there are millions of different
electrophysiological mechanisms of excitability and spiking, there are only four different
types of bifurcation of equilibrium that a system can provide. One can understand the
properties of excitable neurons, whose currents were not measured and whose models are
not known, since one can identify experimentally in which of the four bifurcations undergoes
the rest state of the neuron [25].
The four bifurcations are shown in figure 8: saddle-node bifurcation, saddle-node on
invariant circle, sub-critical Andronov-Hopf and supercritical Andronov-Hopf. In saddle-node
bifurcation, when the magnitude of the injected current or other parameter of the bifurcation
changes, a stable equilibrium correspondent to the rest state (black circle) is approximated by
an unstable equilibrium (white circle). In saddle-node bifurcation on invariant circle, there is an
invariant circle at the time of bifurcation, which becomes a limit cycle attractor. In sub-critical
Andronov-Hopf bifurcation, a small unstable limit cycle shrinks to an equilibrium state and
loses stability. Thus the trajectory deviates from equilibrium and approaches a limit cycle of
high amplitude spiking or some other attractor. In the supercritical Andronov-Hopf bifurcation,
the equilibrium state loses stability and gives rise to a small amplitude limit cycle attractor.

When the magnitude of the injected current increases, the limit cycle amplitude increases
and becomes a complete spiking limit cycle [25].

Figure 8. Geometry of phase portraits of excitable systems near the four bifurcations can exemplify many neurocomputational
properties. Figure taken from [25], available at http://www.izhikevich.org/publications/dsn.pdf.

Systems with Andronov-Hopf bifurcations, either sub-critical or supercritical, exhibit low


amplitude membrane potential oscillations, while systems with saddle bifurcations, both
without and with invariant circle, do not. The existence of small amplitude oscillations
creates the possibility of resonance to the frequency of the incoming pulses [25].

2.4. Integrators and resonators


Resonators are neurons with reduced amplitude sub-threshold oscillations, and those which
do not have this property are integrators. Neurons that exhibit co-existence of rest and spiking
states, are called bistable and those which do not exhibit this feature are monostable. See
table 2.

2.4.1. Neurocomputational properties


Inhibition prevents spiking in integrators, but promotes it in resonators. The excitatory
inputs push the state of the system towards the shaded region of figure 8, while the inhibitory
inputs push it out. In resonators, both excitation and inhibition push the state toward the
shaded region [25].

                            co-existence of rest and spiking states
sub-threshold oscillations  yes (bistable)                no (monostable)

no (integrator)             saddle-node                   saddle-node on invariant circle
yes (resonator)             sub-critical Andronov-Hopf    supercritical Andronov-Hopf

Table 2. Neuron classification in integrators-resonators/monostable-bistable, according to the rest state bifurcation. Adapted
from [25].

2.5. Freeman neurodynamics


Nowadays, two very different concepts co-exist in neuroscience regarding the way the
brain operates as a whole [55]: (1) classical model, where the brain is described as consisting
of a series of causal chains composed of nerve nets that operate in parallel (the conventional
artificial neural networks [20]); (2) neurodynamical model, where the brain operates by
non-linear dynamical chaos, which looks like noise but presents a kind of hidden order [10].
According to Freeman [10], in order to understand brain functioning, a foundation must be
laid including brain imaging and non-linear brain dynamics, fields that digital computers
make possible. Brain imaging is performed during normal behavior activity, and non-linear
dynamics models these data.
In a dynamicist view, actions and choices made are responsible for creation of meanings in
brains, and they are different from representations. Representations exist only in the world
and have no meanings. The relation of neurons to meaning is not still well understood. In
Freeman’s opinion, although representations can be transferred between machines, meaning
cannot be transferred between brains [10]. Brain activity is directed toward external objects,
leading to creation of meaning through learning. Neuron populations are the key to
understand the biology of intentionality.
Freeman argues that there are two basic units in brain organization: the neuron and
the neuron population. Although neuron has been the base for neurobiology, masses of
interacting neurons forming neuron populations are considered for a macroscopic view of
the brain. Like neurons, neuron populations also have states and activity patterns, but they
do (different) macroscopic things. Between the microscopic neuron and these macroscopic
things, there are mesoscopic populations [10].
Neurobiologists usually claim that brains process information in a cause-and-effect manner: stimuli carry information that is conveyed as transformed information. But what if stimuli are selected before they appear? In that case this view fails. This traditional view allowed the development of information-processing machines. This simplified, or even mistaken, view of neuronal workings led to the development of digital computers. Artificial Intelligence artifacts pose a challenge: how to attach meaning to the symbolic representations in machines?
Pragmatists conceive minds as dynamical systems resulting from actions in the world. How are these actions generated? According to a cognitivist view, an action is determined by the form of a stimulus. For materialists and cognitivists, intentional action is composed of space-time processes called short-term memory or cognitive maps. In the pragmatist view there is no temporary storage of images and no representational map.
The neurons in the brain form dense networks. The balance of excitation and inhibition allows them to have intrinsic oscillatory activity and overall amplitude modulation (AM) [10, 55].
These AM patterns are expressions of non-linear chaos, not merely a summation of linear
dendritic and action potentials. AM patterns create attractor basins and landscapes. In the
neurodynamical model every neuron participates, to some extent, in every experience and
every behavior, via non-linear chaotic mechanisms [10].
The concepts of non-linear chaotic neurodynamics are of fundamental importance to nervous
system research. They are relevant to our understanding of the workings of the normal
brain [55].

2.5.1. Neuron populations


A typical neuron has many dendrites (input) and one axon (output). The axon transmits
information using microscopic pulse trains. Dendrites integrate information using continuous
waves of ionic current. Neurons are connected by synapses. Each synapse drives electric
current. The microscopic current from each neuron sums with currents from other neurons,
which causes a macroscopic potential difference, measured with a pair of extracellular
electrodes (E) as the electroencephalogram (EEG) [10, 18]. EEG records the activity patterns
of mesoscopic neuron populations. The sum of currents that a neuron generates in response
to electrical stimulus produces the post-synaptic potential. The strength of the post-synaptic
potential decreases with distance between the synapse and the cell body. The attenuation is
compensated by greater surface area and more synapses on the distal dendrites. Dendrites
make waves and axons make pulses. Synapses convert pulses to waves. Trigger zones convert
waves to pulses. See figure 9. Researchers who base their studies on single neurons think that population events such as the EEG are irrelevant noise, because they lack an understanding of the mesoscopic state [10].

Figure 9. Typical neuron showing the dendrites (input), the soma (cell body), the axon (output), the trigger zone, and the
direction of the action potential. Notice that letters “E” represent the pair of extracellular electrodes. Adapted from [45]
and [10].

In single neurons, microscopic pulse frequencies and wave amplitudes are measured,
while in populations, macroscopic pulse and wave densities are measured. The neuron
is microscopic and the ensemble is mesoscopic. The flow of the current inside the neuron is
revealed by a change in the membrane potential, measured with an electrode inside the
cell body, evaluating the dendritic wave state variable of the single neuron. Recall that
extracellular electrodes are placed outside the neuron (see the Es in figure 9), so cortical
potential provided by sum of dendritic currents in the neighborhood is measured. The same
currents produce the membrane (intracellular) and cortical (extracellular) potentials, giving two views of neural activity: the former microscopic and the latter mesoscopic [10].
Cortical neurons, because of their synaptic interactions, form neuron populations.
Microscopic pulse and wave state variables are used to describe the activity of the single
neurons that contribute to the population, and mesoscopic state variables (also pulse and wave) are used to describe the collective activities to which neurons give rise. Mass activity in the brain is described by a pulse density, instead of a pulse frequency. This is done by recording from outside the cell the firing of pulses of many neurons simultaneously. The same current that controls the firing of neurons is measured by the EEG, which does not allow individual contributions to be distinguished. Fortunately, this is not necessary.
A population is a collection of neurons in a neighborhood, corresponding to a cortical
column, which represents dynamical patterns of activity. The average pulse density in a
population can never approach the peak pulse frequencies of single neurons. The activity of
neighborhoods in the center of the dendritic sigmoid curve is very near linear. This simplifies
the description of populations. Neuron populations are similar to mesoscopic ensembles in
many complex systems [10]. The behavior of the microscopic elements is constrained by the
embedding ensemble, and it cannot be understood outside a mesoscopic and macroscopic
view.
The collective action of neurons forms activity patterns that go beyond the cellular level and
approach the organism level. The formation of mesoscopic states is the first step for that. This
way, the activity level is decided by the population, not by individuals [10]. The population
is semi-autonomous. It has a point attractor, returning to the same level after its release.
The state space of the neuron population is defined by the range of amplitudes that its pulse
and wave densities can take.

2.5.2. Freeman K-sets


Regarding neuroscience at the mesoscopic level [10, 11], the theoretical connection between neuron activity at the microscopic level in small neural networks and the activity of cell assemblies at the mesoscopic scale is not well understood [16]. Katzir-Katchalsky suggested treating cell assemblies using thermodynamics, forming a hierarchy of models of the dynamics of neuron populations [29] (Freeman K-sets): KO, KI, KII, KIII, KIV and KV. Katzir-Katchalsky is the reason for the K in Freeman K-sets.
The KO set represents a noninteracting collection of neurons. KI sets represent a collection
of KO sets, which can be excitatory (KIe ) or inhibitory (KIi ). A KII set represents a collection
of KIe and KIi . The KIII model consists of many interconnected KII sets, describing a given
sensory system in brains. A KIV set is formed by the interaction of three KIII sets [30]. KV
sets are proposed to model the scale-free dynamics of neocortex operating on KIV sets [16].
See the representation of KI and KII sets by networks of KO sets in figure 10 [9].
The K-sets mediate between the microscopic activity of small neural networks and the
macroscopic activity of the brain. The topology includes excitatory and inhibitory
populations of neurons and the dynamics is represented by ordinary differential equations
(ODE) [16].
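To make the role of the ODEs concrete, the sketch below integrates a generic damped second-order unit of the kind commonly used for KO sets and couples a few such units into a KI-like collection. The equation form, the rate constants a and b, and all function names here are illustrative assumptions, not Freeman's exact formulation.

def ko_step(p, v, inp, a=0.22, b=0.72, dt=0.01):
    """One Euler step of a generic damped second-order population unit:
    p'' + (a + b) p' + a*b*p = a*b*F(t),
    where p is the population state and F(t) its input; a and b are
    illustrative rate constants, not necessarily Freeman's values."""
    acc = a * b * (inp - p) - (a + b) * v   # p'' isolated from the equation above
    v_new = v + dt * acc
    p_new = p + dt * v_new
    return p_new, v_new

def ki_step(states, weights, ext_inputs, dt=0.01):
    """A KI-like collection: several KO-like units coupled through fixed gains."""
    new_states = []
    for i, (p, v) in enumerate(states):
        total = ext_inputs[i] + sum(w * q for (q, _), w in zip(states, weights[i]))
        new_states.append(ko_step(p, v, total, dt=dt))
    return new_states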

Figure 10. Representation of (b) KI and (c) KII sets by networks of (a) KO sets. Available at [9].

The advantages of KIII pattern classifiers over artificial neural networks are the small number of training examples needed, convergence to an attractor in a single step, and a geometric (rather than linear) increase in the number of classes with the number of nodes. The disadvantage is the increased computational time needed to solve the ordinary differential equations numerically.
The Katchalsky K-models use a set of ordinary differential equations with distributed
parameters to describe the hierarchy of neuron populations beginning from micro-columns
to hemispheres [31]. In relation to the standard KV, K-sets provide a platform for conducting analyses of the unified actions of the neocortex in the creation and control of intentional and
cognitive behaviors [13].

2.5.3. Freeman’s mass action


Freeman’s mass action (FMA) [9] refers to the collective synaptic actions that neurons in the cortex exert on other neurons, synchronizing their firing of action potentials [17]. FMA expresses
and conveys the meaning of sensory information in spatial patterns of cortical activity that
resembles the frames in a movie [12, 13].
The prevailing concepts in neurodynamics are based on neural networks, which are
Newtonian models, since they treat neural microscopic pulses as point processes at trigger
zones and synapses. The FMA theory is Maxwellian because it treats the mesoscopic neural
activity as a continuous distribution. The neurodynamics of the FMA includes microscopic
neural operations that bring sensory information to sensory cortices and load the first
percepts of the sensory cortex to other parts of the brain. The Newtonian dynamics can
model cortical input and output functions but not the formation of percepts. The FMA needs
a paradigm shift, because the theory is based on new experiments and techniques and new
rules of evidence [17].

2.6. Neuropercolation
Neuropercolation is a family of stochastic models based on the mathematical theory of
probabilistic cellular automata on lattices and random graphs, motivated by the structural
and dynamical properties of neuron populations. The existence of phase transitions has been
demonstrated both in discrete and continuous state space models, i.e., in specific probabilistic
cellular automata and percolation models. Neuropercolation extends the concept of phase
transitions for large interactive populations of nerve cells [31].
Basic bootstrap percolation [50] has the following properties: (1) it is a deterministic
process, based on random initialization, (2) the model always progresses in one direction:
from inactive to active states and never otherwise. Under these conditions, these
mathematical models exhibit phase transitions with respect to the initialization probability p.
Neuropercolation models develop neurobiologically motivated generalizations of bootstrap
percolations [31].
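A minimal simulation makes the two properties above concrete. The sketch below is purely illustrative (the lattice size, neighborhood threshold, and function names are assumptions): sites start active independently with probability p, activation is monotone, and sweeping p exposes the phase transition in the final density of active sites.

import numpy as np

def bootstrap_percolation(n=100, p=0.05, threshold=2, seed=0, max_iters=10000):
    """Basic bootstrap percolation on an n x n lattice (illustrative sketch).
    Sites start active independently with probability p; an inactive site
    becomes active when at least `threshold` of its four neighbors are active
    (periodic boundaries for simplicity). Active sites never deactivate."""
    rng = np.random.default_rng(seed)
    active = rng.random((n, n)) < p                      # random initialization
    for _ in range(max_iters):
        neigh = (np.roll(active, 1, 0).astype(int) + np.roll(active, -1, 0)
                 + np.roll(active, 1, 1) + np.roll(active, -1, 1))
        new_active = active | (neigh >= threshold)       # monotone activation
        if np.array_equal(new_active, active):           # fixed point reached
            break
        active = new_active
    return active.mean()                                 # final fraction of active sites

# sweeping p exposes the phase transition in the final density:
# for p in (0.02, 0.04, 0.06, 0.08): print(p, bootstrap_percolation(p=p))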

2.6.1. Neuropercolation and neurodynamics


Dynamical memory neural networks are an alternative approach to pattern-based
computing [18]. Information is stored in the form of spatial patterns of modified connections
in very large scale networks. Memories are recovered by phase transitions, which enable
cerebral cortices to build spatial patterns of amplitude modulation of a narrow band
oscillatory wave. That is, information is encoded by spatial patterns of synaptic weights
of connections that couple non-linear processing elements. Each category of sensory input
has a Hebbian nerve cell assembly. When accessed by a stimulus, the assembly guides the
cortex to the attractors, one for each category.
The oscillating memory devices are biologically motivated because they are based on
observations that the processing of sensory information in the central nervous system is
accomplished via collective oscillations of populations of globally interacting neurons. This
approach provides a new perspective on neural networks.
From the theoretical point of view, the proposed model helps to understand the role of
phase transitions in biological and artificial systems. A family of random cellular automata
exhibiting dynamical behavior necessary to simulate feeling, perception and intention is
introduced [18].

2.7. Complex networks and neocortical dynamics


Complex networks are at the intersection between graph theory and statistical mechanics [4].
They are usually located in an abstract space where the position of the vertexes has no specific
meaning. However, there are several networks in which the position of the vertexes is important and influences the evolution of the network. This is the case of road networks or the Internet,
where the position of cities and routers can be located on a map and the edges between them
represent real physical entities, such as roads and optical fibers. This type of network is called
a “geographic network” or spatial network. Neural networks are spatial networks [56].
From a computational perspective, two major problems that the brain has to solve are the extraction of information (statistical regularities) from the inputs and the generation of coherent states that allow coordinated perception and action in real time [56].
In terms of the theory of complex networks [4], the anatomical connections of the cortex
show that the power-law distribution of the connection distances between neurons is exactly optimal to support rapid phase transitions of neural populations, regardless of how large
they are [31]. It is said that connectivity and dynamics are scale-free [13, 14], which means that the dynamics of the cortex is size independent, such that the brains of mice, men, elephants
and whales work the same way [17].
Scale-free dynamics of the neocortex are characterized by self-similarity of patterns of
synaptic connectivity and spatio-temporal neural activity, seen in power law distributions
of structural and functional parameters and in rapid state transitions between levels of the
hierarchy [15].

2.8. Brain-Computer Interfaces


A non-intrusive technique to allow a direct brain-computer interface (BCI) is scalp EEG: an array of electrodes placed on the head like a hat, which allows the cognitive behavior of animals and humans to be monitored, using brain waves to interact with the computer. It is a kind of keyboard-less computing that eliminates the need for hand or voice interaction.
The Neurodynamics of Brain & Behavior group in the Computational Neurodynamics (CND) Lab at the University of Memphis's FedEx Institute of Technology is dedicated to researching the cognitive behavior of animals and humans, ranging from molecular-genetic and behavioral-genetic approaches, through studies that involve brain imaging techniques and the application of dynamical mathematical and computational models, to neuroethological studies.
The research has three prongs of use for BCI: video/computer gaming; to support people
with disabilities or physical constraints, such as the elderly; and to improve control of
complex machinery, such as an aircraft and other military and civilian uses [24]. The
direct brain-computer interface would give those with physical constraints or those operating
complex machinery “extra arms” [3].
Similar to how they found seizure prediction markers, the plan is to use the data to analyze
pre-motor movements, the changes in the brain that occur before there’s actually movement,
and apply that to someone who has a prosthetic device to allow them to better manipulate
it. Since the brain is usually multitasking, the researchers will have to pick up the signal for
the desired task from all the other things going on in the brain.

3. A biologically plausible connectionist system


Instead of the computationally successful, but considered biologically implausible, supervised Back-propagation [5, 48], the learning algorithm BioRec employed in BioθPred [44, 46] is inspired by the Recirculation [22] and GeneRec (GR) [33] algorithms, and consists of two phases.
In the expectation phase¹ (figure 11), when input x, representing the first word of a sentence
through semantic microfeatures, is presented to input layer α, there is propagation of these
stimuli to the hidden layer β (bottom-up propagation) (step 1 in figure 11). There is also a
propagation of the previous actual output o^p, which is initially empty, from output layer γ back to the hidden layer β (top-down propagation) (steps 2 and 3).² Then, a hidden expectation activation (h^e) is generated (Eq. (2)) for each and every one of the B hidden units, based on

1. [33] employs the terms “minus” and “plus” phases to designate the expectation and outcome phases respectively in his GeneRec algorithm.
2. The superscript p is used to indicate that this signal refers to the previous cycle.

Figure 11. The expectation phase.
Figure 12. The outcome phase.

inputs and previous output stimuli o^p (sum of the bottom-up and top-down propagations, through the sigmoid logistic activation function σ). Then, these hidden signals propagate to the output layer γ (step 4), and an actual output o is obtained (step 5) for each and every one of the C output units, through the propagation of the hidden expectation activation to the output layer (Eq. (3)) [37]. w^h_{ij} are the connection (synaptic) weights between input (i) and hidden (j) units, and w^o_{jk} are the connection (synaptic) weights between hidden (j) and output (k) units³.

h^e_j = σ( Σ_{i=0}^{A} w^h_{ij} · x_i  +  Σ_{k=1}^{C} w^o_{jk} · o^p_k ),    1 ≤ j ≤ B        (2)

o_k = σ( Σ_{j=0}^{B} w^o_{jk} · h^e_j ),    1 ≤ k ≤ C        (3)

In the outcome phase (figure 12), input x is presented to input layer α again; there is
propagation to hidden layer β (bottom-up) (step 1 in figure 12). After this, expected output
y (step 2) is presented to the output layer and propagated back to the hidden layer β
(top-down) (step 3), and a hidden outcome activation (h^o) is generated, based on inputs
and on expected outputs (Eq. (4)). For the other words, presented one at a time, the same
procedure (expectation phase first, then outcome phase) is repeated [37]. Recall that the
architecture is bi-directional, so it is possible for the stimuli to propagate either forwardly or
backwardly.

3. i, j, and k are the indexes for the input (A), hidden (B), and output (C) units respectively. Input (α) and hidden (β) layers have an extra unit (index 0) used for simulating the presence of a bias [20]. This extra unit is absent from the output (γ) layer. That is the reason i and j range from 0 to the number of units in the layer, and k from 1. x_0, h^e_0, and h^o_0 are set to +1. w^h_{0j} is the bias of the hidden neuron j and w^o_{0k} is the bias of the output neuron k.

h^o_j = σ( Σ_{i=0}^{A} w^h_{ij} · x_i  +  Σ_{k=1}^{C} w^o_{jk} · y_k ),    1 ≤ j ≤ B        (4)

In order to make learning possible, the synaptic weights are updated through the delta rule⁴
(Eqs. (5) and (6)), considering only the local information made available by the synapse.
The learning rate η used in the algorithm is considered an important variable during the
experiments [20].

Δw^o_{jk} = η · (y_k − o_k) · h^e_j,    0 ≤ j ≤ B,  1 ≤ k ≤ C        (5)

Δw^h_{ij} = η · (h^o_j − h^e_j) · x_i,    0 ≤ i ≤ A,  1 ≤ j ≤ B        (6)
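
As a concrete rendering of the two-phase procedure, the sketch below implements Eqs. (2)–(6) with NumPy. The array layout, variable names, and learning rate are assumptions made for illustration; this is not the authors' original implementation.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def biorec_step(x, y, o_prev, Wh, Wo, eta=0.1):
    """One expectation/outcome cycle of the GeneRec-style two-phase update
    (a sketch of Eqs. (2)-(6); shapes, names and eta are assumptions).
    x: input with bias unit x[0] = 1 (length A+1)
    y: expected output (length C)
    o_prev: previous actual output o^p (length C), initially zeros
    Wh: (A+1) x B input-to-hidden weights, row 0 holding the hidden biases
    Wo: (B+1) x C hidden-to-output weights, row 0 holding the output biases."""
    # Expectation phase: bottom-up input plus top-down previous output (Eq. 2)
    he = sigmoid(x @ Wh + o_prev @ Wo[1:].T)   # hidden expectation activation h^e
    he_b = np.concatenate(([1.0], he))         # prepend bias unit h^e_0 = +1
    o = sigmoid(he_b @ Wo)                     # actual output o (Eq. 3)

    # Outcome phase: bottom-up input plus top-down expected output (Eq. 4)
    ho = sigmoid(x @ Wh + y @ Wo[1:].T)        # hidden outcome activation h^o

    # Delta-rule updates from locally available signals (Eqs. 5 and 6)
    Wo += eta * np.outer(he_b, y - o)
    Wh += eta * np.outer(x, ho - he)
    return o, Wh, Wo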

Figure 13 displays a simple application to digit learning which compares BP with GeneRec
(GR) algorithms.

Figure 13. BP-GR comparison for digit learning.

Other applications were proposed using a similar allegedly biologically inspired architecture and algorithm [34, 37–40, 42–44, 49].

4. The learning equations are essentially the delta rule (Widrow-Hoff rule), which is basically error correction: “The adjustment made to a synaptic weight of a neuron is proportional to the product of the error signal and the input signal of the synapse in question.” ([20], p. 53).

3.1. Intraneuron signaling


The Spanish Nobel laureate neuroscientist Santiago Ramón y Cajal established, at the end of the nineteenth century, two principles that revolutionized neuroscience: the Principle of connectional specificity, which states that “nerve cells do not communicate indiscriminately with one another or form random networks,” and the Principle of dynamic polarization, which says that “electric signals inside a nervous cell flow only in one direction: from neuron reception (often the dendrites and cell body) to the axon trigger zone.” Intraneuron signaling is based on the principle of dynamic polarization. The signaling inside the neuron is performed by four basic elements: receptive, trigger, signaling, and secretor. The Receptive element is responsible for input signals, and it is related to the dendritic region. The Trigger element is responsible for the neuron activation threshold, and it is related to the soma. The Signaling element is responsible for conducting and keeping the signal, and it is related to the axon. And the Secretor element is responsible for signal release to another neuron, so it is related to the presynaptic terminals of the biological neuron.

3.2. Interneuron signaling


Electrical and chemical synapses have completely different morphologies. At electrical synapses, transmission occurs through gap junction channels (special ion channels), located in the pre- and postsynaptic cell membranes. There is a cytoplasmatic connection between the cells. Part of the electric current injected into the presynaptic cell escapes through resting channels, and the remaining current is driven into the postsynaptic cell through the gap junction channels. At chemical synapses, there is a synaptic cleft, a small cellular separation between the cells. There are vesicles containing neurotransmitter molecules in the presynaptic terminal, and when an action potential reaches these synaptic vesicles, neurotransmitters are released into the synaptic cleft.

3.3. A biologically plausible ANN model proposal


We present here a proposal for a biologically plausible model [36] based on the microscopic level. This model is intended to provide a mechanism for generating a biologically plausible ANN model and to redesign the classical framework to encompass the traditional features plus labels that model the binding affinities between transmitters and receptors. This model departs from a classical connectionist model and is defined by a restricted data set, which explains the ANN behavior. Also, it introduces the T, R, and C variables to account for the binding affinities between neurons (unlike other models).
The following feature set defines the neurons:

N = {{w}, θ, g, T, R, C } (7)

where:

• w represents the connection weights,
• θ is the neuron activation threshold,
• g stands for the activation function,
• T symbolizes the transmitter,
• R the receptor, and
• C the controller.

θ, g, T, R, and C encode the genetic information, while T, R, and C are the labels, absent in other models. This proposal follows Ramón y Cajal’s principle of connectional specificity, which states that each neuron is connected to another neuron not only in relation to {w}, θ, and g, but also in relation to T, R, and C; neuron i is only connected to neuron j if there is binding affinity between the T of i and the R of j. Binding affinity means compatible types, a sufficient amount of substrate, and compatible genes. The combination of T and R results in C; C can act over other neuron connections.
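
The sketch below expresses the feature set of Eq. (7) and the binding-affinity rule as a simple data structure. The field names and the exact form of the gene-compatibility test are assumptions for illustration, since the model states these criteria qualitatively.

from dataclasses import dataclass, field
from typing import List

@dataclass
class LabeledNeuron:
    """Sketch of the feature set N = {{w}, θ, g, T, R, C} (Eq. 7).
    Field names and the affinity test are illustrative assumptions."""
    weights: List[float] = field(default_factory=list)   # {w}
    threshold: float = 0.0                                # θ
    activation: str = "sigmoid"                           # g
    transmitter: int = 0                                  # T (type)
    receptor: int = 0                                     # R (type)
    genes: dict = field(default_factory=dict)             # gene name -> expression
    substrate: int = 0                                     # available substrate units

def binding_affinity(pre: LabeledNeuron, post: LabeledNeuron) -> bool:
    """Neuron i connects to neuron j only if the T of i matches the R of j,
    there is remaining substrate, and their genes are compatible."""
    compatible_types = pre.transmitter == post.receptor
    enough_substrate = pre.substrate > 0
    compatible_genes = any(g in post.genes and post.genes[g] == e
                           for g, e in pre.genes.items())
    return compatible_types and enough_substrate and compatible_genes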
The ordinary biological neuron presents many, usually branched, dendrites, which receive information from other neurons, and an axon, which transmits the processed information, usually by propagation of an action potential. The axon is divided into several branches, and makes synapses onto the dendrites and cell bodies of other neurons (see figure 14). The chemical synapse is predominant in the cerebral cortex, and the release of transmitter substance occurs in active zones, inside presynaptic terminals. Certain chemical synapses lack active zones, resulting in slower and more diffuse synaptic actions between cells. The combination of a neurotransmitter and a receptor makes the postsynaptic cell release a protein.

Figure 14. The chemical synapse. Figure taken from [45].

Although type I synapses seem to be excitatory and type II synapses inhibitory (see
figure 15), the action of a transmitter in the postsynaptic cell does not depend on the chemical
nature of the neurotransmitter, instead it depends on the properties of the receptors with
which the transmitter binds. In some cases, it is the receptor that determines whether
a synapse is excitatory or inhibitory, and an ion channel will be activated directly by the
transmitter or indirectly through a second messenger.
Figure 15. Morphological synapses type A and type B. In an excitatory synapse (type A), neurons contribute to produce impulses on other cells: asymmetrical membrane specializations, very large synaptic vesicles (50 nm) with packets of neurotransmitters. In an inhibitory synapse (type B), neurons prevent the releasing of impulses on other cells: symmetrical membrane specializations, synaptic vesicles smaller and often ellipsoidal or flattened, contact zone usually smaller. Figure taken from [45].

Neurotransmitters are released by the presynaptic neuron and combine with specific receptors in the membrane of the postsynaptic neuron. The combination of neurotransmitter with receptor leads to the intracellular release or production of a second messenger, which interacts (directly or indirectly) with an ion channel, causing it to open or close. There are two types of resulting signaling: (1) propagation of an action potential, and (2) production of a graded potential by the axon. Graded potential signaling does not occur over long distances because of attenuation.
Graded potentials can occur at another level. See, for instance, figure 16. Axon 1, making a synapse onto a given cell, can receive a synapse from axon 2. Otherwise, the presynaptic synapse can produce only a local potential change, which is then restricted to that axon terminal (figure 17).

Figure 16. An axon-axon synapse [6].
Figure 17. A local potential change [6].

In view of these biological facts, it was decided to model, through labels T and R, the binding affinities between Ts and Rs. Label C represents the role of the “second messenger,” the effects of graded potentials, and the protein released by the coupling of T and R.
Controller C can modify the binding affinities between neurons by modifying the degrees of affinity of receptors, the amount of substrate (amount of transmitters and receptors), and gene expression, in the case of mutation. The degrees of affinity are related to the way receptors gate ion channels at chemical synapses. Through ion channels, transmitter material enters the postsynaptic cell: (1) in direct gating, receptors produce relatively fast synaptic actions; (2) in indirect gating, receptors produce slow synaptic actions; these slower actions often serve to modulate behavior because they modify the degrees of affinity of receptors.

In addition, modulation can be related to the action of peptides⁵. There are many distinct
peptides, of several types and shapes, that can act as neurotransmitters. Peptides are different
from many conventional transmitters, because they “modulate” synaptic function instead of
activating it, they spread slowly and persist for some time, much more than conventional
transmitters, and they do not act where released, but at some distant site (in some cases).
As transmitters, peptides act at very restricted places, display a slow rate of conduction, and
do not sustain the high frequencies of impulses. As neuromodulators, the excitatory effects of
substance P (a peptide) are very slow in the beginning and longer in duration (more than one
minute), so they cannot cause enough depolarization to excite the cells; the effect is to make
neurons more readily excited by other excitatory inputs, the so-called “neuromodulation.”
In the proposed model, C explains this function by modifying the degrees of affinity of
receptors.
In biological systems, the modification of the amount of substrate is regulated by acetylcholine (a neurotransmitter). It spreads over a short distance toward the postsynaptic membrane, acting at receptor molecules in that membrane, which are enzymatically divided, and part of it is taken up again for the synthesis of a new transmitter. This produces an increase in the amount of substrate. In this model, C represents substrate increase by a variable acting over the initial substrate amount.
Peptides are a second, slower, means of communication between neurons, more economical
than using extra neurons. This second messenger, besides altering the affinities between
transmitters and receptors, can regulate gene expression, achieving synaptic transmission
with long-lasting consequences. In this model, this is achieved by the modification of a variable for gene expression, so that mutation can be accounted for.

3.3.1. The labels and their dynamic behaviors


In order to build the model, it is necessary to set the parameters for the connectionist architecture. For the network genesis, the parameters are:

• number of layers;
• number of neurons in each layer;
• initial amount of substrate (transmitters and receptors) in each layer; and
• genetics of each layer:
  – type of transmitter and its degree of affinity,
  – type of receptor and its degree of affinity, and
  – genes (name and gene expression).

For the evaluation of controllers and how they act, the parameters are:

• Controllers can modify:
  – the degree of affinity of receptors;
  – the initial substrate storage; and
  – the gene expression value (mutation).
5. Peptides are compounds consisting of two or more amino acids, the building blocks of proteins.

The specifications stated above lead to an ANN with some distinctive characteristics: (1)
each neuron has a genetic code, which is a set of genes plus a gene expression controller;
(2) the controller can cause mutation, because it can regulate gene expression; (3) the
substrate (amount of transmitter and receptor) is defined by layer, but it is limited, so some
postsynaptic neurons are not activated: this way, the network favors clustering.
Also, the substrate increase is related to the gene specified in the controller, because the
synthesis of a new transmitter occurs in the pre-synaptic terminal (origin gene) [36]. The
modification of the genetic code, that is, mutation, as well as the modification of the degree of
affinity of receptors, however, is related to the target gene. The reason is that the modulation function of the controller is better explained at some distance from the emission of the neurotransmitter, therefore at the target.

3.3.2. A network simulation


In table 3, a data set for a five-layer network simulation is presented [36]. For the
specifications displayed in table 3, the network architecture and its activated connections
are shown in figure 18. For the sake of simplicity, all degrees of affinity are set at 1 (the degree of affinity is represented by a real number in the range [0..1], so that the greater the degree of affinity, the stronger the synaptic connection).
layer                               1       2       3             4             5
number of neurons                   10      10      5             5             1
amount of substrate                 8       10      4             5             2
type of transmitter                 1       2       1             2             1
degree of affinity of transmitter   1       1       1             1             1
type of receptor                    2       1       2             1             2
degree of affinity of receptor      1       1       1             1             1
genes (name/gene expression)        abc/1   abc/1   abc/1, def/2  abc/1, def/2  def/2
Controllers: 1/1-2: abc/s/abc/1; 1/1-4: abc/e/abc/2; 2/2-3: abc/a/def/0.5. (Controller syntax: number/origin
layer-target layer: og/t/tg/res, where og = origin gene (name); t = type of synaptic function modulation: a = degree of
affinity, s = substrate, e = gene expression; tg = target gene (name); res = control result: for t = a: res = new degree of
affinity of receptor (target), for t = s: res = substrate increasing (origin), for t = e: res = new gene expression controller
(target). The controllers from layer 2 to 5, from layer 3 to 4, and from layer 4 to 5 are absent in this simulation.)

Table 3. The data set for a five-layer network. Adapted from [36].

In figure 18, one can notice that every unit in layer 1 (the input layer) is linked to the first nine
units in layer 2 (first hidden layer). The reason why not every unit in layer 2 is connected to
layer 1, although the receptor of layer 2 has the same type of the transmitter of layer 1, is that
the amount of substrate in layer 1 is eight units. This means that, in principle, each layer-1
unit is able to connect to at most eight units. But controller 1, from layer 1 to 2, incremented
by 1 the amount of substrate of the origin layer (layer 1). The result is that each layer 1 unit
can link to nine units in layer 2. Observe that from layer 2 to layer 3 (the second hidden layer) only four layer-2 units are connected to layer 3, also because of the amount of substrate of layer 3, which is 4.
Figure 18. A five-layer neural network for the data set in table 3. At the bottom of the figure is layer 1 (the input layer) and at the top is layer 5 (the output layer). Between them, there are three hidden layers (layers 2 to 4). Figure taken from [36].

As a result of the compatibility of the layer-2 transmitter and the layer-5 receptor, and the existence of remaining unused substrate in layer 2, one could expect the first two units in layer 2 to connect to the only unit in layer 5 (the output unit). However, this does not occur because their genes are not compatible. Although gene compatibility exists, in principle, between layers 1 and 4, their units do not connect to each other because there is no remaining substrate in layer 1 and because controller 1 between layers 1 and 4 modified the gene expression of layer 4, making them incompatible. The remaining controller has the effect of modifying the degrees of affinity of receptors in layer 3 (the target). Consequently, the connections between layers 2 and 3 became weakened (represented by dotted lines). Notice that, in order to allow connections, in addition to the existence of a sufficient amount of substrate, the genes and the types of transmitters and receptors of each layer must be compatible.
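
To illustrate how the substrate, type, and gene constraints interact in this example, the following sketch derives layer-to-layer connections from data organized like Table 3. The data layout, the controller encoding, and the counting rule are assumptions made for illustration, not the authors' simulator.

def allowed_connections(layers, controllers=()):
    """Sketch of how the activated connections in figure 18 could be derived
    from data organized like Table 3; all names and rules here are illustrative.
    Each layer: {'n': units, 'substrate': amount, 'T': transmitter type,
    'R': receptor type, 'genes': {name: expression}}.
    Each controller: (origin, target, kind, gene, value), kind in {'s', 'e', 'a'}."""
    for origin, target, kind, gene, value in controllers:
        if kind == "s":                      # increase origin substrate
            layers[origin]["substrate"] += value
        elif kind == "e":                    # rewrite target gene expression
            layers[target]["genes"][gene] = value
        # kind == 'a' (receptor affinity) only weakens links; omitted here

    links = []
    for i in range(len(layers) - 1):
        pre, post = layers[i], layers[i + 1]
        type_ok = pre["T"] == post["R"]                 # T of i matches R of j
        gene_ok = any(post["genes"].get(g) == e         # compatible genes
                      for g, e in pre["genes"].items())
        if type_ok and gene_ok:
            # each pre unit reaches at most `substrate` units of the next layer
            fan_out = min(pre["substrate"], post["n"])
            links.append((i, i + 1, fan_out))
    return links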
Although the architecture shown in figure 18 is feed-forward, recurrence, or re-entrance,
is permitted in this model. This kind of feedback goes along with Edelman and Tononi’s
“dynamic core” notion [7]. This up-to-date hypothesis suggests that there are neuronal
groups underlying conscious experience, the dynamic core, which is highly distributed and
integrated through a network of reentrant connections.

3.4. Other models


Other biologically plausible ANN models are concerned with the connectionist architecture, related directly to the biological structure of the cerebral cortex, or are focused on neural features and the signaling between neurons. In all cases, the main purpose is to create a model more faithful to the biological structure, properties, and functionalities of the cerebral cortex, including its learning processes, without disregarding computational efficiency. The choice
of the models upon which the proposed description is based takes into account two main
criteria: the fact they are considered biologically more realistic and the fact they deal with
intra and inter-neuron signaling in electrical and chemical synapses. Also, the duration
of action potentials is taken into account. In addition to the characteristics for encoding
information regarding biological plausibility present in current spiking neuron models, a distinguishing feature is emphasized here: a combination of Hebbian learning and error-driven learning [52–54].

4. Conclusions
Current models of ANN are in debt to human brain physiology. Because of their mathematical simplicity, they lack several biological features of the cerebral cortex. Also, the mesoscopic information is privileged over the individual behavior of the neurons. The mesoscopic level of the brain can be described adequately by dynamical system theory (attractor states and cycles). The EEG waves reflect the existence of cycles in the brain's electric field. The objective here is to present biologically plausible ANN models, closer to
human brain capacity. In the model proposed, still at the microscopic level of analysis,
the possibility of connections between neurons is related not only to synaptic weights,
activation threshold, and activation function, but also to labels that embody the binding
affinities between transmitters and receptors. This type of ANN would be closer to human
evolutionary capacity, that is, it would represent a genetically well-suited model of the brain.
The hypothesis of the “dynamic core” [7] is also contemplated, that is, the model allows
reentrancy in its architecture connections.

Acknowledgements
I am grateful to my students, who have collaborated with me in this subject for the last ten
years.

Author details
João Luís Garcia Rosa
Bioinspired Computing Laboratory (BioCom), Department of Computer Science, University
of São Paulo at São Carlos, Brazil

5. References
[1] B. Aguera y Arcas, A. L. Fairhall, and W. Bialek, “Computation in a Single Neuron:
Hodgkin and Huxley Revisited,” Neural Computation 15, 1715–1749 (2003), MIT Press.

[2] A. Clark, Mindware: An introduction to the philosophy of cognitive science. Oxford, Oxford
University Press, 2001.

[3] CLION - Center for Large-Scale Integration and Optimization Networks, Neurodynamics
of Brain & Behavior, FedEx Institute of Technology, University of Memphis, Memphis,
TN, USA. http://clion.memphis.edu/laboratories/cnd/nbb/.

[4] L. da F. Costa, F. A. Rodrigues, G. Travieso, and P. R. Villas Boas, “Characterization


of complex networks: A survey of measurements,” Advances in Physics, vol. 56, No. 1,
February 2007, pp. 167–242.

[5] F. H. C. Crick, “The Recent Excitement about Neural Networks,” Nature 337 (1989)
pp. 129–132.

[6] F. Crick and C. Asanuma, “Certain Aspects of the Anatomy and Physiology of the
Cerebral Cortex,” in J. L. McClelland and D. E. Rumelhart (eds.), Parallel Distributed
Processing, Vol. 2, Cambridge, Massachusetts - London, England, The MIT Press, 1986.

[7] G. M. Edelman and G. Tononi, A Universe of Consciousness - How Matter Becomes


Imagination, Basic Books, 2000.

[8] C. Eliasmith and C. H. Anderson, Neural Engineering - Computation, Representation, and


Dynamics in Neurobiological Systems, A Bradford Book, The MIT Press, 2003.

[9] W. J. Freeman, Mass action in the nervous system - Examination of the Neurophysiological Basis
of Adaptive Behavior through the EEG, Academic Press, New York San Francisco London
1975. Available at http://sulcus.berkeley.edu/.

[10] W. J. Freeman, How Brains Make Up Their Minds, Weidenfeld & Nicolson, London, 1999.

[11] W. J. Freeman, Mesoscopic Brain Dynamics, Springer-Verlag London Limited 2000.

[12] W. J. Freeman, “How and Why Brains Create Meaning from Sensory Information,”
International Journal of Bifurcation & Chaos 14: 513–530, 2004.

[13] W. J. Freeman, “Proposed cortical ’shutter’ in cinematographic perception,” in L.


Perlovsky and R. Kozma (Eds.), Neurodynamics of Cognition and Consciousness, New York:
Springer, 2007, pp. 11–38.

[14] W. J. Freeman, “Deep analysis of perception through dynamic structures that emerge in
cortical activity from self-regulated noise,” Cogn Neurodyn (2009) 3:105–116.

[15] W. J. Freeman and M. Breakspear, “Scale-free neocortical dynamics,” Scholarpedia


2(2):1357. http://www.scholarpedia.org/article/Scale-free_neocortical_dynamics, 2007.

[16] W. J. Freeman and H. Erwin, “Freeman K-set,” Scholarpedia 3(2):3238. http://www.


scholarpedia.org/article/Freeman_K-set, 2008.

[17] W. J. Freeman and R. Kozma, “Freeman’s mass action,” Scholarpedia 5(1):8040. http:
//www.scholarpedia.org/article/Freeman’s_mass_action, 2010.

[18] W. J. Freeman and R. Kozma, “Neuropercolation + Neurodynamics: Dynamical


Memory Neural Networks in Biological Systems and Computer Embodiments,”
IJCNN2011 Tutorial 6, IJCNN 2011 - International Joint Conference on Neural Networks, San
Jose, California, July 31, 2011.

[19] J. Guckenheimer, “Bifurcation,” Scholarpedia 2(6):1517. http://www.scholarpedia.org/


article/Bifurcation, 2007.

[20] S. Haykin, Neural Networks - A Comprehensive Foundation. Second Edition. Prentice-Hall,


1999.

[21] D. O. Hebb, The Organization of Behavior: A Neuropsychological Theory, Wiley, 1949.



[22] G. E. Hinton and J. L. McClelland, “Learning Representations by Recirculation,” in D.


Z. Anderson (ed.), Neural Information Processing Systems, American Institute of Physics,
New York 1988, pp. 358–66.

[23] A. L. Hodgkin and A. F. Huxley, “A quantitative description of membrane current and


its application to conduction and excitation in nerve,” J. Physiol. (1952) 117, 500–544.

[24] S. Hoover, “Kozma’s research is brain wave of the future,” Update - The newsletter for the
University of Memphis, http://www.memphis.edu/update/sep09/kozma.php, 2009.

[25] E. M. Izhikevich, Dynamical Systems in Neuroscience: The Geometry of Excitability and


Bursting, The MIT Press, 2007.

[26] A. K. Jain, J. Mao, and K. M. Mohiuddin, “Artificial Neural Networks: A Tutorial,” IEEE
Computer, March 1996, pp. 31–44.

[27] E. R. Kandel, J. H. Schwartz, T. M. Jessell, Essentials of Neural Science and Behavior,


Appleton & Lange, Stamford, Connecticut, 1995.

[28] E. R. Kandel, J. H. Schwartz, T. M. Jessell, Principles of Neural Science, Fourth Edition,


McGraw-Hill, 2000.

[29] A. K. Katchalsky, V. Rowland and R. Blumenthal, Dynamic patterns of brain cell assemblies,
MIT Press, 1974.

[30] R. Kozma, H. Aghazarian, T. Huntsberger, E. Tunstel, and W. J. Freeman,


“Computational aspects of cognition and consciousness in intelligent devices,” IEEE
Computational Intelligence Magazine, August 2007, pp. 53–64.

[31] R. Kozma, “Neuropercolation,” Scholarpedia 2(8):1360. http://www.scholarpedia.org/


article/Neuropercolation, 2007.

[32] W. S. McCulloch and W. Pitts. “A logical calculus of the ideas immanent in nervous
activity.” Bulletin of Mathematical Biophysics, 5, 115-133, 1943.

[33] R. C. O’Reilly, “Biologically Plausible Error-driven Learning Using Local Activation


Differences: the Generalized Recirculation Algorithm,” Neural Computation 8:5 (1996)
pp. 895–938.

[34] T. Orrú, J. L. G. Rosa, and M. L. Andrade Netto, “SABio: A Biologically Plausible


Connectionist Approach to Automatic Text Summarization,” Applied Artificial Intelligence,
22(8), 2008, pp. 896–920. Taylor & Francis.

[35] J. Rinzel and G. B. Ermentrout, “Analysis of neuronal excitability and oscillations,” in


C. Koch and I. Segev (Eds.), Methods In Neuronal Modeling: From Synapses To Networks,
MIT Press, 1989.

[36] J. L. G. Rosa, “An Artificial Neural Network Model Based on Neuroscience: Looking
Closely at the Brain,” in V. Kurková, N. C. Steele, R. Neruda, and M. Kárný (Eds.),
Artificial Neural Nets and Genetic Algorithms - Proceedings of the International Conference
in Prague, Czech Republic, 2001 - ICANNGA-2001. April 22-25, Springer-Verlag, pp.


138-141. ISBN: 3-211-83651-9.

[37] J. L. G. Rosa, “A Biologically Inspired Connectionist System for Natural Language


Processing,” in Proceedings of the 2002 VII Brazilian Symposium on Neural Networks (SBRN
2002). 11-14 November 2002. Recife, Brazil. IEEE Computer Society Press.

[38] J. L. G. Rosa, “A Biologically Motivated Connectionist System for Predicting the Next
Word in Natural Language Sentences,” in Proceedings of the 2002 IEEE International
Conference on Systems, Man, and Cybernetics - IEEE-SMC’02 - Volume 4. 06-09 October
2002. Hammamet, Tunísia.

[39] J. L. G. Rosa, “A Biologically Plausible and Computationally Efficient Architecture


and Algorithm for a Connectionist Natural Language Processor,” in Proceedings of the
2003 IEEE International Conference on Systems, Man, and Cybernetics - IEEE-SMC’03. 05-08
October 2003. Washington, District of Columbia, United States of America, pp. 2845–2850.

[40] J. L. G. Rosa, “A Biologically Motivated and Computationally Efficient Natural


Language Processor,” in R. Monroy, G. Arroyo-Figueroa, L. E. Sucar, and H. Sossa (Eds.),
Lecture Notes in Computer Science. Vol. 2972 / 2004. MICAI 2004: Advances in Artificial
Intelligence: 3rd. Mexican Intl. Conf. on Artificial Intelligence, Mexico City, Mexico, April
26-30, 2004. Proc., pp. 390–399. Springer-Verlag Heidelberg.

[41] J. L. G. Rosa, “Biologically Plausible Artificial Neural Networks,” Two-hour tutorial


at IEEE IJCNN 2005 - International Joint Conference on Neural Networks, Montréal, Canada,
July 31, 2005. Available at http://ewh.ieee.org/cmte/cis/mtsc/ieeecis/contributors.htm.

[42] J. L. G. Rosa, “A Connectionist Thematic Grid Predictor for Pre-parsed Natural


Language Sentences,” in D. Liu, S. Fei, Z. Hou, H. Zhang, and C. Sun (Eds.), Advances
in Neural Networks - ISNN2007 - Lecture Notes in Computer Science, Volume 4492, Part II,
pp. 825–834. Springer-Verlag Berlin Heidelberg, 2007.

[43] J. L. G. Rosa, “A Hybrid Symbolic-Connectionist Processor of Natural Language


Semantic Relations,” Proceedings of the 2009 IEEE Workshop on Hybrid Intelligent Models
and Applications (HIMA2009), IEEE Symposium Series on Computational Intelligence, IEEE
SSCI 2009, March 30 - April 2, 2009. Sheraton Music City Hotel, Nashville, TN, USA. Pp.
64-71. IEEE Conference Proceedings.

[44] J. L. G. Rosa, “Biologically Plausible Connectionist Prediction of Natural Language


Thematic Relations,” Proceedings of the WCCI 2010 - 2010 IEEE World Congress on
Computational Intelligence, IJCNN 2010 - International Joint Conference on Neural Networks,
July 18-23, 2010. Centre de Convencions Internacional de Barcelona, Barcelona, Spain.
IEEE Conference Proceedings, pp. 1127-1134.

[45] J. L. G. Rosa, Fundamentos da Inteligência Artificial (Fundamentals of Artificial


Intelligence), Book in Portuguese, Editora LTC, Rio de Janeiro, 2011.

[46] J. L. G. Rosa and J. M. Adán-Coello, “Biologically Plausible Connectionist Prediction of


Natural Language Thematic Relations,” Journal of Universal Computer Science, J.UCS vol.
16, no. 21 (2010), pp. 3245-3277. ISSN 0948-6968.

[47] F. Rosenblatt, “The perceptron: A perceiving and recognizing automaton,” Report


85-460-1, Project PARA, Cornell Aeronautical Lab., Ithaca, NY, 1957.

[48] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning Internal Representations


by Error Propagation,” in D. E. Rumelhart and J. L. McClelland (eds.), Parallel Distributed
Processing, Volume 1 - Foundations. A Bradford Book, MIT Press, 1986.

[49] M. O. Schneider and J. L. G. Rosa, “Application and Development of Biologically


Plausible Neural Networks in a Multiagent Artificial Life System,” Neural Computing
& Applications, vol. 18, number 1, 2009, pp. 65–75. DOI 10.1007/s00521-007-0156-0.

[50] R. H. Schonmann, “On the Behavior of Some Cellular Automata Related to Bootstrap
Percolation,” The Annals of Probability, Vol. 20, No. 1 (Jan., 1992), pp. 174–193.

[51] G. M. Shepherd, The synaptic organization of the brain, fifth edition, Oxford University
Press, USA, 2003.

[52] A. B. Silva and J. L. G. Rosa, “A Connectionist Model based on Physiological Properties


of the Neuron,” in Proceedings of the International Joint Conference IBERAMIA/SBIA/SBRN
2006 - 1st Workshop on Computational Intelligence (WCI’2006), Ribeirão Preto, Brazil,
October 23-28, 2006. CD-ROM. ISBN 85-87837-11-7.

[53] A. B. Silva and J. L. G. Rosa, “Biological Plausibility in Artificial Neural Networks: An


Improvement on Earlier Models,” Proceedings of The Seventh International Conference on
Machine Learning and Applications (ICMLA’08), 11-13 Dec. 2008, San Diego, California,
USA. IEEE Computer Society Press, pp. 829-834. DOI 10.1109/ICMLA.2008.73.

[54] A. B. Silva and J. L. G. Rosa, “Advances on Criteria for Biological Plausibility in Artificial
Neural Networks: Think of Learning Processes,” Proceedings of IJCNN 2011 - International
Joint Conference on Neural Networks, San Jose, California, July 31 - August 5, 2011, pp.
1394-1401.

[55] J. R. Smythies, Book review on “How Brains Make up Their Minds. By W. J. Freeman.”
Psychological Medicine, 2001, 31, 373–376. 2001 Cambridge University Press.

[56] O. Sporns, “Network Analysis, Complexity, and Brain Function,” Complexity, vol. 8, no.
1, pp. 56–60. Willey Periodicals, Inc. 2003.

[57] Wikipedia - The Free Encyclopedia, available at http://en.wikipedia.org/wiki/Neuron


Chapter 3

Weight Changes for Learning Mechanisms in Two-Term Back-Propagation Network

Siti Mariyam Shamsuddin, Ashraf Osman Ibrahim and Citra Ramadhena

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/51776

1. Introduction

The assignment of value to the weight commonly brings the major impact towards the learning behaviour of the network. If the algorithm successfully computes the correct value of the weight, it can converge faster to the solution; otherwise, the convergence might be slower or it might cause divergence. To prevent this problem occurring, the step of gradient descent is controlled by a parameter called the learning rate. This parameter will determine the length of the step taken by the gradient to move along the error surface. Moreover, to avoid the oscillation problem that might happen around a steep valley, a fraction of the last weight update is added to the current weight update, and the magnitude is adjusted by a parameter called momentum. The inclusion of these parameters aims to produce a correct value of the weight update which later will be used to update the new weight. The correct value of the weight update can be seen in two aspects: sign and magnitude. If both aspects are properly chosen and assigned to the weight, the learning process can be optimized and the solution is not hard to reach. Owing to the usefulness of two-term BP and the adaptive learning method in learning the network, this study proposes the weight sign changes with respect to gradient descent in BP networks, with and without the adaptive learning method.
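
The two terms referred to in the chapter title are the learning rate and the momentum; a minimal sketch of the resulting weight-update rule is shown below (the function name and parameter values are illustrative, not the authors' code).

import numpy as np

def two_term_bp_update(w, grad, prev_delta, eta=0.1, alpha=0.9):
    """Two-term BP weight update (illustrative sketch, not the authors' code):
    eta (learning rate) scales the step along the negative gradient, and
    alpha (momentum) adds a fraction of the previous update to damp the
    oscillations that occur in steep valleys of the error surface."""
    delta = -eta * grad + alpha * prev_delta
    return w + delta, delta

# usage: carry prev_delta between updates, starting from zeros
# w, prev_delta = two_term_bp_update(w, dE_dw, prev_delta)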

2. Related work

The gradient descent technique is expected to bring the network closer to the minimum error without taking for granted the convergence rate of the network. It is meant to generate the
slope that moves downwards along the error surface to search for the minimum point. During its movement, the points passed by the slope throughout the iterations affect the magnitude of the value of the weight update and its direction. Later, the updated weight is used for training the network at each epoch until the predefined iteration is achieved or the minimum error has been reached. Despite the general success of BP in learning, several major deficiencies still need to be solved. The most notable deficiencies, according to reference [ ], are the existence of temporary local minima due to the saturation behaviour of the activation function. The slow rates of convergence are due to the existence of local minima, and the convergence rate is relatively slow for a network with more than one hidden layer. These drawbacks are also acknowledged by several scholars [ - ].
The error function plays a vital role in the learning process of the two-term BP algorithm. Aside from calculating the actual error from the training, it assists the algorithm in reaching the minimum point where the solution converges, by calculating its gradient and back-propagating it to the network for weight adjustment and error minimization. Hence, the problem of being trapped in local minima can be avoided and the desired solution can be achieved.

The movement of the gradient on the error surface may vary in terms of its direction and magnitude. The sign of the gradient indicates the direction it moves, and the magnitude of the gradient indicates the step size taken by the gradient to move on the error surface. This temporal behaviour of the gradient provides insight about conditions on the error surface. This information will then be used to perform a proper adjustment of the weight, which is carried out by implementing a weight adjustment method. Once the weight is properly adjusted, the learning process takes only a short time to converge to the solution. Hence, the problem faced by two-term BP is solved. The term "proper adjustment of weight" here refers to the proper assignment of magnitude and sign to the weight, since both of these factors affect the internal learning process of the network.
Aside from the gradient, there are some factors that play an important role in the assignment of a proper change to the weight, specifically in terms of its sign. These factors are the learning parameters such as the learning rate and momentum. Literally, the learning rate and momentum parameters hold an important role in the two-term BP training process. Respectively, they control the step size taken by the gradient along the error surface and speed up the learning process. In a conventional BP algorithm, the initial value of both parameters is very critical since it will be retained throughout all the learning iterations. The assignment of a fixed value to both parameters is not always a good idea, bearing in mind that the error surface is not always flat, or is never flat. Thus, the step size taken by the gradient cannot be similar over time. It needs to take into account the characteristics of the error surface and the direction of movement. This is a very important condition to be taken into consideration to generate the proper value and direction of the weight. If this can be achieved, the network can reach the minimum in a shorter time and the desired output is obtained.
Setting a larger value for the learning rate may assist the network to converge faster. However, owing to the larger step taken by the gradient, the oscillation problem may occur and cause divergence or, in some cases, we might overshoot the minimum. On the other hand, if a smaller value is assigned to the learning rate, the gradient will move in the correct direction and gradually reach the minimum point. However, the convergence rate is compromised owing to the smaller steps taken by the gradient. On the other hand, the momentum is used to overcome the oscillation problem. It pushes the gradient to move up the steep valley in order to escape the oscillation problem; otherwise the gradient will bounce from one side of the surface to another. Under this condition, the direction of the gradient changes rapidly and may cause divergence. As a result, the computed weight update value and direction will be incorrect, which affects the learning process. It is obviously seen that the use of a fixed parameter value is not efficient. The obvious way to solve this problem is to implement an adaptive learning method to produce dynamic values of the learning parameters.

In addition, the fact that the two-term BP algorithm uses a uniform learning rate may lead to overshooting of minima and slow movement on shallow surfaces. This may cause the algorithm to diverge, or to converge very slowly, because every weight takes the same step size even though each slope needs to be descended in a different direction. A solution to these matters was proposed in [ ], called the Delta-Bar-Delta (DBD) algorithm. The method focuses on setting a learning rate value for each weight connection, so every connection has its own learning rate. However, this method still suffers from certain drawbacks. The first is that it is not efficient when used together with momentum, since this sometimes causes divergence. The second is the assignment of the increment parameter, which causes a drastic increase of the learning rate, so that the exponential decrement cannot compensate and a wild jump occurs. For these reasons, [ ] proposed an improved DBD algorithm called the Extended Delta-Bar-Delta (EDBD) algorithm. EDBD implements a notion similar to DBD and adds modifications that alleviate the drawbacks of DBD, demonstrating satisfactory learning performance. Unlike DBD, EDBD adjusts both the learning rate and the momentum for each individual weight connection, and its learning performance is therefore superior to DBD. EDBD is one of many adaptive learning methods proposed to improve the performance of standard BP. The author has shown that the EDBD algorithm outperforms the DBD algorithm; the satisfactory performance indicates that the algorithm succeeds in generating proper weights with the inclusion of momentum.
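To make the per-connection learning-rate idea concrete, the following is a minimal sketch of the usual Delta-Bar-Delta rule as introduced by Jacobs: the learning rate of a weight is increased by a constant when the current gradient agrees in sign with an exponential average of past gradients, and decreased multiplicatively when they disagree. The constants kappa, phi and theta below are illustrative placeholders, not values taken from this chapter.

```python
import numpy as np

def dbd_learning_rates(lr, grad, grad_bar, kappa=0.01, phi=0.1, theta=0.7):
    """One Delta-Bar-Delta step for per-weight learning rates (illustrative sketch).

    lr       -- current per-weight learning rates (same shape as the weights)
    grad     -- current gradient dE/dW
    grad_bar -- exponential average of past gradients ("delta bar")
    """
    agreement = grad * grad_bar
    lr = np.where(agreement > 0, lr + kappa, lr)       # same sign: add a constant
    lr = np.where(agreement < 0, lr * (1 - phi), lr)   # opposite sign: scale down
    grad_bar = (1 - theta) * grad + theta * grad_bar   # update the averaged gradient
    return lr, grad_bar
```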

[ ] proposed the batch gradient descent with momentum method, combining the momentum term with the batch gradient descent algorithm. A single sample cannot have an immediate effect on the weights; the algorithm waits until all input samples have been presented, accumulates the sum of all errors, and only then modifies the weights according to the total error to enhance the convergence rate. The advantages of this method are faster speed, fewer iterations and smoother convergence. On the other hand, [ ] presented a new learning algorithm for a feed-forward neural network based on the two-term BP method with an adaptive learning rate. The adaptation is based on an error criterion in which the error is measured on the validation set instead of the training set and is used to dynamically adjust the global learning rate. The proposed algorithm consists of two phases. In the first phase, the learning rate is adjusted after each iteration so that the minimum error is quickly attained. In the second phase, the search is refined by repeatedly reverting to previous weight configurations and decreasing the global learning rate. The experimental results show that the proposed method converges quickly and outperforms two-term BP in terms of generalization when the size of the training set is reduced. [ ] improved the convergence rate of the two-term BP model with some modifications to the learning strategies; the experimental results show that the modified BP improves considerably compared with standard BP.

Meanwhile, [ ] proposed a differential adaptive learning rate method for BP to speed up learning. The method employs a large learning rate at the beginning of training and gradually decreases it using a differential adaptive scheme. A comparison between this method and others, such as two-term BP, Nguyen-Widrow weight initialization and Optical BP, shows that the proposed method outperforms the competing methods in terms of learning speed.

[ ] proposed a new BP algorithm with adaptive momentum for feed-forward training. Based on the current descent direction and the last weight increment, the momentum coefficient is adjusted iteratively. Moreover, while maintaining the stability of the network, the range of usable learning rates is widened after the inclusion of the adaptable momentum. Simulation results show that the proposed method is superior to conventional BP: fast convergence is achieved and the oscillation is smoothed.

[ ] presented an improved BP training algorithm with a self-adaptive learning rate. The functional relationship between the change in the total quadratic training error and the change in the connection weights and biases is obtained from the Taylor formula. By combining it with the weight and bias changes of a batch BP algorithm, equations for calculating a self-adaptive learning rate are derived. The learning rate is adaptively adjusted based on the average quadratic error and the gradient of the error curve. Moreover, the value of the self-adaptive learning rate depends on the network topology, the training samples, the average quadratic error and the gradient, rather than on manual selection. The experimental results show the effectiveness of the proposed training algorithm.

In [ ], a fast BP learning method using optimization of the learning rate is proposed for pulsed neural networks (PNN). The method optimizes the learning rate so as to speed up learning in every learning cycle, during both connection-weight learning and attenuation-rate learning, in order to accelerate BP learning in a PNN. The results showed that the average number of learning cycles required in all of the problems was reduced by optimizing the learning rate during connection-weight learning, indicating the validity of the proposed method.

In [ ], the two-term BP is improved so that it can overcome slow learning and the tendency to become trapped in a local minimum by adopting an adaptive algorithm. The method divides the whole training process into many learning phases whose evaluated effects indicate the direction of the network globally. Different ranges of effect values correspond to different learning models, and the next learning phase adjusts the learning model based on the evaluated effects of the previous phase.

We can infer from the previous literature that the long evolution of improvements to BP learning still points towards open opportunities to enhance the BP algorithm in training and learning the network, especially in terms of weight adjustment. The modification of the weight adjustment aims to update the weight with the correct value in order to obtain a better convergence rate and a smaller error. This can be seen from the various studies that explicitly control the proper sign and magnitude of the weight.

. Two-Term Back-Propagation (BP) Network

The architecture of two-term BP is deliberately built in such a way that it resembles the structure of neurons. It contains several layers, where each layer interacts with the upper layer connected to it by connection links. A connection link specifically connects the nodes within a layer to the nodes in the adjacent layer, building a highly interconnected network. The bottom-most layer, called the input layer, accepts and processes the input and passes its output to the next adjacent layer, called the hidden layer. The general architecture of an ANN is depicted in Figure 1 [ ].

Figure 1. ANN architecture.

where,

i is the input layer,

j is the hidden layer, and

k is the output layer.

The input layer has $M$ neurons and input vector $X = [x_1, x_2, \ldots, x_M]$, the output layer has $L$ neurons and output vector $Y = [y_1, y_2, \ldots, y_L]$, while the hidden layer has $Q$ neurons.

The output received from the input layer is processed and computed mathematically in the hidden layer, and the result is passed to the output layer. In addition, BP can have more than one hidden layer, but this creates complexity in training the network. One reason for this complexity is the existence of more local minima compared with a single-hidden-layer network; learning then depends greatly on the choice of initial weights to reach convergence.

Nodes in BP can be thought of as units that process input to produce output. The output produced by a node is largely affected by the weight associated with each link. In this process, each input is multiplied by the weight associated with the connection link connected to the node, and the bias is added. The weight determines the strength with which the output is pulled towards the desired output: the greater the weight, the greater the chance of the output being closer to the desired output. The relationship between the weights, connection links and layers is shown in Figure 2 [ ].

Figure 2. Connection links, weights and layers.

Once the output arrives at the hidden layer, it is summed up to create a net; this is called a linear combination. The net is fed to an activation function and the result is passed to the output layer. To ensure that learning takes place continuously, in the sense that the derivative of the error function can keep moving downhill on the error surface in search of the minimum, the activation function needs to be a continuous, differentiable function. The most commonly used activation function is the sigmoid function, which limits the output between 0 and 1.

$$net_j = \sum_i W_{ij} O_i + \theta_j$$

$$O_j = \frac{1}{1 + e^{-net_j}}$$

where,

$net_j$ is the summation of the weighted inputs plus the bias,

$W_{ij}$ is the weight associated with the connection link between node $i$ in the input layer and node $j$ in the hidden layer,

$O_i$ is the input at node $i$ in the input layer,

$\theta_j$ is the bias associated with node $j$ in the hidden layer,

$O_j$ is the output of the activation function at hidden node $j$.

Other commonly used activation functions are the logarithmic, tangent and hyperbolic tangent functions, among others.

The output generated by the activation function is forwarded to the output layer. As with the input and hidden layers, the connection links between the hidden and output layers are associated with weights, so the activated output received from the hidden layer is multiplied by a weight. Depending on the application, the number of nodes in the output layer may vary; in a classification problem the output layer may consist of a single node that produces a yes/no result or a binary number. All the weighted outputs are added together and this value is fed to the activation function to generate the final output. The Mean Square Error is used as the error function: at each iteration the error is calculated from the target output and the final computed output of that iteration. If the error is still larger than the predefined acceptable error value, the training process continues to the next iteration.

$$net_k = \sum_j W_{jk} O_j + \theta_k$$

$$O_k = \frac{1}{1 + e^{-net_k}}$$

$$E = \frac{1}{2} \sum_k (t_k - O_k)^2$$

where,

$net_k$ is the summation of the weighted outputs at output node $k$,

$O_j$ is the output at node $j$ in the hidden layer,

$W_{jk}$ is the weight associated with the connection link between hidden node $j$ and output node $k$,

$E$ is the error function of the network (Mean Square Error),

$t_k$ is the target output at output node $k$,

$\theta_k$ is the bias associated with node $k$ in the output layer,

$O_k$ is the final output at the output layer.
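The feed-forward computation described above can be summarized in a short sketch. The snippet below is only an illustrative reading of the equations, not code from this chapter; the layer shapes and helper names are assumptions made for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, W_ih, b_h, W_ho, b_o):
    """One forward pass of a single-hidden-layer network.

    x    -- input vector of length M
    W_ih -- input-to-hidden weights, shape (M, Q)
    W_ho -- hidden-to-output weights, shape (Q, L)
    b_h, b_o -- hidden and output biases
    """
    net_j = x @ W_ih + b_h        # net_j = sum_i W_ij * O_i + theta_j
    O_j = sigmoid(net_j)          # hidden-layer activations
    net_k = O_j @ W_ho + b_o      # net_k = sum_j W_jk * O_j + theta_k
    O_k = sigmoid(net_k)          # network output
    return O_j, O_k

def mse(t, O_k):
    """E = 1/2 * sum_k (t_k - O_k)^2"""
    return 0.5 * np.sum((t - O_k) ** 2)
```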

A large error value obtained at the end of an iteration denotes a deviation of learning, that is, the desired output has not been achieved. To solve this problem, the derivative of the error function with respect to each weight is computed and back-propagated through the layers to compute the new weight value at each connection link. This algorithm is known as the delta rule, which employs the gradient descent method. The new weight is expected to be a correct weight that can produce the correct output. For the weight associated with each connection link between output node $k$ and hidden node $j$, the weight increment is computed using the following weight adjustment equation:

$$\Delta W_{kj}(t) = -\eta\, \frac{\partial E}{\partial W_{kj}} + \alpha\, \Delta W_{kj}(t-1)$$

where,

$\Delta W_{kj}(t)$ is the weight increment at the $t$-th iteration,

$\eta$ is the learning rate parameter,

$-\frac{\partial E}{\partial W_{kj}}$ is the negative derivative of the error function with respect to the weight,

$\alpha$ is the momentum parameter,

$\Delta W_{kj}(t-1)$ is the previous weight increment, at the $(t-1)$-th iteration.

By applying the chain rule, we can expand the negative derivative of the error function with respect to the weight as follows:

$$\frac{\partial E}{\partial W_{kj}} = \frac{\partial E}{\partial net_k} \cdot \frac{\partial net_k}{\partial W_{kj}}$$

Since $\partial net_k / \partial W_{kj} = O_j$, substituting this into the expression above gives

$$\frac{\partial E}{\partial W_{kj}} = \frac{\partial E}{\partial net_k} \cdot O_j$$

Denoting the error signal at the output layer by

$$\delta_k = \frac{\partial E}{\partial net_k}$$

the derivative becomes

$$\frac{\partial E}{\partial W_{kj}} = \delta_k \cdot O_j$$

The error signal can itself be expanded as

$$\delta_k = \frac{\partial E}{\partial net_k} = \frac{\partial E}{\partial O_k} \cdot \frac{\partial O_k}{\partial net_k}$$

For the sigmoid activation function,

$$\frac{\partial O_k}{\partial net_k} = O_k (1 - O_k)$$

so that

$$\frac{\partial E}{\partial net_k} = \frac{\partial E}{\partial O_k} \cdot O_k (1 - O_k)$$

From the Mean Square Error,

$$\frac{\partial E}{\partial O_k} = -(t_k - O_k)$$

and substituting this expression yields the error signal at the output layer:

$$\delta_k = \frac{\partial E}{\partial net_k} = -(t_k - O_k)\, O_k (1 - O_k)$$

Thus, by substituting this error signal into the weight adjustment equation, we obtain the weight adjustment for the weight associated with each connection link between output node $k$ and hidden node $j$, with the negative derivative of the error function written out explicitly:

$$\Delta W_{kj}(t) = \eta\, (t_k - O_k)\, O_k (1 - O_k)\, O_j + \alpha\, \Delta W_{kj}(t-1)$$

On the other side, the error signal is back-propagated to affect the weights between input layer $i$ and hidden layer $j$. The error signal at hidden node $j$ can be written as

$$\delta_j = \Big(\sum_k \delta_k W_{kj}\Big)\, O_j (1 - O_j)$$

Based on the weight adjustment equation and this error signal, the weight adjustment for the weights associated with each connection link between input node $i$ and hidden node $j$ is

$$\Delta W_{ji}(t) = \eta \Big(\sum_k \delta_k W_{kj}\Big)\, O_j (1 - O_j)\, O_i + \alpha\, \Delta W_{ji}(t-1)$$

where,

$\Delta W_{ji}(t)$ is the weight increment at the $t$-th iteration,

$O_i$ is the input at node $i$ in the input layer.

The values obtained from the two weight adjustment equations above are used to update the weights at each connection link. Let $t$ refer to the $t$-th iteration of training; the new weight at the $(t+1)$-th iteration for each connection link between output node $k$ and hidden node $j$ is calculated as follows:

$$W_{kj}(t+1) = \Delta W_{kj}(t) + W_{kj}(t)$$

where,

$W_{kj}(t+1)$ is the new value of the weight associated with each connection link between output node $k$ and hidden node $j$,

$W_{kj}(t)$ is the current value of that weight at the $t$-th iteration.

Meanwhile, the new weight at the $(t+1)$-th iteration for the weight associated with each connection link between hidden node $j$ and input node $i$ can be written as follows:

$$W_{ji}(t+1) = \Delta W_{ji}(t) + W_{ji}(t)$$

where,

$W_{ji}(t+1)$ is the new value of the weight associated with each connection link between hidden node $j$ and input node $i$,

$W_{ji}(t)$ is the current value of that weight.
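Putting the delta rule, the momentum term and the weight update together, the following sketch shows one training step for the single-hidden-layer network of the earlier forward-pass example (it reuses the forward helper defined there). It is an illustrative reading of the equations above; the learning rate, momentum value and the omission of bias updates are simplifications, not choices made in the chapter.

```python
import numpy as np

def backprop_step(x, t, W_ih, b_h, W_ho, b_o,
                  dW_ih_prev, dW_ho_prev, eta=0.1, alpha=0.9):
    """One two-term BP step: negative-gradient term plus momentum term."""
    O_j, O_k = forward(x, W_ih, b_h, W_ho, b_o)   # forward() from the earlier sketch

    # Error signal at the output layer: delta_k = -(t_k - O_k) * O_k * (1 - O_k)
    delta_k = -(t - O_k) * O_k * (1 - O_k)

    # Error signal back-propagated to the hidden layer
    delta_j = (delta_k @ W_ho.T) * O_j * (1 - O_j)

    # Weight increments: -eta * dE/dW + alpha * previous increment
    dW_ho = -eta * np.outer(O_j, delta_k) + alpha * dW_ho_prev
    dW_ih = -eta * np.outer(x, delta_j) + alpha * dW_ih_prev

    # Weight update: W(t+1) = Delta_W(t) + W(t); bias updates omitted for brevity
    return W_ih + dW_ih, W_ho + dW_ho, dW_ih, dW_ho
```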

The gradient of the error function is expected to move down the error surface and reach the point where the global minimum resides. Owing to the temporal behaviour of gradient descent and the shape of the error surface, the step taken to move down the surface may instead lead the training to diverge. Many causes exist, but one of them is overshooting the local minimum where the desired output lies. This may happen when the step taken by the gradient is large. A large step can make the network converge faster, but when it moves down a narrow and steep valley the algorithm may go in the wrong direction and bounce from one side across to the other. In contrast, a small step directs the algorithm in the correct direction but compromises the convergence rate: the learning time becomes longer, since more training iterations are needed to achieve the minimum error. Thus, the difficulty of this algorithm lies in controlling the step size and direction of the gradient along the error surface. For this reason a parameter called the learning rate is used in the weight adjustment computation. The choice of the learning rate value is application-dependent and in most cases is based on experiments. Once the correct learning rate is obtained, the gradient movement can produce the correct new weight value and hence the correct output.

Owing to the problem of oscillation in narrow valleys, another parameter is needed to keep the gradient moving in the correct direction so that the algorithm does not suffer from wide oscillation. This parameter is called the momentum. Momentum carries the impact of the previous weight change into the current weight change, with which the gradient can move up the valley walls and escape oscillation. The incorporation of these two parameters in the weight adjustment calculation has a great impact on the convergence of the algorithm and on the local minima problem, provided they are tuned to the correct values.

. Weight Sign and Adaptive Methods

The previous sections have discussed the role of the parameters in producing the weight increment through the weight adjustment equation. As discussed before, the learning rate and the momentum coefficient are the parameters most commonly used in two-term BP, and using a constant value for them is not always a good idea. In the case of the learning rate, setting a small value may decelerate convergence even though it guarantees that the gradient moves in the correct direction; on the contrary, setting a large value may speed up convergence but is prone to oscillation, which may lead to divergence. The momentum parameter, on the other hand, is introduced to stabilize the movement of gradient descent in steep valleys by overcoming the oscillation problem. In [ ] it is stated that assigning too small a value to the momentum factor may decelerate convergence and compromise the stability of the network, while too large a value gives excessive emphasis to the previous derivatives and weakens the gradient descent of BP. Hence, the author suggested the use of a dynamic adjustment method for the momentum. Like the momentum, the learning rate also needs to be adjusted at each iteration to avoid the problems produced by keeping a constant value throughout all iterations.

The adaptive parameters (learning rate and momentum) used in this study are implemented to assist the network in controlling the movement of gradient descent on the error surface, with the primary aim of attaining the correct value of the weight.

The correct weight increment is then used to update the weight to its new value. This method is implemented for the two-term BP algorithm with MSE. The adaptive method assists in generating the correct sign of the weight, which is the primary concern of this study.

The choice of the adaptive method follows from the learning characteristic of the algorithm used in this study, which is batch learning. In [ ] a brief definition of online learning and its difference from batch learning is given. The author defines online learning as a scheme that updates the weights after every input-output case, while batch learning accumulates the error signals over all input-output cases before updating the weights. In other words, online learning updates the weights after the presentation of each input and target pair. The batch learning method reflects the true gradient descent where, as stated in reference [ ], each weight update tries to minimize the error; the author also states that the summed gradient information over the whole pattern set provides reliable information about the shape of the whole error function.
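The distinction can be written down in a few lines. The sketch below is only an illustration of the two schemes described above; the gradient argument is a placeholder for any function returning dE/dW for one pattern, and eta is an arbitrary learning rate, neither of which comes from this chapter.

```python
import numpy as np

def online_epoch(weights, dataset, gradient, eta):
    """Online learning: update the weights after every input-output case."""
    for x, t in dataset:
        weights = weights - eta * gradient(weights, x, t)
    return weights

def batch_epoch(weights, dataset, gradient, eta):
    """Batch learning: accumulate the error signals (gradients) over all
    input-output cases, then update the weights once with the summed gradient."""
    total = np.zeros_like(weights)
    for x, t in dataset:
        total += gradient(weights, x, t)
    return weights - eta * total
```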

With the task of pointing out the temporal behaviour of the gradient of the error function and its relation to the change of the weight sign, the adaptive learning method used in this study is adopted from the paper in reference [ ], entitled "Back-Propagation Heuristics: A Study of the Extended Delta-Bar-Delta Algorithm". The author proposed an improvement of the DBD algorithm of reference [ ], called the Extended Delta-Bar-Delta (EDBD) algorithm, in which the momentum is also updated for each weight connection at each iteration. Since EDBD is an extension of DBD, it implements a similar notion: it exploits the sign information of past and current gradients, which becomes the condition for adapting the learning rate and the momentum. Moreover, the improved algorithm also provides a ceiling to prevent the learning rate and the momentum from becoming too large. The detailed equations of the method are described below.

$$\Delta w_{ji}(t) = -\eta_{ji}(t)\, \frac{\partial E(t)}{\partial w_{ji}} + \alpha_{ji}(t)\, \Delta w_{ji}(t-1)$$

where,

$\eta_{ji}(t)$ is the learning rate between input node $i$ and hidden node $j$ at the $t$-th iteration,

$\alpha_{ji}(t)$ is the momentum between input node $i$ and hidden node $j$ at the $t$-th iteration.

The updated values of the learning rate and the momentum can be written as follows:

$$\Delta \eta_{ji}(t) = \begin{cases} \kappa_l \exp\big(-\gamma_l\, |\bar{\delta}_{ji}(t)|\big) & \text{if } \bar{\delta}_{ji}(t-1)\, \delta_{ji}(t) > 0 \\ -\varphi_l\, \eta_{ji}(t) & \text{if } \bar{\delta}_{ji}(t-1)\, \delta_{ji}(t) < 0 \\ 0 & \text{otherwise} \end{cases}$$

$$\Delta \alpha_{ji}(t) = \begin{cases} \kappa_m \exp\big(-\gamma_m\, |\bar{\delta}_{ji}(t)|\big) & \text{if } \bar{\delta}_{ji}(t-1)\, \delta_{ji}(t) > 0 \\ -\varphi_m\, \alpha_{ji}(t) & \text{if } \bar{\delta}_{ji}(t-1)\, \delta_{ji}(t) < 0 \\ 0 & \text{otherwise} \end{cases}$$

$$\bar{\delta}_{ji}(t) = (1-\theta)\, \delta_{ji}(t) + \theta\, \bar{\delta}_{ji}(t-1)$$

$$\eta_{ji}(t+1) = \mathrm{MIN}\big(\eta_{max},\ \eta_{ji}(t) + \Delta \eta_{ji}(t)\big)$$

$$\alpha_{ji}(t+1) = \mathrm{MIN}\big(\alpha_{max},\ \alpha_{ji}(t) + \Delta \alpha_{ji}(t)\big)$$

where,

$\kappa_l$, $\gamma_l$, $\varphi_l$ are the parameters of the learning rate adaptation equation,

$\kappa_m$, $\gamma_m$, $\varphi_m$ are the parameters of the momentum adaptation equation,

$\theta$ is the weighting on the exponential average of the past derivatives,

$\bar{\delta}_{ji}$ is the exponentially decaying trace of gradient values,

$\delta_{ji}$ is the gradient value between input node $i$ and hidden node $j$ at the $t$-th iteration,

$\alpha_{max}$ is the maximum value of the momentum,

$\eta_{max}$ is the maximum value of the learning rate.

The algorithm calculates the exponential average of past derivatives to obtain information about the recent history of the direction in which the error has been decreasing up to iteration $t$. This information, together with the current gradient, is used to adjust the parameter values based on their signs. When the current and past derivatives have the same sign, the gradient is moving in the same direction; one can assume that in this situation the gradient is moving across a flat area and that the minimum lies ahead. In contrast, when the current and past derivatives have opposite signs, the gradient is moving in a different direction; one can assume that it has jumped over the minimum, and the weight needs to be decreased to correct this.

The increment of the learning rate is made proportional to the exponentially decaying trace, so that the learning rate increases significantly in flat regions and decreases on steep slopes. To prevent unbounded growth of the parameters, maximum values for the learning rate and the momentum are set to act as a ceiling on both.

Owing to the sound idea behind the algorithm and its performance, as has been shown in reference [ ], this method is proposed to assist the network in producing proper weight sign changes and thereby achieve the purpose of this study.
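A compact reading of the EDBD rules above is sketched below for one weight matrix. The constants are illustrative placeholders rather than values used in this chapter, and the gradient is assumed to be supplied by the back-propagation step.

```python
import numpy as np

def edbd_adapt(lr, mom, grad, trace,
               k_l=0.01, g_l=0.1, p_l=0.1,     # learning-rate constants (assumed)
               k_m=0.01, g_m=0.1, p_m=0.1,     # momentum constants (assumed)
               theta=0.7, lr_max=1.0, mom_max=0.9):
    """One EDBD adaptation step for per-weight learning rates and momenta."""
    agree = trace * grad                           # sign of delta_bar(t-1) * delta(t)
    trace = (1 - theta) * grad + theta * trace     # delta_bar(t): decaying trace

    d_lr = np.where(agree > 0, k_l * np.exp(-g_l * np.abs(trace)), 0.0)
    d_lr = np.where(agree < 0, -p_l * lr, d_lr)

    d_mom = np.where(agree > 0, k_m * np.exp(-g_m * np.abs(trace)), 0.0)
    d_mom = np.where(agree < 0, -p_m * mom, d_mom)

    lr = np.minimum(lr_max, lr + d_lr)             # ceiling on the learning rate
    mom = np.minimum(mom_max, mom + d_mom)         # ceiling on the momentum
    return lr, mom, trace
```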

. Weights Performance in Terms of Sign Change and Gradient Descent

Mathematically, the adaptive method and the BP algorithm, specifically the weight adjustment method described in the equations above and used in this study, assist the algorithm in producing the proper weights. Basically, the adaptive methods are a translation of the author's optimization idea into a mathematical concept. The success of the method, and of the algorithm as a whole, can be measured in many ways: by analysing the convergence rate, the accuracy of the result, the error value produced, the change of the weight sign as a response to the temporal behaviour of the gradient, and so on. Hence, the role of the adaptive method, constructed mathematically to improve the weight adjustment computation and yield the proper weights, is implemented and examined in this study to check the efficiency and the learning behaviour of both algorithms. The efficiency can be drawn from the measurement criteria described earlier.

As implicitly depicted by the algorithm, the process of generating the proper weight stems from calculating its update value. This process is affected by various variables, from the parameters to controlled variables such as the gradient, the previous weight increment and the error. They all play a large part in determining the sign of the weight produced, especially the partial derivative of the error function with respect to the weight (the gradient). Reference [ ] briefly describes the relationship between error curvature, gradient and weight. The author mentions that when the error curve enters a flat region, the changes of the derivative and of the error curve are small and, as a result, the change of the weight is not optimal. Moreover, when it enters a high-curvature region, the change of the derivative is large, especially if the minimum point exists in that region, and the weight adjustment is large, sometimes overshooting the minimum. This problem can be alleviated by adjusting the step size of the gradient, which can be done by adjusting the learning rate. The momentum coefficient can be used to control the oscillation problem, and its implementation along with the proportional factor can speed up convergence. In addition, [ ] also gives the proper relation between the rate of weight change and the temporal behaviour of the gradient: if the derivative has the same sign as the previous one, the weighted sum is increased, which makes the weight increment larger and increases the rate of weight change; on the contrary, if the derivative has the opposite sign to the previous one, the weighted sum is decreased to stabilize the network. [ ] also emphasizes the causes of slow convergence, which involve the magnitude and the direction components of the gradient vector. The author states that when the error surface is fairly flat along a weight dimension, the derivative of the weight is small in magnitude, yielding a small weight adjustment, and many steps are required to reduce the error; when the error surface is highly curved along a weight dimension, the derivative is large in magnitude, yielding a large weight change which may overshoot the minimum. The author also briefly discusses the performance of BP with momentum: when the consecutive derivatives of a weight have the same sign, the exponentially weighted sum grows large in magnitude and the weight is adjusted by a large value.

On the contrary, when the signs of the consecutive derivatives of a weight are opposite, the weighted sum is small in magnitude and the weight is adjusted by a small amount. Moreover, the author raises the implementation of local adaptive methods such as Delta-Bar-Delta, originally proposed in reference [ ]. From the description given in [ ], the learning behaviour of the network can be summarized as follows: consecutive weight increments with opposite signs indicate oscillation of the weight value, which requires the learning rate to be reduced; similarly, consecutive weight increments with the same sign call for an increase of the learning rate. This information is used in studying the weight sign changes of both algorithms.

. Experimental Result and Analysis

This section discusses the results of the experiment and their analysis. The discussion covers the experiment in detail, from its initial phase until the analysis is summarized.

Learning takes place only when a difference between the calculated output and the desired output exists; otherwise there is no point in learning. When the difference does exist, the error signal is propagated back into the network to be minimized. The network then adjusts itself to compensate for the loss during training and so learn better. This procedure is carried out by calculating the gradient of the error, mainly in order to adjust the value of the weights to be used in the next feed-forward pass.

Looking at the feed-forward computation in the equations above, the next training sample is fed into the network and multiplied by the new weight values. The same feed-forward and backward procedures are performed until the minimum error is achieved. As a result, training may take fewer or more iterations depending on the proper adjustment of the weights. There is no doubt that the weights play an important role in training: they determine the strength of the incoming input signal in the learning process. The weighted signal is accumulated and forwarded to the next adjacent upper layer.

Based on the influence of the weight, this signal can have a larger or smaller effect on learning. When the weight is negative, the connection inhibits the input signal, so it does not bring a significant influence to the learning and the output; as a result, the other nodes with positive weights dominate the learning and the output. On the other hand, when the weight is positive, the connection excites the input signal, bringing a significant impact to the learning and the output, and the respective node contributes to them. If the assignment results in a large error, the corresponding weight needs to be adjusted to reduce the error; the adjustment comprises both magnitude and sign changes. By properly adjusting the weights, the algorithm can converge to the solution faster.

To get a clear idea of the impact of the weight sign on learning, the following assumption is used. Let all weights be negative; using the equations written below, we obtain a negative value of net:

$$net = \sum_i W_i O_i + \theta$$

$$O = \frac{1}{1 + e^{-net}}$$

Feeding this net into the sigmoid activation function, we obtain a value of $O$ close to 0. On the other hand, let all weights be positive; using the same equations, we obtain a positive value of net, and feeding it into the sigmoid activation function we obtain a value of $O$ close to 1. From this assumption we can infer that by adjusting the weights with the proper value and sign, the network can learn better and faster. Mathematically, the adjustment of a weight is carried out using the weight update method and the current weight value, as written below:

$$\Delta W(t) = -\eta(t)\, \nabla E\big(W(t)\big) + \alpha(t)\, \Delta W(t-1)$$

$$W(t+1) = \Delta W(t) + W(t)$$
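A short numerical check of this saturation argument, with input and weight values made up purely for illustration:

```python
import numpy as np

O_i = np.array([0.8, 0.6, 0.9])              # arbitrary positive inputs
for sign in (-1.0, +1.0):
    W = sign * np.array([2.0, 1.5, 2.5])     # all-negative vs all-positive weights
    net = W @ O_i                            # bias omitted for simplicity
    O = 1.0 / (1.0 + np.exp(-net))
    print(sign, round(float(O), 4))          # about 0.009 (near 0) and 0.991 (near 1)
```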

It can be seen from the two equations above that various factors influence the calculation of the weight update value. The most notable factor is the gradient: the gradient with respect to each weight in the network is calculated, and the weight update value is computed to update the corresponding weight. The negative sign is assigned to the gradient to force it to move downhill along the error surface in the weight space. This is meant to find the minimum point of the error surface where the set of optimal weights resides; with this optimal value, the goal of learning can be achieved. Another factor is the previous weight update value.

According to reference [ ], the role of the previous weight update value is to carry a fraction of the previous update into the current weight change in order to smooth out the new weight value.

To have a significant impact on the weight update value, the magnitudes of the gradient and of the previous weight update are controlled by the learning parameters, namely the learning rate and the momentum. As in two-term BP, the values of the learning rate and momentum are acquired through experiments. Correct tuning of the parameter values helps to obtain the correct value and sign of the weight update, which then affects the calculation of the new weight through the weight adjustment method.

In experiments, a few authors have observed that the temporal behaviour of the gradient contributes to the learning. This temporal behaviour can be seen as the changing of the gradient's sign during its movement on the error surface. Owing to the curvature of the error surface, the gradient behaves differently under certain conditions. In [ ] it is stated that when the gradient keeps the same sign over several consecutive iterations, the minimum lies ahead, but when the gradient changes its sign over several consecutive iterations, the minimum has been passed. Using this information, we can improve the way the weight is updated.

To sum up, based on the heuristic that has been discussed, the value of a weight should be increased when several consecutive gradient signs remain the same and decreased when the opposite condition occurs. However, that is not the sole determinant of the sign: other factors such as the learning rate, the gradient, the momentum, the previous weight value and the current weight value also play a large role in the change of the weight sign and magnitude. The heuristic given above is merely an indicator of the gradient movement on the surface and of the surface itself, through which we gain a better understanding of the gradient and the error surface and can improve the gradient movement in the weight space.

Through a thorough study of the weight sign change process in both algorithms, it can be concluded that the information about the signs of past and current gradients is very helpful in guiding improvements to the training performance, which in this case refers to the movement of the gradient on the error surface. However, factors like the gradient, learning rate, momentum, previous weight update value and current weight value have a greater influence on the sign and magnitude of the new weight value. Besides that, the assignment of the initial weight value also needs to be addressed: a negative initial weight may remain negative over consecutive iterations when the gradient keeps a negative sign, and as a result the node will not contribute much to the learning and the output. This can be observed from the weight update equation as follows.

$$\Delta W(t) = -\eta(t)\, \nabla E\big(W(t)\big) + \alpha(t)\, \Delta W(t-1)$$

At the initial training iterations, the error tends to be large and so does the gradient. This large gradient value, together with its sign, dominates the weight update calculation and therefore brings a large change to the sign and magnitude of the weight. The influence of the gradient through the weight update value can be seen from the equation above and from the weight adjustment calculation below.

$$W(t+1) = \Delta W(t) + W(t)$$

If the initial weight value is smaller than the gradient, the new weight update value is more strongly affected by the gradient. As a result, the magnitude and sign of the weight change according to the contribution of the gradient, since at this initial iteration the previous weight update value is set to 0, leaving the gradient to dominate the computation of the weight update value. The case discussed before can be viewed in this way: assume that the random initialization assigns a negative value to one of the weight connections at the hidden layer. After performing the feed-forward pass, the difference between the output at the output layer and the desired output is large. The minimization is then performed by calculating the gradient with respect to the weights at the hidden layer. Since the error is large, the computed gradient is also large. This can be seen in the equations below.

$$\delta_k = -(t_k - O_k)\, O_k (1 - O_k)$$

$$\nabla E\big(W(t)\big) = \Big(\sum_k \delta_k W_{kj}\Big)\, O_j (1 - O_j)$$

Assume that, from the equation above, the gradient at the hidden layer has a large positive value. Since this is the initial state, the previous weight update value is set to 0 and contributes nothing to the weight update computation. As a result, the weight update value is largely influenced by the magnitude and sign of the gradient (including the contribution of the learning rate in adjusting the step size of the gradient). By performing the weight adjustment method, the weight update value, which is mostly determined by the gradient, dominates the change of the weight magnitude and sign. The same reasoning also applies to weight adjustments in the middle of training, where the gradient and error are still large and the previous weight update value has some non-zero amount.

We can see that a large error value leads to a relatively large gradient value. This value is used to correct the weights in the network by propagating the error signal back through the network; as a result, the value and the sign of the weights are adjusted to compensate for the error. Another noticeable conclusion from the experiment is that when the gradient retains the same sign over consecutive iterations, the weight value increases over those iterations, whereas when the gradient changes its sign over several consecutive iterations, the weight value decreases. However, the change of the weight sign and magnitude is still affected by the parameters and factors included in the weight update and weight adjustment equations explained before.

The following examples show the change of weights affected by the sign of the gradient at consecutive iterations and by its value. The change of the weights is represented by Hinton diagrams. The following Hinton diagram represents the weights of the standard BP with adaptive learning network on the Balloon dataset.

The Hinton diagram in Figure 3 illustrates the signs and magnitudes of all weight connections between the input and hidden layers, as well as between the hidden and output layers, at the first iteration. A light colour indicates a positive weight and a dark colour a negative weight; the size of a rectangle indicates the magnitude of the weight. The fifth rectangles in Figure 3 are the bias connections from the input layer to the hidden layer; however, the bias connection to the first hidden node carries a small value, so its representation is not clearly visible in the diagram, and its sign is negative. The biases have the value of . The resulting error at the first iteration is still large, which is . .

The error decreases gradually from the first iteration to the fifth iteration. The changes in the gradient are shown in the table below.

Figure 3. Hinton diagram of all weight connections between the input layer and hidden layer at the first iteration on the Balloon dataset.

              Input node 1   Input node 2   Input node 3   Input node 4   Bias

Iteration 1   Hidden node 1    0.004    0.0029   0.0019   0.004    0.0029
              Hidden node 2   -0.007   -0.005   -0.0033  -0.0068  -0.005
Iteration 2   Hidden node 1    0.0038   0.0027   0.0017   0.0039   0.0025
              Hidden node 2   -0.0063  -0.0044  -0.0028  -0.0062  -0.0042
Iteration 3   Hidden node 1    0.0037   0.0025   0.0015   0.0039   0.0022
              Hidden node 2   -0.0053  -0.0037  -0.0022  -0.0055  -0.0032
Iteration 4   Hidden node 1    0.0035   0.0023   0.0012   0.0038   0.0016
              Hidden node 2   -0.0043  -0.0029  -0.0016  -0.0046  -0.0021
Iteration 5   Hidden node 1    0.0031   0.0019   0.0009   0.0036   0.0009
              Hidden node 2   -0.0032  -0.0021  -0.0009  -0.0037  -0.001

Table 1. The gradient values between the input layer and hidden layer at iterations 1 to 5.

              Hidden node 1   Hidden node 2   Bias

Iteration 1   Output node   0.0185   0.0171   0.0321
Iteration 2   Output node   0.0165   0.0147   0.0279
Iteration 3   Output node   0.0135   0.0117   0.0222
Iteration 4   Output node   0.0097   0.008    0.0151
Iteration 5   Output node   0.0058   0.0042   0.0078

Table 2. The gradient values between the hidden layer and output layer at iterations 1 to 5.

Figure 4. Hinton diagram of weight connections between the input layer and hidden layer at the fifth iteration.

From the tables above we can infer that the gradients in the hidden layer move in different directions, while those in the output layer move in the same direction. Based on the heuristic, when a gradient moves in the same direction over these consecutive iterations, the value of the corresponding weight needs to be increased. However, it still depends on the factors that have been mentioned before. The impact on the weights is shown in the diagram below.

Figure 5. Hinton diagram of weight connections between the input layer and hidden layer at the 12th iteration.

Comparing the weight values at the fifth iteration with those at the first iteration, we can infer that most of the weight magnitudes on the connections between the input and hidden layers become greater at the fifth iteration. This shows that the weight values increase over iterations when the sign of the gradient stays the same over several iterations. However, it is noticeable that the sign of the weight between the input node and the first hidden node, as well as the bias from the input layer to the first hidden node, changes. This is due to the influence of the product of a large positive gradient and the learning rate, which dominates the weight update calculation and hence increases its magnitude and switches the weight direction. As a result, the error decreases to . from . .

At iterations 12 to 16 the error gradient moves slowly along the shallow slope in the same direction and brings smaller changes in the gradient, the weights and, of course, the error itself. The change in the gradient is shown in the table below.

It can be seen from the sign of the gradient that it differs from the one at iterations 1 to 5, which means that the gradient at the bias connection moves in a different direction. The same happens with the gradient from the third input node to all hidden nodes, where the sign changes from the one at the fifth iteration. Another change occurs in the gradient from the second input node to the first hidden node at iteration 16.

               Input node 1   Input node 2   Input node 3   Input node 4   Bias

Iteration 12   Hidden node 1    0.0017   0.0007  -0.0003   0.0026  -0.0013
               Hidden node 2   -0.0017  -0.001    0.0001  -0.0025   0.0006
Iteration 13   Hidden node 1    0.0014   0.0005  -0.0006   0.0024  -0.0017
               Hidden node 2   -0.0016  -0.001    0.0003  -0.0025   0.0009
Iteration 14   Hidden node 1    0.0011   0.0003  -0.0009   0.0022  -0.0021
               Hidden node 2   -0.0015  -0.0009   0.0005  -0.0025   0.0013
Iteration 15   Hidden node 1    0.0008   0.0001  -0.0012   0.0021  -0.0028
               Hidden node 2   -0.0012  -0.0008   0.0009  -0.0025   0.0018
Iteration 16   Hidden node 1    0.0005  -0.0001  -0.0016   0.0019  -0.0035
               Hidden node 2   -0.001   -0.0007   0.0013  -0.0025   0.0025

Table 3. The gradient values between the input and hidden layers at iterations 12 to 16.

Owing to the change of the gradient sign that has been discussed above, the change in the weights at this iteration is most clearly seen in their magnitude. The weights are larger in magnitude compared with those at the fifth iteration, since some of the gradients have kept moving in the same direction. The bias connection to the first hidden node is not clearly visible since its value is very small; however, its value is negative. For some of the weights, the changes in the sign of the gradient have little effect on the new weight value since the gradient value is very small, so the sign of those weights remains the same. Moreover, although the gradient sign between the third input node and the first hidden node changes, the weight sign remains the same, since the positive weight update at the previous iteration and the changes of the gradient are very small; the impact on the weight change is therefore small, even though the weight update value is negative, that is, decreasing. At this iteration, the error decreases to . .

Besides the magnitude change, the most obvious change is seen in the sign of the weight from the first input node to the second hidden node: it was positive at the previous iteration, but now it turns negative.

Figure 6. Hinton diagram of weight connections between the input layer and hidden layer at the 14th iteration.

Figure 7. Hinton diagram of weight connections between the input layer and hidden layer at the 1st iteration.

From the experiment, the magnitude of this weight decreases gradually in small amounts over several iterations. This is due to the negative value of the gradient from the first iteration. Since the initial weight value is quite large, the decrement does not at first have a significant effect on the sign. However, as the weight value becomes smaller after several iterations, the impact of the negative gradient shows in the change of the weight sign. The error at this iteration decreases to . .

The next example is the Hinton diagram representing the weights of standard BP with fixed parameters on the Iris dataset.

Figure 8. Hinton diagram of weight connections between the input layer and hidden layer at the 11th iteration.

              Input Node 1    Input Node 2    Input Node 3    Input Node 4    Bias

Iteration 1   Hidden Node 1   -0.018303748   -0.011141395   -0.008590033   -0.002383615   -0.003356197
              Hidden Node 2    0.001416033    0.000224827    0.002969737    0.001394722    2.13E-05
              Hidden Node 3   -0.001842087   -0.001514775    0.000163215    0.000251155   -0.000430432
Iteration 2   Hidden Node 1   -0.01660207    -0.010163722   -0.007711067   -0.002135194   -0.00304905
              Hidden Node 2    0.001824512    0.000525327    0.003040339    0.001392819    0.000106987
              Hidden Node 3   -0.001355907   -0.001195881    0.000330876    0.000285222   -0.000335039
Iteration 3   Hidden Node 1   -0.014965571   -0.009227058   -0.006852751   -0.001889934   -0.002755439
              Hidden Node 2    0.002207694    0.000812789    0.003095022    0.001386109    0.000188403
              Hidden Node 3   -0.000932684   -0.000914026    0.000466302    0.000309974   -0.000251081
Iteration 4   Hidden Node 1   -0.013416363   -0.008340029   -0.006036264   -0.00165541    -0.002478258
              Hidden Node 2    0.002525449    0.001063613    0.003111312    0.001367368    0.000258514
              Hidden Node 3   -0.000557761   -0.000660081    0.000576354    0.000327393   -0.000175859

Table 4. The gradient values between the input and hidden layers at iterations 1-4.

               Input Node 1    Input Node 2    Input Node 3    Input Node 4    Bias

Iteration 11   Hidden Node 1   -0.006205968   -0.004164878   -0.002294527    0.000583307   -0.001184916
               Hidden Node 2    0.003261605    0.001956214    0.00242412     0.000995646    0.000485827
               Hidden Node 3    0.000946822    0.00043872     0.00083246     0.000312944    0.000141871
Iteration 12   Hidden Node 1   -0.00559575    -0.003804835   -0.001989824   -0.000497406   -0.001074373
               Hidden Node 2    0.003205306    0.001986585    0.002247662    0.000920457    0.000488935
               Hidden Node 3    0.001041566    0.000519475    0.000821693    0.000299802    0.000164175
Iteration 13   Hidden Node 1   -0.005055156   -0.003484476   -0.001722643   -0.000422466   -0.000976178
               Hidden Node 2    0.003122239    0.001999895    0.002060602    0.00084271     0.000486953
               Hidden Node 3    0.001115805    0.000586532    0.000804462    0.000285504    0.000182402
Iteration 14   Hidden Node 1   -0.004575005   -0.003198699   -0.001487817   -0.000356958   -0.000888723
               Hidden Node 2    0.003016769    0.001998522    0.001865671    0.000763285    0.000480617
               Hidden Node 3    0.001172258    0.000641498    0.000782111    0.000270422    0.000197051

Table 5. The gradient values between the input and hidden layers at iterations 11-14.

At iteration 11, the most obvious change in sign is in the weight between the second input node and the first hidden node. From the gradient tables we can see that the gradient of this connection moves in the same direction throughout iterations 1-4 and 11-14. However, due to the negative value of the gradient, the weight update value carries a negative sign, which causes the weight value to decrease until its sign becomes negative. At this iteration, the error value decreases.

Figure 9. Hinton diagram of weight connections between the input layer and hidden layer at the 1st iteration.

The next example is the Hinton diagram representing the weights of standard BP with an adaptive learning parameter on the Iris dataset.

Figure 10. Hinton diagram of weight connections between the input layer and hidden layer at the 11th iteration.

              Input Node 1    Input Node 2    Input Node 3    Input Node 4    Bias

Iteration 1   Hidden Node 1   -0.002028133    0.001177827   -0.006110183   -0.002515178    6.16E-05
              Hidden Node 2    0.00327263     0.002733914   -0.000166769   -0.000344516    0.000749096
              Hidden Node 3   -0.004784699   -0.003919529   -0.000360459    0.000140134   -0.001022238
Iteration 2   Hidden Node 1   -0.002754283    0.000759663   -0.006493288   -0.00262835    -7.00E-05
              Hidden Node 2    0.003322116    0.002868889   -0.00034058    -0.000417257    0.000772938
              Hidden Node 3   -0.003433234   -0.003331085    0.000673172    0.000468549   -0.000802899
Iteration 3   Hidden Node 1   -0.002109624    0.001068822   -0.006027406   -0.002476693    3.47E-05
              Hidden Node 2    0.003299403    0.002898428   -0.00043867    -0.000457266    0.000775424
              Hidden Node 3   -0.002239138   -0.002627843    0.001246792    0.000621859   -0.000583171
Iteration 4   Hidden Node 1   -0.001150779    0.001525718   -0.005325608   -0.002246529    0.000189476
              Hidden Node 2    0.003268854    0.002945653   -0.000588941   -0.000519753    0.000780289
              Hidden Node 3   -0.000746758   -0.001700258    0.001868946    0.000774123   -0.000301095

Table 6. The gradient values between the input and hidden layers at iterations 1-4.

               Input Node 1    Input Node 2    Input Node 3    Input Node 4    Bias

Iteration 11   Hidden Node 1    0.001907574    0.002986261   -0.002973391   -0.001440144    0.00065675
               Hidden Node 2    0.000812331    0.002899602   -0.004647719   -0.002073105    0.000546702
               Hidden Node 3   -0.002190841   -0.000838811   -0.002332771   -0.000908119   -0.000280471
Iteration 12   Hidden Node 1    0.003681706    0.004088069   -0.002210031   -0.001245223    0.000986844
               Hidden Node 2    0.002483971    0.003929436   -0.003913207   -0.001884712    0.000855605
               Hidden Node 3   -0.002708979   -0.001126258   -0.002636196   -0.001002275   -0.000370089
Iteration 13   Hidden Node 1    0.00439905     0.004702491   -0.002255431   -0.001321819    0.001149609
               Hidden Node 2    0.002580744    0.004096122   -0.004087027   -0.00196994     0.000891341
               Hidden Node 3   -0.003069687   -0.001361618   -0.002782251   -0.001041828   -0.00043751
Iteration 14   Hidden Node 1    0.001502568    0.003228589   -0.004165255   -0.00193001     0.000663654
               Hidden Node 2   -0.000869554    0.002112189   -0.005859167   -0.002472493    0.00027103
               Hidden Node 3   -0.003177038   -0.001479172   -0.002727358   -0.001012298   -0.000466909

Table 7. The gradient values between the input and hidden layers at iterations 11-14.

At this later iteration, all weights change their magnitude and some have different signs from before. The weight from the second input node to the first and second hidden nodes changes its sign to positive because of the positive increment value, since the gradient moves along the same direction over time. The positive increments gradually change the magnitude and sign of the weight from negative to positive.

. Conclusions

This study was performed through experimental results obtained by constructing programs for both algorithms and applying them to various datasets. The datasets comprise small and medium datasets, each broken down into two subsets, training and testing, with ratio percentages of % and % respectively. The results from both algorithms were examined and studied based on accuracy, convergence time and error. In addition, this study also examines the weight sign change with respect to the temporal behaviour of the gradient, both to study the learning behaviour of the network and to measure the performance of the algorithms. Two-term BP with an adaptive algorithm works better in producing the proper change of weights, so that the time needed to converge is shorter compared with two-term BP without an adaptive learning method; this can be seen from the convergence rate of the network. Moreover, the study of the weight sign changes of both algorithms shows that the gradient sign and magnitude, together with the error, have the greater influence on the weight adjustment process.

Acknowledgements

The authors would like to thank Universiti Teknologi Malaysia (UTM) for the support in Research and Development, and the Soft Computing Research Group (SCRG) for the inspiration in making this study a success. This work is supported by The Ministry of Higher Education (MOHE) under the Long Term Research Grant Scheme (LRGS/TD/ /UTM/ICT/ - VOT L ).

Author details

Siti Mariyam Shamsuddin*, Ashraf Osman Ibrahim and Citra Ramadhena

*Address all correspondence to: mariyam@utm.my

Soft Computing Research Group, Faculty of Computer Science and Information Systems, Universiti Teknologi Malaysia, Malaysia

References

[ ] Ng, S., Leung, S., & Luk, A. Fast convergent generalized back-propagation algorithm with constant learning rate. Neural Processing Letters.

[ ] Zweiri, Y. H., Whidborne, J. F., & Seneviratne, L. D. A three-term backpropagation algorithm. Neurocomputing.

[ ] Yu, C. C., & Liu, B. D. A backpropagation algorithm with adaptive learning rate and momentum coefficient. In: International Joint Conference on Neural Networks (IJCNN), Honolulu, HI, United States: Institute of Electrical and Electronics Engineers Inc.

[ ] Dhar, V. K., et al. Comparative performance of some popular artificial neural network algorithms on benchmark and function approximation problems. Pramana - Journal of Physics.

[ ] Hongmei, S., & Gaofeng, Z. A new BP algorithm with adaptive momentum for FNNs training. In: WRI Global Congress on Intelligent Systems (GCIS), Xiamen, China: IEEE Computer Society.

[ ] Jacobs, R. A. Increased rates of convergence through learning rate adaptation. Neural Networks.

[ ] Minai, A. A., & Williams, R. D. Back-propagation heuristics: a study of the extended Delta-Bar-Delta algorithm. In: International Joint Conference on Neural Networks (IJCNN), San Diego, CA, USA: IEEE.

[ ] Jin, B., et al. The application on the forecast of plant disease based on an improved BP neural network. In: International Conference on Material Science and Information Technology (MSIT), Singapore: Trans Tech Publications.

[ ] Duffner, S., & Garcia, C. An online backpropagation algorithm with validation error-based adaptive learning rate. Artificial Neural Networks - ICANN.

[ ] Shamsuddin, S. M., Sulaiman, M. N., & Darus, M. An improved error signal for the backpropagation model for classification problems. International Journal of Computer Mathematics.

[ ] Iranmanesh, S., & Mahdavi, M. A. A differential adaptive learning rate method for back-propagation neural networks. World Academy of Science, Engineering and Technology.

[ ] Li, Y., et al. The improved training algorithm of back propagation neural network with self-adaptive learning rate. In: International Conference on Computational Intelligence and Natural Computing (CINC), Wuhan, China: IEEE Computer Society.

[ ] Yamamoto, K., et al. Fast backpropagation learning using optimization of learning rate for pulsed neural networks. Electronics and Communications in Japan.

[ ] Li, C. H., & Huang, J. X. Spam filtering using semantic similarity approach and adaptive BPNN. Neurocomputing.

[ ] Xiaoyuan, L., Bin, Q., & Lu, W. A new improved BP neural network algorithm. In: 2nd International Conference on Intelligent Computing Technology and Automation (ICICTA), Changsha, Hunan, China: IEEE Computer Society.

[ ] Yang, H., Mathew, J., & Ma, L. Basis pursuit-based intelligent diagnosis of bearing faults. Journal of Quality in Maintenance Engineering.

[ ] Fukuoka, Y., et al. A modified back-propagation method to avoid false local minima. Neural Networks.

[ ] Riedmiller, M. Advanced supervised learning in multi-layer perceptrons - from backpropagation to adaptive learning algorithms. Computer Standards and Interfaces.

[ ] Sidani, A., & Sidani, T. A comprehensive study of the back propagation algorithm and modifications. In: Proceedings of the Southcon Conference, Orlando, FL, USA: IEEE.

[ ] Samarasinghe, S. Neural Networks for Applied Sciences and Engineering: From Fundamentals to Complex Pattern Recognition. Auerbach Publications.
Chapter 4

Robust Design of Artificial Neural Networks


Methodology in Neutron Spectrometry

José Man”el Or“iz-Rodríg”ez,


Ma. del Rosario Mar“ínez-Blanco,
José Man”el Cervan“es Viramon“es and
Héc“or René Vega-Carrillo

Addi“ional informa“ion is available a“ “he end of “he chap“er

h““p://dx.doi.org/10.5772/51274

1. Introduction
Applications of artificial neural networks (ANNs) have been reported in literature in various
areas. [1–5] The wide use of ANNs is due to their robustness, fault tolerance and ability to learn and generalize, through a training process and from examples, complex nonlinear, multi-input/output relationships between process parameters. [6–10]
The ANNs have many other advantageous characteristics, which include: generalization,
adaptation, universal function approximation, parallel data processing, robustness, etc.
The multilayer perceptron (MLP) trained with the backpropagation (BP) algorithm is the ANN most used in modeling, optimization, classification and prediction processes. [11, 12] Although the BP algorithm has proved to be efficient, its convergence tends to be very slow, and there is a possibility of getting trapped in some undesired local minimum. [4, 10, 11, 13]
Most literature related to ANNs focuses on specific applications and their results rather than on the methodology of developing and training the networks. In general, the quality of the developed ANN depends not only on the training algorithm and its parameters but also on many architectural parameters, such as the number of hidden layers and nodes per layer, which have to be set during the training process; these settings are crucial to the accuracy of the ANN model. [8, 14–19]
Above all, there is limited theoretical and practical background to assist in the systematic selection of ANN parameters throughout the entire development and training process. Because of this, the ANN parameters are usually set by previous experience in a trial-and-error procedure, which is very time consuming. In such a way, the optimal settings of the ANN parameters for achieving the best ANN quality are not guaranteed.

© 2013 Man”el Or“iz-Rodríg”ez e“ al.; licensee InTech. This is an open access ar“icle dis“rib”“ed ”nder “he
“erms of “he Crea“ive Commons A““rib”“ion License (h““p://crea“ivecommons.org/licenses/by/3.0), which
permi“s ”nres“ric“ed ”se, dis“rib”“ion, and reprod”c“ion in any medi”m, provided “he original work is
properly ci“ed.

The robust design methodology, proposed by Taguchi, is one of the appropriate methods
for achieving this goal. [16, 20, 21] Robust design is a statistical technique widely used to
study the relationship between factors affecting the outputs of the process. It can be used
to systematically identify the optimum setting of factors to obtain the desired output. In
this work, it was used to find the optimum settings of the ANN parameters in order to achieve a minimum-error network.

1.1. Artificial Neural Networks


The first works on neurology were carried out by Santiago Ramón y Cajal (1852-1934) and Charles Scott Sherrington (1857-1952). From their studies it is known that the basic element of the nervous system is the neuron. [2, 10, 13, 22] The model of an artificial neuron is an imitation of a biological neuron. Thus, ANNs try to emulate the processes carried out by biological neural networks, aiming to build systems capable of learning from experience, recognizing patterns and making predictions. ANNs are based on a dense interconnection of small processors called nodes, neurodes, cells, units, processing elements or neurons.
A simplified morphology of an individual biological neuron is shown in figure 1, where three fundamental parts can be distinguished: the soma or cell body, the dendrites and the axon (cylinder-axis).

Figure 1. Simplified morphology of an individual biological neuron

Dendrites are fibers which receive the electric signals coming from other neurons and transmit them to the soma. The multiple signals coming from the dendrites are processed by the soma and transmitted to the axon. The axon is a fiber of great length, compared with the rest of the neuron, connected to the soma at one end and divided at the other end into a series of nerve ramifications; the axon picks up the signal from the soma and transmits it to other neurons through a process known as synapsis.
An artificial neuron is a mathematical abstraction of the working of a biological neuron. [23] Figure 2 shows an artificial neuron. From a detailed observation of the biological process, the following analogies with the artificial system can be mentioned:

• The input Xi represents the signals that come from other neurons and are captured by the dendrites.
• The weights Wi are the intensity of the synapses that connect two neurons; Xi and Wi are real values.
• θ is the threshold function that the neuron must exceed to become active; this process happens biologically in the body of the cell.
• The input signals to the artificial neuron X1 , X2 , ..., Xn are continuous variables instead of the discrete pulses present in a biological neuron. Each input signal passes through a gain or weight, called the synaptic weight or strength of the connection, whose function is similar to the synaptic function of the biological neuron.
• Weights can be positive (excitatory) or negative (inhibitory); the summing node accumulates all the input signals multiplied by the weights and passes them to the output through a threshold or transfer function.

Figure 2. Artificial neuron model

An idea of this process is shown in figure 3, where a group of inputs entering an artificial neuron can be observed.

Figure 3. Analogies of an artificial neuron with a biological model

The input signals are weighted by multiplying them by the corresponding weight, which corresponds in the biological version of the neuron to the strength of the synaptic connection; the weighted signals arrive at the neuronal node, which acts as a summer of the signals. The output of the node is called the net output and is calculated as the sum of the weighted inputs plus a value b called gain (bias). The net output is used as the input to the transfer function, which provides the total output or response of the artificial neuron.
The representation of figure 3 can be simplified as shown in figure 4. From this figure, the net output n of the neuron can be mathematically represented as follows:

$$ n = \sum_{i=1}^{r} p_i w_i + b \qquad (1) $$

Figure 4. Didactic model of an artificial neuron

The neuronal response a to the input signals can be represented as:

$$ a = f(n) = f\left(\sum_{i=1}^{r} p_i w_i + b\right) \qquad (2) $$
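As an illustrative aside, the following minimal Python sketch (not part of the original chapter) implements equations (1) and (2) for a single artificial neuron; the input values, weights, gain and the tanh transfer function are arbitrary assumptions chosen only for demonstration.

```python
import numpy as np

def artificial_neuron(p, w, b, transfer=np.tanh):
    """Single artificial neuron: weighted sum of the inputs plus the gain b,
    passed through a transfer function (tanh is an arbitrary choice here)."""
    n = np.dot(w, p) + b        # net output, equation (1)
    return transfer(n)          # neuron response a = f(n), equation (2)

# Example with three inputs and illustrative weights and gain
a = artificial_neuron(p=np.array([0.5, -1.0, 2.0]),
                      w=np.array([0.1, 0.4, -0.2]),
                      b=0.3)
print(a)
```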

A more didactic model, shown in figure 5, facilitates the study of a neuron.

Figure 5. Didactic model of a single artificial neuron

From this figure it can be seen that the net inputs are contained in the vector p (for a single-input neuron it has only one element); W represents the weights, and the additional input b is a gain that reinforces the output of the summer n, which is the net output. The neuron response is determined by the transfer function, which can be a linear or non-linear function of n and is chosen depending on the specifications of the problem that the neuron is intended to solve.
Generally, a neuron has more than one input. Figure 4 shows a neuron with R inputs; the individual inputs p1 , p2 , ..., pR are multiplied by the corresponding weights w1,1 , w1,2 , ..., w1,R belonging to the weight matrix W. The sub-indexes of the weight matrix identify the terms involved in the connection: the first sub-index denotes the destination neuron and the second the source of the signal that feeds the neuron. For example, the indexes of w1,2 indicate that this weight is the connection from the second input to the first neuron.
This convention becomes more useful when a neuron has many parameters; in that case the notation of figure 4 can be inappropriate, and the abbreviated notation represented in figure 6 is preferred.

Figure 6. Abbreviated didactic model of an artificial neuron

The input vector p is represented by the vertical solid bar on the left. The dimensions of p are shown below the variable as R×1, indicating that the input vector is a column vector of R elements. The inputs go to the weight matrix W, which has R columns and just one row in the case of a single neuron. A constant 1 enters the neuron multiplied by the scalar gain b. The output a of the net is a scalar in this case; if the net had more than one neuron, a would be a vector.
ANNs are highly simplified models of the working of the brain. [10, 24] An ANN is
a biologically inspired computational model which consists of a large number of simple
processing elements or neurons which are interconnected and operate in parallel. [2, 13]
Each neuron is connected to other neurons by means of directed communication links, which
constitute the neuronal structure, each with an associated weight. [4] The weights represent
information being used by the net to solve a problem.
ANNs are usually formed by several interconnected neurons. The arrangement and connections vary from one type of net to another, but in general the neurons are grouped in layers. A layer is a collection of neurons; according to its location in the neural net, it receives different names:

• Input layer: receives the input signals from the environment. In this layer the information is not processed; for this reason, it is not considered a layer of neurons.
• Hidden layers: these layers have no contact with the external environment; they pick up and process the information coming from the input layer. The number of hidden layers, the number of neurons per layer and the way they are connected vary from one net to another. Their elements can have different connections, and these determine the different topologies of the net.
• Output layer: receives the information from the hidden layers and transmits the answer to the external environment.

Figure 7 shows an ANN with two hidden layers. The outputs of the first hidden layer are the inputs of the second hidden layer. In this configuration, each layer has its own weight matrix W, summer, gain vector b, net input vector n, transfer function and output vector a. This ANN is shown in abbreviated notation in figure 8.

Figure 7. Artificial Neural Network with two hidden layers

Figure 8 shows a three-layer network using abbreviated notation. From this figure it can be seen that the network has R1 inputs, S1 neurons in the first layer, S2 neurons in the second layer,
etc. A constant input 1 is fed to the bias for each neuron. The outputs of each intermediate
layer are the inputs to the following layer. Thus layer 2 can be analyzed as a one-layer
network with S1 inputs, S2 neurons, and an S2 xS1 weight matrix W2 . The input to layer 2
is a1 ; the output is a2 . Now that all the vectors and matrices of layer 2 have been identified,
it can be treated as a single-layer network on its own. This approach can be taken with any
layer of the network.
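A minimal sketch of this layer-by-layer view is given below; it is not code from the chapter, and the layer sizes, random weights and transfer functions are assumptions used only to show that each layer can be treated as a one-layer network whose input is the previous layer's output.

```python
import numpy as np

def layer(a_prev, W, b, f):
    """One layer of figure 8: a = f(W @ a_prev + b).
    W has one row per neuron of this layer (shape S_this x S_prev)."""
    return f(W @ a_prev + b)

# Hypothetical sizes: R inputs, S1 and S2 hidden neurons, S3 output neurons
rng = np.random.default_rng(0)
R, S1, S2, S3 = 4, 6, 5, 2
p = rng.random(R)

W1, b1 = rng.standard_normal((S1, R)),  np.zeros(S1)
W2, b2 = rng.standard_normal((S2, S1)), np.zeros(S2)
W3, b3 = rng.standard_normal((S3, S2)), np.zeros(S3)

a1 = layer(p,  W1, b1, np.tanh)        # first hidden layer
a2 = layer(a1, W2, b2, np.tanh)        # second hidden layer, input is a1
a3 = layer(a2, W3, b3, lambda n: n)    # linear output layer
```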

Figure 8. Artificial Neural Network with two hidden layers in abbreviated notation

The arrangement of neurons into layers and the connection patterns within and between
layers is called the net architecture [6, 7]. According to the absence or presence of feedback
connections in a network, two types of architectures are distinguished:

• Feedforward architecture. There are no connections back from the output to the input
neurons; the network does not keep a memory of its previous output values and the
activation states of its neurons; the perceptron-like networks are feedforward types.
• Feedback architecture. There are connections from output to input neurons; such a
network keeps a memory of its previous states, and the next state depends not only
on the input signals but on the previous states of the network; the Hopfield network is of
this type.

A backpropagation feedforward neural net is a network with supervised learning which uses a two-phase propagation-adaptation cycle. Once a pattern has been applied to the input of the network as a stimulus, it is propagated from the first layer through the upper layers of the net until an output is generated. The output signal is compared with the desired output, and an error signal is calculated for each of the outputs.
The output errors are back propagated from the output layer toward all the neurons of the hidden layer that contribute directly to the output. However, each neuron of the hidden layer receives only a fraction of the whole error signal, based on the relative contribution of that neuron to the original output. This process is repeated layer by layer until all neurons of the network have received an error signal describing their relative contribution to the total error. Based on the perceived error signal, the synaptic connection weights of each neuron are updated so that the net converges toward a state that allows all the training patterns to be classified correctly.
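The following short Python sketch (an illustration under stated assumptions, not the chapter's implementation) shows one propagation-adaptation cycle of this kind for a net with one hidden layer: the pattern is propagated forward, the output error is propagated backward, and the synaptic weights are updated by gradient descent.

```python
import numpy as np

# One backpropagation cycle for a 1-hidden-layer net with tanh hidden units
# and a linear output; sizes, data and learning rate are illustrative only.
rng = np.random.default_rng(1)
R, S1, S2 = 4, 6, 2                       # inputs, hidden neurons, outputs
W1, b1 = rng.standard_normal((S1, R)), np.zeros(S1)
W2, b2 = rng.standard_normal((S2, S1)), np.zeros(S2)
lr = 0.1                                  # learning rate

p = rng.standard_normal(R)                # one training pattern (stimulus)
t = rng.standard_normal(S2)               # desired output for that pattern

# Forward phase: propagate the stimulus layer by layer
a1 = np.tanh(W1 @ p + b1)
a2 = W2 @ a1 + b2                         # linear output layer

# Backward phase: propagate the error signal toward the hidden layer
d2 = a2 - t                               # error signal at the output
d1 = (W2.T @ d2) * (1.0 - a1 ** 2)        # hidden-layer share of the error (tanh')

# Weight update: move each weight against its contribution to the error
W2 -= lr * np.outer(d2, a1); b2 -= lr * d2
W1 -= lr * np.outer(d1, p);  b1 -= lr * d1
```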
The importance of this process is that, as the net is trained, the neurons of the intermediate layers organize themselves in such a way that they learn to recognize different features of the whole input space. After training, when the neurons are presented with an arbitrary input pattern that is noisy or incomplete, the neurons of the hidden layer will respond with an active output if the new input contains a pattern resembling the feature that the individual neurons learned to recognize during training. Conversely, the units of the hidden layers tend to inhibit their output if the input pattern does not contain the feature they have been trained to recognize.
During the training process, the backpropagation net tends to develop internal relationships among neurons in order to organize the training data into classes. This tendency can be extrapolated to the hypothesis that all the units of the hidden layer of a backpropagation net are somehow associated with specific characteristics of the input pattern as a consequence of training. Whether or not the association is exact may not be evident to a human observer; what matters is that the net has found an internal representation that allows it to generate the desired outputs when given the inputs of the training process. This same internal representation can be applied to inputs that the net has not seen before, and the net will classify these inputs according to the characteristics they share with the training examples.
In recent years there has been increasing interest in using ANNs for modeling, optimization and prediction. The advantages that ANNs offer are numerous, but they are achievable only by
developing an ANN model of high performance. However, determining suitable training
and architectural parameters of an ANN still remains a difficult task mainly because it is
very hard to know beforehand the size and the structure of a neural network one needs to
solve a given problem. An ideal structure is a structure that independently of the starting
weights of the net, always learns the task, i.e. makes almost no error on the training set and
generalizes well.
The problem with neural networks is that a number of parameters have to be set before any training can begin. Users have to choose the architecture and determine many of the parameters of the selected network. However, there are no clear rules on how to set these parameters, and yet these parameters determine the success of the training.
As can be appreciated in figure 9, the current practice in the selection of design parameters for an ANN is based on the trial-and-error procedure, in which a large number of ANN models are developed and compared to one another. If the level of a design parameter is changed and has no effect on the performance of the net, then a different design parameter is varied, and the experiment is repeated in a series of runs. The observed responses are examined at each phase to determine the best level of each design parameter.

Figure 9. Trial-and-error procedure in the selection of ANN parameters

The serious drawback of this method is that one parameter is evaluated while the others are kept at a single level. Hence, the level selected as best for a particular design variable may not necessarily be the best at the end of the experimentation, since the other parameters may have changed. Clearly, this method cannot evaluate interactions among parameters, since it varies only one at a time, and may therefore lead to an impoverished ANN design in general.
All of these limitations have motivated researchers to generate ideas for merging or hybridizing ANNs with other approaches in the search for better performance. One way of overcoming this disadvantage is to evaluate all possible level combinations of the design parameters, i.e., to carry out a full factorial design. However, since the number of combinations can be very large, even for a small number of parameters and levels, this method is very expensive and time consuming, as illustrated below. The number of experiments to be carried out can be decreased by making use of the fractional factorial method, a statistical method on which Taguchi's robust design philosophy is based.
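As a worked example of this combinatorial growth (using the four three-level design factors studied later in this chapter as the assumed case), a full factorial design already requires 3^4 = 81 trainings, while the Taguchi L9 orthogonal array covers the same factors with only 9 trials:

```python
# Full factorial versus fractional (orthogonal array) experiment counts
# for 4 design factors at 3 levels each; the numbers match the case study
# presented later in this chapter.
factors, levels = 4, 3
full_factorial_trials = levels ** factors   # 81 possible level combinations
taguchi_l9_trials = 9                       # trials in the L9(3^4) orthogonal array
print(full_factorial_trials, taguchi_l9_trials)
```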
The Taguchi technique is a methodology for finding the optimum setting of the control
factors to make the product or process insensitive to the noise factors. Taguchi-based optimization has produced a unique and powerful optimization discipline that differs from traditional practices. [16, 20, 21]

1.2. Taguchi philosophy of robust design


Design of experiments involving multiple factors was first proposed by R. A. Fisher in the 1920s to determine the effect of multiple factors on the outcome of agricultural trials
(Ranjit 1990). This method is known as factorial design of experiments. A full factorial
design identifies all possible combinations for a given set of factors. Since most experiments
involve a significant number of factors, a full factorial design may involve a large number of
experiments.
Factors are the different variables which determine the functionality or performance of a product or system, for example design parameters that influence the performance, or inputs that can be controlled.
Dr. Genichi Taguchi is considered the author of robust parameter design. [8, 14–21] This is an engineering method for the design of products or processes focused on diminishing variation and/or sensitivity to noise. When used appropriately, Taguchi design provides a powerful and efficient method for designing products that operate consistently and optimally over a variety of conditions. In robust parameter design, the primary objective is to find the factor settings that decrease the variation of the response, while the process is adjusted toward the target.
The distinctive idea of Taguchi's robust design, which differs from conventional experimental design, is the simultaneous modeling of both mean and variability. Taguchi's methodology is based on the concept of fractional factorial design.
By using Orthogonal Arrays (OAs) and fractional instead of full factorial designs, Taguchi's approach allows the entire parameter space to be studied with a small number of experiments. An OA is a small fraction of a full factorial design and assures a balanced comparison of the levels of any factor or interaction of factors. The columns of an OA represent the experimental parameters to be optimized and the rows represent the individual trials (combinations of levels).
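As an illustration (not taken from the chapter), the standard L9(3^4) orthogonal array used later for the design variables can be written out and its balance property verified numerically:

```python
import numpy as np
from itertools import combinations

# Standard Taguchi L9(3^4) orthogonal array: 9 trials, 4 three-level factors.
L9 = np.array([[1, 1, 1, 1], [1, 2, 2, 2], [1, 3, 3, 3],
               [2, 1, 2, 3], [2, 2, 3, 1], [2, 3, 1, 2],
               [3, 1, 3, 2], [3, 2, 1, 3], [3, 3, 2, 1]])

# Balance check: for every pair of columns, each of the 3 x 3 = 9 level
# combinations occurs exactly once, so factor levels are weighted equally.
for c1, c2 in combinations(range(L9.shape[1]), 2):
    pairs = set(zip(L9[:, c1], L9[:, c2]))
    assert len(pairs) == 9
```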

Taguchi’s robust design can be divided into two classes: static and dynamic characteristics.
The static problem attempts to obtain the value of a quality characteristic of interest as close
as possible to a single specified target value. The dynamic problem, on the other hand,
involves situations where a system’s performance depends on a signal factor.
Taguchi also proposed a two-phase procedure to determine the factor level combination.
First, the control factors that are significant for reducing variability are determined and their
settings are chosen. Next, the control factors that are significant in affecting the sensitivity
are identified and their appropriate levels are chosen. The objective of the second phase is to
adjust the responses to the desired values.
The Taguchi method is applied in four steps.

1. Brainstorm the quality characteristics and design parameters important to the


product/process.
In Taguchi methods there are variables that are under control and variables that are
not. These are called design and noise factors, respectively, which can influence a
product and operational process. The design factors, as controlled by the designer, can
be divided into: (1) signal factor, which influences the average of the quality response
and (2) control factor, which influences the variation of the quality response. The noise
factors are uncontrollable such as manufacturing variation, environmental variation and
deterioration.
Before designing an experiment, knowledge of the product/process under investigation
is of prime importance for identifying the factors likely to influence the outcome. The aim
of the analysis is primarily to seek answers to the following three questions:
(a) What is the optimum condition?
(b) Which factors contribute to the results and by how much?
(c) What will be the expected result at the optimum condition?

2. Design and conduct the experiments.


Taguchi’s robust design involves using an OA to arrange the experiment and selecting
the levels of the design factors to minimize the effects of the noise factors. That is, the
settings of the design factors for a product or a process should be determined so that the
product’s response has the minimum variation, and its mean is close to the desired target.
To design an experiment, the most suitable OA is selected. Next, factors are assigned
to the appropriate columns, and finally, the combinations of the individual experiments
(called the trial conditions) are described. Experimental design using OAs is attractive
because of experimental efficiency.
The array is called orthogonal because for every pair of parameters, all combinations of
parameter levels occur an equal number of times, which means the design is balanced so
that factor levels are weighted equally. The real power in using an OA is the ability to
evaluate several factors in a minimum of tests. This is considered an efficient experiment
since much information is obtained from a few trials. The mean and the variance of the
response at each setting of parameters in OA are then combined into a single performance
measure known as the signal-to-noise (S/N) ratio.

3. Analyze the results to determine the optimum conditions.


The S/N ratio is a quality indicator by which the experimenters can evaluate the effect
of changing a particular experimental parameter on the performance of the process or
product. Taguchi used the S/N ratio to evaluate the variation of the system's performance, which is derived from the quality loss function. For static characteristics, Taguchi classified
them into three types of S/N ratios:
(a) Smaller-the-Better (STB)
(b) Larger-the-Better (LTB)
(c) Nominal-the-Best (NTB)
For the STB and LTB cases, Taguchi recommended direct minimization of the expected
loss. For the NTB case, Taguchi developed a two-phase optimization procedure to obtain
the optimal factor combinations.
For the dynamic characteristics, the S/N ratio

$$ SN_i = 10\log_{10}\left(\frac{\beta_i}{MSE_i}\right) \qquad (3) $$

is used, where the mean square error MSE_i represents the mean square of the distance between the measured response and the best fitted line, and β_i denotes the sensitivity.
4. Run a confirmatory test using the optimum conditions.
The two major goals of parameter design are to minimize the process or product
variation and to design robust and flexible processes or products that are adaptable to
environmental conditions. Taguchi methodology is useful for finding the optimum setting
of the control factors to make the product or process insensitive to the noise factors.
In this stage, the value of the robustness measure is predicted at the optimal design
condition; a confirmation experiment at the optimal design condition is conducted,
calculating the robustness measure for the performance characteristic and checking whether the confirmed value is close to the predicted one.

Today, ANNs can be trained to solve problems that are difficult for conventional computers or human beings, and have been trained to perform complex functions in various fields, including pattern recognition, identification, classification, speech, vision and control systems. Recently, ANN technology has been applied with relative success in the research area of nuclear sciences, [3] mainly in the neutron spectrometry and dosimetry
domains. [25–31]

1.3. Neutron spectrometry with ANNs


The measurement of the intensity of a radiation field with respect to certain quantity like
angle, energy, frequency, etc., is very important in radiation spectrometry having, as a final
result, the radiation spectrum. [32–34] The radiation spectrometry term can be used to
describe measurement of the intensity of a radiation field with respect to energy, frequency or
momentum. [35] The distribution of the intensity with one of these parameters is commonly
referred to as the “spectrum”. [36] A second quantity is the variation of the intensity of these radiations as a function of the angle of incidence on a body situated in the radiation field, referred to as the “dose”. The neutron spectrum and the dose are of great importance in radiation protection physics. [37]
Neutrons are found in the environment or are produced artificially in different ways; these neutrons have a wide energy range, extending from a few thousandths of eV up to several hundreds of MeV. [38] They also present a broad variety of energy distributions, named the neutron-fluence spectrum or simply the neutron spectrum, Φ_E(E).
Determination of the neutron dose received by those exposed in workplaces or accidents in nuclear facilities generally requires knowledge of the neutron energy spectrum incident on the body. [39] Spectral information must generally be obtained from passive detectors which respond to different ranges of neutron energies, such as the multisphere Bonner system or Bonner spheres system (BSS). [40–42]
The BSS has been used to unfold neutron spectra mainly because it has an almost isotropic response, can cover the energy range from thermal up to GeV neutrons, and is easy to operate. However, its weight, the time-consuming measurement procedure, the need for an unfolding procedure and the low resolution of the resulting spectrum are some of its drawbacks. [43, 44]
As can be seen from figure 10, the BSS consists of a thermal neutron detector, such as 6LiI(Eu), activation foils, pairs of thermoluminescent dosimeters or track detectors, placed at the centre of a number of moderating polyethylene spheres of different diameters in order to obtain, through an unfolding process, the neutron energy distribution, also known as the spectrum, Φ_E(E). [42, 45]

Figure 10. Bonner spheres system with a 6LiI(Eu) neutron detector

The derivation of the spectral information is not simple; the unknown neutron spectrum is not given directly as a result of the measurements. [46] If a sphere d has a response function R_d(E) and is exposed in a neutron field with spectral fluence Φ_E(E), the sphere reading M_d is obtained by folding R_d(E) with Φ_E(E); this means solving the Fredholm integral equation of the first kind shown in equation 4.

$$ M_d = \int R_d(E)\,\Phi_E(E)\,dE \qquad (4) $$

This folding process takes place in the sphere itself during the measurement. Although the real Φ_E(E) and R_d(E) are continuous functions of the neutron energy, they cannot be described by analytical functions and, as a consequence, a discretised numerical form is used, shown in the following equation:

$$ C_j = \sum_{i=1}^{N} R_{i,j}\,\Phi_i, \qquad j = 1, 2, \dots, m \qquad (5) $$

where C_j is the j-th detector's count rate, R_{i,j} is the j-th detector's response to neutrons at the i-th energy interval, Φ_i is the neutron fluence within the i-th energy interval and m is the number of spheres utilized.
Once the neutron spectrum Φ_E(E) has been obtained, the dose ∆ can be calculated using the fluence-to-dose conversion coefficients δ_Φ(E), as shown in equation 6.

$$ \Delta = \int_{E_{min}}^{E_{max}} \delta_\Phi(E)\,\Phi_E(E)\,dE \qquad (6) $$
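A small numerical sketch of equations (5) and (6) is given below; the sphere count, energy binning and all numeric values are placeholder assumptions, not the chapter's actual response matrix or coefficients.

```python
import numpy as np

# Discretised folding (eq. 5) and dose calculation (eq. 6) with placeholder data:
# m = 7 spheres and N = 31 energy groups, as assumed later for the UTA4 matrix.
rng = np.random.default_rng(2)
m, N = 7, 31
R     = rng.random((m, N))    # R[j, i]: response of sphere j at energy bin i
phi   = rng.random(N)         # neutron fluence per energy bin
delta = rng.random(N)         # fluence-to-dose conversion coefficient per bin

C    = R @ phi                # equation (5): count rate of each sphere
dose = delta @ phi            # discretised form of equation (6)
```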

Equation 5 is an ill-conditioned system of equations with an infinite number of solutions, which has motivated researchers to propose new and complementary approaches. Several methods are used to unfold the neutron spectrum Φ. [43, 44, 47] ANN technology is a useful alternative for solving this problem; [25–31] however, several drawbacks must be overcome in order to simplify the use of these procedures.
Besides the many advantages that ANNs offer, there are some drawbacks and limitations related to the ANN design process. In order to develop an ANN which generalizes well and is robust, a number of issues must be taken into consideration, particularly those related to the architecture and training parameters. [8, 14–21] The trial-and-error technique is the usual way to find a good combination of these values. This method cannot identify interactions between the parameters, does not use systematic methodologies for identifying the “best” values, consumes much time and does not systematically target a near-optimal solution, which may lead to a poor overall neural network design.
Even though the BP learning algorithm provides a method for training multilayer feedforward neural nets, it is not free of problems. Many factors affect the learning performance and must be handled to achieve a successful learning process. Those factors include the synaptic weight initialization, the learning rate, the momentum, the size of the net and the learning database. A good choice of these parameters can greatly speed up and improve the learning process, although a universal answer does not exist for such issues.
Choosing the ANN architecture, followed by the selection of the training algorithm and related parameters, is rather a matter of the designer's past experience, since there are no practical rules which can be generally applied. This is usually a very time-consuming trial-and-error procedure in which a number of ANNs are designed and compared to one another. Above all, the design of an optimal ANN is not guaranteed. It is unrealistic to analyze the effects of all combinations of ANN parameters and parameter levels on the ANN performance.

To deal economically with the many possible combinations, the Taguchi method can be
applied. Taguchi’s techniques have been widely used in engineering design, and can be
applied to many aspects such as optimization, experimental design, sensitivity analysis,
parameter estimation, model prediction, etc.
This work is concerned with the application of the Taguchi method to the optimization of ANN models. The integration of ANNs and Taguchi optimization provides a tool for designing robust network parameters and improving their performance. The Taguchi method offers considerable benefits in time and accuracy when compared with the conventional trial-and-error neural network design approach.
In this work, a systematic and experimental strategy called the Robust Design of Artificial Neural Networks (RDANN) methodology was developed for the robust design of multilayer feedforward neural networks trained by the backpropagation algorithm in the neutron spectrometry field. This computer tool emphasizes the simultaneous optimization of ANN parameters under various noise conditions. Here, we compare this method with conventional training methods. Attention is drawn to the advantages of Taguchi methods, which offer potential benefits in evaluating the network behavior.

2. Robust design of artificial neural networks methodology


Neutron spectrum unfolding is an ill-conditioned system with an infinite number of solutions. [27] Researchers have used ANNs to unfold neutron spectra from BSS measurements. [48] Figure 11 shows the classical approach to neutron spectrometry by means of ANN technology, starting from the count rates measured with the BSS.
As can be appreciated in figure 11, neutron spectrometry by means of ANN technology
is done by using a neutron spectra data set compiled by the International Atomic Energy
Agency (IAEA). [49] This compendium contains a large collection of detector responses and
spectra. The original spectra in this report were defined per unit lethargy in 60 energy groups
ranging from thermal to 630 MeV.
One challenge in neutron spectrometry using neural nets is the pre-processing of the information in order to create suitable input-output training data pairs. [50] The generation of a suitable data set is a non-trivial task. Because of the novelty of this technology in this research area, the researcher spends a lot of time on this activity, mainly because all the work is done by hand and a lot of effort is required. From the above, the need for technological tools that automate this process is evident. At present, work is being carried out in order to alleviate this drawback.
In order to use the response matrix known as UTA4, expressed in 31 energy groups ranging from 10⁻⁸ up to 231.2 MeV, in the ANN training process, the energy range of the neutron spectra was changed through a re-binning process by means of MCNP simulations. [50] The 187 neutron spectra from the IAEA compilation, expressed in energy units and in 60 energy bins, were re-binned into the thirty-one energy groups of the UTA4 response matrix and, at the same time, 13 different equivalent doses were calculated per spectrum by using the International Commission on Radiological Protection (ICRP) fluence-to-dose conversion factors.
Figure 12 shows the re-binned neutron spectra data set used for training and testing the optimum ANN architecture designed with the RDANN methodology.

Figure 11. Classical Neutron spectrometry with ANN technology

Figure 12. Re-binned neutron spectra data set used to train the optimum ANN architecture designed with RDANN
methodology

By multiplying the re-binned neutron spectra by the UTA4 response matrix, the count rates data set was calculated. The re-binned spectra and the equivalent doses are the desired outputs of the ANN, and the corresponding calculated count rates are the input data.
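A hedged sketch of this data-set construction is shown below; the spectra, response matrix and dose coefficients are random placeholders with the shapes described above (187 spectra, 31 energy groups, 7 spheres, 13 dose quantities), not the real IAEA or UTA4 data.

```python
import numpy as np

# Building the ANN training pairs: inputs are count rates obtained by folding
# each re-binned spectrum with a UTA4-like response matrix; targets are the
# spectrum itself plus 13 equivalent doses from fluence-to-dose coefficients.
rng = np.random.default_rng(3)
spectra = rng.random((187, 31))         # 187 re-binned spectra, 31 energy groups
R_uta4  = rng.random((7, 31))           # placeholder 7-sphere response matrix
d_coef  = rng.random((13, 31))          # placeholder fluence-to-dose coefficients

counts  = spectra @ R_uta4.T            # ANN inputs:  (187, 7) count-rate vectors
doses   = spectra @ d_coef.T            # (187, 13) equivalent doses per spectrum
targets = np.hstack([spectra, doses])   # ANN desired outputs: (187, 44)
```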
The second challenge in neutron spectrometry by means of ANNs is the determination of the net topology. In the ANN design process, the choice of the ANN's basic parameters often determines the success of the training process. In practice, the selection of these parameters follows no rules, and their values are at best arguable. This approach consumes much time and does not systematically target a near-optimal selection of suitable parameter values. ANN designers have to choose the architecture and determine many of the parameters through the trial-and-error technique, which often produces ANNs with poor performance and low generalization capability, frequently spending large amounts of time.
An easier and more efficient way to overcome this disadvantage is to use the RDANN methodology, shown in figure 13, which constitutes a new approach to solving this problem.
RDANN is a very powerful method based on parallel processes, where all the experiments are planned a priori and the results are analyzed after all the experiments are completed. It is a systematic and methodological approach to ANN design, based on the Taguchi philosophy, which maximizes the ANN performance and generalization capacity.

Figure 13. Robust design of artificial neural networks methodology
The integration of neural networks and optimization provides a tool for designing ANN parameters and improving the network performance and generalization capability. The main objective of the proposed methodology is to develop accurate and robust ANN models. In other words, the goal is to select the ANN training and architectural parameters so that the ANN model yields the best performance.
From figure 13 it can be seen that, in ANN design using the Taguchi philosophy within the RDANN methodology, the designer must understand the application problem well and choose a suitable ANN model. In the selected model, the design parameters (factors) which need to be optimized must be determined (planning stage). Using OAs, simulations, i.e., trainings of ANNs with different net topologies, can be executed in a systematic way (experimentation stage). From the simulation results, the response can be analyzed by using the S/N ratio of the Taguchi method (analysis stage). Finally, a confirmation experiment at the optimal design condition is conducted, calculating the robustness measure for the performance characteristic and checking whether it is close to the predicted value (confirmation stage).
To provide scientific discipline to this work, the systematic and methodological approach called the RDANN methodology was used to obtain the optimum architectural and learning values of an ANN capable of solving the neutron spectrometry problem. According to figure 13, the steps followed to obtain the optimum design of the ANN are described below:

1. Planning stage
In this stage it is necessary to identify the objective function and the design and noise
variables.
(a) The objective function. The objective function must be defined according to the purpose
and requirements of the problem.
In this research, the objective function is the prediction or classification error between the target and the output values of the BP ANN at the testing stage, i.e., the performance or mean square error (MSE) output of the ANN is used as the objective function, as shown in the following equation (a minimal computational sketch of this objective is given below):

$$ MSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\Phi_E(E)_i^{ANN} - \Phi_E(E)_i^{ORIGINAL}\right)^2} \qquad (7) $$

where N is the number of trials, Φ_E(E)_i^{ORIGINAL} is the original spectrum and Φ_E(E)_i^{ANN} is the spectrum unfolded with the ANN.
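The following minimal function (an illustration, not the chapter's Matlab code) evaluates this objective for one trial; it assumes the unfolded and original spectra are given as equal-length arrays.

```python
import numpy as np

def objective_mse(phi_ann, phi_original):
    """Objective function of equation (7): root of the mean squared
    difference between the ANN-unfolded and the original spectra."""
    phi_ann = np.asarray(phi_ann, dtype=float)
    phi_original = np.asarray(phi_original, dtype=float)
    return np.sqrt(np.mean((phi_ann - phi_original) ** 2))

# Example with two arbitrary 31-bin spectra
rng = np.random.default_rng(4)
print(objective_mse(rng.random(31), rng.random(31)))
```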
(b) Design and noise variables. Based on the requirements of the physical problem, users can choose some factors as design variables, which can be varied during the optimization iteration process, and some factors as fixed constants.
Among the various parameters that affect the ANN performance, four design variables were selected, as shown in table 1:
Design Var. Level 1 Level 2 Level 3
A L1 L2 L3
B L1 L2 L3
C L1 L2 L3
D L1 L2 L3

Table 1. Design variables and their levels

where A is the number of neurons in the first hidden layer, B is the number of neurons
in the second hidden layer, C is the momentum and D is the learning rate.
Noise variables are shown in table 2. These variables in most cases are not controlled by the user. The initial set of weights, U, is usually selected randomly. For the training and testing data sets, V, the designer must decide how much of the whole data set should be allocated to training and how much to testing. Once V is determined, the designer must decide which data of the whole data set to include in the training and testing sets, W.
Design Var. Level 1 Level 2
U Set 1 Set 2
V 6:4 8:2
W Tr-1/Tst-1 Tr-2/Tst-2

Table 2. Noise variables and their levels

where U is the initial set of random weights, V is the size of training set versus size
of testing set, i.e., V = 60% / 40%, 80% / 20% and W is the selection of training and
testing sets, i.e., W = Training1/Test1, Training2/Test2.
In practice, these variables are determined randomly and are not controlled by the designer. Because of the random nature of this selection process, the ANN designer must create these data sets starting from the whole data set. This procedure is very time consuming when it is done by hand, without the help of technological tools.
The RDANN methodology was designed in order to fully automate, in a computer program developed under the Matlab environment and shown in figure 14, the creation of the noise variables and their levels. This work is done before the training of the several net topologies tested at the experimentation stage.
Besides the automatic generation of the noise variables, other programming routines were created in order to train the different net architectures and to statistically analyze and graph the obtained data. When this procedure is done by hand, it is very time consuming. The use of the designed computer tool saves the ANN designer a lot of time and effort.

Figure 14. Robust Design of Artificial Neural Networks Methodology

After the factors and levels are determined, a suitable OA can be selected for the training process. The Taguchi OAs are denoted by L_r(s^c), where r is the number of rows, c is the number of columns and s is the number of levels in each column.
2. Experimentation stage. The choice of a suitable OA is critical for the success of this stage. OAs allow the main and interaction effects to be computed via a minimum number of experimental trials. In this research, the columns of the OA represent the experimental parameters to be optimized and the rows represent the individual trials, i.e., combinations of levels.
For a robust experimental design, Taguchi suggests using two crossed OAs with an L9(3^4) and L4(2^3) configuration, as shown in table 3.
From table 3 it can be seen that each design variable is assigned to a column of the design OA; then, each row of the design OA represents a specific ANN design. Similarly, each noise variable is assigned to a column of the noise OA, and each row corresponds to a noise condition.
3. Analysis stage.
The S/N ratio is a measure of both the location and the dispersion of the measured responses. It transforms the raw data to allow a quantitative evaluation of the design parameters considering their mean and variation. It is measured in decibels using the formula:

$$ S/N = -10\log_{10}(MSD) \qquad (8) $$

where MSD is a measure of the mean square deviation in performance; since in every design more signal and less noise is desired, the best design will have the highest S/N ratio. In this stage, the statistical program JMP was used to select the best values of the ANN being designed (a short computational sketch of the per-trial S/N calculation is given below).
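The sketch below assumes the smaller-the-better form of the S/N ratio (appropriate here because the measured response is an error) and uses the four responses of trial 1 in Table 6 only as a usage example.

```python
import numpy as np

def sn_smaller_the_better(responses):
    """Smaller-the-better S/N ratio for one OA trial:
    S/N = -10 log10(mean of squared responses).  Each trial has one
    response per noise condition of the crossed noise array."""
    y = np.asarray(responses, dtype=float)
    msd = np.mean(y ** 2)               # mean square deviation
    return -10.0 * np.log10(msd)

# Usage example with the four responses of trial 1 in Table 6
print(sn_smaller_the_better([3.316e-4, 2.416e-4, 2.350e-4, 3.035e-4]))
```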

Trial No.   A   B   C   D     S1   S2   S3   S4   Mean   S/N

1           1   1   1   1
2           1   2   2   2
3           1   3   3   3
4           2   1   2   3
5           2   2   3   1
6           2   3   1   2
7           3   1   3   2
8           3   2   1   3
9           3   3   2   1

Table 3. ANN measured responses with crossed OAs with L9(3^4) and L4(2^3) configuration

4. Confirmation stage.
In this stage, the value of the robustness measure is predicted at the optimal design condition; a confirmation experiment at the optimal design condition is then conducted, calculating the robustness measure for the performance characteristic and checking whether the confirmed value is close to the predicted one.

3. Results and discussion


The RDANN methodology was applied in nuclear sciences in order to solve the neutron spectrometry problem starting from the count rates of a BSS with a 6LiI(Eu) thermal neutron detector, 7 polyethylene spheres and the UTA4 response matrix expressed in 31 energy bins.
In this work, a feed-forward ANN trained with the BP learning algorithm was designed. For ANN training, the “trainscg” training algorithm and mse = 1E−4 were selected. In the RDANN methodology, crossed OAs with L9(3^4) and L4(2^3) configurations, corresponding to the design and noise variables respectively, were used. The optimal net architecture was designed in a short time and has high performance and generalization capability.
The results obtained after applying the RDANN methodology are as follows:

3.1. Planning stage


Tables 4 and 5 show the design and noise variables selected and their levels.

Design Var. Level 1 Level 2 Level 3


A 14 28 56
B 0 28 56
C 0.001 0.1 0.3
D 0.1 0.3 0.5

Table 4. Design variables and their levels



where A is the number of neurons in the first hidden layer, B is the number of neurons in
the second hidden layer, C is the momentum, D is the learning rate.
Design Var. Level 1 Level 2
U Set 1 Set 2
V 6:4 8:2
W Tr-1/Tst-1 Tr-2/Tst-2

Table 5. Noise variables and their levels

where U is the initial set of random weights, V is the size of training set versus size of testing
set, i.e., V = 60% / 40%, 80% / 20% and W is the selection of training and testing sets, i.e.,
W = Training1/Test1, Training2/Test2.

3.2. Experimentation Stage


In this stage, by using crossed OAs with an L9(3^4) and L4(2^3) configuration, 36 different ANN architectures were trained and tested, as shown in table 6.
Trial No. Resp-1 Resp-2 Resp-3 Resp-4 Median S/N
1 3.316E-04 2.416E-04 2.350E-04 3.035E-04 2.779E-04 3.316E-04
2 2.213E-04 3.087E-04 3.646E-04 2.630E-04 2.894E-04 2.213E-04
3 4.193E-04 3.658E-04 3.411E-04 2.868E-04 3.533E-04 4.193E-04
4 2.585E-04 2.278E-04 2.695E-04 3.741E-04 2.825E-04 2.585E-04
5 2.678E-04 3.692E-04 3.087E-04 3.988E-04 3.361E-04 2.678E-04
6 2.713E-04 2.793E-04 2.041E-04 3.970E-04 2.879E-04 2.713E-04
7 2.247E-04 7.109E-04 3.723E-04 2.733E-04 3.953E-04 2.247E-04
8 3.952E-04 5.944E-04 2.657E-04 3.522E-04 4.019E-04 3.952E-04
9 5.425E-04 3.893E-04 3.374E-04 4.437E-04 4.282E-04 5.425E-04

Table 6. ANN measured responses with crossed OAs with L9(3^4) and L4(2^3) configuration

The signal-to-noise ratio was analyzed by means of Analysis of Variance (ANOVA) using the statistical program JMP. Since an error of 1E−4 was established for the objective function, from table 6 it can be seen that all ANN performances reach this value. This means that this particular OA has a good performance.

3.3. Analysis stage


The signal-to-noise ratio is used in this stage to determine the optimum ANN architecture.
The best ANN design parameters are shown in table 7.

Trial No. A B C1 C2 C3 D
1 14 0 0.001 0.001 0.001 0.1
2 14 0 0.001 0.1 0.3 0.1
3 56 56 0.001 0.1 0.1 0.1

Table 7. Best values used in the confirmation stage to design the ANN

3.4. Confirmation stage


Once the optimum design parameters were determined, the confirmation stage was performed to determine the final optimum values, highlighted in table 7. After the best ANN topology was determined, a final training and testing were carried out to validate the data obtained with the designed ANN. At the final ANN validation, and using the designed computational tool, correlation and Chi square statistical tests were carried out, as shown in figure 15.

(a) Chi square test

(b) Correlation test

Figure 15. Chi square (a) and correlation (b) tests applied at the final ANN validation

From figure 15(a) it can be seen that all neutron spectra pass the Chi square statistical test, which demonstrates that statistically there is no difference between the neutron spectra reconstructed by the designed ANN and the target neutron spectra. Similarly, from figure 15(b) it can be seen that the correlation for the whole neutron spectra data set is near the optimum value of one, which demonstrates the high quality of the obtained design.
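As an illustration of this validation step (the chapter does not give the exact formulas used by its Matlab tool), a per-spectrum check of the correlation coefficient and a chi-square-like statistic could be sketched as follows:

```python
import numpy as np

def validate_spectrum(phi_ann, phi_target):
    """Hypothetical per-spectrum validation in the spirit of figure 15:
    Pearson correlation and a chi-square-like statistic between the
    ANN-unfolded spectrum and the target spectrum."""
    phi_ann = np.asarray(phi_ann, dtype=float)
    phi_target = np.asarray(phi_target, dtype=float)
    r = np.corrcoef(phi_ann, phi_target)[0, 1]            # correlation coefficient
    safe_target = np.where(phi_target > 0, phi_target, 1.0)
    chi2 = np.sum((phi_ann - phi_target) ** 2 / safe_target)
    return r, chi2

# Usage example with two arbitrary 31-bin spectra
rng = np.random.default_rng(5)
print(validate_spectrum(rng.random(31), rng.random(31)))
```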
Figures 16 and 17 show the best and worst neutron spectra unfolded at the final testing stage of the designed ANN, compared with the target neutron spectra, along with the correlation and chi square tests applied to each spectrum.

(a) Best neutron spectra

(b) Correlation test

Figure 16. RDANN: Best neutron spectra and correlation test



(a) Worst neutron spectra

(b) Correlation test

Figure 17. RDANN: Worst neutron spectra and correlation test



In ANN design, the use of the RDANN methodology can help provide answers to the following critical design and construction issues:

• What is the proper density of training samples in the input space? The proper allocation of samples was 80% of the data set for the ANN training stage and 20% for the testing stage.
• When is the best time to stop training to avoid over-fitting? The best time to stop training to avoid over-fitting is variable and depends on the proper selection of the ANN parameters. For the optimum ANN designed, the best training time that avoided over-fitting was 120 seconds on average.
• Which is the best architecture to use? The best architecture is 7:14:31, with a learning rate of 0.1, a momentum of 0.1, the trainscg training algorithm and an mse goal of 1E−4.
• Is it better to use a large architecture and stop training optimally, or to use an optimum architecture, which probably will not over-fit the data but may require more time to train? It is better to use an optimum architecture, designed with the RDANN methodology, which does not over-fit the data and does not require more training time, rather than using a large architecture and stopping the training after some time or number of trials, which produces a poor ANN.
• If noise is present in the training data, is it better to reduce the amount of noise or to gather additional data, and what is the effect of noise in the testing data on the performance of the network? A great amount of noise is introduced into the training data by the random weight initialization. Such initialization can introduce large negative numbers, which is very harmful for the unfolded neutron spectra. The effect of this noise on the testing data significantly affects the performance of the network; in this case, the noise produced negative values in the unfolded neutron spectra, which have no physical meaning. In consequence, it can be concluded that it is necessary to reduce the noise introduced by the random weight initialization.

4. Conclusions
ANN theory is still under development; its true potential has not yet been reached. Although researchers have developed powerful learning algorithms of great practical value, the representations and procedures the brain makes use of are still largely unknown. The integration of ANNs and optimization provides a tool for designing neural network parameters and improving network performance. In this work, a systematic, methodological and experimental approach called the RDANN methodology was introduced to obtain the optimum design of artificial neural networks. The Taguchi method is the main technique used to simplify the optimization problem.
The RDANN methodology was applied with success in nuclear sciences to solve the neutron spectra unfolding problem. The factors found to be significant in the case study were the number of hidden neurons in hidden layers 1 and 2, the learning rate and the momentum term. The near-optimum ANN topology was 7:14:31, with a momentum of 0.1, a learning rate of 0.1, mse = 1E−4 and the “trainscg” learning function. The optimal net architecture was designed in a short time and has high performance and generalization capability.

The proposed systematic and experimental approach is a useful alternative for the robust design of ANNs. It offers a convenient way of simultaneously considering design and noise variables, and incorporates the concept of robustness into the ANN design process. The computer program developed to implement the experimentation and confirmation stages of the RDANN methodology significantly reduces the time required to prepare, process and present the information in an appropriate way to the designer, and to search for the optimal net topology being designed. This gives the researcher more time to solve the problem in which he or she is interested.
The results show that the RDANN methodology can be used to find better settings of ANN parameters, which not only result in minimum error but also significantly reduce the training time and effort in the modeling phases.
The optimum settings of the ANN parameters are largely problem-dependent. Ideally, an optimization process should be performed for each ANN application, as the significant factors might be different for ANNs trained for different purposes.
When compared with the trial-and-error approach, which can take from several days to months to test different ANN architectures and parameters and may still lead to a poor overall ANN design, the RDANN methodology significantly reduces the time spent in determining the optimum ANN architecture. With RDANN it takes from minutes to a couple of hours to determine the best and most robust ANN architectural and learning parameters, allowing researchers more time to solve the problem in question.

Acknowledgements
This work was partially supported by Fondos Mixtos CONACYT - Gobierno del Estado de
Zacatecas (México) under contract ZAC-2011-C01-168387.
This work was partially supported by PROMEP under contract PROMEP/103.5/12/3603.

Author details
José Manuel Ortiz-Rodríguez1,⋆ ,
Ma. del Rosario Martínez-Blanco2 ,
José Manuel Cervantes Viramontes1 and
Héctor René Vega-Carrillo2
⋆ Address all correspondence to: morvymm@yahoo.com.mx
Universidad Autónoma de Zacatecas, Unidades Académicas, 1-Ingeniería Eléctrica,
2-Estudios Nucleares, México

5. References
[1] C.R. Alavala. Fuzzy logic and neural networks basic concepts & applications. New Age
International Publishers, 1996.

[2] J. Lakhmi and A. M. Fanelli. Recent advances in artificial neural networks design and
applications. CRC Press, 2000.

[3] R. Correa and I. Requena. Aplicaciones de redes neuronales artificiales en la ingeniería


nuclear. XII congreso Español sobre Tecnologías y Lógica Fuzzy, 1:485–490, 2004.

[4] G. Dreyfus. Neural networks, methodology and applications. Springer, 2005.

[5] B. Apolloni, S. Bassis, and M. Marinaro. New directions in neural networks. IOS Press,
2009.

[6] J. Zupan. Introduction to artificial neural network methods: what they are and how to use them. Acta Chimica Slovenica, 41(3):327–352, 1994.

[7] A. K. Jain, J. Mao, and K. M. Mohiuddin. Artificial neural networks: a tutorial. IEEE Computer, 29(3):31–44, 1996.

[8] T. Y. Lin and C. H. Tseng. Optimum design for artificial neural networks: an example in a bicycle derailleur system. Engineering Applications of Artificial Intelligence, 13:3–14, 2000.

[9] M.M. Gupta, L. Jin, and N. Homma. Static and dynamic neural networks: from fundamentals
to advanced theory. 2003.

[10] D. Graupe. Principles of artificial neural networks. World Scientific, 2007.

[11] M. Kishan, K. Chilukuri, and R. Sanjay. Elements of artificial neural networks. The MIT
Press, 2000.

[12] L. Fausett. Fundamentals of neural networks, architectures, algorithms and applications.


Prentice Hall, 1993.

[13] A.I. Galushkin. Neural networks theory. Springer, 2007.

[14] J.A. Frenie and A. Jiju. Teaching the Taguchi method to industrial engineers. MCB
University Press, 50(4):141–149, 2001.

[15] S. C. Tam, W. L. Chen, Y. H. Chen, and H. Y. Zheng. Application of Taguchi method
in the optimization of laser micro-engraving of photomasks. International Journal of
Materials & Product Technology, 11(3-4):333–344, 1996.

[16] G. E. Peterson, D. C. St. Clair, S. R. Aylward, and W. E. Bond. Using Taguchi's method
of experimental design to control errors in layered perceptrons. IEEE Transactions on
Neural Networks, 6(4):949–961, 1995.

[17] Y.K. Singh. Fundamentals of research methodology and statistics. New Age International
Publishers, 2006.

[18] M.N. Shyam. Robust design. Technical report, 2002.

[19] T.T. Soong. Fundamentals of probability and statistics for engineers. John Wiley & Sons, Inc.,
2004.

[20] M. S. Packianather, P. R. Drake, and H. Rowlands. Optimizing the parameters of
multilayered feedforward neural networks through Taguchi design of experiments.
Quality and Reliability Engineering International, 16:461–473, 2000.

[21] M. S. Packianather and P. R. Drake. Modelling neural network performance through
response surface methodology for classifying wood veneer defects. Proceedings of the
Institution of Mechanical Engineers, Part B, 218(4):459–466, 2004.

[22] M.A. Arbib. Brain theory and neural networks. The Mit Press, 2003.

[23] M. H. Beale, M. T Hagan, and H. B. Demuth. Neural networks toolbox, user’s guide.
Mathworks, 1992. www.mathworks.com/help/pdf_doc/nnet/nnet.pdf.

[24] N. K. Kasabov. Foundations of neural networks, fuzzy systems, and knowledge engineering.
MIT Press, 1998.

[25] M.R. Kardan, R. Koohi-Fayegh, S. Setayeshi, and M. Ghiassi-Nejad. Neutron spectra


unfolding in Bonner spheres spectrometry using neural networks. Radiation Protection
Dosimetry, 104(1):27–30, 2004.

[26] H. R. Vega-Carrillo, V. M. Hernández-Dávila, E. Manzanares-Acuña, G. A.
Mercado Sánchez, E. Gallego, A. Lorente, W. A. Perales-Muñoz, and J. A.
Robles-Rodríguez. Artificial neural networks in neutron dosimetry. Radiation Protection
Dosimetry, 118(3):251–259, 2005.

[27] H. R. Vega-Carrillo, V. M. Hernández-Dávila, E. Manzanares-Acuña, G. A.
Mercado-Sánchez, M. P. Iñiguez de la Torre, R. Barquero, S. Preciado-Flores,
R. Méndez-Villafañe, T. Arteaga-Arteaga, and J. M. Ortiz-Rodríguez. Neutron
spectrometry using artificial neural networks. Radiation Measurements, 41:425–431, 2006.

[28] H. R. Vega-Carrillo, M. R. Martínez-Blanco, V. M. Hernández-Dávila, and J. M.
Ortiz-Rodríguez. ANN in spectroscopy and neutron dosimetry. Journal of Radioanalytical
and Nuclear Chemistry, 281(3):615–618, 2009a.

[29] H.R. Vega-Carrillo, J.M. Ortiz-Rodríguez, M.R. Martínez-Blanco, and V.M.
Hernández-Dávila. ANN in spectrometry and neutron dosimetry. American Institute of
Physics Proceedings, 1310:12–17, 2010.

[30] H. R. Vega-Carrillo, J. M. Ortiz-Rodríguez, V. M. Hernández-Dávila, M. R.
Martínez-Blanco, B. Hernández-Almaraz, A. Ortiz-Hernández, and G. A. Mercado.
Different spectra with the same neutron source. Revista Mexicana de Física S, 56(1):35–39,
2009b.

[31] H.R. Vega-Carrillo, M.R. Martínez-Blanco, V.M. Hernández-Dávila, and J.M.


Ortiz-Rodríguez. Spectra and dose with ANN of 252Cf, 241Am-Be, and 239Pu-Be.
Journal of Radioanalytical and Nuclear Chemistry, 281(3):615–618, 2009.

[32] R.R. Roy and B.P. Nigam. Nuclear physics, theory and experiment. John Wiley & Sons, Inc.,
1967.

[33] R.N. Cahn and G. Goldhaber. The experimental foundations of particle physics. Cambridge
University Press, 2009.

[34] R.L. Murray. Nuclear energy, an introduction to the concepts, systems, and applications of
nuclear processes. World Scientific, 2000.

[35] D. J. Thomas. Neutron spectrometry for radiation protection. Radiation Protection
Dosimetry, 110(1-4):141–149, 2004.

[36] B.R.L. Siebert, J.C. McDonald, and W.G. Alberts. Neutron spectrometry for
radiation protection purposes. Nuclear Instruments and Methods in Physics Research A,
476(1-2):347–352, 2002.

[37] P. Reuss. Neutron physics. EDP Sciences, 2008.

[38] F. D. Brooks and H. Klein. Neutron spectrometry, historical review and present status.
Nuclear Instruments and Methods in Physics Research A, 476:1–11, 2002.

[39] M. Reginatto. What can we learn about the spectrum of high-energy stray neutron fields
from Bonner sphere measurements? Radiation Measurements, 44:692–699, 2009.

[40] M. Awschalom and R.S. Sanna. Applications of Bonner sphere detectors in neutron field
dosimetry. Radiation Protection Dosimetry, 10(1-4):89–101, 1985.

[41] V. Vylet. Response matrix of an extended Bonner sphere system. Nuclear Instruments
and Methods in Physics Research A, 476:26–30, 2002.

[42] V. Lacoste, V. Gressier, J. L. Pochat, F. Fernández, M. Bakali, and T. Bouassoule.
Characterization of Bonner sphere systems at monoenergetic and thermal neutron fields.
Radiation Protection Dosimetry, 110(1-4):529–532, 2004.

[43] M. Matzke. Unfolding procedures. Radiation Protection Dosimetry, 107(1-3):155–174,
2003.

[44] M. El Messaoudi, A. Chouak, M. Lferde, and R. Cherkaoui. Performance of three
different unfolding procedures connected to Bonner sphere data. Radiation Protection
Dosimetry, 108(3):247–253, 2004.

[45] R.B. Murray. Use of 6LiI(Eu) as a scintillation detector and spectrometer for fast
neutrons. Nuclear Instruments, 2:237–248, 1957.

[46] V. Lacoste, M. Reginatto, B. Asselineau, and H. Muller. Bonner sphere neutron
spectrometry at nuclear workplaces in the framework of the EVIDOS project. Radiation
Protection Dosimetry, 125(1-4):304–308, 2007.

[47] H. Mazrou, T. Sidahmed, Z. Idiri, Z. Lounis-Mokrani, Z. Bedek, and M. Allab.
Characterization of the CRNA Bonner sphere spectrometer based on a 6LiI scintillator
exposed to an 241Am-Be neutron source. Radiation Measurements, 43:1095–1099, 2008.

[48] H. R. Vega-Carrillo, V. M. Hernández-Dávila, E. Manzanares-Acuña, E. Gallego,
A. Lorente, and M. P. Iñiguez. Artificial neural networks technology for neutron
spectrometry and dosimetry. Radiation Protection Dosimetry, 126(1-4):408–412, 2007b.

[49] IAEA. Compendium of neutron spectra and detector responses for radiation protection
purposes. Technical Report 403, 2001.

[50] M. P. Iñiguez de la Torre and H. R. Vega Carrillo. Catalogue to select the initial guess
spectrum during unfolding. Nuclear Instruments and Methods in Physics Research A,
476(1):270–273, 2002.
Section 2

Applications
Chapter 5

Comparison Between an
Artificial Neural Network and Logistic Regression in
Predicting Long Term Kidney Transplantation Outcome

Giovanni Caocci, Roberto Baccoli, Roberto Littera,
Sandro Orrù, Carlo Carcassi and Giorgio La Nasa

Additional information is available at the end of the chapter

h““p://dx.doi.org/10.5772/53104

1. Introduction

Predicting clinical outcome following a specific treatment is a challenge that sees physicians and
researchers alike sharing the dream of a crystal ball to read into the future. In Medicine, several
tools have been developed for the prediction of outcomes following drug treatment and other
medical interventions. The standard approach for a binary outcome is to use logistic regression
(LR) [ , ] but over the past few years artificial neural networks (ANNs) have become an increasingly
popular alternative to LR analysis for prognostic and diagnostic classification in clinical
medicine [ ]. The growing interest in ANNs has mainly been triggered by their ability to mimic
the learning processes of the human brain. The network operates in a feed-forward mode from
the input layer through the hidden layers to the output layer. Exactly what interactions are
modeled in the hidden layers is still under study. Each layer within the network is made up of
computing nodes with remarkable data processing abilities. Each node is connected to other nodes
of a previous layer through adaptable inter-neuron connection strengths known as synaptic
weights. ANNs are trained for specific applications through a learning process and knowledge
is usually retained as a set of connection weights [ ]. The backpropagation algorithm and its variants
are learning algorithms that are widely used in neural networks. With backpropagation,
the input data is repeatedly presented to the network. Each time, the output is compared to the
desired output and an error is computed. The error is then fed back through the network and
used to adjust the weights in such a way that with each iteration it gradually declines until the
neural model produces the desired output.

© 2013 Caocci et al.; licensee InTech. This is an open access article distributed under the terms of the Creative
Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use,
distribution, and reproduction in any medium, provided the original work is properly cited.

ANNs have been successfully applied in the fields of mathematics, engineering, medicine,
economics, meteorology, psychology, neurology, and many others. Indeed, in medicine,
they offer a tantalizing alternative to multivariate analysis, although their role remains advisory
since no convincing evidence of any real progress in clinical prognosis has yet been
produced [ ].

In the field of nephrology, there are very few reports on the use of ANNs [ - ], most of which
describe their ability to individuate predictive factors of technique survival in peritoneal dialysis
patients as well as their application to prescription and monitoring of hemodialysis therapy,
analysis of factors influencing therapeutic efficacy in idiopathic membranous nephropathy,
prediction of survival after radical cystectomy for invasive bladder carcinoma and individual risk
for progression to end-stage renal failure in chronic nephropathies.

This all led up to the intriguing challenge of discovering whether ANNs were capable of
predicting the outcome of kidney transplantation after analyzing a series of clinical and
immunogenetic variables.

Figure 1. The prediction of kidney allograft outcome... a dream about to come true?

2. The complex setting of kidney transplantation

Predicting the outcome of kidney transplantation is important in optimizing transplantation
parameters and modifying factors related to the recipient, donor and transplant procedure
[ ]. The biggest obstacles to be overcome in organ transplantation are the risks of acute and
chronic immunologic rejection, especially when they entail loss of graft function despite adjustment
of immunosuppressive therapy. Acute renal allograft rejection requires a rapid increase
in immunosuppression, but unfortunately, diagnosis in the early stages is often
difficult [ ]. Blood tests may reveal an increase in serum creatinine but which cannot be
considered a specific sign of acute rejection since there are several causes of impaired renal
function that can lead to creatinine increase, including excessive levels of some immunosuppressive
drugs. Also during ischemic damage, serum creatinine levels are elevated and so
provide no indication of rejection. Alternative approaches to the diagnosis of rejection are
fine needle aspiration and urine cytology, but the main approach remains histological assessment
of needle biopsy [ ]. However, because the histological changes of acute rejection
develop gradually, the diagnosis can be extremely difficult or late [ ]. Although allograft
biopsy is considered the gold standard, pathologists working in centres where this approach
is used early in the investigation of graft dysfunction are often faced with a certain degree
of uncertainty about the diagnosis. In the past, the Banff classification of renal transplant
pathology provided a rational basis for grading of the severity of a variety of histological
features, including acute rejection. Unfortunately, the reproducibility of this system has
been questioned [ ]. What we need is a simple prognostic tool capable of analyzing the
most relevant predictive variables of rejection in the setting of kidney transplantation.

3. The role of HLA-G in kidney transplantation outcome

Human Leukocyte Antigen G (HLA-G) represents a "non classic" HLA class I molecule,
highly expressed in trophoblast cells [ ]. HLA-G plays a key role in embryo implantation
and pregnancy by contributing to maternal immune tolerance of the fetus and, more specifically,
by protecting trophoblast cells from maternal natural killer (NK) cells through interaction
with their inhibitory KIR receptors. It has also been shown that HLA-G expression by
tumoral cells can contribute to an "escape" mechanism, inducing NK tolerance toward cancer
cells in ovarian and breast carcinomas, melanoma, acute myeloid leukemia, acute lymphoblastic
leukemia and B-cell chronic lymphocytic leukemia [ ]. Additionally it would
seem that HLA-G molecules have a role in graft tolerance following hematopoietic stem cell
transplantation. These molecules exert their immunotolerogenic function towards the main
effector cells involved in graft rejection through inhibition of NK and cytotoxic T lymphocyte
(CTL)-mediated cytolysis and CD4+ T-cell alloproliferation [ ].

The HLA-G transcript generates alternative messenger ribonucleic acids (mRNAs) that encode
membrane-bound (HLA-G , G , G , G ) and soluble protein isoforms (HLA-G , G ,
G ). Moreover, HLA-G allelic variants are characterized by a -basepair (bp) deletion-insertion

polymorphism located in the 3'-untranslated region (3'UTR) of HLA-G. The presence of
the -bp insertion is known to generate an additional splice whereby bases are removed
from the 3'UTR [ ]. HLA-G mRNAs having the -base deletion are more stable than the
complete mRNA forms, and thus determine an increment in HLA-G expression. Therefore,
the -bp polymorphism is involved in the mechanisms controlling post-transcriptional
regulation of HLA-G molecules.

A crucial role has been attributed to the ability of these molecules to preserve graft function
from the insults caused by recipient alloreactive NK cells and cytotoxic T lymphocytes
(CTL) [ ]. This is well supported by the numerous studies demonstrating that high HLA-G
plasma concentrations in heart, liver or kidney transplant patients is associated with better
graft survival [ - ].

Recent studies of association between the HLA-G + -bp/− -bp polymorphism and the
outcome of kidney transplantation have provided interesting, though not always concordant
results [ - ].

4. Kidney transplantation outcome

In one cohort, a total of patients ( %) lost graft function. The patients were divided
into groups according to the presence or absence of HLA-G alleles exhibiting the
-bp insertion polymorphism. The first group included patients ( %) with either
HLA-G + -bp/+ -bp or HLA-G − /+ -bp, whereas the second group included homozygotes
( %) for the HLA-G − -bp polymorphism. The patients had a median age
of years (range - ) and were prevalently males ( %). The donors had a median
age of years (range - ). Nearly all patients ( %) had been given a cadaver donor
kidney transplant and for most of them ( %) it was their first transplant. The
average (±SD) number of mismatches was ± antigens for HLA Class I and ±
antigens for HLA Class II. Average (±SD) cold ischemia time (CIT) was ± hours.
The percentage of patients hyperimmunized against HLA Class I and II antigens (PRA >
%) was higher in the group of homozygotes for the HLA-G -bp deletion. Pre-transplantation
serum levels of interleukin- (IL- ) were lower in the group of homozygotes
for the -bp deletion.

Kidney transplant outcome was evaluated by glomerular filtration rate (GFR), serum creatinine
and graft function tests. At one year after transplantation, a stronger progressive
decline of the estimated GFR, using the abbreviated Modification of Diet in Renal Disease
(MDRD) study equation, was observed in the group of homozygotes for the HLA-G
-bp deletion in comparison with the group of heterozygotes for the -bp insertion.
This difference between the groups became statistically significant at two years
( ml/min/ m², P < , % CI – ) and continued to rise at ( ml/min/ m², P < , % CI – )
and years ( ml/min/ m², P < , % CI – ) after transplantation.
after transplantation.

5. Logistic regression and neural network training

We compared the prognostic performance of ANNs versus LR for predicting rejection in
a group of patients who underwent kidney transplantation. The following clinical
and immunogenetic parameters were considered: recipient gender, recipient age, donor
gender, donor age, patient/donor compatibility, class I HLA-A, -B mismatch ( - ), class
II HLA-DRB mismatch, positivity for anti-HLA Class I antibodies (> %), positivity for
anti-HLA Class II antibodies (> %), IL- (pg/mL), first versus second transplant, antithymocyte
globulin (ATG) induction therapy, type of immunosuppressive therapy (rapamycin,
cyclosporine, corticosteroids, mycophenolate mofetil, everolimus, tacrolimus), time
of cold ischemia, recipients homozygous/heterozygous for the -bp insertion (+ -bp/
+ -bp and + -bp/− -bp) and homozygous for the -bp deletion (− -bp/− -bp). Graft
survival was calculated from the date of transplantation to the date of irreversible graft
failure or graft loss or the date of the last follow up or death with a functioning graft.

ANNs have different architectures, which consequently require different types of algorithms.
The multilayer perceptron is the most popular network architecture in use today
(Figure 2). This type of network requires a desired output in order to learn. The network
is trained with historical data so that it can produce the correct output when the output
is unknown. Until the network is appropriately trained its responses will be random.
Finding an appropriate architecture needs a trial and error method, and this is where back-propagation
steps in. Each single neuron is connected to the neurons of the previous layer
through adaptable synaptic weights. By adjusting the strengths of these connections,
ANNs can approximate a function that computes the proper output for a given input
pattern. The training data set includes a number of cases, each containing values for a
range of well-matched input and output variables. Once the input is propagated to the
output neuron, this neuron compares its activation with the expected training output.
The difference is treated as the error of the network, which is then backpropagated
through the layers, from the output to the input layer, and the weights of each layer are
adjusted such that with each backpropagation cycle the network gets closer and closer to
producing the desired output [ ]. We used the Neural Network Toolbox™ of the software
Matlab®, version (MathWorks, Inc.) to develop a three-layer feed-forward
neural network [ ]. The input layer of neurons was represented by the previously
listed clinical and immunogenetic parameters. These input data were then processed
in the hidden layer ( neurons). The output neuron predicted a number between and
(goal), representing the event "Kidney rejection yes" ( ) or "Kidney rejection no" ( ),
respectively. For the training procedure, we applied the "on-line back-propagation" method
on data sets of patients previously analyzed by LR. The test phases utilized
patients randomly extracted from the entire cohort and not used in the training
phase. Mean sensitivity (the ability of predicting rejection) and specificity (the ability of
predicting no-rejection) of the data sets were determined and compared to LR (Table 1);
a minimal training sketch along these lines is shown after the table.

                           Rejection   Observed Cases   LR Expected cases (%)   ANN Expected cases (%)

Extraction_1 Test N=63     No          55               40 (73)                 48 (87)
                           Yes         8                2 (25)                  4 (50)

Extraction_2 Test N=63     No          55               38 (69)                 48 (87)
                           Yes         8                3 (38)                  4 (50)

Extraction_3 Test N=63     No          55               30 (55)                 48 (87)
                           Yes         8                3 (38)                  5 (63)

Extraction_4 Test N=63     No          55               40 (73)                 49 (89)
                           Yes         8                3 (38)                  5 (63)

Extraction_5 Test N=63     No          55               40 (73)                 46 (84)
                           Yes         8                4 (50)                  6 (75)

Extraction_6 Test N=63     No          55               30 (55)                 34 (62)
                           Yes         8                4 (50)                  6 (75)

Extraction_7 Test N=63     No          55               40 (73)                 47 (85)
                           Yes         8                3 (38)                  5 (63)

Extraction_8 Test N=63     No          55               38 (69)                 46 (84)
                           Yes         8                4 (50)                  5 (63)

Extraction_9 Test N=63     No          55               44 (80)                 51 (93)
                           Yes         8                2 (25)                  4 (50)

Extraction_10 Test N=63    No          55               32 (58)                 52 (95)
                           Yes         8                2 (25)                  5 (63)

Specificity % (mean)       No Rejection                 68%                     85%

Sensitivity % (mean)       YES Rejection                38%                     62%

Table 1. Sensitivity and specificity of Logistic Regression and an Artificial Neural Network in the prediction of kidney
rejection in 10 training and validating datasets of kidney transplant recipients
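To make the training and evaluation procedure concrete, the following minimal MATLAB sketch builds a small three-layer feed-forward network with the Neural Network Toolbox and estimates sensitivity and specificity on a held-out test set. It is only a sketch of the general approach described above: the data are synthetic, and the number of inputs, the hidden-layer size and the train/test split are illustrative placeholders, not the values used in the study.

    % Minimal sketch (not the study's actual script): a three-layer feed-forward
    % network trained with backpropagation to predict a binary rejection outcome.
    % Requires the Neural Network Toolbox; all data and sizes below are synthetic.
    rng(1);
    X = rand(17, 315);                    % 17 hypothetical input parameters, 315 cases
    T = double(rand(1, 315) < 0.13);      % binary target: 1 = rejection, 0 = no rejection

    idx = randperm(315);
    tr  = idx(1:252);  te = idx(253:315); % illustrative training / test split

    net = feedforwardnet(10);             % one hidden layer with 10 neurons
    net = train(net, X(:, tr), T(tr));    % backpropagation training (toolbox default)

    Y = net(X(:, te)) > 0.5;              % threshold the single output neuron
    sensitivity = sum(Y == 1 & T(te) == 1) / sum(T(te) == 1)
    specificity = sum(Y == 0 & T(te) == 0) / sum(T(te) == 0)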

Figure 2. Structure of a three-layered ANN

6. Results and perspectives

ANNs can be considered a useful supportive tool in the prediction of kidney rejection following
transplantation. The decision to perform analyses in this particular clinical setting
was motivated by the importance of optimizing transplantation parameters and modifying
factors related to the recipient, donor and transplant procedure. Another motivation was the
need for a simple prognostic tool capable of analyzing the relatively large number of immunogenetic
and other variables that have been shown to influence the outcome of transplantation.
When comparing the prognostic performance of LR to ANN, the ability of predicting
kidney rejection (sensitivity) was 38% for LR versus 62% for ANN. The ability of predicting
no-rejection (specificity) was 68% for LR compared to 85% for ANN.

The advantage of ANNs over LR can theoretically be explained by their ability to evaluate
complex nonlinear relations among variables. By contrast, ANNs have been faulted for being
unable to assess the relative importance of the single variables while LR determines a
relative risk for each variable. In many ways, these two approaches are complementary and
their combined use should considerably improve the clinical decision-making process and
prognosis of kidney transplantation.

Acknowledgement

We wish to thank Anna Maria Koopmans (affiliations , ) for her precious assistance in preparing
the manuscript.

Author details

Giovanni Caocci , Roberto Baccoli , Roberto Littera , Sandro Orrù , Carlo Carcassi and
Giorgio La Nasa

Division of Hematology and Hematopoietic Stem Cell Transplantation, Department of
Internal Medical Sciences, University of Cagliari, Cagliari, Italy

Technical Physics Division, Faculty of Engineering, Department of Engineering of the
Territory, University of Cagliari, Cagliari, Italy

Medical Genetics, Department of Internal Medical Sciences, University of Cagliari, Cagliari,
Italy

References

[1] Royston P. A strategy for modelling the effect of a continuous covariate in medicine
and epidemiology. Stat Med. – .

[2] Harrell FE, Lee KL, Mark DB. Multivariable prognostic models: issues in developing
models, evaluating assumptions and adequacy, and measuring and reducing errors.
Stat Med. – .

[3] Schwarzer G, Vach W, Schumacher M. On the misuses of artificial neural networks
for prognostic and diagnostic classification in oncology. Stat Med. – .

[4] Soteris A. Kalogirou. Artificial neural networks in renewable energy systems applications:
a review. Renewable and Sustainable Energy Review. – .

[5] Linder R, König IR, Weimar C, Diener HC, Pöppl SJ, Ziegler A. Two models for
outcome prediction - a comparison of logistic regression and neural networks. Methods
Inf Med. – .

[6] Simic-Ogrizovic S, Furuncic D, Lezaic V, Radivojevic D, Blagojevic R, Djukanovic L.
Using ANN in selection of the most important variables in prediction of chronic renal
allograft rejection progression. Transplant Proc. .

[7] Brier ME, Ray PC, Klein JB. Prediction of delayed renal allograft function using an
artificial neural network. Nephrol Dial Transplant. – .

[8] Tang H, Poynton MR, Hurdle JF, Baird BC, Koford JK, Goldfarb-Rumyantzev AS.
Predicting three-year kidney graft survival in recipients with systemic lupus erythematosus.
ASAIO J. – .

[9] Kazi JI, Furness PN, Nicholson M. Diagnosis of early acute renal allograft rejection
by evaluation of multiple histological features using a Bayesian belief network. J Clin
Pathol. – .

[10] Furness PN, Kazi J, Levesley J, Taub N, Nicholson M. A neural network approach to
the diagnosis of early acute allograft rejection. Transplant Proc.

[11] Furness PN. Advances in the diagnosis of renal transplant rejection. Curr. Diag.
Pathol. – .

[12] Rush DN, Henry SF, Jeffery JR, Schroeder TJ, Gough J. Histological findings in early
routine biopsies of stable renal allograft recipients. Transplantation – .

[13] Solez K, Axelsen RA, Benediktsson H, et al. International standardization of criteria
for the histologic diagnosis of renal allograft rejection: the Banff working classification
of kidney transplant pathology. Kidney Int – .

[14] Kovats S, Main EK, Librach C, Stubblebine M, Fisher SJ, DeMars R. A class I antigen,
HLA-G, expressed in human trophoblasts. Science, – .

[15] Carosella ED, Favier B, Rouas-Freiss N, Moreau P, LeMaoult P. Beyond the increasing
complexity of the immunomodulatory HLA-G molecule. Blood – .

[16] Le Rond S, Azéma C, Krawice-Radanne I, Durrbach A, Guettier C, Carosella ED,
Rouas-Freiss N. Evidence to support the role of HLA-G in allograft acceptance
through induction of immunosuppressive/regulatory T cells. Journal of Immunology,
– .

[17] Rouas-Freiss N, Gonçalves RM, Menier C, Dausset J, Carosella ED. Direct evidence to
support the role of HLA-G in protecting the fetus from maternal uterine natural killer
cytolysis. Proc Natl Acad Sci U S A. – .

[18] Lila N, Amrein C, Guillemain R, Chevalier P, Fabiani JN, Carpentier A. Soluble human
leukocyte antigen-G: a new strategy for monitoring acute and chronic rejections
after heart transplantation. J Heart Lung Transplant. – .

[19] Baştürk B, Karakayali F, Emiroğlu R, Sözer O, Haberal A, Bal D, Haberal M. Human
leukocyte antigen-G, a new parameter in the follow-up of liver transplantation.
Transplant Proc. – .

[20] Qiu J, Terasaki PI, Miller J, Mizutani K, Cai J, Carosella ED. Soluble HLA-G expression
and renal graft acceptance. Am J Transplant. – .

[21] Crispim JC, Duarte RA, Soares CP, Costa R, Silva JS, Mendes-Júnior CT, Wastowski
IJ, Faggioni LP, Saber LT, Donadi EA. Human leukocyte antigen-G expression after
kidney transplantation is associated with a reduced incidence of rejection. Transpl
Immunol. – .

[22] Piancatelli D, Maccarone D, Liberatore G, Parzanese I, Clemente K, Azzarone R, Pisani
F, Famulari A, Papola F. HLA-G -bp insertion/deletion polymorphism in kidney
transplant patients with metabolic complications. Transplant Proc. – .

[23] Demuth H, Beale M, Hagan M. Neural Network Toolbox™ User's Guide. The
MathWorks, Inc., Natick, MA.
Chapter 6

Edge Detection in
Biomedical Images Using Self-Organizing Maps

Lucie Gráfová, Jan Mareš, Aleš Procházka and
Pavel Konopásek

Additional information is available at the end of the chapter

h““p://dx.doi.org/10.5772/51468

1. Introduction
The application of self-organizing maps (SOMs) to edge detection in biomedical images
is discussed. The SOM algorithm has been implemented in the MATLAB program suite with
various optional parameters enabling the adjustment of the model according to the user's
requirements. For easier application of the SOM, a graphical user interface has been developed.
The edge detection procedure is a critical step in the analysis of biomedical images, enabling
for instance the detection of the abnormal structure or the recognition of different types
of tissue. The self-organizing map provides a quick and easy approach for edge detection
tasks with satisfying quality of outputs, which has been verified using the high-resolution
computed tomography images capturing the expressions of the Granulomatosis with
polyangiitis. The obtained results have been discussed with an expert as well.

2. Self-organizing map
2.1. Self-organizing map in edge detection performance
The self-organizing map (SOM) [5, 9] is a widely applied approach for clustering and pattern
recognition that can be used in many stages of image processing, e. g. in color image
segmentation [18], generation of a global ordering of spectral vectors [26], image compression
[25], document binarisation [4], etc.
The edge detection approaches based on SOMs are not extensively used. Nevertheless,
there are some examples of SOM utilization in edge detection, e. g. texture edge detection
[27], edge detection by contours [13], or edge detection performed in combination with a
conventional edge detector [24] and methods of image de-noising [8].
In our case, the SOM has been utilized in the edge detection process in order to reduce the image
intensity levels.

© 2013 Gráfová et al.; licensee InTech. This is an open access article distributed under the terms of the
Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits
unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

2.2. Structure of self-organizing map


A SOM has two layers of neurons, see Figure 1.

Figure 1. Structure of SOM

The input layer (size Nx1) represents input data x1 , x2 , . . . , xM (M inputs, each input is
N dimensional). The output layer (size KxL), that may have a linear or 2D arrangement,
represents clusters in which the input data will be grouped. Each neuron of the input layer
is connected with all neurons of the output layer through the weights W (size of the weight
matrix is KxLxN).

2.3. Training of self-organizing map


A SOM is a neural network with an unsupervised type of learning, i. e. no cluster values denoting
an a priori grouping of the data instances are provided.
The learning process is divided into epochs, during which the entire batch of input vectors is
processed. An epoch involves the following steps:

1. Consecutive submission of an input data vector to the network.


2. Calculation of the distances between the input vector and the weight vectors of the neurons
of the output layer.
3. Selection of the nearest (the most similar) neuron of the output layer to the presented
input data vector.
4. An adjustment of the weights.

SOM can be trained in either recursive or batch mode. In recursive mode, the weights of
the winning neurons are updated after each submission of an input vector, whereas in batch
mode, the weight adjustment for each neuron is made after the entire batch of inputs has
been processed, i. e. at the end of an epoch.
The weights adapt during the learning process based on a competition, i. e. the nearest (the
most similar) neuron of the output layer to the submitted input vector becomes a winner and
its weight vector and the weight vectors of its neighbouring neurons are adjusted according
to

W = W + λ φs (xi − W), (1)



where W is the weight matrix, xi the submitted input vector, λ the learning parameter
determining the strength of the learning and φs the neighbourhood strength parameter
determining how the weight adjustment decays with distance from the winner neuron (it
depends on s, the value of the neighbourhood size parameter).
The learning process can be divided into two phases: ordering and convergence. In
the ordering phase, the topological ordering of the weight vectors is established using
reduction of learning rate and neighbourhood size with iterations. In the convergence phase,
the SOM is fine tuned with the shrunk neighbourhood and constant learning rate. [23]
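The following sketch, written in base MATLAB (no toolbox), illustrates the recursive training mode described above: for each submitted input vector the winning neuron is found, a Gaussian neighbourhood strength is computed, and the weights are updated according to Equation (1). The data, map size, decay schedule and parameter values are illustrative choices only, and implicit array expansion (MATLAB R2016b or later) is assumed.

    % Minimal recursive-mode SOM training sketch (illustrative values, base MATLAB).
    rng(1);
    X = rand(100, 3);                 % M = 100 input vectors, each N = 3 dimensional
    K = 4;                            % output layer: 4 neurons in a linear arrangement
    W = X(randperm(100, K), :);       % initialise weights with randomly chosen inputs

    epochs  = 200;
    lambda0 = 1.0;  lambdaF = 0.1;    % initial / final learning rate
    sigma0  = K/2;  sigmaF  = 1.0;    % initial / final neighbourhood size

    for t = 1:epochs
        % exponential decay of learning rate and neighbourhood size
        lambda = lambda0 * (lambdaF/lambda0)^(t/epochs);
        sigma  = sigma0  * (sigmaF /sigma0 )^(t/epochs);
        for m = randperm(size(X, 1))                     % one epoch = all inputs once
            x = X(m, :);
            [~, win] = min(sum((W - x).^2, 2));          % nearest (winning) neuron
            phi = exp(-((1:K)' - win).^2 / (2*sigma^2)); % neighbourhood strength
            W = W + lambda * phi .* (x - W);             % Equation (1)
        end
    end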

2.3.1. Learning Parameter


The learning parameter, corresponding to the strength of the learning, is usually reduced
during the learning process. It decays from the initial value to the final value, which can be
reached already during the learning process, not only at the end of the learning. There are
several common forms of the decay function (see Figure 2):

1. No decay

λ t = λ0 , (2)

2. Linear decay

   λt = λ0 (1 − t/τ),   (3)

3. Gaussian decay

   λt = λ0 exp(−t² / (2τ²)),   (4)

4. Exponential decay

   λt = λ0 exp(−t / τ),   (5)

where T is the total number of iterations, τ the corresponding decay constant, and λ0 and λt are the initial
learning rate and that at iteration t, respectively. The learning parameter should be in the interval ⟨0.01, 1⟩.
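As an illustration, the four decay functions of Equations (2)–(5) can be written as anonymous functions and compared graphically; λ0, τ and the iteration range below are arbitrary example values, and the clipping of the linear decay at zero is an added safeguard rather than part of Equation (3).

    % Learning-rate decay functions of Equations (2)-(5); values are illustrative.
    lambda0 = 1;  tau = 100;  t = 0:300;
    f_none   = @(t) lambda0 * ones(size(t));          % Eq. (2): no decay
    f_linear = @(t) lambda0 * max(1 - t/tau, 0);      % Eq. (3): linear decay (clipped at 0)
    f_gauss  = @(t) lambda0 * exp(-t.^2 / (2*tau^2)); % Eq. (4): Gaussian decay
    f_exp    = @(t) lambda0 * exp(-t/tau);            % Eq. (5): exponential decay
    plot(t, [f_none(t); f_linear(t); f_gauss(t); f_exp(t)]);
    legend('no decay', 'linear', 'Gaussian', 'exponential');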

2.3.2. Neighbourhood
In the learning process not only the winner but also the neighbouring neurons of the winner
neuron learn, i. e. adjust their weights. All neighbour weight vectors are shifted towards
the submitted input vector, however, the winning neuron update is the most pronounced and
the farther away the neighbouring neuron is, the less its weight is updated. This procedure
of the weight adjustment produces topology preservation.
There are several ways how to define a neighbourhood (some of them are depicted in
Figure 3).
The initial value of the neighbourhood size can be up to the size of the output layer, the final
value of the neighbourhood size must not be less than 1. The neighbourhood strength parameter,

determining how the weight adjustment of the neighbouring neurons decays with distance
from the winner, is usually reduced during the learning process (as well as the learning
parameter, analogue of Equations 2–5 and Figure 2). It decays from the initial value to
the final value, which can be reached already during the learning process, not only at the end
of the learning process. The neighbourhood strength parameter should be in the interval
⟨0.01, 1⟩. Figure 4 depicts one possible development of the neighbourhood size and
strength parameters during the learning process.

2.3.3. Weights
The resulting weight vectors of the neurons of the output layer, obtained at the end of
the learning process, represent the centers of the clusters. The resulting patterns of the weight
vectors may depend on the type of the weights initialization. There are several ways to
initialize the weight vectors; some of them are depicted in Figure 5.

2.3.4. Distance Measures


The criterion for victory in the competition of the neurons of the output layer, i. e. the measure
of the distance between the presented input vector and its weight vectors, may have many
forms. The most commonly used are:

1. Euclidean distance

   dj = √( ∑i=1..N (xi − wji)² ),   (6)

2. Correlation

   dj = ∑i=1..N (xi − x̄)(wji − w̄j) / (σx σwj),   (7)

3. Direction cosine

   dj = ∑i=1..N xi wji / (‖x‖ ‖wj‖),   (8)

4. Block distance

   dj = ∑i=1..N |xi − wji|,   (9)

where xi is the i-th component of the input vector, wji the i-th component of the j-th weight vector, N
the dimension of the input and weight vectors, x̄ the mean value of the input vector x, w̄j the mean value
of the weight vector wj, σx the standard deviation of the input vector x, σwj the standard deviation
of the weight vector wj, ‖x‖ the length of the input vector x and ‖wj‖ the length of the weight
vector wj.
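For reference, the four measures of Equations (6)–(9) can be expressed as small helper functions for one input vector x and one weight vector w (column vectors of equal length). The example vectors are arbitrary; note that for the correlation and direction cosine the winner is the neuron with the largest value, whereas for the Euclidean and block distances it is the smallest.

    % Distance / similarity measures of Equations (6)-(9); illustrative helpers.
    euclid = @(x, w) sqrt(sum((x - w).^2));                                 % Eq. (6)
    correl = @(x, w) sum((x - mean(x)) .* (w - mean(w))) / (std(x)*std(w)); % Eq. (7)
    dircos = @(x, w) sum(x .* w) / (norm(x) * norm(w));                     % Eq. (8)
    blockd = @(x, w) sum(abs(x - w));                                       % Eq. (9)

    x = [1; 2; 3];  w = [1.5; 1.8; 3.2];
    d = [euclid(x, w), correl(x, w), dircos(x, w), blockd(x, w)]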

Figure 2. Learning rate decay function (dependence of the value of the learning parameter on the number of iterations): (a) No
decay, (b) Linear decay, (c) Gaussian decay, (d) Exponential decay

Figure 3. Types of neighbourhood: (a) Linear arrangements, (b) Square arrangements, (c) Hexagonal arrangements

Figure 4. Neighbourhood size decay function (dependence of the neighbourhood size on the number of iterations) and
neighbourhood strength decay function (dependence of the value of the neighbourhood strength on the distance from
the winner)

2.3.5. Learning Progress Criterion


The learning progress criterion, minimized over the learning process, is the sum of distances
between all input vectors and their respective winning neuron weights, calculated after
the end of each epoch, according to

   D = ∑i=1..k ∑n∈ci (xn − wi)²,   (10)

where xn is the n-th input vector belonging to cluster ci whose center is represented by wi
(i. e. the weight vector of the winning neuron representing cluster ci).
The weight adjustment corresponding to the smallest learning progress criterion is the result
of the SOM learning process, see Figure 6. These weights represent the cluster centers.
For the best result, the SOM should be run several times with various settings of SOM
parameters to avoid detection of local minima and to find the global optimum on the error
surface plot.
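A direct way to evaluate the criterion of Equation (10), e. g. once per epoch, is sketched below in base MATLAB; the weight matrix W (K x N) and the input set X (M x N) are assumed to exist, for instance from the training sketch given earlier.

    % Learning progress criterion of Equation (10): sum of squared distances between
    % each input vector and the weight vector of its winning neuron.
    % W (K x N) and X (M x N) are assumed to exist, e.g. from the training sketch above.
    D = 0;
    for m = 1:size(X, 1)
        d2 = sum((W - X(m, :)).^2, 2);   % squared distances to all K neurons
        D  = D + min(d2);                % contribution of the winning neuron
    end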

2.3.6. Errors
The errors of trained SOM can be evaluated according to

1. Learning progress criterion (see Equation 10),

2. Normalized learning progress criterion

   E = (1/M) ∑i=1..k ∑n∈ci (xn − wi)²,   (11)

3. Normalized error in the cluster

   E = (1/k) ∑i=1..k (1/Mi) ∑n∈ci (xn − wi)²,   (12)

4. Error in the i-th cluster

   Ei = ∑n∈ci (xn − wi)²,   (13)

5. Normalized error in the i-th cluster

   Ei = (1/Mi) ∑n∈ci (xn − wi)²,   (14)

where xn is the n-th input vector belonging to cluster ci whose center is represented by wi
(i. e. the weight vector of the winning neuron representing cluster ci), M is the number of input
vectors, Mi is the number of input vectors belonging to the i-th cluster and k is the number of clusters.
For more information about the trained SOM, the distribution of the input vectors in
the clusters and the errors of the clusters can be visualized, see Figure 7.

2.3.7. Forming Clusters


The U-matrix (the matrix of average distances between the weight vectors of neighbouring
neurons) can be used for finding realistic and distinct clusters. Another approach
to forming clusters on the map is to utilize any established clustering method
(e. g. K-means clustering) [23, 28].

2.3.8. Validation of Trained Self-organizing Map


The validation of the trained SOM can be done by using one portion of the input data for map
training and another portion for validation (e. g. in the proportion 70:30). A different approach to
SOM validation is n-fold cross-validation with the leave-one-out method [16, 23].

2.4. Using of trained self-organizing map


A trained SOM can be used for clustering according to the following steps (a minimal sketch follows the list):

1. Consecutive submission of an input data vector to the network.


2. Calculation of the distances between the input vector and the weight vectors of the neurons
of the output layer.
3. Selection of the nearest (the most similar) neuron of the output layer (i. e. the cluster) to
the presented input data vector.
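A minimal sketch of these three steps, reusing the weight matrix W from the training sketch above, might look as follows; Xnew is an arbitrary set of new input vectors introduced here for illustration.

    % Clustering new data with a trained SOM (W is K x N, assumed to exist; illustrative).
    Xnew    = rand(10, 3);
    cluster = zeros(size(Xnew, 1), 1);
    for m = 1:size(Xnew, 1)
        [~, cluster(m)] = min(sum((W - Xnew(m, :)).^2, 2));  % index of nearest neuron
    end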

2.5. Implementation of self-organizing map


The SOM algorithm has been implemented in MATLAB program suite [17] with various
optional parameters enabling the adjustment of the model according to the user’s
requirements. For easier application of SOM the graphical user interface has been developed
facilitating above all the setting of the neural network, see Figure 8.

3. Edge detection
Edge detection techniques are commonly used in image processing, above all for feature
detection, feature extraction and segmentation.
The aim of the edge detection process is to detect the object boundaries based on the abrupt
changes in the image tones, i. e. to detect discontinuities in either the image intensity or
the derivatives of the image intensity.

3.1. Conventional edge detector


The image edge is a property attached to a single pixel. However it is calculated from
the image intensity function of the adjacent pixels.
Many commonly used edge detection methods (Roberts [22], Prewitt [20], Sobel [19], Canny
[6], Marr-Hildreth [15], etc.) employ derivatives (the first or the second one) to measure
the rate of change in the image intensity function. The large value of the first derivative and
zero-crossings in the second derivative of the image intensity represent necessary condition
for the location of the edge. The differential operations are usually approximated discretely

by proper convolution mask. Moreover, for simplification of derivative calculation, the edges
are usually detected only in two or four directions.
The essential step in edge detection process is thresholding, i. e. determination of the threshold
limit corresponding to a dividing value for the evaluation of the edge detector response either
as the edge or non-edge. Due to the thresholding, the result image of the edge detection
process is comprised only of the edge (white) and non-edge (black) pixels. The quality
of the thresholding setting has an impact on the quality of the whole edge detection process:
an exceedingly small threshold value leads to noise being assigned as edges, while an
exceedingly large threshold value leads to the omission of some significant
edges.

3.2. Self-organizing map


A SOM may facilitate the edge detection task, for instance by reducing the dimensionality of
the input data or by segmenting the input data.
In our case, the SOM has been utilized for the reduction of image intensity levels, from 256
to 2 levels. Each image has been transformed by a mask (3x3), see Table 1, that forms the set
of input vectors (9-dimensional). The input vectors have then been classified into 2 classes
according to the weights of the beforehand trained SOM. The output set of the classified
vectors has been reversely transformed into the binary (black and white) image.
Due to this image preprocessing using SOM, the following edge detection process has been
strongly simplified, i. e. only the numerical gradient has been calculated

   G = √(Gx² + Gy²),   (15)

where G is the edge gradient, Gx and Gy are values of the first derivative in the horizontal
and in the vertical direction, respectively.
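A minimal sketch of this pipeline is given below: every 3x3 neighbourhood of the image becomes a 9-dimensional input vector, a trained SOM (here represented by a hypothetical 2 x 9 weight matrix W2) assigns it to one of the two intensity levels, and the numerical gradient of Equation (15) marks the edge pixels. The synthetic image, the placeholder weights and the way border pixels are skipped are illustrative assumptions, not the exact implementation used in this work.

    % Illustrative edge detection pipeline: SOM-based intensity reduction + gradient.
    I  = 255 * rand(64, 64);                 % stand-in for a grayscale CT slice
    W2 = rand(2, 9);                         % placeholder for trained SOM weights (2 x 9)
    [R, C] = size(I);
    B = zeros(R, C);                         % two-level (binary) image
    for r = 2:R-1
        for c = 2:C-1
            x = reshape(I(r-1:r+1, c-1:c+1), 1, 9); % 3x3 mask -> 9-dimensional vector
            [~, cls] = min(sum((W2 - x).^2, 2));    % nearest of the two SOM neurons
            B(r, c) = cls - 1;                      % intensity level 0 or 1
        end
    end
    [Gx, Gy] = gradient(B);                  % first derivatives in x and y
    E = sqrt(Gx.^2 + Gy.^2) > 0;             % Equation (15): nonzero gradient = edge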
The ability of SOM for edge detection in biomedical images has been tested using
the high-resolution computed tomography (CT) images capturing the expressions of
the Granulomatosis with polyangiitis disease.

4. Granulomatosis with polyangiitis


4.1. Diagnosis of granulomatosis with polyangiitis
Granulomatosis with polyangiitis (GPA), in the past also known as Wegener's granulomatosis,
is a disease belonging to the group of vasculitic diseases affecting mainly small caliber blood
vessels [7].
They can be distinguished from other vasculitides by the presence of ANCA antibodies
(ANCA - Anti- Neutrophil Cytoplasmic Antibodies). GPA is quite a rare disease, its yearly
incidence is around 10 cases per million inhabitants. The course of the disease is extremely
variable: on the one hand there are cases of organ-limited disease that affects only a single
organ, on the other hand there is a possibility of generalized disease affecting multiple
organs and threatening the patient's life. Diagnostics of this disease has been improved

Figure 5. Weight vectors initialization: (a) Random small numbers, (b) Vectors near the center of mass of inputs, (c) Some of
the input vectors are randomly chosen as the initial weight vectors

Figure 6. The training criterion of the learning process.


The figure shows the example of the evolution of the training criterion (sum of the distances between all input vectors and their
respective winning neuron weights) as the function of the number of epochs. In this case, the smallest sum of distances was
achieved in the 189th epoch

P1 P2 P3
P4 P5 P6
P7 P8 P9

Table 1. The image mask used for the preparation of the set of the input vectors.
The mask (9 adjacent pixels) was moved over the whole image pixel by pixel row-wise and the process continued until
the whole image was scanned. From each location of the scanning mask in the image the single input vector has been formed,
i. e. x = ( P1, P2, P3, P4, P5, P6, P7, P8, P9), where P1–P9 denote the intensity values of the image pixel. The (binary) output
value of SOM for each input vector replaced the intensity value in the position of the pixel with original intensity value P5


Figure 7. The additional information about the SOM result.


(a) The distribution of the input vectors in the clusters, (b) The errors of the clusters (the mean value of the distance between
the respective input vectors and the cluster center with respect to the maximum value of the distance)

by the discovery of ANCA antibodies and their routine investigation since the 1990s. The onset of GPA may occur at any age, although patients typically present
at age 35–55 years [11].
A classic form of GPA is characterized by necrotizing granulomatous vasculitis of the upper
and lower respiratory tract, glomerulonephritis, and small-vessel vasculitis of variable degree.
Because of the respiratory tract involvement nearly all the patients have some respiratory
symptoms including cough, dyspnea or hemoptysis. Due to this, nearly all of them
have a chest X-ray at admission to the hospital, usually followed by a high-resolution
CT scan. The major value of CT scanning is in the further characterization of
lesions found on chest radiography.
The spectrum of high-resolution CT findings of GPA is broad, ranging from nodules and
masses to ground glass opacity and lung consolidation. All of the findings may mimic other
conditions such as pneumonia, neoplasm, and noninfectious inflammatory diseases [2].
The prognosis of GPA depends on the activity of the disease and disease-caused damage
and response to therapy. There are several drugs used to induce remission, including
cyclophosphamide, glucocorticoids or the monoclonal anti-CD20 antibody rituximab. Once
remission is induced, mainly azathioprine or mycophenolate mofetil are used for its
maintenance.
Unfortunately, relapses are common in GPA. Typically, up to half of patients experience
relapse within 5 years [21].

4.2. Analysis of granulomatosis with polyangiitis


Up to now, all analyses of CT images with GPA expressions have been done using manual
measurements and subsequent statistical evaluation [1, 3, 10, 12, 14]. They have been based on
the experience of a radiologist who can find abnormalities in the CT scan and who is able to

classify types of the finding. The standard software 1 used for CT image visualization usually
includes some interactive tools (zoom, rotation, distance or angle measurements, etc.) but no
analysis is done (e. g. detection of the abnormal structure or recognition of different types
of tissue). Moreover, there is no software customized specifically for the requirements of GPA
analysis.
Therefore, there is room to introduce a new approach to analysis based on the SOM. CT
finding analysis then becomes less time consuming and more precise.

5. Results and discussion


The aim of the work has been to detect all three expression forms of the GPA disease
in high-resolution CT images (provided by Department of Nephrology, First Faculty of
Medicine and General Faculty Hospital, Prague, Czech Republic) using the SOM, i. e. to
detect granulomatosin, mass and ground-glass, see Figures 10a, 11a, 12a. The particular
expression forms often occur together; therefore, there has been the requirement to
distinguish the particular expression forms from each other.
Firstly, the SOM has been trained using the test image, see Figure 9 (For detailed information
about the training process, please see Table 2).
Secondly, the edge detection of the validating images has been done using the result weights
from the SOM training process, see Figure 10b, 11b, 12b (For detailed information about
the edge detection process, please see Table 3).

Setting description Value

Size of the output layer (1,2)


Type of the weights initialization Some of the input vectors are randomly
chosen as the initial weight vectors.
Initial value of the learning parameter 1
Final value of the learning parameter 0.1
A point in the learning process in which the learning 0.9
parameter reaches the final value
Initial value of the neighbourhood size parameter 2
Final value of the neighbourhood size parameter 1
A point in the learning process in which 0.75
the neighbourhood size parameter reaches the final
value
Type of the distance measure 1
Type of the learning rate decay function Exponential
Type of neighbourhood size strength function Linear
Type of the neighbourhood size rate decay function Exponential
Number of the epochs 500

Table 2. Setting of the SOM training process

1
Syngo Imaging XS-VA60B, Siemens AG Medical Solutions, Health Services, 91052 Erlangen, Germany

Figure 8. The graphical user interface of the software using SOM for clustering

Stages Description
Image preprocessing Histogram matching of the input image to the test
image.
Image transformation The image is transformed to M masks (3x3) that form
the set of M input vectors (9 dimensional).
Clustering using SOM The set of the input vectors is classified into 2 classes
according to the obtained weights from the SOM
training process.
Reverse image transformation The set of classified input vectors is reversely
transformed into the image. The intensity values are
replaced by the values of the class membership.
Edge detection Computation of image gradient.

Table 3. Stages of the edge detection process



Figure 9. Test image (transverse high-resolution CT image of both lungs)

The obtained results have been discussed with an expert from Department of Nephrology of
the First Faculty of Medicine and General Teaching Hospital in Prague.
In the first case, a high-resolution CT image capturing a granuloma in the left lung has been
processed, see Figure 10a. The expert has been satisfied with the quality of the GPA detection
(see Figure 10b) provided by the SOM, since the granuloma has been detected without any
artifacts.
In the second case, the expert has pointed out that the high-resolution CT image,
capturing both masses and ground-glass (see Figure 11a, 12a), was problematic to
evaluate. The possibility of detecting the masses was aggravated by the ground-glass
surrounding the lower part of the first mass and the upper part of the second mass.
The artifacts, which originated in the coughing movements of the patient (‘wavy’ lower part
of the image), made the detection process of the masses and the ground-glass difficult as
well. Despite these inconveniences, the expert has confirmed, that the presented software
has been able to distinguish between the particular expression forms of GPA with satisfying
accuracy and it has detected the mass and ground-glass correctly (see Figure 11b, 12b, 12c).
In conclusion, the presented SOM approach represents a new helpful approach for GPA
disease diagnostics.



Figure 10. Detection of granuloma.


(a) Transverse high-resolution CT image of both lungs with active granulomatosis (white arrow).
(b) The edge detection result obtained by the SOM. The granulomatosin is detected with sufficient accuracy



Figure 11. Detection of masses.


(a) Coronal high-resolution CT image of both lungs with masses (white arrows). The possibility of detecting the masses is
aggravated by the ground-glass surrounding the lower part of the first mass and the upper part of the second mass. The artifacts
originating from coughing movements of the patient make the detection process difficult as well.
(b) The edge detection result obtained by the SOM. The masses are detected and distinguished from the ground-glass with
sufficient accuracy



Figure 12. Detection of ground-glass.


(a) Coronal high-resolution CT image of both lungs with ground-glasses (white arrows). The possibility of detecting
the ground-glasses is complicated by the masses in close proximity to them. The artifacts originating from coughing
movements of the patient make the detection process difficult as well.
(b) The edge detection result obtained by the SOM. The ground-glasses are detected and distinguished from the masses.
(c) The overlap of the original CT image and the edge detection result (cyan color)

6. Conclusions
The edge detection procedure is a critical step in the analysis of biomedical images, enabling
above all the detection of the abnormal structure or the recognition of different types of
tissue.
The application of SOM for edge detection in biomedical images has been discussed and
its contribution to the solution of the edge detection task has been confirmed. The ability
of SOM has been verified using the high-resolution CT images capturing all three forms
of the expressions of the GPA disease (granulomatosin, mass, ground-glass). Using SOM,
particular expression forms of the GPA disease have been detected and distinguished from
each other. The obtained results have been discussed by the expert who has confirmed that
the SOM provides a quick and easy approach for edge detection tasks with satisfying quality
of output.
Future plans are based on the problem extension to three-dimensional space to
enable CT image analysis involving (i) pathological finding 3D visualization and (ii)
3D reconstruction of the whole region (using the whole set of CT images).

Acknowledgements
The work was supported by specific university research MSMT No. 21/2012, the research
grant MSM No. 6046137306 and PRVOUK-P25/LF1/2.

Author details
Lucie Gráfová1 , Jan Mareš1 , Aleš Procházka1 and Pavel Konopásek2
1 Department of Computing and Control Engineering, Institute of Chemical Technology,
Prague, Czech Republic
2 Department of Nephrology, First Faculty of Medicine and General Faculty Hospital, Prague,

Czech Republic

7. References
[1] Ananthakrishnan, L., Sharma, N. & Kanne, J. P. [2009]. Wegener's granulomatosis in
the chest: High-resolution CT findings, Am J Roentgenol 192(3): 676–82.

[2] Annanthakrishan, L., Sharma, N. & Kanne, J. P. [2009]. Wegener's granulomatosis in
the chest: High-resolution CT findings, Am J Roentgenol 192(3): 676–82.

[3] Attali, P., Begum, R., Romdhane, H. B., Valeyre, D., Guillevin, L. & Brauner, M. W.
[1998]. Pulmonary Wegener's granulomatosis: changes at follow-up CT, European
Radiology 8: 1009–1113.

[4] Badekas, E. & Papamarkos, N. [2007]. Document binarisation using Kohonen SOM, IET
Image Processing 1: 67–84.

[5] Baez, P. G., Araujo, C. S., Fernandez, V. & Procházka, A. [2011]. Differential Diagnosis
of Dementia Using HUMANN-S Based Ensembles, Springer, Berlin, Germany, chapter 14,
pp. 305–324.

[6] Canny, J. [1986]. A computational approach to edge detection, IEEE Transactions on


Pattern Analysis and Machine Intelligence PAMI-8(6): 679–698.

[7] Jennette, J. C. [2011]. Nomenclature and classification of vasculitis: lessons learned
from granulomatosis with polyangiitis (Wegener's granulomatosis), Clin Exp Immunol
164: 7–10.

[8] Jerhotová, E., Švihlík, J. & Procházka, A. [2011]. Biomedical Image Volumes Denoising
via the Wavelet Transform, InTech, chapter 14.

[9] Kohonen, T. [1989]. Self-Organization and Associative Memory, Springer-Verlag.

[10] Komócsi, A., Reuter, M., Heller, M., Muraközi, H., Gross, W. L. & Schnabel, A. [2003]. Active disease and residual damage in treated Wegener's granulomatosis: an observational study using pulmonary high-resolution computed tomography, European Radiology 13: 36–42.

[11] Lane, S. E., Watts, R. & Scott, D. G. I. [2005]. Epidemiology of systemic vasculitis, Curr
Rheumatol Rep 7: 270–275.

[12] Lee, K. S., Kim, T. S., Fujimoto, K., Moriya, H., Watanabe, H., Tateishi, U., Ashizawa, K., Johkoh, T., Kim, E. A. & Kwon, O. J. [2003]. Thoracic manifestation of Wegener's granulomatosis: CT findings in 30 patients, European Radiology 13: 43–51.

[13] Liu, J.-C. & Pok, G. [1999]. Texture edge detection by feature encoding and predictive
model, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal
Processing, pp. 1105–1108.

[14] Lohrmann, C., Uhl, M., Schaefer, O., Ghanem, N., Kotter, E. & Langer, M. [2005]. Serial high-resolution computed tomography imaging in patients with Wegener granulomatosis: Differentiation between active inflammatory and chronic fibrotic lesions, Acta Radiologica 46: 484–491.

[15] Marr, D. & Hildreth, E. [1980]. Theory of edge detection, Proc. Roy. Soc. London, Vol. B.207, pp. 187–217.

[16] Mitchell, T. [1997]. Machine Learning, McGraw-Hill.

[17] Moore, H. [2007]. Matlab for Engineers, Pearson, Prentice Hall.

[18] Moreira, J. & Fontuora, L. D. [1996]. Neural-based color image segmentation and
classification, Anais do IX SIBGRAPI: 47–54.

[19] Pingle, K. K. [1969]. Visual perception by computer. Automatic Interpretation and Classification of Images, Academic Press, New York.

[20] Prewitt, J. M. S. [1970]. Object enhancement and extraction. Picture Processing and
Psychophysics, Academic Press, New York.

[21] Renaudineau, Y. & Meur, Y. L. [2008]. Renal involvement in Wegener's granulomatosis, Clinic Rev Allerg Immunol 35: 22–29.

[22] Roberts, L. G. [1963]. Machine Perception of Three Dimensional Solids, PhD thesis,
Massachusetts Institute of Technology, Electrical Engineering Department.

[23] Samarasinghe, S. [2006]. Neural Networks for Applied Sciences and Engineering: From
Fundamentals to Complex Pattern Recognition.

[24] Sampaziotis, P. & Papamarkos, N. [2005]. Automatic edge detection by combining the Kohonen SOM and the Canny operator, Proceedings of the 10th Iberoamerican Congress conference on Progress in Pattern Recognition, Image Analysis and Applications, pp. 954–965.

[25] Sharma, D. K., Gaur, L. & Okunbor, D. [2007]. Image compression and feature extraction using Kohonen's self-organizing map neural network, Journal of Strategic E-Commerce 5(No. 0): 25–38.

[26] Toivanen, P. J., Ansamäki, J., Parkkinen, J. P. S. & Mielikäinen, J. [2003]. Edge
detection in multispectral images using the self-organizing map, Pattern Recognition
Letters 24: 2987–2994.

[27] Venkatesh, Y. V., K.Raja, S. & Ramya, N. [2006]. Multiple contour extraction from
graylevel images using an artificial neural network, IEEE Transactions on Image Processing
15: 892–899.

[28] Wilson, C. L. [2010]. Mathematical Modeling, Clustering Algorithms and Applications.


Chapter 7

MLP and ANFIS Applied to the Prediction of Hole Diameters in the Drilling Process

Thiago M. Geronimo, Carlos E. D. Cruz, Fernando de Souza Campos, Paulo R. Aguiar and Eduardo C. Bianchi

Additional information is available at the end of the chapter

h““p://dx.doi.org/10.5772/51629

1. Introduction

The control of industrial machining manufacturing processes is of great economic importance due to the ongoing search to reduce raw material and labor wastage. Indirect manufacturing operations such as dimensional quality control generate indirect costs that can be avoided or reduced through the use of control systems [ ]. The use of intelligent manufacturing systems (IMS), which is the next step in the monitoring of manufacturing processes, has been researched through the application of artificial neural networks (ANN) since the  s [ ].

The machining drilling process ranks among the most widely used manufacturing processes in industry in general [ , ]. In the quest for higher quality in drilling operations, ANNs have been employed to monitor drill wear using sensors. Among the types of signals employed is that of machining loads measured with a dynamometer [ , ], electric current measured by applying Hall effect sensors on electric motors [ ], and vibrations [ ], as well as a combination of the above with other devices such as accelerometers and acoustic emission sensors [ ].

This article contributes to the use of MLP [ - ] and ANFIS-type [ - ] artificial intelligence systems programmed in MATLAB to estimate the diameter of drilled holes. The two types of network use the backpropagation method, which is the most popular model for manufacturing applications [ ]. In the experiment, which consisted of drilling single-layer test specimens of 2024-T3 alloy and of Ti6Al4V alloy, an acoustic emission sensor, a three-dimensional dynamometer, an accelerometer, and a Hall effect sensor were used to collect


information about noise frequency and intensity, table vibrations, loads on the x, y and z axes, and electric current in the motor, respectively.

2. Drilling Process

The three machining processes most frequently employed in industry today are turning, milling and boring [ ], the last of which is the least studied. However, it is estimated that today boring with helical drill bits accounts for % to % of all machining processes.
The quality of a hole depends on geometric and dimensional errors, as well as on burrs and surface integrity. Moreover, the type of drilling process, the tool, the cutting parameters and the machine stiffness also affect the precision of the hole [ ].
It is very difficult to generate a reliable analytical model to predict and control hole diameters, since these holes are usually affected by several parameters. Figure 1 illustrates the loads involved in the drilling process, the most representative of which is the feed force FZ, since it affects chip formation and surface roughness.

Figure 1. Loads involved in drilling processes.

3. Artificial Intelligent Systems

3.1. Artificial multilayer perceptron (MLP) neural network


Artificial neural networks are gaining ground as a new information processing paradigm for intelligent systems, which can learn from examples and, based on training, generalize to process a new set of information [ ].
Artificial neurons have synaptic connections that receive information from sensors and have an attributed weight. The sum of the values of the inputs adjusted by the weight of each

synapse is processed and an output is generated. The training error is calculated in each iteration, based on the calculated output and the desired output, and is used to adjust the synaptic weights according to the generalized delta rule:

w_{ij}^{(l)}(n+1) = w_{ij}^{(l)}(n) + \alpha \, \Delta w_{ij}^{(l)}(n-1) + \eta \, \delta_{j}^{(l)}(n) \, y_{i}^{(l-1)}(n)

where η is the learning rate and α is the momentum, parameters that influence the learning speed and its stability, respectively; w_ij^(l) is the weight of each connection and δ_j^(l) is the local gradient calculated from the error signal.
An artificial neural network (ANN) learns by continuously adjusting the synaptic weights at the connections between layers of neurons until a satisfactory response is produced [ ].
In the present work, the MLP network was applied to estimate drilled hole diameters based on an analysis of the data captured by the sensors. The weight readjustment method employed was backpropagation, which consists of propagating the mean squared error generated in the diameter estimation back through each layer of neurons, readjusting the weights of the connections so as to reduce the error in the next iteration.
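To make the update rule concrete, the following minimal Python sketch applies one step of the generalized delta rule with momentum to a single connection. It is an illustration only (the chapter's systems were implemented in MATLAB); the default rate and momentum values are simply those listed for the Ti6Al4V network in Table 2, and all other numbers are arbitrary.

# Minimal sketch of the generalized delta rule with momentum (illustrative only).
def update_weight(w, prev_delta_w, local_gradient, y_prev, eta=0.15, alpha=0.2):
    """Return the new weight and the weight change to reuse in the next iteration.

    w              -- current weight w_ij^(l)(n)
    prev_delta_w   -- previous change Delta w_ij^(l)(n-1), i.e. the momentum term
    local_gradient -- local gradient delta_j^(l)(n) from the back-propagated error
    y_prev         -- output y_i^(l-1)(n) of the neuron feeding this connection
    eta, alpha     -- learning rate and momentum (defaults taken from Table 2)
    """
    delta_w = alpha * prev_delta_w + eta * local_gradient * y_prev
    return w + delta_w, delta_w

# Example: a single update step with arbitrary numbers.
w, dw = 0.5, 0.0
w, dw = update_weight(w, dw, local_gradient=0.1, y_prev=0.8)
print(w, dw)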

Figure 2. Typical architecture of an MLP with two hidden layers.

Figure 2 shows a typical MLP ANN, with m inputs and p outputs, each circle representing a neuron. The outputs of a neuron are used as inputs for a neuron in the next layer.

3.2. Adaptive neuro-fuzzy inference system (ANFIS)


The ANFIS system is based on the functional equivalence, under certain constraints, between RBF (radial basis function) neural networks and TSK-type fuzzy systems [ ]. A single existing output is calculated directly by weighting the inputs according to fuzzy rules. These rules, which constitute the knowledge base, are determined by a computational algorithm based on neural networks. Figure 3 exemplifies the ANFIS model with two input variables x and y and two rules [ ].
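As an illustration of the first-order Sugeno structure sketched in Figure 3 (and not of the chapter's actual model), the short Python example below evaluates a two-rule TSK system; the Gaussian membership parameters and the linear consequent coefficients are invented for demonstration.

import math

# Illustrative two-rule first-order Sugeno model with inputs x and y.
# All membership and consequent parameters below are made-up examples.
def gauss(v, c, s):
    """Gaussian membership function with centre c and width s."""
    return math.exp(-((v - c) ** 2) / (2.0 * s ** 2))

def sugeno_two_rules(x, y):
    # Rule firing strengths: w_i = mu_Ai(x) * mu_Bi(y)
    w1 = gauss(x, 0.0, 1.0) * gauss(y, 0.0, 1.0)
    w2 = gauss(x, 1.0, 1.0) * gauss(y, 1.0, 1.0)
    # First-order consequents: f_i = p_i*x + q_i*y + r_i
    f1 = 0.5 * x + 0.2 * y + 0.1
    f2 = 1.5 * x - 0.3 * y + 0.4
    # The single crisp output is the weighted average of the rule outputs.
    return (w1 * f1 + w2 * f2) / (w1 + w2)

print(sugeno_two_rules(0.3, 0.7))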

Obtaining an ANFIS model that performs well requires taking into consideration the initial number of parameters and the number of inputs and rules of the system [ ]. These parameters are determined empirically, and an initial model is usually created with equally spaced pertinence (membership) functions.

Figure 3. ANFIS architecture for two inputs and two rules based on the first-order Sugeno model.

However, this method is not always efficient because it does not show how many relevant input groups there are. To this end, there are algorithms that help determine the number of pertinence functions, thus enabling one to calculate the maximum number of fuzzy rules.

The subtractive clustering algorithm is used to identify the centers of the data distribution [ ], on which the pertinence curves are centered with pertinence values equal to 1. The number of clusters, the radius of influence of each cluster center, and the number of training iterations to be employed should be defined as parameters for the configuration of the inference system. At each pass, the algorithm seeks the point with the highest potential, i.e., the point most densely surrounded by neighboring points. The potential is calculated by the following equation:

P_i = \sum_{j=1}^{n} \exp\left( -\frac{4}{r_a^{2}} \left\| x_i - x_j \right\|^{2} \right)

where P_i is the potential of the possible cluster, x_i is the possible cluster center, x_j is each point in the neighborhood of the cluster that will be grouped into it, r_a is the radius of influence, and n is the number of points in the neighborhood.
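The potential above can be computed directly from the training samples. The Python sketch below does so for a toy two-dimensional data set; the data points are placeholders, and only the radius of influence (1.25) is borrowed from Table 3.

import math

# Illustrative computation of the subtractive-clustering potential P_i
# for every point of a toy data set (points are placeholders).
def potentials(points, r_a=1.25):
    alpha = 4.0 / (r_a ** 2)
    pots = []
    for xi in points:
        p = sum(math.exp(-alpha * sum((a - b) ** 2 for a, b in zip(xi, xj)))
                for xj in points)
        pots.append(p)
    return pots

data = [(0.0, 0.1), (0.2, 0.0), (1.0, 1.1), (1.1, 0.9)]
print(potentials(data))   # densely surrounded points receive higher potential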

ANFIS is a fuzzy inference system embedded in the working structure of an adaptive neural network. Using a hybrid learning scheme, the ANFIS system is able to map inputs to outputs based on human knowledge and on input-output data pairs [ ]. The ANFIS method is superior to other modeling methods such as autoregressive models, cascade correlation neural networks, backpropagation neural networks, sixth-order polynomials, and linear prediction methods [ ].

4. Methodology

4.1. Signal collection and drilling process

The tests were performed on test specimens composed of a package of sheets of Ti6Al4V titanium alloy and 2024-T3 aluminum alloy, arranged in this order to mimic their use in the aerospace industry. The tool employed here was a helical drill made of hard metal.

A total of nine test specimens were prepared, and  holes were drilled in each one. Thus, a considerable number of data were made available to train the artificial intelligence systems. The data set consisted of the signals collected during drilling and the diameters measured at the end of the process.

The drilling process was monitored using an acoustic emission sensor, a three-dimensional dynamometer, an accelerometer, and a Hall effect sensor, which were arranged as illustrated in Figure 4.

Figure 4. Sensor assembly scheme for testing.

The acoustic emission signal was collected using a Sensis model DM- sensor. The electric power was measured by applying a transducer to monitor the electric current and voltage at the terminals of the electric motor that drives the tool holder. The six signals were sent to a National Instruments PCI- E data acquisition board installed in a computer. LabVIEW software was used to acquire the signals and store them in binary format for subsequent analysis and processing.

To simulate diverse machining conditions, different cutting parameters were selected for each machined test specimen. This method is useful to evaluate the performance of the artificial intelligence systems in response to changes in the process. Each test specimen was labeled as listed in Table 1.

Condition   ID   Spindle [rpm]   Feed Speed [mm/min]

1 1A 1000 90.0

2 1B 1000 22.4

3 1C 1000 250.0

4 2A 500 90.0

5 2B 500 22.4

6 2C 500 250.0

7 3A 2000 90.0

8 3B 2000 22.4

9 3C 2000 250.0

Table 1. Machining conditions used in the tests.

Each pass consisted of a single drilling movement through the workpiece under a given condition. The signals of acoustic emission, loads, cutting power and acceleration shown in Figure 5 were measured in real time at a rate of  samples per second.

4.2. Diameter measurements

Because the roundness of machined holes is not perfect, two measurements were taken of each hole, one of the maximum and the other of the minimum diameter. Moreover, the diameter of the hole in each machined material will also be different due to the material's particular characteristics.

A MAHR MarCator comparator gauge with a precision of ± . mm was used to record the measured dimensions, and the dimensional control results were employed to train the artificial intelligence systems.

4.3. Definition of the architecture of the artificial intelligence systems

The architecture of the systems was defined using all the collected signals, i.e., those of acoustic emission, loads in the x, y and z directions, electrical power, and acceleration. An MLP network and an ANFIS system were created for each test specimen material, due to the differences in the behavior of the signals (see Figure 5) and in the ranges of values found in the measurement of the diameters.

Figure 5. Signals collected during the drilling process of Ti6Al4V alloy (left) and 2024-T3 alloy (right).

4.3.1. Multilayer Perceptron

In this study, the signals from the sensors, together with the maximum and minimum measured diameters, were organized into two matrices, one for each test specimen material. These data were utilized to train the neural network. The entire data set of the tests resulted in  samples, considering the tool breakage during testing under condition  C.

Parameters Ti6Al4V 2024-T3

Neurons in each layer    [5 20 15]    [20 10 5]
Learning rate            0.15         0.3
Momentum                 0.2          0.8
Transfer function        tansig       poslin

Table 2. Architecture of the MLP networks.

The MLP network architecture is defined by establishing the number of hidden layers to be used, the number of neurons contained in each layer, the learning rate and the momentum. An algorithm was created to test combinations of these parameters. The final choice was the combination that appeared among the five smallest errors in the estimate of the maximum and minimum diameters. Parameters such as the number of training iterations and the desired error are used as criteria to terminate the training and were established at  training iterations and  x 10^-  mm, respectively. This procedure was performed for each material,

generating two MLP networks whose configuration is described in Table 2. The remaining parameters were kept at the MATLAB defaults.
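The combination search described above was carried out by the authors in MATLAB; the Python sketch below only illustrates the selection logic (test parameter combinations and keep those with the smallest estimation error). The candidate values are examples, and the train_and_evaluate stand-in simply returns a pseudo-random error so the loop can run end to end.

import itertools, random

# Stand-in for the MATLAB training/evaluation step: here it just returns a
# repeatable pseudo-random validation error so the selection loop can run.
def train_and_evaluate(layers, learning_rate, momentum):
    random.seed(hash((tuple(layers), learning_rate, momentum)) & 0xFFFF)
    return random.uniform(0.01, 0.10)      # pretend mean diameter error in mm

layer_options = [[5, 20, 15], [20, 10, 5], [10, 10]]   # example candidates only
rate_options = [0.15, 0.3]
momentum_options = [0.2, 0.8]

results = []
for layers, rate, mom in itertools.product(layer_options, rate_options, momentum_options):
    err = train_and_evaluate(layers, rate, mom)
    results.append((err, layers, rate, mom))

# Keep the five combinations with the smallest estimation error, as in the text.
for err, layers, rate, mom in sorted(results)[:5]:
    print(round(err, 4), layers, rate, mom)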

4.3.2. ANFIS

The same data matrix employed to train the MLP neural network was used to train the ANFIS system. This system consists of a fuzzy inference system (FIS) that collects the available data, converts them into If-Then type rules by means of pertinence functions, and processes them to generate the desired output. The FIS is influenced by the organization of the training data set, a task performed with the help of MATLAB's Fuzzy toolbox. The subtractive clustering algorithm (subclust) is used to search for similar data clusters in the training set, optimizing the FIS through the definition of the points with the highest potential for the cluster centers. Parameters such as the radius of influence, the inclusion rate and the rejection rate help define the number of clusters, and hence the number of rules of the FIS. Table 3 lists the parameters used in the ANFIS systems. The desired error was set at  x 10^-  mm. The training method consists of a hybrid algorithm combining backpropagation and least-squares estimation.

Parameters Ti6Al4V 2024-T3

Radius of influence    1.25    1.25
Inclusion rate         0.6     0.6
Rejection rate         0.4     0.1

Table 3. ANFIS parameters.

Figure 6. Mean error of hole diameters estimated by the ANFIS system for several training iterations.

Because training in the ANFIS system is performed in batch mode, with the entire training set presented at once, the appropriate number of training iterations was investigated. Thus,

an algorithm was created to test several numbers of training iterations, ranging from  up to  . Figure 6 illustrates the result of this test.
The larger the number of training iterations, the greater the computational effort. Thus, avoiding the peaks (Figure 6), the number of training iterations was set at  , which requires low computational capacity.

5. Influence of inputs on the performance of diameter estimation

5.1. Use of all the collected signals

Initially, the systems were trained using all the collected signals. Given the data set of the sensors and the desired outputs, which consist of the measured diameters, the performance of the neural network is evaluated based on the error between the estimated diameter and the measured diameter, which is shown in graphs.

5.1.1. MLP

For the Ti6Al4V titanium alloy, the estimate of the minimum diameter resulted in a mean error of . mm, with a maximum error of . mm.

Figure 7. Minimum and maximum diameters (actual vs. estimated by the MLP) for the Ti6Al4V alloy.

For the maximum diameter, the resulting mean error was . mm, with a maximum error of . mm. Figure 7 depicts the results of these estimates.
Figure 8 shows the result of the estimation of the hole diameters machined in the 2024-T3 aluminum alloy. The mean error for the minimum diameter was . mm, with a maximum error of . mm. For the maximum diameter, the mean error was . mm, with a maximum error of . mm.

Figure 8. Minimum and maximum diameters (actual vs. estimated by the MLP) for the 2024-T3 alloy.

5.1.2. ANFIS

Figure 9 shows the diameters estimated by ANFIS. The mean error in the estimate of the minimum diameter was . mm, with a maximum error of . mm. For the maximum diameter, the resulting mean error was . mm, and the highest error was . mm.

Figure 9. Minimum and maximum diameters (actual vs. estimated by ANFIS) for the Ti6Al4V alloy.

Figure 10 illustrates the result of the machined hole diameter estimation for the 2024-T3 alloy, using the same network configuration. The mean error for the minimum diameter was . mm, with a maximum error of . mm. The maximum diameter presented a mean error of . mm and a maximum error of . mm.

Figure 10. Minimum and maximum diameters (actual vs. estimated by ANFIS) for the 2024-T3 alloy.

5.2. Isolated and combined use of signals

To optimize the computational effort, an algorithm was created to test the performance of each type of system in response to each of the signals separately or to a combination of two or more signals. This procedure was adopted in order to identify a less invasive estimation method.

Figure 11. Performance of individual and combined signals in the estimation of hole diameters in the Ti6Al4V alloy by the MLP network.

Individual signals and combinations of two distinct signals were tested for the MLP network. The best individual inputs for the Ti6Al4V alloy were the acoustic emission and Z force signals. Combined, the Z force and acceleration signals presented the lowest error. The classified signals are illustrated in Figure 11.

For the 2024-T3 alloy, the best individual input was the Z force. When combined, the Z force and acceleration signals presented the lowest error. Figure 12 depicts the classified signals.

In the ANFIS system, the Z force provided the best individual signal for the estimate of the drilled hole diameter in the Ti6Al4V alloy. The acoustic emission signal combined with the Z force presented the best result among the two-signal combinations, as indicated in Figure 13.

Figure 12. Performance of individual and combined signals in the estimation of hole diameters in the 2024-T3 alloy by the MLP system.

For the aluminum alloy, the ANFIS system presented the best performance with the individual Z force signal and with the combination of the Z force and acoustic emission signals, as indicated in Figure 14.

Figure 13. Performance of individual and combined signals in the estimation of hole diameters in the Ti6Al4V alloy by the ANFIS system.

Figure 14. Performance of individual and combined signals in the estimation of hole diameters in the 2024-T3 alloy by the ANFIS system.

6. Performance using the Z force

Because the performance of the artificial intelligence systems in the tests was highest when using the Z force signal, new tests were carried out with only this signal. The errors were divided into four classes, according to the following criteria: precision of the instrument (≤  μm), tolerance required for precision drilling processes (≤  μm), tolerance normally employed in industrial settings (≤  μm), and errors that would lead to a non-conformity (>  μm). The configurations used in the previous tests were maintained in this test.
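The division of errors into the four classes amounts to a simple threshold test, as the Python sketch below illustrates; the three numerical limits used here are placeholders standing in for the tolerances quoted above, not values taken from the chapter.

# Thresholds in micrometres: placeholder values only.
T_INSTRUMENT = 10.0    # precision of the measuring instrument (assumed)
T_PRECISION = 20.0     # tolerance for precision drilling (assumed)
T_INDUSTRIAL = 50.0    # tolerance normally employed in industry (assumed)

def error_class(abs_error_um):
    """Assign an estimation error (in micrometres) to one of the four classes."""
    if abs_error_um <= T_INSTRUMENT:
        return "within instrument precision"
    if abs_error_um <= T_PRECISION:
        return "precision drilling tolerance"
    if abs_error_um <= T_INDUSTRIAL:
        return "industrial tolerance"
    return "non-conformity"

print(error_class(7.5))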

6.1. MLP

The multilayer perceptron ANN was trained with the information of the Z force and the minimum and maximum measured diameters.

Figure 15. Classification of estimation errors for the 2024-T3 alloy obtained by the MLP network.

The simulation of the MLP network for the aluminum alloy presented errors lower than the precision of the measurement instrument used in  % of the attempts. Thirty-three percent of the

estimates presented errors within the range stipulated for precision holes, and  % fell within the tolerances normally applied in industry in general. Only  % of the estimates performed by the artificial neural network would result in a product conformity rejection.
The simulation of the MLP network for the titanium alloy presented errors lower than the precision of the measurement instrument used in  % of the attempts. Thirty-seven percent of the estimates presented errors within the range stipulated for precision holes, and  % fell within the tolerances normally applied in industry in general. Only  % of the estimates performed by the artificial neural network would result in a product conformity rejection.

Figure 16. Classification of estimation errors for the Ti6Al4V alloy obtained by the MLP network.

6.2. ANFIS
The ANFIS system was simulated in the same way as was done with the MLP network, but this time using only one input, the Z force. This procedure resulted in changes in the FIS structure due to the use of only one input.

Figure 17. Classification of estimation errors for the 2024-T3 alloy obtained by the ANFIS system.

For the aluminum alloy, ANFIS presented errors lower than the precision of the measurement instrument employed in  % of the attempts. Thirty-seven percent of the estimates presented errors within the range stipulated for precision holes, and  % fell within the tolerances normally used in industry in general. Only  % of the estimates performed by the system would result in a product conformity rejection, as indicated in Figure 17.
For the titanium alloy, ANFIS presented errors lower than the precision of the measurement instrument employed in  % of the attempts. Thirty-four percent of the estimates presented

errors within the range stipulated for precision holes, and  % fell within the tolerances normally used in industry. Only  % of the estimates performed by the system would lead to a product conformity rejection, as indicated in Figure 18.

Figure 18. Classification of estimation errors for the Ti6Al4V alloy obtained by the ANFIS system.

7. Conclusions

Artificial intelligence systems are employed today in mechanical manufacturing processes to monitor tool wear and control cutting parameters. This article presented a study of the application of two such systems to the dimensional control of a precision drilling process.

The first system used here consisted of a multilayer perceptron (MLP) artificial neural network. Its performance was marked by the large number of signals used in its training and by its estimation precision, which produced  % of correct responses (errors below  μm) for the titanium alloy and  % for the aluminum alloy. As for its unacceptable error rates, the MLP system generated only  % and  % for the titanium and aluminum alloys, respectively.

The second approach, which involved the application of an adaptive neuro-fuzzy inference system (ANFIS), generated a large number of correct responses using the six available signals, i.e.,  % for the titanium alloy and  % for the aluminum alloy. A total of  % of the errors for the titanium alloy and  % for the aluminum alloy were classified above the admissible tolerances (>  μm).

The results described herein demonstrate the applicability of the two systems in industrial contexts. However, to evaluate the economic feasibility of their application, another method was employed using the signal from only one sensor, whose simulations generated the lowest error among the available signals. Two signals stood out: the Z force and the acoustic emission signals, with the former presenting a better result for the two alloys of the test specimen and the latter presenting good results only in the hole diameter estimation for the titanium alloy. Therefore, the Z force was selected for the continuation of the tests.

The results obtained here are very encouraging in that fewer estimates fell within the range considered inadmissible, i.e., only  % for the aluminum alloy and  % for the titanium alloy, using the MLP network.

The results produced by the ANFIS system also demonstrated a drop in the number of errors outside the expected range, i.e.,  % for the aluminum alloy and  % for the titanium alloy.
Based on the approaches used in this work, it can be stated that the use of artificial intelligence systems in industry, particularly multilayer perceptron neural networks and adaptive neuro-fuzzy inference systems, is feasible. These systems showed high accuracy and low computational effort, as well as a low implementation cost with the use of only one sensor, which implies few physical changes in the equipment to be monitored.

Acknowledgements

The authors gratefully acknowledge the Brazilian research funding agencies FAPESP (São Paulo Research Foundation), for supporting this research work under Process # / - , CNPq (National Council for Scientific and Technological Development) and CAPES (Federal Agency for the Support and Evaluation of Postgraduate Education) for providing scholarships. We are also indebted to the company OSG Sulamericana de Ferramentas Ltda. for manufacturing and donating the tools used in this research.

Author details

Thiago M. Geronimo, Carlos E. D. Cruz, Fernando de Souza Campos, Paulo R. Aguiar and Eduardo C. Bianchi*

*Address all correspondence to: bianchi@feb.unesp.br

Universidade Estadual Paulista "Júlio de Mesquita Filho" (UNESP), Bauru campus, Brazil

References

[ ] Kamen, W. ( ). Industrial Controls and Manufacturing, San Diego, Academic Press.

[ ] Huang, S. H., & Zhang, H.-C. ( , June). Artificial Neural Networks in Manufacturing: Concepts, Applications, and Perspectives. IEEE Trans. Comp. Pack. Manuf. Tech. - Part A.

[ ] Konig, W., & Klocke, F. ( ). Fertigungsverfahren: drehen, fräsen, bohren ( ed.), Berlin, Springer-Verlag.

[ ] Rivero, A., Aramendi, G., Herranz, S., & López de Lacalle, L. N. ( ). An experimental investigation of the effect of coatings and cutting parameters on the dry drilling performance of aluminium alloys. Int J Adv Manuf Technol, , - .

[ ] Panda, S. S., Chakraborty, D., & Pal, S. K. ( ). Monitoring of drill flank wear using a fuzzy back-propagation neural network. Int. J. Adv. Manuf. Technol, , - .

[ ] Yang, X., Kumehara, H., & Zhang, W. ( , August). Back-propagation Wavelet Neural Network Based Prediction of Drill Wear from Thrust Force and Cutting Torque Signals. Computer and Information Science, .

[ ] Li, X., & Tso, S. K. ( ). Drill wear monitoring based on current signals. Wear, , - .

[ ] Abu-Mahfouz, I. ( ). Drilling wear detection and classification using vibration signals and artificial neural network. International Journal of Machine Tools & Manufacture, , - .

[ ] Kandilli, I., Sönmez, M., Ertunç, H. M., & Çakir, B. ( , August). Online Monitoring of Tool Wear in Drilling and Milling by Multi-Sensor Neural Network Fusion. Harbin. Proceedings of the 2007 IEEE International Conference on Mechatronics and Automation, - .

[ ] Haykin, S. ( ). Neural Networks: A Comprehensive Foundation. Patparganj, Pearson Prentice Hall, ed.

[ ] Sanjay, C., & Jyothi, C. ( ). A study of surface roughness in drilling using mathematical analysis and neural networks. Int J Adv Manuf Technol, , - .

[ ] Huang, B. P., Chen, J. C., & Li, Y. ( ). Artificial-neural-networks-based surface roughness Pokayoke system for end-milling operations. Neurocomputing, , - .

[ ] Jang, J.-S. R. ( ). ANFIS: Adaptive-Network-Based Fuzzy Inference System. IEEE Transactions on Systems, Man and Cybernetics, , - .

[ ] Resende, S. O. ( ). Sistemas Inteligentes: Fundamentos e Aplicações, Manole, Barueri, ed.

[ ] Lezanski, P. ( ). An Intelligent System for Grinding Wheel Condition Monitoring. Journal of Materials Processing Technology, , - .

[ ] Lee, K. C., Ho, S. J., & Ho, S. Y. ( ). Accurate Estimation of Surface Roughness from Texture Features of the Surface Image Using an Adaptive Neuro-Fuzzy Inference System. Precision Engineering, , - .

[ ] Johnson, J., & Picton, P. ( ). Concepts in Artificial Intelligence: Designing Intelligent Machines, Oxford, Butterworth-Heinemann.

[ ] Sugeno, M., & Kang, G. T. ( ). Structure Identification of Fuzzy Model. Fuzzy Sets and Systems, , - .

[ ] Lezanski, P. ( ). An Intelligent System for Grinding Wheel Condition Monitoring. Journal of Materials Processing Technology, , - .

[ ] Chiu, S. L. ( ). Fuzzy Model Identification Based on Cluster Estimation. Journal of Intelligent and Fuzzy Systems, , - .

[ ] Lee, K. C., Ho, S. J., & Ho, S. Y. ( ). Accurate Estimation of Surface Roughness from Texture Features of the Surface Image Using an Adaptive Neuro-Fuzzy Inference System. Precision Engineering, , - .

[ ] Drozda, T. J., & Wick, C. ( ). Tool and Manufacturing Engineers Handbook, Machining, SME, Dearborn.
Chapter 8

Integrating Modularity and Reconfigurability for Perfect Implementation of Neural Networks

Hazem M. El-Bakry

Additional information is available at the end of the chapter

h““p://dx.doi.org/10.5772/53021

1. Introduction

In this chapter, we introduce a powerful solution for complex problems that are required to be solved by using neural nets. This is done by using modular neural nets (MNNs) that divide the input space into several homogeneous regions. This approach is applied to implement the XOR function, logic functions on the one-bit level, and a 2-bit digital multiplier. Compared to previous non-modular designs, a salient reduction in the order of computations and in the hardware requirements is obtained.

Modular neural nets (MNNs) present a new trend in neural network architecture design. Motivated by the highly modular biological network, artificial neural net designers aim to build architectures which are more scalable and less subject to interference than traditional non-modular neural nets [ ]. There is now a wide variety of MNN designs for classification. Non-modular classifiers tend to introduce high internal interference because of the strong coupling among their hidden-layer weights [ ]. As a result, slow learning or overfitting can occur during the learning process; sometimes the network cannot be learned at all for complex tasks. Such tasks tend to introduce a wide range of overlap which, in turn, causes a wide range of deviations from efficient learning in the different regions of input space [ ]. Usually there are regions in the class feature space which show high overlap due to the resemblance of two or more input patterns (classes). At the same time, there are other regions which show little or even no overlap, due to the uniqueness of the classes therein. High coupling among hidden nodes will then result in over- and under-learning at different regions [ ]. Enlarging the network, increasing the number and quality of training samples, and techniques for avoiding local minima will not stretch the learning capabilities of the NN classifier beyond a certain limit as long as hidden nodes are tightly coupled, and hence cross-talking,


during learning [ ]. An MNN classifier attempts to reduce the effect of these problems via a divide-and-conquer approach. It generally decomposes the large-size / high-complexity task into several subtasks, each one handled by a simple, fast, and efficient module. Then, sub-solutions are integrated via a multi-module decision-making strategy. Hence, MNN classifiers have generally proved to be more efficient than non-modular alternatives [ ]. However, MNNs cannot offer a real alternative to non-modular networks unless the MNN designer balances the simplicity of the subtasks against the efficiency of the multi-module decision-making strategy. In other words, the task decomposition algorithm should produce subtasks as simple as they can be, but meanwhile the modules have to be able to give the multi-module decision-making strategy enough information to take an accurate global decision [ ].

In a previous paper [ ], we have shown that this model can be applied to realize non-binary data. In this chapter, we prove that MNNs can solve some problems with a smaller amount of requirements than non-MNNs. In section 2, the XOR function and logic functions on the one-bit level are simply implemented using MNNs. Comparisons with conventional non-modular networks are given. In section 3, another strategy for the design of MNNs is presented and applied to realize a 2-bit digital multiplier.

2. Complexity reduction using modular neural networks

In the following subsections, we investigate the usage of MNNs in some binary problems. Here, all MNNs are of feedforward type and are learned by using the backpropagation algorithm. In comparison with non-MNNs, we take into account the number of neurons and weights in both models as well as the number of computations during the test phase.

2.1. A simple implementation for the XOR problem

There are two topologies to realize the XOR function, whose truth table is shown in Table 1, using neural nets. The first uses a fully connected neural net with three neurons, two of which are in the hidden layer and the other in the output layer. There are no direct connections between the input and output layer, as shown in Fig. 1. In this case, the neural net is trained to classify all four patterns at the same time.

x y O/P

0 0 0
0 1 1
1 0 1
1 1 0

Table 1. Truth table of the XOR function.



Figure 1. Realization of the XOR function using three neurons.

The second approach was presented by Minsky and Papert and is realized using two neurons, as shown in Fig. 2, the first representing logic AND and the other logic OR. The value of +1.5 for the threshold of the hidden neuron ensures that it will be turned on only when both input units are on. The value of +0.5 for the output neuron ensures that it will turn on only when it receives a net positive input greater than +0.5. The weight of -2 from the hidden neuron to the output one ensures that the output neuron will not come on when both input neurons are on [ ].

Figure 2. Realization of the XOR function using two neurons.

Using MNNs, we may consider the problem of classifying these four patterns as two individual problems. This can be done in two steps:

1. We deal with each bit alone.

2. Considering the second bit Y, divide the four patterns into two groups.

The first group consists of the first two patterns, which realize a buffer, while the second group, which contains the other two patterns, represents an inverter, as shown in Table 2. The first bit X may be used to select the function.

X Y O/P New Function

0    0    0    Buffer (Y)
0    1    1
1    0    1    Inverter (Ȳ)
1    1    0

Table 2. Results of dividing the XOR patterns.



So, we may use two neural nets, one to realize the buffer and the other to represent the inverter. Each of them may be implemented by using only one neuron. When realizing these two neurons, we implement both sets of weights but perform only one summing operation. The first input X acts as a detector to select the proper weights, as shown in Fig. 3. In a special case, for the XOR function, there is no need for the buffer and the neural net may be represented by using only one weight corresponding to the inverter, as shown in Fig. 4. As a result of using cooperative modular neural nets, the XOR function is realized by using only one neuron. A comparison between the new model and the two previous approaches is given in Table 3. It is clear that the number of computations and the hardware requirements of the new model are smaller than those of the other models.
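A minimal Python sketch of this modular realization is given below: the first input X only selects which single-weight module (buffer or inverter) acts on Y, so a single summing operation is performed per pattern. The weight, bias and threshold values are illustrative choices, not taken from the chapter.

def step(v, threshold=0.5):
    """Hard-limit activation."""
    return 1 if v >= threshold else 0

# Two single-neuron modules operating on Y alone (illustrative weights).
def buffer_module(y):            # realizes O = Y
    return step(1.0 * y)

def inverter_module(y):          # realizes O = NOT Y (bias lifts the inverted input)
    return step(-1.0 * y + 1.0)

def modular_xor(x, y):
    """X acts only as a selector between the buffer and the inverter module."""
    return inverter_module(y) if x == 1 else buffer_module(y)

for x in (0, 1):
    for y in (0, 1):
        print(x, y, modular_xor(x, y))   # reproduces the XOR truth table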

Figure 3. Realization of the XOR function using modular neural nets.

Figure 4. Implementation of the XOR function using only one neuron.



Type of comparison     First model        Second model      New model
                       (three neurons)    (two neurons)     (one neuron)

No. of computations    O(15)              O(12)             O(3)

Hardware               3 neurons,         2 neurons,        1 neuron,
requirements           9 weights          7 weights         2 weights,
                                                            2 switches,
                                                            1 inverter

Table 3. A comparison between different models used to implement the XOR function.

2.2. Implementation of logic functions by using MNNs

Realization of logic functions on the one-bit level (X, Y) generates 16 functions, which are AND, OR, NAND, NOR, XOR, XNOR, X̄, Ȳ, X, Y, 0, 1, X̄Y, XȲ, X̄+Y, X+Ȳ. So, in order to control the selection of each one of these functions, we must have another 4 bits at the input, whereby the total input is 6 bits, as shown in Table 4.

Function C1 C2 C3 C4 X Y O/p

AND 0 0 0 0 0 0 0
0 0 0 0 0 1 0
0 0 0 0 1 0 0
0 0 0 0 1 1 1
........... .... .... .... .... .... .... ....
X+Ȳ 1 1 1 1 0 0 1
1 1 1 1 0 1 0
1 1 1 1 1 0 1
1 1 1 1 1 1 1

Table 4. Truth table of the logic functions (one-bit level) with their control selection.

Non-MNNs can classify these patterns using a network of three layers. The hidden layer contains  neurons, while the output needs only one neuron, and a total number of 65 weights is required. These patterns can be divided into two groups. Each group has an input of 5 bits, while the MSB is 0 for the first group and 1 for the second. The first group requires  neurons and  weights in the hidden layer, while the second needs  neurons and  weights. As a result of this, we may implement only  summing operations in the hidden layer instead of  neurons in the case of non-MNNs, since the MSB is used to select which group of weights must be connected to the neurons in the hidden layer. A similar procedure is done between the hidden and output layers. Fig. 5 shows the structure of the first neuron in the hidden layer. A comparison between MNNs and non-MNNs used to implement the 16 logic functions is shown in Table 5.

Figure 5. Realization of logic functions using MNNs (the first neuron in the hidden layer).

Type of comparison     Realization using non-MNNs    Realization using MNNs

No. of computations    O(121)                        O(54)

Hardware               9 neurons,                    5 neurons, 51 weights,
requirements           65 weights                    10 switches, 1 inverter

Table 5. A comparison between MNNs and non-MNNs used to implement the 16 logic functions.

3. Implementation of a 2-bit digital multiplier by using MNNs

In the previous section, to simplify the problem, we made a division in the input; here is an example of division in the output. According to the truth table shown in Table 6, instead of treating the problem as mapping 4 bits in the input to 4 bits in the output, we may deal with each bit of the output alone. Non-MNNs can realize the 2-bit multiplier with a network of three layers with a total number of 31 weights. The hidden layer contains  neurons, while the output one has 4 neurons. Using MNNs, we may simplify the problem as follows:

W = CA

X = AD ⊕ BC = AD(B̄ + C̄) + BC(Ā + D̄) = (AD + BC)(Ā + B̄ + C̄ + D̄)

Y = BD(Ā + C̄) = BD(Ā + B̄ + C̄ + D̄)

Z = ABCD

Equations ( ), ( ) and ( ) can be implemented using only one neuron each. The third term in Equation ( ) can be implemented using the output from bit Z with a negative inhibitory weight. This eliminates the need to use two neurons to represent Ā and D̄. The equation for X resembles an XOR, but we must first obtain AD and BC. AD can be implemented using only one neuron. Another neuron is used to realize BC and, at the same time, to OR AD with BC as well as to AND the result with the complement of ABCD, as shown in Fig. 6. A comparison between MNNs and non-MNNs used to implement the 2-bit digital multiplier is listed in Table 7.
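The Boolean decomposition above can be checked exhaustively against 2-bit multiplication; the Python sketch below does so under the assumption, suggested by the equations and by Table 6, that A and C are the least significant bits of the two operands.

# Exhaustive check of the Boolean equations for the 2-bit multiplier outputs.
# Assumption: operands are (B A) and (D C), with A and C as least significant bits.
for A in (0, 1):
    for B in (0, 1):
        for C in (0, 1):
            for D in (0, 1):
                W = A & C
                X = (A & D) ^ (B & C)
                Y = (B & D) & (1 - (A & B & C & D))
                Z = A & B & C & D
                value = W + 2 * X + 4 * Y + 8 * Z
                expected = (2 * B + A) * (2 * D + C)
                assert value == expected, (A, B, C, D)
print("all 16 input combinations verified")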

Figure 6. Realization of the 2-bit digital multiplier using MNNs.

Input Patterns Output Patterns


D C B A Z Y X W
0 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0
0 0 1 0 0 0 0 0
0 0 1 1 0 0 0 0
0 1 0 0 0 0 0 0
0 1 0 1 0 0 0 1
0 1 1 0 0 0 1 0
0 1 1 1 0 0 1 1
1 0 0 0 0 0 0 0
1 0 0 1 0 0 1 0
1 0 1 0 0 1 0 0
1 0 1 1 0 1 1 0
1 1 0 0 0 0 0 0
1 1 0 1 0 0 1 1
1 1 1 0 0 1 1 0
1 1 1 1 1 0 0 1

Table 6. Truth table of the 2-bit digital multiplier.



Type of comparison     Realization using non-MNNs    Realization using MNNs

No. of computations    O(55)                         O(35)

Hardware               7 neurons,                    5 neurons,
requirements           31 weights                    20 weights

Table 7. A comparison between MNNs and non-MNNs used to implement the 2-bit digital multiplier.

4. Hardware implementation of MNNs by using reconfigurable circuits

Advances in MOS VLSI have made it possible to integrate neural networks of large sizes on a single chip [ , ]. Hardware realizations make it possible to execute the forward-pass operation of neural networks at high speed, thus making neural networks possible candidates for real-time applications. Other advantages of hardware realizations as compared to software implementations are the lower per-unit cost and the small system size.

Analog circuit techniques provide area-efficient implementations of the functions required in a neural network, namely multiplication, summation, and the sigmoid transfer characteristic [ ]. In this chapter, we describe the design of a reconfigurable neural network in analog hardware and demonstrate experimentally how a reconfigurable artificial neural network approach is used in the implementation of an arithmetic unit that includes a full adder, a full subtractor, a 2-bit digital multiplier, and a 2-bit digital divider.

One of the main reasons for using analog electronics to realize network hardware is that simple analog circuits (for example adders, sigmoid generators, and multipliers) can realize several of the operations in neural networks. Nowadays, there is a growing demand for large as well as fast neural processors to provide solutions for difficult problems. Designers may use either analog or digital technologies to implement neural network models. The analog approach boasts compactness and high speed. On the other hand, digital implementations offer flexibility and adaptability, but only at the expense of speed and silicon area consumption.

4.1. Implementation of an artificial neuron

Implementation of analog neural networks means using only analog computation [ , , ]. An artificial neural network, as the name indicates, is an interconnection of artificial neurons that tends to simulate the nervous system of the human brain [ ]. Neural networks are modeled as simple processors (neurons) that are connected together via weights. The weights can be positive (excitatory) or negative (inhibitory). Such weights can be realized by resistors, as shown in Fig. 7.

Figure 7. Implementation of positive and negative weights using only one opamp.

The computed weights may have positive or negative values. The corresponding resistors that represent these weights can be determined as follows [ ]:

w_{in} = -\frac{R_f}{R_{in}}, \quad i = 1, 2, \ldots, n

W_{pp} = \frac{\left( 1 + \sum_{i}^{n} W_{in} \right) \dfrac{R_o}{R_{pp}}}{1 + \dfrac{R_o}{R_{p1}} + \dfrac{R_o}{R_{p2}} + \cdots + \dfrac{R_o}{R_{pp}}}

The exact values of these resistors can be calculated as presented in [ , ]. The summing circuit accumulates all the input-weighted signals and then passes them to the output through the transfer function [ ]. The main problem with electronic neural networks is the realization of the resistors, which are fixed and present many problems in hardware implementation [ ]. Such resistors are not easily adjustable or controllable. As a consequence, they can be used neither for learning, nor for recall when another task needs to be solved. So the calculated resistors corresponding to the obtained weights can be implemented by using CMOS transistors operating in continuous mode (triode region), as shown in Fig. 8. The equivalent resistance between terminals 1 and 2 is given by [ ]:

Figure 8. Two MOS transistors as a linear resistor.

R_{eq} = \frac{1}{K \left( V_g - V_{th} \right)}
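Putting the two relations together, choosing a feedback resistor fixes the input resistor of every inverting (negative-weight) connection, and the triode expression then gives the gate voltage needed to emulate that resistance with the MOS pair of Fig. 8. The Python sketch below illustrates the arithmetic; the feedback resistance, the transconductance parameter K and the threshold voltage are assumed example values, not component values from the chapter.

# Illustrative sizing of negative-weight resistors and of the MOS gate voltage
# that emulates them (all component values are placeholders).
R_F = 100e3          # feedback resistor, ohms (assumed)
K = 2e-4             # MOS transconductance parameter, A/V^2 (assumed)
V_TH = 0.7           # MOS threshold voltage, volts (assumed)

def input_resistor(weight):
    """R_in for an inverting connection, from w = -R_f / R_in."""
    return R_F / abs(weight)

def gate_voltage(r_eq):
    """Gate voltage giving R_eq = 1 / (K * (V_g - V_th)) in the triode region."""
    return V_TH + 1.0 / (K * r_eq)

for w in (-10.0, -4.5, -0.5):
    r = input_resistor(w)
    print(w, round(r), round(gate_voltage(r), 2))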

4.2. Reconfigurability

The interconnection of synapses and neurons determines the topology of a neural network. Reconfigurability is defined as the ability to alter the topology of the neural network [ ]. Using switches in the interconnections between synapses and neurons permits one to change the network topology, as shown in Fig. 9. These switches are called "reconfiguration switches".

The concept of reconfigurability should not be confused with weight programmability. Weight programmability is defined as the ability to alter the values of the weights in each synapse. In Fig. 9, weight programmability involves setting the values of the weights w1, w2, w3, ..., wn. Although reconfigurability can be achieved by setting the weights of some synapses to zero, this would be very inefficient in hardware.
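The distinction between reconfigurability and weight programmability can be illustrated in a few lines of Python: the reconfiguration switches form a binary mask that disconnects whole synapses, while the stored weight values remain untouched. The neuron below and its numbers are purely illustrative.

# Illustrative neuron with reconfiguration switches: a binary mask decides which
# synapses are physically connected, independently of the stored weight values.
def neuron_output(inputs, weights, switches, bias=0.0):
    total = bias + sum(w * x for w, x, s in zip(weights, inputs, switches) if s)
    return 1 if total >= 0 else 0       # hard-limit activation for simplicity

weights = [7.5, 7.5, -10.0]             # weight programmability: these values can change
switches = [1, 1, 0]                    # reconfigurability: third synapse disconnected
print(neuron_output([1, 1, 1], weights, switches, bias=-10.0))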

Figure 9. Neuron with reconfigurable switches.



Reconfigurability is desirable for several reasons [ ]:

1. Providing a general problem-solving environment.

2. Correcting offsets.

3. Ease of testing.

4. Reconfiguration for isolating defects.

5. Design of an arithmetic and logic unit by using reconfigurable neural networks

In a previous paper [ ], a neural design for logic functions by using modular neural networks was presented. Here, a simple design for the arithmetic unit using reconfigurable neural networks is presented. The aim is to have a complete design for an ALU by using the benefits of both modular and reconfigurable neural networks.

5.1. Implementation of a full adder/full subtractor by using neural networks


The full-adder/full-subtractor problem is solved practically, and a neural network is simulated and implemented using the backpropagation algorithm for the purpose of learning this network [ ]. The network is trained to map the functions of a full adder and a full subtractor. The problem is to classify the patterns shown in Table 8 correctly.

I/P              Full-Adder     Full-Subtractor

x  y  z          C  S           B  D

0 0 0 0 0 0 0
0 0 1 0 1 1 1
0 1 0 0 1 1 1
0 1 1 1 0 1 0
1 0 0 0 1 0 1
1 0 1 1 0 0 0
1 1 0 1 0 0 0
1 1 1 1 1 1 1

Table 8. Truth table of the full-adder/full-subtractor.

The computed values of the weights and their corresponding values of resistors are described in Table 9. After completing the design of the network, simulations were carried out to test both the design and the performance of this network by using H-SPICE. Experimental results confirm the proposed theoretical considerations. Fig. 10 shows the construction of the full-adder/full-subtractor neural network. The network consists of three neurons and 12 connection weights.

I/P Neuron (1) Neuron (2) Neuron (3)

Weight   Resistance    Weight   Resistance    Weight   Resistance

1 7.5 11.8 Ro 15 6.06 Ro 15 6.06 Ro


2 7.5 11.8 Ro 15 6.06 Ro 15 6.06 Ro
3 7.5 11.8 Ro -10 0.1 Rf -10 0.1 Rf
Bias -10.0 0.1 Rf -10 0.1 Rf -10 0.1 Rf

Table 9. Computed weights and their corresponding resistances of the full-adder/full-subtractor.

Figure 10. Full-adder/full-subtractor implementation.

5.2. Hardware implementation of a 2-bit digital multiplier

A 2-bit digital multiplier can be realized easily using the traditional feed-forward artificial neural network [ ]. As shown in Fig. 11, the implementation of the 2-bit digital multiplier using the traditional architecture of a feed-forward artificial neural network requires  neurons and  synaptic weights in the input-hidden layer, and  neurons and  synaptic weights in the hidden-output layer. Hence, the total number of neurons is  with  synaptic weights.

Figure 11. 2-Bit digital multiplier using a traditional feed-forward neural network.

In the present work, a new design of the 2-bit digital multiplier has been adopted. The new design requires only 5 neurons with 20 synaptic weights, as shown in Fig. 12. The network receives two digital words, each word of 2 bits, and the output of the network gives the resulting multiplication. The network is trained with the training set shown in Table 10.

I/P O/P

B2 B1 A2 A1 O4 O3 O2 O1

0 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0
0 0 1 0 0 0 0 0
0 0 1 1 0 0 0 0
0 1 0 0 0 0 0 0
0 1 0 1 0 0 0 1
0 1 1 0 0 0 1 0
0 1 1 1 0 0 1 1
1 0 0 0 0 0 0 0
1 0 0 1 0 0 1 0
1 0 1 0 0 1 0 0
1 0 1 1 0 1 1 0
1 1 0 0 0 0 0 0
1 1 0 1 0 0 1 1
1 1 1 0 0 1 1 0
1 1 1 1 1 0 0 1

Table 10. 2-bit digital multiplier training set.



Figure 12. A novel design for a 2-bit multiplier using a neural network.

During the training phase, these input/output pairs are fed to the network and in each iteration the weights are modified until the optimal values are reached. The optimal values of the weights and their corresponding resistance values are shown in Table 11. The proposed circuit has been realized in hardware and the results have been tested using the H-SPICE computer program. Both the actual and the computer results are found to be very close to the correct results.

Neuron I/P W. Value Resistor


A1 7.5 1200
(1) B1 7.5 1200
Bias -10.0 100
A1 7.5 1450
B2 7.5 1450
(2) Bias -10.0 100
N4 -30.0 33
N5 20.0 618
A2 7.5 1200
B2 7.5 1200
(3)
bias -10.0 100
N4 -10.0 100
A1 3.0 1200
A2 3.0 1200
(4) B1 3.0 1200
B2 3.0 1200
bias -10.0 100
A2 7.5 1200
(5) B1 7.5 1200
Bias -10.0 100

Table 11. Weight values and their corresponding resistance values for the digital multiplier.

5.3. Hardware implementation of a 2-bit digital divider

A 2-bit digital divider can be realized easily using an artificial neural network. As shown in Fig. 13, the implementation of the 2-bit digital divider using a neural network requires  neurons and  synaptic weights in the input-hidden layer, and  neurons and  synaptic weights in the hidden-output layer. Hence, the total number of neurons is 8 with  synaptic weights. The network receives two digital words, each word of 2 bits, and the output of the network gives two digital words, one for the resulting division and the other for the resulting remainder. The network is trained with the training set shown in Table 12.

Figure 13. 2-Bit digital divider using a neural network.

I/P O/P
B2 B1 A2 A1 O4 O3 O2 O1
0 0 0 0 1 1 1 1
0 0 0 1 0 0 0 0
0 0 1 0 0 0 0 0
0 0 1 1 0 0 0 0
0 1 0 0 1 1 1 1
0 1 0 1 0 0 0 1
0 1 1 0 0 1 0 0
0 1 1 1 0 1 0 0
1 0 0 0 1 1 1 1
1 0 0 1 0 0 1 0
1 0 1 0 0 0 0 1
1 0 1 1 1 0 0 0
1 1 0 0 1 1 1 1
1 1 0 1 0 0 1 1
1 1 1 0 0 1 0 1
1 1 1 1 0 0 0 1

Table 12. 2-bit digital divider training set.
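The training set of Table 12 follows directly from 2-bit integer division, with the all-ones output reserved for division by zero. The Python sketch below regenerates it under the bit-ordering read off the table (dividend B2 B1, divisor A2 A1, quotient in O2 O1 and remainder in O4 O3); this ordering is an interpretation of the rows, not stated explicitly in the chapter.

# Regenerate the 2-bit divider training set: dividend = (B2 B1), divisor = (A2 A1),
# quotient in (O2 O1), remainder in (O4 O3); division by zero maps to 1111.
def divider_row(b2, b1, a2, a1):
    dividend = 2 * b2 + b1
    divisor = 2 * a2 + a1
    if divisor == 0:
        return (1, 1, 1, 1)
    q, r = divmod(dividend, divisor)
    return (r >> 1 & 1, r & 1, q >> 1 & 1, q & 1)   # O4 O3 O2 O1

for b2 in (0, 1):
    for b1 in (0, 1):
        for a2 in (0, 1):
            for a1 in (0, 1):
                print(b2, b1, a2, a1, *divider_row(b2, b1, a2, a1))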



The values of the weights and their corresponding resistance values are shown in Table 13.

Neuron I/P W. Val. Resistor

A1 -17.5 56
A2 -17.5 56
(1) B1 5 2700
B2 5 2700
Bias 5 2700

A1 7.5 1200
A2 7.5 1200
(2) B1 -10 100
B2 7.5 1200
Bias -17.5 56

A1 7.5 1200
A2 -10 100
(3)
B2 7.5 1200
Bias -10 100

A1 -4.5 220
A2 7.5 1200
(4) B1 7.5 1200
B2 -4.5 220
Bias -10 100

A1 -20 50
A2 -30 33
B1 10 1200
(5)
B2 25 500
N3 -25 40
Bias 17.5 700

N1 10 1000
(6) N3 10 1000
Bias -5 220

N1 10 1000
(7) N4 10 1000
Bias -5 220

N1 10 1000
(8) N2 10 1000
Bias -5 220

Table 13. Weight values and their corresponding resistance values for the digital divider.

The results have been tested using the H-SPICE computer program. The computer results are found to be very close to the correct results.

The arithmetic operations, namely addition, subtraction, multiplication, and division, can be realized easily using a reconfigurable artificial neural network. The proposed network consists of only  neurons,  connection weights, and  reconfiguration switches. Fig. 14 shows the block diagram of the arithmetic unit using the reconfigurable neural network. The network includes a full adder, a full subtractor, a 2-bit digital multiplier, and a 2-bit digital divider. The proposed circuit is realized in hardware and the results are tested using the H-SPICE computer program. Both the actual and the computer results are found to be very close to the correct results.


Figure 14. Block diagram of the arithmetic unit using a reconfigurable neural network.

The computed values of the weights and their corresponding values of resistors are described in Tables 9, 11 and 13. After completing the design of the network, simulations were carried out to test both the design and the performance of this network by using H-SPICE. Experimental results confirm the proposed theoretical considerations, as shown in Tables 14 and 15.

6. Conclusion

We have presented a new model of neural nets for classifying patterns that appeared expensive to solve using conventional models of neural nets. This approach has been introduced to realize different types of logic problems; it can also be applied to manipulate non-binary data. We have shown that, compared to non-MNNs, the realization of problems using MNNs resulted in a reduction of the number of computations, neurons and weights.

I/P            Neuron (1)              Neuron (2)              Neuron (3)

X Y Z     Practical  Simulated    Practical  Simulated    Practical  Simulated

0 0 0     -2.79      -3.4157      -2.79      -3.4135      -2.79      -3.4135
0 0 1     -2.73      -2.5968       3.46       3.3741       3.46       3.3741
0 1 0     -2.73      -2.5968       3.46       3.2741       3.46       3.3741
0 1 1      3.46       3.3761       3.46       3.4366      -2.75      -3.3081
1 0 0     -2.73      -2.5968      -2.79      -3.4372       3.46       3.3741
1 0 1      3.46       3.3761      -2.75      -3.3081      -2.75      -3.3081
1 1 0      3.46       3.3761      -2.75      -3.3081      -2.75      -3.3081
1 1 1      3.46       3.4231       3.48       3.4120       3.48       3.4120

Table 14. Practical and simulation results after the summing circuit of the full-adder/full-subtractor.

Neuron (1) Neuron (2) Neuron (3) Neuron (4) Neuron (5)

Pract. Sim. Pract. Sim. Pract. Sim. Pract. Sim. Pract. Sim.

-2.79 -3.415 -2.79 -3.409 -2.79 -3.413 -2.79 -3.447 -2.79 -3.415
-2.34 -2.068 -2.72 -2.498 -2.79 -3.314 -2.78 -3.438 -2.79 -3.415
-2.79 -3.415 -2.79 -3.409 -1.63 -1.355 -2.78 -3.438 -2.34 -2.068
-2.34 -2.068 -2.72 -2.498 -1.63 -1.355 -2.78 -3.423 -2.34 -2.068
-2.34 -2.068 -2.79 -3.409 -2.79 -3.413 -2.78 -3.438 -2.34 -2.068
3.46 3.390 -2.72 -2.498 -2.79 -3.413 -2.78 -3.423 -2.34 -2.068
-2.34 -2.068 3.45 3.397 -1.63 -1.355 -2.78 -3.423 3.46 3.390
3.46 3.390 3.45 3.424 -1.63 -1.355 -2.74 -3.384 3.46 3.390
-2.79 -3.415 -2.72 -2.498 -1.63 -1.355 -2.78 -3.438 -2.79 -3.415
-2.34 -2.068 3.45 3.373 -1.63 -1.355 -2.78 -3.423 -2.79 -3.415
-2.79 -3.415 -2.72 -2.498 3.45 3.399 -2.78 -3.423 -2.34 -2.068
-2.34 -2.068 3.45 3.373 3.45 3.399 -2.74 -3.384 -2.34 -2.068
-2.34 -2.068 -2.72 -2.498 -1.63 -1.355 -2.78 -3.423 -2.34 -2.068
3.46 3.390 3.45 3.373 -1.63 -1.355 -2.74 -3.384 -2.34 -2.068
-2.34 -2.068 3.45 3.373 3.45 3.399 -2.74 -3.384 3.46 3.390
3.46 3.390 -2.73 -3.398 -2.70 -2.710 1.86 2.519 3.46 3.390

Table 15. Practical and simulation results after the summing circuit of the 2-bit digital multiplier.

Author details

Hazem M. El-Bakry

Faculty of Computer Science & Information Systems, Mansoura University, Egypt


Chapter 9

Applying Artificial Neural Network Hadron - Hadron


Collisions at LHC

Amr Radi and Samy K. Hindawi

Additional information is available at the end of the chapter

h““p://dx.doi.org/10.5772/51273

Introduction

High Energy Physics (HEP), focusing on particle physics, searches for the fundamental particles and forces which construct the world surrounding us and seeks to understand how our universe works at its most fundamental level. The elementary particles of the Standard Model are gauge bosons (the force carriers) and fermions, which are classified into two groups: leptons (e.g. muons, electrons, etc.) and quarks (the constituents of hadrons such as protons and neutrons).

The study of the interactions between those elementary particles requires enormously high energy collisions, as at the LHC [ - ], currently the highest energy hadron collider in the world, with centre-of-mass energies √s at the TeV scale. Experimental results provide excellent opportunities to discover the missing particles of the Standard Model. In addition, the LHC may point the way towards an understanding of particle physics beyond the Standard Model.

The proton-proton (p-p) interaction is one of the fundamental interactions in high-energy physics. In order to fully exploit the enormous physics potential, it is important to have a complete understanding of the reaction mechanism. The particle multiplicity distributions, among the first measurements made at the LHC, are used to test various particle production models. These models are based on different physics mechanisms and also provide constraints on model features. Some of them are based on the string fragmentation mechanism [ - ] and some are based on Pomeron exchange [ ].

Recently, different modeling methods based on soft computing systems have included the application of Artificial Intelligence (AI) techniques; such evolutionary algorithms have a strong presence in this field [ - ]. The behavior of p-p interactions is complicated due to the nonlinear relationship between the interaction parameters and the output. To understand the interactions of fundamental particles, multivariate data analyses are needed and AI techniques are vital. Those techniques are becoming useful as alternative approaches to conventional ones [ ]. In this sense, AI techniques such as Artificial Neural Networks (ANN) [ ], Genetic Algorithms (GA) [ ], Genetic Programming (GP) [ ] and Gene Expression Programming (GEP) [ ] can be used as alternative tools for the simulation of these interactions [ - , - ].

The motivation for using an NN approach is its learning algorithm, which learns the relationships between the variables in sets of data and then builds models to explain these relationships mathematically.

In this chapter, we discover the functions that describe the multiplicity distribution of the charged shower particles of p-p interactions at different high energies using the GA-ANN technique. The chapter is organized as follows: first, a review of the basics of the NN and GA techniques is given; next, the use of NN and GA to model the p-p interaction is explained; finally, the results and conclusions are presented.

An overview of Artificial Neural Networks (ANN)

An ANN is a network of artificial neurons which can store, gain and utilize knowledge. Some researchers in ANNs decided that the name "neuron" was inappropriate and used other terms, such as "node". However, the use of the term neuron is now so deeply established that its continued general use seems assured. A way to encompass the NNs studied in the literature is to regard them as dynamical systems controlled by synaptic matrices, i.e. Parallel Distributed Processes (PDPs) [ ].

In the following sub-sections we introduce some of the concepts and the basic components of NNs.

Neuron-like Processing Units

A processing neuron implements a simple neural functionality: it forms the summation of the products of the input pattern elements {x_1, x_2, ..., x_p} with their corresponding weights {w_1, w_2, ..., w_p}, plus the bias. Some important concepts associated with this simplified neuron are defined below.

A single-layer network is a single array of neurons, while a multilayer network consists of more than one array of neurons.

Let u_i^ℓ be the i-th neuron in the ℓ-th layer. The input layer is called the x-th layer and the output layer is called the O-th layer. Let n_ℓ be the number of neurons in the ℓ-th layer. The weight of the link between neuron u_j^ℓ in layer ℓ and neuron u_i^{ℓ+1} in layer ℓ+1 is denoted by w_{ij}^ℓ. Let {x^1, x^2, ..., x^P} be the set of input patterns that the network is supposed to learn to classify, and let {d^1, d^2, ..., d^P} be the corresponding desired output patterns. It should be noted that x^p is an n-dimensional vector {x_{1p}, x_{2p}, ..., x_{np}} and d^p is an n-dimensional vector {d_{1p}, d_{2p}, ..., d_{np}}. The pair (x^p, d^p) is called a training pattern.

The output of a neuron u_i in the input layer is simply the input x_{ip} for input pattern p. For the other layers, the net input net_{pi}^{ℓ+1} to a neuron u_i^{ℓ+1} is usually computed as follows:

net_{pi}^{ℓ+1} = \sum_{j=1}^{n_ℓ} w_{ij}^{ℓ} o_{pj}^{ℓ} - θ_i^{ℓ+1}

where o_{pj}^{ℓ} (= x_{pi}^{ℓ+1}) is the output of the neuron u_j^ℓ of layer ℓ and θ_i^{ℓ+1} is the bias value of neuron u_i^{ℓ+1} of layer ℓ+1. For the sake of a homogeneous representation, θ is often substituted by a "bias neuron" with a constant output of 1. This means that biases can be treated like weights, which is done throughout the remainder of the text.

Activation Functions

The activation function converts the neuron input into its activation (i.e. a new state of activation) by f(net_p). This allows the variation of input conditions to affect the output, usually denoted O_p.

The sigmoid function, a non-linear function, is often used as an activation function. The logistic function is an example of a sigmoid function and has the following form:

o_{pi} = f(net_{pi}) = 1 / (1 + e^{-β net_{pi}})

where β determines the steepness of the activation function. In the rest of this chapter we assume that β = 1.
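Putting the last two expressions together, the response of a single neuron can be computed as in the short Python sketch below; the input values, weights and bias are arbitrary illustrative numbers, not taken from any network in this chapter.

import math

def neuron_output(inputs, weights, bias, beta=1.0):
    # net input: weighted sum of the inputs minus the bias
    net = sum(w * x for w, x in zip(weights, inputs)) - bias
    # logistic activation with steepness beta (beta = 1 in this chapter)
    return 1.0 / (1.0 + math.exp(-beta * net))

# arbitrary illustrative values
print(neuron_output(inputs=[0.5, -1.2, 3.0], weights=[0.8, 0.1, -0.4], bias=0.2))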

Network Architectures

Network architectures are of different types: single-layer feedforward, multi-layer feedforward, and recurrent networks [ ]. In this chapter Multi-layer Feedforward Networks are considered; these contain one or more hidden layers. Hidden layers are placed between the input and output layers, and they enable the extraction of higher-order features.

Figure 1. The three layers (input, hidden and output) of neurons are fully interconnected.

The input layer receives an external activation vector and passes it via weighted connections to the neurons in the first hidden layer [ ]. An example of this arrangement, a three-layer NN, is shown in Fig. 1. This is a common form of NN.

Neural Networks Learning

To use a NN, it is essential to have some form of training, through which the values of the weights in the network are adjusted to reflect the characteristics of the input data. When the network is trained sufficiently, it will produce the nearest correct output for a presented set of input data.

A set of well-defined rules for the solution of a learning problem is called a learning algorithm. No unique learning algorithm exists for the design of NNs. Learning algorithms differ from each other in the way in which the adjustment Δw_{ij} to the synaptic weight w_{ij} is formulated. In other words, the objective of the learning process is to tune the weights in the network so that the network performs the desired mapping of input to output activation.

NNs are claimed to have the feature of generalization, through which a trained NN is able to provide correct output data for a set of previously unseen input data. Training determines the generalization capability of the network structure.

Supervised learning is a class of learning rules for NNs in which a teaching signal is provided by telling the network the output required for a given input. Weights are adjusted in the learning system so as to minimize the difference between the desired and actual outputs for each input training datum. An example of a supervised learning rule is the delta rule, which aims to minimize the error function. This means that the actual response of each output neuron in the network approaches the desired response for that neuron. This is illustrated in Fig. 2.

The error e_{pi} for the i-th neuron u_i^o of the output layer o for the training pair (x^p, t^p) is computed as

e_{pi} = t_{pi} - o_{pi}^{o}

This error is used to adjust the weights in such a way that the error is gradually reduced. The training process stops when the error for every training pair is reduced to an acceptable level, or when no further improvement is obtained.

Figure 2. Example of supervised learning.

A method known as "learning by epoch" first sums gradient information for the whole pattern set and then updates the weights. This method is also known as "batch learning", and most researchers use it for its good performance [ ]. Each weight update tries to minimize the summed error of the pattern set. The error function can be defined for one training pattern pair (x^p, d^p) as

E_p = (1/2) \sum_{i=1}^{n_o} e_{pi}^2

Then, the error function can be defined over all the patterns (known as the Total Sum of Squared errors, TSS) as

E = (1/2) \sum_{p=1}^{m} \sum_{i=1}^{n_o} e_{pi}^2

The most desirable condition that we could achieve in any learning algorithm is that the error for each pattern is driven down to an acceptably small value. Obviously, if this condition holds for all patterns in the training set, we can say that the algorithm has found a global minimum.

The weights in the network are changed along a search direction, to drive the weights in the direction of the estimated minimum. The weight updating rule for the batch mode is given by

w_{ij}^{ℓ}(s+1) = Δw_{ij}^{ℓ}(s) + w_{ij}^{ℓ}(s)

where w_{ij}^{ℓ}(s+1) is the updated weight of w_{ij}^{ℓ} of layer ℓ in the s-th learning step, and s is the step number in the learning process.
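A compact illustration of this batch ("learning by epoch") update for a single-layer network with logistic outputs is given below: the TSS error and the summed gradient are accumulated over the whole pattern set before the weights are changed once. The plain gradient-descent step and all numbers are illustrative assumptions; the chapter itself later uses the Levenberg-Marquardt algorithm.

import numpy as np

def batch_epoch(W, X, D, lr=0.5):
    """One 'learning by epoch' update for a single-layer logistic network.
    W: (n_out, n_in) weights; X: (m, n_in) input patterns; D: (m, n_out) targets."""
    O = 1.0 / (1.0 + np.exp(-X @ W.T))       # outputs o_pi for every pattern
    E = D - O                                # errors e_pi = d_pi - o_pi
    tss = 0.5 * np.sum(E ** 2)               # total sum of squared errors
    grad = (E * O * (1.0 - O)).T @ X         # gradient summed over the whole epoch
    return W + lr * grad, tss

rng = np.random.default_rng(0)
X = rng.random((8, 3))                       # eight 3-dimensional training patterns
D = rng.random((8, 2))                       # corresponding 2-dimensional targets
W = rng.normal(size=(2, 3))
for epoch in range(5):
    W, tss = batch_epoch(W, X, D)
    print(epoch, round(tss, 4))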

In training a network, the available input data set consists of many facts and is normally divided into two groups. One group of facts is used as the training data set, and the second group is retained for checking and testing the accuracy of the performance of the network after training. The proposed ANN model was trained using the Levenberg-Marquardt optimization technique [ ].

Data collected from experiments are divided into two sets, namely a training set and a testing set. The training set is used to train the ANN model by adjusting the link weights of the network model; it should include data covering the entire experimental space. This means that the training data set has to be fairly large to contain all the required information and must include a wide variety of data from different experimental conditions, including different formulation compositions and process parameters.

Initially, the training error keeps dropping. If the error stops decreasing, or alternatively starts to rise, the ANN model starts to over-fit the data, and at this point the training must be stopped. In case over-fitting or over-learning occurs during the training process, it is usually advisable to decrease the number of hidden units and/or hidden layers. In contrast, if the network is not sufficiently powerful to model the underlying function, over-learning is not likely to occur, but the training errors will not drop to a satisfactory level.

An overview of Genetic Algorithms

Introduction

Evolutionary Computation (EC) uses computational models of evolutionary processes based on concepts from biological theory. Varieties of these evolutionary computational models have been proposed and used in many applications, including optimization of NN parameters and searching for new NN learning rules. We will refer to them as Evolutionary Algorithms (EAs) [ - ].

EAs are based on the evolution of a population which evolves according to rules of selection and other operators such as crossover and mutation. Each individual in the population is given a measure of its fitness in the environment. Selection favors individuals with high fitness; these individuals are perturbed using the operators. This provides general heuristics for exploration of the environment. The cycle of evaluation, selection, crossover, mutation and survival continues until some termination criterion is met. Although very simple from a biological point of view, these algorithms are sufficiently complex to provide strong and powerful adaptive search mechanisms.

Genetic Algorithms (GAs) were developed by John Holland [ ], who strongly stressed recombination as the energetic potential of evolution [ ]. The notion of using abstract syntax trees to represent programs in GAs, Genetic Programming (GP), was suggested in [ ], first implemented in [ ] and popularised in [ - ]. The term Genetic Programming is used to refer to both tree-based GAs and the evolutionary generation of programs [ , ]. Although similar at the highest level, each of the two varieties implements genetic operators in a different manner; this chapter concentrates on the tree-based variety, and GP is discussed further in the references. In the following two sections, whose descriptions are mainly based on [ , , , , , ], we give more background information about natural and artificial evolution in general, and about GAs in particular.

Natural and Artificial Evolution

As described by Darwin [ ], evolution is the process by which a population of organisms gradually adapts over time to enhance its chances of surviving. This is achieved by ensuring that the stronger individuals in the population have a higher chance of reproducing and creating children (offspring).

In artificial evolution, the members of the population represent possible solutions to a particular optimization problem. The problem itself represents the environment. We must apply each potential solution to the problem and assign it a fitness value, indicating its performance on the problem. The two essential features of natural evolution which we need to maintain are: propagation of more adaptive features to future generations, by applying a selective pressure which gives better solutions a greater opportunity to reproduce; and the heritability of features from parent to children, i.e. we need to ensure that the process of reproduction keeps most of the features of the parent solution and yet allows for variety so that new features can be explored [ ].

The Genetic Algorithm

GAs are powerful search and optimization techniques, based on the mechanics of natural selection [ ]. Some basic terms used are:

• A phenotype is a possible solution to the problem
• A chromosome is an encoded representation of a phenotype in a form that can be used
• A population is the collection of chromosomes that evolves from generation to generation
• A generation (a population set) represents a single step toward the solution
• Fitness is the measure of the performance of an individual on the problem
• Evaluation is the interpretation of the genotype into the phenotype and the computation of its fitness
• Genes are the parts of data which make up a chromosome.

The advantage of GAs is that they have a consistent structure for different problems. Accordingly, one GA can be used for a variety of optimization problems, and GAs are used in a number of different application areas [ ]. A GA is capable of finding good solutions quickly [ ]. Also, the GA is inherently parallel, since a population of potential solutions is maintained.

To solve an optimization problem, a GA requires four components and a termination criterion for the search. The components are: a representation (encoding) of the problem, a fitness evaluation function, a population initialization procedure and a set of genetic operators.

In addition, there is a set of GA control parameters, predefined to guide the GA, such as the size of the population, the method by which genetic operators are chosen, the probabilities of each genetic operator being chosen, the choice of methods for implementing probability in selection, the probability of mutation of a gene in a selected individual, the method used to select a crossover point for the recombination operator and the seed value used for the random number generator.
The structure of a typical GA can be described as follows [ ]. First, an initial population is generated and the fitness of each of its members is computed. A loop is then entered, controlled by whether or not the algorithm's termination criteria are met. Inside this loop a new generation is created: a genetic operator is selected, the required number of parents for that operator is chosen, the operator is applied to generate one or more new children, and the new children are added to the new generation. Once the new generation is complete, the outer loop closes: fitness values are computed for each individual in the new generation, these values are used to guide simulated natural selection, the termination criterion is tested, and the algorithm is either repeated or terminated. A compact sketch of this loop is given below.
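The following Python sketch implements the loop just described. The operator choices (selection from the better half of the ranked population, one-point crossover, Gaussian mutation, a small elitist carry-over) and all numeric parameters are illustrative assumptions, not the settings used by the authors.

import random

def genetic_algorithm(fitness, n_genes, pop_size=30, generations=50,
                      p_cross=0.8, p_mut=0.05):
    # initial population of real-valued chromosomes
    pop = [[random.uniform(-1, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness)          # lower fitness (error) is better
        new_pop = [ranked[0], ranked[1]]           # elitism: keep the two best
        while len(new_pop) < pop_size:             # build the new generation
            p1, p2 = random.sample(ranked[:pop_size // 2], 2)   # parents from the better half
            if random.random() < p_cross:          # one-point crossover
                cut = random.randrange(1, n_genes)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            child = [g + random.gauss(0, 0.1) if random.random() < p_mut else g
                     for g in child]               # Gaussian mutation, gene by gene
            new_pop.append(child)
        pop = new_pop                              # survival: replace the old generation
    return min(pop, key=fitness)

# toy usage: minimise the sum of squares of the genes
best = genetic_algorithm(lambda c: sum(g * g for g in c), n_genes=5)
print([round(g, 3) for g in best])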
The most significant differences of GAs, compared with conventional search methods, are:

• GAs search a population of points in parallel, not a single point
• GAs do not require derivative information (unlike gradient descent methods, e.g. SBP) or other additional knowledge: only the objective function and the corresponding fitness levels affect the directions of search
• GAs use probabilistic transition rules, not deterministic ones
• GAs can provide a number of potential solutions to a given problem
• GAs operate on fixed length representations.

The Proposed Hybrid GA-ANN Modeling

Genetic connectionism combines genetic search and connectionist computation. GAs have been applied successfully to the problem of designing NNs with supervised learning processes, for evolving an architecture suitable for the problem [ - ]. However, these applications do not address the problem of training neural networks, since they still depend on other training methods to adjust the weights.

GAs for Training NNs

GAs have been used for training NNs either with fixed architectures or in combination with constructive/destructive methods. This can be done by replacing traditional learning algorithms such as gradient-based methods [ ]. Not only have GAs been used to perform weight training for supervised learning and for reinforcement learning applications, but they have also been used to select training data and to translate the output behavior of NNs [ - ]. GAs have been applied to the problem of finding NN architectures [ - ], where an architecture specification indicates how many hidden units a network should have and how these units should be connected.

The key process in the evolutionary design of neural architectures is shown in Fig. 3. The topologies of the network have to be fixed before any training process. The definition of the architecture has a great influence on the network performance and on the effectiveness and efficiency of the learning process. As discussed in [ ], the alternative provided by destructive and constructive techniques is not satisfactory.

The design of the network architecture can be viewed as a search in the architecture space in which each point represents a different topology. The search space is huge, even with a limited number of neurons and a controlled connectivity. Additionally, the search space makes things even more difficult in some cases: for instance, networks with different topologies may show similar learning and generalization abilities, while networks with similar structures may have different performances. In addition, the performance evaluation depends on the training method and on the initial conditions (weight initialization) [ ].

Building the architectures by means of GAs is strongly reliant on how the features of the network are encoded in the genotype. Using a bitstring is not necessarily the best approach to evolve the architecture. Therefore, a decision has to be made concerning how the information about the architecture should be encoded in the genotype.

To find good NN architectures using GAs, we should know how to encode architectures (neurons, layers, and connections) in the chromosomes that can be manipulated by the GA. Encoding of NNs onto a chromosome can take many different forms.

Modeling by Using ANN and GA

This study proposes a hybrid model combining ANN and GA (we call it the "GA-ANN hybrid model") for optimization of the weights of feed-forward neural networks, to improve the effectiveness of the ANN model, assuming that the structure of these networks has been decided. The genetic algorithm is run to obtain the optimal parameters of the architecture: the weights and biases of all the neurons, which are joined to create vectors. We construct a genetic algorithm which can search for the global optimum of the number of hidden units and the connection structure between the input and the output layers. During the weight training and adjusting process, the fitness function of a neural network can be defined by considering the error, i.e. the difference between the target and actual outputs. In this work, we defined the fitness function as the sum of squared errors (SSE). The approach is to use a GA-ANN model that is intelligent enough to discover functions for the p-p interaction mean multiplicity distribution of charged particles with respect to the total center of mass energy √s. The model is trained and validated using experimental data to simulate the p-p interaction. To allow GA-ANN to discover a new model and to test it, the data sets are subdivided into two sets (training and prediction): GA-ANN discovers a new model by using the training set, while the prediction set is used to examine its generalization capabilities. To measure the error between the experimental data and the simulated data we used a statistical measure: the total deviation of the response values from the fit to the response values, also called the summed square of residuals and usually labeled SSE. The statistical measure of the sum of squared errors (SSE) is

SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2

where \hat{y}_i = b_0 + b_1 x_i is the predicted value for x_i and y_i is the observed data value occurring at x_i.
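To make the fitness definition concrete, a minimal sketch is given below of how a chromosome encoding the weights and biases of a small feed-forward network can be scored by this SSE. The 3-5-1 architecture, the tanh hidden activation and the flat chromosome layout are illustrative assumptions, not the exact network reported in Appendix A.

import numpy as np

def decode(chromosome, n_in=3, n_hidden=5):
    """Unpack a flat chromosome into the weights and biases of a 3-n_hidden-1 network."""
    c = np.asarray(chromosome)
    w1 = c[:n_in * n_hidden].reshape(n_hidden, n_in)
    b1 = c[n_in * n_hidden:n_in * n_hidden + n_hidden]
    w2 = c[n_in * n_hidden + n_hidden:-1]
    b2 = c[-1]
    return w1, b1, w2, b2

def sse_fitness(chromosome, X, y):
    """SSE between the measured values y and the network predictions for inputs X."""
    w1, b1, w2, b2 = decode(chromosome)
    hidden = np.tanh(X @ w1.T + b1)
    y_hat = hidden @ w2 + b2
    return float(np.sum((y - y_hat) ** 2))

# toy usage with random data and a random chromosome of length 3*5 + 5 + 5 + 1 = 26
rng = np.random.default_rng(1)
X = rng.random((10, 3)); y = rng.random(10)
print(sse_fitness(rng.normal(size=26), X, y))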

Figure 3. Overview of GA-ANN hybrid model.



The proposed GA-ANN hybrid model has been used to model the multiplicity distribution of the charged shower particles. The proposed model was trained using the Levenberg-Marquardt optimization technique [ ]. The architecture of the GA-ANN has three inputs and one output. The inputs are the charged particle multiplicity n, the total center of mass energy √s, and the pseudo-rapidity η; the output is the charged particle multiplicity distribution P(n). Figure 3 shows the schematic of the GA-ANN model.

Data collected from experiments are divided into two sets, namely a training set and a testing set. The training set is used to train the GA-ANN hybrid model. The testing data set is used to confirm the accuracy of the proposed model; it ensures that the relationship between inputs and outputs, based on the training and test sets, is real. The data set is divided into two groups, one for training and one for testing. For completeness, the final weights and biases after training are given in Appendix A.

Results and discussion

The input patterns of the designed GA-ANN hybrid have been trained to produce target patterns that model the pseudo-rapidity distribution. The fast Levenberg-Marquardt algorithm (LMA) has been employed to train the ANN. In order to obtain the optimal structure of the ANN, we have used the GA in a hybrid model.

Simulation results based on both the ANN and the GA-ANN hybrid model, modeling the distribution of shower charged particles produced in p-p collisions at different values of the total center of mass energy √s, are given in Figure 4 (a, b and c respectively). We notice that the curves obtained by the trained GA-ANN hybrid model show a very close fit to the experimental data in all three cases.

The GA-ANN hybrid model is therefore able to model the charged particle multiplicity distribution accurately. The total sum of squared errors (SSE) and the weights and biases used for the designed network are provided in Appendix A.

Structure                      Number of connections    Error values    Learning rule

ANN: 3 x 15 x 15 x 1           285                      0.01            LMA

GA optimization structure      229                      0.0001          GA

Table 1. Comparison between the different training algorithms (ANN and GA-ANN) for the charged particle multiplicity distribution.

In this model we have obtained the minimum error (0.0001) by using the GA. Table 1 shows a comparison between the ANN model and the GA-ANN model for the prediction of the pseudo-rapidity distribution. In the 3 x 15 x 15 x 1 ANN structure, we used 285 connections and obtained an error equal to 0.01, while the GA-ANN model uses only 229 connections. In the ANN model the error can be decreased further by increasing the number of connections, but this needs more calculations. By using the GA optimization search, we obtained a structure which minimizes the number of connections to only 229 with an error of 0.0001. This indicates that the GA-ANN hybrid model is more efficient than the ANN model.

Figure 4. ANN and GA-ANN simulation results for the charged particle multiplicity distribution of p-p showers.

Conclusions

This chapter has presented GA-ANN as a new technique for constructing the functions of the multiplicity distribution of charged particles, P_n(n, η, √s), of the p-p interaction. The discovered models show a good match to the experimental data. Moreover, they are capable of reproducing experimental data for P_n(n, η, √s) that were not used in the training session.

Consequently, the tested values of P_n(n, η, √s) in terms of the same parameters are in good agreement with the experimental data from the Particle Data Group. Finally, we conclude that GA-ANN has become one of the important research areas in the field of high energy physics.

Appendices

The efficient ANN structure is given in the form [i x j x k x m].

The weight coefficients after training are:

Wji = [3.5001 -1.0299 1.6118


0.7565 -2.2408 3.2605
-1.4374 1.1033 -3.1349
2.0116 2.8137 -1.7322
-3.6012 -1.5717 -0.2805
-1.6741 -2.5844 2.7109
-2.0600 -3.1519 1.2488
-0.1986 1.0028 -4.0855
2.6272 0.8254 3.6292
-2.3420 3.0259 -1.9551
-3.2561 0.4683 3.0896
1.2442 -0.8996 -3.4896
-3.2589 -1.1887 2.0875
-1.0889 -1.2080 4.3688
-2.7820 -1.4291 2.3577
3.1861 -0.6309 2.0691
3.4979 0.2456 -2.6633
-0.4889 2.4145 -2.8041
2.1091 -0.1359 -3.4762
-0.1010 4.1758 -0.2120
3.5538 -1.5615 -1.4795
-3.4153 1.2517 2.1415
2.6232 -3.0757 0.0831
1.7632 1.9749 -2.5519
7.6987 0.0526 0.4267].

Wkj = [0.3294 0.5006 0.0421 0.3603 0.5147


0.5506 0.2498 0.2678 0.2670 0.3568
0.3951 0.2529 0.2169 0.4323 0.0683
0.1875 0.2948 0.2705 0.2209 0.1928
0.2207 0.6121 0.0693 0.0125 0.4214
0.4698 0.0697 0.4795 0.0425 0.2387
0.1975 0.1441 0.2947 0.1347 0.0403
0.0745 0.2345 0.1572 0.2792 0.3784
0.1043 0.4784 0.2899 0.2012 0.4270
0.5578 0.7176 0.3619 0.2601 0.2738
0.1081 0.2412 0.0074 0.3967 0.2235
0.0466 0.0407 0.0592 0.3128 0.1570
0.4321 0.4505 0.0313 0.5976 0.0851
0.4295 0.4887 0.0694 0.3939 0.0354
0.1972 0.1416 0.1706 0.1719 0.0761

Columns 6 through 10

0.2102 0.0185 -0.1658 -0.1943 -0.4253


0.2685 0.4724 0.4946 -0.3538 0.1559
0.3198 0.1207 0.5657 -0.3894 0.1497
-0.5528 0.4031 0.5570 0.4562 -0.5802
0.3498 -0.3870 0.2453 0.4581 0.2430
0.2047 -0.0802 0.1584 0.2806 -0.2790
0.0981 -0.5055 0.2559 -0.0297 -0.2058
-0.3498 -0.5513 0.0022 -0.3034 0.2156
-0.6226 -0.4085 0.4338 -0.0441 -0.4801
-0.0093 0.0875 0.0815 0.3935 0.1840
0.0063 0.2790 0.7558 0.3383 0.5882
-0.5506 -0.0518 0.5625 0.2459 -0.0612
0.0036 0.4404 -0.3268 -0.5626 -0.2253
0.5591 -0.2797 -0.0408 0.1302 -0.4361

Columns 11 through 15

-0.6123 0.4833 -0.0457 0.3927 -0.3694


-0.0746 -0.0978 0.0710 -0.7610 0.1412
-0.3373 0.4167 0.3421 -0.0577 0.2109
0.2422 0.2013 -0.1384 -0.3700 -0.4464
0.0868 -0.5964 -0.0837 -0.7971 -0.4299
-0.6500 -1.1315 -0.4557 1.6169 -0.3205
0.2205 1.0185 0.4752 -0.4155 0.1614
1.2311 0.0061 -0.0539 0.6813 0.9395
-0.4295 -0.3083 0.2768 -0.1151 0.0802
-0.6988 0.2346 -0.3455 0.0432 0.1663
-0.0601 0.0527 0.3519 0.3520 -0.7821
-0.6241 -0.1201 -0.4317 0.7441 0.7305
0.5433 -0.6909 0.4848 -0.3888 0.3710
-0.6920 -0.0190 -0.4892 0.1678 0.0808
-0.3752 -0.1745 -0.7304 0.0462 -0.3883].

Wmk = [ . . . - . - . - . - . . . - . .
. . - . . ].
ηi = [- . - . . ].
ηj = [- . - . . - . . . . .
- . . . - . . . - . ].
ηk = [ . - . . - . . . - . - .
- . - . . - . . - . ].
ηm = [- . ].
The optimized GA-ANN

The standard GA has been used, with fixed values of the number of generations, the population size, the crossover probability and the mutation probability; the fitness function is the SSE. The neural network was optimized accordingly.

Acknowledgements

The authors highly acknowledge and deeply appreciate the support of the Egyptian Academy of Scientific Research and Technology (ASRT) and the Egyptian Network for High Energy Physics (ENHEP).

Author details

Amr Radi* and Samy K. Hindawi

*Address all correspondence to: Amr.radi@cern.ch

Department of Physics, Faculty of Sciences, Ain Shams University, Abbassia, Cairo, Egypt / Center of Theoretical Physics at the British University in Egypt (BUE), Egypt

Department of Physics, Faculty of Sciences, Ain Shams University, Abbassia, Cairo, Egypt

References

[ ] CMS Collaboration. J. High Energy Phys.

[ ] CMS Collaboration. J. High Energy Phys.

[ ] CMS Collaboration. Phys. Rev. Lett.

[ ] ATLAS Collaboration. Phys. Lett. B.

[ ] ATLAS Collaboration. Phys. Lett. B.

[ ] ALICE Collaboration. Phys. Lett. B.

[ ] ALICE Collaboration. Eur. Phys. J. C.

[ ] TOTEM Collaboration. EPL.

[ ] Jacob, M., & Slansky, R. Phys. Rev. D.

[ ] Hwa, R. Phys. Rev. D.

[ ] Hwa, R. Phys. Rev. Lett.

[ ] Engel, R., Ranft, J., & Roesler, S. Phys. Rev. D.

[ ] Teodorescu, L., & Sherwood, D. Comput. Phys. Commun.

[ ] Teodorescu, L. IEEE Trans. Nucl. Sci.

[ ] Link, J. M. Nucl. Instrum. Meth. A.

[ ] El-Bakry, S. Yaseen, & Radi, Amr. Int. J. Mod. Phys. C.

[ ] El-dahshan, E., Radi, A., & El-Bakry, M. Y. Int. J. Mod. Phys. C.

[ ] Whiteson, S., & Whiteson, D. Eng. Appl. Artif. Intel.

[ ] Haykin, S. Neural Networks: A Comprehensive Foundation (2nd ed.). Prentice Hall.

[ ] Holland, J. H. Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor.

[ ] Koza, J. R. Genetic Programming: On the Programming of Computers by Means of Natural Selection. The MIT Press, Cambridge, MA.

[ ] Ferreira, C. Gene Expression Programming: Mathematical Modeling by an Artificial Intelligence. 2nd Edition, Springer-Verlag, Germany.

[ ] Eiben, A. E., & Smith, J. E. Introduction to Evolutionary Algorithms. Springer, Berlin.

[ ] Radi, Amr. Discovery of Neural Network Learning Rules Using Genetic Programming. PhD thesis, School of Computer Science, Birmingham University.

[ ] Teodorescu, L. High energy physics data analysis with gene expression programming. In IEEE Nuclear Science Symposium Conference Record.

[ ] Hagan, M. T., & Menhaj, M. B. Training feedforward networks with the Marquardt algorithm. IEEE Transactions on Neural Networks.

[ ] Back, T. Evolutionary Algorithms in Theory and Practice. Oxford University Press, New York.

[ ] Fogel, D. B. An Introduction to Simulated Evolutionary Optimization. IEEE Trans. Neural Networks.

[ ] Back, T., Hammel, U., & Schwefel, H. P. Evolutionary Computation: Comments on the History and Current State. IEEE Trans. Evolutionary Computation.

[ ] Holland, J. H. Adaptation in Natural and Artificial Systems. The University of Michigan Press, Ann Arbor, Michigan.

[ ] Fogel, D. B. Evolutionary Computation: Toward a New Philosophy of Machine Intelligence. IEEE Press, Piscataway, NJ.

[ ] Goldberg, D. E. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, New York.

[ ] Forsyth, Richard. BEAGLE: A Darwinian Approach to Pattern Recognition. Kybernetes.

[ ] Cramer, Nichael Lynn. A Representation for the Adaptive Generation of Simple Sequential Programs. In Proceedings of an International Conference on Genetic Algorithms and their Applications, Grefenstette, John J. (ed.), CMU.

[ ] Koza, J. R. Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press.

[ ] Koza, J. R. Genetic Programming II: Automatic Discovery of Reusable Programs. MIT Press.

[ ] Koza, J. R., Bennett, F. H., Andre, D., & Keane, M. A. Genetic Programming III: Darwinian Invention and Problem Solving. Morgan Kaufmann.

[ ] Banzhaf, W., Nordin, P., Keller, R. E., & Francone, F. D. Genetic Programming: An Introduction: On the Automatic Evolution of Computer Programs and its Applications. Morgan Kaufmann.

[ ] Mitchell, M. An Introduction to Genetic Algorithms. MIT Press.

[ ] Darwin, C. The Autobiography of Charles Darwin: With Original Omissions Restored, edited with appendix and notes by his grand-daughter, Nora Barlow. Norton.

[ ] Whitley, Darrel. A genetic algorithm tutorial. Statistics and Computing.

[ ] Sarimveis, H., Alexandridis, A., Mazarkakis, S., & Bafas, G. A new algorithm for developing dynamic radial basis function neural network models based on genetic algorithms. Computers & Chemical Engineering.

[ ] Ding, Shifei, & Su, Chunyang. An optimizing BP neural network algorithm based on genetic algorithm. Artificial Intelligence Review.

[ ] Yen, G. G., & Lu, H. Hierarchical genetic algorithm based neural network design. In IEEE Symposium on Combinations of Evolutionary Computation and Neural Networks.

[ ] Zhou, Z. H., Wu, J. X., Jiang, Y., & Chen, S. F. Genetic Algorithm based Selective Neural Network Ensemble. In Proceedings of the 7th International Joint Conference on Artificial Intelligence.

[ ] Yu, Xiangui, Loh, N. K., Jullien, G. A., & Miller, W. C. Modified backpropagation algorithms for training the multilayer feedforward neural networks with hard-limiting neurons. In Proceedings of the Canadian Conference on Electrical and Computer Engineering.

[ ] Montana, David J., & Davis, Lawrence. Training Feedforward Neural Networks Using Genetic Algorithms. In Machine Learning.

[ ] van Rooij, A. J. F., Jain, L. C., & Johnson, R. P. Neural Network Training Using Genetic Algorithms. Singapore, World Scientific.

[ ] Maniezzo, Vittorio. Genetic Evolution of the Topology and Weight Distribution of Neural Networks. IEEE Transactions on Neural Networks.

[ ] Bornholdt, Stefan, & Graudenz, Dirk. General Asymmetric Neural Networks and Structure Design by Genetic Algorithms. Neural Networks, Pergamon Press.

[ ] Kitano, Hiroaki. Designing Neural Networks Using Genetic Algorithms with Graph Generation Systems. Complex Systems.

[ ] Nolfi, S., & Parisi, D. Desired answers do not correspond to good teaching inputs in ecological neural networks. Neural Processing Letters.

[ ] Nolfi, S., & Parisi, D. Learning to adapt to changing environments in evolving neural networks. Adaptive Behavior.

[ ] Nolfi, S., Parisi, D., & Elman, J. L. Learning and evolution in neural networks. Adaptive Behavior.

[ ] Pujol, J. C. F., & Poli, R. Efficient evolution of asymmetric recurrent neural networks using a two-dimensional representation. In Proceedings of the First European Workshop on Genetic Programming (EuroGP).

[ ] Miller, G. F., Todd, P. M., & Hedge, S. U. Designing neural networks using genetic algorithms. In Proceedings of the Third International Conference on Genetic Algorithms and their Applications.

[ ] Mandischer, M. Representation and evolution of neural networks. In Artificial Neural Nets and Genetic Algorithms: Proceedings of the International Conference at Innsbruck, Austria. Wien and New York, Springer.

[ ] Figueira Pujol, Joao Carlos. Evolution of Artificial Neural Networks Using a Two-dimensional Representation. PhD thesis, School of Computer Science, University of Birmingham, UK.

[ ] Yao, X. Evolutionary artificial neural networks. In Encyclopedia of Computer Science and Technology, ed. A. Kent and J. G. Williams, Marcel Dekker Inc., New York. Also appearing in Encyclopedia of Library and Information Science.
Chapter 10

Applications of Artificial Neural Networks in Chemical


Problems

Vinícius Gonçalves Maltarollo,
Káthia Maria Honório and
Albérico Borges Ferreira da Silva

Additional information is available at the end of the chapter

h““p://dx.doi.org/10.5772/51275

Introduction

In general, chemical problems are composed of complex systems. There are several chemical processes that can be described by different mathematical functions (linear, quadratic, exponential, hyperbolic, logarithmic functions, etc.). There are also thousands of calculated and experimental descriptors/molecular properties that are able to describe the chemical behavior of substances. In several experiments, many variables can influence the desired chemical response [ , ]. Usually, chemometrics (the scientific area that employs statistical and mathematical methods to understand chemical problems) is widely used as a valuable tool to treat chemical data and to solve complex problems [ - ].

Initially, the use of chemometrics grew along with computational capacity. When small computers with relatively high calculation capacity became popular, chemometric algorithms and software started to be developed and applied [ , ]. Nowadays, there are several software packages and complex algorithms available for commercial and academic use as a result of technological development. In fact, the interest in robust statistical methodologies for chemical studies has also increased. One of the most employed statistical methods is partial least squares (PLS) analysis [ , ]. This technique does not perform a simple regression as multiple linear regression (MLR) does; the PLS method can be employed for a large number of variables because it treats the colinearity of descriptors. Due to the complexity of this technique when compared to other statistical methods, PLS analysis is largely employed to solve chemical problems [ , ].

We can cite some examples of computational packages employed in chemometrics that contain several statistical tools (PLS, MLR, etc.): MATLAB [ ], R-Studio [ ], Statistica [ ] and Pirouette [ ]. There are some molecular modeling methodologies, such as HQSAR [ ], CoMFA [ - ], CoMSIA [ ] and LTQA-QSAR [ ], that also use PLS analysis to treat their generated descriptors. In general, the PLS method is used to analyse only linear problems. However, when a large number of phenomena and noise are present in the calibration problem, the relationship becomes non-linear [ ]. Therefore, artificial neural networks (ANNs) may provide accurate results for very complex and non-linear problems that demand high computational costs [ , ]. One of the most employed learning algorithms is back-propagation, and its main advantage is the use of the output information and the expected pattern for error correction [ ]. The main advantages of ANN techniques include the ability to learn and generalize from data, fault tolerance and inherent contextual information processing, in addition to fast computation capacity [ ]. It is important to mention that many studies have reported advantages of applying ANN techniques when compared to other statistical methods [ , - ].

Due to this popularization, there is a large interest in ANN techniques, in particular in their applications in various chemical fields such as medicinal chemistry, pharmaceutical research, theoretical chemistry, analytical chemistry, biochemistry, food research, etc. [ - ]. The theory of some ANN methodologies and their applications is presented in the following sections.

Artificial Neural Networks (ANNs)

The first studies describing ANNs (also called perceptron networks) were performed by McCulloch and Pitts [ , ] and Hebb [ ]. The initial idea of neural networks was developed as a model for neurons, their biological counterparts. The first applications of ANNs did not present good results and showed several limitations (such as the treatment of linearly correlated data). However, these events stimulated the extension of the initial perceptron architecture (a single-layer neural network) to multilayer networks [ , ]. Later, Hopfield [ ] described a new approach with the introduction of nonlinearity between input and output data, and this new architecture of perceptrons yielded a good improvement in the ANN results. In addition to Hopfield's study, Werbos [ ] proposed the back-propagation learning algorithm, which helped the popularization of ANNs.

A few years later, one of the first applications of ANNs in chemistry was performed by Hoskins et al. [ ], who reported the use of a multilayer feed-forward neural network (described below in the section on multilayer perceptrons) to study chemical engineering processes. In the same year, two studies employing ANNs were published with the aim of predicting the secondary structure of proteins [ , ].

In general, ANN techniques are a family of mathematical models that are based on human brain functioning. All ANN methodologies share the concept of "neurons" (also called "hidden units") in their architecture. Each neuron represents a synapse, as in its biological counterpart. Therefore, each hidden unit is constituted of activation functions that control the propagation of the neuron signal to the next layer (e.g. positive weights simulate the excitatory stimulus and negative weights simulate the inhibitory ones). A hidden unit is composed of a regression equation that processes the input information into a non-linear output. Therefore, if more than one neuron is used to compose an ANN, non-linear correlations can be treated. Due to the non-linearity between input and output, some authors compare the hidden units of ANNs to a "black box" [ - ]. Figure 1 shows a comparison between a human neuron and an ANN neuron.

Figure 1. (A) Human neuron; (B) artificial neuron or hidden unit; (C) biological synapse; (D) ANN synapses.

The general purpose of ANN techniques is based on stimulus-response activation functions that accept some input (parameters) and yield some output (response). The difference between the neurons of distinct artificial neural networks consists in the nature of the activation function of each neuron. There are several typical activation functions used to compose ANNs, such as the threshold function, linear functions, sigmoid functions (e.g. the hyperbolic tangent) and radial basis functions (e.g. the Gaussian) [ , - ]. Table 1 illustrates some examples of activation functions.

Different ANN techniques can be classified based on their architecture or neuron connection pattern. Feed-forward networks are composed of unidirectional connections between network layers; in other words, there is a connection flow from the input to the output direction. Feedback or recurrent networks are ANNs where the connections among layers occur in both directions. In this kind of neural network, the connection pattern is characterized by loops due to the feedback behavior: when the output signal of a neuron enters a previous neuron (the feedback connection), the new input data is modified [ , - ].
The following activation functions are commonly encountered (v denotes the neuron's net input):

threshold: φ(v) = 1 if v ≥ 0; 0 if v < 0

piecewise linear: φ(v) = 1 if v ≥ +1/2; v + 1/2 if -1/2 < v < +1/2; 0 if v ≤ -1/2

hyperbolic tangent (sigmoid): φ(v) = tanh(v/2) = (1 - exp(-v)) / (1 + exp(-v))

gaussian: φ(v) = exp(-(εv)^2)

Table 1. Some activation functions used in ANN studies.
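For reference, the four activation functions of Table 1 can be written directly in code; the short numpy sketch below is purely illustrative, with the width parameter of the Gaussian left as a free constant.

import numpy as np

def threshold(v):
    return np.where(v >= 0, 1.0, 0.0)

def piecewise_linear(v):
    return np.clip(v + 0.5, 0.0, 1.0)        # 0 below -1/2, 1 above +1/2, linear in between

def tanh_sigmoid(v):
    return np.tanh(v / 2)                    # equals (1 - exp(-v)) / (1 + exp(-v))

def gaussian(v, eps=1.0):
    return np.exp(-(eps * v) ** 2)

v = np.linspace(-3, 3, 7)
for f in (threshold, piecewise_linear, tanh_sigmoid, gaussian):
    print(f.__name__, np.round(f(v), 3))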

Each ANN architecture has an intrinsic behavior. Therefore, neural networks can be classified according to their connection pattern, the number of hidden units, the nature of the activation functions and the learning algorithm [ - ]. There is an extensive number of ANN types, and Figure 2 exemplifies the general classification of neural networks, showing the most common ANN techniques employed in chemistry.

Figure 2. The most common neural networks employed in chemistry (adapted from Jain & Mao, 1996 [25]).

According to the previous brief explanation, ANN techniques can be classified based on some features. The next topics explain the most common types of ANN employed in chemical problems.

Multilayer perceptrons

The multilayer perceptron (MLP) is one of the most employed ANN algorithms in chemistry. The term "multilayer" is used because this methodology is composed of several neurons arranged in different layers. Each connection between the input and hidden layers (or between two hidden layers) is similar to a synapse (its biological counterpart) and the input data are modified by a determined weight. In this way, a three-layer feed-forward network is composed of an input layer, two hidden layers and the output layer [ , - ].

MLP is also called a feed-forward neural network because the data information flows only in the forward direction: the output produced by a layer is only used as input for the next layer. An important characteristic of feed-forward networks is supervised learning [ , - ].

The crucial task in the MLP methodology is the training step. The training or learning step is a search process for a set of weight values with the objective of reducing/minimizing the squared errors of prediction (experimental vs. estimated data). This phase is the slowest one and there is no guarantee of reaching the global minimum. There are several learning algorithms for MLP, such as conjugate gradient descent, quasi-Newton and Levenberg-Marquardt, but the most employed one is the back-propagation algorithm. This algorithm uses the error values of the output layer (prediction) to adjust the weights of the layer connections; it guarantees convergence to a (local or global) minimum [ , - ].

The main challenge of MLP is the choice of the most suitable architecture. The speed and the performance of the MLP learning are strongly affected by the number of layers and the number of hidden units in each layer [ , - ]. Figure 3 displays the influence of the number of layers on the pattern recognition ability of a neural network.

Figure 3. Influence of the number of layers on the pattern recognition ability of MLP (adapted from Jain & Mao, 1996 [25]).

The increase in the number of layers in an MLP algorithm is proportional to the increase of the complexity of the problem to be solved. The higher the number of hidden layers, the higher the complexity of the pattern recognition ability of the neural network.
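As a practical illustration of the points above (hidden layers, iterative weight adjustment, held-out data to watch for over-fitting), a small multilayer perceptron can be fitted with a general-purpose library. The example below uses scikit-learn on synthetic data standing in for a descriptor/property data set; it is only a sketch, not a reproduction of any study cited in this chapter.

import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

# synthetic "descriptors -> property" data standing in for a chemical data set
rng = np.random.default_rng(0)
X = rng.random((200, 4))
y = np.sin(X[:, 0]) + X[:, 1] * X[:, 2] + 0.1 * rng.normal(size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
mlp = MLPRegressor(hidden_layer_sizes=(10, 10), activation='tanh',
                   max_iter=2000, random_state=0)
mlp.fit(X_train, y_train)                       # back-propagation-style weight adjustment
print("R^2 on held-out data:", mlp.score(X_test, y_test))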

. . Self-organizing map or Kohonen neural network

Self-organizing map SOM , also θalled Kohonen neural network KNN , is an unsupervised
neural network designed to perform a non-linear mapping of a high-dimensionality data
spaθe transforming it in a low-dimensional spaθe, usually a ηidimensional spaθe. The visuali‐
zation of the output data is performed from the distanθe/proximity of neurons in the output
D-layer. In other words, the SOM teθhnique is employed to θluster and extrapolate the data
set keeping the original topology. The SOM output neurons are only θonneθted to its nearest
neighηors. The neighηorhood represents a similar pattern represented ηy an output neuron. In
general, the neighηorhood of an output neuron is defined as square or hexagonal and this
means that eaθh neuron has or nearest neighηors, respeθtively [ - ]. Figure exemplifies
the output layers of a SOM model using square and hexagonal neurons for a θomηinatorial de‐
sign of purinergiθ reθeptor antagonists [ ] and θannaηinoid θompounds [ ], respeθtively.

Figure 4. Example of o”“p”“ layers of SOM models ”sing sq”are and hexagonal ne”rons for “he combina“orial design
of (a) p”rinergic recep“or an“agonis“s [54] and (b) cannabinoid compo”nds [30], respec“ively.

The SOM technique can be considered a competitive neural network due to its learning algorithm. Competitive learning means that an output neuron is selected (as the winner) only if its weight vector is the most similar to the input pattern among all output neurons. Finally, the learning rate applied to the neighborhood is scaled down in proportion to the distance from the winning output neuron [ - ].
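A minimal sketch of this competitive update rule is shown below (not from the chapter; grid size, learning-rate schedule and data are illustrative assumptions): the winner is the unit whose weight vector is closest to the input, and its neighbors are pulled toward the input with a strength that decays with grid distance.

```python
# Minimal SOM (Kohonen map) training sketch with a square output grid.
import numpy as np

rng = np.random.default_rng(1)

def train_som(X, grid=(6, 6), epochs=200, lr0=0.5, sigma0=2.0):
    rows, cols = grid
    W = rng.random((rows, cols, X.shape[1]))          # weight map
    # grid coordinates used to measure neighborhood distance
    coords = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                                  indexing="ij"), axis=-1)
    for t in range(epochs):
        lr = lr0 * np.exp(-t / epochs)                # decaying learning rate
        sigma = sigma0 * np.exp(-t / epochs)          # shrinking neighborhood
        for x in X[rng.permutation(len(X))]:
            # competitive step: best-matching unit (winner)
            dist = np.linalg.norm(W - x, axis=2)
            win = np.unravel_index(dist.argmin(), dist.shape)
            # Gaussian neighborhood of the winner on the grid
            d2 = ((coords - np.array(win)) ** 2).sum(axis=2)
            h = np.exp(-d2 / (2 * sigma ** 2))[..., None]
            W += lr * h * (x - W)                     # move units toward x
    return W

# usage: map 3-D points (e.g. descriptor vectors) onto a 6x6 grid
X = rng.random((100, 3))
som_weights = train_som(X)
```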

. . Bayesian regularized artificial neural networks

Different from the usual back-propagation learning algorithm, the Bayesian method considers all possible values of the weights of a neural network, weighted by the probability of each set of weights. This kind of neural network is called a Bayesian regularized artificial neural network (BRANN) because the probability distribution of each neural network, which provides the weights, can be determined by Bayes' theorem [ ]. Therefore, the Bayesian method can estimate the number of effective parameters needed to predict an output, practically independently of the ANN architecture. As with the MLP technique, the choice of the network architecture is a very important step for the learning of a BRANN. A complete review of the BRANN technique can be found in other studies [ - ].

. . Other important neural networks

Adaptive resonance theory (ART) neural networks [ , ] constitute other mathematical models designed to describe the biological brain behavior. One of the most important characteristics of this technique is its capacity to acquire new knowledge without disturbing or destroying the stored knowledge. A simple variation of this technique, the ART-2a model, has a simple learning algorithm and is practically inexpensive compared to other ART models [ - ]. The ART-2a method consists in constructing a weight matrix that describes the centroid nature of a predicted class [ , ]. In the literature, there are several chemical studies that employ ART-based neural networks [ - ].

The neural network known as radial basis function (RBF) [ ] typically has an input layer, a hidden layer with an RBF as the activation function, and an output layer. This network was developed to treat irregular topographic contours of geographical data [ - ], but, due to its capacity for solving complex (especially non-linear) problems, RBF networks have been successfully employed in chemical problems. There are several studies comparing the robustness of prediction (prediction coefficients, r, pattern recognition rates and errors) of RBF-based networks and other methods [ - ].

The Hopfield neural network [ - ] is a model that uses a binary n x n matrix (presented as an n x n pixel image) as a weight matrix for n input signals. The activation function treats the activation signal only as +1 or -1. Besides, the algorithm treats black and white pixels as 1 and 0 binary digits, respectively, and there is a transformation of the matrix data to enlarge the interval from 0–1 to -1–+1. The complete description of this technique can be found in reference [ ]. In chemistry research, we can find studies employing the Hopfield model to obtain molecular alignments [ ], to calculate the intermolecular potential energy function from the second virial coefficient [ ] and for other purposes [ - ].
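The short sketch below (not from the chapter; pattern size and noise level are illustrative assumptions) shows the Hebbian storage and recall steps of a Hopfield network with ±1 activations, including the 0/1 to -1/+1 rescaling mentioned above.

```python
# Minimal Hopfield network sketch: Hebbian storage and iterative recall.
import numpy as np

def train_hopfield(patterns):
    # patterns: array of shape (p, n) with entries in {-1, +1}
    p, n = patterns.shape
    W = patterns.T @ patterns / n          # Hebbian weight matrix
    np.fill_diagonal(W, 0.0)               # no self-connections
    return W

def recall(W, state, steps=10):
    s = state.copy()
    for _ in range(steps):
        s = np.where(W @ s >= 0, 1, -1)    # threshold activation (+1 / -1)
    return s

# usage: store a binary (0/1) "image", rescale to -1/+1, recover a noisy copy
rng = np.random.default_rng(2)
img = rng.integers(0, 2, size=25)                       # 5x5 binary pixels
pattern = 2 * img - 1                                    # map {0,1} -> {-1,+1}
W = train_hopfield(pattern[None, :])
noisy = pattern * np.where(rng.random(25) < 0.2, -1, 1)  # flip ~20% of pixels
restored = recall(W, noisy)
```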

. Applications

In the following, we present a brief description of some studies that apply ANN techniques as important tools to solve chemical problems.

. . Medicinal Chemistry and Pharmaceutical Research

Drug design research involves the use of several experimental and computational strategies with different purposes, such as biological affinity, pharmacokinetic and toxicological studies, as well as quantitative structure-activity relationship (QSAR) models [ - ]. Another important approach to design new potential drugs is virtual screening (VS), which can maximize the effectiveness of rational drug development by employing computational assays to classify or filter a compound database for potent drug candidates [ - ]. Besides, various ANN methodologies have been largely applied to control the process of pharmaceutical production [ - ].

Fanny et al. [ ] constructed a SOM model to perform VS experiments and tested it on an external database of compounds. The use of the SOM methodology accelerated the similarity searches by using several pharmacophore descriptors. The best result indicated a map that retrieves relevant neighbors (output neurons) in the similarity search for virtual hits.

. . Theoretical and Computational Chemistry

In theoretical/computational chemistry, applications of ANN techniques include the prediction of ionization potentials [ ], lipophilicity of chemicals [ , ], chemical/physical/mechanical properties of polymers employing topological indices [ ], and relative permittivity and oxygen diffusion of ceramic materials [ ].

Stojković et al. [ ] also constructed a quantitative structure-property relationship (QSPR) model to predict pKBH+ for amines. To construct the regression model, the authors calculated some topological and quantum chemical descriptors. The counter-propagation neural network was employed as a modeling tool and the Kohonen self-organizing map was employed to graphically visualize the results. The authors could clearly explain how the input descriptors influenced the pKBH+ behavior, in particular the presence of halogen atoms in the amine structure.

. . Analytical Chemistry

There are several studies in analytical chemistry employing ANN techniques with the aim of obtaining multivariate calibration and analysis of spectroscopic data [ - ], as well as of modeling HPLC retention behavior [ ] and reaction kinetics [ ].

Fatemi [ ] constructed a QSPR model employing the ANN technique with the back-propagation algorithm to predict the tropospheric ozone degradation rate constant of organic compounds. The data set was composed of organic compounds divided into training, test and validation sets. The author also compared the ANN results with those obtained from the MLR method: the correlation coefficients obtained with ANN and MLR were compared for the training, test and validation sets, and the results showed the better efficacy of the ANN methodology in this case.

. . Biochemistry

Neural networks have been largely employed in biochemistry and correlated research fields such as protein, DNA/RNA and molecular biology sciences [ - ].

Petritis et al. [ ] employed a three-layer neural network with the back-propagation algorithm to predict the reverse-phase liquid chromatography retention time of peptides enzymatically digested from proteomes. In the training set, the authors used known peptides from D. radiodurans. The constructed ANN model was then employed to predict a set of peptides from S. oneidensis. The neural network generated weights for the chromatographic retention time of each amino acid in agreement with results obtained by other authors. The obtained ANN model could predict peptide retention times with small errors over most of the test set. These results showed that the ANN methodology is a good tool to predict peptide retention times from liquid chromatography.

Huang et al. [ ] introduced a novel approach combining aspects of QSAR and ANN, which they called physics and chemistry-driven ANN (Phys-Chem ANN). This methodology has parameters and coefficients clearly based on physicochemical insights. In this study, the authors employed the Phys-Chem ANN methodology to predict the stability of human lysozyme. The data set was composed of mutated lysozymes (including the wild type) and the experimental property used in the modeling was the change in the unfolding Gibbs free energy (kJ/mol). This study resulted in significant calibration and validation coefficients (r and q, respectively). The proposed methodology provided good prediction of biological activity, as well as structural information and physical explanations to understand the stability of human lysozyme.

. . Food Research

ANNs have also been widely employed in food research. Some examples of applications of ANNs in this area include studies of vegetable oils [ - ], beers [ ], wines [ ], honeys [ - ] and water [ - ].

Bos et al. [ ] employed several ANN techniques to predict the water percentage in cheese samples. The authors tested several different neuron architectures (some functions were employed to simulate different learning behaviors) and analyzed the prediction errors to assess the ANN performance. The best result was obtained employing a radial basis function neural network.

Cimpoiu et al. [ ] used the multi-layer perceptron with the back-propagation algorithm to model the antioxidant activity of some classes of tea, such as black, express black and green teas. The authors obtained a high correlation between experimental and predicted antioxidant activity. A classification of the samples was also performed using an ANN technique with a radial basis layer followed by a competitive layer, with a perfect match between real and predicted classes.

. Conclusions

Artificial Neural Networks (ANNs) were originally developed to mimic the learning process of the human brain and its knowledge storage functions. The basic units of ANNs are called neurons and are designed to transform the input data as well as to propagate the signal, with the aim of performing a non-linear correlation between experimental and predicted data. As the human brain is not completely understood, there are several different architectures of artificial neural networks presenting different performances. The ANNs most commonly applied to chemistry are MLP, SOM, BRANN, ART, Hopfield and RBF neural networks. There are several studies in the literature that compare ANN approaches with other chemometric tools (e.g. MLR and PLS), and these studies have shown that ANNs have the best performance in many cases. Due to the robustness and efficacy of ANNs in solving complex problems, these methods have been widely employed in several research fields such as medicinal chemistry, pharmaceutical research, theoretical and computational chemistry, analytical chemistry, biochemistry, food research, etc. Therefore, ANN techniques can be considered valuable tools to understand the main mechanisms involved in chemical problems.

Notes

Techniques related to artificial neural networks (ANNs) have been increasingly used in chemical studies for data analysis in the last decades. Some areas of ANN application involve pattern identification, modeling of relationships between structure and biological activity, classification of compound classes, identification of drug targets, prediction of several physicochemical properties and others. Actually, the main purpose of ANN techniques in chemical problems is to create models for complex input–output relationships based on learning from examples and, consequently, these models can be used in prediction studies. It is interesting to note that ANN methodologies have shown their power and robustness in the creation of useful models to help chemists in research projects in academia and industry. Nowadays, the evolution of computer science (software and hardware) has allowed the development of many computational methods used to understand and simulate the behavior of complex systems. In this way, the integration of technological and scientific innovation has helped the treatment of large databases of chemical compounds in order to identify possible patterns. However, people who use computational techniques must be prepared to understand the limits of applicability of any computational method and to distinguish the opportunities in which it is appropriate to apply ANN methodologies to solve chemical problems. The evolution of ANN theory has resulted in an increase in the number of successful applications. So, the main contribution of this book chapter is to briefly outline our view on the present scope and future advances of ANNs, based on some applications from recent research projects, with emphasis on the generation of predictive ANN models.

Author details

Vinícius Gonçalves Maltarollo, Káthia Maria Honório and Albérico Borges Ferreira da Silva*

*Address all correspondence to: alberico@iqsc.usp.br

Centro de Ciências Naturais e Humanas – UFABC – Santo André – SP

Escola de Artes, Ciências e Humanidades – USP – São Paulo – SP

Departamento de Química e Física Molecular – Instituto de Química de São Carlos – USP – São Carlos – SP

References

[ ] Teófilo, R. F., & Ferreira, M. M. C. . Quimiometria II Planilhas Eletrôniθas para


Cálθulos de Planejamento Experimental um Tutorial. Quim. Nova, , - .
[ ] Lundstedt, T., Seifert, E., “ηramo, L., Theilin, ”., Nyström, “., Pettersen, J., & ”erg‐
man, R. . Experimental design and optimization. Chemom. Intell. Laη., , - .
[ ] Kowalski, ”. R. J. . Chemometriθs Views and Propositions. J. Chem. Inf. Comp.
Sθi., , - .
[ ] Wold, S. . Pattern reθognition ηy means of disjoint prinθipal θomponent mod‐
els. Pattern Reθognition, , - .
[ ] Vandeginste, ”. G. M. . Chemometriθs- General Introduθtion and Historiθal
Development. Top. Curr. Chem., , - .
[ ] Wold, S., & Sjöström, M. . Chemometriθs present and future suθθess. Chemom.
Intell. Laη., , - .
[ ] Ferreira, M. M. C., “ntunes, “. M., Melgo, M. S., & Volpe, P. L. O. . Quimiome‐
tria I θaliηração multivariada um tutorial. Quím. Nova, , - .
[ ] Neto, ”. ”., Sθarminio, I. S., & ”runs, R. E. . “nos de Quimiometria no ”rasil.
Quim. Nova, , - .
[ ] Hopke, P. K. . The evolution of θhemometriθs. Anal. Chim. Aθta, , - .
[ ] Wold, S., Ruhe, “., Wold, H., & Dunn, W. . The θollinearity proηlem in linear
regression The partial least squares approaθh to generalized inverses. SIAM J. Sθi.
Stat. Comput., , - .
[ ] Wold, S., Sjöströma, M., & Eriksson, L. . PLS-regression a ηasiθ tool of θhemo‐
metriθs. Chemom. Intell. Laη., , - .

[ ] M“TL“” r. . MathWorks Inθ.

[ ] RStudioTM . . RStudio Inθ.

[ ] Statistiθa. . Data Analysis Software System.

[ ] Pirouette. . Infometrix Inθ.

[ ] HQS“RTM . . Manual Release in Syηyl 7. . Tripos Inθ.

[ ] Cramer, R. D., Patterson, D. E., & ”unθe, J. D. . Comparative moleθular field


analysis CoMF“ . . Effeθt of shape on ηinding of steroids to θarrier proteins. J. Am.
Chem. Soθ., , - .

[ ] Cramer, R. D., Patterson, D. E., & ”unθe, J. D. . Reθent advanθes in θomparative


moleθular field analysis CoMFA , Prog. Clin. Biol. Res., , - .

[ ] Kleηe, G., “ηraham, U., & Mietzner, T. . Moleθular Similarity Indiθes in a Com‐
parative “nalysis CoMSI“ of Drug Moleθules to Correlate and Prediθt Their ”io‐
logiθal “θtivity. J. Med. Chem., , - .

[ ] Martins, J. P. “., ”arηosa, E. G., Pasqualoto, K. F. M., & Ferreira, M. M. C. .


LQT“-QS“R “ New D-QS“R Methodology. J. Chem. Inf. Model., , - .

[ ] Long, J. R., Gregoriou, V. G., & Gemperline, P. J. . Speθtrosθopiθ θaliηration and


quantitation using artifiθial neural networks. Anal. Chem., , - .

[ ] Cerqueira, E. O., “ndrade, J. C., & Poppi, R. J. . Redes neurais e suas apliθações
em θaliηração multivariada. Quím. Nova, , - .

[ ] Sigman, M. E., & Rives, S. S. . Prediθtion of “tomiθ Ionization Potentials I-III


Using an “rtifiθial Neural Network. J. Chem. Inf. Comput. Sθi., , - .

[ ] Hsiao, T., Lin, C., Zeng, M., & Chiang, H. K. . The Implementation of Partial
Least Squares with “rtifiθial Neural Network “rθhiteθture. Proθeedings of the th An‐
nual International Conferenθe of the IEEE Engineering in Mediθine and Biology Soθiety, ,
- .

[ ] Jain, “. K., Mao, J., & Mohiuddin, K. M. . “rtifiθial Neural Networks “ Tutori‐
al. IEEE Computer, , - .

[ ] ”orggaard, C., & Thodηerg, H. H. . Optimal minimal neural interpretation of


speθtra. Anal. Chem., , - .

[ ] Zheng, F., Zheng, G., Deaθiuθ, “. G., Zhan, C. G., Dwoskin, L. P., & Crooks, P. “.
. Computational neural network analysis of the affinity of loηeline and tetraηe‐
nazine analogs for the vesiθular monoamine transporter- . Bioorg. Med. Chem., ,
- .

[ ] Louis, ”., “grawal, V. K., & Khadikar, P. V. . Prediθtion of intrinsiθ soluηility of


generiθ drugs using mlr, ann and svm analyses. Eur. J. Med. Chem., , - .

[ ] Fatemi, M. H., Heidari, “., & Ghorηanzade, M. . Prediθtion of aqueous soluηili‐


ty of drug-like θompounds ηy using an artifiθial neural network and least-squares
support veθtor maθhine. Bull. Chem. Soθ. Jpn., , - .

[ ] Honório, K. M., de Lima, E. F., Quiles, M. G., Romero, R. “. F., Molfetta, F. “., & da,
Silva. “. ”. F. . “rtifiθial Neural Networks and the Study of the Psyθhoaθtivity
of Cannaηinoid Compounds. Chem. Biol. Drug. Des., , - .

[ ] Qin, Y., Deng, H., Yan, H., & Zhong, R. . “n aθθurate nonlinear QS“R model
for the antitumor aθtivities of θhloroethylnitrosoureas using neural networks. J. Mol.
Graph. Model., , - .

[ ] Himmelηlau, D. M. . “ppliθations of artifiθial neural networks in θhemiθal en‐


gineering. Korean J. Chem. Eng., , - .

[ ] Marini, F., ”uθθi, R., Magri, “. L., & Magri, “. D. . “rtifiθial neural networks in
θhemometriθs History examples and perspeθtives. Miθroθhem. J., , - .

[ ] Mθ Cutloθh, W. S., & Pttts, W. . “ logiθal θalθulus of the Ideas imminent in


nervous aθtivity. Bull. Math. Biophys., , - .

[ ] Pitts, W., & Mθ Culloθh, W. S. . How we know universals the perθeption of au‐
ditory and visual forms. Bull. Math. Biophys., , - .

[ ] Heηη, D. O. . The Organization of Behavior, New York, Wiley.

[ ] Zupan, J., & Gasteiger, J. . Neural networks “ new method for solving θhemi‐
θal proηlems or just a passing phase? Anal. Chim. Aθta, , - .

[ ] Smits, J. R. M., Melssen, W. J., ”uydens, L. M. C., & Kateman, G. . Using artifi‐
θial neural networks for solving θhemiθal proηlems. Part I. Multi-layer feed-forward
networks. Chemom. Intell. Laη., , - .

[ ] Hopfield, J. J. . Neural networks and physiθal systems with emergent θolleθtive


θomputational aηilities. Proθ. Nat. Aθad. Set., , - .

[ ] Werηos, P. . ”eyond Regression New Tools for Prediθtion and “nalysis in the
”ehavioral Sθienθes. PhD thesis, Harvard University Camηridge.

[ ] Hoskins, J. C., & Himmelηau, D. M. . “rtifiθial Neural Network Models of


Knowledge Representation in Chemiθal Engineering. Comput. Chem. Eng., ,
- .

[ ] Qian, N., & Sejnowski, T. J. . Prediθting the Seθondary Struθture of Gloηular


Proteins Using Neural Network Models. J. Mol. Biol., , - .

[ ] ”ohr, H., ”ohr, J., ”runak, S., Cotterill, R., Lautrup, ”., Norskov, L., Olsen, O., & Pe‐
tersen, S. . Protein Seθondary Struθture and Homology ηy Neural Networks.
FEBS Lett., , - .

[ ] Hassoun, M. H. . Fundamentals of Artifiθial Neural Networks. A Bradford Book.



[ ] Zurada, J. M. . Introduθtion to Artifiθial Neural Systems, ”oston, PWS Puηlishing


Company.

[ ] Zupan, J., & Gasteiger, J. . Neural Networks in Chemistry and Drug Design ed. ,
Wiley-VCH.

[ ] Gasteiger, J., & Zupan, J. . Neural Networks in Chemistry. Angew. Chem. Int.
Edit., , - .

[ ] Marini, F. . “rtifiθial neural networks in foodstuff analyses Trends and per‐


speθtives. A review. Anal. Chim. Aθta, , - .

[ ] Miller, F. P., Vandome, “. F., & Mθ ”rewster, J. . Multilayer Perθeptron, “lpha‐


sθript Puηlishing.

[ ] Widrow, ”., & Lehr, M. “. . years of “daptive Neural Networks Perθeptron


Madaline and ”aθkpropagation. Proθ. IEEE, , - .

[ ] Kohonen, T. . Self Organizing Maps ed. , New York, Springer.

[ ] Zupan, J., Noviča, M., & Ruisánθhez, I. . Kohonen and θounterpropagation arti‐
fiθial neural networks in analytiθal θhemistry. Chemom. Intell. Laη., , - .

[ ] Smits, J. R. M., Melssen, W. J., ”uydens, L. M. C., & Kateman, G. . Using artifi‐
θial neural networks for solving θhemiθal proηlems. Part II. Kohonen self-organising
feature maps and Hopfield networks. Chemom. Intell. Laη., , - .

[ ] Sθhneider, G., & Nettekoven, M. . Ligand-ηased θomηinatorial design of seleθ‐


tive purinergiθ reθeptor a a antagonists using self-organizing maps. J. Comη. Chem.,
- .

[ ] ”ayes, T. . “n Essay toward solving a proηlem in the doθtrine of θhanθes. Philo‐


sophiθal Transaθtions of the Royal Soθiety of London, , - .

[ ] Maθkay, D. J. C. . Proηaηle Networks and Plausiηle Prediθtions- a Review of


Praθtiθal ”ayesian Methods for Supervised Neural Networks. Comput. Neural Sys., ,
- .

[ ] Maθkay, D. J. C. . ”ayesian Interpolation. Neural Comput., , - .

[ ] ”untine, W. L., & Weigend, “. S. . ”ayesian ”aθk-Propagation. Complex. Sys., ,


- .

[ ] de Freitas, J. F. G. . ”ayesian Methods for Neural Networks. PhD thesis, Univer‐


sity Engineering Dept Camηridge.

[ ] Grossηerg, S. . “daptive pattern θlassifiθation and universal reθoding I. Parallel


development and θoding of neural feature deteθtors. Biol. Cyηern., , - .

[ ] Grossηerg, S. . “daptive pattern θlassifiθation and universal reθoding II. Feed‐


ηaθk expeθtation olfaθtion and illusions. Biol. Cyηern., , - .

[ ] Carpenter, G. “., Grossηerg, S., & Rosen, D. ”. . “RT- a-an adaptive resonanθe
algorithm for rapid θategory learning and reθognition. Neural Networks, , - .
[ ] Wienke, D., & ”uydens, L. . “daptive resonanθe theory ηased neural networks-
the~“RT~ of real-time pattern reθognition in θhemiθal proθess monitoring? TrAC
Trend. Anal. Chem., , - .
[ ] Lin, C. C., & Wang, H. P. . Classifiθation of autoregressive speθtral estimated
signal patterns using an adaptive resonanθe theory neural network. Comput. Ind., ,
- .
[ ] Whiteley, J. R., & Davis, J. F. . “ similarity-ηased approaθh to interpretation of
sensor data using adaptive resonanθe theory. Comput. Chem. Eng., , - .
[ ] Whiteley, J. R., & Davis, J. F. . Qualitative interpretation of sensor patterns.
IEEE Expert, , - .
[ ] Wienke, D., & Kateman, G. . “daptive resonanθe theory ηased artifiθial neural
networks for treatment of open-θategory proηlems in θhemiθal pattern reθognition-
appliθation to UV-Vis and IR speθtrosθopy. Chemom. Intell. Laη., , - .
[ ] Wienke, D., Xie, Y., & Hopke, P. K. . “n adaptive resonanθe theory ηased artifi‐
θial neural network “RT- a for rapid identifiθation of airηorne partiθle shapes from
their sθanning eleθtron miθrosθopy images. Intell. Laη., , - .
[ ] Xie, Y., Hopke, P. K., & Wienke, D. . “irηorne partiθle θlassifiθation with a
θomηination of θhemiθal θomposition and shape index utilizing an adaptive reso‐
nanθe artifiθial neural network. Environ. Sθi. Teθhnol., , - .
[ ] Wienke, D., van den, ”roek. W., Melssen, W., ”uydens, L., Feldhoff, R., Huth-Fehre,
T., Kantimm, T., Quiθk, L., Winter, F., & Cammann, K. . Comparison of an
adaptive resonanθe theory ηased neural network “RT- a against other θlassifiers
for rapid sorting of post θonsumer plastiθs ηy remote near-infrared speθtrosθopiθ
sensing using an InGa“s diode array. Anal. Chim. Aθta, , - .
[ ] Domine, D., Devillers, J., Wienke, D., & ”uydens, L. . “RT -“ for Optimal Test
Series Design in QS“R. J. Chem. Inf. Comput. Sθi., , - .
[ ] Wienke, D., & ”uydens, L. . “daptive resonanθe theory ηased neural network
for supervised θhemiθal pattern reθognition Fuzzy“RTM“P . Part Theory and
network properties. Chemom. Intell. Laη., , - .
[ ] Wienke, D., van den, ”roek. W., ”uydens, L., Huth-Fehre, T., Feldhoff, R., Kantimm,
T., & Cammann, K. . “daptive resonanθe theory ηased neural network for su‐
pervised θhemiθal pattern reθognition Fuzzy“RTM“P . Part Classifiθation of
post-θonsumer plastiθs ηy remote NIR speθtrosθopy using an InGa“s diode array.
Chemom. Intell. Laη., , - .
[ ] ”uhmann, M. D. . Radial Basis Funθtions: Theory and Implementations, Camηridge
University.

[ ] Lingireddy, S., & Ormsηee, L. E. . Neural Networks in Optimal Caliηration of


Water Distriηution Systems. In Flood I, Kartam N. eds. “rtifiθial Neural Networks
for Civil Engineers “dvanθed Features and “ppliθations. Amer. Soθiety of Civil Engi‐
neers, - .

[ ] Shahsavand, “., & “hmadpour, “. . “ppliθation of Optimal Rηf Neural Net‐


works for Optimization and Charaθterization of Porous Materials. Comput. Chem.
Eng., , - .

[ ] Regis, R. G., & Shoemaker, C. . “ Constrained Gloηal Optimization of Expen‐


sive ”laθk ”ox Funθtions Using Radial ”asis Funθtions. J. Gloηal. Optim., , - .

[ ] Han, H., Chen, Q., & Qiao, J. . “n effiθient self-organizing R”F neural network
for water quality prediθtion. Neural Networks, , - .

[ ] Fidênθio, P. H., Poppi, R. J., “ndrade, J. C., & “ηreu, M. F. . Use of Radial ”asis
Funθtion Networks and Near-Infrared Speθtrosθopy for the Determination of Total
Nitrogen Content in Soils from Sao Paulo State. Anal. Sθi., , - .

[ ] Yao, X., Liu, M., Zhang, X., Zhang, R., Hu, Z., & Fan, ”. . Radial ”asis Funθtion
Neural Networks ”ased QSPR for the Prediθtion of log P. Chinese J. Chem., ,
- .

[ ] Hopfield, J. J. . Neural networks and physiθal systems with emergent θolleθtive


θomputational aηilities. Proθ. Natl. Aθad. Sθi., US“, , - .

[ ] Hopfield, J. J. . Neurons with graded response have θolleθtive θomputational


properties like those of two-state neurons. Proθ. Natl. Aθad. Sθi., US“, , - .

[ ] “rakawa, M., Hasegawa, K., & Funatsu, K. . “ppliθation of the Novel Moleθu‐
lar “lignment Method Using the Hopfield Neural Network to D-QS“R. J. Chem. Inf.
Comput. Sθi., , - .

[ ] ”raga, J. P., “lmeida, M. ”., ”raga, “. P., & ”elθhior, J. C. . Hopfield neural net‐
work model for θalθulating the potential energy funθtion from seθond virial data.
Chem. Phys., , - .

[ ] Hjelmfelt, “., & Ross, J. . Chemiθal implementation and thermodynamiθs of


θolleθtive neural networks. PNAS, , - .

[ ] Hjelmfelt, “., Sθhneider, F. W., & Ross, J. . Pattern Reθognition in Coupled


Chemiθal Kinetiθ Systems. Sθienθe, , - .

[ ] Vraθko, M. . Kohonen “rtifiθial Neural Network and Counter Propagation


Neural Network in Moleθular Struθture-Toxiθity Studies. Curr. Comput-Aid. Drug, ,
- .

[ ] Guha, R., Serra, J. R., & Jurs, P. C. . Generation of QS“R sets with a self-organ‐
izing map. J. Mol. Graph. Model., , - .

[ ] Hoshi, K., Kawakami, J., Kumagai, M., Kasahara, S., Nishimura, N., Nakamura, H., &
Sato, K. . “n analysis of thyroid funθtion diagnosis using ”ayesian-type and
SOM-type neural networks. Chem. Pharm. Bull., , - .

[ ] Nandi, S., Vraθko, M., & ”agθhi, M. C. . “ntiθanθer aθtivity of seleθted phenoliθ
θompounds QS“R studies using ridge regression and neural networks. Chem. Biol.
Drug Des., , - .

[ ] Xiao, Y. D., Clauset, “., Harris, R., ”ayram, E., Santago, P., & Sθhmitt, . . Super‐
vised self-organizing maps in drug disθovery. . Roηust ηehavior with overdeter‐
mined data sets. J. Chem. Inf. Model., , - .

[ ] Molfetta, F. “., “ngelotti, W. F. D., Romero, R. “. F., Montanari, C. “., & da, Silva. “.
”. F. . “ neural networks study of quinone θompounds with trypanoθidal aθ‐
tivity. J. Mol. Model., , - .

[ ] Zheng, F., Zheng, G., Deaθiuθ, “. G., Zhan, C. G., Dwoskin, L. P., & Crooks, P. “.
. Computational neural network analysis of the affinity of loηeline and tetraηe‐
nazine analogs for the vesiθular monoamine transporter- . ”ioorg. Med. Chem., ,
- .

[ ] Caηallero, J., Fernandez, M., & Gonzalez-Nilo, F. D. . Struθtural requirements


of pyrido[ -d]pyrimidin- -one as CDK /D inhiηitors D autoθorrelation CoMF“
and CoMSI“ analyses. Bioorg. Med. Chem., , - .

[ ] Sθhneider, G., Coassolo, P., & Lavé, T. . Comηining in vitro and in vivo phar‐
maθokinetiθ data for prediθtion of hepatiθ drug θlearanθe in humans ηy artifiθial neu‐
ral networks and multivariate statistiθal teθhniques. J. Med. Chem., , - .

[ ] Hu, L., Chen, G., & Chau, R. M. W. . “ neural networks-ηased drug disθovery
approaθh and its appliθation for designing aldose reduθtase inhiηitors. J. Mol. Graph.
Model., , - .

[ ] “fantitis, “., Melagraki, G., Koutentis, P. “., Sarimveis, H., & Kollias, G. . Li‐
gand- ηased virtual sθreening proθedure for the prediθtion and the identifiθation of
novel -amyloid aggregation inhiηitors using Kohonen maps and Counterpropaga‐
tion “rtifiθial Neural Networks. Eur. J. Med. Chem., , - .

[ ] Noeske, T., Trifanova, D., Kauss, V., Renner, S., Parsons, C. G., Sθhneider, G., & Weil,
T. . Synergism of virtual sθreening and mediθinal θhemistry Identifiθation and
optimization of allosteriθ antagonists of metaηotropiθ glutamate reθeptor . Bioorg.
Med. Chem., , - .

[ ] Karpov, P. V., Osolodkin, D. I., ”askin, I. I., Palyulin, V. “., & Zefirov, N. S. .
One-θlass θlassifiθation as a novel method of ligand-ηased virtual sθreening The θase
of glyθogen synthase kinase inhiηitors. Bioorg. Med. Chem. Lett., , - .

[ ] Molnar, L., & Keseru, G. M. . “ neural network ηased virtual sθreening of θyto‐
θhrome p a inhiηitors. Bioorg. Med. Chem. Lett., , - .

[ ] Di Massimo, C., Montague, G. “., Willis, Tham. M. T., & Morris, “. J. . To‐
wards improved peniθillin fermentation via artifiθialneuralnetworks. Comput. Chem.
Eng., , - .

[ ] Palanθar, M. C., “ragón, J. M., & Torreθilla, J. S. . pH-Control System ”ased on


“rtifiθial Neural Networks. Ind. Eng. Chem. Res., , - .

[ ] Takayama, K., Fujikawa, M., & Nagai, T. . “rtifiθial Neural Network as a Novel
Method to Optimize Pharmaθeutiθal Formulations. Pharm. Res., , - .

[ ] Takayama, K., Morva, “., Fujikawa, M., Hattori, Y., Oηata, Y., & Nagai, T. .
Formula optimization of theophylline θontrolled-release taηlet ηased on artifiθial
neural networks. J. Control. Release, , - .

[ ] Fanny, ”., Gilles, M., Natalia, K., “lexandre, V., & Dragos, H. Using Self-Organizing
Maps to “θθelerate Similarity Searθh. Bioorg. Med. Chem., In Press, http //dxdoiorg/
/jηmθ .

[ ] Sigman, M. E., & Rives, S. S. . Prediθtion of “tomiθ Ionization Potentials I-III


Using an “rtifiθial Neural Network. J. Chem. Inf. Comput. Sθi., , - .

[ ] Tetko, I. V., & Tanθhuk, V. Y. . “ppliθation of “ssoθiative Neural Networks for


Prediθtion of Lipophiliθity in “LOGPS . Program. J. Chem. Inf. Comput. Sθi., ,
- .

[ ] Tetko, I. V., Tanθhuk, V. Y., & Villa, “. E. P. . Prediθtion of n-Oθtanol/Water


Partition Coeffiθients from PHYSPROP Dataηase Using “rtifiθial Neural Networks
and E-State Indiθes. J. Chem. Inf. Comput. Sθi., , - .

[ ] Sumpter, ”. G., & Noid, D. W. . Neural networks and graph theory as θompu‐
tational tools for prediθting polymer properties. Maθromol. Theor. Simul., , - .

[ ] Sθotta, D. J., Coveneya, P. V., Kilnerη, J. “., Rossinyη, J. C. H., & “lford, N. M. N.
. Prediθtion of the funθtional properties of θeramiθ materials from θomposition
using artifiθialneuralnetworks. J. Eur. Ceram. Soθ., , - .

[ ] Stojković, G., Novič, M., & Kuzmanovski, I. . Counter-propagation artifiθial


neural networks as a tool for prediθtion of pK”H+ for series of amides. Chemom. In‐
tell. Laη., , - .

[ ] Næs, T., Kvaal, K., Isaksson, T., & Miller, C. . “rtifiθial neural networks in mul‐
tivariate θaliηration. J. Near. Infrared Speθtrosθ., , - .

[ ] Munk, M. E., Madison, M. S., & Roηη, E. W. . Neural-network models for infra‐
red-speθtrum interpretation. Mikroθhim. Aθta, , - .

[ ] Meyer, M., & Weigelt, T. . Interpretation of infrared speθtra ηy artifiθial neural


networks. Anal. Chim. Aθta, , - .

[ ] Smits, J. R. M., Sθhoenmakers, P., Stehmann, “., Sijstermans, F., & Chemom, Kate‐
man G. . Interpretation of infrared speθtra with modular neural-network sys‐
tems. Intell. Laη., , - .
[ ] Goodaθre, R., Neal, M. J., & Kell, D. ”. . Rapid and Quantitative “nalysis of the
Pyrolysis Mass Speθtra of Complex ”inary and Tertiary Mixtures Using Multivariate
Caliηration and “rtifiθial Neural Networks. Anal. Chem., , - .
[ ] Ciroviθ, D. . Feed-forward artifiθial neural networks appliθations to speθtro‐
sθopy. TrAC Trend. Anal. Chem., , - .
[ ] Zhao, R. H., Yue, ”. F., Ni, J. Y., Zhou, H. F., & Zhang, Y. K. . “ppliθation of an
artifiθial neural network in θhromatography-retention ηehavior prediθtion and pat‐
tern reθognition. Chemom. Intell. Laη., , - .
[ ] ”lanθo, M., Coello, J., Iturriaga, H., Maspoθh, S., & Redon, M. . “rtifiθial Neural
Networks for Multiθomponent Kinetiθ Determinations. Anal. Chem., , - .
[ ] Fatemi, M. H. . Prediθtion of ozone tropospheriθ degradation rate θonstant of
organiθ θompounds ηy using artifiθial neural networks. Anal. Chim. Aθta, ,
- .
[ ] Diederiθhs, K., Freigang, J., Umhau, S., Zeth, K., & ”reed, J. . Prediθtion ηy a
neural network of outer memηrane {ηeta}-strand protein topology. Protein Sθi., ,
- .
[ ] Meiler, J. . PROSHIFT Protein θhemiθal shift prediθtion using artifiθial neural
networks. J. Biomol. NMR, , - .
[ ] Lohmann, R., Sθhneider, G., ”ehrens, D., & Wrede, P. “. . Neural network
model for the prediθtion of memηrane-spanning amino aθid sequenθes. Protein Sθi., ,
- .
[ ] Domηi, G. W., & Lawrenθe, J. . “nalysis of protein transmemηrane heliθal re‐
gions ηy a neural network. Protein Sθi., , - .
[ ] Wang, S. Q., Yang, J., & Chou, K. C. . Using staθked generalization to prediθt
memηrane protein types ηased on pseudo-amino aθid θomposition. J. Theor. Biol.,
, - .
[ ] Ma, L., Cheng, C., Liu, X., Zhao, Y., Wang, “., & Herdewijn, P. . “ neural net‐
work for prediθting the staηility of RN“/DN“ hyηrid duplexes. Chemom. Intell. Laη.,
, - .
[ ] Ferran, E. “., Pflugfelaer, ”., & Ferrara, P. . Self-organized neural maps of hu‐
man protein sequenθes. Protein Sθi., , - .
[ ] Petritis, K., Kangas, L. J., Ferguson, P. L., “nderson, G. “., Pa:a-Tolić, L., Lipton, M.
S., “uηerry, K. J., Strittmatter, E. F., Shen, Y., Zhao, R., & Smith, R. D. . Use of
“rtifiθial Neural Networks for the “θθurate Prediθtion of Peptide Liquid Chromatog‐
raphy Elution Times in Proteome “nalyses. Anal. Chem., , - .

[ ] Huang, R., Du, Q., Wei, Y., Pang, Z., Wei, H., & Chou, K. . Physiθs and θhemis‐
try-driven artifiθial neural network for prediθting ηioaθtivity of peptides and pro‐
teins and their design. J. Theor. Biol., , - .

[ ] Martin, Y. G., Oliveros, M. C. C., Pavon, J. L. P., Pinto, C. G., & Cordero, ”. M. .
Eleθtroniθ nose ηased on metal oxide semiθonduθtor sensors and pattern reθognition
teθhniques θharaθterisation of vegetaηle oils. Anal. Chim. Aθta, , - .

[ ] ”rodnjak-Vonθina, D., Kodηa, Z. C., & Noviθ, M. . Multivariate data analysis in


θlassifiθation of vegetaηle oils θharaθterized ηy the θontent of fatty aθids. Chemom. In‐
tell. Laη., , - .

[ ] Zhang, G. W., Ni, Y. N., Churθhill, J., & Kokot, S. . “uthentiθation of vegetaηle
oils on the ηasis of their physiθo-θhemiθal properties with the aid of θhemometriθs.
Talanta, , - .

[ ] Goodaθre, R., Kell, D. ”., & ”ianθhi, G. . Rapid assessment of the adulteration
of virgin olive oils ηy other seed oils using pyrolysis mass speθtrometry and artifiθial
neural networks. J. Sθi. Food Agr., , - .

[ ] ”ianθhi, G., Giansante, L., Shaw, “., & Kell, D. ”. . Chemometriθ θriteria for the
θharaθterisation of Italian DOP olive oils from their metaηoliθ profiles. Eur. J. Lipid.
Sθi. Teθh., , - .

[ ] ”uθθi, R., Magri, “. D., Magri, “. L., Marini, D., & Marini, F. . Chemiθal “u‐
thentiθation of Extra Virgin Olive Oil Varieties ηy Supervised Chemometriθ Proθe‐
dures. J. Agriθ. Food Chem., , - .

[ ] Marini, F., ”alestrieri, F., ”uθθi, R., Magri, “. D., Magri, “. L., & Marini, D. .
Supervised pattern reθognition to authentiθate Italian extra virgin olive oil varieties.
Chemom. Intell. Laη., , - .

[ ] Marini, F., ”alestrieri, F., ”uθθi, R., Magri, “. L., & Marini, D. . Supervised pat‐
tern reθognition to disθriminate the geographiθal origin of riθe ηran oils a first study.
Miθroθh. J., , - .

[ ] Marini, F., Magri, “. L., Marini, D., & ”alestrieri, F. . Charaθterization of the
lipid fraθtion of Niger seeds Guizotia aηyssiniθa θass from different regions of
Ethiopia and India and θhemometriθ authentiθation of their geographiθal origin. Eur.
J. Lipid. Sθi. Teθh., , - .

[ ] “lexander, P. W., Di ”enedetto, L. T., & Hiηηert, D. ”. . “ field-portaηle gas


analyzer with an array of six semiθonduθtor sensors. Part Identifiθation of ηeer
samples using artifiθial neural networks. Field. Anal. Chem. Teθh., , - .

[ ] Penza, M., & Cassano, G. . Chemometriθ θharaθterization of Italian wines ηy


thin-film multisensors array and artifiθial neural networks. Food Chem., , - .

[ ] Latorre, Pena. R., Garθia, S., & Herrero, C. . “uthentiθation of Galiθian NW


Spain honeys ηy multivariate teθhniques ηased on metal θontent data. Analyst., ,
- .

[ ] Cordella, C. ”. Y., Militao, J. S. L. T., & Clement, M. C. . Caηrol-”ass D Honey


Charaθterization and “dulteration Deteθtion ηy Pattern Reθognition “pplied on
HP“EC-P“D Profiles. . Honey Floral Speθies Charaθterization. J. Agriθ. Food Chem.,
, - .

[ ] ”rodnjak-Vonθina, D., Doηθnik, D., Noviθ, M., & Zupan, J. . Chemometriθs


θharaθterisation of the quality of river water. Anal. Chim. Aθta, , - .

[ ] Vonθina, E., ”rodnjak-Vonθina, D., Soviθ, N., & Noviθ, M. . Chemometriθ θhar‐
aθterisation of the Quality of Ground Waters from Different wells in Slovenia. Aθta
Chim. Slov., , - .

[ ] ”os, “., ”os, M., & van der Linden, W. E. . “rtifiθial neural networks as a tool
for soft-modelling in quantitative analytiθal θhemistry the prediθtion of the water
θontent of θheese. Anal. Chim. Aθta, , - .

[ ] Cimpoiu, C., Cristea, V., Hosu, “., Sandru, M., & Seserman, L. . “ntioxidant
aθtivity prediθtion and θlassifiθation of some teas using artifiθial neural networks.
Food Chem., , - .
Chapter 11

Recurrent Neural Network Based Approach for Solving


Groundwater Hydrology Problems

Ivan N. da Silva, José Ângelo Cagnon and


Nilton José Saggioro

Additional information is available at the end of the chapter

h““p://dx.doi.org/10.5772/51598

1. Introduction

Many communities obtain their drinking water from underground sources called aquifers. Official water suppliers or public incorporations drill wells into soil and rock aquifers looking for the groundwater contained there in order to supply the population with drinking water. An aquifer can be defined as a geologic formation that supplies water to a well in quantities large enough to make the production of water from this formation possible. The conventional estimation of the exploration flow involves many efforts to understand the relationship between structural and physical parameters. These parameters depend on several factors, such as soil properties and hydrologic and geologic aspects [ ].

The transportation of water to the reservoirs is usually done through submersed electrical motor pumps, electric power being one of the main inputs to water production. Considering the increasing difficulty of obtaining new electrical power sources, there is a need to reduce both operational costs and global energy consumption. Thus, it is important to adopt appropriate operational actions to manage efficiently the use of electrical power in these groundwater hydrology problems. For this purpose, it is essential to determine a parameter that expresses the energetic behavior of the whole water extraction set, which is here defined as the Global Energetic Efficiency Indicator (GEEI). A methodology using artificial neural networks is developed here in order to take into account several experimental tests related to energy consumption in submersed motor pumps.

The GEEI of a deep well is given in Wh/(m³·m). From a dimensional analysis, we can observe that a smaller numeric value of the GEEI indicates a better energetic efficiency of the water extraction system from the aquifer.


For such scope, this chapter is organized as follows. In Section 2, a brief summary of the water exploration process is presented. In Section 3, some aspects related to mathematical models applied to the water exploration process are described. In Section 4, the expressions defining the GEEI are formulated. The neural approach used to determine the GEEI is introduced in Section 5, while the procedures for estimation of the aquifer dynamic behavior using neural networks are presented in Section 6. Finally, in Section 7, the key issues raised in the chapter are summarized and conclusions are drawn.

2. Water Exploration Process

An aquifer is a saturated geologic unit with enough permeability to transmit economical quantities of water to wells [ ]. Aquifers are usually shaped by unconsolidated sands and crushed rocks. Sedimentary rocks, such as arenite and limestone, as well as volcanic and fractured crystalline rocks, can also be classified as aquifers.

After the drilling of a groundwater well, the test known as the Step Drawdown Test is carried out. This test consists of measuring the aquifer depth under continuous withdrawal of water with increasing flow over time. This depth relationship is defined as the Dynamic Level of the aquifer, and the aquifer level at the initial instant, i.e., the instant when the pump is turned on, is defined as the Static Level. This test gives the maximum water flow that can be pumped from the aquifer taking into account its respective dynamic level. Another characteristic given by this test is the determination of the Drawdown Discharge Curves, which represent the dynamic level in relation to the exploration flow [ ]. These curves are usually expressed by a mathematical function and their results have presented low precision.

Since the aquifer behavior changes in relation to operation time, the Drawdown Discharge Curves can represent the aquifer dynamics only at that particular moment. These changes occur due to many factors, such as the following: (i) aquifer recharge capability; (ii) interference from neighboring wells or changes in their exploration conditions; (iii) modification of the static level when the pump is turned on; (iv) operation cycle of the pump; and (v) rest time available to the well. Thus, the mapping of these groundwater hydrology problems by conventional identification techniques becomes very difficult when all the above considerations are taken into account. Besides the aquifer behavior, other components of the exploration system interfere with the global energetic efficiency of the system.

On the other hand, the motor-pump set mounted inside the well, submersed in the water that comes from the aquifer, receives the whole electric power supplied to the system. Through an eduction pipe, which also physically supports the motor pump, the water is transported to the ground surface and, from there, through an adduction pipe, it is transported to the reservoir, which is normally located at a position above the well. To transport water in this hydraulic system, several accessories (valves, pipes, curves, etc.) are necessary for its implementation. Figure 1 shows the typical components involved in a water extraction system based on deep wells.

The resistance to the water flow, due to the state of the pipe walls, is continuous along all the tubing, and will be taken as uniform in every section where the diameter of the pipe is constant.

This resistance makes the motor pump supply an additional pressure (or load) so that the water can reach the reservoir. Thus, the effect created by this resistance is also called "load loss along the pipe". Similarly to the tubing, other elements of the system cause a resistance to the fluid flow and, therefore, load losses. These losses can be considered local, localized, accidental or singular, due to the fact that they come from particular points or parts of the tubing.

Regarding the hydraulic circuit, it is observed that the distributed and localized load loss is an important parameter, and that it varies with the type and the state of the material.

Figure 1. Components of the pumping system.

Therefore, old tubing, with incrustations aggregated along the operational time, shows a load loss different from that of new tubing. A partially closed valve introduces a bigger load loss than a totally open one. A variation in the extraction flow also creates changes in the load loss. These are some observations, among several others, that could be made.

Another important factor concerning the global energetic efficiency of the system is the geometric difference in level. However, this parameter does not show any variation after the complete installation of the system. Concerning this, two statements can be made: (i) when mathematical models are used to study the lowering of the piezometric surface, these models should be frequently re-evaluated at certain periods of time; (ii) the exploration flow of the aquifer assumes a fundamental role in the study of the hydraulic circuit and should be carefully analyzed.

In order to overcome these problems, this work considers the use of parameters that are easily obtained in practice to represent the water catchment system, and the use of artificial neural networks to determine the exploration flow. From these parameters, it is possible to determine the GEEI of the system.

3. Mathematical Models Applied to the Water Exploration Process

One of the most used mathematical models to simulate the aquifer dynamic behavior is the Theis model [ , ]. This model is very simple and is used for transitory flow. In this model, the following hypotheses are considered: (i) the aquifer is confined by impermeable formations; (ii) the aquifer structure is homogeneous and isotropic in relation to its hydro-geological parameters; (iii) the aquifer thickness is considered constant with infinite horizontal extent; and (iv) the wells penetrate the entire aquifer and their pumping rates are considered constant in relation to time.
The model proposed by Theis can be represented by the following equations:

$$\frac{\partial^2 s}{\partial r^2} + \frac{1}{r}\,\frac{\partial s}{\partial r} = \frac{S}{T}\,\frac{\partial s}{\partial t}$$

$$s(r,0) = 0$$

$$s(\infty,t) = 0$$

$$\lim_{r \to 0}\, r\,\frac{\partial s}{\partial r} = -\frac{Q}{2\pi T}$$

where

s is the aquifer drawdown;

Q is the exploration flow;

T is the transmissivity coefficient;

r is the horizontal distance between the well and the observation place.

Applying the Laplace transform to these equations, we have:

$$\frac{d^2 \bar{s}}{dr^2} + \frac{1}{r}\,\frac{d\bar{s}}{dr} = \frac{S}{T}\, w\, \bar{s}$$

$$\bar{s}(r,w) = A \cdot K_0\!\left(r\sqrt{(S/T)\,w}\right)$$

$$\lim_{r \to 0}\, r\,\frac{d\bar{s}}{dr} = -\frac{Q}{2\pi T\, w}$$

where

w is the Laplace parameter;

S is the storage coefficient.

Thus, the aquifer drawdown in the Laplace space is given by:

$$\bar{s}(r,w) = \frac{Q}{2\pi T}\cdot\frac{K_0\!\left(r\sqrt{(S/T)\,w}\right)}{w}$$

This equation in the real space is as follows:

$$h - h_0(r,t) = s(r,t) = \frac{Q}{2\pi T}\, L^{-1}\!\left[\frac{K_0\!\left(r\sqrt{(S/T)\,w}\right)}{w}\right]$$

The Theis solution is then defined by:

$$s = \frac{Q}{4\pi T}\int_u^{\infty}\frac{e^{-y}}{y}\,dy = \frac{Q}{4\pi T}\, W(u)$$

where

$$u = \frac{r^2 S}{4\,T\,t}$$

Finally, combining the two previous expressions for the drawdown, we have:

$$W(u) = 2\, L^{-1}\!\left[\frac{K_0\!\left(r\sqrt{(S/T)\,w}\right)}{w}\right]$$

where

L⁻¹ is the inverse Laplace operator;

K is the hydraulic conductivity, and K₀ denotes the modified Bessel function of the second kind of order zero.

From the analysis of the Theis model, it is observed that, to model a particular aquifer, a high level of technical knowledge about this aquifer is indispensable, since it is mapped under several hypotheses (confined aquifer, homogeneous, isotropic, constant thickness, etc.). Moreover, other parameters of the aquifer to be explored (transmissivity coefficient, storage coefficient and hydraulic conductivity) must also be defined. Thus, the mathematical models require expert knowledge of concepts and tools of hydrogeology.
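As a side note (not part of the original chapter), the Theis solution above is straightforward to evaluate numerically, since the well function W(u) equals the exponential integral E1(u). The short sketch below uses SciPy for this, with purely illustrative parameter values.

```python
# Evaluating the Theis drawdown s(r, t) = Q/(4*pi*T) * W(u), with W(u) = E1(u).
import numpy as np
from scipy.special import exp1

def theis_drawdown(r, t, Q, T, S):
    """Drawdown for pumping rate Q, transmissivity T and storage coefficient S."""
    u = (r ** 2) * S / (4.0 * T * t)
    return Q / (4.0 * np.pi * T) * exp1(u)

# usage: drawdown 50 m from a well pumped at 100 m3/h (converted to m3/s)
Q = 100.0 / 3600.0      # m3/s
T = 5e-3                # m2/s
S = 1e-4                # dimensionless storage coefficient
for hours in (1, 12, 24):
    s = theis_drawdown(r=50.0, t=hours * 3600.0, Q=Q, T=T, S=S)
    print(f"t = {hours:>2} h  ->  s = {s:.3f} m")
```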

It is also indispensable to consider that the aquifer of a specific region shows continuous changes in its exploration conditions. These changes are normally caused by the companies that operate the exploration systems, through the drilling of new wells or modifications of the exploration conditions, or even by the drilling of illegal wells. Such changes certainly require immediate adjustment of the Theis model. Another fact is that the aquifer dynamic level changes in relation to the exploration flow, the operation time, the static level and, obviously, the intrinsic characteristics of the aquifer under exploration. In addition, neighboring wells may also cause interference in the aquifer.

Therefore, although the estimation of the aquifer behavior using mathematical models, such as those presented in [ ]-[ ], is possible, these models present low precision because it is difficult to consider all parameters related to the aquifer dynamics. For these situations, intelligent approaches [ ]-[ ] have also been used to obtain a good performance.

4. Defining the Global Energetic Efficiency Indicator

As presented in [ ], "Energetic Efficiency" is a generalized concept that refers to a set of actions to be done, or to the description of results achieved, that make possible the reduction of the demand for electrical energy. Energetic efficiency indicators are established through relationships and variables that can be used to monitor the variations and deviations in the energetic efficiency of the systems. Descriptive indicators are those that characterize the energetic situation without looking for a justification for its variations or deviations.

The theoretical concept of the proposed Global Energetic Efficiency Indicator will be presented using classical equations that show the relationship between the power absorbed from the electric system and the other parameters involved in the process.

As presented in [ ], the power of a motor-pump set is given by:

$$P_{mp} = \frac{\gamma \cdot Q \cdot H_T}{75\, \eta_{mp}}$$

where

Pmp is the power of the motor-pump set (CV);

γ is the specific weight of the water (kgf/m³);

Q is the water flow (m³/s);

HT is the total manometric height (m);

ηmp is the efficiency of the motor-pump set (ηmotor ⋅ ηpump).



Substituting the following values {1 CV ≅ 736 W; 1 m³/h = 1/3600 m³/s; γ = 1000 kgf/m³} into the previous equation, we have:

$$P_{mp} = \frac{2.726 \cdot Q \cdot H_T}{\eta_{mp}}$$

where Pmp is now given in W and Q in m³/h.
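The numeric factor 2.726 follows directly from these unit conversions; the arithmetic check below is added here for clarity and is not part of the original text:

$$P_{mp}\,[\mathrm{W}] = \frac{736 \cdot 1000 \cdot \tfrac{1}{3600} \cdot Q\,[\mathrm{m^3/h}] \cdot H_T\,[\mathrm{m}]}{75\, \eta_{mp}} \approx \frac{2.726\, Q\, H_T}{\eta_{mp}}$$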

The total manometric height (HT) in elevator sets for water extraction from underground aquifers is given by:

$$H_T = H_a + H_g + \Delta h_{ft}$$

where

HT is the total manometric height (m);

Ha is the dynamic level of the aquifer in the well (m);

Hg is the geometric difference in level between the well surface and the reservoir (m);

Δhft is the total load loss in the hydraulic circuit (m).

From the analysis of the variables in this equation, it is observed that only the variable corresponding to the geometric difference in level (Hg) can be considered constant, while the other two change along the operation time of the well.

The dynamic level (Ha) lowers from the beginning of the pumping until the moment of stabilization. This observation is verified over a short period of time, for instance, a month. Besides this variation, which can present a cyclic behavior, other types of variation, due to interference from neighboring wells, can take place, as well as alterations in the aquifer characteristics.

The total load loss will also vary during the pumping, and it depends on the hydraulic circuit characteristics (diameter, piping length, hydraulic accessories, curves, valves, etc.).

These characteristics can be considered constant, since they usually do not change after installation. However, the total load loss also depends on other characteristics of the hydraulic circuit, which frequently change along the useful life of the well. These variable characteristics are given by: (i) roughness of the piping system; (ii) water flow; and (iii) operational problems, such as semi-closed valves, leakage, etc.

Observing again Figure 1, it is verified that the energy necessary to transport the water from the aquifer to the reservoir, overcoming all the inherent load losses, is supplied by the electric system to the motor-pump set. Thus, using these considerations and substituting the expression for the total manometric height into the power equation, we have:

$$P_{el} = \frac{2.726 \cdot Q \cdot (H_a + H_g + \Delta h_{ft})}{\eta_{mp}}$$

where

Pel is the electric power absorbed from the electric system (W);

Q is the water flow (m³/h);

Ha is the dynamic level of the aquifer in the well (m);

Hg is the geometric difference in level between the well surface and the reservoir (m);

Δhft is the total load loss in the hydraulic circuit (m);

ηmp is the efficiency of the motor-pump set (ηmotor ⋅ ηpump).

From this expression, and considering that an energetic efficiency indicator should be a generic descriptive indicator, the Global Energetic Efficiency Indicator (GEEI) is here proposed by the following equation:

$$GEEI = \frac{P_{el}}{Q \cdot (H_a + H_g + \Delta h_{ft})}$$

Observing this equation, it is verified that the GEEI depends on the electric power, the water flow, the dynamic level, the geometric difference in level, and the total load loss of the hydraulic circuit.

The efficiency of the motor-pump set does not take part in the equation because its behavior is reflected inversely by the GEEI. Thus, when the efficiency of the motor-pump set is high, the GEEI will be low. Therefore, the best GEEI values are those presenting the smallest numeric values.

Another reason to exclude the efficiency of the motor-pump set is the difficulty of obtaining this value in practice. Since it is a fictitious value, it is impossible to make a direct measurement, and its value is obtained through relationships between other quantities. After the beginning of the pumping, the water level inside the well lowers. Then, the manometric height changes and, as a result, the water flow also changes. The efficiency of a motor-pump set will also change along its useful life due to equipment wear, piping incrustations, leakages in the hydraulic system, obstructions of filters inside the well, closed or semi-closed valves, etc.

Therefore, expressing the whole denominator in meters through the total manometric height, the most generic form of the GEEI is given by:

$$GEEI = \frac{P_{el}}{Q \cdot H_T}$$

The GEEI defined in this way can be used to analyze the well behavior along the time.
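A minimal sketch of this computation is shown below (not from the chapter): it evaluates GEEI = Pel / (Q · HT), with Pel in W, Q in m³/h and HT in m, so the indicator comes out in Wh/(m³·m). Only Pel, Q and Ha are taken from the experimental table later in this section; Hg and the load loss are hypothetical values.

```python
# Global Energetic Efficiency Indicator from measured quantities.
def geei(p_el_w, flow_m3_per_h, total_head_m):
    """GEEI in Wh/(m3*m): electric power over (flow * total manometric height)."""
    return p_el_w / (flow_m3_per_h * total_head_m)

# usage: dynamic level + geometric difference + load loss give the total head
Ha, Hg, dh_ft = 25.1, 40.0, 3.5          # meters (Hg and dh_ft are hypothetical)
HT = Ha + Hg + dh_ft
print(f"GEEI = {geei(26256.0, 75.0, HT):.2f} Wh/(m3*m)")
```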

5. Neural Approach Used to Determine the Global Energetic Efficiency Indicator

“mong all neθessary parameters to determine the proposed GEEI, the determination of the
exploration flow is the most diffiθult to oηtain in praθtiθe. The use of flow meters, as the
eleθtromagnetiθ ones, is very expensive. The use of rudimentary tests has provided impre‐
θise results.

To overθome this praθtiθal proηlem, it is proposed here the use of artifiθial neural networks
to determine the exploration flow from other parameters that have ηeen measured ηefore
determining the GEEI.

“rtifiθial Neural Networks “NN are dynamiθ systems that explore parallel and adaptive
proθessing arθhiteθtures. They θonsist of several simple proθessor elements with high degree
of θonneθtivity ηetween them [ ]. Eaθh one of these elements is assoθiated with a set of pa‐
rameters, known as network weights, that allows the mapping of a set of known values net‐
work inputs to a set of assoθiated values network outputs .

The proθess of weight adjustment to suitaηle values network training is θarried out
through suθθessive presentation of a set of training data. The oηjeθtive of the training is the
minimization ηetween the output response generated ηy the network and the respeθtive
desired output. “fter training proθess, the network will ηe aηle to estimate values for the
input set, whiθh were not inθluded in the training data.

In this work, an ANN is used as a functional approximator, since the exploration flow of the well is a dependent variable of those that will be used as input variables. The functional approximation consists of mapping the relationship between the several variables that describe the behavior of a real system [ ].

The ability of artificial neural networks to map complex nonlinear functions makes them an attractive tool to identify and estimate models representing the dynamic behavior of engineering processes. This feature is particularly important when the relationship between the several variables involved in the process is nonlinear and/or not very well defined, making its modeling difficult by conventional techniques.

A multilayer perceptron (MLP), as that shown in Figure 2, trained by the backpropagation algorithm, was used as a practical tool to determine the water flow from the measured parameters.

The input variables applied to the proposed neural network were the following:

• Level of water in meters (Ha) inside the well at the instant t.

• Manometric height in meters of water column (Hm) at the instant t.

• Electric power in Watts (Pel) absorbed from the electric system at the instant t.

The unique output variable was the exploration flow of the aquifer (Q), which is expressed in cubic meters per hour. It is important to observe that for each set of input values at a certain instant t, the neural network returns a result for the flow at that same instant t.

The determination of the GEEI is done by using in the GEEI equation the flow values obtained from the neural network and the other parameters that come from experimental measurements.

To train the neural network, all these variables (inputs and output) were measured and provided to the network. After training, the network was able to estimate the respective output variable. The values of the input variables and the respective output for a certain pumping period, which were used in the network training, are given by a set composed of training patterns (or training vectors).

Figure 2. Multilayer perceptron used to determine the water flow.

These patterns were applied to a neural network of MLP type (Multilayer Perceptron) with two hidden layers, and its training was done using the backpropagation algorithm based on the Levenberg-Marquardt method [ ]. A description of the main steps of this algorithm is presented in the Appendix.

The network topology used is similar to that presented in Figure 2. The number of hidden layers and the number of neurons in each layer were determined from results obtained in [ , ]. The network is here composed of two hidden layers, and the following parameters were used in the training process:

• Number of neurons of the 1st hidden layer.

• Number of neurons of the 2nd hidden layer.

• Training algorithm: Levenberg-Marquardt.

• Number of training epochs.

At the end of the training process, the mean squared error obtained was a value considered acceptable for this application [ ].
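For readers who want to reproduce a comparable flow estimator, the sketch below builds a small two-hidden-layer MLP with the three measured inputs (Ha, Hm, Pel) and the flow Q as output. It is a minimal illustration using scikit-learn: the layer sizes, the L-BFGS solver (scikit-learn does not offer Levenberg-Marquardt training) and the tiny data set taken from Table 1 are assumptions, not the configuration used by the authors.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# X: one row per time instant t -> [Ha (m), Hm (m), Pel (W)]; y: measured flow Q (m3/h).
X = np.array([[25.10,  8.25, 26256.0],
              [31.92, 48.00, 25987.0],
              [34.95, 48.00, 25870.0]])
y = np.array([75.00, 56.00, 53.00])

# Two hidden layers as in the chapter; layer sizes and solver are assumptions.
model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(10, 10), activation='tanh',
                 solver='lbfgs', max_iter=5000, random_state=0),
)
model.fit(X, y)
print(model.predict([[33.0, 48.0, 25950.0]]))  # estimated Q (m3/h) for unseen inputs
```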

After the training process, values of the input variables were applied to the network and the respective values of flow were obtained in its output. These values were then compared with the measured ones in order to evaluate the obtained precision.

Table 1 shows some values of flow that were given by the artificial neural network (QANN) and those measured by experimental tests (QET).

Ha (m) Hm (m) Pel (W) QANN (m3/h) QET (m3/h)

25.10 8.25 26,256 74.99 75.00

31.69 40.50 26,155 53.00 62.00

31.92 48.00 25,987 56.00 56.00

31.12 48.00 25,953 55.00 55.00

32.50 48.00 25,970 54.08 54.00

32.74 48.00 25,970 54.77 54.50

33.05 48.00 25,937 54.15 54.00

33.26 48.00 25,954 58.54 54.00

33.59 48.00 25,869 53.01 53.00

33.83 48.00 25,886 53.49 53.50

34.15 48.00 25,887 53.50 53.00

34.41 48.00 25,886 53.48 53.50

34.71 48.00 25,785 53.25 53.30

34.95 48.00 25,870 53.14 53.00

35.00 48.00 25,801 53.14 53.00

Table 1. Comparison of results.

In this table, the values in bold were not presented to the neural network during the training.

When the patterns used in the training are presented again, it is noticed that the difference between the results is very small, reaching a maximum of % of the measured value. When new patterns are used, the highest error reaches %. It is also observed that the error for new patterns decreases when they represent an operational stability situation of the motor-pump set, i.e., when they are far away from the transitory period of pumping.

At this point, we should observe that a greater number of training patterns would be desirable for the neural network, especially if they could be obtained from a wider variation of the range of values.

The proposed GEEI was determined by the equation above, and the measured values used were the electric power, the dynamic level, the geometric difference of level, the pressure of output in the well, and the water flow obtained from the neural network.

Figure 3 shows the behavior of the GEEI during the analyzed pumping period.

The numeric values that generated the graphic in Figure 3 are presented in Table 2.

Operation Time (min)    GEEI(t) (Wh/m³·m)    Operation Time (min)    GEEI(t) (Wh/m³·m)

0 7.420* 40 5.054

1 4.456* 45 5.139

2 5.738* 50 5.134

3 5.245* 55 5.115

4 4.896* 60 5.073

5 4.951* 75 5.066

6 4.689* 90 5.060

7 5.078* 105 5.042

8 4.840* 120 5.037

9 5.027* 135 5.042

10 5.090* 155 5.026

11 5.100* 185 5.032

12 5.092* 215 5.030

14 5.066* 245 5.040

16 5.044* 275 5.034

18 5.015* 305 5.027

20 5.006* 335 5.017

22 5.017 365 5.025

24 5.022 395 5.030

26 5.032 425 5.031

28 5.049 455 5.020

30 5.062 485 5.015

35 4.663

* GEEI in transitory period (from 0 to 20 min of pumping).

Table 2. GEEI calculated using the artificial neural network.



Figure 3. Behavior of the GEEI in relation to time.

Estimation of Aquifer Dynamic Behavior Using Neural Networks

In this section, artificial neural networks are used to map the relationship between the variables associated with the identification process of the aquifer dynamic behavior.

The general architecture of the neural system used in this application is shown in Figure 4, where two neural networks of MLP type (MLP-1 and MLP-2), constituted respectively by one and two hidden layers, compose this architecture.

Figure 4. General architecture of the ANN used for estimation of aquifer dynamic behavior.

The first network (ANN-1) has a single hidden layer and is responsible for the computation of the aquifer operation level. The training data for ANN-1 were directly obtained from experimental measurements. It is important to note that this network takes into account the present level and the rest time of the aquifer.

The second network (ANN-2) is responsible for the computation of the aquifer dynamic level and is composed of two hidden layers. For this network, the training data were also obtained from experimental measurements. As observed in Figure 4, the ANN-1 output is provided as an input parameter to ANN-2. Therefore, the computation of the aquifer dynamic level takes into account the aquifer operation level, the exploration flow and the operation time.

After the training process, the neural networks were used for estimation of the aquifer dynamic level. The simulation results obtained by the networks are presented in Table 3 and Table 4.

Present Level (meters)    Rest Time (hours)    Operation Level (ANN-1)    Operation Level (Exact)    Relative Error (%)

115.55 4 103.59 104.03 0.43 %

125.86 9 104.08 104.03 0.05 %

141.26 9 105.69 104.03 1.58 %

137.41 8 102.95 104.03 1.05 %

Table 3. Simulation results (ANN-1).

Table 3 presents the simulation results obtained by ANN-1 for a particular well. The operation levels computed by the network, taking into account the present level and rest time of the aquifer, were compared with the results obtained by measurements. In this table, the "Relative Error" column provides the relative error between the values estimated by the network and those obtained by measurements.

Operation Flow (m³/h)    Operation Time (hours)    Dynamic Level (ANN-2)    Dynamic Level (Exact)    Relative Error (%)

145 14 115.50 115.55 0.04 %

160 2 116.10 116.14 0.03 %

170 6 118.20 117.59 0.52 %

220 21 141.30 141.26 0.03 %

Table 4. Simulation results (ANN-2).



The simulation results obtained by ANN-2 are provided in Table 4. The dynamic level of the aquifer is estimated by the network in relation to the operation level computed by ANN-1, the exploration flow, and the operation time. These results are also compared with those obtained by measurements. In Table 4, the "Relative Error" column gives the relative error between the values computed by the network and those from measurements.

These results show the efficiency of the neural approach used for estimation of the aquifer dynamic behavior. The values estimated by the networks are accurate to within 1.58% of the exact values for ANN-1 and 0.52% for ANN-2. From the analysis of the results presented in Tables 3 and 4, it is verified that the relative error between the values provided by the networks and those obtained by experimental measurements is very small. For ANN-1, the greatest relative error is 1.58% (Table 3), and for ANN-2 it is 0.52% (Table 4).

Conclusion

The management of systems that explore underground aquifers includes the analysis of two basic components: the water, which comes from the aquifer, and the electric energy, which is necessary to transport the water to the consumption point or reservoir. Thus, the development of an efficiency indicator that shows the energetic behavior of a certain water captation system is of great importance for efficient management of the energy consumption, or, further, to convert the obtained results into actions that make possible a reduction of energy consumption.

The obtained GEEI will indicate the global energetic behavior of the water captation system from aquifers and will be an indicator of occurrences of abnormalities, such as tubing breaks or obstructions.

The application of the proposed methodology uses parameters that are easily obtained in the water exploration system. The GEEI calculation can also be done by operators or be implemented by means of a computational system.

In addition, a novel methodology for estimation of the aquifer dynamic behavior using artificial neural networks was also presented in this chapter. The estimation process is carried out by two feedforward neural networks. Simulation results confirm that the proposed approach can be efficiently used in these types of problem. From the results, it is possible to simulate several situations in order to define appropriate management plans and policies for the aquifer.

The main advantages of using this neural network approach are the following: (i) velocity: the estimation of dynamic levels is instantly computed, making it appropriate for real-time application; (ii) economy and simplicity: reduction of operational costs and measurement devices; and (iii) precision: the values estimated by the proposed approach are as good as those obtained by physical measurements.

Appendix

The mathematical model that describes the behavior of the artificial neuron is expressed by the following equations:

u = Σ_{i=1}^{n} w_i · x_i + b

y = g(u)

where n is the number of inputs of the neuron; x_i is the i-th input of the neuron; w_i is the weight associated with the i-th input; b is the threshold (bias) associated with the neuron; u is the activation potential; g is the activation function of the neuron; and y is the output of the neuron.

Basically, an artificial neuron works as follows:

(a) Signals are presented to the inputs.

(b) Each signal is multiplied by a weight that represents its influence in that unit.

(c) A weighted sum of the signals is made, resulting in a level of activity.

(d) If this level of activity exceeds a certain threshold, the unit produces an output.

A minimal numeric sketch of this computation is given below.
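The sketch assumes a tanh activation for illustration; any activation function g could be used in its place.

```python
import numpy as np

def neuron(x, w, b, g=np.tanh):
    """Single artificial neuron: weighted sum of the inputs plus threshold,
    passed through the activation function g."""
    u = np.dot(w, x) + b      # activation potential: u = sum_i w_i * x_i + b
    return g(u)               # neuron output: y = g(u)

# Example: three inputs with tanh activation.
print(neuron(x=np.array([0.5, -1.0, 2.0]),
             w=np.array([0.8, 0.2, -0.4]),
             b=0.1))
```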

To approximate any continuous nonlinear function, a neural network with only one hidden layer can be used. However, to approximate functions that are non-continuous in their domain, it is necessary to increase the number of hidden layers. Therefore, these networks are of great importance in mapping nonlinear processes and in identifying the relationships between the variables of these systems, which are generally difficult to obtain by conventional techniques.

The network weights w_j associated with the j-th output neuron are adjusted by computing the error signal linked to the k-th iteration, i.e., the k-th input vector (training example). This error signal is provided by:

e j (k ) = d j (k ) - y j (k )

where dj k is the desired response to the j-th output neuron.

“dding all squared errors produθed ηy the output neurons of the network with respeθt to k-
th iteration, we have

å
p
E (k ) =
1
e2j (k )
j =1
2

where p is the number of output neurons.

For an optimum weight configuration, E(k) is minimized with respect to the synaptic weights w_ji. The weights associated with the output layer of the network are therefore updated using the following relationship:

w_ji(k+1) ← w_ji(k) - η · ∂E(k)/∂w_ji(k)

where w_ji is the weight connecting the j-th neuron of the output layer to the i-th neuron of the previous layer, and η is a constant that determines the learning rate of the backpropagation algorithm.

The adjustment of the weights belonging to the hidden layers of the network is carried out in an analogous way. The basic steps necessary for adjusting the weights associated with the hidden neurons can be found in [ ].
Since the backpropagation learning algorithm was first popularized, there has been considerable research into methods to accelerate the convergence of the algorithm.

While backpropagation is a steepest descent algorithm, the Levenberg-Marquardt algorithm is similar to the quasi-Newton method, which was designed to approach second-order training speed without having to compute the Hessian matrix.

When the performance function has the form of a sum of squared errors, like that presented above, the Hessian matrix can be approximated as:

H = Jᵀ · J

and the gradient can be computed as:

g = Jᵀ · e

where e is a vector of network errors and J is the Jacobian matrix, which contains the first derivatives of the network errors with respect to the weights and biases.
The Levenberg-Marquardt algorithm uses this approximation to the Hessian matrix in the following Newton-like update:

w(k+1) ← w(k) - (Jᵀ·J + μ·I)⁻¹ · Jᵀ · e

When the scalar μ is zero, this is Newton's method using the approximate Hessian matrix. When μ is large, this becomes gradient descent with a small step size. Newton's method is faster and more accurate near an error minimum, so the aim is to shift toward Newton's method as quickly as possible.

Thus, μ is decreased after each successful step (reduction of the performance function) and is increased only when a tentative step would increase the performance function. In this way, the performance function is always reduced at each iteration of the algorithm [ ].

This algorithm appears to be the fastest method for training moderate-sized feedforward neural networks (up to several hundred weights).
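A compact sketch of a single Levenberg-Marquardt update, written for a generic error vector and Jacobian, is shown below; the toy curve-fitting problem and the fixed value of μ are assumptions for illustration (a full implementation would also adapt μ after each step as described above).

```python
import numpy as np

def lm_step(w, errors_fn, jacobian_fn, mu):
    """One Levenberg-Marquardt update: w <- w - (J^T J + mu I)^-1 J^T e.

    errors_fn(w)   -> e, the vector of network (or model) errors
    jacobian_fn(w) -> J, derivatives of the errors w.r.t. the parameters
    """
    e = errors_fn(w)
    J = jacobian_fn(w)
    H = J.T @ J                                   # Gauss-Newton Hessian approximation
    g = J.T @ e                                   # gradient
    step = np.linalg.solve(H + mu * np.eye(len(w)), g)
    return w - step

# Toy example: fit y = w0 + w1*x by least squares with one LM step.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.1, 4.9, 7.2])
errors = lambda w: (w[0] + w[1] * x) - y
jac = lambda w: np.column_stack([np.ones_like(x), x])
w = lm_step(np.array([0.0, 0.0]), errors, jac, mu=0.01)
print(w)
```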

Author details

Ivan N. da Silva*, José Ângelo Cagnon and Nilton José Saggioro

*Address all correspondence to: insilva@sc.usp.br

University of São Paulo (USP), São Carlos, SP, Brazil

São Paulo State University (UNESP), Bauru, SP, Brazil

University of São Paulo (USP), Bauru, SP, Brazil

References

[ ] Domenico, P. A. Concepts and Models in Groundwater Hydrology. New York: McGraw-Hill.

[ ] Domenico, P. A., & Schwartz, F. W. Physical and Chemical Hydrogeology. New York: John Wiley and Sons.

[ ] Saggioro, N. J. Development of Methodology for Determination of Global Energy Efficiency Indicator to Deep Wells. Master's degree dissertation (in Portuguese). São Paulo State University.

[ ] Haykin, S. Neural Networks and Learning Machines. New York: Prentice-Hall, 3rd edition.

[ ] Anthony, M., & Bartlett, P. L. Neural Network Learning: Theoretical Foundations. Cambridge: Cambridge University Press.

[ ] Hagan, M. T., & Menhaj, M. B. Training Feedforward Networks with the Marquardt Algorithm. IEEE Transactions on Neural Networks.

[ ] Silva, I. N., Saggioro, N. J., & Cagnon, J. A. Using neural networks for estimation of aquifer dynamical behavior. In: Proceedings of the International Joint Conference on Neural Networks, IJCNN, July, Como, Italy.

[ ] Cagnon, J. A., Saggioro, N. J., & Silva, I. N. Application of neural networks for analysis of the groundwater aquifer behavior. In: Proceedings of the IEEE Industry Applications Conference, INDUSCON, November, Porto Alegre, Brazil.

[ ] Driscoll, F. G. Groundwater and Wells. Minneapolis: Johnson Division.

[ ] Allen, D. M., Schuurman, N., & Zhang, Q. Using Fuzzy Logic for Modeling Aquifer Architecture. Journal of Geographical Systems.

[ ] Delhomme, J. P. Spatial Variability and Uncertainty in Groundwater Flow Parameters: A Geostatistical Approach. Water Resources Research.

[ ] Koike, K., Sakamoto, H., & Ohmi, M. Detection and Hydrologic Modeling of Aquifers in Unconsolidated Alluvial Plains through Combination of Borehole Data Sets: A Case Study of the Arao Area, Southwest Japan. Engineering Geology.

[ ] Scibek, J., & Allen, D. M. Modeled Impacts of Predicted Climate Change on Recharge and Groundwater Levels. Water Resources Research.

[ ] Fu, S., & Xue, Y. Identifying aquifer parameters based on the algorithm of simple pure shape. In: Proceedings of the International Symposium on Water Resource and Environmental Protection, ISWREP, May, Xi'an, China.

[ ] Jinyan, G., Yudong, L., Yuan, M., Mingchao, H., Yan, L., & Hongjuan, L. A mathematic time dependent boundary model for flow to a well in an unconfined aquifer. In: Proceedings of the International Symposium on Water Resource and Environmental Protection, ISWREP, May, Xi'an, China.

[ ] Hongfei, Z., & Jianqing, G. A mathematic time dependent boundary model for flow to a well in an unconfined aquifer. In: Proceedings of the International Conference on Computer Sciences and Convergence Information Technology, ICCIT, November to December, Seoul, Korea.

[ ] Cameron, E., & Peloso, G. F. An Application of Fuzzy Logic to the Assessment of Aquifers' Pollution Potential. Environmental Geology.

[ ] Gemitzi, A., Petalas, C., Tsihrintzis, V. A., & Pisinaras, V. Assessment of Groundwater Vulnerability to Pollution: A Combination of GIS, Fuzzy Logic and Decision Making Techniques. Environmental Geology.

[ ] Hong, Y. S., Rosen, M. R., & Reeves, R. R. Dynamic Fuzzy Modeling of Storm Water Infiltration in Urban Fractured Aquifers. Journal of Hydrologic Engineering.

[ ] He, X., & Liu, J. J. Aquifer parameter identification with ant colony optimization algorithm. In: Proceedings of the International Workshop on Intelligent Systems and Applications, ISA, May, Wuhan, China.
Chapter 12

Use of Artificial Neural Networks to Predict The


Business Success or Failure of Start-Up Firms

Francisco Garcia Fernandez,


Ignacio Soret Los Santos, Javier Lopez Martinez,
Santiago Izquierdo Izquierdo and
Francisco Llamazares Redondo

Additional information is available at the end of the chapter

h““p://dx.doi.org/10.5772/51381

Introduction

There is great interest in knowing whether a new company will be able to survive or not. Investors use different tools to evaluate the survival capabilities of middle-aged companies, but there is no such tool for start-up ones. Most of the tools are based on regression models and on quantitative variables. Nevertheless, qualitative variables, which measure the company's way of working and the manager's skills, can be considered as important as quantitative ones.

Developing a global regression model that includes quantitative and qualitative variables can be very complicated. In this case, artificial neural networks can be a very useful tool to model the company's survival capabilities. They have been widely used in engineering process modeling, but also in economy and business modeling, and there is no problem in mixing quantitative and qualitative variables in the same model. This kind of network is called a mixed artificial neural network.

Materials and methods

A snapshot of entrepreneurship in Spain

The Spanish entrepreneurship's basic indexes have been affected by the economic crisis. After a moderate drop in the previous year, the Total Entrepreneurial Activity index


TE“ experienθed a great drop . % in , returning to levels [ ]de la Vega Gar‐


θía, Fig .

Figure 1. Exec”“ive Repor“. Global En“reprene”rship Moni“or- Spain.

This rate implies that in our θountry there are , , nasθent ηusinesses ηetween and
months old . The owner-managers of a new ηusiness more than months ηut not more
than . years have also deθlined in , returning to levels.

As in other comparable, innovation-driven countries, the typical early-stage entrepreneur in Spain is male ( % of all entrepreneurs), with a mean age in the thirties, and well educated ( % with a university degree). The female entrepreneurial initiatives have declined, and the difference between the female and male Total Entrepreneurial Activity (TEA) rates is now bigger than before. The gender difference in the TEA index has increased from two to almost three points; the female TEA index is now % and the male TEA index is %.

Although most individuals are pulled into entrepreneurial activity because of opportunity recognition ( %), others are pushed into entrepreneurship because they have no other means of making a living, or because they fear becoming unemployed in the near future. These necessity entrepreneurs account for % of the entrepreneurs in Spain.

In Spain, the distribution of early-stage entrepreneurial activity and established business owners/managers by industry sector is similar to that in other innovation-driven countries, where business services (i.e., tertiary activities that target other firms as main customers, such as finance, data analysis, insurance, real estate, etc.) prevail. In Spain, they accounted for % of early-stage activities and % of established businesses. Transforming businesses (manufacturing and construction), which are typical of efficiency-driven countries, were the second largest sector, accounting for % and %, respectively. Consumer services (i.e., retail, restaurants, tourism) accounted for % and %, respectively. Extraction businesses (farming, forestry, fishing, mining), which are typical of factor-driven economies, accounted for % and %, respectively. The real estate activity in Spain was of great importance, and its decline explains the reduction in the business services sector.

The median amount invested by entrepreneurs was lower than the median amount of the previous year. Therefore, the entrepreneurial initiative is, in general, less ambitious.

The factors that most constrain entrepreneurial activity are, first, financial support (e.g., availability of debt and equity), which was cited as a constraining factor by % of respondents; second, government policies supporting entrepreneurship, cited as a constraining factor by % of respondents; and third, social and cultural norms, cited as a constraining factor by % of respondents.

More than one fifth of the entrepreneurial activity ( %) was developed in a family model. Therefore, the entrepreneurial initiatives, often driven by family members, received financial support or management assistance from some family members. Nevertheless, the influence of knowledge, technology or research results developed in the University was bigger than expected. People decided to start businesses because they used some knowledge, technology or research result developed in the University ( % of the nascent businesses, and % of the owner-managers of a new business).

Questionnaire

It is clear that company survival is greatly influenced by its financial capabilities; however, this numerical information is not always easy to obtain, and even when obtained, it is not always reliable.

Variable    Type    Range

Working Capital/Total Assets    Quantitative    R+
Retained Earnings/Total Assets    Quantitative    R+
Earnings Before Interest and Taxes/Total Assets    Quantitative    R+
Market Capitalization/Total Debts    Quantitative    R+
Sales/Total Assets    Quantitative    R+
Manager academic level    Qualitative    1-4
Company technological resources    Qualitative    1-4
Quality policies    Qualitative    1-5
Trademark    Qualitative    1-3
Employees education policy    Qualitative    1-2
Number of innovation areas    Qualitative    1-5
Marketing experience    Qualitative    1-3
Knowledge of the business area    Qualitative    1-3
Openness to experience    Qualitative    1-2

Table 1. Variables

There are other qualitative factors that influence the company's success, such as its technological capabilities, quality policies, or the academic level of its employees and manager.

In this study we will use both numerical and qualitative data to model the company survival (Table 1). A sketch of how such a mixed input vector could be assembled is shown below.
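As a simple illustration of how the quantitative ratios and the ordinal questionnaire scores of Table 1 can coexist in one input vector, consider the following sketch; all variable names and values are hypothetical:

```python
import numpy as np

# One hypothetical company: the five Altman-type ratios (continuous) followed by
# the ordinal questionnaire scores of Table 1; values are illustrative only.
quantitative = {
    "working_capital/total_assets": 0.18,
    "retained_earnings/total_assets": 0.05,
    "ebit/total_assets": 0.07,
    "market_cap/total_debts": 1.40,
    "sales/total_assets": 0.92,
}
qualitative = {
    "manager_academic_level": 3,      # 1-4
    "technological_resources": 2,     # 1-4
    "quality_policies": 4,            # 1-5
    "trademark": 2,                   # 1-3
    "employees_education_policy": 1,  # 1-2
    "innovation_areas": 3,            # 1-5
    "marketing_experience": 2,        # 1-3
    "business_area_knowledge": 3,     # 1-3
    "openness_to_experience": 2,      # 1-2
}

# Both groups end up in a single numeric input vector for the mixed network.
x = np.array(list(quantitative.values()) + list(qualitative.values()), dtype=float)
print(x.shape)  # (14,) -> one input neuron per variable in Table 1
```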

Financial data

The most used ratios to predict company success are the Altman ratios [ ] (Lacher et al.), [ ] (Atiya):

• Working Capital/Total Assets. Working Capital is defined as the difference between current assets and current liabilities. Current assets include cash, inventories, receivables and marketable securities. Current liabilities include accounts payable, short-term provisions and accrued expenses.

• Retained Earnings/Total Assets. This ratio is especially important because bankruptcy risk is higher for start-ups and young companies.

• Earnings Before Interest and Taxes/Total Assets. Since a company's existence is dependent on the earning power of its assets, this ratio is appropriate in failure prediction.

• Market Capitalization/Total Debts. This ratio weighs up the dimension of a company's competitive market place value.

• Sales/Total Assets. This ratio measures the firm's assets utilization.

Qualitative data

The selection of which qualitative data should be used is based on previous works, as in references [ - ] (Aragón Sánchez and Rubio Bañón, and Woods and Hampson), which establish several parameters to value the company positioning and its survival capabilities, and the influence of the manager's personality on the company survival.

• Manager academic level, ranged from 1 to 4:

  • PhD or Master.

  • University degree.

  • High school.

  • Basic studies.
• Company technological resources, ranged from 1 to 4:

  • The company uses self-made software programs.

  • The company uses specific programs but buys them.

  • The company uses the same software as competitors.

  • The company uses older software than competitors.

• Quality policies, ranged from 1 to 5:

  • The company has quality policies based on ISO standards.

  • The company controls both production and client satisfaction.

  • A production control is the only quality policy.

  • Supplies control is the only quality control in the company.

  • The company has no quality policy.


• Trademark, ranged from 1 to 3:

  • The company trademark is better known than competitors'.

  • The company trademark is as known as competitors'.

  • The company trademark is less known than competitors'.

• Employees education policy, ranged from 1 to 2:

  • The company is involved in its employees' education.

  • The company is not involved in its employees' education.

• Number of innovation areas in the company, ranged from 1 to 5.

• Marketing experience, ranged from 1 to 3:

  • The company has great marketing experience in the field of its products and in others.

  • The company has marketing experience only in its own field of duty.

  • The company has no marketing experience.

• Knowledge of the business area, ranged from 1 to 3:

  • The manager knows the business area perfectly and has been working in several companies related to it.

  • The manager knows the business area lightly.

  • The manager has no idea of the business area.

• Openness to experience, ranged from 1 to 2:

  • The manager is a practical person who is not interested in abstract ideas, prefers routine work and has few artistic interests.

  • The manager spends time reflecting on things, has an active imagination and likes to think up new ways of doing things, but may lack pragmatism.

Researchers will conduct these surveys with company managers. The surveys will be conducted by the same team of researchers to ensure the consistency of the questions involving qualitative variables.

Artificial neural networks

Predictive models based on artificial neural networks have been widely used in different knowledge areas, including economy and bankruptcy prediction [ , - ] (Lacher et al., Jo et al., Yang et al., Hsiao et al.) and forecasting market evolution [ ] (Jalil and Misas).

Artificial neural networks are mathematical structures based on biological brains, which are capable of extracting knowledge from a set of examples [ ] (Pérez and Martín). They are made up of a series of interconnected elements called neurons (Fig. 2), and knowledge is set in the connections between neurons [ ] (Priore et al.).

Figure 2. An artificial neuron. u(.): net function, f(.): transfer function, wij: connection weights, Bi: bias.

Figure 3. Artificial neural network architecture.

Those neurons are organized in a series of layers (Fig. 3). The input layer receives the values from the example variables, the inner layers perform the mathematical operations to obtain the proper response, which is shown by the output layer.

There is no clear method to know how many hidden layers or how many neurons an artificial neural network must have, so the only way to find the best net is trial and error [ ] (Lin and Tseng). In this work, a specific software tool will be developed in order to find the optimum number of neurons and hidden layers.

There are many different types of artificial neural network structures, depending on the problem to solve or to model. In this work, the perceptron structure has been chosen. The perceptron is one of the most used artificial neural networks, and its capability as a universal function approximator [ ] (Hornik) makes it suitable for modeling many different kinds of variable relationships, especially when it is more important to obtain a reliable solution than to know the relations between the variables.

The hyperbolic tangent sigmoid function (Fig. 4) has been chosen as transfer function. This function is a variation of the hyperbolic tangent [ ] (Chen), but the first one is quicker and improves the network efficiency [ ] (Demuth et al.).

Figure 4. Tansig function. f(x): neuron output, x: neuron input.

As the transfer function output interval is (-1, 1), the input data were normalized before training the network by means of the following equation [ - ] (Krauss et al., Demuth et al., Peng et al.):

X' = (X - Xmin) / (Xmax - Xmin)

where X' is the value after normalization of vector X, and Xmin and Xmax are the minimum and maximum values of vector X.

The network training will be carried out by means of supervised learning [ , - ] (Hagan et al. [ ], Haykin [ ], Pérez & Martín [ ], Isasi & Galván [ ]). The whole data set will be randomly divided into three groups with no repetition: the training set ( % of the data), the test set ( % of the data) and the validation set ( % of the data).

The resilient backpropagation training method will be used for training. This method is very adequate when sigmoid transfer functions are used [ ] (Demuth et al.).

To prevent overfitting, a very common problem during training, the training set error and the validation set error will be compared every few epochs. Training will be considered finished when the training set error keeps decreasing while the validation set error begins to increase.

As mentioned before, to find the optimum artificial neural network architecture, specific software will be developed. This software automatically builds different artificial neural network structures with different numbers of neurons and hidden layers. Finally, the software compares the results of all the nets developed and chooses the best one (Figs. 5 and 6). A minimal sketch of such a search loop is given below.
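The sketch below shows what such a trial-and-error search could look like; the candidate grid of layer counts and neuron numbers, the scikit-learn classifier (which does not implement resilient backpropagation) and the hold-out scoring are assumptions, not the authors' actual software.

```python
import itertools
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

def best_architecture(X, y, max_layers=2, neuron_options=(2, 4, 8, 16)):
    """Trial-and-error search over the number of hidden layers and neurons per layer."""
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)
    best_sizes, best_score = None, -np.inf
    for n_layers in range(1, max_layers + 1):
        for sizes in itertools.product(neuron_options, repeat=n_layers):
            net = MLPClassifier(hidden_layer_sizes=sizes, activation='tanh',
                                max_iter=2000, random_state=0)
            net.fit(X_tr, y_tr)              # train the candidate architecture
            score = net.score(X_val, y_val)  # accuracy on the validation split
            if score > best_score:
                best_sizes, best_score = sizes, score
    return best_sizes, best_score

# Usage (with a hypothetical feature matrix X of company variables and
# binary survival labels y):
# sizes, acc = best_architecture(X, y)
```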

Figure 5. Program pseudocode.



Figure 6. Neural network design flow chart.

The Matlab Artificial Neural Network Toolbox will be used to develop the artificial neural network.

Results and conclusions

This work represents the initial steps of an ambitious project that intends to evaluate the survival of start-up companies. Currently the work is in its third stage, which is the data analysis by means of the artificial neural network modeling method.

Once finished, it is expected to provide a very useful tool to predict the business success or failure of start-up firms.

Figure 7. Most expected neural network architecture.

Acknowledgements

This study is part of the ESIC Project "Redes neuronales y su aplicación al diagnóstico empresarial. Factores críticos de éxito de emprendimiento", supported by ESIC Business and Marketing School (Spain).

Author details

Francisco Garcia Fernandez*, Ignacio Soret Los Santos, Javier Lopez Martinez,
Santiago Izquierdo Izquierdo and Francisco Llamazares Redondo

*Address all correspondence to: francisco.garcia@upm.es

Universidad Politecnica de Madrid, Spain

ESIC Business & Marketing School, Spain

Universidad San Pablo CEU, Spain

MTP Metodos y Tecnologia, Spain



References

[ ] de la Vega García Pastor, J. GEM Informe Ejecutivo España. Madrid: Instituto de Empresa, Madrid Emprende. Paper presented at Memoria de Viveros de Empresa de la Comunidad de Madrid. Madrid: Madrid Emprende.

[ ] Lacher, R. C., Coats, P. K., Sharma, S. C., & Fant, L. F. A neural network for classifying the financial health of a firm. European Journal of Operational Research.

[ ] Atiya, A. F. Bankruptcy prediction for credit risk using neural networks: A survey and new results. IEEE Transactions on Neural Networks.

[ ] Rubio Bañón, A., & Aragón Sánchez, A. Factores explicativos del éxito competitivo. Un estudio empírico en la pyme. Cuadernos de Gestión.

[ ] Aragón Sánchez, A., & Rubio Bañón, A. Factores explicativos del éxito competitivo: el caso de las PYMES del estado de Veracruz. Contaduría y Administración.

[ ] Woods, S. A., & Hampson, S. E. Measuring the Big Five with single items using a bipolar response scale. European Journal of Personality.

[ ] Jo, H., Han, I., & Lee, H. Bankruptcy prediction using case-based reasoning, neural networks and discriminant analysis. Expert Systems With Applications.

[ ] Yang, Z. R., Platt, M. B., & Platt, H. D. Probabilistic neural networks in bankruptcy prediction. Journal of Business Research.

[ ] Hsiao, S. H., & Whang, T. J. A study of financial insolvency prediction model for life insurers. Expert Systems With Applications.

[ ] Jalil, M. A., & Misas, M. Evaluación de pronósticos del tipo de cambio utilizando redes neuronales y funciones de pérdida asimétricas. Revista Colombiana de Estadística.

[ ] Pérez, M. L., & Martín, Q. Aplicaciones de las redes neuronales a la estadística. Cuadernos de Estadística. La Muralla S.A., Madrid.

[ ] Priore, P., De La Fuente, D., Pino, R., & Puente, J. Utilización de las redes neuronales en la toma de decisiones. Aplicación a un problema de secuenciación. Anales de Mecánica y Electricidad.

[ ] Lin, T. Y., & Tseng, C. H. Optimum design for artificial networks: an example in a bicycle derailleur system. Engineering Applications of Artificial Intelligence.

[ ] Hornik, K. Multilayer Feedforward Networks are Universal Approximators. Neural Networks.

[ ] Cheng, C. S. A multi-layer neural network model for detecting changes in the process mean. Computers & Industrial Engineering.

[ ] Demuth, H., Beale, M., & Hagan, M. Neural Network Toolbox User's Guide. Natick: The MathWorks Inc., MA, USA.

[ ] Krauss, G., Kindangen, J. I., & Depecker, P. Using artificial neural networks to predict interior velocity coefficients. Building and Environment.

[ ] Peng, G., Chen, X., Wu, W., & Jiang, X. Modeling of water sorption isotherm for corn starch. Journal of Food Engineering.

[ ] Hagan, M. T., Demuth, H. B., & Beale, M. Neural Network Design. Boston: PWS Pub. Co., USA.

[ ] Haykin, S. Neural Networks: A Comprehensive Foundation. 2nd edition. Prentice Hall, New Jersey, USA.

[ ] Isasi, P., & Galván, I. M. Redes Neuronales Artificiales, un enfoque práctico. Pearson Educación, S.A., Madrid.
