ARTIFICIAL NEURAL
NETWORKS –
ARCHITECTURES AND
APPLICATIONS
Contributors
Eduardo Bianchi, Thiago M. Geronimo, Carlos E. D. Cruz, Fernando de Souza Campos, Paulo Roberto De Aguiar, Yuko Osana, Francisco Garcia Fernandez, Ignacio Soret Los Santos, Francisco Llamazares Redondo, Santiago Izquierdo Izquierdo, Jose Manuel Ortiz-Rodriguez, Hector Rene Vega-Carrillo, José Manuel Cervantes-Viramontes, Víctor Martín Hernández-Dávila, Maria Del Rosario Martinez-Blanco, Giovanni Caocci, Amr Radi, Joao Luis Garcia Rosa, Jan Mareš, Lucie Grafova, Ales Prochazka, Pavel Konopasek, Siti Mariyam Shamsuddin, Hazem M. El-Bakry, Ivan Nunes Da Silva
Published by InTech
Janeza Trdine 9, 51000 Rijeka, Croatia
Notice
Statements and opinions expressed in the chapters are those of the individual contributors and not necessarily those of the editors or publisher. No responsibility is accepted for the accuracy of information contained in the published chapters. The publisher assumes no responsibility for any damage or injury to persons or property arising out of the use of any materials, instructions, methods or ideas contained in the book.
Preface
Artificial neural networks may be the single most successful technology of the last two decades, and they have been widely used in a large variety of applications in various areas. An artificial neural network, often just called a neural network, is a mathematical or computational model that is inspired by the structure and function of biological neural networks in the brain. An artificial neural network consists of a number of artificial neurons (i.e., nonlinear processing units) which are connected to each other via synaptic weights (or simply "weights"). An artificial neural network can learn a task by adjusting its weights. There are supervised and unsupervised models. A supervised model requires a "teacher" or desired ideal output to learn a task. An unsupervised model does not require a "teacher," but it learns a task based on a cost function associated with the task. An artificial neural network is a powerful, versatile tool. Artificial neural networks have been successfully used in various applications such as biological, medical, industrial, control engineering, software engineering, environmental, economical, and social applications. The high versatility of artificial neural networks comes from their high capability and learning function. It has been theoretically proved that an artificial neural network can approximate any continuous mapping with arbitrary precision. A desired continuous mapping, or a desired task, is acquired in an artificial neural network by learning.
The purpose of this book is to provide recent advances in architectures, methodologies, and applications of artificial neural networks. The book consists of two parts: architectures and applications. The architecture part covers architectures, design, optimization, and analysis of artificial neural networks. The fundamental concepts, principles, and theory in this part help the reader to understand and use an artificial neural network properly and effectively in a specific application. The applications part covers applications of artificial neural networks in a wide range of areas, including biomedical, industrial, physics, chemistry, and financial applications.
Thus, this book will be a fundamental source of recent advances and applications of artificial neural networks in a wide variety of areas. The target audience of this book includes professors, college students, graduate students, and engineers and researchers in companies. I hope this book will be a useful source for readers.
h““p://dx.doi.org/10.5772/51581
Chapter 1
Improved Kohonen Feature Map Probabilistic Associative Memory Based on Weights Distribution

1. Introduction
Recently, neural networks have been drawing much attention as a method to realize flexible information processing. Neural networks model groups of neurons in the brain of a living creature, and imitate these neurons technologically. Neural networks have several features; one of the most important is that the networks can learn to acquire the ability of information processing.
In the field of neural networks, many models have been proposed, such as the Back Propagation algorithm [ ], the Kohonen Feature Map (KFM) [ ], the Hopfield network [ ], and the Bidirectional Associative Memory [ ]. In these models, the learning process and the recall process are divided, and therefore they need all information to learn in advance.
However, in the real world, it is very difficult to get all information to learn in advance, so we need a model whose learning process and recall process are not divided. As such a model, Grossberg and Carpenter proposed the ART (Adaptive Resonance Theory) [ ]. However, the ART is based on the local representation, and therefore it is not robust against damaged neurons in the Map Layer. In the field of associative memories, on the other hand, some models have been proposed [ - ]. Since these models are based on the distributed representation, they are robust against damaged neurons. However, their storage capacities are small because their learning algorithm is based on Hebbian learning.
The Kohonen Feature Map (KFM) associative memory [ ] has also been proposed. Although the KFM associative memory is based on the local representation, similar to the ART [ ], it can learn new patterns successively [ ], and its storage capacity is larger than that of the models in refs. [ - ].
© 2013 Noguchi and Yuko; licensee InTech. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
It can deal with auto and hetero associations and the associations for plural sequential patterns including common terms [ , ]. Moreover, the KFM
associative memory with area representation [ ] has been proposed. In this model, the area representation [ ] was introduced to the KFM associative memory, giving it robustness against damaged neurons. However, it cannot deal with one-to-many associations or with associations of analog patterns. As a model which can deal with analog patterns and one-to-many associations, the Kohonen Feature Map Associative Memory with Refractoriness based on Area Representation [ ] has been proposed. In that model, one-to-many associations are realized by the refractoriness of neurons. Moreover, by an improvement of the calculation of the internal states of the neurons in the Map Layer, it has sufficient robustness against damaged neurons when analog patterns are memorized. However, none of these models can realize probabilistic association for a training set including one-to-many relations.
As a model which can realize probabilistic association for a training set including one-to-many relations, the Kohonen Feature Map Probabilistic Associative Memory based on Weights Distribution (KFMPAM-WD) [ ] has been proposed. However, in this model, the weights are updated only in the area corresponding to the input pattern, so learning considering the neighborhood is not carried out.
2. KFM Probabilistic Associative Memory based on Weights Distribution

Here, we explain the conventional Kohonen Feature Map Probabilistic Associative Memory based on Weights Distribution (KFMPAM-WD).

2.1. Structure

Figure shows the structure of the conventional KFMPAM-WD. As shown in Fig. , this model has two layers, the Input/Output Layer and the Map Layer, and the Input/Output Layer is divided into some parts.
2.2. Learning process

In the learning algorithm of the conventional KFMPAM-WD, the connection weights are learned as follows:
1. The initial values of the weights are chosen randomly.
2. The Euclidian distance between the learning vector X^p and the connection weights vector W_i, d(X^p, W_i), is calculated.

3. If d(X^p, W_i) > θ^t is satisfied for all neurons, the input pattern X^p is regarded as an unknown pattern. If the input pattern is regarded as a known pattern, go to ( ).
4. The neuron which is the center of the learning area, r, is determined as follows:

r = argmin_{i : D_iz + D_zi < d_iz (for ∀z ∈ F)} d(X^p, W_i)

where F is the set of the neurons whose connection weights are fixed, and d_iz is the distance between the neuron i and the neuron z whose connection weights are fixed. Here, D_ij is the radius of the ellipse area whose center is the neuron i for the direction to the neuron j, and is given by

D_ij =
  a_i,  (d_ijy = 0)
  b_i,  (d_ijx = 0)
  a_i b_i sqrt(m_ij^2 + 1) / sqrt(b_i^2 + m_ij^2 a_i^2),  (otherwise)

where a_i is the long radius of the ellipse area whose center is the neuron i, b_i is the short radius of that area, and m_ij is the slope of the line through the neurons i and j. In the KFMPAM-WD, a_i and b_i can be set for each training pattern. By the argmin above, the neuron whose Euclidian distance between its connection weights and the learning vector is minimum, among the neurons which can take areas without overlapping the areas corresponding to the patterns which are already trained, is selected as the center neuron r. Here, a_i and b_i are used as the size of the area for the learning vector.
5. If d(X^p, W_r) > θ^t is satisfied, the connection weights of the neurons in the ellipse whose center is the neuron r are updated as follows:

W_i(t+1) =
  W_i(t) + α(t) (X^p − W_i(t)),  (d_ri ≤ D_ri)
  W_i(t),  (otherwise)

α(t) = −α_0 (t − T) / T

Here, α_0 is the initial value of α(t) and T is the upper limit of the learning iterations.

6. Step 5 is iterated until d(X^p, W_r) ≤ θ^t is satisfied.
2.3. Recall process

In the recall process of the KFMPAM-WD, when the pattern X is given to the Input/Output Layer, the output of the neuron i in the Map Layer, x_i^map, is calculated by

x_i^map =
  1,  (i = r)
  0,  (otherwise)

where the winner neuron r is selected randomly from the neurons which satisfy

(1 / N^in) Σ_{k∈C} g(X_k − W_ik) > θ^map

where θ^map is the threshold of the neurons in the Map Layer, and g(·) is given by

g(b) =
  1,  (|b| < θ^d)
  0,  (otherwise)

In the KFMPAM-WD, one of the neurons whose connection weights are similar to the input pattern is selected randomly as the winner neuron, so the probabilistic association can be realized based on the weights distribution.
When the binary pattern X is given to the Input/Output Layer, the output of the neuron k in the Input/Output Layer, x_k^io, is given by

x_k^io =
  1,  (W_rk ≥ θ^io)
  0,  (otherwise)

where θ^io is the threshold of the neurons in the Input/Output Layer.

When the analog pattern X is given to the Input/Output Layer, the output of the neuron k in the Input/Output Layer, x_k^io, is given by

x_k^io = W_rk.
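The recall step can be sketched as follows. This is a simplified auto-associative version under our own naming; the threshold values are illustrative, not taken from the chapter.

```python
import numpy as np

def recall(W, x, theta_map=0.75, theta_d=0.1, rng=None):
    """One recall step: a Map Layer neuron fires when the fraction of its
    weight components within theta_d of the input exceeds theta_map
    (the g(.) sum of the text); the winner r is drawn at random among the
    firing neurons, which is what makes the association probabilistic."""
    rng = rng or np.random.default_rng(0)
    close = np.abs(W - x) < theta_d          # g applied component-wise
    score = close.mean(axis=1)               # (1/N_in) * sum_k g(X_k - W_ik)
    candidates = np.flatnonzero(score > theta_map)
    if candidates.size == 0:
        return None                          # no neuron fires: unknown pattern
    r = int(rng.choice(candidates))
    return W[r]                              # analog output: x_k_io = W_rk
```

Because several neurons may store the same input with different associated outputs, drawing r uniformly among the firing neurons recalls each stored pair with a probability proportional to its area, i.e. to the weights distribution.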
3. Improved KFM Probabilistic Associative Memory based on Weights Distribution

Here, we explain the proposed Improved Kohonen Feature Map Probabilistic Associative Memory based on Weights Distribution (IKFMPAM-WD). The proposed model is based on the conventional Kohonen Feature Map Probabilistic Associative Memory based on Weights Distribution (KFMPAM-WD) [ ] described above.
3.1. Structure

Figure shows the structure of the proposed IKFMPAM-WD. As shown in the figure, the proposed model has two layers, the Input/Output Layer and the Map Layer, and the Input/Output Layer is divided into some parts, similar to the conventional KFMPAM-WD.
3.2. Learning process

In the learning algorithm of the proposed IKFMPAM-WD, the connection weights are learned as follows:

1. The initial values of the weights are chosen randomly.

2. The Euclidian distance between the learning vector X^p and the connection weights vector W_i, d(X^p, W_i), is calculated.

3. The neuron which is the center of the learning area, r, is determined by Eq. ( ). As in the conventional model, the neuron whose Euclidian distance between its connection weights and the learning vector is minimum, among the neurons which can take areas without overlapping the areas corresponding to the patterns which are already trained, is selected as r, and a_i and b_i are used as the size of the area for the learning vector.
4. If d(X^p, W_r) > θ^t is satisfied, the connection weights of the neurons in the ellipse whose center is the neuron r are updated as follows:

W_i(t+1) =
  X^p,  (θ_1^learn ≤ H(d̄_ri))
  W_i(t) + H(d̄_ri) (X^p − W_i(t)),  (θ_2^learn ≤ H(d̄_ri) < θ_1^learn and H(d̄_i*i) < θ_3^learn)
  W_i(t),  (otherwise)

where θ_1^learn, θ_2^learn and θ_3^learn are thresholds. H(d̄_ri) and H(d̄_i*i) are given by the semi-fixed function below; in particular, H(d̄_ri) behaves as the neighborhood function. Here, i* shows the nearest weight-fixed neuron from the neuron i.

H(d̄_ij) = 1 / (1 + exp((d̄_ij − D) / ε))

where d̄_ij shows the normalized radius of the ellipse area whose center is the neuron i for the direction to the neuron j, and is given by

d̄_ij = d_ij / D_ij.

Here, D is the constant to decide the neighborhood area size and ε is the steepness parameter. If there is no weight-fixed neuron,

H(d̄_i*i) = 0

is used.

5. Step 4 is iterated until d(X^p, W_r) ≤ θ^t is satisfied.
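The semi-fixed function H can be written down directly. This is a sketch under our own naming; the values of D and ε below are illustrative, not taken from the chapter.

```python
import numpy as np

def H(d_norm, D=1.5, eps=0.05):
    """Semi-fixed sigmoid H(d) = 1 / (1 + exp((d - D) / eps)): close to 1
    while the normalized distance d is below D, then dropping sharply to 0;
    eps sets the steepness of the transition."""
    return 1.0 / (1.0 + np.exp((d_norm - D) / eps))
```

With these values H(0) ≈ 1 (a neuron well inside the neighborhood is pulled strongly toward X^p) and H(3) ≈ 0 (no update), so comparing H against the thresholds reproduces the three branches of the update rule above.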
3.3. Recall process

The recall process of the proposed IKFMPAM-WD is the same as that of the conventional KFMPAM-WD described above.
4. Computer experiment results

Here, we show the computer experiment results to demonstrate the effectiveness of the proposed IKFMPAM-WD.
4.1. Experimental conditions
4.2. Association results

4.2.1. Binary patterns
In this experiment, the binary patterns including one-to-many relations shown in Fig. were memorized in the network composed of neurons in the Input/Output Layer and neurons in the Map Layer. Figure shows a part of the association result when "crow" was given to the Input/Output Layer. As shown in the figure, when "crow" was given to the network, "mouse" (t = ), "monkey" (t = ) and "lion" (t = ) were recalled. Figure shows a part of the association result when "duck" was given to the Input/Output Layer. In this case, "dog" (t = ), "cat" (t = ) and "penguin" (t = ) were recalled. From these results, we can confirm that the proposed model can recall binary patterns including one-to-many relations.
Figure 4. One-to-Many Associations for Binary Patterns (When "crow" was Given).
Figure 5. One-to-Many Associations for Binary Patterns (When "duck" was Given).
Figure shows the Map Layer after the pattern pairs shown in Fig. were memorized. In the figure, red neurons show the center neuron of each area, blue neurons show the neurons in the areas for the patterns including "crow", and green neurons show the neurons in the areas for the patterns including "duck". As shown in the figure, the proposed model can learn each learning pattern with an area of its own size. Moreover, since in the proposed model the connection weights are updated not only in the area but also in its neighborhood, the areas corresponding to the pattern pairs including "crow"/"duck" are arranged near each other.
Table 3. Recall Times for Binary Patterns corresponding to "crow" and "duck".
Table shows the recall times of each pattern in the trials of Fig. (t = ∼ ) and Fig. (t = ∼ ). In this table, normalized values are also shown in ( ). From these results, we can confirm that the proposed model can realize probabilistic associations based on the weights distribution.
4.2.2. Analog patterns
In this experiment, the analog patterns including one-to-many relations shown in Fig. were memorized in the network composed of neurons in the Input/Output Layer and neurons in the Map Layer. Figure shows a part of the association result when "bear" was given to the Input/Output Layer. As shown in the figure, when "bear" was given to the network, "lion" (t = ), "raccoon dog" (t = ) and "penguin" (t = ) were recalled. Figure shows a part of the association result when "mouse" was given to the Input/Output Layer. In this case, "monkey" (t = ), "hen" (t = ) and "chick" (t = ) were recalled. From these results, we can confirm that the proposed model can recall analog patterns including one-to-many relations.
Figure 8. One-to-Many Associations for Analog Patterns (When "bear" was Given).
Figure 9. One-to-Many Associations for Analog Patterns (When "mouse" was Given).
Table 5. Recall Times for Analog Patterns corresponding to "bear" and "mouse".
Figure shows the Map Layer after the pattern pairs shown in Fig. were memorized. In the figure, red neurons show the center neuron of each area, blue neurons show the neurons in the areas for the patterns including "bear", and green neurons show the neurons in the areas for the patterns including "mouse". As shown in the figure, the proposed model can learn each learning pattern with an area of its own size.
Table shows the recall times of each pattern in the trials of Fig. (t = ∼ ) and Fig. (t = ∼ ). In this table, normalized values are also shown in ( ). From these results, we can confirm that the proposed model can realize probabilistic associations based on the weights distribution.
4.3. Storage capacity

Here, we examined the storage capacity of the proposed model. Figures and show the storage capacity of the proposed model. In this experiment, we used networks composed of neurons in the Input/Output Layer and / neurons in the Map Layer, and one-to-P (P = , , ) random pattern pairs were memorized with the area sizes a_i = and b_i = . Figures and show the average of trials, and the storage capacities of the conventional model are also shown for reference. From these results, we can confirm that the storage capacity of the proposed model is almost the same as that of the conventional model. As shown in the figures, the storage capacity of the proposed model does not depend on whether the patterns are binary or analog, nor does it depend on P in the one-to-P relations; it depends on the number of neurons in the Map Layer.
4.4. Robustness for noisy input

Figure shows a part of the association result of the proposed model when the pattern "cat" with % noise was given during t = ∼ . Figure shows a part of the association result of the proposed model when the pattern "crow" with % noise was given (t = ∼ ). As shown in these figures, the proposed model can recall correct patterns even when a noisy input is given.
Figure 15. Association Result for Noisy Input (When "crow" was Given).
Figure 16. Association Result for Noisy Input (When "duck" was Given).
Figures and show the robustness of the proposed model for noisy input. In this experiment, random patterns in one-to-one relations were memorized in the network composed of neurons in the Input/Output Layer and neurons in the Map Layer. Figures and show the average of trials. As shown in these figures, the proposed model has robustness for noisy input similar to the conventional model.
4.5. Robustness for damaged neurons

Figure shows a part of the association result of the proposed model when the pattern "bear" was given during t = ∼ . Figure shows a part of the association result of the proposed model when the pattern "mouse" was given (t = ∼ ). In these experiments, a network in which % of the neurons in the Map Layer were damaged was used. As shown in these figures, the proposed model can recall correct patterns even when some neurons in the Map Layer are damaged.
Figures and show the robustness of the proposed model when the winner neurons are damaged. In this experiment, ∼ random patterns in one-to-one relations were memorized in the network composed of neurons in the Input/Output Layer and neurons in the Map Layer. Figures and show the average of trials. As shown in these figures, the proposed model has robustness similar to the conventional model [ ] when the winner neurons are damaged.
Figure 19. Association Result for Damaged Neurons (When "bear" was Given).
Figure 20. Association Result for Damaged Neurons (When "mouse" was Given).
Figures and show the robustness of the proposed model for damaged neurons. In this experiment, random patterns in one-to-one relations were memorized in the network composed of neurons in the Input/Output Layer and neurons in the Map Layer. Figures and show the average of trials. As shown in these figures, the proposed model has robustness for damaged neurons similar to the conventional model [ ].
4.6. Learning speed

Here, we examined the learning speed of the proposed model. In this experiment, random patterns were memorized in the network composed of neurons in the Input/Output Layer and neurons in the Map Layer. Table shows the learning time of the proposed model and the conventional model. These results are the average of trials on a personal computer (Intel Pentium . GHz, FreeBSD . , gcc . . ). As shown in the table, the learning time of the proposed model is shorter than that of the conventional model.
5. Conclusions

In this paper, we have proposed the Improved Kohonen Feature Map Probabilistic Associative Memory based on Weights Distribution (IKFMPAM-WD). This model is based on the conventional Kohonen Feature Map Probabilistic Associative Memory based on Weights Distribution. The proposed model can realize probabilistic association for a training set including one-to-many relations. Moreover, it has sufficient robustness for noisy input and damaged neurons. We carried out a series of computer experiments and confirmed the effectiveness of the proposed model.
Author details
References
[ ] Rumelhart, D. E., McClelland, J. L., & the PDP Research Group. ( ). Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Foundations. The MIT Press.
[ ] Watanabe, M., Aihara, K., & Kondo, S. ( ). Automatic learning in chaotic neural networks. IEICE-A, J -A( ), - (in Japanese).
[ ] Arai, T., & Osana, Y. ( ). Hetero chaotic associative memory for successive learning with give up function -- One-to-many associations --. Proceedings of IASTED Artificial Intelligence and Applications, Innsbruck.
[ ] Ando, M., Okuno, Y., & Osana, Y. ( ). Hetero chaotic associative memory for successive learning with multi-winners competition. Proceedings of IEEE and INNS International Joint Conference on Neural Networks, Vancouver.
[ ] Ichiki, H., Hagiwara, M., & Nakagawa, M. ( ). Kohonen feature maps as a supervised learning machine. Proceedings of IEEE International Conference on Neural Networks, - .
[ ] Yamada, T., Hattori, M., Morisawa, M., & Ito, H. ( ). Sequential learning for associative memory using Kohonen feature map. Proceedings of IEEE and INNS International Joint Conference on Neural Networks, Washington D.C.
[ ] Hattori, M., Arisumi, H., & Ito, H. ( ). Sequential learning for SOM associative memory with map reconstruction. Proceedings of International Conference on Artificial Neural Networks, Vienna.
[ ] Sakurai, N., Hattori, M., & Ito, H. ( ). SOM associative memory for temporal sequences. Proceedings of IEEE and INNS International Joint Conference on Neural Networks, - , Honolulu.
[ ] Abe, H., & Osana, Y. ( ). Kohonen feature map associative memory with area representation. Proceedings of IASTED Artificial Intelligence and Applications, Innsbruck.
[ ] Koike, M., & Osana, Y. ( ). Kohonen feature map probabilistic associative memory based on weights distribution. Proceedings of IASTED Artificial Intelligence and Applications, Innsbruck.
Chapter 2
Biologically Plausible Artificial Neural Networks
h““p://dx.doi.org/10.5772/54177
1. Introduction
Artificial Neural Networks (ANNs) are based on an abstract and simplified view of the
neuron. Artificial neurons are connected and arranged in layers to form large networks,
where learning and connections determine the network function. Connections can be formed
through learning and do not need to be ’programmed.’ Recent ANN models lack many
physiological properties of the neuron, because they are more oriented to computational
performance than to biological credibility [41].
According to the fifth edition of Gordon Shepherd's book, The Synaptic Organization of the Brain,
“information processing depends not only on anatomical substrates of synaptic circuits, but
also on the electrophysiological properties of neurons” [51]. In the literature of dynamical
systems, it is widely believed that knowing the electrical currents of nerve cells is sufficient
to determine what the cell is doing and why. Indeed, this somewhat contradicts the
observation that cells that have similar currents may exhibit different behaviors. In
the neuroscience community, this fact was ignored until recently, when the difference in
behavior was shown to be due to different mechanisms of excitability bifurcation [35].
A bifurcation of a dynamical system is a qualitative change in its dynamics produced by
varying parameters [19].
The type of bifurcation determines the most fundamental computational properties of
neurons, such as the class of excitability, the existence or nonexistence of the activation
threshold, all-or-none action potentials (spikes), sub-threshold oscillations, bi-stability of rest
and spiking states, whether the neuron is an integrator or resonator etc. [25].
A biologically inspired connectionist approach should present a neurophysiologically
motivated training algorithm, a bi-directional connectionist architecture, and several other
features, e. g., distributed representations.
© 2013 Rosa; licensee InTech. This is an open access ar“icle dis“rib”“ed ”nder “he “erms of “he Crea“ive
Commons A““rib”“ion License (h““p://crea“ivecommons.org/licenses/by/3.0), which permi“s ”nres“ric“ed ”se,
dis“rib”“ion, and reprod”c“ion in any medi”m, provided “he original work is properly ci“ed.
The McCulloch-Pitts neuron represents a simplified mathematical model for the neuron,
where xi is the i-th binary input and wi is the synaptic (connection) weight associated with
the input xi . The computation occurs in soma (cell body). For a neuron with p inputs:
a = ∑_{i=1}^{p} x_i w_i (1)
with x0 = 1 and w0 = β = −θ, where β is the bias and θ is the activation threshold. See figures 1
and 2. There are p binary inputs in the schema of figure 2. Xi is the i-th input, and Wi is the
connection (synaptic) weight associated with input i. The synaptic weights are real numbers,
because the synapses can inhibit (negative signal) or excite (positive signal) and have different
intensities. The weighted inputs (Xi × Wi ) are summed in the cell body, providing a signal a.
After that, the signal a is input to an activation function ( f ), giving the neuron output.
The activation function can be: (1) hard limiter, (2) threshold logic, and (3) sigmoid, which is
considered the biologically more plausible activation function.
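A minimal sketch of such a unit in Python (the names are ours, and the AND-gate weights in the usage example are a standard textbook illustration, not values from this chapter):

```python
import math

def neuron(x, w, bias, activation="step"):
    """McCulloch-Pitts style unit: a = sum_i x_i * w_i + bias, out = f(a).
    'step' is the hard limiter; 'sigmoid' is the smoother, biologically
    more plausible choice."""
    a = sum(xi * wi for xi, wi in zip(x, w)) + bias
    if activation == "step":
        return 1 if a >= 0 else 0
    return 1.0 / (1.0 + math.exp(-a))   # sigmoid activation

# an AND gate: the unit fires only when both inputs are active
and_gate = lambda x1, x2: neuron([x1, x2], [1.0, 1.0], bias=-1.5)
```

With weights (1, 1) and bias −1.5, the weighted sum crosses the threshold only for the input (1, 1), so the unit computes logical AND.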
Biologically Pla”sible Ar“ificial Ne”ral Ne“works 27
Figure 3. Set of linearly separable points.
Figure 4. Set of non-linearly separable points.
Figure 5. Drawing by Santiago Ramón y Cajal of neurons in the pigeon cerebellum. (A) denotes Purkinje cells, an example of a
multipolar neuron, while (B) denotes granule cells, which are also multipolar [57].
Figure 6. A 3-layer neural network. Notice that there are A + 1 input units, B + 1 hidden units, and C output units. w1 and
w2 are the synaptic weight matrices between input and hidden layers and between hidden and output layers, respectively. The
“extra” neurons in input and hidden layers, labeled 1, represent the presence of bias: the ability of the network to fire even in
the absence of input signal.
1.5. Learning
The Canadian psychologist Donald Hebb established the bases for current connectionist
learning algorithms: “When an axon of cell A is near enough to excite a cell B and repeatedly
or persistently takes part in firing it, some growth process or metabolic change takes place in
one or both cells such that A’s efficiency, as one of the cells firing B, is increased” [21]. Also,
the word “connectionism” appeared for the first time: “The theory is evidently a form of
connectionism, one of the switchboard variety, though it does not deal in direct connections
between afferent and efferent pathways: not an ’S-R’ psychology, if R means a muscular
response. The connections serve rather to establish autonomous central activities, which
then are the basis of further learning” [21].
According to Hebb, knowledge is revealed by associations, that is, the plasticity in the Central
Nervous System (CNS) allows synapses to be created and destroyed. Synaptic weights
change their values, thereby allowing learning, which can occur through internal self-organization:
the encoding of new knowledge and the reinforcement of existing knowledge. How can a
neural substrate for associative learning among world facts be supplied? Hebb proposed a hypothesis:
connections between two nodes highly activated at the same time are reinforced. This kind of
rule is a formalization of the associationist psychology, in which associations are accumulated
among things that happen together. This hypothesis makes it possible to model CNS plasticity,
adapting it to environmental changes through the excitatory and inhibitory strengths of existing
synapses and through network topology. This way, it allows a connectionist network to learn
correlations among facts.
In most cases, connectionist networks learn through synaptic weight change: the weights come
to reveal statistical correlations from the environment. Learning may also happen through network
topology change (in a few models). This is a case of probabilistic reasoning without a
statistical model of the problem. Basically, two learning methods are possible with Hebbian
learning: unsupervised learning and supervised learning. In unsupervised learning there is
no teacher, so the network tries to find regularities in the input patterns. In supervised
learning, the input is associated with the output: if they are equal, learning is called
auto-associative; if they are different, hetero-associative.
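Hebb's rule in its simplest outer-product form can be used for auto-associative storage of bipolar patterns. This is a standard textbook illustration, not code from this chapter; the function names are ours.

```python
import numpy as np

def hebbian_store(patterns):
    """Accumulate Hebbian outer products: the connection between two units
    that are active together is reinforced."""
    dim = len(patterns[0])
    W = np.zeros((dim, dim))
    for p in patterns:
        W += np.outer(p, p)          # dw_ij = post_i * pre_j
    np.fill_diagonal(W, 0.0)         # no self-connections
    return W

def hebbian_recall(W, x):
    """One synchronous recall step: sign of the weighted input sums."""
    return np.sign(W @ x)
```

A stored bipolar pattern is a fixed point of the recall step, which is the associationist idea in miniature: units that fired together now drive each other.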
1.6. Back-propagation
Back-propagation (BP) is a supervised algorithm for multilayer networks. It applies the
generalized delta rule, requiring two passes of computation: (1) activation propagation
(forward pass), and (2) error back propagation (backward pass). Back-propagation works
in the following way: it propagates the activation from input to hidden layer, and from
hidden to output layer; calculates the error for output units, then back propagates the error
to hidden units and then to input units.
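A compact sketch of these two passes on the XOR problem (the layer sizes, learning rate, and epoch count are our illustrative choices, not values from the text):

```python
import numpy as np

def train_xor(epochs=30000, eta=0.5, seed=0):
    """One hidden layer trained with the generalized delta rule:
    forward pass, then back-propagation of the output error."""
    rng = np.random.default_rng(seed)
    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    T = np.array([[0.], [1.], [1.], [0.]])
    ones = np.ones((4, 1))                      # bias units fixed at 1
    W1 = rng.normal(0.0, 1.0, (3, 4))           # 2 inputs + bias -> 4 hidden
    W2 = rng.normal(0.0, 1.0, (5, 1))           # 4 hidden + bias -> 1 output
    sig = lambda a: 1.0 / (1.0 + np.exp(-a))
    for _ in range(epochs):
        Xb = np.hstack([X, ones])
        H = sig(Xb @ W1)                        # forward pass (hidden layer)
        Hb = np.hstack([H, ones])
        Y = sig(Hb @ W2)                        # forward pass (output layer)
        dY = (Y - T) * Y * (1 - Y)              # output-layer error term
        dH = (dY @ W2[:4].T) * H * (1 - H)      # error propagated backward
        W2 -= eta * Hb.T @ dY                   # weight updates
        W1 -= eta * Xb.T @ dH
    return Y
```

The backward pass reuses the forward-pass activations: each hidden unit's error is the weighted sum of the output errors it contributed to, scaled by the derivative of its sigmoid.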
BP has universal approximation power: given a continuous function, there is a
two-layer network (one hidden layer) that can be trained by Back-propagation to
approximate this function as closely as desired. Besides, it is the most widely used algorithm.
Although Back-propagation is a well-known and widely used connectionist training algorithm,
it is computationally expensive (slow), it does not solve large problems satisfactorily, and
sometimes the solution found is a local minimum, a locally minimum value for the error
function.
BP is based on the error back propagation: while stimulus propagates forwardly, the error
(difference between the actual and the desired outputs) propagates backwardly. In the
cerebral cortex, the stimulus generated when a neuron fires crosses the axon towards its end
in order to make a synapse onto another neuron's input. Suppose that BP occurs in the brain;
in this case, the error must have to propagate back from the dendrite of the postsynaptic
neuron to the axon and then to the dendrite of the presynaptic neuron. It sounds unrealistic
and improbable. Synaptic “weights” have to be modified in order to make learning possible,
but certainly not in the way BP does. Weight change must use only local information in the
synapse where it occurs. That’s why BP seems to be so biologically implausible.
2. Dynamical systems
Neurons may be treated as dynamical systems; this is the main result of the Hodgkin-Huxley
model [23]. A dynamical system consists of a set of variables that describe its state and
a law that describes the evolution of the state variables with time [25]. The Hodgkin-Huxley
model is a four-dimensional dynamical system, because its state is determined by the
membrane potential V and the opening (activation) and closing (deactivation) variables
n, m and h of the ion channels for the persistent K+ and transient Na+ currents [1, 27, 28]. The law
of evolution is given by a four-dimensional system of ordinary differential equations (ODEs).
Principles of neurodynamics describe the basis for the development of biologically plausible
models of cognition [30].
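The four-dimensional ODE system can be integrated directly. The sketch below uses a simple forward-Euler scheme with the standard Hodgkin-Huxley parameter values and rate functions found in the literature; the step size and injected current are illustrative assumptions, and this is a sketch rather than a validated simulation:

```python
import math

# Standard HH rate functions (V in mV, time in ms, resting potential near -65 mV).
def an(V): return 0.01 * (V + 55) / (1 - math.exp(-(V + 55) / 10))
def bn(V): return 0.125 * math.exp(-(V + 65) / 80)
def am(V): return 0.1 * (V + 40) / (1 - math.exp(-(V + 40) / 10))
def bm(V): return 4.0 * math.exp(-(V + 65) / 18)
def ah(V): return 0.07 * math.exp(-(V + 65) / 20)
def bh(V): return 1.0 / (1 + math.exp(-(V + 35) / 10))

def simulate(I=10.0, T=50.0, dt=0.01):
    """Forward-Euler integration of the 4-D state (V, n, m, h) under current I."""
    V = -65.0
    n = an(V) / (an(V) + bn(V))   # gating variables start at their steady state
    m = am(V) / (am(V) + bm(V))
    h = ah(V) / (ah(V) + bh(V))
    C, gNa, gK, gL = 1.0, 120.0, 36.0, 0.3
    ENa, EK, EL = 50.0, -77.0, -54.4
    trace = []
    for _ in range(int(T / dt)):
        INa = gNa * m**3 * h * (V - ENa)   # transient Na+ current
        IK = gK * n**4 * (V - EK)          # persistent K+ current
        IL = gL * (V - EL)                 # leak current
        V += dt * (I - INa - IK - IL) / C
        n += dt * (an(V) * (1 - n) - bn(V) * n)
        m += dt * (am(V) * (1 - m) - bm(V) * m)
        h += dt * (ah(V) * (1 - h) - bh(V) * h)
        trace.append(V)
    return trace
```

With a sufficiently strong injected current the membrane potential trace shows the characteristic action-potential upstroke and repolarization driven by the m, h, and n variables described below.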
All variables that describe the neuronal dynamics can be classified into four classes according
to their function and time scale [25]:
1. Membrane potential.
2. Excitation variables, such as the activation of a Na+ current. They are responsible for the
upstroke (lifting) of the action potential.
3. Recovery variables, such as the inactivation of a Na+ current and the activation of a rapid
K+ current. They are responsible for the repolarization (lowering) of the action potential.
Biologically Plausible Artificial Neural Networks
http://dx.doi.org/10.5772/54177
4. Adaptation variables, such as the activation of slow voltage-dependent or Ca2+-dependent
currents. They build prolonged action potentials and can affect excitability over time.
Figure 7. The neuron states: rest (a), excitable (b), and periodic spiking activity (c). At the bottom, the trajectories
of the system are shown, depending on the starting point. Figure taken from [25], available at
http://www.izhikevich.org/publications/dsn.pdf.
2.3. Bifurcations
Apparently, there is an injected current that corresponds to the transition from rest to
continuous spiking, i.e., from the phase portrait of figure 7(b) to that of 7(c). From the point of view
of dynamical systems, this transition corresponds to a bifurcation of the neuron dynamics, that is,
a qualitative change in the phase portrait of the system.
In general, neurons are excitable because they are close to bifurcations from rest to spiking
activity. The type of bifurcation depends on the electrophysiology of the neuron and
determines its excitable properties. Interestingly, although there are millions of different
electrophysiological mechanisms of excitability and spiking, there are only four different
types of equilibrium bifurcation that such a system can undergo. One can understand the
excitable properties of neurons whose currents were not measured and whose models are
not known, by identifying experimentally which of the four bifurcations the rest state of
the neuron undergoes [25].
The four bifurcations are shown in figure 8: saddle-node bifurcation, saddle-node on
invariant circle, sub-critical Andronov-Hopf, and supercritical Andronov-Hopf. In the saddle-node
bifurcation, when the magnitude of the injected current or another bifurcation parameter
changes, a stable equilibrium corresponding to the rest state (black circle) is approached by
an unstable equilibrium (white circle). In the saddle-node bifurcation on an invariant circle, there is an
invariant circle at the moment of bifurcation, which becomes a limit cycle attractor. In the sub-critical
Andronov-Hopf bifurcation, a small unstable limit cycle shrinks to an equilibrium state, which
loses stability. Thus the trajectory deviates from the equilibrium and approaches a limit cycle of
high-amplitude spiking or some other attractor. In the supercritical Andronov-Hopf bifurcation,
the equilibrium state loses stability and gives rise to a small-amplitude limit cycle attractor.
When the magnitude of the injected current increases, the limit cycle amplitude increases
and becomes a complete spiking limit cycle [25].
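A minimal way to see a saddle-node bifurcation at work is the quadratic integrate-and-fire model dV/dt = I + V², a standard reduced neuron model that is not taken from the text and is used here only as an illustration: for I < 0 a stable rest state and an unstable threshold coexist, at I = 0 they coalesce (the saddle-node bifurcation), and for I > 0 no equilibrium remains and the model spikes. A sketch:

```python
import math

def equilibria(I):
    """Equilibria of dV/dt = I + V**2 (quadratic integrate-and-fire)."""
    if I < 0:
        r = math.sqrt(-I)
        return (-r, r)      # stable rest state and unstable threshold
    if I == 0:
        return (0.0,)       # the two equilibria coalesce: saddle-node bifurcation
    return ()               # no rest state left: the neuron spikes

def qif_spikes(I, T=100.0, dt=0.01, v_reset=-1.0, v_peak=30.0):
    """Euler integration with a reset rule: count spikes for injected current I."""
    v, count = v_reset, 0
    for _ in range(int(T / dt)):
        v += dt * (I + v * v)
        if v >= v_peak:     # spike: reset the membrane variable
            v = v_reset
            count += 1
    return count
```

Sweeping I through zero reproduces, in one dimension, the rest-to-spiking transition that the four bifurcation types generalize.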
Figure 8. Geometry of phase portraits of excitable systems near the four bifurcations can exemplify many neurocomputational
properties. Figure taken from [25], available at http://www.izhikevich.org/publications/dsn.pdf.
Table 2. Neuron classification in integrators-resonators/monostable-bistable, according to the rest state bifurcation. Adapted
from [25].
memory or cognitive maps, for materialists and cognitivists. In the pragmatism view there
is no temporary storage of images and no representational map.
The neurons in the brain form dense networks. The balance of excitation and inhibition allows
them to have intrinsic oscillatory activity and overall amplitude modulation (AM) [10, 55].
These AM patterns are expressions of non-linear chaos, not merely a summation of linear
dendritic and action potentials. AM patterns create attractor basins and landscapes. In the
neurodynamical model every neuron participates, to some extent, in every experience and
every behavior, via non-linear chaotic mechanisms [10].
The concepts of non-linear chaotic neurodynamics are of fundamental importance to nervous
system research. They are relevant to our understanding of the workings of the normal
brain [55].
Figure 9. Typical neuron showing the dendrites (input), the soma (cell body), the axon (output), the trigger zone, and the
direction of the action potential. Notice that letters “E” represent the pair of extracellular electrodes. Adapted from [45]
and [10].
In single neurons, microscopic pulse frequencies and wave amplitudes are measured,
while in populations, macroscopic pulse and wave densities are measured. The neuron
is microscopic and the ensemble is mesoscopic. The flow of current inside the neuron is
revealed by a change in the membrane potential, measured with an electrode inside the
cell body, evaluating the dendritic wave state variable of the single neuron. Recall that
extracellular electrodes are placed outside the neuron (see the Es in figure 9), so the cortical
potential provided by the sum of dendritic currents in the neighborhood is measured. The same
currents produce the membrane (intracellular) and cortical (extracellular) potentials, giving
two views of neural activity: the former microscopic and the latter mesoscopic [10].
Cortical neurons, because of their synaptic interactions, form neuron populations.
Microscopic pulse and wave state variables are used to describe the activity of the single
neurons that contribute to the population, and mesoscopic state variables (also pulse and
wave) are used to describe the collective activities to which neurons give rise. Mass activity in the
brain is described by a pulse density, instead of a pulse frequency. This is done by recording
from outside the cell the firing of pulses of many neurons simultaneously. The same current
that controls the firing of neurons is measured by the EEG, which does not allow individual
contributions to be distinguished. Fortunately, this is not necessary.
A population is a collection of neurons in a neighborhood, corresponding to a cortical
column, which represents dynamical patterns of activity. The average pulse density in a
population can never approach the peak pulse frequencies of single neurons. The activity of
neighborhoods in the center of the dendritic sigmoid curve is very near linear. This simplifies
the description of populations. Neuron populations are similar to mesoscopic ensembles in
many complex systems [10]. The behavior of the microscopic elements is constrained by the
embedding ensemble, and it cannot be understood outside a mesoscopic and macroscopic
view.
The collective action of neurons forms activity patterns that go beyond the cellular level and
approach the organism level. The formation of mesoscopic states is the first step toward that. This
way, the activity level is decided by the population, not by individuals [10]. The population
is semi-autonomous. It has a point attractor, returning to the same level after its release.
The state space of the neuron population is defined by the range of amplitudes that its pulse
and wave densities can take.
Figure 10. Representation of (b) KI and (c) KII sets by networks of (a) KO sets. Available at [9].
The advantages of KIII pattern classifiers over conventional artificial neural networks are the small
number of training examples needed, convergence to an attractor in a single step, and a geometric
(rather than linear) increase in the number of classes with the number of nodes. The
disadvantage is the increased computational time needed to solve the ordinary differential
equations numerically.
The Katchalsky K-models use a set of ordinary differential equations with distributed
parameters to describe the hierarchy of neuron populations beginning from micro-columns
to hemispheres [31]. In relation to the standard KV, K-sets provide a platform for conducting
analyses of the unified actions of the neocortex in the creation and control of intentional and
cognitive behaviors [13].
2.6. Neuropercolation
Neuropercolation is a family of stochastic models based on the mathematical theory of
probabilistic cellular automata on lattices and random graphs, motivated by the structural
and dynamical properties of neuron populations. The existence of phase transitions has been
demonstrated both in discrete and continuous state space models, i.e., in specific probabilistic
cellular automata and percolation models. Neuropercolation extends the concept of phase
transitions for large interactive populations of nerve cells [31].
Basic bootstrap percolation [50] has the following properties: (1) it is a deterministic
process based on a random initialization, and (2) the model always progresses in one direction,
from inactive to active states, and never otherwise. Under these conditions, these
mathematical models exhibit phase transitions with respect to the initialization probability p.
Neuropercolation models develop neurobiologically motivated generalizations of bootstrap
percolation [31].
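The two properties above can be reproduced with a small lattice simulation. The sketch below implements plain bootstrap percolation with an activation threshold of 2 on an n × n grid; the grid size, threshold, and probabilities are illustrative assumptions, not parameters from the text:

```python
import random

def bootstrap_percolation(n=30, p=0.3, threshold=2, seed=0):
    """Deterministic bootstrap dynamics after a random initialization with
    probability p: an inactive cell becomes active when at least `threshold`
    of its 4 neighbours are active; active cells never deactivate (the process
    is monotone, progressing only from inactive to active). Returns the final
    fraction of active cells."""
    rng = random.Random(seed)
    active = [[rng.random() < p for _ in range(n)] for _ in range(n)]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            for j in range(n):
                if not active[i][j]:
                    nb = sum(active[x][y]
                             for x, y in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
                             if 0 <= x < n and 0 <= y < n)
                    if nb >= threshold:
                        active[i][j] = True
                        changed = True
    return sum(map(sum, active)) / (n * n)
```

Running it for small versus large p shows the sharp dependence on the initialization probability: sparse seeds mostly stall, while dense seeds cascade to (near-)full activation.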
they are [31]. It is said that connectivity and dynamics are scale-free [13, 14], which means that
the dynamics of the cortex are size independent, such that the brains of mice, men, elephants
and whales work in the same way [17].
Scale-free dynamics of the neocortex are characterized by self-similarity of patterns of
synaptic connectivity and spatio-temporal neural activity, seen in power law distributions
of structural and functional parameters and in rapid state transitions between levels of the
hierarchy [15].
1 [33] employs the terms "minus" and "plus" phases to designate the expectation and outcome phases, respectively, in his
GeneRec algorithm.
2 The superscript p is used to indicate that this signal refers to the previous cycle.
Figure 11. The expectation phase.
Figure 12. The outcome phase.
inputs and previous output stimuli o^p (the sum of the bottom-up and top-down propagations,
through the sigmoid logistic activation function σ). Then, these hidden signals propagate to
the output layer γ (step 4), and an actual output o is obtained (step 5) for each and every
one of the C output units, through the propagation of the hidden expectation activation to
the output layer (Eq. (3)) [37]. w^h_{ij} are the connection (synaptic) weights between input (i)
and hidden (j) units, and w^o_{jk} are the connection (synaptic) weights between hidden (j) and
output (k) units³.

h^e_j = σ( Σ_{i=0}^{A} w^h_{ij} x_i + Σ_{k=1}^{C} w^o_{jk} o^p_k ),  1 ≤ j ≤ B    (2)
In the outcome phase (figure 12), input x is presented to the input layer α again; there is
propagation to the hidden layer β (bottom-up) (step 1 in figure 12). After this, the expected output
y (step 2) is presented to the output layer and propagated back to the hidden layer β
(top-down) (step 3), and a hidden outcome activation (h^o) is generated, based on the inputs
and on the expected outputs (Eq. (4)). For the other words, presented one at a time, the same
procedure (expectation phase first, then outcome phase) is repeated [37]. Recall that the
architecture is bi-directional, so it is possible for the stimuli to propagate either forward or
backward.
3 i, j, and k are the indexes for the input (A), hidden (B), and output (C) units, respectively. The input (α) and hidden (β)
layers have an extra unit (index 0) used for simulating the presence of a bias [20]. This extra unit is absent from the
output (γ) layer. That is the reason i and j range from 0 to the number of units in the layer, while k ranges from 1.
x_0, h^e_0, and h^o_0 are set to +1. w^h_{0j} is the bias of the hidden neuron j and w^o_{0k} is the bias of the output neuron k.
In order to make learning possible, the synaptic weights are updated through the delta rule⁴
(Eqs. (5) and (6)), considering only the local information made available by the synapse.
The learning rate η used in the algorithm is considered an important variable during the
experiments [20].
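A minimal sketch of the two phases and a GeneRec-style local update is given below, assuming the standard form of the rule, Δw = η · (presynaptic activity) · (outcome − expectation), with biases omitted and a single training pattern; the sizes, learning rate, and data are illustrative, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

A, B, C = 3, 4, 2                      # input, hidden, and output layer sizes
Wh = rng.normal(0.0, 0.5, (A, B))      # w^h: input -> hidden
Wo = rng.normal(0.0, 0.5, (B, C))      # w^o: hidden <-> output (used both ways)
eta = 0.2

x = np.array([1.0, 0.0, 1.0])          # input pattern
y = np.array([0.0, 1.0])               # expected (desired) output
o = np.zeros(C)                        # previous-cycle output o^p
errs = []

for _ in range(1500):
    # expectation (minus) phase: bottom-up input plus top-down previous output
    he = sigmoid(x @ Wh + o @ Wo.T)
    o = sigmoid(he @ Wo)
    errs.append(float(np.mean((y - o) ** 2)))
    # outcome (plus) phase: the expected output is clamped and propagated back
    ho = sigmoid(x @ Wh + y @ Wo.T)
    # local updates: presynaptic activity times the phase difference at the synapse
    Wh += eta * np.outer(x, ho - he)
    Wo += eta * np.outer(he, y - o)
```

Unlike BP, each weight change here uses only quantities available on the two sides of the synapse in the two phases, which is the point of the biological-plausibility argument.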
Figure 13 displays a simple application to digit learning which compares the BP and GeneRec
(GR) algorithms.
Other applications were proposed using similar allegedly biologically inspired architectures and
algorithms [34, 37–40, 42–44, 49].
4 The learning equations are essentially the delta rule (Widrow-Hoff rule), which is basically error correction: "The
adjustment made to a synaptic weight of a neuron is proportional to the product of the error signal and the input
signal of the synapse in question." ([20], p. 53).
N = {{w}, θ, g, T, R, C } (7)
where {w} is the set of synaptic weights, θ is the activation threshold, g is the activation function,
and T, R, and C are the transmitter, receptor, and controller labels.
θ, g, T, R, and C encode the genetic information; T, R, and C are labels absent
in other models. This proposal follows Ramón y Cajal's principle of connectional specificity,
which states that each neuron is connected to another neuron not only in relation to {w}, θ,
and g, but also in relation to T, R, and C: neuron i is only connected to neuron j if there is
binding affinity between the T of i and the R of j. Binding affinity means compatible types,
a sufficient amount of substrate, and compatible genes. The combination of T and R results in
C; C can act over other neuron connections.
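The tuple N = ({w}, θ, g, T, R, C) and the binding-affinity test can be sketched as follows. The field names, gene sets, and transmitter names are hypothetical illustrations, and the controller C is left out of the data structure for brevity; this is a sketch of the connection rule, not the model's specification:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Neuron:
    """N = ({w}, theta, g, T, R, C): weights, threshold, activation function,
    plus transmitter (T), receptor (R), and gene labels (controller C omitted)."""
    theta: float
    T: str                       # type of transmitter emitted
    R: str                       # type of receptor exposed
    genes: frozenset             # gene names, expression assumed "on"
    substrate: int               # remaining amount of substrate
    g: Callable[[float], float] = lambda s: 1.0 if s >= 0.0 else 0.0
    w: dict = field(default_factory=dict)

def can_connect(pre: "Neuron", post: "Neuron") -> bool:
    # binding affinity: compatible types, enough substrate, compatible genes
    return (pre.T == post.R
            and pre.substrate > 0
            and bool(pre.genes & post.genes))

def connect(pre: "Neuron", post: "Neuron", weight: float = 0.1) -> bool:
    if can_connect(pre, post):
        pre.w[id(post)] = weight   # neuron i connects to neuron j
        pre.substrate -= 1         # connecting consumes substrate
        return True
    return False
```

The three clauses of `can_connect` mirror the three conditions of binding affinity named in the text: compatible types, enough substrate, and compatible genes.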
The ordinary biological neuron presents many dendrites, usually branched, which receive
information from other neurons, and an axon, which transmits the processed information, usually
by propagation of an action potential. The axon divides into several branches and makes
synapses onto the dendrites and cell bodies of other neurons (see figure 14). Chemical
synapses predominate in the cerebral cortex, and the release of the transmitter substance occurs
in active zones, inside presynaptic terminals. Certain chemical synapses lack active zones,
resulting in slower and more diffuse synaptic actions between cells. The combination of a
neurotransmitter and a receptor makes the postsynaptic cell release a protein.
Although type I synapses seem to be excitatory and type II synapses inhibitory (see
figure 15), the action of a transmitter in the postsynaptic cell does not depend on the chemical
nature of the neurotransmitter, instead it depends on the properties of the receptors with
which the transmitter binds. In some cases, it is the receptor that determines whether
a synapse is excitatory or inhibitory, and an ion channel will be activated directly by the
transmitter or indirectly through a second messenger.
Neurotransmitters are released by the presynaptic neuron and combine with specific
receptors in the membrane of the postsynaptic neuron. The combination of the neurotransmitter with
Figure 15. Morphological synapses of type A and type B. In the excitatory synapse (type A), neurons contribute to producing impulses
on other cells: asymmetrical membrane specializations, very large synaptic vesicles (50 nm) with packets of neurotransmitters.
In the inhibitory synapse (type B), neurons prevent the release of impulses on other cells: symmetrical membrane specializations,
synaptic vesicles that are smaller and often ellipsoidal or flattened, and a usually smaller contact zone. Figure taken from [45].
Figure 16. An axon-axon synapse [6]. Figure 17. A local potential change [6].
In view of these biological facts, it was decided to model the binding affinities between Ts
and Rs through the labels T and R. The label C represents the role of the "second messenger,"
the effects of the graded potential, and the protein released by the coupling of T and R.
Controller C can modify the binding affinities between neurons by modifying the degrees
of affinity of receptors, the amount of substrate (the amount of transmitters and receptors), and
gene expression, in case of mutation. The degrees of affinity are related to the way receptors
gate ion channels at chemical synapses. Through ion channels, transmitter material enters
the postsynaptic cell: (1) in direct gating, receptors produce relatively fast synaptic actions;
(2) in indirect gating, receptors produce slow synaptic actions; these slower actions often
serve to modulate behavior because they modify the degrees of affinity of receptors.
In addition, modulation can be related to the action of peptides⁵. There are many distinct
peptides, of several types and shapes, that can act as neurotransmitters. Peptides differ
from many conventional transmitters because they "modulate" synaptic function instead of
activating it; they spread slowly and persist for some time, much longer than conventional
transmitters; and, in some cases, they do not act where they are released but at some distant site.
As transmitters, peptides act at very restricted places, display a slow rate of conduction, and
do not sustain high frequencies of impulses. As neuromodulators, the excitatory effects of
substance P (a peptide) are very slow to begin and long in duration (more than one
minute), so they cannot cause enough depolarization to excite the cells; the effect is to make
neurons more readily excited by other excitatory inputs, the so-called "neuromodulation."
In the proposed model, C accounts for this function by modifying the degrees of affinity of
receptors.
In biological systems, the modification of the amount of substrate is regulated by acetylcholine
(a neurotransmitter). It spreads over a short distance toward the postsynaptic membrane,
acting at receptor molecules in that membrane, which are enzymatically divided, and part of
it is taken up again for the synthesis of new transmitter. This produces an increase in the
amount of substrate. In this model, C represents the substrate increase by a variable acting over
the initial substrate amount.
Peptides are a second, slower means of communication between neurons, more economical
than using extra neurons. This second messenger, besides altering the affinities between
transmitters and receptors, can regulate gene expression, achieving synaptic transmission
with long-lasting consequences. In this model, this is achieved by the modification of a variable
for gene expression, so mutation can be accounted for.
• number of layers;
• number of neurons in each layer;
• initial amount of substrate (transmitters and receptors) in each layer; and
• genetics of each layer:
• type of transmitter and its degree of affinity,
• type of receptor and its degree of affinity, and
genes (name and gene expression).
For the evaluation of controllers and how they act, the parameters are:
The specifications stated above lead to an ANN with some distinctive characteristics: (1)
each neuron has a genetic code, which is a set of genes plus a gene expression controller;
(2) the controller can cause mutation, because it can regulate gene expression; (3) the
substrate (amount of transmitter and receptor) is defined by layer, but it is limited, so some
postsynaptic neurons are not activated: this way, the network favors clustering.
Also, the substrate increase is related to the gene specified in the controller, because the
synthesis of new transmitter occurs in the presynaptic terminal (origin gene) [36]. The
modification of the genetic code (that is, mutation), as well as the modification of the degree of
affinity of receptors, however, is related to the target gene. The reason is that the modulation
function of the controller is better explained at some distance from the emission of the
neurotransmitter, therefore at the target.
Table 3. The data set for a five-layer network. Adapted from [36].
In figure 18, one can notice that every unit in layer 1 (the input layer) is linked to the first nine
units in layer 2 (the first hidden layer). The reason why not every unit in layer 2 is connected to
layer 1, although the receptor of layer 2 has the same type as the transmitter of layer 1, is that
the amount of substrate in layer 1 is eight units. This means that, in principle, each layer-1
unit is able to connect to at most eight units. But controller 1, from layer 1 to 2, incremented
the amount of substrate of the origin layer (layer 1) by 1. The result is that each layer-1 unit
can link to nine units in layer 2. Observe that from layer 2 to layer 3 (the second hidden layer)
only four layer-2 units are connected to layer 3, also because of the amount of substrate of
layer 3, which is 4.
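The substrate arithmetic in this walk-through can be captured by a small helper. This is a hypothetical simplification that models only the origin-side substrate cap, type compatibility, and gene compatibility (the target-side substrate cap and controller gene effects described above are left out); the function and parameter names are not part of the model:

```python
def allowed_links(pre_substrate, pre_T, post_R, pre_genes, post_genes,
                  post_units, substrate_increment=0):
    """How many postsynaptic units one presynaptic unit may contact.
    Connections require type compatibility (T matches R) and gene compatibility,
    and are capped by the origin layer's (possibly controller-incremented)
    amount of substrate."""
    if pre_T != post_R or not (set(pre_genes) & set(post_genes)):
        return 0
    return min(post_units, pre_substrate + substrate_increment)

# Layer 1 -> layer 2 in the text: substrate 8, controller adds 1 -> nine targets.
n12 = allowed_links(8, "t1", "t1", {"g"}, {"g"}, post_units=12, substrate_increment=1)
```

With incompatible types or disjoint gene sets the budget collapses to zero, matching the "no connection without binding affinity" rule.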
As a result of the compatibility of layer-2 transmitter and layer-5 receptor, and the existence
of remaining unused substrate of layer 2, one could expect that the first two units in
layer 2 should connect to the only unit in layer 5 (the output unit). However, this does
Figure 18. A five-layer neural network for the data set in table 3. At the bottom of the figure is layer 1 (the input layer) and at
the top is layer 5 (the output layer). Between them, there are three hidden layers (layers 2 to 4). Figure taken from [36].
not occur because their genes are not compatible. Although gene compatibility exists, in
principle, between layers 1 and 4, their units do not connect to each other because there is
no remaining substrate in layer 1 and because controller 1 between layers 1 and 4 modified
the gene expression of layer 4, making them incompatible. The remaining controller has
the effect of modifying the degrees of affinity of the receptors in layer 3 (the target). Consequently,
the connections between layers 2 and 3 become weakened (represented by dotted lines).
Notice that, in order to allow connections, in addition to a sufficient amount
of substrate, the genes and the types of transmitters and receptors of each layer must be
compatible.
Although the architecture shown in figure 18 is feed-forward, recurrence, or re-entrance,
is permitted in this model. This kind of feedback goes along with Edelman and Tononi's
"dynamic core" notion [7]. This hypothesis suggests that there are neuronal
groups underlying conscious experience, the dynamic core, which is highly distributed and
integrated through a network of reentrant connections.
4. Conclusions
Current models of ANN are in debt to human brain physiology. Because of their
mathematical simplicity, they lack several biological features of the cerebral cortex. Also,
instead of the individual behavior of neurons, mesoscopic information is privileged.
The mesoscopic level of the brain can be described adequately by dynamical system
theory (attractor states and cycles). The EEG waves reflect the existence of cycles in the brain's
electric field. The objective here was to present biologically plausible ANN models, closer to
human brain capacity. In the proposed model, still at the microscopic level of analysis,
the possibility of connections between neurons is related not only to synaptic weights,
activation threshold, and activation function, but also to labels that embody the binding
affinities between transmitters and receptors. This type of ANN would be closer to human
evolutionary capacity; that is, it would represent a genetically well-suited model of the brain.
The hypothesis of the "dynamic core" [7] is also contemplated; that is, the model allows
reentrancy in its architecture connections.
Acknowledgements
I am grateful to my students, who have collaborated with me in this subject for the last ten
years.
Author details
João Luís Garcia Rosa
Bioinspired Computing Laboratory (BioCom), Department of Computer Science, University
of São Paulo at São Carlos, Brazil
5. References
[1] B. Aguera y Arcas, A. L. Fairhall, and W. Bialek, “Computation in a Single Neuron:
Hodgkin and Huxley Revisited,” Neural Computation 15, 1715–1749 (2003), MIT Press.
[2] A. Clark, Mindware: An introduction to the philosophy of cognitive science. Oxford, Oxford
University Press, 2001.
[3] CLION - Center for Large-Scale Integration and Optimization Networks, Neurodynamics
of Brain & Behavior, FedEx Institute of Technology, University of Memphis, Memphis,
TN, USA. http://clion.memphis.edu/laboratories/cnd/nbb/.
[5] F. H. C. Crick, “The Recent Excitement about Neural Networks,” Nature 337 (1989)
pp. 129–132.
[6] F. Crick and C. Asanuma, “Certain Aspects of the Anatomy and Physiology of the
Cerebral Cortex,” in J. L. McClelland and D. E. Rumelhart (eds.), Parallel Distributed
Processing, Vol. 2, Cambridge, Massachusetts - London, England, The MIT Press, 1986.
[9] W. J. Freeman, Mass action in the nervous system - Examination of the Neurophysiological Basis
of Adaptive Behavior through the EEG, Academic Press, New York San Francisco London
1975. Available at http://sulcus.berkeley.edu/.
[10] W. J. Freeman, How Brains Make Up Their Minds, Weidenfeld & Nicolson, London, 1999.
[12] W. J. Freeman, “How and Why Brains Create Meaning from Sensory Information,”
International Journal of Bifurcation & Chaos 14: 513–530, 2004.
[14] W. J. Freeman, “Deep analysis of perception through dynamic structures that emerge in
cortical activity from self-regulated noise,” Cogn Neurodyn (2009) 3:105–116.
[17] W. J. Freeman and R. Kozma, “Freeman’s mass action,” Scholarpedia 5(1):8040. http:
//www.scholarpedia.org/article/Freeman’s_mass_action, 2010.
[24] S. Hoover, “Kozma’s research is brain wave of the future,” Update - The newsletter for the
University of Memphis, http://www.memphis.edu/update/sep09/kozma.php, 2009.
[26] A. K. Jain, J. Mao, and K. M. Mohiuddin, “Artificial Neural Networks: A Tutorial,” IEEE
Computer, March 1996, pp. 31–44.
[29] A. K. Katchalsky, V. Rowland and R. Blumenthal, Dynamic patterns of brain cell assemblies,
MIT Press, 1974.
[32] W. S. McCulloch and W. Pitts. “A logical calculus of the ideas immanent in nervous
activity.” Bulletin of Mathematical Biophysics, 5, 115-133, 1943.
[36] J. L. G. Rosa, “An Artificial Neural Network Model Based on Neuroscience: Looking
Closely at the Brain,” in V. Kurková, N. C. Steele, R. Neruda, and M. Kárný (Eds.),
Artificial Neural Nets and Genetic Algorithms - Proceedings of the International Conference
[38] J. L. G. Rosa, “A Biologically Motivated Connectionist System for Predicting the Next
Word in Natural Language Sentences,” in Proceedings of the 2002 IEEE International
Conference on Systems, Man, and Cybernetics - IEEE-SMC’02 - Volume 4. 06-09 October
2002, Hammamet, Tunisia.
[50] R. H. Schonmann, “On the Behavior of Some Cellular Automata Related to Bootstrap
Percolation,” The Annals of Probability, Vol. 20, No. 1 (Jan., 1992), pp. 174–193.
[51] G. M. Shepherd, The synaptic organization of the brain, fifth edition, Oxford University
Press, USA, 2003.
[54] A. B. Silva and J. L. G. Rosa, “Advances on Criteria for Biological Plausibility in Artificial
Neural Networks: Think of Learning Processes,” Proceedings of IJCNN 2011 - International
Joint Conference on Neural Networks, San Jose, California, July 31 - August 5, 2011, pp.
1394-1401.
[55] J. R. Smythies, Book review on “How Brains Make up Their Minds. By W. J. Freeman.”
Psychological Medicine, 2001, 31, 373–376. 2001 Cambridge University Press.
[56] O. Sporns, “Network Analysis, Complexity, and Brain Function,” Complexity, vol. 8, no.
1, pp. 56–60. Wiley Periodicals, Inc., 2003.
Weight Changes for Learning Mechanisms in Two-Term Back-Propagation Network
http://dx.doi.org/10.5772/51776
1. Introduction
The assignment of value to the weight commonly brings the major impact towards the
learning behaviour of the network. If the algorithm successfully computes the correct value
of the weight, it can converge faster to the solution; otherwise, the convergence might be
slower or it might cause divergence. To prevent this problem occurring, the step of gradient
descent is controlled by a parameter called the learning rate. This parameter determines
the length of the step taken by the gradient to move along the error surface. Moreover, to avoid
the oscillation problem that might happen around a steep valley, a fraction of the last
weight update is added to the current weight update, and its magnitude is adjusted by a
parameter called momentum. The inclusion of these parameters aims to produce a correct
value of the weight update, which will later be used to update the new weight. The correct value
of the weight update can be seen in two aspects: sign and magnitude. If both aspects are properly
chosen and assigned to the weight, the learning process can be optimized and the solution
is not hard to reach. Owing to the usefulness of two-term BP and the adaptive learning method
in training the network, this study examines the weight sign changes with respect to
gradient descent in BP networks, with and without the adaptive learning method.
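The two-term update described above, Δw(t) = −η·∇E + α·Δw(t−1), can be sketched on a simple quadratic error surface. The surface and the values of η and α are illustrative; with these values, plain gradient descent diverges along the steep direction of the valley, while the momentum term damps the oscillation and keeps the iteration stable:

```python
import numpy as np

def train_two_term_bp(eta=0.1, alpha=0.9, steps=200):
    """Two-term update dw(t) = -eta * grad + alpha * dw(t-1) on the quadratic
    error surface E(w) = 0.5 * (w0**2 + 25 * w1**2), a narrow steep valley.
    Returns the final error value."""
    w = np.array([5.0, 5.0])
    dw = np.zeros(2)
    for _ in range(steps):
        grad = np.array([w[0], 25.0 * w[1]])   # dE/dw
        dw = -eta * grad + alpha * dw          # learning-rate term + momentum term
        w = w + dw
    return 0.5 * (w[0] ** 2 + 25.0 * w[1] ** 2)
```

Along the steep axis the effective step η·25 exceeds the stability limit of plain gradient descent, so the α = 0 run oscillates with growing amplitude, whereas the momentum term averages successive opposite-signed updates and converges.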
© 2013 Mariyam Shamsuddin et al.; licensee InTech. This is an open access article distributed under the terms
of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits
unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
2. Related work
The gradient descent technique is expected to bring the network closer to the minimum error
without taking the convergence rate of the network for granted. It is meant to generate a
slope that moves downwards along the error surface to search for the minimum point. During
its movement, the points passed by the slope throughout the iterations affect the magnitude
of the weight update and its direction. Later, the updated weight is used for
training the network at each epoch until the predefined number of iterations is reached or the
minimum error has been attained. Despite the general success of BP in learning, several major
deficiencies still need to be solved. The most notable deficiencies, according to reference [ ],
are the existence of temporary local minima due to the saturation behaviour of the activation
function; the slow rates of convergence due to the existence of local minima; and the
relatively slow convergence for a network with more than one hidden layer. These
drawbacks are also acknowledged by several scholars [ - ].
The error function plays a vital role in the learning process of the two-term BP algorithm. Aside
from calculating the actual error from the training, it assists the algorithm in reaching the
minimum point where the solution converges, by calculating its gradient and back-propagating
it to the network for weight adjustment and error minimization. Hence, the problem of
being trapped in local minima can be avoided and the desired solution can be achieved.
The movement of the gradient on the error surface may vary in terms of its direction and
magnitude. The sign of the gradient indicates the direction it moves, and the magnitude of
the gradient indicates the step size taken by the gradient to move on the error surface. This
temporal behaviour of the gradient provides insight about conditions on the error surface.
This information is then used to perform a proper adjustment of the weight, which is
carried out by implementing a weight adjustment method. Once the weight is properly
adjusted, the learning process takes only a short time to converge to the solution. Hence, the
problem faced by two-term BP is solved. The term proper adjustment of the weight here refers
to the proper assignment of magnitude and sign to the weight, since both of these factors
affect the internal learning process of the network.
Aside from the gradient, some other factors play an important role in the assignment of a proper change to the weight, specifically in terms of its sign. These factors are the learning parameters, namely the learning rate and momentum. The learning rate and momentum parameters hold an important role in the two-term BP training process: respectively, they control the step size taken by the gradient along the error surface and speed up the learning process. In a conventional BP algorithm, the initial value of both parameters is critical, since it is retained throughout all the learning iterations. Assigning a fixed value to both parameters is not always a good idea, bearing in mind that the error surface is rarely, if ever, flat. Thus, the step size taken by the gradient cannot be the same over time; it needs to take into account the characteristics of the error surface and the direction of movement. This is a very important condition for generating the proper value and direction of the weight. If this can be achieved, the network can reach the minimum in a shorter time and the desired output is obtained.
Setting a larger value for the learning rate may assist the network in converging faster. However, owing to the larger steps taken by the gradient, oscillation may occur and cause divergence or, in some cases, overshoot the minimum.

Weight Changes for Learning Mechanisms in Two-Term Back-Propagation Network 55
http://dx.doi.org/10.5772/51776

On the other hand, if a smaller value is assigned to the learning rate, the gradient will move in the correct direction and gradually reach the minimum point, but the convergence rate is compromised owing to the smaller steps taken by the gradient. The momentum, in turn, is used to overcome the oscillation problem. It pushes the gradient up the steep valley so as to escape oscillation; otherwise the gradient will bounce from one side of the surface to the other. Under this condition, the direction of the gradient changes rapidly and may cause divergence. As a result, the computed weight update value and direction will be incorrect, which affects the learning process. It is obvious that the use of a fixed parameter value is not efficient. The obvious way to solve this problem is to implement an adaptive learning method that produces dynamic values of the learning parameters.
In addition, the fact that the two-term BP algorithm uses a uniform learning rate may lead to overshooting of minima and slow movement on shallow surfaces. This phenomenon may cause the algorithm to diverge, or to converge very slowly to the solution, owing to the different step sizes taken by each slope moving in different directions. Reference [ ] proposed a solution to these matters, called the Delta-Bar-Delta (DBD) algorithm. The method focuses on setting a learning rate value for each weight connection, so that each connection has its own learning rate. However, this method still suffers from certain drawbacks. The first is that it is not efficient when used together with momentum, since this sometimes causes divergence. The second is that the assignment of the increment parameter causes a drastic increment of the learning rate, so that the exponential decrement does not have a significant impact in overcoming a wild jump. For these reasons, reference [ ] proposed an improved DBD algorithm called the Extended Delta-Bar-Delta (EDBD) algorithm. EDBD implements a notion similar to DBD and adds some modifications to alleviate the drawbacks faced by DBD, demonstrating satisfactory learning performance. Unlike DBD, EDBD provides a way to adjust both the learning rate and the momentum for each individual weight connection, and its learning performance is thus superior to DBD. EDBD is one of many adaptive learning methods proposed to improve the performance of standard BP. The author proved that the EDBD algorithm outperforms the DBD algorithm; its satisfactory performance shows that the algorithm succeeds in generating proper weights with the inclusion of momentum.
Reference [ ] proposed the batch gradient descent method with momentum, combining momentum with the batch gradient descent algorithm. No single sample in the network has an immediate effect; instead, the method waits until all the input samples have been presented, accumulates the sum of all errors, and only then modifies the weights according to the total error to enhance the convergence rate. The advantages of this method are faster speed, fewer iterations and smoother convergence. Reference [ ] presented a new learning algorithm for a feed-forward neural network based on the two-term BP method using an adaptive learning rate. The adaptation is based on an error criterion where the error is measured on the validation set instead of the training set, to dynamically adjust the global learning rate. The proposed algorithm consists of two phases. In the first phase, the learning rate is adjusted after each iteration so that the minimum error is quickly attained. In the second phase, the search is refined by repeatedly reverting to previous weight configurations and decreasing the global learning rate. The experimental results show that the proposed method converges quickly and outperforms two-term BP in terms of generalization when the size of the training set is reduced. Reference [ ] improved the convergence rate of the two-term BP model with some modifications to the learning strategies; the experimental results show that the modified BP improved considerably compared with standard BP.
In [ ], a fast BP learning method was proposed using optimization of the learning rate for pulsed neural networks (PNN). The proposed method optimizes the learning rate so as to speed up learning in every learning cycle, during both connection weight learning and attenuation rate learning, for the purpose of accelerating BP learning in a PNN. The authors devised an error BP learning method using optimization of the learning rate. The results showed that the average number of learning cycles required in all of the problems was reduced by optimizing the learning rate during connection weight learning, indicating the validity of the proposed method.
In [ ], the two-term BP algorithm was improved so that it can overcome the problems of slow learning and easy trapping in local minima by adopting an adaptive algorithm. The method divides the whole training process into many learning phases, whose effects indicate the direction of the network globally. Different ranges of effect values correspond to different learning models, and the next learning phase adjusts the learning model based on the evaluated effects of the previous learning phase.
We can infer from the literature that the evolution of improvements to BP learning over the years still points towards openness to contributions enhancing the BP algorithm in training and learning the network, especially in terms of weight adjustments. The modification of the weight adjustment aims to update the weight with the correct value to obtain a better convergence rate and minimum error. This can be seen from the various studies that explicitly control the proper sign and magnitude of the weight.
Two-Term Back-Propagation (BP) Network
The arθhiteθture of two-term ”P is deliηerately ηuilt in suθh away that it resemηles the struθ‐
ture of neuron. It θontains several layers where eaθh layer interaθts with the upper layer
θonneθted to it ηy θonneθtion link. Conneθtion link is speθifiθally θonneθting the nodes with
in the layers with the nodes in the adjaθent layer that ηuilds a highly inter θonneθted net‐
work. The ηottom-most layer, θalled the input layer, will aθθept and proθess the input and
pass the output to the next adjaθent layer, θalled the hidden layer. The general arθhiteθture
of “NN is depiθted in Figure [ ].
where,
The input layer has M neurons and input veθtor X = [ x , x ,…, xM ] and the output layer
has L neurons and has output veθtor Y=[ y , y ,…, yL ] while the hidden layer has Q
neurons.
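The layer sizes above determine the shapes of the two sets of connection-link weights. A minimal sketch of this setup, with illustrative values for M, Q and L (assumed for this example, not given in the chapter):

```python
import random

# Illustrative layer sizes (assumed values, not from the chapter):
M, Q, L = 4, 3, 2   # input, hidden and output neurons

random.seed(0)

# W_ih[i][j]: weight on the link between input node i and hidden node j.
# W_ho[j][k]: weight on the link between hidden node j and output node k.
W_ih = [[random.uniform(-0.5, 0.5) for _ in range(Q)] for _ in range(M)]
W_ho = [[random.uniform(-0.5, 0.5) for _ in range(L)] for _ in range(Q)]

x = [0.1, 0.4, 0.7, 0.2]   # input vector X = [x_1, ..., x_M]
```

The small random initial weights are a common choice; the chapter only requires that each link carry some initial weight.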
The output reθeived from the input layer will ηe proθessed and θomputed mathematiθally in
the hidden layer and the output will ηe passed to the output layer. In addition, ”P θan have
more than one hidden layer ηut it θreates θomplexity in training the network. One reason for
this θomplexity is the existenθe of loθal minima θompared with the one with one hidden lay‐
er. The learning depends greatly on the initial weight θhoiθe to lead to θonvergenθe.
Nodes in BP can be thought of as units that process input to produce output. The output produced by a node is affected largely by the weight associated with each link. In this process, each input is multiplied by the weight associated with the connection link connected to the node, and the bias is added. The weight determines how strongly the output is pushed towards the desired output: the greater the weight, the greater the chance of the output being closer to the desired output. The relationship between the weights, connection links and layers is shown in the figure in reference [ ].
Once the output arrives at the hidden layer, it is summed up to create a net; this is called a linear combination. The net is fed to the activation function and the result is passed to the output layer. To ensure that learning takes place continuously, in the sense that the derivative of the error function can keep moving downhill on the error surface in search of the minimum, the activation function needs to be a continuously differentiable function. The most commonly used activation function is the sigmoid function, which limits the output to between 0 and 1.
net_j = Σ_i (W_ij · O_i) + θ_j

O_j = 1 / (1 + e^(−net_j))
where W_ij is the weight associated with the connection link between node i in the input layer and node j in the hidden layer, and θ_j is the bias associated with each connection link between input layer i and hidden layer j.

Other commonly used activation functions include the logarithmic, tangent and hyperbolic tangent functions, among many others.
The output generated by the activation function is forwarded to the output layer. As with the input and hidden layers, the connection link that connects the hidden and output layers is associated with a weight, and the activated output received from the hidden layer is multiplied by that weight. Depending on the application, the number of nodes in the output layer may vary. In a classification problem, the output layer may consist of a single node that produces a result of either yes or no, or a binary number. All the weighted outputs are added together and this value is fed to the activation function to generate the final output. The Mean Square Error is used as the error function, calculated at each iteration from the target output and the final computed output. If the error is still larger than the predefined acceptable error value, the training process continues to the next iteration.
net_k = Σ_j (W_jk · O_j) + θ_k

O_k = 1 / (1 + e^(−net_k))

E = (1/2) Σ_k (t_k − O_k)²
where W_jk is the weight associated with the connection link between hidden layer j and output layer k, and θ_k is the bias associated with each connection link between the hidden layer and output node k.
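The forward computation described by the equations above can be sketched as follows. The layer sizes, weights, input and target values here are illustrative assumptions, not values from the chapter:

```python
import math
import random

def sigmoid(net):
    # Limits the output to the open interval (0, 1).
    return 1.0 / (1.0 + math.exp(-net))

random.seed(1)
M, Q, L = 4, 3, 2   # illustrative layer sizes

# W_ih[i][j]: weight W_ij (input i -> hidden j);
# W_ho[j][k]: weight W_jk (hidden j -> output k).
W_ih = [[random.uniform(-0.5, 0.5) for _ in range(Q)] for _ in range(M)]
W_ho = [[random.uniform(-0.5, 0.5) for _ in range(L)] for _ in range(Q)]
b_h = [0.0] * Q   # biases theta_j
b_o = [0.0] * L   # biases theta_k

x = [0.2, 0.7, 0.1, 0.9]   # input vector O_i
t = [1.0, 0.0]             # target output t_k

# net_j = sum_i W_ij * O_i + theta_j ;  O_j = 1 / (1 + e^(-net_j))
O_h = [sigmoid(sum(x[i] * W_ih[i][j] for i in range(M)) + b_h[j])
       for j in range(Q)]
# net_k = sum_j W_jk * O_j + theta_k ;  O_k = 1 / (1 + e^(-net_k))
O_out = [sigmoid(sum(O_h[j] * W_ho[j][k] for j in range(Q)) + b_o[k])
         for k in range(L)]

# E = 1/2 * sum_k (t_k - O_k)^2
E = 0.5 * sum((t[k] - O_out[k]) ** 2 for k in range(L))
```

Every activated output lies strictly between 0 and 1, as the sigmoid requires, and E is the error that the back-propagation step below tries to reduce.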
A large error value obtained at the end of an iteration denotes a deviation of the learning, where the desired output has not yet been achieved. To solve this problem, the derivative of the error function with respect to the weights is computed and back-propagated through the layers to compute the new weight value at each connection link. This algorithm is known as the delta rule, which employs the gradient descent method. The new weight is expected to be a corrected weight that can produce the correct output. For the weight associated with each connection link between output layer k and hidden layer j, the weight incremental value is computed using a weight adjustment equation as follows:
ΔW_kj(t) = −η · (∂E / ∂W_kj) + μ · ΔW_kj(t−1)

where η is the learning rate and μ is the momentum coefficient.
By applying the chain rule, we can simplify the negative derivative of the error function with respect to the weight as follows:
∂E/∂W_kj = (∂E/∂net_k) · (∂net_k/∂W_kj)

∂E/∂W_kj = (∂E/∂net_k) · O_j

∂E/∂net_k = δ_k

∂E/∂W_kj = δ_k · O_j

δ_k = ∂E/∂net_k = (∂E/∂O_k) · (∂O_k/∂net_k)

∂O_k/∂net_k = O_k (1 − O_k)

∂E/∂net_k = (∂E/∂O_k) · O_k (1 − O_k)

∂E/∂O_k = −(t_k − O_k)

δ_k = ∂E/∂net_k = −(t_k − O_k) · O_k (1 − O_k)
Thus, by substituting the expression for δ_k into the weight adjustment equation above, we get the weight adjustment equation for the weight associated with each connection link between output layer k and hidden layer j, with the simplified negative derivative of the error function with respect to the weight:
ΔW_kj(t) = η · (t_k − O_k) · O_k (1 − O_k) · O_j + μ · ΔW_kj(t−1)
On the other side, the error signal is back-propagated to affect the weights between input layer i and hidden layer j. The error signal at hidden layer j can be written as follows:
δ_j = (Σ_k δ_k · W_kj) · O_j (1 − O_j)
Based on the weight adjustment equation and the substitution of the hidden-layer error signal, the weight adjustment equation for the weights associated with each connection link between input layer i and hidden layer j is as follows:

ΔW_ji(t) = −η · δ_j · O_i + μ · ΔW_ji(t−1)

where η, μ and δ_j are as defined previously.
The values obtained from the two weight adjustment equations are used to update the weight at each connection link. Letting t refer to the t-th training iteration, the new weight value at the (t+1)-th iteration associated with each connection link between output layer k and hidden layer j is calculated as follows:

W_kj(t+1) = ΔW_kj(t) + W_kj(t)

where W_kj(t+1) is the new value of the weight associated with each connection link between output layer k and hidden layer j, and W_kj(t) is the current value of that weight at the t-th iteration.
Meanwhile, the new weight value at the (t+1)-th iteration for the weight associated with each connection link between hidden layer j and input layer i can be written as follows:

W_ji(t+1) = ΔW_ji(t) + W_ji(t)

where W_ji(t+1) is the new value of the weight associated with each connection link between hidden layer j and input layer i, and W_ji(t) is the current value of that weight.
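Putting the delta rule and the two-term update together, one training step for a tiny 2-2-1 network can be sketched as below. All values (weights, learning rate η, momentum μ, the training pattern) are illustrative assumptions; biases are omitted for brevity:

```python
import math

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

eta, mu = 0.5, 0.9                 # learning rate and momentum (assumed values)
W_ih = [[0.1, -0.2], [0.4, 0.3]]   # W_ji: input i -> hidden j, indexed [i][j]
W_ho = [0.2, -0.1]                 # W_kj: hidden j -> the single output k
dW_ih_prev = [[0.0, 0.0], [0.0, 0.0]]   # previous increments Delta W(t-1)
dW_ho_prev = [0.0, 0.0]

x, t = [1.0, 0.5], 1.0             # one illustrative training pattern

# Forward pass.
O_h = [sigmoid(sum(x[i] * W_ih[i][j] for i in range(2))) for j in range(2)]
O_k = sigmoid(sum(O_h[j] * W_ho[j] for j in range(2)))
E_before = 0.5 * (t - O_k) ** 2

# delta_k = -(t_k - O_k) * O_k * (1 - O_k)
delta_k = -(t - O_k) * O_k * (1 - O_k)
# delta_j = (sum_k delta_k * W_kj) * O_j * (1 - O_j)
delta_h = [delta_k * W_ho[j] * O_h[j] * (1 - O_h[j]) for j in range(2)]

# Delta W_kj(t) = -eta * delta_k * O_j + mu * Delta W_kj(t-1),
# then W_kj(t+1) = Delta W_kj(t) + W_kj(t).
for j in range(2):
    dW = -eta * delta_k * O_h[j] + mu * dW_ho_prev[j]
    W_ho[j] += dW
    dW_ho_prev[j] = dW

# Delta W_ji(t) = -eta * delta_j * O_i + mu * Delta W_ji(t-1),
# then W_ji(t+1) = Delta W_ji(t) + W_ji(t).
for i in range(2):
    for j in range(2):
        dW = -eta * delta_h[j] * x[i] + mu * dW_ih_prev[i][j]
        W_ih[i][j] += dW
        dW_ih_prev[i][j] = dW
```

Running the forward pass again after this single update yields an error smaller than E_before, which is exactly the downhill movement on the error surface described above.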
The gradient of the error function is expected to move down the error surface and reach the minimum point where the global minimum resides. Owing to the temporal behaviour of gradient descent and the shape of the error surface, the steps taken to move down the error surface may lead to divergence of the training. Many factors can cause this problem, but one of them is overshooting the local minimum where the desired output lies. This may happen when the step taken by the gradient is large. A large step can lead the network to converge faster, but when it moves down a narrow and steep valley, the algorithm may head in the wrong direction and bounce from one side across to the other. In contrast, a small step can direct the algorithm in the correct direction, but the convergence rate is compromised: the learning time becomes longer, since more training iterations are needed to achieve the minimum error. Thus, the difficulty of this algorithm lies in controlling the step size and direction of the gradient along the error surface. For this reason, a parameter called the learning rate is used in the weight adjustment computation. The choice of learning rate value is application-dependent and in most cases is based on experiments. Once the correct learning rate is obtained, the gradient movement can produce the correct new weight value, which in turn produces the correct output.
Owing to the problem of oscillation in narrow valleys, another parameter is needed to keep the gradient moving in the correct direction so that the algorithm does not suffer from wide oscillation. This parameter is called momentum. Momentum brings the impact of the previous weight change into the current weight change, by which the gradient can move uphill and escape oscillation along the valley. The incorporation of these two parameters into the weight adjustment calculation has a great impact on the convergence of the algorithm and on the problem of local minima, provided they are tuned to the correct values.
The previous sections have discussed the role of the parameters in producing the increment value of the weight through the implementation of a weight adjustment equation. As discussed before, the learning rate and momentum coefficient are the most commonly used parameters in two-term BP, and using a constant parameter value is not always a good idea. In the case of the learning rate, setting a smaller value may decelerate convergence even though it guarantees that the gradient moves in the correct direction. On the contrary, setting a larger value may speed up convergence but is prone to an oscillation problem that may lead to divergence. The momentum parameter, on the other hand, is introduced to stabilize the movement of gradient descent in steep valleys by overcoming the oscillation problem. Reference [ ] stated that assigning too small a value to the momentum factor may decelerate convergence and compromise the stability of the network, while too large a value results in the algorithm giving excessive emphasis to previous derivatives, which weakens the gradient descent of BP. Hence, the author suggested the use of a dynamic adjustment method for momentum. Like the momentum parameter, the value of the learning rate also needs to be adjusted at each iteration to avoid the problems produced by keeping a constant value throughout all iterations.
The adaptive parameters (learning rate and momentum) used in this study are implemented to assist the network in controlling the movement of gradient descent on the error surface, with the primary aim of attaining the correct weight values. The correct increment value of the weight is later used to update the new weight value. This method is implemented in the two-term BP algorithm with MSE. The adaptive method assists in generating the correct sign for the weight, which is the primary concern of this study.
The choice of the adaptive method depends on the learning characteristic of the algorithm used in this study, which is batch learning. Reference [ ] gave a brief definition of online learning and its difference from batch learning: online learning is a scheme that updates the weights after every input-output case, while batch learning accumulates error signals over all input-output cases before updating the weights. In other words, online learning updates the weights after the presentation of each input and target pair. The batch learning method reflects the true gradient descent where, as stated in reference [ ], each weight update tries to minimize the error. The author also stated that the summed gradient information for the whole pattern set provides reliable information regarding the shape of the whole error function.
With the task of pointing out the temporal behaviour of the gradient of the error function and its relation to the change of weight sign, the adaptive learning method used in this study is adopted from the paper by reference [ ], entitled "Back-Propagation Heuristics: A Study of the Extended Delta-Bar-Delta Algorithm". The author proposed an improvement of the DBD algorithm of reference [ ], called the Extended Delta-Bar-Delta (EDBD) algorithm, in which the improved method provides a way of updating the value of the momentum for each weight connection at each iteration. Since EDBD is an extension of DBD, it implements a similar notion: it exploits the information in the signs of the past and current gradients, which become the condition for learning rate and momentum adaptation. Moreover, the improved algorithm also provides a ceiling to prevent the learning rate and momentum values from becoming too large. The detailed equations of the method are described below:
ΔW_ji(t) = −η_ji(t) · (∂E(t) / ∂W_ji) + μ_ji(t) · ΔW_ji(t−1)

where η_ji(t) is the learning rate and μ_ji(t) is the momentum for the connection between the i-th input node and the j-th hidden node at the t-th iteration.
The updated values for the learning rate and momentum can be written as follows:

Δη_ji(t) = κ_l · exp(−γ_l · |δ̄_ji(t)|)   if δ̄_ji(t−1) · δ_ji(t) > 0
Δη_ji(t) = −φ_l · η_ji(t)                 if δ̄_ji(t−1) · δ_ji(t) < 0
Δη_ji(t) = 0                              otherwise

Δμ_ji(t) = κ_m · exp(−γ_m · |δ̄_ji(t)|)   if δ̄_ji(t−1) · δ_ji(t) > 0
Δμ_ji(t) = −φ_m · μ_ji(t)                 if δ̄_ji(t−1) · δ_ji(t) < 0
Δμ_ji(t) = 0                              otherwise

δ̄_ji(t) = (1 − θ) · δ_ji(t) + θ · δ̄_ji(t−1)

η_ji(t+1) = MIN(η_max, η_ji(t) + Δη_ji(t))
μ_ji(t+1) = MIN(μ_max, μ_ji(t) + Δμ_ji(t))

where κ_l, γ_l and φ_l are the parameters of the learning rate adaptation equation, κ_m, γ_m and φ_m are the parameters of the momentum adaptation equation, δ_ji(t) is the gradient and δ̄_ji(t) is the exponentially weighted average of past gradients at the t-th iteration, and η_max and μ_max are the ceilings on the learning rate and momentum.
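A minimal sketch of these adaptation rules for a single weight connection follows. All parameter values (the κ, γ, φ, θ constants and the two ceilings) are arbitrary illustrative choices, not the chapter's settings:

```python
import math

# Illustrative EDBD-style parameters for one weight connection
# (assumed values, not taken from the chapter):
kappa_l, gamma_l, phi_l = 0.095, 0.1, 0.1   # learning-rate adaptation
kappa_m, gamma_m, phi_m = 0.010, 0.1, 0.1   # momentum adaptation
theta = 0.7                                  # decay of the gradient average
lr_max, mom_max = 2.0, 0.9                   # ceilings

def edbd_step(lr, mom, avg_grad, grad):
    """One EDBD update of (lr, mom) for a single weight, given the current
    gradient `grad`; `avg_grad` is the exponential average delta-bar."""
    if avg_grad * grad > 0:       # same direction: flat region, speed up
        d_lr = kappa_l * math.exp(-gamma_l * abs(avg_grad))
        d_mom = kappa_m * math.exp(-gamma_m * abs(avg_grad))
    elif avg_grad * grad < 0:     # sign flip: minimum overshot, slow down
        d_lr = -phi_l * lr
        d_mom = -phi_m * mom
    else:
        d_lr = d_mom = 0.0
    lr = min(lr_max, lr + d_lr)   # ceilings prevent unbounded growth
    mom = min(mom_max, mom + d_mom)
    avg_grad = (1 - theta) * grad + theta * avg_grad
    return lr, mom, avg_grad

lr, mom, avg_grad = 0.1, 0.5, 0.0
lr, mom, avg_grad = edbd_step(lr, mom, avg_grad, 0.2)    # no history yet
lr, mom, avg_grad = edbd_step(lr, mom, avg_grad, 0.3)    # same sign: lr grows
lr_grown = lr
lr, mom, avg_grad = edbd_step(lr, mom, avg_grad, -0.25)  # flip: lr shrinks
```

Two gradients of the same sign push the per-weight learning rate up; the subsequent sign flip cuts it proportionally, exactly the qualitative behaviour the equations above describe.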
The algorithm calculates the exponential average of past derivatives to obtain information about the recent history of the direction in which the error has been decreasing up to iteration t. This information, together with the current gradient, is used to adjust the parameters' values based on their signs. When the current and past derivatives possess the same sign, the gradient is moving in the same direction; one can assume that in this situation the gradient is moving over a flat area with the minimum lying ahead. In contrast, when the current and past derivatives possess opposite signs, the gradient has changed direction; one can assume that it has jumped over the minimum, and the weight needs to be decreased to correct this.
The increment of the learning rate value is made proportional to an exponentially decaying trace, so that the learning rate increases significantly in a flat region and decreases on a steep slope. To prevent unbounded increment of the parameters, maximum values for the learning rate and momentum are set to act as ceilings on both parameters. Owing to the excellent idea and performance of the algorithm, as proven in reference [ ], this method is adopted to assist the network in producing proper weight sign changes and to achieve the purpose of this study.
Basically, the adaptive methods for producing the proper weight are the transformation of the author's optimization concept into a mathematical concept. The measurement of the success of the method, and of the algorithm as a whole, can be done in many ways: by analysing the convergence rate, the accuracy of the result, the error value it produces, the change of the weight sign as a response to the temporal behaviour of the gradient, and so on. Hence, the role of the adaptive method, constructed using a mathematical concept to improve the weight adjustment computation in order to yield the proper weights, will be implemented and examined in this study to check the efficiency and learning behaviour of both algorithms. The efficiency can be judged from the measurement criteria described earlier.
As implicitly depicted in the algorithm, the process of generating the proper weight stems from calculating its update value. This process is affected by various variables, from the parameters to controlled variables such as the gradient, the previous weight increment value and the error. They all play a great part in determining the sign of the weight produced, especially the partial derivative of the error function with respect to the weight (the gradient). Reference [ ] briefly described the relationship between error curvature, gradient and weight. The author mentioned that when the error curve enters a flat region, the changes of the derivative and the error curve are small and, as a result, the change of the weight will not be optimal. Moreover, when it enters a high-curvature region, the derivative change is large, especially if the minimum point lies in this region, and the adjustment of the weight value is large, which sometimes overshoots the minimum. This problem can be alleviated by adjusting the step size of the gradient, which can be done by adjusting the learning rate. The momentum coefficient can be used to control the oscillation problem, and its implementation along with a proportional factor can speed up convergence. In addition, reference [ ] also gave the proper conditions relating the rate of weight change to the temporal behaviour of the gradient. The author wrote that if the derivative has the same sign as the previous one, then the weighted sum is increased, which makes the weight increment value larger and yields an increased rate of weight change. On the contrary, if the derivative has the opposite sign to the previous one, the weighted sum is decreased to stabilize the network. Reference [ ] also emphasized the causes of slow convergence, which involve the magnitude and direction components of the gradient vector. The author stated that when the error surface is fairly flat along a weight dimension, the magnitude of the derivative with respect to that weight is small, yielding a small weight adjustment value, and many steps are required to reduce the error. Meanwhile, when the error surface is highly curved along a weight dimension, the derivative is large in magnitude, yielding a large weight change that may overshoot the minimum. The author also briefly discussed the performance of BP with momentum: when consecutive derivatives of a weight possess the same sign, the exponentially weighted sum grows large in magnitude and the weight is adjusted by a large value; when the signs of consecutive derivatives alternate, the weighted sum is small in magnitude and the weight is adjusted by a small amount.
Moreover, the author raised the implementation of local adaptive methods such as Delta-Bar-Delta, originally proposed in reference [ ]. From the description given in [ ], the learning behaviour of the network can be justified as follows: consecutive weight increment values possessing opposite signs indicate oscillation of the weight value, which requires the learning rate to be reduced; similarly, consecutive weight increment values possessing the same sign call for an increase of the learning rate. This information will be used in studying the weight sign changes of both algorithms.
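The justification above can be sketched as a simple per-weight rule (an assumed illustrative implementation, not code from the chapter): increase the learning rate while consecutive weight increments agree in sign, and cut it when they alternate.

```python
def adjust_learning_rate(lr, prev_delta_w, delta_w, up=1.1, down=0.5):
    # `up` and `down` are illustrative multipliers, not the chapter's values.
    if prev_delta_w * delta_w > 0:   # same sign: stable direction, increase
        return lr * up
    if prev_delta_w * delta_w < 0:   # opposite sign: oscillation, decrease
        return lr * down
    return lr                        # no information, leave unchanged

lr = 0.1
lr = adjust_learning_rate(lr, 0.02, 0.03)    # same sign: lr grows to 0.11
lr = adjust_learning_rate(lr, 0.03, -0.01)   # sign flip: lr halved to 0.055
```

Note that the actual DBD rule increases the learning rate additively and decreases it multiplicatively; the purely multiplicative form here is a simplification of the same sign heuristic.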
This section discusses the results of the experiment and their analysis. The detailed discussion of the experimental process covers its initial phase up to the summarized analysis.
Learning has a point only when a difference exists between the calculated output and the desired output; otherwise there is nothing to learn. When the difference does exist, the error signal is propagated back into the network to be minimized. The network then adjusts itself to compensate for the loss during training and learn better. This procedure is carried out by calculating the gradient of the error, mainly in order to adjust the weight values to be used in the next feed-forward training procedure.
Based on the influence of the weight, this signal can have a bigger or smaller influence on the learning. When the weight is negative, the weight connection inhibits the input signal, and thus does not bring significant influence to the learning and the output; as a result, the other nodes with positive weights will dominate the learning and the output. On the other hand, when the weight is positive, the weight connection excites the input signal so that it has a significant impact on the learning and the output, and the respective node makes a contribution to both. If the assignment results in a large error, the corresponding weight needs to be adjusted to reduce the error. The adjustment comprises both magnitude and sign change. By properly adjusting the weight, the algorithm can converge to the solution faster.
To get a clear idea of the impact of the weight sign on learning, the following assumption is used: let all weight values be negative; then, using the equation written below, we obtain a negative value of net.

net = Σ_i (W_i · O_i) + θ

O = 1 / (1 + e^(−net))
Feeding this net into the sigmoid activation function, we obtain a value of O close to 0. On the other hand, let all weight values be positive; using the same equation, we obtain a positive value of net, and feeding it into the sigmoid activation function we obtain a value of O close to 1. From this assumption we can infer that, by adjusting the weights with the proper value and sign, the network can learn better and faster. Mathematically, the adjustment of the weight is carried out using the weight update method and the current weight value as written below:

ΔW(t) = −η(t) · ∇E(W(t)) + μ(t) · ΔW(t−1)

W(t+1) = ΔW(t) + W(t)
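The saturation effect described above can be checked numerically; the weight and input values below are arbitrary illustrations:

```python
import math

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

# Illustrative values: all-negative weights drive `net` negative and the
# sigmoid output toward 0; all-positive weights drive it toward 1.
inputs = [0.8, 0.6, 0.9]
neg_w = [-2.0, -3.0, -2.5]
pos_w = [2.0, 3.0, 2.5]
bias = 0.0

net_neg = sum(w * o for w, o in zip(neg_w, inputs)) + bias
net_pos = sum(w * o for w, o in zip(pos_w, inputs)) + bias

O_neg = sigmoid(net_neg)   # close to 0
O_pos = sigmoid(net_pos)   # close to 1
```

With these values, net is ±5.65, giving outputs of roughly 0.0035 and 0.9965, which illustrates why an all-negative weight assignment leaves a node contributing almost nothing to the output.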
It can be seen from the two equations above that there are various factors which influence the calculation of the weight update value. The most notable factor is the gradient. The gradient with respect to each weight in the network is calculated, and the weight update value is computed to update the corresponding weight. The negative sign assigned to the gradient forces it to move downhill along the error surface in weight space; this is meant to find the minimum point of the error surface, where the set of optimal weights resides. With these optimal values, the goal of learning can be achieved. Another factor is the previous weight update value.
According to reference [ ], the role of the previous weight update value is to bring a fraction of the previous weight update into the current weight so as to smooth out the new weight value.
To have a significant impact on the weight update value, the magnitudes of the gradient and of the previous weight update value are controlled by the learning parameters, namely the learning rate and the momentum. As in two-term BP, the values of the learning rate and momentum parameters are acquired through experiments. Correct tuning of the learning parameters' values can assist in obtaining the correct value and sign of the weight update, which in turn affects the calculation of the new weight value through the weight adjustment method.
In experiments, several authors have observed that the temporal behaviour of the gradient does make a contribution to the learning. This temporal behaviour can be seen as the changing of the gradient's sign during its movement on the error surface. Owing to the curvature of the error surface, the gradient behaves differently under certain conditions. Reference [ ] stated that when the gradient possesses the same sign over several consecutive iterations, this indicates that the minimum lies ahead, but when the gradient changes its sign over several consecutive iterations, this indicates that the minimum has been passed. By using this information, we can improve the way we update the weights.
Weight Changes for Learning Mechanisms in Two-Term Back-Propagation Network 69
http://dx.doi.org/10.5772/51776
To sum up, based on the heuristic that has been discussed, the value of a weight should be increased when several consecutive signs of the gradient remain the same, and decreased when the opposite condition occurs. However, that is not the sole determinant of the sign. Other factors such as the learning rate, gradient, momentum, previous weight update value and current weight value also play a large role in affecting the sign and magnitude of the weight change. The heuristic given previously is merely an indicator, providing information about the gradient's movement on the error surface and about the surface itself, by which we gain a better understanding of the gradient and the error surface so that we can enhance the gradient's movement in the weight space.
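The sign heuristic can be sketched as a step-size rule; this is the idea behind Rprop-style schemes, and the growth and shrink factors below are illustrative assumptions, not the chapter's values:

```python
def adapt_step(step, g_prev, g_curr, grow=1.2, shrink=0.5,
               step_min=1e-6, step_max=50.0):
    """If the gradient keeps its sign over consecutive iterations the
    minimum lies ahead, so the step grows; if the sign flips, the
    minimum was passed, so the step shrinks."""
    if g_prev * g_curr > 0:      # same sign: keep accelerating
        step = min(step * grow, step_max)
    elif g_prev * g_curr < 0:    # sign change: overshoot, back off
        step = max(step * shrink, step_min)
    return step

same = adapt_step(0.1, 1.5, 2.0)    # same sign -> larger step
flip = adapt_step(0.1, 1.5, -2.0)   # sign flip -> smaller step
```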
Through a thorough study of the weight sign-change process in both algorithms, it can be concluded that information about the signs of the past and current gradients is very helpful in guiding us to improve the training performance, which in this case refers to the movement of the gradient on the error surface. However, factors like the gradient, learning rate, momentum, previous weight update value and current weight value have a greater influence on the sign and magnitude of the new weight value. Besides that, the assignment of the initial weight value also needs to be addressed further. A negative initial weight value may lead the weight to remain negative on consecutive occasions when the gradient possesses a negative sign. As a result, the node will not contribute much to the learning and the output. This can be observed from the weight update equation as follows.
∆W(t) = -η(t) ∇E(W(t)) + α(t) ∆W(t - 1)
At the initial training iterations, the error tends to be large and so does the gradient. This large value of the gradient, together with its sign, will dominate the weight update calculation and thus bring a large change to the sign and magnitude of the weight. The influence of the gradient through the weight update value can be seen from the equation above and the weight adjustment calculation below.
W(t + 1) = ∆W(t) + W(t)
If the value of the initial weight is smaller than the gradient, the new weight update value will be more strongly affected by the gradient. As a result, the magnitude and sign of the weight will be changed according to the fraction of the gradient since, at this initial iteration, the previous weight update value is set to 0, leaving the gradient to dominate the computation of the weight update value. The case discussed before can be viewed in this way. Assume that the random function assigns a negative value to one of the weight connections at the hidden layer. After performing the feed-forward process, the difference between the output at the output layer and the desired output is large. Thus, the minimization method is performed by calculating the gradient with respect to the weights at the hidden layer. Since the error is large, the value of the computed gradient becomes large also. This can be seen in the equations below.
70 Artificial Neural Networks – Architectures and Applications
δk = -(tk - Ok) ∙ Ok (1 - Ok)
∇E(W(t)) = ∑k δk Wkj Oj (1 - Oj)
Assume that the gradient at the hidden layer has a large positive value; since this is the initial state, the previous weight update value is set to 0 and does not contribute anything to the weight update computation. As a result, the weight update value is largely influenced by the magnitude and sign of the gradient (including the contribution of the learning rate in adjusting the step size of the gradient). When the weight adjustment method described above is performed, the weight update value, which is mostly affected by the gradient, will dominate the change of the weight's magnitude and sign. The same reasoning also applies to weight adjustment in the middle of training, where the gradient and error are still large and the previous weight update value is set to a certain amount.
We can see that a large value of the error will affect the value of the gradient, which will be assigned a relatively large value. This value will be used to correct the weights in the network by propagating the error signal back through the network. As a result, the values and signs of the weights will be adjusted to compensate for the error. Another noticeable conclusion attained from the experiment is that when the gradient retains the same sign over consecutive iterations, the weight value increases over those iterations, while when the gradient changes its sign over several consecutive iterations, the weight value decreases. However, the change of the weight's sign and magnitude is still affected by the parameters and factors included in the update equations, as explained before.
The following examples show the change of weights affected by the sign of the gradient at consecutive iterations and by its value. The change of the weights is represented by a Hinton diagram. The following Hinton diagram is the representation of the weights in a standard BP network with adaptive learning on the Balloon dataset.
The Hinton diagram in Figure 3 illustrates the sign and magnitude of all weight connections between the hidden and input layers, as well as between the hidden and output layers, at the first iteration; a light colour indicates a positive weight and a dark colour indicates a negative weight. The size of a rectangle indicates the magnitude of the weight. The fifth rectangles in Figure 3 are the bias connections between the input layer and the hidden layer; however, the bias connection to the first hidden node carries a small value, so its representation is not clearly shown in the diagram, and its sign is negative. The biases have the value of . The resulting error at the first iteration is still large, which is . .
The error decreases gradually from the first iteration until the fifth iteration. The changes in the gradient are shown in the tables below.
Figure 3. Hinton diagram of all weight connections between input layer and hidden layer at first iteration on the Balloon dataset.
Table 1. The gradient value between input layer and hidden layer at iterations 1 to 5.
Table 2. The gradient value between hidden layer and output layer at iterations 1 to 5.
Figure 4. Hinton diagram of weight connections between input layer and hidden layer at fifth iteration.
From the table above we can infer that the gradients in the hidden layer move in different directions, while those in the output layer move in the same direction. Based on the heuristic, when a gradient moves in the same direction for these consecutive iterations, the value of the weight needs to be increased. However, it still depends on the factors that have been mentioned before. The impact on the weights is given in the diagram below.
Figure 5. Hinton diagram of weight connections between input layer and hidden layer at 12th iteration.
Comparing the values of the weights in the fifth iteration with the first iteration, we can infer that most of the weight magnitudes on the connections between the input and hidden layers become greater in the fifth iteration compared with those in the first iteration. This
shows that the value of a weight increases over iterations when the sign of the gradient stays similar over several iterations. However, it is noticeable that the sign of the weight between the input node and the first hidden node, as well as the bias from the input layer to the first hidden node, changes. This is due to the influence of the product of the large positive gradient and the learning rate, which dominates the weight update calculation and hence increases its magnitude and switches the weight direction. As a result, the error decreases to . from . .
At subsequent iterations the error gradient moves slowly along the shallow slope in the same direction, bringing smaller changes in the gradient, the weights and, of course, the error itself. The change in the gradient is shown in the table below. It can be seen that the sign of the gradient differs from the one at the earlier iterations, which means that the gradient at the bias connection moves in a different direction. The same thing happens with the gradient from the third node of the input layer to all hidden nodes, where the sign changes from the one at the fifth iteration. Another change occurs in the gradient from the second input node to the first hidden node at iteration .
Table 3. The gradient value between input and hidden layer at iterations 12 to 16.
Owing to the change of gradient sign that has been discussed before, the change in the weights at this iteration is clearly seen in their magnitude. The weights are larger in magnitude compared with those at the fifth iteration, since some of the gradients have moved in the same direction. The bias connection to the first hidden node is not clearly seen since its value is very small; however, its value is negative. For some of the weights, the changes in the sign of the gradient have less effect on the new weight value since their value is very small; thus, the sign of these weights remains the same. Moreover, although the gradient sign between the third input node and the first hidden node changes, the weight sign remains the same, since the weight update at the previous iteration was positive and the changes in the gradient are very small. Thus, it has a smaller impact on the change of the weight even though the weight update value is negative, i.e. decreasing. At this iteration, the error decreases to . .
Besides the magnitude change, an obvious change is seen in the sign of the weight from the first input node to the second hidden node. It was positive at the previous iteration, but now it turns negative.
Figure 6. Hinton diagram of weight connections between input layer and hidden layer at 14th iteration.
Figure 7. Hinton diagram of weight connections between input layer and hidden layer at 1st iteration.
From the experiment, the magnitude of this weight decreases gradually, by small amounts, over several iterations. This is due to the negative value of the gradient at the first iteration.
Since its initial value is quite big, the decrement does not have a significant effect on the sign. Moreover, since its value becomes smaller after several iterations, the impact of the negative gradient can be seen in the change of the weight's sign. The error at this iteration decreases to . .
The next example is the Hinton diagram representing the weights in a standard BP network with fixed parameters on the Iris dataset.
Figure 8. Hinton diagram of weight connections between input layer and hidden layer at 11th iteration.
Table 4. The gradient value between input and hidden layer at iterations 1-4.
Table 5. The gradient value between input and hidden layer at iterations 11-14.
At the 11th iteration, the most obvious change in sign is in the weight between the second input node and the first hidden node. From the table of gradients we can see that the gradient at this connection moves in the same direction throughout iterations 1-4 and 11-14. However, due to the negative value of the gradient, the weight update value carries a negative sign, which causes the weight value to decrease until its sign is negative. At this iteration, the error value decreases.
Figure 9. Hinton diagram of weight connections between input layer and hidden layer at 1st iteration.
The next example is the Hinton diagram representing the weights in a standard BP network with adaptive parameters.
Figure 10. Hinton diagram of weight connections between input layer and hidden layer at 11th iteration.
Table 6. The gradient value between input and hidden layer at iterations 1-4.
Table 7. The gradient value between input and hidden layer at iterations 11-12.
At the 11th iteration, all weights change their magnitude and some have different signs from before. The weights between the second input node and the first and second hidden nodes change their sign to positive because of the positive incremental value, since the gradient moves along the same direction over time. The positive incremental value gradually changes the magnitude and sign of the weight from negative to positive.
Conclusions
Acknowledgements
The authors would like to thank Universiti Teknologi Malaysia (UTM) for the support in Research and Development, and the Soft Computing Research Group (SCRG) for the inspiration in making this study a success. This work is supported by The Ministry of Higher Education (MOHE) under the Long Term Research Grant Scheme LRGS/TD/ /UTM/ICT/ - VOT L .
Author details
Soft Computing Research Group, Faculty of Computer Science and Information Systems, Universiti Teknologi Malaysia, Malaysia
References
[ ] Ng, S., Leung, S., & Luk, A. Fast convergent generalized back-propagation algorithm with constant learning rate. Neural Processing Letters.
[ ] International Joint Conference on Neural Networks – IJCNN, June, San Diego, CA, USA: IEEE.
[ ] Jin, B., et al. The application on the forecast of plant disease based on an improved BP neural network. In: International Conference on Material Science and Information Technology (MSIT), September, Singapore: Trans Tech Publications.
[ ] Li, Y., et al. The improved training algorithm of back propagation neural network with self-adaptive learning rate. In: International Conference on Computational Intelligence and Natural Computing (CINC), June, Wuhan, China: IEEE Computer Society.
[ ] Hua, Li. C., Xiangji, J., & Huang. Spam filtering using semantic similarity approach and adaptive BPNN. Neurocomputing.
[ ] Xiaoyuan, L., Bin, Q., & Lu, W. A new improved BP neural network algorithm. In: 2nd International Conference on Intelligent Computing Technology and Automation (ICICTA), October, Changsha, Hunan, China: IEEE Computer Society.
[ ] Yang, H., Mathew, J., & Ma, L. Basis pursuit-based intelligent diagnosis of bearing faults. Journal of Quality in Maintenance Engineering.
[ ] Sidani, A., & Sidani, T. A comprehensive study of the back propagation algorithm and modifications. In: Proceedings of the Southcon Conference, March, Orlando, FL, USA: IEEE.
http://dx.doi.org/10.5772/51274
1. Introduction
Applications of artificial neural networks (ANNs) have been reported in the literature in various areas [1–5]. The wide use of ANNs is due to their robustness, fault tolerance and ability to learn and generalize from examples, through a training process, capturing complex nonlinear multi-input/output relationships between process parameters from process data [6–10]. ANNs have many other advantageous characteristics, including generalization, adaptation, universal function approximation and parallel data processing. The multilayer perceptron (MLP) trained with the backpropagation (BP) algorithm is the most widely used ANN in modeling, optimization, classification and prediction processes [11, 12]. Although the BP algorithm has proved to be efficient, its convergence tends to be very slow, and there is a possibility of getting trapped in an undesired local minimum [4, 10, 11, 13].
Most literature related to ANNs focuses on specific applications and their results rather than on the methodology of developing and training the networks. In general, the quality of a developed ANN depends highly not only on the training algorithm and its parameters but also on many architectural parameters, such as the number of hidden layers and of nodes per layer, which have to be set during the training process, and these settings are crucial to the accuracy of the ANN model [8, 14–19].
Above all, there is limited theoretical and practical background to assist in the systematic selection of ANN parameters throughout the entire development and training process. Because of this, the ANN parameters are usually set by previous experience in a trial-and-error procedure, which is very time consuming. In such a way, the optimal settings of the ANN parameters for achieving the best ANN quality are not guaranteed.
© 2013 Manuel Ortiz-Rodríguez et al.; licensee InTech. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
The robust design methodology proposed by Taguchi is one of the appropriate methods for achieving this goal [16, 20, 21]. Robust design is a statistical technique widely used to study the relationship between the factors affecting the outputs of a process. It can be used to systematically identify the optimum settings of the factors to obtain the desired output. In this work, it was used to find the optimum settings of the ANN parameters in order to achieve a minimum-error network.
Dendrites are fibers which receive the electric signals coming from other neurons and transmit them to the soma. The multiple signals coming from the dendrites are processed by the soma and transmitted to the axon. The cylinder-axis, or axon, is a fiber of great length compared with the rest of the neuron, connected to the soma at one end and divided at the other into a series of nerve ramifications; the axon picks up the signal from the soma and transmits it to other neurons through a process known as synapsis.
An artificial neuron is a mathematical abstraction of the working of a biological neuron [23]. Figure 2 shows an artificial neuron. From a detailed observation of the biological process, the following analogies with the artificial system can be mentioned:
Robust Design of Artificial Neural Networks Methodology in Neutron Spectrometry 85
http://dx.doi.org/10.5772/51274
• The inputs Xi represent the signals that come from other neurons and are captured by the dendrites.
• The weights Wi are the intensities of the synapses that connect two neurons; Xi and Wi are real values.
• θ is the threshold that the neuron's activation must exceed for the neuron to become active; this process happens biologically in the body of the cell.
• The input signals to the artificial neuron, X1 , X2 , ..., Xn , are continuous variables instead of the discrete pulses present in a biological neuron. Each input signal passes through a gain, or weight, called the synaptic weight or strength of the connection, whose function is similar to the synaptic function of the biological neuron.
• Weights can be positive (excitatory) or negative (inhibitory); the summing node accumulates all the input signals multiplied by their weights and passes the result to the output through a threshold, or transfer, function.
An idea of this process is shown in Figure 3, where a group of inputs can be observed entering an artificial neuron. The input signals are weighted by multiplying them by the corresponding weight, which in the biological version of the neuron corresponds to the strength of the synaptic connection; the weighted signals arrive at the neuronal node, which acts as a summer of the signals; the output of the node is called the net output, and it is calculated as the sum of the
weighted inputs plus a value b called the gain. The net output is used as the input to the transfer function, which provides the total output, or response, of the artificial neuron.
The representation of figure 3 can be simplified as shown in figure 4. From this figure, the net output of the neuron, n, can be mathematically represented as follows:
n = ∑_{i=1}^{R} p_i w_i + b    (1)

a = f(n) = f( ∑_{i=1}^{R} p_i w_i + b )    (2)
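Equations (1) and (2) can be sketched directly; the sigmoid transfer function below is just one common choice, used here for illustration:

```python
import math

def neuron(p, w, b, f=lambda n: 1.0 / (1.0 + math.exp(-n))):
    """Equation (1): n = sum_i p_i * w_i + b.
    Equation (2): a = f(n). The sigmoid default is one common choice
    of transfer function; others (linear, tanh) fit the same scheme."""
    n = sum(pi * wi for pi, wi in zip(p, w)) + b
    return n, f(n)

n, a = neuron([1.0, 0.5], [0.4, -0.2], 0.1)
# n = 0.4 - 0.1 + 0.1 = 0.4
```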
From this figure it can be seen that the net inputs are contained in the vector p (for a lone neuron with one input, p has a single element); W represents the weights, and the new input b is a gain that reinforces
the output of the summer n, which is the net output of the network; the net output is determined by the transfer function, which can be a linear or nonlinear function of n and is chosen depending on the specifications of the problem that the neuron is intended to solve.
Generally, a neuron has more than one input. In figure 4, a neuron with R inputs can be observed; the individual inputs p1 , p2 , ..., pR are multiplied by the corresponding weights w1,1 , w1,2 , ..., w1,R belonging to the weight matrix W. The sub-indexes of the weight matrix represent the terms involved in the connection. The first sub-index represents the destination neuron, and the second represents the source of the signal that feeds the neuron. For example, the indexes of w1,2 indicate that this weight is the connection from the second input to the first neuron.
This convention becomes more useful when there is a neuron with many parameters; in this case the notation of figure 4 can be inappropriate, and it is preferred to use the abbreviated notation represented in figure 6.
The input vector p is represented by the vertical solid bar on the left. The dimensions of p are shown below the variable as R×1, indicating that the input vector is a column vector of R elements. The inputs go to the weight matrix W, which has R columns and, in the case of a single neuron, just one row. A constant 1 enters the neuron, multiplied by the scalar gain b. The output of the net, a, is a scalar in this case. If the net had more than one neuron, a would be a vector.
ANNs are highly simplified models of the working of the brain [10, 24]. An ANN is a biologically inspired computational model which consists of a large number of simple processing elements, or neurons, which are interconnected and operate in parallel [2, 13]. Each neuron is connected to other neurons by means of directed communication links, each with an associated weight, which constitute the neuronal structure [4]. The weights represent the information being used by the net to solve a problem.
ANNs are usually formed by several interconnected neurons. The disposition and connections vary from one type of net to another, but in general the neurons are grouped in layers. A layer is a collection of neurons; according to its location in the neural net, a layer receives a different name:
• Input layer: receives the input signals from the environment. In this layer the information is not processed; for this reason, it is often not counted as a layer of neurons.
• Hidden layers: these layers do not have contact with the external environment; the hidden layers pick up and process the information coming from the input layer; the number of hidden layers and of neurons per layer, and the way in which they are connected, vary from one net to another. Their elements can have different connections, and these determine the different topologies of the net.
• Output layer: receives the information from the hidden layers and transmits the answer to the external environment.
Figure 7 shows an ANN with two hidden layers. The outputs of the first hidden layer are the inputs of the second hidden layer. In this configuration, each layer has its own weight matrix W, summer, gain vector b, net input vector n, transfer function and output vector a. This ANN can be observed in abbreviated notation in figure 8.
Figure 8 shows a three-layer network using abbreviated notation. From this figure it can be seen that the network has R1 inputs, S1 neurons in the first layer, S2 neurons in the second layer, etc. A constant input 1 is fed to the bias of each neuron. The outputs of each intermediate layer are the inputs to the following layer. Thus, layer 2 can be analyzed as a one-layer network with S1 inputs, S2 neurons, and an S2×S1 weight matrix W2. The input to layer 2 is a1; the output is a2. Now that all the vectors and matrices of layer 2 have been identified, it can be treated as a single-layer network on its own. This approach can be taken with any layer of the network.
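The layer-by-layer view can be sketched as a loop in which each layer is treated as a single-layer network whose input is the previous layer's output; the sizes and the tanh transfer function are illustrative assumptions:

```python
import numpy as np

def forward(p, layers, f=np.tanh):
    """Propagate p through a list of (W, b) layers; each layer is treated
    as a single-layer network whose input is the previous layer's output.
    W for layer m has shape (S_m, S_{m-1})."""
    a = p
    for W, b in layers:
        a = f(W @ a + b)
    return a

rng = np.random.default_rng(0)
layers = [(rng.normal(size=(4, 3)), np.zeros(4)),   # layer 1: R=3 inputs, S1=4
          (rng.normal(size=(2, 4)), np.zeros(2))]   # layer 2: S1=4 inputs, S2=2
a2 = forward(np.ones(3), layers)                    # output vector of layer 2
```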
Figure 8. Artificial Neural Network with two hidden layers in abbreviated notation
The arrangement of neurons into layers and the connection patterns within and between layers are called the net architecture [6, 7]. According to the absence or presence of feedback connections in a network, two types of architectures are distinguished:
• Feedforward architecture. There are no connections back from the output to the input
neurons; the network does not keep a memory of its previous output values and the
activation states of its neurons; the perceptron-like networks are feedforward types.
• Feedback architecture. There are connections from output to input neurons; such a
network keeps a memory of its previous states, and the next state depends not only
on the input signals but on the previous states of the network; the Hopfield network is of
this type.
A back propagation feedforward neural net is a network with supervised learning which uses a two-phase propagation-adaptation cycle. Once a pattern has been applied to the input of the network as a stimulus, it is propagated from the first layer through the upper layers of the net until an output is generated. The output signal is compared with the desired output, and an error signal is calculated for each of the outputs.
The output errors are propagated backwards from the output layer toward all the neurons of the hidden layer that contribute directly to the output. However, each neuron of the hidden layer receives only a fraction of the whole error signal, based on the relative contribution that the neuron has made to the original output. This process is repeated layer by layer until all neurons of the network have received an error signal which describes their relative contribution to the total error. Based on the received error signal, the synaptic connection weights of each neuron are updated so that the net converges toward a state that allows all the training patterns to be classified correctly.
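The propagation-adaptation cycle described above can be sketched as a minimal training loop; the XOR data, layer sizes and learning rate are illustrative choices, not the chapter's experiments:

```python
import numpy as np

rng = np.random.default_rng(1)
sig = lambda x: 1.0 / (1.0 + np.exp(-x))

# Illustrative task: XOR, which needs a hidden layer to be solved.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # input -> hidden
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden -> output
lr = 0.5

def mse():
    return float(np.mean((sig(sig(X @ W1 + b1) @ W2 + b2) - T) ** 2))

before = mse()
for _ in range(2000):
    # propagation phase: the stimulus flows forward to an output
    h = sig(X @ W1 + b1)
    o = sig(h @ W2 + b2)
    # adaptation phase: error signals flow back, layer by layer
    d_o = (o - T) * o * (1 - o)
    d_h = (d_o @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_o; b2 -= lr * d_o.sum(axis=0)
    W1 -= lr * X.T @ d_h; b1 -= lr * d_h.sum(axis=0)
after = mse()
```

After training, the error on the training patterns is lower than before, which is the convergence behaviour the cycle is designed to produce.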
The importance of this process is that, as the net is trained, the neurons of the intermediate layers organize themselves in such a way that they learn to recognize different features of the whole input space. After training, when the neurons are presented with an arbitrary input pattern that contains noise or is incomplete, the neurons of the hidden layer will respond with an active output if the new input contains a pattern that resembles the feature that the individual neurons have learned to recognize during training. Conversely, the units of
the hidden layers have a tendency to inhibit their output if the input pattern does not contain the feature that they have been trained to recognize.
During the training process, the backpropagation net tends to develop internal relationships among neurons in order to organize the training data into classes. This tendency can be extrapolated to the hypothesis that all the units of the hidden layer of a backpropagation network are somehow associated with specific characteristics of the input pattern as a consequence of the training. Whether or not the association is exact may not be evident to a human observer; the important thing is that the net has found an internal representation that allows it to generate the desired outputs when given the inputs from the training process. This same internal representation can be applied to inputs that the net has not seen before, and the net will classify these inputs according to the characteristics they share with the training examples.
In recent years there has been increasing interest in using ANNs for modeling, optimization and prediction. The advantages that ANNs offer are numerous and are achievable only by developing an ANN model of high performance. However, determining suitable training and architectural parameters of an ANN remains a difficult task, mainly because it is very hard to know beforehand the size and structure of the neural network one needs to solve a given problem. An ideal structure is one that, independently of the starting weights of the net, always learns the task, i.e. makes almost no error on the training set and generalizes well.
The problem with neural networks is that a number of parameters have to be set before any training can begin. Users have to choose the architecture and determine many of the parameters of the selected network. However, there are no clear rules on how to set these parameters; yet these parameters determine the success of the training.
As can be appreciated in figure 9, the current practice in the selection of design parameters for an ANN is based on the trial-and-error procedure, in which a large number of ANN models are developed and compared to one another. If the level of a design parameter is changed and has no effect on the performance of the net, a different design parameter is varied, and the experiment is repeated in a series of runs. The observed responses are examined in each phase to determine the best level of each design parameter.
The serious inconvenience of this method is that one parameter is evaluated while the others are held at a single level. Hence, the best level selected for a design variable in
particular may not necessarily be the best at the end of the experimentation, since the other parameters may have changed. Clearly, this method cannot evaluate interactions among parameters, since it varies only one at a time, and it can lead to an impoverished ANN design in general.
All of these limitations have motivated researchers to merge or hybridize ANNs with other approaches in the search for better performance. One way of overcoming this disadvantage is to evaluate all the possible level combinations of the design parameters, i.e., to carry out a full factorial design. However, since the number of combinations can be very large, even for a small number of parameters and levels, this method is very expensive and consumes a lot of time. The number of experiments to be carried out can be decreased by making use of the fractional factorial method, a statistical method based on Taguchi's robust design philosophy.
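The saving can be illustrated with the standard L4 orthogonal array: three two-level factors need 2³ = 8 full-factorial runs, but only 4 orthogonal runs (the factor names and levels here are placeholders):

```python
from itertools import product

# Full two-level factorial for 3 factors: 2**3 = 8 runs.
full = list(product([1, 2], repeat=3))

# Standard Taguchi L4 orthogonal array: 4 runs for up to three
# two-level factors; every pair of columns contains each level
# combination exactly once.
L4 = [(1, 1, 1),
      (1, 2, 2),
      (2, 1, 2),
      (2, 2, 1)]

def column_pair(runs, i, j):
    """Collect the (level_i, level_j) combinations appearing in a design."""
    return sorted((run[i], run[j]) for run in runs)
```

The balance property (`column_pair` returning all four combinations for any pair of columns) is what lets main effects be estimated from only half the runs.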
The Taguchi technique is a methodology for finding the optimum settings of the control factors to make the product or process insensitive to the noise factors. Taguchi-based optimization has produced a unique and powerful optimization discipline that differs from traditional practices [16, 20, 21].
Taguchi's robust design problems can be divided into two classes: static and dynamic characteristics. The static problem attempts to obtain the value of a quality characteristic of interest as close as possible to a single specified target value. The dynamic problem, on the other hand, involves situations where a system's performance depends on a signal factor.
Taguchi also proposed a two-phase procedure to determine the factors level combination.
First, the control factors that are significant for reducing variability are determined and their
settings are chosen. Next, the control factors that are significant in affecting the sensitivity
are identified and their appropriate levels are chosen. The objective of the second phase is to
adjust the responses to the desired values.
The Taguchi method is applied in four steps. The S/N ratio is used for evaluating the measured responses, where the mean square error (MSE) represents the mean square of the distance between the measured response and the best fitted line and denotes the sensitivity. As the final step, a confirmatory test is run using the optimum conditions.
The two major goals of parameter design are to minimize the process or product
variation and to design robust and flexible processes or products that are adaptable to
environmental conditions. Taguchi methodology is useful for finding the optimum setting
of the control factors to make the product or process insensitive to the noise factors.
Today, ANNs can be trained to solve problems that are difficult for conventional computers or human beings, and they have been trained to perform complex functions in various fields, including pattern recognition, identification, classification, speech, vision, and control systems. Recently, ANN technology has been applied with relative success in the nuclear sciences [3], mainly in the neutron spectrometry and dosimetry domains [25–31].
is referred to as the “dose”. The neutron spectrum and the dose are of great importance in radiation protection physics [37].
Neutrons are found in the environment or are artificially produced in different ways; these neutrons have a wide energy range, extending from a few thousandths of an eV to several hundreds of MeV [38]. They also occur in a broad variety of energy distributions, named the neutron-fluence spectrum or simply the neutron spectrum, Φ_E(E).
Determining the neutron dose received by those exposed in workplaces or accidents at nuclear facilities generally requires knowledge of the neutron energy spectrum incident on the body [39]. Spectral information must generally be obtained from passive detectors which respond to different ranges of neutron energies, such as the multisphere Bonner system, or Bonner spheres system (BSS) [40–42].
The BSS has been used to unfold neutron spectra mainly because it has an almost isotropic response, can cover the energy range from thermal to GeV neutrons, and is easy to operate. However, its weight, the time-consuming measurement procedure, the need to use an unfolding procedure, and the low resolution of the resulting spectrum are some of its drawbacks [43, 44].
As can be seen in figure 10, the BSS consists of a thermal neutron detector, such as a 6LiI(Eu) scintillator, activation foils, pairs of thermoluminescent dosimeters, or track detectors, placed at the centre of a number of moderating polyethylene spheres of different diameters, to obtain, through an unfolding process, the neutron energy distribution, also known as the spectrum, Φ_E(E) [42, 45].
Figure 10. Bonner spheres system with a 6LiI(Eu) neutron detector
The derivation of the spectral information is not simple; the unknown neutron spectrum is not given directly as a result of the measurements [46]. If a sphere d has a response function R_d(E) and is exposed in a neutron field with spectral fluence Φ_E(E), the sphere reading M_d is obtained by folding R_d(E) with Φ_E(E); this means solving the Fredholm integral equation of the first kind shown in equation 4.
M_d = ∫ R_d(E) Φ_E(E) dE    (4)
This folding process takes place in the sphere itself during the measurement. Although the real Φ_E(E) and R_d(E) are continuous functions of neutron energy, in practice they cannot be described analytically, so discretized forms are used:
C_j = Σ_{i=1}^{N} R_{i,j} Φ_i ,    j = 1, 2, ..., m    (5)
where C_j is the jth detector's count rate; R_{i,j} is the jth detector's response to neutrons in the ith energy interval; Φ_i is the neutron fluence within the ith energy interval; and m is the number of spheres utilized.
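In discretized form, equation 5 is simply a matrix-vector product; a minimal Python sketch (the response matrix and fluence values are invented for illustration, not taken from a real BSS):

```python
def fold(response, fluence):
    """Discretized folding (equation 5): C_j = sum_i R[j][i] * Phi[i].

    response: m x N list of lists, sphere j's response per energy bin i
    fluence:  N-element list, neutron fluence per energy bin
    """
    return [sum(r * phi for r, phi in zip(row, fluence)) for row in response]

# Toy two-sphere, three-bin example with made-up numbers:
R = [[0.2, 0.5, 0.1],
     [0.1, 0.3, 0.6]]
Phi = [10.0, 4.0, 2.0]
print(fold(R, Phi))  # approximately [4.2, 3.4]
```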
Once the neutron spectrum, Φ_E(E), has been obtained, the dose Δ can be calculated using the fluence-to-dose conversion coefficients δ_{Φ_E}, as shown in equation 6.
Δ = ∫_{E_min}^{E_max} δ_{Φ_E} Φ_E(E) dE    (6)
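Equation 6 discretizes in the same way; a sketch with invented per-bin conversion coefficients (the energy-bin widths are assumed to be absorbed into the per-bin fluence):

```python
def dose(conv_coeff, fluence):
    """Discretized equation 6: dose = sum_i delta[i] * Phi[i]."""
    return sum(d * phi for d, phi in zip(conv_coeff, fluence))

delta = [0.5, 1.2, 2.0]   # made-up fluence-to-dose coefficients per bin
Phi = [10.0, 4.0, 2.0]    # made-up per-bin fluence
print(dose(delta, Phi))   # approximately 13.8
```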
To deal economically with the many possible combinations, the Taguchi method can be
applied. Taguchi’s techniques have been widely used in engineering design, and can be
applied to many aspects such as optimization, experimental design, sensitivity analysis,
parameter estimation, model prediction, etc.
This work is concerned with the application of the Taguchi method to the optimization of ANN models. The integration of ANNs and Taguchi optimization provides a tool for designing robust network parameters and improving their performance. The Taguchi method offers considerable benefits in time and accuracy when compared with the conventional trial-and-error approach to neural network design.
In this work, a systematic and experimental strategy called the Robust Design of Artificial Neural Networks (RDANN) methodology was designed for the robust design of multilayer feedforward neural networks trained by the backpropagation algorithm in the neutron spectrometry field. This computer tool emphasizes the simultaneous optimization of ANN parameters under various noise conditions. Here, we compare this method with conventional training methods, drawing attention to the advantages of Taguchi methods, which offer potential benefits in evaluating network behavior.
Figure 12. Re-binned neutron spectra data set used to train the optimum ANN architecture designed with RDANN
methodology
The count rate data set was calculated by multiplying the re-binned neutron spectra by the UTA4 response matrix. The re-binned spectra and equivalent doses are the desired outputs of the ANN, and the corresponding calculated count rates are its input data.
The second challenge in neutron spectrometry by means of ANNs is the determination of the network topology. In the ANN design process, the choice of the ANN's basic parameters often determines the success of the training process. In practice, the selection of these parameters follows no rules, and their values are at best arguable. Such an approach consumes much time and does not systematically target a near-optimal selection of parameter values. ANN designers have to choose the architecture and determine many of the parameters through trial and error, which often produces ANNs with poor performance and low generalization capability, frequently wasting large amounts of time.
An easier and more efficient way to overcome this disadvantage is to use the RDANN methodology, shown in figure 13, which is a new approach to this problem.
RDANN is a very powerful method based on parallel processes, in which all the experiments are planned a priori and the results are analyzed after all the experiments are completed. It is a systematic and methodological approach to ANN design, based on the Taguchi philosophy, which maximizes the ANN performance and generalization capacity.
The integration of neural networks and optimization provides a tool for designing ANN parameters, improving the network's performance and generalization capability. The main objective of the proposed methodology is to develop accurate and robust ANN models. In other words, the goal is to select ANN training and architectural parameters so that the ANN model yields the best performance.
From figure 13 it can be seen that, in ANN design using the Taguchi philosophy within the RDANN methodology, the designer must understand the application problem well and choose a suitable ANN model. In the selected model, the design parameters, or factors, which need to be optimized must be determined (planning stage). Using OAs, simulations, i.e., trainings of ANNs with different net topologies, can be executed in a systematic way (experimentation stage). From the simulation results, the response can be analyzed using the S/N ratio of the Taguchi method (analysis stage). Finally, a confirmation experiment at the optimal design condition is conducted, calculating the robustness measure for the performance characteristic and checking whether it is close to the predicted value (confirmation stage).
To provide scientific discipline to this work, the systematic and methodological approach called the RDANN methodology was used to obtain the optimum architectural and learning values of an ANN capable of solving the neutron spectrometry problem.
Following figure 13, the steps to obtain the optimum design of the ANN are described below:
1. Planning stage
In this stage it is necessary to identify the objective function and the design and noise
variables.
(a) The objective function. The objective function must be defined according to the purpose
and requirements of the problem.
In this research, the objective function is the prediction or classification error between the target and output values of the BP ANN at the testing stage, i.e., the performance, or mean square error (MSE), output of the ANN is used as the objective function, as shown in the following equation:
MSE = √[ (1/N) Σ_{i=1}^{N} ( Φ_E(E)_i^ANN − Φ_E(E)_i^ORIGINAL )² ]    (7)
where A is the number of neurons in the first hidden layer, B is the number of neurons
in the second hidden layer, C is the momentum and D is the learning rate.
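Equation 7, a root-mean-square error over the N output bins, can be written directly; this is a sketch of the objective only, not the chapter's Matlab implementation:

```python
import math

def objective_mse(ann_spectrum, target_spectrum):
    """Objective function of equation 7: the root of the mean squared
    difference between the ANN output spectrum and the target spectrum."""
    n = len(target_spectrum)
    return math.sqrt(
        sum((a - t) ** 2 for a, t in zip(ann_spectrum, target_spectrum)) / n
    )

print(objective_mse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0
print(objective_mse([1.0, 2.0], [1.0, 4.0]))            # sqrt(2), about 1.414
```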
The noise variables are shown in table 2. In most cases, these variables are not controlled by the user. The initial set of weights, U, is usually selected randomly. For the training and testing data sets, V, the designer must decide how much of the whole data set should be allocated to the training and testing data sets. Once V is determined, the designer must decide which data of the whole data set to include in the training and testing data sets, W.
Design Var. Level 1 Level 2
U Set 1 Set 2
V 6:4 8:2
W Tr-1/Tst-1 Tr-2/Tst-2
where U is the initial set of random weights, V is the size of the training set versus the size of the testing set, i.e., V = 60%/40% or 80%/20%, and W is the selection of the training and testing sets, i.e., W = Training1/Test1 or Training2/Test2.
In practice, these variables are determined randomly and are not controlled by the designer. Because of the random nature of this selection process, the ANN designer must create these data sets starting from the whole data set. This procedure is very time consuming when done by hand without the help of technological tools.
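The generation of one noise condition (U, V, W) can be sketched as a seeded random split; the 100-sample stand-in data set and the helper below are illustrative assumptions, not the chapter's Matlab routines:

```python
import random

def make_noise_condition(data, seed, train_fraction):
    """One noise condition: the seed stands for the random draw (U),
    train_fraction is the split ratio (V), and the shuffled index
    order decides which samples land in each set (W)."""
    rng = random.Random(seed)
    indices = list(range(len(data)))
    rng.shuffle(indices)
    cut = int(train_fraction * len(data))
    train = [data[i] for i in indices[:cut]]
    test = [data[i] for i in indices[cut:]]
    return train, test

samples = list(range(100))  # stand-in for the whole spectra data set
train, test = make_noise_condition(samples, seed=1, train_fraction=0.8)
print(len(train), len(test))  # 80 20
```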
The RDANN methodology was designed to fully automate, in a computer program developed under the Matlab environment and shown in figure 14, the creation of the noise variables and their levels. This work is done before the training of the several net topologies tested at the experimentation stage.
Besides the automatic generation of the noise variables, other programming routines were created to train the different net architectures and to statistically analyze and graph the obtained data. When this procedure is done by hand, it is very time consuming; the use of the designed computer tool saves the ANN designer a great deal of time and effort.
After the factors and levels are determined, a suitable OA can be selected for the training process. The Taguchi OAs are denoted by L_r(s^c), where r is the number of rows, c is the number of columns and s is the number of levels in each column.
2. Experimentation stage. The choice of a suitable OA is critical for the success of this stage. OAs allow the main and interaction effects to be computed via a minimum number of experimental trials. In this research, the columns of the OA represent the experimental parameters to be optimized and the rows represent the individual trials, i.e., combinations of levels.
For a robust experimental design, Taguchi suggests using two crossed OAs, here with an L9(3^4) and L4(2^3) configuration, as shown in table 3.
From table 3 it can be seen that each design variable is assigned to a column of the design OA; each row of the design OA then represents a specific ANN design. Similarly, each noise variable is assigned to a column of the noise OA, and each row of the noise OA corresponds to a noise condition.
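The crossed arrays can be written down explicitly; the L9 and L4 tables below are the standard Taguchi orthogonal arrays, and the assignment of factors to columns is an illustrative assumption:

```python
# Standard L9(3^4) design array: 9 rows, one column per design factor
# (A, B, C, D), three levels coded 1-3.
L9 = [
    [1, 1, 1, 1], [1, 2, 2, 2], [1, 3, 3, 3],
    [2, 1, 2, 3], [2, 2, 3, 1], [2, 3, 1, 2],
    [3, 1, 3, 2], [3, 2, 1, 3], [3, 3, 2, 1],
]
# Standard L4(2^3) noise array: 4 rows, one column per noise factor
# (U, V, W), two levels coded 1-2.
L4 = [
    [1, 1, 1], [1, 2, 2], [2, 1, 2], [2, 2, 1],
]

# Crossing the arrays: every ANN design is trained under every noise
# condition, so each design row yields four measured responses.
experiments = [(design, noise) for design in L9 for noise in L4]
print(len(experiments))  # 9 * 4 = 36 trainings
```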
3. Analysis stage.
The S/N ratio is a measure of both the location and the dispersion of the measured responses. It transforms the raw data to allow a quantitative evaluation of the design parameters considering their mean and variation. It is measured in decibels using the formula
S/N = −10 · log10(MSD)
where MSD is a measure of the mean square deviation in performance; since in every design more signal and less noise is desired, the best design will have the highest S/N ratio. In this stage, the statistical program JMP was used to select the best values of the ANN being designed.
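A minimal sketch of the smaller-the-better form of this ratio (assuming, as the text suggests, that the measured response is the ANN's error, so that smaller responses are better):

```python
import math

def signal_to_noise(responses):
    """Smaller-the-better Taguchi S/N ratio in decibels:
    S/N = -10 * log10(MSD), where MSD is the mean squared response."""
    msd = sum(r ** 2 for r in responses) / len(responses)
    return -10.0 * math.log10(msd)

# Errors of one ANN design under four noise conditions (made up):
print(signal_to_noise([0.01, 0.02, 0.015, 0.012]))
# A design with uniformly smaller errors scores a higher S/N ratio:
print(signal_to_noise([0.1, 0.1]) > signal_to_noise([0.2, 0.2]))  # True
```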
Table 3. ANN measured responses with crossed OAs in an L9(3^4) and L4(2^3) configuration
4. Confirmation stage.
In this stage, the value of the robustness measure is predicted at the optimal design
condition; a confirmation experiment at the optimal design condition is conducted,
calculating the robustness measure for the performance characteristic and checking if
the robustness prediction is close to the predicted value.
The signal-to-noise ratio was analyzed by means of analysis of variance (ANOVA) using the statistical program JMP. Since an error of 1E−4 was established for the objective function, it can be seen from table 6 that all the ANN performances reach this value. This means that this particular OA has a good performance.
Trial No. A B C1 C2 C3 D
1 14 0 0.001 0.001 0.001 0.1
2 14 0 0.001 0.1 0.3 0.1
3 56 56 0.001 0.1 0.1 0.1
Table 7. Best values used in the confirmation stage to design the ANN
From figure 15(a) it can be seen that all the neutron spectra pass the chi-square statistical test, which demonstrates that statistically there is no difference between the neutron spectra reconstructed by the designed ANN and the target neutron spectra. Similarly, from figure 15(b) it can be seen that the whole data set of neutron spectra is near the optimum value of one, which demonstrates that this is an OA of high quality.
Figures 16 and 17 show the best and worst neutron spectra unfolded at the final testing stage of the designed ANN, compared with the target neutron spectra, along with the correlation and chi-square tests applied to each spectrum.
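The per-spectrum chi-square comparison can be sketched as follows (the spectra and the 3-degrees-of-freedom critical value at p = 0.05 are illustrative; the chapter's actual bin count and significance level may differ):

```python
def chi_square(observed, expected):
    """Pearson chi-square statistic between an unfolded spectrum and the
    target spectrum, summed over energy bins with nonzero expectation."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

target = [5.0, 8.0, 3.0, 1.0]    # made-up reference spectrum
unfolded = [5.2, 7.7, 3.1, 0.9]  # made-up ANN reconstruction
stat = chi_square(unfolded, target)
# Agreement is accepted when the statistic is below the critical value
# for (bins - 1) degrees of freedom at the chosen significance level.
print(stat < 7.815)  # True; 7.815 is the 3-dof critical value at p = 0.05
```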
In ANN design, the use of the RDANN methodology can help provide answers to the following critical design and construction issues:
• What is the proper density of training samples in the input space? The proper split of the samples was 80% for the ANN training stage and 20% for the testing stage.
• When is the best time to stop training to avoid over-fitting? The best time to stop training to avoid over-fitting is variable and depends on the proper selection of the ANN parameters. For the optimum ANN designed, the best training time that avoided over-fitting was 120 seconds on average.
• Which is the best architecture to use? The best architecture found was 7 : 14 : 31, with a learning rate of 0.1, a momentum of 0.1, the trainscg training algorithm and an mse goal of 1E−4.
• Is it better to use a large architecture and stop training optimally, or to use an optimum architecture, which probably will not over-fit the data but may require more time to train? It is better to use an optimum architecture, designed with the RDANN methodology, which does not over-fit the data and does not require more training time, than to use a large architecture and stop the training after a number of epochs or trials, which produces a poor ANN.
• If noise is present in the training data, is it better to reduce the amount of noise or to gather additional data, and what is the effect of noise in the testing data on the performance of the network? Random weight initialization introduces a large amount of noise into the training data. Such initialization can introduce large negative numbers, which is very harmful to the unfolded neutron spectra. The noise from the random weight initialization significantly affects the performance of the network on the testing data: in this case, it produced negative values in the unfolded neutron spectra, which have no physical meaning. Consequently, it can be concluded that it is necessary to reduce the noise introduced by the random weight initialization.
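One common way to keep this initialization noise small, sketched here as an assumption rather than the chapter's actual remedy, is to draw the initial weights from a small symmetric range:

```python
import random

def init_weights(n_in, n_out, scale=0.1, seed=0):
    """Bounded random initialization: weights drawn uniformly from
    [-scale, scale], avoiding the large negative initial values that
    were found harmful to the unfolded spectra."""
    rng = random.Random(seed)
    return [[rng.uniform(-scale, scale) for _ in range(n_in)]
            for _ in range(n_out)]

w = init_weights(7, 14)  # e.g., the 7-input, 14-neuron first hidden layer
flat = [v for row in w for v in row]
print(min(flat) >= -0.1 and max(flat) <= 0.1)  # True: every weight bounded
```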
4. Conclusions
ANNs are a theory still under development; their true potential has not yet been reached. Although researchers have developed powerful learning algorithms of great practical value, the representations and procedures used by the brain are still unknown. The integration of ANNs and optimization provides a tool for designing neural network parameters and improving network performance. In this work, a systematic, methodological and experimental approach called the RDANN methodology was introduced to obtain the optimum design of artificial neural networks. The Taguchi method is the main technique used to simplify the optimization problem.
The RDANN methodology was applied with success in the nuclear sciences to solve the neutron spectra unfolding problem. The factors found to be significant in the case study were the numbers of hidden neurons in hidden layers 1 and 2, the learning rate and the momentum term. The near-optimum ANN topology was 7:14:31, with a momentum of 0.1, a learning rate of 0.1, an mse of 1E−4 and the “trainscg” learning function. The optimal net architecture was designed in a short time and has high performance and generalization capability.
The proposed systematic and experimental approach is a useful alternative for the robust design of ANNs. It offers a convenient way of simultaneously considering design and noise variables, and it incorporates the concept of robustness into the ANN design process. The computer program developed to implement the experimental and confirmation stages of the RDANN methodology significantly reduces the time required to prepare, process and present the information to the designer in an appropriate way during the search for the optimal net topology. This gives the researcher more time to solve the problem in which he is interested.
The results show that the RDANN methodology can be used to find better settings of ANNs, which not only result in minimum error but also significantly reduce the training time and effort in the modeling phases.
The optimum settings of ANN parameters are largely problem-dependent. Ideally, an optimization process should be performed for each ANN application, as the significant factors might differ for ANNs trained for different purposes.
Compared with the trial-and-error approach, which can take from several days to months testing different ANN architectures and parameters and may still lead to a poor overall ANN design, the RDANN methodology significantly reduces the time spent in determining the optimum ANN architecture. With RDANN it takes from minutes to a couple of hours to determine the best and most robust ANN architectural and learning parameters, allowing researchers more time to solve the problem in question.
Acknowledgements
This work was partially supported by Fondos Mixtos CONACYT - Gobierno del Estado de
Zacatecas (México) under contract ZAC-2011-C01-168387.
This work was partially supported by PROMEP under contract PROMEP/103.5/12/3603.
Author details
José Manuel Ortiz-Rodríguez1,⋆ ,
Ma. del Rosario Martínez-Blanco2 ,
José Manuel Cervantes Viramontes1 and
Héctor René Vega-Carrillo2
⋆ Address all correspondence to: morvymm@yahoo.com.mx
Universidad Autónoma de Zacatecas, Unidades Académicas, 1-Ingeniería Eléctrica,
2-Estudios Nucleares, México
5. References
[1] C.R. Alavala. Fuzzy logic and neural networks basic concepts & applications. New Age
International Publishers, 1996.
[2] J. Lakhmi and A. M. Fanelli. Recent advances in artificial neural networks design and
applications. CRC Press, 2000.
[5] B. Apolloni, S. Bassis, and M. Marinaro. New directions in neural networks. IOS Press,
2009.
[6] J. Zupan. Introduction to artificial neural network methods: what they are and how to use them. Acta Chimica Slovenica, 41(3):327–352, 1994.
[7] A. K. Jain, J. Mao, and K. M. Mohiuddin. Artificial neural networks: a tutorial. IEEE Computer, 29(3):31–44, 1996.
[8] T. Y. Lin and C. H. Tseng. Optimum design for artificial neural networks: an example in a bicycle derailleur system. Engineering Applications of Artificial Intelligence, 13:3–14, 2000.
[9] M.M. Gupta, L. Jin, and N. Homma. Static and dynamic neural networks: from fundamentals
to advanced theory. 2003.
[11] M. Kishan, K. Chilukuri, and R. Sanjay. Elements of artificial neural networks. The MIT
Press, 2000.
[14] J.A. Frenie and A. Jiju. Teaching the Taguchi method to industrial engineers. MCB University Press, 50(4):141–149, 2001.
[16] G. E. Peterson, D. C. St. Clair, S. R. Aylward, and W. E. Bond. Using Taguchi's method of experimental design to control errors in layered perceptrons. IEEE Transactions on Neural Networks, 6(4):949–961, 1995.
[17] Y.K. Singh. Fundamentals of research methodology and statistics. New Age International Publishers, 2006.
[19] T.T. Soong. Fundamentals of probability and statistics for engineers. John Wiley & Sons, Inc.,
2004.
[22] M.A. Arbib. Brain theory and neural networks. The MIT Press, 2003.
[23] M. H. Beale, M. T Hagan, and H. B. Demuth. Neural networks toolbox, user’s guide.
Mathworks, 1992. www.mathworks.com/help/pdf_doc/nnet/nnet.pdf.
[24] N. K. Kasabov. Foundations of neural networks, fuzzy systems, and knowledge engineering. MIT Press, 1998.
[32] R.R. Roy and B.P. Nigam. Nuclear physics, theory and experiment. John Wiley & Sons, Inc.,
1967.
[33] R.N. Cahn and G. Goldhaber. The experimental foundations of particle physics. Cambridge University Press, 2009.
[34] R.L. Murray. Nuclear energy, an introduction to the concepts, systems, and applications of
nuclear processes. World Scientific, 2000.
[36] B.R.L. Siebert, J.C. McDonald, and W.G. Alberts. Neutron spectrometry for
radiation protection purposes. Nuclear Instruments and Methods in Physics Research A,
476(1-2):347–352, 2002.
[38] F. D. Brooks and H. Klein. Neutron spectrometry, historical review and present status. Nuclear Instruments and Methods in Physics Research A, 476:1–11, 2002.
[39] M. Reginatto. What can we learn about the spectrum of high-energy stray neutron fields from Bonner sphere measurements? Radiation Measurements, 44:692–699, 2009.
[40] M. Awschalom and R.S. Sanna. Applications of Bonner sphere detectors in neutron field dosimetry. Radiation Protection Dosimetry, 10(1-4):89–101, 1985.
[41] V. Vylet. Response matrix of an extended Bonner sphere system. Nuclear Instruments and Methods in Physics Research A, 476:26–30, 2002.
[45] R.B. Murray. Use of 6LiI(Eu) as a scintillation detector and spectrometer for fast
neutrons. Nuclear Instruments, 2:237–248, 1957.
[49] IAEA. Compendium of neutron spectra and detector responses for radiation protection
purposes. Technical Report 403, 2001.
[50] M. P. Iñiguez de la Torre and H. R. Vega Carrillo. Catalogue to select the initial guess spectrum during unfolding. Nuclear Instruments and Methods in Physics Research A, 476(1):270–273, 2002.
Section 2
Applications
Chapter 5
Comparison Between an
Artificial Neural Network and Logistic Regression in
Predicting Long Term Kidney Transplantation Outcome
http://dx.doi.org/10.5772/53104
1. Introduction
Predicting clinical outcome following a specific treatment is a challenge that sees physicians and researchers alike sharing the dream of a crystal ball to read into the future. In Medicine, several tools have been developed for the prediction of outcomes following drug treatment and other medical interventions. The standard approach for a binary outcome is to use logistic regression (LR) [ , ], but over the past few years artificial neural networks (ANNs) have become an increasingly popular alternative to LR analysis for prognostic and diagnostic classification in clinical medicine [ ]. The growing interest in ANNs has mainly been triggered by their ability to mimic the learning processes of the human brain. The network operates in a feed-forward mode from the input layer through the hidden layers to the output layer. Exactly what interactions are modeled in the hidden layers is still under study. Each layer within the network is made up of computing nodes with remarkable data processing abilities. Each node is connected to other nodes of a previous layer through adaptable inter-neuron connection strengths known as synaptic weights. ANNs are trained for specific applications through a learning process and knowledge is usually retained as a set of connection weights [ ]. The backpropagation algorithm and its variants are learning algorithms that are widely used in neural networks. With backpropagation, the input data is repeatedly presented to the network. Each time, the output is compared to the desired output and an error is computed. The error is then fed back through the network and used to adjust the weights in such a way that with each iteration it gradually declines until the neural model produces the desired output.
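The loop described above can be illustrated with a single logistic neuron; this toy Python sketch (not the networks discussed in this chapter) learns a simple threshold by repeatedly backpropagating the output error:

```python
import math

def train(samples, epochs=2000, lr=0.5):
    """Gradient-descent training of one logistic neuron."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, target in samples:
            out = 1.0 / (1.0 + math.exp(-(w * x + b)))  # forward pass
            err = out - target                          # output error
            grad = err * out * (1.0 - out)              # backpropagated term
            w -= lr * grad * x                          # weight update
            b -= lr * grad                              # bias update
    return w, b

# Desired behaviour: output near 1 for positive inputs, near 0 otherwise.
data = [(-2.0, 0.0), (-1.0, 0.0), (1.0, 1.0), (2.0, 1.0)]
w, b = train(data)
out = 1.0 / (1.0 + math.exp(-(w * 1.5 + b)))
print(out > 0.5)  # True
```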
© 2013 Caocci et al.; licensee InTech. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
ANNs have been successfully applied in the fields of mathematics, engineering, medicine, economics, meteorology, psychology, neurology, and many others. Indeed, in medicine, they offer a tantalizing alternative to multivariate analysis, although their role remains advisory, since no convincing evidence of any real progress in clinical prognosis has yet been produced [ ].
In the field of nephrology, there are very few reports on the use of ANNs [ - ], most of which describe their ability to individuate predictive factors of technique survival in peritoneal dialysis patients, as well as their application to the prescription and monitoring of hemodialysis therapy, the analysis of factors influencing therapeutic efficacy in idiopathic membranous nephropathy, the prediction of survival after radical cystectomy for invasive bladder carcinoma and the individual risk of progression to end-stage renal failure in chronic nephropathies.
All this led up to the intriguing challenge of discovering whether ANNs were capable of predicting the outcome of kidney transplantation after analyzing a series of clinical and immunogenetic variables.
Figure 1. The prediction of kidney allograft outcome.... a dream about to come true?
Human Leukocyte Antigen G (HLA-G) represents a non-classic HLA class I molecule, highly expressed in trophoblast cells [ ]. HLA-G plays a key role in embryo implantation and pregnancy by contributing to maternal immune tolerance of the fetus and, more specifically, by protecting trophoblast cells from maternal natural killer (NK) cells through interaction with their inhibitory KIR receptors. It has also been shown that HLA-G expression by tumoral cells can contribute to an escape mechanism, inducing NK tolerance toward cancer cells in ovarian and breast carcinomas, melanoma, acute myeloid leukemia, acute lymphoblastic leukemia and B-cell chronic lymphocytic leukemia [ ]. Additionally, it would seem that HLA-G molecules have a role in graft tolerance following hematopoietic stem cell transplantation. These molecules exert their immunotolerogenic function towards the main effector cells involved in graft rejection through inhibition of NK and cytotoxic T lymphocyte (CTL)-mediated cytolysis and CD+ T-cell alloproliferation [ ].
The HLA-G transcript generates alternative messenger ribonucleic acids (mRNAs) that encode membrane-bound (HLA-G, G, G, G) and soluble (HLA-G, G, G) protein isoforms. Moreover, HLA-G allelic variants are characterized by a -basepair (bp) deletion-insertion polymorphism located in the untranslated region (UTR) of HLA-G. The presence of the -bp insertion is known to generate an additional splice whereby bases are removed from the UTR [ ]. HLA-G mRNAs having the -base deletion are more stable than the complete mRNA forms, and thus determine an increment in HLA-G expression. Therefore, the -bp polymorphism is involved in the mechanisms controlling post-transcriptional regulation of HLA-G molecules.
“ θruθial role has ηeen attriηuted to the aηility of these moleθules to preserve graft funθtion
from the insults θaused ηy reθipient alloreaθtive NK θells and θytotoxiθ T lymphoθytes
CTL . [ ] This is well supported ηy the numerous studies demonstrating that high HL“-G
plasma θonθentrations in heart, liver or kidney transplant patients is assoθiated with ηetter
graft survival [ - ].
Reθent studies of assoθiation ηetween the HL“-G + -ηp /− -ηp polymorphism and the
outθome of kidney transplantation have provided interesting, though not always θonθord‐
ant results [ - ].
In one cohort, a total of patients ( %) lost graft function. The patients were divided
into groups according to the presence or absence of HLA-G alleles exhibiting the
14-bp insertion polymorphism. The first group included patients ( %) with either
HLA-G +14-bp/+14-bp or HLA-G −14-bp/+14-bp, whereas the second group included
homozygotes ( %) for the HLA-G −14-bp polymorphism. The patients had a median age
of years (range - ) and were prevalently males ( %). The donors had a median
age of years (range - ). Nearly all patients ( %) had been given a cadaver donor
kidney transplant, and for most of them ( %) it was their first transplant. The
average (±SD) number of mismatches was ± antigens for HLA Class I and ±
antigens for HLA Class II. The average (±SD) cold ischemia time (CIT) was ± hours.
The percentage of patients hyperimmunized against HLA Class I and II antigens (PRA >
%) was higher in the group of homozygotes for the HLA-G 14-bp deletion. Pre-transplantation
serum levels of interleukin (IL- ) were lower in the group of homozygotes
for the 14-bp deletion.
Kidney transplant outcome was evaluated by glomerular filtration rate (GFR), serum
creatinine and graft function tests. At one year after transplantation, a stronger progressive
decline of the estimated GFR, computed with the abbreviated Modification of Diet in Renal
Disease (MDRD) study equation, was observed in the group of homozygotes for the HLA-G
14-bp deletion in comparison with the group of heterozygotes for the 14-bp insertion.
This difference between the groups became statistically significant at two years
( ml/min/1.73 m², P< , % CI - ) and continued to rise at ( ml/min/1.73 m²,
P< , % CI - ) and ( ml/min/1.73 m², P< , % CI – ) years
after transplantation.
Comparison Be“ween an Ar“ificial Ne”ral Ne“work and Logis“ic Regression in Predic“ing Long Term Kidney 119
Transplan“a“ion O”“come
h““p://dx.doi.org/10.5772/53104
ANNs have different architectures, which consequently require different types of algorithms.
The multilayer perceptron is the most popular network architecture in use today
(Figure ). This type of network requires a desired output in order to learn. The network
is trained with historical data so that it can produce the correct output when the output
is unknown. Until the network is appropriately trained, its responses will be random.
Finding an appropriate architecture requires trial and error, and this is where
backpropagation steps in. Each single neuron is connected to the neurons of the previous
layer through adaptable synaptic weights. By adjusting the strengths of these connections,
ANNs can approximate a function that computes the proper output for a given input
pattern. The training data set includes a number of cases, each containing values for a
range of well-matched input and output variables. Once the input is propagated to the
output neuron, this neuron compares its activation with the expected training output.
The difference is treated as the error of the network, which is then backpropagated
through the layers, from the output to the input layer, and the weights of each layer are
adjusted such that with each backpropagation cycle the network gets closer and closer to
producing the desired output [ ]. We used the Neural Network Toolbox™ of the
software Matlab® (MathWorks, Inc.) to develop a three-layer feed-forward
neural network [ ]. The input layer of neurons was represented by the previously
listed clinical and immunogenetic parameters. These input data were then processed
in the hidden layer neurons. The output neuron predicted a number between 0 and 1
(goal), representing the event kidney rejection yes (1) or kidney rejection no (0),
respectively. For the training procedure, we applied the 'on-line back-propagation' method
on data sets of patients previously analyzed by LR. The test phases utilized 63
patients randomly extracted from the entire cohort and not used in the training
phase. Mean sensitivity (the ability of predicting rejection) and specificity (the ability of
predicting no-rejection) of the data sets were determined and compared to LR (Table 1).
Dataset (Test, N=63)   Rejection    N    LR correct, n (%)   ANN correct, n (%)
Extraction_1           No          55    40 (73)             48 (87)
Extraction_1           Yes          8     2 (25)              4 (50)
Extraction_2           No          55    38 (69)             48 (87)
Extraction_2           Yes          8     3 (38)              4 (50)
Extraction_3           No          55    30 (55)             48 (87)
Extraction_3           Yes          8     3 (38)              5 (63)
Extraction_4           No          55    40 (73)             49 (89)
Extraction_4           Yes          8     3 (38)              5 (63)
Extraction_5           No          55    40 (73)             46 (84)
Extraction_5           Yes          8     4 (50)              6 (75)
Extraction_6           No          55    30 (55)             34 (62)
Extraction_6           Yes          8     4 (50)              6 (75)
Extraction_7           No          55    40 (73)             47 (85)
Extraction_7           Yes          8     3 (38)              5 (63)
Extraction_8           No          55    38 (69)             46 (84)
Extraction_8           Yes          8     4 (50)              5 (63)
Extraction_9           No          55    44 (80)             51 (93)
Extraction_9           Yes          8     2 (25)              4 (50)
Extraction_10          No          55    32 (58)             52 (95)
Extraction_10          Yes          8     2 (25)              5 (63)
Table 1. Sensi“ivi“y and specifici“y of Logis“ic Regression and an Ar“ificial Ne”ral Ne“work in “he predic“ion of Kidney
rejec“ion in 10 “raining and valida“ing da“ase“s of kidney “ransplan“ recipien“s
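The three-layer feed-forward network with on-line back-propagation described above can be sketched as follows. This is an illustrative reimplementation in Python, not the chapter's MATLAB Neural Network Toolbox code; the layer sizes, learning rate, epoch count and the synthetic training data are all assumptions made for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

n_in, n_hidden = 10, 5          # assumed sizes: input parameters, hidden neurons
W1 = rng.normal(0, 0.5, (n_hidden, n_in))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0, 0.5, n_hidden)
b2 = 0.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    h = sigmoid(W1 @ x + b1)        # hidden-layer activations
    y = sigmoid(W2 @ h + b2)        # single output in (0, 1): rejection yes/no goal
    return h, y

def train_online(X, t, eta=0.5, epochs=200):
    """On-line back-propagation: weights are updated after every pattern."""
    global W1, b1, W2, b2
    for _ in range(epochs):
        for x, target in zip(X, t):
            h, y = forward(x)
            delta_out = (y - target) * y * (1 - y)    # output local gradient
            delta_hid = delta_out * W2 * h * (1 - h)  # error backpropagated to hidden layer
            W2 -= eta * delta_out * h
            b2 -= eta * delta_out
            W1 -= eta * np.outer(delta_hid, x)
            b1 -= eta * delta_hid

# Synthetic stand-in for the real training set (patient parameters + outcome)
X = rng.normal(size=(40, n_in))
t = (X[:, 0] + X[:, 1] > 0).astype(float)  # fabricated target rule, illustration only
train_online(X, t)
preds = np.array([forward(x)[1] for x in X])
accuracy = np.mean((preds > 0.5) == t)
```

On the synthetic data the network quickly learns the fabricated rule; with real clinical inputs the same loop would be driven by the measured rejection outcomes.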
ANNs can be considered a useful supportive tool in the prediction of kidney rejection
following transplantation. The decision to perform the analyses in this particular clinical
setting was motivated by the importance of optimizing transplantation parameters and
modifying factors related to the recipient, donor and transplant procedure. Another
motivation was the need for a simple prognostic tool capable of analyzing the relatively
large number of immunogenetic and other variables that have been shown to influence the
outcome of transplantation. When comparing the prognostic performance of LR to ANN,
the ability of predicting kidney rejection (sensitivity) was % for LR versus % for ANN.
The ability of predicting no-rejection (specificity) was % for LR compared to % for ANN.
The advantage of ANNs over LR can theoretically be explained by their ability to evaluate
complex nonlinear relations among variables. By contrast, ANNs have been faulted for
being unable to assess the relative importance of the single variables, while LR determines a
relative risk for each variable. In many ways, these two approaches are complementary, and
their combined use should considerably improve the clinical decision-making process and
the prognosis of kidney transplantation.
Acknowledgement
Author details
Giovanni Caocci, Roberto Baccoli, Roberto Littera, Sandro Orrù, Carlo Carcassi and
Giorgio La Nasa
References
[ ] Harrell FE, Lee KL, Mark DB. Multivariable prognostic models: issues in developing
models, evaluating assumptions and adequacy, and measuring and reducing errors.
Stat Med.
[ ] Linder R, König IR, Weimar C, Diener HC, Pöppl SJ, Ziegler A. Two models for
outcome prediction - a comparison of logistic regression and neural networks.
Methods Inf Med.
[ ] Brier ME, Ray PC, Klein JB. Prediction of delayed renal allograft function using an
artificial neural network. Nephrol Dial Transplant.
[ ] Tang H, Poynton MR, Hurdle JF, Baird BC, Koford JK, Goldfarb-Rumyantzev AS.
Predicting three-year kidney graft survival in recipients with systemic lupus
erythematosus. ASAIO J.
[ ] Kazi JI, Furness PN, Nicholson M. Diagnosis of early acute renal allograft rejection
by evaluation of multiple histological features using a Bayesian belief network. J Clin
Pathol.
[ ] Furness PN. Advances in the diagnosis of renal transplant rejection. Curr. Diag.
Pathol.
[ ] Rush DN, Henry SF, Jeffery JR, Schroeder TJ, Gough J. Histological findings in early
routine biopsies of stable renal allograft recipients. Transplantation.
[ ] Kovats S, Main EK, Librach C, Stubblebine M, Fisher SJ, DeMars R. A class I antigen,
HLA-G, expressed in human trophoblasts. Science.
[ ] Qiu J, Terasaki PI, Miller J, Mizutani K, Cai J, Carosella ED. Soluble HLA-G
expression and renal graft acceptance. Am J Transplant.
[ ] Crispim JC, Duarte RA, Soares CP, Costa R, Silva JS, Mendes-Júnior CT, Wastowski
IJ, Faggioni LP, Saber LT, Donadi EA. Human leukocyte antigen-G expression after
kidney transplantation is associated with a reduced incidence of rejection. Transpl
Immunol.
Edge Detection in Biomedical Images Using Self-Organizing Maps
h““p://dx.doi.org/10.5772/51468
1. Introduction
The application of self-organizing maps (SOMs) to edge detection in biomedical images
is discussed. The SOM algorithm has been implemented in MATLAB with various optional
parameters enabling adjustment of the model according to the user's requirements. For
easier application of the SOM, a graphical user interface has been developed.
The edge detection procedure is a critical step in the analysis of biomedical images, enabling
for instance the detection of abnormal structures or the recognition of different types
of tissue. The self-organizing map provides a quick and easy approach to edge detection
tasks with satisfying quality of outputs, which has been verified using high-resolution
computed tomography images capturing the expressions of Granulomatosis with
polyangiitis. The obtained results have been discussed with an expert as well.
2. Self-organizing map
2.1. Self-organizing map in edge detection performance
The self-organizing map (SOM) [5, 9] is a widely applied approach for clustering and pattern
recognition that can be used in many stages of image processing, e.g. in color image
segmentation [18], generation of a global ordering of spectral vectors [26], image compression
[25], document binarisation [4], etc.
Edge detection approaches based on SOMs are not extensively used. Nevertheless,
there are some examples of SOM utilization in edge detection, e.g. texture edge detection
[27], edge detection by contours [13], or edge detection performed in combination with a
conventional edge detector [24] and methods of image de-noising [8].
In our case, the SOM has been utilized in the edge detection process in order to reduce the
number of image intensity levels.
© 2013 Gráfová e“ al.; licensee InTech. This is an open access ar“icle dis“rib”“ed ”nder “he “erms of “he
Crea“ive Commons A““rib”“ion License (h““p://crea“ivecommons.org/licenses/by/3.0), which permi“s
”nres“ric“ed ”se, dis“rib”“ion, and reprod”c“ion in any medi”m, provided “he original work is properly ci“ed.
The input layer (size N×1) represents input data x1, x2, ..., xM (M inputs, each input
N-dimensional). The output layer (size K×L), which may have a linear or 2D arrangement,
represents the clusters into which the input data will be grouped. Each neuron of the input
layer is connected with all neurons of the output layer through the weights W (the size of
the weight matrix is K×L×N).
SOM can be trained in either recursive or batch mode. In recursive mode, the weights of
the winning neurons are updated after each submission of an input vector, whereas in batch
mode, the weight adjustment for each neuron is made after the entire batch of inputs has
been processed, i. e. at the end of an epoch.
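The recursive (per-sample) training mode just described can be sketched as follows. The grid size, learning-rate and neighbourhood schedules, and the random data are assumed example values for illustration, not those of the chapter's MATLAB implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
K, L, N = 4, 4, 3                  # output grid K x L, N-dimensional inputs
W = rng.random((K, L, N))          # weight matrix of size K x L x N
grid = np.stack(np.meshgrid(np.arange(K), np.arange(L), indexing="ij"), axis=-1)

def qerror(W, X):
    """Mean distance of each input to its nearest weight vector."""
    return float(np.mean([np.min(np.linalg.norm(W - x, axis=-1)) for x in X]))

def recursive_step(x, lam, sigma):
    """Update the weights immediately after a single input vector (recursive mode)."""
    global W
    d = np.linalg.norm(W - x, axis=-1)                # distance of x to every neuron
    winner = np.unravel_index(np.argmin(d), d.shape)  # competition: nearest neuron wins
    gdist2 = np.sum((grid - np.array(winner)) ** 2, axis=-1)
    phi = np.exp(-gdist2 / (2 * sigma ** 2))          # neighbourhood strength
    W += lam * phi[..., None] * (x - W)               # shift winner and its neighbours

X = rng.random((200, N))
q_init = qerror(W, X)
for epoch in range(20):                 # batch mode would instead accumulate the
    lam = 0.5 * (1 - epoch / 20)        # updates and apply them once per epoch
    sigma = 2.0 * (0.8 ** epoch)        # shrinking neighbourhood (convergence phase)
    for x in X:
        recursive_step(x, lam, sigma)
q_final = qerror(W, X)
```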
The weights adapt during the learning process based on a competition, i.e. the neuron of
the output layer nearest (most similar) to the submitted input vector becomes the winner,
and its weight vector and the weight vectors of its neighbouring neurons are adjusted
according to

w_j(t+1) = w_j(t) + λ φ_s (x_i − w_j(t)), (1)

where W is the weight matrix (w_j its j-th weight vector), x_i the submitted input vector,
λ the learning parameter determining the strength of the learning, and φ_s the
neighbourhood strength parameter determining how the weight adjustment decays with
distance from the winner neuron (it depends on s, the value of the neighbourhood size
parameter).
The learning process can be divided into two phases: ordering and convergence. In
the ordering phase, the topological ordering of the weight vectors is established by reducing
the learning rate and the neighbourhood size with iterations. In the convergence phase,
the SOM is fine-tuned with the shrunk neighbourhood and a constant learning rate [23].
1. No decay

λ(t) = λ0, (2)

2. Linear decay

λ(t) = λ0 (1 − t/τ), (3)

3. Gaussian decay

λ(t) = λ0 exp(−t² / (2τ²)), (4)

4. Exponential decay

λ(t) = λ0 exp(−t/τ), (5)

where τ is a decay constant derived from T, the total number of iterations, and λ0 and λ(t)
are the initial learning rate and that at iteration t, respectively. The learning parameter
should be in the interval ⟨0.01, 1⟩.
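Under the assumption that the decay constant τ is tied to the total number of iterations, the four schedules of Equations (2)–(5) can be written directly as:

```python
import math

# Learning-rate decay schedules, Equations (2)-(5)
def lr_none(t, lam0, tau):        return lam0                                  # (2)
def lr_linear(t, lam0, tau):      return lam0 * (1.0 - t / tau)                # (3)
def lr_gaussian(t, lam0, tau):    return lam0 * math.exp(-t**2 / (2 * tau**2)) # (4)
def lr_exponential(t, lam0, tau): return lam0 * math.exp(-t / tau)             # (5)

lam0, tau = 1.0, 100.0
schedule = [lr_exponential(t, lam0, tau) for t in range(0, 101, 25)]
```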
2.3.2. Neighbourhood
In the learning process, not only the winner but also the neighbouring neurons of the winner
learn, i.e. adjust their weights. All neighbour weight vectors are shifted towards
the submitted input vector; however, the update of the winning neuron is the most
pronounced, and the farther away a neighbouring neuron is, the less its weight is updated.
This procedure of weight adjustment produces topology preservation.
There are several ways to define a neighbourhood (some of them are depicted in Figure 3).
The initial value of the neighbourhood size can be up to the size of the output layer; the final
value of the neighbourhood size must not be less than 1. The neighbourhood strength parameter,
determining how the weight adjustment of the neighbouring neurons decays with distance
from the winner, is usually reduced during the learning process (as is the learning
parameter; analogues of Equations 2–5 and Figure 2). It decays from its initial value to
its final value, which can be reached already during the learning process, not only at its
end. The neighbourhood strength parameter should be in the interval ⟨0.01, 1⟩. Figure 4
depicts one possible development of the neighbourhood size and strength parameters
during the learning process.
2.3.3. Weights
The resulting weight vectors of the output-layer neurons, obtained at the end of
the learning process, represent the centers of the clusters. The resulting patterns of the weight
vectors may depend on the type of weight initialization. There are several ways to
initialize the weight vectors; some of them are depicted in Figure 5.
1. Euclidean distance

d_j = sqrt( Σ_{i=1}^{N} (x_i − w_ji)² ), (6)

2. Correlation

d_j = Σ_{i=1}^{N} (x_i − x̄)(w_ji − w̄_j) / (σ_x σ_w_j), (7)

3. Direction cosine

d_j = Σ_{i=1}^{N} x_i w_ji / (‖x‖ ‖w_j‖), (8)

4. Block distance

d_j = Σ_{i=1}^{N} |x_i − w_ji|, (9)

where x_i is the i-th component of the input vector, w_ji the i-th component of the j-th
weight vector, N the dimension of the input and weight vectors, x̄ the mean value of the
input vector x, w̄_j the mean value of the weight vector w_j, σ_x the standard deviation
of the input vector x, σ_w_j the standard deviation of the weight vector w_j, ‖x‖ the norm
(length) of the input vector x and ‖w_j‖ the norm of the weight vector w_j.
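A minimal NumPy rendering of the four measures (6)–(9), kept term-by-term with the equations above (note that Equation (7), as written, is not normalized by N, so the correlation of a vector with itself equals N rather than 1):

```python
import numpy as np

# Distance/similarity measures between an input vector x and a weight vector w
def euclidean(x, w):                                  # Eq. (6)
    return np.sqrt(np.sum((x - w) ** 2))

def correlation(x, w):                                # Eq. (7), unnormalized by N
    return np.sum((x - x.mean()) * (w - w.mean())) / (x.std() * w.std())

def direction_cosine(x, w):                           # Eq. (8)
    return np.dot(x, w) / (np.linalg.norm(x) * np.linalg.norm(w))

def block_distance(x, w):                             # Eq. (9)
    return np.sum(np.abs(x - w))

x = np.array([1.0, 2.0, 3.0])
w = np.array([1.0, 2.0, 4.0])
```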
Figure 2. Learning rate decay function (dependence of the value of the learning parameter on the number of iterations): (a) No
decay, (b) Linear decay, (c) Gaussian decay, (d) Exponential decay
Figure 3. Types of neighbourhood: (a) Linear arrangements, (b) Square arrangements, (c) Hexagonal arrangements
Figure 4. Neighbourhood size decay function (dependence of the neighbourhood size on the number of iterations) and
neighbourhood strength decay function (dependence of the value of the neighbourhood strength on the distance from
the winner)
D = Σ_{i=1}^{k} Σ_{n∈c_i} (x_n − w_i)², (10)

where x_n is the n-th input vector belonging to cluster c_i, whose center is represented by
w_i (i.e. the weight vector of the winning neuron representing cluster c_i).
The weight adjustment corresponding to the smallest value of the learning progress
criterion is taken as the result of the SOM learning process, see Figure 6. These weights
represent the cluster centers.
For the best result, the SOM should be run several times with various settings of the SOM
parameters, to avoid local minima and to find the global optimum on the error surface.
2.3.6. Errors
The errors of the trained SOM can be evaluated according to

E = (1/M) Σ_{i=1}^{k} Σ_{n∈c_i} (x_n − w_i)², (11)

E = (1/k) Σ_{i=1}^{k} (1/M_i) Σ_{n∈c_i} (x_n − w_i)², (12)

E_i = Σ_{n∈c_i} (x_n − w_i)², (13)

E_i = (1/M_i) Σ_{n∈c_i} (x_n − w_i)², (14)

where x_n is the n-th input vector belonging to cluster c_i whose center is represented by
w_i (i.e. the weight vector of the winning neuron representing cluster c_i), M is the number
of input vectors, M_i is the number of input vectors belonging to the i-th cluster and k is
the number of clusters.
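The error measures (11)–(14) can be computed from the trained weight vectors and the hard assignments of inputs to their nearest cluster centers; the following is an illustrative sketch, not the chapter's MATLAB code:

```python
import numpy as np

def som_errors(X, W):
    """Errors (11)-(14) for inputs X (M x N) and cluster centers W (k x N)."""
    d = np.linalg.norm(X[:, None, :] - W[None, :, :], axis=-1)  # M x k distances
    assign = np.argmin(d, axis=1)                               # nearest center per input
    M, k = len(X), len(W)
    Ei = np.array([np.sum((X[assign == i] - W[i]) ** 2) for i in range(k)])   # Eq. (13)
    Mi = np.array([np.sum(assign == i) for i in range(k)])
    Ei_mean = np.divide(Ei, Mi, out=np.zeros(k), where=Mi > 0)                # Eq. (14)
    E_total = Ei.sum() / M                                                    # Eq. (11)
    E_avg = Ei_mean.sum() / k                                                 # Eq. (12)
    return E_total, E_avg, Ei, Ei_mean

X = np.array([[0.0, 0.0], [0.1, 0.1], [5.0, 5.0]])
W = np.array([[0.0, 0.0], [5.0, 5.0]])
E_total, E_avg, Ei, Ei_mean = som_errors(X, W)
```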
For more information about the trained SOM, the distribution of the input vectors in
the clusters and the errors of the clusters can be visualized, see Figure 7.
3. Edge detection
Edge detection techniques are commonly used in image processing, above all for feature
detection, feature extraction and segmentation.
The aim of the edge detection process is to detect the object boundaries based on abrupt
changes in the image tones, i.e. to detect discontinuities in either the image intensity or
the derivatives of the image intensity. In practice, the derivatives are usually approximated
by convolution with a proper mask. Moreover, for simplification of the derivative
calculation, the edges are usually detected only in two or four directions.
The essential step in the edge detection process is thresholding, i.e. determination of the
threshold limit, a dividing value used to evaluate the edge detector response as either edge
or non-edge. Due to the thresholding, the resulting image of the edge detection process is
comprised only of edge (white) and non-edge (black) pixels. The quality of the threshold
setting has an impact on the quality of the whole edge detection process: an exceedingly
small threshold value leads to noise being labelled as edges, while an exceedingly large
value leads to the omission of some significant edges.
G = sqrt(Gx² + Gy²), (15)

where G is the edge gradient, and Gx and Gy are the values of the first derivative in the
horizontal and in the vertical direction, respectively.
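Equation (15), combined with the thresholding step described above, can be sketched as follows; the first-difference masks and the threshold value are illustrative assumptions, not the chapter's exact detector:

```python
import numpy as np

def edge_map(img, threshold=0.5):
    """Gradient magnitude of Eq. (15) followed by thresholding to a binary edge image."""
    gx = np.zeros_like(img, dtype=float)
    gy = np.zeros_like(img, dtype=float)
    gx[:, 1:] = img[:, 1:] - img[:, :-1]      # horizontal first difference (Gx)
    gy[1:, :] = img[1:, :] - img[:-1, :]      # vertical first difference (Gy)
    G = np.sqrt(gx ** 2 + gy ** 2)            # Eq. (15)
    return (G > threshold).astype(np.uint8)   # 1 = edge (white), 0 = non-edge (black)

img = np.zeros((5, 5))
img[:, 3:] = 1.0                              # a vertical step edge
edges = edge_map(img)
```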
The ability of the SOM for edge detection in biomedical images has been tested using
high-resolution computed tomography (CT) images capturing the expressions of the
Granulomatosis with polyangiitis disease.
Figure 5. Weight vectors initialization: (a) Random small numbers, (b) Vectors near the center of mass of inputs, (c) Some of
the input vectors are randomly chosen as the initial weight vectors
P1 P2 P3
P4 P5 P6
P7 P8 P9
Table 1. The image mask used for the preparation of the set of the input vectors.
The mask (9 adjacent pixels) was moved over the whole image, pixel by pixel, row-wise, until
the whole image was scanned. From each location of the scanning mask in the image a single input vector was formed,
i.e. x = (P1, P2, P3, P4, P5, P6, P7, P8, P9), where P1–P9 denote the intensity values of the image pixels. The (binary) output
value of the SOM for each input vector replaced the intensity value at the position of the pixel with original intensity value P5.
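The transformation of the image into the set of 9-dimensional input vectors can be sketched as follows (boundary pixels where the mask does not fit are simply skipped here; how the chapter's software handles the image border is not stated):

```python
import numpy as np

def image_to_vectors(img):
    """Slide a 3x3 mask (P1..P9) over the image row-wise, one input vector per location."""
    H, W = img.shape
    vecs = []
    for r in range(H - 2):                            # mask fits while 3 rows remain
        for c in range(W - 2):
            vecs.append(img[r:r+3, c:c+3].ravel())    # x = (P1, ..., P9)
    return np.array(vecs)

img = np.arange(16, dtype=float).reshape(4, 4)
X = image_to_vectors(img)                             # (4-2)*(4-2) = 4 vectors, 9-dim
```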
by the discovery of ANCA antibodies and their routine investigation since the 1990s.
The onset of GPA may occur at any age, although patients typically present at age 35–55
years [11].
A classic form of GPA is characterized by necrotizing granulomatous vasculitis of the upper
and lower respiratory tract, glomerulonephritis, and small-vessel vasculitis of variable
degree. Because of the respiratory tract involvement, nearly all patients have some
respiratory symptoms, including cough, dyspnea or hemoptysis. Consequently, nearly all of
them have a chest X-ray at admission to the hospital, usually followed by a high-resolution
CT scan. The major value of CT scanning is in the further characterization of lesions found
on chest radiography.
The spectrum of high-resolution CT findings of GPA is broad, ranging from nodules and
masses to ground glass opacity and lung consolidation. All of the findings may mimic other
conditions such as pneumonia, neoplasm, and noninfectious inflammatory diseases [2].
The prognosis of GPA depends on the activity of the disease, disease-caused damage
and the response to therapy. There are several drugs used to induce remission, including
cyclophosphamide, glucocorticoids or the monoclonal anti-CD20 antibody rituximab. Once
remission is induced, mainly azathioprine or mycophenolate mofetil are used for its
maintenance.
Unfortunately, relapses are common in GPA. Typically, up to half of patients experience
relapse within 5 years [21].
classify types of the finding. The standard software1 used for CT image visualization usually
includes some interactive tools (zoom, rotation, distance or angle measurements, etc.), but no
analysis is performed (e.g. detection of abnormal structures or recognition of different types
of tissue). Moreover, there is no software customized specifically for the requirements of GPA
analysis.
Therefore, there is room to introduce a new SOM-based approach to the analysis. CT
finding analysis then becomes less time consuming and more precise.
1
Syngo Imaging XS-VA60B, Siemens AG Medical Solutions, Health Services, 91052 Erlangen, Germany
Figure 8. The graphical user interface of the software using SOM for clustering
Stages                         Description
Image preprocessing            Histogram matching of the input image to the test image.
Image transformation           The image is transformed to M masks (3×3) that form the set
                               of M input vectors (9-dimensional).
Clustering using SOM           The set of input vectors is classified into 2 classes according
                               to the weights obtained from the SOM training process.
Reverse image transformation   The set of classified input vectors is reversely transformed
                               into the image. The intensity values are replaced by the
                               values of the class membership.
Edge detection                 Computation of the image gradient.
The obtained results have been discussed with an expert from Department of Nephrology of
the First Faculty of Medicine and General Teaching Hospital in Prague.
In the first case, a high-resolution CT image capturing a granuloma in the left lung has been
processed, see Figure 10a. The expert has been satisfied with the quality of the GPA detection
(see Figure 10b) provided by the SOM, since the granuloma has been detected without any
artifacts.
In the second case, the expert pointed out that the high-resolution CT image,
capturing both masses and ground-glass (see Figures 11a, 12a), was problematic to
evaluate. The detection of the masses was aggravated by the ground-glass
surrounding the lower part of the first mass and the upper part of the second mass.
The artifacts, which originated in the coughing movements of the patient ('wavy' lower part
of the image), made the detection of the masses and the ground-glass difficult as
well. Despite these inconveniences, the expert confirmed that the presented software
was able to distinguish between the particular expression forms of GPA with satisfying
accuracy and that it detected the mass and ground-glass correctly (see Figures 11b, 12b, 12c).
In conclusion, the presented SOM approach represents a new helpful approach for GPA
disease diagnostics.
6. Conclusions
The edge detection procedure is a critical step in the analysis of biomedical images, enabling
above all the detection of abnormal structures or the recognition of different types of
tissue.
The application of the SOM to edge detection in biomedical images has been discussed and
its contribution to the solution of the edge detection task has been confirmed. The ability
of the SOM has been verified using high-resolution CT images capturing all three forms
of the expressions of the GPA disease (granuloma, mass, ground-glass). Using the SOM,
the particular expression forms of the GPA disease have been detected and distinguished
from each other. The obtained results have been discussed with the expert, who has
confirmed that the SOM provides a quick and easy approach for edge detection tasks with
satisfying quality of output.
Future plans are based on extending the problem to three-dimensional space to enable
CT image analysis involving (i) 3D visualization of pathological findings and (ii)
3D reconstruction of the whole region (using the whole set of CT images).
Acknowledgements
The work was supported by specific university research MSMT No. 21/2012, the research
grant MSM No. 6046137306 and PRVOUK-P25/LF1/2.
Author details
Lucie Gráfová1 , Jan Mareš1 , Aleš Procházka1 and Pavel Konopásek2
1 Department of Computing and Control Engineering, Institute of Chemical Technology,
Prague, Czech Republic
2 Department of Nephrology, First Faculty of Medicine and General Faculty Hospital, Prague,
Czech Republic
7. References
[1] Ananthakrishnan, L., Sharma, N. & Kanne, J. P. [2009]. Wegener's granulomatosis in
the chest: High-resolution CT findings, Am J Roentgenol 192(3): 676–82.
[3] Attali, P., Begum, R., Romdhane, H. B., Valeyre, D., Guillevin, L. & Brauner, M. W.
[1998]. Pulmonary Wegener's granulomatosis: changes at follow-up CT, European
Radiology 8: 1009–1113.
[4] Badekas, E. & Papamarkos, N. [2007]. Document binarisation using Kohonen SOM, IET
Image Processing 1: 67–84.
[5] Baez, P. G., Araujo, C. S., Fernandez, V. & Procházka, A. [2011]. Differential Diagnosis
of Dementia Using HUMANN-S Based Ensembles, Springer, Berlin, Germany, chapter 14,
pp. 305–324.
[8] Jerhotová, E., Švihlík, J. & Procházka, A. [2011]. Biomedical Image Volumes Denoising
via the Wavelet Transform, InTech, chapter 14.
[10] Komócsi, A., Reuter, M., Heller, M., Muraközi, H., Gross, W. L. & Schnabel, A.
[2003]. Active disease and residual damage in treated Wegener's granulomatosis: an
observational study using pulmonary high-resolution computed tomography, European
Radiology 13: 36–42.
[11] Lane, S. E., Watts, R. & Scott, D. G. I. [2005]. Epidemiology of systemic vasculitis, Curr
Rheumatol Rep 7: 270–275.
[12] Lee, K. S., Kim, T. S., Fujimoto, K., Moriya, H., Watanabe, H., Tateishi, U., Ashizawa,
K., Johkoh, T., Kim, E. A. & Kwon, O. J. [2003]. Thoracic manifestation of Wegener's
granulomatosis: CT findings in 30 patients, European Radiology 13: 43–51.
[13] Liu, J.-C. & Pok, G. [1999]. Texture edge detection by feature encoding and predictive
model, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal
Processing, pp. 1105–1108.
[14] Lohrmann, C., Uhl, M., Schaefer, O., Ghanem, N., Kotter, E. & Langer:, M.
[2005]. Serial high-resolution computed tomography imaging in patients with wegener
granulomatosis: Differentiation between active inflammatory and chronic fibrotic
lesions, Acta Radiologica 46: 484–491.
[15] Marr, D. & Hildreth, E. [1980]. Theory of edge detection, Proc. Roy. Soc. London, Vol.
B.207, pp. 187–217.
[18] Moreira, J. & Fontuora, L. D. [1996]. Neural-based color image segmentation and
classification, Anais do IX SIBGRAPI: 47–54.
[20] Prewitt, J. M. S. [1970]. Object enhancement and extraction. Picture Processing and
Psychophysics, Academic Press, New York.
[22] Roberts, L. G. [1963]. Machine Perception of Three Dimensional Solids, PhD thesis,
Massachusetts Institute of Technology, Electrical Engineering Department.
[23] Samarasinghe, S. [2006]. Neural Networks for Applied Sciences and Engineering: From
Fundamentals to Complex Pattern Recognition.
[25] Sharma, D. K., Gaur, L. & Okunbor, D. [2007]. Image compression and feature extraction
using Kohonen's self-organizing map neural network, Journal of Strategic E-Commerce
5(No. 0): 25–38.
[26] Toivanen, P. J., Ansamäki, J., Parkkinen, J. P. S. & Mielikäinen, J. [2003]. Edge
detection in multispectral images using the self-organizing map, Pattern Recognition
Letters 24: 2987–2994.
[27] Venkatesh, Y. V., Raja, S. K. & Ramya, N. [2006]. Multiple contour extraction from
graylevel images using an artificial neural network, IEEE Transactions on Image Processing
15: 892–899.
h““p://dx.doi.org/10.5772/51629
1. Introduction
The machining drilling process ranks among the most widely used manufacturing
processes in industry in general [ , ]. In the quest for higher quality in drilling operations,
ANNs have been employed to monitor drill wear using sensors. Among the types of signals
employed are machining loads measured with a dynamometer [ , ], electric
current measured by applying Hall effect sensors on electric motors [ ], and vibrations [ ],
as well as a combination of the above with other devices such as accelerometers and
acoustic emission sensors [ ].
This article contributes to the use of MLP [ - ] and ANFIS-type [ - ] artificial
intelligence systems programmed in MATLAB to estimate the diameter of drilled holes. The two
types of network use the backpropagation method, which is the most popular model for
manufacturing applications [ ]. In the experiment, which consisted of drilling single-layer
test specimens of an aluminum alloy and of Ti6Al4V titanium alloy, an acoustic emission
sensor, a three-dimensional dynamometer, an accelerometer, and a Hall effect sensor were
used to collect
© 2013 Geronimo e“ al.; licensee InTech. This is an open access ar“icle dis“rib”“ed ”nder “he “erms of “he
Crea“ive Commons A““rib”“ion License (h““p://crea“ivecommons.org/licenses/by/3.0), which permi“s
”nres“ric“ed ”se, dis“rib”“ion, and reprod”c“ion in any medi”m, provided “he original work is properly ci“ed.
information about noise frequency and intensity, table vibrations, loads on the x, y and z
axes, and electric current in the motor, respectively.
2. Drilling Process

The three machining processes most frequently employed in industry today are turning,
milling and boring [ ], and the latter is the least studied process. However, it is estimated
that today, boring with helical drill bits accounts for % to % of all machining processes.
The quality of a hole depends on geometric and dimensional errors, as well as burrs and
surface integrity. Moreover, the type of drilling process, the tool, the cutting parameters and
the machine stiffness also affect the precision of the hole [ ].
It is very difficult to generate a reliable analytical model to predict and control hole
diameters, since these holes are usually affected by several parameters. Figure illustrates the
loads involved in the drilling process, the most representative of which is the feed force FZ,
since it affects chip formation and surface roughness.
synapse is processed and an output is generated. The training error is calculated in each
iteration, based on the calculated output and the desired output, and is used to adjust the
synaptic weights according to the generalized delta rule

Δw_ij^(l)(t+1) = η δ_j^(l) y_i^(l−1) + α Δw_ij^(l)(t),

where η is the learning rate and α is the momentum, parameters that influence the speed
of learning and its stability, respectively; w_ij^(l) is the weight of each connection and
δ_j^(l) is the local gradient calculated from the error signal.
An artificial neural network (ANN) learns by continuously adjusting the synaptic weights
of the connections between layers of neurons until a satisfactory response is produced [ ].
In the present work, the MLP network was applied to estimate drilled hole diameters based
on an analysis of the data captured by the sensors. The weight readjustment method
employed was backpropagation, which consists of propagating the mean squared error
generated in the diameter estimation back through each layer of neurons, readjusting the
weights of the connections so as to reduce the error in the next iteration.
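A single application of the generalized delta rule described above, with the momentum term carrying part of the previous weight change, might look like this (the learning rate, momentum and signal values are arbitrary example numbers, not taken from the experiment):

```python
def delta_rule_step(w, prev_dw, local_grad, y_in, eta=0.1, alpha=0.9):
    """One generalized-delta-rule update for a single connection weight.

    dw = eta * local_grad * y_in + alpha * prev_dw: the momentum term (alpha)
    reuses the previous weight change to stabilize and speed up learning.
    """
    dw = eta * local_grad * y_in + alpha * prev_dw
    return w + dw, dw

w, dw = 0.5, 0.0
w, dw = delta_rule_step(w, dw, local_grad=0.2, y_in=1.0)  # first step: dw = 0.02
w, dw = delta_rule_step(w, dw, local_grad=0.2, y_in=1.0)  # momentum adds 0.9 * 0.02
```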
Figure shows a typical MLP ANN, with m inputs and p outputs, each circle representing
a neuron. The outputs of a neuron are used as inputs for the neurons of the next layer.
Obtaining an ANFIS model that performs well requires taking into consideration the initial
number of parameters and the number of inputs and rules of the system [ ]. These
parameters are determined empirically, and an initial model is usually created with equally
spaced membership functions.
Figure 3. ANFIS archi“ec“”re for “wo inp”“s and “wo r”les based on “he firs“-order S”geno model.
However, this method is not always efficient because it does not show how many relevant
input groups there are. To this end, there are algorithms that help determine the number of
membership functions, thus enabling one to calculate the maximum number of fuzzy rules.
P_i = Σ_{j=1}^{n} exp( −(4 / r_a²) ‖x_i − x_j‖² )

where P_i is the potential of the possible cluster, x_i is the possible cluster center, x_j is each point in the neighborhood of the cluster that will be grouped in it, n is the number of points in the neighborhood, and r_a is the radius of influence.
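The potential computation above (the core of subtractive clustering) can be sketched as follows; `r_a` is the radius of influence and the data points are illustrative:

```python
import numpy as np

# Sketch of the cluster-potential computation used by subtractive clustering:
# each point's potential is a sum of Gaussian contributions from every other
# point, and the point with the highest potential is the first candidate
# cluster center.
def potentials(x, r_a=0.5):
    """x: (n, d) data, ideally normalized to [0, 1]; returns (n,) potentials."""
    sq_dists = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    return np.sum(np.exp(-(4.0 / r_a**2) * sq_dists), axis=1)

x = np.array([[0.1, 0.1], [0.12, 0.1], [0.9, 0.9]])
center = x[np.argmax(potentials(x))]  # a point inside the dense pair
```

After a center is accepted, the method subtracts its influence from the remaining potentials and repeats, which is how the number of rules is bounded.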
ANFIS is a fuzzy inference system embedded in the structure of an adaptive neural network. Using a hybrid learning scheme, the ANFIS system is able to map inputs to outputs based on human knowledge and on pairs of input and output data [ ]. The ANFIS method is superior to other modeling methods such as autoregressive models, cascade-correlation neural networks, backpropagation neural networks, sixth-order polynomials, and linear prediction methods [ ].
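The two-input, two-rule first-order Sugeno structure of Figure 3 can be sketched numerically. The Gaussian membership parameters and consequent coefficients below are illustrative assumptions, not values fitted in this chapter:

```python
import numpy as np

# Minimal first-order Sugeno inference step with two inputs and two rules,
# mirroring the five ANFIS layers: memberships, firing strengths,
# normalization, first-order consequents, weighted sum.
def gauss(x, c, sigma):
    return np.exp(-((x - c) ** 2) / (2 * sigma ** 2))

def sugeno(x1, x2):
    # Layers 1-2: rule firing strengths (product of memberships)
    w1 = gauss(x1, 0.0, 1.0) * gauss(x2, 0.0, 1.0)
    w2 = gauss(x1, 1.0, 1.0) * gauss(x2, 1.0, 1.0)
    # Layer 3: normalization
    w1n, w2n = w1 / (w1 + w2), w2 / (w1 + w2)
    # Layers 4-5: consequents f_i = p_i*x1 + q_i*x2 + r_i, then weighted sum
    f1 = 0.5 * x1 + 0.2 * x2 + 0.1
    f2 = 1.5 * x1 - 0.3 * x2 + 0.7
    return w1n * f1 + w2n * f2
```

In ANFIS training, the membership parameters are tuned by backpropagation and the consequent coefficients by least squares, which is the hybrid scheme mentioned above.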
MLP and ANFIS Applied to the Prediction of Hole Diameters in the Drilling Process 149
http://dx.doi.org/10.5772/51629
Methodology
The tests were performed on test specimens composed of a package of sheets of Ti6Al4V titanium alloy and 2024-T3 aluminum alloy, which were arranged in this order to mimic their use in the aerospace industry. The tool employed here was a helical drill made of hard metal.
A total of nine test specimens were prepared, and holes were drilled in each one. Thus, a considerable number of data were made available to train the artificial intelligence systems. The data set consisted of the signals collected during drilling and the diameters measured at the end of the process.
The drilling process was monitored using an acoustic emission sensor, a three-dimensional dynamometer, an accelerometer, and a Hall effect sensor, which were arranged as illustrated in Figure .
The acoustic emission signal was collected using a Sensis model DM- sensor. The electric power was measured by applying a transducer to monitor the electric current and voltage at the terminals of the electric motor that activates the tool holder. The six signals were sent to a National Instruments PCI- E data acquisition board installed in a computer. LabVIEW software was used to acquire the signals and store them in binary format for subsequent analysis and processing.
150 Artificial Neural Networks – Architectures and Applications
To simulate diverse machining conditions, different cutting parameters were selected for each machined test specimen. This method is useful to evaluate the performance of artificial intelligence systems in response to changes in the process. Each test specimen was labeled as listed in Table .
Condition   ID   Spindle [rpm]   Feed Speed [mm/min]
1 1A 1000 90.0
2 1B 1000 22.4
3 1C 1000 250.0
4 2A 500 90.0
5 2B 500 22.4
6 2C 500 250.0
7 3A 2000 90.0
8 3B 2000 22.4
9 3C 2000 250.0
Each pass consisted of a single drilling movement along the workpiece under a given condition. The signals of acoustic emission, loads, cutting power and acceleration shown in Figure were measured in real time at a rate of samples per second.
Diameter measurements
Because the roundness of machined holes is not perfect, two measurements were taken of each hole, one of the maximum and the other of the minimum diameter. Moreover, the diameter of the hole in each machined material will also be different due to the material's particular characteristics.
The architecture of the systems was defined using all the collected signals, i.e., acoustic emission, loads in the x, y and z directions, electrical power, and acceleration. An MLP network and an ANFIS system were created for each test specimen material, due to the differences in the behavior of the signals (see Figure 5) and in the ranges of values found in the measurement of the diameters.
Figure 5. Signals collected during the drilling process of Ti6Al4V alloy (left) and 2024-T3 alloy (right).
Multilayer Perceptron
In this study, the signals from the sensors, together with the maximum and minimum measured diameters, were organized into two matrices, one for each test specimen material. These data were utilized to train the neural network. The entire data set of the tests resulted in samples, considering tool breakage during testing under condition C.
The MLP network architecture is defined by establishing the number of hidden layers to be used, the number of neurons contained in each layer, the learning rate, and the momentum. An algorithm was created to test combinations of these parameters. The final choice was the combination that appeared among the five smallest errors in the estimates of the maximum and minimum diameters. Parameters such as the number of training iterations and the desired error are used as criteria to terminate the training and were established at training iterations and x - mm, respectively. This procedure was performed for each material, generating two MLP networks whose configurations are described in Table . The remaining parameters were kept at the MATLAB defaults.
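The chapter does not spell out the combination-testing algorithm itself. A minimal sketch of such an exhaustive search, assuming a placeholder `train_and_eval` routine that trains an MLP with the given parameters and returns its mean estimation error, might look like this (the grid values are illustrative):

```python
import itertools

# Sketch of an exhaustive hyperparameter search: try every combination of
# hidden-layer count, neurons per layer, learning rate and momentum, and
# keep the five combinations with the smallest validation error.
def search(train_and_eval):
    grid = itertools.product(
        [1, 2],                 # hidden layers
        [5, 10, 20],            # neurons per hidden layer
        [0.01, 0.1, 0.3],       # learning rate
        [0.0, 0.5, 0.9],        # momentum
    )
    results = [(train_and_eval(*params), params) for params in grid]
    return sorted(results)[:5]  # the five smallest errors
```

Keeping the five best combinations, rather than a single winner, matches the selection criterion described above.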
ANFIS
The same data matrix employed to train the MLP neural network was used to train the ANFIS system. This system consists of a fuzzy inference system (FIS) that collects the available data, converts them into If-Then type rules by means of membership functions, and processes them to generate the desired output. The FIS is influenced by the organization of the training data set, a task performed with the help of MATLAB's Fuzzy Logic Toolbox. The subtractive clustering algorithm (subclust) is used to search for similar data clusters in the training set, optimizing the FIS through the definition of the points with the highest potential to be cluster centers. Parameters such as the radius of influence, the acceptance (inclusion) ratio and the rejection ratio help define the number of clusters and, hence, the number of rules of the FIS. Table lists the parameters used in the ANFIS systems. The desired error was set at x - mm. The training method consists of a hybrid algorithm combining backpropagation with least-squares estimation.
Figure 6. Mean error of hole diameters estimated by the ANFIS system for several training iterations.
Because training in the ANFIS system is performed in batch mode, with the entire training set presented at once, the appropriate number of training iterations was investigated. Thus, an algorithm was created to test several numbers of training iterations, ranging from up to . Figure 6 illustrates the result of this test.
The larger the number of training iterations, the greater the computational effort. Thus, avoiding the peaks (Figure 6), the number of training iterations was set at , which requires low computational capacity.
Initially, the systems were trained using all the collected signals. Given the data set of the sensors and the desired outputs, which consist of the measured diameters, the performance of the neural network is evaluated based on the error between the estimated diameter and the measured diameter, which are shown on graphs.
MLP
For the Ti6Al4V titanium alloy, the estimate of the minimum diameter resulted in a mean error of . mm, with a maximum error of . mm.
Figure 7. Minimum and maximum diameters (actual vs. estimated by the MLP) for the Ti6Al4V alloy.
For the maximum diameter, the resulting mean error was . mm, with a maximum error of . mm. Figure 7 depicts the results of these estimates.
Figure 8 shows the result of the estimation of the hole diameters machined in the 2024-T3 aluminum alloy. The mean error for the minimum diameter was . mm, with a maximum error of . mm. For the maximum diameter, the mean error was . mm, with a maximum error of . mm.
Figure 8. Minimum and maximum diameters (actual vs. estimated by the MLP) for the 2024-T3 alloy.
ANFIS
Figure 9 shows the diameters estimated by ANFIS. The mean error in the estimate of the minimum diameter was . mm, with a maximum error of . mm. For the maximum diameter, the resulting mean error was . mm, and the highest error was . mm.
Figure 9. Minimum and maximum diameters (actual vs. estimated by ANFIS) for the Ti6Al4V alloy.
Figure 10 illustrates the result of the machined hole diameter estimated for the 2024-T3 alloy, using the same network configuration. The mean error for the minimum diameter was . mm, with a maximum error of . mm. The maximum diameter presented a mean error of . mm, and a maximum error of . mm.
Figure 10. Minimum and maximum diameters (actual vs. estimated by ANFIS) for the 2024-T3 alloy.
To optimize the computational effort, an algorithm was created to test the performance of each type of system in response to each of the signals separately or to a combination of two or more signals. This procedure was adopted in order to identify a less invasive estimation method.
Figure 11. Performance of individual and combined signals in the estimation of hole diameters in Ti6Al4V alloy by the MLP network.
Individual signals and combinations of two distinct signals were tested for the MLP network. The best individual inputs for the Ti6Al4V alloy were the acoustic emission and Z force signals. Combined, the Z force and acceleration signals presented the lowest error. The classified signals are illustrated in Figure 11.
For the 2024-T3 alloy, the best individual input was the Z force. When combined, the Z force and acceleration signals presented the lowest error. Figure 12 depicts the classified signals.
In the ANFIS system, the Z force provided the best individual signal for the estimate of the drilled hole diameter in the Ti6Al4V alloy. The acoustic emission signal combined with the Z force presented the best result among the two-signal combinations, as indicated in Figure 13.
Figure 12. Performance of individual and combined signals in the estimation of hole diameters in 2024-T3 alloy by the MLP system.
For the aluminum alloy, the ANFIS system presented the best performance with the individual Z force signal and with a combination of the Z force and acoustic emission signals, as indicated in Figure 14.
Figure 13. Performance of individual and combined signals in the estimation of hole diameters in Ti6Al4V alloy by the ANFIS system.
Figure 14. Performance of individual and combined signals in the estimation of hole diameters in 2024-T3 alloy by the ANFIS system.
Because the performance of the artificial intelligence systems in the tests was highest when using the Z force signal, new tests were carried out with only this signal. The errors were divided into four classes, according to the following criteria: precision of the instrument (≤ μm), tolerance required for precision drilling processes (≤ μm), tolerance normally employed in industrial settings (≤ μm), and errors that would lead to a nonconformity (> μm). The configurations used in the previous tests were maintained in this test.
MLP
The multilayer perceptron ANN was trained with the information of the Z force and the minimum and maximum measured diameters.
Figure 15. Classification of estimation errors for the 2024-T3 alloy obtained by the MLP network.
The simulation of the MLP network for the aluminum alloy produced errors smaller than the precision of the measurement instrument in % of the attempts. Thirty-three percent of the estimates presented errors within the range stipulated for precision holes and % fell within the tolerances normally applied in industry in general. Only % of the estimates performed by the artificial neural network would result in a product conformity rejection.
The simulation of the MLP network for the titanium alloy produced errors smaller than the precision of the measurement instrument in % of the attempts. Thirty-seven percent of the estimates presented errors within the range stipulated for precision holes and % fell within the tolerances normally applied in industry in general. Only % of the estimates performed by the artificial neural network would result in a product conformity rejection.
Figure 16. Classification of estimation errors for the Ti6Al4V alloy obtained by the MLP network.
ANFIS
The ANFIS system was simulated in the same way as the MLP network, but this time using only one input, the Z force. This procedure resulted in changes to the FIS structure due to the use of only one input.
Figure 17. Classification of estimation errors for the 2024-T3 alloy obtained by the ANFIS system.
For the aluminum alloy, ANFIS produced errors smaller than the precision of the measurement instrument in % of the attempts. Thirty-seven percent of the estimates presented errors within the range stipulated for precision holes and % fell within the tolerances normally used in industry in general. Only % of the estimates performed by the system would result in a product conformity rejection, as indicated in Figure 17.
For the titanium alloy, ANFIS produced errors smaller than the precision of the measurement instrument in % of the attempts. Thirty-four percent of the estimates presented errors within the range stipulated for precision holes and % fell within the tolerances normally used in industry. Only % of the estimates performed by the system would lead to a product conformity rejection, as indicated in Figure 18.
Figure 18. Classification of estimation errors for the Ti6Al4V alloy obtained by the ANFIS system.
Conclusions
The first system used here consisted of a multilayer perceptron (MLP) artificial neural network. Its performance was marked by the large number of signals used in its training and by its estimation precision, which produced % of correct responses (errors below μm) for the titanium alloy and % for the aluminum alloy. As for its unacceptable error rates, the MLP system generated only % and % for the titanium and aluminum alloys, respectively.
The second approach, which involved the application of an adaptive neuro-fuzzy inference system (ANFIS), generated a large number of correct responses using the six available signals, i.e., % for the titanium alloy and % for the aluminum alloy. A total of % of the errors for the titanium alloy and % for the aluminum alloy were classified above the admissible tolerances (> μm).
The results described herein demonstrate the applicability of the two systems in industrial contexts. However, to evaluate the economic feasibility of their application, another method was employed using the signal from only one sensor, whose simulations generated the lowest error among the available signals. Two signals stood out: the Z force and acoustic emission signals, with the former presenting a better result for the two alloys of the test specimen and the latter presenting good results only in the hole diameter estimation for the titanium alloy. Therefore, the Z force was selected for the continuation of the tests.
The results obtained here are very encouraging in that fewer estimates fell within the range considered inadmissible, i.e., only % for the aluminum alloy and % for the titanium alloy, using the MLP network.
The results produced by the ANFIS system also demonstrated a drop in the number of errors outside the expected range, i.e., % for the aluminum alloy and % for the titanium alloy.
Based on the approaches used in this work, it can be stated that the use of artificial intelligence systems in industry, particularly multilayer perceptron neural networks and adaptive neuro-fuzzy inference systems, is feasible. These systems showed high accuracy and low computational effort, as well as a low implementation cost with the use of only one sensor, which implies few physical changes in the equipment to be monitored.
Acknowledgements
The authors gratefully acknowledge the Brazilian research funding agencies FAPESP (São Paulo Research Foundation), for supporting this research work under Process # / -, and CNPq (National Council for Scientific and Technological Development) and CAPES (Federal Agency for the Support and Evaluation of Postgraduate Education) for providing scholarships. We are also indebted to the company OSG Sulamericana de Ferramentas Ltda. for manufacturing and donating the tools used in this research.
Author details
Thiago M. Geronimo, Carlos E. D. Cruz, Fernando de Souza Campos, Paulo R. Aguiar and Eduardo C. Bianchi*
Universidade Estadual Paulista Júlio de Mesquita Filho (UNESP), Bauru campus, Brazil
References
[ ] Panda, S. S., Chakraborty, D., & Pal, S. K. Monitoring of drill flank wear using fuzzy back-propagation neural network. Int. J. Adv. Manuf. Technol., -.
[ ] Yang, X., Kumehara, H., & Zhang, W. Back-propagation wavelet neural network based prediction of drill wear from thrust force and cutting torque signals. Computer and Information Science, .
[ ] Li, X., & Tso, S. K. Drill wear monitoring based on current signals. Wear, , -.
[ ] Abu-Mahfouz, I. Drilling wear detection and classification using vibration signals and artificial neural network. International Journal of Machine Tools & Manufacture, , -.
[ ] Kandilli, I., Sönmez, M., Ertunç, H. M., & Çakir, B. Online monitoring of tool wear in drilling and milling by multi-sensor neural network fusion. Proceedings of the 2007 IEEE International Conference on Mechatronics and Automation, Harbin, -.
[ ] Haykin, S. Neural Networks: A Comprehensive Foundation. Patparganj: Pearson Prentice Hall.
[ ] Sanjay, C., & Jyothi, C. A study of surface roughness in drilling using mathematical analysis and neural networks. Int. J. Adv. Manuf. Technol., , -.
[ ] Huang, B. P., Chen, J. C., & Li, Y. Artificial-neural-networks-based surface roughness Pokayoke system for end-milling operations. Neurocomputing, , -.
[ ] Jang, J.-S. R. ANFIS: Adaptive-Network-Based Fuzzy Inference System. IEEE Transactions on Systems, Man and Cybernetics, , -.
[ ] Resende, S. O. Sistemas Inteligentes: Fundamentos e Aplicações. Barueri: Manole.
[ ] Lezanski, P. An intelligent system for grinding wheel condition monitoring. Journal of Materials Processing Technology, , -.
[ ] Lee, K. C., Ho, S. J., & Ho, S. Y. Accurate estimation of surface roughness from texture features of the surface image using an adaptive neuro-fuzzy inference system. Precision Engineering, , -.
[ ] Johnson, J., & Picton, P. Concepts in Artificial Intelligence: Designing Intelligent Machines. Oxford: Butterworth-Heinemann.
[ ] Sugeno, M., & Kang, G. T. Structure identification of fuzzy model. Fuzzy Sets and Systems, , -.
[ ] Lezanski, P. An intelligent system for grinding wheel condition monitoring. Journal of Materials Processing Technology, , -.
[ ] Chiu, S. L. Fuzzy model identification based on cluster estimation. Journal of Intelligent and Fuzzy Systems, , -.
[ ] Lee, K. C., Ho, S. J., & Ho, S. Y. Accurate estimation of surface roughness from texture features of the surface image using an adaptive neuro-fuzzy inference system. Precision Engineering, , -.
[ ] Drozda, T. J., & Wick, C. Tool and Manufacturing Engineers Handbook: Machining. Dearborn: SME.
Chapter 8
Integrating Modularity and Reconfigurability for Perfect Implementation of Neural Networks
Hazem M. El-Bakry
h““p://dx.doi.org/10.5772/53021
Introduction
In this chapter, we introduce a powerful solution for complex problems that are required to be solved by using neural nets. This is done by using modular neural nets (MNNs) that divide the input space into several homogeneous regions. This approach is applied to implement XOR functions, logic functions on the one-bit level, and a 2-bit digital multiplier. Compared to previous non-modular designs, a salient reduction in the order of computations and hardware requirements is obtained.
Modular neural nets (MNNs) present a new trend in neural network architecture design. Motivated by the highly modular biological network, artificial neural net designers aim to build architectures that are more scalable and less subject to interference than the traditional non-modular neural nets [ ]. There is now a wide variety of MNN designs for classification. Non-modular classifiers tend to introduce high internal interference because of the strong coupling among their hidden-layer weights [ ]. As a result, slow learning or overfitting can occur during the learning process. Sometimes the network cannot learn complex tasks at all. Such tasks tend to introduce a wide range of overlap which, in turn, causes a wide range of deviations from efficient learning in the different regions of the input space [ ]. Usually there are regions in the class feature space which show high overlap, due to the resemblance of two or more input patterns (classes). At the same time, there are other regions which show little or even no overlap, due to the uniqueness of the classes therein. High coupling among hidden nodes will then result in over- and under-learning at different regions [ ]. Enlarging the network, increasing the number and quality of training samples, and techniques for avoiding local minima will not stretch the learning capabilities of the NN classifier beyond a certain limit as long as hidden nodes are tightly coupled, and hence cross-talking, during learning [ ]. A MNN classifier attempts to reduce the effect of these problems via a divide-and-conquer approach. It generally decomposes the large-size/high-complexity task into several subtasks, each one handled by a simple, fast, and efficient module. Then, sub-solutions are integrated via a multi-module decision-making strategy. Hence, MNN classifiers have generally proved to be more efficient than non-modular alternatives [ ]. However, MNNs cannot offer a real alternative to non-modular networks unless the MNN designer balances the simplicity of the subtasks and the efficiency of the multi-module decision-making strategy. In other words, the task decomposition algorithm should produce subtasks as simple as they can be, but meanwhile the modules have to give the multi-module decision-making strategy enough information to take accurate global decisions [ ].
© 2013 M. El-Bakry; licensee InTech. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
In a previous paper [ ], we have shown that this model can be applied to realize non-binary data. In this chapter, we prove that MNNs can solve some problems with fewer requirements than non-MNNs. In section , the XOR function and logic functions on the one-bit level are simply implemented using MNNs. Comparisons with conventional neural nets are given. In section , another strategy for the design of MNNs is presented and applied to realize a 2-bit digital multiplier.
In the following subsections, we investigate the usage of MNNs in some binary problems. Here, all MNNs are of the feedforward type and are learned by using the backpropagation algorithm. In comparison with non-MNNs, we take into account the number of neurons and weights in both models as well as the number of computations during the test phase.
There are two topologies for realizing the XOR function, whose truth table is shown in Table , using neural nets. The first uses a fully connected neural net with three neurons, two of which are in the hidden layer and the other in the output layer. There are no direct connections between the input and output layers, as shown in Fig. . In this case, the neural net is trained to classify all four patterns at the same time.
x y O/P
0 0 0
0 1 1
1 0 1
1 1 0
The second approach was presented by Minsky and Papert and is realized using two neurons, as shown in Fig. , the first representing logic AND and the other logic OR. The value of +1.5 for the threshold of the hidden neuron ensures that it will be turned on only when both input units are on. The value of +0.5 for the output neuron ensures that it will turn on only when it receives a net positive input greater than +0.5. The weight of −2 from the hidden neuron to the output neuron ensures that the output neuron will not come on when both input neurons are on [ ].
Using MNNs, we may consider the problem of classifying these four patterns as two individual problems. This can be done in two steps:
1. Considering the second bit Y, divide the four patterns into two groups.
The first group consists of the first two patterns, which realize a buffer, while the second group, which contains the other two patterns, represents an inverter, as shown in Table . The first bit X may be used to select the function.
X Y O/P
0 0 0   Buffer (Y)
0 1 1
1 0 1   Inverter (Ȳ)
1 1 0
So, we may use two neural nets, one to realize the buffer and the other to represent the inverter. Each of them may be implemented by using only one neuron. When realizing these two neurons, we implement the weights and perform only one summing operation. The first input X acts as a detector to select the proper weights, as shown in Fig. . In a special case, for the XOR function, there is no need for the buffer, and the neural net may be represented by using only one weight corresponding to the inverter, as shown in Fig. . As a result of using cooperative modular neural nets, the XOR function is realized by using only one neuron. A comparison between the new model and the two previous approaches is given in Table . It is clear that the number of computations and the hardware requirements of the new model are less than those of the other models.
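The single-neuron modular XOR described above can be sketched directly: the first input X selects the weight set (buffer or inverter), and only one weighted sum is ever computed. The threshold and weight values below are illustrative, not the chapter's:

```python
# Sketch of the modular XOR: X selects between two one-weight modules,
# a buffer (output = Y) and an inverter (output = NOT Y), so a single
# summing operation produces XOR(X, Y).
def mnn_xor(x, y):
    threshold = 0.5                                   # assumed activation threshold
    weight, bias = (1, 0) if x == 0 else (-1, 1)      # buffer vs. inverter weights
    return 1 if weight * y + bias > threshold else 0
```

The point of the modular design is visible here: the selection by X replaces the hidden layer entirely.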
Realization of logic functions on the one-bit level (X, Y) generates 16 functions, which are AND, OR, NAND, NOR, XOR, XNOR, X̄, Ȳ, X, Y, 0, 1, X̄Y, XȲ, X̄+Y, X+Ȳ. So, in order to control the selection of each one of these functions, we must have another 4 bits at the input, so that the total input is 6 bits, as shown in Table 4.
Function C1 C2 C3 C4 X Y O/p
AND 0 0 0 0 0 0 0
0 0 0 0 0 1 0
0 0 0 0 1 0 0
0 0 0 0 1 1 1
........... .... .... .... .... .... .... ....
X+Ȳ 1 1 1 1 0 0 1
1 1 1 1 0 1 0
1 1 1 1 1 0 1
1 1 1 1 1 1 1
Table 4. Truth table of logic functions (one-bit level) with their control selection.
Non-MNNs can classify these patterns using a network of three layers. The hidden layer contains neurons, while the output needs only one neuron, and a total number of weights is required. These patterns can be divided into two groups. Each group has an input of 5 bits, while the MSB is 0 for the first group and 1 for the second. The first group requires neurons and weights in the hidden layer, while the second needs neurons and weights. As a result, we may implement only summing operations in the hidden layer instead of neurons as in the case of non-MNNs, whereas the MSB is used to select which group of weights must be connected to the neurons in the hidden layer. A similar procedure is done between the hidden and output layers. Fig. 5 shows the structure of the first neuron in the hidden layer. A comparison between MNNs and non-MNNs used to implement the 16 logic functions is shown in Table 5.
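The control-bit selection of Table 4 can be mimicked in a few lines. One convenient encoding, not necessarily the enumeration used in the chapter's table, is to use the selected function's truth-table column as the 4-bit code C1..C4:

```python
# Sketch of function selection by control bits: the 4-bit code is read as the
# truth-table column of the chosen one-bit function, so the output is simply
# bit (2*X + Y) of the code, counted from the most significant position.
def logic_unit(c, x, y):
    """c: 4-bit function code 0..15; returns the selected function of x, y."""
    return (c >> (3 - (2 * x + y))) & 1

AND = 0b0001   # outputs for inputs 00, 01, 10, 11
XOR = 0b0110
```

This is a model of the selection behavior only; the chapter's point is that the same selection is done in hardware by switching weight groups with the MSB.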
Figure 5. Realization of logic functions using MNNs (the first neuron in the hidden layer).
Table 5. A comparison between MNNs and non-MNNs used to implement 16 logic functions.
In the previous section, to simplify the problem, we made a division in the input; here is an example of division in the output. According to the truth table shown in Table , instead of treating the problem as mapping 4 bits of input to 4 bits of output, we may deal with each bit of the output alone. Non-MNNs can realize the 2-bit multiplier with a three-layer network with a total number of weights. The hidden layer contains neurons, while the output one has 4 neurons. Using MNNs, we may simplify the problem as follows:
W = CA    (1)
X = AD ⊕ BC = AD(B̄ + C̄) + BC(Ā + D̄) = (AD + BC)(Ā + B̄ + C̄ + D̄)    (2)
Y = BD(Ā + C̄) = BD(Ā + B̄ + C̄ + D̄)    (3)
Z = ABCD    (4)
In“egra“ing Mod”lari“y and Reconfig”rabili“y for Perfec“ Implemen“a“ion of Ne”ral Ne“works 169
h““p://dx.doi.org/10.5772/53021
Equations (1), (3), and (4) can be implemented using only one neuron each. The third term in Equation (2) can be implemented using the output from bit Z with a negative inhibitory weight; this eliminates the need to use two neurons to represent Ā and D̄. Equation (2) resembles an XOR, but we must first obtain AD and BC. AD can be implemented using only one neuron. Another neuron is used to realize BC and, at the same time, to OR AD with BC as well as to AND the result with the complement of ABCD, as shown in Fig. 6. A comparison between MNNs and non-MNNs used to implement the 2-bit digital multiplier is listed in Table 7.
Table 7. A comparison between MNNs and non-MNNs used to implement a 2-bit digital multiplier.
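The output decomposition can be checked numerically against ordinary multiplication. In the sketch below, A and C are assumed to be the low-order bits of the two operands (so that W = CA is the LSB of the product); this ordering is inferred from the equations themselves, since the full truth table is not reproduced here:

```python
# Verify the per-bit decomposition of the 2-bit multiplier (B A) x (D C).
def product_bits(A, B, C, D):
    p = (2 * B + A) * (2 * D + C)            # the actual 2-bit product, 0..9
    return [(p >> k) & 1 for k in range(4)]  # [W, X, Y, Z] from LSB to MSB

def decomposed(A, B, C, D):
    W = C & A                                # Eq. (1)
    X = (A & D) ^ (B & C)                    # Eq. (2), the XOR form
    Y = (B & D) & (1 - (A & C))              # Eq. (3): BD(not A or not C)
    Z = A & B & C & D                        # Eq. (4)
    return [W, X, Y, Z]
```

Exhaustive comparison over all 16 input combinations confirms that the four single-output equations reproduce the multiplier exactly.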
Advances in MOS VLSI have made it possible to integrate neural networks of large sizes on a single chip [ , ]. Hardware realizations make it possible to execute the forward-pass operation of neural networks at high speeds, thus making neural networks possible candidates for real-time applications. Other advantages of hardware realizations compared to software implementations are the lower per-unit cost and the smaller system size.
One of the main reasons for using analog electronics to realize network hardware is that simple analog circuits (for example, adders, sigmoids, and multipliers) can realize several of the operations in neural networks. Nowadays, there is a growing demand for large as well as fast neural processors to provide solutions for difficult problems. Designers may use either analog or digital technologies to implement neural network models. The analog approach boasts compactness and high speed. On the other hand, digital implementations offer flexibility and adaptability, but only at the expense of speed and silicon area consumption.
Implementing analog neural networks means using only analog computation [ , , ]. An artificial neural network, as the name indicates, is an interconnection of artificial neurons that tends to simulate the nervous system of the human brain [ ]. Neural networks are modeled as simple processors (neurons) that are connected together via weights. The weights can be positive (excitatory) or negative (inhibitory). Such weights can be realized by resistors, as shown in Fig. 7.
Figure 7. Implementation of positive and negative weights using only one op-amp.
The computed weights may have positive or negative values. The corresponding resistors that represent these weights can be determined as follows [ ]:

w_in = −R_f / R_in,   i = 1, 2, …, n

W_pp = (1 + Σ_i w_in) (R_o / R_pp) / (1 + R_o/R_p1 + … + R_o/R_pp)
The exact values of these resistors can be calculated as presented in [ , ]. The summing circuit accumulates all the input-weighted signals and then passes them to the output through the transfer function [ ]. The main problem with electronic neural networks is the realization of the resistors, which are fixed and pose many problems in hardware implementation [ ]. Such resistors are not easily adjustable or controllable. As a consequence, they can be used neither for learning nor for recall when another task needs to be solved. So the calculated resistors corresponding to the obtained weights can be implemented by using CMOS transistors operating in continuous mode (triode region), as shown in Fig. . The equivalent resistance between terminals 1 and 2 is given by [ ]
172 Artificial Neural Networks – Architectures and Applications
R_eq = 1 / [ K ( V_g − V_th ) ]
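As a numerical illustration, the two relations above can be turned into small helper routines: a negative weight w = −Rf/Rin fixes the input resistor as Rin = Rf/|w|, and the triode-region expression Req = 1/(K(Vg − Vth)) can be inverted to find the gate voltage that programs a target resistance. This is only a sketch; the transconductance parameter K and the voltages used below are illustrative values, not ones taken from the chapter.

```python
def rin_for_weight(w, r_f):
    """Input resistor realizing a negative weight via w = -Rf/Rin."""
    if w >= 0:
        raise ValueError("the inverting branch realizes negative weights only")
    return r_f / abs(w)

def cmos_req(k, v_g, v_th):
    """Triode-region equivalent resistance Req = 1 / (K * (Vg - Vth))."""
    if v_g <= v_th:
        raise ValueError("transistor must be on: Vg > Vth")
    return 1.0 / (k * (v_g - v_th))

def gate_voltage_for(r_target, k, v_th):
    """Invert Req to find the gate voltage that programs a target resistance."""
    return v_th + 1.0 / (k * r_target)

# Illustrative values: a weight of -10 with Rf = 1 kOhm needs Rin = 100 Ohm.
print(rin_for_weight(-10, 1000))   # 100.0
```

Adjusting the gate voltage therefore tunes the realized weight, which is exactly what fixed resistors cannot do.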
. . Reconfigurability
The interconnection of synapses and neurons determines the topology of a neural network.
Reconfigurability is defined as the ability to alter the topology of the neural network [ ].
Using switches in the interconnections between synapses and neurons permits one to
change the network topology, as shown in Fig. . These switches are called "reconfiguration
switches".
The concept of reconfigurability should not be confused with weight programmability. Weight
programmability is defined as the ability to alter the values of the weights in each synapse.
In Fig. , weight programmability involves setting the values of the weights w1, w2, w3,....,
wn. Although reconfigurability can be achieved by setting the weights of some synapses to zero,
this would be very inefficient in hardware.
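The distinction between the two notions can be sketched in a few lines: reconfiguration switches act as a boolean mask over the synapse matrix, while weight programmability changes the stored values themselves. The matrices below are illustrative, not taken from the chapter.

```python
def effective_weights(weights, switches):
    """Topology = stored weights gated by reconfiguration switches (True = closed).

    Weight programmability edits entries of `weights`; reconfigurability
    edits entries of `switches` without touching the stored weights.
    """
    return [
        [w if closed else 0.0 for w, closed in zip(w_row, s_row)]
        for w_row, s_row in zip(weights, switches)
    ]

W = [[0.5, -1.2], [2.0, 0.8]]
S = [[True, False], [True, True]]   # one open switch removes a connection
print(effective_weights(W, S))      # [[0.5, 0.0], [2.0, 0.8]]
```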
In a previous paper [ ], a neural design for logic functions using modular neural networks
was presented. Here, a simple design for the arithmetic unit using reconfigurable
neural networks is presented. The aim is to obtain a complete design for an ALU by exploiting the
benefits of both modular and reconfigurable neural networks.
X Y Z C S B D
0 0 0 0 0 0 0
0 0 1 0 1 1 1
0 1 0 0 1 1 1
0 1 1 1 0 1 0
1 0 0 0 1 0 1
1 0 1 1 0 0 0
1 1 0 1 0 0 0
1 1 1 1 1 1 1
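Assuming the table rows list the three input bits followed by carry, sum, borrow, and difference (the header is garbled in the source, but this reading reproduces every row), the truth table above can be regenerated from the standard full-adder and full-subtractor logic:

```python
def adder_subtractor_row(x, y, z):
    """One truth-table row: carry and sum of x+y+z, borrow and difference of x-y-z."""
    s = x ^ y ^ z                           # sum bit
    c = (x & y) | (x & z) | (y & z)         # carry-out
    d = x ^ y ^ z                           # difference bit (same XOR form)
    b = ((1 - x) & (y | z)) | (y & z)       # borrow-out
    return (c, s, b, d)

table = [adder_subtractor_row(x, y, z)
         for x in (0, 1) for y in (0, 1) for z in (0, 1)]
```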
The computed values of the weights and their corresponding resistor values are described in
Table . After completing the design of the network, simulations were carried out to test both
the design and the performance of this network using H-SPICE. Experimental results confirm
the proposed theoretical considerations. Fig. shows the construction of the full-adder/full-
subtractor neural network. The network consists of three neurons and -connection
weights.
A 2-bit digital multiplier can be realized easily using the traditional feed-forward artificial
neural network [ ]. As shown in Fig. , the implementation of a 2-bit digital multiplier using
the traditional architecture of a feed-forward artificial neural network requires -neurons
and -synaptic weights in the input-hidden layer, and -neurons and -synaptic weights in
the hidden-output layer. Hence, the total number of neurons is -neurons with -synaptic
weights.
Figure 11. 2-bit digital multiplier using traditional feed-forward neural network.
In the present work, a new design of the 2-bit digital multiplier has been adopted. The new
design requires only -neurons with -synaptic weights, as shown in Fig. . The network
receives two digital words, each word of 2 bits, and the output of the network gives the
resulting multiplication. The network is trained with the training set shown in Table .
I/P O/P
B2 B1 A2 A1 O4 O3 O2 O1
0 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0
0 0 1 0 0 0 0 0
0 0 1 1 0 0 0 0
0 1 0 0 0 0 0 0
0 1 0 1 0 0 0 1
0 1 1 0 0 0 1 0
0 1 1 1 0 0 1 1
1 0 0 0 0 0 0 0
1 0 0 1 0 0 1 0
1 0 1 0 0 1 0 0
1 0 1 1 0 1 1 0
1 1 0 0 0 0 0 0
1 1 0 1 0 0 1 1
1 1 1 0 0 1 1 0
1 1 1 1 1 0 0 1
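The multiplier training set above is simply the 4-bit binary product of the two 2-bit input words, so it can be regenerated directly:

```python
def multiplier_row(b, a):
    """4-bit product O4 O3 O2 O1 of the 2-bit words B and A."""
    p = a * b
    return [(p >> k) & 1 for k in (3, 2, 1, 0)]  # most significant bit first

# Enumerate the table in the same order: B2 B1 outer, A2 A1 inner.
rows = [(b, a, multiplier_row(b, a)) for b in range(4) for a in range(4)]
```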
During the training phase, these input/output pairs are fed to the network, and in each iteration
the weights are modified until they reach their optimal values. The optimal values of the
weights and their corresponding resistance values are shown in Table . The proposed circuit
has been realized in hardware, and the results have been tested using the H-SPICE computer
program. Both the hardware and computer results are found to be very close to the correct results.
Table 11. Weight values and their corresponding resistance values for the digital multiplier.
A 2-bit digital divider can be realized easily using the artificial neural network. As shown in
Fig. , the implementation of a 2-bit digital divider using a neural network requires -neurons
and -synaptic weights in the input-hidden layer, and -neurons and -synaptic weights in the
hidden-output layer. Hence, the total number of neurons is -neurons with -synaptic
weights. The network receives two digital words, each word of 2 bits, and the output of the
network gives two digital words, one for the resulting division and the other for the resulting
remainder. The network is trained with the training set shown in Table .
I/P O/P
B2 B1 A2 A1 O4 O3 O2 O1
0 0 0 0 1 1 1 1
0 0 0 1 0 0 0 0
0 0 1 0 0 0 0 0
0 0 1 1 0 0 0 0
0 1 0 0 1 1 1 1
0 1 0 1 0 0 0 1
0 1 1 0 0 1 0 0
0 1 1 1 0 1 0 0
1 0 0 0 1 1 1 1
1 0 0 1 0 0 1 0
1 0 1 0 0 0 0 1
1 0 1 1 1 0 0 0
1 1 0 0 1 1 1 1
1 1 0 1 0 0 1 1
1 1 1 0 0 1 0 1
1 1 1 1 0 0 0 1
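Reading the table as B ÷ A with the remainder on O4 O3, the quotient on O2 O1, and the division-by-zero rows (A = 00) flagged as 1 1 1 1 reproduces every entry. This column interpretation is inferred from the table values, not stated explicitly in the source:

```python
def divider_row(b, a):
    """O4..O1 for the 2-bit divider: remainder then quotient of B // A.

    Rows with A == 0 (division by zero) are flagged with all outputs high,
    matching the 1 1 1 1 entries in the training set.
    """
    if a == 0:
        return [1, 1, 1, 1]
    q, r = divmod(b, a)
    return [(r >> 1) & 1, r & 1, (q >> 1) & 1, q & 1]

# Same enumeration order as the table: B2 B1 outer, A2 A1 inner.
rows = [divider_row(b, a) for b in range(4) for a in range(4)]
```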
The values of the weights and their corresponding resistance values are shown in Table .
A1 -17.5 56
A2 -17.5 56
(1) B1 5 2700
B2 5 2700
Bias 5 2700
A1 7.5 1200
A2 7.5 1200
(2) B1 -10 100
B2 7.5 1200
Bias -17.5 56
A1 7.5 1200
A2 -10 100
(3)
B2 7.5 1200
Bias -10 100
A1 -4.5 220
A2 7.5 1200
(4) B1 7.5 1200
B2 -4.5 220
Bias -10 100
A1 -20 50
A2 -30 33
B1 10 1200
(5)
B2 25 500
N3 -25 40
Bias 17.5 700
N1 10 1000
(6) N3 10 1000
Bias -5 220
N1 10 1000
(7) N4 10 1000
Bias -5 220
N1 10 1000
(8) N2 10 1000
Bias -5 220
Table 13. Weight values and their corresponding resistance values for the digital divider.
The results have been tested using the H-SPICE computer program. Computer results are found
to be very close to the correct results.
Arithmetic operations, namely addition, subtraction, multiplication, and division, can be
realized easily using a reconfigurable artificial neural network. The proposed network consists
of only -neurons, -connection weights, and -reconfiguration switches. Fig. shows
the block diagram of the arithmetic operation using the reconfigurable neural network. The
network includes a full-adder, a full-subtractor, a 2-bit digital multiplier, and a 2-bit digital divider.
The proposed circuit is realized in hardware, and the results are tested using the H-SPICE
computer program. Both the hardware and computer results are found to be very close to the
correct results.
[Block diagram: input neurons A1, A2, B1, B2 feed, through the connection weights and reconfiguration switches, the neurons of the Full-Adder, Full-Subtractor, 2-Bit Digital Multiplier, and 2-Bit Digital Divider modules; selection lines C1 and C2 choose the operation, and the outputs appear on O1-O4.]
Figure 14. Block diagram of arithmetic unit using reconfigurable neural network.
The computed values of the weights and their corresponding resistor values are described in
Tables , , . After completing the design of the network, simulations were carried out to
test both the design and the performance of this network using H-SPICE. Experimental results
confirm the proposed theoretical considerations, as shown in Tables , .
. Conclusion
We have presented a new model of neural nets for classifying patterns that appear expensive
to solve using conventional models of neural nets. This approach has been introduced
to realize different types of logic problems. It can also be applied to manipulate non-
binary data. We have shown that, compared to non-modular networks, realizing problems with
MNNs results in a reduction of the number of computations, neurons, and weights.
Table 14. Practical and simulation results after the summing circuit of the full-adder/full-subtractor.
Neuron (1) Neuron (2) Neuron (3) Neuron (4) Neuron (5)
Pract. Sim. Pract. Sim. Pract. Sim. Pract. Sim. Pract. Sim.
-2.79 -3.415 -2.79 -3.409 -2.79 -3.413 -2.79 -3.447 -2.79 -3.415
-2.34 -2.068 -2.72 -2.498 -2.79 -3.314 -2.78 -3.438 -2.79 -3.415
-2.79 -3.415 -2.79 -3.409 -1.63 -1.355 -2.78 -3.438 -2.34 -2.068
-2.34 -2.068 -2.72 -2.498 -1.63 -1.355 -2.78 -3.423 -2.34 -2.068
-2.34 -2.068 -2.79 -3.409 -2.79 -3.413 -2.78 -3.438 -2.34 -2.068
3.46 3.390 -2.72 -2.498 -2.79 -3.413 -2.78 -3.423 -2.34 -2.068
-2.34 -2.068 3.45 3.397 -1.63 -1.355 -2.78 -3.423 3.46 3.390
3.46 3.390 3.45 3.424 -1.63 -1.355 -2.74 -3.384 3.46 3.390
-2.79 -3.415 -2.72 -2.498 -1.63 -1.355 -2.78 -3.438 -2.79 -3.415
-2.34 -2.068 3.45 3.373 -1.63 -1.355 -2.78 -3.423 -2.79 -3.415
-2.79 -3.415 -2.72 -2.498 3.45 3.399 -2.78 -3.423 -2.34 -2.068
-2.34 -2.068 3.45 3.373 3.45 3.399 -2.74 -3.384 -2.34 -2.068
-2.34 -2.068 -2.72 -2.498 -1.63 -1.355 -2.78 -3.423 -2.34 -2.068
3.46 3.390 3.45 3.373 -1.63 -1.355 -2.74 -3.384 -2.34 -2.068
-2.34 -2.068 3.45 3.373 3.45 3.399 -2.74 -3.384 3.46 3.390
3.46 3.390 -2.73 -3.398 -2.70 -2.710 1.86 2.519 3.46 3.390
Table 15. Practical and simulation results after the summing circuit of the 2-bit digital multiplier.
Author details
Hazem M. El-Bakry
References
[ ] G. Auda, M. Kamel, H. Raafat. Voting schemes for cooperative neural network classifiers. IEEE Trans. on Neural Networks, ICNN, Vol. , Perth, Australia, pp. - , November .
[ ] G. Auda and M. Kamel. CMNN: Cooperative Modular Neural Networks for pattern recognition. Pattern Recognition Letters, Vol. , pp. - .
[ ] E. Alpaydin. Multiple networks for function learning. Int. Conf. on Neural Networks, Vol. , CA, USA, pp. - .
[ ] A. Waibel. Modular construction of time delay neural networks for speech recognition. Neural Computing, pp. - .
[ ] H. P. Graf and L. D. Jackel. Analog electronic neural network circuits. IEEE Circuits and Devices Mag., vol. , pp. - , July .
h““p://dx.doi.org/10.5772/51273
. Introduction
High Energy Physics (HEP), targeting particle physics, searches for the fundamental particles
and forces that construct the world surrounding us, and seeks to understand how our universe
works at its most fundamental level. Elementary particles of the Standard Model are
gauge bosons (force carriers) and fermions, which are classified into two groups: leptons
(i.e. muons, electrons, etc.) and quarks (the constituents of protons, neutrons, etc.).
The study of the interactions between those elementary particles requires enormously high-energy
collisions, as in the LHC [ - ], up to the highest-energy hadron collider in the world at √s
= TeV. Experimental results provide excellent opportunities to discover the missing particles
of the Standard Model. In addition, the LHC may point the way toward
physics beyond the Standard Model.
The proton-proton (p-p) interaction is one of the fundamental interactions in high-energy
physics. In order to fully exploit the enormous physics potential, it is important to have a
complete understanding of the reaction mechanism. The particle multiplicity distributions,
among the first measurements made at the LHC, are used to test various particle production
models, which are based on different physics mechanisms, and also provide constraints on model
features. Some of these models are based on the string fragmentation mechanism [ - ] and
some on Pomeron exchange [ ].
Recently, different modeling methods based on soft computing systems include the application
of Artificial Intelligence (AI) techniques. Such evolutionary algorithms have a strong
presence in that field [ - ]. The behavior of p-p interactions is complicated
due to the nonlinear relationship between the interaction parameters and the output. To
understand the interactions of fundamental particles, multi-part data analyses are needed and
AI techniques are vital. These techniques are becoming useful as alternate approaches to
© 2013 Radi and Hindawi; licensee InTech. This is an open access article distributed under the terms of the
Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits
unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
The motivation for using a NN approach is its learning algorithm, which learns the relationships
between variables in sets of data and then builds models to explain these relationships
mathematically.
In this chapter, we have discovered the functions that describe the multiplicity distribution of
the charged shower particles of p-p interactions at different high energies using the
GA-ANN technique. This chapter is organized into five sections. Section gives a review of the
basics of the NN and GA techniques. Section explains how NN and GA are used to model the p-p
interaction. Finally, the results and conclusions are provided in sections and , respectively.
An ANN is a network of artificial neurons which can store, gain, and utilize knowledge.
Some researchers in ANNs decided that the name "neuron" was inappropriate and used
other terms, such as "node". However, the use of the term neuron is now so deeply established
that its continued general use seems assured. A way to encompass the NNs studied in
the literature is to regard them as dynamical systems controlled by synaptic matrices, i.e.
Parallel Distributed Processes (PDPs) [ ].
In the following sub-sections we introduce some of the concepts and the basic components
of NNs.
A processing neuron is based on neural functionality that equals the summation of the products
of the input pattern elements {x1, x2,..., xp} and their corresponding weights {w1, w2,..., wp}, plus
the bias. Some important concepts associated with this simplified neuron are defined below.
Let ui(ℓ) be the ith neuron in the ℓth layer. The input layer is called the xth layer and the output
layer is called the Oth layer. Let nℓ be the number of neurons in the ℓth layer. The weight of
the link between neuron uj(ℓ) in layer ℓ and neuron ui(ℓ+1) in layer ℓ+1 is denoted by wij(ℓ). Let
{x1, x2,..., xp} be the set of input patterns that the network is supposed to learn to classify,
and let {d1, d2,..., dp} be the corresponding desired output patterns. It should be noted
that xp is an n-dimensional vector {x1p, x2p,..., xnp} and dp is an n-dimensional vector
{d1p, d2p,..., dnp}. The pair (xp, dp) is called a training pattern.
The output of a neuron ui is the input xip for input pattern p. For the other layers, the network
input net_pi(ℓ+1) to a neuron ui(ℓ+1) for the input xpi(ℓ+1) is usually computed as follows:
Applying Artificial Neural Network Hadron-Hadron Collisions at LHC 185
http://dx.doi.org/10.5772/51273
net_pi(ℓ+1) = Σ_{j=1}^{nℓ} w_ij(ℓ) o_pj(ℓ) + θ_i(ℓ+1)
where o_pj(ℓ) = x_pi(ℓ+1) is the output of the neuron uj(ℓ) of layer ℓ and θ_i(ℓ+1) is the bias
value of neuron ui(ℓ+1) of layer ℓ+1. For the sake of a homogeneous representation, θ_i is often
substituted by a "bias neuron" with a constant output of 1. This means that biases can be treated
like weights, which is done throughout the remainder of the text.
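The net-input formula can be sketched directly, including the bias-neuron trick just described: folding the bias in as one extra weight attached to a constant output of 1 gives the same value as adding it separately.

```python
def net_input(weights, outputs_prev, bias):
    """net_pi = sum_j w_ij * o_pj + bias_i for one neuron."""
    return sum(w * o for w, o in zip(weights, outputs_prev)) + bias

def net_input_bias_neuron(weights_with_bias, outputs_prev):
    """Same quantity with the bias treated as a weight from a constant-1 'bias neuron'."""
    return net_input(weights_with_bias[:-1], outputs_prev, 0.0) + weights_with_bias[-1] * 1.0

w, o, b = [0.5, -1.0, 2.0], [1.0, 0.5, 0.25], 0.1
assert net_input(w, o, b) == net_input_bias_neuron(w + [b], o)
```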
. . Activation Functions
The activation function converts the neuron input to its activation (i.e. a new state of activation)
by f(netp). This allows variations of the input conditions to affect the output, usually
denoted Op.
The sigmoid function, a non-linear function, is often used as an activation function.
The logistic function is an example of a sigmoid function, of the following form:
o_pi = f(net_pi) = 1 / (1 + e^(−β·net_pi))
where β determines the steepness of the activation function. In the rest of this chapter we
assume that β = 1.
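A minimal implementation of the logistic activation, with the steepness β as a parameter:

```python
import math

def logistic(net, beta=1.0):
    """Sigmoid activation o = 1 / (1 + exp(-beta * net))."""
    return 1.0 / (1.0 + math.exp(-beta * net))

print(logistic(0.0))   # 0.5: zero net input sits at the midpoint of the sigmoid
```

Larger β makes the transition between 0 and 1 sharper around net = 0.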
. . Network Architectures
Figure 1. The three layers (input, hidden and output) of neurons are fully interconnected.
The input layer receives an external activation vector and passes it via weighted connections
to the neurons in the first hidden layer [ ]. An example of this arrangement, a three-layer
NN, is shown in Fig. 1. This is a common form of NN.
To use a NN, it is essential to have some form of training, through which the values of
the weights in the network are adjusted to reflect the characteristics of the input data.
When the network is trained sufficiently, it will produce nearly correct output for
a presented set of input data.
A set of well-defined rules for the solution of a learning problem is called a learning algorithm.
No unique learning algorithm exists for the design of NNs. Learning algorithms differ
from each other in the way in which the adjustment Δwij to the synaptic weight wij is
formulated. In other words, the objective of the learning process is to tune the weights in the
network so that the network performs the desired mapping of input to output activation.
NNs are claimed to have the feature of generalization, through which a trained NN is able
to provide correct output for a set of previously unseen input data. Training determines
the generalization capability of the network structure.
Supervised learning is a class of learning rules for NNs in which teaching is provided by
telling the network the output required for a given input. Weights are adjusted in the learning
system so as to minimize the difference between the desired and actual outputs for each input
training datum. An example of a supervised learning rule is the delta rule, which aims to
minimize the error function. This means that the actual response of each output neuron in
the network approaches the desired response for that neuron. This is illustrated in Fig. .
The error e_pi for the ith neuron ui(o) of the output layer o for the training pair (xp, tp) is computed as
e_pi = t_pi − o_pi(o)
This error is used to adjust the weights in such a way that the error is gradually reduced.
The training process stops when the error for every training pair is reduced to an acceptable
level, or when no further improvement is obtained.
A method known as learning by epoch first sums gradient information for the whole
pattern set and then updates the weights. This method is also known as batch learning,
and most researchers use it for its good performance [ ]. Each weight update tries to
minimize the summed error of the pattern set. The error function can be defined for one
training pattern pair (xp, dp) as
E_p = (1/2) Σ_{i=1}^{n_o} e_pi²
Then, the error function can be defined for all the patterns (known as the Total Sum of
Squared (TSS) errors) as
E = (1/2) Σ_{p=1}^{m} Σ_{i=1}^{n} e_pi²
The most desirable condition that we could achieve in any learning algorithm training is e_pi
≈ 0. Obviously, if this condition holds for all patterns in the training set, we can say that the
algorithm has found a global minimum.
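The two error definitions translate directly into code: E_p for a single training pattern, and the TSS error summed over the whole pattern set.

```python
def pattern_error(targets, outputs):
    """E_p = 1/2 * sum_i (t_pi - o_pi)^2 for one training pattern."""
    return 0.5 * sum((t - o) ** 2 for t, o in zip(targets, outputs))

def total_sum_squared(target_set, output_set):
    """TSS error: E = sum over all patterns p of E_p."""
    return sum(pattern_error(t, o) for t, o in zip(target_set, output_set))

print(pattern_error([1.0, 0.0], [0.5, 0.5]))   # 0.25
```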
The weights in the network are changed along a search direction, to drive the weights in the
direction of the estimated minimum. The weight updating rule for the batch mode is given by
where wij(s+1) is the updated weight of wij(ℓ) of layer ℓ in the sth learning step, and s is the step
number in the learning process.
In training a network, the available input data set consists of many facts and is normally
divided into two groups. One group of facts is used as the training data set, and the second
group is retained for checking and testing the accuracy of the performance of the network
after training. The proposed ANN model was trained using the Levenberg-Marquardt
optimization technique [ ].
Data collected from experiments are divided into two sets, namely a training set and a testing
set. The training set is used to train the ANN model by adjusting the link weights of the network
model; it should include data covering the entire experimental space. This
means that the training data set has to be fairly large to contain all the required information
and must include a wide variety of data from different experimental conditions, including
different formulation compositions and process parameters.
Initially, the training error keeps dropping. If the error stops decreasing, or alternatively
starts to rise, the ANN model starts to over-fit the data, and at this point the training must
be stopped. In case over-fitting or over-learning occurs during the training process, it is usually
advisable to decrease the number of hidden units and/or hidden layers. In contrast, if
the network is not sufficiently powerful to model the underlying function, over-learning is
not likely to occur, and the training errors will drop to a satisfactory level.
. . Introduction
selective pressure, which gives better solutions a greater opportunity to reproduce, and the
heritability of features from parent to children: we need to ensure that the process of reproduction
keeps most of the features of the parent solution and yet allows for variety so that
new features can be explored [ ].
The advantage of GAs is that they have a consistent structure for different problems. Accordingly,
one GA can be used for a variety of optimization problems. GAs are used in a number of
different application areas [ ]. A GA is capable of finding good solutions quickly [ ]. Also, the
GA is inherently parallel, since a population of potential solutions is maintained.
To solve an optimization problem, a GA requires four components and a termination criterion
for the search. The components are: a representation (encoding) of the problem, a fitness
evaluation function, a population initialization procedure, and a set of genetic operators.
In addition, there is a set of GA control parameters, predefined to guide the GA, such as
the size of the population, the method by which genetic operators are chosen, the probabilities
of each genetic operator being chosen, the choice of methods for implementing probability
in selection, the probability of mutation of a gene in a selected individual, the method
used to select a crossover point for the recombination operator, and the seed value used for
the random number generator.
The structure of a typical GA can be described as follows [ ]:
In the algorithm, an initial population is generated in line . Then, the algorithm computes
the fitness for each member of the initial population in line . Subsequently, a loop is entered,
based on whether or not the algorithm's termination criteria are met, in line . Line
contains the control code for the inner loop, in which a new generation is created. Lines
through contain the part of the algorithm in which new individuals are generated. First, a
genetic operator is selected. The required number of parents for that operator are then
selected. The operator is then applied to generate one or more new children. Finally, the new
children are added to the new generation.
Lines and serve to close the outer loop of the algorithm. Fitness values are computed
for each individual in the new generation. These values are used to guide simulated natural
selection in the new generation. The termination criterion is tested, and the algorithm is either
repeated or terminated.
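The loop described above can be sketched as a generic skeleton. The operator set, selection scheme, and termination test are supplied by the caller; any concrete problem plugged into it (such as a bitstring toy problem) is purely illustrative and not one treated in the chapter.

```python
import random

def run_ga(init_pop, fitness, operators, select, done, rng=None):
    """Skeleton of the GA loop described in the text: evaluate, breed, repeat.

    `operators` is a list of (operator, number_of_parents) pairs; an operator
    takes (parents, rng) and returns a list of children.
    """
    rng = rng or random.Random(0)
    population = init_pop(rng)                         # generate the initial population
    scores = [fitness(ind) for ind in population]      # fitness of the initial population
    while not done(population, scores):                # termination criterion
        new_gen = []
        while len(new_gen) < len(population):          # inner loop: build a new generation
            op, n_parents = rng.choice(operators)      # select a genetic operator
            parents = [select(population, scores, rng) for _ in range(n_parents)]
            new_gen.extend(op(parents, rng))           # apply it; collect the children
        population = new_gen[:len(population)]
        scores = [fitness(ind) for ind in population]  # fitness of the new generation
    best = max(range(len(population)), key=lambda i: scores[i])
    return scores[best], population[best]
```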
The most significant differences among GAs are:
Genetic connectionism combines genetic search and connectionist computation. GAs have
been applied successfully to the problem of designing NNs with supervised learning processes,
for evolving the architecture suitable for the problem [ - ]. However, these applications
do not address the problem of training neural networks, since they still depend on
other training methods to adjust the weights.
GAs have been used for training NNs, either with fixed architectures or in combination with
constructive/destructive methods. This can be done by replacing traditional learning algorithms
such as gradient-based methods [ ]. Not only have GAs been used to perform
weight training for supervised learning and for reinforcement learning applications, but
they have also been used to select training data and to translate the output behavior of NNs
[ - ]. GAs have been applied to the problem of finding NN architectures [ - ], where an
architecture specification indicates how many hidden units a network should have and how
these units should be connected.
The key process in the evolutionary design of neural architectures is shown in Fig. . The
topologies of the network have to be defined before any training process. The definition of the
architecture has great weight on the network performance and on the effectiveness and efficiency
of the learning process. As discussed in [ ], the alternative provided by destructive and
constructive techniques is not satisfactory.
The design of the network architecture can be viewed as a search in the architecture space,
in which each point represents a different topology. The search space is huge, even with a limited
number of neurons and controlled connectivity. Additionally, the search space makes
things even more difficult in some cases: for instance, networks with different topologies
may show similar learning and generalization abilities, while, conversely, networks with
similar structures may have different performances. In addition, the performance evaluation
depends on the training method and on the initial conditions (weight initialization) [ ].
Building architectures by means of GAs is strongly reliant on how the features of the
network are encoded in the genotype. Using a bitstring is not necessarily the best approach
to evolving the architecture. Therefore, a decision has to be made concerning how the
information about the architecture should be encoded in the genotype.
To find good NN architectures using GAs, we should know how to encode architectures
(neurons, layers, and connections) in the chromosomes that can be manipulated by the GA.
Encoding of NNs onto a chromosome can take many different forms.
This study proposes a hybrid model combining an ANN and a GA (we call it the GA-ANN
hybrid model) for optimization of the weights of feed-forward neural networks, to improve
the effectiveness of the ANN model, assuming that the structure of these networks has been
decided. The genetic algorithm is run to find the optimal parameters of the architectures: the
weights and biases of all the neurons, which are joined to create vectors. We construct a genetic
algorithm which can search for the global optimum of the number of hidden units and
the connection structure between the input and the output layers. During the weight training
and adjusting process, the fitness function of a neural network can be defined by considering
an important factor: the error, which is the difference between target and actual outputs.
In this work, we defined the fitness function as the sum of squared errors (SSE). The approach is
to use a GA-ANN model that is intelligent enough to discover functions for the p-p interaction
mean multiplicity distribution of charged particles with respect to the total center of
mass energy. The model is trained and validated using experimental data to simulate the p-p
interaction. The GA-ANN has the potential to discover a new model; the data sets
are subdivided into two sets (training and prediction). The GA-ANN discovers a new model by
using the training set, while the prediction set is used to examine its generalization capabilities.
To measure the error between the experimental data and the simulated data we used
a statistical measure: the total deviation of the response values from the fit to the response
values, also called the summed square of residuals, usually labeled SSE. The
statistical measure of the sum of squared errors (SSE) is
SSE = Σ_{i=1}^{n} ( y_i − ŷ_i )²
where ŷ_i = b0 + b1 x_i is the predicted value for x_i and y_i is the observed data value occurring at x_i.
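The SSE measure and the linear predictor ŷ = b0 + b1·x can be written out in a few lines (the coefficient names b0, b1 follow the formula above; the sample data are illustrative):

```python
def sse(y_obs, x, b0, b1):
    """Summed square of residuals between data and the fitted line b0 + b1*x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for yi, xi in zip(y_obs, x))

# A perfect linear fit gives SSE == 0:
x = [0.0, 1.0, 2.0]
y = [1.0, 3.0, 5.0]        # y = 1 + 2x exactly
print(sse(y, x, 1.0, 2.0)) # 0.0
```

A GA minimizing this quantity as its fitness function drives the model toward the observed data.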
The proposed GA-ANN hybrid model has been used to model the multiplicity distribution
of the charged shower particles. The proposed model was trained using the Levenberg-Marquardt
optimization technique [ ]. The architecture of the GA-ANN has three inputs and one
output. The inputs are the charged-particle multiplicity n, the total center of mass energy
√s, and the pseudorapidity η. The output is the charged-particle multiplicity distribution
P(n). Figure shows the schematic of the GA-ANN model.
Data collected from experiments are divided into two sets, namely a training set and a testing
set. The training set is used to train the GA-ANN hybrid model. The testing data set is used
to confirm the accuracy of the proposed model. It ensures that the relationships between inputs
and outputs, based on the training and test sets, are real. The data set is divided into
two groups: % for training and % for testing. For completeness, the final weights
and biases after training are given in Appendix A.
Table 1. Comparison between the different training algorithms (ANN and GA-ANN) for the charged-particle multiplicity distribution.
In this model we have obtained the minimum error = . by using the GA. Table shows a
comparison between the ANN model and the GA-ANN model for the prediction of the
pseudorapidity distribution. In the x x x ANN structure, we used connections
and obtained an error equal to . , while the number of connections in the GA-ANN model is .
We noticed in the ANN model that by increasing the number of connections to
, the error decreases to . , but this requires more calculation. By using the GA optimization
search, we have obtained the structure which minimizes the number of connections
to only , with an error of . . This indicates that the GA-ANN hybrid model is more
efficient than the ANN model.
Figure 4. ANN and GA-ANN simulation results for charged-particle multiplicity distribution of p-p showers.
. Conclusions
This chapter presents the GA-ANN as a new technique for constructing the functions of the
multiplicity distribution of charged particles, P(n, η, √s), of the p-p interaction. The discovered
models show a good match to the experimental data. Moreover, they are capable of testing
experimental data for P(n, η, √s) that were not used in the training session.
Consequently, the testing values of P(n, η, √s) in terms of the same parameters are in good
agreement with the experimental data from the Particle Data Group. Finally, we conclude that
the GA-ANN has become one of the important research approaches in the field of high-energy physics.
Appendices
Wmk = [ . . . - . - . - . - . . . - . . . . - . . ].
bi = [- . - . . ].
bj = [- . - . . - . . . . . - . . . - . . . - . ].
bk = [ . - . . - . . . - . - . - . - . . - . . - . ].
bm = [- . ].
The optimized GA-ANN
The standard GA has been used. The parameters are given as follows: generations = ,
population = , probability of crossover = . , probability of mutation = . ; the fitness
function is the SSE. A neural network of neurons had been optimized.
Acknowledgements
The authors highly acknowledge and deeply appreciate the support of the Egyptian Academy
of Scientific Research and Technology (ASRT) and the Egyptian Network for High Energy
Physics (ENHEP).
Author details
Department of Physics, Faculty of Sciences, Ain Shams University, Abbassia, Cairo, Egypt
References
[ ] R. Engel, Z. Phys. C ; R. Engel, J. Ranft and S. Roesler, Phys. Rev. D .
[ ] Koza, J. R., Bennett, F. H., Andre, D., & Keane, M. A. Genetic Programming III: Darwinian Invention and Problem Solving. Morgan Kaufmann.
[ ] Banzhaf, W., Nordin, P., Keller, R. E., & Francone, F. D. Genetic Programming: An Introduction: On the Automatic Evolution of Computer Programs and its Applications. Morgan Kaufmann.
[ ] Mitchell, M. An Introduction to Genetic Algorithms. MIT Press.
[ ] Darwin, C. The Autobiography of Charles Darwin: With original omissions restored, edited with appendix and notes by his grand-daughter, Nora Barlow. Norton.
[ ] Whitley, D. A genetic algorithm tutorial. Statistics and Computing, , - .
[ ] Sarimveis, H., Alexandridis, A., Mazarkakis, S., & Bafas, G. A new algorithm for developing dynamic radial basis function neural network models based on genetic algorithms. Computers & Chemical Engineering.
[ ] Ding, S., & Su, C. An optimizing BP neural network algorithm based on genetic algorithm. Artificial Intelligence Review.
[ ] Yen, G. G., & Lu, H. Hierarchical genetic algorithm based neural network design. IEEE Symposium on Combinations of Evolutionary Computation and Neural Networks.
[ ] Zhou, Z. H., Wu, J. X., Jiang, Y., & Chen, S. F. Genetic algorithm based selective neural network ensemble. Proceedings of the 7th International Joint Conference on Artificial Intelligence.
[ ] Yu, X., Loh, N. K., Jullien, G. A., & Miller, W. C. Modified backpropagation algorithms for training the multilayer feedforward neural networks with hard-limiting neurons. Proceedings of the Canadian Conference on Electrical and Computer Engineering.
[ ] Montana, D. J., & Davis, L. Training feedforward neural networks using genetic algorithms. Machine Learning.
[ ] van Rooij, A. J. F., Jain, L. C., & Johnson, R. P. Neural Network Training Using Genetic Algorithms. Singapore: World Scientific.
[ ] Maniezzo, V. Genetic evolution of the topology and weight distribution of neural networks. IEEE Transactions on Neural Networks, , - .
[ ] Bornholdt, S., & Graudenz, D. General asymmetric neural networks and structure design by genetic algorithms. Neural Networks, , - . Pergamon Press.
Applying Ar“ificial Ne”ral Ne“work Hadron - Hadron Collisions a“ LHC 201
h““p://dx.doi.org/10.5772/51273
[ ] Nolfi, S., & Parisi, D. . Desired answers do not θorrespond to good teaθhing in‐
puts in eθologiθal neural networks. Neural proθessing letters, , - .
[ ] Nolfi, S., Parisi, D., & Elman, J. L. . Learning and evolution in neural networks.
Adaptive Behavior, , - .
[ ] Miller, G. F., Todd, P. M., & Hedge, S. U. . Designing neural networks using
genetiθ algorithms. Proθeedings of the third international θonferenθe on genetiθ algorithms
and their appliθations, - .
h““p://dx.doi.org/10.5772/51275
Introduction
In general, chemical problems involve complex systems. Several chemical processes can be described by different mathematical functions (linear, quadratic, exponential, hyperbolic, logarithmic functions, etc.). There are also thousands of calculated and experimental descriptors/molecular properties that are able to describe the chemical behavior of substances. In many experiments, several variables can influence the desired chemical response [ , ]. Chemometrics (the scientific area that employs statistical and mathematical methods to understand chemical problems) is widely used as a valuable tool to treat chemical data and to solve complex problems [ - ].
Initially, the use of chemometrics grew along with computational capacity: when small computers with relatively high calculation capacity became popular, chemometric algorithms and software began to be developed and applied [ , ]. Nowadays, as a result of this technological development, several software packages and complex algorithms are available for commercial and academic use. In fact, the interest in robust statistical methodologies for chemical studies has also increased. One of the most employed statistical methods is partial least squares (PLS) analysis [ , ]. This technique does not perform a simple regression as multiple linear regression (MLR) does. The PLS method can be applied to a large number of variables because it treats the collinearity of descriptors. Although more complex than other statistical methods, PLS analysis is widely employed to solve chemical problems [ , ].
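The way PLS copes with collinear descriptors can be illustrated with a small PLS1 sketch based on the NIPALS algorithm (NumPy only). The simulated descriptors and the choice of two latent variables are our own assumptions for illustration, not data from the text.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated descriptor matrix with strong collinearity: the second
# descriptor is almost a copy of the first, as often happens with
# calculated molecular descriptors.
n = 40
d1 = rng.normal(size=n)
d2 = d1 + rng.normal(scale=1e-3, size=n)      # nearly collinear with d1
d3 = rng.normal(size=n)
X = np.column_stack([d1, d2, d3])
y = 3.0 * d1 + 0.5 * d3 + rng.normal(scale=0.05, size=n)

def pls1(X, y, n_components):
    """PLS1 regression via NIPALS; returns means and a coefficient
    vector usable as y_hat = (X - x_mean) @ b + y_mean."""
    x_mean, y_mean = X.mean(0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xc.T @ yc
        w /= np.linalg.norm(w)             # weight vector
        t = Xc @ w                         # latent-variable scores
        p = Xc.T @ t / (t @ t)             # X loadings
        qk = yc @ t / (t @ t)              # y loading
        Xc = Xc - np.outer(t, p)           # deflate X
        yc = yc - t * qk                   # deflate y
        W.append(w); P.append(p); q.append(qk)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)    # regression vector
    return x_mean, y_mean, b

x_mean, y_mean, b = pls1(X, y, n_components=2)
pred = (X - x_mean) @ b + y_mean
r2 = 1.0 - ((y - pred) ** 2).sum() / ((y - y.mean()) ** 2).sum()
```

Because the regression is carried out on a few latent variables rather than on the raw descriptors, the near-duplicate columns do not destabilize the model, which is the property the text attributes to PLS.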
© 2013 Maltarollo et al.; licensee InTech. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
The first studies describing ANNs (also called perceptron networks) were performed by McCulloch and Pitts [ , ] and Hebb [ ]. The initial idea of neural networks was developed as a model for neurons, their biological counterparts. The first applications of ANNs did not present good results and showed several limitations (such as the treatment of linearly correlated data). However, these events stimulated the extension of the initial perceptron architecture (a single-layer neural network) to multilayer networks [ , ]. In 1982, Hopfield [ ] described a new approach with the introduction of nonlinearity between input and output data, and this new architecture of perceptrons yielded a good improvement in ANN results. In addition to Hopfield's study, Werbos [ ] proposed the back-propagation learning algorithm, which helped the popularization of ANNs.
A few years later, one of the first applications of ANNs in chemistry was performed by Hoskins et al. [ ], who reported the use of a multilayer feed-forward neural network (described in the section on multilayer perceptrons below) to study chemical engineering processes. In the same year, two studies employing ANNs were published with the aim of predicting the secondary structure of proteins [ , ].
Applications of Artificial Neural Networks in Chemical Problems 205
http://dx.doi.org/10.5772/51275
In general, ANN techniques are a family of mathematical models based on the functioning of the human brain. All ANN methodologies share the concept of neurons (also called hidden units) in their architecture. Each neuron connection represents a synapse, as in its biological counterpart. Each hidden unit is therefore built from activation functions that control the propagation of the neuron signal to the next layer (e.g., positive weights simulate excitatory stimuli and negative weights simulate inhibitory ones). A hidden unit is composed of a regression equation that processes the input information into a non-linear output. Therefore, if more than one neuron is used to compose an ANN, non-linear correlations can be treated. Due to the non-linearity between input and output, some authors compare the hidden units of ANNs to a black box [ - ]. Figure 1 shows a comparison between a human neuron and an ANN neuron.
Figure 1. (A) Human neuron; (B) artificial neuron or hidden unit; (C) biological synapse; (D) ANN synapses.
Different ANN techniques can be classified based on their architecture or neuron connection pattern. Feed-forward networks are composed of unidirectional connections between network layers; in other words, the connections flow from the input to the output direction. Feedback (or recurrent) networks are ANNs in which the connections between layers occur in both directions. In this kind of neural network, the connection pattern is characterized by loops due to the feedback behavior. In recurrent networks, when the output signal of a neuron enters a previous neuron (the feedback connection), the new input data is modified [ , - ].
The activation function φ defines how the weighted input v of a hidden unit is transformed into its output. The most common choices are:
Threshold (step) function: φ(v) = 1 if v ≥ 0; φ(v) = 0 if v < 0.
Piecewise-linear function: φ(v) = 1 for v ≥ 1/2; φ(v) = v + 1/2 for −1/2 < v < 1/2; φ(v) = 0 for v ≤ −1/2.
Sigmoid (hyperbolic tangent) function: φ(v) = tanh(v/2) = (1 − exp(−v)) / (1 + exp(−v)).
Gaussian (radial basis) function: φ(v) = exp(−(εv)²).
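These four activation functions can be transcribed directly into NumPy (a sketch; the function names are ours):

```python
import numpy as np

def threshold(v):
    """Step function: 1 where v >= 0, else 0."""
    return np.where(v >= 0, 1.0, 0.0)

def piecewise_linear(v):
    """1 for v >= 1/2, v + 1/2 in between, 0 for v <= -1/2."""
    return np.clip(v + 0.5, 0.0, 1.0)

def sigmoid_tanh(v):
    """tanh(v/2), algebraically equal to (1 - exp(-v)) / (1 + exp(-v))."""
    return np.tanh(v / 2.0)

def gaussian(v, eps=1.0):
    """Radial basis activation exp(-(eps * v)^2)."""
    return np.exp(-(eps * v) ** 2)
```

The tanh form and the exponential-ratio form are the same function, which is easy to verify numerically.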
Each ANN architecture has an intrinsic behavior. Therefore, neural networks can be classified according to their connection patterns, the number of hidden units, the nature of their activation functions and their learning algorithm [ - ]. There is an extensive number of ANN types, and Figure 2 exemplifies the general classification of neural networks, showing the most common ANN techniques employed in chemistry.
Figure 2. The most common neural networks employed in chemistry (adapted from Jain & Mao, 1996 [25]).
According to the previous brief explanation, ANN techniques can be classified based on some of their features. The next topics explain the most common types of ANN employed in chemical problems.
Multilayer perceptrons
The multilayer perceptron (MLP) is one of the most employed ANN algorithms in chemistry. The term "multilayer" is used because this methodology is composed of several neurons arranged in different layers. Each connection between the input and hidden layers (or between two hidden layers) is similar to a synapse (its biological counterpart), and the input data is modified by a determined weight. Therefore, a three-layer feed-forward network is composed of an input layer, two hidden layers and the output layer [ , - ].
MLPs are also called feed-forward neural networks because the data information flows only in the forward direction: the output produced by one layer is used only as input for the next layer. An important characteristic of feed-forward networks is supervised learning [ , - ].
The crucial task in the MLP methodology is the training step. The training (or learning) step is a search process for a set of weight values that reduces/minimizes the squared errors of prediction (experimental vs. estimated data). This phase is the slowest one, and there is no guarantee that a global minimum will be reached. There are several learning algorithms for MLPs, such as conjugate gradient descent, quasi-Newton and Levenberg-Marquardt, but the most employed one is the back-propagation algorithm. This algorithm uses the prediction errors of the output layer to adjust the weights of the layer connections, and it guarantees convergence only to a local (not necessarily global) minimum [ , - ].
The main challenge of MLP is the choice of the most suitable architecture. The speed and performance of MLP learning are strongly affected by the number of layers and the number of hidden units in each layer [ , - ]. Figure 3 displays the influence of the number of layers on the pattern recognition ability of a neural network.
Figure 3. Influence of the number of layers on the pattern recognition ability of an MLP (adapted from Jain & Mao, 1996 [25]).
The increase in the number of layers in an MLP is proportional to the increase in the complexity of the problem to be solved: the higher the number of hidden layers, the more complex the patterns the neural network can recognize.
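The training step described above can be sketched minimally: one hidden layer trained by back-propagation on the XOR pattern, a classic task that a single-layer perceptron cannot learn. The architecture, learning rate and iteration count are illustrative assumptions, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(42)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

# XOR truth table as training data.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# One hidden layer with 8 sigmoid units, one sigmoid output unit.
W1 = rng.normal(0, 1, (2, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 1, (8, 1)); b2 = np.zeros(1)

sse0 = float(((sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2) - y) ** 2).sum())

lr = 0.5
for _ in range(10000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: propagate the output-layer error through the layers.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Gradient-descent weight updates.
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(0)
    W1 -= lr * X.T @ d_h;  b1 -= lr * d_h.sum(0)

out = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
sse = float(((out - y) ** 2).sum())
```

The squared error of prediction (the quantity the text says training minimizes) drops by several orders of magnitude, although, as noted above, only a local minimum is guaranteed.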
The self-organizing map (SOM), also called the Kohonen neural network (KNN), is an unsupervised neural network designed to perform a non-linear mapping of a high-dimensional data space, transforming it into a low-dimensional (usually two-dimensional) space. The visualization of the output data is based on the distance/proximity of neurons in the output 2D layer. In other words, the SOM technique is employed to cluster and extrapolate the data set while keeping the original topology. Each SOM output neuron is connected only to its nearest neighbors, and a neighborhood represents a similar pattern captured by an output neuron. In general, the neighborhood of an output neuron is defined as square or hexagonal, which means that each neuron has 4 or 6 nearest neighbors, respectively [ - ]. Figure 4 exemplifies the output layers of SOM models using square and hexagonal neurons for the combinatorial design of purinergic receptor antagonists [ ] and cannabinoid compounds [ ], respectively.
Figure 4. Example of output layers of SOM models using square and hexagonal neurons for the combinatorial design of (a) purinergic receptor antagonists [54] and (b) cannabinoid compounds [30], respectively.
The SOM technique can be considered a competitive neural network because of its learning algorithm. Competitive learning means that an output neuron is selected (it "wins") only if its weights are more similar to the input pattern than those of the other output neurons. The learning rate for the neighborhood is then scaled down in proportion to the distance from the winning output neuron [ - ].
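The competitive learning rule just described (winner selection plus a distance-scaled neighborhood update) can be sketched for a one-dimensional map. The grid size, decay schedules and clustered toy data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

def train_som(data, grid=5, epochs=200, lr0=0.5, sigma0=2.0):
    """Minimal 1-D Kohonen map: 'grid' output neurons arranged in a row."""
    w = rng.random((grid, data.shape[1]))              # codebook vectors
    for t in range(epochs):
        lr = lr0 * (1 - t / epochs)                    # decaying learning rate
        sigma = max(sigma0 * (1 - t / epochs), 0.5)    # shrinking neighborhood
        for x in data[rng.permutation(len(data))]:
            winner = np.argmin(((w - x) ** 2).sum(1))  # competitive step
            dist = np.abs(np.arange(grid) - winner)    # distance on the grid
            h = np.exp(-(dist / sigma) ** 2)           # neighborhood kernel
            w += lr * h[:, None] * (x - w)             # pull toward the input
    return w

# Two well-separated clusters; after training, different output neurons
# should win for points from different clusters, preserving the topology.
data = np.vstack([rng.normal(0.0, 0.05, (20, 2)),
                  rng.normal(1.0, 0.05, (20, 2))])
w = train_som(data)
```

Only the winner and its grid neighbors are updated for each input, and the neighborhood influence decays with distance from the winner, exactly as in the competitive scheme described above.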
Unlike the usual back-propagation learning algorithm, the Bayesian method considers all possible values of the weights of a neural network, weighted by the probability of each set of weights. This kind of neural network is called a Bayesian regularized artificial neural network (BRANN) because the probability distribution of each neural network, which provides the weights, can be determined by Bayes' theorem [ ]. The Bayesian method can thus estimate the number of effective parameters used to predict the output data, practically independently of the ANN architecture. As with the MLP technique, the choice of the network architecture is a very important step in the learning of a BRANN. A complete review of the BRANN technique can be found in other studies [ - ].
The neural network known as the radial basis function (RBF) network [ ] typically has an input layer, a hidden layer with an RBF as the activation function, and an output layer. This network was originally developed to treat irregular topographic contours of geographical data [ - ], but due to its capacity for solving complex (especially non-linear) problems, RBF networks have been successfully applied to chemical problems. There are several studies comparing the robustness of prediction (prediction coefficients, r², pattern recognition rates and errors) of RBF-based networks and other methods [ - ].
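A minimal RBF-network sketch: a hidden layer of Gaussian units with fixed centres and a linear output layer solved by least squares. The target function, centre placement and width are assumptions chosen for illustration.

```python
import numpy as np

def rbf_design(X, centers, width):
    """Hidden-layer output matrix: one Gaussian unit per centre."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * width ** 2))

# A non-linear 1-D target that a purely linear model cannot fit.
X = np.linspace(-3.0, 3.0, 50)[:, None]
y = np.sin(X[:, 0])

centers = np.linspace(-3.0, 3.0, 10)[:, None]   # fixed centres on a grid
H = rbf_design(X, centers, width=1.0)           # hidden-layer activations
w, *_ = np.linalg.lstsq(H, y, rcond=None)       # linear output layer
pred = H @ w
```

Because the non-linearity lives entirely in the hidden layer, the output weights can be obtained in one linear-algebra step, which is one reason RBF networks train quickly on non-linear problems.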
The Hopfield neural network [ - ] is a model that uses a binary n x n matrix (presented as an n x n pixel image) as the weight matrix for n input signals. The activation function treats the activation signal only as +1 or -1. The algorithm treats black and white pixels as the binary digits 1 and 0, respectively, and the matrix data is transformed to enlarge the interval from 0 to 1 into the interval from -1 to +1. The complete description of this technique can be found in reference [ ]. In chemistry research, we can find studies employing the Hopfield model to obtain molecular alignments [ ], to calculate the intermolecular potential energy function from the second virial coefficient [ ], and for other purposes [ - ].
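The pixel mapping and ±1 activation described above can be sketched with Hebbian weights; the stored pattern is an arbitrary example of ours.

```python
import numpy as np

def train_hopfield(patterns):
    """Hebbian weight matrix for +/-1 patterns, with zero diagonal."""
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for p in patterns:
        W += np.outer(p, p)
    np.fill_diagonal(W, 0.0)
    return W / len(patterns)

def recall(W, state, steps=10):
    """Synchronous update with the sign activation (+1 / -1)."""
    for _ in range(steps):
        state = np.where(W @ state >= 0, 1.0, -1.0)
    return state

# A 0/1 "pixel" pattern is first mapped from the 0-1 interval to the
# -1-+1 interval, as described above, then recovered from a corrupted copy.
pixels = np.array([1, 1, 0, 0, 1, 0, 1, 0])
pattern = 2.0 * pixels - 1.0                  # 0/1 -> -1/+1
W = train_hopfield(pattern[None, :])
noisy = pattern.copy(); noisy[0] *= -1        # flip one pixel
restored = recall(W, noisy)
```

The stored pattern acts as an attractor: starting from the corrupted copy, the ±1 update rule converges back to it, which is the associative-memory behavior exploited, for example, in molecular alignment applications.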
Applications
In the following, we present a brief description of some studies that apply ANN techniques as important tools to solve chemical problems.
Drug design research involves the use of several experimental and computational strategies with different purposes, such as biological affinity, pharmacokinetic and toxicological studies, as well as quantitative structure-activity relationship (QSAR) models [ - ]. Another important approach to designing new potential drugs is virtual screening (VS), which can maximize the effectiveness of rational drug development by employing computational assays to classify or filter a compound database for potent drug candidates [ - ]. In addition, various ANN methodologies have been widely applied to control pharmaceutical production processes [ - ].
Fanny et al. [ ] constructed a SOM model to perform VS experiments and tested it on a large external database of compounds. The use of the SOM methodology accelerated the similarity searches based on several pharmacophore descriptors. The best result indicated a map that retrieves a high proportion of relevant neighbors (output neurons) in the similarity search for virtual hits.
Analytical Chemistry
There are several studies in analytical chemistry employing ANN techniques with the aim of obtaining multivariate calibration and analyzing spectroscopic data [ - ], as well as modeling HPLC retention behavior [ ] and reaction kinetics [ ].
Fatemi [ ] constructed a QSPR model employing the ANN technique with the back-propagation algorithm to predict the tropospheric ozone degradation rate constants of organic compounds. The data set was composed of organic compounds divided into training, test and validation sets. The author also compared the ANN results with those obtained from the MLR method. The correlation coefficients obtained with the ANN were higher than those obtained with MLR for the training, test and validation sets, showing the better efficacy of the ANN methodology in this case.
Biochemistry
Neural networks have been widely employed in biochemistry and correlated research fields such as protein, DNA/RNA and molecular biology sciences [ - ].
Petritis et al. [ ] employed a three-layer neural network with the back-propagation algorithm to predict the reversed-phase liquid chromatography retention times of peptides enzymatically digested from proteomes. In the training set, the authors used known peptides from D. radiodurans. The constructed ANN model was then employed to predict a set of peptides from S. oneidensis. The neural network generated weights for the chromatographic retention-time contribution of each amino acid in agreement with results obtained by other authors. The obtained ANN model could predict peptide sequences with small errors: half of the test set was predicted with a very low percentage of error, and most of the set was predicted with an error of only a few percent. These results showed that the ANN methodology is a good tool for predicting peptide retention times in liquid chromatography.
Huang et al. [ ] introduced a novel approach combining aspects of QSAR and ANNs, which they called the physics and chemistry-driven ANN (Phys-Chem ANN). This methodology has parameters and coefficients clearly based on physicochemical insights. In this study, the authors employed the Phys-Chem ANN methodology to predict the stability of human lysozyme. The data set was composed of mutated lysozymes (including the wild type), and the experimental property used in the modeling was the change in the unfolding Gibbs free energy (kJ/mol). The study resulted in significant coefficients of calibration and validation (r² and q², respectively). The proposed methodology provided good predictions of biological activity, as well as structural information and physical explanations for understanding the stability of human lysozyme.
Food Research
ANNs have also been widely employed in food research. Some examples of applications in this area include studies of vegetable oils [ - ], beers [ ], wines [ ], honeys [ - ] and water [ - ].
Bos et al. [ ] employed several ANN techniques to predict the water percentage in cheese samples. The authors tested several different neuron architectures (some functions were employed to simulate different learning behaviors) and analyzed the prediction errors to assess ANN performance. The best result was obtained employing a radial basis function neural network.
Cimpoiu et al. [ ] used the multilayer perceptron with the back-propagation algorithm to model the antioxidant activity of some classes of tea, such as black, express black and green teas. The authors obtained a high correlation between experimental and predicted antioxidant activities. A classification of the samples was also performed using an ANN technique with a radial basis layer followed by a competitive layer, with a perfect match between the real and predicted classes.
Conclusions
Artificial neural networks (ANNs) were originally developed to mimic the learning process of the human brain and its knowledge storage functions. The basic units of ANNs are called neurons and are designed to transform the input data and propagate the signal, with the aim of performing a non-linear correlation between experimental and predicted data. As the human brain is not completely understood, there are several different architectures of artificial neural networks, which present different performances. The most common ANNs applied to chemistry are the MLP, SOM, BRANN, ART, Hopfield and RBF neural networks. Several studies in the literature compare ANN approaches with other chemometric tools (e.g., MLR and PLS), and these studies have shown that ANNs offer the best performance in many cases. Due to the robustness and efficacy of ANNs in solving complex problems, these methods have been widely employed in several research fields, such as medicinal chemistry, pharmaceutical research, theoretical and computational chemistry, analytical chemistry, biochemistry and food research. Therefore, ANN techniques can be considered valuable tools for understanding the main mechanisms involved in chemical problems.
Notes
Techniques related to artificial neural networks (ANNs) have been increasingly used in chemical studies for data analysis in the last decades. Some areas of ANN application involve pattern identification, modeling of relationships between structure and biological activity, classification of compound classes, identification of drug targets, prediction of several physicochemical properties, and others. The main purpose of ANN techniques in chemical problems is to create models of complex input-output relationships based on learning from examples; consequently, these models can be used in prediction studies. It is interesting to note that ANN methodologies have shown their power and robustness in the creation of useful models to help chemists in research projects in academia and industry. Nowadays, the evolution of computer science (software and hardware) has allowed the development of many computational methods used to understand and simulate the behavior of complex systems. In this way, the integration of technological and scientific innovation has helped the treatment of large databases of chemical compounds in order to identify possible patterns. However, those who use computational techniques must be prepared to understand the limits of applicability of any computational method and to distinguish the opportunities for which ANN methodologies are appropriate for solving chemical problems. The evolution of ANN theory has resulted in an increasing number of successful applications. Thus, the main contribution of this book chapter is to briefly outline our view on the present scope and future advances of ANNs, based on some applications from recent research projects, with emphasis on the generation of predictive ANN models.
Author details
References
[ ] Klebe, G., Abraham, U., & Mietzner, T. Molecular Similarity Indices in a Comparative Analysis (CoMSIA) of Drug Molecules to Correlate and Predict Their Biological Activity. J. Med. Chem.
[ ] Cerqueira, E. O., Andrade, J. C., & Poppi, R. J. Redes neurais e suas aplicações em calibração multivariada. Quím. Nova.
[ ] Hsiao, T., Lin, C., Zeng, M., & Chiang, H. K. The Implementation of Partial Least Squares with Artificial Neural Network Architecture. Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society.
[ ] Jain, A. K., Mao, J., & Mohiuddin, K. M. Artificial Neural Networks: A Tutorial. IEEE Computer.
[ ] Zheng, F., Zheng, G., Deaciuc, A. G., Zhan, C. G., Dwoskin, L. P., & Crooks, P. A. Computational neural network analysis of the affinity of lobeline and tetrabenazine analogs for the vesicular monoamine transporter. Bioorg. Med. Chem.
[ ] Honório, K. M., de Lima, E. F., Quiles, M. G., Romero, R. A. F., Molfetta, F. A., & da Silva, A. B. F. Artificial Neural Networks and the Study of the Psychoactivity of Cannabinoid Compounds. Chem. Biol. Drug Des.
[ ] Qin, Y., Deng, H., Yan, H., & Zhong, R. An accurate nonlinear QSAR model for the antitumor activities of chloroethylnitrosoureas using neural networks. J. Mol. Graph. Model.
[ ] Marini, F., Bucci, R., Magri, A. L., & Magri, A. D. Artificial neural networks in chemometrics: History, examples and perspectives. Microchem. J.
[ ] Pitts, W., & McCulloch, W. S. How we know universals: the perception of auditory and visual forms. Bull. Math. Biophys.
[ ] Zupan, J., & Gasteiger, J. Neural networks: A new method for solving chemical problems or just a passing phase? Anal. Chim. Acta.
[ ] Smits, J. R. M., Melssen, W. J., Buydens, L. M. C., & Kateman, G. Using artificial neural networks for solving chemical problems. Part I. Multi-layer feed-forward networks. Chemom. Intell. Lab.
[ ] Werbos, P. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences. PhD thesis, Harvard University, Cambridge.
[ ] Bohr, H., Bohr, J., Brunak, S., Cotterill, R., Lautrup, B., Norskov, L., Olsen, O., & Petersen, S. Protein Secondary Structure and Homology by Neural Networks. FEBS Lett.
[ ] Zupan, J., & Gasteiger, J. Neural Networks in Chemistry and Drug Design. Wiley-VCH.
[ ] Gasteiger, J., & Zupan, J. Neural Networks in Chemistry. Angew. Chem. Int. Edit.
[ ] Zupan, J., Novič, M., & Ruisánchez, I. Kohonen and counterpropagation artificial neural networks in analytical chemistry. Chemom. Intell. Lab.
[ ] Smits, J. R. M., Melssen, W. J., Buydens, L. M. C., & Kateman, G. Using artificial neural networks for solving chemical problems. Part II. Kohonen self-organising feature maps and Hopfield networks. Chemom. Intell. Lab.
[ ] Carpenter, G. A., Grossberg, S., & Rosen, D. B. ART 2-A: an adaptive resonance algorithm for rapid category learning and recognition. Neural Networks.
[ ] Wienke, D., & Buydens, L. Adaptive resonance theory based neural networks: the 'ART' of real-time pattern recognition in chemical process monitoring? TrAC Trend. Anal. Chem.
[ ] Lin, C. C., & Wang, H. P. Classification of autoregressive spectral estimated signal patterns using an adaptive resonance theory neural network. Comput. Ind.
[ ] Whiteley, J. R., & Davis, J. F. A similarity-based approach to interpretation of sensor data using adaptive resonance theory. Comput. Chem. Eng.
[ ] Whiteley, J. R., & Davis, J. F. Qualitative interpretation of sensor patterns. IEEE Expert.
[ ] Wienke, D., & Kateman, G. Adaptive resonance theory based artificial neural networks for treatment of open-category problems in chemical pattern recognition: application to UV-Vis and IR spectroscopy. Chemom. Intell. Lab.
[ ] Wienke, D., Xie, Y., & Hopke, P. K. An adaptive resonance theory based artificial neural network (ART-2a) for rapid identification of airborne particle shapes from their scanning electron microscopy images. Chemom. Intell. Lab.
[ ] Xie, Y., Hopke, P. K., & Wienke, D. Airborne particle classification with a combination of chemical composition and shape index utilizing an adaptive resonance artificial neural network. Environ. Sci. Technol.
[ ] Wienke, D., van den Broek, W., Melssen, W., Buydens, L., Feldhoff, R., Huth-Fehre, T., Kantimm, T., Quick, L., Winter, F., & Cammann, K. Comparison of an adaptive resonance theory based neural network (ART-2a) against other classifiers for rapid sorting of post-consumer plastics by remote near-infrared spectroscopic sensing using an InGaAs diode array. Anal. Chim. Acta.
[ ] Domine, D., Devillers, J., Wienke, D., & Buydens, L. ART 2-A for Optimal Test Series Design in QSAR. J. Chem. Inf. Comput. Sci.
[ ] Wienke, D., & Buydens, L. Adaptive resonance theory based neural network for supervised chemical pattern recognition (FuzzyARTMAP). Part 1: Theory and network properties. Chemom. Intell. Lab.
[ ] Wienke, D., van den Broek, W., Buydens, L., Huth-Fehre, T., Feldhoff, R., Kantimm, T., & Cammann, K. Adaptive resonance theory based neural network for supervised chemical pattern recognition (FuzzyARTMAP). Part 2: Classification of post-consumer plastics by remote NIR spectroscopy using an InGaAs diode array. Chemom. Intell. Lab.
[ ] Buhmann, M. D. Radial Basis Functions: Theory and Implementations. Cambridge University Press.
[ ] Han, H., Chen, Q., & Qiao, J. An efficient self-organizing RBF neural network for water quality prediction. Neural Networks.
[ ] Fidêncio, P. H., Poppi, R. J., Andrade, J. C., & Abreu, M. F. Use of Radial Basis Function Networks and Near-Infrared Spectroscopy for the Determination of Total Nitrogen Content in Soils from São Paulo State. Anal. Sci.
[ ] Yao, X., Liu, M., Zhang, X., Zhang, R., Hu, Z., & Fan, B. Radial Basis Function Neural Networks Based QSPR for the Prediction of log P. Chinese J. Chem.
[ ] Arakawa, M., Hasegawa, K., & Funatsu, K. Application of the Novel Molecular Alignment Method Using the Hopfield Neural Network to 3D-QSAR. J. Chem. Inf. Comput. Sci.
[ ] Braga, J. P., Almeida, M. B., Braga, A. P., & Belchior, J. C. Hopfield neural network model for calculating the potential energy function from second virial data. Chem. Phys.
[ ] Guha, R., Serra, J. R., & Jurs, P. C. Generation of QSAR sets with a self-organizing map. J. Mol. Graph. Model.
[ ] Hoshi, K., Kawakami, J., Kumagai, M., Kasahara, S., Nishimura, N., Nakamura, H., & Sato, K. An analysis of thyroid function diagnosis using Bayesian-type and SOM-type neural networks. Chem. Pharm. Bull.
[ ] Nandi, S., Vracko, M., & Bagchi, M. C. Anticancer activity of selected phenolic compounds: QSAR studies using ridge regression and neural networks. Chem. Biol. Drug Des.
[ ] Xiao, Y. D., Clauset, A., Harris, R., Bayram, E., Santago, P., & Schmitt. Supervised self-organizing maps in drug discovery: Robust behavior with overdetermined data sets. J. Chem. Inf. Model.
[ ] Molfetta, F. A., Angelotti, W. F. D., Romero, R. A. F., Montanari, C. A., & da Silva, A. B. F. A neural networks study of quinone compounds with trypanocidal activity. J. Mol. Model.
[ ] Zheng, F., Zheng, G., Deaciuc, A. G., Zhan, C. G., Dwoskin, L. P., & Crooks, P. A. Computational neural network analysis of the affinity of lobeline and tetrabenazine analogs for the vesicular monoamine transporter. Bioorg. Med. Chem.
[ ] Schneider, G., Coassolo, P., & Lavé, T. Combining in vitro and in vivo pharmacokinetic data for prediction of hepatic drug clearance in humans by artificial neural networks and multivariate statistical techniques. J. Med. Chem.
[ ] Hu, L., Chen, G., & Chau, R. M. W. A neural networks-based drug discovery approach and its application for designing aldose reductase inhibitors. J. Mol. Graph. Model.
[ ] Afantitis, A., Melagraki, G., Koutentis, P. A., Sarimveis, H., & Kollias, G. Ligand-based virtual screening procedure for the prediction and the identification of novel β-amyloid aggregation inhibitors using Kohonen maps and Counterpropagation Artificial Neural Networks. Eur. J. Med. Chem.
[ ] Noeske, T., Trifanova, D., Kauss, V., Renner, S., Parsons, C. G., Schneider, G., & Weil, T. Synergism of virtual screening and medicinal chemistry: Identification and optimization of allosteric antagonists of metabotropic glutamate receptor. Bioorg. Med. Chem.
[ ] Karpov, P. V., Osolodkin, D. I., Baskin, I. I., Palyulin, V. A., & Zefirov, N. S. One-class classification as a novel method of ligand-based virtual screening: The case of glycogen synthase kinase inhibitors. Bioorg. Med. Chem. Lett.
[ ] Molnar, L., & Keseru, G. M. A neural network based virtual screening of cytochrome P450 inhibitors. Bioorg. Med. Chem. Lett.
220 Ar“ificial Ne”ral Ne“works – Archi“ec“”res and Applica“ions
[ ] Di Massimo, C., Montague, G. “., Willis, Tham. M. T., & Morris, “. J. . To‐
wards improved peniθillin fermentation via artifiθialneuralnetworks. Comput. Chem.
Eng., , - .
[ ] Takayama, K., Fujikawa, M., & Nagai, T. . “rtifiθial Neural Network as a Novel
Method to Optimize Pharmaθeutiθal Formulations. Pharm. Res., , - .
[ ] Takayama, K., Morva, “., Fujikawa, M., Hattori, Y., Oηata, Y., & Nagai, T. .
Formula optimization of theophylline θontrolled-release taηlet ηased on artifiθial
neural networks. J. Control. Release, , - .
[ ] Fanny, ”., Gilles, M., Natalia, K., “lexandre, V., & Dragos, H. Using Self-Organizing
Maps to “θθelerate Similarity Searθh. Bioorg. Med. Chem., In Press, http //dxdoiorg/
/jηmθ .
[ ] Sumpter, ”. G., & Noid, D. W. . Neural networks and graph theory as θompu‐
tational tools for prediθting polymer properties. Maθromol. Theor. Simul., , - .
[ ] Sθotta, D. J., Coveneya, P. V., Kilnerη, J. “., Rossinyη, J. C. H., & “lford, N. M. N.
. Prediθtion of the funθtional properties of θeramiθ materials from θomposition
using artifiθialneuralnetworks. J. Eur. Ceram. Soθ., , - .
[ ] Næs, T., Kvaal, K., Isaksson, T., & Miller, C. . “rtifiθial neural networks in mul‐
tivariate θaliηration. J. Near. Infrared Speθtrosθ., , - .
[ ] Munk, M. E., Madison, M. S., & Roηη, E. W. . Neural-network models for infra‐
red-speθtrum interpretation. Mikroθhim. Aθta, , - .
[ ] Smits, J. R. M., Sθhoenmakers, P., Stehmann, “., Sijstermans, F., & Chemom, Kate‐
man G. . Interpretation of infrared speθtra with modular neural-network sys‐
tems. Intell. Laη., , - .
[ ] Goodaθre, R., Neal, M. J., & Kell, D. ”. . Rapid and Quantitative “nalysis of the
Pyrolysis Mass Speθtra of Complex ”inary and Tertiary Mixtures Using Multivariate
Caliηration and “rtifiθial Neural Networks. Anal. Chem., , - .
[ ] Ciroviθ, D. . Feed-forward artifiθial neural networks appliθations to speθtro‐
sθopy. TrAC Trend. Anal. Chem., , - .
[ ] Zhao, R. H., Yue, ”. F., Ni, J. Y., Zhou, H. F., & Zhang, Y. K. . “ppliθation of an
artifiθial neural network in θhromatography-retention ηehavior prediθtion and pat‐
tern reθognition. Chemom. Intell. Laη., , - .
[ ] ”lanθo, M., Coello, J., Iturriaga, H., Maspoθh, S., & Redon, M. . “rtifiθial Neural
Networks for Multiθomponent Kinetiθ Determinations. Anal. Chem., , - .
[ ] Fatemi, M. H. . Prediθtion of ozone tropospheriθ degradation rate θonstant of
organiθ θompounds ηy using artifiθial neural networks. Anal. Chim. Aθta, ,
- .
[ ] Diederiθhs, K., Freigang, J., Umhau, S., Zeth, K., & ”reed, J. . Prediθtion ηy a
neural network of outer memηrane {ηeta}-strand protein topology. Protein Sθi., ,
- .
[ ] Meiler, J. . PROSHIFT Protein θhemiθal shift prediθtion using artifiθial neural
networks. J. Biomol. NMR, , - .
[ ] Lohmann, R., Sθhneider, G., ”ehrens, D., & Wrede, P. “. . Neural network
model for the prediθtion of memηrane-spanning amino aθid sequenθes. Protein Sθi., ,
- .
[ ] Domηi, G. W., & Lawrenθe, J. . “nalysis of protein transmemηrane heliθal re‐
gions ηy a neural network. Protein Sθi., , - .
[ ] Wang, S. Q., Yang, J., & Chou, K. C. . Using staθked generalization to prediθt
memηrane protein types ηased on pseudo-amino aθid θomposition. J. Theor. Biol.,
, - .
[ ] Ma, L., Cheng, C., Liu, X., Zhao, Y., Wang, “., & Herdewijn, P. . “ neural net‐
work for prediθting the staηility of RN“/DN“ hyηrid duplexes. Chemom. Intell. Laη.,
, - .
[ ] Ferran, E. “., Pflugfelaer, ”., & Ferrara, P. . Self-organized neural maps of hu‐
man protein sequenθes. Protein Sθi., , - .
[ ] Petritis, K., Kangas, L. J., Ferguson, P. L., “nderson, G. “., Pa:a-Tolić, L., Lipton, M.
S., “uηerry, K. J., Strittmatter, E. F., Shen, Y., Zhao, R., & Smith, R. D. . Use of
“rtifiθial Neural Networks for the “θθurate Prediθtion of Peptide Liquid Chromatog‐
raphy Elution Times in Proteome “nalyses. Anal. Chem., , - .
222 Ar“ificial Ne”ral Ne“works – Archi“ec“”res and Applica“ions
[ ] Huang, R., Du, Q., Wei, Y., Pang, Z., Wei, H., & Chou, K. . Physiθs and θhemis‐
try-driven artifiθial neural network for prediθting ηioaθtivity of peptides and pro‐
teins and their design. J. Theor. Biol., , - .
[ ] Martin, Y. G., Oliveros, M. C. C., Pavon, J. L. P., Pinto, C. G., & Cordero, ”. M. .
Eleθtroniθ nose ηased on metal oxide semiθonduθtor sensors and pattern reθognition
teθhniques θharaθterisation of vegetaηle oils. Anal. Chim. Aθta, , - .
[ ] Zhang, G. W., Ni, Y. N., Churθhill, J., & Kokot, S. . “uthentiθation of vegetaηle
oils on the ηasis of their physiθo-θhemiθal properties with the aid of θhemometriθs.
Talanta, , - .
[ ] Goodaθre, R., Kell, D. ”., & ”ianθhi, G. . Rapid assessment of the adulteration
of virgin olive oils ηy other seed oils using pyrolysis mass speθtrometry and artifiθial
neural networks. J. Sθi. Food Agr., , - .
[ ] ”ianθhi, G., Giansante, L., Shaw, “., & Kell, D. ”. . Chemometriθ θriteria for the
θharaθterisation of Italian DOP olive oils from their metaηoliθ profiles. Eur. J. Lipid.
Sθi. Teθh., , - .
[ ] ”uθθi, R., Magri, “. D., Magri, “. L., Marini, D., & Marini, F. . Chemiθal “u‐
thentiθation of Extra Virgin Olive Oil Varieties ηy Supervised Chemometriθ Proθe‐
dures. J. Agriθ. Food Chem., , - .
[ ] Marini, F., ”alestrieri, F., ”uθθi, R., Magri, “. D., Magri, “. L., & Marini, D. .
Supervised pattern reθognition to authentiθate Italian extra virgin olive oil varieties.
Chemom. Intell. Laη., , - .
[ ] Marini, F., ”alestrieri, F., ”uθθi, R., Magri, “. L., & Marini, D. . Supervised pat‐
tern reθognition to disθriminate the geographiθal origin of riθe ηran oils a first study.
Miθroθh. J., , - .
[ ] Marini, F., Magri, “. L., Marini, D., & ”alestrieri, F. . Charaθterization of the
lipid fraθtion of Niger seeds Guizotia aηyssiniθa θass from different regions of
Ethiopia and India and θhemometriθ authentiθation of their geographiθal origin. Eur.
J. Lipid. Sθi. Teθh., , - .
[ ] Vonθina, E., ”rodnjak-Vonθina, D., Soviθ, N., & Noviθ, M. . Chemometriθ θhar‐
aθterisation of the Quality of Ground Waters from Different wells in Slovenia. Aθta
Chim. Slov., , - .
[ ] ”os, “., ”os, M., & van der Linden, W. E. . “rtifiθial neural networks as a tool
for soft-modelling in quantitative analytiθal θhemistry the prediθtion of the water
θontent of θheese. Anal. Chim. Aθta, , - .
[ ] Cimpoiu, C., Cristea, V., Hosu, “., Sandru, M., & Seserman, L. . “ntioxidant
aθtivity prediθtion and θlassifiθation of some teas using artifiθial neural networks.
Food Chem., , - .
Chapter 11
Recurrent Neural Network Based Approach for Solving Groundwater Hydrology Problems
http://dx.doi.org/10.5772/51598

1. Introduction
Many communities obtain their drinking water from underground sources called aquifers. Official water suppliers or public corporations drill wells into soil and rock aquifers in search of the groundwater contained there, in order to supply the population with drinking water. An aquifer can be defined as a geologic formation that supplies water to a well in quantities sufficient to make the production of water from this formation feasible. The conventional estimation of the exploration flow requires considerable effort to understand the relationship between the structural and physical parameters involved. These parameters depend on several factors, such as soil properties and hydrologic and geologic aspects [ ].
Water is usually transported to the reservoirs by submersible electric motor pumps, electric power being one of the main inputs to water production. Considering the increasing difficulty of obtaining new electric power sources, there is a need to reduce both operational costs and global energy consumption. Thus, it is important to adopt appropriate operational actions to manage efficiently the use of electric power in these groundwater hydrology problems. For this purpose, it is essential to determine a parameter that expresses the energetic behavior of the whole water extraction set, which is here defined as the Global Energetic Efficiency Indicator (GEEI). A methodology using artificial neural networks is developed here in order to take into account several experimental tests related to energy consumption in submersible motor pumps.

The GEEI is expressed in Wh/(m³·m). From a dimensional analysis, we can observe that the smaller the numeric value of the GEEI, the better the energetic efficiency of the system that extracts water from the aquifer.
© 2013 da Silva et al.; licensee InTech. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
For such a scope, this chapter is organized as follows. In Section 2, a brief summary of water exploration processes is presented. In Section 3, some aspects related to mathematical models applied to the water exploration process are described. In Section 4, the expressions defining the GEEI are formulated. The neural approach used to determine the GEEI is introduced in Section 5, while the procedures for estimating aquifer dynamic behavior using neural networks are presented in Section 6. Finally, in Section 7, the key issues raised in the chapter are summarized and conclusions are drawn.
After the drilling of groundwater wells, a test known as the Step Drawdown Test is carried out. This test consists of measuring the aquifer depth in relation to continuous withdrawal of water with increasing flow over time. This depth relationship is defined as the Dynamic Level of the aquifer, and the aquifer level at the initial instant, i.e., the instant when the pump is turned on, is defined as the Static Level. This test gives the maximum water flow that can be pumped from the aquifer, taking into account its respective dynamic level. Another characteristic given by this test is the determination of the Drawdown Discharge Curves, which represent the dynamic level in relation to the exploration flow [ ]. These curves are usually expressed by a mathematical function, and their results have presented low precision.
Since aquifer behavior changes in relation to operation time, the Drawdown Discharge Curves can represent the aquifer dynamics only at that particular moment. These changes occur due to many factors, such as the following: (i) aquifer recharge capability; (ii) interference from neighboring wells or changes in their exploration conditions; (iii) modification of the static level when the pump is turned on; (iv) operation cycle of the pump; and (v) rest time available to the well. Thus, the mapping of these groundwater hydrology problems by conventional identification techniques becomes very difficult when all the above considerations are taken into account. Besides the aquifer behavior, other components of the exploration system interfere with the global energetic efficiency of the system.
On the other hand, the motor-pump set mounted inside the well, submersed in the water that comes from the aquifer, receives the whole electric power supplied to the system. Through an eduction pipe, which also physically supports the motor pump, the water is transported to the ground surface and from there, through an adduction pipe, to the reservoir, which is normally located at a position above the well. To transport water in this hydraulic system, several accessories (valves, pipes, curves, etc.) are necessary for its implementation. Figure 1 shows the typical components involved in a water extraction system based on deep wells.
Recurrent Neural Network Based Approach for Solving Groundwater Hydrology Problems 227
http://dx.doi.org/10.5772/51598
The resistance to the water flow, due to the state of the pipe walls, is continuous along all the tubing, and will be taken as uniform wherever the pipe diameter is constant. This resistance forces the motor pump to supply additional pressure, or load, so that the water can reach the reservoir. Thus, the effect created by this resistance is also called load loss along the pipe. Similarly to the tubing, other elements of the system offer resistance to the fluid flow and therefore cause load losses. These losses can be considered local, localized, accidental or singular, since they arise at particular points or parts of the tubing.
Regarding the hydraulic circuit, it is observed that the distributed and localized load loss is an important parameter, and that it varies with the type and state of the material. Therefore, old tubing, with incrustations accumulated along the operational time, shows a load loss different from that of new tubing. A partially closed valve introduces a larger load loss than a fully open one. A variation in the extraction flow also changes the load loss. These are some observations, among several others, that could be made.
Another important factor concerning the global energetic efficiency of the system is the geometric difference of level. However, this parameter does not vary after the complete installation of the system. Concerning this, two statements can be made: (i) when mathematical models are used to study the lowering of the piezometric surface, these models should be re-evaluated at certain periods of time; (ii) the exploration flow of the aquifer assumes a fundamental role in the study of the hydraulic circuit and should be carefully analyzed.
In order to overcome these problems, this work considers the use of parameters that are easily obtained in practice to represent the water catchment system, and the use of artificial neural networks to determine the exploration flow. From these parameters, it is possible to determine the GEEI of the system.
One of the most used mathematical models to simulate aquifer dynamic behavior is the Theis model [ , ]. This model is very simple and is used for transitory flow. In this model, the following hypotheses are considered: (i) the aquifer is confined by impermeable formations; (ii) the aquifer structure is homogeneous and isotropic in relation to its hydro-geological parameters; (iii) the aquifer thickness is considered constant, with infinite horizontal extent; and (iv) the wells penetrate the entire aquifer and their pumping rates are also considered constant in relation to time.
The model proposed by Theis can be represented by the following equations:

\[ \frac{\partial^2 s}{\partial r^2} + \frac{1}{r}\,\frac{\partial s}{\partial r} = \frac{S}{T}\,\frac{\partial s}{\partial t} \]

\[ s(r, 0) = 0 \]

\[ s(\infty, t) = 0 \]

\[ \lim_{r \to 0} r\,\frac{\partial s}{\partial r} = -\frac{Q}{2 \pi T} \]

where:
s is the aquifer drawdown;
Q is the exploration flow;
T is the transmissivity coefficient;
S is the storage coefficient;
r is the horizontal distance between the well and the observation place.
Applying the Laplace transform to these equations, we have:

\[ \frac{d^2 \bar{s}}{dr^2} + \frac{1}{r}\,\frac{d\bar{s}}{dr} = \frac{S}{T}\, w\, \bar{s} \]

\[ \bar{s}(r, w) = A \cdot K_0\!\left( r \sqrt{(S/T)\, w} \right) \]

\[ \lim_{r \to 0} r\,\frac{d\bar{s}}{dr} = -\frac{Q}{2 \pi T w} \]
where \bar{s} is the transformed drawdown, w is the Laplace variable and K_0 is the modified Bessel function of the second kind of order zero. Determining the constant A from the boundary condition, the transformed solution becomes:

\[ \bar{s}(r, w) = \frac{Q}{2 \pi T} \cdot \frac{K_0\!\left( r \sqrt{(S/T)\, w} \right)}{w} \]

Inverting the transform, the drawdown is obtained as:

\[ h - h_0(r, t) = s(r, t) = \frac{Q}{2 \pi T}\, \mathcal{L}^{-1}\!\left[ \frac{K_0\!\left( r \sqrt{(S/T)\, w} \right)}{w} \right] \]

\[ s = \frac{Q}{4 \pi T} \int_u^{\infty} \frac{e^{-y}}{y}\, dy = \frac{Q}{4 \pi T}\, W(u) \]

where

\[ u = \frac{r^2 S}{4 T t} \]

\[ W(u) = 2\, \mathcal{L}^{-1}\!\left[ \frac{K_0\!\left( r \sqrt{(S/T)\, w} \right)}{w} \right] \]

where W(u) is the well function.
From an analysis of the Theis model, it is observed that modeling a particular aquifer requires deep technical knowledge of this aquifer, which is mapped under some hypotheses, such as a confined, homogeneous, isotropic aquifer of constant thickness, etc. Moreover, other parameters of the aquifer to be explored (transmissivity coefficient, storage coefficient and hydraulic conductivity) must also be defined. Thus, the mathematical models require expert knowledge of concepts and tools of hydrogeology.
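As a concrete illustration of these equations, the well function W(u) can be evaluated through its classical series expansion, W(u) = -γ - ln(u) + Σ_{n≥1} (-1)^{n+1} u^n / (n·n!), and then used to compute the Theis drawdown. The sketch below uses purely illustrative parameter values; none of them come from the chapter:

```python
import math

EULER_GAMMA = 0.5772156649015329

def well_function(u, terms=60):
    """Theis well function W(u) via its convergent series:
    W(u) = -gamma - ln(u) + sum_{n>=1} (-1)**(n+1) * u**n / (n * n!)."""
    total = -EULER_GAMMA - math.log(u)
    factorial = 1.0
    sign = 1.0
    for n in range(1, terms + 1):
        factorial *= n
        total += sign * u**n / (n * factorial)
        sign = -sign
    return total

def theis_drawdown(Q, T, S, r, t):
    """Drawdown s(r, t) = Q/(4*pi*T) * W(u), with u = r**2 * S / (4*T*t).
    Units must be mutually consistent (e.g. Q in m^3/day, T in m^2/day)."""
    u = r**2 * S / (4.0 * T * t)
    return Q / (4.0 * math.pi * T) * well_function(u)

# Purely illustrative parameters, not taken from the chapter:
s_1day = theis_drawdown(Q=500.0, T=250.0, S=1e-4, r=50.0, t=1.0)
```

For the small values of u typical of production wells, the series converges after only a few terms, and the drawdown grows with pumping time, as expected.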
It is also indispensable to consider that the aquifer of a specific region shows continuous changes in its exploration conditions. These changes are normally caused by the companies that operate the exploration systems, through the drilling of new wells or changes in the exploration conditions, or even by the drilling of illegal wells. Such changes certainly require immediate adjustment of the Theis model. Another fact is that the aquifer dynamic level varies with the exploration flow, the operation time, the static level and, obviously, with the intrinsic characteristics of the aquifer under exploration. In addition, neighboring wells may also cause interference in the aquifer.
The theoretical concept of the proposed Global Energetic Efficiency Indicator will be presented using classical equations that show the relationship between the power absorbed from the electric system and the other parameters involved in the process.
\[ P_{mp} = \frac{\gamma \cdot Q \cdot H_T}{75 \cdot \eta_{mp}} \]

where \gamma is the specific weight of the water, Q is the water flow, H_T is the total manometric height and \eta_{mp} is the efficiency of the motor-pump set. Expressing the power in watts, with Q in m³/h and H_T in m, this becomes:

\[ P_{mp} = \frac{2.726 \cdot Q \cdot H_T}{\eta_{mp}} \]

The total manometric height (H_T) in pumping sets for water extraction from underground aquifers is given by:

\[ H_T = H_a + H_g + \Delta h_{ft} \]

where H_a is the dynamic level of the well, H_g is the geometric difference of level between the well surface and the reservoir, and \Delta h_{ft} is the total load loss of the hydraulic circuit, all expressed in meters.
From an analysis of the variables in the expression for H_T, it is observed that only the variable corresponding to the geometric difference of level (H_g) can be considered constant, while the other two change along the operation time of the well.
The dynamic level (H_a) will drop from the beginning of the pumping until the moment of stabilization. This observation is verified over a short period of time, for instance, a month. Besides this variation, which can present a cyclic behavior, other types of variation can take place due to interference from neighboring wells, as well as alterations in the aquifer characteristics.
The total load loss will also vary during the pumping, and it depends on the hydraulic circuit characteristics (diameter, piping length, hydraulic accessories, curves, valves, etc.). These characteristics can be considered constant, since they usually do not change after installation. However, the total load loss also depends on other characteristics of the hydraulic circuit, which frequently change along the useful life of the well. These variable characteristics are given by: (i) roughness of the piping system; (ii) water flow; and (iii) operational problems, such as semi-closed valves, leakage, etc.
Observing again Figure 1, it is verified that the energy necessary to transport the water from the aquifer to the reservoir, overcoming all the inherent load losses, is supplied by the electric system to the motor-pump set. Thus, using these considerations and substituting the expression for H_T into the power equation, we have:

\[ P_{el} = \frac{2.726 \cdot Q \cdot (H_a + H_g + \Delta h_{ft})}{\eta_{mp}} \]
where H_g is the geometric difference of level between the well surface and the reservoir (m).
From this expression, and considering that an energetic efficiency indicator should be a generic descriptive indicator, the Global Energetic Efficiency Indicator (GEEI) is here proposed by the following equation:

\[ GEEI = \frac{P_{el}}{Q \cdot (H_a + H_g + \Delta h_{ft})} \]
Observing this equation, it is verified that the GEEI depends on the electric power, water flow, dynamic level, geometric difference of level, and total load loss of the hydraulic circuit. The efficiency of the motor-pump set does not appear in the equation because its behavior is reflected inversely by the GEEI. Thus, when the efficiency of the motor-pump set is high, the GEEI will be low. Therefore, the best GEEI values are those presenting the smallest numeric values.
Another reason to exclude the efficiency of the motor-pump set from the equation is the difficulty of obtaining this value in practice. Since it is a fictitious value, it is impossible to make a direct measurement, and its value is obtained through relationships between other quantities. After the beginning of the pumping, the water level inside the well lowers. Then the manometric height changes and, as a result, the water flow also changes. The efficiency of a motor-pump set will also change along its useful life due to equipment wear, piping incrustations, leakages in the hydraulic system, obstruction of filters inside the well, closed or semi-closed valves, etc.
Therefore, converting all variables to meters, the most generic form of the GEEI is given by:

\[ GEEI = \frac{P_{el}}{Q \cdot H_T} \]

The GEEI thus defined can be used to analyze the behavior of the well along the time.
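As a minimal sketch, the GEEI can be computed directly from the measured quantities; the operating point below is invented for illustration (the chapter's measured values are not reproduced here):

```python
def geei(p_el, q, h_a, h_g, dh_ft):
    """GEEI = P_el / (Q * H_T), with H_T = H_a + H_g + dh_ft.
    p_el in W, q in m^3/h, levels and losses in m -> GEEI in Wh/(m^3*m)."""
    return p_el / (q * (h_a + h_g + dh_ft))

# Invented operating point: 15 kW absorbed, 40 m^3/h pumped,
# 60 m dynamic level, 25 m geometric difference, 5 m of load loss.
indicator = geei(15000.0, 40.0, 60.0, 25.0, 5.0)
```

Note that substituting P_el = 2.726·Q·H_T/η_mp into this expression gives GEEI = 2.726/η_mp, which makes explicit why a high motor-pump efficiency corresponds to a low GEEI.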
Among all the parameters necessary to determine the proposed GEEI, the exploration flow is the most difficult to obtain in practice. The use of flow meters, such as electromagnetic ones, is very expensive, while rudimentary tests provide imprecise results.
To overcome this practical problem, the use of artificial neural networks is proposed here to determine the exploration flow from other parameters that are measured before determining the GEEI.
Artificial Neural Networks (ANN) are dynamic systems that explore parallel and adaptive processing architectures. They consist of several simple processing elements with a high degree of connectivity between them [ ]. Each one of these elements is associated with a set of parameters, known as network weights, that allows the mapping of a set of known values (network inputs) to a set of associated values (network outputs).
The process of adjusting the weights to suitable values (network training) is carried out through successive presentations of a set of training data. The objective of the training is the minimization of the difference between the output response generated by the network and the respective desired output. After the training process, the network will be able to estimate output values for input sets that were not included in the training data.
In this work, an ANN is used as a functional approximator, since the exploration flow of the well is a variable dependent on those that will be used as input variables. The functional approximation consists of mapping the relationship between the several variables that describe the behavior of a real system [ ].
The ability of artificial neural networks to map complex nonlinear functions makes them an attractive tool for identifying and estimating models that represent the dynamic behavior of engineering processes. This feature is particularly important when the relationship between the several variables involved in the process is nonlinear and/or not very well defined, making its modeling by conventional techniques difficult.
The input variables applied to the proposed neural network were the following:
• Electric power in watts (Pel) absorbed from the electric system at the instant t.
The unique output variable was the exploration flow of the aquifer (Q), expressed in cubic meters per hour. It is important to observe that for each set of input values at a certain instant t, the neural network returns a result for the flow at that same instant t.

The GEEI is then determined by using, in its defining equation, the flow values obtained from the neural network together with the other parameters that come from experimental measurements.
For the training of the neural network, all these variables (inputs and output) were measured and provided to the network. After training, the network was able to estimate the respective output variable. The values of the input variables and the respective output for a certain pumping period, which were used in the network training, are given by a set composed of training patterns (or training vectors).
These patterns were applied to a neural network of MLP type (Multilayer Perceptron) with two hidden layers, and its training was done using the backpropagation algorithm based on the Levenberg-Marquardt method [ ]. A description of the main steps of this algorithm is presented in the Appendix.
The network topology used is similar to that presented in Figure . The number of hidden layers and the number of neurons in each layer were determined from results obtained in [ , ]. The network is here composed of two hidden layers, and the following parameters were used in the training process:
At the end of the training process, the mean squared error obtained was . × 10⁻ , which is a value considered acceptable for this application [ ].
After the training process, values of the input variables were applied to the network and the respective values of flow were obtained at its output. These values were then compared with the measured ones in order to evaluate the obtained precision.
Table 1 shows some values of flow given by the artificial neural network (QANN) and those measured in experimental tests (QET). In this table, the values in bold were not presented to the neural network during the training.
When the patterns used in the training are presented again, it is noticed that the difference between the results is very small, reaching a maximum value of . % of the measured value. When new patterns are used, the highest error reaches . %. It is also observed that the error value for new patterns decreases when they represent an operational stability situation of the motor-pump set, i.e., when they are far away from the transitory period of pumping.
At this point, we should observe that it would be desirable to have a greater number of training patterns for the neural network, especially if they could be obtained from a wider range of values.
The proposed GEEI was determined by its defining equation, and the measured values used were the electric power, the dynamic level, the geometric difference of level, the output pressure of the well, and the water flow obtained from the neural network.
Figure  shows the behavior of the GEEI during the analyzed pumping period. The numeric values that generated this graphic are presented in Table 2.
Time   GEEI       Time   GEEI
0      7.420*     40     5.054
1      4.456*     45     5.139
2      5.738*     50     5.134
3      5.245*     55     5.115
4      4.896*     60     5.073
5      4.951*     75     5.066
6      4.689*     90     5.060
35     4.663
In this section, artificial neural networks are used to map the relationship between the variables associated with the identification process of the aquifer dynamic behavior.
The general architecture of the neural system used in this application is shown in Figure 4, where two neural networks of MLP type, MLP-1 and MLP-2, constituted respectively by one and two hidden layers, compose this architecture.
Figure 4. General architecture of the ANN used for estimation of aquifer dynamic behavior.
The first network (ANN-1) has  neurons in the hidden layer and is responsible for the computation of the aquifer operation level. The training data for ANN-1 were directly obtained from experimental measurements. It is important to note that this network takes into account the present level and the rest time of the aquifer.
The second network (ANN-2) is responsible for the computation of the aquifer dynamic level and is composed of two hidden layers with  neurons each. For this network, the training data were also obtained from experimental measurements. As observed in Figure 4, the ANN-1 output is provided as an input parameter to ANN-2. Therefore, the computation of the aquifer dynamic level takes into account the aquifer operation level, the exploration flow and the operation time.
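The data flow of this cascaded architecture can be sketched with simple placeholder functions; the linear relations inside ann1 and ann2 are hypothetical stand-ins for the trained networks, chosen only to make the composition explicit:

```python
# Hypothetical linear surrogates for the two trained networks, used only to
# make the cascaded data flow of Figure 4 explicit.
def ann1(present_level, rest_time):
    """ANN-1 stand-in: operation level from present level and rest time."""
    return present_level + 0.1 * rest_time            # invented relation

def ann2(operation_level, flow, op_time):
    """ANN-2 stand-in: dynamic level from the ANN-1 output, the
    exploration flow and the operation time."""
    return operation_level + 0.05 * flow * op_time    # invented relation

def dynamic_level(present_level, rest_time, flow, op_time):
    # The ANN-1 output is fed to ANN-2 as an input parameter.
    return ann2(ann1(present_level, rest_time), flow, op_time)

level = dynamic_level(50.0, 10.0, 40.0, 2.0)
```

The point is structural: ANN-2 never sees the present level or the rest time directly; it receives them only through the operation level computed by ANN-1.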
After the training process, the neural networks were used for the estimation of the aquifer dynamic level. The simulation results obtained by the networks are presented in Tables 3 and 4.
Table 3 presents the simulation results obtained by ANN-1 for a particular well. The operation levels computed by the network, taking into account the present level and the rest time of the aquifer, were compared with the results obtained by measurements. In this table, the 'Relative Error' column provides the relative error between the values estimated by the network and those obtained by measurements.
The simulation results obtained by ANN-2 are provided in Table 4. The dynamic level of the aquifer is estimated by the network in relation to the operation level computed by ANN-1, the exploration flow and the operation time. These results are also compared with those obtained by measurements. In Table 4, the 'Relative Error' column gives the relative error between the values computed by the network and those from measurements.
These results show the efficiency of the neural approach used for the estimation of the aquifer dynamic behavior. The values estimated by the networks are accurate to within . % of the exact values for ANN-1 and . % for ANN-2. From the analysis of the results presented in Tables 3 and 4, it is verified that the relative error between the values provided by the networks and those obtained by experimental measurements is very small. For ANN-1, the greatest relative error is . % (Table 3), and for ANN-2 it is . % (Table 4).
7. Conclusion
The management of systems that explore underground aquifers includes the analysis of two basic components: the water, which comes from the aquifer, and the electric energy, which is necessary for the transportation of the water to the consumption point or reservoir. Thus, the development of an efficiency indicator that shows the energetic behavior of a certain catchment system is of great importance for the efficient management of energy consumption, and also for converting the obtained results into actions that make a reduction of energy consumption possible.
The obtained GEEI will indicate the global energetic behavior of the water catchment system from aquifers and will be an indicator of the occurrence of abnormalities, such as tubing breaks or obstructions.
The application of the proposed methodology uses parameters that are easily obtained in the water exploration system. The GEEI calculation can also be done by operators or implemented by means of a computational system.
In addition, a novel methodology for the estimation of aquifer dynamic behavior using artificial neural networks was also presented in this chapter. The estimation process is carried out by two feedforward neural networks. Simulation results confirm that the proposed approach can be efficiently used in these types of problem. From the results, it is possible to simulate several situations in order to define appropriate management plans and policies for the aquifer.
The main advantages of using this neural network approach are the following: (i) velocity: the estimation of dynamic levels is instantly computed, which is appropriate for real-time application; (ii) economy and simplicity: reduction of operational costs and measurement devices; and (iii) precision: the values estimated by the proposed approach are as good as those obtained by physical measurements.
Appendix
The mathematical model that describes the behavior of the artificial neuron is expressed by the following equations:

\[ u = \sum_{i=1}^{n} w_i \cdot x_i + b \]

\[ y = g(u) \]

where n is the number of inputs of the neuron; x_i is the i-th input of the neuron; w_i is the weight associated with the i-th input; b is the threshold associated with the neuron; u is the activation potential; g is the activation function of the neuron; and y is the output of the neuron.
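A direct transcription of this neuron model, assuming a logistic sigmoid for the activation function g (the chapter does not fix a particular g):

```python
import math

def neuron(inputs, weights, bias):
    """u = sum_i w_i * x_i + b; y = g(u), with g taken here to be the
    logistic sigmoid (the choice of g is an assumption)."""
    u = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-u))

y = neuron([0.5, -1.0], [2.0, 1.0], 0.0)   # u = 2*0.5 + 1*(-1) + 0 = 0
```

With the activation potential u equal to zero, the sigmoid returns exactly 0.5, its midpoint.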
To approximate any continuous nonlinear function, a neural network with only one hidden layer can be used. However, to approximate functions that are not continuous in their domain, it is necessary to increase the number of hidden layers. Therefore, such networks are of great importance in mapping nonlinear processes and in identifying the relationship between the variables of these systems, which are generally difficult to obtain by conventional techniques.
The network weights w_j associated with the j-th output neuron are adjusted by computing the error signal linked to the k-th iteration, or k-th input vector (training example). This error signal is given by
e_j(k) = d_j(k) − y_j(k)
Adding all squared errors produced by the output neurons of the network with respect to the k-th iteration, we have
E(k) = (1/2) Σ_{j=1}^{p} e_j^2(k)
Rec”rren“ Ne”ral Ne“work Based Approach for Solving Gro”ndwa“er Hydrology Problems 241
h““p://dx.doi.org/10.5772/51598
w_ji(k+1) ← w_ji(k) − η ∂E(k)/∂w_ji(k)
where w_ji is the weight connecting the j-th neuron of the output layer to the i-th neuron of the previous layer, and η is a constant that determines the learning rate of the backpropagation algorithm.
The adjustment of the weights belonging to the hidden layers of the network is carried out in an analogous way. The basic steps necessary for adjusting the weights associated with the hidden neurons can be found in [ ].
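For illustration, one gradient-descent update of the kind described above can be sketched as follows. A linear output neuron is assumed for simplicity, so that the gradient of E(k) with respect to w_i reduces to −e(k)·x_i; for a nonlinear activation g the error term would also be scaled by g'(u):

```python
def backprop_output_step(w, x, d, eta=0.1):
    """One weight update w_i <- w_i - eta * dE/dw_i for a single linear
    output neuron (simplifying assumption: g(u) = u, so dE/dw_i = -e * x_i).
    w: weights, x: inputs, d: desired output, eta: learning rate."""
    y = sum(wi * xi for wi, xi in zip(w, x))            # neuron output
    e = d - y                                           # error e(k) = d(k) - y(k)
    return [wi + eta * e * xi for wi, xi in zip(w, x)]  # gradient-descent step

# Repeating the step drives the squared error E(k) = e^2 / 2 toward zero
w, x, d = [0.0, 0.0], [1.0, 2.0], 3.0
for _ in range(200):
    w = backprop_output_step(w, x, d)
```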
Since the backpropagation learning algorithm was first popularized, there has been considerable research into methods to accelerate its convergence.
While backpropagation is a steepest-descent algorithm, the Levenberg-Marquardt algorithm is similar to the quasi-Newton methods, which were designed to approach second-order training speed without having to compute the Hessian matrix.
When the performance function has the form of a sum of squared errors, like the one presented above, the Hessian matrix can be approximated as
H = J^T J

g = J^T e
where e is a vector of network errors, and J is the Jacobian matrix that contains the first derivatives of the network errors with respect to the weights and biases.
The Levenberg-Marquardt algorithm uses this approximation to the Hessian matrix in the following Newton-like update:
w_{k+1} ← w_k − (J^T J + μ I)^{−1} J^T e
When the scalar μ is zero, this is Newton's method using the approximate Hessian matrix. When μ is large, this produces gradient descent with a small step size. Newton's method is faster and more accurate near an error minimum, so the aim is to shift toward Newton's method as quickly as possible.
Thus, μ is decreased after each successful step (reduction in the performance function) and is increased only when a tentative step would increase the performance function. In this way, the performance function is always reduced at each iteration of the algorithm [ ].
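The update rule above can be sketched numerically. This is a minimal single-step illustration with NumPy, not the full adaptive-μ training loop; the test problem (a small linear least-squares fit, where the Jacobian is constant) is chosen so that one step with μ = 0 lands on the minimum, since the update then reduces to the Gauss-Newton step:

```python
import numpy as np

def lm_step(w, J, e, mu):
    """One Levenberg-Marquardt update: w <- w - (J^T J + mu I)^-1 J^T e.
    J: Jacobian of the network errors w.r.t. the weights; e: error vector."""
    H = J.T @ J                     # approximate Hessian
    g = J.T @ e                     # gradient
    return w - np.linalg.solve(H + mu * np.eye(len(w)), g)

# Illustrative linear problem: minimize ||A w - target||^2, so J = A
A = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
target = np.array([1.0, 2.0, 2.0])
w = np.zeros(2)
e = A @ w - target                  # residuals at the starting point
w = lm_step(w, A, e, mu=0.0)        # one Gauss-Newton step reaches the optimum
```

In practice μ would be adjusted between steps exactly as the text describes: decreased after a successful step, increased after a rejected one.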
This algorithm appears to be the fastest method for training moderate-sized feedforward neural networks (up to several hundred weights).
Author details
References
[ ] Hagan, M. T., & Menhaj, M. B. (1994). Training Feedforward Networks with the Marquardt Algorithm. IEEE Transactions on Neural Networks, 5(6), 989-993.
[ ] Silva, I. N., Saggioro, N. J., & Cagnon, J. A. Using neural networks for estimation of aquifer dynamical behavior. In: Proceedings of the International Joint Conference on Neural Networks, IJCNN, Como, Italy.
[ ] Cagnon, J. A., Saggioro, N. J., & Silva, I. N. Application of neural networks for analysis of the groundwater aquifer behavior. In: Proceedings of the IEEE Industry Applications Conference, INDUSCON, Porto Alegre, Brazil.
[ ] Allen, D. M., Schuurman, N., & Zhang, Q. Using Fuzzy Logic for Modeling Aquifer Architecture. Journal of Geographical Systems.
[ ] Koike, K., Sakamoto, H., & Ohmi, M. Detection and Hydrologic Modeling of Aquifers in Unconsolidated Alluvial Plains through Combination of Borehole Data Sets: A Case Study of the Arao Area, Southwest Japan. Engineering Geology.
[ ] Fu, S., & Xue, Y. Identifying aquifer parameters based on the algorithm of simple pure shape. In: Proceedings of the International Symposium on Water Resource and Environmental Protection, ISWREP, Xi'an, China.
[ ] Jinyan, G., Yudong, L., Yuan, M., Mingchao, H., Yan, L., & Hongjuan, L. A mathematic time-dependent boundary model for flow to a well in an unconfined aquifer. In: Proceedings of the International Symposium on Water Resource and Environmental Protection, ISWREP, Xi'an, China.
[ ] Hong, Y. S., Rosen, M. R., & Reeves, R. R. Dynamic Fuzzy Modeling of Storm Water Infiltration in Urban Fractured Aquifers. Journal of Hydrologic Engineering.
[ ] He, X., & Liu, J. J. Aquifer parameter identification with ant colony optimization algorithm. In: Proceedings of the International Workshop on Intelligent Systems and Applications, ISA, Wuhan, China.
Chapter 12
h““p://dx.doi.org/10.5772/51381
Introduction
There is great interest in knowing whether a new company will be able to survive. Investors use different tools to evaluate the survival capabilities of middle-aged companies, but there is no such tool for start-ups. Most of the tools are based on regression models and on quantitative variables. Nevertheless, qualitative variables, which measure the company's way of working and the manager's skills, can be considered as important as quantitative ones.
Developing a global regression model that includes quantitative and qualitative variables can be very complicated. In this case, artificial neural networks can be a very useful tool to model the company's survival capabilities. They have been widely used in modeling engineering processes, but also in economy and business modeling, and there is no problem in mixing quantitative and qualitative variables in the same model. This kind of network is called a mixed artificial neural network.
The basic indexes of Spanish entrepreneurship have been affected by the economic crisis, including a moderate drop in the Total Entrepreneurial Activity index.
© 2013 Fernandez e“ al.; licensee InTech. This is an open access ar“icle dis“rib”“ed ”nder “he “erms of “he
Crea“ive Commons A““rib”“ion License (h““p://crea“ivecommons.org/licenses/by/3.0), which permi“s
”nres“ric“ed ”se, dis“rib”“ion, and reprod”c“ion in any medi”m, provided “he original work is properly ci“ed.
This rate implies that there is a large number of nascent businesses (those only a few months old) in our country. The number of owner-managers of new businesses has also declined, returning to the levels of previous years.
Although most individuals are pulled into entrepreneurial activity because of opportunity recognition, others are pushed into entrepreneurship because they have no other means of making a living, or because they fear becoming unemployed in the near future. These necessity-driven individuals make up the remainder of the entrepreneurs in Spain.
The median amount invested by entrepreneurs has fallen compared with previous years. Therefore the entrepreneurial initiative is, in general, less ambitious.
Use of Ar“ificial Ne”ral Ne“works “o Predic“ The B”siness S”ccess or Fail”re of S“ar“-Up Firms 247
h““p://dx.doi.org/10.5772/51381
The factors that most constrain entrepreneurial activity are, first, financial support (e.g., availability of debt and equity); second, government policies supporting entrepreneurship; and third, social and cultural norms. Each was cited as a constraining factor by a substantial share of respondents.
More than one fifth of the entrepreneurial activity was developed in a family model: the entrepreneurial initiatives, often driven by family members, received financial support or management assistance from family members. Nevertheless, the influence of knowledge, technology or research results developed in the university was bigger than expected: a notable share of nascent businesses, and of owner-managers of new businesses, started because they used some knowledge, technology or research result developed in the university.
Questionnaire
It is clear that company survival is greatly influenced by its financial capabilities; however, this numerical information is not always easy to obtain, and even when obtained, it is not always reliable.
Table 1. Variables
There are some other qualitative factors that influence company success, such as its technological capabilities, quality policies, or the academic level of its employees and manager. In this study we will use both numerical and qualitative data to model company survival (Table 1).
Financial data
The ratios most used to predict company success are the Altman ratios (Lacher et al.; Atiya):
• Working Capital/Total Assets. Working capital is defined as the difference between current assets and current liabilities. Current assets include cash, inventories, receivables and marketable securities. Current liabilities include accounts payable, short-term provisions and accrued expenses.
• Retained Earnings/Total Assets. This ratio is especially important because bankruptcy rates are higher for start-ups and young companies.
• Earnings Before Interest and Taxes/Total Assets. Since a company's existence depends on the earning power of its assets, this ratio is appropriate for failure prediction.
• Market Capitalization/Total Debts. This ratio weighs the dimension of a company's competitive market-place value.
• Sales/Total Assets. This ratio measures the firm's asset utilization.
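The five ratios listed above are straightforward to compute from balance-sheet figures. A minimal helper might look like this (the dictionary key names and the sample figures are illustrative, not from the chapter):

```python
def altman_ratios(balance):
    """Compute the five Altman-style ratios listed above from a dict of
    balance-sheet figures (key names here are illustrative)."""
    working_capital = balance["current_assets"] - balance["current_liabilities"]
    return {
        "WC/TA":   working_capital / balance["total_assets"],
        "RE/TA":   balance["retained_earnings"] / balance["total_assets"],
        "EBIT/TA": balance["ebit"] / balance["total_assets"],
        "MC/TD":   balance["market_cap"] / balance["total_debts"],
        "S/TA":    balance["sales"] / balance["total_assets"],
    }

# Hypothetical firm, figures in thousands of euros
firm = {"current_assets": 120.0, "current_liabilities": 80.0,
        "total_assets": 400.0, "retained_earnings": 40.0, "ebit": 60.0,
        "market_cap": 300.0, "total_debts": 150.0, "sales": 500.0}
ratios = altman_ratios(firm)
```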
Qualitative data
The choice of which qualitative data to use is based on previous works, as in the references (Aragón Sánchez and Rubio Bañón, and Woods and Hampson), which establish several parameters to assess a company's positioning and survival capabilities, and the influence of the manager's personality on company survival.
• Manager's academic level, scored on a numerical scale:
• PhD or Master's degree.
• University degree.
• High school.
• Basic studies.
• Company technological resources, scored on a numerical scale:
• The company has great marketing experience in the field of its products and in others.
• The manager knows the business area perfectly and has worked at several companies related to it.
• The manager is a practical person who is not interested in abstract ideas, prefers routine work and has few artistic interests.
• The manager spends time reflecting on things, has an active imagination and likes to think up new ways of doing things, but may lack pragmatism.
Researchers will conduct these surveys with company managers. The surveys will be conducted by the same team of researchers to ensure the consistency of the questions involving qualitative variables.
Predictive models based on artificial neural networks have been widely used in different knowledge areas, including economy and bankruptcy prediction (Lacher et al.; Jo et al.; Yang et al.; Hsiao et al.) and forecasting of market evolution (Jalil and Misas).
Artificial neural networks are mathematical structures based on biological brains, capable of extracting knowledge from a set of examples (Pérez and Martín). They are made up of a series of interconnected elements called neurons (Fig. 2), and knowledge is stored in the connections between neurons (Priore et al.).
Figure 2. An artificial neuron. u(.): net function, f(.): transfer function, w_ij: connection weights, B_i: bias.
Those neurons are organized in a series of layers. The input layer receives the values of the example variables; the inner layers perform the mathematical operations to obtain the proper response, which is shown by the output layer.
There is no clear method to know how many hidden layers or how many neurons an artificial neural network must have, so the only way to find the best net is by trial and error (Lin and Tseng). In this work, special software will be developed in order to find the optimum number of neurons and hidden layers.
There are many different types of artificial neural network structures, depending on the problem to solve or to model. In this work the perceptron structure has been chosen. The perceptron is one of the most widely used artificial neural networks, and its capability as a universal function approximator (Hornik) makes it suitable for modeling many different kinds of variable relationships, especially when it is more important to obtain a reliable solution than to know the relations between the variables.
The hyperbolic tangent sigmoid function has been chosen as the transfer function. This function is a variation of the hyperbolic tangent (Chen), but it is quicker to compute and improves the network's efficiency (Demuth et al.).
As the transfer function's output interval is [-1, 1], the input data were normalized before training the network by means of the following equation (Krauss et al.; Demuth et al.; Peng et al.):
X' = (X − X_min) / (X_max − X_min)
where X' is the value after normalization of vector X, and X_min and X_max are the minimum and maximum values of vector X.
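The normalization equation above can be sketched as a one-line Python helper (a minimal version; as written, the equation maps the smallest value of the vector to 0 and the largest to 1):

```python
def minmax_normalize(xs):
    """Rescale a data vector with X' = (X - Xmin) / (Xmax - Xmin)."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

scaled = minmax_normalize([2.0, 4.0, 6.0])  # [0.0, 0.5, 1.0]
```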
The network training will be carried out by means of supervised learning (Hagan et al.; Haykin; Pérez & Martín; Isasi & Galván). The whole data set will be randomly divided into three groups with no repetition: a training set, a test set and a validation set.
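A random three-way partition with no repetition can be sketched as follows. The 70/15/15 split fractions are illustrative placeholders, since the chapter's exact percentages are not reproduced in this excerpt:

```python
import random

def split_dataset(data, f_train=0.70, f_test=0.15, seed=0):
    """Randomly partition the data into training, test and validation sets
    with no repetition (the fractions are illustrative placeholders)."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)        # reproducible shuffle
    n_tr = round(f_train * len(data))
    n_te = round(f_test * len(data))
    train = [data[i] for i in idx[:n_tr]]
    tst = [data[i] for i in idx[n_tr:n_tr + n_te]]
    val = [data[i] for i in idx[n_tr + n_te:]]
    return train, tst, val

train, tst, val = split_dataset(list(range(100)))
```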
The resilient backpropagation training method will be used for training. This method is very suitable when sigmoid transfer functions are used (Demuth et al.).
To prevent overfitting, a very common problem during training, the training-set error and the validation-set error will be compared every few epochs. Training will be considered finished when the training-set error continues to decrease while the validation-set error begins to increase.
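The early-stopping criterion described above can be sketched as a small check over the two error histories (a minimal version; the sampling interval and any patience window are assumptions, not specified in this excerpt):

```python
def should_stop(train_err, val_err):
    """Early-stopping check: stop once the training-set error keeps
    decreasing while the validation-set error starts to rise.
    Both arguments are error histories sampled every few epochs."""
    if len(val_err) < 2 or len(train_err) < 2:
        return False
    return train_err[-1] < train_err[-2] and val_err[-1] > val_err[-2]

# Onset of overfitting: training error still falling, validation error rising
overfitting = should_stop([0.9, 0.5, 0.3], [0.8, 0.6, 0.7])  # True
```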
As mentioned before, to find the optimum artificial neural network architecture, specific software will be developed. This software automatically builds different artificial neural network structures with different numbers of neurons and hidden layers. Finally, the software compares the results of all the developed nets and chooses the best one. The Matlab Artificial Neural Network Toolbox will be used to develop the artificial neural network.
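The trial-and-error search that the software performs can be sketched generically: build a network for each candidate layout, score it, and keep the best. Here `evaluate` is a hypothetical stand-in for training a network and returning its validation-set error, and the candidate layouts and scores are illustrative:

```python
def best_architecture(candidates, evaluate):
    """Trial-and-error architecture search: score each candidate layout
    (a tuple of hidden-layer sizes) and return the one with lowest error.
    `evaluate` stands in for training + scoring one network."""
    return min(candidates, key=evaluate)

# Illustrative scores: pretend validation error is known for each layout
score = {(4,): 0.30, (8,): 0.22, (8, 4): 0.18, (16, 8): 0.25}
best = best_architecture(list(score), evaluate=score.get)  # (8, 4)
```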
This work represents the initial steps of an ambitious project that aims to evaluate the survival of start-up companies. The work is currently in its third stage, which is the data analysis by means of the artificial neural network modeling method. Once finished, it is expected to provide a very useful tool to predict the business success or failure of start-up firms.
Acknowledgements
This study is part of the ESIC Project "Redes neuronales y su aplicación al diagnóstico empresarial. Factores críticos de éxito de emprendimiento" supported by ESIC Business and Marketing School (Spain).
Author details
Francisco Garcia Fernandez*, Ignacio Soret Los Santos, Javier Lopez Martinez, Santiago Izquierdo Izquierdo and Francisco Llamazares Redondo
References
[ ] de la Vega, E., & García Pastor, J. GEM Informe Ejecutivo España. Madrid: Instituto de Empresa, Madrid Emprende. Paper presented at Memoria de Viveros de Empresa de la Comunidad de Madrid, Madrid: Madrid Emprende.
[ ] Lacher, R. C., Coats, P. K., Sharma, S. C., & Fant, L. F. A neural network for classifying the financial health of a firm. European Journal of Operational Research.
[ ] Rubio Bañón, A., & Aragón Sánchez, A. Factores explicativos del éxito competitivo. Un estudio empírico en la pyme. Cuadernos de Gestión.
[ ] Aragón Sánchez, A., & Rubio Bañón, A. Factores explicativos del éxito competitivo: el caso de las PYMES del estado de Veracruz. Contaduría y Administración.
[ ] Woods, S. A., & Hampson, S. E. Measuring the Big Five with single items using a bipolar response scale. European Journal of Personality.
[ ] Jo, H., Han, I., & Lee, H. Bankruptcy prediction using case-based reasoning, neural networks and discriminant analysis. Expert Systems With Applications.
[ ] Yang, Z. R., Platt, M. B., & Platt, H. D. Probabilistic neural networks in bankruptcy prediction. Journal of Business Research.
[ ] Jalil, M. A., & Misas, M. Evaluación de pronósticos del tipo de cambio utilizando redes neuronales y funciones de pérdida asimétricas. Revista Colombiana de Estadística.
[ ] Priore, P., De La Fuente, D., Pino, R., & Puente, J. Utilización de las redes neuronales en la toma de decisiones. Aplicación a un problema de secuenciación. Anales de Mecánica y Electricidad.
[ ] Lin, T. Y., & Tseng, C. H. Optimum design for artificial networks: an example in a bicycle derailleur system. Engineering Applications of Artificial Intelligence.
[ ] Demuth, H., Beale, M., & Hagan, M. Neural Network Toolbox User's Guide. Natick, MA, USA: The MathWorks Inc.
[ ] Krauss, G., Kindangen, J. I., & Depecker, P. Using artificial neural network to predict interior velocity coefficients. Building and Environment.
[ ] Peng, G., Chen, X., Wu, W., & Jiang, X. Modeling of water sorption isotherm for corn starch. Journal of Food Engineering.
[ ] Hagan, M. T., Demuth, H. B., & Beale, M. Neural Network Design. Boston, MA, USA: PWS Publishing Co.