

ARTIFICIAL NEURAL NETWORKS – ARCHITECTURES AND APPLICATIONS
Edited by Kenji Suzuki

Artificial Neural Networks – Architectures and Applications
http://dx.doi.org/10.5772/3409
Edited by Kenji Suzuki

Contributors
Eduardo Bianchi, Thiago M. Geronimo, Carlos E. D. Cruz, Fernando de Souza Campos, Paulo Roberto De Aguiar, Yuko Osana, Francisco Garcia Fernandez, Ignacio Soret Los Santos, Francisco Llamazares Redondo, Santiago Izquierdo Izquierdo, José Manuel Ortiz-Rodríguez, Hector Rene Vega-Carrillo, José Manuel Cervantes-Viramontes, Víctor Martín Hernández-Dávila, Maria Del Rosario Martínez-Blanco, Giovanni Caocci, Amr Radi, Joao Luis Garcia Rosa, Jan Mareš, Lucie Grafova, Ales Prochazka, Pavel Konopasek, Siti Mariyam Shamsuddin, Hazem M. El-Bakry, Ivan Nunes Da Silva

Published by InTech Janeza Trdine 9, 51000 Rijeka, Croatia Copyright © 2013 InTech
All chapters are Open Access distributed under the Creative Commons Attribution 3.0 license, which allows users to download, copy and build upon published articles even for commercial purposes, as long as the author and publisher are properly credited, which ensures maximum dissemination and a wider impact of our publications. After this work has been published by InTech, authors have the right to republish it, in whole or part, in any publication of which they are the author, and to make other personal use of the work. Any republication, referencing or personal use of the work must explicitly identify the original source.

Notice
Statements and opinions expressed in the chapters are those of the individual contributors and not necessarily those of the editors or publisher. No responsibility is accepted for the accuracy of information contained in the published chapters. The publisher assumes no responsibility for any damage or injury to persons or property arising out of the use of any materials, instructions, methods or ideas contained in the book.

Publishing Process Manager: Iva Lipovic
Technical Editor: InTech DTP team
Cover: InTech Design team

First published January, 2013
Printed in Croatia

A free online edition of this book is available at www.intechopen.com
Additional hard copies can be obtained from orders@intechopen.com

Artificial Neural Networks – Architectures and Applications, Edited by Kenji Suzuki p. cm. ISBN 978-953-51-0935-8

Contents

Preface VII

Section 1  Architecture and Design  1

Chapter 1  Improved Kohonen Feature Map Probabilistic Associative Memory Based on Weights Distribution  3
Shingo Noguchi and Osana Yuko

Chapter 2  Biologically Plausible Artificial Neural Networks  25
João Luís Garcia Rosa

Chapter 3  Weight Changes for Learning Mechanisms in Two-Term Back-Propagation Network  53
Siti Mariyam Shamsuddin, Ashraf Osman Ibrahim and Citra Ramadhena

Chapter 4  Robust Design of Artificial Neural Networks Methodology in Neutron Spectrometry  83
José Manuel Ortiz-Rodríguez, Ma. del Rosario Martínez-Blanco, José Manuel Cervantes Viramontes and Héctor René Vega-Carrillo

Section 2  Applications  113

Chapter 5  Comparison Between an Artificial Neural Network and Logistic Regression in Predicting Long Term Kidney Transplantation Outcome  115
Giovanni Caocci, Roberto Baccoli, Roberto Littera, Sandro Orrù, Carlo Carcassi and Giorgio La Nasa

Chapter 6  Edge Detection in Biomedical Images Using Self-Organizing Maps  125
Lucie Gráfová, Jan Mareš, Aleš Procházka and Pavel Konopásek

Chapter 7  MLP and ANFIS Applied to the Prediction of Hole Diameters in the Drilling Process  145
Thiago M. Geronimo, Carlos E. D. Cruz, Fernando de Souza Campos, Paulo R. Aguiar and Eduardo C. Bianchi

Chapter 8  Integrating Modularity and Reconfigurability for Perfect Implementation of Neural Networks  163
Hazem M. El-Bakry

Chapter 9  Applying Artificial Neural Network Hadron – Hadron Collisions at LHC  183
Amr Radi and Samy K. Hindawi

Chapter 10  Applications of Artificial Neural Networks in Chemical Problems  203
Vinícius Gonçalves Maltarollo, Káthia Maria Honório and Albérico Borges Ferreira da Silva

Chapter 11  Recurrent Neural Network Based Approach for Solving Groundwater Hydrology Problems  225
Ivan N. da Silva, José Ângelo Cagnon and Nilton José Saggioro

Chapter 12  Use of Artificial Neural Networks to Predict The Business Success or Failure of Start-Up Firms  245
Francisco Garcia Fernandez, Ignacio Soret Los Santos, Javier Lopez Martinez, Santiago Izquierdo Izquierdo and Francisco Llamazares Redondo

Preface

Artificial neural networks may probably be the single most successful technology in the last two decades which has been widely used in a large variety of applications in various areas. An artificial neural network, often just called a neural network, is a mathematical (or computational) model that is inspired by the structure and function of biological neural networks in the brain. An artificial neural network consists of a number of artificial neurons (i.e., nonlinear processing units) which are connected to each other via synaptic weights (or simply just weights). An artificial neural network can “learn” a task by adjusting weights. There are supervised and unsupervised models. A supervised model requires a “teacher” or desired (ideal) output to learn a task. An unsupervised model does not require a “teacher,” but it learns a task based on a cost function associated with the task.

An artificial neural network is a powerful, versatile tool. Artificial neural networks have been successfully used in various applications such as biological, medical, industrial, control engineering, software engineering, environmental, economical, and social applications. The high versatility of artificial neural networks comes from its high capability and learning function. It has been theoretically proved that an artificial neural network can approximate any continuous mapping by arbitrary precision. Desired continuous mapping or a desired task is acquired in an artificial neural network by learning.

The purpose of this book is to provide recent advances of architectures, methodologies, and applications of artificial neural networks. The book consists of two parts: architectures and applications. The architecture part covers architectures, design, optimization, and analysis of artificial neural networks. The fundamental concept, principles, and theory in the section help understand and use an artificial neural network in a specific application properly as well as effectively. The applications part covers applications of artificial neural networks in a wide range of areas including biomedical applications, industrial applications, physics applications, chemistry applications, and financial applications.

Thus, this book will be a fundamental source of recent advances and applications of artificial neural networks in a wide variety of areas. The target audience of this book includes professors, college students, graduate students, and engineers and researchers in companies. I hope this book will be a useful source for readers.

Kenji Suzuki, Ph.D.
University of Chicago
Chicago, Illinois, USA

Section 1

Architecture and Design

Chapter 1

Improved Kohonen Feature Map Probabilistic Associative Memory Based on Weights Distribution

Shingo Noguchi and Osana Yuko

Additional information is available at the end of the chapter
http://dx.doi.org/10.5772/51581

1. Introduction

Recently, neural networks are drawing much attention as a method to realize flexible information processing. Neural networks consider neuron groups of the brain in the creature, and imitate these neurons technologically. Neural networks have some features; especially, one of the important features is that the networks can learn to acquire the ability of information processing.

In the field of neural networks, many models have been proposed such as the Back Propagation algorithm [1], the Kohonen Feature Map (KFM) [2], the Hopfield network [3], and the Bidirectional Associative Memory [4]. In these models, the learning process and the recall process are divided, and therefore they need all information to learn in advance. However, in the real world, it is very difficult to get all information to learn in advance, so we need a model whose learning process and recall process are not divided. As such a model, Grossberg and Carpenter proposed the ART (Adaptive Resonance Theory) [5]. However, the ART is based on the local representation, and therefore it is not robust for damaged neurons in the Map Layer. In the field of associative memories, some models have been proposed [6-8]. Since these models are based on the distributed representation, they have robustness for damaged neurons. However, their storage capacities are small because their learning algorithm is based on the Hebbian learning.

On the other hand, the Kohonen Feature Map (KFM) associative memory [9] has been proposed. Although the KFM associative memory is based on the local representation as similar as the ART [5], it can learn new patterns successively [10], and its storage capacity is larger than that of the models in refs. [6-8]. It can deal with auto and hetero associations and the associations for plural sequential patterns including common terms [11, 12]. Moreover, the KFM associative memory with area representation [13] has been proposed. In the model, the area representation [14] was introduced to the KFM associative memory, and it has robustness for damaged neurons. However, it can not deal with one-to-many associations and associations of analog patterns. As the model which can deal with analog patterns and one-to-many associations, the Kohonen Feature Map Associative Memory with Refractoriness based on Area Representation [15] has been proposed. In the model, one-to-many associations are realized by refractoriness of neurons. Moreover, by improvement of the calculation of the internal states of the neurons in the Map Layer, it has enough robustness for damaged neurons when analog patterns are memorized. However, all these models can not realize probabilistic association for the training set including one-to-many relations.

As the model which can realize probabilistic association for the training set including one-to-many relations, the Kohonen Feature Map Probabilistic Associative Memory based on Weights Distribution (KFMPAM-WD) [16] has been proposed. However, in this model, the weights are updated only in the area corresponding to the input pattern, so the learning considering the neighborhood is not carried out.

In this paper, we propose an Improved Kohonen Feature Map Probabilistic Associative Memory based on Weights Distribution (IKFMPAM-WD). This model is based on the conventional Kohonen Feature Map Probabilistic Associative Memory based on Weights Distribution [16]. The proposed model can realize probabilistic association for the training set including one-to-many relations. Moreover, this model has enough robustness for noisy input and damaged neurons. And, the learning considering the neighborhood can be realized.

Figure 1. Structure of conventional KFMPAM-WD.

2. KFM Probabilistic Associative Memory based on Weights Distribution

Here, we explain the conventional Kohonen Feature Map Probabilistic Associative Memory based on Weights Distribution (KFMPAM-WD) [16].

2.1. Structure

Figure 1 shows the structure of the conventional KFMPAM-WD. As shown in Fig. 1, this model has two layers: (1) Input/Output Layer and (2) Map Layer, and the Input/Output Layer is divided into some parts.

2.2. Learning process

In the learning algorithm of the conventional KFMPAM-WD, the connection weights are learned as follows:

1. The initial values of weights are chosen randomly.
2. The Euclidian distance between the learning vector X^(p) and the connection weights vector W_i, d(X^(p), W_i), is calculated.
3. If d(X^(p), W_i) > θ^t is satisfied for all neurons, the input pattern X^(p) is regarded as an unknown pattern. If the input pattern is regarded as a known pattern, go to (8).
4. The neuron which is the center of the learning area, r, is determined as follows:

   r = argmin_{i : D_iz + D_zi < d_iz (for ∀z ∈ F)} d(X^(p), W_i)    (1)

   where F is the set of the neurons whose connection weights are fixed, and d_iz is the distance between the neuron i and the neuron z whose connection weights are fixed. In Eq. (1), the neuron whose Euclidian distance between its connection weights and the learning vector is minimum is selected among the neurons which can take areas without overlaps to the areas corresponding to the patterns which are already trained. In Eq. (1), D_ij is the radius of the ellipse area whose center is the neuron i for the direction to the neuron j, and is given by

   D_ij = a_i,                                                        (d_ij^y = 0)
        = b_i,                                                        (d_ij^x = 0)
        = a_i b_i sqrt(m_ij^2 + 1) / sqrt(b_i^2 + m_ij^2 a_i^2),      (otherwise)    (2)

   where a_i is the long radius of the ellipse area whose center is the neuron i, b_i is the short radius of the ellipse area whose center is the neuron i, and m_ij is the slope of the line through the neurons i and j. In the KFMPAM-WD, a_i and b_i can be set for each training pattern; that is, a_i and b_i are used as the size of the area for the learning vector.
5. If d(X^(p), W_r) > θ^t is satisfied, the connection weights of the neurons in the ellipse whose center is the neuron r are updated as follows:

   W_i(t+1) = W_i(t) + α(t) (X^(p) − W_i(t)),    (d_ri ≤ D_ri)
            = W_i(t),                            (otherwise)    (3)

   where α(t) is the learning rate and is given by

   α(t) = −α_0 (t − T) / T    (4)

   Here, α_0 is the initial value of α(t) and T is the upper limit of the learning iterations.
6. (5) is iterated until d(X^(p), W_r) ≤ θ^t is satisfied.
7. The connection weights of the neuron r, W_r, are fixed.
8. (2)∼(7) are iterated when a new pattern set is given.

2.3. Recall process

In the recall process of the KFMPAM-WD, when the pattern X is given to the Input/Output Layer, the output of the neuron i in the Map Layer, x_i^map, is calculated by

   x_i^map = 1,    (i = r)
           = 0,    (otherwise)    (5)

where r is selected randomly from the neurons which satisfy

   (1 / N_in) Σ_{k ∈ C} g(X_k − W_ik) > θ^map    (6)

where θ^map is the threshold of the neuron in the Map Layer, and g(·) is given by

   g(b) = 1,    (|b| < θ^d)
        = 0,    (otherwise)    (7)

In the KFMPAM-WD, one of the neurons whose connection weights are similar to the input pattern is selected randomly as the winner neuron. So, the probabilistic association can be realized based on the weights distribution.

When the binary pattern X is given to the Input/Output Layer, the output of the neuron k in the Input/Output Layer, x_k^io, is given by

   x_k^io = 1,    (W_rk ≥ θ^b_io)
          = 0,    (otherwise)    (8)

where θ^b_io is the threshold of the neurons in the Input/Output Layer. When the analog pattern X is given to the Input/Output Layer, the output of the neuron k in the Input/Output Layer is given by

   x_k^io = W_rk    (9)
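To make the winner selection of Eqs. (5)–(7) concrete, the following Python fragment sketches one recall step. It is an illustrative reconstruction, not the authors' code; the array names and the uniform random choice among candidate neurons are assumptions made for the example, and it assumes the input is a known pattern so that at least one candidate exists.

```python
import numpy as np

def recall(x, W, theta_map, theta_d):
    """One recall step of the (I)KFMPAM-WD, Eqs. (5)-(7).

    x : input pattern, shape (N_in,)
    W : connection weights, shape (N_map, N_in)
    returns the winner neuron index r and the recalled pattern W[r]
    """
    # g(.) of Eq. (7): 1 where a weight component is close enough to the input
    g = (np.abs(x - W) < theta_d).astype(float)     # shape (N_map, N_in)
    # Eq. (6): fraction of matching components for each Map-Layer neuron
    match = g.mean(axis=1)
    candidates = np.where(match > theta_map)[0]     # assumed non-empty (known pattern)
    # Eq. (5): the winner r is chosen at random among the candidates
    r = np.random.choice(candidates)
    return r, W[r]
```

Because every neuron inside a stored area satisfies Eq. (6), a pattern stored with a larger ellipse contributes more candidate neurons and is therefore recalled more often; this is how the weights distribution yields the probabilistic association.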

3. Improved KFM Probabilistic Associative Memory based on Weights Distribution

Here, we explain the proposed Improved Kohonen Feature Map Probabilistic Associative Memory based on Weights Distribution (IKFMPAM-WD). The proposed model is based on the conventional Kohonen Feature Map Probabilistic Associative Memory based on Weights Distribution (KFMPAM-WD) [16] described in 2.

3.1. Structure

Figure 2 shows the structure of the proposed IKFMPAM-WD. As shown in Fig. 2, the proposed model has two layers: (1) Input/Output Layer and (2) Map Layer, and the Input/Output Layer is divided into some parts as similar as the conventional KFMPAM-WD.

Figure 2. Structure of proposed IKFMPAM-WD.

3.2. Learning process

In the learning algorithm of the proposed IKFMPAM-WD, the connection weights are learned as follows:

1. The initial values of weights are chosen randomly.
2. The Euclidian distance between the learning vector X^(p) and the connection weights vector W_i, d(X^(p), W_i), is calculated.
3. If d(X^(p), W_i) > θ^t is satisfied for all neurons, the input pattern X^(p) is regarded as an unknown pattern. If the input pattern is regarded as a known pattern, go to (8).
4. The neuron which is the center of the learning area, r, is determined by Eq. (1). In Eq. (1), the neuron whose Euclidian distance between its connection weights and the learning vector is minimum is selected among the neurons which can take areas without overlaps to the areas corresponding to the patterns which are already trained. In Eq. (1), a_i and b_i are used as the size of the area for the learning vector.
5. If d(X^(p), W_r) > θ^t is satisfied, the connection weights of the neurons in the ellipse whose center is the neuron r are updated as follows:

   W_i(t+1) = X^(p),                                   (θ_1^learn ≤ H(d̄_ri))
            = W_i(t) + H(d̄_ri) (X^(p) − W_i(t)),       (θ_2^learn ≤ H(d̄_ri) < θ_1^learn and H(d̄_i*i) < θ_1^learn)
            = W_i(t),                                   (otherwise)    (10)

   where θ_1^learn and θ_2^learn are thresholds, and H(d̄_ri) behaves as the neighborhood function. Here, i* shows the nearest weight-fixed neuron from the neuron i. H(d̄_ri) and H(d̄_i*i) are given by Eq. (11), and these are semi-fixed functions:

   H(d̄_ij) = 1 / (1 + exp((d̄_ij − D) / ε))    (11)

   where d̄_ij shows the normalized radius of the ellipse area whose center is the neuron i for the direction to the neuron j, and is given by

   d̄_ij = d_ij / D_ij    (12)

   In Eq. (11), D (1 ≤ D) is the constant to decide the neighborhood area size and ε is the steepness parameter. If there is no weight-fixed neuron,

   H(d̄_i*i) = 0    (13)

   is used.
6. (5) is iterated until d(X^(p), W_r) ≤ θ^t is satisfied.
7. The connection weights of the neuron r, W_r, are fixed.
8. (2)∼(7) are iterated when a new pattern set is given.
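The role of the semi-fixed neighborhood function can be illustrated with a short sketch of Eqs. (10)–(12) for a single update step. This is a reconstruction for illustration, not the authors' implementation; the function and variable names, and the way distances and radii are passed in, are assumptions.

```python
import numpy as np

def H(d_norm, D=3.0, eps=0.91):
    """Semi-fixed neighborhood function of Eq. (11);
    d_norm is the distance d_ij normalized by the ellipse radius D_ij (Eq. (12))."""
    return 1.0 / (1.0 + np.exp((d_norm - D) / eps))

def update_weights(W, x, d_to_r, D_to_r, d_to_fixed, D_to_fixed,
                   theta1=0.9, theta2=0.1):
    """One learning step of the proposed IKFMPAM-WD, Eq. (10).

    W          : weights, shape (N_map, N_in);  x : learning vector X^(p)
    d_to_r     : distances d_ri from each Map-Layer neuron i to the center neuron r
    D_to_r     : ellipse radii D_ri in the direction of each neuron i
    d_to_fixed : distances to the nearest weight-fixed neuron i*
                 (pass None when no neuron is fixed yet, Eq. (13))
    """
    h_r = H(d_to_r / D_to_r)                                  # H(d-bar_ri)
    h_f = np.zeros_like(h_r) if d_to_fixed is None \
          else H(d_to_fixed / D_to_fixed)                     # H(d-bar_i*i)
    W_new = W.copy()
    core = theta1 <= h_r                                      # copy the pattern directly
    W_new[core] = x
    nbr = (theta2 <= h_r) & (h_r < theta1) & (h_f < theta1)   # neighborhood update
    W_new[nbr] = W[nbr] + h_r[nbr, None] * (x - W[nbr])
    return W_new
```

Neurons very close to the center r receive the learning vector directly, neurons in the surrounding neighborhood are pulled toward it in proportion to H, and neurons near an already fixed area are left untouched; this is what lets the proposed model place related patterns in nearby areas without destroying stored ones.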

3.3. Recall process

The recall process of the proposed IKFMPAM-WD is the same as that of the conventional KFMPAM-WD described in 2.3.

4. Computer experiment results

Here, we show the computer experiment results to demonstrate the effectiveness of the proposed IKFMPAM-WD.

4.1. Experimental conditions

Table 1 shows the experimental conditions used in the experiments of 4.2 ∼ 4.6.

4.2. Association results

4.2.1. Binary patterns

In this experiment, the binary patterns including one-to-many relations shown in Fig. 3 were memorized in the network composed of 800 neurons in the Input/Output Layer and 400 neurons in the Map Layer. Figure 4 shows a part of the association result when “crow” was given to the Input/Output Layer.

Parameters for Learning
  Threshold for Learning θ_t^learn : 10^-4
  Neighborhood Area Size D : 3
  Steepness Parameter in Neighborhood Function ε : 0.91
  Threshold of Neighborhood Function (1) θ_1^learn : 0.9
  Threshold of Neighborhood Function (2) θ_2^learn : 0.1
Parameters for Recall (Common)
  Threshold of Neurons in Map Layer θ^map : 0.75
  Threshold of Difference between Weight Vector and Input Vector θ^d : 0.004
Parameters for Recall (Binary)
  Threshold of Neurons in Input/Output Layer θ^b_in : 0.5

Table 1. Experimental Conditions.

Figure 3. Training Patterns including One-to-Many Relations (Binary Pattern).

In this case, when “crow” was given to the network, “mouse” (t=1), “monkey” (t=2) and “lion” (t=4) were recalled. Figure 5 shows a part of the association result when “duck” was given to the Input/Output Layer. In this case, “dog” (t=251), “cat” (t=252) and “penguin” (t=255) were recalled. From these results, we can confirm that the proposed model can recall binary patterns including one-to-many relations.

Figure 4. One-to-Many Associations for Binary Patterns (When “crow” was Given).

Figure 5. One-to-Many Associations for Binary Patterns (When “duck” was Given).

Figure 6 shows the Map Layer after the pattern pairs shown in Fig. 3 were memorized. In Fig. 6, red neurons show the center neuron in each area, blue neurons show the neurons in areas for the patterns including “crow”, and green neurons show the neurons in areas for the patterns including “duck”. As shown in Fig. 6, the proposed model can learn each learning pattern with various size areas. Moreover, since the connection weights are updated not only in the area but also in the neighborhood area in the proposed model, the areas corresponding to the pattern pairs including “crow”/“duck” are arranged near each other.

Learning Pattern : Long Radius a_i / Short Radius b_i
  “crow”–“lion” : 2.5 / 1.5
  “crow”–“monkey” : 3.5 / 2.0
  “crow”–“mouse” : 4.5 / 2.5
  “duck”–“penguin” : 2.5 / 1.5
  “duck”–“dog” : 3.5 / 2.0
  “duck”–“cat” : 4.5 / 2.5

Table 2. Area Size corresponding to Patterns in Fig. 3.

Input Pattern / Output Pattern : Area Size / Recall Times
  crow / lion : 11 (1.0) / 43 (1.0)
  crow / monkey : 23 (2.1) / 79 (1.8)
  crow / mouse : 33 (3.0) / 132 (3.1)
  duck / penguin : 11 (1.0) / 39 (1.0)
  duck / dog : 23 (2.1) / 87 (2.2)
  duck / cat : 33 (3.0) / 120 (3.1)

Table 3. Recall Times for Binary Pattern corresponding to “crow” and “duck”.

Figure 6. Area Representation for Learning Pattern in Fig. 3.

Table 3 shows the recall times of each pattern in the trial of Fig. 4 (t=1∼250) and Fig. 5 (t=251∼500). In this table, normalized values are also shown in ( ). From these results, we can confirm that the proposed model can realize probabilistic associations based on the weight distributions.

4.2.2. Analog patterns

In this experiment, the analog patterns including one-to-many relations shown in Fig. 7 were memorized in the network composed of 800 neurons in the Input/Output Layer and 400 neurons in the Map Layer. Figure 8 shows a part of the association result when “bear” was given to the Input/Output Layer. As shown in Fig. 8, when “bear” was given to the network, “lion” (t=1), “raccoon dog” (t=2) and “penguin” (t=3) were recalled. Figure 9 shows a part of the association result when “mouse” was given to the Input/Output Layer. In this case, “monkey” (t=251), “hen” (t=252) and “chick” (t=253) were recalled. From these results, we can confirm that the proposed model can recall analog patterns including one-to-many relations.

Figure 7. Training Patterns including One-to-Many Relations (Analog Pattern).

Figure 8. One-to-Many Associations for Analog Patterns (When “bear” was Given).

Figure 9. One-to-Many Associations for Analog Patterns (When “mouse” was Given).

Learning Pattern : Long Radius a_i / Short Radius b_i
  “bear”–“lion” : 2.5 / 1.5
  “bear”–“raccoon dog” : 3.5 / 2.0
  “bear”–“penguin” : 4.5 / 2.5
  “mouse”–“chick” : 2.5 / 1.5
  “mouse”–“hen” : 3.5 / 2.0
  “mouse”–“monkey” : 4.5 / 2.5

Table 4. Area Size corresponding to Patterns in Fig. 7.

Figure 10. Area Representation for Learning Pattern in Fig. 7.

Input Pattern / Output Pattern : Area Size / Recall Times
  bear / lion : 11 (1.0) / 38 (1.0)
  bear / raccoon dog : 23 (2.1) / 94 (2.5)
  bear / penguin : 33 (3.0) / 118 (3.1)
  mouse / chick : 11 (1.0) / 40 (1.0)
  mouse / hen : 23 (2.1) / 90 (2.3)
  mouse / monkey : 33 (3.0) / 120 (3.0)

Table 5. Recall Times for Analog Pattern corresponding to “bear” and “mouse”.

Figure 10 shows the Map Layer after the pattern pairs shown in Fig. 7 were memorized. In Fig. 10, red neurons show the center neuron in each area, blue neurons show the neurons in the areas for the patterns including “bear”, and green neurons show the neurons in the areas for the patterns including “mouse”. As shown in Fig. 10, the proposed model can learn each learning pattern with various size areas.

Table 5 shows the recall times of each pattern in the trial of Fig. 8 (t=1∼250) and Fig. 9 (t=251∼500). In this table, normalized values are also shown in ( ). From these results, we can confirm that the proposed model can realize probabilistic associations based on the weight distributions.

Figure 11. Storage Capacity of Proposed Model (Binary Patterns).

Figure 12. Storage Capacity of Proposed Model (Analog Patterns).

4.3. Storage capacity

Here, we examined the storage capacity of the proposed model. Figures 11 and 12 show the storage capacity of the proposed model. In this experiment, we used the network composed of 800 neurons in the Input/Output Layer and 400/900 neurons in the Map Layer, and 1-to-P (P=2,3,4) random pattern pairs were memorized as the area (a_i=2.5 and b_i=1.5). Figures 11 and 12 show the average of 100 trials, and the storage capacities of the conventional model [16] are also shown for reference in Figs. 13 and 14. As shown in these figures, the storage capacity of the proposed model does not depend on binary or analog patterns, and it does not depend on P in one-to-P relations; it depends on the number of neurons in the Map Layer. From these results, we can confirm that the storage capacity of the proposed model is almost the same as that of the conventional model [16].

4.4. Robustness for noisy input

4.4.1. Association result for noisy input

Figure 15 shows a part of the association result of the proposed model when the pattern “cat” with 20% noise was given during t=1∼500. Figure 16 shows a part of the association result of the proposed model when the pattern “crow” with 20% noise was given during t=501∼1000. As shown in these figures, the proposed model can recall correct patterns even when the noisy input was given.

Figure 13. Storage Capacity of Conventional Model [16] (Binary Patterns).

Figure 14. Storage Capacity of Conventional Model [16] (Analog Patterns).

Figure 15. Association Result for Noisy Input (When “crow” was Given).

Figure 16. Association Result for Noisy Input (When “duck” was Given).

Figure 17. Robustness for Noisy Input (Binary Patterns).

Figure 18. Robustness for Noisy Input (Analog Patterns).

4.4.2. Robustness for noisy input

Figures 17 and 18 show the robustness for noisy input of the proposed model. In these experiments, 1∼10 random patterns in one-to-one relations were memorized in the network composed of 800 neurons in the Input/Output Layer and 900 neurons in the Map Layer. Figures 17 and 18 are the average of 100 trials. As shown in these figures, the proposed model has robustness for noisy input as similar as the conventional model [16].

4.5. Robustness for damaged neurons

4.5.1. Association result when some neurons in map layer are damaged

Figure 19 shows a part of the association result of the proposed model when the pattern “bear” was given during t=1∼500. Figure 20 shows a part of the association result of the proposed model when the pattern “mouse” was given during t=501∼1000. In this experiment, 10 random patterns in one-to-one relations were memorized in the network composed of 800 neurons in the Input/Output Layer and 900 neurons in the Map Layer, and the network whose 20% of neurons in the Map Layer are damaged was used. As shown in these figures, the proposed model can recall correct patterns even when some neurons in the Map Layer are damaged.

4.5.2. Robustness for damaged neurons

Figures 21 and 22 show the robustness when the winner neurons are damaged in the proposed model. In this experiment, 1∼10 random patterns in one-to-one relations were memorized in the network composed of 800 neurons in the Input/Output Layer and 900 neurons in the Map Layer. Figures 21 and 22 are the average of 100 trials. As shown in these figures, the proposed model has robustness when the winner neurons are damaged as similar as the conventional model [16].

Figure 19. Association Result for Damaged Neurons (When “bear” was Given).

Figure 20. Association Result for Damaged Neurons (When “mouse” was Given).

Figure 21. Robustness of Damaged Winner Neurons (Binary Patterns).

Figure 22. Robustness of Damaged Winner Neurons (Analog Patterns).

Figures 23 and 24 show the robustness for damaged neurons in the proposed model. In this experiment, 10 random patterns in one-to-one relations were memorized in the network composed of 800 neurons in the Input/Output Layer and 900 neurons in the Map Layer. Figures 23 and 24 are the average of 100 trials. As shown in these figures, the proposed model has robustness for damaged neurons as similar as the conventional model [16].

Figure 23. Robustness for Damaged Neurons (Binary Patterns).

Figure 24. Robustness for Damaged Neurons (Analog Patterns).

4.6. Learning speed

Here, we examined the learning speed of the proposed model. In this experiment, 10 random patterns were memorized in the network composed of 800 neurons in the Input/Output Layer and 900 neurons in the Map Layer. Table 6 shows the learning time of the proposed model and the conventional model [16]. These results are the average of 100 trials on a personal computer (Intel Pentium 4 (3.2GHz), FreeBSD 4.11, gcc 2.95.3). As shown in Table 6, the learning time of the proposed model is shorter than that of the conventional model.

Learning Time (seconds)
  Proposed Model (Binary Patterns) : 0.87
  Proposed Model (Analog Patterns) : 0.92
  Conventional Model [16] (Binary Patterns) : 1.01
  Conventional Model [16] (Analog Patterns) : 1.34

Table 6. Learning Speed.

5. Conclusions

In this paper, we have proposed the Improved Kohonen Feature Map Probabilistic Associative Memory based on Weights Distribution. This model is based on the conventional Kohonen Feature Map Probabilistic Associative Memory based on Weights Distribution. The proposed model can realize probabilistic association for the training set including one-to-many relations. Moreover, this model has enough robustness for noisy input and damaged neurons. We carried out a series of computer experiments and confirmed the effectiveness of the proposed model.

Author details

Shingo Noguchi and Osana Yuko*
*Address all correspondence to: osana@cs.teu.ac.jp
Tokyo University of Technology, Japan

References

[1] Rumelhart, D. E., McClelland, J. L., & the PDP Research Group. (1986). Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Foundations. The MIT Press.
[2] Kohonen, T. (1995). Self-Organizing Maps. Springer.
[3] Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of National Academy Sciences USA, 79, 2554-2558.
[4] Kosko, B. (1988). Bidirectional associative memories. IEEE Transactions on Neural Networks, 18(1), 49-60.
[5] Carpenter, G. A., & Grossberg, S. (1995). Pattern Recognition by Self-organizing Neural Networks. The MIT Press.

[6] Watanabe, M., Aihara, K., & Kondo, S. (1995). Automatic learning in chaotic neural networks. IEICE-A, J78-A(6), 686-691 (in Japanese).
[7] Arai, T., & Osana, Y. (2006). Hetero chaotic associative memory for successive learning with give up function — One-to-many associations —. Proceedings of IASTED Artificial Intelligence and Applications, Innsbruck.
[8] Ando, M., Okuno, H., & Osana, Y. (2006). Hetero chaotic associative memory for successive learning with multi-winners competition. Proceedings of IEEE and INNS International Joint Conference on Neural Networks, Vancouver.
[9] Ichiki, H., Hagiwara, M., & Nakagawa, M. (1993). Kohonen feature maps as a supervised learning machine. Proceedings of IEEE International Conference on Neural Networks, 1944-1948.
[10] Yamada, T., Hattori, M., Morisawa, M., & Ito, H. (1999). Sequential learning for associative memory using Kohonen feature map. Proceedings of IEEE and INNS International Joint Conference on Neural Networks, Washington D.C., 555.
[11] Hattori, M., Arisumi, H., & Ito, H. (2001). Sequential learning for SOM associative memory with map reconstruction. Proceedings of International Conference on Artificial Neural Networks, Vienna.
[12] Sakurai, N., Hattori, M., & Ito, H. (2002). SOM associative memory for temporal sequences. Proceedings of IEEE and INNS International Joint Conference on Neural Networks, Honolulu, 950-955.
[13] Abe, H., & Osana, Y. (2006). Kohonen feature map associative memory with area representation. Proceedings of IASTED Artificial Intelligence and Applications, Innsbruck.
[14] Ikeda, N., & Hagiwara, M. (1997). A proposal of novel knowledge representation (Area representation) and the implementation by neural network. International Conference on Computational Intelligence and Neuroscience, III, 430-433.
[15] Imabayashi, T., & Osana, Y. (2008). Implementation of association of one-to-many associations and the analog pattern in Kohonen feature map associative memory with area representation. Proceedings of IASTED Artificial Intelligence and Applications, Innsbruck.
[16] Koike, M., & Osana, Y. (2010). Kohonen feature map probabilistic associative memory based on weights distribution. Proceedings of IASTED Artificial Intelligence and Applications, Innsbruck.


Chapter 2

Biologically Plausible Artificial Neural Networks

João Luís Garcia Rosa

Additional information is available at the end of the chapter
http://dx.doi.org/10.5772/54177

1. Introduction

Artificial Neural Networks (ANNs) are based on an abstract and simplified view of the neuron. Artificial neurons are connected and arranged in layers to form large networks, where learning and connections determine the network function. Connections can be formed through learning and do not need to be ‘programmed.’ Recent ANN models lack many physiological properties of the neuron, because they are more oriented to computational performance than to biological credibility [41].

According to the fifth edition of Gordon Shepherd’s book, The Synaptic Organization of the Brain, “information processing depends not only on anatomical substrates of synaptic circuits, but also on the electrophysiological properties of neurons” [51]. In the neuroscience community, it is widely believed that knowing the electrical currents of nerve cells is sufficient to determine what the cell is doing and why. Indeed, this somewhat contradicts the observation that cells that have similar currents may exhibit different behaviors. In the literature of dynamical systems, this fact was ignored until recently, when the difference in behavior was shown to be due to different mechanisms of excitability bifurcation [35].

A bifurcation of a dynamical system is a qualitative change in its dynamics produced by varying parameters [19]. The type of bifurcation determines the most fundamental computational properties of neurons, such as the class of excitability, the existence or nonexistence of the activation threshold, all-or-none action potentials (spikes), sub-threshold oscillations, bi-stability of rest and spiking states, whether the neuron is an integrator or resonator, etc. [25].

A biologically inspired connectionist approach should present a neurophysiologically motivated training algorithm, a bi-directional connectionist architecture, distributed representations, and several other features.

1.1. McCulloch-Pitts neuron

McCulloch-Pitts neuron (1943) was the first mathematical model [32]. Its properties:

• neuron activity is an "all-or-none" process;
• a certain fixed number of synapses are excited within a latent addition period in order to excite a neuron: independent of previous activity and of neuron position;
• synaptic delay is the only significant delay in the nervous system;
• activity of any inhibitory synapse prevents the neuron from firing;
• network structure does not change along time.

The McCulloch-Pitts neuron represents a simplified mathematical model for the neuron, where xi is the i-th binary input and wi is the synaptic (connection) weight associated with the input xi. The computation occurs in the soma (cell body). For a neuron with p inputs:

   a = Σ_{i=1}^{p} x_i w_i    (1)

with x0 = 1 and w0 = β = −θ. β is the bias and θ is the activation threshold. See figures 1 and 2.

Figure 1. The typical neuron.

There are p binary inputs in the schema of figure 2. Xi is the i-th input, and Wi is the connection (synaptic) weight associated with input i. The synaptic weights are real numbers, because the synapses can inhibit (negative signal) or excite (positive signal) and have different intensities. The weighted inputs (Xi × Wi) are summed in the cell body, providing a signal a. After that, the signal a is input to an activation function (f), giving the neuron output. The activation function can be: (1) hard limiter, (2) threshold logic, and (3) sigmoid, which is considered the biologically more plausible activation function.

Figure 2. The neuron model.
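As a concrete illustration of Eq. (1), the sketch below implements a McCulloch-Pitts-style unit in Python with a hard-limiter activation; the function name, the example weights, and the AND example are only illustrative and are not taken from the chapter.

```python
import numpy as np

def mcculloch_pitts(x, w, theta):
    """McCulloch-Pitts unit: weighted sum followed by a hard limiter.

    x     : binary inputs, shape (p,)
    w     : synaptic weights (positive = excitatory, negative = inhibitory)
    theta : activation threshold
    """
    a = np.dot(x, w)               # a = sum_i x_i * w_i (Eq. (1), bias folded into theta)
    return 1 if a >= theta else 0

# Example: a unit computing the logical AND of two binary inputs
print(mcculloch_pitts(np.array([1, 1]), np.array([1.0, 1.0]), theta=1.5))  # -> 1
print(mcculloch_pitts(np.array([1, 0]), np.array([1.0, 1.0]), theta=1.5))  # -> 0
```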

1.2. The perceptron

Rosenblatt’s perceptron [47] takes a weighted sum of neuron inputs, and sends output 1 (spike) if this sum is greater than the activation threshold. It is a linear discriminator: given 2 points, a straight line is able to separate them in two classes (figures 3 and 4). For some configurations of m points, a straight line is able to discriminate them. The limitations of the perceptron are that it is a one-layer feed-forward network (non-recurrent), it is only capable of learning solutions of linearly separable problems, and its learning algorithm (delta rule) does not work with networks of more than one layer.

Figure 3. Set of linearly separable points.

Figure 4. Set of non-linearly separable points.

1.3. Neural network topology

In the cerebral cortex, neurons are disposed in columns, and most synapses occur between different columns. See the famous drawing by Ramón y Cajal (figure 5). In the extremely simplified mathematical model, neurons are disposed in layers (representing columns), and there is communication between neurons in different layers (see figure 6).

Figure 5. Drawing by Santiago Ramón y Cajal of neurons in the pigeon cerebellum. (A) denotes Purkinje cells, an example of a multipolar neuron, while (B) denotes granule cells, which are also multipolar [57].
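The delta-rule training loop mentioned above can be written in a few lines. The following is a generic sketch of the rule (weight change proportional to the output error), not code from the chapter; the learning rate, epoch count, and toy data set are assumptions made for the example.

```python
import numpy as np

def train_perceptron(X, y, lr=0.1, epochs=50):
    """Delta-rule training of a single-layer perceptron.
    X: inputs, shape (n, p); y: desired binary outputs, shape (n,)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            out = 1 if np.dot(xi, w) + b > 0 else 0
            err = target - out
            w += lr * err * xi       # move the weights toward reducing the error
            b += lr * err
    return w, b

# Linearly separable example (logical OR); the perceptron converges here,
# but it cannot learn non-linearly separable sets such as XOR.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
w, b = train_perceptron(X, np.array([0, 1, 1, 1]))
```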

Figure 6. A 3-layer neural network. Notice that there are A + 1 input units, B + 1 hidden units, and C output units. The “extra” neurons in input and hidden layers, labeled 1, represent the presence of bias: the ability of the network to fire even in the absence of input signal. w1 and w2 are the synaptic weight matrices between input and hidden layers and between hidden and output layers, respectively.

1.4. Classical ANN models

Classical artificial neural network models are based upon a simple description of the neuron, taking into account the presence of presynaptic cells and their synaptic potentials, the activation threshold, and the propagation of an action potential. ANNs do not focus on real neuron details. The conductivity delays are neglected. The network input is calculated as the weighted sum of input signals, and it is transformed in an output signal via a simple function (e.g., a threshold function). The output signal is either discrete (e.g., 0 or 1) or a real number (e.g., between 0 and 1). So, they represent an impoverished explanation of human brain characteristics.

As advantages, we may say that ANNs are naturally parallel solutions, robust, fault tolerant, they allow integration of information from different sources or kinds, are adaptive systems, that is, capable of learning, they show a certain autonomy degree in learning, and display a very fast recognizing performance. And there are many limitations of ANNs. Among them, because of lacking of transparency, it is still very hard to explain their behavior; their solutions do not scale well; and they are computationally expensive for big problems, and yet very far from biological reality.

Andy Clark proposes three types of connectionism [2]: (1) the first generation consisting of the perceptron and cybernetics of the 1950s; they are simple neural structures of limited applications [30]; (2) the second generation deals with complex dynamics with recurrent networks in order to deal with spatio-temporal events; (3) the third generation takes into account more complex dynamic and time properties. For the first time, these systems use biologically inspired modular architectures and algorithms. We may add a fourth type: a network which considers populations of neurons instead of individual ones and the existence of chaotic oscillations, perceived by electroencephalogram (EEG) analysis. The K-models are examples of this category [30].

See the main differences between the biological neural system and the conventional computer on table 1.

Von Neumann computer versus biological neural system:
  Processor — Von Neumann: complex, high speed, one or few; Biological: simple, low speed, a large number.
  Memory — Von Neumann: separated from processor, localized, non-content addressable; Biological: integrated with processor, distributed, content addressable.
  Computing — Von Neumann: centralized, sequential, stored programs; Biological: distributed, parallel, self-learning.
  Reliability — Von Neumann: very vulnerable; Biological: robust.
  Expertise — Von Neumann: numeric and symbolic manipulations; Biological: perceptual problems.
  Operational environment — Von Neumann: well-defined, well-constrained; Biological: poorly defined, unconstrained.

Table 1. Von Neumann's computer versus biological neural system [26].

1.5. Learning

The Canadian psychologist Donald Hebb established the bases for current connectionist learning algorithms: “When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased” [21]. Also, the word “connectionism” appeared for the first time: “The theory is evidently a form of connectionism, one of the switchboard variety, though it does not deal in direct connections between afferent and efferent pathways: not an ’S-R’ psychology. The connections serve rather to establish autonomous central activities, which then are the basis of further learning” [21].

According to Hebb, knowledge is revealed by associations, that is, the plasticity in the Central Nervous System (CNS) allows synapses to be created and destroyed. Synaptic weights change values, and therefore allow learning, which can be through internal self-organizing: encoding of new knowledge and reinforcement of existent knowledge. How to supply a neural substrate to association learning among world facts? Hebb proposed a hypothesis: connections between two nodes highly activated at the same time are reinforced. This kind of rule is a formalization of the associationist psychology, in which associations are accumulated among things that happen together. This hypothesis permits to model the CNS plasticity, adapting it to environmental changes. This way, it allows that a connectionist network learns correlation among facts. Learning may happen also through network topology change (in a few models).

Connectionist networks learn through synaptic weight change, in most cases: it reveals statistical correlations from the environment. This is a case of probabilistic reasoning without a statistical model of the problem. Basically, two learning methods are possible with Hebbian learning: unsupervised learning and supervised learning. In unsupervised learning there is no teacher, so the network tries to find out regularities in the input patterns. In supervised learning, the input is associated with the output. If they are equal, learning is called auto-associative; if they are different, hetero-associative.
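A minimal version of this Hebbian idea, sketched below in Python, strengthens a connection in proportion to the joint activity of the two units it links; the learning-rate value and array names are illustrative only and not taken from the chapter.

```python
import numpy as np

def hebbian_update(W, pre, post, lr=0.01):
    """Hebbian weight change: units that are active together have their link reinforced.
    W: weights, shape (n_post, n_pre); pre, post: activity vectors of the two layers."""
    return W + lr * np.outer(post, pre)   # delta_w_ij = lr * post_i * pre_j

# Auto-associative case: the pattern is associated with itself (pre == post);
# hetero-associative case: pre and post are different patterns.
```

In this form the rule is unsupervised, since only correlations in the activity are used; a supervised variant replaces the postsynaptic activity with a desired (teacher) output.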

1.6. Back-propagation

Back-propagation (BP) is a supervised algorithm for multilayer networks. It applies the generalized delta rule, requiring two passes of computation: (1) activation propagation (forward pass), and (2) error back propagation (backward pass). Back-propagation works in the following way: it propagates the activation from input to hidden layer, and from hidden to output layer; calculates the error for output units, then back propagates the error to hidden units and then to input units.

BP is based on the error back propagation: while the stimulus propagates forwardly, the error (difference between the actual and the desired outputs) propagates backwardly. In the cerebral cortex, the stimulus generated when a neuron fires crosses the axon towards its end in order to make a synapse onto another neuron input. Suppose that BP occurs in the brain; in this case, the error must have to propagate back from the dendrite of the postsynaptic neuron to the axon and then to the dendrite of the presynaptic neuron. It sounds unrealistic and improbable. Synaptic “weights” have to be modified in order to make learning possible, but certainly not in the way BP does. Weight change must use only local information in the synapse where it occurs. That's why BP seems to be so biologically implausible.

Although Back-propagation is a very known and most used connectionist training algorithm, it is computationally expensive (slow), it does not solve satisfactorily big size problems, and sometimes the solution found is a local minimum, a locally minimum value for the error function. BP has a universal approximation power, that is, given a continuous function, there is a two-layer network (one hidden layer) that can be trained by Back-propagation in order to approximate as much as desired this function. Besides, it is the most used algorithm.

2. Dynamical systems

Principles of neurodynamics describe the basis for the development of biologically plausible models of cognition [30]. Neurons may be treated as dynamical systems, as the main result of the Hodgkin-Huxley model [23]. A dynamical system consists of a set of variables that describe its state and a law that describes the evolution of the state variables with time [25]. The Hodgkin-Huxley model is a dynamical system of four dimensions, because its status is determined solely by the membrane potential V and the variables of opening (activation) and closing (deactivation) of ion channels n, m and h for persistent K+ and transient Na+ currents [1, 27, 28]. The law of evolution is given by a four-dimensional system of ordinary differential equations (ODE).

All variables that describe the neuronal dynamics can be classified into four classes according to their function and time scale [25]:

1. Membrane potential.
2. Excitation variables, such as activation of a Na+ current. They are responsible for lifting the action potential.
3. Recovery variables, such as the inactivation of a current Na+ and activation of a rapid current K+. They are responsible for re-polarization (lowering) of the action potential.
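To show the two passes of back-propagation side by side, here is a minimal numerical sketch for a network with one hidden layer, using sigmoid units and the generalized delta rule. It is a generic textbook-style illustration rather than code from this chapter; the variable names and learning rate are arbitrary, and biases are omitted for brevity.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bp_step(x, t, W1, W2, lr=0.5):
    """One forward/backward pass for a 2-layer (one hidden layer) network.
    x: input vector; t: desired output vector; W1, W2: weight matrices."""
    # forward pass: input -> hidden -> output
    h = sigmoid(W1 @ x)
    y = sigmoid(W2 @ h)
    # backward pass: output error, then error propagated back to the hidden layer
    delta_out = (t - y) * y * (1 - y)
    delta_hid = (W2.T @ delta_out) * h * (1 - h)
    # generalized delta rule: each weight change uses the local activations
    W2 += lr * np.outer(delta_out, h)
    W1 += lr * np.outer(delta_hid, x)
    return y
```

The backward pass is exactly the step criticized in the text: the output error delta_out has to be sent back through W2 to compute delta_hid, information that a real presynaptic neuron would not have locally available at its synapse.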

. Phase portraits The power of dynamical systems approach to neuroscience is that we can say many things about a system without knowing all the details that govern its evolution. Since there are no changes in their state variables. small perturbations. Consider a quiescent neuron whose membrane potential is at rest. shape and period of limit cycle. such as B. The neurons are different The currents define the type of neuronal dynamical system [20]. All of them are attracted to the equilibrium state denoted by the black dot.5772/54177 7 31 4. which shows certain special trajectories (equilibria. denoted by PSP (post-synaptic potential).5) the most useful state variables are derived from electrical potentials generated by a neuron. Their recordings allow the definition of one state variable for axons and another one for dendrites. such as A. as those shown at the bottom of the figure 7. or no communication at all (horizontal cells of the retina). up to 4. 2. are amplified by the intrinsic dynamics of neuron and result in the response of the action potential. They build prolonged action potentials and can affect the excitability over time. When the neuron is at rest (phase portrait = stable equilibrium).2. such as the activation of low voltage or current dependent on Ca2+ . All incoming currents to depolarize the neuron are balanced or equilibrated by hyper-polarization output currents: stable equilibrium (figure 7(a) . The excitability is illustrated in figure 7(b).000 (purkinje cells).doi. The description of the dynamics can be obtained from a study of system phase portraits. Regarding Freeman’s neurodynamics (see section 2. limit cycles) that determine the behavior of all other topological trajectory through the phase space.1. The input connections ranges from 500 (retinal ganglion cells) to 200. 2. Each neuron makes on average 1. Adaptation variables. The details of the electrophysiological neuron only determine the position. it will be brought to a pacemaker mode.org/10. Major disturbances. It is possible to predict the itinerant behavior of neurons through observation [10].Biologically Plausible Artificial Neural Networks Biologically Plausible Artificial Neural Networks http://dx. it is an equilibrium point. there are hundreds of thousands of different types of neurons and at least one hundred neurotransmitters. The speed of the action potential (spike) ranges from 2 to 400 km/h. the system may have many trajectories. which are very different. And communication via spikes may be stereotypical (common pyramidal cells). and dendrite expresses in intensity of its synaptic current (wave amplitude) [10]. The axon expresses its state in frequency of action potentials (pulse rate).top). In about 100 billion neurons in the human brain.000 synapses on other neurons [8]. There are millions of different electrophysiological spike generation mechanisms. Axons are filaments (there are 72 km of fiber in the brain) that can reach from 100 microns (typical granule cell). called attractor [25]. If a current strong enough is injected into the neuron. which displays periodic spiking activity (figure 7(c)): this state is called the cycle stable limit. or stable periodic orbit.5 meters (giraffe primary afferent). One can imagine the time along each trajectory. result in small excursions from equilibrium. Depending on the starting point. separatrices.

when the magnitude of the injected current or other parameter of the bifurcation changes. although there are millions of different electrophysiological mechanisms of excitability and spiking.32 8 Artificial Neural Networks Artificial Neural Networks – Architectures and Applications Figure 7. In sub-critical Andronov-Hopf bifurcation. The type of bifurcation depends on the electrophysiology of the neuron and determines its excitable properties. Bifurcations Apparently. there is an invariant circle at the time of bifurcation. The four bifurcations are shown in figure 8: saddle-node bifurcation. and activity of periodic spiking (c).izhikevich. there is an injected current that corresponds to the transition from rest to continuous spiking.e.3. sub-critical Andronov-Hopf and supercritical Andronov-Hopf. which becomes a limit cycle attractor. Thus the trajectory deviates from equilibrium and approaches a limit cycle of high amplitude spiking or some other attractor. i. there are only four different types of bifurcation of equilibrium that a system can provide. whose currents were not measured and whose models are not known. a stable equilibrium correspondent to the rest state (black circle) is approximated by an unstable equilibrium (white circle). we see the trajectories of the system. In the supercritical Andronov-Hopf bifurcation. 2. available at http://www. Interestingly. excitable (b). saddle-node on invariant circle. the equilibrium state loses stability and gives rise to a small amplitude limit cycle attractor. from the portrait phase of figure 7(b) to 7(c). In saddle-node bifurcation on invariant circle. the transition corresponds to a bifurcation of the dynamical neuron. pdf. In saddle-node bifurcation. In general. or a qualitative representation of the phase of the system. The neuron states: rest (a). One can understand the properties of excitable neurons.org/publications/dsn. . From the point of view of dynamical systems. a small unstable limit cycle shrinks to a equilibrium state and loses stability. neurons are excitable because they are close to bifurcations from rest to spiking activity. depending on the starting point. Figure taken from [25]. since one can identify experimentally in which of the four bifurcations undergoes the rest state of the neuron [25]. At the bottom.

but promotes it in resonators. Figure taken from [25]. 2. both without and with invariant circle. Neurocomputational properties Inhibition prevents spiking in integrators.4. while systems with saddle bifurcations. do not.pdf. The existence of small amplitude oscillations creates the possibility of resonance to the frequency of the incoming pulses [25]. available at http://www. either sub-critical or supercritical.org/publications/dsn. In resonators. Figure 8. Systems with Andronov-Hopf bifurcations.5772/54177 9 33 When the magnitude of the injected current increases.izhikevich.1.4. See table 2.doi. both excitation and inhibition push the state toward the shaded region [25]. exhibit low amplitude membrane potential oscillations. 2. The excitatory inputs push the state of the system towards the shaded region of figure 8. and those which do not have this property are integrators. are called bistable and those which do not exhibit this feature are monostable. . Integrators and resonators Resonators are neurons with reduced amplitude sub-threshold oscillations. Neurons that exhibit co-existence of rest and spiking states. the limit cycle amplitude increases and becomes a complete spiking limit cycle [25].Biologically Plausible Artificial Neural Networks Biologically Plausible Artificial Neural Networks http://dx. while the inhibitory inputs push it out. Geometry of phase portraits of excitable systems near the four bifurcations can exemplify many neurocomputational properties.org/10.

and they are different from representations. actions and choices made are responsible for creation of meanings in brains. Neurobiologists usually claim that brains process information in a cause-and-effect manner: stimuli carry information that is conveyed in transformed information. resulted from actions into the world. Adapted from [25]. Neuron populations are the key to understand the biology of intentionality. Brain imaging is performed during normal behavior activity. Representations exist only in the world and have no meanings. which looks like noise but presents a kind of hidden order [10]. but they do (different) macroscopic things. This traditional view allowed the development of information processing machines. a foundation must be laid including brain imaging and non-linear brain dynamics. Freeman neurodynamics Nowadays. How are these actions generated? According to a cognitivist view. view of neuronal workings. (2) neurodynamical model. Between the microscopic neuron and these macroscopic things. or even mistaken. In Freeman’s opinion. fields that digital computers make possible. regarding the way how the brain operates as a whole [55]: (1) classical model. According to Freeman [10].34 10 Artificial Neural Networks Artificial Neural Networks – Architectures and Applications sub-threshold oscillations no yes (integrator) (resonator) co-existence of rest and spiking states yes no (bistable) (monostable) saddle-node saddle-node on invariant circle sub-critical supercritical Andronov-Hopf Andronov-Hopf Table 2. in order to understand brain functioning. two very different concepts co-exist in neuroscience. neuron populations also have states and activity patterns. Like neurons. an action is determined by the form of a stimulus. according to the rest state bifurcation. called short-term . This simplified. Freeman argues that there are two basic units in brain organization: the neuron and the neuron population. The relation of neurons to meaning is not still well understood. Neuron classification in integrators-resonators/monostable-bistable. What if stimuli are selected before appearance? This view fails in this case. masses of interacting neurons forming neuron populations are considered for a macroscopic view of the brain.5. where the brain operates by non-linear dynamical chaos. where the brain is described as consisting of a series of causal chains composed of nerve nets that operate in parallel (the conventional artificial neural networks [20]). Brain activity is directed toward external objects. there are mesoscopic populations [10]. although representations can be transferred between machines. and non-linear dynamics models these data. 2. leading to creation of meaning through learning. Intentional action is composed by space-time processes. Artificial Intelligence artifacts pose a challenge: how to attach meaning to the symbolic representations in machines? Pragmatists conceive minds as dynamical systems. Although neuron has been the base for neurobiology. In a dynamicist view. led to the development of digital computers. meaning cannot be transferred between brains [10].

For materialists and cognitivists, there is a temporary storage of images, called short-term memory or cognitive maps. In the pragmatism view there is no temporary storage of images and no representational map. In the neurodynamical model every neuron participates, to some extent, in every experience and every behavior, via non-linear chaotic mechanisms [10]. The concepts of non-linear chaotic neurodynamics are of fundamental importance to nervous system research; they are relevant to our understanding of the workings of the normal brain [55]. Researchers who base their studies on single neurons think that population events such as EEG are irrelevant noise, because they do not have an understanding of a mesoscopic state [10].

2.5.1. Neuron populations

Typical neurons have many dendrites (input) and one axon (output); see figure 9. The neurons in the brain form dense networks, and neurons are connected by synapses. Dendrites integrate information using continuous waves of ionic current, while the axon transmits information using microscopic pulse trains. Dendrites make waves and axons make pulses: synapses convert pulses to waves, and trigger zones convert waves to pulses. Each synapse drives electric current. The sum of currents that a neuron generates in response to an electrical stimulus produces the post-synaptic potential, measured with an electrode inside the cell body, evaluating the dendritic wave state variable of the single neuron. The strength of the post-synaptic potential decreases with the distance between the synapse and the cell body; the attenuation is compensated by greater surface area and more synapses on the distal dendrites. The flow of the current inside the neuron is revealed by a change in the membrane potential. The microscopic current from each neuron sums with currents from other neurons, which causes a macroscopic potential difference, measured with a pair of extracellular electrodes (E) as the electroencephalogram (EEG) [10, 55]. EEG records the activity patterns of mesoscopic neuron populations. The neuron is microscopic and the ensemble is mesoscopic: in single neurons, microscopic pulse frequencies and wave amplitudes are measured, while in populations, macroscopic pulse and wave densities are measured. The balance of excitation and inhibition allows neuron populations to have intrinsic oscillatory activity and overall amplitude modulation (AM) [10, 18]. These AM patterns are expressions of non-linear chaos, not merely a summation of linear dendritic and action potentials; AM patterns create attractor basins and landscapes.

Figure 9. Typical neuron showing the dendrites (input), the soma (cell body), the trigger zone, the axon (output), and the direction of the action potential. Notice that the letters "E" represent the pair of extracellular electrodes. Adapted from [45] and [10].

Recall that extracellular electrodes are placed outside the neuron (see the Es in figure 9), which does not allow to distinguish individual contributions, so the cortical potential provided by the sum of dendritic currents in the neighborhood is measured. The same currents produce the membrane (intracellular) and cortical (extracellular) potentials, and the same current that controls the firings of neurons is measured by EEG.

A population is a collection of neurons in a neighborhood, corresponding to a cortical column. Cortical neurons, because of their synaptic interactions, form neuron populations. Neuron populations are similar to mesoscopic ensembles in many complex systems [10]. The collective action of neurons forms activity patterns that go beyond the cellular level and approach the organism level; the formation of mesoscopic states is the first step for that. Microscopic pulse and wave state variables are used to describe the activity of the single neurons that contribute to the population, and mesoscopic state variables (also pulse and wave) are used to describe the collective activities neurons give rise to. Mass activity in the brain is described by a pulse density, instead of a pulse frequency. This is done by recording from outside the cell the firing of pulses of many neurons simultaneously; fortunately, distinguishing individual contributions is not necessary. The average pulse density in a population can never approach the peak pulse frequencies of single neurons. The population is semi-autonomous: the behavior of the microscopic elements is constrained by the embedding ensemble, the activity level is decided by the population, not by individuals [10], and it cannot be understood outside a mesoscopic and macroscopic view. The state space of the neuron population is defined by the range of amplitudes that its pulse and wave densities can take. The activity of neighborhoods in the center of the dendritic sigmoid curve is very near linear, which simplifies the description of populations.

2.5.2. Freeman K-sets

Regarding neuroscience at the mesoscopic level [10, 11], the theoretical connection between the neuron activity at the microscopic level in small neural networks and the activity of cell assemblies at the mesoscopic scale is not well understood [16]. Katzir-Katchalsky suggests treating cell assemblies using thermodynamics, forming a hierarchy of models of the dynamics of neuron populations [29] (Freeman K-sets): KO, KI, KII, KIII, KIV and KV. Katzir-Katchalsky is the reason for the K in Freeman K-sets. The K-sets mediate between the microscopic activity of small neural networks and the macroscopic activity of the brain, given two views of neural activity: the former microscopic and the latter mesoscopic [10]. The topology includes excitatory and inhibitory populations of neurons and the dynamics is represented by ordinary differential equations (ODE) [16].

The KO set represents a noninteracting collection of neurons. It has a point attractor, returning to the same level after its release. KI sets represent a collection of KO sets, which can be excitatory (KIe) or inhibitory (KIi). A KII set represents a collection of KIe and KIi. See the representation of KI and KII sets by networks of KO sets in figure 10 [9]. The KIII model consists of many interconnected KII sets, describing a given sensory system in brains, which represents dynamical patterns of activity. A KIV set is formed by the interaction of three KIII sets [30]. KV sets are proposed to model the scale-free dynamics of the neocortex operating on KIV sets [16].
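As a rough illustration of how such population models translate into code, a KO-like unit is often written as a damped second-order ODE whose output passes through a static sigmoid. The sketch below, with illustrative rate constants and gains, couples one excitatory and one inhibitory unit into a minimal KII-like negative feedback loop; it is only a reading of the general form, not the implementation of [16].

# Hedged sketch: KO-like population unit and a minimal excitatory-inhibitory pair.
import numpy as np

A, B = 0.22, 0.72                     # rate constants (1/ms), illustrative values
def sigmoid(x, qm=5.0):
    """Static nonlinearity converting wave amplitude into pulse density (assumed form)."""
    return qm * (1.0 - np.exp(-(np.exp(x) - 1.0) / qm))

def ko_step(p, v, drive, dt):
    """One Euler step of (1/(a*b)) p'' + (1/a + 1/b) p' + p = drive."""
    acc = A * B * (drive - p) - (A + B) * v
    return p + dt * v, v + dt * acc

kee, kie = 1.5, 1.5                   # coupling gains, illustrative
pe = ve = pi = vi = 0.0
dt, trace = 0.01, []
for k in range(5000):
    stim = 1.0 if k < 100 else 0.0    # brief input pulse to the excitatory unit
    pe, ve = ko_step(pe, ve, stim - kie * sigmoid(pi), dt)   # inhibited by the I unit
    pi, vi = ko_step(pi, vi, kee * sigmoid(pe), dt)          # excited by the E unit
    trace.append(pe)
print("peak-to-peak excursion of the excitatory wave:", max(trace) - min(trace))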

Figure 10. Representation of (b) KI and (c) KII sets by networks of (a) KO sets. Available at [9].

The Katchalsky K-models use a set of ordinary differential equations with distributed parameters to describe the hierarchy of neuron populations, beginning from micro-columns up to hemispheres [31]. K-sets provide a platform for conducting analyses of unified actions of the neocortex in the creation and control of intentional and cognitive behaviors [13]. In relation to the standard KV, the advantages of KIII pattern classifiers over artificial neural networks are the small number of training examples needed, convergence to an attractor in a single step, and a geometric (rather than linear) increase in the number of classes with the number of nodes; the disadvantage is the increase of the computational time needed to solve the ordinary differential equations numerically.

2.5.3. Freeman's mass action

Freeman's mass action (FMA) [9] refers to the collective synaptic actions that neurons in the cortex exert on other neurons, synchronizing their firing of action potentials [17]. FMA expresses and conveys the meaning of sensory information in spatial patterns of cortical activity that resemble the frames in a movie [12, 13]. The neurodynamics of the FMA includes microscopic neural operations that bring sensory information to the sensory cortices and load the first percepts of the sensory cortex to other parts of the brain. The prevailing concepts in neurodynamics are based on neural networks, which are Newtonian models, since they treated neural microscopic pulses as point processes in trigger zones and synapses. The Newtonian dynamics can model cortical input and output functions but not the formation of percepts. The FMA theory is Maxwellian because it treats the mesoscopic neural activity as a continuous distribution. The FMA needs a paradigm shift, because the theory is based on new experiments and techniques and new rules of evidence [17].

2.6. Neuropercolation

Neuropercolation is a family of stochastic models based on the mathematical theory of probabilistic cellular automata on lattices and random graphs, motivated by the structural and dynamical properties of neuron populations.

Neuropercolation models develop neurobiologically motivated generalizations of bootstrap percolations [31]. Basic bootstrap percolation [50] has the following properties: (1) it is a deterministic process, based on random initialization; (2) the model always progresses in one direction, from inactive to active states and never otherwise. Neuropercolation extends the concept of phase transitions to large interactive populations of nerve cells [31]. The existence of phase transitions has been demonstrated both in discrete and continuous state space models, i.e., in specific probabilistic cellular automata and percolation models: these mathematical models exhibit phase transitions with respect to the initialization probability p. From the theoretical point of view, the proposed model helps to understand the role of phase transitions in biological and artificial systems.

2.6.1. Neuropercolation and neurodynamics

Dynamical memory neural networks are an alternative approach to pattern-based computing [18]. A family of random cellular automata exhibiting the dynamical behavior necessary to simulate feeling, perception and intention is introduced [18]. Information is stored in the form of spatial patterns of modified connections in very large scale networks; that is, information is encoded by spatial patterns of synaptic weights of connections that couple non-linear processing elements. Memories are recovered by phase transitions, which enable cerebral cortices to build spatial patterns of amplitude modulation of a narrow band oscillatory wave. Each category of sensory input has a Hebbian nerve cell assembly, one for each category. When accessed by a stimulus, the assembly guides the cortex to the attractors. The oscillating memory devices are biologically motivated because they are based on observations that the processing of sensory information in the central nervous system is accomplished via collective oscillations of populations of globally interacting neurons. This approach provides a new proposal for neural networks.

2.7. Complex networks and neocortical dynamics

Complex networks are at the intersection between graph theory and statistical mechanics [4]. They are usually located in an abstract space where the position of the vertexes has no specific meaning. However, there are several network vertexes where the position is important and influences the evolution of the network. This is the case of road networks or the Internet, where the position of cities and routers can be located on a map and the edges between them represent real physical entities, such as roads and optical fibers. This type of network is called a "geographic network" or spatial network. Neural networks are spatial networks [56]. From a computational perspective, two major problems that the brain has to solve are the extraction of information (statistical regularities) from the inputs and the generation of coherent states that allow coordinated perception and action in real time [56]. Under these conditions, the anatomical connections of the cortex show that the power law distribution of the connection distances between neurons is exactly optimal to support rapid phase transitions of neural populations, regardless of how great they are [31].
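The two properties of basic bootstrap percolation listed above can be made concrete with a few lines of code. The sketch below is a generic bootstrap percolation on a square lattice, not the neuropercolation model of [31]; the lattice size, neighbourhood threshold k and probabilities p are illustrative only.

# Minimal sketch of basic bootstrap percolation on an n x n lattice (illustrative).
# Sites start active with probability p; an inactive site becomes active when at
# least k of its 4 neighbours are active. The process is monotone: no deactivation.
import numpy as np

def bootstrap_percolation(n=100, p=0.06, k=2, rng=np.random.default_rng(0)):
    active = rng.random((n, n)) < p               # deterministic process, random start
    while True:
        padded = np.pad(active, 1)
        neighbours = (padded[:-2, 1:-1].astype(int) + padded[2:, 1:-1] +
                      padded[1:-1, :-2] + padded[1:-1, 2:])
        new = active | (neighbours >= k)          # inactive -> active only
        if np.array_equal(new, active):
            return new
        active = new

# Sweeping p exposes the sharp, phase-transition-like jump in the final density.
for p in (0.03, 0.05, 0.07, 0.09):
    print(f"p = {p:.2f}  final active fraction = {bootstrap_percolation(p=p).mean():.2f}")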

Scale-free dynamics of the neocortex are characterized by self-similarity of the patterns of synaptic connectivity and spatio-temporal neural activity, seen in power law distributions of structural and functional parameters and in rapid state transitions between levels of the hierarchy [15]. It is said that connectivity and dynamics are scale-free [13, 14], which means that the dynamics of the cortex is size independent, such that the brains of mice, men, elephants and whales work the same way [17].

2.8. Brain-Computer Interfaces

A non-intrusive technique to allow a direct brain-computer interface (BCI) can be a scalp EEG: an array of electrodes put on the head like a hat, which allows monitoring the cognitive behavior of animals and humans. It is a kind of keyboard-less computer that eliminates the need for hand or voice interaction, by using brain waves to interact with the computer. The direct brain-computer interface would give those with physical constraints or those operating complex machinery "extra arms" [3]. The research has three prongs of use for BCI: video/computer gaming; support for people with disabilities or physical constraints, such as the elderly; and improved control of complex machinery, such as an aircraft, and other military and civilian uses [24]. Since the brain is usually multitasking, the researchers will have to pick up the signal for the desired task from all the other things going on in the brain. Similar to how they found seizure prediction markers, the plan is to use the data to analyze pre-motor movements, the changes in the brain that occur before there is actually movement, and apply that to someone who has a prosthetic device to allow them to better manipulate it.

The Neurodynamics of Brain & Behavior group in the Computational Neurodynamics (CND) Lab at the University of Memphis's FedEx Institute of Technology is dedicated to research on the cognitive behavior of animals and humans, including the use of molecular genetic or behavioral genetic approaches, neuroethological studies, studies that involve the use of brain imaging techniques, and the application of dynamical mathematical and computational models.

3. A biologically plausible connectionist system

Instead of the computationally successful, but considered to be biologically implausible, supervised Back-propagation [5, 48], the learning algorithm BioRec employed in BioθPred [44, 46] is inspired by the Recirculation [22] and GeneRec [33] (GR) algorithms, and consists of two phases. Note that [33] employs the terms "minus" and "plus" phases to designate the expectation and outcome phases, respectively, in his GeneRec algorithm. In the expectation phase (figure 11), when input x, representing the first word of a sentence through semantic microfeatures, is presented to input layer α, there is propagation of these stimuli to the hidden layer β (bottom-up propagation) (step 1 in figure 11). There is also a propagation of the previous actual output o^p, which is initially empty, from output layer γ back to the hidden layer β (top-down propagation) (steps 2 and 3); the superscript p is used to indicate that this signal refers to the previous cycle. Then, a hidden expectation activation (h^e) is generated (Eq. (2)) for each and every one of the B hidden units, based on the inputs and the previous output stimuli o^p (the sum of the bottom-up and top-down propagations, through the sigmoid logistic activation function σ).

Then, these hidden expectation signals propagate to the output layer γ (step 4), and an actual output o is obtained (step 5) for each and every one of the C output units, through the propagation of the hidden expectation activation to the output layer (Eq. (3)) [37]. Recall that the architecture is bi-directional, so it is possible for the stimuli to propagate either forwardly or backwardly.

Figure 11. The expectation phase.

h^e_j = σ( Σ_{i=0..A} w^h_{ij} x_i + Σ_{k=1..C} w^o_{jk} o^p_k ),   1 ≤ j ≤ B        (2)

o^e_k = σ( Σ_{j=0..B} w^o_{jk} h^e_j ),   1 ≤ k ≤ C        (3)

Here i, j, and k are the indexes for the input (A), hidden (B), and output (C) units respectively; w^h_{ij} are the connection (synaptic) weights between input (i) and hidden (j) units, and w^o_{jk} are the connection (synaptic) weights between hidden (j) and output (k) units. Input (α) and hidden (β) layers have an extra unit (index 0) used for simulating the presence of a bias [20]: x_0 and h_0 are set to +1, so that w^h_{0j} is the bias of the hidden neuron j and w^o_{0k} is the bias of the output neuron k. That is the reason i and j range from 0, while k ranges from 1. This extra unit is absent from the output (γ) layer.
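A small numpy sketch of Eqs. (2) and (3) may help fix the indexing conventions; the layer sizes and weight initialization are illustrative, and this is a reading of the equations rather than the authors' implementation.

# Hedged numpy sketch of the expectation phase (Eqs. (2)-(3)); sizes are illustrative.
import numpy as np

A, B, C = 4, 3, 2                               # input, hidden, output units
rng = np.random.default_rng(0)
W_h = rng.normal(scale=0.1, size=(A + 1, B))    # w^h_ij, row 0 holds the hidden biases
W_o = rng.normal(scale=0.1, size=(B + 1, C))    # w^o_jk, row 0 holds the output biases
sigma = lambda z: 1.0 / (1.0 + np.exp(-z))      # sigmoid logistic activation

def expectation_phase(x, o_prev):
    """x: the A inputs; o_prev: previous actual output (zeros on the first cycle)."""
    x_ext = np.concatenate(([1.0], x))                      # x_0 = +1 (bias unit)
    # Eq. (2): bottom-up input plus top-down previous output through the same w^o weights
    h_e = sigma(x_ext @ W_h + o_prev @ W_o[1:, :].T)
    h_ext = np.concatenate(([1.0], h_e))                    # h_0 = +1 (bias unit)
    o_e = sigma(h_ext @ W_o)                                # Eq. (3)
    return h_e, o_e

h_e, o_e = expectation_phase(np.array([1.0, 0.0, 1.0, 0.0]), np.zeros(C))
print("hidden expectation:", h_e, " output:", o_e)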

In the outcome phase (figure 12), input x is presented to input layer α again; there is propagation to hidden layer β (bottom-up) (step 1 in figure 12). After this, the expected output y (step 2) is presented to the output layer and propagated back to the hidden layer β (top-down) (step 3), and a hidden outcome activation (h^o) is generated, based on the inputs and on the expected outputs (Eq. (4)). For the other words, presented one at a time, the same procedure (expectation phase first, then outcome phase) is repeated [37].

Figure 12. The outcome phase.

h^o_j = σ( Σ_{i=0..A} w^h_{ij} x_i + Σ_{k=1..C} w^o_{jk} y_k ),   1 ≤ j ≤ B        (4)

In order to make learning possible, the synaptic weights are updated through the delta rule (Eqs. (5) and (6)), which is basically error correction: "The adjustment made to a synaptic weight of a neuron is proportional to the product of the error signal and the input signal of the synapse in question." ([20], p. 53). The learning equations are essentially the delta rule (Widrow-Hoff rule), considering only the local information made available by the synapse.

Δw^o_{jk} = η (y_k − o^e_k) h^e_j,   0 ≤ j ≤ B,  1 ≤ k ≤ C        (5)

Δw^h_{ij} = η (h^o_j − h^e_j) x_i,   0 ≤ i ≤ A,  1 ≤ j ≤ B        (6)

The learning rate η used in the algorithm is considered an important variable during the experiments [20]. Figure 13 displays a simple application to digit learning which compares BP with the GeneRec (GR) algorithm. Other applications were proposed using a similar allegedly biologically inspired architecture and algorithm [34, 37–40, 42–44, 49].

Figure 13. BP-GR comparison for digit learning.
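Putting Eqs. (2) to (6) together, one training step of this GeneRec-style scheme might look like the sketch below. Layer sizes, the learning rate, and the toy data are illustrative assumptions; this is a reading of the equations, not the authors' code.

# Hedged sketch of one BioRec/GeneRec-style training step (Eqs. (2)-(6)).
import numpy as np

A, B, C = 4, 3, 2
eta = 0.2                                         # learning rate (illustrative)
rng = np.random.default_rng(1)
W_h = rng.normal(scale=0.1, size=(A + 1, B))      # input->hidden, row 0 = biases
W_o = rng.normal(scale=0.1, size=(B + 1, C))      # hidden->output, row 0 = biases
sigma = lambda z: 1.0 / (1.0 + np.exp(-z))

def train_step(x, y, o_prev):
    x_ext = np.concatenate(([1.0], x))
    # Expectation phase: Eqs. (2) and (3)
    h_e = sigma(x_ext @ W_h + o_prev @ W_o[1:, :].T)
    o_e = sigma(np.concatenate(([1.0], h_e)) @ W_o)
    # Outcome phase: Eq. (4), with the expected output y clamped on the output layer
    h_o = sigma(x_ext @ W_h + y @ W_o[1:, :].T)
    # Delta-rule updates using only locally available signals: Eqs. (5) and (6)
    W_o += eta * np.outer(np.concatenate(([1.0], h_e)), y - o_e)
    W_h += eta * np.outer(x_ext, h_o - h_e)
    return o_e                                    # becomes o_prev on the next cycle

o_prev = np.zeros(C)
for _ in range(200):                              # tiny single-pattern toy problem
    o_prev = train_step(np.array([1.0, 0.0, 1.0, 0.0]), np.array([1.0, 0.0]), o_prev)
print("output after training:", o_prev)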

3.1. A biologically plausible ANN model proposal

We present here a proposal for a biologically plausible model [36] based on the microscopic level. This model departs from a classical connectionist model and is defined by a restricted data set, which explains the ANN behavior. It is intended to present a mechanism to generate a biologically plausible ANN model and to redesign the classical framework to encompass the traditional features. Also, it introduces T, R, and C variables to account for the binding affinities between neurons (unlike other models).

3.2. Intraneuron signaling

The Spanish Nobel laureate neuroscientist Santiago Ramón y Cajal established, at the end of the nineteenth century, two principles that revolutionized neuroscience: the Principle of connectional specificity, which states that "nerve cells do not communicate indiscriminately with one another or form random networks," and the Principle of dynamic polarization, which says that "electric signals inside a nervous cell flow only in a direction: from neuron reception (often the dendrites and cell body) to the axon trigger zone." Intraneuron signaling is based on the principle of dynamic polarization. The signaling inside the neuron is performed by four basic elements: receptive, trigger, signaling, and secretor. The Receptive element is responsible for input signals, and it is related to the dendritic region. The Trigger element is responsible for the neuron activation threshold, related to the soma. The Signaling element is responsible for conducting and keeping the signal, and it is related to the axon. And the Secretor element is responsible for signal releasing to another neuron, so it is related to the presynaptic terminals of the biological neuron.

3.3. Interneuron signaling

Electrical and chemical synapses have completely different morphologies. At electrical synapses, transmission occurs through gap junction channels (special ion channels), located in the pre and postsynaptic cell membranes. There is a cytoplasmatic connection between the cells: part of the electric current injected in the presynaptic cell escapes through resting channels, and the remaining current is driven to the inside of the postsynaptic cell through the gap junction channels. At chemical synapses, there is a synaptic cleft, a small cellular separation between the cells. There are vesicles containing neurotransmitter molecules in the presynaptic terminal and, when the action potential reaches these synaptic vesicles, neurotransmitters are released to the synaptic cleft.

The following feature set defines the neurons:

N = { {w}, θ, g, T, R, C }        (7)

where:
• w represents the connection weights;
• θ is the neuron activation threshold;
• g stands for the activation function;
• T symbolizes the transmitter;
• R, the receptor; and
• C, the controller.
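Read as a data structure, the feature set of Eq. (7) can be attached to each neuron as a record. The sketch below is only an illustrative rendering (field names and types are assumptions, not the authors' implementation), and it omits the degrees of affinity and genes introduced later in this section.

# Illustrative rendering of the feature set of Eq. (7); names and types are assumptions.
from dataclasses import dataclass, field
from typing import Callable, Dict
import math

@dataclass
class Neuron:
    w: Dict[int, float] = field(default_factory=dict)  # {w}: connection weights by target id
    theta: float = 0.0                                  # θ: activation threshold
    g: Callable[[float], float] = math.tanh             # g: activation function
    T: str = "abc"                                      # transmitter type (label)
    R: str = "abc"                                      # receptor type (label)
    C: str = ""                                         # controller label acting on other connections

    def fire(self, net_input: float) -> float:
        """Classical part of the model: activation only above threshold θ."""
        return self.g(net_input) if net_input >= self.theta else 0.0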

T, R, and C are the labels, absent in other models, and T, R, and C encode the genetic information. The combination of T and R results in C, and C can act over other neuron connections. This proposal follows Ramón y Cajal's principle of connectional specificity, and states that each neuron is connected to another neuron not only in relation to {w}, θ, and g, but also in relation to T, R, and C: neuron i is only connected to neuron j if there is binding affinity between the T of i and the R of j. Binding affinity means compatible types, enough amount of substrate, and compatible genes.

The ordinary biological neuron presents many dendrites, usually branched, which receive information from other neurons, and an axon, which transmits the processed information, usually by propagation of an action potential. The axon is divided into several branches and makes synapses onto the dendrites and cell bodies of other neurons (see figure 14). The chemical synapse is predominant in the cerebral cortex, and the release of transmitter substance occurs in active zones, inside the presynaptic terminals. Neurotransmitters are released by the presynaptic neuron and combine with specific receptors in the membrane of the postsynaptic neuron. The combination of a neurotransmitter and a receptor makes the postsynaptic cell release a protein, and an ion channel will be activated directly by the transmitter or indirectly through a second messenger. In some cases, the action of a transmitter in the postsynaptic cell does not depend on the chemical nature of the neurotransmitter; instead, it depends on the properties of the receptors with which the transmitter binds: it is the receptor that determines whether a synapse is excitatory or inhibitory. Although type I synapses seem to be excitatory and type II synapses inhibitory (see figure 15), certain chemical synapses lack active zones, resulting in slower and more diffuse synaptic actions between cells.

Figure 14. The chemical synapse. Figure taken from [45].

The combination of neurotransmitter with receptor leads to intracellular release or production of a second messenger, which interacts (directly or indirectly) with an ion channel, causing it to open or close. Through ion channels, transmitter material enters the postsynaptic cell: (1) in direct gating, receptors produce relatively fast synaptic actions; and (2) in indirect gating, receptors produce slow synaptic actions, and these slower actions often serve to modulate behavior because they modify the degrees of affinity of receptors.

In an excitatory synapse (type A), neurons contribute to produce impulses on other cells: asymmetrical membrane specializations and very large synaptic vesicles (50 nm) with packets of neurotransmitters. In an inhibitory synapse (type B), neurons prevent the releasing of impulses on other cells: symmetrical membrane specializations, a contact zone usually smaller, and synaptic vesicles that are smaller and often ellipsoidal or flattened.

Figure 15. Morphological synapses type A and type B. Figure taken from [45].

There are two types of resulting signaling: (1) propagation of an action potential, and (2) production of a graded potential by the axon, which is then restricted to that axon terminal (figure 17). Graded potential signaling does not occur over long distances because of attenuation; otherwise, the presynaptic synapse can produce only a local potential change. See, for instance, figure 16. Graded potentials can occur in another level: axon 1 making a synapse in a given cell can receive a synapse from axon 2.

Figure 16. A local potential change [6].

Figure 17. An axon-axon synapse [6].

In view of these biological facts, it was decided to model through labels T and R the binding affinities between Ts and Rs, the amount of substrate (amount of transmitters and receptors), the effects of graded potential, and the protein released by the coupling of T and R. The degrees of affinity are related to the way receptors gate ion channels at chemical synapses. And label C represents the role of the "second messenger": controller C can modify the binding affinities between neurons by modifying the degrees of affinity of receptors, and gene expression, in case of mutation.

In addition, this second messenger, besides altering the affinities between transmitters and receptors, can regulate gene expression, achieving synaptic transmission with long-lasting consequences. It spreads over a short distance toward the postsynaptic membrane, acting at receptor molecules in that membrane. In biological systems, modulation can be related to the action of peptides (compounds consisting of two or more amino acids, the building blocks of proteins), which can act as neurotransmitters. There are many distinct peptides, of several types and shapes, that can act as neurotransmitters. Peptides are different from many conventional transmitters, which are enzymatically divided and part of which is taken up again for synthesis of a new transmitter. Peptides are a second, slower, means of communication between neurons, more economical than using extra neurons. As transmitters, peptides act at very restricted places; they spread slowly and persist for some time, much more than conventional transmitters. As neuromodulators, they display a slow rate of conduction and do not sustain high frequencies of impulses, so they cannot cause enough depolarization to excite the cells; they do not act where released, but at some distant site (in some cases), because they "modulate" synaptic function instead of activating it: the so-called "neuromodulation." For instance, the excitatory effects of substance P (a peptide) are very slow in the beginning and longer in duration (more than one minute). The amount of substrate modification is regulated by acetylcholine (a neurotransmitter).

In the proposed model, C represents substrate increase by a variable acting over the initial substrate amount; this will produce an increase in the amount of substrate. In this model, mutation can also be accounted for; this is achieved by modification of a variable for gene expression.

3.3.1. The labels and their dynamic behaviors

In order to build the model, it is necessary to set the parameters for the connectionist architecture. For the evaluation of controllers and how they act, the parameters are:
• Controllers can modify:
  • the degree of affinity of receptors,
  • the initial substrate storage, and
  • the gene expression value (mutation).

For the network genesis, the parameters are:
• number of layers,
• number of neurons in each layer,
• initial amount of substrate (transmitters and receptors) in each layer, and
• genetics of each layer:
  • type of transmitter and its degree of affinity,
  • type of receptor and its degree of affinity, and
  • genes (name and gene expression).
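Read as a configuration, the genesis parameters above might be rendered as in the sketch below. Field names, the example values, and the simplified controller record are illustrative assumptions, not the authors' file format (their controller syntax is given with table 3).

# Illustrative configuration mirroring the genesis parameters above (assumed names).
from dataclasses import dataclass
from typing import List

@dataclass
class LayerGenetics:
    transmitter: str          # type of transmitter
    t_affinity: float         # its degree of affinity, in [0, 1]
    receptor: str             # type of receptor
    r_affinity: float         # its degree of affinity, in [0, 1]
    genes: List[str]          # gene names
    gene_expression: float    # gene expression value

@dataclass
class LayerSpec:
    n_neurons: int            # number of neurons in the layer
    substrate: int            # initial amount of substrate (transmitters/receptors)
    genetics: LayerGenetics

@dataclass
class Controller:
    origin: int               # origin layer
    target: int               # target layer
    kind: str                 # 'a' affinity, 's' substrate, 'e' gene expression
    result: float             # new affinity, substrate increment, or expression value

# A toy two-layer genesis description (values are illustrative only).
layers = [
    LayerSpec(10, 8, LayerGenetics("1", 1.0, "2", 1.0, ["abc"], 1.0)),
    LayerSpec(10, 10, LayerGenetics("2", 1.0, "1", 1.0, ["abc"], 1.0)),
]
controllers = [Controller(origin=1, target=2, kind="s", result=1.0)]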

The specifications stated above lead to an ANN with some distinctive characteristics: (1) each neuron has a genetic code, which is a set of genes plus a gene expression controller; (2) the controller can cause mutation, because it can regulate gene expression, but it is limited; and (3) the substrate (amount of transmitter and receptor) is defined by layer. A substrate increase is related to the gene specified at the origin, because the synthesis of a new transmitter occurs in the pre-synaptic terminal (origin gene) [36], whereas modifications of gene expression and of the degree of affinity of receptors act at the target gene.

3.3.2. A network simulation

In table 3, a data set for a five-layer network simulation is presented [36]. For the sake of simplicity, all degrees of affinity are set at 1 (the degree of affinity is represented by a real number in the range [0, 1], so that the greater the degree of affinity is, the stronger the synaptic connection will be).

Table 3. The data set for a five-layer network. Adapted from [36].

  layer   neurons   substrate   transmitter (affinity)   receptor (affinity)   genes (name/expression)
  1       10        8           1 (1)                    2 (1)                 abc/1
  2       10        10          2 (1)                    1 (1)                 abc/1
  3       5         4           1 (1)                    2 (1)                 abc/1, def/2
  4       5         5           2 (1)                    1 (1)                 abc/1, def/2
  5       1         2           1 (1)                    2 (1)                 def/2

  Controllers: 1/1-2: abc/s/abc/1;  1/1-4: abc/e/abc/2;  2/2-3: abc/a/def/0.5
  (Controller syntax: number/origin layer-target layer: og/t/tg/res, where og = origin gene (name); t = type of synaptic function modulation: a = degree of affinity, s = substrate, e = gene expression; tg = target gene (name); res = control result: for t = a, res = new degree of affinity of receptor (target); for t = s, res = substrate increase (origin); for t = e, res = new gene expression controller (target).)

For the specifications displayed in table 3, the network architecture and its activated connections are shown in figure 18. In figure 18, one can notice that every unit in layer 1 (the input layer) is linked to the first nine units in layer 2 (the first hidden layer). The reason why not every unit in layer 2 is connected to layer 1, although the receptor of layer 2 has the same type as the transmitter of layer 1, is that the amount of substrate in layer 1 is eight units. This means that, in principle, each layer-1 unit is able to connect to at most eight units. But controller 1, from layer 1 to 2, incremented by 1 the amount of substrate of the origin layer (layer 1); the result is that each layer-1 unit can link to nine units in layer 2. Observe that from layer 2 to layer 3 (the second hidden layer) only four layer-2 units are connected to layer 3, because also of the amount of substrate of layer 3, which is 4, so some postsynaptic neurons are not activated: this way, the network favors clustering. The controllers from layer 2 to 5, from layer 3 to 4, and from layer 4 to 5 are absent in this simulation; the reason is that the modulation function of the controller is better explained at some distance from the emission of the neurotransmitter. As a result of the compatibility of the layer-2 transmitter and the layer-5 receptor, one could expect that the first two units in layer 2 should connect to the only unit in layer 5 (the output unit); however, this does not occur, because their genes are not compatible.
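The connection rule just described (type compatibility, available substrate, gene compatibility, and controller effects) can be sketched as below. This is an interpretation of the text with simplifying assumptions (e.g., substrate counted as a per-presynaptic-unit budget), not the simulator of [36].

# Hedged sketch of the connection rule: a presynaptic unit connects to a postsynaptic
# unit only if (1) transmitter and receptor types match, (2) the origin layer still has
# substrate, and (3) the genes are compatible. Accounting details are assumptions.
def can_connect(pre_layer, post_layer):
    return (pre_layer["transmitter"] == post_layer["receptor"] and
            any(g in post_layer["genes"] for g in pre_layer["genes"]))

def build_connections(pre_layer, post_layer):
    links = []
    if not can_connect(pre_layer, post_layer):
        return links
    budget = pre_layer["substrate"]               # max targets per presynaptic unit
    for i in range(pre_layer["n"]):
        for j in range(min(budget, post_layer["n"])):
            links.append((i, j))
    return links

layer1 = {"n": 10, "substrate": 8, "transmitter": "1", "receptor": "2", "genes": {"abc"}}
layer2 = {"n": 10, "substrate": 10, "transmitter": "2", "receptor": "1", "genes": {"abc"}}

# Controller 1/1-2 (type 's'): increase the origin layer's substrate by 1, so that each
# layer-1 unit can reach nine units in layer 2, as in the figure 18 discussion.
layer1["substrate"] += 1
print(len(build_connections(layer1, layer2)), "connections")   # 10 * 9 = 90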

Although gene compatibility exists between layers 1 and 4, their units do not connect to each other, because there is no remaining substrate in layer 1 and because controller 1 between layers 1 and 4 modified the gene expression of layer 4, making them incompatible. The remaining controller has the effect of modifying the degrees of affinity of receptors in layer 3 (the target); consequently, the connections between layers 2 and 3 became weakened (represented by dotted lines). Always, the genes and the types of transmitters and receptors of each layer must be compatible, in order to allow connections, in addition to the existence of a sufficient amount of substrate. Also, notice that, although the architecture shown in figure 18 is feed-forward, recurrence, or re-entrance, is permitted in this model. This kind of feedback goes along with Edelman and Tononi's "dynamic core" notion [7]: this up-to-date hypothesis suggests that there are neuronal groups underlying conscious experience, the dynamic core, which is highly distributed and integrated through a network of reentrant connections of the cerebral cortex.

Figure 18. A five-layer neural network for the data set in table 3. In the bottom of the figure is layer 1 (the input layer) and in the top is layer 5 (the output layer); between them, there are three hidden layers (layers 2 to 4). Figure taken from [36].

3.4. Other models

Other biologically plausible ANN models are concerned with the connectionist architecture, or focused on the neural features and the signaling between neurons. The choice of the models upon which the proposed description is based takes into account two main criteria: the fact that they are considered biologically more realistic and the fact that they deal with intra and inter-neuron signaling in electrical and chemical synapses. The main purpose is to create a more faithful model concerning the biological structure, properties, and functionalities, related directly to the cerebral cortex biological structure, not disregarding its computational efficiency. In addition to the characteristics for encoding information regarding biological plausibility present in current spiking neuron models, including learning processes, a distinguishable feature is emphasized here: a combination of Hebbian learning and error-driven learning [52–54]. Also, the duration of action potentials is taken into account.

4. Conclusions

Current models of ANN are in debt with human brain physiology: because of their mathematical simplicity, they lack several biological features of the cerebral cortex. The objective here is to present biologically plausible ANN models. In the model proposed, the possibility of connections between neurons is related not only to synaptic weights, activation threshold, and activation function, but also to labels that embody the binding affinities between transmitters and receptors. Also, it would represent a genetically well-suited model of the brain; this type of ANN would be closer to human evolutionary capacity, that is, closer to human brain capacity. The mesoscopic level of the brain could be described adequately by dynamical system theory (attractor states and cycles), and the EEG waves reflect the existence of cycles in the brain electric field. In the model proposed, the mesoscopic information is privileged, instead of the individual behavior of the neurons. Also, still at the microscopic level of analysis, the model allows reentrancy in its architecture connections; the hypothesis of the "dynamic core" [7] is also contemplated.

Acknowledgements

I am grateful to my students, who have collaborated with me in this subject for the last ten years.

Author details

João Luís Garcia Rosa
Bioinspired Computing Laboratory (BioCom), Department of Computer Science, University of São Paulo at São Carlos, Brazil

References

[1] B. Aguera y Arcas, A. Fairhall, and W. Bialek, "Computation in a Single Neuron: Hodgkin and Huxley Revisited," Neural Computation 15, pp. 1715–1749, 2003.

[2] A. Clark, Mindware: An Introduction to the Philosophy of Cognitive Science. Oxford University Press, Oxford, 2001.

[3] CLION - Center for Large-Scale Integration and Optimization Networks, Neurodynamics of Brain & Behavior, FedEx Institute of Technology, University of Memphis, Memphis, TN, USA. http://clion.memphis.edu/laboratories/cnd/nbb/.

[4] L. da F. Costa, F. A. Rodrigues, G. Travieso, and P. R. Villas Boas, "Characterization of complex networks: A survey of measurements," Advances in Physics, vol. 56, no. 1, pp. 167–242, February 2007.

[5] F. Crick, "The Recent Excitement about Neural Networks," Nature 337, pp. 129–132, 1989.

Representation.” Scholarpedia 2(2):1357. J. Breakspear. J. A Universe of Consciousness . “How and Why Brains Create Meaning from Sensory Information. Massachusetts . Freeman and R.How Matter Becomes Imagination. “Deep analysis of perception through dynamic structures that emerge in cortical activity from self-regulated noise. How Brains Make Up Their Minds. Freeman. Freeman. H.” International Journal of Bifurcation & Chaos 14: 513–530. [17] W. E. Prentice-Hall.org/article/Scale-free_neocortical_dynamics.Examination of the Neurophysiological Basis of Adaptive Behavior through the EEG.doi. The MIT Press. O. 2008. 11–38. J. J.Biologically Plausible Artificial Neural Networks Biologically Plausible Artificial Neural Networks http://dx. Haykin.org/ article/Bifurcation. Asanuma. J. Anderson.London. L.scholarpedia.A Comprehensive Foundation. [15] W.org/article/Freeman’s_mass_action. 1999. “Bifurcation.). “Proposed cortical ’shutter’ in cinematographic perception. San Jose. Freeman and M. Weidenfeld & Nicolson. Neural Networks . Erwin.berkeley. 1999. Tononi.” IJCNN2011 Tutorial 6.scholarpedia. Freeman and H. 2007. [16] W.” in L. Edelman and G. Mesoscopic Brain Dynamics. J. 1949. Neurodynamics of Cognition and Consciousness. [20] S. IJCNN 2011 . [7] G. 2000.Computation. Kozma (Eds.5772/54177 25 49 [6] F. 2003. M. J. 1986. Crick and C. “Certain Aspects of the Anatomy and Physiology of the Cerebral Cortex. Freeman.” Scholarpedia 5(1):8040. scholarpedia.).org/article/Freeman_K-set. Basic Books. “Freeman’s mass action. 2. Kozma. 2010. Hebb. Guckenheimer. [13] W.” Cogn Neurodyn (2009) 3:105–116. Freeman. http://www. Available at http://sulcus. England. New York San Francisco London 1975. . The Organization of Behavior: A Neuropsychological Theory.edu/. Kozma. Academic Press. Freeman and R. [11] W. J. http: //www. [12] W. Eliasmith and C. Vol. July 31. “Neuropercolation + Neurodynamics: Dynamical Memory Neural Networks in Biological Systems and Computer Embodiments. Second Edition. Cambridge. 2011. [18] W. 2004.scholarpedia. McClelland and D. California. Rumelhart (eds. Neural Engineering .org/10. New York: Springer.International Joint Conference on Neural Networks. Freeman. [21] D. 2007. [9] W. “Scale-free neocortical dynamics. [14] W. pp. http://www. London.” in J. The MIT Press. Freeman. [8] C. [19] J.” Scholarpedia 2(6):1517. A Bradford Book. Parallel Distributed Processing. http://www. Wiley. Perlovsky and R. J.” Scholarpedia 3(2):3238. 2007. and Dynamics in Neurobiological Systems. Springer-Verlag London Limited 2000. J. Mass action in the nervous system . “Freeman K-set. [10] W.

N. “Artificial Neural Networks: A Tutorial.” in D. 2007. 895–938. Izhikevich. Tunstel. “Learning Representations by Recirculation. Kandel. 2009.” in V.” IEEE Computer. [29] A. R. “Neuropercolation. T. and W.edu/update/sep09/kozma. [34] T. Rosa. [24] S. New York 1988.php.). Schwartz. Jain. March 1996. Freeman. [32] W. Hoover. pp. Connecticut. and M.Proceedings of the International Conference . J. McGraw-Hill.” Bulletin of Mathematical Biophysics. Dynamic patterns of brain cell assemblies. and K. 1995. S. Mohiuddin. Schwartz. [33] R. 500–544. Rinzel and G. Z. pp.” J.” in C. http://www. H. B. “Computational aspects of cognition and consciousness in intelligent devices. Aghazarian. Hinton and J. Rosa. 896–920. Andrade Netto. pp.scholarpedia. M. Appleton & Lange. H. Mao.memphis. H. Physiol. G. [31] R.” Applied Artificial Intelligence. C. Kárný (Eds. Kozma. 5. 1989. The MIT Press.” Neural Computation 8:5 (1996) pp. Essentials of Neural Science and Behavior. Kurková. J.). Rowland and R. [27] E. K. K. 1974. M. Anderson (ed. Ermentrout. “Analysis of neuronal excitability and oscillations. 358–66. Pitts. Huntsberger. 2000. “Kozma’s research is brain wave of the future. Jessell. T. “A quantitative description of membrane current and its application to conduction and excitation in nerve. “An Artificial Neural Network Model Based on Neuroscience: Looking Closely at the Brain. 2007. Katchalsky. Huxley. V. M. 53–64.50 26 Artificial Neural Networks Artificial Neural Networks – Architectures and Applications [22] G. E. J. [23] A. Fourth Edition. C. [36] J.). E. Segev (Eds. Principles of Neural Science. G. L. Kandel. L. Methods In Neuronal Modeling: From Synapses To Networks. L. McCulloch and W. “SABio: A Biologically Plausible Connectionist Approach to Automatic Text Summarization. L. J. pp. [26] A. Blumenthal. R. 1943. [30] R. F. 22(8). L. (1952) 117. O’Reilly. R. Taylor & Francis. “Biologically Plausible Error-driven Learning Using Local Activation Differences: the Generalized Recirculation Algorithm. Neural Information Processing Systems.The newsletter for the University of Memphis. and M. 115-133. Artificial Neural Nets and Genetic Algorithms . http://www. T. J. Kozma. [35] J.” IEEE Computational Intelligence Magazine. Dynamical Systems in Neuroscience: The Geometry of Excitability and Bursting. Orrú. [28] E.org/ article/Neuropercolation. MIT Press.” Update . MIT Press. Koch and I. August 2007. M. American Institute of Physics.” Scholarpedia 2(8):1360. [25] E. Neruda. Jessell. McClelland. 31–44. “A logical calculus of the ideas immanent in nervous activity. 2008. Stamford. Steele. Hodgkin and A.

Sossa (Eds. Advances in Neural Networks . and H.” in D. 2001 . L. [41] J. [44] J. Rio de Janeiro. H.International Joint Conference on Neural Networks. IEEE SSCI 2009.ISNN2007 . IEEE Conference Proceedings. 2007. 2972 / 2004. and Cybernetics . and C. 1127-1134. 138-141. Fei. District of Columbia. Spain. 2009. Available at http://ewh. Montréal. Lecture Notes in Computer Science. Rosa. G.” in Proceedings of the 2002 VII Brazilian Symposium on Neural Networks (SBRN 2002). IJCNN 2010 . Springer-Verlag Berlin Heidelberg.htm. .” Proceedings of the 2009 IEEE Workshop on Hybrid Intelligent Models and Applications (HIMA2009). G. pp. Editora LTC. Monroy. USA. Vol. Man. TN.5772/54177 27 51 in Prague. July 18-23. Tunísia. L. Mexico. on Artificial Intelligence. Rosa. July 31.” in Proceedings of the 2002 IEEE International Conference on Systems. Czech Republic.2010 IEEE World Congress on Computational Intelligence. Barcelona. Sucar. [40] J. “Biologically Plausible Connectionist Prediction of Natural Language Thematic Relations. [39] J.org/10. Sun (Eds. Rosa. 2845–2850.ieee. “Biologically Plausible Artificial Neural Networks. Brazil. Pp. Hammamet.International Joint Conference on Neural Networks.” in R.April 2. Rosa.” Two-hour tutorial at IEEE IJCNN 2005 . Hou. [37] J. 06-09 October 2002. G. Zhang. G. MICAI 2004: Advances in Artificial Intelligence: 3rd.IEEE-SMC’03. Mexican Intl. IEEE Computer Society Press. Volume 4492. Book in Portuguese. L. Sheraton Music City Hotel. G. Conf.. 390–399. IEEE Conference Proceedings.Biologically Plausible Artificial Neural Networks Biologically Plausible Artificial Neural Networks http://dx. Rosa. “A Biologically Inspired Connectionist System for Natural Language Processing. L. L. April 26-30. United States of America.Lecture Notes in Computer Science. L. pp. “A Connectionist Thematic Grid Predictor for Pre-parsed Natural Language Sentences. “A Biologically Motivated Connectionist System for Predicting the Next Word in Natural Language Sentences. 2010. Proc. 2011. L.). Rosa. and Cybernetics . Liu.Volume 4. G.” in Proceedings of the 2003 IEEE International Conference on Systems. [42] J. Springer-Verlag. 2005. E. 05-08 October 2003. “A Biologically Motivated and Computationally Efficient Natural Language Processor.” Proceedings of the WCCI 2010 .IEEE-SMC’02 . April 22-25. Rosa. L. March 30 . Rosa.doi.). [38] J. 825–834. Nashville. S. “A Hybrid Symbolic-Connectionist Processor of Natural Language Semantic Relations. ISBN: 3-211-83651-9. [43] J. “A Biologically Plausible and Computationally Efficient Architecture and Algorithm for a Connectionist Natural Language Processor. IEEE Symposium Series on Computational Intelligence. Man. Centre de Convencions Internacional de Barcelona. Fundamentos da Inteligência Artificial (Fundamentals of Artificial Intelligence). Washington. L. pp. L. G. Arroyo-Figueroa. Recife. pp. Rosa. 64-71. G. G.ICANNGA-2001. pp. Canada. 11-14 November 2002. Mexico City. G. 2004. Z. Part II.org/cmte/cis/mtsc/ieeecis/contributors. [45] J. Springer-Verlag Heidelberg.

). pp. Rosa and J. [54] A. The synaptic organization of the brain. Book review on “How Brains Make up Their Minds.” Proceedings of IJCNN 2011 . “A Connectionist Model based on Physiological Properties of the Neuron. G.2008.org/wiki/Neuron . San Diego. DOI 10. “On the Behavior of Some Cellular Automata Related to Bootstrap Percolation. 2001 Cambridge University Press. Rosenblatt. Rosa. Shepherd. G. 2003. Williams. Oxford University Press.” Complexity. ISBN 85-87837-11-7. NY. G. IEEE Computer Society Press. [57] Wikipedia . 56–60. and Brain Function. Willey Periodicals. pp. “Biologically Plausible Connectionist Prediction of Natural Language Thematic Relations. 2011. J. [56] O. J. October 23-28.” Psychological Medicine. 1394-1401. “Biological Plausibility in Artificial Neural Networks: An Improvement on Earlier Models. DOI 10. Hinton. 16. California.” in D. 2001. Brazil. “The perceptron: A perceiving and recognizing automaton. no. Rosa. By W. “Learning Internal Representations by Error Propagation. [50] R.” The Annals of Probability. M. L.wikipedia. Smythies.” in Proceedings of the International Joint Conference IBERAMIA/SBIA/SBRN 2006 .The Free Encyclopedia. Vol. B. 3245-3277. J. G. “Network Analysis.” Proceedings of The Seventh International Conference on Machine Learning and Applications (ICMLA’08). 2003. 2009. Cornell Aeronautical Lab. pp. San Jose. pp. Parallel Distributed Processing. available at http://en. [49] M. L. and R. B. Freeman. “Application and Development of Biologically Plausible Neural Networks in a Multiagent Artificial Life System. L. H. 2006. Ithaca.73. 8. Volume 1 . 18.1109/ICMLA. 1992). 65–75. E. USA.1007/s00521-007-0156-0. L. no. E. R. Silva and J.1st Workshop on Computational Intelligence (WCI’2006). 1957. vol. 20. [55] J. Silva and J. Project PARA. G. A Bradford Book. USA. 174–193. vol. 2008. pp. July 31 .” Journal of Universal Computer Science. G. M. L. 1 (Jan. pp. Rumelhart and J. L. [48] D. [53] A.. E. Rumelhart. McClelland (eds. 11-13 Dec. Complexity.International Joint Conference on Neural Networks.” Neural Computing & Applications. CD-ROM. 1986.Foundations. 31. California. “Advances on Criteria for Biological Plausibility in Artificial Neural Networks: Think of Learning Processes. Silva and J. [52] A. Rosa.. 21 (2010). [51] G. ISSN 0948-6968. Sporns.August 5. Schneider and J. Inc. fifth edition. Adán-Coello. MIT Press.UCS vol. 373–376. number 1.” Report 85-460-1. Schonmann. O.52 28 Artificial Neural Networks Artificial Neural Networks – Architectures and Applications [46] J. No. 829-834. B. [47] F. Ribeirão Preto. Rosa. 1.

1. Introduction

The assignment of value to the weight commonly brings the major impact towards the learning behaviour of the network. The correct value of the weight update can be seen in two aspects: sign and magnitude. If both aspects are properly chosen and assigned to the weight, the learning process can be optimized and the solution is not hard to reach. If the algorithm successfully computes the correct value of the weight, it can converge faster to the solution; otherwise, the convergence might be slower or it might cause divergence. Owing to the usefulness of two-term BP and the adaptive learning method in learning the network, this study proposes the weight sign changes with respect to gradient descent in BP networks, with and without the adaptive learning method.

2. Related work

The gradient descent technique is expected to bring the network closer to the minimum error without taking for granted the convergence rate of the network. The step of gradient descent is controlled by a parameter called the learning rate; this parameter determines the length of the step taken by the gradient to move along the error surface. Moreover, to avoid the oscillation problem that might happen around a steep valley, a fraction of the last weight update is added to the current weight update, and its magnitude is adjusted by a parameter called momentum. The inclusion of these parameters aims to produce a correct value of the weight update, which later will be used to update the new weight. It is meant to generate the slope that moves downwards along the error surface to search for the minimum point.
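As a reference point for the discussion that follows, the two-term update with learning rate η and momentum α can be sketched as below; the notation and the toy quadratic error are illustrative, not taken from this chapter.

# Illustrative two-term BP update: delta_w(t) = -eta * dE/dw + alpha * delta_w(t-1).
def two_term_update(w, grad, prev_delta, eta=0.1, alpha=0.9):
    delta = -eta * grad + alpha * prev_delta   # learning-rate term + momentum term
    return w + delta, delta

w, prev_delta = 5.0, 0.0
for step in range(50):
    grad = 2.0 * w                             # dE/dw for the toy error E(w) = w**2
    w, prev_delta = two_term_update(w, grad, prev_delta)
print("weight after 50 steps:", round(w, 4))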

During its movement, the points passed by the slope throughout the iterations affect the magnitude of the value of the weight update and its direction. The sign of the gradient indicates the direction in which it moves, and the magnitude of the gradient indicates the step size taken by the gradient to move on the error surface. This temporal behaviour of the gradient provides insight about the conditions on the error surface, bearing in mind that the error surface is not always flat, or never flat. This information will then be used to perform a proper adjustment of the weight, which is carried out by implementing a weight adjustment method. The term "proper adjustment of weight" here refers to the proper assignment of magnitude and sign to the weight. It needs to take into account the characteristics of the error surface and the direction of movement. This is a very important condition to be taken into consideration to generate the proper value and direction of the weight. If this can be achieved, the problem of being trapped in local minima can be avoided and the desired solution can be achieved. Once the weight is properly adjusted, the network can reach the minimum in a shorter time and the desired output is obtained. Later, the updated weight is used for training the network at each epoch, until the predefined iteration is achieved or the minimum error has been reached.

The error function plays a vital role in the learning process of the two-term BP algorithm. Aside from calculating the actual error from the training, it assists the algorithm in reaching the minimum point where the solution converges, by calculating its gradient and back-propagating it to the network for weight adjustment and error minimization.

Aside from the gradient, there are some factors that play an important role in the assignment of a proper change to the weight, specifically in terms of its sign. These factors are the learning parameters, such as learning rate and momentum. Learning rate and momentum hold an important role in the two-term BP training process: respectively, they control the step size taken by the gradient along the error surface and speed up the learning process. In a conventional BP algorithm, the initial value of both parameters is very critical, since it will be retained throughout all the learning iterations. The assignment of a fixed value to both parameters is not always a good idea, since both of these factors affect the internal learning process of the network. Setting a larger value for the learning rate may assist the network to converge faster; however, the oscillation problem may occur and cause divergence, or in some cases we might overshoot the minimum, owing to the larger step taken by the gradient. On the other hand, if a smaller value is assigned to the learning rate, the convergence rate is compromised, owing to the smaller steps taken by the gradient. If the learning parameters are properly assigned, the gradient will move in the correct direction and gradually reach the minimum point; otherwise, the gradient will bounce from one side of the surface to another, and the learning process takes a long time to converge to the solution.

Despite the general success of BP in learning, several major deficiencies still need to be solved. The most notable deficiencies, according to reference [1], are the existence of temporary local minima due to the saturation behaviour of the activation function, the slow rates of convergence due to the existence of local minima, and a convergence rate that is relatively slow for a network with more than one hidden layer. These drawbacks are also acknowledged by several scholars [2-5].

The obvious way to solve this problem is to implement an adaptive learning method to produce dynamic values of the learning parameters. Moreover, the fact that the two-term BP algorithm uses a uniform value of the learning rate for all connections may lead to the problem of overshooting minima and slow movement on shallow regions of the surface. Under this condition, the direction of the gradient changes rapidly and may cause divergence. This phenomenon may cause the algorithm to diverge, or to converge very slowly to the solution, owing to the different step size taken by each slope to move in a different direction. If that happens, the computed weight update value and direction will be incorrect. It is obviously seen that the use of a fixed parameter value is not efficient.

For these reasons, [6] has proposed a solution to these matters, called the Delta-Bar-Delta (DBD) algorithm. The method proposed by the author focuses on the setting of a learning rate value for each weight connection: each connection has its own learning rate. However, this method still suffers from certain drawbacks. The first drawback is that the method is not efficient when used together with the momentum, since sometimes it causes divergence. The second drawback is the assignment of the increment parameter, which causes a drastic increment of the learning rate, so that the exponential decrement does not bring a significant impact to overcome a wild jump.

[7] proposed an improved DBD algorithm called the Extended Delta-Bar-Delta (EDBD) algorithm. EDBD implements a notion similar to DBD and adds some modifications to it to alleviate the drawbacks faced by DBD. Unlike DBD, EDBD provides a way to adjust both the learning rate and the momentum for each individual weight connection. The author has proven that the EDBD algorithm outperforms the DBD algorithm, and its learning performance is thus superior to DBD.

[8] has proposed the batch gradient descent method with momentum. In this method, the momentum is used to overcome the oscillation problem: it pushes the gradient to move up the steep valley in order to escape the oscillation. By combining the momentum with the batch gradient descent algorithm, any sample in the network cannot have an immediate effect; it has to wait until all the input samples are in attendance, then the sum of all errors is accumulated, and finally the weights are modified in the right direction to enhance the convergence rate according to the total error. The satisfactory performance tells us that the algorithm performs successfully and well in generating proper weights with the inclusion of momentum.

On the other hand, [9] has presented a new learning algorithm for a feed-forward neural network based on the two-term BP method using an adaptive learning rate, and it demonstrates a satisfactory learning performance. The proposed algorithm consists of two phases. In the first phase, the learning rate is adjusted after each iteration so that the minimum error is quickly attained. In the second phase, the search algorithm is refined by repeatedly reverting to previous weight configurations and decreasing the global learning rate. The adaptation is based on the error criteria, where the error is measured on the validation set instead of the training set to dynamically adjust the global learning rate. The advantages of this method are faster speed, fewer iterations and smoother convergence. The experimental result shows that the proposed method quickly converges and outperforms two-term BP in terms of generalization when the size of the training set is reduced.

indicating the validity of the proposed method. the range for the learning rate is widened after the inclusion of the adaptable mo‐ mentum. The next learning phase will adjust the learning model based on the evalu‐ ation effects according to the previous learning phase. The simulation results show that the proposed method is superior to the conven‐ tional BP method where fast convergence is achieved and the oscillation is smoothed. The function relationship between the total quadratic training error change and the connection weight and bias change is acquired based on the Taylor formula. In [13] a fast BP learning method is proposed using optimization of the learning rate for pulsed neural networks (PNN). the mo‐ mentum coefficient is adjusted iteratively. during connection weight learning and attenua‐ tion rate learning for the purpose of accelerating BP learningina PNN. . The comparison made between this method and other methods. such as two-term BP. [12] presented an improved training algorithm of BP with a self-adaptive learning rate. Moreover. Moreover. By combining it with weight and bias change in a batch BP algorithm. the two-term BP is improvedso that it can overcome the problems of slow learning and is easy to trap into the minimum by adopting an adaptive algorithm. in [11] proposed a differential adaptive learning rate method for BP to speed up the learning rate. The experiment results show that the modified BP improved much bet‐ ter compared with standard BP. [5] proposed a new BP algorithm with adaptive momentum for feed-forward training.56 Artificial Neural Networks – Architectures and Applications mental result shows that the proposed method quickly converges and outperforms twoterm BP in terms of generalization when the size of the training set is reduced. The learning rate will be adaptively adjusted based on the aver‐ age quadratic error and the error curve gradient. The results showed that the average number of learning cycles required in all of the problems was reduced by optimization of the learning rate during connection weight learning. [10] has improved the convergence rates of the two-term BP model with some modifications in learning strategies. training samples. Nguyen-Widrow Weight Initialization and Optical BP shows that the proposed method out‐ performs the competing method in terms of learning speed.The method di‐ vides the whole training process into many learning phases. The proposed method employs the large learning rate at the beginning of training and gradually decreases the value of learning rate using differential adaptive meth‐ od. In [14]. while maintaining the stability of the network.The authors devised an error BP learning method using optimization of the learning rate. The effects will indicate the di‐ rection of the network globally. average quadratic er‐ ror and gradient but not artificial selection. The result of the experiment shows the effective‐ ness of the proposed training algorithm. Based on the information of current descent direction and last weight increment. the equations to calculate a self-adaptive learning rate are obtained. The proposed method optimized the learning rate so as to speed up learning in every learning cycle. Meanwhile. Different ranges of effect values correspond to different learning models. the value of the self-adaptive learning rate depends on neural network topology.

We can infer from previous literature that the evolution of the improvement of BP learning for more than 30 years still points towards an openness of contribution in enhancing the BP algorithm in training and learning the network, especially in terms of weight adjustments. This can be seen from various studies that significantly control the proper sign and magnitude of the weight. The modification of the weight adjustment aims to update the weight with the correct value to obtain a better convergence rate and a minimum error.

3. Two-Term Back-Propagation (BP) Network

The architecture of two-term BP is deliberately built in such a way that it resembles the structure of a neuron. It contains several layers, where each layer interacts with the upper layer connected to it by connection links. A connection link specifically connects the nodes within a layer with the nodes in the adjacent layer, which builds a highly interconnected network. The bottom-most layer, called the input layer, accepts and processes the input and passes its output to the next adjacent layer, called the hidden layer. The input layer has M neurons and input vector X = [x_1, x_2, ..., x_M], the output layer has L neurons and output vector Y = [y_1, y_2, ..., y_L], while the hidden layer has Q neurons. The general architecture of an ANN is depicted in Figure 1 [15].

Figure 1. ANN architecture, where i is the input layer, j is the hidden layer and k is the output layer.

The output received from the input layer is processed and computed mathematically in the hidden layer, and the result is passed to the output layer. BP can have more than one hidden layer, but this creates complexity in training the network; one reason for this complexity is the larger number of local minima compared with a network with one hidden layer. Nodes in BP can be thought of as units that process input to produce output. Once the output arrives at the hidden layer, each input is multiplied by the weight associated with the connection link connected to the node and added to the bias; this is called a linear combination. The weighted inputs are summed up to create a net, the net is fed to the activation function, and the output is passed to the output layer.

The output produced by a node is affected largely by the weight associated with each link. The weight is used to determine the strength of the output: the greater the weight, the greater the chance of the output being closer to the desired output. The learning also depends greatly on the initial weight choice to lead to convergence. The relationship between the weights, connection links and layers is shown in Figure 2, in reference [16].

Figure 2. Connection links, weights and layers.

To ensure that the learning takes place continuously, the activation function needs to be a continuous differentiable function, in the sense that the derivative of the error function can keep moving downhill on the error surface in searching for the minimum. The most commonly used activation function is the sigmoid function, which limits the output between 0 and 1:

net_j = Σ_i W_ij O_i + θ_i   (1)

O_j = 1 / (1 + e^(-net_j))   (2)

where W_ij is the weight associated with the connection link between nodes in the input layer i and nodes in the hidden layer j, O_i is the input at the nodes in input layer i, θ_i is the bias associated with each connection link between input layer i and hidden layer j, net_j is the summation of the weighted inputs added to the bias, and O_j is the output of the activation function at hidden layer j. Other activation functions that are commonly used are the logarithmic, tangent and hyperbolic tangent functions, among others.

The output generated by the activation function is forwarded to the output layer. Similar to the input and hidden layers, the connection link that connects the hidden and output layers is associated with a weight. The activated output received from the hidden layer is multiplied by the weight, all the weighted outputs are added together, and this value is fed to the activation function to generate the final output. Depending on the application, the number of nodes in the output layer may vary; in a classification problem, the output layer may consist of only one node to produce a yes/no result, or of binary numbers. The Mean Square Error is used as the error function to calculate the error at each iteration, using the target output and the final calculated output of that iteration. If the error is still larger than the predefined acceptable error value, the training process continues to the next iteration.

net_k = Σ_j W_jk O_j + θ_k   (3)

O_k = 1 / (1 + e^(-net_k))   (4)

E = (1/2) Σ_k (t_k − O_k)^2   (5)

where net_k is the summation of the weighted outputs at the output layer k, W_jk is the weight associated with the connection link between the hidden layer j and the output layer k, θ_k is the bias associated with each connection link between the hidden layer j and the output layer k, O_k is the final output at the output layer, t_k is the target output at output layer k, and E is the error function of the network (Mean Square Error).
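As a concrete illustration of Equations (1)-(5), the following minimal sketch implements the feed-forward pass and the error function in Python. The array shapes (M inputs, Q hidden nodes, L output nodes), the function names and the use of NumPy are assumptions of this sketch, not code taken from the study.

import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def forward(x, W_ij, theta_j, W_jk, theta_k):
    # x: (M,), W_ij: (M, Q), theta_j: (Q,), W_jk: (Q, L), theta_k: (L,)
    net_j = x @ W_ij + theta_j      # Equation (1): weighted inputs plus bias
    O_j = sigmoid(net_j)            # Equation (2): hidden-layer activation
    net_k = O_j @ W_jk + theta_k    # Equation (3)
    O_k = sigmoid(net_k)            # Equation (4): network output
    return O_j, O_k

def mse(t, O_k):
    return 0.5 * np.sum((t - O_k) ** 2)   # Equation (5)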

A large value of the error obtained at the end of an iteration denotes a deviation of the learning: the desired output has not been achieved. To solve this problem, the derivative of the error function with respect to each weight is computed and back-propagated through the layers to compute the new weight value at each connection link. The new weight is expected to be a correct weight that can produce the correct output. This algorithm is known as the delta rule, which employs the gradient descent method. For the weight associated with each connection link between output layer k and hidden layer j, the weight incremental value is computed using the weight adjustment equation

ΔW_kj(t) = −η ∂E/∂W_kj + β ΔW_kj(t−1)   (6)

where ΔW_kj(t) is the weight incremental value at the t-th iteration, η is the learning rate parameter, −∂E/∂W_kj is the negative derivative of the error function with respect to the weight, β is the momentum parameter, and ΔW_kj(t−1) is the previous weight incremental value at the (t−1)-th iteration.

By applying the chain rule, the derivative of the error function with respect to the weight can be expanded as

∂E/∂W_kj = (∂E/∂net_k) (∂net_k/∂W_kj)   (7)

Substituting Equation (3) into Equation (7), we get

∂E/∂W_kj = (∂E/∂net_k) O_j   (8)

δ_k = ∂E/∂net_k   (9)

Thus, by substituting Equation (9) into Equation (8), we get

∂E/∂W_kj = δ_k O_j   (10)

Expanding Equation (9) by the chain rule gives

δ_k = ∂E/∂net_k = (∂E/∂O_k) (∂O_k/∂net_k)   (11)

For the sigmoid activation,

∂O_k/∂net_k = O_k (1 − O_k)   (12)

so that

δ_k = (∂E/∂O_k) O_k (1 − O_k)   (13)

From the error function in Equation (5),

∂E/∂O_k = −(t_k − O_k)   (14)

Substituting Equation (14) into Equation (13), we get the error signal at the output layer

δ_k = −(t_k − O_k) O_k (1 − O_k)   (15)

Thus, by substituting Equation (15) into Equation (6), we get the weight adjustment equation for the weight associated with each connection link between output layer k and hidden layer j, with the negative derivative of the error function written out explicitly:

ΔW_kj(t) = η (t_k − O_k) O_k (1 − O_k) O_j + β ΔW_kj(t−1)   (16)

On the other side, the error signal is back-propagated to bring its impact to the weights between input layer i and hidden layer j. The error signal at hidden layer j can be written as

δ_j = [Σ_k δ_k W_kj] O_j (1 − O_j)   (17)

Based on Equation (10) and the substitution of Equation (17) into Equation (6), the weight adjustment equation for the weights associated with each connection link between input layer i and hidden layer j is

ΔW_ji(t) = −η δ_j O_i + β ΔW_ji(t−1) = −η [Σ_k δ_k W_kj] O_j (1 − O_j) O_i + β ΔW_ji(t−1)   (18)

where ΔW_ji(t) is the weight incremental value at the t-th iteration and O_i is the input at the nodes in input layer i.
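The two-term weight update of Equations (15)-(20) can be sketched in Python as follows, continuing the notation of the forward-pass sketch above. The shapes, names and the default values of η and β are illustrative assumptions; O_j and O_k are assumed to come from a preceding feed-forward pass.

import numpy as np

def backprop_update(x, t, O_j, O_k, W_kj, W_ji,
                    dW_kj_prev, dW_ji_prev, eta=0.1, beta=0.9):
    # Error signals: Equation (15) for the output layer, Equation (17) for the hidden layer
    delta_k = -(t - O_k) * O_k * (1.0 - O_k)          # shape (L,)
    delta_j = (W_kj @ delta_k) * O_j * (1.0 - O_j)    # shape (Q,); W_kj: (Q, L)

    # Weight increments with momentum: Equations (16) and (18)
    dW_kj = -eta * np.outer(O_j, delta_k) + beta * dW_kj_prev   # (Q, L)
    dW_ji = -eta * np.outer(x, delta_j) + beta * dW_ji_prev     # (M, Q)

    # New weights: Equations (19) and (20)
    return W_kj + dW_kj, W_ji + dW_ji, dW_kj, dW_ji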

The values obtained from Equation (16) and Equation (18) are used to update the weights at each connection link. Let t refer to the t-th iteration of training. The new weight value at the (t+1)-th iteration associated with each connection link between output layer k and hidden layer j is calculated as

W_kj(t+1) = ΔW_kj(t) + W_kj(t)   (19)

where W_kj(t+1) is the new value of the weight associated with each connection link between output layer k and hidden layer j, and W_kj(t) is its current value at the t-th iteration. Meanwhile, the new weight value at the (t+1)-th iteration for the weight associated with each connection link between hidden layer j and input layer i can be written as

W_ji(t+1) = ΔW_ji(t) + W_ji(t)   (20)

where W_ji(t+1) is the new value of the weight associated with each connection link between hidden layer j and input layer i, and W_ji(t) is its current value.

The difficulty of this algorithm lies in controlling the step and direction of the gradient along the error surface. The gradient of the error function is expected to move down the error surface and reach the minima point where the global minimum resides. Owing to the temporal behaviour of gradient descent and the shape of the error surface, the step taken to move down the error surface may lead to divergence of the training: the algorithm might go in the wrong direction and bounce from one side of a valley across to the other. This may happen when the step taken by the gradient is large; many reasons can cause this problem, but one of them is overshooting the local minimum where the desired output lies. In contrast, a small step can direct the algorithm in the correct direction, but the convergence rate is compromised, and the learning time becomes longer since more training iterations are needed to achieve the minimum error. For this reason a parameter called the learning rate is used in the weight adjustment computation. The choice of the learning rate value is application-dependent and in most cases is based on experiments. Once the correct learning rate is obtained, the gradient movement can produce the correct new weight value and hence the correct output.

However, another parameter is needed to keep the gradient moving in the correct direction so that the algorithm does not suffer from wide oscillation; this parameter is called the momentum. Owing to the problem of oscillation in a narrow valley, the momentum parameter is introduced to stabilize the movement of gradient descent in the steepest valley by overcoming the oscillation problem. Momentum brings the impact of the previous weight change into the current weight change, by which the gradient can move uphill and escape the oscillation along the valley. In [15] it is stated that assigning too small a value to the momentum factor may decelerate the convergence speed and compromise the stability of the network, while too large a value results in the algorithm giving excessive emphasis to the previous derivatives, which weakens the gradient descent of BP. Hence, the author suggested the use of a dynamic adjustment method for the momentum. In the case of the learning rate, setting a larger value may speed up convergence but is prone to an oscillation problem that may lead to divergence; on the contrary, setting a smaller value may decelerate the convergence even though it can guarantee that the gradient moves in the correct direction. Like the momentum parameter, the value of the learning rate therefore also needs to be adjusted at each iteration to avoid the problems produced by keeping a constant value throughout all iterations.

4. Weight Sign and Adaptive Methods

The previous sections have discussed the role of the parameters in producing the increment value of the weight through the weight adjustment equation. The learning rate and the momentum coefficient are the most commonly used parameters in two-term BP. The incorporation of these two parameters in the weight adjustment calculation has a great impact on the convergence of the algorithm and on the problem of local minima if they are tuned to the correct values; the use of constant parameter values is not always a good idea. The adaptive parameters (learning rate and momentum) used in this study are implemented to assist the network in controlling the movement of gradient descent on the error surface, which primarily aims to attain the correct value of the weight: the adaptive method assists in generating the correct sign of the weight, and the correct increment value is then used to update the new value of the weight. The choice of the adaptive method focuses on the learning characteristic of the algorithm used in this study, which is batch learning, and the method is implemented in the two-term BP algorithm with MSE. In [17] a brief definition of online learning and its difference from batch learning is given: the author defined online learning as a scheme that updates the weights after every input-output case, while batch learning accumulates the error signals over all input-output cases before updating the weights.
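A toy sketch of the difference between the two update schemes, using a single-weight linear unit with made-up data and step size, is given below; it is only meant to show where the weight update happens in each scheme.

import numpy as np

def online_epoch(w, xs, ts, eta=0.1):
    # weight adjusted after every input-output case
    for x, t in zip(xs, ts):
        grad = -(t - w * x) * x          # d(0.5*(t - w*x)^2)/dw
        w -= eta * grad
    return w

def batch_epoch(w, xs, ts, eta=0.1):
    # error signals summed over all cases, then a single update (true gradient)
    grad = sum(-(t - w * x) * x for x, t in zip(xs, ts))
    return w - eta * grad

xs, ts = np.array([0.5, 1.0, 1.5]), np.array([1.0, 2.0, 3.0])
print(online_epoch(0.0, xs, ts), batch_epoch(0.0, xs, ts))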

The batch learning method reflects the true gradient descent, where each weight update tries to minimize the error; as stated in reference [18], the summed gradient information for the whole pattern set provides reliable information regarding the shape of the whole error function. In other words, online learning updates the weights after the presentation of each input and target pair, while batch learning waits for the whole pattern set.

With the task of pointing out the temporal behaviour of the gradient of the error function and its relation to the change of weight sign, the adaptive learning method used in this study is adopted from the paper written by reference [7], entitled "Back-Propagation Heuristics: A Study of the Extended Delta-Bar-Delta Algorithm". The author proposed an improvement of the DBD algorithm of reference [6], called the Extended Delta-Bar-Delta (EDBD) algorithm, in which the improved method provides a way of updating the value of the momentum for each weight connection at each iteration. Since EDBD is an extension of DBD, it implements a similar notion to DBD: it exploits the information of the sign of the past and current gradients, and this sign information becomes the condition for the learning rate and momentum adaptation. Moreover, the improved algorithm also provides a ceiling to prevent the values of the learning rate and momentum from becoming too large. The detailed equations of the method are described below:

Δw_ji(t) = −η_ji(t) ∂E(t)/∂w_ji + β_ji(t) Δw_ji(t−1)   (21)

where η_ji(t) is the learning rate between the i-th input layer and the j-th hidden layer at the t-th iteration, and β_ji(t) is the momentum between the i-th input layer and the j-th hidden layer at the t-th iteration. The updated values for the learning rate and momentum can be written as follows:

Δη_ji(t) = k_l exp(−γ_l |δ̄_ji(t)|)   if δ̄_ji(t−1) δ_ji(t) > 0
Δη_ji(t) = −φ_l η_ji(t)              if δ̄_ji(t−1) δ_ji(t) < 0
Δη_ji(t) = 0                         otherwise   (22)

Δβ_ji(t) = k_m exp(−γ_m |δ̄_ji(t)|)   if δ̄_ji(t−1) δ_ji(t) > 0
Δβ_ji(t) = −φ_m β_ji(t)              if δ̄_ji(t−1) δ_ji(t) < 0
Δβ_ji(t) = 0                         otherwise   (23)

δ̄_ji(t) = (1 − θ) δ_ji(t) + θ δ̄_ji(t−1)   (24)

η_ji(t+1) = MIN[η_max, η_ji(t) + Δη_ji(t)]   (25)

β_ji(t+1) = MIN[β_max, β_ji(t) + Δβ_ji(t)]   (26)

where δ̄_ji is the exponentially decaying trace of gradient values, δ_ji is the gradient value between the i-th input layer and the j-th hidden layer at the t-th iteration, θ is the weighting on the exponential average of the past derivatives, k_l, γ_l and φ_l are parameters of the learning rate adaptation equation, k_m, γ_m and φ_m are parameters of the momentum adaptation equation, η_max is the maximum value of the learning rate and β_max is the maximum value of the momentum.

The algorithm calculates the exponential average of the past derivatives to obtain information about the recent history of the direction in which the error is decreasing up to iteration t. This information, together with the current gradient, is used to adjust the parameters' values based on their signs. When the current and past derivatives possess the same sign, it shows that the gradient is moving in the same direction; one can assume that in this situation the gradient is moving in a flat area and the minimum lies ahead. In contrast, when the current and past derivatives possess opposite signs, it shows that the gradient is moving in a different direction; one can assume that in this situation the gradient has jumped over the minimum, and the weight step needs to be decreased to correct this. The increment of the learning rate is made proportional to an exponentially decaying trace, so that the learning rate increases significantly in a flat region and decreases on a steep slope. To prevent an unbounded increment of the parameters, maximum values for the learning rate and momentum are set to act as a ceiling on both parameters.

Owing to the excellent idea and performance of the algorithm, as proven in reference [7], this method is adopted to assist the network in producing proper weight sign changes and achieving the purpose of this study.
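A minimal per-connection sketch of the EDBD update of Equations (21)-(26) is given below. All constant values (k_l, γ_l, φ_l, k_m, γ_m, φ_m, θ and the ceilings) are illustrative assumptions, not the settings used in this study.

import numpy as np

def edbd_update(w, grad, state, k_l=0.1, g_l=2.0, p_l=0.1,
                k_m=0.05, g_m=2.0, p_m=0.05, theta=0.7,
                eta_max=1.0, beta_max=0.9):
    eta, beta, delta_bar, dw_prev = state
    agree = delta_bar * grad                              # sign of past average vs current gradient
    new_bar = (1 - theta) * grad + theta * delta_bar      # Equation (24)

    # Equations (22)-(23): increase on agreement, decrease on disagreement
    d_eta  = np.where(agree > 0, k_l * np.exp(-g_l * np.abs(new_bar)),
             np.where(agree < 0, -p_l * eta, 0.0))
    d_beta = np.where(agree > 0, k_m * np.exp(-g_m * np.abs(new_bar)),
             np.where(agree < 0, -p_m * beta, 0.0))

    eta  = np.minimum(eta_max, eta + d_eta)               # Equation (25)
    beta = np.minimum(beta_max, beta + d_beta)            # Equation (26)

    dw = -eta * grad + beta * dw_prev                     # Equation (21)
    return w + dw, (eta, beta, new_bar, dw)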

5. Weights Performance in Terms of Sign Change and Gradient Descent

Mathematically, the weight adjustment method described in Equations (19) and (20), together with the adaptive method and the BP algorithm used in this study, will assist the algorithm in producing the proper weight. Basically, the adaptive methods are the transformation of an optimization concept into a mathematical one, as is implicitly depicted in the algorithm. The process of generating the proper weight stems from calculating its update value, and this process is affected by various variables, from the parameters up to controlled quantities such as the gradient, the previous weight increment value and the error; they all play a great part in affecting the sign of the weight produced. The measurement of the success of the method, and of the algorithm as a whole, can be done in many ways; some of them are carried out by analysing the convergence rate, the accuracy of the result, the error value produced and the change of the weight sign as a response to the temporal behaviour of the gradient. Hence, the role of the adaptive method, constructed using a mathematical concept to improve the weight adjustment computation in order to yield the proper weights, will be implemented and examined in this study to check the efficiency and the learning behaviour of both algorithms; the efficiency can be judged against the measurement criteria of success described earlier.

Reference [15] briefly described the relationship between the error curvature, the weight and the gradient, especially the partial derivative of the error function with respect to the weight. The author mentioned that when the error curve enters a flat region, the changes in the derivative and in the error curve are smaller; as a result, the magnitude of the derivative of the weight is small, which yields a small weight adjustment value, so many steps are required to reduce the error and the change of the weight is not optimized. On the contrary, when the error surface is highly curved along the weight dimension, the derivative change is large, especially if the minimum point exists in this region; the derivative of the weight is large in magnitude and the adjustment of the weight value is large, which sometimes overshoots the minimum. This problem can be alleviated by adjusting the step size of the gradient, and this can be done by adjusting the learning rate. In addition, [15] also gave the proper condition relating the rate of weight change and the temporal behaviour of the gradient.

In [19] the causes of slow convergence, which involve the magnitude and the direction component of the gradient vector, were also emphasized. Moreover, the author raised the implementation of local adaptive methods such as Delta-Bar-Delta, originally proposed in reference [6], and briefly discussed the performance of BP with momentum: the momentum coefficient can be used to control the oscillation problem, and its implementation along with the proportional factor can speed up the convergence. The author stated that when the consecutive derivatives of a weight possess the same sign, the exponentially weighted sum grows large in magnitude and the weight is adjusted by a large value; on the contrary, when the signs of the consecutive derivatives of the weight are opposite, the weighted sum is small in magnitude and the weight is adjusted by a small amount.
From the description given in [6], the author wrote that if the derivative has the same sign as the previous one, then the update applied to the weight is increased; meanwhile, if the derivative has the opposite sign to the previous one, the update is decreased to stabilize the network.
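This rule can be sketched for the learning rate of a single weight as follows; κ, φ and θ are illustrative constants, and the sketch follows the Delta-Bar-Delta formulation of [6] rather than any particular implementation from this study.

def dbd_learning_rate(eta, delta, delta_bar, kappa=0.05, phi=0.2, theta=0.7):
    # delta: current derivative; delta_bar: exponentially weighted average of past derivatives
    if delta_bar * delta > 0:
        eta += kappa             # same direction: take larger steps
    elif delta_bar * delta < 0:
        eta -= phi * eta         # direction flipped: minimum overshot, shrink the step
    delta_bar = (1 - theta) * delta + theta * delta_bar
    return eta, delta_bar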

The learning behaviour of the network can therefore be justified as follows: consecutive weight increment values that possess opposite signs indicate an oscillation of the weight value, which requires the learning rate to be reduced; similarly, consecutive weight increment values that possess the same sign call for an increase of the learning rate. As a result, the algorithm can converge to the solution faster. This information will be used in studying the weight sign change of both algorithms.

The adjustment comprises magnitude and sign change. This procedure is carried out by calculating the gradient of the error, which mainly aims to adjust the value of the weight to be used in the next feed-forward training step. There is no doubt that the weight plays an important role in training; its role lies in determining the strength of the incoming signal (input) in the learning process, and depending on its value this signal can bring a bigger or smaller influence to the learning. When the weight is positive, the weight excites the input signal so that it brings a significant impact to the learning and the output, and the respective node makes a contribution to the learning and output. On the other hand, when the weight is negative, the weight connection inhibits the input signal and does not bring a significant influence to the learning and output; as a result, the other nodes with positive weights will dominate the learning and the output. The weighted signal is accumulated and forwarded to the next adjacent upper layer.

The point of learning occurs when a difference between the calculated output and the desired output exists; otherwise there is no point for learning to take place. When the difference does exist, the error signal is propagated back into the network to be minimized. If the weight assignment results in a large error, the corresponding weight needs to be adjusted to reduce the error; the network will then adjust itself to compensate for the loss during training in order to learn better. Afterwards, the next training sample is fed into the network and multiplied by the new weight value, and similar feed-forward and backward procedures are performed until the minimum value of the error is achieved. By properly adjusting the weight, the algorithm can converge to the solution faster; based on the influence of the weight, the training may take fewer or more iterations depending on its proper adjustment.

6. Experimental Result and Analysis

This section discusses the result of the experiment and its analysis; the detailed discussion of the experiment process covers its initial phase until the analysis is summarized. To have a clear idea of the impact of the weight sign on the learning, the following assumption is used: let all values of the weights be negative and consider the equation written below.

net = Σ_i W_i O_i + θ   (27)

O = 1 / (1 + e^(-net))   (28)

With all weights negative we obtain a negative value of net and, feeding it into the sigmoid activation function of Equation (28), we obtain a value of O close to 0. On the other hand, letting all values of the weights be positive and using the same equations, we obtain a positive value of net and an output O close to 1. From this assumption we can infer that, by adjusting the weight with the proper value and sign, the network can learn better and faster; with this optimal value, the goal of learning can be achieved.

To have a significant impact on the weight update value, the adjustment of the weight is carried out by using the weight update method and the current weight value, as written below:

ΔW(t) = −η(t) ∇E(W(t)) + β(t) ΔW(t−1)   (29)

W(t+1) = ΔW(t) + W(t)   (30)

It can be seen from Equations (29) and (30) that there are various factors which influence the calculation of the weight update value. The most notable factor is the gradient. The negative sign is assigned to the gradient to force it to move downhill along the error surface in the weight space; this is meant to find the minimum point of the error surface where the set of optimal weights resides. Each gradient with respect to each weight in the network is calculated, and the weight update value is computed to update the corresponding weight. Another factor is the previous weight update value: according to reference [20], its role is to bring a fraction of the previous weight update into the current update so as to smooth out the new weight value. Mathematically, the magnitude of the gradient and the previous weight update value are controlled by the learning parameters, namely the learning rate and the momentum. The correct tuning of the learning parameters' values can assist in obtaining the correct value and sign of the weight update; in this study the values of the learning parameters are acquired through experiments.

Owing to the curvature of the error surface, the gradient behaves differently under certain conditions, and a few authors have observed that this temporal behaviour of the gradient makes a contribution to the learning. The temporal behaviour can be seen as the changing of the gradient's sign during its movement on the error surface. In [6] it is stated that when the gradient possesses the same sign over several consecutive iterations, it indicates that the minimum lies ahead, but when the gradient changes its sign over several consecutive iterations, it indicates that the minimum has been passed. By using this information, we can improve the way we update the weight, which in turn affects the calculation of the new weight value through the weight adjustment method in Equation (30).
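A quick numerical check of Equations (27)-(30), with made-up inputs, weights and gradient values, is given below; it shows the saturation of the sigmoid output for negative and positive nets, and a single two-term update step in which the previous weight update is still zero.

import numpy as np

sigmoid = lambda net: 1.0 / (1.0 + np.exp(-net))
O, theta = np.array([0.8, 0.6, 0.9]), 0.1

for W in (np.array([-2.0, -1.5, -2.5]), np.array([2.0, 1.5, 2.5])):
    net = W @ O + theta               # Equation (27)
    print(net, sigmoid(net))          # Equation (28): output near 0 for negative net, near 1 for positive net

eta, beta, dW_prev = 0.5, 0.9, np.zeros(3)
grad = np.array([0.6, -0.3, 0.2])     # made-up gradient values
dW = -eta * grad + beta * dW_prev     # Equation (29): dominated by the gradient term here
W_new = dW + np.array([2.0, 1.5, 2.5])   # Equation (30)
print(dW, W_new)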

it can be concluded that the information about the sign of the past and current gradient is very helpful in guiding us to improve the training performance, which in this case refers to the movement of the gradient on the error surface. The heuristic given previously is merely an indicator of, and information about, the gradient movement on the error surface and about the surface itself, by which we get a better understanding of the gradient and the error surface so that we can arrive at an enhancement of the gradient movement in the weight space. Thus, based on the heuristic that has been discussed, the value of the weight should be increased when several consecutive signs of the gradient remain the same, and decreased if the opposite condition occurs. However, the gradient is not the sole determinant of the sign: through a thorough study of the weight sign change process in both algorithms, other factors such as the learning rate, the momentum, the previous weight update value and the current weight value also have a great influence on the sign and magnitude of the new weight value. Besides that, the assignment of the initial weight value also needs to be addressed further.

The case discussed before can be viewed in this way. Assume that the random initialization assigns a negative value to one of the weight connections at the hidden layer. After performing the feed-forward process, the difference between the output at the output layer and the desired output is large. Since the error is large, the gradient computed by the minimization method with respect to the weight at the hidden layer tends to be large as well. At the initial training iteration the previous weight update value is set to 0, which leaves the gradient to dominate the computation of the weight update value, as shown by Equation (31):

ΔW(t) = −η(t) ∇E(W(t)) + β(t) ΔW(t−1)   (31)

Afterwards, the adjustment of the weight is carried out by using the current weight value, as shown in the weight update equation below:

W(t+1) = ΔW(t) + W(t)   (32)

If the value of the initial weight is smaller than the gradient, the new weight update value will be more strongly affected by the gradient. This large value of the gradient, together with its sign, will dominate the weight update calculation and thus bring a large change to the sign and magnitude of the weight; at this initial iteration the magnitude and sign of the weight change essentially according to the fraction contributed by the gradient (including the contribution of the learning rate in adjusting the step size). Depending on its sign, this may lead the weight to remain negative on consecutive occasions, and as a result the corresponding node will not contribute much to the learning and output.

δ_k = −(t_k − O_k) O_k (1 − O_k)   (33)

∇E(W(t)) = [Σ_k δ_k W_kj] O_j (1 − O_j)   (34)

Assume that, from Equation (34), the gradient at the hidden layer has a large positive value. Since the network is at its initial state, the previous weight update value is set to 0 and, according to Equation (31), it does not contribute anything to the weight update computation. As a result, the weight update value, which is mostly determined by the gradient, will dominate the change of the weight magnitude and sign. We can also see that a large error produces a relatively large gradient; this value is used to fix the weights in the network by propagating the error signal back into the network. By performing the weight adjustment method described by Equation (32), the value and sign of the weight are adjusted to compensate for the error.

The following examples show the change of weight affected by the sign of the gradient and by its value over consecutive iterations. The change of the weights is represented by a Hinton diagram. The following Hinton diagram represents the weights of the standard BP with adaptive learning network on the Balloon dataset. The Hinton diagram in Figure 3 illustrates the sign and magnitude of all weight connections between the input and hidden layers, as well as between the hidden and output layers, at the first iteration; a light colour indicates a positive weight, a dark colour indicates a negative weight, and the size of the rectangle indicates the magnitude of the weight. The fifth rectangles in Figure 3 are the bias connections between the input layer and the hidden layer. The biases have the value of 1; however, the bias connection to the first hidden node carries a small value, so its representation is not clearly shown in the diagram, and its sign is negative.

The resulting error at the first iteration is still large, namely 0.1243, and it decreases gradually from the first iteration until the fifth iteration. Another noticeable conclusion from the experiment is that when the gradient retains the same sign for consecutive iterations, the weight value is increased over the iterations, while when the gradient changes its sign for several consecutive iterations, the weight value is decreased. This also applies to weight adjustments in the middle of training, where the gradient and error are still large and the previous weight update value is set to a certain amount; however, the change of the weight sign and magnitude is still affected by the parameters and factors included in Equations (33) and (34), as explained before. The changes in the gradient are shown in the tables below.

Figure 3. Hinton diagram of all weight connections between the input layer and the hidden layer at the first iteration on the Balloon dataset.

Table 1. The gradient value between the input layer and the hidden layer (input nodes 1-4 and bias to hidden nodes 1-2) at iterations 1 to 5.

Table 2. The gradient value between the hidden layer and the output layer (hidden nodes 1-2 and bias to the output node) at iterations 1 to 5.

From the tables above we can infer that the gradient in the hidden layer moves in a different direction, while the one in the output layer moves in the same direction. Based on the heuristic, when a gradient keeps moving in the same direction over these consecutive iterations, the value of the weight needs to be increased; however, it still depends on the factors that have been mentioned before. The impact on the weights is given in the diagram below.

Figure 4. Hinton diagram of weight connections between the input layer and the hidden layer at the fifth iteration.

Comparing the value of the weights at the fifth iteration with the first iteration, we can infer that most of the weights on the connections between the input and hidden layers have become greater in magnitude at the fifth iteration compared with the first iteration.

Figure 5. Hinton diagram of weight connections between the input layer and the hidden layer at the 12th iteration.

This shows that the value of the weight is increased over the iterations when the sign of the gradient stays similar over several iterations, due to the influence of the multiplication of a large positive gradient and the learning rate, which dominates the weight update calculation.

The error at the 12th iteration decreases to 0.1158 from 0.1243. At this iteration it is noticeable that the sign of the weight between input node 4 and the first hidden node, as well as that of the bias connection from the input layer to the first hidden node, changes, as the accumulated update grows in magnitude and switches the weight direction. Owing to the change of gradient sign that has been discussed before, the sign of the remaining weights stays the same. It can be seen that the sign of the gradient differs from the one at iterations 1-5; the same thing happens with the gradient from the third node of the input layer to all hidden nodes, where the sign changes from the one at the fifth iteration. The weights are larger in magnitude compared with the fifth iteration, since some of the gradients have moved in the same direction. The bias connection to the first hidden node is not clearly seen since its value is very small.

Table 3. The gradient value between the input layer and the hidden layer (input nodes 1-4 and bias to hidden nodes 1-2) at iterations 12 to 16.

At iterations 12 to 16 the error gradient moves slowly along the shallow slope in the same direction and brings smaller changes in the gradient; the error decreases to 0.1116. Although the gradient sign of the connection between the third input node and the first hidden node changes, the changes in the sign of the gradient have less effect on the new weight value since their values are very small; thus the weight sign remains the same, because of the positive weight update at the previous iteration and because the changes of the gradient are very small. For some of the weights, the change at this iteration is clearly seen in their magnitude. Another change occurs in the gradient from the second input node to the first hidden node at iteration 16, where its value becomes negative, which means that the gradient at this connection moves in a different direction; however, it has a smaller impact on the change of the weight, although the weight update value is negative (decreasing).

Besides the magnitude change, an obvious change is seen in the sign of the weight from the first input node to the second hidden node: it was positive at the previous iteration, but now it turns negative. This is due to the negative value of the gradient at the first iteration.

Figure 6. Hinton diagram of weight connections between the input layer and the hidden layer at the 14th iteration.

Figure 7. Hinton diagram of weight connections between the input layer and the hidden layer at the 1st iteration.

Since its initial value is quite big, the decrement does not have a significant effect on the sign; moreover, since its value is getting smaller over several iterations, the impact of the negative gradient can be seen in the change of the weight sign. The error at this iteration decreases to 0.1089.

The next example is the Hinton diagram representing the weights of the standard BP network with fixed parameters on the Iris dataset.

Figure 8. Hinton diagram of weight connections between the input layer and the hidden layer at the 11th iteration.

Table 4. The gradient value between the input layer and the hidden layer (input nodes 1-4 and bias to hidden nodes 1-3) at iterations 1 to 4.

Table 5. The gradient value between the input layer and the hidden layer (input nodes 1-4 and bias to hidden nodes 1-3) at iterations 11 to 14.

At iteration 11, the most obvious change in sign is on the weight between the second input node and the first hidden node. Based on the tables of gradients, we can see that the gradient at this connection moves in the same direction throughout iterations 1-4 and 11-14. At this iteration the error value decreases; however, due to the negative value of the gradient, the weight update value carries a negative sign that causes the weight value to decrease until its sign becomes negative.

The next example is the Hinton diagram representing the weights of the standard BP network with an adaptive learning parameter on the Iris dataset.

Figure 9. Hinton diagram of weight connections between the input layer and the hidden layer at the 1st iteration.

Figure 10. Hinton diagram of weight connections between the input layer and the hidden layer at the 11th iteration.

Table 6. The gradient value between the input layer and the hidden layer (input nodes 1-4 and bias to hidden nodes 1-3) at iterations 1 to 4.

Table 7. The gradient value between the input layer and the hidden layer (input nodes 1-4 and bias to hidden nodes 1-3) at iterations 11 to 14.

At the 11th iteration, all weights change their magnitude and some have different signs from before. The weight between the second input node and the first and second hidden nodes changes its sign to positive because of the positive incremental value, since the gradient moves along the same direction over time; the positive incremental value gradually changes the magnitude and sign of the weight from negative to positive.

7. Conclusions

This study is performed through experimental results achieved by constructing programs for both algorithms, which are implemented on various datasets. The datasets comprise small and medium datasets, each broken down into two subsets, training and testing, with ratio percentages of 70% and 30% respectively. The results from both algorithms are examined and studied based on accuracy, convergence time and error. In addition, this study also examines the weight sign change with respect to the temporal behaviour of the gradient, in order to study the learning behaviour of the network and to measure the performance of the algorithms. The study of the weight sign change of both algorithms shows that the gradient sign and magnitude, together with the error, have the greater influence on the weight adjustment process. Moreover, two-term BP with an adaptive algorithm works better in producing the proper change of weight, so that the time needed to converge is shorter compared with two-term BP without an adaptive learning method; this can be seen from the results of the convergence rate of the network.

Acknowledgements

The authors would like to thank Universiti Teknologi Malaysia (UTM) for the support in Research and Development, and the Soft Computing Research Group (SCRG) for the inspiration in making this study a success. This work is supported by The Ministry of Higher Education (MOHE) under the Long Term Research Grant Scheme (LRGS/TD/2011/UTM/ICT/03 - VOT 4L805).

Author details

Siti Mariyam Shamsuddin*, Ashraf Osman Ibrahim and Citra Ramadhena

*Address all correspondence to: mariyam@utm.my

Soft Computing Research Group, Faculty of Computer Science and Information Systems, Universiti Teknologi Malaysia, Malaysia

References

[1] Ng, S. C., Leung, S. H., & Luk, A. (1999). Fast convergent generalized back-propagation algorithm with constant learning rate. Neural Processing Letters, 13-23.

[2] Zweiri, Y. H., Whidborne, J. F., & Seneviratne, L. D. (2003). A three-term backpropagation algorithm. Neurocomputing, 305-318.

[3] Yu, C. C., & Liu, B. D. (2002). A backpropagation algorithm with adaptive learning rate and momentum coefficient. In 2002 International Joint Conference on Neural Networks (IJCNN 2002), May 12-17, 2002, Honolulu, HI, United States: Institute of Electrical and Electronics Engineers Inc.

[4] Dhar, V., et al. (2010). Comparative performance of some popular artificial neural network algorithms on benchmark and function approximation problems. Pramana - Journal of Physics, 74(2), 307-324.

[5] Hongmei, S., & Gaofeng, Z. (2009). A new BP algorithm with adaptive momentum for FNNs training. In 2009 WRI Global Congress on Intelligent Systems (GCIS 2009), May 19-21, 2009, Xiamen, China: IEEE Computer Society.

[6] Jacobs, R. A. (1988). Increased rates of convergence through learning rate adaptation. Neural Networks, 1(4), 295-307.

[7] Minai, A. A., & Williams, R. D. (1990). Back-propagation heuristics: A study of the extended Delta-Bar-Delta algorithm. In 1990 International Joint Conference on Neural Networks (IJCNN 90), June 17-21, 1990, San Diego, CA, USA: IEEE.

[8] Jin, W., et al. (2012). Spam filtering using semantic similarity approach and adaptive BPNN. Neurocomputing.

[9] Duffner, S., & Garcia, C. (2007). An online backpropagation algorithm with validation error-based adaptive learning rate. Artificial Neural Networks - ICANN 2007, 249-258.

[10] Shamsuddin, S. M., Sulaiman, M. N., & Darus, M. (2001). An improved error signal for the backpropagation model for classification problems. International Journal of Computer Mathematics, 297-305.

[11] Iranmanesh, S., & Mahdavi, M. A. (2009). A differential adaptive learning rate method for back-propagation neural networks. World Academy of Science, Engineering and Technology, 38.

[12] Li, Y., et al. (2009). The improved training algorithm of back propagation neural network with self-adaptive learning rate. In 2009 International Conference on Computational Intelligence and Natural Computing (CINC 2009), June 6-7, 2009, Wuhan, China: IEEE Computer Society.

[13] Yamamoto, K., et al. (2012). Fast backpropagation learning using optimization of learning rate for pulsed neural networks. Electronics and Communications in Japan.

[14] Hua, Q., et al. (2011). The application on the forecast of plant disease based on an improved BP neural network. In 2011 International Conference on Material Science and Information Technology (MSIT 2011), September 16-18, 2011, Singapore: Trans Tech Publications.

[15] Xiaoyuan, L., Bin, Q., & Lu, W. (2009). A new improved BP neural network algorithm. In 2009 2nd International Conference on Intelligent Computing Technology and Automation (ICICTA 2009), October 10-11, 2009, Changsha, Hunan, China: IEEE Computer Society.

[16] Yang, H., Mathew, J., et al. (2007). Basis pursuit-based intelligent diagnosis of bearing faults. Journal of Quality in Maintenance Engineering, 152-162.

[17] Fukuoka, Y., et al. (1998). A modified back-propagation method to avoid false local minima. Neural Networks, 1059-1072.

[18] Riedmiller, M. (1994). Advanced supervised learning in multi-layer perceptrons - from backpropagation to adaptive learning algorithms. Computer Standards and Interfaces, 16(3), 265-278.


[19] Sidani, A., & Sidani, T. (1994). A comprehensive study of the back propagation algorithm and modifications. In Proceedings of the 1994 Southcon Conference, March 29-31, 1994, Orlando, FL, USA: IEEE.

[20] Samarasinghe, S. (2007). Neural networks for applied sciences and engineering: from fundamentals to complex pattern recognition. Auerbach Publications.


Chapter 4

Robust Design of Artificial Neural Networks Methodology in Neutron Spectrometry
José Manuel Ortiz-Rodríguez, Ma. del Rosario Martínez-Blanco, José Manuel Cervantes Viramontes and Héctor René Vega-Carrillo
Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/51274

1. Introduction
Applications of artificial neural networks (ANNs) have been reported in the literature across many areas. [1–5] The wide use of ANNs is due to their robustness, fault tolerance, and their ability to learn and generalize from examples, through a training process, capturing complex nonlinear, multi-input/output relationships between process parameters from the process data. [6–10] ANNs have many other advantageous characteristics, including generalization, adaptation, universal function approximation and parallel data processing. The multilayer perceptron (MLP) trained with the backpropagation (BP) algorithm is the ANN most used for modeling, optimization, classification and prediction. [11, 12] Although the BP algorithm has proved to be efficient, its convergence tends to be very slow, and there is a possibility of getting trapped in an undesired local minimum. [4, 10, 11, 13]

Most of the literature on ANNs focuses on specific applications and their results rather than on the methodology for developing and training the networks. In general, the quality of a developed ANN depends not only on the training algorithm and its parameters but also on many architectural parameters, such as the number of hidden layers and of nodes per layer, which have to be set during the training process; these settings are crucial to the accuracy of the ANN model. [8, 14–19] Above all, there is limited theoretical and practical guidance for the systematic selection of ANN parameters throughout the development and training process. Because of this, the ANN parameters are usually set from previous experience through a trial-and-error procedure, which is very time consuming. In this way, the optimal setting of ANN parameters for achieving the best ANN quality is not guaranteed.


The robust design methodology, proposed by Taguchi, is one of the appropriate methods for achieving this goal. [16, 20, 21] Robust design is a statistical technique widely used to study the relationship between the factors affecting the outputs of a process. It can be used to systematically identify the optimum setting of factors needed to obtain the desired output. In this work, it was used to find the optimum setting of the ANN parameters in order to achieve a minimum-error network.

1.1. Artificial Neural Networks
The first works on neurology were carried out by Santiago Ramón y Cajal (1852-1934) and Charles Scott Sherrington (1857-1952). From their studies it is known that the basic element of the nervous system is the neuron. [2, 10, 13, 22] The model of an artificial neuron is an imitation of a biological neuron. Thus, ANNs try to emulate the processes carried out by biological neural networks, aiming to build systems capable of learning from experience, recognizing patterns and making predictions. ANNs are based on a dense interconnection of small processors called nodes, neuronodes, cells, units, processing elements or neurons. A simplified morphology of an individual biological neuron is shown in figure 1, where three fundamental parts can be distinguished: the soma or cell body, the dendrites, and the cylinder-axis or axon.

Figure 1. Simplified morphology of an individual biological neuron

Dendrites are fibers which receive the electric signals coming from other neurons and transmit them to the soma. The multiple signals arriving through the dendrites are processed by the soma and transmitted to the axon. The cylinder-axis or axon is a fiber of great length, compared with the rest of the neuron, connected to the soma at one end and divided at the other into a series of nervous ramifications; the axon picks up the signal from the soma and transmits it to other neurons through a process known as synapsis. An artificial neuron is a mathematical abstraction of the working of a biological neuron. [23] Figure 2 shows an artificial neuron. From a detailed observation of the biological process, the following analogies with the artificial system can be mentioned:


• The input Xi represents the signals that come from other neurons and are captured by the dendrites.
• The weights Wi are the intensities of the synapses that connect two neurons; Xi and Wi are real values.
• θ is the threshold that the neuron must exceed to become active; biologically, this process happens in the cell body.
• The input signals to the artificial neuron X1, X2, ..., Xn are continuous variables instead of the discrete pulses found in a biological neuron. Each input signal passes through a gain or weight, called synaptic weight or strength of the connection, whose function is similar to the synaptic function of the biological neuron.
• Weights can be positive (excitatory) or negative (inhibitory); the summing node accumulates all the input signals multiplied by the weights and passes the result to the output through a threshold or transfer function.

Figure 2. Artificial neuron model

An idea of this process is shown in figure 3, where a group of inputs can be observed entering an artificial neuron.

Figure 3. Analogies of an artificial neuron with a biological model

The input signals are weighted, multiplying each by the corresponding weight, which in the biological version of the neuron corresponds to the strength of the synaptic connection. The weighted signals arrive at the neuronal node, which acts as a summer of the signals; the output of the node is called the net output, and it is calculated as the sum of the weighted inputs plus a value b called the gain or bias.

The net output of the neuron, n, can be mathematically represented as:

n = ∑_{i=1}^{r} p_i w_i + b    (1)

where W represents the weights and the new input b is a gain (bias) that reinforces the output of the summer. The net output is used as the input to the transfer function, which provides the total output or response a of the artificial neuron:

a = f(n) = f( ∑_{i=1}^{r} p_i w_i + b )    (2)

Figure 4. Didactic model of an artificial neuron

A more didactic model, shown in figure 5, facilitates the study of a neuron. From this figure it can be seen that the net inputs are contained in the vector p.

Figure 5. Didactic model of a single artificial neuron
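As a minimal illustration of equations (1) and (2), the following Python sketch (added here as an example, not part of the original chapter) computes the net output n and the response a of a single neuron; the sigmoid transfer function is only an assumed choice, since the transfer function depends on the problem.

```python
import numpy as np

def single_neuron(p, w, b, f=lambda n: 1.0 / (1.0 + np.exp(-n))):
    """Forward pass of one artificial neuron.

    p : input vector (R,)   -- signals coming from other neurons
    w : weight vector (R,)  -- synaptic strengths
    b : scalar gain (bias)
    f : transfer function (sigmoid assumed here as an example)
    """
    n = np.dot(w, p) + b      # equation (1): weighted sum plus gain
    a = f(n)                  # equation (2): response of the neuron
    return n, a

# Example usage with arbitrary numbers
p = np.array([0.5, -1.2, 3.0])
w = np.array([0.8, 0.1, -0.4])
n, a = single_neuron(p, w, b=0.2)
```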

Generally, a neuron has more than one input. In figure 4 a neuron with R inputs can be observed: the individual inputs p1, p2, ..., pR are multiplied by the corresponding weights w1,1, w1,2, ..., w1,R belonging to the weight matrix W. A constant 1 enters the neuron multiplied by the scalar gain b, and the output of the summer, n, the net output of the network, is determined by the transfer function, which can be a linear or non-linear function of n and is chosen depending on the specifications of the problem that the neuron is intended to solve. If the net had more than one neuron, a would be a vector; a single neuron has only one element.

When a neuron has too many parameters, the notation of figure 4 can be inappropriate, and it is preferred to use the abbreviated notation represented in figure 6.

Figure 6. Abbreviated didactic model of an artificial neuron

The input vector p is represented by the vertical solid bar to the left. Its dimensions are shown below the variable as R×1, indicating that the input vector has R elements. The inputs go to the weight matrix W, which has R columns and just one row for the case of a single neuron. The output of the net, a, is a scalar in this case. The sub-indexes of the weight matrix represent the terms involved in the connection: the first sub-index denotes the destination neuron and the second the source of the signal that feeds the neuron. For example, the indexes of w1,2 indicate that this weight is the connection from the second input to the first neuron. This convention becomes more useful when there are several neurons.

ANNs are highly simplified models of the working of the brain. [2, 10, 13, 24] An ANN is a biologically inspired computational model which consists of a large number of simple processing elements or neurons which are interconnected and operate in parallel. Each neuron is connected to other neurons by means of directed communication links, each with an associated weight. [4] The weights represent the information being used by the net to solve a problem. ANNs are usually formed by several interconnected neurons, which constitute the neuronal structure; their disposition and connections vary from one type of net to another, but in general the neurons are grouped in layers. A layer is a collection of neurons; according to its location in the neural net, it receives a different name:

• Input layer: receives the input signals from the environment. In this layer the information is not processed; for this reason, it is not considered a layer of neurons.
• Hidden layers: these layers have no contact with the exterior environment; they pick up and process the information coming from the input layer. The outputs of each intermediate layer are the inputs to the following layer.
• Output layer: receives the information from the hidden layers and transmits the answer to the external environment.

The elements of an ANN can have different connections, and these determine the different topologies of the net: the number of hidden layers, the number of neurons per layer and the way they are connected vary from one net to another. Figure 7 shows an ANN with two hidden layers. In this configuration, each layer has its own weight matrix W, summer, gain vector b, net-input vector n, transfer function and output vector a. The outputs of the first hidden layer are the inputs of the second hidden layer, and a constant input 1 is fed to the bias of each neuron. This ANN can be observed in abbreviated notation in figure 8.

Figure 7. Artificial Neural Network with two hidden layers
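To make the layered notation concrete, the following Python sketch (an illustration added here, not code from the chapter) propagates an input vector through a network with two hidden layers; the layer sizes and the tanh/linear transfer functions are assumed only for the example.

```python
import numpy as np

def forward(p, weights, biases, transfer):
    """Propagate the input p through successive layers.
    weights[k] has shape (S_k, S_{k-1}); biases[k] has shape (S_k,)."""
    a = p
    for W, b, f in zip(weights, biases, transfer):
        n = W @ a + b          # net input of the layer
        a = f(n)               # layer output, fed to the next layer
    return a

rng = np.random.default_rng(0)
R, S1, S2, S3 = 7, 14, 14, 31                 # sizes assumed for illustration
weights = [rng.normal(size=(S1, R)),
           rng.normal(size=(S2, S1)),
           rng.normal(size=(S3, S2))]
biases = [np.zeros(S1), np.zeros(S2), np.zeros(S3)]
transfer = [np.tanh, np.tanh, lambda n: n]    # hidden layers: tanh, output: linear
a_out = forward(rng.normal(size=R), weights, biases, transfer)
```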

Figure 8 shows a three-layer network using abbreviated notation. From this figure it can be seen that the network has R1 inputs, S1 neurons in the first layer, S2 neurons in the second layer, etc. The input to layer 2 is a1, its output is a2, and it has an S2×S1 weight matrix W2. Now that all the vectors and matrices of layer 2 have been identified, it can be treated as a single-layer network on its own: layer 2 can be analyzed as a one-layer network with S1 inputs and S2 neurons. This approach can be taken with any layer of the network.

Figure 8. Artificial Neural Network with two hidden layers in abbreviated notation

The arrangement of neurons into layers and the connection patterns within and between layers is called the net architecture. [6, 7] According to the absence or presence of feedback connections in a network, two types of architectures are distinguished:

• Feedforward architecture. There are no connections back from the output to the input neurons; the network does not keep a memory of its previous output values or of the activation states of its neurons. The perceptron-like networks are of the feedforward type.
• Feedback architecture. There are connections from output to input neurons; such a network keeps a memory of its previous states, and the next state depends not only on the input signals but also on the previous states of the network. The Hopfield network is of this type.

A backpropagation feedforward neural net is a network with supervised learning which uses a propagation-adaptation cycle of two phases. Once a pattern has been applied to the input of the network as a stimulus, it is propagated from the first layer through the upper layers of the net until an output is generated. The output signal is compared with the desired output and an error signal is calculated for each of the outputs. The output errors are back-propagated from the output layer toward all the neurons of the hidden layer that contribute directly to the output; each neuron of the hidden layer receives only a fraction of the whole error signal, based on its relative contribution to the original output. This process is repeated, layer by layer, until all neurons of the network have received an error signal describing their relative contribution to the total error. Based on this error signal, the synaptic connection weights of each neuron are updated so that the net converges toward a state that allows it to classify correctly all the training patterns.
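The propagation-adaptation cycle described above can be sketched for a single hidden layer as follows; this Python fragment is only an illustrative gradient-descent update (mean-squared-error loss, sigmoid units), not the exact formulation used by the authors.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def backprop_step(p, target, W1, b1, W2, b2, lr=0.1):
    """One propagation-adaptation cycle for a one-hidden-layer net."""
    # Forward phase: propagate the pattern up to the output
    a1 = sigmoid(W1 @ p + b1)
    a2 = sigmoid(W2 @ a1 + b2)
    # Backward phase: output error, then the fraction reaching the hidden layer
    d2 = (a2 - target) * a2 * (1 - a2)        # error signal at the output layer
    d1 = (W2.T @ d2) * a1 * (1 - a1)          # back-propagated error at the hidden layer
    # Weight update proportional to each neuron's contribution to the error
    W2 -= lr * np.outer(d2, a1); b2 -= lr * d2
    W1 -= lr * np.outer(d1, p);  b1 -= lr * d1
    return W1, b1, W2, b2

# Minimal usage with random placeholder shapes (7 inputs, 14 hidden, 31 outputs)
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(14, 7)), np.zeros(14)
W2, b2 = rng.normal(size=(31, 14)), np.zeros(31)
backprop_step(rng.random(7), rng.random(31), W1, b1, W2, b2)
```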

The importance of this process is that, as the net is trained, the neurons of the intermediate layers organize themselves in such a way that they learn to recognize different features of the whole input space. After training, when presented with an arbitrary input pattern that contains noise or is incomplete, the units of the hidden layer will respond with an active output if the new input contains a pattern resembling the feature that the individual neurons learned to recognize during training; conversely, the units of the hidden layers tend to inhibit their output if the input pattern does not contain the feature they were trained to recognize. In this way the backpropagation net tends to develop internal relationships among neurons in order to organize the training data into classes. This tendency can be extrapolated to the hypothesis that all the units of the hidden layer of a backpropagation net become associated somehow with specific characteristics of the input pattern as a consequence of training. Whether or not the association is exact may not be evident to the human observer; the important thing is that the net has found an internal representation that allows it to generate the desired outputs when given the inputs used in training. This same internal representation can be applied to inputs that the net has not seen before, and the net will classify them according to the characteristics they share with the training examples.

The advantages that ANNs offer are numerous and are achievable only by developing an ANN model of high performance. In recent years there has been increasing interest in using ANNs for modeling, optimization and prediction. The problem with neural networks, however, is that a number of parameters have to be set before any training can begin, there are no clear rules on how to set them, and yet these parameters determine the success of the training. Users have to choose the architecture and determine many of the parameters of the selected network. Determining suitable training and architectural parameters of an ANN still remains a difficult task, mainly because it is very hard to know beforehand the size and the structure of the neural network needed to solve a given problem. An ideal structure is one that, independently of the starting weights of the net, always learns the task, makes almost no error on the training set and generalizes well. At present, the usual practice in the selection of design parameters for an ANN is the trial-and-error procedure, in which a large number of ANN models are developed and compared with one another.

Figure 9. Trial-and-error procedure in the selection of ANN parameters

As can be appreciated in figure 9, in this procedure one parameter is evaluated while the others are kept at a single level; the observed responses are examined in each phase to determine the best level of each design parameter, then a different design parameter is varied, and the experiment is repeated in a series of runs.

The serious inconvenience of this method is that, even if changing the level of a design parameter has no effect on the performance of the net, the level selected as best for a particular design variable might not be the best at the end of the experimentation, since the other parameters could have changed. Moreover, this method cannot evaluate interactions among parameters, since it combines them only one at a time, which could lead to an impoverished overall ANN design. All of these limitations have motivated researchers to generate ideas for merging or hybridizing ANNs with other approaches in the search for better performance. A way of overcoming this disadvantage is to evaluate all the possible level combinations of the design parameters; this method is known as full factorial design of experiments. Clearly, this method is very expensive and consumes a lot of time, since the number of combinations can be very large even for a small number of parameters and levels.

1.2. Taguchi philosophy of robust design

Design of experiments involving multiple factors was first proposed by R. A. Fisher in the 1920s to determine the effect of multiple factors on the outcome of agricultural trials (Ranjit 1990). A full factorial design identifies all possible combinations for a given set of factors; since most experiments involve a significant number of factors, a full factorial design may involve a large number of experiments. The number of experiments to be carried out can be decreased by making use of the fractional factorial method.

Dr. Genichi Taguchi is considered the author of the robust design of parameters. [8, 14–21] This is an engineering method for the design of products or processes focused on diminishing the variation and/or the sensitivity to noise. In the robust design of parameters, the primary objective is to find the selection of the factors that decreases the variation of the response, while the process is adjusted to the target. Factors are the different variables which determine the functionality or performance of a product or system, for example design parameters that influence the performance, or inputs that can be controlled. The Taguchi technique is a methodology for finding the optimum setting of the control factors so as to make the product or process insensitive to the noise factors. When used appropriately, Taguchi design provides a powerful and efficient method for the design of products that operate consistently and optimally over a variety of conditions. The Taguchi-based optimization technique has produced a unique and powerful optimization discipline that differs from traditional practices; the distinct idea of Taguchi's robust design, which differs from conventional experimental design, is the simultaneous modeling of both mean and variability in the design. [16, 20, 21]

Taguchi methodology is based on the concept of fractional factorial design. Taguchi's approach allows the entire parameter space to be studied with a small number of experiments by using Orthogonal Arrays (OAs) and fractional factorials instead of a full factorial design. An OA is a small fraction of a full factorial design and assures a balanced comparison of the levels of any factor or interaction of factors. The columns of an OA represent the experimental parameters to be optimized and the rows represent the individual trials (combinations of levels).

The array is called orthogonal because, for every pair of parameters, all combinations of parameter levels occur an equal number of times, which means the design is balanced so that factor levels are weighted equally. Experimental design using OAs is attractive because of its experimental efficiency: much information is obtained from a few trials, and the real power of an OA is the ability to evaluate several factors in a minimum number of tests.

In Taguchi methods there are variables that are under control and variables that are not; these are called design and noise factors, respectively. The design factors, as controlled by the designer, can be divided into: (1) signal factors, which influence the average of the quality response, and (2) control factors, which influence the variation of the quality response. The noise factors, which can also influence the product and its operational process, are uncontrollable, such as manufacturing variation, environmental variation and deterioration. Taguchi's robust design involves using an OA to arrange the experiment and selecting the levels of the design factors so as to minimize the effects of the noise factors. That is, the settings of the design factors for a product or a process should be determined so that the product's response has minimum variation and its mean is close to the desired target.

Robust design problems can be divided into two classes, static and dynamic characteristics. The static problem attempts to obtain the value of a quality characteristic of interest as close as possible to a single specified target value. The dynamic problem, on the other hand, involves situations where the system's performance depends on a signal factor.

Taguchi also proposed a two-phase procedure to determine the factor-level combination. First, the control factors that are significant for reducing variability are determined and their settings are chosen; the objective of the second phase is to adjust the responses to the desired values, so the control factors that are significant in affecting the sensitivity are identified and their appropriate levels are chosen. The mean and the variance of the response at each parameter setting of the OA are combined into a single performance measure known as the signal-to-noise (S/N) ratio.

The Taguchi method is applied in four steps:

1. Brainstorm the quality characteristics and design parameters important to the product/process. Before designing an experiment, knowledge of the product/process under investigation is of prime importance for identifying the factors likely to influence the outcome. The aim of the analysis is primarily to seek answers to the following three questions: (a) What is the optimum condition? (b) Which factors contribute to the results, and by how much? (c) What will be the expected result at the optimum condition?

2. Design and conduct the experiments. To design an experiment, the most suitable OA is selected, factors are assigned to the appropriate columns, and the combinations of the individual experiments (called the trial conditions) are described.

3. Analyze the results to determine the optimum conditions. The S/N ratio is a quality indicator by which the experimenters can evaluate the effect of changing a particular experimental parameter on the performance of the process or product. Taguchi recommended direct minimization of the expected loss, which is derived from the quality loss function, and used the S/N ratio to evaluate the variation of the system's performance. For the static characteristic, Taguchi classified the S/N ratios into three types, each with its own expression for the STB and LTB cases and for the NTB case: (a) Smaller-the-Better (STB), (b) Larger-the-Better (LTB) and (c) Nominal-the-Best (NTB). For the dynamic characteristics the S/N ratio

SN_i = 10 log10( β_i² / MSE_i )    (3)

is used, where β denotes the sensitivity and the mean square error (MSE) represents the mean square of the distance between the measured response and the best fitted line. Taguchi developed a two-phase optimization procedure to obtain the optimal factor combinations. The two major goals of parameter design are to minimize the process or product variation and to design robust and flexible processes or products that are adaptable to environmental conditions.

4. Run a confirmatory test using the optimum conditions. In this stage, the value of the robustness measure is predicted at the optimal design condition; a confirmation experiment at the optimal design condition is conducted, calculating the robustness measure for the performance characteristic and checking if it is close to the predicted value.

Taguchi methodology is useful for finding the optimum setting of the control factors that makes the product or process insensitive to the noise factors. [25–31]
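For reference, the three static S/N cases and the dynamic case of equation (3) can be written as in the Python sketch below; these are the standard textbook formulations, offered only as an illustration and not necessarily the exact expressions printed in the chapter.

```python
import numpy as np

def sn_smaller_the_better(y):
    """STB: the quality characteristic should be as small as possible."""
    return -10 * np.log10(np.mean(np.square(y)))

def sn_larger_the_better(y):
    """LTB: the quality characteristic should be as large as possible."""
    return -10 * np.log10(np.mean(1.0 / np.square(y)))

def sn_nominal_the_best(y):
    """NTB: the quality characteristic should stay close to a target value."""
    return 10 * np.log10(np.mean(y) ** 2 / np.var(y, ddof=1))

def sn_dynamic(beta, mse):
    """Dynamic case, equation (3): sensitivity squared over mean square error."""
    return 10 * np.log10(beta ** 2 / mse)

y = np.array([2.8e-4, 3.1e-4, 2.6e-4, 3.4e-4])   # example responses (placeholder values)
print(sn_smaller_the_better(y))
```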

1.3. Neutron spectrometry with ANNs

Today, ANN technology has been applied with relative success in the research area of nuclear sciences, [3] mainly in the neutron spectrometry and dosimetry domains. ANNs can be trained to solve problems that are difficult for conventional computers or human beings, and have been trained to perform complex functions in various fields, including pattern recognition, identification, classification, speech, vision and control systems. [32–34]

The term radiation spectrometry can be used to describe the measurement of the intensity of a radiation field with respect to energy, frequency or momentum. [35] The distribution of the intensity with one of these parameters is commonly referred to as the "spectrum". [36] A second quantity is the variation of the intensity of these radiations as a function of the angle of incidence on a body situated in the radiation field, which is referred to as the "dose". [37]

Neutrons are found in the environment or are artificially produced in different ways, and they occur in a broad variety of energy distributions: these neutrons have a wide energy range extending from a few thousandths of eV to several hundreds of MeV. [38] The neutron spectra and the dose are of great importance in radiation protection physics. Determination of the neutron dose received by those exposed in workplaces or in accidents at nuclear facilities generally requires knowledge of the neutron energy spectrum incident on the body. [39] However, the unknown neutron spectrum is not given directly as a result of the measurements: spectral information must generally be obtained from passive detectors which respond to different ranges of neutron energies, such as activation foils, pairs of thermoluminescent dosimeters, track detectors, or the multisphere Bonner system, also known as the Bonner spheres system (BSS). [40–42]

The BSS consists of a thermal neutron detector, such as 6LiI(Eu), which is placed at the centre of a number of moderating polyethylene spheres of different diameter to obtain, through an unfolding process, the neutron energy distribution, named the neutron-fluence spectrum or simply the neutron spectrum, Φ_E(E). [43, 44] The BSS has been used to unfold neutron spectra mainly because it has an almost isotropic response, can cover the energy range from thermal to GeV neutrons, and is easy to operate. However, the weight, the time-consuming procedure, the need to use an unfolding procedure and the low resolution of the spectrum are some of the BSS drawbacks. [42, 45]

Figure 10. Bonner spheres system for a 6LiI(Eu) neutron detector

The derivation of the spectral information is not simple. [46] If a sphere d has a response function R_d(E) and is exposed in a neutron field with spectral fluence Φ_E(E), the sphere reading M_d is obtained by folding R_d(E) with Φ_E(E):

M_d = ∫ R_d(E) Φ_E(E) dE    (4)

This folding process takes place in the sphere itself during the measurement. As can be seen from figure 10, deriving the spectral information from the readings means solving the Fredholm integral equation of the first kind shown in equation 4.

Although the real Φ_E(E) and R_d(E) are continuous functions of the neutron energy, they cannot be described by analytical functions; as a consequence, a discretised numerical form is used. To unfold the neutron spectrum, Φ_E(E), this means solving the system shown in the following equation:

C_j = ∑_{i=1}^{N} R_{i,j} Φ_i ,   j = 1, 2, ..., m    (5)

where C_j is the jth detector's count rate, R_{i,j} is the jth detector's response to neutrons at the ith energy interval, Φ_i is the neutron fluence within the ith energy interval and m is the number of spheres utilized. Once the neutron spectrum, Φ, has been obtained, the dose △ can be calculated using the fluence-to-dose conversion coefficients δ_ΦE, as shown in equation 6:

△ = ∫_{Emin}^{Emax} δ_ΦE Φ_E(E) dE    (6)

Equation 5 is an ill-conditioned equation system with an infinite number of solutions, which has motivated researchers to propose new and complementary approaches. Several methods are used to unfold the neutron spectrum; [43, 44, 47] ANN technology is a useful alternative to solve this problem, [25–31] however, several drawbacks must be overcome in order to simplify the use of these procedures.

Besides the many advantages that ANNs offer, there are some drawbacks and limitations related to the ANN design process, particularly related to architecture and training parameters. Even though the BP learning algorithm provides a method for training multilayer feedforward neural nets, it is not free of problems. Many factors affect the performance of the learning and should be treated in order to have a successful learning process; those factors include the synaptic weight initialization, the learning rate, the momentum, the size of the net and the learning database. A good choice of these parameters can speed up and greatly improve the learning process, although a universal answer does not exist for such topics. [8, 14–21] In order to develop an ANN which generalizes well and is robust, a number of issues must be taken into consideration. Choosing the ANN architecture, followed by the selection of the training algorithm and its related parameters, is rather a matter of the designer's past experience, since there are no practical rules which can be generally applied. The trial-and-error technique is the usual way to obtain a better combination of these values; this is usually a very time-consuming procedure in which a number of ANNs are designed and compared with one another. This method cannot identify interactions between the parameters and does not use systematic methodologies for the identification of the "best" values, consuming much time without systematically targeting a near-optimal solution, which may lead to a poor overall neural network design. It is unrealistic to analyze the effects of all combinations of ANN parameters and parameter levels on the ANN performance; above all, the design of an optimal ANN is not guaranteed.
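As an illustration of the discretised folding in equation (5), together with a rectangle-rule version of equation (6), the following Python sketch computes simulated count rates and a dose from a response matrix and a spectrum; the array shapes and values are placeholders, not the UTA4 data.

```python
import numpy as np

def fold_spectrum(R, phi):
    """Equation (5): count rate of each sphere, C_j = sum_i R_ij * phi_i.
    R   : response matrix, shape (N energy bins, m spheres)
    phi : neutron fluence per energy bin, shape (N,)"""
    return R.T @ phi                  # shape (m,)

def dose(delta_phiE, phi):
    """Discretised form of equation (6): dose = sum_i delta_i * phi_i."""
    return float(np.dot(delta_phiE, phi))

# Placeholder numbers only: 31 energy bins, 7 spheres
N, m = 31, 7
R = np.random.rand(N, m)              # stands in for the UTA4 response matrix
phi = np.random.rand(N)               # stands in for a neutron spectrum
delta = np.random.rand(N)             # fluence-to-dose conversion coefficients
C = fold_spectrum(R, phi)
D = dose(delta, phi)
```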

To deal economically with the many possible combinations, the Taguchi method can be applied. Taguchi's techniques have been widely used in engineering design and can be applied to many aspects such as optimization, experimental design, sensitivity analysis, parameter estimation, model prediction, etc. This work is concerned with the application of the Taguchi method to the optimization of ANN models. Attention is drawn to the advantages of Taguchi methods, which offer potential benefits in evaluating the network behavior and, at the same time, emphasize the simultaneous optimization of ANN parameters under various noise conditions. Here, we make a comparison between this method and conventional training methods. [48]

2. Robust design of artificial neural networks methodology

Neutron spectrum unfolding is an ill-conditioned system with an infinite number of solutions. [27] Researchers have been using ANNs to unfold neutron spectra from BSS data. In this work, a systematic and experimental strategy called the Robust Design of Artificial Neural Networks (RDANN) methodology was designed for the robust design of multilayer feedforward neural networks trained with the backpropagation algorithm in the neutron spectrometry field. The Taguchi method offers considerable benefits in time and accuracy when compared with the conventional trial-and-error neural network design approach, and the integration of ANN and Taguchi optimization provides a tool for designing robust network parameters and improving their performance.

One challenge in neutron spectrometry using neural nets is the pre-processing of the information in order to create suitable input-output pairs for the training data sets. [50] The generation of a suitable data set is a non-trivial task: because of the novelty of this technology in this research area, the researcher spends a lot of time on this activity, mainly because all the work is done by hand and a lot of effort is required. From the above, the need for technological tools that automate this process is evident, and work is being carried out in order to alleviate this drawback.

At present, neutron spectrometry by means of ANN technology is done using a neutron spectra data set compiled by the International Atomic Energy Agency (IAEA). [49] This compendium contains a large collection of detector responses and spectra. The original spectra in this report were defined per unit lethargy in 60 energy groups ranging from thermal to 630 MeV. In order to use the response matrix known as UTA4, expressed in 31 energy groups ranging from 10−8 up to 231.2 MeV, the energy range of the neutron spectra was changed through a re-binning process by means of MCNP simulations: 187 neutron spectra from the IAEA compilation, expressed in energy units and in 60 energy bins, were re-binned into the thirty-one energy groups of the UTA4 response matrix for use in the ANN training process. In addition, 13 different equivalent doses were calculated per spectrum by using the International Commission on Radiological Protection (ICRP) fluence-to-dose conversion factors. [50]

Figure 11 shows the classical approach to neutron spectrometry by means of ANN technology, starting from the rate counts measured with the BSS, and figure 12 shows the re-binned neutron spectra data set used for training and testing the optimum ANN architecture designed with the RDANN methodology.

Figure 11. Classical neutron spectrometry with ANN technology

Figure 12. Re-binned neutron spectra data set used to train the optimum ANN architecture designed with RDANN methodology

Multiplying the re-binned neutron spectra by the UTA4 response matrix, the rate counts data set was calculated. The re-binned spectra and equivalent doses are the desired output of the ANN, and the corresponding calculated rate counts are its input data.

The second challenge in neutron spectrometry by means of ANNs is the determination of the net topology. In the ANN design process, the choice of the ANN's basic parameters often determines the success of the training process, yet in practical use the selection of these parameters follows no rules and their values are at best arguable. ANN designers have to choose the architecture and determine many of the parameters through the trial-and-error technique, often spending a large amount of time; this approach consumes much time and does not systematically target a near-optimal selection of suitable parameter values, which produces ANNs with poor performance and low generalization capability. An easier and more efficient way to overcome this disadvantage is to use the RDANN methodology, shown in figure 13, which has become a new approach to solve this problem. RDANN is a very powerful method based on parallel processes, where all the experiments are planned a priori and the results are analyzed after all the experiments are completed. This is a systematic and methodological approach to ANN design.
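Returning to the data-set preparation described above, a minimal Python sketch of how such training pairs could be assembled is given below (assumed arrays and splits, not the actual IAEA/UTA4 files): count rates are obtained by folding each re-binned spectrum with the response matrix, and the resulting pairs are split into training and testing subsets.

```python
import numpy as np

def build_dataset(spectra, R, train_fraction=0.8, seed=0):
    """spectra: (n_spectra, 31) re-binned spectra; R: (31, 7) response matrix.
    Returns (counts, spectra) pairs split into training and testing sets."""
    counts = spectra @ R                      # ANN inputs: 7 simulated count rates
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(spectra))       # random selection of the two sets
    n_train = int(train_fraction * len(spectra))
    tr, ts = idx[:n_train], idx[n_train:]
    return (counts[tr], spectra[tr]), (counts[ts], spectra[ts])

# Placeholder data: 187 spectra, 31 energy bins, 7 spheres
spectra = np.random.rand(187, 31)
R = np.random.rand(31, 7)
(train_x, train_y), (test_x, test_y) = build_dataset(spectra, R)
```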

The main objective of the proposed methodology is to develop accurate and robust ANN models which maximize the ANN performance and generalization capacity.

Figure 13. Robust design of artificial neural networks methodology

In the selected model, the designer must recognize the application problem well and choose a suitable ANN model. From figure 13 it can be seen that, in ANN design using the Taguchi philosophy within the RDANN methodology, the goal is to select the ANN training and architectural parameters, i.e., the design parameters or factors to be optimized, so that the ANN model yields the best performance (planning stage). Using OAs, the training of ANNs with different net topologies can be executed in a systematic way (experimentation stage). From the simulation results, the response can be analyzed by using the S/N ratio of the Taguchi method (analysis stage). Finally, a confirmation experiment at the optimal design condition is conducted, calculating the robustness measure for the performance characteristic and checking whether it is close to the predicted value (confirmation stage).

To provide scientific discipline to this work, this systematic and methodological approach, the RDANN methodology, was used to obtain the optimum architectural and learning values of an ANN capable of solving the neutron spectrometry problem. According to figure 13, the steps followed to obtain the optimum design of the ANN are described below.

1. Planning stage

In this stage it is necessary to identify the objective function and the design and noise variables.

(a) The objective function. The objective function must be defined according to the purpose and requirements of the problem. In this research, the objective function is the prediction error between the target values and the output values of the BP ANN at the testing stage.


The performance, or mean square error (MSE), at the output of the ANN is used as the objective function, as shown in the following equation:

MSE = (1/N) ∑_{i=1}^{N} ( Φ_E(E)_i^ANN − Φ_E(E)_i^ORIGINAL )²    (7)

where N is the number of trials, Φ_E(E)_i^ORIGINAL is the original spectrum and Φ_E(E)_i^ANN is the spectrum unfolded with the ANN.

(b) Design and noise variables. Based on the requirements of the physical problem, users can choose some factors as design variables, which can be varied during the optimization iteration process, and other factors as fixed constants. Among the various parameters that affect the ANN performance, four design variables were selected, as shown in table 1:

Design Var.   Level 1   Level 2   Level 3
A             L1        L2        L3
B             L1        L2        L3
C             L1        L2        L3
D             L1        L2        L3

Table 1. Design variables and their levels

where A is the number of neurons in the first hidden layer, B is the number of neurons in the second hidden layer, C is the momentum and D is the learning rate.

The noise variables are shown in table 2. These variables in most cases are not controlled by the user. The initial set of weights, U, is usually randomly selected. For the training and testing data sets, V, the designer must decide how much of the whole data should be allocated to the training and the testing sets; once V is determined, the designer must decide which data of the whole data set to include in the training and testing sets, W.

Design Var.   Level 1        Level 2
U             Set 1          Set 2
V             6:4            8:2
W             Tr-1/Tst-1     Tr-2/Tst-2

Table 2. Noise variables and their levels

where U is the initial set of random weights, V is the size of the training set versus the size of the testing set, i.e., V = 60%/40% or 80%/20%, and W is the selection of the training and testing sets, i.e., W = Training1/Test1 or Training2/Test2. In practice, these variables are randomly determined and are not controlled by the designer. Because of the random nature of these selection processes, the ANN designer must create these data sets starting from the whole data set; this procedure is very time consuming when it is done by hand, without the help of technological tools. The RDANN methodology was designed in order to fully automate, in a computer program developed under the Matlab environment and shown in figure 14, the creation of the noise

100

18 Artificial Neural Networks

Artificial Neural Networks – Architectures and Applications

variables and their levels. This work is done before the training of the several net topologies tested at the experimentation stage. Besides the automatic generation of the noise variables, other programming routines were created in order to train the different net architectures and to statistically analyze and graph the obtained data. When this procedure is done by hand, it is very time consuming; the use of the designed computer tool saves the ANN designer a lot of time and effort.
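A simplified Python sketch of how the four noise conditions (the L4 combinations of U, V and W) could be generated automatically is given below; the routine names and data are illustrative assumptions, not the authors' Matlab code.

```python
import numpy as np

# Four noise conditions of an L4 array over U (weight set), V (split), W (selection)
L4 = [(1, 0.6, 1), (1, 0.8, 2), (2, 0.6, 2), (2, 0.8, 1)]   # assumed assignment

def noise_condition(data_x, data_y, weight_seed, train_fraction, selection_seed):
    """Return a weight-initialization seed and a train/test partition for one condition."""
    rng = np.random.default_rng(selection_seed)
    idx = rng.permutation(len(data_x))
    n_train = int(train_fraction * len(data_x))
    split = dict(train=(data_x[idx[:n_train]], data_y[idx[:n_train]]),
                 test=(data_x[idx[n_train:]], data_y[idx[n_train:]]))
    return weight_seed, split

x, y = np.random.rand(187, 7), np.random.rand(187, 31)   # placeholder data set
conditions = [noise_condition(x, y, u, v, w) for (u, v, w) in L4]
```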

Figure 14. Robust Design of Artificial Neural Networks Methodology

After the factors and levels are determined, a suitable OA can be selected for the training process. The Taguchi OAs are denoted by L_r(s^c), where r is the number of rows, c is the number of columns and s is the number of levels in each column.

2. Experimentation stage

The choice of a suitable OA is critical for the success of this stage. OAs allow the main and interaction effects to be computed via a minimum number of experimental trials. In this research, the columns of the OA represent the experimental parameters to be optimized and the rows represent the individual trials, i.e., combinations of levels. For a robust experimental design, Taguchi suggests using two crossed OAs, here with an L9(3^4) and L4(2^3) configuration, as shown in table 3. From table 3 it can be seen that each design variable is assigned to a column of the design OA; each row of the design OA then represents a specific design of the ANN. Similarly, each noise variable is assigned to a column of the noise OA, and each row corresponds to a noise condition.

3. Analysis stage

The S/N ratio is a measure of both the location and the dispersion of the measured responses. It transforms the raw data to allow a quantitative evaluation of the design parameters, considering their mean and variation. It is measured in decibels using the formula:

S/N = −10 log10(MSD)    (8)

where MSD is a measure of the mean square deviation in performance. Since in every design more signal and less noise is desired, the best design will have the highest S/N ratio. In this stage, the statistical program JMP was used to select the best values of the ANN being designed.

Robust Design of Artificial Neural Networks Methodology in Neutron Spectrometry 19

Robust Design of Artificial Neural Networks Methodology in Neutron Spectrometry http://dx.doi.org/10.5772/51274

101

Trial No.   A   B   C   D   S1   S2   S3   S4   Mean   S/N
1           1   1   1   1
2           1   2   2   2
3           1   3   3   3
4           2   1   2   3
5           2   2   3   1
6           2   3   1   2
7           3   1   3   2
8           3   2   1   3
9           3   3   2   1

Table 3. ANN measured responses with a crossed OA with an L9(3^4) and L4(2^3) configuration
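The crossed design in table 3 amounts to training 9 × 4 = 36 networks and summarizing each design row by the mean and the S/N ratio of its four responses. A Python sketch of that bookkeeping is given below; the train_and_test function is only a placeholder for the actual ANN training, and the smaller-the-better S/N is assumed because the response is an error.

```python
import numpy as np

# L9(3^4) design array: one row per candidate ANN design (levels of A, B, C, D)
L9 = np.array([[1, 1, 1, 1], [1, 2, 2, 2], [1, 3, 3, 3],
               [2, 1, 2, 3], [2, 2, 3, 1], [2, 3, 1, 2],
               [3, 1, 3, 2], [3, 2, 1, 3], [3, 3, 2, 1]])

def train_and_test(design_levels, noise_condition):
    """Placeholder: train an ANN with these design levels under one noise condition
    and return its test MSE. A random value stands in for the real result."""
    return np.random.uniform(2e-4, 6e-4)

results = []
for design in L9:                                                # 9 candidate designs
    responses = [train_and_test(design, s) for s in range(4)]    # 4 noise conditions (L4)
    mean = np.mean(responses)
    sn = -10 * np.log10(np.mean(np.square(responses)))           # smaller-the-better S/N
    results.append((design, responses, mean, sn))

best = max(results, key=lambda r: r[-1])   # the design with the highest S/N ratio
```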

4. Confirmation stage

In this stage, the value of the robustness measure is predicted at the optimal design condition; a confirmation experiment at that condition is then conducted, calculating the robustness measure for the performance characteristic and checking whether it is close to the predicted value.

3. Results and discussion
The RDANN methodology was applied in nuclear sciences in order to solve the neutron spectrometry problem starting from the count rates of a BSS system with a 6LiI(Eu) thermal neutron detector, 7 polyethylene spheres and the UTA4 response matrix expressed in 31 energy bins. In this work, a feed-forward ANN trained with the BP learning algorithm was designed. For ANN training, the "trainscg" training algorithm and mse = 1E−4 were selected. In the RDANN methodology, a crossed OA with L9(3^4) and L4(2^3) configuration, corresponding to the design and noise variables respectively, was used. The optimal net architecture was designed in a short time and has high performance and generalization capability. The results obtained after applying the RDANN methodology are the following.

3.1. Planning stage
Tables 4 and 5 show the design and noise variables selected and their levels.

Design Var.   Level 1   Level 2   Level 3
A             14        28        56
B             0         28        56
C             0.001     0.1       0.3
D             0.1       0.3       0.5

Table 4. Design variables and their levels


where A is the number of neurons in the first hidden layer, B is the number of neurons in the second hidden layer, C is the momentum and D is the learning rate.

Design Var.   Level 1        Level 2
U             Set 1          Set 2
V             6:4            8:2
W             Tr-1/Tst-1     Tr-2/Tst-2

Table 5. Noise variables and their levels

where U is the initial set of random weights, V is the size of training set versus size of testing set, i.e., V = 60% / 40%, 80% / 20% and W is the selection of training and testing sets, i.e., W = Training1/Test1, Training2/Test2.

3.2. Experimentation stage
In this stage, by using a crossed OA with an L9(3^4), L4(2^3) configuration, 36 different ANN architectures were trained and tested, as shown in table 6.

Trial No.   Resp-1      Resp-2      Resp-3      Resp-4      Mean
1           3.316E-04   2.416E-04   2.350E-04   3.035E-04   2.779E-04
2           2.213E-04   3.087E-04   3.646E-04   2.630E-04   2.894E-04
3           4.193E-04   3.658E-04   3.411E-04   2.868E-04   3.533E-04
4           2.585E-04   2.278E-04   2.695E-04   3.741E-04   2.825E-04
5           2.678E-04   3.692E-04   3.087E-04   3.988E-04   3.361E-04
6           2.713E-04   2.793E-04   2.041E-04   3.970E-04   2.879E-04
7           2.247E-04   7.109E-04   3.723E-04   2.733E-04   3.953E-04
8           3.952E-04   5.944E-04   2.657E-04   3.522E-04   4.019E-04
9           5.425E-04   3.893E-04   3.374E-04   4.437E-04   4.282E-04

Table 6. ANN measured responses with a crossed OA with L9, L4 configuration

The signal-to-noise ratio was analyzed by means of Analysis of Variance (ANOVA) using the statistical program JMP. Since an error of 1E−4 was established for the objective function, from table 6 it can be seen that all the ANN performances reach this value, which means that this particular OA has a good performance.
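The level-selection step behind this analysis can be illustrated with a few lines of Python: for each design factor, the S/N ratios of the L9 rows are averaged per level, and the level with the highest average is retained. This is a simplified stand-in for the ANOVA carried out in JMP, and the S/N values shown are placeholders.

```python
import numpy as np

# L9 design array and one S/N value per row (placeholder numbers)
L9 = np.array([[1, 1, 1, 1], [1, 2, 2, 2], [1, 3, 3, 3],
               [2, 1, 2, 3], [2, 2, 3, 1], [2, 3, 1, 2],
               [3, 1, 3, 2], [3, 2, 1, 3], [3, 3, 2, 1]])
sn = np.array([70.1, 70.6, 68.9, 70.8, 69.4, 70.7, 68.2, 67.8, 67.3])

best_levels = {}
for j, factor in enumerate("ABCD"):
    # average S/N of the rows where this factor is at level 1, 2 and 3
    level_means = [sn[L9[:, j] == lvl].mean() for lvl in (1, 2, 3)]
    best_levels[factor] = int(np.argmax(level_means)) + 1   # highest S/N wins

print(best_levels)   # chosen level of each design factor
```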

3.3. Analysis stage
The signal-to-noise ratio is used in this stage to determine the optimum ANN architecture. The best ANN design parameters are shown in table 7.

Trial No.   A    B    C1      C2      C3      D
1           14   0    0.001   0.001   0.001   0.1
2           14   0    0.001   0.1     0.3     0.1
3           56   56   0.001   0.1     0.1     0.1

Table 7. Best values used in the confirmation stage to design the ANN


3.4. Confirmation stage
Once the optimum design parameters were determined, the confirmation stage was performed to determine the final optimum values, highlighted in table 7. After the best ANN topology was determined, a final training and testing was carried out to validate the data obtained with the designed ANN. At the final ANN validation, and using the designed computational tool, correlation and chi-square statistical tests were carried out, as shown in figure 15.

(a) Chi-square test   (b) Correlation test

Figure 15. (a) Chi-square test and (b) correlation test at the final ANN validation

From figure 15(a) it can be seen that all neutron spectra pass the chi-square statistical test, which demonstrates that statistically there is no difference between the neutron spectra reconstructed by the designed ANN and the target neutron spectra. Similarly, from figure 15(b) it can be seen that the correlation for the whole data set of neutron spectra is near the optimum value of one, which demonstrates that this is an OA of high quality. Figures 16 and 17 show the best and worst neutron spectra unfolded at the final testing stage of the designed ANN compared with the target neutron spectra, along with the correlation and chi-square tests applied to each spectrum.
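As an illustration of these two validation checks, the following Python sketch computes a correlation coefficient and a chi-square-like statistic between an unfolded spectrum and its target; this is a generic formulation, not necessarily the exact statistic used by the authors.

```python
import numpy as np

def validation_tests(phi_ann, phi_ref, eps=1e-12):
    """Correlation and chi-square-like statistic between unfolded and target spectra."""
    corr = np.corrcoef(phi_ann, phi_ref)[0, 1]                   # ideal value: 1
    chi2 = np.sum((phi_ann - phi_ref) ** 2 / (phi_ref + eps))    # smaller is better
    return corr, chi2

phi_ref = np.random.rand(31)                      # placeholder target spectrum (31 bins)
phi_ann = phi_ref + 0.01 * np.random.randn(31)    # placeholder ANN-unfolded spectrum
corr, chi2 = validation_tests(phi_ann, phi_ref)
```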

(a) Best neutron spectra   (b) Correlation test

Figure 16. RDANN: best neutron spectra and correlation test

(a) Worst neutron spectra   (b) Correlation test

Figure 17. RDANN: worst neutron spectra and correlation test

In ANN design, the use of the RDANN methodology can help provide answers to the following critical design and construction issues:

• What is the proper density of training samples in the input space? The proper density of training samples in the input space was 80% of the data for the ANN training stage and 20% for the testing stage.

• If noise is present in the training data, is it best to reduce the amount of noise or to gather additional data, and what is the effect of noise in the testing data on the performance of the network? The random weight initialization introduces a great amount of noise into the training data; such initialization introduces large negative numbers, which is very harmful for the unfolded neutron spectra. The effect of this noise on the testing data significantly affects the performance of the network: in this case, the noise produced negative values in the unfolded neutron spectra, which have no physical meaning. In consequence, it can be concluded that it is necessary to reduce the noise introduced by the random weight initialization.

• When is the best time to stop training to avoid over-fitting? The best time to stop training to avoid over-fitting is variable and depends on the proper selection of the ANN parameters. In the optimum ANN designed, with a learning rate = 0.1, a momentum = 0.1, mse = 1E−4 and the "trainscg" learning function, the best time to train the network while avoiding over-fitting was 120 seconds on average.

• Which is the best architecture to use? The best architecture to use is 7:14:31, designed with the RDANN methodology.

• Is it better to use a large architecture and stop training optimally, or to use an optimum architecture which probably will not over-fit the data but may require more time to train? It is better to use an optimum architecture, which does not over-fit the data and does not require more training time, instead of using a large architecture and stopping the training by time or trials, which produces a poor ANN.

4. Conclusions

ANNs are a theory that is still in a development process; although researchers have developed powerful learning algorithms of great practical value, the representations and procedures used by the brain are still largely unknown, and the true potential of ANNs has not yet been reached.

In this work, a systematic, methodological and experimental approach called the RDANN methodology was introduced to obtain the optimum design of artificial neural networks. The Taguchi method is the main technique used to simplify the optimization problem. The integration of ANN and optimization provides a tool for designing neural network parameters and improving the network performance. The RDANN methodology was applied with success in nuclear sciences to solve the neutron spectra unfolding problem. The optimal net architecture was designed in a short time and has high performance and generalization capability. The factors found to be significant in the case study were the number of hidden neurons in hidden layers 1 and 2, the learning rate and the momentum term. The near-optimum ANN topology was 7:14:31, with a momentum = 0.1, a learning rate = 0.1, a trainscg training algorithm and an mse = 1E−4.

The proposed systematic and experimental approach is a useful alternative for the robust design of ANNs: it offers a convenient way of simultaneously considering design and noise variables, and it incorporates the concept of robustness in the ANN design process. The results show that the RDANN methodology can be used to find better settings of ANNs, which not only result in minimum error but also significantly reduce training time and effort in the modeling phases.

When compared with the trial-and-error approach, which can spend from several days to months testing different ANN architectures and parameters and may still lead to a poor overall ANN design, the RDANN methodology significantly reduces the time spent in determining the optimum ANN architecture. With RDANN it takes from minutes to a couple of hours to determine the best and robust ANN architectural and learning parameters, allowing researchers more time to solve the problem in question. The computer program developed to implement the experimental and confirmation stages of the RDANN methodology significantly reduces the time required to prepare, process and present the information in an appropriate way to the designer, and to search for the optimal net topology being designed. This gives the researcher time to solve the problem in which he is interested.

The optimum settings of ANN parameters are largely problem-dependent, as the significant factors might be different for ANNs trained for different purposes. Ideally, an optimization process should be performed for each ANN application.

Acknowledgements

This work was partially supported by Fondos Mixtos CONACYT - Gobierno del Estado de Zacatecas (México) under contract ZAC-2011-C01-168387, and by PROMEP under contract PROMEP/103.5/12/3603.

Author details

José Manuel Ortiz-Rodríguez1,⋆, Ma. del Rosario Martínez-Blanco2, José Manuel Cervantes Viramontes1 and Héctor René Vega-Carrillo2

⋆ Address all correspondence to: morvymm@yahoo.com.mx

Universidad Autónoma de Zacatecas, Unidades Académicas: 1-Ingeniería Eléctrica, 2-Estudios Nucleares, México

References

[1] C. R. Alavala. Fuzzy logic and neural networks: basic concepts & applications. New Age International Publishers.

[2] J. Lakhmi and A. M. Fanelli. Recent advances in artificial neural networks design and applications. CRC Press, 2000.

Teaching the taguchi method to industrial engineers. International Journal of Materials & Product Technology. R. Peterson. month 1994. and M. Fausett. [16] G. Clair. Frenie and A. Mohiuddin. 2003. Elements of artificial neural networks. 2007. Lin and C. 2007. Jin. Artificial neural networks: a tutorial. Kishan. Inc. [10] D. C. S. Chen. Mao. and K. Graupe. L. Using taguchi’s method of experimental design to control errors in layered perceptrons. 2006. H. Aylward. L.K.A. 11(3-4):333–344. [6] J. E. Static and dynamic neural networks: from fundamentals to advanced theory. IEEE: Computer. [15] S. Chen. D. Marinaro. month 1995. Y. Apolloni. J. Robust design. journal = "engineering applications of artificial inteligence. and N.M. Zupan. [8] T. Fundamentals of probability and statistics for engineers. [7] A. Gupta.108 26 Artificial Neural Networks Artificial Neural Networks – Architectures and Applications [3] R. Fundamentl of research methodology and statistics. Springer. Neural networks theory. K. [14] J. [9] M. Galushkin. IEEE Transactions on Neural Networks. Aplicaciones de redes neuronales artificiales en la ingeniería nuclear. 1996. St. Neural networks. Introduction to artificial neural network methods: what they are and how to use them. 2009. Y. Principles of artificial neural networks. E. [12] L. 1993. Technical report. Homma. and R. 2005. 2000. MCB University Press. Soong. Chilukuri. John Wiley & Sons. 2000. 29(3):31–44. [18] M. Shyam. Bond. Bassis. Singh. algorithms and applications. Dreyfus. Y. Application of taguchi method in the optimization of laser micro-engraving of photomasks. 13:3–14. [11] M. 2002. XII congreso Español sobre Tecnologías y Lógica Fuzzy. [17] Y. New directions in neural networks. Springer. Zheng.. The MIT Press. [5] B. 2004. . 41(3):327–352. M. Fundamentals of neural networks. New Age International Publishers. 1:485–490. architectures.N. Requena. IOS Press. and W. Acta Chimica Slovenica. [19] T. Jain. World Scientific. Optimum design for artifcial neural networks: an example in a bicycle derailleur system". S. C. 2001. 6(4):949–961. Correa and I. 2004. H. and H. methodology and applications. Jiju. Tseng.T. Tam. Prentice Hall. [4] G. W. [13] A. K.I. month 1996. Sanjay. 50(4):141–149.





Section 2

Applications


Chapter 5

Comparison Between an Artificial Neural Network and Logistic Regression in Predicting Long Term Kidney Transplantation Outcome

Giovanni Caocci, Roberto Baccoli, Roberto Littera, Sandro Orrù, Carlo Carcassi and Giorgio La Nasa

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/53104

1. Introduction

Predicting clinical outcome following a specific treatment is a challenge that sees physicians and researchers alike sharing the dream of a crystal ball to read into the future. In medicine, several tools have been developed for the prediction of outcomes following drug treatment and other medical interventions. The standard approach for a binary outcome is to use logistic regression (LR) [1,2], but over the past few years artificial neural networks (ANNs) have become an increasingly popular alternative to LR analysis for prognostic and diagnostic classification in clinical medicine [3].

The growing interest in ANNs has mainly been triggered by their ability to mimic the learning processes of the human brain. The network operates in a feed-forward mode from the input layer through the hidden layers to the output layer. Each layer within the network is made up of computing nodes with remarkable data processing abilities. Each node is connected to other nodes of a previous layer through adaptable inter-neuron connection strengths known as synaptic weights. Exactly what interactions are modeled in the hidden layers is still under study. ANNs are trained for specific applications through a learning process and knowledge is usually retained as a set of connection weights [4]. The backpropagation algorithm and its variants are learning algorithms that are widely used in neural networks. With backpropagation, the input data is repeatedly presented to the network. Each time, the output is compared to the desired output and an error is computed. The error is then fed back through the network and used to adjust the weights in such a way that with each iteration it gradually declines until the neural model produces the desired output.

ANNs have been successfully applied in the fields of mathematics, medicine, engineering, economics, meteorology, psychology, neurology and many others, although their role remains advisory, since no convincing evidence of any real progress in clinical prognosis has yet been produced [5]. Indeed, in medicine they offer a tantalizing alternative to multivariate analysis. In the field of nephrology, there are very few reports on the use of ANNs [6-10], most of which describe their ability to individuate predictive factors of technique survival in peritoneal dialysis patients as well as their application to prescription and monitoring of hemodialysis therapy, analysis of factors influencing therapeutic efficacy in idiopathic membranous nephropathy, prediction of survival after radical cystectomy for invasive bladder carcinoma and individual risk for progression to end-stage renal failure in chronic nephropathies. This all led up to the intriguing challenge of discovering whether ANNs were capable of predicting the outcome of kidney transplantation after analyzing a series of clinical and immunogenetic variables.

Figure 1. The prediction of kidney allograft outcome, a dream about to come true?

2. The complex setting of kidney transplantation

Predicting the outcome of kidney transplantation is important in optimizing transplantation parameters and modifying factors related to the recipient, donor and transplant procedure [8]. The biggest obstacles to be overcome in organ transplantation are the risks of acute and chronic immunologic rejection, especially when they entail loss of graft function despite adjustment of immunosuppressive therapy. Acute renal allograft rejection requires a rapid increase in immunosuppression, but unfortunately, diagnosis in the early stages is often difficult [11]. Blood tests may reveal an increase in serum creatinine, which, however, cannot be considered a specific sign of acute rejection since there are several causes of impaired renal function that can lead to creatinine increase, including excessive levels of some immunosuppressive drugs. Also during ischemic damage, serum creatinine levels are elevated and so provide no indication of rejection. Moreover, because the histological changes of acute rejection develop gradually, the diagnosis can be extremely difficult or late [12]. Alternative approaches to the diagnosis of rejection are fine needle aspiration and urine cytology, but the main approach remains histological assessment of needle biopsy. Although allograft biopsy is considered the gold standard, pathologists working in centres where this approach is used early in the investigation of graft dysfunction are often faced with a certain degree of uncertainty about the diagnosis [10]. In the past, the Banff classification of renal transplant pathology provided a rational basis for grading of the severity of a variety of histological features, including acute rejection. However, the reproducibility of this system has been questioned [13]. What we need is a simple prognostic tool capable of analyzing the most relevant predictive variables of rejection in the setting of kidney transplantation.

3. The role of HLA-G in kidney transplantation outcome

Human Leukocyte Antigen G (HLA-G) represents a "non classic" HLA class I molecule, highly expressed in trophoblast cells. [14] HLA-G plays a key role in embryo implantation and pregnancy by contributing to maternal immune tolerance of the fetus and, more specifically, by protecting trophoblast cells from maternal natural killer (NK) cells through interaction with their inhibitory KIR receptors. It has also been shown that HLA-G expression by tumoral cells can contribute to an "escape" mechanism, inducing NK tolerance toward cancer cells in ovarian and breast carcinomas, melanoma, acute myeloid leukemia, acute lymphoblastic leukemia and B-cell chronic lymphocytic leukemia. [15] Additionally, it would seem that HLA-G molecules have a role in graft tolerance following hematopoietic stem cell transplantation. These molecules exert their immunotolerogenic function towards the main effector cells involved in graft rejection through inhibition of NK and cytotoxic T lymphocyte (CTL)-mediated cytolysis and CD4+ T-cell alloproliferation. [16] The HLA-G transcript generates 7 alternative messenger ribonucleic acids (mRNAs) that encode 4 membrane-bound (HLA-G1, G2, G3, G4) and 3 soluble protein isoforms (HLA-G5, G6, G7). HLA-G allelic variants are characterized by a 14-basepair (bp) deletion-insertion polymorphism located in the 3'-untranslated region (3'UTR) of HLA-G.

95% CI 6. The patients had a median age of 49 years (range 18-77) and were prevalently males (66. . using the abbreviated Modification of Diet in Renal Dis‐ ease (MDRD) study equation. Kydney transplantation outcome In one cohort. a total of 64 patients (20. The patients were divid‐ ed into 2 groups according to the presence or absence of HLA-G alleles exhibiting the 14-bp insertion polymorphism. 95% CI 1.73m2. Average ±SD cold ischemia time (CIT) was 16 ± 5. Pre-trans‐ plantation serum levels of interleukin-10 (IL-10) were lower in the group of homozy‐ gotes for the 14-bp deletion.1%) for the HLA-G −14-bp polymorphism. This difference between the 2 groups became statistically significant at two years (5. The percentage of patients hyperimmunized against HLA Class I and II antigens (PRA > 50%) was higher in the group of homozygotes for the HLA-G 14-bp deletion.73m2.3) and continued to rise at 3 (10. and thus determine an increment in HLA-G expression. though not always concord‐ ant results [21-22]. 95% CI 7.7 antigens for HLA Class II.7 – 15. The donors had a median age of 47 years (range 15-75). Recent studies of association between the HLA-G +14-bp /−14-bp polymorphism and the outcome of kidney transplantation have provided interesting.6%). At one year after transplantation. Nearly all patients (94.118 Artificial Neural Networks – Architectures and Applications tion polymorphism located in the 3’-untranslated region (3’UTR) of HLA-G.3) and 6 years (11. Therefore.3 ml/min/1.0001. liver or kidney transplant patients is associated with better graft survival [18-20]. the 14-bp polymorphism is involved in the mechanisms controlling post-transcriptional reg‐ ulation of HLA-G molecules A crucial role has been attributed to the ability of these molecules to preserve graft function from the insults caused by recipient alloreactive NK cells and cytotoxic T lymphocytes (CTL).7%) it was their first transplant. The presence of the 14-bp insertion is known to generate an additional splice whereby 92 bases are removed from the 3’UTR [28]. a stronger progressive decline of the estimated GFR.1) after transplantation. Kidney transplant outcome was evaluated by glomerular filtration rate (GFR).4 ml/min/1.6 hours. serum cre‐ atinine and graft function tests.2 -9. was observed in the group of homozygotes for the HLAG 14-bp deletion in comparison with the group of heterozygotes for the 14-bp insertion.73 m2. The average (±SD) number of mismatches was 3 ± 1 antigens for HLA Class I and 1 ± 0.4%) lost graft function. P<0.9%) with either HLA-G +14-bp/+14-bp or HLA-G −14/+14-bp whereas the second group included 104 ho‐ mozygotes (33. The first group included 210 patients (66.9%) had been given a cadaver do‐ nor kidney transplant and for most of them (91. 4. P<0.0001.4 ml/min/1.01.4-14. P<0. [17] This is well supported by the numerous studies demonstrating that high HLA-G plasma concentrations in heart. HLA-G mRNAs having the 92-base deletion are more stable than the complete mRNA forms.

everolimus. ANNs have different architectures. positivity for anti-HLA Class I antibodies >50%. Once the input is propagated to the output neuron. recipients homozygous/heterozygous for the 14-bp insertion (+14-bp/ +14-bp and +14-bp/−14-bp) and homozygous for the 14-bp deletion (−14-bp/−14-bp). By adjusting the strengths of these connections. donor age. recipient age. corticosteroids.org/10. (Table 1) . We used the Neural Network ToolboxTM 6 of the soft‐ ware Matlab® 2008. These input data were then processed in the hidden layer (30 neurons). This type of network requires a desired output in order to learn. [23]. Logistic regression and neural network training We compared the prognostic performance of ANNs versus LR for predicting rejection in a group of 353 patients who underwent kidney transplantation. mycophenolate mophetyl. Finding appropriate architecture needs trial and error method and this is where backpropagation steps in. antithy‐ mocyte globulin (ATG) induction therapy. Each single neuron is connected to the neurons of the previous lay‐ er through adaptable synaptic weights. The 10 test phases utilized 63 patients randomly extracted from the entire cohort and not used in the training phase. patient/donor compatibility: class I (HLA-A. Mean sensitivity (the ability of predicting rejection) and specificity (the ability of predicting no-rejection) of data sets were determined and compared to LR. The difference is treated as the error of the network which is then backpropagated through the layers. The network is trained with historical data so that it can produce the correct output when the output is unknown.doi. which consequently require different types of algo‐ rithms. this neuron compares its activation with the expected training output. ANNs can approximate a function that computes the proper output for a given input pattern. positivity for anti-HLA Class II antibodies >50%.6 (MathWorks.Comparison Between an Artificial Neural Network and Logistic Regression in Predicting Long Term Kidney Transplantation Outcome http://dx. respectively. IL-10 pg/mL. donor gender. and the weights of each layer are adjusted such that with each backpropagation cycle the network gets closer and closer to producing the desired output [4]. from the output to the input layer. inc. we applied the ‘on-line back-propagation’ meth‐ od on 10 data sets of 300 patients previously analyzed by LR. The multilayer perceptron is the most popular network architecture in use today (Figure 2). Until the network is appropriately trained its responses will be random. -B) mismatch (0-4). type of immunosoppressive therapy (rapamy‐ cin. first versus second transplant. class II (HLA-DRB1 mismatch. The input layer of 15 neurons was represented by the 15 previous‐ ly listed clinical and immunogenetic parameters. time of cold ischemia.5772/53104 119 5. cyclosporine. Graft survival was calculated from the date of transplantation to the date of irreversible graft failure or graft loss or the date of the last follow up or death with a functioning graft. The output neuron predicted a number between 1 and 0 (goal). representing the event “Kidney rejection yes” [1] or “Kidney rejection no” (0).) to develop a three layer feed forward neural network. The training data set includes a number of cases. The following clinical and immunogenetic parameters were considered: recipient gender. tacrolimus). For the training procedure. each containing values for a range of well-matched input and output variables. version 7.
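The chapter states that the network was a three-layer feed-forward MLP (15 input neurons, 30 hidden neurons, 1 output neuron) trained with on-line backpropagation in the MATLAB Neural Network Toolbox; that code is not given in the text. The following is only an illustrative Python/NumPy sketch of such a network and of the sensitivity/specificity computation used for the comparison with logistic regression. The function names, the sigmoid activations, the learning rate and the epoch count are assumptions of this example, not the authors' settings.

```python
import numpy as np

def train_mlp(X, y, hidden=30, lr=0.1, epochs=500, seed=0):
    """On-line backpropagation for a 15-30-1 feed-forward network.
    X: (n_samples, 15) inputs, y: (n_samples,) targets in {0, 1}."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.1, size=(X.shape[1], hidden))
    W2 = rng.normal(scale=0.1, size=(hidden, 1))
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    for _ in range(epochs):
        for x, t in zip(X, y):
            h = sig(x @ W1)                   # hidden layer activations
            o = sig(h @ W2)[0]                # network output in (0, 1)
            delta_o = (o - t) * o * (1 - o)   # output error term
            delta_h = delta_o * W2[:, 0] * h * (1 - h)
            W2[:, 0] -= lr * delta_o * h      # per-sample weight corrections
            W1 -= lr * np.outer(x, delta_h)
    return W1, W2

def sensitivity_specificity(y_true, y_pred):
    """Sensitivity: predicted rejections among observed rejections;
    specificity: predicted no-rejections among observed no-rejections."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    sens = np.mean(y_pred[y_true == 1] == 1)
    spec = np.mean(y_pred[y_true == 0] == 0)
    return sens, spec
```

In such a setup each of the 10 training sets of 300 patients would be passed to train_mlp, and the 63 held-out patients of the corresponding test extraction would be classified by thresholding the network output at 0.5 before computing sensitivity and specificity.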

Sensitivity and specificity of Logistic Regression and an Artificial Neural Network in the prediction of Kidney rejection in 10 training and validating datasets of kidney transplant recipients .120 Artificial Neural Networks – Architectures and Applications Rejection No Extraction_1 Test N=63 Yes No Extraction_2 Test N=63 Yes No Extraction_3 Test N=63 Yes No Extraction_4 Test N=63 Yes No Extraction_5 Test N=63 Yes No Extraction_6 Test N=63 Yes No Extraction_7 Test N=63 Yes No Extraction_8 Test N=63 Yes No Extraction_9 Test N=63 Yes No Extraction_10 Test N=63 Yes Specificity % (mean) Sensitivity % (mean) No Rejection YES Rejection Observed Cases 55 8 55 8 55 8 55 8 7 8 55 8 55 8 55 8 55 8 55 8 LR Expected cases (%) 40 (73) 2 (25) 38 (69) 3 (38) 30 (55) 3 (38) 40 (73) 3 (38) 40 (73) 4 (50) 30 (55) 4 (50) 40 (73) 3 (38) 38 (69) 4 (50) 44 (80) 2 (25) 32 (58) 2 (25) 68% 38% ANN Expected cases (%) 48 (87) 4 (50) 48 (87) 4 (50) 48(87) 5 (63) 49 (89) 5 (63) 46 (84) 6 (75) 34 (62) 6 (75) 47 (85) 5 (63) 46 (84) 5 (63) 51 (93) 4 (50) 52 (95) 5 (63) 85% 62% Table 1.

Figure 2. Structure of a three-layered ANN

6. Results and perspectives

ANNs can be considered a useful supportive tool in the prediction of kidney rejection following transplantation. The decision to perform analyses in this particular clinical setting was motivated by the importance of optimizing transplantation parameters and modifying factors related to the recipient, donor and transplant procedure. Another motivation was the need for a simple prognostic tool capable of analyzing the relatively large number of immunogenetic and other variables that have been shown to influence the outcome of transplantation.

When comparing the prognostic performance of LR to ANN, the ability of predicting kidney rejection (sensitivity) was 38% for LR versus 62% for ANN. The ability of predicting no-rejection (specificity) was 68% for LR compared to 85% for ANN. The advantage of ANNs over LR can theoretically be explained by their ability to evaluate complex nonlinear relations among variables. By contrast, ANNs have been faulted for being unable to assess the relative importance of the single variables, while LR determines a relative risk for each variable. In many ways, these two approaches are complementary and their combined use should considerably improve the clinical decision-making process and prognosis of kidney transplantation.

Acknowledgement

We wish to thank Anna Maria Koopmans (affiliations 1,3) for her precious assistance in preparing the manuscript.

Author details

Giovanni Caocci1, Roberto Baccoli2, Roberto Littera3, Sandro Orrù3, Carlo Carcassi3 and Giorgio La Nasa1

1 Division of Hematology and Hematopoietic Stem Cell Transplantation, Department of Internal Medical Sciences, University of Cagliari, Cagliari, Italy

2 Technical Physics Division, Faculty of Engineering, Department of Engineering of the Territory, University of Cagliari, Cagliari, Italy

3 Medical Genetics, Department of Internal Medical Sciences, University of Cagliari, Cagliari, Italy

References

[1] Royston P. A strategy for modelling the effect of a continuous covariate in medicine and epidemiology. Stat Med. 2000;19:541-561.

[2] Harrell FE, Lee KL, Mark DB. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med. 1996;15:361-387.

[3] Schwarzer G, Vach W, Schumacher M. On the misuses of artificial neural networks for prognostic and diagnostic classification in oncology. Stat Med. 2000;19:1831-1847.

57:300-9. Diag. . Lezaic V. Fisher SJ. 1996. Meth‐ ods Inf Med. 18:2655-9. Jeffery JR. Furness PN. Diener HC. König IR. 11:4862-4870 [16] Le Rond S. [6] Simic-Ogrizovic S. Koford JK. Transplant Proc. 248:220-223 [15] Carosella ED. Carosella ED. Direct evidence to support the role of HLA-G in protecting the fetus from maternal uterine natural kill‐ er cytolysis. DeMars R. Radivojevic D. 2001. [14] Kovats S. Djukanovic L. Main EK. Rouas-Freiss N. [7] Brier ME. ASAIO J. Nicholson M. Baird BC. Axelsen RA. Dtubblebine M. A neural network approach to the diagnosis of early acute allograft rejection. International standardization of criteria for the histologic diagnosis of renal allograft rejection: the Banff working classifica‐ tion of kidney transplant pathology. 2006.a comparison of logistic regression and neural networks. J Clin Pathol. Poynton MR. 1999. [17] Rouas-Freiss N. Rouas-Freiss N. Gonçalves RM. Schroeder TJ. Krawice-Radanne I.Comparison Between an Artificial Neural Network and Logistic Regression in Predicting Long Term Kidney Transplantation Outcome http://dx. Nephrol Dial Transplant. 94:11520-5. Levesley J. 176:3266–3276. Advances in the diagnosis of renal transplant rejection.doi. Artificial neural networks in renewable energy systems applica‐ tions: a review. [10] Furness PN. Science. Furuncic D. 57:208-211.org/10. [5] 5. 1998. et al. Ziegler A. Pöppl SJ. Gough J. Pathol. Librach C. Proc Natl Acad Sci U S A. Curr. Predicting three-year kidney graft survival in recipients with systemic lupus erythe‐ matosus. 3:81-90. 2006. Journal of Immunolo‐ gy. 51:108-13. 31:3151 [11] Furness PN. 44:411-22. Favier B. expressed in human trophoblasts. HLA-G. Prediction of delayed renal allograft function using an artificial neural network. [9] Kazi JI. Blagojevic R. Durrbach A.45:536-540. 1999. Nicholson M. Blood 2008. Benediktsson H. 2003. LeMaoult P. 2011. Transplant Proc. Transplantation 1994. Carosella ED. 1990. Henry SF. Moreau P. Menier C. [12] Rush DN. Dausset J. Klein JB. Taub N. A class I antigen. Histological findings in early routine biopsies of stable renal allograft recipients. Aze´ma C. Ray PC. [13] Solez K.17. Weimar C. Kalogirou. Beyond the increas‐ ing complexity of the immunomodulatory HLA-G molecule. Guettier C.5:373–401. Two models for outcome prediction . Evidence to support the role of HLA-G5 in allograft acceptance through induction of immunosuppressive/ regulatory T cells. Diagnosis of early acute renal allograft rejection by evaluation of multiple histological features using a Bayesian belief network. Hurdle JF. Linder R. [8] Tang H. Kidney Int 1993. Renewable and Sustainable Energy Review. 31:368. Kazi J.5772/53104 123 [4] Soteris A. Goldfarb-Rumyantzev AS. 1997. Using ANN in selection of the most important variables in prediction of chronic re‐ nal allograft rejection progression.

Donadi EA. 41:1187-8. Duarte RA. Hagan M. 2008. The MathWorks. Natick. [23] Demuth H. Pisa‐ ni F. Liberatore G. Faggioni LP. HLA-G 14-bp insertion/deletion polymorphism in kidney transplant patients with metabolic complications. [21] Crispim JC. Mizutani K.124 Artificial Neural Networks – Architectures and Applications [18] Lila N. Terasaki PI. Beale M. Karakayali F. J Heart Lung Transplant. Saber LT. 26:421-2. Human leukocyte antigen-G expression after kidney transplantation is associated with a reduced incidence of rejection. 38:571-4. MA. Clemente K. 2008. Sözer O. Soares CP. Chevalier P. Mendes-Júnior CT. 2006. Maccarone D. . [20] Qiu J. Soluble HLA-G expres‐ sion and renal graft acceptance. [22] Piancatelli D. Transplant Proc. Costa R. Human leukocyte antigen-G. Soluble hu‐ man leukocyte antigen-G: a new strategy for monitoring acute and chronic rejections after heart transplantation. Guillemain R. User’s Guide. Carpentier A. Bal D. Wastowski IJ. 2006. Azzarone R. 18:361-7. 2009. Papola F. 6:2152-6. Haberal A. Famulari A. Fabiani JN. a new parameter in the follow-up of liver transplantation. Parzanese I. [19] Baştürk B. Am J Transplant. Carosella ED. Miller J. Amrein C. Transpl Immunol. Inc. Neural Network Toolbox™ 6. Silva JS. Cai J. Haberal M. Emiroğlu R. Transplant Proc. 2007.

Chapter 6

Edge Detection in Biomedical Images Using Self-Organizing Maps

Lucie Gráfová, Jan Mareš, Aleš Procházka and Pavel Konopásek

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/51468

1. Introduction

The application of self-organizing maps (SOMs) to the edge detection in biomedical images is discussed. The edge detection procedure is a critical step in the analysis of biomedical images, enabling for instance the detection of the abnormal structure or the recognition of different types of tissue. The SOM algorithm has been implemented in the MATLAB program suite with various optional parameters enabling the adjustment of the model according to the user's requirements. For easier application of SOM the graphical user interface has been developed. The self-organizing map provides a quick and easy approach for edge detection tasks with satisfying quality of outputs, which has been verified using the high-resolution computed tomography images capturing the expressions of the Granulomatosis with polyangiitis. The obtained results have been discussed with an expert as well.

2. Self-organizing map

2.1. Self-organizing map in edge detection performance

The self-organizing map (SOM) [5, 9] is a widely applied approach for clustering and pattern recognition that can be used in many stages of the image processing, e. g. in color image segmentation [18], generation of a global ordering of spectral vectors [26], image compression [25], texture edge detection [27], document binarisation [4] etc. The edge detection approaches based on SOMs are not extensively used. Nevertheless there are some examples of SOM utilization in edge detection, e. g. edge detection by contours [13] or edge detection performed in combination with a conventional edge detector [24] and methods of image de-noising [8]. In our case, the SOM has been utilized in the edge detection process in order to reduce the image intensity levels.

2.2. Structure of self-organizing map

A SOM has two layers of neurons, see Figure 1. The input layer (size Nx1) represents input data x1, x2, ..., xM (M inputs, each input is N dimensional). The output layer (size KxL), that may have a linear or 2D arrangement, represents clusters in which the input data will be grouped. Each neuron of the input layer is connected with all neurons of the output layer through the weights W (size of the weight matrix is KxLxN).

Figure 1. Structure of SOM

2.3. Training of self-organizing map

A SOM is a neural network with unsupervised type of learning, i. e. no cluster values denoting an a priori grouping of the data instances are provided. The weights adapt during the learning process based on a competition. SOM can be trained in either recursive or batch mode. In recursive mode, the weights of the winning neurons are updated after each submission of an input vector, whereas in batch mode, the weight adjustment for each neuron is made after the entire batch of inputs has been processed, i. e. at the end of an epoch. The learning process is divided in epochs, during which the entire batch of input vectors is processed. The epoch involves the following steps (an illustrative code sketch of one epoch is given below):

1. Consecutive submission of an input data vector to the network.

2. Calculation of the distances between the input vector and the weight vectors of the neurons of the output layer.

3. Selection of the nearest (the most similar) neuron of the output layer to the presented input data vector.

4. An adjustment of the weights, i. e. the nearest (the most similar) neuron of the output layer to the submitted input vector becomes a winner and its weight vector and the weight vectors of its neighbouring neurons are adjusted according to

W = W + \lambda \varphi_s (x_i - W),   (1)

where W is the weight matrix, x_i the submitted input vector, \lambda the learning parameter determining the strength of the learning and \varphi_s the neighbourhood strength parameter determining how the weight adjustment decays with distance from the winner neuron (it depends on s, i. e. the value of the neighbourhood size parameter). All neighbour weight vectors are shifted towards the submitted input vector. The winning neuron update is the most pronounced and the farther away the neighbouring neuron is, the less its weight is updated. This procedure of the weight adjustment produces topology preservation.
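The chapter's SOM is implemented in MATLAB and is not listed in the text; the sketch below is an assumed Python/NumPy equivalent of one recursive-mode epoch implementing Equation (1) with a Gaussian neighbourhood strength. The names som_epoch, grid and sigma are illustrative only.

```python
import numpy as np

def som_epoch(X, W, grid, lam, sigma):
    """One recursive-mode epoch: for every input vector find the winning
    output neuron and update it and its neighbours, W <- W + lam*phi_s*(x - W).

    X     : (M, N) batch of input vectors
    W     : (K*L, N) weight matrix, one row per output neuron
    grid  : (K*L, 2) map coordinates of the output neurons
    lam   : learning parameter (strength of the learning)
    sigma : neighbourhood size parameter s
    """
    for x in X:
        # step 2: distances between the input vector and all weight vectors
        d = np.linalg.norm(W - x, axis=1)            # Euclidean distance, Eq. (6)
        # step 3: the nearest neuron becomes the winner
        winner = np.argmin(d)
        # step 4: neighbourhood strength decaying with map distance from the winner
        map_dist = np.linalg.norm(grid - grid[winner], axis=1)
        phi = np.exp(-(map_dist ** 2) / (2.0 * sigma ** 2))
        W += lam * phi[:, None] * (x - W)            # Eq. (1)
    return W
```

For the 1x2 output layer used later in the chapter, grid would simply be np.array([[0, 0], [0, 1]]).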

The learning process can be divided into two phases: ordering and convergence. In the ordering phase, the topological ordering of the weight vectors is established using reduction of learning rate and neighbourhood size with iterations. In the convergence phase, the SOM is fine tuned with the shrunk neighbourhood and constant learning rate. [23]

2.3.1. Learning Parameter

The learning parameter, corresponding to the strength of the learning, is usually reduced during the learning process. It decays from the initial value to the final value, which can be reached already during the learning process, not only at the end of the learning. The learning parameter should be in the interval ⟨0.01, 1⟩. There are several common forms of the decay function (see Figure 2):

1. No decay
\lambda_t = \lambda_0,   (2)

2. Linear decay
\lambda_t = \lambda_0 (1 - t/T),   (3)

3. Exponential decay
\lambda_t = \lambda_0 e^{-t/\tau},   (4)

4. Gaussian decay
\lambda_t = \lambda_0 e^{-t^2/(2\tau^2)},   (5)

where T is the total number of iterations, and \lambda_0 and \lambda_t are the initial learning rate and that at iteration t, respectively.

2.3.2. Neighbourhood

In the learning process not only the winner but also the neighbouring neurons of the winner neuron learn, i. e. adjust their weights. There are several ways how to define a neighbourhood (some of them are depicted in Figure 3). The initial value of the neighbourhood size can be up to the size of the output layer; however, the final value of the neighbourhood size must not be less than 1.
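A minimal sketch of the decay schedules of Equations (2)-(5), written in Python; the parameter names and the default choice tau = T are assumptions of this example rather than values taken from the chapter's MATLAB implementation.

```python
import numpy as np

def decayed_parameter(lam0, t, T, kind="exponential", tau=None):
    """Value of the learning parameter at iteration t (Equations 2-5)."""
    tau = T if tau is None else tau
    if kind == "none":
        return lam0                                   # Eq. (2)
    if kind == "linear":
        return lam0 * (1.0 - t / T)                   # Eq. (3)
    if kind == "exponential":
        return lam0 * np.exp(-t / tau)                # Eq. (4)
    if kind == "gaussian":
        return lam0 * np.exp(-t ** 2 / (2.0 * tau ** 2))  # Eq. (5)
    raise ValueError(f"unknown decay type: {kind}")
```

The same schedule can be applied to the neighbourhood size parameter; in the chapter both parameters are additionally held at a user-defined final value once a chosen point of the training run is reached.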

2. x mean value of the input vector x. Euclidean distance dj = 2. the measure of the distance between the presented input vector and its weight vectors. The neighbourhood strength parameter should be in the interval 0. w ji i-th component of the j-th weight vector. The most commonly used are: 1. (6) ( xi − x )(w ji − w j ) . σx standard deviation of the input vector x.128 4 Artificial Neural Networks Artificial Neural Networks – Architectures and Applications determining how the weight adjustment of the neighbouring neurons decays with distance from the winner. σx σw j i =1 ∑ N (7) ∑ | xi − w ji |. 2. w j mean value of the weight vector w j . Weights The resulting weight vectors of the neurons of the output layer. It decays from the initial value to the final value. which can be reached already during the learning process. Correlation dj = 3. There are several ways how to initialize the weight vector. is usually reduced during the learning process (as well as the learning parameter.01.3. The Figure 4 depicts one of the possible development of neighbourhood size and strength parameters during the learning process.3. e.4. represent the centers of the clusters. may have many forms. The resulting patterns of the weight vectors may depend on the type of the weights initialization. i. xi w ji (8) ∑ i =1 N xi − w ji 2 . not only at the end of the learning process. xi length of the input vector x and w ji length of the weight vector w j . obtained at the end of the learning process. Block distance dj = ∑iN =1 xi w ji . Distance Measures The criterion for victory in the competition of the neurons of the output layer. N dimension of the input and weight vectors. some of them are depicted in Figure 5. analogue of Equations 2–5 and Figure 2). 1 . i =1 N (9) where xi is i-th component of the input vector. Direction cosine dj = 4.3. . σw j standard deviation of the weight vector w j .

Figure 2. Learning rate decay function (dependence of the value of the learning parameter on the number of iterations): (a) No decay, (b) Linear decay, (c) Gaussian decay, (d) Exponential decay

Figure 3. Types of neighbourhood: (a) Linear arrangements, (b) Square arrangements, (c) Hexagonal arrangements

Figure 4. Neighbourhood size decay function (dependence of the neighbourhood size on the number of iterations) and neighbourhood strength decay function (dependence of the value of the neighbourhood strength on the distance from the winner)

2.3.5. Learning Progress Criterion

The learning progress criterion, minimized over the learning process, is the sum of distances between all input vectors and their respective winning neuron weights. It is calculated after the end of each epoch according to

D = \sum_{i=1}^{k} \sum_{n \in c_i} (x_n - w_i)^2,   (10)

where x_n is the n-th input vector belonging to cluster c_i whose center is represented by w_i (i. e. the weight vector of the winning neuron representing cluster c_i), and k is the number of clusters. The weight adjustment corresponding to the smallest learning progress criterion is the result of the SOM learning process; these weights represent the cluster centers, see Figure 6. For the best result, the SOM should be run several times with various settings of the SOM parameters to avoid detection of local minima and to find the global optimum on the error surface plot. For more information about the trained SOM, the distribution of the input vectors in the clusters and the errors of the clusters can be visualized, see Figure 7.

2.3.6. Errors

The errors of a trained SOM can be evaluated according to

1. Learning progress criterion (see Equation 10),

2. Normalized learning progress criterion
E = \frac{1}{M} \sum_{i=1}^{k} \sum_{n \in c_i} (x_n - w_i)^2,   (11)

3. Error in the i-th cluster
E_i = \sum_{n \in c_i} (x_n - w_i)^2,   (12)

4. Normalized error in the cluster
E = \frac{1}{k} \sum_{i=1}^{k} \frac{1}{M_i} \sum_{n \in c_i} (x_n - w_i)^2,   (13)

5. Normalized error in the i-th cluster
E_i = \frac{1}{M_i} \sum_{n \in c_i} (x_n - w_i)^2,   (14)

where x_n is the n-th input vector belonging to cluster c_i whose center is represented by w_i (i. e. the weight vector of the winning neuron representing cluster c_i), M is the number of input vectors, M_i is the number of input vectors belonging to the i-th cluster and k is the number of clusters.
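A compact Python sketch of the learning progress criterion of Equation (10) and of the per-cluster errors of Equations (12) and (14); labels is assumed to hold the index of the winning neuron for every input vector, which is not a quantity named in the original text.

```python
import numpy as np

def learning_progress(X, W, labels):
    """Sum of squared distances between the input vectors and the weights
    of their winning neurons, Eq. (10)."""
    return float(np.sum((X - W[labels]) ** 2))

def cluster_errors(X, W, labels):
    """Per-cluster error (Eq. 12) and normalized per-cluster error (Eq. 14)."""
    errors, norm_errors = [], []
    for i in range(W.shape[0]):
        members = X[labels == i]
        e = float(np.sum((members - W[i]) ** 2))
        errors.append(e)
        norm_errors.append(e / len(members) if len(members) else 0.0)
    return errors, norm_errors
```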

The differential operations are usually approximated discretely . see Figure 8. K-means clustering). Canny [6]. Edge detection Edge detection techniques are commonly used in image processing. However it is calculated from the image intensity function of the adjacent pixels. g. The large value of the first derivative and zero-crossings in the second derivative of the image intensity represent necessary condition for the location of the edge. Sobel [19]. Selection of the nearest (the most similar) neuron of the output layer (e. 3. 23] 2. 2. g. Conventional edge detector The image edge is a property attached to a single pixel. [16. Forming Clusters The U-matrix (the matrix of average distance between weights vectors of neighbouring neurons) can be used for finding of realistic and distinct clusters. 28] 2. Validation of Trained Self-organizing Map The validation of the trained SOM can be done so. Different approach for SOM validation is n-fold cross validation with the leave-one out method.7. to detect discontinuities in either the image intensity or the derivatives of the image intensity. e. Using of trained self-organizing map A trained SOM can be used according the following steps for clustering: 1. 3. 3. Consecutive submission of an input data vector to the network. above all for feature detection.5772/51468 7 131 2. in proportion 70:30). Many commonly used edge detection methods (Roberts [22]. the cluster) to the presented input data vector. Prewitt [20]. Calculation of a distances between the input vector and the weight vectors of the neurons of the output layer. Marr-Hildreth [15] etc). feature extraction and segmentation. i.org/10. a portion of the input data is used for map training and another portion for validation (e.3. Implementation of self-organizing map The SOM algorithm has been implemented in MATLAB program suite [17] with various optional parameters enabling the adjustment of the model according to the user’s requirements. employ derivatives (the first or the second one) to measure the rate of change in the‘image intensity function. The aim of the edge detection process is to detect the object boundaries based on the abrupt changes in the image tones.3.doi. [23.5. The other approach for forming clusters on the map can be to utilize any established clustering method (e.1.4. For easier application of SOM the graphical user interface has been developed facilitating above all the setting of the neural network.8.Edge Detection in Biomedical Images Using Self-Organizing Maps Edge Detection in Biomedical Images Using Self-Organizing Maps http://dx. i. 2.

by proper convolution masks. Moreover, the edges are usually detected only in two or four directions. In our case, for simplification of the derivative calculation, only the numerical gradient has been calculated,

G = \sqrt{ G_x^2 + G_y^2 },   (15)

where G is the edge gradient, and G_x and G_y are the values of the first derivative in the horizontal and in the vertical direction, respectively. The essential step in the edge detection process is thresholding, i. e. determination of the threshold limit corresponding to a dividing value for the evaluation of the edge detector response either as edge or non-edge. The quality of the thresholding setting has an impact on the quality of the whole edge detection process: an exceedingly small value of the threshold leads to assignment of the noise as edge, while an exceedingly large value of the threshold leads to omission of some significant edges. Due to the thresholding, the result image of the edge detection process is comprised only of the edge (white) and non-edge (black) pixels.

3.2. Self-organizing map

A SOM may facilitate the edge detection task, for instance by reducing the dimensionality of the input data or by segmentation of the input data. In our case, the SOM has been utilized for the reduction of image intensity levels, i. e. from 256 to 2 levels. Due to this image preprocessing using SOM, the following edge detection process has been strongly simplified. Each image has been transformed to masks (3x3), see Table 1, that form the set of input vectors (9-dimensional). The input vectors have been then classified into 2 classes according to the weights of the beforehand trained SOM. The output set of the classified vectors has been reversely transformed into the binary (black and white) image. The ability of SOM for edge detection in biomedical images has been tested using high-resolution computed tomography (CT) images capturing the expressions of the Granulomatosis with polyangiitis disease.

4. Granulomatosis with polyangiitis

4.1. Diagnosis of granulomatosis with polyangiitis

Granulomatosis with polyangiitis (GPA), in the past also known as Wegener's granulomatosis, is a disease belonging to the group of vasculitic diseases affecting mainly small caliber blood vessels [7]. They can be distinguished from other vasculitides by the presence of ANCA antibodies (ANCA - Anti-Neutrophil Cytoplasmic Antibodies). GPA is quite a rare disease, its yearly incidence is around 10 cases per million inhabitants. The course of the disease is extremely variable: on the one hand there are cases of organ limited disease that affects only a single organ, on the other hand there is a possibility of generalized disease affecting multiple organs and threatening the patient's life. Diagnostics of this disease has been improved
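The chapter computes the gradient of Equation (15) in MATLAB; the following is an assumed NumPy equivalent. Because the SOM-quantized image has only two intensity levels, any non-zero gradient already marks an edge, so no threshold tuning is needed, which is the simplification the text refers to.

```python
import numpy as np

def edge_map(binary_image):
    """Numerical gradient magnitude of the 2-level (SOM-quantized) image,
    Eq. (15); pixels with a non-zero gradient are marked as edges."""
    gy, gx = np.gradient(binary_image.astype(float))
    g = np.sqrt(gx ** 2 + gy ** 2)
    return g > 0          # boolean edge / non-edge image
```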

Figure 5. Weight vectors initialization: (a) Random small numbers, (b) Vectors near the center of mass of inputs, (c) Some of the input vectors are randomly chosen as the initial weight vectors

Figure 6. The training criterion of the learning process. The figure shows the example of the evolution of the training criterion (sum of the distances between all input vectors and their respective winning neuron weights) as the function of the number of epochs. In this case, the smallest sum of distances was achieved in the 189th epoch

P1 P2 P3
P4 P5 P6
P7 P8 P9

Table 1. The image mask used for the preparation of the set of the input vectors, where P1-P9 denote the intensity values of the image pixels. The mask (9 adjacent pixels) was moved over the whole image pixel by pixel row-wise and the process continued until the whole image was scanned. From each location of the scanning mask in the image the single input vector has been formed, i. e. x = (P1, P2, P3, P4, P5, P6, P7, P8, P9). The (binary) output value of SOM for each input vector replaced the intensity value in the position of the pixel with original intensity value P5
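A possible Python rendering of the mask transformation of Table 1 and of the reverse transformation; the handling of the one-pixel border, which the mask cannot cover, is an assumption of this sketch and is not specified in the chapter.

```python
import numpy as np

def image_to_vectors(img):
    """Slide a 3x3 mask over the image row-wise (Table 1) and return one
    9-dimensional vector (P1..P9) per interior pixel position."""
    rows, cols = img.shape
    vectors = []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            vectors.append(img[r - 1:r + 2, c - 1:c + 2].ravel())
    return np.array(vectors)

def labels_to_image(labels, shape):
    """Place each (binary) SOM output back at the position of the central
    pixel P5; the border not covered by the mask is left at 0."""
    rows, cols = shape
    out = np.zeros(shape)
    out[1:rows - 1, 1:cols - 1] = np.asarray(labels).reshape(rows - 2, cols - 2)
    return out
```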






Figure 7. The additional information about the SOM result. (a) The distribution of the input vectors in the clusters, (b) The errors of the clusters (the mean value of the distance between the respective input vectors and the cluster center with respect to the maximum value of the distance)

by the discovery of ANCA antibodies and their routine investigation since the nineties of the past century. The onset of GPA may occur at any age, although patients typically present at age 35–55 years [11]. A classic form of GPA is characterized by necrotizing granulomatous vasculitis of the upper and lower respiratory tract, glomerulonefritis, and small-vessel vasculitis of variable degree. Because of the respiratory tract involvement nearly all the patients have some respiratory symptoms including cough, dyspnea or hemoptysis. Due to this, nearly all of them have a chest X-ray at the admission to the hospital, usually followed by a high-resolution computed CT scan. The major value of CT scanning is in the further characterization of lesions found on chest radiography. The spectrum of high-resolution CT findings of GPA is broad, ranging from nodules and masses to ground glass opacity and lung consolidation. All of the findings may mimic other conditions such as pneumonia, neoplasm, and noninfectious inflammatory diseases [2]. The prognosis of GPA depends on the activity of the disease and disease-caused damage and response to therapy. There are several drugs used to induce remission, including cyclophosphamide, glucocorticoids or monoclonal anti CD 20 antibody rituximab. Once the remission is induced, mainly azathioprin or mycophenolate mofetil are used for its maintenance. Unfortunately, relapses are common in GPA. Typically, up to half of patients experience relapse within 5 years [21].

4.2. Analysis of granulomatosis with polyangiitis
Up to now, all analysis of CT images with GPA expressions have been done using manual measurements and subsequent statistical evaluation [1, 3, 10, 12, 14]. It has been based on the experience of a radiologist who can find abnormalities in the CT scan and who is able to

Edge Detection in Biomedical Images Using Self-Organizing Maps

Edge Detection in Biomedical Images Using Self-Organizing Maps http://dx.doi.org/10.5772/51468

11

135

classify types of the finding. The standard software 1 used for CT image visualization usually includes some interactive tools (zoom, rotation, distance or angle measurements, etc.) but no analysis is done (e. g. detection of the abnormal structure or recognition of different types of tissue). Moreover, there is no software customized specifically for requirements of GPA analysis. Therefore, there is a place to introduce a new approach for analysis based on SOM. CT finding analysis is then less time consuming and more precise.

5. Results and discussion
The aim of the work has been to detect all three expression forms of the GPA disease in high-resolution CT images (provided by Department of Nephrology, First Faculty of Medicine and General Faculty Hospital, Prague, Czech Republic) using the SOM, i. e. to detect granulomatosin, mass and ground-glass, see Figure 10a, 11a, 12a. The particular expression forms occur often together, therefore there has been the requirement to distinguish the particular expression forms from each other. Firstly, the SOM has been trained using the test image, see Figure 9 (For detailed information about the training process, please see Table 2). Secondly, the edge detection of the validating images has been done using the result weights from the SOM training process, see Figure 10b, 11b, 12b (For detailed information about the edge detection process, please see Table 3).
Setting description Size of the output layer Type of the weights initialization Initial value of the‘learning parameter Final value of the learning parameter A point in the learning process in which the learning parameter reaches the final value Initial value of the cneighbourhood size parameter Final value of the cneighbourhood size parameter A point in the learning process in which the neighbourhood size parameter reaches the final value Type of the distance measure Type of the learning rate decay function Type of neighbourhood size strength function Type of the neighbourhood size rate decay function Number of the epochs Value (1,2) Some of the input vectors are randomly chosen as the initial weight vectors. 1 0.1 0.9 2 1 0.75

1 Exponential Linear Exponential 500

Table 2. Setting of the SOM training process

1

Syngo Imaging XS-VA60B, Siemens AG Medical Solutions, Health Services, 91052 Erlangen, Germany

136

12 Artificial Neural Networks

Artificial Neural Networks – Architectures and Applications

Figure 8. The graphical user interface of the software using SOM for clustering

Stages Image preprocessing Image transformation Clustering using SOM

Reverse image transformation

Edge detection
Table 3. Stages of the edge detection process

Description Histogram matching of the input image to the test image. The image is transformed to M masks (3x3) that form the set of M input vectors (9 dimensional). The set of the input vectors is classified into 2 classes according to the obtained weights from the SOM training process. The set of classified input vectors is reversely transformed into the image. The intensity values are replaced by the values of the class membership. Computation of image gradient.





Figure 9. Test image (transverse high-resolution CT image of both lungs)

The obtained results have been discussed with an expert from Department of Nephrology of the First Faculty of Medicine and General Teaching Hospital in Prague. In the first case, a high-resolution CT image capturing a granuloma in the left lung has been processed, see Figure 10a. The expert has been satisfied with the quality of the GPA detection (see Figure 10b) provided by the SOM, since the granuloma has been detected without any artifacts. In the second case, the expert has pointed out that the high-resolution CT image, capturing both masses and ground-glass (see Figure 11a, 12a), was problematic to be evaluated. The possibility of the detection of the masses was aggravated by the ground-glass surrounding the lower part of the first mass and the upper part of the second mass. The artifacts, which originated in the coughing movements of the patient (‘wavy’ lower part of the image), made the detection process of the masses and the ground-glass difficult as well. Despite these inconveniences, the expert has confirmed, that the presented software has been able to distinguish between the particular expression forms of GPA with satisfying accuracy and it has detected the mass and ground-glass correctly (see Figure 11b, 12b, 12c). In conclusion, the presented SOM approach represents a new helpful approach for GPA disease diagnostics.






Figure 10. Detection of granuloma. (a) Transverse high-resolution CT image of both lungs with active granulomatosis (white arrow). b) The edge detection result obtained by the SOM. The granulomatosin is detected with sufficient accuracy

Figure 11. Detection of masses. (a) Coronal high-resolution CT image of both lungs with masses (white arrows). The possibility of the detection of the masses is aggravated by the ground-glass surrounding the lower part of the first mass and the upper part of the second mass. The artifacts originated by coughing movements of the patient make the detection process difficult as well. (b) The edge detection result obtained by the SOM. The masses are detected and distinguished from the ground-glass with sufficient accuracy

Figure 12. Detection of ground-glass. (a) Coronal high-resolution CT image of both lungs with ground-glasses (white arrows). The possibility of the detection of the ground-glasses is complicated by the masses in close proximity to the ground-glasses. The artifacts originated by coughing movements of the patient make the detection process hard as well. (b) The edge detection result obtained by the SOM. The ground-glasses are detected and distinguished from the masses. (c) The overlap of the original CT image and the edge detection result (cyan color)

6. Conclusions

The edge detection procedure is a critical step in the analysis of biomedical images, enabling above all the detection of the abnormal structure or the recognition of different types of tissue. The application of SOM for edge detection in biomedical images has been discussed and its contribution to the solution of the edge detection task has been confirmed. Using SOM, particular expression forms of the GPA disease have been detected and distinguished from each other. The ability of SOM has been verified using the high-resolution CT images capturing all three forms of the expressions of the GPA disease (granulomatosin, mass, ground-glass). The obtained results have been discussed by the expert, who has confirmed that the SOM provides a quick and easy approach for edge detection tasks with satisfying quality of output. Future plans are based on the problem extension to three-dimensional space to enable CT image analysis involving (i) pathological finding 3D visualization and (ii) 3D reconstruction of the whole region (using the whole set of CT images).

Acknowledgements

The work was supported by specific university research MSMT No. 21/2012, the research grant MSM No. 6046137306 and PRVOUK-P25/LF1/2.

Author details

Lucie Gráfová1, Jan Mareš1, Aleš Procházka1 and Pavel Konopásek2

1 Department of Computing and Control Engineering, Institute of Chemical Technology, Prague, Czech Republic

2 Department of Nephrology, First Faculty of Medicine and General Faculty Hospital, Prague, Czech Republic

References

[1] Ananthakrishnan, L., Sharma, N., & Kanne, J. P. [2009]. Wegener's granulomatosis in the chest: High-resolution CT findings. Am J Roentgenol 192(3): 676-82.

[2] Annanthakrishan, L., Sharma, N., & Kanne, J. P. [2009]. Wegener's granulomatosis in the chest: High-resolution CT findings. Am J Roentgenol 192(3): 676-82.

[3] Attali, P., Begum, R., Romdhane, H. B., Valeyre, D., Guillevin, L., & Brauner, M. W. [1998]. Pulmonary Wegener's granulomatosis: changes at follow-up CT. European Radiology 8: 1009-1113.

[4] Badekas, E., & Papamarkos, N. [2007]. Document binarisation using Kohonen SOM. IET Image Processing 1: 67-84.

Serial high-resolution computed tomography imaging in patients with wegener granulomatosis: Differentiation between active inflammatory and chronic fibrotic lesions. Proceedings of the IEEE International Conference on Acoustics. & Schnabel. [9] Kohonen.. [19] Pingle. Kim. New York. O. [1986].. K. D. A. [2003]. Tateishi.. L. T. V. Anais do IX SIBGRAPI: 47–54... Araujo. G. Self-Organization and Associative Memory. Springer. E.. Visual perception by computer. T. E. Active disease and residual damage in treated wegener?s granulomatosis: an observational study using pulmonary high-resolution computed tomography. European Radiology 13: 36–42. [18] Moreira.. Gross. Kim. G.142 18 Artificial Neural Networks Artificial Neural Networks – Architectures and Applications [5] Baez. [2011]. Nomenclature and classification of vasculitis: lessons learned from granulomatosis with polyangiitis (wegener’s granulomatosis). [1996]. Muraközi. [11] Lane. Heller. Machine Learning. E.. Watanabe. & Procházka. & Fontuora. Kotter. N. [2011]. 305–324. & Pok. [2007]. Ghanem. A. G. K. [1999]. J.. H. [2003]. & Langer:.. A. [17] Moore. K.. U. C. Soc. Vol. pp. H. I. R. pp. & Scott. Clin Exp Immunol 164: 7–10. J. S. [2005]. Pearson. J. L. [1969]. K. [13] Liu.207. pp. Fujimoto. Theory of edge detection. Acta Radiologica 46: 484–491.. Springer-Verlag. T. European Radiology 13: 43–51. Reuter. C. Schaefer.. [16] Mitchell. Watts. Biomedical Image Volumes Denoising via the Wavelet Transform.. K. & Hildreth. [10] Komócsi.. [7] Jennette. M.. Texture edge detection by feature encoding and predictive model. [2011]. & Procházka. Landon. S. InTech. [15] Marr. B.. 1105–1108. Proc. Academic Press. S. P. [1997]. & Kwon. M. [1980]. Germany. D. E. chapter 14.. Epidemiology of systemic vasculitis. Ashizawa. [2005]. [1989]. A. [12] Lee. M. Matlab for Engineers. . [8] Jerhotová. H. 187–217. IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-8(6): 679–698. Roy. McGraw-Hill. A. W. and Signal Processing. Thoracic manifestation of wegener?s granulomatosis: Ct findings in 30 patients. H. E. Curr Rheumatol Rep 7: 270–275. Fernandez. J.. D. [6] Canny. Uhl. Švihlík.-C. J. S. [14] Lohrmann. Automatic Interpretation and Classification of Images. Neural-based color image segmentation and classification. Differential Diagnosis of Dementia Using HUMANN-S Based Ensembles.. chapter 14. A computational approach to edge detection. J. Speech. C. A. Moriya. M. Berlin. Prentice Hall. Johkoh. T. O.



Chapter 7

MLP and ANFIS Applied to the Prediction of Hole Diameters in the Drilling Process

Thiago M. Geronimo, Carlos E. D. Cruz, Fernando de Souza Campos, Paulo R. Aguiar and Eduardo C. Bianchi

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/51629

1. Introduction

The control of industrial machining manufacturing processes is of great economic importance due to the ongoing search to reduce raw materials and labor wastage. Indirect manufacturing operations such as dimensional quality control generate indirect costs that can be avoided or reduced through the use of control systems [1]. The use of intelligent manufacturing systems (IMS), which is the next step in the monitoring of manufacturing processes, has been researched through the application of artificial neural networks (ANN) since the 1980s [2]. The machining drilling process ranks among the most widely used manufacturing processes in industry in general [3, 4]. In the quest for higher quality in drilling operations, ANNs have been employed to monitor drill wear using sensors. Among the types of signals employed is that of machining loads measured with a dynamometer [5, 6], electric current measured by applying Hall Effect sensors on electric motors [7], and vibrations [8], as well as a combination of the above with other devices such as accelerometers and acoustic emission sensors [9].

This article contributes to the use of MLP [10-12] and ANFIS type [13-16] artificial intelligence systems programmed in MATLAB to estimate the diameter of drilled holes. The two types of network use the backpropagation method, which is the most popular model for manufacturing applications [2]. In the experiment, which consisted of drilling single-layer test specimens of 2024-T3 alloy and of Ti6Al4V alloy, an acoustic emission sensor, a three-dimensional dynamometer, an accelerometer, and a Hall Effect sensor were used to collect information about noise frequency and intensity, table vibrations, loads on the x, y and z axes, and electric current in the motor.

2. Drilling Process

The three drilling processes most frequently employed in industry today are turning, milling and boring [3], and the latter is the least studied process. However, it is estimated that today boring with helical drill bits accounts for 20% to 25% of all machining processes. Figure 1 illustrates the loads involved in the drilling process, the most representative of which is the feed force FZ, since it affects chip formation and surface roughness, as well as burrs and surface integrity.

Figure 1. Loads involved in drilling processes.

It is very difficult to generate a reliable analytical model to predict and control hole diameters, since these holes are usually affected by several parameters. The quality of a hole depends on geometric and dimensional errors. Moreover, the type of drilling process, the tool, cutting parameters and machine stiffness also affect the precision of the hole [19].

3. Artificial Intelligent Systems

3.1. Artificial multilayer perceptron (MLP) neural network

Artificial neural networks are gaining ground as a new information processing paradigm for intelligent systems, which can learn from examples and, based on training, generalize to process a new set of information [14]. Artificial neurons have synaptic connections that receive information from sensors and have an attributed weight. The sum of the values of the inputs adjusted by the weight of each synapse is processed and an output is generated.

The outputs of a neuron are used as inputs for a neuron in the next layer. Figure 2 shows a typical MLP ANN, with m inputs and p outputs, with each circle representing a neuron. An artificial neural network (ANN) learns by continuously adjusting the synaptic weights at the connections between layers of neurons until a satisfactory response is produced [9]. In the present work, the MLP network was applied to estimate drilled hole diameters based on an analysis of the data captured by the sensors. The weight readjustment method employed was backpropagation, which consists of propagating the mean squared error generated in the diameter estimation by each layer of neurons, readjusting the weights of the connections so as to reduce the error in the next iteration. The training error is calculated in each iteration, based on the calculated output and desired output, and is used to adjust the synaptic weights according to the generalized delta rule:

w_{ij}^{(l)}(n+1) = w_{ij}^{(l)}(n) + \alpha \left[ \Delta w_{ij}^{(l)}(n-1) \right] + \eta \, \delta_{j}^{(l)}(n) \, y_{i}^{(l-1)}(n)    (1)

where η is the learning rate and α is the moment, which are parameters that influence the learning rate and its stability, w_{ij}^{(l)} is the weight of each connection, and δ_{j}^{(l)} is the local gradient calculated from the error signal.

Figure 2. Typical architecture of an MLP with two hidden layers.

3.2. Adaptive neuro-fuzzy inference system (ANFIS)

The ANFIS system is based on functional equivalence, under certain constraints, between RBF (radial basis function) neural networks and TSK-type fuzzy systems [15]. A single existing output is calculated directly by weighting the inputs according to fuzzy rules. These rules, which are the knowledge base, are determined by a computational algorithm based on neural networks. Figure 3 exemplifies the ANFIS model with two input variables (x and y) and two rules [11].
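As a minimal illustration of how equation (1) is applied, the NumPy sketch below performs one weight update with the learning-rate and moment terms. It is an assumption-laden stand-in (the chapter's networks were implemented in MATLAB), and the η and α values shown are simply those listed later in Table 2 for the Ti6Al4V network.

```python
import numpy as np

def delta_rule_update(W, dW_prev, delta, y_prev, eta=0.3, alpha=0.2):
    """One application of the generalized delta rule of equation (1).

    W       -- current weights of layer l, shape (n_out, n_in)
    dW_prev -- previous weight correction (the moment term of eq. 1)
    delta   -- local gradients of layer l, shape (n_out,)
    y_prev  -- outputs of layer l-1, shape (n_in,)
    """
    dW = alpha * dW_prev + eta * np.outer(delta, y_prev)
    return W + dW, dW   # new weights and the correction kept for the next step
```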

Obtaining an ANFIS model that performs well requires taking into consideration the initial number of parameters and the number of inputs and rules of the system [16]. The number of clusters, the radius of influence of each cluster center, and the number of training iterations to be employed should be defined as parameters for the configuration of the inference system. These parameters are determined empirically, and an initial model is usually created with equally spaced pertinence functions. However, this method is not always efficient because it does not show how many relevant input groups there are. To this end, there are algorithms that help determine the number of pertinence functions. The subtractive clustering algorithm is used to identify data distribution centers [17], in which are centered the pertinence curves with pertinence values equal to 1, thus enabling one to calculate the maximum number of fuzzy rules. At each pass, the algorithm seeks a point that minimizes the sum of the potential and the neighboring points. The potential is calculated by equation (2):

P_i = \sum_{j=1}^{n} \exp\left( -\frac{4}{r_a^{2}} \, \lVert x_i - x_j \rVert^{2} \right)    (2)

where P_i is the potential of the possible cluster, x_i is the possible cluster center, x_j is each point in the neighborhood of the cluster that will be grouped in it, r_a is the radius of influence of each cluster center, and n is the number of points in the neighborhood. ANFIS is a fuzzy inference system introduced in the work structure of an adaptive neural network. Using a hybrid learning scheme, the ANFIS system is able to map inputs and outputs based on human knowledge and on input and output data pairs [18]. The ANFIS method is superior to other modeling methods such as autoregressive models, cascade correlation neural networks, backpropagation algorithm neural networks, sixth-order polynomials, and linear prediction methods [10].

Figure 3. ANFIS architecture for two inputs and two rules based on the first-order Sugeno model.
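Purely as an illustrative sketch of equation (2) — not MATLAB's subclust routine used by the authors — the potential of every candidate center can be computed as follows; the radius of influence r_a here is a hypothetical example value, while Table 3 later lists the values actually used.

```python
import numpy as np

def cluster_potentials(X, r_a=1.0):
    """Potential P_i of equation (2) for every candidate cluster center in X.

    X   -- data matrix, one sample per row
    r_a -- radius of influence of each cluster center (example value)
    """
    # squared Euclidean distances between all pairs of samples
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-4.0 * d2 / r_a ** 2).sum(axis=1)

# The sample with the highest potential is taken as the first cluster center:
# center = X[np.argmax(cluster_potentials(X))]
```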

4. Methodology

4.1. Signal collection and drilling process

The tests were performed on test specimens composed of a package of sheets of Ti6Al4V titanium alloy and 2024-T3 aluminum alloy, which were arranged in this order to mimic their use in the aerospace industry. A total of nine test specimens were prepared, and 162 holes were drilled in each one. Thus, a considerable number of data were made available to train the artificial intelligence systems. The data set consisted of the signals collected during drilling and the diameters measured at the end of the process. The tool employed here was a helical drill made of hard metal. The drilling process was monitored using an acoustic emission sensor, a three-dimensional dynamometer, an accelerometer, and a Hall Effect sensor, which were arranged as illustrated in Figure 4. The acoustic emission signal was collected using a Sensis model DM-42 sensor. The electric power was measured by applying a transducer to monitor the electric current and voltage in the terminals of the electric motor that activates the tool holder. The six signals were sent to a National Instruments PCI-6035E data acquisition board installed in a computer. LabView software was used to acquire the signals and store them in binary format for subsequent analysis and processing.

Figure 4. Sensor assembly scheme for testing.

To simulate diverse machining conditions, different cutting parameters were selected for each machined test specimen. Each pass consisted of a single drilling movement along the workpiece in a given condition. This method is useful to evaluate the performance of artificial intelligence systems in response to changes in the process. Each test specimen was dubbed as listed in Table 1.

Condition  ID   Spindle [rpm]   Feed speed [mm/min]
1          1A   1000            90.0
2          1B   1000            22.4
3          1C   1000            250.0
4          2A   500             90.0
5          2B   500             22.4
6          2C   500             250.0
7          3A   2000            90.0
8          3B   2000            22.4
9          3C   2000            250.0

Table 1. Machining conditions used in the tests.

The signals of acoustic emission, loads, cutting power and acceleration shown in Figure 5 were measured in real time at a rate of 2000 samples per second.

4.2. Diameter measurements

Because the roundness of machined holes is not perfect, two measurements were taken of each hole, one of the maximum and the other of the minimum diameter. Moreover, the diameter of the hole in each machined material will also be different due to the material's particular characteristics. A MAHR MarCator 1087 B comparator gauge with a precision of ±0.005 mm was used to record the measured dimensions, and the dimensional control results were employed to train the artificial intelligence systems.

4.3. Definition of the architecture of the artificial intelligence systems

The architecture of the systems was defined using all the collected signals, i.e., those of acoustic emission, loads in the x, y and z directions, electrical power and acceleration. An MLP network and an ANFIS system were created for each test specimen material, due to the differences in the behavior of the signals (see Figure 5) and in the ranges of values found in the measurement of the diameters.

In this study, the signals from the sensors, together with the maximum and minimum measured diameters, were organized into two matrices, one for each test specimen material. The entire data set of the tests resulted in 1337 samples, considering tool breakage during testing under condition 2C. These data were utilized to train the neural network.

4.3.1. Multilayer Perceptron

The MLP network architecture is defined by establishing the number of hidden layers to be used, the number of neurons contained in each layer, the learning rate and the moment. An algorithm was created to test combinations of these parameters. The final choice was the combination that appeared among the five smallest errors in the estimate of the maximum and minimum diameters. Parameters such as the number of training iterations and the desired error are used as criteria to complete the training and were established at 200 training iterations and 1x10-7 mm, respectively. This procedure was performed for each material,

Parameters               Ti6Al4V      2024-T3
Neurons in each layer    [5 20 15]    [20 10 5]
Learning rate            0.3          0.15
Moment                   0.2          0.8
Transfer function        tansig       poslin

Table 2. Architecture of the MLP networks.
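The networks of Table 2 were built in MATLAB; purely as a hedged illustration of the same architecture choices, an equivalent configuration in scikit-learn (an assumption, not the authors' tooling) could look like the sketch below. The 'tanh' activation stands in for MATLAB's tansig (the 2024-T3 network's poslin would correspond to 'relu').

```python
from sklearn.neural_network import MLPRegressor

# Illustrative stand-in for the Ti6Al4V column of Table 2: three hidden layers
# of 5, 20 and 15 neurons, learning rate 0.3, moment 0.2, 200 iterations and a
# small stopping tolerance.
mlp_ti = MLPRegressor(hidden_layer_sizes=(5, 20, 15),
                      activation='tanh',       # closest analogue of tansig
                      solver='sgd',
                      learning_rate_init=0.3,
                      momentum=0.2,
                      max_iter=200,
                      tol=1e-7)
# mlp_ti.fit(X_train, y_train)  # X_train: sensor features, y_train: measured diameters
```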

6 0.152 Artificial Neural Networks – Architectures and Applications generating two MLP networks whose configuration is described in Table 2. Ti6Al4V 1. inclu‐ sion rate and rejection rate help define the number of clusters. This system consists of a fuzzy inference system (FIS) that collects the available data. and processes them to generate the desired output.4 2024-T3 1. with the entire training set presented at once.3. Thus.25 0.25 0. Mean error of hole diameters estimated by the ANFIS system for several training iterations. Parameters such as the radius of influence. optimizing the FIS through the definition of points with the highest potential for the cluster center. the appropriate number of training iterations was investigated. ANFIS parameters.1 Figure 6. The FIS is influenced by the organiza‐ tion of the training data set. The training method consists of a hybrid algorithm with the method of backpropagation and least-squares estimate. converts them into If-Then type rules by means of pertinence functions. 4.6 0. Parameters Radius of influence Inclusion rate Rejection rate Table 3. The subtractive clustering algorithm (subclust) is used to search for similar data clusters in the training set. . ANFIS The same data matrix employed to train the MLP neural network was used to train the ANFIS system. whose task is performed with the help of MATLAB’s Fuzzy toolbox. The desired er‐ ror was set at 1x10-7mm. Table 3 lists the parameters used in the ANFIS systems.2. Because training in the ANFIS system is performed in batch mode. and hence. The remaining parameters were kept according to the MATLAB default. the number of rules of the FIS.

. estimated by the MLP) for the Ti6Al4V alloy. the mean error was 0. Influence of inputs on the performance of diameter estimation 5.MLP and ANFIS Applied to the Prediction of Hole Diameters in the Drilling Process http://dx. Given the data set of the sensors and the desired outputs. Figure 7 depicts the results of these estimates.0067mm.0649mm. MLP For the Ti6Al4V titanium alloy. the systems were trained using all the collected signals. The mean error for the minimum diameter was 0. which re‐ quires low computational capacity. Figure 8 shows the result of the estimation of the hole diameters machined in the 2024-T3 aluminum alloy. with a maximum error of 0. Minimum and maximum diameters (actual vs. The larger the number of training iterations.org/10.doi. the resulting mean error was 0.1.0655mm. which are shown on graphs.0676mm.1. Thus. For the maximum diameter. with a maximum error of 0. Use of all the collected signals Initially. the performance of the neural network is evaluated based on the error between the estimated diameter and the measured diameter.0080mm. Figure 6 illustrates the result of this test. the number de training iterations was set at 75. with a maxi‐ mum error of 0. which Consist of the measured diameters.0086mm. the greater the computational effort. Figure 7.0668mm. 5.0066mm. with a maximum error of 0.1. avoiding the peaks (Figure 6). For the maximum diameter. the estimate of the minimum diameter resulted in a mean error of 0. 5.5772/51629 153 an algorithm was created to test several numbers of training iterations ranging from 10 up to 1000.

Figure 9. For the maximum diameter. Figure 10 illustrates the result of the machined hole diameter estimated for the 2024-T3 al‐ loy.0739mm. ANFIS Figure 9 shows the diameters estimated by ANFIS.01102mm. Minimum and maximum diameters (actual vs.01188mm.0704mm.0980mm. . The maximum diameter presented a mean error of 0.0791mm. 5. using the same network configuration. with a maximum error of 0.00990mm. The mean error for the minimum diameter was 0. Minimum and maximum diameters (actual vs. and the highest error was 0. and a maximum error of 0.1. estimated by ANFIS) for the Ti6Al4V alloy.154 Artificial Neural Networks – Architectures and Applications Figure 8. estimated by the MLP) for the 2024-T3 alloy. The mean error in the estimate of the minimum diameter was 0. the resulting mean error was 0. with a maximum error of 0.0718mm.2.
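The mean and maximum errors quoted throughout this section are simple statistics of the difference between measured and estimated diameters; a minimal sketch of their computation (assuming the diameters are held in NumPy arrays) is:

```python
import numpy as np

def diameter_errors(measured, estimated):
    """Mean and maximum absolute estimation error, as reported in Section 5."""
    err = np.abs(np.asarray(measured) - np.asarray(estimated))
    return err.mean(), err.max()

# example: mean_err, max_err = diameter_errors(d_measured_min, d_estimated_min)
```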

2. an algorithm was created to test the performance of each type of system in response to each of the signals separately or to a combination of two or more signals.5772/51629 155 Figure 10.org/10. Performance of individual and combined signals in the estimation of hole diameters in Ti6Al4V alloy by the MLP network Individual signals and a combination of two distinct signals were tested for the MLP net‐ work. The classified signals are illustrated in Figure 11. estimated by the MLP) for the 2024-T3 alloy. Isolated and combined use of signals To optimize the computational effort. Figure 11.MLP and ANFIS Applied to the Prediction of Hole Diameters in the Drilling Process http://dx. The best individual inputs for the Ti6Al4V alloy were the acoustic emission and Z force signals. the Z force and acceleration signals presented the lowest error. This procedure was adopted in order to identify a less invasive estimation method. . Minimum and maximum diameters (actual vs. Combined.doi. 5.

The acoustic emission signal combined with the Z force presented the best result with two combinations. the ANFIS system presented the best performance with the individ‐ ual Z force signal and with a combination of the Z force and acoustic emission signals. For the aluminum alloy. Figure 11 depicts the classified signals. Figure 13.156 Artificial Neural Networks – Architectures and Applications For the 2024-T3 alloy. the Z force and acceleration signals presented the lowest error. as indicated in Figure 14. Figure 12. as indicated in Figure 12. Performance of individual and combined signals in the estimation of hole diameters in Ti6Al4V alloy by the ANFIS system. Performance of individual and combined signals in the estimation of hole diameters in 2024-T3 alloy by the MLP system. . the best individual input was the Z force. In the ANFIS system. When combined. the Z force provided the best individual signal for the estimate of the drilled hole diameter in Ti6Al4V alloy.

5772/51629 157 Figure 14.MLP and ANFIS Applied to the Prediction of Hole Diameters in the Drilling Process http://dx. tolerance normally employed in industrial settings (≤ 25μm). Performance using the Z force Because the performance of the artificial intelligence systems in the tests was the highest when using the Z force signal.org/10. new tests were carried out with only this signal. The simulation of the MLP network for the aluminum alloy presented lower precision errors than the measurement instrument used in 44% of the attempts. 6. Thirty-three percent of the . Figure 15. 6. The configurations used in the previous tests were maintained in this test. Classification of estimation errors for the 2024-T3 alloy obtained by the MLP network.1. Performance of individual and combined signals in the estimation of hole diameters in 2024-T3 alloy by the ANFIS system. MLP The multilayer perceptron ANN was trained with the information of the Z force and the minimum and maximum measured diameters. tol‐ erance required for precision drilling processes (≤ 12μm). and the errors that would lead to a non-conformity (> 25μm). according to the following criteria: precision of the instrument (≤ 5μm). The errors were divid‐ ed into four classes.doi.
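The four error classes used in this section (instrument precision ≤ 5 μm, precision drilling tolerance ≤ 12 μm, general industrial tolerance ≤ 25 μm, and non-conformity > 25 μm) amount to a simple binning of the estimation errors. A small sketch of that classification, under the assumption that errors are supplied in millimetres, is:

```python
import numpy as np

def classify_errors(errors_mm):
    """Percentage of estimates in each tolerance class of Section 6."""
    err_um = np.abs(np.asarray(errors_mm)) * 1000.0   # mm -> micrometres
    bins = [0.0, 5.0, 12.0, 25.0, np.inf]
    labels = ['<=5 um', '<=12 um', '<=25 um', '>25 um']
    counts, _ = np.histogram(err_um, bins=bins)
    return dict(zip(labels, 100.0 * counts / err_um.size))
```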

Only 3% of the estimates performed by the artificial neural network would result in a product conformity rejection. For the titanium alloy. 6. but this time using only one input. ANFIS presented lower precision errors than the measurement in‐ strument employed in 40% of the attempts. ANFIS presented lower precision errors than the measurement in‐ strument employed in 35% of the attempts. Classification of estimation errors for the 2024-T3 alloy obtained by the ANFIS system. Only 6% of the estimates performed by the artificial neural network would result in a product conformity rejection.2. as indicated in Figure 17. Figure 16.158 Artificial Neural Networks – Architectures and Applications estimates presented errors within the range stipulated for precision holes and 17% for the tolerances normally applied in industry in general. ANFIS The ANFIS system was simulated in the same way as was done with the MLP network. Only 9% of the estimates performed by the artificial neural net‐ work would result in a product conformity rejection. the Z force. Thirty-four percent of the estimates presented . This procedure resulted in changes in the FIS structure due to the use of only one input. For the aluminum alloy. The simulation of the MLP network for the titanium alloy presented lower precision errors than the measurement instrument used in 45% of the attempts. Thirty-seven percent of the estimates presented errors within the range stipulated for precision holes and 15% for the tolerances normally applied in industry in general. Classification of estimation errors for the Ti6Al4V alloy obtained by the MLP network. Figure 17. Thirty-seven percent of the estimates presented errors within the range stipulated for precision holes and 19% for the tolerances normally used in industry in general.

with the former presenting a better result for the two alloys of the test specimen and the latter presenting good results only in the hole diameter estimation for the titanium alloy. The first system used here consisted of a multiple layer perceptron (MLP) artificial neural net‐ work. . the MLP system generated only 4% and 8% for the titanium and aluminum alloys.org/10. Two signals stood out: the Z force and acoustic emis‐ sion signals. respectively. another method was employed using the signal from only one sensor.doi. 7. the Z force was selected for the continuation of the tests. Only 11% of the estimates performed by the artificial neural network would lead to a product conformity rejection. Therefore. Conclusions Artificial intelligence systems today are employed in mechanical manufacturing processes to monitor tool wear and control cutting parameters. which involved the application of an adaptive neuro-fuzzy infer‐ ence system (ANFIS). Its performance was marked by the large number of signals used in its training and for its estimation precision.. Figure 18. i.. generated a large number of correct responses using the six availa‐ ble signals. i. As for its unacceptable error rates.e.MLP and ANFIS Applied to the Prediction of Hole Diameters in the Drilling Process http://dx. The second approach.e. to evaluate the economic feasibility of their application. The results obtained here are very encouraging in that fewer estimates fell within the range considered inadmissible. whose simulations generated the low‐ est error among the available signals. 45% for the titanium alloy and 33% for the aluminum alloy. which produced 52% of correct responses (errors below 5 μm) for the titanium alloy and 42% for the aluminum alloy. The results described herein demonstrate the applicability of the two systems in industrial contexts.5772/51629 159 errors within the range stipulated for precision holes and 15% for the tolerances normally used in industry. Classification of estimation errors for the Ti6Al4V alloy obtained by the ANFIS system. using the MLP network. This article presented a study of the application of two systems used in the dimensional control of a precision drilling process. as indicated in Figure 18. only 6% for the aluminum alloy and 3% for the titanium alloy. However. A total of 11% of errors for the titanium alloy and 20% for the aluminum alloy were classified above the admissible tolerances (> 25μm).

(2006). which implies few physical changes in the equipment to be monitored. particularly multilayer perceptron neural networks and the adaptive neuro-fuzzy inference systems. H. Carlos E. Artificial Neural Networks in Manufactur‐ ing: Concepts. Aguiar and Eduardo C. 1-11. Comp. Manuf.. IEEE Trans. & López de Lacalle.br Universidade Estadual Paulista “Júlio de Mesquita Filho” (UNESP). W. i. Tech. and CNPq (National Council for Scientific and Technological Development) and CAPES (Federal Agency for the Support and Evaluation of Postgraduate Education) for providing scholarships. Paulo R. G. We are also indebted to the company OSG Sulamericana de Ferra‐ mentas Ltda. (2002). Bianchi* *Address all correspondence to: bianchi@feb. Industrial Controls and Manufacturing. & Klocke.. and Perspectives. H. S. Ber‐ lin. 9% for the aluminum alloy and 11% for the titanium alloy. These systems showed high accuracy and low computational effort. An experi‐ mental investigation of the effect of coatings and cutting parameters on the dry drill‐ ing performance of aluminium alloys. is feasible. Based on the approaches used in this work. Geronimo. & Zhang. N... Aramendi. Fernando de Souza Campos. (1999). Fertigungsverfahren: drehen. bohren (7 ed. 230. Bauru campus. for manufacturing and donating the tools used in this research. [3] Konig.-C. June).e. Author details Thiago M. . Int J Adv Manuf Technol. San Diego. as well as a low implementation cost with the use of only one sensor. Acknowledgements The authors gratefully acknowledge the Brazilian research funding agencies FAPESP (São Paulo Research Foundation). frasen. 17(2). Brazil References [1] Kamen. [2] Huang. Pack. W. Cruz.). D.. L. Herranz. Springer-Verlag. 28.unesp. F. it can be stated that the use of artificial intelli‐ gence systems in industry..160 Artificial Neural Networks – Architectures and Applications The results produced by the ANFIS system also demonstrated a drop in the number of errors outside the expected range. [4] Rivero. S. for supporting this research work under Process # 2009/50504-0. Applications. 409. A. Academic Press.Part A. (1994.

525. 2(3) [7] Li. Computer and Information Science. R. S. August). 1 ed. 665-685.. M. (1999). 172-178. I. K.. [6] Yang. Y. O.. . S. 227-235. Artificial-neural-networksbased surface roughness: pokayoke system for end-milling operations. & Zhang. S. Proceedings of 2007 IEEE International Conference on Mechatronics and Automation. 231. 43. Accurate Estimation of Surface Roughness from Texture Features of The Surface Image Using an Adaptive Neuro-Fuzzy Infer‐ ence System. Ertunc. [19] Lezanski. 707-720. 846-852. & Pal. Drill wear monitoring based on current signals. 258-263. L. 1388-1394. [10] Haykin.. Journal of Materials Processing Technology.. August(2009). Drilling wear detection and classification using vibration signals and artificial neural nerwork.. Butterworth-Heinemann. (2007). 823. & Jyothi. 2. J. Neurocomputing.. Concepts in Artificial Intelligence: Designing Intelligent Machines. A study of surface roughness in drilling Technol using mathematical analysis and neural networks. [14] Resende. & Li. [8] Abu-Mahfouz. P. Pear‐ son Prentice Hall. 109.. & Ho. 95-100. S. Sistemas Inteligentes: Fundamentos e Aplicações.doi. W. International Journal of Machine Tools & Manufac‐ ture. Int J Adv Manuf. (2003). 109. S.. P. S. Barueri. [13] J. Kumehara. B. [11] Sanjay. (1993). (2001). Oxford.-S. Man and Cybernetics. C. 2. (2003). Precision Engineering. Fuzzy Model Identification Based on Cluster Estimation. X. & Çakir. J. J. 29. [15] Lezanski. M. 267-278. & Kang. An Intelligent System for Grinding Wheel Condition Monitoring. Journal of Intelligent and Fuzzy Systems. I. 29. J. Chakraborty. Manole. P. K. Back-propagation Wavelet Neu‐ ral Network Based Prediction of Drill Wear from Thrust Force and Cutting Torque Signals. (1988). & Picton. ANFIS: Adaptive-Network-Based Fuzzy Inference System. Neural Networks: A Compreensive Foundation. IEEE Transactions on Systems. [20] Chiu.. 34.. Fuzzy Sets and Systems. 15-33. C. Patparganj. Monitoring of drill flank wear using fuzzy back-propagation neural network. 71. M.. [9] Kandilli.MLP and ANFIS Applied to the Prediction of Hole Diameters in the Drilling Process http://dx.. Journal of Materials Processing Technology. C. D. (2007. (2008). [12] Huang. 23(3). G. (2001). 2 ed. [18] Sugeno. Online Monitoring of Tool Wear In Drilling and Milling By Multi-Sensor Neural Network Fusion. [17] Johnson.. Int. C..org/10. Y. T. Ho. 376. Wear. H. (1994). Sönmez.. Har‐ bin. X. Structure Identification of Fuzzy Model.. H. S. 544-549. P. (2001). An Intelligent System for Grinding Wheel Condition Monitoring. (2001). Manuf. K. (2006). Chenb. S. & Tso.. Technol. B. (2005). S. 28. [16] Lee. . 258-263. Jang. Adv.5772/51629 161 [5] Panda.

C. S. [22] Drozda. 29. Accurate Estimation of Surface Roughness from Texture Features of The Surface Image Using an Adaptive Neuro-Fuzzy Infer‐ ence System. 95-100.. T. K.162 Artificial Neural Networks – Architectures and Applications [21] Lee. Y. S.. Ma‐ chining. & Wick. .. SME. J. (1983). Tool and Manufacturing Engineers Handbook. (2005). C. Precision Engineering. Ho. Dearborn. J. & Ho. 1.

in turn. increasing the number and quality of training samples. As a result of this. Motivated by the highly-modular biological network. This is done by using modular neural nets (MNNs) that di‐ vide the input space into several homogenous regions. slow learning or over fitting can be done during the learning process. result in over and under learning at different regions [8]. due to the uniqueness of the classes therein. High coupling among hidden no‐ des will then. Modular Neural Nets (MNNs) present a new trend in neural network architecture de‐ sign.modular designs. Sometime.doi. At the same time. 16 logic function on one bit level. a salient reduction in the order of computa‐ tions and hardware requirements is obtained. artificial neural net designers aim to build architectures which are more scalable and less subjected to interference than the traditional non-modular neural nets [1]. Such approach is applied to imple‐ ment XOR functions. Introduction In this chapter. and techniques for avoiding local minina. There are now a wide variety of MNN de‐ signs for classification. Enlarging the network. causes a wide range of deviations from efficient learn‐ ing in the different regions of input space [3]. and 2-bit digital multiplier. and hence cross talking .Chapter 8 Integrating Modularity and Reconfigurability for Perfect Implementation of Neural Networks Hazem M. we introduce a powerful solution for complex problems that are required to be solved by using neural nets. there are other regions which show little or even no overlap. Usually there are regions in the class fea‐ ture space which show high overlap due to the resemblance of two or more input patterns (classes). Such tasks tend to introduce a wide range of overlap which. Compared to previous non. Non-modular classifiers tend to introduce high internal interfer‐ ence because of the strong coupling among their hidden layer weights [2]. El-Bakry Additional information is available at the end of the chapter http://dx. the network could not be learned for complex tasks.5772/53021 1. will not stretch the learning capabilities of the NN classifier be‐ yond a certain limit as long as hidden nodes are tightly coupled.org/10.

generally. 2. sub-solutions are integrated via a multi-module decision-making strategy. proved to be more efficient than non-modular alterna‐ tives [6]. Complexity reduction using modular neural networks In the following subsections. It. In comparison with non-MNNs. but meanwhile modules have to be able to give the multi module decision making strategy enough information to take accurate global decision [4]. each one is handled by a simple.1. In section 2. the neural net is trained to classify all of these four patterns at the same time. and learned by using backpropagation algorithm. There is no direct connections between the input and output layer as shown in Fig. MNN classifiers. A MNN classifier attempts to reduce the effect of these problems via a divide and conquer approach. The first uses fully connected neural nets with three neurons. However. we take into account the number of neurons and weights in both models as well as the number of computations during the test phase. 2. generally. In other words. y 0 1 0 1 O/P 0 1 1 0 . the task decomposition algo‐ rithm should produce sub tasks as they can be. In section 3. MNNs can not offer a real alternative to non-modular networks un‐ less the MNNs designer balances the simplicity of subtasks and the efficiency of the multi module decision-making strategy. and the other is in the output layer. we prove that MNNs can solve some problems with a little amount of requirements than non-MNNs. In this chapter. Hence. XOR function. Then. In this case. A simple implementation for XOR problem There are two topologies to realize XOR function whose truth Table is shown in Table 1 us‐ ing neural nets. we investigate the usage of MNNs in some binary problems. two of which are in the hidden layer. decomposes the large size / high complexi‐ ty task into several sub-tasks. and 16 logic functions on one bit level are simply implemented using MNN. all MNNs are feedforward type. In a previous paper [9]. another strategy for the design of MNNS is presented and applied to realize.164 Artificial Neural Networks – Architectures and Applications during learning [2].1. Here. and efficient mod‐ ule. Comparisons with conventional MNN are given. Truth table of XOR function. and 2-bit digital multiplier. we have shown that this model can be applied to realize non-binary data. x 0 0 1 1 Table 1. fast.

The second approach was presented by Minsky and Papert which was realized using two neurons as shown in Fig. The value of +1. This can be done at two steps: 1. we may consider the problem of classifying these four patterns as two indi‐ vidual problems. The first bit (X) may be used to select the function.5.doi. 2. Figure 2. X 0 0 1 1 Table 2. Y 0 1 0 1 O/P 0 1 1 0 ¯) Inverter (Y New Function Buffer (Y) .Integrating Modularity and Reconfigurability for Perfect Implementation of Neural Networks http://dx.5772/53021 165 Figure 1. Realization of XOR function using three neurons. Consider the second bit Y.5 for the output neuron insures that it will turn on only when it receives a net positive input greater than +0. The first representing logic AND and the other logic OR. Divide the four patterns into two groups. We deal with each bit alone. The weight of -2 from the hidden neuron to the output one insures that the output neuron will not come on when both input neurons are on [7].org/10. The value of +0. Using MNNs.5 for the threshold of the hidden neuron insures that it will be turned on only when both input units are on. Results of dividing XOR Patterns. 2. while the second group which contains the other two patterns represents an inverter as shown in Table 2. The first group consists of the first two patterns which realize a buffer. Realization of XOR function using two neurons.

one to realize the buffer. The first input X acts as a detector to select the proper weights as shown in Fig.4. Figure 3. Implementation of XOR function using only one neuron. Realization of XOR function using modular neural nets. It is clear that the number of computations and the hardware requirements for the new model is less than that of the other models.166 Artificial Neural Networks – Architectures and Applications So. and the other to represent the in‐ verter. Each one of them may be implemented by using only one neuron.3. we implement the weights. there is no need to the buffer and the neural net may be represented by using only one weight corresponding to the inverter as shown in Fig. A comparison between the new model and the two previous approaches is given in Table 3. we may use two neural nets. When realizing these two neurons. XOR function is realized by using only one neuron. . for XOR function. and perform only one summing operation. In a special case. As a result of us‐ ing cooperative modular neural nets. Figure 4.
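To make the single-neuron XOR scheme of Figure 4 concrete, the sketch below lets the first input select which weight and bias are applied to the second input (buffer versus inverter). The numerical weights are illustrative values chosen here, not the ones used in the chapter.

```python
def mnn_xor(x, y):
    """XOR via the modular scheme: x selects the weights of a single threshold neuron."""
    if x == 0:
        w, b = 1.0, -0.5     # buffer: neuron fires when y = 1
    else:
        w, b = -1.0, 0.5     # inverter: neuron fires when y = 0
    return 1 if w * y + b > 0 else 0

assert all(mnn_xor(x, y) == (x ^ y) for x in (0, 1) for y in (0, 1))
```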

1 1 1 1 X 0 0 1 1 .. of computations Hardware requirements First model Second model New model (one neuron) O(3) (three neurons) (two neurons) O(15) O(12) 3 neurons.. 1 1 1 1 C3 0 0 0 0 . X.. Function AND C1 0 0 0 0 . A comparison between MNN and non-MNNs used to imple‐ ment logic functions is shown in Table 5.. Each group has an in‐ put of 5 bits.Y ¯ . 1.2.. 1 1 1 1 Table 4.doi.. we must have another 4 bits at the input. The hidden layer contains 8 neurons... X ¯ Y. Y. NOR. 2... XY ¯. A similar procedure is done between hidden and output layer.. X the selection for each one of these functions. 9 weights 2 neurons. 1 0 1 1 .. XNOR. in order to control OR.. 7 weights 1 neuron. 2 switches. 2 weights. XOR.. As a result of this... 0 0 1 1 Y 0 1 0 1 .. The first group requires 4 neurons and 29 weights in the hidden layer..5772/53021 167 Type of Comparison No.... while the second needs 3 neurons and 22 weights.org/10. NAND. 1 inverter Table 3. Fig.. . Non-MNNs can classify these 64 patterns using a network of three layers.. we may implement only 4 summing operations in the hidden layer (in spite of 8 neurons in case of non-MNNs) where as the MSB is used to select which group of weights must be connected to the neurons in the hidden layer. X ¯ +Y.. These patterns can be divided into two groups.. So.. 5 shows the structure of the first neuron in the hidden layer..Integrating Modularity and Reconfigurability for Perfect Implementation of Neural Networks http://dx. Implementation of logic functions by using MNNs Realization of logic functions in one bit level (X.Y) generates 16 functions which are (AND.... while the output needs only one neuron and a total number of 65 weights are required. ¯ . there‐ by the total input is 6 bits as shown in Table 4. 0. Truth table of Logic function (one bit level) with their control selection. while the MSB is 0 with the first group and 1 with the second. X+Y ¯ ). ¯ X+Y C2 0 0 0 0 . A comparison between different models used to implement XOR function. 1 1 1 1 C4 0 0 0 0 . 0 1 0 1 O/p 0 0 0 1 ..

we may deal with each bit in out‐ put alone. to simplify the problem. 3. Implementation of 2-bits digital multiplier by using MNNs In the previous section. 65 weights Realization using MNNs O(54) 5 neurons. According to the truth table shown in Table 6. while the output one has 4 neurons. we make division in input. A comparison between MNN and non MNNs used to implement 16 logic functions. The hidden layer contains 3 neurons. Type of Comparison No. 1 inverter Table 5.168 Artificial Neural Networks – Architectures and Applications Figure 5. Using MNN we may simplify the problem as: W = CA (1) X = AD Ä BC = AD(B + C) + BC(A + D) = (AD + BC)(A + B + C + D) (2) Y = BD(A + C) = BD(A + B + C + D) (3) (4) Z = ABCD . Realization of logic functions using MNNs (the first neuron in the hidden layer). Non MNNs can realize the 2-bits multiplier with a network of three layers with total number of 31 weights. 51 weights. instead of treating the problem as mapping 4 bits in input to 4 bits in output. of computations Hardware requirements Realization using non MNNs O(121) 9 neurons. here is an exam‐ ple for division in output. 10 switches.
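The complement bars of equations (1)-(4) appear to have been lost in typesetting; reading them as W = A·C, X = AD ⊕ BC, Y = BD·(A·B·C·D)', and Z = A·B·C·D (with the two operands being (B A) and (D C)) makes the decomposition consistent with Table 6. The short exhaustive check below, a sketch added here for verification, confirms that reading over all 16 input patterns.

```python
# Brute-force check of the modular 2-bit multiplier decomposition with the
# complement bars restored.
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            for d in (0, 1):
                w = a & c
                x = (a & d) ^ (b & c)
                y = (b & d) & (1 - (a & b & c & d))
                z = a & b & c & d
                assert 8 * z + 4 * y + 2 * x + w == (2 * b + a) * (2 * d + c)
print("decomposition verified for all 16 input patterns")
```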

AD can be implemented using only one neuron.doi. This ¯ andD ¯ . A comparison between MNN and non-MNNs used to result with ( ABCD implement 2bits digital multiplier is listed in Table 7.6. An‐ other neuron is used to realize BC and at the same time oring (AD.Integrating Modularity and Reconfigurability for Perfect Implementation of Neural Networks http://dx. BC) as well as anding the ¯ ) as shown in Fig. Equation 2 resembles an XOR. eliminates the need to use two neurons to represent A but we must first obtain AD and BC. 2.5772/53021 169 Equations 1. 3 can be implemented using only one neuron. Truth table of 2-bit digital multiplier. Figure 6.org/10. Input Patterns D 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 C 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 B 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 A 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 Output Patterns Z 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 Y 0 0 0 0 0 0 0 1 0 0 1 1 0 0 1 0 X 0 0 0 0 0 0 1 1 0 1 0 1 0 1 1 0 W 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 1 Table 6. . The third term in Equation 3 can be implemented using the output from Bit Z with a negative (inhibitory) weight. Realization of 2-bits digital multiplier using MNNs.

multiplication. A comparison between MNN and non-MNNs used to implement 2-bits digital multiplier. but only at the expense of speed and silicon area consumption.16. In this paper. Designers may use either analog or digital technologies to implement neural network models. and multipliers) can realize several of the operations in neural networks. 20 weights Table 7. 4. and 2-bit digital divider. The ana‐ log approach boasts compactness and high speed. Nowadays.170 Artificial Neural Networks – Architectures and Applications Type of Comparison No. 4. summation. Such weights can be realized by resistors as shown in Fig. namely. Implementation of artificial neuron Implementation of analog neural networks means that using only analog computation [14.12]. is the interconnection of artificial neurons that tend to simulate the nervous system of human brain [15]. On the other hand. One of the main reasons for using analog electronics to realize network hardware is that simple analog circuits (for example adders. thus making neural networks possible candidates for real-time applications. full-sub‐ tractor. 2-bit digital multiplier. Analog circuit techniques provide area-efficient implementations of the functions required in a neural network. Artificial neural network as the name indicates.1. Neural networks are modeled as simple processors (neurons) that are connected together via weights. of computations Hardware requirements Realization using non MNNs O(55) Realization using MNNs O(35) 7 neurons. and the sigmoid transfer character‐ istic [13]. Other advantages of hardware realizations as compared to soft‐ ware implementations are the lower per unit cost and small size system. there is a growing demand for large as well as fast neural processors to provide solutions for difficult problems. we describe the design of a reconfigurable neural network in analog hardware and demonstrate experimentally how a reconfigurable artificial neural network approach is used in implementation of arithmetic unit that including full-adder. 31 weights 5 neurons. The weights can be positive (excitatory) or negative (inhibitory). . Hardware implementation of MNNs by using reconfigurable circuits Advances in MOS VLSI have made it possible to integrate neural networks of large sizes on a single-chip [10.18]. 7. Hardware realizations make it possible to execute the forward pass op‐ eration of neural networks at high speeds. sigmoid. digital implemen‐ tations offer flexibility and adaptability.

....... The main problem with the electronic neural networks is the realiza‐ tion of resistors which are fixed and have many problems in hardware implementation [17]. The equivalent resistance between terminal 1 and 2 is given by [19]: . ¼¼ . The corresponding resistors that represent these weights can be determined as follow [16]: win = . 8.R f / Rin i = 1. So the calculated resistors corresponding to the obtainable weights can be implemented by us‐ ing CMOS transistors operating in continuous mode (triode region) as shown in Fig. n (5) Wpp = æ Ro Ro Ro ö + + .18].... Implementation of positive and negative weights using only one opamp. + ç1+ R1p R2p Rpp ÷ è ø n æ ö Ro ç 1 + å Win ÷ è ø Rpp i1 (6) The exact values of these resistors can be calculated as presented in [14. As a consequence...doi. Such resistors are not easily adjustable or controllable.org/10. they can be used neither for learning.... 2.5772/53021 171 Figure 7..Integrating Modularity and Reconfigurability for Perfect Implementation of Neural Networks http://dx. nor can they be used for recall when another task needs to be solved. The summing circuit accumulates all the input-weighted signals and then passes to the output through the transfer function [13].. The computed weights may have positive or negative values.....


Figure 8. Two MOS transistors as a linear resistor.

R_{eq} = 1 / [ K ( V_g - 2 V_{th} ) ]    (7)
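Equation (7) can be inverted to find the gate voltage that programs a desired equivalent resistance. The sketch below does exactly that; the transconductance parameter k and threshold voltage v_th are hypothetical example device values, not figures taken from the chapter.

```python
def gate_voltage_for_resistance(r_eq_ohm, k=5e-4, v_th=0.7):
    """Invert equation (7): gate voltage V_g for a target equivalent resistance.

    k (A/V^2) and v_th (V) are example device parameters (assumptions).
    """
    return 1.0 / (k * r_eq_ohm) + 2.0 * v_th

# e.g. a 1200-ohm synapse resistor (a value appearing in Table 11) would need
# roughly gate_voltage_for_resistance(1200) ~= 3.1 V with these example parameters.
```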

4.2. Reconfigurability The interconnection of synapses and neurons determines the topology of a neural network. Reconfigurability is defined as the ability to alter the topology of the neural network [19]. Using switches in the interconnections between synapses and neurons permits one to change the network topology as shown in Fig. 9. These switches are called "reconfiguration switches". The concept of reconfigurability should not be confused with weight programmability. Weight programmability is defined as the ability to alter the values of the weights in each synapse. In Fig. 9, weight programmability involves setting the values of the weights w1, w2, w3,...., wn. Although reconfigurability can be achieved by setting weights of some synapses to zero value, this would be very inefficient in hardware.

Figure 9. Neuron with reconfigurable switches.
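A minimal way to picture the distinction drawn above is to model the programmable weights and the reconfiguration switches of Figure 9 as two separate vectors: changing w is weight programmability, while toggling the binary switch vector s changes the topology. The values below are illustrative only.

```python
import numpy as np

def neuron_output(x, w, s, bias):
    """Threshold neuron whose synapses are gated by reconfiguration switches s."""
    return 1 if np.dot(x, w * s) + bias > 0 else 0

w = np.array([7.5, 7.5, -10.0])   # programmable weights (illustrative values)
s = np.array([1, 1, 0])           # switches: third synapse disconnected
# y = neuron_output(x, w, s, bias=0.0)
```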

Integrating Modularity and Reconfigurability for Perfect Implementation of Neural Networks http://dx.doi.org/10.5772/53021

173

Reconfigurability is desirable for several reasons [20]:

1. Providing a general problem-solving environment.
2. Correcting offsets.
3. Ease of testing.
4. Reconfiguration for isolating defects.

5. Design arithmetic and logic unit by using reconfigurable neural networks
In a previous paper [20], a neural design for logic functions by using modular neural networks was presented. Here, a simple design for the arithmetic unit using reconfigurable neural networks is presented. The aim is to have a complete design for an ALU by using the benefits of both modular and reconfigurable neural networks.

5.1. Implementation of full adder/full subtractor by using neural networks

The full-adder/full-subtractor problem is solved practically and a neural network is simulated and implemented using the back-propagation algorithm for the purpose of learning this network [10]. The network is trained to map the functions of full-adder and full-subtractor. The problem is to classify the patterns shown in Table 8 correctly.
I/P            Full-Adder      Full-Subtractor
x  y  z        S     C         D     B
0  0  0        0     0         0     0
0  0  1        1     0         1     1
0  1  0        1     0         1     1
0  1  1        0     1         0     1
1  0  0        1     0         1     0
1  0  1        0     1         0     0
1  1  0        0     1         0     0
1  1  1        1     1         1     1

Table 8. Truth table of full-adder/full-subtractor
The computed values of weights and their corresponding values of resistors are described in Table 9. After completing the design of the network, simulations are carried out to test both the design and performance of this network by using H-spice. Experimental results confirm the proposed theoretical considerations. Fig. 10 shows the construction of full-adder/fullsubtractor neural network. The network consists of three neurons and 12-connection weights.

174

Artificial Neural Networks – Architectures and Applications

I/P Weight 1 2 3 Bias

Neuron (1) Resistance 7.5 7.5 7.5 -10.0 Weight 11.8 Ro 11.8 Ro 11.8 Ro 0.1 Rf

Neuron (2) Resistance 15 15 -10 -10 Weight 6.06 Ro 6.06 Ro 0.1 Rf 0.1 Rf

Neuron (3) Resistance 15 15 -10 -10 6.06 Ro 6.06 Ro 0.1 Rf 0.1 Rf

Table 9. Computed weights and their corresponding resistances of full-adder/full-subtractor

Figure 10. Full-adder/full-subtractor implementation.

5.2. Hardware implementation of 2-bit digital multiplier 2-bit digital multiplier can be realized easily using the traditional feed-forward artificial neural network [21]. As shown in Fig. 11, the implementation of 2-bit digital multiplier us‐ ing the traditional architecture of a feed-forward artificial neural network requires 4-neu‐ rons, 20-synaptic weights in the input-hidden layer, and 4-neurons, 20-synaptic weights in the hidden-output layer. Hence, the total number of neurons is 8-neurons with 40-synaptic weights.

Integrating Modularity and Reconfigurability for Perfect Implementation of Neural Networks http://dx.doi.org/10.5772/53021

175

Figure 11. Bit digital multiplier using traditional feed-forward neural network

In the present work, a new design of 2-bit digital multiplier has been adopted. The new de‐ sign requires only 5-neurons with 20-synaptic weights as shown in Fig. 12. The network re‐ ceives two digital words, each word has 2-bit, and the output of the network gives the resulting multiplication. The network is trained by the training set shown in Table 10.

I/P B2 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 B1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 A2 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 A1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 O4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 O3 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0

O/P O2 0 0 0 0 0 0 1 1 0 1 0 1 0 1 1 0 O1 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 1

Table 10. 2-bit digital multiplier training set

176

Artificial Neural Networks – Architectures and Applications

Figure 12. A novel design for2-Bit multiplier using neural network

During the training phase, these input/output pairs are fed to the network and in each itera‐ tion; the weights are modified until reached to the optimal values. The optimal value of the weights and their corresponding resistance values are shown in Table 11. The proposed circuit has been realized by hardware means and the results have been tested using H-spice computer program. Both the actual and computer results are found to be very close to the correct results.
Neuron
(1)

I/P
A1 B1 Bias A1 B2

W. Value
7.5 7.5 -10.0 7.5 7.5 -10.0 -30.0 20.0 7.5 7.5 -10.0 -10.0 3.0 3.0 3.0 3.0 -10.0 7.5 7.5 -10.0

Resistor
1200 1200 100 1450 1450 100 33 618 1200 1200 100 100 1200 1200 1200 1200 100 1200 1200 100

(2)

Bias N4 N5 A2

(3)

B2 bias N4 A1 A2

(4)

B1 B2 bias A2

(5)

B1 Bias

Table 11. Weight values and their corresponding resistance values for digital multiplier.

the total number of neurons is 8-neurons with 35-synaptic weights. The network receives two digital words. the implementation of 2-bit digital divider using neural network requires 4-neurons.org/10. The network is trained by the training set shown in Table 12 Figure 13. As shown in Fig. I/P B2 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 B1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 A2 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 A1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 O4 1 0 0 0 1 0 0 0 1 0 0 1 1 0 0 0 O3 1 0 0 0 1 0 1 1 1 0 0 0 1 0 1 0 O/P O2 1 0 0 0 1 0 0 0 1 1 0 0 1 1 0 0 O1 1 0 0 0 1 1 0 0 1 0 1 0 1 1 1 1 Table 12.3. Hardware implementation of 2-bit digital divider 2-bit digital divider can be realized easily using the artificial neural network. Bit digital divider using neural network. each word has 2-bit. 15-synaptic weights in the hidden-output layer. and the output of the network gives two digital words one for the resulting division and the other for the result‐ ing remainder.Integrating Modularity and Reconfigurability for Perfect Implementation of Neural Networks http://dx. Hence. 20-synaptic weights in the input-hidden layer. 13. and 4-neurons.5772/53021 177 5.doi. 2-bit digital dividier training set .

5 7.5 -10 7. -17.178 Artificial Neural Networks – Architectures and Applications The values of the weights and their corresponding resistance values are shown in Table 13.5 -17.5 7.5 -4.5 7.5 -10 -4. Weight values and their corresponding resistance values for digital divider.5 -10 -20 -30 10 25 -25 17. Neuron I/P A1 A2 W.5 5 5 5 7.5 10 10 -5 10 10 -5 10 10 -5 Resistor 56 56 2700 2700 2700 1200 1200 100 1200 56 1200 100 1200 100 220 1200 1200 220 100 50 33 1200 500 40 700 1000 1000 220 1000 1000 220 1000 1000 220 (1) B1 B2 Bias A1 A2 (2) B1 B2 Bias A1 (3) A2 B2 Bias A1 A2 (4) B1 B2 Bias A1 A2 B1 B2 N3 Bias N1 (5) (6) N3 Bias N1 (7) N4 Bias N1 (8) N2 Bias Table 13. Val.5 7. .5 -17.5 -10 7.

Both the Fig. Conclusion We have presented a new model of neural nets for classifying patterns that appeared expen‐ sive to be solved using conventional models of neural nets. addition. correct results. actual and found to be very close to the 13. Arithmetic operations namely. it can be applied to manipulate nonbinary data. The net‐ bias bias work includes full-adder.Integrating Modularity and Reconfigurability for Perfect Implementation of Neural Networks A1 Output-Layerhttp://dx. The computed values of weights and their corresponding values of resistors are described in Tables 9. (8) (4) O4 B2 circuit is realized by hardware The proposed means and the results are tested using H-spice bias computer results are bias computer program. neurons and weights. 14 shows of only 8-neurons. After completing the design of the network.10. 67-connection weights. 2-bit digital multiplier. simulations are carried out to test both the design and performance of this network by using H-spice. and 32-reconfiguration switches. The proposed network consists (3) (7) B1 O3 Fig.15. A2 (2) (6) A2 B1 B2 (5) O1 O2 I/P A1 Connection weights Full-Adder Selection C1 C2 Neurons A2 FullSubtractor 2 Bit Digital Multiplier 2 Bit Digital Divider Reconfiguration switches Neurons B1 O1 O2 O3 O4 B2 Figure 14. realization of problems using MNNs resulted in reduction of the number of computations. and division can be re‐ bias bias alized easily using a reconfigurable artificial neural network. . multiplication. the block diagram of the arithmetic operation using reconfigurable neural network. 2-Bit digital divider using neural network. and 2-bit digital divider. This approach has been intro‐ duced to realize different types of logic problems. Block diagram of arithmetic unit using reconfigurable neural network. Computer results are found bias bias to be very close to the correct results. full-subtractor.116. Also. 6. subtraction. compared to non MNNs.doi.5772/53021 Hidden-Layer 179 A1 (1) The results have been tested using H-spice computer program. We have shown that. Experimental results confirm the proposed theoretical considerations as shown in Tables 14.org/10.

79 -1.3761 -2.46 -2.63 3.34 -2.63 -1.498 3.79 -2.068 -2.46 3.46 3.73 -2.409 -2.79 -2.399 -1.415 -2.63 -1.74 -2.384 -3.390 -2.3081 3.63 -2.710 Neuron (4) Pract. -3.068 3.73 Sim.438 -3. -2.390 Table 15.4157 -2.384 -3. -3.78 -2.413 -3.390 -3.34 -2.46 -2.79 -2.45 -1.46 -2. -2.86 Sim.2741 3.423 -3.46 3.79 -2.4120 X Y Z 0 0 0 0 1 1 1 1 0 0 1 1 0 0 1 1 0 1 0 1 0 1 0 1 Table 14.78 -2.409 -2.34 -2.75 3.068 -2.34 -2.46 -2.423 -3.72 3.415 -2.78 -2.068 -3.45 3.5968 3.355 -3.068 -2.79 3.45 -2.438 -3.5968 3.390 -2.34 -2.72 -2.72 3.74 1.34 3.413 -3.3741 -3. -3.3081 3.3081 3.068 -2.415 -2.3741 3.373 -2. -2.423 -3.415 -3.73 3.75 3.46 Sim.415 -2.397 3.78 -2.34 3.79 -2.45 3.3761 3.498 3.355 -1.3081 -3.068 -3.79 -2.390 3.78 -2. -3.390 3. Mansoura University.409 -2.3741 3.068 3.79 -2.78 -2.355 -1.068 3.4366 -3.4135 3.398 Neuron (3) Pract.70 Sim.34 3.79 -2.438 -3. Practical and Simulation results after the summing circuit of 2-bit digital multiplier Author details Hazem M.63 -1.498 3.63 3.413 -1.390 Neuron (2) Pract.79 -2.79 -2.068 3.46 -2.4120 Neuron(3) Practical -2.46 -2.355 -1.46 3.399 -2.068 -2.45 -2.068 3.424 -2.4372 -3.79 -2.438 -3.46 -2.79 -2.45 -2.72 3.78 -2.519 Neuron (5) Pract.72 -2.46 Simulated -3. -3.384 2.415 -2.399 3.63 -1.373 3.423 -3.498 3.34 -2.5968 -2.34 -2.34 3.3081 -3.068 -2.068 -2. -2.373 -3.73 3.34 3.3741 -3.72 3.423 -3.3761 3.79 -2.180 Artificial Neural Networks – Architectures and Applications I/p Neuron(1) Practical -2.4135 3.415 -3.390 -3.46 3.355 3.4231 Neuron(2) Practical -2. Egypt .355 -1.75 -2.46 Sim.373 -2.74 -2.79 -2.384 -3. El-Bakry Faculty of Computer Science & Information Systems.75 3.79 -2.34 -2.79 -2.34 3.068 3.355 -1.34 -2.46 3. -2.355 3.068 -2.45 -2.415 -2.74 -2.63 -1.78 -2.79 -1.46 3.79 3.314 -1.447 -3.498 -3.498 -3.34 -2.75 -2.78 -2. Practical and Simulation results after the summing circuit of full-adder/full-subtractor Neuron (1) Pract.45 3.78 -2.423 -3.48 Simulated -3.45 -2.46 -2.48 Simulated -3.



Chapter 9

Applying Artificial Neural Network Hadron - Hadron Collisions at LHC

Amr Radi and Samy K. Hindawi

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/51273

1. Introduction

High Energy Physics (HEP), which targets particle physics, searches for the fundamental particles and forces that construct the world surrounding us and seeks to understand how our universe works at its most fundamental level. The elementary particles of the Standard Model are gauge bosons (force carriers) and fermions, which are classified into two groups: leptons (e.g., electrons and muons) and quarks (the constituents of hadrons such as protons and neutrons). The study of the interactions between these elementary particles requires enormously high-energy collisions, as at the LHC [1-8], currently the highest-energy hadron collider in the world with √s = 14 TeV. Experimental results provide excellent opportunities to discover the missing particles of the Standard Model, and the LHC may well point the way toward particle physics beyond the Standard Model.

The proton-proton (p-p) interaction is one of the fundamental interactions in high-energy physics. To understand the interactions of fundamental particles, it is important to have a complete understanding of the reaction mechanism. The particle multiplicity distribution, as one of the first measurements made at the LHC, is used to test various particle production models; it is based on different physics mechanisms and also provides constraints on model features. Some of these models are based on a string fragmentation mechanism [9-11] and some are based on Pomeron exchange [12]. The behavior of p-p interactions is complicated due to the nonlinear relationship between the interaction parameters and the output. In order to fully exploit the enormous physics potential, multivariate data analysis is needed and AI techniques are vital.

Recently, different modeling methods based on soft computing systems have included the application of Artificial Intelligence (AI) techniques, and evolutionary algorithms in particular have a powerful presence in this field [13-17]. These techniques are becoming useful as alternate approaches to

conventional ones [18]. AI techniques such as Artificial Neural Networks (ANN) [19], Genetic Algorithms (GA) [20], Genetic Programming (GP) [21] and Gene Expression Programming (GEP) [22] can be used as alternative tools for the simulation of these interactions [13-17, 21-23]. In this chapter, we have discovered the functions that describe the multiplicity distribution of the charged shower particles of p-p interactions at different high energies using the GA-ANN technique.

This chapter is organized in five sections. Section 2 gives a review of the basics of the NN and GA techniques. Section 3 explains how the NN and GA are used to model the p-p interaction. Finally, the results and conclusions are provided in Sections 4 and 5, respectively.

2. An overview of Artificial Neural Networks (ANN)

An ANN is a network of artificial neurons which can store, gain and utilize knowledge. A way to encompass the NNs studied in the literature is to regard them as dynamical systems controlled by synaptic matrices (i.e., Parallel Distributed Processes, PDPs) [24]. Some researchers in ANNs decided that the name "neuron" was inappropriate and used other terms, such as "node"; however, the use of the term neuron is now so deeply established that its continued general use seems assured. The motivation for using a NN approach is its learning algorithm, which learns the relationships between the variables in sets of data and then builds models to explain these relationships (mathematically dependent). In the following sub-sections we introduce some of the concepts and the basic components of NNs.

2.1. Neuron-like Processing Units

A processing neuron is based on neural functionality: it computes the summation of the products of the input pattern elements {x1, x2, ..., xp} and their corresponding weights {w1, w2, ..., wp}, plus the bias θ. A single-layer network is one layer of neurons, while a multilayer network consists of more than one layer of neurons. Some important concepts associated with this simplified neuron are defined below.

Let {x1, x2, ..., xp} be the set of input patterns whose classification the network is supposed to learn, and let {d1, d2, ..., dp} be the corresponding desired output patterns. The pair (xp, dp) is called a training pattern. It should be noted that xp is an n-dimensional vector {x1p, x2p, ..., xnp} and dp is an n-dimensional vector {d1p, d2p, ..., dnp}. Let nℓ be the number of neurons in the ℓth layer and let ui^ℓ be the ith neuron in the ℓth layer. The input layer is called the xth layer and the output layer is called the Oth layer. The output of a neuron ui^0 of the input layer is simply the input xip (for input pattern p). The weight of the link between neuron uj^ℓ in layer ℓ and neuron ui^(ℓ+1) in layer ℓ+1 is denoted by wij^ℓ. For the other layers, the network input net_pi^(ℓ+1) to a neuron ui^(ℓ+1) for input pattern p is usually computed as follows:

net_pi^(ℓ+1) = Σ_{j=1}^{nℓ} wij^ℓ o_pj^ℓ − θi^(ℓ+1)     (1)

where o_pj^ℓ (= x_pj^(ℓ+1)) is the output of neuron uj^ℓ of layer ℓ and θi^(ℓ+1) is the bias value of neuron ui^(ℓ+1) of layer ℓ+1. For the sake of a homogeneous representation, θi is often substituted by a "bias neuron" with a constant output of 1; this means that biases can be treated like weights.

2.2. Activation Functions

The activation function converts the neuron input into its activation (i.e., a new state of activation) by f(netp). The sigmoid function, a non-linear function, is often used as an activation function. The logistic function is an example of a sigmoid function and has the following form:

o_pi^ℓ = f(net_pi^ℓ) = 1 / (1 + e^(−β net_pi^ℓ))     (2)

where β determines the steepness of the activation function. In the rest of this chapter we assume that β = 1, which is done throughout the remainder of the text.

2.3. Network Architectures

Network architectures are of different types (single-layer feedforward, multi-layer feedforward, and recurrent networks) [25]. In this chapter, multi-layer feedforward networks are considered; these contain one or more hidden layers placed between the input and output layers, and the hidden layers enable the extraction of higher-order features. A common form of NN is the three-layer NN, an example of which is shown in Fig. 1: the three layers (input, hidden and output) of neurons are fully interconnected. The input layer receives an external activation vector and passes it via weighted connections to the neurons in the first hidden layer [25]; this allows the variation of input conditions to affect the output.

[Figure 1. Example of a three-layer feedforward neural network.]
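As an illustration of equations (1) and (2), the short Python sketch below (written for this text; the layer sizes and random weights are placeholders, not the chapter's trained network) propagates one input vector through a small feed-forward network using the logistic activation with β = 1.

```python
import numpy as np

def logistic(net, beta=1.0):
    """Equation (2): o = 1 / (1 + exp(-beta * net)); beta = 1 in this chapter."""
    return 1.0 / (1.0 + np.exp(-beta * net))

def layer_output(x, W, theta):
    """Equation (1): net_i = sum_j w_ij * x_j - theta_i, followed by the activation."""
    return logistic(W @ x - theta)

# Forward pass through a small feed-forward network with 3 inputs, one hidden
# layer of 15 units and a single output (illustrative random weights only).
rng = np.random.default_rng(0)
x = np.array([0.4, -1.2, 0.7])
W1, t1 = rng.normal(size=(15, 3)), rng.normal(size=15)
W2, t2 = rng.normal(size=(1, 15)), rng.normal(size=1)

hidden = layer_output(x, W1, t1)
output = layer_output(hidden, W2, t2)
print(output)
```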

2.4. Neural Networks Learning

To use a NN, it is essential to have some form of training, a method through which the values of the weights in the network are adjusted to reflect the characteristics of the input data. No unique learning algorithm exists for the design of NNs. A set of well-defined rules for the solution of a learning problem is called a learning algorithm, and learning algorithms differ from each other in the way in which the adjustment Δwij to the synaptic weight wij is formulated.

Supervised learning is a class of learning rules for NNs in which teaching is provided by telling the network the output required for a given input. Weights are adjusted in the learning system so as to minimize the difference between the desired and actual outputs for each input training pattern. In other words, the objective of the learning process is to tune the weights in the network so that the network performs the desired mapping of input to output activation. This is illustrated in Fig. 2.

Figure 2. Example of Supervised Learning.

The error εpi for the ith neuron ui^o of the output layer o for the training pair (xp, tp) is computed as:

εpi = tpi − opi^o     (3)

This error is used to adjust the weights in such a way that the error is gradually reduced; the actual response of each output neuron thus approaches the desired response for that neuron. The training process stops when the error for every training pair is reduced to an acceptable level, or when no further improvement is obtained. When the network is trained sufficiently, it will produce the closest correct output for a presented set of input data. NNs are claimed to have the feature of generalization, through which a trained NN is able to provide correct outputs for a set of previously unseen input data; training determines the generalization capability of the network structure.

An example of a supervised learning rule is the delta rule, which aims to minimize the error function. A method known as "learning by epoch" first sums gradient information for the whole pattern set and then updates the weights; this method is also known as "batch learning"

and most researchers use it for its good performance [25]. Each weight update tries to minimize the summed error of the pattern set. The error function can be defined for one training pattern pair (xp, dp) as:

Ep = ½ Σ_{i=1}^{no} εpi²     (4)

Then, the error function can be defined over all the patterns, known as the Total Sum of Squared (TSS) errors:

E = ½ Σ_{p=1}^{m} Σ_{i=1}^{n} εpi²     (5)

The most desirable condition that we could achieve in any learning algorithm is εpi ≈ 0; if this condition holds for all patterns in the training set, we can say that the algorithm found a global minimum. The weights in the network are changed along a search direction to drive the weights toward the estimated minimum. The weight updating rule for the batch mode is given by:

wij^ℓ(s+1) = Δwij^ℓ(s) + wij^ℓ(s)     (6)

where wij^ℓ(s+1) is the updated weight of wij^ℓ of layer ℓ in the sth learning step, and s is the step number in the learning process. The proposed ANN model was trained using the Levenberg-Marquardt optimization technique [26].

In training a network, the available input data set consists of many facts and is normally divided into two groups: one group is used as the training data set and the second group is retained for checking and testing the accuracy of the performance of the network after training. The training set is used to train the ANN model by adjusting the link weights of the network model; it should include data covering the entire experimental space. This means that the training data set has to be fairly large to contain all the required information and must include a wide variety of data from different experimental conditions, including different formulation compositions and process parameters.

In case over-fitting or over-learning occurs during the training process, it is usually advisable to decrease the number of hidden units and/or hidden layers. If the error stops decreasing, or alternatively starts to rise, the ANN model starts to over-fit the data and the training must be stopped at that point. Obviously, if the network is not sufficiently powerful to model the underlying function, over-learning is not likely to occur, and the training errors will drop to a satisfactory level.
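The batch training loop described above can be sketched as follows. This is a minimal illustration with plain gradient descent on randomly generated placeholder data, assuming a single hidden layer of logistic units and a linear output; it is not the Levenberg-Marquardt procedure of ref. [26] that the authors actually used.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy training pairs (x_p, d_p); in the chapter these would come from the
# experimental p-p data set (80% training / 20% testing split).
X = rng.uniform(-1, 1, size=(80, 3))
d = np.sin(X.sum(axis=1, keepdims=True))          # placeholder targets

def logistic(v):
    return 1.0 / (1.0 + np.exp(-v))

n_pat = X.shape[0]
W1, b1 = rng.normal(scale=0.5, size=(3, 10)), np.zeros(10)
W2, b2 = rng.normal(scale=0.5, size=(10, 1)), np.zeros(1)
lr = 0.5

for epoch in range(3000):
    # Forward pass for the whole pattern set ("learning by epoch" / batch mode).
    h = logistic(X @ W1 + b1)
    o = h @ W2 + b2
    err = o - d                                    # o_p - t_p, the negative of eq. (3)
    E = 0.5 * np.sum(err ** 2)                     # total sum of squared errors, eq. (5)

    # Back-propagated gradients, averaged over the pattern set.
    gW2 = h.T @ err / n_pat
    gb2 = err.mean(axis=0)
    dh = (err @ W2.T) * h * (1.0 - h)              # derivative of the logistic is h(1-h)
    gW1 = X.T @ dh / n_pat
    gb1 = dh.mean(axis=0)

    # Weight update along the search direction, as in eq. (6).
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

print("final TSS error:", round(float(E), 4))
```

With the Levenberg-Marquardt algorithm the update direction would instead be built from the Jacobian of the pattern errors, but the overall data flow (forward pass over the whole training set, error evaluation, weight update per epoch) stays the same.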

We will discuss GP further in Section 3. indicating its performance on the problem. was suggested in [33]. The problem itself represents the environment. These individuals are perturbed using the operators.39]. This provides general heuristics for exploration in the environment. and on GAs in particular. the members of the population represent possible solutions to a par‐ ticular optimization problem. whose descriptions are mainly based on [30. each of the two varieties implements genetic operators in a different manner. Although. we give more background information about natural and artificial evolu‐ tion in general.33. selection. mutation and survival continues until some termination criterion is met.2. Natural and Artificial Evolution As described by Darwin [40]. Varieties of these evolutionary computational models have been proposed and used in many applications.37]. An overview of Genetic Algorithm 3. it is very simple from a biological point of view. 3. Introduction Evolutionary Computation (EC) uses computational models of evolutionary processes based on concepts in biological theory.1. Each individual in the population is given a measure of its fitness in the environment. first implemented in [34] and popularised in [35-37]. The notion of using ab‐ stract syntax trees to represent programs in GAs. Although similar at the highest level. including optimization of NN parameters and searching for new NN learning rules. evolution is the process by which a population of organisms gradually adapt over time to enhance their chances of surviving. In the following two sections.188 Artificial Neural Networks – Architectures and Applications the network is not sufficiently powerful to model the underlying function. 3. This is achieved by ensur‐ ing that the stronger individuals in the population have a higher chance of reproducing and creating children (offspring). over-learning is not likely to occur. This cycle of evaluation. who strongly stressed recombination as the energetic potential of evolution [32]. We must ap‐ ply each potential solution to the problem and assign it a fitness value. We will refer to them as Evolu‐ tionary Algorithms (EAs) [27-29] EAs are based on the evolution of a population which evolves according to rules of selection and other operators such as crossover and mutation. these algorithms are sufficiently complex to provide strong and powerful adaptive search mechanisms. This thesis concentrates on the tree-based variety. crossover. Genetic Algorithms (GAs) were developed in the 70s by John Holland [30]. The term Genetic Programming is used to refer to both tree-based GAs and the evolutionary generation of programs [38.36. The two essential features of natural evolution which we need to maintain are propagation of more adaptive features to future generations (by applying a .4.32. and the training errors will drop to a satisfactory level. Genetic Programming (GP). Selection favors individual with high fit‐ ness. In artificial evolution.35.

Then. a loop is en‐ tered based on whether or not the algorithm's termination criteria are met in line 4. The particular numbers of parents for that operator are then se‐ lected. The operator is then applied to generate one or more new children. the method used to select a crossover point for the recombination operator and the seed value used for the random number generator. Lines 7 through 10 contain the part of the algorithm in which new individuals are generated. Line 6 contains the control code for the inner loop in which a new generation is created. the choice of methods for implementing probabil‐ ity in selection. The Genetic Algorithm GAs is powerful search and optimization techniques. 3. In addition. First. • A population is the variety of chromosomes that evolves from generation to generation. such as the size of the population. GA is capable of finding good solutions quickly [32]. predefined to guide the GA. one GA can be used for a variety of optimization problems. The components are: a representation (encoding) of the problem.Hadron Collisions at LHC http://dx. To solve an optimization problem. • A chromosome is an encoding representation of a phenotype in a form that can be used. based on the mechanics of natural se‐ lection [31]. • Evaluation is the interpretation of the genotype into the phenotype and the computation of its fitness. an initial population is generated in line 2. • Fitness is the measure of the performance of an individual on the problem. a population initialization procedure and a set of genetic operators. the probabili‐ ties of each genetic operator being chosen.org/10. The structure of a typical GA can be described as follows [41] In the algorithm. Some basic terms used are: • A phenotype is a possible solution to the problem. • Genes are the parts of data which make up a chromosome. the GA is inherently parallel. Also. the probability of mutation of a gene in a selected individual. Accord‐ ingly. Finally. since a population of potential solutions is maintained.5772/51273 189 selective pressure which gives better solutions a greater opportunity to reproduce) and the heritability of features from parent to children (we need to ensure that the process of repro‐ duction keeps most of the features of the parent solution and yet allows for variety so that new features can be explored) [30].3. there are a set of GA control parameters. the algorithm computes the fitness for each member of the initial population in line 3. a GA requires four components and a termination criteri‐ on for the search.Applying Artificial Neural Network Hadron . the new children are added to the new generation.doi. • A generation (a population set) represents a single step toward the solution. the method by which genetic operators are chosen. Subsequently. a genetic operator is selected. The advantage of GAs is that they have a consistent structure for different problems. a fitness evaluation function. GAs are used for a number of different application areas [30]. .
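The GA loop just described (initialize a population, evaluate fitness, select parents, apply crossover and mutation to build a new generation, repeat until a termination criterion is met) can be sketched in a few lines. The snippet below is an illustrative toy example written for this text, not code from the chapter: it assumes tournament selection, one-point crossover and gene-wise Gaussian mutation, and maximizes a trivial fitness function.

```python
import numpy as np

rng = np.random.default_rng(2)

def fitness(ind):
    """Toy fitness: how close the chromosome is to the zero vector (higher is better).
    In the hybrid model of Section 4 the fitness is instead based on the network SSE."""
    return -np.sum(ind ** 2)

POP, GENES, GENERATIONS = 50, 8, 200
P_CROSS, P_MUT = 0.9, 0.05

pop = rng.uniform(-5, 5, size=(POP, GENES))            # initial population

for gen in range(GENERATIONS):
    fit = np.array([fitness(ind) for ind in pop])       # evaluate every individual

    def tournament():
        i, j = rng.integers(POP, size=2)
        return pop[i] if fit[i] > fit[j] else pop[j]

    children = []
    while len(children) < POP:                           # build the next generation
        p1, p2 = tournament().copy(), tournament().copy()
        if rng.random() < P_CROSS:                       # one-point crossover
            cut = int(rng.integers(1, GENES))
            p1[cut:], p2[cut:] = p2[cut:].copy(), p1[cut:].copy()
        for child in (p1, p2):
            mask = rng.random(GENES) < P_MUT             # gene-wise mutation
            child[mask] += rng.normal(scale=0.5, size=int(mask.sum()))
            children.append(child)
    pop = np.array(children[:POP])                       # survivors replace the old generation

best = max(pop, key=fitness)
print("best individual:", np.round(best, 3))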

not deterministic ones • GA can provide a number of potential solutions to a given problem • GAs operate on fixed length representations.g. these applica‐ tions do not address the problem of training neural networks. The termination criterion is tested and the algorithm is ei‐ ther repeated or terminated. The most significant differences in GAs are: • GAs search a population of points in parallel. .only the objective function and corresponding fit‐ ness levels affect the directions of search • GAs use probabilistic transition rules. since they still depend on other training methods to adjust the weights. GAs have been applied successfully to the problem of designing NNs with supervised learning proc‐ esses. However. Fitness values are computed for each individual in the new generation. for evolving the architecture suitable for the problem [42-47]. 4. The Proposed Hybrid GA .190 Artificial Neural Networks – Architectures and Applications Lines 11 and 12 serve to close the outer loop of the algorithm. These values are used to guide simulated natural selection in the new generation. e. not a single point • GAs do not require derivative information (unlike gradient descending methods.ANN Modeling Genetic connectionism combines genetic search and connectionist computation. SBP) or other additional knowledge .

Modeling by Using ANN and GA This study proposed a hybrid model combined of ANN and GA (We called it “GA–ANN hybrid model”) for optimization of the weights of feed-forward neural networks to improve the effectiveness of the ANN model. In this work. Using a bitstring is not essentially the best approach to evolve the architecture. we should know how to encode architectures (neurons. For instance when networks with different topolo‐ gies may show similar learning and generalization abilities. The network architecture designing can be explained as a search in the architecture space that each point represents a different topology. 4. Building the architectures by means of GAs is strongly reliant on how the features of the network are encoded in the genotype.2. The process key in the evolutionary design of neural architectures is shown in Fig.1. Genetic algorithm is run to have the optimal parameters of the architectures. we defined the fitness function as the mean square error (SSE). Not only have GAs been used to perform weight training for supervised learning and for reinforcement learning applications. where an architecture specification indicates how many hidden units a network should have and how these units should be connected. the search space makes things even more difficult in some cases.org/10. GAs for Training NNs GAs have been used for training NNs either with fixed architectures or in combination with constructive/destructive methods. In addition. layers. GAs have been applied to the problem of finding NN architectures [52-57].5772/51273 191 4.We construct a ge‐ netic algorithm. During the weight train‐ ing and adjusting process. the performance evaluation depends on the training method and on the initial conditions (weight initialization) [59]. The top‐ ologies of the network have to be distinct before any training process. networks with similar structures may have different performances. This can be made by replacing traditional learning algo‐ rithms such as gradient-based methods [48]. the alternative provided by destructive and constructive techniques is not satisfactory. a determination has to be made concerning how the information about the architecture should be encoded in the genotype. To find good NN architectures using GAs. Assuming that the structure of these networks has been decided. Encoding of NNs onto a chromosome can take many different forms. which can search for the global optimum of the number of hidden units and the connection structure between the inputs and the output layers.The approach is to use the GA-ANN model that is enough intelligent to discover functions for p-p interac‐ tions (mean multiplicity distribution of charged particles with respects of the total center of .doi. the effectiveness and efficiency of the learning process. weights and biases of all the neurons which are joined to create vectors. alternatively. and a controlled connectivity. The definition of the architecture has great weight on the network performance. The search space is huge. As discussed in [58].Hadron Collisions at LHC http://dx.Applying Artificial Neural Network Hadron . the fitness functions of a neural network can be defined by con‐ sidering two important factors: the error is the different between target and actual outputs. Therefore. Additionally. but they have also been used to select training data and to translate the output behavior of NNs [49-51]. and connections) in the chromosomes that can be manipulated by the GA. even with a limited number of neurons.

mass energy).

Figure 3. Overview of the GA-ANN hybrid model.

The model is trained and tested using experimental data to simulate the p-p interaction; the data set is subdivided into two sets (training and prediction). GA-ANN discovers a new model by using the training set, while the prediction set is used to examine its generalization capability; in this way, GA-ANN has the potential to discover a new model. To measure the error between the experimental data and the simulated data we used statistical measures, namely the sum of squared errors (SSE): the total deviation of the response values from the fit to the response values. It is also called the summed square of residuals and is usually labeled SSE:

SSE = Σ_{i=1}^{n} (yi − ŷi)²     (7)

where ŷi = b0 + b1 xi is the predicted value for xi and yi is the observed data value occurring at xi.
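Before turning to the results, the weight-encoding scheme described above can be made concrete with the following sketch. It is an assumption-laden illustration written for this text (the 3-10-1 architecture, truncation selection, operator settings and synthetic data are arbitrary placeholders): every weight and bias of a small feed-forward network is concatenated into one flat chromosome, and the SSE of equation (7) over the training set serves as the quantity the GA tries to minimize.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy training data standing in for the (n, sqrt(s), eta) -> Pn pairs.
X = rng.uniform(-1, 1, size=(60, 3))
y = np.sin(X.sum(axis=1))                        # placeholder target values

N_IN, N_HID, N_OUT = 3, 10, 1
N_GENES = N_IN * N_HID + N_HID + N_HID * N_OUT + N_OUT   # weights + biases

def decode(chrom):
    """Split a flat chromosome back into weight matrices and bias vectors."""
    i = 0
    W1 = chrom[i:i + N_IN * N_HID].reshape(N_IN, N_HID); i += N_IN * N_HID
    b1 = chrom[i:i + N_HID]; i += N_HID
    W2 = chrom[i:i + N_HID * N_OUT].reshape(N_HID, N_OUT); i += N_HID * N_OUT
    b2 = chrom[i:i + N_OUT]
    return W1, b1, W2, b2

def predict(chrom, X):
    W1, b1, W2, b2 = decode(chrom)
    h = np.tanh(X @ W1 + b1)                     # hidden layer
    return (h @ W2 + b2).ravel()                 # linear output

def sse(chrom):
    """Equation (7): sum of squared errors over the training set."""
    return np.sum((y - predict(chrom, X)) ** 2)

POP, GENS = 60, 300
pop = rng.normal(scale=0.5, size=(POP, N_GENES))

for g in range(GENS):
    errors = np.array([sse(ind) for ind in pop])
    parents = pop[np.argsort(errors)[:POP // 2]]  # keep the fitter half (lower SSE)
    children = []
    while len(children) < POP:
        p1, p2 = parents[rng.integers(len(parents), size=2)]
        cut = int(rng.integers(1, N_GENES))       # one-point crossover
        child = np.concatenate([p1[:cut], p2[cut:]])
        mask = rng.random(N_GENES) < 0.02         # mutation
        child[mask] += rng.normal(scale=0.2, size=int(mask.sum()))
        children.append(child)
    pop = np.array(children)

best = pop[np.argmin([sse(ind) for ind in pop])]
print("best SSE:", round(float(sse(best)), 4))
```

In the chapter's actual model the same idea is applied to a much larger GA-optimized network of 229 connections, with the GA parameters (population 4000, 1000 generations, crossover probability 0.9, mutation probability 0.001) listed in the appendix.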

The proposed GA-ANN hybrid model has been used to model the multiplicity distribution of the charged shower particles produced in p-p collisions at different values of the total center of mass energy: 0.9 TeV, 2.36 TeV and 7 TeV. Figure 3 shows the schematic of the GA-ANN model. The architecture of the GA-ANN has three inputs and one output: the inputs are the charged-particle multiplicity (n), the total center of mass energy (√s) and the pseudo-rapidity (η), and the output is the charged-particle multiplicity distribution (Pn).

Data collected from experiments are divided into two sets, namely a training set and a testing set; 80% of the data are used for training and 20% for testing. The training set is used to train the GA-ANN, and the testing set is used to confirm the accuracy of the proposed model, ensuring that the relationship between inputs and outputs, based on the training and test sets, is real. The proposed ANN model was trained using the fast Levenberg-Marquardt algorithm (LMA) [26].

5. Results and discussion

The input patterns of the designed GA-ANN hybrid have been trained to produce target patterns that model the pseudo-rapidity distribution. Simulation results based on both the ANN and the GA-ANN hybrid model are given in Figure 2-a, b and c, respectively. We notice that the curves obtained by the trained GA-ANN hybrid model show an exact fit to the experimental data in the three cases.

Table 1 shows a comparison between the ANN model and the GA-ANN model for the prediction of the pseudo-rapidity distribution. In the 3x15x15x1 ANN structure, we noticed that by increasing the number of connections to 285 the error decreases to 0.01, but this needs more calculations; the final weights and biases after training are given in Appendix A. Therefore, in order to obtain the optimal structure of the ANN, we have used a GA as a hybrid model. By using the GA optimization search, we have obtained the minimum error (= 0.0001), while the number of connections in the GA-ANN model is only 229; the weights and biases used for the designed network are provided in Appendix A. Hence, the GA-ANN hybrid model is able to model the charged-particle multiplicity distribution exactly.

Table 1. Comparison between the different training algorithms (ANN and GA-ANN) for the charged-particle multiplicity distribution.

Model    | Structure                 | Number of connections | Error value | Learning rule
ANN      | 3x15x15x1                 | 285                   | 0.01        | LMA
GA-ANN   | GA optimization structure | 229                   | 0.0001      | GA

This indicates that the GA-ANN hybrid model is more efficient than the ANN model.

Figure 4. ANN and GA-ANN simulation results for the charged-particle multiplicity distribution Pn(n, η, √s) of shower particles in p-p interactions.

6. Conclusions

This chapter presents GA-ANN as a new technique for constructing the functions of the multiplicity distribution of charged particles, Pn(n, η, √s), of the p-p interaction. The discovered models show a good match to the experimental data.

8137 -1.0259 0.3420 -3.7322 -0.1986 2.7632 7.1758 -1. s ) that are not used in the training session.6309 0.0875 4.2517 -3.1861 3.5772/51273 195 models show good match to the experimental data.4795 2.6232 1.6272 -2.Hadron Collisions at LHC http://dx.4267].2120 -1. Appendices The efficient ANN structure is given as follows: [3x15x15x1] or [ixjxkxm].6292 -1.4145 -0.7109 1. η.0691 -2.2605 -3.2561 1.1033 2.9551 3.1519 1.4979 -0.0889 -2.2408 1.1349 -1.7565 -1.5001 0.1091 -0.Applying Artificial Neural Network Hadron .6118 3. Moreover.0028 0.4762 -0.9749 0. they are capable of testing ex‐ perimental data for Pn (n. the testing values of Pn (n.0896 -3.1010 3.3577 2. η.6987 -1.8041 -3. Finally.5615 1.0600 -0. .4889 2. Consequence.6012 -1.2589 -1.0757 1.0526 1.0299 -2.2456 2.1887 -1.8996 -1.doi.3688 2. Weights coefficient after training are: Wji = [3.org/10.2080 -1.8254 3.5844 -3.4683 -0. s ) in terms of the same parameters are in good agreement with the experimental data from Particle Data Group.4153 2.4291 -0.7820 3.1415 0.6741 -2.6633 -2.4896 2.5519 0.0116 -3.0855 3.1359 4.5717 -2.4374 2.2805 2.5538 -3.0831 -2.2488 -4. we conclude that GA-ANN has become one of important research areas in the field of high Energy physics.2442 -3.

3894 0.2705 0.6121 0.1584 0.1840 0.2948 0.3128 0.0761 Columns 6 through 10 0.2058 0.1441 0.2235 0.2792 0.196 Artificial Neural Networks – Architectures and Applications Wkj = [0.4323 0.2209 0.0313 0.1559 0.0683 0.2207 0.2947 0.1043 0.2169 0.4361 .0745 0.0408 -0.0093 0.0693 0.2790 -0.0466 0.2529 0.2601 0.1658 0.1875 0.5625 -0.2253 -0.0981 -0.2345 0.5882 -0.4946 0.1570 0.2387 0.1706 0.0354 0.5006 0.3568 0.3538 -0.4724 0.4562 0.4795 0.3603 0.1207 0.2412 0.1347 0.0694 0.1975 0.0815 0.2498 0.3619 0.5147 0.5657 0.0403 0.0074 0.0851 0.2559 0.5528 0.3498 -0.2738 0.4505 0.5802 0.3951 0.4801 0.3498 0.3870 -0.4270 0.4404 -0.0125 0.0421 0.5591 0.5976 0.4031 -0.0297 -0.3383 0.0407 0.5055 -0.2670 0.5506 0.0612 -0.0802 -0.2806 -0.1928 0.0185 0.7176 0.1497 -0.0063 -0.2012 0.5570 0.2047 0.3935 0.0441 0.3939 0.2459 -0.3268 -0.3034 -0.3294 0.1972 0.0425 0.3198 -0.3967 0.5626 0.1943 -0.4214 0.4698 0.1302 -0.0036 0.2678 0.2156 -0.0022 0.5513 -0.4295 0.2430 -0.4887 0.2899 0.1572 0.0875 0.4338 0.1719 0.0518 0.0697 0.4321 0.2797 -0.2790 -0.4085 0.1416 0.1081 0.4581 0.2685 0.7558 0.2102 0.2453 0.5578 0.4784 0.0592 0.3784 0.6226 -0.4253 0.5506 0.

2594 -0.6920 -0.1362 0.4397 -0.6241 0.3373 0.0722 -1.3421 -0.5859 -1. probability of mutation = 0.Applying Artificial Neural Network Hadron .4317 0.1663 -0.3486 -0.0356 -0.doi. bk = [ 1.0603 -0.3710 0.4752 -0.6321 0.7598 2.3519 -0.3694 0.0746 -0.6813 -0. Acknowledgements The authors highly acknowledge and deeply appreciate the supports of the Egyptian Acade‐ my of Scientific Research and Technology (ASRT) and the Egyptian Network for High Ener‐ gy Physics (ENHEP).5433 -0.6909 -0.3520 0. probability of crossover = 0. bi = [-4.9766 -3.3211 0.6500 0.1078].9368 1.4892 -0.3888 0.5772/51273 197 Columns 11 through 15 -0.3883].4299 -0.4167 0.6584 -0. The optimized GA-ANN: The standard GA has been used.3700 -0.2311 -0.3029 0.8559 3.2205 1.2109 -0.7272 0.0432 0.4833 -0.9297 -0.0527 -0.2157 3.3083 0.1448 1.0868 -0.1745 -0.7971 1.9.1614 0.1678 0.2071].1201 -0.3205 0.6988 -0.4557 0.0100 -0. The parameters are given as follows: Generation = 1000.7175 -2.4464 -0. Population = 4000.6123 -0.0061 -0.4295 -0. Fitness function is SSE.6932 ].5000 -1.0601 -0.0710 0.0190 -0.Hadron Collisions at LHC http://dx.0033 -0.0462 -0.8312 -3.2422 0.7441 -0.7304 0.5040 2.7821 0.2915 1.1326 1. A neural network had been optimized as 229 of neurons.3752 0.7305 0.2013 -0.0185 0.3455 0.0978 0.7113 0.7214 -1.1412 0.7950].1315 1.7100 1.org/10.1649 0.0808 -0.3927 -0.0539 0.2346 0.1756 -3.9283 1.3430 2. Wmk = [0. bj = [-4.1384 -0.1151 0.4155 0.0457 0.1575 0.0577 -0.4692 -0.001.4147 -0.7610 -0.6169 -0.4066 0.0721 0.6547 0.4848 -0.9395 0.4963 -0.5964 -1.0837 -0. bm = [-0.0802 0.9749 1.4088]. .2768 -0.

(2007).. Sci. 96. Prentice Hall. Engel. [19] Haykin. Cairo. 20. 409.. (2010). & Slansky. Rev. (2011). 551. 08. Neural networks a comprehensive foundation (2nd ed. J. Phys. EPL.. (2012). 1143.. (2009). 01. Egypt / Center of Theoretical Physics at the British University in Egypt (BUE). Eur. Rev. Comput. Lett. B. 178. (2011). Rev. Hindawi2 *Address all correspondence to: Amr. Intel. R.. Abbassia. Phys. Lett. (2011). Z. Abbassia. Nucl. IEEE T. [17] El-dahshan. Meth. High Energy Phys. D. A. 351. [10] Hwa. [15] Link. (1971). 707. B. Artif. 22. M. 1847. Phys. (1970). Phys. 21002. [5] ATLAS Collaboration. [2] CMS Collaboration. R. B. (2011). J. 53-68. C. Instrum. R. 693. Phys. Lett. [7] ALICE Collaboration. Lett.. Lett. Faculty of Sciences.. 2221. Roesler. & Sherwood.radi@cern. (2008). R.. 022002.. D. R. E. Mod. 105. [4] ATLAS Collaboration. Rev. 504. Ranft and S. D1. L. & Radi. 1817. 330-348. [6] ALICE Collaboration... Radi. S. Phys. Faculty of Sciences. Commun. & El-Bakry. & Whiteson. D52.. Mod. 68345-354. M. [12] Engel. 1790. Rev. M. [8] TOTEM Collaboration. .). J. C.. High Energy Phys.. [14] Teodorescu... Cairo. J. [16] El-Bakry. [13] Teodorescu. L. 26. Phys. Yaseen. J. Egypt 2 Department of Physics. [11] Hwa. 18. Int. J. A. Phys. [18] Whiteson. Eng. (2009). Ain Shams University. C. Amr. Egypt References [1] CMS Collaboration. (1999).. Phys. Nucl.. S. (1972). C.. 141. [3] CMS Collaboration. (2010).ch 1 Department of Physics. S. D5. Phys. 1203. Int. Phys. (2010).198 Artificial Neural Networks – Architectures and Applications Author details Amr Radi1* and Samy K. Appl.. Phys. [9] Jacob. Y. (1995). J. (2006). (2005).. 079. Ain Shams University. Phys. & . 53.

143-147. Discovery of Neural network learning rules using genetic pro‐ gramming. [26] Hagan. (ed. Gene Expression Programming: Mathematical Modeling by an Artificial Intelligence. R. [34] Cramer. 5(1). Genetic Programming: On the Programming of Computers by Means of Natural Selection. Hammel. T. Evolutionary Computation.). M. J. University of Michi‐ gan Press.. (1994). 6. . 159-166. C. [33] Richard. B. (2006). (1997). Germany. NJ. Nichael Lynn. E. & Menhaj. Adaptation in Natural and Artificial Systems. BEAGLE A Darwinian Approach to Pattern Recognition. 3-17. Birmingham University. 861-867. Kybernetes . Cambridge. . A. Forsyth (1981). T. (1992). Introduction to Evolutionary Algorithms. Ann Arbor. Evolutionary Algorithms in Theory and Practice. [28] Fogel. (1996). (1989). D. P. (1992). In 2005 IEEE Nuclear Science Symposium Conference Record. B. [27] Back. (1995). IEEE Trans. Amr. J. the School of computers Sciences. MIT Press. Oxford University Press. John J. An Introduction to Simulated Evolutionary Optimization. IEEE Trans. Genetic Programming: On the Programming of Computers by means of Natural Selection. [24] Radi. Berlin. H. T. (1985). (2000). The MIT Press. IEEE Transactions on Neural Networks. (1994). (2003). 1(1). MIT Press. D. E. New York. R. U. 1. H. B. 10(3). M. & Schwefel. PHD. (1975). J. Neural Networks. IEEE Press. [30] Holland. MA. J. E. Genetic Algorithm in Search Optimization and Machine Learning. A representation for the Adaptive Generation of Sim‐ ple Sequential Programs. 2nd Edition. CMU. [25] Teodorescu. Addison-Wesley. High energy physics data analysis with gene expression pro‐ gramming. Springer.. [32] Goldberg.. L. Michigan [31] Fogel. (2005).Applying Artificial Neural Network Hadron . ((1975). [22] Ferreira.org/10. New York.Hadron Collisions at LHC http://dx. Evolutionary Computation: Com‐ ments on the History and Current State. 3-14.doi. Evolutionary Computation: Toward a New Philosophy of Ma‐ chine Intelligence. R. [36] Koza. D. ) Adaptation in Natural and Artificial Systems The University of Michigan Press Ann Arbor. [21] Koza. [35] Koza. J.. [29] Back. Springer-Verlag. [23] Eiben. & Smith. Grefenstette. H.5772/51273 199 [20] Holland. Genetic Programming II: Automatic Discovery of Reusable Pro‐ grams. (1994). Piscataway. Training feedforward networks with the Mar‐ quardt algorithm. in Proceedings of an International Conference on Genetic Algo‐ rithms and the Applications.

5(1). Wu. P. G. (2010). Nordin. H. A genetic algorithm tutorial. & Francone. [40] Darwin. C. J. in Artificial Intelligence Review. World Scientific. & Davis. (1994). D. (1999). X. in: IEEE Transactions of Neural Networks. in: Neural Networks. Genetic Programming: An Introduction: On the Automatic Evolution of Computer Programs and its Applications. & Johnson. Loh. F.. C. An Introduction to Genetic Algorithms.. A. (1994). General Asymmetric Neural Networks and Structure Design by Genetic Algorithms. Norton. Morgan Kaufmann. Darrel. A. in Computers & Chemical Engineering. Perga‐ mon Press. [56] Bornholdt. A. S. [42] A new algorithm for developing dynamic radial basis function neural network mod‐ els based on genetic algorithms [43] Sarimveis.. & Su.. 5.. 65-85. 327-334. [54] van Rooij. in Machine Learning. H. [48] Genetic Algorithm based Selective Neural Network Ensemble [49] Zhou. (2004).. G. (1996). Lawrence. The Autobiography of Charles Darwin: With original omissions restored. J. (1959). G. S.. 39-53. Morgan Kaufmann.. Xiangui.. [52] Training Feedforward Neural Networks Using Genetic Algorithms [53] Montana. . E. Vittorio.. W. & Keane. in Proceedings of the 17th Interna‐ tional Joint Conference on Artificial Intelligence. C. Keller. M. Stefan. D. & Graudenz.. MIT Press. (1998). Y. [50] Modified backpropagation algorithms for training the multilayer feedforward neural networks with hard-limiting neurons [51] Yu. Statistics and Computing. P. [46] Hierarchical genetic algorithm based neural network design [47] Yen. Mazarkakis. F. David J. Jullien.. [55] Maniezzo. Jiang. L. in IEEE Symposium on Combinations of Evolutionary Compu‐ tation and Neural Networks. H.200 Artificial Neural Networks – Architectures and Applications [37] Koza. Bennett.. Z. Alexandridis. (1993). F. Neural network training using genetic algorithms. H. J.. K. (1989). [41] Whitley. R. [39] Mitchell. A. (2001). W. Jain. (2000). in Proceedings of Canadi‐ an Conference on Electrical and Computer Engineering. Genetic Programming III: Darwinian Invention and Problem Solving. F. N. Shifei.. & Miller. G. Genetic Evolution of the Topology and Weight Distribu‐ tion of Neural Networks. 4. [38] Banzhaf. & Bafas. Dirk.. & Lu. Chunyang. Andre.. R. R. (1992). [44] An optimizing BP neural network algorithm based on genetic algorithm [45] Ding.. Nora Barlow.. M. Singapore. & Chen. edited with appendix and notes by his grand-daughter.. (1996).

[58] Nolfi.doi. U. (1995f). PhD thesis. S. Desired answers do not correspond to good teaching in‐ puts in ecological neural networks. Evolution of Artificial Neural Networks Using a Two-dimensional Representation. [62] Miller. Springer. ed. Learning to adapt to changing environments in evolving neural networks.. Hiroaki. [59] Nolfi. X. 379-384.. Efficient evolution of asymmetric recurrent neural networks using a two-dimensional representation. & Parisi. New York. [61] Pujol. (1994). P.5772/51273 201 [57] Kitano. Wien and New York. 5. J..Applying Artificial Neural Network Hadron . S. [60] Nolfi.org/10. 33.. [64] Figueira Pujol. Designing neural networks using genetic algorithms. G. Parisi. 75-97. R. 3(1). (1990a). Adaptive Behavior. Williams. F. Todd. UK. 461-476. G. S. University of Birmingham. in: Complex Systems [4]. & Elman. 1.. (1989). L. (1994). Kent and J. In Encyclopedia of Computer Science and Technology. & Hedge. 130-141. Designing Neural Networks Using Genetic Algorithms with Graph Generation Systems. 1-4. D. 5-28. School of Computer Science. Adaptive Behavior. Neural processing letters. 643-649. S. D... (1999). Proceedings of the third international conference on genetic algorithms and their applications. Evolutionary artificial neural networks. & Parisi. (1993). M. . Learning and evolution in neural networks.Hadron Collisions at LHC http://dx. F. [63] Mandischer. 137-170. D. (1998). C. Representation and evolution of neural networks. M. Paper pre‐ sented at Artificial neural nets and genetic algorithms proceedings of the internation‐ al conference at Innsbruck. Austria. & Poli. A. (1996).. Joao Carlos. [65] Yao. Also appearing in Encyclopedia of Library and Information Science. Proceedings of the first European workshop on genetic programming (EUROGP). Marcel Dekker Inc. J.


Chapter 10

Applications of Artificial Neural Networks in Chemical Problems

Vinícius Gonçalves Maltarollo, Káthia Maria Honório and Albérico Borges Ferreira da Silva

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/51275

1. Introduction

In general, chemical problems are composed of complex systems. There are several chemical processes that can be described by different mathematical functions (linear, quadratic, exponential, hyperbolic, logarithmic functions, etc.), and there are also thousands of calculated and experimental descriptors/molecular properties that are able to describe the chemical behavior of substances. In several experiments, many variables can influence the desired chemical response [1,2]. Usually, chemometrics (the scientific area that employs statistical and mathematical methods to understand chemical problems) is largely used as a valuable tool to treat chemical data and to solve complex problems [3-8].

Initially, the use of chemometrics grew along with computational capacity. In the 80's, when small computers with relatively high calculation capacity became popular, chemometric algorithms and software started to be developed and applied [8,9]. Nowadays, there are several software packages and complex algorithms available for commercial and academic use as a result of this technological development, and the interest in robust statistical methodologies for chemical studies has also increased.

One of the most employed statistical methods is partial least squares (PLS) analysis [10,11]. This technique does not perform a simple regression as multiple linear regression (MLR) does; the PLS method can be employed with a large number of variables because it treats the colinearity of descriptors. Due to the complexity of this technique, when compared to other statistical methods, PLS analysis is largely employed to solve chemical problems [10,11].

All ANN methodologies share the concept of “neurons” (also called “hidden units”) in their architecture. The main advantages of ANN techni‐ ques include learning and generalization ability of data.): MATLAB [12].23]. It is im‐ portant to mention that since 90’s many studies have related advantages of applying ANN techniques when compared to other statistical methods [23. these events stimulated the extension of initial perceptron archi‐ tecture (a single-layer neural network) to multilayer networks [37. In few years (1988). Werbos [40] proposed the back-propagation learning algorithm. which helps the ANN popularization. 2. CoMFA [17-18]. Therefore. artificial neural networks (ANNs) may provide accurate results for very complex and non-linear prob‐ lems that demand high computational costs [22. etc. In addition to Holpfield’s study. analytical chemistry. In 1982. the PLS method is used to analyse only line‐ ar problems. the relationship becomes non-linear [21].38]. Hopfield [39] described a new approach with the introduction of nonlinearity between input and output data and this new architecture of perceptrons yielded a good improvement in the ANN re‐ sults.1) to study chemical engineering processes. The first applications of ANNs did not present good results and showed several limitations (such as the treatment of linear correlated data). [41] that reported the employing of a multilayer feed-forward neural net‐ work (described in Session 2. MLR. Artificial Neural Networks (ANNs) The first studies describing ANNs (also called perceptron network) were performed by McCulloch and Pitts [34. etc [32-33]. The initial idea of neural networks was devel‐ oped as a model for neurons. one of the first applications of ANNs in chemistry was performed by Hoskins et al. Due to the popularization.26-31]. R-Studio [13]. CoMSIA [19] and LTQA-QSAR [20] that also use the PLS analysis to treat their generated descriptors. when a large number of phenomena and noise are present in the calibration problem. In general. pharmaceutical. ANN techniques are a family of mathematical models that are based on the hu‐ man brain functioning. However. One of the most employed learning algorithm is the back-propagation and its main advantage is the use of output informa‐ tion and expected pattern to error corrections [24]. fault tolerance and inherent contextual information processing in addition to fast computation capacity [25].43]. two studies employing ANNs were published with the aim to predict the secondary structure of proteins [42. Each neuron represents a synapse as its biological counterpart. In the same year.204 Artificial Neural Networks – Architectures and Applications We can cite some examples of computational packages employed in chemometrics and containing several statistical tools (PLS. In general. Statistica [14] and Pirouette [15]. their biological counterparts. there is a large interest in ANN techniques. However. food research. There are some molecular modeling methodologies as HQSAR [16].35] and Hebb [36]. The theory of some ANN methodologies and their applications will be presented as follows. in special in their ap‐ plications in various chemical fields such as medicinal chemistry. biochemistry. Therefore. theoreti‐ cal chemistry. each hidden unity is constituted of activation functions that control .

doi. Table 1 illustrates some examples of activation functions. the new input data is modified [25.5772/51275 205 the propagation of neuron signal to the next layer (e. (B) artificial neuron or hidden unity. In recurrent networks. positive weights simulate the excita‐ tory stimulus and negative weights simulate the inhibitory ones). (D) ANN synapses. Therefore. the connection pat‐ tern is characterized by loops due to the feedback behavior. The feedback or recurrent networks are the ANNs where the connections among layers occur in both directions. Figure 1 shows a comparison between a human neuron and an ANN neuron. (A) Human neuron.g. radial basis function (e. some authors com‐ pare the hidden unities of ANNs like a “black box” [44-47]. There are several typical activation function used to compose ANNs.org/10. there is a connection flow from the input to output direction.Applications of Artificial Neural Networks in Chemical Problems http://dx.44-47].44-48]. Figure 1. non-linear correla‐ tions can be treated. The general purpose of ANN techniques is based on stimulus–response activation functions that accept some input (parameters) and yield some output (response). The difference be‐ tween the neurons of distinct artificial neural networks consists in the nature of activation function of each neuron.g. if more than one neuron is used to compose an ANN.g. (C) biological synapse. The feed-forward networks are composed by unidirectional connections be‐ tween network layers. sigmoid (e. . Different ANN techniques can be classified based on their architecture or neuron connec‐ tion pattern. In this kind of neural network. gaussian) [25. when the output signal of a neuron enter in a previous neuron (the feedback connection). A hidden unit is com‐ posed by a regression equation that processes the input information into a non-linear output data. hyperbolic tangent). In other words. linear. Due to the non-linearity between input and output. as threshold function.

Some activation functions used in ANN studies. There are an extensive number of ANN types and Figure 2 exemplifies the general classification of neural networks showing the most common ANN techniques employed in chemistry. 0 ifn0} φ (r) = { 1ifv ≥ 0 0ifv 0 1 1 φ (v ) = v − v 2 2 0v ≤ − 1 2 { 1−v ≥ 1 2 φ (r) = tanh(n / 2) = 1 − exp( − n) / 1 + exp( − n) v 1 − exp( − v ) φ (v ) = tanh = 2 1 + exp( − v ) ( ) ^2 φ (r ) = e − (ε v ) Table 1. The next topics explain the most common types of ANN employed in chemical problems. Figure 2. n − ½ n½. According to the previous brief explanation.206 Artificial Neural Networks – Architectures and Applications threshold linear φ (r) = {1 r ≥ ½. . the neural networks can be clas‐ sified according to their connections pattern. Each ANN architecture has an intrinsic behavior. the nature of ac‐ tivation functions and the learning algorithm [44-47]. Therefore. 0n ≤ − ½} hyperbolic tangent gaussian φ (r) = {1 ifn ≥ 0. the number of hidden unities. ANN techniques can be classified based on some features. 1996 [25]). The most common neural networks employed in chemistry (adapted from Jain & Mao.

In other words.48-50].Applications of Artificial Neural Networks in Chemical Problems http://dx. a three layer feed-forward network is composed by an input layer. . This algorithm uses the error values of the output layer (prediction) to adjust the weight of layer connections. MLP is also called feed-forward neural networks because the data information flows only in the forward direction.48-50]. There are several learning algorithms for MLP such as conjugate gradient descent. An important characteristic of feed-forward networks is the su‐ pervised learning [38. Levenberg-Marquardt. Therefore. Figure 3 displays the influence of number of layers on the pattern recognition ability of neural network.org/10. The training or learning step is a search process for a set of weight values with the objective of reducing/minimizing the squared errors of prediction (experimental x estimated data). two hidden layers and the output layer [38. Each connection between the input and hidden layers (or two hidden layers) is similar to a synapse (biological counterpart) and the input data is modified by a determined weight.1.. Influence of the number of layers on the pattern recognition ability of MLP (adapted from Jain & Mao. The higher the number of hidden layers. the produced output of a layer is only used as input for the next layer. 1996 [25]).48-50]. Figure 3. the higher the complexity of the pattern recognition of the neural network. The increase in the number of layers in a MLP algorithm is proportional to the increase of complexity of the problem to be solved. The term “multilayer” is used because this methodology is composed by several neurons ar‐ ranged in different layers. Therefore. this algorithm provides a guarantee of minimum (local or global) convergence [38. The main challenge of MLP is the choice of the most suitable architecture.48-50]. The speed and the performance of the MLP learning are strongly affected by the number of layers and the number of hidden unities in each layer [38.5772/51275 207 2. but the most employed one is the back-propagation algorithm. The crucial task in the MLP methodology is the training step. This phase is the slowest one and there is no guarantee of minimum global achievement. etc. Multilayer perceptrons Multilayer perceptrons (MLP) is one of the most employed ANN algorithms in chemistry.doi. quasi-Newton.

208 Artificial Neural Networks – Architectures and Applications 2. Example of output layers of SOM models using square and hexagonal neurons for the combinatorial design of (a) purinergic receptor antagonists [54] and (b) cannabinoid compounds [30]. . respectively. In other words. the learning rate for the neighborhood is scaled down proportional to the distance of the winner output neuron [51-53]. respectively. the neighborhood of an output neuron is defined as square or hexagonal and this means that each neuron has 4 or 6 nearest neighbors. The competitive learning means that only the neuron in the output layer is select‐ ed if its weight is the most similar to the input pattern than the other input neurons. usually a bidimensional space. Figure 4 exemplifies the output layers of a SOM model using square and hexagonal neurons for a combinatorial de‐ sign of purinergic receptor antagonists [54] and cannabinoid compounds [30]. the SOM technique is employed to cluster and extrapolate the data set keeping the original topology. The SOM technique could be considered a competitive neural network due to its learning algorithm. The visuali‐ zation of the output data is performed from the distance/proximity of neurons in the output 2D-layer. In general. respectively [51-53]. also called Kohonen neural network (KNN). Finally. The SOM output neurons are only connected to its nearest neighbors. The neighborhood represents a similar pattern represented by an output neuron.2. Figure 4. is an unsupervised neural network designed to perform a non-linear mapping of a high-dimensionality data space transforming it in a low-dimensional space. Self-organizing map or Kohonen neural network Self-organizing map (SOM).
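The competitive learning and neighborhood update described above can be sketched in a few lines of NumPy. The grid size, learning-rate schedule and Gaussian neighborhood used below are arbitrary illustrative choices (not taken from refs. [51-53] or the cited case studies), and the "descriptor" matrix is random placeholder data rather than real molecular descriptors.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy descriptor vectors to be mapped onto a 2-D grid; in a chemical study
# each row would be a molecular descriptor vector.
data = rng.normal(size=(200, 5))

grid_w, grid_h, dim = 8, 8, data.shape[1]
weights = rng.normal(size=(grid_w, grid_h, dim))   # one weight vector per output neuron

# Grid coordinates used to compute neighborhood distances on the map.
coords = np.stack(np.meshgrid(np.arange(grid_w), np.arange(grid_h), indexing="ij"), axis=-1)

n_steps = 2000
for t in range(n_steps):
    x = data[rng.integers(len(data))]
    # Competitive step: the winner is the neuron whose weights are closest to x.
    dists = np.linalg.norm(weights - x, axis=-1)
    winner = np.unravel_index(np.argmin(dists), dists.shape)
    # Learning rate and neighborhood radius both shrink over time.
    lr = 0.5 * (1.0 - t / n_steps)
    radius = 1.0 + 3.0 * (1.0 - t / n_steps)
    grid_dist = np.linalg.norm(coords - np.array(winner), axis=-1)
    h = np.exp(-(grid_dist ** 2) / (2.0 * radius ** 2))   # update scaled down with map distance
    weights += lr * h[..., None] * (x - weights)

print("trained map shape:", weights.shape)
```

After training, each input vector can be assigned to its winning neuron, so nearby neurons on the 2-D map collect similar descriptor vectors, which is the clustering and visualization behavior exploited in the SOM studies cited above.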

The neural network known as radial basis function (RBF) [74] typically has the input layer. A simple variation of this technique. can be determined by Bayes’s theorem [55]. The Hopfield neural network [81-82] is a model that uses a binary n x n matrix (presented as n x n pixel image) as a weight matrix for n input signals. and there is a transformation of the matrix data to enlarge the interval from 0 – 1 to (-1) – (+1). the algorithm treats black and white pixels as 0 and 1 binary digits. the Bayesian method consid‐ ers all possible values of weights of a neural network weighted by the probability of each set of weights. One of the most important char‐ acteristic of this technique is the capacity of knowledge without disturbing or destroying the stored knowledge. There are several studies comparing the ro‐ bustness of prediction (prediction coefficients. the choice of the network architecture is a very important step for the learning of BRANN.3. . A complete re‐ view of the BRANN technique can be found in other studies [56-59]. Applications Following. Besides. the RBF networks have been successfully employed to chemical problems. This kind of neural network is called Bayesian regularized artificial neural (BRANN) networks because the probability of distribution of each neural network.5772/51275 209 2. the Bayesian method can estimate the number of effective parameters to predict an output data. This network was developed to treat irregular topographic contours of geographical data [75-76] but due to its capacity of solving complex problems (non-linear specially). As well as the MLP technique. Bayesian regularized artificial neural networks Different from the usual back-propagation learning algorithm. In the literature. r2. The complete description of this technique can be found in reference [47]. which provides the weights. practical‐ ly independent from the ANN architecture. we can found some studies employing the Hopfield model to obtain molecular alignments [83].org/10. 2. Therefore. we will present a brief description of some studies that apply ANN techniques as important tools to solve chemical problems. 3.doi.Applications of Artificial Neural Networks in Chemical Problems http://dx. The ART-2a method consists in constructing a weight matrix that describes the centroid na‐ ture of a predicted class [62. pattern recognition rates and errors) of RBF-based networks and other methods [77-80]. to calculate the intermolecular potential energy function from the second virial coefficient [84] and other purposes [85-86]. The activation function treats the activation signal only as 1 or -1. Other important neural networks Adaptative resonance theory (ART) neural networks [60.63]. the ART-2a model.61] constitute other mathematical models designed to describe the biological brain behavior. has a simple learning algorithm and it is practically inexpensive compared to other ART models [60-63]. respectively. a hidden layer with a RBF as the activation function and the output layer. there are several chemical studies that em‐ ploy the ART-based neural networks [64-73]. In chemistry research.4.

3.1. Medicinal Chemistry and Pharmaceutical Research

Drug design research involves the use of several experimental and computational strategies with different purposes, such as biological affinity, pharmacokinetic and toxicological studies, as well as quantitative structure-activity relationship (QSAR) models [87-95]. Another important approach to design new potential drugs is virtual screening (VS), which can maximize the effectiveness of rational drug development by employing computational assays to classify or filter a compound database for potent drug candidates [96-100]. Besides, various ANN methodologies have been largely applied to control the process of pharmaceutical production [101-104].

Fanny et al. [105] constructed a SOM model to perform VS experiments and tested an external database of 160,000 compounds. The use of the SOM methodology accelerated the similarity searches by using several pharmacophore descriptors. The best result indicated a map that retrieves 90% of relevant neighbors (output neurons) in the similarity search for virtual hits.

3.2. Theoretical and Computational Chemistry

In theoretical/computational chemistry, we can find applications of ANN techniques such as the prediction of ionization potentials [106], lipophilicity of chemicals [107,108], chemical/physical/mechanical properties of polymers employing topological indices [109], and relative permittivity and oxygen diffusion of ceramic materials [110].

Stojković et al. [111] constructed a quantitative structure-property relationship (QSPR) model to predict pKBH+ for 92 amines. To construct the regression model, the authors calculated some topological and quantum chemical descriptors. The counter-propagation neural network was employed as the modeling tool and the Kohonen self-organizing map was employed to visualize the results graphically. The authors could clearly explain how the input descriptors influenced the pKBH+ behavior, especially the presence of halogen atoms in the amine structures.

3.3. Analytical Chemistry

There are several studies in analytical chemistry employing ANN techniques with the aim of obtaining multivariate calibration and analysis of spectroscopic data [112-117], as well as modeling the HPLC retention behavior [118] and reaction kinetics [119].

Fatemi [120] constructed a QSPR model employing the ANN technique with the back-propagation algorithm to predict the ozone tropospheric degradation rate constant of organic compounds. The data set was composed of 137 organic compounds divided into training, test and validation sets. The author also compared the ANN results with those obtained from the MLR method. The correlation coefficients obtained with ANN/MLR were 0.99/0.88, 0.96/0.86 and 0.96/0.74 for the training, test and validation sets, respectively. These results showed the better efficacy of the ANN methodology in this case.

3.4. Biochemistry

Neural networks have been largely employed in biochemistry and correlated research fields such as protein, DNA/RNA and molecular biology sciences [121-127].

Petritis et al. [128] employed a three-layer neural network with the back-propagation algorithm to predict the reverse-phase liquid chromatography retention time of peptides enzymatically digested from proteomes. In the training set, the authors used 7000 known peptides from D. radiodurans. The constructed ANN model was then employed to predict a set of 5200 peptides from S. oneidensis. The neural network generated weights for the chromatographic retention time of each aminoacid in agreement with results obtained by other authors. The obtained ANN model could predict a peptide sequence containing 41 aminoacids with an error less than 0.03. Half of the test set was predicted with less than 3% of error and more than 95% of this set was predicted with an error around 10%. These results showed that the ANN methodology is a good tool to predict peptide retention times in liquid chromatography.

Huang et al. [129] introduced a novel approach combining aspects of QSAR and ANN, which they called physics and chemistry-driven ANN (Phys-Chem ANN). This methodology has its parameters and coefficients clearly based on physicochemical insights. In this study, the authors employed the Phys-Chem ANN methodology to predict the stability of human lysozyme. The data set was composed of 50 types of mutated lysozymes (including the wild type) and the experimental property used in the modeling was the change in the unfolding Gibbs free energy (kJ mol-1). The proposed methodology provided good prediction of biological activity, as well as structural information and physical explanations to understand the stability of human lysozyme.

3.5. Food Research

ANNs have also been widely employed in food research. Some examples of application of ANNs in this area include vegetable oil studies [130-138], beers [139], wines [140], honeys [141-142] and water [143-144].

Bos et al. [145] employed several ANN techniques to predict the water percentage in cheese samples. The authors tested several different architectures of neurons (some functions were employed to simulate different learning behaviors) and analyzed the prediction errors to assess the ANN performance. The best result was obtained employing a radial basis function neural network, and the study resulted in significant coefficients of calibration and validation (r2=0.95 and q2=0.92, respectively).

Cimpoiu et al. [146] used the multi-layer perceptron with the back-propagation algorithm to model the antioxidant activity of some classes of tea, such as black, express black and green teas. The authors obtained a correlation of 99.9% between experimental and predicted antioxidant activity. A classification of samples was also performed using an ANN technique with a radial basis layer followed by a competitive layer, with a perfect match between real and predicted classes.

4. Conclusions

Artificial Neural Networks (ANNs) were originally developed to mimic the learning process of the human brain and its knowledge storage functions. The basic units of ANNs are called neurons and are designed to transform the input data as well as propagating the signal, with the aim of performing a non-linear correlation between experimental and predicted data. As the human brain is not completely understood, there are several different architectures of artificial neural networks, presenting different performances. The most common ANNs applied to chemistry are MLP, SOM, ART, BRANN, Hopfield and RBF neural networks.

Actually, the main purpose of ANN techniques in chemical problems is to create models of complex input–output relationships based on learning from examples; consequently, these models can be used in prediction studies. The evolution of ANN theory has resulted in an increase in the number of successful applications. There are several studies in the literature that compare ANN approaches with other chemometric tools (e.g. MLR and PLS), and these studies have shown that ANNs have the best performance in many cases.

Nowadays, the evolution of computer science (software and hardware) has allowed the development of many computational methods used to understand and simulate the behavior of complex systems. In this way, the integration of technological and scientific innovation has helped the treatment of large databases of chemical compounds in order to identify possible patterns. Due to the robustness and efficacy of ANNs in solving complex problems, these methods have been widely employed in several research fields such as medicinal chemistry, pharmaceutical research, theoretical and computational chemistry, analytical chemistry, biochemistry and food research. However, people who use computational techniques must be prepared to understand the limits of applicability of any computational method and to distinguish the opportunities in which it is appropriate to apply ANN methodologies to solve chemical problems. Therefore, ANN techniques can be considered valuable tools to understand the main mechanisms involved in chemical problems, and ANN methodologies have shown their power and robustness in the creation of useful models to help chemists in research projects in academia and industry.

Notes

Techniques related to artificial neural networks (ANNs) have been increasingly used in chemical studies for data analysis in the last decades. Some areas of ANN application involve pattern identification, modeling of relationships between structure and biological activity, classification of compound classes, identification of drug targets, prediction of several physicochemical properties and others. The main contribution of this book chapter is to briefly outline our view on the present scope and future advances of ANNs, based on some applications from recent research projects, with emphasis on the generation of predictive ANN models.

Sci. 724-731.. (1984).. [4] Wold. L.. 42. Chemometrics: Views and Propositions. Chemometrics. (1957). Sjöströma. Sci. (2001). 3-14.. L.. Chem. G. J. Ciências e Humanidades – USP – São Paulo – SP 3 Departamento de Química e Física Molecular – Instituto de Química de São Carlos – USP – São Carlos – SP References [1] Teófilo. Comput. Intell. The collinearity problem in linear regression: The partial least squares approach to generalized inverses. (1998). 500. 29. Chem. & Berg‐ man. R. Theilin. 5. Pettersen. 201-203. S. Chemom.. Seifert. 753-743. 3-40.2 and Albérico Borges Ferreira da Silva3* *Address all correspondence to: alberico@iqsc. Acta.. A. Quím. Quimiome‐ tria I: calibração multivariada um tutorial. I. M. Nova.. R. Ruhe. T. F. (1976). The evolution of chemometrics. M. (1987). PLS-regression: a basic tool of chemo‐ metrics. S.General Introduction and Historical Development. B. & Eriksson. 141.. J.. Lab. [3] Kowalski. [5] Vandeginste. C. Anal. 1401-1406. [9] Hopke... [6] Wold.. Káthia Maria Honório1. S. S. Antunes. Chemom. S.usp. Nova.5772/51275 213 Author details Vinícius Gonçalves Maltarollo1.Applications of Artificial Neural Networks in Chemical Problems http://dx. K. Top. H. B. (1998). Curr. O. Stat.. Intell. & Volpe. Lab. Quimiometria II: Planilhas Eletrônicas para Cálculos de Planejamento Experimental um Tutorial... Intell. M. 44. Experimental design and optimization. M. Quim. A. 29. Melgo. Quim. & Bruns. Lab. [11] Wold.br 1 Centro de Ciências Naturais e Humanas – UFABC – Santo André – SP 2 Escola de Artes.. (2006). Pattern Recognition... Inf. 365-377.. C. E. & Sjöström. 22. Comp. A. Nyström. 127-139. 338-350. J. M. B. P. 15. M. 8. 1-42. [2] Lundstedt.. L. Nova. [7] Ferreira.. M. R. SIAM J. Anos de Quimiometria no Brasil. (1999).org/10. 58. B. Scarminio.. Chemometrics present and future success. M. M. & Dunn. B. Pattern recognition by means of disjoint principal component mod‐ els. [10] Wold. & Ferreira. W. R. Wold. 109-130. Abramo. E. (2006). [8] Neto. S. Chemom.doi. . P. (2003).. Chim.

.. K. 64. 110. [26] Borggaard. & Thodberg. 24. A. F. Eur. (1988). Andrade.. [21] Long. [17] Cramer. 1428-1436.. Chem. Chem. Tripos Inc. S. [15] Pirouette. O.214 Artificial Neural Networks – Architectures and Applications [12] MATLAB r. K.. Deaciuc. (1992). Model. Med. Am. (2011).. Data Analysis Software System. (1994).. K. [16] HQSARTM . Mao. J. R. A.. Patterson. C. [22] Cerqueira. M. C.. V. Molecular Similarity Indices in a Com‐ parative Analysis (CoMSIA) of Drug Molecules to Correlate and Predict Their Bio‐ logical Activity. F. . R.. Inf. Redes neurais e suas aplicações em calibração multivariada. & Rives. 45.. Med.. Chem. C. (2010). Dwoskin. (1990).3. P. Soc. (2001). 2975-2992. Res. [13] RStudioTM 3. A. [20] Martins. & Ferreira. [14] Statistica. & Mietzner. G. Computational neural network analysis of the affinity of lobeline and tetrabe‐ nazine analogs for the vesicular monoamine transporter-2. Comparative molecular field analysis (CoMFA).. (1998). Chem.. Chem. LQTA-QSAR: A New 4D-QSAR Methodology. Prediction of intrinsic solubility of generic drugs using mlr... & Crooks. E. G. J. P. H. Gregoriou.. Anal. Pasqualoto. (2001). (2007). 161-165. Prediction of Atomic Ionization Potentials I-III Using an Artificial Neural Network. 1791-1797.. 29. (2012)... 34. 5959-5967. V. 37. M. 4018-4025. Clin. 4130-4146. K. Manual Release in Sybyl 7. 62. E. J... E. A. Med. Recent advances in comparative molecular field analysis (CoMFA).. C. Biol. Artificial Neural Networks: A Tutori‐ al. J. P. Anal.. Spectroscopic calibration and quantitation using artificial neural networks. IEEE Computer. Zeng. (2011). Abraham.. J. P. 617-620. Prog.. (2007). D. Chem. D.. Optimal minimal neural interpretation of spectra. Chem. [23] Sigman. H. [18] Cramer. 49. [25] Jain.. Patterson. [19] Klebe. Chem. C. Bioorg. (1994). M. & Khadikar. Zhan. ann and svm analyses. 545-551. D. P. M. & Poppi. (1996). Quím.. S.. Nova. U.. B. Inf. T. D. 1.. D. V. 31-44. L. T. Proceedings of the 20th An‐ nual International Conference of the IEEE Engineering in Medicine and Biology Society. J. Effect of shape on binding of steroids to carrier proteins. J. & Bunce. D. Infometrix Inc. Sci. J. J. RStudio Inc. G. G. [24] Hsiao. M. 15. E. 291. R. H.. Agrawal. K.. (1989). Barbosa. J. G. 20. 864-873. E. M. (2009). J. & Mohiuddin. MathWorks Inc.. The Implementation of Partial Least Squares with Artificial Neural Network Architecture. [27] Zheng.. & Bunce. G. [28] Louis. J. Zheng. R. & Chiang. 1341-1343. Lin. & Gemperline. J. Comput.

..doi. Chemom. J. Cotterill. 12. M.. [43] Bohr. Chim. Buydens. W... Neural networks: A new method for solving chemi‐ cal problems or just a passing phase? Anal. 83. J. (1994).. & Pttts.. Silva. Biophys. W. A logical calculus of the Ideas imminent in nervous activity. Brunak. D. Melssen. Math.. Mol. An accurate nonlinear QSAR model for the antitumor activities of chloroethylnitrosoureas using neural networks. A. Mol. Y. 115-133. F. Molfetta.. [30] Honório. & da. Chem. Jpn. [33] Marini. (2003)... Bull. 79. F. 223-228. 88. & Sejnowski. (1991).. [41] Hoskins. J. Neural networks and physical systems with emergent collective computational abilities. Korean J. J. 22.. Acta.. [36] Hebb. Set. . & Zhong. Artificial Neural Networks and the Study of the Psychoactivity of Cannabinoid Compounds. Applications of artificial neural networks in chemical en‐ gineering. D. New York.5772/51275 215 [29] Fatemi. (1949). C. 1-30. (1988). Fundamentals of Artificial Neural Networks. G. 241. & Ghorbanzade.. & Gasteiger. J. A. J.. Eng. Math... Olsen.. H. F. A.. E. Norskov. Lautrup. Artificial Neural Network Models of Knowledge Representation in Chemical Engineering. 75. Bull. O. W. 178-185. Chem.. Using artifi‐ cial neural networks for solving chemical problems. 17.. J. S. (2010). (1947). J. Part I.org/10. Yan. 1338-1345. & Kateman.. (1982). R. How we know universals: the perception of au‐ ditory and visual forms.. S. S. (1943). Harvard University Cambridge. 632-640. O. Quiles. (2008). [38] Smits. L. 826-833. 865-884. 29. M. Heidari. 127-147. M. 248. Chem. [34] Mc Cutloch. 373-392. H.. A Bradford Book. 165-189. L. [40] Werbos. Multi-layer feed-forward networks. A. [31] Qin. Eng.. Protein Secondary Structure and Homology by Neural Networks. Magri. M. Graph. J. L. R. H.. Wiley. Biol. Artificial neural networks in chemometrics: History examples and perspectives. (2010). Chem. Drug. [35] Pitts. (2000). T. 2554-2558. H.Applications of Artificial Neural Networks in Chemical Problems http://dx. R. M. Model. FEBS Lett. Acad.. Bucci. B. Bull. Biol.. [42] Qian. Deng. Soc.. PhD thesis. 202.. Microchem. Bohr. & Magri. & Himmelbau. de Lima. & Pe‐ tersen. (2011). Biophys.. [39] Hopfield. (1974). Comput. C.. S. 9. W. F. [32] Himmelblau. 5. H. N. J. Romero. 881-890. Proc.... M. P. [37] Zupan. (1988). A. Prediction of aqueous solubili‐ ty of drug-like compounds by using an artificial neural network and least-squares support vector machine.. F. [44] Hassoun. R. J. G.. J. The Organization of Behavior.. D. M. Predicting the Secondary Structure of Globular Proteins Using Neural Network Models. Intell. & Mc Culloch.. (1988). M. Lab. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences. B. Nat. M. W. D. A. K. Des. R.

Neural Sys. [47] Gasteiger.. (1992). 23. & Ruisánchez. Chem. [48] Marini. (1994). 23. 78. M. A review. M. Neural Networks in Chemistry. (1999). Kohonen self-organising feature maps and Hopfield networks. W. Comb. I. Adaptive pattern classification and universal recoding II. [54] Schneider. (2011). [58] Buntine. J. J. . Lab.. Cybern. & Weigend.. Edit. Sys. (2009). F. Alpha‐ script Publishing. & Mc Brewster. C. (2003). Kohonen and counterpropagation arti‐ ficial neural networks in analytical chemistry. F. [52] Zupan. [51] Kohonen. L. M. J. F. R. (1995). 370-418. Multilayer Perceptron. J. Biol. D. Bayesian Back-Propagation. Biol. C. [46] Zupan. 5. 503-527. A. [53] Smits. Bayesian Interpolation. An Essay toward solving a problem in the doctrine of chances. Melssen. [61] Grossberg.. PhD thesis. 23.. F. Artificial neural networks in foodstuff analyses: Trends and per‐ spectives.. (1976). Noviča. Intell. 267-291. Philo‐ sophical Transactions of the Royal Society of London. Boston. & Kateman. 4.. J. (2003). G. Ligand-based combinatorial design of selec‐ tive purinergic receptor (a2a) antagonists using self-organizing maps.. Chem. L... 121-134. Neural Networks in Chemistry and Drug Design (2 ed. Springer. S. Self Organizing Maps (3 ed. M. J. Using artifi‐ cial neural networks for solving chemical problems. Chemom. (2001). Lab. Neural Comput. G... 32. J. 1415-1442. Cybern. (1992).a Review of Practical Bayesian Methods for Supervised Neural Networks. Feed‐ back expectation olfaction and illusions. PWS Publishing Company. (1990). Probable Networks and Plausible Predictions. J. S. J.. & Nettekoven. 6. C. & Zupan.. [60] Grossberg. 1-23. 415-447.216 Artificial Neural Networks – Architectures and Applications [45] Zurada. A. IEEE. Parallel development and coding of neural feature detectors. (1993). 187-203. Proc. M. 38.. Vandome. Bayesian Methods for Neural Networks. [55] Bayes. (1764). Part II.. Int. 603-643. P. 53. 30 years of Adaptive Neural Networks: Perceptron Madaline and Backpropagation. 469-505. [57] Mackay.). (1991). [59] de Freitas.. B. T. Intell. & Gasteiger. Introduction to Artificial Neural Systems. D. Univer‐ sity Engineering Dept Cambridge. Complex. J. W. T. G. Chemom. [56] Mackay. Adaptive pattern classification and universal recoding I.. 635. Buydens. Anal. Wiley-VCH. Angew. 5-233.). J.. Acta. Comput. J. A. [50] Widrow.. New York. M. [49] Miller. (1997).. & Lehr. S. J. Chim. (1976). 121-131.

Chem. 317. G.. [65] Whiteley.. T. Chem. Acta. Environ. van den. Lab. Y. (2003). Adaptive resonance theory based artificial neural networks for treatment of open-category problems in chemical pattern recognitionapplication to UV-Vis and IR spectroscopy. Inf. Chemom. 165-176.. Hopke. A similarity-based approach to interpretation of sensor data using adaptive resonance theory. 32. L. Devillers. R. ART 2-A for Optimal Test Series Design in QSAR. Adaptive resonance theory based neural network for supervised chemical pattern recognition (FuzzyARTMAP). & Buydens. (1993). Chemom. & Buydens. D. P. Comput. Adaptive resonance theory based neural networksthe’ART’ of real-time pattern recognition in chemical process monitoring? TrAC Trend. T...... Feldhoff. J.. (1995). D. Classification of autoregressive spectral estimated signal patterns using an adaptive resonance theory neural network. [69] Xie. Chem. (1996). & Wang. 151-164. J. Comput. Comparison of an adaptive resonance theory based neural network (ART-2a) against other classifiers for rapid sorting of post consumer plastics by remote near-infrared spectroscopic sensing using an InGaAs diode array. (1994). S. D. 54-63. G.. & Wienke. 637-661. M.. 1921-1928. T. Kantimm... Intell. & Davis.. 32... P. (1995).. K. (1994). C. K. Adaptive resonance theory based neural network for su‐ pervised chemical pattern recognition (FuzzyARTMAP). W. L. A.. [72] Wienke. & Cammann. Airborne particle classification with a combination of chemical composition and shape index utilizing an adaptive reso‐ nance artificial neural network. J. Neural Networks. (1994). 14. L.. 1-16. & Hopke. 10-17. [67] Wienke. W. Xie. & Cammann. Huth-Fehre. Intell. 22. L. Kantimm. (1991). K. IEEE Expert. (1994). van den. 37. & Buydens.. P. Cambridge University. 26.. 23. Technol. F. 309-329. J. [68] Wienke. C. Buydens. Anal.doi. & Davis. [66] Whiteley. (1997). 4. Lab.. 4. Broek.org/10. 367-387. F. Intell. Melssen. Ind. D. F. (1993). 143-157. D. D. D.. [74] Buhmann. Chim. L. W... ART-2a-an adaptive resonance algorithm for rapid category learning and recognition. Part 2: Classification of post-consumer plastics by remote NIR spectroscopy using an InGaAs diode array. H. K. & Rosen. 18. Radial Basis Functions: Theory and Implementations. D.. R. (1996). Wienke. Feldhoff. 493-504. [64] Lin. Lab. Huth-Fehre. R... An adaptive resonance theory based artifi‐ cial neural network (ART-2a) for rapid identification of airborne particle shapes from their scanning electron microscopy images.. L. Anal.. B. Lab. J.5772/51275 217 [62] Carpenter. D. Broek. Grossberg. Winter. D... Eng. T. Intell. Chemom. Buydens. Qualitative interpretation of sensor patterns.. Part 1: Theory and network properties. [71] Domine. Quick. J. 28.. .. & Kateman. R. D. [73] Wienke. Y. Sci.. 398-406. [70] Wienke.. [63] Wienke. Sci. Comput...Applications of Artificial Neural Networks in Chemical Problems http://dx.

Use of Radial Basis Function Networks and Near-Infrared Spectroscopy for the Determination of Total Nitrogen Content in Soils from Sao Paulo State.. C.. Q. L. R.. Radial Basis Function Neural Networks Based QSPR for the Prediction of log P. Inf. X. Society of Civil Engi‐ neers. J. Chem. P. 347-352. (2003). J.. Application of Optimal Rbf Neural Net‐ works for Optimization and Characterization of Porous Materials. Braga. & Shoemaker. [83] Arakawa.. 20. Andrade. [84] Braga. PNAS. Proc. H. S. M. Generation of QSAR sets with a self-organ‐ izing map. F. (1982). 79. & Jurs. 3088-3092.. 1-14. Zhang. 23. Natl. (1992). (1984). Neural networks and physical systems with emergent collective computational abilities. Chem. 29. Optim.. Application of the Novel Molecu‐ lar Alignment Method Using the Hopfield Neural Network to 3D-QSAR.. (1998). 81. (2011). F. USA. & Qiao. 722-730. J. Pattern Recognition in Coupled Chemical Kinetic Systems. P. 1..) Artificial Neural Networks for Civil Engineers: Advanced Features and Applications. J. W. Hu. (2005).. 260. A. J. R. In: Flood I. Model. [78] Han. B. Z. 388-391. 53-76. Proc. H..218 Artificial Neural Networks – Architectures and Applications [75] Lingireddy. [77] Regis.. B.. R.. Anal. [82] Hopfield. Chemical implementation and thermodynamics of collective neural networks.. 31. J. [85] Hjelmfelt.. [76] Shahsavand. G. A Constrained Global Optimization of Expen‐ sive Black Box Functions Using Radial Basis Functions. Neural Networks in Optimal Calibration of Water Distribution Systems. Hasegawa. [79] Fidêncio. (eds. M. J. & Ormsbee. USA. 89. Almeida. J. & Ross. J. (2008).. (2005). K. Phys. Comput-Aid.. Sci. Chem. A. Sci. (2004). 43. Drug. 1396-1402. 153-171. An efficient self-organizing RBF neural network for water quality prediction. K. Graph. 945-948.. P.. [86] Hjelmfelt. C. 24. (2002). J. R. Sci.. (1993). C. . Amer. Chem. [87] Vracko. P. J. Sci. Kartam N.. 335-337. Science. J. Poppi. (2000). Schneider. E. 2134-2143.. Global. Kohonen Artificial Neural Network and Counter Propagation Neural Network in Molecular Structure-Toxicity Studies. A. J. [88] Guha. 260. Eng. Neural Networks. A. Natl. Liu. [81] Hopfield.. 73-78. Zhang. & Abreu. 24. J. Neurons with graded response have collective computational properties like those of two-state neurons. C. & Belchior. A.. X. Chinese J. Mol. Comput. M. R... [80] Yao. Serra. Hopfield neural net‐ work model for calculating the potential energy function from second virial data... & Funatsu. Chen. Comput. & Fan. (2005). M. J.. Curr. 717-725. Acad. & Ahmadpour.. M.. Acad. 2554-2558. & Ross.

Bioorg.. Sarimveis. Bioorg. Med. Med. 244-253. & Gonzalez-Nilo. 1570-1574. 45. (2011). [91] Xiao. Lett. I. Bull... Inf. An analysis of thyroid function diagnosis using Bayesian-type and SOM-type neural networks. & Zefirov.. R. S. & Weil. V. J. A.. C. [94] Caballero.. (2005). A. [92] Molfetta. P.. T. L. J. Romero. Bioorg. 6728-6731.. L. M. P. Med. (2008). Schneider. Clauset. 14. M. Nishimura. 17. G. & Lavé. P. H.. 2975-2992... Pharm.. Montanari. 21... Med. Chem. 6103-6115. Model.. F. Model..5772/51275 219 [89] Hoshi. Chem. Mol. D. Bayram. Super‐ vised self-organizing maps in drug discovery. 15. Deaciuc.. 12. [98] Noeske. A. K. Chem. Silva. [99] Karpov. [93] Zheng. G. Melagraki.. (2011). 42. F. 5708-5715. G. One-class classification as a novel method of ligand-based virtual screening: The case of glycogen synthase kinase 3β inhibitors. & Sato. S.. Trifanova. J.. Mol. D. E.based virtual screening procedure for the prediction and the identification of novel β-amyloid aggregation inhibitors using Kohonen maps and Counterpropaga‐ tion Artificial Neural Networks. Graph. V. P.. .org/10... D... Chem. G.. 419-421. Chem... B. 975-985. J. F. [95] Schneider. Structural requirements of pyrido[23-d]pyrimidin-7-one as CDK4/D inhibitors: 2D autocorrelation CoMFA and CoMSIA analyses. Eur. C. R.. P. M. & Schmitt. Kasahara. Osolodkin. Chem. Chem.Applications of Artificial Neural Networks in Chemical Problems http://dx. & Bagchi... 5072-5076. [90] Nandi. Med. I. A. N. A. & Crooks. G. M.. A. T. (2002).. Santago.. K.. & Chau. 70. Nakamura.. H. & Kollias. Baskin.. Chem. Y. J. V. Chem... & da. S.. M. 1749-1758. Bioorg. F. A. Palyulin... (2008). Koutentis. J. Parsons. Coassolo. C. Med. T... Kumagai.. A. Kawakami. M... Chem. Computational neural network analysis of the affinity of lobeline and tetrabe‐ nazine analogs for the vesicular monoamine transporter-2. P. Robust behavior with overdeter‐ mined data sets. & Keseru. I. Biol. R. A neural network based virtual screening of cyto‐ chrome p450a4 inhibitors. 1.. Vracko. (2007). A neural networks-based drug discovery approach and its application for designing aldose reductase inhibitors. 46. 424-436. Zheng. [97] Afantitis. Drug Des. A. Zhan. D. G. (1999). F. Synergism of virtual screening and medicinal chemistry: Identification and optimization of allosteric antagonists of metabotropic glutamate receptor 1.. Med. .. C. Kauss. [100] Molnar. Anticancer activity of selected phenolic compounds: QSAR studies using ridge regression and neural networks. F. (2007). Harris.. L. Fernandez. S. [96] Hu. 53. G. Lett. Bioorg. W. G. Li‐ gand. N. (2005). 24. W. 497-508. Dwoskin. G. Combining in vitro and in vivo phar‐ macokinetic data for prediction of hepatic drug clearance in humans by artificial neu‐ ral networks and multivariate statistical techniques.. D. Chen.. G.doi. (2006).. A neural networks study of quinone compounds with trypanocidal ac‐ tivity. Renner. Model. (2009).. 16. Angelotti. A. J.

S. S... M. Chim... Rossinyb. Neural-network models for infra‐ red-spectrum interpretation.. J. E. 265. (1992). V. Sci. G. V... (1998).. (2002). & Weigelt. Ceram.. M. & Dragos. G. K. Chem.. T. Sci. 16. (1992). Pharm. C. Inf. Theor. In‐ tell. Fujikawa. Hattori. In Press. 175-186. & Tanchuk. 3. Formula optimization of theophylline controlled-release tablet based on artificial neural networks.. 363-378. Chem. Tanchuk. Application of Associative Neural Networks for Prediction of Lipophilicity in ALOGPS 2. Chemom.. 1136-1145. & Morris. T. N. S. (2010). Control. P... H. Macromol. E. (1991). & Noid. A. (1994). Sci. (1994).. A. (1999).. M.. G. I. Bioorg. T. Eng.. M. Coveneya. [113] Munk. M. [104] Takayama. J. 4425-4435. B. (2000). Release. & Torrecilla. [103] Takayama. E. D. . A. J. M.. A. http://dxdoiorg/ 101016/jbmc201204024. K. 102. Counter-propagation artificial neural networks as a tool for prediction of pKBH+ for series of amides. Aragón. V. M. Kilnerb.. & Miller. I.. [112] Næs.. J. Isaksson. C. 1-6. Prediction of the functional properties of ceramic materials from composition using artificialneuralnetworks. Tham. 183-190. [106] Sigman. Natalia. Comput. (2001). J. M.. [110] Scotta. Y.. & Rives. & Kuzmanovski. Inf. 505-524. & Nagai. Chem. T. Interpretation of infrared spectra by artificial neural networks. J. Acta. [105] Fanny. J. Novič. 283-291. Artificial Neural Network as a Novel Method to Optimize Pharmaceutical Formulations. [107] Tetko. 37. C. 1. J. Prediction of n-Octanol/Water Partition Coefficients from PHYSPROP Database Using Artificial Neural Networks and E-State Indices. [111] Stojković.. pH-Control System Based on Artificial Neural Networks. 123-129... Chem.. Med. J. Simul. Y. D. Chem. J.. 68. Y. 2. 41. 1407-1421. Anal. A. (2007).. Eur. N. Infrared Spectrosc. J. Ind.. 1-11. 2729-2740. Res. K. Fujikawa. J.. Montague.. Comput. 16. Obata. Kvaal. M. 34. 42. T. & Alford. M.. (1993). [108] Tetko. P. Using Self-Organizing Maps to Accelerate Similarity Search. Comput. Alexandre. [102] Palancar. Eng.220 Artificial Neural Networks – Architectures and Applications [101] Di Massimo. M. V... Res. 617-620.. M. Y. & Nagai.. Morva.1 Program. & Villa. B.. Soc.. Madison. Chem. [109] Sumpter. T. S. K. Lab. To‐ wards improved penicillin fermentation via artificialneuralnetworks.. Artificial neural networks in mul‐ tivariate calibration. H.. W.. C. V. 27. W. E. Acta. & Robb. Prediction of Atomic Ionization Potentials I-III Using an Artificial Neural Network. Neural networks and graph theory as compu‐ tational tools for predicting polymer properties. Mikrochim. Near. Comput. V. I. Inf. [114] Meyer. Gilles.. Willis.

Using stacked generalization to predict membrane protein types based on pseudo-amino acid composition... Zeth.. Intell. Protein Sci.. Prediction of ozone tropospheric degradation rate constant of organic compounds by using artificial neural networks. E. X. P... Yang... 7. Strittmatter. & Lawrence. Chem. P. J. [120] Fatemi. Anal. & Zhang. R. Zhao. Prediction by a neural network of outer membrane {beta}-strand protein topology. & Kell. 70. (2003). H. Chem.. A. Anal. A. Coello..Applications of Artificial Neural Networks in Chemical Problems http://dx. Zhou. R. Intell. Artificial Neural Networks for Multicomponent Kinetic Determinations. 163-170. Y. P. [122] Meiler. K. Iturriaga. K. A.... Anal. & Breed. Biomol. Yue. 557-566. 18. 1597-1601. K. Sijstermans. 2413-2420. 123-128. M. Schneider. Cheng. [117] Cirovic. Use of Artificial Neural Networks for the Accurate Prediction of Peptide Liquid Chromatog‐ raphy Elution Times in Proteome Analyses. 148-155. D. J.. H. J. F. B. 16. 25-37. L. (1995). (1994). J. M. L. Chemom. Stehmann.. Umhau.. Analysis of protein transmembrane helical re‐ gions by a neural network. Protein Sci.. L. W. 66. TrAC Trend.. A neural net‐ work for predicting the stability of RNA/DNA hybrid duplexes..org/10. Protein Sci. Interpretation of infrared spectra with modular neural-network sys‐ tems. K.. 507-521... Rapid and Quantitative Analysis of the Pyrolysis Mass Spectra of Complex Binary and Tertiary Mixtures Using Multivariate Calibration and Artificial Neural Networks. & Chou. (2006). C. D. Freigang. Neural network model for the prediction of membrane-spanning amino acid sequences. Lab. & Ferrara... Lab. D.. M. P. J. Ni. 1039-1048. [119] Blanco. Neal. M. (1998).. (1993). NMR. H. A. 3... J. 3. J. Shen. & Smith.. S. Chim. Intell. Schoenmakers. J. Paša-Tolić. [128] Petritis. (1999). J. Pflugfelaer.doi. (1997).. Acta.. L. A.. Y. 556. R. 27-39. Ferguson. (1994). J. G. 45. [127] Ferran. Lab.. Theor. Maspoch. S. & Herdewijn. R. PROSHIFT: Protein chemical shift prediction using artificial neural networks. . (1994). 4477-4483. (1994).5772/51275 221 [115] Smits. & Wrede.. 941-946. Biol. F. [123] Lohmann. Lipton. Wang. Y.. E. G. Kangas. Behrens.. 355-363. G. Application of an artificial neural network in chromatography-retention behavior prediction and pat‐ tern recognition. Chemom. M. S. (2006). [124] Dombi. (2004). R. 75... 3.. Self-organized neural maps of hu‐ man protein sequences. K. Feed-forward artificial neural networks: applications to spectro‐ scopy.. Protein Sci.. F. Liu.. 242. 1070-1085. R. 26.. B... 67. Chem. H. & Redon. Q. [116] Goodacre. J. F.. Zhao... & Chemom. S. B. Anal. P.. D. Chem. Anderson. [126] Ma. K. [125] Wang. M. C. J. [118] Zhao. (2003).. Kate‐ man G.. Y. J. Auberry.. [121] Diederichs.. Anal.

L. J. F. Tech. & Chou. Electronic nose based on metal oxide semiconductor sensors and pattern recognition techniques: characterisation of vegetable oils. F. Magri.. Intell.. Marini. Balestrieri. C. Tech.. G. Food Chem. In‐ tell. Balestrieri. Y. Di Benedetto... [132] Zhang.. D. (2003). (2003). Field. Y. Lipid. Supervised pat‐ tern recognition to discriminate the geographical origin of rice bran oils: a first study. Wei. (2005).. Theor. G. [136] Marini. [135] Bucci. R. Chemometric criteria for the characterisation of Italian DOP olive oils from their metabolic profiles.. Sci. F. [131] Brodnjak-Voncina. F. 449.. G. H. Multivariate data analysis in classification of vegetable oils characterized by the content of fatty acids. 283-296. L. & Marini.. R. 103.. 697-704. G. Magri. D. P. 297-307. Magri. Chemom... & Balestrieri. Bucci.. Anal. J. Microch. Chim.. (2009). & Cassano. Oliveros. & Cordero. Q. (2001). (2002). R. F. J. 31-43... G. [137] Marini. Chem. M. 293-300. & Marini. C. Z. 105. A. G. Z.. 69-80. [138] Marini. C. 256. F... Sci. Ni. Characterization of the lipid fraction of Niger seeds (Guizotia abyssinica cass) from different regions of Ethiopia and India and chemometric authentication of their geographical origin. F.... [140] Penza. & Marini... Kodba. Tech. Eur.. Rapid assessment of the adulteration of virgin olive oils by other seed oils using pyrolysis mass spectrometry and artificial neural networks. Kell. A. (2006). J. A.222 Artificial Neural Networks – Architectures and Applications [129] Huang. S. Magri.. Y.. (1998).. Churchill. R. Pinto. Giansante. 145-153.... A. Part 2: Identification of beer samples using artificial neural networks.. . 73. (2004). (2004). A. A field-portable gas analyzer with an array of six semiconductor sensors. Pang. D.. W. [130] Martin. Anal. Chemom. D. Lab. & Kell. [134] Bianchi. L. R. D. 141-150. N.. (2001). [133] Goodacre. Supervised pattern recognition to authenticate Italian extra virgin olive oil varieties. B. 428-435. L. A.. M. M. L... C. Pavon. W. Chemometric characterization of Italian wines by thin-film multisensors array and artificial neural networks. B.. Lipid. P.. Food Agr. J.. K. & Bianchi. Acta. Du. B. Physics and chemis‐ try-driven artificial neural network for predicting bioactivity of peptides and pro‐ teins and their design. D. J. Sci. Bucci. A. (1993). Authentication of vegetable oils on the basis of their physico-chemical properties with the aid of chemometrics.. 85-93. J. 70. Magri. 239-248.. & Novic.. B. Biol.. 2. D. & Kokot. Wei. & Hibbert. Talanta. Lab. L.. [139] Alexander. Agric. L. J. Eur. 63. Chemical Au‐ thentication of Extra Virgin Olive Oil Varieties by Supervised Chemometric Proce‐ dures. Magri. Food Chem. 75. Marini. 413-418. T. D. D.. M. D. 86. Shaw. 50.. 74.

. (2003). 51. [143] Brodnjak-Voncina. & van der Linden. Food Chem. (2011). Sandru. Anal. Y. Garcia..doi.. [145] Bos. 462. M. Acta. Sovic. Acta. 119-125.. Chim. R. & Zupan.. M. J.. Cristea. S. L. 87-100. 1323-1328.. (2002). Artificial neural networks as a tool for soft-modelling in quantitative analytical chemistry: the prediction of the water content of cheese. C.... E. Pena. 127. [144] Voncina. (2000). (2007).. Analyst. Anal. D. Chim. 125. E.org/10. & Clement. 256. C. A.. Bos. Antioxidant activity prediction and classification of some teas using artificial neural networks... (1992). & Seserman. 133-144.5772/51275 223 [141] Latorre. C. Novic. Cabrol-Bass D Honey Characterization and Adulteration Detection by Pattern Recognition Applied on HPAEC-PAD Profiles. J.. Chemometrics characterisation of the quality of river water. W. Chemometric char‐ acterisation of the Quality of Ground Waters from Different wells in Slovenia. 54. T. 3234-3242.Applications of Artificial Neural Networks in Chemical Problems http://dx. 307-312. Acta Chim. D. D.. M. & Herrero.. Brodnjak-Voncina. M. Food Chem. [146] Cimpoiu. Slov. Dobcnik. N. A. Militao. [142] Cordella. S. 1. Agric. V. C. Honey Floral Species Characterization.. L. Hosu. J.. B. & Novic.. M. Authentication of Galician (NW Spain) honeys by multivariate techniques based on metal content data. .


Chapter 11

Recurrent Neural Network Based Approach for Solving Groundwater Hydrology Problems

Ivan N. da Silva, José Ângelo Cagnon and Nilton José Saggioro

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/51598

1. Introduction

Many communities obtain their drinking water from underground sources called aquifers. An aquifer can be defined as a geologic formation that will supply water to a well in enough quantity to make possible the production of water from this formation. The conventional estimation of the exploration flow involves many efforts to understand the relationship between the structural and physical parameters involved. These parameters depend on several factors, such as soil properties and hydrologic and geologic aspects [1].

The transportation of water to the reservoirs is usually done through submerse electrical motor pumps. Considering the increasing difficulty to obtain new electrical power sources, and being the electric power one of the main inputs to water production, there is the need to reduce both operational costs and global energy consumption. Thus, it is important to adopt appropriate operational actions to manage efficiently the use of electrical power in these groundwater hydrology problems.

A methodology using artificial neural networks is here developed in order to take into account several experimental tests related to energy consumption in submerse motor pumps. For this purpose, it is essential to determine a parameter that expresses the energetic behavior of the whole water extraction set, which is here defined as the Global Energetic Efficiency Indicator (GEEI). The GEEI of a deep well is given in Wh/m3.m. From a dimensional analysis, we can observe that a smaller numeric value of GEEI indicates a better energetic efficiency of the water extraction system from aquifers.

Official water suppliers or public incorporations drill wells into soil and rock aquifers, looking for the groundwater contained there in order to supply the population with drinking water. For such scope, this chapter is organized as follows. In Section 2, a brief summary about water exploration processes is presented. In Section 3, some aspects related to mathematical models applied to the water exploration process are described. In Section 4, the expressions for defining the GEEI are formulated. The neural approach used to determine the GEEI is introduced in Section 5, while the procedures for estimation of aquifer dynamic behavior using neural networks are presented in Section 6. Finally, in Section 7, the key issues raised in the chapter are summarized and conclusions are drawn.

2. Water Exploration Process

An aquifer is a saturated geologic unit with enough permeability to transmit economical quantities of water to wells [10]. Aquifers are usually shaped by unconsolidated sands and crushed rocks. The sedimentary rocks, such as arenite and limestone, and the volcanic and fractured crystalline rocks can also be classified as aquifers.

After the drilling process of groundwater wells, the test known as Step Drawdown Test is carried out. This test consists of measuring the aquifer depth in relation to the continuous withdrawal of water, with a flow that increases over time. This depth relationship is defined as the Dynamic Level of the aquifer, and the aquifer level at the initial instant, that is, the instant when the pump is turned on, is defined as the Static Level. This test gives the maximum water flow that can be pumped from the aquifer taking into account its respective dynamic level. Another characteristic given by this test is the determination of the Drawdown Discharge Curves, which represent the dynamic level in relation to the exploration flow [2]. These curves are usually expressed by a mathematical function and their results have presented low precision.

Since the aquifer behavior changes in relation to operation time, the Drawdown Discharge Curves can represent the aquifer dynamics only at that particular moment. These changes occur by many factors, such as the following: i) aquifer recharge capability; ii) interference of neighboring wells or changes in their exploration conditions; iii) modification of the static level when the pump is turned on; iv) operation cycle of the pump; and v) rest time available to the well.

Besides the aquifer behavior, other components of the exploration system interfere in the global energetic efficiency of the system. Figure 1 shows the typical components involved with a water extraction system by means of deep wells. The motor-pump set mounted inside the well, submersed in the water that comes from the aquifer, receives the whole electric power supplied to the system. From an eduction piping, which also physically supports the motor pump, the water is transported to the ground surface and, from there, through an adduction piping, it is transported to the reservoir, which is normally located at an upper position in relation to the well. To transport water in this hydraulic system, several accessories (valves, pipes, curves, etc.) are necessary for its implementation.
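Since the drawdown-discharge curve is usually expressed by a mathematical function fitted to the step drawdown test points, a minimal sketch of such a fit is given below; the test readings and the quadratic form of the curve are hypothetical choices used only for illustration.

```python
import numpy as np

# Hypothetical step drawdown test readings: pumped flow (m3/h) and the
# stabilized dynamic level (m) measured at each step.
flow = np.array([20.0, 35.0, 50.0, 65.0, 80.0])
dynamic_level = np.array([12.4, 16.9, 22.1, 28.0, 34.8])

# One common way to express the drawdown-discharge curve is a low-order
# polynomial fitted to the test points (a quadratic is used here only as an example).
coeffs = np.polyfit(flow, dynamic_level, deg=2)
curve = np.poly1d(coeffs)

print(curve(55.0))  # estimated dynamic level (m) for a 55 m3/h exploration flow
```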

Figure 1. Components of the pumping system.

The resistance to the water flow, due to the state of the pipe walls, is continuous along all the tubing and will be taken as uniform in every place where the diameter of the pipe is constant; it varies with the type and the state of the material. This resistance makes the motor pump supply an additional pressure (or load) so that the water can reach the reservoir. For this reason, the effect created by this resistance is also called "load loss along the pipe". Old tubing, with incrustation aggregated along the operational time, shows a load loss different from that present in new tubing. Regarding the hydraulic circuit, other elements of the system also cause a resistance to the fluid flow. These losses can be considered local, accidental or singular, due to the fact that they come from particular points or parts of the tubing. A valve closed by two turns introduces a bigger load loss than when it is totally open. A variation of the extraction flow also creates changes in the load loss.

Another important factor concerning the global energetic efficiency of the system is the geometrical difference of level. However, this parameter does not show any variation after the total implantation of the system. Concerning this, it is observed that the load loss (distributed and localized) is an important parameter, and therefore the mapping of these groundwater hydrology problems by conventional identification techniques has become very difficult when all the above considerations are taken into account.

These are some observations that could be made. Therefore, two statements can be formulated: i) when mathematical models are used to study the lowering of the piezometric surface, these models should frequently be re-evaluated in certain periods of time; ii) the exploration flow of the aquifer assumes a fundamental role in the study of the hydraulic circuit and it should be carefully analyzed. In order to overcome these problems, this work considers the use of parameters which are easily obtained in practice to represent the water catchment system, and the use of artificial neural networks to determine the exploration flow. From these parameters, it is possible to determine the GEEI of the system.

3. Mathematical Models Applied to Water Exploration Process

One of the most used mathematical models to simulate the aquifer dynamic behavior is the Theis' model [1,9]. This model is very simple and it is used for transitory flow. In this model, the following hypotheses are considered: i) the aquifer is confined by impermeable formations; ii) the aquifer structure is homogeneous and isotropic in relation to its hydro-geological parameters; iii) the aquifer thickness is considered constant, with infinite horizontal extent; and iv) the wells penetrate the entire aquifer and their pumping rates are also considered constant in relation to time.

The model proposed by Theis can be represented by the following equations:

$$\frac{\partial^2 s}{\partial r^2} + \frac{1}{r}\cdot\frac{\partial s}{\partial r} = \frac{S}{T}\cdot\frac{\partial s}{\partial t} \qquad (1)$$

$$s(r,0) = 0 \qquad (2)$$

$$s(\infty,t) = 0 \qquad (3)$$

$$\lim_{r \to 0} \left( r\,\frac{\partial s}{\partial r} \right) = -\frac{Q}{2\pi T} \qquad (4)$$

where: s is the aquifer drawdown; r is the horizontal distance between the well and the observation place; T is the transmissivity coefficient; S is the storage coefficient; Q is the exploration flow.

Applying the Laplace transform to these equations, we have:

$$\frac{d^2 \bar{s}}{dr^2} + \frac{1}{r}\cdot\frac{d \bar{s}}{dr} = \frac{S}{T}\cdot w\cdot \bar{s} \qquad (5)$$

$$\bar{s}(r,w) = A\cdot K_0\!\left(r\sqrt{(S/T)\,w}\right) \qquad (6)$$

$$\lim_{r \to 0} \left( r\,\frac{d\bar{s}}{dr} \right) = -\frac{Q}{2\pi T w} \qquad (7)$$

where: w is the Laplace parameter and K0 is the modified Bessel function of the second kind and order zero.

Thus, the aquifer drawdown in the Laplace space is given by:

$$\bar{s}(r,w) = \frac{Q}{2\pi T}\cdot\frac{K_0\!\left(r\sqrt{(S/T)\,w}\right)}{w} \qquad (8)$$

This equation in the real space is as follows:

$$s(r,t) = h_0 - h(r,t) = \frac{Q}{2\pi T}\cdot L^{-1}\!\left[\frac{K_0\!\left(r\sqrt{(S/T)\,w}\right)}{w}\right] \qquad (9)$$

The Theis' solution is then defined by:

$$s = \frac{Q}{4\pi T}\int_u^{\infty}\frac{e^{-y}}{y}\,dy = \frac{Q}{4\pi T}\,W(u) \qquad (10)$$

where:

$$u = \frac{r^2 S}{4\,T\,t} \qquad (11)$$

Finally, from Equation (10), we have:

$$W(u) = 2\,L^{-1}\!\left[\frac{K_0\!\left(r\sqrt{(S/T)\,w}\right)}{w}\right] \qquad (12)$$

where L^{-1} is the inverse Laplace operator.

From the analysis of the Theis' model, it is observed that a high technical knowledge of the aquifer is indispensable to model it, since the aquifer is mapped under some hypotheses, such as confined aquifer, homogeneous, isotropic, constant thickness, etc. Moreover, other aquifer parameters (transmissivity coefficient, storage coefficient and hydraulic conductivity) to be explored must also be defined. Thus, the mathematical models require expert knowledge of concepts and tools of hydrogeology.
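Because the well function W(u) in equations (10)-(11) is the exponential integral E1(u), the Theis drawdown can be evaluated numerically as sketched below; the aquifer parameters in the example are illustrative values, not data from the chapter.

```python
import numpy as np
from scipy.special import exp1  # exponential integral E1(u), equal to W(u) in (10)-(11)

def theis_drawdown(Q, T, S, r, t):
    """Drawdown s(r, t) of a confined aquifer from the Theis solution.

    Q: pumping rate (m3/s), T: transmissivity (m2/s), S: storage coefficient (-),
    r: distance to the pumping well (m), t: time since pumping started (s).
    """
    u = (r ** 2 * S) / (4.0 * T * t)
    return Q / (4.0 * np.pi * T) * exp1(u)

# Illustrative values only (not taken from the chapter):
print(theis_drawdown(Q=0.02, T=5e-3, S=2e-4, r=50.0, t=3600.0))
```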

It is also indispensable to consider that the aquifer of a specific region shows continuous changes in its exploration conditions. The changes are normally motivated by the companies that operate the exploration systems, by the drilling of new wells or by changes in the exploration conditions, or even by the drilling of illegal wells; neighboring wells will also be able to cause interference on the aquifer, and these aspects are related to the operation time, the static level and, obviously, to the intrinsic characteristics of the aquifer under exploration. These changes have certainly required immediate adjustments to the Theis' model. Another fact is that the aquifer dynamic level changes in relation to the exploration flow. Therefore, although it is possible to estimate the aquifer behavior using mathematical models, such as those presented in [11]-[16], they present low precision because it is more difficult to consider all the parameters related to the aquifer dynamics. For these situations, intelligent approaches [17]-[20] have also been used to obtain a good performance.

4. Defining the Global Energetic Efficiency Indicator

As presented in [3], "Energetic Efficiency" is a generalized concept that refers to a set of actions to be done that make possible the reduction of the demand for electrical energy, or to the description of the results achieved. The energetic efficiency indicators are established through relationships and variables that can be used in order to monitor the variations and deviations of the energetic efficiency of the systems. The descriptive indicators are those that characterize the energetic situation without looking for a justification for its variations or deviations.

The theoretical concept for the proposed Global Energetic Efficiency Indicator will be presented using classical equations that show the relationship between the power absorbed from the electric system and the other parameters involved with the process. In this context, the power of a motor-pump set is given by:

$$P_{mp} = \frac{\gamma \cdot Q \cdot H_T}{75 \cdot \eta_{mp}} \qquad (13)$$

where: Pmp is the power of the motor-pump set (CV); γ is the specific weight of the water (1000 kgf/m3); Q is the water flow (m3/s); HT is the total manometric height (m); ηmp is the efficiency of the motor-pump set (ηmotor ⋅ ηpump).

Substituting the values {1 CV ≅ 736 Watts, 1 m3/s = 1/3600 m3/h, γ = 1000 kgf/m3} in equation (13) (note that 736 × 1000 / (75 × 3600) ≈ 2.726), we have:

$$P_{el} = \frac{2.726 \cdot Q \cdot H_T}{\eta_{mp}} \qquad (14)$$

where: Pel is the electric power absorbed from the electric system (W) and Q is the water flow, now expressed in m3/h.

The total manometric height (HT) in elevator sets for water extraction from underground aquifers is given by:

$$H_T = H_a + H_g + \Delta h_{ft} \qquad (15)$$

where: HT is the total manometric height (m); Ha is the dynamic level of the aquifer in the well (m); Hg is the geometric difference in level between the well surface and the reservoir (m); Δhft is the total load loss in the hydraulic circuit (m).

Observing again Figure 1, it is verified that the energy necessary to transport the water from the aquifer to the reservoir, overcoming all the inherent load losses, is supplied by the electric system to the motor-pump set. Thus, using these considerations and substituting (15) in (14), we have:

$$P_{el} = \frac{2.726 \cdot Q \cdot (H_a + H_g + \Delta h_{ft})}{\eta_{mp}} \qquad (16)$$

From the analysis of the variables in (15), it is observed that only the variable corresponding to the geometric difference in level (Hg) can be considered constant, while the other two will change along the operation time of the well. The dynamic level (Ha) will change (lowering) from the beginning of the pumping until the moment of stabilization; this observation is verified in short periods of time (a month, for instance). Besides this variation, it is possible that other types of variation take place, as well as alterations in the aquifer characteristics, as for instance due to interferences from other neighboring wells.

The total load loss will also vary during the pumping, and it depends on the hydraulic circuit characteristics (diameter, piping length, valves, curves, etc.). These characteristics can be considered constant, since they usually do not change after installed. However, the total load loss is also dependent on other characteristics of the hydraulic circuit, which frequently change along the useful life of the well and can present a cyclic behavior. These variable characteristics are given by: i) roughness of the piping system; ii) water flow; and iii) operational problems, such as semi-closed valves, leakage, etc.

After the beginning of the pumping, the water level inside the well lowers; hence the manometric height changes and, as a result, the water flow also changes. The efficiency of the motor-pump set will also change along its useful life due to equipment wearing, piping incrustations, obstructions of filters inside the well, leakages in the hydraulic system, closed or semi-closed valves, etc. Since the total manometric height is a fictitious value, it is impossible to make a direct measurement of it, and its value is obtained through relationships between other quantities.

From (16), and considering that an energetic efficiency indicator should be a generic descriptive indicator, the Global Energetic Efficiency Indicator (GEEI) is here proposed by the following equation:

$$GEEI = \frac{P_{el}}{Q \cdot (H_a + H_g + \Delta h_{ft})} \qquad (17)$$

where: Pel is the electric power absorbed from the electric system (W); Q is the water flow (m3/h); Ha is the dynamic level of the aquifer in the well (m); Hg is the geometric difference of level between the well surface and the reservoir (m); Δhft is the total load loss in the hydraulic circuit (m).

Observing equation (17), it is verified that the GEEI will depend on the electric power, water flow, dynamic level, geometric difference of level and total load loss of the hydraulic circuit. The efficiency of the motor-pump set (ηmp = ηmotor ⋅ ηpump) does not take part in (17) because its behavior will be reflected inversely by the GEEI: when the efficiency of the motor-pump set is high, the GEEI will be low. Therefore, the best GEEI values will be those presenting the smallest numeric values. Another reason to exclude the efficiency of the motor-pump set from (17) is the difficulty of obtaining this value in practice.

Therefore, writing the sum of the height variables in (17) as the total manometric height, the most generic form of the GEEI is given by:

$$GEEI = \frac{P_{el}}{Q \cdot H_T} \qquad (18)$$

The GEEI defined in (18) can be used to analyze the well behavior along the time.
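A minimal sketch of how the GEEI of equations (17)-(18) could be computed from field measurements is given below; the numeric reading used in the example is hypothetical.

```python
def geei(p_el, flow, h_a, h_g, dh_ft):
    """Global Energetic Efficiency Indicator of equations (17)-(18), in Wh/m3.m.

    p_el: electric power absorbed from the grid (W); flow: water flow (m3/h);
    h_a: dynamic level (m); h_g: geometric difference of level (m);
    dh_ft: total load loss in the hydraulic circuit (m).
    """
    h_t = h_a + h_g + dh_ft        # total manometric height, equation (15)
    return p_el / (flow * h_t)

# Illustrative reading (values are hypothetical, not taken from Table 2):
print(geei(p_el=26000.0, flow=54.0, h_a=32.0, h_g=48.0, dh_ft=15.0))
```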

5. Neural Approach Used to Determine the Global Energetic Efficiency Indicator

Among all the parameters necessary to determine the proposed GEEI, the exploration flow is the most difficult to obtain in practice, since it is a dependent variable of those that will be used as input variables. The use of flow meters, such as the electromagnetic ones, is very expensive, and the use of rudimentary tests has provided imprecise results. To overcome this practical problem, the use of artificial neural networks is proposed here to determine the exploration flow from the other parameters that are measured before determining the GEEI.

Artificial Neural Networks (ANN) are dynamic systems that explore parallel and adaptive processing architectures. They consist of several simple processor elements with a high degree of connectivity between them [4]. Each one of these elements is associated with a set of parameters, known as network weights. The process of adjusting the weights to suitable values (network training) is carried out through the successive presentation of a set of training data. The objective of the training is the minimization of the difference between the output (response) generated by the network and the respective desired output. After the training process, the network is able to estimate output values for input sets which were not included in the training data.

The ability of artificial neural networks to map complex nonlinear functions makes them an attractive tool to identify and estimate models representing the dynamic behavior of engineering processes. This feature is particularly important when the relationship between the several variables involved with the process is nonlinear and/or not very well defined, making its modeling difficult by conventional techniques. The functional approximation consists of mapping the relationship between the several variables that describe the behavior of a real system [5]. In this work, an ANN is used as a functional approximator: a multilayer perceptron (MLP), trained by the backpropagation algorithm, as shown in Figure 2, which allows the mapping of a set of known values (network inputs) to a set of associated values (network outputs), was used as a practical tool to determine the water flow from the measured parameters.

The input variables applied to the proposed neural network were the following:

• Level of water in meters (Ha) inside the well at the instant t.

• Electric power in Watts (Pel) absorbed from the electric system at the instant t.

• Manometric height in meters of water column (Hm) at the instant t.

The unique output variable was the exploration flow of the aquifer (Q), expressed in cubic meters per hour. It is important to observe that, for each set of input values at a certain instant t, the neural network returns a result for the flow at that same instant t. The determination of the GEEI is then done by using in equation (18) the flow values obtained from the neural network and the other parameters that come from experimental measurements.

Figure 2. Multilayer perceptron used to determine the water flow.

For the training of the neural network, all these variables (inputs and output) were measured and provided to the network. The values of the input variables and the respective output for a certain pumping period, which were used in the network training, are given by a set composed of 40 training patterns (or training vectors). These patterns were applied to a neural network of MLP type (Multilayer Perceptron) with two hidden layers, and its training was done using the backpropagation algorithm based on the Levenberg-Marquardt method [6]. A description of the main steps of this algorithm is presented in the Appendix. After training, the network was able to estimate the respective output variable.

The network topology used is similar to that presented in Figure 2. The number of hidden layers and the number of neurons in each layer were determined from results obtained in [7,8]. The network is here composed of two hidden layers, and the following parameters were used in the training process:

• Number of neurons of the 1st hidden layer: 15 neurons.

• Number of neurons of the 2nd hidden layer: 10 neurons.

• Training algorithm: Levenberg-Marquardt.

• Number of training epochs: 5000 epochs.

At the end of the training process, the mean squared error obtained was 7.9x10-5, which is a value considered acceptable for this application [7].
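A rough sketch of the described topology (three inputs, hidden layers with 15 and 10 neurons, one output) is given below using scikit-learn; note that this library does not provide the Levenberg-Marquardt optimizer used in the chapter, so its default solver is used instead, and the four training patterns are hypothetical placeholders for the 40 measured ones.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Hypothetical training patterns: columns are Ha (m), Pel (W) and Hm (m),
# and the target is the measured exploration flow Q (m3/h).
X = np.array([[25.9, 26000.0, 32.0],
              [25.8, 25900.0, 33.5],
              [26.2, 26200.0, 31.0],
              [25.9, 25950.0, 34.1]])
y = np.array([54.0, 53.4, 55.1, 53.0])

# Same topology as described in the text (two hidden layers with 15 and 10 neurons);
# scikit-learn has no Levenberg-Marquardt solver, so the library default is used instead.
model = MLPRegressor(hidden_layer_sizes=(15, 10), max_iter=5000, random_state=0)
model.fit(X, y)

print(model.predict([[25.95, 26050.0, 32.5]]))  # estimated flow for a new reading
```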

After the training process, values of the input variables were applied to the network and the respective values of flow were obtained at its output. These values were then compared with the measured ones in order to evaluate the obtained precision. Table 1 shows some values of flow given by the artificial neural network (QANN) and those measured by experimental tests (QET).

Table 1. Comparison of results. For each operating condition, the table lists the measured inputs Ha (m), Pel (W) and Hm (m), the flow estimated by the network QANN (m3/h) and the flow measured by experimental tests QET (m3/h).

In this table, the values in bold were not presented to the neural network during the training. When the patterns used in the training are presented again, it is noticed that the difference between the results is very small, reaching a maximum value of 0.5%. When new patterns are used, the highest error reaches the value of 14.35% of the measured value. It is also observed that the error for new patterns decreases when they represent an operational stability situation of the motor-pump set, i.e., when they are far away from the transitory period of pumping. At this point, it should be observed that a greater number of training patterns would be desirable for the neural network, especially if they could be obtained from a wider variation of the range of values.
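The comparison between QANN and QET is expressed as a percentage relative error; a minimal sketch, assuming the estimated and measured flows are available as arrays (the values below are placeholders, not the data of Table 1):

```python
import numpy as np

q_ann = np.array([53.2, 54.1, 48.7])   # flows estimated by the network (placeholders)
q_et  = np.array([53.0, 54.5, 48.9])   # flows measured experimentally (placeholders)

relative_error = 100.0 * np.abs(q_ann - q_et) / q_et
print(relative_error)          # per-pattern error in percent
print(relative_error.max())    # highest error over the evaluated patterns
```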

The proposed GEEI was determined by equation (18), and the measured values used were the electric power, the pressure of output in the well, the geometric difference of level, the dynamic level, and the water flow obtained from the neural network. Figure 3 shows the behavior of the GEEI during the analyzed pumping period. The numeric values that generated the graphic in Figure 3 are presented in Table 2.

Table 2. GEEI calculated using the artificial neural network. The table lists the operation time (min), from 0 to 485 minutes, and the corresponding GEEI(t) (Wh/m3.m); values marked with an asterisk correspond to the GEEI in the transitory period (from 0 to 20 min of pumping).

Figure 3. Behavior of the GEEI in relation to time.

6. Estimation of Aquifer Dynamic Behavior Using Neural Networks

In this section, artificial neural networks are used to map the relationship between the variables associated with the identification of the aquifer dynamic behavior. The general architecture of the neural system used in this application is shown in Figure 4, where two neural networks of MLP type, MLP-1 and MLP-2 (referred to below as ANN-1 and ANN-2), constituted respectively by one and two hidden layers, compose this architecture.

Figure 4. General architecture of the ANN used for estimation of aquifer dynamic behavior.

The first network (ANN-1) has 10 neurons in its hidden layer and is responsible for the computation of the aquifer operation level. It is important to note that this network takes into account the present level and the rest time of the aquifer. The training data for ANN-1 were directly obtained from experimental measurements.

The second network (ANN-2) is responsible for the computation of the aquifer dynamic level and is composed of 2 hidden layers, both having 10 neurons. As observed in Figure 4, the ANN-1 output is provided as an input parameter to ANN-2. Therefore, the computation of the aquifer dynamic level takes into account the aquifer operation level, the exploration flow and the operation time. For this network, the training data were also obtained from experimental measurements.

After the training process of the neural networks, they were used for estimation of the aquifer dynamic level. The simulation results obtained by the networks are presented in Table 3 and Table 4. Table 3 presents the simulation results obtained by ANN-1 for a particular well: the operation levels computed by the network, taking into account the present level and rest time of the aquifer, were compared with those obtained by measurements. In this table, the 'Relative Error' column provides the relative error between the values estimated by the network and those obtained by measurements.

Table 3. Simulation results (ANN-1). Columns: Present Level (meters), Rest Time (hours), Operation Level (ANN-1), Operation Level (Exact), Relative Error (%).

Table 4. Simulation results (ANN-2). Columns: Operation Flow (m3/h), Operation Time (hours), Dynamic Level (ANN-2), Dynamic Level (Exact), Relative Error (%).
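The cascade of Figure 4 can be summarized in code: the operation level estimated by ANN-1 becomes one of the inputs of ANN-2. In the sketch below, ann1 and ann2 are untrained stand-ins with random weights that only reproduce the stated topologies (one hidden layer of 10 neurons, and two hidden layers of 10 neurons); the numeric example at the end is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
w1, w2 = rng.normal(size=(10, 2)), rng.normal(size=10)                 # stand-in ANN-1 weights
v1 = rng.normal(size=(10, 3))                                          # stand-in ANN-2 weights
v2, v3 = rng.normal(size=(10, 10)), rng.normal(size=10)

def ann1(present_level, rest_time):
    """Stand-in for the trained ANN-1 (one hidden layer of 10 neurons):
    present level and rest time -> aquifer operation level."""
    h = np.tanh(w1 @ np.array([present_level, rest_time]))
    return float(w2 @ h)

def ann2(operation_level, exploration_flow, operation_time):
    """Stand-in for the trained ANN-2 (two hidden layers of 10 neurons):
    operation level, exploration flow and operation time -> dynamic level."""
    h = np.tanh(v1 @ np.array([operation_level, exploration_flow, operation_time]))
    h = np.tanh(v2 @ h)
    return float(v3 @ h)

def estimate_dynamic_level(present_level, rest_time, exploration_flow, operation_time):
    operation_level = ann1(present_level, rest_time)                   # first stage (Figure 4)
    return ann2(operation_level, exploration_flow, operation_time)     # second stage

print(estimate_dynamic_level(104.0, 9.0, 160.0, 6.0))                  # illustrative inputs only
```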

The simulation results obtained by the ANN-2 are provided in Table 4. The dynamic level of the aquifer is estimated by the network in relation to the operation level (computed by ANN-1), the exploration flow and the operation time. These results are also compared with those obtained by measurements. In Table 4, the 'Relative Error' column gives the relative error between the values computed by the network and those from measurements.

From the analysis of the results presented in Tables 3 and 4, it is verified that the relative error between the values provided by the networks and those obtained by experimental measurements is very small. For ANN-1 the greatest relative error is 1.58% (Table 3), and for ANN-2 it is 0.52% (Table 4). The values estimated by the networks are thus accurate to within approximately 1.5% of the exact values for ANN-1 and 0.5% for ANN-2. These results show the efficiency of the neural approach used for estimation of the aquifer dynamic behavior.

7. Conclusion

The management of systems that explore underground aquifers includes the analysis of two basic components: the water, which comes from the aquifer, and the electric energy, which is necessary for the transportation of the water to the consumption point or reservoir. Thus, the development of an efficiency indicator that shows the energetic behavior of a certain capitation system is of great importance for the efficient management of energy consumption, or still, to convert the obtained results into actions that make a reduction of energy consumption possible.

The application of the proposed methodology uses parameters that are easily obtained in the water exploration system. The obtained GEEI indicates the global energetic behavior of the water capitation system from aquifers and is an indicator of occurrences of abnormalities, such as tubing breaks or obstructions. The GEEI calculus can also be done by operators or be implemented by means of a computational system.

In addition, a novel methodology for estimation of aquifer dynamic behavior using artificial neural networks was also presented in this chapter. The estimation process is carried out by two feedforward neural networks. The main advantages of using this neural network approach are the following: i) velocity: the estimation of dynamic levels is instantly computed, which is appropriate for application in real time; ii) economy and simplicity: reduction of operational costs and measurement devices; and iii) precision: the values estimated by the proposed approach are as good as those obtained by physical measurements. Simulation results confirm that the proposed approach can be efficiently used in these types of problems. By means of the proposed approach, it is possible to simulate several situations in order to define appropriate management plans and policies for the aquifer.

8. Appendix

The mathematical model that describes the behavior of the artificial neuron is expressed by the following equations:

u = \sum_{i=1}^{n} w_i \cdot x_i + b   (19)

y = g(u)   (20)

where n is the number of inputs of the neuron, w_i is the weight associated with the i-th input, x_i is the i-th input of the neuron, b is the threshold associated with the neuron, u is the activation potential, g(.) is the activation function of the neuron, and y is the output of the neuron.

Basically, an artificial neuron works as follows: (a) signals are presented to the inputs; (b) each signal is multiplied by a weight that represents its influence in that unit; (c) a weighted sum of the signals is made, resulting in a level of activity; (d) if this level of activity exceeds a certain threshold, the unit produces an output.

To approximate any continuous nonlinear function, a neural network with only one hidden layer can be used. However, to approximate functions that are not continuous in their domain, it is necessary to increase the number of hidden layers. Therefore, these networks are of great importance in mapping nonlinear processes and in identifying the relationships between the variables of such systems, which are generally difficult to obtain by conventional techniques.

The network weights (w_j) associated with the j-th output neuron are adjusted by computing the error signal linked to the k-th iteration, or k-th input vector (training example). This error signal is provided by:

e_j(k) = d_j(k) - y_j(k)   (21)

where d_j(k) is the desired response of the j-th output neuron. Adding all squared errors produced by the output neurons of the network with respect to the k-th iteration, we have:

E(k) = \frac{1}{2} \sum_{j=1}^{p} e_j^2(k)   (22)

where p is the number of output neurons.
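A minimal sketch of equations (19)-(22) in Python/NumPy, assuming tanh as the activation function g(.), which is not fixed by the text:

```python
import numpy as np

def neuron(x, w, b, g=np.tanh):
    """Single artificial neuron: weighted sum (Eq. 19) followed by activation (Eq. 20)."""
    u = np.dot(w, x) + b      # activation potential
    return g(u)               # neuron output

def iteration_error(d, y):
    """Squared-error cost of one training example over all output neurons (Eqs. 21-22)."""
    e = np.asarray(d) - np.asarray(y)   # error signal per output neuron
    return 0.5 * np.sum(e ** 2)

x = np.array([0.2, -0.5, 0.8])
w = np.array([0.1, 0.4, -0.3])
print(neuron(x, w, b=0.05))
print(iteration_error([1.0], [neuron(x, w, b=0.05)]))
```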

For an optimum weight configuration, E(k) is minimized with respect to the synaptic weight w_ji. The weights associated with the output layer of the network are therefore updated using the following relationship:

w_{ji}(k+1) \leftarrow w_{ji}(k) - \eta \frac{\partial E(k)}{\partial w_{ji}(k)}   (23)

where w_ji is the weight connecting the j-th neuron of the output layer to the i-th neuron of the previous layer, and \eta is a constant that determines the learning rate of the backpropagation algorithm. The adjustment of the weights belonging to the hidden layers of the network is carried out in an analogous way; the basic steps needed to adjust the weights associated with the hidden neurons can be found in [4].

Since the backpropagation learning algorithm was first popularized, there has been considerable research into methods to accelerate its convergence. While backpropagation is a steepest descent algorithm, the Levenberg-Marquardt algorithm is similar to the quasi-Newton method, which was designed to approach second-order training speed without having to compute the Hessian matrix. When the performance function has the form of a sum of squared errors, like that presented in (22), the Hessian matrix can be approximated as

H = J^T J   (24)

and the gradient can be computed as

g = J^T e   (25)

where e is a vector of network errors and J is the Jacobian matrix that contains the first derivatives of the network errors with respect to the weights and biases. The Levenberg-Marquardt algorithm uses this approximation to the Hessian matrix in the following Newton-like update:

w(k+1) \leftarrow w(k) - (J^T J + \mu I)^{-1} J^T e   (26)

When the scalar \mu is zero, this is Newton's method using the approximate Hessian matrix. When \mu is large, it produces a gradient descent with a small step size. Newton's method is faster and more accurate near an error minimum, so the aim is to shift toward Newton's method as quickly as possible. Thus, \mu is decreased after each successful step (reduction in the performance function) and is increased only when a tentative step would increase the performance function. In this way, the performance function is always reduced at each iteration of the algorithm [6]. This algorithm appears to be the fastest method for training moderate-sized feedforward neural networks (up to several hundred weights).
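The Newton-like update (26), together with the usual adjustment of the scalar mu, can be sketched as follows. The helper functions jacobian_fn and error_fn, as well as the factors 0.1 and 10 used to decrease and increase mu, are assumptions for illustration and are not specified in the chapter.

```python
import numpy as np

def levenberg_marquardt_step(w, jacobian_fn, error_fn, mu):
    """One Levenberg-Marquardt update (Eq. 26) with a simple mu adaptation.

    jacobian_fn(w) -> J, Jacobian of the network errors w.r.t. the weights;
    error_fn(w)    -> e, vector of network errors.
    """
    J = jacobian_fn(w)
    e = error_fn(w)
    H = J.T @ J                                   # approximate Hessian (Eq. 24)
    g = J.T @ e                                   # gradient (Eq. 25)
    step = np.linalg.solve(H + mu * np.eye(len(w)), g)
    w_new = w - step                              # Newton-like update (Eq. 26)

    # Accept the step only if the sum of squared errors decreased
    if np.sum(error_fn(w_new) ** 2) < np.sum(e ** 2):
        return w_new, mu * 0.1                    # successful step: decrease mu
    return w, mu * 10.0                           # rejected step: increase mu and retry
```

In a full implementation this step would be repeated until the error, or the step size, falls below a chosen tolerance.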

Author details

Ivan N. da Silva1*, José Ângelo Cagnon2 and Nilton José Saggioro3

*Address all correspondence to: insilva@sc.usp.br

1 University of São Paulo (USP), São Carlos, SP, Brazil
2 São Paulo State University (UNESP), Bauru, SP, Brazil
3 University of São Paulo (USP), Brazil

References

[1] Domenico, P. A. (1972). Concepts and Models in Groundwater Hydrology. New York: McGraw-Hill.

[2] Domenico, P. A., & Schwartz, F. W. (1990). Physical and Chemical Hydrogeology. New York: John Wiley and Sons.

[3] Saggioro, N. J. (2001). Development of Methodology for Determination of Global Energy Efficiency Indicator to Deep Wells. Master's degree dissertation (in Portuguese). São Paulo State University, Bauru, SP, Brazil.

[4] Haykin, S. (2009). Neural Networks and Learning Machines. 3rd edition. New York: Prentice-Hall.

[5] Anthony, M., & Bartlett, P. L. (1999). Neural Network Learning: Theoretical Foundations. Cambridge: Cambridge University Press.

[6] Hagan, M. T., & Menhaj, M. B. (1994). Training Feedforward Networks with the Marquardt Algorithm. IEEE Transactions on Neural Networks, 5(6), 989-993.

[7] Silva, I. N., Saggioro, N. J., & Cagnon, J. A. (2000). Using neural networks for estimation of aquifer dynamical behavior. In: Proceedings of the International Joint Conference on Neural Networks, IJCNN2000, Como, Italy, 24-27 July 2000.

[8] Cagnon, J. A., Saggioro, N. J., & Silva, I. N. (2000). Application of neural networks for analysis of the groundwater aquifer behavior. In: Proceedings of the IEEE Industry Applications Conference, INDUSCON2000, Porto Alegre, Brazil, 06-09 November 2000.

[9] Driscoll, F. G. (1986). Groundwater and Wells. Minneapolis: Johnson Division.

[10] Allen, D. M., Schuurman, N., & Zhang, Q. (2007). Using Fuzzy Logic for Modeling Aquifer Architecture. Journal of Geographical Systems, 9, 289-310.

[11] Delhomme, J. P. (1979). Spatial Variability and Uncertainty in Groundwater Flow Parameters: A Geostatistical Approach. Water Resources Research, 15(2), 269-280.

[12] Koike, K., Sakamoto, H., & Ohmi, M. (2001). Detection and Hydrologic Modeling of Aquifers in Unconsolidated Alluvial Plains through Combination of Borehole Data Sets: A Case Study of the Arao Area, Southwest Japan. Engineering Geology, 62(4), 301-317.

[13] Scibek, J., & Allen, D. M. (2006). Modeled Impacts of Predicted Climate Change on Recharge and Groundwater Levels. Water Resources Research, 42, doi: 10.1029/2005WR004742.

[14] Fu (2011). A mathematic time dependent boundary model for flow to a well in an unconfined aquifer. In: Proceedings of the International Symposium on Water Resource and Environmental Protection, ISWREP2011, Xi'an, China, 20-22 May 2011.

[15] Jinyan (2010). Identifying aquifer parameters based on the algorithm of simple pure shape. In: Proceedings of the 5th International Conference on Computer Sciences and Convergence Information Technology, ICCIT2010, Seoul, Korea, 30 November to 02 December 2010.

[16] Hongfei (2009). Aquifer parameter identification with ant colony optimization algorithm. In: Proceedings of the International Workshop on Intelligent Systems and Applications, ISA2009, Wuhan, China, 23-24 May 2009.

[17] Cameron, E., & Peloso, G. F. (2001). An Application of Fuzzy Logic to the Assessment of Aquifers' Pollution Potential. Environmental Geology, 40(11-12), 1305-1315.

[18] Gemitzi, A., Petalas, C., Tsihrintzis, V. A., & Pisinaras, V. (2006). Assessment of Groundwater Vulnerability to Pollution: A Combination of GIS, Fuzzy Logic and Decision Making Techniques. Environmental Geology, 49(5), 653-673.

[19] Hong, Y. S., Rosen, M. R., & Reeves, R. R. (2002). Dynamic Fuzzy Modeling of Storm Water Infiltration in Urban Fractured Aquifers. Journal of Hydrologic Engineering, 7(5), 380-391.

[20] He (2011). A mathematic time dependent boundary model for flow to a well in an unconfined aquifer. In: Proceedings of the International Symposium on Water Resource and Environmental Protection, ISWREP2011, Xi'an, China, 20-22 May 2011.


Chapter 12

Use of Artificial Neural Networks to Predict The Business Success or Failure of Start-Up Firms

Francisco Garcia Fernandez, Ignacio Soret Los Santos, Javier Lopez Martinez, Santiago Izquierdo Izquierdo and Francisco Llamazares Redondo

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/51381

1. Introduction

There is great interest in knowing whether a new company will be able to survive or not. Investors use different tools to evaluate the survival capabilities of middle-aged companies, but there is no such tool for start-up ones. Most of the available tools are based on regression models and on quantitative variables. Nevertheless, qualitative variables that measure the company's way of working and the manager's skills can be considered as important as the quantitative ones, and developing a global regression model that includes quantitative and qualitative variables can be very complicated.

In this case, artificial neural networks can be a very useful tool to model the company's survival capabilities. They have been widely used in engineering process modeling, but also in economy and business modeling, and there is no problem in mixing quantitative and qualitative variables in the same model. This kind of network is called a mixed artificial neural network.

2. Materials and methods

2.1. A snapshot of entrepreneurship in Spain in 2009

The Spanish entrepreneurship's basic indexes through 2009 have been affected by the economic crisis.

After a moderate drop (8%) in 2008, the Total Entrepreneurial Activity index (TEA) experienced a great drop (27.1%) in 2009, returning to 2004 levels ([1] de la Vega García, 2010) (Fig. 1). This rate implies that in our country there are 1,534,964 nascent businesses (between 0 and 3 months old). The owner-managers of a new business (more than 3 months but not more than 3.5 years old) have also declined in 2009, returning to 2007 levels. The female entrepreneurial initiatives have declined in 2009 and the difference between female and male Total Entrepreneurial Activity index (TEA) rates is now bigger than in 2008: the gender difference in the TEA index has increased from two to almost three points, the female TEA index now being 3.33% and the male TEA index 6.29%.

Figure 1. Evolution of the Total Entrepreneurial Activity (TEA) index in Spain (Global Entrepreneurship Monitor, Executive Report, Spain).

As in other comparable countries, the typical early-stage entrepreneur in Spain is male (62.5% of all entrepreneurs), with a mean age of 36, and well educated (55.4% with a university degree). Although most individuals are pulled into entrepreneurial activity because of opportunity recognition (around 80%), others are pushed into entrepreneurship because they have no other means of making a living, or because they fear becoming unemployed in the near future. These necessity entrepreneurs are 15.8% of the entrepreneurs in Spain. Therefore, the entrepreneurial initiative is less ambitious in general. The median amount invested by entrepreneurs in 2009 was around 30,000 Euros (less than the median amount of 50,000 Euros in 2008).

In Spain, the distribution of early-stage entrepreneurial activity and established business owner/managers by industry sector is similar to that in other innovation-driven countries, where business services (i.e. tertiary activities that target other firms as main customers, such as finance, insurance, real estate, data analysis, etc.) prevail. The real estate activity in Spain was of great importance, and its decline explains the reduction in the business services sector in 2009. Consumer services (i.e. retail, restaurants, tourism), which are typical of efficiency-driven countries, were the second largest sector, while extraction businesses (farming, forestry, fishing, mining), which are typical of factor-driven economies, and transforming businesses (manufacturing and construction) accounted for the remaining early-stage and established business activity.

The factors that most constrain entrepreneurial activity are: first, financial support (e.g., availability of debt and equity), which was cited as a constraining factor by 62% of respondents; second, government policies supporting entrepreneurship, which was cited as a constraining factor by 40% of respondents; and third, social and cultural norms, which was cited as a constraining factor by 32% of respondents.

More than one fifth of the entrepreneurial activity (21.5%) was developed in a familiar model, often driven by family members, and received financial support or management assistance from some family members. The influence of some knowledge, technology or research result developed in the University was bigger than expected: people decided to start businesses because they used some knowledge, technology or research result developed in the University (14.3% of the nascent businesses and 10.3% of the owner-managers of a new business).

2.2. Questionnaire

It is clear that the company survival is greatly influenced by its financial capabilities. Nevertheless, this numerical information is not always easy to obtain and, even when obtained, it is not always reliable. There are some other qualitative factors that have influence on the company success, such as its technological capabilities, quality policies, or the academic level of its employees and manager. In this study we will use both numerical and qualitative data to model the company survival (Table 1).

Variable                                           Type           Range
Working Capital/Total Assets                       Quantitative   R+
Retained Earnings/Total Assets                     Quantitative   R+
Earnings Before Interest and Taxes/Total Assets    Quantitative   R+
Market Capitalization/Total Debts                  Quantitative   R+
Sales/Total Assets                                 Quantitative   R+
Manager academic level                             Qualitative    1-4
Company technological resources                    Qualitative    1-4
Quality policies                                   Qualitative    1-5
Trademark                                          Qualitative    1-3
Employees education policy                         Qualitative    1-2
Number of innovation areas                         Qualitative    1-5
Marketing experience                               Qualitative    1-3
Knowledge of the business area                     Qualitative    1-3
Openness to experience                             Qualitative    1-2

Table 1. Variables.
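As an illustration of how the mixed quantitative and qualitative variables of Table 1 can be fed to a network, the record of one company could be encoded as a single numeric vector. The variable names and all values below are invented for the example; they are not taken from the study's data.

```python
import numpy as np

company = {
    # quantitative ratios (R+)
    "working_capital_to_assets": 0.18,
    "retained_earnings_to_assets": 0.05,
    "ebit_to_assets": 0.07,
    "market_cap_to_debts": 1.40,
    "sales_to_assets": 0.90,
    # qualitative variables, coded with the scales of section 2.2.2
    "manager_academic_level": 3,      # 1-4
    "technological_resources": 2,     # 1-4
    "quality_policies": 4,            # 1-5
    "trademark": 2,                   # 1-3
    "employees_education_policy": 1,  # 1-2
    "innovation_areas": 3,            # 1-5
    "marketing_experience": 2,        # 1-3
    "business_area_knowledge": 3,     # 1-3
    "openness_to_experience": 1,      # 1-2
}

x = np.array(list(company.values()), dtype=float)   # 14-dimensional input vector
print(x.shape)   # (14,)
```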

2.2.1. Financial data

The most used ratios to predict the company success are the Altman ratios ([2] Lacher et al., 1995; [3] Atiya, 2001):

• Working Capital/Total Assets. Working Capital is defined as the difference between current assets and current liabilities. Current assets include cash, inventories, receivables and marketable securities. Current liabilities include accounts payable, short-term provisions and accrued expenses.
• Retained Earnings/Total Assets. This ratio is especially important because bankruptcy risk is higher for start-ups and young companies.
• Earnings Before Interest and Taxes/Total Assets. Since a company's existence is dependent on the earning power of its assets, this ratio is appropriate for failure prediction.
• Market Capitalization/Total Debts. This ratio weighs up the dimension of a company's competitive market place value.
• Sales/Total Assets. This ratio measures the firm's asset utilization.

2.2.2. Qualitative data

The choice of which qualitative data should be used is based on previous works, as in references [4-6] (Aragón Sánchez and Rubio Bañón, 2002 and 2005; Woods and Hampson, 2005), where several parameters are established to value the company positioning and its survival capabilities, as well as the influence of the manager's personality on the company survival.

• Manager academic level, ranged from 1 to 4:
  o Basic studies (1).
  o High school (2).
  o University degree (3).
  o PhD or Master (4).
• Company technological resources, ranged from 1 to 4:
  o The company uses older software than competitors (1).
  o The company uses the same software as competitors (2).
  o The company uses specific programs but buys them (3).
  o The company uses self-made software programs (4).
• Quality policies, ranged from 1 to 5:
  o The company has not any quality policy; supplies control is the only quality control in the company (1).
  o A production control is the only quality policy (2).
  o The company controls supplies, production and client satisfaction (4).
  o The company has quality policies based on ISO 9000 (5).

• Trademark, ranged from 1 to 3:
  o The company trademark is less known than competitors' (1).
  o The company trademark is as known as competitors' (2).
  o The company trademark is better known than competitors' (3).
• Employees education policy, ranged from 1 to 2:
  o The company is not involved in its employees' education (1).
  o The company is involved in its employees' education (2).
• Number of innovation areas in the company, ranged from 1 to 5.
• Marketing experience, ranged from 1 to 3:
  o The company has no marketing experience (1).
  o The company only has marketing experience in its field of duty (2).
  o The company has great marketing experience in the field of its products and in others (3).
• Knowledge of the business area, ranged from 1 to 3:
  o The manager has no idea of the business area (1).
  o The manager knows the business area lightly (2).
  o The manager knows the business area perfectly and has been working in several companies related with it (3).
• Openness to experience, ranged from 1 to 2:
  o The manager spends time reflecting on things, has an active imagination and likes to think up new ways of doing things, but may lack pragmatism (1).
  o The manager is a practical person who is not interested in abstract ideas, prefers work that is routine and has few artistic interests (2).

The surveys will be conducted by the same team of researchers to ensure the consistency of the questions involving qualitative variables. Researchers will conduct these surveys with managers from 125 companies.

2.3. Artificial neural networks

Predictive models based on artificial neural networks have been widely used in different knowledge areas, including economy and bankruptcy prediction ([2, 7-9] Lacher et al., 1995; Jo et al., 1997; Yang et al., 1999; Hsiao et al., 2009) and the forecasting of market evolution ([10] Jalil and Misas, 2007).

Artificial neural networks are mathematical structures based on biological brains, which are capable of extracting knowledge from a set of examples ([11] Pérez and Martín, 2003). They are made up of a series of interconnected elements called neurons (Fig. 2).

Figure 2. An artificial neuron. u(.): net function; f(.): transfer function; wij: connection weights; Bi: bias.

Those neurons are organized in a series of layers (Fig. 3), and knowledge is stored in the connections between neurons ([12] Priore et al., 2002). The input layer receives the values of the example variables, the inner layers perform the mathematical operations needed to obtain the proper response, and that response is shown by the output layer.

Figure 3. Artificial neural network architecture.

There is no clear method to know how many hidden layers or how many neurons an artificial neural network must have; it depends on the problem to solve or to model, so the only way to obtain the best net is by trial and error ([13] Lin and Tseng, 2000). In this work, special software will be developed in order to find the optimum number of neurons and hidden layers.

There are many different types of artificial neural network structures. In this work, the perceptron structure has been chosen. The perceptron is one of the most used artificial neural networks, and its capability as a universal function approximator ([14] Hornik, 1989) makes it suitable for modeling many different kinds of variable relationships, especially when it is more important to obtain a reliable solution than to know how the variables are related. The network training will be carried out by means of supervised learning ([11, 19-21] Pérez and Martín, 2003; Hagan et al., 1996; Haykin, 1999; Isasi and Galván, 2004).

The hyperbolic tangent sigmoid function (Fig. 4) has been chosen as the transfer function. This function is a variation of the hyperbolic tangent ([15] Cheng, 1995), but the first one is quicker and improves the network efficiency ([16] Demuth et al., 2002).

Figure 4. Tansig function. x: neuron input; f(x): neuron output.

As the transfer function output interval is (-1, 1), the input data were normalized before training the network by means of the equation ([16-18] Demuth et al., 2002; Krauss et al., 1997; Peng et al., 2007):

X' = \frac{X - X_{min}}{X_{max} - X_{min}}   (1)

where X' is the value after normalization of vector X, and Xmin and Xmax are the minimum and maximum values of vector X.
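A minimal sketch of the tansig transfer function and of the normalization of equation (1); the tansig formula shown is the usual 2/(1+exp(-2n))-1 form, which is numerically equivalent to tanh:

```python
import numpy as np

def tansig(n):
    """Hyperbolic tangent sigmoid transfer function, 2/(1+exp(-2n)) - 1."""
    return 2.0 / (1.0 + np.exp(-2.0 * n)) - 1.0

def normalize(x):
    """Min-max normalization of one variable's values (Eq. 1)."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

print(tansig(np.array([-2.0, 0.0, 2.0])))
print(normalize([0.18, 0.05, 0.07, 0.40]))   # example values of one financial ratio
```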

As mentioned before, to find the optimum artificial neural network architecture, specific software will be developed (Fig. 5). This software automatically builds different artificial neural network structures with different numbers of neurons and hidden layers; finally, it compares the results of all the nets developed and chooses the best one (Fig. 6).

Figure 5. Program pseudocode.

The whole data set will be randomly divided into three groups with no repetition: the training set (60% of the data), the test set (20% of the data) and the validation set (20% of the data). The resilient backpropagation training method will be used for training; this method is very adequate when sigmoid transfer functions are used ([16] Demuth et al., 2002). To prevent overfitting, a very common problem during training, the training set error and the validation set error will be compared every 100 epochs. Training will be considered finished when the training set error continues to decrease while the validation set error begins to increase.
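The search procedure of Figure 5 can be approximated with a generic Python library. The chapter itself uses MATLAB and resilient backpropagation, so the sketch below is only a rough equivalent: synthetic data stand in for the survey records, the candidate architectures are arbitrary, and scikit-learn's solver and early-stopping mechanism replace the Rprop and validation scheme described above.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.uniform(size=(125, 14))          # synthetic stand-in for the 14 variables of Table 1
y = rng.integers(0, 2, size=125)         # synthetic stand-in for survival / failure

# 60% training, 20% test, 20% validation, drawn at random without repetition
X_train, X_rest, y_train, y_rest = train_test_split(X, y, train_size=0.6, random_state=0)
X_test, X_val, y_test, y_val = train_test_split(X_rest, y_rest, train_size=0.5, random_state=0)

best_net, best_score = None, -np.inf
for hidden in [(5,), (10,), (15,), (5, 5), (10, 5), (15, 10)]:   # candidate architectures
    net = MLPClassifier(hidden_layer_sizes=hidden, activation="tanh",
                        early_stopping=True, max_iter=2000, random_state=0)
    net.fit(X_train, y_train)
    score = net.score(X_val, y_val)       # validation performance drives the selection
    if score > best_score:
        best_net, best_score = net, score

print(best_net.hidden_layer_sizes, best_net.score(X_test, y_test))
```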

The Matlab Artificial Neural Network Toolbox V.4 will be used to develop the artificial neural network.

Figure 6. Neural network design flow chart.

Spain 4 MTP Metodos y Tecnologia. Acknowledgements This study is part of the 07/10 ESIC Project “Redes neuronales y su aplicación al diagnóstico empresarial. Spain 3 Universidad San Pablo CEU. Author details Francisco Garcia Fernandez1*. Santiago Izquierdo Izquierdo4 and Francisco Llamazares Redondo2 *Address all correspondence to: francisco.es 1 Universidad Politecnica de Madrid.garcia@upm. Figure 7. Spain 2 ESIC Business & Marketing School. Javier Lopez Martinez3. Actually the work is on his third stage which is the data analysis by means of artificial neural network modeling method. Most expected neural network architecture. Factores críticos de éxito de emprendimiento” supported by ESIC Business and Marketing School (Spain). Spain . Results and conclusions This work is the initial steps of an ambitious project that pretend to evaluate the survival of start-up companies.254 Artificial Neural Networks – Architectures and Applications 3. Ignacio Soret Los Santos2. Once finished it is expected to have a very useful tool to predict the business success or fail‐ ure of start-up firms.

References

[1] de la Vega García Pastor, I. (2010). GEM Informe Ejecutivo 2009 España. Madrid: Instituto de Empresa, Madrid Emprende. Paper presented at Memoria de Viveros de Empresa de la Comunidad de Madrid, Madrid.

[2] Lacher, R. C., Coats, P. K., Sharma, S., & Fant, L. F. (1995). A neural network for classifying the financial health of a firm. European Journal of Operational Research, 85, 53-65.

[3] Atiya, A. (2001). Bankruptcy prediction for credit risk using neural networks: A survey and new results. IEEE Transactions on Neural Networks, 12, 929-935.

[4] Rubio Bañón, A., & Aragón Sánchez, A. (2002). Factores explicativos del éxito competitivo. Un estudio empírico en la pyme. Cuadernos de Gestión, 2(1), 49-63.

[5] Aragón Sánchez, A., & Rubio Bañón, A. (2005). Factores explicativos del éxito competitivo: el caso de las PYMES del estado de Veracruz. Contaduría y Administración, 216, 35-69.

[6] Woods, S. A., & Hampson, S. E. (2005). Measuring the Big Five with single items using a bipolar response scale. European Journal of Personality, 19, 373-390.

[7] Jo, H., Han, I., & Lee, H. (1997). Bankruptcy prediction using case-based reasoning, neural networks and discriminant analysis. Expert Systems With Applications, 13(2), 97-108.

[8] Yang, Z. R., Platt, M. B., & Platt, H. D. (1999). Probabilistic neural networks in bankruptcy prediction. Journal of Business Research, 44, 67-74.

[9] Hsiao, S., & Whang, T. (2009). A study of financial insolvency prediction model for life insurers. Expert Systems With Applications, 36, 6100-6107.

[10] Jalil, M. A., & Misas, M. (2007). Evaluación de pronósticos del tipo de cambio utilizando redes neuronales y funciones de pérdida asimétricas. Revista Colombiana de Estadística, 30, 143-161.

[11] Pérez, M., & Martín, Q. (2003). Aplicaciones de las redes neuronales a la estadística. Cuadernos de Estadística. Madrid: La Muralla S.A.

[12] Priore, P., De La Fuente, D., Pino, R., & Puente, J. (2002). Utilización de las redes neuronales en la toma de decisiones. Aplicación a un problema de secuenciación. Anales de Mecánica y Electricidad, 79(6), 28-34.

[13] Lin, & Tseng, C. (2000). Optimum design for artificial networks: an example in a bicycle derailleur system. Engineering Applications of Artificial Intelligence, 13, 3-14.

[14] Hornik, K. (1989). Multilayer Feedforward Networks are Universal Approximators. Neural Networks, 2, 359-366.

[15] Cheng, C. S. (1995). A multi-layer neural network model for detecting changes in the process mean. Computers & Industrial Engineering, 28, 51-61.

[16] Demuth, H., & Beale, M. (2002). Neural Network Toolbox User's Guide, version 4. Natick, MA 01760, USA: The MathWorks Inc.

[17] Krauss, G., Kindangen, J. I., & Depecker, P. (1997). Using artificial neural network to predict interior velocity coefficients. Building and Environment, 295-281.

[18] Peng, G., Chen, X., Wu, W., & Jiang, X. (2007). Modeling of water sorption isotherm for corn starch. Journal of Food Engineering, 80, 562-567.

[19] Hagan, M. T., Demuth, H. B., & Beale, M. (1996). Neural Network Design. Boston: PWS Publishing Co.

[20] Haykin, S. (1999). Neural Networks: A Comprehensive Foundation. 2nd edition. New Jersey: Prentice Hall.

[21] Isasi, P., & Galván, I. (2004). Redes Neuronales Artificiales, un enfoque práctico. Madrid: Pearson Educación.