Abstract: Artificial neural networks (ANN) have proven their efficiency in several applications: pattern recognition, voice recognition and classification problems. The training stage is very important for the ANN's performance. The selection of a neural network architecture suitable for solving a given problem is one of the most important aspects of neural network research. The choice of the number of hidden layers and the values of the weights has a large impact on the convergence of the training algorithm. In this paper we propose a mathematical formulation to determine the optimal number of hidden layers and good values of the weights. To solve this problem, we use genetic algorithms. Computational experiments are presented; the numerical results assess the effectiveness of the theoretical results shown in this paper and the advantages of the new modelling.
Key-Words: Artificial neural networks (ANN), Non-linear optimization, Genetic algorithms, Supervised training, Feed-forward neural network.
2. Artificial neural network architecture
An artificial neural network (ANN) is a computing paradigm designed to imitate the human brain and nervous system, which are primarily made up of neurons [4].

2.1 Multilayer network
The multilayer neural network (MLN) was first introduced by M. Minsky and S. Papert in 1969 [4]. It has one or more hidden layers of neurons between its input and output layers (Fig 1), with no connections between neurons in the same layer. Usually, each neuron is connected to all neurons in the next layer.

Fig 1: Neural network structure (a feed-forward network: input X, hidden layers h1, ..., hN, weight matrices W0, ..., WN, output Y).

Figure 1 shows a feed-forward neural network with N hidden layers of neurons. Note that each neuron of a given layer is connected to every neuron of the next layer.

Determining the parameters of the ANN has a large impact on its performance, so they must be chosen carefully. These parameters are the network architecture, the training algorithm and the activation function [7].

Definition
The term "network architecture" refers to the number of layers in the ANN and the number of neurons in each layer. In general, a network consists of an input layer, one or more hidden layers and one output layer. The numbers of neurons in the input layer and the output layer are determined by the numbers of input and output parameters, respectively. To find the optimal architecture, the number of neurons in the hidden layers has to be determined; this number is fixed during the training process by taking into consideration the convergence rate, the mapping accuracy, etc.

Artificial neural network modelling is essentially a black-box operation linking input data to output data using a particular set of nonlinear basis functions. An ANN consists of simple synchronous processing elements inspired by biological nervous systems; the basic unit of the ANN is the neuron. ANNs are trained using a large number of input data with corresponding output data (input/output pairs) obtained from actual measurements, so that a particular set of inputs produces, as nearly as possible, a specific set of target outputs. Many studies have shown the ability of MLNs trained in a supervised manner to solve complex and diverse problems.

2.2 Supervised learning
Training an ANN consists of adjusting the weight associated with each connection between neurons until the computed outputs for each set of input data are as close as possible to the experimental output data. In other words, the error between the computed output and the desired output is minimized, possibly to zero. The error minimization process requires a special circuit known as a supervisor; hence the name "supervised learning". It is well known that during the design and training of ANNs, factors such as the architecture of the ANN, the training algorithm and the transfer function need to be considered.

Supervised training is characterized by the presence of Q input/output pairs ((p1, d1), (p2, d2), ..., (pQ, dQ)), where each pi is a stimulus (input) and di the corresponding desired output [4].

Neurons of successive layers are interconnected: each neuron computes the weighted sum of its inputs and outputs this sum transformed by an activation function, for example the sigmoid function:

f(x) = 1 / (1 + e^(-x))
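As a small illustration, the sigmoid activation and a single neuron's weighted-sum computation can be sketched in Python (a minimal sketch; the function names are ours):

```python
import math

def sigmoid(x: float) -> float:
    """Sigmoid activation f(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron_output(weights, inputs):
    """One neuron: the weighted sum of its inputs passed through the activation."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)))
```

For instance, a zero weighted sum gives sigmoid(0) = 0.5, the midpoint of the activation's (0, 1) range.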
Each input vector is passed forward through the network and an output vector is calculated. During training, the network's output is compared to the actual observations, and an error term is created. This error term is called the cost function. The cost function is an important concept in training, as it measures how far we are from an optimal solution to the problem we want to solve. Training algorithms search through the solution space in order to find a function that has the smallest possible cost.

After training, the network reproduces the correct output not only for the points of the learning base, but also for other inputs; this is known as generalization. The learning problem can therefore be considered as an optimization problem, in which the goal is to identify and minimize the error between the calculated output and the desired output.

The multi-layered feed-forward network is usually trained with the back-propagation learning algorithm, which is based on the selection of a suitable error function whose values are determined by the actual and predicted outputs of the network.

2.3 Input layer
Suppose that B = {x1, x2, ..., xQ} is the training base. Each observation xi is presented at the input layer of the artificial neural network. In the training process, the observations are sequentially input to the learning system in an arbitrary order.

2.4 Hidden layers
Consider a set of hidden layers (h1, h2, ..., hN), and let n_i be the number of neurons in the hidden layer hi. The outputs h_i^j of the neurons in the hidden layers are computed as follows:

h_i^j = f( Σ_{k=1}^{n_{i-1}} w_{k,j}^{i-1} h_{i-1}^k ),  i = 2, ..., N and j = 1, ..., n_i

For the first hidden layer:

h_1^j = f( Σ_{k=1}^{n_0} w_{k,j}^0 x_k ),  j = 1, ..., n_1

where w_{k,j}^i is the weight between the neuron k in the hidden layer i and the neuron j in the hidden layer i+1, and n_i is the number of neurons in the ith hidden layer. The output of the ith hidden layer can be written as the vector:

h_i = t(h_i^1, h_i^2, ..., h_i^{n_i})

2.6 Output layer
The output layer collects the outputs from the last hidden layer hN. The neurons in the output layer are computed as follows:

y_j = f( Σ_{k=1}^{n_N} w_{k,j}^N h_N^k )

Y = (y_1, ..., y_j, ..., y_m) = F(W, X)

where w_{k,j}^N is the weight between the neuron k in the Nth hidden layer and the neuron j in the output layer, n_N is the number of neurons in the Nth hidden layer, Y = (y_1, y_2, ..., y_m) is the vector of the output layer, F is the transfer function and W is the matrix of weights, defined as follows:

W = [W^0, ..., W^i, ..., W^N],  where W^i = (w_{k,j}^i), 1 ≤ k ≤ n_i, 1 ≤ j ≤ n_{i+1}, i = 0, ..., N

or, equivalently, W = (w_{k,j}^i), 0 ≤ i ≤ N, 1 ≤ k ≤ n_i, 1 ≤ j ≤ n_{i+1}.

Remark
To simplify, we can take n = n_i for all hidden layers i = 1, ..., N. Here X is the input of the neural network, f is the activation function, W^i is the matrix of weights between the ith hidden layer and the (i+1)th hidden layer for i = 1, ..., N-1, W^0 is the matrix of weights between the input layer and the first hidden layer, and W^N is the matrix of weights between the Nth hidden layer and the output layer.
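The layer-by-layer computations of Section 2 can be sketched as a simple forward pass (a minimal sketch using plain Python lists; the function names and the weight layout W[k][j] are our own, following the indexing above):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(W, h_prev, f=sigmoid):
    """One layer: h_j = f( sum_k W[k][j] * h_prev[k] )."""
    n_out = len(W[0])
    return [f(sum(W[k][j] * h_prev[k] for k in range(len(h_prev))))
            for j in range(n_out)]

def forward(weights, x, f=sigmoid):
    """weights = [W0, ..., WN]; returns the output vector Y = F(W, X)."""
    h = x
    for W in weights:
        h = layer_forward(W, h, f)
    return h
```

With all weights zero, every neuron receives a zero weighted sum, so every output equals f(0) = 0.5 under the sigmoid.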
3. Optimization of Artificial Neural Architecture
In this section, general information about our approach is explained.

Definition
The problem of neural architecture optimization is to find the optimal number of hidden layers in the ANN, the number of neurons within each layer, and a good activation function, in order to maximize the performance of the artificial neural network [22].

Objective function:
The objective function of the mathematical programming model is the error between the calculated output and the desired output:

|| F(U, X, W) - d ||^2

Constraints:
The last constraints guarantee the existence of the hidden layers:

Σ_{i=1}^{N-1} u_i ≥ 1
u_i ∈ {0, 1},  i = 1, ..., N-1

where U = (u_1, ..., u_{N-1}) ∈ {0, 1}^{N-1} is the vector of decision variables and W = (w_{k,j}^i), 0 ≤ i ≤ N, 1 ≤ k ≤ n_i, 1 ≤ j ≤ n_{i+1}, also written W = [W^0, ..., W^i, ..., W^N], is the matrix of weights.

Remark
The optimal number of hidden layers is defined as follows:

n_opt = Σ_{i=1}^{N-1} u_i

Fig 5: Example for the neural architecture optimization (input and hidden layers connected through the weight matrices W0, W1, W2 up to h3).
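The decision vector U and the resulting architecture can be sketched as follows (a minimal sketch; the helper names are ours):

```python
def n_opt(u):
    """Optimal number of hidden layers: the sum of the binary decisions u_i."""
    return sum(u)

def active_layers(u):
    """Indices (1-based) of the hidden layers kept in the architecture."""
    return [i + 1 for i, ui in enumerate(u) if ui == 1]

def feasible(u):
    """Constraint: all u_i binary, and at least one hidden layer remains."""
    return all(ui in (0, 1) for ui in u) and sum(u) >= 1
```

For example, u = (0, 1) is feasible and keeps only the second hidden layer, while u = (0, 0) violates the existence constraint.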
Our approach begins with a maximal number of hidden layers. During the training process, all observations are sequentially input to the learning system in an arbitrary order. The number of hidden layers and the values of the weights are determined by the optimization model. In addition, the number of hidden layers must be decided simultaneously with the training phase (weight adjustment).

Example
Consider a multi-layer neural network with two hidden layers: in this example, the number of hidden layers is equal to 2 and the number of neurons in each hidden layer is equal to 3. The role of the variable u_i can be explained by the following example:

Size of the observations set     10
Dimension of desired space        2
Dimension of input                3
Max number of hidden layers       3
Activation function               f(x) = x
n_0, n_2, n_3                     3
n_1                               4
n_4                               2

The set of observations is generated randomly. The model corresponding to this example is defined as follows:

(P)  Min || F(U, W, X) - d ||^2
     s.t.:
       u_1 + u_2 ≥ 1
       w_{k,j}^i ∈ IR,  0 ≤ i ≤ 3, 1 ≤ k ≤ n_i, 1 ≤ j ≤ n_{i+1}
       u_1, u_2 ∈ {0, 1}

where W = (w_{k,j}^i), 0 ≤ i ≤ 3, 1 ≤ k ≤ n_i, 1 ≤ j ≤ n_{i+1}, is the matrix of weights. The transfer function is defined as:

F(W, U, X) = t( f(Σ_{k=1}^{3} w_{k,1}^3 h_3^k),  f(Σ_{k=1}^{3} w_{k,2}^3 h_3^k) )

Remark
h_3^i is the output of the ith neuron of the hidden layer h3; the outputs of the first hidden layer are computed, as in Section 2.4, by h_1^j = f(Σ_{k=1}^{3} w_{k,j}^0 x_k).

Remark
If the ith hidden layer is deleted, the (i-1)th hidden layer is connected to the (i+1)th one.

In this example, we have four cases:
Case 1: u_1 = 1 and u_2 = 1
Case 2: u_1 = 1 and u_2 = 0
Case 3: u_1 = 0 and u_2 = 1
Case 4: u_1 = 0 and u_2 = 0

Studying each case, we obtained the optimal solution for u_1 = 0 and u_2 = 1. Consequently, the first hidden layer is deleted, and the input is connected to the second hidden layer.

4. Genetic algorithms
Genetic algorithms (GA) are search methods inspired by the natural selection theory of Darwin. Each solution represents an individual, which is coded in one or several chromosomes; these chromosomes represent the problem's variables. First, an initial population composed of a fixed number of individuals is generated; then, reproduction operators are applied to a number of individuals selected according to their fitness. This procedure is repeated until the maximum number of iterations is reached. GAs have been applied to a large number of optimization problems in several domains [6] (telecommunications, routing, scheduling) and have proven efficient at obtaining good solutions [2]. We have formulated our problem as a non-linear program with mixed variables.

Genetic Algorithm framework:
1. Code the individuals
2. Generate the initial population
3. Repeat
     Evaluate the individuals
     Select individuals
     Crossover
     Mutation
     Reproduction
   Until the stopping criterion is attained
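The framework above can be sketched as a generic loop (a sketch with our own operator signatures; in the paper's setting the individuals would encode the weight matrix W and the vector u):

```python
import random

def genetic_algorithm(init, fitness, select, crossover, mutate,
                      pop_size=20, iterations=100):
    """Generic GA loop following the framework above (pop_size must be even)."""
    population = [init() for _ in range(pop_size)]
    for _ in range(iterations):
        # Evaluation: score every individual.
        scored = [(fitness(ind), ind) for ind in population]
        # Selection: pick pop_size parents according to their fitness.
        parents = select(scored, pop_size)
        # Crossover and mutation produce the next generation (reproduction).
        children = []
        for i in range(0, pop_size, 2):
            c1, c2 = crossover(parents[i], parents[i + 1])
            children.extend([mutate(c1), mutate(c2)])
        population = children
    return max(population, key=fitness)
```

The concrete selection, crossover and mutation operators are plugged in as arguments, matching the steps of the framework.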
Coding
In our application, we have encoded each individual with two chromosomes: the first represents the matrix of weights W, and the second represents the vector u, an array of decision variables that take the value 1 if the corresponding hidden layer is activated and 0 otherwise.

Example: the first chromosome stores the weights w_{k,j}^i (the weight between the kth neuron in layer i and the jth neuron in layer i+1), and the second stores a binary vector such as u = (1 1 0 0 1 0 0 0 1).

Initial Population
The individuals of the initial population are randomly generated: the variables of the vector u take the value 0 or 1, and the matrix of weights takes random values in IR. There are other ways to generate the initial population, such as applying other heuristics; this is useful when the search space is constrained, which is not our case.

Evaluating individuals
In this step, each individual is assigned a numerical value called fitness, which corresponds to its performance; it depends essentially on the value of the objective function at this individual. An individual with a high fitness is one that is well adapted to the problem. The fitness suggested in our work is the following function:

Fitness(i) = M - objective(i)

where M >> 0. Minimizing the value of the objective function objective(i) is thus equivalent to maximizing the value of the fitness function.

Selection
The selection method used in this paper is Roulette Wheel Selection (RWS), a proportional selection method in which individuals are selected according to their fitness. The principle of RWS can be summarized in the following scheme: the individuals I1, ..., I10 occupy segments of [0, 1] with lengths proportional to f_i / Σ f_i, and a number is chosen randomly in [0, 1].

Crossover
The crossover is very important in the algorithm: in this step, new individuals, called children, are created from individuals selected from the population, called parents. The children are constructed as follows: we fix a crossover point, the parents are cut at this point, the first part of parent 1 and the second part of parent 2 go to child 1, and the remaining parts go to child 2. In the crossover we adopted, we choose two different crossover points: the first for the matrix of weights and the second for the vector u.
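Roulette wheel selection as described above can be sketched as follows (a minimal sketch; the function name is ours, and fitness values are assumed non-negative):

```python
import random

def roulette_wheel_select(population, fitnesses):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitnesses)
    r = random.uniform(0.0, total)   # the number chosen randomly on the wheel
    cumulative = 0.0
    for individual, fit in zip(population, fitnesses):
        cumulative += fit
        if r <= cumulative:
            return individual
    return population[-1]            # guard against rounding at the boundary
```

An individual occupying a larger share of the total fitness covers a larger segment of the wheel and is therefore picked more often.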
Fig 9: Crossover (parents 1 and 2 are cut at one crossover point for the vector u and another for the matrix of weights; the exchanged parts form children 1 and 2).

Mutation
The role of mutation is to keep the diversity of solutions in order to avoid local optima. It consists of changing one (or several) value(s) of individuals chosen randomly.

Fig 9: Mutation (an individual before and after mutation).

Algorithm
The scheme of the algorithm is depicted in the following figure.

Figure: scheme of the algorithm (Coding → Initial population, built from W and the training base → Evaluating individuals → Selection → Crossover → Mutation → Evaluation → Replacement → Stop).
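The two-point crossover (one cut for the weights, one for the vector u) and the mutation operator can be sketched as follows (a simplified sketch: individuals are modelled as pairs (w, u) of flat lists, and the Gaussian perturbation of the weights is our own choice):

```python
import random

def crossover(parent1, parent2, cut_w, cut_u):
    """One crossover point for the weights, another for the vector u."""
    w1, u1 = parent1
    w2, u2 = parent2
    child1 = (w1[:cut_w] + w2[cut_w:], u1[:cut_u] + u2[cut_u:])
    child2 = (w2[:cut_w] + w1[cut_w:], u2[:cut_u] + u1[cut_u:])
    return child1, child2

def mutate(individual, rate=0.1):
    """Randomly perturb some weights and flip some bits of u to keep diversity."""
    w, u = individual
    w = [wi + random.gauss(0.0, 0.5) if random.random() < rate else wi
         for wi in w]
    u = [1 - ui if random.random() < rate else ui for ui in u]
    return (w, u)
```

Cutting the two chromosomes at independent points keeps the weight matrix and the architecture vector recombining separately, as described above.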
5. Experimental Results
In this section, we provide numerical examples to illustrate the potential applications of the proposed methodology. The multi-layer network architecture described above is used in this paper. The activation function used in this architecture is given as:

f(x) = (1 + exp(-x))^(-1)

The error is measured by the mean absolute percentage error:

MAPE = (1/Q) Σ_{i=1}^{Q} ( |Y_i - d_i| / |Y_i| ) × 100

where Y_i denotes the actual (calculated) value, d_i the predicted value, the ith observation is counted from 1 to Q, and Q is the total number of observations.

The experimental settings for the neural architecture optimization, which describe some of the properties of the data used, are shown in Table 1.
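The MAPE above can be computed directly (a minimal sketch; the function name is ours):

```python
def mape(actual, predicted):
    """Mean absolute percentage error between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    q = len(actual)
    return 100.0 / q * sum(abs(y - d) / abs(y)
                           for y, d in zip(actual, predicted))
```

For example, predictions that are each 10% off their actual values give a MAPE of 10.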
Fig 13: Optimal solution for 10000 iterations (values of N, n_opt and RMSE).

Acknowledgements
The authors would like to express their sincere thanks to the referee for his/her helpful suggestions.

References
[1] M. K. Apalak, M. Yildirim, R. Ekici, 'Layer optimization for maximum fundamental frequency of laminated composite plates for different edge conditions', Composites Science and Technology (68) 537-550, 2008.
[2] T. Bouktir, L. Slimani, 'Optimal power flow of the Algerian Electrical Network using genetic algorithms', WSEAS Transactions on Circuits and Systems, Vol. 3, Issue 6, 2004, pp. 1478-1482.
[3] I. Cabar, S. Yavuz, O. Erol, 'Robot Mapping and Map Optimization Using Genetic Algorithms and Artificial Neural Networks', WSEAS Transactions on Computers, Issue 7, Volume 7, July 2008.
[4] A. Cichoki, R. Unberhuen, 'Neural Networks for Optimization and Signal Processing', John Wiley & Sons, New York, 1993.
[5] Y. Chang, C. Low, 'Optimization of a passive harmonic filter based on the neural-genetic algorithm with fuzzy logic for a steel manufacturing plant', Expert Systems with Applications, 2059-2070, 2008.
[6] J. Dréo, A. Pétrowski, P. Siarry, É. Taillard, 'Méta-heuristiques pour l'optimisation difficile', Éditions Eyrolles, 2003.
[7] G. Dreyfus, J. Martinez, M. Samuelides, M. B. Gordon, F. Badran, S. Thiria, L. Hérault, 'Réseaux de neurones. Méthodologie et applications', Éditions Eyrolles, 2e édition, 2003.
[8] E. Egriogglu, C. Hakan Aladag, S. Gunay, 'A new model selection strategy in artificial neural networks', Applied Mathematics and Computation (195) 591-597, 2008.
[9] M. Ettaouil, 'Contribution à l'étude des problèmes de satisfaction de contraintes et à la programmation quadratique en nombres entiers, allocation statique de tâches dans les systèmes distribués', thèse d'état, Université Sidi Mohammed Ben Abdellah, F.S.T. de Fès, 1999.
[10] M. Ettaouil, Y. Ghanou, 'Réseaux de neurones artificiels pour la compression et la filtration de l'image médicale', Congrès de la Société Marocaine de Mathématiques Appliquées, proceedings pages 148-150, Rabat, 2008.
[11] M. Ettaouil, Y. Ghanou, 'Méta-heuristiques et optimisation des architectures neuronales', JIMD'2 proceedings, ENSIAS, Rabat, 2008.
[12] M. Ettaouil, C. Loqman, 'Constraint satisfaction problem solved by semi definite relaxation', WSEAS Transactions on Computers, Issue 7, Volume 7, 951-961, 2008.
[13] J. Holland, 'Adaptation in Natural and Artificial Systems', Ann Arbor, MI: University of Michigan Press, 1992.
[14] J. J. Valdes, A. J. Barton, 'Multi-objective evolutionary optimization for constructing neural networks for virtual reality visual data mining: Application to geophysical prospecting', Neural Networks 20 (2007) 498-508.
[15] T. Kohonen, 'Self-Organizing Maps', Springer, 3rd edition, 2001.
[16] T. Y. Kwok, D. K. Yeung, 'Constructive algorithms for structure learning in feed-forward neural networks for regression problems', IEEE Trans. Neural Networks 8 (1997) 630-645.
[17] C.-C. Lin, 'Implementation Feasibility of Convex Recursive Deletion Regions Using Multi-Layer Perceptrons', WSEAS Transactions on Computers, Issue 1, Volume 7, January 2008.
[18] H. Peng, X. Ling, 'Optimal design approach for the plate-fin heat exchangers using neural networks cooperated with genetic algorithms', Applied Thermal Engineering 28 (2008) 642-650.
[19] V. Joseph Raj, 'Better Learning of Supervised Neural Networks Based on Functional Graph: An Experimental Approach', WSEAS Transactions on Computers, Issue 8, Volume 7, August 2008.
[20] P. M. Talavan, J. Yanez, 'A continuous Hopfield network equilibrium points algorithm', Computers & Operations Research 32 (2005) 2179-2196.
[21] Z. Tang, C. Almeida, P. A. Fishwick, 'Time series forecasting using neural networks vs. Box-Jenkins methodology', Simulation 57 (1991) 303-310.
[22] D. Wang, 'Fast Constructive-Covering Algorithm for neural networks and its implement in classification', Applied Soft Computing 8 (2008) 166-173.
[23] D. Wang, N. S. Chaudhari, 'A constructive unsupervised learning algorithm for Boolean neural networks based on multi-level geometrical expansion', Neurocomputing 57C (2004) 455-461.