
Neural architectures optimization and Genetic algorithms


MOHAMED ETTAOUIL and YOUSSEF GHANOU
UFR : Scientific calculation and Computing, Engineering sciences
Department of Mathematics and Computer science
Faculty of Science and Technology of Fez
Box 2202, University Sidi Mohammed ben Abdellah Fez
MOROCCO
mohamedettaouil@yahoo.fr yghanou2000@yahoo.fr

Abstract: Artificial neural networks (ANN) have proven their efficiency in several applications: pattern
recognition, voice and classification problems. The training stage is very important for the ANN's performance.
The selection of a neural network architecture suitable for solving a given problem is one of the most
important aspects of neural network research. The choice of the number of hidden layers and of the values of the weights
has a large impact on the convergence of the training algorithm. In this paper we propose a mathematical
formulation to determine the optimal number of hidden layers and good values of the weights. To solve
this problem, we use genetic algorithms. Computational experiments are presented; the numerical results assess the
effectiveness of the theoretical results shown in this paper and the advantages of the new modelling.

Key-Words: Artificial neural networks (ANN), Non-linear optimization, Genetic algorithms, Supervised
Training, Feed forward neural network.

1. Introduction

In recent years, neural networks have attracted considerable attention as they proved to be essential in applications such as content-addressable memory, pattern recognition and optimization [10], [7]. Learning or training of an ANN is equivalent to finding the values of all weights such that the desired output is generated for the corresponding input; it can be viewed as the minimization of an error function computed from the difference between the output of the network and the desired output over a set of training observations [3].

In most instances, multilayer neural networks trained with the back-propagation algorithm have been used. The major shortcoming of this approach is that the knowledge contained in the trained networks is difficult to interpret [15], [3]. The error back-propagation algorithm for neural networks is based on a gradient descent technique and suffers from the cost of iterative computing during training. We can instead use evolutionary algorithms, which perform a global search [6]. A global search may avoid convergence to a non-optimal solution and determine the optimum number of hidden layers of the ANN. Recently, some studies of the architecture optimization problem have been introduced [8] in order to determine neural network parameters, but not optimally.

Traditional algorithms fix the neural network architecture before learning [19]; other studies propose constructive learning [22], [23], which begins with a minimal structure: these researchers initialise the hidden layer with a minimal number of hidden neurons. Most researchers treat the construction of the neural architecture (structure) without finding the optimal neural architecture [15].

In this paper, we propose a new modelling of this problem as a mathematical program. We apply genetic algorithms to find the optimal number of hidden layers, the activation function and the matrix of weights. We formulate this problem as a non-linear program with mixed constraints, and a genetic algorithm is proposed to solve it. Consequently we determine the optimal architecture and we can adjust the matrix of weights (learning or training the artificial neural network). In our approach, we assign to each hidden layer a binary variable which takes the value 1 if the layer is activated and 0 otherwise.

This paper is organized as follows: Section 2 describes artificial neural networks. In Section 3, we present the problem of neural architecture optimization and propose a new modelling. Section 4 shows how we apply a simple genetic algorithm to solve this problem. Section 5 contains illustrative simulation results.


2. Artificial neural network architecture

An artificial neural network (ANN) is a computing paradigm designed to imitate the human brain and nervous system, which are primarily made up of neurons [4].

2.1. Multilayer network
The multilayer neural network (MLN) was first introduced by M. Minsky and S. Papert in 1969 [4]. It has one or more hidden neuron layers between its input and output layers (Fig 1), with no connections between neurons in the same layer. Usually, each neuron is connected to all neurons in the next layer.

[Fig 1: Neural network structure. The input X feeds the hidden layers h1, ..., hN through the weight matrices W0, ..., W(N-1); WN connects hN to the output Y.]

Figure 1 shows a feed-forward neural network with N hidden layers of neurons. Note that each neuron of a given layer is connected to each neuron of the next layer.
Determining the parameters of the ANN has a large impact on its performance, so it must be considered carefully. The parameters of the ANN are the network architecture, the training algorithm and the activation function [7].

Definitions
The term "network architecture" refers to the number of layers in the ANN and the number of neurons in each layer. In general, it consists of an input layer, one or more hidden layers and one output layer. The numbers of neurons in the input layer and the output layer are determined by the numbers of input and output parameters, respectively. In order to find the optimal architecture, the number of neurons in the hidden layers has to be determined (this number is determined from the ANN during the training process by taking into consideration the convergence rate, mapping accuracy, etc.).
Artificial neural network modelling is essentially a black-box operation linking input data to output data using a particular set of nonlinear basis functions. An ANN consists of simple synchronous processing elements, inspired by biological nervous systems; the basic unit of the ANN is the neuron. ANNs are trained using a large number of input data with corresponding output data (input/output pairs) obtained from actual measurements, so that a particular set of inputs produces, as nearly as possible, a specific set of target outputs. MLNs are trained in a supervised manner, and many studies have shown their ability to solve complex and diverse problems.

2.2. Supervised learning
Training an ANN consists of adjusting the weight associated with each connection between neurons until the computed outputs for each set of data inputs are as close as possible to the experimental data outputs. In other words, the error between the output and the desired output is minimized, possibly to zero. The error minimization process requires a special circuit known as a supervisor; hence the name 'supervised learning'. It is well known that during the design and training of ANNs, factors such as the architecture of the ANN, the training algorithm and the transfer function need to be considered.
Supervised training is characterized by the presence of Q input/output pairs ((p1, d1), (p2, d2), ..., (pQ, dQ)), that is, a stimulus p_i (input) and a target d_i for the corresponding desired output [4].
Neurons of successive layers are interconnected; each neuron computes the weighted sum of its inputs and outputs this amount modified by an activation function, for example the sigmoid function:

f(x) = 1 / (1 + e^{-x})
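To make the neuron computation concrete, a minimal Python sketch is given below; it is an illustration only (the array values and names are assumptions, not the authors' code), showing the weighted sum of a single neuron passed through the sigmoid activation defined above.

```python
import numpy as np

def sigmoid(x):
    # The activation function used above: f(x) = 1 / (1 + e^(-x))
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative values: one neuron receiving 3 inputs.
x = np.array([0.5, -1.2, 0.3])    # stimulus presented at the input layer
w = np.array([0.8, 0.1, -0.4])    # weights of the connections reaching this neuron

# Weighted sum of the inputs, modified by the activation function.
output = sigmoid(np.dot(w, x))
print(output)
```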


Each input vector is passed forward through the network and an output vector is calculated. During training, the network's output is compared to the actual observations, and an error term is created. This error term is called the cost function. The cost function is an important concept in training, as it is a measure of how far away we are from an optimal solution to the problem that we want to solve. Training algorithms search through the solution space in order to find a function that has the smallest possible cost.
After training, the network reproduces the good output not only for the points of the learning base, but also for other inputs; this is known as generalization.
The learning problem can be considered as an optimization problem, where the aim is to identify and minimize the error between the calculated output and the desired output. The multi-layered feed-forward network is usually trained with the back-propagation learning algorithm, which is based on the selection of a suitable error function whose values are determined by the actual and predicted outputs of the network.

2.3 Input layer
We suppose that B = {x_1, x_2, ..., x_Q} is the training base. Each observation x_i is presented at the input layer of the artificial neural network. In the training process, observations are sequentially input to the learning system in an arbitrary order.

2.4 Hidden layers
Consider a set of hidden layers (h_1, h_2, ..., h_N), and let n_i be the number of neurons of the hidden layer h_i.
The outputs h_i^j of the neurons in the hidden layers are computed as follows:

h_i^j = f( \sum_{k=1}^{n_{i-1}} w_{k,j}^{i-1} h_{i-1}^k ),  i = 2, ..., N and j = 1, ..., n_i

For the first hidden layer it is given as follows:

h_1^j = f( \sum_{k=1}^{n_0} w_{k,j}^0 x_k ),  j = 1, ..., n_1

where w_{k,j}^i is the weight between the neuron k in the hidden layer i and the neuron j in the hidden layer i+1, and n_i is the number of neurons in the ith hidden layer. The output of the ith hidden layer can be formulated as follows:

h_i = {}^t(h_i^1, h_i^2, ..., h_i^{n_i})

2.6 Output layer
The output layer has neurons that collect the outputs from the hidden layer h_N. The neurons in the output layer are computed as follows:

y_j = f( \sum_{k=1}^{n_N} w_{k,j}^N h_N^k )

Y = (y_1, ..., y_j, ..., y_m) = F(W, X)

where w_{k,j}^N is the weight between the neuron k in the Nth hidden layer and the neuron j in the output layer, n_N is the number of neurons in the Nth hidden layer, Y = (y_1, y_2, ..., y_m) is the vector of the output layer, F is the transfer function and W is the matrix of weights, defined as follows:

W = [W^0, ..., W^i, ..., W^N],  where W^i = (w_{k,j}^i), 1 <= k <= n_i, 1 <= j <= n_{i+1}, i = 0, ..., N,

that is W = (w_{k,j}^i), 0 <= i <= N, 1 <= k <= n_i, 1 <= j <= n_{i+1}.

Here X is the input of the neural network, f is the activation function, W^i is the matrix of weights between the ith hidden layer and the (i+1)th hidden layer for i = 1, ..., N-1, W^0 is the matrix of weights between the input layer and the first hidden layer, and W^N is the matrix of weights between the Nth hidden layer and the output layer.

Remark
To simplify, we can take n = n_i for all i = 1, ..., N, i.e. the same number of neurons in every hidden layer.
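Read as code, the layer equations above amount to the short sketch below (an illustration under assumed names and layer sizes, not the authors' program): W[i] stores the weights w_{k,j}^i between layer i and layer i+1, and the input X is propagated through the hidden layers up to the output Y = F(W, X).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(W, x, f=sigmoid):
    """Feed-forward pass: W = [W0, ..., WN], W[i] of shape (n_i, n_{i+1}),
    so the jth neuron of the next layer computes f(sum_k W[i][k, j] * h[k])."""
    h = x
    for Wi in W:
        h = f(h @ Wi)          # weighted sums of the current layer, then activation
    return h                   # output vector Y = (y_1, ..., y_m)

# Illustrative architecture: 3 inputs, hidden layers of 4 and 3 neurons, 2 outputs.
rng = np.random.default_rng(0)
sizes = [3, 4, 3, 2]
W = [rng.normal(size=(sizes[i], sizes[i + 1])) for i in range(len(sizes) - 1)]
X = np.array([0.2, -0.7, 1.0])
print(forward(W, X))
```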
3. Optimization of the artificial neural architecture
In this section, general information about our approach is explained.

Definition
The problem of neural architecture optimization is to find the optimal number of hidden layers in the ANN, the number of neurons within each layer and a good activation function, in order to maximize the performance of the artificial neural network [22].


3.1 Problem formulation
Good performance is obtained by minimizing the error (distance) between the desired output and the calculated output. Determining the parameters has an impact on the performance of the ANN.
The training is done in a supervised way: for every input vector, the desired output is given and the weights are adapted so as to minimize an error function that measures the discrepancy between the desired output and the output computed by the network. The mean squared error (MSE) is employed as the error function.
In this article, a new optimization model is introduced in order:
• To optimize the artificial neural architecture,
• To adjust the matrix of weights (learning).
In this new proposed strategy, we model the problem of neural architecture optimization as a non-linear constrained program with mixed variables. We apply a genetic algorithm to solve it.

3.2 Modelling of the neural architecture optimization problem
To model the problem of neural architecture optimization, we need to define some parameters as follows.
Notation:
• N : the maximum number of hidden layers.
• n_0 : the number of neurons of the input layer.
• n_{N+1} : the number of neurons of the output layer.
• n_i : the number of neurons of the ith hidden layer, i = 1, ..., N.
• n_opt : the optimal number of hidden layers.
• X : the input of the ANN.
• h_i : the output of the ith hidden layer, i = 1, ..., N.
• Y : the calculated (actual) output of the ANN.
• d : the desired output of the ANN.
• f : the activation function of all the neurons.
• F : the transfer function of the artificial neural network.
• u_i : the binary variable for i = 1, ..., N-1, with U = (u_1, ..., u_i, ..., u_{N-1}).

The output of the neural network is computed by the expression:

F(U, W, X) = Y = (y_1, y_2, ..., y_{n_{N+1}})

The output of the first hidden layer is defined by the following expression:

h_1 = {}^t(h_1^1, ..., h_1^j, ..., h_1^{n_1}) = {}^t( f(\sum_{k=1}^{n_0} w_{k,1}^0 x_k), ..., f(\sum_{k=1}^{n_0} w_{k,j}^0 x_k), ..., f(\sum_{k=1}^{n_0} w_{k,n_1}^0 x_k) )

[Fig 2: The output of the first hidden layer. Each neuron h_1^j of hidden layer 1 receives the inputs x_1, ..., x_{n_0} of the input layer (layer 0) through the weights w_{k,j}^0.]

where (x_1, x_2, ..., x_{n_0}) is the input of the neural network.
The following expression gives the output calculated by the hidden layer i:

h_i = (1 - u_{i-1}) h_{i-1} + u_{i-1} {}^t( f(\sum_{k=1}^{n_{i-1}} w_{k,1}^{i-1} h_{i-1}^k), ..., f(\sum_{k=1}^{n_{i-1}} w_{k,j}^{i-1} h_{i-1}^k), ..., f(\sum_{k=1}^{n_{i-1}} w_{k,n_i}^{i-1} h_{i-1}^k) )

where i = 2, ..., N-1.
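The role of the binary variables can be read from the gated expression above: when u_{i-1} = 0 the ith hidden layer is bypassed (h_i = h_{i-1}), and when u_{i-1} = 1 the layer is applied as usual. The sketch below is an illustration of this computation of F(U, W, X) (the function names, sizes and random weights are assumptions); as in the Remark of Section 2, all hidden layers are taken with the same number n of neurons so that the bypass is dimensionally consistent.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_forward(U, W, x, f=sigmoid):
    """Compute F(U, W, X).
    W = [W0, ..., WN]: W0 maps the input to h_1, W[i] maps h_i to h_{i+1},
    WN maps h_N to the output.  U = (u_1, ..., u_{N-1}) are the binary variables:
    u_{i-1} = 0 gives h_i = h_{i-1} (that layer is removed from the active path)."""
    h = f(x @ W[0])                         # first hidden layer h_1
    for i, u in enumerate(U, start=1):      # hidden layers h_2, ..., h_N
        h = (1 - u) * h + u * f(h @ W[i])
    return f(h @ W[-1])                     # output layer

# Illustrative sizes: 3 inputs, N = 4 hidden layers of n = 5 neurons, 2 outputs.
rng = np.random.default_rng(1)
n_in, n, N, n_out = 3, 5, 4, 2
W = ([rng.normal(size=(n_in, n))]
     + [rng.normal(size=(n, n)) for _ in range(N - 1)]
     + [rng.normal(size=(n, n_out))])
U = np.array([1, 0, 1])                     # u_2 = 0: the third hidden layer is bypassed
print(gated_forward(U, W, np.array([0.1, -0.5, 0.9])))
```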


[Fig 3: The output of the ith hidden layer. The neurons h_{i-1}^k of hidden layer i-1 feed the neurons h_i^j of hidden layer i through the weights w_{k,j}^{i-1}.]

The last hidden layer is calculated as follows:

h_N = (1 - u_{N-1}) h_{N-1} + u_{N-1} {}^t( f(\sum_{k=1}^{n_{N-1}} w_{k,1}^{N-1} h_{N-1}^k), ..., f(\sum_{k=1}^{n_{N-1}} w_{k,j}^{N-1} h_{N-1}^k), ..., f(\sum_{k=1}^{n_{N-1}} w_{k,n_N}^{N-1} h_{N-1}^k) )

h_N = (h_N^1, ..., h_N^k, ..., h_N^{n_N})

The output of the neural network is calculated as follows:

y_j = f( \sum_{k=1}^{n_N} w_{k,j}^N h_N^k ),  j = 1, ..., n_{N+1}

[Fig 4: The output of the ANN. The neurons h_N^k of hidden layer N feed the outputs y_1, ..., y_{n_{N+1}} of the output layer (layer N+1) through the weights w_{k,j}^N.]

Objective function:
The objective function of the mathematical programming model is the error between the calculated output and the desired output:

|| F(U, X, W) - d ||^2

Constraints:
The following constraint guarantees the existence of the hidden layers:

\sum_{i=1,...,N-1} u_i >= 1

Indeed, we know that we will always have the output layer and at least one hidden layer, so we impose this constraint. The weights are real-valued:

W = (w_{k,j}^i), 0 <= i <= N, 1 <= k <= n_i, 1 <= j <= n_{i+1},  w_{k,j}^i \in IR,  i = 0, ..., N

If u_i = 0, then the ith hidden layer is deleted, and the output of the (i+1)th hidden layer is computed using the neurons of the (i-1)th hidden layer; we then have h_i = h_{i-1}.
Otherwise, if u_i = 1, then the ith layer must be retained and we have h_i = f(W^{i-1} h_{i-1}).

The non-linear programming model
The neural architecture optimization problem can be formulated as the following model:

(P)  Min || F(U, W, X) - d ||^2
     s.c.:
       \sum_{i=1}^{N-1} u_i >= 1
       W = (w_{k,j}^i), 0 <= i <= N, 1 <= k <= n_i, 1 <= j <= n_{i+1},  w_{k,j}^i \in IR
       u_i \in {0, 1},  i = 1, ..., N-1

where U = (u_1, ..., u_{N-1}) \in {0, 1}^{N-1}, W = (w_{k,j}^i), 0 <= i <= N, 1 <= k <= n_i, 1 <= j <= n_{i+1}, is the matrix of the weights, N is the maximum number of hidden layers, and n_i is the number of neurons of the ith hidden layer.
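A candidate solution (U, W) of (P) can be scored directly from these definitions. The following sketch is illustrative only (it reuses the gated_forward sketch above, and the names and the random training base are assumptions): it sums the squared error ||F(U, W, x) - d||^2 over the observations and checks the constraints of (P).

```python
import numpy as np

def objective(U, W, inputs, targets, forward):
    """Sum over the training base of ||F(U, W, x_q) - d_q||^2."""
    return sum(np.sum((forward(U, W, x) - d) ** 2)
               for x, d in zip(inputs, targets))

def feasible(U):
    """Constraints of (P) on U: binary components and at least one active layer."""
    U = np.asarray(U)
    return bool(np.all(np.isin(U, (0, 1))) and U.sum() >= 1)

# Illustrative use, with gated_forward, W and U as in the previous sketch:
# rng = np.random.default_rng(2)
# inputs  = rng.normal(size=(10, 3))    # Q = 10 observations, generated randomly
# targets = rng.normal(size=(10, 2))    # corresponding desired outputs d
# if feasible(U):
#     print(objective(U, W, inputs, targets, gated_forward))
```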


The decision variables of the search can thus be summed up in the following vector and matrix:

U = [u_1, ..., u_i, ..., u_{N-1}]

W = [W^0, ..., W^i, ..., W^N],  where W^i = (w_{k,j}^i), 1 <= k <= n_i, 1 <= j <= n_{i+1}, i = 0, ..., N

Remark
The optimal number of hidden layers is defined as follows:

n_opt = \sum_{i=1}^{N-1} u_i

Our approach begins with a maximal number of hidden layers. During the training process, all observations are sequentially input to the learning system in an arbitrary order. The number of hidden layers and the values of the weights are both produced by the optimization model: the number of hidden layers is decided simultaneously with the training phase (weight adjustment).

Example
Consider a multi-layer neural network with at most three hidden layers; the role of the variables u_1 and u_2 can be explained on this example, whose settings are the following:

Size of the observations set: 10
Dimension of desired space: 2
Dimension of input: 3
Number max of hidden layers: 3
Activation function: f(x) = x
n_0, n_2, n_3: 3
n_1: 4
n_4: 2

The set of observations is generated randomly.

[Fig 5: Example for the neural architecture optimization. The input feeds hidden layers h1, h2 and h3 through W0, W1 and W2; W3 connects h3 to the output.]

The model corresponding to this example is defined as follows:

(P)  Min || F(U, W, X) - d ||^2
     s.c.:
       u_1 + u_2 >= 1
       W = (w_{k,j}^i), 0 <= i <= 3, 1 <= k <= n_i, 1 <= j <= n_{i+1},  w_{k,j}^i \in IR
       u_1, u_2 \in {0, 1}

where W = (w_{k,j}^i), 0 <= i <= 3, 1 <= k <= n_i, 1 <= j <= n_{i+1}, i = 0, ..., 3, is the matrix of the weights.
The transfer function is defined as:

F(W, U, X) = {}^t( f(\sum_{k=1}^{3} w_{k,1}^3 h_3^k), f(\sum_{k=1}^{3} w_{k,2}^3 h_3^k) )

where h_3^i is the output of the ith neuron of the third hidden layer.
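Since only u_1 and u_2 are binary here, the admissible architectures of this small example can be enumerated exhaustively and compared on a randomly generated observation set; the four cases studied below correspond exactly to this enumeration. The sketch is purely illustrative (random weights and data, and a common hidden-layer size of 3 is assumed, matching the sums over k = 1, ..., 3 written in the expressions that follow); it is not the authors' program.

```python
import numpy as np
from itertools import product

def identity(x):
    return x                                 # the example's activation f(x) = x

def gated_forward(U, W, x, f=identity):
    h = f(x @ W[0])                          # h_1
    for i, u in enumerate(U, start=1):       # h_2 and h_3, gated by u_1 and u_2
        h = (1 - u) * h + u * f(h @ W[i])
    return f(h @ W[-1])                      # output of dimension 2

# Assumed sizes: input of dimension 3, three hidden layers of 3 neurons, output of dimension 2.
rng = np.random.default_rng(3)
sizes = [3, 3, 3, 3, 2]
W = [rng.normal(size=(sizes[i], sizes[i + 1])) for i in range(len(sizes) - 1)]
X = rng.normal(size=(10, 3))                 # 10 observations, generated randomly
D = rng.normal(size=(10, 2))                 # desired outputs

best = None
for U in product((0, 1), repeat=2):          # the four cases for (u_1, u_2)
    if sum(U) < 1:                           # constraint u_1 + u_2 >= 1
        continue
    err = sum(np.sum((gated_forward(U, W, x) - d) ** 2) for x, d in zip(X, D))
    print(U, err)
    if best is None or err < best[1]:
        best = (U, err)
print("best configuration:", best)
```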

The outputs of the hidden layers in this example, with f(x) = x, are:

h_3 = {}^t(h_3^1, h_3^2, h_3^3) = (1 - u_2) h_2 + u_2 {}^t( f(\sum_{k=1}^{3} w_{k,1}^2 h_2^k), f(\sum_{k=1}^{3} w_{k,2}^2 h_2^k), f(\sum_{k=1}^{3} w_{k,3}^2 h_2^k) )

h_2 = {}^t(h_2^1, h_2^2, h_2^3) = (1 - u_1) h_1 + u_1 {}^t( f(\sum_{k=1}^{3} w_{k,1}^1 h_1^k), f(\sum_{k=1}^{3} w_{k,2}^1 h_1^k), f(\sum_{k=1}^{3} w_{k,3}^1 h_1^k) )

h_1 = {}^t(h_1^1, h_1^2, h_1^3) = {}^t( f(\sum_{k=1}^{3} w_{k,1}^0 x_k), f(\sum_{k=1}^{3} w_{k,2}^0 x_k), f(\sum_{k=1}^{3} w_{k,3}^0 x_k) )

Remark
If the ith hidden layer is deleted, the (i-1)th hidden layer is connected to the (i+1)th one.

In this example, we have four cases:
Case 1: u_1 = 1 and u_2 = 1
Case 2: u_1 = 1 and u_2 = 0
Case 3: u_1 = 0 and u_2 = 1
Case 4: u_1 = 0 and u_2 = 0

Studying each case, we obtain the optimal solution for u_1 = 0 and u_2 = 1. Consequently, the first hidden layer is deleted and the input is connected to the second hidden layer. The optimal architecture is defined as follows:

[Fig 6: Example of the optimized neural architecture. The input is connected through W0 to hidden layer h2, then through W2 to hidden layer h3 and through W3 to the output.]

4. Solving the obtained non-linear programming model
A genetic algorithm is proposed to solve this problem. Genetic algorithms belong to a class of probabilistic methods called "evolutionary algorithms", based on the principles of selection and mutation. The GA was introduced by J. Holland [13] and is based on Darwin's theory of natural evolution. Each solution represents an individual, which is coded in one or several chromosomes; these chromosomes represent the problem's variables. First, an initial population composed of a fixed number of individuals is generated; then, reproduction operators are applied to a number of individuals selected according to their fitness. This procedure is repeated until the maximum number of iterations is attained. GAs have been applied to a large number of optimization problems in several domains [6] (telecommunications, routing, scheduling) and have proven their efficiency in obtaining good solutions [2]. We have formulated the problem as a non-linear program with mixed variables.

Genetic Algorithm's framework:
1. Coding individuals
2. Generate the initial population
3. Repeat
   Evaluation of individuals
   Selection of individuals
   Crossover
   Mutation
   Reproduction
Until the stopping criterion is attained
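The framework can be read as the following skeleton (an illustrative sketch under assumed function names and parameters, not the authors' implementation); the operators passed in as arguments are the ones detailed in the next paragraphs.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_individual(sizes, N):
    """One individual = (weights chromosome W, architecture chromosome u)."""
    W = [rng.normal(size=(sizes[i], sizes[i + 1])) for i in range(len(sizes) - 1)]
    u = rng.integers(0, 2, size=N - 1)
    if u.sum() == 0:                  # keep the constraint sum(u_i) >= 1 satisfied
        u[rng.integers(N - 1)] = 1
    return W, u

def genetic_algorithm(pop_size, sizes, N, max_iter, fitness, select, crossover, mutate):
    """GA loop following the framework above (pop_size assumed even).
    `select` returns a mating pool of pop_size individuals (e.g. roulette wheel),
    `crossover` returns two children, `mutate` returns a perturbed individual."""
    population = [random_individual(sizes, N) for _ in range(pop_size)]
    best = None
    for _ in range(max_iter):
        scores = [fitness(ind) for ind in population]          # evaluation
        i_best = int(np.argmax(scores))
        if best is None or scores[i_best] > best[1]:
            best = (population[i_best], scores[i_best])
        pool = select(population, scores)                      # selection
        children = []
        for i in range(0, pop_size, 2):
            c1, c2 = crossover(pool[i], pool[i + 1])           # crossover
            children += [mutate(c1), mutate(c2)]               # mutation
        population = children                                  # reproduction
    return best[0]
```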


Coding
In our application, we encode an individual by two chromosomes: the first one represents the matrix of weights W, and the second represents the vector u, which is an array of decision variables taking the value 1 if the corresponding hidden layer is activated and 0 otherwise.

Example:

[Fig 7: Coding. The first chromosome stores the weights w_{k,j}^i between the kth neuron in layer i and the jth neuron in layer i+1; the second chromosome is the binary vector u, e.g. (1, 1, 0, 0, 1, 0, 0, 0, 1).]

Initial Population
The individuals of the initial population are randomly generated: the variables of the vector u take the value 0 or 1, and the matrix of weights takes random real values. There are other ways to generate the initial population, such as applying other heuristics; this is useful when the search space is constrained, which is not our case.

Evaluating individuals
In this step, each individual is assigned a numerical value called fitness, which corresponds to its performance; it depends essentially on the value of the objective function for this individual. An individual with a high fitness is one that is well adapted to the problem. The fitness suggested in our work is the following function:

Fitness(i) = M - objective(i)

where M >> 0. Minimizing the value of the objective function "objective" is thus equivalent to maximizing the value of the fitness function.

Selection
The selection method used in this paper is Roulette Wheel Selection (RWS), a proportional selection method in which individuals are selected according to their fitness. The principle of RWS can be summarized in the following schema:

[Fig 8: Selection RWS. The interval [0, 1] is partitioned into segments of lengths P_1, ..., P_10 for the individuals I_1, ..., I_10; a number chosen randomly in [0, 1] selects an individual.]

where

P_i = f_i / \sum_{i=1}^{n} f_i

Crossover
The crossover is very important in the algorithm. In this step, new individuals, called children, are created from individuals selected from the population, called parents. Children are constructed as follows: we fix a crossover point, the parents are cut at this point, the first part of parent 1 and the second part of parent 2 go to child 1, and the rest goes to child 2. In the crossover that we adopted, we choose two different crossover points: the first for the matrix of weights and the second for the vector u.

Example:

[Fig 9: Crossover. A crossover point is chosen for the vector u and another for the matrix of weights; before crossover, the chromosomes of parent 1 and parent 2 are cut at these points, and after crossover the exchanged parts form child 1 and child 2.]

Mutation
The role of mutation is to keep the diversity of solutions in order to avoid local optima. It consists of changing the values of one (or several) gene(s) of individuals chosen randomly.

Example:

[Fig 9: Mutation. A randomly chosen gene of the individual is changed between the state before mutation and the state after mutation.]

Algorithm
The scheme of the algorithm is depicted in the following figure.

[Fig 10: Algorithm description. Coding and the training base produce the initial population (W, u); the loop then evaluates the individuals, applies selection, crossover and mutation, evaluates again and performs replacement, until the stop criterion is met.]
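The two operators can be sketched as follows (an illustration consistent with the descriptions above: one cut point for the weight chromosome and another for the vector u; the flattening of W, the mutation rate and the perturbation scale are assumptions).

```python
import numpy as np

rng = np.random.default_rng(0)

def unflatten(flat, shapes):
    """Rebuild the list of weight matrices from a flat weight chromosome."""
    out, start = [], 0
    for s in shapes:
        n = int(np.prod(s))
        out.append(flat[start:start + n].reshape(s))
        start += n
    return out

def crossover(parent1, parent2):
    """Two-point crossover: one cut for the weights, another for the vector u."""
    (W1, u1), (W2, u2) = parent1, parent2
    shapes = [w.shape for w in W1]
    f1 = np.concatenate([w.ravel() for w in W1])
    f2 = np.concatenate([w.ravel() for w in W2])
    cw = rng.integers(1, f1.size)            # cut point in the weight chromosome
    cu = rng.integers(1, u1.size)            # cut point in the vector u
    child1 = (unflatten(np.concatenate([f1[:cw], f2[cw:]]), shapes),
              np.concatenate([u1[:cu], u2[cu:]]))
    child2 = (unflatten(np.concatenate([f2[:cw], f1[cw:]]), shapes),
              np.concatenate([u2[:cu], u1[cu:]]))
    return child1, child2

def mutate(individual, rate=0.05, scale=0.1):
    """Perturb a few weights and flip a few bits of u, chosen randomly."""
    W, u = individual
    W = [w + scale * rng.normal(size=w.shape) * (rng.random(w.shape) < rate) for w in W]
    u = np.where(rng.random(u.size) < rate, 1 - u, u)
    if u.sum() == 0:                         # repair: keep at least one active layer
        u[rng.integers(u.size)] = 1
    return W, u
```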


5. Experimental Results
In this section, we provide numerical examples to illustrate the potential applications of the proposed methodology. The multi-layer network architecture described above is used. The activation function used in this architecture is given as:

f(x) = (1 + exp(-x))^{-1}

Performance evaluation
Two standard statistical criteria were considered for the performance evaluation: the mean prediction error is measured by the root mean squared error (RMSE), and the mean absolute percentage error (MAPE) is also used. The performance evaluation criteria used in this article can be computed using the following expressions:

RMSE = sqrt( \sum_{i=1}^{Q} || Y^i - d^i ||^2 / Q )

MAPE = (1/Q) \sum_{i=1}^{Q} ( | Y^i - d^i | / | Y^i | ) x 100

where Y^i denotes the calculated (actual) output and d^i the desired output for the ith observation, counted from 1 to Q, and Q is the total number of observations.
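Stated in code, the two criteria can be computed as in the sketch below (illustrative; the array layout, one row per observation, and the norm-based reading of the MAPE formula are assumptions).

```python
import numpy as np

def rmse(Y, D):
    """Root mean squared error: sqrt( sum_i ||Y_i - d_i||^2 / Q )."""
    return np.sqrt(np.sum((Y - D) ** 2) / len(Y))

def mape(Y, D):
    """Mean absolute percentage error, reading |.| as the norm per observation."""
    return 100.0 * np.mean(np.linalg.norm(Y - D, axis=1) / np.linalg.norm(Y, axis=1))

# Illustrative use with Q = 4 observations of dimension 2.
Y = np.array([[1.0, 2.0], [0.5, 1.5], [2.0, 0.8], [1.2, 1.0]])   # calculated outputs
D = np.array([[0.9, 2.1], [0.6, 1.4], [1.8, 1.0], [1.0, 1.1]])   # desired outputs
print(rmse(Y, D), mape(Y, D))
```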

The experimental settings for the neural architecture optimization are shown in Table 1, in order to describe the data used:

Population size: 10
Size of learning set: 10
Dimension of desired space: 15
Dimension of input: 10
Number max of hidden layers: 10
Number of neurons by each hidden layer: 10
Table 1: Experimental settings for the neural architecture optimization

In order to test the theoretical results introduced in this paper, a computer program has been designed. Our algorithm was implemented in C++ and tested on randomly generated instances. The program was run on an IBM-compatible Pentium III at 666 MHz with 128 MB of RAM. The experimental results are presented in Table 2:

N. max of iterations | N  | nopt | RMSE | MAPE (%) | Number of iterations | CPU (s)
100                  | 6  | 4    | 1.41 | 24.56    | 19                   | 2.25
                     | 6  | 2    | 1.53 | 23.212   | 25                   | 2.20
                     | 8  | 5    | 0.86 | 17.20    | 96                   | 3.08
                     | 8  | 4    | 0.85 | 30.77    | 73                   | 3.08
                     | 10 | 3    | 1.25 | 13.75    | 55                   | 3.96
                     | 10 | 7    | 1.28 | 21.28    | 37                   | 4.12
1000                 | 6  | 4    | 1.08 | 19.02    | 213                  | 22.58
                     | 8  | 5    | 1.02 | 15.21    | 627                  | 31.55
                     | 8  | 2    | 1.16 | 20.35    | 613                  | 31.26
                     | 10 | 4    | 0.87 | 12.99    | 752                  | 48.30
                     | 10 | 7    | 0.91 | 14       | 915                  | 49.45
10000                | 6  | 5    | 0.90 | 15.19    | 9816                 | 247.42
                     | 6  | 5    | 0.83 | 13.78    | 1432                 | 247.4
                     | 8  | 4    | 0.73 | 11.88    | 9433                 | 344.12
                     | 10 | 4    | 0.78 | 12.97    | 9043                 | 481.70
Table 2: Results of the neural architecture optimization


In Table 2, we present the obtained number of hidden layers and the performance criteria; we analyse the performance of our method for different observation sets, which are generated randomly. In the following graphics, we interpret the obtained results for the problem of neural architecture optimization.

[Fig 11: Optimal solution for 100 iterations (bar chart of N, nopt and RMSE for three runs).]

The numerical results assess the effectiveness of the theoretical results shown in this paper and the advantages of the new modelling.

[Fig 12: Optimal solution for 1000 iterations (bar chart of N, nopt and RMSE for three runs).]

[Fig 13: Optimal solution for 10000 iterations (bar chart of N, nopt and RMSE for three runs).]

6. Conclusion
ANNs are widely accepted as a technology offering an alternative way to simulate complex and ill-defined problems. They have been used in diverse applications in control, robotics, pattern recognition, power system forecasting, manufacturing, optimization and signal processing.
In this paper, we have proposed a new modelling for the neural architecture optimization problem. The GA is especially appropriate for obtaining the optimal solution of this complex nonlinear problem. The method was tested to determine the optimal number of hidden layers in the ANN and the most favourable weight matrix, in order to minimize the error between the desired (predicted) output and the calculated output. Determining these parameters has an impact on the performance of the ANN. Several directions can be investigated to try to improve this method, for example using more efficient metaheuristics such as tabu search or ant colony systems [6]. Clearly, heuristic and mathematical approaches using a priori information would give rise to better performance and stability. To make this approach more efficient, it can be combined with other metaheuristics, or it can be computationally optimised by introducing analytical improvements, such as replacing the hyperbolic tangent with a linear function. Moreover, the approach introduced in this paper is also intrinsically easy to parallelize.
Other experiments are in progress for searching for adequate values of parameters such as the threshold value of the GA, tabu search and ant colony systems.
Other studies are in progress to apply this approach to many problems such as image processing, classification, image filtering [10] and optimization.

Acknowledgements
The authors would like to express their sincere thanks to the referee for his/her helpful suggestions.

References
[1] M. K. Apalak, M. Yildirim, R. Ekici, Layer optimization for maximum fundamental frequency of laminated composite plates for different edge conditions, Composites Science and Technology (68), 537-550, 2008.
[2] T. Bouktir, L. Slimani, Optimal power flow of the Algerian electrical network using genetic algorithms, WSEAS Transactions on Circuits and Systems, Vol. 3, Issue 6, 2004, pp. 1478-1482.
[3] I. Cabar, S. Yavuz, O. Erol, Robot mapping and map optimization using genetic algorithms and artificial neural networks, WSEAS Transactions on Computers, Issue 7, Volume 7, July 2008.
[4] A. Cichoki, R. Unberhuen, Neural Network for Optimization and Signal Processing, John Wiley & Sons, New York, 1993.
[5] Y. Chang, C. Low, Optimization of a passive harmonic filter based on the neural-genetic algorithm with fuzzy logic for a steel manufacturing plant, Expert Systems with Applications, 2059-2070, 2008.
[6] J. Dréo, A. Pétrowski, P. Siarry, É. Taillard, Méta-heuristiques pour l'optimisation difficile, Éditions Eyrolles, 2003.
[7] G. Dreyfus, J. Martinez, M. Samuelides, M. B. Gordon, F. Badran, S. Thiria, L. Hérault, Réseaux de neurones. Méthodologie et applications, Éditions Eyrolles, 2e édition, 2003.
[8] E. Egriogglu, C. Hakam Aladag, S. Gunay, A new model selection strategy in artificial neural networks, Applied Mathematics and Computation (195), 591-597, 2008.
[9] M. Ettaouil, Contribution à l'étude des problèmes de satisfaction de contraintes et à la programmation quadratique en nombres entiers, allocation statique de tâches dans les systèmes distribués, thèse d'état, Université Sidi Mohammed ben Abdellah, F.S.T. de Fès, 1999.
[10] M. Ettaouil, Y. Ghanou, Réseaux de neurones artificiels pour la compression et la filtration de l'image médicale, Congrès de la Société Marocaine de Mathématiques Appliquées, proceedings pp. 148-150, Rabat, 2008.
[11] M. Ettaouil, Y. Ghanou, Méta-heuristiques et optimisation des architectures neuronales, JIMD'2 proceedings, ENSIAS, Rabat, 2008.
[12] M. Ettaouil, C. Loqman, Constraint satisfaction problem solved by semi definite relaxation, WSEAS Transactions on Computers, Issue 7, Volume 7, 951-961, 2008.
[13] J. Holland, Adaptation in Natural and Artificial Systems, Ann Arbor, MI: University of Michigan Press, 1992.
[14] J. J. Valdes, A. J. Barton, Multi-objective evolutionary optimization for constructing neural networks for virtual reality visual data mining: Application to geophysical prospecting, Neural Networks 20 (2007) 498-508.
[15] T. Kohonen, Self-Organizing Maps, Springer, 3rd edition, 2001.
[16] T. Y. Kwok, D. K. Yeung, Constructive algorithms for structure learning in feed-forward neural networks for regression problems, IEEE Transactions on Neural Networks 8 (1997) 630-645.
[17] C.-C. Lin, Implementation feasibility of convex recursive deletion regions using multi-layer perceptrons, WSEAS Transactions on Computers, Issue 1, Volume 7, January 2008.
[18] H. Peng, X. Ling, Optimal design approach for the plate-fin heat exchangers using neural networks cooperated with genetic algorithms, Applied Thermal Engineering 28 (2008) 642-650.
[19] V. Joseph Raj, Better learning of supervised neural networks based on functional graph - an experimental approach, WSEAS Transactions on Computers, Issue 8, Volume 7, August 2008.
[20] P. M. Talavan, J. Yanez, A continuous Hopfield network equilibrium points algorithm, Computers & Operations Research 32 (2005) 2179-2196.
[21] Z. Tang, C. Almeida, P. A. Fishwick, Time series forecasting using neural networks vs. Box-Jenkins methodology, Simulation 57 (1991) 303-310.
[22] D. Wang, Fast constructive-covering algorithm for neural networks and its implement in classification, Applied Soft Computing 8 (2008) 166-173.
[23] D. Wang, N. S. Chaudhari, A constructive unsupervised learning algorithm for Boolean neural networks based on multi-level geometrical expansion, Neurocomputing 57C (2004) 455-461.
