Evolving Artificial Neural Networks

XIN YAO, SENIOR MEMBER, IEEE

Invited Paper

Learning and evolution are two fundamental forms of adaptation. There has been a great interest in combining learning and evolution with artificial neural networks (ANN's) in recent years. This paper: 1) reviews different combinations between ANN's and evolutionary algorithms (EA's), including using EA's to evolve ANN connection weights, architectures, learning rules, and input features; 2) discusses different search operators which have been used in various EA's; and 3) points out possible future research directions. It is shown, through a considerably large literature review, that combinations between ANN's and EA's can lead to significantly better intelligent systems than relying on ANN's or EA's alone.

Keywords—Evolutionary computation, intelligent systems, neural networks.

Manuscript received July 10, 1998; revised February 18, 1999. This work was supported in part by the Australian Research Council through its small grant scheme.
The author is with the School of Computer Science, University of Birmingham, Edgbaston, Birmingham B15 2TT U.K. (e-mail: xin@cs.bham.ac.uk).
Publisher Item Identifier S 0018-9219(99)06906-6.

I. INTRODUCTION

Evolutionary artificial neural networks (EANN's) refer to a special class of artificial neural networks (ANN's) in which evolution is another fundamental form of adaptation in addition to learning [1]–[5]. Evolutionary algorithms (EA's) are used to perform various tasks, such as connection weight training, architecture design, learning rule adaptation, input feature selection, connection weight initialization, rule extraction from ANN's, etc. One distinct feature of EANN's is their adaptability to a dynamic environment. In other words, EANN's can adapt to an environment as well as changes in the environment. The two forms of adaptation, i.e., evolution and learning in EANN's, make their adaptation to a dynamic environment much more effective and efficient. In a broader sense, EANN's can be regarded as a general framework for adaptive systems, i.e., systems that can change their architectures and learning rules appropriately without human intervention.

This paper is most concerned with exploring possible benefits arising from combinations between ANN's and EA's. Emphasis is placed on the design of intelligent systems based on ANN's and EA's. Other combinations between ANN's and EA's for combinatorial optimization will be mentioned but not discussed in detail.

A. Artificial Neural Networks

1) Architectures: An ANN consists of a set of processing elements, also known as neurons or nodes, which are interconnected. It can be described as a directed graph in which each node i performs a transfer function f_i of the form

    y_i = f_i\left(\sum_{j=1}^{n} w_{ij} x_j - \theta_i\right)    (1)

where y_i is the output of node i, x_j is the jth input to the node, and w_{ij} is the connection weight between nodes i and j. \theta_i is the threshold (or bias) of the node. Usually, f_i is nonlinear, such as a Heaviside, sigmoid, or Gaussian function.

ANN's can be divided into feedforward and recurrent classes according to their connectivity. An ANN is feedforward if there exists a method which numbers all the nodes in the network such that there is no connection from a node with a large number to a node with a smaller number. All the connections are from nodes with small numbers to nodes with larger numbers. An ANN is recurrent if such a numbering method does not exist.

In (1), each term in the summation only involves one input x_j. High-order ANN's are those that contain high-order nodes, i.e., nodes in which more than one input is involved in some of the terms of the summation. For example, a second-order node can be described as

    y_i = f_i\left(\sum_{j}\sum_{k} w_{ijk} x_j x_k - \theta_i\right)

where all the symbols have similar definitions to those in (1).

The architecture of an ANN is determined by its topological structure, i.e., the overall connectivity and transfer function of each node in the network.
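As a concrete illustration of (1) and of the feedforward/recurrent distinction, the sketch below computes a single node's output and tests the node-numbering condition. This is not from the paper: the sigmoid choice, the acyclicity check (Kahn's topological sort), and all values are illustrative assumptions.

```python
import math

def node_output(weights, inputs, theta,
                f=lambda a: 1.0 / (1.0 + math.exp(-a))):
    """Transfer function of one node, as in (1): y = f(sum_j w_j*x_j - theta)."""
    net = sum(w * x for w, x in zip(weights, inputs)) - theta
    return f(net)

def is_feedforward(connections, n_nodes):
    """An ANN is feedforward iff its nodes admit a numbering in which every
    connection goes from a lower-numbered to a higher-numbered node, i.e.,
    iff the connection graph is acyclic (checked here with Kahn's sort)."""
    indeg = [0] * n_nodes
    for i, j in connections:
        indeg[j] += 1
    queue = [v for v in range(n_nodes) if indeg[v] == 0]
    seen = 0
    while queue:
        v = queue.pop()
        seen += 1
        for i, j in connections:
            if i == v:
                indeg[j] -= 1
                if indeg[j] == 0:
                    queue.append(j)
    return seen == n_nodes  # True exactly when a valid numbering exists
```

A recurrent network, by contrast, contains a cycle, so no such numbering exists and `is_feedforward` returns False.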

0018–9219/99$10.00  1999 IEEE

PROCEEDINGS OF THE IEEE, VOL. 87, NO. 9, SEPTEMBER 1999 1423

Fig. 1. A general framework of EA’s.
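The general framework of Fig. 1 can be sketched as a generic population-based loop. This is a hedged sketch only: the truncation selection, Gaussian mutation, population size, and the toy fitness function below are placeholder choices, not prescribed by the paper.

```python
import random

def evolve(fitness, init, mutate, pop_size=20, generations=50, seed=0):
    """Generic EA loop per Fig. 1: evaluate, select, vary, repeat."""
    rng = random.Random(seed)
    population = [init(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]          # truncation selection
        offspring = [mutate(rng.choice(parents), rng)
                     for _ in range(pop_size - len(parents))]
        population = parents + offspring           # next generation
    return max(population, key=fitness)

# toy usage: maximize -(x - 3)^2 over real x
best = evolve(
    fitness=lambda x: -(x - 3.0) ** 2,
    init=lambda rng: rng.uniform(-10.0, 10.0),
    mutate=lambda x, rng: x + rng.gauss(0.0, 0.5),
)
```

Any of the specific EA's discussed below (GA's, EP, ES) can be seen as instances of this loop with particular representations and search operators.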

2) Learning in ANN's: Learning in ANN's is typically accomplished using examples. This is also called "training" in ANN's because the learning is achieved by adjusting the connection weights¹ in ANN's iteratively so that trained (or learned) ANN's can perform certain tasks. Learning in ANN's can roughly be divided into supervised, unsupervised, and reinforcement learning. Supervised learning is based on direct comparison between the actual output of an ANN and the desired correct output, also known as the target output. It is often formulated as the minimization of an error function such as the total mean square error between the actual output and the desired output summed over all available data. A gradient descent-based optimization algorithm such as backpropagation (BP) [6] can then be used to adjust connection weights in the ANN iteratively in order to minimize the error. Reinforcement learning is a special case of supervised learning where the exact desired output is unknown. It is based only on the information of whether or not the actual output is correct. Unsupervised learning is solely based on the correlations among input data. No information on "correct output" is available for learning.

The essence of a learning algorithm is the learning rule, i.e., a weight-updating rule which determines how connection weights are changed. Examples of popular learning rules include the delta rule, the Hebbian rule, the anti-Hebbian rule, and the competitive learning rule [7].
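The delta rule mentioned above can be illustrated for a single linear node. This is a minimal sketch; the learning rate, inputs, and target below are made-up values, not from the paper.

```python
def delta_rule_step(weights, x, target, lr=0.1):
    """One delta-rule update for a linear node: with y = sum_j w_j*x_j,
    gradient descent on the squared error 0.5*(target - y)^2 gives
    w_j <- w_j + lr * (target - y) * x_j."""
    y = sum(w * xi for w, xi in zip(weights, x))
    return [w + lr * (target - y) * xi for w, xi in zip(weights, x)]

# repeated updates on one example drive the output toward the target
w = [0.0, 0.0]
for _ in range(100):
    w = delta_rule_step(w, [1.0, 2.0], 1.0)
```

With these values the residual error halves at every step, so the node's output converges to the target.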
More detailed discussion of ANN's can be found in [7].

B. EA's

EA's refer to a class of population-based stochastic search algorithms that are developed from ideas and principles of natural evolution. They include evolution strategies (ES) [8], [9], evolutionary programming (EP) [10]–[12], and genetic algorithms (GA's) [13], [14]. One important feature of all these algorithms is their population-based search strategy. Individuals in a population compete and exchange information with each other in order to perform certain tasks. A general framework of EA's can be described by Fig. 1.

EA's are particularly useful for dealing with large complex problems which generate many local optima. They are less likely to be trapped in local minima than traditional gradient-based search algorithms. They do not depend on gradient information and thus are quite suitable for problems where such information is unavailable or very costly to obtain or estimate. They can even deal with problems where no explicit and/or exact objective function is available. These features make them much more robust than many other search algorithms. Fogel [15] and Bäck et al. [16] give a good introduction to various evolutionary algorithms for optimization.

C. Evolution in EANN's

Evolution has been introduced into ANN's at roughly three different levels: connection weights, architectures, and learning rules. The evolution of connection weights introduces an adaptive and global approach to training, especially in the reinforcement learning and recurrent network learning paradigm where gradient-based training algorithms often experience great difficulties. The evolution of architectures enables ANN's to adapt their topologies to different tasks without human intervention and thus provides an approach to automatic ANN design as both ANN connection weights and structures can be evolved. The evolution of learning rules can be regarded as a process of "learning to learn" in ANN's where the adaptation of learning rules is achieved through evolution. It can also be regarded as an adaptive process of automatic discovery of novel learning rules.

D. Organization of the Article

The remainder of this paper is organized as follows. Section II discusses the evolution of connection weights. The aim is to find a near-optimal set of connection weights globally for an ANN with a fixed architecture using EA's. Various methods of encoding connection weights and different search operators used in EA's will be discussed. Comparisons between the evolutionary approach and conventional training algorithms, such as BP, will be made. In general, no single algorithm is an overall winner for all kinds of networks. The best training algorithm is problem dependent.

Section III is devoted to the evolution of architectures, i.e., finding a near-optimal ANN architecture for the tasks at hand. It is known that the architecture of an ANN determines the information processing capability of the ANN. Architecture design has become one of the most

¹Thresholds (biases) can be viewed as connection weights with a fixed input of −1.0.


important tasks in ANN research and application. Two most important issues in the evolution of architectures, i.e., the genotype representation scheme of architectures and the EA used to evolve architectures, will be addressed in this section. It is shown that evolutionary algorithms relying on crossover operators do not perform very well in searching for a near-optimal ANN architecture. Reasons and empirical results will be given in this section to explain why this is the case.

Section IV addresses the evolution of learning rules in ANN's and examines the relationship between learning and evolution, e.g., how learning guides evolution and how learning itself evolves. If imagining ANN's connection weights and architectures as their "hardware," it is easier to understand the importance of the evolution of ANN's "software"—learning rules. It is demonstrated that an ANN's learning ability can be improved through evolution. Although research on this topic is still in its early stages, further studies will no doubt benefit research in ANN's and machine learning as a whole.

Section V summarizes some other forms of combinations between ANN's and EA's. They demonstrate the breadth of possible combinations between ANN's and EA's. They do not intend to be exhaustive, but simply indicative.

Section VI first describes a general framework of EANN's in terms of adaptive systems where interactions among three levels of evolution are considered. The framework provides a common basis for comparing different EANN models. The section then gives a brief summary of the paper and concludes with a few remarks.

II. THE EVOLUTION OF CONNECTION WEIGHTS

Weight training in ANN's is usually formulated as minimization of an error function, such as the mean square value of the error between target and actual outputs averaged over all examples, by iteratively adjusting connection weights. Most training algorithms, such as BP and conjugate gradient algorithms [7], [17]–[19], are based on gradient descent. There have been some successful applications of BP in various areas [20]–[22], but BP has drawbacks due to its use of gradient descent [23], [24]. It often gets trapped in a local minimum of the error function and is incapable of finding a global minimum if the error function is multimodal and/or nondifferentiable. A detailed review of BP and other learning algorithms can be found in [7], [17], [24], and [25].

One way to overcome gradient-descent-based training algorithms' shortcomings is to adopt EANN's, i.e., to formulate the training process as the evolution of connection weights in the environment determined by the architecture and the learning task. EA's can then be used effectively in the evolution to find a near-optimal set of connection weights globally without computing gradient information. The fitness of an ANN can be defined according to different needs. Two important factors which often appear in the fitness (or error) function are the error between target and actual outputs and the complexity of the ANN. Unlike the case in gradient-descent-based training algorithms, the fitness (or error) function does not have to be differentiable or even continuous since EA's do not depend on gradient information. Because EA's can treat large, complex, nondifferentiable, and multimodal spaces, which are the typical case in the real world, considerable research and application has been conducted on the evolution of connection weights [24], [26]–[112].

The evolutionary approach to weight training in ANN's consists of two major phases. The first phase is to decide the representation of connection weights, i.e., whether in the form of binary strings or not. The second one is the evolutionary process simulated by an EA, in which search operators such as crossover and mutation have to be decided in conjunction with the representation scheme. Different representations and search operators can lead to quite different training performance. A typical cycle of the evolution of connection weights is shown in Fig. 2. The evolution stops when the fitness is greater than a predefined value (i.e., the training error is smaller than a certain value) or the population has converged.

Fig. 2. A typical cycle of the evolution of connection weights.

A. Binary Representation

The canonical genetic algorithm (GA) [13], [14] has always used binary strings to encode alternative solutions, often termed chromosomes. Some of the early work in evolving ANN connection weights followed this approach.

YAO: EVOLVING ARTIFICIAL NEURAL NETWORKS 1425
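The binary-representation idea can be sketched as follows. The 4-bit gene length matches the example discussed around Fig. 3; the linear quantization and the [-8, 8] weight range are hypothetical choices of this sketch, not taken from the paper.

```python
def encode_weights(weights, bits=4, lo=-8.0, hi=8.0):
    """Concatenate a fixed-point binary gene for each connection weight
    (illustrative linear quantization over an assumed [lo, hi] range)."""
    levels = 2 ** bits - 1
    chrom = ""
    for w in weights:
        q = round((min(max(w, lo), hi) - lo) / (hi - lo) * levels)
        chrom += format(q, "0{}b".format(bits))
    return chrom

def decode_weights(chrom, bits=4, lo=-8.0, hi=8.0):
    """Invert encode_weights by splitting the chromosome into bits-sized genes."""
    levels = 2 ** bits - 1
    return [lo + int(chrom[i:i + bits], 2) / levels * (hi - lo)
            for i in range(0, len(chrom), bits)]

# usage: a three-weight network becomes a 12-bit chromosome
chrom = encode_weights([0.0, 8.0, -8.0])
decoded = decode_weights(chrom)
```

The quantization step (hi - lo) / (2^bits - 1) makes the precision/length tradeoff discussed in the text explicit: fewer bits mean coarser weights, more bits mean longer chromosomes.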

The idea was to represent the connection weights of an ANN as a binary string. In such a representation scheme, each connection weight is represented by a number of bits with certain length. An ANN is encoded by concatenation of all the connection weights of the network in the chromosome. A heuristic concerning the order of the concatenation is to put connection weights to the same hidden/output node together. Hidden nodes in ANN's are in essence feature extractors and detectors. Separating inputs to the same hidden node far apart in the binary representation would increase the difficulty of constructing useful feature detectors because they might be destroyed by crossover operators. Fig. 3 gives an example of the binary representation of an ANN whose architecture is predefined. Each connection weight in the ANN is represented by 4 bits, and the whole ANN is represented by 24 bits, where 0000 indicates no connection between two nodes.

Fig. 3. (a) An ANN with connection weights shown. (b) Its binary representation, assuming that each weight is represented by four bits.

The advantages of the binary representation lie in its simplicity and generality. It is straightforward to apply classical crossover (such as one-point or uniform crossover) and mutation to binary strings. The binary representation also facilitates digital hardware implementation of ANN's since weights have to be represented in terms of bits in hardware with limited precision. However, a tradeoff between representation precision and the length of chromosome often has to be made. If too few bits are used to represent each connection weight, training might fail because some combinations of real-valued connection weights cannot be approximated with sufficient accuracy by discrete values. On the other hand, if too many bits are used, chromosomes representing large ANN's will become extremely long and the evolution in turn will become very inefficient.

There have been some debates on the cardinality of the genotype alphabet. Some have argued that the minimal cardinality, i.e., the binary representation, might not be the best [48]. There are several encoding methods, such as uniform, Gray, exponential, etc., that can be used in the binary representation [117], [118]. They encode real values using different ranges and precisions given the same number of bits. Formal analysis of nonstandard representations and operators based on the concept of equivalent classes [115], [116] has given representations other than binary strings a more solid theoretical foundation.

B. Real-Number Representation

Real numbers have been proposed to represent connection weights directly, i.e., one real number per connection weight [27], [30], [41], [53], [63]–[65], [95], [96], [102], [111]. In such a representation scheme, each individual in an evolving population will be a real vector. For example, a real-number representation of the ANN given by Fig. 3(a) could be (4.0, 10.0, 2.0, 7.0, 3.0, 0.0).

As connection weights are represented by real numbers, traditional binary crossover and mutation can no longer be used directly. Special search operators have to be designed. Montana and Davis [27] defined a large number of tailored genetic operators which incorporated many heuristics about training ANN's. Their results showed that the evolutionary training approach was much faster than BP for the problems they considered. Bartlett and Downs [30] also demonstrated that the evolutionary approach was faster and had better scalability than BP.

It is generally very difficult to apply crossover operators in evolving connection weights since they tend to destroy feature detectors found during the evolutionary process. One of the problems faced by evolutionary training of ANN's is the permutation problem [32], [113], also known as the competing convention problem. It is caused by the many-to-one mapping from the representation (genotype) to the actual ANN (phenotype), since two ANN's that order their hidden nodes differently in their chromosomes will still be equivalent functionally. For example, the ANN's shown by Figs. 3(a) and 4(a) are equivalent functionally, but they have different chromosomes as shown by Figs. 3(b) and 4(b). In general, any permutation of the hidden nodes will produce functionally equivalent ANN's with different chromosome representations. The permutation problem makes the crossover operator very inefficient and ineffective in producing good offspring.

Fig. 4. (a) An ANN which is equivalent to that given in Fig. 3(a). (b) A binary representation of the weights under the same representation scheme.

A natural way to evolve real vectors would be to use EP or ES since they are particularly well-suited for treating continuous optimization. Unlike GA's, the primary search operator in EP and ES is mutation. One of the major advantages of using mutation-based EA's is that they can reduce the negative impact of the permutation problem. Hence the evolutionary process can be more efficient.
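The permutation problem can be demonstrated on a tiny network: swapping the two hidden nodes (together with their matching output weights) leaves the network's function unchanged while changing its genotype. All weights below are made up for illustration, and a linear output node is used for simplicity.

```python
def mlp_output(x, hidden, out_weights):
    """One-hidden-layer net: hidden is a list of (input_weights, theta)
    per hidden node; the output node is linear in the hidden activations."""
    h = [sum(w * xi for w, xi in zip(ws, x)) - theta for ws, theta in hidden]
    return sum(v * hi for v, hi in zip(out_weights, h))

# two genotypes differing only by a hidden-node permutation --
# the permutation (competing convention) problem in miniature
net_a = ([([1.0, 2.0], 0.5), ([3.0, -1.0], 0.0)], [0.7, -0.4])
net_b = ([([3.0, -1.0], 0.0), ([1.0, 2.0], 0.5)], [-0.4, 0.7])

out_a = mlp_output([0.3, 0.9], *net_a)
out_b = mlp_output([0.3, 0.9], *net_b)
```

Because many genotypes map to one phenotype, recombining two such parents can easily produce offspring that duplicate some feature detectors and lose others, which is why crossover is often ineffective here.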

There have been a number of successful examples of applying EP or ES to the evolution of ANN connection weights [29], [63]–[65], [67], [68], [70], [71], [77], [81], [86], [89], [100], [103], [105], [106], [110]–[112], [117], [123]. Evolving connection weights by EP can be implemented as follows.

1) Generate an initial population of \mu individuals at random and set k = 1. Each individual is a pair of real-valued vectors (w_i, \eta_i), \forall i \in \{1, \ldots, \mu\}, where the w_i's are connection weight vectors and the \eta_i's are variance vectors for Gaussian mutations (also known as strategy parameters in self-adaptive EA's).

2) Each individual (w_i, \eta_i), i = 1, \ldots, \mu, creates a single offspring (w_i', \eta_i') by: for j = 1, \ldots, n,

    \eta_i'(j) = \eta_i(j) \exp(\tau' N(0, 1) + \tau N_j(0, 1))    (2)

    w_i'(j) = w_i(j) + \eta_i'(j) N_j(0, 1)    (3)

where w_i(j), w_i'(j), \eta_i(j), and \eta_i'(j) denote the jth component of the vectors w_i, w_i', \eta_i, and \eta_i', respectively. N(0, 1) denotes a normally distributed one-dimensional random number with mean zero and variance one. N_j(0, 1) indicates that the random number is generated anew for each value of j. The parameters \tau and \tau' are commonly set to (\sqrt{2\sqrt{n}})^{-1} and (\sqrt{2n})^{-1} [15], [120]. The Gaussian mutation in (3) may be replaced by Cauchy mutation [121], [122], which can lead to faster evolution.

3) Determine the fitness of every individual, including all parents and offspring, based on the training error. Different error functions may be used here.

4) Conduct pairwise comparison over the union of parents and offspring. For each individual, q opponents are chosen uniformly at random from all the parents and offspring. For each comparison, if the individual's fitness is no smaller than the opponent's, it receives a "win."

5) Select the \mu individuals out of the parents and offspring that have most wins to form the next generation. (This tournament selection scheme may be replaced by other selection schemes.) Stop if the halting criterion is satisfied; otherwise, set k = k + 1 and go to Step 2).

C. Comparison Between Evolutionary Training and Gradient-Based Training

As indicated at the beginning of Section II, the evolutionary training approach is attractive because it can handle the global search problem better in a vast, complex, multimodal, and nondifferentiable surface. It does not depend on gradient information of the error (or fitness) function and thus is particularly appealing when this information is unavailable or very costly to obtain or estimate. The same EA can be used to train many different networks regardless of whether they are feedforward, recurrent, or higher order ANN's. For example, the evolutionary approach has been used to train recurrent ANN's [41], [63], [67], [102], [129], higher order ANN's [52], [53], [130], and fuzzy ANN's [76], [83]. The general applicability of the evolutionary approach saves a lot of human efforts in developing different training algorithms for different types of ANN's.

The evolutionary approach also makes it easier to generate ANN's with some special characteristics. For example, the ANN's complexity can be decreased and its generalization increased by including a complexity (regularization) term in the fitness function. Unlike the case in gradient-based training, this term does not need to be differentiable. Weight sharing and weight decay can also be incorporated into the fitness function easily. Moreover, EA's are generally much less sensitive to initial conditions of training. They always search for a globally optimal solution, while a gradient descent algorithm can only find a local optimum in a neighborhood of the initial solution.

Evolutionary training can be slow for some problems in comparison with fast variants of BP [131] and conjugate gradient algorithms [19]. However, for some problems, evolutionary training can be significantly faster and more reliable than BP [30], [34], [106], [124], [132]. Prados [34] described a GA-based training algorithm which is "significantly faster than methods that use the generalized delta rule (GDR)." The test problems he used included the XOR problem, the two-spiral problem [125], and various size encoder/decoder problems. For the three tests reported in his paper [34], the GA-based algorithm "took a total of about 3 hours and 20 minutes," while the GDR "took a total of about 23 hours and 40 minutes." Bartlett and Downs [30] also gave a modified GA which was "an order of magnitude" faster than BP for the 7-bit parity problem. The modified GA seemed to have better scalability than BP since it was "around twice" as slow as BP for the XOR problem but faster than BP for the larger 7-bit parity problem.

Interestingly, quite different results were reported by Kitano [133]. He found that the GA–BP method, a technique that runs a GA first and then BP, "is, at best, equally efficient to faster variants of back propagation in very small scale networks, but far less efficient in larger networks." However, there have been many other papers which report excellent results using hybrid evolutionary and gradient descent algorithms [32], [126]–[128].

The discrepancy between two seemingly contradictory results can be attributed at least partly to the different EA's and BP compared, e.g., whether the comparison is between a classical binary GA and a fast BP algorithm, or between a fast EA and a classical BP algorithm. The discrepancy also shows that there is no clear winner in terms of the best training algorithm. The best one is always problem dependent. This is certainly true according to the
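Steps 1)–5) with the self-adaptive updates (2) and (3) can be sketched as follows. This is a hedged sketch: the population size, q, the generation count, and the toy quadratic "training error" are illustrative choices, not the paper's settings.

```python
import math
import random

def ep_train(fitness, n, mu=10, q=3, generations=40, seed=1):
    """Self-adaptive EP over weight vectors, per steps 1)-5): each
    individual is a pair (w, eta); offspring use the lognormal strategy
    update (2) and Gaussian mutation (3); survivors are picked by a
    pairwise tournament over the union of parents and offspring."""
    rng = random.Random(seed)
    tau = 1.0 / math.sqrt(2.0 * math.sqrt(n))
    tau_p = 1.0 / math.sqrt(2.0 * n)
    pop = [([rng.uniform(-1, 1) for _ in range(n)], [0.5] * n)
           for _ in range(mu)]
    for _ in range(generations):
        offspring = []
        for w, eta in pop:
            g = rng.gauss(0, 1)  # the shared N(0,1) draw for this individual
            eta2 = [e * math.exp(tau_p * g + tau * rng.gauss(0, 1)) for e in eta]
            w2 = [wi + ei * rng.gauss(0, 1) for wi, ei in zip(w, eta2)]
            offspring.append((w2, eta2))
        union = pop + offspring
        scores = [fitness(w) for w, _ in union]
        wins = [sum(scores[i] >= scores[rng.randrange(len(union))]
                    for _ in range(q)) for i in range(len(union))]
        ranked = sorted(range(len(union)),
                        key=lambda i: (wins[i], scores[i]), reverse=True)
        pop = [union[i] for i in ranked[:mu]]
    return max(pop, key=lambda ind: fitness(ind[0]))[0]

# toy usage: "train" a 2-weight network toward target weights (1, -2),
# using negated squared distance as the fitness (higher is better)
best_w = ep_train(lambda w: -((w[0] - 1.0) ** 2 + (w[1] + 2.0) ** 2), n=2)
```

In a real EANN, the fitness would be computed from the network's training error rather than from a known target vector.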

i. 5 illustrates a simple case where there is than the complete class of network architectures” [149]. In general. Research on constructive and destructive al- training algorithm will be quite competitive.. an ANN with only a few connections 2) the surface is nondifferentiable since changes in the and linear nodes may not be able to perform the task number of nodes or connections are discrete and can at all due to its limited capability. However. [67]. [150] which make EA’s a better candidate for structure.. [74]. 1428 PROCEEDINGS OF THE IEEE. nodes. NO. This is especially true for GA’s. i. [149]. only one connection weight in the ANN. SEPTEMBER 1999 . application areas [32]. D. The optimal architecture design is predefined and fixed during the evolution of connection equivalent to finding the highest point on this surface.e. architecture design is still very much a human approach was more efficient than either the GA or BP al. EA’s can be used to locate a good region in the space and then a local search procedure is used to find a near-optimal Fig.. initial weights. Similar work gorithms represents an effort toward the automatic design of on the evolution of initial weights has also been done on architectures [141]–[148]. Given a learning task. VOL.” In addition. then a global minimum can easily be found by “Such structural hill climbing methods are susceptible to the local search algorithm if an EA can locate a point. e. [86]. [103]. [105]. and connections when nec- as locating a good region in the weight space. Fig. 9. If we consider that BP often has to run and a tedious trial-and-error process. [133] or other random search algorithms [30]. An illustration of using an EA to find good initial weights solution in this region. it would be easy for a local formulated as a search problem in the architecture space search algorithm to arrive at the globally optimal weight where each point represents an architecture. [71].e. 5. 
Hybrid training has been used successfully in many because it can lead to the global optimum wB using a local search algorithm. combining EA’s global search ability with local search’s ability to fine tune. The efficiency of evolutionary training can be improved significantly by incorporating a local search procedure into the evolution. [110]–[112]. about architectures. THE EVOLUTION OF ARCHITECTURES performance level of all architectures forms a discrete Section II assumed that the architecture of an ANN is surface in the space. 87. the hybrid automatically. in the basin of attraction of the global they “only investigate restricted topological subsets rather minimum. lowest training error. set of initial weights.. starts with the maximal network and of all the points. Defining that essary during training while a destructive algorithm does basin of attraction of a local minimum as being composed the opposite. This section discusses the design of ANN architec. minimal number of hidden layers. are several characteristics of such a surface as indicated by tures. The architecture of an ANN includes its topological Miller et al. As indicated in the beginning of this tive algorithms mentioned above. and connections during converge to the local minimum through a local search training. lowest network complexity. [150]: plication of ANN’s because the architecture has significant 1) the surface is infinitely large since the number of impact on a network’s information processing capabilities.no-free-lunch theorem [134].e..g. Their results showed that the hybrid GA/BP Up to now. It depends heavily on the expert experience gorithm used alone. If an EA can find Design of the optimal architecture for an ANN can be an initial weight such as . Given some even though itself is not as good as .e. i. Lee [81] and many others [32]. Hybrid Training Most EA’s are rather inefficient in fine-tuned local search although they are good at global search.. weights easily. the III. 
which can deletes unnecessary layers. wI 2 in this figure is an optimal initial weight [135]. There is no systematic several times in practice in order to find good connection way to design a near-optimal architecture for a given task weights due to its sensitivity to initial conditions. There weights. a becoming trapped at structural local optima. a constructive competitive learning neural networks [139] and Kohonen algorithm starts with a minimal network (network with networks [140]. [70]. [81]. [136]–[138] used GA’s to a large number of connections and nonlinear nodes may search for a near-optimal set of initial connection weights overfit noise in the training data and fail to have good and then used BP to perform local search from these generalization ability. hybrid algorithms tend to perform better than others for a large number of problems. nodes. These characteristics are paper. algorithm. Roughly speaking. nodes. as indicated by Angeline et al. performance (optimality) criteria. i. and connections) It is interesting to consider finding good initial weights and adds new layers. The local search algorithm could such that a local search algorithm can find the globally optimal be BP [32]. connectivity. and the transfer function of each searching the surface than those constructive and destruc- node in the ANN. etc. expert’s job. sets of weights in this case. [80]. while an ANN with have a discontinuous effect on EANN’s performance. architecture design is crucial in the successful ap. possible nodes and connections is unbounded.
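The hybrid idea, a global evolutionary phase followed by local fine-tuning, can be sketched as follows. Here a best-of-population random search stands in for the EA and a simple coordinate hill-climb stands in for BP; all parameters and the toy loss are illustrative assumptions.

```python
import random

def hybrid_train(loss, dim, coarse_evals=60, fine_steps=200, seed=2):
    """Hybrid training sketch: a crude global phase picks a promising
    initial weight vector (the EA's role: locating a good basin of
    attraction), then a shrinking-step hill-climb fine-tunes it (BP's role)."""
    rng = random.Random(seed)
    # global phase: best of a coarse random population of weight vectors
    w = min(([rng.uniform(-5, 5) for _ in range(dim)]
             for _ in range(coarse_evals)), key=loss)
    # local phase: coordinate descent with a halving step size
    step = 0.5
    for _ in range(fine_steps):
        improved = False
        for j in range(dim):
            for d in (step, -step):
                cand = list(w)
                cand[j] += d
                if loss(cand) < loss(w):
                    w, improved = cand, True
        if not improved:
            step *= 0.5
    return w

# toy usage: a convex quadratic "training error" over two weights
quad = lambda v: (v[0] - 1.5) ** 2 + (v[1] + 0.5) ** 2
w_best = hybrid_train(quad, 2)
```

On a multimodal error surface, the value of the global phase is that it chooses the basin; the local phase alone would only reach whichever local minimum surrounds its starting point.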

3) the surface is complex and noisy, since the mapping from an architecture to its performance is indirect, strongly epistatic, and dependent on the evaluation method used;
4) the surface is deceptive, since similar architectures may have quite different performance;
5) the surface is multimodal, since different architectures may have similar performance.

Similar to the evolution of connection weights, two major phases are involved in the evolution of architectures: the genotype representation scheme of architectures and the EA used to evolve ANN architectures [154]. One of the key issues in encoding ANN architectures is to decide how much information about an architecture should be encoded in the chromosome. At one extreme, all the details, i.e., every connection and node of an architecture, can be specified by the chromosome. This kind of representation scheme is called direct encoding. At the other extreme, only the most important parameters of an architecture, such as the number of hidden layers and the number of hidden nodes in each layer, are encoded. Other details are left to the training process to decide. This kind of representation scheme is called indirect encoding.

After a representation scheme has been chosen, the evolution of architectures can progress according to the cycle shown in Fig. 6. The cycle stops when a satisfactory ANN is found.

Fig. 6. A typical cycle of the evolution of architectures.

Considerable research on evolving ANN architectures has been carried out in recent years [33], [45], [149]–[225]. Most of this research has concentrated on the evolution of ANN topological structures. Relatively little has been done on the evolution of node transfer functions, let alone the simultaneous evolution of both topological structures and node transfer functions. For convenience, the term architecture will be used interchangeably with the term topological structure (topology) in Sections III-A and III-B, which analyze the genotypical representation schemes of topological structures. Section III-C discusses the evolution of node transfer functions.

Two different approaches have been taken in evolving architectures. The first separates the evolution of architectures from that of connection weights [24], [42], [45], [150], [153], [154], [165]–[170]. The second evolves architectures and connection weights simultaneously [37], [149], [179], [182], [185]–[200]. This section will focus on the first approach; the second approach will be discussed in Section III-D, where we also explain why the simultaneous evolution of ANN connection weights and architectures is beneficial and what search operators should be used.

YAO: EVOLVING ARTIFICIAL NEURAL NETWORKS 1429

A. The Direct Encoding Scheme

In the direct encoding scheme, each connection of an architecture is directly specified by its binary representation [24], [127], [153], [165], [169]–[171], [202]. For example, an n x n matrix C = (c_ij) can represent an ANN architecture with n nodes, where c_ij indicates the presence or absence of the connection from node i to node j. We can use c_ij = 1 to indicate a connection and c_ij = 0 to indicate no connection. In fact, c_ij can also represent a real-valued connection weight from node i to node j, so that the architecture and the connection weights can be evolved simultaneously [37], [42], [45]. The binary string representing an architecture is the concatenation of the rows (or columns) of the matrix. Such an encoding scheme can handle both feedforward and recurrent ANN's; a feedforward ANN will have nonzero entries only in the upper-right triangle of the matrix. Constraints on the architectures being explored can easily be incorporated into such a representation scheme by imposing constraints on the matrix. Each matrix has a direct one-to-one mapping to the corresponding ANN architecture.
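As a concrete sketch of this matrix-to-chromosome mapping, the following code flattens a connectivity matrix into a binary chromosome and, for the feedforward case, stores only the upper-right triangle. The five-node network used here is hypothetical (it is not the network of Fig. 7); it merely illustrates the mechanics:

```python
def encode(conn):
    """Flatten a full connectivity matrix into a chromosome, row by row."""
    return [int(c) for row in conn for c in row]

def encode_feedforward(conn):
    """For a feedforward ANN only the upper-right triangle can be nonzero,
    so the chromosome need only store the entries above the diagonal."""
    n = len(conn)
    return [int(conn[i][j]) for i in range(n) for j in range(i + 1, n)]

def decode_feedforward(chrom, n):
    """Rebuild the full n x n matrix from the reduced chromosome."""
    conn = [[0] * n for _ in range(n)]
    it = iter(chrom)
    for i in range(n):
        for j in range(i + 1, n):
            conn[i][j] = next(it)
    return conn

# Hypothetical 5-node feedforward example: node 1 feeds nodes 3 and 4, etc.
conn = [[0, 0, 1, 1, 0],
        [0, 0, 1, 0, 1],
        [0, 0, 0, 0, 1],
        [0, 0, 0, 0, 1],
        [0, 0, 0, 0, 0]]
chrom = encode_feedforward(conn)
assert decode_feedforward(chrom, 5) == conn
```

The reduced chromosome has n(n-1)/2 entries (ten for n = 5) instead of the n^2 entries of the full matrix, which is exactly the saving the upper-triangle representation of Fig. 7(c) exploits.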

An EA can then be employed to evolve a population of such chromosomes. In order to evaluate the fitness of each chromosome, we need to convert the chromosome back into an ANN, initialize it with random weights, and train it. The training error is then used to measure the fitness. A complexity measurement, such as the number of nodes and connections, is also often used in the fitness function. In fact, many criteria based on information theory or statistics [226]–[228] can readily be introduced into the fitness function without much difficulty. Improvement in an ANN's generalization ability can be expected if such criteria are adopted.

Fig. 7(a) shows a feedforward ANN with two inputs and one output. Its connectivity matrix is given by Fig. 7(b). The first two columns are 0's because there is no connection from node 1 to itself and no connection to node 2. Node 1 is connected to nodes 3 and 4; hence columns 3 and 4 have 1's. Converting this connectivity matrix into a chromosome is straightforward: we can concatenate all the rows (or columns). Since the ANN is feedforward, we only need to represent the upper triangle of the connectivity matrix in order to reduce the chromosome length. The reduced chromosome is given by Fig. 7(c): 00 110 00 101 00 001 00 001 00 000. It is worth noting that the ANN in Fig. 7 has a shortcut connection from the input to the output. Such shortcuts pose no problems to the representation or the evolution.

Fig. 7. An example of the direct encoding of a feedforward ANN. (a), (b), and (c) show the architecture, its connectivity matrix, and its binary string representation, respectively. Because only feedforward architectures are under consideration, the binary string representation only needs to consider the upper-right triangle of the matrix.

Fig. 8 shows a recurrent ANN. The EA used to evolve recurrent ANN's can be basically the same as that used to evolve feedforward ones. The only difference is that no reduction in chromosome length is possible if we want to explore the whole connectivity space.

Fig. 8. An example of the direct encoding of a recurrent ANN. (a), (b), and (c) show the architecture, its connectivity matrix, and its binary string representation, respectively.

1430 PROCEEDINGS OF THE IEEE, VOL. 87, NO. 9, SEPTEMBER 1999

The direct encoding scheme as described above is quite straightforward to implement. It is very suitable for the precise and fine-tuned search of a compact ANN architecture, since a single connection can be added to or removed from the ANN easily. An EA is capable of exploring all possible connectivities. It may facilitate the rapid generation and optimization of tightly pruned interesting designs that no one has hit upon so far [150]. Schaffer et al. [153] presented an experiment which showed that an ANN designed by the evolutionary approach had better generalization ability than one trained by BP using a human-designed architecture.

Another flexibility provided by the evolution of architectures stems from the fitness definition. The training result pertaining to an architecture, such as the error and the training time, is often used in a fitness function. In fact, there is virtually no limitation, such as being differentiable or continuous, on how the fitness function may be defined at Step 3 in Fig. 6.

One potential problem of the direct encoding scheme is scalability. A large ANN would require a very large matrix and thus increase the computation time of the evolution.
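Because the fitness function need not be differentiable, complexity terms can simply be added to the training error. A minimal sketch follows; the additive penalty form and the weightings alpha and beta are illustrative assumptions, not values from the literature:

```python
def fitness(training_error, n_nodes, n_connections,
            alpha=0.01, beta=0.001):
    """Toy fitness for an evolved architecture (lower is better).
    alpha and beta trade accuracy against complexity; the values
    here are purely illustrative."""
    return training_error + alpha * n_nodes + beta * n_connections

# A smaller network with slightly worse training error can still win.
big = fitness(0.050, n_nodes=20, n_connections=120)
small = fitness(0.055, n_nodes=8, n_connections=30)
assert small < big
```

With such a penalty, selection pressure favors compact architectures even when their raw training error is marginally higher, which is one way the generalization-oriented criteria mentioned above can enter the evolution.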

One way to cut down the size of the matrix is to use domain knowledge to reduce the search space; for example, we may assume strictly layered feedforward ANN's whose neighboring layers are fully connected. Doing so, however, requires sufficient domain knowledge and expertise, which are difficult to obtain in practice. The EA can then only search a limited subset of the whole feasible architecture space, and we run the risk of missing some very good solutions when we restrict the search space manually. While it is necessary to investigate the EA used, it is equally important to study the genotypical representation scheme, since the performance surface described at the beginning of Section III is determined by the representation.

B. The Indirect Encoding Scheme

In order to reduce the length of the genotypical representation of architectures, the indirect encoding scheme has been used by many researchers [151], [152], [155], [156], [159], [160], [168], [184], [205], [217], [223], [229]–[232], where only some characteristics of an architecture are encoded in the chromosome. The details of each connection in the ANN are either predefined according to prior knowledge or specified by a set of deterministic developmental rules. Some researchers [230]–[232] have argued that the indirect encoding scheme is biologically more plausible than the direct one, because, according to the discoveries of neuroscience, it is impossible for the genetic information encoded in chromosomes to specify the whole nervous system independently. The indirect encoding scheme can produce a more compact genotypical representation of ANN architectures, but it may not be very good at finding a compact ANN with good generalization ability.

1) Parametric Representation: ANN architectures may be specified by a set of parameters, such as the number of hidden layers, the number of hidden nodes in each hidden layer, and the number of connections between two layers. These parameters can be encoded in various forms in a chromosome; only the connectivity pattern, instead of each individual connection, is specified. The parametric representation method is most suitable when we know what kind of architecture we are trying to find. For example, if complete connection is to be used between two neighboring layers in a feedforward ANN with a single hidden layer, we need encode only the number of hidden nodes in the hidden layer, i.e., a single-element matrix. The length of the chromosome can be reduced greatly in this case [153].

Harp et al. [152], [156] used a "blueprint" to represent an architecture, consisting of one or more segments, each representing an area (layer) and its efferent connectivity (projections). Each segment includes two parts of information: 1) that about the area itself, such as the number of nodes in the area and its spatial organization; and 2) that about its efferent connectivity, e.g., the connectivity pattern of the projections. The first and last areas are constrained to be the input and output areas, respectively. An interesting aspect of Harp et al.'s work is the combination of learning parameters with architectures in the genotypical representation: the learning parameters can evolve along with the architecture parameters, and the interaction between the two can be explored through evolution. Similar parametric representation methods with different sets of parameters have also been proposed by others [155], [159], [168], [205].

It should be noted that the permutation problem, as illustrated by Figs. 3 and 4 in Section II-A, still exists and causes unwanted side effects in the evolution of architectures. Because two functionally equivalent ANN's which order their hidden nodes differently have two different genotypical representations, the probability of producing a highly fit offspring by recombining them is often very low. Some researchers have thus avoided crossover and adopted only mutations in the evolution of architectures [45]. Hancock [113], however, suggested that the permutation problem might "not be as severe as had been supposed," although the conclusion depends on the GA used, e.g., the population size, the selection mechanism, and the genetic operators. He argued that crossover may be useful and important in increasing the efficiency of evolution for some problems [48], because "The increased number of ways of solving the problem outweigh the difficulties of bringing building blocks together." Thierens [101] proposed a genetic encoding scheme for ANN's which can avoid the permutation problem. More research is needed to further understand the impact of the permutation problem on the evolution of architectures.

2) Developmental Rule Representation: A quite different indirect encoding method is to encode developmental rules, which are used to construct architectures, in chromosomes [151], [184], [230]. A developmental rule is usually described by a recursive equation [230] or by a generation rule similar to a production rule in a production system, with a left-hand side (LHS) and a right-hand side (RHS) [151]. The connectivity pattern of the architecture, in the form of a matrix, is constructed from a basis, i.e., a single-element matrix, by repetitively applying suitable developmental rules to the nonterminal elements in the current matrix until the matrix contains only terminal elements, which indicate the presence or absence of a connection. The shift from the direct optimization of architectures to the optimization of developmental rules brings some benefits, such as a more compact genotypical representation. The destructive effect of crossover is also lessened, since the developmental rule representation is capable of preserving promising building blocks found so far [151].

Each developmental rule consists of an LHS, which is a nonterminal element, and an RHS, which is a 2 x 2 matrix of either terminal or nonterminal elements. In this paper, a terminal element is either 1 (existence of a connection) or 0 (nonexistence of a connection), while a nonterminal element is a symbol other than 1 and 0; these definitions are slightly different from those used by others [151]. A typical step in constructing a connection matrix is to find the rules whose LHS's appear in the current matrix and replace every appearance with the respective RHS, repeating until a connectivity pattern is fully specified, i.e., until all the entries in the matrix are terminal elements.

Fig. 9. Examples of some developmental rules used to construct a connectivity matrix. S is the initial element (or state).

For example, given the set of rules described by Fig. 9, where S is the starting symbol, one application of the rules replaces S with a 2 x 2 matrix of nonterminals; a second application expands this into a 4 x 4 matrix; and a third application yields an 8 x 8 matrix consisting of ones and zeros only (i.e., terminals only), after which no further application of developmental rules is possible. This final matrix can be used as an ANN's connection matrix. The 16 rules with "a" to "p" on the LHS and 2 x 2 matrices of only ones and zeros on the RHS are predefined and do not participate in evolution, in order to guarantee that different connectivity patterns can be reached.

Fig. 10. Development of an EANN architecture using the rules given in Fig. 9: (a) the initial state; (b) step 1; (c) step 2; (d) step 3, when all the entries in the matrix are terminal elements; and (e) the architecture. The nodes in the architecture are numbered from one to eight. Isolated nodes are not shown; note that some nodes, such as 2 and 6, do not appear in the ANN because they are not connected to any other nodes.

Fig. 10 summarizes the three rule-rewriting steps and the final ANN generated according to Fig. 9. The question now is how to get such a set of rules to construct an ANN. One answer is to evolve them. We can encode the whole rule set as one individual [151] (the so-called Pitt approach) or encode each rule as one individual [184] (the so-called Michigan approach). Each rule may be represented by four allele positions corresponding to the four elements in the RHS of the rule; the LHS can be represented implicitly by the rule's position in the chromosome. Each position in a chromosome can take one of many different values, depending on how many nonterminal elements (symbols) are used in the rule set. Since there are 26 different evolvable rules, representing all of them would need 26 x 4 = 104 alleles, four per rule. For example, the rule set in Fig. 9 can be represented by the following chromosome:

ABCDaaaaiiiaiaacaeae

where the first four elements indicate the RHS of rule "S," the second four indicate the RHS of rule "A," etc.
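The rewriting process can be sketched as follows. The uppercase rules below are read off the chromosome ABCDaaaaiiiaiaacaeae; the 2 x 2 matrices assigned to the lowercase (terminal-producing) rules are an assumption, following one common convention in which the letters "a" to "p" enumerate the 16 binary 2 x 2 matrices. The mechanics, replacing every symbol by its RHS until only 0's and 1's remain, are what matters:

```python
# Rule table in the style of Fig. 9. Uppercase rules come from the
# chromosome ABCD aaaa iiia iaac aeae; lowercase RHS's are assumed
# (letters enumerating the 16 binary 2 x 2 matrices; only those
# actually used are listed).
RULES = {
    "S": [["A", "B"], ["C", "D"]],
    "A": [["a", "a"], ["a", "a"]],
    "B": [["i", "i"], ["i", "a"]],
    "C": [["i", "a"], ["a", "c"]],
    "D": [["a", "e"], ["a", "e"]],
    "a": [["0", "0"], ["0", "0"]],
    "c": [["0", "0"], ["1", "0"]],
    "e": [["0", "1"], ["0", "0"]],
    "i": [["1", "0"], ["0", "0"]],
}

def rewrite(matrix):
    """One rewriting step: replace every symbol by its 2 x 2 RHS,
    doubling the matrix dimensions."""
    n = len(matrix)
    out = [[None] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            rhs = RULES[matrix[i][j]]
            for di in range(2):
                for dj in range(2):
                    out[2 * i + di][2 * j + dj] = rhs[di][dj]
    return out

def develop(steps=3):
    """Grow the connectivity matrix from the single-element basis S."""
    m = [["S"]]
    for _ in range(steps):
        m = rewrite(m)
    return m  # after three steps: an 8 x 8 matrix of '0'/'1'

conn = develop()
assert len(conn) == 8 and all(c in "01" for row in conn for c in row)
```

Three steps take the 1 x 1 basis to 2 x 2, 4 x 4, and finally 8 x 8, matching the eight-node architecture of Fig. 10; evolving the uppercase RHS entries is then just evolving a short string of symbols.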

Some good results from the developmental rule representation method have been reported [151] using encoder/decoder problems of various sizes. However, the method also has some limitations. It often needs to predefine the number of rewriting steps, and it does not allow recursive rules. Another limitation is that there are usually several hidden nodes in the same species which have very similar functionality, i.e., which are basically the same feature detector; such redundancy needs to be removed by an additional clean-up algorithm. A compact genotypical representation does not imply a compact phenotypical representation, i.e., a compact ANN architecture. Like the direct encoding scheme, the developmental rule representation method normally separates the evolution of architectures from that of connection weights.

Recent work by Siddiqi and Lucas [233] shows that the direct encoding scheme can be at least as good as the developmental rule method. They reimplemented Kitano's system and discovered that the performance difference between the direct and indirect encoding schemes was not caused by the encoding scheme itself, but by how sparsely connected the initial ANN architectures in the initial population were [151]. The direct encoding scheme achieved the same performance as the developmental rule representation when the initial conditions were the same [233].

3) Fractal Representation: Merrill and Port [231] proposed another method for encoding architectures, based on the use of fractal subsets of the plane. They argued that the fractal representation of architectures was biologically more plausible than the developmental rule method. They used three real-valued parameters, an edge code, an input coefficient, and an output coefficient, to specify each node in an architecture. In a sense, this encoding method is closer to the direct encoding scheme than to the indirect one.

4) Other Representations: A very different approach to the evolution of architectures has been proposed by Andersen and Tsoi [236]. An architecture is built layer by layer: hidden layers are added one by one if the current architecture cannot reduce the training error below a certain threshold. Each hidden layer is constructed automatically through an evolutionary process which employs the GA with fitness sharing. Fitness sharing encourages the formation of different feature detectors (hidden nodes) in the population. The number of hidden nodes in each hidden layer can vary. Their approach is unique in that each individual in a population represents a hidden node rather than the whole architecture. One limitation of this approach [236] is that it can only deal with strictly layered feedforward ANN's. Smith and Cribbs [181], [237] also used an individual to represent a hidden node rather than the whole ANN; their approach can only deal with strictly three-layered feedforward ANN's.

Mjolsness et al. [230] described a similar rule-encoding method in which rules are represented by recursive equations which specify the growth of connection matrices. Coefficients of these recursive equations, i.e., the transfer function decomposition matrices, are encoded in genotypes and optimized by simulated annealing instead of EA's. Connection weights are optimized along with the connectivity by simulated annealing, since each entry of a connection matrix can take a real value. One advantage of using simulated annealing instead of GA's in the evolution is the avoidance of the destructive effect of crossover. Wilson [234] also used simulated annealing in ANN architecture design; fast simulated annealing [235] was used in the evolution.

C. The Evolution of Node Transfer Functions

The discussion on the evolution of architectures so far deals only with the topological structure of an architecture. The transfer function of each node in the architecture has been assumed to be fixed and predefined by human experts; it is often assumed to be the same for all the nodes in an ANN, or at least for all the nodes in the same layer. Yet the transfer function has been shown to be an important part of an ANN architecture and to have a significant impact on an ANN's performance [238]–[240].

Stork et al. [241] were, to our best knowledge, the first to apply EA's to the evolution of both topological structures and node transfer functions, even though only simple ANN's with seven nodes were considered. The transfer function was specified in the structural genes in their genotypic representation. It was much more complex than the usual sigmoid function because they tried to model a biological neuron in the tailflip circuitry of crayfish.

White and Ligomenides [171] adopted a simpler approach to the evolution of both topological structures and node transfer functions. For each individual (i.e., ANN) in the initial population, 80% of the nodes used the sigmoid transfer function and 20% used the Gaussian transfer function. The evolution was used to decide the optimal mixture between these two transfer functions automatically. The sigmoid and Gaussian transfer functions themselves were not evolvable; no parameters of the two functions were evolved.

Liu and Yao [191] used EP to evolve ANN's with both sigmoidal and Gaussian nodes. Rather than fixing the total number of nodes and evolving the mixture of different node types, their algorithm allowed growth and shrinking of the whole ANN by adding or deleting a node (either sigmoidal or Gaussian). The type of node added or deleted was determined at random. Good performance was reported for some benchmark problems [191].

Hwang et al. [225] went one step further. They evolved the ANN topology and node transfer functions, as well as connection weights, for projection neural networks. Section III-D will discuss such simultaneous evolution in more detail. Sebald and Chellapilla [242] used the evolution of node transfer functions as an example to show the importance of evolving representations.
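The bookkeeping in such a mixed-transfer-function scheme can be sketched as follows. This is a hedged toy version: it tracks node types only, omits weights, connectivity, and training, and borrows the 80%/20% initial split from White and Ligomenides and the random add/delete mutations from Liu and Yao:

```python
import math
import random

# Two candidate transfer functions; each hidden node is tagged with one.
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gaussian(x):
    return math.exp(-x * x)

def add_node(net):
    """Growth mutation: append a node whose type is chosen at random."""
    net.append(random.choice(["sigmoid", "gaussian"]))

def delete_node(net):
    """Shrink mutation: remove a randomly chosen node, if any remain."""
    if net:
        net.pop(random.randrange(len(net)))

assert sigmoid(0.0) == 0.5 and gaussian(0.0) == 1.0

random.seed(0)
net = ["sigmoid"] * 8 + ["gaussian"] * 2   # 80%/20% initial mixture
for _ in range(20):                        # random growth/shrink mutations
    random.choice([add_node, delete_node])(net)
assert all(t in ("sigmoid", "gaussian") for t in net)
```

In a real system each mutation would be followed by partial training and a fitness comparison; here the point is only that the node-type mixture itself becomes part of the evolvable genotype.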

D. Simultaneous Evolution of Architectures and Connection Weights

The evolutionary approaches discussed so far evolve architectures only, without any connection weights. Connection weights have to be learned through training after a near-optimal architecture is found. One major problem with evolving architectures without connection weights is noisy fitness evaluation [194]. The fitness evaluation described in step 3 of Fig. 6 is very inaccurate and noisy, because a phenotype's fitness (i.e., that of an ANN with a full set of weights) is used to approximate its genotype's fitness (i.e., that of an ANN without any weight information). There are two major sources of noise [194].

1) The first source is the random initialization of the weights. Different random initial weights may produce different training results. Hence, the same genotype may have quite different fitness values due to the different random initial weights used in training.

2) The second source is the training algorithm. Different training algorithms may produce different training results even from the same set of initial weights. This is especially true for multimodal error functions. For example, BP may reduce an ANN's error to 0.05 through training, but an EA could reduce the error to 0.001 due to its global search capability.

Such noise may mislead evolution, because the fact that the fitness of a phenotype generated from one genotype is higher than that generated from another does not mean that the first genotype truly has higher quality than the second. In order to reduce such noise, an architecture usually has to be trained many times from different random initial weights, and the average result used to estimate the genotype's mean fitness. This method increases the computation time for fitness evaluation dramatically. It is one of the major reasons why only small ANN's were evolved in previous studies.

In essence, the noise identified here is caused by the one-to-many mapping from genotypes to phenotypes. Angeline et al. [149] and Fogel [12], [243] have provided a more general discussion of the mapping between genotypes and phenotypes. It is clear that the evolution of architectures without any weight information has difficulties in evaluating fitness accurately; this is especially true if one uses the indirect encoding scheme, such as the developmental rule method. One way to alleviate this problem is to evolve ANN architectures and connection weights simultaneously [37], [42], [149], [165], [166], [169]–[172], [179], [180], [182], [185]–[200], [230], [232], [244]–[253]. In this case, each individual in a population is a fully specified ANN with complete weight information. Since there is a one-to-one mapping between a genotype and its phenotype, fitness evaluation is accurate.

One issue in evolving ANN's is the choice of search operators used in EA's. Both crossover-based and mutation-based EA's have been used. The use of crossover appears to contradict the basic ideas behind ANN's, because crossover works best when there exist "building blocks," but it is unclear what a building block might be in an ANN, since ANN's emphasize a distributed (knowledge) representation. The knowledge in an ANN is distributed among all the weights, so recombining one part of an ANN with another part of another ANN is likely to destroy both. However, if ANN's do not use a distributed representation but rather a localized one, e.g., radial basis function (RBF) networks or nearest-neighbor multilayer perceptrons, crossover might be a very useful operator. There has been some work in this area where good results were reported [119], [120], [254]. In general, ANN's using a distributed representation are more compact and have better generalization capability for most practical problems. Representation and search are the two key issues in problem solving, and co-evolving solutions and their representations may be an effective way to tackle difficult problems where little human expertise is available.

Yao and Liu [193], [194] developed an automatic system, EPNet, based on EP for the simultaneous evolution of ANN architectures and connection weights. Fig. 11 shows the main structure of EPNet.

Fig. 11. The main structure of EPNet.

EPNet uses rank-based selection [125] and five mutations: hybrid training, node deletion, connection deletion, connection addition, and node addition [188]. Behavioral (i.e., functional) evolution, rather than genetic evolution, is emphasized in EPNet, and several techniques were adopted to maintain the behavioral link between a parent and its offspring [190]. EPNet does not use any crossover operators, for the reason given above. It relies on a number of mutation operators to modify architectures and weights. The five mutations are attempted sequentially. If one mutation leads to a better offspring, it is regarded as successful and no further mutation is applied; otherwise the next mutation is attempted. The motivation behind ordering the mutations is to encourage the evolution of compact ANN's without sacrificing generalization.

Hybrid training is the only mutation in EPNet which modifies an ANN's weights. It is based on a modified BP (MBP) algorithm with an adaptive learning rate, and on simulated annealing. The number of epochs used by MBP to train each ANN in a population is set by two user-specified parameters. There is no guarantee that an ANN will converge to even a local optimum after those epochs; hence this training process is called partial training. The other four mutations are used to grow and prune hidden nodes and connections. A validation set is used in EPNet to measure the fitness of an individual. EPNet has been tested extensively on a number of benchmark problems and achieved excellent results, including the parity problems of size from four to eight and the two-spiral problem.
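The sequential mutation scheme can be sketched as follows. This is a hedged illustration only: the five operators are stubs acting on a toy network record, and the acceptance test is a placeholder, whereas EPNet itself trains with MBP and judges offspring on a validation set:

```python
import copy

# The five EPNet mutations, tried in this fixed order; each is a stub
# here, standing in for partial training, pruning, or growing.
def hybrid_training(ann):     ann["error"] *= 0.9   # partial training
def node_deletion(ann):       ann["nodes"] -= 1
def connection_deletion(ann): ann["conns"] -= 1
def connection_addition(ann): ann["conns"] += 1
def node_addition(ann):       ann["nodes"] += 1

MUTATIONS = [hybrid_training, node_deletion, connection_deletion,
             connection_addition, node_addition]

def better(child, parent):
    """Placeholder acceptance test; EPNet uses validation-set fitness."""
    return child["error"] < parent["error"] or child["nodes"] < parent["nodes"]

def mutate(parent):
    """Attempt the mutations sequentially; stop at the first success."""
    for op in MUTATIONS:
        child = copy.deepcopy(parent)
        op(child)
        if better(child, parent):
            return child   # successful mutation: no further ones applied
    return child           # fall through: keep the last attempt

parent = {"error": 0.2, "nodes": 10, "conns": 40}
child = mutate(parent)
assert child["error"] < parent["error"]
```

Because the deletion operators come before the addition operators, a network only grows when neither training nor pruning produces a better offspring, which is how the ordering biases the search toward compact networks.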

It has also been applied to the Australian credit card problem, the diabetes problem, the heart disease problem, the thyroid problem, the breast cancer problem, and the Mackey–Glass time series prediction problem. Very compact ANN's with good generalization ability have been evolved [185]–[195].

IV. THE EVOLUTION OF LEARNING RULES

An ANN training algorithm may have different performance when applied to different architectures. The design of training algorithms, and more fundamentally of the learning rules used to adjust connection weights, depends on the type of architecture under investigation. Different variants of the Hebbian learning rule have been proposed to deal with different architectures. However, designing an optimal learning rule becomes very difficult when there is little prior knowledge about the ANN's architecture, which is often the case in practice. Designing a learning rule manually often implies that some assumptions, which are not necessarily true in practice, have to be made. It is therefore desirable to develop an automatic and systematic way to adapt the learning rule to an architecture and to the task to be performed; in other words, an ANN should learn its learning rule dynamically rather than have it designed and fixed manually. Since evolution is one of the most fundamental forms of adaptation, it is not surprising that the evolution of learning rules has been introduced into ANN's in order to learn their learning rules [32], [128], [129], [139], [257]–[271].

The relationship between evolution and learning is extremely complex. Various models have been proposed, but most of them deal with the issue of how learning can guide evolution [257]–[260] and with the relationship between the evolution of architectures and that of connection weights [261]–[263]. Research into the evolution of learning rules is still in its early stages [264]–[267]. This research is important not only in providing an automatic way of optimizing learning rules and in modeling the relationship between learning and evolution, but also in modeling the creative process, since newly evolved learning rules can deal with a complex and dynamic environment. It will help us to understand better how creativity can emerge in artificial systems, like ANN's, and how to model the creative process in biological systems.

For example, the widely accepted Hebbian learning rule has recently been shown to be outperformed in many cases [256] by a new rule proposed by Artola et al. [255]. The new rule can learn more patterns than the optimal Hebbian rule and can learn exceptions as well as regularities. It is, however, still difficult to say that this rule is optimal for all ANN's. What is needed from an ANN is the ability to adjust its learning rule adaptively according to its architecture and the task to be performed.

A typical cycle of the evolution of learning rules is shown in Fig. 12. The iteration stops when the population converges or a predefined maximum number of iterations has been reached.
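The cycle of Fig. 12 can be sketched as a plain evolutionary loop over rule coefficients. This is a hedged toy version: each individual is a small coefficient vector, and the fitness function here is a stand-in for what the cycle actually requires, namely training ANN's with the decoded rule on a set of tasks and averaging the results:

```python
import random

def evolve_rules(pop, fitness, generations=50, sigma=0.1):
    """Minimal cycle: evaluate, select, mutate, repeat (lower fitness
    is better). Truncation selection keeps the best half; children are
    Gaussian perturbations of the survivors."""
    for _ in range(generations):
        scored = sorted(pop, key=fitness)
        parents = scored[:len(pop) // 2]
        children = [[c + random.gauss(0.0, sigma) for c in p]
                    for p in parents]
        pop = parents + children
    return min(pop, key=fitness)

# Toy fitness standing in for "average training error over tasks":
# pretend the best rule has all coefficients equal to 0.5.
target = [0.5] * 4
toy_fitness = lambda rule: sum((c - t) ** 2 for c, t in zip(rule, target))

random.seed(0)
pop = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(20)]
best = evolve_rules(pop, toy_fitness)
assert toy_fitness(best) <= min(toy_fitness(p) for p in pop)
```

Replacing `toy_fitness` with actual ANN training over several tasks and architectures is exactly what makes this cycle expensive, and is why the noise-reduction measures discussed below matter.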

A. The Evolution of Algorithmic Parameters

The adaptive adjustment of BP parameters (such as the learning rate and momentum) through evolution can be considered as the first attempt at evolving learning rules [32], [139], [152], [213]. Harp et al. [152] encoded BP's parameters in chromosomes, together with the ANN's architecture. The simultaneous evolution of both algorithmic parameters and architectures facilitates exploration of the interactions between the learning algorithm and architectures, such that a near-optimal combination of BP with an architecture can be found. Other researchers [32], [272] also used an evolutionary process to find parameters for BP, but the ANN architecture was predefined. The parameters evolved in this case tend to be optimized toward that architecture rather than being generally applicable to learning.

The evolution of algorithmic parameters is certainly interesting, but it hardly touches the fundamental part of a training algorithm, i.e., its learning rule or weight-updating rule. There are a number of BP algorithms with an adaptive learning rate and momentum in which a nonevolutionary approach is used, such as that offered by Jacobs [273], [274] and some of its variants. Further comparison between the evolutionary and nonevolutionary approaches would be quite useful.

B. The Evolution of Learning Rules

Unlike the evolution of connection weights and architectures, which deals only with static objects in an ANN, the evolution of learning rules has to work on the dynamic behavior of an ANN. Adapting a learning rule through evolution is expected to enhance an ANN's adaptivity greatly in a dynamic environment. The key issue here is how to encode the dynamic behavior of a learning rule into static chromosomes. Trying to develop a universal representation scheme which can specify any kind of dynamic behavior is clearly impractical, let alone the prohibitively long computation time required to search such a learning-rule space. Constraints have to be set on the type of dynamic behavior, i.e., on the basic form of the learning rules being evolved, in order to reduce the representation complexity and the search space.

Two basic assumptions which have often been made about learning rules are: 1) weight updating depends only on local information, such as the activation of the input node, the activation of the output node, the current connection weight, etc., and 2) the learning rule is the same for all connections in an ANN. A learning rule is then assumed to be a linear function of these local variables and their products. That is, a learning rule can be described by the function [5]

\Delta w(t) = \sum_{k=1}^{n} \sum_{i_1, i_2, \ldots, i_k = 1}^{n} \theta_{i_1 i_2 \cdots i_k} \prod_{j=1}^{k} x_{i_j}(t-1)    (4)

where t is time, \Delta w is the weight change, the x_i's are local variables, and the \theta's are real-valued coefficients which will be determined by evolution. Different \theta's determine different learning rules, so the evolution of learning rules in this case is equivalent to the evolution of the real-valued vectors of \theta's. Due to the large number of possible terms in (4), only a few terms are used in practice, selected according to biological or heuristic knowledge.3

There are three major issues involved in the evolution of learning rules: 1) determination of a subset of the terms described in (4); 2) representation of their real-valued coefficients as chromosomes; and 3) the EA used to evolve these chromosomes. A typical cycle of the evolution of learning rules decodes each individual into a learning rule, constructs ANN's and trains them with the decoded rule, and uses the training results to determine the individual's fitness. Because a genotype's fitness (i.e., a learning rule's fitness) is approximated by a phenotype's fitness (i.e., an ANN's training result), this evaluation is noisy. If a near-optimal learning rule for different ANN architectures is to be evolved, the fitness evaluation should be based on the average training result from different ANN architectures in order to avoid overfitting a particular architecture. In practice, the ANN architecture is often predefined and fixed, and the evolved learning rule is then optimized toward this architecture. If a learning rule which is applicable to a wide range of ANN architectures and learning tasks is wanted, many different architectures and learning tasks have to be used in the fitness evaluation.

3 The order is defined as the number of variables in a product.

1436 PROCEEDINGS OF THE IEEE, VOL. 87, NO. 9, SEPTEMBER 1999
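As a concrete illustration (not an example from the paper), a truncated form of (4) over three local variables and their pairwise products can be written in a few lines. The variable names and the value of the learning rate eta are illustrative assumptions; setting the coefficient of the a_in*t term to eta and that of the a_in*a_out term to -eta recovers the well-known delta rule, Delta_w = eta*(t - a_out)*a_in.

```python
from itertools import combinations

def rule_delta_w(theta, local_vars):
    """Weight change from a truncated form of Eq. (4): a linear
    combination of the local variables and their pairwise products."""
    terms = list(local_vars) + [a * b for a, b in combinations(local_vars, 2)]
    return sum(c * term for c, term in zip(theta, terms))

# Local variables (illustrative): input activation a_in, output activation
# a_out, and the teaching signal t.  Term order is therefore:
#   [a_in, a_out, t, a_in*a_out, a_in*t, a_out*t]
eta = 0.5
theta_delta = [0.0, 0.0, 0.0, -eta, eta, 0.0]   # this theta encodes the delta rule
print(rule_delta_w(theta_delta, (1.0, 0.2, 1.0)))  # eta*(t - a_out)*a_in = 0.4
```

Under this encoding, evolving a learning rule simply means evolving the six-element theta vector.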

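The noisy fitness evaluation of a decoded learning rule can be sketched as follows. Everything here is an illustrative assumption rather than the setup of any cited study: the toy task is a single threshold unit learning logical AND, and a rule's fitness is its negated training error averaged over several random initial weight settings, which damps the noise contributed by any single initialization.

```python
import random

# Toy task (illustrative): one threshold unit learning logical AND.
DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

def output(w, b, x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

def train_error(rule, seed, epochs=20):
    """Train one randomly initialized unit with the decoded rule;
    the result is a single noisy sample of the rule's quality."""
    rng = random.Random(seed)
    w = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    b = rng.uniform(-1, 1)
    for _ in range(epochs):
        for x, t in DATA:
            o = output(w, b, x)
            w[0] += rule(x[0], o, t)
            w[1] += rule(x[1], o, t)
            b += rule(1.0, o, t)      # bias treated as a weight on a constant input
    return sum(output(w, b, x) != t for x, t in DATA)

def fitness(rule, trials=10):
    """Average over several initial weight settings to reduce the noise."""
    return -sum(train_error(rule, seed) for seed in range(trials)) / trials

# One decoded individual: the delta rule with eta = 0.5.
delta_rule = lambda a_in, a_out, t: 0.5 * (t - a_out) * a_in
print(fitness(delta_rule))
```

In an actual evolution of learning rules, this `fitness` would score each chromosome in the population every generation, which is why the evaluation cost and its noise dominate the design.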
i. from Chalmers’ in the sense that gradient descent algo. each individual in the were not set on the connection weights. OTHER COMBINATIONS BETWEEN ANN’S AND EA’S of inputs and outputs were fixed by the learning task at hand. The ANN architecture is often evolution [269]. indicates presence of a feature. better performance rules) in a single level of evolution. ANN’s are often used to model and approximate of the decoded learning rule and thus in reducing the noise a real control system due to their good generalization from the second source. Preprocessing is often needed to or fourth-order terms. three second-order. but the performance of the ANN using this subset is no Baxter [269] took one step further than just the evolution worse than that of the ANN using the whole input set. architectures.e. it can also be used to discover new training local variables and one zeroth-order. optimal subset which has the fewest number of features order. rather than EA’s. The second is for each combination of control parameters. (including connection weights. the possible inputs to an ants. architecture used in the fitness evaluation was fixed because only single-layer ANN’s were considered and the number V. and learning [277]–[287]. These experiments. The output will be the control system output learning tasks is needed. The first is the decoding control problems as it is impractical to run a real system process (morphogenesis) of chromosomes. [274] and some of its vari. the environmental diversity has to from which an evaluation of the whole system can easily be YAO: EVOLVING ARTIFICIAL NEURAL NETWORKS 1437 .’s work on “econets. using it to train ANN’s. However. and three third-order terms in (4). three first-order. Not only does the evolution of input features provide a rithms and simulated annealing. ANN as Fitness Estimator networks. The tasks have to be used in the fitness evaluation. and population represents a subset of all possible inputs. 
did not evolve learning rules explicitly [260]. learning paradigm where a training algorithm based on EA’s can self-select training examples. however. including the principal component were also carried out by others [265].’s approach [265]. which included one first.. although simple and preliminary. Very good results. This learning rules. fixed. Cho and Cha [289] Research related to the evolution of learning rules in- proposed another algorithm for evolving training sets by cludes Parisi et al. four automatically. generalization ability. Zhang and Veenker [288] described an active three second-order terms in (4) were used. space of possible ANN’s would be enormous if constraints In the evolution of input features.. it is very time Section III-D and at the beginning of Section IV. “1” The learning rule only considered two Boolean variables. Each bit in the chromosome corresponds to a feature. They also features to an ANN can be formulated as a search problem. while “0” indicates absence Although Baxter’s experiments were rather simple. we want to find a near- adopted in their learning rules. constraints set on learning rules could prevent and longer training times in order to achieve a reasonable some from being evolved such as those which include third. In their experiments. The input to such ANN’s will be a set of control is applicable to a wide range of ANN architectures and parameters. reduce the number of inputs to an ANN. The number of nodes in ANN’s was fixed. For many practical problems. The environmental diversity is fitness values are often approximated rather than computed essential in obtaining a good approximation to the fitness exactly. starting from a population of A. The evaluation of an individual is carried out confirmed that complex behaviors could be learned and by training an ANN with these inputs and using the result the ANN’s learning ability could be improved through to calculate its fitness value. 
the evolution discovered the well-known delta rule [7]. be 1 or 1. [269]. Ten coefficients and a scale parameter were be very high. There consuming and costly to obtain fitness values for some are two possible sources of noise. The Evolution of Input Features randomly generated learning rules. have been used for this purpose. They emphasized the crucial role of the environment in which the evolution occured while only using some simple neural B. It is clear that the search with fewer inputs. After 1000 generations. EA’s of learning rules. were way to discover important features from all possible inputs used to find near-optimal ’s. Such evaluation is very noisy. If a general learning rule which abilities. [266]. ANN can be quite large. In order to introduced when a decoded learning rule is evaluated by get around this problem and make evolution more efficient. i. so the weights could only length is the same as the total number of input features.e. The issue of environmental diversity is closely EA’s have been used with success to optimize various related to the noisy fitness evaluation as pointed out in control parameters [290]–[292]. only ANN’s with binary can be implemented using a binary chromosome whose threshold nodes were considered.” although they adding virtual samples. and examples. Various dimension Similar experiments on the evolution of learning rules reduction techniques. they of the feature. have been reported from these studies. There may be some redundancy demonstrated the potential of evolution in discovering novel among different inputs. A large number of inputs to an learning rules from a set of randomly generated rules. In his experiments. because of Bengio et al. ANN increase its size and thus require more training data However. considered four local variables but only seven terms were Given a large set of potential inputs. 
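The binary-chromosome formulation can be sketched as below. To keep the sketch self-contained, a stand-in fitness function replaces the expensive "train an ANN on the selected inputs" step: it rewards covering a hypothetical set of relevant features and penalizes every selected input. The GA operators (truncation selection, one-point crossover, bit-flip mutation) and all parameter values are illustrative assumptions.

```python
import random

N_FEATURES = 8
RELEVANT = {0, 2, 5}   # hypothetical ground truth, used only by the stand-in

def fitness(mask):
    """Stand-in for 'train an ANN on the selected inputs and score it':
    reward covering relevant features, penalize every selected input."""
    selected = {i for i, bit in enumerate(mask) if bit}
    return 2.0 * len(selected & RELEVANT) - 0.5 * len(selected)

def evolve(pop_size=20, generations=40, p_mut=0.1, seed=1):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]           # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, N_FEATURES)   # one-point crossover
            child = [bit ^ (rng.random() < p_mut)  # bit-flip mutation
                     for bit in a[:cut] + b[cut:]]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
print(best, fitness(best))
```

With a real ANN in the loop, `fitness` would train a network restricted to the selected inputs and return, e.g., its negated validation error minus a size penalty; the chromosome handling is unchanged.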
B. ANN as Fitness Estimator

EA's have been used with success to optimize various control parameters [290]-[292]. For some control problems, however, it is very time consuming and costly to obtain fitness values, because it is impractical to run a real system for each combination of control parameters. In order to get around this problem and make evolution more efficient, ANN's, with their good generalization ability, are often used to model and approximate a real control system. The input to such an ANN is a set of control parameters; the output is the control system's output, from which an evaluation of the whole system can easily be obtained.
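This arrangement can be sketched in a few lines. Everything below is an illustrative assumption: a one-dimensional quadratic plant stands in for the real control system, piecewise-linear interpolation over a handful of sampled plant responses stands in for the trained ANN, and a small elitist mutation-only EA searches against the surrogate instead of the plant.

```python
import random

def real_system(p):
    """Stand-in for an expensive (or dangerous) plant evaluation."""
    return -(p - 0.3) ** 2

# 1) Run the real system a limited number of times to collect training data.
samples = sorted((i / 20.0, real_system(i / 20.0)) for i in range(21))

# 2) Fit a cheap model of the plant.  The text's choice is a trained ANN;
#    piecewise-linear interpolation stands in here to keep the sketch short.
def surrogate(p):
    for (p0, y0), (p1, y1) in zip(samples, samples[1:]):
        if p0 <= p <= p1:
            return y0 + (y1 - y0) * (p - p0) / (p1 - p0)
    return min(y for _, y in samples)   # outside the sampled range

# 3) Evolve the control parameter against the surrogate, not the plant.
rng = random.Random(0)
pop = [rng.random() for _ in range(10)]
for _ in range(50):
    pop.sort(key=surrogate, reverse=True)
    survivors = pop[:5]                  # elitist truncation selection
    pop = survivors + [min(1.0, max(0.0, p + rng.gauss(0, 0.05)))
                       for p in survivors]
best = max(pop, key=surrogate)
print(best)
```

The EA never touches `real_system` after the initial sampling, which is exactly what makes the scheme fast and safe; its success hinges on how faithfully the surrogate generalizes.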

When an EA is used to search for a near-optimal set of control parameters, the trained ANN, rather than the real control system, is used in the fitness evaluation [293]-[297]. This combination of ANN's and EA's has a couple of advantages in evolving control systems. First, some poor control parameters may be generated in the evolutionary process, and such parameters could damage a real control system. Since fitness is estimated by the ANN, we do not need to run the real system and thus can avoid damage to it; the combination provides safer evolution of control systems. Second, the time-consuming fitness evaluation based on a real control system is replaced by fast fitness evaluation based on an ANN. In EA's, fitness values are often approximated rather than computed exactly; if we use ANN's to estimate fitness, the approximation may be inaccurate, so how successful this combination will be depends largely on how well the ANN's learn and generalize.

There are some other novel combinations between ANN's and EA's. Zitar and Hassoun [306] used EA's to extract rules from a reinforcement learning system and then used them to train ANN's. Imada and Araki [307] used EA's to evolve connection weights for Hopfield ANN's. Olmez [97] used EA's to optimize a modified restricted Coulomb energy (RCE) ANN. Sziranyi [99] and Pal and Bhandari [98] used EA's to tune circuit parameters and templates in cellular ANN's. Many others used EA's and ANN's together for combinatorial or global (numerical) optimization, in order to combine the EA's global search capability with the ANN's fast convergence to local optima [308]-[318]. As noted in Section I, such combinations between ANN's and EA's for optimization are mentioned but not discussed in detail here.

C. Evolving ANN Ensembles

In the machine learning field, learning is often formulated as an optimization problem, e.g., minimizing an error function on a training data set. In practice, however, learning is different from optimization, because we want the learned system to have the best generalization, and measuring generalization quantitatively and accurately is almost impossible [298], although there are many theories and criteria on generalization, such as the minimum description length (MDL) [299], Akaike information criteria (AIC) [300], and minimum message length (MML) [301]. These criteria are often used to define better error functions in the hope that minimizing the functions will maximize generalization. While such error functions often lead to better generalization of learned systems, there is no guarantee: the ANN with the minimum error on a training data set may not have the best generalization, unless there is an equivalence between generalization and the error on the training data. Evolutionary learning faces the same problem, because EA's maximize a fitness function or minimize an error function defined on the training data, and maximizing a fitness function is different from maximizing generalization.

Since the maximum fitness may not be equivalent to the best generalization, the best individual with the maximum fitness in a population may not be the one we want. Other individuals in the population may contain useful information that will help to improve the generalization of the learned system. It is thus beneficial to make use of the whole population rather than any single individual; a population always contains at least as much information as any single individual. While little can be done for traditional nonpopulation-based learning, there are opportunities for improving population-based learning, i.e., evolutionary learning, by combining different individuals in the population to form an integrated system, which is expected to produce better results. Such a population of ANN's is called an ANN ensemble in this section. There have been some very successful experiments which show that EA's can be used to evolve ANN ensembles [192], [193], [302]-[305].
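The core idea, using the whole population rather than the single fittest network, can be sketched as simple output averaging. The three "networks" below are arbitrary linear models standing in for a final evolved population (an illustrative assumption); fitness-weighted voting or other combination schemes follow the same pattern.

```python
# Three trained 'networks' (arbitrary linear models standing in for ANN's),
# e.g. the final population of an evolutionary run.
population = [
    lambda x, a=a, b=b: a * x + b
    for a, b in [(1.9, 0.2), (2.1, -0.1), (2.0, 0.05)]
]

def ensemble(x):
    """Combine the whole population by averaging member outputs."""
    outputs = [net(x) for net in population]
    return sum(outputs) / len(outputs)

print(ensemble(1.0))  # arithmetic mean of the three member outputs
```

At x = 1.0 the members output 2.1, 2.0, and 2.05, so the ensemble returns about 2.05; individual errors relative to any underlying target tend to partially cancel, which is the point of using the population as an integrated system.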

VI. CONCLUDING REMARKS

This section first describes a general framework for EANN's and then draws some conclusions.

A. A General Framework for EANN's

A general framework for EANN's can be described by Fig. 13 [3]-[5]. Although evolution has been introduced into ANN's at various levels, they can roughly be divided into three: the evolution of connection weights, of architectures, and of learning rules. The evolution of connection weights proceeds at the lowest level, on the fastest time scale, in an environment determined by an architecture, a learning rule, and the learning task; the lower the level of evolution, the faster the time scale it is on.

Fig. 13. A general framework for EANN's.

There are two alternatives in deciding the levels of the evolution of architectures and that of learning rules: either the evolution of architectures is at the highest level and that of learning rules at a lower level, or vice versa. The decision depends on what kind of prior knowledge is available. If there is more prior knowledge about ANN architectures than about their learning rules, or if a particular class of architectures is pursued, it is better to put the evolution of architectures at the highest level, because such knowledge can be encoded in an architecture's genotypic representation to reduce the (architecture) search space, and the lower-level evolution of learning rules can be biased toward this type of architecture. On the other hand, the evolution of learning rules should be at the highest level if more prior knowledge about them is available or if there is a special interest in a certain type of learning rule. Unfortunately, little prior knowledge is usually available about either architectures or learning rules in practice, except for some very vague statements [319]. In this case, it is more appropriate to put the evolution of architectures at the highest level, since the optimality of a learning rule makes more sense when it is evaluated in an environment including the architecture to which the learning rule is applied.

Fig. 13 summarizes the different levels of evolution in ANN's. The general framework provides a basis for comparing various EANN models according to the search procedures used at the three different levels, since it defines a three-dimensional space in which zero represents one-shot search and infinity represents exhaustive search along each axis; each EANN model corresponds to a point in this space. For example, the traditional BP network can be considered as a special case of the general framework, with one-shot (only-one-candidate) search used in the evolution of architectures and learning rules and BP used in the evolution of connection weights. Gradient descent, simulated annealing, and even exhaustive search can be considered as special cases of EA's. In fact, Fig. 13 can also be viewed as a general framework of adaptive systems if we do not restrict ourselves to EA's and three levels.

B. Conclusions

Evolution can be introduced into ANN's at many different levels. The evolution of connection weights provides a global approach to connection weight training, especially when gradient information of the error function is difficult or costly to obtain. Due to the simplicity and generality of the evolution, and the fact that gradient-based training algorithms often have to be run multiple times in order to avoid being trapped in a poor local optimum, the evolutionary approach is quite competitive. It is argued that crossover produces more harm than benefit when evolving ANN's which use a distributed representation (e.g., multilayer perceptrons), because it easily destroys knowledge learned and distributed among different connections; crossover would be more suitable for localized ANN's, such as RBF networks.

Evolution can be used to find a near-optimal ANN architecture automatically. This has several advantages over heuristic architecture design, because trial-and-error and other heuristic methods are very ineffective given the characteristics of the design problem described at the beginning of Section III. The direct encoding scheme of ANN architectures is very good at fine tuning and generating compact architectures, while the indirect encoding scheme is suitable for finding a particular type of ANN architecture quickly. Separating the evolution of architectures from that of connection weights can make fitness evaluation inaccurate and mislead evolution; the simultaneous evolution of ANN architectures and connection weights generally produces better results.

Evolution can also be used to allow an ANN to adapt its learning rule to its environment. In a sense, evolution provides ANN's with the ability of learning to learn; it also helps to model the relationship between learning and evolution. Preliminary experiments have shown that efficient learning rules can be evolved from randomly generated rules. Current research on the evolution of learning rules normally assumes that a learning rule can be specified by (4). While constraints on learning rules are necessary to reduce the search space in the evolution, they might prevent some interesting learning rules from being discovered. Not only can such evolution discover possible new ANN architectures and learning rules, it also offers a way to model the creative process as a result of an ANN's adaptation to a dynamic environment.

Global search procedures such as EA's are usually computationally expensive, so it would be better not to employ EA's at all three levels of evolution. It is, however, beneficial to introduce global search at some levels of evolution, especially when there is little prior knowledge available at that level and the performance of the ANN is required to be high. With the increasing power of parallel computers, the evolution of large ANN's becomes feasible.

ACKNOWLEDGMENT

The author is grateful to D. B. Fogel and an anonymous referee for their constructive comments on earlier versions of this paper.

REFERENCES

[1] X. Yao, "Evolutionary artificial neural networks," Int. J. Neural Syst., vol. 4, no. 3, pp. 203-222, 1993.
[2] X. Yao, "A review of evolutionary artificial neural networks," Int. J. Intell. Syst., vol. 8, no. 4, pp. 539-567, 1993.
[3] X. Yao, "Evolutionary artificial neural networks," in Encyclopedia of Computer Science and Technology, vol. 33, A. Kent and J. G. Williams, Eds. New York: Marcel Dekker, 1995, pp. 137-170.
[4] X. Yao, "The evolution of connectionist networks," in Artificial Intelligence and Creativity, T. Dartnall, Ed. Dordrecht, The Netherlands: Kluwer, 1994, pp. 233-243.
[5] X. Yao, "Evolution of connectionist networks," in Preprints Int. Symp. AI, Reasoning & Creativity, Griffith Univ., Queensland, Australia, 1991, pp. 49-52.

Ann Arbor. CA: Morgan Kaufmann. 1. pp. Kangas. Distributed Processing: Explorations in the Microstructures of 1391–1396. 1992. Introduction to the Theory of [32] R. “A scaled conjugate gradient algorithm for 2nd IASTED Int. ing internal representations by error propagation. I. Owens. Fogel. pp. 1991. 40. “The GENITOR algorithm and selective pressure: [46] B. “A mixed genetic approach to the optimization in neural networks with evolution strategy. Eng. L. D. feedback controllers. O’Neil. 1st Int. vol. [6] G. 3. Personnaz. Tecuci. NJ: IEEE Int. Graudenz. W. vol. MA: MIT Press.. E. Fogel. 3. “Evidence of hyperplanes in the J. Heistermann.” in sterdam. Workshop Mul- [15] D.. R. 1991 recognition by neural networks with single-layer training. reinforcement learning for neural networks. vol. Homaifar and S. S. Jan. fast supervised learning. 2–8. pp. optoelec- a genetic algorithm. 71–76. vol.” in Proc. Hillsdale. J. U.” in Proc. 3. vol. NAECON 1991. Conf. McClelland. O. 2. 1st Europ. 1991.” Neural Networks. 1992. Jan. 1989. P. [24] D. mization. Simula. vol. “Learning neural network putation: Comments on the history and current state. vol. Patnaik.. Cambridge.” Biological Cybern. no. no. J. Dep. pp. Tech. ers (compressed edition). P. Through Simulated Evolution. R. Dewilde and J. J. Hinton. pp. “Parallel algorithms for learning [51] J. behaviors in neural networks.” Artificial Computing’89.” Ph. Univ. 1993. R. Schaffer. Eds. E. Eckardt. San Chichester. “Genetic based training of two-layer. CA: Morgan Kaufmann. [16] T. “Training weights of neural networks Cambridge. “Genetic al. no. Williams. Bornholdt and D. vol. J. 1273–1276. Ed.. Menczer and D. G. [27] D. Belew. Wieland. Eds. networks. vol. 1990. pp. CA: Acta. 1989. vol. Caudell and C. Simula. [47] J.” IEEE Signal Processing Mag. Koza and J. J. 1111–1115. MA: Addison-Wesley. 275–280. 1990. vol.” Dep. 1975. 4. C.” [8] H. vol. 1995. P. Chauvin and D. 14. [17] D. pp. R. [44] F. Rumelhart. Deer. B. 
Conf.” in Artificial Neural Networks: Proc. Schraudolph. Networks. networks and structure design by genetic algorithms. S. Artificial Neural Networks—ICANN-91. The Netherlands: North-Holland. “Genetic generation of both the [22] S. “Connectionist learning procedures. Proc. pp. and C. Radcliffe. Int. Am- Training of constrained networks using genetic algorithms. [13] J. 1990.” in Proc. Mar. works. Autonomous Systems: Proc. “Training a neural network with [50] A. D. vol. and H. “Evolving neural [49] Y. B. J. M. New York: Wiley. Tech. “An exploration of genetic algorithms gorithms and neural networks: Optimizing connections and for the selection of connection weights in dynamical neural net- connectivity. no. Rumelhart. 1440 PROCEEDINGS OF THE IEEE. 370–374. Schwefel. 1989. MA: Addison-Wesley.. Genetic Algorithms and Their Applications. “Progress in supervised neural Neural Networks (IJCNN’91 Singapore). Michalski and G. Anaheim. 1981. F. VA: Center for Artificial Intelligence. Conf. Artificial Life. E. Bourgine. Evolution and Optimum Seeking. Knerr. 1991. O. The Netherlands: Elsevier. San Mateo. (C-014). and M. 1. Eds. Genetic Algorithms in Search. Parallel of neural controllers. Needham CON’92. S. Dill and B. of California. Sutton. IEEE SOUTHEAST- tion: A Machine Learning Approach to Modeling. Erlbaum. Eds. Ichikawa and T. J. G. 3–14.” in Toward a Practice of Machine Intelligence.” in Artificial Neural Networks: 1989. [42] J. 28. Hertz. [28] T. Comput. Heights.” IEEE Networks (IJCNN’91 Seattle). Evans. Proc. J. Joint Conf.” IEEE Int. de Garis. L. 5.. “GenNets: Genetically programmed neural Jan. 1993. New York: [33] N. Queensland. E. 1560–1561. F. “New learning algorithm for training multilay- [20] K. 273–280. 487–493. San Mateo. D. IEEE 1991 National Aerospace and Electronics 1990. Backpropagation: The. Edinburgh. The Netherlands: North-Holland. 6. Theoretical [10] L. [38] H. pp. Joint Conf.. Hush and B. Walsh. 47–48. Fairfax. Lett. McInerney. vol. Peters. 74–77. P. 
Parisi. [35] H. P. Conf. H. [37] M. 762–767. CompEuro’92. IEEE Trans. Diego. [30] P. Montana and L. Rep.. Joint Conf. 962–968. Porto. Rice. pp. Mäkisara.” in Proc. 28. MI: Univ. Schwefel. Sawa. Srinivas and L. pp. [41] A. U. A. [23] R. 11th Int. 3.” neural network architecture for isolated word recognition. L.” in Proc. T. Holland. ing the least fit hidden neurons. tronic neural network. E. [7] J. and J. de Garis. pp. Cognitive Science Society. Int. Eds. Prados. pp. N. Whitley. “Learn. pp. H. vol.: Wiley. inputs and/or outputs vary in time. Fogel.” in Proc. and V. 1993. G. 1986. [9] H. E. “General asymmetric neural ory. 973–978. “Training multilayered neural networks by replac- [11] D. vol. Bartlett and T. K. 1990. search-space reduction. [25] Y. Joubert. J. J. 1992. Mäkisara. Feb. MA: Ginn. Møller. 1997. pp. Starkweather. A. [40] D. L. July 1992. [36] . Intell. W.” in Proc. 87.. H. “Neural network application for direct networks. 3. Das. Joint Conf. [31] J. Amsterdam. “An introduction to simulated evolutionary opti. Whitley. 1989. “Two problems with backpropagation and other [43] S. Reading. Neural Networks.” IEEE weights using genetic algorithms—Improving performance by Trans. 1992. 283–289. “Genetically programmed neural network for Why rank-based allocation of reproductive trials is best. Dep. Hinton. 823–831. 1991. Sept. Trans. Kohonen. “Glove-talk: A neural network stable systems. Goldberg. 347–361. Artificial Intelligence Phys. pp. Dreyfus. New York: IEEE Press. Krogh. Dominic.” Biological Cybern. 10. D.” in Proc. [34] D. Kohonen. Hillsdale. 272–281. . Kangas. pp. 2. J. B. Dolan. 3rd Int. pp. [39] A. 9. 66. 3. 1994. VOL. by genetic algorithms and messy genetic algorithms. 4. 1–3. Eds. 116–121. “Evolving Neural Computation. “Steerable genNets: The genetic programming [12] . Expert Systems and Neural Networks. 1990. Cognition. no. 3rd Int. Neural Networks (IJCNN’91 Singapore). and Applications. and R. Reading. CS90-174 (revised). tistrategy Learning (MSL-91). 
Numerical Optimization of Computer Models. vol. Fogel. in amorphous neural networks. Sci. pp. Architectures. and G. “Evolving neural network controllers for un- [21] S. “Evolutionary com. Conf. Michigan Press. 634–637. D. vol.” Parallel Comput. pp. vol. 3–17. pp. 2. 525–533. Lang. Neural Networks.” in Proc. 327–334.-P. M.. [19] M. pp. Evolutionary Computation. 1991 IEEE Int. pp. Horne. pp. nets—Using the genetic algorithm to train neural nets whose [18] D. Bäck. [26] D. Schaffer. “Genetic neural networks on MIMD comput- Wiley. Neural Networks. pp. 2331–2336. NJ: Erlbaum.K. 185–234.-P. pp. “Handwritten digit weights and architecture for a neural network. Downs. NO. T. pp.” IEEE Trans. 1991 IEEE Int. dissertation. vol. Prados.K. Whitley. Jan. 8–39. [48] F. 1991. Univ. Neural Networks (IJCNN’91 Seattle). U. 1992. Lett. Amsterdam. [14] D. 1989. Bogart. Spofford and K. and R. Neural Networks. Maricic. Evolutionary Computation: Toward a New Philosophy of of steerable behaviors in genNets. L. Australia. K. Optimization. and J. Neural Networks. Hinton. J. 318–362. pp. 1991 8th Annual Conf. Hamza. J. 1991 IEEE Int. and G.” in Parallel Joint Conf. 1966. no. pp.” in Proc. Waibel. 63. Schwefel. vol. 2. J.-P. 6. pp.” in Proc.” Neural 1995. [29] D. vol. Adaptation in Natural and Artificial Systems.. 2. Rep. Neural Networks (IJCNN’91 Seattle). “A time-delay ered neural networks that uses genetic-algorithm techniques. vol. pp. Artificial Neural Networks—ICANN-91. “Training feedforward neural net. Fels and G. pp. A. networks: Using genetic algorithm with connectionist learning. 1995. 224–231. MA: MIT Press. Eds. Fogel. Conf. Anderson. Elect. K. Heistermann and H. R. Joint Conf. Hintz.” in solving pole-balancing problem. Proc. E. Univ. A. 33–43. CA: Morgan Kaufmann. 5. Apr. [45] S. pp. 1986. vol. Genetic Algorithms and Their Applications.” IEEE Trans. vol. 1991. Eng. Varela and P. SEPTEMBER 1999 .” Electron. and F. 667–673.” Electron.” in Proc. 
Neural interface between a data-glove and a speech synthesizer. 1. 2. San Mateo. System Identification Through Simulated Evolu. E. Guan. 2.D. works using genetic algorithms. “Using the genetic algorithm to train time dependent and Machine Learning. Rumelhart and J. Symp. and C. and N. genetic learning of neural networks. Ed. 397–404. Davis. “Parametric connectivity: T. Nov. “Evolving sequential machines Artificial Intelligence. Hammel.” in Proc. Palmer. Jan. Ed.. Hinton. “Genetic steepest-descent learning procedures for networks. 1991.

“Training neural networks with models for hot metal desulphurization. vol. Chmurny. M. C. nos. 30. [75] Y. 1996.. 2–3. pp. pp. 38. Korning. Kunze. nos. Hamza. H. Jain. pp. optimization of neural networks: A comparison of the genetic [68] R. “Pattern recognition Robotics and Automation. Wei. vol. 5. 230–232. 353. A. [79] S. 12. no. Robot.. pp. M. manipulator.” in Proc. 6. C. pp. vol. Deb. Meeden. Mar. no. B. pp. Los Alamitos.” in Proc. SPIE. [71] S. no. 3. Japan (English Translation of Denki Gakkai Ron- gramming approach to the construction of a neural network bunshi).” Solid State Commun. 1995. Janson and J. 1997. A. 8. Elect. O. Jang.” J. “Alternative learning methods for Simulation in Materials Sci. T. 1992.” Neural Network World. Hirata. “Translation. no. J. and L. Lee and D. 4. Narita. pp. Eds. 965–976. 3. July/Aug.. 1996.. pp. 62. Agriculture.. pp. Civco. 1998. June 1. no. “Neural network training by networks in process control: Model-based and reinforcement means of cooperative evolutionary search. Bentink. R. Symp. vol. Baluja. and Associated Equipment. 3–4. “Alternative neural work. “Neural networks in compu- FL. vol. “Evolving wavelet neural ceptron for Taiwan power system short-term load forecasting. Orlando. vol. 900–909. 171–185. vol. Spectrometers. pp. vol.” Electric Power [53] D. H. 583–588. “Application of an improved genetic algorithm to Photogrammetric Eng. Deo. 1995. Eng. Lewis. “Neural network Conf. M. Remote Sensing. 26. B. 1996. [78] P. Kerckhoffs. 1st Int. G.. Chilingarian. 38.” J. Roy. no. Voelz. Tzes. pp. pp. V. D. “Toward global 240–241. vol. Frenzel. algorithm and backpropagation. R. Man. neural network and a refined genetic algorithm. 16–22. 5–6. Fogel. Sheble. 2618–2623.” Decision Support Syst. 18. modified genetic algorithms. pp. Meservy. and networks for spatial decision making in GIS..” Comput.” IEEE Trans. Kubo. “Comparison between the perfor. [93] P.. [73] B. F. 70–76. vol. tational materials science: Training algorithms. Feb. Q. 
pt. 1994.” Nuclear Instrum. 67–85. Gallagher. 1996. D..” in Proc. Maifeld and G. works using evolutionary strategies. Ter-Antonyan. “Evolution of an artificial neural network based 511–522. V. May 1993.” networks for function approximation. 1997. 1997. 1994. Chen. 1992. Huang. pp. Holmes. vol. 1287–1295. 115. Porto. B. Syst. Lehotsky. 1995. 1–2. mization of back propagation algorithm and GAS-assisted ANN [54] A. CA: Transportation Res. and A. Lee. parison of Bayesian and neural techniques in problems of Applicat. 317–325. Ikuno. ditioner with a neural network control. Yao. Ed. D. 76. Penfold. vol. 1. [83] J. Power Syst. 31. Electron. vol. DeRouin and J. CA: IEEE Computer Soc. 2. nos. D. Methods [87] C. and M. L. 1. 1995. pp. 88–99. Eng.” in Proc. 32. Iwasa. 33. Inform. Marose. mance of feed forward neural networks and the supervised 22. vol. W. 450–463. 1994. IASTED Int. B. genetic optimization of a generalized regression neural net- [63] V. E. pp. nos.. vol. “Learning experiments with Cybern. pp. vol. Hung and H. 1992. 1992 IEEE Int. nos. Man.” PE & RS: Y. “Genetic-based fuzzy net networks. Scherf and L. 1710. Janson and J. B. 1992. vol.. Zhang. Wang. Mar. A. P.” Comput. Schaffer. J.” Neurocomput. no. Khorrami. “Com. 49–58. “An gorithms. vol. He. pp. H. network training methods. R. Olej. J. Eng. Chen and C. “Training neural networks by means of genetic YAO: EVOLVING ARTIFICIAL NEURAL NETWORKS 1441 . Electric Power Syst. 192–194. Methods in Phys. Hansen and R. “Genetic generation of connection patterns for a dy. Hanna. rithms to the training of higher order neural networks. H. Chen and R. Section A: Accelerators. Workshop Combi. V. for control of a walking robot. classification to multiple categories. 5. [88] J. nos. 1994. 399. Frenzel. 91–97. and A. [61] L. M. pp. 5. Artificial Intelligence on Wall Street.” [56] M. 3. Sci. vol. 3. 11.” J. and L. Gargano. pp. “A genetic no.-W. 474–485. 5. 
PROCEEDINGS OF THE IEEE, VOL. 87, NO. 9, SEPTEMBER 1999
a type of neural network. “Genetic evolution of radial basis function prediction. tation in neural circuits.” Neural Networks. 2. Chalmers. Port. Conf. H. [224] S. genetic evolution of radial basis function centers with widths 490–497. [260] S. nos.” in Proc. Wilson. “Phenotypes. for time series prediction. 1997 6th pp. 2. CRT-9019. vol.” Neurocomput. vol. E. 2187–2190.. 1995. pp. IEEE Trans. J. pp. 1997. Univ. Mühlenbein. Kindermann. 751. Touretzky. 1525–1528. 5. neural networks: A comparison of gate functions. Lischka.” in Evolutionary Programming VII: Proc. 7. R. no.

PROCEEDINGS OF THE IEEE, VOL. 87, NO. 9, SEPTEMBER 1999

Xin Yao (Senior Member, IEEE) received the B.Sc. degree from the University of Science and Technology of China (USTC), Hefei, Anhui, in 1982, the M.Sc. degree from the North China Institute of Computing Technologies (NCI), Beijing, in 1985, and the Ph.D. degree from USTC in 1990.

He is a Professor of Computer Science at the School of Computer Science, University of Birmingham, U.K. He held postdoctoral fellowships in the Australian National University (ANU) and the Commonwealth Scientific and Industrial Research Organization (CSIRO) in 1990–1992. He was an Associate Professor in the School of Computer Science, University College, the University of New South Wales, Australian Defence Force Academy (ADFA), Canberra, Australia, before joining the University of Birmingham.

Dr. Yao was the Program Committee Cochair of CEC’99, IEEE ICEC’98, IEEE ICEC’97, ICCIMA’99, SEAL’98, and SEAL’96, and the Second Vice-President of the Evolutionary Programming Society. He is an Associate Editor of IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, a member of the IEEE NNC Technical Committee on Evolutionary Computation, and a member of the editorial boards of the Journal of Cognitive Systems Research and Knowledge and Information Systems: An International Journal.